BillyHW Posted December 31, 2013

Has anyone ever tried to run some of the modern CPU benchmarks (like SPECint and SPECfp) on the classic 8-bit (6502, Z80) and 16-bit (68000, 65816) CPUs? My googling has found nothing. (The only thing I could find was a rating for the 68060.) How hard would it be to set something like that up?
JamesD Posted January 1, 2014

Those benchmarks are pretty much targeted at modern CPU features that I doubt anyone will ever use on 8- or 16-bit CPUs. I've never seen it done, and I have my doubts about it ever happening without modifying the benchmarks heavily, since they were aimed at 32/64-bit CPUs with floating point and modern optimizing compilers.

How hard is it? First you have to get the C/C++ benchmarks to compile, which would be no easy task in itself. I doubt most 8-bit C compilers would even support the code. The benchmarks themselves might not even fit in 64K of RAM. Then I'm pretty sure at least some of those benchmarks take more RAM for data than 8-bit CPUs can address. I think one of the benchmarks deals with close to 100,000 data points (3D?). Even if each data point isn't 3D (x, y & z floating point numbers), you are pretty much out of luck right there.

FWIW, those benchmarks are aimed at similar hardware with modern compilers. For different CPUs you'd have different compilers, different floating point implementations, etc., and you aren't just benchmarking the CPUs, you are benchmarking the compilers. Even switching compilers on the same CPU would change results, as would changing floating point library implementations.

Someone could certainly come up with a similar benchmark suite for old CPUs, but experience tells me it would devolve into endless tweaking of the code for better results, and you'd end up with something that would never be used outside of a demo... that, or people will complain the benchmark is biased and you'll never hear the end of it.
BillyHW Posted January 1, 2014 (Author)

JamesD said: Those benchmarks are pretty much targeted at modern CPU features that I doubt anyone will ever use on 8 or 16 bit CPUs. [...]

Oh boo. What I'd like is some sort of objective measurement to compare the speeds of various processors: a reason for saying that a 6502 clocked at x MHz is *roughly* equivalent to a Z80 clocked at 2x MHz, say, and similarly for the other architectures. Would MIPS be a reasonably fair measure? What if I'm trying to compare the 8-bit 6502 to the 16-bit 68000?
Kyle22 Posted January 1, 2014

Make a C program that is compatible with cc65 on the Atari and BDS-C(Z) on a 4 MHz CP/M (Z)System. Make it long and complicated, and use large floating point numbers. The only requirements would be: 1) no graphics (difficult or impossible on many CP/M systems), and 2) make it small enough to fit into the AVERAGE user's memory. Most Ataris under SDX can operate with a LOMEM under $1000, and most (modern) CP/M systems typically have a TPA of 55-61K. I know only enough C to be dangerous, but I think that would be the best comparison test.
JamesD Posted January 1, 2014

BillyHW said: Would MIPS be a reasonably fair measure? What if I'm trying to compare the 8-bit 6502 to the 16-bit 68000?

MIPS is pretty much meaningless when comparing different architectures (6502 vs. Z80). It takes a different number of instructions to do the same work on different architectures.
Kyle22 Posted January 1, 2014

Yes, it does, but the real difference is clock cycles per instruction.
JamesD Posted January 1, 2014 (edited)

Kyle22 said: Make a C program that is compatible with cc65 on the Atari, and BDS-C(Z) on a 4 MHz CP/M (Z)System. [...]

So cc65 supports floats now? When did that happen?

<edit> BTW, I suggest SDCC for the Z80. lcc65 might generate faster code for the 6502 than cc65, but I've never looked at the output of the two side by side.

Edited January 1, 2014 by JamesD
JamesD Posted January 1, 2014

Kyle22 said: Yes, it does, but the real difference is clock cycles per instruction.

Well, with Z80 assembly code being a bit smaller than 6502 code, the "real difference," as you call it, isn't quite so clear. Dealing with 16-bit or larger integers can make quite a difference in code size and speed on the 6502. Also, the 6502 doesn't support compilers as well, so the Z80 would get quite a boost from any C benchmark, but hand-written assembly would be more favorable to the 6502.
Kyle22 Posted January 1, 2014

You can use the OS routines (try FastChip) in cc65, and BDS-C has "The Incredible Superpowerful Floating Point Package". SDCC is a cross compiler. I am trying to keep things as real as possible. That's why I didn't suggest Deep Blue C.
Kyle22 Posted January 1, 2014

Yes, assembly is always faster, but most people don't like to write large assembly programs. However, in this case, it shouldn't be too difficult to write a simple assembly program to challenge the CPU. Just figure out how to read the clock on each system, so we have a number to compare when it's done.

Or how about this? It doesn't matter what language you use; it's fine even if you keyed it in on your front panel switches. The only requirement is: do the same math calculations (and other CPU-stressing routines) on each machine. However you write it, that's OK. It must only perform the same functions as the program on the other machine. Who would be faster then?
Rybags Posted January 1, 2014

A big difference is that most older CPUs have little or no support for maths operations beyond simple binary stuff, and such operations form the basis for many benchmarks. Even the way FP is expressed is different: the old gear mostly used BCD with the exponent component in binary, while modern hardware uses binary for both.

Using a high-level language, or even C, won't help a lot. Some implementations might call OS routines, which can be variable (e.g. the factory Atari 8-bit floating point in ROM is spectacularly inefficient and slow), and even the same branded compiler could use totally different algorithms on different machines, even if the target CPU is the same. In fairness, a benchmark to compare old to new should allow optimisations beyond what you might expect, e.g. larger floating point routines with table-based assistance for calculations.

Insofar as MIPS are concerned, there's a big grey area there. Not only do different CPUs take a different number of cycles per instruction, but there's also a big difference in just what can be done in a single instruction. Even with the 6502 vs. the 68000, which aren't that far apart in years: the 68000 can, in a pair of instructions, move 60 bytes' worth of data from one area of memory to another (MOVEM.L), where the 6502 would need a loop using about 32 times as many instruction executions. MFLOPS is probably a "fairer" measure, but then you want to have the same problems being solved.

All that said, for a 6502, probably divide the clock speed by 10 to get a comparative measure of MIPS. Divide by 100 or more for MFLOPS.
JamesD Posted January 1, 2014 (edited)

Kyle22 said: You can use the OS routines (try FastChip) in cc65, and BDS-C has "The Incredible Superpowerful Floating Point Package". [...]

Can you use the OS routines by just declaring something as a float? It's still an apples-to-oranges comparison if you aren't using the same floating point format, number of digits and algorithm. The Apple II has had much faster trig operations implemented on it in recent years, and it was faster than the Atari to begin with. You pretty much have to produce identical floating point results for a benchmark to be meaningful. Otherwise you might as well just perform benchmarks in BASIC.

I'm trying to understand how SDCC isn't "real". Most people use cross compilers and cross assemblers these days, and even Atari used development systems with cross development tools back in the day.

Deep Blue C generates a P-code-like language that is interpreted. How would that be a fair comparison vs. a compiler that generates native code? If you want to go that route, you could use UCSD Pascal. Then at least you are running the same P-code from the same compiler. UCSD Pascal was available for the Apple II, Amiga, PC and CP/M. Source code is available, and it could certainly run on an Atari. FWIW, UCSD Pascal wasn't known for generating great code, but it was highly portable.

<edit> If what I read is correct, the "C-code" generated by Deep Blue C is actually 8080 code. If so, the Z80 could run that directly.

Edited January 1, 2014 by JamesD
JamesD Posted January 1, 2014

Kyle22 said: The only requirement is: do the same math calculations (and other cpu stressing routines) on each machine. [...]

I think a common algorithm is in order, right down to the floating point format.
The Usotsuki Posted January 1, 2014

inb4 BogoMIPS
Rybags Posted January 1, 2014

Common algorithm: with machines that have alike CPUs, it's only really necessary to run it on one. E.g. run it on an Atari 8-bit, and you can then fairly accurately say how quickly it'll run on an Apple II or C64 with a simple calculation based on clock differences and the cycle-steal overheads of each machine.

Problems can arise in that certain methods will favour one CPU but penalize another. That can occur both ways: in some cases one CPU might need multiple instructions to accomplish what another does in one. On the other hand, if a common algorithm dictates that smaller chunks of data are operated on at a time than the CPU could otherwise handle, then that becomes a handicap.
JamesD Posted January 1, 2014

The Usotsuki said: inb4 BogoMIPS

I'll have to look that one up. Several small benchmarks that you might actually run on an 8-bit would get my vote, and each could receive some sort of score. Some possibilities:

- Translation, rotation and scaling of some data points.
- Drawing a Bresenham line in a 2-bit or 4-bit-per-pixel bitmap. The computer doesn't even need to support the video mode. No optimizations for different angles; it's about the same loops and memory manipulation operations. A 256x192 and/or 320x200 bitmap would be reasonable, since both were common and should fit in memory.
- Maybe drawing a 3D object on the previously mentioned bitmap, rotating it, scaling it and making it move so far, without worrying about frame rates.
- Blitting some 2D objects onto a bitmap, with some logical operations against the background such as masking, etc. Use 8x8 and 16x16 sprites. No hardware sprites; it's about the CPUs.
- Drawing some text on a graphics screen.
- Some simple string manipulation, searching and sorting.
- Sieve of Eratosthenes.
- Etc.
JamesD Posted January 1, 2014

Rybags said: Common algorithm - with machines with alike CPUs it's only really necessary to run on the one. [...]

While I agree for the most part, this won't demonstrate the penalty for things like the Apple II's horrid graphics setup, using display lists on the Atari, etc. It also doesn't show flaws like the lack of a vertical blank interrupt (Apple II, Oric, etc.). It is purely CPU efficiency on that hardware.
JamesD Posted January 2, 2014

FWIW, I found the following SIEVE results embedded in the source to a UCSD Pascal version of the benchmark:

    Sage II      IV.1    57     (68000 at 8 MHz)
    WD uEngine   III.0   59     (fillchar is so slow on uE)
    LSI-11/23    IV.01   92-122 (depends on memory speed)
    LSI-11/23    II.0    105    (98 seconds under IV.01)
    LSI-11/23    IV.1    107    (non-extended memory)
    LSI-11/23    IV.1    128    (extended memory)
    NEC APC      IV.1    144    (8086 at 4.9 MHz, extended memory)
    JONOS        IV.03?  162    (pretty good for a 4 MHz Z-80A)
    NorthStar    I.5     183    (Z-80 at 4 MHz)
    OSI C8P-DF   II.0?   197    (6502 at 2 MHz)
    H-89         II.0    200    (4 MHz Z-80A)
    LSI-11/2     IV.0    202
    IBM PC       IV.03   203    (4.77 MHz 8088)
    LSI-11/2     II.0    220
    Apple ][     II.1    390    (1 MHz 6502)
    H-89         II.0    455    (2 MHz Z-80)
Classic Pac Posted January 2, 2014

I think you just did something no one else ever thought of.
JamesD Posted January 2, 2014

FWIW, the only Z80 P-code interpreter I've found source code for so far is actually for the 8080, and I *think* it was for the H-89, which would explain why it's so much slower than the Apple II.
Keatah Posted January 3, 2014

I think this is way too complicated. You ask the CPU to solve a problem. Use the most efficient means possible. Compare the end results.
JamesD Posted January 3, 2014 (edited)

Keatah said: I think this is way too complicated. You ask the CPU to solve a problem. Use the most efficient means possible. Compare the end results.

None of the things I suggested are that complex, and they were just suggestions. I've actually written code to do everything on that list in the past, except for the SIEVE, which has source code published all over the place. Modern benchmarks are large and complicated to keep people from coming up with ways to get good results on benchmarks that aren't reflected in real-world apps. I merely tried to come up with a variety of suggested tests that could give an overall speed comparison, wouldn't favor one CPU over another, and would discourage cheating.

http://www.programming-techniques.com/2012/03/3d-transformation-translation-rotation.html
https://en.wikipedia.org/wiki/Bresenham%27s_line_algorithm
https://en.wikipedia.org/wiki/Sieve_of_Eratosthenes
http://www.zentut.com/c-tutorial/c-quicksort-algorithm/

Blitting isn't exactly difficult. There is existing graphics text code out there for several processors. String manipulation isn't exactly rocket science. Why is this too complicated?

Edited January 3, 2014 by JamesD
thealgorithm Posted January 9, 2014

Benchmarking would not be ideal across different CPU architectures. It would be a different case if they were all based on the same instruction set (e.g. ARMv7, x86, etc.).
JamesD Posted January 9, 2014

thealgorithm said: Benchmarking would not be ideal for different cpu architectures. [...]

Making a CPU do the same operations and comparing the results won't be ideal? Why?
thealgorithm Posted January 10, 2014

This is dependent on how the benchmark routines are coded. Using the same instruction set and the same routine (on the same CPU architecture) would show any efficiency differences in the number of cycles each opcode takes, etc. If the benchmark routine for a different CPU architecture is written differently, however, it may not be optimal, and the benchmark results would be flawed.