Benchmarking Classic CPUs


BillyHW

Recommended Posts

This is dependent on how the benchmark routines are coded. Using the same command set and routine (on the same CPU architecture) would reveal any differences in how many cycles each opcode takes, etc.

I'm not exactly sure what you mean by command set. You refer to the same CPU architecture, but then compare cycles for opcodes.

On the same architecture you'd run the same code and on a different architecture you'd run equivalent code.

 

Comparing clock cycles per instruction is the approach I've seen countless 6502 fans try to use.

This drastically skews the results in favor of the 6502 in most of the comparisons I've seen, because most comparisons don't deal with 16-bit manipulation, and when it is used it's often limited. The number of pointers and variables being manipulated is often limited as well. Under those conditions, the results can make the 6502 appear to be over three times faster than the Z80 at the same clock speed.

 

On the other hand, you can't call a benchmark that always deals with 16 bits fair either, because it will adversely impact 6502 results and favor CPUs that handle 16 bits better. This is why I suggest a variety of benchmarks. I intentionally chose some that the 6502 would perform very well on and some that CPUs with good 16-bit support would perform well on.

Many benchmarks for modern hardware do the same thing: they give a score for different categories. I suggested several benchmarks involving bit manipulation, math (assuming identical algorithms and accuracy for floats) and other things an 8-bit CPU would realistically be asked to do. At least one benchmark written entirely in C is important because it can contrast how well different CPUs support high-level languages.
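For reference, the sieve that usually gets used for this kind of comparison only takes a few lines of C. Here's a rough sketch along the lines of the old published version; I'm quoting the usual 8190-element array and ten iterations from memory, so treat the exact constants as approximate:

/* Roughly the classic sieve benchmark: count odd primes using an
   8190-entry flag array, repeated 10 times. */
#include <stdio.h>

#define SIZE 8190

char flags[SIZE + 1];

int main(void)
{
    int i, k, prime, count, iter;

    for (iter = 1; iter <= 10; iter++) {
        count = 0;
        for (i = 0; i <= SIZE; i++)
            flags[i] = 1;
        for (i = 0; i <= SIZE; i++) {
            if (flags[i]) {
                prime = i + i + 3;             /* flag i represents 2*i + 3 */
                for (k = i + prime; k <= SIZE; k += prime)
                    flags[k] = 0;              /* strike out multiples */
                count++;
            }
        }
    }
    printf("%d primes\n", count);
    return 0;
}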

 

The SIEVE results I found in the UCSD Pascal code are important because they were produced by P-Machines optimized for each CPU running the exact same code, so every CPU has to perform exactly the same work. The results varied, but if you look at them, the speed ratio between the 6502 and Z80 is more like 2:1 on average, or less if you take the fastest Z80 version. Some people will argue that their favorite CPU's virtual machine probably wasn't optimal, but emulating these P-code instructions is pretty simple, so I doubt you'd get back to a 3:1 clock ratio through optimization.
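To show what I mean by "pretty simple": the heart of a P-Machine is basically a fetch-and-dispatch loop. This toy sketch in C uses made-up opcodes rather than the real UCSD P-code set, but each real instruction isn't much more work than one of these cases:

/* Toy bytecode interpreter loop; the opcodes are invented for
   illustration and are not the actual UCSD P-code instruction set. */
#include <stdio.h>

enum { OP_PUSH, OP_ADD, OP_SUB, OP_PRINT, OP_HALT };

int run(const int *code)
{
    int stack[32];
    int sp = 0;          /* stack pointer */
    int pc = 0;          /* program counter */

    for (;;) {
        switch (code[pc++]) {
        case OP_PUSH:  stack[sp++] = code[pc++];            break;
        case OP_ADD:   sp--; stack[sp - 1] += stack[sp];    break;
        case OP_SUB:   sp--; stack[sp - 1] -= stack[sp];    break;
        case OP_PRINT: printf("%d\n", stack[sp - 1]);       break;
        case OP_HALT:  return stack[sp - 1];
        }
    }
}

int main(void)
{
    int program[] = { OP_PUSH, 7, OP_PUSH, 5, OP_ADD, OP_PRINT, OP_HALT };
    return run(program) == 12 ? 0 : 1;   /* 7 + 5 */
}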

One flaw in the UCSD Pascal SIEVE benchmark is that UCSD Pascal runs identically on every platform and was originally designed for mainframes and minicomputers; it could possibly be tuned to work better on 8-bit CPUs. I believe BASIC-09 for the CoCo offers a much greater speed advantage over Microsoft BASIC than Apple Pascal (modified UCSD) does, even though it uses a similar virtual machine.

 

If the benchmark routine for a different CPU architecture is written differently, however, it may not be optimal and the benchmark results would be flawed.

You have to use the same algorithm but with common-sense optimizations for each CPU. It would not make sense to draw lines on the screen and allow the code for one CPU to have special cases for different angles but not use the same optimizations for the other CPUs.
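For the line drawing, "same algorithm" would mean everyone runs the same Bresenham-style loop against the off-screen buffer, something like this C version (the one-byte-per-pixel buffer layout is just an assumption for the sketch; a real target would pack pixels to match its screen format):

/* Bresenham-style line draw into a plain byte-per-pixel buffer. */
#include <stdlib.h>

void draw_line(unsigned char *buf, int width,
               int x0, int y0, int x1, int y1)
{
    int dx =  abs(x1 - x0), sx = x0 < x1 ? 1 : -1;
    int dy = -abs(y1 - y0), sy = y0 < y1 ? 1 : -1;
    int err = dx + dy;
    int e2;

    for (;;) {
        buf[y0 * width + x0] = 1;              /* plot the current pixel */
        if (x0 == x1 && y0 == y1)
            break;
        e2 = 2 * err;
        if (e2 >= dy) { err += dy; x0 += sx; } /* step in x */
        if (e2 <= dx) { err += dx; y0 += sy; } /* step in y */
    }
}

The angle special cases would be extras like dedicated horizontal and vertical fills, and the ground rule is simply that every CPU gets the same set of them.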

It all boils down to how the routines are written for each architecture. An algorithm written for one CPU may not run efficiently on another type of CPU and vice versa. Again, creating different methods for each CPU would not be ideal either.

 

For example, if a CPU cannot calculate in 16 bits and requires multiple instructions to achieve what a CPU that can operate in 16 bits does in one, the speed depends on how that routine is put together.

 

Certainly it would still be an approximation and I guess nothing can be done about that. The point I am trying to make here is that a benchmark which runs on the same command set would be more accurate (such as x86 vs x86, ARM vs ARM, etc.).


Certainly it would still be an approximation and I guess nothing can be done about that. The point I am trying to make here is that a benchmark which runs on the same command set would be more accurate (such as x86 vs x86, ARM vs ARM, etc.).

6502 vs 6502 and Z80 vs Z80 would certainly demonstrate differences in efficiency between machine implementations.

In my suggestion, the graphics were drawn to a buffer that you treat like a bitmap, because I wanted to compare different CPUs.

Once you start comparing architectures based on the same CPU, you have other factors. If you are actually drawing to a screen, a machine based on a 9918 VDP isn't going to let the CPU manipulate the screen as fast as a memory-mapped bitmap. Even its 256x192 graphics mode, which lets you manipulate each pixel, is implemented as characters. The 9938 and 9958 VDPs, on the other hand, have a true bitmap plus blit and line operations. Suddenly having to draw lines or sprites through the VDP isn't a bottleneck; it may even be an advantage, since you can be drawing while calculating. If you really want to see what the machine can do as a whole, you end up throwing out some of the rules you'd use for just benchmarking a CPU.

 

When comparing different CPUs... you ultimately end up with different results for different benchmarks, with one CPU doing some things better and another CPU doing other things better. If someone comes up with an optimization nobody saw before, the results can change. There's not much you can do to keep it fair other than set some ground rules as to what you can and cannot do for optimizations.


I think benchmarking should consist almost entirely of CPU-related activities. As every computer has more or less different graphics hardware, the side effects of outputting to the screen say very little about computational strength. In the end, it also boils down to price and availability. If, for example, a 4 MHz Z80 would calculate four more decimals of Pi than a 2 MHz 6502 in the same amount of time, but the Z80 with support chips and RAM of the required speed would be a rather more expensive solution, the customer might still choose the inferior, cheaper system if it performs better per dollar.

 

When implementing benchmarks, it can be argued whether those should be in hand-coded assembler or in a common compiled language. In the latter case, the efficiency of the benchmark depends to a great deal on the efficiency of the compiler, but as long as every C compiler is designed to produce as fast (or as small) code as possible for every possible type of program (all-purpose), it would not be an obstacle for cross-architectural comparisons. You could also use an interpreted language, e.g. BASIC, but then how fast the program runs depends on the BASIC interpreter. Those types of benchmarks have been run since 1977 and still are to this day on various 8-bit computers, and even using different BASIC implementations on the same hardware.

 

Actually, I think a starting point for a modern benchmark suite might include some of those old BASIC benchmarks, although perhaps in compiled form or hand-coded assembler. Then the more complex tasks like Sieve, Pi, sorting, cryptography, compression algorithms etc. can be added to give the CPUs what they deserve.

Edited by carlsson

I think benchmarking should consist almost entirely of CPU-related activities. As every computer has more or less different graphics hardware, the side effects of outputting to the screen say very little about computational strength. In the end, it also boils down to price and availability. If, for example, a 4 MHz Z80 would calculate four more decimals of Pi than a 2 MHz 6502 in the same amount of time, but the Z80 with support chips and RAM of the required speed would be a rather more expensive solution, the customer might still choose the inferior, cheaper system if it performs better per dollar.

The whole point of using a buffer was to avoid additional wait states that would be added by accessing the actual graphics memory.

If you are benchmarking a CPU you don't want added wait states. If you benchmark a machine then you want to take them into account.

 

I think the fact that the Spectrum, VZ200 and several other machines were so cheap demonstrates that building a machine with a Z80 wasn't that much more expensive. FWIW, that wasn't always the case; at some point Zilog lowered the price. The same thing happened with the 68000, and machines like the Amiga and Atari ST were the result.

 

When implementing benchmarks, it can be argued whether those should be in hand-coded assembler or in a common compiled language. In the latter case, the efficiency of the benchmark depends to a great deal on the efficiency of the compiler, but as long as every C compiler is designed to produce as fast (or as small) code as possible for every possible type of program (all-purpose), it would not be an obstacle for cross-architectural comparisons. You could also use an interpreted language, e.g. BASIC, but then how fast the program runs depends on the BASIC interpreter. Those types of benchmarks have been run since 1977 and still are to this day on various 8-bit computers, and even using different BASIC implementations on the same hardware.

I figured things like blitting, drawing text on the graphics screen, drawing a line, etc. would be in assembler, maybe as a library for a high-level language. When you're doing that much memory manipulation, a compiler just can't compare to assembly.

The actual benchmark would be in a high level language but it would link to a graphics library.

I think this would be a realistic scenario. I know Wizardry used this approach with Apple Pascal, and I think some of the companies specializing in simulations actually used compiled BASIC, so they probably added some libraries in assembler.
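As a sketch of the split I'm describing: the benchmark loop stays in the high-level language while the primitives sit behind a library that would be hand assembly on each machine. The function names and the byte-per-pixel buffer below are just made up for illustration, with plain C bodies standing in for the assembly routines:

#define W 256
#define H 192

static unsigned char screen[W * H];

/* Stand-in for an assembly routine that clears the off-screen buffer. */
static void gfx_fill(unsigned char *buf, unsigned char value)
{
    int i;
    for (i = 0; i < W * H; i++)
        buf[i] = value;
}

/* Stand-in for an assembly blit of a w*h block to (x, y). */
static void gfx_blit(unsigned char *buf, const unsigned char *src,
                     int x, int y, int w, int h)
{
    int row, col;
    for (row = 0; row < h; row++)
        for (col = 0; col < w; col++)
            buf[(y + row) * W + (x + col)] = src[row * w + col];
}

/* The benchmark itself stays in the high-level language. */
int main(void)
{
    static unsigned char sprite[16 * 16];
    int i;

    gfx_fill(screen, 0);
    for (i = 0; i < 1000; i++)
        gfx_blit(screen, sprite, (i * 7) % (W - 16), (i * 3) % (H - 16), 16, 16);
    return 0;
}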

 

Actually, I think a starting point for a modern benchmark suite might include some of those old BASIC benchmarks, although perhaps in compiled form or hand-coded assembler. Then the more complex tasks like Sieve, Pi, sorting, cryptography, compression algorithms etc. can be added to give the CPUs what they deserve.

Compiled BASIC often depends heavily on ROM calls, so you are still stuck with how efficient parts of the ROM BASIC are as well as how efficient the compiler is. Since C and Pascal operate independently of the ROMs, one of those might be a better choice.

 

I think the Sieve, Pi, sorts, etc. should be compiled rather than written in assembly. A compiler should do a decent job for that kind of manipulation, and we can probably get away with using existing source code.
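For the Pi benchmark, something on the order of Machin's formula is probably the right amount of work for a compiled test. A quick sketch in C, with term counts I picked by eye, so treat it only as a starting point:

/* Pi benchmark sketch using Machin's formula:
   pi = 16*arctan(1/5) - 4*arctan(1/239), series summed in double precision. */
#include <stdio.h>

/* arctan(1/x) = 1/x - 1/(3*x^3) + 1/(5*x^5) - ... */
static double arctan_recip(int x, int terms)
{
    double sum = 0.0;
    double power = 1.0 / x;          /* 1/x^(2k+1) */
    double x2 = (double)x * x;
    double term;
    int k;

    for (k = 0; k < terms; k++) {
        term = power / (2 * k + 1);
        sum += (k % 2 == 0) ? term : -term;
        power /= x2;
    }
    return sum;
}

int main(void)
{
    double pi = 16.0 * arctan_recip(5, 12) - 4.0 * arctan_recip(239, 5);
    printf("%.15f\n", pi);
    return 0;
}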

 

I like the idea of a P-code or similar based language, since all the CPUs end up having to do the same work, but UCSD Pascal seems to be the only common alternative across different CPUs, and it's useless if we can't locate the native P-Machines. I found Z80 source and the executable for CP/M but nothing else in a runnable state, so for the moment that makes it no better than any other solution.

 

I ran across a C compiler written for Flex that generates P-code, and it included an interpreter in 6800 assembly. It might be easier to port since, unlike UCSD Pascal, it doesn't require an entire OS environment to get running... but it's Small C, and I can't promise it wouldn't be a buggy piece of garbage. It also won't have floating point math.

 

I think even if we were to get some sort of P-Machine-based language benchmark, we'd want to use native-code compiler versions as well. We could probably do those benchmarks already. The problem we will run into here is the lack of floating point support in many compilers.

 

Given the problems with compilers, we are back to using BASIC, at least for the moment. About all we can do is try to use a BASIC that functions pretty similarly on multiple CPUs and uses the same number of digits of precision for its floating point libraries.

The closest thing we have there is Microsoft BASIC. I can tell you right now that overhead from key polling and other things that take place in the interrupt handlers is going to impact the results differently from one machine to the next. It would also rule out a link library for some of the benchmarks I suggested.

 

Any suggestions?

Edited by JamesD

Should we consider these classic CPUs? I'm inclined to say no, not yet.

 

But a terrific work nonetheless!

 

Actually I agree. I misread the original post and didn't realize he was speaking about 8- and 16-bit machines! My bad, sorry folks.

