
Benchmarking Classic CPUs


BillyHW

Recommended Posts

Those benchmarks are pretty much targeted at modern CPU features that I doubt anyone will ever use on 8 or 16 bit CPUs.

I've never seen it done, and I have my doubts about this ever happening without modifying the benchmarks heavily, since they were aimed at 32/64-bit CPUs with floating point and modern optimizing compilers.

How hard is it?
First you have to get the C/C++ benchmarks to compile, which would be no easy task in itself. I doubt most 8-bit C compilers would even support the code.
The benchmarks themselves might not even fit in 64K of RAM.
Then I'm pretty sure at least some of those benchmarks take more RAM for data than 8 bit CPUs can address.
I think one of the benchmarks deals with close to 100,000 data points (3D?). Even if each data point isn't 3D (x,y & z floating point numbers) you are pretty much out of luck right there.

FWIW, those benchmarks are aimed at similar hardware with modern compilers.
For different CPUs you'd have different compilers, different floating point implementations, etc... and you aren't just benchmarking the CPUs, you are benchmarking the compilers.
Even switching compilers on the same CPU would change results as would changing floating point library implementations.

Someone could certainly come up with a similar benchmark suite for old CPUs but experience tells me it will devolve into endless tweaking of the code for better results and you'll end up with something that would never be used outside of a demo... that or people will complain the benchmark is biased and you'll never hear the end of it.



Oh boo.

 

What I'd like is some sort of objective measurement to compare the speeds of various processors--a reason for saying that a 6502 clocked at x MHz is *roughly* equivalent to a Z80 clocked at 2x MHz, say, and similarly for the other architectures.

 

Would MIPS be a reasonably fair measure? What if I'm trying to compare the 8-bit 6502 to the 16-bit 68000?


Make a C program that is compatible with cc65 on the Atari, and BDS-C(Z) on a 4 MHz CP/M (Z)System.

 

Make it long and complicated, and use large floating point numbers. The only requirements would be: 1) no graphics (difficult or impossible on many CP/M systems), and 2) make it small enough to fit into the average user's memory. Most Ataris under SDX can operate with a LOMEM under $1000, and most (modern) CP/M systems typically have a TPA of 55-61K.

 

I know only enough C language to be dangerous, but, I think that would be the best comparison test.


 

 

Quote:
Oh boo.

 

What I'd like is some sort of objective measurement to compare the speeds of various processors--a reason for saying that a 6502 clocked at x MHz is *roughly* equivalent to a Z80 clocked at 2x MHz, say, and similarly for the other architectures.

 

Would MIPS be a reasonably fair measure? What if I'm trying to compare the 8-bit 6502 to the 16-bit 68000?

MIPS is pretty much meaningless when comparing different architectures (6502 vs Z80).

It takes a different number of instructions to do the same work on different architectures.

 


Quote:
Make a C program that is compatible with cc65 on the Atari, and BDS-C(Z) on a 4 MHz CP/M (Z)System.

 

Make it long and complicated, and use large floating point numbers. The only requirements would be: 1) no graphics (difficult or impossible on many CP/M systems), and 2) make it small enough to fit into the average user's memory. Most Ataris under SDX can operate with a LOMEM under $1000, and most (modern) CP/M systems typically have a TPA of 55-61K.

 

I know only enough C language to be dangerous, but, I think that would be the best comparison test.

So cc65 supports floats now? When did that happen?

 

 

<edit>

BTW, I suggest SDCC for the Z80.

 

lcc65 might generate faster code for the 6502 than cc65 but I've never looked at the output of the two side by side.

Edited by JamesD

Quote:
yes, it does, but the real difference is clock cycles per instruction.

Well, with Z80 assembly code being a bit smaller than 6502 code, the "real difference" as you call it isn't quite so clear. Dealing with 16-bit or larger integers can make quite a difference in code size and speed on the 6502.

Also, the 6502 doesn't support compilers as well, so the Z80 would get quite a boost from any C benchmark, but hand-written assembly would be more favorable to the 6502.
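The 16-bit point is easy to see in plain C. A sketch: the second function just spells out, byte by byte, roughly what a 6502 code generator has to emit for the first one (on the Z80 the whole thing can collapse to something like a single ADD HL,DE, since HL and DE are 16-bit register pairs).

```c
#include <stdint.h>

/* A 16-bit add: roughly one ADD HL,DE on the Z80, but a 6502
   compiler must emit a CLC / ADC-low / ADC-high sequence plus
   loads and stores, because its registers are only 8 bits wide. */
uint16_t add16(uint16_t a, uint16_t b)
{
    return (uint16_t)(a + b);
}

/* The same operation written byte-at-a-time, mirroring what the
   6502 code generator has to do internally. */
uint16_t add16_bytewise(uint16_t a, uint16_t b)
{
    uint8_t lo    = (uint8_t)((uint8_t)a + (uint8_t)b);  /* low-byte add   */
    uint8_t carry = (uint8_t)(lo < (uint8_t)a);          /* carry out      */
    uint8_t hi    = (uint8_t)((a >> 8) + (b >> 8) + carry);
    return (uint16_t)(((uint16_t)hi << 8) | lo);
}
```

Both functions agree for all inputs; the point is only how much code each form costs on an 8-bit register file.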


yes, assembly is always faster, but most people don't like to write large assembly programs. However, in this case, it shouldn't be too difficult to write a simple assembly program to challenge the CPU. Just figure out how to read the clock on each system, so we can have a number to compare when it's done.

 

or:

 

How about this? It doesn't matter what language you use; it's fine if you keyed it in on your front panel switches. The only requirement is to do the same math calculations (and other CPU-stressing routines) on each machine. However you write it, that's OK. It must only perform the same functions as the program on the other machine.

 

Who would be faster then?


A big difference is that most older CPUs have little or no support for maths operations beyond simple binary stuff. And such operations form the basis for many benchmarks.

Even the way FP is expressed is different: the old gear mostly used a BCD mantissa with a binary exponent, while modern hardware uses binary for both.

 

Using a high-level language or even C won't help a lot - some might call OS routines which can be variable (e.g. the factory Atari 8-bit floating point in ROM is spectacularly inefficient and slow), and even the same branded compiler could have totally different algorithms on different machines, even if the target CPU is the same.

 

In fairness a benchmark to compare old to new should allow optimisations beyond what you might expect, e.g. larger floating point routines with table-based assistance for calculations.

 

Insofar as MIPS are concerned - a big grey area there.

Not only do different CPUs take different numbers of cycles per instruction, but there's also a big difference in just what can be done in a single instruction.

 

Even 6502 vs 68000, which aren't that far apart in years - the 68000 can move 60 bytes of data from one area of memory to another in a pair of instructions (MOVEM.L), where the 6502 would need a loop using about 32 times as many instruction executions.

 

MFLOPS is probably a "fairer" measure, but then you want to have the same problems being solved.

 

All that said, for a 6502 you can probably divide the clock speed by 10 to get a comparative measure of MIPS.

Divide probably by 100 or more for MFLOPS.
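The rule of thumb above, written out as a throwaway calculation. The divisors are the post's rough estimates, not measured figures:

```c
/* Rough throughput estimates following the rule of thumb above:
   MIPS   ~ clock / 10   (a 6502 averages several cycles per
                          instruction, plus load/store traffic)
   MFLOPS ~ clock / 100+ (software floating point costs on the
                          order of 100x more per operation)       */
double est_mips(double clock_mhz)   { return clock_mhz / 10.0; }
double est_mflops(double clock_mhz) { return clock_mhz / 100.0; }
```

On this estimate a 1.79 MHz 6502 comes out around 0.18 MIPS and well under 0.02 MFLOPS - which is about what the SIEVE-style results further down suggest.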


Quote:
You can use the OS routines (try FastChip) in cc65, and BDS-C has "The Incredible Superpowerful Floating Point Package".

 

SDCC is a cross compiler. I am trying to keep things as real as possible. That's why I didn't suggest Deep Blue C.

Can you use the OS routines by just declaring something as a float? It's still an apples-to-oranges comparison if you aren't using the same floating point format, number of digits and algorithm. The Apple II has had much faster trig operations implemented on it in recent years, and it was faster than the Atari to begin with.

You pretty much have to implement identical floating point results for a benchmark to be meaningful. Otherwise you might as well just perform benchmarks in BASIC.

 

I'm trying to understand how SDCC isn't "real". Most people use cross compilers and cross assemblers these days and even Atari used development systems with cross development tools back in the day.

Deep Blue C generates a P-Code like language that is interpreted. How would that be a fair comparison vs a compiler that generates native code?

If you want to go that route, you could use UCSD Pascal. Then at least you are running the same P-Code from the same compiler.

UCSD Pascal was available for the Apple II, Amiga, PC and CP/M. Source code is available and it could certainly run on an Atari.

FWIW, UCSD Pascal wasn't known for generating great code but it was highly portable.

<edit>

 

If what I read is correct, the "C-Code" generated by Deep Blue C is actually 8080 code. If so, the Z80 could run that directly.

Edited by JamesD

Quote:
yes, assembly is always faster, but most people don't like to write large assembly programs. However, in this case, it shouldn't be too difficult to write a simple assembly program to challenge the CPU. Just figure out how to read the clock on each system, so we can have a number to compare when it's done.

 

or:

 

How about this? It doesn't matter what language you use; it's fine if you keyed it in on your front panel switches. The only requirement is to do the same math calculations (and other CPU-stressing routines) on each machine. However you write it, that's OK. It must only perform the same functions as the program on the other machine.

 

Who would be faster then?

I think a common algorithm is in order right down to the floating point format.


Common algorithm - for machines with alike CPUs it's only really necessary to run it on one. E.g. run it on an Atari 8-bit and you can then fairly accurately say how quickly it'll run on an Apple II or C64: a simple calculation based on clock differences and the cycle-steal overheads of each machine.

 

Problems can arise in that certain methods will favour one CPU but penalize another.

That can occur both ways - in some cases one CPU might need multiple instructions to accomplish what another does in one. On the other hand, if a common algorithm dictates that smaller chunks of data are operated on at a time than the CPU could otherwise handle then that becomes a handicap.
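That "simple calculation" might look like the sketch below. The cycle-steal fractions you would plug in are illustrative placeholders here, not measured values for any particular machine:

```c
/* Estimate runtime on machine B from a measurement on machine A,
   for two machines with the same CPU: scale by effective clock,
   where effective clock = nominal clock * (1 - cycle-steal fraction)
   lost to video DMA and the like. */
double effective_clock(double nominal_mhz, double steal_fraction)
{
    return nominal_mhz * (1.0 - steal_fraction);
}

double estimate_time(double time_a_sec, double eff_clk_a, double eff_clk_b)
{
    /* Same work, same CPU: time scales inversely with effective clock. */
    return time_a_sec * eff_clk_a / eff_clk_b;
}
```

For example, with placeholder overheads of 10% for ANTIC DMA and 15% for VIC-II badlines, estimate_time(t, effective_clock(1.79, 0.10), effective_clock(1.02, 0.15)) would scale an Atari 8-bit measurement to a C64-like machine.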


Quote:
inb4 BogoMIPS

I'll have to look that one up.

 

 

Several small benchmarks that you might actually do on an 8-bit would get my vote, and each could receive some sort of score.

 

Some possibilities:

Translation, rotation and scaling of some data points.

Drawing a Bresenham line in a 2-bit or 4-bit-per-pixel bitmap. The computer doesn't even need to support the video mode; it's about the same loops and memory-manipulation operations. A 256x192 and/or 320x200 bitmap would be reasonable since both were common and should fit in memory.

Maybe draw a 3D object on the previously mentioned bitmap: rotate it, scale it, and move it around, without worrying about frame rates.

Blitting some 2D objects onto a bitmap, doing some logical operations with the background such as masking, etc... Use 8x8 and 16x16 sprites. No hardware sprites; it's about the CPUs.

Draw some text on a graphics screen.

Some simple string manipulation, searching and sorting.

Sieve of Eratosthenes

Etc...
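Of those suggestions, the sieve is the one with a long benchmarking pedigree: the classic BYTE-magazine form uses an 8191-byte flag array and finds 1899 primes, which fits comfortably in 64K. A C sketch of that version:

```c
#include <string.h>

#define SIZE 8190   /* flag-array bound used by the classic BYTE sieve */

static char flags[SIZE + 1];

/* Classic sieve benchmark kernel; returns the number of primes
   found, which is 1899 for this array size. */
int sieve(void)
{
    int i, k, count = 0;
    memset(flags, 1, sizeof flags);
    for (i = 0; i <= SIZE; i++) {
        if (flags[i]) {
            int prime = i + i + 3;            /* index i maps to odd 2i+3 */
            for (k = i + prime; k <= SIZE; k += prime)
                flags[k] = 0;                 /* strike out multiples */
            count++;
        }
    }
    return count;
}
```

A benchmark run would typically time 10 or more iterations of sieve() and report seconds per pass, as in the UCSD Pascal results quoted later in the thread.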


Quote:
Common algorithm - for machines with alike CPUs it's only really necessary to run it on one. E.g. run it on an Atari 8-bit and you can then fairly accurately say how quickly it'll run on an Apple II or C64: a simple calculation based on clock differences and the cycle-steal overheads of each machine.

 

Problems can arise in that certain methods will favour one CPU but penalize another.

That can occur both ways - in some cases one CPU might need multiple instructions to accomplish what another does in one. On the other hand, if a common algorithm dictates that smaller chunks of data are operated on at a time than the CPU could otherwise handle then that becomes a handicap.

While I agree for the most part, this won't demonstrate the penalty for things like the Apple II's horrid graphics setup, using display lists on the Atari, etc...

It also doesn't show flaws like the lack of a vertical blank interrupt (Apple II, Oric, etc...)

It is purely CPU efficiency on that hardware.


FWIW, I found the following SIEVE results embedded in the source to a UCSD Pascal version of the benchmark:

Sage II        IV.1    57      (68000 at 8 MHz)
WD uEngine     III.0   59      (fillchar is so slow on uE)
LSI-11/23      IV.01   92-122  (depends on memory speed)
LSI-11/23      II.0    105     (98 seconds under IV.01)
LSI-11/23      IV.1    107     (non-extended memory)
LSI-11/23      IV.1    128     (extended memory)
NEC APC        IV.1    144     (8086 at 4.9 MHz, extended memory)
JONOS          IV.03?  162     (pretty good for a 4 MHz Z-80A)
NorthStar      I.5     183     (Z-80 at 4 MHz)
OSI C8P-DF     II.0?   197     (6502 at 2 MHz)
H-89           II.0    200     (4 MHz Z-80A)
LSI-11/2       IV.0    202
IBM PC         IV.03   203     (4.77 MHz 8088)
LSI-11/2       II.0    220
Apple ][       II.1    390     (1 MHz 6502)
H-89           II.0    455     (2 MHz Z-80)

Quote:
I think this is way too complicated. You ask the CPU to solve a problem. Use the most efficient means possible. Compare the end results.

None of the things I suggested are that complex and they were just suggestions.

I've actually written code to do everything on that list in the past except for the SIEVE which has source code published all over the place.

 

Modern benchmarks are large and complicated to keep people from coming up with ways to get good results on benchmarks that aren't reflected in real world apps.

I merely tried to come up with a variety of suggested tests that could give an overall speed comparison that wouldn't favor one CPU over another and would discourage cheating.

 

http://www.programming-techniques.com/2012/03/3d-transformation-translation-rotation.html

https://en.wikipedia.org/wiki/Bresenham%27s_line_algorithm

https://en.wikipedia.org/wiki/Sieve_of_Eratosthenes

http://www.zentut.com/c-tutorial/c-quicksort-algorithm/

Blitting isn't exactly difficult.

There is existing graphics text code out there for several processors.

String manipulation isn't exactly rocket science.

 

Why is this too complicated?

Edited by JamesD

This is dependent on how the benchmark routines are coded. Using the same instruction set and routine (on the same CPU architecture) would reveal any efficiency in the number of cycles each opcode takes, etc. If the benchmark routine for a different CPU architecture is written differently, however, it may not be optimal and the benchmark results would be flawed.

