
Benchmarking...


vol


I have run my π calculator on various Atari models.  The results are the seconds required to compute 3000 digits of π.  The calculator program for TOS is attached. It has two variants:
1) pure 68000 code, which works on every model - PI-ST.TOS;
2) 68030 code, which requires a 68020 or later 68k processor - PI-ST30.TOS.
I use the Hatari v1.4 emulator.  The results for the Amiga 1200 were taken from real hardware; the results for the Amiga 500 were taken from FS-UAE 3.0.5.

                          68000     68020/30
ST 68000@8MHz            139.55         -
Mega STE 68000@16MHz      68.95         -
TT 68030@32MHz            11.36      6.17
Falcon 68030@16MHz        22.90     12.47
Amiga 500 68000@7.1MHz   167.00         -
Amiga 500 Fast RAM       164.00         -
Amiga 1200 68020@14.2MHz  43.00     37.00
Amiga 1200 Fast RAM         n/d     34.00

Some results for the Atari look rather implausible to me.  It is slightly odd that the Mega STE is more than twice as fast as the ST.  IMHO it should be exactly twice as fast.  Maybe it is because the Mega STE has a cache which is not affected by video?

However, the main issue is the 68030 code results.  They look completely wrong to me.  How can the Falcon be almost 3 times faster than the Amiga 1200?!  I am sure that the 68030 timing emulation in Hatari is very inaccurate. :(  I don't know of an emulator that is better than Hatari. :( Please help me get correct results.  If anyone has a 68030-based Atari, please run PI-ST.TOS and PI-ST30.TOS for me.  Many thanks in advance.
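For reference, the two ratios questioned above can be recomputed directly from the table. This is only a small sketch: the timings are copied from the table, while the dictionary layout and the `speedup` helper are illustrative names, not part of the attached program.

```python
# Timings in seconds for 3000 digits, copied from the table above.
timings = {
    "ST 68000@8MHz": {"68000": 139.55},
    "Mega STE 68000@16MHz": {"68000": 68.95},
    "Falcon 68030@16MHz": {"68000": 22.90, "68020/30": 12.47},
    "Amiga 1200 68020@14.2MHz": {"68000": 43.00, "68020/30": 37.00},
}


def speedup(slow_seconds, fast_seconds):
    """How many times faster the second run is than the first."""
    return slow_seconds / fast_seconds


mega_ste_vs_st = speedup(
    timings["ST 68000@8MHz"]["68000"],
    timings["Mega STE 68000@16MHz"]["68000"],
)
falcon_vs_a1200 = speedup(
    timings["Amiga 1200 68020@14.2MHz"]["68020/30"],
    timings["Falcon 68030@16MHz"]["68020/30"],
)

print(f"Mega STE vs ST (68000 code): {mega_ste_vs_st:.2f}x")
print(f"Falcon vs Amiga 1200 (68020/30 code): {falcon_vs_a1200:.2f}x")
```

The Mega STE comes out at slightly more than 2x and the Falcon at almost 3x, which are the two figures that look suspicious.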

One more question.  Are there ways to increase the processor performance on the ST/STE/TT/Falcon?  For example, the Commodore 64 is faster when its screen is blank.

pi-st.zip


Indeed, Hatari results should be taken with reserve.  And I can add that so far nobody has emulated the Mega STE with its cache accurately. Actually, it might never happen, because it is not really important: there is no software that needs exact timing of the Mega STE at 16 MHz. Furthermore, it is probably hardly possible without knowing exactly how that cache works, how to reset it to an empty initial state, etc.

Then, no, the Mega STE at 16 MHz is not 2x faster than at 8 MHz, because the CPU runs at 16 MHz only when accessing the cache. If an address is not in the cache, the CPU must access the slower main RAM, so it effectively runs at 8 MHz. So it is on average about 60% faster, and that depends on the software being run.
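To make the cache argument concrete, here is a deliberately crude model. It assumes that a cache hit costs one time unit (16 MHz speed) and a miss costs two (8 MHz speed); real Mega STE timing is certainly more complicated, and the function name and the model itself are assumptions made only for illustration.

```python
def mega_ste_speedup(hit_rate):
    """Speedup over a plain 8 MHz ST in a crude two-speed model.

    Assumption: a cache hit costs 1 time unit (16 MHz speed), a miss
    costs 2 units (8 MHz speed), and nothing else is modelled.
    """
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    time_at_8mhz = 2.0
    time_with_cache = hit_rate * 1.0 + (1.0 - hit_rate) * 2.0
    return time_at_8mhz / time_with_cache


for hit_rate in (0.0, 0.5, 0.75, 1.0):
    print(f"hit rate {hit_rate:.2f}: {mega_ste_speedup(hit_rate):.2f}x")
```

In this model "about 60% faster" corresponds to a hit rate of 75%, and no hit rate gives more than 2x.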

 

One of the best things about the Atari ST is that it does not need to slow down the CPU (or rather, insert wait states) when video accesses RAM, because the RAM is fast enough to serve both. There are 4 74LSxxx chips between the RAM and the main bus that synchronize the two when both try to access RAM at once; the data is then held in them for 250 ns.

Yeah, the RAM cycle time is 250 ns, while the CPU and video cycle time is 500 ns for 16 bits. Just to check my memory and show that it gives proper scanline times, let's see: one line in color modes is 64 µs = 128 x 500 ns, which is 256 RAM cycles of 250 ns, and at 2 bytes per cycle (16-bit bus) that is 512 bytes of transfer capacity per line. Now, in medium res we have 640 px at 2 bits/px, so one line is 160 bytes. So video needs less than half of the RAM bandwidth, and it is exactly the same in the other color mode (low res, 320 px at 4 bits/px). It is a little more in monochrome, because of the smaller borders.
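The scanline arithmetic above can be checked in a few lines. The numbers (250 ns RAM cycle, 64 µs line, 16-bit bus, 640 px at 2 bits/px) come from the post; the constant and function names are just illustrative.

```python
RAM_CYCLE_NS = 250      # RAM cycle time quoted in the post
BUS_BYTES = 2           # 16-bit data bus
LINE_TIME_NS = 64_000   # one color-mode scanline: 64 microseconds

# Total RAM capacity during one scanline, shared by CPU and video.
ram_cycles_per_line = LINE_TIME_NS // RAM_CYCLE_NS
bytes_per_line_total = ram_cycles_per_line * BUS_BYTES


def video_bytes_per_line(pixels, bits_per_pixel):
    """Bytes the video hardware has to fetch for one visible line."""
    return pixels * bits_per_pixel // 8


medium_res = video_bytes_per_line(640, 2)
low_res = video_bytes_per_line(320, 4)
video_share = medium_res / bytes_per_line_total

print(f"RAM capacity per line: {bytes_per_line_total} bytes")
print(f"medium res: {medium_res} bytes, low res: {low_res} bytes")
print(f"video share of RAM bandwidth: {video_share:.1%}")
```

Both color modes need 160 bytes per line out of 512, so video uses well under half of the available RAM cycles.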

Of course, you can increase CPU performance in different ways, but the CPU clock must then be more than 8 MHz.

The Falcon and TT are a different story. Despite their faster RAM, higher video modes need much more bandwidth for video generation, so they slow down the CPU, and that's why software runs faster from so-called fast RAM. In lower screen modes that is not the case: there are no wait states. (I guess there is something like that on the Amiga too.)

Yes, in the case of the Falcon it matters in which video mode you run tests like this.


Okay, there is definitely something screwy going on.

I have a Pak 68/3 equipped Atari STacy. The Pak is an accelerator board, 68030, running at 40 MHz on my STacy, with a 40 MHz FPU (68882). It's TOS v3.06 with 4 megs of ST-RAM (no FAST/TT/ALT RAM).

A few questions first... Your program did not give me a result. It simply dropped back to the desktop when done. I timed it using a stopwatch to get results. I'm not sure how totally accurate that is.

Does your software support a math chip? (FPU). That could make a huge difference.

Anyway, running it here on the setup above, I timed it at 16.5 seconds.

I don't see any possible way a stock Falcon could beat that score.

I also find it hard to believe that a 32 MHz TT beat my score.

Using Chris Swinson's GemBench 6, my STacy is registered as faster than a stock Falcon or TT030.

Hope this helps.  :)

 


15 hours ago, ParanoidLittleMan said:

Of course, you can inc. CPU performance in different ways, but CPU clock must then be more than 8 MHz.

Yes, in case of Falcon it matters in which video mode you run tests like this.

Thank you.  Would you like to provide some hints for me?  How can I increase the CPU performance?

I run the tests under EmuCON, and IMHO it uses quite a simple video mode which doesn't slow down the CPU.

7 hours ago, DarkLord said:

Does your software support a math chip? (FPU). That could make a huge difference.

Anyway, running it here on the setup above, I timed it at 16.5 seconds.

Thank you very much!  Your results confirm my point that Hatari is very inaccurate for timings of the top Atari models.  Please run the pi calculator from EmuCON or another command-line interface: the program prints the digits of pi and then prints the calculation time.  The calculator code uses only integers; no FPU is required.  Which program took 16.5 s, PI-ST or PI-ST30?

Your board is very interesting to me.  It would be great if you could provide me with results for 100, 1000, and 3000 digits.  I can then put them in my research table.


I don't think that this software uses the FPU. It is a math calculation meant to test raw CPU speed, and I have seen various benchmarks done in the same fashion for various computers.

Looking at the results from the first post, they look realistic where they were obtained on real hardware. The Falcon, the TT, and the Mega STE at 16 MHz are not emulated so accurately by Hatari, and not emulated at all by other software, as far as I know.  So we need tests on real hardware; I will do it on a Mega STE soon.

The difference between the ST and the Amiga 500 is pretty realistic considering that the ST runs at 8 MHz and the Amiga at 7.09 MHz, if I remember correctly. But maybe it is worth running the test in different screen resolutions on the Amiga: in higher video modes, when not running from Fast RAM, it should be significantly slower. And if you want, you can do the same on the ST, although, as I said, all video modes work without CPU slowdown on the ST and STE. But I did it on the Falcon in the past, not with this program, and it was slower, especially in so-called true color mode.
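As a quick check of the ST versus Amiga 500 comparison, the clock ratio can be set against the timing ratio from the first post's table. The clock values (8 MHz and 7.09 MHz) are the ones quoted above, and the Amiga 500 timing is the FS-UAE figure, not real hardware; the variable names are illustrative.

```python
# Clock speeds as quoted in the post above.
st_clock_mhz = 8.0
amiga_clock_mhz = 7.09

# 68000 timings for 3000 digits from the table in the first post.
st_seconds = 139.55
amiga_seconds = 167.00   # Amiga 500 under FS-UAE, not real hardware

clock_ratio = st_clock_mhz / amiga_clock_mhz
time_ratio = amiga_seconds / st_seconds

print(f"clock ratio ST/Amiga:  {clock_ratio:.3f}")
print(f"time ratio Amiga/ST:   {time_ratio:.3f}")
print(f"gap not explained by clock alone: {time_ratio / clock_ratio - 1:.1%}")
```

The clock difference accounts for about 13% of the roughly 20% difference in run time; the remainder would have to come from something else, such as memory contention.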


18 hours ago, vol said:

One more question.  Are there ways to increase the processor performance on the ST/STE/TT/Falcon?  For example, the Commodore 64 is faster when its screen is blank.

Hello,

 

For your last question: you can slightly increase the speed on a TT by setting the flags that force a program to load, run and allocate from TT-RAM (faster RAM that is usable only by the CPU and is not accessed by the shifter, DMA sound, etc.). There is a CPX for that.
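As a rough illustration of what such flags look like on disk, the sketch below patches the flags field of a GEMDOS program header. It assumes the usual 28-byte header layout (magic 0x601A, flags long word at offset 22) and the conventional bit meanings (bit 0 fastload, bit 1 load into TT-RAM, bit 2 Malloc from TT-RAM); the helper names are hypothetical, the demo header is synthetic, and the CPX mentioned above may do things differently.

```python
import struct

PRG_MAGIC = 0x601A       # first word of a GEMDOS executable (assumed layout)
PRGFLAGS_OFFSET = 22     # 32-bit big-endian flags field (assumed offset)
PF_FASTLOAD = 0x01       # only clear the BSS on load
PF_TTRAMLOAD = 0x02      # program may be loaded into TT-RAM
PF_TTRAMMEM = 0x04       # Malloc() requests may be served from TT-RAM


def get_prgflags(header: bytes) -> int:
    """Return the flags long word from a program header."""
    (magic,) = struct.unpack_from(">H", header, 0)
    if magic != PRG_MAGIC:
        raise ValueError("not a GEMDOS executable header")
    (flags,) = struct.unpack_from(">L", header, PRGFLAGS_OFFSET)
    return flags


def set_ttram_flags(header: bytes) -> bytes:
    """Return a copy of the header with the three TT-RAM related bits set."""
    flags = get_prgflags(header) | PF_FASTLOAD | PF_TTRAMLOAD | PF_TTRAMMEM
    patched = bytearray(header)
    struct.pack_into(">L", patched, PRGFLAGS_OFFSET, flags)
    return bytes(patched)


# Synthetic 28-byte header: magic, text/data/bss/symbol sizes, reserved,
# flags, absflag. A real file would be read from disk instead.
demo_header = struct.pack(">HLLLLLLH", PRG_MAGIC, 0, 0, 0, 0, 0, 0, 0)
patched_header = set_ttram_flags(demo_header)
print(hex(get_prgflags(patched_header)))
```

Only the flags field changes; the rest of the header is left untouched.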

 

Otherwise, I ran some tests for 3000 digits on my machines and the results are surprising (I set the flags on the TT!):

 

TT 32MHz + TTRam 

With PI-ST.TOS : 15,30 sec

With PI-ST30.TOS: 17,64 sec

 

The 030 version is slower than the standard one...

 

MegaSTE 16MHz:

With PI-ST.TOS : 70,66 sec.

 

STE 8MHz:

With PI-ST.TOS : 139,32 sec

 

Have you got the source code for the TT version? I'd like to understand why it can be slower than the ST version...

 

Guillaume.

 

 


I got practically the same results with the Mega STE as moulinaie.

Considering the file sizes, it is coded in ASM, I guess.

And a couple of suggestions: it should not do any screen writes, since that can make a small difference between blitter off and blitter on; text printed on screen goes via the blitter if it is activated in Desktop Options.

There should be a pause before exit, so one can read the result of the test. I needed to look in the code to realize that it measures the time itself at all, because it just immediately returns to the Desktop and clears the screen. Sorry, not everyone is using MagiC or some TOS/Desktop extension. And maybe add a new line before the time is shown, and "seconds" after it?


4 hours ago, moulinaie said:

TT 32MHz + TTRam
With PI-ST.TOS : 15,30 sec
With PI-ST30.TOS: 17,64 sec

MegaSTE 16MHz:
With PI-ST.TOS : 70,66 sec.

STE 8MHz:
With PI-ST.TOS : 139,32 sec

Have you got the source code for the TT version? I'd like to understand why can it be slower than the ST version...

Thank you very much for your results.  I always suspected that my pi-spigot implementation might be slower on the 68020/30 than on the 68000, but all the emulators I use (FS-UAE, Hatari) show that the 68020/30 version is faster.  :( For Hatari it is even much faster.  A man reported results from his Amiga 1200 to me, but I thought he had just mixed up the programs.
Your results show that Hatari is quite accurate for the Mega STE: the emulator is only about 2% faster.  For the ST/STE it is nearly cycle-exact; the difference is only about 0.1%!  For the TT, Hatari is approximately 35% faster than real hardware on PI-ST, and 186% (!) faster on PI-ST30.  It seems that DIVUL takes far fewer cycles in the emulator than on hardware.  Would you like to run PI-ST without TT-RAM?  It would be very interesting to know the difference.  It would also be good to get a screenshot.
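The percentages above can be reproduced from the numbers already posted in this thread. In this sketch the Hatari timings come from the first post and the real-hardware timings from moulinaie's post; note that the real TT figures were measured with TT-RAM, and the ST row compares an emulated ST with a real STE. The names are illustrative.

```python
# (Hatari seconds, real hardware seconds) for 3000 digits.
pairs = {
    "Mega STE, PI-ST": (68.95, 70.66),
    "ST/STE, PI-ST": (139.55, 139.32),
    "TT, PI-ST": (11.36, 15.30),
    "TT, PI-ST30": (6.17, 17.64),
}


def emulator_speed_excess(hatari_seconds, real_seconds):
    """Percent by which the emulator is faster than real hardware."""
    return (real_seconds / hatari_seconds - 1.0) * 100.0


excess = {name: emulator_speed_excess(h, r) for name, (h, r) in pairs.items()}
for name, percent in excess.items():
    print(f"{name:16s} {percent:+7.1f}%")
```

A negative value means the emulator is slightly slower than the real machine, which is the case for the ST/STE row.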
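The 68000 and 68020/30 versions are in the same file.  I use conditional assembly; check the __VASM variable.  The only difference between the versions is the 32-bit division implementation.  This is the 68020/30 code:

     divul.l d4,d7:d6   ; 32/32 unsigned: quotient -> d6, remainder -> d7
     move d7,(a3)       ; store the 16-bit remainder

And this is its equivalent for the 68000 (it is faster):

     moveq.l #0,d7      ; clear d7 (high word of the quotient)
     divu d4,d6         ; 32/16: d6 = remainder:quotient, V set on overflow
     bvc .div32no\@     ; no overflow (the usual case): skip the long division

     swap d6            ; overflow: d6 is unchanged, bring its high word down
     move d6,d7         ; d7 = high word of the dividend
     divu d4,d7         ; d7 = remainder:quotient of the high word
     swap d7            ; d7 = high quotient word:remainder
     move d7,d6         ; d6 = low dividend word:remainder
     swap d6            ; d6 = remainder:low dividend word
     divu d4,d6         ; d6 = final remainder:low quotient word
.div32no\@
     move d6,d7         ; d7 = 32-bit quotient (high word from above or 0)
     clr d6             ; clear the low quotient word
     swap d6            ; d6 = remainder
     move d6,(a3)       ; store the 16-bit remainder

The 68000 code is faster because the branch to .div32no\@ is almost always taken in the pi-spigot algorithm.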
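For readers who don't read 68k assembly, the 68000 sequence above can be modelled in a few lines of Python. This is only a sketch of the arithmetic, not of the timing: `divu_w` imitates a 32/16 DIVU that reports overflow when the quotient does not fit in 16 bits, and `div32_by_16` follows the same fast path and two-step fallback. Both function names are made up for this illustration.

```python
def divu_w(dividend32, divisor16):
    """Model of a 68000 DIVU.W: returns (quotient, remainder, overflow).

    On overflow the real instruction leaves the destination unchanged and
    sets the V flag; here that is reported as overflow=True.
    """
    quotient, remainder = divmod(dividend32 & 0xFFFFFFFF, divisor16 & 0xFFFF)
    if quotient > 0xFFFF:
        return None, None, True
    return quotient, remainder, False


def div32_by_16(dividend32, divisor16):
    """32-bit by 16-bit unsigned division built from DIVU.W steps."""
    quotient, remainder, overflow = divu_w(dividend32, divisor16)
    if not overflow:
        # Fast path: the branch that is "almost always" taken.
        return quotient, remainder
    # Slow path: divide the high word first, then the low word with the
    # first remainder on top of it (schoolbook long division in base 65536).
    high, low = dividend32 >> 16, dividend32 & 0xFFFF
    q_high, r_high, _ = divu_w(high, divisor16)
    q_low, remainder, _ = divu_w((r_high << 16) | low, divisor16)
    return (q_high << 16) | q_low, remainder


print(div32_by_16(100000, 7))        # fast path
print(div32_by_16(0xFFFFFFFF, 3))    # slow path
```

The second DIVU in the slow path cannot overflow, because the first remainder is always smaller than the divisor. A zero divisor raises `ZeroDivisionError` here, whereas the real CPU would take a trap.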

 

1 hour ago, ParanoidLittleMan said:


There should be stop before exit, so can read result of test. I needed to look in code to realize that it at all self measures time, because  it just immediately returns to Desktop and deletes screen. Sorry, not everyone using MagiC or some TOS/Desktop extension. And maybe add new line before time show, "seconds" after it ?

Thank you.  Sorry, I thought a command-line interface was more common on TOS.  I have just made my programs more user-friendly: they now wait for a key press before exiting.  The updated programs are attached, and my git repo is also updated.

pi-st-2.zip


2 hours ago, ParanoidLittleMan said:

And couple suggestions:  should not do any screen write, that can make little difference between blitter off and blitter on - and text print on screen goes via blitter if it is activated in Desktop Options.

 

Sorry, I still can't find how to activate the blitter.  I checked Set preferences... and Desktop configuration... and found nothing about the blitter there. :(


I have improved the PI-ST30 code.  Now it is actually faster than the PI-ST code, but the speed difference is almost invisible.  IMHO it is impossible to get a visible advantage from the additional 68020/30 instructions in this case. :( The new programs are attached. The PI-ST code was not changed.

37 minutes ago, ParanoidLittleMan said:

Blitter field below Print Screen will appear only when there is blitter in machine, or it is activated in emulator.

And min TOS 1.02 .

Thank you.  How could I miss it?! :( However I can't notice any speed difference for the pi-calculator.

pi-st-3.zip


30 minutes ago, vol said:

I have improved PI-ST30 code.  Now it is actually faster than code for PI-ST.  But the speed difference is almost invisible.  IMHO it is impossible to get a visible advantage from the 68020/30 additional instructions in this case. :( The new programs are attached. Code for the PI-ST was not changed.


Hi,

 

I wrote a similar program for the 68030 some years ago.

It doesn't display anything on screen; it just computes the digits and then saves them to a file.

It displays the calculation time, the saving time and the total time.

 

You'll find it here, it is SUPERPI.TOS :

 

https://gtello.pagesperso-orange.fr/kronos_soft.htm

 

You'll be able to compare the results of my page with what HATARI gives you (the results given are for 16384 digits)

 

Guillaume.


Okay, I downloaded the latest version of Pi-ST and re-ran it with 100, 1000, and 3000 as the settings.

Things are still not what they should be though. Scores are pretty much identical in either mode, with the 68k version being slightly faster. That's not right.  :)

68000 @ 100, 1000, and 3000:

[Screenshots: 68000-100.JPG, 68000-1000.JPG, 68000-3000.JPG]

Now, the 68030 version at 100, 1000, and 3000:

[Screenshots: 68030-100.JPG, 68030-1000.JPG, 68030-3000.JPG]

Sorry for the low quality of the pictures but they should be good enough for you to evaluate the results.

Hope this helps.

 

 

 

 

 

 


On 4/26/2021 at 9:11 PM, moulinaie said:

I have written a similar program for the 68030 some years ago.

You'll find it here, it is SUPERPI.TOS :

https://gtello.pagesperso-orange.fr/kronos_soft.htm

Thank you very much.  Your programs and results are very interesting.  Your code uses a different algorithm.  The pi-spigot is simpler but of course slower.  IMHO the pi-spigot is the simplest algorithm for computing pi, so it is quite suitable for implementation on any system, even an 8-bit one. :) It allows the use of 16-bit numbers, which often simplifies calculations but also limits the number of digits to 9400.  My implementation of the pi-spigot prints the digits on screen because this usually takes less than 1% of the overall time for 3000 digits.  Some of my table entries have results for cases when the output was redirected to a file, which gives results similar to yours.  BTW, I started my project in 2015, 3 years after you finished yours.
Hatari (the TT without TT-RAM) executes SP16K in 33.58 seconds, while your table (the TT without TT-RAM) lists 249.05 seconds, so this emulator is very, very inaccurate. :(
Your table shows that TT-RAM makes a system about 10% faster, but that depends on the algorithm used.  So it would still be interesting to get results from your system with TT-RAM disabled.  Indeed, it would be very interesting to get results from some of the systems you listed in your tables. ;) Do you have any explanation for why the Atari Stacy PAK68/3 utilizes its 68030 less efficiently than the TT?  Is it possible to download your program for the x86?
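For anyone who has not seen a pi-spigot, here is a minimal Python sketch of the classic compact variant that emits four decimal digits per pass. It is an assumption that PI-ST works along these lines; the real program is 68k assembly and is not reproduced here, and the function name is made up for this example.

```python
def pi_spigot(digits):
    """Return the first `digits` decimal digits of pi as a string.

    `digits` is rounded down to a multiple of 4. This is the well-known
    compact spigot: 14 array cells per group of 4 digits.
    """
    groups = digits // 4
    cells = groups * 14
    remainders = [2000] * cells + [0]   # index 0 is never used
    carry = 0
    out = []
    c = cells
    while c > 0:
        d = 0
        g = 2 * c
        b = c
        while True:
            d += remainders[b] * 10000
            g -= 1                      # g is now 2*b - 1
            remainders[b] = d % g
            d //= g
            g -= 1
            b -= 1
            if b == 0:
                break
            d *= b
        out.append("%04d" % (carry + d // 10000))
        carry = d % 10000
        c -= 14
    return "".join(out)


print(pi_spigot(64))
```

The whole computation is integer multiplication plus one division with remainder per array cell, which is why the speed of the division sequence dominates the run time. This simple form does not handle a carry out of a four-digit group, so it is meant for illustration rather than for very long runs.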
 


I have a problem with output redirection.  :( I don't understand why my programs do not redirect their output to a file when I use the > sign.  Standard utilities work quite well; for instance, dir >file works fine.  But when I type pi-st >file, it doesn't work: it just creates an empty file. :(  I use standard GEMDOS functions for screen output in my programs...  GEMDOS functions correspond directly to MS-DOS functions, which are always easy to redirect using the > sign.  Please help me with this issue.  Thank you.


9 hours ago, vol said:

 ;) Do you have any explanation why the Atari Stacy PAK68 / 3 utilizes its 68030 less efficiently that the TT?  Is it possible to download your program for the x86? 
 

 

Not so sure about it being less efficient? According to Chris Swinson's GemBench 6, a well known and recognized benchmarking standard, the Pak 68/3 board in my STacy is faster than a TT (as it should be).

I ran a test just now and even with the TT with Alt (FAST) RAM, the Pak 68/3 wins except in one category and I'm assuming that's because the TT has better RAM access (is it 32 bit in ALT-RAM?). Perhaps some knowledgeable TT owners can tell us.  :)

[Screenshot: STacyvsTT.JPG]


Here are the stats running on my Atari TT030 @ 32Mhz...

 

Pi-TOSv3:   0.06, 2.09, 16.61 (100, 1000, 3000 places)

Pi-TOS30v3: 0.06, 2.09, 16.61 (100, 1000, 3000 places)

SuperPi:    0.01, 0.85, 7.51 (100, 1000, 3000 places)

 

There is a CaTTamaran installed, which is supposed to run the 030 at 48 MHz, but I got the same numbers, so it must be malfunctioning or something is physically out of place.


13 hours ago, vol said:

My implementation of pi-spigot prints digits on screen because it takes usually less than 1% of overall time for 3000 digits.
Hatari (the TT without TTRAM) executes SP16K for 33.58 seconds, your table (the TT  without TTRAM) contains 249.05 seconds - so this emu is very-very inaccurate. :(
Do you have any explanation why the Atari Stacy PAK68 / 3 utilizes its 68030 less efficiently that the TT?  Is it possible to download your program for the x86?
 

Hi,

The time consumed by the display depends on the size of the screen. On my MegaSTE, as I have a graphics card, the 3000 digits fit on the screen without scrolling.

But on the STE, even in ST-HIGH, the screen scrolls several times to display it all.

Yes, HATARI doesn't emulate the timings of the 68030. Maybe it's hard to do, as there are caches (data/code). But I'm sure that division is far faster on an x86, so as I use the DIV instruction a lot, my program is sped up.

If the PAK68/3 doesn't come with fast RAM, that surely explains why the TT is still faster, even at 32 MHz. My SuperPI makes a lot of RAM accesses.

 

Here you'll find the Windows version of SuperPI

 

Guillaume.

Super Pi.zip


2 hours ago, MasterMotorola said:


 

There is a CaTTamaran installed which is supposed to run the 030 at 48MHz but I got the same numbers so it much be malfunctioning or something is physically out of place. ?

There is a DIP switch that must be set to indicate that the accelerator is installed. If it is not set, the CaTTamaran software is not run and you don't get the 48 MHz speed.

Guillaume


On 4/27/2021 at 1:18 AM, DarkLord said:

Okay, I downloaded the latest version of Pi-ST and re-ran it with 100, 1000, and 3000 as the settings.

 


Thank you very much!  The table is updated.  The entry for the Atari Stacy Pak68/3 is now at position #8 in this table.  One of your screenshots is part of this entry.  Indeed, a bigger screen area would be a bit better there. ;) I trust the people who send me results; I only ask them for a screenshot because it is good illustrative material for the table. ;)

I am sure that your result showing the 68000 version as faster than the 68030 version was a kind of random accident.  The 68030 version is faster, but only by fractions of a percent. :)

 

6 hours ago, DarkLord said:

 

Not so sure about it being less efficient? According to Chris Swinson's GemBench 6, a well known and recognized benchmarking standard, the Pak 68/3 board in my STacy is faster than a TT (as it should be).

Indeed, your Pak68/3 is faster, but it uses a 40 MHz clock while the TT has 32 MHz.  The Pak68/3 utilizes its CPU less efficiently than the TT because it is less than 25% faster.  I took 25% because 40/32 = 1.25.  Maybe it is because the Pak68/3 doesn't have fast RAM.  This is confirmed by the results of MasterMotorola, whose TT doesn't have fast RAM and is exactly 25% slower than the Pak68/3.  However, moulinaie's results for his SuperPI show that the Pak68/3 @ 50 MHz is only slightly (1%) faster than the Atari TT without fast RAM...


4 hours ago, MasterMotorola said:

Here are the stats running on my Atari TT030 @ 32Mhz...

 


Thank you very much.  The table is updated.  Your data is in the 11th row there.  Sorry I don't have a proper screenshot to add for this entry. :(
