
Benchmarking...


vol


A stock 32 mhz TT030 is not faster than a Pak 68/3 board running at 40mhz.

 

It would be even slower compared to a 50mhz Pak board.

 

A 32 mhz TT030 equipped with ALT (Fast or TT) RAM would be closer.

 

GemBench 6 clearly shows that a Pak 68/3 board @ 32 mhz with no ALT (Fast or TT) RAM

is, overall, 161 percent faster, and that's vs a 32 mhz TT030 with ALT (Fast or TT) RAM.

 

All my tests are done with TOS v3.06, 4 megs of ST RAM, no ALT (Fast or TT) RAM.

 

It would be interesting to find someone with a 32 mhz Pak 68/3 board with no ALT (Fast or TT) RAM,

and check their results against a stock 32 mhz TT030 with no ALT (Fast or TT) RAM.

 

HTHs.

 

 


1 hour ago, DarkLord said:

So I downloaded and ran Super Pi on my STacy. I honestly think I did this before

when it was first released.  :)

 

100 places - 0.01, 1000 places - 0.78, 3000 places - 6.85.

 

Here are the screenshots:

 

 

Yes I think you did!

Your result is on my page, a line with:

Stacy PAK68/3 68030 50 MHz TOS ST Ram

4 min 5.77 s

 

for the calculation of 16384 digits. Do you still have the same result?

 

Guillaume.


6 hours ago, moulinaie said:

Hello,

 

I made the tests again without the TT RAM (TT 32MHz, ST RAM):

pi-st.tos : 15.97 sec

pi-st30.tos : 18.95 sec

 

Guillaume.

Thank you. It seems that you ran PI-ST v1 or v2.  However, the PI-ST timings should be the same; only PI-ST30 should become faster.  MasterMotorola's result is slightly slower, 16.61 s. 

Now I only need the Falcon results and maybe several screenshots. ;) 


8 hours ago, moulinaie said:

Yes I think you did!

Your result is on my page, a line with:

Stacy PAK68/3 68030 50 MHz TOS ST Ram

4 min 5.77 s

 

for the calculation of 16384 digits. Do you still have the same result?

 

Guillaume.

 

I'll have to run it at 16384 again to see. Will do that for the next post.

 

If that's my STacy, then you probably need to correct something. I'm not running at 50 mhz but at 40 mhz. It is TOS v3.06 and it's 4 megs of ST RAM.

 

I'll report back once I re-run that. Thanks!  :)

 


9 hours ago, vol said:

Thank you. It seems that you ran PI-ST v1 or v2.  However, the PI-ST timings should be the same; only PI-ST30 should become faster.  MasterMotorola's result is slightly slower, 16.61 s. 

Now I only need the Falcon results and maybe several screenshots. ;) 

Yes, you're right! I swapped to Version 3 and here are the results:

(What is strange is that the versions for the 68000 and for the 68030 have exactly the same timings! See the screenshots)

 

FOR THE MEGA STE:

16 MHz + cache : 70.72

8 MHz, no cache : 140.39

megaste8.jpg

mste-16.jpg

Edited by moulinaie

12 hours ago, DarkLord said:

So I downloaded and ran Super Pi on my STacy. I honestly think I did this before

when it was first released.  :)

 

100 places - 0.01, 1000 places - 0.78, 3000 places - 6.85.

 

Here are the screenshots:

 

super_pi-3000.JPG

Hello,

 

Here are the results on the TT 32MHz with and without TT-Ram.

And you're right, the PAK68/3 is faster!

 

 

superpi-stram.jpg

superpi-ttram.jpg


On 4/29/2021 at 9:54 AM, ParanoidLittleMan said:

I was about to add some text here about everything that influences benchmark results, and not only those but the overall speed of running SW. But it would be long, so I had better start a new thread; it will appear soon in the programming section.

Please wait.  Amiga people showed me that there are ways to make my code faster.  I am working on it now...


On 4/29/2021 at 12:16 AM, moulinaie said:

For the TT 32MHz:

 

using ST-RAM : 15.97 (both versions)

using TT-RAM : 15.18 (both versions)

 

How did you disable/enable each type of RAM?  I should redo my tests for each type to see how it affects the numbers.  I have 4 MB of ST-RAM and 256 MB of TT-RAM.


On 5/1/2021 at 5:56 PM, MasterMotorola said:

 

How did you disable/enable each type of RAM?  I should redo my tests for each type to see how it affects the numbers.  I have 4 MB of ST-RAM and 256 MB of TT-RAM.

I use a CPX from the control panel that allows you to set the flags for an executable file (PRG/TOS).

It is called PRG Flagsetter in the Control Panel.

 

Guillaume


Okay, sorry to take so long to get these results.

 

(I didn't take a picture this time)

 

How many digits : 16384

Calculating...

Digits saved into PI.OUT.TXT

Time for calculating 204.01 s

Time for saving 0.40 s

Global time 204.41 s

Program in ST RAM and workspace in ST RAM.

 


Amiga people helped me to find two optimizations for my code.  In particular they suggested replacing MULU with a sequence of 9 instructions, which is faster than MULU on the 68020/30! :)
Would you please run `pi-st.tos' for me one last time for 100, 1000, and 3000 digits?  I am seeking results from the Atari MegaSTE @16MHz, Atari TT, Atari TT + TTRAM, Atari Stacy Pak/3, Falcon and other Ataris which use the 68030.
Results from the 68040 and 68060 are certainly interesting too, but my project is mainly about hardware released before 1995.
I have read that it is possible to use the Atari Falcon CT60 in the 68030 mode...
There is no need to run `pi-st30.tos' this time because it should show the same performance as `pi-st.tos'.
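
The replacement sequence itself is not shown in this thread. Purely as an illustration, and assuming the idea is the usual strength reduction (replacing a multiply by a known constant with shifts and adds), a Python sketch could look like the one below. The function name and the constant 10000 are hypothetical and are not taken from pi-st.

```python
def shift_add_multiply(x, constant):
    """Multiply x by a non-negative integer constant using only shifts and adds."""
    result = 0
    shift = 0
    while constant:
        if constant & 1:
            # This bit of the constant is set: add x * 2**shift.
            result += x << shift
        constant >>= 1
        shift += 1
    return result


# Hypothetical example: 10000 = 2**13 + 2**10 + 2**9 + 2**8 + 2**4,
# so x * 10000 becomes five shifted terms added together.
print(shift_add_multiply(1234, 10000))
```

Whether such a sequence actually beats MULU depends on the CPU and on the constant, which is presumably why the tip applies to the 68020/30 specifically.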

BTW I have already updated the tables, but I had to use estimates and this is pretty bad for the accuracy of the data. :( 

Thank you.

pi-st-5.zip


2 hours ago, vol said:

Would you like please to run for me `pi-st.tos' one last time for 100, 1000, and 3000 digits?  I am seeking results from the Atari MegaSTE @16MHz, Atari TT, Atari TT + TTRAM, Atari Stacy Pak/3, [...]

Hi !

 

Here are the results:

Atari TT/32MHz with ST Ram

100 = 0.05

1000 = 2.01

3000 = 15.96

 

Atari TT/32MHz with TT Ram

100 = 0.05

1000 = 1.89

3000 = 14.93

 

MegaSTE/16MHz

100 = 0.13

1000 = 8.23

3000 = 70.73

 

That's it !

Guillaume.


14 hours ago, moulinaie said:

Here are the results:

Atari TT/32MHz with ST Ram

100 = 0.05

1000 = 2.01

3000 = 15.96

 

Atari TT/32MHz with TT Ram

100 = 0.05

1000 = 1.89

3000 = 14.93

 

MegaSTE/16MHz

100 = 0.13

1000 = 8.23

3000 = 70.73

Thank you very much!  The table is updated.  I am really curious: is it possible to use the Falcon CT60 in the 68030 mode? ;)

BTW it is also interesting that your Super-Pi is much faster (35%) on the 68030 than on the 80386 but my Pi-Spigot shows that the 80386 is slightly (10%) faster.  Maybe I will try to dig into your x86 code but I am not sure when.


So, what is this about? Measuring the performance of some computer, or how to write faster, better-optimized code?

And this 'really interesting' talk: sorry, but these are actually trivial things. Using the Falcon CT60 in 68030 mode is of course necessary; otherwise it will not run a large percentage of regular Falcon SW. And not always because of the different CPU: sometimes SW fails simply because the execution speed is much higher than what it was written for and tested on.

Then, oh what a shock! Some SW runs much faster on another CPU, while other SW does not run so much faster .... This should go in the programming section, and one should learn some things before making tables and drawing big conclusions.

So, a concrete example: the 68000 and 68030 were designed before the 80386, and there is no way that the latter is only 10% faster at the same clock. And as far as I know the 68030 used an external cache (too), so much depends on its size and speed (something completely ignored here by vol).

More concretely: loops are often where most of the CPU time is spent. The 68000 has instructions that are especially good for loops: DBcc.

Then, repeating the same instruction several times in a row inside the loop makes it faster, because the relative time spent on the loop-control instruction becomes smaller. This is usually called unrolling.

But that is not the fastest way to clear RAM or copy a memory block, for instance. movem.l with plenty of registers is the fastest on the 68000. It is used in TOS too.

Why is it the fastest? Because there is no instruction fetch for every transfer or write ... And that is what is better in later CPUs: they can repeat the same instruction without needing to fetch it over and over again. A step above that is MMX and what came after: processing multiple data with a single instruction.
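
The effect of loop overhead can be put into rough numbers. The sketch below is a simple cost model, not a measurement: it assumes nominal 68000 timings (20 cycles for `move.l (a0)+,(a1)+`, 10 cycles for a taken `dbra`, `12+8n` cycles for a `movem.l` load and `8+8n` for a `movem.l` store of n registers), ignores the destination pointer update and any wait states, and uses made-up function names.

```python
# Nominal 68000 cycle counts, treated here as assumptions.
MOVE_L_CYCLES = 20   # move.l (a0)+,(a1)+
DBRA_CYCLES = 10     # dbra with the branch taken


def unrolled_cycles_per_long(unroll):
    """Cycles per copied longword when the move is repeated `unroll` times per loop pass."""
    return MOVE_L_CYCLES + DBRA_CYCLES / unroll


def movem_cycles_per_long(registers):
    """Cycles per copied longword when one movem.l load and one movem.l store move `registers` longs."""
    load = 12 + 8 * registers    # movem.l (a0)+,<registers>
    store = 8 + 8 * registers    # movem.l <registers>,(a1)
    return (load + store + DBRA_CYCLES) / registers


for unroll in (1, 4, 8):
    print("unroll x%d: %.2f cycles/long" % (unroll, unrolled_cycles_per_long(unroll)))
for regs in (8, 12):
    print("movem, %d regs: %.2f cycles/long" % (regs, movem_cycles_per_long(regs)))
```

In this model unrolling removes most of the loop-control cost, and movem.l gets a little further because the per-transfer instruction fetch disappears as well.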

 

When we say benchmark, it should be something that runs code similar to average, everyday SW.  Math calculation is good, but only for comparing the capability of computers and CPUs in that area.  

A really good benchmark needs to be thorough and exercise all the computer parts involved in usual SW execution: video, cache, large RAM areas (mostly because of cache), and OS calls too (maybe best as an option), because OS speed matters too. Storage speed can also be measured; even if it is not a basic part, it influences the working speed of more complex apps, and hard disk speed depends in large part on the motherboard and its hard disk controller/DMA, not only on the speed of the disk itself.

 

I know what I am talking about; I have made some benchmark SW.

And I even did some post-optimization of code, not because it was necessary in most cases, but to learn and to see how much better it could be, in speed and in code size, although the two do not go together in most cases.

 

So, here is one example of optimizing  ASM code for speed: http://atari.8bitchip.info/FastCoding.html

Yeah, I was wrong: an STE with fast enough mass storage can play CD audio format on the fly. The hard part is the sample rate conversion needed.

And no, I don't think that I'm off topic.

 


3 hours ago, ParanoidLittleMan said:

So, here is one example of optimizing  ASM code for speed: http://atari.8bitchip.info/FastCoding.html

Yeah, I was wrong: an STE with fast enough mass storage can play CD audio format on the fly. The hard part is the sample rate conversion needed.

And no, I don't think that I'm off topic.

 

On your page, you wonder whether any SW uses this kind of technique.

In MP_STE and M_PLAYER, I have to convert sounds from the PC world (11025, 22050 and 44100 Hz) to the Atari world (12517, 25033, 50066 Hz).

I use a very dirty trick to get the max speed:

I take 8 samples from the source and copy them to 9 samples in the destination, just by writing the last sample twice.

So 11025 * 9 / 8 ≈ 12403, very close to what is needed on the Atari.

And that's the same for 22 kHz to 25 kHz and 44 kHz to 50 kHz.
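
The 8-to-9 trick described above can be sketched in Python as follows. This is only an illustration of the idea, not the code used in MP_STE or M_PLAYER; the function names are made up, and a trailing block shorter than 8 samples is simply copied unchanged.

```python
def stretch_8_to_9(samples):
    """Copy samples in blocks of 8, writing the last sample of each full block twice."""
    out = []
    for start in range(0, len(samples), 8):
        block = samples[start:start + 8]
        out.extend(block)
        if len(block) == 8:
            out.append(block[-1])   # the repeated 9th sample
    return out


def effective_rate(source_rate_hz):
    """Output rate that keeps the original pitch after the 8 -> 9 stretch."""
    return source_rate_hz * 9 / 8


print(stretch_8_to_9(list(range(16))))
for source, target in ((11025, 12517), (22050, 25033), (44100, 50066)):
    print(source, "->", effective_rate(source), "Hz, hardware rate", target, "Hz")
```

The stretched data would ideally be played at 12403.125, 24806.25 and 49612.5 Hz, so on the Atari rates the sound plays a little under 1% too fast.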

 

Guillaume.

 


First, thanks for improving your code.

 

This is from your latest release, and not the '030 version (you

said not to run that, correct?)

 

Atari STacy, 4 megs of ST RAM, TOS v3.06, 68030@40mhz, 68882@40mhz

 

100 = .04

 

1000 = 1.59

 

3000 = 13.02

 

Hope this helps.  :)

 


2 hours ago, ParanoidLittleMan said:

Interesting. And why is it (again) that the "PC world" used the 44100 Hz (audio CD) sample rate and the rates obtained by simply halving it, while Atari went with some special frequencies?

If I remember correctly, to render a sound at a given frequency, you must sample at at least double that frequency.

So, as the human ear can detect up to 20 kHz (I think...), sampling at 40 kHz or a bit higher is correct.
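
This is the sampling (Nyquist) rule, and the arithmetic is easy to check. In the small sketch below the 20 kHz hearing limit is the round figure from the post, and the function names are only illustrative.

```python
HEARING_LIMIT_HZ = 20_000   # round figure used in the post


def min_sample_rate(max_freq_hz):
    """Lowest sample rate able to represent frequencies up to max_freq_hz."""
    return 2 * max_freq_hz


def highest_frequency(sample_rate_hz):
    """Highest frequency a given sample rate can represent."""
    return sample_rate_hz / 2


print(min_sample_rate(HEARING_LIMIT_HZ))   # rate needed for 20 kHz audio
print(highest_frequency(44100))            # audio CD rate
print(highest_frequency(50066))            # highest Atari rate mentioned in this thread
```

Both 44100 Hz and 50066 Hz are above the 40 kHz minimum, so both cover the audible range with some margin.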

Guillaume.


6 hours ago, ParanoidLittleMan said:

So, what is this about? Measuring the performance of some computer, or how to write faster, better-optimized code?

And this 'really interesting' talk: sorry, but these are actually trivial things. Using the Falcon CT60 in 68030 mode is of course necessary; otherwise it will not run a large percentage of regular Falcon SW. And not always because of the different CPU: sometimes SW fails simply because the execution speed is much higher than what it was written for and tested on.

[...]

So, a concrete example: the 68000 and 68030 were designed before the 80386, and there is no way that the latter is only 10% faster at the same clock. And as far as I know the 68030 used an external cache (too), so much depends on its size and speed (something completely ignored here by vol).

[...]

It is just my project, which aims to show which processor is better at computing the number pi using the easiest known algo. :) The idea is quite simple: write the best code for every CPU tested.  IMHO the results in table #2 look quite interesting.  Indeed it is not a perfect general benchmark; it only tests implementations of one algorithm.
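
The program's source is not listed in this thread. Assuming it follows the well-known compact spigot scheme (the name Pi-Spigot suggests so, but this is an assumption), the core of such an algorithm can be sketched in Python as below. The function and variable names are illustrative; digits come out in groups of four, with 14 series terms used per group.

```python
def pi_digits(n_digits):
    """Return decimal digits of pi as a string, produced four at a time by a spigot loop."""
    groups = n_digits // 4
    size = groups * 14                  # 14 series terms per group of 4 digits
    f = [2000] * size + [0]             # remainders; the top cell starts empty
    out = []
    carry = 0                           # low 4 digits held back from the previous pass
    c = size
    while c > 0:
        d = 0
        g = 2 * c
        b = c
        while True:
            d += f[b] * 10000           # scale the remainder up by 10^4
            g -= 1
            f[b] = d % g                # keep the remainder for the next pass
            d //= g
            g -= 1
            b -= 1
            if b == 0:
                break
            d *= b
        out.append("%04d" % (carry + d // 10000))
        carry = d % 10000
        c -= 14
    return "".join(out)


print(pi_digits(800)[:32])
```

The inner loop is dominated by one multiplication and one division per step, which is why the speed of those two operations on each CPU matters so much for the timings in this thread.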

I remember my friend had some business around Atari computers because they were popular among musicians.  He had Falcons in 1993 but I never touched them. :( So it is very interesting for me to get results for the π computation from this computer, which has a very unusual architecture: a 68030 on a 16-bit data bus.

The 68030 appeared in 1987 and the 80386 in 1985, so the 68030 was designed a bit later.  It is still unclear which CPU is actually faster.  We have very few benchmark results for both CPUs. :( Every processor has its own set of advantages:  the 68030 has more registers, an ability to execute several instructions at once, and higher frequencies, but the 80386 has its advantages too: a faster memory access cycle, instant EA calculation, and much faster division.

3 hours ago, DarkLord said:

Atari STacy, 4 megs of ST RAM, TOS v3.06, 68030@40mhz, 68882@40mhz

 

100 = .04

 

1000 = 1.59

 

3000 = 13.02

 

Hope this helps.  :)

 

Thank you very much.  Your Atari is the fastest in my tables that have just been updated.  It is interesting that the 68020 and 68030 show almost identical performance.
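
For a quick comparison, the 3000-digit timings posted in this thread for the latest pi-st.tos can be turned into speed ratios. The sketch below only reuses the numbers reported above; the dictionary keys and the helper name are illustrative.

```python
# 3000-digit timings in seconds, as reported in this thread.
timings_3000 = {
    "MegaSTE/16MHz": 70.73,
    "TT/32MHz, ST RAM": 15.96,
    "TT/32MHz, TT RAM": 14.93,
    "STacy Pak 68/3 40 MHz, ST RAM": 13.02,
}


def speedup(baseline, other):
    """How many times faster `other` is than `baseline`."""
    return timings_3000[baseline] / timings_3000[other]


base = "TT/32MHz, ST RAM"
for machine in timings_3000:
    print("%-32s %.2fx" % (machine, speedup(base, machine)))
```

With ST RAM on both machines, the 40 MHz STacy comes out about 1.23 times faster than the 32 MHz TT, close to the 1.25 clock ratio, while TT RAM gains the TT about 7%.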

Edited by vol