
Ahl's Benchmark?


140 replies to this topic

#126 JamesD

JamesD

    Quadrunner

  • 7,755 posts
  • Location:Flyover State

Posted Tue Aug 8, 2017 9:59 AM

 

  1. There is nothing to show during compute-time. Zero. No point in wasting 25%-30% of the 6502's cycles... because the CPU is literally being halted by Antic. Moreover, stuff CAN BE SHOWN, even if Antic is "turned off". System Information 2.24 achieves exactly this.
  2. Neither Atari Basic nor the Atari OS applies trivial arithmetic and basic optimizations... which they otherwise could with more memory to spare (instead of a "miserable" 8Kbyte span).
  3. Unrolling MAY or MAY NOT help. Atari Basic, for instance, does not seem to run For-Next loops with pure integer arithmetic. Atari Basic is VERY, VERY constrained.
  4. The system ROM I am using (800XL/XE-Rev3-FP) runs add / sub. operations about 2.3x faster and mult. / div. operations 5.0-5.8x faster than the original Atari FP routines. That's the key.
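Point 1 can be put in numbers. If ANTIC's DMA steals a fraction f of the 6502's cycles, turning the display off can shorten a purely CPU-bound run by at most that fraction, which is a speedup of 1/(1 - f). A minimal Python sketch, taking the 25%-30% figure above at face value (the real share depends on the graphics mode and display list, so treat it as an assumption):

```python
def antic_off_speedup(stolen_fraction: float) -> float:
    """Ideal speedup for CPU-bound code once DMA stops stealing cycles."""
    if not 0.0 <= stolen_fraction < 1.0:
        raise ValueError("stolen_fraction must be in [0, 1)")
    return 1.0 / (1.0 - stolen_fraction)


for f in (0.25, 0.30):
    print(f"{f:.0%} of cycles stolen -> {antic_off_speedup(f):.2f}x faster with ANTIC off")
```

For 25% and 30% this gives roughly 1.33x and 1.43x respectively.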

 

Anyone here is welcome to post resulting times (and screen shots) from similar optimizations. Going from 400+ secs. down to 42 sec (still on ATARI Basic !!!) shows how wasteful and potentially pointless this benchmark is on Atari. 

 

Cheers!

I stand corrected on the Antic.
But you are still missing the point of the benchmark.  
You aren't running the same code, which is the point of a benchmark.
 



#127 Faicuai

Faicuai

    Dragonstomper

  • 701 posts
  • Location:Florida, U.S.A.

Posted Tue Aug 8, 2017 2:57 PM

I stand corrected on the Antic.
But you are still missing the point of the benchmark.  
You aren't running the same code, which is the point of a benchmark.
 

 

Not really the intention to over-rotate on a trivial matter but, objectively, I can't agree. To put things in perspective:

 

  • Even benchmarking my Broadwell-based HP Z840 (Xeon v4), there is a MULTITUDE of items that need to be addressed BEFORE benchmarking (!):
    • Runtime-power management,
    • idle-power-states,
    • enabling / disabling Virtualization, Hyper-Threading, etc.,
    • defining process-to-processor affinity, etc.,
  • All of the above play key roles in determining what the HW platform is really capable of doing.
  • Likewise, on the Atari, ensuring that your CPU can devote as many cycles as it can (thus capitalizing on its 1.7+ MHz of raw speed) is MANDATORY (not optional !!!). WHAT is the point of running a test where we know UP-FRONT that close to 30% of CPU-time is wasted (!?)

 

And that is, in my opinion, the crux of this particular story / benchmark (and where I believe you are missing the point):

  • Going from 400+ secs to 42+ secs (almost a TEN-FOLD reduction in time, STILL within the Atari Basic interpreter !) simply tells us how WOEFULLY inadequate the environment (FP routines / BASIC implementation) was, rather than what the TRUE capabilities of the HW platform itself (CPU, memory, supporting chipsets, etc.) are.
  • When seeking real, definitive benchmarking results, the LATTER is what I am interested in (because it is what I can't immediately control or change). The first part, however (SW / operating environment), we have much better control of, for almost any platform. 

 

If it were up to me, I would invite ANYONE reading this thread, on ANY COMPARABLE platform of their choice, to run this benchmark to that platform's BEST ability and show the results here... that would be really, really interesting (putting aside the markedly skewed nature of the benchmark, which mostly hammers floating-point and integer processing).

Cheers!



#128 Stephen

Stephen

    Quadrunner

  • 6,513 posts
  • A8 Gear Head
  • Location:No longer in Crakron, Ohio

Posted Tue Aug 8, 2017 3:18 PM

 

Not really the intention to over-rotate on a trivial matter but, objectively, I can't agree. (...)

That's not what a standard benchmark does!

 

If I run a graphics test 100% coded and optimized for an AMD graphics card and then I run an entirely different test 100% coded and optimized for my NVidia graphics card, the results don't mean jack shit.  SAME code running on two different platforms, valid result.

 

If you're testing 2 cars for 0-60 MPH time, you don't test a professional race car driver against a brand-new driver.  Again - results wouldn't mean shit.  Same driver, 2 cars, valid test result.



#129 Faicuai

Faicuai

    Dragonstomper

  • 701 posts
  • Location:Florida, U.S.A.

Posted Tue Aug 8, 2017 4:28 PM

That's not what a standard benchmark does!

 

(...)

 

If you're testing 2 cars for 0-60 MPH time, you don't test a professional race car driver against a brand-new driver.  Again - results wouldn't mean shit.  Same driver, 2 cars, valid test result.

 

Wrong, my friend! And here's the proof:

 



#130 Stephen

Stephen

    Quadrunner

  • 6,513 posts
  • A8 Gear Head
  • Location:No longer in Crakron, Ohio

Posted Tue Aug 8, 2017 5:24 PM

 

Wrong, my friend! And here's the proof:

 

Now that's a wicked car!



#131 JamesD

JamesD

    Quadrunner

  • 7,755 posts
  • Location:Flyover State

Posted Tue Aug 8, 2017 5:37 PM

(...)

  • Even benchmarking my Broadwell-based HP Z840 (Xeon v4), there is a MULTITUDE of items that need to be addressed BEFORE benchmarking (!):

(...)

You list all that stuff that needs to be addressed before you can benchmark and forget one crucial thing.  None of that is the actual benchmark code.
And that is why your opinion is wrong.

Look, if someone wants to post stuff like that, by all means, post away.  We've included compiler numbers, results with and without A*A instead of A^2, etc., all along.
But don't leave out the tiny minuscule little detail that you are using a tuned benchmark until someone calls you on it.
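The A^2 versus A*A comparison is easier to follow with the benchmark's structure in front of you. The sketch below is a Python port assuming the commonly published listing (100 outer passes; for each, ten square roots and then ten squarings of A, with RND(1) added to R at every step, then ABS(1010-S/5) and ABS(1000-R) printed as Accuracy and Random). It uses IEEE doubles and Python's `random` module instead of Atari's 6-byte BCD floating point, so its figures will not match the Atari numbers. `log_square` is a hypothetical stand-in for a power routine that goes through logarithms, not a model of any specific Atari ROM:

```python
import math
import random
from typing import Callable, Tuple


def log_square(a: float) -> float:
    # Hypothetical model of a LOG/EXP-based '^' routine evaluating A^2.
    return 10.0 ** (2.0 * math.log10(a))


def mul_square(a: float) -> float:
    # Direct multiplication: the A=A*A variant.
    return a * a


def ahl_benchmark(square: Callable[[float], float],
                  rnd: Callable[[], float] = random.random) -> Tuple[float, float]:
    """Return (accuracy, random) as the benchmark prints them."""
    s = 0.0  # S: sum of the values that should return to N
    r = 0.0  # R: running sum of 2000 calls to RND
    for n in range(1, 101):
        a = float(n)
        for _ in range(10):   # ten successive square roots...
            a = math.sqrt(a)
            r += rnd()
        for _ in range(10):   # ...then ten squarings that should undo them
            a = square(a)
            r += rnd()
        s += a
    # With exact arithmetic S = 5050, so S/5 = 1010; R should land near 1000.
    return abs(1010 - s / 5), abs(1000 - r)


if __name__ == "__main__":
    random.seed(2017)
    for label, fn in (("A^2 via logs", log_square), ("A*A", mul_square)):
        accuracy, rnd_error = ahl_benchmark(fn)
        print(f"{label:<13} accuracy={accuracy:.3e} random={rnd_error:.6f}")
```

Passing `rnd` as a parameter keeps runs reproducible (a constant 0.5 makes the Random figure exactly 0), which helps when comparing variants.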
 


Edited by JamesD, Tue Aug 8, 2017 5:40 PM.


#132 JamesD

JamesD

    Quadrunner

  • 7,755 posts
  • Location:Flyover State

Posted Tue Aug 8, 2017 5:46 PM

If someone optimizes the BASIC interpreter like BASIC++, Altirra, etc...that's awesome.
You know why?  Every BASIC program benefits from that.
If you optimize the benchmark, what benefits from that?   Only the benchmark.

 



#133 Faicuai

Faicuai

    Dragonstomper

  • 701 posts
  • Location:Florida, U.S.A.

Posted Tue Aug 8, 2017 5:50 PM

(...)

Look, if someone wants to post stuff like that, by all means, post away.  We've included compiler numbers, results with and without A*A instead of A^2, etc...all along (..)

 

Exactly!

 

And by doing so (by taking into account the profound and marked deficiencies of our beloved Atari's OS/Basic framework), those prior numbers have been already shattered. The benchmark itself is virtually unchanged. You can even discard unrolling, if you wish (little improvement with Atari Basic).

 

And that is the final point that I am attempting to illustrate: (relative) tons of juice on this platform... totally wasted. That's what this benchmark shows. NOTHING else.

 

Cheers!

P.S. As a side note, I would LOVE to see this benchmark coded (in a modular way) in ASSEMBLER and run across ALL 6502 platforms we can put our hands on (by just changing the graphics buffer address, Floating Point / Math vectors, etc.). THAT would be lovely!



#134 Faicuai

Faicuai

    Dragonstomper

  • 701 posts
  • Location:Florida, U.S.A.

Posted Tue Aug 8, 2017 11:17 PM

Latest update on this famous little thread (UPDATE #3 - 8/10/2017):

 

=> Implementation NOTES (all runs):

  • ANTIC=OFF for maximizing 6502 CPU bandwidth (unless otherwise noted).
  • A=A*A for direct FP-mult. math look-up / routines (unless otherwise noted).
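The A=A*A substitution does not have to be done by hand: a power routine can check for a small non-negative integer exponent and use plain multiplication, falling back to a general logarithm-based path otherwise (the summary further down notes that Basic++ 1.08 seems to handle ^2 in roughly this way). A minimal Python sketch using exponentiation by squaring; the `power` name and the cutoff of 255 are hypothetical and say nothing about how any Atari BASIC actually implements `^`:

```python
import math

MAX_FAST_EXPONENT = 255  # hypothetical cutoff for the multiply-only fast path


def power(base: float, exponent: float) -> float:
    """Evaluate base ^ exponent the way an interpreter's '^' operator might."""
    if float(exponent).is_integer() and 0 <= exponent <= MAX_FAST_EXPONENT:
        # Fast path: exponentiation by squaring needs only multiplications,
        # so A^2 costs a few multiplies instead of a LOG/EXP evaluation.
        n = int(exponent)
        result = 1.0
        factor = float(base)
        while n:
            if n & 1:
                result *= factor
            factor *= factor
            n >>= 1
        return result
    if base <= 0:
        raise ValueError("the general path only handles a positive base")
    # Slow path: generic evaluation through logarithms, as a slow FP library would.
    return math.exp(exponent * math.log(base))
```

The fast path also gives exact results for small integer powers of exactly representable values, which the logarithm path generally does not.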

 

=> (Atari 800/Incognito, Colleen Mode / AXLON, SDX, OS-b + Newell high-performance FP roms), and ATARI BASIC (Rev.C) Interpreted:

  • Accuracy: 0.013649  (pretty steady)
  • Random: 7.785987 (varies all over the place)
  • Time (s): 59.8000 
  • Time (s): 55.4666 (Inner For / Next Loops unrolled. Slow handling of integer For/Next loops in Atari Basic)

 

=> (Altirra 2.90 w/ FP=OFF, SDX, XE ROM patched w/ optimized FP pack), and ATARI BASIC (Rev.C) Interpreted:

  • Accuracy: 0.013649  (pretty steady)
  • Random: 11.306536 (varies all over the place)
  • Time (s): 46.3000 
  • Time (s): 42.9166 (Inner For / Next Loops unrolled. Slow handling of integer For/Next loops in Atari Basic) 

 

=> (Altirra 2.90 w/ FP=OFF, SDX, XL ROM Rev.2 OEM), and MICROSOFT BASIC (v1.0) Interpreted:

  • Accuracy: 0.111523  (poor precision, seems to have its OWN FP library, independent of O/S)
  • Random: 11.306536 (varies all over the place)
  • Time (s): 43.0667 

 

=> (Altirra 2.90 w/ FP=OFF, SDX, XE ROM patched w/ optimized FP pack), and BASIC XE (v4.1p) Interpreted:

  • Accuracy: 0.013649  (pretty steady)
  • Random: 14.79776 (varies all over the place)
  • Time (s): 37.9666
  • Time (s): 35.5333 (Inner For / Next Loops unrolled) 

 

=> (Altirra 2.90 w/ FP=OFF, MyDos, ALTIRRA ROM), and ALTIRRA BASIC (v1.54) Interpreted:

  • Accuracy: 0.000452  (WoW! BIG jump in precision !!!)
  • Random: 2.605347 (varies all over the place)
  • Time (s): 33.9833
  • Time (s): 32.7000 (Inner For / Next Loops unrolled)

 

=> (Altirra 2.90 w/ FP=OFF, MyDOS, XL ROM Rev.2 OEM), and TURBO BASIC (1.5):

  • Accuracy: 0.013649  (pretty steady)
  • Random: 2.10417 (varies all over the place)
  • Time (s): 26.68 (non-compiled)
  • Time (s): 25.50 (non-compiled, Inner For / Next Loops unrolled)
  • Time (s): 21.75 (compiled, Inner For / Next Loops unrolled)

 

=> (Altirra 2.90 w/ FP=OFF, SDX, XE ROM patched w/ optimized FP pack), and BASIC++ (v1.08) Interpreted:

  • Accuracy: 0.014842  (slightly lower precision)
  • Random: 8.052295 (varies all over the place)
  • Time (s): 40.9500  (ANTIC=ON,   A=A^2, NO inner For / Next unrolling)
  • Time (s): 37.8666  (ANTIC=ON,   A=A*A, NO inner For / Next unrolling)
  • Time (s): 28.0000  (ANTIC=OFF, A=A^2, NO inner For / Next unrolling)
  • Time (s): 25.9000  (ANTIC=OFF, A=A*A, NO inner For / Next unrolling)
  • Time (s): 24.1000  (ANTIC=OFF, A=A*A, inner For / Next unrolling)

 

=> (Altirra 2.90 w/ FP=OFF, SDX, XE ROM patched w/ optimized FP pack), and ALTIRRA BASIC (v1.54) Interpreted:

  • Accuracy: 0.014842  (slightly lower precision)
  • Random: 5.4557 (varies all over the place)
  • Time (s): 77.5000 (ANTIC=ON,   A=A^2, NO inner For / Next unrolling)
  • Time (s): 52.9166 (ANTIC=OFF, A=A^2, NO inner For / Next unrolling)
  • Time (s): 28.8833 (ANTIC=ON,   A=A*A, NO inner For / Next unrolling)
  • Time (s): 19.7333 (ANTIC=OFF, A=A*A, NO inner For / Next unrolling)
  • Time (s): 18.5800 (ANTIC=OFF, A=A*A, inner For / Next unrolling)
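The ANTIC=ON / ANTIC=OFF pairs in the two lists above allow a rough check of the "close to 30%" figure from earlier in the thread. Assuming nothing else changed within each pair, the share of run time recovered is 1 - t_off / t_on. A small Python sketch over the posted timings:

```python
# (interpreter, squaring form, seconds with ANTIC on, seconds with ANTIC off)
PAIRS = [
    ("BASIC++ 1.08", "A=A^2", 40.9500, 28.0000),
    ("BASIC++ 1.08", "A=A*A", 37.8666, 25.9000),
    ("Altirra BASIC 1.54", "A=A^2", 77.5000, 52.9166),
    ("Altirra BASIC 1.54", "A=A*A", 28.8833, 19.7333),
]


def recovered_fraction(t_on: float, t_off: float) -> float:
    """Share of the ANTIC-on run time that disappears with ANTIC off."""
    return 1.0 - t_off / t_on


for name, form, t_on, t_off in PAIRS:
    print(f"{name:<18} {form}: {recovered_fraction(t_on, t_off):.1%} of run time recovered, "
          f"{t_on / t_off:.2f}x faster")
```

All four pairs come out at just under 32% (about a 1.46x speedup), a little above the 25%-30% estimate and consistent across both interpreters and both squaring forms.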

 

In summary:

 

  1. Up to TWENTY (20) TIMES faster results could be attained with the exact same base HW (setting aside the nature of each BASIC interpreter's optimizations).
  2. A long, long way from the dumb-ass 405+ secs. of the original timing listed back in the day... It will hardly get any better than this while keeping the benchmark's core logic / structure intact. 
  3. Basic++ 1.08 manages to extract impressive improvements from what seems a pretty small code base (8K). Maybe larger once loaded? It also seems to optimally handle integer powers (^2) by calling the fast FP mult. function (!)
  4. Altirra Basic manages to outperform almost any other package BOTH in speed AND precision departments (latter with Altirra OS loaded), also from what seems to be a pretty small code-base (8K).
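The speedup claim in point 1 can be checked against the best time posted for each configuration above. A short Python sketch; the 405 s baseline is the "405+ secs." figure from point 2, so every factor below is a lower bound:

```python
BASELINE_S = 405.0  # "405+ secs." original Atari BASIC timing (a lower bound)

# Best posted time (seconds) for each configuration in the results above.
BEST_TIMES_S = {
    "Atari BASIC Rev.C, Newell FP ROMs (real 800)": 55.4666,
    "Atari BASIC Rev.C, patched XE FP (Altirra)": 42.9166,
    "Microsoft BASIC 1.0": 43.0667,
    "BASIC XE 4.1p": 35.5333,
    "Altirra BASIC 1.54, Altirra OS": 32.7000,
    "BASIC++ 1.08": 24.1000,
    "Turbo BASIC 1.5, compiled": 21.75,
    "Altirra BASIC 1.54, patched XE FP": 18.58,
}


def speedup(t_seconds: float, baseline: float = BASELINE_S) -> float:
    """How many times faster a run is than the baseline timing."""
    return baseline / t_seconds


for config, t in sorted(BEST_TIMES_S.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{config:<46} {t:8.2f} s  {speedup(t):5.1f}x")
```

The best case (Altirra BASIC 1.54 on the patched XE FP ROM) works out to just under 22x, in line with point 1.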

 

Cheers!



#135 JamesD

JamesD

    Quadrunner

  • 7,755 posts
  • Location:Flyover State

Posted Wed Aug 9, 2017 10:08 AM

You have to wonder how many hours in run time were wasted across millions of computers simply because these BASICs were slow.



#136 Faicuai

Faicuai

    Dragonstomper

  • 701 posts
  • Location:Florida, U.S.A.

Posted Thu Aug 10, 2017 8:28 PM

LATEST UPDATE:

 

Now with Basic++, with ANTIC both ON and OFF, and with both A=A^2 and A=A*A, and with and without loops unrolling.

Coincidentally, and after the fact, I found this article, which discusses precisely THIS benchmark and most of its implementation challenges, impact and deeper limitations on the Atari (pointing in EXACTLY the same direction as several of us did here, every step of the way). VERY interesting read:

 

 

http://www.atarimaga...ight_atari.html

 

 

Cheers!



#137 kenjennings

kenjennings

    Dragonstomper

  • 758 posts
  • Me + sio2pc-usb + 70 old floppies
  • Location:Florida, USA

Posted Fri Aug 11, 2017 7:57 PM

 

. . .

 

And that is the final point that I am attempting to illustrate: (relative) tons of juice on this platform... totally wasted.

. . .

 

It is a benchmark running in BASIC.  By definition it totally wastes the performance of whatever computer it runs on.  Did I mention this is BASIC?    The purpose of a benchmark in BASIC (wow, that's an oxymoron, isn't  it?) is to evaluate how poorly BASIC performs.   (and by extension the other things we know that affect this, like the difference a slow floating point library makes.)   

 

AND, some kind of ballpark figure measuring BASIC was actually a useful goal back in the day.  In the 70s/early 80s a major purpose of 8-bit computers was to run BASIC.  In fact, for some computers BASIC was the operating system and user interface for users.  Many, MANY people buying 8-bit computers would not  make it as far as assembly coding, (unlike the rest of us basement dwellers without meaningful social lives,) so a general comparison of BASIC languages was a reasonable, if not precisely accurate yardstick to evaluate their user experience.   Today, its primary function is as gasoline for a burning debate on efficiency.



#138 kenjennings

kenjennings

    Dragonstomper

  • 758 posts
  • Me + sio2pc-usb + 70 old floppies
  • Location:Florida, USA

Posted Fri Aug 11, 2017 8:02 PM

My favorite "benchmark" was the Dead On Arrival/Failure Out Of The Box industry stats that one of the multi-platform magazines published.   Wish I could remember which one that was.   I recall  not many computers came close to the reliability of the Ataris. 



#139 JamesD

JamesD

    Quadrunner

  • 7,755 posts
  • Location:Flyover State

Posted Fri Aug 11, 2017 9:41 PM

It is a benchmark running in BASIC.  By definition it totally wastes the performance of whatever computer it runs on.  Did I mention this is BASIC?    The purpose of a benchmark in BASIC (wow, that's an oxymoron, isn't  it?) is to evaluate how poorly BASIC performs.   (and by extension the other things we know that affect this, like the difference a slow floating point library makes.)

It is a bit of a who sucks the least competition.   :grin: 

For the companies selling the machines, it was important to have A BASIC, not A FAST BASIC.
And companies like Microsoft were all too happy to deliver the bare minimum.

After seeing how little effort went into optimizing Microsoft's code... I have to wonder why nobody came out with a more competitive product.
Everything I've optimized up through the last release, still fits in 8K!  What the hell was their problem?
Skipping a few passes between BREAK checks literally took 4 instructions and less than a minute to write!  Instant free clock cycles!
The multiply took a while to get just so... but I would think Tandy or Motorola would have wanted their machines to be faster.
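The BREAK-check skip amounts to polling an expensive condition only on every Nth pass through the interpreter's main loop, with a cheap counter decrement on the other passes. A Python sketch of the idea on a toy statement list; `run`, `break_pressed` and the interval of 8 are hypothetical, and the real change was a handful of assembly instructions in ROM, not anything like this code:

```python
from typing import Callable, List

BREAK_CHECK_INTERVAL = 8  # hypothetical: poll for BREAK once every 8 statements


def run(statements: List[Callable[[], None]],
        break_pressed: Callable[[], bool]) -> int:
    """Execute statements in order, polling for BREAK only periodically.

    Returns the number of statements executed before stopping.
    """
    countdown = BREAK_CHECK_INTERVAL
    executed = 0
    for stmt in statements:
        countdown -= 1
        if countdown == 0:          # cheap decrement-and-test on every pass...
            countdown = BREAK_CHECK_INTERVAL
            if break_pressed():     # ...expensive keyboard poll only occasionally
                break
        stmt()
        executed += 1
    return executed
```

The trade-off is latency: BREAK can take up to N - 1 extra statements to be noticed, which no human will perceive.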

The 6800 version of BASIC followed the 8080 version rather quickly.   It was within months.  The 6803 came out in 1978 and the 6809 shortly after from what I can tell.
The MC-10 came out in 1983.  They had about 8 years to come up with the skip, and 5 years to take advantage of the multiply instruction, but they never did either one!
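To see why the multiply instruction matters: the 6801/6803 MUL instruction multiplies the 8-bit A and B registers into the 16-bit D register, and a floating-point multiply's mantissa product can be assembled entirely from such 8x8 partial products instead of a bit-by-bit shift-and-add loop. A Python sketch of that schoolbook scheme, assuming little-endian byte arrays and ignoring exponents, normalization and rounding (it is not the Microsoft BASIC mantissa format); `mul8` is a hypothetical stand-in for MUL:

```python
from typing import List


def mul8(a: int, b: int) -> int:
    """Stand-in for an 8x8 -> 16-bit unsigned hardware multiply (e.g. MUL)."""
    if not (0 <= a <= 0xFF and 0 <= b <= 0xFF):
        raise ValueError("operands must be bytes")
    return a * b


def to_bytes_le(value: int, length: int) -> List[int]:
    """Split an unsigned integer into `length` little-endian bytes."""
    return [(value >> (8 * i)) & 0xFF for i in range(length)]


def multiply_mantissas(x: List[int], y: List[int]) -> List[int]:
    """Schoolbook multiply of little-endian byte arrays using only 8x8 products."""
    result = [0] * (len(x) + len(y))
    for i, xb in enumerate(x):
        carry = 0
        for j, yb in enumerate(y):
            # 255*255 + 255 + 255 = 65535, so each step fits in 16 bits.
            total = mul8(xb, yb) + result[i + j] + carry
            result[i + j] = total & 0xFF
            carry = total >> 8
        result[i + len(y)] = carry  # this position is still untouched by earlier rows
    return result
```

A 4-byte by 4-byte mantissa needs 16 partial products, i.e. 16 MUL instructions plus the additions and carries.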
 



#140 Faicuai

Faicuai

    Dragonstomper

  • 701 posts
  • Location:Florida, U.S.A.

Posted Sat Aug 12, 2017 9:42 AM

(...) Many, MANY people buying 8-bit computers would not  make it as far as assembly coding, (unlike the rest of us basement dwellers without meaningful social lives,) (...)  Today, its primary function is as gasoline for a burning debate on efficiency.



Well I do have a social life... but could not help chuckling with that comment... LoOoOoL !!! ;-

#141 JamesD

JamesD

    Quadrunner

  • 7,755 posts
  • Location:Flyover State

Posted Sat Aug 12, 2017 11:32 AM

I don't really have a basement





