A collection of Atari BASIC benchmarks

thorfdbg · June 2, 2021

36 minutes ago, Faicuai said:

Basic++ 1.08 CANNOT run Bench64.bas to completion, in presence of XE03IFPE and XE03IFPX OS-loads, and not even OS/b with Newell FastChip. At least, I have not been able to find the right settings or procedure to get it working here, but Altirra Basic and TurboBasic work like a charm.

Of course it can. You must have done something very wrong. It's really that easy: Insert Basic++ as a cartridge, start the system (actually, this was probably too obvious?)

Maury Markowitz · June 2, 2021

1 hour ago, thorfdbg said:

What's "Bench64"? There is no Bench64 in the github repo you quoted.

scruss's mega-bench program. It's in the repo now.

Maury Markowitz · June 2, 2021

9 minutes ago, thorfdbg said:

I'm adding these to the results table. Is this with Atari's original math package, or a different one?

Faicuai · June 2, 2021

35 minutes ago, thorfdbg said:

It's really that easy: Insert Basic++ as cartridge

WRONG.

Already tried with ROM image, as well, from Ultimate/SD and AVG carts.

It crashes when hitting "Math" tests (this is on A800 / Incognito), even with OS/b + Newell FP pack.

It also crashes with Altirra and extended XL/OS FP-packs. It fails where the rest of the interpreters DO WORK.

This can all be perfectly replicated with the .ATR I compiled above for this purpose.

I can test pretty much anything you want, over here. Let me know If I can help with anything in particular.

(NOTE: I have managed to run it correctly ONLY if original OS/B or XL/XE 03/04 FP packs are present. If it's running on your ++ setup, it is because there is likely a dependence between Basic++ and OS++ that enables it to run correctly. But outside of that, it will crash).

Faicuai · June 2, 2021

1 hour ago, Maury Markowitz said:

scruss's mega-bench program. It's in the repo now.

Here are actual times (from REAL hardware), running in a full interactive E: session over XEP80 (can be fully replicated as well with Altirra XEP80 emulation, or by simply turning DMA ON / OFF with the OS-loads I posted on the last .ATR: press CTRL+Inverse, and then any key to re-enable):

Altirra Basic 1.57 + Altirra FP-pack, Colleen + XL compatible (slightly ahead of BBC-Micro):

Here with Turbo-Basic XL v1.5 (64KB XL-compatible), flexing its optimized "muscles":

For your reference...

thorfdbg · June 2, 2021

1 hour ago, Faicuai said:

WRONG.

(NOTE: I have managed to run it correctly ONLY if original OS/B or XL/XE 03/04 FP packs are present. If it's running on your ++ setup, it is because there is likely a dependence between Basic++ and OS++ that enables it to run correctly. But outside of that, it will crash).

Right. There you go. There is a dependency between the mathpack entries in the original(!) unaltered(!) math pack and Basic++.

If other math packs use different/other entries, that is not a Basic++ fault. Actually, the entries Basic++ is using are those that are also used by other carts....

Faicuai · June 2, 2021

25 minutes ago, thorfdbg said:

There is a dependency between the mathpack entries in the original(!) unaltered(!)

Basic++ is the only one that can't handle them.

Sorry, we can't test Basic++ with any other FP-pack, and for this reason, it will always lag behind the rest of (better=replacement) interpreters.

Case closed. (UPDATE: ok, not so closed... Could you please try running FPTEST34.BAS from last image I posted and show a screen-shot of final index, so I can try to guess where your FP-pack is baselining? Is it 2KB in size or larger?)

Maury Markowitz · June 2, 2021

These numbers are significantly higher than what I got under Atari800Mac for Turbo 1.5. I got an overall Index of 164.

Does running under the XEP get rid of wait states perhaps?

Faicuai · June 2, 2021

8 minutes ago, Maury Markowitz said:

Does running under the XEP get rid of wait states perhaps?

No.

When running your E: session via XEP80 (and interfacing through PIA), the traffic legwork is managed by 6502 directly, therefore ANTIC DMA is turned off, as it is not needed during text-based interaction (you can enable DMA at any time, fyi).

The A8's processing "power band" is pretty much elastic. Even some ANTIC-based mixed graphics modes return 80-85% of 6502 potential. And all of it is supported at OS-level (it is a resource, per se).

VBXE users will see the exact same performance boost. And that would also apply to 800's Bit3 80-column card, which dates back to 1982.

Maury Markowitz · June 2, 2021

17 minutes ago, Faicuai said:

therefore ANTIC DMA is turned off

... thereby eliminating the CPU waiting on memory. Perhaps there is some other term than "wait state" I should use?

In any event, these numbers are not the same conditions, which explains the different numbers.

Faicuai · June 2, 2021

10 minutes ago, Maury Markowitz said:

which explains the different numbers

Yes.

About 30% of CPU bandwidth remains unused in your initial conditions.

It can be reclaimed in at least two (2) different ways without altering source code (in Basic).

(Wait-states... CPU-halt... well, we know what it is, and we are on the same page, already...)
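As a rough illustration of what that reclaimed bandwidth means for a benchmark index, here is a minimal sketch. It assumes the index scales linearly with the CPU cycles available to the interpreter, and it reads "about 30% unused" as 30% of all cycles being lost to DMA; both are simplifying assumptions, not measurements. The function name `dma_off_index` is made up for this example, and the 164 input is the Turbo 1.5 index reported earlier in the thread under Atari800Mac.

```python
def dma_off_index(index_dma_on: float, stolen_fraction: float) -> float:
    """Estimate a benchmark index with DMA off.

    Assumption: the index is proportional to the CPU cycles the
    interpreter actually gets, so removing the stolen cycles scales
    the index by 1 / (1 - stolen_fraction).
    """
    if not 0.0 <= stolen_fraction < 1.0:
        raise ValueError("stolen_fraction must be in [0, 1)")
    return index_dma_on / (1.0 - stolen_fraction)


# Index of 164 with the screen on, roughly 30% of cycles lost to DMA.
estimate = dma_off_index(164, 0.30)
print(round(estimate))  # 234
```

Under those assumptions a DMA-on result and a DMA-off result for the same interpreter differ by a factor of about 1.4, which is why the two cannot be compared directly.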

Maury Markowitz · June 2, 2021

2 minutes ago, Faicuai said:

Can be reclaimed in at least two (2) different ways without altering source code (in Basic).

Sure, but then it's no longer the same benchmark. The conditions of the test are just as important as the code.

Can you run the first set in ANTIC 0?

Edited June 2, 2021 by Maury Markowitz

Faicuai · June 2, 2021

8 minutes ago, Maury Markowitz said:

Sure, but then it's no longer the same benchmark.

Well, that is not the case, for obvious reasons.

If you compare results to a host running with most of its CPU power-band, it only begs the question whether you can or cannot match that condition. This is part of the environment, not the source code. And on the Atari, the answer is a resounding YES, and in different ways, because control is already present at OS-level.

FYI, even Action! provided built-in controls for this, right from 1983. There is nothing new or revolutionary here, at all, trust me.

For the results table to be complete, both DMA=on and DMA=off output should be included. The rest is just a self-inflicted wound that has no real or sound justification, either in the functional scope of these tests, or in today's times.
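One way to keep the test conditions attached to each number is to record the DMA state as a field of every result row. The sketch below is only an illustration: `BenchResult` and `format_row` are hypothetical names, not part of any existing results table. The single figure used is the Turbo 1.5 index of 164 quoted earlier in the thread; the DMA=off entry is left empty because no number for it appears in the text.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class BenchResult:
    interpreter: str
    dma_on: bool                  # True = normal screen DMA, False = DMA disabled
    index: Optional[float] = None  # None = not reported


def format_row(result: BenchResult) -> str:
    """Render one row so the DMA condition is always visible next to the index."""
    dma = "on" if result.dma_on else "off"
    index = "n/a" if result.index is None else f"{result.index:g}"
    return f"{result.interpreter} | DMA={dma} | index={index}"


rows = [
    BenchResult("Turbo-Basic XL 1.5", dma_on=True, index=164),
    BenchResult("Turbo-Basic XL 1.5", dma_on=False),  # figure not given in the text
]
for row in rows:
    print(format_row(row))
```

Keeping both rows side by side makes the disclosure automatic, whichever of the two a reader considers the "real" result.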

+Stephen · June 2, 2021

2 hours ago, Faicuai said:

For the results table to be complete, both DMA=on and DMA=off output should be included.

Correct, but you have a habit of always showing your benchmarks with DMA off and often times not mentioning it, to make the Atari seem faster than it is in "normal operation" by most people. Consider a road rally where you're going for top MPG in matched stock cars. You take one car, take all the tension off the rear drum brakes, over-inflate the shit out of the tires, and remove the air cleaner so you manage to squeeze out an extra 2MPG. You can't claim the stock car gets better mileage.

Mazzspeed · June 2, 2021

5 hours ago, Faicuai said:

Altirra Basic 1.57 + Altirra FP-pack, Colleen + XL compatible (slightly ahead of BBC-Micro):

Except the BBC will most likely achieve its results with the screen on while not using an external 80 column solution running the Basic it shipped with. [rolleyes]

If I was to use the car analogy, what you're trying to do is strip the car down to its basic chassis/driveline and seat to reduce weight to win a race against a similar slightly newer car in stock configuration.

Edited June 2, 2021 by Mazzspeed

Faicuai · June 2, 2021

39 minutes ago, Mazzspeed said:

with the screen on while not using an external 80 column

It is irrelevant (eg. external 80-cols. solution is not needed).

What is relevant is clarity on how results are achieved, and how existing resources can be or are best used, in the context of the given tasks.

With that I would agree 100% (which is precisely what Bench64's author complains about when it comes to how similar tests are carried out on the BBC platform, and what motivated him to write Bench64 in the first place!).

Mazzspeed · June 2, 2021

4 minutes ago, Faicuai said:

It is irrelevant (eg. external 80-cols. solution is not needed).

What is relevant is clarity on how results are achieved, and how existing resources can be or are best used, in the context of the given tasks.

With that I would agree 100% (which is precisely what Bench64's author complains about when it comes to how similar tests are carried out on the BBC platform, and what motivated him to write Bench64 in the first place!).

Except it is relevant, as people aren't going to run Basic programs with the screen turned off, and the number of people using XEP80's can probably be counted on one hand. If every other machine is running in stock configuration, then it stands to reason the Atari must also run in stock configuration for this test in order for the benchmark to be meaningful.

The A8 is impressive for its age; it's OK to admit there are newer designs that may be better at certain tasks.

Mazzspeed · June 2, 2021

In case you missed the benchmarking criteria regarding bench64:

Faicuai · June 2, 2021

2 minutes ago, Mazzspeed said:

as people aren't going to run Basic programs with the screen turned off,

In the context of benchmarking, of course they will. Why? Because we want to know where the real limits are. That's all.

Action! (circa 1983, and pretty much the fastest cart-based, higher-level programming package that I remember seeing anywhere in the 8-bit world) incorporates such controls, configurable on its Options menu, in order to accelerate inert I/O, file-opening, compiling-times, etc. It seems they were truly serious about performance, back then.

In this case, context is (again) important: these tests produce essentially nothing on the screen, other than partial timing-results. Fireworks come at the very end. Why not cut that bullshit altogether, and enable the host machine to crunch them faster? Answer: there is no reason for not doing so (none), as long as it is disclosed clearly (as pointed out above).

In the same car-mechanics analogy, my C63 AMG came with dual, electronically-controlled throttle-body valves limited to about 80%-85% of flow. By ensuring they could be opened fully, total power output jumped to 517-520 hp, by essentially doing nothing. If an opportunity arises for fully opening them, why not do so when possible? Answer: there is no reason for not doing that.

The ONLY things that matter, in the end, are: 1) what resources are available on the system, 2) how they can be (or are being) used to accomplish any given task. This applies, of course, to ANY other system, as well, as long as properly disclosed (agree 100% with that).

Faicuai · June 2, 2021

10 minutes ago, Mazzspeed said:

In case you missed the benchmarking criteria regarding bench64:

My machine boots automatically into 80-columns mode, all the way in (if that is the case).

In case you don't understand what is being asked, he refers to BBC OS-driven interrupts, for instance, which are likely to give another 3-4% of performance boost on the system.

The BBC-Micro Basic interpreter runs on a space of 16 KBytes, loaded with plenty of optimization tricks (including single-char. variables with fixed RAM addresses, fast line-look up, etc.) and math is ALL done in binary (not BCD). It is faster, and more precise, just to begin with.

The Atari, on the other hand, runs here with 8Kbytes of ROM space for the Basic interpreter, and 2Kbytes of ROM for BCD FP pack, and manages to come pretty close to the BBC-Micro, with 7 digits of precision (in general) as long as DMA operations are off.

In that context, we could also run some tests on the Atari with DMA=ON that will pretty much make the BBC puke on the spot (if that were the case).

However, we are just testing here mere Basic interpreters that are outputting text messages on the screen, while crunching "something" in the background. Nothing else.

Mazzspeed · June 3, 2021

29 minutes ago, Faicuai said:

My machine boots automatically into 80-columns mode, all the way in (if that is the case).

In case you don't understand what is being asked, he refers to BBC OS-driven interrupts, for instance, which are likely to give another 3-4% of performance boost on the system.

The BBC-Micro Basic interpreter runs on a space of 16 KBytes, loaded with plenty of optimization tricks (including single-char. variables with fixed RAM addresses, fast line-look up, etc.) and math is ALL done in binary (not BCD). It is faster, and more precise, just to begin with.

The Atari, on the other hand, runs here with 8Kbytes of ROM space for the Basic interpreter, and 2Kbytes of ROM for BCD FP pack, and manages to come pretty close to the BBC-Micro, with 7 digits of precision (in general) as long as DMA operations are off.

In that context, we could also run some tests on the Atari with DMA=ON that will pretty much make the BBC puke on the spot (if that were the case).

However, we are just testing here mere Basic interpreters that are outputting text messages on the screen, while crunching "something" in the background. Nothing else.

I understand exactly what is being asked; it appears you're the one manipulating context to gain an advantage, as always. If the BBC has an advantage in stock configuration, then the BBC is the better design in the context of this benchmark. It's that simple.

Next thing you know, you'll run the test on the Z80 fitted to your IndusGT with the A8 running as no more than a terminal, and claim the A8's faster! But if the BBC user runs with a RPi based ARM CPU fitted to its Tube port (which is how the machine is on power up), you'll cry foul.

35 minutes ago, Faicuai said:

In the context of benchmarking, of course they will. Why? Because we want to know where the real limits are. That's all.

All you're really highlighting is that early variants of DMA aren't ideal as the implementation steals too many CPU cycles when using the device as intended.

Edited June 3, 2021 by Mazzspeed

Faicuai · June 3, 2021

4 minutes ago, Mazzspeed said:

it appears you're the one manipulating context to gain an advantage

Of course not, because we are talking about maximizing system-resources utilization.

That is an unquestionable and irreducible principle.

Now, disclosing HOW it is done, is important, of course.

Mazzspeed · June 3, 2021

7 minutes ago, Faicuai said:

Of course not, because we are talking about maximizing system-resources utilization.

Some call that cheating to gain an advantage. Describing how it is done is interesting from a proof of concept context, but invalid when it comes to the context of the benchmark.

If I was to run the exact same test on an Ultimate 64 at 48 MHz it would romp it in. I could also argue that's how the machine was when I powered it on. However, in doing so I'd be like the AMG C63 driver that moves into the overtaking lane, flying past the other car that was stationary at the lights and pulling away, and then claiming I won the race.

Edited June 3, 2021 by Mazzspeed

Faicuai · June 3, 2021

27 minutes ago, Mazzspeed said:

Some call that cheating to gain an advantage.

[Attached image]

Using the machine's EXISTING CPU, at its originally rated clock-speed, thus better suiting it for the specific task at hand... and you call that cheating?

Yeah, I did not think so, either.... So what I conclude from this exchange is that:

1. I, on the one hand, look forward to doing the best I can with existing resources, while..

2. You, on the other hand, profoundly lament (or resent) any realized improvement on that front.

I will definitely stick to #1, any time of the day.

Have fun!

Mazzspeed · June 3, 2021

20 minutes ago, Faicuai said:

Using the machine's EXISTING CPU, at its originally rated clock-speed, thus better suiting it for the specific task at hand... and you call that cheating?

To requote myself:

1 hour ago, Mazzspeed said:

All you're really highlighting is that early variants of DMA aren't ideal as the implementation steals too many CPU cycles when using the device as intended.

If you want to cheat, the PAL +4 would probably be the fastest machine there with the screen blanked in NTSC mode as in such a scenario the +4 is 115% faster than a C64. Furthermore, in your situation, you always run a script to blank the screen before starting the benchmark, so you're effectively running additional software to gain an advantage when running the benchmark.

I've got all your scripts here, you have scripts just to change a directory. Even the C64 would technically benefit from screen blanking as the VIC-II wouldn't be pulling AEC low during every badline.

This is usually where you start posting pics of half naked Women.

Edited June 3, 2021 by Mazzspeed
