JamesD
Content Count: 8,999 | Days Won: 6
Posts posted by JamesD
Does this program actually finish? It's been running for 20 minutes.
-
1 hour ago, Faicuai said:Ok, once again, let's put this to the test:
Please, run SORTBAS7.MSB (or .MST for ease of conversion on your end), as is (it is written for MS-Basic, with essentially minimal adaptions), and post the results here.
Let's see.
SORTBAS7.MSB isn't formatted so I can use it.
How about carriage returns instead of >
I used the data from the text file version and reformatted the code.
GOTO everywhere. This is definitely not written by someone used to optimizing MS BASIC code. -
3 hours ago, Faicuai said:Not correct.
MS Basic II and Indus/GT-Z80 timings (running MS-Basic as well) are also posted on that thread, at the very end (just as I reported them here with Banana sort).
Quite a wide plethora of Interpreters and optimization settings were included on that summary. However, Inter-platform MS-Basic timings are subject to significant implementation variations, though, and to such an extent that I don't take MS-Basic results as any real solid / stable reference of machine-to-machine performance. Even Basic XL + Newell/FPP (from 1983-1984) would have blown Atari MS-Basic II out of the water, as long as the OS FP package was used!
And still within 8,192 bytes of ROM, and same 16 Kbytes of OS-ROM, we can already see how much better (and superior) the Atari is capable of performing, without absolutely any change to the underlying CPU (clock, speed, etc.) or RAM timings.
If you look at how MS BASIC is implemented, it works pretty much the same way on every 68xx & 6502 platform.
If anything, it's closer to the machines doing the same work than any other test.
The one significant difference... patching the BASIC to take advantage of the hardware multiply on the MC-10 & CoCo 3 makes a huge speed difference.
The MC-10 number even came from an early version of BASIC I created that is missing most of the optimizations I've written. Most of the changes in that version were to squeeze in the larger multiply code.
I know you don't like the slower numbers, but it is most definitely a good machine to machine comparison. -
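The hardware-multiply point above can be sketched out: a CPU without a MUL instruction (like the 6502) has to do each 8x8 partial product of a floating-point mantissa multiply with a shift-and-add loop, while the 6801/6803/HD6303 does it in a single instruction. A rough Python sketch for illustration only, not the actual ROM code:

```python
def shift_add_mul8(a, b):
    """8x8 -> 16 bit multiply by shift-and-add, the way a CPU without
    a MUL instruction has to do each partial product of a mantissa
    multiply. Returns the product and the number of loop iterations;
    a 6803/HD6303 MUL replaces the whole loop with one instruction."""
    assert 0 <= a <= 255 and 0 <= b <= 255
    product, steps = 0, 0
    for bit in range(8):
        steps += 1
        if (b >> bit) & 1:
            product += a << bit
    return product, steps

print(shift_add_mul8(13, 11))   # (143, 8): 8 loop passes vs one MUL opcode
```

A multi-byte mantissa needs several of these partial products, which is why replacing the loop with a hardware multiply pays off so heavily in the math package.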
5 hours ago, Faicuai said:ALL timings, for ALL AHL variants / configs. on Atari, here (already in 18 secs. territory... far below 42 secs...):
Now that you mention HW speed preference, thought I would refresh your memory...
Not for a MS BASIC.
-
1 hour ago, Faicuai said:Unfortunately, most of these observations seem to me out of practical context.
An algorithm does not change if the underlying expression (mathematically) does not change. 2^n is just 2·2·2·2... given n an integer. It would be broken, however, if such a parameter was computed and produced as a real value and then processed as an integer. Even so, there is no problem in presenting the code WITH and WITHOUT its algebraic expansion, as we will clearly see who's trapping the expansion or not (which is the case of IBM's PC/MS-Basic, and a math co-processor has little to do with this).
Resolving m^n (as a simple example) is just a WRENCH that can be thrown at these interpreters. Another wrench, as we learned, is a sort routine based on DATA-statement look-up (and long line #s). Talk about a BIG wrench. Another wrench is Banana-sort itself, as it relies on maintaining a properly updated sort index in real time, which requires array access, indexing, and memory copies that are as fast as possible, record after record. Even Atari Basic comes close to MS-Basic II on this test (on ATARI's implementation).
We definitely see this process from different perspectives. But the results are the only ones that count. Nothing else.
Practical? Do benchmarks have to be practical?
The DATA statements are a PITA, but the MC-10 doesn't have a disk drive, and that's what I used to test my BASIC changes.
Sorry it turned out to be a wrench for the Atari.
Would a disk file have been better? That would be a PITA on MS BASIC since every DOS is different.
Is the banana sort a variation of an insertion sort? I haven't spent much time looking at it.
The standard MC-10, Apple, and Plus/4 ROMs usually copy a byte at a time.
My update uses some 16 bit memory copies, but I don't know which code this will pass through to do that.
If you have to use ^3, or ^5, it uses the same code as ^2 on every 8 bit. *2 does not.
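A sketch of why ^2 is so much slower than *2: the point that ^3 and ^5 go through the same code as ^2 implies the general X^Y routine is used even for small integer powers, and in software floating point that routine is typically built on EXP(Y*LOG(X)). Illustrative Python, not the actual ROM routines:

```python
import math

def pow_general(x, y):
    # The general-purpose power path an 8-bit BASIC typically falls
    # back to for any exponent: EXP(Y * LOG(X)), i.e. a LOG, a
    # multiply, and an EXP, all done in software floating point.
    return math.exp(y * math.log(x))

def pow_squared(x):
    # The hand optimization under discussion: X*X is one FP multiply.
    return x * x

x = 7.0
print(pow_squared(x))      # exactly 49.0, one multiply
print(pow_general(x, 2))   # ~49, after three slow transcendental-path calls
```

The log/exp path can also pick up rounding error on top of being slow, which is another reason *2 is the safer as well as the faster spelling.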
You'll never budge, so I won't waste any more time but to say the following...
I've already patched the MC-10 and CoCo 3 to use a hardware multiply.
Your last number for Atari using that optimization was 42.3.
On the MC-10 with the new ROM, using *2 instead of ^2 drops the time from 66 seconds, to about 42 or 43 seconds (hand timed)
That's about a 36% difference!
Are you going to tell me the MC-10 is as fast as the Atari?
With an HD6303, the hardware multiply is 3 clock cycles faster (30%), and many opcodes are only 1 clock cycle.
That would put it ahead of the Atari even with just a 5% speedup.
The CoCo 3 runs at double the clock speed, so even without the multiply or any other patch, using *2 comes in at 43.55, machine timed.
Using *2 with the multiply patch should easily put it in the 27 second range if it has the same speedup.
A CoCo 3 with an HD6309 is 20% faster in native mode on 6809 code, and I could implement some of the math functions using the additional 16 bit register, combined 32 bit register, and could use memory move instructions.
That would put it in the sub 20 second range.
According to Creative Computing, the IBM PC was timed at 24 seconds... even with its internal *2 optimization in BASIC.
A note on the scan shows the Amiga at 22.
The BBC Micro is 21, but who knows with what optimizations.
Do you still think this is fair? -
1 hour ago, drpeter said: OK and what is the reason for that interest? Are you imagining which machine you might buy if you were transported back to 1980? Or is it more of a fascination with the development over time of 8-bit computing power? Or is it an interest in seeing if you can make a given system perform faster? For me, the chief fascination is in exploring imaginative solutions to a programming challenge within the tight constraints of a particular system. This started with the challenge of seeing whether a stock Atari could even be programmed to do the full 891 name sort at all. I'm less interested in whether another machine from the year before or the year after might be cajoled into performing the same arbitrary task a few seconds, or even minutes, quicker or slower. Although it's of interest to see how different BASIC implementations make particular programming tasks simpler or harder. But that's just me and where I'm coming from.
For me, these machines were competitors 40 years ago, not nowadays. I'm not so much interested in the C64 or the Apple II or the BBC micro, not because I think they are bad or inferior machines, but simply because I never owned one. I love the Atari simply because it was my first computer, and you never forget your first love, do you? Oh, or Star Raiders and M.U.L.E 😀
Do we really need a specific reason?
My interest is a little bit of all of the above.
There's the 'this is how the machines performed on different benchmarks' aspect, and how the performance changed over time.
There's the what-if angle... like what if some worthless manager had actually made a smart decision to update the BASIC they included with the machine, possibly even before its release. Seriously, a 4%-5% speedup on the CoCo and MC-10 required almost no effort.
The Atari? How do you slow down one of the fastest personal computers on the market (at intro) to where it looks the slowest??? And then you NEVER do anything about it even though BASIC came on a freakin cart? Why couldn't they have
It's also interesting to see the different solutions to a problem, while sticking to just BASIC.
MS BASIC is pretty straightforward with string handling, and that was mostly tweaking the sort to work properly using FOR loops.
Porting the MS BASIC to Apple, and the C64 required changing one command, and it took a couple minutes to look up the Plus/4 display disable POKE.
The new gap calculation was sort of a "how can I do this with less code, and no divide?" challenge, thinking that would speed it up a tiny bit. The fact it seems to work that well was a total surprise, but it may not be as efficient with much smaller, or larger data sets.
The Atari was a bit different. I remember seeing a couple comments about its poor string handling, and was curious to see if and how it could do it.
How many different versions and tweaks were tried before coming to the "final" one?
Imagine if Atari had actually funded a BASIC XL like project, and introduced it by 1981, and faster math to go with it before the XL series came out.
Speaking of XL, did you ever run the last version of the code on BASIC XL and XE?
-
3 hours ago, Faicuai said:Yep... like modifying actual MS-Basic statements by removing literal values, and reference (instead) global constants... or like by running MS-Basic across different machines and finding out a WIDE latitude of optimizations and versions... like PC MS-Basic / Basic, for instance, which catch 2^n operations and handle them as discrete, serial multiplications (from very early on its life)...
Did the algorithm change by putting constants in variables? No
Did the algorithm change by switching from ^2 to *2? Yes
Every other machine benchmarked in Creative computing used ^2.
Does the PC benefit unfairly from recognizing that optimization? Maybe, but then that's part of BASIC itself, not a change to the benchmark.
They could have used the math coprocessor for drastically better results, so if anything, PC BASIC is slower than it could be.
Besides, isn't that why we are benchmarking modern optimized BASICs?
So people can add things like that if they want?
3 hours ago, Faicuai said:I think you got the picture upside-down. What we need to do is (if working with Basic);
- Identify the top 5 wrenches we can throw at ANY interpreter to understand how they handle them, and the effects of their design and implementation choices.
- Throw these wrenches sequentially at EACH interpreter and present, for EACH run, two (2) results: unoptimized (as is), and then optimized to handle the wrench, if the interpreter instance allows for it. If not, timing for 2nd run is the same as unoptimized.
- Totalize all timings, and report the lowest aggregate #.
That is a much more solid (and transparent) approach. Much, much harder to later debunk, in fact.
Because clearly, no optimization was used in the Atari code but lots was used in the MS BASIC code.

MS BASIC wrenches.
It doesn't convert constants to binary at tokenization time. Not even line numbers. They must be converted from ASCII to float every time. *Edit:* It converts the line #s at the start of each line, but not line #s in GOTOs, GOSUBs, or ON GO... statements.
This slows down GOTO & GOSUB
Using FOR loops saves the address of the end of the FOR statement, making it easy to return to with the NEXT, so it's faster than GOTO.
It puts variables in its table in the order they are created, and searches from start to finish to find them. Floats are stored before strings.
This is why MS BASIC programs almost always define variables at the top.
Since you complained about how I optimized the MS BASIC code, I just changed line 0 to define variables in slightly better order for the sort, and the MC-10 time dropped another second. Not much, but if you don't define them it can add up to quite a difference. If I play with it, I might knock off another second.
But you can see, defining/moving a variable or two can make a difference, even if it's small.
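That lookup-order effect can be modeled with a quick sketch (a toy stand-in for the linear variable search, not the real table code):

```python
def lookup(var_table, name):
    """Toy model of MS BASIC's variable search: the table is kept in
    creation order and scanned linearly, so the variables created (or
    pre-declared) first are the cheapest to find on every later
    reference. Returns the number of comparisons needed."""
    for steps, entry in enumerate(var_table, start=1):
        if entry == name:
            return steps
    var_table.append(name)      # first use appends a new entry at the end
    return len(var_table)

table = []
for v in ["I", "G", "T", "Q", "F", "N"]:    # declared up front, hot ones first
    lookup(table, v)
print(lookup(table, "I"), lookup(table, "N"))   # 1 vs 6 comparisons
```

Multiply that comparison count by every variable reference in the inner sort loop and the pre-declaration trick stops looking like superstition.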
It doesn't store a pointer to the current line. Any loop on the same line using GOTO has to start searching from the start of the program.
It does check to see if the line number in the GOTO follows the current line number.
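The two search behaviors just described, scanning from the program start for a backward GOTO but from the current line for a forward one, can be modeled like this (a toy sketch, not the actual line-link code):

```python
def goto_search(line_numbers, current_idx, target):
    """Toy model of the MS BASIC GOTO/GOSUB line search: if the target
    line number is greater than the current line's, scan forward from
    here; otherwise re-scan from the very start of the program.
    Returns (index, lines examined)."""
    start = current_idx + 1 if target > line_numbers[current_idx] else 0
    for examined, i in enumerate(range(start, len(line_numbers)), start=1):
        if line_numbers[i] == target:
            return i, examined
    raise ValueError("UNDEF'D STATEMENT ERROR")

lines = list(range(10, 1010, 10))    # a 100-line program: 10, 20, ... 1000
print(goto_search(lines, 50, 520))   # forward jump to the next line: (51, 1)
print(goto_search(lines, 50, 500))   # backward jump one line: (49, 50)
```

The backward jump of a single line costs a re-scan of half the program here, which is exactly why a tight loop implemented with GOTO gets slower the further down the listing it lives.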
It also has to search for the end of a line character by character to find the end of line marker after a failed IF condition.
If you have a long line of code that depends on an IF at the start of the line, it's better to reverse the logic and GOTO a line a few lines down, falling through for the TRUE condition of the IF. But this also depends on how often the condition is TRUE, so you may have to benchmark it.
So, to make it run faster, this:
50 IF A > 1 THEN do a lot of crap:GOTO 70
60 do something else
70
Becomes:
50 IF A<=1 THEN 70
60 do a lot of crap:GOTO 80
70 do something else
80
Those are the biggies off the top of my head.
NEXT without the loop variable is faster.
Spaces are left in the code except on the Apple II, which removes them during tokenization.
But you can't use certain variables on the Apple II because they may combine with keywords during tokenization.
On other machines, the spaces just slow down the interpreter.
The biggest problem with MS BASIC is inefficient implementation, which is what most of my optimizations target.
It polls the entire keyboard before every token is executed, the parsing is inefficient, it uses 8-bit instead of 16-bit code...
If I convert constants during tokenization, create a faster way to look up lines, and a faster way to look up variables, everything should run faster.
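The constant-conversion part could work roughly like this sketch (the token value and 4-byte float format are hypothetical, purely for illustration; real MS BASIC keeps the ASCII digits in the program and re-parses them on every execution):

```python
import struct

NUMCONST = 0x1D   # hypothetical token tag marking a pre-converted constant

def crunch(line):
    """Toy tokenizer pass for the fix described above: convert numeric
    literals to binary once, at program-entry time, so the runtime can
    load the value directly instead of re-parsing ASCII digits every
    pass through the statement."""
    out, i = bytearray(), 0
    while i < len(line):
        if line[i].isdigit():
            j = i
            while j < len(line) and (line[j].isdigit() or line[j] == "."):
                j += 1
            out.append(NUMCONST)
            out += struct.pack("<f", float(line[i:j]))  # 4-byte binary float
            i = j
        else:
            out.append(ord(line[i]))
            i += 1
    return bytes(out)

tok = crunch("A=100")   # 'A', '=', then a tagged binary 100.0
```

The trade is a little tokenized-program growth for skipping the digit parser entirely at run time, which is the same trade the line-number and variable-lookup ideas make.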
Math offers the biggest room for improvement. A faster LOG function is on the list. -
19 hours ago, drpeter said:I think once again it is becoming unclear what the question is.
Is it which hardware is/was fastest?
Is it which hardware/software combination is fastest?
Is it which hardware/firmware/software combination is fastest?
Is it which hardware/firmware/software/application combination is fastest?
Is the question restricted to hardware and/or software of a certain time period?
Is the question restricted to high-volume off-the-shelf hardware?
Is it 'cheating' to skip parts of the benchmark made unnecessary by the string-handling method chosen?
Is it legitimate to change the sort algorithm, or even the sort type?
Is it legitimate to alter the number of items to be sorted or interfere with the data in other ways?
Is it legitimate to 'mod' the hardware and/or firmware, and if so to what extent?
What speed enhancing 'tricks' in the routine are legitimate?
Is it legitimate to 'mod' even the software itself (Fast Basic)?
etc. etc.
Bearing in mind that the chief thing demonstrated by a benchmark is the system's ability to run that exact benchmark.
Which thought led Bill Wilkinson no less to repeatedly comment that most benchmark results were worth the paper they were written on and little more....
Answers to these questions will depend on the point to be demonstrated, and I'm not at all sure we agree what that is any more, although it's certainly been an interesting debate so far ☺️
And I think some interesting things have been clearly demonstrated and clarified along the way.
And perhaps it's more about the journey than the destination, so long as you can enjoy the ride!
All good points.
I'm more interested in how fast the machines are. No "one rule for you, but no rules for me" stuff.
Machine, which BASIC, and year. Year would also mean when the BASIC came out, in the case of upgrades.
Having to adapt the code to a different BASIC... there has to be some allowance for that.
Unrolling loops, changing the functionality as with going from ^2 to *2 in Ahl's benchmark, changing the sort, etc... it's not the same thing even if the result is the same.
If some machines can sort more names, we can also include that number, and how fast it was. The Plus/4 would win this hands down, or a CoCo 3 with a program called Big BASIC.
The fastest machine is going to differ by year, and newer machines are likely to be faster. That's just the way it is.
It seems a bit disingenuous to say some machines that came out later can't be compared, when the Atari came out later than the Apple II, TRS-80 Model I, and PET. The Atari had the benefit of later development, and it was designed to be better than those machines. The first three were designed to compete with the KIM-1, Altair, etc... which weren't really even personal computers.
The whole I get to use a machine that was introduced later than the Apple, but you can't bring up Apples introduced later thing is laughable.
The Apple III from 1980 was about twice as fast as the Apple II. It should manage around the 107 number I posted for a 2MHz Apple with zero BASIC modifications.
The CoCo came out in 1980, and it could run anything in the ROM bank at double speed. That speeds up BASIC by 30%, a 6309 can make it another 20% faster, its ROM would benefit from the same modifications as the MC-10, and there are ways to make it even faster without an accelerator.
So, is 1980 too late?
-
5 hours ago, Maury Markowitz said:But that's not true, the original basic, Dartmouth, was a compiler. This became common later, indeed, but it is not clear to me that they ever considered stop-n-go to be important.
Interpreters became the thing because they required little RAM, fit in a small ROM, and didn't require disk storage.
-
35 minutes ago, Faicuai said: Because it is SOFTWARE (and accessible to the masses)... and you CAN, of course, bring ANY modern interpreter for the Apple II, etc., whenever you wish so!
But 4 Mhz CPUs? Accelerators? WTF?
Oh, the IIc Plus is too late but software is different... then the date doesn't matter.
Accessible to the masses? You left off not the classes... I thought that was Commodore's line.
Anyone that wants a IIgs can probably buy one if they look around.
eBay prices are a bit absurd (starting around $150 shipped), but you can still find them locally for under $100.
You can also buy new accelerator boards for the IIe. A bit overpriced at $150, but then people are buying them at that price:
http://www.a2heaven.com/webshop/index.php?rt=product/product&product_id=147
I'm not seeing how that is out of reach for people. -
35 minutes ago, drpeter said:Only about 2% difference, vs 18% difference for 561 names
I *think* there was a 15+ second difference for 891 in my tests.
-
23 minutes ago, Faicuai said: I am really not sure what is the point of bringing into the picture a product released in 1988 with a 65C02 core running at 4 MHz (!) It seems that "invincibility" got you so bugged that there was no choice other than traveling in space and time... 9 years ahead of the introduction of the 400 / 800 (!) 🤣
The best thing you could do is simply post the exact code you are running, the specific language / version you are using, so we can all run on our end (preferably on real HW, or the closest-best emulation for such platform). More often than not, it is becoming less clear what you are doing, though.
I also have the impression that Banana-sort (SORTBAS7.LST) will place quite a toll on your candidates, above (with no real wrench loaded at it!)... especially if using Microsoft Basic... I will try to find some time later, to run it myself...
What is the point of bringing a BASIC into the picture that came out over 20 years after the Apple IIc Plus?
And the first Apple II accelerator board came out in 82... something I already mentioned.
I haven't had time to look at the banana sort yet -
4 minutes ago, drpeter said:Just checking something- when in Altirra the fast floating-point math check-box is selected under Configure Emulation, I've assumed that this switches in fast 2K FP code (similar to the Newell chip) as replacement for the OS 2K FP chip.
Is that correct?
Reading what Altirra says when selecting this -'Intercept calls to the floating point math-pack and execute FP math operations in native code', I wonder if I've got this wrong?
If I remember right, one of the options runs the math on the host machine, not the 6502.
Have you been turning that on? That would definitely be fast! LOL -
1 minute ago, drpeter said:The difference is small for the 891 name sort as both gap calculations complete in 26 passes.
The difference for the 561 name sort is huge!
But what about the time difference for 891?
-
The full run on an Apple II takes 215 seconds.
At 2 MHz, it only takes 107 seconds, but every accelerator or fast Apple was clocked higher than that, and should finish it in under 100 seconds.
Those came out long before the modern Atari BASICs.
My IIc Plus will finish this in under a minute.
So, invincible? Against 1 MHz (or less), and crippled BASICs like the Plus/4... maybe.
-
4 minutes ago, Faicuai said:Told you.... invincible.
The FP-package you are using is the exact same located in XE04_FP sys. ROM.
I am still looking forward to results of running Banana-sort on other MS-Basic implementations... as the Atari MS-Basic II version gets CRUSHED by Altirra Basic + Altirra FP Package!
-
1 minute ago, drpeter said:Yes, I only recently twigged that the effect of F on elapsed time is both trigger-happy and wildly non-linear for a comb sort, so didn't pay this enough attention 😞
Given that the Atari code has been implemented without the benefit of string arrays, I think these timings are a near-miracle and a tribute to both Altirra BASIC and the fast maths chip, which for this benchmark speeds Altirra BASIC up 2.5-3x - an almost unbelievable effect and presumably, given the lack of maths involved, relating to a highly-optimised float to integer conversion routine plus optimisations in Altirra BASIC to take full advantage of that, since Atari BASIC is only accelerated by ~1.5x.
Yeah, I think the original calculation tries to drop the gap way too quickly at first.
I haven't done more than a handful of tests, but this gap calculation has always finished faster.
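The two gap rules in question, the original G=INT(G/1.3) and the division-free G=INT(G*.79), can be compared with a quick sketch (a Python stand-in for the BASIC, run on random data, so the pass counts it produces are illustrative rather than the thread's 28-vs-23 figures):

```python
import random

def comb_sort(data, next_gap):
    """Comb sort with a pluggable gap rule; returns a sorted copy and
    the number of passes over the data. Once the gap reaches 1 it
    keeps making passes until one completes with no swaps."""
    a = list(data)
    gap, passes, swapped = len(a), 0, True
    while gap > 1 or swapped:
        gap = max(1, next_gap(gap))
        swapped = False
        passes += 1
        for i in range(len(a) - gap):
            if a[i] > a[i + gap]:
                a[i], a[i + gap] = a[i + gap], a[i]
                swapped = True
    return a, passes

random.seed(42)
names = [random.random() for _ in range(561)]
_, p_div = comb_sort(names, lambda g: int(g / 1.3))    # original rule
_, p_mul = comb_sort(names, lambda g: int(g * 0.79))   # division-free rule
print(p_div, p_mul)   # small rule changes can shift pass counts non-linearly
```

Both rules shrink the gap by roughly the same factor (1/1.3 is about 0.769), but the integer truncation makes the gap sequences diverge, which is why the pass counts jump around rather than scaling smoothly.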
The new Math ROM + Altirra certainly offers quite an improvement.
-
13 minutes ago, drpeter said:Ah. Given the rather random effect of varying the gap calculation on number of passes and time to completion of a comb sort, everyone needs to be using an EXACTLY equivalent gap calculation algorithm and not try to change/optimise the gaps generated.
Tiny changes to F can result in wild non-linear jumps in number of passes and completion times...
Changing the algorithm I was working to (progressive division of G by 1.3, as in the original code- which leads to the aberrant 28 passes for the 561 name dataset as previously noted) to the one you are now using reduces passes from 28 to 23 and the time to completion under the previously-posted conditions from 180.5 seconds to 148.1 seconds (ANTIC on) or from 132.6 seconds to 108.8 seconds (ANTIC off).
An 'optimised' choice of F for this dataset requires only 19 passes, with a further substantial reduction in time...
I think I said I switched to that a couple times, but... sorry if I wasn't clear.
That sounds a little more like what I was expecting here.
*edit*
If the HD6303 offers even a 10% speedup, it will be pretty close to the Atari with the Antic on. -
The Plus/4 turns in a time of 193.15 for the list of 561 names, full run, when turning off the screen for the sort.
-
2 minutes ago, _The Doctor__ said: Is all of this being done on real hardware, or is this a mix... emulation is pointless for benchmarking... you might get a ballpark figure. Some are damn close but none are perfect..
I'm currently using emulation. I don't even have a cassette cable for my MC-10 at the moment.
The #'s are consistent over multiple runs.
No way I'm typing in all that data. For that matter, the only RAM expansion I have is the factory one so it might not hold that many names.
The MCX RAM expansion would though.
The Plus/4 is timing itself, so it should be pretty accurate since the interrupts are based on the emulator's internal timing.
Not sure about the Apple emulator, but it's always going to be the slow MS BASIC machine here.
I might have a serial cable to transfer the program with. -
43 minutes ago, drpeter said:PS ironically there is some sort of bug in the Turbo-BASIC compiler which means that in the specific case of the indexed-DATA-statements-as-string-array technique, compiled code actually runs significantly slower than interpreted source code!
567.9 seconds for full routine, 891 names (compiled)
328.3 seconds for full routine, 891 names (interpreted) (Antic off in either case)
Ouch!
-
51 minutes ago, drpeter said:By the way, are you still getting the 28 passes for 561 names? if not, we've lost the direct comparison of data churned...
28? Maybe for the 891, but not 561. I only get 23 passes with this sort using the modified gap calculation... which was mostly supposed to eliminate the division which is slow. Not that I'd be able to measure the difference.
It only makes two passes at 1, where 891 required 3.
O = constant for One, Z = constant for Zero
0 I=0:G=I:N=I:C=I:D=I:E=I:NW=I:H=I:T=I:O=1:Z=0:GOTO20
1 Q=O:F=.79:G=INT(N/1.3):FORT=OTOZSTEPZ:T=0:PRINT"PASS =";Q;"GAP =";G
2 FORI=OTON-G:IFA$(B(I))>A$(B(I+G))THENT=B(I):B(I)=B(I+G):B(I+G)=T
3 NEXT:Q=Q+O:G=INT(G*F):IFG<OTHENG=O
4 IFG>OTHENT=O
5 NEXT:RETURN -
2 minutes ago, drpeter said:I think this applies to MS BASIC but not to Atari BASIC family? Can anyone confirm?
I think only MS BASIC
-
47 minutes ago, JamesD said:No sort change, unrolled data statements, names and numbers, 561 names
MC-10 New ROM
Scan data, dimension, load data, sort, check sort, print names: 171 seconds
Scan data, dimension, load data, sort: 149 seconds
Dimension, load data, sort, check sort, print names 159 seconds
*edit*
Non-unrolled data statements are a second or so faster.
Just an FYI, I found something I forgot to optimize in the sort before.
This uses constants, which have to be converted from ASCII to float every time:
4 IFG>1THENT=1
So I switched it to O which is defined as 1 at the top
4 IFG>OTHENT=O

So, what about strings?
in Atari 8-Bit Computers
Posted
31 minutes, 20 seconds.
Congrats, you broke MS BASIC
That would be dog slow even at double speed.