Jump to content
Maury Markowitz

Anyone for a little benchmarking?

Recommended Posts

19 hours ago, ChildOfCv said:

An alternative version used by MS-BASIC PC and GW-BASIC is that when it parses the line number as it enters a statement into program memory, it uses a "line number" token followed by the 16-bit integer line number.  When it has to look up a line while running, it replaces the token with a "line pointer" token and the 16-bit integer that follows becomes the pointer to the line instead.  Of course if you edit or save the program, it first has to revert all of the tokens to line numbers.

That is exactly what I first considered, but using different tokens instead, simplify's parsing at run time.
I may still do this do to some other issues it would solve.

 

Share this post


Link to post
Share on other sites
On 7/9/2019 at 1:32 AM, ChildOfCv said:

An alternative version used by MS-BASIC PC and GW-BASIC is that when it parses the line number as it enters a statement into program memory, it uses a "line number" token followed by the 16-bit integer line number.  When it has to look up a line while running, it replaces the token with a "line pointer" token and the 16-bit integer that follows becomes the pointer to the line instead.  Of course if you edit or save the program, it first has to revert all of the tokens to line numbers.

That's the idea that was tossed around in earlier posts, but that second part is a killer on the 8-bit. On a PC you're likely talking about a system with a small program running in a (relatively) large memory space that it can scan with some speed. Also, in Atari BASIC the RHS of the GOTO is an expression, not a constant, so in the (rare) case the expression produces a different line number you'd have to be able to deal with that too.

Share this post


Link to post
Share on other sites
On 7/8/2019 at 4:02 PM, thorfdbg said:

It is a function that maps a large set (such as the set of all line numbers) to a smaller set (such as 256 integers)

 

Not a hash function, the hash function 🙂

 

This is not how I thought Turbo worked, I thought it just pushed lines onto a list as they were encountered. But now I see how this works, it makes sense.

 

That does raise the question those - if you're considering just the cases where the GOTO/GOSUB has a constant on the left, I wonder what the actual number of GOTO targets actually is? I'm currently hand-coding What to Do After You Hit Return, and so far it's been maybe 15 targets per program with perhaps 30 as the outside bound.

 

On 7/8/2019 at 4:02 PM, thorfdbg said:

That is not quite as effective as keeping the absolute address of a line on GOSUB or FOR/NEXT, but has the advantage that it also works for GOTO and GOSUB, and avoids the problem of having to keep the pointers on the stack consistent in case someone inserts or removes lines from the program within a subroutine or a FOR/NEXT loop.

So what does it do if the code is modified during runtime such that the starting point of the 256-line block moves?

Share this post


Link to post
Share on other sites
On 6/29/2019 at 12:30 PM, dmsc said:

Hi!,

 

 

Well, if you need floating point arrays in FastBasic, just ask!

 

Attached is a new beta (testing) version, compiled from current sources, adding floating point arrays. This version has many changes from the current released 4.0, I should do a proper release soon.

 

With this version, and using the "AtariOS-800XE-Rev03-FastMath.rom" rom that you posted on another thread, the timings are:

 

image.thumb.png.c7ff01391017d764907150c3b483a25a.png

 

The benchmark source is in the ATR image, the changes are like this (line 24):

image.thumb.png.c9e45613bd78a8afa0d899e84579d9d0.png

 

Have Fun!

 

 

fastbasic-4.1-beta2.atr 130.02 kB · 3 downloads

NICE!!!!

 

Just got back from Europe (vacations) and still trying to catch up here and P.M.s (I even have some testing homework to do!)

 

Well, just the fact that FastBasic now effectively supports arrays of Bytes, Integers (x2 bytes) and Float sets it on a class of its own...

 

...and, boy, would you look at those exec times! Almost THREE seconds shaved from my last recorded run! 

 

THANKS for this update!

Edited by Faicuai
  • Like 1

Share this post


Link to post
Share on other sites
On 6/26/2019 at 5:07 PM, Maury Markowitz said:

Ah yes, so it is not really the benchmark, and not really Atari BASIC.

 

@Faicuai, are you willing to run it again using real Atari BASIC instead of Altirra, and without merging the lines?

 

I just noticed that Wilkinson did the same on his version of the sieve, so it's invalid as well.

Sorry for taking so long to address this, but here are the results of Test 2 (complete SEPARATE and extracted out of the whole suite, typed in exactly as original test with 3-digit line numbers and no shortcuts, with the addition of ANTIC=OFF/ON commands), on Altirra Basic 1.56 (8K), and both XL03 and XE04 16K system ROMs tested with Incognito:

 

XL03 ROM (High-Performance FP pack): 2.8 secs.

XL04 ROM (Altirra FP pack): 2.0 secs.

 

As predicted, no significant deviations in performance.

 

Cheers!

Edited by Faicuai

Share this post


Link to post
Share on other sites
On 6/29/2019 at 10:49 AM, JamesD said:

After some thought, this is probably what I will do.  I'll divert the normal GOTO & GOSUB tokens to routines that turn the 16 bit line numbers into pointers, switch the code to alternate tokens, and then call the normal function.  I don't have to worry about ON, I'll just keep converting until I get to a : or the end of the line, skipping commas if there are any.  After that, the GOTO or GOSUB will just load the pointer as if it had already searched for the line number, and will continue normally from there.
There's no startup delay, numbers only need converted when the GOTO or GOSUB is executed, and execution is always fast after that point.
Converting back will have a delay, but that's not such a big deal as it only has to be done once before entering multiple new lines.

Atari BASIC will probably need tokens just due to the ability to use variables instead of constants.
I thought using tokens might allow me to add GOTO variable capability to MS BASIC, then I remembered that floating point doesn't represent all numbers exactly.  Do I truncate?  Round down?  Round up?  Want a reason to use BCD?  This is a good one.
If I can simply truncate numbers to represent the entire BASIC range, I may add this feature.

Changing to 16 bit integers, and pointers does present a few challenges on MS BASIC.
The code that scans through a line assumes ASCII, and isn't smart enough to skip these, so I *may* need to use tokens as well.
Maybe use tokens $FE, and $FF so I can just test for > $FD or something like that. 
I'm not sure yet though, $FF was used on Extended COLOR BASIC for 2 byte tokens.  Guess I could to the same with $FD.
MS BASIC is (was) very dependent on searching for zero as an end of line marker. 
If it has to drop to the next line, it searches for the end of line marker even though the first thing on a line is the pointer to the next line.
Yup, if there is an IF at the start of the line, and the line is 200 characters long, it has to skip over 190+ characters to find the next line.
After I started keeping the pointer to the current line rather than the current line number, that went away, but it still has to skip constants searching for the next statement, and I'm sure for a few other things.
If I want to convert ALL constants, not just line numbers, I'll have to use tokens to distinguish between INT, FLOAT, and variables.
But regularly used constants can be defined at the top of the program to skip converting the type, so this isn't as big of a deal there.
I do need to see if I can speed up searching the variable table, or eliminate the search completely. 

Sorry if I've hijacked the thread a bit, just thought some of this might be useful to further speed up Atari BASIC as well.
We have some of the same issues to deal with.

Very nice and interesting discussion, to say the least...

 

But, if I were on to this path, I would have started by simply taking a look at Apple's INTEGER Basic... That thing is REALLY FAST, considering the time and age and Apple II's 1.0 Mhz clock-speed... How things got implemented there, is probably a good starting point...


Cheers!

Share this post


Link to post
Share on other sites
6 minutes ago, Faicuai said:

Very nice and interesting discussion, to say the least...

 

But, if I were on to this path, I would have started by simply taking a look at Apple's INTEGER Basic... That thing is REALLY FAST, considering the time and age and Apple II's 1.0 Mhz clock-speed... How things got implemented there, is probably a good starting point...


Cheers!

Integer BASIC is not a MS BASIC, so it is very different.  I'd be writing an interpreter from scratch.
Supporting integers is on my list of wants, but it requires a lot of changes to the parser.  Some of the changes I've mentioned will actually make supporting integers easier in the long run.  The big issue with integers, is that I need to create an integer math library, searching for integer variables in variable storage space, possibly modify the stack frame, modify expression evaluation, etc...  so it's no simple task.
As I said, I may use tokens to identify different variable types to simplify parsing, even if I'm not thrilled with the idea.

Ultimately, the best way to speed up a BASIC program is with a compiler. 
I've made progress with several projects in that area, and I may even ditch further interpreter optimizations once those are done.

Share this post


Link to post
Share on other sites
1 hour ago, Faicuai said:

But, if I were on to this path, I would have started by simply taking a look at Apple's INTEGER Basic... That thing is REALLY FAST, considering the time and age and Apple II's 1.0 Mhz clock-speed... How things got implemented there, is probably a good starting point...

They have the advantage of storing all numbers as 16-bit instead of 40-bit. That's no small advantage!!!

 

I've been thinking a lot about this - too much really, given no code to actually test it. But...

 

MS BASIC leaves constants in programs in their original text format. So FOR I=1 TO 10 has "1 TO 10" as text. Atari BASIC tokenizes these values into FP at edit time. For something like a GOTO, it then has to convert them back to 16-bit format to do the search.

 

So the thing I've been mulling: considering most constants are small, likely 0 and 1 dominating the list and <256 dominating the remainder, is the MS solution faster or slower than the Atari version? A lot of numbers are going to be a single digit, so the conversion is rapid, but I strongly suspect even a single-digit conversion still takes longer in MS because of all the tests and branches on things like "is the next character a legal part of a number?".

 

I'll have to cycle count both versions to find out which is ultimately faster, but I suspect the Atari solution is better even for small numbers.

 

One thing that seems like a total no-brainer is to replace 0 and 1 with their own tokens, say $10 and $11. In that case they can be pushed onto the eval stack using their own routines that simply inserts five 0's (etc) so every instance of these now uses up 1 byte of memory instead of 6. There's no conversion in this case, so it will be just as fast as a straight copy.

Edited by Maury Markowitz

Share this post


Link to post
Share on other sites
57 minutes ago, JamesD said:

Supporting integers is on my list of wants, but it requires a lot of changes to the parser.

Perhaps not that much though.

 

Have a flag somewhere that you reset when you enter the eval routine. Set it to "yes, this could be integer". Now parse the line as normal, converting consts to FP representation, but any time you encounter something that cannot possibly be an integer, whether that's a "0.5" or a "sin(10)" or even "A" as opposed to "A%", set the flag to "nope, there's FP here".

 

Now if you get to the end of the eval and the flag is still set, you still have everything you need on the eval stack. So convert operations tokens to their int-only versions, which might just be the + and -, run FP-to-INT on the constants, and replace the = with one that does an invisible FLOAT() on the result.

 

Now if you want to get a little trickier, and you have int variables, all you need to do is add a couple more versions of =, so that you have one that assigns float results to float vars, int results to int vars, and int-to-float and float-to-int.

 

This isn't a perfect solution, because it means something like A=A+10 will use FP to store the 10, because the A on the RHS forced it to be FP. But it seems this solution means you don't have to do anything to the eval stack at all - you can use the 40-bits to store the 16-bit vals as long as the new operator tokens know this. So the stack doesn't have to change, neither does the actual operation of the eval beyond keeping track of the flag, and this avoids all the back-n-forth conversions during the eval. Its also extensible, you can add new int math routines as you want, start with + and - and maybe add * and / later.

Share this post


Link to post
Share on other sites
On 6/29/2019 at 8:49 AM, JamesD said:

MS BASIC is (was) very dependent on searching for zero as an end of line marker. 
If it has to drop to the next line, it searches for the end of line marker even though the first thing on a line is the pointer to the next line.
Yup, if there is an IF at the start of the line, and the line is 200 characters long, it has to skip over 190+ characters to find the next line.
After I started keeping the pointer to the current line rather than the current line number, that went away, but it still has to skip constants searching for the next statement, and I'm sure for a few other things.

Just noticed this quote since someone else quoted it.

 

On PC, there's a routine that searches the line pointers for a line number.  If the target line is greater than the current line number, it will start at the current line number pointer.  Otherwise it starts at the beginning of the program and fast-searches the line pointers.  I know there is at least one circumstance where it looks for the end of THIS line, though.  Well, I think that has to do with handling the ELSE case, and it's actually searching for the end of the current statement (its search also stops at ELSE tokens).

 

I guess if other platforms don't do the line pointer search, it's probably because that special routine was axed for space.

Edited by ChildOfCv

Share this post


Link to post
Share on other sites
On 6/29/2019 at 10:49 AM, JamesD said:

MS BASIC is (was) very dependent on searching for zero as an end of line marker. 
If it has to drop to the next line, it searches for the end of line marker even though the first thing on a line is the pointer to the next line.
 Yup, if there is an IF at the start of the line, and the line is 200 characters long, it has to skip over 190+ characters to find the next line.

Wait, I totally missed this too... why the heck does it do this?!

Share this post


Link to post
Share on other sites
6 hours ago, ChildOfCv said:

Just noticed this quote since someone else quoted it.

 

On PC, there's a routine that searches the line pointers for a line number.  If the target line is greater than the current line number, it will start at the current line number pointer.  Otherwise it starts at the beginning of the program and fast-searches the line pointers.  I know there is at least one circumstance where it looks for the end of THIS line, though.  Well, I think that has to do with handling the ELSE case, and it's actually searching for the end of the current statement (its search also stops at ELSE tokens).

 

I guess if other platforms don't do the line pointer search, it's probably because that special routine was axed for space.

If you mean it searches through the linked list looking for line numbers, that is the same.
I added ELSE to my custom version of the interpreter, but it would be faster without it since it doesn't have to check to see if there is an ELSE.
It just starts at the next line if the line number is greater than the current line.  
The original code searches for the end of the current line before starting the line number search.
My version just starts checking line numbers using the pointer to the current line.
 

Share this post


Link to post
Share on other sites
4 hours ago, Maury Markowitz said:

Wait, I totally missed this too... why the heck does it do this?!

Because MS BASIC (at least the versions I've looked at) store the current line number, rather than a pointer to the current line.
That means there is no other way to find the linked list the lines are part of.
Why does it store current line number instead of a pointer? 
I haven't figured that out yet. 
A few bugs have been introduced over several versions, and once I track those down I might find the source of this madness.
If not, it may just be a bad design choice.

 

Share this post


Link to post
Share on other sites

Join the conversation

You can post now and register later. If you have an account, sign in now to post with your account.

Guest
Reply to this topic...

×   Pasted as rich text.   Paste as plain text instead

  Only 75 emoji are allowed.

×   Your link has been automatically embedded.   Display as a link instead

×   Your previous content has been restored.   Clear editor

×   You cannot paste images directly. Upload or insert images from URL.

Loading...

  • Recently Browsing   0 members

    No registered users viewing this page.

×
×
  • Create New...