Jump to content
IGNORED

Some thoughts on BASIC tuning


Maury Markowitz

Recommended Posts

Warning, brain dump follows!

 

After the benchmarking thread of last summer, I began getting interested in seeing which parts of BASIC deserve some attention, as opposed to guessing. For instance, one might guess that you would want to optimize GOTO because they're so slow. But GOTOs aren't all that common, and often run only once. Is that the right place to spend bytes, or some other place?

 

So I started thinking about a tool that would collect statistics from BASIC programs on which to make informed decisions. These would be static, not runtime traces or anything like that. At first I thought I could do it all in something like Python, but as the complexity became clearer I decided the "simplest" solution was to just write an entire BASIC in lex/yacc. I found a useful starting point and I've been hacking it an hour here an hour there ever since. And as it turns out, not so simple...

 

Well as of yesterday it correctly parsed and partially ran Super Star Trek - partially because I don't eval DEF FNs yet (although it should be possible). And the statistics I'm seeing definitely seem useful! For instance, in Super Star Trek (SST) one finds:

 

  • A total of 825 statements on 425 lines, which implies about 2 statements per line. This hides the fact that SST generally either has a single statement, PRINT, or 3 or 4 statements, which may or may not be typical of other programs.
  • There are 709 numeric constants, of which only 41 are not integers. Of the 668 integer constants, 139 of them are zero, and 179 of them are ones! 169 of the remaining are a single digit (2 through 9). Looking at it another way, 490 of the numbers could be represented in a single byte, and 178 in two bytes - which is every integer in the app.
  • There are 272 string constants, of which 140 are larger than 16 chars. The largest is 62.
  • There are 184 branches; 7 ONs, 36 GOSUBs and 151 GOTOs. Of the GOTOs, a slight majority jump forward, whereas in IF statements the vast majority are backward. I didn't bother checking GOSUBs, I assumed they were generally forward, and ONs can go both so I didn't collect these.

 

So the immediate lessons one learns:

 

The VAST majority of numeric constants in BASIC are single-digit integers. Zero and one represent over a third of all numbers. This STRONGLY suggests that in an Atari-a-like BASIC where numbers are converted at parse time, tokens $D0 and $D1 should be used to represent these constants. In SST, this would save 1390 bytes of memory, which is not a small amount! Using $D2 through $D9 for the rest of the single-digit variables would save an additional 845 bytes. This too seems worthwhile, although it would slightly complicate the code compared to zero and one only. With this change we've already saved 1875 bytes.

 

At runtime, when one of these tokens is encountered, the evaluation stack is loaded with a constant, rather than copying over the next five bytes. Everything else is unchanged. This appears to be a trivial modification that would have BIG paybacks in terms of program storage (and load time too!) while requiring only a few bytes of code in the parser and runtime.

 

One could also attack this problem by using a binary format for small numbers, say a $D0 followed by a one or two-byte value. This improves the number of values that can be represented in this format, but requires a runtime conversion (assuming no INT math package). If you go the one-byte route you get 490 replacements at 2 bytes, a savings of 4 per number, for a total of 1960. If you go two-byte, you get 668 replacements with a savings of 3 bytes per, so 2004.

 

This sort of comparison is precisely why I wrote the parser. It seems the single-digit-token approach saves almost the same amount of memory, but would be much easier to implement in code, and be faster as well. This is assuming you don't have a separate integer math package, at which point you need a two-byte format anyway.

 

Turning to the GOTO/GOSUB/ETC issue, which was notoriously slow in Atari BASIC. Since the value is ultimately a 16-bit int, that does provide a slight additional argument that there should be 16-bit constants, as it would save that one conversion at runtime. However, that would also require a separate "GOTO-16-bit" token and a little code to look for this case and do the conversion. This is because if the line is "GOTO I*10" then you don't want to convert the 10 to 16-bit, leaving it as FP makes more sense.

 

What is interesting here is that the majority of branches are backward. I guess this shouldn't be entirely surprising. But what that does suggest is that the "big problem" in Atari, that the line search always starts at the top, likely isn't as much of a problem as you'd guess. The counterexample is GOSUB - of the 49 forward branches, 36 of them are GOSUBs!

 

So this is also extremely informative, we can make better decisions on how to optimize these lookups. The options include:

 

1) having a separate 32-bit token for line numbers, where the first 16-bits is the line and the second is the memory address. Now one can branch immediately to the code without any searching - this is what BASIC XL did IIRC. The problem with this approach is that every time the program is edited you have to go through the entire source and do the lookups. So let's exclude this one, given the statistics above, this really doesn't seem like a worthwhile approach.

 

2) Turbo's solution, store a small table of hashed numbers for more rapid lookup. After calculating the number (whether its a constant or not), you hash the result, find the closest match in the hash, and start from there. So for a few burned cycles you skip over a lot of code. Better yet, when the program runs you only have this little cache to recreate, fast enough that you could do it with every edit.

 

3) my febrile idea, a "variable like" table - Atari stores its variables in a table that is created when the program is typed in. Perhaps we could use this solution for GOTO? In this case the table would simply have 127 entries of 16-bits. At edit time, any branch-to-constant is replaced by an index in this table, and the original number is saved as a string in the same fashion as variable names. At runtime, if the value is zero you do the search (potentially also doing (2) if you wish) and drop that number into the slot in the table. If the code changes, this table is set to zero.

 

(3) only works if the table is small. With 194 branches in SST (although I didn't see what numbers they branched too, the number of unique branches might be smaller) it seems unlikely that (3) works. It looks like some variation on Turbo is the way to go.

 

Still to do: I need to fix the problems with variable counting and add code for FOR loop stats. I strongly suspect that one should have two separate FOR handlers, one for FOR I = (someint) TO (someint) (STEP int), but I'll know once it's implemented. I also need to get command line parsing working so you can script it.

 

The code is GPL and once I get the last bits working I'll post to GitHub. It's currently running in Xcode because I like IDEs over CLIs, but the code itself should run on anything remotely Unix-y, including GW on win.

 

  • Like 2
Link to comment
Share on other sites

 

32 minutes ago, Maury Markowitz said:

Zero and one represent over a third of all numbers

 

32 minutes ago, Maury Markowitz said:

At runtime, when one of these tokens is encountered, the evaluation stack is loaded with a constant, rather than copying over the next five bytes. Everything else is unchanged. This appears to be a trivial modification that would have BIG paybacks in terms of program storage (and load time too!) while requiring only a few bytes of code in the parser and runtime.

 

I suppose this is exactly why some processors have explicit instructions to load zero (and sometimes one) into a register or memory location.

Link to comment
Share on other sites

3 hours ago, Maury Markowitz said:

The VAST majority of numeric constants in BASIC are single-digit integers. Zero and one represent over a third of all numbers. This STRONGLY suggests that in an Atari-a-like BASIC where numbers are converted at parse time, tokens $D0 and $D1 should be used to represent these constants. In SST, this would save 1390 bytes of memory, which is not a small amount! Using $D2 through $D9 for the rest of the single-digit variables would save an additional 845 bytes. This too seems worthwhile, although it would slightly complicate the code compared to zero and one only. With this change we've already saved 1875 bytes.

 

Turning to the GOTO/GOSUB/ETC issue, which was notoriously slow in Atari BASIC. However, that would also require a separate "GOTO-16-bit" token and a little code to look for this case and do the conversion. This is because if the line is "GOTO I*10" then you don't want to convert the 10 to 16-bit, leaving it as FP makes more sense.

 

What is interesting here is that the majority of branches are backward. I guess this shouldn't be entirely surprising. But what that does suggest is that the "big problem" in Atari, that the line search always starts at the top, likely isn't as much of a problem as you'd guess. The counterexample is GOSUB - of the 49 forward branches, 36 of them are GOSUBs!

 

2) Turbo's solution, store a small table of hashed numbers for more rapid lookup. After calculating the number (whether its a constant or not), you hash the result, find the closest match in the hash, and start from there. So for a few burned cycles you skip over a lot of code. Better yet, when the program runs you only have this little cache to recreate, fast enough that you could do it with every edit.

 

3) my febrile idea, a "variable like" table - Atari stores its variables in a table that is created when the program is typed in.

 

Still to do: I need to fix the problems with variable counting and add code for FOR loop stats. I strongly suspect that one should have two separate FOR handlers, one for FOR I = (someint) TO (someint) (STEP int), but I'll know once it's implemented. I also need to get command line parsing working so you can script it.

Concerning "short integers": TurboBasic provides %0 to %3 as tokens, but one has to actively use them. I afraid they never became very popular because you have to remember to put them in. To reduce the size of the program, it was already custom in Atari Basic to replace constants by variables, to be initialized at program start. I believe there was also an automated solution for that which put in variables C0=0,C1=1,C2=2... at the end of the program.

 

Concerning line number searching, Basic++ optimizes it in so far as it scans from the current line if it is a forward branch. TurboBasic keeps a 256 byte lookup table, indexed by the high-byte of the line number. However, Turbo also uses this type of line number search for FOR/NEXT loops, and for RETURN. Basic++ keeps here the address of the target line which is certainly faster. The ABC compiler creates a line number table as suggested, and uses a bisection to find a target line.

 

The comment about FOR I do not understand. What is the other "FOR"?

 

 

Link to comment
Share on other sites

After last summer, and working on patching OSS Integer Basic for compatibility with SDX / Incognito (plus everything behind them), I confirmed that, what matters the most, is to be able to handle stuff that is integer in nature as PURE INTEGERS, and ONLY the rest (not related to memory access, indexing, etc.) as floats... plus caching of lines locations as actual memory addresses, etc.

 

The speed difference is ENORMOUS. Integer Basic SOARS like, pretty much, BBC Basic does (for the same reasons, essentially).

 

That is the bulk of performance-robbing crap with these interpreters.

Edited by Faicuai
Link to comment
Share on other sites

Some Basics give built in constants like TRUE/FALSE.  Though TRUE is usually -1.

Integers are only fully useful if the conversion to do maths with other formats doesn't slow the whole thing down to worse than what it would have been in BCD alone.

The C= implementation there isn't real good - you're losing a byte each reference from the % as well as suffering conversion overhead.

Link to comment
Share on other sites

5 hours ago, thorfdbg said:

Concerning "short integers": TurboBasic provides %0 to %3 as tokens, but one has to actively use them.

 

How does one use these? Simply I=%1?

5 hours ago, thorfdbg said:

I afraid they never became very popular because you have to remember to put them in. To reduce the size of the program, it was already custom in Atari Basic to replace constants by variables,

Yes, but this is something the user has to know about. The parser can do this for them and then every program gets it.

5 hours ago, thorfdbg said:

The comment about FOR I do not understand. What is the other "FOR"?

The one where it goes from one int to another int by one. In that case you don't use the FP at all, and just do a 16-bit add right inline.

Link to comment
Share on other sites

3 hours ago, Rybags said:

Integers are only fully useful if the conversion to do maths with other formats doesn't slow the whole thing down to worse than what it would have been in BCD alone.

I'm talking about replacing the entire number with a separate token and the code handler for that token inserts the FP number directly into the FPacc. There are no integers anywhere and no conversions of anything from anything to anything.

Quote

The C= implementation there isn't real good - you're losing a byte each reference from the % as well as suffering conversion overhead.

Again, I'm not talking about integers or integer values.

Edited by Maury Markowitz
Link to comment
Share on other sites

Hi!

11 hours ago, Maury Markowitz said:

The VAST majority of numeric constants in BASIC are single-digit integers. Zero and one represent over a third of all numbers. This STRONGLY suggests that in an Atari-a-like BASIC where numbers are converted at parse time, tokens $D0 and $D1 should be used to represent these constants. In SST, this would save 1390 bytes of memory, which is not a small amount! Using $D2 through $D9 for the rest of the single-digit variables would save an additional 845 bytes. This too seems worthwhile, although it would slightly complicate the code compared to zero and one only. With this change we've already saved 1875 bytes.

 

At runtime, when one of these tokens is encountered, the evaluation stack is loaded with a constant, rather than copying over the next five bytes. Everything else is unchanged. This appears to be a trivial modification that would have BIG paybacks in terms of program storage (and load time too!) while requiring only a few bytes of code in the parser and runtime.

 

 

FastBasic does this, there are tokens for "0", "1" and for numbers less than 256, so those are smaller and faster.

 

11 hours ago, Maury Markowitz said:

Turning to the GOTO/GOSUB/ETC issue, which was notoriously slow in Atari BASIC. Since the value is ultimately a 16-bit int, that does provide a slight additional argument that there should be 16-bit constants, as it would save that one conversion at runtime. However, that would also require a separate "GOTO-16-bit" token and a little code to look for this case and do the conversion. This is because if the line is "GOTO I*10" then you don't want to convert the 10 to 16-bit, leaving it as FP makes more sense.

 

Well, this is why FastBasic does not have GOTOs :) . In fact, the FastBasic bytecode does have a JUMP instruction, used internally for the loops, EXIT and the IF/THEN, but the parameter is the address of the target instruction so the jump is really fast. By replacing GOTO targets in other BASICs with addresses on first RUN you could achieve a big speedup.

 

11 hours ago, Maury Markowitz said:

3) my febrile idea, a "variable like" table - Atari stores its variables in a table that is created when the program is typed in. Perhaps we could use this solution for GOTO? In this case the table would simply have 127 entries of 16-bits. At edit time, any branch-to-constant is replaced by an index in this table, and the original number is saved as a string in the same fashion as variable names. At runtime, if the value is zero you do the search (potentially also doing (2) if you wish) and drop that number into the slot in the table. If the code changes, this table is set to zero.

 

This exists in TurboBasic, the PROC/EXEC and GO# use labels that are stored in the same variable table, and the target address is populated on run.

 

In FastBasic, labels are different than variables, so the parser has two name tables. Also, FastBasic has a table of label addresses and where each label is used, this is used to patch the JUMP and CALL addresses with the target address when know. And finally, for loops FastBasic has a stack of loop addresses, with the type of loop and the address of loop start/end. But one of the differences of FastBasic with other interpreted basics is that all that tables and stacks are only maintained during parsing, on running all that data is not needed as all the addresses are already stored with the bytecode.

 

11 hours ago, Maury Markowitz said:

(3) only works if the table is small. With 194 branches in SST (although I didn't see what numbers they branched too, the number of unique branches might be smaller) it seems unlikely that (3) works. It looks like some variation on Turbo is the way to go.

There is no reason why the table could be bigger, and I doubt the number of branch destinations are that much.

 

11 hours ago, Maury Markowitz said:

Still to do: I need to fix the problems with variable counting and add code for FOR loop stats. I strongly suspect that one should have two separate FOR handlers, one for FOR I = (someint) TO (someint) (STEP int), but I'll know once it's implemented. I also need to get command line parsing working so you can script it.

 

The code is GPL and once I get the last bits working I'll post to GitHub. It's currently running in Xcode because I like IDEs over CLIs, but the code itself should run on anything remotely Unix-y, including GW on win.

 

You could have a look at my turbo-basic parsing tool, at https://github.com/dmsc/tbxl-parser , that tool can parse all TurboBasic XL syntax and optionally do some useful transformations:

- delete unused variables,

- delete unused line numbers (line numbers that are not referenced),

- replace constants with variables storing the number (for example, all "0" are replaced by %0)

- remove extra parentheses and other unneeded tokens.

 

At end of parsing the tool writes the number of variables and lines.

 

For FastBasic, I have some real profiling tools, I run test programs and the profile counts the number of tokens executed and the time spent executing each token, this has been very valuable to analyze and improve the speed of the interpreter.

 

Have Fun!

 

  • Like 3
Link to comment
Share on other sites

17 hours ago, Maury Markowitz said:

I am not very familiar with BBC but I should be.

 

Do you know of any introduction to its internal workings in the fashion of Chapter 10 of De Re?

BBC-Basic is so legendary that is has already gone cross-platform! I am running it even on my Indust/GT Z80s (under CP/M) (!)

 

Here's the best resource I've found for it. It can be ported to Atari's memory space (it is originally written for 6502) but it will create some serious contention on Page-0 with Atari's own O/S:

 

http://www.bbcbasic.co.uk/bbcbasic.html

 

I will post this weekend my "FP-Index" profiler from BBC-Basic (as well as Altirra-Extended Basic), which I've put together and tested with the idea of simultaneously profiling BOTH speed and precision in a relatively reliable manner. It provides a final figure-of-merit which Atari Basic (and OEM OS) crank out at 1.0 level.

 

You will NOT believe what BBC-Basic is capable of achieving, in relative terms. Not even Altirra X-Basic is able to fully catch up with it). My conclusion is that, to reap the benefits of BCD, a much higher (relative) computational power was required for the time it got introduced with Atari. Plain and simple.

 

Cheers!

 

Link to comment
Share on other sites

On 2/29/2020 at 1:45 PM, Faicuai said:

BBC-Basic is so legendary that is has already gone cross-platform! I am running it even on my Indust/GT Z80s (under CP/M) (!)

 

Here's the best resource I've found for it. It can be ported to Atari's memory space (it is originally written for 6502) but it will create some serious contention on Page-0 with Atari's own O/S:

 

http://www.bbcbasic.co.uk/bbcbasic.html

 

I will post this weekend my "FP-Index" profiler from BBC-Basic (as well as Altirra-Extended Basic), which I've put together and tested with the idea of simultaneously profiling BOTH speed and precision in a relatively reliable manner. It provides a final figure-of-merit which Atari Basic (and OEM OS) crank out at 1.0 level.

 

You will NOT believe what BBC-Basic is capable of achieving, in relative terms. Not even Altirra X-Basic is able to fully catch up with it). My conclusion is that, to reap the benefits of BCD, a much higher (relative) computational power was required for the time it got introduced with Atari. Plain and simple.

 

Cheers!

 

 

Here, both Altirra Extended Basic vs. BBC-Basic, running FP-Index between 1.79 to 2.0 Mhz. Result? Not only BBC-Basic is 10% faster, but it is also TEN TIMES more accurate than the already excellent BCD-based Altirra X-Basic (!)

 

1E47C1F3-C5B2-4630-B9B7-13B6BC660AC8.thumb.jpeg.ae00d3239e171489d6d6495a2283bc51.jpeg

 

F87D3943-CAC5-40B8-B65D-F43D191CD081.thumb.jpeg.788d12f1f2b2dcd6de0566d56fcfff1d.jpeg

 

Here is the .ATR with the source of "FP-Index" in Atari Basic (file is FPTEST34.BAS):

 

 

Scratchpad-DOS-130K-I.ATR

 

REMEMBER:

 

1. Atari Basic Rev. C + Atari OEM OS math will yield in a 1.00 index (DMA=OFF). Notice that we are dealing above with 10.30 to 11.59 indexes (!!!!)

 

2. Fast-Basic V4.1-BetaX version yields in 7.24 performance index, with Altirra-OS FP pack (2Kbytes). The highest you will get there. I will post the source later (it's not yet transferred from my 800/i)

Edited by Faicuai
Link to comment
Share on other sites

This is not so much astonishing given the problems of BCD arithmetics in general. This has been discussed before. Not only is it slower because there is no easy way to multiply or divide, it is also less precise as the error increases with the size of the basis. Now, Atari Basic has a basis of 100 (!). I don't know what BBC basic uses, but if it is a sane implementation, probably a basis of 2 which gives not only faster execution, but also more precise computation.

 

Link to comment
Share on other sites

Join the conversation

You can post now and register later. If you have an account, sign in now to post with your account.
Note: Your post will require moderator approval before it will be visible.

Guest
Reply to this topic...

×   Pasted as rich text.   Paste as plain text instead

  Only 75 emoji are allowed.

×   Your link has been automatically embedded.   Display as a link instead

×   Your previous content has been restored.   Clear editor

×   You cannot paste images directly. Upload or insert images from URL.

Loading...
  • Recently Browsing   0 members

    • No registered users viewing this page.
×
×
  • Create New...