Yet another BASIC thread

Maury Markowitz · May 17, 2018

I'm trying to flesh out the articles related to BASIC on the Wiki and on AtariWIki. As part of this I've been delving into historical BASICs and others that I was not familiar with. And here's some ideas for everyone to chew on:

1) line searching

Atari BASIC infamously had to search the entire program to look for a GOTO/GOSUB target, while MS-derived BASICs used a simple trick of starting at the top only if the line number was smaller than the one you were reading.

Now I would say that in most programs, most GOTOs tend to be short, maybe a dozen lines in either direction. So in the case of a forward jump, moving forward from your current position is likely to be efficient, but jumping backward from it is not, it has to start at the top.

Why? Because the original Altair BASIC, on which everything is patterned, had a forward pointer only, so there was no way they could scan backward. So you could only start from where you were, or the top.

Now one solution would be to cache all the line numbers you encounter in a GOTO/GOSUB, but that requires lots of memory even if they are never called at runtime. Another solution would be to put a backpointer on every line as well, but that would use even more memory, and do so for every line instead of just the interesting ones.

So what about a trailing XXX lines cache? That is, as you roll through the code you keep track of the last, say, 16 lines, as back pointers. Now on a backward jump, you can scan that list first, and if you don't find it there, start at the top. This is likely almost as fast as full caching, because if you're not jumping back a little, you're likely jumping back a LOT, so a start-from-top would be the right one anyway.

This would require much less memory than the other solutions, yet likely offer almost all the same speed.

2) variable lookup

I'm not exactly sure what Atari BASIC does here, but I assume its the same as Altair/MS in that there is a list of variable names and their memory locations that it scans when it encounters a variable name in code.

One of the really fast BASICs is BBC BASIC and I was curious why and got some great answers. One of the reasons is that instead of one big list of variable names, it has a bunch of smaller ones, I think one for each possible starting letter.

I think a great improvement on that would be a shorter hash table, say with 16 entries. This can be hand-shuffled - there's a whole lot of I, J, K and X, Y, Z's in most programs, so maybe they get their own list each, and then sort out the rest into four groups of five letters each, ABCDE, FGHLM, etc. Now you have 10 lists, you can easily figure out which one to start in, and it should be really much faster.

BBC BASIC also had a completely separate list with fixed entries for every possible integer variable. This makes sense if you think about it - you use these for speed

3) integer math package

One of the great mysteries to me is why the later MS BASICs had integer variables but did not have a separate math library for working with them. Looking into this, it seems the only interest here was saving memory - they did not see people actually doing math on these, but if you used them for indexes and such it could save you enough memory to make up for the slightly larger interpreter size needed to parse them.

Of course, this seems odd in retrospect. After all, if you use them as intended, you're likely going to do a LOT of math on them, specifically a +1 in FOR loops. Ok, so we wouldn't put SIN() in the library, but increment, decrement, +, - and maybe even * and / seem like no brainers. And of course a trunk that just does a NOOP would likely be useful too, and perhaps a RND%(), and I'm sure you can all think of others.

Now FastBasic does this naturally, but for the other Atari BASICs out there do not appear to have these. Given the savings in code size (or at least memory use) it would seem such a package could be implemented within the 8k limits, a statement I base on the fact that MS's size increased about 1000 bytes with the addition of both 9-byte FP and ints.

Aside from pure-int BASICs, like FBI, is there a A%-capable BASIC on the Atari? Did the MS BASICs have this?

4) cool control structures

This is a little more esoteric, but cool nonetheless. DEC's line of BASICs derived from Dartmouth BASIC, which is pretty similar to MS. But they then added a feature that did not make it into MS, and I think it's worth mentioning.

In BASIC-PLUS, most commands could be followed by a logical operator. So for instance:

10 FOR I=1 TO 10: A=A+A: NEXT I

could be written this way:

10 A=A+A FOR I=1 TO 10

Now what's the difference? At first I thought nothing, but then some of the smarter people convinced me this might offer a great advantage in terms of constructing the loop in the resulting interpreted code. For instance, there can only possibly be one operation in the loop, which makes the construction much simpler. Whether this offsets the increase in complexity of the parser I cannot guess. But there were other variations...

A=A+10 WHILE I<100

A=A+10 UNTIL I>=100

These are basically different variations on the FOR, but simpler to read. The second one... I'm not sure this really clarifies anything compared to the WHILE.? There's also the one-time versions of the same:

A=A+10 IF A<>100

A=A+10 UNLESS A=100

Again, I question the need for the UNLESS version.

I still haven't decided if these are a good idea or a distributed cost.

Any comments?

Edited May 17, 2018 by Maury Markowitz

dmsc · May 17, 2018

Hi Maury!,

I'm trying to flesh out the articles related to BASIC on the Wiki and on AtariWIki. As part of this I've been delving into historical BASICs and others that I was not familiar with. And here's some ideas for everyone to chew on:

1) line searching

Atari BASIC infamously had to search the entire program to look for a GOTO/GOSUB target, while MS-derived BASICs used a simple trick of starting at the top only if the line number was smaller than the one you were reading.

Now I would say that in most programs, most GOTOs tend to be short, maybe a dozen lines in either direction. So in the case of a forward jump, moving forward from your current position is likely to be efficient, but jumping backward from it is not, it has to start at the top.

Why? Because the original Altair BASIC, on which everything is patterned, had a forward pointer only, so there was no way they could scan backward. So you could only start from where you were, or the top.

Now one solution would be to cache all the line numbers you encounter in a GOTO/GOSUB, but that requires lots of memory even if they are never called at runtime. Another solution would be to put a backpointer on every line as well, but that would use even more memory, and do so for every line instead of just the interesting ones.

So what about a trailing XXX lines cache? That is, as you roll through the code you keep track of the last, say, 16 lines, as back pointers. Now on a backward jump, you can scan that list first, and if you don't find it there, start at the top. This is likely almost as fast as full caching, because if you're not jumping back a little, you're likely jumping back a LOT, so a start-from-top would be the right one anyway.

This would require much less memory than the other solutions, yet likely offer almost all the same speed.

The problem is basically how to efficiently search a linked list. TurboBasicXL uses another approach: at program startup it generates a table with pointers to the first line number > 256*N, for all N from 0 to 127, using 256 bytes of memory. So, when you need searching line X, you start searching forward from the pointer to the line [X/256]. You can exploit this in TurboBasicXL by having your gotos target lines multiple of 256.

Note that in TurboBasicXL and other more advanced Basics, there is support for "labels" by menas of the "GO#" statement. This is always fast as the label points to the line memory directly.

2) variable lookup

I'm not exactly sure what Atari BASIC does here, but I assume its the same as Altair/MS in that there is a list of variable names and their memory locations that it scans when it encounters a variable name in code.

One of the really fast BASICs is BBC BASIC and I was curious why and got some great answers. One of the reasons is that instead of one big list of variable names, it has a bunch of smaller ones, I think one for each possible starting letter.

I think a great improvement on that would be a shorter hash table, say with 16 entries. This can be hand-shuffled - there's a whole lot of I, J, K and X, Y, Z's in most programs, so maybe they get their own list each, and then sort out the rest into four groups of five letters each, ABCDE, FGHLM, etc. Now you have 10 lists, you can easily figure out which one to start in, and it should be really much faster.

BBC BASIC also had a completely separate list with fixed entries for every possible integer variable. This makes sense if you think about it - you use these for speed

AtariBASIC (and all other derived BASICs) don't search the variable names on execution, it stores the variable number directly maintaining a variable name list and a variable value list. This means that variable access is always fast.

3) integer math package

One of the great mysteries to me is why the later MS BASICs had integer variables but did not have a separate math library for working with them. Looking into this, it seems the only interest here was saving memory - they did not see people actually doing math on these, but if you used them for indexes and such it could save you enough memory to make up for the slightly larger interpreter size needed to parse them.

Of course, this seems odd in retrospect. After all, if you use them as intended, you're likely going to do a LOT of math on them, specifically a +1 in FOR loops. Ok, so we wouldn't put SIN() in the library, but increment, decrement, +, - and maybe even * and / seem like no brainers. And of course a trunk that just does a NOOP would likely be useful too, and perhaps a RND%(), and I'm sure you can all think of others.

Now FastBasic does this naturally, but for the other Atari BASICs out there do not appear to have these. Given the savings in code size (or at least memory use) it would seem such a package could be implemented within the 8k limits, a statement I base on the fact that MS's size increased about 1000 bytes with the addition of both 9-byte FP and ints.

Aside from pure-int BASICs, like FBI, is there a A%-capable BASIC on the Atari? Did the MS BASICs have this?

I know of AdvanBasic at least. But I think you are overestimating the advantage of integer calculations in the direct interpreted BASICs, as the parsing speed is slower than calculation if you use fast FP routines. FastBasic and AdvanBasic are both compiled, so the parsing speed is not important and you can take advantage of integer only calculations.

4) cool control structures

This is a little more esoteric, but cool nonetheless. DEC's line of BASICs derived from Dartmouth BASIC, which is pretty similar to MS. But they then added a feature that did not make it into MS, and I think it's worth mentioning.

In BASIC-PLUS, most commands could be followed by a logical operator. So for instance:

10 FOR I=1 TO 10: A=A+A: NEXT I

could be written this way:

10 A=A+A FOR I=1 TO 10

Now what's the difference? At first I thought nothing, but then some of the smarter people convinced me this might offer a great advantage in terms of constructing the loop in the resulting interpreted code. For instance, there can only possibly be one operation in the loop, which makes the construction much simpler. Whether this offsets the increase in complexity of the parser I cannot guess. But there were other variations...

A=A+10 WHILE I<100

A=A+10 UNTIL I>=100

These are basically different variations on the FOR, but simpler to read. The second one... I'm not sure this really clarifies anything compared to the WHILE.? There's also the one-time versions of the same:

A=A+10 IF A<>100

A=A+10 UNLESS A=100

Again, I question the need for the UNLESS version.

I still haven't decided if these are a good idea or a distributed cost.

Any comments?

I don't like that kind of syntax (used in PERL a lot), as I think that is confusing to the reader. And I don't think that it will produce faster code, a compiler like FastBasic already produces pretty good code for any FOR loop.

Maury Markowitz · May 17, 2018

The problem is basically how to efficiently search a linked list. TurboBasicXL uses another approach: at program startup it generates a table with pointers to the first line number > 256*N, for all N from 0 to 127, using 256 bytes of memory. So, when you need searching line X, you start searching forward from the pointer to the line [X/256]. You can exploit this in TurboBasicXL by having your gotos target lines multiple of 256.

Note that in TurboBasicXL and other more advanced Basics, there is support for "labels" by menas of the "GO#" statement. This is always fast as the label points to the line memory directly.

Yes, and I recall that BASIC XL's FAST command did the same thing basically. But this uses a little more memory and retains the forward-only search, albeit improved. So if you're not on the 256 boundary, especially if you're the 254th item, the searching is slower than a back-link. Still much better than Atari BASIC, but this can be improved, especially in the close-match case. If your code is repeatedly looping from line 110 to 100, a tiny cache of the last 8 or 16 line numbers is going to kick.

As to the labels, yes this is very good, but they should have used the @ symbol. It more closely represents what it means "goto the line at...". I'm too used to using # for hex and addresses! Of course, one could say the same for $...

Atari BASIC (and all other derived BASICs) don't search the variable names on execution, it stores the variable number directly maintaining a variable name list and a variable value list. This means that variable access is always fast.

Yes, but it still has to search for that variable during the tokenization, I think that's where this was used.

I know of AdvanBasic at least. But I think you are overestimating the advantage of integer calculations in the direct interpreted BASICs, as the parsing speed is slower than calculation if you use fast FP routines.

FB examines the entire formula and then decides to build one or the other, right? What if you assume both are possible, build both sets of code, and then drop one or the other as soon as the condition fails?

For instance...

PRINT 10 + 20

would fail immediately because the 10 is not an integer variable. So this immediately goes to FP. Whereas...

PRINT A% + 1

would continue testing both possibilities until it gets to the end. All you need is a flag that any FP-only op turns off. You have to test the constants for decimals, but that doesn't seem like a big deal?

This would mean the user would have to understand that using the integers in this manner requires them to write the code in a certain way. But of course, that's exactly what you are doing when you use integer variables.

And I don't think that it will produce faster code, a compiler like FastBasic already produces pretty good code for any FOR loop.

Yeah, but that's a compiler like FastBasic from you, not a crappy interpreter from someone at MS :-)

Someone is attempting to decompile the BASIC-PLUS version to see if there is an advantage or not. I'm still not convinced one way or the other.

phaeron · May 19, 2018

There is another trick possible with constant GOTOs, which is to rewrite the statement on the fly from GOTO <constant> to GOTO_address <address>, and then rewrite it back if the program needs to be SAVEd or edited. No bucket/hash table space is required. The catch is that there needs to be enough code space to support the scanning routine to convert the statements back, and most 8K BASICs don't have the space for this. I was working on a similar technique to handle marshaling string literals to Atari memory in Veronica BASIC before I concluded that it wouldn't be enough to fix the compatibility issues.

As for integer math, one of the first compatibility problems is that what little integer math there is in Atari BASIC is unsigned (0-65535). This works for addresses and line numbers, not so much for general arithmetic. Atari BASIC is also a bit unusual in that its expression tokens are strongly typed -- there are different tokens for operators that handle numbers and strings even when the syntax is the same. This means that existing interpreters don't have the infrastructure to do type promotion during evaluation. If you don't do type promotion during evaluation, then handling integer overflow cases like 30000+30000 becomes a compatibility issue.

thorfdbg · May 19, 2018

There is another trick possible with constant GOTOs, which is to rewrite the statement on the fly from GOTO <constant> to GOTO_address <address>, and then rewrite it back if the program needs to be SAVEd or edited. No bucket/hash table space is required. The catch is that there needs to be enough code space to support the scanning routine to convert the statements back, and most 8K BASICs don't have the space for this. I was working on a similar technique to handle marshaling string literals to Atari memory in Veronica BASIC before I concluded that it wouldn't be enough to fix the compatibility issues.

Basic++ pushes absolute addresses on the stack, along with the line number/statement offset for FOR-NEXT loops and GOSUB-RETURN. This already speeds up programs.

The catch is that the return addresses change if a program modifies itself, with techniques such as POKE 842,13. To prevent a problem here, Basic++ keeps for that a "generation counter" on the stack as well, whose global counterpart is incremented whenever the interpreter leaves from executon mode to interactive mode. It compares this global counter with the counter value on the stack. If the program generation changed, the "full line number/statement offset" seach is used, and the absolute statement offset is ignored. Hence, line number searching is available as fall-back.

I also prepared a version with an additional 256 "line number hash" as TurboBasic uses it, but it is a bit a tight squeeze and the title string had to go to make it fit. IRRC, I haven't published this version yet.

Sign In

Yet another BASIC thread

Recommended Posts

Maury Markowitz

Link to comment

Share on other sites

dmsc

Link to comment

Share on other sites

Maury Markowitz

Link to comment

Share on other sites

phaeron

Link to comment

Share on other sites

thorfdbg

Link to comment

Share on other sites

Join the conversation

Recently Browsing 0 members

Apps

My Activity Streams

More