Maury Markowitz
-
Content Count
124 -
Joined
-
Last visited
Posts posted by Maury Markowitz
-
-
57 minutes ago, JamesD said:Supporting integers is on my list of wants, but it requires a lot of changes to the parser.
Perhaps not that much though.
Have a flag somewhere that you reset when you enter the eval routine. Set it to "yes, this could be integer". Now parse the line as normal, converting consts to FP representation, but any time you encounter something that cannot possibly be an integer, whether that's a "0.5" or a "sin(10)" or even "A" as opposed to "A%", set the flag to "nope, there's FP here".
Now if you get to the end of the eval and the flag is still set, you still have everything you need on the eval stack. So convert operations tokens to their int-only versions, which might just be the + and -, run FP-to-INT on the constants, and replace the = with one that does an invisible FLOAT() on the result.
Now if you want to get a little trickier, and you have int variables, all you need to do is add a couple more versions of =, so that you have one that assigns float results to float vars, int results to int vars, and int-to-float and float-to-int.
This isn't a perfect solution, because it means something like A=A+10 will use FP to store the 10, because the A on the RHS forced it to be FP. But it seems this solution means you don't have to do anything to the eval stack at all - you can use the 40-bits to store the 16-bit vals as long as the new operator tokens know this. So the stack doesn't have to change, neither does the actual operation of the eval beyond keeping track of the flag, and this avoids all the back-n-forth conversions during the eval. Its also extensible, you can add new int math routines as you want, start with + and - and maybe add * and / later.
-
1 hour ago, Faicuai said:But, if I were on to this path, I would have started by simply taking a look at Apple's INTEGER Basic... That thing is REALLY FAST, considering the time and age and Apple II's 1.0 Mhz clock-speed... How things got implemented there, is probably a good starting point...
They have the advantage of storing all numbers as 16-bit instead of 40-bit. That's no small advantage!!!
I've been thinking a lot about this - too much really, given no code to actually test it. But...
MS BASIC leaves constants in programs in their original text format. So FOR I=1 TO 10 has "1 TO 10" as text. Atari BASIC tokenizes these values into FP at edit time. For something like a GOTO, it then has to convert them back to 16-bit format to do the search.
So the thing I've been mulling: considering most constants are small, likely 0 and 1 dominating the list and <256 dominating the remainder, is the MS solution faster or slower than the Atari version? A lot of numbers are going to be a single digit, so the conversion is rapid, but I strongly suspect even a single-digit conversion still takes longer in MS because of all the tests and branches on things like "is the next character a legal part of a number?".
I'll have to cycle count both versions to find out which is ultimately faster, but I suspect the Atari solution is better even for small numbers.
One thing that seems like a total no-brainer is to replace 0 and 1 with their own tokens, say $10 and $11. In that case they can be pushed onto the eval stack using their own routines that simply inserts five 0's (etc) so every instance of these now uses up 1 byte of memory instead of 6. There's no conversion in this case, so it will be just as fast as a straight copy.
-
On 7/8/2019 at 4:02 PM, thorfdbg said:It is a function that maps a large set (such as the set of all line numbers) to a smaller set (such as 256 integers)
Not a hash function, the hash function 🙂
This is not how I thought Turbo worked, I thought it just pushed lines onto a list as they were encountered. But now I see how this works, it makes sense.
That does raise the question those - if you're considering just the cases where the GOTO/GOSUB has a constant on the left, I wonder what the actual number of GOTO targets actually is? I'm currently hand-coding What to Do After You Hit Return, and so far it's been maybe 15 targets per program with perhaps 30 as the outside bound.
On 7/8/2019 at 4:02 PM, thorfdbg said:That is not quite as effective as keeping the absolute address of a line on GOSUB or FOR/NEXT, but has the advantage that it also works for GOTO and GOSUB, and avoids the problem of having to keep the pointers on the stack consistent in case someone inserts or removes lines from the program within a subroutine or a FOR/NEXT loop.
So what does it do if the code is modified during runtime such that the starting point of the 256-line block moves?
-
On 7/9/2019 at 1:32 AM, ChildOfCv said:An alternative version used by MS-BASIC PC and GW-BASIC is that when it parses the line number as it enters a statement into program memory, it uses a "line number" token followed by the 16-bit integer line number. When it has to look up a line while running, it replaces the token with a "line pointer" token and the 16-bit integer that follows becomes the pointer to the line instead. Of course if you edit or save the program, it first has to revert all of the tokens to line numbers.
That's the idea that was tossed around in earlier posts, but that second part is a killer on the 8-bit. On a PC you're likely talking about a system with a small program running in a (relatively) large memory space that it can scan with some speed. Also, in Atari BASIC the RHS of the GOTO is an expression, not a constant, so in the (rare) case the expression produces a different line number you'd have to be able to deal with that too.
-
On 7/3/2019 at 11:49 PM, phaeron said:Not sure exactly what you mean by LHS,
Left-Hand Side. But I meant RHS 🙂
Quotebut a simple hash table on 7 bits of the line number with no collision chaining works fine in practice. It only needs to cache branch targets and not all executed lines, so collisions are rare.
Ahhh right, it's a branch. But surely a straight GOTO can jump more that 128?
I think I'm missing something key here, can you expand on this a bit? What is the hash function?
-
QuoteThey also appear as tokens on the expression stack
Ahhh, this is the part I was missing.
Quoteso there is no "variable type" in the tokenized source
Got it.
-
On 6/30/2019 at 12:56 AM, ChildOfCv said:Well, it does attempt to use the easiest type to represent a number. The parser begins with an int type and starts filling it with digits until it exceeds 3275 and still has another digit to add.
Which version of MS BASIC is this? QBASIC or later?
-
On 6/27/2019 at 3:08 PM, JamesD said:An alternative (just thought this up), would be to give GOTO and GOSUB alternate tokens. When you encounter the tokens that signify INTs follow at runtime, you convert the #s that follow, and update the tokens, then call the alternate routine for GOTO/GOSUB... or part of it anyway.
It eliminates the delay at startup, and you don't have to constantly check whether numbers are ints or pointers.That's what I was thinking about. In the case of Atari BASIC you have the "this is a FP constant" token following the GOTO (well, 9 times our of 10 anyway), so when you encounter these you replace them with "this is a line number constant". This requires one more check in the parsing/eval, but if it hits (which it will that same 9 of 10), you're ready to call the GOTO. Then on SAVE you convert them all back to their original format, and on run/edit, well, I'm not 100% sure.
But as the rest of this thread has noted, the single-step-of-indirect you get by placing the address in a hash somewhat offers no discernable difference in overall performance while avoiding this overhead. So I'm no longer so hot on this concept.
Of course in Atari BASIC one might still wish to replace the 40-bit FP constant that does follow that 9-of-10 with a 16-bit version simply for RAM considerations.
On 6/27/2019 at 3:08 PM, JamesD said:
I wouldn't bother during runtime. The first use of a GOTO/GOSUB is only going to be a few clocks slower than before, and then it's faster after that.
The speed penalty of searching for a line number on the 6803 & 6809 isn't so badHmm, I think the 65C02 would also allow this by having the line number addresses directly hashed and then using that ridiculously named "indexed absolute indirect" addressing on the JSR. It would mean having to put the line-number-table starting address in the zero page, however.
-
On 6/27/2019 at 3:01 PM, phaeron said:No, this is not a problem for binary floating point either. In 32-bit binary floating point with a 23-bit mantissa, all integers under +/-2^24 are represented exactly.
So much for that theory. And it was such a nice theory...
On 6/27/2019 at 3:01 PM, phaeron said:Line number caching is actually quite a lot simpler than using binary addresses: it only requires some hashtable memory, a hash lookup/insertion in the line number lookup code, and a few hooks to invalidate/clear the hash table when execution starts or the program is modified.
Agreed. Of course the table improves if the LHS is 16-bit, is that what you do on insertion?
On 6/27/2019 at 3:01 PM, phaeron said:In practice, between common practice in Atari BASIC programs and line caching, I haven't seen line number lookup showing up very hot in the profiles in Altirra Extended BASIC.
...but that's because its cached, no? I suspect it would show up a *bit* higher in traces of the original!
-
On 7/1/2019 at 12:16 PM, thorfdbg said:This bit is, for example, set when a string constant such as "Hello world!" is found as a token in the program, and then pushed onto the operator stack. It indicates to the statement that it does not need to attempt to add an offset to use the variable. Thus, the cases "PRINT A$" and "PRINT "HELLO WORLD"" both push a string expression on the operator token, but one is relative, and one is absolute. The first copies a copy from the variable value table onto the expression stack, of relative type, the second pushes an absolute string expression of type $82 on the stack.
I got the rest of the post no problem, but this remains a bit confusing.
As I understand it, constants found in the code are left in-place. Am I correct in thinking this is what you are referring to as "absolute".
If so, does this flag only get "applied" during the parsing when the stack is being built? IE, this bit would never be set in the variable table itself?
-
I'm reading over https://www.atarimax.com/jindroush.atari.org/afmtbas.html about the BASIC format and I'm confused about one thing: it says that an array of FP is indicated by a $40 or $41, and a string by $80 or $81. Is this actually true, and did Atari BASIC actually use both? Was there any reason for having two codes, or does anything with the right bit set work the same?
-
2 hours ago, ijor said:Let's not forget that, precisely because of the FP routines, it doesn't actually fit into 8K. It "borrows" an extra 2K ($D800-$DFFF) from the OS space.
Isn't the FP code in MS BASIC about the same size? I seem to recall 2700 bytes, but that can't be right.
-
On 6/25/2019 at 5:16 PM, R0ger said:You don't avoid rounding in operations. You do avoid rounding in conversion to and from text. That's sometimes significant. But I'm not sure it's the reason.
I'm sure that is the reason. I see that now, after thinking about a different issue in another thread.
By using BDC they can have exact representations of small integers. So that lets them tokenize constants into an FP and know that when you LIST you'll get back the same number, exactly.
In contrast, MS basic uses binary FP, which cannot be exactly reversed. So they leave their constants in text form, and run the text-to-fp routine every time.
Skipping that conversion is likely a major speed boost. Whether that boost makes up for the general performance of BCD as opposed to non-BCD FP code is something I would love to explore.
-
21 minutes ago, _The Doctor__ said:conversely, you could combine the lines on all the others just like the atari listing....
Sure, but they didn't do that when they published the results.
So if you want to compare the Atari to, say, the Apple II, you either have to have both machines and run the same code, or use the published result and use the original code.
As @phaeron noted, the difference is somewhere like a factor of 2x, so this is non-trivial. And yes, it would be about 2x on the Apple II as well.
-
14 hours ago, JamesD said:I don't know what Atari BASIC does, but MS BASIC does not convert line numbers in GOTO or GOSUB statement to 16 bit integers during tokenization.
Wild! But I think I know why...
To be honest, I'm not sure what Atari BASIC (AB herein) does here. I know for normal numeric constants, like A=A+10, the constant is parsed and replaced by it's FP representation, with a $0E placed in front of it to indicate a constant. The FP is 40-bits, so the total length of any constant is 1-byte + 6-bytes.
I'm not sure if they do this for line numbers, I'll have to look. It would save the conversion at runtime, but since AB's maximum line number is 32767, it's 7 bytes of constant vs. maximum of 5 bytes of text, and typically maybe 3 bytes. Moreover, you have to convert the resulting FP number into the 16-bit line number anyway (which it uses, like MS), so there's still a little performance hit.
And wow, I just realized why they used a BCD math package! Its because if you tokenize a number into a "real" FP, as opposed to BCD, then when you convert it back to a string you might get line 99.999999... This doesn't really do anything, but it sure looks weird - too weird. That's the problem you were avoiding by storing the original string constant. AB doesn't have to do that, small integers will always convert back correctly.
Which brings me full circle on the whole int-FP question. If AB had 16-bit int constants, a relatively trivial upgrade, then you wouldn't need to use FP for (most) constants, and then you could replace the FP package with the MS one with few downsides. That makes for a very interesting test!
QuoteI wanted to convert the ASCII to INTs when the line is entered (which by itself would be a big improvement), then convert those to pointers when you type RUN, so all the lookups take place before the code executes, and just directly jump to the line without any kind of search after that.
In AB I would do that by using a unique token for the original 16-bit line number, say $0B, and then if you encounter an $0B during runtime...
Oh, wait, this works perfectly!
... if you encounter a $0B during runtime, you run the line-lookup routine, change the $0B to $0A, and change the 16-bit line number to a 16-bit line address.
I didn't think this would work because I think doing the scan-and-replace at program startup would be a high cost, especially in long programs. I note that BASIC XL does this if you use the FAST command. I concluded the table-lookup was the only solution that offered both a performance improvement and avoided startup issues.
But looking at it now, you don't have to do this replacement for every line, only the ones that actually get called. And you only know that at runtime, so just do that then. This does introduce some randomness in the performance of the GOTO, but its worst case is still MS's best case.
Of course one could combine the two approaches. When a $0B is encountered, you do the line-number-lookup and replace it in-place. But then you also put it in a lookup table. So then when the next unconverted GOTO pointed to the same line is seen, you can convert it using the table. However, I'm not convinced this is worthwhile given the complexity it would add to the runtime.
It seems a direct comparison of the two (three?) approaches would be needed. Unfortunately, existing benches would be useless for this, because they are very small and only have one or zero GOTOs. You'd want to run something much longer with lots of typical BASIC spaghetti to know what's really going on. Super Star Trek would be great, but I can't imagine a way to benchmark it.
-
6 hours ago, ChildOfCv said:Hit enter to break up a line if necessary. Then place the cursor at the
Ahhh, that's the missing trick! Thanks!
-
15 hours ago, Faicuai said:Of course, not a problem (although I am traveling overseas, vacations, and will be back mid July). I will try to do it sometime via RPD on Win10, and check results.
I am curious, does overseas entail a trip eastward or westward?
-
15 hours ago, _The Doctor__ said:what makes it invalid? it is permissible in a number of BASIC's. If a BASIC flavor can't do a certain thing, should that not thing then not be used in any other BASIC's that exist? There sure wouldn't be much left to work with if we did that...
Any one of the machines listed in the R-F lists also benefit from this same trick. So if you change them in the same way then their numbers change too.
So you can't change one and not the other and have any basis for direct comparison. And direct comparison is the whole point of a benchmark.
Certainly one can use such modifications as a demonstration of how to improve performance, but not to compare it. For instance, if you run the code before and after modifications on a single platform then you are demonstrating how such changes effect performance. But if you run one set on one machine and another on a different one, which is what happened here, then you learn nothing at all.
-
1 hour ago, phaeron said:Benchmark test #2 has been modified in the Atari BASIC in an important way:both of the loop statements are on the same line.
Ah yes, so it is not really the benchmark, and not really Atari BASIC.
@Faicuai, are you willing to run it again using real Atari BASIC instead of Altirra, and without merging the lines?
I just noticed that Wilkinson did the same on his version of the sieve, so it's invalid as well.
-
4 hours ago, JamesD said:The only 8K BASIC interpreters I'm aware of either don't have floating point math or any extended BASIC commands.
@Maury Markowitz, I was looking at using tokens to identify what type of value followed to speed up the BASIC on the MC-10, and to do the conversions at tokenization time as you suggest.
To store floating point numbers, I planned on using pointer to a location in a table with the FP value, and the original string.
Numbers don't convert exactly when using floating point, so the original string needs preserved somehow.
Plus, Microsoft BASIC has to copy the floating point number to a virtual floating point register, and the function that does that requires a pointer anyway.
I also considered converting line numbers to pointers, and the original ASCII line number can be generated from the line the pointer refers to.
This makes the interpreter faster, and listing the file or editing a line easy, but saving the program in a compatible format is a little more difficult on a machine with only cassette tape with no on/off control. 🙄Ok first off, how do I break up quotes so I can answer in bits at a time and retain the "posted by so-and-so"? If I reply I get a big block of quoted text, and if I try to edit in the middle it just goes nuts. I know I can select-n-paste-as-quote, but the screen above this has a full page of replies so scrolling through it all is a pain!
So anyway:
QuoteThe only 8K BASIC interpreters I'm aware of either don't have floating point math or any extended BASIC commands.
On the 6502, yes, but of course the original 8080 code was under 8k (barely) with 32-bit FP.QuoteTo store floating point numbers, I planned on using pointer to a location in a table with the FP value, and the original string.
I have put considerable thought into this if you're interested in my solutions. However, they did assume Atari BASIC layout to some degree so some of the suggestions make no sense elsewhere.
It's funny - Atari BASIC uses a lookup table as you mention for MC, and that really is faster than MS's solution. However, that advantage is just WIPED OUT by the slowness of the loops. It's really sad.
QuoteNumbers don't convert exactly when using floating point
So there's the one upside to Atari's BCD code, it saves you having to do what you did. But tip-o-da-hat, that is a clever solution!
QuoteI also considered converting line numbers to pointers, and the original ASCII line number can be generated from the line the pointer refers to.
So... what's your thought here? Certainly you can replace Atari's 6-byte FP with a 16-bit int. Is that what you mean, store the actual value as a int instead of a string? Doesn't MS do that?
-
14 hours ago, Faicuai said:you can of course have a Basic Package relying entirely on integer computations, if anyone really wishes so...
A more interesting question, IMHO, is whether you can have improvements if you have both and still stay within a size limit.
MS BASIC 1.1 and on had the integer variable type, I%. However, this was used only to save RAM. If you did I%=I%+1 it would convert I% to float, add 1, and then INT the result to put it back in I%. So this was actually slower than using FP, although only a tiny amount.
Obviously, you could save some serious cycles if you had an inline 16-bit package that performed the math in 16-bit, not just stored it that way. However, deciding whether or not you can use the int math may be complex. Consider:
I%=I%+1
I%=A+1
I=A%+1
Of course we can see that the first is int, the second is a float+int and the third is float. But the instructions needed to determine this at run time would slow things down. Perhaps to the point where you should just do everything in FP.
But...
Atari BASIC has a number of unused tokens. Among these are two constant indicators, $10 and $11. It would be easy to modify the tokenizer to look at the format of the constant and put in a $10 if it is an integer, followed by 16-bits of value. This immediately saves 3 bytes of RAM for every int constant in your app - and I suspect those represent >>90% of the constants in a typical program.
One might also note that the vast majority of constants are small, -1 through 10 likely cover 95% of all the non-line-number related constants in your program. In this case one might consider using $11 as a 'small int' type that uses only one signed byte. This would require only two bytes more code in the decoder, a branch past the code that copies the high-order byte into the FP operand.
Likewise there is ample room in the variable table's type map for an integer variable type. $60 seems like an obvious choice. One does not need to make any other change, if you're willing the burn the three bytes in the value storage area then the rest of the variable handling works as-is. But one might also consider having two storage tables, one for FP and one for int, in the same way there is another storage system for handling strings.
To make this work, some minor changes also have to be made to the instructions that read the variables and constants and load them into the FP "registers" in zero page. But this is a few lines of code. At that point you have saved a bunch of memory in almost every program.
So then the last bit. While deciding if a set of instructions is int or FP at runtime would kill you, doing so at tokenizing time would not. It could even be done as a separate pass, first one tokenizes the line as normal, although using the $10 and $60 as above.
Then you take a second pass at it and see if every operand is an int. If so, you replace the math instructions with versions that do the calculations inline. So that means you have two sets of math tokens, for instance, FP+ and int+. You don't have a full set of the int instructions, only the most used ones like + and -, and maybe a few others like INT(), which becomes a no-op.
So in this second pass, if all the operands are int, and all the operators have int versions, you update the operator tokens. There's no need to actually parse the formulas again or do this during the original token run, you simply have a list of tokens that match int and ensure all the constants and vars have int flags on them.
More importantly, there should be two versions of FOR/NEXT and GOTO/GOSUB based on the same logic. In the FOR/NEXT case you can inline all the steps, which should result in a noticeable speed increase. For GOTO/GOSUB there is slightly more complexity, the "int versions" of these could work like FOR/NEXT. But much more interesting, instead of an "int only" version of these, you might consider the "constant only" version which would skip any parsing of the line and just read the value and go. Since the use of GOTO A*100 is relatively rare, this could offer a significant boost for the much more common GOTO 100 as it would skip the (admittedly cheap) INT() on the constant.
One might go so far make the "constant version" even a little smarter. If the constant following the GOTO is a $0E or $10, that means it's a line number that has to be looked up. For for a few extra lines of code, one could replace that value with a $11 constant and follow that with the 16-bit memory address that returned from the line-number-lookup. That would offer the same sort of boost that you see in TURBO or BASICXL, but would not require a separate lookup table (TURBO) or pre-parse (XL), thus saving even more time and memory.
-
1
-
-
15 hours ago, Faicuai said:As mentioned above, Altirra Basic 1.56 was used as the interpreter of choice, which is a 8K drop-in rom replacement for Atari Basic.
The reason I ask is BM2 time. Apple II running Applesoft is 8.5 seconds, while a stock XL is 7.3.
Altirra is 1.9?
This suggests that the Altirra version is using some sort of fast GOTO like Turbo or BASIC XL? Does anyone know?
-
On 6/21/2019 at 7:24 PM, towmater said:avoiding rounding errors seems like a perfectly valid explanation, and is reason enough by itself.
Really? The resulting code is much slower and I suspect people would choose performance over the relatively non-invasive rounding error every time. It's not like the code didn't still round off values during calculations, it only avoided a single type of problem.
QuoteMake time as well for that last one - Bill has passed now
Yeah, I "interviewed" him in email some time ago now. I guess the choice of BCD will remain a mystery.
-
On 6/22/2019 at 1:45 PM, Faicuai said:Here's the performance summary of the BASIC benchmark suite (interpreter-level only):
1. Atari: 61.449 secs (Integer tests), 102.1498 secs (Integer + FP tests)
2. BBC Micro: 114.6secs (Integer + FP)
3. Apple II: 74.1 (Integer), 192.6 (Integer + FP)4. Z80 2 Mhz: 166.3 (Integer)
Source code is attached on prior posts...
Which BASIC is this on the Atari? That's not stock I don't think?

Anyone for a little benchmarking?
in Atari 8-Bit Computers
Posted
Wait, I totally missed this too... why the heck does it do this?!