Jump to content

Maury Markowitz

  • Content Count

  • Joined

  • Last visited

Everything posted by Maury Markowitz

  1. I finally got my BASIC running. It's running in lex/yacc/c, and runs the native system, in my case, macOS. It is an interpreter, it reads the source and makes a structure that it then runs in a fashion similar to Atari BASIC - that is, it's fully tokenized before it runs, unlike MS. Here's Ahl's test: run time 0.003066 Accuracy 6.82121E-13 Random 7.7737 Lolz?
  2. YES! The screen shots don't show that I recall, but I'm pretty sure it is "Under the Ice". I'm going to try to track down the author.
  3. Hey, I was looking through some old posts and I'd like to ask a follow-up question:





    TurboBasic uses such a method. It keeps a table of 256 bytes that keeps the start addresses of the program lines. Of course there are more than 256 lines in many programs, so TurboBasic uses a quite simplistic "hash function" to derive the hash from the line number: Take the most significant byte of the line (a number between 0 and 127) times 2 defines the slot index.


    Thus, if a program searches for line N, it is sufficient to compute floor(N / 256) * 2, to find the nearest line number below the line the program is looking for, and then perform a linear search from that line on. This means that the basic interpreter has to search through at most 255 lines to find the line - instead of performing the search from the start of the program.


    I'm curious when this table is created and maintained? The values in it could theoretically change with every edit. I guess it could update it with every edit, as I suspect only the first so many slots are actually used in practice. But I don't think that's how they would do it.


    Or perhaps it does this are program start/restart... but then you have the associated delay at startup and the need to flush it on edits.

  4. The first is a WWII game (seriously?), the second is nothing like this game.
  5. I'm trying to recall the name of a game I spent too much time playing on my ST. It was a simulation of cold war sub combat. It took place entirely on a map view, where you controlled NATO subs, including British and US I'm not sure if others as well. There was no "internal view" or anything like that, you pointed and clicked on the map to perform operations at a fleet level. One key bit of gameplay I recall was when your US subs had SUBROCs on them and you'd launch and then wait for them to drop into the water right over the enemy while you sent your subs in the opposite direction. I also recall the feeling of joy when you got a Trafalgar class into Swordfish range and watch it run down even the fastest Soviet subs with ease. It was not Red Storm Rising, 688 or any of the other "common" ones. Nor was it Harpoon, although it felt more like that that any of the others I can find. This was pure tactical combat with no frills.
  6. I am not very familiar with BBC but I should be. Do you know of any introduction to its internal workings in the fashion of Chapter 10 of De Re?
  7. I'm talking about replacing the entire number with a separate token and the code handler for that token inserts the FP number directly into the FPacc. There are no integers anywhere and no conversions of anything from anything to anything. Again, I'm not talking about integers or integer values.
  8. How does one use these? Simply I=%1? Yes, but this is something the user has to know about. The parser can do this for them and then every program gets it. The one where it goes from one int to another int by one. In that case you don't use the FP at all, and just do a 16-bit add right inline.
  9. Indeed, and in fact STZ was added to the 65C02 to do this.
  10. Warning, brain dump follows! After the benchmarking thread of last summer, I began getting interested in seeing which parts of BASIC deserve some attention, as opposed to guessing. For instance, one might guess that you would want to optimize GOTO because they're so slow. But GOTOs aren't all that common, and often run only once. Is that the right place to spend bytes, or some other place? So I started thinking about a tool that would collect statistics from BASIC programs on which to make informed decisions. These would be static, not runtime traces or anything like that. At first I thought I could do it all in something like Python, but as the complexity became clearer I decided the "simplest" solution was to just write an entire BASIC in lex/yacc. I found a useful starting point and I've been hacking it an hour here an hour there ever since. And as it turns out, not so simple... Well as of yesterday it correctly parsed and partially ran Super Star Trek - partially because I don't eval DEF FNs yet (although it should be possible). And the statistics I'm seeing definitely seem useful! For instance, in Super Star Trek (SST) one finds: A total of 825 statements on 425 lines, which implies about 2 statements per line. This hides the fact that SST generally either has a single statement, PRINT, or 3 or 4 statements, which may or may not be typical of other programs. There are 709 numeric constants, of which only 41 are not integers. Of the 668 integer constants, 139 of them are zero, and 179 of them are ones! 169 of the remaining are a single digit (2 through 9). Looking at it another way, 490 of the numbers could be represented in a single byte, and 178 in two bytes - which is every integer in the app. There are 272 string constants, of which 140 are larger than 16 chars. The largest is 62. There are 184 branches; 7 ONs, 36 GOSUBs and 151 GOTOs. Of the GOTOs, a slight majority jump forward, whereas in IF statements the vast majority are backward. I didn't bother checking GOSUBs, I assumed they were generally forward, and ONs can go both so I didn't collect these. So the immediate lessons one learns: The VAST majority of numeric constants in BASIC are single-digit integers. Zero and one represent over a third of all numbers. This STRONGLY suggests that in an Atari-a-like BASIC where numbers are converted at parse time, tokens $D0 and $D1 should be used to represent these constants. In SST, this would save 1390 bytes of memory, which is not a small amount! Using $D2 through $D9 for the rest of the single-digit variables would save an additional 845 bytes. This too seems worthwhile, although it would slightly complicate the code compared to zero and one only. With this change we've already saved 1875 bytes. At runtime, when one of these tokens is encountered, the evaluation stack is loaded with a constant, rather than copying over the next five bytes. Everything else is unchanged. This appears to be a trivial modification that would have BIG paybacks in terms of program storage (and load time too!) while requiring only a few bytes of code in the parser and runtime. One could also attack this problem by using a binary format for small numbers, say a $D0 followed by a one or two-byte value. This improves the number of values that can be represented in this format, but requires a runtime conversion (assuming no INT math package). If you go the one-byte route you get 490 replacements at 2 bytes, a savings of 4 per number, for a total of 1960. If you go two-byte, you get 668 replacements with a savings of 3 bytes per, so 2004. This sort of comparison is precisely why I wrote the parser. It seems the single-digit-token approach saves almost the same amount of memory, but would be much easier to implement in code, and be faster as well. This is assuming you don't have a separate integer math package, at which point you need a two-byte format anyway. Turning to the GOTO/GOSUB/ETC issue, which was notoriously slow in Atari BASIC. Since the value is ultimately a 16-bit int, that does provide a slight additional argument that there should be 16-bit constants, as it would save that one conversion at runtime. However, that would also require a separate "GOTO-16-bit" token and a little code to look for this case and do the conversion. This is because if the line is "GOTO I*10" then you don't want to convert the 10 to 16-bit, leaving it as FP makes more sense. What is interesting here is that the majority of branches are backward. I guess this shouldn't be entirely surprising. But what that does suggest is that the "big problem" in Atari, that the line search always starts at the top, likely isn't as much of a problem as you'd guess. The counterexample is GOSUB - of the 49 forward branches, 36 of them are GOSUBs! So this is also extremely informative, we can make better decisions on how to optimize these lookups. The options include: 1) having a separate 32-bit token for line numbers, where the first 16-bits is the line and the second is the memory address. Now one can branch immediately to the code without any searching - this is what BASIC XL did IIRC. The problem with this approach is that every time the program is edited you have to go through the entire source and do the lookups. So let's exclude this one, given the statistics above, this really doesn't seem like a worthwhile approach. 2) Turbo's solution, store a small table of hashed numbers for more rapid lookup. After calculating the number (whether its a constant or not), you hash the result, find the closest match in the hash, and start from there. So for a few burned cycles you skip over a lot of code. Better yet, when the program runs you only have this little cache to recreate, fast enough that you could do it with every edit. 3) my febrile idea, a "variable like" table - Atari stores its variables in a table that is created when the program is typed in. Perhaps we could use this solution for GOTO? In this case the table would simply have 127 entries of 16-bits. At edit time, any branch-to-constant is replaced by an index in this table, and the original number is saved as a string in the same fashion as variable names. At runtime, if the value is zero you do the search (potentially also doing (2) if you wish) and drop that number into the slot in the table. If the code changes, this table is set to zero. (3) only works if the table is small. With 194 branches in SST (although I didn't see what numbers they branched too, the number of unique branches might be smaller) it seems unlikely that (3) works. It looks like some variation on Turbo is the way to go. Still to do: I need to fix the problems with variable counting and add code for FOR loop stats. I strongly suspect that one should have two separate FOR handlers, one for FOR I = (someint) TO (someint) (STEP int), but I'll know once it's implemented. I also need to get command line parsing working so you can script it. The code is GPL and once I get the last bits working I'll post to GitHub. It's currently running in Xcode because I like IDEs over CLIs, but the code itself should run on anything remotely Unix-y, including GW on win.
  11. Uhhh, it is. This is a program that decodes Atari BASIC programs into text files.
  12. So the code in question is so old that it will not run on any modern Windows machine. I downloaded the source, did some touchups for modern VS c++, and built a new version called ChkBase64. The name is inaccurate, it's really 32-bit as well, but it works. Is there a place I should upload the binary for other to use?
  13. There is an older 2012 thread on converting Atari BASIC programs to text format. At the time, it required the programs to be loaded into an emulator and then LISTed out to H Has there been: 1) any improvement in this? I would like to convert every file in the Antic Software Collection and any others one might suggest so this would be sometime time consuming 2) has someone already done this? Is there an archive of text format Atari BASIC programs out there?
  14. 10 PRINT "Asteroids. You lose, insert another quarter." Describes my attempts to play it, anyway.
  15. Ok, so this is ultimately wrong. Yes, it DOES post the line number, but it does so only so it can report the line on which an error might have occured. The actual pointer to the code to run is in TXTPTR, which is the address of the code to run. eval.s uses that pointer to loop back to the top in the NEXT. Which brings us back to why Atari BASIC didn't do this? MS would simply refuse to continue if you changed practically anything, and while this is perhaps going too far, in this case the massive upside of loops would seem to argue that this is a reasonable tradeoff.
  16. I asked on SO, but it seems that's drawing a blank, so maybe someone here knows. I was always under the impression that MS BASIC implemented NEXT by pushing the address of the FOR on the stack. So convinced that when I read the source code I assumed CURLIN had to be referring to the address, not the line number. Quelle surprise! Now recall that MS BASIC does NOT allow you to add or remove lines during a STOP/CONT - anything that might change addresses will cause the CONT to fail. So they didn't use the line number in order to allow CONT after modifying the code. So why did they then? If you used the address the NEXT code gets a bunch of lines shorter and a whole lot faster.
  17. But that's not true, the original basic, Dartmouth, was a compiler. This became common later, indeed, but it is not clear to me that they ever considered stop-n-go to be important.
  18. I am looking at the code, it seems to push CURLIN and TXTPTR, isn't that the address into the line?
  19. Sure, but it seems like a HUGE cost to pay! That seems like a simple solution!
  20. I'm on page 19 of the Atari Source Book. It shows how the FOR statement pushes an entry containing the line number so the NEXT can find it again. But why a line number? Why didn't they push the address of the line and/or individual statement on the line? I'm reading the MS code on github and it appears that's what they do - well if I'm reading it correctly they pop off until they get to the address and then RTS. I'm not sure I totally understand the stack use though.
  21. Therein lies the rub, this question is ultimately about parsing, but because MS has its own math there's no trivial way to directly compare the two, the changes to the underlying math routines would make it difficult to figure out how much the parser is eating up. Well, I guess running it in cc65 might do it...
  22. There's been much discussion over the years about the floating point system in Atari BASIC, but one topic that doesn't get so much attention is strings. AB had pre-defined string lengths, like HP BASIC. These sat in the heap and were pointed to from the variable table. Upsides and downsides are well defined. MS used a linked list of strings (IIRC) and did GC when required. BBC used a similar system, but just coughed a out-of-memory instead of any sort of GC. Are these the only approaches that have been used in BASICs?
  23. Well that makes sense, because if you end up having to start a GOTO you have the number right there. But... a) just store both, it's two freaking bytes b) the line number is +2 bytes in from the line start, it can't be that expensive to deref it BTW I did see your original message, no blocking, I just didn't notice this very interesting bit of it.
  24. Wait, I totally missed this too... why the heck does it do this?!
  • Create New...