Jump to content

Maury Markowitz

Members
  • Content Count

    118
  • Joined

  • Last visited

Community Reputation

17 Good

About Maury Markowitz

  • Rank
    Chopper Commander

Recent Profile Visitors

4,371 profile views
  1. I am not very familiar with BBC but I should be. Do you know of any introduction to its internal workings in the fashion of Chapter 10 of De Re?
  2. I'm talking about replacing the entire number with a separate token and the code handler for that token inserts the FP number directly into the FPacc. There are no integers anywhere and no conversions of anything from anything to anything. Again, I'm not talking about integers or integer values.
  3. How does one use these? Simply I=%1? Yes, but this is something the user has to know about. The parser can do this for them and then every program gets it. The one where it goes from one int to another int by one. In that case you don't use the FP at all, and just do a 16-bit add right inline.
  4. Indeed, and in fact STZ was added to the 65C02 to do this.
  5. Warning, brain dump follows! After the benchmarking thread of last summer, I began getting interested in seeing which parts of BASIC deserve some attention, as opposed to guessing. For instance, one might guess that you would want to optimize GOTO because they're so slow. But GOTOs aren't all that common, and often run only once. Is that the right place to spend bytes, or some other place? So I started thinking about a tool that would collect statistics from BASIC programs on which to make informed decisions. These would be static, not runtime traces or anything like that. At first I thought I could do it all in something like Python, but as the complexity became clearer I decided the "simplest" solution was to just write an entire BASIC in lex/yacc. I found a useful starting point and I've been hacking it an hour here an hour there ever since. And as it turns out, not so simple... Well as of yesterday it correctly parsed and partially ran Super Star Trek - partially because I don't eval DEF FNs yet (although it should be possible). And the statistics I'm seeing definitely seem useful! For instance, in Super Star Trek (SST) one finds: A total of 825 statements on 425 lines, which implies about 2 statements per line. This hides the fact that SST generally either has a single statement, PRINT, or 3 or 4 statements, which may or may not be typical of other programs. There are 709 numeric constants, of which only 41 are not integers. Of the 668 integer constants, 139 of them are zero, and 179 of them are ones! 169 of the remaining are a single digit (2 through 9). Looking at it another way, 490 of the numbers could be represented in a single byte, and 178 in two bytes - which is every integer in the app. There are 272 string constants, of which 140 are larger than 16 chars. The largest is 62. There are 184 branches; 7 ONs, 36 GOSUBs and 151 GOTOs. Of the GOTOs, a slight majority jump forward, whereas in IF statements the vast majority are backward. I didn't bother checking GOSUBs, I assumed they were generally forward, and ONs can go both so I didn't collect these. So the immediate lessons one learns: The VAST majority of numeric constants in BASIC are single-digit integers. Zero and one represent over a third of all numbers. This STRONGLY suggests that in an Atari-a-like BASIC where numbers are converted at parse time, tokens $D0 and $D1 should be used to represent these constants. In SST, this would save 1390 bytes of memory, which is not a small amount! Using $D2 through $D9 for the rest of the single-digit variables would save an additional 845 bytes. This too seems worthwhile, although it would slightly complicate the code compared to zero and one only. With this change we've already saved 1875 bytes. At runtime, when one of these tokens is encountered, the evaluation stack is loaded with a constant, rather than copying over the next five bytes. Everything else is unchanged. This appears to be a trivial modification that would have BIG paybacks in terms of program storage (and load time too!) while requiring only a few bytes of code in the parser and runtime. One could also attack this problem by using a binary format for small numbers, say a $D0 followed by a one or two-byte value. This improves the number of values that can be represented in this format, but requires a runtime conversion (assuming no INT math package). If you go the one-byte route you get 490 replacements at 2 bytes, a savings of 4 per number, for a total of 1960. If you go two-byte, you get 668 replacements with a savings of 3 bytes per, so 2004. This sort of comparison is precisely why I wrote the parser. It seems the single-digit-token approach saves almost the same amount of memory, but would be much easier to implement in code, and be faster as well. This is assuming you don't have a separate integer math package, at which point you need a two-byte format anyway. Turning to the GOTO/GOSUB/ETC issue, which was notoriously slow in Atari BASIC. Since the value is ultimately a 16-bit int, that does provide a slight additional argument that there should be 16-bit constants, as it would save that one conversion at runtime. However, that would also require a separate "GOTO-16-bit" token and a little code to look for this case and do the conversion. This is because if the line is "GOTO I*10" then you don't want to convert the 10 to 16-bit, leaving it as FP makes more sense. What is interesting here is that the majority of branches are backward. I guess this shouldn't be entirely surprising. But what that does suggest is that the "big problem" in Atari, that the line search always starts at the top, likely isn't as much of a problem as you'd guess. The counterexample is GOSUB - of the 49 forward branches, 36 of them are GOSUBs! So this is also extremely informative, we can make better decisions on how to optimize these lookups. The options include: 1) having a separate 32-bit token for line numbers, where the first 16-bits is the line and the second is the memory address. Now one can branch immediately to the code without any searching - this is what BASIC XL did IIRC. The problem with this approach is that every time the program is edited you have to go through the entire source and do the lookups. So let's exclude this one, given the statistics above, this really doesn't seem like a worthwhile approach. 2) Turbo's solution, store a small table of hashed numbers for more rapid lookup. After calculating the number (whether its a constant or not), you hash the result, find the closest match in the hash, and start from there. So for a few burned cycles you skip over a lot of code. Better yet, when the program runs you only have this little cache to recreate, fast enough that you could do it with every edit. 3) my febrile idea, a "variable like" table - Atari stores its variables in a table that is created when the program is typed in. Perhaps we could use this solution for GOTO? In this case the table would simply have 127 entries of 16-bits. At edit time, any branch-to-constant is replaced by an index in this table, and the original number is saved as a string in the same fashion as variable names. At runtime, if the value is zero you do the search (potentially also doing (2) if you wish) and drop that number into the slot in the table. If the code changes, this table is set to zero. (3) only works if the table is small. With 194 branches in SST (although I didn't see what numbers they branched too, the number of unique branches might be smaller) it seems unlikely that (3) works. It looks like some variation on Turbo is the way to go. Still to do: I need to fix the problems with variable counting and add code for FOR loop stats. I strongly suspect that one should have two separate FOR handlers, one for FOR I = (someint) TO (someint) (STEP int), but I'll know once it's implemented. I also need to get command line parsing working so you can script it. The code is GPL and once I get the last bits working I'll post to GitHub. It's currently running in Xcode because I like IDEs over CLIs, but the code itself should run on anything remotely Unix-y, including GW on win.
  6. Uhhh, it is. This is a program that decodes Atari BASIC programs into text files.
  7. So the code in question is so old that it will not run on any modern Windows machine. I downloaded the source, did some touchups for modern VS c++, and built a new version called ChkBase64. The name is inaccurate, it's really 32-bit as well, but it works. Is there a place I should upload the binary for other to use?
  8. There is an older 2012 thread on converting Atari BASIC programs to text format. At the time, it required the programs to be loaded into an emulator and then LISTed out to H Has there been: 1) any improvement in this? I would like to convert every file in the Antic Software Collection and any others one might suggest so this would be sometime time consuming 2) has someone already done this? Is there an archive of text format Atari BASIC programs out there?
  9. 10 PRINT "Asteroids. You lose, insert another quarter." Describes my attempts to play it, anyway.
  10. Ok, so this is ultimately wrong. Yes, it DOES post the line number, but it does so only so it can report the line on which an error might have occured. The actual pointer to the code to run is in TXTPTR, which is the address of the code to run. eval.s uses that pointer to loop back to the top in the NEXT. Which brings us back to why Atari BASIC didn't do this? MS would simply refuse to continue if you changed practically anything, and while this is perhaps going too far, in this case the massive upside of loops would seem to argue that this is a reasonable tradeoff.
  11. I asked on SO, but it seems that's drawing a blank, so maybe someone here knows. I was always under the impression that MS BASIC implemented NEXT by pushing the address of the FOR on the stack. So convinced that when I read the source code I assumed CURLIN had to be referring to the address, not the line number. Quelle surprise! Now recall that MS BASIC does NOT allow you to add or remove lines during a STOP/CONT - anything that might change addresses will cause the CONT to fail. So they didn't use the line number in order to allow CONT after modifying the code. So why did they then? If you used the address the NEXT code gets a bunch of lines shorter and a whole lot faster.
  12. But that's not true, the original basic, Dartmouth, was a compiler. This became common later, indeed, but it is not clear to me that they ever considered stop-n-go to be important.
  13. I am looking at the code, it seems to push CURLIN and TXTPTR, isn't that the address into the line?
  14. Sure, but it seems like a HUGE cost to pay! That seems like a simple solution!
×
×
  • Create New...