Atari_Ace

Members
  • Content Count

    176
  • Joined

  • Last visited

Community Reputation

266 Excellent

About Atari_Ace

  • Rank
    Chopper Commander

Profile Information

  • Custom Status
    https://ksquiggle.neocities.org/
  • Gender
    Male
  • Location
    Seattle, WA
  • Interests
    Atari 8-bit, Biking
  • Currently Playing
    Lego Marvel
  1. I should have debugged a bit more before I mentioned the issue. Once I started looking into it further, it became obvious it was a more general Windows kernel bug that for some reason was much more evident in Altirra than anywhere else. Other applications had perceptible (but sub-second) delays in bringing up new windows (e.g. the Notepad++ search UI was a little sluggish starting up), but Altirra was just glacial in comparison. I have a trace, but the issue cleared up after a reboot, so I suspect something was leaking graphics elements and, as a result, calculating what is visible (based on the function names) was dramatically slower than usual. It did draw my attention to one tiny inconsistency though. System > "Configure System..." opens a dialog entitled "Configure Emulation". Similarly behaving menu items ("Disk Explorer...", "Adjust Colors...") launch windows with the same title as the item.
  2. Maybe it's a quirk on my Windows 10 system, but with 3.90-test2, opening any Configure Emulation page with drop-down selectors spins the CPU for several seconds before it completes; the more drop-downs, the longer the delay (Computer > Memory, for instance, is quite noticeable). PerfView suggests it's in a function making a series of CreateWindowW and SendMessage calls (user32!DialogBoxIndirectParamW -> Altirra64!? -> user32!CreateDialogIndirectParamW -> Altirra64!? -> …). I can send the trace if it's interesting.
  3. One (possible) bug I noticed: All the multi-line code sections from my blog entries before the migration (pretty much all of them but the last one) lost their carriage returns. I was hoping this would fix itself as the background tasks completed, but as the only task left is indexing I'm suspicious it was a migration issue and isn't going to change.
  4. Continuing our exploration of Atari 8-bit languages and virtual machines, we come to APX20166 Deep Blue C. Reverse engineering here is fairly simple, as the entire source code for the runtime and compiler was published as an additional product, APX20179 Deep Blue Secrets. I'm going to use that only as a reference and produce a listing of the runtime via disassembly, for a few reasons. First, it's always nice to have both the source code and the actual assembly output together. Second, the source code contains a fair amount of debug code we can ignore, as it's not present in the shipped runtime. And lastly (but most importantly), by producing a listing from the object file we can spot whether the provided source code actually matches the version in the product (it does, but with some quirks I'll discuss below).

The first thing to do is to get the runtime (DBC.OBJ) from the disk. It's 2098 bytes in size and, like all Atari object files (OBX), contains the object data as a sequence of (start address, end address, <data>) records, with two $FF bytes at the beginning of the file. A long time ago, I wrote a tool to examine OBX files, and running that on this file reveals this one is more complicated than strictly necessary:

    OFFSET: 0006 3000->30fb
    OFFSET: 0106 30fc->31f7
    OFFSET: 0206 31f8->32f3
    OFFSET: 0306 32f4->33ef
    OFFSET: 0406 33f0->34eb
    OFFSET: 0506 34ec->35e7
    OFFSET: 0606 35e8->36e3
    OFFSET: 0706 36e4->37df
    OFFSET: 0806 37e0->3805
    OFFSET: 0830 02e0->02e1

Notice that each of the first eight sections occupies exactly 256 bytes of the file: a four-byte header plus 252 bytes of data. This is a quirk of the Atari Macro Assembler, as that's the most data it will put in a section, even if the next section can be merged into it. I think I first noticed this when looking at Atari Pilot II. In any event, this makes disassembly directly from the file complicated, so I went ahead and merged all the adjacent sections together, producing a runtime that is smaller in size (2066 bytes), but otherwise equivalent.
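The merge step described above can be sketched roughly as follows. This is a minimal Python illustration (not the tool used for this post, which isn't shown), assuming the usual binary load layout of an $FFFF marker followed by little-endian start/end addresses and the data; the function names `parse_segments`, `merge_adjacent` and `build` are made up for the example.

```python
import struct

def parse_segments(data: bytes):
    """Split an Atari binary load file into (start, end, payload) tuples."""
    pos, segments = 0, []
    while pos < len(data):
        # $FFFF is required at the start of the file and optional before later segments.
        if data[pos:pos + 2] == b"\xff\xff":
            pos += 2
        start, end = struct.unpack_from("<HH", data, pos)
        size = end - start + 1
        segments.append((start, end, data[pos + 4:pos + 4 + size]))
        pos += 4 + size
    return segments

def merge_adjacent(segments):
    """Join a segment onto its predecessor when the addresses are contiguous."""
    merged = []
    for start, end, payload in segments:
        if merged and merged[-1][1] + 1 == start:
            prev_start, _, prev_payload = merged[-1]
            merged[-1] = (prev_start, end, prev_payload + payload)
        else:
            merged.append((start, end, payload))
    return merged

def build(segments) -> bytes:
    """Write segments back out with a single leading $FFFF marker."""
    out = b"\xff\xff"
    for start, end, payload in segments:
        out += struct.pack("<HH", start, end) + payload
    return out
```

Fed a file with the ten address ranges listed above, this kind of merge leaves two segments ($3000-$3805 and $02E0-$02E1), which accounts for the drop from 2098 to 2066 bytes: eight four-byte headers disappear.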
Given the merged runtime, we can produce our usual reverse engineering tool reusing code I've published before on this blog. It is remarkably short, since it needs no special features, just the disassembler.

    use strict;
    use m6502;
    sub open_lst {
      open my $fh, '<', 'dbc.lst' or die;
      $fh;
    }
    sub read_img {
      my ($addr, $size) = @_;
      read_img_core(
        $addr, $size, '../dbc.obj',
        [0x0006 - 0x3000, 0x3000, 0x3805],
        [0x0810 - 0x02e0, 0x02e0, 0x02e1]);
    }
    assem(@ARGV);

We start the listing with some constants we can pull from the Secrets source code.

    =0001  REVNUM = 1    ; Revision # of c-code
    =00D0  PC     = $D0  ; c-code program counter
    =00D2  SP     = $D2  ; c-code stack pointer
    =00D4  P      = $D4  ; Primary register
    =00D8  T      = $D8
    =00E0  ACC    = $E0  ; BASIC USR RTN
    =00E2  MQ     = $E2  ; MLTPLIER/QUOT
    =00E4  ENT    = $E4  ; MCAND/DIVSOR
    =00E6  SC     = $E6  ; SIGN CONTROL
    =00F0  UA     = $F0
    =00F2  UB     = $F2
    =00F4  UC     = $F4

These are the registers the virtual machine uses. The virtual machine has a program counter, a stack pointer and two 16-bit registers (or four 8-bit registers). As this product is derived from Ron Cain's Small-C, it is expected this should resemble an 8080 CPU. It must be true, it says so on Wikipedia ;-). As we'll see, there are some similarities, but this virtual machine is by no means an 8080 emulator.

Producing the listing now is straightforward. There's a jump table at $3274 with entries for the 47 opcodes supported by the virtual machine (which are called simply P00P to P46P), and a few places with strings or data instead of code, but largely the 2K is made up of easy to recognize 6502 assembly code. As always, the most exciting bit is to find the inner loop of the interpreter. It's called NEXT, and is more or less as expected.
    3250: A0 00     NEXT   LDY #0
    3252: B1 D0            LDA (PC),Y
    3254: E6 D0            INC PC
    3256: D0 02            BNE NEXTR
    3258: E6 D1            INC PC+1
    325A: 0A        NEXTR  ASL A
    325B: A8               TAY
    325C: B9 74 32         LDA JUMPT,Y
    325F: 85 D8            STA T
    3261: B9 75 32         LDA JUMPT+1,Y
    3264: 85 D9            STA T+1
    3266: 6C D8 00         JMP (T)

It pulls the opcode from the PC, increments the PC, multiplies the opcode by 2 to look up the handler address in the jump table, and then jumps to the opcode implementation. Each opcode then must end with a JMP NEXT to continue the execution. This routine could have been made slightly faster if we only used even numbers for opcodes. It could also have been made a little faster by placing this loop in page zero and self-modifying the jump target, but overall this is probably only a few cycles longer than optimal.

The remaining code is largely unsurprising. The runtime starts with vectors for I/O support, strcpy, move, USR, find, PEEK and POKE, and curiously a SOUND vector. I'm not sure why SOUND is part of the runtime when there are no graphics routines to speak of, but there must be a reason. There's a fair amount of code to implement 16-bit signed and unsigned multiply, divide and modulus, which some of the opcodes use.

Now the runtime is just six bytes more than 2K in size. Surely there are six bytes to be saved in there which might allow some shifting of the locations to give the programmer an extra kilobyte? Well, here's the first quirk that comes from using the Atari Macro Assembler. There are 12 bytes lost in the runtime due to the assembler choosing the absolute version of LDA or STA when a zero page version would have been fine. You can see this in P24P (at $3533), an opcode to multiply P by the top of the stack.
    3533: A5 D4     P24P   LDA P
    3535: 8D E0 00         STA.W ACC
    3538: A5 D5            LDA P+1
    353A: 8D E1 00         STA.W ACC+1
    353D: A0 00            LDY #0
    353F: B1 D2            LDA (SP),Y
    3541: 8D E2 00         STA.W MQ
    3544: C8               INY
    3545: B1 D2            LDA (SP),Y
    3547: 8D E3 00         STA.W MQ+1
    354A: 20 96 37         JSR SMUL
    354D: AD E2 00         LDA.W MQ
    3550: 85 D4            STA P
    3552: AD E3 00         LDA.W MQ+1
    3555: 85 D5            STA P+1
    3557: 4C AB 33         JMP POPNEX

There are six instructions here that the assembler translated inefficiently. Similarly, P25P and P26P (divide and remainder) and the DIV subroutine were also assembled less than optimally.

An aside: these inefficiently implemented opcodes further undermine the argument that this is an 8080 emulator. The 8080 had no multiply or divide instructions. This runtime package implements the opcodes needed to support the compiler, so it implements some 8080-like opcodes plus some additional opcodes that the compiler requires. Calling it an 8080 emulator is a bit misleading.

We could save memory in a few other ways as well. For instance, there's a routine TRUE2, which is a jump trampoline needed because the opcodes that do tests (P36P through P45P) use up more than 128 bytes. The FALSE and TRUE routines could have been located in the middle of the tests instead of near the beginning, to avoid needing the trampoline and another case of JMP FALSE, saving five bytes overall. There are also three places I noticed where JMP is used where a branch would have sufficed ($3223: BMI BEG; $34B7: BEQ P19W; $34DD: BEQ P18P), which would save three bytes. So it seems we could easily trim this down to fit within 2K.

Overall, this is a nicely designed, very compact runtime. There's a lot to be learned from studying this code.

dbc1.zip
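To make the zero-page observation about P24P concrete, here is a small Python sketch that counts the instructions using absolute addressing for a page-zero address. The instruction bytes are transcribed from the P24P listing; the helper name `wasted_bytes` and the two-entry opcode table are assumptions made for this example (only the LDA and STA encodings seen in the listing are covered).

```python
# Absolute-mode opcodes with a one-byte-shorter zero-page equivalent.
# Only the two opcodes that appear in the P24P listing are included.
ABS_TO_ZP = {
    0xAD: 0xA5,  # LDA abs -> LDA zp
    0x8D: 0x85,  # STA abs -> STA zp
}

# P24P as (opcode, operand bytes...) tuples, transcribed from the listing.
P24P = [
    (0xA5, 0xD4), (0x8D, 0xE0, 0x00), (0xA5, 0xD5), (0x8D, 0xE1, 0x00),
    (0xA0, 0x00), (0xB1, 0xD2), (0x8D, 0xE2, 0x00), (0xC8,),
    (0xB1, 0xD2), (0x8D, 0xE3, 0x00), (0x20, 0x96, 0x37),
    (0xAD, 0xE2, 0x00), (0x85, 0xD4), (0xAD, 0xE3, 0x00), (0x85, 0xD5),
    (0x4C, 0xAB, 0x33),
]

def wasted_bytes(instructions):
    """Count absolute-mode LDA/STA instructions whose address is in page zero."""
    return sum(
        1
        for ins in instructions
        if len(ins) == 3 and ins[0] in ABS_TO_ZP and ins[2] == 0x00
    )

print(wasted_bytes(P24P))
```

Each such instruction costs one extra byte, so the count is also the number of bytes that a zero-page encoding would recover in this routine.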
  5. 708 contains data from a deleted copy of DIFFEQN. We can restore the data from sector 255 of side B and then fix the sector link if we want that degree of fidelity. Comparing the two sectors, it appears bytes $2E-$7F are the ones that are damaged.

    016190: SECTOR: 708: FILE:
     0 17 00 00 00 00 25 0e 40-20 00 00 00 00 24 87 25 .....%.@ ....$.%
    10 0e 3f 50 00 00 00 00 2c-16 fc 03 2d 0f 36 8b 2d .?P....,...-.6.-
    20 0e 40 42 00 00 00 00 14-15 07 86 20 88 1b 24 22 .@B........ ..$"
    30 08 00 80 00 00 00 04 00-01 00 05 02 40 14 20 08 ............@. .
    40 22 4a 40 20 08 44 00 04-00 04 65 80 d0 00 00 00 "J@ .D....e.....
    50 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
    60 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
    70 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................

    007f10: SECTOR: 255: FILE:
     0 17 00 00 00 00 25 0e 40-20 00 00 00 00 24 87 25 .....%.@ ....$.%
    10 0e 3f 50 00 00 00 00 2c-16 fc 03 2d 0f 36 8b 2d .?P....,...-.6.-
    20 0e 40 42 00 00 00 00 14-15 07 86 20 88 1b 21 36 .@B........ ..!6
    30 8a 2d 0e 40 17 00 00 00-00 14 2d 36 8b 2d 0e 41 .-.@......-6.-.A
    40 01 70 00 00 00 16 06 04-21 09 07 86 21 89 1b 15 .p......!...!...
    50 36 8a 2d 0e 40 37 00 00-00 00 14 21 36 8b 2d 0e 6.-.@7.....!6.-.
    60 41 01 70 00 00 00 16 10-04 34 08 20 85 15 14 1a A.p......4. ....
    70 2d 0e 40 16 00 00 00 00-12 0e 40 23 00 21 00 7d -.@.......@#.!.}

Sector 720, on the other hand, is normally unused since it's inaccessible to DOS. I doubt the loss is interesting. Assuming the bitrot is similar to the above, I'm not sure there's anything meaningful to restore here. Sector 720 on side B is just 128 $1A bytes.

    016790: SECTOR: 720: FILE:
     0 00 00 00 01 21 87 01 20-01 00 20 40 64 44 60 00 ....!.. .. @dD`.
    10 00 10 00 01 80 00 72 02-00 24 20 14 b0 00 17 09 ......r..$ .....
    20 40 08 00 00 00 22 08 80-00 00 00 00 00 00 00 00 @....".........
    30 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
    40 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
    50 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
    60 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
    70 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
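The comparison of sectors 708 and 255 can be reproduced with a few lines of Python. This is only a sketch: the byte strings are the first three rows of each dump, and `decode_link` assumes the disk uses the DOS 2.0-style link in the last three bytes of a 128-byte sector (file number in the top six bits of byte 125, next sector in the remaining ten bits across bytes 125-126, byte count in byte 127). If the disk uses a different DOS, that decoding would not apply.

```python
def first_difference(a: bytes, b: bytes):
    """Return the offset of the first differing byte, or None if none differ."""
    for offset, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return offset
    return None

def decode_link(sector: bytes):
    """Decode bytes 125-127 of a sector as a DOS 2.0-style link (an assumption)."""
    file_no = sector[125] >> 2
    next_sector = ((sector[125] & 0x03) << 8) | sector[126]
    byte_count = sector[127]
    return file_no, next_sector, byte_count

# Rows 00-20 of each dump (the first 48 bytes), transcribed from the listings.
HEAD_708 = bytes.fromhex(
    "17000000 00250e40 20000000 00248725 "
    "0e3f5000 0000002c 16fc032d 0f368b2d "
    "0e404200 00000014 15078620 881b2422")
HEAD_255 = bytes.fromhex(
    "17000000 00250e40 20000000 00248725 "
    "0e3f5000 0000002c 16fc032d 0f368b2d "
    "0e404200 00000014 15078620 881b2136")

# Sector 255 ends with 21 00 7d; pad the front so the offsets line up.
TAIL_255 = bytes(125) + bytes([0x21, 0x00, 0x7D])

print(hex(first_difference(HEAD_708, HEAD_255)))
print(decode_link(TAIL_255))
```

Under that assumption the link in sector 255 points on to sector 256, so a restored copy placed at sector 708 would need its link bytes adjusted to whatever follows 708 in the original file chain.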
  6. I took a quick look and didn't see any problems. The sector links seem fine, the VTOC looks fine, and all the files parse as BASIC to line 32768 as expected. That said, only 2/3 of the disk is used, so perhaps the problem is in the unused part. Can you be more specific about the errors you saw?
  7. In ZMagazine's Z*Net, vol. 1, No. 9 (included in the November 1989 JACG Newsletter, and other user group newsletters), the following Newswire item appeared: Computer Shopper Magazine will end its long support for the 8-bit Atari and other older computer lines with its December 1989 issue. New editor Bob Lindstrom is an Atari ST owner himself, and appreciates the power of the small computer as well, but has had no advertisers that seem to be concerned over the older units. The dropping of the "classic computers" is part of a refocusing of Computer Shopper, a 700 page monthly magazine that still will carry ST material, often with 5 or more articles per month. IBM and MAC will be the main direction now, with some additional Atari ST, Amiga, and Apple II coverage. These last three are in danger of also being dropped if sales and interest remain sluggish. The point seems to be, if it doesn't sell ads, they don't need the circulation that the small computers might satisfy. So perhaps coverage continued on and off throughout 1988 and 1989. I've extended to April 1986 the Applying the Atari columns from Computer Shopper at http://ksquiggle.neocities.org/columns/ata.htm.
  8. I decided to look for programs by Jeff Brenner among the various disks I've collected over the years. Since he put his name in REM statements almost universally, that got a few hits, but most ended up being work he published elsewhere. He had two articles published in ANALOG (Atari Graphics Overlay in #26, 1985-Jan, Screen Scroller in Analog #50, 1987-Jan). He also published Joystick Cursor Control in COMPUTE!'s Third Book of Atari and Big Buffer for Atari in COMPUTE! #46, 1984-Apr. I did find a somewhat modified version of Programmable Numeric Keypad (1985-Aug) on the pooldisk in UMICH\CIS\XMO19.ATR (and also on TRACE Club's 1985-Dec disk), and Atari Forms Generator (1986-???) in OHAUG's OHSO99A.ATR. So some of this Computer Shopper material circulated.
  9. I've begun HTMLifying the Applying the Atari columns by Jeff Brenner from these scans (all of 1985 thus far) to make these more accessible and readable. I've spot checked all the BASIC listings, but I doubt they are completely error free. The 3-letter codes for each line are in many places illegible, so I've not put extra effort to audit those. I'll probably add the 1986 columns that have been scanned in a few months. https://ksquiggle.neocities.org/columns/ata.htm
  10. FYI, Dungeon Hacks, Expanded Edition is in the current StoryBundle (Spring Fired Up at storybundle.com). It includes Dungeon Hacks, One-Week Dungeons, Anything But Sports: The Making of FTL: Faster Than Light and Red to Black: The Making of Rogue Legacy. There are a number of other books in the bundle as well that are interesting.
  11. I was reading Dungeon Hacks, by David L. Craddock, the other day and found a short reference to the Atari 8-bit in the chapter on the first Rogue-like, Beneath Apple Manor:

For Beneath Apple Manor Special Edition, I reverse engineered the 6502 runtime environment for the Galfo Integer BASIC compiler and ported it to 8086 machines (IBM PC). Bob Christiansen of Quality Software also ported it to the Atari. I did not write Ali Baba--Bob might have--but it was in integer BASIC too I think, so Quality used the Galfo compiler on it to port it between machines too. Bob must have given me a copy, which is why I had it.

So there's another virtual machine hiding inside this game, and I had to find it and figure out how it works. The first step of course was to download the xex file from www.atarimania.com and hook up my disassembler to the image with appropriate offsets. The image has a number of parts:

    OFFSET: 0006 bf00->bf9f
    OFFSET: 00aa 02e2->02e3
    OFFSET: 00b0 0c34->803f
    OFFSET: 74c0 8100->a7ff
    OFFSET: 9bc4 0110->0132
    OFFSET: 9beb 02e0->02e1

Given this, my disassembler becomes:

    use strict;
    use m6502;
    sub open_lst {
      open my $fh, '<', 'bam.lst' or die;
      $fh;
    }
    sub read_img {
      my ($addr, $size) = @_;
      read_img_core(
        $addr, $size, '../bam.xex',
        [0x0006 - 0xbf00, 0xbf00, 0xbf9f],
        [0x00aa - 0x02e2, 0x02e2, 0x02e3],
        [0x00b0 - 0x0c34, 0x0c34, 0x803f],
        [0x74c0 - 0x8100, 0x8100, 0xa7ff],
        [0x9bc4 - 0x0110, 0x0110, 0x0132],
        [0x9beb - 0x02e0, 0x02e0, 0x02e1]);
    }
    assem(@ARGV);

The first chunk at $BF00 sets up a simple display list for the load. The load into $02E2 invokes that code before continuing with the rest of the image. The next two chunks consist of the main portion of the code, with a small gap that isn't used by the program. The last two chunks load a small init routine and invoke it.
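The triples passed to read_img_core each pair an address range with the adjustment needed to turn an address into a file offset. The body of read_img_core isn't shown here, so the following Python is only a guess at the lookup it implies; `BAM_MAP` and `file_offset` are names invented for the sketch.

```python
# Each entry is (file_offset - load_address, first_address, last_address),
# mirroring the triples in the Perl read_img definition.
BAM_MAP = [
    (0x0006 - 0xBF00, 0xBF00, 0xBF9F),
    (0x00AA - 0x02E2, 0x02E2, 0x02E3),
    (0x00B0 - 0x0C34, 0x0C34, 0x803F),
    (0x74C0 - 0x8100, 0x8100, 0xA7FF),
    (0x9BC4 - 0x0110, 0x0110, 0x0132),
    (0x9BEB - 0x02E0, 0x02E0, 0x02E1),
]

def file_offset(addr, table=BAM_MAP):
    """Translate a memory address to an offset in the xex file, or None if unmapped."""
    for delta, lo, hi in table:
        if lo <= addr <= hi:
            return addr + delta
    return None

print(hex(file_offset(0xA000)))  # where the bytes loaded at $A000 sit in the file
print(file_offset(0x8080))       # inside the gap between the two main chunks
```

Storing the difference rather than the raw offset keeps the lookup to a single addition once the matching range is found.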
Let's look at the init routine:

    0110: A2 00     LDX #0
    0112: BD 00 A0  LDA $A000,X
    0115: 9D 34 04  STA $0434,X
    0118: A9 00     LDA #0
    011A: 9D 00 A0  STA $A000,X
    011D: E8        INX
    011E: D0 F2     BNE $0112
    0120: EE 14 01  INC $0114
    0123: EE 17 01  INC $0117
    0126: EE 1C 01  INC $011C
    0129: AD 1C 01  LDA $011C
    012C: C9 A8     CMP #$A8
    012E: D0 E0     BNE $0110
    0130: 4C 34 04  JMP $0434

This piece of self-modifying code relocates the data at $A000-$A7FF to $0434-$0C33. So we should add a relocation definition to the read_img routine: [0x74c0 - 0x8100 + 0xa000 - 0x434, 0x0434, 0x0c33]. Now let's look at $0434:

    0434: A9 06     LDA #6
    0436: 85 04     STA 4
    0438: 85 05     STA 5
    043A: A2 7F     LDX #$7F
    043C: BD 80 04  LDA $0480,X
    043F: 95 80     STA $80,X
    0441: CA        DEX
    0442: E0 FF     CPX #$FF
    0444: D0 F6     BNE $043C
    0446: A9 20     LDA #<$8020
    0448: 8D 30 02  STA SDLSTL
    044B: A9 80     LDA #>$8020
    044D: 8D 31 02  STA SDLSTL+1
    ...
    047D: 4C AF 00  JMP $00AF

Another relocation routine, this time relocating $0480-$04FF to $80-$FF, so we add one more relocation element to read_img: [0x74c0 - 0x8100 + 0xa000 - 0x434 + 0x480 - 0x80, 0x80, 0xff]. If we disassemble $80 to $FF we see a lot of data, but there is some code hiding in there. Labeling $B0 as IP, it looks like this:

              ; pull IP-1 from stack
    009A: 68        PLA
    009B: 85 B0     STA IP
    009D: 68        PLA
    009E: 85 B1     STA IP+1
              ; increment IP by 1
    00A0: E6 B0     INC IP
    00A2: D0 0B     BNE $00AF
    00A4: E6 B1     INC IP+1
    00A6: D0 07     BNE $00AF
              ; increment IP by A
    00A8: 18        CLC
    00A9: 65 B0     ADC IP
    00AB: 85 B0     STA IP
    00AD: B0 F5     BCS $00A4
              ; read byte code and jump
    00AF: AD 40 1D  LDA $1D40 ; start at 1d40
    00B2: 85 B5     STA $B5
    00B4: 6C FA 63  JMP ($63FA)

This is a little virtual machine fetch loop, with page $63 containing the bytecode vectors (which point to code in pages $64 to $6F), and entry points here to increment the IP by 1 or A, or pull the IP-1 from the stack (implementing an RTS-like return). The machine even has the start address built in ($1D40). It patches the bytecode into the low byte of the indirect jump address directly, so it only supports 128 even bytecodes.
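The limit of 128 bytecodes follows from how the vector is addressed: the opcode byte replaces the low byte of the operand of JMP ($63xx), and each vector takes two bytes. A tiny Python model of that lookup, with `handler_address` as a made-up name and the sample vector contents invented purely for illustration:

```python
VECTOR_PAGE = 0x6300  # high byte of the operand in JMP ($63FA)

def handler_address(memory, opcode):
    """Model STA $B5 / JMP ($63xx): the opcode becomes the vector's low byte."""
    if opcode & 1:
        # An odd opcode would read the second half of one vector and the
        # first half of the next, so only even values are usable.
        raise ValueError("odd opcodes would straddle two vectors")
    lo = memory[VECTOR_PAGE + opcode]
    hi = memory[VECTOR_PAGE + opcode + 1]
    return lo | (hi << 8)

usable_opcodes = [op for op in range(256) if op % 2 == 0]

memory = bytearray(0x10000)
memory[0x63F0:0x63F2] = bytes([0x34, 0x64])  # hypothetical vector for opcode F0
print(hex(handler_address(memory, 0xF0)), len(usable_opcodes))
```

The payoff of this design is that dispatch needs no shift and no table index register, at the price of halving the opcode space.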
The Atari Pascal virtual machine I posted about recently supported 256, but at the cost of a more expensive fetch loop.

We can now write a bytecode disassembler by disassembling the codes, figuring out how long each is, and making a table of those lengths. My preliminary disassembler looks like this:

    my $codeLen = [
      0, 3, 3, 3, 3, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, # 0x
      0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, # 2x
      0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, # 4x
      0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, # 6x
      0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, # 8x
      0, 0, 0, 0, 0, 0, 3, 0, 0, 3, 0, 0, 0, 0, 0, 0, # ax
      0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, # cx
      0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 2, 0, 0, 0, 0, # ex
    ];
    sub basic_buf {
      my ($buff, $addr, $size) = @_;
      for (my $i = 0; $i + 1 < $size; ) {
        my $val = unpack "C", substr($buff, $i, 1);
        my $len = $codeLen->[$val >> 1];
        if ($len >= 1) {
          bytes_buf($len, substr($buff, $i, $len), $addr + $i, $len);
          $i += $len;
        } else {
          die sprintf "%02x", $val;
        }
      }
    }
    sub basic {
      my ($buff, $addr, $size) = read_img(@_);
      basic_buf($buff, $addr, $size);
    }
    sub main {
      return if assem(@_);
      if (0) {
      } else {
        basic(@_);
      }
    }
    main(@ARGV);

And my preliminary bytecode disassembly of the start address is:

    1D40: F0        .BYTE $F0
    1D41: 06 00 10  .BYTE $06,$00,$10
    1D44: F0        .BYTE $F0
    1D45: 0A 00 10  .BYTE $0A,$00,$10
    1D48: F6 EA     .BYTE $F6,$EA
    1D4A: F6 12     .BYTE $F6,$12
    1D4C: C8        .BYTE $C8
    1D4D: 02 0A 71  .BYTE $02,$0A,$71
    1D50: 9A        .BYTE $9A
    1D51: F0        .BYTE $F0
    1D52: 08 12 10  .BYTE $08,$12,$10
    1D55: F0        .BYTE $F0
    1D56: 08 14 10  .BYTE $08,$14,$10
    ...

For example, the most common bytecode here, F0, pushes 0 onto the stack.

Some other comments about the image: $81F0-$9FFF is the data for an ANTIC mode E picture, the title screen, with the display list at $8126. So 1/5 of the image is for this picture. $7C40-$8000 is the data for an ANTIC mode 2 (graphics 0) screen, with the display list at $7C20.
The code modifies most of it to ANTIC mode 4 during the game for use as the main screen. A 1K character set is located at $0C00-$0FFF. The game doesn't just modify the Atari set.

The game requires 48K to load, but likely runs in 32K. A little creativity (compressing the title screen and character set might be enough) could have squeezed it into 32K, but 48K was probably quite common when this hacked image was circulating.

I'll probably return to do more work on disassembling this, but I've established how the virtual machine works, so I'm happy enough for a few hours' work.

P.S. I took a quick look at Ali Baba and didn't see the same virtual machine. I'll need to explore that more fully to determine if it has a similar architecture.
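For readers who don't have the Perl modules to hand, the table-driven splitting done by basic_buf can be approximated in Python. This is a loose sketch rather than a port: `CODE_LEN` holds only the non-zero entries of $codeLen (re-keyed by opcode instead of opcode >> 1), `split_bytecode` is an invented name, and the sample bytes are the ones shown in the disassembly at $1D40.

```python
# Lengths keyed by opcode, taken from the non-zero entries of $codeLen
# (table index = opcode >> 1). Opcodes not listed are still unknown.
CODE_LEN = {0x02: 3, 0x04: 3, 0x06: 3, 0x08: 3, 0x0A: 3, 0x9A: 1,
            0xAC: 3, 0xB2: 3, 0xC8: 1, 0xF0: 1, 0xF2: 1, 0xF6: 2}

def split_bytecode(buf: bytes, addr: int):
    """Yield (address, instruction bytes); stop with an error at an unknown opcode."""
    i = 0
    while i < len(buf):
        # Clearing bit 0 mirrors the Perl table lookup by opcode >> 1.
        length = CODE_LEN.get(buf[i] & 0xFE)
        if length is None:
            raise ValueError(f"unknown opcode {buf[i]:02x} at {addr + i:04X}")
        yield addr + i, buf[i:i + length]
        i += length

# The bytes shown in the listing from $1D40 up to $1D58.
START = bytes.fromhex(
    "f0 060010 f0 0a0010 f6ea f612 c8 020a71 9a f0 081210 f0 081410")

for address, chunk in split_bytecode(START, 0x1D40):
    print(f"{address:04X}: {chunk.hex(' ')}")
```

Failing loudly on the first unknown opcode is what makes this approach useful for exploration: each stop points at the next handler to disassemble and measure.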
  12. The last blog entry introduced the tools I'm using to explore the Pascal runtime, and included a preliminary (i.e. rough) disassembly. Now we'll start refining that disassembly and start discussing more of the opcodes.

Firstly, the last listing was erroneous around $B959 to $B991. There are strings there I somehow missed when spot checking the disassembly, so I've fixed up that part of the disassembly. There were also a couple of missing $9B's after strings, and the p-code disassembly had a couple of errors as well, which I've now fixed.

Now let's discuss some more opcodes. The simplest opcode in the listing is opcode DB. It is just:

    AF9D: E8        INX
    AF9E: E8        INX
    AF9F: 4C 9D 00  JMP NEXT_OP1

Since X is the current evaluation stack pointer, and it grows downwards, this opcode drops the topmost entry of the stack, so let's call it DROP. Another simple opcode is $DA, which disassembles as:

    AF8C: CA        DEX
    AF8D: CA        DEX
    AF8E: BD 03 06  LDA EVALPAGE+3,X
    AF91: 9D 01 06  STA EVALPAGE+1,X
    AF94: BD 02 06  LDA EVALPAGE+2,X
    AF97: 9D 00 06  STA EVALPAGE,X
    AF9A: 4C 9D 00  JMP NEXT_OP1

This adds one entry to the stack, and copies the (previous) top element to it, so we can call this DUP. Opcode D2 is a bit longer, but just involves moving things around the stack so that the first two elements are exchanged, so let's call it SWAP.

    AF5F: BC 00 06  LDY EVALPAGE,X
    AF62: BD 02 06  LDA EVALPAGE+2,X
    AF65: 9D 00 06  STA EVALPAGE,X
    AF68: 98        TYA
    AF69: 9D 02 06  STA EVALPAGE+2,X
    AF6C: BC 01 06  LDY EVALPAGE+1,X
    AF6F: BD 03 06  LDA EVALPAGE+3,X
    AF72: 9D 01 06  STA EVALPAGE+1,X
    AF75: 98        TYA
    AF76: 9D 03 06  STA EVALPAGE+3,X
    AF79: 4C 9D 00  JMP NEXT_OP1

Some other simple stack-only opcodes are 30 (AND), 32 (OR), 34 (NOT), 36 (EOR), 38 (NEG), 40 (ADD) and 44 (SUB). All of these replace their operands at the top of the stack with the result of the operation.
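The three stack opcodes are easy to model outside the 6502. The Python sketch below mimics DROP, DUP and SWAP on a stack of 16-bit entries in page $0600 indexed by X; the class name, the `push` and `entry` helpers, and the starting value of X are assumptions added for the demonstration and do not come from the runtime.

```python
EVALPAGE = 0x0600

class EvalStack:
    def __init__(self):
        self.mem = bytearray(0x10000)
        self.x = 0xFE  # hypothetical starting index; the real initial X is not shown

    def push(self, value):
        """Demo helper (not an opcode from the listing): push a 16-bit value."""
        self.x -= 2
        self.mem[EVALPAGE + self.x] = value & 0xFF
        self.mem[EVALPAGE + self.x + 1] = (value >> 8) & 0xFF

    def entry(self, n=0):
        """Read the n-th entry from the top as a little-endian word."""
        base = EVALPAGE + self.x + 2 * n
        return self.mem[base] | (self.mem[base + 1] << 8)

    def drop(self):
        """Opcode DB: INX, INX."""
        self.x += 2

    def dup(self):
        """Opcode DA: DEX, DEX, then copy the old top into the new slot."""
        self.x -= 2
        base = EVALPAGE + self.x
        self.mem[base + 1] = self.mem[base + 3]
        self.mem[base] = self.mem[base + 2]

    def swap(self):
        """Opcode D2: exchange the two topmost entries byte by byte."""
        base = EVALPAGE + self.x
        self.mem[base], self.mem[base + 2] = self.mem[base + 2], self.mem[base]
        self.mem[base + 1], self.mem[base + 3] = self.mem[base + 3], self.mem[base + 1]
```

Because the stack grows downward, "top of stack" is always at EVALPAGE+X and the next entry at EVALPAGE+X+2, which is why every handler above can address its operands with fixed offsets.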
Opcodes 60 and 70 oddly point to the same code, which looks like this:

    B185: BD 01 06  LDA EVALPAGE+1,X
    B188: DD 03 06  CMP EVALPAGE+3,X
    B18B: D0 5C     BNE $B1E9
    B18D: BD 00 06  LDA EVALPAGE,X
    B190: DD 02 06  CMP EVALPAGE+2,X
    B193: D0 54     BNE $B1E9
    B195: F0 5F     BEQ $B1F6
    ...
    B1E9: E8        INX
    B1EA: E8        INX
    B1EB: A9 00     LDA #0
    B1ED: 9D 00 06  STA EVALPAGE,X
    B1F0: 9D 01 06  STA EVALPAGE+1,X
    B1F3: 4C 9D 00  JMP NEXT_OP1
    B1F6: E8        INX
    B1F7: E8        INX
    B1F8: A9 01     LDA #1
    B1FA: 9D 00 06  STA EVALPAGE,X
    B1FD: A9 00     LDA #0
    B1FF: 9D 01 06  STA EVALPAGE+1,X
    B202: 4C 9D 00  JMP NEXT_OP1

If the top two values are equal, we replace them with a 1, otherwise we replace them with a 0. So let's call them EQU. Opcodes 62 and 72 reverse this, so let's call them NEQ. Now why are there two equivalent opcodes? Well, let's look at opcodes 64 and 74. 64 is simply:

    B1A9: 20 2F BE  JSR $BE2F
    B1AC: F0 3B     BEQ $B1E9
    B1AE: 30 39     BMI $B1E9
    B1B0: 10 44     BPL $B1F6

and 74 is similar:

    B1C9: 20 2F BE  JSR $BE2F
    B1CC: F0 1B     BEQ $B1E9
    B1CE: 90 19     BCC $B1E9
    B1D0: B0 24     BCS $B1F6

with

    BE2F: BD 02 06  LDA EVALPAGE+2,X
    BE32: DD 00 06  CMP EVALPAGE,X
    BE35: F0 0B     BEQ $BE42
    BE37: BD 03 06  LDA EVALPAGE+3,X
    BE3A: FD 01 06  SBC EVALPAGE+1,X
    BE3D: 09 01     ORA #1
    BE3F: 70 0A     BVS $BE4B
    BE41: 60        RTS
    BE42: BD 03 06  LDA EVALPAGE+3,X
    BE45: FD 01 06  SBC EVALPAGE+1,X
    BE48: 70 01     BVS $BE4B
    BE4A: 60        RTS
    BE4B: 49 80     EOR #$80
    BE4D: 09 01     ORA #1
    BE4F: 60        RTS

The difference here seems to be whether the 16-bit comparisons are done signed or unsigned. The 6x opcodes are signed comparisons, and the 7x opcodes are unsigned comparisons. 60 is EQU and 70 is UEQU, which happen to have identical implementations, and 62 and 72 are similarly NEQ and UNEQ. 64, 66, 68 and 6A seem to be greater than (GT), less than (LT), greater than or equal (GTE) and less than or equal (LTE) respectively. 74, 76, 78 and 7A appear to be the same, only unsigned.
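A quick way to see why equality needs only one implementation while the ordering tests need two is to compare the same 16-bit words both ways. The Python below is an illustration of the idea only, not a transcription of the 6502 routine at $BE2F; the function names are invented, and the operand order (second-from-top compared against top) is my reading of that routine.

```python
def to_signed(word):
    """Interpret a 16-bit word as a two's complement value."""
    return word - 0x10000 if word & 0x8000 else word

def equ(second, top):
    """Opcodes 60 and 70: bit patterns are equal or they are not."""
    return int(second == top)

def gt_signed(second, top):
    """In the spirit of opcode 64: signed greater than."""
    return int(to_signed(second) > to_signed(top))

def gt_unsigned(second, top):
    """In the spirit of opcode 74: unsigned greater than."""
    return int(second > top)

# $FFFF is -1 when signed but 65535 when unsigned.
print(gt_signed(0xFFFF, 0x0001), gt_unsigned(0xFFFF, 0x0001), equ(0xFFFF, 0xFFFF))
```

The two interpretations only disagree when exactly one operand has its top bit set, which is the case the overflow handling (BVS, EOR #$80) in the signed path exists to get right.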
To further complicate matters, the 8x opcodes also implement comparisons (the same six EQU, NEQ, GT, LT, GTE, LTE operations), but for other types than signed and unsigned integers. The second byte after determines the type, with 00 => bool, 01 => string (both from the stack, so both of these sequences consume 2 bytes), and 02, 03 and 04 being various byte comparisons consuming an additional 2 bytes after the type byte. So our simple p-code disassembler which assumes all opcodes but 2C are fixed size needs to be modified to handle these opcodes a little differently. That's enough for this post. The runtime disassembly is certainly starting to make a bit more sense, but there are plenty of mysteries left to explore.
  13. In the last couple of posts we explored some of the APX Pascal architecture, showing bits of disassembly of the runtime, but I neglected to include the tools I used to extract those bits. This post aims to remedy that, and produce a first draft listing of the APX Pascal runtime.

The runtime disassembly was done using Perl code similar to what I developed in the Dealer Demo Forth deconstruction blog posts last year, so refer to those if you want a discussion on how this disassembler works. There are a few differences worth discussing though.

I've split the code into pieces, m6502.pm being generic support for disassembly, pascal.pm being Pascal p-code decompilation and pascal.pl handling working with the PASCAL runtime object from Side 1 of the APX Pascal disk. If you use perl 5.26 or later you may need to set PERL_USE_UNSAFE_INC=1 in your environment, or add the path with the code to your perl INC path.

read_img has been implemented using read_img_core, which takes a description of an image. The Dealer Demo version was just a one-off custom piece of code. For apxpascal.pl, the definition is:

    read_img_core(
      $addr, $size, '../pascal.xex',
      [0x06 - 0x3300, 0x3300, 0x59ff],
      [0x06 + 0x3a00 - 0x3300 - 0xa000, 0xa000, 0xbfff]);

which means if you ask for an address between 0x3300 and 0x59ff, start at offset 0x6 of the runtime file. If you ask for an address between 0xa000 and 0xbfff, start at offset 0x706 of the runtime file.

The core of the subroutines bytes and words has been factored into bytes_buf and words_buf to allow the code to be used in more cases. Some new options have been added to the core: -str for displaying a string with a known size, -scr for displaying a CR-delimited string. The option -mads, for translating a listing into a MADS compatible source file, is much more sophisticated now. It does some validation of the listing, and converts the listing bytes into a 'check.obx' file.
The code is largely the same as the code I've used before when validating OCRs of source code listings (e.g. Atari PILOT or Star Raiders).

The code from $A000 to $BFFF is the runtime and is pure assembly code. I produced it by initially disassembling all the code in that region, and then stitching in corrections where data appeared in the stream (usually by noticing a BRK or .BYTE slip into the disassembly). The version 0.0 documentation (https://archive.org/details/AtariPascalV0.0Documentation) was then used to help pick some label names for common memory locations. I decided to use IP for the pseudo-PC. The commonly used location $CC, I suspected, matched the TMPBASE label from the documentation. $CA was more-or-less exclusively used for saving and restoring the X-register, so XSAVE seemed a natural label (echoes of the Forth VM there). Finally I concluded DR0 must be $B6 (one of 8 16-bit display registers) and LCLBASE (local base register) $C6. $0600 is clearly used as an evaluation stack, so I used EVALPAGE from the 0.0 documentation. I haven't conclusively identified PRGSP, EVALSP, and LEXLEVEL, nor the many temporaries, and the previous labels could prove erroneous, but this is enough for now to produce a more readable listing.

The code from $3300 to $39FF is a combination of assembly code and p-code. Whenever a JSR $1F06 appears, some number of JMPs precede it, and it is followed by what appears to be a length and then a sequence of p-code. A single appearance of JSR $1F03 appears to be followed by an address and then a sequence of p-code. So I decompile code starting 2 bytes after a JSR $1F03 (until seeing opcode $D9) or $1F06 (until seeing opcode $A6). This description is almost certainly incorrect in some fashion, but seems adequate for now to start decompiling the image.

You can find the Pascal p-code decompiler in pascal.pm. The decompiler is very simple: it contains a list of p-code instruction lengths.
Those that are non-zero are decompiled; the rest are assumed to be errors (not all the values are yet filled in). Negative lengths dump the additional data as words instead of bytes, and opcode $2C is handled as a special case. $2C is clearly a code for a string if you look at examples in the file (e.g. address $3421 contains 0x2c,5,'D:MON'), so a fixed length decompiler isn't going to work for that case. Using this we obtain snippets of p-code like so:

    3410: 4C 13 34  JMP *+3
    3413: 20 03 1F  JSR $1F03
    3416: 3A 34     .WORD $343A
    3418: A2        .BYTE $A2
    3419: 21 35     .WORD $3521
    341B: FD 09     .BYTE $FD,$09
    341D: 25        .BYTE $25
    341E: 3A 34     .WORD $343A
    3420: F1        .BYTE $F1
    3421: 2C        .BYTE $2C
    3422: 05 44 3A  .BYTE 5,'D:MON'
    3425: 4D 4F 4E
    3428: A2        .BYTE $A2
    3429: E0 36     .WORD $36E0
    342B: 25        .BYTE $25
    342C: 3A 34     .WORD $343A
    342E: F1        .BYTE $F1
    342F: A2        .BYTE $A2
    3430: 7D 35     .WORD $357D
    3432: 25        .BYTE $25
    3433: 3A 34     .WORD $343A
    3435: F1        .BYTE $F1
    3436: A2        .BYTE $A2
    3437: 64 35     .WORD $3564
    3439: D9        .BYTE $D9

and so:

    36DD: 4C 00 00  JMP $0000
    36E0: 4C E3 36  JMP *+3
    36E3: 20 06 1F  JSR $1F06
    36E6: 56 00     .WORD $56
    36E8: A8 00     .BYTE $A8,$00
    36EA: D2        .BYTE $D2
    36EB: BC 51     .BYTE $BC,$51
    36ED: 09 52     .BYTE $09,$52
    36EF: 09 54     .BYTE $09,$54
    36F1: 01 54     .BYTE $01,$54
    36F3: A8 00     .BYTE $A8,$00
    36F5: BC 10     .BYTE $BC,$10
    36F7: A6        .BYTE $A6

We need to assign identifiers to those opcodes (first bytes) to make this more readable, but we can do that later as we better identify the opcodes.

The current decompilation and the tools are attached to this post. In my next post we'll continue examining the opcodes and improving the listing.
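The special case for opcode $2C comes down to reading a length byte before the text. As a small illustration in Python (the real handling lives in pascal.pm, which isn't reproduced here; `decode_string_op` is an invented name), using the bytes from address $3421 in the listing above:

```python
def decode_string_op(buf: bytes, pos: int):
    """Decode a $2C opcode at pos: opcode byte, length byte, then the characters.

    Returns the text and the total number of bytes the instruction occupies.
    """
    if buf[pos] != 0x2C:
        raise ValueError("not a $2C opcode")
    length = buf[pos + 1]
    text = buf[pos + 2:pos + 2 + length].decode("ascii")
    return text, 2 + length

# Bytes $3421-$3428 from the listing: 2C 05 'D:MON', then the next opcode (A2).
SAMPLE = bytes.fromhex("2c 05 44 3a 4d 4f 4e a2")
print(decode_string_op(SAMPLE, 0))
```

The returned size is what a table-driven decompiler needs in place of a fixed table entry, so the next opcode is picked up at the right address.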
  14. In the last post, we worked through layers of the APX Pascal runtime to find the main interpreter loop, which in fact resides entirely in page zero. In this post, we're going to dig into some of the opcodes to get a flavor for the runtime implementation.

As we discussed last time, each opcode is represented by a JMP value in a 512-byte table that is copied into $1D00 when the runtime starts. If you peruse the table, the most common JMP target is $B9B5, in 81 entries. This is the not-implemented opcode; hitting any of these in code would be an error. The code seems to be an infinite loop.

    B9B5: 38        SEC
    B9B6: A0 00     LDY #0
    B9B8: A9 67     LDA #$67
    B9BA: 20 EA B8  JSR $B8EA
    B9BD: 4C B5 B9  JMP $B9B5

The next most common targets are a 4-way tie, for opcodes $90-$97 ($AA65), $98-$9F ($AA7A), $E0-$E7 ($A2F5) and $E8-$EF ($A8EC). Let's investigate each in turn.

    AA65: 4A        LSR A
    AA66: 29 07     AND #7
    AA68: 48        PHA
    AA69: A0 01     LDY #1
    AA6B: B1 A4     LDA (IP),Y
    AA6D: 18        CLC
    AA6E: 65 C8     ADC $C8
    AA70: 85 A4     STA IP
    AA72: 68        PLA
    AA73: 65 C9     ADC $C8+1
    AA75: 85 A5     STA IP+1
    AA77: 4C A3 00  JMP $00A3

This appears to be some kind of unconditional branch/jump opcode. The opcode is shifted right and masked, yielding a number 0-3. It uses this as the high byte, and the byte following the opcode as the low byte, of an offset that is added to the value in $C8,$C9. We set the IP to the result and continue execution.

    AA7A: A8        TAY
    AA7B: BD 00 06  LDA $0600,X
    AA7E: E8        INX
    AA7F: E8        INX
    AA80: 4A        LSR A
    AA81: B0 03     BCS $AA86
    AA83: 98        TYA
    AA84: 90 DF     BCC $AA65
    AA86: A9 02     LDA #2
    AA88: 4C 92 00  JMP $0092

This pulls the top of the data stack (the stack is at $0600 and indexed by X). If it's odd we're done, otherwise we call the branch function above. So it's a conditional branch.

Both of these codes are very odd. The only reason I can think of to encode part of the branch offset into the opcode is to extend the range beyond 256 bytes, but in that case, why not just have a separate opcode for long branches? Also, using both BCC and BCS isn't optimal.
A little thought shows removing the BCS achieves the same result, but faster. In general the runtime code looks like it could have used a little more optimization. This project started out as a port from the 8080; perhaps the author never developed enough 6502 experience to tighten up the code in the time allowed.

The next two routines are similar:

    A2F5: 29 0F     AND #$0F
    A2F7: A8        TAY
    A2F8: B1 C6     LDA ($C6),Y
    A2FA: C8        INY
    A2FB: CA        DEX
    A2FC: CA        DEX
    A2FD: 9D 00 06  STA $0600,X
    A300: B1 C6     LDA ($C6),Y
    A302: 9D 01 06  STA $0601,X
    A305: 4C 9D 00  JMP $009D

and:

    A8EC: 29 0F     AND #$0F
    A8EE: A8        TAY
    A8EF: B1 B6     LDA ($B6),Y
    A8F1: C8        INY
    A8F2: CA        DEX
    A8F3: CA        DEX
    A8F4: 9D 00 06  STA $0600,X
    A8F7: B1 B6     LDA ($B6),Y
    A8F9: 9D 01 06  STA $0601,X
    A8FC: 4C 9D 00  JMP $009D

Both of these routines move a 16-bit value to the top of the stack, just using different pointers to source the value ($C6 and $B6).

Most of the remaining opcodes have unique implementations. Some of the interesting ones to look at are "load string" ($2C at $A88F), load small constants ($F0-$F7), load 1, 2 and 4 bytes ($24, $25, $26), call ($A2 at $AB2C) and return ($A6 at $AC96). What's most interesting to me is that having identified these, you might notice they don't match the "Functional Specification" (https://archive.org/details/AtariPascalFunctionalSpecification) at all. Apparently the paper design for the interpreter presented to Atari underwent major revisions by the time it was published. I expected some revisions, but it appears little of the original opcode design survived.

In our next post, we'll examine the p-code that exists in the PASCAL runtime object, and write a very basic p-code disassembler.
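To double-check my reading of the four routines in this post, here is a small Python model of them. It is only a sketch: it assumes each handler is entered with the accumulator still holding the low byte of the doubled opcode, and the names (`doubled`, `branch_target`, `conditional_branch`, `push_indexed_word`, `base`) are made up for illustration, with `base` standing in for the word at $C8,$C9.

```python
def doubled(opcode):
    """Low byte of opcode * 2, as left in the accumulator by ASL A."""
    return (opcode << 1) & 0xFF


def branch_target(acc, operand, base):
    """Model of $AA65: new IP = base + (high:operand)."""
    high = (acc >> 1) & 0x07            # LSR A / AND #7
    return (base + ((high << 8) | operand)) & 0xFFFF


def conditional_branch(acc, operand, base, ip, top_of_stack):
    """Model of $AA7A: an odd value skips the instruction, even branches."""
    if top_of_stack & 1:                # LSR A / BCS
        return (ip + 2) & 0xFFFF        # LDA #2 / JMP $0092
    return branch_target(acc, operand, base)


def push_indexed_word(memory, pointer, acc, stack):
    """Model of $A2F5 and $A8EC: push the word at pointer + 2*(opcode & 7)."""
    offset = acc & 0x0F                 # AND #$0F on the doubled opcode
    low = memory[pointer + offset]
    high = memory[pointer + offset + 1]
    stack.append(low | (high << 8))


# Opcode $93 with operand $10 reaches base + $0310.
print(hex(branch_target(doubled(0x93), 0x10, 0x3400)))
```

Seen this way, the low three bits of a branch opcode select one of eight 256-byte windows above the base, and the low three bits of a load opcode select one of eight word slots relative to the pointer.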
  15. Bill Lange has been blogging about Atari Pascal since early February at https://insideataripascal.blogspot.com/, so here's my own small contribution after spending an afternoon poking around in APX Pascal and looking for the core interpreter.

If we look at the PASCAL runtime on the APX Pascal disk, it's a simple enough image. It loads itself from disk to $3300-$59FF and then starts running from $3300. So what does that initial bootstrap code do? Here's the preamble:

    3300: A2 00     LDX #0
    3302: A9 0C     LDA #$0C
    3304: 9D 42 03  STA ICCMD,X
    3307: 20 56 E4  JSR CIOV
    330A: AD A7 33  LDA $33A7
    330D: 85 F0     STA $F0
    330F: AD A8 33  LDA $33A7+1
    3312: 85 F1     STA $F0+1
    3314: AD A9 33  LDA $33A9
    3317: 85 F2     STA $F2
    3319: AD AA 33  LDA $33A9+1
    331C: 85 F3     STA $F2+1
    331E: AD AB 33  LDA $33AB
    3321: 85 F4     STA $F4
    3323: AD AC 33  LDA $33AB+1
    3326: 85 F5     STA $F4+1
    3328: 20 73 33  JSR $3373

This code closes IOCB #0, then sets up $F0-$F5 using values at $33A7-$33AC and then calls a subroutine. Those values are:

    33A7: 00 3A     .WORD $3A00 ; source
    33A9: 00 A0     .WORD $A000 ; destination
    33AB: 00 20     .WORD $2000 ; count
    33AD: 00 1D     .WORD $1D00

And the subroutine looks like:

    3373: A0 00     LDY #0
    3375: B1 F0     LDA ($F0),Y
    3377: 91 F2     STA ($F2),Y
    3379: A5 F0     LDA $F0
    337B: 18        CLC
    337C: 69 01     ADC #1
    337E: 85 F0     STA $F0
    3380: A5 F1     LDA $F0+1
    3382: 69 00     ADC #0
    3384: 85 F1     STA $F0+1
    3386: A5 F2     LDA $F2
    3388: 18        CLC
    3389: 69 01     ADC #1
    338B: 85 F2     STA $F2
    338D: A5 F3     LDA $F2+1
    338F: 69 00     ADC #0
    3391: 85 F3     STA $F2+1
    3393: A5 F4     LDA $F4
    3395: 38        SEC
    3396: E9 01     SBC #1
    3398: 85 F4     STA $F4
    339A: A5 F5     LDA $F4+1
    339C: E9 00     SBC #0
    339E: 85 F5     STA $F4+1
    33A0: A5 F4     LDA $F4
    33A2: 05 F5     ORA $F4+1
    33A4: D0 CF     BNE $3375
    33A6: 60        RTS

This is just a block copy routine, which relocates all the code at $3A00-$59FF to $A000-$BFFF. This block of code (which is most of the PASCAL executable) is the actual runtime. A simpler way to do this would have been to use a multi-segment load file, but this works well enough.
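As a sanity check on the address arithmetic, the copy loop at $3373 can be modeled in a few lines of Python. This is a sketch, not the runtime's code; `block_copy` is a hypothetical name and the 64K `bytearray` stands in for the Atari's memory.

```python
def block_copy(memory, source, destination, count):
    """Copy bytes one at a time with 16-bit pointers, like the loop at $3373.

    The count is tested after each byte is copied, so (as in the 6502
    code) a count of zero would wrap around and copy 65536 bytes.
    """
    while True:
        memory[destination] = memory[source]
        source = (source + 1) & 0xFFFF            # $F0,$F1
        destination = (destination + 1) & 0xFFFF  # $F2,$F3
        count = (count - 1) & 0xFFFF              # $F4,$F5
        if count == 0:                            # LDA $F4 / ORA $F5 / BNE
            break


memory = bytearray(0x10000)
for address in range(0x3A00, 0x5A00):
    memory[address] = address & 0xFF              # recognizable test pattern

# Parameters from $33A7-$33AC: source, destination, count.
block_copy(memory, 0x3A00, 0xA000, 0x2000)
print(hex(0x3A00 + 0x2000 - 1), hex(0xA000 + 0x2000 - 1))
```

The last line prints the final source and destination addresses, which is where the $59FF and $BFFF end points in the text come from.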
$A000-$BFFF is the cartridge address space for an 8K cart, so clearly this was intended at one point to be shipped as a cartridge. What happens next:

    332B: AD AD 33  LDA $33AD
    332E: 85 80     STA $80
    3330: AD AE 33  LDA $33AE
    3333: 85 81     STA $81
    3335: A9 00     LDA #0
    3337: 85 82     STA $82
    3339: 85 83     STA $83
    333B: 20 00 A2  JSR $A200
    …
    A200: 4C 12 B8  JMP $B812

This copies the word in $33AD ($1D00) to $80,$81 and zeros $82,$83 before invoking a routine at $A200, which vectors to $B812. That routine does several things, including:

    B834: A5 80     LDA $80
    B836: 85 D0     STA $D0
    B838: A5 81     LDA $81
    B83A: 85 D1     STA $D1
    B83C: AD 78 A2  LDA $A278
    B83F: 85 CE     STA $CE
    B841: AD 79 A2  LDA $A278+1
    B844: 85 CF     STA $CE+1
    B846: AD 7A A2  LDA $A27A
    B849: 85 D2     STA $D2
    B84B: AD 7B A2  LDA $A27A+1
    B84E: 85 D3     STA $D2+1
    B850: 20 6B AE  JSR $AE6B

where:

    A278: 00 A0     .WORD $A000
    A27A: 78 02     .WORD $0278

So we move the word at $80,$81 ($1D00) to $D0,$D1, and set $CE,$CF to $A000 and $D2,$D3 to $0278, before calling another block copier.

    AE6B: A0 00     LDY #0
    AE6D: A6 D3     LDX $D3
    AE6F: F0 0E     BEQ $AE7F
    AE71: B1 CE     LDA ($CE),Y
    AE73: 91 D0     STA ($D0),Y
    AE75: C8        INY
    AE76: D0 F9     BNE $AE71
    AE78: E6 CF     INC $CF
    AE7A: E6 D1     INC $D1
    AE7C: CA        DEX
    AE7D: D0 F2     BNE $AE71
    AE7F: A6 D2     LDX $D2
    AE81: F0 08     BEQ $AE8B
    AE83: B1 CE     LDA ($CE),Y
    AE85: 91 D0     STA ($D0),Y
    AE87: C8        INY
    AE88: CA        DEX
    AE89: D0 F8     BNE $AE83
    AE8B: 60        RTS

So we relocate the first $0278 bytes of the "cartridge" to address $1D00-$1F77. The first $200 bytes are just a series of addresses (more on those soon); the next $78 bytes are a set of JMP vectors, e.g.

    1F00: 4C 12 B8  JMP $B812
    1F03: 4C B1 AB  JMP $ABB1
    1F06: 4C B6 AB  JMP $ABB6
    …
    1F72: 4C 87 B8  JMP $B887
    1F75: 4C 5F BC  JMP $BC5F

After we return from this the code continues with:

    B853: A5 80     LDA $80
    B855: 85 82     STA $82
    B857: A5 81     LDA $81
    B859: 85 83     STA $83
    B85B: E6 83     INC $83
    B85D: E6 83     INC $83
    B85F: 20 EB B9  JSR $B9EB

The word at $80,$81 gets moved to $82,$83 and incremented by $200, so the word at $82,$83 is now $1F00.
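The second copier at $AE6B works differently from the first: it copies whole pages first and then the leftover bytes. A rough Python model, with the hypothetical name `paged_copy` and a `bytearray` standing in for memory, makes it easy to confirm the $1D00-$1F77 range and the number of JMP vectors.

```python
def paged_copy(memory, source, destination, count):
    """Copy count bytes as the routine at $AE6B does: pages, then remainder."""
    pages, remainder = count >> 8, count & 0xFF   # $D3 and $D2
    for _ in range(pages):
        for y in range(256):                      # Y runs 0..255, then wraps
            memory[destination + y] = memory[source + y]
        source += 0x100                           # INC $CF
        destination += 0x100                      # INC $D1
    for y in range(remainder):                    # X counts down from $D2
        memory[destination + y] = memory[source + y]


memory = bytearray(0x10000)
for address in range(0xA000, 0xA300):
    memory[address] = (address * 7) & 0xFF        # arbitrary test pattern

paged_copy(memory, 0xA000, 0x1D00, 0x0278)

# $78 bytes of 3-byte JMP instructions starting at $1F00.
vector_count = 0x78 // 3
last_vector = 0x1F00 + 3 * (vector_count - 1)
print(vector_count, hex(last_vector))
```

With a count of $0278 this is two full pages plus $78 bytes, and the vector arithmetic lands on $1F75, the last JMP shown in the listing.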
The subroutine called looks like:

    B9EB: A2 21     LDX #$21
    B9ED: A0 00     LDY #0
    B9EF: B9 CA B9  LDA $B9CA,Y
    B9F2: 99 92 00  STA $0092,Y
    B9F5: C8        INY
    B9F6: CA        DEX
    B9F7: D0 F6     BNE $B9EF
    B9F9: A5 81     LDA $81
    B9FB: 85 AD     STA $AD
    B9FD: 85 B2     STA $B2
    B9FF: E6 B2     INC $B2
    BA01: 60        RTS

This copies the code at $B9CA into page zero, patches the value at $81 ($1D) into $AD and $B2, and then increments $B2, so we end up with the following:

    0092: 18        CLC
    0093: 65 A4     ADC $A4
    0095: 85 A4     STA $A4
    0097: 90 0A     BCC $00A3
    0099: E6 A5     INC $A5
    009B: B0 06     BCS $00A3
    009D: E6 A4     INC $A4
    009F: D0 02     BNE $00A3
    00A1: E6 A5     INC $A5
    00A3: AD FF FF  LDA $FFFF
    00A6: 0A        ASL A
    00A7: B0 05     BCS $00AE
    00A9: 85 AC     STA $AC
    00AB: 6C 00 1D  JMP ($1D00)
    00AE: 85 B1     STA $B1
    00B0: 6C 00 1E  JMP ($1E00)

This is the core of the Pascal interpreter, similar to the Forth NEXT routine I discussed in http://atariage.com/forums/blog/734/entry-15007-dealer-demo-part-4-some-forth-at-last/. It has three parts, and self-modifies its code as it runs. If you enter at $0092, it increments the current p-code pointer (located at $A4,$A5, the operand of the LDA at $00A3) by the accumulator. If you enter at $009D, it increments the current p-code pointer by 1. In both cases, it then proceeds to the third part (which can be called directly as well), which reads the p-code value, multiplies it by two and then patches the low byte of one of two indirect jumps with the result, depending on whether the multiply overflowed or not. This allows us to dispatch all 256 possible p-codes, and each code will then jump back into this routine, keeping the interpreter running forever.

Of course, we haven't actually gotten into the interpreter yet, only set it up. We'll discuss that in a future post, but we've made decent progress towards separating the runtime from the monitor. In particular, it's clear we could move the 8K runtime in $A000-$BFFF into a cartridge image and modify the PASCAL file to skip the initial relocation and rely on the cartridge.
That focuses our attention on the remaining 1.8K of PASCAL to isolate the code that sets up the runtime and loads the MON program. Hopefully we can adapt that code to load another program directly, and thus produce binaries that can run without loading the monitor.
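The dispatch trick in the page-zero loop is easier to see in a higher-level model. The Python below is a sketch under my own naming (`handler_address`, `table_base` and the sample table are illustrative): it mimics the ASL A, the carry test, and the choice between JMP ($1Dxx) and JMP ($1Exx), and shows that together they are just a lookup in a 256-entry table of 16-bit addresses.

```python
def handler_address(memory, opcode, table_base=0x1D00):
    """Return the handler address the page-zero loop would jump through."""
    shifted = opcode << 1                 # ASL A
    low = shifted & 0xFF                  # byte patched into the JMP operand
    if shifted & 0x100:                   # carry set: opcodes $80-$FF
        pointer = table_base + 0x100 + low    # JMP ($1Exx)
    else:
        pointer = table_base + low            # JMP ($1Dxx)
    return memory[pointer] | (memory[pointer + 1] << 8)


# Build a sample table: every entry points at an arbitrary placeholder
# address, then two opcodes get distinct (also illustrative) handlers.
memory = bytearray(0x10000)
handlers = {opcode: 0xB9B5 for opcode in range(256)}
handlers[0x90] = 0xAA65
handlers[0x2C] = 0xA88F
for opcode, address in handlers.items():
    slot = 0x1D00 + 2 * opcode
    memory[slot] = address & 0xFF
    memory[slot + 1] = address >> 8

print(hex(handler_address(memory, 0x90)))
```

Splitting the table across two indirect jumps is what lets an 8-bit index reach all 512 bytes: the carry out of the doubling picks the page, and the doubled low byte picks the entry within it.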