
Classic99 CPU bug?


Torrax


Tursi, you have mentioned a CPU bug in Classic99 with Archiver 3.03G. Could this be the same one that's affecting the disk version of Sofmachine's Jumpy? It looks like some conditional bit is not being set properly on an instruction?!?! Included is a zipped file (V9T9 headers).

 

The game works in V9T9, MESS, and on the real deal. Win994a also exhibits the same behavior as Classic99 when trying to run it.

SOFMACHINE.zip


From a very cursory look, Jumpy is getting stuck in a loop around >B576

 

  B576  9CC1  cb   R1,*R3+    (28)
  B578  1302  jeq  >b57e      (12)
> B57A  0583  inc  R3
  B57C  10FC  jmp  >b576

 

It's moving through memory, looking for equality between the high-bytes of R1 and memory addressed by R3.

 

R1 is >4F83 (so it's looking for >4F in the memory addressed by R3) but the memory that R3 is pointing to (~ >EE00) is all 0's. Eventually R3 wraps to 0 (TI ROM) and a >4F is found, but it's all busted up by then.
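
The scan described above can be mimicked in a few lines of Python. This is only a sketch: `scan_for_byte` and the memory image are made up for illustration, and the one >4F byte is planted at an arbitrary low address rather than taken from the real TI ROM.

```python
def scan_for_byte(memory, r1, r3, max_steps=0x10000):
    """Mimic the loop at >B576: CB R1,*R3+ / JEQ / INC R3 / JMP."""
    target = (r1 >> 8) & 0xFF          # CB compares the high byte of R1
    for _ in range(max_steps):
        byte = memory[r3]
        r3 = (r3 + 1) & 0xFFFF         # *R3+ on a byte instruction adds 1
        if byte == target:
            return r3                  # JEQ taken, R3 already post-incremented
        r3 = (r3 + 1) & 0xFFFF         # INC R3, so only every other byte is tested
    return None                        # nothing found within max_steps


# Hypothetical memory image: all zeros except one >4F byte low in the address space.
memory = bytearray(0x10000)
memory[0x0040] = 0x4F
found_at = scan_for_byte(memory, r1=0x4F83, r3=0xEE00)
print(hex(found_at))
```

Starting from >EE00 the pointer runs off the top of the 64K space, wraps to 0 and only then hits the planted byte, which is the wrap-around behaviour described in the post.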

 

Could it simply be a bad disk image? Do we know for sure that the same physical files, if transferred to a TI will work?

 

You mentioned that the other emulators also fail in the same way, which leads me to believe/think it might simply be a bad version of the program.

 

Just a thought....


It might be getting stuck in the loop, but that does not mean the problem is with any of the instructions related to the loop. If Classic99 had a problem with CB or JEQ it would have shown up a long time ago. I have a hunch the problem is in the way R3 and R4 are set up prior to entering the loop.


Me too Matthew. TF gives just about every 9900 instruction a thorough abusing, and I've seen no problems. Though I recently discovered a bug in the handling of the overflow flag in MESS on some instructions. That's why I'm asking about the disk files themselves. Maybe they're bad or incomplete.


Found and fixed tonight. This was a tricky one (but it does not fix Arc303, alas!) I was fortunate enough to find the cartridge version and it did work, so I was able to compare and narrow down the faulty code in just a few hours.

 

The problem is that always-confusing X instruction, and one rather unexpected way that Jumpy uses it. Jumpy has a piece of code that reads opcodes out of a table, so the instruction is something like X @TABLE(R4). No problem there, except that one of the instructions is a JMP. (It's actually a NOP, which is why it's uber confusing but also why it stayed in the loop. If it jumped wildly it may not have been possible to track down so quickly.)

 

This is the actual piece of code that is faulting:

 

  B45C 04A4 x  @>b3f8(R4)  (48) R4=0, Data=D0F7
  B3F8                          MOVB *R7+,R3     @B342->R3 (08)
  B460 0205 li R5,>0008    (20)
  0008
  B464 04A4 x  @>b3fc(R4)  (34) R4=0. Data=1000
  B3FC                          JMP $
  B466 B3FC ab *R12+,R15   (28) <-----------

 

 

I indicated the bad line. If you look at the X instruction, that address should not have been executed, as it's the operand for the X instruction itself.

 

With a bit of further tracing, I was able to confirm that the X was executing a JMP with a displacement of 00, i.e. a NOP, so it should have continued with the NEXT instruction.

 

It comes down to what happens when an X needs to jump - it does so relative to the X instruction's PC.

 

In my previous attempt to address this, I had done so by caching the PC before jumping to the executed instruction. However, I cached it before parsing X's operands, so it only worked for a single-word X (i.e. X R7). That's why the NOP executed the operand: it jumped there because that was the cached address. I fixed it by caching the PC after parsing the X operands but before executing the instruction (which may have operands of its own, but since the only relative instruction is JMP and friends, that's probably not a concern).
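
To make the ordering concrete, here is a small Python sketch of the two caching points. It is a reconstruction from the description in this post, not Classic99's actual code; `x_then_jump` and its `cache_after_operands` flag are invented for the illustration, and only the indexed/symbolic form of X with a JMP target is handled.

```python
def sign_extend8(value):
    """Interpret the low byte of a jump opcode as a signed word displacement."""
    return value - 0x100 if value & 0x80 else value


def x_then_jump(memory, pc, index_value, cache_after_operands):
    """Model 'X @table(Rn)' whose target is a JMP; return the resulting PC.

    memory maps even addresses to 16-bit words. index_value is the content
    of the index register (0 for plain symbolic addressing).
    """
    opcode = memory[pc]
    assert opcode & 0xFFC0 == 0x0480, "expected an X opcode"
    assert (opcode >> 4) & 0x3 == 2, "sketch handles only @addr / @addr(Rn)"
    pc = (pc + 2) & 0xFFFF                 # past the X opcode word
    cached_pc = pc                         # old behaviour: cache here
    table = memory[pc]                     # X's own operand word
    pc = (pc + 2) & 0xFFFF                 # past the operand word
    if cache_after_operands:
        cached_pc = pc                     # fixed behaviour: cache here
    target = memory[(table + index_value) & 0xFFFE]
    assert target & 0xFF00 == 0x1000, "sketch handles only JMP targets"
    return (cached_pc + 2 * sign_extend8(target & 0xFF)) & 0xFFFF


# Words taken from the trace above: X @>B3FC(R4) at >B464, R4=0, data >1000.
jumpy = {0xB464: 0x04A4, 0xB466: 0xB3FC, 0xB3FC: 0x1000}
print(hex(x_then_jump(jumpy, 0xB464, 0, cache_after_operands=False)))
print(hex(x_then_jump(jumpy, 0xB464, 0, cache_after_operands=True)))
```

With the early cache the NOP lands on >B466, the operand word of the X itself; with the late cache it lands on >B468, the next real instruction.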

 

So the executed instruction, the AB, corrupted R15 which was being used as a counter of some sort in the loop.

 

The cartridge version did this same piece of code, but it worked under Classic99 because it runs from cartridge space, so all the addresses were different. I don't recall what it worked out to in the end, but instead of R15 it was corrupting R8, and this loop didn't care.

 

It's been a long time since I've released an update to Classic99, so I'll get some testing done to make sure none of the work-in-progress code breaks anything, and post an update tomorrow or the next day. :)

Edited by Tursi

I'm surprised this did not show up earlier. I have not looked at Classic99 source in a while, so dealing with X could be a pain, but I don't understand why you would need to cache the PC?

 

X is a single operand multi-addressing instruction, so you would always have to process the operand to know where to get the data and possibly increment the PC again. In hardware, X is processed like this:

 

1. fetch instruction (X in this case)

2. inc PC (always by 2-bytes of course)

3. decode instruction

4. possibly read operands based on addressing

4a. inc PC if instruction used an operand for src or dst

5. read data at address specified by instruction operand

--- 6 and 7 are steps specific to X, other instructions will have different processing at this point

6. replace IR (instruction register) with data value obtained in step 5, which becomes a new instruction

7. jump back to step 3 to decode the new instruction

 

In the case of any of the jump instructions, the displacement is in the opcode, so further fetching of data from the PC's current location would not happen until after the jump had been executed (which of course changes the PC). Also, since the X instruction causes the FSM to go back to the decode step, it skips the "update flags" state, i.e. X does not affect any flags directly. Flags will be set according to the execution of the newly decoded instruction (which could very well be another X instruction... ;-) )
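
The numbered steps above can be sketched as a toy decode loop in Python. Everything here (`MiniCPU`, its method names, the sample program) is hypothetical and heavily simplified: registers sit in a plain list instead of workspace RAM, no flags are modelled, and only X and JMP are decoded.

```python
class MiniCPU:
    """Toy 9900-style core. Registers live in a list here, not in workspace RAM."""

    def __init__(self, memory, pc):
        self.mem = dict(memory)                # even address -> 16-bit word
        self.regs = [0] * 16
        self.pc = pc

    def fetch(self):
        word = self.mem.get(self.pc, 0)
        self.pc = (self.pc + 2) & 0xFFFF       # steps 2 and 4a: PC moves by 2
        return word

    def source_word(self, ts, s):
        """Steps 4-5 for the modes Rn, *Rn and @addr / @addr(Rn)."""
        if ts == 0:
            return self.regs[s]
        if ts == 1:
            return self.mem.get(self.regs[s] & 0xFFFE, 0)
        if ts == 2:
            addr = self.fetch()                # extra word follows the opcode
            if s:
                addr += self.regs[s]
            return self.mem.get(addr & 0xFFFE, 0)
        raise NotImplementedError("*Rn+ is left out of this sketch")

    def step(self):
        ir = self.fetch()                      # steps 1-2
        while True:                            # step 3: decode whatever is in IR
            if ir & 0xFFC0 == 0x0480:          # X
                ir = self.source_word((ir >> 4) & 0x3, ir & 0xF)  # steps 4-6
                continue                       # step 7: decode again, no new fetch
            if ir & 0xFF00 == 0x1000:          # JMP
                disp = ir & 0xFF
                if disp & 0x80:
                    disp -= 0x100
                self.pc = (self.pc + 2 * disp) & 0xFFFF
                return
            raise NotImplementedError(f"opcode {ir:04X} not modelled")


# Hypothetical program: X @>7000 at >6000, with a JMP (displacement 2) stored at >7000.
cpu = MiniCPU({0x6000: 0x04A0, 0x6002: 0x7000, 0x7000: 0x1002}, pc=0x6000)
cpu.step()
print(hex(cpu.pc))
```

Because the PC has already moved past the X opcode and its operand word by the time the JMP is decoded, the jump is taken from >6004 and ends at >6008 without any separately cached PC.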

 

Just rambling. I hope this helps in some way.


Yes, you're right Matthew. In the practical case there is no need to cache the PC, because it's impossible for the PC to move again if X is going to execute a relative jump, because there are no relative jump instructions with additional word operands. It's really just the order of things in my head when I first coded it -- I understood that the branch was relative to the X instruction, and I likewise understood that X could execute instructions which required additional words - these additional words would follow the X. What I did not do was connect in my head that there are no relative branches with additional words so they can't mess further with the PC.

 

I realized this as I was re-reading my description, but it was very late last night when I made the change. I will probably clean it up before publishing. ;) Also note that Classic99 doesn't parse instructions exactly the same way that the hardware does (although admittedly, if I had the CPU manual when I started it might well be closer ;) ). Anyway, it's a C program with numerous functions, of which X is one, so it can't just jump back to the decode step. There are a number of ways I could have solved that, but I do it by recursively calling the opcode dispatch code inside the X handler. This does mean that an X instruction which executes an X instruction could run Classic99 out of stack space, but IIRC I trap that case (and IIRC, I was told that the hardware locks up if you do that. ;) ).
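
A recursive dispatcher with a guard against X-executes-X chains might look roughly like the Python sketch below. The names (`dispatch`, `MAX_X_DEPTH`, `NestedXError`) and the limit of 8 are assumptions made for the example; how Classic99 actually traps this case is not shown in the thread.

```python
MAX_X_DEPTH = 8   # arbitrary limit chosen for this sketch


class NestedXError(RuntimeError):
    """Raised when X keeps executing further X instructions."""


def dispatch(opcode, regs, depth=0):
    """Dispatch one already-fetched opcode and return a short description.

    Only 'X Rn' (target opcode held in a register) and a NOP-style JMP are
    modelled. The depth guard stands in for whatever trap an emulator uses.
    """
    if opcode & 0xFFF0 == 0x0480:              # X Rn
        if depth >= MAX_X_DEPTH:
            raise NestedXError("X chain too deep")
        return dispatch(regs[opcode & 0xF], regs, depth + 1)   # recurse
    if opcode == 0x1000:                       # JMP with displacement 0
        return f"NOP at depth {depth}"
    raise NotImplementedError(f"opcode {opcode:04X} not modelled")


regs = [0] * 16
regs[7] = 0x1000                               # R7 holds a NOP
print(dispatch(0x0487, regs))                  # X R7 runs the NOP one level down

regs[7] = 0x0487                               # R7 now holds 'X R7' itself
try:
    dispatch(0x0487, regs)
except NestedXError as err:
    print("trapped:", err)
```

The guard turns what would otherwise be unbounded recursion into a clean error after a fixed number of nested X instructions.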

 

As for why we never saw it before, well.. it's a strange usage of a rarely used instruction. X is somewhat uncommon, using an absolute address (indexed or not) with X is more uncommon, using a JMP instruction inside of X is very uncommon, and executing a NOP inside of X doesn't make a lot of sense. Only the last case would cause this exact behavior -- and even then the side effect of the incorrect instruction would depend on the address being referenced, which may or may not have any impact on the program. (Any other displacement would simply be off by 2 bytes, which may or may not actually show notable symptoms depending on the program).

Edited by Tursi

Many thanks for figuring this out, and for a good explanation of the "X" instruction. Also, the images do work on an actual TI.

 

I'm sorry that this didn't help squash the ARC303 bug. Does SoundFX work on Classic99? I think it was written by the same author. The source code is on the WHT site, and can give you quite a headache from all the optimizations in it. Could the outstanding bug be related to them?!?!


So what happens when an X instruction is pointed to a branch or JMP instruction? It was clear to me from reading the above.

 

Does the JMP/Branch get ignored and control returns to the main thread? If not, and the branch is taken, at what point does the main program resume?

 

It sounds like the code in Jumpy that uses the X instruction is a kind of interpreter. It's working through an array of instructions and calling instructions from the array. In some cases, the programmer wanted no action to be taken, but rather than trap that in the interpreter loop, he simply coded NOPs into the array, which to be fair is the better way to do it.

 

An interesting edge case.

 

TF also uses X in a couple of places if I remember correctly. I'll have to check the source. However, I remember that it all worked fine in Classic99 and on real h/w so I'm not anticipating any problems.


So what happens when an X instruction is pointed to a branch or JMP instruction? It was clear to me from reading the above.

 

Does the JMP/Branch get ignored and control returns to the main thread? If not, and the branch is taken, at what point does the main program resume?

 

Assuming you meant to write "it was not clear..." ?

 

Think of X as replacing itself with whatever instruction the operand (data) of the X instruction decodes as. However, the X instruction supports all the addressing modes, so there may be extra data following the X instruction opcode that is used for fetching the data (which becomes the new opcode).

 

The X instruction executes fully, just like any other instruction but it does not affect any flags, and after the X instruction there is no "fetch" cycle since the data retrieved by the X instruction becomes the new opcode to the decoder.

 

You do not want to think of X as modifying the flow of the program, it is not a branching instruction of any kind. It is actually very similar to a Load Immediate instruction with multi-addressing support and the destination always being the CPU's internal Instruction Register (the place the opcode is stored when fetched). The IR directs the CPU's state machine to execute an instruction.

 

If the X instruction ends up loading a branch (B, BL, BLWP) or jump (JMP, JNE, etc.) instruction, those instructions will behave exactly as they would had their opcode been at the address of the X instruction, with the exception that any data following the X instruction to support an address mode will already have been used by the X instruction, and the PC will be pointing to the address following the X opcode and any data used by the X instruction.

 

Since the X instruction does not affect any flags, the flags will be set according to the instruction prior to the X instruction. This allows the instruction loaded by X to take some action as if that instruction existed at the same address as the X instruction itself.

 

Clear as mud, right?


A few cases, mostly specific. I think most CPUs have something like an X instruction to help facilitate a debugger. The x86 has something similar, but I can't remember exactly what. Anyway, on the 9900, the X instruction can replace any instruction in a binary file, and could cause a jump to a debugger instead of executing the instruction itself. Think "break point". I think that is the most obvious and generally used case.

 

I suppose it could have its uses in some sort of interpreter, but I have not thought about it enough to give any example. Maybe this "Jumpy" game is an example?

 

You could also use it to change the decision path of a comparison without having the extra jump or labels. Say in some case you wanted to JNE after a compare, and in another case you wanted to JEQ. Instead of writing extra code, you could use X to have it different every time it executes. Of course somewhere you have to decide which decision you want to use and set it up, but that may already be done somewhere. Then again, on a CPU like the 9900 you can write self-modifying code, so I'm not sure where X would be better.
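
As a rough illustration of that last idea, the Python sketch below models an `X Rn` where the register holds either a JEQ or a JNE opcode chosen earlier. `execute_via_x` is a made-up helper; only the EQ status bit and these two jumps are modelled.

```python
JEQ = 0x1300   # jump if the EQ status bit is set
JNE = 0x1600   # jump if the EQ status bit is clear


def execute_via_x(jump_opcode, eq_flag, pc_after_x):
    """Return the PC after 'X Rn' where Rn holds a JEQ or JNE opcode.

    pc_after_x is the address of the word following the X instruction,
    which is what the jump displacement is applied to.
    """
    disp = jump_opcode & 0xFF
    if disp & 0x80:
        disp -= 0x100                          # displacement is a signed byte
    kind = jump_opcode & 0xFF00
    if kind == JEQ:
        taken = eq_flag
    elif kind == JNE:
        taken = not eq_flag
    else:
        raise NotImplementedError("only JEQ and JNE are modelled")
    return (pc_after_x + 2 * disp) & 0xFFFF if taken else pc_after_x


# Same compare result (EQ set), same X instruction, different opcode in the register.
print(hex(execute_via_x(JEQ | 0x03, eq_flag=True, pc_after_x=0x6002)))
print(hex(execute_via_x(JNE | 0x03, eq_flag=True, pc_after_x=0x6002)))
```

The flags come from whatever ran before the X, so swapping the opcode held in the register flips the decision without touching the code at the X itself.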


Yeah.. my thinking is that X was designed in as a /replacement/ for self-modifying code. Using X, a particular address can be changed to be any instruction without actually changing memory. One advantage of this (besides cleaner code for the purists) is that you can essentially have fake-self-modifying code that runs from ROM. If the 9900 line had developed further, this would also ensure that such code ran nicely with the existence of program cache.

 

That's the only reason I can think of for it. But like many such things, you can probably abuse it in interesting ways ;)


I'm sorry that this didn't help squash the ARC303 bug. Does SoundFX work on Classic99? I think it was written by the same author. The source code is on the WHT site, and can give you quite a headache from all the optimizations in it. Could the outstanding bug be related to them?!?!

 

I have SoundFX, but Classic99's sound system doesn't support high-frequency audio changes yet (I used to have it before I rewrote the sound engine, so I know what to do, I just haven't yet. I know exactly four programs that support digitized sound playback on the TI, and two of them are mine ;) ).

 

Even so, complex code is hard to find a faulting instruction in, and Sound F/X has a lot of options for memory, etc. Simple examples are better.

 

I actually know the exact address at which Arc303 makes the wrong decision, I just don't follow what the code is doing at that point. I'll get back to it one day and sort it out. Part of me keeps hoping it's just a disk emulation bug, but as that becomes more robust I increasingly doubt that.


Assuming you meant to write "it was not clear..." ?

Woops! Yes, sorry!

 

Think of X as replacing itself with whatever instruction the operand (data) of the X instruction decodes as. However, the X instruction supports all the addressing modes, so there may be extra data following the X instruction opcode that is used for fetching the data (which becomes the new opcode).

 

That still doesn't quite make sense to me. I think of X as a method of providing a GOSUB/RETURN (or BL/RT in machine code terms) for a *single* instruction.

 

So, for example:

 

X *R3

INCT R2

 

will do an indirection on R3, and execute that instruction. *But*... The INCT R2 will still execute. What you said does make sense, though. In one's mind one can simply imagine the eventual instruction (the target of the X instruction) in place of the X instruction.

 

So, you're saying, to be clear, that if the target of the X instruction in the example above was a JNE (for example) then the INCT might not be executed, if the EQ status bit happened to be clear? I ask because it could turn out to be a useful feature.

 

Totally agree re the use of X as a debugger. I think there's also a specific instruction on the 9900, (is it HALT ? can't remember) that causes an interrupt (it would be called an exception, or a trap on other processor families) but it's not supported on the 4A motherboard; something to do with the interrupt decoding? I think that particular interrupt is not 'seen' by the hardware for some reason.


So, for example:

 

X *R3

INCT R2

 

will do an indirection on R3, and execute that instruction. *But*... The INCT R2 will still execute.

---------------------------------------------------------------------------------------------------------------------

If I recall correctly, whether the INCT R2 executes depends on what is being executed. If it is a one-word instruction then all is well, but if it is something like:

MOV @>834A,@>B000 then it will MOV using the INCT R2 in place of >834A and the next word after INCT R2 in place of >B000.

 

I think you won't go wrong if you think of it as plugging in the word being executed in place of the X instruction and not (as I used to think) as a BL with an automatic return.
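
Here is a minimal Python sketch of that "plugging in" behaviour for the MOV case above. `run_x_indirect` is invented for the example, the addresses and the data value >1234 are arbitrary, and only `X *R3` with a `MOV @src,@dst` target (opcode >C820) is handled.

```python
def run_x_indirect(memory, regs, pc):
    """Execute 'X *R3' at pc where the target word is MOV @src,@dst (>C820).

    memory is a dict of even address -> 16-bit word and is updated in place.
    Returns the PC after the executed instruction.
    """
    assert memory[pc] == 0x0493, "sketch expects X *R3"
    pc += 2                                    # past the X opcode
    ir = memory[regs[3] & 0xFFFE]              # the word that X will execute
    assert ir == 0xC820, "sketch expects MOV @src,@dst"
    src = memory[pc]                           # taken from the word after X
    pc += 2
    dst = memory[pc]                           # and from the word after that
    pc += 2
    memory[dst & 0xFFFE] = memory.get(src & 0xFFFE, 0)
    return pc


memory = {
    0x6000: 0x0493,   # X *R3
    0x6002: 0x05C2,   # INCT R2, but it gets consumed as the MOV source address
    0x6004: 0x7100,   # hypothetical next word, consumed as the MOV destination
    0x7000: 0xC820,   # MOV @src,@dst, the word R3 points at
    0x05C2: 0x1234,   # arbitrary data sitting at address >05C2
}
regs = [0] * 16
regs[3] = 0x7000
next_pc = run_x_indirect(memory, regs, 0x6000)
print(hex(next_pc), hex(memory[0x7100]), regs[2])
```

R2 is left untouched because the INCT word is never decoded as an instruction, and execution resumes at >6006.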

 

For what it's worth, Win994a does not support the X instruction, although Cory sent me a beta copy that does.


That still doesn't quite make sense to me. I think of X as a method of providing a GOSUB/RETURN (or BL/RT in machine code terms) for a *single* instruction.

 

That's not a good way to think about X and I would urge you to formulate a different mental model for the instruction. The main reason is because X does not affect the PC in any way, other than the CPU's normal incrementing of the PC to fetch opcodes and instruction parameters based on the opcode. The X instruction does not *go* anywhere.

 

I'm assuming you understand a few principles of 9900 machine execution:

 

* Data and instruction opcodes are indistinguishable by the CPU. If you branch to a blob of data, the CPU will happily try to execute the data as instructions.

 

* Any additional data needed to satisfy an instruction's operands are stored in memory immediately following the instruction opcode. Particularly for symbolic addressing and the immediate instructions.

 

* The PC (program counter) always points to an even address, and *incrementing* the PC causes it to point to the next even address (it is incremented two bytes at a time).

 

So, for example:

6000 aaaa C @1234,@5678
6002 1234
6004 5678
6006 bbbb LI R1,2
6008 0002
600A cccc X @ABCD
600C ABCD
600E dddd DEC R1
6010 eeee MOV R1,R2
. . .
ABCD ffff JEQ $+6

 

I didn't go look up the opcodes for the specific instructions, but where you see four letters, xxxx, would be just another number (data), but since the PC is pointing to them they are fetched by the CPU and treated as instructions.

 

So let's play 9900 and execute the code (assume PC=6000 to start):

 

Fetch opcode from PC (6000): aaaa C

Inc PC (now 6002)

Store aaaa in IR (Instruction Register)

Decode IR: aaaa

Fetch src operand from PC (6002)

Inc PC (now 6004)

Fetch data at address 1234 into T1 (T1 and T2 are internal temporary registers)

Fetch dst operand from PC (6004)

Inc PC (now 6006)

Fetch data at address 5678 into T2

Execute: ALU op (compare T1 to T2)

Update flags

 

Fetch opcode from PC (6006): bbbb LI

Inc PC (now 6008)

Store bbbb in IR

Decode IR: bbbb

Fetch immediate value from PC (6008)

Inc PC (now 600A)

Execute: Store 0002 into workspace register

Update flags

 

Fetch opcode from PC (600A): cccc X

Inc PC (now 600C)

Store cccc in IR

Decode IR: cccc

Fetch src operand from PC (600C)

Inc PC (now 600E)

Fetch data at address ABCD into T1 (T1 = ffff)

Execute: Move T1 to IR, cause CPU to skip Fetch

 

Decode IR: ffff (JEQ)

Execute: jump or not based on flags

 

Fetch opcode from PC (600E): eeee DEC

...

 

Also note that if the instruction loaded by X from address ABCD was such that it required an operand that followed the instruction, then that data would be fetched based on the PC, just like any instruction. For example, if the data at ABCD represented an instruction like:

 

MOV @0000,R1

 

The *value* of 0000 at runtime would be loaded from the PC's current address, which is 600E:dddd in this example. So it would execute the equivalent of:

 

MOV @dddd,R1

 

And the PC would be 6010 upon the next fetch. So, the DEC at 600E is never executed, it ends up being treated as a data parameter to the instruction loaded by the previous X instruction. This is what I believe was happening in the Jumpy game.


Can you post the block of code around the suspect instruction(s)? More eyes make for quicker troubleshooting.

 

Sorry I haven't got back to this yet. I should have written "I once knew".. hehe. I will see if I saved my notes though. But I don't think the fault was in that code, it was just the branch that decided that the task was complete. I never figured out how that branch got its information.


Totally agree re the use of X as a debugger. I think there's also a specific instruction on the 9900, (is it HALT ? can't remember) that causes an interrupt (it would be called an exception, or a trap on other processor families) but it's not supported on the 4A motherboard; something to do with the interrupt decoding? I think that particular interrupt is not 'seen' by the hardware for some reason.

 

We don't have a software interrupt instruction (like BRK on the 6502, or TRAP on the 68000, or INT on the x86). You're thinking of the "external" instructions that expect external hardware to do something, these are CKON/CKOF (clock on and clock off), IDLE, RSET (Reset) and LREX (Load or Restart Execution).

 

These instructions work by setting a 3-bit pattern identifying the instruction on A0-A2 (Most significant bits of address bus) and toggling CRUCLK. They are not supported on the 99/4A because there is no hardware to differentiate this operation from a CRU operation (which has 000 set on those address lines).
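
For reference, a small Python sketch of that 3-bit identification. The opcode values are the standard 9900 encodings; the rule that the pattern is simply bits taken from the opcode, and the resulting codes, are from my memory of TI's data manual, so treat them as an assumption to check. `external_code` is a made-up helper name.

```python
# Opcodes of the five external instructions (no operands).
EXTERNAL_OPCODES = {
    "IDLE": 0x0340,
    "RSET": 0x0360,
    "CKON": 0x03A0,
    "CKOF": 0x03C0,
    "LREX": 0x03E0,
}


def external_code(opcode):
    """Return the 3-bit pattern assumed to appear on A0-A2 for this opcode."""
    return (opcode >> 5) & 0x7


for name, opcode in EXTERNAL_OPCODES.items():
    print(f"{name}: >{opcode:04X} -> A0-A2 = {external_code(opcode):03b}")
```

None of the five patterns is 000, which is the value a real CRU operation puts on those lines, so hardware that decoded A0-A2 could tell them apart; the 99/4A simply does not do that decoding.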

 

Besides the possibility of confusing some piece of hardware in the system (I've never looked in depth what the odds are, I think it's rather left to chance), most of these instructions have no side effect. RSET zeros the interrupt mask (disabling interrupts). IDLE actually stops execution until an interrupt is received, which might conceivably be used for some form of power saving. It doesn't actually put the CPU to sleep, though, it goes into a sort of loop, pulsing CRUCLK until the interrupt stops it.

 

Interestingly, the game Slymoids uses IDLE, although in emulation I can't see that it relies on it, as Classic99 does not implement the halt and the game still seems to run fine. It may be accidental, or the author may have tried it then never got around to removing it. Would not be the only bug I've seen - Moon Patrol has an illegal opcode in the middle of its main loop. ;) But since illegal opcodes have no effect on the 9900 you never notice.
