TMS9900 CPU core creation attempt


171 replies to this topic

#151 RXB OFFLINE  

RXB

    River Patroller

  • 3,583 posts
  • Location:Vancouver, Washington, USA

Posted Sun Mar 17, 2019 2:59 PM

 

I should have known better, thanks Stuart! Once again you've already done what I was looking for, this seems perfect!  :)

I am running out of time today on this project, need to continue tomorrow, first with your cartridge.

Just for CLARIFICATION I did not post DISASSEMBLY of GROM and XB ROMs of XB.

 

This is the ORIGINAL GROM and ROM code from Texas Instruments, now the changes made for RXB are listed.

 

I did not remove any original code; it is commented out, and I post * RXB PATCH CODE * at the places where it was replaced.



#152 speccery OFFLINE  

speccery

    Moonsweeper

  • Topic Starter
  • 359 posts

Posted Mon Mar 18, 2019 2:03 PM

Well, that was an interesting debugging session! In the end I understood that what I thought was a problem with subtraction being computed incorrectly actually manifests itself in printing (and elsewhere too). Here is the problem under Extended Basic, and below is the explanation of how I got there. I still don't know which CPU instruction is the offender, but I am making progress.

 

Test program under extended Basic

 

Finding the problem made use of a whole range of the FPGA system's features, along with Stuart's cool LBLA / debugger module:

 

Since I thought the problem was in the subtract operation, I studied the excellent TI Intern book, starting from RXB's comment about the SSUB routine address. I wrote a simple Basic program:

A=1
B=2
C=A-B 

and ran this under Classic99, setting breakpoints at >D74 and >FA6 to see the contents of the scratchpad memory before and after the subtraction operation when running Extended Basic. (I could have determined earlier that the problem cannot be in this ROM code, as it is shared with regular TI Basic, and that was working, but bear with me - these things only make sense once you know where the problem is not.)

I could see the contents of the floating point accumulator at 834A (the value 1) and the argument at 835C (the value 2), and after the operation the floating point accumulator became negative. That makes sense.

 

Next I wanted to verify if this is what happens with my FPGA CPU. This is where I got to use Stuart's cartridge and some features of the FPGA system.

First, taking advantage of the fact that in the FPGA system the ROM is actually RAM, I loaded Stuart's cartridge and modified the system ROM to call a subroutine at the beginning of the subtract operation (I added the BLWP @>1360 instruction).

Inserted BLWP @>1360 instruction at >D7C
Notice that since I needed space for my intercepting subroutine call, I overwrote the NEG instruction at >D7C and moved the NEG @>834A instruction into the intercept routine. I placed the subroutine at >1364, writing over the cassette support code.
The code jumped to from intercept at beginning of subtract

I then did the same operation again at the end of the floating point routine, at >FA6, this time moving the instruction MOV @>834A,R1 to the interception routine.

 

The second intercept point at the end of subtract (or actually rounding)
Second intercept destination
 

The actual benefit of the intercept routines is that they copy the entire scratchpad memory to a safe place, before and after executing Basic ROM's floating point subtract routine respectively. The FPGA system has 1 kilobyte of scratchpad memory instead of the regular 256 bytes, so I just copied the memory from 8300 .. 83FF first to 8100..81FF and at the end to 8200..82FF.

 

After making those patches to the system ROM, I copied the modified ROM to the PC's disk. I then initialized the FPGA system again, this time with the modified ROM but with the Extended Basic cartridge inserted instead of Stuart's cartridge. Next I again performed my subtraction in Basic. After running that piece of Basic code, I read back the two copies of scratchpad memory (before and after the subtract) and compared them. At this point I saw that the subtract had in fact executed correctly, and that the problem manifests itself when printing negative numbers - the minus sign does not appear. The problem also occurs with other operations, since the cos and sin functions also have issues.
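
The before/after comparison described here could be sketched roughly as follows. This is only a minimal Python sketch, assuming the two 256-byte scratchpad copies have already been read back to the PC as byte strings; the function name `diff_scratchpad` and the sample bytes are made up for illustration and are not part of the actual FPGA tooling.

```python
def diff_scratchpad(before: bytes, after: bytes, base: int = 0x8300):
    """Return (address, old, new) for every scratchpad byte that changed."""
    if len(before) != len(after):
        raise ValueError("snapshots must have the same length")
    return [(base + i, b, a)
            for i, (b, a) in enumerate(zip(before, after)) if b != a]


# Illustrative data only: pretend the first word of the floating point
# accumulator at >834A went from >4001 (1) to >BFFF (-1).
before = bytearray(256)
after = bytearray(256)
before[0x4A:0x4C] = bytes([0x40, 0x01])
after[0x4A:0x4C] = bytes([0xBF, 0xFF])

for addr, old, new in diff_scratchpad(bytes(before), bytes(after)):
    print(f">{addr:04X}: >{old:02X} -> >{new:02X}")
```

Listing only the changed bytes makes it quick to see whether the accumulator ended up with the expected value.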

 

I am very happy with the DMA feature of the FPGA system, as this enables me to read and write the TI clone's memory while the system is running - super handy for debugging. The same mechanism is used when the system is booted up from PC (it can also boot from flash ROM).

 

Now, after this debugging session, I know where the problem is not. Progress. 


Edited by speccery, Mon Mar 18, 2019 2:07 PM.


#153 PeteE OFFLINE  

PeteE

    Chopper Commander

  • 221 posts
  • Location:Beaverton, OR

Posted Wed Mar 20, 2019 12:55 PM

Here is a cartridge program I've been working on for validating CPU instructions.  It's based on your earlier description of a program that runs through various instructions with varying inputs and checks the results.  It checks all combinations of the inputs 0,1,>7FFF,>8000,>AAAA,>FFFF with the following instructions: A AB ABS AI ANDI C CB CI COC CLR CZC DEC DECT DIV INC INCT INV LI MOV MOVB MPY NEG ORI S SB SETO SLA SLA0 SOC SOCB SRA SRA0 SRC SRC0 SRL SRL0 SWPB SZC SZCB XOR.  The status flags are set to two known states before each test, to verify that only the proper status flags are modified.  If all goes well, the display should show "OK!".  For the first 24 failures, a line is printed containing: the instruction name, the first and second input, the result, the expected result, then the status flags, and finally the expected status flags.  The first status flag byte is the result of the instruction after only EQ is set, and the second byte is the result after LGT AGT C OV OP are set.  Instructions ending in I have the inputs swapped. Shift instructions ending in zero use the R0 register for the shift amount; otherwise they shift by 1.  I've tested it on Classic99 and it is OK, so it will be interesting to see how it fares on your CPU.
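
The idea of running each instruction from two known flag states can be sketched in a few lines. This is only a rough Python model, not the cartridge's source: the helper names `inv` and `run_twice` are made up, and the flag layout assumed is the high byte of the status register with L> A> EQ C OV OP from the left.

```python
# High byte of the TMS9900 status register, most significant bit first.
L_GT, A_GT, EQ, C, OV, OP = 0x80, 0x40, 0x20, 0x10, 0x08, 0x04


def inv(value, st):
    """Model of INV: one's complement; only L>, A> and EQ are updated."""
    result = ~value & 0xFFFF
    st &= ~(L_GT | A_GT | EQ) & 0xFF
    if result == 0:
        st |= EQ
    else:
        st |= L_GT
        if result < 0x8000:      # positive when read as a signed word
            st |= A_GT
    return result, st


def run_twice(op, value):
    """Run op from both known flag states; pack the two status bytes."""
    _, st1 = op(value, EQ)                          # only EQ set
    _, st2 = op(value, L_GT | A_GT | C | OV | OP)   # everything except EQ set
    return (st1 << 8) | st2


print(f"{run_twice(inv, 0x0000):04X}")   # prints 809C
```

Starting from both states catches an instruction that wrongly sets a flag as well as one that wrongly clears it, which a single run cannot do.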

Attached Files



#154 mizapf OFFLINE  

mizapf

    River Patroller

  • 3,607 posts
  • Location:Germany

Posted Wed Mar 20, 2019 1:49 PM

I tried it in MAME, and I am getting INV printouts like this (24 lines):

INV 0000 0000 FFFF FFFF 808C 809C

So that means that the status register differs in bit 11 (from left), where the specs say that those bits are unused and set to 0? (TMS 9900 Microprocessor Manual, p. 21, section 3.4)

Attached Files



#155 PeteE OFFLINE  

PeteE

    Chopper Commander

  • 221 posts
  • Location:Beaverton, OR

Posted Wed Mar 20, 2019 2:38 PM

I tried it in MAME, and I am getting INV printouts like this (24 lines):

INV 0000 0000 FFFF FFFF 808C 809C

So that means that the status register differs in bit 11 (from left), where the specs say that those bits are unused and set to 0? (TMS 9900 Microprocessor Manual, p. 21, section 3.4)

It's not bit 11; the two bytes are actually the leftmost 8 bits of the status word, printed twice, so it's bit 3 (from left), which means MAME's INV is clearing the C status flag.  The Editor/Assembler manual says only the L> A> and EQ bits are modified by INV.  The reason I have two sets of status bits is that the instruction is executed once with the status bits set to EQ only, and then again with L> A> C OV P set.  That way I can check whether bits are being set or cleared properly, which you might not catch with a single test. Does it print OK on a real console?
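
To make the reading of the two status bytes concrete, here is a small Python sketch that names the flags that differ between the reported and the expected word. The helper names `flag_names` and `explain` are made up for illustration; the bit order assumed is L> A> EQ C OV OP X from the left.

```python
FLAG_NAMES = ["L>", "A>", "EQ", "C", "OV", "OP", "X"]


def flag_names(byte):
    """List the flags set in the high byte of the status register."""
    return [name for i, name in enumerate(FLAG_NAMES) if byte & (0x80 >> i)]


def explain(got, expected):
    """For each of the two runs, list the flags that differ from the expectation."""
    diffs = []
    for shift in (8, 0):   # first byte: EQ-only start, second byte: other flags set
        g = (got >> shift) & 0xFF
        e = (expected >> shift) & 0xFF
        diffs.append(flag_names(g ^ e))
    return diffs


print(explain(0x808C, 0x809C))   # prints [[], ['C']]
```

For the INV line above, the first run matches and only the carry flag differs in the second run.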


Edited by PeteE, Wed Mar 20, 2019 2:51 PM.


#156 mizapf OFFLINE  

mizapf

    River Patroller

  • 3,607 posts
  • Location:Germany

Posted Wed Mar 20, 2019 3:17 PM

OK, understood. You're certainly right, this is a bug. I fixed it; also got an issue with OV bit set by SRA, also fixed. Thank you for this tool!

 

But what is this here:

 

SLA0 0000 8000 0000 0000 2024 282C

 

This means that R0 is used for shifting, but 24 = 00100100 (LAECOPX-) means that parity is expected to be set (where MAME keeps it reset)?



#157 PeteE OFFLINE  

PeteE

    Chopper Commander

  • 221 posts
  • Location:Beaverton, OR

Posted Wed Mar 20, 2019 3:41 PM

I really should have tested this on a real console again before posting the binary.  The shift by R0 was one of the tests I added afterward. This could possibly be a bug in Classic99.  I'm at work now, but I can try it later at home.

EDIT: Not a bug in Classic99, Tursi did his homework :D

Edited by PeteE, Wed Mar 20, 2019 5:57 PM.


#158 Asmusr OFFLINE  

Asmusr

    River Patroller

  • 3,110 posts
  • Location:Denmark

Posted Wed Mar 20, 2019 4:36 PM

It's passing on a real machine.



#159 mizapf OFFLINE  

mizapf

    River Patroller

  • 3,607 posts
  • Location:Germany

Posted Wed Mar 20, 2019 4:56 PM

In that case I'm not sure whether I understood the output. What is the operation that was performed in the reported line?



#160 PeteE OFFLINE  

PeteE

    Chopper Commander

  • 221 posts
  • Location:Beaverton, OR

Posted Wed Mar 20, 2019 5:08 PM

SLA0 0000 8000 0000 0000 2024 282C
 
This means that R0 is used for shifting, but 24 = 00100100 (LAECOPX-) means that parity is expected to be set (where MAME keeps it reset)?


Edit: actually the expected bits (282C) are the right-hand word, so it expects OV to be set (since 8000 shifted by 16 overflows). Looking at the MAME source, it seems like it should be triggering the overflow. The P flag is preserved, as it should be.


Effectively,
LI R0,>0000
LI R1,>8000
SLA R1,0
Source code, if it helps:
Spoiler
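
For reference, the expected status bytes for the three-instruction sequence above can be reproduced with a small Python model. This is only a sketch under stated assumptions (count taken from the low 4 bits of R0 with 0 meaning 16, carry is the last bit shifted out, overflow is set if the sign bit changes at any step); the function name `sla_r0` is made up and this is not the cartridge's or MAME's code.

```python
L_GT, A_GT, EQ, C, OV, OP = 0x80, 0x40, 0x20, 0x10, 0x08, 0x04


def sla_r0(value, r0, st):
    """Model of SLA Rn,0: shift count from R0's low 4 bits, 0 meaning 16."""
    count = (r0 & 0x000F) or 16
    carry = 0
    overflow = False
    for _ in range(count):
        carry = (value >> 15) & 1               # last bit shifted out
        shifted = (value << 1) & 0xFFFF
        if (shifted ^ value) & 0x8000:          # sign changed in this step
            overflow = True
        value = shifted
    st &= ~(L_GT | A_GT | EQ | C | OV) & 0xFF   # OP is left untouched
    if value == 0:
        st |= EQ
    else:
        st |= L_GT
        if value < 0x8000:
            st |= A_GT
    if carry:
        st |= C
    if overflow:
        st |= OV
    return value, st


_, st1 = sla_r0(0x8000, 0x0000, EQ)
_, st2 = sla_r0(0x8000, 0x0000, L_GT | A_GT | C | OV | OP)
print(f"{st1:02X}{st2:02X}")   # prints 282C
```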

Edited by PeteE, Wed Mar 20, 2019 5:30 PM.


#161 mizapf OFFLINE  

mizapf

    River Patroller

  • 3,607 posts
  • Location:Germany

Posted Wed Mar 20, 2019 5:19 PM

What does the "!" label mean? (Mind that I am only using Editor/Assembler.)

 

I can try to step forward in MAME to the execution of that instruction.

 

(BTW, setting status bits could be easier with RTWP (load the status value into R15, R13 = current WS, R14 = $+2). The 9995 has a LST operation.)



#162 mizapf OFFLINE  

mizapf

    River Patroller

  • 3,607 posts
  • Location:Germany

Posted Wed Mar 20, 2019 5:48 PM

OK now. While fixing the last issue, I unintentionally changed the semantics of the OV in SLA. Now MAME delivers "OK!"


Edited by mizapf, Wed Mar 20, 2019 5:48 PM.


#163 PeteE OFFLINE  

PeteE

    Chopper Commander

  • 221 posts
  • Location:Beaverton, OR

Posted Wed Mar 20, 2019 5:51 PM

What does the "!" label mean? (Mind that I am only using Editor/Assembler.)


Oh sorry, that's a local label, supported only by Ralph's XDT assembler.

The breakpoint is at >632A.

Edit:

OK now. While fixing the last issue, I unintentionally changed the semantics of the OV in SLA. Now MAME delivers "OK!"


Oh cool, glad I could help make MAME better. Here's hoping it can help Speccery too.

Edited by PeteE, Wed Mar 20, 2019 5:55 PM.


#164 mizapf OFFLINE  

mizapf

    River Patroller

  • 3,607 posts
  • Location:Germany

Posted Wed Mar 20, 2019 6:06 PM

This is one of those bugs that only show up when your (spaceship | MunchMan | laser beam | medical probe) is at a position in the game that has common divisors with the sum of my birth year and my car license plate. (In other words: almost never, and when it does, you have absolutely no clue what has happened.)

 

Some years ago, a bug that was also related to the overflow status bit made it impossible to cure a patient in Microsurgeon because the game always started with the deceased patient. Now try to find out why everything else works, just not this game.



#165 speccery OFFLINE  

speccery

    Moonsweeper

  • Topic Starter
  • 359 posts

Posted Fri Mar 22, 2019 3:05 AM

Thanks, this is awesome and extremely helpful to have an independent piece of verification code! I've not had time during the week to test this, but I am looking forward to doing so this evening. Hopefully something shows up immediately :)

 

Also your testing methodology is better than my test code, I should also test the instructions twice, to make sure the flags go both ways properly. Thus I can improve my test coverage by making a simple modification. Perhaps I should also work on the test code to make it a cartridge, could be useful to others too. 



#166 Asmusr OFFLINE  

Asmusr

    River Patroller

  • 3,110 posts
  • Location:Denmark

Posted Fri Mar 22, 2019 10:45 AM

The test suite found an error in js99er.net. However, it was not the tested instruction (A) that didn't work but the instruction to clear the flag before the test (AB) that didn't clear the parity flag. Just something to be aware of if you get any errors.



#167 speccery OFFLINE  

speccery

    Moonsweeper

  • Topic Starter
  • 359 posts

Posted Sat Mar 23, 2019 4:40 AM

It was great to be able to use PeteE's software, I found and fixed two bugs:

1. Despite my "testing" there was still a bug in the treatment of ST1 (A> flag) with the ABS instruction. The processing completely lacked the special case that the ABS instruction sets ST1 based on the source argument.

2. SLA0 did not set the overflow flag properly if the shift count was greater than one.
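
The ABS special case from bug 1 can be illustrated with a short Python model. This is only a sketch, not the FPGA core's logic: the function name `abs16` is made up, and it models just L>, A>, EQ and OV (carry handling is deliberately left out).

```python
L_GT, A_GT, EQ, OV = 0x80, 0x40, 0x20, 0x08


def abs16(value):
    """Return (result, flags); L>, A>, EQ reflect the ORIGINAL operand."""
    flags = 0
    if value == 0:
        flags |= EQ
    else:
        flags |= L_GT
        if value < 0x8000:       # original operand was positive
            flags |= A_GT
    if value == 0x8000:
        flags |= OV              # >8000 has no positive counterpart
    result = (-value & 0xFFFF) if value & 0x8000 else value
    return result, flags


result, flags = abs16(0xFFFF)    # the operand is -1
print(f"{result:04X} {flags:02X}")   # prints 0001 80
```

The result is positive, yet A> stays clear because the comparison is made against the source operand rather than the result.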

 

Fixing bug 1 got Extended Basic fixed! So now I could resume what I was actually trying to implement: read access to the serial flash ROM chip. To my delight the code I had written worked, and I was able to access the serial flash ROM from Basic with a series of call load(...) and call peek(...) statements. I wish the Basic had direct support for hexadecimal numbers, both input and output. The Oric Atmos Basic features these and also DOKE and DEEK operations, which enable peeks and pokes but with 16-bit values...
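
As a stopgap for the missing hexadecimal support, the decimal arguments can be generated on the PC. The following is a minimal Python sketch; the helper names `basic_address` and `doke` are made up, and the address and data values in the example are arbitrary. It relies on two facts: addresses above 32767 are written as negative numbers in CALL LOAD, and the 9900 stores the high byte of a word first.

```python
def basic_address(addr):
    """Convert a 16-bit address to the signed form used by CALL LOAD / CALL PEEK."""
    return addr - 65536 if addr > 32767 else addr


def doke(addr, word):
    """Build a CALL LOAD statement that stores a 16-bit word, high byte first."""
    high = (word >> 8) & 0xFF
    low = word & 0xFF
    return f"CALL LOAD({basic_address(addr)},{high},{low})"


print(doke(0x8330, 0x37D7))   # prints CALL LOAD(-31952,55,215)
```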

 

Anyway, with the bugs fixed, all the test cases pass now. It's great that this test is now also very easy to repeat whenever the CPU is updated.



#168 TheBF OFFLINE  

TheBF

    Dragonstomper

  • 987 posts
  • Location:The Great White North

Posted Sat Mar 23, 2019 7:15 AM

 I wish the Basic had direct support for hexadecimal numbers, both input and output. 

 

It's my job to say Forth has been used to wring out new hardware for 50 years for this very reason. HEX OCTAL BINARY, whatever radix you need.  ;-)

 

There is now also the Hayes Tester syntax that lets you test code even while it's being compiled if you choose that.

I think it could be used for a CPU instruction set as well.

However writing the test cases and all the corner cases is still diligent work.

 

Anyway I guess it's all moot now since you have it working.  Congratulations!

\ test example.  BL should result in HEX 20 output to the stack
HEX
T{ BL -> 20 }T

\ tests for AND operation
T{ 0 0 AND -> 0 }T 
T{ 0 1 AND -> 0 }T 
T{ 1 0 AND -> 0 }T 
T{ 1 1 AND -> 1 }T
T{ 0 INVERT 1 AND -> 1 }T 
T{ 1 INVERT 1 AND -> 0 }T


#169 speccery OFFLINE  

speccery

    Moonsweeper

  • Topic Starter
  • 359 posts

Posted Sun Mar 24, 2019 3:35 AM

It's my job to say Forth has been used to wring out new hardware for 50 years for this very reason. HEX OCTAL BINARY, whatever radix you need.  ;-)

 

Thanks, that is a good comment. I have also used Forth to bring up hardware - the last project of this type was porting the J1 CPU for the BlackIce-II FPGA board. The J1 is essentially a Forth CPU. I'm tempted to add a co-processor system to my TI-99/4A FPGA system with this CPU. It is very compact and very fast. You probably already know about it. This could be used for example to aid debugging, to monitor TI-99/4A signals etc. To make it truly useful it would need to have some capability to interface with the TI's peripherals. On the other hand my next goal is to make my system more accessible by porting it to other low-cost and widely available boards. I'm trying to resist feature creep until then.

https://excamera.com...nx/fpga-j1.html



#170 TheBF OFFLINE  

TheBF

    Dragonstomper

  • 987 posts
  • Location:The Great White North

Posted Sun Mar 24, 2019 7:15 AM

 

Thanks, that is a good comment. I have also used Forth to bring up hardware - the last project of this type was porting the J1 CPU for the BlackIce-II FPGA board. The J1 is essentially a Forth CPU. I'm tempted to add a co-processor system to my TI-99/4A FPGA system with this CPU. It is very compact and very fast. You probably already know about it. This could be used for example to aid debugging, to monitor TI-99/4A signals etc. To make it truly useful it would need to have some capability to interface with the TI's peripherals. On the other hand my next goal is to make my system more accessible by porting it to other low-cost and widely available boards. I'm trying to resist feature creep until then.

https://excamera.com...nx/fpga-j1.html

 

Cool.  I love that J1 processor.

 

For those who don't know about J1, imagine a subroutine call that takes 1 clock cycle and a return that takes 0 clocks. 

 

One of my thoughts, although I don't have any knowledge of Verilog, is that Forth CPUs would benefit from having a workspace register to assist multitasking.  The Forth stacks typically are in very fast on chip ram, but if you want to change tasks it can be awkward swapping the stacks (ie registers) in/out of conventional memory. So... if there were larger memory spaces available for a number of tasks and a workspace register, the chip could have fast context switching albeit for a finite number of tasks, which is typically ok for an embedded application.

 

You are probably one of the few people in the world who know about FPGA 9900 and J1. :-)

 

Is there a repository of your code for the J1->BlackIce project?


Edited by TheBF, Sun Mar 24, 2019 7:16 AM.


#171 speccery OFFLINE  

speccery

    Moonsweeper

  • Topic Starter
  • 359 posts

Posted Sun Mar 24, 2019 9:50 AM

Cool.  I love that J1 processor.
 
For those who don't know about J1, imagine a subroutine call that takes 1 clock cycle and a return that takes 0 clocks. 
 
One of my thoughts, although I don't have any knowledge of Verilog, is that Forth CPUs would benefit from having a workspace register to assist multitasking.  The Forth stacks typically are in very fast on chip ram, but if you want to change tasks it can be awkward swapping the stacks (ie registers) in/out of conventional memory. So... if there were larger memory spaces available for a number of tasks and a workspace register, the chip could have fast context switching albeit for a finite number of tasks, which is typically ok for an embedded application.
 
You are probably one of the few people in the world who know about FPGA 9900 and J1. :-)

The J1 implements its stacks as two huge shift registers, where each shift operation is a shift by word length, typically 16 or 32 bits. The stacks are not deep: for the 16-bit version the default is 15 deep for the data stack and 17 for the return stack. So these stacks are implemented in the FPGA logic fabric, not in block memory. This also means that there are no stack pointers, at least for the J1A version. So you don't know how deep you are in the stacks... The source code for J1A is about 130 lines of Verilog. It is tiny. To my understanding it is inspired by the Novix NC4016.  The J1 is an awesome project, and it comes with Swapforth already implemented. The basic J1 system for the BlackIce takes 1072 logic cells, so about one eighth of the total capacity.
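
The pointer-less stack idea can be modelled in a few lines. This is only a behavioural Python sketch, not a translation of the J1A Verilog: the class name `ShiftStack` is made up, the value shifted in at the bottom on a pop is an assumption, and the real core's handling of the top-of-stack register is simplified away.

```python
class ShiftStack:
    """Pointer-less stack: a fixed row of cells that shifts as a whole."""

    def __init__(self, depth=15, fill=0):
        self.cells = [fill] * depth   # cells[0] is the top of the stack
        self.fill = fill              # assumed value shifted in at the bottom

    def push(self, word):
        # Every cell moves one place down; the bottom cell falls off silently.
        self.cells = [word & 0xFFFF] + self.cells[:-1]

    def pop(self):
        # Every cell moves one place up; nothing records how deep we were.
        top = self.cells[0]
        self.cells = self.cells[1:] + [self.fill]
        return top


stack = ShiftStack(depth=3)
for word in (1, 2, 3, 4):         # one push more than the stack can hold
    stack.push(word)
print([stack.pop() for _ in range(4)])   # prints [4, 3, 2, 0]
```

The first value pushed is lost without any error, and popping past the bottom just returns the fill value, which is what having no stack pointer implies.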

It is not only that subroutine calls and pretty much every other instruction take 1 clock cycle; you can also combine certain operations, such as the subroutine return, into the same instruction. Oh, and it runs at 48 MHz on the BlackIce-II. I did not try to optimize it.
I think I also ported it over to the Pepino board, as a 32-bit version, along the lines of the version James had done for the Xilinx Spartan 6.
 

Is there a repository of your code for the J1->BlackIce project?

No, but I guess I could set it up. I was playing with the Icestorm tools and used the J1 as the core to play with. I did not do much; my work amounted to merging the top level block from the BlackIce examples with the J1. I tested it with both place-and-route tools: arachne-pnr and the newer nextpnr. For the latter I had to study things a little to get the PLL done properly (the input clock is 100MHz, which the PLL takes to 48MHz).



#172 TheBF OFFLINE  

TheBF

    Dragonstomper

  • 987 posts
  • Location:The Great White North

Posted Thu Mar 28, 2019 7:25 AM

The basic J1 system for the BlackIce takes 1072 logic cells, 

 

How many logic cells did it take to re-create the 9900? 





