speccery

Member Since 31 Jan 2016
OFFLINE Last Active Mar 29 2019 7:57 AM

Posts I've Made

In Topic: TMS9900 CPU core creation attempt

Sun Mar 24, 2019 9:50 AM

Cool.  I love that J1 processor.
 
For those who don't know about J1, imagine a subroutine call that takes 1 clock cycle and a return that takes 0 clocks. 
 
One of my thoughts, although I don't have any knowledge of Verilog, is that Forth CPUs would benefit from having a workspace register to assist multitasking.  The Forth stacks typically are in very fast on-chip RAM, but if you want to change tasks it can be awkward swapping the stacks (i.e. registers) in and out of conventional memory. So... if there were larger memory spaces available for a number of tasks, plus a workspace register, the chip could have fast context switching, albeit for a finite number of tasks, which is typically OK for an embedded application.
 
You are probably one of the few people in the world who know about FPGA 9900 and J1. :-)

The J1 implements its stacks as two huge shift registers, where each shift operation is a shift by one word length, typically 16 or 32 bits. The stacks are not deep: in the 16-bit version the defaults are 15 entries for the data stack and 17 for the return stack. These stacks are implemented in the FPGA logic fabric, not in block memory. This also means that there are no stack pointers, at least in the J1A version, so you don't know how deep you are in the stacks... The source code for the J1A is about 130 lines of Verilog. It is tiny. To my understanding it is inspired by the Novix NC4016.  The J1 is an awesome project, and it comes with Swapforth already implemented. The basic J1 system for the BlackIce takes 1072 logic cells, so about one eighth of the total capacity.
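To make the "shift register, no stack pointer" idea concrete, here is a minimal behavioural sketch in Python. It is not derived from the J1A Verilog: the class name, the depth used in the demo and the value shifted in at the bottom on a pop (`fill`) are assumptions for illustration only.

```python
class ShiftStack:
    """Fixed-depth stack modelled as a shift register, with no stack pointer."""

    def __init__(self, depth=15, fill=0):
        self.fill = fill              # assumed value shifted in at the bottom on pop
        self.cells = [fill] * depth   # cells[0] is the top of the stack

    def push(self, value):
        # Every cell moves one place down; the bottom cell falls off silently.
        self.cells = [value & 0xFFFF] + self.cells[:-1]

    def pop(self):
        # Every cell moves one place up; the fill value enters at the bottom.
        top = self.cells[0]
        self.cells = self.cells[1:] + [self.fill]
        return top


stack = ShiftStack(depth=3)
for v in (1, 2, 3, 4):          # one push more than the stack can hold
    stack.push(v)
popped = [stack.pop() for _ in range(4)]
print(popped)                   # [4, 3, 2, 0]: the 1 was lost, the last pop returns fill
```

Since nothing tracks the depth, overflow and underflow are both silent, which is the "you don't know how deep you are" property mentioned above.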

Not only do subroutine calls and pretty much every other instruction take 1 clock cycle, you can also fold certain operations, such as the subroutine return, into another instruction. Oh, and it runs at 48 MHz on the BlackIce-II. I did not try to optimize it.
I think I also ported it over to the Pepino board, as a 32-bit version, along the lines of what James had done in his version for the Xilinx Spartan 6.
 

Is there a repository of your code for the J1->BlackIce project?

No, but I guess I could set it up. I was playing with the Icestorm tools and used the J1 as the core to play with. I did not do much; my work amounted to merging the top level block from the BlackIce examples with the J1. I tested it with both place-and-route tools: arachne-pnr and the newer nextpnr. For the latter I had to study things a little to get the PLL done properly (the input clock is 100 MHz, which the PLL takes to 48 MHz).
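For anyone wondering what "getting the PLL done properly" involves, the sketch below searches divider settings for a 100 MHz to 48 MHz conversion. It assumes the commonly quoted iCE40 simple-feedback relation f_out = f_ref * (DIVF + 1) / ((DIVR + 1) * 2^DIVQ) and assumed limits for the phase detector (10 to 133 MHz) and VCO (533 to 1066 MHz). These numbers are written from memory and should be checked against the Lattice documentation or the icepll tool; the function name is made up for this example.

```python
def find_pll_settings(f_ref_mhz, f_target_mhz):
    """Brute-force search for (DIVR, DIVF, DIVQ) under the assumed limits."""
    best = None
    for divr in range(16):                      # reference divider
        f_pfd = f_ref_mhz / (divr + 1)
        if not 10 <= f_pfd <= 133:              # assumed phase detector range
            continue
        for divf in range(128):                 # feedback divider
            f_vco = f_ref_mhz * (divf + 1) / (divr + 1)
            if not 533 <= f_vco <= 1066:        # assumed VCO range
                continue
            for divq in range(1, 7):            # output divider, power of two
                f_out = f_ref_mhz * (divf + 1) / ((divr + 1) * 2 ** divq)
                err = abs(f_out - f_target_mhz)
                if best is None or err < best[0]:
                    best = (err, divr, divf, divq, f_out)
    return best


err, divr, divf, divq, f_out = find_pll_settings(100, 48)
print(f"DIVR={divr} DIVF={divf} DIVQ={divq} -> {f_out:.3f} MHz")
```

Under these assumptions 48 MHz cannot be hit exactly from 100 MHz, so the search settles on the closest reachable frequency, slightly below 48 MHz.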


In Topic: TMS9900 CPU core creation attempt

Sun Mar 24, 2019 3:35 AM

It's my job to say Forth has been used to wring out new hardware for 50 years for this very reason. HEX OCTAL BINARY, whatever radix you need.  ;-)

 

Thanks, that is a good comment. I have also used Forth to bring up hardware - the last project of this type was porting the J1 CPU to the BlackIce-II FPGA board. The J1 is essentially a Forth CPU. I'm tempted to add a co-processor to my TI-99/4A FPGA system using this CPU. It is very compact and very fast. You probably already know about it. It could be used, for example, to aid debugging or to monitor TI-99/4A signals. To make it truly useful it would need some capability to interface with the TI's peripherals. On the other hand, my next goal is to make my system more accessible by porting it to other low-cost and widely available boards. I'm trying to resist feature creep until then.

https://excamera.com...nx/fpga-j1.html


In Topic: TMS9900 CPU core creation attempt

Sat Mar 23, 2019 4:40 AM

It was great to be able to use PeteE's software, I found and fixed two bugs:

1. Despite my "testing" there was still a bug in the treatment of ST1 (the A> flag) with the ABS instruction. The implementation completely lacked the special case that the ABS instruction sets ST1 based on the source argument rather than the result.

2. SLA0 did not set the overflow flag properly if the shift count was greater than one.
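The two fixes can be illustrated with a small reference model. This is a Python sketch of the behaviour as I read the TMS9900 documentation, not the actual CPU core source: ABS derives its comparison flags from the source operand, and SLA sets overflow if the sign bit changes at any step of the shift. The function names are made up here, and the carry flag for ABS is left out.

```python
def abs_op(src):
    """ABS: returns (result, L>, A>, EQ, OV); flags come from the source operand."""
    src &= 0xFFFF
    negative = bool(src & 0x8000)
    result = (-src) & 0xFFFF if negative else src
    lgt = src != 0                      # logical greater than zero
    agt = not negative and src != 0     # arithmetic greater than zero (ST1)
    eq = src == 0
    ov = src == 0x8000                  # >8000 has no positive counterpart
    return result, lgt, agt, eq, ov


def sla(value, count):
    """SLA: returns (result, carry, overflow) for a shift by count bits."""
    value &= 0xFFFF
    carry = overflow = False
    for _ in range(count):
        carry = bool(value & 0x8000)            # last bit shifted out
        shifted = (value << 1) & 0xFFFF
        if (shifted ^ value) & 0x8000:          # sign changed in this step
            overflow = True
        value = shifted
    return value, carry, overflow


print(abs_op(0xFFFF))   # (1, True, False, False, False): A> clear although result is 1
print(sla(0x4000, 2))   # (0, True, True): overflow although start and end sign match
```

The second example is the kind of case that only fails with shift counts above one: comparing just the sign of the input and of the final result would miss the overflow.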

 

Fixing bug 1 got extended Basic working! So now I could resume what I was actually trying to implement: read access to the serial flash ROM chip. To my delight the code I had written worked, and I was able to access the serial flash ROM from Basic with a series of call load(...) and call peek(...) statements. I wish the Basic had direct support for hexadecimal numbers, both input and output. The Oric Atmos Basic features these, and also the DOKE and DEEK operations, which work like poke and peek but with 16-bit values...

 

Anyway, with the bugs fixed, all the test cases pass now. It's great that this test is now also very easy to repeat whenever the CPU is updated.


In Topic: TMS9900 CPU core creation attempt

Fri Mar 22, 2019 3:05 AM

Here is a cartridge program I've been working on for validating CPU instructions.  It's based on your earlier description of a program that runs through various instructions with varying inputs and checks the results.  It checks all combinations of the inputs 0,1,>7FFF,>8000,>AAAA,>FFFF with the following instructions: A AB ABS AI ANDI C CB CI COC CLR CZC DEC DECT DIV INC INCT INV LI MOV MOVB MPY NEG ORI S SB SETO SLA SLA0 SOC SOCB SRA SRA0 SRC SRC0 SRL SRL0 SWPB SZC SZCB XOR.  The status flags are set to two known states before each test, to verify that only the proper status flags are modified.  If all goes well, the display should show "OK!".  For the first 24 failures, a line is printed containing: the instruction name, the first and second input, the result, the expected result, then the status flags, and finally the expected status flags.  The first status flag byte is the result of the instruction after only EQ is set, and the second byte is the result after LGT AGT C OV OP are set.  Instructions ending in I have the inputs swapped. Shift instructions ending in zero use the R0 register for the shift amount, otherwise are shifted by 1.  I've tested Classic99 and it is ok, so it will be interesting to see how it fares on your CPU.

 

 

Thanks, this is awesome, and it is extremely helpful to have an independent piece of verification code! I've not had time during the week to test this, but I am looking forward to doing so this evening. Hopefully something shows up immediately :)

 

Also, your testing methodology is better than my test code. I should also run each instruction twice, with the flags preset to opposite states, to make sure the flags go both ways properly. That way I can improve my test coverage with a simple modification. Perhaps I should also turn my test code into a cartridge; it could be useful to others too.
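The "two known flag states" idea can be sketched as follows. This is a Python illustration, not PeteE's cartridge code: the harness, the reference model of the A (add) instruction and the deliberately broken variant are all made up for the example. The status bit masks follow the usual TMS9900 layout (L> is the most significant bit).

```python
from itertools import product

INPUTS = (0x0000, 0x0001, 0x7FFF, 0x8000, 0xAAAA, 0xFFFF)
ST_LGT, ST_AGT, ST_EQ, ST_C, ST_OV, ST_OP = 0x8000, 0x4000, 0x2000, 0x1000, 0x0800, 0x0400
# The two preset states: only EQ set, then everything except EQ set.
INITIAL_STATES = (ST_EQ, ST_LGT | ST_AGT | ST_C | ST_OV | ST_OP)


def reference_add(src, dst, status):
    """Reference model of A: returns (result, new status word)."""
    total = src + dst
    result = total & 0xFFFF
    status &= ~(ST_LGT | ST_AGT | ST_EQ | ST_C | ST_OV) & 0xFFFF  # OP is untouched
    if result == 0:
        status |= ST_EQ
    else:
        status |= ST_LGT
        if result < 0x8000:
            status |= ST_AGT
    if total > 0xFFFF:
        status |= ST_C
    if (~(src ^ dst) & (src ^ result)) & 0x8000:   # same input signs, different result sign
        status |= ST_OV
    return result, status


def buggy_add(src, dst, status):
    """Hypothetical faulty implementation: it never clears OV."""
    result, new_status = reference_add(src, dst, status)
    return result, new_status | (status & ST_OV)


def run_tests(dut, reference, max_failures=24):
    cases, failures = 0, []
    for src, dst in product(INPUTS, repeat=2):
        for initial in INITIAL_STATES:
            cases += 1
            got, expected = dut(src, dst, initial), reference(src, dst, initial)
            if got != expected and len(failures) < max_failures:
                failures.append((src, dst, initial, got, expected))
    return cases, failures


cases, failures = run_tests(buggy_add, reference_add)
print(cases, len(failures))
```

With only the first preset state the broken version passes every case; the failures show up solely in the runs that start with OV set, which is why each instruction needs to be exercised from both states.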


In Topic: TMS9900 CPU core creation attempt

Mon Mar 18, 2019 2:03 PM

Well, that was an interesting debugging session! In the end I understood that what I had thought was a problem in computing subtraction actually manifests itself in printing (and elsewhere too). Here is the problem under extended Basic, and below is the explanation of how I got there. I still don't know which CPU instruction is the offending one, but I am making progress.

 

Test program under extended Basic

 

Finding the problem turned into a showcase of the FPGA system's features, together with Stuart's cool LBLA / debugger module:

 

Since I thought the problem was in the subtract operation, I studied the excellent TI Intern book, starting from the SSUB routine address given in RXB's comment. I wrote a simple Basic program:

A=1
B=2
C=A-B 

and ran this under Classic99, setting breakpoints at >D74 and >FA6 to see the contents of the scratchpad memory before and after the subtraction operation when running extended Basic. (I could have determined earlier that the problem cannot be in this ROM code, as it is shared with regular TI Basic, and that was working, but bear with me - these things only make sense once you know where the problem is not present).

I could see the contents of the floating point accumulator at >834A (the value 1) and the argument at >835C (the value 2), and after the operation the floating point accumulator became negative. That makes sense.
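When reading such scratchpad dumps it helps to decode the 8-byte floating point values by hand or by script. The sketch below assumes the usual description of the TI-99/4A format: radix 100, first byte is the exponent biased by >40, followed by seven base-100 digits, with negative numbers stored by negating the first 16-bit word. The function name is made up, and the format details should be checked against the TI documentation.

```python
def decode_radix100(raw):
    """Decode an 8-byte radix-100 floating point value (assumed layout, see above)."""
    data = bytearray(raw[:8])
    if data[0] == 0 and data[1] == 0:
        return 0.0                                   # first word zero means zero
    negative = bool(data[0] & 0x80)
    if negative:
        # Undo the two's complement negation of the first word.
        word = (-((data[0] << 8) | data[1])) & 0xFFFF
        data[0], data[1] = word >> 8, word & 0xFF
    exponent = data[0] - 64                          # power of 100 for the first digit
    value = sum(digit * 100.0 ** (exponent - i) for i, digit in enumerate(data[1:]))
    return -value if negative else value


print(decode_radix100(bytes([0x40, 0x01, 0, 0, 0, 0, 0, 0])))   # 1.0
print(decode_radix100(bytes([0xBF, 0xFF, 0, 0, 0, 0, 0, 0])))   # -1.0
```

Under this layout, the result of 1 - 2 would show up as a first word of >BFFF in the accumulator, which matches "the floating point accumulator became negative".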

 

Next I wanted to verify whether this is what happens with my FPGA CPU. This is where I got to use Stuart's cartridge and some features of the FPGA system.

First, taking advantage of the fact that in the FPGA system the ROM is actually RAM, I loaded Stuart's cartridge and modified the system ROM to call a subroutine at the beginning of the subtract operation (I added the BLWP @>1360 instruction).

Inserted BLWP @>1360 instruction at >7DC
Notice that I had to make space for my intercepting subroutine call, so I overwrote the NEG instruction at >D7C and moved the NEG @>834A instruction to the intercept routine. I placed the subroutine at >1364, writing over cassette support code.
The code jumped to from intercept at beginning of subtract

I then did the same thing again at the end of the floating point routine, at >FA6, this time moving the instruction MOV @>834A,R1 to the intercept routine.
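The patching step can be sketched offline on a ROM image. The Python below is only an illustration: the helper names and the workspace address are hypothetical, and the patch address is taken from the text above. It relies on the TMS9900 encoding of BLWP with symbolic addressing (>0420 followed by the vector address), and on BLWP fetching a new workspace pointer and program counter from that vector, which fits the vector sitting at >1360 and the routine starting at >1364.

```python
def encode_blwp_symbolic(vector_addr):
    """BLWP @addr: opcode word >0420 followed by the vector address."""
    return bytes([0x04, 0x20, vector_addr >> 8, vector_addr & 0xFF])


def patch_rom(rom, patch_addr, vector_addr, workspace, handler_addr):
    """Overwrite a 4-byte instruction with BLWP and install the BLWP vector."""
    rom[patch_addr:patch_addr + 4] = encode_blwp_symbolic(vector_addr)
    rom[vector_addr:vector_addr + 4] = bytes([
        workspace >> 8, workspace & 0xFF,        # new workspace pointer
        handler_addr >> 8, handler_addr & 0xFF,  # new program counter
    ])


rom = bytearray(8192)            # stand-in for the 8K system ROM image
WORKSPACE = 0x8000               # hypothetical workspace address, not from the post
patch_rom(rom, 0x0D7C, 0x1360, WORKSPACE, 0x1364)
print(rom[0x0D7C:0x0D80].hex(), rom[0x1360:0x1364].hex())   # 04201360 80001364
```

A 4-byte instruction such as NEG @>834A is replaced one-for-one by the 4-byte BLWP, so the displaced instruction has to be executed inside the intercept routine before returning with RTWP.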

 

The second intercept point at the end of subtract (or actually rounding)
Second intercept destination
 

The actual benefit of the intercept routines is that they copy the entire scratchpad memory to a safe place, before and after executing Basic ROM's floating point subtract routine respectively. The FPGA system has 1 kilobyte of scratchpad memory instead of the regular 256 bytes, so I just copied the memory from 8300 .. 83FF first to 8100..81FF and at the end to 8200..82FF.

 

After making those patches to the system ROM, I copied the modified ROM to the PC's disk. I then initialized the FPGA system again, this time with the modified ROM but with the extended Basic cartridge inserted instead of Stuart's cartridge. Next I again performed my subtraction in Basic. After running that piece of Basic code, I just read back the two copies of scratchpad memory (before and after the subtract) and compared them. At this point I saw that the subtract had in fact executed correctly, and that the problem manifests itself when printing negative numbers - the minus sign does not appear. The problem also occurs with other operations, since the cos and sin functions also have issues.
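Comparing the two scratchpad copies is easy to script once they have been read back to the PC. Below is a minimal Python sketch; the function name is made up, and the example bytes are invented for illustration (they are not the actual dump).

```python
def diff_snapshots(before, after, base=0x8300):
    """Return (address, old, new) for every byte that differs between two dumps."""
    return [(base + i, b, a)
            for i, (b, a) in enumerate(zip(before, after)) if b != a]


# Invented example data: 256-byte copies as they would be read back from
# 8100..81FF (before) and 8200..82FF (after), reported as 83xx addresses.
before = bytearray(256)
before[0x4A:0x4C] = bytes([0x40, 0x01])
after = bytearray(before)
after[0x4A:0x4C] = bytes([0xBF, 0xFF])

for addr, old, new in diff_snapshots(before, after):
    print(f">{addr:04X}: >{old:02X} -> >{new:02X}")
```

Reporting the differences with the original 83xx addresses makes it easier to match them against the scratchpad usage documented in TI Intern.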

 

I am very happy with the DMA feature of the FPGA system, as this enables me to read and write the TI clone's memory while the system is running - super handy for debugging. The same mechanism is used when the system is booted up from PC (it can also boot from flash ROM).

 

Now, after this debugging session, I know where the problem is not. Progress.