
TMS9900 CPU core creation attempt


speccery


Time for another ambitious, time-constrained project: Retrochallenge 2017/04 has just started. I enjoyed the last round, so it is time for another attempt. This time my plan is to create a TMS9900 CPU core in VHDL. Naturally the project is an extension of my previous Retrochallenge project, where I created a clone of the TI-99/4A using a TMS99105 CPU and a development board with an FPGA.

 

This project will in some sense be more ambitious personally, as I have never created a CPU core before. Having said that, I want to think that the creation of the TMS9918, TMS9919, SAMS memory controller, GROM implementation etc. amounts to something. Plus I don't have to start from scratch, as I will try to shoehorn a new quick-and-dirty TMS9900 core into the previous design, which implements the rest of the console. There is a decent amount of debugging functionality available in the previous design. Still, I know I have very limited time and a lot of work in real life to get done, so I really do not know if this project will result in anything. Hopefully it will be a start at least.

 

My goal is not to create anything that is cycle exact, just to have a VHDL CPU core that can run TI Basic and at least some of the game cartridges. The TMS99105 project already taught me quite a bit about compatibility: most games seemed to work fine despite vast timing differences.

 

I am sure quite a few people have done a core already in the past, so even if I somehow was successful I will not really be breaking new ground:

The F18A has an impressive TMS9900 core created by Matthew Hagerty, even if it is not completely compatible (in fact it is in many ways better).

Also I understand that Gary Smith has implemented a TMS9995 core in the past.

And there probably are others.

 

Well I do not care, it is time to reinvent the wheel! The goal ultimately is to have a TI-99/4A clone done entirely in FPGA. That would enable porting to many FPGA platforms, including the MIST. If I do the CPU core myself, I can come up with meaningful license terms. The plan is to open source the stuff at least at the end - with the assumption something useful comes out...

 

Link to hack-a-day


Will it run faster than a real 9900?

 

Any chance of a few extra features? :-)

 

 

I have not started the work yet :) Hopefully today I have some time.

Regarding speed, the FPGA board has 10ns external static RAMs and could do memory cycles at 50MHz. My aim is much more conservative, reliability first, so I target 50ns memory cycles. I'm going to drive the core logic at 100MHz, the gold standard used by F18A and my TI-99/4A clone. So the answer should eventually be yes, it should run a whole lot faster.
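The timing figures above can be worked through as a quick calculation. This is only a sketch using the numbers quoted in the post (100MHz core clock, 50ns target memory cycle, 10ns SRAM); the variable names are made up for illustration.

```python
# Figures quoted in the post; the names are illustrative only.
CORE_CLOCK_HZ = 100_000_000   # core logic clock
MEMORY_CYCLE_NS = 50          # conservative target for one memory cycle
SRAM_ACCESS_NS = 10           # rated speed of the external static RAM

clock_period_ns = 1_000_000_000 // CORE_CLOCK_HZ              # 10 ns per core clock
clocks_per_memory_cycle = MEMORY_CYCLE_NS // clock_period_ns  # core clocks per access
memory_cycles_per_second = 1_000_000_000 // MEMORY_CYCLE_NS   # peak access rate
timing_margin = MEMORY_CYCLE_NS / SRAM_ACCESS_NS              # slack versus the RAM rating

print(clock_period_ns, clocks_per_memory_cycle, memory_cycles_per_second, timing_margin)
```

So each memory access gets five core clocks, which leaves room for address setup and data capture states in the memory controller.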

 

In FPGA world everything is flexible - did you have something specific in mind? Novix NC4016 compatible coprocessor perhaps :-D

Once the framework is in place and something works, it should be easy to add features. Comments are welcome!

 

But first things first: my initial aim is to implement a memory controller, instruction fetch, reset instruction and an always-taken branch instruction.


Hi. I'm thinking specifically of a stack and associated stack instructions. The Win994a emulator has implemented a whole truck load of additional instructions using unused opcodes. I'll send you a document. Very interested in this.

Mark


Here you go. See the section called Win994A Gets a Real Stack.

 

Attachment: Enhancements.pdf

 

 

Thanks Willsy, yet another thing about the TI99 ecosystem I was unaware of.

 

I looked at the extensions. I can understand the motivation for them, but my first reaction is that I would not build these into the CPU myself. Here is why: the extensions are not compatible with the TMS99000 instruction set. As one example, the TMS99000 adds the BLSK and BIND instructions, which enable a traditional stack-based call and return sequence (I haven't tested them though). In that scenario one programmer-designated workspace register becomes the stack pointer. The TMS99000 also uses the status register bits that the TMS9900 left unused, and not for paging the top 24K of the address space as in Win994a. If performance is the concern, that can be treated as a hardware problem, and hopefully an efficient CPU core implementation makes things go fast.

 

It is probably a matter of taste, but from my point of view the extensions also don't follow the spirit of the TMS9900 architecture. For example, having bespoke CPU-supported paging for the top 24K seems a bit strange when SAMS paging already exists. My TMS99105 system has some CRU bits which enable SAMS paging to take place on the entire 64K address space, i.e. the extra CRU bits map out ROM, I/O devices and the cartridge port, enabling 4K page slots throughout the address space.
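To illustrate what 4K page slots across the whole 64K address space mean in practice, here is a minimal Python sketch of the address translation. It is an assumption-laden model, not the actual SAMS register interface: the `page_regs` table and the `translate` function are hypothetical, and the CRU bits that map out ROM, I/O and the cartridge port are not modelled.

```python
PAGE_SIZE = 0x1000                  # 4K page slots
NUM_SLOTS = 0x10000 // PAGE_SIZE    # 16 slots cover the 64K CPU address space

def translate(page_regs, addr):
    """Map a 16-bit CPU address to a physical address (hypothetical layout)."""
    addr &= 0xFFFF
    slot = addr >> 12               # top 4 address bits select the page slot
    offset = addr & 0x0FFF          # low 12 bits pass through unchanged
    return page_regs[slot] * PAGE_SIZE + offset

# Identity mapping, then point the slot at >A000 to physical page >42.
page_regs = list(range(NUM_SLOTS))
page_regs[0xA] = 0x42

print(hex(translate(page_regs, 0xA123)))  # remapped slot
print(hex(translate(page_regs, 0x8300)))  # untouched slot
```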

 

Is there some significant software that exists for these extensions you mentioned? The document talks about a C compiler.


For what it is worth, the F18A implemented two stack based instructions (CALL and RET) that use a stack pointer that is always in R15. The opcodes are part of the unused space of the 9900 (not sure if they were unused in the 99105 though) and are also compatible with two of the stack instructions implemented in win994a. IMO the win994a stack instructions went overboard with all the variation, and just CALL / RET are enough.


Time for another ambitious, time-constrained project: Retrochallenge 2017/04 has just started. I enjoyed the last round, so it is time for another attempt. This time my plan is to create a TMS9900 CPU core in VHDL.

 

 

This sounds like a cool thing Speccery.

 

What tool chain are you using for VHDL? I have a Papilio board here with a Spartan 6 on it but have never got around to getting a working toolchain.

 

BF


There may be some generic HDL tools out there, but for FPGAs like Xilinx or Altera there is really only one option: the manufacturer's tools. For Xilinx that would be the ISE 14.7 software with a WebPack (free) license. Note that ISE is *not* the newer Vivado software, since Vivado does not support the Spartan FPGAs (which is really stupid since Xilinx quit working on the ISE version, yet still sells the Spartan FPGAs...)


Very nice! Looking forward to reading about your progress!

 

I made a lot of promotion for your TMS99105 board on the TI-Meeting in Birkenau. The demoing of your board worked just fine. :)

The info for this new project came just a few days too late to be included in the various talks. :(

 

Most were not aware of your sidecar-project for the TI-99. From a gut feeling this is the one that most TI Users look forward to.

With this third project in the making the confusion will be even bigger, haha :)

 

I guess all projects will benefit from the others: if you run into timing issues in one project, the fix can be incorporated into the other projects as well to make the stuff more precise to the actual hardware behavior.

Have you thought about code sharing across the projects yet?


Are you talking about the DSR, GPL and interrupt extensions, or the ones used by ROMs like XB to branch to a specific address in RAM or ROM?

 

Also one for the DEBUGGER is there.

 

 

I don't know about those extensions, my comment referred to the architectural changes (the addition of SP register) and the new instructions as described in the document that Willsy provided in his post.


For what it is worth, the F18A implemented two stack based instructions (CALL and RET) that use a stack pointer that is always in R15. The opcodes are part of the unused space of the 9900 (not sure if they were unused in the 99105 though) and are also compatible with two of the stack instructions implemented in win994a. IMO the win994a stack instructions went overboard with all the variation, and just CALL / RET are enough.

 

I think that this is the way to go. If I have understood the TMS99105 documentation correctly, CALL instruction corresponds to:

BLSK @ROUTINE,R15

and RET instruction would correspond to:

BIND *R15+

Thus you specify the register used as stack pointer there. BLSK is a two word instruction (opcode 0x00BF, ROUTINE) and BIND would be opcode 0x017F in this case. This arrangement creates a downwards growing stack using R15 as the SP on the TMS99000.
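The two opcodes quoted above can be reproduced from the instruction fields. The sketch below is in Python and rests on assumptions: the base opcodes `BLSK_BASE` and `BIND_BASE` are inferred from the values >00BF and >017F given in this post and should be checked against the TMS99105 documentation, and the function names are hypothetical. The Ts/S field layout for BIND is assumed to be the same as for other single operand instructions.

```python
BLSK_BASE = 0x00B0   # assumed base opcode; low 4 bits select the stack register
BIND_BASE = 0x0140   # assumed base opcode; bits 5-4 = Ts, bits 3-0 = S

# Ts values used by TMS9900 general addressing modes
TS_REGISTER = 0            # Rx
TS_INDIRECT = 1            # *Rx
TS_SYMBOLIC_OR_INDEXED = 2 # @LABEL or @TABLE(Rx)
TS_INDIRECT_AUTOINC = 3    # *Rx+

def encode_blsk(stack_reg, target):
    """BLSK @target,Rn as two words: opcode, then the 16-bit target address."""
    return [BLSK_BASE | (stack_reg & 0xF), target & 0xFFFF]

def encode_bind(ts, s):
    """BIND with a general source operand given as (Ts, S)."""
    return BIND_BASE | ((ts & 0x3) << 4) | (s & 0xF)

call_words = encode_blsk(15, 0x6000)                 # BLSK @>6000,R15 (address is an example)
ret_word = encode_bind(TS_INDIRECT_AUTOINC, 15)      # BIND *R15+
print([hex(w) for w in call_words], hex(ret_word))
```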



Thanks for the promotion :) that is nice to hear. I'm glad to hear the demos worked well.

 

I realise that following these different projects can become a bit challenging. This is one of the reasons I started this new thread on the CPU core project (which is the one I am currently working on, time permitting). I am doing all of this for fun as a hobby, so I will be bouncing back and forth between the projects on a whim. It is true that the projects build on top of each other. In chronological order I have done the following:

 

1. Memory expansion for the TI-99/4A. Initially intended as a standard 32K extension, this then grew to include support for 1M AMS memory, then 64M extended memory. Somewhere in between cartridge support got added (GROMs and paged ROMs), USB downloading support went in too, and then there is support for the SD card as well. No DSR though, but I wrote a program to load TMS9900 code from a FAT16 formatted partition. Uses the Pipistrello FPGA board.

 

2. The TMS99105 project. This evolved as an offshoot of the above - I developed on the smaller Pepino FPGA board the USB memory download support first, then migrated that to the Pipistrello, then the retro challenge "competition" came along and I also wanted to build my own TI-99/4A clone from scratch so all of the stuff from the memory extension project got somehow brought over and mixed with the TMS99105 CPU interface along with the rest of the TI-99/4A done in VHDL.

 

3. I started the PCB design of a practical version of project 1, to be a "super PEB". Personally my goal here is to design and build an FPGA board, but it would be nice to be a useful one so it is only natural to make a PEB replacement out of it. I don't have a PEB, I never had one, so I need to build these things to have access to the full TI experience...

 

4. Another retro challenge came along, and I wanted to participate, and I was thinking that for project 3 above it would be cool to have the CPU core also inside the FPGA. So I am again off on another tangent, to build the CPU core, another personal ambition I've been wanting to do for a really long time. In practice this project uses most of the code of project 2 (the TMS99105 project); technically it is a branch of the same source code tree (not pushed to GitHub yet, as I don't have much done).

 

The TMS99105 project code is all shared on GitHub. This has been the case all along. That should include the VHDL code, the Xilinx ISE 14.7 project files, PC C code, assembler test code and DSR code, plus some random scripts. The PCB schematics are there too, but I think the Gerber files are not there yet (those are needed to manufacture boards). The TMS99105 project is by far the best documented, even though more documentation is certainly needed.


  • 2 weeks later...

I think my TMS9900 core is now Turing complete, update at hackaday

But still implements only probably 25% - or less - of the necessary functionality.

 

It can properly execute the following code in simulator.

********** TEST 3 ** Simulation output
* Reset vector not shown
BOOT
  LI  R3,>8340    ** write to 8306 data 8340 1000001101000000
  LI  R0,>1234    ** write to 8300 data 1234 0001001000110100
  LI  R1,1        ** write to 8302 data 0001 0000000000000001
  MOV R0,*R3      ** write to 8340 data 1234 0001001000110100
  MOV *R3+,R2     ** write to 8306 data 8342 1000001101000010
*                 ** write to 8304 data 1234 0001001000110100
  A   R1,R2       ** write to 8304 data 1235 0001001000110101
  MOV R2,R8       ** write to 8310 data 1235 0001001000110101
  MOV R1,*R3      ** write to 8342 data 0001 0000000000000001
  A   R1,*R3      ** write to 8342 data 0002 0000000000000010
  MOV @>4,@>8344
  JMP BOOT
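The write trace in the listing above can be reproduced with a small behavioural model. This is a Python sketch, not the VHDL core: the `Machine` class, its method names and the mode tags are hypothetical, and the workspace pointer >8300 is inferred from the trace (R0 is written at >8300). The final `MOV @>4,@>8344` and the `JMP` are left out because the listing shows no trace for them.

```python
WP = 0x8300  # workspace pointer, inferred from the trace

class Machine:
    def __init__(self):
        self.mem = {}     # sparse word memory, keyed by even address
        self.trace = []   # (address, data) for every memory write

    def read(self, addr):
        return self.mem.get(addr & 0xFFFE, 0)

    def write(self, addr, data):
        addr, data = addr & 0xFFFE, data & 0xFFFF
        self.mem[addr] = data
        self.trace.append((addr, data))

    def reg_addr(self, r):
        return WP + 2 * r  # workspace registers live in memory

    def src(self, mode, r):
        """Fetch a source operand. Modes: 'R' = Rx, 'I' = *Rx, 'I+' = *Rx+."""
        if mode == 'R':
            return self.read(self.reg_addr(r))
        ea = self.read(self.reg_addr(r))
        if mode == 'I+':
            self.write(self.reg_addr(r), ea + 2)  # post-increment by one word
        return self.read(ea)

    def dst_addr(self, mode, r):
        """Destination address for 'R' (Rx) or 'I' (*Rx)."""
        if mode == 'R':
            return self.reg_addr(r)
        return self.read(self.reg_addr(r))

    def li(self, r, value):
        self.write(self.reg_addr(r), value)

    def mov(self, s, d):
        value = self.src(*s)
        self.write(self.dst_addr(*d), value)

    def add(self, s, d):
        value = self.src(*s)
        addr = self.dst_addr(*d)
        self.write(addr, self.read(addr) + value)

m = Machine()
m.li(3, 0x8340)
m.li(0, 0x1234)
m.li(1, 1)
m.mov(('R', 0), ('I', 3))    # MOV R0,*R3
m.mov(('I+', 3), ('R', 2))   # MOV *R3+,R2
m.add(('R', 1), ('R', 2))    # A   R1,R2
m.mov(('R', 2), ('R', 8))    # MOV R2,R8
m.mov(('R', 1), ('I', 3))    # MOV R1,*R3
m.add(('R', 1), ('I', 3))    # A   R1,*R3

for addr, data in m.trace:
    print(f"write to {addr:04X} data {data:04X} {data:016b}")
```

Comparing such a model's output line by line with the simulator log is one cheap way to catch regressions while the core is still changing quickly.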

I still have only implemented a handful of instructions, basically:

  • All immediate instructions, such as AI and LI and LWPI
  • All branches, although only a few tested at all: JMP, JNE, JEQ, JNC, JOC all seem to work. Rest are there too, but the condition codes are not set properly (I am sure) so it doesn't really matter.
  • All dual operand instructions, but only the word versions (A, S, C, SOC, SZC, MOV)
  • All the source addressing modes for the above are implemented: Rx, *Rx, *Rx+, @LABEL, @TABLE(Rx). I haven't tested the indexed mode yet though.
  • The same for all destination modes, but for these only Rx and *Rx work properly. Additionally @LABEL works for MOV instruction. I need to refactor the logic a little so that it properly handles the destination operand read and write. Currently the side effects (auto increment and immediate operand fetches for @LABEL and @TABLE) are executed twice. For @LABEL that means the CPU reads the next opcode too...

None of this is in any way optimised yet, so the CPU reads memory too much, for example when processing the @LABEL addressing mode (it fetches R0 since the decoder only later realises it does not need it).
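The five addressing modes listed above, and the point about side effects running twice, can be sketched as a single effective-address step whose result is reused for both the destination read and the destination write. This is an illustrative Python model under assumptions: the function names, the `regs` list and the `extension_word` parameter are hypothetical and do not mirror the VHDL state machine.

```python
def decode_operand(field):
    """Split a 6-bit operand field into (Ts, register number)."""
    return (field >> 4) & 0x3, field & 0xF

def effective_address(ts, reg, regs, wp, extension_word=0, byte_op=False):
    """Return (address, extra_words_consumed); applies side effects exactly once."""
    if ts == 0:                        # Rx: operand is the workspace register itself
        return (wp + 2 * reg) & 0xFFFF, 0
    if ts == 1:                        # *Rx
        return regs[reg], 0
    if ts == 2:                        # @LABEL when reg is 0, otherwise @TABLE(Rx)
        base = regs[reg] if reg != 0 else 0
        return (extension_word + base) & 0xFFFF, 1
    ea = regs[reg]                     # *Rx+: use the old value, then increment
    regs[reg] = (ea + (1 if byte_op else 2)) & 0xFFFF
    return ea, 0

# A read-modify-write destination such as "A R1,*R3+" computes the address once
# and uses it for both the operand read and the result write.
regs = [0] * 16
regs[3] = 0x8342
dest_ea, extra = effective_address(3, 3, regs, 0x8300)
print(hex(dest_ea), extra, hex(regs[3]))
```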

Since JNE and JEQ work, I have been able to run loops, using AI to update loop counter.

 

I have been very busy due to work travel and other stuff, so I haven't had much time to work on this - progress is slow. I think the current work is the result of maybe 12-15 hours of work scattered in multiple parts. I will try to get the current features debugged, since the feature set is now wide enough that if it worked I could try to synthesise it into the FPGA and initialise the VDP to display something. I probably will need to add the byte versions of the dual operand instructions as well, that would enable me to use normal code instead of hand crafted code using only a handful of opcodes. Aah and I also need B and BL to do subroutines. So still work to be done...


Another update

Now I have done all single operand instructions except BLWP:

B    @LABEL  * Jump to 16-bit address LABEL
BL   @LABEL  * As above, but with link: PC stored to R11 first
CLR  R4      * Clear R4
SETO *R5     * Set memory word at address R5 to >FFFF
INV  R9      * Invert bits of R9
NEG  R10     * Negate R10 (i.e. 0-R10)
ABS  R10     * Take the absolute value of R10
SWPB R5      * Swap bytes of R5
INC  R1      * Increment R1 by 1
INCT R1      * Increment R1 by 2
DEC  R1      * Decrement R1 by 1
DECT R1      * Decrement R1 by 2
X    R3      * Execute the opcode in R3 (UNTESTED) 

The above at least *try* to set CPU flags correctly, no idea if that really works. Too many things to test. I also did not dare to test the X instruction yet, but theoretically it works - the opcode just goes from source operand to instruction register. Now that I write this I think it actually maybe does not work, since my first decode stage reads the opcode from the memory read bus not the instruction holding register, as it is during this stage the instruction register is written. Also according to the data sheet the X instruction is not supposed to change flags. I wonder what that actually means, if it can be interpreted so that the X instruction itself does not change flags but the opcode that gets executed does change flags, then my implementation will work (with the assumption of course that normal execution handles the flags properly - which it does not currently do).
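As a reference for testing, the data-path results of the single operand instructions listed above can be written down in a few lines of Python. This sketch covers only the 16-bit result values; status flags, B, BL and X are deliberately left out, and the `SINGLE_OPERAND` table and `apply_op` helper are hypothetical names.

```python
MASK = 0xFFFF  # all results are 16-bit words

SINGLE_OPERAND = {
    "CLR":  lambda v: 0x0000,
    "SETO": lambda v: 0xFFFF,
    "INV":  lambda v: ~v & MASK,                          # one's complement
    "NEG":  lambda v: -v & MASK,                          # two's complement
    "ABS":  lambda v: (-v & MASK) if v & 0x8000 else v,   # >8000 stays >8000
    "SWPB": lambda v: ((v << 8) | (v >> 8)) & MASK,       # swap high and low bytes
    "INC":  lambda v: (v + 1) & MASK,
    "INCT": lambda v: (v + 2) & MASK,
    "DEC":  lambda v: (v - 1) & MASK,
    "DECT": lambda v: (v - 2) & MASK,
}

def apply_op(name, value):
    """Return the 16-bit result of a single operand instruction."""
    return SINGLE_OPERAND[name](value & MASK)

for name in SINGLE_OPERAND:
    print(f"{name:4} >1234 -> >{apply_op(name, 0x1234):04X}")
```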

 

The above with all 5 addressing modes properly working (very limited testing so far, but at least for CLR they all work, and the logic is the same):

CLR R5     * Clear R5
CLR *R5    * Clear memory word pointed to by R5
CLR *R5+   * As above, also increment R5 by 2 to point to next word
CLR @MEM1  * Clear the word with the 16-bit address MEM1
CLR @4(R5) * Clear the word in the address R5+4

Very pleased to have subroutines with BL and returns with B; this will hopefully help greatly with testing later.


Also according to the data sheet the X instruction is not supposed to change flags. I wonder what that actually means, if it can be interpreted so that the X instruction itself does not change flags but the opcode that gets executed does change flags ...

Yes.

 

You may want to compare with my implementation of tms9900.cpp in MAME; see https://github.com/mamedev/mame/blob/master/src/devices/cpu/tms9900/tms9900.cpp


Two more updates today; I really had a productive day with the core. I decided to build the whole core in simulation first and then proceed to synthesis. I did a lot of the big things of the TMS9900 today, and learned a lot about the CPU on the way. Interestingly, while I have written a fair amount of TMS9900 code, there is stuff to learn all the time.

 

Both updates are at hackaday.io:

https://hackaday.io/project/20826-tms9900-compatible-cpu-core-in-vhdl

 

Now the core supports many more instructions:

  • AB, CB, SB, SOCB, SZCB, MOVB. I only tested AB and MOVB; the rest follow the exact same pattern, so if the 16-bit variety works these will too. This did not amount to many lines of code, but was somewhat hairy due to all the 16/8-bit processing. Byte operations require special care with effective address processing: it is not enough to have the data word from memory in some internal temporary register, the effective address is also needed to control the byte selection multiplexer.
  • BLWP and RTWP are there and they work! As one might imagine, both of these require many additional states to the CPU core. I have only tested BLWP @LABEL version, not the other addressing modes, but theoretically the rest will work since the logic for effective address calculation is exactly the same as for all other instructions in the single operand instruction group.
  • SLA, SRA, SRC, SRL shift instructions are there too. They all go through the same new CPU core states. This too was a little involved, and obviously required changes to the ALU to be able to perform all of these functions. What is nice is that after setup the shifting of bits happens in a single core state using the ALU, so each bit takes one clock cycle. I was very tempted to implement a barrel shifter, but that has to wait for later. I did not test yet the variable shift count variety, i.e. taking the shift count from R0, but the logic is there.
  • Single bit CRU instructions are now there: SBO, SBZ, TB. They also required 3 more states to the core, and I also added a delay counter to extend the width of CRUCLK high period to four clock cycles. At 100MHz that will still only be 40ns. I really hope the synthesis will go through like this. I also obviously needed to add the CRU bus in there to support these.
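The byte selection mentioned in the first bullet above can be illustrated with a short Python sketch. On the TMS9900 the byte at an even address is the most significant byte of the word, so the low bit of the effective address picks the half. The helper names `read_byte` and `write_byte` are hypothetical and only model the multiplexer, not the bus cycles.

```python
def read_byte(word, ea):
    """Select the byte addressed by ea from the 16-bit word read from memory."""
    if (ea & 1) == 0:
        return (word >> 8) & 0xFF   # even address: high byte
    return word & 0xFF              # odd address: low byte

def write_byte(word, ea, byte):
    """Merge a byte result back into the word, leaving the other half untouched."""
    byte &= 0xFF
    if ea & 1:
        return (word & 0xFF00) | byte
    return (word & 0x00FF) | (byte << 8)

word = 0x1234
print(hex(read_byte(word, 0x8340)), hex(read_byte(word, 0x8341)))
print(hex(write_byte(word, 0x8340, 0xAB)), hex(write_byte(word, 0x8341, 0xAB)))
```

Because workspace register addresses are always even, a byte operation on a plain register operand ends up using the high byte of the register.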

If I do get this whole thing working there are many optimizations I'd like to add. For one thing the core really needs to have a memory cache to speed up workspace register access (and everything else). The barrel shifter would be another thing to add to speed things up. Also, even without a traditional memory cache it would be nice to have a fully associative in-core cache for at least a few workspace registers, which would enable the core to completely avoid going to memory read states even with a cache. An instruction prefetcher would also be good, since the decoding phase for most instructions takes at least a few clock cycles, which could be overlapped with the prefetch to keep the memory bus busy. That would also allow multiple cycle instruction decode, which would keep the amount of logic per clock cycle lower and would enable higher clock speeds.
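The difference between the current one-bit-per-clock shifting and a barrel shifter can be sketched in Python. This is a behavioural model under assumptions, not the ALU itself: the function names are hypothetical, only the result and the last bit shifted out are modelled, and the SLA overflow flag is omitted. The shift count rule follows the TMS9900 convention that a count of 0 in the opcode takes the count from the low four bits of R0, and if those are also 0 the count is 16.

```python
def shift_count(opcode_count, r0):
    """Resolve the shift count: opcode field, else R0 low nibble, else 16."""
    if opcode_count:
        return opcode_count
    return (r0 & 0xF) or 16

def shift_serial(op, value, count):
    """Shift one bit per step; returns (result, carry, steps taken)."""
    value &= 0xFFFF
    carry = 0
    for _ in range(count):
        if op == "SLA":
            carry = (value >> 15) & 1
            value = (value << 1) & 0xFFFF
        elif op == "SRL":
            carry = value & 1
            value >>= 1
        elif op == "SRA":
            carry = value & 1
            value = (value >> 1) | (value & 0x8000)   # keep the sign bit
        elif op == "SRC":
            carry = value & 1
            value = (value >> 1) | (carry << 15)      # rotate right
        else:
            raise ValueError(op)
    return value, carry, count

def shift_barrel(op, value, count):
    """Same result computed in one step, as a barrel shifter would."""
    value &= 0xFFFF
    if op == "SLA":
        return (value << count) & 0xFFFF
    if op == "SRL":
        return value >> count
    if op == "SRA":
        signed = value - 0x10000 if value & 0x8000 else value
        return (signed >> count) & 0xFFFF
    if op == "SRC":
        count %= 16
        return ((value >> count) | (value << (16 - count))) & 0xFFFF
    raise ValueError(op)

print(shift_serial("SRA", 0x8000, 3), hex(shift_barrel("SRA", 0x8000, 3)))
```

The serial version takes `count` steps while the barrel version takes one, which is the speed-up being considered, at the cost of a wider multiplexer structure in the ALU.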

 

An entirely other level of optimisation would be to create a data flow engine, but that would be a whole lot more complex...


Hi Erik,

I am enjoying your blog updates and that you seem to have time for the challenge now :)

 

 

Yes I am trying to get something done before the end of the challenge. Two more blog entries to read :) for you.

Still running the whole thing in simulation, I have not done synthesis yet, although all the constructs I have used can be synthesised to hardware. I am really looking forward to synthesis (which also will require integration to the rest of the TI-99/4A FPGA - but that should be straightforward). But when I started to look at even my simple test code for the TMS99105 (replacement boot rom) it uses a quite wide variety of functionality, so in order to avoid writing new test code I need a pretty full functioning CPU...


https://hackaday.io/project/20826-tms9900-compatible-cpu-core-in-vhdl/log/58114-xop-stst-and-external-instructions

 

I was planning today to push the implementation to the end, but fell short by 7 instructions. Still, good progress, I got XOPs working among a bunch of other instructions. I am starting to be very anxious to synthesise the design on the FPGA... Just for fun I ran my TMS99105 test code boot ROM on it under simulation, but just for the first 150 microseconds (that is 15 000 cycles of activity). It correctly started to clear VDP memory (judging from the timing diagram). I did not bother to run the simulation further, although it was pretty much immediate on my PC.


Crawling towards the end of implementation and testing under simulation:

 

https://hackaday.io/project/20826-tms9900-compatible-cpu-core-in-vhdl/log/58216-3-more-instructions

 

Now only LDCR, STCR, MPY and DIV remain! Plus a ton of much needed testing and bug fixing. Plus interrupt support. Plus synthesis and integration to the FPGA. And more testing...

