
VHDL for 99xx chips


pnr


It's very encouraging that we now have more people creating interesting projects in VHDL for the TMS9900 systems. Thanks for all the effort, pnr!

 

I am interested in computer architecture in general, and having a small, self-contained TMS9900-based system for an FPGA will indeed be a great platform for further development. This design at least is very well understood by us.

 

Once we get the breadboard project as you described above running, I want to see how far performance can be pushed. I know this is perhaps not an interesting direction for everyone, but I am interested in creating the fastest TMS9900 system we can make. This is really for personal interest, as there is no existing software that could take advantage of much higher performance:

  • I'm very keen to try out how many TMS9900 cores I can cram into the FPGA. The dual-ported RAMs of the FPGA allow two cores to share an internal ROM without wasting memory blocks, so multicore implementations can be interesting in many ways.
  • With the serial port as the channel to the outside world, the rest of the logic can be clocked at higher frequencies than in my TI-99/4A design. It will be interesting to see how high the clock frequency can be for the CPU core.
  • An additional direction I would like to try is to implement a cache memory for my TMS9900 core using two memory blocks (one for data, one for address tags, for a simple direct-mapped cache structure). This again would help in a multiprocessor system, as each core could have its own local cache, and they could interface to an external memory over a shared bus.
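
The tag/data arrangement in the last bullet could be sketched roughly as follows. This is only a minimal illustration under assumed parameters, not code from the actual core: the entity name `dm_cache`, the 256-line size, the one-word line length, the 15-bit word address and all port names are hypothetical. Both arrays are read synchronously so that a synthesis tool can map them to two block RAMs, which means `hit` and `rd_data` are valid one clock after `addr` is presented.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical direct-mapped cache lookup: 256 lines of one 16-bit word.
-- Names and sizes are illustrative assumptions only.
entity dm_cache is
  port (
    clk       : in  std_logic;
    addr      : in  std_logic_vector(14 downto 0);  -- 15-bit word address
    fill_en   : in  std_logic;                      -- store a word fetched from external memory
    fill_data : in  std_logic_vector(15 downto 0);
    hit       : out std_logic;                      -- valid one clock after addr
    rd_data   : out std_logic_vector(15 downto 0)   -- valid one clock after addr
  );
end entity dm_cache;

architecture rtl of dm_cache is
  constant INDEX_BITS : natural := 8;
  constant TAG_BITS   : natural := 15 - INDEX_BITS;

  type data_ram_t is array (0 to 2**INDEX_BITS - 1) of std_logic_vector(15 downto 0);
  -- Each tag entry is a valid bit followed by the upper address bits.
  type tag_ram_t  is array (0 to 2**INDEX_BITS - 1) of std_logic_vector(TAG_BITS downto 0);

  signal data_ram   : data_ram_t;
  signal tag_ram    : tag_ram_t := (others => (others => '0'));  -- all lines invalid at start
  signal tag_q      : std_logic_vector(TAG_BITS downto 0);
  signal addr_tag_q : std_logic_vector(TAG_BITS - 1 downto 0);
begin
  process (clk)
    variable idx : natural range 0 to 2**INDEX_BITS - 1;
  begin
    if rising_edge(clk) then
      -- The low address bits select the line in both memories.
      idx := to_integer(unsigned(addr(INDEX_BITS - 1 downto 0)));
      if fill_en = '1' then
        data_ram(idx) <= fill_data;
        tag_ram(idx)  <= '1' & addr(14 downto INDEX_BITS);
      end if;
      -- Registered reads, so both arrays can be inferred as block RAM.
      rd_data    <= data_ram(idx);
      tag_q      <= tag_ram(idx);
      addr_tag_q <= addr(14 downto INDEX_BITS);
    end if;
  end process;

  -- Hit when the line is valid and its stored tag matches the requested address.
  hit <= '1' when tag_q = ('1' & addr_tag_q) else '0';
end architecture rtl;
```

The sketch leaves out everything that makes a cache hard in a multicore system: the miss handling state machine, what happens on stores, and keeping the caches of several cores consistent over the shared bus.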

With regard to pnr's TMS9902 design, a practical extension could be the addition of a receive FIFO, for example a 16550-style 16-byte FIFO. This could probably be done in a transparent way so that software would not need changes. Having said that, I don't know if the bandwidth of TI-99/4A serial communication software is constrained by lost characters or by other things, or even if bandwidth has ever been an issue.
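
A receive FIFO of that kind might look roughly like the sketch below. It is an assumption-laden illustration, not part of pnr's TMS9902 design: the entity name `rx_fifo16` and all port names are hypothetical, and how `wr_en` and `rd_en` would be derived from the receiver and from the CRU accesses is left open. The head of the queue is always visible on `rd_data`, so software would keep reading a single receive buffer as before.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical 16-entry receive FIFO; names are illustrative assumptions only.
entity rx_fifo16 is
  port (
    clk     : in  std_logic;
    reset   : in  std_logic;                     -- synchronous reset
    wr_en   : in  std_logic;                     -- pulse: receiver has a complete character
    wr_data : in  std_logic_vector(7 downto 0);
    rd_en   : in  std_logic;                     -- pulse: software has consumed the head character
    rd_data : out std_logic_vector(7 downto 0);  -- oldest character in the queue
    empty   : out std_logic;
    full    : out std_logic
  );
end entity rx_fifo16;

architecture rtl of rx_fifo16 is
  type mem_t is array (0 to 15) of std_logic_vector(7 downto 0);
  signal mem : mem_t;
  -- 5-bit pointers: the low 4 bits address the memory, the extra
  -- top bit distinguishes "full" from "empty".
  signal wr_ptr   : unsigned(4 downto 0) := (others => '0');
  signal rd_ptr   : unsigned(4 downto 0) := (others => '0');
  signal is_empty : std_logic;
  signal is_full  : std_logic;
begin
  is_empty <= '1' when wr_ptr = rd_ptr else '0';
  is_full  <= '1' when (wr_ptr(3 downto 0) = rd_ptr(3 downto 0))
                   and (wr_ptr(4) /= rd_ptr(4)) else '0';

  process (clk)
  begin
    if rising_edge(clk) then
      if reset = '1' then
        wr_ptr <= (others => '0');
        rd_ptr <= (others => '0');
      else
        -- A character arriving while the FIFO is full is dropped (overrun).
        if wr_en = '1' and is_full = '0' then
          mem(to_integer(wr_ptr(3 downto 0))) <= wr_data;
          wr_ptr <= wr_ptr + 1;
        end if;
        if rd_en = '1' and is_empty = '0' then
          rd_ptr <= rd_ptr + 1;
        end if;
      end if;
    end if;
  end process;

  rd_data <= mem(to_integer(rd_ptr(3 downto 0)));
  empty   <= is_empty;
  full    <= is_full;
end architecture rtl;
```

In such a scheme the "character available" status seen by software would simply be `not empty`, and an overrun would only be flagged when a character arrives while `full` is set.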


At one point several years ago, there was a website up that showed a 9900-based parallel computer using 16 TMS-9900 CPUs, each on its own circuit card with memory and I/O on each card. Unfortunately, the last time I went looking for the site, it was gone. I haven't checked whether the Wayback Machine captured it though, so I'll have to look to see if I still have the URL.



http://www.famkoplien.de/henry/TI99/


  • 3 weeks later...

A little off topic: a few words on the "breadboard" design, which combines my TMS9900 CPU core with pnr's TMS9902 UART and Stuart's Cortex Basic port. It is a speedy system, with the CPU running at 100 MHz. Cortex Basic is a fast Basic to begin with on the TMS9995, which was the original CPU for it. The design is inspired by and based on Stuart's very nice breadboard systems.

 

I updated the git repository and wrote a more informative README file, with pictures :)

So now the breadboard design is ported to three different FPGA boards. More information is in the README file, which you can see immediately by clicking the link.
They all perform identically, as only on-chip resources (memory) are used.
What is very interesting is that on the Altera EP4CE22 the system only takes 9% of the LUTs. So now I really want to shrink the usage of on-chip memory and put in many CPU cores. There are two ways to use external memory: the obvious one is to use the SDRAM, but that does not work on the Mini board as it has no SDRAM. However, that board has a whopping 8 megabytes of SPI flash. So my plan is to implement a simple interface that lets the TMS9900 run code directly off the SPI flash. Sure, that will be pretty slow, but it will be interesting. On the Pepino FPGA board's site one of the example projects does this (it is a Mac clone). The Mac implementation runs 68000 code off SPI, but on the Pepino the SPI flash is wired in quad mode, so it actually achieves very high transfer rates in bursts. In QPI mode the data transfers are still serial, but 4 bits at a time, running at something like 50 MHz.
Anyway, all the more reason to add caching… And to find a way to run the Cortex Basic with minimum RAM. I haven’t had time to work on it.

  • 2 weeks later...

Changed the topic title to "VHDL for 99xx chips", as the 9902 seems done now. Still more to do, like a 9901, a 9995 and a 99105, etc.

 

I've started on a 99xxx CPU and have some first results: it runs EVMBUG on a Spartan-6 FPGA (EVMBUG is basically TIBUG+LBLA, see Stuart's site for details).

 

The design is based on that of the 99xxx as found in its datasheet, in particular the data path as described in Figure 3 and the microcode as described in Table 18 and Table 19. So far I have only done 9900-level functionality in a 99xxx architecture. For the data path, the mapping from Figure 3 to my VHDL is:
- IR maps to ir, PC to pc, WS to ws, ST to st
- D and T map to t1
- K maps to t2
- MQ maps to t3
- MA maps to ea
- ALU maps to alu, BYTE SWAPPER to swapper
- B BUS maps to alu_b, A BUS maps to alu_A, E BUS maps to alu_out
- P BUS maps to ab, DI BUS maps to db_in
- MICROCONTROLLER maps to sequencer, CROM maps to sequencer_pla
- SHIFT COUNT maps to bitctr
Note that Figure 3, although an abstraction, seems to derive directly from the chip layout: https://en.wikipedia.org/wiki/Texas_Instruments_TMS9900#/media/File:TI_TMS99105A_die.JPG

The instruction decoder generates three starting points into the micro rom: sig_ins, sig_op1 and sig_op2. Point sig_op1 is the code for a source operand fetch, point sig_op2 is the code for a destination fetch and sig_ins is the actual instruction. The sequencer uses the three as needed, and there is no call stack in the sequencer.
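
One way a sequencer can chain three entry points without a call stack is to remember which phase it is in and to pick the next entry point when the current routine signals that it is done. The sketch below is a guess at that general idea, not the code in the attached file: only `sig_ins`, `sig_op1` and `sig_op2` come from the description above, while the entity name `seq_sketch`, the 8-bit micro address width and the signals `has_op1`, `has_op2`, `seq_done` and `seq_next` are hypothetical.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical micro-sequencer skeleton; everything except the three
-- sig_* entry points is an illustrative assumption.
entity seq_sketch is
  port (
    clk      : in  std_logic;
    reset    : in  std_logic;
    sig_op1  : in  std_logic_vector(7 downto 0);  -- entry point: source operand fetch
    sig_op2  : in  std_logic_vector(7 downto 0);  -- entry point: destination operand fetch
    sig_ins  : in  std_logic_vector(7 downto 0);  -- entry point: the instruction itself
    has_op1  : in  std_logic;                     -- assumed decoder flag: source operand needed
    has_op2  : in  std_logic;                     -- assumed decoder flag: destination operand needed
    seq_done : in  std_logic;                     -- assumed microcode bit: current routine finished
    seq_next : in  std_logic_vector(7 downto 0);  -- assumed next address inside the routine
    upc      : out std_logic_vector(7 downto 0)   -- micro program counter
  );
end entity seq_sketch;

architecture rtl of seq_sketch is
  type phase_t is (PH_OP1, PH_OP2, PH_INS);
  signal phase : phase_t := PH_INS;
  signal upc_r : std_logic_vector(7 downto 0) := (others => '0');
begin
  process (clk)
  begin
    if rising_edge(clk) then
      if reset = '1' then
        phase <= PH_INS;
        upc_r <= (others => '0');   -- assumed reset routine at address 0
      elsif seq_done = '0' then
        upc_r <= seq_next;          -- keep stepping through the current routine
      else
        -- The phase register takes the place of a return address.
        case phase is
          when PH_INS =>            -- instruction finished: start the next one
            if has_op1 = '1' then
              upc_r <= sig_op1;  phase <= PH_OP1;
            elsif has_op2 = '1' then
              upc_r <= sig_op2;  phase <= PH_OP2;
            else
              upc_r <= sig_ins;  phase <= PH_INS;
            end if;
          when PH_OP1 =>            -- source fetched: destination or instruction next
            if has_op2 = '1' then
              upc_r <= sig_op2;  phase <= PH_OP2;
            else
              upc_r <= sig_ins;  phase <= PH_INS;
            end if;
          when PH_OP2 =>            -- both operands fetched: run the instruction
            upc_r <= sig_ins;  phase <= PH_INS;
        end case;
      end if;
    end if;
  end process;

  upc <= upc_r;
end architecture rtl;
```

Because the order is always source fetch, destination fetch, instruction, a two-bit phase is enough and no return addresses need to be stored.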

What I like about the current code:
- More or less replicates the 99xxx design, incl. prefetch
- Easy to develop into a 9995, and into a full 99xxx with moderate effort
- Uses standard VHDL only (i.e. no vendor blocks)

What I don’t like about the current code:
- The code for the micro rom mixes the true rom with bus multiplexers
- The description of the st, t1 and t3 registers mixes basic storage/shifting with next state logic
- Some bits of logic are convoluted, e.g. the derivation of the flag values
I guess the two underlying discomforts are that there seems to be some unneeded complexity and that the code as written would not generate something resembling the real die when run through an ASIC synthesiser.

tms99000_v1.vhd.txt


Great, I like the idea of creating a soft core that sticks closely to the original TI design as documented in the datasheets! A core built around a dumb state machine that only emulates the behavior of a 9900/99xxx is boring to me.

 

I started such a project myself some years ago, but gave up for the moment (just before my brain exploded ;) ), because it is really hard to map a largely asynchronous 1970s hardware design, with all its tristate buses, latches and so on, onto modern hardware. I ran into issues with synchronization and edge detection: thirty-year-old hardware designs rely on asynchronous edge detection. With modern synthesis tools generating the FPGA configuration, the synthesized result depends less on VHDL syntax than on the coding style you use. So if you write something like

if rising_edge(strobe) then ...

your synthesis tool thinks "Aha! A clock!" and acts accordingly: it will route your strobe signal through a clock buffer, and if you have a few of these edge detectors, you will soon run out of buffers. If you try a synchronous implementation instead, you spend many flip-flops and get timing issues, because the result of one stage only becomes available in the next clock cycle (or the one after).
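
For comparison, the usual synchronous alternative samples the strobe with the one system clock and turns its rising edge into a single-cycle enable pulse, so the strobe never reaches a clock buffer. The sketch below is a generic illustration; the entity name `strobe_edge` and the port names are assumptions. It costs three flip-flops per strobe and delays the event by two to three clock cycles, and the strobe must stay high for more than one `clk` period to be seen.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Generic synchronous rising-edge detector; names are illustrative assumptions.
entity strobe_edge is
  port (
    clk    : in  std_logic;  -- the single system clock
    strobe : in  std_logic;  -- asynchronous external strobe
    rise   : out std_logic   -- high for one clk cycle per rising edge of strobe
  );
end entity strobe_edge;

architecture rtl of strobe_edge is
  -- sync(0) and sync(1) synchronise the strobe, sync(2) holds the previous value.
  signal sync : std_logic_vector(2 downto 0) := (others => '0');
begin
  process (clk)
  begin
    if rising_edge(clk) then
      sync <= sync(1 downto 0) & strobe;
    end if;
  end process;

  -- Previous sample low, current sample high: a rising edge was seen.
  rise <= '1' when sync(2 downto 1) = "01" else '0';
end architecture rtl;
```

Downstream logic then stays inside `if rising_edge(clk) then ... end if;` and tests `rise = '1'` as an ordinary condition, which synthesizes to a clock enable rather than a second clock.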


Uhm, you don't get away from doing the CPU FSM with a "soft core". This is still hardware and still being implemented in an FPGA.

 

The biggest hassle with interfacing the old hardware is having to level-shift the voltages, since modern chips (FPGAs and microcontrollers) are generally not 5V tolerant (some are, most are not). However, synchronization is just something that has to be done when crossing clock domains; it is not a big deal and you won't run out of FPGA resources implementing synchronizers. You would not use the external signals directly either, unless your design uses the host system's external clock to become synchronous with the host system. The F18A, like the original 9918A VDP, is asynchronous to the host system and the timing interface was not a problem. For a CPU you only have to be concerned with the small number of control signals (R/#W, MEM_EN, etc.) anyway.

