TMS9900 CPU core creation attempt


86 replies to this topic

#26 speccery

    Chopper Commander

  • Topic Starter
  • 213 posts

Posted Wed Apr 26, 2017 2:27 PM

> Sweet! Will you post a video of operation when you get to that point?

Yes of course, that is a promise - if I get this working :)



#27 speccery

    Chopper Commander

  • Topic Starter
  • 213 posts

Posted Wed Apr 26, 2017 2:30 PM

Update at: https://hackaday.io/...ynthesis-passed

 

I added the STCR and LDCR instructions. These are messy. DIV and MPY are still missing. But those things are beside the point: I did an ad-hoc, totally buggy integration of the CPU core into my TI-99/4A FPGA design, and it passed synthesis! On the very first try! Wow! Now I only have to make it work (sigh).
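To illustrate why LDCR and STCR are messy, here is a rough Python model of their bit-serial behaviour. It is not the VHDL from this project; the CRU is modelled as a plain dict of bit addresses, and `cru_base`, `bit_count`, `ldcr` and `stcr` are hypothetical helper names. It follows the TMS9900 rules as I understand them: the CRU base is R12 shifted right by one, a count of 0 means 16 bits, bits move least significant first, and counts of 1 to 8 use a byte operand. Fetching or storing that operand (and the fact that a byte in a register is its high half) is left out.

```python
# Rough model of TMS9900 CRU bit I/O; the CRU is just a dict of
# 12-bit bit address -> 0/1. All names here are hypothetical.

CRU_MASK = 0x0FFF  # the CRU bit address space is 12 bits wide


def cru_base(r12: int) -> int:
    """R12 holds twice the CRU bit address, so shift right by one."""
    return (r12 >> 1) & CRU_MASK


def bit_count(count: int) -> int:
    """A count field of 0 means 16 bits."""
    return 16 if count == 0 else count


def ldcr(cru: dict, r12: int, operand: int, count: int) -> None:
    """Write bit_count(count) bits of operand to consecutive CRU bits, LSB first.

    For counts 1-8 the operand is a byte; the caller passes that byte value.
    """
    base = cru_base(r12)
    for i in range(bit_count(count)):
        cru[(base + i) & CRU_MASK] = (operand >> i) & 1


def stcr(cru: dict, r12: int, count: int) -> int:
    """Read bit_count(count) CRU bits into a right-justified value, LSB first."""
    base = cru_base(r12)
    value = 0
    for i in range(bit_count(count)):
        value |= (cru.get((base + i) & CRU_MASK, 0) & 1) << i
    return value


cru = {}
ldcr(cru, 0x1300, 0b1011, 4)   # four bits starting at CRU bit >980
print(hex(stcr(cru, 0x1300, 4)))  # 0xb
```

The byte/word split on the count field is what makes a hardware implementation awkward: the operand width depends on a field decoded from the instruction word itself.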

 

FPGA synthesis statistics are included in the blog entry I linked to above.



#28 kl99

    Dragonstomper

  • 658 posts
  • Location:Vienna, Austria

Posted Thu Apr 27, 2017 6:30 AM

Wow! What a milestone. From what I read, you have implemented a complete TMS9900 CPU, including the instructions that are not supported by the 99/4A.

To be honest, the write-ups are a bit above my skill level.

Good luck debugging it until it runs. :)



#29 JamesD

    Quadrunner

  • 7,655 posts
  • Location:Flyover State

Posted Thu Apr 27, 2017 7:52 AM

<cough> prefetch </cough>



#30 mizapf

    River Patroller

  • 2,531 posts
  • Location:Germany

Posted Thu Apr 27, 2017 8:42 AM

What prefetch? The one from the 9995?



#31 JamesD

    Quadrunner

  • 7,655 posts
  • Location:Flyover State

Posted Thu Apr 27, 2017 9:33 AM

> What prefetch? The one from the 9995?

If you are going to make a new processor and have already added commands that weren't on the 9900... why not add a prefetch?

The HD6309 was fully compatible with the 6809 on startup.  
Then you flip a bit and it enables the prefetch and some other things.
Still compatible but faster when you need it.
Just a suggestion.



#32 mizapf

    River Patroller

  • 2,531 posts
  • Location:Germany

Posted Thu Apr 27, 2017 9:39 AM

It depends on your goals. If you want to create an emulation of the 9900, prefetch does not belong to it (but to the 9995). So you could directly go for the 9995, or maybe for the 99000.



#33 speccery

    Chopper Commander

  • Topic Starter
  • 213 posts

Posted Thu Apr 27, 2017 10:40 AM

> Wow! What a milestone. From what I read, you have implemented a complete TMS9900 CPU, including the instructions that are not supported by the 99/4A.
>
> To be honest, the write-ups are a bit above my skill level.
>
> Good luck debugging it until it runs. :)

To be honest, it is not yet a complete TMS9900, since it lacks 1) interrupt support and 2) the MPY and DIV instructions. Number 3, of course, is making all of this actually work: the data path already works, but I am pretty sure the flags are still bogus. In other words, I have not yet added anything that a TMS9900 wouldn't have. From here the additional things should be easy, but first I will try to make my current processor do something on the FPGA chip.
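As a reference point for the flag logic, here is a small Python sketch of how the TMS9900 sets its three comparison bits (L>, A>, EQ). It is written from the documented TMS9900 rules, not from this project's VHDL; the helper names are hypothetical, and carry, overflow, parity and the other status bits are deliberately left out.

```python
# TMS9900 status register bits in TI numbering (bit 0 is the MSB).
ST_LGT = 0x8000  # bit 0: logical greater than
ST_AGT = 0x4000  # bit 1: arithmetic greater than
ST_EQ = 0x2000   # bit 2: equal


def signed16(x: int) -> int:
    """Interpret a 16-bit value as two's complement."""
    x &= 0xFFFF
    return x - 0x10000 if x & 0x8000 else x


def compare_flags(a: int, b: int) -> int:
    """L>, A> and EQ from comparing source a with destination b, as C does."""
    a &= 0xFFFF
    b &= 0xFFFF
    st = 0
    if a > b:
        st |= ST_LGT
    if signed16(a) > signed16(b):
        st |= ST_AGT
    if a == b:
        st |= ST_EQ
    return st


def result_flags(result: int) -> int:
    """Instructions such as MOV, A and S compare their result with zero."""
    return compare_flags(result, 0)


print(hex(result_flags(0x8000)))  # 0x8000: logically > 0 but arithmetically negative
```

A value like >8000 is a handy test vector, since it separates the logical and arithmetic comparisons.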



#34 speccery

    Chopper Commander

  • Topic Starter
  • 213 posts

Posted Thu Apr 27, 2017 10:47 AM

> If you are going to make a new processor and have already added commands that weren't on the 9900... why not add a prefetch?
>
> The HD6309 was fully compatible with the 6809 on startup.
> Then you flip a bit and it enables the prefetch and some other things.
> Still compatible but faster when you need it.
> Just a suggestion.

Just to say again: there is no added functionality (yet). Before prefetch, I think a more interesting addition would be a memory cache.

It is a good idea to add extra features that can be enabled at will.



#35 speccery

    Chopper Commander

  • Topic Starter
  • 213 posts

Posted Thu Apr 27, 2017 10:50 AM

> It depends on your goals. If you want to create an emulation of the 9900, prefetch does not belong to it (but to the 9995). So you could directly go for the 9995, or maybe for the 99000.

My first goal is to have a TMS9900 instruction set compatible CPU. My instruction decoder is already stricter than what you have on the TMS9900, for example on the TMS9900 RTWP instruction is >038X (where X can be anything) but on the TMS9995 RTWP is >0380. So my decoder only accepts >0380.
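The difference between the two decoding styles described above can be sketched in a few lines of Python. The function names are made up for illustration, and what the core does with an opcode the strict decoder rejects is not described in the post, so it is not modelled here.

```python
RTWP = 0x0380


def is_rtwp_tms9900(opcode: int) -> bool:
    """TMS9900 behaviour as described above: the low four bits are ignored,
    so any word from >0380 to >038F acts as RTWP."""
    return (opcode & 0xFFF0) == RTWP


def is_rtwp_strict(opcode: int) -> bool:
    """Stricter decoding: only the exact >0380 encoding is accepted."""
    return opcode == RTWP


for op in (0x0380, 0x0383, 0x038F, 0x0390):
    print(f">{op:04X}", is_rtwp_tms9900(op), is_rtwp_strict(op))
```

In hardware terms, the loose form is a comparison on the top twelve bits only, while the strict form compares all sixteen.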

While I will hopefully have instruction set compatibility, I do not have a timing compatibility goal. I'm designing this to run at 100MHz, but as I have not run it on real hardware yet, I don't know whether I will reach that goal. It should be possible.



#36 mizapf

    River Patroller

  • 2,531 posts
  • Location:Germany

Posted Thu Apr 27, 2017 11:29 AM

> I do not have a timing compatibility goal. I'm designing this to run at 100MHz

Hmm... actually, why?

 

I mean, I understand why the F18A GPU runs at 100 MHz. You seem to be targeting a CPU at that speed. Just as a challenge? Mind that your TI hardware will not cope with speeds noticeably higher than the current 3 MHz (timer loops, device timings etc.).



#37 JamesD

    Quadrunner

  • 7,655 posts
  • Location:Flyover State

Posted Thu Apr 27, 2017 11:40 AM

 

> Just to say again: there is no added functionality (yet). Before prefetch, I think a more interesting addition would be a memory cache.
>
> It is a good idea to add extra features that can be enabled at will.

Even if you just cache the register file that would make a noticeable difference.  

A cache will not make any difference when accessing VDP RAM unless you have a very smart cache just for that.
 



#38 speccery

    Chopper Commander

  • Topic Starter
  • 213 posts

Posted Fri Apr 28, 2017 2:06 PM

 

> Hmm... actually, why?
>
> I mean, I understand why the F18A GPU runs at 100 MHz. You seem to be targeting a CPU at that speed. Just as a challenge? Mind that your TI hardware will not cope with speeds noticeably higher than the current 3 MHz (timer loops, device timings etc.).

I should clarify that I am not trying to plug this CPU into the TI, at least initially. It is meant to be used as an embedded core in the FPGA. I already have the rest of the TI implemented inside the FPGA. That other logic already runs at 100MHz, and my VDP can probably accept new bytes at 25MHz or more, which is way faster than my current core can deliver them.


  • RXB likes this

#39 speccery

    Chopper Commander

  • Topic Starter
  • 213 posts

Posted Fri Apr 28, 2017 2:09 PM

> Even if you just cache the register file that would make a noticeable difference.
>
> A cache will not make any difference when accessing VDP RAM unless you have a very smart cache just for that.

Yes, the register file cache would make a dramatic difference. But it is a little nasty to implement: workspace changes would need to invalidate the cache, many TI programs use byte-wide instructions to update only one byte within the register file, etc. So a "simple" implementation that only handles 16-bit word operations on register operands would not work; in practice the register cache tag entries would need to be 15 bits wide (a full word address) to capture all the scenarios.
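The scheme described above (15-bit word-address tags, invalidation on workspace change, byte writes patched into cached words) might look roughly like this as a behavioural Python model. This is a sketch under those assumptions, not speccery's design: the class and method names are hypothetical, and the cache is write-through to a 64K byte memory.

```python
class RegisterCache:
    """Hypothetical cache of the 16 workspace registers in front of a
    64K byte memory. Each entry is tagged with a 15-bit word address
    (byte address >> 1), so every write, byte- or word-wide, can be
    checked against the cached registers. Writes go through to memory."""

    def __init__(self, memory: bytearray) -> None:
        self.mem = memory
        self.wp = 0
        self.valid = [False] * 16
        self.tag = [0] * 16
        self.data = [0] * 16

    def set_wp(self, wp: int) -> None:
        """A workspace change (LWPI, BLWP, RTWP, interrupt) flushes everything."""
        self.wp = wp & 0xFFFE
        self.valid = [False] * 16

    def read_reg(self, n: int) -> int:
        if not self.valid[n]:
            addr = (self.wp + 2 * n) & 0xFFFE
            self.data[n] = (self.mem[addr] << 8) | self.mem[addr + 1]
            self.tag[n] = addr >> 1
            self.valid[n] = True
        return self.data[n]

    def write_byte(self, addr: int, value: int) -> None:
        """Big-endian: the even byte address is the high byte of the word."""
        addr &= 0xFFFF
        value &= 0xFF
        self.mem[addr] = value
        for n in range(16):  # in hardware: 16 parallel 15-bit comparators
            if self.valid[n] and self.tag[n] == addr >> 1:
                if addr & 1:
                    self.data[n] = (self.data[n] & 0xFF00) | value
                else:
                    self.data[n] = (self.data[n] & 0x00FF) | (value << 8)

    def write_word(self, addr: int, value: int) -> None:
        addr &= 0xFFFE
        self.write_byte(addr, value >> 8)
        self.write_byte(addr + 1, value)


mem = bytearray(0x10000)
regs = RegisterCache(mem)
regs.set_wp(0x8300)
regs.write_word(0x8302, 0x1234)   # R1 written through memory
print(hex(regs.read_reg(1)))      # 0x1234
regs.write_byte(0x8302, 0xAB)     # e.g. a MOVB into the high byte of R1
print(hex(regs.read_reg(1)))      # 0xab34
```

Because every memory write is checked against the tags, accesses that reach a register through ordinary addressing modes (not just register operands) stay coherent.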

Again my VDP inside the FPGA already runs at 100MHz, so it goes very fast.



#40 matthew180

    River Patroller

  • 2,387 posts
  • Location:Castaic, California

Posted Fri Apr 28, 2017 2:40 PM

Actually, a register cache would not make a lot of difference if an external memory was 10ns (talking about SRAM here).  And if all the memory is block RAM, a cache won't help at all.  It might help if an external SDRAM was being used.



#41 JamesD

    Quadrunner

  • 7,655 posts
  • Location:Flyover State

Posted Fri Apr 28, 2017 6:24 PM

> Actually, a register cache would not make a lot of difference if an external memory was 10ns (talking about SRAM here).  And if all the memory is block RAM, a cache won't help at all.  It might help if an external SDRAM was being used.

For a register cache, it's small enough that it should fit in the FPGA.



#42 kl99

    Dragonstomper

  • 658 posts
  • Location:Vienna, Austria

Posted Sun Apr 30, 2017 6:42 AM

all the best for the final hours :)



#43 matthew180

    River Patroller

  • 2,387 posts
  • Location:Castaic, California

Posted Mon May 1, 2017 11:20 AM

> For a register cache, it's small enough that it should fit in the FPGA.

It is not a matter of fitting in the FPGA.  If the main RAM is being implemented with the FPGA's "Block RAM" or external 10ns SRAM (this design is running at 100MHz), then a cache will not speed things up since the RAM is just as fast as the small memory or flip-flops that would be used to implement a cache.  There is zero speed gain, but a lot of extra complexity.  However, if the main RAM was being implemented in a slower memory (any SDRAM, DDR2, DDR3, DDR4), then yes a cache might help some.
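The argument above can be put in numbers with the usual average memory access time formula (hit time plus miss rate times miss penalty). The 10ns hit time (one 100MHz cycle), the 90% hit rate and the roughly 70ns SDRAM random read (the figure quoted later in this thread) are illustrative assumptions, not measurements.

```python
def avg_access_ns(hit_ns: float, miss_rate: float, miss_penalty_ns: float) -> float:
    """Average memory access time = hit time + miss rate * miss penalty."""
    return hit_ns + miss_rate * miss_penalty_ns


HIT_NS = 10.0     # assumed: one 100MHz cycle for a cache hit
MISS_RATE = 0.10  # assumed 90% hit rate

# Main memory as fast as the cache (10ns SRAM or block RAM):
print(f"SRAM without cache: 10.0 ns, with cache: {avg_access_ns(HIT_NS, MISS_RATE, 10.0):.1f} ns")  # 11.0
# Slow main memory (~70ns random read on SDRAM):
print(f"SDRAM without cache: 70.0 ns, with cache: {avg_access_ns(HIT_NS, MISS_RATE, 70.0):.1f} ns")  # 17.0
```

With memory as fast as the cache, the cache can only add time; with slow SDRAM it cuts the average access time substantially.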



#44 speccery

    Chopper Commander

  • Topic Starter
  • 213 posts

Posted Tue May 2, 2017 10:03 AM

> all the best for the final hours :)

Thank you. Real life took care of my last few days, so there was no time to work on this. But I will continue when I have time, perhaps even later today. I am very keen to get something running on the FPGA.



#45 speccery

    Chopper Commander

  • Topic Starter
  • 213 posts

Posted Mon May 15, 2017 3:39 PM

First actual run of the TMS9900 CPU core!!

 

https://hackaday.io/...-successful-run

 

There is also a (quick-and-dirty late-at-night) video of the beast in action. Resolution is not fantastic, as my upload speeds are not that fast - and it is late. So I went with 960x540.

 

https://youtu.be/vn1OICqWQSo

 

This is too late for the retro challenge; my real life has been very involved lately... It is also too little, as it does not fully implement the TMS9900 yet (missing interrupts, DIV/MPY, and a truckload of debugging). So I cannot run actual TI ROMs on it. But surprisingly it does run my test code, and at least it shows that the TMS9900 VHDL core can initialise and use my TMS9918 core. That may not look like much, but it is actually quite a lot of functionality.

 

The CPU is also completely unoptimised, and the memory cycles, for instance, are way too long, but it still seems to run a lot faster than my TMS99105-based TI-99/4A clone. This can to an extent be seen in the video. That is to be expected, as on the TMS99105 I run the CPU at 20MHz (5MHz cycles but 20MHz states), while the FPGA CPU runs at 100MHz state transitions. The memory cycles now take 90ns, which is ridiculously slow given that the FPGA board has 10ns SRAMs. I just threw in enough wait states to make sure the memory interface is stable and that I can trigger peripheral accesses reliably. I think in the end memory cycles could be done in 20ns or 30ns, so there are plenty of opportunities for optimisation.
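The timing figures in this post convert to clock states as follows. The helper name is made up; it simply divides a duration by the clock period. The 200ns memory cycle for the TMS99105 is derived from the 5MHz figure given above.

```python
def states_for(duration_ns: float, state_clock_mhz: float) -> float:
    """How many clock states of the given frequency fit in a duration."""
    period_ns = 1000.0 / state_clock_mhz
    return duration_ns / period_ns


# FPGA core: 100MHz states, i.e. 10ns each.
print(states_for(90, 100))                       # 9.0 states per memory cycle today
print(states_for(20, 100), states_for(30, 100))  # 2.0 3.0 states as a possible target
# TMS99105: 20MHz states and 5MHz (200ns) cycles.
print(states_for(200, 20))                       # 4.0 states per cycle
```

So shrinking the memory cycle from 90ns to 20 or 30ns means going from nine states down to two or three.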

 

The code is on GitHub.



#46 acadiel

    Dragonstomper

  • 936 posts
  • www.hexbus.com
  • Location:USA

Posted Mon May 15, 2017 4:42 PM

Fantastic!  Wow!



#47 matthew180

    River Patroller

  • 2,387 posts
  • Location:Castaic, California

Posted Tue May 16, 2017 2:03 AM

Actually, if you are using external SDRAM then your memory access will never be better than about 70ns without a Block-RAM or SRAM-based cache (even then you may have a 70ns latency for a cache miss).  I wonder if the Xilinx tools have an SDRAM with cache controller wizard?



#48 speccery

    Chopper Commander

  • Topic Starter
  • 213 posts

Posted Tue May 16, 2017 3:10 AM

> Actually, if you are using external SDRAM then your memory access will never be better than about 70ns without a Block-RAM or SRAM-based cache (even then you may have a 70ns latency for a cache miss).  I wonder if the Xilinx tools have an SDRAM with cache controller wizard?

The FPGA board I am currently using has static RAMs: two 256K x 16-bit chips forming a 32-bit data bus. The SRAMs have a 10ns access time, so with this board there is no reason to use that many wait states; other designs for this board already run with a 50MHz memory clock.

 

My other primary FPGA board does have SDRAM, and using the Xilinx hardwired LPDDR SDRAM controller block it can run at crazy speeds: basically it supports 64-bit transfers at 100MHz, or 800 Mbytes per second once the burst is set up. There is no cache in there as far as I know. Anyway, for now I am not planning to use this board; I'll stick to the SRAM-based board.
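The memory figures in the two paragraphs above can be checked with a little arithmetic. This assumes "256K" means 256 × 1024 words per chip and that one 64-bit word moves per 100MHz clock during a burst, which is how the 800 Mbytes per second figure comes out; the setup latency before the burst starts is ignored. The function names are illustrative.

```python
def sram_capacity_bytes(words_per_chip: int, bits_per_word: int, chips: int) -> int:
    """Total capacity of identical SRAM chips placed side by side on the bus."""
    return words_per_chip * bits_per_word * chips // 8


def burst_bandwidth_bytes_per_s(bus_bits: int, clock_hz: int) -> int:
    """Peak transfer rate once a burst is running: one bus-width word per clock."""
    return bus_bits // 8 * clock_hz


print(sram_capacity_bytes(256 * 1024, 16, 2))        # 1048576 bytes (1 MiB) on a 32-bit bus
print(burst_bandwidth_bytes_per_s(64, 100_000_000))  # 800000000 bytes per second
```

The peak number only applies to long sequential bursts, which is exactly the access pattern a 16-bit CPU doing random accesses does not generate.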



#49 matthew180

    River Patroller

  • 2,387 posts
  • Location:Castaic, California

Posted Tue May 16, 2017 2:35 PM

Ah, well, using SRAM then you do have a lot of room to improve the memory access. :-)

 

I'll have to give the Xilinx SDRAM controller a try. It looked very confusing and complicated the first time I started messing with SDRAM, so I rolled my own controller based on a simpler design I found online (Hamsterworks). The problem with burst mode and 64-bit transfers is that the 9900 does not really have that kind of access pattern, and you need a memory controller and cache to even begin to take advantage of any memory access wider than 16 bits. For completely random access, the read access time is still about 70ns on even the fastest SDRAM.



#50 speccery

    Chopper Commander

  • Topic Starter
  • 213 posts

Posted Tue May 16, 2017 3:03 PM

A little more progress - now I have proper interrupt support. 

https://hackaday.io/...rrupts-now-work

 

As far as I can tell (still unable to run the TI-99/4A ROMs) interrupts work properly both under simulation and with my test software.
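For readers following along, the standard TMS9900 interrupt sequence that this support has to reproduce can be sketched in Python. This is a behavioural model written from the TMS9900's documented context-switch rules, not the project's VHDL: the class and function names are hypothetical, when the CPU samples interrupts between instructions is not modelled, and the vector contents in the example are just values chosen for the demonstration.

```python
from dataclasses import dataclass


@dataclass
class Cpu:
    wp: int  # workspace pointer
    pc: int  # program counter
    st: int  # status register; bits 12-15 (the low nibble) hold the interrupt mask


def read_word(mem: bytearray, addr: int) -> int:
    addr &= 0xFFFE
    return (mem[addr] << 8) | mem[addr + 1]


def write_word(mem: bytearray, addr: int, value: int) -> None:
    addr &= 0xFFFE
    mem[addr] = (value >> 8) & 0xFF
    mem[addr + 1] = value & 0xFF


def take_interrupt(cpu: Cpu, mem: bytearray, level: int) -> bool:
    """Context switch for a maskable interrupt at levels 1-15 (level 0 is reset)."""
    assert 1 <= level <= 15
    if level > (cpu.st & 0x000F):
        return False  # masked: the level must be <= the current mask
    new_wp = read_word(mem, 4 * level)      # vector: new WP ...
    new_pc = read_word(mem, 4 * level + 2)  # ... followed by new PC
    write_word(mem, new_wp + 26, cpu.wp)    # old WP -> new R13
    write_word(mem, new_wp + 28, cpu.pc)    # old PC -> new R14
    write_word(mem, new_wp + 30, cpu.st)    # old ST -> new R15
    cpu.wp, cpu.pc = new_wp, new_pc
    cpu.st = (cpu.st & 0xFFF0) | (level - 1)  # only higher priorities may nest
    return True


def rtwp(cpu: Cpu, mem: bytearray) -> None:
    """Return: restore ST, PC and WP from R15, R14 and R13 of the current workspace."""
    st = read_word(mem, cpu.wp + 30)
    pc = read_word(mem, cpu.wp + 28)
    cpu.wp = read_word(mem, cpu.wp + 26)
    cpu.pc, cpu.st = pc, st


mem = bytearray(0x10000)
write_word(mem, 0x0004, 0x83C0)  # example level 1 vector: new WP
write_word(mem, 0x0006, 0x0900)  # ... and new PC
cpu = Cpu(wp=0x8300, pc=0x6000, st=0x0002)  # interrupt mask 2
print(take_interrupt(cpu, mem, 1), hex(cpu.wp), hex(cpu.pc), cpu.st & 0xF)
print(take_interrupt(cpu, mem, 1))  # now masked
rtwp(cpu, mem)
print(hex(cpu.wp), hex(cpu.pc), hex(cpu.st))
```

Lowering the mask to one below the accepted level is what prevents the same interrupt from immediately re-entering the handler.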





