
speccery

Members
  • Posts

    920

Everything posted by speccery

  1. Well, hello after a long while. I have been preoccupied with other things, but during the past few weeks I've found a little time to work again on the TI-99/4A FPGA clone. I really ought to be working on the Collectorvision Atari 2600 code, but could not help spending some time with the TI-99/4A first. Here, I wanted to follow my original passion, which was to have a fast TI-99/4A. This time I also wanted to put some computer architecture theory into practice: I added an on-chip cache memory to my TI-99/4A clone, while also optimising the VHDL code a bit. The result is that instruction execution speed jumped from 23 times to 39 times the original speed. The TMS9900 core is now a little simpler than it used to be; it is still far from an elegant design, but it is getting better. I have two follow-up plans for this project. The first is to add more speed by going from the current non-pipelined design to a slightly pipelined one: a two-stage pipeline where both stages take multiple clock cycles. The first stage would be instruction fetch and decode, while the second would be instruction execution. I could not easily go in this direction in the past since there was (and still is) only one memory bus. But now that I have a working cache, I have much more memory bandwidth to play with. The cache is currently outside the CPU core, so it serves both instruction and data fetches. It is a super simple design: direct mapped with a write-through update policy, with 1 kilobyte of data capacity and about half a kilobyte of tag memory. The whole thing is implemented as a simple 1k x 36 bit memory block (not all of the bits are used in each 36-bit word). Having the cache outside the CPU core is not ideal, so I am probably going to add another cache for instructions only and pull that inside the core, into the fetch/decode stage, so that it can operate in parallel with the execution stage. 
This should increase performance quite significantly. The second plan is to port the TI-99/4A core to a few more FPGA boards, in order to make this design more accessible to others. The cache is also an enabler of sorts here: I can now easily support slow buses (such as SPI-connected flash memories) for cartridge ROMs, I could support DRAM fetches in burst mode so that FPGA boards with only DRAM can be used effectively, and I can also support quite small FPGAs, since I could now modify the design so that it no longer needs a lot of on-chip memory while still running at a reasonable speed. Specifically, I have the low-cost blackice-ii board in mind as one target for the TI implementation; this FPGA only has 16K RAM on board.
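A minimal sketch of what a direct-mapped, write-through cache lookup can look like in VHDL is given below. It is not the actual cache from this project: the entity name, the port names, the 16-bit data width and the index/tag widths are all assumptions, chosen so that the defaults give 512 lines x 16 bits = 1 kilobyte of data and 512 x 8 bits = half a kilobyte of tags. The write to main memory (the "through" part) is assumed to be handled by the surrounding bus logic.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical direct-mapped, write-through cache (illustration only).
entity dm_cache_sketch is
  generic (
    INDEX_BITS : natural := 9;   -- 512 lines (assumed)
    TAG_BITS   : natural := 8    -- assumed tag width
  );
  port (
    clk       : in  std_logic;
    addr      : in  std_logic_vector(TAG_BITS + INDEX_BITS - 1 downto 0); -- word address = tag & index
    wr        : in  std_logic;   -- CPU write; main memory is written elsewhere
    wr_data   : in  std_logic_vector(15 downto 0);
    fill      : in  std_logic;   -- store data returned by memory after a read miss
    fill_data : in  std_logic_vector(15 downto 0);
    hit       : out std_logic;   -- valid one clock after addr is presented
    rd_data   : out std_logic_vector(15 downto 0)
  );
end entity;

architecture rtl of dm_cache_sketch is
  constant ENTRY_BITS : natural := 1 + TAG_BITS + 16;  -- valid & tag & data
  type mem_t is array (0 to 2**INDEX_BITS - 1)
    of std_logic_vector(ENTRY_BITS - 1 downto 0);
  signal mem : mem_t := (others => (others => '0'));   -- all lines start invalid
begin
  process(clk) is
    variable index : natural range 0 to 2**INDEX_BITS - 1;
    variable tag   : std_logic_vector(TAG_BITS - 1 downto 0);
    variable entry : std_logic_vector(ENTRY_BITS - 1 downto 0);
  begin
    if rising_edge(clk) then
      index := to_integer(unsigned(addr(INDEX_BITS - 1 downto 0)));
      tag   := addr(addr'high downto INDEX_BITS);
      entry := mem(index);

      -- Lookup: a hit needs a valid line whose stored tag matches.
      if entry(ENTRY_BITS - 1) = '1'
         and entry(ENTRY_BITS - 2 downto 16) = tag then
        hit <= '1';
      else
        hit <= '0';
      end if;
      rd_data <= entry(15 downto 0);

      -- Write-through: a CPU write updates the line in the cache as well.
      if wr = '1' then
        mem(index) <= '1' & tag & wr_data;
      elsif fill = '1' then
        mem(index) <= '1' & tag & fill_data;
      end if;
    end if;
  end process;
end architecture;
```

Because every write also goes to main memory, no line is ever dirty, so a miss can simply overwrite the line. That is a large part of why this policy is so simple to implement.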
  2. The FPGA CPU version of the video got uploaded. This is the original version of the demo, not some custom one. Updated: After a break I continued to tweak the VHDL, in an attempt to get the splitscreen3_demo.a99 working more smoothly. I just added registers so that the COINC and 5TH sprite flags are set pending and actually flagged at the end of a scanline, as opposed to immediately when they occur. This way there would be some CPU time between two consecutive settings of the flags. The changes maybe helped a little, but the sine curves are still jerky.
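The "pending, then flagged at the end of the scanline" idea can be sketched as follows. This is only an illustration: the entity and signal names are hypothetical, the event and strobe inputs are assumed to be one-clock pulses, and clearing both flags on a status read is an assumption about how the surrounding VDP logic behaves.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical sketch: defer COINC / 5th-sprite flags to the end of the scanline.
entity vdp_flag_latch_sketch is
  port (
    clk         : in  std_logic;
    reset       : in  std_logic;
    coinc_event : in  std_logic;  -- pulse: sprite overlap seen while rendering
    fifth_event : in  std_logic;  -- pulse: fifth sprite seen on this line
    line_end    : in  std_logic;  -- pulse: end of the current scanline
    status_read : in  std_logic;  -- pulse: CPU read the status register
    coinc_flag  : out std_logic;
    fifth_flag  : out std_logic
  );
end entity;

architecture rtl of vdp_flag_latch_sketch is
  signal coinc_pending, fifth_pending : std_logic := '0';
  signal coinc_i, fifth_i             : std_logic := '0';
begin
  coinc_flag <= coinc_i;
  fifth_flag <= fifth_i;

  process(clk) is
  begin
    if rising_edge(clk) then
      if reset = '1' then
        coinc_pending <= '0';
        fifth_pending <= '0';
        coinc_i       <= '0';
        fifth_i       <= '0';
      else
        -- A status read clears the visible flags...
        if status_read = '1' then
          coinc_i <= '0';
          fifth_i <= '0';
        end if;
        if line_end = '1' then
          -- ...but a flag published on the same clock wins (later assignment).
          if coinc_pending = '1' then
            coinc_i <= '1';
          end if;
          if fifth_pending = '1' then
            fifth_i <= '1';
          end if;
          -- An event arriving exactly at line end is kept for the next line.
          coinc_pending <= coinc_event;
          fifth_pending <= fifth_event;
        else
          if coinc_event = '1' then
            coinc_pending <= '1';
          end if;
          if fifth_event = '1' then
            fifth_pending <= '1';
          end if;
        end if;
      end if;
    end if;
  end process;
end architecture;
```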
  3. I uploaded two videos (the latter one is still uploading as I write this) demonstrating the Megademo running on the FPGA system, first using the TMS99105 CPU and then with my FPGA CPU core. The FPGA CPU video goes through the demo twice: once running with a lot of wait states, bringing execution speed close to the original TI-99/4A, and then running at zero wait states, or around 23x the CPU speed. Here is a link to the TMS99105 version. This is a special compilation of the Megademo. There are no actual code changes, but I edited the controller.a99 file so that the video starts with the multicolour demo. I had problems running this phase of the demo for obvious reasons - the multicolour mode was not implemented... What is new here is that I have now added to my TMS9918 code the ability to detect sprite coincidence, so the demo no longer gets stuck in the splitscreen3_demo.a99 phase. Timing behaviour is different though, as can be seen in the video. I guess one of the next challenges for me would be to make a new demo phase that takes advantage of the increased processing speed.
  4. Thanks @asmusr. No problem with the comment being old - this happens to the best of us. And thanks for putting in the time and energy to create such an amazing demo in the first place! I probably did not write very clearly - I have had the scanline buffer in there from day 1 of my TMS9918 implementation. You're right that it is needed for sprites, but even more importantly it is required for scanline doubling for VGA output. In fact my scanline buffer has double the horizontal resolution - when TI graphics output is written to it, each pixel is written twice to give a 512x192 resolution, which is scanline doubled to 512x384 and fitted into a 640x480 VGA screen. The y-coordinate off-by-one feature I incorporated when I built the sprite engine... already in 2016.
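The arithmetic behind fitting the doubled picture into the VGA frame can be written down as a small package. This is a sketch only: the package and function names are hypothetical, and centring the 512x384 window in the 640x480 frame is an assumption (the post only says the picture is "fitted into" the VGA screen).

```vhdl
-- Hypothetical helper package: where a 512x384 window sits in a 640x480 frame.
package vga_window_sketch is
  constant VGA_W : natural := 640;
  constant VGA_H : natural := 480;
  constant WIN_W : natural := 512;  -- 256 TI pixels, each written twice
  constant WIN_H : natural := 384;  -- 192 TI lines, each shown twice
  constant X0    : natural := (VGA_W - WIN_W) / 2;  -- 64 pixels of left border
  constant Y0    : natural := (VGA_H - WIN_H) / 2;  -- 48 lines of top border

  -- True when the VGA position (x, y) lies inside the TI picture.
  function in_window(x, y : natural) return boolean;
  -- TI line (0..191) shown on VGA line y; only call when in_window is true.
  function ti_line(y : natural) return natural;
end package;

package body vga_window_sketch is
  function in_window(x, y : natural) return boolean is
  begin
    return x >= X0 and x < X0 + WIN_W and y >= Y0 and y < Y0 + WIN_H;
  end function;

  function ti_line(y : natural) return natural is
  begin
    return (y - Y0) / 2;  -- two VGA lines per TI line
  end function;
end package body;
```

With these numbers, VGA lines 48 and 49 both show TI line 0, and VGA lines 430 and 431 both show TI line 191.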
  5. I had a little time today to tinker with my TI-99/4A FPGA clone. Strictly speaking I was working on the TMS99105 version, but since this design shares most of the VHDL code with the full FPGA implementation, I can work on either as long as I am not working on the FPGA CPU core itself. Anyway, what I have decided to try to do is to improve compatibility and fix all the bugs I know about. The Megademo has been very useful in this regard: I found two bugs in the design, both in my TMS9918 implementation. I had already once decided not to complete my TMS9918 VDP since Matthew's F18A is already a feature complete version (with many additional features, as I am sure people here know), but I had to revisit that decision: as long as the FPGA system is not correctly running all the software I throw at it, I cannot know if something not working is due to the CPU, the VDP or something else. One of the missing features is the multicolor mode (providing 64x48 resolution with 15 colors per pixel). The rotozoom portion of the demo uses this mode, and was displaying garbage. But no more - now it is fixed. I remain amazed at how very small changes to the VHDL code create new features. Adding the multicolor mode amounted to only minor changes to pattern fetch address generation and the pixel shifter. Overall perhaps 10 lines of code were added/changed. And now the rotozoom runs - and it runs fast on the TMS99105! Overall the whole demo runs very nicely, that is - until it encounters the "sine wave split screen", where the system just halts. I have now found the Megademo source code and finally located the root cause of the halt: my VDP implementation does not yet generate the COINC status. I had completely forgotten that I did not build it. The COINC flag is set whenever two sprites have overlapping pixels and reset every time the VDP status register is read. 
On real TMS9918 silicon the generation of this flag is easy, since the chip has dedicated hardware to support drawing four sprites per scanline and it is easy to set the flag if any two sprite shifters are active simultaneously (or this is how I assume it works). My TMS9918 implementation is different: I have only one sprite generator, which renders to a scanline buffer. The hardware is run in a loop and can render all 32 sprites on a single scanline. In fact I think I could support many more sprites, probably at least 128 per scanline. Here is the problem: because the hardware is reused, it needs special additional support to detect sprite overlap. Currently when it is writing pixels to the line buffer it is doing just that - writing. It does not care what is already in the buffer; the pixels overlaid by sprites just get overwritten. Sprites are rendered from lowest priority to highest, so that the highest priority sprites are rendered last and will be visible on top of any other sprites or characters. Alas, this "writing only" approach cannot work when you need to know whether a pixel has been written to the line buffer by character data or by a previous sprite. So I will need to revise the state machine so that there is an additional per-pixel flag memory that is read when a sprite is rendered, to detect the scenario where there are two sprite writes to the same pixel. This in turn means that the state machine will now need additional states to perform the reads prior to the writes. According to the TMS9918 data sheet the COINC flag is set even for transparent sprites, so the flags will need to be read and written even if the actual pixels are not visible. What a pain, and it has to wait for another day and more time. Interestingly, the source code of the Megademo (splitscreen3_demo.a99) has bogus comments - the comments lead one to believe that scanline position detection is done with the 5th sprite flag in the VDP, but in reality the code is reading the COINC flag. 
I already support the 5th sprite flag, so that would not have been a problem. I initially thought the Megademo freezing on my FPGA hardware was due to something outside the VDP, but now I know that the CPU is polling the COINC flag in a busy loop. As the flag never gets raised in my FPGA design, the demo just freezes...
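The per-pixel flag idea described above can be sketched as below. It is a simplification, not the planned implementation: the names are hypothetical, and the flags are kept in 256 registers so that they can be tested and set in one clock and cleared in one clock, whereas a block RAM flag memory would need the extra read states (and a clearing scheme) mentioned in the post. The `px_valid` input is assumed to be asserted for every pattern bit that is set, regardless of sprite colour, so that transparent sprites also count.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical sketch: detect two sprite writes to the same pixel of a scanline.
entity coinc_detect_sketch is
  port (
    clk         : in  std_logic;
    line_start  : in  std_logic;            -- pulse: start rendering a new scanline
    px_valid    : in  std_logic;            -- sprite pattern bit is set at px_x
    px_x        : in  unsigned(7 downto 0); -- pixel column, 0..255
    status_read : in  std_logic;            -- pulse: CPU read the status register
    coinc       : out std_logic
  );
end entity;

architecture rtl of coinc_detect_sketch is
  signal covered : std_logic_vector(255 downto 0) := (others => '0');
  signal coinc_i : std_logic := '0';
begin
  coinc <= coinc_i;

  process(clk) is
  begin
    if rising_edge(clk) then
      if status_read = '1' then
        coinc_i <= '0';
      end if;
      if line_start = '1' then
        covered <= (others => '0');          -- forget the previous scanline
      elsif px_valid = '1' then
        if covered(to_integer(px_x)) = '1' then
          coinc_i <= '1';                    -- an earlier sprite already hit this pixel
        end if;
        covered(to_integer(px_x)) <= '1';
      end if;
    end if;
  end process;
end architecture;
```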
  6. Yes, this is the exact same board. The manufacturer is QMTECH, and I did find all the relevant documentation. SDRAM access is much more complex. This is actually still the phase I am at - after a 6 month pause in FPGA work I am still trying to remember where I left off. I have integrated the TMS9900 and TMS9902 cores on this board, and I integrated an SDRAM controller there too, but if I remember correctly I was still trying to get that working. I will e-mail you the manuals so you can take a look at exactly what is on board, but there are three chips on the red base daughter board: a CY7C68013 USB chip (which I have not used), an ADV7123 VGA DAC (24-bits) and a CP2102 USB to serial port. The last one I have used successfully with your TMS9902 core. I don't remember if I have used the VGA DAC yet; there are a bunch of sample projects, one of which uses VGA and another SDRAM. To make the story short - I found the boards so useful and affordable that I have 3 of the XC6SLX16 FPGA boards and two daughter boards. I don't recall if I bought them from the same seller, probably not.
  7. After a long while an update to my TMS99105 project too - I ported some features of my second version of this project (using my own TMS9900 CPU design) back to the TMS99105 CPU version. That enabled me to run the very cool megademo for the first time with the TMS99105 CPU. More explanation at hackaday: https://hackaday.io/project/15430-rc201699-ti-994a-clone-using-tms99105-cpu/log/153444-running-megademo-on-the-tms99105
  8. Yesterday and today I fixed four bugs in total in the FPGA CPU. These are documented in two blog postings; here is a link to the latter one. Three bugs in flag handling and one major bug in the hardware divider were fixed. https://hackaday.io/project/20826-tms9900-compatible-cpu-core-in-vhdl/log/153286-two-more-fpga-cpu-bug-fixes
  9. Wow, it's been a long time without updates! The TI Treff is ongoing in Germany. I did not have the time to go there, but inspired by the event - and also by the fact that after my house move my work room is beginning to be in good shape - I booted both versions of my FPGA TI-99/4A. I was happy to see that both FPGA boards still work. One had the TMS99105 daughterboard plugged in, while the other was running my VHDL TMS9900 core. I spent some time working on the latter, fixing a couple of reset related problems - and discovering a bug. Apparently, even from BASIC, my CPU claims that 1*-1 equals 1. Well, whatever, negative numbers are just a nuisance. https://hackaday.io/project/20826-tms9900-compatible-cpu-core-in-vhdl
  10. Sorry for the extended silence. My time during the summer has been taken mainly by a house move. It's been a long project, but it is pretty much done now. I am hoping I will start to have some time to work on the TI-99/4A soonish. Having said that, I haven't wired up my development computers yet, let alone the TI, but I will get there.
  11. I do now have the MIST system, and I have ported the CPU to the Altera toolchain, so I will probably make a port for the MIST myself. The code is open source, so obviously it is available to anyone...
  12. For what it's worth, I realised that the ET-PEB has now been covered twice in the RCR Podcast (Retrocomputing Roundtable), in episodes 168 and 172. http://rcrpodcast.com The first one (episode 168) I discovered just by listening - that was nice. The most recent episode 172 talks about the MIDI experiment I did, but I have not yet listened to what was discussed in there, hopefully something nice. I got a notification about it through Twitter. I discovered the RCR podcast probably around episode 70 or 80, and have been listening to it ever since. I can wholeheartedly recommend this podcast to anyone interested in retrocomputers, and I bet most of you here are already very much aware of its existence.
  13. I'll also note in this thread that if you're interested in FPGA based TMS9900 stuff, there is an update in pnr's TMS9902 VHDL thread. I've set up a repository with a little explanation at: https://github.com/Speccery/breadboard
  14. A little off topic - discussing the "breadboard" design which incorporates my TMS9900 CPU core with pnr's TMS9902 UART and Stuart's Cortex Basic port. It is a speedy system, with the CPU running at 100 MHz. The Cortex Basic is a fast Basic to begin with on the TMS9995, which was the original CPU for it. The design is inspired by and based on Stuart's very nice breadboard systems. I updated the git repository and wrote a somewhat more informative README file - with pictures: https://github.com/Speccery/breadboard So now the breadboard design is ported to three different FPGA boards; more information is provided in the README file, which you can see immediately by clicking the link. They all perform identically, as only on-chip resources (memory) are used. What is very interesting is that on the Altera EP4CE22 the system only takes 9% of the LUTs. So now I really want to shrink the usage of on-chip memory, and put in many CPU cores. There are two ways to use external memory: the obvious thing is to use the SDRAM, but that does not work on the Mini board as there is no SDRAM. However, that board has a whopping 8 megabytes of SPI flash. So my plan is to implement a simple interface with which the TMS9900 can run code directly off the SPI ROM. Sure, that will be pretty slow, but it will be interesting. On the Pepino FPGA board's site one of the example projects does this (it is a Mac clone). The Mac implementation runs 68000 code off SPI, but on the Pepino the SPI is wired in quad mode, so it actually achieves very high transfer rates in bursts. In QPI mode the data transfers are serial but 4 bits at a time, running at something like 50MHz. Anyway, all the more reason to add caching… And to find a way to run the Cortex Basic with minimum RAM. I haven't had time to work on it.
  15. Quite a few of the micro controller's pins are already exposed on pins. Might add up to 8 bits even... They could be used as a parallel port too, but that of course would require firmware support.
  16. This is slightly off-topic: I've recently worked with pnr on a breadboard project. This is effectively a version of Stuart's TMS9900/TMS9995/TMS99010 breadboard system on an FPGA. pnr created a simple top level VHDL module to tie together his TMS9902 UART and my TMS9900 core. I have spent some time debugging and performance testing this system. I have created a GitHub repository with the project files at https://github.com/Speccery/breadboard I have created two versions of this project there, one for the "XC6SLX9 Mini board" and one for the "XC6SLX16 board". These are both low-cost FPGA boards available from multiple eBay sellers, i.e. coming from China. In addition to the FPGA board one needs a Xilinx programming cable. My intention is to port this design over to other FPGA boards I have, including boards with Altera FPGAs. This preparation step should make it pretty easy to port the whole TI-99/4A design to the MIST as well. I acquired a MIST board a while ago. On the board with the larger XC6SLX16 chip (example eBay link to a bundle with daughter board) I ran some performance benchmarks of my TMS9900 core. I have in the past run this at 100MHz, but I hadn't really tested how fast it could run on a Spartan 6 chip. The simple breadboard design is a good testbed for benchmarking the TMS9900 core, as the UART is the only interface to the external world and it is simple to modify the bitrate divider to test different clock frequencies. I have only done some brief tests and I have not done any critical path optimisation, but when using the internal memories of the FPGA I've now run the TMS9900 core and the whole breadboard design at 177MHz. At this clock rate the performance is about 55 times the speed of the zero wait state TMS9900 3MHz system as documented by Stuart. I haven't done many tests, so I don't know if the design operates fully correctly, but it does run the Cortex Basic interpreter seemingly correctly. 
Some of this work is documented at pnr's TMS9902 VHDL thread in this same forum.
  17. That's nice! I took a quick look at the MISTer board, and what I can tell is that it's a very high performance system with dual ARM cores. I'll ask a simple question to save some time - the MISTer system is effectively a DE0 Nano board with an additional board, is that right? Sorry I did not do much googling... Any recommendations where to get the boards and what would be the cost?
  18. It's very encouraging that we now have more people creating interesting projects in VHDL for the TMS9900 systems, thanks for all the efforts pnr! I am interested in computer architecture in general, and having a small TMS9900 based self-contained system for FPGA will indeed be a great platform for further development. This design at least is very well understood by us. Once we get the breadboard project running as you described above, I am interested in how performance can be pushed forward. I know this is perhaps not an interesting direction for everyone, but I am interested in creating the fastest TMS9900 system we can make. This is really for personal interest, as there is no existing software that could benefit from much higher performance. I'm very keen to try out how many TMS9900 cores I can cram into the FPGA. The dual ported RAMs of the FPGAs allow sharing internal ROM memories between two cores without wasting memory blocks, so multicore implementations can be interesting in many ways. With the serial port as the channel to the outside world, the rest of the logic can be clocked at higher frequencies than in my TI-99/4A design. It will be interesting to see how high the clock frequency can be for the CPU core. An additional direction I would like to try is to implement a cache memory for my TMS9900 core using two memory blocks (one for data, one for address tags, for a simple direct mapped cache structure). This again would help in a multiprocessor system, as each core could have its own local cache, and they could interface to an external memory over a shared bus. With regards to pnr's TMS9902 design, a practical extension could be the addition of a receive FIFO, for example a 16550 style 16 byte FIFO. This could probably be done in a transparent way so that software would not need changes. 
Having said that I don't know if the bandwidth of TI-99/4A serial communication software is constrained by lost characters, or perhaps other things, or even if bandwidth is/has been an issue.
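A 16-entry receive FIFO of the kind suggested above could look roughly like this. It is a generic sketch, not an extension of pnr's TMS9902 core: the entity and port names are hypothetical, and silently dropping a byte that arrives when the FIFO is full is an assumption (a real design would more likely raise an overrun flag).

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical 16-byte receive FIFO (illustration only).
entity rx_fifo_sketch is
  port (
    clk   : in  std_logic;
    reset : in  std_logic;
    push  : in  std_logic;                     -- pulse: receiver has a new byte
    din   : in  std_logic_vector(7 downto 0);
    pop   : in  std_logic;                     -- pulse: CPU consumed the head byte
    dout  : out std_logic_vector(7 downto 0);  -- oldest byte; meaningful when not empty
    empty : out std_logic;
    full  : out std_logic
  );
end entity;

architecture rtl of rx_fifo_sketch is
  type mem_t is array (0 to 15) of std_logic_vector(7 downto 0);
  signal mem    : mem_t := (others => (others => '0'));
  signal wr_ptr : unsigned(3 downto 0) := (others => '0');
  signal rd_ptr : unsigned(3 downto 0) := (others => '0');
  signal count  : unsigned(4 downto 0) := (others => '0');  -- 0..16
begin
  dout  <= mem(to_integer(rd_ptr));
  empty <= '1' when count = 0 else '0';
  full  <= '1' when count = 16 else '0';

  process(clk) is
    variable do_push, do_pop : boolean;
  begin
    if rising_edge(clk) then
      if reset = '1' then
        wr_ptr <= (others => '0');
        rd_ptr <= (others => '0');
        count  <= (others => '0');
      else
        do_push := push = '1' and count /= 16;  -- drop the byte if full (assumed)
        do_pop  := pop = '1' and count /= 0;
        if do_push then
          mem(to_integer(wr_ptr)) <= din;
          wr_ptr <= wr_ptr + 1;                 -- 4-bit pointer wraps at 16
        end if;
        if do_pop then
          rd_ptr <= rd_ptr + 1;
        end if;
        if do_push and not do_pop then
          count <= count + 1;
        elsif do_pop and not do_push then
          count <= count - 1;
        end if;
      end if;
    end if;
  end process;
end architecture;
```

To stay transparent to software, `pop` would presumably be driven by whatever action already tells the UART that the receive buffer has been read, and "receive buffer full" would map to `not empty`. Both of those mappings are assumptions.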
  19. pnr's TMS9902 VHDL core works nicely! I integrated it into my TMS9995 breadboard design - I built this two years ago or so before I dived into the TI-99/4A hobby. https://youtu.be/IGBE18uBV_o In the video you can see the TMS9902 working nicely with an actual TMS9995 processor chip. The breadboard is my version of Stuart's TMS9995 breadboard as documented very nicely by him at: http://www.stuartconner.me.uk/tms9995_breadboard/tms9995_breadboard.htm In my breadboard I have a small Xilinx Spartan 6 FPGA board connected to the breadboard and delivering a lot of stuff, now including the UART. This project is documented at: https://github.com/Speccery/fpga99
  20. I decided to write a quick comment as I've been silent for a while - things have been busy at work and I have been traveling, so unfortunately I haven't been able to work on the project. I am about to board a flight to Asia; once I get back I hope to make some progress on the ET-PEB. The next step will be to build a few more prototypes so that I can share them with some members of the community to do more testing...
  21. To illustrate, here are a few pieces of code from my tms9900.vhd. All of this occurs inside the same >>if rising edge<< block.

        -- process declaration, line 347
        -- here a few variables are declared
        process(clk, reset) is
          variable offset : std_logic_vector(15 downto 0);
          variable take_branch : boolean;
          variable dec_shift_count : boolean := False;

        -- a couple hundred lines omitted
        -- from line 636 onwards, this is the giant state machine
        -- sorry about the indentation, make your window wide...
          when do_branch =>
            -- do branching, we need to sign extend ir(7 downto 0) and add it to PC and continue.
            cpu_state <= do_fetch; -- may be overwritten with do_stuck
            take_branch := False;
            case ir(11 downto 8) is
              when "0000" => take_branch := True; -- JMP
              when "0001" => if ST(14)='0' and ST(13)='0' then take_branch := True; end if; -- JLT
              when "0010" => if ST(15)='0' or ST(13)='1' then take_branch := True; end if; -- JLE
              when "0011" => if ST(13)='1' then take_branch := True; end if; -- JEQ
              when "0100" => if ST(15)='1' or ST(13)='1' then take_branch := True; end if; -- JHE
              when "0101" => if ST(14)='1' then take_branch := True; end if; -- JGT
              when "0110" => if ST(13)='0' then take_branch := True; end if; -- JNE
              when "0111" => if ST(12)='0' then take_branch := True; end if; -- JNC
              when "1000" => if ST(12)='1' then take_branch := True; end if; -- JOC (on carry)
              when "1001" => if ST(11)='0' then take_branch := True; end if; -- JNO (no overflow)
              when "1010" => if ST(15)='0' and ST(13)='0' then take_branch := True; end if; -- JL
              when "1011" => if ST(15)='1' and ST(13)='0' then take_branch := True; end if; -- JH
              when "1100" => if ST(10)='1' then take_branch := True; end if; -- JOP (odd parity)
              when others => cpu_state <= do_stuck;
            end case;
            if take_branch then
              offset := ir(7) & ir(7) & ir(7) & ir(7) & ir(7) & ir(7) & ir(7) & ir(7 downto 0) & '0';
              pc <= std_logic_vector(unsigned(offset) + unsigned(pc));
            end if;

      So basically two variables are used above.
      offset is used simply to make the code more readable. take_branch is calculated with its own case clause, to address all the different cases (hopefully). As you can see, you could write a logic equation to calculate take_branch in one go, but it would be one pretty messy equation, whereas the code above is readable (IMHO). Note that the variable take_branch is set and then immediately used, so all of this occurs during the same clock cycle (rising edge); its value does not need to be preserved any further. Sorry about the many comments - I wrote the first two on my phone...
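As a side note, the sign extension done with the repeated `ir(7)` concatenation can also be expressed with `resize` and `shift_left` from `ieee.numeric_std`. The sketch below is an alternative formulation, not code from tms9900.vhd; the package and function names are hypothetical.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical helper: compute the jump target from PC and the instruction word.
package branch_sketch is
  function branch_target(pc, ir : std_logic_vector(15 downto 0))
    return std_logic_vector;
end package;

package body branch_sketch is
  function branch_target(pc, ir : std_logic_vector(15 downto 0))
    return std_logic_vector is
    variable disp : signed(15 downto 0);
  begin
    -- Sign-extend the 8-bit word displacement to 16 bits, then multiply by
    -- two to turn it into a byte offset (same bits as the concatenation).
    disp := shift_left(resize(signed(ir(7 downto 0)), 16), 1);
    -- 16-bit addition wraps around, exactly like the unsigned version.
    return std_logic_vector(signed(pc) + disp);
  end function;
end package body;
```

Inside the state machine this would be used as `pc <= branch_target(pc, ir);`, assuming `pc` and `ir` are 16-bit `std_logic_vector` signals as the original snippet suggests.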
  22. And when I spoke about scope above, I meant scope in the sense of time, not exactly scope in the code. I would have written this so that all code touching/assigning v would be inside the >>if rising edge<<, which reduces it to syntactic sugar. In a simple scenario like this using the variable does not add much, but I've used them when the logic is more complex (if clauses etc.) but still occurs at a given time, not across changes in clock.
  23. Version B looks bogus to me, so I concur with Quartus. I have used variables only in limited ways: the way I have used them is just to clean code, in other words in a way where the same logic could be written without the variable in the first place, but using variables allows one to break down a complex statement/conditional thing into multiple easy to understand lines of code. If you follow this line of thought - variables are just syntactic sugar - it is clear that version B is bogus, since the scope where v is incremented is different from where the value of v is retrieved. In other words, v is not stateless. Note that I have not read textbooks about this topic almost at all, this is just my simple line of thought.
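To make the "syntactic sugar" view concrete, here is a small hypothetical example (it is not the Version A/B code under discussion, which is not reproduced here). The variable `t` is always assigned before it is read within the same clock edge, so it carries no state of its own, and the two processes describe the same registered logic.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical example: a variable used purely to name an intermediate value.
entity var_sugar_sketch is
  port (
    clk     : in  std_logic;
    a, b, c : in  std_logic;
    q_var   : out std_logic;  -- computed with a helper variable
    q_expr  : out std_logic   -- computed with one expression
  );
end entity;

architecture rtl of var_sugar_sketch is
begin
  with_variable : process(clk) is
    variable t : std_logic;
  begin
    if rising_edge(clk) then
      t := a and b;      -- assigned first...
      q_var <= t or c;   -- ...then read on the same clock edge
    end if;
  end process;

  without_variable : process(clk) is
  begin
    if rising_edge(clk) then
      q_expr <= (a and b) or c;
    end if;
  end process;
end architecture;
```

If `t` were instead read before being assigned, its old value would have to survive from the previous clock edge and it would stop being mere shorthand. That is the situation the post calls "not stateless".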
  24. To continue the discussion about simulation test benches, I concur that it definitely makes sense to create one. I keep forgetting that ISE has a facility for creating a template for that, thanks matthew180 for reminding me of that. I had used it, then I coded some manually, then I remembered again that the tool exists... For pnr's benefit it's perhaps worth pointing out that even a very simple test bench can give you a lot of information. Normally you don't need much more stimulus in the test bench than a clock and a reset to get started. With the TMS9902 you probably also want to include a CRU write method, so that you can initialize the design and try to send a byte.
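A test bench with nothing but a clock and a reset can be as small as the sketch below. The names and the 10 ns clock period are assumptions, and a simple counter stands in for the design under test so that the example is self-contained; in practice the counter process would be replaced by an instance of the real design (for the TMS9902, plus a procedure that performs CRU writes).

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical minimal test bench: clock, reset and one sanity check.
entity tb_sketch is
end entity;

architecture sim of tb_sketch is
  constant CLK_PERIOD : time := 10 ns;  -- 100 MHz (assumed)
  signal clk   : std_logic := '0';
  signal reset : std_logic := '1';
  signal done  : boolean   := false;
  signal count : unsigned(7 downto 0) := (others => '0');  -- stand-in for the DUT
begin
  -- Free-running clock that stops once the test is done, so the simulation ends.
  clk <= not clk after CLK_PERIOD / 2 when not done else '0';

  stimulus : process is
  begin
    wait for 5 * CLK_PERIOD;   -- hold reset for a few clocks
    reset <= '0';
    wait for 20 * CLK_PERIOD;  -- let the design run
    assert count /= 0
      report "counter did not advance after reset" severity error;
    done <= true;
    wait;                      -- suspend forever
  end process;

  -- Stand-in for the design under test: counts clocks while out of reset.
  dut_standin : process(clk) is
  begin
    if rising_edge(clk) then
      if reset = '1' then
        count <= (others => '0');
      else
        count <= count + 1;
      end if;
    end if;
  end process;
end architecture;
```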
  25. I had no time yesterday to do anything, and I have a workshop this weekend; next weekend I am heading to Asia, so not much time at home. I'll do what I always do and take a few FPGA boards with me. I'll try to do some integration of your code - that would be fun. The EPM7160 seems to be a very expensive chip. I took a look last week, perhaps I was looking in the wrong place, but paying nearly 200 per chip is just too much. Anyway, with the experience I have now gained with the XC95144XL CPLD I will definitely lean towards real FPGAs in the future. I don't have that much time for the hobby, so when I have the time I'd rather not fight against the CPLD routing...