
speccery

Members
  • Content Count: 544

Everything posted by speccery

  1. I have the MX, so I know its specs, although I am saving it for later. The current version has problems with its HDMI connector. I did not know that the BlackIce II is history; that is a shame, as I like that board. Thanks for letting me know; I assumed they would still be available. What you’ve made for the BlackIce II sounds similar to what I did with the Pipistrello 64M memory expansion for the 4A. Have you published your design?
  2. Thanks a lot for these comments and questions! These are really good ones. Regarding the choice of FPGA board: for this second design I really wanted to try out the open source toolchain, and since it only supports Lattice chips, I am stuck with those. However, this has an important implication: since the ICE40HX chips are primitive by FPGA standards, the design work I have done now includes almost no platform-dependent code. As an example, the multiplier is now implemented in plain Verilog. That means this second version of the design is very portable, and I plan to implement it on multiple FPGA boards. I now have a whole bunch of FPGA boards (too many, in fact; I don’t know how many), including the MiST and MiSTer. Of these, I have not even powered on the MiSTer yet... Even though I am working with FPGAs, which by definition are new and not retro, I think it is still somehow in the spirit of retro computers to try to use a solution with limited resources. The MiSTer is very cool, but it has huge resources compared to what a complete and expanded TI-99/4A needs. I admit that this point is a bit philosophical, but this work is more interesting for me if I learn and get challenged along the way, rather than just porting the same stuff over and over again. And yes, I want to make this standalone. My previous design can already run in standalone mode, but it required help from a microcontroller board. I have implemented a version of file system support that stores TI files in a FAT file system. Since my last message I have added GROM support, but there is currently a bug which only shows up when running on hardware; in simulation it works perfectly as far as I can tell. So limited progress, but still some progress. Finally, in the spirit of going simple, I removed the cache from this design, and I also plan to make a cycle-accurate version of the TMS9900 CPU.
My original aspiration was to make my “dream TI”, as in a TI-99/4A that can go very fast, and I already got reasonably close to that with 30x the original speed and beyond. Having done that, I realize that I also want to go slow, close to the original speed.
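Since the post mentions that the multiplier is now written in plain Verilog instead of relying on a platform-specific block, here is a minimal behavioural sketch of one common way to build such a multiplier: a sequential shift-and-add design that handles one multiplier bit per step. It is written in Python, not Verilog, and it is not the author's code; it assumes TMS9900 MPY semantics (unsigned 16 x 16 bits giving a 32-bit product, returned as a high and a low word because MPY writes the result to two consecutive registers). A model like this can serve as a golden reference when simulating the hardware version.

```python
def mpy_shift_add(multiplicand: int, multiplier: int) -> tuple[int, int]:
    """Unsigned 16 x 16 -> 32-bit multiply, one multiplier bit per step.

    Mirrors a sequential hardware multiplier: a {high, low} register starts
    as {0, multiplier}; each step conditionally adds the multiplicand to the
    high half and then shifts the whole register right by one bit.
    Returns (high word, low word), like MPY's two destination registers.
    """
    m = multiplicand & 0xFFFF
    high = 0                      # upper half, may briefly hold a 17th (carry) bit
    low = multiplier & 0xFFFF     # lower half, initially the multiplier
    for _ in range(16):           # 16 steps, e.g. one per clock in hardware
        if low & 1:
            high += m
        # shift {carry, high, low} right by one bit
        low = (low >> 1) | ((high & 1) << 15)
        high >>= 1
    return high, low
```

For example, `mpy_shift_add(0xFFFF, 0xFFFF)` gives `(0xFFFE, 0x0001)`, i.e. the 32-bit product >FFFE0001. In hardware the same loop becomes a 16-cycle state machine with one adder, which is small and needs no vendor multiplier primitives.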
  3. Yes, I am aware of the MX, and in fact I have one. I just started working with the BlackIce II for two reasons: I have several of them, and working with SRAM is easier than with SDRAM. Having said that, I have ported an earlier design of mine to a cheap Xilinx board with SDRAM. That design uses SDRAM but consists of only the TMS9900 and a UART; it is my test bed for SDRAM work.
  4. Again, it's been a long while since I wrote anything here... I thought I'd write a short update about my work on my new FPGA version of the TI-99/4A. This is work in progress. In short, I'm in the process of creating a version for the BlackIce-II FPGA board. This is an affordable board (I hope it is still available) with a fairly small ICE40HX4K FPGA chip, 512K of RAM and a fairly powerful microcontroller. The board is supported by the open source Icestorm toolchain, and I have used that for development work. This has been an interesting adventure so far. Icestorm is a very nice and compact toolchain compared to the bloated Xilinx and Altera tools. However, Icestorm only supports the Verilog hardware description language, so I had to learn Verilog and port my existing VHDL code base to Verilog. Most of the work so far (and I have put a fair number of hours into this already) has gone into porting and modifying the code to work on this fairly limited platform, changing the language to Verilog and designing around the limitations. In the context of recreating the TI-99/4A, the biggest drawback is that the small FPGA only has 16K of internal RAM (compared to 64K on the chip I used for the VHDL version). Also, the internal RAM is a lot less sophisticated. As a result, I have had to redesign the system architecture quite a bit, so that the external 512K RAM chip is now used for code, data and video memory, as opposed to using on-chip RAM for video memory as in the past. This may seem like a small change, and in a way it is, but in practice I had to design a much more involved memory controller which can arbitrate between CPU, VDP and bootloader accesses in real time. Although I have converted my whole code base to Verilog, currently only a portion of it has been fully ported and works.
Namely, I now have a system that has the TMS9900 CPU, the TMS9918 VDP with VGA output, a memory controller driving the external RAM, the EVMBUG debugger in an 8K ROM block, and finally pnr's TMS9902 UART. The ICE40HX4K chip is only supposed to have 4K LUTs (look-up tables), but in practice the silicon is the same as the ICE40HX8K with 7680 LUTs, and the Icestorm toolchain enables access to all of them. That is good, since the design already uses 4421 LUTs. The design runs at 25MHz, which is the VGA pixel clock. I am hoping I can fit the whole thing into this FPGA. As the chip's resources get close to full utilization, routing probably becomes impossible, so I cannot add too much more. I don't know yet where the limit is. One consequence of having the VDP use external RAM is that it is now possible to map video RAM directly into the CPU's address space, and that is what I have done during debugging (I'm not yet using the TI-99/4A ROMs, just EVMBUG). There are now two ways to access VRAM: through the standard indirect registers, which is obviously necessary for compatibility, or by mapping it directly into the CPU address space. Direct access to VRAM vastly increases the bandwidth and makes it very easy to use, but of course no existing software supports this... Next I need to add GROM support, which should be easy. When that is in place I should be able to boot this thing with the TI-99/4A ROMs. I still need to figure out how to split the 512K RAM between different functions, probably something like this:
      • 8K system ROM (0000..1FFF)
      • 8K disk support (4000..5FFF)
      • 256K paged cartridge space (6000..7FFF)
      • 64K GROM space (24K used by the console GROMs; actually 18K, but powers of 2 are easier)
      • 64K VRAM space
      • 32K normal RAM expansion
That leaves 80K still to be allocated to something. If I can fit my memory paging unit in, it would probably make sense to be able to configure either 256K of AMS memory space or 256K of cartridge space.
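The memory budget above can be checked with a few lines of code. This is a minimal sketch with a hypothetical layout: the region names, their order and their base addresses in the SRAM are assumptions (the post only lists sizes and CPU address ranges), and the largest regions are placed first so every region stays aligned to its own size. The `cartridge_address` helper shows how an 8K cartridge page seen at >6000..>7FFF could translate to an SRAM address; how the page number gets selected is not specified in the post.

```python
KB = 1024
EXT_RAM_SIZE = 512 * KB   # external SRAM on the BlackIce-II
PAGE_SIZE = 8 * KB        # cartridge window >6000..>7FFF is 8K

# Hypothetical layout of the regions listed above. Largest regions go first so
# each region's base address is a multiple of its own size.
REGIONS = [
    ("cartridge", 256 * KB),  # 32 pages of 8K, seen through >6000..>7FFF
    ("grom", 64 * KB),
    ("vram", 64 * KB),
    ("ram32k", 32 * KB),      # normal 32K memory expansion
    ("system_rom", 8 * KB),   # >0000..>1FFF
    ("disk_dsr", 8 * KB),     # >4000..>5FFF
]


def build_layout(regions, total=EXT_RAM_SIZE):
    """Assign base addresses in order; return (layout, bytes left over)."""
    layout = {}
    base = 0
    for name, size in regions:
        if base % size:
            raise ValueError(f"{name} would not be aligned to its size")
        layout[name] = (base, size)
        base += size
    if base > total:
        raise ValueError("regions do not fit in external RAM")
    return layout, total - base


def cartridge_address(layout, page, cpu_addr):
    """Translate a CPU access in the cartridge window to an SRAM address."""
    base, size = layout["cartridge"]
    if not 0x6000 <= cpu_addr <= 0x7FFF:
        raise ValueError("not in the cartridge window")
    if not 0 <= page < size // PAGE_SIZE:
        raise ValueError("no such cartridge page")
    return base + page * PAGE_SIZE + (cpu_addr - 0x6000)


layout, free = build_layout(REGIONS)
print(free // KB, "K unallocated")  # prints: 80 K unallocated
```

The sum is 256 + 64 + 64 + 32 + 8 + 8 = 432K, leaving 80K, which matches the figure in the post. With this particular ordering the free space is one contiguous block at the top of the SRAM (>6C000..>7FFFF).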
  5. Very nice, good job adamantyr! I am hoping I'll find some time to hack something together too. This is such a delight.
  6. Sorry for the long delay in answering. I have not tried synthesising the plain vanilla TMS9900 core without any peripherals. Looking at one of the breadboard project targets on my GitHub account, https://github.com/Speccery/breadboard/blob/master/bb-lx9/work/system_summary.html, you can see that a minimal TMS9900 system took 1690 Xilinx Spartan 6 slice LUTs, or 29% of the XC6SLX9 chip. This system includes the TMS9900 core, 32K RAM, 32K ROM, and PNR's TMS9902 UART, all implemented using the FPGA's built-in resources. In a way this number is comparable to the 1072 logic cells for the J1, as that system also includes a memory interface, some I/O and a UART. However, the Spartan 6 logic elements are much more advanced than what the Lattice ICE40HX provides, so the numbers 1690 vs 1072 are not directly comparable.
  7. The J1 implements its stacks as two huge shift registers, where each shift operation is a shift by the word length, typically 16 or 32 bits. The stacks are not deep: for the 16-bit version they are by default 15 entries deep for the data stack and 17 for the return stack. So these stacks are implemented in the FPGA logic fabric, not in block memory. This also means that there are no stack pointers, at least in the J1A version, so you don't know how deep into the stacks you are... The source code for the J1A is about 130 lines of Verilog. It is tiny. To my understanding it is inspired by the Novix NC4016. The J1 is an awesome project, and it comes with SwapForth already implemented. The basic J1 system for the BlackIce takes 1072 logic cells, so about one eighth of the total capacity. Not only do subroutine calls and pretty much every other instruction take one clock cycle, you can also combine certain operations, such as a subroutine return, with another instruction. Oh, and it runs at 48 MHz on the BlackIce-II. I did not try to optimize it. I think I also ported it to the Pepino board as a 32-bit version, along the lines of what James had done for the Xilinx Spartan 6. No, but I guess I could set it up. I was playing with the Icestorm tools and used the J1 as the core to play with. I did not do much; my work amounted to merging the top-level block from the BlackIce examples with the J1. I tested it with both place-and-route tools: arachne-pnr and the newer nextpnr. For the latter I had to study things a little to get the PLL configured properly (the input clock is 100MHz, which the PLL takes to 48MHz).
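The shift-register stack is easy to model in software. This minimal sketch (hypothetical class name, Python rather than Verilog) keeps a fixed number of cells and no stack pointer: a push moves every cell down one position and the oldest entry falls off the bottom, and a pop moves every cell up. What the real J1 puts into the vacated bottom cell on a pop is not covered in the post; here it is assumed to be zero. The model makes the consequence mentioned above concrete: nothing records how deep the stack is, so overflowing it silently loses data.

```python
class ShiftRegisterStack:
    """Fixed-depth stack without a stack pointer (hypothetical model).

    cells[0] is the top of stack. Every push or pop moves all cells by one
    position, the way a hardware shift register of word-wide stages would.
    """

    def __init__(self, depth: int, width: int = 16):
        self.mask = (1 << width) - 1
        self.cells = [0] * depth

    def push(self, value: int) -> None:
        # Shift everything down; the bottom cell's old contents are lost.
        self.cells = [value & self.mask] + self.cells[:-1]

    def pop(self) -> int:
        top = self.cells[0]
        # Shift everything up; the vacated bottom cell is refilled with zero
        # (an assumption, the real hardware may behave differently).
        self.cells = self.cells[1:] + [0]
        return top

    def peek(self) -> int:
        return self.cells[0]


# Depths for the 16-bit J1 as given in the post.
data_stack = ShiftRegisterStack(depth=15)
return_stack = ShiftRegisterStack(depth=17)
```

Pushing 16 values onto the 15-deep data stack and then popping 15 times returns the 15 newest values; the first value pushed is gone, with no error raised. In hardware the same structure costs one register per cell plus a multiplexer, but no address decoding or pointer arithmetic, which is part of why single-cycle calls and returns are possible.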
  8. Thanks, that is a good comment. I have also used Forth to bring up hardware - the last project of this type was porting the J1 CPU for the BlackIce-II FPGA board. The J1 is essentially a Forth CPU. I'm tempted to add a co-processor system to my TI-99/4A FPGA system with this CPU. It is very compact and very fast. You probably already know about it. This could be used for example to aid debugging, to monitor TI-99/4A signals etc. To make it truly useful it would need to have some capability to interface with the TI's peripherals. On the other hand my next goal is to make my system more accessible by porting it to other low-cost and widely available boards. I'm trying to resist feature creep until then. https://excamera.com/sphinx/fpga-j1.html
  9. It was great to be able to use PeteE's software; I found and fixed two bugs:
      1. Despite my "testing", there still was a bug in the treatment of ST1 (the A> flag) by the ABS instruction. The processing completely lacked the special case that ABS sets ST1 based on the source argument.
      2. SLA did not set the overflow flag properly if the shift count was greater than one.
Fixing bug 1 fixed extended Basic! So now I could return to what I was actually trying to implement: read access to the serial flash ROM chip. To my delight, the code I had written worked, and I was able to access the serial flash ROM from Basic with a series of CALL LOAD(...) and CALL PEEK(...) statements. I wish the Basic had direct support for hexadecimal numbers, both for input and output. The Oric Atmos Basic features these and also DOKE and DEEK operations, which work like POKE and PEEK but with 16-bit values... Anyway, with the bugs fixed, all the test cases pass now. It's great that this test is now also very easy to repeat whenever the CPU is updated.
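To make the two fixed bugs concrete, here is a minimal Python model of the status flags for ABS and SLA. It follows my reading of the TMS9900 data sheet (ST0 = L>, ST1 = A>, ST2 = EQ, ST3 = C, ST4 = OV, counting from the most significant bit): ABS compares the source, not the result, with zero, and sets OV only for >8000; SLA sets OV if the sign bit changes at any point during the shift. Leaving carry clear for ABS matches the TMS99105 behaviour the author describes elsewhere on this page. The function names are hypothetical, and this is not the author's VHDL.

```python
# TMS9900 status register bits (ST0 is the most significant bit)
L_GT = 0x8000  # ST0, logical greater than
A_GT = 0x4000  # ST1, arithmetic greater than
EQ = 0x2000    # ST2, equal
C = 0x1000     # ST3, carry
OV = 0x0800    # ST4, overflow


def compare_to_zero(value):
    """L>, A> and EQ as set when a 16-bit value is compared with zero."""
    st = 0
    if value != 0:
        st |= L_GT
    if 0 < value < 0x8000:
        st |= A_GT
    if value == 0:
        st |= EQ
    return st


def abs_op(src):
    src &= 0xFFFF
    st = compare_to_zero(src)          # flags come from the SOURCE operand
    if src == 0x8000:
        st |= OV                       # -32768 has no positive counterpart
    result = (-src) & 0xFFFF if src & 0x8000 else src
    return result, st                  # carry left clear (observed behaviour)


def sla_op(value, count):
    """Shift left arithmetic by 1..16 (count 0 = 'use R0' is not modeled)."""
    if not 1 <= count <= 16:
        raise ValueError("count must be 1..16")
    value &= 0xFFFF
    carry = False
    overflow = False
    for _ in range(count):
        sign_before = value & 0x8000
        carry = bool(sign_before)        # carry = last bit shifted out
        value = (value << 1) & 0xFFFF
        if (value & 0x8000) != sign_before:
            overflow = True              # sign changed at ANY step
    st = compare_to_zero(value)
    if carry:
        st |= C
    if overflow:
        st |= OV
    return value, st
```

`sla_op(0x4000, 2)` is the interesting case: the sign bit is 0 both before and after the shift, yet it flipped to 1 and back along the way, so comparing only the starting and final sign bits is not enough to get OV right. Likewise, `abs_op(0xFFFF)` returns 1 with A> clear, because A> reflects the negative source rather than the positive result.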
  10. Thanks, this is awesome, and it is extremely helpful to have an independent piece of verification code! I've not had time during the week to test this, but I am looking forward to doing so this evening. Hopefully something shows up immediately. Also, your testing methodology is better than my test code: I should also test the instructions twice, to make sure the flags go both ways properly. Thus I can improve my test coverage with a simple modification. Perhaps I should also turn the test code into a cartridge; it could be useful to others too.
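The idea of exercising each flag in both directions can be checked mechanically. The sketch below is a hypothetical Python harness, not the author's test program or PeteE's: `compare_op` compares a reference model with the implementation under test on the result and on ST0..ST5 (mask >FC00), and `flags_seen_both_ways` reports which of those status bits the chosen inputs actually drove to both 0 and 1. The reference model for A (add) and the deliberately broken `buggy_add` are illustrative only.

```python
ST_MASK = 0xFC00  # ST0..ST5: L>, A>, EQ, C, OV, OP


def compare_to_zero(value):
    st = 0
    if value != 0:
        st |= 0x8000                      # L>
    if 0 < value < 0x8000:
        st |= 0x4000                      # A>
    if value == 0:
        st |= 0x2000                      # EQ
    return st


def ref_add(src, dst):
    """Reference model of A (add words): returns (result, status)."""
    total = src + dst
    result = total & 0xFFFF
    st = compare_to_zero(result)
    if total > 0xFFFF:
        st |= 0x1000                      # C: carry out of the MSB
    if ~(src ^ dst) & (src ^ result) & 0x8000:
        st |= 0x0800                      # OV: same-sign inputs, different-sign result
    return result, st


def buggy_add(src, dst):
    result, st = ref_add(src, dst)
    return result, st & ~0x0800           # e.g. an ALU that never raises OV


def compare_op(reference, dut, cases):
    """List every case where the result or ST0..ST5 differ."""
    bad = []
    for src, dst in cases:
        r_res, r_st = reference(src, dst)
        d_res, d_st = dut(src, dst)
        if r_res != d_res or (r_st & ST_MASK) != (d_st & ST_MASK):
            bad.append((src, dst, r_res, d_res, r_st & ST_MASK, d_st & ST_MASK))
    return bad


def flags_seen_both_ways(op, cases):
    """Status bits the cases drove to 1 at least once AND to 0 at least once."""
    seen_set = seen_clear = 0
    for src, dst in cases:
        _, st = op(src, dst)
        seen_set |= st & ST_MASK
        seen_clear |= ~st & ST_MASK
    return seen_set & seen_clear


VALUES = (0x0000, 0x0001, 0x7FFF, 0x8000)
CASES = [(s, d) for s in VALUES for d in VALUES]  # 16 combinations
```

With these 16 input pairs, the broken adder is caught in the four cases that overflow, and `flags_seen_both_ways(ref_add, CASES)` returns >F800: every flag except OP (ST5) has been seen both set and clear. That gap is expected for word additions, but it is exactly the kind of hole a coverage check like this makes visible.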
  11. Well, that was an interesting debugging session! In the end I understood that what I thought was a problem with computing subtraction is actually a problem that manifests itself in printing (and elsewhere too). Here is the problem under extended Basic, and below is the explanation of how I got there. I still don't know which CPU instruction is the offending one, but I am making progress. The process of finding the problem turned into a tour of the FPGA system's features, together with Stuart's cool LBLA/debugger module. Since I thought the problem was in the subtract operation, I studied the excellent TI Intern book, based on RXB's comment about the SSUB routine address. I wrote a simple Basic program (A=1, B=2, C=A-B) and ran it under classic99, setting breakpoints at >D74 and >FA6 to see the contents of the scratchpad memory before and after the subtraction when running extended Basic. (I could have determined earlier that the problem cannot be in this ROM code, as it is shared with regular TI Basic, which was working; but bear with me, these things only make sense once you know where the problem is not.) I could see the contents of the floating point accumulator at 834A (the value 1) and the argument at 835C (the value 2), and after the operation the floating point accumulator became negative. That makes sense. Next I wanted to verify whether the same happens with my FPGA CPU. This is where I got to use Stuart's cartridge and some features of the FPGA system. First, taking advantage of the fact that in the FPGA system the ROM is actually RAM, I loaded Stuart's cartridge and modified the system ROM to call a subroutine at the beginning of the subtract operation (I added a BLWP @>1360 instruction). Since I needed space for the intercepting subroutine call, I overwrote the NEG instruction at >D7C and moved the NEG @>834A instruction into the intercept routine. I placed the subroutine at >1364, overwriting cassette support code.
I then did the same at the end of the floating point routine, at >FA6, this time moving the instruction MOV @>834A,R1 into the intercept routine. The point of the intercept routines is that they copy the entire scratchpad memory to a safe place before and after, respectively, the Basic ROM's floating point subtract routine executes. The FPGA system has 1 kilobyte of scratchpad memory instead of the regular 256 bytes, so I simply copied the memory at 8300..83FF first to 8100..81FF and at the end to 8200..82FF. After making those patches to the system ROM, I copied the modified ROM to the PC's disk. I then initialized the FPGA system again, this time with the modified ROM but with the extended Basic cartridge inserted instead of Stuart's cartridge. Next I performed my subtraction in Basic again. After running that piece of Basic code, I read back the two copies (before and after the subtraction) of the scratchpad memory and compared them. At this point I saw that the subtraction had in fact executed correctly, and that the problem manifests itself when printing negative numbers: the minus sign does not appear. The problem also occurs with other operations, since the cos and sin functions also have issues. I am very happy with the DMA feature of the FPGA system, as it enables me to read and write the TI clone's memory while the system is running, which is super handy for debugging. The same mechanism is used when the system is booted from the PC (it can also boot from flash ROM). Now, after this debugging session, I know where the problem is not. Progress.
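The before/after comparison can also be scripted. The following minimal sketch decodes the 8-byte radix-100 floating point format used by the TI-99/4A ROMs, as I understand it from TI Intern and similar references: byte 0 holds a base-100 exponent biased by >40, the remaining bytes hold base-100 digits, and a negative number has its first 16-bit word negated. The helper names and the sample dumps are made up for illustration; they are not the memory contents captured in the session described above.

```python
from fractions import Fraction

PAD_BASE = 0x8300  # start of the scratchpad that was copied
FAC = 0x834A       # floating point accumulator (per the post)
ARG = 0x835C       # floating point argument (per the post)


def decode_ti_float(raw):
    """Decode an 8-byte radix-100 TI-99/4A float into an exact Fraction.

    Assumed format: byte 0 = exponent + >40 (power of 100), bytes 1..7 =
    base-100 digits, most significant first; a negative number has its
    first 16-bit word negated; a first word of 0 means zero.
    """
    if len(raw) != 8:
        raise ValueError("TI floats are 8 bytes")
    word = (raw[0] << 8) | raw[1]
    if word == 0:
        return Fraction(0)
    negative = bool(word & 0x8000)
    if negative:
        word = (-word) & 0xFFFF        # undo the negation of the first word
    exponent = (word >> 8) - 0x40
    digits = [word & 0xFF] + list(raw[2:])
    value = sum(Fraction(d) * Fraction(100) ** (exponent - i)
                for i, d in enumerate(digits))
    return -value if negative else value


def read_float(snapshot, address, base=PAD_BASE):
    """Pick the 8-byte float at `address` out of a scratchpad dump."""
    offset = address - base
    return decode_ti_float(bytes(snapshot[offset:offset + 8]))


def changed_bytes(before, after, base=PAD_BASE):
    """Addresses whose contents differ between two scratchpad dumps."""
    return [base + i for i, (b, a) in enumerate(zip(before, after)) if b != a]


# Made-up 256-byte dumps of >8300..>83FF for C = A - B with A = 1, B = 2
before = bytearray(256)
before[0x4A:0x52] = bytes([0x40, 0x01, 0, 0, 0, 0, 0, 0])   # FAC = 1
before[0x5C:0x64] = bytes([0x40, 0x02, 0, 0, 0, 0, 0, 0])   # ARG = 2
after = bytearray(before)
after[0x4A:0x52] = bytes([0xBF, 0xFF, 0, 0, 0, 0, 0, 0])    # FAC = -1
print(read_float(after, FAC))                             # -1
print([hex(a) for a in changed_bytes(before, after)])    # ['0x834a', '0x834b']
```

Decoding the FAC in the "after" dump to -1 is the kind of evidence that clears the subtraction routine and points at the code that formats numbers for printing.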
  12. From the album: FPGA CPU debugging: extended Basic

    Here I finally understood that the problem is in displaying the minus sign, not the actual computation in this case.
  13. Some pictures from the debugging session to find out why extended BASIC does not work with negative numbers.
  14. From the album: FPGA CPU debugging: extended Basic

    Here I copy the scratchpad to >8200 after the subtraction. My FPGA design incorrectly displays sprites even on top of text mode, hence the strange colored blocks on screen.
  15. From the album: FPGA CPU debugging: extended Basic

    Here I jump to another piece of code through the BLWP vector at >137C. My FPGA design incorrectly displays sprites even on top of text mode, hence the strange colored blocks on screen.
  16. From the album: FPGA CPU debugging: extended Basic

    The top two words are the BLWP vector: first WP=>8000, then the address of the memory copy routine at >1364. It copies the scratchpad from >8300 to >8100. This would not work on a regular TI, but my FPGA system has a 1-kilobyte scratchpad. My FPGA design incorrectly displays sprites even on top of text mode, hence the strange colored blocks on screen.
  17. From the album: FPGA CPU debugging: extended Basic

    Intercepting the beginning of the floating point subtraction routine. My FPGA design incorrectly displays sprites even on top of text mode, hence the strange colored blocks on screen.
  18. I should have known better, thanks Stuart! Once again you've already done what I was looking for; this seems perfect! I am running out of time on this project today and need to continue tomorrow, starting with your cartridge.
  19. Thanks for all the comments so far. I'll also post a quick question here on a different topic: when debugging hardware and the CPU, it would be convenient to have something akin to the Mini Memory and the Line-by-Line Assembler in ROM. (As an aside, I wish I had purchased the Mini Memory as a kid instead of extended Basic.) I have already been using Easybug and the Mini Memory ROM and GROMs, but my FPGA configuration does not yet support the 4K RAM of the Mini Memory, although that is trivial to add. If I add the RAM to the cartridge address space I can easily enough load LBLA, but I am wondering whether there already is a cartridge ROM that has this capability and can be used with the 32K memory expansion. Of course I could use E/A, which my system already supports, but I kind of like tweaking things LBLA style, and in most cases when debugging and testing I am only interested in running very short, quick and dirty bits of code.
  20. Thanks for the link! Here too, ABS just clears carry; it is never set. I don't think the problem is ABS; it is something else...
  21. Yes, reading the disassembly is one way to go, and I may have to resort to that if nothing else helps. The NEG instruction does seem to work, and it is included in my test cases. I tried many different ways of providing negative numbers, ranging from the ones you suggested to trigonometric expressions (such as cos(3.141592), which also yields bogus results). With TI Basic these operations work, but with extended Basic I get bogus results. Extended Basic has a whole lot more code in it, so it is not very surprising that it reveals problems. My test cases for machine code instructions are not comprehensive, but I extended the coverage quite a bit today, including the use of various addressing modes, although not for all instructions. I can use an earlier version of this same design with the TMS99105 CPU while using my FPGA code for the rest of the TI. That works, so the problem must be in the CPU. As an extreme measure, if I cannot come up with anything easier, I could record memory bus traces when using the TMS99105 and compare those with the FPGA CPU. Or I could add my CPU core to the TMS99105 design and run it with the same data that the TMS99105 is using, but that also requires a lot of work, so I am trying to come up with something easier. Probably just many more test cases. A lot of software works correctly, such as the Megademo, which has quite a bit of code in it, so I have a reasonable amount of confidence in the CPU core, but clearly something is not working. Perhaps rather than reading the disassembly I could copy bits of code from it and compare their behavior between the TMS99105 and my CPU core.
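A bus-trace comparison like the one mentioned above boils down to finding the first cycle where two recordings disagree. This is a minimal sketch under stated assumptions: each trace is a list of (address, data, read/write) records captured at the same granularity, and the `BusCycle` type is hypothetical. Because the TMS99105 and a TMS9900-style core may not issue identical read sequences (for example because of instruction prefetch), the `writes_only` option compares only write cycles, which carry the architecturally visible results.

```python
from dataclasses import dataclass
from itertools import zip_longest
from typing import Iterable, Optional, Tuple


@dataclass(frozen=True)
class BusCycle:
    address: int
    data: int
    write: bool


def first_divergence(reference: Iterable[BusCycle],
                     candidate: Iterable[BusCycle],
                     writes_only: bool = False,
                     ) -> Optional[Tuple[int, Optional[BusCycle], Optional[BusCycle]]]:
    """Return (index, ref_cycle, cand_cycle) at the first mismatch, or None.

    With writes_only=True only write cycles are compared (the index then
    counts write cycles), which hides differences in read/prefetch patterns.
    A missing cycle at the end of the shorter trace shows up as None.
    """
    if writes_only:
        reference = [c for c in reference if c.write]
        candidate = [c for c in candidate if c.write]
    for i, (r, c) in enumerate(zip_longest(reference, candidate)):
        if r != c:
            return i, r, c
    return None
```

Comparing writes first narrows a failure down to the first wrong result in memory; the full read/write comparison around that point then shows which instruction produced it.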
  22. Thanks a lot, and special points for the very quick reply! The excerpt you provided was interesting, and I did add a whole bunch more test cases, but unfortunately I did not find the problem, yet. The source code will also be very useful, I'm sure, once I get a bit deeper into the rabbit hole.
  23. Today I've been hacking away with the TI-99/4A FPGA after a while. I've been working on the CollectorVision Phoenix; it has been fun but a little slow going, as the Atari core I am working on is not mine. It makes quite a big difference to work on a design you know inside out, as opposed to porting someone else's code. I did some refactoring of the TI-99/4A VHDL code, separating the external memory interface from the top-level VHDL block so that I can more easily adapt the design to other FPGA boards. As part of this process I wanted to enable direct execution of TMS9900 machine code from the FPGA's configuration flash ROM. This is a serial ROM chip, so reading it will be relatively slow, but that should be fine, as the system runs too fast for legacy software anyway unless it is slowed down. Having this capability would enable the TI-99/4A core to run on many barebones FPGA boards, even without any external memory, as long as the FPGA has approximately the same capabilities as the XC6SLX9 I am using. When testing the hardware, I wanted to use extended Basic, but realized I have a bug in running extended Basic: I cannot enter negative numbers. Setting A=-1, for example, always ignores the minus sign, and A becomes positive. I had similar problems earlier with regular Basic, and tracked them down to the FPGA CPU's condition codes not working properly in certain cases. I thought I might still have that problem and ran my tests again. One overflow flag bug had crept in, and I also noticed that my ABS instruction implementation was sometimes setting the carry flag while a real CPU does not: at least the TMS99105 never sets carry when running ABS, and in the classic99 source code the carry is also always cleared when running ABS. The data sheet is ambiguous here; it says ABS sets carry if there is a carry out from the ALU, but in practice it appears to be always zero.
Anyway, my test machine code program now has exactly the same behavior as a real TMS99105 chip when running through test cases for the following instructions: A, S, SOC, SZC, DIV, MPY, C, NEG, SRL, ANDI, CB, SB, AB, XOR, INC, DEC, SLA, SRA, SRC, MOV, MOVB, SOCB, SZCB, ABS and X. For each of those, my test software executes the operation with 16 different input value combinations, and comparing the results and the top 6 bits of the status register yields identical results. This is of course not a comprehensive test of all instructions, but the coverage is pretty good, and pretty much all games and other software work. Nevertheless, there is still a bug somewhere, hopefully in the CPU and not in timing. The behavior is so consistent that I believe it is a CPU bug. So if anyone happens to know how extended Basic handles the minus sign, any pointers would be greatly appreciated.
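Reading code directly from the configuration flash, as described in this post, comes down to sending a read command and clocking the data out serially. The post does not name the flash part, so this sketch assumes a typical SPI NOR flash that supports the common 0x03 Read Data command with a 24-bit address; the function names and the 25 MHz SPI clock used below are hypothetical. It shows why random word fetches are slow: each one costs 8 command bits, 24 address bits and 16 data bits, whereas sequential fetches after a single command cost only 16 clocks per word.

```python
READ_DATA = 0x03          # common SPI NOR "Read Data" opcode (assumed supported)
HEADER_BITS = 8 + 24      # opcode + 24-bit address


def read_command(address: int) -> bytes:
    """Bytes shifted out on MOSI to start a read at a 24-bit flash address."""
    if not 0 <= address < 1 << 24:
        raise ValueError("address must fit in 24 bits")
    return bytes([READ_DATA, (address >> 16) & 0xFF,
                  (address >> 8) & 0xFF, address & 0xFF])


def sck_cycles(words: int, sequential: bool) -> int:
    """SPI clock cycles needed to fetch `words` 16-bit words."""
    if sequential:
        # one command, then the flash keeps streaming consecutive bytes
        return HEADER_BITS + 16 * words
    # a fresh command and address for every word (random access)
    return (HEADER_BITS + 16) * words


def fetch_time_us(words: int, sequential: bool, sck_mhz: float) -> float:
    """Transfer time in microseconds at the given SPI clock."""
    return sck_cycles(words, sequential) / sck_mhz
```

At the assumed 25 MHz, `fetch_time_us(1, False, 25.0)` is 1.92 µs for a single random word, while eight sequential words take 6.4 µs instead of 15.36 µs. That is why streaming consecutive words, or caching them, pays off on a serial bus.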
  24. This is a fun game! I barely resisted firing it up in classic99 (I did fire up classic99 but did not load the game) and instead ran it on my FPGA TI-99/4A for the first time. It is pretty hysterical when the CPU is running at 39x the normal speed. I was actually wondering why it is not running any faster than it is (which is very fast), but that is probably due to the sound effects. I haven't looked, and I don't remember from the Basic manual, but I assume the CALL SOUND command (is that the name of it?) has a duration parameter which is probably tied to the vertical frame sync in its implementation, and thus slows down the FPGA system the same way as the real iron. Any timing based on loops would just run crazy fast, but the sound effect lengths seem the same when I run at maximum speed and when I run at "slow" speed. I also notice that my "slow" speed is not very slow at all anymore... I also found a bug/limitation in my setup: in my system I am using a PC keyboard and capturing the keypresses on my PC. I have a Windows program I wrote which I use to load ROMs etc. to the FPGA; the same program also captures keypresses and sends them to the FPGA through USB, using my own serial protocol. Now, the game expects all keypresses to be in upper case, but I don't support Caps Lock, so I need to hold Shift while playing...
  25. Well, hello after a long while. I have been preoccupied with other things, but during the past few weeks I've found a little time to work on the TI-99/4A FPGA clone again. I really ought to be working on the CollectorVision Atari 2600 code, but could not help spending some time on the TI-99/4A first. Here I wanted to follow my original passion a bit, which was to have a fast TI-99/4A. This time I also wanted to put some computer architecture theory into practice: I added an on-chip cache memory to my TI-99/4A clone, while also optimising the VHDL code a bit. The result is that instruction execution speed jumped from 23 times the original speed to 39 times. The TMS9900 core is now a little simpler than it used to be; it is still far from an elegant design, but it is getting a little better. I have two plans to follow up on this project. First, add more speed by going from the current non-pipelined design to a slightly pipelined one, in the sense that there would be a two-stage pipeline where both stages take multiple clock cycles. The first stage would be the instruction fetch and decode stage, while the second would be the instruction execution stage. I could not easily go in this direction in the past, since there was (and still is) only one memory bus. But now that I have a working cache, I have much more memory bandwidth to play with. The cache is currently outside the CPU core, so it serves both instruction and data fetches. It is a super simple design: direct mapped with a write-through update policy, 1 kilobyte of data capacity and about half a kilobyte of tag memory. The whole thing is implemented as a simple 1k x 36 bit memory block (not all of the bits are used in each 36-bit word). Having the cache outside the CPU core is not ideal, so I am probably going to add another cache for instructions only and pull that inside the core, into the fetch/decode stage, so that it can operate in parallel with the execution stage.
This should increase performance quite significantly. The second plan is to port the TI-99/4A core to a few more FPGA boards, in order to make this design more accessible to others. The cache is also an enabler of sorts in this sense: I can now easily support slow buses (such as SPI-connected flash memories) for cartridge ROMs, I could support DRAM fetches in burst mode so that FPGA boards with only DRAM can be used effectively, and I can also support quite small FPGAs, since I could now modify the design so that it no longer needs a lot of on-chip memory while still running at a reasonable speed. Specifically, I have the low-cost BlackIce-II board in mind as one target for the TI implementation; its FPGA only has 16K of on-chip RAM.
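A direct-mapped, write-through cache like the one described above can be modelled in a few dozen lines. This is a behavioural sketch in Python with an assumed geometry, not taken from the VHDL: one 16-bit word per line and 512 lines, which gives 1 KB of data. The 15-bit word address then splits into a 9-bit index and a 6-bit tag, and tags plus valid bits take 512 x 7 bits = 448 bytes. Whether the real design allocates a line on a write miss is not stated; this model does not (no write-allocate), and the class name is hypothetical.

```python
class DirectMappedCache:
    """Behavioural model of a direct-mapped, write-through cache.

    Assumed geometry: one 16-bit word per line and 512 lines, i.e. 1 KB of
    data for the TMS9900's 32K-word address space. The 15-bit word address
    splits into a 9-bit index and a 6-bit tag.
    """

    def __init__(self, memory, lines=512):
        self.memory = memory            # backing store, indexed by word address
        self.lines = lines
        self.valid = [False] * lines
        self.tags = [0] * lines
        self.data = [0] * lines
        self.hits = 0
        self.misses = 0

    def _locate(self, byte_addr):
        word_addr = (byte_addr & 0xFFFF) >> 1   # the CPU bus is word-wide
        return word_addr, word_addr % self.lines, word_addr // self.lines

    def _hit(self, index, tag):
        return self.valid[index] and self.tags[index] == tag

    def read(self, byte_addr):
        word_addr, index, tag = self._locate(byte_addr)
        if self._hit(index, tag):
            self.hits += 1
            return self.data[index]
        self.misses += 1                        # fill the line from memory
        value = self.memory[word_addr]
        self.valid[index], self.tags[index], self.data[index] = True, tag, value
        return value

    def write(self, byte_addr, value):
        word_addr, index, tag = self._locate(byte_addr)
        value &= 0xFFFF
        self.memory[word_addr] = value          # write-through: memory always updated
        if self._hit(index, tag):               # no write-allocate (an assumption)
            self.data[index] = value
```

Because writes always go straight to memory, the backing store never holds stale data, which keeps the design simple: there is no dirty bit and nothing to flush. A real cache in a TI-99/4A also has to keep memory-mapped device ports such as the VDP and GROM ports out of the cache, because reading them has side effects; this sketch leaves that out.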