About speccery

Community Reputation: 669 (Excellent)
Recent Profile Visitors: 3,873 profile views
  1. I have the MX, so I know its specs, although I am saving it for later. The current version has problems with its HDMI connector. I did not know that the BlackIce II is history; that is a shame, I like that board. Thanks for letting me know, I assumed they would still be available. The BlackIce II thing you’ve made sounds similar to what I did with the Pipistrello 64M memory expansion for the 4A. Have you published your design?
  2. Thanks a lot for these comments and questions! These are really good ones. Regarding the choice of FPGA board: for this second design I really wanted to try out the open source toolchain, and since it only supports Lattice chips, I am stuck with those. However, there are important implications. Since the ICE40HX chips are primitive by FPGA standards, the design work I have done now includes almost no platform-dependent code. For example, the multiplier is now implemented in plain Verilog. That means this second version of the design is very portable, and I plan to implement it on multiple FPGA boards. I now have a whole bunch of FPGA boards (too many in fact, I don’t know how many), including MIST and MISTer. I have not even powered on the MISTer yet... Even though I am working with FPGAs, which by definition are new and not retro, I think it is still somehow in the spirit of retro computers to use a solution with limited resources. The MISTer is very cool but has huge resources compared to what a complete and expanded TI-99/4A needs. I admit that this point is a bit philosophical, but this work is more interesting for me if I learn and get challenged along the way, rather than just porting the same stuff over and over again. And yes, I want to make this standalone. My previous design can already run in standalone mode, but it required help from a microcontroller board. I have implemented a version of file system support, which supports storing TI files in a FAT file system. Since my last message I have added GROM support, but there is currently a bug which only shows up when running on hardware; in simulation it works perfectly as far as I can tell. So limited progress, but still some progress. Finally, in the spirit of keeping things simple, I removed the cache from this design, and I also plan to make a cycle-accurate version of the TMS9900 CPU.
My original aspiration was to make my “dream TI”, a TI-99/4A that can go very fast, and I already got reasonably close to that with 30x the original speed and beyond. Having done that, I realize that I also want to go slow, close to the original speed.
  3. Yes, I am aware of the MX and in fact I have one. I just started to work with the BlackIce II for two reasons: I have several of them, and working with SRAM is easier than SDRAM. Having said that, I have ported an earlier design of mine to a cheap Xilinx board with SDRAM. That design consists of only the TMS9900 and a UART; it is my test bed for SDRAM work.
  4. Again, it's been a long while since I wrote anything here... I thought I would write a short update about my new FPGA version of the TI-99/4A. This is work in progress. In short, I'm in the process of creating a version for the BlackIce-II FPGA board. This is an affordable board (I hope it is still available) with a fairly small ICE40HX4K FPGA chip, 512K of RAM and a fairly powerful microcontroller. The board is supported by the open source Icestorm toolchain, and I have used that for development work. This has been an interesting adventure so far. Icestorm is a very nice and compact toolchain compared to the bloated Xilinx and Altera tools. However, Icestorm only supports the Verilog hardware description language, so I had to learn Verilog and port my existing VHDL code base to it. Most of the work so far (and I have put a fair number of hours into this already) has gone into porting and modifying the code to work on this fairly limited platform: changing the language to Verilog and designing around the limitations. In the context of recreating the TI-99/4A, the biggest drawback is that the small FPGA only has 16K of internal RAM (compared to 64K on the chip I used for the VHDL version). The internal RAM is also a lot less sophisticated. As a result, I have had to redesign the system architecture quite a bit, so that the external 512K RAM chip is now used for code, data and video memory, as opposed to using on-chip RAM for video memory as before. This may seem like a small change, and in a way it is, but in practice I had to design a much more involved memory controller which can arbitrate in real time between CPU, VDP and bootloader accesses. Although I have converted my whole code base to Verilog, currently only a portion of it has been fully ported and works.
Namely, I now have a system with the TMS9900 CPU, the TMS9918 VDP with VGA output, a memory controller driving the external RAM, the EVMBUG debugger in an 8K ROM block, and finally pnr's TMS9902 UART. The ICE40HX4K chip is nominally supposed to have only 4K LUTs (look-up tables), but in practice the silicon is the same as the ICE40HX8K with 7680 LUTs, and the Icestorm toolchain enables access to all of them. That is good, since the design already uses 4421 LUTs. The design runs at 25MHz, which is the VGA pixel clock. I am hoping I can fit the whole thing into this FPGA. As the chip's resources get close to full utilization, routing probably becomes impossible, so I cannot add too much more. I don't know yet where the limit is. One consequence of having the VDP use external RAM is that it is now possible to map video RAM directly into the CPU's address space, and that is what I have done during debugging (I'm not yet using the TI-99/4A ROMs, just EVMBUG). There are now two ways to access VRAM: through the standard indirect VDP registers, which is obviously necessary for compatibility, or by mapping it directly into the CPU address space. Direct access to VRAM vastly increases the bandwidth and is very easy to use, but of course no existing software supports it... Next I need to add GROM support, which should be easy. When that is in place I should be able to boot this thing with the TI-99/4A ROMs. I still need to figure out how to split the 512K RAM between different functions, probably something like this:
      • 8K system ROM (0000..1FFF)
      • 8K disk support (4000..5FFF)
      • 256K paged cartridge space (6000..7FFF)
      • 64K GROM space (24K used by the console GROMs [actually 18K, but allocating a full 8K slot per GROM is easier])
      • 64K VRAM space
      • 32K normal RAM expansion
That leaves 80K still to be allocated to something. If I can fit in my memory paging unit, it would probably make sense to be able to configure either 256K of AMS memory space or 256K of cartridge space.
  5. Very nice, good job adamantyr! I am hoping I'll find some time to hack something together too. This is such a delight.
  6. Sorry for the long delay in answering. I have not tried synthesising the plain vanilla TMS9900 core without any peripherals. Looking at one of the breadboard project targets on my GitHub account, https://github.com/Speccery/breadboard/blob/master/bb-lx9/work/system_summary.html, you can see that a minimal TMS9900 system took 1690 Xilinx Spartan 6 slice LUTs, or 29% of the XC6SLX9 chip. This system includes the TMS9900 core, 32K RAM, 32K ROM, and PNR's TMS9902 UART, all implemented using the FPGA's built-in resources. In a way this number is comparable to the 1072 logic cells of the J1 system, as that system also includes a memory interface, some I/O and a UART. However, the Spartan 6 logic elements are much more advanced than what the Lattice ICE40HX provides, so the numbers 1690 and 1072 are not directly comparable.
  7. The J1 implements its stacks as two large shift registers, where each shift operation moves the contents by one word, typically 16 or 32 bits. The stacks are not deep: for the 16-bit version they are by default 15 entries deep for the data stack and 17 for the return stack. So these stacks are implemented in the FPGA logic fabric, not in block memory. This also means that there are no stack pointers, at least in the J1A version, so you don't know how deep you are in the stacks... The source code for the J1A is about 130 lines of Verilog. It is tiny. To my understanding it is inspired by the Novix NC4016. The J1 is an awesome project, and it comes with SwapForth already implemented. The basic J1 system for the BlackIce takes 1072 logic cells, so about one eighth of the total capacity. Not only do subroutine calls and pretty much every other instruction take one clock cycle, you can also combine certain operations, such as a subroutine return, into the same instruction. Oh, and it runs at 48 MHz on the BlackIce-II. I did not try to optimize it. I think I also ported it over to the Pepino board as a 32-bit version, along the lines of what James had done with his version for the Xilinx Spartan 6. No, but I guess I could set it up. I was playing with the Icestorm tools and used the J1 as the core to play with. I did not do much; my work amounted to merging the top-level block from the BlackIce examples with the J1. I tested it with both place-and-route tools: arachne-pnr and the newer nextpnr. For the latter I had to study things a little to get the PLL set up properly (the input clock is 100MHz, which the PLL converts to 48MHz).
  8. Thanks, that is a good comment. I have also used Forth to bring up hardware; my last project of this type was porting the J1 CPU to the BlackIce-II FPGA board. The J1 is essentially a Forth CPU. I'm tempted to add a co-processor system with this CPU to my TI-99/4A FPGA system. It is very compact and very fast. You probably already know about it. It could be used, for example, to aid debugging or to monitor TI-99/4A signals. To make it truly useful it would need some capability to interface with the TI's peripherals. On the other hand, my next goal is to make my system more accessible by porting it to other low-cost and widely available boards, and I'm trying to resist feature creep until then. https://excamera.com/sphinx/fpga-j1.html
  9. It was great to be able to use PeteE's software. I found and fixed two bugs: (1) despite my "testing", there still was a bug in the treatment of ST1 (the A> flag) by the ABS instruction: the implementation completely lacked the special case that ABS sets ST1 based on the source operand; (2) SLA0 did not set the overflow flag properly if the shift count was greater than one. Fixing bug 1 fixed Extended Basic! So now I could resume what I was actually trying to implement: read access to the serial flash ROM chip. To my delight, the code I had written worked, and I was able to access the serial flash ROM from Basic with a series of call load(...) and call peek(...) statements. I wish Basic had direct support for hexadecimal numbers, both for input and output. Oric Atmos Basic has this, and also DOKE and DEEK, which work like POKE and PEEK but with 16-bit values... Anyway, with the bugs fixed, all the test cases now pass. It's great that this test is now also very easy to repeat whenever the CPU is updated.
  10. Thanks, this is awesome, and it is extremely helpful to have an independent piece of verification code! I've not had time during the week to test this, but I am looking forward to doing so this evening. Hopefully something shows up immediately. Also, your testing methodology is better than my test code: I should also test each instruction twice, to make sure the flags are properly both set and cleared. So I can improve my test coverage with a simple modification. Perhaps I should also turn the test code into a cartridge; it could be useful to others too.
  11. Well, that was an interesting debugging session! In the end I understood that what I thought was a problem with computing subtraction actually manifests itself in printing (and elsewhere too). Here is the problem under Extended Basic, and below the explanation of how I got there. I still don't know which CPU instruction is the culprit, but I am making progress. The process of finding the problem showcased many of the FPGA system's features, as well as Stuart's cool LBLA / debugger module. Since I thought the problem was in the subtract operation, I studied the excellent TI Intern book, based on RXB's comment about the SSUB routine address. I wrote a simple Basic program (A=1, B=2, C=A-B) and ran it under Classic99, setting breakpoints at >D74 and >FA6 to see the contents of the scratchpad memory before and after the subtraction when running Extended Basic. (I could have determined earlier that the problem cannot be in this ROM code, as it is shared with regular TI Basic, and that was working, but bear with me: these things only make sense once you know where the problem is not.) I could see the contents of the floating point accumulator at 834A (the value 1) and the argument at 835C (the value 2), and after the operation the floating point accumulator became negative. That makes sense. Next I wanted to verify whether the same happens with my FPGA CPU. This is where I got to use Stuart's cartridge and some features of the FPGA system. First, taking advantage of the fact that in the FPGA system the ROM is actually RAM, I loaded Stuart's cartridge and modified the system ROM to call a subroutine at the beginning of the subtract operation (I added a BLWP @>1360 instruction). Since I needed space for the intercepting subroutine call, I overwrote the NEG instruction at >D7C and moved the NEG @>834A instruction into the intercept routine. I placed the subroutine at >1364, overwriting cassette support code.
I then did the same at the end of the floating point routine, at >FA6, this time moving the instruction MOV @>834A,R1 into the intercept routine. The point of the intercept routines is that they copy the entire scratchpad memory to a safe place, before and after the Basic ROM's floating point subtract routine executes, respectively. The FPGA system has 1 kilobyte of scratchpad memory instead of the regular 256 bytes, so I simply copied the memory at 8300..83FF first to 8100..81FF and at the end to 8200..82FF. After making those patches to the system ROM, I copied the modified ROM to the PC's disk. I then initialized the FPGA system again, this time with the modified ROM but with the Extended Basic cartridge inserted instead of Stuart's cartridge. Next I performed my subtraction in Basic again. After running that piece of Basic code, I just read back the two copies (before and after the subtract) of the scratchpad memory and compared them. At this point I saw that the subtract had in fact executed correctly, and that the problem manifests itself when printing negative numbers: the minus sign does not appear. The problem also occurs with other operations, since the COS and SIN functions also have issues. I am very happy with the DMA feature of the FPGA system, as it enables me to read and write the TI clone's memory while the system is running, which is super handy for debugging. The same mechanism is used when the system is booted from the PC (it can also boot from flash ROM). Now, after this debugging session, I know where the problem is not. Progress.
  12. Some pictures from the debugging session to find out why Extended BASIC does not work with negative numbers.
  13. I should have known better, thanks Stuart! Once again you've already done what I was looking for; this seems perfect! I am running out of time on this project today and need to continue tomorrow, starting with your cartridge.
  14. Thanks for all the comments so far. I'll also post a quick question on a different topic here: when debugging hardware and the CPU, it would be convenient to have something akin to the Mini Memory and Line-by-Line Assembler in ROM. (As an aside, I wish I had purchased Mini Memory as a kid instead of Extended Basic.) I have already been using Easybug and the Mini Memory ROM & GROMs, but my FPGA configuration does not yet support the 4K RAM of the Mini Memory, although that is trivial to add. If I add the RAM to the cartridge address space I can easily load LBLA, but I am wondering whether there already is a cartridge ROM with this capability that can be used with the 32K memory expansion? Of course I could use E/A, which my system already supports, but I kind of like tweaking things LBLA style, and in most cases when debugging and testing I am only interested in running very short, quick-and-dirty bits of code.
  15. Thanks for the link! Here too, ABS just clears carry, and it is never set. I don't think the problem is ABS; it is something else...
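The 512K SRAM split proposed in post 4 can be checked with a few lines of Python. This is only a planning sketch: the region sizes come from the post, while packing the regions largest-first and the resulting SRAM offsets are assumptions, not the actual design. Largest-first packing of power-of-two sizes has the side benefit that every region starts on a multiple of its own size, which keeps the address decoding simple.

```python
# Planning sketch for the 512K SRAM split proposed in post 4.
# Region sizes are from the post; packing largest-first and the resulting
# SRAM offsets are assumptions for illustration only.
KB = 1024
SRAM_SIZE = 512 * KB

REGIONS = [
    ("system ROM (0000..1FFF)", 8 * KB),
    ("disk support (4000..5FFF)", 8 * KB),
    ("paged cartridge space (6000..7FFF)", 256 * KB),
    ("GROM space", 64 * KB),
    ("VRAM space", 64 * KB),
    ("normal RAM expansion", 32 * KB),
]


def plan_layout(regions, total):
    """Place regions back to back, largest first; return (layout, spare bytes).

    With power-of-two sizes placed largest-first, every region starts on a
    multiple of its own size, so the region base is just the upper address bits.
    """
    layout = []
    offset = 0
    # sorted() is stable, so equal-sized regions keep their listed order.
    for name, size in sorted(regions, key=lambda region: region[1], reverse=True):
        layout.append((name, offset, size))
        offset += size
    if offset > total:
        raise ValueError(f"regions need {offset} bytes but only {total} exist")
    return layout, total - offset


if __name__ == "__main__":
    layout, spare = plan_layout(REGIONS, SRAM_SIZE)
    for name, offset, size in layout:
        print(f"{offset:05X}..{offset + size - 1:05X}  {size // KB:3d}K  {name}")
    print(f"still unallocated: {spare // KB}K")
```

The sum of the listed regions is 432K, which is where the 80K of unallocated space in post 4 comes from.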
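Post 4 mentions a memory controller that arbitrates the single external RAM between CPU, VDP and bootloader accesses in real time. Below is a cycle-level Python sketch of one common way to do this, a fixed-priority arbiter. The priority order (VDP first because the VGA output cannot wait, bootloader last) and the retry-next-cycle behaviour are assumptions; the post does not say how the actual Verilog controller decides.

```python
from typing import Optional

# Hypothetical priority order, highest first. The post does not specify one.
PRIORITY = ("vdp", "cpu", "bootloader")


def arbitrate(requests: set) -> Optional[str]:
    """Grant the single-ported SRAM to the highest-priority active requester."""
    for requester in PRIORITY:
        if requester in requests:
            return requester
    return None


def simulate(cycles: list) -> list:
    """Return the grant for each cycle; a requester that loses retries next cycle."""
    pending: set = set()
    grants = []
    for new_requests in cycles:
        pending |= new_requests
        winner = arbitrate(pending)
        if winner is not None:
            pending.discard(winner)  # served; the others stay pending
        grants.append(winner)
    return grants


if __name__ == "__main__":
    # CPU and VDP collide in cycle 0: the VDP wins, the CPU is served in cycle 1.
    print(simulate([{"cpu", "vdp"}, set(), {"bootloader"}, set()]))
```

A fixed-priority scheme is easy to reason about but can starve low-priority requesters under constant high-priority traffic; a real design might instead give the VDP fixed time slots.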
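To illustrate the stack organisation described in post 7, here is a small behavioural model of a stack built as a shift register with no stack pointer. It is a simplified sketch, not a translation of the J1A Verilog: the depth of 15 matches the default data stack mentioned in the post, while the value shifted in at the bottom on a pop (0 here) is an assumption.

```python
class ShiftRegisterStack:
    """Fixed-depth stack modelled as a shift register: no stack pointer exists."""

    def __init__(self, depth: int, width: int = 16, fill: int = 0):
        self.mask = (1 << width) - 1
        self.fill = fill & self.mask      # value entering at the bottom on a pop (assumed)
        self.cells = [self.fill] * depth  # cells[0] is the top of stack

    def push(self, value: int) -> None:
        # Every entry moves down one slot; the bottom entry silently falls off.
        self.cells = [value & self.mask] + self.cells[:-1]

    def pop(self) -> int:
        # Every entry moves up one slot; the filler value enters at the bottom.
        top = self.cells[0]
        self.cells = self.cells[1:] + [self.fill]
        return top

    def top(self) -> int:
        return self.cells[0]


if __name__ == "__main__":
    data_stack = ShiftRegisterStack(depth=15)
    for value in range(20):  # five more pushes than the stack can hold
        data_stack.push(value)
    print([data_stack.pop() for _ in range(16)])  # last pop returns the filler
```

Pushing 20 values into the 15-deep stack loses the five oldest without any error, and the sixteenth pop just returns filler, which is why you cannot tell how deep you are in the stacks.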
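The two CPU bugs from post 9 come down to flag semantics, which are easy to pin down in a small reference model. The sketch below follows the TMS9900 data manual's description as I read it: ABS sets L>, A> and EQ from the source operand (not the result) and sets OV only for >8000, and SLA sets OV if the sign bit changes at any point during the shift. Carry for ABS and the remaining status bits are deliberately left out; treat this as a test-oracle sketch, not a complete ALU.

```python
def compare_flags(value: int) -> dict:
    """L>, A> and EQ as set by comparing a 16-bit word with zero."""
    value &= 0xFFFF
    signed = value - 0x10000 if value & 0x8000 else value
    return {"L>": value != 0, "A>": signed > 0, "EQ": value == 0}


def abs_op(src: int) -> tuple:
    """ABS: the result is |src|, but L>, A> and EQ come from the SOURCE operand."""
    src &= 0xFFFF
    flags = compare_flags(src)   # e.g. ABS of >FFFF must leave A> clear
    flags["OV"] = src == 0x8000  # -32768 has no 16-bit positive counterpart
    result = (-src) & 0xFFFF if src & 0x8000 else src
    return result, flags


def sla(value: int, count: int) -> tuple:
    """SLA: OV is set if the sign bit changes at ANY step of the shift."""
    if not 1 <= count <= 16:
        raise ValueError("effective shift count must be 1..16")
    value &= 0xFFFF
    carry = overflow = False
    for _ in range(count):
        old_sign = value & 0x8000
        carry = bool(old_sign)  # C holds the last bit shifted out
        value = (value << 1) & 0xFFFF
        if (value & 0x8000) != old_sign:
            overflow = True     # sticky: once set it stays set
    flags = compare_flags(value)
    flags.update(C=carry, OV=overflow)
    return value, flags


if __name__ == "__main__":
    print(abs_op(0xFFFF))   # result 1, but A> stays clear because the source is -1
    print(sla(0x4000, 2))   # sign goes 0 -> 1 -> 0, so OV must be set
```

With >4000 shifted left by two, the sign bit goes 0, 1, 0: comparing only the original and final sign would miss the overflow, which is exactly the kind of case that only appears with a shift count greater than one.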
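The "test each instruction twice" idea from post 10 can be shown with a toy example: an implementation that only ever sets compare flags, and never clears stale ones, passes when the status register starts out cleared but fails when it starts out set. The function names are hypothetical and the model only covers L>, A> and EQ.

```python
COMPARE_FLAGS = {"L>", "A>", "EQ"}


def compare_to_zero(value: int) -> set:
    """Compare flags that should be set for a 16-bit result."""
    value &= 0xFFFF
    signed = value - 0x10000 if value & 0x8000 else value
    flags = set()
    if value != 0:
        flags.add("L>")
    if signed > 0:
        flags.add("A>")
    if value == 0:
        flags.add("EQ")
    return flags


def correct_update(status: set, value: int) -> set:
    # Clear the compare flags first, then set the ones that apply.
    return (status - COMPARE_FLAGS) | compare_to_zero(value)


def buggy_update(status: set, value: int) -> set:
    # Bug: new flags are ORed in, but stale ones are never cleared.
    return status | compare_to_zero(value)


def passes_both_ways(update, value: int) -> bool:
    """Run the same case from an all-clear and an all-set status register."""
    expected = compare_to_zero(value)
    for initial in (set(), set(COMPARE_FLAGS)):
        if (update(initial, value) & COMPARE_FLAGS) != expected:
            return False
    return True


if __name__ == "__main__":
    print(passes_both_ways(correct_update, 0), passes_both_ways(buggy_update, 0))
```

Run only from a cleared status register, the buggy version looks correct; the second run from an all-set register is what exposes it.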
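The intercept trick in post 11 relies on how BLWP and RTWP switch context: BLWP @>1360 fetches a new workspace pointer from >1360 and a new PC from >1362, then saves the old WP, PC and ST in R13, R14 and R15 of the new workspace; RTWP restores them. The Python model below shows only that mechanism. The vector contents (workspace >8020, entry point >1364) and the caller's WP and ST are hypothetical example values, and the return address >D80 assumes the two-word BLWP was placed at >D7C.

```python
class Tms9900Context:
    """Hypothetical minimal model: WP, PC, ST plus word memory keyed by even address."""

    def __init__(self, memory: dict, wp: int, pc: int, st: int):
        self.mem = memory
        self.wp, self.pc, self.st = wp, pc, st

    def read(self, addr: int) -> int:
        return self.mem.get(addr & 0xFFFE, 0)

    def write(self, addr: int, value: int) -> None:
        self.mem[addr & 0xFFFE] = value & 0xFFFF

    def blwp(self, vector: int) -> None:
        """BLWP @vector; self.pc must already point past the BLWP instruction."""
        new_wp = self.read(vector)
        new_pc = self.read(vector + 2)
        self.write(new_wp + 26, self.wp)  # new R13 <- old WP
        self.write(new_wp + 28, self.pc)  # new R14 <- return address
        self.write(new_wp + 30, self.st)  # new R15 <- old status
        self.wp, self.pc = new_wp, new_pc

    def rtwp(self) -> None:
        """RTWP: restore ST, PC and WP from R15, R14 and R13 of the current workspace."""
        self.st = self.read(self.wp + 30)
        self.pc = self.read(self.wp + 28)
        self.wp = self.read(self.wp + 26)


if __name__ == "__main__":
    memory = {0x1360: 0x8020, 0x1362: 0x1364}  # hypothetical vector contents
    cpu = Tms9900Context(memory, wp=0x83E0, pc=0x0D80, st=0x2000)
    cpu.blwp(0x1360)
    print(f"in intercept: WP=>{cpu.wp:04X} PC=>{cpu.pc:04X}")
    cpu.rtwp()
    print(f"back in ROM:  WP=>{cpu.wp:04X} PC=>{cpu.pc:04X}")
```

Because the caller's context is saved in the intercept routine's own workspace, the routine can execute the displaced instruction, copy the scratchpad, and return with a plain RTWP without touching the Basic ROM's registers.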
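Comparing the before and after scratchpad snapshots from post 11 is easy to script once they have been read back over DMA. The sketch below assumes each snapshot is a 256-byte image of >8300..>83FF, and assumes the standard TI radix-100 floating point format: eight bytes, an excess-64 base-100 exponent in the first byte, seven base-100 digits, and negative numbers stored by negating the first 16-bit word. FAC (>834A) and ARG (>835C) are the addresses from the post; the test data is made up to mirror A=1, B=2, C=A-B.

```python
from fractions import Fraction

SCRATCHPAD_BASE = 0x8300
FAC = 0x834A  # floating point accumulator (address from post 11)
ARG = 0x835C  # floating point argument (address from post 11)


def decode_radix100(data: bytes) -> Fraction:
    """Decode an 8-byte TI radix-100 float (format assumed as described above)."""
    word = (data[0] << 8) | data[1]
    if word == 0:
        return Fraction(0)
    negative = bool(word & 0x8000)
    if negative:
        word = (-word) & 0xFFFF        # undo the negation of the first word
    exponent = (word >> 8) - 0x40      # excess-64 power of 100
    digits = [word & 0xFF] + list(data[2:8])
    value = sum((d * Fraction(100) ** (exponent - i) for i, d in enumerate(digits)),
                Fraction(0))
    return -value if negative else value


def read_float(snapshot: bytes, address: int) -> Fraction:
    offset = address - SCRATCHPAD_BASE
    return decode_radix100(snapshot[offset:offset + 8])


def changed_addresses(before: bytes, after: bytes) -> list:
    """Scratchpad addresses whose byte differs between the two snapshots."""
    return [SCRATCHPAD_BASE + i for i, (b, a) in enumerate(zip(before, after)) if b != a]


if __name__ == "__main__":
    before = bytearray(256)
    before[0x4A:0x52] = bytes([0x40, 0x01, 0, 0, 0, 0, 0, 0])  # FAC = 1
    before[0x5C:0x64] = bytes([0x40, 0x02, 0, 0, 0, 0, 0, 0])  # ARG = 2
    after = bytearray(before)
    after[0x4A:0x4C] = bytes([0xBF, 0xFF])                      # FAC = -1
    print([hex(a) for a in changed_addresses(before, after)])
    print(read_float(before, FAC), read_float(before, ARG), read_float(after, FAC))
```

If the decoded FAC is negative after the subtract while PRINT shows no minus sign, the arithmetic itself is fine and the fault lies elsewhere, which matches the conclusion in post 11.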