speccery

  1. Very nice, good job adamantyr! I am hoping I'll find some time to hack something together too. This is such a delight.
  2. Sorry for the long delay in answering. I have not tried synthesising the plain vanilla TMS9900 core without any peripherals. Looking at one of the breadboard project targets on my GitHub account (https://github.com/Speccery/breadboard/blob/master/bb-lx9/work/system_summary.html) you can see that a minimal TMS9900 system took 1690 Xilinx Spartan 6 slice LUTs, or 29% of the XC6SLX9 chip. This system includes the TMS9900 core, 32K RAM, 32K ROM, and PNR's TMS9902 UART, all implemented using the FPGA's built-in resources. In a way this number is comparable to the 1072 logic cells for the J1, as that system also includes a memory interface, some I/O and a UART. However, the Spartan 6 logic elements are much more advanced than what the Lattice ICE40HX provides, so the numbers 1690 vs 1072 are not directly comparable.
  3. The J1 implements its stacks as two huge shift registers, where each shift operation is a shift by one word length, typically 16 or 32 bits. The stacks are not deep: for the 16-bit version they are by default 15 deep for the data stack and 17 deep for the return stack. So these stacks are implemented in the FPGA logic fabric, not in block memory. This also means that there are no stack pointers, at least in the J1A version, so you don't know how deep you are in the stacks... The source code for the J1A is about 130 lines of Verilog. It is tiny. To my understanding it is inspired by the Novix NC4016. The J1 is an awesome project, and it comes with Swapforth already implemented. The basic J1 system for the BlackIce takes 1072 logic cells, so about one eighth of the total capacity. Not only do subroutine calls and pretty much every other instruction take 1 clock cycle, you can also fold certain operations, such as the subroutine return, into another instruction. Oh, and it runs at 48 MHz on the BlackIce-II. I did not try to optimize it. I think I also ported it over to the Pepino board, as a 32-bit version, along the lines of what James had done with his version for the Xilinx Spartan 6. No, but I guess I could set it up. I was playing with the Icestorm tools and used the J1 as the core to play with. I did not do much; my work amounted to merging the top level block from the BlackIce examples with the J1. I tested it with both place-and-route tools: arachne-pnr and the newer nextpnr. For the latter I had to study things a little to get the PLL done properly (the input clock is 100 MHz, which the PLL takes to 48 MHz).
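To illustrate the shift-register stack idea described above, here is a minimal Python sketch. It is not the J1 Verilog: the class name `ShiftStack`, the choice to refill the deepest cell with zero on a pop, and the small depth in the usage note are assumptions made for illustration only.

```python
class ShiftStack:
    """Fixed-depth stack modelled as a shift register, with no stack pointer."""

    def __init__(self, depth, width=16, fill=0):
        self.mask = (1 << width) - 1
        self.fill = fill & self.mask      # value shifted in at the bottom on a pop (assumed)
        self.cells = [self.fill] * depth  # cells[0] is the top of the stack

    def push(self, value):
        # Every word moves one position deeper; the deepest word is silently lost.
        self.cells = [value & self.mask] + self.cells[:-1]

    def pop(self):
        # Every word moves one position up; the bottom is refilled with the fill value.
        top = self.cells[0]
        self.cells = self.cells[1:] + [self.fill]
        return top
```

With `depth=3`, pushing 1, 2, 3, 4 and then popping three times gives 4, 3, 2: the first value pushed has fallen off the end, and nothing in the structure records that this happened. That is the practical consequence of having no stack pointer.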
  4. Thanks, that is a good comment. I have also used Forth to bring up hardware - the last project of this type was porting the J1 CPU to the BlackIce-II FPGA board. The J1 is essentially a Forth CPU. I'm tempted to add a co-processor system to my TI-99/4A FPGA system with this CPU. It is very compact and very fast. You probably already know about it. This could be used for example to aid debugging, to monitor TI-99/4A signals etc. To make it truly useful it would need to have some capability to interface with the TI's peripherals. On the other hand my next goal is to make my system more accessible by porting it to other low-cost and widely available boards. I'm trying to resist feature creep until then. https://excamera.com/sphinx/fpga-j1.html
  5. It was great to be able to use PeteE's software, I found and fixed two bugs: 1. Despite my "testing" there still was a bug with the treatment of ST1 (A> flag) with the ABS instruction. The processing completely lacked the special case that the ABS instruction sets ST1 based on the source argument. 2. SLA0 did not set the overflow flag properly if the shift count was greater than one. Fixing bug 1 got extended Basic fixed! So now I could resume what I was actually trying to implement, read access to the serial flash ROM chip. To my delight the code I had written worked, and I was able to access the serial flash ROM from Basic with a series of call load(...) and call peek(...) statements. I wish the Basic had direct support for hexadecimal numbers, both input and output. The Oric Atmos Basic features these and also DOKE and DEEK operations, which enable peeks and pokes but with 16-bit values... Anyway, with the bugs fixed, all the test cases pass now. It's great that this test is now also very easy to repeat whenever the CPU is updated.
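The two flag rules mentioned above can be sketched as a small software model. This is an illustration in Python, not the FPGA code: the function names are made up, the ABS carry is simply cleared (matching the TMS99105 and classic99 observation made elsewhere in these posts), and the overflow rules follow my reading of the TMS9900 documentation, so treat them as assumptions.

```python
def compare_to_zero(word):
    """Return the L>, A> and EQ flags for a 16-bit word compared with zero."""
    return {
        "L>": word != 0,          # logical (unsigned) greater than zero
        "A>": 0 < word < 0x8000,  # arithmetic (signed) greater than zero
        "EQ": word == 0,
    }


def abs16(src):
    """Model ABS: the comparison flags come from the source, not the result."""
    src &= 0xFFFF
    result = (-src) & 0xFFFF if src & 0x8000 else src
    flags = compare_to_zero(src)  # the special case: source operand is compared
    flags["C"] = False            # assumed: carry is always cleared
    flags["OV"] = src == 0x8000   # assumed: overflow only for the most negative value
    return result, flags


def sla16(value, count):
    """Model SLA: overflow is sticky if the sign bit changes at any shift step."""
    value &= 0xFFFF
    carry = overflow = False
    for _ in range(count):
        carry = bool(value & 0x8000)  # last bit shifted out
        shifted = (value << 1) & 0xFFFF
        overflow = overflow or bool((shifted ^ value) & 0x8000)
        value = shifted
    flags = compare_to_zero(value)
    flags["C"] = carry
    flags["OV"] = overflow
    return value, flags
```

For example, `abs16(0xFFFF)` returns a result of 1 with A> clear, because the source (-1) is not greater than zero. `sla16(0x4000, 2)` returns 0 with overflow set, even though the sign bit is the same before and after the whole shift; a check that only compares the first and last sign bit would miss this once the count is greater than one.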
  6. Thanks, this is awesome and extremely helpful to have an independent piece of verification code! I've not had time during the week to test this, but I am looking forward to doing so this evening. Hopefully something shows up immediately. Also, your testing methodology is better than my test code: I should also test the instructions twice, to make sure the flags go both ways properly. Thus I can improve my test coverage by making a simple modification. Perhaps I should also work on the test code to make it a cartridge, which could be useful to others too.
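One reading of "testing the instructions twice" is to run each case once with the status flags initially clear and once with them initially set, so that a flag which is only ever set (or only ever cleared) gets caught. The Python sketch below shows the idea on a MOV-like operation; the function names and the deliberately buggy `mov_sticky` model are hypothetical and not taken from PeteE's test code.

```python
L_GT, A_GT, EQ = 0x8000, 0x4000, 0x2000
AFFECTED = L_GT | A_GT | EQ  # flags a MOV-like operation is expected to rewrite


def compare_to_zero(value):
    """Flags produced by comparing a 16-bit word with zero."""
    value &= 0xFFFF
    if value == 0:
        return EQ
    if value & 0x8000:
        return L_GT          # non-zero but negative: logical greater only
    return L_GT | A_GT


def mov_good(value, status):
    """Reference model: affected flags are both set and cleared."""
    return (status & ~AFFECTED & 0xFFFF) | compare_to_zero(value)


def mov_sticky(value, status):
    """Deliberately buggy model: flags get set but are never cleared."""
    return status | compare_to_zero(value)


def flags_go_both_ways(op, value):
    """Run op from an all-clear and an all-set status; the affected flags must agree."""
    from_clear = op(value, 0x0000) & AFFECTED
    from_set = op(value, 0xFFFF) & AFFECTED
    return from_clear == from_set
```

Starting from a cleared status register, `mov_sticky(0, 0)` and `mov_good(0, 0)` give the same answer, so a single run cannot tell them apart. `flags_go_both_ways(mov_sticky, 0)` is false, which is what the second run buys.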
  7. Well, that was an interesting debugging session! In the end I understood that what I thought was a problem of computing subtraction incorrectly actually manifests itself in printing (and elsewhere too). Here is the problem under extended Basic, and below the explanation of how I got there. I still don't know what the offending CPU instruction is, but I am making progress. The process of finding the problem turned into a showcase of the FPGA system's features and of Stuart's cool LBLA / debugger module: Since I thought the problem was in the subtract operation, I studied the excellent TI Intern book, based on RXB's comment about the SSUB routine address. I wrote a simple Basic program: A=1 B=2 C=A-B and ran this under classic99, setting breakpoints at >D74 and >FA6 to see the contents of the scratchpad memory before and after the subtraction operation when running extended Basic. (I could have determined earlier that the problem cannot be in this ROM code, as it is shared with regular TI Basic, and that was working, but bear with me - these things only make sense once you know where the problem is not present). I could see the contents of the floating point accumulator at 834A (the value 1) and the argument at 835C (the value 2), and after the operation the floating point accumulator became negative. That makes sense. Next I wanted to verify if this is what happens with my FPGA CPU. This is where I got to use Stuart's cartridge and some features of the FPGA system. First, taking advantage of the fact that in the FPGA system ROM actually is RAM, I loaded Stuart's cartridge and modified the system ROM to call a subroutine at the beginning of the subtract operation (I added the BLWP @>1360 instruction). Notice that since I needed space for my intercepting subroutine call, I overwrote the NEG instruction at >D7C and moved the NEG @>834A instruction to the intercepted routine. I placed the subroutine at >1364, writing over cassette support code.
I then did the same operation again at the end of the floating point routine, at >FA6, this time moving the instruction MOV @>834A,R1 to the interception routine. The actual benefit of the intercept routines is that they copy the entire scratchpad memory to a safe place, before and after executing the Basic ROM's floating point subtract routine respectively. The FPGA system has 1 kilobyte of scratchpad memory instead of the regular 256 bytes, so I just copied the memory from 8300..83FF first to 8100..81FF and at the end to 8200..82FF. After making those patches to the system ROM, I copied the modified ROM to the PC's disk. I then initialized the FPGA system again, this time with the modified ROM but with the extended Basic cartridge inserted instead of Stuart's cartridge. Next I again performed my subtraction in Basic. After running that piece of Basic code, I just read back the two copies (before and after subtract) of scratchpad memory, and compared them. At this point I saw that the subtract had in fact executed correctly, and the problem manifests itself when printing negative numbers - the minus sign does not appear. The problem also occurs with other operations, since the cos and sin functions also have issues. I am very happy with the DMA feature of the FPGA system, as this enables me to read and write the TI clone's memory while the system is running - super handy for debugging. The same mechanism is used when the system is booted up from the PC (it can also boot from flash ROM). Now, after this debugging session, I know where the problem is not. Progress.
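The final comparison step, reading back the two scratchpad copies and looking for differences, could be done on the PC side with something like the Python sketch below. The function name `diff_scratchpad` is hypothetical, and the example bytes for the floating point accumulator are illustrative values rather than data captured from the session.

```python
def diff_scratchpad(before, after, base=0x8300):
    """List (address, old, new) for every byte that differs between two dumps.

    `before` and `after` are equal-length byte strings, e.g. the copies made
    at 8100..81FF and 8200..82FF; `base` maps offsets back to the original
    scratchpad addresses.
    """
    if len(before) != len(after):
        raise ValueError("snapshots must have the same length")
    return [
        (base + offset, old, new)
        for offset, (old, new) in enumerate(zip(before, after))
        if old != new
    ]


# Illustrative snapshots: only the first two bytes at 834A change.
before = bytearray(256)
after = bytearray(256)
before[0x4A:0x4C] = bytes([0x40, 0x01])  # assumed encoding of +1
after[0x4A:0x4C] = bytes([0xBF, 0xFF])   # assumed encoding of -1

for address, old, new in diff_scratchpad(before, after):
    print(f">{address:04X}: >{old:02X} -> >{new:02X}")
```

Printing only the changed addresses makes it quick to see whether the accumulator at 834A ended up with the expected sign, without reading through both 256-byte dumps by eye.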
  8. Some pictures from the debugging session to find out why extended BASIC does not work with negative numbers.
  9. I should have known better, thanks Stuart! Once again you've already done what I was looking for, this seems perfect! I am running out of time today on this project, need to continue tomorrow, first with your cartridge.
  10. Thanks for all the comments so far. I'll also post here a quick question on a different topic: when debugging hardware and the CPU, it would be convenient to have something akin to the minimemory and Line-by-Line-Assembler in ROM. (As an aside, I wish I had purchased mini memory as a kid instead of extended basic). I have already been using Easybug and the minimemory ROM & GROMs, but my FPGA config does not yet support the 4K RAM of minimemory, although that is trivial to add. If I add the RAM to the cartridge address space I can easily enough load LBLA, but I am wondering if there already is a cartridge ROM which would have this capability to be used with the 32K memory extension? Of course I could use E/A, which my system supports already, but I kind of like tweaking things LBLA-style, and in most cases when debugging and testing I am only interested in running very short quick and dirty bits of code.
  11. Thanks for the link! Also here abs just clears carry, and it is never set. I don't think the problem is abs, it is something else...
  12. Yes - reading the disassembly is one way to go and I may have to resort to that if nothing else helps. The NEG instruction does seem to work, and is included in my test cases. I tried many different ways of providing negative numbers, ranging from the likes you provided to trigonometric expressions (such as cos(3.141592) - but that also yields bogus results). With TI Basic these operations work, but with extended Basic I get bogus results. The extended basic has a whole lot more code in it, so it is not very surprising that it reveals problems. My test cases for machine code instructions are not comprehensive but I did extend coverage quite a bit today, including use of various addressing modes - although not for all instructions. I can use an earlier version of this same design with the TMS99105 CPU but using my FPGA code for the rest of the TI. That works, so the problem must be in the CPU. As an extreme measure, if I cannot come up with anything easier, I could record memory bus traces when using the TMS99105 and compare those with the FPGA CPU. Or I could add my CPU core to the TMS99105 design and run it with the same data that the TMS99105 is using, but that also requires a lot of work so I am trying to come up with something easier. Probably just many more test cases. A lot of software works correctly, such as the Megademo, and it has quite a bit of code in it, so I have a reasonable amount of confidence in the CPU core, but clearly something is not working. Perhaps rather than reading the disassembly I could copy bits of code from it and compare them between the TMS99105 and my CPU core.
  13. Thanks a lot, and special points for the very quick reply. The excerpt you provided was interesting, and I did add a whole bunch more test cases, but unfortunately I did not find the problem - yet. The source code will also be very useful, I'm sure, once I get a bit deeper into the rabbit hole.
  14. After a while, I've been hacking away at the TI-99/4A FPGA again today. I've been working on the CollectorVision Phoenix - it has been fun but is a little slow going, as the Atari core I am working on is not mine. It makes quite a big difference to work on a design you know inside out, as opposed to porting over code from someone else. I did some refactoring of the TI-99/4A VHDL code, separating out the external memory interface from the toplevel VHDL block, so that I can more easily adapt the design to other FPGA boards. As part of this process I wanted to enable direct execution of TMS9900 machine code from the FPGA's configuration flash ROM. This is a serial ROM chip, so reading it will be relatively slow, but that should be fine as the system runs too fast for legacy software anyway unless it is slowed down. Having this capability would enable the TI-99/4A core to run on many barebones FPGA boards, even without any external memory, as long as the FPGA has approximately the same capabilities as the XC6SLX9 I am using. When testing the hardware, I wanted to use extended Basic, but realized I have a bug in running extended Basic: I cannot enter negative numbers. Setting A=-1 for example always ignores the minus sign, and A becomes positive. I had similar problems earlier with the regular Basic, and tracked that down to the FPGA CPU's condition codes not working properly in certain cases. I thought I still might have that problem and ran my tests again. One overflow flag bug had crept in, and I also noticed that my ABS instruction implementation was sometimes setting the carry flag while a real CPU does not do it - at least the TMS99105 never sets carry when running ABS, and in the source code of classic99 the carry is also always cleared when running ABS. The data sheet is ambiguous here: it says ABS sets carry if there is a carry out from the ALU, but it appears that in practice it is always zero.
Anyway, now my test machine code program has exactly the same behavior as a real TMS99105 chip when running through test cases of the following instructions: A, S, SOC, SZC, DIV, MPY, C, NEG, SRL, ANDI, CB, SB, AB, XOR, INC, DEC, SLA, SRA, SRC, MOV, MOVB, SOCB, SZCB, ABS and X. For each of those my test software executes the operation with 16 different combinations of input parameter values, and comparing the results and the top 6 bits of the status register yields identical results. This of course is not a comprehensive test of all instructions, but the coverage is pretty good - pretty much all games and other software work. Nevertheless there is a bug somewhere still, hopefully in the CPU and not in timing. But the behavior is so consistent that I believe it is a CPU bug. So if anyone happens to know how extended Basic handles the minus sign, that would be greatly appreciated.
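The comparison method described above (16 input combinations per instruction, comparing the result and the top 6 status bits against a reference) can be sketched in Python as follows. This is not the actual test program, which is TMS9900 machine code; the names, the choice of four edge values giving 4 x 4 = 16 combinations, and the software model of the A (add) instruction are all assumptions for illustration.

```python
from itertools import product

STATUS_MASK = 0xFC00  # top six status bits: L>, A>, EQ, C, OV, OP
L_GT, A_GT, EQ, CARRY, OVF = 0x8000, 0x4000, 0x2000, 0x1000, 0x0800
EDGE_VALUES = (0x0000, 0x0001, 0x7FFF, 0x8000)  # assumed inputs: 16 pairs


def add16(a, b):
    """Software model of a 16-bit add returning (result, status)."""
    total = a + b
    result = total & 0xFFFF
    status = 0
    if result == 0:
        status |= EQ
    else:
        status |= L_GT if result & 0x8000 else L_GT | A_GT
    if total > 0xFFFF:
        status |= CARRY
    if ~(a ^ b) & (a ^ result) & 0x8000:  # same-sign inputs, different-sign result
        status |= OVF
    return result, status


def add16_no_overflow(a, b):
    """Deliberately broken candidate: never reports overflow."""
    result, status = add16(a, b)
    return result, status & ~OVF & 0xFFFF


def compare_cores(reference, candidate, values=EDGE_VALUES):
    """Return the input pairs where result or masked status differ."""
    mismatches = []
    for a, b in product(values, repeat=2):
        ref_result, ref_status = reference(a, b)
        got_result, got_status = candidate(a, b)
        if ref_result != got_result or (ref_status ^ got_status) & STATUS_MASK:
            mismatches.append((a, b))
    return mismatches
```

Here `compare_cores(add16, add16_no_overflow)` reports the four input pairs whose sum overflows, while comparing a model against itself reports nothing. On real hardware the `reference` side would be values captured from the TMS99105 rather than a software model.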
  15. This is a fun game. I barely resisted firing it up in classic99 (I did fire up classic99 but did not load the game) and instead ran it on my FPGA TI-99/4A for the first time. It is pretty hysterical when the CPU is running at 39x the normal speed. I was actually wondering why it is not running any faster than it is (which is very fast), but that is probably due to sound effects. I haven't looked and don't remember from the Basic manual, but I assume the call sound (is that the name of it) commands have a timing parameter which is probably tied to vertical frame sync in its implementation, and thus can slow down the FPGA system the same way as the real iron. Any timing based on loops would just run crazy fast, but the sound effect lengths seem the same when I run at maximum speed and when I run at "slow" speed. I also notice that my "slow" speed is not very slow at all anymore... I also found a bug/limitation in my setup: in my system I am using a PC keyboard and capturing the keypresses on my PC. I have a Windows program I wrote which I use to load ROMs etc. to the FPGA; this same program also captures key presses and sends them to the FPGA through USB, using my own serial protocol. Now the game expects all button presses to be in upper case, but I don't support caps lock, so I need to push shift while playing...