
pnr

Members
  • Content Count

    110
  • Joined

  • Last visited

Community Reputation

85 Excellent

About pnr

  • Rank
    Chopper Commander
  1. Two more uses of the 9900 chip range:
     - In Germany, minicomputer company "Kienzle" produced the MCS9000 range of minicomputers based on the 9900 and the 99105.
     - The Fluke 1722A "Instrument Controller" was effectively a 99105-based desktop computer.
  2. I'm not sure I understand the solution chosen, probably because I misunderstand the context. When testing an int variable for zero, why not use something like "mov *Rx,r0" - assuming that R0 is a scratch register? Or "mov *Rx,@0" - assuming location 0 is ROM? For a char variable "movb *Rx,r0" / "movb *Rx,@0" should work, no? In the alternative, I don't quite see the need for generating the dummy jump on each occasion. Would it not make more sense to define a zero word in the start-up code (e.g. named "czero") and generate a "cb @czero,R1" instead? As said, I probably misunderstand the context.
  3. That listing seems to have run out, but a similar one appears to be still open.
     - How hard is the interfacing to the SDRAM chip? Similar to SRAM, or does it require complex access code?
     - What is on the daughter board? It looks like 2 USB-serial ports, a VGA interface and some male/female headers. Is that correct?
     - Any documentation that came with the board? I did not see any link in the listing for which I/Os connect to what.
  4. Probably it is not quite what you guys are looking for, but vintage Unix has already been ported to the 99xx CPU: a port is available for the mini Cortex. Maybe Stuart still has a blank mini Cortex PCB available. As overlays were mentioned in this thread: its C compiler supports semi-automatic overlays. What goes into which overlay is user defined, but all other work is done by the compiler and linker (like generating thunks, etc.). However, I have not ported the support routines for that to the mini Cortex (yet). How vintage Unix did overlays is described here, in sections 6.4 and 6.5 (warning: the document is a 17MB PDF download). In short, the program code is split into a base section and several overlays. The base section is always loaded. Every function call or return that potentially crosses an overlay boundary checks the current overlay number and, if it differs from the desired one, the runtime support routines swap in the right one.
  5. Changed the topic title to "VHDL for 99xx chips", as the 9902 seems done now. Still more to do, like a 9901, a 9995 and a 99105, etc. I've started on doing a 99xxx CPU and have some first results: it runs EVMBUG on a Spartan6 FPGA (EVMBUG is basically TIBUG+LBLA, see Stuart's site for details). The design is based on that of the 99xxx as found in its datasheet, in particular the data path as described in Figure 3 and the microcode as described in Tables 18 and 19. So far I have only done 9900-level functionality in a 99xxx architecture. When it comes to the data path, the mapping from Figure 3 to my VHDL is:
     - IR maps to ir, PC to pc, WS to ws, ST to st
     - D and T map to t1
     - K maps to t2
     - MQ maps to t3
     - MA maps to ea
     - ALU maps to alu, BYTE SWAPPER to swapper
     - B BUS maps to alu_b, A BUS maps to alu_A, E BUS maps to alu_out
     - P BUS maps to ab, DI BUS maps to db_in
     - MICROCONTROLLER maps to sequencer, CROM maps to sequencer_pla
     - SHIFT COUNT maps to bitctr
     Note that Figure 3, although an abstraction, seems to derive directly from the chip layout: https://en.wikipedia.org/wiki/Texas_Instruments_TMS9900#/media/File:TI_TMS99105A_die.JPG The instruction decoder generates three starting points into the micro ROM: sig_ins, sig_op1 and sig_op2. Point sig_op1 is the code for a source operand fetch, point sig_op2 is the code for a destination fetch, and sig_ins is the actual instruction. The sequencer uses the three as needed, and there is no call stack in the sequencer.
     What I like about the current code:
     - More or less replicates the 99xxx design, incl. prefetch
     - Easy to develop into a 9995, and into a full 99xxx with moderate effort
     - Uses standard VHDL only (i.e. no vendor blocks)
     What I don't like about the current code:
     - The code for the micro ROM mixes the true ROM with bus multiplexers
     - The description of the st, t1 and t3 registers mixes basic storage/shifting with next-state logic
     - Some bits of logic are convoluted, e.g. the derivation of the flag values
     I guess the two underlying discomforts are that there seems to be some unneeded complexity, and that the code as written would not generate something resembling the real die when run through an ASIC synthesiser. tms99000_v1.vhd.txt
  6. Thanks for pointing out those gotchas! The subtraction thing is a clever implementation trick; I think I will try that in the core. The ALU already needs to support "A and not B" for instructions like COC and SZC, so it must have the complemented B available anyway. Doing subtract as "A + not B + 1", where the "+ 1" is done by setting 'carry in' to one, means that the ALU only needs to implement an adder and not a subtractor as well. The MDEX operating system ("CP/M for the Marinchip M9900") has a program "BRAINS" that does a memory and a CPU check. However, I only have the executable, not the source. I'm not sure how thorough it is; it may only check for some common failure cases (e.g. CRU drivers fried).
  7. For testing a CPU core, I'm looking for diagnostic routines that test the functionality of a 9900 CPU. Maybe this was developed for one of the emulators? Or were those debugged by just throwing a lot of programs at them and fixing bugs where they did not execute as expected? All suggestions welcome.
  8. Thanks speccery! I think that I can now claim that my 9902 design is "FPGA proven" :^) I hope it will be useful to folks like FarmerPotato, Ksarul etc. to create new things. Actually I can report a further success: I now have a simple 9900 system running on an FPGA, with no external components. I'm actually using the same prototyping board as speccery uses in the video (see picture). It has enough resources on the chip to emulate a system with 64KB of RAM/ROM. It consists of 5 files:
     - a small ROM with a test routine
     - a small RAM
     - the tms9902 (note: reconfigured to work with a 50MHz clock)
     - a tms9900 CPU, as developed by speccery for his retro-challenge project last year
     - a "breadboard" file that wires up the above components.
     All in all it is pretty much like the Grant Searle FPGA setup that started off this thread. I have attached the files for people who want to replicate the result. As before, the .txt extension has been added to make the forum happy. The test routine initializes the 9902 and proceeds to send a continuous stream of "Y" characters over the serial port. I've verified that this works. Taking into account speccery's result above, I'm confident that once I extend the RAM and ROM to bigger sizes and put one of Stuart's images in the ROM it would work, but I have not done that yet. The possibilities are endless. Besides all the "software breadboard" projects, it would be feasible to create ready-made files for a TI99/4A (speccery has actually already done that last year), for a Tomy Tutor, a Geneve, a Powertran Cortex, a Marinchip M9900 or even a TI990 mini. breadboard.vhd.txt rom.vhd.txt ram.vhd.txt tms9902.vhd.txt tms9900.vhd.txt
  9. Well perhaps, perhaps not -- maybe I sound more certain about things than I really am: I'm on a learning curve here and what I believe to be correct today may turn out to be a gross learner's mistake tomorrow; but putting "possibly", "hypothesis" or "current understanding" in every sentence gets to be a bit much. I (currently :^) think that this is a selectable option in ISE (the drop-down just below the implementation-simulation radio button). If one selects "Behavioural" then I think it simulates the code as written, with combinational logic assumed to happen instantly. The point of doing this, I think, is that one can set breakpoints in process blocks and get the main thrust of the model working. If one selects "Post place and route" then I think it simulates the circuit as the synthesised actual wires, LUTs and flip-flops, with their proper timing delays as part of the model. I haven't done any tests yet to truly figure this out. The old Atmel tool chain has this difference clearly outlined in its workflow; see figure 1 on page 3 in this document. Maybe ISE is very different from ProChip and I have it all backwards. Maybe the point about speedier simulation in some papers only applies to million-gate circuits. I find that simulating the 9902 for 1.5 million clock cycles takes less than a second. Also, "compiling" simulators are apparently 100 times faster than "interpreting" simulators, and the consideration may be outdated. As a beginner, I would not know.
  10. I'm running these tools in a Win7 VM on OSX, and it seems to work okay. Not snappy, but quite workable. Next to the full-bore tool chains, I've set up a tool chain based around the Textadept programmer's editor, the GHDL compiler/simulator and the gtkWave 'logic analyser'. On Linux this will give you a full VHDL compile/simulate tool chain in about 15MB. On Windows or OSX it is a bit bigger, as you have to load the GTK library as well (~15MB). I can certainly recommend that to people who just want to learn VHDL and are not ready for multi-GB installs. I guess that GHDL could be replaced by iVerilog for Verilog source code, and having both installed would add about 10MB to the total size (but I have not tried that yet). I had a closer look at the Altera simulation tools. They weigh in at about 3GB. In part that size seems driven by installing a lot of support code, among which is a full gcc-mingw install (>100MB). However, about two thirds seems to be files with detailed timing models for every component on every Altera FPGA (so there is a complete tree for each type/package/speed combination). You need that info to simulate the circuit in its synthesised form. That's many thousands of small files, adding up to >2GB. They could make the install a lot smaller by only installing the files for devices that the user is interested in (e.g. by download on demand). If they put all components for a device family in e.g. an sqlite database instead of separate files it would be more manageable and probably run faster. On the Xilinx side it is probably much the same story.
  11. Some further progress with the 9902 in VHDL. The code is feature complete now, and passes an extensive test set. In the main source there are the changes below:
     - added code to implement the "test mode" feature
     - moved from the deprecated std_logic_unsigned library to the recommended numeric_std library
     - fixed a bug in the receive/transmit bit timers (the "/8" bit was not implemented right)
     - some code cleanup
     - some tweaks to optimise the synthesised circuit
     The test bench is much extended now, with about 150 tests (not exhaustive yet, but broad coverage nonetheless). The 9902 code passes all tests on all of Xilinx, Altera and GHDL. Note that this is simulation of the original source code; I have not attempted simulation of the synthesised circuit yet. When I look at the synthesis reports of Xilinx and Altera I can see that the circuit is by and large extracted as I would expect (although some things I don't understand yet). There are some interesting differences though. Xilinx comes up with 158 flip-flops, which is what I would expect; Altera reports 162. The difference turns out to be that Xilinx uses binary encoding for the FSMs and Altera one-hot encoding. Also, Xilinx recognises the timer circuit as an FSM, whereas Altera only recognises the transmit and receive FSMs. In both tool chains the full 9902 circuit requires about 400 LUT/LE blocks on the FPGA. As an experiment I also synthesised the code for a MAX7160 CPLD. This time the Altera software picks binary encoding for the FSMs and arrives at the expected 158 flip-flops. However, the combinational logic does not seem to fit behind these, and a further 55 macrocells are needed to fit the circuit, for a total of 213. As there are only 160 on the device it does not fit. I wonder if hand-coding in CUPL could make it fit, but I expect it won't be possible and it is too large a job to even try. tb_9902_v2.vhd.txt tms9902_v6.vhd.txt
  12. Hi Matthew180, below are some thoughts on the interesting points you raised. I don't have direct knowledge of how HDL synthesisers work either, but I did enough reading to have a basic mental model of how they work. I think it is like this:
     - First the source code is processed into a syntax tree (annotated by a symbol/signal table), like with any compiler. This part consists of well-known lexer/parser algorithms.
     - Then it proceeds to analyse the syntax tree, essentially converting the concurrent statements into logic formulas (net lists, actually). It does the same for process blocks by first doing a flow analysis and then extracting the logic when possible (and for pure combinational code this should always be possible).
     - As part of the above it will try to find constructs that signify registers, adders, shifters and multiplexers. Registers can be synthesised onto macrocell flip-flops, and adders/shifters can make use of the dedicated line between adjacent cells for the carry signal. It appears that an effort is also made to recognise FSMs, although I'm not sure what the specific benefit is. The recognition of these things seems based on simple pattern recognition, with certain fixed idioms (code patterns) recognised to mean certain things. In order to be sure, the HDL developer must stay close to these standard idioms. If, for example, the fixed idioms are mixed too much with 'random logic', the recognition gets confused and the generated circuit will be confused as well. This is what I meant by "synthesiser voodoo". It is my guess that this part of the process is far more heuristic (and possibly simpler) than one might imagine.
     - After everything has been processed into net lists, several optimisation passes are run to minimise the logic and to eliminate dead code. Finally, the simplified result is matched against a library of standard cells to find the optimal allocation (see e.g. this). This step seems conceptually similar to a regular compiler finding an optimal covering of abstract operations with actual CPU instructions. Here my guess is that this part is pretty advanced, with good theoretical underpinnings. I was surprised that my very partial v1 source for the 9902 synthesised down to 6 flip-flops, as all else was eliminated as unreachable.
     - In the last step the library components are placed on the chip and the wires routed. For an FPGA, with its set structure, this is perhaps a bit easier than for an ASIC back-end. The latter may be as complex as automated PCB design.
     As said, I have no direct knowledge, but that is my current understanding of roughly how it works. Perhaps I should have said early 1980's feeling. Don't take this too seriously, it is not intended that way. I think the similarity for me is that back then C programs would be bigger and slower than hand-coded assembler, and that could be important on a slow machine with a 64KB address space. Despite that being the case, C was still preferred because of the advantages that it offered: portability and a much higher abstraction level. In the case of CPLDs one could think of writing logic formulas in e.g. CUPL as the equivalent of assembler. Working at this level you can still control every detail of how to fit a circuit onto a small device. Of course, this is only workable for small designs -- the upper limit is perhaps 100 flip-flops. For larger, more complex systems it is clearly a non-scalable dead end. My memory of the early 80's is that most C tool chains had bugs. It was part of life then for C compilers to mistranslate certain less common constructs, or to generate a complex instruction sequence where a simple one would have worked. Black belt programmers would inspect the assembler output of the compiler and tweak the C code to get the translation they wanted. For critical code they sometimes wrote scripts to massage the compiler's assembler output.
It is different, but for me reminiscent of how sometimes HDL code that works in simulation apparently has to be tweaked to also work (efficiently) on a real FPGA. I think the non-existence of leading open source FPGA tools might have more to do with the small group of developers who are all of (i) capable software engineers, (ii) capable FPGA engineers and (iii) interested in working on an open source HDL tool chain. The vendors keeping bit stream formats secret is no help either. I suppose they have a valid commercial interest in doing so, both creating customer lock-in and protecting against reverse engineering of designs. Having leading open source compilers for programming languages was not always the case. Let's look at C. Up to the late 70's system software was typically a service item bundled with the hardware, and customers often had access to source code. When this changed around 1980, compilers became closed source and at the time were not easy to write. For example, the original C compiler by Dennis Ritchie had some 13,000 lines of code, and the Unix V6 core was only some 7,000 lines. That size was "big" for most programmers back then. In the early 80's I think there were only two open source C compilers: Small C (some 6,000 lines) and cc68K (also some 6,000 lines); the latter would not run on 16-bit hardware. Minix source code was open (though not in the modern sense), but the tool chain (the "Amsterdam Compiler Kit") was not. Similarly, Niklaus Wirth published the source to his Pascal-S system, but the descendant UCSD Pascal system had no source openly available. Only after the initial 1.0 release of gcc (in 1989, about 100K lines) did the context start to change, and gcc over the next 10-20 years rose to dominance. I think today the C compiler space is essentially gcc and llvm, with scarcely a proprietary player left. Speccery pointed me in the direction of the IceStorm project.
That project may currently be experiencing its own "1989" moment, and who knows what will happen in the next 10-20 years in the HDL synthesis space? As said, I'm of a similar mind. At the same time, the bloat of these systems is a pet peeve -- and it actually kept me from installing any of it for a very long time. Xilinx ISE is 17GB and 240 thousand files; Altera Quartus is 9GB and 150 thousand files. I'm sure that a small business / hobby version could fit in well under 1GB. I think that the IceStorm tool chain, when combined with a good simulator and graphical front end, fits in less than 100MB.
  13. I'm doing some cleanup of the 9902 code and here's a question. Take this code for the clock divider (it is around line 200 in the full source file) and let's call it version A:

     clkdiv: process(CLK, clkctr_q, ctl_q)
       variable v : std_logic_vector(1 downto 0);
     begin
       v := clkctr_q;
       if rising_edge(CLK) then
         v := v + 1;
         if ctl_q.clk4m='0' and v="10" then
           v := "11";
         end if;
         clkctr_q <= v;
       end if;
     end process;

     bitclk <= '1' when clkctr_q="00" else '0';

     And let's call this slightly modified version, with the assignment to clkctr_q moved outside the clocked region, version B:

     clkdiv: process(CLK, clkctr_q, ctl_q)
       variable v : std_logic_vector(1 downto 0);
     begin
       v := clkctr_q;
       if rising_edge(CLK) then
         v := v + 1;
         if ctl_q.clk4m='0' and v="10" then
           v := "11";
         end if;
       end if;
       clkctr_q <= v;
     end process;

     bitclk <= '1' when clkctr_q="00" else '0';

     The Xilinx ISE tool chain accepts both versions with a warning that "variable 'v' does not keep its value outside rising_edge(CLK)". Both versions simulate okay. The Altera Quartus tool chain accepts version A without warning or error, and flags version B as an error (with a similar error text). Version A simulates okay. In my limited understanding of VHDL both versions should be okay and should not give a warning or error. What subtle aspect of VHDL am I missing? (PS: of course I can rewrite it to not have a warning or error, but I'm interested to understand why this is wrong and why the two tool chains have different opinions about it.)
  14. I've been thinking about this for a bit. The Free Range VHDL book talks about three common coding styles: data-flow, behavioural and structural (see chapter five). I think the difference may be that you are using a style that is mostly data-flow oriented and I'm using a style that is mostly behaviour oriented. The third style, structural, is what Grant Searle is doing in his top level file: connecting up components as if it were a schematic. I'm not sure these terms for coding styles are in common use, but does that make sense? The funny thing about it is that when I was reading up on VHDL in the past few months, I had decided to focus on a data-flow oriented style, as I imagined that would give me the maximum control over what the synthesiser would generate. Now that I look back on the 9902 project I see that I ended up doing mostly behavioural stuff without that ever being a conscious decision. By the way, I find the explanation of process blocks in chapter 5 confusing. It explains them in terms of code being executed sequentially, and for a while I imagined that the synthesised circuit would have interlock flip-flops to enforce that in real hardware. Only much later did I realise that such language refers to a context where the VHDL is being executed by the simulator. The synthesiser simply does a flow analysis (like a regular optimising compiler would) and extracts the equivalent logic formulas (and where it can't, the block is non-synthesisable). I'm happy that these tools exist too, and I shouldn't look a gift horse in the mouth. You raise several interesting points. I will come back with a few thoughts on that "1970's feeling" later -- I'm a tad busy this week.
  15. Well, the tests went a lot quicker than I thought. I had to spend a bit of time learning to work with the tools, but the actual tests went much better than I had anticipated. The receive shift register shifted the wrong way 'round, and in the rcv/xmt state machines some timers were off by one (or more accurately: my thinking on what happens at a state transition was a bit muddled). All in all, very few changes versus the v4 source. I've also made the change to run the 9902 at the global clock speed, with just the timers running at 1MHz. The clock divider for this is currently just /3 or /4 (as in a real 9902), but making it /50 or /100 is a trivial change. All of the timer, the transmitter and the receiver (along with all the associated configuration and status bits) appear to work in simulation. Making it "FPGA proven" will be next. I've also attached the source for the test bench. It is pretty basic now, but with the infrastructure done it will not be hard to make a real test suite that covers all chip modes and functions. If anybody wants to help a bit with defining test cases, I'd be much obliged. There are still a few tweaks and clean-up items to do in the source, but I'd say that having a VHDL 9902 for use in a Grant Searle style "soft breadboard" is 95% done (unless speccery's integration test brings up major new issues). tms9902_v5.vhd.txt tb_9902_v1.vhd.txt