Everything posted by pnr

  1. Probably it is not quite what you guys are looking for, but vintage Unix has already been ported to the 99xx CPU: a port is available for the mini Cortex. Maybe Stuart still has a blank mini Cortex PCB available.

     As overlays were mentioned in this thread: its C compiler supports semi-automatic overlays. What goes into which overlay is user defined, but all other work is done by the compiler and linker (like generating thunks, etc.). However, I have not ported the support routines for that to the mini Cortex (yet).

     How vintage Unix did overlays is described here, in sections 6.4 and 6.5 (warning: document is a 17M pdf download). In short, the program code is split into a base section and several overlays. The base section is always loaded. Every function call or return that potentially crosses an overlay boundary checks the current overlay number, and if it differs from the desired one the runtime support routines swap in the right one.
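The overlay check described above can be sketched as a small Python model. This is only an illustration of the idea, not the vintage Unix implementation: the class `OverlayManager`, its fields and the overlay numbers are all made up for this sketch, and "loading" an overlay is reduced to bumping a counter.

```python
class OverlayManager:
    """Toy model of call/return thunks that swap overlays on demand.

    Overlay number 0 stands for the always-resident base section.
    All names here are hypothetical and exist only for this sketch.
    """

    def __init__(self):
        self.loaded = None    # overlay currently sitting in the overlay area
        self.swaps = 0        # how many times an overlay had to be swapped in
        self.running = [0]    # overlay numbers of the active functions

    def _ensure(self, overlay):
        # The base section never needs loading; an overlay is only
        # swapped in when it differs from the one already loaded.
        if overlay != 0 and overlay != self.loaded:
            self.loaded = overlay
            self.swaps += 1

    def call(self, overlay, func, *args):
        # Call thunk: make sure the callee's overlay is present.
        self._ensure(overlay)
        self.running.append(overlay)
        try:
            return func(*args)
        finally:
            # Return thunk: make sure the caller's overlay is back.
            self.running.pop()
            self._ensure(self.running[-1])


mgr = OverlayManager()

def leaf(x):              # pretend this lives in overlay 2
    return x + 1

def middle(x):            # pretend this lives in overlay 1
    return mgr.call(2, leaf, x) * 2

result = mgr.call(1, middle, 3)   # called from the base section
```

In this run overlay 1 is loaded, then overlay 2 for the nested call, then overlay 1 again on return, so three swaps in total. Returning to the base section costs nothing, which is why frequently used code is best kept there.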
  2. Changed the topic title to "VHDL for 99xx chips", as the 9902 seems done now. Still more to do, like a 9901, a 9995 and a 99105, etc.

     I've started on doing a 99xxx CPU and have some first results: it runs EVMBUG on a Spartan6 FPGA (EVMBUG is basically TIBUG+LBLA, see Stuart's site for details). The design is based on that of the 99xxx as found in its datasheet, in particular the data path as described in Figure 3 and the microcode as described in Table 18 and Table 19. So far I have only done 9900 level functionality in a 99xxx architecture.

     When it comes to the data path the mapping from Figure 3 to my VHDL is:
     - IR maps to ir, PC to pc, WS to ws, ST to st
     - D and T map to t1
     - K maps to t2
     - MQ maps to t3
     - MA maps to ea
     - ALU maps to alu, BYTE SWAPPER to swapper
     - B BUS maps to alu_b, A BUS maps to alu_A, E BUS maps to alu_out
     - P BUS maps to ab, DI BUS maps to db_in
     - MICROCONTROLLER maps to sequencer, CROM maps to sequencer_pla
     - SHIFT COUNT maps to bitctr

     Note that Figure 3, although an abstraction, seems to derive directly from the chip layout: https://en.wikipedia.org/wiki/Texas_Instruments_TMS9900#/media/File:TI_TMS99105A_die.JPG

     The instruction decoder generates three starting points into the micro rom: sig_ins, sig_op1 and sig_op2. Point sig_op1 is the code for a source operand fetch, point sig_op2 is the code for a destination fetch and sig_ins is the actual instruction. The sequencer uses the three as needed, and there is no call stack in the sequencer.

     What I like about the current code:
     - More or less replicates the 99xxx design, incl. prefetch
     - Easy to develop into a 9995, and into a full 99xxx with moderate effort
     - Uses standard VHDL only (i.e. no vendor blocks)

     What I don’t like about the current code:
     - The code for the micro rom mixes the true rom with bus multiplexers
     - The description of the st, t1 and t3 registers mixes basic storage/shifting with next state logic
     - Some bits of logic are convoluted, e.g. the derivation of the flag values

     I guess the two underlying discomforts are that there seems to be some unneeded complexity and that the code as written would not generate something resembling the real die when run through an ASIC synthesiser.

     tms99000_v1.vhd.txt
  3. Thanks for pointing out those gotchas!

     The subtraction thing is a clever implementation trick, I think I will try that in the core. The ALU already needs to support "A and not B" for instructions like COC and SZC, so it must have the complemented B available anyway. Doing subtract as "A + not B + 1", where the "+ 1" is done by setting 'carry in' to one, means that the ALU only needs to implement an adder and not a subtractor as well.

     The MDEX operating system ("CP/M for the Marinchip M9900") has a program "BRAINS" that does a memory and a CPU check. However, I only have the executable, not the source. I'm not sure how thorough it is; it may only check for some common failure cases (e.g. CRU drivers fried).
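The "A + not B + 1" trick can be checked with a few lines of Python. This is a minimal sketch of a 16-bit adder-only ALU; the function names are invented for the illustration and nothing here is taken from the actual core.

```python
MASK = 0xFFFF  # 16-bit data path

def alu_add(a, b, carry_in):
    """The only arithmetic primitive: a 16-bit adder with carry in/out."""
    total = (a & MASK) + (b & MASK) + carry_in
    return total & MASK, total >> 16      # (result, carry out)

def alu_and_not(a, b):
    """'A and not B', which needs the complemented B operand anyway."""
    return a & ~b & MASK

def alu_sub(a, b):
    """Subtract as A + not B + 1: reuse the adder with carry in set to one."""
    return alu_add(a, ~b & MASK, 1)

diff, carry = alu_sub(5, 3)       # 5 - 3
wrap, borrow = alu_sub(3, 5)      # 3 - 5 wraps around modulo 2**16
```

Note that in this model the carry out of a subtraction is 1 when no borrow occurs and 0 when the result wraps, which is a direct consequence of building the subtraction from the adder.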
  4. For testing a CPU core, I'm looking for diagnostic routines that test the functionality of a 9900 CPU. Maybe this was developed for one of the emulators? Or were those debugged by just throwing a lot of programs at them and fixing bugs where they did not execute as expected? All suggestions welcome.
  5. Thanks speccery! I think that I can now claim that my 9902 design is "FPGA proven" :^) I hope it will be useful to folks like FarmerPotato, Ksarul etc. to create new things.

     Actually I can report a further success: I now have a simple 9900 system running on an FPGA, with no external components. I'm actually using the same prototyping board as speccery uses in the video (see picture). It has enough resources on the chip to emulate a system with 64KB of RAM/ROM.

     It consists of 5 files:
     - a small ROM with a test routine
     - a small RAM
     - the tms9902 (note: reconfigured to work with a 50 MHz clock)
     - a tms9900 CPU, as developed by speccery for his retro-challenge project last year
     - a "breadboard" file that wires up the above components.

     All in all it is pretty much like the Grant Searle FPGA setup that started off this thread. I have attached the files for people that want to replicate the result. As before, the .txt extension has been added to make the forum happy.

     The test routine initializes the 9902 and proceeds to send a continuous stream of "Y" characters over the serial port. I've verified that this works. Taking into account speccery's result above, I'm confident that once I extend the RAM and ROM to bigger sizes and put one of Stuart's images in the ROM it would work, but I have not done that yet.

     The possibilities are endless. Besides all the "software breadboard" projects, it would be feasible to create ready made files for a TI99/4A (speccery has actually already done that last year), for a Tomy Tutor, a Geneve, a Powertran Cortex, a Marinchip M9900 or even a TI990 mini.

     breadboard.vhd.txt rom.vhd.txt ram.vhd.txt tms9902.vhd.txt tms9900.vhd.txt
  6. Well perhaps, perhaps not -- maybe I sound more certain about things than I really am: I'm on a learning curve here and what I believe to be correct today, may turn out a gross learner's mistake tomorrow; but putting "possibly", "hypothesis" or "current understanding" in every sentence gets to be a bit much. I (currently :^) think that this is a selectable option in ISE (the drop box just below the implementation-simulation radio button). If one selects "Behavioural" then I think it simulates the code as written, with combinational logic assumed to happen instantly. The point of doing this I think is that one can set breakpoints in process blocks, and get the main thrust of the model working. If one selects "Post place and route" then I think it simulates the circuit as the synthesised actual wires, LUT's and flip-flops, with their proper timing delays as part of the model. Haven't done any tests yet to truly figure this out. The old Atmel tool chain has this difference clearly outlined in its workflow. See figure 1 on page 3 in this document. Maybe ISE is very different from ProChip and I have it all backwards. Maybe the point about speedier simulation in some papers only applies to million gate circuits. I find that simulating the 9902 for 1.5 million clock cycles takes less than a second. Also, "compiling" simulators are apparently 100 times faster than "interpreting" simulators and the consideration may be outdated. As a beginner, I would not know.
  7. I'm running these tools in a Win7-VM on OSX, and it seems to work okay. Not snappy, but quite workable.

     Next to the full bore tool chains, I've set up a tool chain based around the Textadept programmer's editor, the GHDL compiler/simulator and the GTKWave 'logic analyser'. On Linux this will give you a full VHDL compile/simulate tool chain in about 15MB. On Windows or OSX it is a bit bigger, as you have to load the GTK library as well (~15MB). I can certainly recommend that to people who just want to learn VHDL and are not ready for multi-GB installs. I guess that GHDL could be replaced by iVerilog for Verilog source code and having both installed would add about 10MB to the total size (but I have not tried that yet).

     I had a closer look at the Altera simulation tools. They weigh in at about 3GB. In part that size seems driven by installing a lot of support code, among which is a full gcc-mingw install (>100MB). However, two thirds seems to be files with detailed timing models for every component on every Altera FPGA (so there is a complete tree for each type/package/speed combination). You need that info to simulate the circuit in its synthesised form. That's many thousands of small files, adding up to >2GB. They could make the install a lot smaller by only installing the files for devices that the user is interested in (e.g. by download on demand). If they put all components for a device family in e.g. a sqlite database instead of separate files it would be more manageable and probably run faster. On the Xilinx side it is probably much the same story.
  8. Some further progress with the 9902 in VHDL. The code is feature complete now, and passes an extensive test set.

     In the main source there are the below changes:
     - added code to implement the "test mode" feature
     - moved from the deprecated std_logic_unsigned package to the recommended numeric_std package
     - fixed a bug in the receive/transmit bit timers (the "/8" bit was not implemented right)
     - some code cleanup
     - some tweaks to optimise the synthesised circuit

     The test bench is much extended now, with about 150 tests (not exhaustive yet, but broad coverage nonetheless). The 9902 code passes all tests on all of Xilinx, Altera and GHDL. Note that this is simulation of the original source code; I have not attempted simulation of the synthesised circuit yet.

     When I look at the synthesis report of Xilinx and Altera I can see that the circuit is by and large extracted as I would expect (although some things I don't understand yet). There are some interesting differences though. Xilinx comes up with 158 flip-flops, which is what I would expect; Altera reports 162. The difference turns out to be that Xilinx uses binary encoding for the FSMs and Altera one-hot encoding. Also, Xilinx recognises the timer circuit as an FSM, whereas Altera only recognises the transmit and receive FSMs. In both tool chains the full 9902 circuit requires about 400 LUT/LE blocks on the FPGA.

     As an experiment I also synthesised the code for a MAX7160 CPLD. This time the Altera software picks binary encoding for the FSMs and arrives at the expected 158 flip-flops. However, the combinational logic does not seem to fit behind these, and a further 55 macrocells are needed to fit the circuit, for a total of 213. As there are only 160 on the device it does not fit. I wonder if hand-coding in CUPL could make it fit, but I expect it won't be possible and it is too large a job to even try.

     tb_9902_v2.vhd.txt tms9902_v6.vhd.txt
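Why the choice of FSM encoding changes the flip-flop count can be shown with a short calculation. This is a generic sketch, not a reconstruction of what either tool did with the 9902 source: the six state names are a made-up example and the helper functions are hypothetical.

```python
def binary_flipflops(n_states):
    """Flip-flops for a binary-encoded state register: ceil(log2(n)), at least 1."""
    return max(1, (n_states - 1).bit_length())

def one_hot_flipflops(n_states):
    """Flip-flops for a one-hot state register: one per state."""
    return n_states

# Hypothetical six-state machine, used purely as an illustration.
states = ["IDLE", "BREAK", "START", "BITS", "PARITY", "STOP"]

binary = binary_flipflops(len(states))
one_hot = one_hot_flipflops(len(states))
extra = one_hot - binary   # flip-flops added by choosing one-hot
```

For a six-state machine one-hot costs three flip-flops more than binary, in exchange for simpler next-state decoding, so a handful of small FSMs is enough to explain a difference of a few flip-flops between two synthesis reports.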
  9. Hi Matthew180,

     Below some thoughts on the interesting points you raised.

     I don’t have direct knowledge of how HDL synthesisers work either, but I did enough reading to have a basic mental model of how they work. I think it is like this:
     - First the source code is processed into a syntax tree (annotated by a symbol/signal table), like with any compiler. This part consists of well known lexer/parser algorithms.
     - Then it proceeds to analyse the syntax tree, essentially converting the concurrent statements into logic formulas (net lists, actually). It does the same for process blocks by first doing a flow analysis and then extracting the logic when possible (and for pure combinational code this should always be possible).
     - As part of the above it will try to find constructs that signify registers, adders, shifters and multiplexers. Registers can be synthesised onto macrocell flip-flops and adders/shifters can make use of the dedicated line between adjacent cells for the carry signal. It appears that an effort is also made to recognise FSMs, although I’m not sure what the specific benefit is. The recognition of these things seems based on simple pattern recognition, with certain fixed idioms (code patterns) recognised to mean certain things. In order to be sure, the HDL developer must stay close to these standard idioms. If for example the fixed idioms are mixed too much with ‘random logic’, the recognition gets confused and the generated circuit will be confused as well. This is what I meant by “synthesiser voodoo”. It is my guess that this part of the process is far more heuristic (and possibly simpler) than one might imagine.
     - After everything has been processed into net lists, several optimisation processes are run to minimise the logic and to eliminate dead code. Finally, the simplified result is matched against a library of standard cells to find the optimal allocation (see e.g. this). This step seems conceptually similar to a regular compiler finding an optimal covering of abstract operations with actual CPU instructions. Here my guess is that this part is pretty advanced, with good theoretical underpinnings. I was surprised that my very partial v1 source for the 9902 synthesised down to 6 flip-flops as all else was eliminated as unreachable.
     - In the last step the library components are placed on the chip and wires routed. For an FPGA, with its set structure, this is perhaps a bit easier than for an ASIC back-end. The latter may be as complex as automated PCB design.

     As said, I have no direct knowledge, but it is my current understanding of roughly how it works.

     Perhaps I should have said early 1980’s feeling. Don’t take this too seriously, it is not intended that way. I think the similarity for me is that back then C programs would be bigger and slower than hand-coded assembler and that could be important on a slow machine with 64KB address space. Despite that being the case, C was still preferred because of the advantages that it offered: portability and a much higher abstraction level. In the case of CPLDs one could think of writing logic formulas in e.g. CUPL as the equivalent of assembler. Working at this level you can still control every detail of how to fit a circuit onto a small device. Of course, this is only workable for small designs: the upper limit is perhaps 100 flip-flops. For larger, more complex systems it is clearly a non-scalable dead end.

     My memory of the early 80’s is that most C tool chains had bugs. It was part of life then for C compilers to mistranslate certain less common constructs, or to generate a complex instruction sequence where a simple one would have worked. Black belt programmers would inspect the assembler output of the compiler and tweak the C code to get the translation they wanted. For critical code they sometimes wrote scripts to massage the compiler's assembler output. It is different, but for me reminiscent of how sometimes HDL code that works in simulation apparently has to be tweaked to work (efficiently) on a real FPGA as well.

     I think the non-existence of leading open source FPGA tools might have more to do with the small group of developers that are all of (i) capable software engineers, (ii) capable FPGA engineers and (iii) interested in working on an open source HDL tool chain. The vendors keeping bit stream formats secret is no help either. I suppose they have a valid commercial interest in doing so, both creating customer lock-in and protecting against reverse engineering of designs.

     Having leading open source compilers for programming languages was not always the case. Let’s look at C. Up to the late 70’s system software was typically a service item with the hardware and customers often had access to source code. When this changed around 1980, compilers became closed source and at the time were not easy to write. For example, the original C compiler by Dennis Ritchie had some 13,000 lines of code and the Unix V6 core was only some 7,000 lines. That size was “big” for most programmers back then. In the early 80’s I think there were only two open source C compilers: Small C (some 6,000 lines) and cc68K (also some 6,000 lines); the latter would not run on 16 bit hardware. Minix source code was open (though not in the modern sense), but the tool chain (the “Amsterdam Compiler Kit”) was not. Similarly, Niklaus Wirth published the source to his Pascal-S system, but the descendant UCSD Pascal system had no source openly available. Only after the initial 1.0 release of gcc (in 1987, about 100K lines) did the context start to change, and gcc over the next 10-20 years rose to dominance. I think today the C compiler space is essentially gcc and llvm, with scarcely a proprietary player left.

     Speccery pointed me in the direction of the IceStorm project. That project may currently be experiencing its own “1987” moment and who knows what will happen in the next 10-20 years in the HDL synthesis space?

     As said, I'm of a similar mind. At the same time, the bloat of these systems is a pet peeve -- and actually kept me from installing any of it for a very long time. Xilinx ISE is 17GB and 240 thousand files, Altera Quartus is 9GB and 150 thousand files. I'm sure that a small business / hobby version could fit in well less than 1GB. I think that the IceStorm tool chain, when combined with a good simulator and graphical front end, fits in less than 100MB.
  10. I'm doing some cleanup of the 9902 code and here's a question.

      Take this code for the clock divider (it is around line 200 in the full source file) and let's call it version A:

          clkdiv: process(CLK, clkctr_q, ctl_q)
            variable v : std_logic_vector(1 downto 0);
          begin
            v := clkctr_q;
            if rising_edge(CLK) then
              v := v + 1;
              if ctl_q.clk4m='0' and v="10" then v:="11"; end if;
              clkctr_q <= v;
            end if;
          end process;

          bitclk <= '1' when clkctr_q="00" else '0';

      And let's call this slightly modified version, where the signal assignment has moved outside the clocked if statement, version B:

          clkdiv: process(CLK, clkctr_q, ctl_q)
            variable v : std_logic_vector(1 downto 0);
          begin
            v := clkctr_q;
            if rising_edge(CLK) then
              v := v + 1;
              if ctl_q.clk4m='0' and v="10" then v:="11"; end if;
            end if;
            clkctr_q <= v;
          end process;

          bitclk <= '1' when clkctr_q="00" else '0';

      The Xilinx ISE tool chain accepts both versions with a warning that "variable 'v' does not keep its value outside rising edge(CLK)". Both versions simulate okay.

      The Altera Quartus tool chain accepts version A without warning or error, and flags version B as an error (with a similar error text). Version A simulates okay.

      In my limited understanding of VHDL both versions should be okay and not give a warning or error. What subtle aspect of VHDL am I missing?

      (PS: of course I can rewrite it to not have a warning or error, but I'm interested to understand why this is wrong and why the two tool chains have different opinions about it.)
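For readers who want to see what the divider is meant to do, here is a small Python model of the counter in version A. It only mimics the intended clocked behaviour (one update per rising edge) and says nothing about the warning itself; the function names are invented for the sketch.

```python
def clkdiv_next(ctr, clk4m):
    """Next value of the 2-bit counter, following the logic in version A."""
    v = (ctr + 1) & 0b11          # v := v + 1 on a 2-bit vector
    if clk4m == 0 and v == 0b10:  # skip state "10" when clk4m is '0'
        v = 0b11
    return v

def bitclk_gaps(clk4m, cycles=24):
    """Distances (in CLK cycles) between successive bitclk pulses."""
    ctr = 0
    pulses = []
    for t in range(cycles):
        if ctr == 0:              # bitclk <= '1' when clkctr_q = "00"
            pulses.append(t)
        ctr = clkdiv_next(ctr, clk4m)
    return {b - a for a, b in zip(pulses, pulses[1:])}

gaps_clk4m_off = bitclk_gaps(0)   # counter runs 00, 01, 11, 00, ...
gaps_clk4m_on = bitclk_gaps(1)    # counter runs 00, 01, 10, 11, 00, ...
```

With clk4m at '0' the counter skips one state and bitclk pulses every third clock; with clk4m at '1' it pulses every fourth clock.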
  11. I've been thinking about this for a bit. The Free Range VHDL book talks about three common coding styles: data-flow, behavioural and structural (see chapter five). I think the difference may be that you are using a style that is mostly data-flow oriented and I'm using a style that is mostly behavioural oriented. The third style, structural, is what Grant Searle is doing in his top level file: connecting up components as if it is a schematic. I'm not sure these terms for coding styles are in common use, but does that make sense?

      The funny thing about it is that when I was reading up on VHDL in the past few months, I had decided to focus on a data-flow oriented style, as I imagined that would give me the maximum control over what the synthesiser would generate. Now that I look back on the 9902 project I see that I ended up doing mostly behavioural stuff without that ever being a conscious decision.

      By the way, I find the explanation of process blocks in chapter 5 confusing. It explains it in terms of code being executed sequentially, and for a while I imagined that the synthesised circuit would have interlock flip-flops to enforce that in real hardware. Only much later I realised that such language refers to a context where VHDL is being executed by the simulator. The synthesiser simply does a flow analysis (like a regular optimising compiler would) and extracts the equivalent logic formulas (and where it can't, the block is non-synthesisable).

      I'm happy that these tools exist too, and I shouldn't look a gift horse in the mouth. You raise several interesting points. Will come back with a few thoughts on that "1970's feeling" later -- I'm a tad busy this week.
  12. Well, the tests went a lot quicker than I thought. I had to spend a bit of time learning to work with the tools, but the actual tests went much better than I had anticipated. The receive shift register shifted the wrong way 'round, and in the rcv/xmt state machines some timers were off by one (or more accurately: my thinking on what happens at a state transition was a bit muddled). All in all, very few changes versus the v4 source.

      I've also made the change to run the 9902 at the global clock speed, with just the timers running at 1 MHz. The clock divider for this is currently just /3 or /4 (as in a real 9902), but making it /50 or /100 is a trivial change.

      All of the timer, the transmitter and the receiver (along with all the associated configuration and status bits) appear to work in simulation. Making it "FPGA proven" will be next.

      I've also attached the source for the test bench. It is pretty basic now, but with the infrastructure done it will not be hard to make a real test suite that covers all chip modes and functions. If anybody wants to help a bit with defining test cases, I'd be much obliged.

      There are still a few tweaks and clean-up items to do in the source, but I'd say that having a VHDL 9902 for use in a Grant Searle style "soft breadboard" is 95% done (unless speccery's integration test brings up major new issues).

      tms9902_v5.vhd.txt tb_9902_v1.vhd.txt
  13. Matthew180's quick guidance on how to start with simulations using the ISE toolchain was super helpful! I'm up and running with the first tests. No need to convince me of the utility of having a test bench: I've learned the hard way that one cannot confidently maintain and refactor a serious code base without one. That is one of the hard bits of working with vintage source code: more often than not there is no test suite.
  14. No worries: giving newly written source code a few weeks for code review is probably a good thing. It would also allow some time for me to get a test bench sorted out. Integrating working code is so much better than debugging a component and the integration at the same time. I just bought two for 35 bucks, incl. shipping. Haven't tested them yet, but they look genuine. With only a few spare macrocells I think the design won't fit anyway, but attempting this will give me more insight into the efficiency of synthesis. In many ways the current state of synthesis feels like the state of C in the late 70's: the result is 30% bigger and 30% slower than hand coded stuff, but the higher abstraction level makes it worth it. Also, working around 1970's C compiler bugs very much feels like the synthesis voodoo of today: always wondering "will this particular construct translate okay?" In general I agree with your point. I want to use HDL's to quickly prototype stuff, a "soft breadboard" if you like. For that I want a spacious FPGA so that even quick & dirty code will have ample room.
  15. I got the dev board (Spartan-6) about two years ago and it sat in a project box since then. Finally did the "blinken lights" a few weeks ago. I've been reading a lot about VHDL and synthesis in the past quarter. This is the first 'real' project. Perhaps I should try that a bit more. Quartus seems to insist on a constraints file before it can do simulation. Haven't really dived into it yet, but the ISE tool seems to autogenerate it. Pointers on how to get ISE to help with generating a test bench would be welcome. Many thanks for all these helpful pointers, much appreciated!
  16. And here is the code with the receiver added as well. Now at 845 lines, and there are only a few small bits and bobs still to do (e.g. DSR interrupt) -- I think the total will remain well below 900. Full file attached. The receiver code is very similar to the transmitter code, so I won't do a walk through. I ran the code through both the Altera and the Xilinx tool chain. I manually calculate that I have 154 flip-flops in the various registers and counters. Altera reports 158 and Xilinx reports 148. Puzzling... In any case, squeezing it into an EPM7160 CPLD will be a tight fit (it has 160 flip-flops). I think I will leave the code aside for a bit and go back to it with "fresh eyes" later. tms9902_v4.vhd.txt
  17. Well, I'm just a beginner here and maybe later on in this project I will realise that I approached it all wrong. No better school than the school of hard knocks :^) My coding style is also influenced by this project: https://github.com/wfjm/w11

      My understanding of variables and signals in VHDL is as follows. When used for simulation, the VHDL runtime keeps an event queue with the next, current and previous state of all signals. Assigning a signal reads from the current state and writes to the next state. What is 'current' shifts continuously forward as simulated time progresses. Variables only exist inside a process block and are like regular variables. It would seem that variables keep their value from event to event (i.e. process blocks in simulation are "closures") and if one relies on this the block becomes non-synthesizable. In my "*_cmb" process block code all variables are re-initialised for each event (i.e. the process blocks are pure combinational) and this can be synthesised, or so it seems.

      Assigning a signal in simulation is expensive (it requires updating the event queue) and according to Gaisler some 100x slower than simply assigning a variable. Hence I keep all my intermediaries in variables, even though my designs are so small that I probably won't notice the difference.

      My understanding is that synthesising was something that was later overlaid on the VHDL language (and is arguably a language in itself). What constructs are being recognised as what circuit seems to me to be a bit of a black art, with differences between tools and changing as technology progresses. In effect, my *_cmb process blocks are truth tables written up like code. It would seem that toolchains have learned to recognise this (and even the 15 year old synthesiser shipped by Atmel/Microchip seems to handle it well).

      Now that I have written some code I think that I did not quite understand Gaisler: I'm still writing code with each register in a separate process statement. This is not necessary: all the registers for the timer could be in a big single record, as could all the registers for the transmitter. Doing so keeps all related code together and this is perhaps easier to read and maintain. Once I have working code I will try that refactoring and see if it makes sense.
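The difference between signals and variables described above can be made concrete with a toy Python model. This is a deliberately simplified picture (one "current" and one "next" value per signal, committed in one go), not how any real VHDL simulator is built; the class `Signals` and both swap functions are made up for the illustration.

```python
class Signals:
    """Toy signal store: reads see the current value, writes go to 'next'."""

    def __init__(self, **initial):
        self.current = dict(initial)
        self.pending = {}

    def read(self, name):
        return self.current[name]

    def assign(self, name, value):
        # Like 'a <= value;': scheduled, not visible until commit().
        self.pending[name] = value

    def commit(self):
        # End of the simulation step: 'next' becomes 'current'.
        self.current.update(self.pending)
        self.pending.clear()


def swap_with_signals(sig):
    sig.assign("a", sig.read("b"))
    sig.assign("b", sig.read("a"))   # still reads the old value of a


def swap_with_variables(a, b):
    a = b                            # like 'a := b;': takes effect at once
    b = a                            # so this copies the new a
    return a, b


sig = Signals(a=1, b=2)
swap_with_signals(sig)
before_commit = dict(sig.current)    # nothing has changed yet
sig.commit()
after_commit = dict(sig.current)

var_result = swap_with_variables(1, 2)
```

The two signal assignments really swap the values, because both read the state from before the step, while the same two statements on variables leave both holding the same value.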
  18. All: when it comes to running VHDL simulations I'm starting from absolute zero. I'd be happy to hear of tips, suggestions and perhaps pointers to tutorials for running simulations / test benches on the Altera Quartus II software. My confusion already starts at the constraints file.

      It sounds like I'm on a similar journey, only 4 years later. I'm hoping to get Stuart's 9995 breadboard running multicomp style, and hook up a real 99105 to an otherwise emulated system next.
  19. Yeah... I've now designed it for PHI to be either 3 or 4 MHz, just like a real part: I'm thinking this design will maybe fit in an Altera EPM7160 CPLD, which may be nice for 5V-based experiments. The v3 source has the divide-by-3-or-4 clock divider circuit implemented.

      For most of the registers it does not matter how fast CLK is; 100 MHz would be acceptable. The only issue to solve would be the bit rate counters, which have to count at about 1 MHz. What you could do when interfacing with a 100 MHz design is modify the clock circuit thus:

          -- PHI : the 100 MHz system clock
          ---
          signal clk      : std_logic;
          signal bitclk   : std_logic;
          signal clkctr_q : std_logic_vector(6 downto 0);
          ----
          clk <= PHI; -- run 9902 logic at 100 MHz

          clkdiv: process(PHI, clkctr_q)
            variable v : std_logic_vector(6 downto 0);
          begin
            v := clkctr_q;
            if rising_edge(PHI) then
              v := v + 1;
              if v="1100100" then v:="0000000"; end if;  -- wrap at 100
              clkctr_q <= v;
            end if;
          end process;

          bitclk <= '1' when clkctr_q="0000000" else '0'; -- 1 MHz clock

      The bit rate clock would then be modified like this:

          xhbctr_cmb : process(xhbctr_q, xdr_q, bitclk, sig_xhb_reset)
            variable v : std_logic_vector(13 downto 0);
            variable z : std_logic;
          begin
            v := xhbctr_q;
            if v="00000000000000" then z := '1'; else z := '0'; end if;
            if sig_xhb_reset='1' or z='1' then
              v := xdr_q(10)&"000"&xdr_q(9 downto 0);
            elsif bitclk='1' then
              v := v - 1;    -- only count on the 1 MHz enable
            end if;
            xhbctr_d <= v;
            sig_xhbctr_iszero <= z;
          end process;

      This of course all assumes that the internal clock rate of the 9902 is exactly 1 MHz, like it is in Stuart's breadboard/PCB designs. If using software designed to work on hardware with another clock rate, the clock divider would need a different constant to match the proper internal 9902 frequency.
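The effect of the clock-enable approach can be checked with a cycle-by-cycle Python model. This is a sketch under stated assumptions: the function `simulate` and its parameters are invented here, the reload value is passed in directly instead of being derived from the data rate register, and the external reset input is left out.

```python
def simulate(divide_by, reload, cycles):
    """Model the /N divider plus a half-bit counter gated by bitclk.

    Returns the number of bitclk pulses seen and the cycle numbers at
    which the half-bit counter reads zero.
    """
    clkctr = 0
    xhbctr = reload
    bitclk_pulses = 0
    zero_times = []
    for t in range(cycles):
        # Combinational outputs, derived from the current register values.
        bitclk = 1 if clkctr == 0 else 0
        is_zero = 1 if xhbctr == 0 else 0
        bitclk_pulses += bitclk
        if is_zero:
            zero_times.append(t)
        # Next-state logic, applied together at the rising clock edge.
        next_clkctr = clkctr + 1
        if next_clkctr == divide_by:
            next_clkctr = 0
        if is_zero:
            next_xhbctr = reload           # reload when the counter hits zero
        elif bitclk:
            next_xhbctr = xhbctr - 1       # count only on the slow enable
        else:
            next_xhbctr = xhbctr           # hold on all other clocks
        clkctr, xhbctr = next_clkctr, next_xhbctr
    return bitclk_pulses, zero_times

pulses, zeros = simulate(divide_by=100, reload=2, cycles=1000)
gaps = {b - a for a, b in zip(zeros, zeros[1:])}
```

In this model 1000 fast clocks give 10 enable pulses, and the zero flag repeats every reload times divide_by clocks while staying high for a single fast clock each time.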
  20. Got the transmitter blocks coded up. Now at 650 lines, of which 220 for the transmitter. If the receiver is ~250 lines the total would be 900, close to the original estimate of one thousand. The transmitter is driven of a clock signal running at twice the bitrate, just as in a real 9902. This allows the transmitter to send 1.5 stop bits (and makes the design of transmitter and receiver similar, as the receiver also needs half-bit times). xhbctr_reg : process(clk) begin if rising_edge(clk) then xhbctr_q <= xhbctr_d; end if; end process; xhbctr_cmb : process(xhbctr_q, xdr_q, sig_xhb_reset) variable v : std_logic_vector(13 downto 0); variable z : std_logic; begin v := xhbctr_q; if v="000000000000000" then z := '1'; else z := '0'; end if; if sig_xhb_reset='1' or z='1' then v := xdr_q(10)&"000"&xdr_q(9 downto 0); else v := v - 1; end if; xhbctr_d <= v; sig_xhbctr_iszero <= z; end process; The half-bit timer is a free running 14 bit counter that counts down from the value in the data rate register to zero. It can be reset by the transmit controller at the start of a new byte being sent. When the counter crosses zero it emits a one-clock signal to the controller. Then there is the transmit shift register. It has two control signals, "load from the transmit buffer register" and "shift", both provided by the controller: xsr_reg : process(clk) begin if rising_edge(clk) then xsr_q <= xsr_d; end if; end process; xsr_cmb : process(xsr_q, xbr_q, sig_xsr_load, sig_xsr_shift) variable v : std_logic_vector(7 downto 0); begin v := xsr_q; if sig_xsr_load='1' then v := xbr_q; elsif sig_xsr_shift='1' then v := '0'&xsr_q(7 downto 1); end if; xsr_d <= v; end process; (Perhaps I should do the shift with the VHDL 'ssr' operator, not sure if that is better). The controller is the most complex, about 100 lines. 
It has the following register bits: type xmtstat is (IDLE, BREAK, START, BITS, PARITY, STOP); type xmtFSM_type is record xbre : std_logic; xsre : std_logic; xout : std_logic; rts : std_logic; par : std_logic; bitctr : std_logic_vector(4 downto 0); state : xmtstat; end record; The elements 'xbre' to 'rts' are as per the datasheet and the bit 'par' holds the parity bit. The 'bitctr' element counts out delay times, running from N to 1. The 'state' field holds the current state of the controller (intended to be binary encoded). The controller starts out with figuring out the waiting times for the databits (depending on how many there are) and for the stop bits. It also makes the bit counter count down at twice the bit rate: xmtFSM_cmb : process(xmtFSM_q, ctl_q, flg_q, xsr_q, nCTS, sig_reset, sig_xbr7, sig_xhbctr_iszero) variable v : xmtFSM_type; variable par : std_logic; variable xsr_load, xsr_shift, xhb_reset : std_logic; variable xbits : std_logic_vector(4 downto 0); variable sbits : std_logic_vector(4 downto 0); begin v := xmtFSM_q; xsr_load := '0'; xsr_shift := '0'; xhb_reset := '0'; -- prepare half-bit times for data word and stop bits case ctl_q.rcl is when "11" => xbits := "10000"; when "10" => xbits := "01110"; when "01" => xbits := "01100"; when "00" => xbits := "01010"; end case; case ctl_q.sbs is when "00" => sbits := "00011"; when "01" => sbits := "00100"; when others => sbits := "00010"; end case; if sig_xhbctr_iszero='1' then v.bitctr := v.bitctr - 1; end if; Next is the handling of reset (write to CRU bit 31) and the IDLE and BREAK states. 
These are independent of the bit rate clock:

        if sig_reset='1' then
          v.xout := '1'; v.rts := '0'; v.xsre := '1'; v.xbre := '1';
        elsif sig_xbr7='1' then
          v.xbre := '0';
        elsif v.state=BREAK then
          v.xout := '0';
          if flg_q.brkon='0' then v.state := IDLE; end if;
        elsif v.state=IDLE then
          if flg_q.rtson='1' then v.rts := '1'; else v.rts := '0'; end if;
          if nCTS='0' then
            if v.xbre='1' then
              if flg_q.brkon='1' then v.state := BREAK; end if;
            else
              v.state := START; v.bitctr := "00010"; xhb_reset := '1';
            end if;
          end if;

This is all intended to be in line with the top of the flow diagram in the datasheet. When a new character transmission starts, the half-bit counter is reset, the wait time is set to two half-bit times, and the next state is START. This state and all remaining ones are synchronous with the half-bit timer.

        elsif sig_xhbctr_iszero='1' then
          case v.state is
            when START =>
              v.xsre := '0'; v.xbre := '1'; v.xout := '0';
              if v.bitctr=0 then
                xsr_load := '1'; v.state := BITS; v.bitctr := xbits; v.par := '0';
              end if;

The START state waits for two half-bit times, keeping 'xout' zero. At the end, it also loads the shift register, sets up a new waiting time and initialises the parity calculation bit. The next state is BITS. This is where the controller waits for the data bits to shift out and calculates even parity as it goes along.

            when BITS =>
              if v.bitctr(0)='1' then
                v.par := v.par xor xsr_q(0);
                xsr_shift := '1';
              end if;
              if v.bitctr=0 then
                if ctl_q.penb='1' then
                  v.state := PARITY; v.bitctr := "00010";
                else
                  v.state := STOP; v.bitctr := sbits;
                end if;
              end if;

When all the bits have been shifted out, it either moves to the PARITY state or to the STOP state. The PARITY state is responsible for sending out the parity bit.

            when PARITY =>
              if ctl_q.podd='1' then v.xout := not v.par; else v.xout := v.par; end if;
              if v.bitctr=0 then
                v.state := STOP; v.bitctr := sbits;
              end if;

In this state 'xout' equals the even/odd parity bit.
At the end it sets up the STOP state, including its precalculated waiting time. The STOP state simply sets 'xout' to one and waits. It then moves back to the IDLE state. The rest of the controller code is boilerplate:

            when STOP =>
              v.xout := '1';
              if v.bitctr=0 then v.state := IDLE; end if;
            when others =>
              v.state := IDLE;
          end case;
        end if;

        xmtFSM_d <= v;
        sig_xhb_reset <= xhb_reset;
        sig_xsr_load <= xsr_load;
        sig_xsr_shift <= xsr_shift;
      end process;

So far, it still compiles and synthesises, but nothing at all has been tested yet. I suspect that the code for the receiver will be very similar to that of the transmitter and perhaps some 250 lines. The full code of where I am at has been attached.
tms9902_v3.vhd.txt
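On the question of shifting with an operator instead of concatenation in xsr_cmb: below is a minimal simulation-only sketch, assuming ieee.numeric_std may be used in the design. The entity name xsr_shift_demo and the test pattern are made up for illustration. It only checks that the concatenation form and numeric_std's shift_right give the same result for a one-bit logical right shift.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical demo entity, not part of the attached file.
entity xsr_shift_demo is
end;

architecture sim of xsr_shift_demo is
begin
  check : process
    variable x, a, b : std_logic_vector(7 downto 0);
  begin
    x := "10110010";                                     -- arbitrary test pattern
    a := '0' & x(7 downto 1);                            -- form used in xsr_cmb
    b := std_logic_vector(shift_right(unsigned(x), 1));  -- numeric_std form
    assert a = b
      report "shift forms differ" severity failure;
    assert a = "01011001"
      report "unexpected shift result" severity failure;
    wait;                                                -- stop the process
  end process;
end;
```

Both forms describe the same wiring, so the choice is probably one of readability rather than of synthesis result.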
  21. That would be cool. Note however that I have not simulated it with a testbench, and the old adage is: "if it hasn't been tested it doesn't work". I can imagine things like bits shifting out in the wrong direction, counters being off by one, parity in reverse, etc. Perhaps also fundamental design errors. I'll post the transmitter shortly.
  22. I've now added the timer circuitry. First I had a design with a finite state machine (FSM) with states like 'reset', 'wait', 'load', etc. When thinking that through I came to the conclusion that the timer controller really only has one state (apart from the 'timelp' and 'timerr' status bits). My current implementation is:

      -- define timer counter register
      --
      timctr_reg : process(clk)
      begin
        if rising_edge(clk) then timctr_q <= timctr_d; end if;
      end process;

      timctr_cmb : process(timctr_q, tmr_q, sig_ldir_reset)
        variable v : std_logic_vector(13 downto 0);
        variable z : std_logic;
      begin
        v := timctr_q;
        if v="00000000000000" then z := '1'; else z := '0'; end if;
        if sig_ldir_reset='1' or z='1' then
          v := tmr_q&"000000";
        else
          v := v - 1;
        end if;
        timctr_d <= v;
        sig_timctr_iszero <= z;
      end process;

      -- define timer controller register
      --
      timFSM_reg : process(clk)
      begin
        if rising_edge(clk) then timFSM_q <= timFSM_d; end if;
      end process;

      timFSM_cmb : process(timFSM_q, sig_reset, sig_timenb, sig_timctr_iszero)
        variable v : timFSM_type;
      begin
        v := timFSM_q;
        if sig_reset='1' or sig_timenb='1' then
          v.timelp := '0';
          v.timerr := '0';
        elsif sig_timctr_iszero='1' then
          if v.timelp='1' then v.timerr := '1'; end if;
          v.timelp := '1';
        end if;
        timFSM_d <= v;
      end process;

The timer counter has 6 extra bits at the bottom versus the timer interval register (14 instead of 8 total): these are for the initial divide by 64. I've added four signals ('wires') for communication between the elements: sig_reset, sig_timenb, sig_ldir_reset, sig_timctr_iszero. Each is asserted for 1 clock to signal certain conditions. The first three emanate from the CRU interface, the last from the timer counter. The timer turned out a lot shorter and simpler than I thought. Stuff still compiles and synthesises, now at 438 lines. Full file attached for review. I've got the design for the transmitter mostly done, coding that up is next.
tms9902_v2.vhd.txt
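A first testbench for a counter like timctr could be quite small. The sketch below is not the 9902 timer itself: it is a cut-down 4-bit model with the same reload-on-zero structure, written with numeric_std, and the names timctr_tb, N, ctr_q, ctr_d, zero and done are all made up for illustration. It measures the number of clocks between two zero pulses. With this structure the period comes out as N+1 clocks, because both N and 0 are counted, which is the kind of off-by-one that may be worth checking against the datasheet.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical self-checking testbench for a small reload-on-zero counter.
entity timctr_tb is
end;

architecture sim of timctr_tb is
  constant N   : natural := 5;                   -- made-up reload value
  signal clk   : std_logic := '0';
  signal ctr_q : unsigned(3 downto 0) := to_unsigned(N, 4);
  signal ctr_d : unsigned(3 downto 0);
  signal zero  : std_logic;
  signal done  : boolean := false;
begin
  -- free-running clock that stops once the check has finished
  clk <= not clk after 5 ns when not done else '0';

  reg : process(clk)
  begin
    if rising_edge(clk) then
      ctr_q <= ctr_d;
    end if;
  end process;

  -- same structure as timctr_cmb: reload on zero, otherwise count down
  cmb : process(ctr_q)
  begin
    if ctr_q = 0 then
      zero  <= '1';
      ctr_d <= to_unsigned(N, 4);
    else
      zero  <= '0';
      ctr_d <= ctr_q - 1;
    end if;
  end process;

  check : process
    variable gap : natural;
  begin
    -- synchronise on the first clock edge at which 'zero' is asserted
    wait until rising_edge(clk) and zero = '1';
    gap := 0;
    -- count clock edges up to and including the next zero pulse
    loop
      wait until rising_edge(clk);
      gap := gap + 1;
      exit when zero = '1';
    end loop;
    assert gap = N + 1
      report "unexpected period between zero pulses" severity failure;
    done <= true;
    wait;
  end process;
end;
```

If an exact period of N clocks is wanted, one option would be to reload with N-1 or to test for the value 1 instead of 0; which is right depends on what the datasheet specifies.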
  23. Thanks Matthew, those are useful comments! I've already found that I need to work on grouping and signal naming to keep the code readable. Rather than refactor now, I think I will first complete the design in its current setup. I haven't thought much about constraints files, timing and simulation yet. I guess a "test bench" might be as much work as the 9902 itself.
  24. Well, I got underway with this project. I'm kinda new to working with VHDL and mostly following the advice given in the Freerange VHDL book. I'm trying to follow the coding style recommended by Gaisler and the best practices from P. Chu's lecture slides. So far I have done the code for the 6 simple registers and for the CRU interface. Some highlights below. I start out with some boilerplate and with defining the pins of the 9902:

      library ieee;
      use ieee.std_logic_1164.all;

      entity tms9902 is
        port (
          PHI    : in std_logic; -- input CLK
          nRTS   : out std_logic;
          nDSR   : in std_logic;
          nCTS   : in std_logic;
          nINT   : out std_logic;
          nCE    : in std_logic;
          CRUOUT : in std_logic;
          CRUIN  : out std_logic;
          CRUCLK : in std_logic;
          XOUT   : out std_logic;
          RIN    : in std_logic;
          S      : in std_logic_vector(4 downto 0)
        );
      end;

Next, I'm declaring the registers. In most cases this is simply a bag of bits, like so:

      -- interval register
      signal tmr_q, tmr_d : std_logic_vector(7 downto 0);

The signals ending in "_q" are the D flip-flop outputs, the signals ending in "_d" are the inputs. The entire circuit will be synchronous on a single clock (signal "clk"). In some cases (the flag register and the control register) I find it easier to define the bits by a mnemonic name, so I'm using a record to do that like so:

      -- control register
      type ctl_type is record
        sbs   : std_logic_vector(2 downto 1);
        penb  : std_logic;
        podd  : std_logic;
        clk4m : std_logic;
        rcl   : std_logic_vector(1 downto 0);
      end record;
      signal ctl_q, ctl_d : ctl_type;

Each register is built up out of two statements: the first defines the D flip-flops, the second is pure combinational and defines the "_d" inputs. Most of the time "_d" equals "_q" and the flip-flops remain in the same state. The exception is when there is a CRU write operation.
Defining the register flip-flops for the timer interval register:

      tmr_reg : process(clk)
      begin
        if rising_edge(clk) then tmr_q <= tmr_d; end if;
      end process;

And defining its inputs:

      tmr_cmb : process(tmr_q, nCE, CRUCLK, flg_q, CRUOUT, S)
        variable v : std_logic_vector(7 downto 0);
      begin
        v := tmr_q;
        if nCE='0' and CRUCLK='1' and flg_q.ldctl='0' and flg_q.ldir='1' then
          case S is
            when "00111" => v(7) := CRUOUT;
            when "00110" => v(6) := CRUOUT;
            when "00101" => v(5) := CRUOUT;
            when "00100" => v(4) := CRUOUT;
            when "00011" => v(3) := CRUOUT;
            when "00010" => v(2) := CRUOUT;
            when "00001" => v(1) := CRUOUT;
            when "00000" => v(0) := CRUOUT;
            when others => null;
          end case;
        end if;
        tmr_d <= v;
      end process;

It uses an intermediary variable for simulation efficiency. The variable is initialized with the "_q" outputs, and at the end the "_d" inputs are set from the variable: most of the time nothing happens. A bit is changed only if the chip is selected, CRUCLK is active, the LDCTL bit is reset and the LDIR bit is set (see the datasheet for the latter two). When S4..S0 is all zeroes, the lsb is modified, etc. Most of the registers work the same simple way. Only the register with the flag bits is a bit more complex, as there is some nifty logic associated with that. Again see the datasheet for details, there is an almost 1:1 relationship between the specs and the VHDL code. Reading is a bit simpler. Basically I hook CRUIN up to the tri-state output of a 32->1 multiplexer.
I don't have all the required signals yet, but the structure is there:

      CRUIN <= 'Z'         when nCE='1' else
               intr        when S="11111" else -- 31, any interrupt pending
               flag        when S="11110" else -- 30, 'flag' field
               dsch        when S="11101" else -- 29, device status change
               not nCTS    when S="11100" else -- 28, inverse of nCTS input
               not nDSR    when S="11011" else -- 27, inverse of nDSR input
               flg_q.rtson when S="11010" else -- 26, inverse of nRTS output
               timelp      when S="11001" else -- 25, timer elapsed
               '0'         when S="11000" else -- 24, 'timerr', todo
               '0'         when S="10111" else -- 23, 'xsre', todo
               xbre        when S="10110" else -- 22, transmit buffer register empty
               rbrl        when S="10101" else -- 21, receive buffer register full
               dscint      when S="10100" else -- 20, device status change interrupt pending
               timint      when S="10011" else -- 19, timer interrupt pending
               '0'         when S="10010" else -- 18, not used (always 0)
               xint        when S="10001" else -- 17, transmit interrupt pending
               rint        when S="10000" else -- 16, receive interrupt pending
               RIN         when S="01111" else -- 15, direct copy of RIN
               '0'         when S="01110" else -- 14, 'rsbd', todo
               '0'         when S="01101" else -- 13, 'rfbd', todo
               '0'         when S="01100" else -- 12, 'rfer', todo
               '0'         when S="01011" else -- 11, 'rover', todo
               '0'         when S="01010" else -- 10, 'rper', todo
               '0'         when S="01001" else -- 9, 'rcverr', todo
               '0'         when S="01000" else -- 8, not used (always 0)
               rbr_q(7)    when S="00111" else -- 7, receive buffer register, bit 7
               rbr_q(6)    when S="00110" else
               rbr_q(5)    when S="00101" else
               rbr_q(4)    when S="00100" else
               rbr_q(3)    when S="00011" else
               rbr_q(2)    when S="00010" else
               rbr_q(1)    when S="00001" else
               rbr_q(0)    when S="00000" else -- 0, receive buffer register, bit 0
               '0';

Well, those are the main points of what I have now. I've attached the full file for review (.txt added so the forum will accept it). So far it seems to compile and synthesize. A good 300 lines so far. Next up I'll look at the timer controller and timer counter.
tms9902_v1.vhd.txt
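As a possible shortening of the eight-way case in tmr_cmb, the bit to write can be selected by converting the low three select lines to an index. This is only a minimal sketch, assuming ieee.numeric_std is acceptable; the entity cru_reg8, its port q and the single wr_en strobe (standing in for the nCE/CRUCLK/flag decode) are hypothetical and not part of the attached file.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical 8-bit CRU-writable register, same _q/_d style as above.
entity cru_reg8 is
  port (
    clk    : in  std_logic;
    wr_en  : in  std_logic;                     -- assumed: decoded write strobe
    S      : in  std_logic_vector(4 downto 0);  -- CRU bit select
    CRUOUT : in  std_logic;
    q      : out std_logic_vector(7 downto 0)
  );
end;

architecture rtl of cru_reg8 is
  signal reg_q, reg_d : std_logic_vector(7 downto 0) := (others => '0');
begin
  reg : process(clk)
  begin
    if rising_edge(clk) then
      reg_q <= reg_d;
    end if;
  end process;

  cmb : process(reg_q, wr_en, S, CRUOUT)
    variable v : std_logic_vector(7 downto 0);
  begin
    v := reg_q;
    -- only select values 0..7 address this register
    if wr_en = '1' and S(4 downto 3) = "00" then
      -- the low three select bits pick the bit to overwrite
      v(to_integer(unsigned(S(2 downto 0)))) := CRUOUT;
    end if;
    reg_d <= v;
  end process;

  q <= reg_q;
end;
```

A write with S = "00101" then touches only bit 5, as in the case statement. Whether a synthesiser produces the same logic for both forms is something to verify rather than assume.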
  25. Hi FarmerPotato, that is very interesting! I'm not too familiar with Verilog, but my understanding is that -- when working with code intended for synthesis -- the difference between VHDL and Verilog is not all that much. Once I get my initial design and code done, I'd love to compare notes. And of course I would appreciate any comments that you may have on the code I will post. That's a very cool idea: the 9902 as a standard API for pumping bits down a channel, whatever that channel may be.