
VHDL for 99xx chips


pnr


All: when it comes to running VHDL simulations I'm starting from absolute zero. :-o I'd be happy to hear of tips, suggestions and perhaps pointers to tutorials for running simulations / test benches on the Altera Quartus II software. My confusion already starts at the constraints file.

 

Start slow, it will take time. I recommend you get an FPGA devboard that has some tutorials that you can just load and blink an LED first, that is the "hello world" equivalent for FPGAs. Getting the full end-to-end process working quickly is really important, even if you don't understand all the pieces yet.

 

I have only ever once looked at Altera's FPGA software, and that was in 2011. For better or worse, I went with Xilinx. I'm sure Quartus can create the TB (test bench) boilerplate, and I have to imagine they have some sort of simulator that lets you see the signal waveforms and such.

 

As for the constraints file, keep in mind that an FPGA is a big pile of circuits and I/O that get configured into the circuit you describe. The constraints file is where you specify which physical I/O pins on the FPGA are connected to which "net names" (the term "net" comes from schematic capture and PCB layout, and is a named wire, essentially). You can also specify electrical constraints, like that an I/O is 3.3V or 1.8V, etc. For clock input signals (usually you have an external oscillator providing a clock input to an FPGA), you must also specify the frequency of the clock so the synthesizer can perform the proper propagation delay calculations. You don't need a constraints file until you want to load your design onto a real FPGA. You will need a schematic and FPGA pin-out documentation for this task.

 

 

My coding style is also influenced by this project: https://github.com/wfjm/w11

That is an interesting project, thanks for the link. It looks very clean and there are definitely some things in there I have not seen before (the "alias" type for example). I'm constantly looking for better ways to do HDL, to make it easier to read and manage, etc. so I'll be experimenting with some of that and maybe adopting some new methods.

 

My understanding of variables and signals in VHDL is as follows. When used for simulation, the VHDL runtime keeps an event queue with the next, current and previous state of all signals. Assigning a signal reads from the current state and writes to the next state. What is 'current' shifts continuously forward as simulated time progresses. Variables only exist inside a process block and are like regular variables. It would seem that variables keep their value from event to event (i.e. process blocks in simulation are "closures") and if one relies on this the block becomes non-synthesizable. In my "*_cmb" process block code all variables are re-initialised for each event (i.e. the process blocks are pure combinational) and this can be synthesised, or so it seems.
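The difference between signal and variable assignment described above can be seen in a tiny simulation-only sketch. The entity and object names here are made up for illustration and are not from any of the attached files:

```vhdl
-- Hypothetical demo, simulation only (there is nothing here to synthesise).
entity sig_vs_var is
end sig_vs_var;

architecture sim of sig_vs_var is
   signal s : integer := 0;
begin
   demo: process
      variable v : integer := 0;
   begin
      v := v + 1;      -- variable: takes its new value immediately
      s <= s + 1;      -- signal: the new value is only scheduled
      assert v = 1 report "variable should already be 1" severity failure;
      assert s = 0 report "signal should still read 0"   severity failure;
      wait for 0 ns;   -- let one delta cycle pass
      assert s = 1 report "signal should now read 1"     severity failure;
      wait;            -- suspend forever
   end process;
end sim;
```

If the simulator runs this without an assertion failure, it confirms that the signal only picks up its new value after the process suspends, while the variable changes on the spot.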

I try not to think of HDL in terms of "programming", or even make correlations to programming. It was a programming mindset that got me into trouble with the F18A HDL, and only after I stopped thinking in terms of "code" and started thinking in terms of "how would I do this with 74LS logic?" did I start making progress again.

 

Assigning a signal in simulation is expensive (it requires updating the event queue) and according to Gaisler some 100x slower than simply assigning a variable. Hence I keep all my intermediaries in variables, even though my designs are so small that I probably won't notice the difference.

It depends on the size of the project. I did a simulation on an entire Joust SoC that I wrote without too much trouble. I would be careful when adopting claims like that one. In most cases, your simulations are so small that you won't notice. Then again, I have never simulated a design with variable vs signals. I just know that my simulations are typically so fast that it was not an issue.

 

My understanding is that synthesising was something that was later overlaid on the VHDL language (and is arguably a language in itself). What constructs are being recognised as what circuit seems to me to be a bit of a black art, with differences between tools and changing as technology progresses. In effect, my *_cmb process blocks are truth tables written up like code. It would seem that toolchains have learned to recognise this (and even the 15 year old synthesiser shipped by Atmel/Microchip seems to handle it well).

You can think of synthesizing as being roughly analogous to compiling. The synthesizer reads the HDL and infers logic, like muxes, counters, comparators, registers, memory, ROM, RAM, etc. You should always read the output of the synthesizer carefully to make sure it found the circuits you are trying to describe with your HDL. After all, that is what HDL is, describing hardware. That's why I think most people avoid the language constructs that do not correlate to hardware.

 

There is then a step where the results of synthesis are mapped to an actual device, and FPGA resources are consumed. This is called "place and route". From that there is another step where the bit-stream is produced. The tools like ISE, Vivado, Quartus, etc. mash all this together, kind of like a compiler will call the linker for you.

 

Now that I have written some code I think that I did not quite understand Gaisler: I'm still writing code with each register in a separate process statement. This is not necessary: all the registers for the timer could be in a big single record, as could all the registers for the transmitter. Doing so keeps all related code together and this is perhaps easier to read and maintain. Once I have working code I will try that refactoring and see if it makes sense.

There are many ways to organize your HDL, format your HDL, etc. and coming up with an organization and style that makes sense to you is the most important. After that, try for readability and understanding. The parallel nature of hardware makes it hard to follow when presented in a form that resembles "code". IMO a schematic is far superior for hardware, which is probably why schematics were invented. ;-) Try to organize your code into blocks just like you would in real hardware.


And here is the code with the receiver added as well. Now at 845 lines, and there are only a few small bits and bobs still to do (e.g. DSR interrupt) -- I think the total will remain well below 900.

 

Full file attached. The receiver code is very similar to the transmitter code, so I won't do a walk through.

 

I ran the code through both the Altera and the Xilinx tool chain. I manually calculate that I have 154 flip-flops in the various registers and counters. Altera reports 158 and Xilinx reports 148. Puzzling... In any case, squeezing it into an EPM7160 CPLD will be a tight fit (it has 160 flip-flops).

 

I think I will leave the code aside for a bit and go back to it with "fresh eyes" later.

 

tms9902_v4.vhd.txt


Start slow, it will take time. I recommend you get an FPGA devboard that has some tutorials that you can just load and blink an LED first, that is the "hello world" equivalent for FPGAs. Getting the full end-to-end process working quickly is really important, even if you don't understand all the pieces yet.

 

I got the dev board (Spartan-6) about two years ago and it sat in a project box since then. Finally did the "blinken lights" a few weeks ago. I've been reading a lot about VHDL and synthesis in the past quarter. This is the first 'real' project.

 

I have only ever once looked at Altera's FPGA software, and that was in 2011. For better or worse, I went with Xilinx.

 

Perhaps I should try that a bit more. Quartus seems to insist on a constraints file before it can do simulation. Haven't really dived into it yet, but the ISE tool seems to autogenerate it. Pointers on how to get ISE to help with generating a test bench would be welcome.

 

<... much more ...>

 

Many thanks for all these helpful pointers, much appreciated!


ISE test bench from memory:

 

1. Source -> Add New

2. Select VHDL Test Bench

3. Select your UUT VHDL file, i.e. the VHDL file in your project that you want to test. This should *not* be your "top", unless you want to simulate the whole project. Assuming you are doing one module per HDL file, this will be a test bench for that module.

4. Hit Ok

 

A test bench is literally another HDL file that you will add to your project. Some people make a separate folder for the test bench HDL files. I tend to just prefix all the test bench files with tb_ (real original I know) and put them in the same folder as the rest of my HDL. Open the test bench file and you will see a bunch of boiler plate. Down near the bottom you will see a comment that says something like "add stimulus here" or some such thing. This is where you provide the inputs and read the outputs of the module. You just set inputs to make the module work. Something like: input_1 <= '1'; or whatever.
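To make that concrete, here is a minimal sketch of what such a test bench can look like. The module `my_reg` and all signal names are hypothetical, and this is hand-written rather than the exact boilerplate ISE generates (which, as far as I remember, uses a component declaration instead of direct entity instantiation):

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical module under test: a one-bit register with synchronous reset.
entity my_reg is
   port (
      clk     : in  std_logic;
      reset   : in  std_logic;
      input_1 : in  std_logic;
      out_1   : out std_logic
   );
end my_reg;

architecture rtl of my_reg is
begin
   process(clk)
   begin
      if rising_edge(clk) then
         if reset = '1' then
            out_1 <= '0';
         else
            out_1 <= input_1;
         end if;
      end if;
   end process;
end rtl;

library ieee;
use ieee.std_logic_1164.all;

-- The test bench entity has no ports; it exists only for simulation.
entity tb_my_reg is
end tb_my_reg;

architecture behavior of tb_my_reg is
   signal clk     : std_logic := '0';
   signal reset   : std_logic := '1';
   signal input_1 : std_logic := '0';
   signal out_1   : std_logic;
   constant clk_period : time := 20 ns;   -- 50 MHz, an arbitrary choice
begin
   -- Unit under test
   uut: entity work.my_reg
      port map (clk => clk, reset => reset, input_1 => input_1, out_1 => out_1);

   -- Free-running clock
   clk_process: process
   begin
      clk <= '0'; wait for clk_period / 2;
      clk <= '1'; wait for clk_period / 2;
   end process;

   -- Stimulus ("add stimulus here")
   stim_proc: process
   begin
      wait for 5 * clk_period;   -- hold reset for a few clocks
      reset   <= '0';
      input_1 <= '1';
      wait for 2 * clk_period;
      assert out_1 = '1' report "out_1 did not follow input_1" severity error;
      wait;                      -- done; the clock keeps running
   end process;
end behavior;
```

The clock process never stops, so the simulation runs for as long as you tell the simulator to run it.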

 

In ISE there is a radio button at the top of the left area where your files are shown, you can select "synthesize" or "simulation" (or something like that, I can't remember exactly). This is a kind of mode-switch and by default it will be set to "synthesize". Change that to "simulation" and select the test bench. Now down below in the lower left box you should have an option to "Simulate". Start the simulation and ISE will do some compiling, and assuming there are no errors, it will open the iSim tool (or iSimScope, or whatever it is called).

 

At the top there is a button to "play" the simulation for a designated amount of time. Choose something like 10us and hit play. On the left you can drill down into the signals and pick which ones you want to see. It is a little confusing, but find the "UUT" entry in the tree and expand that. Now you should see all the signals in your component. Add those to the waveforms section and rerun the simulation. Add / remove signals, change the simulation time, repeat. You can also change how the signals are displayed, i.e. binary, decimal, hex, etc.

 

That is very brief, but it would take a rather long tutorial to step through it in detail. I'm pretty sure there must already be some screen shots and videos out there of doing an ISE simulation. If not, I might be able to put something together.


I don't know if the constraints file for ISE is required to simulate. I always just make one pretty early because they are not that hard. The format between ISE and Vivado changed, and the Vivado format is a pain in the ass IMO. There is a tool somewhere in ISE that you can use to make the constraints file, but I have never bothered to look or try it out, I always just make it by hand. For example say you had a board with a 50MHz oscillator and four LEDs. Your constraints file might look like this:

# Clocks
NET "clk_50m0_net" PERIOD = 20 ns | LOC = "K3";

# LEDs
NET "led_net<0>" LOC="P11" | IOSTANDARD=LVTTL | DRIVE=8 | SLEW=SLOW;
NET "led_net<1>" LOC="N9"  | IOSTANDARD=LVTTL | DRIVE=8 | SLEW=SLOW;
NET "led_net<2>" LOC="M9"  | IOSTANDARD=LVTTL | DRIVE=8 | SLEW=SLOW;
NET "led_net<3>" LOC="P9"  | IOSTANDARD=LVTTL | DRIVE=8 | SLEW=SLOW;

Your "top" HDL file would declare its ports like this:

entity top is
   port (
      clk_50m0_net   : in  std_logic;
      led_net        : out std_logic_vector(0 to 3)
   );
end top;

-- Set the LEDs on
led_net <= "1111";

-- Set one LED off
led_net(2) <= '0';

And so on. Your clock input will usually go to a DCM block to make whatever frequency you need, and get buffered for distribution around the FPGA. You also want to make sure you always drive outputs with registers! My example above probably won't work because led_net(2) is driven by two values, and the synthesizer will complain (rightfully so). The LOC entries are "location" and are specific to the FPGA device you are using. Constraints files are always device specific. You get the location values from the datasheet.
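As a rough sketch of what "drive outputs with registers" could look like for the same four LEDs: the register name `led_r` is made up, and for brevity the oscillator input clocks the logic directly instead of going through a DCM, which is an assumption that would not hold for every board.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity top is
   port (
      clk_50m0_net   : in  std_logic;
      led_net        : out std_logic_vector(0 to 3)
   );
end top;

architecture rtl of top is
   -- Internal register that drives the LED pins (illustrative name).
   signal led_r : std_logic_vector(0 to 3) := "0000";
begin
   process(clk_50m0_net)
   begin
      if rising_edge(clk_50m0_net) then
         led_r    <= "1111";   -- set the LEDs on
         led_r(2) <= '0';      -- within one process the last assignment wins
      end if;
   end process;

   led_net <= led_r;           -- exactly one driver for each output bit
end rtl;
```

Because both assignments now live in the same process, there is only one driver for bit 2, and the later assignment simply overrides the earlier one for that bit.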


And here is the code with the receiver added as well. Now at 845 lines, and there are only a few small bits and bobs still to do (e.g. DSR interrupt) -- I think the total will remain well below 900.

 

Full file attached. The receiver code is very similar to the transmitter code, so I won't do a walk through.

 

I ran the code through both the Altera and the Xilinx tool chain. I manually calculate that I have 154 flip-flops in the various registers and counters. Altera reports 158 and Xilinx reports 148. Puzzling... In any case, squeezing it into an EPM7160 CPLD will be a tight fit (it has 160 flip-flops).

 

I think I will leave the code aside for a bit and go back to it with "fresh eyes" later.

 

I had no time yesterday to do anything, and I have a workshop this weekend; next weekend heading to Asia, so not much time at home. I'll do what I always do and take a few FPGA boards with me, I'll try to do some integration of your code - that would be fun.

 

The EPM7160 seems to be a very expensive chip. I took a look last week, perhaps I was looking at the wrong place, but paying nearly 200 per chip is just too much. Anyway with the experience I have now gained with the XC95144XL CPLD I will definitely lean towards real FPGAs in the future. I don't have that much time for the hobby, so when I have the time I'd rather not fight against the CPLD routing...


I had no time yesterday to do anything, and I have a workshop this weekend; next weekend heading to Asia, so not much time at home. I'll do what I always do and take a few FPGA boards with me, I'll try to do some integration of your code - that would be fun.

 

No worries: giving newly written source code a few weeks for code review is probably a good thing. It would also allow some time for me to get a test bench sorted out. Integrating working code is so much better than debugging a component and the integration at the same time.

 

 

 

The EPM7160 seems to be a very expensive chip. I took a look last week, perhaps I was looking at the wrong place, but paying nearly 200 per chip is just too much. Anyway with the experience I have now gained with the XC95144XL CPLD I will definitely lean towards real FPGAs in the future. I don't have that much time for the hobby, so when I have the time I'd rather not fight against the CPLD routing...

 

I just bought two for 35 bucks, incl. shipping. Haven't tested them yet, but they look genuine. With only a few spare macrocells I think the design won't fit anyway, but attempting this will give me more insight into the efficiency of synthesis. In many ways the current state of synthesis feels like the state of C in the late 70's: the result is 30% bigger and 30% slower than hand coded stuff, but the higher abstraction level makes it worth it. Also, working around 1970's C compiler bugs very much feels like the synthesis voodoo of today: always wondering "will this particular construct translate okay?"

 

In general I agree with your point. I want to use HDL's to quickly prototype stuff, a "soft breadboard" if you like. For that I want a spacious FPGA so that even quick & dirty code will have ample room.


To continue the discussion about simulation test benches, I concur that it definitely makes sense to create one. I keep forgetting that ISE has a facility for creating a template for that; thanks, matthew180, for the reminder. I had used it, then I coded some manually, then I remembered again that the tool exists...

 

For pnr’s benefit it’s perhaps worth pointing out that even a very simple test bench can give you a lot of information. Normally you don’t need much more stimulus in the test bench than a clock and a reset to get started. With the TMS9902 you probably also want to include a CRU write method, so that you can initialize the design and try to send a byte.


Matthew180's quick guidance on how to start with simulations using the ISE toolchain was super helpful!

 

I'm up and running with the first tests. No need to convince me on the utility of having a test bench: I've learned the hard way that one cannot confidently maintain and refactor a serious code base without one. That is one of the hard bits of working with vintage source code: more often than not there is no test suite.


@speccery: wow, I might never have used a test bench if I had to have written it from scratch! ;-) I was still learning and did not understand that the test bench is just HDL too, albeit using language constructs that only work for simulation (which, of course, is the whole point of test benches...)

 

I'm glad the info has been useful. There is just too much information I want to say on every topic to convey everything. Every time I write something it reminds me of something else; lots of hard-won knowledge. I have to remember you will need to run into some of your own brick walls though, it is the only way to learn. Thinking in terms of hardware instead of software was a big one for me, along with understanding what pipe-lining really is, what it means to "register" your inputs and outputs, and the importance of synchronizing asynchronous inputs.
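For the last point, the usual starting place is a two flip-flop synchronizer. This is a generic sketch of that common technique with made-up names, not code taken from any project in this thread:

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Two flip-flop synchronizer for a single-bit asynchronous input.
entity sync_2ff is
   port (
      clk      : in  std_logic;
      async_in : in  std_logic;   -- e.g. a push button or an external serial line
      sync_out : out std_logic    -- safe to use in the clk domain
   );
end sync_2ff;

architecture rtl of sync_2ff is
   signal meta_r : std_logic := '0';   -- first stage, may go metastable
   signal sync_r : std_logic := '0';   -- second stage, gives it a clock period to settle
begin
   process(clk)
   begin
      if rising_edge(clk) then
         meta_r <= async_in;
         sync_r <= meta_r;
      end if;
   end process;

   sync_out <= sync_r;
end rtl;
```

The rest of the design should only ever look at `sync_out`, never at the raw input.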

 

 

In many ways the current state of synthesis feels like the state of C in the late 70's: the result is 30% bigger and 30% slower than hand coded stuff, but the higher abstraction level makes it worth it. Also, working around 1970's C compiler bugs very much feels like the synthesis voodoo of today: always wondering "will this particular construct translate okay?"

 

I'm curious what gives you that notion? What kind of "hand coding" would you do in terms of HDL? I would hate to try and write the equivalent equations for my HDL. I'm not even sure a human could do it for a complex design. I'm actually very impressed by the synthesizer and what it can infer from the HDL. I have played with writing compilers for languages like C and Pascal, but an HDL compiler... I would not even know where to begin. Maybe it is simpler than I imagine, but I don't think so. If it was as easy as writing compilers then I think we would see more open source HDL synthesizers.

 

I see people complain about the closed-source vendor tools a lot, but personally I'm just really happy companies like Xilinx and Altera make their tools available for free. I would not be doing FPGA work if I would have had to pay for the tools. They can be big and unwieldy sometimes for sure, but they tend to work.


Well, the tests went a lot quicker than I thought. I had to spend a bit of time learning to work with the tools, but the actual tests went much better than I had anticipated.

 

The receive shift register shifted the wrong way 'round, and in the rcv/xmt state machines some timers were off by one (or more accurately: my thinking on what happens at a state transition was a bit muddled). All in all, very few changes versus the v4 source. I've also made the change to run the 9902 at the global clock speed, with just the timers running at 1 MHz. The clock divider for this is currently just /3 or /4 (as in a real 9902), but making it /50 or /100 is a trivial change.
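One way a /50 or /100 divider of this kind could be written is as a clock-enable generator with the ratio as a generic. This is only a sketch under my own assumptions; the entity, generic and port names are hypothetical and not those used in the attached source:

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity clk_enable_div is
   generic (
      DIVISOR : positive := 50   -- e.g. 50 MHz / 50 gives a 1 MHz tick
   );
   port (
      clk  : in  std_logic;
      tick : out std_logic       -- high for one clk cycle out of every DIVISOR
   );
end clk_enable_div;

architecture rtl of clk_enable_div is
   signal count_q : natural range 0 to DIVISOR - 1 := 0;
begin
   process(clk)
   begin
      if rising_edge(clk) then
         if count_q = DIVISOR - 1 then
            count_q <= 0;               -- wrap around
         else
            count_q <= count_q + 1;
         end if;
      end if;
   end process;

   -- Decode one counter state as the enable pulse.
   tick <= '1' when count_q = 0 else '0';
end rtl;
```

Everything stays in the single global clock domain; the timers would simply use `tick` as a clock enable.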

 

All of the timer, the transmitter and the receiver (along with all the associated configuration and status bits) appear to work in simulation. Making it "FPGA proven" will be next.

 

I've also attached the source for the test bench. It is pretty basic now, but with the infrastructure done it will not be hard to make a real test suite that covers all chip modes and functions. If anybody wants to help a bit with defining test cases, I'd be much obliged.

 

There's still a few tweaks and clean-up items to do in the source, but I'd say that having a VHDL 9902 for use in a Grant Searle style "soft breadboard" is 95% done (unless speccery's integration test brings up major new issues).

tms9902_v5.vhd.txt

tb_9902_v1.vhd.txt


@pnr: Your use of the "variable" type is... interesting. [...] Admittedly it looks a little strange to me, [...]. This is all just my opinion and such, so don't take this wrong, I'm just contrasting and comparing, and trying to decide if using the variable type is something I might start trying to use more of. Your HDL is the most use of the variable type that I have ever seen, and I wonder if it will make it equally as strange to you when you see HDL that does not use variables at all.

 

I've been thinking about this for a bit. The Free range VHDL book talks about three common coding styles: data-flow, behavioural and structural (see chapter five). I think the difference may be that you are using a style that is mostly data-flow oriented and I'm using a style that is mostly behavioural oriented. The third style, structural, is what Grant Searle is doing in his top level file: connecting up components as if it is a schematic. I'm not sure these terms for coding styles are in common use, but does that make sense?

 

The funny thing about it is that when I was reading up on VHDL in the past few months, I had decided to focus on a data-flow oriented style, as I imagined that would give me the maximum control over what the synthesiser would generate. Now that I look back on the 9902 project I see that I ended up doing mostly behavioural stuff without that ever being a conscious decision.

 

By the way, I find the explanation of process blocks in chapter 5 confusing. It explains it in terms of code being executed sequentially, and for a while I imagined that the synthesised circuit would have interlock flip-flops to enforce that in real hardware. Only much later I realised that such language refers to a context where VHDL is being executed by the simulator. The synthesiser simply does a flow analysis (like a regular optimising compiler would) and extracts the equivalent logic formulas (and where it can't the block is non-synthesisable).
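A small illustration of that point, with made-up names: the process below reads like sequential code with a default and an override, yet I would expect a synthesiser to reduce it to the same 2-to-1 multiplexer as the one-line data-flow version next to it.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity mux_demo is
   port (
      sel, a, b : in  std_logic;
      y_proc    : out std_logic;
      y_flow    : out std_logic
   );
end mux_demo;

architecture rtl of mux_demo is
begin
   -- Behavioural style: looks like sequential code.
   process(sel, a, b)
   begin
      y_proc <= a;         -- default assignment
      if sel = '1' then
         y_proc <= b;      -- overrides the default
      end if;
   end process;

   -- Data-flow style: the same multiplexer as one concurrent statement.
   y_flow <= b when sel = '1' else a;
end rtl;
```

No flip-flops or sequencing hardware are implied by the order of the statements; the "last assignment wins" rule just determines which logic formula comes out.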

 

pnr wrote: "In many ways the current state of synthesis feels like the state of C in the late 70's: the result is 30% bigger and 30% slower than hand coded stuff, but the higher abstraction level makes it worth it. Also, working around 1970's C compiler bugs very much feels like the synthesis voodoo of today: always wondering "will this particular construct translate okay?""

 

I'm curious what gives you that notion? What kind of "hand coding" would you do in terms of HDL? I would hate to try and write the equivalent equations for my HDL. I'm not even sure a human could do it for a complex design. I'm actually very impressed by the synthesizer and what it can infer from the HDL. I have played with writing compilers for languages like C and Pascal, but an HDL compiler... I would not even know where to begin. Maybe it is simpler than I imagine, but I don't think so. If it was as easy as writing compilers then I think we would see more open source HDL synthesizers.

 

I see people complain about the closed-source vendor tools a lot, but personally I'm just really happy companies like Xilinx and Altera make their tools available for free. I would not be doing FPGA work if I would have had to pay for the tools. They can be big and unwieldy sometimes for sure, but they tend to work.

 

I'm happy that these tools exist too, and I shouldn't look a gift horse in the mouth.

 

You raise several interesting points. Will come back with a few thoughts on that "1970's feeling" later -- I'm a tad busy this week.


I'm doing some cleanup of the 9902 code and here's a question:

 

Take this code for the clock divider (it is around line 200 in the full source file) and let's call it version A:

   clkdiv: process(CLK, clkctr_q, ctl_q)
   variable v : std_logic_vector(1 downto 0);
   begin
      v := clkctr_q;
      if rising_edge(CLK) then
         v := v + 1;
         if ctl_q.clk4m='0' and v="10" then v:="11"; end if;
         clkctr_q <= v;
      end if;
   end process;
   bitclk <= '1' when clkctr_q="00" else '0';

And let's call this slightly modified version, version B:

   clkdiv: process(CLK, clkctr_q, ctl_q)
   variable v : std_logic_vector(1 downto 0);
   begin
      v := clkctr_q;
      if rising_edge(CLK) then
         v := v + 1;
         if ctl_q.clk4m='0' and v="10" then v:="11"; end if;
      end if;
      clkctr_q <= v;
   end process;
   bitclk <= '1' when clkctr_q="00" else '0';

The Xilinx ISE tool chain accepts both versions with a warning that "variable 'v' does not keep its value outside rising edge(CLK)". Both versions simulate okay.

 

The Altera Quartus tool chain accepts version A without warning or error, and flags version B as an error (with a similar error text). Version A simulates okay.

 

In my limited understanding of VHDL both versions should be okay and not give a warning or error. What subtle aspect of VHDL am I missing?

 

(PS of course I can rewrite it to not have a warning or error, but I'm interested to understand why this is wrong and why the two tool chains have different opinions about it).


Version B looks bogus to me, so I concur with Quartus. I have used variables only in limited ways: the way I have used them is just to clean code, in other words in a way where the same logic could be written without the variable in the first place, but using variables allows one to break down a complex statement/conditional thing into multiple easy to understand lines of code.

If you follow this line of thought - variables are just syntactic sugar - it is clear that version B is bogus, since the scope where v is incremented is different from where the value of v is retrieved. In other words, v is not stateless. Note that I have not read textbooks about this topic almost at all, this is just my simple line of thought.


And when I spoke about scope above, I meant scope in the sense of time, not exactly as a scope in code. I would have written this so that all code touching/assigning v would be inside the >>if rising edge<< which reduces it to syntactic sugar. In a simple scenario like this using the variable does not add much, but I've used them when the logic is more complex (if clauses etc) but still occurring at a given time, not across changes in clock.
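A sketch of what that rewrite of the clock divider might look like. To keep it self-contained I have wrapped it in a hypothetical entity, replaced the record field ctl_q.clk4m with a plain input port, and used unsigned from numeric_std for the counter; those are my assumptions and not how the original source is structured.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity clkdiv_demo is
   port (
      CLK    : in  std_logic;
      clk4m  : in  std_logic;   -- stands in for ctl_q.clk4m
      bitclk : out std_logic
   );
end clkdiv_demo;

architecture rtl of clkdiv_demo is
   signal clkctr_q : unsigned(1 downto 0) := "00";
begin
   -- Only CLK in the sensitivity list; v is written and read
   -- entirely inside the clocked branch.
   clkdiv: process(CLK)
      variable v : unsigned(1 downto 0);
   begin
      if rising_edge(CLK) then
         v := clkctr_q + 1;
         if clk4m = '0' and v = "10" then v := "11"; end if;   -- skip a state: /3 instead of /4
         clkctr_q <= v;
      end if;
   end process;

   bitclk <= '1' when clkctr_q = "00" else '0';
end rtl;
```

Here v never has to hold a value from one clock edge to the next, so it is purely a named intermediate result.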


To illustrate, in the following are a few pieces of code from my tms9900.vhd. All of this stuff is occurring inside the same >>if rising edge<< block.

-- process declaration, line 347
-- here a few variables are declared
	process(clk, reset) is
	variable offset : std_logic_vector(15 downto 0);
	variable take_branch : boolean;
	variable dec_shift_count   : boolean := False;

-- a couple hundred lines omitted
-- from line 636 onwards, this is the giant state machine
-- sorry about the indentation, make your window wide...
					when do_branch =>
						-- do branching, we need to sign extend ir(7 downto 0) and add it to PC and continue.
						cpu_state <= do_fetch; -- may be overwritten with do_stuck
						take_branch := False;
						case ir(11 downto 8) is
						when "0000" => take_branch := True;	-- JMP
						when "0001" => if ST(14)='0' and ST(13)='0' then take_branch := True; end if; -- JLT
						when "0010" => if ST(15)='0' or  ST(13)='1' then take_branch := True; end if; -- JLE
						when "0011" => if                ST(13)='1' then take_branch := True; end if; -- JEQ
						when "0100" => if ST(15)='1' or  ST(13)='1' then take_branch := True; end if; -- JHE
						when "0101" => if                ST(14)='1' then take_branch := True; end if; -- JGT
						when "0110" => if                ST(13)='0' then take_branch := True; end if; -- JNE
						when "0111" => if                ST(12)='0' then take_branch := True; end if; -- JNC
						when "1000" => if                ST(12)='1' then take_branch := True; end if; -- JOC (on carry)
						when "1001" => if                ST(11)='0' then take_branch := True; end if; -- JNO (no overflow)
						when "1010" => if ST(15)='0' and ST(13)='0' then take_branch := True; end if; -- JL
						when "1011" => if ST(15)='1' and ST(13)='0' then take_branch := True; end if; -- JH
						when "1100" => if                ST(10)='1' then take_branch := True; end if; -- JOP (odd parity)
						when others => cpu_state <= do_stuck;
						end case;
						if take_branch then
							offset := ir(7) & ir(7) & ir(7) & ir(7) & ir(7) & ir(7) & ir(7) & ir(7 downto 0) & '0';
							pc <= std_logic_vector(unsigned(offset) + unsigned(pc));
						end if;

So basically above two variables are used. offset is used simply to make the code more readable.

take_branch is calculated with its own case clause, to address all the different cases (hopefully). As you can see you could write a logic equation to calculate take_branch in one go, but it would be one pretty messy equation, whereas the code above is readable (IMHO). Note that the variable take_branch is set and then immediately used, so all of this is occurring during the same clock cycle (rising edge); its value does not need to be preserved any further.
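As an aside, the repeated ir(7) in the offset calculation is a sign extension, and it could also be expressed with resize from numeric_std. This is only a sketch of an equivalent formulation in a made-up wrapper entity, not a suggestion that the original is wrong:

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity branch_offset_demo is
   port (
      ir     : in  std_logic_vector(15 downto 0);
      offset : out std_logic_vector(15 downto 0)
   );
end branch_offset_demo;

architecture rtl of branch_offset_demo is
begin
   -- Sign-extend the 8-bit displacement to 15 bits, then append '0'
   -- to turn the word displacement into a byte offset.
   offset <= std_logic_vector(resize(signed(ir(7 downto 0)), 15)) & '0';
end rtl;
```

Both forms describe the same wiring, so which one reads better is a matter of taste.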

 

Sorry about many comments - I wrote the two first ones on my phone...


Hi Matthew180,

 

Below some thoughts on the interesting points you raised.

 

I have played with writing compilers for languages like C and Pascal, but an HDL compiler... I would not even know where to begin. Maybe it is simpler than I imagine, but I don't think so.

 

I don’t have direct knowledge of how HDL synthesisers work either, but I did enough reading to have a basic mental model of how they work. I think it is like this:

- First the source code is processed into a syntax tree (annotated by a symbol/signal table), like with any compiler. This part consists of well known lexer/parser algorithms.

- Then it proceeds to analyse the syntax tree, essentially converting the concurrent statements into logic formulas (net lists, actually). It does the same for process blocks by first doing a flow analysis and then extracting the logic when possible (and for pure combinational code this should always be possible).

- As part of the above it will try to find constructs that signify registers, adders, shifters and multiplexers. Registers can be synthesised onto macrocell flip-flops and adders/shifters can make use of the dedicated line between adjacent cells for the carry signal. It appears that an effort is also made to recognise FSM’s, although I’m not sure what the specific benefit is. The recognition of these things seems based on simple pattern recognition, with certain fixed idioms (code patterns) recognised to mean certain things. In order to be sure, the HDL developer must stay close to these standard idioms. If for example the fixed idioms are mixed too much with ‘random logic’, the recognition gets confused and the generated circuit will be confused as well. This is what I meant by “synthesiser voodoo”. It is my guess that this part of the process is far more heuristic (and possibly more simple) than one might imagine.

- After everything has been processed into net lists, several optimisation processes are run to minimise the logic and to eliminate dead code. Finally, the simplified result is matched against a library of standard cells to find the optimal allocation (see e.g. this) This step seems conceptually similar to a regular compiler finding an optimal covering of abstract operations with actual CPU instructions. Here my guess is that this part is pretty advanced, with good theoretical underpinnings. I was surprised that my very partial v1 source for the 9902 synthesised down to 6 flip-flops as all else was eliminated as unreachable.

- In the last step the library components are placed on the chip and wires routed. For an FPGA, with its set structure, this is perhaps a bit easier than for an ASIC back-end. The latter may be as complex as automated PCB design.

As said, I have no direct knowledge, but it is my current understanding of roughly how it works.
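
To make the "fixed idioms" point concrete, here is a minimal sketch of the two patterns that synthesisers generally recognise: a clocked process for a register and a concurrent assignment for combinational logic. The entity name `idiom_sketch` and all of its ports are made up for this illustration and are not taken from the 9902 source.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical example entity, not part of the 9902 code.
entity idiom_sketch is
   port (
      clk   : in  std_logic;
      rst   : in  std_logic;                     -- synchronous reset, active high
      en    : in  std_logic;                     -- count enable
      count : out std_logic_vector(3 downto 0);
      zero  : out std_logic
   );
end entity;

architecture rtl of idiom_sketch is
   signal count_q : unsigned(3 downto 0) := (others => '0');
begin
   -- Register idiom: only the clock is in the sensitivity list and every
   -- signal assignment sits inside the rising_edge test.
   reg: process(clk)
   begin
      if rising_edge(clk) then
         if rst = '1' then
            count_q <= (others => '0');
         elsif en = '1' then
            count_q <= count_q + 1;              -- typically extracted as an adder
         end if;
      end if;
   end process;

   -- Combinational idiom: plain concurrent assignments, no clock involved.
   count <= std_logic_vector(count_q);
   zero  <= '1' when count_q = 0 else '0';
end architecture;
```

Keeping the clocked part and the combinational part visibly separate like this is, as far as I can tell, what makes the pattern matching in the tools reliable.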

pnr wrote: "In many ways the current state of synthesis feels like the state of C in the late 70's: the result is 30% bigger and 30% slower than hand coded stuff, but the higher abstraction level makes it worth it. Also, working around 1970's C compiler bugs very much feels like the synthesis voodoo of today: always wondering "will this particular construct translate okay?""

 

I'm curious what gives you that notion? What kind of "hand coding" would you do in terms of HDL? I would hate to try and write the equivalent equations for my HDL. I'm not even sure a human could do it for a complex design.

 

Perhaps I should have said early 1980’s feeling. Don’t take this too seriously, it is not intended that way.

 

I think the similarity for me is that back then C programs would be bigger and slower than hand-coded assembler and that could be important on a slow machine with 64KB address space. Despite that being the case, C was still preferred because of the advantages that it offered: portability and a much higher abstraction level. In the case of CPLD’s one could think of writing logic formulas in e.g. CUPL as the equivalent of assembler. Working at this level you can still control every detail of how to fit a circuit onto a small device. Of course, this is only workable for small designs — the upper limit is perhaps 100 flip-flops. For larger, more complex systems it is clearly a non-scalable dead end.

My memory of the early 80’s is that most C tool chains had bugs. It was part of life then for C compilers to mistranslate certain less common constructs, or to generate a complex instruction sequence where a simple one would have worked. Black belt programmers would inspect the assembler output of the compiler and tweak the C code to get the translation they wanted. For critical code they sometimes wrote scripts to massage the compiler's assembler output. It is different, but for me reminiscent of how sometimes HDL code that works in simulation apparently has to be tweaked to work (efficiently) on a real FPGA as well.

If it was as easy as writing compilers then I think we would see more open source HDL synthesizers.

 

I think the non-existence of leading open source FPGA tools might have more to do with the small group of developers that are all of (i) capable software engineers, (ii) capable FPGA engineers and (iii) interested in working on an open source HDL tool chain. The vendors keeping bit stream formats secret is no help either. I suppose they have a valid commercial interest in doing so, both creating customer lock-in and protecting against reverse engineering of designs.

Having leading open source compilers for programming languages was not always the case. Let’s look at C. Up to the late 70’s system software was typically a service item with the hardware and customers often had access to source code. When this changed around 1980, compilers became closed source and at the time were not easy to write. For example, the original C compiler by Dennis Ritchie had some 13,000 lines of code and the Unix V6 core was only some 7,000 lines. That size was “big” for most programmers back then. In the early 80’s I think there were only two open source C compilers: Small C (some 6,000 lines) and cc68K (also some 6,000 lines); the latter would not run on 16 bit hardware. Minix source code was open (though not in the modern sense), but the tool chain (the “Amsterdam Compiler Kit”) was not. Similarly, Niklaus Wirth published the source to his Pascal-S system, but the descendant UCSD Pascal system had no source openly available.

Only after the initial 1.0 release of gcc (in 1989, about 100K lines) the context started to change, and gcc over the next 10-20 years rose to dominance. I think today the C compiler space is essentially gcc and llvm, with scarcely a proprietary player left. Speccery pointed me in the direction of the IceStorm project. That project may currently be experiencing its own “1989” moment and who knows what will happen in the next 10-20 years in the HDL synthesis space?

I see people complain about the closed-source vendor tools a lot, but personally I'm just really happy companies like Xilinx and Altera make their tools available for free. I would not be doing FPGA work if I would have had to pay for the tools. They can be big and unwieldy sometimes for sure, but they tend to work.

 

As said, I'm of a similar mind. At the same time, the bloat of these systems is a pet peeve -- and actually kept me from installing any of it for very long time. Xilinx ISE is 17GB and 240 thousand files, Altera Quartus is 9GB and 150 thousand files. I'm sure that a small business / hobby version could fit in well less than 1GB. I think that the IceStorm tool chain, when combined with a good simulator and graphical front end fits in less than 100MB.

 

 


I'm doing some cleanup of the 9902 code and here's a question:

 

Take this code for the clock divider (it is around line 200 in the full source file) and let's call it version A:

   clkdiv: process(CLK, clkctr_q, ctl_q)
   variable v : std_logic_vector(1 downto 0);
   begin
      v := clkctr_q;
      if rising_edge(CLK) then
         v := v + 1;
         if ctl_q.clk4m='0' and v="10" then v:="11"; end if;
         clkctr_q <= v;
      end if;
   end process;
   bitclk <= '1' when clkctr_q="00" else '0';

And let's call this slightly modified version, version B:

   clkdiv: process(CLK, clkctr_q, ctl_q)
   variable v : std_logic_vector(1 downto 0);
   begin
      v := clkctr_q;
      if rising_edge(CLK) then
         v := v + 1;
         if ctl_q.clk4m='0' and v="10" then v:="11"; end if;
      end if;
      clkctr_q <= v;
   end process;
   bitclk <= '1' when clkctr_q="00" else '0';

The Xilinx ISE tool chain accepts both versions with a warning that "variable 'v' does not keep its value outside rising edge(CLK)". Both versions simulate okay.

 

The Altera Quartus tool chain accepts version A without warning or error, and flags version B as an error (with a similar error text). Version A simulates okay.

 

In my limited understanding of VHDL both versions should be okay and not give a warning or error. What subtle aspect of VHDL am I missing?

 

(PS of course I can rewrite it to not have a warning or error, but I'm interested to understand why this is wrong and why the two tool chains have different opinions about it).

 

I've been trying to reply to this for two days, but there is just not enough time in a day... ;-)

 

There is a lot hiding in this example and it would be easy to talk about it for quite a while. Trying to keep it short.

 

I'm really surprised the second version works in ISE, and it is good that Quartus errors on it. The main problem with the second example (it took me a while to spot the difference) is that you are trying to assign a new value to a *register* (clkctr_q) without a clock. Doing that for real would mean bypassing the register altogether and assigning the value to a wire.

 

Registers require clocks to move data from D to Q. The first example is a register-transfer, the second example is straight up combinatorial logic (which you can't do to assign a value to a register).

 

It is easier to see this if you draw the circuit you are describing:

 

post-24952-0-02313000-1524365293.png

This is also why using variables can be confusing IMO. The variable does not exist anywhere in the equivalent circuit, and the tools will remove them for anything but simulation. In the second example, the output of the MUX would bypass the register and just go to the Q wire, which creates a combinatorial loop. I'm guessing ISE is just ignoring this and forcing register behavior.
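
For comparison, here is a sketch of how the divider could be written so that nothing happens outside the clock edge. This is only my illustration, not pnr's code: the entity name `clkdiv_sketch` is invented, the `clk4m` port stands in for the `ctl_q.clk4m` record field, and I switched the counter to `unsigned` from `numeric_std` so the `+ 1` is well defined.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical wrapper entity; clk4m replaces ctl_q.clk4m from the original.
entity clkdiv_sketch is
   port (
      CLK    : in  std_logic;
      clk4m  : in  std_logic;
      bitclk : out std_logic
   );
end entity;

architecture rtl of clkdiv_sketch is
   signal clkctr_q : unsigned(1 downto 0) := "00";
begin
   -- Only CLK in the sensitivity list: the process describes a register.
   clkdiv: process(CLK)
      variable v : unsigned(1 downto 0);
   begin
      if rising_edge(CLK) then
         -- The variable is written before it is read, entirely inside the
         -- clocked branch, so it never has to hold a value between edges.
         v := clkctr_q + 1;
         if clk4m = '0' and v = "10" then
            v := "11";                 -- skip one state when clk4m is low
         end if;
         clkctr_q <= v;                -- the register assignment stays clocked
      end if;
   end process;

   bitclk <= '1' when clkctr_q = "00" else '0';
end architecture;
```

Because `v` is assigned inside the `rising_edge` test before any use, there should be nothing for the tools to warn about regarding its value outside the clock edge.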

 

This is also a strange circuit to me and I'm not sure I understand what it is trying to do, although I am looking at it out of context from the rest of the HDL. You always increment the clkctr_q register and the sequence seems to go: 00,01,11,00,01,11... But the bitclk output will only be high when clk4m coincides with clkctr_q being "00". I don't know the frequency of CLK or clk4m, so I don't know how often that will happen.

 

The bitclk output will also only be high for the duration of CLK, and if CLK is faster than clk4m such that clkctr_q can count back around to "00" before clk4m goes high, bitclk will pulse high again, i.e. you could get two or more pulses of bitclk for a single clk4m low period. I don't know if this is intended, but it could be a problem lurking.

 

Something else to note, your output goes through combinatorial logic, i.e. the MUX or equivalent logic generated with:

bitclk <= '1' when clkctr_q="00" else '0';

There are two characteristics to realize about doing this that may not be obvious:

 

1. The output combinatorial logic will be included in the delay path to whatever circuit bitclk is connected to. This might not be a problem in this case, but it will become an issue as your circuits get faster, and understanding this is very important.

 

2. If you were driving bitclk to an FPGA output pin, it can cause the output to glitch, which can cause problems for whatever you have connected to that physical pin.

 

You should register the bitclk output since you have a register-transfer process, which will eliminate both of the issues above.
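
A minimal sketch of what registering the output could look like, under the same assumptions as before (invented entity name `clkdiv_regout`, a `clk4m` port standing in for `ctl_q.clk4m`, `unsigned` counter). The decode is done on the next counter value so that the registered bitclk is high in the same cycle in which clkctr_q is "00".

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical entity for illustration only.
entity clkdiv_regout is
   port (
      CLK    : in  std_logic;
      clk4m  : in  std_logic;
      bitclk : out std_logic
   );
end entity;

architecture rtl of clkdiv_regout is
   signal clkctr_q : unsigned(1 downto 0) := "00";
   signal bitclk_q : std_logic := '1';   -- matches the initial count of "00"
begin
   clkdiv: process(CLK)
      variable v : unsigned(1 downto 0);
   begin
      if rising_edge(CLK) then
         v := clkctr_q + 1;
         if clk4m = '0' and v = "10" then
            v := "11";
         end if;
         clkctr_q <= v;

         -- Decode the *next* count and store the result in a flip-flop,
         -- so bitclk comes straight from a register output.
         if v = "00" then
            bitclk_q <= '1';
         else
            bitclk_q <= '0';
         end if;
      end if;
   end process;

   bitclk <= bitclk_q;                    -- a wire from the flip-flop, no logic
end architecture;
```

The comparator now sits in front of the flip-flop, so it no longer adds delay or glitches on the path that leaves this block.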


The vendors keeping bit stream formats secret is no help either. I suppose they have a valid commercial interest in doing so, both creating customer lock-in and protecting against reverse engineering of designs.

I guess I never looked (never had a reason to know a bit stream format). Xilinx publishes some information though, for example they have documentation on how to replace parts of the bit stream on the fly. If you needed to just change the contents of a BRAM initialization you could find that part while the data was being sent to the FPGA and replace it with other data. I wrote a custom JTAG loader once and found all the documentation I needed about the bit stream files. That's also one of the reasons I stick with Xilinx. For all their faults, they have awesome documentation and I find they give more information than you would ever need.

 

 

Only after the initial 1.0 release of gcc (in 1989, about 100K lines) the context started to change, and gcc over the next 10-20 years rose to dominance. I think today the C compiler space is essentially gcc and llvm, with scarcely a proprietary player left.

 

Microsoft is a pretty big player I think. There are also plenty of commercial compilers still used in industry, for example I have had to buy and use the Keil C compiler at work for some 8051 work.

 

As said, I'm of a similar mind. At the same time, the bloat of these systems is a pet peeve -- and actually kept me from installing any of it for very long time. Xilinx ISE is 17GB and 240 thousand files, Altera Quartus is 9GB and 150 thousand files. I'm sure that a small business / hobby version could fit in well less than 1GB. I think that the IceStorm tool chain, when combined with a good simulator and graphical front end fits in less than 100MB.

Yes, the tool bloat is INSANE! I always have to ask WTF could they *possibly* have in there that requires 7GB of download?!?! When I first downloaded ISE in 2011, it was only 2GB I think, maybe 3GB and I thought that was crazy. Now it is twice that size, and Vivado is worse!

 

I think that is what you get for using frameworks and other "tools" that generate code. It seems to be what "programmers" do today, they don't write code, they just come up with ways to connect frameworks, plugins, and modules. Anyway, I'm digressing and I sound like the old generation complaining about the next. ;-)

 

But, I deal with it since I'm interested in getting on with my projects rather than fussing over the tools. I even installed Win7 on real hardware just to run ISE, since it runs like crap on Win10, and just as bad on a Win7-VM. On Win7 installed on hardware, ISE is fast again like I remember it being, and I can work with it again.


Some further progress with the 9902 in VHDL. The code is feature complete now, and passes an extensive test set.

 

In the main source there are the below changes:

- added code to implement the "test mode" feature

- moved from the deprecated std_logic_vector_numeric library to the recommended unsigned_std library

- fixed a bug in the receive/transmit bit timers (the "/8" bit was not implemented right)

- some code cleanup

- some tweaks to optimise the synthesised circuit

 

The test bench is much extended now, with about 150 tests (not exhaustive yet, but broad coverage nonetheless). The 9902 code passes all tests on all of Xilinx, Altera and GHDL. Note that this is simulation of the original source code; I have not attempted simulation of the synthesised circuit yet.

 

When I look at the synthesis report of Xilinx and Altera I can see that the circuit is by and large extracted as I would expect (although some things I don't understand yet). There are some interesting differences though. Xilinx comes up with 158 flip-flops, which is what I would expect, Altera reports 162. The difference turns out to be that Xilinx uses binary encoding for the FSM's and Altera uses one-hot encoding. Also, Xilinx recognises the timer circuit as a FSM, whereas Altera only recognises the transmit and receive FSM's. In both tool chains the full 9902 circuit requires about 400 LUT/LE blocks on the FPGA.
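
To illustrate where such a flip-flop difference can come from, here is a small sketch of the usual FSM idiom with an enumerated state type. It is not taken from the 9902 source; the entity `fsm_sketch`, its ports and the state names are invented. The source code does not fix the encoding: with three states a tool may use two flip-flops (binary) or three (one-hot).

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical three-state machine, for illustration only.
entity fsm_sketch is
   port (
      clk   : in  std_logic;
      rst   : in  std_logic;      -- synchronous reset, active high
      start : in  std_logic;
      done  : in  std_logic;
      busy  : out std_logic
   );
end entity;

architecture rtl of fsm_sketch is
   -- The enumerated type is what lets the tool recognise a state machine
   -- and pick the state encoding itself.
   type state_t is (S_IDLE, S_RUN, S_STOP);
   signal state_q : state_t := S_IDLE;
begin
   fsm: process(clk)
   begin
      if rising_edge(clk) then
         if rst = '1' then
            state_q <= S_IDLE;
         else
            case state_q is
               when S_IDLE =>
                  if start = '1' then
                     state_q <= S_RUN;
                  end if;
               when S_RUN =>
                  if done = '1' then
                     state_q <= S_STOP;
                  end if;
               when S_STOP =>
                  state_q <= S_IDLE;
            end case;
         end if;
      end if;
   end process;

   busy <= '0' when state_q = S_IDLE else '1';
end architecture;
```

One-hot costs more flip-flops but usually makes the next-state and output logic simpler, which is a plausible reason for a tool to prefer it on an FPGA and not on a macrocell-limited CPLD.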

 

As an experiment I also synthesised the code for a MAX7160 CPLD. This time the Altera software picks binary encoding for the FSM's and arrives at the expected 158 flip-flops. However, the combinational logic does not seem to fit behind these, and a further 55 macrocells are needed to fit the circuit, for a total of 213. As there are only 160 on the device it does not fit. I wonder if hand-coding in CUPL could make it fit, but I expect it won't be possible and it is too large a job to even try.

 

tb_9902_v2.vhd.txt

tms9902_v6.vhd.txt


Yes, the tool bloat is INSANE! I always have to ask WTF could they *possibly* have in there that requires 7GB of download?!?! When I first downloaded ISE in 2011, it was only 2GB I think, maybe 3GB and I thought that was crazy. Now it is twice that size, and Vivado is worse!

 

I think that is what you get for using frameworks and other "tools" that generate code. It seems to be what "programmers" do today, they don't write code, they just come up with ways to connect frameworks, plugins, and modules. Anyway, I'm digressing and I sound like the old generation complaining about the next. ;-)

 

But, I deal with it since I'm interested in getting on with my projects rather than fussing over the tools. I even installed Win7 on real hardware just to run ISE, since it runs like crap on Win10, and just as bad on a Win7-VM. On Win7 installed on hardware, ISE is fast again like I remember it being, and I can work with it again.

 

I'm running these tools in a Win7-VM on OSX, and it seems to work okay. Not snappy, but quite workable.

 

Next to the full bore tool chains, I've set up a tool chain based around the Textadept programmer's editor, the GHDL compiler/simulator and the gtkWave 'logic analyser'. On Linux this will give you a full VHDL compile/simulate tool chain in about 15MB. On Windows or OSX it is a bit bigger, as you have to load the GTK library as well (~15MB). I can certainly recommend that to people who just want to learn VHDL and are not ready for multi-GB installs. I guess that GHDL could be replaced by iVerilog for Verilog source code and having both installed would add about 10MB to the total size (but I have not tried that yet).
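
For anyone who wants to try that light-weight route, here is a minimal self-checking test bench sketch that should run under GHDL. Everything in it is made up for the example (the `ctr2` counter, the `tb_ctr2` test bench and the file name `ctr2_tb.vhd` in the comments); it has nothing to do with the 9902 test bench other than using the same assert-based approach.

```vhdl
-- Assumed file name: ctr2_tb.vhd. Typical GHDL usage:
--   ghdl -a ctr2_tb.vhd
--   ghdl -e tb_ctr2
--   ghdl -r tb_ctr2 --vcd=ctr2.vcd
--   gtkwave ctr2.vcd
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Tiny device under test: a free-running 2-bit counter.
entity ctr2 is
   port (clk : in std_logic; q : out std_logic_vector(1 downto 0));
end entity;

architecture rtl of ctr2 is
   signal q_r : unsigned(1 downto 0) := "00";
begin
   process(clk)
   begin
      if rising_edge(clk) then
         q_r <= q_r + 1;
      end if;
   end process;
   q <= std_logic_vector(q_r);
end architecture;

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity tb_ctr2 is
end entity;

architecture sim of tb_ctr2 is
   signal clk : std_logic := '0';
   signal q   : std_logic_vector(1 downto 0);
begin
   dut: entity work.ctr2 port map (clk => clk, q => q);

   stim: process
   begin
      wait for 1 ns;                       -- let the initial values settle
      assert q = "00" report "counter should start at 00" severity failure;
      for i in 1 to 8 loop
         clk <= '1'; wait for 5 ns;        -- rising edge, then sample q
         assert unsigned(q) = i mod 4
            report "unexpected count after clock " & integer'image(i)
            severity failure;
         clk <= '0'; wait for 5 ns;
      end loop;
      report "all checks passed";
      wait;                                -- no more events: simulation ends
   end process;
end architecture;
```

The test bench generates its own clock inside the stimulus process, so the simulation stops by itself once the final `wait` is reached.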

 

I had a closer look at the Altera simulation tools. They weigh in at about 3GB. In part that size seems driven by installing a lot of support code, among which a full gcc-mingw install (>100MB). However 2/3rds seems to be files with detailed timing models for every component on every Altera FPGA (so there is a complete tree for each type/package/speed combination). You need that info to simulate the circuit in its synthesised form. That's many thousands of small files, adding up to >2GB. They could make the install a lot smaller by only installing the files for devices that the user was interested in (e.g. by download on demand). If they put all components for a device family in e.g. a sqlite database instead of separate files it would be more manageable and probably run faster. On the Xilinx side it is probably much the same story.


You have obviously spent a lot more time paying attention to the tools and details about what they are doing. I just installed ISE and started writing VHDL. Whatever size my circuit synthesizes out to, well, I just accept it and don't pay much attention to how many flip flops things convert into. I have never worried about a really space-constrained CPLD though. Also, I have found that when you start pushing up against something like 96% or more of the resources, place-and-route takes a lot longer, some logic paths that used to be fine start failing timing, and small changes in the HDL can cause rather large changes in routing.

 

I did not even know there was an open source tool-chain, I might have to look into that. Thanks for the info.

 

I also did not know that you could run your VHDL through simulation without synthesizing. I'm pretty sure the simulator in ISE works against synthesized HDL, or at least that's what it appears to be doing.

 

I tried my hand at using some variables in a small SPI module, along with records for the inputs and outputs, and some aliases for things like "stl" for "std_logic_vector", etc. I'm not sure if I like it or not. I really don't see the advantage of using variables though. It is the same work flow, and in some cases it was a little confusing IMO. But it did work as expected, and maybe it made simulation faster? That is hard to tell because it was a small module.


You have obviously spent a lot more time paying attention to the tools and details about what they are doing.

 

Well perhaps, perhaps not -- maybe I sound more certain about things than I really am: I'm on a learning curve here and what I believe to be correct today, may turn out a gross learner's mistake tomorrow; but putting "possibly", "hypothesis" or "current understanding" in every sentence gets to be a bit much.

 

I also did not know that you could run your VHDL through simulation without synthesizing. I'm pretty sure the simulator in ISE works against synthesized HDL, or at least that's what it appears to be doing.

I (currently :^) think that this is a selectable option in ISE (the drop box just below the implementation-simulation radio button). If one selects "Behavioural" then I think it simulates the code as written, with combinational logic assumed to happen instantly. The point of doing this I think is that one can set breakpoints in process blocks, and get the main thrust of the model working. If one selects "Post place and route" then I think it simulates the circuit as the synthesised actual wires, LUT's and flip-flops, with their proper timing delays as part of the model. Haven't done any tests yet to truly figure this out.

 

The old Atmel tool chain has this difference clearly outlined in its workflow. See figure 1 on page 3 in this document. Maybe ISE is very different from ProChip and I have it all backwards.

 

I really don't see the advantage of using variables though. It is the same work flow, and in some cases it was a little confusing IMO. But it did work as expected, and maybe it made simulation faster? That is hard to tell because it was a small module.

Maybe the point about speedier simulation in some papers only applies to million gate circuits. I find that simulating the 9902 for 1.5 million clock cycles takes less than a second. Also, "compiling" simulators are apparently 100 times faster than "interpreting" simulators and the consideration may be outdated. As a beginner, I would not know.


pnr's TMS9902 VHDL core works nicely! I integrated it into my TMS9995 breadboard design - I built this two years ago or so before I dived into the TI-99/4A hobby.

 

https://youtu.be/IGBE18uBV_o

In the video you can see the TMS9902 working nicely with an actual TMS9995 processor chip.

 

The breadboard is my version of Stuart's TMS9995 breadboard as documented very nicely by him at: http://www.stuartconner.me.uk/tms9995_breadboard/tms9995_breadboard.htm

In my breadboard I have a small Xilinx Spartan 6 FPGA board connected to the breadboard and delivering a lot of stuff, now including the UART.

This project is documented at: https://github.com/Speccery/fpga99


Thanks speccery! I think that I can now claim that my 9902 design is "FPGA proven" :^) I hope it will be useful to folks like FarmerPotato, Ksarul etc. to create new things.

 

Actually I can report a further success: I now have a simple 9900 system running on a FPGA, with no external components. I'm actually using the same prototyping board as speccery uses in the video (see picture). It has enough resources on the chip to emulate a system with 64KB of RAM/ROM.

 

It consists of 5 files:

- a small ROM with a test routine

- a small RAM

- the tms9902 (note: reconfigured to work with a 50 MHz clock)

- a tms9900 CPU, as developed by speccery for his retro-challenge project last year

- a "breadboard" file that wires up the above components.

All in all it is pretty much like the Grant Searle FPGA setup that started off this thread. I have attached the files for people that want to replicate the result. As before, the .txt extension has been added to make the forum happy.

 

The test routine initializes the 9902 and proceeds to send a continuous stream of "Y" characters over the serial port. I've verified that this works. Taking into account speccery's result above, I'm confident that once I extend the RAM and ROM to bigger sizes and put one of Stuart's images in the ROM it would work, but I have not done that yet.

 

The possibilities are endless. Besides all the "software breadboard" projects, it would be feasible to create ready made files for a TI99/4A (speccery has actually already done that last year), for a Tomy Tutor, a Geneve, a Powertran Cortex, a Marinchip M9900 or even a TI990 mini.

post-37953-0-36257600-1525195305.jpg

breadboard.vhd.txt

rom.vhd.txt

ram.vhd.txt

tms9902.vhd.txt

tms9900.vhd.txt

Edited by pnr