
Designing MMU's for the 99xx and 99xxx


pnr


<Re-posted from "99110 ROM disassembly" because it is off topic there>

 

I suppose the TMS99105 still does not have what it would take to implement virtual memory, i.e. no support for page faults and generic restartable instructions?

In short:

- No, it does not have such support out of the box

- But maybe it can. Because of the 'registers in RAM' architecture, I think it might be possible with the help of external hardware (also for the 9900 & 9995)

But what is the point of demand paging when virtual memory space (64KB) is so much smaller than physical memory (say 1MB)? Maybe it only makes sense in the reverse situation. Also, does a TLB make sense when virtual memory is small?

Besides address translation, the other purpose of an MMU is memory protection. How to implement that on a 99xx/99xxx is an interesting question too.

As far as I can see, there is still a limit of 16-bit addresses, maybe a second bank, but not more. (I think there is a map bit.) It is a pity that the TMS architecture does not allow for more.

Yes, from a non-kernel program's viewpoint, address space is limited to 16 bits. The 99000 has two kludges to make it somewhat 17-18 bit-like.

- Separation of instruction space and data space (not used on TI990 minis). This was used with great success on PDP-11 minis and early Unix.

- The PSEL bit. Most minis of the era had two memory spaces (kernel/user) driven by the supervisor bit in the status register. TI separated the two functions into separate bits, but when using a 74LS612 mapper or the TI990 MMU this is not fully exploited and the two bits move in tandem. With some new macro instructions PSEL could be made more useful.


One interesting concept is realized in the 99/8: Its mapper (AMIGO) is used to map logical addresses (16 bit) to physical addresses (24 bit). However, they could not expand the address bus, so the address is split into two words and transmitted over the address lines. The physical space decoder (MOFETTA) had to latch the first word and then wait for the second.

 

In the first word, the first eight bits in the mapper can be set to allow read/write/execute access.
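As a rough illustration of the two-word transfer, a small C sketch (the field layout below is my assumption, not taken verbatim from the AMIGO spec: permission bits plus the upper 8 physical address bits in the first word, the lower 16 bits in the second):

/* Sketch only: how a 24-bit physical address plus R/W/X flags might be split
   over two 16-bit transfers. The exact bit positions are an assumption.      */
unsigned short first_word(unsigned long phys, unsigned char rwx_flags)
{
    return (unsigned short)((rwx_flags << 8) | ((phys >> 16) & 0xFF));
}
unsigned short second_word(unsigned long phys)
{
    return (unsigned short)(phys & 0xFFFF);
}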


I'm not a TI engineer, but I would guess that 24 lines take up a lot more space on the board, and anyway they need to get down to 16 bits to keep compatibility with cartridges and the PEB.

 

The control bits in the mapper can be used to guard the memory against writing or execution. The mapper (which is the only part that we don't have logic diagrams for, so this is deduced from the specifications) raises an interrupt to the 9995 when WE is asserted during an access to a write-protected page, and likewise when IAQ is asserted during an access to a page without execute permission. (This is actually still missing in the MAME emulation, so no need to try it.)


I wonder what would have been necessary to expand the logical memory space of the TMS processors while still keeping them mainly 16-bit architectures. One thing that comes to my mind is some kind of double register usage, similar to the MPY/DIV operations. What about

 

MOV **R0,R2

 

which would read the contents of the address ((R0<<16) | R1), or, for the reverse direction, write there. One could also conceive an autoincrement **R0+ which uses R0 and R1 as a 32-bit value. But we would also need 32-bit operations like INCD, DECD, and AD. However, and this would be the ultimate problem: there are no spare bits left in the instruction word to indicate such a mode. For two-operand instructions we always need twice 4 register bits plus 2 mode bits each, which leaves 4 bits for the opcode. Using three mode bits would result in 2*7 bits for the operands, leaving a single bit for the opcode. Have fun with two instructions. Sigh.
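Just to make the proposed addressing mode concrete, a small C sketch of the effective address such a hypothetical **R0 mode would form (the R0/R1 pairing is this post's proposal, not anything in the TI datasheets):

/* Hypothetical: effective address for "MOV **R0,R2", with R0 holding the high word
   and R1 the low word of a 32-bit pointer.                                         */
unsigned long effective_address(unsigned short r0, unsigned short r1)
{
    return ((unsigned long)r0 << 16) | r1;
}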


which is the only part that we don't have logic diagrams for, so this is deduced from the specifications

This specification will make for an interesting read. Are the specifications online somewhere, and if so would you have a link?

 

There are no spare bits left in the command word to indicate such a mode.

 

It is possible to use a two-word opcode, where the first word shifts the CPU into another mode, a bit like prefix bytes work on a Z80 or 8086.

 

Something like this already happens in the 99000 instruction set for the 32 bit add, subtract and shift instructions (AM, SM, SRAM, SLAM, see page 73 of the data sheet).

 

One could imagine one prefix that makes the data 32 bit instead of 16, and another one that makes the address 32 bit instead of 16. Without the prefix, a 16-bit address would refer to its own segment. This is then pretty similar to 'near' and 'far' pointers on an 8086.

 

Using macro code one could actually prototype a lot of this stuff on a 99000.
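A sketch of how such prefixes might be decoded (the encodings are made up purely for illustration; on a real 99000 this would live in macrocode):

/* Hypothetical prefix decoder: the opcode values are invented for illustration only. */
#define PFX_DATA32 0x0780   /* made-up encoding: next instruction uses 32-bit data            */
#define PFX_ADDR32 0x0781   /* made-up encoding: next instruction uses a 32-bit (far) address */

int decode_prefixes(const unsigned short *code, int *data32, int *addr32)
{
    int n = 0;
    *data32 = *addr32 = 0;
    while (code[n] == PFX_DATA32 || code[n] == PFX_ADDR32) {
        if (code[n] == PFX_DATA32) *data32 = 1; else *addr32 = 1;
        n++;
    }
    return n;   /* number of prefix words consumed before the real opcode */
}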

 


I just saw that the 99000 has 16 data lines, but after a short look over the specs, I could not find information about byte handling. Does it use read-before-write like the 9900? The 9995 has the advantage that it uses an 8-bit data bus, which means it can change single bytes in memory without RBW.


I just saw that the 99000 has 16 data lines, but after a short look over the specs, I could not find information about byte handling. Does it use read-before-write like the 9900? The 9995 has the advantage that it uses an 8-bit data bus, which means it can change single bytes in memory without RBW.

 

Yes, for MOVB it does read-before-write. For MOV, CLR, SETO, etc. it does not. MOVB R0,R1 on the 99000 executes in 4 clocks, same as 9995 (but the clock is up to twice as fast).

 

 

This specification will make for an interesting read. Are the specifications online somewhere, and if so would you have a link?

 

I found & read the specifications on WHTech:

http://ftp.whtech.com//datasheets and manuals/99-8 Computer/TI-99_8 Mapper Specifications 03-23-1983.pdf

http://ftp.whtech.com//datasheets and manuals/99-8 Computer/TI-99_8 The Mapper And Us 05-26-1982.pdf

 

It has an interesting approach. Some design choices I find strange, but that is probably because I don't understand the 99/8 context very well.

 

- It divides logical memory into sixteen 4KB blocks, with each block translated to a physical address by adding a 24-bit base value to the 12 lower logical address bits (see the sketch after this list).

 

- It has to multiplex the physical address bus because the MMU is a single-chip device and the designers ran out of pins: even with the multiplexed bus it needs 64 pins. To me, it would have made sense to include the dynamic RAM control on the device as well and turn the multiplexing into an advantage, but there probably was a good reason not to do this.

 

- It adds a wait state during which the mapper can do its magic. That is a big cost, even taking into account that the DRAM would require a wait state of its own (a 33% slowdown). The wait state is probably necessary because (i) it uses full addition of the base register (rather than just replacing the top bits); the benefit of full addition is unclear to me, but it takes precious time to perform; and (ii) it uses this state to output the top half of the physical address.

 

- The MMU holds a single map with 16 entries, 4 bytes per entry = 64 bytes. Loading such a big map adds a significant cost to switching from one map to another, and this MMU uses a solution I have never seen before in this form. There is a separate static RAM chip on the processor bus that holds 8 'images' of 64 bytes for the MMU. Upon writing a control byte to the MMU, the MMU requests the bus (using HOLD/HOLDA) and transfers one of the images in or out using DMA, speeding up the transfer significantly. Complex but certainly cool.

 

- The protection bits work as you already mentioned. Interestingly, an illegal instruction fetch or memory read is not blocked, but proceeds until the interrupt is recognised. This leaves security holes, but the purpose was perhaps more to quickly interrupt crashed programs than to provide OS security. An illegal write is blocked, though, allowing for shared read-only memory blocks. The extra MMU wait state helps here, because it creates ample time to prepare for blocking the /WE signal. If I understand the documents correctly, access to the static RAM with mapper images and to the mapper control word is not subject to MMU write protection.
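To make the translation from the first point above concrete, a minimal C sketch of the 99/8-style mapping (variable names and widths are mine; the full 24-bit addition is as described in the spec):

/* Sketch of the AMIGO-style translation: sixteen 4KB blocks, each with a 24-bit base. */
unsigned long amigo_translate(const unsigned long base[16], unsigned short logical)
{
    unsigned block  = logical >> 12;            /* which of the sixteen 4KB blocks */
    unsigned offset = logical & 0x0FFF;         /* lower 12 logical address bits   */
    return (base[block] + offset) & 0xFFFFFF;   /* full addition, 24-bit result    */
}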

 

I guess that on a 9995 all the protection stuff has limited effectiveness anyway: all programs have the same full rights as the OS, as there is no supervisor mode.

 

However, it may be possible to add this to the 9995 (and 9900) with external hardware: a supervisor bit could be created in a CRU register (e.g. 74LS259). If the bit is set, the system goes to user mode, enabling mapping and enforcing protection bits. CRU operations are blocked in user mode. If there is a protection violation the hardware resets the CPU and the CRU register. Reset will abort the current instruction immediately (and actually save the user's WP, PC and ST in the reset workspace). A user program could call into supervisor mode by generating such a reset deliberately, for instance via the RSET instruction and some external hardware.
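A rough formalisation of that proposal, just to show the intended state machine (everything here is hypothetical, following the description above):

/* Hypothetical user/supervisor scheme built around a single CRU bit (e.g. a 74LS259). */
enum cpu_mode { SUPERVISOR, USER };

struct prot_state {
    enum cpu_mode mode;        /* SUPERVISOR after reset                        */
    int           map_enable;  /* mapping and protection enforced only in USER  */
};

void enter_user_mode(struct prot_state *s)      /* supervisor sets the CRU bit  */
{
    s->mode = USER;
    s->map_enable = 1;
}

void on_violation_or_service_call(struct prot_state *s)  /* hardware pulses /RESET */
{
    s->mode = SUPERVISOR;      /* CPU saves WP/PC/ST in the reset workspace        */
    s->map_enable = 0;         /* and the CRU bit is cleared by the same hardware  */
}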

 

Figuring out a usable, safe interrupt mechanism for such a setup is a challenge, though.

 


Yes, for MOVB it does read-before-write. For MOV, CLR, SETO, etc. it does not. MOVB R0,R1 on the 99000 executes in 4 clocks, same as 9995 (but the clock is up to twice as fast).

 

Argh ... they just didn't learn. It could not have been too hard to add a bus code for upper/lower byte transfer.

How does it manage to do a MOVB in 4 clock cycles when it has to do a read first? The 9995 does not need it, as I said, since it addresses single bytes.

 

 

I found & read the specifications on WHTech: [...]

It has an interesting approach. Some design choices I find strange, but that is probably because I don't understand the 99/8 context very well.

 

The documents are - in parts - fun to read; the engineers seem to have developed some kind of sarcasm, as I read it. ("The naming of the chips took a turn for the worse...")

 

As I already said earlier, I believe that if it had found its way into production, the 99/8 would have become a commercial disaster, for countless reasons. When you look at the Geneve, almost everything, from RAM management to video, is done better than in the 99/8. Of course, the Geneve started four years later.

 

I had some fun with these documents when I implemented (or rewrote) the 99/8 in MAME. There are some points that you just don't believe at first sight:

 

- Memory mapping by adding base addresses instead of using bit operations. You almost suspect it was botched up deliberately. If you have a 24-bit physical space, why use a costly addition?

- I had a closer look at the privilege violation for the page accesses. This is only vaguely described ("an interrupt is signaled to the CPU"). In fact, the AMIGO mapper connects to the EXTINT* line, which runs to the 9901, which in turn propagates it as INT1* to the CPU. This means it arrives at level 1. Wut? You can mask a privilege violation with LIMI 0? Or, even better, by blocking the CRU input? As you suspected, this looks more like a way to detect software errors than a way to provide security. If only they had used the NMI, this would have made a bit more sense.

- 9995 variant using no on-chip RAM. The first thing I did on the Geneve was to squeeze code into the on-chip RAM and enjoy the high performance (e.g. for my Fractals program).

- 9995 configured for auto wait-state and clocked with 10.7 MHz (instead of 12). "Do you really want to hurt me? Do you really want to make me cry?", if I may cite a popular artist of the 80s.


Argh ... they just didn't learn. It could not have been too hard to add a bus code for upper/lower byte transfer.

How does it manage to do a MOVB in 4 clock cycles when it has to do a read first? The 9995 does not need it, as I said, since it addresses single bytes.

 

??

 

99000:                       9995:
1. read opcode               1. read opcode high
2. read source word          2. read opcode low
3. read destination word     3. read source byte
4. write destination word    4. write destination byte

 

Note that there is no visible ALU cycle as it overlaps with instruction pre-fetch. Both the 9995 and the 99000 use this trick.

 

It is all documented in sections 10.6.3 and 10.6.4 of the datasheet. Actually, section 10.6.4 is so detailed that it is almost a listing of the microcode.


I'm not a fan of hardware supported virtual memory or kernel vs. user memory in older retro systems like this. Resources are limited enough already, and making something that works with existing hardware and/or software is typically a priority.

 

Having support at the CPU level via enhanced instructions is key to making a memory mapper usable, i.e. jump or branch instructions that know how to reach a destination address beyond 64K. Otherwise having to manage the pages manually becomes a PITA (although no more of a hassle than dealing with a bank-switched cart). But to deal with that you are now looking at where to store the extra address bits and how the memory appears to the software. Intel dealt with this by using segments on the 8088/8086.

 

Other 8-bit systems like the MSX have already solved this for a 64K address space CPU, so how does the memory-mapper work in that system? Maybe the best thing would be to copy something that is proven, working, and supported in other systems.


I'm not a fan of hardware supported virtual memory or kernel vs. user memory in older retro systems like this. Resources are limited enough already, and making something that works with existing hardware and/or software is typically a priority.

 

I agree with the sentiment, but it also depends on one's retro reference. My retro interest is in building small boards that work like the 16-bit minis of the late 70's: the TI990, the PDP-11 and the Nova/Eclipse. And actually, an early 80's home machine like the Cortex is not all that different (neither is the Geneve). My software reference is mostly early Unix, but also the Marinchip stuff:

http://www.stuartconner.me.uk/mini_cortex/mini_cortex.htm

https://www.fourmilab.ch/documents/marinchip/

 

For my next project I'm thinking about a 99000 based board with an MMU modeled on that of the Nova/Eclipse (essentially a fancy mapper).

 

But I'm also interested to hear about other concepts from back-in-the-day and other people's ideas for today's retro projects.

 

 

Other 8-bit systems like the MSX have already solved this for a 64K address space CPU, so how does the memory-mapper work in that system? Maybe the best thing would be to copy something that is proven, working, and supported in other systems.

 

Did a bit of Googling. MSX2 seems to have used a very simple mapping scheme with four 16KB blocks paged into up to 4MB of RAM (i.e. an 8-bit page address). Many machines seem to have implemented a 4-bit page address, using two 74LS170 chips for the mapper. The Cortex and Geneve mappers appear more capable.
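A minimal sketch of that MSX2-style scheme as I understand it from the above (four 16KB slots, one 8-bit page register per slot):

/* MSX2-style mapper sketch: a 64KB logical space split into four 16KB blocks. */
unsigned long msx_translate(const unsigned char page[4], unsigned short logical)
{
    unsigned block  = logical >> 14;                       /* which 16KB block (0..3)      */
    unsigned offset = logical & 0x3FFF;                    /* offset within that block     */
    return ((unsigned long)page[block] << 14) | offset;   /* up to 256 pages = 4MB of RAM */
}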

 

 

Having support at the CPU level via enhanced instructions is key to making a memory-mapper usable, i.e. jump or branch instructions that know how to reach a destination address longer than 64K. Otherwise having to deal with managing the pages manually becomes a PITA (although no more of a hassle than dealing with a bank-switched cart). But to deal with that you are now looking at a situation of where to store the extra address bits, and how does the memory appear to the software. Intel dealt with this by using segments on the 8088/8086.

 

Yes. This is what makes the 99000 so interesting: such instructions can be created in macrocode.

 

One possible solution is the early Unix overlay system. Here the tool chain keeps track of what code lives in what overlay and automatically switches between overlays as functions get called and return. It is mostly transparent to the programmer. This is how Ultrix-11 can run a 150KB kernel on a 16-bit PDP-11.

 

Here's another idea that is specific to the 99000:

 

- All code must live on word boundaries, i.e. all target addresses for B, BL and BLWP are even. Effectively, bit 15 is not used and wasted.

- We could use that bit to hold an extra code address bit, above A0.

- Assume a machine laid out for separate I/D spaces; the PSEL bit is another address bit in I space, but ignored in D space (i.e. I space is 128KB, and D space is 64KB)

 

Now we can make three new macro instructions, "long" branches: LB, LBL and LBWP. Each of these behaves like the normal branches, except that it looks at bit 15 of the destination, sets PSEL accordingly and then makes the jump. So with these instructions we can branch anywhere in the 128KB code space. Total program space: 128KB + 64KB = 192KB.
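In rough C pseudo-code the proposed LB behaviour would be as follows (all of this is the proposal above, nothing from the 99000 datasheet; TI numbers bits from 0 at the MSB, so "bit 15" is the least significant bit):

/* Sketch of the proposed LB ("long branch"): bit 15 of the target becomes PSEL. */
void long_branch(unsigned short target, unsigned short *pc, unsigned *psel)
{
    *psel = target & 1;            /* the otherwise wasted LSB selects the I-space half */
    *pc   = target & 0xFFFE;       /* remaining word-aligned address becomes the new PC */
}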



When considering MMU designs, perhaps it is good to look at the competitive field from a 1981 perspective. In 1981, arguably, the cottage industry around microcomputers separated into a "business" segment (Osborne, Kaypro, IBM PC, etc.) and a "home" segment. Early in 1981 the hobby magazine "Elektor" had a special supplement about 16-bit chips that I devoured. In the end I settled on a TI99/4A as the basis for my 16-bit endeavours.

At that time there were four 16 bit processors on the market:

- the 68000

- the 8086

- the Z8000

- the 99xxx / 99xx

These four processors were all remarkably similar: they came in DIP packaging, had a 16-bit data path, ALU and data bus, and offered roughly similar performance. The bus interface logic was also quite similar across these chips.

Potentially, the list should also include the National Semiconductor 16032 (later renamed 32016). However, this chip had a 32-bit data path and ALU internally. Also, it was initially very buggy and usable silicon did not appear until about 1983, after 14 revisions of the design (revision letter "N"!).

I’d like to look at these chips from three perspectives: supervisor mode capability, segmentation vs. paging, and the handling of memory faults. The 99xxx looks to be interesting from all three perspectives.

- When it comes to supervisor mode capability, three of the four offer it: only the 8086 lacks it. The TI990 minis and the 99xxx offer it, but the 9900 and 9995 do not. It is hard to add with external hardware, because interrupts and system calls (XOPs) must switch back to supervisor mode and the 9900 and 9995 do not offer (easy) signals to recognise this externally.

- All these designs initially chose segmentation to manage memory and only later reworked it into paging designs. This is interesting because the minicomputer world had already decided in the late 70's that paging was the way to go. I'm not sure why the microprocessor world initially chose segmentation. In the case of the 68000 the address space was linear, but its first MMU chip (the 68451) was designed around a segmentation scheme. The 8086 and the Z8000 series CPUs were designed with native segmentation. The 99xxx could go either way: the TI990/10A used the chip with a segmenting MMU, but a paging scheme around a 74LS612 mapper was equally supported.

- The 68000 and the Z8000 were designed with hardware support for recovering from memory faults, but in both cases it did not work due to design errors. The 68000 had to be redesigned into the 68010 and the Z8001/2 into the Z8003/4 to get this fixed. So, from a 1981 perspective, none of the chips had working support for demand segmentation or demand paging. The 8086 does not claim to offer support for this; it would not be supported until the 80286. The 99xxx datasheet is silent on the topic, but my hunch is that the 99xxx does support demand paging with minimal external hardware.


The 68000 has a 32-bit instruction set and a flat memory model. It's not exactly something that translates well to the 9900 series.
If you look at the 68000 series, at least look at the 68010 and higher as to what are considered supervisor instructions.
The external MMU chip is a bit different from the MMUs built into the later 680x0 processors, if I remember right, and it requires a 68010.

If I were to design a larger memory system for the 9900 series now, I'd stick to a simple memory paging system.

Once you get to a 24-bit address bus, you can address 16MB of RAM, which should be enough for anything you'd put on a TI derivative.
Then focus on a GCC/loader/linker combo that automatically works with the 24 bit memory setup rather than spend time on virtual or protected memory.


Thanks for those insightful comments James!

 

You are right: although the 68000 was internally a 16-bit chip, doing 32-bit operations in two steps, the architecture was 32-bit. With 24 physical address lines it could address 16MB directly, huge for 1981. I'll read up on the 68010 a bit more.

 

I agree that a simple paging design, with only functional segmentation (instruction/data, user/supervisor), is probably the way to go. My interest in virtual memory on the 99xxx is more of a "retro challenge" than anything else, although it would enable experimenting with copy-on-write in early Unix.

 

When it comes to compilers I'm focussed on the C compiler from 2.11BSD that I ported to the 9995 a few years ago. It has support for overlays and separate instruction/data spaces built in. I used that compiler to port V6 Unix to the mini Cortex and this compiler now runs natively on 99xx hardware.

 


I’ve found an archived copy of that Elektor supplement from March 1981 that I devoured BITD. It can be found here (in Dutch):

https://archive.org/details/Elektuur20919813Gen

It is funny to see these old CPU's referred to as the new "super chips".

I think it appeared with the April 1981 issue of the UK edition of Elektor, but I have not been able to find an archived copy of this English version. I assume that Elektor had similar supplements in the other language editions. Does anybody else remember those supplements?

Edit:

Did find it:

https://archive.org/stream/ElektorMagazine/Elektor%5Bnonlinear.ir%5D%201981-04#page/n23/mode/2up

In the UK it was not a supplement, but simply pages 23-46 of the April 1981 issue.


Kilobaud has an article comparing the 68000, Z8000, and 8086. I ran across it last night on archive.org.
I didn't think about saving the link at the time though.

*edit*
I just ran across an old book. You can look at the 68012 as well; it extended the 68010 slightly.


Below my notes on the 8086 MMU approach and how it could relate to a 99xxx.

 

The 8086 first appeared late in 1978 and essentially extended the 8080/8085 to 16 bits. It was not object code compatible, but 8080 assembler source code could be automatically converted into working 8086 source code using a conversion program that Intel provided. This proved a tremendous advantage as the existing CP/M code base could easily be ported to the 8086. Intel’s investment in helping Gary Kildall to develop PL/M and CP/M in the 1973-1975 era really paid off here.

The 8086 seems to have used all the chip area that technology would allow in 1978 to add an on-board MMU to the CPU chip and did not add any mini-computer features like a supervisor mode or support to deal with page faults. The MMU is simple, but effective:

- The MMU implements a segmentation scheme. Segmentation is along functional lines: instruction space (“code space”) is separated from data space as had been done on mini computers in the 70’s. Data space was optionally separated in a normal, a stack and an 'extra' data space.

- Each segment had a segment register (CS, DS, SS and ES respectively) which was 16 bits long. These 16 bits were added - offset by 4 bits - to a normal 16-bit address to create a 20-bit physical address (see the sketch after this list).

- In the typical case, instructions were fetched using the CS segment register, data was stored/fetched using the DS segment register and stack operations used the SS register. ES was used with string instructions. However, using a prefix instruction a non-default segment register could be chosen.

- There were no facilities to limit a segment to less than 64KB in length and hence also no facilities to abort an illegal memory access. In this sense, it was no different from the earlier 8-bit CPU generation.
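The resulting address calculation is simple enough to show directly (this is standard 8086 behaviour):

/* 8086 physical address: the 16-bit segment register, shifted left 4 bits, plus the offset. */
unsigned long x86_physical(unsigned short seg, unsigned short off)
{
    return (((unsigned long)seg << 4) + off) & 0xFFFFF;   /* 20-bit result */
}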

The 99xxx could use an 8086-like MMU with a little external hardware. Four 74LS170 chips can implement the four segment registers, and four 74LS283 fast adder chips can add the segment base to the logical address. The segment registers could be loaded using parallel CRU I/O. The four segments could have been (i) instructions, (ii) workspace, (iii) data and (iv) extra. The first three derive directly from bus status codes; the fourth could have been selected using a prefix instruction (like LDD/LDS on a TI990). The prefix instructions and the instructions to load the segment registers could all easily be implemented in macro code.

I guess this all could have fitted in a single 48-pin ULA chip, which would have made a nice 8086-style MMU for the 99xxx. In a way, this would have been vaguely similar to the setup in a TI99/8. The key to making it work would have been implementing an adder with full carry look-ahead so that it could be fast. Because the four segment registers do not take much space, this would have been possible, I think.

If done as full-custom silicon, such an MMU could have added a small amount of ROM with the matching supporting macro code. I wonder how successful such an add-on for the 99xxx would have been.


And here are my notes on another source of ideas (good and bad): the Z8000 series.

 

The Z8000 appeared early in 1979, in between the 8086 and the 68000. It was not a direct successor to the Z80, but shared its philosophy. It came in two versions, the Z8001 and the Z8002. The Z8002 was very similar to the 99xxx but with conventional registers and without the macrostore concept. The Z8001 added segmentation to the Z8002: each address was extended by a 7 bit segment number. Like the 68000, the Z8000 started without an established software ecosystem and Zilog ported V7 Unix instead, called “Zeus”. Zeus included a version of RM/Cobol to make it attractive as a business machine. Microsoft also ported Xenix to the Z8000. Nonetheless the chip failed to be a market success.

Segmentation on the Z8000 was somewhat similar to the page “0” and “1” on a 99xxx, but then with 128 segments instead of 2 pages. Each segment was up to 64KB in size. The program counter had a dedicated segment register and data accesses used two adjacent registers or memory words to hold an address and a segment number. The 7 bit segment and the 16 bit offset were distinct and could not easily be used as a 23 bit “flat” address. Next to these explicit segments, the Z8000 could also use functional segmentation (instructions/data/stack and user/supervisor). There was a companion Z8010 MMU chip that mapped up to 64 segments to real addresses; two of these could be used in parallel.
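Schematically, and simplifying the Z8010's relocation to a plain base look-up (my simplification, without the limit checks the MMU also performs), a Z8001 segmented access looks like this:

/* Schematic Z8001 address: a 7-bit segment number plus a 16-bit offset, never a flat 23-bit value. */
unsigned long z8000_physical(const unsigned long seg_base[128], unsigned seg, unsigned short off)
{
    return seg_base[seg & 0x7F] + off;   /* each segment relocated independently by the MMU */
}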

The Z8001 and Z8002 had an input signal to report memory faults. This input effectively was a non-maskable interrupt that could abort a faulting program. However, the faulting instruction itself ran to completion and could have irrevocably changed register values. Zilog realised early on that this was a mistake and announced the Z8003 and Z8004 that could abort instructions halfway through, along with a paging MMU, the Z8015. I did not find any 1981/82 designs that used the 03/04, maybe they were released later.

The abort mechanism on a Z8003/4 is intriguing: the abort signal seems to be a form of reset. When a memory access causes a fault the abort signal must be held active for 5 clocks simultaneously with the ‘wait’ signal active as well. Then a non-maskable interrupt must be asserted and the abort and wait signals released (note that on a Z8000 the reset signal must also be held active for 5 clocks to be recognised). What seems to happen is the following:

- asserting ‘wait’ stops the bus transaction from completing

- asserting ‘abort’ resets the microcode sequencer to a state where it recognises a non-maskable interrupt at the end of the current bus cycle (instead of the end of the current instruction)

- asserting the non-maskable interrupt causes the state of the processor (PC, status) to be saved and a recovery routine entered.

The Z8015 MMU latches the PC into a register on every instruction fetch, and counts the number of bus cycles since the last instruction fetch. After a fault, this information is frozen. Using these registers, a relatively simple routine can revert any changes the aborted instruction made so that it can be restarted later. All the details are in the 1983 data book (section 7, 9 and Appendix D):

http://bitsavers.trailing-edge.com/components/zilog/z8000/Z8000_CPU_Technical_Manual_Jan83.pdf

I'm not sure the Z8015 ever made it into production -- maybe it never got beyond engineering samples, like the later Z80,000 CPU.

Note that on the NS16032 abort and reset are actually the same pin, which suggests a close link also on that processor. It also almost makes you wonder if the Z8003/04 were really different from the Z8001/02 or whether it was just marketing, a bit like the 99105 has turned out not to be unique silicon.

Maybe the equivalent approach will also work on a 99xxx. On a 99xxx the reset signal is actually also a non-maskable interrupt, and it too will abort an instruction at the end of a bus cycle. However, it takes 3 clocks to be recognised and that could equate to 3 bus cycles. The following might work:

- identify an abort condition before the falling edge of CLKOUT;

- simultaneously assert ‘reset’ and de-assert ‘ready’, and wait for 3 clocks;

- release ‘reset’ and re-assert ‘ready’.

According to the datasheet, the 99xxx CPU will now finish the current bus cycle and proceed to save the processor state in the reset workspace R13-R15.

It will take some experimentation to find out if further hardware support is needed to be able to revert any changes the aborted instruction may have made.


Well, I went ahead and set up another experiment on Stuart's 99110 PCB. Let's see if the 99000 can do Z8000 and 16032 style "reset-aborts"....

 

This time I added circuitry to generate a three clock reset signal upon a (simulated) page fault, keeping ready low at the same time. This seems to work, at least for what I have tested so far: it will abort the instruction halfway through, saving the proper PC and WP. A promising result!

 

The circuit is in a 22V10 GAL and the relevant logic formulas are:

fault  = !/mem * a0 * !a1 * a2 * a3 * !nclk;

/resout = !(fault * q2 + resin);

q0    := !fault;
q1    := q0;
q2    := q1;

Note: a "=" sign creates a combinatorial output and a ":=" sign creates a registered output (clocked on the rising edge of the pin 1 input, which is "nclk"). "/resin" is the button reset via a pair of Schmitt triggers (1/3 of a 74LS14) and "nclk" is the inverted CLKOUT. "/resout" goes to both the "/reset" and "ready" pins of the 99105. This test circuit makes the address range >B000 .. >C000 illegal to simulate a page fault.

 

With this circuit, the instruction sequence

   LWPI >A000
   LI R1,>B000
   MOV *R1+,*R1+

will get aborted in the third instruction, with R1 incremented to >B002. The microcode first reads R1 (storing its value internally), increments it by two, stores the new R1 and then proceeds to fetch the source operand (see table 18 in the data book). Had the instruction run to completion (i.e. if reset behaved like an NMI), R1 would have been incremented to >B004.

 

For everything I tested so far, it would suffice to have:

  • the reset/abort logic as per above
  • a register holding the last correct IAQ address. If the IAQ itself causes the fault, the register is not updated. This is so because the last state of the previous instruction (the final write in many cases) did not take place and hence the previous instruction must be run again after the page that caused the prefetch fault is brought into memory.
  • a 4-bit register counting the number of memory accesses since the last correct IAQ. This is needed so that the roll-back routine knows how far the instruction got before the page fault took place.

I think the above might fit in two 74LS374 chips and a 22V10 GAL.
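As a sketch of how a roll-back routine could use that state, here is the MOV *R1+,*R1+ case from above in C pseudo-code (the structure and the access-count threshold are my assumptions about this design, not tested code):

/* Hypothetical state captured by the external hardware at fault time. */
struct fault_state {
    unsigned short last_iaq;      /* address of the last correctly fetched instruction      */
    unsigned char  access_count;  /* memory accesses completed since that fetch (4-bit ctr) */
};

/* Roll-back for the MOV *R1+,*R1+ example: the microcode reads R1, writes the incremented
   R1 back, then faults on the source fetch. If both R1 accesses completed, undo the
   increment before restarting the instruction.                                            */
void rollback_mov_autoinc(const struct fault_state *f, unsigned short *r1)
{
    if (f->access_count >= 2)
        *r1 -= 2;
}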

I'm lucky with how RTWP works. It turns out that the new WP is the last thing fetched (see page 85 of the data book), and the old WP is saved for every fault up to and including the fetch of the new WP: the instruction can always be restarted with the old WP.

The problem appears to be with BLWP (see page 82). The first thing it does is fetch the new WP from the transfer vector. If this fetch faults, it would seem that the value read by the aborted fetch is stored as the old WP value in the reset workspace. Maybe this is my test setup having issues (this outcome is a bit strange after all), but if correct it means that such a fault will immediately lose the old WP value and hence the instruction becomes impossible to restart.

One solution could be to make a page fault during a BLWP a fatal error. My Unix C compiler does not generate BLWP instructions, so it would be an uncommon thing in that context.

Maybe there is a way to handle it that currently escapes me: more work to do!

 



I did a lot more experiments with a logic analyser hooked up.

 

Things appear to be more complex: taking the READY signal low does not entirely stop the microcode from running. On the first clock the current bus cycle is extended, but on the second the CPU progresses to the next machine state / bus cycle anyway and only then starts the reset/interrupt 0 sequence.

 

For nearly all instructions this does not matter much, as nothing irreversible happens during that extra machine state. There is one exception: in that extra machine state the LWP instruction overwrites the WP register. If reading the new WP from a register fails (because the workspace is located on a faulting page), the instruction cannot be restarted without first restoring the WP in some way.

 

So now there are two troublesome instructions to consider, BLWP and LWP. Part of the solution could be to save the previous WP just like the previous PC is saved. Luckily, every change to WP is echoed on the address bus in a special "WS update" bus cycle, so that isn't too hard to do. It does drive up complexity, though.

 

 

 

 

 

 

