An article about the TMS9900

vol · March 8, 2021

I have almost finished my study of the TI-99/4A. It is a very unusual machine, which makes it very interesting. Its processor is also very interesting. It's sad that Texas Instruments had to switch to the Z80 instead of using some kind of updated TMS9900. I have written a summary of the TMS9900. Any corrections or suggestions for the content or style are welcome. Thank you.

apersson850 · March 8, 2021

The TMS 9900 does have an unmaskable interrupt too. Or two, if you also consider RESET to be one.

RESET is normally used when power is applied. RESET has its vector at address 0x0000. The LOAD interrupt, which is non-maskable (corresponds to NMI in some other processors) has the vector at the other end of the address range, 0xFFFC.

The reason for bytes being processed in the most significant byte of a word is that then the same flag logic as for 16 bit values can be used.
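To illustrate the point about flag logic, here is a minimal Python sketch (not taken from TI documentation; the function names are made up for this example, and the low byte is simply assumed to be zero). If both byte operands sit in the upper half of a 16-bit word, a plain 16-bit adder yields carry, sign and zero indications that match the 8-bit result.

```python
def add16(a: int, b: int):
    """Plain 16-bit adder: returns (result, carry, negative, zero)."""
    total = a + b
    result = total & 0xFFFF
    return result, total > 0xFFFF, bool(result & 0x8000), result == 0


def add_bytes_via_word_adder(x: int, y: int):
    """Add two bytes by placing them in the most significant byte of a word."""
    result, carry, negative, zero = add16((x & 0xFF) << 8, (y & 0xFF) << 8)
    # Only the upper byte holds the byte result; the flags come for free.
    return result >> 8, carry, negative, zero


print(add_bytes_via_word_adder(0xF0, 0x20))
```

The carry out of the top bit of the word is the carry out of the byte, and the sign bit of the word is the sign bit of the byte, so no separate 8-bit flag circuitry is needed in this simplified model.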

To do a subtract immediate by using AI Rx,-VALUE isn't too difficult.
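As a small sketch of why adding a negated constant is equivalent to subtracting, here is the 16-bit wrap-around arithmetic in Python. The helper names `ai` and `neg16` are hypothetical and only model the arithmetic, not the status flags.

```python
MASK16 = 0xFFFF


def neg16(value: int) -> int:
    """Two's complement of a 16-bit value, as an assembler would encode -VALUE."""
    return (-value) & MASK16


def ai(reg_value: int, immediate: int) -> int:
    """Model of AI (Add Immediate): 16-bit addition with wrap-around."""
    return (reg_value + immediate) & MASK16


r1 = 0x0100
r1 = ai(r1, neg16(5))   # corresponds to AI R1,-5
print(hex(r1))
```

Because the register is 16 bits wide, adding 0xFFFB gives the same result as subtracting 5, which is why a separate subtract-immediate instruction is not strictly needed.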

TI never changed to the Z80 CPU. It was just a suggestion, but it was turned down.

They did develop the TMS 9995, which was a 16-bit CPU with an 8-bit data bus (like the 8088). The TMS 9995 was much more efficient internally, and thus roughly three times faster. It was supposed to be used in the TI 99/8.

The TMS 9900 is really the multi-chip architecture of an early TI 990 mini computer on one chip.

Edited March 8, 2021 by apersson850

+mizapf · March 8, 2021

Interesting read; some comments from me ...

I never heard about "pseudo-registers"; it's not a term used by TI. The TMS9900 is a memory-memory architecture, hence the registers are in memory. At first I thought you referred to the hardware registers as pseudo-registers (because you cannot freely use them). Registers are referred to in instructions; you have to provide their 4-bit numbers, unlike accesses to memory locations where you provide the address. From the point of view of the instruction set, it is not correct to say that the TMS9900 only has 3 registers; it does have 16 registers. If you like, you could probably differentiate between the hardware view (3 registers: WP, PC, SR) and the software view (set of 16 registers, freely locatable in memory).

Some instructions (XOP, BLWP) do what the E/A manual calls a "context switch". The term context is usually a bit broader, encompassing registers and other relevant data for a process. For the TMS9900, the relevant data are only the registers, so the context switch is done by a branch with workspace change. The set of registers is officially called the "workspace" rather than the context.

Also, a "dedicated 32-byte memory space" may imply that there is only one such space in memory. Any location in memory may serve as the start of the 16 word register area (if there is RAM, of course).
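A rough Python sketch of the idea that the workspace can start anywhere in RAM: register n is simply the 16-bit word at WP + 2*n. The class and function names, and the example addresses, are assumptions chosen for illustration only.

```python
class Memory:
    """64 KiB of byte-addressed memory holding big-endian 16-bit words."""

    def __init__(self, size: int = 0x10000):
        self.ram = bytearray(size)

    def read_word(self, addr: int) -> int:
        addr &= 0xFFFE                      # word accesses use even addresses
        return (self.ram[addr] << 8) | self.ram[addr + 1]

    def write_word(self, addr: int, value: int) -> None:
        addr &= 0xFFFE
        self.ram[addr] = (value >> 8) & 0xFF
        self.ram[addr + 1] = value & 0xFF


def reg_addr(wp: int, n: int) -> int:
    """Memory address of workspace register Rn for a given workspace pointer."""
    if not 0 <= n <= 15:
        raise ValueError("register number must be 0..15")
    return (wp + 2 * n) & 0xFFFF


mem = Memory()
mem.write_word(reg_addr(0x8300, 5), 0x1234)   # "R5" of a workspace at 0x8300
mem.write_word(reg_addr(0x9000, 5), 0xABCD)   # "R5" of another workspace
print(hex(mem.read_word(0x830A)), hex(mem.read_word(0x900A)))
```

Changing WP selects a completely different set of 16 registers without copying anything, which is what makes the workspace more than a single fixed 32-byte area.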

I did not quite understand your explanation of X (Execute): This command puts the value of the operand into the instruction register and executes that instruction. Hence, you can e.g. create a command during runtime.

You mentioned the "special role" of R0 in addressing, but you did not explain it. The point is that you need one way to express direct memory addressing without an offset from a register, and for this, R0 (=0000 in the register number field) is used.
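A minimal sketch of that decoding rule in Python, assuming the usual 2-bit addressing-mode field (called `t` here) next to the 4-bit register number; the function name and the mode strings are my own labels.

```python
def addressing_mode(t: int, reg: int) -> str:
    """Classify an operand from its 2-bit mode field and 4-bit register field."""
    if not 0 <= reg <= 15:
        raise ValueError("register field is 4 bits wide")
    if t == 0b00:
        return "register"
    if t == 0b01:
        return "register indirect"
    if t == 0b10:
        # Same encoding for both; register number 0 means "no index register".
        return "symbolic (direct)" if reg == 0 else "indexed"
    if t == 0b11:
        return "register indirect with auto-increment"
    raise ValueError("mode field is 2 bits wide")


print(addressing_mode(0b10, 0))   # address word only
print(addressing_mode(0b10, 3))   # address word plus contents of R3
```

The consequence is that R0 cannot serve as an index register: the encoding that would mean "indexed by R0" is the one that means plain direct addressing.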

As for AI, the MIPS architecture does not have a (native) "subi" command either, only "addi". I never missed such an "SI", as the constants are always expressed in two's complement.

And my favorite topic, the 16-bit vs. 8-bit issue: There is actually no reason not to call the TI-99/4A a 16-bit machine. Have a look at the Intel 8088. Would you call machines "only conditionally 16 bit" if they used an 8088, because it has a multiplexed 8-bit data bus as well? The data bus width is not relevant for the architecture width. In my view, many people seem to be way too cautious about this topic, as if you could be arrested for falsely claiming it to be a 16-bit machine.

Edit: What you should probably mention in this respect is that the system architecture of the TI-99/4A with its multiplexed data bus does not fully exploit the power of the 16-bit machine, also taking into account the very conservative wait state handling (2 wait states per access, which is doubled by the multiplexing).

Edited March 8, 2021 by mizapf

apersson850 · March 8, 2021

Pseudo-registers is usually used by people who aren't too familiar with the design. They feel that registers have to be in the CPU to qualify as registers. From that point of view it's correct that the TMS 9900 has three hardware registers that are obvious to the user. There are other internal registers too, but you can't directly use them.

From the CPU's point of view, a context consists of the workspace pointer, the instruction pointer and the status register. They are all stored in R13-R15 when an interrupt occurs, or when a BLWP is called.

The user's process frequently has more local data, like its own stack or whatever. Since local data, or pointers to it, can fit into the workspace, very efficient context switches can occur. In a scheduler, it's usually convenient to pre-load the registers with the appropriate data, then call the process with the RTWP instruction. Thus the manager "returns" to the process, and the process is then either interrupted or stopped by a system call.
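The context switch and the scheduler trick described above can be sketched in Python as follows. This is a toy model under stated assumptions: only WP, PC and ST exist, memory is a dictionary of words, and the class name and all addresses are invented for the example.

```python
class Tms9900Sketch:
    """Toy model: WP, PC, ST and word-addressed memory only."""

    def __init__(self):
        self.mem = {}                     # even byte address -> 16-bit word
        self.wp = self.pc = self.st = 0

    def reg_addr(self, n: int) -> int:
        return (self.wp + 2 * n) & 0xFFFF

    def blwp(self, vector: int) -> None:
        """Branch and load workspace pointer through a two-word vector."""
        old_wp, old_pc, old_st = self.wp, self.pc, self.st
        self.wp = self.mem.get(vector, 0)          # first word: new WP
        new_pc = self.mem.get(vector + 2, 0)       # second word: new PC
        # The old context is stored in R13-R15 of the new workspace.
        self.mem[self.reg_addr(13)] = old_wp
        self.mem[self.reg_addr(14)] = old_pc
        self.mem[self.reg_addr(15)] = old_st
        self.pc = new_pc

    def rtwp(self) -> None:
        """Return with workspace pointer: reload WP, PC, ST from R13-R15."""
        saved_wp = self.mem.get(self.reg_addr(13), 0)
        saved_pc = self.mem.get(self.reg_addr(14), 0)
        saved_st = self.mem.get(self.reg_addr(15), 0)
        self.wp, self.pc, self.st = saved_wp, saved_pc, saved_st


cpu = Tms9900Sketch()
cpu.wp = 0x8300                            # scheduler workspace (arbitrary)
# The scheduler pre-loads R13-R15 with the task's context ...
cpu.mem[cpu.reg_addr(13)] = 0x8320         # task workspace
cpu.mem[cpu.reg_addr(14)] = 0x7000         # task entry point
cpu.mem[cpu.reg_addr(15)] = 0x0000         # task status
cpu.rtwp()                                 # ... and "returns" into the task
print(hex(cpu.wp), hex(cpu.pc))

# Later the task makes a "system call" through a vector back to the scheduler.
cpu.mem[0xA000], cpu.mem[0xA002] = 0x8300, 0x6000
cpu.pc = 0x7010
cpu.blwp(0xA000)
print(hex(cpu.wp), hex(cpu.pc), hex(cpu.mem[0x831A]), hex(cpu.mem[0x831C]))
```

Nothing is copied besides the three hardware registers, which is why the switch fits into a single instruction on the real CPU.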

The TMS 9900, and the TI 99/4A, is clearly a 16-bit computer. That a 16-bit CPU also can access 8-bit data doesn't matter. Even 64-bit computers can frequently process characters, 8 bits long.

+TheBF · March 8, 2021

One thing that I think was missed was a discussion of context switching on the 9900. Even though the machine is glacial in execution speed a full context switch for multi-tasking can be accomplished in one instruction.

I don't know of any machine from that era that could do that. (SPARC maybe?)

I have had 30 processes running on Camel99 Forth just for fun and it worked just fine.

apersson850 · March 9, 2021

Zilog's Z80 got two register sets, to be able to do something similar. The large advantage of the TMS 9900 architecture is the virtually unlimited number of register sets.

vol · March 13, 2021

Thanks a lot for the corrections. I have fixed my blog entry and also added some additional info there. For convenience, I also made a document where all the changes are highlighted in yellow. Any additional corrections will be greatly appreciated. I am not a native English speaker, so any general suggestions on how to make the text better are also welcome.

On 3/8/2021 at 1:02 PM, apersson850 said:

To do a subtract immediate by using AI Rx,-VALUE isn't too difficult.

TI never changed to the Z80 CPU. It was just a suggestion, but it was turned down.

They did develop the TMS 9995, which was a 16-bit CPU with an 8-bit data bus (like the 8088). The TMS 9995 was much more efficient internally, and thus roughly three times faster. It was supposed to be used in the TI 99/8.

I was trying to make a joke about AI: you need some intelligence to use AI instead of subtraction.

TI stopped using its own processors in its calculators in the 90s. They started using the Z80, and later the 68000 and ARM.

Was there a 16-bit variant of the TMS9995?

On 3/8/2021 at 2:51 PM, mizapf said:

I never heard about "pseudo-registers"; it's not a term used by TI. The TMS9900 is a memory-memory architecture, hence the registers are in memory. At first I thought you referred to the hardware registers as pseudo-registers (because you cannot freely use them). Registers are referred to in instructions; you have to provide their 4-bit numbers, unlike accesses to memory locations where you provide the address. From the point of view of the instruction set, it is not correct to say that the TMS9900 only has 3 registers; it does have 16 registers. If you like, you could probably differentiate between the hardware view (3 registers: WP, PC, SR) and the software view (set of 16 registers, freely locatable in memory).

On 3/8/2021 at 3:39 PM, apersson850 said:

Pseudo-registers is usually used by people who aren't too familiar with the design. They feel that registers have to be in the CPU to qualify as registers. From that point of view it's correct that the TMS 9900 has three hardware registers that are obvious to the user. There are other internal registers too, but you can't directly use them.

Indeed the term pseudo-register is unofficial. However I just tried to find a common ground for the term register. The TMS9900 external registers are almost identical to the 6502 zero page memory and nobody calls this memory registers. The TMS9900 external registers are not the same as internal. Therefore I had to use a word which can show the difference.

apersson850 · March 13, 2021

Regarding the Z80, I thought you referred to the TI 99/4A computer. The Z80 was proposed for that machine, internally at Texas Instruments, but that never happened.

The TMS 9995 is a 16-bit CPU. It has some internal memory, which is 16 bits wide. It's only external memory access that's 8 bits wide. Like for the 16-bit processor 8088 from Intel.

TI did develop the concept one step further, into the TMS 99000 series. But it never became a commercial success. It was used in the TI 990/10A.

The 6502 does have registers internally. They are different from zero page memory addressing. That's a shorter way to specify the address.

As @mizapf wrote, the TMS 9900 is a memory to memory architecture. Just like the predecessor it's based on, the TI 990/9. There were also implementations of this concept with workspace caching, in the TI 990/12. So the concept, on which the TMS 9900 is based, has 16 workspace registers. The electrical implementation varies among them. Besides, when the TI 990/9 was designed, memory technology had advanced to a state where there was no significant speed advantage with hardware registers. This judgement would change again over time, as technical progress again changed the scenery.

+mizapf · March 13, 2021

With the TMS9995, things get yet another twist: The workspace registers may be put on the 256 byte on-chip RAM, the access is 16 bit wide and requires one cycle. There is not much difference to the hardware registers of other CPUs at that point.

+TheBF · March 13, 2021

8 hours ago, vol said:

Thanks a lot for the corrections. I have fixed my blog entry and also added some additional info there. For convenience, I also made a document where all the changes are highlighted in yellow. Any additional corrections will be greatly appreciated. I am not a native English speaker, so any general suggestions on how to make the text better are also welcome.

I was trying to make a joke about AI: you need some intelligence to use AI instead of subtraction.

TI stopped using its own processors in its calculators in the 90s. They started using the Z80, and later the 68000 and ARM.

Was there a 16-bit variant of the TMS9995?

Indeed the term pseudo-register is unofficial. However I just tried to find a common ground for the term register. The TMS9900 external registers are almost identical to the 6502 zero page memory and nobody calls this memory registers. The TMS9900 external registers are not the same as internal. Therefore I had to use a word which can show the difference.

I think the thing that is missing in the revision is the actual difference between zero page memory and 9900 register workspaces.

The number of workspaces is limited only by memory in the 9900. You can create as many as will fit.

And I believe BLWP is the preferred way to call sub-routines by the architects, not BL with a simulated stack. (even though I use 2 at the same time)

The 9900 manual makes a big deal of the "memory to memory" architecture, so I think they were quite proud of it.

(Others here can provide more insight on that perhaps)

BTW, your use of English is flawless from what I can see. Did you go to school in English at some point?

vol · March 19, 2021

On 3/13/2021 at 7:07 PM, TheBF said:

I think the thing that is missing in the revision is the actual difference between zero page memory and 9900 register workspaces.

The number of workspaces is limited only by memory in the 9900. You can create as many as will fit.

And I believe BLWP is the preferred way to call sub-routines by the architects, not BL with a simulated stack. (even though I use 2 at the same time)

Thank you. Of course, the 6502 doesn't have the WP register so its zero page addressing is more primitive. However, some 6502-based systems, for example the Commodore 128 or Apple III, use an MMU chip which allows the zero page to be relocated to any location in memory. So they provide WP functionality using an external chip. Moreover, the 65816 (the Apple IIgs, ...), 65CE02 and 6809 (the Tandy CoCo, ...) have a direct page register which provides the same functionality as WP. Indeed, these processors lack the BLWP instruction, which makes their memory-to-memory features less advanced than the TMS9900 register architecture.
I've also added information from the Washington Post to my material:

Quote

Interestingly, back in 1982, the TI-99/4A sales accounted for 34% of the US computer market for computers with an average retail price of $500. This was ahead of the Commodore VIC-20 (33%), Atari 400 (20%) and Tandy Coco (13%).

It is sad that TI stopped supporting the unique TI-99/4A architecture so early.

Edited March 19, 2021 by vol

apersson850 · March 22, 2021

On 3/19/2021 at 12:31 PM, vol said:

Thank you. Of course, the 6502 doesn't have the WP register so its zero page addressing is more primitive.

And significantly so. The fact that you can use a short address to reach 1/256 of the total memory doesn't make that part behave like registers. The 6502 is still limited to a single 8-bit accumulator for many of its instructions.

The TI 990/9 mini computer implements a CPU with good orthogonality. It's not perfect. The instruction format used doesn't allow that. But it's still good for the time. The TMS 9900 simply implements it on a chip. The 6502, "coming from the other end" (it's not a down-scaled mini computer), comparatively sucks in this department. It's more of a "we can squeeze this in, let's do it" architecture than one of some kind of structure and logic.

+TheBF · March 22, 2021

6 hours ago, apersson850 said:

It's more of a "we can squeeze this in, let's do it" architecture than one of some kind of structure and logic.

Ouch!

Very appropriate since the title is: "Emotional stories about processors for first computers"

( Edit: In case it's not clear, I share these feelings)

vol · March 23, 2021

On 3/22/2021 at 10:48 AM, apersson850 said:

And significantly so. The fact that you can use a short address to reach 1/256 of the total memory doesn't make that part behave like registers. The 6502 is still limited to a single 8-bit accumulator for many of its instructions.

Indeed, the 6502 zero page doesn't behave exactly like the TMS9900 registers but if you compare zp with the Z80 registers B, C, D, E you find out that zp behaves almost exactly like them.

The 6502 was designed primarily as a controller but it turned out versatile, fast and very cheap, and therefore it was pretty good for first cheap personal computers.

Edited March 23, 2021 by vol

vol · April 17, 2021

I have added a summary about the TMS9995. I hope I wrote correct information. Please let me know if there is anything wrong or if it just can be improved. Thank you.

senior_falcon · April 19, 2021

My understanding is that the 9995 at 12MHz is internally running at 3MHz, just like the 9900 in the TI99. But it is still something like 3x faster because the instructions take way fewer clock cycles.

Stuart · April 19, 2021

"This is a very unusual processor. The external data bus is 8-bit." Not very unusual I think. The TMS9980/81 had the same - 16 bit internally, 8-bit external data bus. Intel did the same on some processors.

"Instructions on the TMS9995 became much faster to execute, but only if they are located in the internal memory, or at least use data from the internal memory. But if an instruction and its operands are located in external memory, then it is executed, as a rule, even slightly slower. In addition, if we take an external clock frequency as the base, then even with the internal memory, the TMS9995 is slower than the TMS9900 at the same frequency." Only partially correct I think. The 9995 introduced instruction prefetch - it fetched the next instruction while processing the current instruction, and decoded the next instruction while storing the results of the current instruction. Much more efficient. Plus it didn't need the read-before-write cycle of the 9900. I'd be surprised if anything that the 9995 did was slower than the 9900, in internal memory or external.

+mizapf · April 19, 2021

8 data lines, as "unusual" as the Intel 8088, the CPU of the PC XT.

...

I cannot really follow the argument that the 9995 is slower than the 9900 in any respect. How do you get to that conclusion? Did you compare the instruction execution times from the tables in the specification documents?

Running benchmarks on the Geneve and the TI-99/4A does not prove much, as the systems add their specific amount of wait states. GPL mode 1 is strongly slowed down by additional wait states to achieve a comparable speed as the TI-99/4A. The fact that GPL speed 1 is slightly slower than the execution on the TI-99/4A does not entail that the 9995 is slower.

One example from the tables, let's take A (Add words), registers in external memory, instruction in external memory, no wait states.

TMS9900: 14 clock cycles @ 333 ns = 4.662 µs

TMS9995: 8 clock cycles @ 333 ns = 2.664 µs

The base cycles for Add for the 9995 are 4 cycles; due to the 8 data lines the number of cycles is twice, i.e. we get 8 cycles. This is still almost twice as fast as the TMS9900. As I said, this is all without wait states.

The base speed of the 9900 is half the speed of the 9995, since it takes two clock cycles for a single machine cycle, while the 9995 takes one clock cycle per machine cycle.
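The arithmetic in the example above can be checked with a few lines of Python. The cycle counts and the 333 ns cycle time are the figures quoted in this post; the helper name is made up.

```python
CYCLE_NS = 333  # cycle time used in the example above, in nanoseconds


def exec_time_us(cycles: int, cycle_ns: int = CYCLE_NS) -> float:
    """Execution time in microseconds for a given number of cycles."""
    return cycles * cycle_ns / 1000


t_9900 = exec_time_us(14)   # A (Add words) on the TMS9900
t_9995 = exec_time_us(8)    # A on the TMS9995, everything in external memory
print(f"TMS9900: {t_9900:.3f} us")
print(f"TMS9995: {t_9995:.3f} us")
print(f"ratio:   {t_9900 / t_9995:.2f}")
```

The ratio of 14 to 8 cycles is 1.75, which is the "almost twice as fast" figure, before any wait states are added by the surrounding system.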

apersson850 · April 21, 2021

On 4/19/2021 at 10:30 PM, mizapf said:

The base speed of the 9900 is half the speed of the 9995, since it takes two clock cycles for a single machine cycle, while the 9995 takes one clock cycle per machine cycle.

Exactly, and then add the pipelining, which means that even if it's only 8 bits wide, the TMS 9995 is in reality constantly accessing memory. During an addition instruction, it uses the time when the ALU computes the result to read the next instruction (in parallel), and the time when the result is stored in memory to decode that instruction (in parallel).

The on-chip memory is 16 bits wide, so unless the programmer is insane, you use that for your workspace and perhaps frequently used data and/or small pieces of very time critical code.

vol · April 21, 2021

Thank you all for your interesting comments.

On 4/19/2021 at 10:58 PM, Stuart said:

"This is a very unusual processor. The external data bus is 8-bit." Not very unusual I think. The TMS9980/81 had the same - 16 bit internally, 8-bit external data bus. Intel did the same on some processors.

Indeed, processors using a 16-bit ALU and an 8-bit data bus were quite well known. Besides the 8088 (the IBM PC), we have the 68008 (the Sinclair QL) and the 65816 (the Apple IIgs). I wrote that the TMS9995 is unusual, having in mind its other features: its internal memory, divided clock frequency, internal timer.

vol · April 21, 2021

On 4/19/2021 at 11:30 PM, mizapf said:

I cannot really follow the argument that the 9995 is slower than the 9900 in any respect. How do you get to that conclusion? Did you compare the instruction execution times from the tables in the specification documents?

Running benchmarks on the Geneve and the TI-99/4A does not prove much, as the systems add their specific amount of wait states. GPL mode 1 is strongly slowed down by additional wait states to achieve a comparable speed as the TI-99/4A. The fact that GPL speed 1 is slightly slower than the execution on the TI-99/4A does not entail that the 9995 is slower.

One example from the tables, let's take A (Add words), registers in external memory, instruction in external memory, no wait states.

TMS9900: 14 clock cycles @ 333 ns = 4.662 µs

TMS9995: 8 clock cycles @ 333 ns = 2.664 µs

The base cycles for Add for the 9995 are 4 cycles; due to the 8 data lines the number of cycles is twice, i.e. we get 8 cycles. This is still almost twice as fast as the TMS9900. As I said, this is all without wait states.

The base speed of the 9900 is half the speed of the 9995, since it takes two clock cycles for a single machine cycle, while the 9995 takes one clock cycle per machine cycle.

8 TMS9995 cycles in your example are actually 32 input (CLOCKIN) clock cycles. So you have proved that the TMS9995@12MHz is about 50% faster than the TMS9900@3MHz...

It seems it is rather easy to prove my point that the TMS9900 is faster than the TMS9995 at the same input clock frequency.
Let's analyse the timing of the A (Add) command. In the best case it takes 4 cycles on the TMS9995, but they are CLKOUT cycles, which are 4 times longer than CLKIN cycles. So it actually takes at least 16 input clock cycles to do an addition on the TMS9995. The TMS9900 only needs 14 input cycles to do this operation. So it is about 15% faster in this case.
Let me present a short table which lists timings for some TMS9900/9995 instructions:

       TMS9900    TMS9995
ABS    12-14      12
AI     14         16
B      8          12
BL     12         16
BLWP   26         44
DIV    92-124     112
INC    10         12
JMP    8-10       12
LDCR   22         44
LI     12         12
LIMI   16         20
MOV    14         12
MPY    52         92
RTWP   14         24
SLA    14         24

It is interesting that in some cases (ABS, DIV, MOV) the TMS9995 can be a bit faster than the TMS9900. However, all these numbers are for the case when the TMS9995 uses its fast internal memory for code and data. The size of this memory is small, so in general the TMS9900, which has a 16-bit data bus, is faster. However, the TMS9995 has instructions for signed division and multiplication, and therefore the TMS9900 is indeed slower for these cases. IMHO it is surprising that shift instructions are so slow on the TMS9995; they are two times slower than MOV.

The fact that the TMS9995 uses one frequency for input and another for instruction timings is really confusing. Some processors (the R800, 486DX2, ...) use higher internal frequencies but the TMS9995 does something rather opposite to this.

Stuart · April 21, 2021

I think you're getting confused with the clocks. With the TMS9900, you have a 12 MHz crystal feeding a TIM9904A which generates a 4-phase, 3 MHz clock for the processor. With the TMS9995, what they have done is incorporate the clock generator into the processor IC - internally the 9995 is still using the same 4-phase 3 MHz clock as the 9900. So you can't compare a TMS9995@12MHz with a TMS9900@3MHz - they're both internally using a 4-phase, 3 MHz clock from a 12 MHz timing source.

+mizapf · April 21, 2021

I had already prepared a longer text, but Stuart has already said it, and I have lost any interest in adding more words.

The table you provided in your post is misleading, as it multiplies the cycle count of the TMS9995 by four without saying that the cycle time is one fourth. Comparing the crystal frequencies does not tell you anything about the processing speed. The Geneve runs much faster than the TI-99/4A although the processors run at the same cycle times, which is due to the higher efficiency of the 9995.
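To make this concrete, here is a small Python sketch that converts a few rows of the table into wall-clock time. It assumes, as discussed in this thread, that the TMS9900 column counts cycles of its 3 MHz clock, while the TMS9995 column counts 12 MHz CLKIN cycles (best case, internal memory). The dictionary and function names are invented for the example.

```python
F_9900_HZ = 3_000_000         # TMS9900 clock, derived from a 12 MHz crystal
F_9995_CLKIN_HZ = 12_000_000  # TMS9995 input clock, divided by 4 internally

# (TMS9900 cycles, TMS9995 CLKIN cycles) copied from the table above
TABLE = {
    "AI":   (14, 16),
    "BLWP": (26, 44),
    "MOV":  (14, 12),
    "MPY":  (52, 92),
}


def micros(cycles: int, clock_hz: int) -> float:
    """Time in microseconds for a cycle count at a given clock frequency."""
    return cycles / clock_hz * 1e6


for name, (c9900, c9995) in TABLE.items():
    t9900 = micros(c9900, F_9900_HZ)
    t9995 = micros(c9995, F_9995_CLKIN_HZ)
    print(f"{name:5s} 9900: {t9900:6.2f} us  9995: {t9995:6.2f} us  "
          f"ratio: {t9900 / t9995:.2f}")
```

Once each column is divided by the clock it is actually counted in, the cycle counts that looked larger for the TMS9995 turn into shorter execution times.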

apersson850 · April 21, 2021

Yes, the TMS 9995 is 2-3 times faster than the TMS 9900. The conclusions drawn by @vol are based on a math error.

+mizapf · April 21, 2021

No, I guess this is based on vol's own measure of "MIPS/MHz" as described in the other thread. If you calculate it this way, you will get something like the 9995 requiring four times the clock cycles compared to the 9900, but only because it divides the input clock by 4. But this measure is, as we see here, obviously flawed, pretending to convey useful information, which is not the case.

You can calculate such a number, but I can also divide the size of the instruction set by the number of yogurts per week that I consume.

The four times higher frequency of the 9995 is not directly used for the machine cycles. We do not fully know what is happening inside the 9995; as I suggested, there may be a reason for the higher clock rate to drive the more complex microprograms, pipelining etc.

The bottom line is, as I said, that it does not make sense to compare clocks, even in the same processor family (TMS). You may compare machine cycles per command, and maybe also compute MIPS, but the relation with the input clock is artificial.

And as vol said above, other processors use higher internal frequencies. TI could have done the same, integrating a frequency doubler or quadrupler for the CPU-internal elements, but they opted for the simpler variant of dividing the clock by four. In the proposed measure, the first approach is more efficient than the second, even though they are basically equivalent.

Edited April 21, 2021 by mizapf
