Question: What is precise interrupt occurrence in TI ASM?

gregallenwarner · November 15, 2017

I have a question regarding the nature of TI interrupts.

What is the precise nature of when an interrupt can occur? Here's my example:

Suppose I have a single instruction: SLA R0,1

Is this instruction atomic? The TMS9900 manual states that when an interrupt is raised, the CPU services the interrupt after the completion of the current instruction. But I'm not clear on what constitutes a single instruction.

With the shift instruction above, the CPU needs to read in the memory word at the location of the Workspace Pointer. Then, it needs to shift that word by 1 place, affecting the Status bits. Then, it needs to write the result back out to memory, which in and of itself consists of another read-before-write, thanks to the TI's architecture design.

Is each of those steps treated as an individual instruction by the CPU? In other words, if an interrupt is raised at any point between these steps, is it possible for the shift instruction to be interrupted halfway through? Or does the CPU treat this whole process as one atomic instruction, uninterruptible until its completion?

+adamantyr · November 15, 2017

A shift operation is considered a single atomic instruction; it wouldn't be interrupted in progress.

Also, Interrupts only occur if you have the mask turned on, with the following instruction: LIMI 2

The pattern with most assembly programming is to leave the interrupts closed except at looping portions of your program, such as when you're waiting for input from the user, then to quickly open and close them.

You CAN run with them open all the time; if you use a user-defined interrupt it should check for interrupts every 1/60 of a second on the screen refresh, which is (very roughly) about every 150-200 instructions, assuming most are two words in size.

+mizapf · November 15, 2017

In TMS9900 Data Manual, section 2.2:

When the level of the pending interrupt is less than or equal to the enabling mask level (higher or equal priority interrupt), the processor recognizes the interrupt and initiates a context switch following completion of the currently executing instruction.

+mizapf · November 15, 2017

> Also, Interrupts only occur if you have the mask turned on, with the following instruction: LIMI 2

Actually, the mask is always there; it masks away lower-priority interrupts. That is, LIMI 0 only allows level 0 interrupts (RESET, LOAD), while LIMI 1 allows interrupts of level 0 or 1, which are all that are supported in our TI console. For some reason unknown to me, LIMI 2 is always used, but the incoming interrupt is hardcoded to level 1.

One interesting point is that when an interrupt is raised and recognized by the TMS99xx, it decreases the mask by 1. That way, only interrupts of higher priority can interrupt an interrupt handler.

Edited November 15, 2017 by mizapf

Tursi · November 15, 2017

As far as I know from experimentation, there are no violations of the "after an instruction" rule; I'm pretty sure the 9900 never breaks in the middle of an instruction.

However, there are a few circumstances after which interrupts are not checked for one instruction. For instance, after the jump to an interrupt vector. The datasheet seems pretty solid in this respect.

Stuart · November 15, 2017

Just adding that the interrupt signal lines from the sources come into the TMS9901, where the interrupts can also be masked and disabled. So for an interrupt signal from a source to actually cause an interrupt on the 9900, interrupts need to be enabled on the 9901, the interrupt mask on the 9901 must be set to enable that interrupt, and the LIMI level on the 9900 must be set to enable that interrupt level.

gregallenwarner · November 15, 2017

> Actually, the mask is always there; it masks away lower-priority interrupts. That is, LIMI 0 only allows level 0 interrupts (RESET, LOAD), while LIMI 1 allows interrupts of level 0 or 1, which are all that are supported in our TI console. For some reason unknown to me, LIMI 2 is always used, but the incoming interrupt is hardcoded to level 1.

> One interesting point is that when an interrupt is raised and recognized by the TMS99xx, it decreases the mask by 1. That way, only interrupts of higher priority can interrupt an interrupt handler.

Interesting, I had never heard about the interrupt mask being decreased by 1.

Does this mean that, if I use LIMI 2 as convention has always dictated, my interrupt handler can be interrupted by yet another interrupt? Since 2 decreases to 1, and all interrupts are hardwired as 1 in the TI? Should we be using LIMI 1 to ensure interrupts can't interrupt interrupts?

sometimes99er · November 15, 2017

> You CAN run with them open all the time; if you use a user-defined interrupt it should check for interrupts every 1/60 of a second on the screen refresh, which is (very roughly) about every 150-200 instructions, assuming most are two words in size.

200 instructions per screen refresh * 60 times a second = 12,000 instructions per second = 0.012 MIPS. I think it should be around 0.136 MIPS. Is there something I'm missing?

+mizapf · November 15, 2017

> Does this mean that, if I use LIMI 2 as convention has always dictated, my interrupt handler can be interrupted by yet another interrupt? Since 2 decreases to 1, and all interrupts are hardwired as 1 in the TI? Should we be using LIMI 1 to ensure interrupts can't interrupt interrupts?

We don't need to. The interrupt level lines of the TMS9900 are hardwired to 0001. That is, whenever /INT is asserted, it is level 1.

The 9901 could have been used to set the interrupt level: It can be configured to have 15 interrupt lines (/INT1 to /INT15), and on its outputs IC0...IC3 it outputs the lowest number of the currently invoked interrupt lines. These lines IC0...IC3 should typically be wired to the IC0...IC3 lines of the TMS9900. So the intended use would be that if /INT7 is asserted on the 9901, it would trigger an interrupt that could be masked by LIMI 0 to LIMI 6.

But as I said, in the TI console, the IC lines of the 9901 are not used, and the IC lines of the 9900 are hardwired to 0001.
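The intended 9901 wiring described above can be modelled roughly as follows. This is only an illustrative Python sketch, not code from any TI manual: the function names `ic_level` and `recognized` are hypothetical, and the model assumes nothing beyond what is stated in this post (lowest asserted line number wins, and an interrupt is accepted when its level is less than or equal to the LIMI mask).

```python
def ic_level(asserted_lines):
    """Lowest-numbered asserted /INT line wins; None if nothing is pending."""
    return min(asserted_lines) if asserted_lines else None


def recognized(level, limi):
    """The 9900 accepts an interrupt whose level is <= the LIMI mask."""
    return level <= limi


# /INT7 and /INT12 asserted together: the 9901 would present level 7.
level = ic_level({7, 12})

# LIMI values that keep a level 7 interrupt masked: 0 through 6.
masking = [m for m in range(16) if not recognized(level, m)]

# In the TI console the IC lines are hardwired to 0001 instead,
# so every interrupt arrives as level 1 regardless of its source.
console_level = 1
print(level, masking, recognized(console_level, 2))
```

With the console's hardwired level 1, any LIMI value of 1 or higher lets the interrupt through, which is why LIMI 1 and LIMI 2 behave the same there.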

Thinking a bit longer about that, the idea to use LIMI 2 could make sense: When all interrupts are level 1, and the priority is raised by 1 every time, you can interrupt a program, and then interrupt the handler once more. If desired. But TI wrote the interrupt handler to mask all interrupts at once (LIMI 0 on entry), so this is only hypothetical.

I had to re-read all that stuff, although I once had to learn all of it for the reimplementation of the 9900 in MAME, but that was already 5 years ago.

Edited November 15, 2017 by mizapf

gregallenwarner · November 15, 2017

Thanks for all the info! So, just to recap: any instruction in the TMS9900, even one made up of multiple memory-access steps, is treated as an atomic operation by the interrupt-handling circuitry in the CPU.

I'm investigating the nature of non-blocking multitasking in the TMS9900, managed by a preemptive multitasking kernel, and to my knowledge, it's never been done before, due to the lack of an atomic Test-and-Set operation, such as TCMB and TSMB in the TMS99105.

Has anybody investigated non-blocking multitasking in the TMS9900?

Stuart · November 15, 2017

> Interesting, I had never heard about the interrupt mask being decreased by 1.

> Does this mean that, if I use LIMI 2 as convention has always dictated, my interrupt handler can be interrupted by yet another interrupt? Since 2 decreases to 1, and all interrupts are hardwired as 1 in the TI? Should we be using LIMI 1 to ensure interrupts can't interrupt interrupts?

Looking at the manual, the "decreased by 1" isn't quite correct. It is set to 1 less than the interrupt level being serviced. So if you have done a LIMI 2 and the level 1 interrupt occurs, the interrupt mask will be set to 0 for the duration of the interrupt service routine.
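As a rough illustration of that rule, here is a small Python model of the mask update. It is a sketch under assumptions rather than anything from the manual: the function names are hypothetical, and the floor at 0 is a modelling choice so that level 0 also leaves the mask at 0.

```python
def is_recognized(level, mask):
    """An interrupt is recognized when its level is <= the current mask."""
    return level <= mask


def mask_during_service(level):
    """On recognition the mask is loaded with (level - 1), floored at 0."""
    return max(level - 1, 0)


mask = 2     # after LIMI 2
level = 1    # the console hardwires every interrupt to level 1

assert is_recognized(level, mask)
mask = mask_during_service(level)             # mask becomes 0 inside the ISR
nested_allowed = is_recognized(level, mask)   # level 1 > mask 0
print(mask, nested_allowed)
```

So with LIMI 2 or LIMI 1 the result is the same: the handler runs with the mask at 0, and another level 1 interrupt cannot nest.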

Stuart · November 15, 2017

> Has anybody investigated non-blocking multitasking in the TMS9900?

There's a multi-tasking OS for the 9900 called PDOS that may be of interest. See https://vaxbarn.com/index.php/other-bits/105-pdos.

"PDOS is a powerful multi-user, multi-tasking operating system developed by Eyring Research Institute, Inc., for the Texas Instruments compatible processor family. You use PDOS to design and develop scientific, educational, industrial, and business applications.

PDOS consists of a small, real-time, multi-tasking kernel layered by file management, floating point, and user monitor modules. The 2K byte kernel provides synchronization and control of events occurring in a real-time environment using semaphores, events, messages, mailboxes, and suspension primitives. All user console I/O as well as other useful conversion and housekeeping routines are included in the PDOS kernel."

+adamantyr · November 15, 2017

> 200 instructions per screen refresh * 60 times a second = 12,000 instructions per second = 0.012 MIPS. I think it should be around 0.136 MIPS. Is there something I'm missing?

It was a VERY rough estimation.

I was going off of an average of two words per instruction line (Some are one, uncommonly three) and an average of 50ms to execute a single word instruction. There's a LOT of factors to consider like if you're using 8-bit or 16-bit memory, the type of instruction executing, and so forth.

After looking at it, I was mixing up clock cycles with MS; around 20ms per instruction makes more sense.

+mizapf · November 15, 2017

> Looking at the manual, the "decreased by 1" isn't quite correct. It is set to 1 less than the interrupt level being serviced. So if you have done a LIMI 2 and the level 1 interrupt occurs, the interrupt mask will be set to 0 for the duration of the interrupt service routine.

Of course you're right. This is the only way it makes sense. :-) Thanks for clarifying.

matthew180 · November 16, 2017

> Thanks for all the info! So, just to recap: any instruction in the TMS9900, even one made up of multiple memory-access steps, is treated as an atomic operation by the interrupt-handling circuitry in the CPU.

> ...

The 9900 is a CISC CPU, so internally each instruction is indeed a series of micro-instructions that are controlled by micro-code (literally a smaller CPU inside the CPU). This is true of any CISC CPU that uses microcode. This is different from a RISC CPU where each instruction performs just one of the basic steps you describe above, and in that case each instruction is typically not divisible in and of itself.

In a CISC CPU, each instruction is typically performed over several "machine cycles", and once the instruction cycle begins (usually with an instruction opcode fetch), it cannot be interrupted short of losing power. CPUs will usually check for interrupts only at very specific times in their internal state-machine, and in the case of the 9900 the only time interrupts are tested is after execution of the current instruction is complete (which means all the micro-steps needed to finish the instruction).

As already mentioned, there are some instructions, specifically XOP and BLWP, where interrupts are not checked when they are done; probably because those instructions are executed in response to an interrupt and the ISR needs to be able to execute at least one instruction prior to itself possibly being interrupted by another higher-priority interrupt.

> ... But I'm not clear on what constitutes a single instruction.

For the 9900, execution of every individual assembly language instruction is atomic, no matter how many micro-states it has or what memory addressing mode it is using. There is no test-and-set instruction on the 9900, but you can emulate it by surrounding the multiple instructions with LIMI 0 and LIMI 2.

Asmusr · November 16, 2017

> After looking at it, I was mixing up clock cycles with MS; around 20ms per instruction makes more sense.

That's less than one instruction per interrupt! Around 7 microseconds per instruction is more like it.

+mizapf · November 16, 2017

> Looking at the manual, the "decreased by 1" isn't quite correct. It is set to 1 less than the interrupt level being serviced. So if you have done a LIMI 2 and the level 1 interrupt occurs, the interrupt mask will be set to 0 for the duration of the interrupt service routine.

... and this also invalidates my idea about the LIMI 2 usage. If all interrupts are level 1, the mask will always be set to 0000 when servicing the interrupt. So in fact, a LIMI 0 in the interrupt handler of the TI console is not needed. On the other hand, it does not hurt, and it looks better (you would not suspect a missing LIMI 0).

sometimes99er · November 16, 2017

> It was a VERY rough estimation.

> I was going off of an average of two words per instruction line (Some are one, uncommonly three) and an average of 50ms to execute a single word instruction. There's a LOT of factors to consider like if you're using 8-bit or 16-bit memory, the type of instruction executing, and so forth.

> After looking at it, I was mixing up clock cycles with MS; around 20ms per instruction makes more sense.

> That's less than one instruction per interrupt! Around 7 microseconds per instruction is more like it.

I'm not sure it's relevant to the interrupt, but the assumption that we don't get many instructions per screen refresh does make it interesting. I assumed ms to be microseconds and not the usual milliseconds. What is MS?

Instructions per screen refresh

150-200 ( 1 s / 60 frames / 50 microseconds / 2 )

1 s / 60 frames / 20 microseconds = 833

0.136 MIPS / 60 frames = 2,266

1 s / 60 frames / 7 microseconds = 2,380

I hope the last lines are more like it.
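The figures above can be rechecked with a few lines of Python. This is just a sketch of the arithmetic in this thread; the helper name `per_frame` is made up for illustration, and the per-instruction times are the estimates quoted by the posters, not measured values.

```python
FRAME_SECONDS = 1 / 60   # one 60 Hz screen refresh


def per_frame(seconds_per_instruction):
    """Whole instructions that fit in one frame at the given average time."""
    return int(FRAME_SECONDS / seconds_per_instruction)


estimates = {
    "50 us per word, 2 words": per_frame(2 * 50e-6),
    "20 us": per_frame(20e-6),
    "7 us": per_frame(7e-6),
    "20 ms (milliseconds)": per_frame(20e-3),
    "0.136 MIPS": int(0.136e6 / 60),
}

for label, count in estimates.items():
    print(f"{label:>24}: {count}")
```

Read as milliseconds, the 20 ms figure gives less than one instruction per frame, which is the objection raised earlier; the 7 microsecond and 0.136 MIPS figures land within about 5% of each other.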

Edited November 16, 2017 by sometimes99er

+mizapf · November 16, 2017

> We don't need to. The interrupt level lines of the TMS9900 are hardwired to 0001. That is, whenever /INT is asserted, it is level 1.

Just adding another bit of information: TI obviously dropped the idea of having 16 interrupt levels with the 9995. I believe the concept of fine-grained interrupt handling proved to be less useful than expected.

The TMS 9995 does not have these four IC lines anymore but separate /INT1 and /INT4 input lines. The macro instruction detection (MID) and the overflow take level 2, the decrementer is level 3, and they all have no external connections. The event counter has level 4, shared with /INT4. The Geneve only connects /INT1; no /INT4 from outside as far as I remember.

gregallenwarner · November 16, 2017

> I assumed ms to be microseconds and not the usual milliseconds. What is MS?

Units for microseconds are denoted by 'us', with the letter 'u' approximating the Greek letter mu. Hope that helps.

> There is no test-and-set instruction on the 9900, but you can emulate it by surrounding the multiple instructions with LIMI 0 and LIMI 2.

This is actually the very assumption I've been challenging in my head before coming here with my original post. There actually IS an atomic test and set instruction in the 9900!

The atomic test-and-set is the key to a non-blocking synchronization method, allowing threads to attempt to acquire a mutex without blocking, meaning, if the thread was unsuccessful in acquiring the mutex (because some other thread jumped in there before it was able to), the thread can be aware of this and branch off somewhere else other than its critical section and keep working on non-shared data, then come back and attempt to acquire the mutex again later. With the existing blocking methods, the unsuccessful thread simply blocks when it cannot acquire the mutex, completely unaware that it is temporarily halted in its task.

The other benefit of test-and-set is that threads no longer need to bother with modifying the interrupt mask. You can safely synchronize threads without any threads having to muck around with LIMI instructions!

So what is this atomic test-and-set that supposedly already exists in the 9900?

I've actually already mentioned it at the top of this post:

SLA  R0,1

That's Shift Left Arithmetic, right? Yes, but it can also serve as an atomic test-and-set for non-blocking synchronization. Here's how:

Suppose we set up a mutex in memory somewhere. Let's call the label MUTEX. In this system, mutexes must be a 16-bit word, initialized to >8000 by the kernel. A set bit in position 0 indicates the mutex is free, and is not owned by any thread.

Now to acquire the mutex, the user thread must set this bit to zero, so that other threads will see that bit 0 is clear, and know that the mutex is owned by someone else. But how can we do this atomically? And furthermore, the test-and-set operation atomically changes the value AND tests what the value used to be before the change!

Here's where Shift-Left-Arithmetic comes in. The user thread shifts this mutex one bit to the left. The '1' in bit position 0 falls off the end, and zeros shift in from the right. This has the effect of setting the mutex bit to 0, locking the mutex. (Additionally, since this mutex exists in memory, we need to quickly relocate our WP to that memory location, perform the shift, and immediately move our WP back again, like so...)

MUTEX  DATA >8000
       ...
       ...
* Acquire the mutex
       LWPI MUTEX
       SLA  R0,1
       LWPI WP  * Whatever our local workspace is

So even if multiple threads try and synchronize on this mutex at the same time, each thread is performing SLA R0,1 on this memory location, so only new zeros are ever getting shifted in, no matter how many threads attempt to grab it. As long as your multitasking kernel is sure to keep every thread's workspace local to itself, this works fine.

But what about the test portion of the procedure? Well, it's already been performed by the SLA instruction! We want to know what was the value of mutex bit zero before the shift occurred. The SLA instruction stores the value shifted out from position 0 in the carry bit of the Status register!

Bingo! We have all we need! Since a context switch preserves the Status register local to each thread, and because SLA is atomic, we have assurance that only 1 thread will ever acquire the mutex! If two threads are in contention for the same mutex, because we are shifting by one bit each time, only 1 thread's status register will grab the '1' bit in position 0 as it falls off the left side of the register. Everyone else will get zeros. So now we can use JNC to jump somewhere else if we were unlucky enough to miss the mutex!

MUTEX  DATA >8000
       ...
       ...
* Acquire the mutex
       LWPI MUTEX
       SLA  R0,1
       LWPI WP
       JNC  OTHER
       ...
       ... * Critical section of code
       ...

OTHER  ... * Do something else if we didn't acquire the mutex.
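To make the carry behaviour concrete, here is a small Python model of the shift. It is only a sketch of the idea, not TMS9900 code: `sla1` is a hypothetical helper that mimics SLA R0,1 on a 16-bit word, and the three calls stand in for three threads whose shifts happen one after another (which is what instruction atomicity guarantees on a single CPU).

```python
FREE = 0x8000   # >8000: bit 0 (the most significant bit in TI numbering) set


def sla1(word):
    """Model SLA R0,1 on a 16-bit word: return (new_word, carry)."""
    carry = (word >> 15) & 1            # the bit shifted out on the left
    return (word << 1) & 0xFFFF, carry


mutex = FREE
mutex, first = sla1(mutex)    # first thread: carry 1, mutex acquired
mutex, second = sla1(mutex)   # second thread: carry 0, would take JNC OTHER
mutex, third = sla1(mutex)    # third thread: carry 0 as well
results = (first, second, third)
locked_value = mutex          # stays >0000 however many threads try

mutex = FREE                  # release: the owner writes >8000 back
print(results, hex(locked_value), hex(mutex))
```

Only the first shift sees the 1 fall out of the word; every later attempt shifts zeros into zeros, so repeated polling never disturbs the locked state.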

If non-blocking synchronization isn't needed, we can use this same method to simulate blocking, by jumping back to try and reacquire the mutex again if we missed it.

MUTEX  DATA >8000
       ...
       ...
GRAB   LWPI MUTEX
       SLA  R0,1
       LWPI WP
       JNC  GRAB
       ...
       ... * Critical section of code
       ...

Releasing the mutex is simple. Once the thread that owns the mutex is absolutely sure it is finished accessing shared memory, it simply releases the mutex by writing >8000 back to the mutex:

* (From local workspace)
* Release the mutex
       LI   R0,>8000
       MOV  R0,@MUTEX

* ( -or- From mutex workspace)
       LI   R0,>8000
       LWPI WP * Back to local workspace

That's what I've been working on. I've not tested this yet on real hardware, but that's why I wanted to confirm some theory about interrupts with you all here, and it seems to make sense. Non-blocking synchronization is a key and revolutionary factor when it comes to more modern approaches to parallelism, and now it's possible on the 9900 thanks to the atomic test-and-set instruction that's been hiding right under our noses the whole time! And it's faster than surrounding mutex-acquiring code with LIMI 0 and LIMI 2, since LWPI takes only 10 clock cycles vs. LIMI's 16 cycles.

Let me know what you all think of this theory.

Edited November 16, 2017 by gregallenwarner

apersson850 · November 16, 2017

It should work. Even if you get interrupted between the SLA and the JNC instructions, you'll eventually come back and continue with the correct value in the carry flag.

As you write, if some code that finds the mutex flag already zeroed goes off to do some "other" thing, that "other" could also be to signal a wait state. In such a case, the scheduler will notice and can call in another routine immediately, without having to wait for another task switch interrupt. Whether this is feasible depends on whether your control code runs asynchronously with other events, or if you have a strict synchronous environment, where things must happen at regular intervals.

I've implemented pre-emptive multitasking inside the UCSD p-system once. As long as you run this on a normal 99/4A, it's simpler to do. There aren't interrupts happening at any unexpected time, but usually only on VDP blanking. Which means that it's not possible to be interrupted immediately after an interrupt. The PME (P-Machine Emulator) works like a CPU, where p-code are atomic instructions (almost all of them) and the assembly instructions in the emulator's interpreter act like the microcode in a real CPU.

If you have a 99/4A which can page in RAM where there's normally ROM, you can change the interrupt vectors and/or the interrupt service routines in the computer. Then you can for example use the timer in the PSI to control task switching instead. I've posted parts of my code to do that here before.

The reason the first instruction in an interrupt service routine isn't interrupted is that you need to execute one instruction there to establish your whereabouts, i.e. proper values for WP, PC and ST for that routine. If you then immediately get interrupted again, these values are stored in R13-R15 of the new workspace, so you can RTWP to return to the interrupted ISR and later RTWP again, to return to the code that was interrupted in the first place.

MS referred to in a post above is probably Machine States. In some designs they are the same as clock cycles.

Many instructions executed on the TMS 9900 typically use 10-20 clock cycles, but the TI 99/4A adds wait states depending on where in memory the instructions and data are. On systems like mine, which don't have these wait states, you'll get less than 10 us per instruction in most cases.

Edited November 16, 2017 by apersson850

sometimes99er · November 16, 2017

> MS referred to in a post above is probably Machine States. In some designs they are the same as clock cycles.

Oh. Thanks.

Tursi · November 16, 2017

> But what about the test portion of the procedure? Well, it's already been performed by the SLA instruction! We want to know what was the value of mutex bit zero before the shift occurred. The SLA instruction stores the value shifted out from position 0 in the carry bit of the Status register!

That's pretty clever.

I've seen INC used on other systems. If you use SETO to set an unlocked mutex to >FFFF, then simply increment to try to acquire it, only the first attempt will result in >0000, testable with JEQ. It is slightly faster than the shift (I think; I don't have the docs to check, but the presence of SETO makes the release easier). I guess another potential benefit is that the mutex can be anywhere in memory...? It does suffer the limitation that only 65535 simultaneous lock attempts can be safely discriminated.

A major downside of INC is that you can't easily wait on it like you can with the shift -- you'd hit 65535 too quickly and think you were unlocked. It's not safe to undo your INC with a DEC (you might undo an unlock operation by someone else), you'd have to poll and then take a chance on missing it when you detect that it's unlocked. The shift is nice since even if you do have to change the workspace (do you?), you can leave the workspace set while you poll.
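The INC variant and its wraparound limit can be modelled the same way. Again this is a hedged Python sketch rather than 9900 code: `inc` is a hypothetical helper that mimics INC on a 16-bit word and reports whether the result is zero (the condition JEQ would test).

```python
UNLOCKED = 0xFFFF   # the value SETO writes


def inc(word):
    """Model INC on a 16-bit word: return (new_word, eq), eq meaning result is zero."""
    new = (word + 1) & 0xFFFF
    return new, new == 0


mutex = UNLOCKED
mutex, got_it = inc(mutex)       # >FFFF -> >0000, EQ set: lock acquired
mutex, second = inc(mutex)       # >0000 -> >0001, EQ clear: lock missed

# Polling hazard: count failed attempts on a held lock before the word
# wraps around to >0000 and a waiter wrongly believes it owns the lock.
word = 0x0000
failed = 0
while True:
    word, eq = inc(word)
    if eq:
        break
    failed += 1
print(got_it, second, failed)
```

The loop stops after 65535 failed attempts, matching the limit mentioned above, and shows why spinning on INC is unsafe while spinning on the shift is not.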

Both of these work only with single processor systems, of course, since otherwise (without synchronization) the other CPU could change the value in RAM in the middle of an operation. The most common result of that is stuck locks, but that's not really a concern in the TI-99/4A.

Edited November 16, 2017 by Tursi
