TMS9900 is the coolest CPU for multi-tasking


TheBF

I finally got around to getting my TI-99 multi-tasker working pretty much the way I wanted.

 

Traditionally, commercial Forth systems were multi-tasking, multi-user systems. I am told Forth Inc. could strap 12 terminals to an IBM PC and still give the users good response.

 

So why is the TMS9900 so cool? Because it can create a new set of registers for itself anywhere in memory and switch to that set of registers in 1 instruction!

 

For those who have never thought about it, a conventional multi-tasking system uses a special program called a scheduler that decides which program is going to run and how long it gets to run. At some point while program A is running, the scheduler interrupts program A and gives program B a turn, and so on with all the programs that are in the "schedule". So with only 1 CPU, it's really an illusion of multiple programs running at the same time.

 

Chuck Moore's Forth multi-tasker was built to be very lightweight and does not have a separate program to decide which task is going to run and when. (It's heresy, I know.)

Instead, a new task gets a turn EVERY time an input or output occurs. So after outputting a character to the printer, another task gets a turn. If that task reads a keystroke, it releases control to the next task, and so on...

 

This works well because most I/O takes tons of time, so the CPU might as well go do something else.
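The "give up the CPU on every I/O" idea can be modelled in a few lines of Python, with generators standing in for Forth tasks. This is only an analogy under stated assumptions: the task names and the "printer"/"keyboard" operations are hypothetical, and nothing here is CAMEL99 code. Each task yields at its own I/O points, and the loop that passes control around has no scheduling policy of its own.

```python
from collections import deque

# Python model (analogy only): generators stand in for Forth tasks.

def printer_task(text, log):
    """Hypothetical task: 'prints' one character, then yields, as after each output."""
    for ch in text:
        log.append(("printer", ch))
        yield  # I/O done: let the next task have a turn

def keyboard_task(keys, log):
    """Hypothetical task: 'reads' one keystroke, then yields, as after each input."""
    for key in keys:
        log.append(("keyboard", key))
        yield

def round_robin(tasks):
    """No priorities, no time slices: each task runs until its next yield."""
    ring = deque(tasks)
    while ring:
        task = ring.popleft()
        try:
            next(task)          # run the task up to its next I/O point
            ring.append(task)   # then it goes to the back of the ring
        except StopIteration:
            pass                # finished tasks simply drop out of the ring

log = []
round_robin([printer_task("HI", log), keyboard_task("ab", log)])
print(log)  # printer and keyboard steps alternate, one I/O step per turn
```

Because nobody is ever interrupted mid-step, the tasks never need locks around shared data; the price is that a task that never does I/O (or never yields) starves everyone else.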

 

So with this kind of multi-tasker on the TMS9900, you can switch from one program to the next in only four instructions. Yes, that's right: 4 instructions. This is unheard of.

 

Below is the code for the word YIELD, which changes control to the next task, written in a Forth Assembler language.

 

I did something with this that might be unique. I use the RTWP instruction to jump to the next task in a list of tasks.

I can do that because I manually initialize all the registers in each task's workspace as if each task was already called once by BLWP.

 

That makes the actual switch to a new program a single instruction.

Then the only thing we do is check a variable that is right after the workspace to see if the task is awake or asleep.

If it's asleep we just jump to the next workspace and so on.

I love this processor!
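To make the RTWP trick concrete, here is a rough Python model of the workspace ring. It is a simplification under stated assumptions: workspaces are list entries rather than RAM addresses, R13 holds the index of the next task, and R14 just holds the label "TSTAT". What it shows is that nothing has to be saved on the way out, because each task's Forth state (IP, SP, RP) already lives in its own workspace registers.

```python
from dataclasses import dataclass, field

# Rough model only: list indices stand in for workspace addresses, and the
# TFLAG cell (byte offset 32 from WP, "32 (R1)") is a separate field.

@dataclass
class Workspace:
    name: str
    regs: list = field(default_factory=lambda: [0] * 16)  # R0..R15, kept in RAM
    tflag: int = 0                                          # the cell right after R15

def build_ring(names, awake):
    """Pre-load R13/R14 as if every task had already been entered once by BLWP."""
    ring = [Workspace(name) for name in names]
    for i, ws in enumerate(ring):
        ws.regs[13] = (i + 1) % len(ring)  # R13 "saved WP": next task in the circle
        ws.regs[14] = "TSTAT"              # R14 "saved PC": always the TSTAT code
        ws.tflag = -1 if ws.name in awake else 0
    return ring

def yield_from(ring, current):
    """RTWP follows R13; TSTAT keeps hopping until it finds a non-zero TFLAG.

    Like the real loop, this never returns if every task is asleep.
    """
    visited = []
    wp = current
    while True:
        wp = ring[wp].regs[13]       # RTWP: WP <- R13, PC <- R14 (TSTAT), ST <- R15
        visited.append(ring[wp].name)
        if ring[wp].tflag != 0:      # TSTAT: STWP R1 / MOV 32(R1),R0 / NE UNTIL
            return wp, visited       # awake: NEXT resumes this task's Forth code

ring = build_ring(["USER0", "TASK1", "TASK2", "TASK3"], awake={"USER0", "TASK2"})
print(yield_from(ring, 0))  # skips sleeping TASK1 and stops at TASK2
print(yield_from(ring, 2))  # skips sleeping TASK3 and wraps around to USER0
```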

CODE: YIELD  ( -- )
              BEGIN,             \ CURRENT TASK:
                 RTWP,           \ one instruction switches context        14
l: _TSTAT        R1  STWP,       \ NEXT TASK: store NEW workspace in R1     8
                 32 (R1) R0 MOV, \ Read local TFLAG to see if I am awake   28
              NE UNTIL,          \ loop thru tasks until TFLAG is <> 0     10
              NEXT,              \ if task is awake, run next            \ 60 *.333 = 20uS
              END-CODE
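The cycle counts in the comments add up to the "60 * .333 = 20uS" figure. Here is that arithmetic as a small Python sketch, assuming a 3 MHz clock and assuming each sleeping task that gets skipped costs the same 60 cycles (a jump's timing can differ slightly when taken versus not taken, so treat this as an estimate). It is also a best case: on the TI-99/4A, workspaces outside the 16-bit scratchpad RAM at >8300 will likely be slower because of the wait states on the 8-bit expansion bus.

```python
# Cycle counts copied from the comments in the YIELD listing above.
CYCLES = {"RTWP": 14, "STWP": 8, "MOV 32(R1),R0": 28, "NE UNTIL (jump)": 10}
CLOCK_MHZ = 3.0  # assumed CPU clock; one cycle is about 0.333 us

def switch_time_us(sleepers_skipped=0):
    """Rough YIELD cost, assuming every skipped sleeping task costs another 60 cycles."""
    per_hop = sum(CYCLES.values())
    return (sleepers_skipped + 1) * per_hop / CLOCK_MHZ

print(sum(CYCLES.values()))  # cycles per hop
print(switch_time_us())      # next task is awake
print(switch_time_us(2))     # two sleeping tasks are skipped first
```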

So when all is said and done I have a set of words that lets me create tasks and run them, stop them or assign them new programs, almost like a Unix system, but much tinier.

 

Here is the DEMO program that I tested it with. I will get a video up here shortly and the multi-tasking kernel up on GitHub.

 

This can be ported to fbForth or TurboForth with just a little assembler code; the rest is mostly high-level Forth.

\ CAMEL99 Forth Multi-tasking Demo
\ paste into system with mtask99.hsf installed
INIT-MULTI

CREATE TASK1   USIZE ALLOT
CREATE TASK2   USIZE ALLOT
CREATE TASK3   USIZE ALLOT

TASK1 FORK
TASK2 FORK
TASK3 FORK

DECIMAL
VARIABLE SPEED  25 SPEED !
: JOB1
          BEGIN
            15 3
            DO
              I SCREEN   ( change screen color)
              SPEED @ 5 MAX MS
            LOOP
          AGAIN  ;

: JOB2
          BEGIN
            90 65
            DO
               30 1 I 47 VCHAR
               25 MS
            LOOP
          AGAIN ;

VARIABLE X
\ run for a period of time then go to sleep
: JOB3
          X OFF
          2000 MS
          X ON
          MYSELF SLEEP  \ easy way to say I am all done
          PAUSE ;

 ' JOB1 TASK1 ASSIGN
 ' JOB2 TASK2 ASSIGN
 ' JOB3 TASK3 ASSIGN

With that code loaded, you type

MULTI

 

TASK1 WAKE

TASK2 WAKE

TASK3 WAKE

TASK1 SLEEP, etc.

 

And the Forth console is still alive the whole time...

 

unless you say MYSELF SLEEP. :-)

 

Nighty night Forth.

 


Ok you got my attention... I'll be following this topic from here on out.

 

LOL. Ya, to me this is one of the most fun things about little Forth systems. Somehow it got lost in some of the public domain systems that came out, like those from the Forth Interest Group (FIG). It might have been to avoid patent infringement or something like that.

 

But ya, you can quickly see how some game stuff gets a h_ll of a lot simpler if you can give it to a little daemon and forget about it.

 

Since you are interested here is the code with a lot of comments. I am happy to answer questions... after I get some sleep.

 

 

 

\ TASKS99.HSF for CAMEL99                               06JAN2017 Brian Fox

\ May152017  RE-wrote for use with new Kernel that includes USER variables

[undefined] XASSEMBLER [IF] ."  **This is for XASM99 cross compiler"
                            cr ." Compile halted."  ABORT [THEN]

\ This multi-tasker takes advantage of the unique TMS9900
\ memory-to-memory architecture to create a 20uS task switcher.

\ WP in the 9900 CPU points to the current WORKSPACE which is normally
\ just the registers.  We extend the concept to include a set of
\ 15 USER VARIABLES and space for both stacks right above the registers.

\ *Therefore the WP becomes the USER POINTER (UP) of a conventional Forth multi-tasker.

\ Using WP to point to the USER area also lets us use the Workspace register
\ architecture further. We can use registers 13, 14 and 15 to link to another
\ workspace and use the RTWP instruction to change tasks in 1 instruction!
\ A very neat trick.
\
\ ALSO, the registers become user variables 0..15 of each task

\        ************* WARNING ****************
\ BLWP/RTWP R13 and R14 have been stolen by this MULTI-TASKER.
\ If you want to write code words that use BLWP/RTWP you must
\ save the contents of R13 and R14 before using BLWP

\ The simplest way in this Forth is to use the return stack:
\  R13 RPUSH,
\  R14 RPUSH,
\  WKSPX BLWP,       call your new workspace vector
\   R13 RPOP,
\   R14 RPOP,

\ =======================================================================
\ CAMEL99 MULTI-TASKING USER AREA
\ -----------------------------------------------------------------------
\   0 USER R0   LOCAL general purpose register     ( workspace begins)
\   1 USER R1   LOCAL general purpose register
\   2 USER R2   LOCAL general purpose register
\   3 USER R3   LOCAL general purpose register
\   4 USER R4   LOCAL Top of stack cache
\   5 USER R5   LOCAL overflow for mult. & div.,       // general purpose register (used by NEXT)
\   6 USER R6   LOCAL parameter stack pointer ('SP')
\   7 USER R7   LOCAL return stack pointer    ('RP')
\   8 USER R8   LOCAL Forth working register  ('W')    // general purpose register in code words
\   9 USER R9   LOCAL Forth interpreter pointer ('IP')
\  10 USER R10  LOCAL Forth's "NEXT" routine cached in R10
\  11 USER R11  LOCAL 9900 sub-routine return register // general purpose register in code words
\  12 USER R12  LOCAL 9900 CRU register                // general purpose register in code words
\  13 USER R13  LOCAL task link
\  14 USER R14  LOCAL Program counter: ALWAYS runs TSTAT routine
\  15 USER R15  LOCAL Status Register
\ ------------------------------------------------------------------------
\  16 USER TFLAG    LOCAL task's awake/asleep flag
\  17 USER JOB      contains XT of Forth word that runs in this task
\  19 USER VAR3
\  21 USER VAR4
\  22 USER VAR5
\  23 USER VAR6
\  24 USER VAR7
\  25 USER VAR8
\  26 USER VAR9
\  27 USER VAR10
\  28 USER VAR11
\  29 USER VAR12
\  30 USER VAR13
\ -----------------------------------------------------------------------
\   TASK Parameter stack base address 20 cells (grows downwards)
\   TASK Return stack base address    20 cells (grows downwards)
\ =======================================================================

\ Each task has a Process ID (PID)
\ In this system we use the workspace address as the PID

CROSS-ASSEMBLING
 CODE: MYSELF ( -- PID)         \  return my "Process ID"
            TOS PUSH,
            TOS STWP,           \ fetch the cpu WP register to Forth TOS
            NEXT,
            END-CODE

8300 CONSTANT: USER0            \ user0 is the main Forth task workspace

[CC] DECIMAL

[CC] 15 cells         \ calc. size of task memory block (168 bytes)
     28 cells +
     20 cells +
     20 cells +
     2+               \ 1 cell extra for safety
[TC] CONSTANT: USIZE


TARGET-COMPILING
\ define CPU registers as user variables
  12 user: 'SP             \ the task's Forth SP register
  14 user: 'RP             \ the task's Forth RP register
  18 user: 'IP             \ the task's Forth IP register

\ these registers are used by RTWP to change context
  26 user: TLINK           \ R13 = linked task wksp
  28 user: TPC             \ R14 = linked task program counter
  30 user: TST             \ R15 = linked task status register


\ T A S K   S W I T C H E R
\ ========================================================================
\ EXPLANATION OF THE MULTI-TASKER FOR TMS9900

\ Forth multi-taskers create a word, YIELD, that switches from one task "context"
\ to the next "context".  TMS9900 has a fantastic method. The Workspace.
\ CAMEL99 initializes the workspace of each task as if it had been called
\ by BLWP. With all the workspaces pointing in a circle we can use the RTWP
\ instruction to hop from one to the next to the next.

\ But TMS9900 created a problem. The RTWP instruction will change context
\ immediately given an address and program counter in R13 and R14.
\ This is different than conventional round robin where YIELD remains in a
\ loop testing each task in the linked list, only leaving the loop when a
\ task is awake. (tflag<>0)

\ SOLUTION : *Divide YIELD into 2 parts*
\ Part1 : YIELD
\         Do the Forth style context switch at appropriate code boundaries.
\         In this case it's just one instruction. RTWP.

\ Part2 : TSTAT
\         Load R14 of each task with the address of the rest of YIELD (TSTAT)
\         TSTAT will run after the RTWP instruction hops to the new workspace.
\         TSTAT tests its own TFLAG variable to see if it's awake.
\         If the task is asleep TSTAT jumps back to YIELD, otherwise it runs NEXT
\         which runs the Forth system for the awake task we just entered.
\
\ *Addressing the workspace registers and user variables with WP uses index# x 2
\ example:  R2 is accessed with 4 (R1) ...
CODE: YIELD  ( -- )
              BEGIN,                 \ CURRENT TASK:
                 RTWP,               \ one instruction switches context        14
\ -----------------------------------------------------------------------------------------
l: _TSTAT        R1  STWP,           \ NEXT TASK: store NEW workspace in R1     8
                 32 (R1) R0 MOV,     \ Read local TFLAG to see if I am awake   28
              NE UNTIL,              \ loop thru tasks until TFLAG is <> 0     10
              NEXT,                  \ if task is awake, run next            \ 60 *.333 = 20uS
              END-CODE

\ convert labels into Forth constants for the addresses of the code
[CC] HEX
NEXT2 [TC] constant: NEXT2             \ convert EQU to Forth constant

[CC]
T' YIELD 2+ [TC] constant: 'YIELD      \ *NOTE* we need the code address, not the code field addr.

_TSTAT    constant: 'TSTAT             \ convert code label to Forth constant

[CC] DECIMAL
TARGET-COMPILING
\ PID = process ID.  It is the address of the task's user area memory block
: LOCAL    ( PID uvar -- addr' ) MYSELF - + ;     \ usage:  TASK1 'SP LOCAL @
: SLEEP    ( PID -- )  0 SWAP TFLAG LOCAL ! ;     \ put PID to sleep
: WAKE     ( PID -- ) -1 SWAP TFLAG LOCAL ! ;     \ wake up PID

\ turn multi-tasking on or off by changing the CODE address in PAUSE
: SINGLE     NEXT2  T['] PAUSE ! ;   \ disable multi-tasking
: MULTI     'YIELD  T['] PAUSE ! ;   \ enable multi-tasking

( *** YOU  M U S T  use INIT-MULTI before multi-tasking ***)
: INIT-MULTI ( -- )
                MYSELF tlink !              \ Set my 'link to my own WKSP
                'TSTAT TPC !                \ set my task PC (R14) to run TSTAT
                MYSELF WAKE   ;             \ mark myself awake

: FORK ( PID -- )                           \ needs 168 bytes
       >R                                   \ copy taskaddr
       R@ USIZE 0 FILL                      \ erase user area
       USER0 R@ 60 CMOVE                    \ copy USER0 regs & vars to taskaddr
                                            \ -this sets R14 to TSTAT routine (IF init-multi was run)
                                            \  and it sets R10 to NEXT2

       R@ 100 +  R@ 'SP LOCAL !             \ set Pstack pointer to this user area
       R@ 140 +  R@ 'RP LOCAL !             \ set Rstack pointer to this user area

\ add this task to the round-robin list of tasks
       tlink @                 ( -- link)   \ get the current round-robin link
       R@ tlink !                           \ replace it with addr of new task
       R@ tlink LOCAL !                     \ store the copy from above into new task's space

       R> SLEEP  ;                          \ mark this new task as asleep

: ASSIGN ( XT PID -- )
           dup  JOB local      ( -- xt PID addr )  \ get the address of JOB for task PID
           over 'IP local !    \ store addr of JOB in the PID's instruction pointer
           JOB local ! ;       \ store the XT in the PID's JOB user var.

\ : RESTART  ( PID -- ) DUP DUP JOB LOCAL SWAP 'IP LOCAL !  WAKE ;

TARGET-COMPILING

[CC] HEX [TC]
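For anyone tracing what LOCAL, INIT-MULTI, FORK and ASSIGN do to memory, here is a small Python model of just the linking and bookkeeping parts. It is a sketch under assumptions: memory is a dictionary of 16-bit cells, the task addresses and the XT value are made up, and the model skips the parts of FORK that copy USER0's workspace and set up the stacks. The byte offsets come from the user-area table (R9 = 18, R13 = 26, TFLAG = 32, JOB = 34).

```python
# Byte offsets from WP, taken from the user-area table above.
IP_OFF, TLINK_OFF, TFLAG_OFF, JOB_OFF = 18, 26, 32, 34

class Memory:
    """Byte-addressed stand-in for RAM holding 16-bit cells; unset cells read 0."""
    def __init__(self):
        self.cells = {}
    def fetch(self, addr):
        return self.cells.get(addr, 0)
    def store(self, addr, value):
        self.cells[addr] = value & 0xFFFF

def local(pid, uvar_addr, myself):
    """LOCAL ( PID uvar -- addr' ): MYSELF - + moves a user-var address to task PID."""
    return uvar_addr - myself + pid

def init_multi(mem, myself):
    mem.store(myself + TLINK_OFF, myself)   # ring of one: link to myself
    mem.store(myself + TFLAG_OFF, 0xFFFF)   # MYSELF WAKE (-1 as a 16-bit cell)

def fork(mem, myself, pid):
    """Linking part of FORK only: insert PID right after the current task."""
    old_link = mem.fetch(myself + TLINK_OFF)                      # tlink @
    mem.store(myself + TLINK_OFF, pid)                            # R@ tlink !
    mem.store(local(pid, myself + TLINK_OFF, myself), old_link)   # R@ tlink LOCAL !
    mem.store(local(pid, myself + TFLAG_OFF, myself), 0)          # R> SLEEP

def assign(mem, myself, xt, pid):
    job_addr = local(pid, myself + JOB_OFF, myself)
    mem.store(local(pid, myself + IP_OFF, myself), job_addr)      # 'IP -> the JOB cell
    mem.store(job_addr, xt)                                       # JOB holds the XT

def ring_order(mem, start):
    order, pid = [start], mem.fetch(start + TLINK_OFF)
    while pid != start:
        order.append(pid)
        pid = mem.fetch(pid + TLINK_OFF)
    return order

USER0, TASK1, TASK2 = 0x8300, 0xA000, 0xA0A8   # hypothetical addresses, 168 bytes apart
mem = Memory()
init_multi(mem, USER0)
fork(mem, USER0, TASK1)
fork(mem, USER0, TASK2)
assign(mem, USER0, 0x1234, TASK1)                # 0x1234 is a made-up XT
print([hex(p) for p in ring_order(mem, USER0)])  # newest task sits right after USER0
```

One consequence the model makes visible: because each FORK inserts the new task directly after the forking task, the tasks end up in the ring in reverse order of creation.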


 

 

Edited by TheBF

The 9900 is cool, but every time you want to even increment a register, the CPU has to load from RAM, then perform the increment, and write the value back to RAM. That makes it a bit slower than a machine with internal registers.
The lack of a conventional stack pointer also makes some things a bit different. To maintain a stack you have to manage the stack with a register manually. No auto increment or decrement.
If it had a cache of the register contents, it could skip the load and write the value back to RAM when the memory bus isn't busy or you change the register file pointer.

But that creates a problem that I think someone raised in speccery's thread: some programs manipulate register values by reading/writing direct to where the workspace is stored in RAM (it's a perfectly valid programming technique). So the contents of a cache would lose sync with the register values in the workspace in RAM. Unless you had some sort of system to detect when this happens.
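The sync problem can be shown with a toy model of a write-back register cache. The design is entirely hypothetical (the 9900 has no such cache); the only real fact it relies on is that register Rn lives at WP + 2n in RAM. Reads and writes that go to a register's RAM address bypass the cache, so the two views drift apart in both directions until the cache is flushed.

```python
class WriteBackRegisterCache:
    """Hypothetical on-chip cache for workspace registers (the real 9900 has none)."""

    def __init__(self, ram, wp):
        self.ram, self.wp = ram, wp
        self.cache, self.dirty = {}, set()

    def addr(self, n):
        return self.wp + 2 * n              # Rn lives at WP + 2n in RAM

    def read_reg(self, n):
        if n not in self.cache:             # first use: load from RAM
            self.cache[n] = self.ram[self.addr(n)]
        return self.cache[n]

    def write_reg(self, n, value):
        self.cache[n] = value               # write-back: RAM is updated later
        self.dirty.add(n)

    def flush(self):                        # e.g. when the WP changes
        for n in self.dirty:
            self.ram[self.addr(n)] = self.cache[n]
        self.dirty.clear()

ram = {0x8300 + 2 * n: 0 for n in range(16)}
regs = WriteBackRegisterCache(ram, 0x8300)

regs.write_reg(3, 42)            # register-style write to R3
seen_in_ram = ram[0x8306]        # memory-style read of R3's address: still the old value

regs.read_reg(2)                 # R2 is now cached
ram[0x8304] = 7                  # memory-style write to R2's address
seen_in_reg = regs.read_reg(2)   # register-style read: still the stale cached value

regs.flush()
print(seen_in_ram, seen_in_reg, ram[0x8306])
```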


This would be an excellent application for robotics. For example, if a robot is executing a move forward 5 units, it is completely blind to its environment until the move command completes. With multitasking, it could still scan its sensor array while still executing the move command.

I would love to hear from Willsy and Lee about this... Well done!

Ya for sure the 9900 has lots of warts. It's not just slow, it's glacial. The little MSP430 addresses most of these concerns from what I can see.

In CAMEL99 Forth I manage 2 stacks, and to prevent my own stupid mistakes I made some macros: PUSH, and POP, for the parameter stack, and RPUSH, and RPOP, for the return stack. Popping a stack takes the 9900 just 1 instruction, which is pretty good. Pushing takes 2 instructions.

 

B

Yup. That's the idea. The one problem with the current implementation is KSCAN. That darn thing takes about 0.7 milliseconds to run.

I am told there are debouncing delays in the code, and if they were implemented for a cooperative tasker, they would let other tasks work while waiting.

So I found some faster "check if any key is pressed" code that I will use to speed up my KEY() function.


For my lectures I had to learn the MIPS architecture, and well, I have to admit, it is really impressive. I thought I could never love an architecture the way I did with the TMS9900. OK, the MIPS is a RISC 32 bit system, pretty different to what we know. But 32 registers ... operations with three registers ... fixed 32 bit command width ... no additional arguments ...

Oh yes, there are some beautiful instruction sets and architectures out there. MIPS is among them.

Sun SPARC also comes to mind, with its circular set of register windows on chip.

 

Forth guys also love the RTX2000, designed by Chuck Moore, which performed more than 1 instruction per clock cycle, did subroutine calls in 1 instruction and performed subroutine returns in 0 clock cycles. Honest. I believe the chip was specced at 12 MIPS with a 10 MHz clock. Pretty slick.

 

You can see something similar in Jim Bowman's J1 FPGA processor, written in 200 lines of Verilog code.

 

B


On the other side, I keep telling the students: Don't be too sad that we show you MIPS and not x86. We just try to spare you the ugliness of perpetual compatibility patchwork. :)

 

As we say in English: "Ain't that the truth"

(echt wahr: German for "really true")


AFAIK the 9900 was based on TI's 990 minicomputer (which had a typical mainframe/minicomputer CPU made up of discrete logic on boards). In a mainframe or mini, task switching is a very common task, and it seems like TI was experimenting with a lot of ideas in the 9900 line. Unfortunately in the 99/4A, due to the external registers and crippled memory bus, trading fast registers for a fast context switch was not the best choice. Also, by not having to save the registers for a context switch, TI managed to avoid implementing a stack, stack pointer, and push/pop instructions.


lol. So I wrote up the little routine to scan the keyboard hardware directly.

 

Got it working, and it did what it was supposed to: it allowed the multi-tasking programs more time to run, because we only entered GPL space if a key was pressed.

 

Nice.

 

BUT... Classic99 PASTE function stops working because KSCAN is how it is getting characters into the emulator.

 

Oh well, it was a good exercise.

 

So basically if you write a game with this system you get maximum speed by putting the console to sleep until you need it.

 

I have considered putting the YIELD routine on the ISR timer, but then you have all the problems of shared resources, record locking, etc.

Chuck's way eliminates that but it means you have to manage time allocation yourself. That's the Forth way.

 

I need to get my sprites code working now. I want to replace ISR driven sprite motion with the multi-tasker.

 

We shall see how it works.

 

 

 

"Uthe the Forth Luke" said Obiwan after his trip to the dentist.


The TMS 9900 CPU implements the multi-chip CPU from the TI 990/9 minicomputer on one chip. Almost. It has a separate clock chip.

 

The TI 990/4 and TI 990/5 minicomputers were actually driven by the TMS 9900 CPU. The larger TI 990/10 and the TI 990/12 had more advanced CPU designs, more similar to the later TMS 99000 series of microprocessors.

 

When TI designed the TMS 9900, at a time when most other LSI manufacturers were starting to consider 8-bit designs, chip technology was such that fast memory could keep up with logic circuits in speed. Thus they gained flexibility without really sacrificing anything. But already by the time the TMS 9900 ended up in the TI 99/4A, this had started to change.

 

Although there's no dedicated stack pointer in the TMS 9900 (there is in the TMS 99000), there is autoincrement. So although a PUSH must be implemented by

DECT SP

MOV Rx,*SP

 

you can do a POP by simply

MOV *SP+,Rx
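The asymmetry described here (two instructions to push, one to pop) can be seen in a small Python model. The base address is hypothetical; the model only mimics a byte-addressed stack that grows downward, with SP as an ordinary register and the +2 auto-increment of a word MOV.

```python
class Stack9900:
    """Model of a 9900-style stack: SP is an ordinary register holding a byte
    address, cells are 2 bytes, and the stack grows downward."""

    def __init__(self, base):
        self.mem = {}
        self.sp = base                      # empty stack: SP sits just past the top cell

    def push(self, value):
        self.sp -= 2                        # DECT SP
        self.mem[self.sp] = value & 0xFFFF  # MOV Rx,*SP

    def pop(self):
        value = self.mem[self.sp]           # MOV *SP+,Rx reads the cell...
        self.sp += 2                        # ...and auto-increments SP by 2 (word move)
        return value

stack = Stack9900(0x2000)                   # hypothetical base address
stack.push(1)
stack.push(2)
print(hex(stack.sp))                        # two cells below the base
print(stack.pop(), stack.pop())             # last in, first out
print(hex(stack.sp))                        # back at the base
```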

 

I've implemented pre-emptive multitasking within the p-system, and as a separate assembly thing on the TI 99/4A. As you state here, the TMS 9900 is efficient for this. But in the p-system case, you have to keep within the constraints of the PME, as it's supposed to work at the Pascal level as well.


The TMS 99105 has stack support. So has for example the TI 990/12 minicomputer.

When I wrote TMS 99000 it was in a generic sense, since I didn't remember the detailed differences between the different introduced and planned versions of the TMS 99000 CPU.

Edited by apersson850

Thanks chaps. I can see it in the data manual, now I know what to look for! So it looks like Rx is the register you're using for the stack pointer, it's pushing the address of the instruction following the BLSK onto the stack, and branching to Addr?


I thought it would be fun to try to run 6 tasks on the poor old TI-99.

It actually worked.

 

I created some tools to monitor the tasks. .TASKS command shows the Process ID, the Parameter stack, the return stack, the instruction pointer of the task and the awake/asleep status.

 

I am closer to improving KSCAN for multi-tasking, but the code took me past the 8K file size limit of my cross-compiler, so that's the next improvement.

 

The MULTICAM EA5 program is on GitHub, and so is the MTASKDEM.FTH file in the \LIB folder.

 

https://github.com/bfox9900/CAMEL99

 

So you can play with this if you are interested.

 

I will get around to doing a demo version of the DENILE program one day. It really changes the way you do games when you can assign moving pieces to a separate task.

 

A movie of the 6 tasks test is attached.

TI99 6TASKS.mov

Edited by TheBF

I have found a much better timer method on the TI-99 for a cooperative multi-tasker.

 

I was using the 9901 timer before and waiting for it to count down for 1 millisecond.

That is not ideal because the entire system comes to a stop while waiting.

 

The ideal is to run the switcher routine whenever you have to wait, but the TMS9900 is too slow to jump to another task and get back within 1 ms.

 

So....

 

Since the interrupt routine is incrementing a counter at >8379 all the time, I can get an accurate 1/60 of a second measurement any time I want.

And in 1/60 of a second I can service some other tasks.

 

It's faster in a 16-bit Forth to read a whole integer, so we read the cell at >8378. Its low byte is the value at >8379, so there's no need to play with the bytes.

Much faster.

 

So the method is:

  1. Read the value at >8378 and keep a copy on the stack.
  2. Switch to another task.
  3. When you come back, read >8378 again.
  4. Go to step 2 until the new reading is not equal to the copy.

Here is how it looks in Forth

HEX
: TMR@     ( -- n ) 8378 @ ;   \ read ISR counter (address is in HEX)
DECIMAL

: 1/60    ( -- )    
          TMR@          \ read the timer onto stack
          BEGIN            
            PAUSE       \ service other tasks.
            DUP TMR@    \ copy 1st reading & get a new one
          <> UNTIL      \ loop until not equal
          DROP ;        \ DROP initial timer reading

The overall effect is a much snappier system, because it's not wasting time while doing timing delays.
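Here is a Python model of the same wait loop, mainly to show the big-endian cell read and why one PAUSE per check is enough. It is a sketch under assumptions: the console is a bytearray, and the "interrupt fires every third task switch" rule is invented purely to drive the simulation.

```python
class Console:
    """Hypothetical model: 64K of RAM plus the 60 Hz interrupt counter at >8379."""

    def __init__(self):
        self.ram = bytearray(0x10000)

    def isr_tick(self):
        self.ram[0x8379] = (self.ram[0x8379] + 1) & 0xFF

    def tmr_fetch(self):
        """TMR@: one 16-bit big-endian read at >8378; >8379 is the low byte."""
        return (self.ram[0x8378] << 8) | self.ram[0x8379]

def wait_one_tick(console, pause):
    """Same shape as the Forth 1/60: read once, then PAUSE until the reading changes."""
    start = console.tmr_fetch()            # TMR@ (copy kept on the stack)
    turns = 0
    while True:
        pause()                            # PAUSE: other tasks get the CPU
        turns += 1
        if console.tmr_fetch() != start:   # DUP TMR@ <> UNTIL
            return turns                   # DROP the old reading

console = Console()
pauses = []

def pause():
    """Stand-in for PAUSE; pretend the interrupt fires every third task switch."""
    pauses.append(None)
    if len(pauses) % 3 == 0:
        console.isr_tick()

turns = wait_one_tick(console, pause)
print(turns, console.tmr_fetch())
```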

Edited by TheBF

After the success with the ISR timer, which ticks every 16.7 ms, I tried the 9901 timer with a 10 ms countdown and it worked even better.

The key is to have something counting for the system while the programs are doing their own thing cooperatively.

 

So TMR@ reads the hardware in the 9901 instead of reading the memory address. I get finer resolution without stopping everything just to read the timer.
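With a hardware down-counter, the usual cooperative approach is to record a starting reading and PAUSE until enough ticks have elapsed, computing the difference modulo the counter's period so wraparound is handled. The Python sketch below is a model, not the author's code, and its hardware numbers are assumptions: a 9901 decrementer that ticks once every 64 cycles of a 3 MHz clock (about 21.3 us per tick) and a 14-bit counter that cycles through 16384 values. The simulated counter and the "50 ticks per PAUSE" figure are made up.

```python
# Assumptions (not from the post): the 9901 decrementer ticks once every
# 64 cycles of a 3 MHz clock (about 21.3 us) and cycles through 16384 values.
CLOCK_HZ = 3_000_000
PRESCALE = 64
PERIOD = 0x4000

def ticks_for(ms):
    """Whole decrementer ticks needed to cover `ms` milliseconds, rounded up."""
    num = ms * CLOCK_HZ
    den = 1000 * PRESCALE
    return (num + den - 1) // den

def elapsed(start, now, period=PERIOD):
    """Ticks between two readings of a wrapping down-counter (less than one period)."""
    return (start - now) % period

def wait_ms(ms, read_timer, pause):
    """Cooperative delay: keep calling PAUSE until enough ticks have gone by."""
    target = ticks_for(ms)
    start = read_timer()
    switches = 0
    while elapsed(start, read_timer()) < target:
        pause()                  # other tasks run while we wait
        switches += 1
    return switches

# Simulated hardware for the example (hypothetical values).
counter = [100]

def read_timer():
    return counter[0]

def pause():
    """Stand-in for PAUSE: pretend 50 ticks pass while other tasks run."""
    counter[0] = (counter[0] - 50) % PERIOD

print(ticks_for(10))                   # ticks in a 10 ms delay
print(wait_ms(10, read_timer, pause))  # task switches made while waiting
```

The subtraction trick only works if control comes back to the waiting task within one full counter period (roughly 350 ms under these assumptions); a task that hogs the CPU longer than that would make the delay come out short.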

