TMS9900 is the coolest CPU for multi-tasking


33 replies to this topic

#1 TheBF OFFLINE  

TheBF

    Moonsweeper

  • 281 posts
  • Location:The Great White North

Posted Wed May 17, 2017 10:29 PM

I finally got around to getting my TI-99 multi-tasker working pretty much the way I wanted.

 

Traditionally commercial Forth systems were multi-tasking multi-user systems. I am told Forth Inc. could strap 12 terminals to an IBM PC and have good response for the users.

 

So why is the TMS9900 so cool? Because it can create a new set of registers for itself anywhere in memory and change to that set of registers in 1 instruction!
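To make that concrete, here is a rough model in standard Forth (not TMS9900 code) of registers that live in RAM. The names WS-A, WS-B, WP, SET-WP and REG are made up for this sketch; on the real chip the workspace pointer is a hardware register, and each register is 16 bits wide rather than one Forth cell.

```forth
\ Model only: two 16-register workspaces in ordinary memory.
CREATE WS-A  16 CELLS ALLOT
CREATE WS-B  16 CELLS ALLOT
VARIABLE WP                              \ stands in for the workspace pointer

: SET-WP ( addr -- )  WP ! ;             \ "switch register sets" is one store
: REG    ( n -- addr )  CELLS WP @ + ;   \ address of register n in the current workspace

WS-A SET-WP   111 1 REG !                \ R1 of workspace A
WS-B SET-WP   222 1 REG !                \ R1 of workspace B, A is untouched
WS-A SET-WP   1 REG @ .                  \ back in A: R1 is still 111
```

Nothing is copied when the pointer changes, which is why the switch itself is so cheap.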

 

For those who have never thought about it, a conventional multi-tasking system uses a special program called a scheduler that decides which program is going to run and how long it gets to run. At some point while program A is running, the scheduler interrupts program A and gives program B a turn, and so on through all the programs in the "schedule".  So with only 1 CPU, it's really an illusion of multiple programs running at the same time.

 

Chuck Moore's Forth multi-tasker was built to be very lightweight and has no separate program to schedule which task is going to run and when. (It's heresy, I know)

Instead a new task gets a turn EVERY time an input or output occurs.  So after outputting a character to the printer another task gets a turn. If that task reads a key stroke, it releases control to the next task and so on...

 

This works well because most I/O takes tons of time so the CPU might as well go do something else.
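A minimal sketch of the idea in standard Forth, assuming a PAUSE word that hands control to the next task. The PAUSE below is only a counting stand-in so the example runs on any Forth, and MT-EMIT and MT-TYPE are made-up names, not words from the kernel described in this post.

```forth
VARIABLE SWITCHES                    \ counts how often we gave up the CPU
: PAUSE ( -- )  1 SWITCHES +! ;      \ stand-in: a real PAUSE would switch tasks

: MT-EMIT ( c -- )  EMIT PAUSE ;     \ output one character, then yield
: MT-TYPE ( addr len -- )            \ every character gives other tasks a turn
   0 ?DO  COUNT MT-EMIT  LOOP  DROP ;

0 SWITCHES !
S" HELLO" MT-TYPE
SWITCHES @ .                         \ one switch per character printed
```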

 

With this kind of multi-tasker on the TMS9900 you can switch from one program to the next in only four instructions. Yes, that's right, 4. This is unheard of.

 

Below is the code for the word YIELD, which changes control to the next task, written in a Forth Assembler language.

 

I did something with this that might be unique.  I use the RTWP instruction to jump to the next task in a list of tasks.

I can do that because I manually initialize all the registers in each task's workspace as if each task was already called once by BLWP.

 

It makes the actual switch to a new program 1 instruction.

Then the only thing we do is check a variable that is right after the workspace to see if the task is awake or asleep.

If it's asleep we just jump to the next workspace and so on.

I love this processor!

CODE: YIELD  ( -- )
              BEGIN,             \ CURRENT TASK:
                 RTWP,           \ one instruction switches context        14
l: _TSTAT        R1  STWP,       \ NEXT TASK: store NEW workspace in R1     8
                 32 (R1) R0 MOV, \ Read local TFLAG to see if I am awake   28
              NE UNTIL,          \ loop thru tasks until TFLAG is <> 0     10
              NEXT,              \ if task is awake, run next            \ 60 *.333 = 20uS
              END-CODE
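The loop in YIELD can be modelled in plain Forth as a circular scan over per-task flags. This is a sketch for any standard Forth, not the kernel code: #TASKS, TFLAGS, RUNNING and NEXT-AWAKE are invented names, WAKE and SLEEP here take a task index rather than a task address, and where the real code follows the link stored in each workspace with RTWP, the model just steps an index. Like the real loop, it spins forever if every task is asleep.

```forth
4 CONSTANT #TASKS
CREATE TFLAGS  #TASKS CELLS ALLOT    \ one awake/asleep flag per task
VARIABLE RUNNING                     \ index of the task that has the CPU

: TFLAG  ( n -- addr )  CELLS TFLAGS + ;
: WAKE   ( n -- )  TRUE  SWAP TFLAG ! ;
: SLEEP  ( n -- )  FALSE SWAP TFLAG ! ;

: NEXT-AWAKE ( -- n )                \ step round the ring until a flag is non-zero
   RUNNING @
   BEGIN
      1+ #TASKS MOD                  \ "switch" to the next task
      DUP TFLAG @                    \ is it awake?
   UNTIL
   DUP RUNNING ! ;

0 WAKE  1 SLEEP  2 SLEEP  3 WAKE
0 RUNNING !
NEXT-AWAKE .                         \ skips sleeping tasks 1 and 2, lands on 3
NEXT-AWAKE .                         \ wraps round to task 0
```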

So when all is said and done I have a set of words that lets me create tasks and run them, stop them or assign them new programs, almost like a Unix system, but much tinier.

 

Here is the DEMO program that I tested it with. I will get a video up here shortly and the multi-tasking kernel up on GitHub.

 

This can be ported to FBForth or Turbo Forth with just a little assembler code but mostly high level Forth.

\ CAMEL99 Forth Multi-tasking Demo
\ paste into system with mtask99.hsf installed
INIT-MULTI

CREATE TASK1   USIZE ALLOT
CREATE TASK2   USIZE ALLOT
CREATE TASK3   USIZE ALLOT

TASK1 FORK
TASK2 FORK
TASK3 FORK

DECIMAL
VARIABLE SPEED  25 SPEED !
: JOB1
          BEGIN
            15 3
            DO
              I SCREEN   ( change screen color)
              SPEED @ 5 MAX MS
            LOOP
          AGAIN  ;

: JOB2
          BEGIN
            90 65
            DO
               30 1 I 47 VCHAR
               25 MS
            LOOP
          AGAIN ;

VARIABLE X
\ run for a period of time then go to sleep
: JOB3
          X OFF
          2000 MS
          X ON
          MYSELF SLEEP  \ easy to say I am all done
          PAUSE ;

 ' JOB1 TASK1 ASSIGN
 ' JOB2 TASK2 ASSIGN
 ' JOB3 TASK3 ASSIGN

With that code loaded, you type

MULTI

 

TASK1 WAKE

TASK2 WAKE

TASK3 WAKE

 TASK1 SLEEP etc..

 

And the Forth console is still alive the whole time...

 

unless you say MYSELF SLEEP.   :-)

 

Nighty night Forth.

 



#2 philipj OFFLINE  

philipj

    Moonsweeper

  • 375 posts
  • Location:Birmingham, Alabama

Posted Wed May 17, 2017 10:34 PM

Ok you got my attention... I'll be following this topic from here on out.



#3 TheBF OFFLINE  

TheBF

    Moonsweeper

  • Topic Starter
  • 281 posts
  • Location:The Great White North

Posted Wed May 17, 2017 10:41 PM

> Ok you got my attention... I'll be following this topic from here on out.

 

LOL. Ya, this to me is one of the most fun things about little Forth systems. Somehow it got lost in some of the public domain systems that came out, like the one from the Forth Interest Group (FIG). It might have been to avoid patent infringement or something like that.

 

But ya you can see quickly how some game stuff gets a h_ll of a lot simpler if you can give it to a little daemon and forget about it.

 

Since you are interested here is the code with a lot of comments. I am happy to answer questions... after I get some sleep.

 

Spoiler

Edited by TheBF, Wed May 17, 2017 10:44 PM.


#4 JamesD OFFLINE  

JamesD

    Quadrunner

  • 7,465 posts
  • Location:Flyover State

Posted Thu May 18, 2017 12:08 AM

The 9900 is cool, but every time you want to even increment a register, the CPU has to load from RAM, then perform the increment, and write the value back to RAM. That makes it a bit slower than a machine with internal registers.  
The lack of a conventional stack pointer also makes some things a bit different. To maintain a stack you have to manage the stack with a register manually.  No auto increment or decrement.
If it had a cache of the register contents, it could skip the load and write the value back to RAM when the memory bus isn't busy or you change the register file pointer.
 



#5 Stuart OFFLINE  

Stuart

    Dragonstomper

  • 675 posts
  • Location:Southampton, UK

Posted Thu May 18, 2017 4:36 AM

> If it had a cache of the register contents, it could skip the load and write the value back to RAM when the memory bus isn't busy or you change the register file pointer.
 

 

But that creates a problem that I think someone raised in speccery's thread: some programs manipulate register values by reading/writing direct to where the workspace is stored in RAM (it's a perfectly valid programming technique). So the contents of a cache would lose sync with the register values in the workspace in RAM. Unless you had some sort of system to detect when this happens.



#6 Vorticon OFFLINE  

Vorticon

    River Patroller

  • 2,698 posts
  • Location:Eagan, MN, USA

Posted Thu May 18, 2017 5:21 AM

This would be an excellent application for robotics. For example, if a robot is executing a move forward 5 units, it is completely blind to its environment until the move command completes. With multitasking, it could still scan its sensor array while still executing the move command. 

I would love to hear from Willsy and Lee about this... Well done!



#7 TheBF OFFLINE  

TheBF

    Moonsweeper

  • Topic Starter
  • 281 posts
  • Location:The Great White North

Posted Thu May 18, 2017 5:46 AM

> The 9900 is cool, but every time you want to even increment a register, the CPU has to load from RAM, then perform the increment, and write the value back to RAM. That makes it a bit slower than a machine with internal registers.
> The lack of a conventional stack pointer also makes some things a bit different. To maintain a stack you have to manage the stack with a register manually.  No auto increment or decrement.
> If it had a cache of the register contents, it could skip the load and write the value back to RAM when the memory bus isn't busy or you change the register file pointer.
 

 

Ya for sure the 9900 has lots of warts. It's not just slow, it's glacial.  The little MSP430 addresses most of these concerns from what I can see.

In Camel99 Forth I manage 2 stacks and, to prevent my stupid mistakes, I just made some macros called PUSH, and POP, plus RPUSH, and RPOP, for the return stack.  In the case of popping a stack the 9900 does it with 1 instruction so that's pretty good. For pushing it takes 2 instructions.

 

B



#8 TheBF OFFLINE  

TheBF

    Moonsweeper

  • Topic Starter
  • 281 posts
  • Location:The Great White North

Posted Thu May 18, 2017 5:49 AM

> This would be an excellent application for robotics. For example, if a robot is executing a move forward 5 units, it is completely blind to its environment until the move command completes. With multitasking, it could still scan its sensor array while still executing the move command.
>
> I would love to hear from Willsy and Lee about this... Well done!

 

Yup. That's the idea. The one problem with the current implementation is KSCAN.  That darn thing takes about 0.7 milliseconds to run.

I am told there are debouncing delays in the code, and if they were implemented for a cooperative tasker, they would let other tasks work while waiting.

So I found some faster "check if any key is pressed" code that I will use to speed up my KEY() function.



#9 mizapf OFFLINE  

mizapf

    River Patroller

  • 2,474 posts
  • Location:Germany

Posted Thu May 18, 2017 6:11 AM

For my lectures I had to learn the MIPS architecture, and well, I have to admit, it is really impressive. I thought I could never love an architecture the way I did with the TMS9900. OK, the MIPS is a RISC 32 bit system, pretty different to what we know. But 32 registers ... operations with three registers ... fixed 32 bit command width ... no additional arguments ...



#10 TheBF OFFLINE  

TheBF

    Moonsweeper

  • Topic Starter
  • 281 posts
  • Location:The Great White North

Posted Thu May 18, 2017 6:32 AM

> For my lectures I had to learn the MIPS architecture, and well, I have to admit, it is really impressive. I thought I could never love an architecture the way I did with the TMS9900. OK, the MIPS is a RISC 32 bit system, pretty different to what we know. But 32 registers ... operations with three registers ... fixed 32 bit command width ... no additional arguments ...

 

Oh yes, there are some beautiful instruction sets and architectures out there. MIPS is among them.

Sun Sparc also comes to mind with a circular set of registers on chip.

 

Forth guys also love the RTX2000 designed by Chuck Moore, which performed more than 1 instruction per clock cycle, did subroutine calls in 1 instruction and performed subroutine returns in 0 clock cycles. Honest.  I believe the chip was specced at 12 MIPS with a 10 MHz clock.  Pretty slick.

 

You can see something similar in James Bowman's FPGA J1 processor, written in 200 lines of Verilog code.

 

B



#11 TheBF OFFLINE  

TheBF

    Moonsweeper

  • Topic Starter
  • 281 posts
  • Location:The Great White North

Posted Thu May 18, 2017 6:58 AM

Here is the silly little demo program.

Attached Files


Edited by TheBF, Thu May 18, 2017 6:58 AM.


#12 schmitzi OFFLINE  

schmitzi

    River Patroller

  • 3,769 posts
  • ToXiC
  • Location:Germany

Posted Thu May 18, 2017 8:28 AM

wow :lust:



#13 mizapf OFFLINE  

mizapf

    River Patroller

  • 2,474 posts
  • Location:Germany

Posted Thu May 18, 2017 8:36 AM

On the other side, I keep telling the students: Don't be too sad that we show you MIPS and not x86. We just try to spare you the ugliness of perpetual compatibility patchwork. :)



#14 TheBF OFFLINE  

TheBF

    Moonsweeper

  • Topic Starter
  • 281 posts
  • Location:The Great White North

Posted Thu May 18, 2017 9:03 AM

> On the other side, I keep telling the students: Don't be too sad that we show you MIPS and not x86. We just try to spare you the ugliness of perpetual compatibility patchwork. :)

 

As we say in English: "Ain't that the truth"

(echt wahr)



#15 matthew180 OFFLINE  

matthew180

    River Patroller

  • 2,382 posts
  • Location:Castaic, California

Posted Thu May 18, 2017 1:02 PM

AFAIK the 9900 was based on TI's 990 Minicomputer (which had a typical mainframe/minicomputer CPU made up of discrete logic on boards).  In a mainframe or mini, task switching is a very common task and it seems like TI was experimenting with a lot of ideas in the 9900 line.  Unfortunately in the 99/4A, due to the external registers and crippled memory bus, the trade-off of the fast context switch over fast registers was not the best choice.  Also, by not having to save the registers for a context switch, TI managed to avoid implementing a stack, stack pointer, and push/pop instructions.



#16 TheBF OFFLINE  

TheBF

    Moonsweeper

  • Topic Starter
  • 281 posts
  • Location:The Great White North

Posted Thu May 18, 2017 9:15 PM

lol. So I wrote up the little routine to scan the keyboard hardware directly.

 

Got it working, and it did what it was supposed to: it allows the multi-tasking programs more time to run, because we only enter GPL space if a key is pressed.

 

Nice.

 

BUT... Classic99 PASTE function stops working because KSCAN is how it is getting characters into the emulator.

 

Oh well.  It was a good exercise.

 

So basically if you write a game with this system you get maximum speed by putting the console to sleep until you need it.

 

I have considered putting the YIELD routine on the ISR timer, but then you have all the problems of shared resources, record locking, etc.

Chuck's way eliminates that but it means you have to manage time allocation yourself. That's the Forth way.

 

I need to get my sprites code working now.  I want to replace ISR driven sprite motion with the multi-tasker.

 

We shall see how it works.

 

 

 

"Uthe the Forth Luke" said Obiwan after his trip to the dentist.



#17 apersson850 OFFLINE  

apersson850

    Moonsweeper

  • 374 posts

Posted Fri May 19, 2017 6:05 AM

The TMS 9900 CPU implements the multi-chip CPU from the TI 990/9 minicomputer on one chip. Almost. It has a separate clock chip.

 

The TI 990/4 and TI 990/5 minicomputers were actually driven by the TMS 9900 CPU. The larger TI 990/10 and the TI 990/12 had more advanced  CPU designs, more similar to the later TMS 99000 series of microprocessors.

 

When TI designed the TMS 9900, at a time when most other LSI manufacturers were starting to consider 8 bit designs, chip technology was such that fast memory designs were capable of following logic circuits in speed. Thus they gained flexibility without really sacrificing anything. But already by the time the TMS 9900 ended up in the TI 99/4A, this had started to change.

 

Although there's no dedicated stack pointer in the TMS 9900 (there is in the TMS 99000), there is autoincrement. So although a PUSH must be implemented by

DECT SP

MOV Rx,*SP

 

you can do a POP by simply

MOV *SP+,Rx
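The same pre-decrement push and post-increment pop, modelled in standard Forth so it can be tried on any system. STK, SP-REG, PUSH and POP are names made up for this sketch (SP-REG plays the role of the register used as SP), and a Forth cell stands in for the 9900's 2-byte word.

```forth
CREATE STK  8 CELLS ALLOT            \ room for 8 items
VARIABLE SP-REG                      \ the register chosen as stack pointer
STK 8 CELLS +  SP-REG !              \ empty stack: pointer just past the end

: PUSH ( n -- )                      \ DECT SP  then  MOV Rx,*SP
   -1 CELLS SP-REG +!
   SP-REG @ ! ;

: POP ( -- n )                       \ MOV *SP+,Rx
   SP-REG @ @
   1 CELLS SP-REG +! ;

10 PUSH  20 PUSH
POP .  POP .                         \ last in, first out: 20 then 10
```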

 

I've implemented pre-emptive multitasking within the p-system, and as a separate assembly thing on the TI 99/4A. As you state here, the TMS 9900 is efficient for this. But in the p-system case, you have to keep within the constraints of the PME, as it's supposed to work on Pascal level as well.



#18 Stuart OFFLINE  

Stuart

    Dragonstomper

  • 675 posts
  • Location:Southampton, UK

Posted Fri May 19, 2017 6:54 AM

 

> Although there's no dedicated stack pointer in the TMS 9900 (there is in the TMS 99000) ...

 

Is there really on the 99000? Couldn't see any mention of it in the (preliminary) data manual. Have you found some details somewhere?



#19 apersson850 OFFLINE  

apersson850

    Moonsweeper

  • 374 posts

Posted Fri May 19, 2017 12:03 PM

The TMS 99105 has stack support. So has for example the TI 990/12 minicomputer.

When I wrote TMS 99000 it was in a generic sense, since I didn't remember the detailed differences between the different introduced and planned versions of the TMS 99000 CPU.


Edited by apersson850, Fri May 19, 2017 12:05 PM.


#20 mizapf OFFLINE  

mizapf

    River Patroller

  • 2,474 posts
  • Location:Germany

Posted Fri May 19, 2017 3:27 PM

BLSK - Branch immediate and push link to stack

 

Description:

BLSK Rx,Addr

 

has this effect: DECT Rx, (PC)+4 -> *Rx, Addr -> (PC)

 

(PC) = Program counter
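Read as a model in standard Forth, that description looks roughly like the following. LINK-STACK, R10, PC, BLSK and RET are names invented for the sketch; in particular RET is just the matching pop of the saved link, not an instruction taken from the data manual.

```forth
CREATE LINK-STACK  8 CELLS ALLOT
VARIABLE R10   LINK-STACK 8 CELLS +  R10 !   \ register used as stack pointer
VARIABLE PC                                  \ modelled program counter

: BLSK ( addr -- )
   -1 CELLS R10 +!                   \ DECT Rx
   PC @ 4 +  R10 @ !                 \ (PC)+4 -> *Rx  (address after the BLSK)
   PC ! ;                            \ Addr -> (PC)

: RET ( -- )                         \ pop the saved link back into PC
   R10 @ @  PC !
   1 CELLS R10 +! ;

100 PC !   500 BLSK
PC @ .                               \ now running at 500
RET  PC @ .                          \ back at 104, the instruction after the call
```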



#21 Stuart OFFLINE  

Stuart

    Dragonstomper

  • 675 posts
  • Location:Southampton, UK

Posted Fri May 19, 2017 4:23 PM

Thanks chaps. I can see it in the data manual, now I know what to look for! So it looks like Rx is the register you're using for the stack pointer, it's pushing the address of the instruction following the BLSK onto the stack, and branching to Addr?



#22 apersson850 OFFLINE  

apersson850

    Moonsweeper

  • 374 posts

Posted Sat May 20, 2017 2:35 AM

Yes. You assign a register as a stack pointer. I usually use R10, and that's also what the PME uses.



#23 TheBF OFFLINE  

TheBF

    Moonsweeper

  • Topic Starter
  • 281 posts
  • Location:The Great White North

Posted Sat May 20, 2017 9:20 AM

I thought it would be fun to try to run 6 tasks on the poor old TI-99.

It actually worked.

 

I created some tools to monitor the tasks.  .TASKS command shows the Process ID, the Parameter stack, the return stack, the instruction pointer of the task and the awake/asleep status.

 

I am closer to improving KSCAN for multi-tasking, but the code took me past the 8K file size limit of my cross-compiler, so that's the next improvement.

 

The MULTICAM EA5 program is on GITHUB and so is the MTASKDEM.FTH file in the \LIB folder on GITHUB.

 

https://github.com/bfox9900/CAMEL99

 

So you can play with this if you are interested.

 

I will get around to doing a demo version of the DENILE program one day.  It really changes the way you do games when you can assign moving pieces to a separate task.

 

A movie of the 6 tasks test is attached.

Attached Files


Edited by TheBF, Sat May 20, 2017 9:28 AM.


#24 TheBF OFFLINE  

TheBF

    Moonsweeper

  • Topic Starter
  • 281 posts
  • Location:The Great White North

Posted Sun May 21, 2017 11:19 AM

I have found a much better timer method on the TI-99 for a cooperative multi-tasker.

 

I was using the 9901 timer before and waiting for it to count down for 1 milli-second.

That is not ideal because the entire system comes to a stop while waiting.

 

The ideal is you run the switcher routine whenever you have to wait, but TMS9900 is too slow to jump to another task and get back in 1mS.

 

So....

 

Since the interrupts are spinning a number at >8379 all the time, I can get an accurate 1/60 of a second measurement any time I want.

And in 1/60 of a second I can service some other tasks.

 

It's faster in a 16 bit Forth to read a 16-bit integer, so we will read >8378, which gives us the value at >8379 (in the low byte) without playing with the bytes.

Much faster.

 

So the method is:

  1. Read the value at >8378 and keep a copy on the stack
  2. Switch to another task
  3. When you come back, read >8378 again
  4. If it still equals the copy, go to 2; otherwise the tick has passed

Here is how it looks in Forth

: TMR@     ( -- n ) 8378 @ ;   \ read ISR counter (8378 is hex, so compile this in HEX)

: 1/60    ( -- )    
          TMR@          \ read the timer onto stack
          BEGIN            
            PAUSE       \ service other tasks.
            DUP TMR@    \ copy 1st reading & get a new one
          <> UNTIL      \ loop until not equal
          DROP ;        \ DROP initial timer reading

The overall effect is a much snappier system, because it's not wasting time while doing timing delays.
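Longer delays can be built by stacking these 1/60-second waits. The sketch below is self-contained so it runs on a desktop Forth: TICK-COUNT and the PAUSE shown here are stand-ins (this PAUSE pretends the interrupt fires on every third call), and TICKS is a made-up name. On the TI-99 you would keep the real TMR@ and PAUSE and only add TICKS.

```forth
VARIABLE TICK-COUNT                  \ stands in for the ISR counter
VARIABLE PAUSES                      \ how many times we yielded
: PAUSE ( -- )                       \ stand-in: "other tasks" run here
   1 PAUSES +!
   PAUSES @ 3 MOD 0= IF  1 TICK-COUNT +!  THEN ;

: TMR@ ( -- n )  TICK-COUNT @ ;
: 1/60 ( -- )                        \ same loop as above
   TMR@  BEGIN  PAUSE  DUP TMR@ <>  UNTIL  DROP ;

: TICKS ( n -- )  0 ?DO  1/60  LOOP ;   \ wait n ticks, yielding all the while

0 TICK-COUNT !  0 PAUSES !
5 TICKS
TICK-COUNT @ .  PAUSES @ .           \ 5 ticks passed, 15 yields along the way
```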


Edited by TheBF, Sun May 21, 2017 11:19 AM.


#25 TheBF OFFLINE  

TheBF

    Moonsweeper

  • Topic Starter
  • 281 posts
  • Location:The Great White North

Posted Sun May 21, 2017 1:06 PM

After the success with the ISR timer, which is 16.6mS,  I tried the 9901 timer with a 10mS countdown and it worked even better.

The key is to have something counting for the system while the programs are doing their own thing cooperatively.

 

So TMR@ reads the hardware in the 9901 instead of reading the memory address.  I get finer resolution without stopping everything just to read the TIMER.





