Assembly on the 99/4A


613 replies to this topic

#601 sometimes99er

    River Patroller

  • 3,963 posts
  • Location:Denmark

Posted Fri Oct 27, 2017 12:01 AM

Let's assume I have r1 pointing to a byte in memory, and I want to test it for being zero (the byte, that is!). What would be the shortest (fastest) way of doing that?


	movb	*r1,r2		; 22 cycles - requires r2
	movb	*r1,*r1		; 26 "
	cb	*r1,r2		; 22 "      - requires r2 to contain value
	cb	*r1,@h0000	; 38 "      - requires address to contain value
h0000	data	>0000
 
 

Way back when I first wrote TI-99/4 assembly programs, I loved the capitals; right now I write my 9900 programs in lowercase!


Me too. :)

#602 matthew180

    River Patroller

  • Topic Starter
  • 2,408 posts
  • Location:Castaic, California

Posted Fri Oct 27, 2017 12:35 AM

I find that comparing to zero is not something I do specifically, so it really depends on the circumstances.  For example, decrementing for a loop, the compare to zero is done for you:

  LI R2,32
LOOP:
  * do stuff
  DEC R2
  JNE LOOP
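To make the point concrete, here is a small Python model (not 9900 code; the function name is made up) of the LI/DEC/JNE loop. It relies only on DEC setting the EQ status bit from its result, so JNE needs no separate compare. It also shows an edge case worth knowing: starting from 0, DEC wraps to >FFFF first, so the body runs 65536 times.

```python
def dec_jne_iterations(start: int) -> int:
    """Count loop-body runs for: LI R2,start / LOOP: ... / DEC R2 / JNE LOOP."""
    r2 = start & 0xFFFF          # registers are 16 bits wide
    iterations = 0
    while True:
        iterations += 1          # "* do stuff"
        r2 = (r2 - 1) & 0xFFFF   # DEC R2 wraps from >0000 to >FFFF
        eq = r2 == 0             # DEC compares its result to zero and sets EQ
        if eq:                   # JNE LOOP is not taken once EQ is set
            return iterations


print(dec_jne_iterations(32))    # 32
print(dec_jne_iterations(0))     # 65536: starting at 0 wraps around first
```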

For your example where you are using a register as a pointer, and maybe dealing with a null terminated string, you could work the test into the loop itself and avoid a specific test for zero.  Something like this maybe:

MSG  TEXT 'HELLO'
     BYTE 0
BUFF BSS 40

  LI R1,MSG
  LI R2,BUFF
* Copy bytes up to and including the 0; MOVB sets EQ when the byte is 0
LOOP:
  MOVB *R1+,*R2+
  JNE  LOOP

Of course this assumes the destination is expecting that final 0 (zero) byte.  But this is just an example to demonstrate the idea that you can probably build the test into your loop without having to perform a specific compare for zero.  If you do need to compare specifically, the summary in sometimes99er's post above shows the fastest options.  Also keep in mind that the 9900 uses a RAM-based register-file, so even something like MOV *R1,*R2 is going to cause memory access (I get the feeling you know this).
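The copy loop can only terminate if both pointers advance on every pass, which is what the auto-increment form MOVB *R1+,*R2+ does. A rough Python model of that form, assuming a flat byte array for memory and arbitrary addresses (the function name is hypothetical), shows that the terminating 0 is copied before the loop exits:

```python
def copy_until_zero(mem: bytearray, src: int, dst: int) -> int:
    """Model of: LOOP MOVB *R1+,*R2+ / JNE LOOP. Returns the number of bytes copied."""
    count = 0
    while True:
        byte = mem[src]          # read the source byte; R1 auto-increments by 1
        src += 1
        mem[dst] = byte          # write the destination byte; R2 auto-increments by 1
        dst += 1
        count += 1
        if byte == 0:            # MOVB compares the moved byte to zero and sets EQ,
            return count         # so JNE falls through right after copying the 0


mem = bytearray(64)
mem[0:6] = b"HELLO\x00"          # arbitrary layout: string at 0, buffer at 16
print(copy_until_zero(mem, 0, 16))   # 6
print(bytes(mem[16:22]))             # b'HELLO\x00'
```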

 

I think the 9900 was heavily influenced by the PDP-11.


Edited by matthew180, Fri Oct 27, 2017 12:36 AM.


#603 mizapf

    River Patroller

  • 2,588 posts
  • Location:Germany

Posted Fri Oct 27, 2017 1:28 AM

Furthermore, (r1) means *R1; it's just a different notation.
Way back when I first wrote TI-99/4 assembly programs, I loved the capitals; right now I write my 9900 programs in lowercase!

which means the standard assembler will run into trouble, won't it?

 

Although I'd prefer lowercase, as in all other programming languages, I'm using capitals for TMS9900 assembler because the language is defined in capitals in the manual. :)



#604 It's Sparky

    Combat Commando

  • 5 posts
  • Location:Nijmegen, the Netherlands

Posted Fri Oct 27, 2017 5:31 AM

which means the standard assembler will run into trouble, won't it?

 

Although I'd prefer lowercase, as in all other programming languages, I'm using capitals for TMS9900 assembler because the language is defined in capitals in the manual. :)

 

 

Yes, I think so. That is why I use my own assembler! I needed an assembler/linker/loader toolset that can do split I/D for a future TMS99105 project (this will increase your effective memory to 128K, since instructions and data are separated).

While creating it, I added some nice features, like local (reusable) labels to prevent namespace clutter:

    li r1,buffer
1:  movb *r1+,r2
    jne 1b
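As an illustration of how an assembler might resolve such labels, here is a Python sketch. It assumes GNU-as-style semantics (Nb is the nearest preceding N:, Nf the nearest following one), which the 1b in the example suggests but which may not match It's Sparky's assembler exactly; the function and the generated _L1_1 names are hypothetical.

```python
import re

DEF = re.compile(r"^(\s*)(\d+):")             # numeric label definition, e.g. "1:"
REF = re.compile(r"(?<![>\w])(\d+)([bf])\b")  # "1b"/"1f" reference; ">1F" hex is skipped


def resolve_local_labels(lines):
    """Rename numeric local labels to unique names and rewrite Nb/Nf references."""
    defs = []                                 # (line_index, number, unique_name)
    seen = {}
    for i, line in enumerate(lines):
        m = DEF.match(line)
        if m:
            num = m.group(2)
            seen[num] = seen.get(num, 0) + 1
            defs.append((i, num, f"_L{num}_{seen[num]}"))
    unique_at = {i: name for i, _, name in defs}

    def target(i, num, direction):
        if direction == "b":                  # nearest definition on this line or earlier
            hits = [name for j, n, name in defs if n == num and j <= i]
            return hits[-1]
        hits = [name for j, n, name in defs if n == num and j > i]  # nearest later one
        return hits[0]                        # IndexError means an undefined local label

    out = []
    for i, line in enumerate(lines):
        if i in unique_at:
            line = DEF.sub(lambda m: m.group(1) + unique_at[i] + ":", line, count=1)
        line = REF.sub(lambda m: target(i, m.group(1), m.group(2)), line)
        out.append(line)
    return out


src = ["    li r1,buffer",
       "1:  movb *r1+,r2",
       "    jne 1b"]
for line in resolve_local_labels(src):
    print(line)
```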

Automatic byte and word literal generation (done by the loader in text or data segment):

space  equ   ' '
...
       cb    *r1,=b(space)

Literals with the same value will be mapped to the same address. 
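One way a loader could implement such shared literals is a pool keyed by (kind, value). This Python sketch is an assumption, not It's Sparky's code: the class, the 'b'/'w' kinds, and the choice to keep byte and word literals separate are made up for illustration. Word literals are aligned to even addresses, and words are stored big-endian as on the 9900.

```python
class LiteralPool:
    """Hypothetical literal pool: identical (kind, value) literals share one address."""

    def __init__(self, base: int):
        self.base = base
        self.next = base
        self.addr = {}                       # (kind, value) -> address

    def lookup(self, kind: str, value: int) -> int:
        """kind is 'b' for =b(...) byte literals, 'w' for word literals."""
        key = (kind, value)
        if key not in self.addr:             # first use: allocate space
            if kind == "w" and self.next % 2:
                self.next += 1               # word literals must sit on even addresses
            self.addr[key] = self.next
            self.next += 1 if kind == "b" else 2
        return self.addr[key]                # later uses: the same address

    def image(self) -> bytes:
        """Bytes to place at self.base (the 9900 stores words big-endian)."""
        data = bytearray(self.next - self.base)
        for (kind, value), a in self.addr.items():
            off = a - self.base
            if kind == "b":
                data[off] = value & 0xFF
            else:
                data[off:off + 2] = (value & 0xFFFF).to_bytes(2, "big")
        return bytes(data)


pool = LiteralPool(0x2000)
space = pool.lookup("b", ord(" "))     # cb *r1,=b(space)
again = pool.lookup("b", ord(" "))     # a second =b(space) elsewhere
word = pool.lookup("w", 0x1234)
print(hex(space), hex(again), hex(word))   # 0x2000 0x2000 0x2002
print(pool.image().hex())                  # 20001234
```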

 

Automatic long/short jump expansion:

    c r1,r2
    bjne equal

This will automatically expand to a (short) jne, or a skipping jeq and a branch if the destination label is out of reach.
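The decision behind such an expansion can be sketched in Python. It relies on the 9900 jump encoding (a signed 8-bit word displacement relative to the following instruction, so targets from 256 bytes back to 254 bytes ahead) and on B @label taking 2 words. The function name, the output strings, and the three-instruction fallback for conditions without a single-instruction inverse (for example JGT and JLT) are assumptions, not a description of It's Sparky's assembler.

```python
# Condition pairs where one 9900 jump tests exactly the opposite condition.
INVERSE = {"jeq": "jne", "jne": "jeq", "jh": "jle", "jle": "jh",
           "jl": "jhe", "jhe": "jl", "joc": "jnc", "jnc": "joc"}


def expand_branch(jump: str, addr: int, target: int) -> list[str]:
    """Pick code for a conditional 'b<cond>' pseudo-jump at addr (e.g. bjne -> jne)."""
    offset = target - (addr + 2)            # displacement is relative to the next word
    if offset % 2 == 0 and -256 <= offset <= 254:
        return [f"{jump} >{target:04X}"]    # fits the signed 8-bit word displacement
    inverse = INVERSE.get(jump)
    if inverse:                             # inverted jump skips the 2-word branch
        return [f"{inverse} $+6", f"b @>{target:04X}"]
    # no single inverse: jump to the branch, otherwise jmp over it
    return [f"{jump} $+4", "jmp $+6", f"b @>{target:04X}"]


print(expand_branch("jne", 0xA000, 0xA100))   # ['jne >A100']
print(expand_branch("jne", 0xA000, 0xA102))   # ['jeq $+6', 'b @>A102']
print(expand_branch("jgt", 0xA000, 0xB000))   # ['jgt $+4', 'jmp $+6', 'b @>B000']
```

Because expanding one jump shifts everything after it, a real assembler has to repeat this pass until no jump changes size.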

 

 

My loader generates mini-memory.ram files that are compatible with ti99sim, so I can run my programs directly in the simulator.



#605 OLD CS1

    River Patroller

  • 3,940 posts
  • Technology Samurai
  • Location:Tallahassee, FL

Posted Fri Oct 27, 2017 5:32 AM

which means the standard assembler will run into trouble, won't it?

 

Although I'd prefer lowercase, as in all other programming languages, I'm using capitals for TMS9900 assembler because the language is defined in capitals in the manual. :)

 

I ran into this when programming the 6502. Some sources are lowercase and some are uppercase. What I tend to do, and this has migrated back into my 9900 programming, is use a different case depending on the situation. For instance, if I am reusing code, my own or someone else's, I preserve its case and use the opposite case for whatever changes or additions I make. In 9900 code I also use a different comment delimiter, so if the source I am using has asterisks for comments I use semicolons, and vice versa.



#606 It's Sparky

    Combat Commando

  • 5 posts
  • Location:Nijmegen, the Netherlands

Posted Fri Oct 27, 2017 5:38 AM

...

I think the 9900 was heavily influenced by the PDP-11.

 

Yes, I think you are right. Increasing the number of registers (from 8 on the PDP-11 to 16 on the 9900) costs a lot of instruction-encoding space: both machines spend 6 bits per operand, but the 9900 needs 4 of them for the register number, leaving 2 bits (4 addressing modes) instead of the PDP-11's 3 bits (8 modes). So the TI developers dropped the auto-decrement indirect addressing mode and the nice orthogonal immediate addressing mode (and created separate instructions such as LI and AI for that). Maybe the TST(B) instruction was also dropped to make things fit. Still, I feel uncomfortable using an instruction that writes back to memory without a real reason. Maybe that is just me!
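The encoding arithmetic can be checked directly. Both machines use 6 bits per operand: the PDP-11 splits them into a 3-bit mode and a 3-bit register (8 modes, 8 registers), while the 9900's two-operand format uses a 2-bit mode (Ts/Td) and a 4-bit register (4 modes, 16 registers). A small Python decoder for that 9900 format (the function name is hypothetical):

```python
# 9900 two-operand (Format I) word, most significant bit first:
# opcode(3) B(1) Td(2) D(4) Ts(2) S(4)
FORMAT_I = {0b010: "SZC", 0b011: "S", 0b100: "C", 0b101: "A",
            0b110: "MOV", 0b111: "SOC"}
MODE = {0: "Rn", 1: "*Rn", 2: "@addr / @addr(Rn)", 3: "*Rn+"}


def decode_format_i(word: int):
    """Split a 9900 two-operand instruction word into name, source and destination."""
    name = FORMAT_I[word >> 13] + ("B" if (word >> 12) & 1 else "")
    td, d = (word >> 10) & 0b11, (word >> 6) & 0b1111   # destination: 2-bit mode, 4-bit reg
    ts, s = (word >> 4) & 0b11, word & 0b1111           # source: 2-bit mode, 4-bit reg
    return name, (MODE[ts], s), (MODE[td], d)


print(decode_format_i(0xDCB1))   # ('MOVB', ('*Rn+', 1), ('*Rn+', 2)), i.e. MOVB *R1+,*R2+
print(decode_format_i(0x9091))   # ('CB', ('*Rn', 1), ('Rn', 2)), i.e. CB *R1,R2
```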


Edited by It's Sparky, Fri Oct 27, 2017 5:52 AM.


#607 Asmusr

    River Patroller

  • 2,466 posts
  • Location:Denmark

Posted Fri Oct 27, 2017 8:53 AM

which means the standard assembler will run into trouble, won't it?

 

Although I'd prefer lowercase, as in all other programming languages, I'm using capitals for TMS9900 assembler because the language is defined in capitals in the manual. :)

 

I had faithfully been using uppercase and short labels for years, but with the Knight Lore project I finally changed to lowercase and long labels. The code is now easier to type, and long labels make it easier to structure and understand. Even when I used uppercase I always used cross-platform assemblers: Asm994A to begin with and later xas99. I have always used conditional assembly directives, e.g. ifdef, so the code has never assembled with E/A anyway.



#608 matthew180

    River Patroller

  • Topic Starter
  • 2,408 posts
  • Location:Castaic, California

Posted Fri Oct 27, 2017 11:58 AM

 

... Still I feel uncomfortable to use an instruction that writes back to memory without a real reason. Maybe that is just me!

 

Well, sadly, you will get that a lot on the 9900. Then there is the whole read-before-write issue, which exists because TI left out upper-byte/lower-byte select pins.
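A minimal Python model of why the read is needed for byte writes: with no byte-select pins, every access moves a whole 16-bit word, so storing one byte means reading the word, merging the byte, and writing the word back. The function name and the access log are illustrative only; the real 9900 also reads the destination of word instructions such as MOV before writing it.

```python
def store_byte(mem: dict, addr: int, value: int, log: list) -> None:
    """Model a byte store on a bus that only moves 16-bit words."""
    word_addr = addr & ~1                  # the address LSB only selects the byte
    word = mem[word_addr]
    log.append(("read", word_addr))        # read-before-write of the whole word
    if addr & 1:                           # odd address: low-order byte (big-endian)
        word = (word & 0xFF00) | (value & 0xFF)
    else:                                  # even address: high-order byte
        word = (word & 0x00FF) | ((value & 0xFF) << 8)
    mem[word_addr] = word
    log.append(("write", word_addr))       # the merged word goes back to memory


mem, log = {0x2000: 0x1234}, []
store_byte(mem, 0x2001, 0xAB, log)
print(hex(mem[0x2000]), log)   # 0x12ab [('read', 8192), ('write', 8192)]
```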

 

There are two instructions that don't write back to memory, yet can be used to test for 0:

 

ABS

CB

 

The problem with CB is that it requires a source to compare against, which means at the very least another memory read.  However, if you dedicate two registers to always be 0 and 1, then it is probably one of the fastest methods.

 

The ABS instruction skips the write to memory if the most significant bit of the original value was already zero. However, ABS is a word instruction and tests all 16 bits, so as a byte test it only works if the LSB (the other byte of the word) is already zero, and the MSB has to be between 0 and 127 to avoid the write to memory. Finally, ABS sets the status bits by comparing the value to zero "before" it is converted to a positive value.
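The difference between the two tests can be modelled in Python (hypothetical helper names; memory is a dict of big-endian 16-bit words, addresses are arbitrary). CB against a register that holds 0 compares just the byte, using the register's high byte, and never writes; ABS looks at the whole word and writes back when the word is negative:

```python
def byte_at(mem: dict, addr: int) -> int:
    word = mem[addr & ~1]
    return word & 0xFF if addr & 1 else word >> 8    # even address = high byte


def cb_test(mem: dict, addr: int, zero_reg: int = 0x0000):
    """CB *R1,Rz: compares the byte with Rz's high byte. Returns (is_zero, wrote)."""
    return byte_at(mem, addr) == (zero_reg >> 8), False   # CB never writes


def abs_test(mem: dict, addr: int):
    """ABS *R1: status from the original word; write-back only if it was negative."""
    word_addr = addr & ~1                  # word instructions ignore the address LSB
    value = mem[word_addr]
    is_zero = value == 0                   # EQ reflects all 16 bits
    if value & 0x8000:                     # negative: memory gets the absolute value
        mem[word_addr] = (0x10000 - value) & 0xFFFF
        return is_zero, True
    return is_zero, False


mem = {0x2000: 0x0000, 0x2002: 0x0041, 0x2004: 0x8100}
for addr in (0x2000, 0x2002, 0x2004):
    print(hex(addr), cb_test(mem, addr), abs_test(mem, addr))
print(hex(mem[0x2004]))   # 0x7f00: ABS modified the data
```

At >2002 the byte is zero but ABS reports "not zero" because the low byte is >41, and at >2004 ABS rewrites the word.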



#609 matthew180

    River Patroller

  • Topic Starter
  • 2,408 posts
  • Location:Castaic, California

Posted Fri Oct 27, 2017 12:03 PM

 

... I have always used conditional assembly instructions, e.g ifdef, so the code has never compiled on E/A anyway.

 

I quit worrying about E/A compatibility a long time ago.  It was fine BITD, but the pain of writing code on the console and the confines of the E/A are nothing I miss or care to relive.



#610 It's Sparky

    Combat Commando

  • 5 posts
  • Location:Nijmegen, the Netherlands

Posted Sat Oct 28, 2017 3:48 PM

Hope you guys don't mind me digging up an old but interesting post by matthew.

 

...
However, that is not what happens. For the B instruction, indirect addressing will use the *value* of a register as the memory address to branch to. Lottrup's book actually has the best explanation I could find:


"The line B *R11, which returned several example programs to EASY BUG, meant to branch to the memory location addressed by the value in register 11."

In other words, use the *value* of R11 *as* the address to branch to, and not as the address of where to look for an address to branch to.

 

Just as matthew describes, the meaning of the classic B *R11 instruction is counterintuitive. Funny detail: B R11 is a completely valid instruction that will 'jump into your workspace, executing your registers (starting at R11)', which obviously has limited usability. Maybe the TI developers realised the need for a 'real' indirect jump when they created the BIND (Branch INDirect) instruction on the 99000. This powerful instruction can be used to jump to a routine from a jump table (using indexed addressing) or even to return from a subroutine call where the return address is on a 'stack': BIND *R10+

The introduction of BIND was paired with the BLSK (Branch and Link StacK) instruction. BLSK R10 is comparable to the standard BL instruction, but instead of storing the return address in R11, it stores it at the address pointed to by R10 (the 'stack' pointer), after pre-decrementing R10 by 2. A perfect pair of instructions when you want to implement stack-like behaviour.
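The difference between B and BIND comes down to one extra memory read, which a toy Python model makes explicit. The class and method names are hypothetical, the addresses are arbitrary, and BLSK follows the description above (pre-decrement R10 by 2, store the return address there, branch):

```python
class Cpu99k:
    """Toy model of B, BIND and BLSK; workspace registers live in RAM at WP."""

    def __init__(self, wp: int):
        self.wp, self.pc, self.mem = wp, 0, {}

    def reg_addr(self, n: int) -> int:
        return self.wp + 2 * n

    def r(self, n: int) -> int:
        return self.mem.get(self.reg_addr(n), 0)

    def ea(self, mode: str, n: int) -> int:
        """Effective address for the register-based addressing modes."""
        if mode == "Rn":                     # the register itself, i.e. its RAM address
            return self.reg_addr(n)
        if mode not in ("*Rn", "*Rn+"):
            raise ValueError(mode)
        value = self.r(n)                    # indirect: the register's value
        if mode == "*Rn+":                   # word auto-increment after use
            self.mem[self.reg_addr(n)] = (value + 2) & 0xFFFF
        return value

    def b(self, mode: str, n: int) -> None:
        self.pc = self.ea(mode, n)                   # B: PC <- effective address

    def bind(self, mode: str, n: int) -> None:
        self.pc = self.mem.get(self.ea(mode, n), 0)  # BIND: PC <- word AT that address

    def blsk(self, n: int, target: int, return_pc: int) -> None:
        """BLSK as described in the thread: pre-decrement Rn by 2, push, branch."""
        sp = (self.r(n) - 2) & 0xFFFF
        self.mem[self.reg_addr(n)] = sp
        self.mem[sp] = return_pc
        self.pc = target


cpu = Cpu99k(wp=0x8300)
cpu.mem[cpu.reg_addr(11)] = 0xA010   # R11 holds a return address
cpu.b("*Rn", 11)
print(hex(cpu.pc))                   # 0xa010: B *R11 branches to R11's value
cpu.b("Rn", 11)
print(hex(cpu.pc))                   # 0x8316: B R11 branches into the workspace
cpu.bind("Rn", 11)
print(hex(cpu.pc))                   # 0xa010: BIND R11 does what B *R11 does

cpu.mem[cpu.reg_addr(10)] = 0x3000   # R10 = stack pointer (arbitrary address)
cpu.blsk(10, target=0xB000, return_pc=0xA004)
cpu.bind("*Rn+", 10)                 # return: pop the address BLSK pushed
print(hex(cpu.pc), hex(cpu.r(10)))   # 0xa004 0x3000
```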

 

Maybe it would be a cool idea to ramble about enhancing the 9900. Which instructions would you like to see? Maybe a super-fast on-chip register set? Of course it depends on your taste or your way of programming. I still love the original instruction set, don't get me wrong!

Franc



#611 It's Sparky

    Combat Commando

  • 5 posts
  • Location:Nijmegen, the Netherlands

Posted Wed Nov 8, 2017 3:01 PM

While writing some code to demonstrate 9900 assembly to my students I ran into the situation where my code needs to be readable and understandable. One of the easiest ways to accomplish this is to use functions (routines/subroutines) to break up and re-use code.

Of course, the use of functions involves a choice in calling conventions. On the 9900 we have several possibilities, for example those based on BLWP/RTWP or BL/B *R11. Apart from calling to and returning from a routine a decision should be made about passing of parameters. Inspired by other architectures (for example Sun/Sparc) I created a calling convention which is easy to use and has a lot of advantages:

·      Recursion is possible

·      Routine and subroutine pass parameters through registers

·      Each incarnation has its own free registers, no more implicit saving/restoring registers

·      Each incarnation has a number of scratch registers

·      Calling sequence is just 2 words

·      No implicit stack administration needed in routines

The basic idea behind this convention is to use overlapping workspaces: the routine and the subroutine share half of their workspace. So there is a 'stack of workspaces' (growing from high addresses to low addresses). Each incarnation uses 16 bytes. Although this seems to be a lot, the total amount of memory used depends on the depth of nested subroutine calls, which in most applications is quite limited. The following table illustrates the idea:

 

[Image: scheme.png, table of the overlapping workspace layout]

 

When a function needs temporary register storage it can freely use R5, R6 and R7. However, their content will be destroyed as soon as the function calls another function. Parameters to the function are in R8 up to and including R12. Free (and persistent through calls) registers are R0 up to and including R4 which are also used to pass parameters to a subfunction.
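The register overlap follows from the 16-byte workspace offset: the callee's R8 sits at the same address as the caller's R0. A short Python sketch (names hypothetical) prints the mapping; the R13-R15 "linkage" role is inferred from the XOP1 handler in this post, which stores the caller's WP, PC and ST in the caller's R5-R7.

```python
FRAME = 16   # bytes the workspace moves down per call, per the convention above


def caller_register(callee_reg: int):
    """Caller register sharing memory with the callee's R<callee_reg>, or None."""
    offset = 2 * callee_reg - FRAME    # callee WP = caller WP - 16
    return offset // 2 if offset >= 0 else None


def role(reg: int) -> str:
    if reg <= 4:
        return "free, kept across calls; parameters for the next callee"
    if reg <= 7:
        return "scratch; overwritten when this routine calls another"
    if reg <= 12:
        return "parameters from the caller"
    return "linkage: caller's WP, PC, ST (read by RTWP)"


for n in range(16):
    shared = caller_register(n)
    where = f"= caller R{shared}" if shared is not None else "private"
    print(f"R{n:<2} {where:<13} {role(n)}")
```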

 

To call a function, I wrote some code that handles all the administration through the use of the XOP instruction. So my assembler will rewrite:

     call routine

to

     xop routine,1

This takes the same amount of instruction space as BL (2 words).

The routine itself will return to the caller using a standard RTWP instruction. Note that you can manipulate R15 before returning to signify an error condition (for example by setting the parity flag).

 

The XOP1 handler looks like this (a nice puzzle to see what is going on):

xop1:
      mov     r13,10(r13)
      mov     r14,12(r13)
      mov     r15,14(r13)
      mov     r11,r14
      ai      r13,-16
      rtwp

Funny that the ending rtwp is actually calling the routine!
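To follow the puzzle step by step, here is a Python model of the net effect of xop routine,1 plus the handler, and of the routine's closing RTWP. It is a sketch under stated assumptions: the class is hypothetical, the addresses are arbitrary, and the return address is passed in explicitly rather than taken from the XOP itself. Using R8 for a result is just for illustration.

```python
class Cpu:
    """Minimal 9900 state: WP, PC, ST plus word memory (registers live in RAM at WP)."""

    def __init__(self, wp: int, pc: int, st: int = 0):
        self.wp, self.pc, self.st = wp, pc, st
        self.mem = {}

    def reg(self, n: int) -> int:
        return self.mem.get(self.wp + 2 * n, 0)

    def set_reg(self, n: int, value: int) -> None:
        self.mem[self.wp + 2 * n] = value & 0xFFFF

    def call(self, routine: int, return_pc: int) -> None:
        """Net effect of 'xop routine,1' followed by the XOP1 handler."""
        caller_wp = self.wp                     # XOP puts this in the handler's R13
        self.mem[caller_wp + 10] = caller_wp    # mov r13,10(r13): caller R5 <- WP
        self.mem[caller_wp + 12] = return_pc    # mov r14,12(r13): caller R6 <- PC
        self.mem[caller_wp + 14] = self.st      # mov r15,14(r13): caller R7 <- ST
        # mov r11,r14 / ai r13,-16 / rtwp: the rtwp "returns" into the routine
        self.wp = (caller_wp - 16) & 0xFFFF
        self.pc = routine                       # XOP left the routine address in R11
        # ST is unchanged: rtwp reloads the caller's ST from the handler's R15

    def rtwp(self) -> None:
        """The routine returns: WP, PC, ST come from its R13, R14, R15."""
        self.wp, self.pc, self.st = self.reg(13), self.reg(14), self.reg(15)


cpu = Cpu(wp=0x3F00, pc=0xA000)
cpu.set_reg(0, 42)                          # caller passes a parameter in R0
cpu.call(routine=0xB000, return_pc=0xA004)
print(hex(cpu.wp), cpu.reg(8))              # 0x3ef0 42: the routine sees it in R8
cpu.set_reg(8, 99)                          # the routine writes its R8 (caller's R0)
cpu.rtwp()
print(hex(cpu.wp), hex(cpu.pc), cpu.reg(0)) # 0x3f00 0xa004 99
```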

 

Wonder what you think of this.


Edited by It's Sparky, Wed Nov 8, 2017 3:12 PM.


#612 apersson850

    Moonsweeper

  • 436 posts

Posted Wed Nov 8, 2017 4:06 PM

I assume you all know that the TMS 9900 microprocessor actually implements the instruction set of the TI-990/9 minicomputer? The TI-990 was the successor of the TI-980 (now that was unexpected...). Where TI got the inspiration for the changes made between the two I don't know, but they sure changed quite a bit. The TI-980 is a much more conventional CPU design.

 

The assembler supplied with the p-system for the 99/4A has more functionality than the scaled down TI-990 assembler we know as the E/A package.

 

Overlapping registers was the parameter passing scheme we selected for the DSK 86000 CPU, designed as a dedicated robotics control CPU back in the 80's.


Edited by apersson850, Wed Nov 8, 2017 4:08 PM.


#613 matthew180

    River Patroller

  • Topic Starter
  • 2,408 posts
  • Location:Castaic, California

Posted Sat Nov 11, 2017 8:14 PM

...

 

Wonder what you think of this.

 

My two cents:

 

Personally I'm a speed and memory-use freak, so to me it seems overly complicated. With a fixed number of possible parameters, it will be overkill for some calls and not enough for others. I think a valuable lesson you might impart to your students is that the right solution always depends on the system, language, and circumstances. On limited systems you are always close to the hardware and have very limited resources (small amounts of RAM, probably no virtual memory, probably no fast disk storage, etc.), so the solutions used in modern languages like C, C++, Java, etc. with large memory don't always work well on a computer like the 99/4A.

 

It seems that using the stack for parameters only would make better use of memory and be more flexible, i.e. if you only need to pass 1 or 2 parameters, then you only use memory for 1 or 2 parameters. IMO all variables should be stored in memory, and registers only used for temporary/immediate calculations. Following this idea means you don't have to worry about preserving registers between subroutine calls. Too many times I have seen programs that try to set up registers for specific uses throughout the program, and there is a lot of dancing around to keep registers intact.

 

I'm also not a fan of recursion; people try too hard to find ways to use it. Just use a loop.



#614 TheBF

    Moonsweeper

  • 371 posts
  • Location:The Great White North

Posted Sun Nov 12, 2017 8:50 AM

...

Wonder what you think of this.

 

 

I created a multi-tasking context switch using the RTWP instruction.  It's a pretty powerful instruction when used "backwards" this way.

 

Something that is worth considering for parameter passing is blending a stack with a register or two.

If you assign 1 register to the job of holding the TOP of stack like a little cache, then the stack becomes more efficient.

You can go with 2 cached registers for top and 2nd item, but that can result in more register pushing and popping than it's worth.
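A sketch of the top-of-stack caching idea in Python. The class is hypothetical, it only counts memory accesses, and it does no underflow checking. With the top item in a register, a binary operation such as add needs one memory read, whereas a stack kept entirely in memory needs two reads and a write for the same add.

```python
class TosStack:
    """Data stack with its top item cached in a 'register'; counts memory accesses."""

    def __init__(self):
        self.tos = 0          # cached top of stack (a CPU register in real code)
        self.depth = 0
        self.mem = []         # items below the top live in memory
        self.accesses = 0

    def push(self, value: int) -> None:
        if self.depth:                    # spill the old top to memory
            self.mem.append(self.tos)
            self.accesses += 1
        self.tos = value & 0xFFFF
        self.depth += 1

    def pop(self) -> int:
        value = self.tos
        self.depth -= 1                   # no underflow check in this sketch
        if self.depth:                    # refill the register from memory
            self.tos = self.mem.pop()
            self.accesses += 1
        return value

    def add(self) -> None:
        """Like Forth '+': one memory read; the result stays in the register."""
        self.tos = (self.mem.pop() + self.tos) & 0xFFFF
        self.accesses += 1
        self.depth -= 1


s = TosStack()
s.push(2)
s.push(3)
s.add()
print(s.pop(), s.accesses)   # 5 2
```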

 

Another method that I have not fully explored would use a stack but then exploit BLWP to move the workspace to the top of the stack space.

 

This would let you push parameters onto a stack as outputs of a routine for example and then process them with register instructions.

I have not worked out the dynamics of this in detail, but it is something the 9900 can do. I believe it would require the stack to grow upward, so that you can preserve R13, R14 and R15 above the stack.

 

I must confess that I find overlapping register sets more complicated to think about than a stack, but that could be an Intel bias. :-)





