matthew180

Assembly on the 99/4A


I completely forgot about the X instruction. Never used that one.

 

Hmm, I guess it could be used to write a compact "push registers" subroutine.

I seem to recall that the X instruction is rather slow?

But what if it resides in scratch-pad memory?

Then again, you could write some self-modifying code there :roll:



 

It's not slow. It's just that there is a cost. Clearly, with X the cost is the time to execute X plus the instruction you are referencing.

 

It's a Format VI instruction, so all of the following addressing modes are allowed:

 

  • X Rx
  • X *Rx
  • X @addr
  • X @addr(Rx)

 

Making X a seriously powerful instruction, and well worth the cost IMHO :)
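One classic use is executing an instruction picked at run time. Here's an untested sketch — the labels are mine, and >0580 / >0600 are the opcodes for INC R0 / DEC R0 (double-check them against the data manual before trusting me):

```
* Untested sketch: run-time dispatch of a single-word instruction via X
ACTTAB DATA >0580             * INC R0
       DATA >0600             * DEC R0
* ...
       MOV  @CHOICE,R2        * byte offset of the chosen action: 0 or 2
       X    @ACTTAB(R2)       * execute the selected instruction in place
```

As I recall, if the executed instruction needs additional operand words, those are fetched from the words following the X instruction itself, so single-word instructions are the safe case.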

 

Mark


So... Let's say I have a spreadsheet with 8 values in different columns and 10 rows.... This could be created abstractly in memory, could it not? Then you would basically be able to modify the values with one another by say "adding row 1 column 3 by row 4 column 7"..... Similarly to how the map data is held in memory and accessed by the scroll routine....

 

Is this thinking too abstractly--- and should I think more along the lines of storing variables which can be plugged into registers and AI or MPY? I am specifically thinking about my statistic spreadsheet for the enemies in my game and how I could best use it all.



I guess you're in fact being abstract when you talk about columns and rows. The implementation in memory is a lower level, though still a bit abstract. We're using things like decimals, hexadecimals, symbols and mnemonics, which the assembler understands but the CPU doesn't. That's why we assemble our source into binary/machine code (or object code).

 

I think it's very common to use arrays and multidimensional arrays, almost as spreadsheets and even with formulas, formats and expressions if one really wanted, but usually an array stores a single type of information (all strings, or all bytes, or all pointers, or something). It's easy to go from a multidimensional array to a memory layout. Didn't Matthew explain that?

Edited by sometimes99er


Matthew explained a lot that has since been bleached from my mind by the sun and late nights. :( Sadly, I have had very little time and/or inspiration to do much with assembly recently, and I'm learning that knowledge DOES seep out and escape if you don't work on it and maintain it.... I'm re-reading this thread and having a blast doing it. I miss Mr. Matthew. :)


Actually, I just have too many projects in the fire at once, so I'm trying to focus on a few and get them done so I can start paying attention to the others. I'm still lurking, but the technical conversations seem to have died down lately. Probably just the summer months when everyone gets busy with the non-virtual (i.e. real life) activities.

 

As for losing what you don't use: absolutely you will forget the details! When I start coding in 9900 assembly after not using it for a while (even as short a time as a few weeks), I spend a lot of time looking up instruction details, memory addresses, etc. Computers are all about details and being exact, things our brains are not necessarily so good at, especially after taking a break from them. Most things in life are like that I think, but some disciplines expose this human weakness better than others. I don't think anyone ever forgets how to ride a bicycle, but I promise everyone has forgotten which memory address is the VDP read and which is the VDP write.

 

I have a really great book called The 8088 Project Book, in which the author takes you through building a pretty complete 8088-based computer on solderless breadboards. The best aspects of the book are that it is well written, you learn a lot about how computers go together, and the author scatters "rules of life and design" throughout the book that are great. A few of my favorites are:

 

When selecting parts: "If you can't afford to blow it up, you can't afford to use it."

 

Discussing hardware details: "Don't waste brain space on stuff you can look up somewhere."

 

So... Let's say I have a spreadsheet with 8 values in different columns and 10 rows.... This could be created abstractly in memory, could it not? Then you would basically be able to modify the values with one another by say "adding row 1 column 3 by row 4 column 7"

 

How you store your data depends on a lot of factors. Remember, the first Law of Life and Design:

 

"Know what you want to do before you try to do it."

 

What does the data represent, what will it be used for, and what do you need to do to it? Being abstract is okay when you are coming up with the overall structure and design, but at some point you have to get down to the details and implement things on the computer. At that point you have to have already worked out the details about exactly how things need to work. Also keep in mind that the way you might solve the problem on paper or in a spreadsheet is not necessarily the same way you would do it in a given programming language. The most important thing is to know your data and know your tools.

 

Matthew


I was referring specifically to my monster attribute spreadsheet.... If I need to take column 3, row 2 (the HP of a skeleton) and load it into a register to be acted on during a battle sequence, I was thinking that a table of attributes would be accessible by a simple check of memory at the necessary location. Setting up the necessary data is a bit curious to me.. I'm still reading this thread all the way through--- the answers may be here already. :)


Well, the answer is "it depends". Where is the skeleton data stored *before* you get into combat? And if you flee instead of killing the skeleton, where is the skeleton's data stored *after* combat? How are you generating the monsters in the game? The Berly thread is a better place to work out those details, at which point I could help with the data manipulation here.

 

Typically you would have a set of attributes for each object in the game which defines what you can do with it. Since the 99/4A does not have enough memory to have all the objects in RAM at the same time, nor enough CPU power to cycle through all the objects a few times a second (which makes a very dynamic world), you will probably be loading and storing object data to and from disk as the player moves to various parts of the world.

 

If I were writing a CRPG, for combat I think I would use a structure of data for the mobs (monsters) and players. It might look something like (using a C-ish pseudo code):

structure mob
begin
   hitpts   data    * total hit points
   damage   data    * damage taken, dead when equal to hitpts
   armor    data    * armor class, used in mob hit calc
   weapon   data    * damage done by current weapon
end

This is a very minimal set of data for an RPG combat system, but it is only to demonstrate the use of a structure. So you would have this "set" of data, i.e. 1 structure for each mob. You would also have a similar structure for the player(s), but those would probably contain a lot more "fields" (each piece of data in the structure).

 

In a higher level programming language, the language itself would give you a way to define variables of the structure itself, as well as access the fields. Something like this:

   skeleton mob    * Make a variable called "skeleton" of type "mob"
   level    data   * Current level
.
.
.
   level := 3

   * Initialize the skeleton based on the level or area
   skeleton.hitpts := level * 25
   skeleton.damage := level * 0
   skeleton.armor  := level * 5
   skeleton.weapon := level * 8

Again this is very simplistic, but is used to demonstrate creating and accessing a "mob" variable. Of course you won't always have just 1 mob. So you would make an array of mobs and cycle through each one during combat until they were all dead, or the player(s) are all dead.

   num_mobs  data      * number of mobs
   mobs      mob[10]   * fixed array of up to 10 mobs max
   i         data      * general loop variable
.
.
.
   num_mobs := rnd(1,10)       * pick a random number between 1 and 10

   for i := 0 to num_mobs - 1  * loop through the mobs and initialize them all
   begin
      mobs[i].hitpts := level * 25
      mobs[i].damage := level * 0
      mobs[i].armor  := level * 5
      mobs[i].weapon := level * 8
   end

Note that our array is zero-offset, i.e. the subscript of the first element is zero, not one.

 

In assembly language, the language does not provide us with structures (although it could), so we have to maintain them manually. Also, because of the limited resources on older computers, it is a lot easier to use static sized arrays vs. trying to deal with managing dynamic memory use. In a modern system, you could ask the OS for a blob of memory after you choose your random number for the number of mobs. For us, the memory management would be overkill. You might, however, have a blob of RAM designated for multiple uses. So during combat it could contain mob structure data, and when in town it could contain store and merchant structures, etc.

* Structure MOB
MOB_HP EQU  0      * 1 word, bytes 0,1
MOB_DM EQU  2      * 1 word, bytes 2,3
MOB_AR EQU  4      * 1 word, bytes 4,5
MOB_WP EQU  6      * 1 word, bytes 6,7
MOBSIZ EQU  8      * Size of the mob structure in bytes, word aligned

BLOB   EQU  >F000  * 1K blob of RAM from >F000 to >FFFF

MOBS   EQU  BLOB   * the mob array uses the RAM blob
TOWN   EQU  BLOB   * the town array uses the RAM blob

MOBNUM DATA 0
LEVEL  DATA 0
.
.
.

* Set up the mob array for combat
*
      BL   @RANDNO          * Get a random number of mobs from 1 to 10
      MOV  R2,@MOBNUM       * Store the number of mobs (assumes RANDNO returns a number 1 to 10 in R2)

      LI   R1,MOBS          * Set R1 to the start of the array, i.e. blob RAM
MOBINI
      LI   R3,25            * do mobs[i].hitpts := level * 25
      MPY  @LEVEL,R3        * R3,R4:= 25 * level
      MOV  R4,@MOB_HP(R1)   * Store in the current mob structure in the array

      CLR  @MOB_DM(R1)      * mobs[i].damage := level * 0

      LI   R3,5             * do mobs[i].armor  := level * 5
      MPY  @LEVEL,R3        * R3,R4:= 5 * level
      MOV  R4,@MOB_AR(R1)   * Store in the current mob structure in the array

      LI   R3,8             * do mobs[i].weapon := level * 8
      MPY  @LEVEL,R3        * R3,R4 := 8 * level
      MOV  R4,@MOB_WP(R1)   * Store in the current mob structure in the array

      AI   R1,MOBSIZ        * Adjust R1 to point to the next structure in the array
      DEC  R2               * Account for this mob
      JNE  MOBINI           * Not the last mob, initialize the next one

* Start combat
COMBAT
      LI   R1,MOBS          * Reset R1 to the start of the mobs array in blob RAM
      MOV  @MOBNUM,R2       * Reset the mob counter for combat
.
. Insert combat code here... 
.

The first time you do combat the mob data could randomly be generated. If the players end up fleeing then the mob data could be stored to disk for that map / world location and then reloaded next time the players enter that area again. Or the mobs could be randomly generated every time in an endless stream of mobs in that area. It all depends on how you decide to deal with such things. So the initialization code should probably be a subroutine (and made to deal with all the mob types in the game), and you would either call the initialization subroutine or the "load mobs" subroutine depending on various factors.

 

Is this the kind of information you were looking for?

 

Matthew



 

Agreed. KSCAN sucks the root. Here is a replacement KSCAN routine that I have on my hard disk from somewhere. I'm sorry, I have no idea who the author is so I can't give credit. If you are, or know the author, please shout out so that you can be credited (and questioned :P). Nor have I checked that it works :ponder: . I should invest some time in checking it out, because TurboForth could benefit from an independent (from the ROM) KSCAN. Unfortunately, what I know about CRU could be written on the back of a postage stamp with one of Demis Roussos' stubby fingers.

 

Mark.

 

*==
* Exported function list:
*   - KBSCAN -- Scans keyboard & returns qualified key code (BL @)
*               -- don't use if keyboard interrupt is active
<snip>

 

Ok so I'm re-reading Matthew's thread looking for more game-related goodness.. ;)

 

The comment structure and code lead me to believe this was written by Jeff Brown of Term80 / ZT4 fame. He wrote some very interesting, and I dare say elegant, code... :cool:


*********************************************************************
*
* <subroutine skeleton>
*
SKEL
      MOV  R11,*R10+         * Push return address onto the stack

*      Subroutine code here ...

      DECT R10               * Pop return address off the stack
      MOV  *R10,R11
      B    *R11
*// SKEL

I haven't played around with this, so I don't know if the following change would affect your framework. Instead of having

 

      DECT R10               * Pop return address off the stack
      MOV  *R10,R11
      B    *R11

I have

 

      DECT R10               * Pop return address off the stack
      B    *R10

:cool:


Does that work? I don't think it will. I'm pretty sure that's what I had when I first wrote the stack code, but the B instruction does not work exactly like the other instructions; if it did, B *R11 would not work. It drives me crazy trying to keep it straight.

 

Look at this:

      LI   R10,>8320
      LI   R11,1234
      MOV  R11,*R10

R10   : >8320
R11   : 1234
>8320 : 1234

In this example, the *value* of R10 is used as an address of where to store the *value* of R11. In this case the memory location >8320 gets a *value* of 1234.

 

Now look at BL and B:

>6000       BL   @SUB
>6002 >8000
>6004       INC  R2


>8000 SUB   INC  R0
>8002       B    *R11

I included memory addresses for reference. So, at line >6000 when the BL instruction is fetched, the PC will be incremented prior to the execution and will contain >6004 (the operand of the BL instruction is accounted for and skipped), and this new PC *value* will be stored in R11. So now the *value* of R11 is >6004, which is the memory location we want to return to after the subroutine, nothing unexpected there. Now the PC is replaced with the operand of the BL instruction which takes us to the subroutine.

 

The subroutine does its work and now needs to return. So the B *R11 is executed at location >8002. However, the *value* of R11 is >6004, and indirect addressing (designated by the *) means to use the *value* of the register as an address of where to find the actual value. So, that means B *R11 should use >6004 as a memory location, go to >6004 and get the value stored there, and branch to that location. However, that is not what happens. For the B instruction, indirect addressing will use the *value* of a register as the memory address to branch to. Lottrup's book actually has the best explanation I could find:

 

"The line B *R11, which returned several example programs to EASY BUG, meant to branch to the memory location addressed by the value in register 11."

 

In other words, use the *value* of R11 *as* the address to branch to, and not as the address of where to look for an address to branch to.

 

Because the other instructions use the indirect addressing as you would expect, I need the extra MOV to get the address from the stack. Look at what happens if I leave out the extra MOV:

      MOV  R11,*R10+         * Push return address onto the stack

R11   contains >addr (the return to address, i.e. the PC prior to taking the branch)
>8320 receives >addr (memory location >8320, the stack, contains the saved return address)
R10   becomes  >8322 (the next stack location)

. . . do some work . . .

      DECT R10               * Pop return address off the stack

R10   becomes  >8320

      B    *R10

The B instruction will use the >8320 *value* in R10 as the address to branch to, which is not what we want. We need an extra MOV instruction to get the address back from the stack:

      DECT R10               * Pop return address off the stack

R10   becomes  >8320

      MOV  *R10,R11

>8320 contains >addr (the saved return address)
R10   contains >8320 (pointer to the address of where to get the saved return address)
R11   receives >addr (the value *at* >8320)

Now R11 contains the saved address and the B instruction will take us back to where we need to be. It confused me when I originally wrote it. To code what is really going on you would want to write:

        B R11

But that would have the effect of branching to whatever memory address is being used to store workspace register R11 (which would be >8316, assuming a workspace set to >8300).

 

Hmm, I just had a thought that explains this. The B instruction is using the WP (workspace pointer) as the "register" to get the value to use as an address of where to find the address (how's that for confusing?).

 

So, B takes the WP's *value*, which is >8300 in this example, plus twice the number of the specified register (R11 gives an offset of >16), yielding >8316. That *value*, >8316, is used as the memory address of where to find the address to branch to, and the value stored at memory location >8316 is simply the *value* of R11. So the indirection still takes place, but the perspective as to which register to start with has changed for B vs. the other instructions.

 

Clear as mud now. I hope someone else chimes in and helps confirm (or disprove) this madness.

 

Matthew

Edited by matthew180


Great to see some activity on this thread again! I've been reading it over again while working on Calimari Carl. Good things are gonna happen for assembly programmers in the next couple months, I believe. :) Karsten will definitely have a wicked entry...

 

For now it's baby steps for me... But it's in motion. Thanks for all your input and insight here. It has been slow, but it's the summer... I've been out for the better part of 2 months playing. But it's good to be back. :)


Sorry, you're right. Meltdown on my side. I had to go look at my source, and yes indeed, it's not that simple. Some use of code like the snippet below must have confused me. Stack use on the 9900 was never super simple (wonder why).

 

drip   mov  r11,r10      ; save return

...

       b    *r10

:)


Beware self-modifying code... I often do things like this versus using a stack... ;)

 

 
SUB1   MOV  R11,@SUB1RT+2
       ...
SUB1RT B    @0

Now THAT is totally awesome! I always forget that we can modify any memory address on our little machine (too much time coding on stupid "modern" computers I guess.) I have no problems with code like this, it is fast, compact, and totally understandable. For those who might not understand what is going on, I'll go over it in detail in another post. I'm going to have to change to this method of subroutine calling I think. :-)

Two caveats I might mention:

1. The return will 'fail' if the code runs from a Read-only memory device - you can't modify ROM in-line.

2. If you use this trick in every subroutine, you can call any sub from another sub. However, if a sub doesn't call another sub, you incur a small performance hit on each call. For a while I lazily did the former - or was it for consistency?

 

I think it is about time I got around to this explanation. :-)

 

When you are making large programs (and sometimes even when making small ones), you need to add some modularity to your code or you will quickly digress into a mess of confusion. The most fundamental separation of code is the subroutine, also called a function in most languages (but not always interchangeable.) You usually write a subroutine when you need to perform the same task multiple times in the same program (an example might be generating random numbers.) Rather than write the same code over and over every time you need to do that task, you write a subroutine that you "call" from the main program, the subroutine does its work, then "returns" to the main program right where it left off. I'm going to assume you already have an idea about what a subroutine is (and hopefully how / when to use them, etc.) If not, head over to Google and do a little reading since a firm grasp on subroutines is critical to being a programmer (and too much for me to teach in this post.)

 

A "stack" is simply a place to store data in memory. Typically you have a "stack pointer" that holds the address in memory where data can be added, and this is known as the "top of the stack". Stacks "grow" either "up" or "down" in memory as new data is added or old data removed. For example:

memory data
------ ----
0000   0000 <--- SP = 0000 (stack pointer)
0002   0000
0004   0000
0006   0000
0008   0000
000A   0000

This is a simple example of a stack that "grows down" the listing (towards higher addresses) as data is added. The SP (stack pointer) is currently at the top of the stack.

 

There are usually special commands to add and remove data to/from a stack, and they are typically called "push" and "pop" respectively. So, if we were to push some data:

      LI   R0,1234
      PUSH R0

memory data
------ ----
0000   1234
0002   0000 <--- SP = 0002 (stack pointer)

The value in register R0 is pushed onto the stack at the current SP address, and the SP is incremented by 2 (assuming a 16-bit CPU with a 16-bit-oriented stack.) The inverse is of course the pop:

      POP  R0               * R0 now contains 1234

memory data
------ ----
0000   1234 <--- SP = 0000 (stack pointer)
0002   0000

When popping data, the SP is *first* decremented, then the value on the stack is moved to the destination operand, in this case R0.

 

Now, before I'm lynched, I'll stop using instructions that don't exist on the TMS9900 CPU. :-) Unfortunately for us, the 9900 does not have a stack pointer or push and pop instructions. The SP is usually a register built into the CPU, just like the PC (program counter) and WP (workspace pointer), and most CPUs have their general-purpose registers built in as well. The 9900 designers decided context switching was more important than on-die registers, but I'm not going to go into that here (I already covered that topic.)

 

The thing about a stack when talking about subroutines is that it is a convenient place to store return addresses, and the stack operations are typically provided for you by the CPU. For example:

memory
------
0000           LI   R0,1
0002 0001
0004           CALL SUB1
0006 A000
0008           CMP  R0,100
. . .
A000      SUB1 AI   R0,2
A002 0002
A004           RET

Stack after CALL SUB1:
memory data
------ ----
8320   0008 <--- The address of the instruction following the "CALL" is pushed on the stack
8322   0000 <--- SP after being incremented by the CALL instruction

Our code is going along and at address 0004 CALLs a subroutine (I'm using CALL simply for the example; the 9900 does not have CALL), which causes the value of the PC (program counter) to be pushed onto the stack. The PC will be pointing to the next instruction that *would* have been executed. The CPU then loads the address of SUB1 into the PC, which causes the subroutine code to start executing at address A000.

 

The subroutine does its work, then executes the RET instruction, which pops the value off the stack and loads it into the PC. Remember, the SP is decremented *before* the value is popped, so the address 0008 is loaded back into the PC and execution continues from where it was prior to the CALL.

 

This may seem like a lot of work for calling a single subroutine, and you would be right. However, as programs get more complex you will have subroutines calling other subroutines, and the "depth" of such calls can get very deep. With a stack, the CPU just keeps pushing the current address onto the stack and branching to the subroutines, knowing that the addresses of where to return are being saved on the stack.

 

On the 9900 there is no SP register or PUSH, POP, CALL, or RET instructions. If we want a stack we have to set it up and maintain it by hand. The two primary instructions used for subroutines are B (branch) and BL (branch and link) (I'm not covering BLWP here.)

 

The B instruction simply branches to the address specified. This is equivalent to GOTO in BASIC and C (yes, C has GOTO!) The B instruction is typically how we get back from a subroutine, i.e. RET (return). Most 9900 assemblers have a pseudo-instruction RET that is assembled as: B *R11

 

The BL instruction branches just like the B instruction; however, the "link" part of the instruction saves the current value of the PC into R11 prior to branching (saving the PC into R11 is hard-wired into the 9900 and cannot be changed.) This gives us a way to "link" back to where we came from, i.e. return from our subroutine.

 

So, to call a subroutine, we use BL to get there, and B *R11 to get back, like this:

6000             BL   @SUB1     <-- Saves the PC value (6004) into R11, branches to SUB1 (A000)
6002 A000
6004             CI   R0,12
6006 000C
. . .
A000      SUB1   AI   R0,2
A002 0002
A004             B    *R11      <-- Branches to the address in R11 (6004)

So this takes care of calling subroutines. The problem comes when you want to call a subroutine from *within* a subroutine:

6000             BL   @SUB1     <-- Saves the PC value (6004) into R11, branches to SUB1 (A000)
6002 A000
6004             CI   R0,12
6006 000C
. . .
A000      SUB1   AI   R0,2
A002 0002
A004             BL   @SUB2     <-- Saves the PC value (A008) into R11, branches to SUB2 (B000)
A006 B000
A008             B    *R11      <-- >>ENDLESS LOOP<< Branches to the address in R11 (A008)
. . .
B000      SUB2   AI   R0,1
B002 0001
B004             B    *R11      <-- Branches to the address in R11 (A008)

Notice that when we BL @SUB2, the return value saved in R11 is clobbered, so we can never return from SUB1 back to the original calling code, and in this case we are stuck in an endless loop forever branching to the same memory address, i.e. the B instruction itself. The only way to recover from this is to power-cycle the computer.

 

Thus, prior to calling SUB2, we need to save the value in R11. This is where our pseudo-stack comes into play. The start of every subroutine is set up to push R11 onto the stack and increment the stack pointer, and the end of every subroutine is set up to pop the top value of the stack back into R11, which will be the address of where to return (assuming the stack did not overflow or become corrupted.) I presented this stack maintenance in my example skeleton code:

**
* Scratch pad RAM use - Variables
*
*           >8300             * Workspace
*           >831F             * Bottom of workspace
STACK  EQU  >8320             * Subroutine stack, grows down (8 bytes)
*           >8322             * The stack is maintained in R10 and
*           >8324             * supports up to 4 BL calls
*           >8326

. . .

*      Initialize the call stack and Finite State Machine (FSM)
      LI   R10,STACK         * Set up the stack pointer

. . .

*********************************************************************
*
* <subroutine skeleton>
*
SKEL
      MOV  R11,*R10+         * Push return address onto the stack

*      Subroutine code here ...

      DECT R10               * Pop return address off the stack
      MOV  *R10,R11
      B    *R11
*// SKEL

Because the 9900 does not have a stack pointer, I designated R10 to forever be the stack pointer. I chose R10 because it is close to R11, and R11 is hardwired to always be used by BL. R13, R14, and R15 are used with BLWP, and R12 is used with the CRU instructions. Thus R10. So, R10 is set up with a memory address that is used as a 4-value stack. Not very deep, but space is always an issue and I didn't need more than 4 levels of subroutines. You can make the stack deeper and also move it out of 16-bit RAM if you need a lot of room (and if a large stack trumps the penalty for using 8-bit RAM).
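With R10 set up that way, nested calls work because each level pushes its own return address before R11 gets clobbered. An untested sketch in the same style as the skeleton (OUTER and INNER are hypothetical labels of my own):

```
       BL   @OUTER            * main code calls OUTER

OUTER  MOV  R11,*R10+         * push OUTER's return address
       BL   @INNER            * safe: R11 may now be clobbered
*      ... more work ...
       DECT R10               * pop return address
       MOV  *R10,R11
       B    *R11

INNER  INC  R0                * a leaf routine: calls nothing, so
       B    *R11              * R11 still holds its return address
```

Note that INNER, being a leaf, can skip the push/pop entirely, which is where the stack overhead is saved.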

 

Now, we can finally talk about the example InsaneMultitasker posted (whew):

 
A000 xxxx SUB1   MOV  R11,@SUB1RT+2
A002 A00E
A004 xxxx        LI   R0,1
      . . .
A00C xxxx SUB1RT B    @0
A00E 0000

I have added some machine code (as in the other examples) since you have to understand what is going on at that level. All instructions on the 9900 are 16-bit and will always be on even addresses. Any instructions that can use symbolic addressing (or take immediate operands) will have the data associated with the symbolic address encoded in memory immediately following the instruction. The CPU knows how many operands follow and will read that data while executing the instruction.

 

So, in the code above, the MOV R11,@SUB1RT+2 is what saves the return address. Address A000 is the MOV instruction itself. The assembler compiles our symbolic operand as:

  A00C   - the address of the label SUB1RT
+ 0002   - the immediate value added to SUB1RT
 ------
  A00E

So A00E is encoded in the resulting machine code. I'm not going to look up the machine code for the instructions right now since it is not important for this example, however the xxxx above would be a hex value representing the instruction's opcode. When executing, the CPU sees the symbolic addressing, gets the next value in memory following the opcode, and uses that as the destination of the MOV. Thus we end up with something like:

      MOV R11,@>A00E

We could write this directly, however one of the main reasons for using an assembler is to let it figure out the memory addresses, and we can just use labels that make sense to us.

 

So, you will notice that address >A00E is the location in memory where the B instruction, also coded with a symbolic operand, will look to get its address when executed. Since R11 contains the address of where we need to return to, this effectively saves the return address directly in the B instruction's symbolic operand. This is self-modifying code! You can't do this on a modern CPU because memory protection prevents your program from changing its own code.

 

The "@0" operand of the B instruction is simply there as a place holder to make sure the assembler generates the correct machine code for the B instruction, i.e. a branch with a symbolic address. By the time the B instruction is executed, the 0000 address has been replaced with the real return address. *Note* a real B @>0000 will cause your console to reset, just like hitting CTRL-QUIT.

 

Doing this kind of self-modification can be very handy, and here it is a nice way to save the return address without using a stack. It is also easy to understand what is happening, so it is *good* self modifying code. ;-)

 

There are a few problems, as mentioned in the previous post:

 

1. You can not use this code when your program is in ROM, i.e. a game executing from the cartridge space. This should be obvious: you can't write to ROM, so you can't "fix up" the return address of the B instruction.

 

2. You can not use this code if you plan to do recursion (a function that calls itself) since there is only room to save one return address. For recursion you need a stack, usually a deep one.

 

3. Like #2, you can not use this code if you call a subroutine that might call another that eventually calls back into the current routine. This is indirect (mutual) recursion and has the same problem as #2.

 

However, if you are running from the 32K RAM expansion, or if your subroutine is in RAM and you know you won't have subroutines calling themselves, this is a very nice way to handle subroutine calling without the overhead of a stack and the stack maintenance instructions. I will certainly be using this kind of thing in the future!
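Putting it all together, the whole pattern looks something like this (my reconstruction of the listing discussed above, using the SUB1/SUB1RT labels from it):

```
       BL   @SUB1           * call; the CPU puts the return address in R11
       ...                  * execution resumes here after SUB1 "returns"

SUB1   MOV  R11,@SUB1RT+2   * save the return address inside the B below
       ...                  * subroutine body
SUB1RT B    @0              * the 0 has been replaced with the return address
```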

 

Matthew

Edited by matthew180


However, if you are running from the 32K RAM expansion, or if your subroutine is in RAM and you know you won't have subroutines calling themselves, this is a very nice way to handle subroutine calling without the overhead of a stack and the stack maintenance instructions. I will certainly be using this kind of thing in the future!

 

A very interesting technique... essentially, you're tagging every subroutine with its own return-address word! That's the advantage of the TMS9900 architecture not having separate CODE and DATA segments. :)

 

However, I do see some drawbacks to the technique:

- Subroutine count vs. general stack size. Usually you have far fewer levels of return stack than you have subroutines.

- Debugging. At least with a stack, you've only got ONE place to look. This has come in handy when my CRPG has gone off into ROM land and I need to see where it came from.

- Memory speed. With a stack you at least have the option of putting it into the 16-bit scratchpad. That's a little harder to do with the subroutines themselves.

 

For my own CRPG, I use R10 as a return-address stack pointer as well. Currently I've allocated 32 bytes to it, for 16 levels of return stacking, but I'll reduce it to match the maximum level reached once the core engine work is done. I have a lot of subroutines, and if one does nothing but branches or BLWPs, I don't store a return address for it.
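For anyone following along, a minimal R10 return-address stack might look something like this (my sketch, not Adamantyr's actual code; the RSTKSZ size and the downward-growing layout are assumptions):

```
RSTKSZ EQU  32                  * 16 levels of return addresses
RSTACK BSS  RSTKSZ              * reserve the stack itself

START  LI   R10,RSTACK+RSTKSZ   * R10 points just past the stack; it grows down
       ...
       BL   @MYSUB

MYSUB  DECT R10                 * push the return address...
       MOV  R11,*R10            * ...so the nested BL below is safe
       BL   @OTHER
       ...
       MOV  *R10+,R11           * pop the return address
       B    *R11                * return (same as RT)
```

Leaf routines that call nothing can skip the push/pop entirely and just RT, which is the case Adamantyr mentions above.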

 

I'll probably enlist aid here on the forum to help "optimize" the final source code. I'm trying to program efficiently NOW, but you know how it is. Sometimes you just want it to bloody WORK, and you'll figure out how to make it work BETTER later. :)

 

Adamantyr


Though perhaps intuitive, I thought I'd mention that when "passing" parameters via R11, saving the return address generally occurs after the parameters are "passed" to the subroutine. I believe this is true for both techniques you have explained so well. :)

       BL   @SUB2
       DATA 8,1230          * two word parameters
       <program continues here>
...

SUB2   MOV  *R11+,@VAL1     * copy first value into VAL1, advance R11 by 2 bytes
       MOV  *R11+,@VAL2     * copy second value into VAL2, advance R11 by 2 bytes
       MOV  R11,@SUB2RT+2   * R11 now points to the proper return point
       ...
SUB2RT B    @0


I'll get to the keyboard input stuff soon, but I won't be using KSCAN since (get ready) I don't like it... No surprise there. The clincher was the post retroclouds (or was it sometimes? I can't remember now) made a bit ago pointing out that the ROM KSCAN routine uses a delay. So I fired up the TI Intern and did some code reading of the KSCAN routine, and found the delay is used twice, each time causing a 25ms delay. That means it takes about three video frames just to call KSCAN, and that is counting just the delay loops. That is 50ms where the computer is just spinning its wheels, and when instructions are measured in microseconds, 50 milliseconds is a long time!

 

Ok, so that is where the gratuitous delay loop is! I remember Karl had mentioned there was one in there, but I never knew where it was, and when someone challenged me on it, my quick browse through TI-INTERN didn't turn it up. I hadn't looked in KSCAN, though.

 

I wonder how much faster other software on the TI-99/4A (even TI-BASIC!) would go if KSCAN were fixed?

Edited by intvnut


I wonder how much faster other software on the TI-99/4A (even TI-BASIC!) would go if KSCAN were fixed?

 

Well, assuming a reliable keyboard debounce does not take 25ms to 50ms, and depending on when/how often BASIC or XB calls KSCAN, there might be a significant difference! Hmm. I might have to socket my ROMs tonight. :-)

 

Matthew


He probably did. Anyone who polls the keyboard directly will probably end up with a faster routine by default. However, *many* programs and games use the KSCAN routine and thus suffer the delay. If the built-in KSCAN routine in ROM were changed, everything would get a little boost.

 

Matthew


I wonder how much faster other software on the TI-99/4A (even TI-BASIC!) would go if KSCAN were fixed?

 

Well, assuming a reliable keyboard debounce does not take 25ms to 50ms, and depending on when/how often BASIC or XB calls KSCAN, there might be a significant difference! Hmm. I might have to socket my ROMs tonight. :-)

 

It's always possible to do a "threaded" debounce. Rather than delay explicitly within the scanning routine, accumulate the delay over multiple calls. It's a shade more complex, but I've implemented such things in the past to great effect on other systems.

 

In pseudocode, you'd have something like this that would work well for a typed-text situation:

   scan input

   if current input different than previous:
       remember current input as the new previous
       reset bounce counter to 0
       return nothing (or key-up event if appropriate/desired)

   if bounce counter == MAX
       return nothing

   increment bounce counter

   if bounce counter == MAX
       decode and return new keystroke

If you really do need several milliseconds to correctly debounce, then you can call such a scanning routine once a frame, but not take the hit within the routine. Just set the max bounce count to 1, 2 or 3 (16ms, 33ms or 50ms).
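In 9900 terms, the counter logic above might look something like this (a sketch only; KEYOLD, BOUNCE, and DBMAX are hypothetical scratchpad words/values of my own invention, and the actual row scan is left out):

```
DBMAX  EQU  3                * calls of stable input required (assumed)

* entry: R1 holds the freshly scanned key code
DEBNCE C    R1,@KEYOLD       * same input as the previous call?
       JEQ  SAME
       MOV  R1,@KEYOLD       * changed: remember it and restart the count
       CLR  @BOUNCE
       JMP  NOKEY
SAME   MOV  @BOUNCE,R2
       CI   R2,DBMAX
       JEQ  NOKEY            * already reported this keystroke
       INC  R2
       MOV  R2,@BOUNCE
       CI   R2,DBMAX
       JEQ  GOTKEY           * stable long enough: report it once
NOKEY  SETO R1               * R1 = -1 means "no new key this call"
GOTKEY RT                    * return with the result in R1
```

Call it once per frame (or per idle-loop pass) and the MAX setting determines the total debounce time, with no busy-wait anywhere.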

 

Of course, those debounce lengths sound ridiculously long. I think you could do a proper debounce in something closer to 1-2ms, but even that's no reason to spend it in a do-nothing loop. Alternatively, you could call the routine a few times per frame. I've had good luck scanning controllers on the Intellivision from an idle loop (which on the TI would be tantamount to polling for VSYNC), and setting the bounce count based on how fast the idle loop is.

 

If you need to be able to recover from the keyboard scan quickly (such as when you're in that VSYNC wait loop), you could even thread the row-loop, scanning one row per call. But, that's a shade more complicated. :)
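Threading the row loop itself might look something like this (again just a sketch; ROWNUM is an assumed scratchpad word, the row count of 8 is an assumption, and the CRU row-select/read code is omitted):

```
* advance to the next keyboard row on each call
SCANRW MOV  @ROWNUM,R0
       ...                  * select row R0 via CRU and read its keys here
       INC  R0
       CI   R0,8            * past the last row? wrap back to the first
       JL   SROK
       CLR  R0
SROK   MOV  R0,@ROWNUM
       RT
```

Each call then costs only one row's worth of CRU work, so a tight VSYNC wait loop can keep scanning without ever stalling.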

