my first assembly program

ceratophyllum · March 23, 2011

I was reading Beginning Assembly and the Compute book and I decided it was time to make a little ASCII nethack man move around the screen. At first, to see what was happening, I set the uh "man" to change with the 4 directions and didn't bother to erase.

I sort of like how it looks. Reminds me of the patterns in the gravel in a Zen Rock Garden.

I'm still too confused to mess with scrolling, however I figured--WRONGLY--that it should not be too hard to set up a one-screen bounded playfield: no wrap around. Err, I mean a screen like in Robot Finds Kitten.

That is, I move around Left and Right by INC, DEC and watch out for the rightmost and leftmost columns. (Thanks to the way the screen is numbered, it's easy to keep from going off the top or bottom.)

If I were in BASIC I would just check the remainder=0 when dividing by 32 for the leftmost column. However, I've got to go back to MiniMemory and LBLA to see just how DIV works.

Anyhow, thanks to the wonders of modern text editing and a little kludgery, I at least managed to implement a bone-head solution. Is there a better way, not using DIV, that hasn't occurred to me?

Extremely goofy code follows:

******************************
*  Zen ASCII Garden          *
*  arrow keys (E,D,X,S)      *
******************************
DEF	BEGIN              
REF	KSCAN,VSBW
KBOARD	EQU	>8374	* Keyboard unit to scan (1 = left side)
KEY	EQU	>8375	* KSCAN returns the key code here
* Split keyboard key codes Ext'd Basic man. page 201
KEYER	BYTE	15	* z
KEYUP	BYTE	5	* e
KEYRT	BYTE	3	* d
KEYDN	BYTE	0	* x
KEYLT	BYTE	2	* s
HEXFF	BYTE	>FF	* No key pressed value
ONE	BYTE	1
*
MYREG	BSS	>20
*
BEGIN	LWPI	MYREG
MOVB	@ONE,@KBOARD	* Check left side of keyboard.
LI	R0,300		* initial position of piece	
LOOP	BLWP	@KSCAN		* Check for keyboard input.
*
LI	R7,6000		* delay length
DLAY	DEC R7			* R7=R7-1
JNE DLAY		* IF R7>0 goto DLAY
*
CB	@HEXFF,@KEY	* Was a key pressed?
JEQ	LOOP		*
CB	@KEYUP,@KEY	* Compare to see which
JEQ	PUP		* arrow key was pressed.
CB	@KEYRT,@KEY
JEQ	PRIGHT
CB	@KEYDN,@KEY
JEQ	PDOWN       
CB	@KEYLT,@KEY 
JEQ	PLEFT       
CB	@KEYER,@KEY
JEQ	PERASE		* I added an erase key.
B	@LOOP		* no key GOTO LOOP
*
PERASE	LI	R1,>2000	* set piece to space
B	@PRINT		* used to erase a ><^V
PUP	LI	R1,>5E00	* set piece to ^
CI	R0,32		* are we in top row?
JLT	SKIP1		* yes? GOTO SKIP1
AI	R0,-32		* no? move up one row
SKIP1	B	@PRINT		* branch to display piece
PDOWN	LI	R1,>5600	* set piece to V
CI	R0,735		* Are we at bottom row? 
JGT	SKIP2		* If yes, skip & dont move down
AI	R0,32		* move row down
SKIP2	B	@PRINT		* branch to display piece
PRIGHT	LI	R1,>3E00	* set piece to > 
CI	R0,31		* stop at last column on right 
JEQ	SKIP3		* kludgy as hell way to deal with 
CI	R0,63		* long jumps: 2 shorter jumps
JEQ	SKIP3
CI	R0,95
JEQ	SKIP3
CI	R0,127
JEQ	SKIP3
CI	R0,159
JEQ	SKIP3
CI	R0,191
JEQ	SKIP3
CI	R0,223
JEQ	SKIP3
CI	R0,255
JEQ	SKIP3
CI	R0,287
JEQ	PRINT		* now close enough to reach PRINT
CI	R0,319
JEQ	PRINT
CI	R0,351
JEQ	PRINT
CI	R0,383
JEQ	PRINT
CI	R0,415
JEQ	PRINT
CI	R0,447
JEQ	PRINT
CI	R0,479
JEQ	PRINT
CI	R0,511
JEQ	PRINT
CI	R0,543
JEQ	PRINT
CI	R0,575
JEQ	PRINT
CI	R0,607
JEQ	PRINT
CI	R0,639
JEQ	PRINT
CI	R0,671
JEQ	PRINT
CI	R0,703
JEQ	PRINT
CI	R0,735
JEQ	PRINT
CI	R0,767
JEQ	PRINT	
INC	R0        
SKIP3	B	@PRINT 
PLEFT	LI	R1,>3C00
CI	R0,0		* stop marker from going offscreen on left
JLE	PRINT		* If only I could stick #s in an array
CI	R0,32		* and reference it somehow.
JEQ	PRINT       
CI	R0,64
JEQ	PRINT
CI	R0,96
JEQ	PRINT
CI	R0,128
JEQ	PRINT
CI	R0,160
JEQ	PRINT
CI	R0,192
JEQ	PRINT
CI	R0,224
JEQ	PRINT
CI	R0,256
JEQ	PRINT
CI	R0,288
JEQ	PRINT
CI	R0,320
JEQ	PRINT
CI	R0,352
JEQ	PRINT
CI	R0,384
JEQ	PRINT
CI	R0,416
JEQ	PRINT
CI	R0,448
JEQ	PRINT
CI	R0,480
JEQ	PRINT
CI	R0,512
JEQ	PRINT
CI	R0,544
JEQ	PRINT
CI	R0,576
JEQ	PRINT
CI	R0,608
JEQ	PRINT
CI	R0,640
JEQ	PRINT
CI	R0,672
JEQ	PRINT
CI	R0,704
JEQ	PRINT
CI	R0,736
JEQ	PRINT
DEC	R0
B	@PRINT 
PRINT	BLWP	@VSBW		* finally put that text on
B	@LOOP		* the screen
END

+adamantyr · March 23, 2011

Boundary checking in assembly isn't easy or simple. Kudos on creating a working solution!

Probably the most important paradigm to embrace here is that the screen is not a data storage area. What you're doing is storing a value in a linear array 768 bytes long. The fact that it's the screen is just how the data is being displayed. You're using a single register to track an index in this array, so this is why it's not easy to suddenly have it behave like it's in a two-coordinate system.

So, one solution is to use two data words to store a row and column value, and then calculate the index value on the screen from that. Then you can just check that the row and column are within boundary limits before calculating the index value.

Here's a short snippet showing how it would work:

ROW    BSS  2
COL    BSS  2
.
.
.
CHECK  MOV  @ROW,R0
      CI   R0,24
      JL   CHECK1                        * Value is 0 to 23
      CLR  R0
CHECK1 MOV  R0,@ROW
      MOV  @COL,R0
      ANDI R0,>001F                      * AND with 31 (>001F): column wraps around, so -1 becomes 31 and 32 becomes 0
      MOV  R0,@COL
      MOV  @ROW,R0
      SLA  R0,5                          * Multiply by 32
      A    @COL,R0

This is a bit crude... I have wrapping working for columns but rows will be strange. Also note that by using the SLA (Shift Left Arithmetic) instruction, I can multiply by any power of 2 value. You can use SRL or SRA to divide in much the same way.
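
For a bounded playfield with no wrap-around, one possible variation is sketched below. It is only a sketch under some assumptions: NEWROW and NEWCOL are hypothetical variables holding the proposed position, and the labels CHKPOS and REJECT and the use of R1 are made up for illustration. The unsigned JHE test rejects both the too-large value and -1, because -1 is >FFFF when treated as unsigned.

```asm
ROW    BSS  2                 * Current row, 0-23
COL    BSS  2                 * Current column, 0-31
NEWROW BSS  2                 * Proposed row (hypothetical variable)
NEWCOL BSS  2                 * Proposed column (hypothetical variable)
* ... rest of program ...
CHKPOS MOV  @NEWROW,R0
       CI   R0,24
       JHE  REJECT            * Unsigned >= 24: catches 24 and -1 (>FFFF)
       MOV  @NEWCOL,R1
       CI   R1,32
       JHE  REJECT            * Unsigned >= 32: catches 32 and -1 (>FFFF)
       MOV  R0,@ROW           * Both in range, accept the move
       MOV  R1,@COL
REJECT MOV  @ROW,R0           * Index from the (possibly unchanged) position
       SLA  R0,5              * Multiply row by 32
       A    @COL,R0           * R0 = ROW*32 + COL, 0 to 767
```

After a rejected move, the caller would presumably copy ROW and COL back into NEWROW and NEWCOL before handling the next key press.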

Does this help?

Adamantyr

sometimes99er · March 23, 2011

Is there a better way, not using DIV, that hasn't occurred to me?

SRL R0,>5 ; Moves bits 5 places to the right = divides by 32 (to go from screen location to line no.)

SLA R0,>5; Moves bits 5 places to the left = multiply by 32 (to go from line no. to screen location)

Shift Right Logical (EA manual page 198)

Shift Left Arithmetic (EA manual page 200)

ANDI R0,>1F ; isolates last 5 bits = will then contain a value between 0 and 31

You should not move left from 0, and not move right from 31.
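
Put together, a left/right check on a single screen position might look like the sketch below. It assumes R0 holds the position (0 to 767) as in the original listing, borrows R2 as scratch, and uses made-up labels GOLEFT, GORGHT and NOMOVE. PRINT is the label from the original program.

```asm
GOLEFT MOV  R0,R2             * Work on a copy so R0 is preserved
       ANDI R2,>001F          * R2 = column 0-31, EQ is set when it is 0
       JEQ  NOMOVE            * Column 0: do not move left
       DEC  R0                * Otherwise one position to the left
       JMP  NOMOVE
GORGHT MOV  R0,R2
       ANDI R2,>001F          * R2 = column 0-31
       CI   R2,31
       JEQ  NOMOVE            * Column 31: do not move right
       INC  R0                * Otherwise one position to the right
NOMOVE B    @PRINT            * PRINT as in the original listing
```

This replaces the long chains of CI / JEQ pairs with one mask and one test per direction.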

matthew180 · March 23, 2011

Some ideas I recommend:

* Don't use registers to store variables. Allocate memory for variables and use registers to do work.

* Always set your workspace inside the 16-bit fast RAM, i.e. >8300

* Store your player X,Y location in a format that makes sense to the game, and not what makes sense to the "screen"

* The screen is output only - you should be able to redraw it at any time

* Use WASD for direction control, yes, even on the 99/4A. It is much more comfortable for one handed control

Here is an example based on your code. It introduces a few concepts mentioned above, and gives you something to tinker with (or not.) Some things to pay attention to are the difference between DATA (or BYTE, TEXT, or BSS) and an EQUate, and when / how to use each. Also, byte operations on registers *always* affect the MSB only. Same with a byte operation on a label defined with DATA.
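
As a rough illustration of those last points (a sketch only; the names SPACE, COUNT and STAR are made up and are not part of the program below):

```asm
SPACE  EQU  >2000             * EQU: a name for a number, takes no memory
COUNT  DATA >0105             * DATA: one word of memory holding >0105
STAR   BYTE >2A               * BYTE: one byte of memory holding '*'
       EVEN                   * Back to a word boundary after BYTE
* ... code section ...
       LI   R1,SPACE          * Immediate value: R1 = >2000
       MOV  @COUNT,R2         * Word read from memory: R2 = >0105
       CLR  R3                * R3 = >0000
       MOVB @STAR,R3          * Byte lands in the MSB: R3 = >2A00
       CLR  R4                * R4 = >0000
       MOVB @COUNT,R4         * Byte op on a DATA word uses its MSB: R4 = >0100
```

The EQU line only gives the assembler a value to substitute, while DATA and BYTE put real bytes into the program image that instructions can read and write.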

I tested this with asm994a and Classic99. Mess with the TRESET value to change the speed.

Hope this helps.

*********************************************************************
*
*  Zen ASCII Garden
*
*  Normal PC style W,A,S,D keys...  Yes, even works on a 99/4A and
*  is much more comfortable than the "arrow" key arrangement.
*
      DEF  BEGIN
      REF  KSCAN

* VDP Memory Map
*
VDPRD  EQU  >8800             * VDP read data
VDPSTA EQU  >8802             * VDP status
VDPWD  EQU  >8C00             * VDP write data
VDPWA  EQU  >8C02             * VDP set read/write address
VR1CPY EQU  >83D4             * Copy of VDP register 1 - see E/A manual pg. 248
VSYNC  EQU  >83D7             * Vertical Sync

* Workspace
WRKSP  EQU  >8300             * Workspace
R0LB   EQU  WRKSP+1           * R0 low byte for VDP routines


KBOARD EQU  >8374             * Keyboard unit to scan (1 = left side)
KEY    EQU  >8375             * KSCAN returns the key code here

TRESET EQU  200               * Keyboard read delay

* Keyboard test delay
TIMER  DATA 0

* Bounds (zero-based), 1 character boundary
BTOP   DATA 1
BLEFT  DATA 1
BBTM   DATA 22
BRIGHT DATA 30


* Player location data
P1X    DATA 0
P1Y    DATA 0
P1XD   DATA 0                 * New X
P1YD   DATA 0                 * New Y
P1CUR  BYTE 0                 * Current character
P1NEW  BYTE 0                 * New character

* Player character codes
P1UP   BYTE >5E               * ^
P1DN   BYTE >76               * v
P1LT   BYTE >3C               * <
P1RT   BYTE >3E               * >
P1ERSE BYTE >20               * _space_

* Split keyboard key codes XB man. page 201
KEYUP  BYTE 4                 * w
KEYLT  BYTE 1                 * a
KEYDN  BYTE 2                 * s
KEYRT  BYTE 3                 * d
KEYER  BYTE 15                * z - erase
HEXFF  BYTE >FF               * No key pressed value
ONE    BYTE 1

      EVEN

* Entry point
BEGIN
      LIMI 0                 * Interrupts off for writing to the screen
      LWPI WRKSP

*      Clear the screen
*      Assumes the screen is already in Graphics Mode I (24x32)
* R0   Starting write address in VDP RAM
* R1   MSB of R1 sent to VDP RAM
* R2   Number of times to write the MSB byte of R1 to VDP RAM
      CLR  R0
      LI   R1,>2000          * >20 is hex for 32 (space character)
      LI   R2,768
      BL   @VSMW

*      Set initial player location
      LI   R0,16
      MOV  R0,@P1X
      LI   R0,12
      MOV  R0,@P1Y

      MOV  @P1X,@P1XD
      MOV  @P1Y,@P1YD

      MOVB @P1UP,@P1CUR      * Set initial direction (high byte of P1CUR)
      MOVB @P1CUR,@P1NEW     * Set "new" character to current character


*      Set keyboard timer
      LI   R0,TRESET
      MOV  R0,@TIMER

**
* Main game loop
*
LOOP

*      Check if time to read keyboard.
*      Should really do this with the VDP interrupt...
      DEC  @TIMER            * TIMER := TIMER - 1
      JEQ  KEY00             * IF TIMER == 0 THEN read keyboard
      B    @STUFF

KEY00
*      Reset keyboard timer
      LI   R0,TRESET
      MOV  R0,@TIMER

      MOVB @ONE,@KBOARD      * Check left side of keyboard
      BLWP @KSCAN            * Check for keyboard input

      CB   @HEXFF,@KEY       * Was "some" key pressed?
      JNE  KEY01             * Yes, test if valid
      B    @STUFF            * Otherwise do other stuff

KEY01
      CB   @KEYUP,@KEY       * Test each valid key
      JNE  KEY02
      DEC  @P1YD
      MOVB @P1UP,@P1NEW
      B    @BOUND

KEY02
      CB   @KEYDN,@KEY
      JNE  KEY03
      INC  @P1YD
      MOVB @P1DN,@P1NEW
      B    @BOUND

KEY03
      CB   @KEYLT,@KEY
      JNE  KEY04
      DEC  @P1XD
      MOVB @P1LT,@P1NEW
      B    @BOUND

KEY04
      CB   @KEYRT,@KEY
      JNE  KEY05
      INC  @P1XD
      MOVB @P1RT,@P1NEW
      B    @BOUND

KEY05
      CB   @KEYER,@KEY
      JNE  KEY06
      MOVB @P1ERSE,@P1NEW

KEY06

* Bounds check. NOTE, using *SIGNED* tests here to support no border.
* Could be more efficient with UNSIGNED tests because the 9900 only
* has 2 signed tests: JGT and JLT, but many unsigned tests.  Thus the
* use of TWO tests, one for > or < and one for =
BOUND
      C    @P1YD,@BTOP
      JGT  BOUND1
      JEQ  BOUND1
      MOV  @P1Y,@P1YD        * Reset out of bound Y

BOUND1
      C    @P1YD,@BBTM
      JLT  BOUND2
      JEQ  BOUND2
      MOV  @P1Y,@P1YD        * Reset out of bound Y

BOUND2
      C    @P1XD,@BLEFT
      JGT  BOUND3
      JEQ  BOUND3
      MOV  @P1X,@P1XD        * Reset out of bound X

BOUND3
      C    @P1XD,@BRIGHT
      JLT  BOUND4
      JEQ  BOUND4
      MOV  @P1X,@P1XD        * Reset out of bound X

BOUND4
*      Update legal X,Y location
      MOV  @P1XD,@P1X
      MOV  @P1YD,@P1Y


**
* Do other game stuff, draw the screen, etc.
*
STUFF

* Draw screen and player
DRAW

*      Save new character value, if any.
      MOVB @P1NEW,@P1CUR

*      Calculate player screen location as
*      loc = y * 32 + x
      MOV  @P1Y,R0           * R0 := Y
      SLA  R0,5              * Multiply R0 (Y) by 32
      A    @P1X,R0           * Add X

      MOVB @P1NEW,R1

* R0   Write address in VDP RAM
* R1   MSB of R1 sent to VDP RAM
      BL   @VSBW


*      Enable / disable interrupts really quick to let the
*      ISR run.  Don't need to do this if you are not using
*      stuff the ISR offers...
      LIMI 2
      LIMI 0

      B    @LOOP             * Go back to main loop top



*********************************************************************
*
* VDP Single Byte Write
*
* R0   Write address in VDP RAM
* R1   MSB of R1 sent to VDP RAM
*
* R0 is modified, but can be restored with: ANDI R0,>3FFF
*
VSBW   MOVB @R0LB,@VDPWA      * Send low byte of VDP RAM write address
      ORI  R0,>4000          * Set read/write bits 14 and 15 to write (01)
      MOVB R0,@VDPWA         * Send high byte of VDP RAM write address
      MOVB R1,@VDPWD         * Write byte to VDP RAM
      B    *R11
*// VSBW


*********************************************************************
*
* VDP Single Byte Multiple Write
*
* R0   Starting write address in VDP RAM
* R1   MSB of R1 sent to VDP RAM
* R2   Number of times to write the MSB byte of R1 to VDP RAM
*
* R0 is modified, but can be restored with: ANDI R0,>3FFF
*
VSMW   MOVB @R0LB,@VDPWA      * Send low byte of VDP RAM write address
      ORI  R0,>4000          * Set read/write bits 14 and 15 to write (01)
      MOVB R0,@VDPWA         * Send high byte of VDP RAM write address
VSMWLP MOVB R1,@VDPWD         * Write byte to VDP RAM
      DEC  R2                * Byte counter
      JNE  VSMWLP            * Check if done
      B    *R11
*// VSMW


*********************************************************************
*
* VDP Write To Register
*
* R0 MSB    VDP register to write to
* R0 LSB    Value to write
*
VWTR   MOVB @R0LB,@VDPWA      * Send low byte (value) to write to VDP register
      ORI  R0,>8000          * Set up a VDP register write operation (10)
      MOVB R0,@VDPWA         * Send high byte (address) of VDP register
      B    *R11
*// VWTR


      END

marc.hull · March 24, 2011

That looks really good ! Welcome to the assembly club.... Keep on keeping on cerato.....

ceratophyllum · March 25, 2011

* Always set your workspace inside the 16-bit fast RAM, i.e. >8300

I am still hazy about the internals of the TI-99/4A. I figured it would start to come clear when I try to program something and all my trees suddenly turn into mushrooms.

>8300 is 33536? Isn't that above 32K bytes (32768)? How can that be?

Before I found asm994a, I was playing around with tiasm but couldn't figure out how to get something to actually run in an (emulated) TI. (tiasm source is in the v9t9 linux source in the tools directory.)

I guess it spits out some kind of cartridge image?

~/Games/ti994a>./tiasm 
TIASM <input file> [-r <console ROM output>] [-m <module ROM output>]
[-d <DSR ROM output>] [-g <console GROM output>] [<list file>]

-r saves the 8k memory block at >0000.
-m saves the 8k memory block at >6000.
-d saves the 8k memory block at >4000.
-g saves the 24k memory block at >0000.  This can only be used with -m.

* Store your player X,Y location in a format that makes sense to the game, and not what makes sense to the "screen"

Wow! That's some spoiler! It totally blows the examples in books out of the water. It is very neat how you handle the screen coordinates intuitively (x,y) and then at the end convert to number 0-767. This is code I will certainly reuse. Thank you! Can't wait to get home and type it in so I can get a real look.

Compute doesn't address the bounds problem in the "moving +" example; maybe the solution they had in mind wouldn't fit in Mini Memory!

Here is an example based on your code. It introduces a few concepts mentioned above, and gives you something to tinker with (or not.) Some things to pay attention to are the difference between DATA (or BYTE, TEXT, or BSS) and an EQUate, and when / how to use each. Also, byte operations on registers *always* affect the MSB only. Same with a byte operation on a label defined with DATA.

Doh! This is what was confusing me about DIVision and MOVing bytes around.

sometimes99er · March 25, 2011

>8300 is 33536? Isn't that above 32K bytes (32768)? How can that be?

There's a small RAM space for the CPU, the so-called scratchpad. It's only 256 bytes, but it's faster (16-bit) than, I guess, most other RAM. The 32K expansion is multiplexed (2 x 8-bit). The scratchpad is located at >8300 to >83FF. It is the only CPU RAM on the bare-bones unexpanded console, so game cartridges will often need to use VDP RAM for storing variables and stuff.

Before I found asm994a, I was playing around with tiasm but couldn't figure out how to get something to actually run in an (emulated) TI. (tiasm source is in the v9t9 linux source in the tools directory.)

I guess it spits out some kind of cartridge image?

Yep, quick and dirty, used it a lot. Some of its syntax is not standard; for example, indirect autoincrement addressing is written +*R1 instead of *R1+. It won't run under Windows 7, and I didn't bother to recompile or patch, since I had to move to WinAsm99 sooner or later.

Edited March 25, 2011 by sometimes99er

marc.hull · March 25, 2011

I am still hazy about the internals of the TI-99/4A. I figured it would start to come clear when I try to program something and all my trees suddenly turn into mushrooms.

>8300 is 33536? Isn't that above 32K bytes (32768)? How can that be?

The TI has 64K of mapped memory space. The 32K of RAM is mapped (mostly) higher than 32767. Maybe someone has a map available?

lucien2 · March 25, 2011

From E/A Manual :

matthew180 · March 25, 2011

* Always set your workspace inside the 16-bit fast RAM, i.e. >8300

I am still hazy about the internals of the TI-99/4A. I figured it would start to come clear when I try to program something and all my trees suddenly turn into mushrooms.

>8300 is 33536? Isn't that above 32K bytes (32768)? How can that be?

Take a look at post #24 in my "Assembly My Way" thread:

http://www.atariage.com/forums/topic/162941-assembly-on-the-994a/page__view__findpost__p__2017806

Basically the 99/4A has 3 sources of RAM:

1. 256 bytes (not Kilobyte, just bytes) of 16-bit RAM in the console. This is actually the *only* CPU RAM in the console!

2. 16K bytes of VDP controlled RAM usually referred to as VRAM. You can access it, but only 1 byte at a time and you have to go through the VDP.

3. The 32K expansion in the PEB, *if* the PEB is attached.

A cartridge can also have some RAM in its address space, which the mini-memory does (4K of the 8K cartridge space in the mini-memory is RAM.)

The 9900 CPU is a 16-bit CPU with a 16-bit data bus, so it *always* reads and writes 2 bytes at a time (1 "word"), and the address of the word is *always* even. If the 9900 needs to read / write a single byte, it will always grab both even and odd numbered bytes, then isolate the required byte internally. This is also why the 9900 does a read-before-write on all memory accesses (even when writing a word.)

So, the 9900 can *address* 32768 "words" (2-bytes), which are numbered 0 - 65534, counting by two's. So there are 32768 addresses, totaling 64K *BYTES* of memory (memory is universally measured in bytes.)

Think of it like a FOR NEXT in basic that goes like this:

total_addresses = 0
FOR address = 0 TO 65534 STEP 2
 PRINT "Even (high / MSB) byte address: ", address
 PRINT "Odd (low / LSB) byte address: ", address + 1
 total_addresses = total_addresses + 1
NEXT address

PRINT "Total 16-bit addresses: ", total_addresses
PRINT "Highest address: ", address - 1

Keep in mind this explanation is very brief and my assembly thread covers this in more detail.

So, the addresses are numbered from >0000 to >FFFF, and >8300 is where the 256 bytes of 16-bit RAM start. Since this is real 16-bit RAM, it is the fastest RAM in the 99/4A, and thus very precious. This RAM is typically called the "scratch pad" RAM.

My note about *always* keeping the workspace in scratch pad RAM is because, unlike most other CPUs, the 9900's general purpose registers are NOT stored in the CPU itself! The 9900 only has 3 *real* registers: the Program Counter (PC), the Work Space Pointer (WP), and the Status Register. The "registers" R0 through R15 are actually stored in RAM, with the value in the WP as the starting location of the register memory. Thus, the instruction:

LWPI xxxx

Means, Load Workspace Pointer Immediate, which means: load the workspace pointer with *this* immediate value (the value immediately following the LWPI instruction.) We want to ALWAYS make sure the address in the WP is in the 16-bit scratch pad RAM. In your original code, you reserved some memory for the workspace, however that memory was going to be in the 32K RAM expansion, and that memory is 8-bit RAM and causes wait-states to access, so your whole program will suffer drastically. The instruction in my example:

LWPI >8300 * workspace from >8300 to >831F

Sets the workspace to the start of that 16-bit RAM. This is a very typical location for the scratch pad. Keep in mind that each register is 16-bits (2 bytes, or 1 "word"), so each register uses two bytes and starts on an even address (see my assembly thread for a table.)
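
To make that mapping concrete, here is a small sketch. It assumes the workspace at >8300 from the example; R1LB is a made-up name in the style of R0LB.

```asm
WRKSP  EQU  >8300             * Workspace base: R0 lives at >8300->8301
R1LB   EQU  WRKSP+3           * Low byte of R1 (R1 lives at >8302->8303)
* ... code section ...
       LWPI WRKSP             * WP = >8300, so Rn is the word at >8300 + 2*n
       LI   R1,>1234          * Stores >1234 in memory at >8302
       MOV  @WRKSP+2,R2       * Same word read as plain memory: R2 = >1234
       CLR  R3                * R3 = >0000
       MOVB @R1LB,R3          * Byte at >8303 (>34) goes to the MSB: R3 = >3400
```

With this workspace, R15 ends up at >831E->831F, which is why the comment above gives the range as >8300 to >831F.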

The other thing to remember about assembly is that you always have to remember to *count* zero. Things don't start at 1, they start at 0.

Before I found asm994a, I was playing around with tiasm but couldn't figure out how to get something to actually run in an (emulated) TI.

I have never used the tiasm program. I typically use asm994a with Classic99. See Vorticon's "assembly under emulation" thread for a detailed work flow.

* Store your player X,Y location in a format that makes sense to the game, and not what makes sense to the "screen"

Wow! That's some spoiler! It totally blows the examples in books out of the water. It is very neat how you handle the screen coordinates intuitively (x,y) and then at the end convert to number 0-767. This is code I will certainly reuse. Thank you! Can't wait to get home and type it in so I can get a real look.

That is typical of most BASIC games, and I used to think in terms of the screen myself. That idea of X,Y being separate from the screen is a general "game" thing, I don't take credit for the concept, just spreading the knowledge. Owen learned the same concept for the scrolling map in his RPG game. Most people starting off tend to dump stuff to the screen, then try to read back from the screen to see where the player is, what object may be in the way, etc. As the game gets more complicated, that paradigm gets very difficult to manage, is slow, and has limitations (like what about stuff *off screen*...)

Since your program put the *stuff* on the screen, then your program should know what is there! A better idea is to track all of your objects and such in data structures, then use those structures to draw or update the screen as necessary. Modern 3D games do this by redrawing the entire scene every frame! For 3D you have to do that though, but we can still adopt some of the same concepts for 2D, even on our lowly 99/4A.

Also, the idea of not waiting on user input before something happens, takes some thinking. This is where the idea of a "game loop" comes in, and a basic understanding of a "state machine". See my assembly thread, post #22 for some discussion on this:

http://www.atariage.com/forums/topic/162941-assembly-on-the-994a/page__view__findpost__p__2014877

Owen has been down this road and has asked a lot of these questions, so I think you will find a lot of the threads here on A.A. very helpful. And by all means, ask questions!

Compute doesn't address the bounds problem in the "moving +" example;

maybe the solution they had in mind wouldn't fit in Mini Memory!

The Compute! book does not address a lot of things. It is good for getting you going, but you have to be prepared to move on once you grow past what the book explains. Unfortunately it also demonstrates some things in probably the worst way possible. Some people find this ok: "hey, does it work?" Personally I don't, but as you will probably learn, I'm a freak about execution speed and efficiency.

Your code was good in that you found a solution to your problem. In your case you brute forced the problem, then asked questions. That's a very good way to learn, and a method I have used a lot in the past. Once you have solved a problem though, and understand it, try to come up with a better way.

Doh! This is what was confusing me about DIVision and MOVing bytes around.

Division on the 9900 is slow, try to avoid it whenever possible. Pre-calculate values if possible, and use the shift instructions if you are multiplying or dividing by any power of 2. If you don't understand why a bit-shift left multiplies by 2, and a bit-shift right divides by 2, think of it just like base-10 (decimal numbers.)

Moving the decimal point left in base-10 divides the number by 10, and moving it right multiplies by 10:

In decimal, 10 becomes 100 if you shift (the DIGITS, not the decimal point) to the left (same as multiplying by 10)

In decimal, 10 becomes 1 if you shift (the DIGITS, not the decimal point) to the right (same as dividing by 10)

Binary is shifting bits, not a "decimal point", which is why I made the clarification above. The decimal-point in base-10 is moving in the opposite direction as the digits. In binary there is no "decimal point".

Since computers store numbers in binary and operate in binary (base-2), shifts left and right work on powers of 2 instead of powers of 10.

In binary, 4 (0100) becomes 8 (1000) (multiply by 2) if you shift to the left

In binary, 4 (0100) becomes 2 (0010) (divide by 2) if you shift to the right

Get it? Same concept as decimal, just a different number base.
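
Applied to the screen, a sketch of the same idea with the shift instructions could look like this. It assumes the start position 300 from the original listing, and the register choices are arbitrary.

```asm
       LI   R0,300            * Start position from the original listing
       MOV  R0,R1
       SRL  R1,5              * Shift right 5 = divide by 32: R1 = 9 (row)
       MOV  R0,R2
       ANDI R2,>001F          * Keep the low 5 bits = remainder: R2 = 12 (column)
       SLA  R1,5              * Shift left 5 = multiply by 32: R1 = 288
       A    R2,R1             * R1 = 288 + 12 = 300, back where we started
```

No DIV is needed here because 32 is a power of 2: the shift gives the quotient and the AND gives the remainder.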

As for MOV vs. MOVB, EQU vs. DATA or BYTE, etc. those are in the assembly thread too. Try to do some reading on those first, and feel free to ask questions about anything you still don't understand. Try to be specific and you will get good answers.

       JLE     PRINT           * If only I could stick #s in an array
       CI      R0,32           * and reference it somehow.

I noticed that comment. You could stick the numbers in an array. However in this case there was still a better way. As for arrays in assembly, they do not exist like you think of them in BASIC. In assembly you have to think of an array for what it is, i.e. just a chunk of memory that you use in a particular way. DIM A(10) in BASIC reserves memory for a fixed number of numeric values. Now, in assembly you have to decide if you need BYTE (8-bit) numbers, WORD (16-bit) numbers, or something bigger that the CPU can not deal with directly (like a floating point, fixed point, 32-bit numbers, pointers, etc.) Once you have decided what you need, you just reserve the space, or simply *use* some chunk of memory (assuming you know it is empty.)

With 9900 assembly, the assembler has a directive ("directives" are NOT assembly instructions, they are commands for the assembler) called BSS (Block Starting with Symbol) that could be used for this purpose. It just sets aside a number of bytes for you to reference, and you know there won't be any other code or data in that memory that you did not put there. If your "block" of RAM is for 16-bit words, then you have to access it as such. If it is bytes, then you index in to it as bytes, etc.

If you are unsure, let me know and I'll make an example. However, the screen itself is an example that you seem to already understand. It is just 768 bytes used by the VDP to draw the tiles on the monitor. We have to index in to that memory from byte 0 to byte 767 (768 total, remember to count 0). We mentally think of the screen as a grid of 32x24 tiles, and that is also how we see it on screen, but the computer just sees 768 consecutive bytes in RAM. Byte at address 0 is screen 0,0, byte at address 32 ends up on screen at 0,1, etc.
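
In the meantime, a minimal sketch of what indexing into such a block might look like. The names WORDS and BYTES and the register choices are made up for illustration. Note that R0 cannot be used as the index register in this addressing mode.

```asm
       EVEN                   * Word data must start on an even address
WORDS  BSS  20                * Room for 10 words (2 bytes each)
BYTES  BSS  10                * Room for 10 bytes
* ... code section ...
       LI   R1,3              * Element number, counting from 0
       LI   R2,>2A00          * '*' in the MSB
       MOVB R2,@BYTES(R1)     * Byte array: offset = index, so BYTES[3] = '*'
       SLA  R1,1              * Word array: offset = index * 2
       LI   R2,>1234
       MOV  R2,@WORDS(R1)     * WORDS[3] = >1234
       MOV  @WORDS(R1),R3     * Read it back: R3 = >1234
```

The only difference between the two "arrays" is how the offset is calculated and whether MOV or MOVB is used. The assembler itself just sees reserved bytes.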

Clear as mud, right?

Edited March 25, 2011 by matthew180

sometimes99er · March 25, 2011

Sure DIV is one of the slower instructions, but it does a fine job. If you're doing lots of divisions in time critical loops or similar, then of course, consider ways of avoiding it. It all depends, and sometimes DIV might be the best answer.

matthew180 · March 25, 2011

Sure DIV is one of the slower instructions

DIV is *THE* slowest instruction, by almost a factor of 2. Its fastest time is 92 clock cycles, where most instructions are between 8 and 30 clock cycles depending on the addressing modes. Sure there are times when it is the only solution, just always think about what you are doing before you use it.

You *know* I had to reply to that!

+adamantyr · March 25, 2011

I love how you said putting registers in non-scratchpad will "affect performance drastically"... Most BASIC users wouldn't notice a problem for a LONG time, until they got over-ambitious.

Adamantyr

marc.hull · March 26, 2011

Hey Matthew...

How about sharing your code that does the same as divide but is faster than the instruction with the rest of us. Seems you're on to something if you got that.

+InsaneMultitasker · March 26, 2011

Marc, you need to use the DI (Divide Immediate) instruction. Much, much faster. Immediate, even.

marc.hull · March 26, 2011

Well.... is that fast enough ?

ceratophyllum · March 27, 2011

How are you measuring the execution time of particular instructions?

Just curious.

Working on a swimming pool from hell (cracks, leaky pipes, blown gaskets, half a Barracuda (note capital B), and scary wiring) has been taking me AFK, but I hope to return to my TI soon.

+adamantyr · March 27, 2011

How are you measuring the execution time of particular instructions?

Just curious.

Actually, Thierry Nouspikel already has the 9900 opcode speeds worked out; you can find them here:

http://nouspikel.group.shef.ac.uk//ti99/tms9900.htm#Speed

Multiplication and division are really just add and subtract operations that loop. Multiplication is "Add first value a count of times equal to second value". Division is "subtract first value from second value until remainder is smaller than first value."

The problem with division is that it doesn't have a known stopping point, where multiplication does. That means the time to complete is not a stable value, and it also does a comparison check after each subtraction to see if the remainder is still larger than the divisor. Multiplication doesn't need to do this.

Matthew's main point about multiplication and division is that they are cycle-expensive. In assembly programming, you are often either optimizing for speed or memory. Usually speed ends up costing more memory space. So you should look at your algorithms and make certain that you're not using MPY and DIV when another set of instructions would suffice. If you're always multiplying/dividing by a power of 2, using shift operators is far more economical. Interestingly, many ARM processors also don't have division as an opcode, but they do have multiplication. There's a method, thanks to overflows, that you can use to have multiplication actually do division for you, in fact.

That being said, I very much appreciate having MPY and DIV in the TI opcode system. If you're programming for a 6502, you don't have them, and it HURTS to do operations without them.

Adamantyr

sometimes99er · March 28, 2011

How are you measuring the execution time of particular instructions?

Just curious.

TMS9900 Microprocessor Data Manual found in the pinned development resources thread. Or a more direct link to the TMS9900 Microprocessor Data Manual, page 28.

marc.hull · March 28, 2011

If you're always multiplying/dividing by a power of 2, using shift operators is far more economical.

A point I think is being overlooked is that shift left is not an equivalent to MPY unless you are working in less than 16 bits to begin with and indeed with a power of two. If this is the case then I agree it is the best option. If you are using one of the other 97 percent of the available numbers then multiply is the best choice.

Shift right is definitely not the same as DIV. SRL does not provide a remainder, which has quite a bit of value in determining screen positions in bit map and sprite/character detection as well as other uses.

I think when people talk about cycle times they tend not to take into consideration the amount of work being done in that time. If you compare the two instructions with their equivalents they are extremely fast and infinitely more flexible than shifts... It really can be an issue of not seeing the forest for the trees IMHO.

That being said, I very much appreciate having MPY and DIV in the TI opcode system. If you're programming for a 6502, you don't have them, and it HURTS to do operations without them.

Adamantyr

Seconded !

As a suggestion I would say just learn to program in a style that suits what you can take in right now. Improvements will come as concepts become clearer to you. There are a million ways to skin a cat and they are all correct ;-) .

Tursi · March 29, 2011

There seems to be a bit of an issue brewing over the usage of DIV and MPY which is frankly baffling to me.

The slowest shift is faster than the fastest DIV, so if you are doing a power of 2, then yes, you want to use shifts. And this is true on nearly every microprocessor.

A classic question is "what if my number is not a power of 2?". For example, to multiply? If the number is the sum of two powers of two, for instance, 384, then you can perform two shifts and add the results (x * 256) + (x * 128) == (x * 384). The question is whether the extra instructions are faster.
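
A minimal sketch of the two-shift version of x * 384, with the MPY form next to it for comparison. The register choices are arbitrary, and the shift version only holds while the result fits in 16 bits (x up to 170), whereas MPY produces a full 32-bit result:

```asm
* Shift-and-add: R0 = x on entry, R0 = x * 384 on exit, R1 is scratch
       SLA  R0,7        * R0 = x * 128
       MOV  R0,R1       * R1 = x * 128
       SLA  R0,1        * R0 = x * 256
       A    R1,R0       * R0 = x * 256 + x * 128 = x * 384
*
* MPY: R0 = x on entry, 32-bit product in R0 (high) and R1 (low)
       LI   R2,384      * Multiplier
       MPY  R2,R0       * R0:R1 = x * 384
```

Which one wins depends on where the code and workspace live, so it is worth timing both in place rather than trusting either by default.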

You can also provide shifts over 32-bits by masking and ORing the bits that are being shifted out of one word into the other. This requires a couple of shifts and masks.

On the TI-99/4A, the time an algorithm takes is complicated by the memory structure of the machine. Every time you access any memory location that is on the 16<->8 bit multiplexer, you pay an additional 4 CPU cycles. You have to take into account the time to get the instruction itself, the time to read all the input values, and double the time to write the output values (due to the read-before-write behaviour of the CPU). That's in addition to the time of the instruction itself. We went through all this on the Yahoo group a year or two ago, and I don't want to go through it now, but IIRC, we determined that the double-shift and add could be faster if everything involved was in scratchpad, otherwise there was a good chance that DIV was faster or at least comparable.
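
As a rough illustration of the scratchpad point: the only RAM in the console on the 16-bit bus is the 256-byte scratchpad at >8300, so moving the workspace there is usually the first step. A minimal sketch, assuming the first 32 bytes at >8300 are not needed by anything else the program calls (the name FASTWS is made up):

```asm
FASTWS EQU  >8300       * Scratchpad RAM on the 16-bit bus (assumed free)
*
BEGIN  LWPI FASTWS      * R0-R15 now occupy >8300 through >831F
```

This replaces a workspace reserved with BSS inside the program itself, which ends up in the slower 8-bit expansion RAM.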

Computers are binary machines, this means that you tend to work with powers of two more often than with other numbers -- and if you are not, it is often worth considering whether you can change the data so that you CAN work with powers of two. Likewise, working with numbers larger than the system word size complicates the code and slows it down, generally you want to try to avoid that. You can't always avoid it, but that's why it's a guideline, not a rule.

Marc, you specifically give the example of collecting remainders to calculate screen positions -- since the screen is a power of two characters wide in all modes except text, you would generally be better off masking to get your remainder (AND with 0x1F) than using DIV.
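
A sketch of the masking approach for the 32-column screen, written as a small subroutine. The label COLROW and the register assignments are illustrative only:

```asm
* In:  R0 = screen position, 0-767 (left unchanged)
* Out: R1 = column 0-31, R2 = row 0-23
COLROW MOV  R0,R1
       ANDI R1,>001F    * Column = position AND 31 (the remainder)
       MOV  R0,R2
       SRL  R2,5        * Row = position / 32 (the quotient)
       RT               * Return to the BL caller
```

After BL @COLROW, comparing R1 against 0 or 31 with CI tells you whether the piece is in the leftmost or rightmost column, which is the bounded-playfield test from the first post.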

A second issue that isn't addressed is that both MPY and DIV have a hidden cost that is not shown in the datasheet -- each of them works with a 32-bit value. MPY provides a 32-bit result - except for ensuring you have space for it, this is not generally a concern, but DIV has a 32-bit dividend, and your code must ensure all 32-bits are set up. This means to use DIV on a 16-bit value, you need an extra initialization to zero the other 16-bits.
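
The extra setup looks something like the following sketch, which divides a 16-bit screen position by 32 with DIV. The register choices are arbitrary; the CLR is the hidden cost mentioned above:

```asm
* In: R0 = 16-bit value to divide
       LI   R3,32       * Divisor
       CLR  R1          * High word of the 32-bit dividend must be 0
       MOV  R0,R2       * Low word of the dividend
       DIV  R3,R1       * R1 = quotient (row), R2 = remainder (column)
```

If the high word is not smaller than the divisor, DIV sets the overflow status bit and leaves the registers alone, so forgetting the CLR tends to fail in confusing ways.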

DIV and MPY are probably the best tools for the job when enough of the following points are true (and 'enough' is up to the developer):

-You are dealing with non-power of two values (usually the main decision)

-You can not adapt the dataset to powers of two, or doing so is more effort than the payoff

-Converting the math to powers of two complicates the code to be more expensive than DIV

-You need to work with 32-bit values

-Maximum performance is not important (or it is not possible to adapt to the above schemes)

They generally are not the best tools for the job for simple tasks like mapping a screen address to a coordinate pair, or vice-versa. But this is programming, and that's why it's "generally", not "always". When someone is just learning, it's better to do what makes the most sense first, and then learn the "tricks" to do the same thing more quickly -- or even better, to learn how to measure for themselves what works "best".

unhuman · March 29, 2011

The slowest shift is faster than the fastest DIV, so if you are doing a power of 2, then yes, you want to use shifts. And this is true on nearly every microprocessor.

A classic question is "what if my number is not a power of 2?". For example, to multiply? If the number is the sum of two powers of two, for instance, 384, then you can perform two shifts and add the results (x * 256) + (x * 128) == (x * 384). The question is whether the extra instructions are faster.

I've done some silly stuff to try and sort of do a compromise of shifts and divisions (in my limited hardware level math work). If I'm dividing by a known number that has a power of 2 as a factor, I'll just use a combination of shifts and then some simple div...

For example if I want to divide by 48, I'll shift 5 and then divide by 3.

-H

marc.hull · March 29, 2011

I don't think it's a real issue Mike. It's just a difference of opinions.... I completely agree with your last statement and I think that issue has caused the little bit of grief here. Hopefully I and we haven't totally f'd up this thread with this silly hijacking ;-)

Tursi · March 30, 2011

For example if I want to divide by 48, I'll shift 5 and then divide by 3.

Is that a typo... I don't see how that would work?
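
If the intent was to split 48 into 16 x 3, the shift count would be 4 rather than 5. A sketch under that assumption, with arbitrary register choices:

```asm
* In: R0 = x.  Out: R1 = x / 48
       LI   R3,3        * Divisor for the DIV step
       CLR  R1          * High word of the dividend
       MOV  R0,R2       * Low word of the dividend
       SRL  R2,4        * R2 = x / 16
       DIV  R3,R1       * R1 = (x / 16) / 3 = x / 48
```

The quotient comes out right, but R2 afterwards is the remainder of (x / 16) / 3, not the remainder of x / 48.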

Willsy · March 30, 2011

Interesting thread.

I needed to parse numbers (in a string) just the other day. The numbers are in base 10. By the time I got to writing that particular section of code, register usage was quite dense, and I was simply too lazy to re-factor the code, especially since it was already working.

Since the numbers are decimal, you're multiplying by 10 to build the number up. I ended up using the old 8x+2x trick, too lazy to use MPY :ponder:

So, to multiply r0 by 10:

MOV R0,R1   ; COPY R0
SLA R0,3    ; MULTIPLY R0 BY 8
A R1,R1     ; MULTIPLY R1 BY 2
A R1,R0     ; R0=R0*10

Not sure if that would be faster than MPY - probably not, since the code is in 8-bit memory. However, it wasn't time-critical code, and it makes register planning a lot easier.
