ceratophyllum Posted March 23, 2011

I was reading Beginning Assembly and the Compute book and I decided it was time to make a little ASCII nethack man move around the screen. At first, to see what was happening, I set the uh "man" to change with the 4 directions and didn't bother to erase. I sort of like how it looks. Reminds me of the patterns in the gravel in a Zen Rock Garden.

I'm still too confused to mess with scrolling, however I figured--WRONGLY--that it should not be too hard to set up a one-screen bounded playfield: no wrap around. Err, I mean a screen like in Robot Finds Kitten. That is, I move around Left and Right by INC, DEC and watch out for the rightmost and leftmost columns. (Thanks to the way the screen is numbered, it's easy to keep from going off the top or bottom.)

If I were in BASIC I would just check the remainder=0 when dividing by 32 for the leftmost column. However, I've got to go back to MiniMemory and LBLA to see just how DIV works. Anyhow, thanks to the wonders of modern text editing and a little kludgery, I at least managed to implement a bone-head solution. Is there a better way, not using DIV, that hasn't occurred to me?

Extremely goofy code follows:

    ******************************
    *      Zen ASCII Garden      *
    *    arrow keys (E,D,X,S)    *
    ******************************
           DEF  BEGIN
           REF  KSCAN,VSBW
    KBOARD EQU  >8374           * Holds ASCII # of pressed key
    KEY    EQU  >8375
    * Split keyboard key codes Ext'd Basic man. page 201
    KEYER  BYTE 15              * z
    KEYUP  BYTE 5               * e
    KEYRT  BYTE 3               * d
    KEYDN  BYTE 0               * x
    KEYLT  BYTE 2               * s
    HEXFF  BYTE >FF             * No key pressed value
    ONE    BYTE 1
    *
    MYREG  BSS  >20
    *
    BEGIN  LWPI MYREG
           MOVB @ONE,@KBOARD    * Check left side of keyboard.
           LI   R0,300          * initial position of piece
    LOOP   BLWP @KSCAN          * Check for keyboard input.
    *
           LI   R7,6000         * delay length
    DLAY   DEC  R7              * R7=R7-1
           JNE  DLAY            * IF R7>0 goto DLAY
    *
           CB   @HEXFF,@KEY     * Was a key pressed?
           JEQ  LOOP
    *
           CB   @KEYUP,@KEY     * Compare to see which
           JEQ  PUP             * arrow key was pressed.
           CB   @KEYRT,@KEY
           JEQ  PRIGHT
           CB   @KEYDN,@KEY
           JEQ  PDOWN
           CB   @KEYLT,@KEY
           JEQ  PLEFT
           CB   @KEYER,@KEY
           JEQ  PERASE          * I added an erase key.
           B    @LOOP           * no key GOTO LOOP
    *
    PERASE LI   R1,>2000        * set piece to space
           B    @PRINT          * used to erase a ><^V
    PUP    LI   R1,>5E00        * set piece to ^
           CI   R0,32           * are we in top row?
           JLT  SKIP1           * yes? GOTO SKIP1
           AI   R0,-32          * no? move up one row
    SKIP1  B    @PRINT          * branch to display piece
    PDOWN  LI   R1,>5600        * set piece to V
           CI   R0,735          * Are we at bottom row?
           JGT  SKIP2           * If yes, skip & dont move down
           AI   R0,32           * move row down
    SKIP2  B    @PRINT          * branch to display piece
    PRIGHT LI   R1,>3E00        * set piece to >
           CI   R0,31           * stop at last column on right
           JEQ  SKIP3           * kludgy as hell way to deal with
           CI   R0,63           * long jumps: 2 shorter jumps
           JEQ  SKIP3
           CI   R0,95
           JEQ  SKIP3
           CI   R0,127
           JEQ  SKIP3
           CI   R0,159
           JEQ  SKIP3
           CI   R0,191
           JEQ  SKIP3
           CI   R0,223
           JEQ  SKIP3
           CI   R0,255
           JEQ  SKIP3
           CI   R0,287
           JEQ  PRINT           * now close enough to reach PRINT
           CI   R0,319
           JEQ  PRINT
           CI   R0,351
           JEQ  PRINT
           CI   R0,383
           JEQ  PRINT
           CI   R0,415
           JEQ  PRINT
           CI   R0,447
           JEQ  PRINT
           CI   R0,479
           JEQ  PRINT
           CI   R0,511
           JEQ  PRINT
           CI   R0,543
           JEQ  PRINT
           CI   R0,575
           JEQ  PRINT
           CI   R0,607
           JEQ  PRINT
           CI   R0,639
           JEQ  PRINT
           CI   R0,671
           JEQ  PRINT
           CI   R0,703
           JEQ  PRINT
           CI   R0,735
           JEQ  PRINT
           CI   R0,767
           JEQ  PRINT
           INC  R0
    SKIP3  B    @PRINT
    PLEFT  LI   R1,>3C00
           CI   R0,0            * stop marker from going offscreen on left
           JLE  PRINT           * If only I could stick #s in an array
           CI   R0,32           * and reference it somehow.
           JEQ  PRINT
           CI   R0,64
           JEQ  PRINT
           CI   R0,96
           JEQ  PRINT
           CI   R0,128
           JEQ  PRINT
           CI   R0,160
           JEQ  PRINT
           CI   R0,192
           JEQ  PRINT
           CI   R0,224
           JEQ  PRINT
           CI   R0,256
           JEQ  PRINT
           CI   R0,288
           JEQ  PRINT
           CI   R0,320
           JEQ  PRINT
           CI   R0,352
           JEQ  PRINT
           CI   R0,384
           JEQ  PRINT
           CI   R0,416
           JEQ  PRINT
           CI   R0,448
           JEQ  PRINT
           CI   R0,480
           JEQ  PRINT
           CI   R0,512
           JEQ  PRINT
           CI   R0,544
           JEQ  PRINT
           CI   R0,576
           JEQ  PRINT
           CI   R0,608
           JEQ  PRINT
           CI   R0,640
           JEQ  PRINT
           CI   R0,672
           JEQ  PRINT
           CI   R0,704
           JEQ  PRINT
           CI   R0,736
           JEQ  PRINT
           DEC  R0
           B    @PRINT
    PRINT  BLWP @VSBW           * finally put that text on
           B    @LOOP           * the screen
           END
+adamantyr Posted March 23, 2011

Boundary checking in assembly isn't easy or simple. Kudos on creating a working solution!

Probably the most important paradigm to embrace here is that the screen is not a data storage area. What you're doing is storing a value in a linear array 768 bytes long. The fact that it's the screen is just how the data is being displayed. You're using a single register to track an index in this array, which is why it's not easy to suddenly have it behave like it's in a two-coordinate system.

So, one solution is to use two data words to store a row and column value, and then calculate the index value on the screen from that. Then you can just check that the row and column are within boundary limits before calculating the index value. Here's a short snippet showing how it would work:

    ROW    BSS  2
    COL    BSS  2
    .
    .
    .
    CHECK  MOV  @ROW,R0
           CI   R0,24
           JL   CHECK1          * Value is 0 to 23
           CLR  R0
    CHECK1 MOV  R0,@ROW
           MOV  @COL,R0
           ANDI R0,>001F        * AND with 31: column wraps around, so -1 becomes 31 and 32 becomes 0
           MOV  R0,@COL
           MOV  @ROW,R0
           SLA  R0,5            * Multiply by 32
           A    @COL,R0

This is a bit crude... I have wrapping working for columns but rows will be strange. Also note that by using the SLA (Shift Left Arithmetic) instruction, I can multiply by any power of 2. You can use SRL or SRA to divide in much the same way.

Does this help?

Adamantyr
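Going the other way, from a screen index back to a row and a column, uses the same shift idea. The following is only a sketch: it assumes the index is in R0 and reuses the ROW and COL words from the snippet above, while the label SPLIT and the use of R1 as scratch are made up for illustration.

```asm
* Split a screen index (0-767) in R0 into ROW and COL.
* SPLIT is a hypothetical label; ROW and COL are the words defined above.
SPLIT  MOV  R0,R1           * copy the index so R0 is left untouched
       ANDI R1,>001F        * keep the low 5 bits: column 0-31
       MOV  R1,@COL
       MOV  R0,R1           * fresh copy of the index
       SRL  R1,5            * shift right 5 places: divide by 32, row 0-23
       MOV  R1,@ROW
       B    *R11            * return; call with BL @SPLIT
```

For the starting position used in the first post, 300 = 9 * 32 + 12, so this would leave 9 in ROW and 12 in COL.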
sometimes99er Posted March 23, 2011

ceratophyllum wrote: Is there a better way, not using DIV, that hasn't occurred to me?

    SRL  R0,>5    ; Moves bits 5 places to the right = divides by 32 (to go from screen location to line no.)
    SLA  R0,>5    ; Moves bits 5 places to the left = multiply by 32 (to go from line no. to screen location)

Shift Right Logical (EA manual page 198)
Shift Left Arithmetic (EA manual page 200)

    ANDI R0,>1F   ; isolates last 5 bits = will then contain a value between 0 and 31

You should not move left from 0, and not move right from 31.
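Applied to the program in the first post, the ANDI trick could stand in for the two long CI/JEQ chains. What follows is a sketch under assumptions: R0 holds the screen position as in that program, the LOOP label and the VSBW utility exist as they do there, and R2 is an arbitrary choice of scratch register.

```asm
* Sketch of PRIGHT/PLEFT for the first post's program (R0 = position 0-767).
* R2 is an arbitrary scratch register, not something from the thread.
PRIGHT LI   R1,>3E00        * set piece to >
       MOV  R0,R2           * copy the position
       ANDI R2,>001F        * R2 = column 0-31
       CI   R2,31           * already in the rightmost column?
       JEQ  PRINT           * yes: redraw without moving
       INC  R0              * no: move one column right
       JMP  PRINT
PLEFT  LI   R1,>3C00        * set piece to <
       MOV  R0,R2           * copy the position
       ANDI R2,>001F        * R2 = column; ANDI sets EQ when the result is 0
       JEQ  PRINT           * leftmost column: redraw without moving
       DEC  R0              * move one column left
PRINT  BLWP @VSBW           * write the MSB of R1 to VDP address R0
       B    @LOOP
```

Working on a copy keeps the position in R0 intact, and because everything is now within short-jump range the two-stage SKIP3 jump is no longer needed.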
matthew180 Posted March 23, 2011

Some ideas I recommend:

* Don't use registers to store variables. Allocate memory for variables and use registers to do work.
* Always set your workspace inside the 16-bit fast RAM, i.e. >8300
* Store your player X,Y location in a format that makes sense to the game, and not what makes sense to the "screen"
* The screen is output only - you should be able to redraw it at any time
* Use WASD for direction control, yes, even on the 99/4A. It is much more comfortable for one handed control

Here is an example based on your code. It introduces a few concepts mentioned above, and gives you something to tinker with (or not.) Some things to pay attention to are the difference between DATA (or BYTE, TEXT, or BSS) and an EQUate, and when / how to use each. Also, byte operations on registers *always* affect the MSB only. Same with a byte operation on a label defined with DATA.

I tested this with asm994a and Classic99. Mess with the TRESET value to change the speed. Hope this helps.

    *********************************************************************
    *
    * Zen ASCII Garden
    *
    * Normal PC style W,A,S,D keys...  Yes, even works on a 99/4A and
    * is much more comfortable than the "arrow" key arrangement.
    *
           DEF  BEGIN
           REF  KSCAN

    * VDP Memory Map
    *
    VDPRD  EQU  >8800           * VDP read data
    VDPSTA EQU  >8802           * VDP status
    VDPWD  EQU  >8C00           * VDP write data
    VDPWA  EQU  >8C02           * VDP set read/write address

    VR1CPY EQU  >83D4           * Copy of VDP register 1 - see E/A manual pg. 248
    VSYNC  EQU  >83D7           * Vertical Sync

    * Workspace
    WRKSP  EQU  >8300           * Workspace
    R0LB   EQU  WRKSP+1         * R0 low byte for VDP routines

    KBOARD EQU  >8374           * Holds ASCII # of pressed key
    KEY    EQU  >8375

    TRESET EQU  200             * Keyboard read delay

    * Keyboard test delay
    TIMER  DATA 0

    * Bounds (zero-based), 1 character boundary
    BTOP   DATA 1
    BLEFT  DATA 1
    BBTM   DATA 22
    BRIGHT DATA 30

    * Player location data
    P1X    DATA 0
    P1Y    DATA 0
    P1XD   DATA 0               * New X
    P1YD   DATA 0               * New Y
    P1CUR  BYTE 0               * Current character
    P1NEW  BYTE 0               * New character

    * Player character codes
    P1UP   BYTE >5E             * ^
    P1DN   BYTE >76             * v
    P1LT   BYTE >3C             * <
    P1RT   BYTE >3E             * >
    P1ERSE BYTE >20             * _space_

    * Split keyboard key codes XB man. page 201
    KEYUP  BYTE 4               * w
    KEYLT  BYTE 1               * a
    KEYDN  BYTE 2               * s
    KEYRT  BYTE 3               * d
    KEYER  BYTE 15              * z - erase
    HEXFF  BYTE >FF             * No key pressed value
    ONE    BYTE 1

           EVEN

    * Entry point
    BEGIN  LIMI 0               * Interrupts off for writing to the screen
           LWPI WRKSP

    * Clear the screen
    * Assumes the screen is already in Graphics Mode I (24x32)
    * R0   Starting write address in VDP RAM
    * R1   MSB of R1 sent to VDP RAM
    * R2   Number of times to write the MSB byte of R1 to VDP RAM
           CLR  R0
           LI   R1,>2000        * >20 is hex for 32 (space character)
           LI   R2,768
           BL   @VSMW

    * Set initial player location
           LI   R0,16
           MOV  R0,@P1X
           LI   R0,12
           MOV  R0,@P1Y
           MOV  @P1X,@P1XD
           MOV  @P1Y,@P1YD
           MOVB @P1UP,@P1CUR    * Set initial direction (high byte of P1CUR)
           MOVB @P1CUR,@P1NEW   * Set "new" character to current character

    * Set keyboard timer
           LI   R0,TRESET
           MOV  R0,@TIMER

    **
    * Main game loop
    *
    LOOP
    * Check if time to read keyboard.
    * Should really do this with the VDP interrupt...
           DEC  @TIMER          * TIMER := TIMER - 1
           JEQ  KEY00           * IF TIMER == 0 THEN read keyboard
           B    @STUFF

    KEY00
    * Reset keyboard timer
           LI   R0,TRESET
           MOV  R0,@TIMER

           MOVB @ONE,@KBOARD    * Check left side of keyboard
           BLWP @KSCAN          * Check for keyboard input
           CB   @HEXFF,@KEY     * Was "some" key pressed?
           JNE  KEY01           * Yes, test if valid
           B    @STUFF          * Otherwise do other stuff

    KEY01  CB   @KEYUP,@KEY     * Test each valid key
           JNE  KEY02
           DEC  @P1YD
           MOVB @P1UP,@P1NEW
           B    @BOUND

    KEY02  CB   @KEYDN,@KEY
           JNE  KEY03
           INC  @P1YD
           MOVB @P1DN,@P1NEW
           B    @BOUND

    KEY03  CB   @KEYLT,@KEY
           JNE  KEY04
           DEC  @P1XD
           MOVB @P1LT,@P1NEW
           B    @BOUND

    KEY04  CB   @KEYRT,@KEY
           JNE  KEY05
           INC  @P1XD
           MOVB @P1RT,@P1NEW
           B    @BOUND

    KEY05  CB   @KEYER,@KEY
           JNE  KEY06
           MOVB @P1ERSE,@P1NEW

    KEY06

    * Bounds check.  NOTE, using *SIGNED* tests here to support no border.
    * Could be more efficient with UNSIGNED tests because the 9900 only
    * has 2 signed tests: JGT and JLT, but many unsigned tests.  Thus the
    * use of TWO tests, one for > or < and one for =
    BOUND  C    @P1YD,@BTOP
           JGT  BOUND1
           JEQ  BOUND1
           MOV  @P1Y,@P1YD      * Reset out of bound Y

    BOUND1 C    @P1YD,@BBTM
           JLT  BOUND2
           JEQ  BOUND2
           MOV  @P1Y,@P1YD      * Reset out of bound Y

    BOUND2 C    @P1XD,@BLEFT
           JGT  BOUND3
           JEQ  BOUND3
           MOV  @P1X,@P1XD      * Reset out of bound X

    BOUND3 C    @P1XD,@BRIGHT
           JLT  BOUND4
           JEQ  BOUND4
           MOV  @P1X,@P1XD      * Reset out of bound X

    BOUND4
    * Update legal X,Y location
           MOV  @P1XD,@P1X
           MOV  @P1YD,@P1Y

    **
    * Do other game stuff, draw the screen, etc.
    *
    STUFF

    * Draw screen and player
    DRAW
    * Save new character value, if any.
           MOVB @P1NEW,@P1CUR

    * Calculate player screen location as
    * loc = y * 32 + x
           MOV  @P1Y,R0         * R0 := Y
           SLA  R0,5            * Multiply R0 (Y) by 32
           A    @P1X,R0         * Add X
           MOVB @P1NEW,R1

    * R0   Write address in VDP RAM
    * R1   MSB of R1 sent to VDP RAM
           BL   @VSBW

    * Enable / disable interrupts really quick to let the
    * ISR run.  Don't need to do this if you are not using
    * stuff the ISR offers...
           LIMI 2
           LIMI 0

           B    @LOOP           * Go back to main loop top

    *********************************************************************
    *
    * VDP Single Byte Write
    *
    * R0   Write address in VDP RAM
    * R1   MSB of R1 sent to VDP RAM
    *
    * R0 is modified, but can be restored with: ANDI R0,>3FFF
    *
    VSBW   MOVB @R0LB,@VDPWA    * Send low byte of VDP RAM write address
           ORI  R0,>4000        * Set read/write bits 14 and 15 to write (01)
           MOVB R0,@VDPWA       * Send high byte of VDP RAM write address
           MOVB R1,@VDPWD       * Write byte to VDP RAM
           B    *R11
    *// VSBW

    *********************************************************************
    *
    * VDP Single Byte Multiple Write
    *
    * R0   Starting write address in VDP RAM
    * R1   MSB of R1 sent to VDP RAM
    * R2   Number of times to write the MSB byte of R1 to VDP RAM
    *
    * R0 is modified, but can be restored with: ANDI R0,>3FFF
    *
    VSMW   MOVB @R0LB,@VDPWA    * Send low byte of VDP RAM write address
           ORI  R0,>4000        * Set read/write bits 14 and 15 to write (01)
           MOVB R0,@VDPWA       * Send high byte of VDP RAM write address
    VSMWLP MOVB R1,@VDPWD       * Write byte to VDP RAM
           DEC  R2              * Byte counter
           JNE  VSMWLP          * Check if done
           B    *R11
    *// VSMW

    *********************************************************************
    *
    * VDP Write To Register
    *
    * R0 MSB    VDP register to write to
    * R0 LSB    Value to write
    *
    VWTR   MOVB @R0LB,@VDPWA    * Send low byte (value) to write to VDP register
           ORI  R0,>8000        * Set up a VDP register write operation (10)
           MOVB R0,@VDPWA       * Send high byte (address) of VDP register
           B    *R11
    *// VWTR

           END
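The unsigned alternative mentioned in the BOUND comments might look something like the sketch below. It reuses the variables and the BOUND, BOUND2 and BOUND4 labels from the listing; BADY and BADX are made-up labels. It assumes coordinates are never legitimately negative: a value that has been decremented past zero reads as >FFFF in an unsigned compare, so it fails the upper-bound test and is rejected there.

```asm
* Unsigned bounds check: one jump per compare instead of two.
* BADY and BADX are hypothetical labels; the rest come from the listing.
BOUND  C    @P1YD,@BTOP
       JL   BADY            * new Y < top bound
       C    @P1YD,@BBTM
       JLE  BOUND2          * new Y <= bottom bound: keep it
BADY   MOV  @P1Y,@P1YD      * out of bounds (or wrapped to >FFFF): restore Y
BOUND2 C    @P1XD,@BLEFT
       JL   BADX            * new X < left bound
       C    @P1XD,@BRIGHT
       JLE  BOUND4          * new X <= right bound: keep it
BADX   MOV  @P1X,@P1XD      * out of bounds (or wrapped to >FFFF): restore X
BOUND4 MOV  @P1XD,@P1X      * update legal X,Y location
       MOV  @P1YD,@P1Y
```

JL and JLE are the logical (unsigned) jumps, so each test needs a single branch, and the wrap-around case also covers a playfield with no border (bounds of 0).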
marc.hull Posted March 24, 2011

That looks really good! Welcome to the assembly club.... Keep on keeping on, cerato.....
ceratophyllum (Author) Posted March 25, 2011

matthew180 wrote: Always set your workspace inside the 16-bit fast RAM, i.e. >8300

I am still hazy about the internals of the TI99 4/A. I figured it would start to come clear when I try to program something and all my trees suddenly turn into mushrooms. >8300 is 33536? Isn't that above 32K bytes (32768)? How can that be?

Before I found asm994a, I was playing around with tiasm but couldn't figure out how to get something to actually run in an (emulated) TI. (tiasm source is in the v9t9 linux source in the tools directory.) I guess it spits out some kind of cartridge image?

    ~/Games/ti994a>./tiasm
    TIASM <input file> [-r <console ROM output>] [-m <module ROM output>]
          [-d <DSR ROM output>] [-g <console GROM output>] [<list file>]
    -r saves the 8k memory block at >0000.
    -m saves the 8k memory block at >6000.
    -d saves the 8k memory block at >4000.
    -g saves the 24k memory block at >0000.
    This can only be used with -m.

matthew180 wrote: Store your player X,Y location in a format that makes sense to the game, and not what makes sense to the "screen"

Wow! That's some spoiler! It totally blows the examples in books out of the water. It is very neat how you handle the screen coordinates intuitively (x,y) and then at the end convert to a number 0-767. This is code I will certainly reuse. Thank you! Can't wait to get home and type it in so I can get a real look.

Compute doesn't address the bounds problem in the "moving +" example; maybe the solution they had in mind wouldn't fit in Mini Memory!

matthew180 wrote: Here is an example based on your code. It introduces a few concepts mentioned above, and gives you something to tinker with (or not.) Some things to pay attention to are the difference between DATA (or BYTE, TEXT, or BSS) and an EQUate, and when / how to use each. Also, byte operations on registers *always* affect the MSB only. Same with a byte operation on a label defined with DATA.

Doh! This is what was confusing me about DIVision and MOVing bytes around.
sometimes99er Posted March 25, 2011

ceratophyllum wrote: >8300 is 33536? Isn't that above 32K bytes (32768)? How can that be?

There's a small RAM space, the so-called ScratchPad, for the CPU. It's only 256 bytes, but faster (16 bit) than, I guess, most other RAM. The 32K Expansion is multiplexed (2x8 bit). The ScratchPad is located at >8300 to >83FF. ScratchPad is the only CPU RAM on the bare-bones unexpanded console. Game cartridges will then often need to use VDP RAM for storing variables and stuff.

ceratophyllum wrote: Before I found asm994a, I was playing around with tiasm but couldn't figure out how to get something to actually run in an (emulated) TI. (tiasm source is in the v9t9 linux source in the tools directory.) I guess it spits out some kind of cartridge image?

Yep, quick and dirty, used it a lot. Some opcodes are not standard: for example, all indirect autoincrement addressing is written +*R1 instead of *R1+. It won't run under Windows7, and I didn't bother to recompile or patch, since I had to move to WinAsm99 sooner or later.

Edited March 25, 2011 by sometimes99er
marc.hull Posted March 25, 2011

ceratophyllum wrote: I am still hazy about the internals of the TI99 4/A. I figured it would start to come clear when I try to program something and all my trees suddenly turn into mushrooms. >8300 is 33536? Isn't that above 32K bytes (32768)? How can that be?

The TI has 64K of mapped memory space. The 32K of RAM is mapped (mostly) higher than 32767. Maybe someone has a map available?
lucien2 Posted March 25, 2011

From E/A Manual:
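For readers without the manual to hand, the main regions of the console's 64K address space can be written down as EQUates. This is a sketch only: the label names are invented for illustration (VDPRD and VDPWD match the names used in the listing earlier in the thread), and only the addresses are meant to reflect the standard 99/4A memory map.

```asm
* TI-99/4A CPU address space (most label names are hypothetical).
CONROM EQU  >0000           * >0000->1FFF console ROM (8K)
LOWEXP EQU  >2000           * >2000->3FFF 32K expansion, low 8K
DSRROM EQU  >4000           * >4000->5FFF peripheral (DSR) ROM
CARTSP EQU  >6000           * >6000->7FFF cartridge ROM/RAM
PADRAM EQU  >8300           * >8300->83FF scratch pad RAM (256 bytes)
VDPRD  EQU  >8800           * VDP read data port
VDPWD  EQU  >8C00           * VDP write data port
HIEXP  EQU  >A000           * >A000->FFFF 32K expansion, high 24K
```

This shows why >8300 can sit above 32768: the 32K expansion is split into an 8K block low in the map and a 24K block at the top, with ROM, cartridge space and memory-mapped devices in between.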
matthew180 Posted March 25, 2011

ceratophyllum wrote: I am still hazy about the internals of the TI99 4/A. I figured it would start to come clear when I try to program something and all my trees suddenly turn into mushrooms. >8300 is 33536? Isn't that above 32K bytes (32768)? How can that be?

Take a look at post #24 in my "Assembly My Way" thread: http://www.atariage.com/forums/topic/162941-assembly-on-the-994a/page__view__findpost__p__2017806

Basically the 99/4A has 3 sources of RAM:

1. 256 bytes (not Kilobytes, just bytes) of 16-bit RAM in the console. This is actually the *only* CPU RAM in the console!
2. 16K bytes of VDP controlled RAM, usually referred to as VRAM. You can access it, but only 1 byte at a time and you have to go through the VDP.
3. The 32K expansion in the PEB, *if* the PEB is attached.

A cartridge can also have some RAM in its address space, which the mini-memory does (4K of the 8K cartridge space in the mini-memory is RAM.)

The 9900 CPU is a 16-bit CPU with a 16-bit data bus, so it *always* reads and writes 2 bytes at a time (1 "word"), and the address of the word is *always* even. If the 9900 needs to read / write a single byte, it will always grab both the even and odd numbered bytes, then isolate the required byte internally. This is also why the 9900 does a read-before-write on all memory accesses (even when writing a word.)

So, the 9900 can *address* 32768 "words" (2 bytes each), which are numbered 0 - 65534, counting by twos. So there are 32768 addresses, totaling 64K *BYTES* of memory (memory is universally measured in bytes.)
Think of it like a FOR NEXT in BASIC that goes like this:

    total_addresses = 0
    FOR address = 0 TO 65534 STEP 2
    PRINT "Even (high / MSB) byte address: ", address
    PRINT "Odd (low / LSB) byte address: ", address + 1
    total_addresses = total_addresses + 1
    NEXT address
    PRINT "Total 16-bit addresses: ", total_addresses
    PRINT "Highest address: ", address - 1

Keep in mind this explanation is very brief and my assembly thread covers this in more detail.

So, the addresses are numbered from >0000 to >FFFF, and >8300 is where the 256 bytes of 16-bit RAM start. Since this is real 16-bit RAM, it is the fastest RAM in the 99/4A, and thus very precious. This RAM is typically called the "scratch pad" RAM.

My note about *always* keeping the workspace in scratch pad RAM is because, unlike most other CPUs, the 9900's general purpose registers are NOT stored in the CPU itself! The 9900 only has 3 *real* registers: the Program Counter (PC), the Work Space Pointer (WP), and the Status Register. The "registers" R0 through R15 are actually stored in RAM, with the value in the WP as the starting location of the register memory. Thus, the instruction:

    LWPI xxxx

Means, Load Workspace Pointer Immediate, which means: load the workspace pointer with *this* immediate value (the value immediately following the LWPI instruction.) We want to ALWAYS make sure the address in the WP is in the 16-bit scratch pad RAM.

In your original code, you reserved some memory for the workspace, however that memory was going to be in the 32K RAM expansion, and that memory is 8-bit RAM and causes wait-states to access, so your whole program will suffer drastically. The instruction in my example:

    LWPI >8300      * workspace from >8300 to >831F

Sets the workspace to the start of that 16-bit RAM. This is a very typical location for the workspace. Keep in mind that each register is 16 bits (2 bytes, or 1 "word"), so each register uses two bytes and starts on an even address (see my assembly thread for a table.)
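To make the "registers live in RAM" point concrete, here is a small sketch. It assumes the workspace is at >8300 as above; WRKSP and R0LB follow the names in the earlier listing, while R1ADR is a made-up name for the word that holds R1.

```asm
WRKSP  EQU  >8300           * workspace at the start of scratch pad RAM
R0LB   EQU  WRKSP+1         * low byte of R0
R1ADR  EQU  WRKSP+2         * hypothetical name: the word that holds R1

       LWPI WRKSP           * R0-R15 now occupy >8300->831F
       LI   R1,>1234        * stores >1234 in the word at >8302
       MOV  @R1ADR,R2       * reads that same word back: R2 = >1234
       LI   R0,>00AB
       MOVB @R0LB,R3        * MSB of R3 = >AB, the low byte of R0
```

Register n simply lives at WP + 2n, which is also why a byte operation on a register touches the even (high) byte, and why the VDP routines use R0LB to reach the low byte of R0.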
The other thing to remember about assembly is that you always have to remember to *count* zero. Things don't start at 1, they start at 0.

ceratophyllum wrote: Before I found asm994a, I was playing around with tiasm but couldn't figure out how to get something to actually run in an (emulated) TI.

I have never used the tiasm program. I typically use asm994a with Classic99. See Vorticon's "assembly under emulation" thread for a detailed work flow.

ceratophyllum wrote (replying to "Store your player X,Y location in a format that makes sense to the game, and not what makes sense to the "screen""): Wow! That's some spoiler! It totally blows the examples in books out of the water. It is very neat how you handle the screen coordinates intuitively (x,y) and then at the end convert to a number 0-767. This is code I will certainly reuse. Thank you! Can't wait to get home and type it in so I can get a real look.

Thinking in terms of the screen is typical of most BASIC games, and I used to do it myself. The idea of X,Y being separate from the screen is a general "game" thing. I don't take credit for the concept, I'm just spreading the knowledge. Owen learned the same concept for the scrolling map in his RPG game.

Most people starting off tend to dump stuff to the screen, then try to read back from the screen to see where the player is, what object may be in the way, etc. As the game gets more complicated, that paradigm gets very difficult to manage, is slow, and has limitations (like what about stuff *off screen*...) Since your program put the *stuff* on the screen, your program should know what is there!

A better idea is to track all of your objects and such in data structures, then use those structures to draw or update the screen as necessary. Modern 3D games do this by redrawing the entire scene every frame! For 3D you have to do that, but we can still adopt some of the same concepts for 2D, even on our lowly 99/4A.

Also, the idea of not waiting on user input before something happens takes some thinking.
This is where the idea of a "game loop" comes in, and a basic understanding of a "state machine". See my assembly thread, post #22 for some discussion on this: http://www.atariage.com/forums/topic/162941-assembly-on-the-994a/page__view__findpost__p__2014877

Owen has been down this road and has asked a lot of these questions, so I think you will find a lot of the threads here on A.A. very helpful. And by all means, ask questions!

ceratophyllum wrote: Compute doesn't address the bounds problem in the "moving +" example; maybe the solution they had in mind wouldn't fit in Mini Memory!

The Compute! book does not address a lot of things. It is good for getting you going, but you have to be prepared to move on once you grow past what the book explains. Unfortunately it also demonstrates some things in probably the worst way possible. Some people find this ok: "hey, does it work?" Personally I don't, but as you will probably learn, I'm a freak about execution speed and efficiency.

Your code was good in that you found a solution to your problem. In your case you brute forced the problem, then asked questions. That's a very good way to learn, and a method I have used a lot in the past. Once you have solved a problem though, and understand it, try to come up with a better way.

ceratophyllum wrote: Doh! This is what was confusing me about DIVision and MOVing bytes around.

Division on the 9900 is slow, try to avoid it whenever possible. Pre-calculate values if possible, and use the shift instructions if you are multiplying or dividing by any power of 2. If you don't understand why a bit-shift left multiplies by 2, and a bit-shift right divides by 2, think of it just like base-10 (decimal numbers.)
Moving the decimal point left in base-10 divides the number by 10, and moving it right multiplies by 10:

In decimal, 10 becomes 100 if you shift (the DIGITS, not the decimal point) to the left (same as multiplying by 10)
In decimal, 10 becomes 1 if you shift (the DIGITS, not the decimal point) to the right (same as dividing by 10)

Binary is shifting bits, not a "decimal point", which is why I made the clarification above. The decimal point in base-10 moves in the opposite direction to the digits. In binary there is no "decimal point". Since computers store numbers in binary and operate in binary (base-2), shifts left and right work on powers of 2 instead of powers of 10.

In binary, 4 (0100) becomes 8 (1000) (multiply by 2) if you shift to the left
In binary, 4 (0100) becomes 2 (0010) (divide by 2) if you shift to the right

Get it? Same concept as decimal, just a different number base.

As for MOV vs. MOVB, EQU vs. DATA or BYTE, etc. those are in the assembly thread too. Try to do some reading on those first, and feel free to ask questions about anything you still don't understand. Try to be specific and you will get good answers.

ceratophyllum wrote (in the code comments):

           JLE  PRINT           * If only I could stick #s in an array
           CI   R0,32           * and reference it somehow.

I noticed that comment. You could stick the numbers in an array. However in this case there was still a better way. As for arrays in assembly, they do not exist like you think of them in BASIC. In assembly you have to think of an array for what it is, i.e. just a chunk of memory that you use in a particular way. DIM A(10) in BASIC would reserve memory for 10 numbers. Now, in assembly you have to decide if you need BYTE (8-bit) numbers, WORD (16-bit) numbers, or something bigger that the CPU can not deal with directly (like floating point, fixed point, 32-bit numbers, pointers, etc.) Once you have decided what you need, you just reserve the space, or simply *use* some chunk of memory (assuming you know it is empty.)
With 9900 assembly, the assembler has a directive ("directives" are NOT assembly instructions, they are commands for the assembler) called BSS (Block Starting with Symbol) that could be used for this purpose. It just sets aside a number of bytes for you to reference, and you know there won't be any other code or data in that memory that you did not put there. If your "block" of RAM is for 16-bit words, then you have to access it as such. If it is bytes, then you index in to it as bytes, etc. If you are unsure, let me know and I'll make an example.

However, the screen itself is an example that you seem to already understand. It is just 768 bytes used by the VDP to draw the tiles on the monitor. We have to index in to that memory from byte 0 to byte 767 (768 total, remember to count 0). We mentally think of the screen as a grid of 32x24 tiles, and that is also how we see it on screen, but the computer just sees 768 consecutive bytes in RAM. The byte at address 0 is screen 0,0, the byte at address 32 ends up on screen at 0,1, etc.

Clear as mud, right?

Edited March 25, 2011 by matthew180
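As one concrete version of the array idea, here is a hedged sketch of a lookup table of row offsets, which also happens to pre-calculate the multiply by 32. ROWTAB and ROWADR are made-up names, P1X and P1Y are assumed to be word variables as in the earlier listing, and R3 is an arbitrary choice of index register.

```asm
* A 24-entry WORD table: entry n holds n*32, the screen offset of row n.
ROWTAB DATA 0,32,64,96,128,160,192,224
       DATA 256,288,320,352,384,416,448,480
       DATA 512,544,576,608,640,672,704,736

* Return the screen position for (P1X,P1Y) in R0.  Call with BL @ROWADR.
ROWADR MOV  @P1Y,R3         * R3 = row number 0-23 (R0 cannot be an index)
       SLA  R3,1            * words are 2 bytes: index * 2 = byte offset
       MOV  @ROWTAB(R3),R0  * R0 = ROWTAB[row] = row * 32
       A    @P1X,R0         * add the column
       B    *R11
```

For a table of bytes the SLA would be dropped and MOVB used instead, since the index is then already a byte offset. For a multiply by 32 a single SLA is simpler, so the table is shown only to illustrate indexed addressing.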
sometimes99er Posted March 25, 2011

Sure DIV is one of the slower instructions, but it does a fine job. If you're doing lots of divisions in time critical loops or similar, then of course, consider ways of avoiding it. It all depends, and sometimes DIV might be the best answer.
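For completeness, this is roughly what the DIV route from the original question would look like. It is a sketch under assumptions: the screen position is in R0, and R2, R3 and R4 are arbitrary register choices. DIV takes its 32-bit dividend from a register pair, so the high word has to be cleared first.

```asm
* Split screen position R0 into row and column with DIV (sketch).
* DIV treats R2 and R3 together as one 32-bit dividend.
       CLR  R2              * high word of the dividend = 0
       MOV  R0,R3           * low word of the dividend = position 0-767
       LI   R4,32           * divisor
       DIV  R4,R2           * R2 = quotient (row), R3 = remainder (column)
```

A remainder of 0 in R3 means the leftmost column and 31 the rightmost, which is the BASIC-style test described in the first post.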
matthew180 Posted March 25, 2011

sometimes99er wrote: Sure DIV is one of the slower instructions

DIV is *THE* slowest instruction, by almost a factor of 2. Its fastest time is 92 clock cycles, where most instructions are between 8 and 30 clock cycles depending. Sure there are times when it is the only solution, just always think about what you are doing before you use it.

You *know* I had to reply to that!
+adamantyr Posted March 25, 2011

matthew180 wrote: DIV is *THE* slowest instruction, by almost a factor of 2. [...] You *know* I had to reply to that!

I love how you said putting registers in non-scratchpad will "affect performance drastically"... Most BASIC users wouldn't notice a problem for a LONG time, until they got over-ambitious.

Adamantyr
marc.hull Posted March 26, 2011 "DIV is *THE* slowest instruction, by almost a factor of 2." Hey Matthew... How about sharing your code that does the same as divide but is faster than the instruction with the rest of us? Seems you're on to something if you've got that.
+InsaneMultitasker Posted March 26, 2011 "Hey Matthew... How about sharing your code that does the same as divide but is faster than the instruction with the rest of us? Seems you're on to something if you've got that." Marc, you need to use the DI (Divide Immediate) instruction. Much, much faster. Immediate, even.
marc.hull Posted March 26, 2011 "Marc, you need to use the DI (Divide Immediate) instruction. Much, much faster. Immediate, even." Well... is that fast enough?
ceratophyllum (Author) Posted March 27, 2011 "DIV is *THE* slowest instruction, by almost a factor of 2. Its fastest time is 92 clock cycles, where most instructions take between 8 and 30 cycles depending on the addressing modes." How are you measuring the execution time of particular instructions? Just curious. Working on a swimming pool from hell (cracks, leaky pipes, blown gaskets, half a Barracuda (note capital B), and scary wiring) has been taking me AFK, but I hope to return to my TI soon.
+adamantyr Posted March 27, 2011 "How are you measuring the execution time of particular instructions? Just curious." Actually, Thierry Nouspikel already has the 9900 opcode speeds worked out, you can find them here: http://nouspikel.group.shef.ac.uk//ti99/tms9900.htm#Speed Conceptually, multiplication and division are just add and subtract operations that loop. Multiplication is "Add first value a count of times equal to second value". Division is "subtract first value from second value until remainder is smaller than first value." The problem with division is that it doesn't have a known stopping point, where multiplication does. That means the time to complete is not a stable value, and it also does a comparison check after each subtraction to see if the remainder is still larger than the divisor. Multiplication doesn't need to do this. Matthew's main point about multiplication and division is that they are cycle-expensive. In assembly programming, you are often either optimizing for speed or memory. Usually speed ends up costing more memory space. So you should look at your algorithms and make certain that you're not using MPY and DIV when another set of instructions would suffice. If you're always multiplying/dividing by a power of 2, using shift operators is far more economical. Interestingly, many ARM processors don't have division as an opcode, but they do have multiplication. There's a method, thanks to overflows, that you can use to have multiplication actually do division for you, in fact. That being said, I very much appreciate having MPY and DIV in the TI opcode system. If you're programming for a 6502, you don't have them, and it HURTS to do operations without them. Adamantyr
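A minimal, untested sketch of the shift approach for powers of 2. The register choices are arbitrary and the values are assumed to fit in 16 bits:

```asm
       SLA  R1,3      * R1 = R1 * 8; bits shifted out of the top are lost
       SRL  R2,5      * R2 = R2 / 32, unsigned; the remainder is discarded
       SRA  R3,1      * halves a signed value in R3, rounding down
```

Each of these is a single instruction with the shift count as an immediate, so nothing has to be set up beforehand the way a divisor or a register pair would be.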
sometimes99er Posted March 28, 2011 "How are you measuring the execution time of particular instructions? Just curious." See the TMS9900 Microprocessor Data Manual, found in the pinned development resources thread. The timings are on page 28 of that manual.
marc.hull Posted March 28, 2011 "If you're always multiplying/dividing by a power of 2, using shift operators is far more economical." A point I think is being overlooked is that shift left is not an equivalent to MPY unless you are working in less than 16 bits to begin with and indeed with a power of two. If this is the case then I agree it is the best option. If you are using one of the other 97 percent of the available numbers then multiply is the best choice. Shift right is definitely not the same as DIV. A shift right does not provide a remainder, which has quite a bit of value in determining screen positions in bit map and sprite/character detection as well as other uses. I think when people talk about cycle times they tend not to take into consideration the amount of work being done in that time. If you compare the two instructions with their equivalents they are extremely fast and infinitely more flexible than shifts... It really can be an issue of not seeing the forest for the trees IMHO. "That being said, I very much appreciate having MPY and DIV in the TI opcode system. If you're programming for a 6502, you don't have them, and it HURTS to do operations without them. Adamantyr" Seconded! As a suggestion I would say just learn to program in a style that suits what you can take in right now. Improvements will come as concepts become clearer to you. There are a million ways to skin a cat and they are all correct.
Tursi Posted March 29, 2011 There seems to be a bit of an issue brewing over the usage of DIV and MPY which is frankly baffling to me. The slowest shift is faster than the fastest DIV, so if you are doing a power of 2, then yes, you want to use shifts. And this is true on nearly every microprocessor.

A classic question is "what if my number is not a power of 2?". For example, to multiply? If the number is the sum of two powers of two, for instance, 384, then you can perform two shifts and add the results (x * 256) + (x * 128) == (x * 384). The question is whether the extra instructions are faster. You can also provide shifts over 32 bits by masking and ORing the bits that are being shifted out of one word into the other. This requires a couple of shifts and masks.

On the TI-99/4A, the time an algorithm takes is complicated by the memory structure of the machine. Every time you access any memory location that is on the 16<->8 bit multiplexer, you pay an additional 4 CPU cycles. You have to take into account the time to get the instruction itself, the time to read all the input values, and double the time to write the output values (due to the read-before-write behaviour of the CPU). That's in addition to the time of the instruction itself. We went through all this on the Yahoo group a year or two ago, and I don't want to go through it now, but IIRC, we determined that the double-shift and add could be faster if everything involved was in scratchpad, otherwise there was a good chance that MPY was faster or at least comparable.

Computers are binary machines, this means that you tend to work with powers of two more often than with other numbers -- and if you are not, it is often worth considering whether you can change the data so that you CAN work with powers of two. Likewise, working with numbers larger than the system word size complicates the code and slows it down, generally you want to try to avoid that. You can't always avoid it, but that's why it's a guideline, not a rule.

Marc, you specifically give the example of collecting remainders to calculate screen positions -- since the screen is a power of two characters wide in all modes except text, you would generally be better off masking to get your remainder (AND with 0x1F) than using DIV.

A second issue that isn't addressed is that both MPY and DIV have a hidden cost that is not shown in the datasheet -- each of them works with a 32-bit value. MPY provides a 32-bit result - except for ensuring you have space for it, this is not generally a concern, but DIV has a 32-bit dividend, and your code must ensure all 32 bits are set up. This means to use DIV on a 16-bit value, you need an extra initialization to zero the other 16 bits.

DIV and MPY are probably the best tools for the job when enough of the following points are true (and 'enough' is up to the developer):
- You are dealing with non-power of two values (usually the main decision)
- You can not adapt the dataset to powers of two, or doing so is more effort than the payoff
- Converting the math to powers of two complicates the code to be more expensive than DIV
- You need to work with 32-bit values
- Maximum performance is not important (or it is not possible to adapt to the above schemes)

They generally are not the best tools for the job for simple tasks like mapping a screen address to a coordinate pair, or vice-versa. But this is programming, and that's why it's "generally", not "always". When someone is just learning, it's better to do what makes the most sense first, and then learn the "tricks" to do the same thing more quickly -- or even better, to learn how to measure for themselves what works "best".
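As a minimal, untested sketch of the masking point, here is the screen address split into column and row both ways. R0 holds the screen address (0-767) as in the original program; the other registers and the W32 label are made up for illustration:

```asm
* Sketch only. R0 = screen address (0-767) as in the original program.
* W32 is a hypothetical label.

* Mask and shift: R1 = column (0-31), R2 = row (0-23)
       MOV  R0,R1
       ANDI R1,>001F     * column = address AND 31, the remainder of /32
       MOV  R0,R2
       SRL  R2,5         * row = address / 32

* The same with DIV: R4 = row, R5 = column
       CLR  R4           * high word of the 32-bit dividend must be zero
       MOV  R0,R5        * low word of the dividend
       DIV  @W32,R4      * R4 = quotient, R5 = remainder

* (program continues or branches away here)
W32    DATA 32           * divisor constant, kept out of the code path
```

For the bounded playfield in the first post, the long CI/JEQ chains could probably shrink to a copy of R0 followed by ANDI with >001F: the result is zero in the leftmost column, so a JEQ right after the ANDI catches it, and a CI against 31 catches the rightmost column.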
unhuman Posted March 29, 2011 "The slowest shift is faster than the fastest DIV, so if you are doing a power of 2, then yes, you want to use shifts. And this is true on nearly every microprocessor. A classic question is "what if my number is not a power of 2?". For example, to multiply? If the number is the sum of two powers of two, for instance, 384, then you can perform two shifts and add the results (x * 256) + (x * 128) == (x * 384). The question is whether the extra instructions are faster." I've done some silly stuff to try and sort of do a compromise of shifts and divisions (in my limited hardware level math work). If I'm dividing by a known number that's a multiple of a power of 2, I'll just use a combination of shifts and then some simple div... For example if I want to divide by 48, I'll shift 5 and then divide by 3. -H
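The (x * 256) + (x * 128) idea quoted above, as a minimal untested sketch. The registers are arbitrary, and x is assumed to be 170 or less so that the product still fits in 16 bits:

```asm
* In:  R1 = x (0-170, so x * 384 fits in one word)
* Out: R1 = x * 384, R2 is used as scratch
       MOV  R1,R2      * copy x
       SLA  R1,8       * R1 = x * 256
       SLA  R2,7       * R2 = x * 128
       A    R2,R1      * R1 = x*256 + x*128 = x*384
```

Whether these four instructions beat a single MPY depends on where the code and workspace live, as discussed above, so it is worth timing on the actual machine.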
marc.hull Posted March 29, 2011 "There seems to be a bit of an issue brewing over the usage of DIV and MPY which is frankly baffling to me. [...] When someone is just learning, it's better to do what makes the most sense first, and then learn the 'tricks' to do the same thing more quickly -- or even better, to learn how to measure for themselves what works 'best'." I don't think it's a real issue, Mike. It's just a difference of opinions... I completely agree with your last statement and I think that issue has caused the little bit of grief here. Hopefully we haven't totally f'd up this thread with this silly hijacking.
Tursi Posted March 30, 2011 "For example if I want to divide by 48, I'll shift 5 and then divide by 3." Is that a typo... I don't see how that would work?
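For what it's worth, 48 = 16 * 3, so the shift would be 4 places rather than 5; shifting by 5 and then dividing by 3 divides by 96. A minimal untested sketch of the 4-place version, with THREE as a made-up label and arbitrary registers:

```asm
* In:  R1 = x (unsigned 16-bit)
* Out: R0 = x / 48, R1 = remainder of the final divide by 3 only
       SRL  R1,4        * R1 = x / 16
       CLR  R0          * zero the high word of the dividend R0:R1
       DIV  @THREE,R0   * R0 = (x / 16) / 3 = x / 48

* (program continues or branches away here)
THREE  DATA 3           * divisor constant, kept out of the code path
```

This still executes a DIV, so whether it gains anything over dividing by 48 directly would need measuring. Note also that R1 does not end up holding the remainder of x / 48.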
Willsy Posted March 30, 2011 Interesting thread. I needed to parse numbers (in a string) just the other day. The numbers are in base 10. By the time I got to writing that particular section of code, register usage was quite dense, and I was simply too lazy to re-factor the code, especially since it was already working. Since the numbers are decimal, you're multiplying by 10 to build the number up. I ended up using the old 8x+2x trick, too lazy to use MPY. So, to multiply R0 by 10:

       MOV  R0,R1    ; COPY R0
       SLA  R0,3     ; MULTIPLY R0 BY 8
       A    R1,R1    ; MULTIPLY R1 BY 2
       A    R1,R0    ; R0=8X+2X=10X

Not sure if that would be faster than MPY - probably not, since the code is in 8-bit memory, however, it wasn't time-critical code, and it makes register planning a lot easier.