Asmusr Posted July 20, 2017

So what have I learned? It's definitely possible to convert Z80 code to the TI, as can be seen here: http://atariage.com/forums/topic/267989-knight-lore/

My mapping of Z80 to TI registers worked well:

    tmp0 equ 0
    tmp1 equ 1
    one  equ 2
    mone equ 3
    af   equ 4
    a    equ 4
    bc   equ 5
    b    equ 5
    c    equ R5LB
    de   equ 6
    d    equ 6
    e    equ R6LB
    hl   equ 7
    h    equ 7
    l    equ R7LB
    ix   equ 8
    iy   equ 9
    sp   equ 10
    af_  equ 12
    bc_  equ 13
    de_  equ 14
    hl_  equ 15

one and mone hold the constants 1 and -1, which I found I used all the time. It's inefficient to use the LSB registers c, e, and l because they have to be accessed as memory bytes rather than registers.

The Knight Lore code checks the carry flag a lot. I found that after a byte compare (cp on the Z80, cb on the TI) the carry condition on the Z80 corresponds to JL on the TI, or JHE for the inverse condition. If the carry flag is checked after a subtraction instead of a compare, this has to be turned into a compare followed by a (possible) subtraction on the TI.

The biggest conversion issue is probably that loading data into a register (ld on the Z80, mov or li on the TI) does not set any flags on the Z80, so there you can do a compare, then load something, and then check the condition. On the TI the load changes the status bits, so this type of code has to be reworked.

A stack and calls to subroutines can easily, but somewhat slowly, be emulated on the TI, but on the TI it's the called routine that pushes the return address (r11) onto the stack. If Z80 code jumps directly into subroutines (rather than calling them), it is necessary to bypass the initial push of the return address on the TI.

Does anyone have a commented Z80 disassembly of Elite?
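The carry mapping and the return-address convention described in the post above can be sketched roughly as follows. This is a minimal, untested illustration that assumes the register mapping from the post (a in the high byte of R4, d in the high byte of R6, sp in R10). The labels CMPAD, CMPAD1, DRAW and DRAW1 and the placeholder routine body are made up for the example and do not come from the Knight Lore conversion.

```asm
SP     EQU  10           stack pointer register, as in the mapping

* Z80:   cp   d          carry set when a < d (unsigned)
*        jr   c,isless
*
* CB compares the high bytes of the two registers. JL is taken when
* the source (a) is logically lower than the destination (d), JHE
* when it is higher or equal. R0 returns -1 for a < d, otherwise 0.
CMPAD  DECT SP
       MOV  R11,*SP      the called routine pushes its own return address
       CLR  R0           CLR and SETO leave the status bits alone
       CB   R4,R6        a (source) compared with d (destination)
       JHE  CMPAD1       a >= d: the Z80 carry would be clear
       SETO R0           a < d: the Z80 carry would be set
CMPAD1 MOV  *SP+,R11     pop the return address
       RT

* A routine that other code enters with a jump instead of a call
* gets a second label after the push. INC R5 is only a placeholder
* for the body of the routine.
DRAW   DECT SP
       MOV  R11,*SP      normal entry
DRAW1  INC  R5           entry that bypasses the push
       MOV  *SP+,R11
       RT
```

With this layout a Z80 `call draw` would become `BL @DRAW`, while a Z80 `jp draw` from inside another routine would become `B @DRAW1`, on the assumption that the jumping routine already pushed its own return address when it was entered.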
Asmusr Posted July 20, 2017

Forgot to mention: I believe converted code takes up 50% - 100% more memory than the Z80 code. Basically, many Z80 instructions take one byte, while TMS9900 instructions take at least two. But it depends on the code, and in this case the TMS9900 is a lot more efficient:

    *
    * =============== S U B R O U T I N E =======================================
    *
    * b: pixel Y
    * c: pixel X
    *
    * Result in bc
    *
    calc_vidbuf_addr:
    ;RAM:D811 E5           push hl
    ;RAM:D812 CB 38        srl  b           ; y >> 1, bit 0 to Carry
    ;RAM:D814 CB 19        rr   c           ; Carry to bit 7, x >> 1
    ;RAM:D816 CB 38        srl  b           ;
    ;RAM:D818 CB 19        rr   c           ;
    ;RAM:D81A CB 38        srl  b           ;
    ;RAM:D81C CB 19        rr   c           ;
           srl  bc,3
    ;RAM:D81E 21 F3 D8     ld   hl, #vidbuf ; bitmap buffer
    ;RAM:D821 09           add  hl, bc      ; calculate bitmap memory address
    ;RAM:D822 4D           ld   c, l
    ;RAM:D823 44           ld   b, h        ; BC = bitmap memory address
           ai   bc,vidbuf
    ;RAM:D824 E1           pop  hl
    ;RAM:D825 C9           ret
           rt
    *
    * End of function calc_vidbuf_addr
    *

Edited March 12, 2023 by Asmusr
artrag Posted July 21, 2017

Asmusr said: Does anyone have a commented Z80 disassembly of Elite?

I can point you to the commented sources of Uridium (there is an improved version for cv) and of Tales of Popolon (a 3D fps on msx): https://youtu.be/xaNX8i5f4pc

Edited July 21, 2017 by artrag
apersson850 Posted July 21, 2017

When you simply convert code that was written to work well on one CPU to another that is quite different, you have to expect that the translated code will not be that terrific. But if you rewrite it to actually exploit the advantages of the second architecture, it may be even better (depending on the source and target systems, of course).
artrag Posted July 21, 2017

Given that the VDP is the same, it is a matter of rewriting the same loops and algorithms for the new CPU. You can start by transposing opcode by opcode and then optimise once the loop starts running.

Edited July 21, 2017 by artrag
Asmusr Posted July 21, 2017

artrag said: I can point you to the commented sources of Uridium (there is an improved version for cv) and of Tales of Popolon (a 3D fps on msx)

Do you think anything in Uridium would benefit from a 16-bit architecture, or would it end up running at half speed?
artrag Posted July 21, 2017

All the x coordinates of the objects are 16 bits, if not 24 bits (due to decimal parts). Some of the y coordinates are 16 bits to take the decimal part into account.

How fast is the TMS9900 at doing a large case/switch and at doing 8- and 16-bit comparisons?

Anyway, in case of problems, one can reduce the maximum number of enemies or set the speed to 30 fps. Both options are already implemented: the first through a label, the second in the menu for choosing the difficulty level, where the easier modes slow down the main loop by adding extra waits for vblank.

Edited July 21, 2017 by artrag
Asmusr Posted July 22, 2017

artrag said: How fast is the TMS9900 at doing a large case/switch and at doing 8- and 16-bit comparisons?

A large case/switch is just a jump table, I guess? 8- and 16-bit comparisons (CB and C) are equally fast, but I don't know how fast they are compared to the Z80.
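As a rough illustration of the jump-table idea mentioned above, a three-way switch on the TMS9900 might look like the sketch below. It is untested and not taken from Uridium or any other game; the labels SWITCH, CASES, CASE0 to CASE2 and SWEND, and the choice of R1, R2 and R3, are assumptions made for the example. Range checking of the case number is left out.

```asm
* R1 holds the case number (0, 1 or 2) on entry.
* R3 returns a value chosen by the selected case.
* Called with BL @SWITCH; R11 is not modified.
SWITCH SLA  R1,1         case number -> byte offset into the table
       MOV  @CASES(R1),R2
       B    *R2          jump to the selected handler
CASES  DATA CASE0,CASE1,CASE2
CASE0  CLR  R3
       JMP  SWEND
CASE1  LI   R3,1
       JMP  SWEND
CASE2  LI   R3,2
SWEND  RT
```

The cost is the same for every case (one shift, one indexed move and one branch), which is the main reason to prefer a table over a chain of compares when there are many cases. Note that R0 cannot be used as the index register.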
artrag Posted July 22, 2017

Asmusr said: A large case/switch is just a jump table, I guess? 8- and 16-bit comparisons (CB and C) are equally fast, but I don't know how fast they are compared to the Z80.

Yes, the switches are used in the enemies and in the collisions. About the relative speed, maybe one can eventually compensate by using fewer enemies or by setting the frame rate to 30 Hz.

Edited July 22, 2017 by artrag
adel314 Posted February 16, 2021

On 7/1/2017 at 3:26 AM, mizapf said: When you calculate 00c0 - 00d0, the ALU effectively adds the two's complement: 00c0 + ff30 = fff0. Since the result does not exceed ffff, carry is cleared. However, when you calculate 00d0 - 00c0, you have 00d0 + ff40 = 10010 = 0010, so this means carry is set.

I know this is way out of date, but it helped me sort out some frustrating issues I had when carrying out some 40-bit subtraction and addition. The lesson I learnt is that you can't rely on the carry flag when performing arithmetic larger than 16 bits. The most reliable method I found was to use the overflow flag for the arithmetic functions and carry for the carry over to the next register. For example:

    ;
    ;
    ;Subtract the 40 bit Source (R1,R2,R3) from the 40 bit Destination (R5,R6,R7).
    ;
    ;
    SUBAC:  CLR  R0
            SB   R3,R7
            JNO  SAC1
            INC  R2         ;Add carry bit
            JNC  SAC1
            INC  R1         ;Add carry bit
            JNC  SAC1
            SETO R0         ;Carry to AC has occurred from R5
    ;
    ; Now subtract the register pairs 6 and 5
    ;
    SAC1:   S    R2,R6
            JNO  SAC2
            INC  R1
            JNC  SAC2
            SETO R0         ;Carry to AC has occurred from R5
    SAC2:   S    R1,R5      ;
            JNO  SAC3
            SETO R0
    SAC3:   INC  R0         ;Set the carry flag if carry occurred
            RET
TheBF Posted February 16, 2021

48 minutes ago, adel314 said: I know this is way out of date, but it helped me sort out some frustrating issues I had when carrying out some 40-bit subtraction and addition. The lesson I learnt is that you can't rely on the carry flag when performing arithmetic larger than 16 bits. The most reliable method I found was to use the overflow flag for the arithmetic functions and carry for the carry over to the next register.

Did that take more instructions in Z80 code?

And for a person who knows nothing about games (me), why is 40-bit math required? 32-bit operations give magnitudes of +/- 2 billion or so. I can't imagine why a game would need more.
mizapf Posted February 16, 2021

1 hour ago, adel314 said: I know this is way out of date, but it helped me sort out some frustrating issues I had when carrying out some 40-bit subtraction and addition. The lesson I learnt is that you can't rely on the carry flag when performing arithmetic larger than 16 bits. The most reliable method I found was to use the overflow flag for the arithmetic functions and carry for the carry over to the next register.

I'd probably calculate the two's complement of the second operand and use an addition.

By the way, which register is the most significant? I guess R3 and R7 (otherwise, SB and JNO would not make sense).
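The "negate, then add" idea above could be sketched as follows. To keep it short this uses a 32-bit value held in two registers rather than the 40-bit layout discussed in the thread, it ignores the carry out of the top word, and it is untested. The labels SUB32, SUB32A and SUB32B and the register assignment are assumptions made for the example.

```asm
* Operands: R5:R6 (destination) and R1:R2 (second operand),
*           with R5 and R1 the most significant words.
* Result:   R5:R6 = R5:R6 - R1:R2 (modulo 2**32).
*           R1:R2 is left holding the negated operand.
SUB32  INV  R1           high word: one's complement
       NEG  R2           low word: two's complement
       JNE  SUB32A       a zero low word carries into the high word
       INC  R1
SUB32A A    R2,R6        add the low words
       JNC  SUB32B
       INC  R5           carry into the high word
SUB32B A    R1,R5        add the high words
       RT
```

After an addition the carry bit simply means "carry into the next word", so the propagation step is a plain JNC followed by INC, without the inverted sense that carry has after S or SB.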
adel314 Posted February 17, 2021

9 hours ago, mizapf said: I'd probably calculate the two's complement of the second operand and use an addition. By the way, which register is the most significant? I guess R3 and R7 (otherwise, SB and JNO would not make sense).

Yes, I should have specified that. R5 to R7 (upper byte) represent a floating point mantissa and the lower byte of R7 the exponent; that is why I only needed SB for the R7 operation. So R5 is MS and R7 is LS.
adel314 Posted February 17, 2021

10 hours ago, TheBF said: Did that take more instructions in Z80 code? And for a person who knows nothing about games (me), why is 40-bit math required? 32-bit operations give magnitudes of +/- 2 billion or so. I can't imagine why a game would need more.

In certain places the Z80 code, with its use of the carry flag, makes some of the arithmetic much simpler than on the TMS9900, but in most other cases the TMS9900 produces much more efficient and smaller code. The Z80 floating point math package that I have ported to the TMS9900 is probably about the same size, but I will do a check. It looks like it is all working now, and your post helped me quite a bit.

Edited February 17, 2021 by adel314 (clearer expression)
adel314 Posted February 19, 2021

On 2/17/2021 at 2:23 AM, mizapf said: I'd probably calculate the two's complement of the second operand and use an addition. By the way, which register is the most significant? I guess R3 and R7 (otherwise, SB and JNO would not make sense).

Regarding the use of two's complement, I decided to use another method, as the method I posted earlier is not 100% accurate. From a coding perspective I think this updated method is accurate and reasonably compact, and I would appreciate your thoughts. The logic behind it is that a "carry" or "borrow" as we know it will always occur if the operand being subtracted (source) is logically larger than the destination operand, so using a compare to test this allows us to use JH or JLE to detect the borrow.

    ;
    ;
    ;Subtract the 40 bit Source (R1,R2,R3) from the 40 bit Destination (R5,R6,R7).
    ;
    ; 40 bit source: R1,R2,R3 with R1 being MS and R3 LS
    ; 40 bit destination: R5,R6,R7 with R5 being MS and R7 LS
    ;
    ;
    ;
    SUBAC:  PUSH R1
            PUSH R2
            CLR  R0
            CB   R3,R7      ;Check for borrow
            JLE  SAC1
            INC  R2         ;Propagate the borrow
            JNC  SAC1
            INC  R1
            JNC  SAC1
            SETO R0
    ;
    ; Now subtract R3 from R7 (LS)
    ;
    SAC1:   SB   R3,R7      ;no need to check for carry/overflow
            C    R2,R6
            JLE  SAC2
            INC  R1
            JNC  SAC2
            SETO R0
    ;
    ; Now subtract R2 from R6
    ;
    SAC2:   S    R2,R6
            C    R1,R5
            JLE  SAC3
            SETO R0
    ;
    ; Now subtract the MS register R1 from R5
    ;
    SAC3:   S    R1,R5
            POP  R2
            POP  R1
            INC  R0         ;Set the carry flag if carry in the MSB occurred
            RET

Edited February 19, 2021 by adel314
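For comparison, here is a sketch of a different approach to the same subtraction that relies directly on the rule from mizapf's quoted explanation: after S or SB, carry set means that no borrow occurred. It is not adel314's routine and it is untested. It uses the same register layout, but it does not report the final borrow out of R5 in R0, and the labels SUB40, SUB40A, SUB40B and SUB40C are made up for the example.

```asm
* Destination: R5, R6 and the high byte of R7 (R5 most significant).
* Source:      R1, R2 and the high byte of R3 (R1 most significant).
* The low bytes of R3 and R7 are not touched, and the source
* registers are left unchanged.
SUB40  SB   R3,R7        least significant byte
       JOC  SUB40B       carry set: no borrow
       MOV  R6,R6        borrow: will R6 wrap from 0 to >FFFF?
       JNE  SUB40A
       DEC  R5           yes, so the borrow reaches R5 as well
SUB40A DEC  R6
SUB40B S    R2,R6        middle word
       JOC  SUB40C
       DEC  R5           borrow from the middle word
SUB40C S    R1,R5        most significant word
       RT
```

Because the borrow is applied to the destination rather than to the source, there is no need to save and restore R1 and R2. Reporting the final borrow would need extra bookkeeping, since a DEC R5 can itself wrap.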
apersson850 Posted February 19, 2021

I presume PUSH and POP are defined like

    DECT SP
    MOV  %1,*SP

and

    MOV  *SP+,%1

But I don't get the logic of R2, R3 being the mantissa and the lower part of R7 being the exponent, combined with the code above. Or are these two different cases?
mizapf Posted February 19, 2021

Yes, I was a bit surprised, too, when hearing about mantissa and exponent, because I thought we were talking about integers.
adel314 Posted February 20, 2021

18 hours ago, apersson850 said: I presume PUSH and POP are defined like DECT SP MOV %1,*SP and MOV *SP+,%1 But I don't get the logic of R2, R3 being the mantissa and the lower part of R7 being the exponent, combined with the code above. Or are these two different cases?

The integer maths are required to calculate the 40-bit mantissa component of the floating point routines. Here is a link to the original Z80 code: http://www.z80.info/zip/math.zip

For your info, here is the code for PUSH and POP; they are just standard XOP routines.

    ;
    ;*************************************************
    ; PUSH DATA/REGISTER ONTO THE STACK
    ; USES CALLER'S WP AND STACK POINTERS
    ;*************************************************
    ;
    XOP8    MOV  @FREEMEM,@2*R9(R13)  ;UPDATE FREE MEMORY POINTER IE STACK LIMIT
            MOV  @2*R10(R13),R10
            DECT R10
            C    R10,@2*R9(R13)       ;CHECK FOR OVERFLOW
            JLE  STACKERR
            MOV  *R11,*R10
            MOV  R10,@2*R10(R13)
            RTWP
    ;
    ; POP DATA/REGISTER OFF STACK
    ;
    XOP9    MOV  @2*R10(R13),R10
            MOV  *R10+,*R11
            MOV  R10,@2*R10(R13)
            RTWP
adel314 Posted February 20, 2021

18 hours ago, mizapf said: Yes, I was a bit surprised, too, when hearing about mantissa and exponent, because I thought we were talking about integers.

Yes, it is about integer arithmetic. The mantissa is a 40-bit integer. The problem that I see is that the TMS9900 performs two's complement arithmetic and the flags are based on that implementation.
apersson850 Posted February 20, 2021

OK, I'll have to look at the link. But I read that the exponent is in the least significant byte of R7, and you do CB with R7, which looks at the most significant byte.

Also, I don't see the value of the exponent influencing the magnitude of the mantissa. If you calculate 1E20-9E0, for example, then it doesn't matter that 9 is much bigger than one. In this case, the exponent will make the result 1E20-9E0=1E20. The subtraction isn't even noticed. Or do you use the exponent in some other way?

After looking at the data in the link, I see it discusses floating point BCD math. Is it something similar to TI's own floating point math that you are trying to use? Except that they use radix 100, since there are no specific radix 10 instructions in the TMS 9900 anyway.

Edited February 20, 2021 by apersson850
adel314 Posted February 21, 2021

19 hours ago, apersson850 said: OK, I'll have to look at the link. But I read that the exponent is in the least significant byte of R7, and you do CB with R7, which looks at the most significant byte. Also, I don't see the value of the exponent influencing the magnitude of the mantissa. [...] Is it something similar to TI's own floating point math that you are trying to use? Except that they use radix 100, since there are no specific radix 10 instructions in the TMS 9900 anyway.

Yes, the exponent is handled separately, so in effect I am working on a 5-byte mantissa in R5, R6 and the MSB of R7. This causes extra handling, but I thought I would be consistent with the original, and keeping the representation within 3 registers just makes it neater.
TheBF Posted February 21, 2021

On 2/20/2021 at 12:12 AM, adel314 said: The integer maths are required to calculate the 40-bit mantissa component of the floating point routines. Here is a link to the original Z80 code: http://www.z80.info/zip/math.zip For your info, here is the code for PUSH and POP; they are just standard XOP routines.

I have never used the XOP instructions, but that seems like a lot of code to do PUSH and POP, plus the BLWP/RTWP overhead. I suppose if the assembler does not support macros this provides some abstraction. (?)

Push can be two instructions and pop can be one instruction on the 9900. Could that work on your system?
adel314 Posted February 23, 2021

On 2/22/2021 at 1:32 AM, TheBF said: I have never used the XOP instructions, but that seems like a lot of code to do PUSH and POP, plus the BLWP/RTWP overhead. I suppose if the assembler does not support macros this provides some abstraction. (?) Push can be two instructions and pop can be one instruction on the 9900. Could that work on your system?

The problem with BLWP/RTWP is that after the first level of subroutine you have to begin saving the workspace pointers, status registers, return addresses etc., so managing the overhead soon becomes very complicated for nested systems. Using the XOPs you can avoid that and mimic a normal micro that microcodes the CALL, PUSH and POP functions. I have copied the CALL function just for info. Of course, in most circumstances you don't need to check for stack overflow, and this would reduce the code somewhat.

As you suggest, if I don't want the overhead of a PUSH and POP, and provided the routine is local, you can use MOV R3,@-2(SP) to push onto the stack and MOV @-2(SP),R3 to recover it.

You can perform some fairly powerful pseudo instructions using XOPs, which would indeed be very similar to macros.

    ;
    ;************************************************
    ; CALL SUBROUTINE
    ; CALLING METHOD: CALL SUBROUTINE_ADDRESS
    ;*************************************************
    ;
    XOP6    MOV  @2*R10(R13),R10  ;GET STACK POINTER
            DECT R10
            C    R10,@2*R9(R13)   ;CHECK FOR STACK OVERFLOW
            JLE  STACKERR         ;O/P STACK OVERFLOW MESSAGE
            MOV  R14,*R10         ;PUSH SAVED PC ONTO STACK
            MOV  R11,R14          ;MOVE EA INTO R14 FOR CALL
            MOV  R10,@2*R10(R13)  ;UPDATE STACK POINTER
            RTWP                  ;NOTE WE ARE NOW USING THE ORIGINAL WP

Edited February 23, 2021 by adel314
TheBF Posted February 23, 2021

I was thinking about the way the Forth community here does stacks on the 9900. There is way less overhead in cycles and memory. A macro assembler makes it prettier, but the inlined instructions are not very big. It would seem to be an order of magnitude faster than the XOP version, just "eyeballing" the code.

If you are changing context (workspace) you would of course have to decide that register X is the stack pointer for all workspaces and give each context a small section of the stack memory, i.e. an offset from the base stack address used by the main workspace.

With this kind of small stack overhead you can make different decisions about what truly requires a BLWP and what can be managed with a register push, like you would do on an MSP430, for example, where there is only a stack and no workspaces. Just a thought.

Note: SP is an EQUate for the register you choose for the stack pointer.

    * stack is placed in high memory and descends on PUSH
    * there is no overflow protection, there is no underflow protection
    *
    * PUSH
    *
            DECT SP
            MOV  Rn,*SP
    *
    * POP
    *
            MOV  *SP,Rn
Lee Stewart Posted February 23, 2021

12 minutes ago, TheBF said:

    *
    * POP
    *
            MOV  *SP,Rn

should be

    *
    * POP
    *
            MOV  *SP+,Rn

...lee