Pab Posted April 20, 2015 Share Posted April 20, 2015 I'm looking for a good assembly routine to divide a 32 bit integer by a 16 bit integer, leaving a 32 bit result. The only routines I have on hand are for dividing 64 bit by 32 bit to leave a 32 bit result, or to divide 32 bit by 16 bit leaving a 16 bit result. Neither quite fits my case. For example, I'd like to be able to divide $400000 by $20 and end up with $20000. Anyone have any code handy for this? Quote Link to comment Share on other sites More sharing options...
Pab Posted April 20, 2015 Author Share Posted April 20, 2015 Here's what I've been working with, based on different routines I've seen. Using the rotate/subtract method. Complete gibberish as the result. 10 DIVISOR = $CF 20 DIVIDEND = $CB 30 REMAINDER = $D8 40 RESULT = $D4 50 SCRAP = $E0 60 DIVIDE LDA #0 ;preset remainder to 0 70 STA REMAINDER 80 STA REMAINDER+1 90 STA REMAINDER+2 100 STA REMAINDER+3 110 STA RESULT 120 STA RESULT+1 130 STA RESULT+2 140 STA RESULT+3 150 LDX #32 ;repeat for each bit: ... 160 DIVLOOP ASL DIVIDEND 170 ROL DIVIDEND+1 180 ROL DIVIDEND+2 190 ROL DIVIDEND+3 200 ROL REMAINDER 210 ROL REMAINDER+1 220 ROL REMAINDER+2 230 ROL REMAINDER+3 240 TXA 250 PHA 260 SEC 270 LDX #0 280 LDY #4 290 LP1 LDA REMAINDER,X 300 SBC DIVISOR,X ;substract divisor to see if it fits in 310 STA SCRAP,X 320 INX 330 DEY 340 BNE LP1 350 BCC SKIP ;if carry=0 then divisor didn't fit in yet 360 LDY #4 370 LDX #0 380 LP2 LDA SCRAP,X ;else save substraction result as new remainder, 390 STA REMAINDER,X 400 INX 410 DEY 420 BNE LP2 430 INC RESULT ;and INCrement result cause divisor fit in 1 times 440 SKIP PLA 450 TAX 460 DEX 470 BNE DIVLOOP 480 RTS Quote Link to comment Share on other sites More sharing options...
flashjazzcat Posted April 20, 2015 Share Posted April 20, 2015 Only the lowest 8 bits of RESULT are being bumped. INC RESULT BNE @+ INC RESULT+1 BNE @+ INC RESULT+2 BNE @+ INC RESULT+3 @ Quote Link to comment Share on other sites More sharing options...
Xuel Posted April 21, 2015 Share Posted April 21, 2015 (edited) Your routine looks like the one on codebase64. One thing to note is that that routine relies on dividend and result being aliased to the same memory locations. That way the rols on dividend also rol the result. Here's a working version: org $80 divisor org *+4 dividend org *+4 remainder org *+4 result equ dividend scrap org *+2 org $2000 divide lda #0 ;preset remainder to 0 sta remainder sta remainder+1 sta remainder+2 sta remainder+3 ldx #32 ;repeat for each bit: ... divloop asl dividend rol dividend+1 rol dividend+2 rol dividend+3 rol remainder rol remainder+1 rol remainder+2 rol remainder+3 sec lda remainder+0 ;substract divisor to see if it fits in sbc divisor+0 sta scrap+0 lda remainder+1 sbc divisor+1 sta scrap+1 lda remainder+2 sbc divisor+2 tay lda remainder+3 sbc divisor+3 bcc skip ;if carry=0 then divisor did not fit in yet sta remainder+3 ;else save substraction result as new remainder, sty remainder+2 mwa scrap remainder inc result ;and increment result cause divisor fit in 1 times skip dex bne divloop rts main mwa #123 dividend ;test mwa #0 dividend+2 mwa #3 divisor mwa #0 divisor+2 jsr divide jmp * run main Another problem with your code was that the loop counters went from 4 to 1 instead of 3 to 0. I circumvented this by unrolling loops. This also eliminates the need for pha/pla to save the X register. @flashjazzcat, the INC will never cause a carry the way the code is written, so the extra INCs are not necessary. EDIT: Nevermind about the loop counters. Yours are fine. My subtraction was in the wrong byte order. I've corrected it above. EDIT #2: Fixed another error in the copying of the scratch result to the remainder. Amazing how my small test case yields the correct result despite all these errors. Probably should expand the test suite. Edited April 21, 2015 by Xuel Quote Link to comment Share on other sites More sharing options...
flashjazzcat Posted April 21, 2015 Share Posted April 21, 2015 You're right: I didn't read too closely and assumed it was meant to hold the 32-bit result, but I see I'm writing nonsense again. Quote Link to comment Share on other sites More sharing options...
dmsc Posted April 21, 2015 Share Posted April 21, 2015 (edited) Hi! @flashjazzcat, the INC will never cause a carry the way the code is written, so the extra INCs are not necessary. In fact, the "INC" can be avoided by replacing the first ASL with a ROL and adding a CLC at the initialization, this is using the fact that the carry is set by the last substraction. A good place to look for arithmetic routines is the CC65 runtime library, those are not optimized for maximum speed but for reusing code. The code for the 32bit division is in libsrc/runtime/ludiv.s, "udiv32" (ludiv == long-unsigned-division), see at https://github.com/cc65/cc65/blob/master/libsrc/runtime/ludiv.s Edited April 21, 2015 by dmsc 1 Quote Link to comment Share on other sites More sharing options...
Pab Posted April 21, 2015 Author Share Posted April 21, 2015 Thanks, all.. I always had a problem with the theory of binary division and while what I had worked for 16/8 bit, when I tried scaling it up I just couldn't get it to work. Quote Link to comment Share on other sites More sharing options...
Heaven/TQA Posted April 21, 2015 Share Posted April 21, 2015 (edited) Pab.... You are not the only one with binary arithmetic.... Ask me and my headaches when coding 3d stuff.... Compared to 16bit CPUs 6502 is a bitch for other Math than add/sub in 8 but. Edited April 21, 2015 by Heaven/TQA 1 Quote Link to comment Share on other sites More sharing options...
snicklin Posted April 21, 2015 Share Posted April 21, 2015 Pab.... You are not the only one with binary arithmetic.... Ask me and my headaches when coding 3d stuff.... Compared to 16bit CPUs 6502 is a bitch for other Math than add/sub in 8 but. 3D maths is difficult enough on a PC let alone an 8-bit in 6502. Quote Link to comment Share on other sites More sharing options...
flashjazzcat Posted May 18, 2015 Share Posted May 18, 2015 Following on from this: I was using 32-bit divide and multiply functions based on routines found on Codebase, but for non-critical operations where space is at a premium (especially when PZ RAM can't be used for whatever reason), I wondered if they could be shrunk down some. The result seems to work well: ; Small unsigned 32-bit divide and multiply ; Num1, Num2 and Num3 must be absolute addresses Num1 .ds 4 Num3 .ds 8 ; must follow Num1 for Div to work Num2 .ds 4 ; ; Divide Num1 by Num2 and place result in Num1 ; Remainder is left in Num3, which MUST be immediately after Num1 in RAM ; Optimised for size rather than speed ; .local DivDWord lda #0 ;preset remainder to 0 ldx #3 @ sta Num3,x dex bpl @- ldy #32 ; repeat for each bit DivLoop clc ; clear carry so we can use ROL throughout ldx #248 ; dividend lb & hb*2, msb -> Carry @ rol.w Num1-248,x ; .w ensures ZP addressing is never used inx bne @- sec ldx #252 @ lda.w Num3-252,x sbc.w Num2-252,x pha inx bne @- bcs @+ pla pla pla pla bcc Skip @ ldx #3 @ pla sta Num3,x dex bpl @- inc Num1 ; and increment result because divisor fit Skip dey bne DivLoop rts .endl ; ; Multiply Num1 by Num2, storing result in Num3 ; Optimised for size rather than speed ; .local MulDWord ldx #3 lda #0 @ sta Num3+4,x ; clear upper bits of product dex bpl @- ldy #32 ; loop for each bit MultLoop clc ; clear carry so we can use ROR on all four bytes ldx #3 @ ror Num1,x ; divide multiplier by 2 dex bpl @- bcc Rotate clc ldx #253 @ lda.w Num3-249,x adc.w Num2-253,x sta.w Num3-249,x inx bne @- lda Num3+7 adc Num2+3 Rotate ror ; rotate partial product sta Num3+7 ldx #6 @ ror Num3,x dex bpl @- dey bne MultLoop rts .endl Obviously these aren't the fastest because of all the loops and stack usage, but for one-off calculations it hardly matters. Putting Num1, Num2 and Num3 in zero page and getting rid of the loops provides a better balance of performance and code size, however. Quote Link to comment Share on other sites More sharing options...
Recommended Posts
Join the conversation
You can post now and register later. If you have an account, sign in now to post with your account.
Note: Your post will require moderator approval before it will be visible.