# Looking for 32 bit division routines

## Recommended Posts

I'm looking for a good assembly routine to divide a 32 bit integer by a 16 bit integer, leaving a 32 bit result.

The only routines I have on hand are for dividing 64 bit by 32 bit to leave a 32 bit result, or to divide 32 bit by 16 bit leaving a 16 bit result. Neither quite fits my case. For example, I'd like to be able to divide \$400000 by \$20 and end up with \$20000.

Anyone have any code handy for this?

##### Share on other sites

Here's what I've been working with, based on different routines I've seen. Using the rotate/subtract method. Complete gibberish as the result.

```10 DIVISOR = \$CF
20 DIVIDEND = \$CB
30 REMAINDER = \$D8
40 RESULT =  \$D4
50 SCRAP =   \$E0
60 DIVIDE LDA #0     	        ;preset remainder to 0
70       STA REMAINDER
80       STA REMAINDER+1
90       STA REMAINDER+2
100      STA REMAINDER+3
110      STA RESULT
120      STA RESULT+1
130      STA RESULT+2
140      STA RESULT+3
150      LDX #32     	        ;repeat for each bit: ...
160 DIVLOOP ASL DIVIDEND
170      ROL DIVIDEND+1
180      ROL DIVIDEND+2
190      ROL DIVIDEND+3
200      ROL REMAINDER
210      ROL REMAINDER+1
220      ROL REMAINDER+2
230      ROL REMAINDER+3
240      TXA
250      PHA
260      SEC
270      LDX #0
280      LDY #4
290 LP1  LDA REMAINDER,X
300      SBC DIVISOR,X 	;substract divisor to see if it fits in
310      STA SCRAP,X
320      INX
330      DEY
340      BNE LP1
350      BCC SKIP    	;if carry=0 then divisor didn't fit in yet
360      LDY #4
370      LDX #0
380 LP2  LDA SCRAP,X 	;else save substraction result as new remainder,
390      STA REMAINDER,X
400      INX
410      DEY
420      BNE LP2
430      INC RESULT  	;and INCrement result cause divisor fit in 1 times
440 SKIP PLA
450      TAX
460      DEX
470      BNE DIVLOOP
480      RTS```

##### Share on other sites

Only the lowest 8 bits of RESULT are being bumped.

INC RESULT

BNE @+

INC RESULT+1

BNE @+

INC RESULT+2

BNE @+

INC RESULT+3

@

##### Share on other sites

Your routine looks like the one on codebase64. One thing to note is that that routine relies on dividend and result being aliased to the same memory locations. That way the rols on dividend also rol the result.

Here's a working version:

```    org \$80
divisor org *+4
dividend org *+4
remainder org *+4
result equ dividend
scrap org *+2

org \$2000

divide
lda #0                  ;preset remainder to 0
sta remainder
sta remainder+1
sta remainder+2
sta remainder+3
ldx #32                 ;repeat for each bit: ...

divloop
asl dividend
rol dividend+1
rol dividend+2
rol dividend+3
rol remainder
rol remainder+1
rol remainder+2
rol remainder+3

sec
lda remainder+0         ;substract divisor to see if it fits in
sbc divisor+0
sta scrap+0
lda remainder+1
sbc divisor+1
sta scrap+1
lda remainder+2
sbc divisor+2
tay
lda remainder+3
sbc divisor+3
bcc skip                ;if carry=0 then divisor did not fit in yet

sta remainder+3         ;else save substraction result as new remainder,
sty remainder+2
mwa scrap remainder
inc result              ;and increment result cause divisor fit in 1 times

skip
dex
bne divloop
rts

main
mwa #123 dividend       ;test
mwa #0 dividend+2
mwa #3 divisor
mwa #0 divisor+2
jsr divide
jmp *

run main
```

Another problem with your code was that the loop counters went from 4 to 1 instead of 3 to 0. I circumvented this by unrolling loops. This also eliminates the need for pha/pla to save the X register.

@flashjazzcat, the INC will never cause a carry the way the code is written, so the extra INCs are not necessary.

EDIT: Nevermind about the loop counters. Yours are fine. My subtraction was in the wrong byte order. I've corrected it above.

EDIT #2: Fixed another error in the copying of the scratch result to the remainder. Amazing how my small test case yields the correct result despite all these errors. Probably should expand the test suite.

Edited by Xuel

##### Share on other sites

You're right: I didn't read too closely and assumed it was meant to hold the 32-bit result, but I see I'm writing nonsense again.

##### Share on other sites

Hi!

@flashjazzcat, the INC will never cause a carry the way the code is written, so the extra INCs are not necessary.

In fact, the "INC" can be avoided by replacing the first ASL with a ROL and adding a CLC at the initialization, this is using the fact that the carry is set by the last substraction.

A good place to look for arithmetic routines is the CC65 runtime library, those are not optimized for maximum speed but for reusing code. The code for the 32bit division is in libsrc/runtime/ludiv.s, "udiv32" (ludiv == long-unsigned-division), see at https://github.com/cc65/cc65/blob/master/libsrc/runtime/ludiv.s

Edited by dmsc

##### Share on other sites

Thanks, all.. I always had a problem with the theory of binary division and while what I had worked for 16/8 bit, when I tried scaling it up I just couldn't get it to work.

##### Share on other sites

Pab.... You are not the only one with binary arithmetic.... Ask me and my headaches when coding 3d stuff.... Compared to 16bit CPUs 6502 is a bitch for other Math than add/sub in 8 but.

Edited by Heaven/TQA

##### Share on other sites

Pab.... You are not the only one with binary arithmetic.... Ask me and my headaches when coding 3d stuff.... Compared to 16bit CPUs 6502 is a bitch for other Math than add/sub in 8 but.

3D maths is difficult enough on a PC let alone an 8-bit in 6502.

##### Share on other sites

Following on from this: I was using 32-bit divide and multiply functions based on routines found on Codebase, but for non-critical operations where space is at a premium (especially when PZ RAM can't be used for whatever reason), I wondered if they could be shrunk down some. The result seems to work well:

```;	Small unsigned 32-bit divide and multiply
;	Num1, Num2 and Num3 must be absolute addresses

Num1	.ds 4
Num3	.ds 8	; must follow Num1 for Div to work
Num2	.ds 4

;
;	Divide Num1 by Num2 and place result in Num1
;	Remainder is left in Num3, which MUST be immediately after Num1 in RAM
;	Optimised for size rather than speed
;

.local DivDWord
lda #0	        ;preset remainder to 0
ldx #3
@
sta Num3,x
dex
bpl @-

ldy #32		; repeat for each bit
DivLoop
clc		; clear carry so we can use ROL throughout
ldx #248	; dividend lb & hb*2, msb -> Carry
@
rol.w Num1-248,x	; .w ensures ZP addressing is never used
inx
bne @-

sec
ldx #252
@
lda.w Num3-252,x
sbc.w Num2-252,x
pha
inx
bne @-

bcs @+
pla
pla
pla
pla
bcc Skip
@
ldx #3
@
pla
sta Num3,x
dex
bpl @-

inc Num1	; and increment result because divisor fit
Skip
dey
bne DivLoop
rts
.endl

;
;	Multiply Num1 by Num2, storing result in Num3
;	Optimised for size rather than speed
;

.local MulDWord
ldx #3
lda #0
@
sta Num3+4,x	; clear upper bits of product
dex
bpl @-

ldy #32		; loop for each bit
MultLoop
clc		; clear carry so we can use ROR on all four bytes
ldx #3
@
ror Num1,x	; divide multiplier by 2
dex
bpl @-
bcc Rotate

clc
ldx #253
@
lda.w Num3-249,x
sta.w Num3-249,x
inx
bne @-

lda Num3+7
Rotate
ror	; rotate partial product
sta Num3+7

ldx #6
@
ror Num3,x
dex
bpl @-

dey
bne MultLoop
rts
.endl

```

Obviously these aren't the fastest because of all the loops and stack usage, but for one-off calculations it hardly matters. Putting Num1, Num2 and Num3 in zero page and getting rid of the loops provides a better balance of performance and code size, however.

## Join the conversation

You can post now and register later. If you have an account, sign in now to post with your account.
Note: Your post will require moderator approval before it will be visible.

Reply to this topic...

×   Pasted as rich text.   Paste as plain text instead

Only 75 emoji are allowed.

×   Your previous content has been restored.   Clear editor

×   You cannot paste images directly. Upload or insert images from URL.

• ### Recently Browsing   0 members

×
• Forums
• Clubs

• All Activity

• #### Subscriptions

×
• Create New...