Fast divide by seven?

Thomas Jentzsch · February 21, 2009

I have no idea what the other routine is for though.

:idea: It converts a hex-code into an ASCII code.

bogax · February 22, 2009

How'd you come up with that?

Someone posted a question about bit twiddling in one of the 6502.org forums

that was part of the answer

some interesting stuff there (one of the interesting things is a pointer in one

of the forums to this forum )

That ranks right up there with the following routine someone came up with (on another platform, but adaptable to 6502):
; Magical mystery routine (hint: use with acc = 0-15)
 sed
 adc #$90
 adc #$40
 cld
Anyone ever seen anything like that one before?

I believe I saw that posted to comp.sys.cbm some years ago, don't remember where

it was said to have come from.

Don't you need a clc in there?

bogax · February 22, 2009

Wonder if you could impliment a similar process for other numbers too. (still preserving the x and y)

Sure it's basically just reciprocal multiplication

divide by 7 1/7 = .001001001001001...

 sta temp   1.
lsr		 .1
lsr		 .01
lsr		 .001
adc temp   1.001	   leave the carry alone to round
ror		 .1001  
lsr		 .01001
lsr		 .001001
adc temp   1.001001
ror		 .1001001
lsr		 .01001001
lsr		 .001001001

what I find intriguing is that, since they're all repeating decimals,

you can (sort of) accumulate accuracy

 sta temp   1.
lsr		 .1
lsr		 .01
lsr		 .001
adc temp   1.001
sta temp   1.001	  
ror		 .1001  
lsr		 .01001
lsr		 .001001
lsr		 .0001001	 < I've skipped an addition here
lsr		 .00001001
lsr		 .000001001
adc temp   1.001001001
ror		 .1001001001
lsr		 .01001001001
lsr		 .001001001001

obviously doesn't help any in the case of dividing an 8 bit number by 7

but I've gotten to the point where it's less work to rol

 sta temp xxxxxxxx   1.
lsr	  xxxxxxxx	.1
lsr	  0xxxxxxx	.01
lsr	  00xxxxxx	.001
adc temp xxxxxxxx   1.001
sta temp xxxxxxxx   1.001	  
rol	  jjjjjjjx	.00000001001   I've (in effect) jumped a byte to the right 
rol	  jjjjjjxx	.0000001001	the j's are junk that would have been
rol	  jjjjjxxx	.000001001	 shifted out now I have to mask them off
and #$F8 00000xxx	.000001001	 still slightly faster
adc temp xxxxxxxx   1.001001001
ror	  xxxxxxxx	.1001001001
lsr	  0xxxxxxx	.01001001001
lsr	  00xxxxxx	.001001001001

OK, that doesn't help much either but it might if you were dividing more than

8 bits by a constant and in the cases where the repeating part is some integral

fraction of a byte as is the case for eg division by 10 or 15 I could acummulate

a full bytes worth and not have do any actual shifting at all

But I don't know if it's likely to be useful for any case of an 8 bit dividend.

(or usefull at all for that matter, but I'm thinking 16 bit binary to decimal)

For example Omegamatrix managed to do a pretty good job on division by 15 with

just some rounding.

Edit:changed some lsr's to the rol's they should have been (second and third rol)

Edited February 22, 2009 by bogax

gauntman · February 22, 2009

How'd you come up with that?

Someone posted a question about bit twiddling in one of the 6502.org forums

that was part of the answer

some interesting stuff there (one of the interesting things is a pointer in one

of the forums to this forum )
That ranks right up there with the following routine someone came up with (on another platform, but adaptable to 6502):
; Magical mystery routine (hint: use with acc = 0-15)
 sed
 adc #$90
 adc #$40
 cld
Anyone ever seen anything like that one before?
I believe I saw that posted to comp.sys.cbm some years ago, don't remember where

it was said to have come from.

Don't you need a clc in there?

I think I have seen this as:

sed
cmp #$a
adc #$30
cld

Omegamatrix · February 22, 2009

Bogax, were you just showning the reciprocal extension, or dividing by 7? I tried the first routine and it failed from numbers 228 up. At 228 they started back at zero. The second routine broke very early. It gave me a one when dividing 6 by seven. I didn't check anymore after that.

bogax · February 22, 2009

Bogax, were you just showning the reciprocal extension, or dividing by 7? I tried the first routine and it failed from numbers 228 up. At 228 they started back at zero. The second routine broke very early. It gave me a one when dividing 6 by seven. I didn't check anymore after that.

My bad

I should have said the code was untested and just for illustration

I was trying to point out that most of these routines (obviously not the LUTs)

are multiplication by reciprocals and that the reciprocals are repeating decimals (binimals? )

and the fact that they're repeating could probably be used to advantage.

The first routine is meant to be essentially the divide by 7 routine previously

posted by djmips except that I have it with the dividend starting out in A

instead of in memory (which I also failed to mention).

I figured it was so simple and straight forward it would be good for illustration

Sounds like you used an lsr where you should have an ror?

The second and third routines will fail for values over 227. I believe that

could be fixed by moving the last sta and adc down one shift so you get

the carry into A before you save it to temp like so:

sta temp
lsr
lsr
lsr
adc temp
ror
sta temp
lsr
lsr
lsr
lsr
lsr
lsr
adc temp
ror
lsr

hmm

dividing 6 by 7 should be like this

sta temp	00000110.	 00000110
lsr		 00000011.0
lsr		 00000001.1
lsr		 00000000.1
adc temp	00000111.0
sta temp	00000111.0	00000111
ror		 00000011.1
lsr		 00000001.1
lsr		 00000000.1
lsr		 00000000.0
lsr		 00000000.0
lsr		 00000000.0
adc temp	00000111.0
ror		 00000011.1
lsr		 00000001.1
lsr		 00000000.1

For the third routine I'll have to look at it again, I might well have screwed

something up

Omegamatrix · February 22, 2009

Sorry, I didn't post that very clear. I knew the first routine was the /7 djmips posted. It works fine. I meant your first routine as the middle one, and the last routine as your second.

The value starting out in the accumulator was fine. I've been testing the routines with a simple loop that takes increments a temp register every time it goes through, and looking at the values through the Stella emulator.

Edit: Also is there a way to go faster (and maybe save more bytes) then the previous /5 I did? If we can find a faster /5 then we can get a faster /10 too. :cool:

Edited February 22, 2009 by Omegamatrix

bogax · February 22, 2009

I edited that third routine.

It will still fail at 228 and there's no good way to fix that

with it rotating left instead of right (that I can think of at least)

Oh well, it was only ment to be an illustration, I guess that's

as good an illustration as any

Edit: Also is there a way to go faster (and maybe save more bytes) then the previous /5 I did? If we can find a faster /5 then we can get a faster /10 too.

I think with division into eight bits the simple straight forward way

is going to be the best we can do

here's a division by 10 similar to your division by 15

UNTESTED!

 sta temp
lsr
sec
adc temp
sta temp
lsr
lsr
lsr
lsr
adc temp
lsr
lsr
lsr
lsr

I expect, if you care to test it, you'll get around to that before I do

hmm wonder it something similar can be done for division by 7

Omegamatrix · February 22, 2009

I tried it, but it didn't work. It's the same issue that Batari resolved; how to account for numbers overflowing when times the original number by 1.5.

Your routine will work if you replace the 2 LSR's with ROR's here:

sta temp

lsr

sec

adc temp

sta temp

lsr < ror

lsr

adc temp

lsr < ror

lsr

but the second ror has to account for the original overflow. Post 44 is very similar to the approach here, where I kept the original value in the X register, and then compared it before the rotate. Getting around the time of not doing the compare, TAX, or SEC was always the problem. I managed to go a little faster after that, but could never find a way to do without SEC and save the 2 cycles there.

bogax · February 23, 2009

Your routine will work if you replace the 2 LSR's with ROR's here:

sta temp

lsr

sec

adc temp

sta temp

lsr < ror

lsr

lsr

lsr

adc temp

lsr < ror

lsr

lsr

lsr

ACK!

Yes, right, those were supposed to be ror's

bogax · February 23, 2009

OK, back to divide by seven

Again, untested (yes, I should set up to do that)

sta temp

lsr

adc temp

ror

lsr

cmp #$3F

adc #$00

lsr

Omegamatrix · February 23, 2009

I tried it, but it didn't work. The numbers started failing in the middle of range. The beginning and end were good.

Omegamatrix · January 26, 2013

A recent post by Bogax made me think about this old topic once again. I wrote a small Atari 200 program to help me find solutions to these division problems some time ago:

DivideTest.zip

I came up with a framework to division by 5 sometime ago, earlier in this thread. Thinking about it today I realized I was doing (2^n)+1 division, and I wondered if the algorithm could be extended to other (2^n)+1 divisions (such as the December '84 Apple Assembly Line routine could be extended to other (2^n)-1 divisions).

So I tried it with divide by nine, and the very first try worked! At 29 bytes and 48 cycles it is a beast of a routine, but I'm sure could be optimized with some permutations such as I've done with divide by 5, 10, and 15. I'm not sure if divide by nine has a good use though. I've put the routine in the assembly file above, and I'll include it just below as well.

  sta  temp
 lsr
 lsr
 lsr
 eor  #$FF
 sec
 adc  temp
 lsr
 lsr
 lsr
 eor  #$FF
 sec
 adc  temp
 lsr
 lsr
 lsr
 eor  #$FF
 sec
 adc  temp
 lsr
 lsr
 lsr

Edited January 26, 2013 by Omegamatrix

Omegamatrix · January 26, 2013

I found a more optimized solution in about 10 minutes by letting the program run. I truncated one of these sections:

 eor  #$FF
 sec
 adc  temp
 lsr
 lsr
 lsr

Then I inserted an "ADC testVariable" with MODE = 0. I let the program run and Whamo! It found a solution with adc #252. So we are now down 23 bytes and 37 cycles.

 sta  temp
 lsr
 lsr
 lsr
 adc  #252
 eor  #$FF
 sec
 adc  temp
 lsr
 lsr
 lsr
 eor  #$FF
 sec
 adc  temp
 lsr
 lsr
 lsr

Omegamatrix · January 26, 2013

Removed the SEC, it passed. 22 bytes, 35 cycles:

 sta  temp
 lsr
 lsr
 lsr
 adc  #252
 eor  #$FF
 adc  temp
 lsr
 lsr
 lsr
 eor  #$FF
 sec
 adc  temp
 lsr
 lsr
 lsr

Omegamatrix · January 26, 2013

Faster, better... 21 bytes, 33 cycles.

;divide by 9
 sta  temp
 lsr
 lsr
 lsr
 adc  #252
 eor  #$FF
 adc  temp
 lsr
 lsr
 lsr
 sbc  temp
 eor  #$FF
 lsr
 lsr
 lsr

Edited January 26, 2013 by Omegamatrix

+Gemintronic · January 26, 2013

I'm making a batariBasic game that leans pretty heavily on pseudo random numbers and bit shiting. I know this is beyond the scope of your topic here, but, how would one incorporate your divde by 9 code inline (in bB)?

Omegamatrix · January 26, 2013

I'm making a batariBasic game that leans pretty heavily on pseudo random numbers and bit shiting. I know this is beyond the scope of your topic here, but, how would one incorporate your divde by 9 code inline (in bB)?

I know you know how to insert assembly into your BB file, so all that is left to do load the the value before the routine. The "temp" register above is just that... some temporary register you need for a few lines of code. So first determine which zero page register is holding your value, and insert the routine. I'll give an example with register $A9 holding your value, and $BE a register I can trash (my temp register):

 asm

 lda $A9
 sta $BE
 lsr
 lsr
 lsr
 sbc #1
 eor #$FF
 adc $BE
 lsr
 lsr
 lsr
 sbc $BE
 eor #$FF
 lsr
 lsr
 lsr

;at this point the accumulator holds the result
;of the divide by nine, store the result somewhere,
;or add more assembly voodoo...

end

I've also changed the adc #252 to sbc #1 above. It works, and seems a little cleaner.

Edited January 26, 2013 by Omegamatrix

Omegamatrix · January 26, 2013

Same cycles, 1 less byte:

;divide by 9
;20 bytes, 33 cycles
 sta  temp
 lsr
 lsr
 lsr
 sec
 eor  #$FF
 adc  temp
 lsr
 lsr
 lsr
 sbc  temp
 eor  #$FF
 lsr
 lsr
 lsr

Omegamatrix · January 26, 2013

More... 19 bytes, 31 cycles

 sta  temp
 lsr
 lsr
 lsr
 sbc  temp
 eor  #$FF
 lsr
 lsr
 lsr
 sbc  temp
 eor  #$FF
 lsr
 lsr
 lsr

And as this is my 5000th post... I just wanted to say thank you, AtariAge! It's been quite a ride over the last eight years. Has it really been that long?

Omegamatrix · January 26, 2013

To switch gears a little bit, and step back from the divide by nine routines, I revisited the the divide by 10 routine.

I came up with this. Although it is worse then my other routine by 3 bytes and 1 cycle, it seems to have a simplicity which I love.

;divide by 10
;23 bytes, 36 cycles
 lsr
 sta  temp
 lsr
 lsr
 sbc  temp
 eor  #$FF
 lsr
 lsr
 sbc  temp
 eor  #$FF
 lsr
 lsr
 sbc  temp
 eor  #$FF
 lsr
 lsr

And by contrast, the more optimized divide by 10:

;divide by 10
;20 bytes, 35 cycles
 lsr
 sta  temp
 lsr
 sec
 adc  temp
 sta  temp
 lsr
 lsr
 lsr
 adc  temp
 adc  temp
 ror
 lsr
 lsr
 lsr

Omegamatrix · January 26, 2013

Woke up this morning and tried divide by 3. I found a quicker solution right away.

Old December '84 Apple Assembly Line solution:

;Divide by 3
;20 bytes, 35 cycles
 sta  temp
 lsr
 lsr
 adc  temp
 ror
 lsr
 adc  temp
 ror
 lsr
 adc  temp
 ror
 lsr
 adc  temp
 ror
 lsr

New routine:

;Divide by 3
;18 bytes, 30 cycles
 sta  temp
 lsr
 adc  #21
 lsr
 adc  temp
 ror
 lsr
 adc  temp
 ror
 lsr
 adc  temp
 ror
 lsr

Omegamatrix · January 26, 2013

Divide by 12, which of course equals (1/4) * (1/3).

;Divide by 12
;20 bytes, 34 cycles
 lsr
 lsr
 sta  temp
 lsr
 adc  #21
 lsr
 adc  temp
 ror
 lsr
 adc  temp
 ror
 lsr
 adc  temp
 ror
 lsr

Omegamatrix · January 26, 2013

By the same logic, divide by 14, which equals (1/2) * (1/7). The divide by seven routine is straight out of the December '84 Apple Assembly Line Magazine.

;Divide by 14
;16 bytes, 29 cycles
 lsr
 sta  temp
 lsr
 lsr
 lsr
 adc  temp
 ror
 lsr
 lsr
 adc  temp
 ror
 lsr
 lsr

Omegamatrix · January 26, 2013

Found a page with reciprocal multiplication routines for small approximation numbers.

http://homepage.cs.u...bcd/divide.html

I started with the divide by 13 algorithm they had, and at a certain point truncated it and added the divide by 12 routine. It works, but at 36 bytes and 61 cycles it is bulky and slow. I'm not too sure divide by 13 has a real use anyhow.

;divide by 13
;36 bytes, 61 cycles
 sta  temp
 lsr
 adc  temp
 ror
 lsr
 adc  temp
 ror
 adc  temp
 ror
 adc  temp
 ror
;divide by 12
 lsr
 lsr
 sta  temp
 lsr
 adc  #21
 lsr
 adc  temp
 ror
 lsr
 adc  temp
 ror
 lsr
 adc  temp
 ror
 lsr

Edited January 26, 2013 by Omegamatrix

Fast divide by seven?

Recommended Posts

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Join the conversation

Recently Browsing 0 members