Jump to content
IGNORED

Fast divide by seven?


brpocock

Recommended Posts

How'd you come up with that?

 

Someone posted a question about bit twiddling in one of the 6502.org forums

that was part of the answer

 

some interesting stuff there (one of the interesting things is a pointer in one

of the forums to this forum :) )

 

That ranks right up there with the following routine someone came up with (on another platform, but adaptable to 6502):

; Magical mystery routine (hint: use with acc = 0-15)
 sed
 adc #$90
 adc #$40
 cld

Anyone ever seen anything like that one before?

 

I believe I saw that posted to comp.sys.cbm some years ago, don't remember where

it was said to have come from.

 

Don't you need a clc in there?

Link to comment
Share on other sites

Wonder if you could impliment a similar process for other numbers too. (still preserving the x and y)

 

Sure it's basically just reciprocal multiplication

 

divide by 7 1/7 = .001001001001001...

 

 sta temp   1.
lsr		 .1
lsr		 .01
lsr		 .001
adc temp   1.001	   leave the carry alone to round
ror		 .1001  
lsr		 .01001
lsr		 .001001
adc temp   1.001001
ror		 .1001001
lsr		 .01001001
lsr		 .001001001

 

what I find intriguing is that, since they're all repeating decimals,

you can (sort of) accumulate accuracy

 

 sta temp   1.
lsr		 .1
lsr		 .01
lsr		 .001
adc temp   1.001
sta temp   1.001	  
ror		 .1001  
lsr		 .01001
lsr		 .001001
lsr		 .0001001	 < I've skipped an addition here
lsr		 .00001001
lsr		 .000001001
adc temp   1.001001001
ror		 .1001001001
lsr		 .01001001001
lsr		 .001001001001

 

obviously doesn't help any in the case of dividing an 8 bit number by 7

but I've gotten to the point where it's less work to rol

 

 sta temp xxxxxxxx   1.
lsr	  xxxxxxxx	.1
lsr	  0xxxxxxx	.01
lsr	  00xxxxxx	.001
adc temp xxxxxxxx   1.001
sta temp xxxxxxxx   1.001	  
rol	  jjjjjjjx	.00000001001   I've (in effect) jumped a byte to the right 
rol	  jjjjjjxx	.0000001001	the j's are junk that would have been
rol	  jjjjjxxx	.000001001	 shifted out now I have to mask them off
and #$F8 00000xxx	.000001001	 still slightly faster
adc temp xxxxxxxx   1.001001001
ror	  xxxxxxxx	.1001001001
lsr	  0xxxxxxx	.01001001001
lsr	  00xxxxxx	.001001001001

 

OK, that doesn't help much either but it might if you were dividing more than

8 bits by a constant and in the cases where the repeating part is some integral

fraction of a byte as is the case for eg division by 10 or 15 I could acummulate

a full bytes worth and not have do any actual shifting at all

 

But I don't know if it's likely to be useful for any case of an 8 bit dividend.

(or usefull at all for that matter, but I'm thinking 16 bit binary to decimal)

For example Omegamatrix managed to do a pretty good job on division by 15 with

just some rounding.

 

Edit:changed some lsr's to the rol's they should have been (second and third rol)

Edited by bogax
Link to comment
Share on other sites

How'd you come up with that?

 

Someone posted a question about bit twiddling in one of the 6502.org forums

that was part of the answer

 

some interesting stuff there (one of the interesting things is a pointer in one

of the forums to this forum :) )

 

That ranks right up there with the following routine someone came up with (on another platform, but adaptable to 6502):

; Magical mystery routine (hint: use with acc = 0-15)
 sed
 adc #$90
 adc #$40
 cld

Anyone ever seen anything like that one before?

 

I believe I saw that posted to comp.sys.cbm some years ago, don't remember where

it was said to have come from.

 

Don't you need a clc in there?

 

I think I have seen this as:

 

sed
cmp #$a
adc #$30
cld

Link to comment
Share on other sites

Bogax, were you just showning the reciprocal extension, or dividing by 7? I tried the first routine and it failed from numbers 228 up. At 228 they started back at zero. The second routine broke very early. It gave me a one when dividing 6 by seven. I didn't check anymore after that.

 

My bad

 

I should have said the code was untested and just for illustration

 

I was trying to point out that most of these routines (obviously not the LUTs)

are multiplication by reciprocals and that the reciprocals are repeating decimals (binimals? :) )

and the fact that they're repeating could probably be used to advantage.

 

The first routine is meant to be essentially the divide by 7 routine previously

posted by djmips except that I have it with the dividend starting out in A

instead of in memory (which I also failed to mention).

I figured it was so simple and straight forward it would be good for illustration

 

Sounds like you used an lsr where you should have an ror?

 

The second and third routines will fail for values over 227. I believe that

could be fixed by moving the last sta and adc down one shift so you get

the carry into A before you save it to temp like so:

 

sta temp
lsr
lsr
lsr
adc temp
ror
sta temp
lsr
lsr
lsr
lsr
lsr
lsr
adc temp
ror
lsr

 

hmm

dividing 6 by 7 should be like this

 

sta temp	00000110.	 00000110
lsr		 00000011.0
lsr		 00000001.1
lsr		 00000000.1
adc temp	00000111.0
sta temp	00000111.0	00000111
ror		 00000011.1
lsr		 00000001.1
lsr		 00000000.1
lsr		 00000000.0
lsr		 00000000.0
lsr		 00000000.0
adc temp	00000111.0
ror		 00000011.1
lsr		 00000001.1
lsr		 00000000.1

 

For the third routine I'll have to look at it again, I might well have screwed

something up

Link to comment
Share on other sites

Sorry, I didn't post that very clear. I knew the first routine was the /7 djmips posted. It works fine. I meant your first routine as the middle one, and the last routine as your second.

 

 

The value starting out in the accumulator was fine. I've been testing the routines with a simple loop that takes increments a temp register every time it goes through, and looking at the values through the Stella emulator.

 

 

 

Edit: Also is there a way to go faster (and maybe save more bytes) then the previous /5 I did? If we can find a faster /5 then we can get a faster /10 too. :cool:

Edited by Omegamatrix
Link to comment
Share on other sites

I edited that third routine.

It will still fail at 228 and there's no good way to fix that

with it rotating left instead of right (that I can think of at least)

Oh well, it was only ment to be an illustration, I guess that's

as good an illustration as any :)

 

Edit: Also is there a way to go faster (and maybe save more bytes) then the previous /5 I did? If we can find a faster /5 then we can get a faster /10 too. :cool:

 

I think with division into eight bits the simple straight forward way

is going to be the best we can do

 

here's a division by 10 similar to your division by 15

 

UNTESTED!

 

 sta temp
lsr
sec
adc temp
sta temp
lsr
lsr
lsr
lsr
adc temp
lsr
lsr
lsr
lsr

 

I expect, if you care to test it, you'll get around to that before I do

 

hmm wonder it something similar can be done for division by 7

Link to comment
Share on other sites

I tried it, but it didn't work. It's the same issue that Batari resolved; how to account for numbers overflowing when times the original number by 1.5.

 

 

Your routine will work if you replace the 2 LSR's with ROR's here:

 

sta temp

lsr

sec

adc temp

sta temp

lsr < ror

lsr

lsr

lsr

adc temp

lsr < ror

lsr

lsr

lsr

 

but the second ror has to account for the original overflow. Post 44 is very similar to the approach here, where I kept the original value in the X register, and then compared it before the rotate. Getting around the time of not doing the compare, TAX, or SEC was always the problem. I managed to go a little faster after that, but could never find a way to do without SEC and save the 2 cycles there.

Link to comment
Share on other sites

  • 3 years later...

A recent post by Bogax made me think about this old topic once again. I wrote a small Atari 200 program to help me find solutions to these division problems some time ago:

 

 

DivideTest.zip

 

 

I came up with a framework to division by 5 sometime ago, earlier in this thread. Thinking about it today I realized I was doing (2^n)+1 division, and I wondered if the algorithm could be extended to other (2^n)+1 divisions (such as the December '84 Apple Assembly Line routine could be extended to other (2^n)-1 divisions).

 

 

So I tried it with divide by nine, and the very first try worked! At 29 bytes and 48 cycles it is a beast of a routine, but I'm sure could be optimized with some permutations such as I've done with divide by 5, 10, and 15. I'm not sure if divide by nine has a good use though. I've put the routine in the assembly file above, and I'll include it just below as well.

 

  sta  temp
 lsr
 lsr
 lsr
 eor  #$FF
 sec
 adc  temp
 lsr
 lsr
 lsr
 eor  #$FF
 sec
 adc  temp
 lsr
 lsr
 lsr
 eor  #$FF
 sec
 adc  temp
 lsr
 lsr
 lsr

Edited by Omegamatrix
  • Like 1
Link to comment
Share on other sites

I found a more optimized solution in about 10 minutes by letting the program run. I truncated one of these sections:

 

 eor  #$FF
 sec
 adc  temp
 lsr
 lsr
 lsr

 

Then I inserted an "ADC testVariable" with MODE = 0. I let the program run and Whamo! It found a solution with adc #252. So we are now down 23 bytes and 37 cycles.

 

 sta  temp
 lsr
 lsr
 lsr
 adc  #252
 eor  #$FF
 sec
 adc  temp
 lsr
 lsr
 lsr
 eor  #$FF
 sec
 adc  temp
 lsr
 lsr
 lsr

  • Like 1
Link to comment
Share on other sites

I'm making a batariBasic game that leans pretty heavily on pseudo random numbers and bit shiting. I know this is beyond the scope of your topic here, but, how would one incorporate your divde by 9 code inline (in bB)?

 

I know you know how to insert assembly into your BB file, so all that is left to do load the the value before the routine. The "temp" register above is just that... some temporary register you need for a few lines of code. So first determine which zero page register is holding your value, and insert the routine. I'll give an example with register $A9 holding your value, and $BE a register I can trash (my temp register):

 

 asm

 lda $A9
 sta $BE
 lsr
 lsr
 lsr
 sbc #1
 eor #$FF
 adc $BE
 lsr
 lsr
 lsr
 sbc $BE
 eor #$FF
 lsr
 lsr
 lsr

;at this point the accumulator holds the result
;of the divide by nine, store the result somewhere,
;or add more assembly voodoo...

end

 

 

I've also changed the adc #252 to sbc #1 above. It works, and seems a little cleaner.

Edited by Omegamatrix
  • Like 1
Link to comment
Share on other sites

To switch gears a little bit, and step back from the divide by nine routines, I revisited the the divide by 10 routine.

 

 

I came up with this. Although it is worse then my other routine by 3 bytes and 1 cycle, it seems to have a simplicity which I love.

 

;divide by 10
;23 bytes, 36 cycles
 lsr
 sta  temp
 lsr
 lsr
 sbc  temp
 eor  #$FF
 lsr
 lsr
 sbc  temp
 eor  #$FF
 lsr
 lsr
 sbc  temp
 eor  #$FF
 lsr
 lsr

 

And by contrast, the more optimized divide by 10:

 

;divide by 10
;20 bytes, 35 cycles
 lsr
 sta  temp
 lsr
 sec
 adc  temp
 sta  temp
 lsr
 lsr
 lsr
 adc  temp
 adc  temp
 ror
 lsr
 lsr
 lsr

  • Like 1
Link to comment
Share on other sites

Woke up this morning and tried divide by 3. I found a quicker solution right away. :)

 

Old December '84 Apple Assembly Line solution:

 

;Divide by 3
;20 bytes, 35 cycles
 sta  temp
 lsr
 lsr
 adc  temp
 ror
 lsr
 adc  temp
 ror
 lsr
 adc  temp
 ror
 lsr
 adc  temp
 ror
 lsr

 

 

New routine:

;Divide by 3
;18 bytes, 30 cycles
 sta  temp
 lsr
 adc  #21
 lsr
 adc  temp
 ror
 lsr
 adc  temp
 ror
 lsr
 adc  temp
 ror
 lsr

  • Like 1
Link to comment
Share on other sites

Found a page with reciprocal multiplication routines for small approximation numbers.

 

http://homepage.cs.u...bcd/divide.html

 

I started with the divide by 13 algorithm they had, and at a certain point truncated it and added the divide by 12 routine. It works, but at 36 bytes and 61 cycles it is bulky and slow. I'm not too sure divide by 13 has a real use anyhow.

 

 

;divide by 13
;36 bytes, 61 cycles
 sta  temp
 lsr
 adc  temp
 ror
 lsr
 adc  temp
 ror
 adc  temp
 ror
 adc  temp
 ror
;divide by 12
 lsr
 lsr
 sta  temp
 lsr
 adc  #21
 lsr
 adc  temp
 ror
 lsr
 adc  temp
 ror
 lsr
 adc  temp
 ror
 lsr

Edited by Omegamatrix
Link to comment
Share on other sites

Join the conversation

You can post now and register later. If you have an account, sign in now to post with your account.
Note: Your post will require moderator approval before it will be visible.

Guest
Reply to this topic...

×   Pasted as rich text.   Paste as plain text instead

  Only 75 emoji are allowed.

×   Your link has been automatically embedded.   Display as a link instead

×   Your previous content has been restored.   Clear editor

×   You cannot paste images directly. Upload or insert images from URL.

Loading...
  • Recently Browsing   0 members

    • No registered users viewing this page.
×
×
  • Create New...