Jump to content

Photo

Fast divide by seven?


127 replies to this topic

#1 brpocock OFFLINE  

brpocock

    Chopper Commander

  • 245 posts
  • Skyline hacker
  • Location:Jacksonville, FL

Posted Tue Sep 11, 2007 12:16 AM

Simple thing that's killing me.

Any suggestions for a fast "divide by seven" routine?

HMPx/HMOVE whacks give either 7 or 8 clocks of motion. Divide by 8 is easy - bit shift - but division by seven is killing me. I can live with doing a dex / sbc loop, but that isn't nice.

Math and I are not friends.

#2 supercat OFFLINE  

supercat

    Quadrunner

  • 6,401 posts

Posted Tue Sep 11, 2007 12:22 AM

Any suggestions for a fast "divide by seven" routine?


Rather than doing a divide-by-seven, I'd suggest doing something like:
ldx displacement ; 0-159
  lda smallmove,x
  sta HMPx
  ...
  sta HMOVE
  lda bigmove,x
  sta HMPx   ; Use this value ten times
  ...
  sta HMOVE
  ...
  sta HMOVE
  ...
  sta HMOVE
  ...
  sta HMOVE ; etc. ten times total for the bigmove,x value
This way you only have two hmove values to worry about, and you always 'whack' things the same number of times.

#3 brpocock OFFLINE  

brpocock

    Chopper Commander

  • Topic Starter
  • 245 posts
  • Skyline hacker
  • Location:Jacksonville, FL

Posted Tue Sep 11, 2007 12:31 AM

This is actually for the "judgement" code to decide sprite drawing-lists ... I have the amount of displacement between two sprites to be entered into the drawing list, and am judging the number of scanlines required to perform the move. Incidentally, this is including a RESPx "shortcut" that allows bumping the position effectively up to 48 clocks (see my blog for details).

I'm tinkering with pretending to move 8 clocks and then throwing in an extra count, which could taint the results, but it's just to decide the best slot in the list for a sprite, so I'm wondering if an approximation might work.

The actual motion code works out something like the above; I (sometimes) hit RESPx (at one of two fixed positions) and then whack HMOVE until I think the sprite is where I want it.

The judgement code rewrite has invalidated the sprite drawing in the kernel loop at the moment, so until I refactor that to match, this is all gut instinct.

#4 Thomas Jentzsch OFFLINE  

Thomas Jentzsch

    Thrust, Jammed, SWOOPS!, Boulder Dash, THREE·S, Star Castle

  • 23,567 posts
  • Always left from right here!
  • Location:Düsseldorf, Germany, Europe, Earth

Posted Tue Sep 11, 2007 12:46 AM

Divide by 7 should be very similar to the well know divide by 15 routine, used for positioning sprites.

#5 brpocock OFFLINE  

brpocock

    Chopper Commander

  • Topic Starter
  • 245 posts
  • Skyline hacker
  • Location:Jacksonville, FL

Posted Tue Sep 11, 2007 1:23 AM

In the interests of getting it to Just Work At All, I'm dropping this logic actually, but thanks to you both for many useful suggestions.

My revised idea, rethinking a few things based on Thomas's comments on my blog, is to simplify the drawing list somewhat by only drawing a full sprite or not drawing it at all, which terribly simplifies the logic in the judgement routine, and the revised mess follows.

An ideal divide by seven would make the judgement code more accurate, but I'm being pessimistic and doing a divide by eight, and possibly ruling out some slots which otherwise could have held sprites, in favour of more reasonable execution times, and the fact that my local time is 3:30am and complex math is never my strong suit to begin with.

I suppose I could resort to a div7 table, as well :-( I think the valid range is only 0..49 or so, depending on how my RESPx hits end up aligning in the kernel, so at least it's a smallish table. Even smallish tables are starting to hurt me, though.

I'll dump some things to my blog in a moment, rather than polluting this abortive thread.

#6 batari OFFLINE  

batari

    )66]U('=I;B$*

  • 6,673 posts
  • begin 644 contest

Posted Tue Sep 11, 2007 2:30 AM

Dividing by 7 is a bitch, and there's not going to be a fast and short way to do it. A table is the only way to do it in a kernel.

However, posting requests like this may get us 6502 coders thinking.

Here's a useless attempt. 40 cycles, 25 bytes. Works from 0-63 (64+ is possible with more shifting and adding)
lda value
  lsr
  lsr
  lsr
  sta temp
  lda value
  and #%11111000
  clc
  sbc temp
  clc
  sbc value
  eor #$ff
  lsr
  lsr
  lsr
  clc
  adc temp


#7 brpocock OFFLINE  

brpocock

    Chopper Commander

  • Topic Starter
  • 245 posts
  • Skyline hacker
  • Location:Jacksonville, FL

Posted Tue Sep 11, 2007 7:51 AM

Here's a useless attempt. 40 cycles, 25 bytes.


OK, already better than my attempts. I'm still mystified, where do you even get the ideas for these shortcuts?

#8 Thomas Jentzsch OFFLINE  

Thomas Jentzsch

    Thrust, Jammed, SWOOPS!, Boulder Dash, THREE·S, Star Castle

  • 23,567 posts
  • Always left from right here!
  • Location:Düsseldorf, Germany, Europe, Earth

Posted Tue Sep 11, 2007 10:13 AM

The divide by 15 modification I mentioned above
tay					 
	and	#$07	   
	sta	tmpVar	 
	tya					
	lsr					 
	lsr					 
	lsr					 
	tay					
	clc					 
	adc	tmpVar	
.loop:
	cmp	#$07	  
	bcc	.skipIny	
	sbc	#$07		
	iny					 
	bne	.loop	

.skipIny:
Basically it divides by 8 and then corrects the result. Works from 0..255.

Edited by Thomas Jentzsch, Tue Sep 11, 2007 10:15 AM.


#9 djmips OFFLINE  

djmips

    Dragonstomper

  • 627 posts
  • scrolling
  • Location:Seattle

Posted Thu Sep 13, 2007 1:06 AM

I seem to recall that December 1984 issue of Apple Assembly line had a quirky routine for dividing any value up to 255 by 7 that appears to be faster than Batari's

[ Hint: 1/7 = 1/8 + 1/64 + 1/512 + 1/4096 + ... ]

DIVIDE_BY_SEVEN:
  LDA BYTE
  LSR	 
  LSR	 
  LSR	 
  ADC BYTE
  ROR
  LSR
  LSR
  ADC BYTE
  ROR 
  LSR
  LSR
  RTS



#10 jmrichardson OFFLINE  

jmrichardson

    Combat Commando

  • 1 posts

Posted Thu Jan 15, 2009 8:04 AM

And the reason the above, very economical code works is because:

Posted Image

#11 danwinslow OFFLINE  

danwinslow

    River Patroller

  • 2,561 posts

Posted Thu Jan 15, 2009 8:20 AM

!

brane hert

#12 TROGDOR OFFLINE  

TROGDOR

    Chopper Commander

  • 161 posts
  • Why Yes, I Have Played Atari Today.
  • Location:Strongbadia

Posted Sat Jan 17, 2009 3:05 AM

Very nice. Here's the original article. It also includes a compact divide by three, and the infamous divide by 15. This technique can be adapted for all n in (2**n)-1, so variants would work for divide by 3, 7, 15, 31, and 63.

DIVIDE.BY.THREE
		 LDA BYTE
		 LSR
		 LSR
		 ADC BYTE
		 ROR 
		 LSR
		 ADC BYTE
		 ROR 
		 LSR
		 ADC BYTE
		 ROR 
		 LSR
		 ADC BYTE
		 ROR 
		 LSR
		 RTS

Does anyone know any tricks like this for dividing by 5? Maybe some kind of (2**n)+1 equivalent? If you can divide by 5, you're only a shift away from dividing by 10. :cool:

#13 batari OFFLINE  

batari

    )66]U('=I;B$*

  • 6,673 posts
  • begin 644 contest

Posted Sat Jan 17, 2009 3:42 AM

Very nice. Here's the original article. It also includes a compact divide by three, and the infamous divide by 15. This technique can be adapted for all n in (2**n)-1, so variants would work for divide by 3, 7, 15, 31, and 63.

DIVIDE.BY.THREE
		 LDA BYTE
		 LSR
		 LSR
		 ADC BYTE
		 ROR 
		 LSR
		 ADC BYTE
		 ROR 
		 LSR
		 ADC BYTE
		 ROR 
		 LSR
		 ADC BYTE
		 ROR 
		 LSR
		 RTS

Does anyone know any tricks like this for dividing by 5? Maybe some kind of (2**n)+1 equivalent? If you can divide by 5, you're only a shift away from dividing by 10. :cool:

TO divide by 10, just do an LSR, CLC, ADC BYTE, then divide by 15.

#14 TROGDOR OFFLINE  

TROGDOR

    Chopper Commander

  • 161 posts
  • Why Yes, I Have Played Atari Today.
  • Location:Strongbadia

Posted Mon Jan 19, 2009 1:00 AM

TO divide by 10, just do an LSR, CLC, ADC BYTE, then divide by 15.

That's a good suggestion batari, but it has restrictions. That's (x * 1.5) / 15 = x / 10. However, it only works for values up to 170. 171 * 1.5 = 256.5, which would cause an overflow.

#15 roland p OFFLINE  

roland p

    River Patroller

  • 2,455 posts
  • $23
  • Location:The Netherlands

Posted Mon Jan 19, 2009 1:07 AM

Are there fast 16bit div 8bit routines?

#16 grafixbmp OFFLINE  

grafixbmp

    Dragonstomper

  • 681 posts
  • Location:South Central US

Posted Mon Jan 19, 2009 1:44 AM

I couldn't help but find great intrest in the topic and wonder if anyone has concidered using Vedic mathemetic routines to try and enhance the mathematic process on the atari? here is a link to some of the info and a bit on the types of processes involved Vedic Mathematics

The link is in the reading:



Multiply a two-figure number by 11
This couldn’t be easier — you simply put the sum of the first and last figure inbetween.

72 x 11 = 7 (7+2) 2 = 792

If the sum of the two figures is greater than 9 you simply carry the first figure of the sum and add it to to first figure of your number.

79 x 11 = 7 (7+9) 9 = 7 (16) 9 = 869

Dividing by 9
Here the answer for a two-figure number is the first figure as the result and the sum of the two figures the remainder.

23 / 9
result 2
remainder 2+3 = 5

For dividing three-figure numbers you need to add the sum of the first two numbers to the first figure and the total sum of the three numbers is your remainder.

134 / 9
result 1 (1+3) = 14
remainder (1+3+4) = 8

www.vedicmaths.org

#17 selgus OFFLINE  

selgus

    Moonsweeper

  • 333 posts
  • Location:Orlando, Florida

Posted Mon Jan 19, 2009 6:39 AM

Are there fast 16bit div 8bit routines?

It doesn't have a 16-bit/8-bit routine, but here are some good references to 6502/6510 math routines.
--Selgus

#18 danwinslow OFFLINE  

danwinslow

    River Patroller

  • 2,561 posts

Posted Mon Jan 19, 2009 6:59 AM

I couldn't help but find great intrest in the topic and wonder if anyone has concidered using Vedic mathemetic routines to try and enhance the mathematic process on the atari? here is a link to some of the info and a bit on the types of processes involved Vedic Mathematics

The link is in the reading:



Multiply a two-figure number by 11
This couldn’t be easier — you simply put the sum of the first and last figure inbetween.

72 x 11 = 7 (7+2) 2 = 792

If the sum of the two figures is greater than 9 you simply carry the first figure of the sum and add it to to first figure of your number.

79 x 11 = 7 (7+9) 9 = 7 (16) 9 = 869

Dividing by 9
Here the answer for a two-figure number is the first figure as the result and the sum of the two figures the remainder.

23 / 9
result 2
remainder 2+3 = 5

For dividing three-figure numbers you need to add the sum of the first two numbers to the first figure and the total sum of the three numbers is your remainder.

134 / 9
result 1 (1+3) = 14
remainder (1+3+4) = 8

www.vedicmaths.org


Well, thats interesting stuff, but seems like it would work best for people dealing with numerals on paper. Digits are not stored that way inside a computer.

#19 Rybags OFFLINE  

Rybags

    Quadrunner

  • 15,793 posts
  • Location:Australia

Posted Mon Jan 19, 2009 7:06 AM

Might work OK for stuff in BCD though.

#20 grafixbmp OFFLINE  

grafixbmp

    Dragonstomper

  • 681 posts
  • Location:South Central US

Posted Mon Jan 19, 2009 1:36 PM

There is also another process a friend of mine introduced me to once, but can't quite remember what it was called. It was either pure hex or true hex or something to that effect. It might be more adept at being used on a 6502 since it deals with hex values. But while I collect some more info on the subject, here is a part of the processes used in vedic for multiplication. It may not work well for a processor but it is still nice to know. It deals with the fact that the math is done with base 10 and uses it as so...

Suppose you need 8 x 7

8 is 2 below 10 and 7 is 3 below 10.
Think of it like this:

The answer is 56.
The diagram below shows how you get it.

You subtract crosswise 8-3 or 7 - 2 to get 5,
the first figure of the answer.
And you multiply vertically: 2 x 3 to get 6,
the last figure of the answer.


I'll get back after I see what I can find on the hex stuff.

Edited by grafixbmp, Mon Jan 19, 2009 1:38 PM.


#21 batari OFFLINE  

batari

    )66]U('=I;B$*

  • 6,673 posts
  • begin 644 contest

Posted Mon Jan 19, 2009 2:18 PM

I'm sure the multiplication topics will help someone, but in general, multiplication is easy. Division is not.

This is my first attempt at division by 10, which accounts for numbers larger an 170 by setting the MSB in two instances.
lda byte
  lsr
  clc
  adc byte
  php
  php
  sta temp
  ror
  lsr
  lsr
  lsr
  adc temp
  ror
  plp
  bcc .1
  ora #$80
.1
  lsr
  lsr
  lsr
  adc temp
  ror
  plp
  bcc .2
  ora #$80
.2
  lsr
  lsr
  lsr
I'm sure this can be improved.

Edited by batari, Mon Jan 19, 2009 2:19 PM.


#22 GroovyBee OFFLINE  

GroovyBee

    Games Developer

  • 9,821 posts
  • Busy bee!
  • Location:England

Posted Mon Jan 19, 2009 4:47 PM

My attempt at a fast divide by 10 (no branches, but trashes X) :-

lda byte
	lsr
	lsr
	lsr
	lsr
	sta temp
	lsr
	tax
	clc
	adc temp
	sta temp
	txa
	lsr
	tax
	clc
	adc temp
	sta temp
	txa
	lsr
	lsr
	clc
	adc temp


#23 batari OFFLINE  

batari

    )66]U('=I;B$*

  • 6,673 posts
  • begin 644 contest

Posted Mon Jan 19, 2009 5:18 PM

My attempt at a fast divide by 10 (no branches, but trashes X) :-

lda byte
	lsr
	lsr
	lsr
	lsr
	sta temp
	lsr
	tax
	clc
	adc temp
	sta temp
	txa
	lsr
	tax
	clc
	adc temp
	sta temp
	txa
	lsr
	lsr
	clc
	adc temp

I just tried it, and it doesn't work for all values 0-255. For example, if byte=10, the result is zero.

#24 supercat OFFLINE  

supercat

    Quadrunner

  • 6,401 posts

Posted Mon Jan 19, 2009 6:19 PM

I'm sure the multiplication topics will help someone, but in general, multiplication is easy. Division is not.


For medium-sized numbers, there's a huge difference. For really big numbers (crypto-sized), the difference isn't as significant, since the number of steps required to approximate 1/x to a certain number of bits is roughly proportional to the log of the number of bits; once one has a good estimate for the reciprocal of a dividend, the number of multiply/subtract operations is related to the amount of error in the approximated reciprocal.

I haven't looked to see to what extent this trick is used in RSA crypto implementations, but if it isn't used, it should be.

#25 Omegamatrix OFFLINE  

Omegamatrix

    Quadrunner

  • 6,215 posts
  • Location:Canada

Posted Wed Jan 21, 2009 3:33 AM

Hello Batari,


I tried to improve the code you posted, and I think I've saved 5 cycles, and 1 register (comparing the two used to hold the processer status, vs a second temp that could be in the same location) at a cost of 2 bytes. Though I haven't fully tested the code (and everytime I do that it bites me in the ass!), but I've been trying this for hours now and I'm tired. I think it works.


;divide by 10
;numbers > 170 (straight thru) take 60 cycles, numbers <= 170 (branched) take 62 cycles

	lda   byte	; 3
	lsr		; 2
	clc		; 2
	adc   byte	; 3
	sta   temp2; 3
	ror		; 2
	lsr		; 2
	sta   temp1; 3
	lsr		; 2
	lsr		; 2
	adc   temp2; 3
	ror		; 2
	lsr		; 2
	lsr		; 2
	lsr		; 2
	bit   temp1; 3
	bvc   .skip; 2³
	ora   #$10	; 2
.skip:
	adc   temp2; 3
	ror		; 2
	lsr		; 2
	lsr		; 2
	lsr		; 2
	bit   temp1; 3
	bvc   .exit; 2³
	ora   #$10	; 2
.exit:


I've been trying for hours to somehow double up the code and save a bunch of bytes, but every time I do something takes way longer. I hope that someone can improve this code much more. There's got to be some sick way of doing this with some illegal ops or something. :lol:


Edited for a last minute fix too!

Edited by Omegamatrix, Wed Jan 21, 2009 4:02 AM.





0 user(s) are browsing this forum

0 members, 0 guests, 0 anonymous users