Fast divide by seven?

danwinslow · February 11, 2009

It's hard to say....I don't completely follow you. My impression is that you are wandering off towards more theoretical ideas that really have nothing to do with what's at hand, namely, efficient assembler routines to do 8 bit division. It's not that you aren't bringing up interesting things, you are, but they are not really apropos. Then again, I don't follow the assembler discussion very well either , but I like to try to follow along.

Omegamatrix · February 12, 2009

I would suggest that the overhead you speak of would start to take more cycles then an efficent divide by routine. The divde by seven routine is brilliant and will be hard to beat.

Relatively speaking the fastest way to do division (except /2) is to just use a table that holds every possible value. That would cost you a page worth of bytes for your table though! So right away a hybrid method like Supercat's is much more economical.

I absolutely agree with pattern recognition though. If you can get a close approximation to what you are dividing then you have a hope of finding a pattern to correct the division on the numbers it fails. I would suggest laying it all out in an excel sheet. I did that to see where the /10 routine failed, and how close it was at a given point.

+grafixbmp · February 12, 2009

I would suggest that the overhead you speak of would start to take more cycles then an efficent divide by routine. The divde by seven routine is brilliant and will be hard to beat.

Relatively speaking the fastest way to do division (except /2) is to just use a table that holds every possible value. That would cost you a page worth of bytes for your table though! So right away a hybrid method like Supercat's is much more economical.

I absolutely agree with pattern recognition though. If you can get a close approximation to what you are dividing then you have a hope of finding a pattern to correct the division on the numbers it fails. I would suggest laying it all out in an excel sheet. I did that to see where the /10 routine failed, and how close it was at a given point.

Very cool. Excel rocks for this. And I do agree, can't get any better than the current /7. But what about other ones that are harder or longer routines that are currently being used now which could use improving?

What about using less RAM to store large values. Larger than current 16 bit values can muster. That website for prime hex shows some intresting ways of counting. The friend that introduced me to hex counting suggested having a method for counting really large stored numbers by counting the hex rings in the visual pattern along with a value for an offset and beable to accuretely store a large number with far less space than usual.

Just 4 full rings reach from 0 to 37 with the inner one not being a ring at all. Add a 5th full ring and you jump to 61.

if one were to treat bianary as these rings then numbers would follow like this:

0

1

6

7

12

13

18

19

here 18 and 19 could repeat with either just the 4th ring or the 4th and first. With this, off sets might be easier to do.

Omegamatrix · February 13, 2009

I don't really follow you when you are talking about rings, Grafixbmp. I would guess most of the numbers we are trying to divide are a byte or less though for the 2600. Out of that there really are only a handful that are used a lot. The divide by 15 repositioning is one of the most important. I seem to remember Cybergoth discovering it in Battlezone. It repositioned your sprite horizontally any where on the scanline in a very clever way.

The December '84 Apple Assembly Line link posted earlier in this thread also had a beautiful /15 routine. I looked at it and found it could be improved to this:

;divide by 15
;26 cycles, 15 bytes

sta	temp	   ; 3
lsr			   ; 2
adc	#0		 ; 2
lsr			   ; 2
lsr			   ; 2
lsr			   ; 2
sec			   ; 2
adc	temp	   ; 3
ror			   ; 2
lsr			   ; 2
lsr			   ; 2
lsr			   ; 2

Of course it is of no use as a repositioning routine, but might be of use if you need to do /15 otherwise.

+grafixbmp · February 13, 2009

I don't really follow you when you are talking about rings, Grafixbmp. I would guess most of the numbers we are trying to divide are a byte or less though for the 2600. Out of that there really are only a handful that are used a lot. The divide by 15 repositioning is one of the most important. I seem to remember Cybergoth discovering it in Battlezone. It repositioned your sprite horizontally any where on the scanline in a very clever way.

The December '84 Apple Assembly Line link posted earlier in this thread also had a beautiful /15 routine. I looked at it and found it could be improved to this:
;divide by 15
;26 cycles, 15 bytes

sta	temp	 ; 3
lsr			 ; 2
adc	#0	; 2
lsr			 ; 2
lsr			 ; 2
lsr			 ; 2
sec			 ; 2
adc	temp	 ; 3
ror			 ; 2
lsr			 ; 2
lsr			 ; 2
lsr			 ; 2
Of course it is of no use as a repositioning routine, but might be of use if you need to do /15 otherwise.

I'm sorry. I went off on a bit of a tangent before. I was reffering to this: http://mathworld.wolfram.com/HexNumber.html

A friend of mine was looking once at using this form of counting to compress large amounts of data into a small space by using some clever math and this form of counting. Seemed rather neat and since it dealt with hex values, I figured it might have a home somewhere in the atari 2600 world. I was just trying to throw something new out into the mix and see if it stired any brain cells out there.

Edited February 13, 2009 by grafixbmp

+grafixbmp · February 13, 2009

Another attempt :-

		ldx #0		; 2
	lda byte	  ; 3
	cmp #200	  ; 2
	bcc below200  ; 2/3
	sbc #200	  ; 2
	ldx #20	; 2
	jmp below100  ; 3
below200
	cmp #100	  ; 2
	bcc below100  ; 2/3
	sbc #100	  ; 2
	ldx #10	; 2
below100
	cmp #50	; 2
	bcc below50; 2/3
	sbc #50	; 2
	sta temp	  ; 3
	txa		; 2
	ldx temp	  ; 3
	adc #5-1	  ; 2 (Carry set)
	jmp above50; 3
below50
	sta temp	  ; 3
	txa		; 2
	ldx temp	  ; 3
above50
	cpx #10	; 2
	bcc done	  ; 2/3
	adc #0		; 2
	cpx #20	; 2
	bcc done	  ; 2/3
	adc #0		; 2
	cpx #30	; 2
	bcc done	  ; 2/3
	adc #0		; 2
	cpx #40	; 2
	adc #0		; 2
done

Lots of branches and variable execution time. Tested against the Batari original.

I would suggest that the overhead you speak of would start to take more cycles then an efficent divide by routine. The divde by seven routine is brilliant and will be hard to beat.

Relatively speaking the fastest way to do division (except /2) is to just use a table that holds every possible value. That would cost you a page worth of bytes for your table though! So right away a hybrid method like Supercat's is much more economical.

I absolutely agree with pattern recognition though. If you can get a close approximation to what you are dividing then you have a hope of finding a pattern to correct the division on the numbers it fails. I would suggest laying it all out in an excel sheet. I did that to see where the /10 routine failed, and how close it was at a given point.

The overhead I was refering to was like unto the code posted here by groovybee. Where within certain ranges, treat the equasion diffrently. If a piece of code works best for low numbers, use it. If another section of code works best for middle range numbers use that. Or on a diffrent way, if a certain pattern to nubers works better with one set of code... You see where I am going. Your entire routine may take up more space than before but as it runs, depending on what number is being worked with, it will jump to the most efficient set of processes to get the answer.

Omegamatrix · February 15, 2009

I looked for a divide by 5 routine and couldn't find one. Has anyone wrote one before? Anyway I had a go at it and this is about my 3rd working attempt. I looked at each number through Stella's debugger. I think it's right. :ponder: I'm going to keep trying to cut it down.

;divide by 5
;40 cycles, 25 bytes

sta	temp	   ; 3
lsr			   ; 2
lsr			   ; 2
eor	#$FF	   ; 2
sec			   ; 2
adc	temp	   ; 3
lsr			   ; 2
lsr			   ; 2
eor	#$FF	   ; 2
sec			   ; 2
adc	temp	   ; 3
lsr			   ; 2
lsr			   ; 2
eor	#$FF	   ; 2
sec			   ; 2
adc	temp	   ; 3
lsr			   ; 2
lsr			   ; 2

Omegamatrix · February 15, 2009

If execution time is not as important as bytes, and you need to do both unsigned /5 and /10 in your routine; then use the /5 routine below. For /10 Simply do another LSR after the routine and you're there.

;slow divide by 5
;56 cycles, 16 bytes

sta	temp	   ; 3
ldy	#3		 ; 2
lsr			   ; 2
lsr			   ; 2
.loop
eor	#$FF	   ; 2
sec			   ; 2
adc	temp	   ; 3
lsr			   ; 2
lsr			   ; 2
dey			   ; 2
bne	.loop	  ; 2/3

Omegamatrix · February 16, 2009

I found a way to save two cycles and one byte, but couldn't make any more head way.

;divide by 5
;38 cycles, 24 bytes
sta	temp	   ; 3
lsr			   ; 2
adc	#1		 ; 2
adc	temp	   ; 3
ror			   ; 2
lsr			   ; 2
lsr			   ; 2
eor	#$FF	   ; 2
sec			   ; 2
adc	temp	   ; 3
lsr			   ; 2
lsr			   ; 2
eor	#$FF	   ; 2
sec			   ; 2
adc	temp	   ; 3
lsr			   ; 2
lsr			   ; 2

Then I happened upon illegal opcode $4B (ASR). This seems taylor made for what I need. It can AND, right shift, and guarantee the carry will be clear (by ANDing then shifting). I really hope this opcode is stable. It saves 6 cycles over the first routine I posted while using the same amount of bytes.

;divide by 5
;uses illegal opcode $4B (ASR)
;34 cycles, 25 bytes

sta	temp	   ; 3
lsr			   ; 2
asr	#$FE	   ; 2
sbc	temp	   ; 3
eor	#$FF	   ; 2
lsr			   ; 2
asr	#$FE	   ; 2
sbc	temp	   ; 3
eor	#$FF	   ; 2
lsr			   ; 2
asr	#$FE	   ; 2
sbc	temp	   ; 3
eor	#$FF	   ; 2
lsr			   ; 2
lsr			   ; 2

danwinslow · February 16, 2009

Nice....I was just reading up on some of the illegal opcodes, and its cool to see a clear usage. Anyone know why they are 'illegal'?

Omegamatrix · February 16, 2009

Anyone know why they are 'illegal'?

I guess because you weren't supposed to use them. Since they weren't documented (officially) it meant they were not guaranteed to be there if the IC was revised. This was long ago. Today the only revisions are stuff like the Flashback consoles. Some of the late Jr models might have more trouble with some illegal ops. I haven't tested any hardware myself; just read tibits here and there.

Some people avoid illegal ops like the plaque, others use them all the time. I think all agree in a situation where they work they can be extremely useful. Then can save bytes, save cycles, or save both.

I found by merging the last two routines I can save a byte.

;divide by 5
;34 cycles, 24 bytes
;uses illegal opcode #$4B (ASR)

sta	temp	   ; 3
lsr			   ; 2
adc	#1		 ; 2
adc	temp	   ; 3
ror			   ; 2
lsr			   ; 2
asr	#$FE	   ; 2
sbc	temp	   ; 3
eor	#$FF	   ; 2
lsr			   ; 2
asr	#$FE	   ; 2
sbc	temp	   ; 3
eor	#$FF	   ; 2
lsr			   ; 2
lsr			   ; 2

Omegamatrix · February 18, 2009

Crunched two more cycles, saved another byte. By adding an LSR at the end of the routine you can do a /10 which would be one cycle faster then the last /10 solution I posted. However, it will cost you more bytes.

;divide by 5
;uses illegal opcode $4B (ASR)
;32 cycles, 23 bytes

sta	temp	   ; 3
lsr			   ; 2
asr	#$FE	   ; 2
sbc	temp	   ; 3
lsr			   ; 2
asr	#$FE	   ; 2
adc	#$C1	   ; 2
adc	temp	   ; 3
lsr			   ; 2
asr	#$FE	   ; 2
sbc	temp	   ; 3
eor	#$FF	   ; 2
lsr			   ; 2
lsr			   ; 2

This latest solution is not pretty. Actually it is quite ugly, but it works. I derived adding $C1 by swapping bits of the routines in and out with each other. At one point it became clear I was toggling all the bits of the value twice (so changing nothing). However in between came two right shifts and adding zero with the carry set. So bits 6 and 7 became 1's plus the carry = $C1. Might not be a great explaination, but my brain is a little fried right now from looking at this.

djmips · February 18, 2009

So bits 6 and 7 became 1's plus the carry = $C1. Might not be a great explaination, but my brain is a little fried right now from looking at this.

Hey this is off topic, this is the divide by 7 thread!

But seriously, you are living inside the code. Quite a weird feeling eh? You're probably having 6502 daydreams by this point!

Omegamatrix · February 18, 2009

So bits 6 and 7 became 1's plus the carry = $C1. Might not be a great explaination, but my brain is a little fried right now from looking at this.

Hey this is off topic, this is the divide by 7 thread!

But seriously, you are living inside the code. Quite a weird feeling eh? You're probably having 6502 daydreams by this point!

I had a dream once I couldn't speak English, but I was speaking in assembly. The weird thing was everyone understood me. I haven't thought of that dream in a while. ;P

I guess I like to see how far I can push something until it breaks. Unfortunately I seem to break these routines more often then push them. :ponder:

+selgus · February 18, 2009

Crunched two more cycles, saved another byte. By adding an LSR at the end of the routine you can do a /10 which would be one cycle faster then the last /10 solution I posted. However, it will cost you more bytes.
;divide by 5
;uses illegal opcode $4B (ASR)
;32 cycles, 23 bytes

sta	temp	  ; 3
lsr			  ; 2
asr	#$FE	  ; 2
sbc	temp	  ; 3
lsr			  ; 2
asr	#$FE	  ; 2
adc	#$C1	  ; 2
adc	temp	  ; 3
lsr			  ; 2
asr	#$FE	  ; 2
sbc	temp	  ; 3
eor	#$FF	  ; 2
lsr			  ; 2
lsr			  ; 2
This latest solution is not pretty. Actually it is quite ugly, but it works. I derived adding $C1 by swapping bits of the routines in and out with each other. At one point it became clear I was toggling all the bits of the value twice (so changing nothing). However in between came two right shifts and adding zero with the carry set. So bits 6 and 7 became 1's plus the carry = $C1. Might not be a great explaination, but my brain is a little fried right now from looking at this.

I'll have to admit, this latest one I don't really understand yet... I'm going to have to look up those illegal instructions and then work out the bit patterns on paper.

--Selgus

Omegamatrix · February 18, 2009

ASR is kind of cool as it first preforms an AND with the value you give it, and then shifts it to the right. So when you AND with %11111110 you clear bit 0, and the right shift puts bit 0 into the carry.

When you add a value you are supposed clear the carry, and when you subtract you are supposed to set the carry. However if you do the opposite with the carry:

LDA #0

SEC

ADC #0

Will give you #$01 in the accumulator.

LDA #0

CLC

SBC #0

Will give you #$FF in the accumulator.

It becomes easy enough to start saving a byte when you see stuff like:

SomeRandomRoutine:

...

ASL

BCC .there

SEC

SBC #$07

...

Change it to:

...

ASL

BCC .there

SBC #$06

...

And save your self the byte and 2 cycles. ADC #0 is great when you have a score you are updating.

AddThirtyToScoreRoutine:

LDA loScore

CLC

ADC #30

STA loScore ; but what if it rolled over?

LDA HiScore

ADC #0 ; will add the rollover to the next ram register if there was one.

STA HiScore

But one of the best uses I have ever seen for ADC #0 is swapping nybbles.

;SwapNybblesRoutine

;state of carry doesn't matter before hand,
;carry is always clear afterward, and the nybbles are swapped.

asl					  ; 2
adc	#0				; 2
asl					  ; 2
adc	#0				; 2
asl					  ; 2
adc	#0				; 2
asl					  ; 2
adc	#0				; 2

Edited February 18, 2009 by Omegamatrix

danwinslow · February 18, 2009

yup, that swap nybbles is pretty cool.

Edited February 18, 2009 by danwinslow

Nukey Shay · February 18, 2009

Nice....I was just reading up on some of the illegal opcodes, and its cool to see a clear usage. Anyone know why they are 'illegal'?

They aren't. Just originally undocumented due to the odd/mixed effects they have. AFAIK, they became "illegal" from what an assembler might tell you. Entering anything that it didn't recognise would give you an error. So if you used LAX in one that didn't have that 3-letter mnemonic in it's lookup table, it would commonly reply that you entered an illegal instruction (or something to that effect). In that case, you'd have to cheat and just enter the byte values for the opcode and it's argument to get around the assembler's limitation.

The bad news is that because they were never "officially" part of the instruction set, they are commonly left out of 6502 chip revisions...even the opcodes that ran stable enough to be reliable.

+selgus · February 18, 2009

ASR is kind of cool as it first preforms an AND with the value you give it, and then shifts it to the right. So when you AND with %11111110 you clear bit 0, and the right shift puts bit 0 into the carry.

I probably should have been more clear-- I wanted to figure out how the illegal instruction was being used in this case to meet the goal of dividing by 5. The ADC #0 was not confusing at all.

But one of the best uses I have ever seen for ADC #0 is swapping nybbles.

;SwapNybblesRoutine

;state of carry doesn't matter before hand,
;carry is always clear afterward, and the nybbles are swapped.

asl					 ; 2
adc	#0			; 2
asl					 ; 2
adc	#0			; 2
asl					 ; 2
adc	#0			; 2
asl					 ; 2
adc	#0			; 2

Why wouldn't you just do something like this:

	cmp #127
ror
ror
ror
ror

or am I missing something?

--Selgus

Omegamatrix · February 18, 2009

I wanted to figure out how the illegal instruction was being used in this case to meet the goal of dividing by 5.

Its use in this case is speed. I can replace:

LSR

CLC

with ASR #$FE, and save 2 cycles with the same amount of bytes.

Why wouldn't you just do something like this:
	cmp #127
ror
ror
ror
ror
or am I missing something?

--Selgus

I just tried it, but it didn't work.

+selgus · February 18, 2009

I just tried it, but it didn't work.

That will teach me not to type off the top of my head again! The bit would have to move through the carry, so it would probably end up with a similar routine as yours in the end.

--Selgus

Omegamatrix · February 18, 2009

The swap nybbles routine I found comes from the Atari POS (point of sale) rom I found in Rom Hunters collection. I don't know if Atari had different Kiosks or what. I partly disassembled the rom, and well got bored and moved on. So any random scribbling I made could be completely wrong, but here it is. I'm not sure if a swap nybble routine was used in any other game.

AtariPOS_re_.zip

bogax · February 21, 2009

I just tried it, but it didn't work.

That will teach me not to type off the top of my head again! The bit would have to move through the carry, so it would probably end up with a similar routine as yours in the end.

--Selgus

here's what I think of as the "standard" way of moving a bit through the carry

 cmp #$80  ;copy the high bit to the carry
rol

but here's another way using addition

(I'm going to be all cagey and not

comment it, it's more fun that way )

;swapnibbles

asl
adc #$80
rol
asl
adc #$80
rol

supercat · February 21, 2009

but here's another way using addition
(I'm going to be all cagey and not

comment it, it's more fun that way )

That is cute. How'd you come up with that? That ranks right up there with the following routine someone came up with (on another platform, but adaptable to 6502):

; Magical mystery routine (hint: use with acc = 0-15)
 sed
 adc #$90
 adc #$40
 cld

Anyone ever seen anything like that one before?

It's pretty impressive managing to do the nybble swap with only two "extra" instructions totaling four bytes/four cycles (considering the 4 two-cycle shift/rotate instructions as the "base"). I think the best I would have been able to manage would have been two instructions totalling four bytes/six cycles, plus a magically-pre-initialized register:

  asl
 rol
 rol
 rol
 sax temp; Assumes X=$07 or $0F
 adc temp

Edited February 21, 2009 by supercat

Omegamatrix · February 21, 2009

but here's another way using addition
(I'm going to be all cagey and not

comment it, it's more fun that way )
;swapnibbles

asl
adc #$80
rol
asl
adc #$80
rol

I tried it and it works! A very elegant way of doing it to. Did you do that yourself?

It's pretty impressive managing to do the nybble swap with only two "extra" instructions totaling four bytes/four cycles (considering the 4 two-cycle shift/rotate instructions as the "base"). I think the best I would have been able to manage would have been two instructions totalling four bytes/six cycles, plus a magically-pre-initialized register:
  asl
 rol
 rol
 rol
 sax temp; Assumes X=$07 or $0F
 adc temp

Supercat's works too. I used LDX #$07. I have no idea what the other routine is for though.

Fast divide by seven?

Recommended Posts

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Join the conversation

Recently Browsing 0 members