danwinslow Posted February 11, 2009 Share Posted February 11, 2009 It's hard to say....I don't completely follow you. My impression is that you are wandering off towards more theoretical ideas that really have nothing to do with what's at hand, namely, efficient assembler routines to do 8 bit division. It's not that you aren't bringing up interesting things, you are, but they are not really apropos. Then again, I don't follow the assembler discussion very well either , but I like to try to follow along. Quote Link to comment Share on other sites More sharing options...
Omegamatrix Posted February 12, 2009 Share Posted February 12, 2009 I would suggest that the overhead you speak of would start to take more cycles then an efficent divide by routine. The divde by seven routine is brilliant and will be hard to beat. Relatively speaking the fastest way to do division (except /2) is to just use a table that holds every possible value. That would cost you a page worth of bytes for your table though! So right away a hybrid method like Supercat's is much more economical. I absolutely agree with pattern recognition though. If you can get a close approximation to what you are dividing then you have a hope of finding a pattern to correct the division on the numbers it fails. I would suggest laying it all out in an excel sheet. I did that to see where the /10 routine failed, and how close it was at a given point. Quote Link to comment Share on other sites More sharing options...
+grafixbmp Posted February 12, 2009 Share Posted February 12, 2009 I would suggest that the overhead you speak of would start to take more cycles then an efficent divide by routine. The divde by seven routine is brilliant and will be hard to beat. Relatively speaking the fastest way to do division (except /2) is to just use a table that holds every possible value. That would cost you a page worth of bytes for your table though! So right away a hybrid method like Supercat's is much more economical. I absolutely agree with pattern recognition though. If you can get a close approximation to what you are dividing then you have a hope of finding a pattern to correct the division on the numbers it fails. I would suggest laying it all out in an excel sheet. I did that to see where the /10 routine failed, and how close it was at a given point. Very cool. Excel rocks for this. And I do agree, can't get any better than the current /7. But what about other ones that are harder or longer routines that are currently being used now which could use improving? What about using less RAM to store large values. Larger than current 16 bit values can muster. That website for prime hex shows some intresting ways of counting. The friend that introduced me to hex counting suggested having a method for counting really large stored numbers by counting the hex rings in the visual pattern along with a value for an offset and beable to accuretely store a large number with far less space than usual. Just 4 full rings reach from 0 to 37 with the inner one not being a ring at all. Add a 5th full ring and you jump to 61. if one were to treat bianary as these rings then numbers would follow like this: 0 1 6 7 12 13 18 19 here 18 and 19 could repeat with either just the 4th ring or the 4th and first. With this, off sets might be easier to do. Quote Link to comment Share on other sites More sharing options...
Omegamatrix Posted February 13, 2009 Share Posted February 13, 2009 I don't really follow you when you are talking about rings, Grafixbmp. I would guess most of the numbers we are trying to divide are a byte or less though for the 2600. Out of that there really are only a handful that are used a lot. The divide by 15 repositioning is one of the most important. I seem to remember Cybergoth discovering it in Battlezone. It repositioned your sprite horizontally any where on the scanline in a very clever way. The December '84 Apple Assembly Line link posted earlier in this thread also had a beautiful /15 routine. I looked at it and found it could be improved to this: ;divide by 15 ;26 cycles, 15 bytes sta temp ; 3 lsr ; 2 adc #0 ; 2 lsr ; 2 lsr ; 2 lsr ; 2 sec ; 2 adc temp ; 3 ror ; 2 lsr ; 2 lsr ; 2 lsr ; 2 Of course it is of no use as a repositioning routine, but might be of use if you need to do /15 otherwise. Quote Link to comment Share on other sites More sharing options...
+grafixbmp Posted February 13, 2009 Share Posted February 13, 2009 (edited) I don't really follow you when you are talking about rings, Grafixbmp. I would guess most of the numbers we are trying to divide are a byte or less though for the 2600. Out of that there really are only a handful that are used a lot. The divide by 15 repositioning is one of the most important. I seem to remember Cybergoth discovering it in Battlezone. It repositioned your sprite horizontally any where on the scanline in a very clever way. The December '84 Apple Assembly Line link posted earlier in this thread also had a beautiful /15 routine. I looked at it and found it could be improved to this: ;divide by 15 ;26 cycles, 15 bytes sta temp ; 3 lsr ; 2 adc #0 ; 2 lsr ; 2 lsr ; 2 lsr ; 2 sec ; 2 adc temp ; 3 ror ; 2 lsr ; 2 lsr ; 2 lsr ; 2 Of course it is of no use as a repositioning routine, but might be of use if you need to do /15 otherwise. I'm sorry. I went off on a bit of a tangent before. I was reffering to this: http://mathworld.wolfram.com/HexNumber.html A friend of mine was looking once at using this form of counting to compress large amounts of data into a small space by using some clever math and this form of counting. Seemed rather neat and since it dealt with hex values, I figured it might have a home somewhere in the atari 2600 world. I was just trying to throw something new out into the mix and see if it stired any brain cells out there. Edited February 13, 2009 by grafixbmp Quote Link to comment Share on other sites More sharing options...
+grafixbmp Posted February 13, 2009 Share Posted February 13, 2009 Another attempt :- ldx #0 ; 2 lda byte ; 3 cmp #200 ; 2 bcc below200 ; 2/3 sbc #200 ; 2 ldx #20 ; 2 jmp below100 ; 3 below200 cmp #100 ; 2 bcc below100 ; 2/3 sbc #100 ; 2 ldx #10 ; 2 below100 cmp #50 ; 2 bcc below50; 2/3 sbc #50 ; 2 sta temp ; 3 txa ; 2 ldx temp ; 3 adc #5-1 ; 2 (Carry set) jmp above50; 3 below50 sta temp ; 3 txa ; 2 ldx temp ; 3 above50 cpx #10 ; 2 bcc done ; 2/3 adc #0 ; 2 cpx #20 ; 2 bcc done ; 2/3 adc #0 ; 2 cpx #30 ; 2 bcc done ; 2/3 adc #0 ; 2 cpx #40 ; 2 adc #0 ; 2 done Lots of branches and variable execution time. Tested against the Batari original. I would suggest that the overhead you speak of would start to take more cycles then an efficent divide by routine. The divde by seven routine is brilliant and will be hard to beat. Relatively speaking the fastest way to do division (except /2) is to just use a table that holds every possible value. That would cost you a page worth of bytes for your table though! So right away a hybrid method like Supercat's is much more economical. I absolutely agree with pattern recognition though. If you can get a close approximation to what you are dividing then you have a hope of finding a pattern to correct the division on the numbers it fails. I would suggest laying it all out in an excel sheet. I did that to see where the /10 routine failed, and how close it was at a given point. The overhead I was refering to was like unto the code posted here by groovybee. Where within certain ranges, treat the equasion diffrently. If a piece of code works best for low numbers, use it. If another section of code works best for middle range numbers use that. Or on a diffrent way, if a certain pattern to nubers works better with one set of code... You see where I am going. Your entire routine may take up more space than before but as it runs, depending on what number is being worked with, it will jump to the most efficient set of processes to get the answer. Quote Link to comment Share on other sites More sharing options...
Omegamatrix Posted February 15, 2009 Share Posted February 15, 2009 I looked for a divide by 5 routine and couldn't find one. Has anyone wrote one before? Anyway I had a go at it and this is about my 3rd working attempt. I looked at each number through Stella's debugger. I think it's right. I'm going to keep trying to cut it down. ;divide by 5 ;40 cycles, 25 bytes sta temp ; 3 lsr ; 2 lsr ; 2 eor #$FF ; 2 sec ; 2 adc temp ; 3 lsr ; 2 lsr ; 2 eor #$FF ; 2 sec ; 2 adc temp ; 3 lsr ; 2 lsr ; 2 eor #$FF ; 2 sec ; 2 adc temp ; 3 lsr ; 2 lsr ; 2 Quote Link to comment Share on other sites More sharing options...
Omegamatrix Posted February 15, 2009 Share Posted February 15, 2009 If execution time is not as important as bytes, and you need to do both unsigned /5 and /10 in your routine; then use the /5 routine below. For /10 Simply do another LSR after the routine and you're there. ;slow divide by 5 ;56 cycles, 16 bytes sta temp ; 3 ldy #3 ; 2 lsr ; 2 lsr ; 2 .loop eor #$FF ; 2 sec ; 2 adc temp ; 3 lsr ; 2 lsr ; 2 dey ; 2 bne .loop ; 2/3 Quote Link to comment Share on other sites More sharing options...
Omegamatrix Posted February 16, 2009 Share Posted February 16, 2009 I found a way to save two cycles and one byte, but couldn't make any more head way. ;divide by 5 ;38 cycles, 24 bytes sta temp ; 3 lsr ; 2 adc #1 ; 2 adc temp ; 3 ror ; 2 lsr ; 2 lsr ; 2 eor #$FF ; 2 sec ; 2 adc temp ; 3 lsr ; 2 lsr ; 2 eor #$FF ; 2 sec ; 2 adc temp ; 3 lsr ; 2 lsr ; 2 Then I happened upon illegal opcode $4B (ASR). This seems taylor made for what I need. It can AND, right shift, and guarantee the carry will be clear (by ANDing then shifting). I really hope this opcode is stable. It saves 6 cycles over the first routine I posted while using the same amount of bytes. ;divide by 5 ;uses illegal opcode $4B (ASR) ;34 cycles, 25 bytes sta temp ; 3 lsr ; 2 asr #$FE ; 2 sbc temp ; 3 eor #$FF ; 2 lsr ; 2 asr #$FE ; 2 sbc temp ; 3 eor #$FF ; 2 lsr ; 2 asr #$FE ; 2 sbc temp ; 3 eor #$FF ; 2 lsr ; 2 lsr ; 2 Quote Link to comment Share on other sites More sharing options...
danwinslow Posted February 16, 2009 Share Posted February 16, 2009 Nice....I was just reading up on some of the illegal opcodes, and its cool to see a clear usage. Anyone know why they are 'illegal'? Quote Link to comment Share on other sites More sharing options...
Omegamatrix Posted February 16, 2009 Share Posted February 16, 2009 Anyone know why they are 'illegal'? I guess because you weren't supposed to use them. Since they weren't documented (officially) it meant they were not guaranteed to be there if the IC was revised. This was long ago. Today the only revisions are stuff like the Flashback consoles. Some of the late Jr models might have more trouble with some illegal ops. I haven't tested any hardware myself; just read tibits here and there. Some people avoid illegal ops like the plaque, others use them all the time. I think all agree in a situation where they work they can be extremely useful. Then can save bytes, save cycles, or save both. I found by merging the last two routines I can save a byte. ;divide by 5 ;34 cycles, 24 bytes ;uses illegal opcode #$4B (ASR) sta temp ; 3 lsr ; 2 adc #1 ; 2 adc temp ; 3 ror ; 2 lsr ; 2 asr #$FE ; 2 sbc temp ; 3 eor #$FF ; 2 lsr ; 2 asr #$FE ; 2 sbc temp ; 3 eor #$FF ; 2 lsr ; 2 lsr ; 2 Quote Link to comment Share on other sites More sharing options...
Omegamatrix Posted February 18, 2009 Share Posted February 18, 2009 Crunched two more cycles, saved another byte. By adding an LSR at the end of the routine you can do a /10 which would be one cycle faster then the last /10 solution I posted. However, it will cost you more bytes. ;divide by 5 ;uses illegal opcode $4B (ASR) ;32 cycles, 23 bytes sta temp ; 3 lsr ; 2 asr #$FE ; 2 sbc temp ; 3 lsr ; 2 asr #$FE ; 2 adc #$C1 ; 2 adc temp ; 3 lsr ; 2 asr #$FE ; 2 sbc temp ; 3 eor #$FF ; 2 lsr ; 2 lsr ; 2 This latest solution is not pretty. Actually it is quite ugly, but it works. I derived adding $C1 by swapping bits of the routines in and out with each other. At one point it became clear I was toggling all the bits of the value twice (so changing nothing). However in between came two right shifts and adding zero with the carry set. So bits 6 and 7 became 1's plus the carry = $C1. Might not be a great explaination, but my brain is a little fried right now from looking at this. Quote Link to comment Share on other sites More sharing options...
djmips Posted February 18, 2009 Share Posted February 18, 2009 So bits 6 and 7 became 1's plus the carry = $C1. Might not be a great explaination, but my brain is a little fried right now from looking at this. Hey this is off topic, this is the divide by 7 thread! But seriously, you are living inside the code. Quite a weird feeling eh? You're probably having 6502 daydreams by this point! Quote Link to comment Share on other sites More sharing options...
Omegamatrix Posted February 18, 2009 Share Posted February 18, 2009 So bits 6 and 7 became 1's plus the carry = $C1. Might not be a great explaination, but my brain is a little fried right now from looking at this. Hey this is off topic, this is the divide by 7 thread! But seriously, you are living inside the code. Quite a weird feeling eh? You're probably having 6502 daydreams by this point! I had a dream once I couldn't speak English, but I was speaking in assembly. The weird thing was everyone understood me. I haven't thought of that dream in a while. ;P I guess I like to see how far I can push something until it breaks. Unfortunately I seem to break these routines more often then push them. 1 Quote Link to comment Share on other sites More sharing options...
+selgus Posted February 18, 2009 Share Posted February 18, 2009 Crunched two more cycles, saved another byte. By adding an LSR at the end of the routine you can do a /10 which would be one cycle faster then the last /10 solution I posted. However, it will cost you more bytes. ;divide by 5 ;uses illegal opcode $4B (ASR) ;32 cycles, 23 bytes sta temp ; 3 lsr ; 2 asr #$FE ; 2 sbc temp ; 3 lsr ; 2 asr #$FE ; 2 adc #$C1 ; 2 adc temp ; 3 lsr ; 2 asr #$FE ; 2 sbc temp ; 3 eor #$FF ; 2 lsr ; 2 lsr ; 2 This latest solution is not pretty. Actually it is quite ugly, but it works. I derived adding $C1 by swapping bits of the routines in and out with each other. At one point it became clear I was toggling all the bits of the value twice (so changing nothing). However in between came two right shifts and adding zero with the carry set. So bits 6 and 7 became 1's plus the carry = $C1. Might not be a great explaination, but my brain is a little fried right now from looking at this. I'll have to admit, this latest one I don't really understand yet... I'm going to have to look up those illegal instructions and then work out the bit patterns on paper. --Selgus Quote Link to comment Share on other sites More sharing options...
Omegamatrix Posted February 18, 2009 Share Posted February 18, 2009 (edited) ASR is kind of cool as it first preforms an AND with the value you give it, and then shifts it to the right. So when you AND with %11111110 you clear bit 0, and the right shift puts bit 0 into the carry. When you add a value you are supposed clear the carry, and when you subtract you are supposed to set the carry. However if you do the opposite with the carry: LDA #0 SEC ADC #0 Will give you #$01 in the accumulator. LDA #0 CLC SBC #0 Will give you #$FF in the accumulator. It becomes easy enough to start saving a byte when you see stuff like: SomeRandomRoutine: ... ASL ASL BCC .there SEC SBC #$07 ... Change it to: ... ASL ASL BCC .there SBC #$06 ... And save your self the byte and 2 cycles. ADC #0 is great when you have a score you are updating. AddThirtyToScoreRoutine: LDA loScore CLC ADC #30 STA loScore ; but what if it rolled over? LDA HiScore ADC #0 ; will add the rollover to the next ram register if there was one. STA HiScore But one of the best uses I have ever seen for ADC #0 is swapping nybbles. ;SwapNybblesRoutine ;state of carry doesn't matter before hand, ;carry is always clear afterward, and the nybbles are swapped. asl ; 2 adc #0 ; 2 asl ; 2 adc #0 ; 2 asl ; 2 adc #0 ; 2 asl ; 2 adc #0 ; 2 Edited February 18, 2009 by Omegamatrix Quote Link to comment Share on other sites More sharing options...
danwinslow Posted February 18, 2009 Share Posted February 18, 2009 (edited) yup, that swap nybbles is pretty cool. Edited February 18, 2009 by danwinslow Quote Link to comment Share on other sites More sharing options...
Nukey Shay Posted February 18, 2009 Share Posted February 18, 2009 Nice....I was just reading up on some of the illegal opcodes, and its cool to see a clear usage. Anyone know why they are 'illegal'?They aren't. Just originally undocumented due to the odd/mixed effects they have. AFAIK, they became "illegal" from what an assembler might tell you. Entering anything that it didn't recognise would give you an error. So if you used LAX in one that didn't have that 3-letter mnemonic in it's lookup table, it would commonly reply that you entered an illegal instruction (or something to that effect). In that case, you'd have to cheat and just enter the byte values for the opcode and it's argument to get around the assembler's limitation. The bad news is that because they were never "officially" part of the instruction set, they are commonly left out of 6502 chip revisions...even the opcodes that ran stable enough to be reliable. Quote Link to comment Share on other sites More sharing options...
+selgus Posted February 18, 2009 Share Posted February 18, 2009 ASR is kind of cool as it first preforms an AND with the value you give it, and then shifts it to the right. So when you AND with %11111110 you clear bit 0, and the right shift puts bit 0 into the carry. I probably should have been more clear-- I wanted to figure out how the illegal instruction was being used in this case to meet the goal of dividing by 5. The ADC #0 was not confusing at all. But one of the best uses I have ever seen for ADC #0 is swapping nybbles. ;SwapNybblesRoutine ;state of carry doesn't matter before hand, ;carry is always clear afterward, and the nybbles are swapped. asl ; 2 adc #0 ; 2 asl ; 2 adc #0 ; 2 asl ; 2 adc #0 ; 2 asl ; 2 adc #0 ; 2 Why wouldn't you just do something like this: cmp #127 ror ror ror ror or am I missing something? --Selgus Quote Link to comment Share on other sites More sharing options...
Omegamatrix Posted February 18, 2009 Share Posted February 18, 2009 I wanted to figure out how the illegal instruction was being used in this case to meet the goal of dividing by 5. Its use in this case is speed. I can replace: LSR CLC with ASR #$FE, and save 2 cycles with the same amount of bytes. Why wouldn't you just do something like this: cmp #127 ror ror ror ror or am I missing something? --Selgus I just tried it, but it didn't work. Quote Link to comment Share on other sites More sharing options...
+selgus Posted February 18, 2009 Share Posted February 18, 2009 I just tried it, but it didn't work. That will teach me not to type off the top of my head again! The bit would have to move through the carry, so it would probably end up with a similar routine as yours in the end. --Selgus Quote Link to comment Share on other sites More sharing options...
Omegamatrix Posted February 18, 2009 Share Posted February 18, 2009 The swap nybbles routine I found comes from the Atari POS (point of sale) rom I found in Rom Hunters collection. I don't know if Atari had different Kiosks or what. I partly disassembled the rom, and well got bored and moved on. So any random scribbling I made could be completely wrong, but here it is. I'm not sure if a swap nybble routine was used in any other game. AtariPOS_re_.zip Quote Link to comment Share on other sites More sharing options...
bogax Posted February 21, 2009 Share Posted February 21, 2009 I just tried it, but it didn't work. That will teach me not to type off the top of my head again! The bit would have to move through the carry, so it would probably end up with a similar routine as yours in the end. --Selgus here's what I think of as the "standard" way of moving a bit through the carry cmp #$80 ;copy the high bit to the carry rol but here's another way using addition (I'm going to be all cagey and not comment it, it's more fun that way ) ;swapnibbles asl adc #$80 rol asl adc #$80 rol 1 Quote Link to comment Share on other sites More sharing options...
supercat Posted February 21, 2009 Share Posted February 21, 2009 (edited) but here's another way using addition(I'm going to be all cagey and not comment it, it's more fun that way ) That is cute. How'd you come up with that? That ranks right up there with the following routine someone came up with (on another platform, but adaptable to 6502): ; Magical mystery routine (hint: use with acc = 0-15) sed adc #$90 adc #$40 cld Anyone ever seen anything like that one before? It's pretty impressive managing to do the nybble swap with only two "extra" instructions totaling four bytes/four cycles (considering the 4 two-cycle shift/rotate instructions as the "base"). I think the best I would have been able to manage would have been two instructions totalling four bytes/six cycles, plus a magically-pre-initialized register: asl rol rol rol sax temp; Assumes X=$07 or $0F adc temp Edited February 21, 2009 by supercat Quote Link to comment Share on other sites More sharing options...
Omegamatrix Posted February 21, 2009 Share Posted February 21, 2009 but here's another way using addition(I'm going to be all cagey and not comment it, it's more fun that way ) ;swapnibbles asl adc #$80 rol asl adc #$80 rol I tried it and it works! A very elegant way of doing it to. Did you do that yourself? It's pretty impressive managing to do the nybble swap with only two "extra" instructions totaling four bytes/four cycles (considering the 4 two-cycle shift/rotate instructions as the "base"). I think the best I would have been able to manage would have been two instructions totalling four bytes/six cycles, plus a magically-pre-initialized register: asl rol rol rol sax temp; Assumes X=$07 or $0F adc temp Supercat's works too. I used LDX #$07. I have no idea what the other routine is for though. Quote Link to comment Share on other sites More sharing options...
Recommended Posts
Join the conversation
You can post now and register later. If you have an account, sign in now to post with your account.
Note: Your post will require moderator approval before it will be visible.