snicklin Posted July 11, 2012

In 6502 Assembly, I am looking for a routine which will multiply an 8-bit number (0-127 only in this case, for my tiling system) by 16, leaving a 16-bit number. (The missing bit of data is used for other purposes.) Although my low-to-moderate 6502 skills could probably write this routine, they probably won't give me an efficient version of it.

If the original 8-bit number is in location 128 and the results are stored in 129 and 130, does anyone know a good way to do this? The multiply by 16 can be hard-coded; it doesn't have to be a general 16-bit mathematics function.

Could any suggestions please be in standard 6502, not Mads or any other compiler-specific notation?
flashjazzcat Posted July 11, 2012

The nice thing about multiplying by powers of two is that you can use bit-shifts. So:

    LDA 128
    STA 129   ; RESULT LSB
    LDA #0
    STA 130   ; RESULT MSB
    ASL 129   ; MULTIPLY BY 2
    ROL 130
    ASL 129
    ROL 130   ; MULTIPLY BY 4
    ASL 129
    ROL 130   ; BY 8
    ASL 129
    ROL 130   ; AND FINALLY BY 16

So we've simply doubled the 8-bit value four times.
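As a host-side sanity check (not part of the original post), the ASL/ROL pairing can be modelled in a few lines of Python. This is only a sketch: the function names `asl`, `rol` and `times16_shift_pair` are made up for illustration, and the model covers just the result byte and the carry flag, nothing else about the CPU.

```python
def asl(byte):
    """Model of ASL: shift left, bit 7 falls into carry. Returns (result, carry)."""
    return (byte << 1) & 0xFF, (byte >> 7) & 1


def rol(byte, carry_in):
    """Model of ROL: rotate left through carry. Returns (result, carry)."""
    return ((byte << 1) & 0xFF) | carry_in, (byte >> 7) & 1


def times16_shift_pair(value):
    """Mirror the posted routine: LSB starts as the value, MSB starts as 0."""
    lsb, msb = value & 0xFF, 0
    for _ in range(4):                 # four doublings = multiply by 16
        lsb, carry = asl(lsb)          # ASL 129
        msb, carry = rol(msb, carry)   # ROL 130 picks up the bit shifted out
    return lsb, msb


# Every 8-bit input should come out as value * 16 split across two bytes.
for v in range(256):
    lsb, msb = times16_shift_pair(v)
    assert (msb << 8) | lsb == v * 16
```

The point of the pairing is that each ASL pushes the top bit of the low byte into carry, and the following ROL pulls that same bit into the bottom of the high byte.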
snicklin (Author) Posted July 11, 2012

Ahh wonderful, and such a quick reply, thank you. I guessed that some ASL'ing would be taking place, but how to do it over 16 bits was what stumped me. Once more, thanks very much!
Rybags Posted July 11, 2012

An alternate, faster method using shifts both ways. Depending on what value you're multiplying by, you might save cycles by using table lookup. But shifting 4 times is 8 cycles, so there's no saving from table lookup in this case.

Multiply value in A by 16 (cumulative cycle count included). Y register used as temp storage:

    tay        ; save for later (2)
    lsr a      ; 4
    lsr a      ; 6
    lsr a      ; 8
    lsr a      ; 4 high bits of A remain (10)
    sta high   ; these become the high byte (14)
    tya        ; original value (16)
    asl a      ; 18
    asl a      ; 20
    asl a      ; 22
    asl a      ; multiply low 4 bits by 16 (24)
    sta low    ; 28

Doing shifts/rotates using A is almost always much quicker than doing it to memory if you're doing it more than once.

Edited July 11, 2012 by Rybags
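The nibble-split idea and the cycle comparison can be sketched on the host side in Python. The assumptions here are mine, not the posters': the names are illustrative, the memory-based routine is taken to use zero-page addresses (locations 128 to 130), and the accumulator routine is taken to use 4-cycle absolute stores, which is what the posted running total of 28 implies.

```python
def times16_nibbles(value):
    """Four LSRs leave the top nibble as the high byte;
    four ASLs move the bottom nibble up to form the low byte."""
    high = value >> 4             # lsr a, four times
    low = (value << 4) & 0xFF     # asl a, four times, kept to 8 bits
    return low, high


for v in range(256):
    low, high = times16_nibbles(v)
    assert (high << 8) | low == v * 16

# Cycle tallies using standard NMOS 6502 timings.
LDA_ZP, STA_ZP, LDA_IMM = 3, 3, 2
SHIFT_ZP = 5      # ASL or ROL on a zero-page location
SHIFT_A = 2       # ASL or LSR on the accumulator
TRANSFER = 2      # TAY / TYA
STA_ABS = 4

# LDA 128 / STA 129 / LDA #0 / STA 130, then four ASL+ROL pairs on memory.
memory_version_cycles = LDA_ZP + STA_ZP + LDA_IMM + STA_ZP + 8 * SHIFT_ZP
# TAY, four LSR, STA, TYA, four ASL, STA.
accumulator_version_cycles = 2 * TRANSFER + 8 * SHIFT_A + 2 * STA_ABS

print(memory_version_cycles, accumulator_version_cycles)
```

Under those assumptions the read-modify-write shifts on memory dominate the first routine, which is the effect the closing remark about shifting in A describes.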
snicklin (Author) Posted July 12, 2012

That's a nice bit of code there. I'm in the middle of coding a little test for them both. Thanks to both FJC and Rybags.

Edited July 12, 2012 by snicklin
xxl Posted July 15, 2012

>     tay        ; save for later (2)
>     lsr a      ; 4
>     lsr a      ; 6
>     lsr a      ; 8
>     lsr a      ; 4 high bits of A remain (10)
>     sta high   ; these become the high byte (14)
>     tya        ; original value (16)
>     asl a      ; 18
>     asl a      ; 20
>     asl a      ; 22
>     asl a      ; multiply low 4 bits by 16 (24)
>     sta low    ; 28

let's use undocumented 6502C

    ldx #$f0   ;2
    asl @      ;4
    rol @      ;6
    rol @      ;8
    rol @      ;10
    sax low    ;14
    and #$07   ;16
    rol @      ;18
    sta high   ;22

22 cycles without tables
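To see why the SAX version produces the right bytes, here is a small Python model. It is a sketch with hypothetical names, and it assumes the usual description of the undocumented SAX opcode: it stores A AND X to memory and leaves A and the flags unchanged.

```python
def asl(a):
    """ASL A: returns (result, carry)."""
    return (a << 1) & 0xFF, a >> 7


def rol(a, c):
    """ROL A: rotate left through carry, returns (result, carry)."""
    return ((a << 1) & 0xFF) | c, a >> 7


def xxl_sax(value):
    x = 0xF0                 # ldx #$f0
    a, c = asl(value)        # asl @
    for _ in range(3):       # rol @ (three times)
        a, c = rol(a, c)
    low = a & x              # sax low: stores A AND X, A itself is untouched
    a &= 0x07                # and #$07 keeps the three wrapped-around top bits
    a, c = rol(a, c)         # rol @ brings in the fourth bit from carry
    high = a                 # sta high
    return low, high


for v in range(256):
    low, high = xxl_sax(v)
    assert (high << 8) | low == v * 16
```

After the first four shifts A holds the low nibble in its top half and three of the original top bits in its bottom half, with the fourth still sitting in carry; SAX peels off the low byte without needing a second copy of A.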
+Stephen Posted July 15, 2012

This is an art form, and it's why I don't buy the argument that a "compiler / assembler is good enough".
frogstar_robot Posted July 16, 2012

> This is an art form, and it's why I don't buy the argument that a "compiler / assembler is good enough"

For 8 and 16 bit processors I think you're right. A dev who has spent years counting cycles and collecting efficient algos is going to get way tighter than a compiler. Besides which, the Motorola and Motorola-inspired products have been designed with human low-level coders in mind.

Modern 32 and 64 bit procs are designed to be targeted by compilers. By the time a coder can consistently outdo a compiler enough to be worth it, the next generation of products with different constraints and optimizations is out. Besides which, at the scale of the code run these days it isn't practical to code everything in assembly.

At best, one can profile the code and surgically shave cycles with bits of assembly where it has the highest payoff. It isn't much use coding a menu in assembly. Recoding the portion of a renderer where 60% of the processor time is spent is another matter altogether. The art there is knowing where meticulous assembly is best applied.
fox Posted July 16, 2012

> lets use undocumented 6502C

    ror @
    ror @
    ror @
    ror @
    ldx #$f
    sax high
    arr #$e0
    sta low

Legal instructions only:

    rol @
    rol @
    rol @
    rol @
    tay
    and #$f0
    sta low
    tya
    rol @
    and #$f
    sta high

    asl @
    rol @
    rol @
    rol @
    sta high
    and #$f0
    sta low
    eor high
    rol @
    sta high

Edited July 16, 2012 by fox
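All three of these sequences can be checked with a small carry-aware model in Python. This is a sketch with made-up function names. For ARR it assumes the commonly documented binary-mode result (AND with the immediate, then rotate right through carry) and ignores its unusual flag effects, since only the stored byte matters here.

```python
def asl(a):
    return (a << 1) & 0xFF, a >> 7            # (result, carry)


def rol(a, c):
    return ((a << 1) & 0xFF) | c, a >> 7      # rotate left through carry


def ror(a, c):
    return (a >> 1) | (c << 7), a & 1         # rotate right through carry


def fox_arr(value, carry_in):
    a, c = value, carry_in
    for _ in range(4):                        # ror @ x4
        a, c = ror(a, c)
    high = a & 0x0F                           # ldx #$f / sax high
    a = ((a & 0xE0) >> 1) | (c << 7)          # arr #$e0 (binary mode assumed)
    return a, high                            # sta low


def fox_legal_rol(value, carry_in):
    a, c = value, carry_in
    for _ in range(4):                        # rol @ x4
        a, c = rol(a, c)
    y = a                                     # tay
    low = a & 0xF0                            # and #$f0 / sta low
    a, c = rol(y, c)                          # tya / rol @
    return low, a & 0x0F                      # and #$f / sta high


def fox_legal_eor(value):
    a, c = asl(value)                         # asl @
    for _ in range(3):                        # rol @ x3
        a, c = rol(a, c)
    scratch = a                               # sta high (used as scratch)
    low = a & 0xF0                            # and #$f0 / sta low
    a = low ^ scratch                         # eor high clears the low nibble's copy
    a, c = rol(a, c)                          # rol @
    return low, a                             # sta high


for v in range(256):
    expected = ((v * 16) & 0xFF, (v * 16) >> 8)
    assert fox_legal_eor(v) == expected
    for carry in (0, 1):                      # result must not depend on carry on entry
        assert fox_arr(v, carry) == expected
        assert fox_legal_rol(v, carry) == expected
```

The loop over both carry values reflects that the two rotate-only versions never clear carry first: whatever bit enters on the first rotate ends up in a position that is later masked off.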
xxl Posted July 16, 2012

>     ror @
>     ror @
>     ror @
>     ror @
>     ldx #$f
>     sax high
>     arr #$e0
>     sta low

arr sweet
fox Posted July 16, 2012

Yeah, just make sure you're not in decimal mode.
flashjazzcat Posted July 16, 2012

And never run your code on a 65816 if using illegal instructions. Seriously though: very nice.
Chilly Willy Posted July 16, 2012

>> This is an art form, and it's why I don't buy the argument that a "compiler / assembler is good enough"

> For 8 and 16 bit processors I think you're right. A dev who has spent years counting cycles and collecting efficient algos is going to get way tighter than a compiler. Besides which, the Motorola and Motorola-inspired products have been designed with human low-level coders in mind.
>
> Modern 32 and 64 bit procs are designed to be targeted by compilers. By the time a coder can consistently outdo a compiler enough to be worth it, the next generation of products with different constraints and optimizations is out. Besides which, at the scale of the code run these days it isn't practical to code everything in assembly.
>
> At best, one can profile the code and surgically shave cycles with bits of assembly where it has the highest payoff. It isn't much use coding a menu in assembly. Recoding the portion of a renderer where 60% of the processor time is spent is another matter altogether. The art there is knowing where meticulous assembly is best applied.

Hand-done assembly is still applicable to modern CPUs, but you have to recognize where the speed comes from in such a case: global allocation of registers over the entire program. If you figure out what an app needs to do and reserve registers to the task that are maintained across the entire project, you can double the speed compared to any compiler. It's one thing compilers still suck at - global optimization. Anything else other than a tight-loop calculation can be left to a compiler.

When I make an assembly app, the first thing I do is make a list of all the registers and what I expect them to hold at different points in the program. That not only allows you to maintain global registers, but you can avoid saving/restoring registers where it's not needed (at some points in the program, some registers which are normally saved according to the ABI may be safely treated as volatile if you know what all the registers are being used for).
xxl Posted July 16, 2012

> And never run your code on a 65816 if using illegal instructions.

use the power of standard 6502C ... for 16bit 65816 use:

    asl @
    rol @
    rol @
    rol @
    sta lowhigh
ivop Posted July 17, 2012

I think the fastest is still to use a 512-byte look-up table, if you can sacrifice the memory.

    ldx 128        ; 3
    lda lsbtab,x   ; 4, tables must be page-aligned
    sta 129        ; 3
    lda msbtab,x   ; 4
    sta 130        ; 3
                   ; total: 17

If, as you say, bit 7 of the value in 128 is for other use and should not be part of the calculation, you do not even need to mask it out before using it as an index. Just ignore it when calculating the tables.

If you do mask it out before you use it as an index (lda 128, and #127, tax) you can reduce the tables to 128 bytes each.

Edited July 17, 2012 by ivop
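The tables themselves are easiest to generate on a host machine. Below is a Python sketch of both variants described above: 256 entries per table with bit 7 ignored, or 128 entries per table for a pre-masked index. The names `build_tables` and `format_table` are made up, and the `.byte` directive in the emitted text is an assumption that will need adjusting to whatever assembler is in use.

```python
def build_tables(size=256):
    """Return (lsbtab, msbtab) for value * 16, ignoring bit 7 of the index.

    size=256: index straight from memory, bit 7 may be set and is ignored.
    size=128: index has already been masked with AND #127.
    """
    lsbtab = [((i & 0x7F) * 16) & 0xFF for i in range(size)]
    msbtab = [((i & 0x7F) * 16) >> 8 for i in range(size)]
    return lsbtab, msbtab


def format_table(label, table, per_line=8):
    """Render a table as assembler source lines (directive syntax is assumed)."""
    lines = [label]
    for start in range(0, len(table), per_line):
        chunk = table[start:start + per_line]
        lines.append("    .byte " + ",".join("$%02X" % b for b in chunk))
    return lines


lsbtab, msbtab = build_tables(256)
for i in range(256):
    # Bit 7 of the index makes no difference to the looked-up result.
    assert (msbtab[i] << 8) | lsbtab[i] == (i & 0x7F) * 16

print("\n".join(format_table("lsbtab", lsbtab)[:3]))
```

With the 256-entry variant the runtime code stays at the five instructions shown in the post; the 128-entry variant trades the extra `and #127` for half the table memory.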
snicklin (Author) Posted July 17, 2012

> I think the fastest is still to use a 512-byte look-up table, if you can sacrifice the memory.
>
>     ldx 128        ; 3
>     lda lsbtab,x   ; 4, tables must be page-aligned
>     sta 129        ; 3
>     lda msbtab,x   ; 4
>     sta 130        ; 3
>                    ; total: 17
>
> If, as you say, bit 7 of the value in 128 is for other use and should not be part of the calculation, you do not even need to mask it out before using it as an index. Just ignore it when calculating the tables.
>
> If you do mask it out before you use it as an index (lda 128, and #127, tax) you can reduce the tables to 128 bytes each.

Yes, bit 7 isn't in use at this moment and by the time that it gets to this stage, it won't be set at all.

You've all given some great suggestions and to be honest, I'm finding it difficult to choose one particular method. Memory is tight, yes, but I can find 256 bytes. Speed isn't a massive issue as I've used Rybags' method so far and even with Altirra on 1% speed, it's still quick. (4x4 tiles, 6 tiles across, 4 down and a status bar for 4 rows.)

I wouldn't mind using the illegal opcode method, but I want to make my code as portable as possible. Do all of the (standard released) A8 machines support these illegal opcodes? I'm not too bothered if it doesn't work on somebody's hacked-together Atari with non-standard architecture.
xxl Posted July 17, 2012

> Do all of the (standard released) A8 machines support these illegal opcodes?

yes, all xl/xe ataris (6502C) support stable illegal opcodes

http://atariki.krap.pl/index.php/Nieudokumentowane_rozkazy_6502C (green)
snicklin (Author) Posted July 17, 2012

>> Do all of the (standard released) A8 machines support these illegal opcodes?

> yes, all xl/xe ataris (6502C) support stable illegal opcodes
>
> http://atariki.krap....e_rozkazy_6502C (green)

Super. I've tried implementing your method. I'm using Mads. Does anyone know how I can get Mads to accept 'ARR' and 'SAX'? For SAX I've also tried AAX and AXS.
xxl Posted July 17, 2012

Fox's method with arr is shorter and faster... MADS supports undocumented opcodes
snicklin (Author) Posted July 18, 2012

> MADS supports undocumented opcodes

Hmm, I'm having problems with that. I can't see any command line parameter to switch them on and a normal compile isn't working. This is what I get:

    C:\mads\scroller>mads stevetest.asm -o:stevetest.xex
    SAX TEMP2
    TILE_HANDLER_GET_FULL_SCREEN.asm (48) ERROR: Undeclared macro SAX (BANK=0)
    ARR #224
    TILE_HANDLER_GET_FULL_SCREEN.asm (49) ERROR: Undeclared macro ARR (BANK=0)

I'm using version 1.9.0 build 21 (though I will move to 1.9.4 soon). I'll switch to Fox's method once I've got these mnemonics compiled.
flashjazzcat Posted July 18, 2012

I think some of the mnemonics are different in MADS. They're listed in the manual, but I had to put at least one opcode in using a .byte statement when I was experimenting with them, so perhaps a couple are missing.
Marius Posted July 20, 2012

I love machine language! Everything is so LOGICAL! Nice thread folks!
bpj1138 Posted July 21, 2012

lowly 6502 can only shift by 1 bit...
+bob1200xl Posted July 21, 2012

I think it is a really bad idea to code anything with illegal OP codes. It won't be long before 65816s will be much more common than they are now and you are just killing yourself in that part of the market. If you need speed, the 65816 is your friend. Don't isolate yourself in the 6502C.

By the way: if you are doing things that wrap $0000, like LDA $FF30,X, consider that a 65816 with linear memory will access $100xx when you wrap, not $000xx like the 6502 does.

Bob
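The wrap-around difference comes down to how wide the address adder is. A minimal Python sketch of the arithmetic follows, with hypothetical function names, assuming a 16-bit address space for the 6502 and a 24-bit linear space for the 65816 as the post describes. It models only the native-mode claim as stated; emulation-mode behaviour is the open question in the reply that follows.

```python
def effective_address_6502(base, x):
    """Absolute,X on a 6502: the sum is truncated to 16 bits."""
    return (base + x) & 0xFFFF


def effective_address_65816_native(bank, base, x):
    """Absolute,X with 24-bit linear addressing: the carry out of
    bit 15 moves the access into the next bank instead of wrapping."""
    return ((bank << 16) + base + x) & 0xFFFFFF


# LDA $FF30,X with X = $F0
wrapped = effective_address_6502(0xFF30, 0xF0)
linear = effective_address_65816_native(0x00, 0xFF30, 0xF0)
print("%04X vs %06X" % (wrapped, linear))
```

Code that deliberately relies on indexing past $FFFF to reach page zero would therefore read from a different location under the linear model.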
phaeron Posted July 21, 2012

> By the way: if you are doing things that wrap $0000, like LDA $FF30,X, consider that a 65816 with linear memory will access $100xx when you wrap, not $000xx like the 6502 does.

I've seen conflicting info on this. Is this true only in native mode or also in emulation mode?