6502 16 bit and 8 bit multiply by 16

snicklin · July 11, 2012

In 6502 Assembly, I am looking for a routine which will multiply an 8 bit number (0-127 only in this case for my tiling system) by 16, leaving a 16 bit number. (The missing bit of data is used for other purposes).

Although my low to moderate 6502 skills could probably write this routine, they probably won't give me an efficient version of it.

If the original 8 bit number is in location 128 and the results are stored in 129 and 130, does anyone know a good way to do this?

The multiply by 16 can be hard-coded, it doesn't have to be a general 16 bit mathematics function.

Could any suggestions please be in standard 6502, not Mads or any other compiler specific notation?

flashjazzcat · July 11, 2012

The nice thing about multiply by powers of two is that you can use bit-shifts. So:

LDA 128
STA 129 ; RESULT LSB
LDA #0
STA 130 ; RESULT MSB
ASL 129 ; MULTIPLY BY 2
ROL 130
ASL 129
ROL 130 ; MULTIPLY BY 4
ASL 129
ROL 130 ; BY 8
ASL 129
ROL 130 ; AND FINALLY BY 16

So we've simply doubled the 8-bit value four times.

snicklin · July 11, 2012

Ahh wonderful, and such a quick reply, thank you. I guessed that some ASL'ing would be taking place, but how to do it over 16 bits was what stumped me. Once more, thanks very much!

Rybags · July 11, 2012

An alternative, faster method uses shifts in both directions. Depending on the value you're multiplying by, you might save cycles by using a table lookup. But shifting 4 times is only 8 cycles, so there's no saving from a table lookup in this case.

Multiply value in A by 16 (cumulative cycle count included). Y register used as temp storage:

 tay  ; save for later (2)
 lsr a ; 4
 lsr a ; 6
 lsr a ; 8
 lsr a ; 4 high bits of A remain (10)
 sta high ; these become the high byte (14)
 tya ; original value (16)
 asl a ; 18
 asl a ; 20
 asl a ; 22
 asl a ; multiply low 4 bits by 16 (24)
 sta low ; 28

Doing shifts/rotates using A is almost always much quicker than doing it to memory if you're doing it more than once.
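
Adapted to the addresses in the original question (value in 128, result in 129 and 130), the same idea might look like the sketch below. Reloading from zero page instead of using TAY/TYA is a substitution made for this sketch, not part of the routine above, and the cumulative cycle counts assume zero-page addressing. As a check, 100 ($64) should leave $06 in 130 and $40 in 129, i.e. $0640 = 1600.

```
        lda 128         ; 3  original value
        lsr a           ; 5
        lsr a           ; 7
        lsr a           ; 9
        lsr a           ; 11 high nibble moved down
        sta 130         ; 14 result MSB
        lda 128         ; 17 reload original value
        asl a           ; 19
        asl a           ; 21
        asl a           ; 23
        asl a           ; 25 low nibble moved up
        sta 129         ; 28 result LSB
```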

Edited July 11, 2012 by Rybags

snicklin · July 12, 2012

That's a nice bit of code there. I'm in the middle of coding a little test for them both. Thanks to both FJC and Rybags.

Edited July 12, 2012 by snicklin

xxl · July 15, 2012

Let's use undocumented 6502C opcodes:


     ldx #$f0    ;2
     asl @       ;4
     rol @       ;6
     rol @       ;8
     rol @       ;10
     sax low     ;14
     and #$07    ;16
     rol @       ;18
     sta high    ;22

22 cycles, without tables.

+Stephen · July 15, 2012

This is an art form, and it's why I don't buy the argument that a "compiler / assembler is good enough"

frogstar_robot · July 16, 2012

This is an art form, and it's why I don't buy the argument that a "compiler / assembler is good enough"

For 8 and 16 bit processors I think you're right. A dev who has spent years counting cycles and collecting efficient algos is going to get way tighter than a compiler. Besides which, the Motorola and Motorola-inspired products were designed with human low-level coders in mind. Modern 32 and 64 bit procs are designed to be targeted by compilers. By the time a coder can consistently outdo a compiler enough to be worth it, the next generation of products with different constraints and optimizations is out. Besides which, at the scale of code run these days it isn't practical to code everything in assembly. At best, one can profile the code and surgically shave cycles with bits of assembly where it has the highest payoff. It isn't much use coding a menu in assembly. Recoding the portion of a renderer where 60% of the processor time is spent is another matter altogether. The art there is knowing where meticulous assembly is best applied.

fox · July 16, 2012

Let's use undocumented 6502C opcodes:

ror @
ror @
ror @
ror @
ldx #$f
sax high
arr #$e0
sta low

Legal instructions only:

rol @
rol @
rol @
rol @
tay
and #$f0
sta low
tya
rol @
and #$f
sta high

asl @
rol @
rol @
rol @
sta high
and #$f0
sta low
eor high
rol @
sta high

Edited July 16, 2012 by fox

xxl · July 16, 2012

ror @
ror @
ror @
ror @
ldx #$f
sax high
arr #$e0
sta low

arr

sweet

fox · July 16, 2012

Yeah, just make sure you're not in decimal mode; ARR behaves differently there.

flashjazzcat · July 16, 2012

And never run your code on a 65816 if using illegal instructions.

Seriously though: very nice.

Chilly Willy · July 16, 2012

Hand-done assembly is still applicable to modern CPUs, but you have to recognize where the speed comes from in such a case: global allocation of registers over the entire program. If you figure out what an app needs to do and reserve registers to the task that are maintained across the entire project, you can double the speed compared to any compiler. It's one thing compilers still suck at - global optimization. Anything else other than a tight-loop calculation can be left to a compiler. When I make an assembly app, the first thing I do is make a list of all the registers and what I expect them to hold at different points in the program. That not only allows you to maintain global registers, but you can avoid saving/restoring registers where it's not needed (at some points in the program, some registers which are normally saved according to the ABI may be safely treated as volatile if you know what all the registers are being used for).

xxl · July 16, 2012

And never run your code on a 65816 if using illegal instructions.

use the power of standard 6502C ...

for 16bit 65816 use:


       asl @
       rol @
       rol @
       rol @
       sta lowhigh

ivop · July 17, 2012

I think the fastest is still to use a 512-byte look-up table, if you can sacrifice the memory.

ldx 128        ; 3
lda lsbtab,x   ; 4, tables must be page-aligned
sta 129        ; 3
lda msbtab,x   ; 4
sta 130        ; 3
               ; total = 17

If, as you say, bit 7 of the value in 128 is for other use and should not be part of the calculation, you do not even need to mask it out before using it as an index. Just ignore it when calculating the tables. If you do mask it out before you use it as an index (lda 128, and #127, tax) you can reduce the tables to 128 bytes each.
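
The tables don't have to be typed in by hand; a short init routine could fill them at startup. This is only a sketch: lsbtab and msbtab are the labels from the code above, while the mktab label and the assumption that each table is 256 bytes of free, page-aligned RAM are made up for this example.

```
; Fill the tables so that entry X holds the low/high byte of X*16.
        ldx #0
mktab   txa
        asl a
        asl a
        asl a
        asl a           ; low nibble of X moved up = low byte of X*16
        sta lsbtab,x
        txa
        lsr a
        lsr a
        lsr a
        lsr a           ; high nibble of X moved down = high byte of X*16
        sta msbtab,x
        inx
        bne mktab       ; loop until X wraps from 255 back to 0
```

To have the tables ignore bit 7 of the index, as suggested above, an `and #$07` just before `sta msbtab,x` should be enough, since bit 7 only ever reaches the high byte.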

Edited July 17, 2012 by ivop

snicklin · July 17, 2012

Yes, bit 7 isn't in use at this moment and by the time that it gets to this stage, it won't be set at all. You've all given some great suggestions and to be honest, I'm finding it difficult to choose one particular method. Memory is tight, yes, but I can find 256 bytes. Speed isn't a massive issue as I've used Rybags' method so far and even with Altirra on 1% speed, it's still quick. (4x4 tiles, 6 tiles across, 4 down and a status bar for 4 rows). I wouldn't mind using the illegal opcode method, but I want to make my code as portable as possible. Do all of the (standard released) A8 machines support these illegal opcodes? I'm not too bothered if it doesn't work on somebody's hacked together Atari with non-standard architecture.

xxl · July 17, 2012

Do all of the (standard released) A8 machines support these illegal opcodes?

yes, all xl/xe ataris (6502C) support stable illegal opcodes http://atariki.krap.pl/index.php/Nieudokumentowane_rozkazy_6502C (green)

snicklin · July 17, 2012

Do all of the (standard released) A8 machines support these illegal opcodes?

yes, all xl/xe ataris (6502C) support stable illegal opcodes http://atariki.krap....e_rozkazy_6502C (green)

Super. I've tried implementing your method. I'm using Mads. Does anyone know how I can get Mads to accept 'ARR' and 'SAX'? For SAX I've also tried AAX and AXS.

xxl · July 17, 2012

Fox's method with arr is shorter and faster...

MADS supports undocumented opcodes.

snicklin · July 18, 2012

MADS supports undocumented opcodes.

Hmm, I'm having problems with that. I can't see any command line parameter to switch them on and a normal compile isn't working.

This is what I get....

C:\mads\scroller>mads stevetest.asm -o:stevetest.xex

SAX TEMP2

TILE_HANDLER_GET_FULL_SCREEN.asm (48) ERROR: Undeclared macro SAX (BANK=0)

ARR #224

TILE_HANDLER_GET_FULL_SCREEN.asm (49) ERROR: Undeclared macro ARR (BANK=0)

I'm using version 1.9.0 build 21 (though I will move to 1.9.4 soon).

I'll switch to Fox's method once I've got these mnemonics compiled.

flashjazzcat · July 18, 2012

I think some of the mnemonics are different in MADS. They're listed in the manual, but I had to put at least one opcode in using a .byte statement when I was experimenting with them, so perhaps a couple are missing.
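
As a fallback when a mnemonic is rejected, the opcode bytes can be emitted directly. A sketch of fox's ARR routine done that way, assuming the commonly documented NMOS 6502 encodings (SAX zero page = $87, ARR immediate = $6B), assuming `high` and `low` are zero-page locations, and assuming your assembler accepts a `.byte` directive. With an absolute address, SAX would instead be $8F followed by a two-byte address.

```
        ror a
        ror a
        ror a
        ror a
        ldx #$0f
        .byte $87,high  ; SAX high : store (A AND X) to zero page
        .byte $6b,$e0   ; ARR #$e0 : AND #$e0, then rotate A right
        sta low
```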

Marius · July 20, 2012

I love machine language! Everything is so LOGICAL! Nice thread folks!

bpj1138 · July 21, 2012

lowly 6502 can only shift by 1 bit...

+bob1200xl · July 21, 2012

I think it is a really bad idea to code anything with illegal opcodes. It won't be long before 65816s will be much more common than they are now and you are just killing yourself in that part of the market. If you need speed, the 65816 is your friend. Don't isolate yourself in the 6502C.

By the way: if you are doing things that wrap $0000, like LDA $FF30,X, consider that a 65816 with linear memory will access $100xx when you wrap, not $000xx like the 6502 does.

Bob

phaeron · July 21, 2012

By the way: if you are doing things that wrap $0000, like LDA $FF30,X, consider that a 65816 with linear memory will access $100xx when you wrap, not $000xx like the 6502 does.

I've seen conflicting info on this. Is this true only in native mode or also in emulation mode?
