
6502 16 bit and 8 bit multiply by 16



In 6502 Assembly, I am looking for a routine which will multiply an 8 bit number (0-127 only in this case for my tiling system) by 16, leaving a 16 bit number. (The missing bit of data is used for other purposes).

 

Although my low to moderate 6502 skills could probably write this routine, they probably won't give me an efficient version of it.

 

If the original 8 bit number is in location 128 and the results are stored in 129 and 130, does anyone know a good way to do this?

 

The multiply by 16 can be hard-coded, it doesn't have to be a general 16 bit mathematics function.

 

Could any suggestions please be in standard 6502, not Mads or any other compiler specific notation?


The nice thing about multiply by powers of two is that you can use bit-shifts. So:

 

LDA 128
STA 129 ; RESULT LSB
LDA #0
STA 130 ; RESULT MSB
ASL 129 ; MULTIPLY BY 2
ROL 130
ASL 129
ROL 130 ; MULTIPLY BY 4
ASL 129
ROL 130 ; BY 8
ASL 129
ROL 130 ; AND FINALLY BY 16

 

So we've simply doubled the 8-bit value four times.
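
To see why each ASL/ROL pair doubles the 16-bit value, here is a rough Python model of the routine above. It is only a sketch for checking the arithmetic, not 6502 code, and the function names are made up for illustration.

```python
def asl_rol(lo, hi):
    """Model one 'ASL lo / ROL hi' pair: shift the 16-bit value hi:lo left by one bit."""
    carry = (lo >> 7) & 1              # ASL pushes bit 7 of the low byte into carry
    lo = (lo << 1) & 0xFF
    hi = ((hi << 1) | carry) & 0xFF    # ROL pulls that carry into bit 0 of the high byte
    return lo, hi


def times16_by_shifts(value):
    """Copy value into the low byte, clear the high byte, then double four times."""
    lo, hi = value & 0xFF, 0
    for _ in range(4):
        lo, hi = asl_rol(lo, hi)
    return lo, hi


if __name__ == "__main__":
    lo, hi = times16_by_shifts(100)
    print(hex(hi), hex(lo))
```

If I count right, with zero-page operands each ASL or ROL on memory costs 5 cycles, so the whole routine comes to about 51 cycles (11 for the setup, 40 for the eight shifts).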


An alternative, faster method uses shifts in both directions. Depending on what value you're multiplying by, you might save cycles by using a table lookup. But shifting 4 times is only 8 cycles, so there's no saving from a table lookup in this case.

 

Multiply value in A by 16 (cumulative cycle count included). Y register used as temp storage:

 

 tay  ; save for later (2)
 lsr a ; 4
 lsr a ; 6
 lsr a ; 8
 lsr a ; 4 high bits of A remain (10)
 sta high ; these become the high byte (14)
 tya ; original value (16)
 asl a ; 18
 asl a ; 20
 asl a ; 22
 asl a ; multiply low 4 bits by 16 (24)
 sta low ; 28

 

Doing shifts/rotates using A is almost always much quicker than doing it to memory if you're doing it more than once.
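
The idea behind the routine is that multiplying by 16 just moves the top nibble into the high byte and the bottom nibble into the top of the low byte. A small Python sketch of that split, only as a sanity check (the function name is made up):

```python
def times16_split(value):
    """Return (low, high) bytes of value * 16 for an 8-bit value."""
    high = value >> 4             # four LSR A: top nibble slides down into bits 0-3
    low = (value << 4) & 0xFF     # four ASL A: bottom nibble slides up, the rest falls off
    return low, high


if __name__ == "__main__":
    low, high = times16_split(0x5A)
    print(hex(high), hex(low))
```

For example, 0x5A (90) gives high = 0x05 and low = 0xA0, i.e. $05A0 = 1440 = 90 * 16.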

Edited by Rybags


Let's use undocumented 6502C opcodes:

 


     ldx #$f0    ;2
     asl @       ;4
     rol @       ;6
     rol @       ;8
     rol @       ;10
     sax low     ;14
     and #$07    ;16
     rol @       ;18
     sta high    ;22

 

22 cycles, without tables.


This is an art form, and it's why I don't buy the argument that a "compiler / assembler is good enough" :)

 

For 8 and 16 bit processors I think you're right. A dev who has spent years counting cycles and collecting efficient algos is going to get way tighter code than a compiler. Besides which, the Motorola and Motorola-inspired products were designed with human low-level coders in mind. Modern 32 and 64 bit procs are designed to be targeted by compilers. By the time a coder can consistently outdo a compiler by enough to be worth it, the next generation of products with different constraints and optimizations is out. On top of that, at the scale of code run these days it isn't practical to write everything in assembly. At best, one can profile the code and surgically shave cycles with bits of assembly where it has the highest payoff. It isn't much use coding a menu in assembly. Recoding the portion of a renderer where 60% of the processor time is spent is another matter altogether. The art there is knowing where meticulous assembly is best applied.


Let's use undocumented 6502C opcodes:

 

ror @
ror @
ror @
ror @
ldx #$f
sax high
arr #$e0
sta low

 

Legal instructions only:

rol @
rol @
rol @
rol @
tay
and #$f0
sta low
tya
rol @
and #$f
sta high

 

asl @
rol @
rol @
rol @
sta high
and #$f0
sta low
eor high
rol @
sta high
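
Both legal-only versions start rotating without clearing carry in any special way, so it is worth checking that the stray carry bit really gets masked out. Here is a minimal Python sketch that models just the accumulator and carry flag; the class and function names are made up, and it only covers the instructions these two routines use.

```python
class Acc:
    """Tiny model of the 6502 accumulator and carry flag (only what is needed here)."""

    def __init__(self, a, carry):
        self.a, self.c = a & 0xFF, carry & 1

    def asl(self):
        self.c, self.a = self.a >> 7, (self.a << 1) & 0xFF

    def rol(self):
        self.c, self.a = self.a >> 7, ((self.a << 1) | self.c) & 0xFF

    def and_(self, mask):   # AND and EOR leave carry untouched
        self.a &= mask

    def eor(self, mask):
        self.a ^= mask


def legal_v1(value, carry):
    """First routine: rol x4, tay, and #$f0, sta low, tya, rol, and #$f, sta high."""
    cpu = Acc(value, carry)
    for _ in range(4):
        cpu.rol()
    y = cpu.a                 # tay
    cpu.and_(0xF0)
    low = cpu.a               # sta low
    cpu.a = y                 # tya
    cpu.rol()
    cpu.and_(0x0F)
    return low, cpu.a         # sta high


def legal_v2(value, carry):
    """Second routine: asl, rol x3, sta high, and #$f0, sta low, eor high, rol, sta high."""
    cpu = Acc(value, carry)
    cpu.asl()
    for _ in range(3):
        cpu.rol()
    high = cpu.a              # sta high (temporary copy)
    cpu.and_(0xF0)
    low = cpu.a               # sta low
    cpu.eor(high)             # leaves only the three bits that belong in high
    cpu.rol()
    return low, cpu.a         # sta high
```

In this model both routines give the same low/high bytes as a plain multiply by 16 for every input and either starting carry value.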

Edited by fox


Hand-done assembly is still applicable to modern CPUs, but you have to recognize where the speed comes from in such a case: global allocation of registers over the entire program. If you figure out what an app needs to do and reserve registers to the task that are maintained across the entire project, you can double the speed compared to any compiler. It's one thing compilers still suck at - global optimization. Anything else other than a tight-loop calculation can be left to a compiler. When I make an assembly app, the first thing I do is make a list of all the registers and what I expect them to hold at different points in the program. That not only allows you to maintain global registers, but you can avoid saving/restoring registers where it's not needed (at some points in the program, some registers which are normally saved according to the ABI may be safely treated as volatile if you know what all the registers are being used for).


I think the fastest is still to use a 512-byte look-up table, if you can sacrifice the memory.

 

ldx 128      ; 3
lda lsbtab,x ; 4, tables must be page-aligned
sta 129      ; 3
lda msbtab,x ; 4
sta 130      ; 3
             ; = 17 cycles total

 

If, as you say, bit 7 of the value in 128 is for other use and should not be part of the calculation, you do not even need to mask it out before using it as an index. Just ignore it when calculating the tables. If you do mask it out before you use it as an index (lda 128, and #127, tax) you can reduce the tables to 128 bytes each.
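
The tables themselves are easiest to generate on the PC side. A minimal Python sketch, assuming the full 256-entry tables that ignore bit 7 of the index; the helper names are made up, and the `.byte` directive is an assumption, so adjust it for your assembler.

```python
def build_tables():
    """Build lsbtab/msbtab so table[x] holds one byte of (x & 0x7F) * 16."""
    lsbtab = bytes(((x & 0x7F) * 16) & 0xFF for x in range(256))
    msbtab = bytes(((x & 0x7F) * 16) >> 8 for x in range(256))
    return lsbtab, msbtab


def as_byte_lines(name, table, per_line=16):
    """Format a table as assembler source (label line, then .byte rows)."""
    lines = [name]
    for i in range(0, len(table), per_line):
        chunk = ", ".join(f"${b:02X}" for b in table[i:i + per_line])
        lines.append(f"    .byte {chunk}")
    return "\n".join(lines)


if __name__ == "__main__":
    lsbtab, msbtab = build_tables()
    print(as_byte_lines("lsbtab", lsbtab))
    print(as_byte_lines("msbtab", msbtab))
```

Because bit 7 is masked off while building, entries 128-255 simply repeat entries 0-127. For the 128-byte variant, change `range(256)` to `range(128)` and mask the index in the 6502 code instead.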

Edited by ivop


Yes, bit 7 isn't in use at this moment and by the time that it gets to this stage, it won't be set at all. You've all given some great suggestions and to be honest, I'm finding it difficult to choose one particular method. Memory is tight, yes, but I can find 256 bytes. Speed isn't a massive issue as I've used Rybags' method so far and even with Altirra on 1% speed, it's still quick. (4x4 tiles, 6 tiles across, 4 down and a status bar for 4 rows). I wouldn't mind using the illegal opcode method, but I want to make my code as portable as possible. Do all of the (standard released) A8 machines support these illegal opcodes? I'm not too bothered if it doesn't work on somebody's hacked together Atari with non-standard architecture.


Do all of the (standard released) A8 machines support these illegal opcodes?

 

yes, all xl/xe ataris (6502C) support stable illegal opcodes http://atariki.krap....e_rozkazy_6502C (green)

 

Super. I've tried implementing your method. I'm using Mads. Does anyone know how I can get Mads to accept 'ARR' and 'SAX'? For SAX I've also tried AAX and AXS.


MADS supports undocumented opcodes

 

Hmm, I'm having problems with that. I can't see any command line parameter to switch them on and a normal compile isn't working.

 

This is what I get....

 

 

C:\mads\scroller>mads stevetest.asm -o:stevetest.xex

SAX TEMP2

TILE_HANDLER_GET_FULL_SCREEN.asm (48) ERROR: Undeclared macro SAX (BANK=0)

ARR #224

TILE_HANDLER_GET_FULL_SCREEN.asm (49) ERROR: Undeclared macro ARR (BANK=0)

 

I'm using version 1.9.0 build 21 (though I will move to 1.9.4 soon).

 

I'll switch to Fox's method once I've got these mnemonics compiled.


I think it is a really bad idea to code anything with illegal OP codes. It won't be long before 65816s will be much more common than they are now and you are just killing yourself in that part of the market. If you need speed, the 65816 is your friend. Don't isolate yourself in the 6502C.

 

By the way: if you are doing things that wrap $0000, like LDA $FF30,X, consider that a 65816 with linear memory will access $100xx when you wrap, not $0FFxx like the 6502 does.
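
To make the wrap difference concrete, here is a small Python sketch of the address arithmetic for an absolute,X access as described above. It only models the sum, taking the data bank as 0 by assumption, and says nothing about which 65816 modes it applies to; the function names are made up.

```python
def effective_address_6502(base, x):
    """6502: the sum is truncated to 16 bits, so it wraps to the bottom of memory."""
    return (base + x) & 0xFFFF


def effective_address_65816(base, x, data_bank=0):
    """65816 with linear memory: the sum carries into the bank byte (24-bit address)."""
    return ((data_bank << 16) + base + x) & 0xFFFFFF


if __name__ == "__main__":
    # LDA $FF30,X with X = $F0
    print(hex(effective_address_6502(0xFF30, 0xF0)))
    print(hex(effective_address_65816(0xFF30, 0xF0)))
```

With X = $F0, the 6502 version lands on $0020 while the 65816 version lands on $010020, so code that relies on the wrap reads a different location.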

 

Bob



I've seen conflicting info on this. Is this true only in native mode or also in emulation mode?
