6502 16 bit and 8 bit multiply by 16


38 replies to this topic

#1 snicklin ONLINE  

snicklin

    Stargunner

  • 1,314 posts
  • Location:Australia

Posted Wed Jul 11, 2012 2:12 PM

In 6502 Assembly, I am looking for a routine which will multiply an 8 bit number (0-127 only in this case for my tiling system) by 16, leaving a 16 bit number. (The missing bit of data is used for other purposes).

Although my low to moderate 6502 skills could probably write this routine, they probably won't give me an efficient version of it.

If the original 8 bit number is in location 128 and the results are stored in 129 and 130, does anyone know a good way to do this?

The multiply by 16 can be hard-coded, it doesn't have to be a general 16 bit mathematics function.

Could any suggestions please be in standard 6502, not Mads or any other compiler specific notation?

#2 flashjazzcat OFFLINE  

flashjazzcat

    Quadrunner

  • 8,661 posts
  • Location:United Kingdom

Posted Wed Jul 11, 2012 2:37 PM

The nice thing about multiply by powers of two is that you can use bit-shifts. So:

LDA 128
STA 129 ; RESULT LSB
LDA #0
STA 130 ; RESULT MSB
ASL 129 ; MULTIPLY BY 2
ROL 130
ASL 129
ROL 130 ; MULTIPLY BY 4
ASL 129
ROL 130 ; BY 8
ASL 129
ROL 130 ; AND FINALLY BY 16

So we've simply doubled the 8-bit value four times.
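If code size matters more than speed, the same four doublings can be written as a loop. This is only a sketch of the routine above, not something posted in the thread; the label SHIFT and the use of X as a counter are my own choices, and X is clobbered.

```asm
        LDA 128
        STA 129     ; RESULT LSB
        LDA #0
        STA 130     ; RESULT MSB
        LDX #4      ; FOUR DOUBLINGS = MULTIPLY BY 16
SHIFT   ASL 129     ; SHIFT LSB LEFT, TOP BIT GOES TO CARRY
        ROL 130     ; ROTATE CARRY INTO MSB
        DEX
        BNE SHIFT   ; REPEAT UNTIL X REACHES ZERO
```

By my count this is 17 bytes against 24 for the unrolled version, at the cost of the DEX/BNE overhead on every pass.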

#3 snicklin ONLINE  

snicklin

    Stargunner

  • Topic Starter
  • 1,314 posts
  • Location:Australia

Posted Wed Jul 11, 2012 2:41 PM

Ahh wonderful, and such a quick reply, thank you. I guessed that some ASL'ing would be taking place, but how to do it over 16 bits was what stumped me. Once more, thanks very much!

#4 Rybags ONLINE  

Rybags

    Quadrunner

  • 13,009 posts
  • Location:Australia

Posted Wed Jul 11, 2012 5:28 PM

An alternate, faster method using shifts both ways. Depending on what value you're multiplying by, you might save cycles by using table lookup. But shifting 4 times is 8 cycles, so there is no saving from table lookup in this case.

Multiply value in A by 16 (cumulative cycle count included). Y register used as temp storage:

  tay  ; save for later (2)
  lsr a ; 4
  lsr a ; 6
  lsr a ; 8
  lsr a ; 4 high bits of A remain (10)
  sta high ; these become the high byte (14)
  tya ; original value (16)
  asl a ; 18
  asl a ; 20
  asl a ; 22
  asl a ; multiply low 4 bits by 16 (24)
  sta low ; 28

Doing shifts/rotates using A is almost always much quicker than doing it to memory if you're doing it more than once.
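Adapted to the locations in the original question (value in 128, result in 129 and 130), the same idea might look like the sketch below. This is my adaptation rather than the code as posted: it reloads the value from zero page instead of saving it in Y, which leaves Y untouched.

```asm
        LDA 128     ; value to multiply (0-127)
        LSR A
        LSR A
        LSR A
        LSR A       ; top four bits are now in the low nibble
        STA 130     ; result MSB
        LDA 128     ; reload the original value
        ASL A
        ASL A
        ASL A
        ASL A       ; low four bits are now in the high nibble
        STA 129     ; result LSB
```

As a check, an input of 90 ($5A) leaves $05 in 130 and $A0 in 129, i.e. $05A0 = 1440 = 90 x 16. By my count this is 28 cycles including both loads, since zero-page LDA and STA take 3 cycles each.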

Edited by Rybags, Wed Jul 11, 2012 5:29 PM.


#5 snicklin ONLINE  

snicklin

    Stargunner

  • Topic Starter
  • 1,314 posts
  • Location:Australia

Posted Thu Jul 12, 2012 1:54 PM

That's a nice bit of code there. I'm in the middle of coding a little test for them both. Thanks to both FJC and Rybags.

Edited by snicklin, Thu Jul 12, 2012 1:56 PM.


#6 xxl OFFLINE  

xxl

    Dragonstomper

  • 785 posts
  • Location:KRAKOW/Poland

Posted Sun Jul 15, 2012 4:43 PM

Rybags wrote:

tay ; save for later (2)
lsr a ; 4
lsr a ; 6
lsr a ; 8
lsr a ; 4 high bits of A remain (10)
sta high ; these become the high byte (14)
tya ; original value (16)
asl a ; 18
asl a ; 20
asl a ; 22
asl a ; multiply low 4 bits by 16 (24)
sta low ; 28



Let's use undocumented 6502C opcodes:


      ldx #$f0    ;2
      asl @       ;4
      rol @       ;6
      rol @       ;8
      rol @       ;10
      sax low     ;14
      and #$07    ;16
      rol @       ;18
      sta high    ;22

22 cycles, without tables.

#7 Stephen OFFLINE  

Stephen

    River Patroller

  • 4,819 posts
  • A8 Gear Head
  • Location:Akron, Ohio

Posted Sun Jul 15, 2012 5:22 PM

This is an art form, and it's why I don't buy the argument that a "compiler / assembler is good enough" :)

#8 frogstar_robot OFFLINE  

frogstar_robot

    Dragonstomper

  • 751 posts

Posted Sun Jul 15, 2012 10:25 PM

Stephen wrote: This is an art form, and it's why I don't buy the argument that a "compiler / assembler is good enough" :)


For 8 and 16 bit processors I think you're right. A dev who has spent years counting cycles and collecting efficient algos is going to get way tighter code than a compiler. Besides which, the Motorola and Motorola-inspired products were designed with human low-level coders in mind. Modern 32 and 64 bit procs are designed to be targeted by compilers. By the time a coder can consistently outdo a compiler enough to be worth it, the next generation of products with different constraints and optimizations is out. Besides which, at the scale of code run these days, it isn't practical to code everything in assembly. At best, one can profile the code and surgically shave cycles with bits of assembly where it has the highest payoff. It isn't much use coding a menu in assembly. Recoding the portion of a renderer where 60% of the processor time is spent is another matter altogether. The art there is knowing where meticulous assembly is best applied.

#9 fox OFFLINE  

fox

    Chopper Commander

  • 219 posts
  • Location:Poland

Posted Mon Jul 16, 2012 7:16 AM

xxl wrote: Let's use undocumented 6502C opcodes


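ror @      ; four RORs move the top nibble of the input into bits 0-3
ror @      ; and the bottom three input bits into bits 5-7;
ror @      ; input bit 3 ends up in carry
ror @
ldx #$f
sax high   ; high = A and X = top nibble of the input
arr #$e0   ; A = A and $e0, then rotated right through carry
sta low    ; low = bottom nibble of the input in bits 4-7

Legal instructions only (two variants):
rol @
rol @
rol @
rol @      ; bottom nibble is now in bits 4-7, top three input bits in bits 0-2, input bit 4 in carry
tay        ; keep the rotated value
and #$f0
sta low    ; low = bottom nibble * 16
tya
rol @      ; rotate input bit 4 in from carry
and #$f
sta high   ; high = top nibble of the input

asl @
rol @
rol @
rol @      ; as above, but bit 3 is a known 0 because ASL shifts in a zero
sta high   ; use high as scratch
and #$f0
sta low    ; low = bottom nibble * 16
eor high   ; cancels bits 4-7, leaving the top three input bits in bits 0-2
rol @      ; rotate input bit 4 in from carry
sta high   ; high = top nibble of the input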

Edited by fox, Mon Jul 16, 2012 7:18 AM.


#10 xxl OFFLINE  

xxl

    Dragonstomper

  • 785 posts
  • Location:KRAKOW/Poland

Posted Mon Jul 16, 2012 7:58 AM

fox wrote:

ror @
ror @
ror @
ror @
ldx #$f
sax high
arr #$e0
sta low


arr :D

sweet

#11 fox OFFLINE  

fox

    Chopper Commander

  • 219 posts
  • Location:Poland

Posted Mon Jul 16, 2012 8:45 AM

Yeah, just make sure you're not in the decimal mode. :)

#12 flashjazzcat OFFLINE  

flashjazzcat

    Quadrunner

  • 8,661 posts
  • Location:United Kingdom

Posted Mon Jul 16, 2012 12:15 PM

And never run your code on a 65816 if using illegal instructions. ;)

Seriously though: very nice. :D

#13 Chilly Willy OFFLINE  

Chilly Willy

    Dragonstomper

  • 651 posts
  • Location:The Land of Enchantment

Posted Mon Jul 16, 2012 1:08 PM


Stephen wrote: This is an art form, and it's why I don't buy the argument that a "compiler / assembler is good enough" :)

frogstar_robot wrote: For 8 and 16 bit processors I think you're right. A dev who has spent years counting cycles and collecting efficient algos is going to get way tighter code than a compiler. Besides which, the Motorola and Motorola-inspired products were designed with human low-level coders in mind. Modern 32 and 64 bit procs are designed to be targeted by compilers. By the time a coder can consistently outdo a compiler enough to be worth it, the next generation of products with different constraints and optimizations is out. Besides which, at the scale of code run these days, it isn't practical to code everything in assembly. At best, one can profile the code and surgically shave cycles with bits of assembly where it has the highest payoff. It isn't much use coding a menu in assembly. Recoding the portion of a renderer where 60% of the processor time is spent is another matter altogether. The art there is knowing where meticulous assembly is best applied.


Hand-done assembly is still applicable to modern CPUs, but you have to recognize where the speed comes from in such a case: global allocation of registers over the entire program. If you figure out what an app needs to do and reserve registers to the task that are maintained across the entire project, you can double the speed compared to any compiler. It's one thing compilers still suck at - global optimization. Anything else other than a tight-loop calculation can be left to a compiler. When I make an assembly app, the first thing I do is make a list of all the registers and what I expect them to hold at different points in the program. That not only allows you to maintain global registers, but you can avoid saving/restoring registers where it's not needed (at some points in the program, some registers which are normally saved according to the ABI may be safely treated as volatile if you know what all the registers are being used for).

#14 xxl OFFLINE  

xxl

    Dragonstomper

  • 785 posts
  • Location:KRAKOW/Poland

Posted Mon Jul 16, 2012 2:16 PM

flashjazzcat wrote: And never run your code on a 65816 if using illegal instructions. ;)


Use the power of the standard 6502C ...

For a 65816 with a 16-bit accumulator, use:

        asl @
        rol @
        rol @
        rol @
        sta lowhigh


#15 ivop OFFLINE  

ivop

    Chopper Commander

  • 220 posts
  • Location:The Netherlands

Posted Tue Jul 17, 2012 10:36 AM

I think the fastest is still to use look-up tables (two 256-byte tables, 512 bytes in total), if you can sacrifice the memory.

ldx 128      ; 3
lda lsbtab,x ; 4, tables must be page-aligned (no page-crossing penalty)
sta 129      ; 3
lda msbtab,x ; 4
sta 130      ; 3
             ; total: 17 cycles

If, as you say, bit 7 of the value in 128 is for other use and should not be part of the calculation, you do not even need to mask it out before using it as an index. Just ignore it when calculating the tables. If you do mask it out before you use it as an index (lda 128, and #127, tax) you can reduce the tables to 128 bytes each.
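The tables themselves could be built once at start-up rather than stored in the executable. A minimal sketch, assuming two free page-aligned pages: the addresses given to LSBTAB and MSBTAB are placeholders I made up, and BUILD is just a label for the loop. Bit 7 of the index is ignored, as described above, so entries 128-255 repeat entries 0-127.

```asm
LSBTAB  = $3000         ; placeholder address, must be page-aligned
MSBTAB  = $3100         ; placeholder address, must be page-aligned

        LDX #0
BUILD   TXA
        ASL A
        ASL A
        ASL A
        ASL A           ; low byte of index * 16 (bit 7 falls off the top)
        STA LSBTAB,X
        TXA
        AND #$7F        ; ignore bit 7 of the index
        LSR A
        LSR A
        LSR A
        LSR A           ; high byte of index * 16
        STA MSBTAB,X
        INX
        BNE BUILD       ; loop until X wraps from 255 back to 0
```

This spends a few dozen bytes of code to avoid shipping 512 bytes of table data; whether that is a good trade depends on how tight memory is.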

Edited by ivop, Tue Jul 17, 2012 10:37 AM.


#16 snicklin ONLINE  

snicklin

    Stargunner

  • Topic Starter
  • 1,314 posts
  • Location:Australia

Posted Tue Jul 17, 2012 12:31 PM

Yes, bit 7 isn't in use at this moment and by the time that it gets to this stage, it won't be set at all. You've all given some great suggestions and to be honest, I'm finding it difficult to choose one particular method. Memory is tight, yes, but I can find 256 bytes. Speed isn't a massive issue as I've used Rybags' method so far and even with Altirra on 1% speed, it's still quick. (4x4 tiles, 6 tiles across, 4 down and a status bar for 4 rows). I wouldn't mind using the illegal opcode method, but I want to make my code as portable as possible. Do all of the (standard released) A8 machines support these illegal opcodes? I'm not too bothered if it doesn't work on somebody's hacked together Atari with non-standard architecture.

#17 xxl OFFLINE  

xxl

    Dragonstomper

  • 785 posts
  • Location:KRAKOW/Poland

Posted Tue Jul 17, 2012 1:59 PM

snicklin wrote: Do all of the (standard released) A8 machines support these illegal opcodes?


Yes, all XL/XE Ataris (6502C) support the stable illegal opcodes: http://atariki.krap....e_rozkazy_6502C (green)

#18 snicklin ONLINE  

snicklin

    Stargunner

  • Topic Starter
  • 1,314 posts
  • Location:Australia

Posted Tue Jul 17, 2012 2:21 PM


xxl wrote: Yes, all XL/XE Ataris (6502C) support the stable illegal opcodes: http://atariki.krap....e_rozkazy_6502C (green)


Super. I've tried implementing your method. I'm using Mads. Does anyone know how I can get Mads to accept 'ARR' and 'SAX'? For SAX I've also tried AAX and AXS.

#19 xxl OFFLINE  

xxl

    Dragonstomper

  • 785 posts
  • Location:KRAKOW/Poland

Posted Tue Jul 17, 2012 2:26 PM

Fox's method with arr is shorter and faster...

MADS supports undocumented opcodes.

#20 snicklin ONLINE  

snicklin

    Stargunner

  • Topic Starter
  • 1,314 posts
  • Location:Australia

Posted Wed Jul 18, 2012 12:07 AM

xxl wrote: MADS supports undocumented opcodes.


Hmm, I'm having problems with that. I can't see any command line parameter to switch them on and a normal compile isn't working.

This is what I get....


C:\mads\scroller>mads stevetest.asm -o:stevetest.xex
SAX TEMP2
TILE_HANDLER_GET_FULL_SCREEN.asm (48) ERROR: Undeclared macro SAX (BANK=0)
ARR #224
TILE_HANDLER_GET_FULL_SCREEN.asm (49) ERROR: Undeclared macro ARR (BANK=0)

I'm using version 1.9.0 build 21 (though I will move to 1.9.4 soon).

I'll switch to Fox's method once I've got these mnemonics compiled.

#21 flashjazzcat OFFLINE  

flashjazzcat

    Quadrunner

  • 8,661 posts
  • Location:United Kingdom

Posted Wed Jul 18, 2012 3:37 AM

I think some of the mnemonics are different in MADS. They're listed in the manual, but I had to put at least one opcode in using a .byte statement when I was experimenting with them, so perhaps a couple are missing.
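As a fallback when a mnemonic is rejected, the instruction can be assembled by hand from its opcode byte. A sketch for the two lines in the error output above, assuming TEMP2 is a zero-page location (I don't know where it actually lives in that program). The opcode values are the ones given in the usual undocumented-opcode tables: $87 for SAX zero page and $6B for ARR immediate.

```asm
        .byte $87, TEMP2   ; SAX TEMP2 (zero-page form, TEMP2 must be below $100)
        .byte $6B, 224     ; ARR #224  (immediate form)
```

If TEMP2 is not in zero page, the absolute form of SAX is opcode $8F followed by the low and high bytes of the address.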

#22 ProWizard OFFLINE  

ProWizard

    River Patroller

  • 3,386 posts
  • MyIDE 2 Conversions in Progress!
  • Location:$d500-$d57f

Posted Fri Jul 20, 2012 2:14 AM

I love machine language! Everything is so LOGICAL! Nice thread folks!

#23 bpj1138 OFFLINE  

bpj1138

    Star Raider

  • 84 posts
  • Location:USA

Posted Fri Jul 20, 2012 8:17 PM

lowly 6502 can only shift by 1 bit...

#24 bob1200xl OFFLINE  

bob1200xl

    River Patroller

  • 2,088 posts

Posted Fri Jul 20, 2012 8:59 PM

I think it is a really bad idea to code anything with illegal opcodes. It won't be long before 65816s will be much more common than they are now, and you are just killing yourself in that part of the market. If you need speed, the 65816 is your friend. Don't isolate yourself in the 6502C.

By the way: if you are doing things that wrap $0000, like LDA $FF30,X, consider that a 65816 with linear memory will access $100xx when you wrap, not $000xx like the 6502 does.
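To make the wrap concrete, here is a small sketch; the value loaded into X is just an example I picked.

```asm
        LDX #$E0
        LDA $FF30,X     ; $FF30 + $E0 = $10010
                        ; 6502: the 17th address bit is dropped, so this reads $0010
                        ; 65816, as described above: the access carries over
                        ; into the next bank instead of wrapping
```

Code that relies on the 6502 behaviour to reach zero page this way is the kind that would need checking before it runs on a 65816.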

Bob

#25 phaeron OFFLINE  

phaeron

    Stargunner

  • 1,247 posts
  • Location:USA

Posted Fri Jul 20, 2012 11:05 PM

bob1200xl wrote: By the way: if you are doing things that wrap $0000, like LDA $FF30,X, consider that a 65816 with linear memory will access $100xx when you wrap, not $000xx like the 6502 does.


I've seen conflicting info on this. Is this true only in native mode or also in emulation mode?



