ilmenit

shortest 8bx16b multiplication by 16

Hi,

I'm looking for a way to shorten (by size, not cycles) a routine that multiplies an 8-bit value by 16 and stores the 16-bit result (the result address does not overlap the value address). So far I have it down to 24 ($18) bytes. Any ideas how to shorten it further?

word-mul16.asm

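.proc mul3
	; note: result+1 is never initialised here, so it must be 0 on entry
	ldx val

	ldy #4		; four shifts = multiply by 16
loop:
	; ASL16: shift the low byte (kept in X), rotate the carry into result+1
	txa
	asl
	tax
	lda result+1
	rol
	sta result+1
	; dec loop
	dey
	bne loop
	stx result	; store the low byte

	rts
.endp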

Seems a little too simple?

Edit: Forget this, it's your "mul2 routine", but I wonder why you 'CLC' before 'ASL'ing...?

 

Original post: 

Does this qualify?

lda val
pha
asl
asl
asl
asl
sta result
pla
lsr
lsr
lsr
lsr
sta result+1
rts

(Extra candy: X and Y are not touched....)
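
For anyone who wants to check the routine above without firing up an emulator, here is a minimal Python sketch of what it computes. The function name `mul16_shift` is made up for illustration; the model only assumes that four ASLs give the low byte and four LSRs give the high byte.

```python
def mul16_shift(val):
    """Model of the PHA / ASL x4 / PLA / LSR x4 routine for an 8-bit val."""
    assert 0 <= val <= 0xFF
    lo = (val << 4) & 0xFF   # four ASLs: low nibble moves up, top bits drop out
    hi = val >> 4            # four LSRs: high nibble moves down
    return lo, hi            # (result, result+1)


# Every input should agree with a plain 16-bit multiply by 16.
for v in range(256):
    lo, hi = mul16_shift(v)
    assert (hi << 8) | lo == v * 16
```

The two halves never interact, which is why the routine needs no carry handling and no loop counter.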

Edited by Irgendwer
Nice if you only want to multiply by $10, but neat anyway 👍

 

Unless I read the intro wrong, I think he wants 8bit multiply with 16bit result

Edited by TGB1718

Perhaps I'm missing something crucial, but it seems more concise to rotate the memory contents directly:

 

	lda #2
	sta val
	jsr mul4

...

.proc mul4
	lda val
	sta result
	lda #0
	sta result+1
	ldy #3
Loop:
	asl result
	rol result+1
	dey
	bpl Loop
	rts
.endp

Note I'm initialising the upper 8 bits of the result; lda #0/sta result+1 can be removed if that's not needed. It makes sense to pass value in A as well, which saves more space:

 

	lda #2
	jsr mul5

...

.proc mul5
	sta result
	lda #0
	sta result+1
	ldy #3
Loop:
	asl result
	rol result+1
	dey
	bpl Loop
	rts
.endp

Down to 15 bytes using absolute addresses if you get rid of the upper 8 bit initialisation.
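
A rough Python model of the ASL/ROL loop, in case it helps to see what dropping the upper-byte initialisation does. This is a sketch only: `asl`, `rol` and `mul16_loop` are hypothetical helper names, and the `hi` argument stands for whatever happens to be in result+1 on entry.

```python
def asl(byte):
    """6502 ASL: shift left, bit 7 goes to carry. Returns (byte, carry)."""
    return (byte << 1) & 0xFF, byte >> 7


def rol(byte, carry):
    """6502 ROL: shift left, carry enters bit 0, bit 7 goes to carry."""
    return ((byte << 1) & 0xFF) | carry, byte >> 7


def mul16_loop(val, hi=0):
    """Model of: ldy #3 / asl result / rol result+1 / dey / bpl Loop."""
    lo = val
    y = 3
    while True:
        lo, c = asl(lo)
        hi, c = rol(hi, c)
        y = (y - 1) & 0xFF
        if y & 0x80:          # BPL falls through once Y wraps to $FF
            break
    return lo, hi


for v in range(256):
    lo, hi = mul16_loop(v)
    assert (hi << 8) | lo == v * 16
```

With `hi` left at its default of 0 the loop runs four times and matches `val * 16`. Calling it with a non-zero `hi` models the 15-byte version: the old low nibble of result+1 ends up in the high nibble of the answer.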

3 minutes ago, flashjazzcat said:

Down to 15 bytes using absolute addresses if you get rid of the upper 8 bit initialisation.

Compared to the version I posted, which would also be 15 bytes without the initial 'LDA', yours needs the Y register, is bigger if result is not on ZP, and is slower too.

1 minute ago, Irgendwer said:

Compared to the version I posted, which would also be 15 bytes without the initial 'LDA', yours needs the Y register, is bigger if result is not on ZP, and is slower too.

Agreed. I think yours is the best. :)

 

        ldy #0
        sty tmp
        ldy #4
?loop   asl
        rol tmp
        dey
        bne ?loop
        rts

13 bytes when tmp is on page 0.
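
If you want to double-check the byte counts quoted in this thread without an assembler, a small size table is enough. This is only a sketch: `SIZES`, `routine_size` and `mono_routine` are names made up for illustration, and the numbers are the standard 6502 instruction lengths per addressing mode.

```python
# Instruction length in bytes for the addressing modes used in this thread.
SIZES = {
    "imp": 1,   # implied or accumulator: asl, rol, dey, pha, rts ...
    "imm": 2,   # immediate: ldy #4
    "zp": 2,    # zero page: sty tmp (tmp on page 0)
    "abs": 3,   # absolute: sty tmp (tmp anywhere else)
    "rel": 2,   # relative branch: bne, bpl, bcc
}


def routine_size(modes):
    """Sum the instruction lengths for a list of addressing modes."""
    return sum(SIZES[m] for m in modes)


def mono_routine(tmp_mode):
    """Addressing modes of the routine above; tmp_mode is 'zp' or 'abs'."""
    return [
        "imm",      # ldy #0
        tmp_mode,   # sty tmp
        "imm",      # ldy #4
        "imp",      # asl
        tmp_mode,   # rol tmp
        "imp",      # dey
        "rel",      # bne ?loop
        "imp",      # rts
    ]


print(routine_size(mono_routine("zp")), routine_size(mono_routine("abs")))
```

The same table can be reused for the other routines by listing their addressing modes in order.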

18 minutes ago, mono said:
        ldy #0
        sty tmp
        ldy #4
?loop   asl
        rol tmp
        dey
        bne ?loop
        rts

13 bytes when tmp is on page 0.

Where is the 16 bit result?

15 minutes ago, Irgendwer said:

Where is the 16 bit result?

Looks like LSB is in A and MSB is tmp?

 

	ldx val
	lda lsbtab,x
	sta result
	lda msbtab,x
	sta result+1

 

12 bytes if val and result are on ZP. You said shortest code :D

 

But you need 512 bytes of LUT.
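
The two tables do not have to be typed by hand. A minimal Python sketch that generates them, assuming the table names `lsbtab` and `msbtab` from the snippet above and `.byte` lines with `$` hex literals as the assembler syntax (adjust for your assembler); `emit` is a hypothetical helper:

```python
# One entry per possible 8-bit input: low and high byte of value * 16.
lsbtab = [(v * 16) & 0xFF for v in range(256)]
msbtab = [(v * 16) >> 8 for v in range(256)]


def emit(name, table, per_line=16):
    """Format a table as a label followed by .byte lines."""
    lines = [name]
    for i in range(0, len(table), per_line):
        chunk = ",".join("$%02X" % b for b in table[i:i + per_line])
        lines.append("\t.byte " + chunk)
    return "\n".join(lines)


if __name__ == "__main__":
    print(emit("lsbtab", lsbtab))
    print(emit("msbtab", msbtab))
```

Both tables are 256 entries long, which is where the 512 bytes come from.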

Edited by ivop
first it was an empty post, after that I fixed a typo
Or this one:

    asl
    rol result+1
    asl
    rol result+1
    asl
    rol result+1
    asl
    rol result+1
    sta result


shorter, but trashes X

	ldx #3
loop
	asl
	rol result+1
	dex
	bpl loop
	sta result

Enter with value in A and result+1 set to 0.

 

And a way to set res+1 to 0 cheaply. Still trashes X.

 

    ldx #0
    stx res+1
    lda val
loop
    asl
    rol res+1
    inx
    cpx #4
    bne loop
    sta res

 

Edited by ivop
added more variations
A different approach. Not smaller though, but it might help others with thinking about this problem :)

 

; swap nibbles
    asl  
    adc  #$80
    rol  
    asl
    adc  #$80
    rol

; split and store result
    pha
    and #$f0
    sta res
    pla
    and #$0f
    sta res+1
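
Why ASL / ADC #$80 / ROL works is not obvious at first glance, so here is a small Python model of it. It is a sketch under two assumptions: decimal mode is off (binary ADC), and the helper names `asl`, `adc`, `rol` and `swap_nibbles` are invented for this example. Each ASL / ADC #$80 / ROL group rotates A left by two bits, so two groups swap the nibbles.

```python
def asl(a):
    """ASL: shift left, bit 7 goes to carry. Returns (a, carry)."""
    return (a << 1) & 0xFF, a >> 7


def adc(a, operand, carry):
    """ADC in binary mode: a + operand + carry, carry out of bit 7."""
    total = a + operand + carry
    return total & 0xFF, total >> 8


def rol(a, carry):
    """ROL: shift left, carry enters bit 0, bit 7 goes to carry."""
    return ((a << 1) & 0xFF) | carry, a >> 7


def swap_nibbles(a):
    """Two rounds of ASL / ADC #$80 / ROL."""
    for _ in range(2):
        a, c = asl(a)            # old bit 7 -> carry, bit 0 is now 0
        a, c = adc(a, 0x80, c)   # carry lands in bit 0, bit 7 is flipped and copied to carry
        a, c = rol(a, c)         # flipped bit 7 drops out, the true bit re-enters at bit 0
    return a


for v in range(256):
    s = swap_nibbles(v)
    assert s == ((v << 4) | (v >> 4)) & 0xFF
    # split as in the routine: AND #$F0 -> res, AND #$0F -> res+1
    assert ((s & 0x0F) << 8) | (s & 0xF0) == v * 16
```

The carry on entry does not matter, because the first ASL overwrites it.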

 

Edited by ivop

1 hour ago, ivop said:

A different approach. Not smaller though, but it might help others with thinking about this problem :)

that's exactly one of my original attempts in my first post 🙂

4 minutes ago, ilmenit said:

that's exactly one of my original attempts in my first post 🙂

Haha, sorry. Missed that somehow :)

 

Edit: oh, I never looked at your asm file, only at the quoted code. It was mul1 :)

Edited by ivop

Okay, how about this one? It trashes the value though. 16 bytes with val on ZP.

    lda #0
    asl val
    rol
    asl val
    rol
    asl val
    rol
    asl val
    rol
    sta val+1

Or trashing X, too:

    lda #0

    ldx #3
loop
    asl val
    rol
    dex
    bpl loop

    sta val+1

12 bytes.

 

Edited by ivop
typoo
.proc mul7 ; by barrym95838, 14 bytes!
; result16 = factor8 * 16
    lda val
    sta result
    lda #$10
loop:
    asl result
    rol 
    bcc loop
    sta result+1
    rts
.endp

6 minutes ago, ivop said:

Okay, how about this one? It trashes the value though.

Good ideas, but I will need the value 🙂

2 minutes ago, ilmenit said:

Good ideas, but I will need the value 🙂

Yeah, I guessed so. Combining that incredibly neat trick rolling #$10 four times that barrym95838 introduced, with the val trashing code results in this:

 

    lda #$10

loop
    asl val
    rol
    bcc loop

    sta val+1

9 bytes.
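
To see why no loop counter is needed, here is a small Python model of the loop. It is a sketch only; `asl`, `rol` and `mul16_sentinel` are hypothetical names. The single set bit in #$10 acts as a sentinel: it sits at bit 4, so it takes exactly four ROLs to reach the carry, and by then the four top bits of val have been rotated into the low nibble of A.

```python
def asl(byte):
    """ASL: shift left, bit 7 goes to carry. Returns (byte, carry)."""
    return (byte << 1) & 0xFF, byte >> 7


def rol(byte, carry):
    """ROL: shift left, carry enters bit 0, bit 7 goes to carry."""
    return ((byte << 1) & 0xFF) | carry, byte >> 7


def mul16_sentinel(val):
    """Model of: lda #$10 / loop: asl val / rol / bcc loop / sta val+1."""
    a = 0x10
    lo = val
    passes = 0
    while True:
        lo, c = asl(lo)      # top bit of the low byte -> carry
        a, c = rol(a, c)     # carry -> A, sentinel moves one step left
        passes += 1
        if c:                # BCC falls through when the sentinel pops out
            break
    return lo, a, passes     # (val, val+1, loop passes)


for v in range(256):
    lo, hi, passes = mul16_sentinel(v)
    assert passes == 4
    assert (hi << 8) | lo == v * 16
```

Because the sentinel is shifted out on the last pass, A holds only the high nibble of the original value and needs no masking.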

 

Perhaps you can keep track of the original val somewhere else?

2 minutes ago, ivop said:

9 bytes.

Perhaps you can keep track of the original val somewhere else?

With 9 bytes now there can be space to preserve the val 🙂 

I made one that uses an OS routine and requires the variables to be placed at a specific location; it takes 12 bytes and does not destroy val:

	org $3
result_hi .ds[1]
result_lo .ds[1]
val .byte 121 	
	org $2000
.proc os_mul16
	lda val
	sta result_lo
	ldx #0
	stx result_hi
	jsr $DBED
	rts
.endp	

And one that destroys val and takes 8 bytes 🙂

	org $3
result_hi .ds[1]
result_lo:
val .byte 121 	
	org $2000
.proc os_mul16_destr_val
	ldx #0
	stx result_hi
	jsr $DBED
	rts
.endp	

I think that's good enough compared to the initial 21-25 bytes.

Edited by ilmenit