shortest 8bx16b multiplication by 16

ilmenit · January 2, 2022

Hi,

I'm looking for a way to shorten (in size, not cycles) a routine that multiplies an 8-bit value by 16 and stores a 16-bit result (the result address does not overlap the value address). So far I have it down to 24 ($18) bytes. Any ideas on how to shorten it further?

word-mul16.asm

Eagle · January 2, 2022

https://codebase64.org/doku.php?id=base:short_8bit_multiplication_16bit_product

ilmenit · January 2, 2022

5 minutes ago, Eagle said:

https://codebase64.org/doku.php?id=base:short_8bit_multiplication_16bit_product

This one is longer.

Btw, with the variables moved to zero page, Mul3 is the shortest so far:

* MUL1: $0015
* MUL2: $0015
* MUL3: $0014

rensoup · January 3, 2022

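.proc mul3
	ldx val		; X carries the low byte while it is shifted
	ldy #4		; four shifts = multiply by 16
loop:
	; ASL16: shift X left, rotate the carry into result+1
	txa
	ASL
	tax
	LDA result+1	; not cleared first: its old low nibble ends up in the high nibble
	ROL
	STA result+1
	; dec loop
	dey
	bne loop
	stx result
	rts
.endp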

Seems a little too simple?

Wrathchild · January 3, 2022

Scrap that

Edited January 3, 2022 by Wrathchild

Irgendwer · January 3, 2022

Edit: Forget this, it's your "mul2 routine", but I wonder why you 'CLC' before 'ASL'ing...?

Original post:

Does this qualify?

lda val
pha
asl
asl
asl
asl
sta result
pla
lsr
lsr
lsr
lsr
sta result+1
rts

(Extra candy: X and Y are not touched....)
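
As a cross-check, here is a minimal Python sketch of what the routine above computes, assuming `val` is an unsigned byte. The name `mul16_split` is hypothetical and not from the thread; this models the arithmetic only, not the 6502 code itself.

```python
def mul16_split(val):
    """Model of the PHA / 4x ASL / PLA / 4x LSR routine (hypothetical helper)."""
    lo = (val << 4) & 0xFF   # four ASLs: bits shifted past bit 7 are lost
    hi = val >> 4            # four LSRs: the high nibble lands in bits 0-3
    return lo, hi

# The two bytes together always equal val * 16.
for val in range(256):
    lo, hi = mul16_split(val)
    assert (hi << 8) | lo == val * 16
```

The low byte keeps the original low nibble and the high byte keeps the original high nibble, which is why two independent 4-bit shifts are enough.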

Edited January 3, 2022 by Irgendwer

TGB1718 · January 3, 2022

Nice if you only want to multiply by $10, but neat anyway.

Unless I read the intro wrong, I think he wants an 8-bit multiply with a 16-bit result.

Edited January 3, 2022 by TGB1718

flashjazzcat · January 3, 2022

Perhaps I'm missing something crucial, but it seems more concise to rotate the memory contents directly:

	lda #2
	sta val
	jsr mul4

...

.proc mul4
	lda val
	sta result
	lda #0
	sta result+1
	ldy #3
Loop:
	asl result
	rol result+1
	dey
	bpl Loop
	rts
.endp

Note I'm initialising the upper 8 bits of the result; lda #0/sta result+1 can be removed if that's not needed. It makes sense to pass the value in A as well, which saves more space:

	lda #2
	jsr mul5

...

.proc mul5
	sta result
	lda #0
	sta result+1
	ldy #3
Loop:
	asl result
	rol result+1
	dey
	bpl Loop
	rts
.endp

Down to 15 bytes using absolute addresses if you get rid of the upper 8 bit initialisation.
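
To illustrate the loop count, here is a small Python model of mul5, assuming 8-bit registers and the usual ASL/ROL carry behaviour. The helper names `asl` and `rol` and the returned pass counter are illustrative additions, not part of the original routine.

```python
def asl(byte):
    """ASL: shift left, return (result, carry_out)."""
    return (byte << 1) & 0xFF, byte >> 7

def rol(byte, carry_in):
    """ROL: rotate left through carry, return (result, carry_out)."""
    return ((byte << 1) | carry_in) & 0xFF, byte >> 7

def mul5(a):
    """Model of mul5: value arrives in A, result+1 is cleared first."""
    lo, hi = a, 0
    y = 3                      # ldy #3
    passes = 0
    while True:
        lo, carry = asl(lo)    # asl result
        hi, _ = rol(hi, carry) # rol result+1
        passes += 1
        y -= 1                 # dey
        if y < 0:              # bpl falls through once Y wraps to $FF
            break
    return lo, hi, passes
```

With LDY #3 and BPL the body runs for Y = 3, 2, 1, 0, so four shifts in total.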

Irgendwer · January 3, 2022

3 minutes ago, flashjazzcat said:

Down to 15 bytes using absolute addresses if you get rid of the upper 8 bit initialisation.

Compared to the version I posted, which would also be 15 bytes without the initial 'LDA', yours needs the Y register, is bigger if result is not on ZP, and is slower too.

flashjazzcat · January 3, 2022

1 minute ago, Irgendwer said:

Compared to the version I posted, which would also be 15 bytes without the initial 'LDA', yours needs the Y register, is bigger if result is not on ZP, and is slower too.

Agreed. I think yours is the best.

mono · January 3, 2022

        ldy #0
        sty tmp
        ldy #4
?loop   asl
        rol tmp
        dey
        bne ?loop
        rts

13 bytes when tmp is on page 0.
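
The byte counts quoted in this thread can be tallied mechanically. The sketch below assumes the standard 6502 instruction lengths per addressing mode; `SIZE`, `routine_size` and `MONO_ZP` are hypothetical names made up for this illustration.

```python
# Bytes per instruction for the 6502 addressing modes used in this thread.
SIZE = {"implied": 1, "accumulator": 1, "immediate": 2,
        "zeropage": 2, "relative": 2, "absolute": 3}

def routine_size(modes):
    """Sum the instruction lengths for a list of addressing modes."""
    return sum(SIZE[m] for m in modes)

# The routine above, one entry per instruction, with tmp on page 0.
MONO_ZP = ["immediate",    # ldy #0
           "zeropage",     # sty tmp
           "immediate",    # ldy #4
           "accumulator",  # asl
           "zeropage",     # rol tmp
           "implied",      # dey
           "relative",     # bne ?loop
           "implied"]      # rts

print(routine_size(MONO_ZP))
```

If tmp lives outside page 0, the two zero-page entries become absolute and the total grows by two bytes.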

Irgendwer · January 3, 2022

18 minutes ago, mono said:


        ldy #0
        sty tmp
        ldy #4
?loop   asl
        rol tmp
        dey
        bne ?loop
        rts

13 bytes when tmp is on page 0.

Where is the 16 bit result?

ivop · January 3, 2022

15 minutes ago, Irgendwer said:

Where is the 16 bit result?

Looks like the LSB is in A and the MSB is in tmp?

ivop · January 3, 2022

	ldx val
	lda lsbtab,x
	sta result
	lda msbtab,x
	sta result+1

12 bytes if val and result are on ZP. You said shortest code.

But you need 512 bytes of LUT.
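
The two tables could be generated offline. The following is only a sketch: it assumes lsbtab and msbtab hold the low and high byte of x * 16 for every x, and `as_byte_lines` is a hypothetical helper that prints them as `.byte` lines.

```python
# lsbtab[x] and msbtab[x] hold the low and high byte of x * 16.
lsbtab = bytes((x << 4) & 0xFF for x in range(256))
msbtab = bytes(x >> 4 for x in range(256))

def as_byte_lines(name, table, per_line=16):
    """Format a table as a label followed by `.byte` lines (hypothetical helper)."""
    lines = [name]
    for i in range(0, len(table), per_line):
        chunk = ", ".join(f"${b:02X}" for b in table[i:i + per_line])
        lines.append(f"\t.byte {chunk}")
    return "\n".join(lines)

print(as_byte_lines("lsbtab", lsbtab))
print(as_byte_lines("msbtab", msbtab))
```

Each table is 256 bytes, which gives the 512 bytes of LUT mentioned above.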

Edited January 3, 2022 by ivop
first it was an empty post, after that I fixed a typo

ivop · January 3, 2022

Or this one:

    asl
    rol result+1
    asl
    rol result+1
    asl
    rol result+1
    asl
    rol result+1
    sta result


Shorter, but trashes X:

	ldx #3
loop
	asl
	rol result+1
	dex
	bpl loop
	sta result

Enter with value in A and result+1 set to 0.
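
Why result+1 has to start at 0 can be seen in a short Python model of the four ASL / ROL result+1 passes. `rol4_into` is a hypothetical name; only the carry chain is modelled.

```python
def rol4_into(a, result_hi):
    """Four ASL A / ROL result+1 passes; returns (result_lo, result_hi)."""
    for _ in range(4):
        carry = a >> 7                                   # asl: bit 7 goes to carry
        a = (a << 1) & 0xFF
        result_hi = ((result_hi << 1) | carry) & 0xFF    # rol result+1
    return a, result_hi

print(rol4_into(0x12, 0x00))   # clean high byte
print(rol4_into(0x12, 0x0F))   # stale low nibble leaks into the high nibble
```

Only the old low nibble of result+1 survives the four rotates, so a stale upper nibble is harmless but a stale lower nibble corrupts the result.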

And a way to set res+1 to 0 cheaply. Still trashes X.

    ldx #0
    stx res+1
    lda val
loop
    asl
    rol res+1
    inx
    cpx #4
    bne loop
    sta res

Edited January 3, 2022 by ivop
added more variations

ivop · January 3, 2022

A different approach. It's not smaller, but it might help others think about this problem.

; swap nibbles
    asl  
    adc  #$80
    rol  
    asl
    adc  #$80
    rol

; split and store result
    pha
    and #$f0
    sta res
    pla
    and #$0f
    sta res+1
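
The nibble swap is the least obvious part, so here is a Python model of it. It assumes binary (not decimal) mode for ADC; the helper names are made up for this sketch. Each ASL / ADC #$80 / ROL triple rotates A left by two bits, and doing it twice swaps the nibbles.

```python
def asl(a):
    """ASL A: return (result, carry_out)."""
    return (a << 1) & 0xFF, a >> 7

def adc(a, operand, carry_in):
    """ADC in binary mode: return (result, carry_out)."""
    total = a + operand + carry_in
    return total & 0xFF, total >> 8

def rol(a, carry_in):
    """ROL A: return (result, carry_out)."""
    return ((a << 1) | carry_in) & 0xFF, a >> 7

def rotate_left_2(a):
    """ASL / ADC #$80 / ROL: rotate A left by two bits."""
    a, c = asl(a)           # old bit 7 goes to carry
    a, c = adc(a, 0x80, c)  # carry lands in bit 0; old bit 6 becomes the new carry
    a, c = rol(a, c)        # old bit 6 lands in bit 0
    return a

def swap_nibbles(a):
    return rotate_left_2(rotate_left_2(a))
```

After the swap, AND #$F0 gives the low result byte and AND #$0F gives the high result byte, as in the code above.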

Edited January 3, 2022 by ivop

ilmenit · January 3, 2022

1 hour ago, ivop said:

A different approach. It's not smaller, but it might help others think about this problem.

That's exactly one of my original attempts in my first post.

ivop · January 3, 2022

4 minutes ago, ilmenit said:

That's exactly one of my original attempts in my first post.

Haha, sorry. Missed that somehow.

Edit: oh, I never looked at your asm file, only at the quoted code. It was mul1.

Edited January 3, 2022 by ivop

ivop · January 3, 2022

Okay, how about this one? It trashes the value though. 16 bytes with val on ZP.

    lda #0
    asl val
    rol
    asl val
    rol
    asl val
    rol
    asl val
    rol
    sta val+1

Or trashing X, too:

    lda #0

    ldx #3
loop
    asl val
    rol
    dex
    bpl loop

    sta val+1

12 bytes.

Edited January 3, 2022 by ivop
typoo

ilmenit · January 3, 2022

.proc mul7 ; by barrym95838, 14 bytes!
; result16 = factor8 * 16
lda val
sta result
lda #$10
loop:
asl result
rol
bcc loop
sta result+1
rts
.endp
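
The trick is that the #$10 in A doubles as the loop counter: its set bit reaches the carry on the fourth ROL, which ends the loop. A Python model of mul7, with a pass counter added purely for illustration, could look like this:

```python
def mul7(val):
    """Model of mul7; returns (result_lo, result_hi, loop_passes)."""
    lo = val       # lda val / sta result
    a = 0x10       # lda #$10: bit 4 is the sentinel
    passes = 0
    while True:
        carry = lo >> 7                  # asl result: bit 7 goes to carry
        lo = (lo << 1) & 0xFF
        carry_out = a >> 7               # rol: bit 7 of A goes to carry
        a = ((a << 1) | carry) & 0xFF
        passes += 1
        if carry_out:                    # bcc loop falls through when carry is set
            break
    return lo, a, passes

print(mul7(121))
```

By the time the sentinel bit drops out, A holds exactly the four bits shifted out of result, so no separate counter register is needed.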

ilmenit · January 3, 2022

6 minutes ago, ivop said:

Okay, how about this one? It trashes the value though.

Good ideas, but I will need the value.

ivop · January 3, 2022

2 minutes ago, ilmenit said:

Good ideas, but I will need the value.

Yeah, I guessed so. Combining the incredibly neat trick that barrym95838 introduced, rolling #$10 four times, with the val-trashing code results in this:

    lda #$10

loop
    asl val
    rol
    bcc loop

    sta val+1

9 bytes.

Perhaps you can keep track of the original val somewhere else?

ilmenit · January 3, 2022

2 minutes ago, ivop said:

9 bytes.

Perhaps you can keep track of the original val somewhere else?

With 9 bytes, there is now room to preserve the val.

ilmenit · January 3, 2022

@barrym95838 is on AtariAge, I see. Kudos!

ilmenit · January 4, 2022

I made one that uses the OS and requires the variables to be placed at a specific location; it takes 12 bytes and does not destroy the val:

	org $3
result_hi .ds[1]
result_lo .ds[1]
val .byte 121 	
	org $2000
.proc os_mul16
	lda val
	sta result_lo
	ldx #0
	stx result_hi
	jsr $DBED
	rts
.endp

And one that destroys the val and takes 8 bytes:

	org $3
result_hi .ds[1]
result_lo:
val .byte 121 	
	org $2000
.proc os_mul16_destr_val
	ldx #0
	stx result_hi
	jsr $DBED
	rts
.endp

I think it's good enough compared to the initial 21-25 bytes.

Edited January 4, 2022 by ilmenit
