
fast table driven 16bit x 8bit mul routine



To sign extend, just put either $00 or $FF in the high byte depending on the sign of the low byte:

 

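multiply_16x8bit_signed
    ldx #0              ; assume T2 is positive: high byte = $00
    lda T2              ; load the 8-bit operand to set the N flag
    spl:ldx #$FF        ; skipped if plus; if negative, high byte = $FF
    stx T2+1            ; store the sign extension
    jmp multiply_16bit_signed

multiply_8x16bit_signed
    ldx #0              ; same idea, but T1 is the 8-bit operand
    lda T1
    spl:ldx #$FF
    stx T1+1
    jmp multiply_16bit_signed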
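
For anyone checking the results on the PC side, the same sign extension can be modeled in C. This is only a sketch of the idea, not part of the posted routines; the function name `sign_extend_8_to_16` is made up for illustration.

```c
#include <stdint.h>

/* Sign-extend an 8-bit value to 16 bits the same way the stubs above do:
   the high byte becomes $FF when bit 7 of the low byte is set, else $00.
   (Hypothetical helper, named here for illustration only.) */
uint16_t sign_extend_8_to_16(uint8_t low)
{
    uint8_t high = (low & 0x80) ? 0xFF : 0x00;
    return (uint16_t)(((uint16_t)high << 8) | low);
}
```

For example, $CD becomes $FFCD and $7F stays $007F.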

Here's a full example with the routines converted to XASM-compatible (and by extension MADS) syntax:

 

mult.zip

 

The code just ends with a BRK to force the debugger to come up in Altirra so that you can look at memory at $80 to verify that the products are correct.

 

You can shave off two 16-bit adds and some other bookkeeping if you actually modify the routines instead of just sign extending. But it would be good to know which operand stays constant: do you want to keep the 16-bit operand fixed while computing products for many different 8-bit operands, or vice versa? There are two different ways to reduce the 16x16 to 16x8. Maybe both versions are useful?


Here are optimized routines for 16x8 and 8x16. Both produce 24-bit results.

mult.zip

If we consider code size as a proxy for runtime, then the 16x8 and 8x16 versions should be about 2X faster than the 16x16 version:

  • unsigned 16x16 - 171 bytes
  • unsigned 16x8 - 84 bytes
  • unsigned 8x16 - 82 bytes
  • signed 16x16 - 209 bytes
  • signed 16x8 - 116 bytes
  • signed 8x16 - 114 bytes

The code sizes for signed include the unsigned code.

 

EDIT: I should add that I did not test these extensively, so I may have introduced bugs, but they worked great on three test cases. :) Hmmm, now what would be the fastest way to exhaustively test all 16 million inputs?

Edited by Xuel

All inputs passed, assuming that I also got the math right on the C side. verif.c is just a copy of lib6502's run6502.c with a few callbacks added to implement the checking. It would be great if someone could check my work.
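
For anyone who wants to double-check the math independently, here is a minimal sketch in C of what an exhaustive check boils down to. It is not verif.c and does not use lib6502; the names `mul16x8_fn`, `ref_mul_s16s8`, `unsigned_only` and `count_mismatches` are made up for illustration, and the candidate is assumed to be a C-callable model of the routine rather than real 6502 code.

```c
#include <stdint.h>

/* Hypothetical signature for a routine under test: 16-bit by 8-bit
   multiply returning the low 24 bits of the product. */
typedef uint32_t (*mul16x8_fn)(uint16_t t1, uint8_t t2);

/* Reference: multiply in wide signed arithmetic, keep 24 bits.
   Assumes the usual two's complement narrowing conversions. */
uint32_t ref_mul_s16s8(uint16_t t1, uint8_t t2)
{
    int32_t a = (int16_t)t1;
    int32_t b = (int8_t)t2;
    return (uint32_t)(a * b) & 0xFFFFFFu;
}

/* Deliberately wrong candidate (ignores signs), to show the checker bites. */
uint32_t unsigned_only(uint16_t t1, uint8_t t2)
{
    return ((uint32_t)t1 * t2) & 0xFFFFFFu;
}

/* Try all 65536 * 256 = 16,777,216 input pairs; return the mismatch count. */
uint32_t count_mismatches(mul16x8_fn candidate)
{
    uint32_t bad = 0;
    for (uint32_t t1 = 0; t1 <= 0xFFFF; t1++) {
        for (uint32_t t2 = 0; t2 <= 0xFF; t2++) {
            uint32_t got = candidate((uint16_t)t1, (uint8_t)t2);
            uint32_t want = ref_mul_s16s8((uint16_t)t1, (uint8_t)t2);
            if (got != want)
                bad++;
        }
    }
    return bad;
}
```

Calling `count_mismatches` on a correct routine should return 0, while `unsigned_only` fails on inputs such as 1 * $FF.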


  • 1 month later...

I added multiply_s16u8 to GitHub. It passed verification.

 

The sign fix code is pretty simple once you understand it. To fix unsigned X*Y, you just subtract 2^N*X if Y is negative and subtract 2^M*Y if X is negative, where N and M are the bit widths of Y and X respectively. So all I had to do to get s16u8 from s16s8 is remove the subtraction for the 8-bit operand.
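
The sign fix described above can be sketched in C as follows. This is a model of the arithmetic only, not a translation of the 6502 code: `mul_u16u8` stands in for the table-driven unsigned routine, and all three function names are assumptions made for this example. Results are kept to 24 bits, as in the 16x8 routines.

```c
#include <stdint.h>

/* Stand-in for the unsigned table-driven 16x8 multiply (hypothetical name).
   The product is at most 0xFFFF * 0xFF, so it fits in 24 bits. */
uint32_t mul_u16u8(uint16_t x, uint8_t y)
{
    return (uint32_t)x * y;
}

/* Signed 16 x signed 8: fix up the unsigned product. */
uint32_t mul_s16s8(uint16_t x, uint8_t y)
{
    uint32_t p = mul_u16u8(x, y);
    if (y & 0x80)
        p -= (uint32_t)x << 8;    /* Y negative: subtract 2^8 * X  */
    if (x & 0x8000)
        p -= (uint32_t)y << 16;   /* X negative: subtract 2^16 * Y */
    return p & 0xFFFFFFu;         /* keep the 24-bit result */
}

/* Signed 16 x unsigned 8: same thing without the fix for the 8-bit operand. */
uint32_t mul_s16u8(uint16_t x, uint8_t y)
{
    uint32_t p = mul_u16u8(x, y);
    if (x & 0x8000)
        p -= (uint32_t)y << 16;
    return p & 0xFFFFFFu;
}
```

With these definitions, `mul_s16s8(0xFFFF, 0xFF)` is -1 * -1 and `mul_s16u8(0xFFFF, 0xFF)` is -1 * 255, which makes the effect of dropping the one subtraction easy to see.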


  • 6 months later...
You're using the 16x8 signed version:




mwa #$0001 T1
mwa #$ABCD T2
sec
jsr multiply_16x8bit_signed
mwa PRODUCT result3
mva PRODUCT+2 result3+2



This means that all 16 bits of T1 are used but only the low 8 bits of T2 are used. Since the 6502 is little endian, the low byte is the one stored first in memory, so T2 is actually $CD in the eyes of the 16x8 signed multiply routine. The value $CD is a negative number since the highest bit is set. You can think of it as -$33. When multiplied by 1 it should still equal -$33. Since 16x8 multiplication yields a 24-bit result, this comes out to $FFFFCD. In other words, $CD has been sign-extended to a 24-bit result.


The result of 1*-1 should be -1 which is either $FFFF, $FFFFFF, or $FFFFFFFF depending on whether you're using a routine that produces a 16-bit, 24-bit or 32-bit result.
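
To make the arithmetic above concrete, here is a small C model of how the 16x8 signed routine sees its inputs. It is a sketch under the assumptions stated in this post (only the low byte of T2 is read, the result is 24 bits); `model_16x8_signed` is a made-up name, not something from the posted code.

```c
#include <stdint.h>

/* Model of the 16x8 signed multiply as described above: all 16 bits of t1
   are used, only the low byte of t2 is used, and the product is truncated
   to 24 bits. Assumes the usual two's complement narrowing conversions. */
uint32_t model_16x8_signed(uint16_t t1, uint16_t t2)
{
    int32_t a = (int16_t)t1;          /* 16-bit signed operand            */
    int32_t b = (int8_t)(t2 & 0xFF);  /* low byte only, e.g. $CD = -$33   */
    return (uint32_t)(a * b) & 0xFFFFFFu;
}
```

Feeding it $0001 and $ABCD gives $FFFFCD, and $0001 times $FF (that is, 1 * -1) gives $FFFFFF.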

