fast table driven 16bit x 8bit mul routine

Heaven/TQA · July 3, 2014

Does anybody have a fast (table-driven) 16-bit x 8-bit mul routine ready? It needs to be signed.

Xuel · July 3, 2014

Have you already looked at codebase64? You could modify the 16x16 by JackAsser/Instinct to be 16x8 or just sign extend your 8bit values.

Heaven/TQA · July 3, 2014

That's exactly what I am not able to do... Codebase is always the first place I look...

Heaven/TQA · July 3, 2014

Right now I am using Fox's fast one included in the MADS pack, but there is no table-driven 16x16 in there.

Xuel · July 3, 2014

To sign extend, just put either $00 or $FF in the high byte depending on the sign of the low byte:

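multiply_16x8bit_signed
    ldx #0              ; assume T2 is non-negative: high byte = $00
    lda T2              ; load the low byte to set the N flag
    spl:ldx #$FF        ; if negative, high byte = $FF instead
    stx T2+1            ; T2 is now a sign-extended 16-bit value
    jmp multiply_16bit_signed

multiply_8x16bit_signed
    ldx #0              ; same idea, but T1 is the 8-bit operand
    lda T1
    spl:ldx #$FF
    stx T1+1
    jmp multiply_16bit_signed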
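
For anyone who wants to sanity-check the idea off the Atari, here is a minimal Python model of what the two stubs do to the 8-bit operand. The function names are made up for illustration and are not part of the attached code; the only claim is that the high byte becomes $00 or $FF depending on bit 7 of the low byte.

```python
def sign_extend_8_to_16(low_byte: int) -> int:
    """Return the 16-bit pattern whose low byte is low_byte and whose
    high byte is $00 or $FF, mirroring ldx #0 / spl:ldx #$FF / stx."""
    if not 0 <= low_byte <= 0xFF:
        raise ValueError("expected a single byte")
    high_byte = 0xFF if low_byte & 0x80 else 0x00
    return (high_byte << 8) | low_byte


def as_signed(bits: int, width: int) -> int:
    """Interpret an unsigned bit pattern as a two's-complement number."""
    return bits - (1 << width) if bits & (1 << (width - 1)) else bits
```

The point of the extension is that the 8-bit and 16-bit patterns denote the same signed number, so `as_signed(sign_extend_8_to_16(b), 16)` equals `as_signed(b, 8)` for every byte `b`.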

Here's a full example with the routines converted to XASM-compatible (and by extension MADS) syntax:

mult.zip

The code just ends with a BRK to force the debugger to come up in Altirra so that you can look at memory at $80 to verify that the products are correct.

You can shave off two 16-bit adds and some other bookkeeping if you actually modify the routines instead of just sign extending. But it would be good to know if you want to keep the 16-bit multiplicand constant while computing products for many different 8-bit multiplicands or vice-versa as there are two different ways to reduce the 16x16 to 16x8. Maybe both versions are useful?
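
As a rough illustration of where the savings come from, here is a Python sketch that assumes the table-driven routine is built on the usual quarter-square identity, a*b = floor((a+b)^2/4) - floor((a-b)^2/4). That is an assumption about the underlying technique, not a transcription of the attached code. In this model a 16x16 product needs four 8x8 partial products, while a 16x8 product needs only two.

```python
# Quarter-square table: SQ[n] = floor(n*n / 4) for n = 0..510
SQ = [n * n // 4 for n in range(511)]


def mul8(a: int, b: int) -> int:
    """Unsigned 8x8 -> 16-bit product using two table lookups."""
    return SQ[a + b] - SQ[abs(a - b)]


def mul16x8(x: int, y: int) -> int:
    """Unsigned 16x8 -> 24-bit product from two 8x8 partial products."""
    x_lo, x_hi = x & 0xFF, x >> 8
    return mul8(x_lo, y) + (mul8(x_hi, y) << 8)


def mul16x16(x: int, y: int) -> int:
    """Unsigned 16x16 -> 32-bit product from four 8x8 partial products."""
    y_lo, y_hi = y & 0xFF, y >> 8
    return mul16x8(x, y_lo) + (mul16x8(x, y_hi) << 8)
```

The sketch does not model which operand the table pointers are set up for, which is what distinguishes the two possible reductions on the real hardware.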

Xuel · July 4, 2014

Here are optimized routines for 16x8 and 8x16. Both produce 24-bit results.

mult.zip

If we consider code size as a proxy for runtime, then the 16x8 and 8x16 versions should be about 2X faster than the 16x16 version:

unsigned 16x16 - 171 bytes
unsigned 16x8 - 84 bytes
unsigned 8x16 - 82 bytes
signed 16x16 - 209 bytes
signed 16x8 - 116 bytes
signed 8x16 - 114 bytes

The code sizes for signed include the unsigned code.

EDIT: I should add that I did not test these extensively, so I may have introduced bugs, but they worked great on three test cases. Hmmm, now what would be the fastest way to exhaustively test all 16 million inputs?

Edited July 4, 2014 by Xuel

Heaven/TQA · July 4, 2014

Thx. Saved me a lot of time, as I am not good at this 6502 stuff.

Xuel · July 4, 2014

I used Ian Piumarta's lib6502 library to verify all 16 million inputs to the 16x8 and 8x16 signed multiply routines. It only takes 8 seconds on my machine to go through all 16 million inputs twice. Source on github.
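
The shape of such a check can be sketched in Python. This is not verif.c and does not use lib6502; `routine` is a hypothetical stand-in for "run the 6502 code on these inputs and read back the 24-bit PRODUCT", and the reference is just the signed product masked to 24 bits.

```python
def reference_s16s8(x_bits: int, y_bits: int) -> int:
    """Expected 24-bit result for a signed 16-bit times signed 8-bit multiply."""
    x = x_bits - 0x10000 if x_bits & 0x8000 else x_bits
    y = y_bits - 0x100 if y_bits & 0x80 else y_bits
    return (x * y) & 0xFFFFFF


def first_mismatch(routine, x_values=range(0x10000), y_values=range(0x100)):
    """Return the first (x, y) where routine disagrees with the reference,
    or None if every tested input matches."""
    for x in x_values:
        for y in y_values:
            if routine(x, y) != reference_s16s8(x, y):
                return (x, y)
    return None
```

With the default ranges this walks all 2^24 input pairs, which is slow in pure Python, so passing smaller ranges is the practical way to try it out.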

Heaven/TQA · July 4, 2014

And the result? Works as designed... I just ran some tests (2x2, 1x0, 1x-1, etc.) and it seems to work. Now let's see how it works in combination with the other 3D stuff I am playing around with right now. Thanks Xuel!

Xuel · July 4, 2014

All inputs passed, assuming that I also got the math right on the C side. verif.c is just a copy of lib6502's run6502.c with a few callbacks added to implement the checking. It would be great if someone could check my work.

Heaven/TQA · July 4, 2014

If that's fine, Tebe should include the files in the MADS archive.

Xuel · July 4, 2014

Fine with me. I bet JackAsser and Graham from codebase64 would also approve.

Heaven/TQA · July 4, 2014

I guess those guys won't mind. Graham should code another Oxyron Atari release.

Heaven/TQA · August 7, 2014

Xuel... can you adapt one of the routines to 16bit signed x 8bit unsigned?

Xuel · August 7, 2014

I added multiply_s16u8 to github. It passed verification.

The sign fix code is pretty simple once you understand it. To fix unsigned X*Y, you just subtract 2^N*X if Y is negative and subtract 2^N*Y if X is negative, where N is the bit width of the operand that is negative (8 for the 8-bit operand, 16 for the 16-bit one). So all I had to do to get s16u8 from s16s8 was remove the subtraction for the 8-bit operand.
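
The same correction can be modelled in a few lines of Python. This is a sketch of the arithmetic only, not the 6502 code on github; the function names echo the routine names for readability, and N is taken to be the bit width of the operand whose sign is being corrected. Inputs are raw bit patterns and the result is kept to 24 bits.

```python
MASK24 = 0xFFFFFF


def mul_s16s8(x_bits: int, y_bits: int) -> int:
    """Signed 16-bit times signed 8-bit, built from an unsigned product."""
    product = x_bits * y_bits           # plain unsigned 16x8 multiply
    if y_bits & 0x80:                   # Y negative: subtract 2^8 * X
        product -= x_bits << 8
    if x_bits & 0x8000:                 # X negative: subtract 2^16 * Y
        product -= y_bits << 16
    return product & MASK24


def mul_s16u8(x_bits: int, y_bits: int) -> int:
    """Signed 16-bit times unsigned 8-bit: only X can be negative,
    so the correction for the 8-bit operand is simply dropped."""
    product = x_bits * y_bits
    if x_bits & 0x8000:
        product -= y_bits << 16
    return product & MASK24
```

For example, `mul_s16s8(0x0001, 0xCD)` gives `0xFFFFCD` because $CD is treated as -$33, while `mul_s16u8(0x0001, 0xCD)` gives `0x0000CD`.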

Heaven/TQA · August 8, 2014

Cool, thx. Will try it... I suspect at the moment that the issues I run into in my fractal demos come from my z-values being 0-255 while the x-coordinates are 16-bit and can be negative or positive.

so will check later.

Heaven/TQA · February 21, 2015

Xuel.... sorry to bring that up... but

16bit signed

$0001 x -1 should give what?

And $0001 x $ABCD gives not $0000ABCD but $FFFFFFCD?

fmul16x16_test.zip

Edited February 21, 2015 by Heaven/TQA

Xuel · February 22, 2015

You're using the 16x8 signed version:

    mwa #$0001 T1
    mwa #$ABCD T2
    sec
    jsr multiply_16x8bit_signed
    mwa PRODUCT result3
    mva PRODUCT+2 result3+2

This means that all 16 bits of T1 are used but only the low 8 bits of T2 are used. Since the 6502 is little endian, the low byte is stored first, so T2 is actually $CD in the eyes of the 16x8 signed multiply routine. The value $CD is a negative number since the highest bit is set. You can think of it as -$33. When multiplied by 1 it should still equal -$33. Since 16x8 multiplication yields a 24-bit result, this comes out to $FFFFCD. In other words, $CD has been sign-extended to a 24-bit result.

The result of 1*-1 should be -1 which is either $FFFF, $FFFFFF, or $FFFFFFFF depending on whether you're using a routine that produces a 16-bit, 24-bit or 32-bit result.
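
These numbers are easy to double-check with a small Python sketch of two's-complement arithmetic. The helper names are invented for illustration and nothing here runs the actual routine.

```python
def low_byte(word: int) -> int:
    """The byte stored first in memory on a little-endian CPU like the 6502."""
    return word & 0xFF


def to_signed(bits: int, width: int) -> int:
    """Interpret a width-bit pattern as a two's-complement number."""
    return bits - (1 << width) if bits & (1 << (width - 1)) else bits


def to_bits(value: int, width: int) -> int:
    """Encode a (possibly negative) integer as a width-bit pattern."""
    return value & ((1 << width) - 1)


# What the 16x8 signed routine effectively sees for T2 = $ABCD
t2_seen = to_signed(low_byte(0xABCD), 8)        # $CD read as a signed byte
product_24 = to_bits(1 * t2_seen, 24)           # 24-bit pattern of 1 * T2
minus_one = [to_bits(-1, w) for w in (16, 24, 32)]
```

Here `t2_seen` is -$33, `product_24` is $FFFFCD, and `minus_one` holds $FFFF, $FFFFFF and $FFFFFFFF.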

Heaven/TQA · February 22, 2015

Aaarg.... I will check my code in the main app then, as I was using this routine and wondered why the results looked a little strange.

Heaven/TQA · February 22, 2015

I expected it to be a signed 16-bit x 16-bit.
