Heaven/TQA Posted July 3, 2014
Does anybody have a fast (table-driven) 16-bit x 8-bit mul routine ready? It needs to be signed.
Xuel Posted July 3, 2014
Have you already looked at codebase64? You could modify the 16x16 routine by JackAsser/Instinct to be 16x8, or just sign-extend your 8-bit values.
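For anyone new to table-driven multiplication: as far as I know, the codebase64 fast multiply routines are built on the quarter-square identity a*b = floor((a+b)^2/4) - floor((a-b)^2/4), which is exact for integers because a+b and a-b always have the same parity. Below is a minimal C sketch of an unsigned 8x8 multiply using one 511-entry table. The names `qs`, `init_quarter_squares` and `mul8x8_qs` are hypothetical. The sketch also simplifies things: it takes |a-b| directly, whereas real 6502 versions typically avoid that step by indexing a second, offset table, and split the 16-bit entries into low-byte and high-byte pages.

```c
#include <assert.h>
#include <stdint.h>

/* Quarter-square table: qs[n] = floor(n*n / 4) for n = 0..510.
   The largest entry, 510*510/4 = 65025, still fits in 16 bits. */
static uint16_t qs[511];

static void init_quarter_squares(void)
{
    for (uint32_t n = 0; n < 511; n++)
        qs[n] = (uint16_t)((n * n) / 4);
}

/* Unsigned 8x8 -> 16-bit product from two lookups and one subtraction:
   a*b = floor((a+b)^2/4) - floor((a-b)^2/4). */
static uint16_t mul8x8_qs(uint8_t a, uint8_t b)
{
    unsigned sum  = (unsigned)a + b;          /* 0..510 */
    unsigned diff = a > b ? a - b : b - a;    /* 0..255 */
    return (uint16_t)(qs[sum] - qs[diff]);
}
```

Call `init_quarter_squares()` once, then `mul8x8_qs(200, 3)` gives 600. Wider products are built from several of these 8x8 steps.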
Heaven/TQA Posted July 3, 2014
That's exactly what I am not able to do... Codebase is the first place I look...
Heaven/TQA Posted July 3, 2014
Right now I am using Fox's fast one included in the MADS pack, but it has no table-driven 16x16.
Xuel Posted July 3, 2014
To sign-extend, just put either $00 or $FF in the high byte depending on the sign of the low byte:

    multiply_16x8bit_signed
        ldx #0
        lda T2
        spl:ldx #$FF
        stx T2+1
        jmp multiply_16bit_signed

    multiply_8x16bit_signed
        ldx #0
        lda T1
        spl:ldx #$FF
        stx T1+1
        jmp multiply_16bit_signed

Here's a full example with the routines converted to XASM-compatible (and by extension MADS) syntax: mult.zip
The code just ends with a BRK to force the debugger to come up in Altirra, so that you can look at memory at $80 to verify that the products are correct.
You can shave off two 16-bit adds and some other bookkeeping if you actually modify the routines instead of just sign-extending. But it would be good to know whether you want to keep the 16-bit multiplicand constant while computing products for many different 8-bit multiplicands, or vice versa, as there are two different ways to reduce the 16x16 to 16x8. Maybe both versions are useful?
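A C model of what the sign-extension wrapper does, assuming `multiply_16bit_signed` returns the low 32 bits of the signed 16x16 product. The functions below (`sign_extend_8_to_16`, `mul16x16_signed`, `mul16x8_signed_via_extend`) are hypothetical stand-ins for illustration, not the routines in mult.zip.

```c
#include <assert.h>
#include <stdint.h>

/* Mirrors ldx #0 / spl:ldx #$FF / stx hi: the high byte becomes $FF
   when bit 7 of the low byte is set, otherwise $00. */
static uint16_t sign_extend_8_to_16(uint8_t lo)
{
    uint8_t hi = (lo & 0x80) ? 0xFF : 0x00;
    return (uint16_t)((hi << 8) | lo);
}

/* Stand-in for multiply_16bit_signed: signed 16x16 product,
   returned as its raw 32-bit two's-complement pattern. */
static uint32_t mul16x16_signed(uint16_t x, uint16_t y)
{
    int32_t p = (int32_t)(int16_t)x * (int16_t)y;
    return (uint32_t)p;
}

/* 16x8 signed by sign-extending the 8-bit operand and reusing 16x16. */
static uint32_t mul16x8_signed_via_extend(uint16_t x, uint8_t y)
{
    return mul16x16_signed(x, sign_extend_8_to_16(y));
}
```

For example, `mul16x8_signed_via_extend(0x0001, 0xFF)` computes 1 * -1 and returns $FFFFFFFF. The wrapper is cheap to write but pays for a full 16x16 multiply, which is why modifying the routine itself saves work.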
Xuel Posted July 4, 2014 (edited July 4, 2014)
Here are optimized routines for 16x8 and 8x16. Both produce 24-bit results: mult.zip
If we consider code size as a proxy for runtime, then the 16x8 and 8x16 versions should be about 2X faster than the 16x16 version:
    unsigned 16x16 - 171 bytes
    unsigned 16x8  -  84 bytes
    unsigned 8x16  -  82 bytes
    signed 16x16   - 209 bytes
    signed 16x8    - 116 bytes
    signed 8x16    - 114 bytes
The code sizes for the signed versions include the unsigned code.
EDIT: I should add that I did not test these extensively, so I may have introduced bugs, but they worked great on three test cases. Hmmm, now what would be the fastest way to exhaustively test all 16 million inputs?
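One way to see where a factor of about 2 could come from: split the 16-bit operand into bytes. A 16x8 product then needs two 8x8 partial products, while 16x16 needs four. A C sketch of that decomposition, where `mul8x8` is a plain multiply standing in for the table-driven 8x8 step; all names are illustrative and this is not the code in mult.zip.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for the table-driven 8x8 -> 16-bit step. */
static uint16_t mul8x8(uint8_t a, uint8_t b)
{
    return (uint16_t)(a * b);
}

/* Unsigned 16x8 -> 24-bit: with x = xh*256 + xl,
   x*y = xl*y + ((xh*y) << 8), i.e. two partial products. */
static uint32_t mul16x8_unsigned(uint16_t x, uint8_t y)
{
    uint32_t lo = mul8x8((uint8_t)x, y);          /* bits 0..15 */
    uint32_t hi = mul8x8((uint8_t)(x >> 8), y);   /* bits 8..23 */
    return lo + (hi << 8);                        /* at most 0xFEFF01, fits in 24 bits */
}

/* Unsigned 16x16 -> 32-bit needs four partial products. */
static uint32_t mul16x16_unsigned(uint16_t x, uint16_t y)
{
    uint8_t xl = (uint8_t)x, xh = (uint8_t)(x >> 8);
    uint8_t yl = (uint8_t)y, yh = (uint8_t)(y >> 8);
    return (uint32_t)mul8x8(xl, yl)
         + ((uint32_t)mul8x8(xh, yl) << 8)
         + ((uint32_t)mul8x8(xl, yh) << 8)
         + ((uint32_t)mul8x8(xh, yh) << 16);
}
```

The 24-bit result size also follows from this: the largest 16x8 product, $FFFF * $FF, is $FEFF01.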
Heaven/TQA Posted July 4, 2014
Thanks, that saved me a lot of time, as I am not good at this 6502 stuff.
Xuel Posted July 4, 2014
I used Ian Piumarta's lib6502 library to verify all 16 million inputs to the 16x8 and 8x16 signed multiply routines. It only takes 8 seconds on my machine to go through all 16 million inputs twice. Source on github.
Heaven/TQA Posted July 4, 2014
And the result? Here it works as designed... I just made some tests (2x2, 1x0, 1x-1 etc.) and it seems to work. Now let's see how it works in combination with the other 3D stuff I am playing around with right now. Thanks Xuel!
Xuel Posted July 4, 2014
All inputs passed, assuming that I also got the math right on the C side. verif.c is just a copy of lib6502's run6502.c with a few callbacks added to implement the checking. It would be great if someone could check my work.
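The math on the C side can be sketched without the emulator: a reference that computes the exact signed product and keeps the low 24 bits (the three bytes of the result), plus a loop over all 2^24 input pairs. In the real harness the value under test would come from running the 6502 code under lib6502; here `model_s16s8` is a hypothetical pure-C stand-in (sign-extend both operands to 24 bits, then multiply unsigned), and all names are assumptions, not what verif.c contains.

```c
#include <assert.h>
#include <stdint.h>

/* Reference: exact signed 16x8 product reduced to 24 bits. */
static uint32_t expected_s16s8(uint16_t x, uint8_t y)
{
    int32_t p = (int32_t)(int16_t)x * (int8_t)y;
    return (uint32_t)p & 0xFFFFFFu;
}

/* Hypothetical model under test: sign-extend both operands to 24 bits,
   then an unsigned multiply; the low 24 bits are the signed product. */
static uint32_t model_s16s8(uint16_t x, uint8_t y)
{
    uint32_t x24 = (x & 0x8000u) ? (x | 0xFF0000u) : x;
    uint32_t y24 = (y & 0x80u) ? (y | 0xFFFF00u) : y;
    return (x24 * y24) & 0xFFFFFFu;
}

/* Walk every (x, y) pair and count mismatches against the reference. */
static unsigned long check_all_s16s8(uint32_t (*fn)(uint16_t, uint8_t))
{
    unsigned long failures = 0;
    for (uint32_t x = 0; x <= 0xFFFF; x++)
        for (uint32_t y = 0; y <= 0xFF; y++)
            if (fn((uint16_t)x, (uint8_t)y) != expected_s16s8((uint16_t)x, (uint8_t)y))
                failures++;
    return failures;
}
```

Comparing all three result bytes, not just the low 16 bits, matters here: a wrong sign fix typically shows up only in the top byte.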
Heaven/TQA Posted July 4, 2014
If that's fine, Tebe should include the files in the MADS archive.
Xuel Posted July 4, 2014
Fine with me. I bet JackAsser and Graham from codebase64 would also approve.
Heaven/TQA Posted July 4, 2014
I guess those guys won't mind. Graham should code another Oxyron Atari release.
Heaven/TQA Posted August 7, 2014
Xuel... can you adapt one of the routines to 16bit signed x 8bit unsigned?
Xuel Posted August 7, 2014
I added multiply_s16u8 to GitHub. It passed verification. The sign-fix code is pretty simple once you understand it. To fix the unsigned product X*Y (X and Y being the raw bit patterns), you just subtract 2^N*X if Y is negative and subtract 2^N*Y if X is negative, where N is the bit width of the negative operand. For 16x8 that means subtracting 2^8*X when the 8-bit Y is negative and 2^16*Y when the 16-bit X is negative. So all I had to do to get s16u8 from s16s8 was remove the subtraction triggered by the sign of the 8-bit operand.
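The sign fix above, written out for the 16x8 case as a C sketch. With U the unsigned product of the raw bit patterns, the signed result is U - 2^8*X*[Y negative] - 2^16*Y*[X negative] (mod 2^24); the extra 2^24 term that appears when both are negative drops out of a 24-bit result. The function names are illustrative only, not the routines on GitHub.

```c
#include <assert.h>
#include <stdint.h>

/* Unsigned 16x8 product of the raw bit patterns, 24 bits. */
static uint32_t mul_u16u8(uint16_t x, uint8_t y)
{
    return ((uint32_t)x * y) & 0xFFFFFFu;
}

/* Signed x signed: fix for a negative 8-bit y (subtract 2^8 * x)
   and for a negative 16-bit x (subtract 2^16 * y), all mod 2^24. */
static uint32_t mul_s16s8(uint16_t x, uint8_t y)
{
    uint32_t p = mul_u16u8(x, y);
    if (y & 0x80)   p -= (uint32_t)x << 8;
    if (x & 0x8000) p -= (uint32_t)y << 16;
    return p & 0xFFFFFFu;
}

/* Signed x unsigned: y is never negative, so only the x fix remains. */
static uint32_t mul_s16u8(uint16_t x, uint8_t y)
{
    uint32_t p = mul_u16u8(x, y);
    if (x & 0x8000) p -= (uint32_t)y << 16;
    return p & 0xFFFFFFu;
}
```

The same byte $FF reads as -1 in `mul_s16s8` and as 255 in `mul_s16u8`, so `mul_s16s8(1, 0xFF)` is $FFFFFF while `mul_s16u8(1, 0xFF)` is $0000FF.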
Heaven/TQA Posted August 8, 2014
Cool, thanks, I will try it... At the moment I suspect that the issue I ran into in my fractal demos is that my z-value goes from 0-255 while the x-coordinates are 16-bit but can be negative or positive. So I will check later.
Heaven/TQA Posted February 21, 2015 (edited February 21, 2015)
Xuel... sorry to bring this up again, but what should signed 16-bit $0001 x -1 give? And why does $0001 x $ABCD give $FFFFFFCD rather than $0000ABCD? fmul16x16_test.zip
Xuel Posted February 22, 2015
You're using the 16x8 signed version:

    mwa #$0001 T1
    mwa #$ABCD T2
    sec
    jsr multiply_16x8bit_signed
    mwa PRODUCT result3
    mva PRODUCT+2 result3+2

This means that all 16 bits of T1 are used, but only the low 8 bits of T2 are used. Since the 6502 is little-endian, the low byte is the first byte in memory, so T2 is actually $CD in the eyes of the 16x8 signed multiply routine. The value $CD is a negative number since its highest bit is set; you can think of it as -$33. When multiplied by 1 it should still equal -$33. Since 16x8 multiplication yields a 24-bit result, this comes out to $FFFFCD. In other words, $CD has been sign-extended to a 24-bit result.
The result of 1*-1 should be -1, which is $FFFF, $FFFFFF, or $FFFFFFFF depending on whether you're using a routine that produces a 16-bit, 24-bit or 32-bit result.
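A C sketch of the byte view described above, assuming the 16x8 routine reads only the first byte of T2 and writes a 24-bit result to PRODUCT..PRODUCT+2. The function names are hypothetical models, not the actual routines. For comparison it also models a signed 16x16 that reads both bytes of T2: even that would return $FFFFABCD rather than $0000ABCD, because $ABCD is itself negative when read as a signed 16-bit value.

```c
#include <assert.h>
#include <stdint.h>

/* Store a word the way the 6502 does: low byte first. */
static void store_word_le(uint8_t mem[2], uint16_t v)
{
    mem[0] = (uint8_t)(v & 0xFF);
    mem[1] = (uint8_t)(v >> 8);
}

/* Model of a signed 16x8 multiply that reads only the first byte of T2. */
static uint32_t s16x8_reads_t2(uint16_t t1, const uint8_t t2[2])
{
    int32_t p = (int32_t)(int16_t)t1 * (int8_t)t2[0];
    return (uint32_t)p & 0xFFFFFFu;                 /* 24-bit result */
}

/* Model of a signed 16x16 multiply that reads both bytes of T2. */
static uint32_t s16x16_reads_t2(uint16_t t1, const uint8_t t2[2])
{
    int16_t y = (int16_t)(uint16_t)(t2[0] | (t2[1] << 8));
    return (uint32_t)((int32_t)(int16_t)t1 * y);    /* 32-bit result */
}
```

With T2 = $ABCD the 16x8 model returns $FFFFCD and the 16x16 model returns $FFFFABCD; only an unsigned multiply would give $0000ABCD.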
Heaven/TQA Posted February 22, 2015
Aaarg... I will check the code in my main app then, as I was using this routine and wondered why the results looked a little bit strange.
Heaven/TQA Posted February 22, 2015
I expected it to be a signed 16bit x 16bit multiply.