+batari Posted July 21, 2005 Share Posted July 21, 2005 Hello, A few days ago I worked up a slick routine for 8-bit unsigned multiplication (or I think it's slick, anyway.) It works as long as the values do not overflow 8 bits, in which case the value should not be relied upon anyway. The loop will never repeat more than 8 times. The beq done and bcs done are optional, one or both might save cycles but I haven't tested to see if it actually decreases or increases cycles on average. ; x and a contain multiplicands, result in a stx temp1 sta temp2 lda #0 rept lsr temp2 bcc skip clc adc temp1 ;bcs done might save cycles? skip ;beq done might save cycles? asl temp1 bne rept done I tried to work up a good division routine too, but all I could come up with is successive subtraction, which is sub-optimal for small denominator values. Anyone know of a better way? ; x=numerator a=denominator, result in x lsr beq end;div by 0 = bad, div by 1=no calc needed, so bail out rol;restore a sta temp1 txa ldx #$ff sec loop sbc temp1 inx bcs loop end; result in x Quote Link to comment Share on other sites More sharing options...
Thomas Jentzsch Posted July 21, 2005 Share Posted July 21, 2005 (edited) Basically, that's how multiplication without tables works. The code could be optimized, but the table based approach is much faster. Edited July 21, 2005 by Thomas Jentzsch Quote Link to comment Share on other sites More sharing options...
+batari Posted July 21, 2005 Share Posted July 21, 2005 Basically, that's how multiplication without tables works. The code could be optimized, but the table based approach is much faster. 895999[/snapback] How big is the table? Quote Link to comment Share on other sites More sharing options...
supercat Posted July 21, 2005 Share Posted July 21, 2005 (edited) A few days ago I worked up a slick routine for 8-bit unsigned multiplication (or I think it's slick, anyway.) It works as long as the values do not overflow 8 bits, in which case the value should not be relied upon anyway. The loop will never repeat more than 8 times. Computing 8x8->16 multiply is more useful than 8x8->8, and isn't really any harder. The 6502's lack of add-without-carry is somewhat irksome, but there are a variety of workarounds that could be used. ; Compute mul1*mul2+acc -> acc:mul1 [mul2 is unchanged] ldx #8 dec mul2 lp: lsr ror mul1 bcc nope adc mul2 nope: dex bne lp inc mul2 As for division, the normal approach is to do a shift-and-subtract. A 16/8->8+8 result is pretty easy, with the caveat that the results will be meaningless if the quotient doesn't fit in eight bits. I'll try to work one up for you. Edited July 21, 2005 by supercat Quote Link to comment Share on other sites More sharing options...
+batari Posted July 21, 2005 Share Posted July 21, 2005 A few days ago I worked up a slick routine for 8-bit unsigned multiplication (or I think it's slick, anyway.) It works as long as the values do not overflow 8 bits, in which case the value should not be relied upon anyway. The loop will never repeat more than 8 times. Computing 8x8->16 multiply is more useful than 8x8->8, and isn't really any harder. I'll have to see if I can work one up that deals with the 6502's "fun" add with carry behavior. As for division, the normal approach is to do a shift-and-subtract. A 16/8->8+8 result is pretty easy, with the caveat that the results will be meaningless if the quotient doesn't fit in eight bits. Again, I'll try to work one up for you. 896028[/snapback] You can whip one up for 16-bit multiplication but I don't need it right now - I just need 8x8->8 for batari Basic, actually, where you can't use 16-bit numbers anyway, and probably won't ever be able to - as the utility of these is somewhat limited and the routines to implement them would use up precious ROM space. It's wise to keep bB from getting too top-heavy, since it's supposed to be a stepping stone to asm, and is only intended for making games, not doing complex math... But don't let this stop you, as someone may find it useful, sometime, somewhere... As for division, all I want is 8/8->8... If you want to do 16/8->8+8, go ahead, but again, I probably don't need it, for the same reasons. Quote Link to comment Share on other sites More sharing options...
supercat Posted July 21, 2005 Share Posted July 21, 2005 You can whip one up for 16-bit multiplication but I don't need it right now - I just need 8x8->8 for batari Basic, actually, where you can't use 16-bit numbers anyway, and probably won't ever be able to - as the utility of these is somewhat limited and the routines to implement them would use up precious ROM space. It's wise to keep bB from getting too top-heavy, since it's supposed to be a stepping stone to asm, and is only intended for making games, not doing complex math... Well, I just snuck 8x8->16 in up above. I would think that for your fixed-point maths you'd want an 8x8->16 multiply. As noted, it's pretty easy. As for division, all I want is 8/8->8... If you want to do 16/8->8+8, go ahead, but again, I probably don't need it, for the same reasons. Here's a simple divide routine off the top of my head, but it probably isn't quite right. I'll have to simulate and see if I got it correct. It should divide A:quot by denom, leaving the quotient in quot and the remainder in A. ldx #8 lp: cmp denom bcc toosmall sbc denom ; Note: Carry is, and will remain, set. rol quot rol dex bne lp beq done toosmall: rol quol rol dex bne lp done: Quote Link to comment Share on other sites More sharing options...
+batari Posted July 21, 2005 Share Posted July 21, 2005 (edited) Well, I just snuck 8x8->16 in up above. I would think that for your fixed-point maths you'd want an 8x8->16 multiply. As noted, it's pretty easy. Wow, I had no idea it was so simple. I'll go ahead and use it, and just return the lower byte in the result, but leave a note in the docs on how to get the upper byte for integer multiplication if someone really needs it in bB! EDIT: Just noticed, before the ADC, the carry is always set... do you need to clear it first? Edited July 21, 2005 by batari Quote Link to comment Share on other sites More sharing options...
retrocon Posted July 21, 2005 Share Posted July 21, 2005 (edited) As for division, the normal approach is to do a shift-and-subtract. A 16/8->8+8 result is pretty easy, with the caveat that the results will be meaningless if the quotient doesn't fit in eight bits. I'll try to work one up for you. 896028[/snapback] Supercat, For a division shortcut, does this make any sense to you? This was a response I just got on a mailing list when asking how to optimize out a divide by 5 on a 10 bit unsigned number. I'm not sure I understand it fully, but I'd like to understand the concept. Do you know what would be the cheapest way to calculate a divide by 5? Table not an option of course. "This is often easier to understand when expressed in binary. 1/5 equals 0.00110011001100110011 etc. 0.00110011 etc equals 11 multiplied by 0.000100010001 etc. 0.000100010001 etc equals 10001 multiplied by 0.0000000100000001 etc. This suggests the following result (in C syntax): Y1 = (X + X<<1) Y2 = (Y1 + Y1<<4) Y3 = (Y2 + Y2<< and we have a result accurate to 16 bits with only three adders. Y4 = (Y3 + Y3<<16) 32 bits with only four adders. " [EDIT] Somehow I think this is related to this algorithm here, but I'm not seeing it quite yet: http://www.sxlist.com/techref/method/math/divconst.htm Edited July 21, 2005 by retrocon Quote Link to comment Share on other sites More sharing options...
supercat Posted July 22, 2005 Share Posted July 22, 2005 EDIT: Just noticed, before the ADC, the carry is always set... do you need to clear it first? 896042[/snapback] That's what the "dec mul2" and "inc mul2" instructions are for. Other ways of dealing with the issue would be to subtract mul1 from the accumulator before the multiplication (which would work nicely for the low-half, but correctly dealing with the high half would be more work), inverting mul1 beforehand and then reversing the sense of the branch (adds two bytes/two cycles at the spot where mul1 gets loaded, but makes the routine more 'opaque'), or adding a "clc" before the "adc" [which costs one byte instead of four, but sixteen cycles instead of ten]. Quote Link to comment Share on other sites More sharing options...
supercat Posted July 22, 2005 Share Posted July 22, 2005 For a division shortcut, does this make any sense to you? This was a response I just got on a mailing list when asking how to optimize out a divide by 5 on a 10 bit unsigned number. I'm not sure I understand it fully, but I'd like to understand the concept. Do you know what would be the cheapest way to calculate a divide by 5? Basically, you're replacing division by a constant with multiplication by a constant. This is an optimization approach which goes back at least to the 1960's, and probably even earlier. It's often an effective approach, but one needs to be careful to ensure that the results will aways be rounded in the expected direction (if one cares). By the way, there are some interesting table-based approaches to multiplication by non-constants which can be somewhat practical if one is doing a lot of multiplication, but require some care to ensure correct operation. One nice method, for example, takes advantage of the fact that A*A-B*B = (A+B)*(A-B). FastMult:; Multiplies n1 by n2. Requires that n1+n2 be 255 or less, and n1>=n2. lda n1 clc adc n2 tax lda n1 sec sbc n2 tay lda squareL,x sbc squareL,y sta resultL lda squareH,x sbc squareH,y sta resultH Fourty-two cycles for an 8x8->16 multiply, subject to the restrictions noted above. Considerably faster than the shift-add method, but at the expense of 512 bytes worth of tables. The tables should be computed such that ([squareH+n]:[squareL+n]) equals int(n*n/4). Quote Link to comment Share on other sites More sharing options...
djmips Posted July 22, 2005 Author Share Posted July 22, 2005 Basically, you're replacing division by a constant with multiplication by a constant. This is an optimization approach which goes back at least to the 1960's, and probably even earlier. It's often an effective approach, but one needs to be careful to ensure that the results will aways be rounded in the expected direction (if one cares). By the way, there are some interesting table-based approaches to multiplication by non-constants which can be somewhat practical if one is doing a lot of multiplication, but require some care to ensure correct operation. One nice method, for example, takes advantage of the fact that A*A-B*B = (A+B)*(A-B). goes back to at least the 1960's Both the multiply by 1/x and using the tables of squares for multiplication can be traced back to the Babylonians (2000-600BC) The formula is: ab = [(a+b)^2 - (a-b)^2]/4 Babylonian Mathematics There are also the useful: sqrt(a^2 + b) approx = a + b/2a There are several variations on this code and the following mores specialized version is attributed to Stephen Judd at C=Hacking (noted in old Stella post) and is 24-26 cycles at the expense of some range. Multiplying several numbers by a particular number is quite fast as you just need to reload Y each time. Fast Signed Multiply ; Multiply Y * A and result in A STA ZP1 ;ZP1 -- zero page pointer to table of g(x) EOR #$FF CLC ADC #$01 STA ZP2 ;ZP2 also points to g(x) LDA (ZP1),Y;g(Y+A) SEC SBC (ZP2),Y;g(Y-A) Quote Link to comment Share on other sites More sharing options...
supercat Posted July 22, 2005 Share Posted July 22, 2005 Both the multiply by 1/x and using the tables of squares for multiplication can be traced back to the Babylonians (2000-600BC) Yeah, well that's even further before my time than the 1960's. BTW, I developed an interesting method for doing long division which would be really cool if someone hadn't patented the same thing shortly before (and unbeknownst to me). It would be especially useful for performing really huge divisions on machines with a fast multiply but slow divide. There are several variations on this code and the following mores specialized version is attributed to Stephen Judd at C=Hacking (noted in old Stella post) and is 24-26 cycles at the expense of some range. Multiplying several numbers by a particular number is quite fast as you just need to reload Y each time. 896209[/snapback] Cute bit of code. Looks like it should be extendable to produce 16-bit results also. Quote Link to comment Share on other sites More sharing options...
Atari-Jess Posted July 22, 2005 Share Posted July 22, 2005 Not a killer hack but I'd like to share a link to an interview with William Mensch, 6502 designer. Real format video of interview 891481[/snapback] That was an awesome interview, but man, I greatly disliked the interviewer. He bumbled with his words and he sounded like a smart ass and only genuinely interested when it seemed like something he knew already. I apologize if its someone here or another "legend" in microprocessing, but man, I can't stand the interviewer. But Mensch was awesome. Quote Link to comment Share on other sites More sharing options...
mos6507 Posted August 31, 2005 Share Posted August 31, 2005 That interview must be from about 10 years ago. He's talking about Super Nintendo and the Pentium Pro like they are current products. Quote Link to comment Share on other sites More sharing options...
djmips Posted November 9, 2005 Author Share Posted November 9, 2005 Some code courtesy of Lee Davison 6502 code 'shorts' Range test. By Lee Davison For all of these we assume that the byte to be tested is in A and that the start and end values, n and m, are already defined. Also that 0 < n < m < $FF. If you don't need to preserve the byte in A then testing the byte can be done in five bytes and only six cycles. This sets the carry if A is in the range n to m. CLC ; clear carry for add ADC #$FF-m; make m = $FF ADC #m-n+1; carry set if in range n to m Quote Link to comment Share on other sites More sharing options...
supercat Posted November 9, 2005 Share Posted November 9, 2005 Some code courtesy of Lee Davison 6502 code 'shorts' 962533[/snapback] I like his hex-digit conversion routine. Saves a byte over the one I normally use if the state of carry on entry is unknown. I also like his 'toggle carry' method, though I still find myself wishing the 6502 allowed a direct method to do that. BTW, one thing that's often useful is the "cookie cutter" operation. For example, to store bits 3-5 of the accumulator into a memory location while leaving the other bits untouched: eor dest and #$38; bits 3-5 eor dest sta dest If you want to read bits 3-5 of a memory location into the corresponding bits of the accumulator, you may similarly do: eor src and #$C7 ; not bits 3-5 eor src but there's a very important proviso here: src MUST NOT change between the two 'eor' instructions. Any bit of src which changes between those instructions will cause the corresponding bit in the accumulator to be toggled if not in the range of bit 3..5. Quote Link to comment Share on other sites More sharing options...
vdub_bobby Posted November 10, 2005 Share Posted November 10, 2005 Here's some interesting code from batari: This: ldx #56 loop ;do some stuff here txa sec sbc #14 tax bne loop Can be rewritten like this: ldx #56 loop ;do some stuff here txa sbx #14 bne loop And his comment: the whole routine above was in a loop that was executed up to 10 times, thus saving 2 bytes and up to 160 cycles. Though I think that "160 cycles" is wrong; I think if the loop is executed 10 times you would save 40 cycles altogether. Quote Link to comment Share on other sites More sharing options...
+batari Posted November 10, 2005 Share Posted November 10, 2005 Here's some interesting code from batari: This: ldx #56 loop ;do some stuff here txa sbx #14 bne loop And his comment: the whole routine above was in a loop that was executed up to 10 times, thus saving 2 bytes and up to 160 cycles. Though I think that "160 cycles" is wrong; I think if the loop is executed 10 times you would save 40 cycles altogether. 963347[/snapback] The comment was misleading. The loop shown is executed 4 times, but an outer, unseen loop was executed 10 times for a total of 40. So 160 cycles would be correct. This code was in my game Zirconium until I found that the FB2 doesn't do SBX (nor does PCAE.) So I recoded it Quote Link to comment Share on other sites More sharing options...
Thomas Jentzsch Posted November 10, 2005 Share Posted November 10, 2005 This code was in my game Zirconium until I found that the FB2 doesn't do SBX (nor does PCAE.) So I recoded it IMO it's a bad idea to tailor code to partially incompatible hardware or quite outdated emulators. The target system should be the Atari 2600 and nothing else. 1 Quote Link to comment Share on other sites More sharing options...
vdub_bobby Posted November 10, 2005 Share Posted November 10, 2005 This code was in my game Zirconium until I found that the FB2 doesn't do SBX (nor does PCAE.) So I recoded it IMO it's a bad idea to tailor code to partially incompatible hardware or quite outdated emulators. The target system should be the Atari 2600 and nothing else. 963398[/snapback] Agreed. Quote Link to comment Share on other sites More sharing options...
uosipa llamxew Posted November 10, 2005 Share Posted November 10, 2005 Not a killer hack but I'd like to share a link to an interview with William Mensch, 6502 designer. Real format video of interview 891481[/snapback] That was an awesome interview, but man, I greatly disliked the interviewer. He bumbled with his words and he sounded like a smart ass and only genuinely interested when it seemed like something he knew already. I apologize if its someone here or another "legend" in microprocessing, but man, I can't stand the interviewer. But Mensch was awesome. 896314[/snapback] Yikes! I couldn't agree more! The guy sounds like he drank a couple of bottles of Robitussin before the interview! Still, when he isn't talking, the video is quite interesting. Quote Link to comment Share on other sites More sharing options...
+batari Posted November 10, 2005 Share Posted November 10, 2005 (edited) This code was in my game Zirconium until I found that the FB2 doesn't do SBX (nor does PCAE.) So I recoded it IMO it's a bad idea to tailor code to partially incompatible hardware or quite outdated emulators. The target system should be the Atari 2600 and nothing else. 963398[/snapback] I tend to agree too, but there are still a staggering number of PCAE users, many of which are almost dogmatic about it. Maybe a bad decision, but I didn't want people not judging my game in the minigame contest because it wouldn't run on what they still consider the "best" emulator despite the fact that it's very dated (and the author has annouced that he will no longer update it.) The FB2 was a lesser consideration... but I have to say I really like the quality of the composite video, and yes I do play my own game occasionally. Edited November 10, 2005 by batari Quote Link to comment Share on other sites More sharing options...
vdub_bobby Posted November 10, 2005 Share Posted November 10, 2005 (edited) This code was in my game Zirconium until I found that the FB2 doesn't do SBX (nor does PCAE.) So I recoded it IMO it's a bad idea to tailor code to partially incompatible hardware or quite outdated emulators. The target system should be the Atari 2600 and nothing else. 963398[/snapback] I tend to agree too, but there are still a staggering number of PCAE users, many of which are almost dogmatic about it. Maybe a bad decision, but I didn't want people not judging my game in the minigame contest because it wouldn't run on what they still consider the "best" emulator despite the fact that it's very dated (and the author has annouced that he will no longer update it.) The FB2 was a lesser consideration... but I have to say I really like the quality of the composite video, and yes I do play my own game occasionally. 963421[/snapback] Hmmm...I wonder if Go Fish! 1K will run in PCAE? I never even bothered to check... EDIT: Sure as heck doesn't! Wonder why... Edited November 10, 2005 by vdub_bobby Quote Link to comment Share on other sites More sharing options...
+batari Posted November 11, 2005 Share Posted November 11, 2005 Hmmm...I wonder if Go Fish! 1K will run in PCAE? I never even bothered to check... EDIT: Sure as heck doesn't! Wonder why... 963428[/snapback] Are you using v2.6? It seems to load up just fine for me. The score is messed up though... Quote Link to comment Share on other sites More sharing options...
vdub_bobby Posted November 11, 2005 Share Posted November 11, 2005 Hmmm...I wonder if Go Fish! 1K will run in PCAE? I never even bothered to check... EDIT: Sure as heck doesn't! Wonder why... 963428[/snapback] Are you using v2.6? It seems to load up just fine for me. The score is messed up though... 963447[/snapback] Actually...I don't know what is going on. For some reason, the file menu is showing 3 different gofish1k.bin files, each of which behaves differently. There is only one file by that name in the directory... And for kicks, none of the three behaves like it does in z26. Really weird. Quote Link to comment Share on other sites More sharing options...
Recommended Posts
Join the conversation
You can post now and register later. If you have an account, sign in now to post with your account.
Note: Your post will require moderator approval before it will be visible.