Rybags #1 Posted November 7, 2005 (edited)

I was browsing an 8-bit magazine disk last night and it had an article on Russian multiplication, and how it can be used for very fast binary calculations. It works like this:

C = A * B

Take the operands A and B, and place them into a row. For each new row, double A and divide B by 2 (discarding the remainder). Keep doing so until a value of 1 is reached in the B column. Then, derive the result (C) by adding up the values in the A column, but only on rows where the B column value is odd.

Examples (values of B marked with # are ignored in the final calculation):

 23   14 #     = 322
 46    7
 92    3
184    1
===
322 = 46+92+184 (sum of the A column, only where the B column value is odd)

 17   53       = 901
 34   26 #
 68   13
136    6 #
272    3
544    1
====
901 = 17+68+272+544

Thanks to the 6502's shift/rotate instructions, this algorithm can easily be incorporated into programs to multiply two 8-bit numbers for a 16-bit result. Very handy for games or demos needing quick calculations for graphics.

I did a quick Google search and dug up some links:

http://mathforum.org/dr.math/faq/faq.peasant.html

Egyptian division: http://www.mathforum.org/library/drmath/view/57574.html

Division would be great too. The method used is a similar concept, and could also be fairly easily programmed.

Edited November 7, 2005 by Rybags
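For readers who want to play with it, here is a minimal Python sketch of the table above; the doubling and halving map directly onto 6502 shift instructions, but this is just an illustration of the arithmetic, not the 6502 routine itself:

```python
def russian_multiply(a, b):
    """Russian peasant multiplication: double a, halve b,
    and accumulate a whenever the b column is odd."""
    result = 0
    while b > 0:
        if b & 1:          # B column value is odd -> add A
            result += a
        a <<= 1            # double A
        b >>= 1            # halve B, discarding the remainder
    return result
```

On the 6502 the `b >>= 1` is an LSR (the dropped bit lands in carry, giving the odd test for free) and the `a <<= 1` is an ASL/ROL pair for the 16-bit running value.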
fox #2 Posted November 7, 2005

It's not impressive at all. This is how multiplication is normally done on a 6502 (the easy and slow way). The cool way is:

a * b = f(a + b) - f(a - b)

where:

f(x) = x * x / 4

and f is of course stored in a precomputed lookup table. This can be computed very effectively on a 6502, even with signed numbers! (unlike the Russian crap)

I described it in detail and with examples years ago in the Polish disk magazine "Syzygy" (issue #5, I think). If someone is interested, I may translate it here. A real-world example is in my portal engine in Numen, where I use both signed and unsigned variations, and a 16-bit * 16-bit multiply with a 32-bit result too. Using the Russian way it would be several times slower.

As for division, I don't know a really fast algorithm. To my knowledge, even on modern CPUs division is much slower than multiplication. Avoid it if you can. If you can't, calculate 1/b once and then multiply: a * (1/b). I use such a technique in this portal engine too (see Project_calcInverseD for how to quickly compute $2800 / x).
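For illustration, a small Python model of the quarter-square identity fox describes (the real 6502 version uses self-modifying code and split lo/hi tables; this sketch only shows the arithmetic, with a table large enough for two unsigned bytes):

```python
# Quarter-square multiplication: a*b = f(a+b) - f(a-b), f(x) = x*x // 4.
# The truncation in // 4 cancels, because a+b and a-b always have the
# same parity, so both lookups drop the same 0.25 (or neither does).
table = [x * x // 4 for x in range(511)]   # f(x) for x = 0..510

def qs_multiply(a, b):
    """Multiply two unsigned bytes via two table lookups and a subtraction."""
    if a < b:
        a, b = b, a                        # keep a - b >= 0 for indexing
    return table[a + b] - table[a - b]
```

Two reads and one subtract replace the whole shift-and-add loop, which is where the speed comes from.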
supercat #3 Posted November 7, 2005

It's not impressive at all. This is how normally the multiplication is done on a 6502 (easy and slow way).

Another method is to use Booth's algorithm, which looks at 3 bits at a time to process two bits at a time. It's a little hard to describe, but it's what most modern CPUs use.

As for the division I don't know a really fast algorithm. To my knowledge, even in modern CPUs division is much slower than multiplication. Avoid it if you can. If you can't, calculate 1/b once and then multiply a * (1/b)

If one is dividing really big numbers (e.g. doing RSA cryptography), the best approach is to compute an approximate reciprocal of the divisor and multiply both divisor and dividend by it, so that the divisor is of the form 1.000...00xxxxxxxxxx or 0.1111...1111xxxx [for some nice generous number of leading zeroes or ones]. This can be done fairly easily even if the number is thousands of bits long (if you're trying for 33 leading ones or zeroes, you only need to look at the first 34 bits of the divisor, IIRC). Once the division problem has been normalized to that form (divisor of 0.1111...11x with 33 leading ones), it's easy to get result bits 32 bits at a time. The first 32 bits of the result will either be the first 32 bits of the dividend, or else that quantity plus 1. So assume it's that quantity, multiply and subtract out, and proceed with the next partial remainder.

Note that this method of division is subject to a patent that was held by Cyrix when I checked; I came up with this method myself, independently, and discovered that patent when I did a patent search (too bad Cyrix beat me to it).
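A hedged Python sketch of radix-4 (modified) Booth recoding, which is one common reading of "looks at 3 bits at a time to process two bits at a time"; the digit table is the standard one, but the framing here is my own illustration:

```python
def booth_multiply(a, b, bits=32):
    """Multiply signed integers with radix-4 (modified) Booth recoding.

    Overlapping 3-bit windows of the multiplier b (bits i+1, i, i-1)
    recode each bit pair into a digit in {-2, -1, 0, +1, +2}; each digit
    contributes digit * (a << i), so only bits/2 additions are needed.
    """
    m = b & ((1 << bits) - 1)          # two's-complement image of b
    recode = {0: 0, 1: 1, 2: 1, 3: 2, 4: -2, 5: -1, 6: -1, 7: 0}
    result = 0
    prev = 0                           # implicit 0 bit below bit 0
    for i in range(0, bits, 2):
        window = (((m >> i) & 3) << 1) | prev   # bits i+1, i, i-1
        result += recode[window] * (a << i)
        prev = (m >> (i + 1)) & 1
    return result
```

In hardware the digit multiples are just shifted or negated copies of a, which is why this recoding is attractive: half the partial products of a plain shift-and-add multiplier.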
SeaGtGruff #4 Posted November 8, 2005

[quoting Rybags's post on Russian multiplication above]

Thanks, Rybags, this is very cool, and it rings a bell from my computer science classes back in college.

Michael Rideout
Rybags #5 Posted November 8, 2005

I described it in detail and with examples years ago in Polish disk magazine "Syzygy" (issue #5 I think). If someone is interested, I may translate it here. Real-world example is in my portal engine in Numen, where I use both signed and unsigned variations and 16-bit * 16-bit with 32-bit result too. Using the Russian way it would be several times slower.

A translation and example would be appreciated. Any hints on division would be great too.
Heaven/TQA #6 Posted November 8, 2005

A good source for fox's fastmul is here: http://www.ffd2.com/fridge/chacking/

Fast Signed Multiply
--------------------

Ah, now here is something we can fix. Consider the following function:

  f(x) = x*x/4

Now notice that

  f(a+b) - f(a-b) = a*b

Wowsers! All we need to do is have a table of squares, and we can do a multiplication in no time! Whoa there, wait a minute, all of our calculations are using signed numbers. Won't that make a difference? Well, the above calculation is completely general -- I never said what the sign of a and b are. The fact that we are using two's complement notation makes this even simpler!

Recall that our multiplication is

  x -> x * [d/(z0-z/64)]

where x is a signed floating point number multiplied by 64 (that is, instead of going from -1 to 1, x would go from -64 to 64). Previously we made d large, so that the table of d/(z-z0) would be more accurate. Then we multiplied the numbers together and divided by 64, a procedure which took between 150 and 180 cycles, leaving us with a signed, 8-bit result.

Well, that's easy to duplicate. From our first equation above, we see that

  (a*b)/64 = [ f(a+b) - f(a-b) ]/64 = g(a+b) - g(a-b)

where g(x) = x*x/256. In other words, if we modify our table slightly, we get exactly the result we want. So here is the code to multiply two numbers together:

  * A*Y -> A   Signed, 8-bit result
      STA ZP1     ;ZP1 -- zero page pointer to table of g(x)
      EOR #$FF
      CLC
      ADC #$01
      STA ZP2     ;ZP2 also points to g(x)
      LDA (ZP1),Y ;g(Y+A)
      SEC
      SBC (ZP2),Y ;g(Y-A)

And that's it -- we're done. The above takes 24-26 cycles to execute -- not bad at all! Yes, with another table we could make it even faster, but this is good enough for us. At the moment we don't do very many multiplications, but in the future, when we write a generalized routine to rotate and project an arbitrary object, this will give us a humongous time savings.
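As a quick sanity check of the g(x) = x*x/256 variant, here is a Python model (my own illustration, not from the article); the truncated table entries make the result approximate, off by at most one, which the renderer tolerates:

```python
# Model of the C=Hacking routine: g(x) = x*x // 256 so that
# g(a+b) - g(a-b) approximates (a*b)/64 -- exactly the scaled
# product the renderer needs after multiplying coordinates by 64.
def g(x):
    return (x * x) // 256

def mul_div64(a, b):
    """Signed (a*b)/64 via two lookups, as in the 24-26 cycle routine."""
    return g(a + b) - g(a - b)
```

Because the two floors do not always drop the same fraction (unlike the exact f(x) = x*x/4 case), the answer can differ from (a*b)/64 by one, never more.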
Astute readers may be thinking ahead here: in the program, for each projection we have two multiplications, x=x*c and y=y*c, where c is the same in both cases. So if we store c in ZP1 and ZP2, we can make the multiplication even more efficient, right? The answer is yes, but only by being extremely careful, for reasons that will be detailed in exactly two paragraphs.

BUT WAIT! We have to think about a few things here. What happens when we multiply, say, -1 * -1? In two's complement notation -1 is equal to 255. So our above algorithm adds 255 to 255 to get an index into the table and gets... oops! Our table needs to be larger than 256 bytes! In fact this is very easy to fix, because of the way two's complement works. All we need to do is put an exact copy of the 256-byte table on top of itself, and the table will work fine. (If you look at the initialization program you will notice that the statement is: q%=s*s:poke bm+j,q%:poke bm+j+256,q%).

BUT WAIT!!! What kinds of numbers are we multiplying together here? Our vertices start out at the points (+/-1,+/-1,+/-1). Our rotations correspond to moving these points around on a sphere, so it is easy to see that the largest rotated value we can have is sqr(3), the radius of this sphere. Since we are multiplying these numbers by 64, the largest value we can have for these numbers is 64*sqr(3) = 111. Okay, no big whoop, what are the values for d/(z0-z/64) anyways? Well, for z0=5 and d=150, say, we get values like 28... ACK! When we go to multiply we are going to add 111 to 28 and get 137, but in two's complement notation this is equal to -119.

Example:
  a=28  b=111
  f(b+a) = f(137) = f(-119) in two's-complement notation
  f(b-a) = f(83)

In our table lookup we really want 137*137 but we are going to get -119*-119! One option is to never choose d very large so that we don't have this problem, but the solution turns out to be much simpler, again due to the way two's complementing works.
We can see that we can get numbers larger than 127 when we add the two multiplicands together. What is the _smallest_ number we will come up with? Certainly the smallest x is going to get is -111. Ahhh... d/(z0-z/64) is always positive, so when we add them together we will get something larger than -111, which in two's complement notation is 145. This means that we can treat the table entries between 127 and at least 145 as positive numbers instead of two's complement negative numbers, and everything will work out great.

Incidentally, fudging this table provides an easy way to add pretty cool special effects. The initialization program sets up the math table using the following line:

  [for j=0 to 255]
  290 S=J:IF S>150 THEN S=256-S
  [poke bm+j,S*S]

Try changing the inequality from S>150 to S>120 (or S>127, where it would be for a two's complement table), and see what happens!

And this is why we can't store d/(z0-z) in the pointers ZP1 and ZP2 -- if we did, then for a given multiplication we could get numbers larger than 127 and smaller than -128, and our clever table would no longer work right. We can still get around this -- all we need is two clever tables, one for the positive d/(z0-z) and one for negative d/(z0-z). For the first table, we have the situation outlined above: no numbers smaller than -90 or so, and numbers possibly larger than 127. For the second table we have the reverse situation: no numbers larger than 90 or so, but possibly numbers less than -128. Since we are using two pointers anyways (ZP1 and ZP2), this is not difficult to implement. The end result is that you can do the entire projection in around 36 cycles if you so desire.

36 cycles? Note that for the second table the code does something like EOR #$FF; CLC; ADC #$01. Well, if we set up the second table as f(x)=(x+1)^2/4 then we have included the CLC and ADC #$01 into the table, so the instructions can be removed. The entire projection routine is then:

...
  (rotate z)
      STA ZP1
      EOR #$FF
      STA ZP2
  ... (rotate x)
      TAY
      LDA (ZP1),Y
      SEC
      SBC (ZP2),Y ;Now A contains projected X
  ... (rotate y)
      TAY
      LDA (ZP1),Y
      SEC
      SBC (ZP2),Y

Looks like 36-40 cycles to me! The program doesn't implement this -- it only uses a single table, and repeats the STA ZP1 stuff at each step. A few cycles wasted won't kill us (there are plenty of cycles wasted in the code), and it is probably tricky enough to follow as it is.

You might be asking, what is the true minimum value for a given z0 and d? Well, I tried writing down a set of equations and minimizing according to some constraints, and I ended up with a sixth-order polynomial which I had to write a little newton-iteration solver for. In other words, I don't think you can write down a simple function of d and z0 to give the table boundaries. I found 150 to be a perfectly reasonable number to use. Incidentally, this is why the projection was not changed to z=z+c -- I didn't want to go fiddling around with it again. Maybe next time :).

ONE MORE THING!!! This is very important. The math table MUST be on an even boundary for the above algorithm to work correctly. This one is easy to get bit by.

Logarithmic Multiplication
--------------------------

As long as we're talking about fast multiplication here, it is worthwhile to mention another method for multiplying two numbers together. To understand it you need to understand two important properties of logarithms:

  log(x*y) = log(x) + log(y)
  log_b(x) = y  <=>  b^y = x

These are a reflection of the fact that logarithms are inverses of exponentiation. So you can see that another way to multiply two numbers together is to take their logs, add, and then exponentiate the result. So you could have a table of log_2 (base 2 logarithms) and another table of 2^x, and do a multiplication very quickly. (Actually, you'd want a table of 32*log_2(x), since log_2(256)=8). Why wasn't this method used?
First, dealing with signed numbers is much trickier -- the logarithm of a negative number is a complex (i.e. real+imaginary) number, complete with branch cuts. You can get around this by setting up the tables in a special way (for instance by letting log(-x)=-log(x)) and putting in some special handling, but it isn't as efficient as the algorithm used in the program.

Second, accuracy decreases significantly as x and y get large, so that for an eight-bit table of logarithms you will often get an answer that is off by one or more. You can in fact get around this problem by using some sneaky manipulation -- if you are interested in seeing this, contact us! But it is worthwhile to keep this method in mind if you need a really fast multiplication and you aren't too worried about accuracy.

Christopher Jam ([email protected]) has come up with an interesting variation on this method. It calculates 64+64*x/z and uses a slightly different structure for the signed numbers, and runs almost as fast as the method used by the program -- contact him for more information if you're interested.

It's in issue #9. Maybe Fox can add his description to it... http://www.ffd2.com/fridge/chacking/c=hacking9.txt
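A small Python model of the logarithmic method with the 32*log_2(x) scaling mentioned above (the table sizes are my choice for illustration): exact for powers of two, increasingly off for large factors, just as the article warns.

```python
import math

# Logarithmic multiplication with 8-bit-friendly tables: one table of
# round(32*log2(x)) for x = 1..255, and one antilog table of 2^(i/32).
# Fast (one add, two lookups), but only approximate for most inputs.
LOG2 = [0] + [round(32 * math.log2(x)) for x in range(1, 256)]
EXP2 = [round(2 ** (i / 32)) for i in range(513)]

def log_multiply(a, b):
    """Approximate a*b via log/antilog lookups; exact only sometimes."""
    if a == 0 or b == 0:
        return 0                   # log tables cannot represent zero
    return EXP2[LOG2[a] + LOG2[b]]
```

The zero case needs special handling because log(0) does not exist, which is one more reason the article prefers the quarter-square tables for exact work.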
supercat #7 Posted November 8, 2005 (edited)

Incidentally, my 'lunar lander' demo uses a somewhat different trick. The goal is to store coordinates in a polar-ish notation and to be able to convert them to rectangular very quickly. Rather than storing (r,theta), I instead store (theta1,theta2) such that x=cos(theta1)+cos(theta2) and y=sin(theta1)+sin(theta2). This can be accomplished for all 'r' values within range (i.e. less than 2), and coordinates stored this way do not require any multiplication for conversion to rectangular coordinates (the theta1 and theta2 values are all precomputed, and the sin/cos values are obtained via table lookup).

To allow for rotation and for 'blowing up' the ship, the actual formulae used are:

  x = sincos[theta1 + tmod1] + sincos[theta2 + tmod2]
  y = sincos[theta1 + tmod3] + sincos[theta2 + tmod4]

Typically tmod1=tmod2 and tmod3=tmod4=tmod1+64, but the values can be changed for 'interesting' effects. The coordinates in sincos are scaled at a ratio of 6 units per x pixel or 7 per y pixel; for plotting, they are passed through a table lookup. A lunar-lander shape with 42 distinct pixels can be drawn every frame at 60Hz with a reasonable-sized screen (170 scan lines displayed).

Edited November 8, 2005 by supercat
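A hedged reconstruction of the math behind the (theta1,theta2) encoding, as a Python sketch: any point with r < 2 is the sum of two unit vectors, so the two angles have a closed form (the demo itself stores precomputed byte angles and uses table lookups; the formulas here are my own derivation).

```python
import math

def encode(x, y):
    """Find theta1, theta2 with cos(t1)+cos(t2) = x and sin(t1)+sin(t2) = y.
    Works for any point with r = sqrt(x*x + y*y) < 2."""
    r = math.hypot(x, y)
    phi = math.atan2(y, x)
    half = math.acos(r / 2)   # half-angle between the two unit vectors
    return phi + half, phi - half

def decode(t1, t2):
    """Back to rectangular -- no multiplies, just four table lookups on a 6502."""
    return math.cos(t1) + math.cos(t2), math.sin(t1) + math.sin(t2)
```

It works because cos(phi+h) + cos(phi-h) = 2*cos(phi)*cos(h) = 2*(x/r)*(r/2) = x, and likewise for the sine sum.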
Wrathchild #8 Posted November 8, 2005

'Elite' used logarithmic calculations via lookup tables: just as log(x*y) = log(x) + log(y), in a similar way log(x/y) = log(x) - log(y). As described above, the implementation is relatively straightforward, except that Elite's coordinate system used 3 bytes to represent a value.

regards,
Mark
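A tiny Python sketch of the division side, assuming the same kind of scaled log/antilog tables as above (Elite's actual 3-byte format is not modeled here): subtract the logs instead of adding them.

```python
import math

# Division by log tables, Elite-style: log(x/y) = log(x) - log(y).
# Same table shapes as for multiplication; subtraction replaces addition.
LOG2 = [0] + [round(32 * math.log2(x)) for x in range(1, 256)]
EXP2 = [round(2 ** (i / 32)) for i in range(513)]

def log_divide(x, y):
    """Approximate x / y via a table subtraction (x >= y > 0 assumed)."""
    return EXP2[LOG2[x] - LOG2[y]]
```

As with the multiply, the result is exact for power-of-two ratios and only approximate otherwise.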
fox #9 Posted December 3, 2005

Fast multiplication
(original article in Polish disk magazine "Syzygy" no. 6)

The problem of multiplication on the 6502 is as old as the processor itself. The algorithms that can be found in various asm courses are based on bit shifting and addition, and need about 100 cycles to calculate the product of two 8-bit numbers. Such algorithms are not fast at all.

There is an effective solution to this problem: create a 2-dimensional array (a multiplication table) that contains all the possible results. Unfortunately, one cannot always afford 16 KB of memory for this. What I suggest is an intermediate solution: implement multiplication using an addition and a function "calculated" via a lookup table. Consider the following formulas:

(1) a*b = exp( log(a)+log(b) )
(2) a*b = ( cos(x+y)+cos(x-y) )/2   where: x = arc cos (a), y = arc cos (b)
(3) a*b = ( (a+b)^2-a^2-b^2 )/2
(4) a*b = ( (a+b)^2-(a-b)^2 )/4

Which formula to choose? The best would be one that is:
- simple to calculate,
- true for any a and b,
- not requiring high precision,
- using a lookup table that is easy to generate.

These conditions are best fulfilled by formula (4). The calculation is very simple:

a*b = f(a+b)-f(a-b)   where: f(x) = (x*x)/4

At first glance, one thinks that you have to store the fractional part of (x*x)/4 in the lookup table in order to get an exact result. This is not the case, because:

1) for even x, x = 2*k:
   f(2*k) = k*k - the result is an integer
2) for odd x, x = 2*k+1:
   f(2*k+1) = k*k+k+1/4 - the result is an integer + 0.25.

However, if a+b is odd, then a-b is odd too. The result of f(a+b)-f(a-b) would therefore be the same, no matter whether f() includes this 0.25 or not.

Calculating successive squares is easy once you remember that:

1*1 = 1 = 0+1
2*2 = 4 = 0+1+3
3*3 = 9 = 0+1+3+5
...

That is, you only need to sum successive odd numbers. The algorithm is general enough to calculate products of signed numbers.
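The two observations above (the dropped 0.25 always cancels, and squares grow by successive odd numbers) can be checked with a short Python sketch; this mirrors the idea of the table-building loops, not the actual 6502 code:

```python
# Build the f(x) = x*x // 4 table without any multiplication, by summing
# successive odd numbers (x*x = 1 + 3 + 5 + ...), as the init loops do.
N = 511
squares = [0] * N          # squares[x] = f(x) for x = 0..510
acc, odd = 0, 1
for x in range(1, N):
    acc += odd             # acc is now x*x
    odd += 2
    squares[x] = acc // 4  # the 0.25 fraction for odd x is simply dropped

def fmul(a, b):
    """a*b = f(a+b) - f(a-b); the dropped quarters cancel because
    a+b and a-b are always both even or both odd."""
    hi, lo = max(a, b), min(a, b)
    return squares[hi + lo] - squares[hi - lo]
```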
There is no universal "fastest multiplication" routine, because universal routines are not fast and vice versa. When implementing your own fast multiplication routine, you should consider the following things:
- how big the numbers you want to multiply are
- whether the numbers are signed or not
- where the factors are stored and where the result should be stored
- what precision you need - for example, you may need just the high byte of the product, and +-1 may be acceptable

But to not make this article too theoretical, I present two routines:
- MULU8 - multiplies unsigned numbers in the A and X registers
- MULS8 - multiplies signed numbers in A and X

A few words about the routines:
- both routines calculate sums using the 6502's indexed addressing mode. Unfortunately, -A needs to be calculated separately, which can be done with EOR #$ff and adding one. Instead of adding 1, we use a separate lookup table for (x+1)^2.
- for MULU8, one lookup table contains squares of 0..510 and the other one -255..255
- for MULS8, both tables contain squares of -255..255
- for MULS8, $80 needs to be added to both input numbers, using EOR #$80. This eliminates the ambiguity that arises when adding signed numbers, for example:
    $60+$60=$c0
    -$40+0=$c0
  Instead, we have:
    $e0+$e0=$1c0 (positive $c0)
    $40+$80=$c0 (negative -$40)
  If the numbers are small (precisely: abs(A)+abs(X)<128), the EOR #$80 operations are redundant, but you need to use different lookup tables.

mulu8.asm:

* Unsigned multiplication
* A*X -> A:Y (A - high byte, Y - low)
* b. Fox/Tqa

	opt 21
	org $480

sq1l	equ $a000
sq1h	equ $a200
sq2l	equ $a400
sq2h	equ $a600

	jsr init

	lda #123
	ldx #234

	sta l1+1
	sta h1+1
	eor #$ff
	sta l2+1
	sta h2+1
	sec
l1	lda sq1l,x
l2	sbc sq2l,x
	tay
h1	lda sq1h,x
h2	sbc sq2h,x

	brk
	brk
	brk

init	ldx #0
	stx sq1l
	stx sq1h
	stx sq2l+$ff
	stx sq2h+$ff
	ldy #$ff
msq1	txa
	lsr @
	adc sq1l,x
	sta sq1l+1,x
	sta sq2l-1,y
	sta sq2l+$100,x
	lda #0
	adc sq1h,x
	sta sq1h+1,x
	sta sq2h-1,y
	sta sq2h+$100,x
	inx
	dey
	bne msq1
msq2	tya
	sbc #0
-	ror @
	adc sq1l+$ff,y
	sta sq1l+$100,y
	lda #0
	adc sq1h+$ff,y
	sta sq1h+$100,y
	iny
	bne msq2
	rts

	end

muls8.asm:

* Signed multiplication
* A*X -> A:Y (A - high byte, Y - low)
* b. Fox/Tqa

	opt 21
	org $480

sq1l	equ $a000
sq1h	equ $a200
sq2l	equ $a400
sq2h	equ $a600

	jsr init

	lda <123
	ldx <0-45

	eor #$80
	sta l1+1
	sta h1+1
	eor #$ff
	sta l2+1
	sta h2+1
	txa
	eor #$80
	tax
	sec
l1	lda sq1l,x
l2	sbc sq2l,x
	tay
h1	lda sq1h,x
h2	sbc sq2h,x

	brk
	brk
	brk

init	ldx #0
	stx sq1l+$100
	stx sq1h+$100
	stx sq2l+$ff
	stx sq2h+$ff
	ldy #$ff
msqr	txa
	lsr @
	adc sq1l+$100,x
	sta sq1l,y
	sta sq1l+$101,x
	sta sq2l-1,y
	sta sq2l+$100,x
	lda #0
	adc sq1h+$100,x
	sta sq1h,y
	sta sq1h+$101,x
	sta sq2h-1,y
	sta sq2h+$100,x
	inx
	dey
	bne msqr
	rts

	end
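A Python model of the EOR #$80 disambiguation used by MULS8 (my illustration of the idea; the real routine spreads this across lo/hi table pages): flipping the sign bit of a two's-complement byte adds 128, so the 9-bit biased sum (a+128)+(b+128) = a+b+256 is unambiguous where the raw 8-bit sum is not.

```python
# Model of MULS8's EOR #$80 trick: bias each signed byte by +128
# (flip the sign bit), so sums that collide in 8 bits become distinct.
def biased(v):
    return (v & 0xff) ^ 0x80      # two's-complement byte, plus 128

# fox's example: $60+$60 and -$40+$00 both give $c0 as a raw byte sum...
assert (0x60 + 0x60) & 0xff == (-0x40 + 0) & 0xff == 0xc0
# ...but the 9-bit biased sums differ, so the table lookup is unambiguous.
assert biased(0x60) + biased(0x60) == 0x1c0   # "positive $c0"
assert biased(-0x40) + biased(0) == 0xc0      # "negative -$40"

def muls8(a, b):
    """Signed multiply via f(x) = x*x // 4 indexed by the biased values
    (on the 6502 the ninth bit of the sum selects the table page)."""
    f = lambda x: (x * x) // 4
    s = biased(a) + biased(b)     # a + b + 256, unique in 0..510
    d = biased(a) - biased(b)     # the bias cancels: this is a - b
    return f(s - 256) - f(d)
```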
SeaGtGruff #10 Posted December 3, 2005

[quoting fox's "Fast multiplication" article above]

Thank you, Fox, the article and routines will be very useful!

Michael Rideout
Shiru #11 Posted December 9, 2005

Hey guys, 'Russian multiplication' is not for computer algorithms ;) It's for multiplying 'in mind' (in your head); we learn that method in the first years of primary school. For a computer, binary multiplication is a much better way :)
fox #12 Posted December 14, 2005

I really am impressed that Russians can calculate things like 23*14 or 17*53 'in mind'. Maybe they don't need computers at all?
Heaven/TQA #13 Posted December 17, 2005

With 2 bottles of wodka I am able to calculate bigger numbers in mind as well... nastrovje...