
Russian multiplication


Rybags


I was browsing an 8-bit magazine disk last night, and it had an article on Russian multiplication and how it can be used for very fast binary calculations.

 

It works like so: C = A * B

 

Take the operands A and B and place them in a row. For each new row, double A and halve B (discarding the remainder). Keep doing so until the B column reaches 1.

 

Then derive the result (C) by adding up the values in the A column, but only where the B column value is odd.

 

Examples (values of B marked with # are ignored in the final calculation):

 23    x    14 #   =  322
 46           7
 92           3
184           1
===
322 = 46+92+184  (sum the values of A, only where the B column value is odd)

 17    x    53     =  901
 34           26 #
 68           13
136            6 #
272            3
544            1
====
901 = 17+68+272+544

 

Thanks to the 6502's shift/rotate instructions, this algorithm can easily be incorporated into programs to multiply two 8-bit numbers for a 16-bit result. Very handy for games or demos needing quick calculations for graphics.
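For reference, the whole thing is only a few lines in C (a sketch of the same halve/double idea, not the magazine's routine):

#include <stdint.h>

/* Russian (peasant) multiplication: halve B, double A, and add A to the
   total whenever B is odd -- exactly the odd-row rule in the tables above.
   8-bit inputs, 16-bit result. */
uint16_t rus_mul(uint8_t a, uint8_t b)
{
    uint16_t acc = 0;
    uint16_t aa = a;            /* widened so doubling cannot overflow */
    while (b != 0) {
        if (b & 1)              /* B odd: this row counts */
            acc += aa;
        aa <<= 1;               /* double A  (ASL on the 6502) */
        b >>= 1;                /* halve B, dropping the remainder (LSR) */
    }
    return acc;
}

rus_mul(23, 14) walks exactly the 23 x 14 table above and returns 322.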

 

I did a quick Google search and dug up some links:

 

http://mathforum.org/dr.math/faq/faq.peasant.html

 

Egyptian division: http://www.mathforum.org/library/drmath/view/57574.html

 

Division would be great too. The Egyptian method linked above is a similar concept and could also be fairly easily programmed; a sketch of the doubling idea follows.
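For the curious, the Egyptian method is the mirror image of the multiplication: build up doubles of the divisor, then subtract the largest ones that fit. A minimal C sketch of that idea (my own illustration, not the linked article's code):

#include <stdint.h>

/* Egyptian-style division by doubling: find the largest b*2^k that
   fits in a, subtract it, and work downwards through the doubles. */
uint16_t egy_div(uint16_t a, uint16_t b)    /* requires b != 0 */
{
    uint16_t q = 0;
    int k = 0;
    while (((uint32_t)b << (k + 1)) <= a)   /* largest k with b*2^k <= a */
        k++;
    for (; k >= 0; k--) {
        if (((uint32_t)b << k) <= a) {
            a -= (uint16_t)(b << k);
            q |= (uint16_t)(1u << k);
        }
    }
    return q;                               /* quotient; a ends up as the remainder */
}

egy_div(322, 14) peels off 224, 56, 28 and 14 and returns 23.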


It's not impressive at all. This is how multiplication is normally done on a 6502 (the easy, slow way).

 

The cool way is:

a * b = f(a + b) - f(a - b)

where f(x) = x * x / 4 and is, of course, stored in a precomputed lookup table.

 

This can be very effectively computed on a 6502, even with signed numbers! (unlike the Russian crap)
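To make the table sizes concrete, here is a minimal C model of the trick (a sketch only; the 6502 routines later in this thread avoid the abs() by using a second, mirrored table):

#include <stdint.h>

/* Quarter-square multiply: a*b = f(a+b) - f(a-b) with f(x) = x*x/4,
   truncated. Both truncations lose the same 0.25 when a+b is odd, so
   the difference is exact. Table covers x = 0..510 (a+b at most 510). */
static uint16_t f[511];

static void init_f(void)
{
    for (int x = 1; x <= 510; x++)
        f[x] = f[x - 1] + (x >> 1);    /* f(x) - f(x-1) = floor(x/2) */
}

static uint16_t qsq_mul(uint8_t a, uint8_t b)
{
    int d = (a > b) ? a - b : b - a;   /* |a - b|, valid since f(-x) = f(x) */
    return (uint16_t)(f[a + b] - f[d]);
}

Note the index a+b can reach 510, which is why the assembly tables later in the thread span two pages each.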

 

I described it in detail and with examples years ago in the Polish disk magazine "Syzygy" (issue #5, I think). If someone is interested, I may translate it here. A real-world example is in my portal engine in Numen, where I use both signed and unsigned variations, and a 16-bit * 16-bit multiply with a 32-bit result too. Done the Russian way it would be several times slower.

 

As for division, I don't know a really fast algorithm. To my knowledge, even in modern CPUs division is much slower than multiplication. Avoid it if you can. If you can't, calculate 1/b once and then multiply: a * (1/b). I use such a technique in this portal engine too (see Project_calcInverseD for how to quickly compute $2800 / x).
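In C terms the reciprocal trick looks roughly like this (a fixed-point sketch with my own scaling and names; not the Numen routine):

#include <stdint.h>

/* One real division up front, then every a/b becomes multiply-and-shift.
   r = floor(65536/b); the truncation can leave the estimate one short,
   hence the single correction step. Requires b != 0. */
uint32_t make_recip(uint16_t b)
{
    return 65536u / b;
}

uint16_t div_by_recip(uint16_t a, uint16_t b, uint32_t r)
{
    uint16_t q = (uint16_t)(((uint32_t)a * r) >> 16);
    if ((uint32_t)(q + 1) * b <= a)    /* estimate is off by at most 1 */
        q++;
    return q;
}

make_recip costs one real division; after that each quotient is one multiply and a shift.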


> It's not impressive at all. This is how multiplication is normally done on a 6502 (the easy, slow way).

 

Another method is Booth's algorithm, which looks at three bits at a time in order to process two bits at a time. It's a little hard to describe, but it's what most modern CPUs use.
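For the curious, here is a sketch of that recoding (radix-4, or "modified", Booth) in C, using the textbook digit set rather than any particular CPU's implementation:

#include <stdint.h>

/* Radix-4 Booth: scan overlapping triples (b[i+1], b[i], b[i-1]) of the
   multiplier and recode each bit pair as one of {-2,-1,0,+1,+2} times
   the multiplicand, halving the number of partial products. */
int32_t booth_mul(int16_t a, int16_t b)
{
    int64_t acc = 0;
    int64_t m = a;                            /* sign-extended multiplicand */
    uint32_t q = (uint32_t)(uint16_t)b << 1;  /* implicit 0 below bit 0 */

    for (int i = 0; i < 16; i += 2) {
        int64_t w = (int64_t)1 << i;          /* weight of this digit pair */
        switch ((q >> i) & 7) {               /* the three bits around bit i */
        case 1: case 2: acc += m * w;     break;   /* +1 */
        case 3:         acc += m * w * 2; break;   /* +2 */
        case 4:         acc -= m * w * 2; break;   /* -2 */
        case 5: case 6: acc -= m * w;     break;   /* -1 */
        default:        break;                     /*  0 */
        }
    }
    return (int32_t)acc;
}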

 

> As for division, I don't know a really fast algorithm. To my knowledge, even in modern CPUs division is much slower than multiplication. Avoid it if you can. If you can't, calculate 1/b once and then multiply: a * (1/b).

 

If one is dividing really big numbers (e.g. doing RSA cryptography) the best approach is to compute an approximate reciprocal of the divisor and multiply both divisor and dividend by it, so that the divisor is of the form 1.000...00xxxxxxxxxx or 0.1111...1111xxxx [for some nice generous number of leading zeroes or ones]. This can be done fairly easily (if you're trying for 33 leading 1's or zeroes, you only need to look at the first 34 bits of the divisor, IIRC), even if the number is thousands of bits long.

 

Once the division problem has been normalized to that form (divisor of 0.1111...11x with 33 leading 1's) it's easy to get result bits 32 bits at a time. The first 32 bits of the result will either be the first 32 bits of the dividend, or else that quantity plus 1. So assume it's that quantity, multiply and subtract out, and proceed with the next partial remainder.
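Here is that estimate-and-correct step scaled down to C-sized integers: 16-bit chunks with a 32-bit divisor instead of 32-bit chunks (sizes and names are my own illustration):

#include <stdint.h>
#include <stdio.h>

/* With the divisor d normalized so its top 17 bits are all ones
   (d in [2^32 - 2^16, 2^32)), each 16-bit quotient chunk is the top
   bits of the partial remainder, or that value plus one.
   Requires rem < d * 2^16 so the chunk fits in 16 bits. */
static uint32_t quot_chunk(uint64_t rem, uint64_t d, uint64_t *rem_out)
{
    uint32_t qhat = (uint32_t)(rem >> 32);   /* first guess: top bits */
    if (rem - (uint64_t)qhat * d >= d)       /* at most one too small */
        qhat++;
    *rem_out = rem - (uint64_t)qhat * d;
    return qhat;
}

int main(void)
{
    uint64_t d = 0xFFFFFF12u;                /* top 17 bits all set */
    uint64_t rem = 0x123456789ABCull, left;
    printf("chunk %u, exact %u\n",
           (unsigned)quot_chunk(rem, d, &left), (unsigned)(rem / d));
    return 0;
}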

 

Note that this method of division is subject to a patent that was held by Cyrix when I last checked; I came up with the method myself, independently, and discovered the patent when I did a patent search (too bad Cyrix beat me to it).


> [quotes Rybags' original post above in full]

Thanks, Rybags, this is very cool, and it rings a bell from my computer science classes back in college.

 

Michael Rideout


 

> I described it in detail and with examples years ago in the Polish disk magazine "Syzygy" (issue #5, I think). If someone is interested, I may translate it here. A real-world example is in my portal engine in Numen, where I use both signed and unsigned variations, and a 16-bit * 16-bit multiply with a 32-bit result too. Done the Russian way it would be several times slower.

 

A translation and example would be appreciated.

 

Any hints on division would be great too.


A good source for Fox's fastmul is here:

 

http://www.ffd2.com/fridge/chacking/

 

Fast Signed Multiply
--------------------

Ah, now here is something we can fix.  Consider the following function:

f(x) = x*x/4

Now notice that

f(a+b) - f(a-b) = a*b

Wowsers!  All we need to do is have a table of squares, and we can do a
multiplication in no time!

Whoa there, wait a minute, all of our calculations are using signed numbers.
Won't that make a difference?  Well, the above calculation is completely
general -- I never said what the sign of a and b are.  The fact that we are
using two's complement notation makes this even simpler!

Recall that our multiplication is

x -> x * [d/(z0-z/64)]

where x is a signed floating point number multiplied by 64 (that is,  instead
of going from -1 to 1 x would go from -64 to 64).  Previously we made d
large, so that the table of d/(z-z0) would be more accurate. Then we
multiplied the numbers together and divided by 64, a procedure which took
between 150 and 180 cycles, leaving us with a signed, 8-bit result.

Well, that's easy to duplicate.  From our first equation above, we see that

(a*b)/64 = [ f(a+b) - f(a-b) ]/64
  = g(a+b) - g(a-b)
where
g(x) = x*x/256.

In other words, if we modify our table slightly, we get exactly the result we
want.  So here is the code to multiply two numbers together:

* A*Y -> A  Signed, 8-bit result

STA ZP1	;ZP1 -- zero page pointer to table of g(x)
EOR #$FF
CLC
ADC #$01
STA ZP2	;ZP2 also points to g(x)
LDA (ZP1),Y;g(Y+A)
SEC
SBC (ZP2),Y;g(Y-A)

And that's it -- we're done.  The above takes 24-26 cycles to execute -- not
bad at all!  Yes, with another table we could make it even faster, but this
is good enough for us.

At the moment we don't do very many multiplications, but in the future, when
we write a generalized routine to rotate and project an arbitrary object,
this will give us a humongous time savings.

Astute readers may be thinking ahead here: in the program, for each
projection we have two multiplications, x=x*c and y=y*c, where c is the same
in both cases.  So if we store c in ZP1 and ZP2, we can make the
multiplication even more efficient, right?  The answer is yes, but only by
being extremely careful, for reasons that will be detailed in exactly two
paragraphs.

BUT WAIT!  We have to think about a few things here.  What happens when we
multiply, say, -1 * -1.  In two's complement notation -1 is equal to 255.  So
our above algorithm adds 255 to 255 to get an index into the table and
gets... oops!  Our table needs to be larger than 256 bytes!  In fact this is
very easy to fix, because of the way two's complement works.  All we need to
do is put an exact copy of the 256 byte table on top of itself, and the table
will work fine.  (If you look at the initialization program you will notice
that the statement is: q%=s*s:poke bm+j,q%:poke bm+j+256,q%).

BUT WAIT!!!  What kinds of numbers are we multiplying together here? Our
vertices start out at the points (+/-1,+/-1,+/-1).  Our rotations correspond
to moving these points around on a sphere, so it is easy to see that the
largest rotated value we can have is sqr(3), the radius of this sphere.
Since we are multiplying these numbers by 64, the largest value we can have
for these numbers is 64*sqr(3) = 111.  Okay, no big whoop, what are the
values for d/(z0-z/64) anyways?  Well, for z0=5 and d=150 say we get values
like 28...

ACK!  When we go to multiply we are going to add 111 to 28 and get 139, but
in two's complement notation this is equal to -117.

Example:
a=28
b=111
f(b+a) = f(139) = f(-117) in two's-complement notation
f(b-a) = f(83)

In our table lookup we really want 139*139 but we are going to get -117*-117!
One option is to never choose d very large so that we don't have this
problem, but the solution turns out to be much simpler, again due to the way
two's complementing works.

We can see that we can get numbers larger than 127 when we add the two
multiplicands together.  What is the _smallest_ number we will come up with?
Certainly the smallest x is going to get is -111. Ahhh... d/(z0-z/64) is
always positive, so when we add them together we will get something larger
than -111, which in two's complement notation is 145.  This means that we can
treat the table entries between 127 and at least 145 as positive numbers
instead of two's complement negative numbers, and everything will work out
great.

Incidentally, fudging this table provides an easy way to add pretty cool
special effects.  The initialization program sets up the math table using the
following line:

    [for j=0 to 255]
290 S=J:IF S>150 THEN S=256-S
    [poke bm+j,S*S]

Try changing the inequality from S>150 to S>120 (or S>127, where it would be
for a two's complement table), and see what happens!

And this is why we can't store d/(z0-z) in the pointers ZP1 and ZP2 -- if we
did, then for a given multiplication we could get numbers larger than 127 and
smaller than -128, and our clever table would no longer work right.  We can
still get around this -- all we need is two clever tables, one for the
positive d/(z0-z) and one for negative d/(z0-z).  For the first table, we
have the situation outlined above: no numbers smaller than -90 or so, and
numbers possible larger than 127.  For the second table we have the reverse
situation: no numbers larger than 90 or so, but possible numbers less than
-128. Since we are using two pointers anyways (ZP1 and ZP2), this is not
difficult to implement.

The end result is that you can do the entire projection in around 36 cycles
if you so desire.  36 cycles?  Note that for the second table the code does
something like EOR #$FF; CLC; ADC #$01. Well, if we set up the second table
as f(x)=(x+1)^2/4 then we have included the CLC and ADC #$01 into the table,
so the instructions can be removed.  The entire projection routine is then:

... (rotate z)
STA ZP1
EOR #$FF
STA ZP2
... (rotate x)
TAY
LDA (ZP1),Y
SEC
SBC (ZP2),Y   ;Now A contains projected X
... (rotate y)
TAY
LDA (ZP1),Y
SEC
SBC (ZP2),Y

Looks like 36-40 cycles to me!  The program doesn't implement this -- it only
uses a single table, and repeats the STA ZP1 stuff at each step.  A few
cycles wasted won't kill us (there are plenty of cycles wasted in the code),
and it is probably tricky enough to follow as it is.

You might be asking, what is the true minimum value for a given z0 and d?
Well, I tried writing down a set of equations and minimizing according to
some constraints, and I ended up with a sixth-order polynomial which I had to
write a little Newton-iteration solver for.  In other words, I don't think you
can write down a simple function of d and z0 to give the table boundaries.  I
found 150 to be a perfectly reasonable number to use.

Incidentally, this is why the projection was not changed to z=z+c -- I didn't
want to go fiddling around with it again.  Maybe next time :).

ONE MORE THING!!!  This is very important.  The math table MUST be on an even
boundary for the above algorithm to work correctly.  This one is easy to get
bit by.

Logarithmic Multiplication
--------------------------

As long as we're talking about fast multiplication here, it is worthwhile to
mention another method for multiplying two numbers together.  To understand
it you need to understand two important properties of logarithms:

log(x*y) = log(x) + log(y)
log_b(x) = y  <=>  b^y = x

These are a reflection of the fact that logarithms are inverses of
exponentiation.  So you can see that another way to multiply two numbers
together is to take their logs, add, and then exponentiate the result.  So
you could have a table of log_2 (base 2 logarithms) and another table of 2^x,
and do a multiplication very quickly. (Actually, you'd want a table of
32*log_2(x), since log_2(256)=8). Why wasn't this method used?

First, dealing with signed numbers is much trickier -- the logarithm of a
negative number is a complex (i.e. real+imaginary) number, complete with
branch cuts.  You can get around this by setting up the tables in a special
way (for instance by letting log(-x)=-log(x)) and putting in some special
handling, but it isn't as efficient as the algorithm used in the program.

Second, accuracy decreases significantly as x and y get large, so that for an
eight-bit table of logarithms you will often get an answer that is off by one
or more.  You can in fact get around this problem by using some sneaky
manipulation -- if you are interested in seeing this, contact us!

But it is worthwhile to keep this method in mind if you need a really fast
multiplication and you aren't too worried about accuracy.

Christopher Jam (phillips@ee.uwa.edu.au) has come up with an interesting
variation on this method.  It calculates 64+64*x/z and uses a slightly
different structure for the signed numbers, and runs almost as fast as the
method used by the program -- contact him for more information if you're
interested.

 

It's in issue #9. Maybe Fox can add his description to it... :D

 

http://www.ffd2.com/fridge/chacking/c=hacking9.txt


Incidentally, my 'lunar lander' demo uses a somewhat different trick. The goal is to store coordinates in a polar-ish notation while being able to convert them to rectangular very quickly. Rather than storing (r,theta) I instead store (theta1,theta2) such that x=cos(theta1)+cos(theta2) and y=sin(theta1)+sin(theta2). This can be accomplished for all 'r' values within range (i.e. less than 2), and coordinates stored this way do not require any multiplication for conversion to rectangular coordinates (the theta1 and theta2 values are all precomputed, and the sin/cos values are obtained via table lookup). To allow for rotation and for 'blowing up' the ship, the actual formulae used are:

x = sincos[theta1 + tmod1] + sincos[theta2 + tmod2]
y = sincos[theta1 + tmod3] + sincos[theta2 + tmod4]

 

Typically, tmod1=tmod2 and tmod3=tmod4=tmod1+64, but the values can be changed for 'interesting' effects. The coordinates in sincos are scaled at a ratio of 6 units per x pixel or 7 per y pixel; for plotting, they are passed through a table lookup.
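A rough C model of the decoding step (table size, scaling, and phase convention are my guesses for illustration, not supercat's actual values):

#include <math.h>
#include <stdint.h>

/* Decode a point stored as two angles: four table reads and two adds,
   no multiply. Angles are 0..255 (256 = full circle); adding 64 is a
   quarter turn, which turns the sine column into the cosine one. */
static int8_t sincos[256];            /* sincos[i] = 64*sin(i * 2*pi/256) */

static void init_sincos(void)
{
    for (int i = 0; i < 256; i++)
        sincos[i] = (int8_t)lrint(64.0 * sin(2.0 * M_PI * i / 256.0));
}

static void to_rect(uint8_t t1, uint8_t t2, uint8_t tmod1, int *x, int *y)
{
    uint8_t tmod3 = (uint8_t)(tmod1 + 64);   /* tmod3 = tmod4 = tmod1 + 64 */
    *x = sincos[(uint8_t)(t1 + tmod1)] + sincos[(uint8_t)(t2 + tmod1)];
    *y = sincos[(uint8_t)(t1 + tmod3)] + sincos[(uint8_t)(t2 + tmod3)];
}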

 

A lunar-lander shape with 42 distinct pixels can be drawn every frame at 60Hz with a reasonable-sized screen (170 scan lines displayed).


'Elite' used logarithmic calculations with lookup tables: since log(x*y) = log(x) + log(y), in a similar way log(x/y) = log(x) - log(y). As described above, the implementation is relatively straightforward, except that Elite's co-ordinate system used 3 bytes to represent a value.
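A toy C version of the log/antilog approach (the scale factor 32 is borrowed from the C=Hacking discussion above; Elite's actual 3-byte format differs, and the names here are mine):

#include <math.h>
#include <stdint.h>

/* Multiply/divide through log tables: L[x] ~ 32*log2(x) and an antilog
   table E[i] ~ 2^(i/32). As noted above, accuracy is limited -- results
   can be off by a few for large inputs. */
static uint8_t  L[256];
static uint16_t E[512];

static void init_logs(void)
{
    for (int x = 1; x < 256; x++)
        L[x] = (uint8_t)(32.0 * log2((double)x));   /* truncated to fit a byte */
    for (int i = 0; i < 512; i++)
        E[i] = (uint16_t)lrint(pow(2.0, i / 32.0));
}

static uint16_t mul_approx(uint8_t x, uint8_t y)
{
    return (x && y) ? E[L[x] + L[y]] : 0;            /* log x + log y */
}

static uint16_t div_approx(uint8_t x, uint8_t y)     /* requires y != 0 */
{
    return (x && L[x] >= L[y]) ? E[L[x] - L[y]] : 0; /* log x - log y */
}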

 

regards,

Mark


(4 weeks later...)

Fast multiplication

(original article in Polish disk magazine "Syzygy" no. 6)

 

The problem of multiplication on the 6502 is as old as the processor itself. The algorithms that can be found in various asm courses are based on bit shifting and addition, and need about 100 cycles to calculate the product of two 8-bit numbers. Such algorithms are not fast at all.

There is an effective solution to this problem: create a 2-dimensional array (a multiplication table) that contains all the possible results. Unfortunately, one cannot always afford 16 KB of memory for this.

 

What I suggest is an intermediate solution: implement multiplication using an addition and a function "calculated" via a lookup table.

 

Consider the following formulas:

(1) a*b = exp( log(a)+log(b) )
(2) a*b = ( cos(x+y)+cos(x-y) )/2, where x = arccos(a), y = arccos(b)
(3) a*b = ( (a+b)^2-a^2-b^2 )/2
(4) a*b = ( (a+b)^2-(a-b)^2 )/4

Which formula to choose? The best would be one that is:
- simple to calculate,
- true for any a and b,
- not requiring high precision,
- using a lookup table that is easy to generate.

These conditions are fulfilled best by formula (4).

The calculation is very simple:

    a*b = f(a+b)-f(a-b)

where f(x) = (x*x)/4.

 

At first glance one might think that you have to store the fractional part of (x*x)/4 in the lookup table in order to get an exact result. This is not the case, because:

1) for even x, x = 2*k:   f(2*k) = k*k - the result is an integer;
2) for odd x, x = 2*k+1:  f(2*k+1) = k*k+k+1/4 - the result is an integer + 0.25.

However, if a+b is odd, then a-b is odd too. The result of f(a+b)-f(a-b) is therefore the same whether or not f() includes this 0.25.

 

Calculating successive squares is easy once you remember that:

1*1 = 1 = 0+1
2*2 = 4 = 0+1+3
3*3 = 9 = 0+1+3+5
...

That is, you only need to sum successive odd numbers.
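A quick C check of that identity (illustrative only):

#include <stdio.h>

/* k*k is the sum of the first k odd numbers, so each new square
   costs only one addition. */
int main(void)
{
    unsigned sq = 0, odd = 1;
    for (unsigned k = 1; k <= 10; k++) {
        sq += odd;                /* sq is now k*k */
        odd += 2;                 /* next odd number */
        printf("%u*%u = %u\n", k, k, sq);
    }
    return 0;
}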

 

The algorithm is general enough to calculate products of signed numbers.

There is no universal "fastest multiplication" routine, because universal routines are not fast, and vice versa.

 

When implementing your own fast multiplication routine you should consider the following:
- how big the numbers you want to multiply are,
- whether the numbers are signed or not,
- where the factors are stored and where the result should be stored,
- what precision you want - for example, you may need just the high byte of the product, and +-1 may be acceptable.

But so as not to make this article too theoretical, I present two routines:
- MULU8 - multiplies unsigned numbers in the A and X registers,
- MULS8 - multiplies signed numbers in A and X.

 

A few words about the routines:
- Both routines calculate the sums using the 6502's indexed addressing mode. Unfortunately, -A needs to be calculated separately, which can be done with EOR #$ff and adding one. Instead of adding the 1, we use a separate lookup table for (x+1)^2.
- For MULU8, one lookup table contains squares of 0..510 and the other of -255..255.
- For MULS8, both tables contain squares of -255..255.
- For MULS8, $80 needs to be added to both input numbers, using EOR #$80. This eliminates the ambiguity that arises when adding signed numbers, for example:
      $60+$60=$c0
      -$40+0=$c0
  With the bias we instead have:
      $e0+$e0=$1c0  (positive $c0)
      $40+$80=$c0   (negative -$40)
  If the numbers are small (precisely: abs(A)+abs(X)<128), the EOR #$80 instructions are redundant, but you need to use different lookup tables. (A C model of this table layout follows below.)
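To make the biased table layout concrete, here is a C model (my reconstruction from the description above, not a translation of the assembly; the names are mine):

#include <stdint.h>

/* Bias both operands with EOR #$80, then
     t1[(a+128) + (b+128)]        holds f(a+b)  (index 0..510)
     t2[((a+128)^255) + (b+128)]  holds f(b-a)  (index 0..510)
   and the product is the difference, since f(-x) = f(x). */
static int16_t t1[511], t2[511];

static int f(int x) { return (x * x) / 4; }   /* quarter square */

static void init_tables(void)
{
    for (int i = 0; i <= 510; i++) {
        t1[i] = (int16_t)f(i - 256);   /* i = a+b+256 */
        t2[i] = (int16_t)f(i - 255);   /* i = b-a+255 */
    }
}

static int16_t muls8(int8_t a, int8_t b)
{
    uint8_t ea = (uint8_t)a ^ 0x80;    /* a + 128 */
    uint8_t eb = (uint8_t)b ^ 0x80;    /* b + 128 */
    return (int16_t)(t1[ea + eb] - t2[(ea ^ 0xff) + eb]);
}

For example, muls8(-3, 5) returns -15, matching f(2) - f(8) = 1 - 16.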

 

mulu8.asm:

 

* Unsigned multiplication
* A*X -> A:Y (A-high byte, Y-low)
* b. Fox/Tqa

    opt 21
    org $480

sq1l equ $a000
sq1h equ $a200
sq2l equ $a400
sq2h equ $a600

    jsr init

    lda #123
    ldx #234

    sta l1+1    ; self-modify: l1/h1 will read f(A+X)
    sta h1+1
    eor #$ff
    sta l2+1    ; l2/h2 will read f(X-A) (table centred at +$ff)
    sta h2+1
    sec
l1   lda sq1l,x  ; f(A+X) low
l2   sbc sq2l,x  ; - f(X-A) low
    tay          ; Y = low byte of product
h1   lda sq1h,x  ; f(A+X) high
h2   sbc sq2h,x  ; - f(X-A) high: A = high byte

    brk
    brk
    brk

init ldx #0
    stx sq1l
    stx sq1h
    stx sq2l+$ff
    stx sq2h+$ff
    ldy #$ff
msq1 txa
    lsr @
    adc sq1l,x
    sta sq1l+1,x
    sta sq2l-1,y
    sta sq2l+$100,x
    lda #0
    adc sq1h,x
    sta sq1h+1,x
    sta sq2h-1,y
    sta sq2h+$100,x
    inx
    dey
    bne msq1
msq2 tya
    sbc #0
    ror @
    adc sq1l+$ff,y
    sta sq1l+$100,y
    lda #0
    adc sq1h+$ff,y
    sta sq1h+$100,y
    iny
    bne msq2
    rts

    end

 

muls8.asm:

* Signed multiplication
* A*X -> A:Y (A-high byte, Y-low)
* b. Fox/Tqa

    opt 21
    org $480

sq1l equ $a000
sq1h equ $a200
sq2l equ $a400
sq2h equ $a600

    jsr init

    lda <123
    ldx <0-45

    eor #$80     ; bias A by $80 (see note above)
    sta l1+1
    sta h1+1
    eor #$ff
    sta l2+1
    sta h2+1
    txa
    eor #$80     ; bias X by $80
    tax
    sec
l1   lda sq1l,x  ; f(A+X) low
l2   sbc sq2l,x  ; - f(X-A) low
    tay
h1   lda sq1h,x  ; f(A+X) high
h2   sbc sq2h,x  ; - f(X-A) high

    brk
    brk
    brk


init ldx #0
    stx sq1l+$100
    stx sq1h+$100
    stx sq2l+$ff
    stx sq2h+$ff
    ldy #$ff
msqr txa
    lsr @
    adc sq1l+$100,x
    sta sq1l,y
    sta sq1l+$101,x
    sta sq2l-1,y
    sta sq2l+$100,x
    lda #0
    adc sq1h+$100,x
    sta sq1h,y
    sta sq1h+$101,x
    sta sq2h-1,y
    sta sq2h+$100,x
    inx
    dey
    bne msqr
    rts

    end


> [quotes Fox's article above in full]

 

Thank you, Fox, the article and routines will be very useful!

 

Michael Rideout

