6502c multiply 10-digit number by 10-digit number

Recommended Posts

It can be done in any coding language but i think assembly will be fastest.

I'm looking for solution that is faster than ATARI basic which is 50-80 muls per sec and it have loss of precision :(

How many times per second can 6502c multiply 10-digit number by 10-digit number without a loss of precision?

Share on other sites

The base Atari OS mathpack FMUL works by repeated addition while counting down multiplier digits. It takes ~9750 cycles on average for RND(0)*RND(0) and ~16600 cycles for worst case of all nines in the multiplier, not counting DMA cycles.

In the current version of Altirra Extended BASIC, I use a decimal mode version of the classic square-table algorithm that can multiply two standard Atari FP numbers with rounding using 1K of tables. Average is ~2010 cycles, worst case ~2030 cycles.

Turbo-Basic XL uses an algorithm that precalculates the multiplicand by all BCD bits and then adds all partial factors for each bit set. Average is ~2180 cycles, worst case ~3120. The final result is not rounded but adding that wouldn't cost much. TBXL's algorithm uses no table space but requires some code space and particularly some temporary space for the premultiplied factors.

In the AltirraOS math pack, I convert the multiplier digit pairs from BCD to binary (0-99), then run a transposed loop that processes the same binary bit position across all multiplier bytes before doubling the multiplicand for the next bit position. Result is rounded. Average time is ~2580 unhalted cycles, worst case around 3560. The advantage of this algorithm is that it requires no large tables or unrolled code and not much temporary space, so it can fit within both the ROM and page zero memory constraints of the math pack.

• 3
• 1

Share on other sites
16 hours ago, Estece said:

How many times per second can 6502c multiply 10-digit number by 10-digit number without a loss of precision?

How are the numbers represented? Fractional, integer? BCD is certainly not a good choice, either. 10 digits require four bytes, with an 8-byte temporary. The classical "egyptian" multiplication by shift & add requires thus 32 loops, within which each time the temporary is shifted (probably ~16 cycles) and the multiplier is added. But that is integer * integer (or, fractional * fractional, with a pre-shift).

Share on other sites

And there are much faster methods compared to bit by bit multiplication. We just need to know what kind of numbers are these.

Share on other sites

Thank You for all this great information and response.

I read about "ENIAC" and in 1946 this machine had 10-digit number as word in it's architecture.

I think number was represented as integer in decimal format 0-9 10 places and sign.

It had 20 registers and a multiply and divide in hardware at 100kHz.

One instruction takes 20 cycles. It can do 352 muls per second.

I was curious if 6502c can beat this old technology.

Share on other sites
15 hours ago, Estece said:

I read about "ENIAC" and in 1946 this machine had 10-digit number as word in it's architecture.

I think number was represented as integer in decimal format 0-9 10 places and sign.

I was curious if 6502c can beat this old technology.

Well, this is pretty much an apples to oranges comparison. The Eniac was a decimal machine, i.e. its accumulator was designed to hold floating point numbers in decimal notation. That made it work "well enough" for the problems - mostly numerics - it was designed for, but would have made it a pretty bad computer (and much overpriced...) for a consumer games device. (-: The Eniac had (sort of) hardware support for multiplication (and division, and square root), so you had not to break them down to more primitive instructions, which is what you have to do on the 6502.

The 6502 was designed as a "budget CPU" that should should win market access due to its cheap price. Hence, it was designed with completely different goals in mind, with completely different architecture. It would have been possible to add a hardware multiply (same as the Eniac has), but this would have made the 6502 to miss its price point, but it would be able to multiply a lot faster than it does.

The fastest way how to multiply with the 6502 I can think of is to have a table containing squares from 00 to 1ff (1K), and then evalulate a * b = 1/4 ((a+b)^2 - (a-b)^2). This requires no bit shifting, but two table look ups, one addition with carry, one subtraction with carry, one comparison (swap a-b to b-a if the former is negative), and a two-byte subtraction. It's probably the fastest way, though it is quite wasteful with RAM.

Share on other sites
1 hour ago, thorfdbg said:

The fastest way how to multiply with the 6502 I can think of is to have a table containing squares from 00 to 1ff (1K), and then evalulate a * b = 1/4 ((a+b)^2 - (a-b)^2). This requires no bit shifting, but two table look ups, one addition with carry, one subtraction with carry, one comparison (swap a-b to b-a if the former is negative), and a two-byte subtraction. It's probably the fastest way, though it is quite wasteful with RAM.

What's hilarious about this method is that if someone asked for a fast way to multiply two numbers and you replied "add the two numbers and square that result, then subtract the two numbers and square that result, and subtract the second result from the first and divide by 4", he'd probably punch you for being obtuse.

Also, whoever came up with that exploit is a genius.

Share on other sites
12 hours ago, ChildOfCv said:

Also, whoever came up with that exploit is a genius.

No, not really. The above trick is a common method in various proofs in linear algebra. It essentially says that it is sufficient to proof certain properties of an inner product on the diagonal.

Share on other sites
31 minutes ago, thorfdbg said:

No, not really. The above trick is a common method in various proofs in linear algebra. It essentially says that it is sufficient to proof certain properties of an inner product on the diagonal.

It's not the algebraic equivalent I'm talking about though.  Simple expansion and gathering of terms shows that it's the same thing.

The genius part is inventing an expansion that allows you to multiply with one small table using squares, rather than, say, a 256x256 triangular table, or to make it manageable, a nibble table (16x16).  In doing so, you make the equation more complex in order to make the computation fast and easy, yet save big over other potential strategies.  That kind of dissonance is not obvious.

Share on other sites
7 hours ago, ChildOfCv said:

The genius part is inventing an expansion that allows you to multiply with one small table using squares, rather than, say, a 256x256 triangular table, or to make it manageable, a nibble table (16x16).  In doing so, you make the equation more complex in order to make the computation fast and easy, yet save big over other potential strategies.  That kind of dissonance is not obvious.

Like I said, this is not such a major invention. It is a rather common trick in linear algebra that has a longer tradition in mathematics, though typically in higher dimension. I believe every student in linear algebra learns this.

Edited by thorfdbg

Share on other sites

Program is based on mads math\fmulu16x16.asm .

Press START to exit.

counter 2 chars counts products

A 5 chars (40 bits signed int) at random from \$D014

B 5 chars (40 bits signed int) at random from \$D014

PRODUCT 10 chars (80 bits signed int) of A*B

It is multiplication program. It stops after 1 second.

6502c can do 570-675 40x40 muls per second without loss of precision.

It beats 34 year old "ENIAC" but only in asm!

Join the conversation

You can post now and register later. If you have an account, sign in now to post with your account.
Note: Your post will require moderator approval before it will be visible.

×   Pasted as rich text.   Paste as plain text instead

Only 75 emoji are allowed.

×   Your previous content has been restored.   Clear editor

×   You cannot paste images directly. Upload or insert images from URL.