need multiply by 1.6 and by 2.6666... algorithms/routines

tschak909 · December 16, 2018

Hey guys,

Does anyone have a routine to multiply by 1.6 and/or 2.6666... and round up the answer? I am trying to replace two tables in my program, which will free up 1K of RAM.

-Thom

R0ger · December 16, 2018

How big are the numbers ? 1K ?

tschak909 · December 16, 2018

0-511.

You can see the table here:

https://github.com/tschak909/platoterm64/blob/master/src/atari/touch.c

So it scales a coordinate from 320,192 to 512,512.

-Thom

Rybags · December 16, 2018

What sort of numbers? Unsigned 16-bit binary? And done with assembler?

My thinking is you might need to do multiple steps.

For multiply by 1.6, I'd look for a divide by 5 routine. So, divide by 5 then add the answer 3 times to your number.

For multiply by 2.666~ I'd look for a divide by 3 routine. So, divide by 3 then double that. Then add your number in twice (quick mult by 2 would help too).

MaPa · December 16, 2018

Try to use OS math routines if possible.

flashjazzcat · December 16, 2018

Didn't we already replace the tables which did the reverse calculations? Anyway: I'd take the same approach here:

Multiply 320 by 8 (easy) then divide by 5

Multiply 192 by 8 (easy) then divide by 3

foft · December 16, 2018

Isnt it better to do a hard multiple and easy divide?

Eg multiply by 682 and divide by 256?

flashjazzcat · December 16, 2018

Probably yes: you want to end up dividing via shifts. The conversion in the opposite direction is easier (see the existing source code).

Try to use OS math routines if possible.

Unfortunately this will kill performance. The tables are already the most efficient solution, but if we're going to replace them with two calculations, we might as well code something bespoke which is as fast as possible.

Edited December 16, 2018 by flashjazzcat

+JAC! · December 16, 2018

I'd assume, this computation occurs very often and speed is crucial. Therefore I would'd give up the tables. I'd combine static CMP for the high byte (which will set CARRY and result in 0 or 1) with the lookup table for the low byte. This way your table size requirement is cut in half with just a few more cycles.

tschak909 · December 16, 2018

This calculation only happens when a pointer device button is pressed, because it converts touch calculations from Atari's coordinate system, to PLATO.

So virtually anything will help, as the tables are 1024 bytes in total, which is very critical space to reclaim.

-Thom

ivop · December 16, 2018

0-511.

You can see the table here:

https://github.com/tschak909/platoterm64/blob/master/src/atari/touch.c

So it scales a coordinate from 320,192 to 512,512.

-Thom

Another observation, all delta's in your table are less than 4, so you could compress the tables to 25% of its original size (8 bits --> 2 bits). All look-ups have to traverse the delta table until it reaches the desired value. On average, assuming all coordinates are equally likely, this needs 160 adds.

flashjazzcat · December 16, 2018

Totally untested, but should work with minor modifications (uses standard divide routine from Codebase64):

divisor = $80 
dividend = $82
remainder = $84
result = dividend	; save memory by reusing divident to store the result


	org $2000

.proc ConvertCoordinates
	lda cx+1	; process x
	sta dividend
	lda cx	; multiply x by 8
	asl
	rol dividend+1
	asl
	rol dividend+1
	asl
	rol dividend+1
	sta dividend
	
	mva #5 divisor
	jsr Divide	; leaves result in dividend

	mwa dividend NewX

	lda cy+1	; process y
	sta dividend+1
	lda cy
	asl
	rol dividend+1	; mult by 8
	asl
	rol dividend+1
	asl
	rol dividend+1
	sta dividend
	
	mva #3 divisor
	jsr Divide	; leaves result in dividend

	mwa dividend NewY
	
	rts
.endp


	
	
.proc Divide
	lda #0	; preset remainder to 0
	sta remainder
	sta remainder+1
	ldx #16		; repeat for each bit: ...

DivLoop
	asl dividend	; dividend lb & hb*2, msb -> Carry
	rol dividend+1	
	rol remainder	; remainder lb & hb * 2 + msb from carry
	rol remainder+1
	lda remainder
	sec
	sbc divisor	; divisor is always 8-bit
	tay		; lb result -> Y, for we may need it later
	lda remainder+1
	sbc #0
	bcc skip	; if c=0 then divisor didn't fit

	sta remainder+1	; else save substraction result as new remainder,
	sty remainder	
	inc result

skip
	dex
	bne divloop	
	rts
.endp

Edited December 16, 2018 by flashjazzcat

ivop · December 16, 2018

Yet another observation, the accuracy of the pointer/touch devices on an Atari 8-bit is pretty limited. Light pen, trackball, koala pad, PMG mouse pointer never exceed 160-positions accuracy. With this in mind, you could reduce the table size by 50% and still have instant look-ups.

Edited December 16, 2018 by ivop

phaeron · December 16, 2018

This type of scaling can be done with a fixed-point multiply: shift bits right from the source and shift result right while conditionally adding 160 or 96.

    ;0-511 input in A:X
    stx temp
    and #1
    ldx #9
bitloop:
    ror
    ror temp
    bcc no_bit
    adc #160-1
no_bit:
    dex
    bne bit_loop
    ;0-319 result in C:A

Mathy · December 16, 2018

Hello Thomas

Multiply by 1.6: Multiply by 16, then divide by 10

Sincerely

Mathy

flashjazzcat · December 16, 2018

Multiply by 1.6: Multiply by 16, then divide by 10

Which is the same as multiply by 8 and divide by 5.

Mathy · December 16, 2018

Hello Jon

Which is the same as multiply by 8 and divide by 5.

And the same as just multiplying by 1.6. :-) To the computer dividing by 5 might be as easy as dividing by 10, but to a human being, dividing by 10 seems easier than dividing by 5.

Sincerely

Mathy

flashjazzcat · December 16, 2018

To the computer dividing by 5 might be as easy as dividing by 10...

Hopefully dividing by 5 is easier for the computer than dividing by 10, and certainly multiplying by 8 is easier than multiplying by 16 (two fewer rotate instructions). This is the entire point about reducing the calculation to the lowest common denominator and avoiding decimal fractions, since it's the computer which will be doing the work.

Edited December 16, 2018 by flashjazzcat

Atari_Ace · December 16, 2018

Similar to ivop's idea, but with less of a performance hit (and not as good compression), for 1.6x, take the index and add it to a value from the table. Since the maximum delta is 192 (0.6 * 320), the table can be uint8_t instead of uint16_t, so 1 byte per entry, or 320 bytes.

Using logarithmic approaches can reduce the table size even further than ivop's idea but at higher and higher performance cost. For example, store all the values from 0 to 49 (0 => 79), that table takes 50 bytes. Set the result to 0, and add 80 every time you can subtract 50 from the input until the value is less than 50 (you might have to do this as many as 6 times for values 0-319). Then add the table result to the final result.

If you only need to do this calculation once, the table can be dropped down to as small as 5 bytes, and the code for computing the result is the same as above (only subtract 5 and add 3 ).

The same approach can be used for 8 / 3.

tschak909 · December 17, 2018

Performance isn't an issue with this particular routine, as it is only called when a mouse button is pressed (by mouse button I mean either a mouse, or tablet, or joystick or light pen button)

-Thom

Omegamatrix · December 17, 2018

I like flashcatjazz's code. I haven't tested it but it looks like it will work.

I did notice some repeated code, so in this case we could load X with the high byte and put the multiply by eight in front of the division routine. It would save 13 bytes I think.

divisor = $80 
dividend = $82
remainder = $84
result = dividend	; save memory by reusing divident to store the result


	org $2000

.proc ConvertCoordinates
	ldx cx+1	; process x
	lda cx		; multiply x by 8

	mva #5 divisor
	jsr MultEightThanDivide	; leaves result in dividend

	mwa dividend NewX

	ldx cy+1	; process y
	lda cy
	
	mva #3 divisor
	jsr MultEightThanDivide	; leaves result in dividend

	mwa dividend NewY
	
	rts
.endp



.proc MultEightThanDivide
	stx dividend
	asl
	rol dividend+1	; mult by 8
	asl
	rol dividend+1
	asl
	rol dividend+1
	sta dividend
	
.proc Divide
	lda #0	; preset remainder to 0
	sta remainder
	sta remainder+1
	ldx #16		; repeat for each bit: ...

DivLoop
	asl dividend	; dividend lb & hb*2, msb -> Carry
	rol dividend+1	
	rol remainder	; remainder lb & hb * 2 + msb from carry
	rol remainder+1
	lda remainder
	sec
	sbc divisor		; divisor is always 8-bit
	tay			; lb result -> Y, for we may need it later
	lda remainder+1
	sbc #0
	bcc skip		; if c=0 then divisor didn't fit

	sta remainder+1	; else save substraction result as new remainder,
	sty remainder	
	inc result

skip
	dex
	bne divloop	
	rts
.endp

Edited December 17, 2018 by Omegamatrix

sanny · December 17, 2018

I would avoid division entirely, and approximate the multiplicator with power of 2 values, e.g.

1.6 ~= 1 + 0.5 + 0.0625 + 0.03125 = 1 + 1/2 + 1/16 + 1/32 = 1.59375

This could be implemented by a collection of shifts and adds and no loop is required.

regards,

chris

Edited December 17, 2018 by sanny

carlsson · December 17, 2018

Phaeron's code is elegant, but doesn't it scale down from 512x512 to 320x192, which is the exact opposite of what Thom was looking for, scale up from 320x192 to 512x512? I'm sure it can be reversed in a simple way, though I'm a little too thick to figure out how.

Edited December 17, 2018 by carlsson

flashjazzcat · December 17, 2018

I did notice some repeated code, so in this case we could load X with the high byte and put the multiply by eight in front of the division routine. It would save 13 bytes I think.

I'm sure it could be shrunk and improved significantly, but it does basically work.

Phaeron's code is elegant, but doesn't it scale down from 512x512 to 320x192, which is the exact opposite of what Thom was looking for, scale up from 320x192 to 512x512?

I noticed that too, but wasn't sure and Avery's methodology is so far above my head that I didn't feel qualified to comment.

Rybags · December 17, 2018

This should be made into a programming comp... criteria being speed, accuracy and keeping the overall size under what a lookup table would be.

need multiply by 1.6 and by 2.6666... algorithms/routines

Recommended Posts

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Join the conversation

Recently Browsing 0 members