Jump to content
IGNORED

need multiply by 1.6 and by 2.6666... algorithms/routines


Recommended Posts

What sort of numbers? Unsigned 16-bit binary? And done with assembler?

 

My thinking is you might need to do multiple steps.

 

For multiply by 1.6, I'd look for a divide by 5 routine. So, divide by 5 then add the answer 3 times to your number.

 

For multiply by 2.666~ I'd look for a divide by 3 routine. So, divide by 3 then double that. Then add your number in twice (quick mult by 2 would help too).

Link to comment
Share on other sites

Probably yes: you want to end up dividing via shifts. The conversion in the opposite direction is easier (see the existing source code).

 

Try to use OS math routines if possible.

Unfortunately this will kill performance. The tables are already the most efficient solution, but if we're going to replace them with two calculations, we might as well code something bespoke which is as fast as possible.

Edited by flashjazzcat
  • Like 1
Link to comment
Share on other sites

I'd assume, this computation occurs very often and speed is crucial. Therefore I would'd give up the tables. I'd combine static CMP for the high byte (which will set CARRY and result in 0 or 1) with the lookup table for the low byte. This way your table size requirement is cut in half with just a few more cycles.

Link to comment
Share on other sites

0-511.

 

You can see the table here:

https://github.com/tschak909/platoterm64/blob/master/src/atari/touch.c

 

So it scales a coordinate from 320,192 to 512,512.

 

-Thom

 

Another observation, all delta's in your table are less than 4, so you could compress the tables to 25% of its original size (8 bits --> 2 bits). All look-ups have to traverse the delta table until it reaches the desired value. On average, assuming all coordinates are equally likely, this needs 160 adds.

  • Like 1
Link to comment
Share on other sites

Totally untested, but should work with minor modifications (uses standard divide routine from Codebase64):

 

divisor = $80 
dividend = $82
remainder = $84
result = dividend	; save memory by reusing divident to store the result


	org $2000

.proc ConvertCoordinates
	lda cx+1	; process x
	sta dividend
	lda cx	; multiply x by 8
	asl
	rol dividend+1
	asl
	rol dividend+1
	asl
	rol dividend+1
	sta dividend
	
	mva #5 divisor
	jsr Divide	; leaves result in dividend

	mwa dividend NewX

	lda cy+1	; process y
	sta dividend+1
	lda cy
	asl
	rol dividend+1	; mult by 8
	asl
	rol dividend+1
	asl
	rol dividend+1
	sta dividend
	
	mva #3 divisor
	jsr Divide	; leaves result in dividend

	mwa dividend NewY
	
	rts
.endp


	
	
.proc Divide
	lda #0	; preset remainder to 0
	sta remainder
	sta remainder+1
	ldx #16		; repeat for each bit: ...

DivLoop
	asl dividend	; dividend lb & hb*2, msb -> Carry
	rol dividend+1	
	rol remainder	; remainder lb & hb * 2 + msb from carry
	rol remainder+1
	lda remainder
	sec
	sbc divisor	; divisor is always 8-bit
	tay		; lb result -> Y, for we may need it later
	lda remainder+1
	sbc #0
	bcc skip	; if c=0 then divisor didn't fit

	sta remainder+1	; else save substraction result as new remainder,
	sty remainder	
	inc result

skip
	dex
	bne divloop	
	rts
.endp
Edited by flashjazzcat
  • Like 4
Link to comment
Share on other sites

Yet another observation, the accuracy of the pointer/touch devices on an Atari 8-bit is pretty limited. Light pen, trackball, koala pad, PMG mouse pointer never exceed 160-positions accuracy. With this in mind, you could reduce the table size by 50% and still have instant look-ups.

Edited by ivop
  • Like 2
Link to comment
Share on other sites

This type of scaling can be done with a fixed-point multiply: shift bits right from the source and shift result right while conditionally adding 160 or 96.

    ;0-511 input in A:X
    stx temp
    and #1
    ldx #9
bitloop:
    ror
    ror temp
    bcc no_bit
    adc #160-1
no_bit:
    dex
    bne bit_loop
    ;0-319 result in C:A

  • Like 4
Link to comment
Share on other sites

Hello Jon

 

Which is the same as multiply by 8 and divide by 5. :)

 

And the same as just multiplying by 1.6. :-) To the computer dividing by 5 might be as easy as dividing by 10, but to a human being, dividing by 10 seems easier than dividing by 5.

 

Sincerely

 

Mathy

Link to comment
Share on other sites

To the computer dividing by 5 might be as easy as dividing by 10...

Hopefully dividing by 5 is easier for the computer than dividing by 10, and certainly multiplying by 8 is easier than multiplying by 16 (two fewer rotate instructions). This is the entire point about reducing the calculation to the lowest common denominator and avoiding decimal fractions, since it's the computer which will be doing the work.

Edited by flashjazzcat
  • Like 1
Link to comment
Share on other sites

Similar to ivop's idea, but with less of a performance hit (and not as good compression), for 1.6x, take the index and add it to a value from the table. Since the maximum delta is 192 (0.6 * 320), the table can be uint8_t instead of uint16_t, so 1 byte per entry, or 320 bytes.

Using logarithmic approaches can reduce the table size even further than ivop's idea but at higher and higher performance cost. For example, store all the values from 0 to 49 (0 => 79), that table takes 50 bytes. Set the result to 0, and add 80 every time you can subtract 50 from the input until the value is less than 50 (you might have to do this as many as 6 times for values 0-319). Then add the table result to the final result.

If you only need to do this calculation once, the table can be dropped down to as small as 5 bytes, and the code for computing the result is the same as above (only subtract 5 and add 3 ).

The same approach can be used for 8 / 3.

  • Like 1
Link to comment
Share on other sites

I like flashcatjazz's code. I haven't tested it but it looks like it will work. :)

 

I did notice some repeated code, so in this case we could load X with the high byte and put the multiply by eight in front of the division routine. It would save 13 bytes I think.

divisor = $80 
dividend = $82
remainder = $84
result = dividend	; save memory by reusing divident to store the result


	org $2000

.proc ConvertCoordinates
	ldx cx+1	; process x
	lda cx		; multiply x by 8

	mva #5 divisor
	jsr MultEightThanDivide	; leaves result in dividend

	mwa dividend NewX

	ldx cy+1	; process y
	lda cy
	
	mva #3 divisor
	jsr MultEightThanDivide	; leaves result in dividend

	mwa dividend NewY
	
	rts
.endp



.proc MultEightThanDivide
	stx dividend
	asl
	rol dividend+1	; mult by 8
	asl
	rol dividend+1
	asl
	rol dividend+1
	sta dividend
	
.proc Divide
	lda #0	; preset remainder to 0
	sta remainder
	sta remainder+1
	ldx #16		; repeat for each bit: ...

DivLoop
	asl dividend	; dividend lb & hb*2, msb -> Carry
	rol dividend+1	
	rol remainder	; remainder lb & hb * 2 + msb from carry
	rol remainder+1
	lda remainder
	sec
	sbc divisor		; divisor is always 8-bit
	tay			; lb result -> Y, for we may need it later
	lda remainder+1
	sbc #0
	bcc skip		; if c=0 then divisor didn't fit

	sta remainder+1	; else save substraction result as new remainder,
	sty remainder	
	inc result

skip
	dex
	bne divloop	
	rts
.endp
Edited by Omegamatrix
  • Like 1
Link to comment
Share on other sites

I would avoid division entirely, and approximate the multiplicator with power of 2 values, e.g.

1.6 ~= 1 + 0.5 + 0.0625 + 0.03125 = 1 + 1/2 + 1/16 + 1/32 = 1.59375

This could be implemented by a collection of shifts and adds and no loop is required.

 

regards,

chris

Edited by sanny
Link to comment
Share on other sites

Phaeron's code is elegant, but doesn't it scale down from 512x512 to 320x192, which is the exact opposite of what Thom was looking for, scale up from 320x192 to 512x512? I'm sure it can be reversed in a simple way, though I'm a little too thick to figure out how.

Edited by carlsson
  • Like 1
Link to comment
Share on other sites

I did notice some repeated code, so in this case we could load X with the high byte and put the multiply by eight in front of the division routine. It would save 13 bytes I think.

I'm sure it could be shrunk and improved significantly, but it does basically work.

 

Phaeron's code is elegant, but doesn't it scale down from 512x512 to 320x192, which is the exact opposite of what Thom was looking for, scale up from 320x192 to 512x512?

I noticed that too, but wasn't sure and Avery's methodology is so far above my head that I didn't feel qualified to comment. :)

Link to comment
Share on other sites

Join the conversation

You can post now and register later. If you have an account, sign in now to post with your account.
Note: Your post will require moderator approval before it will be visible.

Guest
Reply to this topic...

×   Pasted as rich text.   Paste as plain text instead

  Only 75 emoji are allowed.

×   Your link has been automatically embedded.   Display as a link instead

×   Your previous content has been restored.   Clear editor

×   You cannot paste images directly. Upload or insert images from URL.

Loading...
  • Recently Browsing   0 members

    • No registered users viewing this page.
×
×
  • Create New...