tschak909 Posted December 16, 2018 Share Posted December 16, 2018 Hey guys, Does anyone have a routine to multiply by 1.6 and/or 2.6666... and round up the answer? I am trying to replace two tables in my program, which will free up 1K of RAM. -Thom 1 Quote Link to comment Share on other sites More sharing options...
R0ger Posted December 16, 2018 Share Posted December 16, 2018 How big are the numbers ? 1K ? Quote Link to comment Share on other sites More sharing options...
tschak909 Posted December 16, 2018 Author Share Posted December 16, 2018 0-511. You can see the table here: https://github.com/tschak909/platoterm64/blob/master/src/atari/touch.c So it scales a coordinate from 320,192 to 512,512. -Thom Quote Link to comment Share on other sites More sharing options...
Rybags Posted December 16, 2018 Share Posted December 16, 2018 What sort of numbers? Unsigned 16-bit binary? And done with assembler? My thinking is you might need to do multiple steps. For multiply by 1.6, I'd look for a divide by 5 routine. So, divide by 5 then add the answer 3 times to your number. For multiply by 2.666~ I'd look for a divide by 3 routine. So, divide by 3 then double that. Then add your number in twice (quick mult by 2 would help too). Quote Link to comment Share on other sites More sharing options...
MaPa Posted December 16, 2018 Share Posted December 16, 2018 Try to use OS math routines if possible. 1 Quote Link to comment Share on other sites More sharing options...
flashjazzcat Posted December 16, 2018 Share Posted December 16, 2018 Didn't we already replace the tables which did the reverse calculations? Anyway: I'd take the same approach here: Multiply 320 by 8 (easy) then divide by 5 Multiply 192 by 8 (easy) then divide by 3 Quote Link to comment Share on other sites More sharing options...
foft Posted December 16, 2018 Share Posted December 16, 2018 Isnt it better to do a hard multiple and easy divide? Eg multiply by 682 and divide by 256? 2 Quote Link to comment Share on other sites More sharing options...
flashjazzcat Posted December 16, 2018 Share Posted December 16, 2018 (edited) Probably yes: you want to end up dividing via shifts. The conversion in the opposite direction is easier (see the existing source code). Try to use OS math routines if possible. Unfortunately this will kill performance. The tables are already the most efficient solution, but if we're going to replace them with two calculations, we might as well code something bespoke which is as fast as possible. Edited December 16, 2018 by flashjazzcat 1 Quote Link to comment Share on other sites More sharing options...
+JAC! Posted December 16, 2018 Share Posted December 16, 2018 I'd assume, this computation occurs very often and speed is crucial. Therefore I would'd give up the tables. I'd combine static CMP for the high byte (which will set CARRY and result in 0 or 1) with the lookup table for the low byte. This way your table size requirement is cut in half with just a few more cycles. Quote Link to comment Share on other sites More sharing options...
tschak909 Posted December 16, 2018 Author Share Posted December 16, 2018 This calculation only happens when a pointer device button is pressed, because it converts touch calculations from Atari's coordinate system, to PLATO. So virtually anything will help, as the tables are 1024 bytes in total, which is very critical space to reclaim. -Thom Quote Link to comment Share on other sites More sharing options...
ivop Posted December 16, 2018 Share Posted December 16, 2018 0-511. You can see the table here: https://github.com/tschak909/platoterm64/blob/master/src/atari/touch.c So it scales a coordinate from 320,192 to 512,512. -Thom Another observation, all delta's in your table are less than 4, so you could compress the tables to 25% of its original size (8 bits --> 2 bits). All look-ups have to traverse the delta table until it reaches the desired value. On average, assuming all coordinates are equally likely, this needs 160 adds. 1 Quote Link to comment Share on other sites More sharing options...
flashjazzcat Posted December 16, 2018 Share Posted December 16, 2018 (edited) Totally untested, but should work with minor modifications (uses standard divide routine from Codebase64): divisor = $80 dividend = $82 remainder = $84 result = dividend ; save memory by reusing divident to store the result org $2000 .proc ConvertCoordinates lda cx+1 ; process x sta dividend lda cx ; multiply x by 8 asl rol dividend+1 asl rol dividend+1 asl rol dividend+1 sta dividend mva #5 divisor jsr Divide ; leaves result in dividend mwa dividend NewX lda cy+1 ; process y sta dividend+1 lda cy asl rol dividend+1 ; mult by 8 asl rol dividend+1 asl rol dividend+1 sta dividend mva #3 divisor jsr Divide ; leaves result in dividend mwa dividend NewY rts .endp .proc Divide lda #0 ; preset remainder to 0 sta remainder sta remainder+1 ldx #16 ; repeat for each bit: ... DivLoop asl dividend ; dividend lb & hb*2, msb -> Carry rol dividend+1 rol remainder ; remainder lb & hb * 2 + msb from carry rol remainder+1 lda remainder sec sbc divisor ; divisor is always 8-bit tay ; lb result -> Y, for we may need it later lda remainder+1 sbc #0 bcc skip ; if c=0 then divisor didn't fit sta remainder+1 ; else save substraction result as new remainder, sty remainder inc result skip dex bne divloop rts .endp Edited December 16, 2018 by flashjazzcat 4 Quote Link to comment Share on other sites More sharing options...
ivop Posted December 16, 2018 Share Posted December 16, 2018 (edited) Yet another observation, the accuracy of the pointer/touch devices on an Atari 8-bit is pretty limited. Light pen, trackball, koala pad, PMG mouse pointer never exceed 160-positions accuracy. With this in mind, you could reduce the table size by 50% and still have instant look-ups. Edited December 16, 2018 by ivop 2 Quote Link to comment Share on other sites More sharing options...
phaeron Posted December 16, 2018 Share Posted December 16, 2018 This type of scaling can be done with a fixed-point multiply: shift bits right from the source and shift result right while conditionally adding 160 or 96. ;0-511 input in A:X stx temp and #1 ldx #9 bitloop: ror ror temp bcc no_bit adc #160-1 no_bit: dex bne bit_loop ;0-319 result in C:A 4 Quote Link to comment Share on other sites More sharing options...
Mathy Posted December 16, 2018 Share Posted December 16, 2018 Hello Thomas Multiply by 1.6: Multiply by 16, then divide by 10 Sincerely Mathy Quote Link to comment Share on other sites More sharing options...
flashjazzcat Posted December 16, 2018 Share Posted December 16, 2018 Multiply by 1.6: Multiply by 16, then divide by 10 Which is the same as multiply by 8 and divide by 5. Quote Link to comment Share on other sites More sharing options...
Mathy Posted December 16, 2018 Share Posted December 16, 2018 Hello Jon Which is the same as multiply by 8 and divide by 5. And the same as just multiplying by 1.6. :-) To the computer dividing by 5 might be as easy as dividing by 10, but to a human being, dividing by 10 seems easier than dividing by 5. Sincerely Mathy Quote Link to comment Share on other sites More sharing options...
flashjazzcat Posted December 16, 2018 Share Posted December 16, 2018 (edited) To the computer dividing by 5 might be as easy as dividing by 10... Hopefully dividing by 5 is easier for the computer than dividing by 10, and certainly multiplying by 8 is easier than multiplying by 16 (two fewer rotate instructions). This is the entire point about reducing the calculation to the lowest common denominator and avoiding decimal fractions, since it's the computer which will be doing the work. Edited December 16, 2018 by flashjazzcat 1 Quote Link to comment Share on other sites More sharing options...
Atari_Ace Posted December 16, 2018 Share Posted December 16, 2018 Similar to ivop's idea, but with less of a performance hit (and not as good compression), for 1.6x, take the index and add it to a value from the table. Since the maximum delta is 192 (0.6 * 320), the table can be uint8_t instead of uint16_t, so 1 byte per entry, or 320 bytes. Using logarithmic approaches can reduce the table size even further than ivop's idea but at higher and higher performance cost. For example, store all the values from 0 to 49 (0 => 79), that table takes 50 bytes. Set the result to 0, and add 80 every time you can subtract 50 from the input until the value is less than 50 (you might have to do this as many as 6 times for values 0-319). Then add the table result to the final result. If you only need to do this calculation once, the table can be dropped down to as small as 5 bytes, and the code for computing the result is the same as above (only subtract 5 and add 3 ). The same approach can be used for 8 / 3. 1 Quote Link to comment Share on other sites More sharing options...
tschak909 Posted December 17, 2018 Author Share Posted December 17, 2018 Performance isn't an issue with this particular routine, as it is only called when a mouse button is pressed (by mouse button I mean either a mouse, or tablet, or joystick or light pen button) -Thom 1 Quote Link to comment Share on other sites More sharing options...
Omegamatrix Posted December 17, 2018 Share Posted December 17, 2018 (edited) I like flashcatjazz's code. I haven't tested it but it looks like it will work. I did notice some repeated code, so in this case we could load X with the high byte and put the multiply by eight in front of the division routine. It would save 13 bytes I think. divisor = $80 dividend = $82 remainder = $84 result = dividend ; save memory by reusing divident to store the result org $2000 .proc ConvertCoordinates ldx cx+1 ; process x lda cx ; multiply x by 8 mva #5 divisor jsr MultEightThanDivide ; leaves result in dividend mwa dividend NewX ldx cy+1 ; process y lda cy mva #3 divisor jsr MultEightThanDivide ; leaves result in dividend mwa dividend NewY rts .endp .proc MultEightThanDivide stx dividend asl rol dividend+1 ; mult by 8 asl rol dividend+1 asl rol dividend+1 sta dividend .proc Divide lda #0 ; preset remainder to 0 sta remainder sta remainder+1 ldx #16 ; repeat for each bit: ... DivLoop asl dividend ; dividend lb & hb*2, msb -> Carry rol dividend+1 rol remainder ; remainder lb & hb * 2 + msb from carry rol remainder+1 lda remainder sec sbc divisor ; divisor is always 8-bit tay ; lb result -> Y, for we may need it later lda remainder+1 sbc #0 bcc skip ; if c=0 then divisor didn't fit sta remainder+1 ; else save substraction result as new remainder, sty remainder inc result skip dex bne divloop rts .endp Edited December 17, 2018 by Omegamatrix 1 Quote Link to comment Share on other sites More sharing options...
sanny Posted December 17, 2018 Share Posted December 17, 2018 (edited) I would avoid division entirely, and approximate the multiplicator with power of 2 values, e.g. 1.6 ~= 1 + 0.5 + 0.0625 + 0.03125 = 1 + 1/2 + 1/16 + 1/32 = 1.59375 This could be implemented by a collection of shifts and adds and no loop is required. regards, chris Edited December 17, 2018 by sanny Quote Link to comment Share on other sites More sharing options...
carlsson Posted December 17, 2018 Share Posted December 17, 2018 (edited) Phaeron's code is elegant, but doesn't it scale down from 512x512 to 320x192, which is the exact opposite of what Thom was looking for, scale up from 320x192 to 512x512? I'm sure it can be reversed in a simple way, though I'm a little too thick to figure out how. Edited December 17, 2018 by carlsson 1 Quote Link to comment Share on other sites More sharing options...
flashjazzcat Posted December 17, 2018 Share Posted December 17, 2018 I did notice some repeated code, so in this case we could load X with the high byte and put the multiply by eight in front of the division routine. It would save 13 bytes I think. I'm sure it could be shrunk and improved significantly, but it does basically work. Phaeron's code is elegant, but doesn't it scale down from 512x512 to 320x192, which is the exact opposite of what Thom was looking for, scale up from 320x192 to 512x512? I noticed that too, but wasn't sure and Avery's methodology is so far above my head that I didn't feel qualified to comment. Quote Link to comment Share on other sites More sharing options...
Rybags Posted December 17, 2018 Share Posted December 17, 2018 This should be made into a programming comp... criteria being speed, accuracy and keeping the overall size under what a lookup table would be. 1 Quote Link to comment Share on other sites More sharing options...
Recommended Posts
Join the conversation
You can post now and register later. If you have an account, sign in now to post with your account.
Note: Your post will require moderator approval before it will be visible.