flashjazzcat Posted September 25, 2014

Since the project I'm working on is cart-based and there's lots of room, I use a 2KB LUT for bit shifts (all 256 values, all eight positions). Until recently I was using the fourth page of that table to perform 16-bit division by 8 (to translate the upper 13 bits of a 16-bit signed screen coordinate into a byte offset), but since the values in the LUT "wrapped around", I had to AND the low shifted bits from the LUT with %00011111, and then AND the high bits from the LUT with %11100000 before ORing them into the byte result. With space being plentiful, I later figured I'd have two special divide-by-8 LUTs: one with the high bits of the result cleared, and the other with the low bits cleared. So the 16-bit divide by 8 now looks like this:

    lda cx1            ; LSB of 16-bit value
    tay
    and #7
    sta xpix           ; get pixel offset
    lda DivLoTable,y   ; get LSB divided by 8
    ldy cx1+1          ; MSB of 16-bit value
    ora DivHiTable,y   ; OR in MSB divided by 8
    sta xbyte          ; byte offset

This is pretty quick, but I wondered if anyone had anything even faster?
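For readers who want to check the two-table trick, here is a minimal Python sketch that models it. The table contents are an assumption inferred from the description above (each byte shifted right by three in one table, shifted left by five and truncated to eight bits in the other), and the helper name `divide_by_8` is hypothetical.

```python
# Model of the two 256-byte divide-by-8 tables. Contents are an assumption
# inferred from the post: high bits cleared in one, low bits in the other.
DIV_LO_TABLE = [value >> 3 for value in range(256)]           # %000xxxxx
DIV_HI_TABLE = [(value << 5) & 0xFF for value in range(256)]  # %xxx00000


def divide_by_8(coordinate):
    """Return (xbyte, xpix) for a 16-bit coordinate, mirroring the 6502 code."""
    lsb = coordinate & 0xFF          # lda cx1 / tay
    msb = (coordinate >> 8) & 0xFF   # ldy cx1+1
    xpix = lsb & 7                   # and #7 / sta xpix
    xbyte = DIV_LO_TABLE[lsb] | DIV_HI_TABLE[msb]  # lda ...,y / ora ...,y
    return xbyte, xpix
```

Because the result is stored in a single byte, only the low eight bits of the 13-bit quotient survive; the sketch reproduces that behaviour rather than the full quotient.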
Rybags Posted September 26, 2014

Dealing with graphics, probably the only improvement you'd get is if it's a repeating process: only do the lookup on the first go, then bump the pointers, masks etc. around for subsequent iterations. I thought of using a similar process to improve the line-draw algorithm. It seems weird that so much effort has been put into optimising the calculation phase, and into the start-at-the-ends, meet-in-the-middle method, but the thing missing is improving the actual point plot by just shuffling the address/mask/data around rather than recalculating it every time.
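The idea of bumping the address and mask instead of recalculating them can be sketched as follows. This is only an illustration in Python, assuming one bit per pixel with the leftmost pixel in bit 7; the function names `plot_position` and `step_right` are hypothetical.

```python
def plot_position(x):
    """Full recalculation: byte offset and bit mask for pixel column x."""
    return x >> 3, 0x80 >> (x & 7)


def step_right(xbyte, mask):
    """Incremental update: move one pixel right without recalculating."""
    mask >>= 1
    if mask == 0:      # shifted past bit 0, so move on to the next byte
        mask = 0x80
        xbyte += 1
    return xbyte, mask
```

On the 6502 the incremental step would be a shift plus an occasional pointer increment, which is the saving being described; the full recalculation is only needed for the first point.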
dmsc Posted September 26, 2014

Hi,

Another, faster possibility is to store the X coordinate as a 16-bit number already multiplied by 32, with the added advantage that you now have a "subpixel" position. Then, to get the address and mask, you have:

    lda cx1+1
    sta xbyte
    ldx cx1
    lda MaskTable,x    ; 256 byte table with masks, ignoring the lower 5 bits.
    sta mask

Daniel.
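A small Python model of the pre-multiplied coordinate may help show why no division is needed at all. It assumes one bit per pixel with the leftmost pixel in bit 7 and X stored as pixel * 32 + subpixel (five fractional bits); the mask table contents and the name `address_and_mask` are assumptions, not taken from the post.

```python
# Assumed layout of the 16-bit X value: pixel * 32 + subpixel.
# The top three bits of the low byte select the pixel within the byte;
# the lower five bits are the subpixel fraction and are ignored.
MASK_TABLE = [0x80 >> (low >> 5) for low in range(256)]


def address_and_mask(fixed_x):
    """Return (xbyte, mask) for an X coordinate pre-multiplied by 32."""
    xbyte = (fixed_x >> 8) & 0xFF       # high byte is already pixel // 8
    mask = MASK_TABLE[fixed_x & 0xFF]   # one lookup gives the pixel mask
    return xbyte, mask
```

With this layout the byte offset is simply the high byte, so the whole address-and-mask calculation reduces to two loads, one table lookup and two stores.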
phaeron Posted September 26, 2014

Rybags wrote: "Dealing with graphics, probably the only improvement you'd get is if it's a repeating process: only do the lookup on the first go, then bump the pointers, masks etc. around for subsequent iterations. I thought of using a similar process to improve the line-draw algorithm. It seems weird that so much effort has been put into optimising the calculation phase, and into the start-at-the-ends, meet-in-the-middle method, but the thing missing is improving the actual point plot by just shuffling the address/mask/data around rather than recalculating it every time."

That's actually what I do in AltirraOS's line draw routine. It's much faster than using a putpixel routine -- about 3.6x faster at line draw and 15x faster at fill than the XL/XE OS -- but there's still a lot of overhead in doing the incremental updates. The 6502 is great at table lookups and sucks at 16-bit arithmetic.
flashjazzcat Posted September 26, 2014

Rybags wrote: "Dealing with graphics, probably the only improvement you'd get is if it's a repeating process: only do the lookup on the first go, then bump the pointers, masks etc. around for subsequent iterations."

Yep: I have a split LSB/MSB table of screen line start addresses, but it's probably quicker to load the pointer up beforehand and then bump it by forty rather than do an indexed lookup for every line (since the MSB only changes every six lines or so).

dmsc wrote: "Another, faster possibility is to store the X coordinate as a 16-bit number already multiplied by 32, with the added advantage that you now have a "subpixel" position. Then, to get the address and mask, you have:

    lda cx1+1
    sta xbyte
    ldx cx1
    lda MaskTable,x    ; 256 byte table with masks, ignoring the lower 5 bits.
    sta mask"

That's a cool idea.

phaeron wrote: "The 6502 is great at table lookups and sucks at 16-bit arithmetic."

That's why it's great having lots of space for LUTs. A LUT always seems to be the faster solution on the 6502. It'll be interesting to see how fast I can make diagonal line drawing, which I haven't coded up yet (horizontal and vertical lines are treated as memory fill operations).
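The trade-off between the split line-address table and bumping a pointer by forty can be modelled like this. It is a rough Python sketch only: the screen base address, the line count of 192 and the function name are all hypothetical values chosen for illustration.

```python
SCREEN_BASE = 0x4000    # hypothetical screen address, for illustration only
BYTES_PER_LINE = 40     # forty bytes per line, as in the post
LINES = 192             # assumed number of screen lines

# Split LSB/MSB tables of line start addresses.
LINE_ADDR_LO = [(SCREEN_BASE + y * BYTES_PER_LINE) & 0xFF for y in range(LINES)]
LINE_ADDR_HI = [(SCREEN_BASE + y * BYTES_PER_LINE) >> 8 for y in range(LINES)]


def line_pointers_incremental(first_line, count):
    """Load the pointer once from the tables, then bump it by 40 per line."""
    lo = LINE_ADDR_LO[first_line]
    hi = LINE_ADDR_HI[first_line]
    pointers = []
    for _ in range(count):
        pointers.append((hi << 8) | lo)
        lo += BYTES_PER_LINE
        if lo > 0xFF:       # carry out of the LSB: only now touch the MSB
            lo &= 0xFF
            hi += 1
    return pointers
```

Since 256 / 40 is 6.4, the carry branch is taken roughly once every six lines, which matches the estimate above that the MSB rarely changes.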
+MrFish Posted September 26, 2014

flashjazzcat wrote: "It'll be interesting to see how fast I can make diagonal line drawing, which I haven't coded up yet (horizontal and vertical lines are treated as memory fill operations)."

Peteym5 posted the sources for a bunch of line-drawing experiments he did a little while back: Fast Line Drawing
flashjazzcat Posted September 26, 2014

+MrFish wrote: "Peteym5 posted the sources for a bunch of line-drawing experiments he did a little while back:"

Yep - I downloaded them as soon as he published them but haven't had occasion to look at them yet. I'll be sure to check them out when I start coding up the graphics toolbox.