
16-bit divide by 8



Since the project I'm working on is cart-based and there's lots of room, I use a 2KB LUT for bit shifts (all 256 values, all eight positions). Until recently I was using the fourth page of that table to perform 16-bit division by 8 (to translate the upper 13 bits of a 16-bit signed screen coordinate into a byte offset). Since the values in the LUT "wrapped around", I had to AND the LUT entry for the LSB with %00011111 and the LUT entry for the MSB with %11100000 before ORing the two together into the byte result.
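To make the masking concrete, here is a small host-side Python model of a table like the one described. It assumes the "wrapped around" entries are 8-bit right rotations, with page n (counting from 0) holding every byte rotated right by n bits, so the fourth page is a rotate by 3. The names `ROT_TABLE` and `div8_rotated` are made up for illustration.

```python
# Host-side model of a 2KB shift table whose entries "wrap around".
# Assumption: page n (0-7) holds each byte value rotated right by n bits.

def ror8(value: int, n: int) -> int:
    """Rotate an 8-bit value right by n bits."""
    n &= 7
    return ((value >> n) | (value << (8 - n))) & 0xFF

# 8 pages x 256 entries = 2048 bytes (hypothetical name)
ROT_TABLE = [[ror8(v, n) for v in range(256)] for n in range(8)]

def div8_rotated(value: int) -> int:
    """Low 8 bits of (16-bit value / 8), using the fourth page (rotate by 3)."""
    lo, hi = value & 0xFF, (value >> 8) & 0xFF
    page = ROT_TABLE[3]
    # LSB rotated right by 3: bits 4-0 hold LSB >> 3, bits 7-5 are wrapped junk.
    # MSB rotated right by 3: bits 7-5 hold the MSB's low 3 bits, bits 4-0 are junk.
    return (page[lo] & 0b00011111) | (page[hi] & 0b11100000)
```

For example, `div8_rotated(1000)` returns 125, matching `1000 // 8`. Without the two ANDs, the wrapped bits in each entry would land on top of the other entry's half of the result.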

 

With space being plentiful, I later figured I'd make two special divide-by-8 LUTs: one with the high bits of the result cleared, and the other with the low bits cleared. So the divide by 8 now looks like this:

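	lda cx1 ; LSB of 16-bit value
	tay ; keep LSB in Y as the table index
	and #7
	sta xpix ; pixel offset within the byte (value mod 8)
	lda DivLoTable,y ; LSB / 8 (top 3 bits clear)
	ldy cx1+1 ; MSB of 16-bit value
	ora DivHiTable,y ; OR in the MSB's low 3 bits, moved up to bits 7-5
	sta xbyte ; byte offset (value / 8, low 8 bits)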
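The two tables can be generated offline. Below is a hedged Python sketch that builds them under the obvious reading of the post (`DivLoTable[v] = v >> 3`, `DivHiTable[v] = (v << 5) & $FF`) and models the eight-instruction sequence, so it can be checked against plain division for every 16-bit input. The helper names and the ca65-style `.byte` output format are assumptions; adjust the directive for your assembler.

```python
# Offline generator and model for the two divide-by-8 tables.
# Assumed contents: DivLoTable[v] = v >> 3, DivHiTable[v] = (v << 5) & $FF.

DIV_LO_TABLE = [v >> 3 for v in range(256)]           # bits 7-5 always clear
DIV_HI_TABLE = [(v << 5) & 0xFF for v in range(256)]  # bits 4-0 always clear

def split_coord(cx1: int) -> tuple[int, int]:
    """Return (xbyte, xpix) the way the lda/and/ora sequence computes them."""
    lo = cx1 & 0xFF                                # lda cx1 / tay
    hi = (cx1 >> 8) & 0xFF                         # ldy cx1+1
    xpix = lo & 7                                  # and #7 / sta xpix
    xbyte = DIV_LO_TABLE[lo] | DIV_HI_TABLE[hi]    # lda DivLoTable,y / ora DivHiTable,y
    return xbyte, xpix

def as_byte_directives(table: list[int], per_line: int = 8) -> str:
    """Format a table as assembler data lines (ca65-style .byte, an assumption)."""
    lines = []
    for i in range(0, len(table), per_line):
        chunk = table[i:i + per_line]
        lines.append(".byte " + ",".join(f"${b:02X}" for b in chunk))
    return "\n".join(lines)
```

For instance, `split_coord(319)` gives `(39, 7)`: byte 39, pixel 7. Because the two tables never have a bit set in the same position, the ORA needs no masking at all.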

This is pretty quick, but does anyone have anything even faster?


Dealing with graphics, probably the only improvement you'd get is if it's a repeating process - only do lookup on the first go, then bump the pointers, masks etc. around for subsequent iterations.

 

I thought of using a similar process to improve the line-draw algorithm. It seems odd that so much effort has gone into optimising the calculation phase, and into tricks such as starting at both ends and meeting in the middle, while the missing piece is speeding up the actual point plot by just shuffling the address/mask/data around rather than recalculating it every time.
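As a rough illustration of "bump rather than recalculate", here is a Python sketch of the two cheap moves a plotter needs once the first address and mask are known. It assumes a 1-bit-per-pixel screen with 40 bytes per line and the leftmost pixel in bit 7 (as in ANTIC mode F); the function names are hypothetical.

```python
# Sketch: compute (offset, mask) once, then nudge it for neighbouring pixels.
# Assumes 1bpp, 40 bytes per line, leftmost pixel in bit 7.

BYTES_PER_LINE = 40

def locate(x: int, y: int) -> tuple[int, int]:
    """Full calculation: byte offset from screen start, and pixel mask."""
    return y * BYTES_PER_LINE + (x >> 3), 0x80 >> (x & 7)

def step_right(offset: int, mask: int) -> tuple[int, int]:
    """Move one pixel right: shift the mask, carry into the next byte."""
    mask >>= 1                  # an LSR of the mask on the 6502
    if mask == 0:               # pixel shifted out of bit 0
        return offset + 1, 0x80
    return offset, mask

def step_down(offset: int, mask: int) -> tuple[int, int]:
    """Move one line down: mask unchanged, pointer advances by one line."""
    return offset + BYTES_PER_LINE, mask
```

On the 6502, `step_right` corresponds to an `LSR` of the mask plus a branch on carry to the rare byte-advance path, and `step_down` is a 16-bit add of 40, so neither needs a multiply or a table lookup.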


Hi,

 

Another faster possibility is using the X coordinate as a 16 bit number already multiplied by 32, with the added advantage that you now have "subpixel" position.

 

Then, to get the byte offset and mask:

    lda cx1+1 ; MSB of X*32 is X/8, the byte offset
    sta xbyte
    ldx cx1 ; LSB: bits 7-5 = pixel within byte, bits 4-0 = subpixel
    lda MaskTable,x ; 256 byte table with masks, ignoring the lower 5 bits.
    sta mask

Daniel.
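Here is a hedged Python model of the pre-multiplied representation, checking that the high byte alone is the byte offset and that a 256-entry table indexed by the low byte gives the mask. It assumes 1 bit per pixel with bit 7 as the leftmost pixel; `MASK_TABLE`, `byte_and_mask` and `to_fixed` are illustrative names.

```python
# Model of keeping X as a 16-bit value pre-multiplied by 32 (X * 32).
# MSB = X / 8 = byte offset; LSB bits 7-5 = X mod 8; LSB bits 4-0 = subpixel.
# Assumes 1bpp with bit 7 as the leftmost pixel.

MASK_TABLE = [0x80 >> (i >> 5) for i in range(256)]  # ignores the low 5 bits

def byte_and_mask(cx1: int) -> tuple[int, int]:
    """Mirror of: lda cx1+1 / sta xbyte / ldx cx1 / lda MaskTable,x / sta mask."""
    xbyte = (cx1 >> 8) & 0xFF
    mask = MASK_TABLE[cx1 & 0xFF]
    return xbyte, mask

def to_fixed(x: int, subpixel: int = 0) -> int:
    """Pack pixel X (0-2047) and a 5-bit subpixel fraction into X * 32 form."""
    return (x << 5) | (subpixel & 0x1F)
```

Because one pixel is exactly 32 units, a line can step X by a fraction of a pixel with a single 16-bit add (adding 16 per row moves half a pixel), and the byte/mask extraction stays the same.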


> I thought of using a similar process to improve the line-draw algorithm. ... shuffling the address/mask/data around rather than recalculating it every time.

That's actually what I do in AltirraOS's line draw routine. It's much faster than using a putpixel routine -- about 3.6x faster at line draw and 15x faster at fill than the XL/XE OS -- but there's still a lot of overhead in doing the incremental updates. The 6502 is great at table lookups and sucks at 16-bit arithmetic.
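For comparison with a putpixel loop, here is a generic Bresenham line in Python that computes the byte address and mask once and then only nudges them. This is not AltirraOS's code, only a sketch of the same incremental idea, assuming a 1bpp screen with 40 bytes per line and bit 7 as the leftmost pixel; `draw_line` is a hypothetical name.

```python
# Generic Bresenham line into a 1bpp bitmap with incremental address/mask
# updates. Assumes 40 bytes per line and bit 7 = leftmost pixel.

BYTES_PER_LINE = 40

def draw_line(screen: bytearray, x0: int, y0: int, x1: int, y1: int) -> None:
    if x0 > x1:                                  # always walk left to right
        x0, y0, x1, y1 = x1, y1, x0, y0
    dx, dy = x1 - x0, abs(y1 - y0)
    row_step = BYTES_PER_LINE if y1 >= y0 else -BYTES_PER_LINE
    offset = y0 * BYTES_PER_LINE + (x0 >> 3)     # the only full address calculation
    mask = 0x80 >> (x0 & 7)
    x_major = dx >= dy
    major, minor = (dx, dy) if x_major else (dy, dx)
    err = major // 2
    for i in range(major + 1):
        screen[offset] |= mask                   # plot: OR the mask into the byte
        if i == major:
            break                                # last point plotted
        err -= minor
        minor_step = err < 0
        if minor_step:
            err += major
        if x_major or minor_step:                # one pixel right: shift the mask
            mask >>= 1
            if mask == 0:
                mask = 0x80
                offset += 1
        if not x_major or minor_step:            # one line up or down
            offset += row_step
```

The per-pixel work is a shift, a compare and an occasional 16-bit add, which is the kind of overhead that remains even after the putpixel call is gone.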


> Dealing with graphics, probably the only improvement you'd get is if it's a repeating process - only do lookup on the first go, then bump the pointers, masks etc. around for subsequent iterations.

Yep: I have a split LSB/MSB table of screen line start addresses, but it's probably quicker to load the pointer up beforehand and then bump it by forty rather than do an indexed lookup for every line (since the MSB only changes every six lines or so).
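A quick Python model of that trade-off: it builds split LSB/MSB tables of line start addresses and also walks a pointer down by adding 40 per line, counting how often the MSB actually changes. It assumes rows are contiguous and exactly 40 bytes apart, and `SCREEN_BASE` is a made-up address; the other names are hypothetical too.

```python
# Model of stepping a screen-line pointer by 40 versus a split LSB/MSB table.
# Assumptions: contiguous rows, 40 bytes each, 192 lines, made-up base address.

SCREEN_BASE = 0x4000   # hypothetical screen start
BYTES_PER_LINE = 40
LINES = 192

# Split tables of line start addresses, as stored for indexed lookup.
ROW_LO = [(SCREEN_BASE + y * BYTES_PER_LINE) & 0xFF for y in range(LINES)]
ROW_HI = [(SCREEN_BASE + y * BYTES_PER_LINE) >> 8 for y in range(LINES)]

def walk_rows(start_row: int, count: int) -> tuple[list[int], int]:
    """Load the pointer once, then add 40 per line (CLC / ADC #40 on the LSB,
    touching the MSB only when that add carries). Returns the addresses
    visited and how many times the MSB had to be incremented."""
    lo, hi = ROW_LO[start_row], ROW_HI[start_row]
    addresses, carries = [], 0
    for _ in range(count):
        addresses.append((hi << 8) | lo)
        lo += BYTES_PER_LINE
        if lo > 0xFF:            # carry out of the LSB
            lo &= 0xFF
            hi += 1              # INC of the MSB
            carries += 1
    return addresses, carries
```

With 40-byte lines the MSB changes only about once every 256 / 40 = 6.4 lines, which matches the "every six lines or so" estimate, so most lines cost just the add on the low byte.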

 

> Another faster possibility is using the X coordinate as a 16 bit number already multiplied by 32, with the added advantage that you now have "subpixel" position.

That's a cool idea.

 

> The 6502 is great at table lookups and sucks at 16-bit arithmetic.

That's why it's great having lots of space for LUTs. A LUT always seems to be the faster solution on the 6502. It'll be interesting to see how fast I can make diagonal line drawing, which I haven't coded up yet (horizontal and vertical lines are treated as memory fill operations).
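Treating a horizontal line as a fill is easy to sketch: partial masks for the first and last bytes, plain $FF stores in between. The Python below assumes a 1bpp, 40-bytes-per-line layout with bit 7 as the leftmost pixel; the edge tables and `hline` are illustrative names, not taken from the project.

```python
# Horizontal span as a memory fill on a 1bpp screen, 40 bytes per line,
# bit 7 = leftmost pixel. Edge tables are illustrative assumptions.

BYTES_PER_LINE = 40
LEFT_EDGE = [0xFF >> i for i in range(8)]                  # pixels i..7 of the first byte
RIGHT_EDGE = [(0xFF << (7 - i)) & 0xFF for i in range(8)]  # pixels 0..i of the last byte

def hline(screen: bytearray, x0: int, x1: int, y: int) -> None:
    if x0 > x1:
        x0, x1 = x1, x0
    row = y * BYTES_PER_LINE
    first, last = row + (x0 >> 3), row + (x1 >> 3)
    left, right = LEFT_EDGE[x0 & 7], RIGHT_EDGE[x1 & 7]
    if first == last:                   # span starts and ends in one byte
        screen[first] |= left & right
        return
    screen[first] |= left
    for i in range(first + 1, last):    # interior bytes: plain fill
        screen[i] = 0xFF
    screen[last] |= right
```

A vertical line is the same kind of loop with the mask held constant and the pointer stepped by 40 each line.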
