analmux Posted January 20, 2011

I think we still need that LUT for 6-bit ROR, don't we? In the example I was mentioning we already have a 2- and a 5-step self-ROL table. 2 * ROL = 6 * ROR.

Rybags Posted January 20, 2011

OK - forgot that. Getting late. I don't think left rotates are needed anyway for character rendering. Or is there some other purpose?

analmux Posted January 20, 2011

That's why I mentioned the duality. A type (1,2,5) LUT of multiple ROL instructions is equivalent to a type (7,6,3) (= (3,6,7)) LUT of multiple ROR instructions. So it doesn't really matter. But, yes, when applying it to the mod 8 rule, we should express everything in number of ROR steps. Then 7 * ROR = 1 * ROL, which can be executed directly with the {CMP #$80: ROL @} or {ASL @: ADC #$00} sequence. Then the 2 * ROL table will be translated to a 6 * ROR table, etc.

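The duality above can be checked mechanically. What follows is a minimal Python sketch, not code from the thread: it models 8-bit rotates and the two one-step ROL sequences, assuming binary (non-decimal) mode. The helper names (rol8, ror8, cmp_rol, asl_adc, rol_steps_as_ror) are made up for illustration.

```python
def rol8(value, steps=1):
    """Rotate an 8-bit value left by `steps` bits (carry not involved)."""
    steps %= 8
    return ((value << steps) | (value >> (8 - steps))) & 0xFF


def ror8(value, steps=1):
    """Rotate an 8-bit value right by `steps` bits (carry not involved)."""
    steps %= 8
    return ((value >> steps) | (value << (8 - steps))) & 0xFF


def rol_steps_as_ror(rol_steps):
    """Express a number of ROL steps as the equivalent number of ROR steps (mod 8)."""
    return (8 - rol_steps) % 8


def cmp_rol(a):
    """Model of CMP #$80 : ROL A."""
    carry = 1 if a >= 0x80 else 0      # CMP sets carry when A >= operand
    return ((a << 1) | carry) & 0xFF   # ROL A shifts that carry into bit 0


def asl_adc(a):
    """Model of ASL A : ADC #$00 (binary mode assumed)."""
    carry = a >> 7                     # ASL moves bit 7 into carry
    a = (a << 1) & 0xFF                # bit 0 is now clear
    return (a + carry) & 0xFF          # ADC #$00 adds the carry back into bit 0


if __name__ == "__main__":
    print([rol_steps_as_ror(n) for n in (1, 2, 5)])
    print(f"{cmp_rol(0b11111100):08b}", f"{ror8(0b11111100, 7):08b}")
```

Both one-step sequences only ever touch bit 0 with the carry, which is why they behave as a pure 8-bit rotate rather than a 9-bit rotate through carry.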
flashjazzcat (Author) Posted January 20, 2011

Well... I'm lost. I had a thought. Say the character data is:

    1111 1100

I want to rotate this, say, 5 places to the right across TEMP and TEMP+1. If the bit shift table is set up so that low-order bits fall off the right and go straight back into the high-order bits of a byte, I could use a mask to take care of TEMP+1.

    ldx data
    lda shift_5,x

This would result in:

    1110 0111

What I want to see is:

    TEMP       TEMP+1
    0000 0111  1110 0000

How about using a table of masks to take the high-order bits of TEMP and place them where they need to go:

    sta temp
    ldx pixels        ; "pixels" being the number of shifts we want, in this case 5
    and masktab-1,x   ; mask out low order bits for TEMP+1
    sta temp+1
    lda temp
    and masktab2-1,x  ; mask out high order bits for TEMP
    sta temp

    masktab:
        .byte %10000000
        .byte %11000000
        .byte %11100000
        .byte %11110000
        .byte %11111000
        .byte %11111100
        .byte %11111110

    masktab2:
        .byte %01111111
        .byte %00111111
        .byte %00011111
        .byte %00001111
        .byte %00000111
        .byte %00000011
        .byte %00000001

In the above example:

    ldx data          ; x = 1111 1100
    lda shift_5,x     ; a = 1110 0111
    sta temp
    ldx pixels        ; x = 5
    and masktab-1,x   ; 1110 0111 & 1111 1000 = 1110 0000
    sta temp+1
    lda temp
    and masktab2-1,x  ; 1110 0111 & 0000 0111 = 0000 0111
    sta temp

This still doesn't take care of how we quickly arrive at "lda shift_5,x" without some kind of jump table or branching.

Edited January 20, 2011 by flashjazzcat

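A small Python model of the masking idea above, offered as a sketch rather than as the routine itself. The two tables mirror masktab and masktab2 from the post; ror8 is a stand-in for the "lda shift_N,x" table lookup, and split_shift is a hypothetical helper name.

```python
MASKTAB = [0x80, 0xC0, 0xE0, 0xF0, 0xF8, 0xFC, 0xFE]   # keeps the bits that wrapped round (TEMP+1)
MASKTAB2 = [0x7F, 0x3F, 0x1F, 0x0F, 0x07, 0x03, 0x01]  # keeps the bits that stayed put (TEMP)


def ror8(value, steps):
    """8-bit rotate right; models one entry of a pre-rotated shift table."""
    steps %= 8
    return ((value >> steps) | (value << (8 - steps))) & 0xFF


def split_shift(data, pixels):
    """Return (TEMP, TEMP+1) for `data` shifted right by 1..7 pixels."""
    rotated = ror8(data, pixels)              # lda shift_N,x
    temp_plus_1 = rotated & MASKTAB[pixels - 1]    # and masktab-1,x
    temp = rotated & MASKTAB2[pixels - 1]          # and masktab2-1,x
    return temp, temp_plus_1


if __name__ == "__main__":
    temp, temp_plus_1 = split_shift(0b11111100, 5)
    print(f"{temp:08b} {temp_plus_1:08b}")
```

The pair of bytes is the same as a true 16-bit right shift of the source byte, so one rotate plus two ANDs can stand in for the two-byte shift loop.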
danwinslow Posted January 20, 2011

> Well... I'm lost.

Lol. You aren't the only one. As far as I know, ROL 6 * LUT = lol.

Edited January 20, 2011 by danwinslow

Rybags Posted January 20, 2011

Yes, the table-lookup isn't exactly automatic, you'd either need self-modifying code or a branch decision taking you to the right place.

The mask tables like you described are used to discriminate the parts of the rotated byte you put in each bitmap cell. No point doing 2 shift operations on your source data when you can just do the one and use masking.

Another shortcut from the 6502 Killer hacks thread that might come in handy:

Perform a double-rotate left (with carry):

    asl a
    adc #$80
    rol a

With an extra rol a that should = a 6 bit shift right, cost = 8 cycles.

To swap nybbles (same as a 4-bit rotate without carry):

    asl a
    adc #$80
    rol a
    asl a
    adc #$80
    rol a

I suppose the method you employ in the end will come down to cycle-counting and memory considerations. The other headache would be that you have the proportional font spacing calculations ongoing, not to mention having to do bounds testing.

Edited January 20, 2011 by Rybags

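To see what the ASL / ADC #$80 / ROL sequence does to the accumulator, here is a minimal Python model of the three instructions, assuming binary (non-decimal) mode. It is a sketch for checking the bit patterns, not an emulator; the function names are illustrative.

```python
def asl(a):
    """ASL A: returns (result, carry). Bit 7 moves into carry, bit 0 becomes 0."""
    return (a << 1) & 0xFF, a >> 7


def adc(a, operand, carry):
    """ADC in binary mode: returns (result, carry out)."""
    total = a + operand + carry
    return total & 0xFF, total >> 8


def rol(a, carry):
    """ROL A: carry enters bit 0, bit 7 leaves into carry."""
    return ((a << 1) | carry) & 0xFF, a >> 7


def rotate_left_2(a):
    """Model of: asl a / adc #$80 / rol a."""
    a, c = asl(a)            # old bit 7 is now in carry
    a, c = adc(a, 0x80, c)   # carry lands in bit 0; old bit 6 is pushed out into carry
    a, c = rol(a, c)         # old bit 6 enters bit 0
    return a


def swap_nybbles(a):
    """Model of the six-instruction sequence: the trick applied twice."""
    return rotate_left_2(rotate_left_2(a))


if __name__ == "__main__":
    print(f"{rotate_left_2(0b10000001):08b}")
    print(f"{swap_nybbles(0xA5):02X}")
```

In this model the three-instruction sequence on its own already behaves as an 8-bit rotate left by 2 (equivalently, right by 6), and by the documented timings its three two-cycle instructions take 6 cycles; the nybble swap takes 12.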
flashjazzcat (Author) Posted January 20, 2011

> Yes, the table-lookup isn't exactly automatic, you'd either need self-modifying code or a branch decision taking you to the right place.

It gets less and less enticing by the minute, although I suspect the saving might start to show up with 24 bit wide fonts.

> The mask tables like you described are used to discriminate the parts of the rotated byte you put in each bitmap cell. No point doing 2 shift operations on your source data when you can just do the one and use masking.

I know this: that's why I wrote the routine and posted it here.

> Another shortcut from the 6502 Killer hacks thread that might come in handy:
>
> Perform a double-rotate left (with carry):
>
>     asl a
>     adc #$80
>     rol a
>
> With an extra rol a that should = a 6 bit shift right, cost = 8 cycles.
>
> To swap nybbles (same as a 4-bit rotate without carry):
>
>     asl a
>     adc #$80
>     rol a
>     asl a
>     adc #$80
>     rol a

That nybble swap is pretty neat. I could probably use that in the fixed-width 4-bit 80 column routine somewhere.

> I suppose the method you employ in the end will come down to cycle-counting and memory considerations. The other headache would be that you have the proportional font spacing calculations ongoing, not to mention having to do bounds testing.

Indeed so. I also considered simply unrolling the bitshifting loop and jumping into the code at a point which would yield the desired number of in-line shifts.

Now I have MADS compiler trouble... it doesn't like my structs.

Rybags Posted January 21, 2011

With the bit rotating for character renders, you have to calculate the # of rotations anyway, regardless of the method you choose to do the rotation and bit extraction. So since you have that number, it'd be a simple case of just doing something to it then using it as a Branch offset, or modifier to a JMP instruction.

flashjazzcat (Author) Posted January 21, 2011

I was thinking of using indirect indexed mode, and storing #shifts (plus a base offset) into the MSB of the pointer. Then we could say:

    lda #< shift_0
    sta ptr
    ...
    lda #> shift_0
    clc
    adc shifts
    sta ptr+1
    ldy char
    lda (ptr),y

Also, the masking technique we discussed earlier extends nicely across 16 bit and 24 bit wide characters; one just continues to mask the current byte's high order bits and place them in the next byte along. I really think this will come into its own when dealing with the larger characters.

I was thinking about italic characters, too. With those, the background mask changes on every alternate line of the character, so the fact I've already eliminated the bit shifts in the background masking is good news. When italicised, the upper area of a seven bit character can extend past the right hand side of a sixteen bit range, so I guess I'll have the chance to try out these optimisations soon.

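The pointer trick above relies on each 256-byte shift table occupying its own page, so that adding the shift count to the pointer's high byte selects a table and Y selects the entry. A minimal Python model follows, under two stated assumptions that are not in the post: the tables start on a page boundary, and they sit at the hypothetical page $40.

```python
PAGE = 256
SHIFT_BASE_PAGE = 0x40   # hypothetical high byte of shift_0; assumes page alignment


def ror8(value, steps):
    """8-bit rotate right, used here only to fill the tables."""
    steps %= 8
    return ((value >> steps) | (value << (8 - steps))) & 0xFF


def build_memory():
    """Lay out eight consecutive 256-byte tables: shift_0, shift_1, ... shift_7."""
    memory = bytearray(64 * 1024)
    for shifts in range(8):
        base = (SHIFT_BASE_PAGE + shifts) * PAGE
        for value in range(256):
            memory[base + value] = ror8(value, shifts)
    return memory


def lookup(memory, shifts, char):
    """Model of the pointer set-up followed by lda (ptr),y."""
    ptr_lo = 0x00                        # lda #<shift_0 : sta ptr (0 when page aligned)
    ptr_hi = SHIFT_BASE_PAGE + shifts    # lda #>shift_0 : clc : adc shifts : sta ptr+1
    address = (ptr_hi << 8) | ptr_lo
    return memory[address + char]        # ldy char : lda (ptr),y


if __name__ == "__main__":
    mem = build_memory()
    print(f"{lookup(mem, 5, 0b11111100):08b}")
```

With this layout the choice of table costs one addition when the pointer is set up, and no branch or self-modifying code is needed per byte.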
analmux Posted January 21, 2011

Have a look here by the way: http://www.atariage.com/forums/topic/157385-release-of-seitensprung-v015-5th-place-abbuc-sc/page__view__findpost__p__2189142

Seems someone already did a speed-optimization.

flashjazzcat (Author) Posted January 21, 2011

> Have a look here by the way: http://www.atariage....ost__p__2189142
>
> Seems someone already did a speed-optimization.

Indeed so. The author offered me some source code a year or so back but I couldn't make it worth his while.

Edited January 21, 2011 by flashjazzcat

flashjazzcat (Author) Posted January 21, 2011

OK. I've implemented the shift tables using the following method:

    ; set up bit shift pointer at top of character render routine
    lda xpix
    clc
    adc #> shift_table
    sta shiftptr+1      ; index into shift tables

And this code replaces the bit shifting loop:

    ldy temp            ; character bitmap data
    lda (shiftptr),y
    sta temp
    ldy xpix
    and xpixtab-1,y
    sta temp+1
    lda temp
    and rightpixtab-1,y
    sta temp

This code is skipped if there are no bit shifts required. I calculate that at best this code executes in 31 cycles, and at worst 34. Contrasted with the bit shifting loop:

    lda temp
    shiflp
    lsr temp
    lsr
    ror temp+1
    dex
    bne shiflp
    sta temp

This loop executes in 16 or 17 cycles, depending on whether the branch crosses a page boundary. Assuming it doesn't, at best it's 16 cycles (1 shift right), and at worst (7 shifts right) it's 112 cycles. So - if my math is correct - that's an average of 64 cycles. Already a considerable improvement, I think, and it will be even more pronounced when the source data is sixteen bits (or more) wide, requiring two "passes" through the shift logic.

Attachment: fontrender1.wmv

Edited January 21, 2011 by flashjazzcat

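The comparison above can be reproduced with a small cycle-count model. This Python sketch uses the documented 6502 timings and takes the loop body exactly as posted (lsr temp, lsr, ror temp+1, dex, bne); the lda/sta around the loop are left out, as they appear to be in the post's figures. The function names are illustrative, and the model is an estimate, not a measurement.

```python
# Documented 6502 cycle counts for the addressing modes used in the post.
LDY_ZP, LDA_ZP, STA_ZP = 3, 3, 3
LDA_IND_Y = 5        # +1 if the indexed access crosses a page
AND_ABS_Y = 4        # +1 if the indexed access crosses a page
LSR_ZP, ROR_ZP = 5, 5
LSR_A, DEX = 2, 2
BNE_TAKEN, BNE_NOT_TAKEN = 3, 2   # taken branch, no page crossing


def table_cycles(page_crossings=0):
    """Cycles for the table-driven replacement; up to 3 indexed reads may cross a page."""
    base = (LDY_ZP + LDA_IND_Y + STA_ZP + LDY_ZP + AND_ABS_Y
            + STA_ZP + LDA_ZP + AND_ABS_Y + STA_ZP)
    return base + page_crossings


def loop_cycles(shifts):
    """Cycles for `shifts` passes through the shift loop body."""
    body = LSR_ZP + LSR_A + ROR_ZP + DEX
    return shifts * body + (shifts - 1) * BNE_TAKEN + BNE_NOT_TAKEN


if __name__ == "__main__":
    print("table:", table_cycles(), "to", table_cycles(3))
    for n in range(1, 8):
        print(n, "shifts:", loop_cycles(n))
```

Under these assumptions the table path costs 31 to 34 cycles regardless of shift count, while the loop costs 16 cycles for one shift and 118 for seven. The loop totals come out a little above the post's estimate because the model charges 3 cycles for every taken branch; the conclusion that the table path wins for anything beyond a shift or two is unchanged.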
analmux Posted January 21, 2011

Interesting. Now we see it works, the next step could be saving some memory by replacing the 6 and 7 table by the fast rotate tricks:

• Rotating 7 to the right is equivalent to rotating 1 to the left, and we can do this with

      cmp #$80
      rol

  (if I'm correct this will take 5 cycles each time)

• Rotating 6 to the right is equivalent to rotating 2 to the left, and we can do this with

      asl @
      adc #$80
      rol @

  as Rybags pointed out.

analmux Posted January 21, 2011

I'm not sure whether you need the x-register there, but if not you can replace

    ldy temp            ; character bitmap data
    lda (shiftptr),y
    sta temp
    ldy xpix
    and xpixtab-1,y
    sta temp+1
    lda temp
    and rightpixtab-1,y
    sta temp

by

    ldy temp            ; character bitmap data
    lda (shiftptr),y
    tax
    ldy xpix
    and xpixtab-1,y
    sta temp+1
    txa
    and rightpixtab-1,y

to gain some cycles. ("sta temp" at the end removed, as it's a temp value anyway. Can as well be saved in x-reg)

+wood_jl Posted January 21, 2011

Obviously, this thread is considerably beyond my comprehension, and I possess merely enough intelligence to recognize that you three are bordering on genius. But I wanted to ask - for the GUI - where the fonts are coming from? Would it be possible to allow it to use Windows TTF? Maybe a utility could be written to pare TTF files down to work on the Atari? Then you could use lots of fonts, and not have to create them?

.....and now, superior intellects, please resume....

+MrFish Posted January 22, 2011

> Where the fonts are coming from? Would it be possible to allow it to use Windows TTF? Maybe a utility could be written to pare TTF files down to work on the Atari? Then you could use lots of fonts, and not have to create them?

The only fonts that will be "created", so to speak, are ones that will be tailored to the GUI and its desired dimensions and style. Other than that it's only a matter of taste as to whether any existing fonts used need to be "edited", as there are innumerable fonts that can easily be parsed straight in. Yes, the whole thing could be done without creating a single font.

andym00 Posted January 22, 2011

>     ldy temp            ; character bitmap data
>     lda (shiftptr),y
>     tax
>     ldy xpix
>     and xpixtab-1,y
>     sta temp+1
>     txa
>     and rightpixtab-1,y

    ldy temp            ; character bitmap data
    lax (shiftptr),y
    ldy xpix
    and xpixtab-1,y
    sta temp+1
    txa
    and rightpixtab-1,y

Illegals for the win

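For a rough sense of what each variant saves, the three versions of the lookup-and-mask sequence can be totalled with the documented cycle counts. This Python sketch ignores page-crossing penalties and assumes the commonly documented timing of 5 cycles for the undocumented LAX (zp),Y opcode; the list names are illustrative.

```python
# Cycle counts per instruction form, page crossings not included.
CYCLES = {
    "ldy zp": 3, "lda zp": 3, "sta zp": 3,
    "lda (zp),y": 5, "and abs,y": 4,
    "tax": 2, "txa": 2,
    "lax (zp),y": 5,   # undocumented opcode: loads A and X together
}

ORIGINAL = ["ldy zp", "lda (zp),y", "sta zp", "ldy zp", "and abs,y",
            "sta zp", "lda zp", "and abs,y", "sta zp"]
WITH_TAX = ["ldy zp", "lda (zp),y", "tax", "ldy zp", "and abs,y",
            "sta zp", "txa", "and abs,y"]
WITH_LAX = ["ldy zp", "lax (zp),y", "ldy zp", "and abs,y",
            "sta zp", "txa", "and abs,y"]


def total(sequence):
    """Sum the cycle counts of a list of instruction forms."""
    return sum(CYCLES[op] for op in sequence)


if __name__ == "__main__":
    for name, seq in (("original", ORIGINAL), ("tax/txa", WITH_TAX), ("lax", WITH_LAX)):
        print(name, total(seq))
```

The two shorter variants leave the final result in the accumulator rather than storing it; if it still has to be written to a zero-page location, one more sta adds 3 cycles to either of them.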
flashjazzcat (Author) Posted January 22, 2011

> Interesting. Now we see it works, the next step could be saving some memory by replacing the 6 and 7 table by the fast rotate tricks:
>
> • Rotating 7 to the right is equivalent to rotating 1 to the left, and we can do this with cmp #$80 / rol (if I'm correct this will take 5 cycles each time)
> • Rotating 6 to the right is equivalent to rotating 2 to the left, and we can do this with asl @ / adc #$80 / rol @ as Rybags pointed out.

I might see about implementing this later if the tables prove too expensive in terms of RAM. I should be able to save some space by no longer having to pre-render the mouse pointer (which is drawn in the VBL) now that we have very fast bit shifting routines. In fact, all this stuff is re-usable for icon and UI element rendering. Things are really flying now.

> I'm not sure whether you need the x-register there, but if not you can replace
>
>     ldy temp            ; character bitmap data
>     lda (shiftptr),y
>     sta temp
>     ldy xpix
>     and xpixtab-1,y
>     sta temp+1
>     lda temp
>     and rightpixtab-1,y
>     sta temp
>
> by
>
>     ldy temp            ; character bitmap data
>     lda (shiftptr),y
>     tax
>     ldy xpix
>     and xpixtab-1,y
>     sta temp+1
>     txa
>     and rightpixtab-1,y
>
> to gain some cycles. ("sta temp" at the end removed, as it's a temp value anyway. Can as well be saved in x-reg)

Don't forget that TEMP is actually the left hand half of the character data to be ORed into the screen RAM. It has to find its way into a ZP register at some point.

> > Where the fonts are coming from? Would it be possible to allow it to use Windows TTF? Maybe a utility could be written to pare TTF files down to work on the Atari? Then you could use lots of fonts, and not have to create them?
>
> The only fonts that will be "created", so to speak, are ones that will be tailored to the GUI and its desired dimensions and style. Other than that it's only a matter of taste as to whether any existing fonts used need to be "edited", as there are innumerable fonts that can easily be parsed straight in. Yes, the whole thing could be done without creating a single font.

If anyone wants to write a utility to convert TTF/GEM/Mac fonts to work with this system, I doubt Mr Fish or I would raise any objections. We're still finalizing the font format, however (and I still need to suggest to Mr Fish that we abandon the Atari internal character sequence), and once we have a reasonable collection of fonts in a few sizes (to prove the font renderer), the sky's the limit. However, creating the fonts is no trivial task: Mr Fish is doing some amazing work creating bespoke fonts which will make the most of screen space and still look great. I've seen the screen mock-ups!

>     ldy temp            ; character bitmap data
>     lda (shiftptr),y
>     tax
>     ldy xpix
>     and xpixtab-1,y
>     sta temp+1
>     txa
>     and rightpixtab-1,y
>
>     ldy temp            ; character bitmap data
>     lax (shiftptr),y
>     ldy xpix
>     and xpixtab-1,y
>     sta temp+1
>     txa
>     and rightpixtab-1,y
>
> Illegals for the win

Nice use of LAX!

By the way: I'd like to thank all the experts for their insight and for helping me brainstorm this problem. We now have a font renderer 100% faster than the original.

Edited January 22, 2011 by flashjazzcat

flashjazzcat (Author) Posted January 22, 2011

By better management of the mouse pointer enable/disable routine (which waits a jiffy to sync with the interrupt), I've doubled the speed of the menu renderer. I don't think pre-rendered bitmap menu panels would be much faster than that.

Edited January 22, 2011 by flashjazzcat

analmux Posted January 22, 2011

Wow, looks fabulous.

> By the way: I'd like to thank all the experts for their insight and for helping me brainstorm this problem.

Well, I don't feel addressed as an "expert" here. I didn't really write similar code before.

However, I'm not sure whether the routine will be that fast when Italic is included. I expect you'd need to recompute the 16bit index register multiple times then. But, possibly it's not really a problem, as usually Italic isn't used that much.

analmux Posted January 22, 2011

> ...the 16bit index register...

Here I refer to "shiftptr". However, rethinking what I wrote: it should only be a matter of INC (or DEC) shiftptr+1.

Rybags Posted January 22, 2011

What's italic though, just shifting the top half of the character an extra place? Shouldn't make a great deal of difference.

Whatever you do, please make the pull-downs activated by button-press, not hover-over... or at least have the option to choose.

flashjazzcat (Author) Posted January 22, 2011

The menus work exactly like their Windows counterparts at the moment: they only pull down when hovered over if you've already pulled a menu down with a click. I don't want to use the old Mac method of making the menus roll up as soon as the button is released. However, I see no reason why the hover-pull-down behaviour can't be made configurable.

Usually, italics are shifted right one bit every alternate line (looking at the character from the bottom up). The two line rule applies to any font size, to keep the slant uniform. We then have the slightly unusual situation of potentially shifting a byte more than seven places to the right. It may be worth re-evaluating the bit shift and dividing it into bytes, and loading the character data directly into the adjacent byte and then applying a smaller shift. For example, if the top two lines of an 8-point character are to be shifted right three places, and the "xpix" offset of the character is already 7, the total shift is 7 + 3 = 10, and 10 / 8 = 1 byte with a remainder of 2. So we load the top two lines into TEMP+1, then apply a 2 place right shift. The character data may then shift beyond 16 bits, of course, which is where TEMP+2, etc, come into play.

Then - if we're drawing in a window - we need to do some clipping to make sure the shifted parts don't overwrite the scroll bar. I used a lot of counters in The Last Word's block marking routines, and they'd probably work well for clipping (which will always be on byte boundaries). Similar provision will need to be made for objects partly crossing the left hand side of a window. That's a headache for later, though.

Edited January 22, 2011 by flashjazzcat

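The byte-plus-remainder split described above is an integer division by 8. The Python sketch below illustrates the arithmetic only; place_row, its parameters and the three-byte buffer (standing in for TEMP, TEMP+1, TEMP+2) are hypothetical, and bits that fall past the end of the buffer are simply dropped, which is a crude stand-in for the byte-boundary clipping mentioned in the post.

```python
def split_total_shift(xpix, slant):
    """Split a total right shift into (whole bytes to skip, remaining bit shift 0..7)."""
    return divmod(xpix + slant, 8)


def place_row(data, xpix, slant, buffer_len=3):
    """Place one 8-bit character row into a small byte buffer."""
    byte_offset, bit_shift = split_total_shift(xpix, slant)
    buffer = [0] * buffer_len
    wide = (data << 8) >> bit_shift              # 16-bit result of the residual shift
    left, right = (wide >> 8) & 0xFF, wide & 0xFF
    if byte_offset < buffer_len:
        buffer[byte_offset] |= left              # byte the row starts in
    if byte_offset + 1 < buffer_len:
        buffer[byte_offset + 1] |= right         # spill-over into the next byte
    return buffer


if __name__ == "__main__":
    print(split_total_shift(7, 3))
    print([f"{b:08b}" for b in place_row(0b11111111, 7, 3)])
```

Because the residual shift is always in the range 0 to 7, the same eight shift tables serve italic rows as well; only the starting byte changes.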
atarixle Posted January 22, 2011

Wanna see my W.I.P.? http://andymanone.dyndns.org/atarixle/download/bossx/BeWeBOSS16.zip should work in any emulator with RAM-Disk and joystick in port 1. This is a very special milestone as it can change the GUI language on-the-fly even without re-starting the program or anything else...

+Stephen Posted January 22, 2011

> By better management on the mouse pointer enable/disable routine (which waits a jiffy to sync with the interrupt), I've doubled the speed of the menu renderer: I don't think pre-rendered bitmap menu panels would be much faster than that.

That looks incredibly responsive! Really looking forward to this.