
bogax

Members
  • Content Count: 902

Everything posted by bogax

  1. Excellent! I'd forgotten all about that. I'm curious how you derived that. Did you have some sort of systematic approach, or was it just by guess and by golly? (I never did figure out how to make /5 work past 179 using a similar adjustment.) Ought to compile a list of such routines somewhere. Maybe bump the divide-by-seven thread.
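  2. Haven't taken the time to find my way all through your code. Here's a constant-time divide by 15 (reciprocal multiplication):

      sta temp   ; save x
      lsr
      lsr
      lsr
      lsr        ; x/16, carry = rounding bit
      adc temp   ; x + x/16 (9-bit result, carry = bit 8)
      ror        ; /2, bit 8 rotates back in
      lsr
      lsr
      lsr        ; /16 in total: about 17x/256
      adc temp   ; x + 17x/256 = 273x/256
      ror
      lsr
      lsr
      lsr        ; /16: 273x/4096, about x/15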
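  3. After a second look, it's really not much more to use all 8 bits. It only costs a couple of cycles and it gives you a better distribution. The code is basically the same:

      sta temp   ; x = 0-255
      lsr        ; x/2
      adc temp   ; x + x/2 = 1.5x (9-bit result)
      ror
      lsr
      lsr        ; 1.5x/8 = .1875x
      adc temp   ; x + .1875x = 1.1875x
      ror
      lsr
      lsr
      lsr        ; 1.1875x/16, 18 max
      eor #$FF   ; negate
      clc
      adc #$01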
  4. Have you actually checked to see how well that works? If it were really random, I don't think you'd get an even distribution. I think on average 0 and 18 will come up 1 in 64 times, 1 and 17 will come up 2 in 64 times, 2 and 16 will come up 3 in 64 times, and all the rest 4 in 64 times. If it's not random (and it's not, but maybe it's close enough), it might be worse. Imagine what it would be if it were a simple binary counter instead of a polynomial counter (LFSR).
  5. One way is to just scale what you've got to what you want. You have a trade-off between how many extra pieces you want to add and how nice a distribution you want. You could start with 5 bits, i.e. begin by ANDing your random number with $1F, but the distribution won't be very good. You probably don't need to use all eight bits, though (I'm assuming you're starting with 8 bits, i.e. 0-255). Here's code that uses a number 0-63. It multiplies by 1.1875 (1 3/16), with some rounding thrown in (i.e. no clearing the carry for the adds), then divides by 4 and negates:

      ; assuming there's a random number 0-255 in a
      and #$3F   ; mask for 0-63, call that r
      sta temp
      lsr        ; / 2 = .5 * r
      adc temp   ; + 1 = 1.5 * r
      lsr
      lsr
      lsr        ; 1.5 / 8 = .1875 * r
      adc temp   ; + 1 = 1.1875 * r, 75 max with rounding
      lsr
      lsr        ; 75 / 4 = 18 max
      ; negate
      eor #$FF
      clc
      adc #$01
  6. I didn't mean that particular bit. I meant elsewhere, where there's a multiply by 16. Specifically, this line in the print_screenright routine:

        if change = 32 then worldx = rand*16 : change = 0 else change = change + 1
  7. Re GroovyBee's comment about being faster than calling a generic multiply: if you use *4*4 or *2*8 instead of *16, bB will use shifts instead of calling the multiply routine. There are other optimizations you could make also (but they might mess with the clarity of your code).
  8. Here's an UNTESTED routine that counts through pf columns and rows, maintaining a ds_byte counter and a current_bit counter in parallel. You supply it with the first column and row of the data statement in ds_col and ds_row. There are 64 columns (0-63) in the data statement, but you only want to go up to 32 or you'll get into the next data statement row when writing the playfield. Obviously rows will be similar, and if you go too far you'll fall off the end of the data statement. http://pastebin.com/JyGxd3dv

      dim pf_row = a
      dim pf_col = b
      dim ds_start_bit = c
      dim current_bit = d
      dim current_byte = e
      dim ds_byte = f
      dim ds_col = g
      dim ds_row = h

      printpf
        ds_byte = ds_col / 8
        ds_byte = ds_row * 8 + ds_byte
        ds_start_bit = ds_col & 7
        for pf_row = 0 to 10
          current_bit = setbits[ds_start_bit]
          current_byte = map[ds_byte]
          for pf_col = 0 to 31
            if current_byte & current_bit then pfpixel pf_col pf_row on else pfpixel pf_col pf_row off
            current_bit = current_bit / 2
            rem if current_bit = 0 then done with one
            rem data statement byte so go to the next
            rem get the corresponding map byte
            rem and reset current_bit to the first bit column
            if current_bit = 0 then ds_byte = ds_byte + 1 : current_byte = map[ds_byte] : current_bit = $80
          next
          rem we have incremented ds_byte by 4
          rem in the pf_col loop
          rem 8 bytes per data statement row
          rem we need to advance 4 more to go to the
          rem next row in the data statement
          ds_byte = ds_byte + 4
        next
        return

        data setbits
        %10000000, %01000000, %00100000, %00010000
        %00001000, %00000100, %00000010, %00000001
      end

        data map
        %11111111, %11111111, %11111111, %11111111, %11111111, %11111111, %11111111, %11111111
        %10000000, %00000000, %00000000, %00000001, %10000000, %00000000, %00000000, %00000001
        %10100000, %00000000, %00111111, %11111001, %10100000, %00000000, %00111111, %11111001
        %10100000, %00100000, %00100000, %00110001, %10100000, %00100000, %00100000, %00110001
        %10100000, %00100000, %00100000, %11000001, %10100000, %00100000, %00100000, %11000001
        %10100000, %00101111, %11100011, %00000001, %10100000, %00100001, %11100011, %00000001
        %10100000, %00111000, %00001100, %00000001, %10100000, %00000000, %00001100, %00000001
        %10100000, %00000000, %00110000, %00000001, %10100000, %00000000, %00110000, %00000001
        %10111111, %11111111, %11000000, %00000001, %10111111, %11000001, %11000000, %00000001
        %10000000, %00000000, %00000000, %00000001, %10000000, %00000000, %00000000, %00000001
        %11111111, %11111111, %11111111, %11111111, %11111111, %11111111, %11111111, %11111111
        %10101010, %01010101, %01010101, %01010100, %01010101, %01010001, %01010101, %01010101
        %10000000, %00000000, %00000000, %00000001, %00000000, %00000000, %00000000, %00000001
        %10000000, %00000000, %00000001, %10000001, %00100000, %00000000, %00111111, %11111001
        %10000000, %00100000, %00000000, %00110001, %00100000, %00100000, %00100000, %00110001
        %10000000, %00100000, %00000000, %11000001, %00100000, %00100000, %00100000, %11000001
        %10000000, %00101111, %10000011, %00000001, %00100000, %00100111, %11100011, %00000001
        %10000000, %00111000, %00000000, %00000001, %00100000, %00111000, %00001100, %00000001
        %10000000, %00000000, %00000000, %00000001, %00100000, %00000000, %00110000, %00000001
        %10000000, %11111111, %00000000, %00000001, %10000011, %10000011, %11000000, %00000001
        %10000000, %00000000, %00000000, %00000001, %10000000, %00000000, %00000000, %00000001
        %11111111, %11111111, %11111111, %11111111, %11111111, %11111111, %11111111, %11111111
      end
  9. First off, if you're going to do things that are in whole bytes, it'll be much faster to handle things in whole bytes. For example, assume the line in the data statement that goes to a line in the playfield starts at the beginning of a byte and ends at the end of a byte (i.e. it does NOT start halfway through a byte, say at bit 4, and end halfway through a byte at bit 4). Then it will be much faster to just transfer the 4 bytes than to call pfpixel 32 times. pfpixel "knows" about rows and columns but the data statement doesn't. You've got 8 bytes per line in the data statement. The first line of the data statement is bytes 0-7 inclusive. The second line begins with byte 8 and covers bytes 8-15 inclusive, and so on. There are 4 bytes per line of the playfield (columns 0-31 inclusive), so the playfield lines correspond to data statement bytes like so (for the upper left corner of the data statement):

        pfrow   pf bytes   data statement bytes
        0       0-3        0-3
        1       4-7        8-11
        2       8-11       16-19

     You could do something like in this post: http://www.atariage....50#entry2641780 and build a counter in for-next loops that counts by bits, bytes and rows in the data statement. Instead of taking an early out at 28 bits, you'd take your early out at four bytes. (An early out being 'quit this data statement row and go to the next data statement row'.) You wouldn't need a special test; you'd just need a for-next loop that counted the correct four bytes. And of course it'll have to maintain a pfpixel column and row counter in parallel: pfcolumns advance on every bit and get reset to 0 after every four bytes, and pfrows advance every four bytes.
     edit: you elaborated while I was composing. You could still do something similar, except maybe count through the data statement by bits and rows, taking your early out after 32 bits. That would be, I suppose, when you've finished one playfield row. Data statement rows would still go by 8 bytes. Data statement bits would have to pick out the correct 32 bits, which would be the same for each row. You'd start at some bit and row in the data statement and count through data statement bits and pfcolumns in parallel. When you've done 32 bits of playfield columns, go to the next data statement row, resetting everything to wherever it needs to be for the next row (pfcolumn to 0, data statement bit to the starting bit in the row, pfrow to the next pfrow). Or you could still count through the data statement by bits, bytes, and rows. I think that would be faster, but probably not faster enough to make it worth the added complication.
  10. I expect that's because I added a drawscreen at the beginning of the print_style routine. I rewrote the randomization stuff to speed it up, but it's not enough.
     edit: looks like adding inlinerand might do it.

      calc_roomtype
        if dungeonlvl{1} then temp2 = %00100100 : temp3 = %00000010 else temp2 = 0 : temp3 = 0
        if worldx{0} then temp1 = %01001001 : temp2 = temp2 | %00000001 else temp1 = 0
        if worldy{6} then temp1 = temp1 | %00000010
        if worldy{0} then temp1 = temp1 | %00010000
        if dungeonlvl{0} then temp1 = temp1 | %00100100
        if worldy{2} then temp1 = temp1 | %10000000
        temp1 = temp1 * 2 + temp1
        if worldy{4} then temp2 = temp2 | %00000010
        if temp1{7} then temp2 = temp2 | %00001000
        if worldy{1} then temp2 = temp2 | %00010000
        if worldx{1} then temp2 = temp2 | %01000000 : temp3 = temp3 | %00000001
        if temp1{1} then temp2 = temp2 | %10000000
        if dungeonlvl{5} then temp3 = temp3 | %00000100
        if worldx{2} then temp3 = temp3 | %00001000
        if worldy{2} then temp3 = temp3 | %00010000
        if dungeonlvl{2} then temp3 = temp3 | %00100000
        if temp1{3} then temp3 = temp3 | %01000000
        if worldy{7} then temp3 = temp3 | %10000000
        temp1 = temp1 * 2 + temp2 + temp3 + gameseed
        if temp1 then rand = temp1 else rand = 255
        roomtype = rand
        tempvar = rand + worldx + worldy
        if counter > 230 then roomtype = rand
        if worldx = 128 && worldy = 128 && dungeonlvl = 1 then roomtype = 1
        rclass = roomtype / 4 / 4 : roomtype = roomtype & $0F
        if rclass = 15 then rclass = style_a
        return thisbank
  11. I think the short answer is yes. Fact is, I've forgotten all I've done about that. I said in a previous post that it was still going over even with a liberal sprinkling of drawscreens, but I'm not seeing that now. I put a drawscreen at the beginning of the print_style routine, and that seems to take care of it, even with the for-next loop version.
  12. Here's a version with some asm. The asm routine is in a macro. Putting it in a macro allows passing a data statement as a parameter to a particular instance.

      macro WRIPFMAC
      asm
        ldx {2}          ; X = first playfield byte to write
        ldy {3}          ; Y = first data statement byte
        lda {4}
        clc
        adc {2}
        sta {4}          ; {4} = {2} + {4}: playfield index to stop at
      WRI_LOOP
        lda {1},y
        sta $A4,x        ; playfield RAM starts at $A4
        iny
        inx
        cpx {4}
        bcc WRI_LOOP     ; loop while X < stop index
      end
      end

     To use it as a subroutine, you'd give a label, invoke the macro with its four parameters, then supply a return. The four parameters, in this order, are:
     (1) the name of the data statement;
     (2) the variable used to pass the location of the first byte to be written in the playfield;
     (3) the variable used to pass the location of the first byte in the data statement;
     (4) the variable used to pass the number of bytes to move (this one gets modified).
     For example, you create a subroutine using the macro. Then, to use it, set up the three variables and call the subroutine:

        pf_ptr = 0 : dat_ptr = top_tbl[top] : dat_len = 12
        gosub WRI_PF
        pf_ptr = 12 : dat_ptr = mid_tbl[mid] : dat_len = 20
        gosub WRI_PF
        pf_ptr = 32 : dat_ptr = bot_tbl[bot] : dat_len = 12
        gosub WRI_PF
        return otherbank

      WRI_PF
        callmacro WRIPFMAC room_dat pf_ptr dat_ptr dat_len
        return thisbank

     http://pastebin.com/W3pWSNS9

     Here's a second macro. Use is the same. It has a little more overhead but is faster in the loop, so you save a few cycles when moving more than 1 line. It also costs a few more bytes. It uses the temp variables temp1-temp4 but doesn't touch the parameter variables.

      macro WRIPFMAC
      asm
        lda #$00
        sta temp2        ; temp1/temp2 = playfield pointer (high byte 0)
        ldy {4}          ; Y = byte count, used as a shared offset
        clc
        lda #$A4
        adc {2}
        sta temp1
        lda {3}
        adc #<{1}
        sta temp3        ; temp3/temp4 = address of {1} + {3}
        lda #>{1}
        adc #$00
        sta temp4
      WRI_LOOP
        lda (temp3),y
        sta (temp1),y
        dey
        bpl WRI_LOOP     ; copies offsets {4} down to 0
      end
      end
  13. I didn't count cycles exactly. The for-next loop takes something more than twice as long per byte as a playfield statement (34 cycles vs. 14 cycles). But there's less overhead calling other banks and no scrolling things into place. I figured they're roughly equal, but the for-next loop probably does take longer. 34 cycles x 44 bytes is 1,496 cycles. Even with a couple hundred cycles of overhead, that's still under 2,000 cycles. I don't think it's taking too long. But I'm not through. I expected to throw a few bytes of asm in there to do the actual moving of bytes. I just hadn't decided what would be the best way to structure it. I want something somewhat general: about as general as the for-next routine, but perhaps with the possibility of passing various data statements as a parameter.
  14. I've been playing with theloon's code. This is the result: http://pastebin.com/Wq8zKc08 My purpose was not primarily to rewrite or streamline his code (although I ended up doing a bit of that). It was to put it in a form that would make it easier to write the playfield bytewise from bB. (If I were going to streamline it, I think the first thing I might try is replacing long strings of if-thens with on-gotos.) The code draws the playfield in three pieces, which I called top, mid and bot. Top and bot are three lines each, and mid is five lines. There are 6 three-line pieces and 8 five-line pieces to select from. In the original code, which piece gets drawn is selected by strings of if-thens, using the variable rclass and bits in tempvar. rclass ranges from 0-14. For each rclass, three bits of tempvar are used to select one of two possibilities each for top, mid and bot. If you tabulate the selections that are actually possible in the code, there are four possible tops, eight mids and four bots. (There are 4 bits of rclass and 3 bits of tempvar, giving a max of 128 possibilities. Those 128 could be selected from a number limited only by how much ROM you've got, but that's not what's done.) I put all the 3- and 5-line pieces in a data statement. Each rclass gets a pattern in pat_tbl. The pattern is melded with tempvar to get three pointers into top_tbl, mid_tbl and bot_tbl. These then point to one of the 4, 8 or 4 (respectively) possibilities. The contents of top_tbl, mid_tbl and bot_tbl are the starting locations in the room_dat data statement of their respective pieces, which are to be written to the playfield. Bit 0 of tempvar corresponds to the bottom piece, bit 1 to the middle piece and bit 2 to the top piece. To get the pointers into the top, mid and bot tables, first tempvar is overlaid with the pattern using eor:

        mid = pat_tbl[rclass]
        bot = ((tempvar ^ mid) & %00000101) ^ mid

     Bits two and three of the result are the top_tbl pointer. They are shifted into place with a division by four and selected with an AND mask:

        top = bot / 4 & %00000011

     The lower two bits of the result are the bot_tbl pointer and are selected with an AND mask:

        bot = bot & %00000011

     Then the pattern is shifted with a divide by 16 (dividing by 4 twice) and overlaid on tempvar. The bottom three bits are masked to get the mid_tbl pointer:

        mid = mid / 4 / 4
        mid = (((tempvar ^ mid) & %00000010) ^ mid) & %00000111

     The results of those table lookups are the positions in the data statement of the pieces to be written:

        mid = mid_tbl[mid]
        bot = bot_tbl[bot]
        top = top_tbl[top]

     They are passed to the WRI_PF subroutine one at a time, in dat_ptr. Along with them go the location of the last byte to write, in dat_last, and the location in the playfield of the first byte to be written, in pf_ptr:

        pf_ptr = 0 : dat_ptr = top : dat_last = top + 11
        gosub WRI_PF
        pf_ptr = 12 : dat_ptr = mid : dat_last = mid + 19
        gosub WRI_PF
        pf_ptr = 32 : dat_ptr = bot : dat_last = bot + 11
        gosub WRI_PF

     WRI_PF is just a for-next loop that reads the data statement and writes to the playfield:

      WRI_PF
        for dat_ptr = dat_ptr to dat_last
          pfbase[pf_ptr] = room_dat[dat_ptr]
          pf_ptr = pf_ptr + 1
        next
        return thisbank

     There are also four routines that create openings in the rooms. I rewrote them to write bytes to the playfield instead of calling pfpixel. They just do a series of read-mask-writes of the appropriate bytes to the appropriate spots in the playfield, e.g.:

      print_open_left
        pfbase[16]=pfbase[16] & %00000011
        pfbase[20]=pfbase[20] & %00000011
        pfbase[24]=pfbase[24] & %00000011
        return otherbank
  15. I'm still not sure what you guys are trying to do. Here are a couple that write the playfield bytewise. One needs the reversed data; the other has a bit of asm to do the reversing.

      set romsize 4k
      dim mapx = a
      dim mapy = b
      dim pfy = c
      const mem = $A4
        COLUPF=$FF
        COLUBK=$00
        playfield:
      ....XX...XX...XXX...XX...XX....
      ....X.X..X.X..X....X....X......
      ....XX...XX...XX....X....X.....
      ....X....X.X..X......X....X....
      ....X....X.X..XXX..XX...XX.....
      ...............................
      .....XX..XX....X....XX..XXX....
      ....X....X.X..X.X..X....X......
      .....X...XX...XXX..X....XX.....
      ......X..X....X.X..X....X......
      ....XX...X....X.X...XX..XXX....
      end
      PROGRAMLOOP
        if joy0fire then let z = z | 1
        if !joy0fire && z then gosub DRAW_MAP
        drawscreen
        goto PROGRAMLOOP
      DRAW_MAP
        if mapy > 39 then mapy = 0 else mapy = mapy + 4
        for pfy = 0 to 40 step 4
          for mapx = 0 to 3
            temp1 = mapy | mapx
            temp2 = pfy | mapx
            mem[temp2] = map[temp1]
          next
          if mapy > 39 then mapy = 0 else mapy = mapy + 4
        next
        z = 0
        return
        data map
        %00001010, %00010111, %10001110, %00000000,
        %00001010, %00010001, %10001010, %00000000,
        %00001110, %00010011, %10001010, %00000000,
        %00001010, %00010001, %10001010, %00000000,
        %00001010, %01110111, %11101110, %00000000,
        %00000000, %00000000, %00000000, %00000000,
        %00001000, %01100100, %01110010, %00111000,
        %00001000, %10010100, %01001010, %01001000,
        %00000101, %10010010, %01110010, %01001000,
        %00000010, %01100001, %01001011, %00111011,
        %00000000, %00000000, %00000000, %00000000,
        %00000000, %00000000, %00000000, %00000000
      end

      set romsize 4k
      dim mapx = a
      dim mapy = b
      dim pfy = c
      const mem = $A4
      macro revb
      asm
        lda temp1
        and #$0F
        tax              ; X = low nibble
        lda temp1
        lsr
        lsr
        lsr
        lsr
        tay              ; Y = high nibble
        lda rev_tbl_lo,x
        ora rev_tbl_hi,y
        sta temp1        ; temp1 = temp1 with its bits reversed
      end
      end
        COLUPF=$FF
        COLUBK=$00
        playfield:
      ....XX...XX...XXX...XX...XX....
      ....X.X..X.X..X....X....X......
      ....XX...XX...XX....X....X.....
      ....X....X.X..X......X....X....
      ....X....X.X..XXX..XX...XX.....
      ...............................
      .....XX..XX....X....XX..XXX....
      ....X....X.X..X.X..X....X......
      .....X...XX...XXX..X....XX.....
      ......X..X....X.X..X....X......
      ....XX...X....X.X...XX..XXX....
      end
      PROGRAMLOOP
        if joy0fire then let z = z | 1
        if !joy0fire && z then gosub DRAW_MAP
        drawscreen
        goto PROGRAMLOOP
      DRAW_MAP
        if mapy > 39 then mapy = 0 else mapy = mapy + 4
        for pfy = 0 to 40 step 4
          for mapx = 0 to 3
            temp1 = mapy | mapx
            temp1 = map[temp1]
            if mapx & 1 then callmacro revb
            temp2 = pfy | mapx
            mem[temp2] = temp1
          next
          if mapy > 39 then mapy = 0 else mapy = mapy + 4
        next
        z = 0
        return
        data map
        %00001010, %11101000, %10001110, %00000000,
        %00001010, %10001000, %10001010, %00000000,
        %00001110, %11001000, %10001010, %00000000,
        %00001010, %10001000, %10001010, %00000000,
        %00001010, %11101110, %11101110, %00000000,
        %00000000, %00000000, %00000000, %00000000,
        %00001000, %00100110, %01110010, %00011100,
        %00001000, %00101001, %01001010, %00010010,
        %00000101, %01001001, %01110010, %00010010,
        %00000010, %10000110, %01001011, %11011100,
        %00000000, %00000000, %00000000, %00000000,
        %00000000, %00000000, %00000000, %00000000
      end
        data rev_tbl_lo
        %00000000, %10000000, %01000000, %11000000
        %00100000, %10100000, %01100000, %11100000
        %00010000, %10010000, %01010000, %11010000
        %00110000, %10110000, %01110000, %11110000
      end
        data rev_tbl_hi
        %00000000, %00001000, %00000100, %00001100
        %00000010, %00001010, %00000110, %00001110
        %00000001, %00001001, %00000101, %00001101
        %00000011, %00001011, %00000111, %00001111
      end

     http://pastebin.com/VCPHUBeQ without asm
     http://pastebin.com/tMp24XXT with asm
  16. I have the feeling I'm not doing a very good job of explaining myself. Take the bit of code that's set up for 5 bytes per line (i.e. x goes to 39). The only reason to do it that way is if for some reason you want to visualize it that way, that is, if you want to conceptualize your data as 40 pixels by (whatever) and use 5 bytes per line in a data statement. The code itself doesn't care how the data is arranged in the data statement. And that's partly why I wasn't supplying working code. It's just meant to be a description of how to approach the problem; specific problems are likely to need specific solutions. (Does anybody actually use 40-pixel lines?) That particular bit of code I didn't test, except to see that it compiled. Usually I test the code, but not always, and not always before I post. (I do try to remember to say whether it's tested or not, but I don't always remember to do that either.) So I could usually post an actual working program, but what good would it actually be? It's usually just a hello-world type thing, not really useful or meant or expected to be. And I still haven't gotten any DPC+ stuff to compile.
  17. The code is a big counter. It counts by bits, bytes and data statements, and it maintains a counter in parallel that counts by columns and rows. The columns advance for every bit and the rows for every fourth byte. The bytes go 0, 1, 2, 3, ... Three is the fourth byte. So the code masks the lower two bits (ANDs byte_ptr with a mask that happens to equal three, i.e. byte_ptr & 3) and increments the row if they equal three. To do 28 bits, the easiest approach would be to check the column. If it's 27, reset it to 0, increment the row, skip the rest of this byte, and go to the next byte.

      dim print_row = a
      dim current_col = b
      dim current_bit = c
      dim current_byte = d
      dim ds_index = e
      dim byte_ptr = f

      WritePFChunk
        print_row = 0
        current_col = 0
        for ds_index = 0 to 3
          for byte_ptr = 0 to 175
            current_bit = $80
            on ds_index goto DCASE0 DCASE1 DCASE2 DCASE3
      NEXT_BYTE
          next
        next
        return

      DCASE0
        current_byte = L4_0[byte_ptr] : goto BIT_LOOP
      DCASE1
        current_byte = L4_1[byte_ptr] : goto BIT_LOOP
      DCASE2
        current_byte = L4_2[byte_ptr] : goto BIT_LOOP
      DCASE3
        current_byte = L4_3[byte_ptr]
      BIT_LOOP
        if current_byte & current_bit then pfpixel current_col print_row on else pfpixel current_col print_row off
        if current_col = 27 then current_col = 0 : print_row = print_row + 1 : goto NEXT_BYTE
        current_col = current_col + 1
        current_bit = current_bit / 2
        if current_bit then goto BIT_LOOP
        goto NEXT_BYTE
  18. I was going for speed, not code size. You've switched to 28 columns. Here's a shorter version of the previous code.

      dim print_row = a
      dim current_col = b
      dim current_bit = c
      dim current_byte = d
      dim ds_index = e
      dim byte_ptr = f

      WritePFChunk
        print_row = 0
        current_col = 0
        for ds_index = 0 to 3
          for byte_ptr = 0 to 175
            current_bit = $80
            on ds_index goto DCASE0 DCASE1 DCASE2 DCASE3
      DCASE0
            current_byte = L4_0[byte_ptr] : goto BIT_LOOP
      DCASE1
            current_byte = L4_1[byte_ptr] : goto BIT_LOOP
      DCASE2
            current_byte = L4_2[byte_ptr] : goto BIT_LOOP
      DCASE3
            current_byte = L4_3[byte_ptr]
      BIT_LOOP
            if current_byte & current_bit then pfpixel current_col print_row on
            current_col = current_col + 1
            current_col = current_col & $1F
            current_bit = current_bit / 2
            if current_bit then goto BIT_LOOP
            if byte_ptr & $03 = 3 then print_row = print_row + 1
          next
        next
        return
  19. I see one goof right off: the byte_col loop should only go to 3. Also, the addressing mode in the macro is wrong. The macro doesn't work anyway. I could have sworn I'd done that before, but... Here's some code that's tested. print_row (a) and current_col (b) are referenced in hex in the asm, so if they're dimmed differently the asm will have to be changed. It may not work anyway in DPC+; I don't know how that's set up. As for the formatting, I'll see if I can attach some files, one with the asm and one without. (They're just this code, not a complete program that's actually going to run.) So try this:

      dim print_row = a
      dim current_col = b
      dim current_row = c
      dim byte_col = d
      dim current_byte = e
      dim ds_index = f

      WritePFChunk
        print_row = 0
        for ds_index = 0 to 3
          for current_row = 0 to 172 step 4
            current_col = 0
            for byte_col = 0 to 3
              current_byte = current_row | byte_col
              on ds_index goto DCASE0 DCASE1 DCASE2 DCASE3
      BYTE_DONE
            next
            print_row = print_row + 1
          next
        next
        return

      DCASE0
        current_byte = L4_0[current_byte] : goto CONT_WRITE_PF
      DCASE1
        current_byte = L4_1[current_byte] : goto CONT_WRITE_PF
      DCASE2
        current_byte = L4_2[current_byte] : goto CONT_WRITE_PF
      DCASE3
        current_byte = L4_3[current_byte]
      CONT_WRITE_PF
        if current_byte & $80 then gosub PIXEL_ON
        current_col = current_col + 1
        if current_byte & $40 then gosub PIXEL_ON
        current_col = current_col + 1
        if current_byte & $20 then gosub PIXEL_ON
        current_col = current_col + 1
        if current_byte & $10 then gosub PIXEL_ON
        current_col = current_col + 1
        if current_byte & $08 then gosub PIXEL_ON
        current_col = current_col + 1
        if current_byte & $04 then gosub PIXEL_ON
        current_col = current_col + 1
        if current_byte & $02 then gosub PIXEL_ON
        current_col = current_col + 1
        if current_byte & $01 then gosub PIXEL_ON
        current_col = current_col + 1
        goto BYTE_DONE

      PIXEL_ON
        asm
        LDA $D7          ; current_col (b)
        LDY $D6          ; print_row (a)
        LDX #0           ; 0 = pixel on
        JMP pfpixel      ; tail jump: pfpixel's return goes back to the gosub
      end

     edit: attaching a file didn't work for me
     with http://pastebin.com/QDUgwAG8
     without http://pastebin.com/XUknWmpK
     edit: put the for-next loops back in
  20. Sounds like, in your case, you might as well optimize for code size. I doubt it can be made fast enough. You're really doing something akin to a block transfer, and pfpixel is the wrong tool for the job. You need something like pfpixel that will do whole bytes. With the default kernel that's possible even from bB. But I don't know enough about Harmony or DPC+ to do a byte at a time (assuming it's possible short of rewriting the kernel). Here is my attempt to speed things up. This code compiles but is otherwise UNTESTED. Basically, it's got the bits unrolled and uses constants instead of a setbits table. The data statements are referenced with a pointer. This costs a few cycles per pixel but shortens the code. It also opens up the possibility of parameterizing the transfer so that it could be done in small chunks. I didn't do that because I don't know how small the chunks would need to be. You could probably do a few rows at a time, and it would probably take several seconds to do the whole thing. I also wasted a few cycles per pixel to get rid of redundant pfpixel calls. With this code the data would be divided into 4 equal pieces of 44 rows each (named L4_0 - L4_3).

      dim current_row = a
      dim current_col = b
      dim byte_col = c
      dim current_byte = d
      dim ds_index = e
      dim print_row = f

      macro Pixel_ON_macro
      asm
        LDA #(1)
        LDY #(2)
        LDX #0
        JMP pfpixel
      end
      end

      WritePFChunk
        print_row = 0
        ds_index = 0
      DS_LOOP
        current_row = 0
      ROW_LOOP
        current_col = 0
        for byte_col = 0 to 4
          current_byte = current_row | byte_col
          on ds_index goto DCASE0 DCASE1 DCASE2 DCASE3
      CONT_WRITE_PF
          if current_byte & $80 then gosub PIXEL_ON
          current_col = current_col + 1
          if current_byte & $40 then gosub PIXEL_ON
          current_col = current_col + 1
          if current_byte & $20 then gosub PIXEL_ON
          current_col = current_col + 1
          if current_byte & $10 then gosub PIXEL_ON
          current_col = current_col + 1
          if current_byte & $08 then gosub PIXEL_ON
          current_col = current_col + 1
          if current_byte & $04 then gosub PIXEL_ON
          current_col = current_col + 1
          if current_byte & $02 then gosub PIXEL_ON
          current_col = current_col + 1
          if current_byte & $01 then gosub PIXEL_ON
          current_col = current_col + 1
        next
        print_row = print_row + 1
        current_row = current_row + 4
        if current_row < 173 then goto ROW_LOOP
        ds_index = ds_index + 1
        if ds_index < 4 then goto DS_LOOP
        return

      DCASE0
        current_byte = L4_0[current_byte] : goto CONT_WRITE_PF
      DCASE1
        current_byte = L4_1[current_byte] : goto CONT_WRITE_PF
      DCASE2
        current_byte = L4_2[current_byte] : goto CONT_WRITE_PF
      DCASE3
        current_byte = L4_3[current_byte] : goto CONT_WRITE_PF

      PIXEL_ON
        callmacro Pixel_ON_macro current_col print_row
  21. I'm curious: if you're drawing a playfield from data statements, would you optimize for code size or speed, or would it depend on what you're doing?
  22. I'm very far from an expert, but it appears to me to work just fine. I don't see the purpose of the end statements after the playermove and badguymove blocks. Are you sure you don't want subroutines (with returns instead of ends and gosubs instead of gotos)? As it stands now, your code runs in a loop from playermove to the goto playermove statement. It never gets to the goto badguymove and goto main statements (as far as I can tell).
  23. The conclusion I'm coming to is that it would be best to write some utilities in asm that could be included in, and called from, a bB program to do some of this stuff. In the present case of picking out a bit from a data statement and then calling pfpixel, you end up duplicating in bB stuff that pfpixel then does anyway. Also, the setbits data statement duplicates data in the kernel. Then too, there's stuff you could do in asm that you can't do with bB.
  24. What do routines like this really need to do? The examples here are writing chunks. If you don't need to do things on a per-bit basis, it's a lot easier (and faster) to do bytes or rows. If you were doing things in bB, I think it's probably better to do your own version of pfpixel, especially if you're doing individual bits. But it would have to be tailored to the kernel and kernel options. (Well, maybe not strictly speaking, but that would certainly be preferable.) And, of course, a little asm could help a lot. It seems just barely possible to get some purely bB version of the routines here to work (in Stella), but they take a lot of cycles and they don't really do much.
  25. Thanks for that. I would have eventually gotten around to trying it myself but I thought someone might know already.
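The divide-by-15 routine in post 2 can be checked exhaustively by emulating it. The sketch below is a Python model, not 6502 code. It assumes binary (non-decimal) mode and tracks only the accumulator and carry, which are all this sequence uses. Worked through, the sequence multiplies by 273/4096 (about 1/15.004), and the carries from the shifts supply rounding. The last lines compare the result against integer division for every input 0-255.

```python
def lsr(a):
    """LSR: shift right one bit; bit 0 goes to carry."""
    return a >> 1, a & 1

def ror(a, c):
    """ROR: rotate right through carry."""
    return (a >> 1) | (c << 7), a & 1

def adc(a, m, c):
    """ADC in binary mode: 8-bit sum plus carry out."""
    s = a + m + c
    return s & 0xFF, s >> 8

def div15(x):
    """Instruction-by-instruction model of the post 2 routine."""
    temp = x                    # sta temp
    a, c = x, 0
    for _ in range(4):          # lsr x4: x/16, carry = bit 3 of x
        a, c = lsr(a)
    a, c = adc(a, temp, c)      # adc temp: x + x/16, may carry into bit 8
    a, c = ror(a, c)            # ror: bring bit 8 back in
    for _ in range(3):          # lsr x3
        a, c = lsr(a)
    a, c = adc(a, temp, c)      # adc temp: about 273x/256
    a, c = ror(a, c)            # ror
    for _ in range(3):          # lsr x3: about 273x/4096
        a, c = lsr(a)
    return a

mismatches = [x for x in range(256) if div15(x) != x // 15]
print("mismatches:", mismatches)
```

Because the 9-bit intermediate sums are rotated back in rather than dropped, inputs all the way up to 255 stay exact.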
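The all-8-bits scaler in post 3 can be modeled the same way. This Python sketch is an emulation under the same assumptions (binary mode; only the accumulator and carry are tracked). It returns the scaled value before and after the eor/clc/adc negation. It also tallies how many of the 256 possible inputs land on each result 0-18, which shows how even the distribution is.

```python
from collections import Counter

def lsr(a):
    """LSR: shift right one bit; bit 0 goes to carry."""
    return a >> 1, a & 1

def ror(a, c):
    """ROR: rotate right through carry."""
    return (a >> 1) | (c << 7), a & 1

def adc(a, m, c):
    """ADC in binary mode: 8-bit sum plus carry out."""
    s = a + m + c
    return s & 0xFF, s >> 8

def scale_0_18(x):
    """Post 3 sequence: returns (scaled 0-18, negated value left in A)."""
    temp = x                        # sta temp
    a, c = lsr(x)                   # lsr
    a, c = adc(a, temp, c)          # adc temp   ; about 1.5x, 9 bits
    a, c = ror(a, c)                # ror
    a, c = lsr(a)                   # lsr
    a, c = lsr(a)                   # lsr        ; about .1875x
    a, c = adc(a, temp, c)          # adc temp   ; about 1.1875x
    a, c = ror(a, c)                # ror
    for _ in range(3):              # lsr lsr lsr; about 1.1875x / 16
        a, c = lsr(a)
    scaled = a
    a, c = adc(a ^ 0xFF, 0x01, 0)   # eor #$FF / clc / adc #$01
    return scaled, a

results = [scale_0_18(x) for x in range(256)]
print(sorted(Counter(s for s, _ in results).items()))
```

The negation at the end is ordinary two's complement, so the returned value is (-scaled) & 0xFF.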
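The uneven distribution described in post 4 (1, 2, 3, then 4 for each middle value, symmetric at the top) is what you get from adding a uniform 0-15 value to a uniform 0-3 value. The sketch below assumes that construction, which is a hypothetical since the code being discussed isn't quoted. It counts every equally likely combination, assuming the two parts are truly independent and random.

```python
from collections import Counter
from itertools import product

# Hypothetical construction: value = (a & 15) + (b & 3) for independent random a, b.
counts = Counter(lo + hi for lo, hi in product(range(16), range(4)))
total = sum(counts.values())    # 16 * 4 equally likely combinations

for value in range(19):
    print(f"{value:2d}: {counts[value]} in {total}")
```

A real LFSR-driven rand only approximates independence, so actual counts could be lumpier than this.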
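The 6-bit variant in post 5 trades some evenness for fewer inputs. This Python emulation follows its comments line by line, under the same binary-mode, accumulator-and-carry assumptions. It exposes the 1.1875 * r intermediate, so the "75 max with rounding" note can be checked. It also tallies how the 64 masked inputs spread over 0-18.

```python
from collections import Counter

def lsr(a):
    """LSR: shift right one bit; bit 0 goes to carry."""
    return a >> 1, a & 1

def adc(a, m, c):
    """ADC in binary mode: 8-bit sum plus carry out."""
    s = a + m + c
    return s & 0xFF, s >> 8

def scale_6bit(x):
    """Post 5 sequence: returns (1.1875*r intermediate, scaled 0-18, negated)."""
    r = x & 0x3F                    # and #$3F   ; r = 0-63
    temp = r                        # sta temp
    a, c = lsr(r)                   # lsr        ; .5 * r
    a, c = adc(a, temp, c)          # adc temp   ; 1.5 * r
    for _ in range(3):              # lsr lsr lsr; .1875 * r
        a, c = lsr(a)
    a, c = adc(a, temp, c)          # adc temp   ; 1.1875 * r
    product = a
    for _ in range(2):              # lsr lsr    ; / 4
        a, c = lsr(a)
    scaled = a
    a, c = adc(a ^ 0xFF, 0x01, 0)   # eor #$FF / clc / adc #$01
    return product, scaled, a

results = [scale_6bit(r) for r in range(64)]
print(max(p for p, _, _ in results))
print(sorted(Counter(s for _, s, _ in results).items()))
```

With only 64 inputs spread over 19 results, each result gets 3 or 4 inputs, so the spread is coarser than with the 8-bit version.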
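The ds_byte/current_bit walk in post 8's printpf routine can be modeled in Python to see which map bits land in each playfield row. This is a sketch, not the author's code. It assumes 8 data statement bytes per row and a 32-column window, as in the post. Pixels are returned as booleans instead of pfpixel calls. The small map in the test is hypothetical.

```python
SETBITS = [0x80 >> i for i in range(8)]   # same values as the setbits data statement

def read_window(map_bytes, ds_col, ds_row, rows, row_bytes=8, width=32):
    """Return `rows` lists of `width` booleans, walking the map like printpf."""
    ds_byte = ds_row * row_bytes + ds_col // 8   # ds_byte = ds_row * 8 + ds_col / 8
    start_bit = ds_col & 7                       # ds_start_bit = ds_col & 7
    out = []
    for _ in range(rows):
        current_bit = SETBITS[start_bit]
        current_byte = map_bytes[ds_byte]
        line = []
        for _ in range(width):
            line.append(bool(current_byte & current_bit))   # pfpixel ... on / off
            current_bit //= 2
            if current_bit == 0:                 # finished one data statement byte
                ds_byte += 1
                # Python-only guard: treat bytes past the end as 0
                # (bB would read whatever ROM follows the table).
                current_byte = map_bytes[ds_byte] if ds_byte < len(map_bytes) else 0
                current_bit = 0x80
        out.append(line)
        ds_byte += row_bytes - width // 8        # the "+ 4" that skips to the next row
    return out
```

Starting mid-byte (ds_col = 4 here) still advances ds_byte exactly 4 times per row, which is why the fixed "+ 4" lands on the next data statement row:

```python
m = [0xFF] + [0] * 7 + [0] * 4 + [0xF0] + [0] * 3   # hypothetical 2-row map
rows = read_window(m, 4, 0, 2)
```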
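The pfrow-to-bytes table in post 9 follows from two strides: 4 bytes per playfield row and 8 bytes per data statement row. Below is a small Python sketch of that mapping, plus a whole-byte copy that uses it instead of per-pixel calls. Here, `playfield` is a 48-byte list standing in for playfield RAM, and `row_map` and `copy_rows` are hypothetical names for illustration.

```python
def row_map(pf_row, ds_start=0, ds_row_bytes=8, pf_row_bytes=4):
    """Return ((first, last) playfield bytes, (first, last) data statement bytes)."""
    pf_first = pf_row * pf_row_bytes
    ds_first = ds_start + pf_row * ds_row_bytes
    return ((pf_first, pf_first + pf_row_bytes - 1),
            (ds_first, ds_first + pf_row_bytes - 1))

def copy_rows(playfield, data, ds_start=0, rows=11):
    """Copy 4 whole bytes per row from the data statement into the playfield."""
    for pf_row in range(rows):
        (p0, p1), (d0, d1) = row_map(pf_row, ds_start)
        playfield[p0:p1 + 1] = data[d0:d1 + 1]

for r in range(3):
    print(r, *row_map(r))
```

Passing a nonzero ds_start (a multiple of 4 within a row, for byte-aligned data) picks a different 32-column window from the same 64-column data.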
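The two WRIPFMAC macros in post 12 differ in how they count. The Python models below mirror each loop's register use. The playfield is a list standing in for RAM at $A4, and `data` is a hypothetical data statement. As modeled, the second macro starts Y at the byte count and loops while Y stays non-negative (bpl). So it appears to move one more byte than the first macro for the same dat_len, which is worth checking before swapping one for the other.

```python
def wripf_v1(pf, data, pf_ptr, dat_ptr, dat_len):
    """First macro: X walks the playfield, Y the data, until X reaches the stop index."""
    end = (pf_ptr + dat_len) & 0xFF      # clc / adc {2} / sta {4}: dat_len is overwritten
    x, y = pf_ptr, dat_ptr               # ldx {2} / ldy {3}
    while True:
        pf[x] = data[y]                  # lda {1},y / sta $A4,x
        x, y = x + 1, y + 1              # iny / inx
        if not x < end:                  # cpx {4} / bcc WRI_LOOP
            break
    return end                           # value left in the dat_len variable

def wripf_v2(pf, data, pf_ptr, dat_ptr, dat_len):
    """Second macro: one offset Y, counted down; assumes dat_len < 128."""
    y = dat_len                          # ldy {4}
    while True:
        pf[pf_ptr + y] = data[dat_ptr + y]   # lda (temp3),y / sta (temp1),y
        y = (y - 1) & 0xFF               # dey
        if y & 0x80:                     # bpl WRI_LOOP falls through once Y goes negative
            break
```

For example, with dat_len = 3, the first model writes 3 bytes and the second writes 4:

```python
data = list(range(100, 150))
pf = [0] * 48
wripf_v1(pf, data, 0, 5, 3)   # pf starts 105, 106, 107, 0
```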
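The pointer arithmetic in post 14 relies on one bit-merge idiom: ((t ^ m) & mask) ^ m takes t's bits where mask is 1 and m's bits elsewhere. The Python sketch below repeats the post's steps to turn a pat_tbl pattern and tempvar into the three table indices. The pattern value 0xAA used in the test is hypothetical, since the real pat_tbl contents aren't shown.

```python
def select_pieces(pattern, tempvar):
    """Return (top, mid, bot) indices into top_tbl, mid_tbl, bot_tbl."""
    mid = pattern                                    # mid = pat_tbl[rclass]
    bot = ((tempvar ^ mid) & 0b00000101) ^ mid       # tempvar bits 0 and 2 replace the pattern's
    top = (bot // 4) & 0b00000011                    # bits 2-3 -> top_tbl index (0-3)
    bot = bot & 0b00000011                           # bits 0-1 -> bot_tbl index (0-3)
    mid = mid // 4 // 4                              # upper nibble of the pattern
    mid = (((tempvar ^ mid) & 0b00000010) ^ mid) & 0b00000111   # tempvar bit 1 -> mid_tbl index (0-7)
    return top, mid, bot

print(select_pieces(0xAA, 0), select_pieces(0xAA, 7))
```

Only tempvar bits 0-2 matter here: bit 0 reaches the bot index, bit 1 the mid index, and bit 2 the top index, matching the description in the post.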
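The revb macro in post 15 reverses a byte using two 16-entry tables, one per nibble. The Python sketch below rebuilds rev_tbl_lo and rev_tbl_hi and repeats the macro's lookup. The checks confirm two things: the tables match the data statements, and reversing the first program's odd-column bytes gives the second program's data (e.g. %00010111 and %11101000).

```python
def reverse_nibble(n):
    """Reverse the 4 bits of n."""
    return int(f"{n:04b}"[::-1], 2)

# rev_tbl_lo[n]: low nibble n reversed into the high nibble.
# rev_tbl_hi[n]: high nibble n reversed into the low nibble.
rev_tbl_lo = [reverse_nibble(n) << 4 for n in range(16)]
rev_tbl_hi = [reverse_nibble(n) for n in range(16)]

def revb(value):
    """Same steps as the revb macro."""
    lo = value & 0x0F                        # and #$0F / tax
    hi = value >> 4                          # lsr x4 / tay
    return rev_tbl_lo[lo] | rev_tbl_hi[hi]   # lda rev_tbl_lo,x / ora rev_tbl_hi,y

print(f"{revb(0b00010111):08b}")
```

Two 16-byte tables cost 32 bytes of ROM instead of a 256-byte full reversal table, at the price of a few extra instructions per byte.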
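The 28-column early out in post 17 can be modeled in Python to see exactly which bits reach the screen. This sketch replaces pfpixel with a list of (column, row, on) tuples. It assumes the data is laid out 4 bytes per playfield row, as in the post. When column 27 is written, the row advances and the rest of that byte is skipped.

```python
def unpack_28(data):
    """Return (col, row, on) for each pixel, 28 columns per row, as post 17 walks the bytes."""
    pixels = []
    col = row = 0
    for byte in data:                      # current_byte = L4_x[byte_ptr]
        bit = 0x80                         # current_bit = $80
        while bit:                         # BIT_LOOP
            pixels.append((col, row, bool(byte & bit)))   # pfpixel ... on / off
            if col == 27:                  # last column: next row, skip rest of this byte
                col = 0
                row += 1
                break
            col += 1
            bit //= 2                      # current_bit = current_bit / 2
    return pixels
```

With 4 bytes per row, only the high nibble of every fourth byte is drawn. In the test, the set bit %00010000 of byte 3 lands on column 27, while its low nibble never appears.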
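The current_row | byte_col addressing in posts 19 and 20 works because current_row steps by 4 and byte_col stays in 0-3. The two never share a set bit, so OR behaves like addition. The short Python check below confirms that for every row start in the 44-row loop. It also shows why post 19's "byte_col should only go to 3" matters: with byte_col = 4, the OR no longer adds, and the routine reads the wrong byte.

```python
def row_byte_index(current_row, byte_col):
    """Index used by the routine: current_row | byte_col."""
    return current_row | byte_col

ROW_STARTS = range(0, 173, 4)          # for current_row = 0 to 172 step 4
or_is_add = all(row_byte_index(r, c) == r + c
                for r in ROW_STARTS for c in range(4))
print(len(ROW_STARTS), or_is_add)

# Letting byte_col reach 4 breaks it: row 4, column 4 reads byte 4, not byte 8.
print(row_byte_index(4, 4), 4 + 4)
```

OR is used here only because it matches addition under these ranges. An add would work equally well, but the range assumption is what keeps the OR correct.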