bogax
Posts posted by bogax
-
-
You have a trade off between how much adding of extra
pieces you want to do and how nice a distribution you want
You could start with 5 bits ie begin by ANDing your random
number with $1F but the distribution won't be very good
You probably don't need to use all eight bits though
(I'm assuming you're starting with 8 bits ie 0-255)
After a second look, it's really not
much more to use all 8 bits
It only costs a couple cycles and it
gives you a better distribution.
The code is basically the same.
 sta temp
 lsr
 adc temp
 ror
 lsr
 lsr
 adc temp
 ror
 lsr
 lsr
 lsr
 eor #$FF
 clc
 adc #$01
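One way to check the range of the 8-bit routine without running it on hardware is to model it off-target. This is only a Python sketch, not 6502 code: the helper names (`lsr`, `ror`, `adc`, `scale8`, `negate`) are made up for illustration, and it assumes the usual 6502 carry behaviour for those instructions.

```python
from collections import Counter

def lsr(a):
    """Logical shift right: returns (result, carry_out)."""
    return a >> 1, a & 1

def ror(a, c):
    """Rotate right through carry: the old carry enters bit 7."""
    return (a >> 1) | (c << 7), a & 1

def adc(a, m, c):
    """Add with carry: returns (8-bit result, carry_out)."""
    s = a + m + c
    return s & 0xFF, s >> 8

def scale8(r):
    """Model of the routine above, up to but not including the negate."""
    temp = r                    # sta temp
    a, c = lsr(r)               # lsr
    a, c = adc(a, temp, c)      # adc temp
    a, c = ror(a, c)            # ror
    a, c = lsr(a)               # lsr
    a, c = lsr(a)               # lsr
    a, c = adc(a, temp, c)      # adc temp
    a, c = ror(a, c)            # ror
    for _ in range(3):          # lsr lsr lsr
        a, c = lsr(a)
    return a

def negate(a):
    """eor #$FF / clc / adc #$01 (two's complement in 8 bits)."""
    return ((a ^ 0xFF) + 1) & 0xFF

counts = Counter(scale8(r) for r in range(256))
print(min(counts), max(counts), len(counts))
```

In this model the 256 inputs map onto 0 through 18 before the negate, so the final result is 0 or one of -1 through -18 as an 8-bit two's complement value.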
-
There are a number of ways. You could AND out the higher bits first so that there's a much better chance, e.g. AND #$1F means you'd have 0-31, but still nearly half of those would fail.
Better again might be to just build the random number out of the sum of 2 smaller ones.
e.g.
 LDA $D20A
 AND #3 ; will always be 0-3
 STA RND
 LDA $D20A
 AND #$F ; will always be 0-15
 CLC
 ADC RND
 STA RND ; will always be 0-18
Have you actually checked to see how well that works?
If it were really random, I don't think you'd get an
even distribution.
I think on average 0 and 18 will come up 1 in 64 times,
1 and 17 will come up 2 in 64 times, 2 and 16 will come
up 3 out of 64 and all the rest 4 out of 64.
If it's not random (and it's not, but maybe close enough)
it might be worse.
Imagine what it would be if it were a simple binary
counter instead of a polynomial counter (LFSR)
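The distribution of the sum can be tabulated by brute force. This is a small Python sketch under the idealised assumption that the two reads are independent and uniform (which, as noted, the hardware counter is not exactly); the variable names are made up for illustration.

```python
from collections import Counter
from fractions import Fraction

# Every combination of a 2-bit value (0-3) and a 4-bit value (0-15).
ways = Counter(a + b for a in range(4) for b in range(16))
total = sum(ways.values())          # 4 * 16 combinations

prob = {s: Fraction(n, total) for s, n in sorted(ways.items())}
for s, p in prob.items():
    print(s, ways[s], p)
```

The ends of the range are reachable by only one combination each, while every sum from 3 through 15 is reachable by four, so the result is not evenly distributed even with a perfect random source.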
-
One way is to just scale what you've got to what you want
You have a trade off between how much adding of extra
pieces you want to do and how nice a distribution you want
You could start with 5 bits ie begin by ANDing your random
number with $1F but the distribution won't be very good
You probably don't need to use all eight bits though
(I'm assuming you're starting with 8 bits ie 0-255)
Here's code that uses a number 0-63
It multiplies (with some rounding thrown in,
ie no clearing the carry for the adds)
by 1.1875, 1 3/16 then divides by 4 and negates
 ; assuming there's a random number 0-255 in a
 and #$3F ; mask for 0-63 call that r
 sta temp
 lsr ; / 2 = .5 * r
 adc temp ; + 1 = 1.5 * r
 lsr
 lsr
 lsr ; 1.5 / 8 = .1875 * r
 adc temp ; + 1 = 1.1875 * r , 75 max with rounding
 lsr
 lsr ; 75 / 4 = 18 max
 ; negate
 eor #$FF
 clc
 adc #$01
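To see how the 64 possible inputs spread over the results, the routine can be modelled in Python. This is a sketch only: `scale6` is a made-up name, and the carry handling is my reading of how `lsr` and `adc` behave on a 6502.

```python
from collections import Counter

def scale6(rand8):
    """Model of the 0-63 routine above, before the final negate."""
    r = rand8 & 0x3F            # and #$3F
    a, c = r >> 1, r & 1        # lsr, carry = bit shifted out
    a = a + r + c               # adc temp (max 95, so no 8-bit overflow)
    c = (a >> 2) & 1            # carry left behind by the third lsr
    a >>= 3                     # lsr lsr lsr
    a = a + r + c               # adc temp (max 75)
    return a >> 2               # lsr lsr

counts = Counter(scale6(r) for r in range(64))
for value in sorted(counts):
    print(value, counts[value])
```

Since 64 inputs cannot divide evenly among 19 results, some values necessarily come up more often than others, which is the trade off against using more bits.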
-
I'm not sure how I could get a random number that would place the object in multiples of 10 with those operations.
This seems to be the best useable method so far:
tempvar = ((rand&7)+1)
player1y = (tempvar*10)-30
I didn't mean that particular bit, I meant elsewhere where there's a multiply by 16
specifically, this line in the print_screenright routine:
if change = 32 then worldx = rand*16 : change = 0 else change = change + 1
-
re GroovyBee's comment about faster than calling a generic multiply,
if you use *4*4 or *2*8 instead of *16 bB will use shifts instead of calling
the multiply routine.
There's other optimizations you could make also. (but they might mess
with the clarity of your code)
-
Here's an UNTESTED routine that counts through
pf columns and rows maintaining a ds_byte counter
and a current_bit counter in parallel
You supply it with the first column and row
from the data statement in ds_col and ds_row
there's 64 columns 0-63 in the data statement but
you only want to go up to 32 or you'll get into
the next data statement row when writing the
playfield
obviously, rows will be similar and if you go too
far you'll fall off the end of the data statement
 dim pf_row = a
 dim pf_col = b
 dim ds_start_bit = c
 dim current_bit = d
 dim current_byte = e
 dim ds_byte = f
 dim ds_col = g
 dim ds_row = h

printpf
 ds_byte = ds_col / 8
 ds_byte = ds_row * 8 + ds_byte
 ds_start_bit = ds_col & 7
 for pf_row = 0 to 10
 current_bit = setbits[ds_start_bit]
 current_byte = map[ds_byte]
 for pf_col = 0 to 31
 if current_byte & current_bit then pfpixel pf_col pf_row on else pfpixel pf_col pf_row off
 current_bit = current_bit / 2
 rem if current_bit = 0 then done with one
 rem data statement byte so go to the next
 rem get the corresponding map byte
 rem and reset current_bit to the first bit column
 if current_bit = 0 then ds_byte = ds_byte + 1 : current_byte = map[ds_byte] : current_bit = $80
 next
 rem we have incremented ds_byte by 4
 rem in the pf_col loop
 rem 8 bytes per data statement row
 rem we need to advance 4 more to go to the
 rem next row in the data statement
 ds_byte = ds_byte + 4
 next
 return

 data setbits
 %10000000, %01000000, %00100000, %00010000
 %00001000, %00000100, %00000010, %00000001
end

 data map
 %11111111, %11111111, %11111111, %11111111, %11111111, %11111111, %11111111, %11111111
 %10000000, %00000000, %00000000, %00000001, %10000000, %00000000, %00000000, %00000001
 %10100000, %00000000, %00111111, %11111001, %10100000, %00000000, %00111111, %11111001
 %10100000, %00100000, %00100000, %00110001, %10100000, %00100000, %00100000, %00110001
 %10100000, %00100000, %00100000, %11000001, %10100000, %00100000, %00100000, %11000001
 %10100000, %00101111, %11100011, %00000001, %10100000, %00100001, %11100011, %00000001
 %10100000, %00111000, %00001100, %00000001, %10100000, %00000000, %00001100, %00000001
 %10100000, %00000000, %00110000, %00000001, %10100000, %00000000, %00110000, %00000001
 %10111111, %11111111, %11000000, %00000001, %10111111, %11000001, %11000000, %00000001
 %10000000, %00000000, %00000000, %00000001, %10000000, %00000000, %00000000, %00000001
 %11111111, %11111111, %11111111, %11111111, %11111111, %11111111, %11111111, %11111111
 %10101010, %01010101, %01010101, %01010100, %01010101, %01010001, %01010101, %01010101
 %10000000, %00000000, %00000000, %00000001, %00000000, %00000000, %00000000, %00000001
 %10000000, %00000000, %00000001, %10000001, %00100000, %00000000, %00111111, %11111001
 %10000000, %00100000, %00000000, %00110001, %00100000, %00100000, %00100000, %00110001
 %10000000, %00100000, %00000000, %11000001, %00100000, %00100000, %00100000, %11000001
 %10000000, %00101111, %10000011, %00000001, %00100000, %00100111, %11100011, %00000001
 %10000000, %00111000, %00000000, %00000001, %00100000, %00111000, %00001100, %00000001
 %10000000, %00000000, %00000000, %00000001, %00100000, %00000000, %00110000, %00000001
 %10000000, %11111111, %00000000, %00000001, %10000011, %10000011, %11000000, %00000001
 %10000000, %00000000, %00000000, %00000001, %10000000, %00000000, %00000000, %00000001
 %11111111, %11111111, %11111111, %11111111, %11111111, %11111111, %11111111, %11111111
end
-
First off if you're going to do things that are in whole
bytes it'll be much faster to handle things in whole bytes
So for example assuming the line in the data statement
that goes to a line in the playfield starts at the
beginning of a byte and ends at the end of a byte
(ie it does NOT start half way through a byte, say at bit 4,
and end half way through a byte at bit 4)
then it will be much faster to just transfer the 4 bytes
than call pfpixel 32 times.
pfpixel "knows" about rows and columns but the data
statement doesn't.
You've got 8 bytes per line in the data statement.
the first row starts at 0
0-7 bytes inclusive for the first line in the data statement.
The second line of the data statement begins with byte 8 of the
data statement, bytes 8-15 inclusive etc
4 bytes per line of the playfield (columns 0-31 inclusive)
so the playfield lines correspond to data statement bytes like so
(for the upper left corner of the data statement)
pfrow   pfbytes   data statement bytes
0       0-3       0-3
1       4-7       8-11
2       8-11      16-19
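The table above follows from two strides: 4 bytes per playfield row and 8 bytes per data statement row. Here is a Python sketch of that mapping; the function name and the `ds_row0` / `ds_byte0` parameters (for starting somewhere other than the upper left corner) are illustrative assumptions, not part of any bB routine.

```python
PF_BYTES_PER_ROW = 4    # playfield row is 32 bits wide
DS_BYTES_PER_ROW = 8    # data statement row is 64 bits wide

def row_byte_ranges(pf_row, ds_row0=0, ds_byte0=0):
    """Return (playfield byte range, data statement byte range) for one row.

    ds_row0 / ds_byte0 pick the data statement row and byte column that
    map to the upper left corner of the playfield.
    """
    pf_start = pf_row * PF_BYTES_PER_ROW
    ds_start = (ds_row0 + pf_row) * DS_BYTES_PER_ROW + ds_byte0
    return (range(pf_start, pf_start + PF_BYTES_PER_ROW),
            range(ds_start, ds_start + PF_BYTES_PER_ROW))

for row in range(3):
    pf, ds = row_byte_ranges(row)
    print(row, f"{pf[0]}-{pf[-1]}", f"{ds[0]}-{ds[-1]}")
```

With the defaults this prints the same three rows as the table; passing `ds_byte0=4` would select the right half of each data statement row instead.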
You could do something like in this post:
http://www.atariage....50#entry2641780
and build a counter in for-next loops that counts by bits
bytes and rows in the data statement except instead of taking
an early out at 28 bits you'd take your early out at four bytes
You wouldn't need a special test you'd just need a for-next loop
that counted the correct four bytes. (an early out being 'quit this
data statement row and go to the next data statement row')
And of course it'll have to maintain a pfpixel columns, rows counter
in parallel
pfcolumns advance on every bit and get reset to 0 after
every four bytes.
pfrows advance every four bytes.
edit:
you elaborated while I was composing
you could still do something similar except maybe
count through the data statement by bits and rows
taking your early out after 32 bits which would
be, I suppose, when you've finished one playfield row
data statement rows would still go by 8 bytes
data statement bits would have to pick out the correct
32 bits, which would be the same for each row
you'd start at some bit and row in the data statement
count through data statement bit and pfcolumns in parallel
and when you've done 32 bits of the playfield column
go to the next data statement row resetting everything
to wherever it needs to be for the next row (pfcolumn
to 0, data statement bit to the starting bit in the row,
pfrow to the next pfrow)
or you could still count through the data statement by
bits, bytes, and rows which I think would be faster
but probably not faster enough to make it worth the added
complication.
-
I tried it over and over again, but so far the program hasn't stopped in Stella using autoexec.stella.
I expect that's because I added a drawscreen at the beginning of
print_style routine.
I rewrote the randomization stuff to speed it up but it's not enough.
edit: looks like adding inlinerand might do it
calc_roomtype
 if dungeonlvl{1} then temp2 = %00100100 : temp3 = %00000010 else temp2 = 0 : temp3 = 0
 if worldx{0} then temp1 = %01001001 : temp2 = temp2 | %00000001 else temp1 = 0
 if worldy{6} then temp1 = temp1 | %00000010
 if worldy{0} then temp1 = temp1 | %00010000
 if dungeonlvl{0} then temp1 = temp1 | %00100100
 if worldy{2} then temp1 = temp1 | %10000000
 temp1 = temp1 * 2 + temp1
 if worldy{4} then temp2 = temp2 | %00000010
 if temp1{7} then temp2 = temp2 | %00001000
 if worldy{1} then temp2 = temp2 | %00010000
 if worldx{1} then temp2 = temp2 | %01000000 : temp3 = temp3 | %00000001
 if temp1{1} then temp2 = temp2 | %10000000
 if dungeonlvl{5} then temp3 = temp3 | %00000100
 if worldx{2} then temp3 = temp3 | %00001000
 if worldy{2} then temp3 = temp3 | %00010000
 if dungeonlvl{2} then temp3 = temp3 | %00100000
 if temp1{3} then temp3 = temp3 | %01000000
 if worldy{7} then temp3 = temp3 | %10000000
 temp1 = temp1 * 2 + temp2 + temp3 + gameseed
 if temp1 then rand = temp1 else rand = 255
 roomtype = rand
 tempvar = rand + worldx + worldy
 if counter > 230 then roomtype = rand
 if worldx = 128 && worldy = 128 && dungeonlvl = 1 then roomtype = 1
 rclass = roomtype / 4 / 4 : roomtype = roomtype & $0F
 if rclass = 15 then rclass = style_a
 return thisbank
-
Speaking of which, does this still run over CPU cycles with the new inline asm bogax cooked up?
I think the short answer is yes
fact is, I've forgotten what all I've done about that
I said in a previous post that it was still going over even
with a liberal sprinkling of drawscreens
I'm not seeing that now
I put a drawscreen at the beginning of the print_style
routine and that seems to take care of it even with the
for-next loop version.
-
Here's a version with some asm
The asm routine is in a macro.
putting it in a macro allows passing a data statement
as a parameter to a particular instance.
 macro WRIPFMAC
 asm
 ldx {2}
 ldy {3}
 lda {4}
 clc
 adc {2}
 sta {4}
WRI_LOOP
 lda {1},y
 sta $A4,x
 iny
 inx
 cpx {4}
 bcc WRI_LOOP
end
end
To use it as a subroutine you'd give a label,
invoke the macro with its four parameters
then supply a return.
The four parameters (in this order) are:
The name of the data statement.
The variable used to pass the location of the first byte to be
written in the playfield.
The variable used to pass the location of the first byte in the
data statement.
The variable used to pass the number of bytes to move and this
one gets modified.
So eg you create a subroutine using the macro.
Then to use it set up the three variables and call the subroutine.
 pf_ptr = 0 : dat_ptr = top_tbl[top] : dat_len = 12
 gosub WRI_PF
 pf_ptr = 12 : dat_ptr = mid_tbl[mid] : dat_len = 20
 gosub WRI_PF
 pf_ptr = 32 : dat_ptr = bot_tbl[bot] : dat_len = 12
 gosub WRI_PF
 return otherbank

WRI_PF
 callmacro WRIPFMAC room_dat pf_ptr dat_ptr dat_len
 return thisbank
Here's a second macro. Use is the same.
This has a little more overhead but is faster in the loop
so you save a few cycles for moving more than 1 line.
It also costs a few more bytes
It uses temp variables temp1-temp4 but doesn't touch the
parameter variables.
 macro WRIPFMAC
 asm
 lda #$00
 sta temp2
 ldy {4}
 clc
 lda #$A4
 adc {2}
 sta temp1
 lda {3}
 adc #<{1}
 sta temp3
 lda #>{1}
 adc #$00
 sta temp4
WRI_LOOP
 lda (temp3),y
 sta (temp1),y
 dey
 bpl WRI_LOOP
end
end
-
I didn't count cycles exactly.
The for-next loop takes something more than twice as
long per byte as a playfield statement (34 cycles vs.
14 cycles), but there's less overhead calling other banks
and no scrolling things into place. I figured they're
roughly equal but the for-next loop probably does take
longer.
34 cycles x 44 bytes + a couple hundred overhead
is only a couple thousand cycles or so.
I don't think it's taking too long
But I'm not through. I expected to throw a few bytes
of asm in there to do the actual moving of bytes.
I just hadn't decided what would be the best way to
structure it. I want something somewhat general.
About as general as the for-next routine but perhaps
with the possibility of passing various data statements
as a parameter.
-
I've been playing with theloon's code.
This is the result: http://pastebin.com/Wq8zKc08
My purpose was not primarily to rewrite or
streamline his code (although I ended up
doing a bit of that) but to put it in a
form that would make it easier to write the
play field bytewise from bB.
(if I were going to stream line it I think the
first thing I might try is replacing long strings
of if-thens with on-gotos)
The code draws the playfield in three pieces,
(which I called) top, mid, bot
Top and bot are three lines each and mid
is five lines.
There are 6 three line pieces and 8 five line
pieces to select from. Which gets drawn,
is selected by strings of if-thens (in the original
code) by the variable rclass and bits in tempvar.
rclass ranges from 0-14 and for each rclass (three)
bits of tempvar are used to select from one of two
possibilities each for top, mid and bot.
If you tabulate the possible selections (that are
actually in the code) there are four possible tops,
eight mids and four bots. (there are 4 bits of rclass
and 3 bits of tempvar giving a max 128 possibilities
however those 128 could be selected from a number
limited only by how much ROM you've got, but that's
not what's done)
I put all the 3 and 5 line pieces in a data statement.
Each rclass gets a pattern in pat_tbl.
The pattern is melded with tempvar to get three pointers
in to top_tbl, mid_tbl and bot_tbl which then points to
one of the 4, 8 or 4 (respectively) possibilities.
The contents of top_tbl, mid_tbl and bot_tbl are the
beginning location in the room_dat data statement for
their (respective) pieces which are to be written to the
playfield.
bit 0 of tempvar corresponds to the bottom piece
bit 1 to the middle piece and bit 2 to the top piece.
To get the pointers that point in to the top, mid and
bot tables:
First tempvar is overlaid with the pattern using eor.
 mid = pat_tbl[rclass]
 bot = ((tempvar ^ mid) & %00000101) ^ mid
Bits two and three of the result are the top_tbl pointer
and are shifted into place with a division by four and
selected with an and mask
top = bot / 4 & %00000011
The lower two bits of the result are the bot_tbl pointer
and are selected with an and mask
bot = bot & %00000011
Then the pattern is shifted with a divide by 16 (dividing by
4 twice) and overlaid on the tempvar and the bottom three
bits masked to get the mid_tbl pointer.
 mid = mid / 4 / 4
 mid = (((tempvar ^ mid) & %00000010) ^ mid) & %00000111
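The expression `((tempvar ^ pattern) & mask) ^ pattern` takes the bits of tempvar wherever the mask has a 1 and the bits of the pattern everywhere else. A Python sketch of that overlay and of the three pointers derived from it follows; the function names and the sample pattern value are made up for illustration and are not taken from pat_tbl.

```python
def merge_bits(a, b, mask):
    """Bits of a where mask is 1, bits of b where mask is 0."""
    return ((a ^ b) & mask) ^ b

def select_pieces(tempvar, pattern):
    """Return the (top, mid, bot) table indexes as described above."""
    merged = merge_bits(tempvar, pattern, 0b00000101)
    top = (merged >> 2) & 0b00000011        # bot / 4 & %00000011
    bot = merged & 0b00000011
    mid_pattern = pattern >> 4              # mid / 4 / 4
    mid = merge_bits(tempvar, mid_pattern, 0b00000010) & 0b00000111
    return top, mid, bot

# Hypothetical pattern byte, chosen only to exercise the masks.
print(select_pieces(0b00000101, 0b01011010))
```

The eor form needs only one mask constant, where the more obvious `(a & mask) | (b & ~mask)` needs the mask and its complement.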
The results of those table lookups
 mid = mid_tbl[mid]
 bot = bot_tbl[bot]
 top = top_tbl[top]
which are the positions
in the data statement of the piece to be written are then
passed to the WRI_PF subroutine (one at a time) in dat_ptr
along with the location of the last byte to write in dat_last
and the location in the playfield of the first byte to be
written to in pf_ptr
 pf_ptr = 0 : dat_ptr = top : dat_last = top + 11
 gosub WRI_PF
 pf_ptr = 12 : dat_ptr = mid : dat_last = mid + 19
 gosub WRI_PF
 pf_ptr = 32 : dat_ptr = bot : dat_last = bot + 11
 gosub WRI_PF
WRI_PF is just a for-next loop that reads
the data statement and writes to the playfield.
WRI_PF
 for dat_ptr = dat_ptr to dat_last
 pfbase[pf_ptr] = room_dat[dat_ptr]
 pf_ptr = pf_ptr + 1
 next
 return thisbank
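As a rough model of how the three calls tile the playfield: 12 + 20 + 12 bytes is 44 bytes, i.e. 11 rows of 4 bytes. The Python below is only a sketch with placeholder data; `write_pf`, `build_room` and the offsets used in the example are hypothetical and do not reflect the real room_dat layout.

```python
def write_pf(playfield, room_dat, pf_ptr, dat_ptr, dat_last):
    """Copy room_dat[dat_ptr..dat_last] inclusive into playfield at pf_ptr."""
    for src in range(dat_ptr, dat_last + 1):
        playfield[pf_ptr] = room_dat[src]
        pf_ptr += 1

def build_room(room_dat, top, mid, bot):
    """Assemble a 44-byte playfield image from three pieces."""
    playfield = bytearray(44)
    write_pf(playfield, room_dat, 0, top, top + 11)    # 3 rows
    write_pf(playfield, room_dat, 12, mid, mid + 19)   # 5 rows
    write_pf(playfield, room_dat, 32, bot, bot + 11)   # 3 rows
    return playfield

# Placeholder data: byte i simply holds the value i.
room_dat = bytes(range(100))
pf = build_room(room_dat, top=0, mid=40, bot=80)
print(pf.hex(" "))
```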
There are also four routines that create openings in the rooms.
I rewrote them to write bytes to the playfield instead of calling
pfpixel. They just do a series of read-mask-writes of the
appropriate bytes to the appropriate spots in the playfield. eg:
print_open_left
 pfbase[16]=pfbase[16] & %00000011
 pfbase[20]=pfbase[20] & %00000011
 pfbase[24]=pfbase[24] & %00000011
 return otherbank
-
I'm still not sure what you guys are trying to do

Here's a couple that write the playfield byte wise
One needs the reversed data one has a bit of asm
to do the reversing.
 set romsize 4k

 dim mapx = a
 dim mapy = b
 dim pfy = c
 const mem = $A4

 COLUPF=$FF
 COLUBK=$00

 playfield:
 ....XX...XX...XXX...XX...XX....
 ....X.X..X.X..X....X....X......
 ....XX...XX...XX....X....X.....
 ....X....X.X..X......X....X....
 ....X....X.X..XXX..XX...XX.....
 ...............................
 .....XX..XX....X....XX..XXX....
 ....X....X.X..X.X..X....X......
 .....X...XX...XXX..X....XX.....
 ......X..X....X.X..X....X......
 ....XX...X....X.X...XX..XXX....
end

PROGRAMLOOP
 if joy0fire then let z = z | 1
 if !joy0fire && z then gosub DRAW_MAP
 drawscreen
 goto PROGRAMLOOP

DRAW_MAP
 if mapy > 39 then mapy = 0 else mapy = mapy + 4
 for pfy = 0 to 40 step 4
  for mapx = 0 to 3
   temp1 = mapy | mapx
   temp2 = pfy | mapx
   mem[temp2] = map[temp1]
  next
  if mapy > 39 then mapy = 0 else mapy = mapy + 4
 next
 z = 0
 return

 data map
 %00001010, %00010111, %10001110, %00000000,
 %00001010, %00010001, %10001010, %00000000,
 %00001110, %00010011, %10001010, %00000000,
 %00001010, %00010001, %10001010, %00000000,
 %00001010, %01110111, %11101110, %00000000,
 %00000000, %00000000, %00000000, %00000000,
 %00001000, %01100100, %01110010, %00111000,
 %00001000, %10010100, %01001010, %01001000,
 %00000101, %10010010, %01110010, %01001000,
 %00000010, %01100001, %01001011, %00111011,
 %00000000, %00000000, %00000000, %00000000,
 %00000000, %00000000, %00000000, %00000000
end
 set romsize 4k

 dim mapx = a
 dim mapy = b
 dim pfy = c
 const mem = $A4

 macro revb
 asm
 lda temp1
 and #$0F
 tax
 lda temp1
 lsr
 lsr
 lsr
 lsr
 tay
 lda rev_tbl_lo,x
 ora rev_tbl_hi,y
 sta temp1
end
end

 COLUPF=$FF
 COLUBK=$00

 playfield:
 ....XX...XX...XXX...XX...XX....
 ....X.X..X.X..X....X....X......
 ....XX...XX...XX....X....X.....
 ....X....X.X..X......X....X....
 ....X....X.X..XXX..XX...XX.....
 ...............................
 .....XX..XX....X....XX..XXX....
 ....X....X.X..X.X..X....X......
 .....X...XX...XXX..X....XX.....
 ......X..X....X.X..X....X......
 ....XX...X....X.X...XX..XXX....
end

PROGRAMLOOP
 if joy0fire then let z = z | 1
 if !joy0fire && z then gosub DRAW_MAP
 drawscreen
 goto PROGRAMLOOP

DRAW_MAP
 if mapy > 39 then mapy = 0 else mapy = mapy + 4
 for pfy = 0 to 40 step 4
  for mapx = 0 to 3
   temp1 = mapy | mapx
   temp1 = map[temp1]
   if mapx & 1 then callmacro revb
   temp2 = pfy | mapx
   mem[temp2] = temp1
  next
  if mapy > 39 then mapy = 0 else mapy = mapy + 4
 next
 z = 0
 return

 data map
 %00001010, %11101000, %10001110, %00000000,
 %00001010, %10001000, %10001010, %00000000,
 %00001110, %11001000, %10001010, %00000000,
 %00001010, %10001000, %10001010, %00000000,
 %00001010, %11101110, %11101110, %00000000,
 %00000000, %00000000, %00000000, %00000000,
 %00001000, %00100110, %01110010, %00011100,
 %00001000, %00101001, %01001010, %00010010,
 %00000101, %01001001, %01110010, %00010010,
 %00000010, %10000110, %01001011, %11011100,
 %00000000, %00000000, %00000000, %00000000,
 %00000000, %00000000, %00000000, %00000000
end

 data rev_tbl_lo
 %00000000, %10000000, %01000000, %11000000
 %00100000, %10100000, %01100000, %11100000
 %00010000, %10010000, %01010000, %11010000
 %00110000, %10110000, %01110000, %11110000
end

 data rev_tbl_hi
 %00000000, %00001000, %00000100, %00001100
 %00000010, %00001010, %00000110, %00001110
 %00000001, %00001001, %00000101, %00001101
 %00000011, %00001011, %00000111, %00001111
end
http://pastebin.com/VCPHUBeQ without asm
http://pastebin.com/tMp24XXT with asm
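The revb macro in the asm version reverses a byte with two 16-entry nibble tables: the low nibble indexes a table of reversed nibbles already moved to the high half, the high nibble indexes a table of reversed nibbles in the low half, and the two are ORed together. This Python sketch copies the table contents from rev_tbl_lo and rev_tbl_hi and checks them against a plain bit reversal; the Python names are illustrative.

```python
REV_TBL_LO = [
    0b00000000, 0b10000000, 0b01000000, 0b11000000,
    0b00100000, 0b10100000, 0b01100000, 0b11100000,
    0b00010000, 0b10010000, 0b01010000, 0b11010000,
    0b00110000, 0b10110000, 0b01110000, 0b11110000,
]
REV_TBL_HI = [
    0b00000000, 0b00001000, 0b00000100, 0b00001100,
    0b00000010, 0b00001010, 0b00000110, 0b00001110,
    0b00000001, 0b00001001, 0b00000101, 0b00001101,
    0b00000011, 0b00001011, 0b00000111, 0b00001111,
]

def revb(value):
    """Reverse the bit order of a byte using the two nibble tables."""
    return REV_TBL_LO[value & 0x0F] | REV_TBL_HI[value >> 4]

def reverse_bits_reference(value):
    """Straightforward reversal, used only to check the tables."""
    return int(f"{value:08b}"[::-1], 2)

mismatches = [v for v in range(256) if revb(v) != reverse_bits_reference(v)]
print("mismatches:", mismatches)
```

Two 16-byte tables cost 32 bytes of ROM, against 256 bytes for a full byte-reversal table, at the price of the extra shifts and the second lookup.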
-
I have the feeling I'm not doing a very good job of explaining myself
For that bit of code that's setup for 5 bytes per line ie x goes to 39
the only reason to do it that way is if for some reason you want to
visualize it that way. That is, if you want to conceptualize your data
as 40 pixels by (whatever) and use 5 bytes per line in a data statement.
the code itself doesn't care how the data is arranged in the data statement.
And that's sort of (partly) why I wasn't supplying working code.
It's just meant to be a description of how to approach the problem.
Specific problems are likely to need specific solutions.
(does anybody actually use 40 pixel lines?)
That particular bit of code I didn't test except to see that it
compiled.
Usually I test the code but not always, and not always before I
post (I do try to remember to say if it's tested or not but I don't
always remember to do that either)
So I could post an actual working program usually but what good
would it actually be? It's usually just a hello world type thing
not really useful or meant/expected to be.
And I still haven't gotten any DPC+ stuff to compile
-
The code is a big counter.
It counts by bits, bytes and data statements
It maintains a counter in parallel that counts
by columns and rows. The columns advance for every bit
and the rows for every fourth byte.
The bytes go 0,1,2,3...
Three is the fourth byte so the code masks the lower
two bits (ands byte_ptr with a mask that happens to equal
three, ie byte_ptr & 3) and increments the row if they equal
three.
To do 28 bits the easiest would be to check the column
and if it's 27 reset it to 0, increment the row and skip
the rest of this byte and go to the next byte.
 dim print_row = a
 dim current_col = b
 dim current_bit = c
 dim current_byte = d
 dim ds_index = e
 dim byte_ptr = f

WritePFChunk
 print_row = 0
 current_col = 0
 for ds_index = 0 to 3
  for byte_ptr = 0 to 175
   current_bit = $80
   on ds_index goto DCASE0 DCASE1 DCASE2 DCASE3
NEXT_BYTE
  next
 next
 return

DCASE0
 current_byte = L4_0[byte_ptr] : goto BIT_LOOP
DCASE1
 current_byte = L4_1[byte_ptr] : goto BIT_LOOP
DCASE2
 current_byte = L4_2[byte_ptr] : goto BIT_LOOP
DCASE3
 current_byte = L4_3[byte_ptr]
BIT_LOOP
 if current_byte & current_bit then pfpixel current_col print_row on else pfpixel current_col print_row off
 if current_col = 27 then current_col = 0 : print_row = print_row + 1 : goto NEXT_BYTE
 current_col = current_col + 1
 current_bit = current_bit / 2
 if current_bit then goto BIT_LOOP
 goto NEXT_BYTE
-
Your routine takes 105 more bytes than the OP, but one less variable:
I was going for speed not code size.
you've switched to 28 columns.
Here's a shorter version of the previous code.
 dim print_row = a
 dim current_col = b
 dim current_bit = c
 dim current_byte = d
 dim ds_index = e
 dim byte_ptr = f

WritePFChunk
 print_row = 0
 current_col = 0
 for ds_index = 0 to 3
  for byte_ptr = 0 to 175
   current_bit = $80
   on ds_index goto DCASE0 DCASE1 DCASE2 DCASE3
DCASE0
   current_byte = L4_0[byte_ptr] : goto BIT_LOOP
DCASE1
   current_byte = L4_1[byte_ptr] : goto BIT_LOOP
DCASE2
   current_byte = L4_2[byte_ptr] : goto BIT_LOOP
DCASE3
   current_byte = L4_3[byte_ptr]
BIT_LOOP
   if current_byte & current_bit then pfpixel current_col print_row on
   current_col = current_col + 1
   current_col = current_col & $1F
   current_bit = current_bit / 2
   if current_bit then goto BIT_LOOP
   if byte_ptr & $03 = 3 then print_row = print_row + 1
  next
 next
 return
-
I see one goof right off. The byte_col loop should only go to 3
Also, the addressing mode in the macro is wrong.
The macro doesn't work any way. I could have sworn I'd done
that before but...
Here's some code that's tested. Print_row/a, current_col/b are
referenced in hex in the asm so if they're dimmed different the
asm will have to be changed. It may not work anyway in DPC+
I don't know how that's set up.
As for the formatting I'll see if I can attach some files.
(they're just this code not a complete program that's actually
going to run)
One with the asm and one with out.
So try this
 dim print_row = a
 dim current_col = b
 dim current_row = c
 dim byte_col = d
 dim current_byte = e
 dim ds_index = f

WritePFChunk
 print_row = 0
 for ds_index = 0 to 3
  for current_row = 0 to 172 step 4
   current_col = 0
   for byte_col = 0 to 3
    current_byte = current_row | byte_col
    on ds_index goto DCASE0 DCASE1 DCASE2 DCASE3
BYTE_DONE
   next
   print_row = print_row + 1
  next
 next
 return

DCASE0
 current_byte = L4_0[current_byte] : goto CONT_WRITE_PF
DCASE1
 current_byte = L4_1[current_byte] : goto CONT_WRITE_PF
DCASE2
 current_byte = L4_2[current_byte] : goto CONT_WRITE_PF
DCASE3
 current_byte = L4_3[current_byte]
CONT_WRITE_PF
 if current_byte & $80 then gosub PIXEL_ON
 current_col = current_col + 1
 if current_byte & $40 then gosub PIXEL_ON
 current_col = current_col + 1
 if current_byte & $20 then gosub PIXEL_ON
 current_col = current_col + 1
 if current_byte & $10 then gosub PIXEL_ON
 current_col = current_col + 1
 if current_byte & $08 then gosub PIXEL_ON
 current_col = current_col + 1
 if current_byte & $04 then gosub PIXEL_ON
 current_col = current_col + 1
 if current_byte & $02 then gosub PIXEL_ON
 current_col = current_col + 1
 if current_byte & $01 then gosub PIXEL_ON
 current_col = current_col + 1
 goto BYTE_DONE

PIXEL_ON
 asm
 LDA $D7
 LDY $D6
 LDX #0
 JMP pfpixel
end
edit: attaching a file didn't work for me
with http://pastebin.com/QDUgwAG8
without http://pastebin.com/XUknWmpK
edit: put the for-next loops back in
-
In using the DPC+ kernel, drawing a playfield from a data statement would use the current bank to store the data and not waste the "graphics" bank, so you can have more sprites and animation.
Also it is slow and will send the scan line count off, so it is only good to use once before the main loop.
Until I find another way, like an extended graphics bank or an inline assembly routine, it is the only way I can get 4 high res playfields in batari Basic.
Sounds like, in your case, might as well optimize for
code size. I doubt it can be made fast enough.
You're really doing something akin to a block
transfer and pfpixel is the wrong tool for the job.
You need something like pfpixel that will do whole
bytes. With the default kernel that's possible
even from bB but I don't know enough about Harmony
or DPC+ to do a byte at a time (assuming it's possible
short of rewriting the kernel).
Here is my attempt to speed things up.
This code compiles but is otherwise UNTESTED
Basically it's got the bits unrolled and uses
constants instead of a setbits table.
The data statements are referenced with a pointer.
This costs a few cycles per pixel but shortens the
code and presents the possibility of parameterizing
the transfer so that it could be done in small chunks.
I didn't do that because I don't know how small the
chunks would need to be.
You could probably do a few rows at a time and it would
probably take several seconds to do the whole thing.
I also wasted a few cycles per pixel to get rid of redundant
pfpixel calls
With this code the data would be divided in to 4 equal
pieces of 44 rows each (named L4_0 - L4_3)
 dim current_row = a
 dim current_col = b
 dim byte_col = c
 dim current_byte = d
 dim ds_index = e
 dim print_row = f

 macro Pixel_ON_macro
 asm
 LDA #(1)
 LDY #(2)
 LDX #0
 JMP pfpixel
end
end

WritePFChunk
 print_row = 0
 ds_index = 0
DS_LOOP
 current_row = 0
ROW_LOOP
 current_col = 0
 for byte_col = 0 to 4
  current_byte = current_row | byte_col
  on ds_index goto DCASE0 DCASE1 DCASE2 DCASE3
CONT_WRITE_PF
  if current_byte & $80 then gosub PIXEL_ON
  current_col = current_col + 1
  if current_byte & $40 then gosub PIXEL_ON
  current_col = current_col + 1
  if current_byte & $20 then gosub PIXEL_ON
  current_col = current_col + 1
  if current_byte & $10 then gosub PIXEL_ON
  current_col = current_col + 1
  if current_byte & $08 then gosub PIXEL_ON
  current_col = current_col + 1
  if current_byte & $04 then gosub PIXEL_ON
  current_col = current_col + 1
  if current_byte & $02 then gosub PIXEL_ON
  current_col = current_col + 1
  if current_byte & $01 then gosub PIXEL_ON
  current_col = current_col + 1
 next
 print_row = print_row + 1
 current_row = current_row + 4
 if current_row < 173 then goto ROW_LOOP
 ds_index = ds_index + 1
 if ds_index < 4 then goto DS_LOOP
 return

DCASE0
 current_byte = L4_0[current_byte] : goto CONT_WRITE_PF
DCASE1
 current_byte = L4_1[current_byte] : goto CONT_WRITE_PF
DCASE2
 current_byte = L4_2[current_byte] : goto CONT_WRITE_PF
DCASE3
 current_byte = L4_3[current_byte] : goto CONT_WRITE_PF

PIXEL_ON
 callmacro Pixel_ON_macro current_col print_row
-
I'm curious, if you're drawing a playfield from data statements,
would you optimize for code size or speed or would it depend
on what you're doing ?
-
I'm very far from an expert but it appears
to me to work just fine.
I don't see the purpose of the end statements
after the playermove and badguymove blocks.
Are you sure you don't want subroutines
(with returns instead of ends and gosubs instead of
gotos)?
As it stands now your code runs in a loop from
playermove to the goto playermove statement.
It never gets to the goto badguymove and goto main
statements (as far as I can tell).
-
The conclusion I'm coming to is that it would be best
to write some utilities in asm that could be included in
and called from a bB program to do some of this stuff.
In the present case of picking out a bit from a data
statement and then calling pfpixel you end up duplicating
in bB stuff that pfpixel then does any way.
Also the setbits data statement duplicates data in the
kernel.
Then too there's stuff you could do in asm that you can't
do with bB.
-
What do routines like this really need to do?
The examples here are writing chunks. If you don't need
to do things on a per bit basis it's a lot easier (and faster)
to do bytes or rows.
I think if you were doing things in bB it's probably better
to do your own version of pfpixel, especially if you're doing
individual bits, but it would have to be tailored to the kernel/
kernel options. (well, maybe not strictly speaking, but that
would certainly be preferable)
And, of course, a little asm could help a lot.
It seems just barely possible to get some purely bB version
of the routines here to work (in Stella) but they take a lot of cycles
and they don't really do much.
-
Testing shows that 46 is the limit. That makes sense, as statements are limited to 50 strings, and the command, macroname, and newline chars take up 4.
No reason that couldn't be raised, but it seems sufficient.
Thanks for that.
I would have eventually gotten around to trying it myself but I thought
someone might know already.
-
Macros are also a way to pass values to the assembler from bB.

Creating new game – drawing, scanline and positioning issues.
in Atari 2600 Programming
Posted · Edited by bogax
Haven't taken the time to find my way all through your code.
Here's a constant time divide by 15 (reciprocal multiplication)