
bogax

Members
  • Content Count

    902

Posts posted by bogax


  1. You have a trade off between how much adding of extra

    pieces you want to do and how nice a distribution you want

     

    You could start with 5 bits ie begin by ANDing your random

    number with $1F but the distribution won't be very good

     

    You probably don't need to use all eight bits though

    (I'm assuming you're starting with 8 bits ie 0-255)

    After a second look, it's really not

    much more work to use all 8 bits.

    It only costs a couple of cycles and it

    gives you a better distribution.

     

    The code is basically the same.

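    sta temp   ; temp = r (0-255)
    lsr        ; r / 2, bit 0 goes to carry
    adc temp   ; + r = 1.5 * r (9-bit sum, bit 8 in carry)
    ror        ; carry back in: / 2 = .75 * r
    lsr
    lsr        ; / 4 = .1875 * r
    adc temp   ; + r = 1.1875 * r (9-bit sum again)
    ror        ; carry back in: / 2
    lsr
    lsr
    lsr        ; / 8, so 1.1875 * r / 16 overall, 18 max
    
    ; negate
    eor #$FF
    clc
    adc #$01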
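    To check that without hardware, here's a rough Python model of the 8-bit sequence (a sketch, not an emulator; `scale_0_255` and `negate` are made-up names). It follows A and the carry flag the way LSR, ADC and ROR do, so you can look at the result for every starting value 0-255:

```python
def scale_0_255(r: int) -> int:
    """Model of: sta temp / lsr / adc temp / ror / lsr / lsr /
    adc temp / ror / lsr / lsr / lsr (the result before negation)."""
    temp = r & 0xFF
    a = temp
    carry, a = a & 1, a >> 1                    # lsr: bit 0 -> carry
    s = a + temp + carry                        # adc temp (9-bit sum)
    carry, a = s >> 8, s & 0xFF                 # carry = bit 8
    carry, a = a & 1, (carry << 7) | (a >> 1)   # ror: carry -> bit 7
    for _ in range(2):                          # lsr, lsr
        carry, a = a & 1, a >> 1
    s = a + temp + carry                        # adc temp (carry rounds)
    carry, a = s >> 8, s & 0xFF
    carry, a = a & 1, (carry << 7) | (a >> 1)   # ror
    for _ in range(3):                          # lsr, lsr, lsr
        carry, a = a & 1, a >> 1
    return a


def negate(a: int) -> int:
    """eor #$FF / clc / adc #$01: 8-bit two's complement."""
    return ((a ^ 0xFF) + 1) & 0xFF


results = [scale_0_255(r) for r in range(256)]
print(min(results), max(results))  # prints: 0 18
```

    Every value 0-18 comes out, and the results never go down as r goes up, which is about what you'd expect from roughly r * 1.1875 / 16.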
    


  2. There's a number of ways, you could AND out the higher bits first so that there's much better chance, e.g. AND #$1F means you'd have 0-31 but still that's nearly half that would fail.

     

    Better again might be to just build the random number out of the sum of 2 smaller ones.

     

    e.g.

    LDA $D20A
    AND #3  ; will always be 0-3
    STA RND
    LDA $D20A
    AND #$F ; will always be 0-15
    CLC
    ADC RND
    STA RND ; will always be 0-18

     

    Have you actually checked to see how well that works?

    If it were really random, I don't think you'd get an

    even distribution.

    I think on average 0 and 18 will come up 1 in 64 times,

    17 and 1 will come up 2 in 64 times, 2 and 16 will come

    up 3 in 64 times and all the rest 4 in 64 times.

    If it's not random (and it's not, but maybe close enough)

    it might be worse.

    Imagine what it would be if it were a simple binary

    counter instead of a polynomial counter (LFSR)
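    Here's a quick way to count those odds in Python, assuming the two reads are independent and uniform (which a real LFSR won't quite be). `sum_distribution` is just a made-up name for the sketch:

```python
from collections import Counter


def sum_distribution() -> Counter:
    """Tally every (low, high) combination of the two masked reads."""
    counts = Counter()
    for low in range(4):         # AND #3  -> 0-3
        for high in range(16):   # AND #$F -> 0-15
            counts[low + high] += 1
    return counts


dist = sum_distribution()
for total in range(19):
    print(total, f"{dist[total]}/64")
```

    There are 4 x 16 = 64 combinations, and only the sums 3-15 can be made 4 ways.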


  3. One way is to just scale what you've got to what you want

    You have a trade off between how much adding of extra

    pieces you want to do and how nice a distribution you want

     

    You could start with 5 bits ie begin by ANDing your random

    number with $1F but the distribution won't be very good

     

    You probably don't need to use all eight bits though

    (I'm assuming you're starting with 8 bits ie 0-255)

     

    Here's code that uses a number 0-63

     

    It multiplies (with some rounding thrown in,

    i.e. no clearing the carry for the adds)

    by 1.1875 (1 3/16), then divides by 4 and negates.

     

    ; assuming there's a random number 0-255 in A
    
    and #$3F ; mask for 0-63, call that r
    sta temp
    lsr ; / 2 = .5 * r
    adc temp ; + r = 1.5 * r
    lsr
    lsr
    lsr ; 1.5 / 8 = .1875 * r
    adc temp ; + r = 1.1875 * r, 75 max with rounding
    lsr
    lsr ; 75 / 4 = 18 max
    
    ; negate
    eor #$FF
    clc
    adc #$01
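    A Python sketch of the same steps, to check the maximum (the function name is made up, and the carry is only tracked where it matters: the adds never overflow for r <= 63, so each ADC's carry in comes from the LSR just before it):

```python
def scale_0_63(value: int) -> int:
    """Model of the masked routine, before the final negate."""
    r = value & 0x3F               # and #$3F ; sta temp
    carry, a = r & 1, r >> 1       # lsr: .5 * r
    a = a + r + carry              # adc temp: 1.5 * r (95 max, no carry out)
    for _ in range(3):             # lsr x3: .1875 * r
        carry, a = a & 1, a >> 1
    a = a + r + carry              # adc temp: 1.1875 * r (75 max)
    for _ in range(2):             # lsr x2: / 4
        carry, a = a & 1, a >> 1
    return a


print(max(scale_0_63(v) for v in range(256)))  # prints: 18
```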


  4. I'm not sure how I could get a random number that would place the object in multiples of 10 with those operations.

     

    This seems to be the best useable method so far:

     

    tempvar = ((rand&7)+1)

    player1y = (tempvar*10)-30

     

    I didn't mean that particular bit; I meant elsewhere, where there's a multiply by 16.

    Specifically, this line in the print_screenright routine:

     

    if change = 32 then worldx = rand*16 : change = 0 else change = change + 1


    Here's an UNTESTED routine that counts through

    pf columns and rows, maintaining a ds_byte counter

    and a current_bit counter in parallel.

     

    You supply it with the first column and row

    from the data statement in ds_col and ds_row.

    There are 64 columns (0-63) in the data statement, but

    ds_col should only go up to 32, or you'll get into

    the next data statement row when writing the

    playfield.

    Obviously, rows will be similar: if you go too

    far, you'll fall off the end of the data statement.

     

    http://pastebin.com/JyGxd3dv

     

    dim pf_row = a
    dim pf_col = b
    dim ds_start_bit = c
    dim current_bit = d
    dim current_byte = e
    dim ds_byte = f
    dim ds_col = g
    dim ds_row = h
    
    printpf
    
    ds_byte = ds_col / 8
    ds_byte = ds_row * 8 + ds_byte
    ds_start_bit = ds_col & 7
    
    for pf_row = 0 to 10
    current_bit = setbits[ds_start_bit]
    current_byte = map[ds_byte]
    
    for pf_col = 0 to 31
    if current_byte & current_bit then pfpixel pf_col pf_row on else pfpixel pf_col pf_row off
    current_bit = current_bit / 2
    
    rem if current_bit = 0 then done with one
    rem data statement byte so go to the next
    rem get the corresponding map byte
    rem and reset current_bit to the first bit column
    if current_bit = 0 then ds_byte = ds_byte + 1 : current_byte = map[ds_byte] : current_bit = $80
    next
    
    rem we have incremented ds_byte by 4
    rem in the pf_col loop
    rem 8 bytes per data statement row
    rem we need to advance 4 more to go to the
    rem next row in the data statement
    ds_byte = ds_byte + 4
    next
    return
    
    data setbits
    %10000000, %01000000, %00100000, %00010000
    %00001000, %00000100, %00000010, %00000001
    end
    
    data map
    %11111111, %11111111, %11111111, %11111111, %11111111, %11111111, %11111111, %11111111
    %10000000, %00000000, %00000000, %00000001, %10000000, %00000000, %00000000, %00000001
    %10100000, %00000000, %00111111, %11111001, %10100000, %00000000, %00111111, %11111001
    %10100000, %00100000, %00100000, %00110001, %10100000, %00100000, %00100000, %00110001
    %10100000, %00100000, %00100000, %11000001, %10100000, %00100000, %00100000, %11000001
    %10100000, %00101111, %11100011, %00000001, %10100000, %00100001, %11100011, %00000001
    %10100000, %00111000, %00001100, %00000001, %10100000, %00000000, %00001100, %00000001
    %10100000, %00000000, %00110000, %00000001, %10100000, %00000000, %00110000, %00000001
    %10111111, %11111111, %11000000, %00000001, %10111111, %11000001, %11000000, %00000001
    %10000000, %00000000, %00000000, %00000001, %10000000, %00000000, %00000000, %00000001
    %11111111, %11111111, %11111111, %11111111, %11111111, %11111111, %11111111, %11111111
    %10101010, %01010101, %01010101, %01010100, %01010101, %01010001, %01010101, %01010101
    %10000000, %00000000, %00000000, %00000001, %00000000, %00000000, %00000000, %00000001
    %10000000, %00000000, %00000001, %10000001, %00100000, %00000000, %00111111, %11111001
    %10000000, %00100000, %00000000, %00110001, %00100000, %00100000, %00100000, %00110001
    %10000000, %00100000, %00000000, %11000001, %00100000, %00100000, %00100000, %11000001
    %10000000, %00101111, %10000011, %00000001, %00100000, %00100111, %11100011, %00000001
    %10000000, %00111000, %00000000, %00000001, %00100000, %00111000, %00001100, %00000001
    %10000000, %00000000, %00000000, %00000001, %00100000, %00000000, %00110000, %00000001
    %10000000, %11111111, %00000000, %00000001, %10000011, %10000011, %11000000, %00000001
    %10000000, %00000000, %00000000, %00000001, %10000000, %00000000, %00000000, %00000001
    %11111111, %11111111, %11111111, %11111111, %11111111, %11111111, %11111111, %11111111
    end
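    The counting can be checked off-line. Here's a Python sketch (made-up names; `map_bytes` is a plain list standing in for the map data statement) that walks the bits the way the routine does and compares the result with picking each bit directly. It also shows why the fixed `ds_byte = ds_byte + 4` works: 32 bits always carry ds_byte across exactly four byte boundaries, whatever ds_start_bit is.

```python
def window_bits(map_bytes, ds_col, ds_row, rows=11, cols=32, bytes_per_row=8):
    """Walk the data statement like the bB routine: ds_byte and current_bit
    advance together, then ds_byte skips to the next data statement row."""
    ds_byte = ds_row * bytes_per_row + ds_col // 8
    start_bit = 0x80 >> (ds_col & 7)            # setbits[ds_col & 7]
    pf = []
    for _ in range(rows):
        current_bit = start_bit
        row = []
        for _ in range(cols):
            row.append(1 if map_bytes[ds_byte] & current_bit else 0)
            current_bit >>= 1
            if current_bit == 0:                # done with one data byte
                ds_byte += 1
                current_bit = 0x80
        pf.append(row)
        ds_byte += bytes_per_row - cols // 8    # 8 - 4 = 4 more bytes
    return pf


def direct_bits(map_bytes, ds_col, ds_row, rows=11, cols=32, bytes_per_row=8):
    """Reference: index each pixel's bit directly."""
    return [[(map_bytes[(ds_row + y) * bytes_per_row + (ds_col + x) // 8]
              >> (7 - (ds_col + x) % 8)) & 1
             for x in range(cols)]
            for y in range(rows)]


sample = [(i * 37 + 11) & 0xFF for i in range(22 * 8)]  # stand-in for map
print(window_bits(sample, 5, 3) == direct_bits(sample, 5, 3))  # prints: True
```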
    


    First off, if you're going to work with whole bytes,

    it'll be much faster to handle things a whole byte at a time.

     

    So, for example, assuming the line in the data statement

    that goes to a line in the playfield starts at the

    beginning of a byte and ends at the end of a byte

    (i.e. it does NOT start halfway through a byte, say at bit 4,

    and end halfway through a byte at bit 4),

    then it will be much faster to just transfer the 4 bytes

    than to call pfpixel 32 times.

     

    pfpixel "knows" about rows and columns but the data

    statement doesn't.

     

    You've got 8 bytes per line in the data statement.

    The first line starts at byte 0:

    bytes 0-7 inclusive are the first line in the data statement.

    The second line of the data statement begins with byte 8 of the

    data statement (bytes 8-15 inclusive), etc.

    There are 4 bytes per line of the playfield (columns 0-31 inclusive),

    so the playfield lines correspond to data statement bytes like so

    (for the upper left corner of the data statement):

     

    pfrow   pf bytes   data statement bytes
    0       0-3        0-3
    1       4-7        8-11
    2       8-11       16-19
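    The same correspondence as a small Python sketch (made-up names; `ds_start` is a hypothetical byte offset for a window that doesn't start in the upper left corner, and the window is assumed to start on a byte boundary):

```python
def row_bytes(pfrow, ds_start=0, pf_bytes_per_row=4, ds_bytes_per_row=8):
    """(playfield byte indices, data statement byte indices) for one
    playfield row, assuming the copied window starts on a byte boundary."""
    pf_first = pfrow * pf_bytes_per_row
    ds_first = ds_start + pfrow * ds_bytes_per_row
    return (list(range(pf_first, pf_first + pf_bytes_per_row)),
            list(range(ds_first, ds_first + pf_bytes_per_row)))


for pfrow in range(3):
    print(pfrow, *row_bytes(pfrow))
```

    The playfield side steps by 4 bytes per row and the data statement side by 8, so only the data statement index needs the extra skip.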

     

     

    You could do something like in this post:

     

    http://www.atariage....50#entry2641780

     

    and build a counter in for-next loops that counts by bits,

    bytes and rows in the data statement, except instead of taking

    an early out at 28 bits you'd take your early out at four bytes.

    You wouldn't need a special test; you'd just need a for-next loop

    that counted the correct four bytes (an early out being 'quit this

    data statement row and go to the next data statement row').

     

    And of course it'll have to maintain a pfpixel column and row counter

    in parallel:

    pfcolumns advance on every bit and get reset to 0 after

    every four bytes.

    pfrows advance every four bytes.

     

     

    edit:

    You elaborated while I was composing.

    You could still do something similar, except maybe

    count through the data statement by bits and rows,

    taking your early out after 32 bits, which would

    be, I suppose, when you've finished one playfield row.

    Data statement rows would still go by 8 bytes.

    Data statement bits would have to pick out the correct

    32 bits, which would be the same for each row.

    You'd start at some bit and row in the data statement,

    count through data statement bits and pfcolumns in parallel,

    and when you've done 32 bits of playfield columns,

    go to the next data statement row, resetting everything

    to wherever it needs to be for the next row (pfcolumn

    to 0, data statement bit to the starting bit in the row,

    pfrow to the next pfrow).

     

    Or you could still count through the data statement by

    bits, bytes, and rows, which I think would be faster,

    but probably not faster enough to make it worth the added

    complication.

     



  7. I tried it over and over again, but so far the program hasn't stopped in Stella using autoexec.stella.

    I expect that's because I added a drawscreen at the beginning of

    the print_style routine.

     

    I rewrote the randomization stuff to speed it up but it's not enough.

    edit: looks like adding inlinerand might do it

    calc_roomtype
    
    if dungeonlvl{1} then temp2 = %00100100 : temp3 = %00000010 else temp2 = 0 : temp3 = 0
    
    if worldx{0} then temp1 = %01001001 : temp2 = temp2 | %00000001 else temp1 = 0
    if worldy{6} then temp1 = temp1 | %00000010
    if worldy{0} then temp1 = temp1 | %00010000
    if dungeonlvl{0} then temp1 = temp1 | %00100100
    if worldy{2} then temp1 = temp1 | %10000000
    
    temp1 = temp1 * 2 + temp1
    
    if worldy{4} then temp2 = temp2 | %00000010
    if temp1{7} then temp2 = temp2 | %00001000
    if worldy{1} then temp2 = temp2 | %00010000
    if worldx{1} then temp2 = temp2 | %01000000 : temp3 = temp3 | %00000001
    if temp1{1} then temp2 = temp2 | %10000000
    
    if dungeonlvl{5} then temp3 = temp3 | %00000100
    if worldx{2} then temp3 = temp3 | %00001000
    if worldy{2} then temp3 = temp3 | %00010000
    if dungeonlvl{2} then temp3 = temp3 | %00100000
    if temp1{3} then temp3 = temp3 | %01000000
    if worldy{7} then temp3 = temp3 | %10000000
    
    temp1 = temp1 * 2 + temp2 + temp3 + gameseed
    
    if temp1 then rand = temp1 else rand = 255
    
    roomtype = rand
    
    tempvar = rand + worldx + worldy
    
    if counter > 230 then roomtype = rand
    
    if worldx = 128 && worldy = 128 && dungeonlvl = 1 then roomtype = 1
    
    rclass = roomtype / 4 / 4 : roomtype = roomtype & $0F
    if rclass = 15 then rclass = style_a
    return thisbank
    


  8. Speaking of which, does this still run over CPU cycles with the new inline asm bogax cooked up?

     

    I think the short answer is yes

     

    Fact is, I've forgotten everything I've done about that.

     

    I said in a previous post that it was still going over even

    with a liberal sprinkling of drawscreens

     

    I'm not seeing that now

     

    I put a drawscreen at the beginning of the print_style

    routine and that seems to take care of it even with the

    for-next loop version.


  9. Here's a version with some asm

     

    The asm routine is in a macro.

    putting it in a macro allows passing a data statement

    as a parameter to a particular instance.

     

     macro WRIPFMAC
    asm
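    ldx {2}      ; X = playfield index of the first byte to write
    ldy {3}      ; Y = index of the first byte in the data statement
    lda {4}
    clc
    adc {2}
    sta {4}      ; {4} = {2} + byte count = playfield index to stop at
    WRI_LOOP
    lda {1},y    ; read a byte from the data statement
    sta $A4,x    ; write it to playfield RAM, which starts at $A4
    iny
    inx
    cpx {4}
    bcc WRI_LOOP ; loop while X < stop index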
    end
    end

    To use it as a subroutine you'd give a label,

    invoke the macro with its four parameters,

    then supply a return.

     

    The four parameters (in this order) are:

     

    The name of the data statement.

     

    The variable used to pass the location of the first byte to be

    written in the playfield.

     

    The variable used to pass the location of the first byte in the

    data statement.

     

    The variable used to pass the number of bytes to move and this

    one gets modified.

     

    So, e.g., you create a subroutine using the macro.

    Then, to use it, set up the three variables and call the subroutine.

     pf_ptr = 0 : dat_ptr = top_tbl[top] : dat_len = 12
    gosub WRI_PF
    pf_ptr = 12 : dat_ptr = mid_tbl[mid] : dat_len = 20
    gosub WRI_PF
    pf_ptr = 32 : dat_ptr = bot_tbl[bot] : dat_len = 12
    gosub WRI_PF
    return otherbank
    
    WRI_PF
    
    callmacro WRIPFMAC room_dat pf_ptr dat_ptr dat_len
    
    return thisbank

     

     

    http://pastebin.com/W3pWSNS9

     

    Here's a second macro. Use is the same.

     

    This has a little more overhead but is faster in the loop

    so you save a few cycles for moving more than 1 line.

     

    It also costs a few more bytes

     

    It uses temp variables temp1-temp4 but doesn't touch the

    parameter variables.

    macro WRIPFMAC
    asm
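    lda #$00
    sta temp2    ; high byte of the playfield pointer (zero page)
    ldy {4}
    dey          ; Y = byte count - 1, so exactly {4} bytes get moved
    clc
    lda #$A4
    adc {2}
    sta temp1    ; temp1/temp2 -> playfield RAM + {2}
    lda {3}
    adc #<{1}
    sta temp3
    lda #>{1}
    adc #$00
    sta temp4    ; temp3/temp4 -> data statement + {3}
    WRI_LOOP
    lda (temp3),y
    sta (temp1),y
    dey
    bpl WRI_LOOP ; count down until Y goes negative
    end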
    end
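    Here's a rough Python model of the two copy loops for checking the loop bounds (lists stand in for the data statement and the 44 playfield bytes at $A4; the function names are made up). The ascending loop stops when X reaches pf_ptr + the byte count; the descending one runs until Y goes negative, so Y has to start at the byte count minus 1 for both to move the same number of bytes:

```python
def copy_ascending(rom, pf, pf_ptr, dat_ptr, count):
    """First macro: X runs up from pf_ptr, Y up from dat_ptr, and the loop
    repeats while X < pf_ptr + count (cpx / bcc)."""
    x, y, stop = pf_ptr, dat_ptr, pf_ptr + count
    while True:
        pf[x] = rom[y]                      # lda {1},y / sta $A4,x
        x += 1
        y += 1
        if x >= stop:
            break


def copy_descending(rom, pf, pf_ptr, dat_ptr, count):
    """Second macro: one Y index used with two pointers, counting down
    until Y goes negative (bpl). Y starts at count - 1."""
    y = count - 1
    while True:
        pf[pf_ptr + y] = rom[dat_ptr + y]   # lda (temp3),y / sta (temp1),y
        y -= 1
        if y < 0:
            break


rom = list(range(100, 160))  # stand-in for room_dat (all non-zero)


def both_copies(pf_ptr, dat_ptr, count):
    """Run both loops on fresh 44-byte playfields and return the results."""
    up, down = [0] * 44, [0] * 44
    copy_ascending(rom, up, pf_ptr, dat_ptr, count)
    copy_descending(rom, down, pf_ptr, dat_ptr, count)
    return up, down


up, down = both_copies(32, 5, 12)
print(up == down)  # prints: True
```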
    


  10. I didn't count cycles exactly.

    The for-next loop takes somewhat more than twice as

    long per byte as a playfield statement (34 cycles vs 14 cycles),

    but there's less overhead calling other banks

    and no scrolling things into place. I figured they're

    roughly equal, but the for-next loop probably does take

    longer.

    34 cycles x 44 bytes (1,496 cycles) + a couple hundred overhead

    is only a couple thousand cycles or so.

    I don't think it's taking too long.

     

    But I'm not through. I expected to throw a few bytes

    of asm in there to do the actual moving of bytes.

    I just hadn't decided what would be the best way to

    structure it. I want something somewhat general,

    about as general as the for-next routine, but perhaps

    with the possibility of passing various data statements

    as a parameter.


  11. I've been playing with theloon's code.

     

    This is the result: http://pastebin.com/Wq8zKc08

     

    My purpose was not primarily to rewrite or

    streamline his code (although I ended up

    doing a bit of that) but to put it in a

    form that would make it easier to write the

    playfield bytewise from bB.

    (If I were going to streamline it, I think the

    first thing I might try is replacing long strings

    of if-thens with on-gotos.)

     

    The code draws the playfield in three pieces,

    which I called top, mid and bot.

    Top and bot are three lines each and mid

    is five lines.

     

    There are 6 three-line pieces and 8 five-line

    pieces to select from. Which one gets drawn

    is selected by strings of if-thens (in the original

    code) using the variable rclass and bits in tempvar.

    rclass ranges from 0-14, and for each rclass three

    bits of tempvar are used to select one of two

    possibilities each for top, mid and bot.

     

    If you tabulate the possible selections (that are

    actually in the code) there are four possible tops,

    eight mids and four bots. (there are 4 bits of rclass

    and 3 bits of tempvar giving a max 128 possibilities

    however those 128 could be selected from a number

    limited only by how much ROM you've got, but that's

    not what's done)

     

    I put all the 3 and 5 line pieces in a data statement.

    Each rclass gets a pattern in pat_tbl.

    The pattern is melded with tempvar to get three pointers

    into top_tbl, mid_tbl and bot_tbl, which then point to

    one of the 4, 8 or 4 (respectively) possibilities.

    The contents of top_tbl, mid_tbl and bot_tbl are the

    beginning location in the room_dat data statement for

    their (respective) pieces which are to be written to the

    playfield.

     

    bit 0 of tempvar corresponds to the bottom piece

    bit 1 to the middle piece and bit 2 to the top piece.

     

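    To get the pointers that point into the top, mid and

    bot tables:

    First, tempvar is merged with the pattern (loaded into mid first)

    using ((tempvar ^ mid) & mask) ^ mid, which takes the bits where

    the mask is 1 from tempvar and all the other bits from the pattern.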

     mid = pat_tbl[rclass]
    bot = ((tempvar ^ mid) & %00000101) ^ mid

    Bits two and three of the result are the top_tbl pointer

    and are shifted into place with a division by four and

    selected with an and mask

     top = bot / 4 & %00000011

    The lower two bits of the result are the bot_tbl pointer

    and are selected with an and mask

     bot = bot & %00000011

    Then the pattern is shifted with a divide by 16 (dividing by

    4 twice), merged with tempvar the same way, and the bottom three

    bits masked to get the mid_tbl pointer.

     mid = mid / 4 / 4
    mid = (((tempvar ^ mid) & %00000010) ^ mid) & %00000111
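    The merging is easier to see with the bits laid out. Here's a Python sketch of the same statements (made-up function names; `pattern` is the pat_tbl entry for an rclass). It shows that tempvar bit 0 only ever lands in the bot pointer, bit 1 in the mid pointer and bit 2 in the top pointer, with all the other pointer bits coming from the pattern:

```python
def merge_bits(a: int, b: int, mask: int) -> int:
    """((a ^ b) & mask) ^ b: the bits of a where mask is 1, b elsewhere."""
    return ((a ^ b) & mask) ^ b


def piece_pointers(pattern: int, tempvar: int):
    """(top, mid, bot) table indices, following the bB statements above."""
    bot = merge_bits(tempvar, pattern, 0b00000101)  # tempvar bits 0 and 2
    top = (bot // 4) & 0b00000011                   # bits 2-3 -> top_tbl index
    bot = bot & 0b00000011                          # bits 0-1 -> bot_tbl index
    mid = pattern // 16                             # mid = mid / 4 / 4
    mid = merge_bits(tempvar, mid, 0b00000010) & 0b00000111  # tempvar bit 1
    return top, mid, bot


print(piece_pointers(0b11111111, 0b000))  # prints: (2, 5, 2)
```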

    The results of those table lookups

     mid = mid_tbl[mid]
    bot = bot_tbl[bot]
    top = top_tbl[top]

    (which are the positions

    in the data statement of the pieces to be written) are then

    passed to the WRI_PF subroutine (one at a time) in dat_ptr,

    along with the location of the last byte to write in dat_last

    and the location in the playfield of the first byte to be

    written to in pf_ptr:

     pf_ptr = 0 : dat_ptr = top : dat_last = top + 11
    gosub WRI_PF
    pf_ptr = 12 : dat_ptr = mid : dat_last = mid + 19
    gosub WRI_PF
    pf_ptr = 32 : dat_ptr = bot : dat_last = bot + 11
    gosub WRI_PF

    WRI_PF is just a for-next loop that reads

    the data statement and writes to the playfield.

    WRI_PF
    for dat_ptr = dat_ptr to dat_last
    pfbase[pf_ptr] = room_dat[dat_ptr]
    pf_ptr = pf_ptr + 1
    next
    return thisbank

     

    There are also four routines that create openings in the rooms.

    I rewrote them to write bytes to the playfield instead of calling

    pfpixel. They just do a series of read-mask-writes of the

    appropriate bytes to the appropriate spots in the playfield. eg:

    print_open_left
    pfbase[16]=pfbase[16] & %00000011
    pfbase[20]=pfbase[20] & %00000011
    pfbase[24]=pfbase[24] & %00000011
    return otherbank


  12. I'm still not sure what you guys are trying to do :)

     

    Here are a couple that write the playfield bytewise.

     

    One needs the data already reversed; the other has a bit of asm

    to do the reversing.

     

    set romsize 4k
    
    dim mapx = a
    dim mapy = b
    dim pfy = c
    const mem = $A4
    
    COLUPF=$FF
    COLUBK=$00
    
    playfield:
    ....XX...XX...XXX...XX...XX....
    ....X.X..X.X..X....X....X......
    ....XX...XX...XX....X....X.....
    ....X....X.X..X......X....X....
    ....X....X.X..XXX..XX...XX.....
    ...............................
    .....XX..XX....X....XX..XXX....
    ....X....X.X..X.X..X....X......
    .....X...XX...XXX..X....XX.....
    ......X..X....X.X..X....X......
    ....XX...X....X.X...XX..XXX....
    end
    
    PROGRAMLOOP
    if joy0fire then let z = z | 1
    if !joy0fire && z then gosub DRAW_MAP
    
    drawscreen
    
    goto PROGRAMLOOP
    
    DRAW_MAP
    if mapy > 39 then mapy = 0 else mapy = mapy + 4
    for pfy = 0 to 40 step 4
    
    for mapx = 0 to 3
    temp1 = mapy | mapx
    temp2 = pfy | mapx
    mem[temp2] = map[temp1]
    next
    if mapy > 39 then mapy = 0 else mapy = mapy + 4
    next
    z = 0
    return
    
    data map
    %00001010, %00010111, %10001110, %00000000,
    %00001010, %00010001, %10001010, %00000000,
    %00001110, %00010011, %10001010, %00000000,
    %00001010, %00010001, %10001010, %00000000,
    %00001010, %01110111, %11101110, %00000000,
    %00000000, %00000000, %00000000, %00000000,
    %00001000, %01100100, %01110010, %00111000,
    %00001000, %10010100, %01001010, %01001000,
    %00000101, %10010010, %01110010, %01001000,
    %00000010, %01100001, %01001011, %00111011,
    %00000000, %00000000, %00000000, %00000000,
    %00000000, %00000000, %00000000, %00000000
    end
    

     

    set romsize 4k
    
    dim mapx = a
    dim mapy = b
    dim pfy = c
    const mem = $A4
    
    macro revb
    asm
    lda temp1
    and #$0F
    tax
    lda temp1
    lsr
    lsr
    lsr
    lsr
    tay
    lda rev_tbl_lo,x
    ora rev_tbl_hi,y
    sta temp1
    end
    end
    
    COLUPF=$FF
    COLUBK=$00
    
    playfield:
    ....XX...XX...XXX...XX...XX....
    ....X.X..X.X..X....X....X......
    ....XX...XX...XX....X....X.....
    ....X....X.X..X......X....X....
    ....X....X.X..XXX..XX...XX.....
    ...............................
    .....XX..XX....X....XX..XXX....
    ....X....X.X..X.X..X....X......
    .....X...XX...XXX..X....XX.....
    ......X..X....X.X..X....X......
    ....XX...X....X.X...XX..XXX....
    end
    
    PROGRAMLOOP
    if joy0fire then let z = z | 1
    if !joy0fire && z then gosub DRAW_MAP
    
    drawscreen
    
    goto PROGRAMLOOP
    
    DRAW_MAP
    if mapy > 39 then mapy = 0 else mapy = mapy + 4
    for pfy = 0 to 40 step 4
    
    for mapx = 0 to 3
    temp1 = mapy | mapx
    temp1 = map[temp1]
    if mapx & 1 then callmacro revb
    temp2 = pfy | mapx
    mem[temp2] = temp1
    next
    if mapy > 39 then mapy = 0 else mapy = mapy + 4
    next
    z = 0
    return
    
    data map
    %00001010, %11101000, %10001110, %00000000,
    %00001010, %10001000, %10001010, %00000000,
    %00001110, %11001000, %10001010, %00000000,
    %00001010, %10001000, %10001010, %00000000,
    %00001010, %11101110, %11101110, %00000000,
    %00000000, %00000000, %00000000, %00000000,
    %00001000, %00100110, %01110010, %00011100,
    %00001000, %00101001, %01001010, %00010010,
    %00000101, %01001001, %01110010, %00010010,
    %00000010, %10000110, %01001011, %11011100,
    %00000000, %00000000, %00000000, %00000000,
    %00000000, %00000000, %00000000, %00000000
    end
    
    data rev_tbl_lo
    %00000000, %10000000, %01000000, %11000000
    %00100000, %10100000, %01100000, %11100000
    %00010000, %10010000, %01010000, %11010000
    %00110000, %10110000, %01110000, %11110000
    end
    
    data rev_tbl_hi
    %00000000, %00001000, %00000100, %00001100
    %00000010, %00001010, %00000110, %00001110
    %00000001, %00001001, %00000101, %00001101
    %00000011, %00001011, %00000111, %00001111
    end
    

     

    http://pastebin.com/VCPHUBeQ without asm

     

    http://pastebin.com/tMp24XXT with asm
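    The revb macro reverses a byte with two 16-entry table lookups: the low nibble indexes rev_tbl_lo, which holds it reversed into the high nibble, and the high nibble indexes rev_tbl_hi, which holds it reversed into the low nibble; the two results are ORed together. A Python sketch of the same idea (a model, not the asm; the table names just mirror the data statements):

```python
# 16-entry tables like rev_tbl_lo / rev_tbl_hi
REV_LO = [int(f"{n:04b}"[::-1], 2) << 4 for n in range(16)]  # low nibble -> reversed high nibble
REV_HI = [int(f"{n:04b}"[::-1], 2) for n in range(16)]       # high nibble -> reversed low nibble


def revb(byte: int) -> int:
    """Reverse a byte's bit order with two nibble lookups, like the revb macro."""
    return REV_LO[byte & 0x0F] | REV_HI[byte >> 4]


print(f"{revb(0b00010111):08b}")  # prints: 11101000
```

    That's also why the second byte of the first row is %00010111 in the map for the no-asm version and %11101000 in the map for the asm version: revb turns the latter into the same %00010111 before it's written.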


  13. I have the feeling I'm not doing a very good job of explaining myself

     

    For that bit of code that's set up for 5 bytes per line (i.e. x goes to 39),

    the only reason to do it that way is if for some reason you want to

    visualize it that way. That is, if you want to conceptualize your data

    as 40 pixels by (whatever) and use 5 bytes per line in a data statement.

    The code itself doesn't care how the data is arranged in the data statement.

     

    And that's sort of (partly) why I wasn't supplying working code.

    It's just meant to be a description of how to approach the problem.

     

    Specific problems are likely to need specific solutions.

    (does anybody actually use 40 pixel lines?)

     

    That particular bit of code I didn't test except to see that it

    compiled.

     

    Usually I test the code but not always, and not always before I

    post (I do try to remember to say if it's tested or not but I don't

    always remember to do that either)

     

    So I could post an actual working program usually but what good

    would it actually be? It's usually just a hello world type thing

    not really useful or meant/expected to be.

     

    And I still haven't gotten any DPC+ stuff to compile.


    The code is a big counter.

    It counts by bits, bytes and data statements.

    It maintains a counter in parallel that counts

    by columns and rows. The columns advance for every bit

    and the rows for every fourth byte.

    The bytes go 0, 1, 2, 3...

    Three is the fourth byte, so the code masks the lower

    two bits (ANDs byte_ptr with a mask that happens to equal

    three, i.e. byte_ptr & 3) and increments the row if they equal

    three.

     

    To do 28 bits, the easiest would be to check the column,

    and if it's 27, reset it to 0, increment the row, skip

    the rest of this byte and go to the next byte.

     

     dim print_row = a
     dim current_col = b
     dim current_bit = c
     dim current_byte = d
     dim ds_index = e
     dim byte_ptr = f
    
    WritePFChunk
    
     print_row = 0
     current_col = 0
     for ds_index = 0 to 3
    
     for byte_ptr = 0 to 175
     current_bit = $80
     on ds_index goto DCASE0 DCASE1 DCASE2 DCASE3
    
    NEXT_BYTE
     next
     next
     return
    
    DCASE0 current_byte = L4_0[byte_ptr] : goto BIT_LOOP
    DCASE1 current_byte = L4_1[byte_ptr] : goto BIT_LOOP
    DCASE2 current_byte = L4_2[byte_ptr] : goto BIT_LOOP
    DCASE3 current_byte = L4_3[byte_ptr]
    
    BIT_LOOP
     if current_byte & current_bit then pfpixel current_col print_row on else pfpixel current_col print_row off 
     if current_col = 27 then current_col = 0 : print_row = print_row + 1 : goto NEXT_BYTE
     current_col = current_col + 1
     current_bit = current_bit / 2
     if current_bit then goto BIT_LOOP
     goto NEXT_BYTE
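    Here's a rough Python model of that walk for a single data statement (a sketch; `walk_28`, `direct_28` and the `(col, row)` set standing in for the pixels that pfpixel turns on are all made up for illustration), checked against picking the first 28 bits of each 4-byte row directly:

```python
def walk_28(data, rows):
    """Model of the 28-column walk for one data statement: 4 bytes per
    playfield row, and at column 27 the rest of the byte is skipped.
    Returns the set of (col, row) pixels that would be turned on."""
    on = set()
    row = col = 0
    for byte_ptr in range(rows * 4):
        bit = 0x80                     # current_bit = $80
        while bit:
            if data[byte_ptr] & bit:   # pfpixel current_col print_row on
                on.add((col, row))
            if col == 27:              # early out: next row, next byte
                col = 0
                row += 1
                break
            col += 1
            bit >>= 1                  # current_bit = current_bit / 2
    return on


def direct_28(data, rows):
    """Reference: the first 28 bits of each 4-byte row, picked directly."""
    return {(c, r) for r in range(rows) for c in range(28)
            if (data[r * 4 + c // 8] >> (7 - c % 8)) & 1}


DATA = [(i * 53 + 7) & 0xFF for i in range(44 * 4)]  # stand-in for L4_0
print(walk_28(DATA, 44) == direct_28(DATA, 44))  # prints: True
```

    The early out always hits on the fourth bit of the fourth byte, so the row still advances exactly once every four bytes.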
    


  15. Your routine takes 105 more bytes than the OP, but one less variable:

     

    I was going for speed not code size.

     

    You've switched to 28 columns.

     

    Here's a shorter version of the previous code.

     dim print_row = a
     dim current_col = b
     dim current_bit = c
     dim current_byte = d
     dim ds_index = e
     dim byte_ptr = f
    
    WritePFChunk
    
     print_row = 0
     current_col = 0
     for ds_index = 0 to 3
    
     for byte_ptr = 0 to 175
     current_bit = $80
     on ds_index goto DCASE0 DCASE1 DCASE2 DCASE3
    
    DCASE0 current_byte = L4_0[byte_ptr] : goto BIT_LOOP
    DCASE1 current_byte = L4_1[byte_ptr] : goto BIT_LOOP
    DCASE2 current_byte = L4_2[byte_ptr] : goto BIT_LOOP
    DCASE3 current_byte = L4_3[byte_ptr]
    
    BIT_LOOP
     if current_byte & current_bit then pfpixel current_col print_row on 
     current_col = current_col + 1
     current_col = current_col & $1F
     current_bit = current_bit / 2
     if current_bit then goto BIT_LOOP
     if byte_ptr & $03 = 3 then print_row = print_row + 1
     next
     next
     return
    


    I see one goof right off: the byte_col loop should only go to 3.

    Also, the addressing mode in the macro is wrong.

    The macro doesn't work anyway. I could have sworn I'd done

    that before, but...

     

    Here's some code that's tested. print_row (a) and current_col (b) are

    referenced in hex in the asm ($D6 and $D7), so if they're dimmed differently the

    asm will have to be changed. It may not work anyway in DPC+;

    I don't know how that's set up.

     

    As for the formatting, I'll see if I can attach some files

    (they're just this code, not a complete program that's actually

    going to run):

     

    one with the asm and one without.

     

    So try this

     

     dim print_row = a
     dim current_col = b
     dim current_row = c
     dim byte_col = d
     dim current_byte = e
     dim ds_index = f
    
    
    WritePFChunk
    
     print_row = 0
    
     for ds_index = 0 to 3
    
     for current_row = 0 to 172 step 4
     current_col = 0 
    
     for byte_col = 0 to 3
     current_byte = current_row | byte_col 
     on ds_index goto DCASE0 DCASE1 DCASE2 DCASE3
    
    BYTE_DONE
     next
     print_row = print_row + 1
     next
     next
     return
    
    DCASE0 current_byte = L4_0[current_byte] : goto CONT_WRITE_PF
    DCASE1 current_byte = L4_1[current_byte] : goto CONT_WRITE_PF
    DCASE2 current_byte = L4_2[current_byte] : goto CONT_WRITE_PF
    DCASE3 current_byte = L4_3[current_byte]
    
    CONT_WRITE_PF
     if current_byte & $80 then gosub PIXEL_ON 
     current_col = current_col + 1 
     if current_byte & $40 then gosub PIXEL_ON
     current_col = current_col + 1 
     if current_byte & $20 then gosub PIXEL_ON
     current_col = current_col + 1 
     if current_byte & $10 then gosub PIXEL_ON
     current_col = current_col + 1 
     if current_byte & $08 then gosub PIXEL_ON
     current_col = current_col + 1 
     if current_byte & $04 then gosub PIXEL_ON
     current_col = current_col + 1 
     if current_byte & $02 then gosub PIXEL_ON
     current_col = current_col + 1 
     if current_byte & $01 then gosub PIXEL_ON
     current_col = current_col + 1
    
     goto BYTE_DONE
    
    
    PIXEL_ON
     asm
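     LDA $D7     ; A = current_col (b)
     LDY $D6     ; Y = print_row (a)
     LDX #0      ; 0 = turn the pixel on
     JMP pfpixel ; pfpixel's RTS returns to the gosub caller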
    end
    

     

    edit: attaching a file didn't work for me, so here are pastebins:

    with asm: http://pastebin.com/QDUgwAG8

    without asm: http://pastebin.com/XUknWmpK

     

    edit: put the for-next loops back in


  17. In using the DPC+ kernel, drawing a playfield from a data statement would use the current bank to store the data and not waste the "graphics" bank, so you can have more sprites and animation.

    Also it is slow and will send the scan line count off, so it is only good to use once before the main loop.

    Until I find another way, like an extended graphics bank or an inline assembly routine, it is the only way I can get 4 high res playfields in batari Basic.

    Sounds like, in your case, you might as well optimize for

    code size. I doubt it can be made fast enough.

    You're really doing something akin to a block

    transfer, and pfpixel is the wrong tool for the job.

    You need something like pfpixel that will do whole

    bytes. With the default kernel that's possible

    even from bB, but I don't know enough about Harmony

    or DPC+ to do a byte at a time (assuming it's possible

    short of rewriting the kernel).

     

    Here is my attempt to speed things up.

    This code compiles but is otherwise UNTESTED

     

    Basically it's got the bits unrolled and uses

    constants instead of a setbits table.

    The data statements are referenced with a pointer.

    This costs a few cycles per pixel but shortens the

    code and presents the possibility of parameterizing

    the transfer so that it could be done in small chunks.

    I didn't do that because I don't know how small the

    chunks would need to be.

    You could probably do a few rows at a time, and it would

    probably take several seconds to do the whole thing.

     

    I also wasted a few cycles per pixel to get rid of redundant

    pfpixel calls.

     

    With this code the data would be divided into 4 equal

    pieces of 44 rows each (named L4_0 - L4_3).

     

    dim current_row = a
    dim current_col = b
    dim byte_col = c
    dim current_byte = d
    dim ds_index = e
    dim print_row = f
    
    macro Pixel_ON_macro
    asm
    LDA {1}
    LDY {2}
    LDX #0
    JMP pfpixel
    end
    end
    
    WritePFChunk
    print_row = 0
    ds_index = 0
    
    DS_LOOP
    current_row = 0
    
    ROW_LOOP
    current_col = 0
    
    for byte_col = 0 to 3
    current_byte = current_row | byte_col
    on ds_index goto DCASE0 DCASE1 DCASE2 DCASE3
    
    CONT_WRITE_PF
    if current_byte & $80 then gosub PIXEL_ON
    current_col = current_col + 1
    if current_byte & $40 then gosub PIXEL_ON
    current_col = current_col + 1
    if current_byte & $20 then gosub PIXEL_ON
    current_col = current_col + 1
    if current_byte & $10 then gosub PIXEL_ON
    current_col = current_col + 1
    if current_byte & $08 then gosub PIXEL_ON
    current_col = current_col + 1
    if current_byte & $04 then gosub PIXEL_ON
    current_col = current_col + 1
    if current_byte & $02 then gosub PIXEL_ON
    current_col = current_col + 1
    if current_byte & $01 then gosub PIXEL_ON
    current_col = current_col + 1
    next
    print_row = print_row + 1
    current_row = current_row + 4
    if current_row < 173 then goto ROW_LOOP
    ds_index = ds_index + 1
    if ds_index < 4 then goto DS_LOOP
    return
    
    DCASE0 current_byte = L4_0[current_byte] : goto CONT_WRITE_PF
    DCASE1 current_byte = L4_1[current_byte] : goto CONT_WRITE_PF
    DCASE2 current_byte = L4_2[current_byte] : goto CONT_WRITE_PF
    DCASE3 current_byte = L4_3[current_byte] : goto CONT_WRITE_PF
    
    PIXEL_ON
    callmacro Pixel_ON_macro current_col print_row
    


  18. I'm very far from an expert but it appears

    to me to work just fine.

     

    I don't see the purpose of the end statements

    after the playermove and badguymove blocks.

     

    Are you sure you don't want subroutines

    (with returns instead of ends and gosubs instead of

    gotos)?

     

    As it stands now your code runs in a loop from

    playermove to the goto playermove statement.

    It never gets to the goto badguymove and goto main

    statements (as far as I can tell).


  19. The conclusion I'm coming to is that it would be best

    to write some utilities in asm that could be included in

    and called from a bB program to do some of this stuff.

     

    In the present case of picking out a bit from a data

    statement and then calling pfpixel, you end up duplicating

    in bB stuff that pfpixel then does anyway.

    Also, the setbits data statement duplicates data in the

    kernel.

     

    Then too there's stuff you could do in asm that you can't

    do with bB.


  20. What do routines like this really need to do?

     

    The examples here are writing chunks. If you don't need

    to do things on a per bit basis it's a lot easier (and faster)

    to do bytes or rows.

     

    I think if you were doing things in bB it's probably better

    to do your own version of pfpixel, especially if you're doing

    individual bits, but it would have to be tailored to the kernel/

    kernel options. (Well, maybe not strictly speaking, but that

    would certainly be preferable.)

    And, of course, a little asm could help a lot.

     

    It seems just barely possible to get some purely bB version

    of the routines here to work (in Stella) but they take a lot of cycles

    and they don't really do much.


  21. Testing shows that 46 is the limit. That makes sense, as statements are limited to 50 strings, and the command, macroname, and newline chars take up 4.

     

    No reason that couldn't be raised, but it seems sufficient.

     

    Thanks for that.

     

    I would have eventually gotten around to trying it myself but I thought

    someone might know already.
