
More Software Sprites


NRV


Hi, I've been working on software sprites for some weeks and wanted to share the results. I made this just for fun and to learn and test some software sprite techniques, to see how fast I could make an "engine" that only uses this type of sprite.

 

My primary motivation was to see if I could make some routines nearly as fast as the "theoretical" limits for moving memory on a 6502, using precompiled sprites, unrolled code and some clever optimizations (see the code and the "technical doc" for more details).

 

Obviously, the methods are optimized for speed and normally use a lot of memory, but this is fine with me because I'm not trying to put them on a cartridge :) . Instead I'm happy with having maybe just one level of a game in memory at any time.

 

I chose gr.7 with a narrow field (32 bytes wide) because you can still make sprites that look good for games (like in Zybex or Draconus), and also because you get more machine cycles per frame than in other modes (23024 to be "very" exact, assuming that the players are activated), and for fast moving objects you don't notice the loss in resolution too much.

 

I worked primarily in NTSC (you have many more cycles per frame in PAL, but going from NTSC to PAL is the easier transition.. anyway, maybe I should just program for PAL). You can see how much of the frame is free by holding down any of the CONSOL keys (START, OPTION or SELECT) in any of the demos (the lower part between the two lines is the "free" part).

 

It's all programmer's art made by me, and I didn't add colors by any of the normal methods (like P/M's or DLI's) because I was concentrating on the programming techniques, so don't criticize the "art" :P

 

All demos run at 60 frames per second, and some of them still have room for more sprites (you can think of that as time for the game logic and the music code, especially in PAL).. I was too lazy to do something more complicated than sine tables for the movements, sorry for that :) (feel free to modify the code)

 

All the code is included; you can assemble it with the MADS assembler, or just drop the .obx executables onto the emulator.. (select the NTSC option in "Video System" the first time, to see them at my "intended" speed).

 

I'm still playing with the code and there is still room for improvement, so maybe I will post an update in a few days.

 

Greetings to the Atariware forum, all AtariAge and the people in the "8-bit Computer Poll" post, in the Retro Gamer forum (most of them) :)

 

NRV

 

[Attached thumbnails: five screenshots]

 

soft_sprites.zip


Nice. Unrolled code all right (just from reading the text file).

 

I guess it would be even faster than the strip method (as used in ZoneRanger).

 

But, pretty heavy on the RAM usage.

 

I reckon with 128K, something like that would work really well. And, the code could be dynamically generated.


 

H'lo! =-)

 

That last example is very interesting, are you planning to slap a game engine around it perhaps...? Adam Tierney is offering to do graphics right now, and he's done some stunning work for the VCS before now and might be persuaded, look for the user name "salstadt" over in the Homebrew sub of this board.


Well, I just want to say one thing:

 

COOOOL!!!! :-o

 

 

nice it also works in 60hz, and nice to see how much CPU time is still left. That answers a lot of questions I had in mind :)

 

....but, I still wonder what happens if you want to use different sprite (shape) definitions. Wouldn't it use tons of unrolled/compiled sprite code???

 

but anyway very nice job.


 

 

Well, for an "Attack-wave" always one shape was enough.

 

What about limiting those to 8-12 software sprites and overlaying the graphics with multiplexed PM objects?

Just for an experimental purpose...


Working out the storage requirements...

 

I assume the code sequence unrolled would just be

 

    LDA (screen_pointer),Y
    STA save_area + <iteration> ; 0,1,2...
    AND # <mask>
    ORA # <sprite_data>
    STA (screen_pointer),Y
    INY

 

then repeat that section e.g. 3 times for a 12 pixel wide sprite (only 2x needed if on a 4-pixel boundary)

 

then update screen pointer for the next line. Add 36 to screen_pointer - saves us having to zero Y.

       LDA screen_pointer
       CLC
       ADC #36
       STA screen_pointer
       BCC NOINC
       INC screen_pointer+1
NOINC  ;

 

then repeat from top for subsequent lines of the sprite.

 

but... we run into problems once Y wraps around, so after so many iterations, we'd have to just add 40, and zero Y again.

 

 

 

 

We would need to do some cycle counting to see just how it compares to the strip technique.
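
A rough tally of the sequence above, as a sketch only. The byte and cycle figures are the standard 6502 ones; `save_area` is assumed to be addressed in absolute mode, page crossings are ignored, and the per-line pointer update is counted in its cheapest form (BCC taken, INC skipped). The names `BYTE_BODY`, `LINE_STEP`, `totals` and `sprite_cost` are made up for this illustration.

```python
# (mnemonic, bytes, cycles) with standard 6502 timings, no page crossings.
BYTE_BODY = [
    ("LDA (zp),Y", 2, 5),
    ("STA abs",    3, 4),   # assumption: save_area is not in zero page
    ("AND #imm",   2, 2),
    ("ORA #imm",   2, 2),
    ("STA (zp),Y", 2, 6),
    ("INY",        1, 2),
]

# Pointer update between lines, best case: the branch is taken.
LINE_STEP = [
    ("LDA zp",   2, 3),
    ("CLC",      1, 2),
    ("ADC #imm", 2, 2),
    ("STA zp",   2, 3),
    ("BCC",      2, 3),
]
INC_ZP_BYTES = 2  # INC zp is assembled even when it is skipped at run time


def totals(ops):
    """Sum bytes and cycles over a list of (mnemonic, bytes, cycles)."""
    return (sum(b for _, b, _ in ops), sum(c for _, _, c in ops))


def sprite_cost(bytes_per_line, lines):
    """Best-case (code size, cycles) for one unrolled sprite."""
    body_size, body_cycles = totals(BYTE_BODY)
    step_size, step_cycles = totals(LINE_STEP)
    step_size += INC_ZP_BYTES
    steps = lines - 1  # no pointer update after the last line
    size = lines * bytes_per_line * body_size + steps * step_size
    cycles = lines * bytes_per_line * body_cycles + steps * step_cycles
    return size, cycles


print(totals(BYTE_BODY))   # bytes and cycles for one sprite byte
print(sprite_cost(3, 8))   # a 3-byte-wide, 8-line sprite
```

Each time the low byte does wrap, the update costs 4 cycles more (BCC not taken at 2 cycles, plus 5 for the INC).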

Edited by Rybags

Very nice!

 

I'm doing something similar on the Space Harrier cart (for 64K Ataris), but using a strip sprite method (the 130XE only demo doesn't do this). I haven't fully examined your code (too many macros for me at the moment!), but if it's pure speed you're after, you may get better mileage dedicating some more zero pages to the job, combined with a 256 byte wide screen? Works well for me, especially on large objects - set the y register only once per column, and if you've got any duplicate non-mask data bytes in the stripe they only need loading once. Anyway, just a thought, my way might not be as good for what you're doing.

Edited by Sheddy

Agreed - aligning each scanline on a page boundary saves the load/clc/add/store for the low-byte of the screen pointer. All that's needed then is an unconditional INC of the high byte each line.

 

But the big problem is that we only have 256 pages in a 64K system.

 

We could do alignment on 64 or 128 boundaries and use ASL ops on the screen_pointer (low byte). Probably save a few cycles there.

 

              ASL screen_pointer
              BCS CHECKOVERFLOW
              BNE DO_NEXTLINE
              ; carry clear, screen_pointer=00 means we store $40 there
              LDA #$40
              BNE STOREPTR
CHECKOVERFLOW BEQ SET192
              INC screen_pointer+1
              LDA #0
              BEQ STOREPTR
SET192        LDA #$C0
STOREPTR      STA screen_pointer
DO_NEXTLINE   ; continue drawing here...

 

Quick job there... I think that should do a quick add 64 to a zero-page pointer - of course, it only works for pointers aligned to a 64 byte boundary.
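
One way to check the branch logic is to trace it off the machine. The sketch below models only the carry and zero flags that the routine relies on, for the four low-byte values a 64-aligned pointer can have. `add64` is a hypothetical helper written for this check, not part of any posted code.

```python
def add64(lo, hi):
    """Trace the ASL-based routine for one (low byte, high byte) pointer."""
    shifted = lo << 1
    carry = shifted > 0xFF            # ASL moves bit 7 into the carry
    lo = shifted & 0xFF
    zero = lo == 0                    # ASL sets Z from the shifted result
    if not carry:                     # BCS CHECKOVERFLOW not taken
        if not zero:                  # BNE DO_NEXTLINE taken
            return lo, hi             # $40 -> $80, already stored by ASL
        return 0x40, hi               # $00 -> $40 via LDA #$40
    if zero:                          # BEQ SET192 taken
        return 0xC0, hi               # $80 -> $C0
    return 0x00, (hi + 1) & 0xFF      # $C0 -> $00 and INC of the high byte


for low in (0x00, 0x40, 0x80, 0xC0):
    new_lo, new_hi = add64(low, 0x80)
    print(f"${0x80:02X}{low:02X} -> ${new_hi:02X}{new_lo:02X}")
```

In this model every case advances the pointer by exactly 64, including the wrap into the next page.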


Hi, many thanks for all your words :)

 

 

about the unrolled code:

 

    LDA (screen_pointer),Y
    STA save_area + <iteration> ; 0,1,2...
    AND # <mask>
    ORA # <sprite_data>
    STA (screen_pointer),Y
    INY

 

my code for masked sprites is almost the same:

 

    LDY #[row_byte + line * 32]
    LDA (screen_pointer),Y
    AND # <mask>
    ORA # <sprite_data>
    STA (screen_pointer),Y

 

for every byte of the sprite.

 

I don't use the "save_area" that you mention (I suppose that is for later restoring the background or erasing the sprite), because I have a clean background buffer that I use to erase the sprites (only in the demo5b). Your method is more memory "friendly" and seems to be more general.

 

Also, because my lines are 32 bytes wide, "Y" goes from 0 to 31 for the first line, 32 to 63 for the second, and so on up to the eighth line, after which we do only one "INC screen_pointer+1", and "Y" goes back to the 0-31 range (obviously you could use this trick with any line whose width is a power of 2).
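
The offsets that end up hard-coded in the unrolled code can be listed with a few lines of Python. This is only a sketch: it assumes the screen pointer already holds the address of the sprite's top-left byte, and `unrolled_offsets` is a made-up name.

```python
LINE_WIDTH = 32  # bytes per screen line in the narrow gr.7 mode


def unrolled_offsets(width_bytes, height):
    """Return (inc_count, y) for every sprite byte, row by row.

    inc_count is how many "INC screen_pointer+1" have run before the
    byte is drawn; y is the immediate value used by "LDY #".
    """
    result = []
    for line in range(height):
        for row_byte in range(width_bytes):
            offset = row_byte + line * LINE_WIDTH
            result.append((offset >> 8, offset & 0xFF))
    return result


offsets = unrolled_offsets(4, 16)   # 16x16 pixels = 4 bytes x 16 lines
print(offsets[31], offsets[32])     # last byte of line 7, first byte of line 8
```

For a 16-line sprite a single INC is enough, because eight lines of 32 bytes fill exactly one page.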

 

But starting with demo3 I made an optimization to the sprite draw routine macros, so the code for every sprite byte now looks like this:

 

.if [[mask_x0_y0] = 0]

ldy #0
lda #[sprite_data_x0_y0]
sta (m_ptrScreen1),y

.elseif [[mask_x0_y0] < $FF]

ldy #0
lda (m_ptrScreen1),y
and #[mask_x0_y0]
ora #[sprite_data_x0_y0]
sta (m_ptrScreen1),y

.endif

 

Thanks to the "magic" of macros, now you have 3 options for every byte of the sprite:

 

- if the byte mask is $FF then you don't create any code for that byte!

- if the byte mask is $00 then you don't need to do any masking because you are going to replace the whole byte anyway!

- in any other case you go with the normal routine

 

You can see this as a kind of compression of the sprite. It helps more with bigger sprites: basically you use the masked version to draw the borders of a sprite, and the unmasked one to draw the inside.

 

(there is an extra optimization that you could add, if you use the unmasked code for two equal and consecutive bytes, then you could omit the "lda #[sprite_data_x0_y0]" after the first time.. but that would be easier to do if you have a tool that generates the pre compiled sprite code automatically, from a bitmap file)

 

With this every sprite byte can cost you 0, 9 or 16 cycles, and 0, 6 or 10 bytes of code (because of this last optimization I couldn't use "iny" to save one more byte, but an automatic tool also could do it... hmm)
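
To get a feel for how much this "compression" saves, the per-byte costs can be added up from a list of mask bytes. A minimal sketch, using the figures quoted above (0, 9 or 16 cycles and 0, 6 or 10 bytes); `classify` and `sprite_cost` are made-up names. Standard 6502 timing tables list STA (zp),Y at a fixed 6 cycles, which would make the two non-zero cases 10 and 17 cycles; the ranking of the three cases is unchanged.

```python
# (cycles, bytes) per sprite byte, as quoted in the post.
SKIP = (0, 0)      # mask $FF: fully transparent, no code emitted
OPAQUE = (9, 6)    # mask $00: ldy # / lda # / sta (zp),y
MASKED = (16, 10)  # anything else: ldy # / lda (zp),y / and # / ora # / sta (zp),y


def classify(mask):
    """Pick the code variant the macro would emit for one mask byte."""
    if mask == 0xFF:
        return SKIP
    if mask == 0x00:
        return OPAQUE
    return MASKED


def sprite_cost(masks):
    """Total (cycles, bytes) for a sprite given its mask bytes."""
    costs = [classify(m) for m in masks]
    return (sum(c for c, _ in costs), sum(b for _, b in costs))


# One hypothetical sprite line: masked edges around two opaque bytes.
print(sprite_cost([0xFC, 0x00, 0x00, 0x3F]))
```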

 

 

Another big optimization that you can use is what I call the "wave" version of the code: if you know you are drawing a lot of sprites with the same frame (and I mean the same rotation), then you could draw groups of sprites like this:

 

.if [[mask_x0_y0] = 0]

ldy #0
lda #[sprite_data_x0_y0]
sta (m_ptrScreen1),y
sta (m_ptrScreen2),y
sta (m_ptrScreen3),y
sta (m_ptrScreen4),y

.elseif [[mask_x0_y0] < $FF]

ldy #0
lda (m_ptrScreen1),y
and #[mask_x0_y0]
ora #[sprite_data_x0_y0]
sta (m_ptrScreen1),y

lda (m_ptrScreen2),y
and #[mask_x0_y0]
ora #[sprite_data_x0_y0]
sta (m_ptrScreen2),y

lda (m_ptrScreen3),y
and #[mask_x0_y0]
ora #[sprite_data_x0_y0]
sta (m_ptrScreen3),y

lda (m_ptrScreen4),y
and #[mask_x0_y0]
ora #[sprite_data_x0_y0]
sta (m_ptrScreen4),y

.endif

 

this is to draw 4 equal sprites for example.
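
The saving of the "wave" version comes from the instructions that are shared by the whole group: one "ldy #" per sprite byte, plus one "lda #" in the unmasked case. A small sketch of that arithmetic, with `wave_saving` as a made-up name; both instructions take 2 cycles on a 6502.

```python
LDY_IMM = 2  # cycles for "ldy #"
LDA_IMM = 2  # cycles for "lda #"


def wave_saving(sprites, opaque):
    """Cycles saved per sprite byte versus drawing each sprite separately."""
    shared = LDY_IMM + (LDA_IMM if opaque else 0)
    return (sprites - 1) * shared


print(wave_saving(4, opaque=True), wave_saving(4, opaque=False))
```

The masked case gains less because the lda/and/ora/sta sequence still has to run once per sprite.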

 

 

Sorry about the macros, I tried not to nest them, but as the code evolved and the size of the sprites grew, I also got tired of the cut/paste :)

 

What comes next depends on how much time I have available. I was thinking of adding scrolling and more color through the use of P/M and DLI's, maybe for a fast vertical shooter or a side scroller. It's good to know that there are people with talent who can take requests for graphics and music :) (like Adam Tierney and Kjmann)

 

The other thing that I know I need is a good PC-oriented graphics tool, to mix screen, characters, level data, P/M editing, animation and color editing, all specific to the A8 (but configurable), and with a simple way to export usable data (like precompiled sprite code!). I have wanted to do something like this for some time, and to open source the code so the community can make improvements.

 

Greets to all!

 

NRV

Edited by NRV

Forgot that... of course since it's unrolled code you can forget about doing mask ops if the particular sprite data byte has no "transparency". And, just skip the store op completely if the source data is #$00.

 

That could potentially save a lot of cycles, especially with larger sprites.

 

Also, the double advantage of 32 byte screen lines - you can optimize the code for pointers.

Edited by Rybags

I would think dedicating some more zero pages to the job still makes sense for the larger 16x16 (and bigger) sprites even if you didn't want to go down the fully strip sprite route. (This is exactly what I've been doing until fairly recently, before moving over to the 256 byte wide screen strip sprites). For the 8 line high sprites I don't see any advantages over what you're doing already - like I said before - very nice! - it would seem to be optimal.

 

Using a 16x16 non-mask (opaque) sprite as an example:

The way things are at the moment, it can be done best case in 581 cycles (no extra cycles for page "faults"):

9 cycles per byte * 4 bytes * 16 lines = 576

5 cycles for the "inc zp+1" in the middle

581 total

 

Setting up another zero page pair instead of the "inc zp+1" gives the bonus of not having to load the y register at all for the lower part of the sprite. The extra overhead for set up of another zero page pair is minimal as the low bytes are the same, and the high byte can take the same cycles as an "inc zp":

9 cycles per byte * 4 bytes * 8 lines = 288

7 cycles per byte * 4 bytes * 8 lines = 224

3 cycles for sta zp+2

5 cycles for inx, stx zp+3 (instead of inc zp+1)

520 total
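
The two totals can be reproduced with a short calculation. This sketch just replays the figures from this post (9 cycles per byte with "ldy #", 7 without, no page-crossing penalties); the function names are invented for the example.

```python
def one_pointer(width=4, height=16, per_byte=9, inc_hi=5):
    """Every byte pays for "ldy #"; one "inc zp+1" halfway down."""
    return per_byte * width * height + inc_hi


def two_pointers(width=4, height=16, with_ldy=9, without_ldy=7,
                 sta_zp=3, inx_stx=5):
    """Second pointer one page below: the lower half reuses the Y values."""
    half = height // 2
    upper = with_ldy * width * half
    lower = without_ldy * width * half
    return upper + lower + sta_zp + inx_stx


print(one_pointer(), two_pointers())
```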

 

You'd also save some memory as well of course :)

I can also recommend that it is definitely worthwhile writing a utility to help compile the sprites (especially with bigger sprites, it's too easy to miss duplicates and other optimizations)

As always, everyone feel free to rip this apart if I've made any incorrect assumptions!

*Puts on tin foil hat and hides in corner*

Edited by Sheddy

Going back to the increment of the pointer - just using the standard 40 byte screen, you could save cycles by just leaving the pointer low byte alone.

 

            TYA
            CLC
            ADC #36 ; (assuming we have just written out 4 bytes of sprite data)
            TAY
            BCC DONT_INC_H
            INC SCREEN_POINTER+1 ; bump screen pointer high byte
DONT_INC_H  ; continue writing next line

 

Also, for the initial setup - just have the SCREEN_POINTER set to whatever address the scanline starts on - and set the Y register to the correct offset based on the XPOS of the sprite.

 

Just thinking about a Galaga type game. Using these optimized methods would make Galaga a real possibility. The enemy sprites only really need a "restore" buffer if they are at a Y-position such that they might be overlapping the stationary formation.

 

When they are swooping, just a mask/store operation is needed - then zero out the sprite instead of a restore op. Of course that is assuming that the starfield is either made using PMGs, or similarly drawn single pixel playfield softsprites.

 

That would leave sufficient cycles to have a genuine formation which swells and contracts its size like the arcade original.


 

That per-line update of Y (TYA / CLC / ADC / TAY / BCC) soon adds up to quite a lot if you have to do it every scan line. For a 40 byte screen it'd probably be better to reference 6 lines using the y reg, then add 240 to the pointer.
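
The "6 lines, then add 240" figure follows from how many whole screen lines fit in the 0-255 range of Y. A tiny sketch of that rule, with `y_block` as a hypothetical name:

```python
def y_block(line_width):
    """Return (lines reachable through Y alone, amount to add to the pointer)."""
    lines = 256 // line_width
    return lines, lines * line_width


print(y_block(40))   # standard 40 byte screen
print(y_block(32))   # narrow 32 byte screen
```

With 32 byte lines the step is a full 256, so the pointer update reduces to an increment of the high byte.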

Edited by Sheddy

 

That's a nice optimization! but you have opened that door, now we must get in (what have you done..) :D

 

I thought about using more pointers, but I was too lazy to do the cycle counting.. you are right about the 16x16 version, but not only that, the 8x8 can also be improved:

 

Assuming your sprite has 16 bytes, your unrolled code will have 16 "ldy #", and every time you add a new pointer you eliminate half of the "ldy #" still present in the code.

With that you save 2 cycles and 2 bytes for every "ldy #" eliminated.

But you also add the 8 cycles and 5 bytes (approx.) of the new pointer initialization.

 

Balancing that for the sprite of 8x8, 16 bytes:

- adding the first extra pointer:

--> we save (16/2)*2 = 16 cycles, and (16/2)*2 = 16 bytes

--> we add 8 cycles and 5 bytes

= we have a net gain of 8 cycles and 11 bytes per sprite (useful for the 64 sprites demo!)

 

Balancing for a sprite of 16x16, 64 bytes:

- adding the first extra pointer:

--> we save (64/2)*2 = 64 cycles, and (64/2)*2 = 64 bytes

--> we add 8 cycles and 5 bytes

 

- adding the second extra pointer:

--> we save (32/2)*2 = 32 cycles, and (32/2)*2 = 32 bytes

--> we add 8 cycles and 5 bytes

 

- adding the third extra pointer:

--> we save (16/2)*2 = 16 cycles, and (16/2)*2 = 16 bytes

--> we add 8 cycles and 5 bytes

 

= we have a net gain of 88 cycles and 97 bytes per sprite

(we also eliminate the "inc zp+1" that was in the middle of my code, that's an extra saving)

(if we add another pointer we will only save some bytes, but we will complicate the code too much)
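
The balance above can be replayed for any sprite size. A minimal sketch, assuming the same figures as in the post (2 cycles and 2 bytes per removed "ldy #", roughly 8 cycles and 5 bytes to set up each extra pointer); `pointer_gain` is a made-up name.

```python
def pointer_gain(sprite_bytes, extra_pointers, setup_cycles=8, setup_bytes=5):
    """Net (cycles, bytes) saved by adding extra zero page pointers.

    Each new pointer removes half of the "ldy #" still in the code.
    """
    remaining = sprite_bytes          # "ldy #" instructions still present
    cycles = size = 0
    for _ in range(extra_pointers):
        removed = remaining // 2
        cycles += removed * 2 - setup_cycles
        size += removed * 2 - setup_bytes
        remaining -= removed
    return cycles, size


print(pointer_gain(16, 1))   # 8x8 sprite, one extra pointer
print(pointer_gain(64, 3))   # 16x16 sprite, three extra pointers
print(pointer_gain(64, 4))   # a fourth extra pointer
```

The last call shows why a fourth extra pointer is not worth it: the cycle gain stays the same and only a few bytes are saved.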

 

by the way (for the 16x16 sprite) I could distribute the 4 pointers in this way:

pointer1 = top left byte offset

pointer2 = pointer1 + 256

pointer3 = pointer1 + 1

pointer4 = pointer1 + 256 + 1

 

or this other way:

pointer1 = top left byte offset

pointer2 = pointer1 + 256

pointer3 = pointer1 + 2

pointer4 = pointer1 + 256 + 2

 

please get out of your corner (and throw rocks at mine, now :) )

 

 

Two things that I would like to ask you:

- what do you mean by the "fully strip sprite route" (using a char screen?)

- what is really the advantage of using screen lines of 256 bytes? wouldn't you fill the memory too fast? (I think that for a sidescroller I will use the first 64 bytes of a line for the screen 1 buffer, the next 64 bytes for the screen 2 buffer, and the next 64 bytes for the restoration screen buffer, the last 64 bytes I don't know what I would put there.. maybe interleaving some code or other data :? )

 

Could you elaborate further?

 

Thanks

 

NRV

Edited by NRV

It would probably be best to set the initial pointer value to the absolute screen address of the top left of the sprite. All that does is add a few extra cycles to the initial setup. But, that way, we know in advance what value the Y Reg has, and can hard-code all the increments for the high byte of the screen pointer where they're needed, rather than having to test for overflow conditions.

 

At the end of each line you just have a LDY #40, 80... etc.

 

And when "Y" has overflow conditions, just increment the high byte of screen pointer.

 

Remember, since the code is all unrolled, we know in advance what values "Y" will have - the only "variable" in the entire situation is the value of the screen pointer.


 

Heavy on RAM usage? ;)

 

Then look at these ones (sources included).


 

*throwing snowball rather than rock for fun*

 

you're right about the 8x8 sprite, I made a quick calculation mistakenly thinking the cycles would work out the same as the original code.

 

On the 16x16 sprite: yep, quite right - I'm glad you worked that out yourself :) (just dropping every nuance of an idea in one go is no way to properly explore it) [Edit: Rereading: Oops - no sarcasm was intended (and I didn't intend to come across as a pretentious jerk by saying that)]. As you've worked out, it's all a case of balancing the size of the sprite versus how many zero pages it is worth setting up beforehand for the screen width. For a very wide sprite it sometimes works out best to have a zero page pair every line; for smaller ones, every other line, etc. This is why I refer to it as not a fully strip sprite method (but it is when you have a pair every line) :). Since I've been using very variable size sprites, I let a utility work out the best line spacing for me for the maximum amount of zero pages I can spare.

 

I would have all the zero pages pointing to the left side of the sprite, that way you can always do an "iny" instead of a "ldy #" as you go horizontally across the sprite.

 

It's not so much of a problem with the sprite sizes you're using, but the extra cycles from going over page boundaries (I just call them "page faults") have to be taken into account when deciding the optimal line spacing - sometimes it may be worth using less of the y register range:

 

for example, with arbitrary positioning of the original 16x16 sprite, you'll get just over 28 cycles on average extra due to page faults. with the 4 pointer version, you'll only get on average a little over 12 cycles

Edited by Sheddy

 

By "fully strip sprite route" I just mean every line of the sprite has a zero page pair [edit: missed "with a 256 wide screen"], allowing the y to be set only once per column (strip)

Yes, the memory does fill up fast, and there will be (a lot of) wasted space that is not very useful. I use 85 lines of screen, so have to use 21.25K of contiguous RAM to do it. For me there is maybe less waste due to using the much loathed screen flickering for more colours. each of my screens is 48 bytes wide (for clipping) and there are 4 screen buffers instead of the usual 2. some of the 64 bytes remaining on each line will just be used for other buffers/variables etc. but unfortunately most will be wasted.
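
The memory figures in this paragraph can be checked quickly. The sketch below only restates the numbers given here (256 byte line stride, 85 lines, 4 buffers of 48 bytes); the function names are made up.

```python
LINE_STRIDE = 256  # one page per screen line


def screen_ram_kib(lines, stride=LINE_STRIDE):
    """Contiguous RAM spanned by the screen area, in KiB."""
    return lines * stride / 1024


def spare_bytes_per_line(buffers, width, stride=LINE_STRIDE):
    """Bytes of each line not covered by any screen buffer."""
    return stride - buffers * width


print(screen_ram_kib(85))             # RAM spanned by 85 lines
print(spare_bytes_per_line(4, 48))    # bytes left over on every line
```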

 

the big benefits I see from this width of screen is that assuming you can spare the zero pages it is really great for tall sprites:

  • your low byte of each zero page is fixed - never needs changing (y register can do it all)
  • setting up each zero page high byte only takes 5 cycles (inx, stx zp+3, inx, stx zp+5...for however many lines high)
  • as you've already seen, setting the y only once per byte of sprite width will make a huge difference in size and cycles
  • the y register can pass the screen buffer offset plus the x position. you just do an "iny" between each column of sprite
  • there's a much higher probability of duplicated non-mask (opaque) bytes where the accumulator doesn't need loading again (I often get hundreds of duplicates in the larger space harrier sprites)
  • there need be absolutely no extra cycle penalties for crossing page boundaries (this can add up to a lot for large sprites)

Edited by Sheddy

 

sorry for any potentially jerkish comments in there. It's amusing that you found the better zero page pair spacing when I should have been able to use it as the example first time. I'll use the excuse that I'm not used to working with a 32 byte wide screen :dunce:

 

Please share if you find better places to position the zero pages than down the left side.

 

By the way, I don't think the 256 wide screen will give you any great advantages with 16x16 sprites (or 8x8) and drawing only 1 screen buffer. It works better for me because I have to setup zero pages for 2 screen buffers otherwise.

Edited by Sheddy

Hmm, NRV, converting my old 2 buffer zero page set up to a single buffer, I can't get to 8 cycles per pair when they are not spaced 256 apart. Did you have something special in mind for that? I'm thinking along these lines. Is there a better way?:

(assume a has 1st pointer position low byte and x has high)

adc #64
sta zp+2
bcc b1
inx
clc
:b1
stx zp+3
adc #64
sta zp+4
bcc b2
inx
clc
:b2
stx zp+5
...etc.

 

Maybe the 256 spaced zero page pairs is actually quicker for the 16x16 sprite then?

 

[Edit: I suppose when they are spaced 128 apart we can do an eor #$80 rather than an add, so no clear carry is needed, but that doesn't help enough if there is an arbitrary value in the 1st pointer low byte?]

[Next Edit: Ah - maybe if we had different routines for odd and even y position...]

[Final Edit: not odd and even, just +ve/-ve, then it can be done in just over 9 cycles avg. so yes, 128 spacing is still better!]

Edited by Sheddy

Don't forget, if you do away with 256 byte screen lines (or 64, 128, whatever other than default), you then get a 2 cycle per line saving due to not having LMS in the Display List.

 

Not a huge gain, but maybe 400 cycles or so per frame for a standardish display.


 

good point.

I work out a 256 byte wide screen 16x16 strip sprite would take 531 cycles. even with no extra cycles for going over page boundaries and maybe avoiding more accumulator loads because of duplicates in the strip, it's just not going to be worth it because of those LMS's


aargh.. sorry for being late, sometimes life gets in the way :)

 

yep, avoiding LMS's is one of the reasons that I chose a gr.7 32 byte wide mode for my demos, only 32 cycles every 2 scan lines, compared to other modes.. well, anyway, 2 extra cycles for each of 96 lines doesn't seem so bad, and for scrolling I will need those LMS's

 

to sheddy..

 

about your comments: never mind! I never read that as sarcasm (not from someone doing a Space Harrier conversion for the A8, I can tell that you know your sprites :) ). In the worst case I get motivated to find a better solution. Also, English is not my first language, so you need to pass all my comments through some kind of "good intentions" filter before reading them :D

 

about your question, I think I'm a little lost (I should go to sleep and answer tomorrow): are you trying to init pointers for every scan line, assuming a width of 128 bytes for every line? I don't know if I'm answering your problem (give me more info, I like this kind of problem :D), but I would probably try to init at least one pointer every 256 bytes and the others continuously after the first one, like this:

 

    ldx zp
    ldy zp+1

    inx
    stx zp+2
    sty zp+3

    inx
    stx zp+4
    sty zp+5

    inx
    stx zp+6
    sty zp+7

 

...

 

; alternate use of iny, inx and dex should let me init all the pointers, and when I need to draw I would do something like this:

 

    ldy #0
    lda #sprite_data1
    sta (zp), y
    lda #sprite_data2
    sta (zp+2), y
    lda #sprite_data3
    sta (zp+4), y
    lda #sprite_data4
    sta (zp+6), y

    ldy #4
    lda #sprite_data5
    sta (zp), y
    lda #sprite_data6
    sta (zp+2), y
    lda #sprite_data7
    sta (zp+4), y
    lda #sprite_data8
    sta (zp+6), y

 

I suppose that a fixed example would be clearer. I don't know if this will really work, but we could try..

 

NRV
