When exactly does the DLL / DL get read and processed?

PacManPlus · October 5, 2014

In every game I've made, I've always processed the 'Loader' (i.e. the updating of the DL with new sprite positions) just after the last visible display zone on the screen. I always thought that was how it had to be done. But lately I started wondering: if the DL gets processed (and therefore the screen updated) during VBlank, I can theoretically call the 'Loader' any other time, as long as it's consistent, correct?

I ask this because I experimented with an existing game I had started writing that used to crash from having too many sprite updates. I moved my 'Loader' routine to just before the *first* display zone of the screen (via a DLI), and everything now worked. If this is so, then it clears up an issue I've been having where it seemed that I was more limited in the number of sprites on the screen than I should have been. This can open up a whole new world for me.

I would guess that, as long as I'm not manipulating any data with DLIs between zones, I can call my 'Loader' routine at the beginning of a screen rather than the end, allowing more time to process more sprites, correct?

Thanks,

Bob

Edited October 5, 2014 by PacManPlus

Rybags · October 5, 2014

I should think it's more of a "just in time" sort of process. You could probably wait until the last possible non-active scanline or just before to set the actual DL pointer.

With double-line buffering, I'd assume a scanline is built during the preceding one, then the next one is built while the first is displayed, etc.

Actual DL entries would be read for every scanline they're active, sort of wasteful but I doubt Maria would have buffering for any other than the info relevant to the current object being processed.

This is based on docs I've read + some assumptions. I suppose a good test method would be to just push it to the limits and see what gets omitted on the real machine - work it all out by deduction and proving it.

I suppose the conventional "wisdom" of having everything ready to go before active display starts is based on an assumption that sufficient DMA loads will steal most of the CPU's cycles and leave it with not much time to do anything. But of course the case might be, e.g. for an Asteroids type game, that the load is highly variable and you might have cases of entire scanlines with barely any cycle steals.

RevEng · October 5, 2014

The pointer to your DLL (DPPH+DPPL) gets loaded as VBLANK ends. I believe the pointer to the DL location is read from the DLL as part of DMA for the last scanline, but it might be a part of each scanline DMA. The DL itself is traversed during DMA during each scanline in the zone.

So long as the DL is consistent and terminated when that zone is being drawn, you should be safe from big glitches. If you're modifying the DLs on-screen, you need to take care that they all get updated consistently-before or consistently-after the zone is drawn. If some DLs are updated before the zone is displayed, and other DLs are updated after the zone is displayed, you'll wind up with a zone boundary that, when crossed, will shear/tear sprites in motion.

Overall the 7800basic code works similarly to the traditional scheme you described, waiting for the visible zones to complete before updating DLs. I've been mulling over a few ideas to amp this up, since I max out the non-visible time available for DL updates at around 24 zone-height 160A sprites on a 160A background.

Moving to double-buffering is a non-starter for me, since it would use twice the ram, and I can't assume extended ram schemes. It might not be for you, though I suspect you're looking for a good general scheme like I am.

The easiest modification would be to track the currently drawn zone with DLIs. When plotting an object, if we're currently in the visible screen but the zone in question has already passed, we go ahead and modify the DL. If not, wait.

The benefit here is maximized if you plot objects in Y order, but average case benefits would still be a win. Worst case it would be the same as the traditional method.

The CPU overhead for tracking each zone is minimal, since you're just incrementing a variable at the end of each zone, with a small bit of additional non-interrupt code to clear the variable when the frame is done.
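To make the zone-tracking idea concrete, here is a rough model in Python rather than 6502 code. It is only a sketch under assumptions: the class name `ZoneTracker`, its method names, and the 12-zone screen are invented for illustration, and `on_dli` stands in for an interrupt that would fire at the end of each zone on real hardware.

```python
class ZoneTracker:
    """Toy model of a DLI-driven 'zones already drawn' counter (hypothetical names)."""

    def __init__(self, zone_count):
        self.zone_count = zone_count
        self.zones_done = 0       # zones fully drawn so far this frame
        self.in_visible = False   # True while the visible screen is being drawn

    def start_frame(self):
        # Non-interrupt code: clear the counter when a new frame begins.
        self.zones_done = 0
        self.in_visible = True

    def on_dli(self):
        # Models the interrupt at the end of each zone: just bump the counter.
        self.zones_done += 1
        if self.zones_done >= self.zone_count:
            self.in_visible = False

    def safe_to_update(self, zone):
        # Outside the visible screen any DL may change; inside it,
        # only the DLs of zones that have already been drawn.
        return (not self.in_visible) or zone < self.zones_done


def plot_objects(tracker, object_zones):
    """Split objects into those that can be plotted now and those that must wait."""
    now, later = [], []
    for obj, zone in enumerate(object_zones):
        (now if tracker.safe_to_update(zone) else later).append(obj)
    return now, later
```

With objects plotted in Y order, more of them tend to land in `now` as the beam moves down the screen. An object in a low zone that comes up early in the loop simply falls into `later`, which is the same wait as the traditional method.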

PacManPlus · October 6, 2014

Thanks guys

So from what it sounds like, I am better off leaving it where it currently is (after the last display zone on the screen), unless I want to completely overhaul the routine. Damn.

@RevEng, you are correct; I am also looking for a good general scheme, as it seems the ones I am using take too much time.

I actually use 2 types:

- Pre-built DLs, (copied to RAM of course) including all sprites, for games that have all sprites (mostly) always on the screen (i.e. Pac-Man). Only the HPOS and VPOS get changed from frame to frame.

- "On-the-fly" DLs (built in RAM), when there could be any number of sprites on the screen at a given time (like Asteroids, Scramble, Bentley Bear, Astro Blaster, etc.). Each line of the DL is built every frame.

Looks like I'll have to find a 'third' way.

Thanks again,

Bob

RevEng · October 6, 2014

If you come up with something brilliant, let me know. I'll be more than happy to copy.

RevEng · October 14, 2014

I tried a few simpler things with 7800basic and struck out. A few notes on what doesn't work for sprite routine optimization, so you don't waste time on the same path...

technique: Waiting until the zone a sprite is in has already been drawn, before updating.

test notes: The problem with this is that the worst case, where a sprite with a low Y is drawn early in the game loop, happens often if you don't have specific game logic to preclude it. Sorting the sprites by Y may be expensive, and could be complicated depending on the game.

technique: Avoid drawing a sprite while its zone is being drawn.

test notes: This works pretty well, except for the shearing effect I mentioned. It allowed me to move ~34 bouncing sprites around the screen, where my previous limit was 24. It might work ok for a game that has fairly constant CPU usage up to the sprite routines, and/or if the sprites don't move vertically, or if you don't care that the sprites stutter/shear sometimes. Not so hot for a general routine.

There's another failed approach I dug up from Eric Ball's blog. I'll avoid paraphrasing his results - you can read them yourself - but a quick quote for truth from the same entry...

"One of the frustrating aspects of 7800 coding is constructing the display lists. On paper it sounds like a great plan - a list of pointers to sprites with width and horizontal positioning info. Very powerful and flexible, but damn difficult to use in practice."

PacManPlus · October 14, 2014

I just read that. I'm using the same (or very similar) routine he used in the ball demo in something I'm working on (go through list of sprites, determine which DL they belong in, add them, terminate list at end).

I think the thing here is to do as much of the calculation as possible outside of the routine that updates the DLs during VBlank (my 'loader'). I keep tables of sprites (VPLIST for vertical position of each sprite, HPLIST for horizontal position of each sprite, etc.). I started keeping the vertical offset list (for both zones that a sprite could take up in a 16-line zone) as well, so I didn't have to calculate it within the 'loader' routine.
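A rough Python model of that split between precalculation and the loader might look like the following. It is a sketch under stated assumptions, not the actual routine: the function names, the 12-zone screen, the tuple standing in for a DL entry, and the sign convention for the offset in the second zone are all invented for illustration, and sprites are assumed to be exactly one zone (16 lines) tall.

```python
ZONE_HEIGHT = 16
ZONE_COUNT = 12   # assumption: 192 visible lines / 16 lines per zone


def precompute(vplist):
    """Game-logic phase: derive each sprite's zone and in-zone offset from its Y."""
    zone_list = [y // ZONE_HEIGHT for y in vplist]
    offset_list = [y % ZONE_HEIGHT for y in vplist]
    return zone_list, offset_list


def loader(hplist, zone_list, offset_list):
    """Loader phase: table lookups and appends only, no arithmetic on Y."""
    dls = [[] for _ in range(ZONE_COUNT)]
    for i, (x, zone, offset) in enumerate(zip(hplist, zone_list, offset_list)):
        if 0 <= zone < ZONE_COUNT:
            dls[zone].append((i, x, offset))
        # A sprite that does not start on a zone boundary spills into the next zone.
        if offset != 0 and 0 <= zone + 1 < ZONE_COUNT:
            dls[zone + 1].append((i, x, offset - ZONE_HEIGHT))
    for dl in dls:
        dl.append(None)   # stands in for the end-of-list terminator
    return dls
```

For example, `precompute([0, 20, 190])` puts the second sprite in zone 1 with an offset of 4, so the loader adds it to the lists for zones 1 and 2. The point of the split is that everything derived from Y is ready before the time-critical pass starts.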

We'll see how that works...

Thanks for the link & information

RevEng · October 14, 2014

I like it. Keeping the lists separate wastes a bit of RAM, but it allows you to use absolute addressing rather than indirect, and there's less index incrementing too. :thumbsup:
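As a back-of-the-envelope check on that point, the standard 6502 timings are 4 cycles for `LDA table,X`, 5 cycles for `LDA (ptr),Y`, and 2 cycles for `INX`/`INY`, ignoring page-crossing penalties. The sketch below only adds those up. The four-fields-per-sprite figure and the function names are assumptions for illustration, and it ignores stores, pointer maintenance, and loop overhead.

```python
# Standard 6502 cycle costs, ignoring page-crossing penalties.
LDA_ABS_X = 4   # LDA table,X
LDA_IND_Y = 5   # LDA (ptr),Y
INC_INDEX = 2   # INX or INY


def cycles_separate_lists(fields):
    # One absolute,X load per field, then a single INX to reach the next sprite.
    return fields * LDA_ABS_X + INC_INDEX


def cycles_interleaved(fields):
    # One (ptr),Y load per field, each followed by an INY to reach the next byte.
    return fields * (LDA_IND_Y + INC_INDEX)


if __name__ == "__main__":
    fields = 4  # assumed number of bytes read per sprite
    print(cycles_separate_lists(fields), cycles_interleaved(fields))
```

Under these assumptions the separate-list layout reads a sprite in fewer cycles, and the gap grows with the number of fields.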

EricBall · October 14, 2014

In http://atariage.com/forums/blog/7/entry-3089-7800-cycle-counting/ I mention using (zp,x) being more efficient and say "I'll modify the rest of the sample code and post it." Of course, there's no follow-up post (on AA at least).

I remember trying to figure out the Robotron disassembly, but not getting very far. It's fundamentally a data structure & transform issue - how best to structure your data to make it efficient to transform into the 7800 display list structure.

RevEng · October 14, 2014

Some excellent stuff in that thread, including the (zp,x) approach. Thanks!

EricBall · October 15, 2014

Ahh, found it:

sample 7800 source code using (ZP,X)

RevEng · October 16, 2014

Thanks for that too, Eric. More food for thought.

I'm going to give a try at a version that builds up the DLL along with the DLs, and see how far that gets me. RAM is pretty tight, and a dynamic scheme would give more flexibility as to how 7800basic programs could be structured visually.

I'm hoping I can pull it off without the additional overhead making it cost more than my vanilla "wait for non-visible and update" routines.

CPUWIZ · October 16, 2014

BAH! You have a devcart, use the RAM. I got the boards to make games with RAM.

RAM is pretty tight, and a dynamic scheme would give more flexibility as to how 7800basic programs could be structured visually.

RevEng · October 16, 2014

Hah. Yes, ultimately that's my safety net.

Still, it's fun to push at the limits. Maybe I can boost stock performance and save all that on-cart RAM for the developers.

CPUWIZ · October 16, 2014

Still, it's fun to push at the limits.

That is my job in real life, so I know what you mean.

EricBall · October 16, 2014

I'm going to give a try at a version that builds up the DLL along with the DLs, and see how far that gets me. RAM is pretty tight, and a dynamic scheme would give more flexibility as to how 7800basic programs could be structured visually. I'm hoping I can pull it off without the additional overhead making it cost more than my vanilla "wait for non-visible and update" routines.

A truly dynamic (i.e. no wasted RAM) DL builder is certainly possible. What you would need to do is to order your sprites by Y or loop through the list of sprites, picking out those sprites for the zone. IIRC I looked at the latter at one point and found the cycles required to loop through the list of sprites for each zone was huge. Maybe build a second table with just Y and sprite index which could then be sorted . . . still sounds like a lot of CPU cycles.

RevEng · October 16, 2014

Maybe build a second table with just Y and sprite index which could then be sorted . . . still sounds like a lot of CPU cycles.

Yeah, I had this in mind as a possible optimization... a list of indexes sorted by Y or zone. The problem is the sorting, for sure. Also, from a big O perspective, sorting that list of indexes looks a lot like traversing through the list for each zone, so it may not get me anywhere; it just depends on whether the sorting will blow through my available visible time or not. (Visible time isn't a constant either, since different 7800basic programs will have varying levels of DMA.)
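The two options being weighed here can be compared with a small Python model. This is a sketch under assumptions, not 7800basic code: all function names are invented, insertion sort is my own choice of sorting method (the thread does not name one), every Y is assumed to be on-screen, and sprites that straddle two zones are ignored.

```python
ZONE_HEIGHT = 16
ZONE_COUNT = 12   # assumption: 192 visible lines / 16 lines per zone


def zones_by_scanning(ys):
    """Loop over every sprite once per zone; returns (per-zone lists, sprite visits)."""
    visits = 0
    zones = []
    for z in range(ZONE_COUNT):
        members = []
        for i, y in enumerate(ys):
            visits += 1
            if y // ZONE_HEIGHT == z:
                members.append(i)
        zones.append(members)
    return zones, visits


def sort_indexes_by_y(ys, order):
    """Insertion-sort the index table in place by Y; returns the comparison count."""
    comparisons = 0
    for k in range(1, len(order)):
        item = order[k]
        j = k - 1
        while j >= 0:
            comparisons += 1
            if ys[order[j]] <= ys[item]:
                break
            order[j + 1] = order[j]
            j -= 1
        order[j + 1] = item
    return comparisons


def pack_from_sorted(ys, order):
    """Walk a Y-sorted index table once, writing the DLs back to back with no gaps.

    Returns (packed, starts): packed models the DL memory, and starts[z] is where
    zone z's list begins, i.e. what that zone's DLL entry would point at.
    """
    packed, starts = [], []
    pos = 0
    for z in range(ZONE_COUNT):
        starts.append(len(packed))
        while pos < len(order) and ys[order[pos]] // ZONE_HEIGHT == z:
            packed.append(order[pos])
            pos += 1
        packed.append(None)   # end-of-list terminator for this zone
    return packed, starts
```

The scanning version always costs zones times sprites visits. The cost of the sorted version depends on how disordered the index table is: insertion sort needs only one comparison per element on an already-sorted table, so keeping the table from the previous frame may make the sort cheap when sprites move little between frames. Whether that holds up in 6502 cycles would still need measuring.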

I'm going to start with the less clever implementation and see where it leaves me first. I've found the old Knuth "premature optimization is the root of all evil" quote to be doubly true on the 6502.

DracIsBack · October 27, 2014

I'm using the same (or very similar) routine he used in the ball demo in something I'm working on

I stopped reading for a moment and froze at this part of the thread ...

;-)

PacManPlus · October 27, 2014
