DMA Masking - an alternative to modifying CHARBASE when vertically scrolling


RevEng

Like with many other systems, a vertical scroll routine on the 7800 is a combination of fine-scrolling and coarse scrolling. The coarse scroll is just a straightforward change of data being displayed in each zone, so I won't talk about that. The fine scroll is where the interesting stuff happens.

 

The usual approach to fine-scrolling upwards on the 7800 is to decrease the height of a zone at the top of your scroll area, and correspondingly increase the height of a zone at the bottom of your scroll area. (Swap "increase" and "decrease" in the previous sentence to fine-scroll downwards.) The other thing you need to do on the 7800 is set an interrupt that runs prior to the bottom zone and adjusts the CHARBASE character register; if you don't, then when you shorten the height of the bottom zone, the graphics in that last zone align incorrectly with the graphics in the zone directly above it.
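To picture why only the bottom zone needs the fix, here is a rough Python model (not 7800 code). It assumes 16-line zones, that Maria's per-line offset counts down from height-1 to 0, and that the graphics page for offset k holds tile row 15-k. The names `zone_heights`, `tile_rows_shown` and `page_adjust` are made up for this sketch; `page_adjust` stands in for the amount added to CHARBASE.

```python
ZONE_HEIGHT = 16  # assumed tile/zone height in scanlines


def zone_heights(rows, fine):
    """Zone heights for `rows` tile rows scrolled upwards by `fine` scanlines."""
    if not 0 <= fine < ZONE_HEIGHT:
        raise ValueError("fine scroll must be 0..ZONE_HEIGHT-1")
    if fine == 0:
        return [ZONE_HEIGHT] * rows
    # The top zone shrinks by `fine`; a bottom zone of `fine` lines takes up the slack.
    return [ZONE_HEIGHT - fine] + [ZONE_HEIGHT] * (rows - 1) + [fine]


def tile_rows_shown(height, page_adjust=0):
    """Tile rows (0 = top row of the tile) displayed by a zone, top scanline first.

    Assumption: the offset runs height-1 .. 0, and page (offset + page_adjust)
    holds tile row ZONE_HEIGHT-1-(offset + page_adjust).
    """
    return [ZONE_HEIGHT - 1 - (offset + page_adjust)
            for offset in range(height - 1, -1, -1)]


heights = zone_heights(3, 5)                    # [11, 16, 16, 5]
top = tile_rows_shown(heights[0])               # rows 5..15: correct for the top
bottom_raw = tile_rows_shown(heights[-1])       # rows 11..15: wrong end of the tile
bottom_fixed = tile_rows_shown(heights[-1], page_adjust=ZONE_HEIGHT - heights[-1])
```

Under those assumptions the shortened top zone naturally shows the lower rows of its tiles, which is what you want, while the shortened bottom zone shows the lower rows too unless its page is bumped by the missing line count.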

 

The problem with this technique is that when the screen gets really DMA heavy (lots of objects) around your last zone, Maria will starve the 6502 of execution time, and your interrupt code will take several scanlines to complete, leaving some or all of the last zone without the CHARBASE fix.

 

So I came up with an alternative technique, which I'll dub "DMA Masking". Instead of modifying the height of the last zone, which is ultimately what messes up the alignment, you leave it the same height, but creatively use Maria's DMA limits to mask off the bottom lines of the zone as desired. That is, you create 4 sprite objects, each 32 bytes wide, at the front of that last DL. Then you adjust the HI pointers of all 4 sprite objects so they point partway into a DMA hole, deep enough that Maria will skip these objects for N scanlines (allowing other objects to be drawn). After N scanlines your objects are rendered, causing Maria to run out of DMA time for any other objects in the zone.
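A small Python model of the pointer adjustment, as one reading of the description above rather than tested 7800 code. It assumes a 16-line zone with 16-line Holey DMA enabled, that a page counts as a hole when A15 and A12 are both set, and that the effective page on each scanline is the HI pointer plus an offset counting down from 15 to 0. The base page `0x80` and the function names are hypothetical.

```python
ZONE_HEIGHT = 16  # assumed zone height in scanlines


def in_hole(high_byte):
    """Assumed 16-line Holey DMA rule: a page is a hole when A15 and A12 are set."""
    return bool(high_byte & 0x80) and bool(high_byte & 0x10)


def mask_high_byte(skip_lines, base=0x80):
    """HI pointer for the masking sprites so the first `skip_lines` lines are skipped.

    `base` is a hypothetical page at the start of a non-hole 4K block.
    """
    if not 0 <= skip_lines <= ZONE_HEIGHT:
        raise ValueError("skip_lines must be 0..ZONE_HEIGHT")
    return base + skip_lines


def rendered_lines(high_byte):
    """Per-scanline flags, top of zone first: True where the mask sprites are fetched."""
    return [not in_hole((high_byte + offset) & 0xFF)
            for offset in range(ZONE_HEIGHT - 1, -1, -1)]


# Leave the top 5 lines of the bottom zone visible, mask the remaining 11.
flags = rendered_lines(mask_high_byte(5))
```

With `skip_lines` equal to the fine-scroll amount, the main loop only has to rewrite four HI bytes per frame, which fits the point that the update doesn't have to happen in an interrupt.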

 

There are a few disadvantages to using DMA Masking:

1. 4 extra objects are now in that last zone, which means this zone has some extra DMA penalty, even when Maria is skipping your 4 sprites. (i.e. there are 44 fewer Maria cycles to spare) So if your last zone is heavily loaded with objects, a few of them may not be displayed in the unmasked area of the last zone.

2. To make the DMA masking sprites invisible, you'll either need to dedicate some ROM to empty graphics, or waste a palette index by using it to draw your 4 sprites in the background color. (This is no longer a requirement, thanks to bsteux, if you use 160B/320B/320C mode for the 4 masking sprites and position them off-screen.)
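The 44-cycle figure in point 1 can be reproduced with a bit of arithmetic. This sketch assumes a 10-cycle cost for each 5-byte header (the figure used later in the thread) plus a per-object penalty for landing in the hole. The 1-cycle default is the cautious guess behind the 44; a later post in this thread reports measuring 3 cycles. `skipped_mask_overhead` is a made-up name.

```python
HEADER_CYCLES = 10  # assumed DMA cost of one 5-byte DL header
MASK_OBJECTS = 4    # masking sprites at the front of the last DL


def skipped_mask_overhead(hole_penalty=1):
    """Maria cycles spent per scanline on the masking sprites while they are skipped."""
    return MASK_OBJECTS * (HEADER_CYCLES + hole_penalty)


no_penalty = skipped_mask_overhead(0)  # headers only
cautious = skipped_mask_overhead()     # the 44 cycles quoted above
measured = skipped_mask_overhead(3)    # using the penalty reported later in the thread
```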

 

The advantages of DMA Masking:

1. Since the timing of adjusting your bottom zone masking isn't super critical, it doesn't need to happen in an interrupt. This means simplified interrupts.

2. It results in minimal glitching for that final zone, even when your DMA penalties are through the roof. 

 

I think the trade-offs are worth it for many game designs. I've been playing around with it, and it works great in emulation (for emulators that implement DMA limits) and real hardware.


This sounds great. Is this something that will find its way into 7800basic in the future, as a feature or as commands to manipulate it?

 

I was thinking about scrolling the other day and was going to ask. I figured for a coarse scroll I could "cheat" and use tiles and pokechar them in. Fine scrolling would be very neat indeed!

 

 

Edited by Muddyfunster

Yeah, scrolling is definitely on the "to do" list for 7800basic. Part of the challenge of doing scrolling in a generic way is that different game designs have different needs from the scroll interaction, and that changes what needs to happen on the back end. e.g. games that do infinite scroll and/or algorithmic terrain differ substantially from ones that just need to scroll over some ROM.

 

I'm trying to work it out - that's part of the reason I came up with DMA Masking - but there's also a bunch of other things I need to take on first.


There's even bigger gains for this approach if you're not using character mode and want to keep your background drawing generic.

 

Another method which favors low color modes (320A / 320B) is the following...

  1. Don't modify the height of the last visible DLL region in your playfield - as with the DMA Masking technique.
  2. Trigger a DLI on the entry before this, then WSYNC into your fine scroll offset.
  3. Zero all the palette registers.

 

You'll lose cycles in the opposite direction here, and there is some risk of jitter since Sally is responsible for making the change. But since there's only a few registers to change the noise isn't much worse than the status bar divide in Super Mario Bros 3 or Kirby's Adventure. This is one of the reasons I dropped a raster comparator into a mapper, which drops the requirement for the WSYNC step.


4 hours ago, TailChao said:

There's even bigger gains for this approach if you're not using character mode and want to keep your background drawing generic.

Ah, I hadn't even considered that, which is funny, because I've been moving toward sprites-as-tiles designs in the last year or so. Excellent observation, and great suggestion for an interrupt-based alternative to CHARBASE tweaking too.


DMA Masking is useful as an alternative to the usual vertical scrolling approach. It allows the developer to max out the number of moving objects on the screen, without fear of glitching out the bottom row. With the older method you'd need to either be more conservative with the number of moving objects, or risk corruption at the bottom.

 

As TailChao pointed out, DMA Masking also allows for scrolling with game designs that use sprites as tiles, which in turn allows the game to draw different tiles with different palettes, resulting in a more colorful game.


  • 2 weeks later...
On 5/3/2020 at 9:03 AM, TailChao said:

Another method which favors low color modes (320A / 320B) is the following...

  1. Don't modify the height of the last visible DLL region in your playfield - as with the DMA Masking technique.
  2. Trigger a DLI on the entry before this, then WSYNC into your fine scroll offset.
  3. Zero all the palette registers.

A variation on your interrupt method, that occurred to me today... instead of hitting the palette registers when you're at the desired scanline, you could write a zero-terminator over the first object in that bottom zone DL. (and then restore it later) That would work better with higher color modes.

Untested (I'm sticking with dma masking) but I don't see why it wouldn't work.
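For illustration only, here is a Python model of that zero-terminator patch on a display list held in a byte array. It uses simplified header rules as an assumption: a second header byte of zero ends the list, a second byte whose low five bits are zero marks a 5-byte header, and anything else is a 4-byte header. The sample bytes and `count_objects` are made up, and the choice of the second byte as the one to overwrite is my assumption, not something stated above.

```python
def count_objects(dl):
    """Count the objects Maria would walk in a display list (simplified rules)."""
    count = 0
    pos = 0
    while pos + 1 < len(dl) and dl[pos + 1] != 0:
        # Low five bits of the second byte zero: 5-byte header, else 4-byte.
        pos += 5 if dl[pos + 1] & 0x1F == 0 else 4
        count += 1
    return count


# Two 4-byte objects followed by the usual end-of-list marker.
display_list = bytearray([
    0x00, 0x1C, 0xA0, 0x10,  # low addr, palette/width, high addr, x position
    0x00, 0x3E, 0xA0, 0x40,
    0x00, 0x00,              # end of list
])

before = count_objects(display_list)
saved = display_list[1]
display_list[1] = 0          # in the interrupt: cut the list at the first object
masked = count_objects(display_list)
display_list[1] = saved      # later: restore before the zone is drawn again
restored = count_objects(display_list)
```

Only one byte changes in each direction, so the interrupt stays short, though like the palette method it still depends on the write landing on the intended scanline.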


10 hours ago, RevEng said:

A variation on your interrupt method, that occurred to me today... instead of hitting the palette registers when you're at the desired scanline, you could write a zero-terminator over the first object in that bottom zone DL. (and then restore it later) That would work better with higher color modes.

Untested (I'm sticking with dma masking) but I don't see why it wouldn't work.

It should be fine - as long as the write makes it in. Maria walks the Display List each active line.

 

Three options, now ;)


  • 3 years later...

Hi,

"DMA masking" is, I think, the idea I was looking for wrt vertical scrolling. At the moment, in cc7800, I'm using a black object to overlay the bottom of the screen, which is a similar idea but not as effective as yours. I've got 2 questions for you Atari 7800 experts (I'm just starting out on my side):

1 - You state that on the upper part of the bottom DLL the DMA cost is 44 cycles. Why not 10 * 4 = 40? Is there something I misunderstand? The idea is to overuse DMA on the bottom lines with 4 x 32 bytes. With four 5-byte DL headers, the actual DMA cost would be (10 + 3 * 32) * 4 = 424 cycles, which together with the DMA startup and shutdown times + 7 CPU delay cycles should cover the 454 DMA cycles available. On the part covered by Holey DMA, this should consume 40 cycles, no?

2 - What if we draw these 32 bytes outside the screen, at position 160? Do you know if Maria eats up DMA cycles in that case, or if Maria is clever enough to disengage off-screen writing and spare DMA cycles? (That would be nice in the general case, but not convenient in our case, since otherwise we would have to really draw something either transparent or in the background color ON screen.)

Best regards !

 


8 hours ago, bsteux said:

Hi,

"DMA masking" is, I think, the idea I was looking for wrt vertical scrolling. At the moment, in cc7800, I'm using a black object to overlay the bottom of the screen, which is a similar idea but not as effective as yours. I've got 2 questions for you Atari 7800 experts (I'm just starting out on my side):

1 - You state that on the upper part of the bottom DLL the DMA cost is 44 cycles. Why not 10 * 4 = 40? Is there something I misunderstand? The idea is to overuse DMA on the bottom lines with 4 x 32 bytes. With four 5-byte DL headers, the actual DMA cost would be (10 + 3 * 32) * 4 = 424 cycles, which together with the DMA startup and shutdown times + 7 CPU delay cycles should cover the 454 DMA cycles available. On the part covered by Holey DMA, this should consume 40 cycles, no?

2 - What if we draw these 32 bytes outside the screen, at position 160? Do you know if Maria eats up DMA cycles in that case, or if Maria is clever enough to disengage off-screen writing and spare DMA cycles? (That would be nice in the general case, but not convenient in our case, since otherwise we would have to really draw something either transparent or in the background color ON screen.)

Best regards !

 

1. It's been suggested in the past that an object's holey dma takes up 1 maria cycle, rather than none, for the calculation/abort. It's one of those things I should have verified a while back, but haven't done yet. I used the +1 cycle per holey graphic in my calculation, to err on the side of caution.

2. That should work - nice variation! Maria still "renders" objects that are outside the 0-159 range, with all the regular dma costs. It just doesn't actually update the scanline buffer with any out-of range pixels.
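For reference, the arithmetic from the question can be checked quickly. The constants are the figures quoted in the question (10 cycles per 5-byte header, 3 cycles per graphics byte, 454 cycles per line), taken here as assumptions and not measured. DMA startup/shutdown and the CPU delay cycles are not modelled, and the function names are made up.

```python
HEADER_CYCLES = 10   # per 5-byte DL header, figure from the question
CYCLES_PER_BYTE = 3  # per graphics byte fetched, figure from the question
LINE_CYCLES = 454    # Maria cycles per scanline, figure from the question


def active_mask_cost(objects=4, width_bytes=32):
    """DMA cycles used by the masking sprites on a line where they are rendered."""
    return objects * (HEADER_CYCLES + CYCLES_PER_BYTE * width_bytes)


def cycles_left(cost):
    """Cycles remaining on the line before startup/shutdown overhead is counted."""
    return LINE_CYCLES - cost


cost = active_mask_cost()
spare = cycles_left(cost)
```

The small remainder is what the startup, shutdown and CPU delay costs mentioned in the question would have to absorb for the line to be fully used up.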

 


Thanks for your answer. This is what I thought about Maria - straightforward design: off-screen uses the same logic as on-screen, but the line RAM is just not on chip for positions 160 to 255. This gives a good solution for vertical scrolling, since we can eat up DMA cycles for the last lines with any sprite data. No need to use precious holey DMA ROM for transparent or full data, and no need to eat up a precious palette entry either. I will implement this in my multisprites.h header in cc7800 and come back with the results.
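A quick way to see which modes keep a 32-byte object entirely off-screen is to work out the columns it covers. This sketch assumes the X position is counted in 160-mode pixels and wraps at 256, and it assumes the listed column counts per graphics byte for each mode. `visible_columns` and the table are illustrative only.

```python
# Assumed number of 160-mode columns covered by one graphics byte in each mode.
UNITS_PER_BYTE = {"160A": 4, "160B": 2, "320A": 4, "320B": 2, "320C": 2, "320D": 4}
VISIBLE = 160  # columns 0..159 are on screen


def visible_columns(x, width_bytes, mode):
    """On-screen columns an object would touch, assuming X wraps around at 256."""
    span = width_bytes * UNITS_PER_BYTE[mode]
    return [c for c in ((x + i) & 0xFF for i in range(span)) if c < VISIBLE]


hidden = visible_columns(160, 32, "160B")   # stays within columns 160..223
wrapped = visible_columns(160, 32, "160A")  # runs past 255 and wraps to the left edge
```

Under these assumptions a 32-byte object at position 160 only stays hidden in the modes that cover two columns per byte, which would explain the preference for 160B/320B/320C masking sprites.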

Wrt the existence of a 1 cycle DMA penalty, that would be surprising (but not excluded; a look at the schematics or an experimental check would be required): in Maria, the high address byte, on which the logic to continue the state machine for DMA access is based (a test on the A11 or A12 bit), is not the last byte of each DL, so there is plenty of time to add the OFFSET and implement the logic to decide whether to jump to the next DL or continue with DMA access... The logic is a simple adder, followed by simple combinatorial logic, which certainly runs in a single cycle. I don't see why there would be a need for an additional cycle, and it's also not mentioned in the GCC documentation.

Anyway, thank you for your answer and your expertise. We'll have a better vertical scrolling available soon...

Regards.


6 hours ago, bsteux said:

Wrt the existence of a 1 cycle DMA penalty, that would be surprising (but not excluded; a look at the schematics or an experimental check would be required): in Maria, the high address byte, on which the logic to continue the state machine for DMA access is based (a test on the A11 or A12 bit), is not the last byte of each DL, so there is plenty of time to add the OFFSET and implement the logic to decide whether to jump to the next DL or continue with DMA access...

I ran the test, and it's actually worse - you lose 3 cycles of DMA for sprites with graphics that are in the hole (in addition to the header DMA, of course). Additional info is over in the 7800 Hardware Facts thread.


16 minutes ago, RevEng said:

I ran the test, and it's actually worse - you lose 3 cycles of DMA for sprites with graphics that are in the hole (in addition to the header DMA, of course). Additional info is over in the 7800 Hardware Facts thread.

So sad... Unexpected but interesting... I'll have a look at the other thread. Let's keep this one for vertical scrolling innovations 🙂


  • 4 weeks later...

I have been helping with a 7800basic project that requires vertical scrolling and the idea of adding extra objects to the bottom row to use up all the Maria cycles seems to work very well.

 

In case it is of any use, attached is the source for a demo program making use of the technique.

scroll.zip

 

Edited to add another example, this time using tiled graphics:

basic.zip

 

Darryl drew the sprites, the background graphics are from: Omega Team | OpenGameArt.org


31 minutes ago, playsoft said:

I have been helping with a 7800basic project that requires vertical scrolling and the idea of adding extra objects to the bottom row to use up all the Maria cycles seems to work very well.

 

In case it is of any use, attached is the source for a demo program making use of the technique.

scroll.zip

Thanks for sharing this Paul, really interesting!

 

 


On 9/15/2023 at 1:24 PM, Muddyfunster said:

Thanks for sharing this Paul, really interesting!

 

 

Yes, it was a great idea and works really well in 7800basic. For the vertical scrolling project we have allowed for a variable number of palettes per row, but here it's just a single palette per row. If you used a single palette for the whole playfield then you could use plotmap and wouldn't need much in the way of assembly code at all.


  • 5 months later...

If I'm understanding this correctly, it works best/only if you are using sprites as tiles.  If you are actually using character mode (and have it anchored at the beginning of the DL so as to be under everything else), it becomes problematic and doesn't seem to work as expected... :(

I'm running into this now with something I'm working on.  It looks like I *have* to use the 'adjust the bottom zone' method as I am using 32 DLs that I am 'scrolling' through, and I need to use character mode to scroll over a landscape.  The DLL itself is 26 zones high for the visible screen, and I'm using 32 to wrap around so I can 'AND #$1F' the top zone when I add 1 and work my way down.
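The wrap-around indexing described here is the usual power-of-two ring. A tiny Python illustration, with the counts taken from the post and a made-up helper name:

```python
DL_COUNT = 32       # display lists kept in the ring
VISIBLE_ZONES = 26  # zones the DLL shows at once


def visible_dl_indices(top):
    """DL indices for the visible zones, top first, wrapping like AND #$1F."""
    return [(top + i) & (DL_COUNT - 1) for i in range(VISIBLE_ZONES)]


indices = visible_dl_indices(30)
bottom = indices[-1]  # whichever DL currently sits in the bottom zone
```

The point relevant to masking is that the DL in the bottom zone changes as the ring advances, so whatever special handling the bottom zone needs has to follow `bottom` around.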


It works with characters, but as you observed, the last zone needs 4 objects inserted before the character objects. It's true whether you use sprites or characters for the background. You just need to handle the bottom zone as a special case.

 

The easiest way to do that is to just add 20 bytes of pre-padding to the last zone DL, but have your tables with DL locations skip over that 20 bytes of pre-padding. Then your special case for the last DL is just to point the DLL at the pre-padding, which can be done in your DLL build routine.

 

Of course, that assumes you aren't otherwise running out of DMA on the line, even without the 4 objects.
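A sketch of that bookkeeping in Python, assuming the table of DL addresses points just past 20 reserved bytes in the buffer that ends up as the bottom zone. The addresses and `dll_pointers` are hypothetical.

```python
MASK_BYTES = 4 * 5  # four 5-byte mask headers of pre-padding


def dll_pointers(dl_table, bottom_index):
    """DL start addresses to store in the DLL, one per visible zone.

    `dl_table` holds each zone's normal DL start. For the bottom zone that
    address is assumed to sit right after the reserved mask headers.
    """
    pointers = list(dl_table)
    pointers[bottom_index] -= MASK_BYTES  # start at the pre-padding instead
    return pointers


table = [0x2214, 0x2300, 0x2414]  # hypothetical DL addresses
pointers = dll_pointers(table, bottom_index=2)
```

Everything that plots objects keeps using `table` unchanged; only the DLL build step sees the adjusted pointer for the bottom zone.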


I have to figure that out, because I have kind of a 'hybrid' DL build; the Character Mode is static, and the sprites are dynamically built.

 

Originally, I was *going* to put those 20 bytes in front of each line (I'm trying to get my 'loader' routine as fast as possible)... and make them all 1 byte in width and then just modify the width of the 4 objects in the last zone, but I think that would still take up too much Maria time.

 

If the 20 bytes are inserted in front of the last zone, I now have to remove them when they aren't in the last zone (because the character mode bytes are static)...

I don't know... I'll have to figure something out...

 

Thanks, RevEng! :) 

 

 

