7800 cycle counting


EricBall


I've done some cycle counting of the display list builder for SpaceWar! 7800 and the results aren't pretty:

 

103 cycles per display list (25 NTSC, 30 PAL)

403 cycles per player sprite, +191 for horizontal wrap around

248 cycles per non-player sprite, +82 for horizontal wrap around

50 cycles per sprite header vertical wrap around

200+ cycles of overhead

 

At 114 cycles per raster, just the display list builder is going to chew through all 62 lines of VBLANK.
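A rough budget check for the figures above, as a sketch only. The per-item costs are the ones listed in the post; the sprite counts passed in are hypothetical, since the post does not say how many sprites are on screen, and wrap-around penalties are left out.

```python
# Cycle costs quoted in the post.
CYCLES_PER_DISPLAY_LIST = 103
CYCLES_PER_PLAYER = 403
CYCLES_PER_SPRITE = 248
OVERHEAD = 200            # "200+ cycles", treated as a lower bound
CYCLES_PER_RASTER = 114
VBLANK_LINES = 62


def vblank_budget():
    """Total CPU cycles available during VBLANK."""
    return CYCLES_PER_RASTER * VBLANK_LINES


def builder_cost(display_lists, players, sprites):
    """Lower-bound builder cost with no wrap-around penalties."""
    return (display_lists * CYCLES_PER_DISPLAY_LIST
            + players * CYCLES_PER_PLAYER
            + sprites * CYCLES_PER_SPRITE
            + OVERHEAD)


def max_sprites(display_lists, players):
    """How many non-player sprites fit in whatever VBLANK time is left."""
    spare = vblank_budget() - builder_cost(display_lists, players, 0)
    return max(spare // CYCLES_PER_SPRITE, 0)
```

With the 25 NTSC display lists and an assumed two player sprites, `max_sprites(25, 2)` leaves room for 14 non-player sprites before any wrap-around cost or game logic is counted.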

 

So, what to do?

1. Optimize where possible, but otherwise ignore it and risk having the display stutter.

2. Alternate frames building display lists and game processing. I'd need to recalculate (or fudge) all of the gravity & velocity tables since it would be effectively 30fps instead of 60fps.

 

#2 sounds feasible; I'd just need to add the DLI wait routine after the display list builder.
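As a sketch of what recalculating the tables for option 2 would involve: if game updates happen half as often, each per-update velocity has to double and each per-update acceleration has to quadruple to keep roughly the same motion in real time. The function name and table layout below are made up for illustration and are not taken from the SpaceWar! 7800 source.

```python
def rescale_tables(velocities, accelerations, frame_skip=2):
    """Convert per-frame tables to per-update tables when the game
    logic runs once every `frame_skip` frames.

    Velocity is distance per update, so it scales linearly.
    Acceleration is distance per update squared, so it scales quadratically.
    """
    new_velocities = [v * frame_skip for v in velocities]
    new_accelerations = [a * frame_skip * frame_skip for a in accelerations]
    return new_velocities, new_accelerations
```

This only covers the scaling; the "fudge" part would be deciding how to round values that no longer fit the fixed-point format the tables use.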

 

Note: SpaceWar! 7800 doesn't do itself any favors when it comes to the display list builder:

1. 4 way scrolling tiled background

2. 4 way wrap around

3. player sprites are double height versus the zones

8 Comments



Hi there,

 

I'm not fluent in 7800 programming so bear with me.

 

Would you have enough time to alternate your display list during the kernel display? So while one section is being displayed you could build the other section before it's needed. IIRC Ms. Pac-man does this.

 

BTW, thank you for posting to [maria]; I had given up hope for that list.

Link to comment
Would you have enough time to alternate your display list during the kernel display? So while one section is being displayed you could build the other section before it's needed. IIRC Ms. Pac-man does this.

 

I've not yet tried 7800 programming, so maybe I'm way off base, but I'm not sure I understand what is so difficult about building display lists. I would think the code would be something like (assumes ten 16-line-high display lists, plus a pointer set up for a 'dummy' eleventh)

objectlp:
; Find first display list for object #x
 lda objecty,y
 lsr
 lsr
 lsr
 and #$1E ; Select one of 16 pointers (even offset into a table of 2-byte pointers)
 tax
 cpx #22
 bcs skipThisGuy ; Past the last usable list, so don't draw this object
; Compute value for first display-list-entry byte
 sta (dlptrs,x)
 inc dlptrs,x
 sta (dlptrs+2,x)
 inc dlptrs+2,x
; Compute value for second display-list-entry byte
 sta (dlptrs,x)
 inc dlptrs,x
 sta (dlptrs+2,x)
 inc dlptrs+2,x
; Compute value for third display-list entry byte (upper list)
 sta (dlptrs,x)
 inc dlptrs,x
; Compute value for third display-list entry byte (lower list)
 sta (dlptrs+2,x)
 inc dlptrs+2,x

skipThisGuy:
 dey
 bpl objectlp

I don't know exactly what's involved in computing those display list bytes, but I should think adding each object to the proper display lists should take a lot less than 400 cycles.
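For readers who don't read 6502, the index calculation at the top of that loop can be mirrored in a few lines. This is only a sketch of the shift-and-mask step; the function name is hypothetical and the default `limit` is simply the `cpx #22` bound from the snippet.

```python
def display_list_offset(y, limit=22):
    """Mirror of lsr/lsr/lsr/and #$1E followed by cpx #limit.

    Returns the byte offset into a table of 2-byte display list
    pointers (one pointer per 16 scan lines), or None when the
    snippet would branch to skipThisGuy.
    """
    offset = ((y & 0xFF) >> 3) & 0x1E
    if offset >= limit:
        return None
    return offset
```

Because the low bit is masked off after three shifts, every 16 scan lines map to the same even offset, which is what lets the offset index the pointer table directly.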

Link to comment

Would you have enough time to alternate your display list during the kernel display? So while one section is being displayed you could build the other section before it's needed. IIRC Ms. Pac-man does this.

I'm already starting to create the display list immediately after the visible screen has finished displaying. (Although that reminds me, I should update the DPP registers earlier since the display list builder might not be finished before DMA starts.) Double buffering the display lists would also increase the complexity of the display list builder and significantly cut into the number of sprites each display list could handle (since the RAM requirements would double).

 

The problem is simply the number of cycles it takes to create the display lists.

Link to comment

I don't know exactly what's involved in computing those display list bytes, but I should think adding each object to the proper display lists should take a lot less than 400 cycles.

See my sample 7800 source code post for something closer to the minimum number of cycles per sprite (186 cycles). But, as I pointed out, the SpaceWar! 7800 code has some additional requirements (i.e. wrap around) which increase the number of cycles per sprite. SpaceWar! 7800 also uses subroutines to reduce the space requirements, though maybe I need to re-think some of that.

 

Oh, now I get why you're using (ptr,x). Hmm... An interesting idea. Too bad it would require 2 bytes per display list, that's a lot of RAM. It also requires 12 cycles for sta (ptr,x) + inc ptr,x versus 7 cycles for sta (ptr),y + iny.

 

The 7800 also has two lists. The display list list (or DLL) has a pointer to a display list and the number of lines (1-16) to use that display list (and is basically static). Each display list is made up of a number of 4 or 5 byte sprite headers which contain a pointer to the graphics data or tile list, the width of the sprite (1-32 bytes), the horizontal position and the palette. The "kernel" of a 7800 game is the display list builder, which maps the sprite Y positions to display lists, then adds the sprite header to the lists.
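The two-level structure described above can be modelled roughly as follows. This is a simplified sketch, not the MARIA byte layout: the class and function names are hypothetical, the fields are only the ones named in the paragraph, and a sprite is added to a single zone even though a real builder has to handle sprites that straddle zones.

```python
from dataclasses import dataclass, field


@dataclass
class SpriteHeader:
    data_ptr: int   # address of graphics data or tile list
    width: int      # 1-32 bytes
    x: int          # horizontal position
    palette: int


@dataclass
class Zone:
    """One DLL entry: a display list used for `lines` scan lines."""
    lines: int                                   # 1-16
    headers: list = field(default_factory=list)  # the display list itself


def build_display_lists(zones, sprites):
    """sprites is an iterable of (y, SpriteHeader) pairs.

    Clears every display list, then appends each header to the zone
    whose scan-line range contains y. Off-screen sprites are dropped.
    """
    for zone in zones:
        zone.headers.clear()
    for y, header in sprites:
        top = 0
        for zone in zones:
            if top <= y < top + zone.lines:
                zone.headers.append(header)
                break
            top += zone.lines
    return zones
```

The DLL (the list of zones) stays fixed from frame to frame; only the per-zone header lists are rebuilt, which is the work being cycle counted in the post.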

Link to comment
Oh, now I get why you're using (ptr,x). Hmm... An interesting idea. Too bad it would require 2 bytes per display list, that's a lot of RAM. It also requires 12 cycles for sta (ptr,x) + inc ptr,x versus 7 cycles for sta (ptr),y + iny.

 

If you use 16-line display lists, twelve of them (192 scan lines) plus one dummy would total 26 bytes. That doesn't seem horrible.

 

As for taking "twelve cycles per byte instead of seven", I would think that the extra 40 cycles necessary to update a pair of consecutive display lists would be more than offset by the code saved elsewhere. Even if you could count on carry being clear beforehand:

 lda DLLow,x
 sta DLptr1
 adc #4
 sta DLLow,x
 lda DLLow+1,x
 sta DLptr2
 adc #4
 sta DLLow+1,x
 lda DLHigh,x
 sta DLptr1+1
 lda DLHigh+1,x
 sta DLptr2+1

Let's see... 4+3+2+5 +4+3+2+5 +4+3+4+3 is 42 cycles. The code later on will require:

 ldy #0; First time only
 sta (DLptr1),y
 sta (DLptr2),y
 iny; All but last time

6+6+2, four times, for 56 cycles. Total of 98. The (ind,x) approach is 12*8, i.e. 96.
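The tally above can be checked mechanically. This sketch just reproduces the comment's own arithmetic (including counting `iny` on all four passes and leaving out the one-time `ldy #0`); the constant and function names are made up for illustration.

```python
# Per-instruction cycle counts as used in the comment above.
SETUP = sum([4, 3, 2, 5,  4, 3, 2, 5,  4, 3, 4, 3])  # copy and advance two pointers
STA_IND_Y = 6   # sta (zp),y
INY = 2
STA_IND_X = 6   # sta (zp,x)
INC_ZP_X = 6    # inc zp,x


def indirect_y_cost(bytes_per_header=4):
    """Pointer setup, then two stores and one iny per header byte."""
    return SETUP + bytes_per_header * (2 * STA_IND_Y + INY)


def indirect_x_cost(bytes_per_header=4):
    """One store and one pointer increment per byte, for each of two lists."""
    return 2 * bytes_per_header * (STA_IND_X + INC_ZP_X)
```

For a 4-byte header written to two lists this gives 98 cycles for the `(zp),y` route and 96 for `(zp,x)`, matching the figures in the comment.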

 

Hmm... I guess (ind,x) doesn't save all that many cycles, but any extra cycles in the store operation are fully made up for by eliminating the setup. Still not sure where all your other cycles are coming from. With minimal extra code, vertical wrapping could be handled for an extra six cycles in the non-wrap case, or 18 cycles in the wrap case:

; After setting up X
 cpx #22; Need a CPX anyway
 bcc draw_ok; Would otherwise be a bcs skip_drawing, so we use one extra cycle
 bne skip_drawing
; We're drawing onto DLptr+22 and DLptr+24
 lda DLptr
 sta DLptr+22
draw_ok:
; ...
 cpx #22; Were we drawing the wrap case?
 bne no_wrap_recover
 lda DLptr+24
 sta DLptr
no_wrap_recover:

If you're more interested in worst-case time than normal-case, you can unconditionally copy DLptr to DLptr+22. That would cost an extra three cycles in the no-wrap case, and save three cycles in the wrap case.

Link to comment

If you use 16-line display lists, twelve of them (192 scan lines) plus one dummy would total 26 bytes. That doesn't seem horrible.

 

As for taking "twelve cycles per byte instead of seven", I would think that the extra 40 cycles necessary to update a pair of consecutive display lists would be more than offset by the code saved elsewhere.

SpaceWar! 7800 is based on 8 line zones (for the background tiles & shot sprites), so it would be 240/8*2 = 60 bytes for PAL. However, since ZP,X is the same number of cycles as ABS,X/Y the sprite info tables could be moved into main RAM without any time penalty (extra byte though).
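The RAM figures being compared here come from the same simple formula: two bytes of pointer per display list. A small sketch, with a hypothetical function name:

```python
def pointer_table_bytes(scan_lines, zone_height, dummy_lists=0):
    """Bytes needed for one 2-byte pointer per display list."""
    lists = scan_lines // zone_height + dummy_lists
    return lists * 2
```

`pointer_table_bytes(192, 16, dummy_lists=1)` gives the 26 bytes from the earlier comment, and `pointer_table_bytes(240, 8)` gives the 60 bytes quoted for PAL; the 25 NTSC display lists mentioned in the post would need 50 bytes.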

 

Anyway, I'm going to go back and rework the display list builder from scratch. Really focus on cycle counts and less on space. I'll see if (ZP,X) would be better.

Link to comment
I'll see if (ZP,X) would be better.

 

Well, since the people at MOS thought it was more worthwhile to document e.g. "sta (zp,x)" than e.g. "dcp zp", it might be nice to actually use the former once in a while. ;)

Link to comment
I'll see if (ZP,X) would be better.
Well, since the people at MOS thought it was more worthwhile to document e.g. "sta (zp,x)" than e.g. "dcp zp", it might be nice to actually use the former once in a while. ;)

 

I just converted my sample display list builder (no wrap, no background) to use (ZP,X) and it does require fewer cycles: 34 per DLL versus 39, and 161 per sprite versus 186. I'll modify the rest of the sample code and post it.

Link to comment