I had a few hours this morning, so I decided to go for the Jag's panacea: phrase blitting. That means using the Jag's full 64-bit bandwidth (8 bytes per transfer), which is in huge contrast to transferring just 1 byte at a time (pixel transfer).
If you recall, I'm generating only the left half of the road on the GPU, and creating the right half with the Blitter's mirroring feature (the X_SIGNSUB flag), which makes two Blitter calls per scanline.
Unfortunately, just as the docs said, and as I confirmed this morning, the Blitter isn't wired to reverse the bitmap in phrase mode, so the right half must stay in the slow pixel mode. A potential workaround would be to use the second scanline as a temporary buffer and reverse the bytes there for free (in parallel, while the Blitter is blitting the left half), but that's at least one page of code, and it doesn't look like I have that much space available in cache. It's obviously a very bad idea to swap that code in and out per scanline, so that option is out. I came up with another workaround, but it would break on curves, so it's not much use either.
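To make the byte-reversal workaround concrete, here is a minimal sketch of what mirroring would have to do in phrase units. It is plain Python rather than GPU code, it assumes 8-bit pixels and a half-scanline that is a whole number of phrases, and the name `mirror_scanline` is made up for illustration.

```python
PHRASE_BYTES = 8  # one phrase = 64 bits = 8 pixels at 8 bits per pixel (assumed depth)


def mirror_scanline(left_half: bytes) -> bytes:
    """Build the mirrored right half of a scanline, one phrase at a time."""
    if len(left_half) % PHRASE_BYTES != 0:
        raise ValueError("half-scanline must be a whole number of phrases")
    out = bytearray()
    # Walk the phrases from last to first and reverse the 8 pixels inside each,
    # so the result could be written out with phrase-wide transfers.
    for start in range(len(left_half) - PHRASE_BYTES, -1, -PHRASE_BYTES):
        phrase = left_half[start:start + PHRASE_BYTES]
        out += phrase[::-1]
    return bytes(out)


left = bytes(range(16))          # two phrases of dummy pixel values
right = mirror_scanline(left)    # same pixels in reverse order
```

Both the order of the phrases and the order of the bytes inside each phrase have to be flipped, which is the extra per-scanline work that would need to fit in cache.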
So, the final cost of the road rendering is 28% of frame time. Extrapolated, in two frames' time (30 fps) I can now texture (2 / 0.28 = 7.14) ~7x the number of texels, and in three frames' time (20 fps) it is (3 / 0.28 = 10.71) ~10x the number of texels.
Luckily, since we render 50% of the texels via phrase blit and 50% via pixel blit, that's actually a very realistic middle ground for fullscreen texturing.
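The extrapolation above is simple enough to check in a few lines. This is only a sketch of the arithmetic: the 28% figure is the measured road cost from the text, a 60 Hz display is assumed, and `texel_multiplier` is a made-up helper name.

```python
ROAD_COST = 0.28  # fraction of one vblank spent rendering the road (from the text)


def texel_multiplier(vblanks: int, cost: float = ROAD_COST) -> float:
    """How many times the road's texel count fits into `vblanks` frames."""
    return vblanks / cost


# Assumes a 60 Hz display, so 2 vblanks = 30 fps and 3 vblanks = 20 fps.
for vblanks, fps in ((2, 30), (3, 20)):
    print(f"{fps} fps: {texel_multiplier(vblanks):.2f}x the road's texels")
```

Rounding down to 7x and 10x keeps the estimate on the conservative side.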
What does it mean? Well, my texturing routine is finally fast enough to be used for a first-person-shooter engine. As I have the fillrate to do 10x the texels the road has at 20 fps, there's a huge buffer for overdraw (the situation where you redraw the same pixel multiple times, effectively wasting performance). Swapping between different textures will probably bring the factor of 10x down to something like 8x, but that's still a substantial buffer even for rooms that have pillars in the middle (i.e. severe overdraw).
The following is a quick, rough outline of how an ideal FPS engine on the Jag could be designed (obviously, with proper rearrangement, as there are multiple sync points):
First, we lock the framerate to 20 fps. This means we have 3 vblanks of GPU, 3 vblanks of DSP and 3 vblanks of 68000. How are we going to use them, then?
68000: On or Off?
- contrary to popular misconception, the 68000 can do a lot on the Jag
- this is the reason why I have been keeping the 68000 in a constant non-productive loop, busy 100% of the frame time (basically just taking bus bandwidth away from the Blitter) - I always knew I would want to use those 13.3 MHz later on, and wanted a realistic picture of how much bandwidth I have when the 68000 is banging on the bus nonstop
- the thing is, when you shut it off, it won't bring enough extra performance to the GPU to warrant the shut-off in the first place. The Blitter will be able to blit more, sure, but that won't cover the work the 68000 can do in a full frame time. That work would have to move to the GPU, so its code has to be blitted onto the GPU. Which means two 4 KB blits (blit the new code in first, then the old code back). During that time the GPU must be idle, so not only have you lost the performance of the 68000, but you are now also down 2 blits (about 7-10% of frame time). That alone kills any potential benefit of stopping the 68000, unless, of course, we're talking about some simple intro/demo that can run fully out of 4 KB without a single code blit.
- 3 frames' time effectively means 3 x 13.3 MHz = ~40 MHz per rendered frame: now, it would be just stupid not to use it...
- The key to using the 68000 effectively is to minimize memory access for variables and let it work out of registers as much as possible. Plus, unlike the GPU/DSP, it can work directly with 8-bit values without wasting 3 bytes, so we can stuff a lot of important data straight into its registers.
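To put the "3 x 13.3 MHz" figure into cycles, here is a rough budget sketch. The 13.3 MHz clock comes from the text; the 60 Hz vblank rate is an assumption, and the function names are made up for illustration. It ignores bus contention, so the real usable figure will be lower.

```python
M68K_HZ = 13_300_000  # 68000 clock, from the text
VBLANK_HZ = 60        # assumption: 60 Hz display


def cycles_per_engine_frame(vblanks: int) -> int:
    """Raw 68000 cycles available while one engine frame is being rendered."""
    return M68K_HZ * vblanks // VBLANK_HZ


def equivalent_mhz(vblanks: int) -> float:
    """Clock a 68000 would need to do the same work in a single vblank."""
    return M68K_HZ * vblanks / 1_000_000


print(cycles_per_engine_frame(1))  # cycles in one vblank
print(cycles_per_engine_frame(3))  # cycles in one 20 fps engine frame
print(equivalent_mhz(3))           # the "~40 MHz" from the list above
```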
Engine component breakdown:
68000: Frame 1: Input, Audio
68000: Frame 2: Scripting (doors, switches, ...), AI
68000: Frame 3: HUD (ammo/score/health), crude world culling (just prepare a list of big chunks for the DSP to process)
GPU: Frame 1-3: Texturing polygons (+clipping), swapping textures
DSP: Frame 1-3: Collision Detection, Visibility, world culling, Preparing list of polygons (for GPU to render)
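The breakdown above can be written down as a small lookup table. This is only an illustrative sketch: assigning the three per-frame rows to the 68000 is a reading of the breakdown rather than something it states outright, and `SCHEDULE` and `tasks_for` are hypothetical names.

```python
GPU_TASKS = ["Texturing polygons (+clipping)", "Swapping textures"]
DSP_TASKS = ["Collision detection", "Visibility", "World culling",
             "Preparing list of polygons for GPU"]

# One engine frame at 20 fps spans vblanks 1 to 3.
SCHEDULE = {
    # Assumption: the per-frame rows describe the 68000's work.
    "68000": {
        1: ["Input", "Audio"],
        2: ["Scripting", "AI"],
        3: ["HUD", "Crude world culling"],
    },
    # GPU and DSP run the same jobs across all three vblanks.
    "GPU": {vblank: GPU_TASKS for vblank in (1, 2, 3)},
    "DSP": {vblank: DSP_TASKS for vblank in (1, 2, 3)},
}


def tasks_for(vblank: int) -> dict:
    """Return what each processor is doing during the given vblank (1 to 3)."""
    return {cpu: frames[vblank] for cpu, frames in SCHEDULE.items()}


for vblank in (1, 2, 3):
    print(vblank, tasks_for(vblank)["68000"])
```

The table says nothing about the sync points mentioned earlier; those would still have to be arranged between the three processors.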
This is the distant future, of course, but maybe I'll get to it before the end of this year...