It's a bit hard to count the fps of that game in the video ... seems like 8-12 with some interleaved displayed objects.
Yeah, I usually figure it must be under 10 fps if you can recognize discrete frames and your eye has enough time to stay fixed on a frame for a bit.
But with charsets, there's really not much choice without requiring 128+ KB, unless the game engine can actually draw the scene into those charsets at loading time.
If it's possible to get up to 20 fps with the sheer "3D rendering", the game logic itself shouldn't drop the fps too much.
The reason why we can get close to 20 fps is that:
1. 0 cycles : ClearScreen
- we don't have to clear the framebuffer, as the sea and land cover 100% of the gameplay area
- that's quite a substantial portion of the first frame time (up to 75% actually, depending on how much RAM you want to sacrifice for unrolled clearing)
2. 3,200 cycles : Sea
- the sea polygons are by definition flat, which can be used to great performance advantage, e.g.:
- we don't have to engage in an expensive fixed-point scanline traversal involving edge computations, lots of tables and temporary arrays
- the sea starts at the left edge of the screen and ends at the middle of the screen (e.g. 20 Bytes), then with every other scanline, we store 1 Byte less
- there's about 40 scanlines of sea, so we need to store 40*20/2 = 400 Bytes on left side of screen and if visible, also 400 Bytes on right side of screen
- this can be done via completely unrolled code, no indexing, just the fastest 4-cycle absolute store (e.g. STA $3017), as we know the exact address of each pixel's byte
- thus the sea will take only 800*4 = 3,200 cycles and as an added bonus, it takes care of erasing the edges of the land polygons
- RAM costs would be 3 Bytes per STA, so 800*3 = 2,400 Bytes. Quite acceptable
- So far, we've spent only 3,200 cycles out of our budget of 60,000, and that already handles clearing, the sea, and about 40% of the 3D viewport (which looks to be around 58-64 scanlines)
3. 7,160 cycles : 3D Transform of vertices
- So, now we're left with rendering the land, holes, and characters (in that order, using the painter's sorting approach). There's about 15-20 quads for land and roughly 30-40 vertices.
- The transform stage of the pipeline will thus take 179*40 = 7,160 cycles (using the upper bound of 40 vertices), as I need 179 cycles to read, transform and store each vertex in a loop.
4. 13,782 cycles : Inner scanline quad-pixel fill
- let's go with the worst-case scenario : 66% of the viewport needs to be filled for land
- 0.66*160*64 = 6,758 pixels, that's 1,689 quads (4-pixels via STA)
- my last benchmark shows I can do 789 quads across multiple polygons, in 6,435 cycles, hence 6,435/789 = 8.16 cycles per quadpixel
- 8.16 * 1,689 = 13,782 cycles
5. 7,552 cycles: Scanline loop traversal
- 118 cycles per scanline
- we have 64 scanlines, thus 64*118 = 7,552 cycles
6. 1,920 cycles: Scanline Edge draw
- doing an OR with the framebuffer, for nice blending of colors across byte boundaries
- 15 cycles per edge
- 64*2 = 128 edges, thus 128*15 = 1,920 cycles
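The unrolled sea fill from step 2 can be sketched as a small generator script. This is only an illustration under stated assumptions: the framebuffer base of $3000, the 40-byte line stride (160 pixels at 4 pixels per byte), the starting scanline, and the exact shape (the span losing one byte every two scanlines) are all guesses that happen to land near the post's round figures, not the actual layout used.

```python
# Sketch: emit fully unrolled "STA abs" stores for the two sea triangles.
# FRAMEBUFFER, BYTES_PER_LINE and the span shape are assumptions for illustration.
FRAMEBUFFER = 0x3000      # hypothetical framebuffer base address
BYTES_PER_LINE = 40       # assumed: 160 pixels, 4 pixels per byte
STA_ABS_CYCLES = 4        # absolute-addressed STA on the 6502
STA_ABS_BYTES = 3         # opcode + 16-bit address


def sea_row_widths(lines=40, start_width=20):
    """Span width in bytes per scanline, shrinking by 1 byte every 2 lines."""
    return [start_width - (y // 2) for y in range(lines)]


def unrolled_sea(first_line=0, lines=40, start_width=20, both_sides=True):
    """Return the list of STA instructions covering the sea area."""
    addresses = []
    for y, width in enumerate(sea_row_widths(lines, start_width)):
        row = FRAMEBUFFER + (first_line + y) * BYTES_PER_LINE
        for x in range(width):
            addresses.append(row + x)                        # left-hand span
            if both_sides:
                addresses.append(row + BYTES_PER_LINE - 1 - x)  # mirrored span
    return [f"STA ${addr:04X}" for addr in addresses]


code = unrolled_sea()
cycles = len(code) * STA_ABS_CYCLES
ram = len(code) * STA_ABS_BYTES
print(f"{len(code)} stores, {cycles} cycles, {ram} bytes of code")
```

With these assumptions the script emits 840 stores (3,360 cycles, 2,520 bytes), within about 5% of the rounded 800 / 3,200 / 2,400 estimate above. The accumulator would need to hold the sea pattern before the first store.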
All together: 1,920 + 7,552 + 13,782 + 7,160 + 3,200 = 33,614 cycles out of the 60,000 budget, so we still have about 26,400 left for things like holes in the ground, characters and input, and still retain 20 fps.
Hell, it even looks like a simple scene could hold 30 fps (~40,000 cycles), especially if we have PMG-only characters. Not sure right now how many cycles it takes to draw transparent characters onto the background via CPU. That's probably going to be a lot - haven't looked into that yet. Maybe we could try a 3D penguin?
For sure on PAL, as 2 frames there result in roughly 50,000+ available cycles.
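The stage totals and frame-rate targets above can be double-checked with a few lines of Python. The per-stage numbers are taken straight from the list; the usable cycles per display frame are assumptions derived from the figures quoted here (60,000 cycles for 3 NTSC frames, and the lower bound of "50,000+" for 2 PAL frames), not measured values.

```python
import math

# Per-stage costs in CPU cycles, as quoted in the breakdown above.
STAGES = {
    "clear screen": 0,
    "sea": 800 * 4,
    "vertex transform": 179 * 40,
    "quad-pixel fill": round(8.16 * 1689),
    "scanline traversal": 118 * 64,
    "edge draw": 15 * 128,
}

# Assumed usable cycles per display frame, derived from the post's figures.
NTSC_PER_FRAME = 60_000 // 3   # 20,000
PAL_PER_FRAME = 50_000 // 2    # 25,000 (lower bound)


def headroom(total, frames, per_frame):
    """Cycles left over when rendering spans `frames` display frames."""
    return frames * per_frame - total


def best_fps(total, refresh_hz, per_frame):
    """Highest frame rate reachable if each render takes whole display frames."""
    frames = math.ceil(total / per_frame)
    return refresh_hz / frames


total = sum(STAGES.values())
print("render cost:", total)
print("headroom at 20 fps NTSC:", headroom(total, 3, NTSC_PER_FRAME))
print("headroom at 30 fps NTSC:", headroom(total, 2, NTSC_PER_FRAME))
print("headroom at 25 fps PAL: ", headroom(total, 2, PAL_PER_FRAME))
print("best NTSC fps:", best_fps(total, 60, NTSC_PER_FRAME))
```

Under these assumptions the 33,614-cycle scene fits in two NTSC frames with roughly 6,400 cycles to spare, which is why 30 fps only looks realistic for simple scenes with cheap (e.g. PMG-only) characters.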
Also, you can't avoid the differences in the amount of cycles it takes to render the frame. It's just the nature of the beast - with so many different pipeline stages, each having different performance characteristics, it's just the way it is.
But as long as the scene complexity - especially the scanline count (that's the one which accounts for the majority of the time taken) - stays roughly constant between frames, it shouldn't oscillate more than, say, ~20% from frame to frame. At least that's the number I got when testing the tunnel flythrough. This does look very similar.
I am, however, convinced that even larger frame drops would be acceptable, as the speed of movement is slower than in a car racer, plus you can literally only jump and go left/right.
Also, when using PMG, there is no need for WSYNC, just add the DLI bit to the range for coarse adjustments. That also shouldn't take too many CPU cycles.
But without WSYNC, there's going to be jitter. What kind of coarse adjustment to PMG can you do without WSYNC that won't be an ugly, jittery glitch?
We can easily do about 4 register changes per scanline during a DLI, correct? At the very least, I'd sacrifice a few scanlines' worth of CPU time to get:
1. Skybox colors
2. Fog in distance colors
3. Regular colors for 3D viewport
The above, at the cost of only 3 scanlines (~250-300 cycles per frame) will enhance the 4-color look substantially. Any additional color enhancement would cost us a lot more CPU time, as we'd need way too many DLIs after that point to get those additional colors noticed, I suppose.
The scene should be built fully in software, including the penguin, at least for the main "bytes". The borders could be filled with missiles. Same with other objects: draw the main part in the background and overlay the needed parts for some details and/or colors.
Or, we could go for 15 fps, but would be able to spend another 50,000 cycles on additional objects - e.g. buildings, snow plows, barrels, boxes, planes, ship on sea....
Of course, not all at the same time on one screen, it's 1.79 MHz after all.
That's the beauty of 3D graphics, the full control you have over the scene and adding/removing things, adjusting resolution, creating 3D art in notepad...