Road Rash pre-alpha on Jaguar at 30 fps

ZylonBane · May 15, 2017

Are you going to use procedural texturing?

Do you even know what that means?

Edited May 15, 2017 by ZylonBane

+CyranoJ · May 15, 2017

Do you even know what that means?

It's like in Yars Revenge where you point the bitmap at the code

JagChris · May 15, 2017

Do you even know what that means?

I sure as hell have no idea what Zylonbane means

Edited May 15, 2017 by JagChris

VladR · May 15, 2017

I sure as hell have no idea what Zylonbane means

He's not aware of the procedural texturing discussion for a Doom3-style engine on Jaguar that we had on jaguar64.eu about a year or so ago.

But, to answer your question, I applied the Doom3-style procedural texturing design to the texturing of RoadRash's buildings (about a year ago), and I have a detailed break-down of how to "compress" a 256x256 (64 KB) texture into 2 KB in cache via procedural texturing. Now, I could indeed go and implement it, though it is, without question, a complex endeavor.

But the landscape has just changed. The Blitter-GPU combo is indeed proving a bit more powerful for texturing than my initial benchmarking a few months ago hinted. The reason? Even on a slow 68000, I could do 95 phrase-aligned transfers of a 2 KB block per vblank.

So, even without this transfer being faster on GPU (which it will be), it takes ~1/100 of a vblank to transfer another texture. On the 68000 you were stuck, as the Blitter was hitting RAM like crazy and the CPU did not get much further. On the GPU, the code continued to run without a hitch (or so it seemed), so you can just move the sync point down the engine chain and use the time (it was (30-21) = 9/100, I believe; check scenario 6 in yesterday's post) for something else (e.g. lighting, or whatever else). You probably see now what the question is, right?

Can I create the 2 KB texture on GPU faster than in 1/100 of a vblank? Right now, I doubt it. Besides, even if I could, the underlying code won't hold anything else, so another paging would have to occur right after this stage, which might defeat the whole purpose in the first place.

So, more benchmark data is needed now.

Complexity-wise, it's much, much harder to implement procedural texturing on GPU than a simple brute-force for-loop:

for (int UniqueBuildingTextureID = 0; UniqueBuildingTextureID < MAX_TEXTURES_PER_FRAME; UniqueBuildingTextureID++)
{
  Copy_Via_Blitter (UniqueBuildingTextureID);      // bring this texture's 2 KB block into cache
  Render_All_Buildings (UniqueBuildingTextureID);  // draw every building that uses it
}

I'll gladly "sacrifice" 5% of vblank time to have 5 different textures for buildings at any given time if it saves me so much work (the total number of unique textures can be 200, for example; the limit is only on how many appear in the same frame).
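To put a number on that trade-off, here is a minimal C sketch of the budget arithmetic. It uses only the figure quoted above (about 95 transfers of a 2 KB block per vblank, measured on the 68000). The function name and the per-mille unit are made up for illustration, and the real cost on the GPU would need its own benchmark.

```c
#include <assert.h>

/* Figure quoted in the post: ~95 phrase-aligned 2 KB transfers per vblank (68000 run). */
#define TRANSFERS_PER_VBLANK 95

/* Hypothetical helper: cost of copying `textures` 2 KB blocks,
 * in thousandths of a vblank, rounded up. */
int texture_copy_cost_permille(int textures)
{
    assert(textures >= 0);
    return (textures * 1000 + TRANSFERS_PER_VBLANK - 1) / TRANSFERS_PER_VBLANK;
}
```

With these numbers, texture_copy_cost_permille(5) comes out at 53, i.e. roughly the 5% of a vblank mentioned above.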

VladR · May 15, 2017

Thinking about this some more reveals obvious applications of the latest optimizations even for 2D racing games:

- since it takes 30% of vblank to texture the road, I can raise the resolution to 512x200 and still keep 60 fps

- 1024x200 should still run at around 30 fps, theoretically (depending on OP's impact on the bus)

At 1024x200, the ObjectProcessor would finally be properly taxed (and I really have no clue, as to where its real limits are), but it would provide an interesting experiment - how many useful large transparent bitmaps (e.g. trees, cars) could it render over the framebuffer without glitching.

Of course, since OP would be hitting the bus ~3-4x more often, this would most probably slow down Blitter significantly, but I guess we won't know the exact impact unless we actually try.

Pre-scaled bitmaps of trees/props/cars at 1024x200 will however eat up the rest of 1.6 MB (0.4 MB is gone just for the two framebuffers) real fast. But if we used CD, each level could have completely different art assets.
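The memory figures can be checked with a small C sketch. It assumes 1 byte per pixel, which is the depth at which two 1024x200 framebuffers come to roughly 0.4 MB (the post does not state the depth). The pre-scaled sprite layout, with evenly spaced scale steps, is purely hypothetical.

```c
#include <assert.h>

#define JAGUAR_RAM_BYTES (2L * 1024 * 1024)  /* 2 MB of main RAM */

/* Bytes taken by `count` framebuffers of the given size and depth. */
long framebuffer_bytes(long width, long height, long bytes_per_pixel, long count)
{
    assert(width > 0 && height > 0 && bytes_per_pixel > 0 && count > 0);
    return width * height * bytes_per_pixel * count;
}

/* Hypothetical layout: `levels` pre-scaled copies of one sprite, where copy i
 * is scaled to (levels - i) / levels of the base size, at 1 byte per pixel. */
long prescaled_set_bytes(long base_w, long base_h, long levels)
{
    long total = 0;
    assert(base_w > 0 && base_h > 0 && levels > 0);
    for (long i = 0; i < levels; i++) {
        long w = base_w * (levels - i) / levels;
        long h = base_h * (levels - i) / levels;
        total += w * h;
    }
    return total;
}
```

Under these assumptions the two framebuffers take 409,600 bytes, leaving 1,687,552 bytes. A 128x128 sprite with 8 pre-scaled steps takes 52,224 bytes, so about 32 such sets would fit in what remains, before counting code and anything else.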

Something like Pole Position in 1024x200 - hell, yeah...

ZylonBane · May 16, 2017

Pre-scaled sprites? Like 8-bit game systems had to use? :_(

JagChris · May 16, 2017

At 1024x200, the ObjectProcessor would finally be properly taxed (and I really have no clue, as to where its real limits are), but it would provide an interesting experiment - how many useful large transparent bitmaps (e.g. trees, cars) could it render over the framebuffer without glitching.

The programmer of Super Burnout mentioned a trick where he split the screen into strips using the Object processor if I read it right. Would that technique help out here or no?

VladR · May 16, 2017

Pre-scaled sprites? Like 8-bit game systems had to use?

We've got 2 MB of RAM, so we might as well put it to good use (unless it's a 4 KB demo-contest entry). We want the maximum possible performance, so you use any cheat possible to get there.

Besides, having pre-scaled sprites means much higher visual quality, as the artist can manually retouch all scaled mipmaps, which especially for trees, can make a huge visual difference (compared to a scaling algorithm that will make a mess out of leaves).

If you put 2 choices in front of the player:

1. Play @30 fps at 1024x200 with pre-scaled sprites

2. Play @15 fps at 1024x200 with run-time scaled sprites

I don't think many players would voluntarily choose option 2 ("Yes, please! I desire half the framerate and don't want the game to use up the bloody memory! Thank you!")

Of course, the 30 fps@1024x200 is just a hypothetical target number right now, as I haven't had a chance to do a proper fillrate test yet (but, soon).

The options, though, are definitely not hypothetical. I'm building the engine in a way that will allow the player to choose the resolution, effects and vsync, so that he can customize his own combination of visuals vs framerate.

swapd0 · May 16, 2017

I prefer 320x200 with as many fps as possible.

VladR · May 16, 2017

I prefer 320x200 with as many fps as possible.

I hear ya, but why force an artificial limitation if you don't have to? Why have PC gamers always had the option to choose resolution/effects/details/vsync while console gamers didn't (or barely did)? If I force a combo of [resolution+details+vsync], you're basically screwed and left with whatever decision I as a developer made.

I can't say how many times I've seen people who don't even know what vsync is and aren't aware of tearing unless you post a screenshot of it (despite having played that game for dozens of hours). They could have vsync turned off and just enjoy a higher framerate, as they don't know / see / care anyway.

I personally prefer a locked 30 fps in racing games to [sometimes 60, sometimes 50, sometimes 35], as it's very frustrating when your framerate suddenly drops in the middle of a curve, just for those few precious frames. If the game offered a VSYNC option, you could actually enjoy the game more, as the framerate would be stable (as long as you're avoiding performance spikes during the design phase, of course).
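A locked framerate of this kind can be sketched as a small frame limiter in C. This is only an illustration: the function and its parameters are hypothetical, it assumes a 60 Hz display, and it says nothing about how the engine in this thread actually schedules its buffer flips.

```c
#include <assert.h>

/* Hypothetical frame limiter. `last_flip` is the vblank count at the previous
 * buffer flip, `ready` is the first vblank at which the new frame is complete,
 * `interval` is the lock (1 = 60 fps, 2 = 30 fps on a 60 Hz display).
 * Returns the vblank at which the new frame should be shown. */
int next_flip_vblank(int last_flip, int ready, int interval)
{
    assert(interval >= 1 && ready > last_flip);
    int target = last_flip + interval;
    return ready > target ? ready : target;
}
```

With interval set to 2, a frame that is ready after one vblank is still held until the second, so the game stays at a steady 30 fps instead of bouncing between 60 and 30. Only a frame that takes more than two vblanks breaks the lock.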

As I'm just finding out, it is indeed more work to make the OP+GPU+Blitter work in a resolution-independent way. I already had GPU-only rendering resolution-generic a long time ago, but throwing the Blitter into the mix last weekend revealed a few additional dependencies I missed in the initial implementation. They just happened to work in the default res, as those constants were copy-pasted from my old C code.

Getting there...

VladR · May 16, 2017

The programmer of Super Burnout mentioned a trick where he split the screen into strips using the Object processor if I read it right. Would that technique help out here or no?

For a 3D racing game (like Road Rash and NFS), it wouldn't help, as everything is computed into the framebuffer, with only bike/cars/props/trees as 2D bitmaps (which are handled by OP anyway) - meaning, there's very few bitmaps in the OP list - one for framebuffer, one for player, and 10-20 for the props/trees/stuff.

However, for a scanline-based 2D racer, especially in higher resolution, it actually might be worthwhile looking into. Right now, though, I'm rasterizing those 2D scanlines into the framebuffer anyway. So, as long as it works, I don't intend on breaking it.

I will keep this in mind however, and start thinking about it.

I'm probably going to hit OP-bandwidth limit in 1024x200, so may have to try it out soon.

VladR · May 16, 2017

OK, this morning I got it working at 512x200.

- still 60 fps

- takes 0.557 of a vblank, so almost half of vblank is still available on GPU

- if I don't wait for Blitter, GPU's really using only 0.375 of a vblank

- I just realized I could use this Blitter down time for something useful - e.g. 3D transformation of trees/props/cars bounding box (as that's the position that OP takes in OP List)

- Another use of this down time could be real-time shadows on the road. While I already designed a road shadowing algorithm a long time ago, it would require reading the lightmap from RAM, so that would kill the 60 fps (but this might be useful as a feature left up to the player, e.g. you want shadows? Alright, but performance will be affected).
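The value of that Blitter down time can be illustrated with a simple timing model in C. It takes the two measurements above (0.557 of a vblank including the wait, 0.375 without it, so about 0.182 spent waiting) and compares doing extra work after the sync point with doing it during the wait. The function names are hypothetical, and the model deliberately ignores bus contention, which is exactly what would hurt the lightmap case.

```c
#include <assert.h>

/* All times are in thousandths of a vblank. */

/* GPU waits for the Blitter first, then does the extra work. */
int frame_time_serial(int gpu_busy, int blitter_wait, int extra_work)
{
    assert(gpu_busy >= 0 && blitter_wait >= 0 && extra_work >= 0);
    return gpu_busy + blitter_wait + extra_work;
}

/* GPU does the extra work while the Blitter runs, then syncs. */
int frame_time_overlapped(int gpu_busy, int blitter_wait, int extra_work)
{
    assert(gpu_busy >= 0 && blitter_wait >= 0 && extra_work >= 0);
    int hidden = extra_work > blitter_wait ? extra_work : blitter_wait;
    return gpu_busy + hidden;
}
```

With gpu_busy = 375 and blitter_wait = 182, extra work worth 150 costs nothing when overlapped (the frame stays at 557) but pushes the serial version to 707. Anything above 182 starts to lengthen the frame either way.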

The video shows a few cycles of speeding up and slowing down. Obviously, as my capture device is a piece of shit, the capture is 30 fps.

I'm definitely going to try this in 1024x200. With a bit of luck (and tweaking), we might hit 60 fps. How? Doubling the pixel load means doubling 0.557 of a vblank, i.e. ~1.11, so it needs to come down by about a tenth to fit in one vblank. Of course, if we hit the system's bandwidth limits, it won't work, but hey, can't blame a guy for trying.
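As a rough check of that estimate, the following C sketch scales the measured load linearly with the line width and then rounds up to whole vblanks, which is what vsync does to the displayed framerate. The linear scaling is the assumption made in the post, not a measurement. The 60 Hz refresh and both function names are assumptions for illustration.

```c
#include <assert.h>

#define REFRESH_HZ 60  /* assumes a 60 Hz display */

/* Linear estimate: load (in thousandths of a vblank) scales with pixels per line. */
int scaled_load_permille(int load_permille, int old_width, int new_width)
{
    assert(load_permille >= 0 && old_width > 0 && new_width > 0);
    return load_permille * new_width / old_width;
}

/* With vsync, every frame occupies a whole number of vblanks. */
int vsynced_fps(int load_permille)
{
    assert(load_permille > 0);
    int vblanks = (load_permille + 999) / 1000;
    return REFRESH_HZ / vblanks;
}
```

Scaling 557 from 512 to 1024 pixels gives 1114, which lands in the second vblank, so a vsynced build would show 30 fps until the load is trimmed to 1000 or less.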

BTW, the 68000 is in a constant tight loop (hitting bus nonstop, 100% of frame time) here - so:

- GPU is reading a lot of variables from RAM - hitting the bus in outer loops

- Blitter is hitting the RAM all the time

- 68000 is running off the bus, so its tight loop is hitting the bus all the time too (it's just not doing anything useful, but that's another story)

I'll go and import some nice trees/shrubs so we can see how OP's going to cope with the higher resolution and overdraw (and especially transparent bitmaps of trees, as those are more expensive to render).

+Stephen · May 16, 2017

Love the videos - keep them coming!

DracIsBack · May 16, 2017

great experiments!

Edited May 16, 2017 by DracIsBack

DracIsBack · May 16, 2017

Indeed. Since VladR isn't promising anything here, there's absolutely no harm in him doing his experiments and posting the results. I've been following this thread just to see what he can achieve, and I've found it very interesting. So everyone can stop with the "WTF is going on here" posts, as it's adding nothing to the thread. If you don't like this thread, feel free to not post in it, simple as that.

^^^ This!!! ^^^

I'm loving the experiments.

VladR · May 16, 2017

Thanks! I'm very glad to see people are still curious about this attempt to see what kind of visuals are possible to extract from jag.

I always knew, just by looking at the frequency of the GPU and DSP, that there's an enormous computing potential sleeping there. It took me a few years to summon the mojo to get my hands dirty in RISC assembler, but I'm glad I finally did a few months ago.

Hopefully, these experiments will save a lot of R&D time for others who want to focus on a specific game (rather than a 3D engine), as they can just check the thread and see which techniques work on jag and which don't.

Eventually, I also want to release a game on jag, but I want it to push the HW to the max (the max that I can extract, that is). And I almost did that with H.E.R.O. 3D, as I finally reached 60 fps from 68000 (which did, indeed, push 68000 to its absolute limits with textured graphics), but then I started experimenting with GPU and now I can clearly see that a smooth framerate can be achieved even in higher resolutions - most probably even in 1024x200, but I'm working on generalizing the engine so that a player will be able to choose from the lowest resolution, up to the 1409x240 (with appropriately reduced framerate).

Also, please note that this is all just scratching the surface. We still haven't touched DSP, which basically gives us another GPU performance-wise (just with half bandwidth to the RAM).

+Sauron · May 16, 2017

Also, please note that this is all just scratching the surface. We still haven't touched DSP, which basically gives us another GPU performance-wise (just with half bandwidth to the RAM).

Much less than that once you start using a sound engine.

May I make a request? Start putting together a demo that would simulate all of the necessary pieces of a game (sprites, models, overlays, sound, etc.). Seeing the barebones demos is nice, but it still doesn't tell us much of anything until you start accounting for other in-game assets in some way.

+CyranoJ · May 16, 2017

...and look into the 68k STOP command

JagChris · May 17, 2017

...and look into the 68k STOP command

You know Vlad is against reading documentation.

ZylonBane · May 17, 2017

Besides, having pre-scaled sprites means much higher visual quality, as the artist can manually retouch all scaled mipmaps, which especially for trees, can make a huge visual difference (compared to a scaling algorithm that will make a mess out of leaves).

This is insanely wrong. Using pre-scaled sprites means that objects visibly snap between discrete sizes as they approach the player. This looks far worse than any transient aliasing of fine detail. The eye is far more attuned to smooth motion than resolving detail on a moving object.

And by consuming limited RAM with pre-scaled sprites, you're limiting the graphical variety of the game itself.
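The snapping effect can be seen in a minimal C sketch of how a pre-scaled sprite would typically be picked: the projection asks for some on-screen width, and the renderer takes the closest size that exists in RAM. The function and the size table are hypothetical, not taken from the engine discussed here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical: return the index of the pre-scaled width closest to the width
 * the projection asks for. Ties go to the smaller index. */
size_t nearest_prescaled(const int *widths, size_t count, int wanted)
{
    assert(widths != NULL && count > 0);
    size_t best = 0;
    int best_diff = widths[0] > wanted ? widths[0] - wanted : wanted - widths[0];
    for (size_t i = 1; i < count; i++) {
        int diff = widths[i] > wanted ? widths[i] - wanted : wanted - widths[i];
        if (diff < best_diff) {
            best = i;
            best_diff = diff;
        }
    }
    return best;
}
```

With sizes of 16, 32, 48 and 64 pixels, an object that should be 40 pixels wide is drawn at 32, and at 41 it jumps to 48. More intermediate sizes shrink that jump, at the price of more RAM per object.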

JagChris · May 17, 2017

For a 3D racing game (like Road Rash and NFS), it wouldn't help, as everything is computed into the framebuffer, with only bike/cars/props/trees as 2D bitmaps (which are handled by OP anyway) - meaning, there's very few bitmaps in the OP list - one for framebuffer, one for player, and 10-20 for the props/trees/stuff.

However, for a scanline-based 2D racer...

I thought Road Rash was a scanline based 2D racer with polygonal elements added?

Edited May 17, 2017 by JagChris

swapd0 · May 17, 2017

I got a question, if the OP has two internal buffers of 360 pixels, the max horizontal resolution must be 360 x 2 = 720 pixels, how can you get 1024 pixels?

GroovyBee · May 17, 2017

I got a question, if the OP has two internal buffers of 360 pixels, the max horizontal resolution must be 360 x 2 = 720 pixels, how can you get 1024 pixels?

The OP can be started twice per scan line. While filling one of its line buffers, the other one can be shifted out. The line buffers are 360 x 32bits. With 24bit RGB, each line buffer can only hold 360 pixels (effectively having one byte of padding per pixel). In colour depths that use the CLUT, CRY or 16bit RGB, it doubles to 720 pixels per line buffer. The tech manuals also state that a VMODE pixel divisor of 1 will produce a ridiculously high resolution for a TV so it will be ignored.
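The line buffer arithmetic described above can be written down as a short C sketch. It encodes only what this post states (360 words of 32 bits, one 24-bit pixel per word, two pixels per word at 16 bits and below). The function name is made up for illustration.

```c
#include <assert.h>

#define LINE_BUFFER_WORDS 360  /* each line buffer is 360 x 32 bits, per the post */

/* Pixels one line buffer can hold at a given depth, following the post:
 * 24-bit RGB uses a full 32-bit word per pixel, 16-bit and below pack two. */
int line_buffer_pixels(int bits_per_pixel)
{
    assert(bits_per_pixel > 0 && bits_per_pixel <= 24);
    return bits_per_pixel > 16 ? LINE_BUFFER_WORDS : LINE_BUFFER_WORDS * 2;
}
```

By this count a 1024-pixel line is wider than what one buffer holds at any depth (720 at most), which is why the question of how 1024 pixels can be reached comes up in the first place.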

Zerosquare · May 17, 2017

Nah, setting the divisor to 1 works as expected. Whether such a resolution is useful and the impact on bus bandwidth is another question.
