Road Rash pre-alpha on Jaguar at 30 fps

Road Rash GPU 30 fps

359 replies to this topic

#351 GroovyBee OFFLINE  

GroovyBee

    Games Developer

  • 9,736 posts
  • Busy bee!
  • Location:North, England

Posted Tue Jun 13, 2017 2:34 AM

- .long alignment on labels is occasionally ignored (still haven't figured out why - but it's clearly a bug in the tool), thus I had to come up with a workaround in code


Which assembler and linker is this? Can you provide a short example that demonstrates your problem?

#352 Zerosquare ONLINE  

Zerosquare

    River Patroller

  • 2,287 posts
  • Location:France

Posted Tue Jun 13, 2017 9:04 AM

Seeing this thread bump reminded me of a question I forgot to ask. Given that the 'theoretical' maximum resolution at which most people will be able to run Jaguar stuff on a TV or RGB monitor is 320 x 400/480 or 640 x 240 or even 640 x 400/480 *ish* (sorry - my memory about what CRTs / 15 kHz monitors can do is very hazy), other than for theoretical tests of what the Jag can push, what's the 'best' or most appropriate resolution to aim for 'in-game'? Other than memory and pixel shifting constraints, what would be the best resolution to target?

Best for what?
If you want maximum performance, use the lowest resolution (and the smallest number of colors).
If you want maximum visual quality, do the opposite.
If you're looking for something more-or-less balanced, roughly 320x240 in 16-bit (CRY or RGB) mode is probably the sweet spot, and that's what most Jag games use (and games for other consoles of that era are similar).
 

is there a chance that a game (with all associated 'AI' and music and sounds and objects) could run hi-res or interlaced?

LinkoVitch's Reactris runs in interlaced mode.

Edited by Zerosquare, Tue Jun 13, 2017 9:04 AM.


#353 VladR OFFLINE  

VladR

    Dragonstomper

  • Topic Starter
  • 685 posts
  • Location:New York

Posted Wed Jun 14, 2017 10:33 AM

Which assembler and linker is this? Can you provide a short example that demonstrates your problem?

I don't have access to the code right now, but it's usually something as simple as:
 

.long

MyJumpLabel:

 

 

During the build I print all symbols, so I can always check the address, and now that I have runtime hex output I can confirm that the addresses are only word-aligned.

 

This is not exclusive to the GPU section; it happens in the 68000 section too. It's been a source of great frustration, especially in the past, when I didn't have the debugging functionality I have now and didn't understand why adding two unrelated lines of code suddenly broke the stored values in memory (way down the line). Without the linker symbol printout, it took a while to figure out why it was printing values from different variables (especially if they were Byte, not word).
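A minimal sketch of the kind of check described above, assuming the symbol printout has been parsed into a name-to-address mapping. The helper names, symbol names and addresses are hypothetical and only illustrate the arithmetic; they are not taken from the actual build.

```python
def is_long_aligned(address):
    """True if the address sits on a 4-byte (long) boundary."""
    return address % 4 == 0


def align_up(address, alignment=4):
    """Round an address up to the next multiple of alignment (a power of two)."""
    return (address + alignment - 1) & ~(alignment - 1)


def misaligned_symbols(symbols):
    """Return the sorted names of symbols whose address is not long-aligned."""
    return sorted(name for name, addr in symbols.items()
                  if not is_long_aligned(addr))


# Hypothetical symbol table: the first label is only word-aligned.
symbols = {"MyJumpLabel": 0x00F03002, "OtherLabel": 0x00F03010}

for name in misaligned_symbols(symbols):
    addr = symbols[name]
    print(f"{name}: {addr:#08x} should be {align_up(addr):#08x}")
```

Running a check like this after every build would flag a word-aligned label immediately, instead of it showing up later as corrupted values.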

 

As for the versions, JagChris will probably want to kill me [rightfully so, might I add], but I'm still using the original, oldest versions. I don't like to break&mess with my build environment, and now that I know exactly what's going on, and have code workarounds in place finally, it's not a huge deal anyway...



#354 VladR OFFLINE  

VladR

    Dragonstomper

  • Topic Starter
  • 685 posts
  • Location:New York

Posted Wed Jun 14, 2017 11:34 AM

Seeing this thread bump reminded me of a question I forgot to ask. Given that the 'theoretical' maximum resolution at which most people will be able to run Jaguar stuff on a TV or RGB monitor is 320 x 400/480 or 640 x 240 or even 640 x 400/480 *ish* (sorry - my memory about what CRTs / 15 kHz monitors can do is very hazy), other than for theoretical tests of what the Jag can push, what's the 'best' or most appropriate resolution to aim for 'in-game'? Other than memory and pixel shifting constraints, what would be the best resolution to target?

From a 2D art asset production standpoint, non-square-pixel bitmaps are harder to draw. I don't think PaintShopPro has that capability; I think I've seen it in Photoshop, where you can define rectangular dimensions of a pixel. For example, 640x240 means a 2:1 ratio horizontally. But 512x240 means 1.6:1, which is counterintuitive.
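The ratios above can be reproduced with a couple of lines. This sketch assumes a 320-pixel-wide mode is the square-pixel reference, which is an assumption made for illustration; the function name is hypothetical.

```python
def horizontal_pixel_ratio(width, reference_width=320):
    """How many framebuffer pixels fit across one square reference pixel."""
    return width / reference_width


for width in (320, 512, 640):
    print(f"{width} wide -> {horizontal_pixel_ratio(width)}:1")
```

Art drawn with square pixels would have to be stretched horizontally by this factor to look right, which is simple at 2:1 and awkward at 1.6:1.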

 

From a 3D standpoint, it does not matter: when you write the engine generically enough, a higher resolution only results in sharper texels, not aspect ratio issues. This is where the higher resolutions make real sense, as they greatly benefit visual quality (less shimmering, more texture detail visible from a distance, and so on). Of course, with the obvious caveat: performance.

 

As for the interlaced resolutions (480 / 576 lines), I am of the opinion that they actually make sense now that almost everybody has an LCD, since LCDs can't reproduce flicker: they just show all scanlines, without flickering. I haven't seen it myself, but I've read plenty of threads where people described interlaced resolutions on LCD as perfectly crisp and flicker-free (obviously).

 

Today, for a 2D game, I would target a framebuffer of 768x480. 768, because that's the closest line width that the OP understands, and I'd just artificially reduce the gameplay area by 32 pixels on each edge. The OP can easily display a scanline of up to 720 pixels in one pass (usually 704 physically visible, at least on my end, based on the video register set-up).

 

 

 

All of these look great, but is there a chance that a game (with all associated 'AI' and music and sounds and objects) could run hi-res or interlaced? Even a maximum of 720 x 480 would be amazing. I recall being slightly disappointed that I never saw any Jag games run any action in hi-res / interlace mode, especially as Atari carped on about the theoretical high resolutions that it could do.

First of all, lots of games process music and run purely on 68000, without ever touching GPU/DSP, yet still get enough bandwidth to play some audio.

 

The Jag has absolutely enough bandwidth to play audio in hi-res. Just check my older transparency test, where I clear and display a 1536x200 framebuffer using the Blitter and then cover it fully with transparent tree bitmaps, meaning the OP has to process the 1536x200 a second time. And during all that time, an idle loop on the 68000 is constantly, for 100% of the frame time, banging on the bus (doing nothing, but slowing down the system very effectively nonetheless).

 

Transferring 1-2 KB of music data each frame to the DSP's 8 KB cache is practically nothing compared to the obnoxious amount of data that the Blitter/OP have to transport each frame in that resolution (~300 KB vs ~64 KB in 320x200).
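A quick back-of-the-envelope check of those sizes, assuming 1 byte per pixel (that is the depth at which 1536x200 comes out at ~300 KB; the depth is an assumption here, not something stated in the post). The function name is made up for the example.

```python
def framebuffer_bytes(width, height, bytes_per_pixel=1):
    """Size of one full framebuffer pass in bytes."""
    return width * height * bytes_per_pixel


hires = framebuffer_bytes(1536, 200)   # the transparency test resolution
lowres = framebuffer_bytes(320, 200)
music = 2 * 1024                       # upper end of the 1-2 KB estimate

print(hires / 1024)    # 300.0 KB
print(lowres / 1024)   # 62.5 KB
print(hires / music)   # 150.0, framebuffer pass vs. music data
```

Even at the upper 2 KB estimate, the music data is about 1/150th of a single pass over the hi-res framebuffer, and the test above touches that framebuffer more than once per frame.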

 

Blitter-free games, e.g. 2D platformers or top-down screen-based games where you only adjust the OP List of bitmaps (or use the Blitter only when switching screens), should, I'd argue, be able to play some simple MIDI even via the 68000 in that resolution, let alone from the DSP.

 

One of my desires for the Jag is getting something like Pitfall II running in PAL's 1382x576. That would look magnificently sharp even on a large LCD. No framebuffer here (it would be ~900 KB at 256 colors! Clearing that would take forever too, let alone filling it through the Blitter), just manipulating the OP List of all the 2D tiles and letting the OP do what it was designed for. Since rendering would be done entirely on the OP, the GPU/DSP/68000 are free to handle audio and update the OP List (which you have to do anyway, even if you have just 1 framebuffer bitmap there).

 

Not sure I'll get to it this year, though - still so much to experiment with...

 

This year, for sure, I want to try the H.E.R.O. 3D engine in 1382x576 (12x more pixels than 320x200). If I could fit all texturing there under 4 vblanks (i.e. 12.5 (PAL) / 15 (NTSC) fps), it'd still be quite smooth and playable, and the recent Blitter phrase texturing throughput tests I did in 1568x200 hint it actually should be within the remote realms of possibility  :)
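The numbers in the last few paragraphs can be checked with a short sketch. It assumes 1 byte per pixel for the 256-color case, and the 1536-byte line pitch in the last line is purely an assumption about how a ~900 KB figure could arise; the function names are hypothetical.

```python
def pixel_count(width, height):
    """Number of pixels in a width x height mode."""
    return width * height


def frames_per_second(refresh_hz, vblanks_per_frame):
    """Frame rate when one game frame spans a whole number of vblanks."""
    return refresh_hz / vblanks_per_frame


hires = pixel_count(1382, 576)
lowres = pixel_count(320, 200)

print(round(hires / lowres, 2))                               # 12.44x the pixels
print(frames_per_second(50, 4), frames_per_second(60, 4))     # 12.5 15.0
print(round(hires / 1024, 1))                                 # 777.4 KB at 1 byte/pixel
print(pixel_count(1536, 576) // 1024)                         # 864 KB with a 1536 pitch
```

At exactly 1382 bytes per line the framebuffer would be closer to ~780 KB; the larger estimate is roughly what you get if each line is padded out to 1536 bytes.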



#355 JagChris OFFLINE  

JagChris

    River Patroller

  • 3,167 posts
  • Location:Oregon

Posted Wed Jun 14, 2017 2:07 PM

As for the versions, JagChris will probably want to kill me [rightfully so, might I add], but I'm still using the original, oldest versions. I don't like to break&mess with my build environment, and now that I know exactly what's going on, and have code workarounds in place finally, it's not a huge deal anyway...

Well yeah I can totally see taking up GPU memory and cycles creating workarounds for problems that could probably be fixed by updating to the latest versions of rmac/rln.

That's not batshit crazy at all. 😶

Edited by JagChris, Wed Jun 14, 2017 2:09 PM.


#356 JagChris OFFLINE  

JagChris

    River Patroller

  • 3,167 posts
  • Location:Oregon

Posted Thu Jun 15, 2017 12:30 AM

Double post

Edited by JagChris, Thu Jun 15, 2017 12:30 AM.


#357 VladR OFFLINE  

VladR

    Dragonstomper

  • Topic Starter
  • 685 posts
  • Location:New York

Posted Tue Jun 20, 2017 11:39 AM

Well yeah I can totally see taking up GPU memory and cycles creating workarounds for problems that could probably be fixed by updating to the latest versions of rmac/rln.

That's not batshit crazy at all.

Actually, it's not, given the direction time flows in our universe. Any time spent from now on updating the tools (even if it's just 30 minutes) is time totally wasted, as:

1. it's not going to bring back the time I wasted on troubleshooting this issue in the last few months,

2. it won't save any future time (as I already have the workarounds in place).

 

In other words, if the tools were not buggy, then in the time I wasted over the last few months debugging what the hell was going on, I'd already have implemented this:

- road hills/curvature

- AI

- audio

- at least 3-4 additional renderpaths, experimenting with various rendering techniques (yes, there's still a lot to experiment with)

 

And that's a very conservative estimate, given the compound productivity during my high-focus weekends (the more features you initially implement, the more additional ones you manage to implement as a bonus, like a turbo - remember the LOD terrain?).



#358 VladR OFFLINE  

VladR

    Dragonstomper

  • Topic Starter
  • 685 posts
  • Location:New York

Posted Tue Jun 20, 2017 12:10 PM

Small update from this morning and yesterday:

- since I now have GPU chunks, I can split the 3D pipeline substages into separate chunks, leaving more cache for the code and data of each chunk, which I did today with road texturing

- there was a lot of idle downtime, when the GPU was waiting for the Blitter to finish copying the current scanline into the framebuffer

- I've finally implemented the scanline doublebuffering codepath, i.e. while the Blitter is blitting the current scanline, the GPU prepares the next scanline in parallel without waiting

- this reduced the wait time by exactly 50%, so there's still 50% of the Blitter wait time that I can use in future for additional effects (or features)

 

- what's interesting is that the road texturing time dropped to below 33% of frame time, meaning I can have triple the amount of road texturing and still keep 60 fps

- if we go for 30 fps, that's (3+3) = 6x the amount of texels occupied by the road

- if we're willing to drop to 20 fps, then we can have (3+3+3) = 9x the amount of texels occupied by the road. That's quite a substantial amount of texturing for a Jaguar, though not many genres play well at 20 fps (then again, lots of games play in the 15 fps range)

- but something like Legend Of Grimrock / Dungeon Master does not need 60 fps, and 15-20 fps is more than enough

- this is still just the slowest pixel transfer of the Blitter, not phrase transfer (which is obviously much faster)

 

- in practice, this means that if I refactor the building texturing to use the same approach (at the cost of lowres textures), it should be possible to run it together with the road at 60 fps

 

- since there's no way the road curvature/hills/AI/music/input/trees will eat a full vblank on the GPU, the worst-case scenario is 30 fps (i.e. 2 vbls) for all elements of the game

- and we still have 2 frames' worth of DSP time and 68000 time (both doing nothing right now, except for the 68000 hogging the bus)

 

- I'm getting more confident that on PAL, the city section of RoadRash, assuming phrase Blitter transfers, could actually run at a full 50 fps on the Jag (including music)

 

I'll now go and refactor the building rendering into a second GPU chunk (may take a few days), and integrate it with the road and trees, so that we'll get closer to something playable...
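The scanline doublebuffering gain mentioned in the list above can be modelled very roughly as follows. This is only a timing sketch: the function name is hypothetical and the cycle counts are made up so that the GPU work takes half as long as the blit, which is the case that yields a 50% reduction in wait time.

```python
def gpu_wait_per_scanline(gpu_time, blit_time, double_buffered):
    """Time the GPU spends idle per scanline, waiting on the Blitter.

    Single buffer: the GPU cannot touch the scanline until the blit is done,
    so it waits for the whole blit.
    Double buffer: the GPU prepares the next scanline in a second buffer
    while the blit runs, and only waits for whatever blit time is left over.
    """
    if not double_buffered:
        return blit_time
    return max(0, blit_time - gpu_time)


# Illustrative numbers only (arbitrary time units).
gpu_time, blit_time = 50, 100
before = gpu_wait_per_scanline(gpu_time, blit_time, double_buffered=False)
after = gpu_wait_per_scanline(gpu_time, blit_time, double_buffered=True)
print(before, after, after / before)   # 100 50 0.5
```

In this model the remaining wait only disappears once the GPU has as much work per scanline as the blit takes, which is why the leftover 50% is available for additional effects.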



#359 VladR OFFLINE  

VladR

    Dragonstomper

  • Topic Starter
  • 685 posts
  • Location:New York

Posted Wed Jun 21, 2017 1:22 PM

I had a few hours this morning, so I decided to go for the Jag's panacea: phrase-blitting, which means using the full 64-bit bandwidth of the Jag (8 Bytes per transfer), in huge contrast to transferring just 1 Byte at a time (pixel transfer).

 

If you recall, I'm generating only the left half of the road on the GPU, and create the right half using the mirroring feature of the Blitter (X_SIGNSUB flag), making two Blitter calls per scanline.

Unfortunately, just like the docs said, and as I discovered this morning, the Blitter wasn't wired to reverse the bitmap in phrase mode, and thus the right half must stay in the slow pixel mode. A potential workaround would be to use the second scanline buffer temporarily and reverse the bytes there for free (in parallel, while the Blitter is blitting the left half), but that's at least one page of code, and it does not look like I have that much space available in cache. It's obviously a very bad idea to swap that code in and out per scanline, so the right half will have to stay in the slow pixel mode. I came up with another workaround, but that would break with curves, so it's not much use either.
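For reference, the byte-reversal workaround amounts to the data transformation below. It is a sketch only: it assumes 1 byte per pixel and a road that is symmetric around the screen centre, and the function name is made up. On the real hardware this would be a GPU loop writing into the spare scanline buffer, not Python.

```python
def mirror_scanline(left_half):
    """Build a full scanline from its left half plus a byte-reversed copy.

    With the right half already reversed in memory, both halves could be
    copied with plain left-to-right transfers.
    """
    left = bytes(left_half)
    return left + left[::-1]


# Four texels of the left half; the right half comes out mirrored.
print(list(mirror_scanline([1, 2, 3, 4])))   # [1, 2, 3, 4, 4, 3, 2, 1]
```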

 

So, the final speed of the road rendering is 28% of frame time. Extrapolated, in two frames' time (30 fps), I can now texture (2 / 0.28 = 7.14) ~7x the number of texels, and in three frames' time (20 fps), it is (3 / 0.28 = 10.71) ~10x the number of texels.
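The extrapolation is just the number of vblanks divided by the fraction of one vblank the road currently takes. A minimal sketch, assuming the cost scales linearly with the number of texels (the same assumption the figures above make); the function name is hypothetical.

```python
def texel_budget(road_fraction, vblanks):
    """How many times the current road's texel count fits into `vblanks` frames."""
    return vblanks / road_fraction


for vblanks, fps in ((1, 60), (2, 30), (3, 20)):
    print(f"{fps} fps: {texel_budget(0.28, vblanks):.2f}x")
# 60 fps: 3.57x
# 30 fps: 7.14x
# 20 fps: 10.71x
```

With the earlier, slightly slower figure of roughly 33% of frame time, the same formula gives the 3x / 6x / 9x numbers from the previous post.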

Luckily, since we render 50% of texels via phrase blit and 50% via pixel blit, that's actually a very realistic middle ground for fullscreen texturing.

 

What does it mean? Well, my texturing routine is finally fast enough to be used for a first-person-shooter engine. As I have the fillrate to do 10x the texels (that the road has) at 20 fps, there's a huge buffer for overdraw (the situation when you redraw the same pixel multiple times, thus effectively wasting performance). Swapping different textures will probably bring the factor of 10x down to something like 8x, but that's still a substantial buffer, even for rooms that have pillars in the middle of the room (i.e. severe overdraw).

 

The following is a quick rough outline of how an ideal FPS engine on the Jag could be designed (obviously, with proper rearrangement, as there are multiple sync points):

 

First, we lock the framerate to 20 fps. This means we have 3 vblanks of GPU, 3 vblanks of DSP and 3 vblanks of 68000. How are we going to use them then ?

 

68000: On or Off ?

- contrary to popular misconception, the 68000 can do a lot on the Jag

- this is the reason why I was keeping the 68000 in a constant non-productive loop, busy 100% of the frame time (basically just wasting bandwidth of the Blitter) - I always knew I would want to use those 13.3 MHz later on, and wanted to have a realistic picture of how much bandwidth I have when the 68000 is nonstop banging on the bus

- the thing is, when you shut it off, it won't bring enough performance on the GPU to warrant the shut-off in the first place. The Blitter will be able to blit more, sure, but that won't make up for the work the 68000 can do in a full frame's time. The code doing that work would then have to be blitted into the GPU's cache instead, which means two 4 KB blits (blit the new code in first, then the old one back). During that time the GPU must be idle, so not only have you lost the performance of the 68000, but now you are down 2 blits (about 7-10% of frame time). That alone kills any potential benefit of stopping the 68000, unless, of course, we're talking about some simple intro/demo which can fully run out of 4 KB without a single code blit.

- 3 frames' time means 3 x 13.3 MHz = ~40 MHz: now that would be just stupid not to use...

- The key to using the 68000 effectively is to minimize memory access for variables and let it work off registers as much as possible. Plus, unlike the GPU/DSP, it can work directly with 8-bit values without wasting 3 bytes, so we can stuff a lot of important data into its registers.
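Put in terms of cycles rather than MHz, the 68000 budget per game frame looks like this. The sketch uses the 13.3 MHz figure from above and ignores bus contention entirely, so the real number of useful cycles will be lower; the names are hypothetical.

```python
M68K_CLOCK_HZ = 13_300_000   # 68000 clock as quoted above


def cycles_per_game_frame(clock_hz, refresh_hz, vblanks):
    """Raw CPU cycles available during one game frame spanning `vblanks` vblanks."""
    return clock_hz * vblanks / refresh_hz


print(cycles_per_game_frame(M68K_CLOCK_HZ, 60, 3))   # NTSC, 20 fps
print(cycles_per_game_frame(M68K_CLOCK_HZ, 50, 3))   # PAL, ~16.7 fps
```

That is roughly 665,000 raw cycles per 20 fps frame on NTSC, which is why keeping the work in registers matters: every memory access spends some of those cycles waiting for the bus.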

 

Engine component breakdown:

 

68000 :

Frame 1: Input, Audio

Frame 2: Scripting (Doors, switches, ...), AI

Frame 3: HUD (ammo/score/health), Crude World Culling (just prepare list of big chunks for DSP to process)

 

GPU: Frame 1-3: Texturing polygons (+clipping), swapping textures

 

DSP: Frame 1-3: Collision Detection, Visibility, world culling, Preparing list of polygons (for GPU to render)

 

This is a distant future, of course, but maybe I'll get to this before end of this year...



#360 walter_J64bit OFFLINE  

walter_J64bit

    River Patroller

  • 4,966 posts
  • Location:Goldsboro NC

Posted Wed Jun 21, 2017 2:22 PM

 What he said!





