
Road Rash pre-alpha on Jaguar at 30 fps


VladR


Welcome the Blitter

 

- Last night I created a compile-time codepath for the Blitter

- I can still switch back to SW rasterizing, which is important, as the SW codepath will be later moved to DSP for a multithreaded renderer

- This is just the 8-bit (pixel) transfer (not the infamous 64-bit one, at least not yet)

- The engine runs at rock solid vsync-locked 30 fps

- Without vsync, the framerate is on average (across 1,000 frames) 39.4 fps in 768x200

- Compared to the previous video, which was purely SW rasterizing (same level dataset, where the fps was 24.8), the Blitter is a clear improvement even in pixel mode: the old SW version reaches only 62.9% of the Blitter build's framerate, i.e. it is about 37% slower

- Blit Efficiency : The GPU is working with Blitter in parallel for about 35% of the Blit time currently (the rest - 65% of blit time - is literally wasted on waiting), so there's still quite a bit of downtime that could be reused for another functionality (e.g. transforming next triangles, ai, culling, ...)

- Blitter however saved quite a bit of cache (the SW scanline drawing was quite huge). This build takes just 2,866 Bytes in GPU cache, so there's a lot of space for other functionality and/or optimizations

 

 

This was a productive week. The progression since a week ago:

- 19.8 fps : This was the framerate of the first working version with 3D meshes

- 23.8 fps : First optimization made

- 28.7 fps : Yesterday's Loop Unrolling optimization

- 39.4 fps : Introduced Blitter
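
To put the steps above in perspective, here is a minimal sketch that turns the quoted framerates into relative speedups and frame times. Only the four fps figures come from the post; the helper names `speedup_percent` and `frame_time_ms` are made up for illustration.

```python
# Framerates quoted in the post (768x200, vsync off).
milestones = [
    ("first working version with 3D meshes", 19.8),
    ("first optimization", 23.8),
    ("loop unrolling", 28.7),
    ("Blitter (8-bit pixel mode)", 39.4),
]


def speedup_percent(old_fps, new_fps):
    """How much faster new_fps is than old_fps, in percent."""
    return (new_fps / old_fps - 1.0) * 100.0


def frame_time_ms(fps):
    """Average time spent on one frame, in milliseconds."""
    return 1000.0 / fps


# Print each step relative to the one before it.
for (_, prev_fps), (name, fps) in zip(milestones, milestones[1:]):
    gain = speedup_percent(prev_fps, fps)
    print(f"{name}: {fps} fps ({frame_time_ms(fps):.1f} ms/frame, +{gain:.1f}%)")

# Whole week: first version versus the Blitter build.
total = speedup_percent(milestones[0][1], milestones[-1][1])
print(f"total gain over the week: +{total:.1f}%")
```

Comparing frame times rather than fps makes it easier to see how much of the budget each optimization actually freed up.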

 

I think I'm sort-of alright with the performance of the single thread right now (I still have a few things on my todo list, each bringing 4-8%, but that's for later) and will consider the current version good enough to start moving to DSP for my first-generation multithreaded renderer.

 

This should get me past 60 fps at 768x200.

 

BTW, 1536x200 runs at 24.5 fps, so jag is entirely fast enough to run something like this in 30 fps in Hires when properly multithreaded.

 

But first I should debug why the left polygons are culled early. Yesterday's massive space scene showed that clipping worked just fine on all those 3D ships, but not here. Strange.


VladR, with all of the experimentation you are doing, I have 2 requests:

 

Request A - Actually turn this bad boy into stunrunner! (looks like it could work!)

 

Request B - Make me a lightgun for the jag that will work on modern tvs

 

That's all for now. Keep up the good work, keep experimenting! ;)


https://youtu.be/XFMeAKkgf_g

Zaxxon 2000?


Request A - Actually turn this bad boy into stunrunner! (looks like it could work!)

Yes, it absolutely can work, although Stunrunner is actually not exactly an easy target, given how closely the arcade HW mirrors the jaguar's HW (except for frequencies) - especially in a 3D flatshading environment (where the jag's OP performance is basically useless, as all it does is display the framebuffer):

 

Stunrunner HW had (compared to jag):

- TMS34010 for polygon rendering (vs jag's GPU) - renders environment

- ADSP2100 (vs jag's DSP) - renders vehicles

- 68010 (vs jag's 68000) - handles culling and AI

- MOS 6502 (this is an additional source of power, even if small, but it could certainly handle input & audio)

- Yamaha YM-2151 for sound (another chip)

 

The TMS34010 also had a scanline blit instruction, which is really all we need from the Blitter for flatshading purposes. Having an efficient blit on the GPU means in reality a few pages of code (with all the 8/16/32-bit alignment of the start/end of the scanline), so if the jag's Blitter was limited to just this functionality (and 99% of the rest was removed), flatshading performance wouldn't suffer anyway.

 

Now, we don't really know how well multithreaded the StunRunner engine really is, but it sure has a lot of potential with HW blitting, 2 RISCs and 2 CISCs (+Yamaha) in the box.

 

Zaxxon 2000?

For the longest time, I was sure this would be a 2.5D isometric game rasterized via the Blitter (this game is in the top 5 of games that I would like to see remade on the jag, with extensive tech notes written on how the engine would work there).

 

But my recent test with the high-poly scene (those ~1,100 triangles) gives me a great deal of confidence that this could actually run flatshaded in 3D.

- Of course, we'd keep the camera angle the same (no firstperson nonsense, perhaps maybe other than for an experimental separate stage in a different game mode) to keep the iconic gameplay intact

- the problem with polycount is that all those on-screen objects (radars, turrets, walls, aircraft, barricades, fuel tanks, floor markings...) add up real fast, and the overdraw is actually significant

- but with multithreading on DSP, I believe this could run at 30 fps at 768x200

 

- it would look absolutely stunning in HiRes, yet retain what made Zaxxon classic

- of course, this being 3D means that we could procedurally generate levels at run-time. It's basically LEGO: you just place objects along a straight line. The engine is generic and works with pointers to objects, regardless of how complex the objects are, so it only takes 1 Byte to store an object in the map and another 6 to store its 3D position. We could therefore have levels that would take 15 minutes (or more) to finish. A classic Zaxxon level shouldn't need more than 1 KB of data in 3D (unlike 2D, which wastes all RAM real fast)

- EDIT: in 3D flatshading you gain the opportunity to adjust the color scheme for free (just change the color look up table values), so really each level could have a different "look&feel" using different colors - you could have darker/brighter sections of levels and do all kinds of visual effects that are utterly impossible to do with bitmaps (where all the lighting is baked in forever to the bitmaps)

- Sharp realtime Shadows : Projecting object's vertices on the floor gives an option for real-time shadows (Doom3-style pitch black, but this would be worth experimenting)
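
The map format described above (1 Byte per object plus 6 Bytes for its 3D position) can be sketched roughly as follows. This is only an illustration: the split of the 6 position Bytes into three signed 16-bit coordinates, the big-endian layout, and the names `MAP_ENTRY`, `pack_level` and `unpack_level` are assumptions, not something stated in the post.

```python
import struct

# Assumed layout: 1-byte object id + x, y, z as signed 16-bit values,
# big-endian, no padding. That gives 1 + 6 = 7 bytes per placed object.
MAP_ENTRY = struct.Struct(">Bhhh")


def pack_level(entries):
    """entries: iterable of (object_id, x, y, z) tuples -> bytes blob."""
    return b"".join(MAP_ENTRY.pack(obj_id, x, y, z) for obj_id, x, y, z in entries)


def unpack_level(blob):
    """Inverse of pack_level: bytes blob -> list of (object_id, x, y, z)."""
    return [
        MAP_ENTRY.unpack_from(blob, offset)
        for offset in range(0, len(blob), MAP_ENTRY.size)
    ]


# Hypothetical level fragment: object ids index a table of mesh pointers.
level = [(0, -120, 0, 500), (3, 80, 0, 900), (7, 0, 40, 1400)]
blob = pack_level(level)
print(len(blob), "bytes for", len(level), "objects")
print("objects that fit in 1 KB:", 1024 // MAP_ENTRY.size)
```

With 7 Bytes per entry, a 1 KB budget holds well over a hundred placed objects, which is in line with the claim that a classic level needs very little map data.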


  • 2 weeks later...

Looks like I was wrong, and I don't actually need the DSP to cross 60 fps at 768x200. After my latest optimizations, I can now run track+player at exactly 60 fps, so Time Trial (with level-of-detail, which is still missing but will provide a performance buffer) can comfortably run at 60 fps.

 

Features:

- Acceleration/Deceleration via joystick (as you can see, the speed varies through the video, as I control it)

- Hills+Valleys

- Simple skybox to give a visual reference point for hills

- Collision Detection to keep player within the track

- Smooth camera interpolation for the camera offset above the track (the abrupt changes are due to the track's non-smooth data at the top of the hill)

- Fixed the 2D clipping glitch of previous builds that caused left and bottom polys to be clipped too early

 

- The dataset is now finally properly 2D clipped. However, given the 3D frustum culling conditions, the track data needs to be manually adjusted so that culling does not kick in as early as it does now (the empty screen space at the bottom that shows up every few frames is frustum culling, NOT clipping)

 

I won't recreate the level dataset manually in code again (notepad "3d modelling" sucks) to be compatible with frustum culling - I'd rather spend the effort on getting my tools ready so I can start creating art assets in 3dsmax (which is easily 2 orders of magnitude faster than in notepad).

 

Optimizations resulting in much higher framerates at 768x200:

- 81 fps : Track (quite a lot of performance buffer for the 60-fps lock)

- 60 fps: Track+Player (no performance buffer yet, but getting there)

- 52 fps: Track+Player+Enemy (action gameplay will be locked to vsync 30 fps, with a lot of performance buffer to never fall under 30 fps)
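
The relationship between the unlocked framerates above and the vsync-locked ones can be sketched like this. It assumes a 60 Hz display and a renderer that always waits for the next vblank before flipping buffers; the function name `vsync_locked_fps` is just for illustration.

```python
import math


def vsync_locked_fps(raw_fps, refresh_hz=60):
    """Framerate actually shown when every frame must wait for a vblank.

    A frame that takes longer than one refresh period occupies a whole
    number of vblanks, so the visible rate is refresh_hz / n.
    """
    vblanks_per_frame = max(1, math.ceil(refresh_hz / raw_fps))
    return refresh_hz / vblanks_per_frame


# Unlocked figures quoted in the thread.
for label, raw in [("Track", 81), ("Track+Player", 60),
                   ("Track+Player+Enemy", 52), ("1536x200", 24.5)]:
    print(f"{label}: {raw} fps unlocked -> {vsync_locked_fps(raw):.0f} fps locked")
```

This is why 52 fps unlocked still means a 30 fps lock: anything below 60 falls to the next divisor of the refresh rate, and the gap between 52 and 30 is the performance buffer.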

 

 


- I believe I found my niche area (Flatshading in HiRes), where I can both show jag's strengths in 3D and use my current 3D engine for actual games

- A lot of games in the past ran at such a terrible framerate that enjoying them was impossible

- I absolutely intend on keeping the framerate lock (e.g. 30 fps) to avoid the disgusting framedrops that plague so many games

- While I could have used a much older build to create the actual games, it would have been too prohibitive in past

- And the last thing jaguar needs is another non-smooth 3D game (jag has had its fair share of those, I believe!)

 

- There are so many early-3D games completely missing on the jag that there's no shortage of ideas: from racing, through space games, all the way up to top-down 3D action games

- Obviously, I can't make an actual clone (e.g. actual StunRunner), as I do not have the license, so they can only be inspired by those games

- But, nobody can trademark "action racing", "sci-fi racing", "space shooter" so as long as my art assets are different, it's safe

 

- Since all current art assets have been created in Notepad, which is incredibly timeconsuming compared to 3dsmax, I will spend next month updating my PC-based 3d engine to create a simple VM/Emulator of jag, so that I can keep iterating between 3dsmax and 3D engine as fast as possible. Once the art assets have been authored and confirmed to look OK under my own jag "gfx emulator", I will proceed to importing them into jag's codebase.

 

- This is based on the workflow I had in past while I was working on my three PC-based 3D games, so I know from experience this set-up saves a lot of time

 

- As of now, I am leaning towards something like StunRunner / WipeOut, but there's still some time left before I pull the trigger and start mass-producing the art assets



This looks very promising. I am a huge fan of hi-res flat shading in games. Are you familiar with a fighting game for the original PS1 called Tobal #1? 640X480@60fps, back in late 96.


Yes, I believe you showed this game to me either in this or some other thread. These are my thoughts on how a 3D fighting game could work on jag:

 

- We absolutely need a multithreaded engine

- Player1 : GPU + Blitter

- Player2 : DSP (just SW rasterizing)

- Forward/Inverse Kinematics : This would slow things down considerably plus as a bonus make the code very complex and impossible to fit within 4 KB, so I'd rather go with Tweening (interpolation), which has almost zero cost per vertex, just very high storage costs (each animation frame has to be stored) - I used this a lot on PC in past

- We wouldn't brute-force store the duplicate data, just the deltas, so Index Buffer & Colors would be stored just once per character (2 KB)

- Let's say we use 200 vertices (~300-400 triangles) per character, one frame is thus 200 [vertices]*2[bytes per coord]*3 [x,y,z] = 1,200 Bytes

- For highest quality, each animation needs a full uncompressed start frame and end frame -> 2*1,200 = 2,400 Bytes

- Each in-between frame needs only deltas, so 1 byte precision is enough from my past experience, hence 200*3 = 600 Bytes

- If each animation has 10 frames, 2 of which are uncompressed, we need 8 frames with deltas, so total: 2*1,200 + 8*600 = 7,200 Bytes = ~7 KB

- Of course, those 10 frames can be interpolated into 20 or 100 real frames (e.g. slow-mo replay cam) - that's the beauty of Tweening

- With 14-15 different moves, we need 15*7 = ~105 KB for all animations per character

- For two characters, that's about 220 KB in total, leaving us with a lot of memory for either more frames or more special moves, or we could store the pre-shaded polygon colors per each frame

 

- It should be possible, without spending huge engineering effort, to get this running at 2-3 vblanks per frame, i.e. between 60/3 = 20 fps and 60/2 = 30 fps. Expect a drop of one full frame whenever collision detection has to run because the characters touch.

- Of course, Tobal's characters easily have 2x-3x the vertex count of our 200 vertices, so that's impossible to match without a huge reduction in framerate
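
The storage estimate and the tweening idea from the list above can be checked with a small sketch. The vertex count, Bytes per coordinate, frame counts and the 2 KB of shared index/colour data are the figures from the post; the integer interpolation in `tween` and all function names are assumptions made for illustration, not the actual engine code.

```python
VERTICES = 200
KEY_FRAME_BYTES = VERTICES * 3 * 2    # x, y, z at 2 bytes each = 1,200
DELTA_FRAME_BYTES = VERTICES * 3 * 1  # x, y, z deltas at 1 byte each = 600


def animation_bytes(frames=10, key_frames=2):
    """Storage for one animation: full key frames plus delta frames."""
    return key_frames * KEY_FRAME_BYTES + (frames - key_frames) * DELTA_FRAME_BYTES


def character_bytes(moves=15, shared_bytes=2 * 1024):
    """All animations of one character plus index buffer and colours."""
    return moves * animation_bytes() + shared_bytes


def tween(frame_a, frame_b, step, steps):
    """Linear interpolation between two stored frames using integers only.

    frame_a / frame_b are flat lists of coordinates; step runs 0..steps.
    """
    return [a + (b - a) * step // steps for a, b in zip(frame_a, frame_b)]


print("one animation:", animation_bytes(), "bytes")
print("one character:", character_bytes(), "bytes")
print("two characters:", 2 * character_bytes(), "bytes")
print("halfway pose:", tween([0, 10, -40], [10, 0, 40], 1, 2))
```

Two characters come out at roughly 220,000 Bytes, which agrees with the "about 220 KB" figure. Because `tween` takes an arbitrary `steps` value, the same 10 stored frames can be stretched to 20 or 100 displayed frames.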

 

 

- When I'm going to have 3D animated characters, I'll most probably go for something like Alien Breed 3D on PS3 - e.g. close-to-top-down 3D camera. Not a fighting game, I feel. Never was a huge fan of them, aside from Mortal Kombat...



Wasn't really asking for a fighting game. I'd much prefer a racer/shooter. That game though, was always my 1st "holy shit" 3D moment. When I got to experience the 3DFX Voodoo and VooDoo2 cards, what a great time.

 

Very cool to see the jag can handle 3D, if not made to texture map. That always came up in old interviews, nice to see it also proved in practice.


What sort of shape are you planning to draw there? A triangle requires 3 vertices, in the most efficient use of vertices the 1st triangle would take 3, and each additional triangle takes one additional vertex.. not terribly exciting, but 200 - initial 3 gives 197.. so at most efficient you are looking at 200 vertices = ~ 198 triangles.. making a simple object. Having a list of vertices is great and all, and you can use clockwise/anticlockwise to determine inner/outer-faces etc.. but you are also going to need to store a list of triangles, as unless you are building a very simplistic linear model how do you know which vertices form what part of which triangle?

 

You have also forgotten face colour parameters :)

 

All of this aside, I don't think the biggest problem with creating a 3D game is back-of-the-envelope memory usage calculations. You probably need to work out more how many triangles you can handle, transform and cull, how many pixels you can paint, and what bus contention and available memory bandwidth allow. If you are going to continue with your memory calcs, don't forget to include the cost of the frame buffer, times 1, 2 or 3 buffers (64KB for 8bit, 128KB for 16bit, which gets you fancy shading too without need for a CLUT), as well as any 2D assets for hud/overlay. You probably want to chop off 200-300KB at least for audio.
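
For anyone who wants to redo the frame buffer part of that memory budget, here is a rough sketch. The 64KB / 128KB figures quoted above match a 320x200 buffer; the 768x200 8-bit double-buffered case and the 300 KB audio reservation are assumptions picked for illustration, and `framebuffer_bytes` / `remaining_bytes` are made-up helper names.

```python
JAGUAR_RAM_BYTES = 2 * 1024 * 1024  # the Jaguar has 2 MB of main RAM


def framebuffer_bytes(width, height, bits_per_pixel, buffers=1):
    """Memory taken by one or more frame buffers of the given format."""
    return width * height * bits_per_pixel // 8 * buffers


def remaining_bytes(width, height, bits_per_pixel, buffers, audio_bytes, other_bytes=0):
    """What is left of main RAM after frame buffers, audio and other data."""
    used = framebuffer_bytes(width, height, bits_per_pixel, buffers)
    return JAGUAR_RAM_BYTES - used - audio_bytes - other_bytes


print("320x200  8-bit:", framebuffer_bytes(320, 200, 8), "bytes")
print("320x200 16-bit:", framebuffer_bytes(320, 200, 16), "bytes")
print("768x200  8-bit, double buffered:", framebuffer_bytes(768, 200, 8, buffers=2), "bytes")
print("left after buffers + 300 KB audio:",
      remaining_bytes(768, 200, 8, 2, audio_bytes=300 * 1024), "bytes")
```

This only covers capacity. It says nothing about bus contention or bandwidth, which is the point being made above.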

 

Not saying it won't fit, just there are higher priorities than the memory footprint :)


- Dataset is now finally properly 2D clipped, but given the 3D frustum culling conditions, the track data needs to be manually adjusted for the culling not to occur as early as it occurs now (this is the empty screen space at the bottom that shows every few frames, that is NOT the clipping (but frustum culling))

 

 

In a 3D pipeline you have back face culling, frustum & screen clipping; there is no frustum culling.

 

You MUST clip polygons against the near plane (you can forget the other planes). If you cull them (= don't draw), the whole polygon will disappear, leaving empty spaces, as soon as a vertex crosses the near plane.


 

In a 3D pipeline you have back face culling, frustum & screen clipping; there is no frustum culling.

 

 

I have not yet started my 3D voyage (with my own engine ;) ). Question though: in my understanding, frustum culling would be the process of removing everything behind the camera as early as possible, as these meshes will never be seen (in this frame). Stuff that is in front of the camera but has vertices off camera would be clipped rather than culled, as you mentioned. Is this not the case?


Well, actually you can cull polygons that are fully outside the frustum but the ones that intersect the frustum must be clipped.

 

If you don't clip a polygon against the near plane, then when you do the projection from 3D to 2D (x2d = x3d / z; y2d = y3d / z) the screen coordinates come out flipped for vertices behind the camera, because z is a negative number. The polygon will look something like this. So you must clip the polygon and draw only the part that is in front of the camera (z > 0)

  ------
  \    /
   \  /
    \/
    /\
   /  \
  ------
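
A minimal sketch of the near-plane clip described above, written as a single Sutherland-Hodgman pass against the plane z = near. The camera-space convention (z > 0 in front of the camera), the near distance of 1.0 and the names `clip_near` and `project` are assumptions for illustration; this is not code from either engine in the thread.

```python
def clip_near(polygon, near=1.0):
    """Clip a convex polygon against the plane z = near.

    polygon: list of (x, y, z) camera-space vertices, z > 0 in front.
    Returns the part with z >= near (an empty list if fully behind).
    """
    out = []
    count = len(polygon)
    for i in range(count):
        cur = polygon[i]
        nxt = polygon[(i + 1) % count]
        cur_inside = cur[2] >= near
        nxt_inside = nxt[2] >= near
        if cur_inside:
            out.append(cur)
        if cur_inside != nxt_inside:
            # The edge crosses the plane: add the intersection point.
            t = (near - cur[2]) / (nxt[2] - cur[2])
            out.append(tuple(c + (n - c) * t for c, n in zip(cur, nxt)))
    return out


def project(vertex):
    """Perspective divide as in the post: x2d = x / z, y2d = y / z."""
    x, y, z = vertex
    return (x / z, y / z)


# Triangle with one vertex behind the camera.
triangle = [(-1.0, 0.0, 3.0), (1.0, 0.0, 3.0), (0.0, 0.0, -1.0)]
clipped = clip_near(triangle)
print("clipped polygon:", clipped)
print("projected:", [project(v) for v in clipped])
```

A polygon that is entirely behind the plane comes back empty and can simply be skipped, one that is entirely in front comes back unchanged, and only the straddling case generates new vertices.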

 


C'mon, I am sure he knows that, he's been going on for a few years (literally) about how he knows everything there's to know about 3D.


Can you fix the blinking/black space where it's constantly flashing out so it's seamless?

I fixed that yesterday night. Now it's completely seamless.

 

I also made Medium and Low detail versions of the ships, so that the LOD system can switch between them based on distance from the camera. This will allow a lot more on-screen enemies while retaining the 30 fps lock.
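
A distance-based LOD switch of this kind could look roughly like the sketch below. The switch distances are hypothetical (the post gives no numbers), as are the names `LOD_DISTANCES` and `select_lod`. Squared distances are compared so that no square root is needed per object.

```python
# Hypothetical switch distances in world units; not taken from the post.
LOD_DISTANCES = (("high", 200), ("medium", 600))


def select_lod(camera, obj, levels=LOD_DISTANCES, fallback="low"):
    """Pick a detail level for obj based on its distance from camera.

    camera / obj are (x, y, z) positions. The first level whose maximum
    distance is not exceeded wins; anything farther uses the fallback.
    """
    dx = obj[0] - camera[0]
    dy = obj[1] - camera[1]
    dz = obj[2] - camera[2]
    dist_sq = dx * dx + dy * dy + dz * dz
    for name, max_dist in levels:
        if dist_sq < max_dist * max_dist:
            return name
    return fallback


camera = (0, 0, 0)
for ship in [(0, 0, 100), (0, 50, 400), (300, 0, 1000)]:
    print(ship, "->", select_lod(camera, ship))
```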

 

Curious how this will turn out once you add a MIDI sound file to the mix, but it does seem promising, nice work.

Well, there's definitely more bandwidth of the system used in 768x200 than when I was using 320x200 :)

We shall find out in a few months.

 

 

Wow! This is really amazing. I have a long way to go in my Jaguar 3D engine if you can crank out that kind of performance with super high-resolution flat shaded polys.

Thanks, but since you are splitting the load between GPU and DSP from the very start, you won't really have to spend time optimizing. Even a brute-force inefficient algorithm will suffice due to the parallel nature of execution.

 

Don't forget, I use only 4 KB for the code (3.2 KB at the moment). You have 3x more available (8 KB (DSP) + 4 KB (GPU) = 12 KB). If I had just 1 KB more, I could get another 10% boost easily. Gimme 2 KB and I'll get another 15%.

Gimme 8 more KB and ... :)

 

I like trying to extract maximum possible performance from just one core, though. It's more of a challenge that way :)



 

 

I ended up moving everything over to the GPU because I ran into hardware bugs in the DSP (something about external writes failing under certain conditions) that made the program not work on real hardware.

 

I have one GPU program that handles matrix calculations for building transformations and then another GPU program that handles projection and blitting. Are you saying you're doing all that in one GPU program? My matrix program (which includes translation, rotation, multiplication, and fixed-point add/subtract/multiply/divide functions) alone is 3.8KB.
