
Road Rash pre-alpha on Jaguar at 30 fps


VladR


I was hoping someone would call me out on that :)

 

Higher MHz but the 3DO CPU is both RISC and ARM

The 68000 takes multiple clock cycles to execute a single instruction. For example, a nop instruction takes 4 clock cycles. ARM CPUs of that era have conditional execution instructions that can be executed in a single cycle (the only exceptions being branches, memory read/write instructions and multiply). Although the ARM lacks the 68K's divide instruction, division can be carried out with an unrolled loop algorithm on ARM (using Newton–Raphson division). The 68K's BCD arithmetic/conversion can be replaced with double dabble BCD - note that the "C" software implementation on that wiki entry is very poor.
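A minimal sketch of double dabble, assuming an unsigned input and packed BCD output (one decimal digit per nibble). Instead of testing each nibble with a branch, it adds 3 to every nibble that is >= 5 in one step, using 0x33.../0x88... masks. That is closer to how you'd write it with shifts and logical ops on an ARM. The function name and parameters are hypothetical and are not taken from the wiki entry.

```python
def double_dabble(value: int, bits: int) -> int:
    """Convert an unsigned binary integer to packed BCD (one decimal digit per nibble)."""
    if value < 0 or value >= (1 << bits):
        raise ValueError("value does not fit in the given bit width")
    ndigits = len(str((1 << bits) - 1))   # enough nibbles for the largest possible input
    threes = int("3" * ndigits, 16)        # 0x33...3: +3 in every nibble
    eights = int("8" * ndigits, 16)        # 0x88...8: bit 3 of every nibble
    bcd = 0
    for i in range(bits - 1, -1, -1):
        # Every nibble is <= 9 here, so nibble + 3 never carries into its neighbour,
        # and nibble + 3 has bit 3 set exactly when nibble >= 5.
        mask = (bcd + threes) & eights
        # Turn each 0b1000 flag into 0b0011, i.e. add 3 to exactly those nibbles.
        bcd += (mask >> 2) | (mask >> 3)
        # Shift the next input bit (MSB first) into the BCD register.
        bcd = (bcd << 1) | ((value >> i) & 1)
    return bcd


print(hex(double_dabble(255, 8)))  # 0x255
```

The loop runs once per input bit, and each iteration does the same work regardless of the data. This makes it a good candidate for unrolling on a CPU without a divide instruction.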

 

Edit: Added a link to the fast division typically used on ARM.
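The Newton-Raphson approach can be modelled as follows. This is a sketch, not ARM code. Python's unbounded integers stand in for the 64-bit intermediate products that a real ARM routine would have to handle explicitly. Using F = 32 fractional bits is an assumption. The 48/17 - 32/17 * D starting estimate is the textbook choice for a divisor normalised into [0.5, 1). The iteration count is fixed, which is what makes the loop unrollable.

```python
def nr_divide(n: int, d: int) -> int:
    """Unsigned 32-bit n // d via a Newton-Raphson reciprocal, modelled in fixed point."""
    if not (0 <= n < 1 << 32 and 0 < d < 1 << 32):
        raise ValueError("expects a 32-bit unsigned n and a non-zero 32-bit d")
    F = 32                          # fractional bits of the fixed-point values (assumption)
    s = F - d.bit_length()          # shift that puts d's top set bit at bit 31
    D = d << s                      # D / 2**F lies in [0.5, 1)
    # Linear starting estimate 48/17 - 32/17 * D, relative error at most 1/17.
    X = (48 << F) // 17 - (32 * D) // 17
    # Each step roughly squares the error: 1/17 -> 1/289 -> ~1e-5 -> ~1e-10 -> below 2**-32.
    # The count is fixed, so on a real CPU this loop can be fully unrolled.
    for _ in range(4):
        E = (1 << F) - ((D * X) >> F)   # E ~ 1 - D*X
        X += (X * E) >> F               # X ~ X * (2 - D*X)
    # X ~ 2**(2F) / D and 1/d = 2**s / D, so n/d ~ n*X / 2**(2F - s).
    q = (n * X) >> (2 * F - s)
    # Truncation can leave q off by a unit or two; correct it exactly.
    while (q + 1) * d <= n:
        q += 1
    while q * d > n:
        q -= 1
    return q


print(nr_divide(100, 7))  # 14
```

The final correction loops make the result exact. They also mean the reciprocal only has to be "nearly" right, which is what allows a fixed, small iteration count.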

Edited by GroovyBee

I appreciate the research you are doing, VLAD, but realize that you don't have the time or resources to create a game that would have taken a team of 30 and a few hundred thousand dollars to create 20 years ago.

Actually, I slightly disagree with that. Let me explain, but basically, the corporate environment is incredibly unproductive for this kind of R&D work:

- I don't have to share codebase with anyone, which means

- I don't have to lose time on code merges, branching and all the related source control uber-clusterfuck issues

- There's no migration from one source control system (insert any other infrastructure element here) to another every 6-12 months, whenever a project manager reads an article in Cosmopolitan magazine over the weekend

- There's nobody fucking up your code because they're learning a new source control system

- There are no daily scrums that fuck up the quality of your work because you are pressed to deliver "something". I can decide that I will work on a feature for 3 days, and nobody will give me a dirty look every day along the lines of "oh, it's in progress, but can you present it now (we're agile)?"

- There are no "status meetings" in the middle of a debugging session that kill your focus. When you come back to your desk at 4pm, you feel like shit and are absolutely not in a state to continue the 2-hour debugging session you started before, so you postpone it until tomorrow, which realistically kills half a day and will most probably be interrupted by some other instrument of corporate bullshittery

- so, what should have taken an additional 15-20 minutes is suddenly spread over 2 days

 

- I have yet to meet a manager that understands that there's no such thing as "quick 10-minute meeting" when you're in the middle of coding. They're really good at pretending they do, but they don't really understand/care that we see that they don't.

 

- I don't do lunch breaks on weekends. Sometimes I have 14 hour coding sprees without eating.

 

- I've had many weekends where I produced more between Friday evening and Sunday than I can realistically do in a corporate environment in 2 weeks.


^Good points on the programming side of things, but coming up with all the artwork, 3d models, textures etc to make a complete 3D game per 1994/1995 standards still seems like it would take a very long time in and of itself. It seems that having multiple people work on the artwork would greatly speed up the process.

Edited by sirlynxalot

In the past, I bought several large texture packs at 1024x1024 resolution (and up) that cover many different materials. I learned enough about creating textures in PaintShopPro from multiple layers, using various effects, blend maps and transition alpha maps, to be sufficient for roughly the 2000-2005 era, so these skills are waaaay above what's realistically needed for jag, as jag barely manages simple lowres texturing, let alone 1024x1024...

 

As for 3D meshes, I spent years working with 3dsmax (almost every week, on top of my coding), creating well over a hundred mostly low-poly (as in, non-normal-mapped) meshes. Anything inorganic: buildings, interiors, bridges, various props, trees, crates, you name it...

 

Never had patience for characters, though. That's an area I always left for artists.

 

 

Let's take Crash n Burn, for example. I counted about 10 textures (2 for the road, lava, water, sand, 3x stone, skybox), which, by the way, look like they were taken from some texture library anyway (and just retouched). There could be more environments in later levels that require more textures, for sure. I noticed only 3 simple transitions between 2 materials on the road and environment.

 

Not sure how many different 3D cars there are. 8, I guess? Now, you could spend a week polishing and retouching a single car mesh. But a core, usable version of those low-poly cars can absolutely be done in half a day. Spend the first weekend experimenting with the art direction and playing with the style, and then another weekend just quickly bashing them out.

I once employed a 3D artist who whipped up a similar-quality version in under an hour. It was fascinating watching him, as I was never a fast modeller myself.

 

As for the cars, since they're prerendered, you don't have to lose time fighting for those precious collapsed vertices, and can literally take a high-res cube and keep quickly extruding left and right. Same for texturing: the source texture can be as high-res as needed, with no need for a time-consuming UVW unwrap, just direct and quick painting with materials.

 

Would it look like AAA work ? Of course not. It's a programmer's art, after all!

 

But it would be good enough, and free, for sure.

 

Besides, who's talking about a full game with many levels and full art variety ? My goal was always about one level.

 

 

In short, you can absolutely create a full 3D game of that era as a one-man team. It won't win any "Best Art" awards, but it's entirely doable. You just need to pick a reasonable genre without 3D characters and without a lot of complex graphics (like Star Raiders, for example: a simple space background and about a dozen simple, untextured meshes).


 

Says a guy who has never followed through on anything, despite it apparently being so easy for one person to do.


:)

 

You're wrong, btw. I have 2 full, released, boxed PC games under my belt (coded from scratch, with lots of production art, both 2D and 3D, done by me). I just lost interest in PC game coding after moving from Europe to the US, so I didn't hire artists to do the levels for my last Steam game, that's all. With a coder's salary in the U.S., there's little financial motivation to code games for money, really.

Say it takes me 2 years to finish a game, and then my cut is $0.5 Mil. That's absolutely not worth the risk and the actual time spent on it, as it's never 40 hrs a week. Back when I was in Europe? Sure. But not really in the U.S.

 

 

Also, I don't know how many times I've mentioned that I need to spend a bit of time coding the importing/rasterizing of arbitrary 3D meshes into my jag engine.

 

Like, 15-30 times by now ?

 

 

Apparently, it's a complex concept to grasp [that it does not make sense to create 3D meshes if there's no way to import them]. I apologize, I did not know.

 

 

 

But if your goal is to make sure I reconsider releasing public binaries of my work here once it's done, as they might be played by people like you, then you're doing a great job!

 

What have you created on the Atari Jaguar that gives you the right to bitch about my research & development work, which I do [in my free time for zero profit]?

You haven't contributed anything to this thread. And you don't have to be a coder: just linking YT videos of various games by various people has helped tremendously.


Yesterday, I implemented terrain coloring by reading color values from a texture, but as expected, it does not give a huge visual boost: the result is determined primarily by the terrain's vertex density, so the visual effect is minimal (and probably not even worth a video).

 

Today, I implemented quick and dirty texturing. It has glitches (as it's unfinished), but the most important part, the bandwidth cost of texturing (reading the texture for every single pixel of terrain and interpolating UV values across the scanline), is there.
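The per-pixel cost described here is one texture read plus UV interpolation for every pixel. It looks roughly like the affine span loop below. This is a hypothetical sketch, not the actual GPU code. It assumes 16.16 fixed point and power-of-two texture dimensions, so that wrapping is a single AND mask.

```python
FRAC_BITS = 16  # 16.16 fixed point (assumption for this sketch)


def texture_span(texture, tex_w, tex_h, u, v, du, dv, length):
    """Return `length` texels along one scanline, stepping (u, v) by (du, dv) per pixel.

    u, v, du, dv are 16.16 fixed-point texel coordinates; tex_w and tex_h must be
    powers of two so that wrapping is a cheap AND mask.
    """
    span = []
    for _ in range(length):
        tu = (u >> FRAC_BITS) & (tex_w - 1)    # integer texel column, wrapped
        tv = (v >> FRAC_BITS) & (tex_h - 1)    # integer texel row, wrapped
        span.append(texture[tv * tex_w + tu])  # one texture read per pixel
        u += du                                # affine (linear) interpolation
        v += dv
    return span


tex = list(range(16))  # a 4x4 texture whose texels are just their indices
print(texture_span(tex, 4, 4, 0, 0, 1 << 15, 0, 4))  # half-texel steps: [0, 0, 1, 1]
```

In a real renderer, each returned texel would then be written to the framebuffer. That means two memory accesses per pixel instead of the single write needed for flatshading.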

 

Here are the benchmark numbers (from rendering 2000 frames of the flythrough):

10.8 fps: Textured

15.1 fps: Flatshaded

 

I'll post video later this evening, got an errand to run now.

 

 

So, now everybody who told me "I told you so - jag can't do it", can jump in and start posting the Simpsons animated GIFs :)

 

And of course, there's no input, audio, AI. Yes, I know.

 

But, I'm not giving up! All this means is that my triangle rasterization routine needs more improvement, because, quite frankly, the difference between flatshading and texturing is much smaller than I expected. Right now it's what, 39%?
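The two benchmark numbers can be turned into per-frame costs, assuming 60 vblanks per second (NTSC). Measured in fps, flatshading is about 39.8% faster. Measured the other way, texturing is about 28.5% slower and costs roughly 1.58 extra vblanks per frame.

```python
VBLANK_HZ = 60  # NTSC vblanks per second


def vblanks_per_frame(fps):
    """How many vblanks one rendered frame occupies at a given frame rate."""
    return VBLANK_HZ / fps


textured_fps, flat_fps = 10.8, 15.1   # benchmark numbers from the post

speedup = flat_fps / textured_fps - 1    # how much faster flatshading is
slowdown = 1 - textured_fps / flat_fps   # how much slower texturing is
extra = vblanks_per_frame(textured_fps) - vblanks_per_frame(flat_fps)

print(f"flatshading is {speedup:.1%} faster, texturing is {slowdown:.1%} slower")
print(f"texturing costs {extra:.2f} extra vblanks per frame")
```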


So, now everybody who told me "I told you so - jag can't do it", can jump in and start posting the Simpsons animated GIFs :)

 

Nobody told you that. They told you it couldn't be done at that speed on the 68000 / in C, which you've just proved.

 

Nice work on the GPU renderer.

 

[Edit]

 

BTW, what are those 2 PC games you've completed? I think we're all wondering what a finished Vladr Production looks like :) [j/k]


Alright, here's the glitchy video:

 

Yes, the texturing glitch sucks, but I didn't have time today to fix it, so I'd rather post it as it is, because 95% of the performance impact of texturing is there.

 

I think this will force me to just bite the bullet and go for multi-threaded renderer, to get about 40-65% additional performance.

 

 

Also, this proves that the scanline method must be preferred for racing games, as it can be offloaded to the OP/Blitter, freeing the GPU/DSP.

 

So, a 30-60 fps Road Rash on jag is still doable in the city section, as the road will be done on the OP.

I always said that terrain section will be slower, but only now we know for sure how much slower it actually is.

Though, granted, this terrain is overkill for RoadRash, so perhaps I should go and simplify it for Road Rash.

 

On the other hand, I just realized that I don't have to pay the huge fillrate cost for the road in Road Rash, so a combined renderer (e.g. 3D for the terrain and 2D scanlines for the road), even in its current form, should reach 20 fps for the terrain part (but 60 for the road, as it's OP-based).

 

No idea how it will look with 2 different framerates (e.g. 60 fps for the road and ~20 fps for the terrain), but that's where this must be going to extract all the power from jag (i.e. each chip working at full speed).


Well you're not using the blitter for any of this GPU stuff so the numbers may be different if that route was taken.

For flatshading: yes, there might be a threshold where a single scanline filled by the Blitter would be faster than on the GPU. It is actually quite high on my to-do list of experiments.

For texturing: not so sure. In theory, once you initiate the Blitter command, your GPU can work nicely in parallel, as long as it does not touch RAM.

But the GPU is touching RAM, as it's rendering scanlines.

 

You would have to interrupt rasterizing and switch to a completely different task, which involves saving and later restoring all registers, so there's almost zero time in which the GPU can do something productive, as one scanline is not a lot of GPU time (I have measured that, actually).

Plus, that time shrinks even further, as you have to compute all the registers for the Blitter.

 

 

There is, however, a completely different algorithm I came up with on paper a few weeks ago that would truly unleash the Blitter for flatshading: instead of producing single scanlines, it would rasterize whole polygons in one call. You'd still have to fill the rest the old-fashioned way, but the savings should be really substantial.

I'm just not sure if it's doable in one weekend, though, as it's quite involved (definitely more complex than texturing) and it's a lot of math, so there's a possibility that by the time I compute all the intersections and line equations, the performance savings will have evaporated.

 

But, unless I try, I won't really know.

 

 

The point is, there are lots of ways to speed things up.

 

And while the performance difference between texturing and flatshading is almost 2 full vblanks' worth of GPU time, I still believe it is possible on jag to reach 15 fps textured on that large-draw-distance Crash n Burn terrain.

 

This is what I need to do:

- break up the GPU code into smaller chunks and keep only the texturing function resident, so that I can fit a lowres texture into the GPU's small cache

- Implement multi-threaded rendering

 

So, we're not there yet...


  • 2 weeks later...

- I took a little break from the textured terrain and went exploring a different visual 3D style - Tempest

- Over the weekend I did the first iteration

- this morning I implemented a HiRes resolution - 640x200

- It's still surprisingly smooth

- For some reason I don't see the full screen width via video out: 512x200 fits just fine (right at the screen edge, though), but 640x200 doesn't

 

 

Looks like Rez would be entirely possible on Jag, even in HiRes...


Looks cool. The creator of Rez says he was inspired by Tempest 2000 anyway, so there's an unofficial connection there.

That's interesting. Do you have a link? I'd love to read more.

 

I'm currently processing only 1,200 points, but it's total brute force: I'm even doing a full 3D transform and 2D culling for each point. This can be optimized relatively easily to reach a higher number and get closer to the point cloud density of Rez.
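For illustration, here is a brute-force pass of the kind described, with a full transform, projection and 2D cull for every point. It is a sketch under simplifying assumptions: yaw-only rotation, a pinhole projection with a hypothetical `focal` length, and the screen centre as origin. The renderer's real transform isn't shown in the thread.

```python
import math


def project_points(points, yaw, cam, focal, width, height):
    """Transform, project and 2D-cull every point, one at a time (brute force)."""
    c, s = math.cos(yaw), math.sin(yaw)
    cx, cy = width / 2, height / 2  # screen centre
    visible = []
    for x, y, z in points:
        # 1. translate into camera space
        x, y, z = x - cam[0], y - cam[1], z - cam[2]
        # 2. rotate around the vertical axis (yaw only, to keep the sketch short)
        xr = c * x - s * z
        zr = s * x + c * z
        # 3. reject points at or behind the camera plane
        if zr <= 0:
            continue
        # 4. perspective projection to integer screen coordinates
        sx = math.floor(cx + focal * xr / zr)
        sy = math.floor(cy - focal * y / zr)
        # 5. 2D cull against the screen rectangle
        if 0 <= sx < width and 0 <= sy < height:
            visible.append((sx, sy))
    return visible
```

Every point pays for all five steps, even the ones that end up culled. Cheapening or skipping steps 2-5 for whole groups of points is where the optimizations described later in the thread come in.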

 

 

 

Is there any game in this kind of visual style (on any other console) that would be a step between Rez and Child Of Eden ? Child Of Eden is beautiful, of course, but way too detailed for jag, unfortunately...


Nitrous Oxide on the PS1.

Thanks. For some reason I had a different racing game associated with this name. This is indeed tempest-inspired.

 

This must be the first racing game I've seen that moves through the world very slowly. It's a great cheat to partially hide the lower performance of the 3D engine. Too bad we can't use it for the car-racing genre, as then even with full-screen texturing the framerate would still feel quite good.

 

Small update:

- I did a first round of optimization of the point cloud and managed to reduce the inner rendering loop to just 2 instructions (+2 for the loop), so the point cloud portion of the scene is now roughly 4x faster, and I can use 5,000 vertices in the scene now. That, of course, is more than enough for something like Rez.

 

One has to wonder: if we used the DSP for rendering in parallel with the GPU, what kind of crazy detail would then be possible on Jag? The DSP obviously wouldn't double the total number of points, as there would be some contention when both the GPU and the DSP try to access the framebuffer in the same RAM, but it would still be a great experiment...


Since I can't edit the last reply (permissions issue):

 

Technical EDIT: Looks like it's finally time to start using old-school 8-bit assembler techniques like loop unrolling, inlining and self-modifying code. This way I could double the performance of the inner loops again by unrolling them (as I now use 2 ops to render and 2 ops for loop overhead). I could also split rendering into substages, where the GPU prepares specialized self-modifying code for each substage. This would remove lots of conditions that only apply to, for example, half of the dataset, or allow specialized inlined transform code without lots of checks, which would eventually result in faster culling, and so on...
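The "double again" estimate follows from a simple instruction count. It takes the per-point numbers from the post (2 render ops and 2 loop-overhead ops) and ignores memory stalls, which is an assumption. Unrolling U times spreads one loop overhead over U points, so the cost per point falls from 4 towards 2 ops (3, 2.5 and 2.25 for U = 2, 4 and 8) and the speedup approaches, but never quite reaches, 2x.

```python
def ops_per_point(unroll, render_ops=2, loop_ops=2):
    """Instructions per point when `unroll` copies of the render body share one loop overhead."""
    if unroll < 1:
        raise ValueError("unroll factor must be >= 1")
    return (render_ops * unroll + loop_ops) / unroll


for u in (1, 2, 4, 8):
    speedup = ops_per_point(1) / ops_per_point(u)
    print(f"unroll x{u}: {ops_per_point(u)} ops/point, {speedup:.2f}x vs. rolled loop")
```

The returns diminish quickly. The code-size cost of unrolling grows linearly, which matters on a GPU with a small local cache for code.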


This exploratory detour is proving very useful for a few reasons, but mostly:

- I got a new idea on texturing, while implementing point cloud (one thing eventually feeds another, albeit unexpectedly)

- this looks to be the second type of 3D environment Jaguar's HW was created for, yet remains to this day unexplored

- I think I came up with a novel culling approach that uses the OP's feature set indirectly, as a side effect. I need to implement it first, but I think it should work. If it does, it should improve the framerate of flatshaded terrain too and remove all culling calls for the point cloud (i.e. literally free culling)

 

I've unrolled the inner rendering loop, and rather than merely doubling the roughly one thousand points I started with, I've literally made an order-of-magnitude jump to 10,080 points.

 

And it's still stupidly fast.

 

And it's even in a higher resolution: 512x200

 

 

 

I understand that this video does not look special in any way, but I'll now go and try to convert those 10,000 points into something useful, i.e. something resembling a Rez level.


Well you're not using the blitter for any of this GPU stuff so the numbers may be different if that route was taken.

Alright, I decided to spend a day on various GPU <-> Blitter experiments. See if you can dig up some older threads on this (or, if memory serves you right, older discussions) for comparison and discussion's sake, as it's all fresh in my head right now.

 

I've returned to Road Rash, as its road and building texturing is best suited to the Blitter's scanline approach. All numbers are percentages of the time of 1 vblank (there are 60 vblanks per second on NTSC), i.e. 30/100 means the routine takes 30% of one vblank and leaves 70% available for anything else.

 

The following summary should prove useful to anyone experimenting with texturing on GPU.

 

1. 99/100: Default approach: RAM texture

- loads texel from RAM (the texture) and stores it into RAM (framebuffer)

- simple, brute-force, short code

 

2. 85/100: Mirrored: RAM texture

- reads only half of the texels and writes each one twice (the duplicate goes to the mirrored position, counted from the right end of the scanline)

- uses the fact that the road texture is horizontally mirrored

- reduces RAM bandwidth by 25%, at the cost of slightly more code

- reduces road texture size by 50%

 

3. 45/100: GPU texture

- replaces loading texels from slow RAM with loading them from cache

- thus reducing RAM bandwidth by 50%

 

4. 37/100: GPU texture + GPU scanline + Blitter

- instead of writing the scanline to the framebuffer in RAM, we write it to a cache (we keep 1 scanline in cache)

- the code is more complicated, as it has to merge 4 pixels into one 32-bit value, and that's actually surprisingly slow (bitshift + or)

- uses the Blitter to copy the scanline from the GPU to the framebuffer

 

5. 30/100: GPU texture + GPU scanline + 2x Blitter

- we generate only half of the road scanline

- run the Blitter twice with the same scanline data, the second time with the XSIGNSUB flag (rasterizes backwards)

 

6. 21/100: No Blitter wait

- not waiting for the Blitter results in the best performance: ~80% of the vblank is still available

- currently unused (as I don't have anything meaningful to run in parallel with the Blitter), but this free time could be used for something else, e.g. transformations, terrain processing, and so on

 

7. XX/100: Phrase Blitter transfer

- to be implemented later

- from past experience, using phrase mode proved to be ~6.5x faster (compared to pixel mode) when I was working on the H.E.R.O. renderer

- problem: exact alignment on the GPU costs either a shitload of unrolled instructions or a slow loop (it must run 4 times, for all 4 edges per scanline). This will eat up a lot of GPU time (and code cache), so I'm not exactly inclined to implement it right now, as any gain will be offset by the alignment above and by pushing/popping registers for other code to run in parallel. Not worth my time right now
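Approaches 2, 4 and 5 above combine two small tricks. The first is building a mirrored scanline from its left half only. The second is packing 8-bit pixels four at a time into 32-bit words with shift + or. The sketch below models both in Python with hypothetical function names. It makes two assumptions: 8 bits per pixel (implied by "4 pixels into one 32-bit value") and big-endian byte order, since the Jaguar is big-endian. On the real hardware, the mirrored half comes from the second, XSIGNSUB Blitter pass rather than from software.

```python
def mirror_scanline(left_half):
    """Full scanline for a horizontally mirrored texture, built from its left half only."""
    # The right half is the left half written backwards, which is what a
    # second, right-to-left pass over the same data produces.
    return left_half + left_half[::-1]


def pack4(pixels):
    """Pack 8-bit pixels into 32-bit words, leftmost pixel in the most significant byte."""
    if len(pixels) % 4:
        raise ValueError("pixel count must be a multiple of 4")
    words = []
    for i in range(0, len(pixels), 4):
        p0, p1, p2, p3 = pixels[i:i + 4]
        # three shifts and three ORs per word: the "bitshift + or" cost from item 4
        words.append((p0 << 24) | (p1 << 16) | (p2 << 8) | p3)
    return words


print([hex(w) for w in pack4(mirror_scanline([0x0A, 0x0B]))])  # ['0xa0b0b0a']
```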

 

 

Additional applications:

- Road Rash: City levels, for buildings

  - there's going to be a threshold number of buildings that the engine will be able to render alongside the road at 60 fps, as right now I have 70% of the vblank available

- H.E.R.O. 3D HiRes

  - 512x200: 40-60 fps

  - 1024x200: 20-30 fps

- NFS

  - using the Blitter makes angled rasterizing possible, e.g. emulating camera roll (the road can be tilted left and right), just like in NFS

 

 

Lessons learnt:

1. Short scanlines

- Unlike when I was running the Blitter from the 68000 (for H.E.R.O.), blitting short scanlines from the GPU does not kill performance, like at all

- Example: blitting the first 52 scanlines (the longest ones, from the bottom) takes 12/100, blitting another 26 takes 3/100, and the remaining 22 only 1/100 (that's 22 calls to the Blitter, including setting up all the registers, for what is maybe 50 pixels in total)

- Those last 22 scanlines, on the 68000, would take up a giant portion of the vblank, despite minimal screen coverage

- So, on the GPU, you can afford to literally abuse the Blitter (thanks to the HW wiring of registers, et al) and don't have to compromise on the algorithm's elegance (or ease of debugging)

2. Somehow, blitting from cache appears to be faster

- I don't have the numbers to back this up, but I spent a LOT of time with the Blitter from the 68000 (for H.E.R.O.), so I have a very good idea of the Blitter's throughput

- It's possible it's only because setting up the Blitter from the GPU is just so much faster (due to HW wiring), but I'll examine it at a later time

3. No urgent need for OP-based scanlines

- Since it takes only 30% of vblank to texture the road, I actually don't have to go down the road of OP-based scanlines (as I did with the old 68000 prototype), where I'd be updating 100 OP bitmaps.

- Eventually, down the road, it may be beneficial to transfer the bandwidth cost to the OP, but for now, there's no real need.

- Besides, computing an OP list of 100+ bitmaps is not going to be free either, so the difference between OP and Blitter might actually become negligible. Definitely, not worth my time right now.

 

RoadRash ToDo:

1. Reimplement the same texturing algorithm for buildings

2. Implement paging (first GPU chunk will be road texturing, second will be building texturing)

3. Find the building threshold to keep 60 fps

4. Input

5. $$$!!! (j/k)
