
Road Rash pre-alpha on Jaguar at 30 fps


VladR


There's one more thing I forgot to mention about "Project M" on the Atari 130XE... The texture used in the "Wolfenstein" demo didn't use any bitmaps, but was done in assembly/binary code, which I thought was pretty interesting. It's a very noteworthy mention; I wonder if the Jag could do something procedural with texture mapping for the sake of speed and memory? That would certainly help tilt the unbalanced nature of the buggy Jaguar if the textures were very small, compact, yet machine-code fast for that on-the-fly real-time rendering.


... The texture used in the "Wolfenstein" demo didn't use any bitmaps, but was done in assembly/binary code, which I thought was pretty interesting.

Of course, on 6502, you can't beat the

LDA #$13    ; 2 cycles
STA $1300   ; 4 cycles

combo, which is just 6 cycles. That's the fastest possible way on 6502 - you just unroll the data into code.

 

I wonder if the Jag could do something procedural with texture mapping for the sake of speed and memory?

Oh, no. Faaaar from it. The 8-bit Atari is vastly superior to Jaguar in this regard:

 

1. You can have 4 MB of full-speed unrolled code on the 8-bit 6502 (via PORTB bank-switching)

2. You can have 4 KB of full-speed unrolled code on the 64-bit Jaguar

 

The little 8-bit Atari has literally over 3 orders of magnitude more accessible RAM for fast unrolled code. It's also an order of magnitude easier to handle bank switching on the 8-bit than it is to **properly** swap code into the GPU on the Jaguar.
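To put a number on "over 3 orders of magnitude", here is a quick check of the two sizes quoted above. The constant names are made up for illustration, and note that the 4 MB on the 8-bit is reached through a bank window rather than being mapped all at once.

```python
import math

# Sizes as quoted in the post; names are illustrative only.
BANKED_6502_BYTES = 4 * 1024 * 1024   # 4 MB reachable via PORTB bank switching
GPU_LOCAL_BYTES = 4 * 1024            # 4 KB of GPU local RAM on the Jaguar

ratio = BANKED_6502_BYTES // GPU_LOCAL_BYTES
orders_of_magnitude = math.log10(ratio)

print(f"ratio: {ratio}x, orders of magnitude: {orders_of_magnitude:.2f}")
```

The ratio is 1024x, so "over 3 orders of magnitude" holds, if only just.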

 

Now let's consider the speed of access to the new code:

1. On 8-bit Atari, the new bank is available within 1 cycle. In words: ONE cycle

2. On 64-bit Atari, this is what you need to do:

- compute (or look up) the size of new code chunk

- Turn GPU Off

- Set up ~10 registers for Blitter

- Initiate Blit

- Initiate endless loop waiting for Blitter's mighty 64-bit snail blitting to finally finish, thus killing another processor for substantial period of time

- Set the PC for GPU

- Turn GPU On

- Of course, this presumes you are aware of the SMAC assembler bugs and issues, and what happens if you foolishly attempt 32-bit aligning of the GPU code - that was one fun discovery : )

 

Also, on the GPU, the great majority of instructions take 3 cycles (so, no real win compared to the 8-bit Atari), and with pipelining you usually get to about 2.2 cycles per instruction on average. I can't stress enough how many times I wasn't able to get under 2.0 because of those stupid HW bugs: rearranging instructions exposes them, so you either have to insert NOPs or use a much slower combination of instructions (but one which actually executes).

 

 

Just because you assembled legal GPU code doesn't mean it will execute. More than likely, it won't :)

 

If jag docs are to be believed, MOVEQ only takes 2 cycles, instead of 3, thus the closest you can get to the above LDA/STA combo in terms of speed is:

MOVEQ #$1F, r0  ; 2 cycles
STORE r0, (r1)  ; 3 cycles
ADDQ #4, r1     ; 3 cycles

Unfortunately, the allowed range of values for MOVEQ is only 0-31, so 32 colors is the max. Of course, writing to GPU cache is always 32-bit, so you waste 24 bits (but let's just say you are OK with this speed/size compromise for this particular case). I'm sure you noticed that you can't store a value to memory directly. All stores are indirect via registers (hence the third instruction, which advances the pointer).

And I benchmarked the other case - where you just pack all bytes together. It's much slower than just writing from GPU to main via STOREB.

 

So, no - not even the 64-bit Jaguar can beat the little 8-bit micro from the '70s in terms of this efficiency, as crazy as it sounds. The only advantage the Jag has here is its higher clock speed. But it's still brutally inefficient [per clock cycle] compared to the 6502.
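A rough way to see both halves of that claim (worse per clock, still faster in absolute terms) is to plug the cycle counts from the two snippets above into a small calculation. The clock rates below are assumptions (roughly 1.79 MHz for an NTSC 8-bit Atari, roughly 26.59 MHz for the Jaguar GPU), and the model ignores pipelining, DMA stealing and bus contention, so treat it as a sketch rather than a benchmark.

```python
# Cycle counts per unrolled store, taken from the snippets in the post.
CYCLES_6502 = 2 + 4        # LDA #imm + STA abs
CYCLES_GPU = 2 + 3 + 3     # MOVEQ + STORE + ADDQ (per the Jaguar docs)

# Assumed clock rates, not stated in the post.
CLOCK_6502_HZ = 1_789_773    # NTSC Atari 8-bit
CLOCK_GPU_HZ = 26_591_000    # Jaguar GPU


def stores_per_second(clock_hz, cycles_per_store):
    """Upper bound on unrolled stores per second, ignoring all stalls."""
    return clock_hz // cycles_per_store


rate_6502 = stores_per_second(CLOCK_6502_HZ, CYCLES_6502)
rate_gpu = stores_per_second(CLOCK_GPU_HZ, CYCLES_GPU)

print("cycles per store:", CYCLES_6502, "vs", CYCLES_GPU)
print("stores per second:", rate_6502, "vs", rate_gpu)
print(f"GPU advantage from clock alone: {rate_gpu / rate_6502:.1f}x")
```

Under these assumptions the GPU needs more cycles per store than the 6502 (8 vs 6), and its overall lead comes entirely from the clock.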

 

 

 

That would certainly help tilt the unbalanced nature of the buggy Jaguar

Nope, I'm sorry, you're unfortunately wrong on this one, as that "feature" of the Jag just plain sucks. What was supposed to be an alpha version of the chip, Atari deemed a production candidate...

 

Nothing can "tilt" or "explain" or ease the bugs. You literally can't just think about the code as you are writing it; you must think about all the HW bugs at the same time and adjust the algorithm around them.

I used to just keep another text file open, right next to the GPU source code window, for instant verification if the code I wrote is remotely runnable or not.

 

That GPU-bugs file got so big (recently grew, because of the DSP-specific bugs), I actually have to scroll it. I think I'll have to buy a bigger TV because of the GPU bugs, mine's only 50"...


 

Oh, no. Faaaar from it. The 8-bit Atari is vastly superior to Jaguar in this regard:

 

1. You can have 4 MB of full-speed unrolled code on the 8-bit 6502 (via PORTB bank-switching)

2. You can have 4 KB of full-speed unrolled code on the 64-bit Jaguar

Right... And the "GPU-to-main workaround" requires full 64-bit due to a bug with the multiplexer not getting enough current when making jumps, which means more cycles lost, but it is still somewhat useful provided the access isn't in a tight loop, lest you wind up "hammering the GPU". Makes for great limited use like AI and such, but still very limited when it comes to speedy texture mapping. The GPU has a great, fast ALU; it seems like procedural textures would be ideal considering the two 16-bit multipliers running in parallel, but I guess that RAM issue is always going to slow things down at some point.

 

Here's something from the Jag manual concerning its math capabilities...

The GPU is also intended to perform rapid floating-point arithmetic. It has no floating-point instructions as such,
but has some specific simple instructions that allow a limited precision floating-point library to be capable of in
excess of 1 MegaFlop.

One of the reasons I chose 2.5D for the GPU is because of the fast math; it would be overkill for that sort of thing... I still stay hopeful, but it's not all that surprising, considering that every good Jag programmer expresses frustration with the system and its bugs.

 

Unfortunately, the allowed range of values for MOVEQ is only 0-31, so 32 colors are max. Of course, writing to GPU cache is always 32-bit, so you waste 24 bits (but, let's just say you are ok with this speed/size compromise for this particular case). I'm sure you noticed that you can't store a value to a memory directly. All storage is indirect via registers (hence the third instruction)

 

 

At 32 bits that would be 8 bits x 4, but would mean 4 times the data in cache... Another reason I would refer to the DSP, 68K or both to do some 3D work prior to the GPU handling whatever the outputs are. By the time it reaches the GPU, the great majority of the work is already done. Simply let the GPU do some fast 2D rendering based on the pre-calculated stuff, since the DSP has full access to main RAM at 16 bits. But the info you're dishing out is very helpful, and it only confirms a lot of stuff that I hear from Jag programmers.

 

That GPU-bugs file got so big (recently grew, because of the DSP-specific bugs), I actually have to scroll it. I think I'll have to buy a bigger TV because of the GPU bugs, mine's only 50"...

 

 

I read somewhere that the Jaguar can do 720x480... Ever tried another resolution size? Your vertical res is low, but you use a lot of lines within that low res. If the Jag can handle that many lines, seems like something a little more modest in resolution might be in order, which could help to free up some cycles.

Edited by philipj

Of course, on 6502, you can't beat the

LDA #$13    ; 2 cycles
STA $1300   ; 4 cycles

combo, which is just 6 cycles. That's the fastest possible way on 6502 - you just unroll the data into code.

 

.....

 

Now let's consider the speed of access to the new code:

1. On 8-bit Atari, the new bank is available within 1 cycle. In words: ONE cycle

.....

Forgive my ignorance but in order to switch banks are you not supposed to execute something like:

 

LDA #BANKNUM

STA BANKSWADDR   ; $D301 most likely, not sure about the 4MB board you are referring to

 

at every bank switching juncture?

Sure beats setting up the blitter to copy 4K, but it is not exactly one cycle either.

Does the 4MB expansion perform autoswitching?
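For a sense of scale, the cost of that LDA/STA bank switch can be amortized over one bank of unrolled code. This is only a back-of-the-envelope sketch: it assumes a 130XE-style 16 KB bank window (the 4 MB board may differ), counts only the unrolled LDA #imm / STA abs pairs, and ignores the jump into and out of the bank. All names are made up for illustration.

```python
# Assumed 130XE-style banking: a 16 KB window switched through PORTB.
BANK_WINDOW_BYTES = 16 * 1024

# One unrolled store: LDA #imm (2 bytes, 2 cycles) + STA abs (3 bytes, 4 cycles).
PAIR_BYTES = 2 + 3
PAIR_CYCLES = 2 + 4

# One bank switch: LDA #bank + STA $D301, the same 2 + 4 cycles.
SWITCH_CYCLES = 2 + 4

pairs_per_bank = BANK_WINDOW_BYTES // PAIR_BYTES
work_cycles = pairs_per_bank * PAIR_CYCLES
overhead = SWITCH_CYCLES / (work_cycles + SWITCH_CYCLES)

print(f"{pairs_per_bank} stores per bank, {work_cycles} cycles of work")
print(f"bank switch overhead: {overhead:.4%}")
```

So the switch is 6 cycles rather than 1, but against a full bank of unrolled stores it comes out well under a tenth of a percent in this model.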


Right... And the "GPU-to-main workaround" requires full 64-bit due to a bug with the multiplexer not getting enough current when making jumps, which means more cycles lost, but it is still somewhat useful provided the access isn't in a tight loop, lest you wind up "hammering the GPU". Makes for great limited use like AI and such, but still very limited when it comes to speedy texture mapping.

Those rules are actually not 100%. For some time, it may seem so. I spent about 2 weeks with GPU-in-main code. Wrote a LOT of such code. Tried running the whole road rendering from main. It is very unreliable. Just do a simple loop of 1,000 iterations and most of the time it won't get to the end. Same executable, just run it a few times.

In the end, I couldn't trust it, regardless of how much effort I spent in aligning all jumps and confirming by doing HexDump that those values are, indeed, at requested aligned addresses - because you can't really trust assembler (hard lesson, btw - about 2 days time worth). Now I have run-time hexDump on a keypad, so at any time I can confirm that the code I wrote, is indeed, at the address I want it.

 

In the end, the benefits of gpu in main are very marginal - its greatest benefit is actually in having the GPU-debugging code - e.g. you don't have to fill the tiny 4 KB with number-writing functionality. But, now that I do run-time debugging on 68000, I don't need it anymore for debugging.

 

The second greatest benefit of gpu-in-main is that you can have all the outer loops, that take less than 1% of frame time (but several pages of code inside precious 4 KB), slowly execute from main - as it's still much faster than doing code swap. And you gain the space for some new feature or optimization, that you didn't have the space for, previously.

 

But, that was when I was just single-threaded. After gpu-in-main proved unreliable, I started using the computing power of 68000 and did some drastic refactoring, but gained enough space in 4 KB to implement some major optimizations, that were impossible before (because there was no available space for them in 4 KB, with all other code there).

 

 

 

I'm sure you could write a synthetic benchmark that would "show" how 68000 slows the GPU down when it's on. But even ONE 4 KB swap removes more MIPS from your engine, than having 68000 banging on the bus all the time.

Somehow, you never hear those people mention this "tiny technical detail" :lol:

 

Here's something from the Jag manual concerning its math capabilities...

The GPU is also intended to perform rapid floating-point arithmetic. It has no floating-point instructions as such,

but has some specific simple instructions that allow a limited precision floating-point library to be capable of in

excess of 1 MegaFlop.

Forget floating point. It's an order of magnitude slower than fixed-point or integer. What's 1 MFlop compared to 15 MIPS? I don't even use fixed point everywhere. I just use plain integers in tight loops, e.g. just one instruction, no need for shifting. That is the fastest way. Yes, it needs some experimenting, refactoring, and adjusting on the input side, but you gain tremendous performance back for a little work.
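To illustrate the difference between the two approaches (this is not the engine's actual code, just a minimal sketch with made-up names), compare an 8.8 fixed-point ramp, which needs an add and a shift per iteration, with a plain-integer ramp whose inputs were pre-scaled into half-units so the loop body is a single add.

```python
FP_SHIFT = 8  # 8.8 fixed point, chosen only for this example


def ramp_fixed(start, step_fp, count):
    """Fixed-point ramp: every iteration costs an add plus a shift."""
    acc = start << FP_SHIFT
    out = []
    for _ in range(count):
        out.append(acc >> FP_SHIFT)   # shift back to whole units
        acc += step_fp
    return out


def ramp_integer(start, step, count):
    """Plain-integer ramp: inputs are pre-scaled, so the loop is one add."""
    acc = start
    out = []
    for _ in range(count):
        out.append(acc)
        acc += step
    return out


# A step of 1.5 units: 384 in 8.8 fixed point, or 3 when working in half-units.
fixed = ramp_fixed(0, 384, 4)
half_units = ramp_integer(0, 3, 4)
print(fixed, half_units)
```

The "adjusting on the input side" here is the choice of half-units: the data is authored in a scale where the step is already a whole number, so nothing has to be shifted inside the loop.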

 

 

... Another reason I would refer to the DSP, 68K or both to do some 3D work prior to the GPU handling whatever the outputs are. By the time it reaches the GPU, the great majority of the work is already done. Simply let the GPU do some fast 2D rendering based on the pre-calculated stuff, since the DSP has full access to main RAM at 16 bits.

Yes, that's exactly what I've been trying to explain for the last half decade. The DSP is basically another GPU in terms of raw computing power. Nobody is going to drive the Blitter from the DSP (it's not on the same 64-bit bus as the GPU is, after all). But guess what. My transformed coordinates are 16-bit, so it doesn't matter that the DSP only has 16-bit access to RAM. For what I need it to do, it's exactly as fast as the GPU (save for DMA priority).

Best thing, if I remove all the transform code from GPU, I will gain enough space for code to handle vertical Blitter stripes, which will bring additional performance boost (though, granted, it's mostly for Quake-style 3D scenes, with lots of thin vertical walls (e.g. pillars)).

The interrupts will take care of playing audio in parallel, which doesn't really consume all that much MIPS anyway.

 

 

 

I read somewhere that the Jaguar can do 720x480... Ever tried another resolution size? Your vertical res is low, but you use a lot of lines within that low res. If the Jag can handle that many lines, seems like something a little more modest in resolution might be in order, which could help to free up some cycles.

All of my recent vids are in 768x200. The video chip may display only 720 of those 768, but Blitter, OP, GPU, 68000, all need to work with 768 pixels (even though you don't physically see 48 of them), as OP does not offer direct 720 width.

And no, I tried. It's not faster to ignore those 48 px by adjusting DWIDTH in the OP phrase.


  • 1 month later...

Can I please have some example YT vids of non-car racing 16/32-bit games that showcase the physics of the player's vehicle (the controls)? It must be something remotely comparable, e.g. along the lines of StunRunner / Wipeout.

 

I just implemented the simplest possible baseline physics integer model that currently handles the following (currently only for strafing, as it doesn't make sense to auto-slowdown the vehicle for this game):

- separate acceleration rate when player holds left/right

- separate deceleration rate when player goes in opposite direction

- separate deceleration rate when player does nothing (e.g. inertia)

- max strafing speed

 

It's very fast, because it's not even fixed-point, just integers. I made sure it works in the world-space coordinate system, so it doesn't need to be converted to a different coordinate space (which helps).

 

I created a couple of presets that allow me to have different behaviors (e.g. "slow tank", "nimble but not very fast", "ultra-fast", "average"). It would probably make sense to give the player a choice (or gradual unlocking of better vehicles, kinda like Wipeout does, I guess).
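For readers who want to picture the model, here is a minimal integer-only sketch of the four rules listed above plus presets. It is not the actual Jaguar code: the field names, the preset numbers and the rule that braking never overshoots past zero in a single frame are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StrafePreset:
    accel: int      # speed gained per frame while holding left/right
    brake: int      # speed lost per frame while steering against the motion
    coast: int      # speed lost per frame with no input (inertia)
    max_speed: int  # cap on |speed|, in world units per frame


# Preset names come from the post; the numbers are invented.
PRESETS = {
    "slow tank": StrafePreset(accel=1, brake=2, coast=1, max_speed=8),
    "nimble but not very fast": StrafePreset(accel=4, brake=6, coast=3, max_speed=12),
    "ultra-fast": StrafePreset(accel=6, brake=8, coast=2, max_speed=40),
    "average": StrafePreset(accel=3, brake=4, coast=2, max_speed=20),
}


def step_strafe(speed, steer, preset):
    """Advance the strafing speed by one frame; steer is -1, 0 or +1."""
    if steer == 0:
        # No input: coast toward zero without crossing it.
        if speed > 0:
            speed = max(0, speed - preset.coast)
        elif speed < 0:
            speed = min(0, speed + preset.coast)
    elif speed * steer < 0:
        # Steering against the current motion: use the brake rate.
        speed += steer * preset.brake
        if speed * steer > 0:
            speed = 0  # assumed: stop at zero instead of reversing at once
    else:
        # Steering along the motion (or from rest): accelerate.
        speed += steer * preset.accel
    # Clamp to the preset's maximum strafing speed.
    return max(-preset.max_speed, min(preset.max_speed, speed))


# Usage: hold right for 10 frames, then release, tracking world-space x.
preset = PRESETS["average"]
x, speed = 0, 0
for frame in range(20):
    steer = 1 if frame < 10 else 0
    speed = step_strafe(speed, steer, preset)
    x += speed
print(x, speed)
```

Everything is adds, compares and clamps on whole numbers, which matches the "just integers" point, and switching vehicles is only a matter of passing a different preset.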

 

But, it would be great to see what was considered "standard controls physics" for this kind of game, back in the day, as I was strictly a PC player at the time. Please, no Playstation games, let's make a cut-off somewhere around Saturn/3DO, but preferably something that is closer to the Jaguar in terms of raw performance.


I checked out the following 3 games that are close/similar in the setting :

 

F-Zero (SNES): no physics whatsoever, not even strafing physics

Wipeout : not a good comparison point, as it's a first-person racing game, and its physics is based primarily on camera banking (the roll) - that is not really applicable to 3rd person racing

StunRunner: has some very minor inertia, and I also noticed vertical bobbing (as if it was hanging on the spring) - but it's not really real physics as it doesn't adjust the range/speed/intensity even after jumps - it's merely an effect.

I guess I can implement the vertical bobbing behavior from StunRunner. Worst case, non-realistic, just an effect like in StunRunner, best case: physics-based (e.g. taking into account speed, hills, acceleration).

 

 

And I cannot at this moment recall any other similar non-car racing games on other related platforms. If we can't find anything, I guess I will have to extend the search to much faster platforms (and perhaps include even the current RedOut on PS4...)

 

EDIT: I found there's F-Zero X on N64 that is in third-person camera, so some of its physics properties can be implemented. I think I'll go and include the ship rolling (though, that one was in my to-do list from the beginning - just waiting till I get to this moment and implement it all).

Edited by VladR

HoverRace looks interesting, but it's not track-based racing - rather open-world driving, so its physics is fundamentally different from track-based racing.

 

Club drive, outside of open-world driving (see above), features a car, so it's not applicable either.

 

Can you think of something like F-zero X, but on other platforms ?

 

 

Perhaps I should go and watch those All X hundred games on <insert random console here> vids, something similar to StunRunner should pop up there...


StunRunner has a bit of physics when you hit those tunnels... If you hit the corners the wrong way, you'll slow the vehicle down. The arcade version probably has the 68000 doing all of the physics, similar to "Checkered Flag". You can probably animate the vehicles enough to respond in a realistic way, so that the game is operating off simple animation instead of some complex physics engine; of course that idea may vary depending on what you're trying to do.

 

Here's something for reference; a book called "Mathematics for Game Developers". Seems to cover a lot of stuff including game physics... Don't know how practical it is for the Jaguar, but it seems to cover some good angles.

 

https://ia800201.us.archive.org/13/items/Mathematics_For_Game_Developers_2004/Mathematics_For_Game_Developers_2004.pdf

Edited by philipj

StunRunner has a bit of physics when you hit those tunnels... If you hit the corners the wrong way, you'll slow the vehicle down.

That's not physics. That's just the same game feature as hitting speed pads (just reversed).

 

 

Here's something for reference; a book called "Mathematics for Game Developers". Seems to cover a lot of stuff including game physics... Don't know how practical it is for the Jaguar, but it seems to cover some good angles.

I have plenty of similar books, but they require at least a 10x faster CPU than the Jag has, so their practicality for the Jaguar is approximately zero.

 

I could use those books in the past, because I was coding for PC, which at the time was ~1.0 GHz. Not 0.01 GHz like the Jag's 68000.


I just spent 2.5 hrs watching every Saturn game in a YT vid, looking for a game that would be set in a similar setting. Holy shit...

 

I have just acquired a tremendous gratitude for having been a PC gamer at the time and not having been exposed to such incredible shit. That must have been ....traumatizing at the time, growing up on that shit, considering what was available on PC - I am truly speechless, but you do indeed learn something new every day...

 

Anyway, I found a grand total of ONE similar game in the Saturn library: Cyber Speedway

 

Unsurprisingly, despite 3rd person view, it's copying everything from Wipeout.

 

 

So, it looks like StunRunner remains the only viable candidate for non-wipeout-esque physics - assuming nobody else can come up with a similar game on a different platform. I don't think I have the stomach to watch the same for N64 anyway, and it's obvious that if Saturn didn't have a lot of those games, other platforms won't either.


Cyber Speedway actually came out before Wipeout when it was originally released in Japan!

Does Aero Gauge on the N64 meet your requirements? It's nothing special, just a decent little game. Can't hold a candle to Wipeout.

 


Wow. You deliver Every. Single. Time. I don't know how you do it, but that's mighty impressive!

 

 

 

The physics is actually quite an overkill. Unlike Cyber Speedway, which doesn't really react to environment - it merely plays out several physics presets animations (as in, "animation" of the values), this thing actually fully reacts to environment.

 

Emulating this on the Jag would mean spending a substantial portion of 1 frame just on physics - this game clearly implements multi-vector rigid body mechanics. I could probably devote some portion of the DSP's time to this, but even though the DSP doesn't know it yet, I already have some other plans for it, besides playing audio.

 

 

I will keep watching it over the next few days and see if I can come up with an approximation of some kind. I'd sacrifice the performance required for fixed-point computations if it meant having something like this, for sure!

 

By any chance, you got some other tips like this ? Last time you showed an amazing GBA game (Hot Wheels Stunt Track challenge). Is there something like this on GBA that you know of ?


Have you tried looking at:

 

Jet Moto 1-3 PlayStation

 

Powerdrome ST/Amiga

 

Slipstream 500 PC

 

Astro Go! Go! SNES..though that's more F-Zero..

 

Dodgem Arena PlayStation :

 

 

 

Looking at Android based games :

 

 

Cyber Race PC:

Edited by Lost Dragon

Thanks for the tips.

 

 

Powerdrome : I really like the level design of it. Looks decently optimized for ST. Thanks for reminding me of the name - I saw the game a long time ago, but couldn't find it recently.

 

Astro go go: it's a good example of where not to go with physics. I'm pretty sure even my current physics is more complex than that, but just because other games got away with it, doesn't mean I should...

 

Turbo Fly : you know what ? I know it's Android, and highly likely a 100x faster chip than the Jag's, but it's not a complex triple-axis physics like in that N64 Aero Gauge game. It only applies single-axis roll, but with a very nicely choreographed inverted counter-roll on the camera (damn, took me like 5 minutes to figure out what the hell was going on on the screen). Now, my 3D camera doesn't do the roll (it ain't free), but I think I could easily apply those interpolated "bouncing" counter strafes (instead of rolls).

 

No idea yet what it'll look like, but it definitely should bring some charm. Should certainly beat the current [mostly] static camera. I wouldn't have gotten this idea if I hadn't seen that vid, for sure. Thanks!


ST Powerdrome was a game that really made you work to love it.

 

If you had the patience to master the overly sensitive controls and memorise track layouts, you could get a lot from it.

 

If you didn't, you were going to hate it.

 

I never tried the Xbox reboot many years later.

 

Glad Turbo Fly has proven inspirational :-)

