
Any 3D game with flatshading on A800?


VladR


For a racing game, the cars would be too expensive to render on A800. The lowest possible "vehicle" mesh I can think of is something like Wipeout uses - just a few triangles, and you've got a ship.

Given how much 3D can be done in one frame's time, I think we could have 2 simple flatshaded enemies done in 1 frame. That would feel an order of magnitude better than some 2D PMG sprite...

 

 

 

I was just thinking about it. How about 1x1? The narrow mode for 320x192 would be 256x192. The coordinates would still fit within a byte.

Now, fullscreen flatshading in that resolution on A800 would be a slideshow.

But, let's take a look again at that Atari ST game: https://static.giantbomb.com/uploads/original/0/1987/1103706-hover_sprint_06.png

It's using roughly one third of the screen for the 3D view (though more than half of it is almost never covered with polygons). That's equivalent to 60 scanlines in 320x192. In Narrow mode, that would be a 256x60 window (15,360 px).

 

That's exactly the same number of pixels as in my current 160x96 window (15,360 px) :)
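A quick sanity check of that pixel math (the helper name is illustrative, not anything from actual code):

```python
# Sanity check: a 256x60 hi-res narrow-mode window holds exactly as many
# pixels as the current 160x96 window.
def pixels(width, height):
    return width * height

narrow_hires = pixels(256, 60)   # proposed 1-bpp narrow hi-res window
current = pixels(160, 96)        # current 4-color window
assert narrow_hires == current == 15_360
```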

 

Yes, we would have no colors (we would have to use dithering or DLIs (though those would just butcher the framerate with their STA WSYNCs)), but we'd gain an extremely sharp 3D viewport on A800!

 

And let's not forget that the unrolled inner scanline fill loop would fill scanlines at double the speed compared to 160x96, as a single STA would fill 8 pixels (not just 4, as in 160x96). All the other overhead (edges, Bresenham, scanline traversal, ...) would be basically identical, or actually smaller - since we'd have just 60 scanlines in our 3D window (not 96 as right now) - which would be an instant speedup, as there's tremendous overhead per scanline.
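A rough cost model of that fill-loop claim. STA abs,X at 5 cycles is the standard 6502 figure; loop and setup overhead are ignored, so this is a sketch, not a measurement of the actual routine:

```python
# One STA abs,X writes one byte. At 4 px/byte (4-color mode) vs 8 px/byte
# (hi-res), the same pixel span needs half the stores in hi-res.
STA_ABS_X_CYCLES = 5   # 6502 cycle count for STA absolute,X

def fill_cycles(span_px, px_per_byte):
    # cycles spent on the unrolled store instructions alone
    return (span_px // px_per_byte) * STA_ABS_X_CYCLES

# Filling a full 256-pixel scanline: hi-res takes exactly half the cycles.
assert fill_cycles(256, 8) * 2 == fill_cycles(256, 4)
```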

 

Interesting :)

Highres dither might look cool. And having it in narrow mode might not kill you with DMA stealing.


Just to say it right away... when talking to Paul, he never mentioned the term raycasting... it's simple "3D" rendered with a span buffer or a filler.

 

Just because it's a "grid" or maze doesn't mean it's a raycaster. He simplified the 3D, as in a maze you have several optimizations due to rectangles. But no, it's not a raycaster, where you shoot rays per screen column.

 

Where do you have that information?

 

"Simplified 3D" ...."not Raycaster" .....

 

 

Let's have a checkup of how Wayout looked on the 8-bits... framecount not needed ;)

 

https://youtu.be/ms0TkSxgCeI?t=108

https://youtu.be/frFvZwa_5bo?t=37

https://youtu.be/SRNVMf5GCuw?t=145

Edited by emkay

A closer look shows the most outstanding "projection" is on the A8. It's not just a little better, it's like a newer generation of hardware...

 

But where does that speed come from?

 

It uses the narrow playfield, 32 bytes.

It uses the double scanline mode, so every 2nd line has no DMA cycle stealing...

 

 

A coarse counting of frames shows:

9 fps in the Atari version

6 fps in the C64 version

 

both synchronized to the display...

 

The Apple version shows 6-8 fps in a jumpy way.

Edited by emkay

Just to say it right away... when talking to Paul, he never mentioned the term raycasting... it's simple "3D" rendered with a span buffer or a filler.

 

Just because it's a "grid" or maze doesn't mean it's a raycaster. He simplified the 3D, as in a maze you have several optimizations due to rectangles. But no, it's not a raycaster, where you shoot rays per screen column.

Actually, it can be seen directly on YT that Wayout is not a raycaster. The way the polygon edges unfold/move looks and feels different from the way a raycaster handles it. We can see here that the edges are computed (not merely unfolded, as in a raycaster).

I personally always hated that typical 2D feel of walls in raycasters.

 

It's a minor visual thing, but it's there. Also, a raycaster would just totally get stuck in the depths of that maze. Notice how the framerate is more-or-less consistent. If it were a raycaster, certain heavy areas would just fall down to sub-1 fps (due to the nature of the raycasting algorithm, where it just keeps traversing - and that stuff at 1.79 MHz would just blow up - hell, I recall a 386DX@40 MHz having framedrops in certain areas of Spear of Destiny), yet they don't in Wayout.
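The cost argument can be sketched with a minimal per-column march (the grid, step sizes, and step cap are illustrative; a real raycaster uses a proper DDA with per-axis distances):

```python
# Minimal grid march for one screen column, to show why a raycaster's
# cost depends on how far each ray travels: down a long open corridor
# the loop runs many more iterations than next to a wall.
def march(grid, x, y, dx, dy, max_steps=64):
    steps = 0
    while steps < max_steps:
        x += dx
        y += dy
        steps += 1
        if grid[int(y)][int(x)]:   # hit a wall cell
            break
    return steps

maze = [[1, 1, 1, 1, 1, 1],
        [1, 0, 0, 0, 0, 1],
        [1, 1, 1, 1, 1, 1]]

# A ray fired from the near end of the corridor takes 4x the steps of one
# fired right next to the far wall - and the cost is paid per column.
assert march(maze, 1.5, 1.5, 1.0, 0.0) > march(maze, 4.0, 1.5, 1.0, 0.0)
```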

 

What's the viewport size in Wayout? 64x48?


Yeah, it's mirroring the bottom half with the display list - same as Capture the Flag.

 

Capture the Flag doesn't slow down even when you play around with the code...

 

E.g. I have the disassembled source, and you can play with the projection and scale the window down or up...

 

In a raycaster that would not be possible without altering more of the code (the ray offset tables must fit the scale table, which must fit the trig tables, etc.).

 

And raycasting means marching rays through screen columns... in CTF we are talking about 128 columns... the raycaster in Arsantica 3 rendered 64x48... the one in Asskicker is smaller, etc., and all of them have different speeds...

 

And the quality of the rendering in CTF is too good for it to be a raycaster...

Edited by Heaven/TQA

Capture the Flag shows approx. 7 fps while it's calculating the 2 projections.

There is also the difference that CTF isn't that symmetric. It's different in the upper and lower parts.

Referring to "racing the beam": while ANTIC is doing a lot to show stable content on the screen, parts of the walls could get different visuals if the content were changed in some DLIs... The CPU got a lot of free time on the PAL machines though...


Nope... no free time, master... it's double buffering, so while 1 frame is shown it's already calculating the 2nd. Don't know where you always get this "plenty of free time" idea ;)

Hmmm... As the graphics updates are synchronized to the vertical blank, the NTSC version gets 60 and the PAL version 50 update ticks per second. So the "tick" for every frame is longer on the PAL machines, and the CPU has to wait for it. This makes the PAL version approximately 1 fps slower... If you cannot measure a difference in fps after changing the code, it simply means the wait for the next VB got shorter...
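That argument can be modeled as quantizing the render time up to the next VB tick (the 0.145 s render time is an illustrative assumption, not a measured figure from CTF):

```python
import math

# Effective fps when frame completion is quantized to the vertical blank:
# the CPU finishes the buffer, then idles until the next VB tick.
def effective_fps(render_seconds, ticks_per_second):
    ticks_needed = math.ceil(render_seconds * ticks_per_second)
    return ticks_per_second / ticks_needed

# With a ~0.145 s render time, NTSC rounds up to 9 ticks of 1/60 s while
# PAL rounds up to 8 ticks of 1/50 s - roughly the "1 fps slower" effect.
ntsc = effective_fps(0.145, 60)   # 60/9 ≈ 6.67 fps
pal = effective_fps(0.145, 50)    # 50/8 = 6.25 fps
assert ntsc > pal
```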

Edited by emkay

But do you know what I am referring to?

One frame in the buffer

one frame on the screen

finish the buffer, wait for next VB

set buffer to the screen

set screen to the buffer

create buffer

finish the buffer, wait for next VB

....

 

 

 

 

Without that, the game would run at the full available CPU speed.
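The flip sequence above, sketched in Python (all names are illustrative; this is the general double-buffering pattern, not the actual game code):

```python
# Double-buffered loop: render into the back buffer, wait for the
# vertical blank, then swap which buffer is on screen.
def run(frames, render, wait_for_vblank, show):
    front, back = 0, 1              # buffer indices
    for _ in range(frames):
        render(back)                # finish the buffer off-screen
        wait_for_vblank()           # CPU idles here until the next VB
        front, back = back, front   # set buffer to the screen / screen to the buffer
        show(front)                 # newly finished frame goes on display
```

Hooked up with stub callbacks, the call order is render, wait, show - repeated with the buffers alternating, which is exactly where the forced wait before each swap comes from.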


Ok... maybe I interpreted this wrong then:

 

Capture the Flag shows approx. 7 fps while it's calculating the 2 projections.

There is also the difference that CTF isn't that symmetric. It's different in the upper and lower parts.

Referring to "racing the beam": while ANTIC is doing a lot to show stable content on the screen, parts of the walls could get different visuals if the content were changed in some DLIs... The CPU got a lot of free time on the PAL machines though...

 

I meant the last sentence, about plenty of free time in combination with DLIs.


Over the weekend I was experimenting with clipping against the screen edges - currently just the top & bottom edges (as my current "tunnel" dataset (1,981 pixels / 6 quads) doesn't cross the left & right screen edges). Through various tweaks and versions I found out it was actually fastest to adjust the transformation stage too, rather than leave clipping as a separate stage (disconnected from the transform stage of the pipeline), by centering the screen-space Y coordinate around 128.

 

Thus, I can do clipping without:

- using signed math

- going to 16 bits

 

Win:Win :)
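A sketch of that centering idea, assuming a 96-line window centered at Y=128 (the actual window bounds and routine aren't given, so these constants are illustrative):

```python
# Centering screen-space Y around 128 keeps "above the top" and "below
# the bottom" both inside the unsigned 0..255 range, so the clip can be
# done with plain unsigned comparisons - no signed math, no 16 bits.
TOP, BOTTOM = 128 - 48, 128 + 48   # 80..176, a 96-line window

def clip_y(y_centered):
    # y_centered is an unsigned byte, 0..255
    return max(TOP, min(BOTTOM, y_centered))

assert clip_y(10) == TOP        # far above the window
assert clip_y(250) == BOTTOM    # far below the window
assert clip_y(130) == 130       # inside: untouched
```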

 

So I slowed down the transform stage a bit, but on the other hand gained a lot, since now I don't have to do the expensive check for each scanline (those add up real ugly). Instead, the checks happen only once per polygon, so my total cycle count went up a tiny bit (for the functionality it brings) - from 22,253 to 22,337 (for the case when the polygon is not clipped against the edges, and a bit more when clipped).
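The per-polygon vs. per-scanline trade-off, as a toy cost model (the cycle figure for one check is illustrative, not derived from the measured 22,253/22,337 numbers):

```python
# A per-scanline check is paid height-many times for every polygon; a
# per-polygon check is paid once per polygon.
def per_scanline_cost(check_cycles, scanlines, polygons):
    return check_cycles * scanlines * polygons

def per_polygon_cost(check_cycles, polygons):
    return check_cycles * polygons

# e.g. 6 quads, ~40 scanlines tall each, a 10-cycle check:
saved = per_scanline_cost(10, 40, 6) - per_polygon_cost(10, 6)
assert saved == 2_340   # the per-scanline checks "add up real ugly"
```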

 

Obviously, due to the nature of the math, polygons become inverted once they cross behind the camera, so now I need to implement culling (i.e. removing them from processing once they're behind the camera), but then I should be able to capture a video.

 

It's funny I didn't come up with this technique during my Jag flatshading earlier this year, as over there I'm still doing the Y-clipping per scanline :)

 

So when I eventually go back to Jag coding, I can make the Jaguar rasterizer faster because of the A800 experiments, which is really funny :)


Highres dither might look cool. And having it in narrow mode might not kill you with DMA stealing.

Actually, a HiRes Narrow 3D window (with the same number of pixels) should steal only 50% of the cycles, as it's just 50% of the bytes:

1. 160x96 @ 2 bpp (4 px/byte) = 15,360 pixels = 3,840 bytes

2. 256x60 @ 1 bpp (8 px/byte) = 15,360 pixels = 1,920 bytes
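In byte terms, interpreting the two modes above as 2 bpp vs. 1 bpp:

```python
# Frame-buffer size: bytes = pixels * bits-per-pixel / 8.
def frame_bytes(width, height, bpp):
    return width * height * bpp // 8

assert frame_bytes(160, 96, 2) == 3_840   # 4-color window, 2 bits/pixel
assert frame_bytes(256, 60, 1) == 1_920   # hi-res window, 1 bit/pixel
assert frame_bytes(256, 60, 1) * 2 == frame_bytes(160, 96, 2)
```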

 

But dithering will be slightly more expensive, I presume, as I have to somehow choose a dithering pattern for each scanline (most probably this check & pattern set will happen only once per polygon).

 

Still, doing this in HiRes should be much faster (in theory). I'll probably try an experiment soon...


Not sure I follow... do you clip in 2D or in 3D?

Right now just 2D, so there might be precision issues with very large polygons (given I'm using just 8 bits) - but that's something that can be worked around by adjusting the dataset (e.g. creating smaller polygons).

 

EDIT: As I mentioned earlier, this is all an experiment in how far I can get with just 8-bit precision (I can always switch to 16 bits, which will give me roughly an order of magnitude more precision, but at a cost). A week ago I would have sworn it's impossible to do Y-clipping at the cycle cost I'm currently getting, so I really have no idea how far I'll be able to push the 8-bit pipeline. But I'm almost there (I just need to do X-clipping).

 

But clearly, it's possible (to a degree, of course) - and now it's just a matter of experimenting with various polygon sizes and camera distances to see what kind of 3D world it's possible to move through with just 8 bits of precision.

I fully intend to find out the limits of the 8 bits of precision.
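As a toy illustration of where those limits live (the projection formula and constants here are assumptions for illustration, not VladR's actual pipeline):

```python
# With a simple perspective projection x' = x * d / z, the screen
# coordinate overflows a byte once the vertex gets close enough to the
# camera - which is exactly why polygon sizes and camera distances
# have to be tuned to the 8-bit range.
def project_x(x, z, d=64):
    # d is an assumed projection distance
    return x * d // z

assert 0 <= project_x(60, 32) <= 255   # comfortable distance: fits a byte
assert project_x(60, 8) > 255          # too close: no longer fits a byte
```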

 

But I'm not alone in this. You guys have implemented this before, so you can share ideas and guide me through the process when I get stuck. Which is awesome!

Edited by VladR

Is "2D clamping" a visual thing, or is it for performance?

 

If it's just a visual thing... what about using a "dirty buffer"? Meaning: reserve RAM above the top / below the bottom of the screen so you can actually overdraw without glitches.

 

Same with left/right, thanks to ANTIC... it doesn't help you performance-wise, but...

 

I did that several times... ok, it's cheating... ;) but it was common practice on the PSX, too. :D


Is "2D clamping" a visual thing, or is it for performance?

If it's just a visual thing... what about using a "dirty buffer"? Meaning: reserve RAM above the top / below the bottom of the screen so you can actually overdraw without glitches.

Same with left/right, thanks to ANTIC... it doesn't help you performance-wise, but...

I did that several times... ok, it's cheating... ;) but it was common practice on the PSX, too. :D

Cheating is OK - that's what we've gotta do, as this is just 1.79 MHz. Hell, I cheated like crazy on the Jag too (well, compared to my reference software rasterizer on PC, that is), and that beast has 52 MHz in 2 RISCs, 13 MHz in the 68000, plus the OP and the Blitter.

 

On the Jag I came up with the same scheme of "dirty regions" (not knowing it already had a name - but that's the fun of doing research), but didn't finish it, as early benchmarks showed no significant overall improvement and it cost a lot of code space (there's just 4 KB of GPU cache for code).

 

Besides - and at the time I didn't know about this yet - once a polygon is super close to the camera (e.g. just about to leave the view frustum), it becomes extremely large in out-of-screen coordinates - coordinates like -1000 (or so). So it would be extremely prohibitive to keep rendering it.

 

And if you cull it before it becomes that big, it results in an ugly artifact - the visible portion of the polygon right in front of the camera just disappears.

 

I was later able to tweak it by adjusting vertices and camera distances, but it never really disappears completely.

 

So scanline clipping is still necessary. In theory, it should be possible to adjust the view frustum in a way that does not result in large magnification of the post-transform vertices (at which point the whole unclipped polygon would fit within the dirty region), but I don't believe the cycle cost involved would offset the cost of clipping (i.e. I suspect it would just slow everything else down). Especially given that I just did Y-clipping in just a few dozen cycles.
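The blow-up that makes dirty regions insufficient can be seen in a toy projection (the formula and the projection distance are illustrative assumptions):

```python
# As a vertex approaches the camera plane (z -> 0), its projected
# coordinate grows without bound, so an unclipped polygon can easily land
# at coordinates like -1000 - far beyond any practical dirty region.
def project(x, z, d=160):
    # d is an assumed projection distance
    return x * d / z

xs = [project(-10, z) for z in (100, 10, 1)]
assert xs == [-16.0, -160.0, -1600.0]   # 100x growth over this approach
```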

 

It might be a fun experiment one of these days, but there are a few other things currently taking way more cycles than I'd like, so this one goes onto the "to-do-later" list :)


I do clipping in 3D, because I have to... I can have a line starting in front of the camera and ending behind the camera. So doing the perspective divide first and clipping later just doesn't work. You could probably clip against some kind of 'near plane' parallel to the screen plane and then do 2D, but it still seems prone to overflows to me. But then, you can control the world and camera, so the problematic cases might never happen.

Edited by R0ger

I do clipping in 3D, because I have to... I can have a line starting in front of the camera and ending behind the camera.

That's exactly what clipping is - the polygon crosses the screen edge(s). In your case it's just a line, not a full polygon, but it's basically the same scenario - you can't just draw the whole thing (well, with "dirty regions" you technically could).

 

You could probably clip against some kind of 'near plane' parallel to the screen plane and then do 2D, but it still seems prone to overflows to me.

I was running into overflow issues even with 32-bit computations on the Jaguar, so now I know that more bits don't necessarily solve that particular problem. You must adjust the feasible range of values for both the vertices and the camera.

32-bit values don't really bring that much more precision over 16-bit once we're talking about some meaningful viewing-distance & precision combo.

This precision issue became especially apparent when I was experimenting on the Jag with level-of-detail large terrains - see https://www.youtube.com/watch?v=-scNWhRFDh0

At first I was puzzled, but then I spent half an hour in Excel and found out how easy it actually is to run out of 32 bits. Good experience :)
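A sketch of the kind of arithmetic that runs out of 32 bits (16.16 fixed point is an assumption for illustration; the actual Jaguar code may use a different format):

```python
# A 16.16 fixed-point multiply of two world-scale values: each operand
# fits comfortably in 32 bits, but the product of the multiply no longer
# does - exactly the kind of overflow half an hour in a spreadsheet reveals.
INT32_MAX = 2**31 - 1
ONE = 1 << 16                  # 1.0 in 16.16 fixed point

a = 300 * ONE                  # e.g. a terrain coordinate
b = 500 * ONE                  # e.g. a scaled viewing distance
assert a <= INT32_MAX and b <= INT32_MAX   # both operands fit in int32

product = (a * b) >> 16        # the 16.16 result (Python ints don't wrap)
assert product > INT32_MAX     # ...but on real 32-bit hardware this overflows
```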

 

But then, you can control the world and camera, so the problematic cases might never happen.

Well, if you take a look at the video above, I was eventually able to minimize it drastically, but you can still see polygons disappearing in front of the camera. As I didn't want to sacrifice viewing distance (and the unavoidable visual artifacts related to that), I chose a quick hack: brutally culling those polys close to the camera.

 

Now I know I could just spend an hour in Excel and find the proper range of vertices vs. camera position without introducing any artifacts.

 

 

 

But that's the experience - you don't know what you don't know until you know it, so next time (on the Jag, I mean) I'll do it better :)

 

EDIT: Which is exactly what I did for the 8-bit transformations. If I didn't have that experience on the Jag - that you can "massage" the values into non-problematic ranges - I am 100% sure I would have just discarded 8-bit precision altogether and switched to the much slower 16-bit.

 

Now, of course, the 8-bit pipeline won't be able to handle large worlds like in Oblivion. But it looks like it just might be able to handle medium-size rooms for an FPS genre (e.g. a tunnel), and certainly close-ups of 3D meshes (e.g. spaceships). That's enough for me for now.

Edited by VladR
