Any 3D game with flatshading on A800 ?

Heaven/TQA · September 17, 2017

Raycasting in games was Carmack/id with Hovertank.... that's where it got used.

Some mentioned the same for Rescue on Fractalus or other Lucasfilm games using raycasting... I really doubt it, as it was not used in a game scenario... we are talking about 1983-1985....

Heaven/TQA · September 17, 2017

For a racing game, the cars would be too expensive to render on A800. The lowest possible "vehicle" mesh I can think of is something like what Wipeout uses - just a few triangles, and you've got a ship.

Given how much 3D can be done in one frame's time, I think we could have 2 simple flatshaded enemies done in 1 frame. That would feel an order of magnitude better than some 2D PMG sprite...

I was just thinking about it. How about 1x1? The narrow mode for 320x192 would be 256x192. The coordinates would still fit within a byte.

Now, a fullscreen flatshading in that resolution on A800 would be a slideshow.

But, let's take a look again at that Atari ST game: https://static.giantbomb.com/uploads/original/0/1987/1103706-hover_sprint_06.png

It's using roughly one third of screen for the 3D view (though, more than half of it is almost never covered with polygons). That's equivalent to 60 scanlines in 320x192. In Narrow mode, that would be a window 256x60 (15,360 px).

That's actually exactly same amount of pixels as in my current 160x96 (15,360 px)

Yes, we would have no colors (would have to use dithering or DLIs (though those would just butcher the framerate with its STA WSYNCs)), but we'd gain an extremely sharp 3D viewport on A800 !

And let's not forget that the unrolled inner scanline fill loop would fill scanlines at double speed compared to 160x96, as a single STA would fill 8 pixels (not just 4 pixels, as in 160x96). All the other overhead (edges, Bresenham, scanline traversal, ...) would be basically identical, or actually smaller - since we'd have just 60 scanlines in our 3D window (not 96 as right now) - which would be an instant speedup, as there's tremendous overhead per scanline.
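To put a number on the "8 pixels per STA" point, here is a minimal sketch that counts the byte stores needed to fill one horizontal span at a given colour depth. The function name `stores_per_span` is hypothetical, and the sketch ignores the masking of partial bytes at the span edges.

```python
def stores_per_span(x_start: int, x_end: int, bits_per_pixel: int) -> int:
    """Count the byte stores needed to fill pixels x_start..x_end (inclusive)."""
    pixels_per_byte = 8 // bits_per_pixel
    first_byte = x_start // pixels_per_byte
    last_byte = x_end // pixels_per_byte
    return last_byte - first_byte + 1


# A 64-pixel span that starts on a byte boundary:
print(stores_per_span(0, 63, 2))  # 160x96 mode: 4 pixels per byte
print(stores_per_span(0, 63, 1))  # hires mode: 8 pixels per byte
```

For the same number of pixels, the 1-bit mode needs half as many stores as the 2-bit mode.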

Interesting

Highres dither might look cool. And having it in narrow mode might not kill you in DMA stealing.

emkay · September 17, 2017

Just to say it right away... when talking to Paul he never mentioned the term raycasting... it's a simple "3d" rendered with a span buffer or a filler.

Just because it's a "grid" or maze does not make it a raycaster. He simplified the 3d, as in a maze you have several optimizations due to rectangles. But no, it's not a raycaster where you shoot rays per screen column.

Where do you have that information?

"Simplified 3D" ...."not Raycaster" .....

Let's have a checkup on how Wayout looked on 8 bits... framecount not needed

https://youtu.be/ms0TkSxgCeI?t=108

https://youtu.be/frFvZwa_5bo?t=37

https://youtu.be/SRNVMf5GCuw?t=145

Edited September 17, 2017 by emkay

emkay · September 17, 2017

A closer look shows the most outstanding "projection" is on the A8. It's not just a little, it's like a newer generation of hardware ...

But where does that speed come from ?

It uses the small playfield , 32 bytes

It uses the double scanline mode, so every 2nd line uses no DMA Cycle Stealing there...

A coarse counting of frames shows

9fps in the Atari version

6fps in the C64 version

both synchronized to the display...

The Apple version shows 6-8fps in a jumpy way.

Edited September 17, 2017 by emkay

VladR · September 17, 2017

Just to say it right away... when talking to Paul he never mentioned the term raycasting... it's a simple "3d" rendered with a span buffer or a filler.

Just because it's a "grid" or maze does not make it a raycaster. He simplified the 3d, as in a maze you have several optimizations due to rectangles. But no, it's not a raycaster where you shoot rays per screen column.

Actually, it can be seen directly on YT, that Wayout is not a raycaster. The way that the polygons edges unfold/move looks and feels different to the way raycaster handles it. We can see that here, the edges are computed (not merely unfolded, as in a raycaster).

I personally always hated that typical 2D feel of walls in raycasters.

It's a minor visual thing, but it's there. Also, a raycaster would just totally get stuck on the depths of that maze. Notice how the framerate is more-or-less consistent. If it was a raycaster, then certain heavy areas would just fall down to sub-1 fps (due to the nature of the raycasting algorithm, where it just keeps traversing - and that stuff on 1.79 MHz would just blow out - hell, I recall a 386DX@40 MHz having framedrops in certain areas of Spear of Destiny), yet they don't in Wayout.
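The "keeps traversing" cost can be illustrated with a minimal grid ray march (DDA). This is a generic sketch of how a raycaster walks a ray through map cells. It is not code from Wayout or any game mentioned here, and the map layout and function name are made up for illustration. The number of cells visited, and so the work per screen column, grows with the length of the open corridor.

```python
import math


def cast_ray(grid, ox, oy, angle, max_steps=64):
    """March a ray cell by cell through a closed grid map.

    Returns (cells_visited, distance_to_wall)."""
    dx, dy = math.cos(angle), math.sin(angle)
    cx, cy = int(ox), int(oy)
    step_x = 1 if dx > 0 else -1
    step_y = 1 if dy > 0 else -1
    # Ray length needed to cross one whole cell in x or in y.
    delta_x = abs(1 / dx) if dx else math.inf
    delta_y = abs(1 / dy) if dy else math.inf
    # Ray length to the first vertical / horizontal cell border.
    if dx == 0:
        side_x = math.inf
    else:
        side_x = ((cx + 1 - ox) if dx > 0 else (ox - cx)) * delta_x
    if dy == 0:
        side_y = math.inf
    else:
        side_y = ((cy + 1 - oy) if dy > 0 else (oy - cy)) * delta_y
    for steps in range(1, max_steps + 1):
        if side_x < side_y:
            dist, side_x, cx = side_x, side_x + delta_x, cx + step_x
        else:
            dist, side_y, cy = side_y, side_y + delta_y, cy + step_y
        if grid[cy][cx] == "#":
            return steps, dist
    return max_steps, math.inf


SHORT = ["###", "#.#", "###"]
LONG = ["##########", "#........#", "##########"]

print(cast_ray(SHORT, 1.5, 1.5, 0.0))  # wall is right next to the viewer
print(cast_ray(LONG, 1.5, 1.5, 0.0))   # same ray, long corridor, more steps
```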

What's the viewport size on Wayout ? 64x48 ?

Irgendwer · September 17, 2017

What's the viewport size on Wayout ? 64x48 ?

Maybe, but as he AFAIK mirrors the top to bottom via Antic LMS the calculation costs are halved.
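A rough sketch of how such a mirror can be set up: every display list line carries its own LMS address, and the bottom half points back at the top-half rows in reverse order, so only half of the rows are ever rendered. The ANTIC mode ($0D), the 48 rows, the 32 bytes per row and the addresses are assumptions for illustration. They are not taken from a Wayout disassembly.

```python
LMS = 0x40       # "load memory scan" bit in an ANTIC mode instruction
JVB = 0x41       # jump and wait for vertical blank
BLANK_8 = 0x70   # 8 blank scanlines


def build_mirrored_display_list(dl_address, screen_base, rows,
                                bytes_per_row, mode=0x0D):
    """Return display list bytes where the bottom half re-reads the top half."""
    dl = [BLANK_8, BLANK_8, BLANK_8]
    half = rows // 2
    # Row order on screen: 0, 1, ..., half-1, half-1, ..., 1, 0
    order = list(range(half)) + list(range(half - 1, -1, -1))
    for row in order:
        addr = screen_base + row * bytes_per_row
        dl += [mode | LMS, addr & 0xFF, addr >> 8]
    dl += [JVB, dl_address & 0xFF, dl_address >> 8]
    return dl


dl = build_mirrored_display_list(0x3000, 0x4000, rows=48, bytes_per_row=32)
print(len(dl), "bytes of display list,", 24 * 32, "bytes of screen RAM rendered")
```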

Heaven/TQA · September 17, 2017

Yeah, it's mirroring the bottom half with the display list, same as Capture the Flag.

Capture the flag doesn't slow down even when you play around with the code...

E.g. I have the disassembled source and you can play with the projection and scale down or up the window...

In a raycaster that would not be possible without altering more in code. (Ray offset tables must fit to the scale table, must fit to trig tables, etc.)

And raycasting with marching rays through screen columns... we are talking in CTF about 128 columns... the raycaster in Arsantica 3 rendered 64x48.... the one in Asskicker is smaller etc. and all of them have different speed...

And the quality of the rendering in CTF is too good to be a raycaster...

Edited September 17, 2017 by Heaven/TQA

Heaven/TQA · September 17, 2017

Ah, I remember when I looked into Wayout... it has a less than 64x rendering area, as it covers the left/right "garbage" of each scanline with sprites.

emkay · September 17, 2017

Capture the Flag shows approx. 7 fps, while it's calculating the 2 projections.

There is also the difference that CTF isn't that symmetric. It's different in the upper and lower part.

Referring to "racing the beam", while Antic is doing a lot to show the stable content on the screen, parts of the walls could get some different visuals, if the content was changed in some DLIs... CPU got a lot of free time on the PAL Machines though...

Heaven/TQA · September 17, 2017

Nope... no free time master....it's double buffering so when 1 frame showed it calcs already 2nd. Don't know where you always think of having "plenty of free" time

emkay · September 17, 2017

Nope... no free time master....it's double buffering so when 1 frame showed it calcs already 2nd. Don't know where you always think of having "plenty of free" time

Hmmm.. As the graphics were adjusted to the vertical blank, the NTSC version has 60 times and the PAL version has 50 times per second for updates. So the "tick" for every frame is longer on the PAL machines. The CPU has to wait for that. This makes the PAL version approximately 1 fps slower.... If you cannot count a difference in changing the code and resulting fps, simply the wait for the next VB is shorter...

Edited September 17, 2017 by emkay

Heaven/TQA · September 17, 2017

But you know what I am referring to?

emkay · September 17, 2017

But you know what I am referring to?

One frame in the buffer

one frame on the screen

finish the buffer, wait for next VB

set buffer to the screen

set screen to the buffer

create buffer

finish the buffer, wait for next VB

....

without that, the game would run full on the available cpu speed.
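The effect of that wait can be sketched numerically: with buffer swaps locked to the vertical blank, a frame that takes N.x refreshes to render is only shown after N+1 refreshes. The clock and scanline figures below are the usual NTSC/PAL machine values. The 200,000-cycle render cost is a made-up number, and the sketch ignores cycles lost to ANTIC DMA and memory refresh.

```python
import math

CYCLES_PER_SCANLINE = 114
# system -> (CPU clock in Hz, scanlines per frame)
SYSTEMS = {"NTSC": (1_789_773, 262), "PAL": (1_773_447, 312)}


def displayed_fps(render_cycles, system):
    """Frame rate when every buffer swap waits for the next vertical blank."""
    clock_hz, scanlines = SYSTEMS[system]
    cycles_per_frame = scanlines * CYCLES_PER_SCANLINE
    frames_waited = math.ceil(render_cycles / cycles_per_frame)
    return clock_hz / (frames_waited * cycles_per_frame)


for system in SYSTEMS:
    print(system, round(displayed_fps(200_000, system), 2))
```

Whether PAL or NTSC comes out ahead depends on where the render cost falls relative to the frame boundaries, so the comparison can go either way for other cycle counts.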

Heaven/TQA · September 17, 2017

And how does this then refer to "plenty of time"?

emkay · September 17, 2017

Are you going to get silly now too?

"A lot" is "viel" (much)

"Plenty" is "im Überfluss" (in abundance)

This refers to the available cycles that weren't used in PAL machines. So you can do more per frame.

Heaven/TQA · September 17, 2017

Ok... maybe I interpreted this wrong then:

Capture the Flag shows approx. 7 fps, while it's calculating the 2 projections.

There is also the difference that CTF isn't that symmetric. It's different in the upper and lower part.

Referring to "racing the beam", while Antic is doing a lot to show the stable content on the screen, parts of the walls could get some different visuals, if the content was changed in some DLIs... CPU got a lot of free time on the PAL Machines though...

The last sentence with plenty of time in combinations with DLI.

emkay · September 17, 2017

"Capture the flag doesn't slow down even when you play around with the code..." ...

VladR · September 18, 2017

During the weekend I was experimenting with clipping against screen edges, currently just the top and bottom edges (as my current "tunnel" dataset (1,981 pixels / 6 quads) does not cross the left and right screen edges). Through various tweaks and versions I found it was actually fastest to adjust the transformation stage too, rather than leave clipping as a separate stage disconnected from the transform stage of the pipeline: the transform now centers the screen-space y-coordinate around 128.

Thus, I can do clipping without:

- using signed math

- going to 16 bit

Win:Win

So, I slowed down the transform stage a bit, but on the other hand gained a lot, since now I don't have to do the expensive check for each scanline (those add up real ugly). Instead, the checks happen only once per polygon, so my total cycle count went up a tiny bit (for the functionality it brings) from 22,253 to 22,337 (for the case when the polygon is not clipped against the edges, and a bit more when it is clipped).
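One possible reading of the "center y around 128" trick, as a sketch only: if the transform outputs y already biased by 128, the visible 96 scanlines sit at 80..175, off-screen values land in 0..79 and 176..255, and the clip becomes two unsigned byte compares done once per polygon. The constants and function names are assumptions. The post does not show the actual code.

```python
Y_BIAS = 128
VIEW_HEIGHT = 96
TOP = Y_BIAS - VIEW_HEIGHT // 2      # first visible biased scanline (80)
BOTTOM = Y_BIAS + VIEW_HEIGHT // 2   # one past the last visible one (176)


def to_biased(y_centered):
    """Map a screen-centered y in -128..127 to an unsigned byte 0..255."""
    if not -128 <= y_centered <= 127:
        raise ValueError("does not fit in 8 bits")
    return y_centered + Y_BIAS


def clip_scanline_range(y_min, y_max):
    """Clip a polygon's biased scanline range once, before the fill loop.

    Returns None when the polygon is entirely above or below the window."""
    if y_max < TOP or y_min >= BOTTOM:
        return None
    return max(y_min, TOP), min(y_max, BOTTOM - 1)


# A polygon reaching 100 lines above the center and 10 lines below it:
print(clip_scanline_range(to_biased(-100), to_biased(10)))
```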

Obviously, due to the nature of the math, the polygons become inverted once they cross behind the camera, so now I need to implement culling (e.g. removing them from processing once they're behind camera), but then I should be able to capture a video.

It's funny I didn't come up with this technique during my jag flatshading earlier this year, as over there I'm doing the Y-clipping still per each scanline

So, when I eventually go back to jag coding I can make the jaguar rasterizer faster because of A800 experiments, which is real funny

R0ger · September 18, 2017

Not sure I follow .. you clip in 2D or 3D ?

VladR · September 18, 2017

Highres dither might look cool. And having it in narrow mode might not kill you in DMA stealing.

Actually, HiRes Narrow 3D window (with same amount of pixels) should take only 50% of cycles stolen, as it's just 50% bytes:

1. 160x96, 4 colors (2 bits per pixel) = 15,360 pixels = 3,840 Bytes

2. 256x60, 2 colors (1 bit per pixel) = 15,360 pixels = 1,920 Bytes

But, dithering will be slightly more expensive I presume, as I have to somehow choose a dithering pattern for each scanline (most probably this check&pattern set will happen only once per polygon).
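As a sketch of what "choosing a dithering pattern per scanline" could look like in a 1-bit mode: a 2x2 ordered (Bayer) dither gives five shade levels, and each level reduces to one fill byte for even scanlines and one for odd scanlines, which could be picked once per polygon. The matrix and the function name are illustrative choices, not something from the post.

```python
BAYER_2X2 = ((0, 2), (3, 1))  # threshold matrix, rows = scanline parity


def dither_bytes(level):
    """Return (even_line_byte, odd_line_byte) for shade level 0..4."""
    if not 0 <= level <= 4:
        raise ValueError("level must be 0..4")
    rows = []
    for threshold_row in BAYER_2X2:
        byte = 0
        for x in range(8):           # x = 0 ends up in the leftmost (MSB) pixel
            byte <<= 1
            if threshold_row[x % 2] < level:
                byte |= 1
        rows.append(byte)
    return rows[0], rows[1]


for level in range(5):
    even, odd = dither_bytes(level)
    print(level, f"{even:08b}", f"{odd:08b}")
```

The fill loop then stores `even` on even scanlines and `odd` on odd ones, so the per-byte cost stays a plain store.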

Still, doing this in HiRes should be much faster (in theory). I'll probably try an experiment soon...

VladR · September 18, 2017

Not sure I follow .. you clip in 2D or 3D ?

Right now just 2D, so there might be precision issues with very large polygons (and me using just 8 bits) - but that's something that can be worked around with adjusting a dataset (e.g. creating smaller polygons).

EDIT: As I mentioned earlier, this is all an experiment in how far I can get with just 8-bit precision (I can always switch to 16 bits, which will give me ~order of magnitude more precision, but at a cost). A week ago I would swear it's impossible to do Y-clipping at the cycle cost that I'm currently having, so I really have no idea, how far I'll be able to push the 8-bit pipeline. But I'm almost there (just need to do X-clipping).

But clearly, it's possible (to a degree, of course) - and now it's just a matter of experimenting with various sizes of polygons and camera distances to see what kind of 3D world is possible to move through, in just 8 bits of precision.

I fully intend to find out the limits of the 8 bits of precision.

But, I'm not alone in this. You guys have implemented this before, so can share the ideas and guide me in the process, when I get stuck. Which is awesome!

Edited September 18, 2017 by VladR

Heaven/TQA · September 18, 2017

is "2d clamping" a visual thing or because of later performance?

when just visual thing... what about using a "dirty buffer" means reserving ram above top/below bottom screens so you actually can overdraw without glitches.

same with left/right thanks to Antic... does not help you performance wise but...

I did that several times... ok... it's cheating... but was common practice on PSX, too.

VladR · September 18, 2017

is "2d clamping" a visual thing or because of later performance?

when just visual thing... what about using a "dirty buffer" means reserving ram above top/below bottom screens so you actually can overdraw without glitches.

same with left/right thanks to Antic... does not help you performance wise but...

I did that several times... ok... it's cheating... but was common practice on PSX, too.

Cheating is OK, that's what we gotta do, as this is just 1.79 MHz. Hell, I cheated like crazy on jag too (well, compared to my reference software rasterizer on PC, that is), and that beast has 52 MHz in 2 RISCs, 13 MHz in 68000, and OP and Blitter.

On jag I came up with the same scheme (not knowing it had a name already - but that's the fun in doing research) of "dirty regions", but didn't finish it, as early benchmarks showed no significant overall improvement, I just lost a lot of code (there's just 4 KB in GPU cache for code).

Besides, and at that time I didn't know about it yet, once the polygon is super close to camera (e.g. just about to leave the view frustum), it becomes extremely large in out-of-screen coordinates - coordinates like -1000 (or so). So, it would be extremely prohibitive to keep rendering it.
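The blow-up follows directly from the perspective divide. A minimal sketch, with a hypothetical focal length of 64 and made-up vertex values:

```python
def project(coord, z, focal_length=64):
    """Perspective-project one coordinate of a vertex at depth z > 0."""
    if z <= 0:
        raise ValueError("vertex is at or behind the camera")
    return coord * focal_length / z


# The same vertex (x = -50) as it approaches the camera:
for z in (64, 16, 3):
    print(z, round(project(-50, z)))
```

At z = 3 the projected x is already beyond -1000, far outside any guard band that could reasonably be reserved around the screen.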

And if you cull it before it becomes so big, it results in an ugly artifact - the visible portion of the polygon right in front of the camera just disappears.

I was later able to tweak it by adjusting vertices and camera distances, but this never really disappears completely.

So, scanline clipping is still necessary. In theory, it should be possible to adjust the view frustum in a way that does not result in large magnification of the post-transformed vertices (at which point the whole unclipped polygon would fit within the dirty region), but I don't believe that the cycle cost involved would offset the cost of clipping (e.g. I suspect it would just slow everything else down). Especially given the fact that I just did Y-clipping in just a few dozen cycles.

It might be a fun experiment, one of these days, but there's few other things that are currently taking way more cycles than I desire, so this will just make it into the "to-do-later" list

R0ger · September 18, 2017

I do clipping in 3D, because I have to .. I can have a line starting in front of the camera, and ending behind the camera. So doing perspective first and clipping later just doesn't work. You could probably clip with some kind of 'near plane' .. parallel to the screen plane .. and then do 2D, but still it seems prone to overflows to me. But then you can control the world and camera, so the problematic cases might never happen.
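A minimal sketch of the near-plane idea for a single line segment in camera space (z pointing into the screen). The function name and the near distance are assumptions, and it uses floating point for clarity where a real 6502 version would use fixed point.

```python
def clip_line_to_near_plane(p0, p1, near):
    """Clip segment p0-p1 (each an (x, y, z) tuple) against the plane z = near.

    Returns the visible part as (a, b), or None if it is entirely behind."""
    x0, y0, z0 = p0
    x1, y1, z1 = p1
    if z0 < near and z1 < near:
        return None                      # fully behind the near plane
    if z0 >= near and z1 >= near:
        return p0, p1                    # fully in front, nothing to do
    t = (near - z0) / (z1 - z0)          # where the segment crosses z = near
    hit = (x0 + t * (x1 - x0), y0 + t * (y1 - y0), near)
    return (hit, p1) if z0 < near else (p0, hit)


# One end behind the camera, one end in front of it:
print(clip_line_to_near_plane((0, 0, -1), (8, 0, 3), near=1))
```

After this step every remaining vertex has z >= near, so the perspective divide that follows is bounded by the chosen near distance.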

Edited September 18, 2017 by R0ger

VladR · September 18, 2017

I do clipping in 3D, because I have to .. I can have line starting in front of the camera, and ending behind the camera.

That's exactly what clipping is - the polygon crosses the screen edge(s) - in your case, it's just the line, not full polygon, but it's basically the same scenario - you can't just draw the whole thing (well, with "dirty regions" you technically could).

You could probably clip with some kind of 'near plane' ..parallel to screen plane .. and then do 2D, but still it seems prone to overflows to me.

I was running into overflow issue even with 32-bit computations on jaguar, so now know that more bits don't necessarily solve that particular problem. You must adjust the feasible range of values for both the vertices and the camera.

32-bit values do not really bring that much more precision over 16-bit, once we're talking about some meaningful viewing distance&precision combo.

This precision issue became especially an issue when I was experimenting on jag with level-of-detail large terrains - see https://www.youtube.com/watch?v=-scNWhRFDh0

At first I was puzzled, but then I spent half an hour in Excel, and found out how easy it actually is to run out of 32 bits. Good experience
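The kind of spreadsheet check described here can be sketched in a few lines. The 16.16 fixed-point format and the coordinate value of 200 are assumptions for illustration only, since the post does not say which format the Jaguar code used.

```python
FRACTION_BITS = 16  # assumed 16.16 fixed point


def to_fixed(value):
    """Convert a number to a 16.16 fixed-point integer."""
    return int(value * (1 << FRACTION_BITS))


def bits_needed(value):
    """Bits needed to hold value as a signed two's-complement integer."""
    return abs(value).bit_length() + 1


a = to_fixed(200.0)
b = to_fixed(200.0)
raw_product = a * b  # intermediate result before shifting back down

print(bits_needed(a), "bits for one operand")
print(bits_needed(raw_product), "bits for the raw product")
```

Even a modest coordinate of 200 needs far more than 32 bits for the intermediate product, which is why the usable range of vertices and camera positions has to be planned rather than assumed.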

But then you can control the world and camera, so the problematic cases might never happen.

Well, if you take a look at that video above, I was eventually able to minimize it drastically, but you can still see the polygons disappearing in front of camera. As I didn't want to sacrifice the viewing distance (and unavoidable visual artifacts related to that), I chose a quick hack, and that was brutally culling those polys close to camera.

Now I know I could just spend an hour in Excel, and find the proper range of vertices vs camera position without introducing any artifacts.

But, that's the experience - you don't know what you don't know until you know, so - next time (on jag, I mean) I'll do it better

EDIT: Which is exactly what I did for the 8-bit transformations. If I didn't have the above experience on jag, that you can "massage" the values into non-problematic ranges, I am 100% sure I would have just discarded the 8-bit precision altogether, and switched to the much slower 16-bit.

Now, of course, the 8-bit pipeline won't be able to address large worlds, like in Oblivion. But, it looks, like it just might be able to handle medium-size rooms for an FPS genre (e.g. tunnel), and for sure close-ups of 3D meshes (e.g. spaceships). That's enough for me, now.

Edited September 18, 2017 by VladR
