Heaven/TQA Posted September 17, 2017

As for raycasting in games: Carmack/id with Hovertank... that is where it gets used. Some mentioned the same for Rescue on Fractalus or other Lucasfilm titles using raycasting... I really doubt it, as it was not used in a game scenario back then... we are talking about 1983-1985...
Heaven/TQA Posted September 17, 2017

Quote: For a racing game, the cars would be too expensive to render on A800. The lowest possible "vehicle" mesh I can think of is something like Wipeout uses - just a few triangles, and you've got a ship. Given how much 3D can be done in one frame's time, I think we could have 2 simple flatshaded enemies done in 1 frame. That would feel an order of magnitude better than some 2D PMG sprite...

I was just thinking about it. How about 1x1? The narrow mode for 320x192 would be 256x192. The coordinates would still fit within a byte. Now, fullscreen flatshading in that resolution on A800 would be a slideshow. But let's take another look at that Atari ST game: https://static.giantbomb.com/uploads/original/0/1987/1103706-hover_sprint_06.png

It's using roughly one third of the screen for the 3D view (though more than half of it is almost never covered with polygons). That's equivalent to 60 scanlines in 320x192. In Narrow mode, that would be a 256x60 window (15,360 px). That's exactly the same number of pixels as in my current 160x96 (15,360 px).

Yes, we would have no colors (we would have to use dithering or DLIs, though those would just butcher the framerate with their STA WSYNCs), but we'd gain an extremely sharp 3D viewport on A800! And let's not forget that the unrolled inner scanline fill loop would fill scanlines at double speed compared to 160x96, as a single STA would fill 8 pixels (not just 4 pixels, as in 160x96). All the other overhead (edges, Bresenham, scanline traversal, ...) would be basically identical, or actually smaller, since we'd have just 60 scanlines in our 3D window (not 96 as right now) - which would be an instant speedup, as there's tremendous overhead per scanline.

Interesting. Highres dither might look cool. And having it in narrow mode might not kill you in DMA stealing.
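The "8 pixels per STA instead of 4" argument in the quoted post can be checked with a small count of byte stores per span. This is only a sketch: the function name and the 64-pixel span are made up for illustration, and a real fill loop would also have to mask the partially covered bytes at both ends of the span.

```python
def stores_for_span(x0, x1, px_per_byte):
    """Count the byte stores needed to fill pixels x0 .. x1-1 of one scanline."""
    if x1 <= x0:
        return 0
    first_byte = x0 // px_per_byte
    last_byte = (x1 - 1) // px_per_byte
    return last_byte - first_byte + 1


# Pixel budget from the post: both viewports hold the same number of pixels.
assert 160 * 96 == 256 * 60 == 15_360

# Hypothetical 64-pixel-wide span starting at x = 16.
wide_stores = stores_for_span(16, 80, 4)    # 160-wide mode, 4 pixels per byte
hires_stores = stores_for_span(16, 80, 8)   # hi-res mode, 8 pixels per byte
print(wide_stores, hires_stores)
```

For the same span in pixels, the hi-res layout touches half as many bytes, which is where the claimed doubling of the inner-loop fill rate would come from; the per-scanline setup cost is unaffected by this count.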
emkay Posted September 17, 2017 (edited)

Quote: Just to say it right away... when talking to Paul he never mentioned the term raycasting... it's a simple "3D" rendered with a span buffer or a filler. Just because it's a "grid" or maze, it is not a raycaster. He simplified the 3D, as in a maze you have several optimizations due to rectangles. But no, it's not a raycaster where you shoot rays per screen column.

Where do you have that information from? "Simplified 3D" .... "not Raycaster" .....

Let's check how Wayout looked on the 8-bits... no frame counting needed:

https://youtu.be/ms0TkSxgCeI?t=108
https://youtu.be/frFvZwa_5bo?t=37
https://youtu.be/SRNVMf5GCuw?t=145

Edited September 17, 2017 by emkay
emkay Posted September 17, 2017 (edited)

A closer look shows the most outstanding "projection" is on the A8. It's not just a little better, it's like a newer generation of hardware... But where does that speed come from?

- It uses the small playfield, 32 bytes per line
- It uses the double scanline mode, so every 2nd line has no DMA cycle stealing

A coarse frame count shows:

- 9 fps in the Atari version
- 6 fps in the C64 version

Both are synchronized to the display... The Apple version shows 6-8 fps in a jumpy way.

Edited September 17, 2017 by emkay
VladR (Author) Posted September 17, 2017

Quote: Just to say it right away... when talking to Paul he never mentioned the term raycasting... it's a simple "3D" rendered with a span buffer or a filler. Just because it's a "grid" or maze, it is not a raycaster. He simplified the 3D, as in a maze you have several optimizations due to rectangles. But no, it's not a raycaster where you shoot rays per screen column.

Actually, it can be seen directly on YT that Wayout is not a raycaster. The way the polygon edges unfold/move looks and feels different from the way a raycaster handles it. We can see that here the edges are computed (not merely unfolded, as in a raycaster). I personally always hated that typical 2D feel of walls in raycasters. It's a minor visual thing, but it's there.

Also, a raycaster would just totally get stuck on the depths of that maze. Notice how the framerate is more or less consistent. If it was a raycaster, certain heavy areas would just fall down to sub-1 fps (due to the nature of the raycasting algorithm, where it just keeps traversing - and that stuff on 1.79 MHz would just blow out; hell, I recall a 386DX@40 MHz having frame drops in certain areas of Spear of Destiny), yet they don't in Wayout.

What's the viewport size on Wayout? 64x48?
Irgendwer Posted September 17, 2017

Quote: What's the viewport size on Wayout? 64x48?

Maybe, but as he AFAIK mirrors the top half to the bottom via ANTIC LMS, the calculation costs are halved.
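For readers unfamiliar with the LMS trick: each ANTIC display list instruction can carry a "load memory scan" address, so the bottom half of the screen can simply point back at the rows already drawn for the top half. The sketch below builds such a display list as bytes. The blank-line opcode (0x70), the LMS bit (0x40) and the JVB opcode (0x41) are standard ANTIC values; the mode (0x0D), the addresses, the 24-row half height and the 32-byte narrow row are assumptions for illustration, not values taken from Wayout.

```python
LMS = 0x40       # "load memory scan" bit of an ANTIC mode instruction
BLANK_8 = 0x70   # 8 blank scanlines
JVB = 0x41       # jump to address and wait for vertical blank


def mirrored_display_list(screen_addr, dl_addr, half_rows,
                          bytes_per_row=32, mode=0x0D):
    """Build a display list whose bottom half re-reads the top half's rows."""
    dl = [BLANK_8] * 3
    # Rows 0..half_rows-1 top to bottom, then the same rows in reverse order.
    rows = list(range(half_rows)) + list(range(half_rows - 1, -1, -1))
    for row in rows:
        addr = screen_addr + row * bytes_per_row
        dl += [mode | LMS, addr & 0xFF, addr >> 8]   # address is low, high
    dl += [JVB, dl_addr & 0xFF, dl_addr >> 8]
    return bytes(dl)


# Hypothetical addresses: screen memory at $4000, display list at $3F00.
dl = mirrored_display_list(0x4000, 0x3F00, half_rows=24)
print(len(dl), dl[:6].hex())
```

With this layout the renderer only writes 24 rows of screen memory while 48 are displayed, which is the halving of calculation cost mentioned above. It only works when the image is vertically symmetric.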
Heaven/TQA Posted September 17, 2017 (edited)

Yeah, it's mirroring the bottom half with the display list, same as Capture the Flag. Capture the Flag doesn't slow down even when you play around with the code... E.g. I have the disassembled source, and you can play with the projection and scale the window down or up... In a raycaster that would not be possible without altering more of the code (ray offset tables must fit the scale table, which must fit the trig tables, etc).

And raycasting means marching rays through screen columns... in CTF we are talking about 128 columns... the raycaster in Arsantica 3 rendered 64x48... the one in Asskicker is smaller, etc., and all of them have different speeds... And the quality of the rendering in CTF is too good to be a raycaster...

Edited September 17, 2017 by Heaven/TQA
Heaven/TQA Posted September 17, 2017

Ah, I remember now from when I looked into Wayout... its rendering area is less than 64 wide, as it covers the "garbage" at the left/right of each scanline with sprites.
emkay Posted September 17, 2017

Capture the Flag shows approx. 7 fps, while it's calculating the 2 projections. There is also the difference that CTF isn't that symmetric. It's different in the upper and lower part.

Referring to "racing the beam": while ANTIC is doing a lot to show the stable content on the screen, parts of the walls could get some different visuals if the content was changed in some DLIs... The CPU has a lot of free time on the PAL machines, though...
Heaven/TQA Posted September 17, 2017

Nope... no free time, master... it's double buffering, so while one frame is shown it is already calculating the 2nd. I don't know why you always think of having "plenty of free" time.
emkay Posted September 17, 2017 (edited)

Quote: Nope... no free time, master... it's double buffering, so while one frame is shown it is already calculating the 2nd. I don't know why you always think of having "plenty of free" time.

Hmmm.. As the graphics are synchronized to the vertical blank, the NTSC version gets 60 and the PAL version 50 chances per second for an update. So the "tick" for every frame is longer on the PAL machines. The CPU has to wait for that. This makes the PAL version approximately 1 fps slower.... If you cannot count a difference in the resulting fps after changing the code, then simply the wait for the next VB got shorter...

Edited September 17, 2017 by emkay
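The point about frames snapping to vertical-blank boundaries can be put into numbers with a small sketch. It assumes the frame flip happens only at a VB and that render time is given in microseconds of wall-clock time; the 110 ms and 100 ms figures are invented for illustration and are not measurements of Wayout or Capture the Flag.

```python
def vblanks_per_frame(render_us, refresh_hz):
    """Whole display refreshes consumed by one rendered frame (ceiling)."""
    return max(1, -(-render_us * refresh_hz // 1_000_000))


def effective_fps(render_us, refresh_hz):
    """Frame rate once each frame is padded out to the next vertical blank."""
    return refresh_hz / vblanks_per_frame(render_us, refresh_hz)


# Hypothetical render times, same wall-clock cost on both systems.
for render_us in (110_000, 100_000):
    ntsc = effective_fps(render_us, 60)
    pal = effective_fps(render_us, 50)
    print(render_us, round(ntsc, 2), round(pal, 2))
```

Depending on where the render time falls relative to the 1/60 s and 1/50 s grids, the two systems can land on the same or on slightly different frame rates, and a small code change may not move the result at all because it only changes how long the CPU waits for the next VB.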
Heaven/TQA Posted September 17, 2017

But you know what I am referring to?
emkay Posted September 17, 2017

Quote: But you know what I am referring to?

- one frame in the buffer, one frame on the screen
- finish the buffer, wait for next VB
- set buffer to the screen, set screen to the buffer
- create buffer
- finish the buffer, wait for next VB
- ....

Without that, the game would run at the full available CPU speed.
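The loop described in this post can be sketched as follows. This is a plain simulation, not code from either game: the function names are made up, the "wait for next VB" step is only a comment, and the render callback just stamps the frame number into the buffer.

```python
def run_double_buffered(render, frames, size=8):
    """Render into the hidden buffer, then swap it onto the screen."""
    buffers = [bytearray(size), bytearray(size)]
    front = 0                        # index of the buffer being displayed
    shown = []
    for n in range(frames):
        back = 1 - front
        render(buffers[back], n)     # "create buffer" ... "finish the buffer"
        # Wait for the next VB here; on real hardware this is idle CPU time.
        front = back                 # buffer goes to the screen and vice versa
        shown.append(bytes(buffers[front]))
    return shown


def fill_with_frame_number(buf, n):
    """Stand-in renderer: fill the whole buffer with the frame number."""
    buf[:] = bytes([n & 0xFF]) * len(buf)


print(run_double_buffered(fill_with_frame_number, 3, size=4))
```

The disagreement in the thread is about the commented wait: whether the time between finishing a buffer and the next VB counts as usable "free" CPU time.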
Heaven/TQA Posted September 17, 2017

And how does this then relate to "plenty of time"?
emkay Posted September 17, 2017

Are you going to get silly now, too?

"A lot" = "viel" (much)
"Plenty" = "im Überfluss" (in abundance)

This refers to the available cycles that weren't used on PAL machines. So you can do more per frame.
Heaven/TQA Posted September 17, 2017

Ok... maybe I interpreted this wrong then:

Quote: Capture the Flag shows approx. 7 fps, while it's calculating the 2 projections. There is also the difference that CTF isn't that symmetric. It's different in the upper and lower part. Referring to "racing the beam": while ANTIC is doing a lot to show the stable content on the screen, parts of the walls could get some different visuals if the content was changed in some DLIs... The CPU has a lot of free time on the PAL machines, though...

I mean the last sentence, with plenty of time in combination with DLIs.
emkay Posted September 17, 2017

"Capture the flag doesn't slow down even when you play around with the code..." ...
VladR (Author) Posted September 18, 2017

During the weekend I was experimenting with clipping against screen edges, currently just the top & bottom edge (as my current "tunnel" dataset (1,981 pixels / 6 quads) is not crossing the left & right screen edge).

Through various tweaks and versions I found out it was actually fastest to adjust the transformation stage too, rather than leave the clipping as a separate stage (disconnected from the transform stage of the pipeline), by centering the screen-space y-coordinate around 128. Thus, I can do clipping without:

- using signed math
- going to 16 bit

Win:Win

So, I slowed down the transform stage a bit, but on the other hand gained a lot, since now I don't have to do the expensive check for each scanline (those add up real ugly). Instead, the checks happen only once per polygon, so my total cycle count went up a tiny bit (for the functionality it brings), from 22,253 to 22,337 (for the case when the polygon is not clipped against the edges, and a bit more when clipped).

Obviously, due to the nature of the math, the polygons become inverted once they cross behind the camera, so now I need to implement culling (i.e. removing them from processing once they're behind the camera), but then I should be able to capture a video.

It's funny I didn't come up with this technique during my jag flatshading earlier this year, as over there I'm still doing the Y-clipping per scanline. So, when I eventually go back to jag coding, I can make the Jaguar rasterizer faster because of the A800 experiments, which is real funny.
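One way to read the "center y around 128" idea: if the transform stage outputs y as an unsigned byte with the screen centre at 128, rows above and below the viewport are still ordinary values in 0..255, so the visibility test is a pair of unsigned compares done once per polygon. The sketch below is a guess at that scheme, not the author's code; the constants assume the 96-row viewport mentioned earlier in the thread, and all names are hypothetical.

```python
SCREEN_H = 96                    # viewport height from the thread (160x96)
BIAS = 128                       # screen centre in biased byte coordinates
TOP = BIAS - SCREEN_H // 2       # first visible biased row (80)
BOTTOM = TOP + SCREEN_H - 1      # last visible biased row (175)


def clip_rows(y_min, y_max):
    """Clip a polygon's biased row range once, before the scanline loop.

    y_min and y_max are unsigned bytes (0..255). Returns the first and last
    visible screen row (0..SCREEN_H-1), or None if nothing is visible.
    """
    if y_max < TOP or y_min > BOTTOM:
        return None                          # entirely above or below
    first = max(y_min, TOP) - TOP
    last = min(y_max, BOTTOM) - TOP
    return first, last


print(clip_rows(60, 100), clip_rows(0, 79), clip_rows(170, 255))
```

A real rasterizer also has to advance each edge's x position by the number of rows skipped at the top, which this sketch leaves out. The scheme only holds while the projected y stays within the 0..255 byte range.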
R0ger Posted September 18, 2017

Not sure I follow .. do you clip in 2D or 3D?
VladR (Author) Posted September 18, 2017

Quote: Highres dither might look cool. And having it in narrow mode might not kill you in DMA stealing.

Actually, a HiRes Narrow 3D window (with the same number of pixels) should have only 50% of the cycles stolen, as it's just 50% of the bytes:

1. 160x96 = 15,360 pixels at 4 pixels per byte = 3,840 bytes
2. 256x60 = 15,360 pixels at 8 pixels per byte = 1,920 bytes

But dithering will be slightly more expensive, I presume, as I have to somehow choose a dithering pattern for each scanline (most probably this check & pattern set will happen only once per polygon). Still, doing this in HiRes should be much faster (in theory). I'll probably try an experiment soon...
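The byte counts above are easy to verify, and the per-scanline dither choice can be as cheap as a table lookup on the row's parity. In the sketch below the byte arithmetic comes from the post, while the five-level pattern table is purely an assumption for illustration (the post does not say which patterns would be used).

```python
# Hypothetical fill bytes for a 1-bit-per-pixel mode, five shade levels.
# Each pair is (byte for even scanlines, byte for odd scanlines).
DITHER = [
    (0x00, 0x00),   # 0% of pixels lit
    (0x88, 0x22),   # 25%
    (0xAA, 0x55),   # 50% checkerboard
    (0xEE, 0xBB),   # 75%
    (0xFF, 0xFF),   # 100%
]


def fill_byte(shade, y):
    """Pick the fill byte for a polygon of the given shade on scanline y."""
    return DITHER[shade][y & 1]


def viewport_bytes(width, height, px_per_byte):
    """Screen memory needed for a viewport at the given pixel packing."""
    return width * height // px_per_byte


print(viewport_bytes(160, 96, 4), viewport_bytes(256, 60, 8))
print(hex(fill_byte(2, 0)), hex(fill_byte(2, 1)))
```

Because the pattern depends only on the shade and the row parity, the pair can be selected once per polygon and the two bytes alternated in the scanline loop, which matches the "once per polygon" expectation in the post.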
VladR (Author) Posted September 18, 2017 (edited)

Quote: Not sure I follow .. do you clip in 2D or 3D?

Right now just 2D, so there might be precision issues with very large polygons (and me using just 8 bits) - but that's something that can be worked around by adjusting the dataset (e.g. creating smaller polygons).

EDIT: As I mentioned earlier, this is all an experiment in how far I can get with just 8-bit precision (I can always switch to 16 bits, which will give me roughly an order of magnitude more precision, but at a cost). A week ago I would have sworn it's impossible to do Y-clipping at the cycle cost that I currently have, so I really have no idea how far I'll be able to push the 8-bit pipeline. But I'm almost there (just need to do X-clipping).

But clearly, it's possible (to a degree, of course) - and now it's just a matter of experimenting with various sizes of polygons and camera distances to see what kind of 3D world is possible to move through with just 8 bits of precision. I fully intend to find out the limits of the 8 bits of precision.

But I'm not alone in this. You guys have implemented this before, so you can share ideas and guide me in the process when I get stuck. Which is awesome!

Edited September 18, 2017 by VladR
Heaven/TQA Posted September 18, 2017

Is "2D clamping" a visual thing or is it for later performance? If it's just a visual thing... what about using a "dirty buffer", meaning reserving RAM above the top and below the bottom of the screen so you can actually overdraw without glitches? Same with left/right, thanks to ANTIC... It does not help you performance-wise, but... I did that several times... ok... it's cheating... but it was common practice on PSX, too.
VladR (Author) Posted September 18, 2017

Quote: Is "2D clamping" a visual thing or is it for later performance? If it's just a visual thing... what about using a "dirty buffer", meaning reserving RAM above the top and below the bottom of the screen so you can actually overdraw without glitches? Same with left/right, thanks to ANTIC... It does not help you performance-wise, but... I did that several times... ok... it's cheating... but it was common practice on PSX, too.

Cheating is OK, that's what we gotta do, as this is just 1.79 MHz. Hell, I cheated like crazy on jag too (well, compared to my reference software rasterizer on PC, that is), and that beast has 52 MHz in 2 RISCs, 13 MHz in the 68000, plus the OP and Blitter.

On jag I came up with the same scheme (not knowing it had a name already - but that's the fun in doing research) of "dirty regions", but didn't finish it, as early benchmarks showed no significant overall improvement; it just cost me a lot of code space (there's just 4 KB in the GPU cache for code).

Besides, and at that time I didn't know about it yet, once the polygon is super close to the camera (e.g. just about to leave the view frustum), it becomes extremely large in out-of-screen coordinates - coordinates like -1000 (or so). So, it would be extremely prohibitive to keep rendering it. And if you cull it before it becomes so big, it results in an ugly artifact, as the visible portion of the polygon right in front of the camera just disappears. I was later able to reduce it by adjusting vertices and camera distances, but this never really disappears completely. So, scanline clipping is still necessary.

In theory, it should be possible to adjust the view frustum in a way that does not result in large magnification of the post-transformed vertices (at which point the whole unclipped polygon would fit within the dirty region), but I don't believe the cycle cost involved would offset the cost of clipping (i.e. I suspect it would just slow everything else down). Especially given the fact that I just did Y-clipping in just a few dozen cycles.

It might be a fun experiment one of these days, but there are a few other things that are currently taking way more cycles than I'd like, so this will just go onto the "to-do-later" list.
R0ger Posted September 18, 2017 (edited)

I do clipping in 3D, because I have to .. I can have a line starting in front of the camera and ending behind the camera. So doing the perspective first and clipping later just doesn't work. You could probably clip against some kind of "near plane" .. parallel to the screen plane .. and then do 2D, but it still seems prone to overflows to me. But then you can control the world and camera, so the problematic cases might never happen.

Edited September 18, 2017 by R0ger
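The near-plane idea mentioned here is the usual way to handle a line that starts in front of the camera and ends behind it: cut the segment at a small positive depth before the perspective divide. The sketch below uses floating point for readability, so it is not representative of an 8-bit fixed-point implementation; the function name and the sample coordinates are made up.

```python
def clip_to_near(p0, p1, near):
    """Clip a camera-space segment against the plane z = near (z is forward).

    Points are (x, y, z) tuples. Returns the visible part of the segment as
    a pair of points, or None if the whole segment is behind the plane.
    """
    z0, z1 = p0[2], p1[2]
    if z0 < near and z1 < near:
        return None                      # fully behind the near plane
    if z0 >= near and z1 >= near:
        return p0, p1                    # fully in front, nothing to do
    # The segment crosses the plane: find the intersection point.
    t = (near - z0) / (z1 - z0)
    hit = (p0[0] + t * (p1[0] - p0[0]),
           p0[1] + t * (p1[1] - p0[1]),
           near)
    return (hit, p1) if z0 < near else (p0, hit)


# Hypothetical segment from behind the camera to in front of it.
print(clip_to_near((0, 0, -10), (10, 0, 10), near=5))
```

After this step every remaining vertex has z >= near, so the perspective divide never sees a zero or negative depth; the overflow concern raised above then comes down to how small `near` is relative to the coordinate range.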
VladR (Author) Posted September 18, 2017 (edited)

Quote: I do clipping in 3D, because I have to .. I can have a line starting in front of the camera and ending behind the camera.

That's exactly what clipping is - the polygon crosses the screen edge(s). In your case it's just a line, not a full polygon, but it's basically the same scenario - you can't just draw the whole thing (well, with "dirty regions" you technically could).

Quote: You could probably clip against some kind of "near plane" .. parallel to the screen plane .. and then do 2D, but it still seems prone to overflows to me.

I was running into overflow issues even with 32-bit computations on the Jaguar, so now I know that more bits don't necessarily solve that particular problem. You must adjust the feasible range of values for both the vertices and the camera. 32-bit values do not really bring that much more precision over 16-bit once we're talking about some meaningful viewing distance & precision combo.

This precision issue became especially noticeable when I was experimenting on jag with level-of-detail large terrains - see https://www.youtube.com/watch?v=-scNWhRFDh0

At first I was puzzled, but then I spent half an hour in Excel and found out how easy it actually is to run out of 32 bits. Good experience.

Quote: But then you can control the world and camera, so the problematic cases might never happen.

Well, if you take a look at the video above, I was eventually able to minimize it drastically, but you can still see the polygons disappearing in front of the camera. As I didn't want to sacrifice the viewing distance (and the unavoidable visual artifacts related to that), I chose a quick hack, and that was brutally culling those polys close to the camera.

Now I know I could just spend an hour in Excel and find the proper range of vertices vs camera position without introducing any artifacts. But that's the experience - you don't know what you don't know until you know, so next time (on jag, I mean) I'll do it better.

EDIT: Which is exactly what I did for the 8-bit transformations. If I didn't have the above experience on jag, that you can "massage" the values into non-problematic ranges, I am 100% sure I would have just discarded the 8-bit precision altogether and switched to the much slower 16-bit.

Now, of course, the 8-bit pipeline won't be able to address large worlds, like in Oblivion. But it looks like it just might be able to handle medium-size rooms for an FPS genre (e.g. a tunnel), and for sure close-ups of 3D meshes (e.g. spaceships). That's enough for me, for now.

Edited September 18, 2017 by VladR
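The kind of spreadsheet check described above can be approximated in a few lines: take the largest coordinate and the largest transform coefficient, multiply, sum the worst case over the number of terms, and see how many signed bits the result needs. This is a generic estimate and not the author's actual Excel sheet; the function name and the sample ranges are assumptions.

```python
def signed_bits_needed(max_coord, max_coeff, terms):
    """Signed bits needed for the worst-case sum of `terms` products."""
    worst = terms * max_coord * max_coeff
    return worst.bit_length() + 1        # one extra bit for the sign


# Hypothetical 8-bit pipeline: coordinates and sine-table values in -127..127.
two_terms = signed_bits_needed(127, 127, 2)    # rotation about a single axis
three_terms = signed_bits_needed(127, 127, 3)  # full three-term dot product

# Hypothetical 32-bit case: 16.16 fixed-point coefficients, coordinates to 20,000.
big_world = signed_bits_needed(20_000, 1 << 16, 3)

print(two_terms, three_terms, big_world)
```

Under these assumed ranges, a two-term rotation just fits a 16-bit accumulator while a three-term one does not, and the large-world case already exceeds 32 bits. That is the sense in which values have to be "massaged" into a feasible range rather than simply given more bits.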