ArneCRosenfeldt

Members
  • Posts

    98
  • Joined

  • Last visited


ArneCRosenfeldt's Achievements

Star Raider (3/9) · 18 Reputation

  1. Don't we have enough 2D fighting games on the Jaguar? It is like with the Mario 64 rewrite: you don't get the IP! Do I dare to watch the YouTube video? At least I think that gamers would mix his game together with the Mortal Kombat CD somehow and then play it on the Jaguar as a proof of principle. Please buy the existing Jag IP for cheap and use that as a placeholder.
  2. I am training it with posts. LLMs learn from the internet. I hope to get answers in ChatGPT 5. The YouTube algorithm at least seems to understand me. Didn't you complain about clipping artifacts in code someone posted 10 years ago? So don't you like a discussion about clipping in the SDK?
  3. Have you understood how the SDK uses MMULT to transform vertices? People bash the SDK as not being enough. So it is small. So maybe we can discuss the code?
  4. I just commented on the code in the SDK and F4L. You know the FPS. I will not change that much. FYB apparently did not get documentation about MMULT. MMULT is only useful (for me) for extra precision, to at least trump the PSX in visual quality. I have written multiple times (on this website) that I need occlusion culling because I want to render indoors (or racing games with tunnels and canyons). The funny thing is that the weird OpenGL math applies quite well even to the Jaguar. My biggest problem was finding out how much RAM I have left. Can I even cache part of a beam tree? So as an exercise I looked at the code size in the SDK and the size of a simple mesh (without any occlusion data structure). My TypeScript test script keeps growing: it checks the algorithms and sets up the data structures so that I don't have to fight clipping artifacts on the Jag itself. Anyone can check the algorithms in the browser.
  5. OpenGL considers clipping so important that it arranges the MMULT matrices in such a way that it can clip at the diagonals (but no clipping to portals?) using a simple CMP or ADC and a move. Transformation to screen space uses those small factors (1+4 and 16-1) and then SHL; DIV. Short refresher: after transformation we don't store 1/z directly; we subtract the far plane to use the whole scale (and that is the only reason: memory is expensive!). This gives us a far plane. I can see that 1/z is still linear after this subtraction. Perspective correction works by applying W to U and V just as to the other coordinates; W is the unbiased Z, and U and V have not been transformed. Doom could write to the z-buffer while it renders the floors. It would need a second pass to write the z for the walls in horizontal phrase mode. Then monsters could be rendered horizontally while they check against z. Since n-gons and that two-pass shaded-z thing go together well, yeah, just stick to the SDK, interpolate per scanline, and do subpixel correction per scanline. Maybe load a different code blob for this.
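The "1/W is linear in screen space" recipe above can be sketched in a few lines of TypeScript (my own names, not from the SDK): store 1/w and u/w, v/w at the span endpoints, interpolate all three linearly, and pay one divide per sample to recover u and v.

```typescript
// Perspective-correct interpolation across one scanline span.
// All three stored quantities are linear in screen space.
interface PerspVertex { invW: number; uOverW: number; vOverW: number }

function sampleSpan(a: PerspVertex, b: PerspVertex, t: number) {
  const invW = a.invW + (b.invW - a.invW) * t;          // 1/w, linear in x
  const uOverW = a.uOverW + (b.uOverW - a.uOverW) * t;  // u/w, linear in x
  const vOverW = a.vOverW + (b.vOverW - a.vOverW) * t;  // v/w, linear in x
  return { u: uOverW / invW, v: vOverW / invW };        // one divide per sample
}
```

For subspans as discussed later in the thread, the divide is only done at subspan boundaries and the interior is interpolated affinely.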
  6. So I browsed the source again, and now, thanks to the training, I can read it better. It has a lot of boilerplate code. It uses NORMI to get maximally precise clipping. It uses MMULT for rotation and translation (it puts 1 into the packed registers). Vertices are 16 bit in world coordinates. I guess the 68k usually writes them at load time. Probably we should generate specific matrices for ships and cars. For bipeds I would go 32 bit. The cycle and space consumption of the boilerplate code is so huge that even extreme MMULT application is better. So the code applies many linear transformations to the vertices, but does not combine the matrices. I guess they worry about precision. We can split the bits of the matrix entries over different matrices and later SAR and ADD the results (the SDK also uses SAR). Even 3 or 4 matrices are still quite fast. Then I won't care about precision loss due to a 32-bit camera position on a large race track anymore. Also a sniper field of view is no problem. All the nice things 32 bit gave us on the PC then also work on the Jaguar. DIV does unsigned 16.16 division. We use it as the last instruction to get our likewise unsigned (see the address generator) pixel positions. For this we need to clip negative vertices and those where the x numerator is more than 512 times the denominator. You know, MULT can only do 16 bit, but SHLQ can do 32 bit and is single cycle. CMP is 32 bit. I would even say: let's set up the address generator to use addresses centered between 0 and 1024. Then with the OP we pick only this center. We manipulate the buffer start and overlap front and back buffer to still use all the memory. This creates guard bands. This means that for many polygons, even if they are clipped, we have their transformed vertices available. MMULT goes VRROOM, the code is small. Has anyone tried the carry flag after MMULT? Is there proper sign extension? I could use it like JMP CC; ADDT bitPosition, next significant word. So even the SDK uses NORMI for maximum precision.
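Splitting wide matrix entries over several MMULT passes and recombining with SAR/ADD, as proposed above, amounts to this arithmetic (a plain TypeScript model, not JRISC; the 8-bit split point is my choice for illustration):

```typescript
// MMULT multiplies 16-bit signed entries. To use a wider matrix entry m,
// split it into a sign-carrying high slice and a low slice, run each slice
// through its own matrix pass, then shift and add the partial products.
function mulWideEntry(m: number, v: number): number {
  const hi = m >> 8;       // top slice (keeps the sign)
  const lo = m & 0xff;     // bottom 8 bits
  const pHi = hi * v;      // pass 1: MMULT with the hi-slice matrix
  const pLo = lo * v;      // pass 2: MMULT with the lo-slice matrix
  return pHi * 256 + pLo;  // SAR/ADD-style recombination (shift left 8, add)
}
```

Because each pass is an ordinary matrix multiply, 3 or 4 slices just mean 3 or 4 MMULT runs plus a handful of shifts and adds per component.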
So what if we want subpixel correction, and generally want to fill the 16.16 address generator and z-buffer registers with their full precision? One bit of the z-buffer can be used to mark even and odd frames in indoor levels so that we don't need to clear it. Indoor levels on the other hand need occlusion culling. You decide. We can use NORMI to SHR numerator and denominator before DIV. So at least the numerator will have full 32-bit precision (we pull that from the lower-precision matrix). For the ratio to range from 0 to 1023, the denominator needs to be smaller by 10 bits. So we lose some precision here. Actually, we could make it smaller by 8 bits and then shift the ratio up by two bits; we use the same 8 bits twice, so to say. There is even a remainder which can be used to create a 32-bit ratio. DIV is slow, but it runs in parallel. I think it has no conflict with MMULT. We may be able to reuse the output from MMULT directly to save a load from the register file. Notice how MMULT does conflict with a real load from memory. We need to issue those beforehand. Also we only divide once, because we need a high-precision 1/Z for the z-buffer and for perspective correction of subspans. So again multiplication is the most limiting: x * 1/z. Here the x segments from the matrix product can be reused. We need to split z, but can then use it for x, y, Gouraud, and texture coordinates. For texture mapping we take the vertices on the screen as a 2d matrix and invert it. Importantly, we divide by the determinant. The sign of this determinant tells us which face we see. So here we can apply cheap back-face culling. This method restricts me to triangles, no other polygons. So again multiplication and division happen (per polygon). I propose to do this also at high (variable) precision.
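The determinant trick above is compact enough to state exactly: the determinant of the 2×2 screen-space matrix built from two triangle edges is the divisor for the matrix inversion, and its sign is the winding, so back-face culling falls out for free. A minimal TypeScript sketch (names mine):

```typescript
type Pt = { x: number; y: number };

// Determinant of the 2x2 edge matrix = twice the signed screen-space area.
// Its value is the divisor when inverting the matrix for texture-gradient
// setup; its sign tells front face from back face.
function edgeDeterminant(a: Pt, b: Pt, c: Pt): number {
  return (b.x - a.x) * (c.y - a.y) - (b.y - a.y) * (c.x - a.x);
}

const isBackFace = (a: Pt, b: Pt, c: Pt) => edgeDeterminant(a, b, c) < 0;
```

This is also why the method is triangle-only: a quad has no single 2×2 edge matrix to invert.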
Either we use subroutines (two jumps only cost two cycles and we save SRAM; no stack, because like in the SDK we stick to one level of call depth), or some fancy macro assembler, or some loader similar to the LRU cache loader. This way we can play with precision. Subpixel correction means that we move from the real vertex position to a pixel position. We need to multiply this pixel fraction with our delta values. The good thing is that these fractions are already 16 bit and we can just use IMULT. No big deal. Only once per polygon. No excuse for PSX wobble. IMULT needs two inputs. Maybe we can interleave like movefa; movefa; IMULT; IMULT to also avoid a problem with register usage? A bit more interesting is screen-space clipping. So draw from top to bottom. So we check if y is visible. Then either bail out or jump to the upper border. For each scanline we do the same checks for x (maybe swap registers and reuse code?). Jumps cost us multiplications. There is a special case (oh no, more code 😞): if we clipped the left x in the previous scanline, we just move downwards. We have those delta values. Indeed, for edges we need to calculate a delta. Either we take it from the inverse matrix (for some esoteric rounding dogma, maybe), or we go from the source and calculate the delta along the edge. Now with each scanline advance we move integer pixels. So we still need to mix in the subpixel correction to give us delta values along the y++, x+=integer screen vectors. But corrections only need 16 bit -- happy! There seem to be trapezoid renderers of largely different code sizes. FYB is too large, but uses good addressing modes -- ah, and has swap. FYB needs to swap so many registers. Maybe try to switch banks and then use movefa instead? The interrupt runs in bank 0. So try to load, store, and div in bank 0. Check with MMULT. I need texdraw1.inc. In my book we y++, loop over the left and right edge, and x+=. We delta all values. Then we check if our pixel ray is outside the edge (Bresenham).
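The subpixel-correction step described above (move each interpolant from the exact edge position to the first covered pixel center by multiplying the sub-pixel fraction with that interpolant's per-pixel delta, one 16-bit IMULT each) is, in plain arithmetic:

```typescript
// Snap an interpolant (x, z, intensity, u, v ...) from the exact edge
// position to the first pixel center. frac is the sub-pixel distance to
// that center (a 16-bit fraction on the Jaguar); delta is the per-pixel step.
function subpixelCorrect(value: number, delta: number, frac: number): number {
  return value + delta * frac; // on JRISC: one IMULT of two 16-bit values
}

// Example (made-up numbers): an edge starts at x = 3.25; the first covered
// pixel center is at 3.5, so frac = 0.25. Intensity delta is 8 per pixel.
const corrected = subpixelCorrect(100, 8, 0.25); // 102
```

Done once per edge start per polygon, which is why it is cheap and why there is "no excuse for PSX wobble".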
In this case we x++ on the GPU, and only then inform the blitter. For a super fast y loop we could have precalculated deltas for all the values. Or, uh, with many values it may make sense to round to nearest. So we could also accept being one pixel too far inside. But in more than half of the cases (slopes) we would be correct. We need one more branch for this (is branch after branch okay? Like if the second branch cannot be taken when the first was? > = < ?). And then some SUB. Okay, why is there QT, but no T alone? We don't care here. Just be happy about the ADD, SUB symmetry. In pixel mode we could also start drawing from the clipped sides and from the side with the most integer-like slope. Branches, code size. Need to test. Sorting edges by y and sorting slopes should not take much code. Maybe we need a subroutine to swap a set of registers? So now we have 32-bit values which can be used by the blitter. Vertical increments are done by the GPU, which also happens to use 32 bit. Registers in the blitter are interleaved for some strange reason. It may be possible to mimic this in JRISC and keep words in a swapped position. So the low word is actually a high word, which never carries over. The high word is the low word whose carry we ADD (not ADDT), ADC. I think Gouraud has detangled registers. Subspan perspective correction does one DIV and two MULs per span. The span length is typically tuned to the hardware: you measure how long JRISC needs, and then set the span length to the nearest power of two (this could even be done in the level-load code). I had a hard time understanding the precision and the name "correction". I feel like we are supposed to do the affine interpolation at max precision, and then have a z correction factor close to 1. Billboards (Object Processor) always have this correction at exactly 1. It deviates more the more oblique our viewing angle becomes. So for the division we could try to deal only with the small deviation up to a certain angle?
At least it looks like we should stick to affine for minimum jitter for a lot of polygons! DIV does not help us much here, but still we can use most of its bits. But then comes the correction part: we implement the multiplication by the leading 1 bit ourselves. We just use ADD (32 bit). So NORMI Z, DIV, SHL, ADD. Subspans give the OP time to load the line buffer. And also we don't really use Z, but W. OpenGL introduces near and far clipping planes to make the best use of the integer precision in the blitter. Like in OutRun: we don't want to see an individual truck pixel scaled up to the whole screen. Also there is a viewing distance. The matrix does this for us. The nice property "1/W is linear in screen space" stays intact. The Jaguar SDK for some reason does not have these planes. Only a few polygons will hit the near and far planes, so clipping is no performance problem. The JRISC code in the SDK, F4L, and Doom has very long stretches of instructions without flags or branches. So it should be possible to interleave (using a tool) transformation and rasterization. The cases where the blitter is busy at the next instruction should go down. With all those memory constraints there may not be much leeway to decide when to transform. Going from one polygon to the next can change up to 3 vertices. We may also want to set up the next triangle in the background. So we have a range for the queue filling. We have a span length and a line count. We may try to evenly mix blitter commands with the GPU work. Anyway, the code will not be interleaved. We calculate edges just before the blitter instruction, and we do all the other calculations 0 to n times after that. MMULT is necessary to get by with the registers. Interrupts would make this even worse because I could not manage temporary registers. Now I just block interrupts and get to use the stack pointer as a last resort. Ah, I could always do that. So would there also be a queue for polygon set-up? The SDK uses NORMI.
With the matrix on larger levels, maybe the world vertices had better come as floats? Or rather normalized with a common exponent, which just stays outside of all the matrix stuff. We have to shift the 1 in the W component up front. Upon division the exponents cancel out. Translation and packing X and Y is like: LOADP; LOAD; SUB; SUB; SHR 16; SHR 16; SHL 16; or; moveta. As a big fan of CRY colors I don't need much color RAM. Still, maybe some will like to have GPU RAM available for this, as in the SDK. More importantly, free GPU RAM allows us to cache more transformed vertices. It may even be feasible to transform the next object or chunk while the previous one is rasterized. Now that I see all the instructions the poor GPU needs to execute, I feel like it should never have to wait on the blitter. So each scanline, one of the next vertices is transformed. Maybe even use a sliding window for the vertices as in LZ compression. We transform one new vertex; the polygons refer to any vertex in the window. High-precision vertices with lots of attributes are large. So it may make sense to compile their last usage into the model to free the cache. I tried to read the F4L code again, but it is just too lengthy. No MMULT, no NORMI. Looks like a lot of micro-optimizations. I may steal the blitter code, though. The way the SDK does every transformation step explicitly is compatible with a beam tree, so I will have to stick with that. I mean, the code has to be loaded in a second pass to render all the polygons which span the guard band. So the guard band is an optimization for speed just like the beam tree. It costs code size. The OpenGL render pipeline really profits from a z-buffer, and good 3d also really needs shading (I miss it in F4L). So we probably need to render in two passes going through color RAM (why even use GPU RAM with this alignment problem?). I want CRY color. Maybe reserve 16 palette entries for stuff, dunno.
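The LZ-style sliding window for transformed vertices could be modelled like this (entirely my sketch; the "last use" flag compiled into the model is what frees a slot, so the window never needs an LRU scan):

```typescript
// Sliding window of transformed vertices: polygons index into the window;
// a vertex marked "last use" in the compiled model is evicted right after
// the polygon that consumes it, keeping the window small.
class VertexWindow<T> {
  private slots = new Map<number, T>();
  constructor(private capacity: number) {}

  // Returns false when full: the caller must flush polygons or retransform.
  add(id: number, v: T): boolean {
    if (this.slots.size >= this.capacity) return false;
    this.slots.set(id, v);
    return true;
  }

  use(id: number, lastUse: boolean): T | undefined {
    const v = this.slots.get(id);
    if (v !== undefined && lastUse) this.slots.delete(id); // compiled-in eviction
    return v;
  }
}
```
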
The sliding window with compiled objects and chunks, and triangles only (no quads), don't really let us render multiple small spans at once. So KISS it is: one triangle at a time. When we clip Gouraud per polygon, I would not need to saturate per pixel in software. Same for z. So the software rasterizer is quite simple, a lot of ADDs (as seen in the SDK, and thanks to the alternative access to those registers at F0227C--F02298). We don't need to back-calculate Z or I values because we restart within the span. Yeah, another wait for the blitter, but we have two edges to trace anyway. And only the blitter can draw 3 px of a phrase in one go (with Z and I). The engine should tune itself on the title screen with DSP, OP, and 68k load, but it looks like pixel mode wins for any span below 16 px and phrase mode above that. Subspans at least share one z-buffer slope. Caching shaded textures in color RAM sounds great, especially when zooming in, but the z-buffer then works in pixel mode, which is slow because read, compare, and write happen per pixel. You may use it for a skybox and then clear the z-buffer globally in the one vertical retrace which falls into (or adjacent to) the skybox drawing (which is so stupidly slow on the Jaguar and eats a whole frame). And for sound I found FS02_50.DAS:

;____________________start of I2S interrupt service routine__________________
;______________________________________________________________________________
; Sample Rate interrupt
i2s_isr:
; Put the Flags register away for safe keeping!!!
	move	r20,r30		; get flags ptr
	load	(r30),r12	; load flags
; Now we need to actually store the data to the DACs here
	move	r21,r28		; get output counter
	shlq	#3,r28
	movei	#buffstart,r29	; r29 will be read location
	add	r28,r29
	move	r9,r28		; get address of DAC
	load	(r29),r30
	store	r30,(r28)
	addq	#4,r29
	addq	#4,r28
	load	(r29),r30
	store	r30,(r28)
; And finally increment the OC (Output Counter)
	addq	#1,r21
	bclr	#3,r21
; The following code is the magic to do an rte
; Assuming that the Flags are in r12
	move	r20,r30		; get flags ptr
	bclr	#3,r12		; clear IMASK
	load	(r31),r28	; get last instruction address
	bset	#10,r12		; clear I2S interrupt
	addq	#2,r28		; point at next to be executed
	addq	#4,r31		; update the stack pointer
	jump	(r28)		; and return
	store	r12,(r30)	; restore flags
	nop			; NEEDS TWO NOPS TO PROTECT AGAINST
	nop			; EXTERNAL LOADINGS
  7. Why do people claim that FF has a slow pace? So they captured real motion, but gamers are used to comic-style 2D fighters? I read that real martial arts fighters are too fast for 24 fps cinema. Maybe Atari should have sped up their noob fighter models to match a top athlete? Or is the frame rate of the Jaguar too low? If frame rate is everything that counts: on the C64 many fighters only had one color. Virtua Fighter has flat shading. The backgrounds in FF did not impress anyone, so they could be some line-buffer init color which does not block the system bus (plus a tree and a cloud). Fill rate is a problem on the Jaguar: so make the fighters smaller on screen and compensate a bit with sub-pixel precision.
  8. So when JRISC sends out a read request (to SRAM, DRAM, or DIV), it sends as return address only 5 bits and not the full 6 bits of the register file?
  9. Why would it make sense? From the point where we collect the controller inputs to the display on the CRT we want the shortest time span. And we want to let stuff run in parallel. Even with only 4 KiB on the GPU we might want to start drawing while we are still processing some geometry. Or do you favour the Doom way: geometry on Jerry (with an interrupt for music), drawing on Tom? I would, if Jerry had an efficient way to access memory.
  10. So 28,000 pixels total in a frame; at 10 fps that is 280,000 pixels per second. At the ~26.6 MHz system clock that is roughly 100 cycles per pixel. Blitter control runs on the GPU interrupt, while the transformation runs ahead.
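The budget arithmetic above, written out (assuming a 10 fps target and the Jaguar's ~26.6 MHz system clock):

```typescript
// Rough per-pixel cycle budget for a 28,000-pixel frame at 10 fps.
const pixelsPerFrame = 28_000;
const fps = 10;
const clockHz = 26_600_000; // Jaguar system clock, ~26.6 MHz

const pixelsPerSecond = pixelsPerFrame * fps;     // 280,000 px/s
const cyclesPerPixel = clockHz / pixelsPerSecond; // 95 cycles per pixel

console.log(pixelsPerSecond, cyclesPerPixel);
```

So "100 cycles per pixel" is the right order of magnitude: 95 cycles at exactly 26.6 MHz.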
  11. I had just forgotten how low the frame rates were. I thought Atari would aim at Wings 3d on SNES. Atari is so proud of their z-buffered Gouraud shader which saturates the memory bus. But at 10 fps, probably in cockpit view, and with 180 px horizontal resolution (Doom), I still cannot completely grasp how the Jaguar could be so slow. On the other hand it appears overambitious. The z-buffer is full blown: you can activate read and write individually, and the buffer is interleaved with the frame buffer for fast page mode. Atari wanted a simple design and did not support transparency in a register. For this there would need to be 4 bits which would go out as write-enable lines. So the blitter always has to read the destination. Make it 8 bits for an 8-bit framebuffer. Uh, that would not work with 16-color mode. Hey Atari, wake up. It is 1993 and 16 colors are so 1983 (EGA graphics card). I am so sad that the z-buffer did not see good use. I even propose to first read 4 texels regardless of whether the z-test fails. We save a bit of difficult circuitry, but also eliminate any benefit of z-sorting. A simple world with only two passes: transform all vertices (MatrixMult), then draw all triangles (which have 8-bit indices into the vertices). No translucency. No shadows. Pixel mode is easy to set up in JRISC. Less code (very important with only 4 kB) and faster (JRISC is not very efficient per clock cycle). The blitter seems to have some register limit (silicon real estate). With smaller registers everything is cheaper, and there could be more registers, and Gouraud and transparency could work in parallel. Adders would also be cheaper. Each register would have a carry flag in the middle; also it would be duplicated. Each pixel, two registers are set onto the bus (accumulator, delta) and the sum is written into the accumulator duplicate. Next pixel the sum goes back from the dupe to the first, and the carry from the last pixel is taken in.
So it is all very fast and each add costs only one cycle. For texels, cache the last phrase address. That is 24 bits to store in a register -- not much. Looking at the big picture, the address generator has plenty of time to remove dupes. 4 cycles to get the 4 carry flags of the texture coordinates over the pixels in the phrase (store). Then read the 0..4 source phrases. Use the carries to condense the color values into the destination phrase. So 2 cycles/pixel minimum added, but maybe we keep the Tom design with a separate address generator, which again is just a big adder with carry flag, but for larger (address) values -- may be slow, may need more set-up time. But again, look at the big picture and think of the 10 fps -- a fixed round robin: write zPhrase, write colorPhrase, read next zPhrase, read 0..4 texel phrases; OP reads out 0 or 4 phrases (display still runs at 60 fps); Jerry streams two phrases from one voice of the orchestral soundtrack / the 68k gets a blip; DRAM refresh (with the great pressure regulation as is); rinse, repeat -- that would have achieved 10 fps. It seems much more important to have the next register set ready for the next line. So there would be 2 duplicates per register. On the first pixel the other duplicate also reads the value from the bus, so that the uniform values of the triangle only need to be set up in the first line. A write from JRISC would stall the blitter for a cycle. Or hey, invest in a single buffer like in the JRISC STORE instruction. I mean, in Tom as it is, the others get the bus while the blitter waits (idles) to be set up for the next line. With the interrupt method this takes 12 cycles. With a JRISC idle loop, still 4 cycles reaction time. Then you need to store like 4 values (start, length, intensity, fractions of these, z .. ah). You know, the miracle wonder blitter just idles most of the time. A bit more orchestration! Now I feel sad.
It seems that only the title screen animation can run at 60 fps, or Tempest 2000, because it does not even draw on the whole screen, and then only flat shaded and without z-buffer.
  12. 2) Don't use MUL in your interrupt. How long are your MULT .. RESMAC chains anyway? 1) Almost still in theory: to absorb the bandwidth from main memory you would have to store 32-bit RGBA or so into internal SRAM (for example the line buffer). But what if your pixel format is 16 bit like in Doom? In a sane design the OP would gobble up more than 16 bytes (like it does) in one burst and then do the shading. In Atari's design (going back to the Panther, which was abandoned for a reason, I guess) the hardware implementers needed to come up with a superscalar (two pixels per clock) circuit. They hacked something together that unscaled at least uses 3/4 of the bandwidth. And for 8-bit color, 2 pixels at once (see the superscalar CLUT) don't even cut it. Similarly, in the blitter they botched the superscalar design. Phrase mode vs pixel mode .. aargh. Did they not learn from vi that we don't want modes? We want capabilities!
  13. Pixel mode in the blitter is not some obscure technique only for texture mapping. I mean, it is also needed for rotation, of course. Collision detection works in pixel mode only. Gouraud and z-buffer also work in pixel mode and don't stress the CPU as much .. so use it for short lines and small triangles (high-LoD tessellation). I think pixel mode is needed to expand pixels to more bits even in a simple blit. One can use SRCSHADE to fill the other bits. Pixel mode works with the 16-bit CLUT as seen in the SDK. There is no conflict with the OP because of the bus lock .. or maybe there would be one or two cycles after the switch. But anyway, the OP needs 4/3 cycles per pixel at 1:1 scale, and 1 cycle when scaling .. the latter only uses one of the CLUT address buses. The other is free for us. The line buffer has a 16-bit interface to the bus as well and thus works well in pixel mode. The OP runs through the vertical blank and thus can be used to load textures. There is a register to specify buffer swaps at certain horizontal positions. So that is a bit nasty, but for large textures used all over a frame, maybe we keep the texture for multiple lines anyway. So, I hate sorting because it costs a lot of CPU / GPU / Jerry time (Doom uses Jerry for that). An advantage of the z-buffer is that we do not need to sort. Just render in whatever order the game logic supplies the data. Personally, I am only interested in dungeon games (Descent), so I need to carefully select which "game objects" to touch anyway. But for other games? Racing games and almost all first-generation 3d console games have theatrical, stage-like scenes. Almost every front-facing polygon has some pixels visible. Back-to-front rendering is a good fit. Now, the Jag is so fill-rate limited that the fill rate is halved when you hide behind a corner in Skyhammer, or when you would use the blitter to draw those bridges or canyons in OutRun. I found one account of someone talking about span sorting.
So that person was probably aiming for very high poly counts, but for realistic numbers of polygons, as seen in BattleSphere or in tunnels in pseudo-2d racing games (and Checkered Flag without scenery, or that F1 thing), we mostly care about larger polygons, since we are fill-rate limited. We render spans front to back into a buffer with fixed capacity. If spans fuse on the way, great! If the buffer is full, accept overdraw. The pipeline architecture of JRISC likes parallel processing. I mean, for the buffer there would be an unrolled loop with interleaved branches. Like, you do two CMPs, then a branch, some store, another branch. Proper mixing reduces wait states in the GPU. As the GPU-memory-as-texture-cache code says: GPU SRAM is indeed 32 bit and thus two instructions are fetched at a time. Just align your labels and keep load/store away from them! An unrolled loop also allows bubble sort within the registers alone. So basically, looking at the number of registers and the low amount of SRAM, sorting up to 32 values is very fast. Anything above needs some tree approach (less robust). With RLE sprites or polygons, only a few of the edges should cross as we advance to the next scanline. Thus bubble sort needs to count / detect the number of changes. Also we have to do other stuff in between those sorts; I guess pure register-based sorting is not as useful as I first thought. Maybe have a first pass which also loads, then register-based passes until sorted, and then store. This would also work in Super Burnout (to increase horizontal resolution?). We can use the spans from the large OBJECTs as occluders and also tell the OP not to load invisible phrases. Pixel mode is not really a thing with the OP, and there is a big cost per sprite per scanline anyway, which prompts everyone to use the blitter to draw the smaller sprites into the line buffer. Even for unscaled sprites the blitter is 2 times slower. For opaque objects the z-buffer could be used as a second phrase.
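The register-resident bubble sort with change counting could be prototyped like this (my sketch; the point is that scanline coherence means few edges cross between adjacent lines, so counting swaps lets most lines finish after zero or one pass):

```typescript
// One bubble-sort pass over edge x-positions; returns the number of swaps.
function bubblePass(xs: number[]): number {
  let swaps = 0;
  for (let i = 0; i + 1 < xs.length; i++) {
    if (xs[i] > xs[i + 1]) {
      [xs[i], xs[i + 1]] = [xs[i + 1], xs[i]];
      swaps++;
    }
  }
  return swaps;
}

// Repeat until no edge crossed; on coherent scanlines this exits fast.
function sortEdges(xs: number[]): number[] {
  while (bubblePass(xs) > 0) { /* nothing crossed when this returns 0 */ }
  return xs;
}
```

On JRISC the array would live entirely in registers with the loop unrolled, so each compare-and-swap is a CMP plus a conditional move pair.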
But 16 px wide spans .. an extra code path for these. I dunno. So in the end: the 5 cycles per pixel on the blitter is a given. There are no records to break. It is just sad that the memory is idle most of the time. If the blitter were working all the time, we could fill every pixel in the windshield of, for example, Test Drive (Amiga, PC) (320x100 px). It is what it is. But if we decide to like the Jaguar, we decide to like that most of the memory bandwidth is wasted. Got that out of the way. Of course you get conflicts with the GPU if you try to do it in SRAM, because the GPU continues to run even if it does not own the bus. Software rendering on the GPU always comes out a bit slower than those 5 cycles. So we are stuck with slow and slightly slower. Benchmarking will show two applications: for scaling down you'd better collect a phrase of pixels and then blit to DRAM (for lower than 60 fps or full scanlines). For scaling up / tiling you'd better cache the original texture. The latter would prompt me to put those draw calls into the vertical border. So I cannot sort by z? It is for scaled-up only, so the part until the span buffer overflows. So many code paths to benchmark .. Anyway, beyond flat-shaded games which look like they run on a 16-bit console, Wing Commander, OutRun, and Fight for Life are possible. Fight for Life just has bad game mechanics. And it needs a simple floor .. I want all the polys on the fighters. Sorry, cannot have a nice background on the Jag.
  14. But why do you go for this popping art style instead of a fluent zoom? So where does all the memory go then? If you let the zoom snap to some factors, you could also let the rotation snap. Far away, the small objects could rotate, and you could have some twist in your roads. Thanks to your camera, the rotation close to it snaps to zero to let the OP deal with those sprites. I mean, something to go beyond Super Burnout?
  15. Why would we use pixel art in a (pseudo) 3d game? From the video it looks like the blitter cannot even load a tiny amount of the Jag's RAM anyway: pixel mode is limited by output size, not input size. This is entirely dictated by the 3d part. You could even load some scenery from ROM or decompress it on Jerry. Note how in OutRun objects repeat for a while, then a new bunch comes up. People in this forum told me that they never felt limited by cartridge size in their homebrew projects. I wonder why there is a JagCD then. Only for the FMV, apparently. Have you guys seen the pseudocode for DCT (de-)compression? Like 20 adds, 10 copies, and two mults. I wonder why, when on the Jag MUL is as fast as add and copy. I would (additionally) need a long instruction word (64 bit) like the i750 from 1987, with 4 add and copy instructions on packed 64-bit registers. Ah, where was I? Ah, yeah: why not use polygon models for the vehicles if you think that RAM is a limit? Then you can turn them in real time. Ah, I see, the sprites snap to fixed scales. So sprites saturate to 1:1 scale for a good fill rate. But then again, for the largest sprites the OP scaling would be a good fit. In the video, like 10% of the pixels are touched by the blitter? Dirty rectangles and some OP magic (interrupts -> GPU). I thought the floor in this game was rendered to a frame buffer .. or is it full Super Burnout with render-to-linebuffer already? Those sprites are huge!