
Neo Geo to Jaguar ports in 3,2,1 GO!


joeatari1


5 hours ago, cubanismo said:

There's no scenario where GPU-In-Main wins Vs. GPU local I don't think. It's just better than the naive everything-on-68k approach. Would it bring Tomb Raider from .5fps to 1fps? Maybe. Would it magically make it playable? Not at all.

Who claimed that? 

 

There is quite an in-depth and sincere discussion about this by AtariOwl, who did a nice texture-mapped 3D demo.

 

BTW, 1 FPS to 5 FPS would be a huge improvement, and my impression was always that it was more about step-by-step improvements than "magical things" like getting to PS1 levels.

 

When it comes to 3D on the Jaguar, every frame counts, and a (theoretical) 5x performance boost would be incredible. More likely, optimization would mean smaller gains, but for texture-mapped 3D those are still significant, especially if small optimizations add up.

 

So, is there a reason to write off GPU in main? No. Is there a reason to believe in magical performance boosts? No.

 

Both sides of the argument don't sound very reasonable to me.

Edited by agradeneu

I personally was just musing about how different techniques may be a better fit in different situations, or more specifically for the two custom processors, especially regarding the two RISC-in-local techniques I mentioned.

 

I wasn't making any claims about the performance of Tomb Raider. Considering how well Tomb Raider turned out on the 32X, no one really knows how it will go on the Jag until it's tried.

Edited by JagChris

3 hours ago, agradeneu said:

So, is there a reason to write off GPU in main? No. Is there a reason to believe in magical performance boosts? No.

 

Both sides of the argument don't sound very reasonable to me.

well said,

 

I see an interesting scenario: after finishing its job, the GPU jumps to main RAM in order to load new code (e.g. with the blitter) into its internal RAM. It does some work while the code loads, and when the load is finished it jumps back to internal RAM to do the next job. And again...

 

 


23 hours ago, cubanismo said:

There's no scenario where GPU-In-Main wins Vs. GPU local I don't think. 

 

Perhaps there's a situation where you have all the speed you need, and larger functions or larger stacks would be desirable. Just more room in general.


3 hours ago, Cyprian said:

I see an interesting scenario: after finishing its job, the GPU jumps to main RAM in order to load new code (e.g. with the blitter) into its internal RAM. It does some work while the code loads, and when the load is finished it jumps back to internal RAM to do the next job. And again...

Yeah, this is pretty much how I imagined you would use the GPU if you had a C compiler + assembler that could build machine code able to run and jump in main. You could organize your key routines into a jump table, then tweak the assembly a little, or rewrite routines entirely, to get them running in main. You'd probably need a little hand-rolled assembly to do the jumps back and forth. Theoretically, if you had the compiler outputting position-independent code, you could just blit routines to local memory and jump to them as-is.

 

You'd still want all the important code in GPU local, and you might be throwing away some parallelism if you previously had points in your frames where the 68k was running from main memory without contending with any other processors for the bus (I can't imagine this happens very often, but I've done no measurements). Still, in some cases this might be a little faster.

 

Again, though, I don't think this technique would lead to substantial gains in theoretical raw throughput, assuming you were going to take the time to completely optimize your engine. That's almost certainly going to lead you to putting all the good stuff in GPU local memory one way or the other. I think most of the gain here would have been in developer productivity. Your initial builds would be faster, and if that's fast enough, you could ship that way and move on to the next project. No point in further optimizing something that's already good enough. It would also be way faster/easier to take GPU-in-main code and move it to GPU-in-local than it is to take 68k code and manually port it to JRISC for use with GPU-in-local, meaning a faster path overall to theoretically optimal perf.


On 4/29/2022 at 4:15 PM, cubanismo said:

Fun pointless debate: N64 has a processor with 64-bit logic on a 32-bit bus. Jaguar has a processor(s) with 32-bit logic on a 64-bit bus. Which is more 64-bit? Is either? I would claim the Jaguar is actually more 64-bit in a practical sense. As Atari said, it's 64-bit where it needs to be. No one needs 64-bit logic in a game console with 2-8MB of RAM.

Slight correction here - the N64 is a 64 bit processor on a 16-bit bus. The bus between the main processor and the coprocessor chips was 16 bits, and passed packets of data. Even the cart is only 16 bits wide. The 16-bit packet nature of the bus is what contributed to the large latency on cache misses for the cpu. Nintendo also didn't want developers running 64-bit code as that unnecessarily expanded the amount of data needed to be passed across the bus. The N64 actually runs faster with 32-bit code, and as the N64 was never going to have more than 4GB of anything to access (max ram of 8MB, max rom of 128MB), 64 bit pointers were a complete waste. Saving and restoring 64 bit registers would double the time on entering and exiting functions.

 

Note that the bus between the RDP and RDRAM was 18 bits. That's the main 16 bits plus the extra two error detection bits that RDRAM had. It could use those extra two bits to increase the levels of alpha in 16 bit rendering. It's only because RDRAM was ridiculously fast (for the time) that the N64 had any kind of real performance. Imagine trying to draw across a 16 bit bus while the processor is trying to run code/data across the same 16 bit bus to the same 16 bit ram.

 

So in one respect, I agree - the N64 was NOT 64-bit where it truly mattered.


41 minutes ago, Chilly Willy said:

Slight correction here - the N64 is a 64 bit processor on a 16-bit bus. The bus between the main processor and the coprocessor chips was 16 bits, and passed packets of data. Even the cart is only 16 bits wide. The 16-bit packet nature of the bus is what contributed to the large latency on cache misses for the cpu. Nintendo also didn't want developers running 64-bit code as that unnecessarily expanded the amount of data needed to be passed across the bus. The N64 actually runs faster with 32-bit code, and as the N64 was never going to have more than 4GB of anything to access (max ram of 8MB, max rom of 128MB), 64 bit pointers were a complete waste. Saving and restoring 64 bit registers would double the time on entering and exiting functions.

 

Note that the bus between the RDP and RDRAM was 18 bits. That's the main 16 bits plus the extra two error detection bits that RDRAM had. It could use those extra two bits to increase the levels of alpha in 16 bit rendering. It's only because RDRAM was ridiculously fast (for the time) that the N64 had any kind of real performance. Imagine trying to draw across a 16 bit bus while the processor is trying to run code/data across the same 16 bit bus to the same 16 bit ram.

 

So in one respect, I agree - the N64 was NOT 64-bit where it truly mattered.

That's very interesting. Hypothetically speaking (and let's all acknowledge we know system specs are more than the sum of their parts), with all this knowledge about the N64 not being "as 64-bit as the Jaguar," i.e. "where it needs to be," why are Nintendo 64 games so much better looking and more fun than Atari Jaguar games? Let's assume the latter part (fun) is subjective; is the Jaguar remotely close to being able to pull off N64-level games? Again, let's say I don't know blitter from twitter; just asking so I can know, and tune out every insane mention of the comparisons in the future. Because there seems to be a f**king ton. Almost as many as the insane Saturn-and-Jaguar, or Dreamcast-and-Jaguar, or [...] comparisons. At this point, if I mentioned on here that I was playing Forza Horizon, there's a chance a random nutter would pop up with, "Probably could run on the Jag... take out some textures, sure. But you know? So much potential. So much potential. Bro.... the potential? So much."

Edited by Jag64

Nintendo made the N64 easier to program for. You had one processor that ran "normal" code. It had a robust toolchain, having compilers and assemblers, so you could write the code any way you were comfortable with. The power of the coprocessor for graphics and sound was easy to use. Regardless of how much I loathe their "microcode" nonsense Nintendo yammered about incessantly, it was easy to do all the things you needed to for 3D, and the audio library was up to playing sound effects while handling a MIDI-ish soundtrack.

 

About the only real problem was the development system itself - if you thought buying an ST just to program for the Jaguar was bad, imagine if you were told you needed to buy an SGI instead!


11 minutes ago, Chilly Willy said:

Nintendo made the N64 easier to program for. You had one processor that ran "normal" code. It had a robust toolchain, having compilers and assemblers, so you could write the code any way you were comfortable with. The power of the coprocessor for graphics and sound was easy to use. Regardless of how much I loathe their "microcode" nonsense Nintendo yammered about incessantly, it was easy to do all the things you needed to for 3D, and the audio library was up to playing sound effects while handling a MIDI-ish soundtrack.

 

About the only real problem was the development system itself - if you thought buying an ST just to program for the Jaguar was bad, imagine if you were told you needed to buy an SGI instead!

Okay, we're ticking all the marks: Jaguar had bottlenecks. N64 didn't. Jag lacked SDKs and code. N64 didn't. Etc., etc. So with all that established knowledge cemented and in place... it's been 25 years. In a quarter of a century, shouldn't someone have been able to make a game that shows the Jaguar's "true power"? Just trying to figure out if I've been reading what equates to thread after thread of dogs chasing their tails over the course of every comparison conversation. If the bottlenecks cannot be overcome, bypassed, whatever, then the answer to my question about the Jaguar being able to produce/play/render/compose a game on par with the N64 is "no."

 

(thinking "out loud" here) Logically, why hasn't a pack of dorks gone nuts together and worked on isolated pieces of the machine? One dude just spends his time on Tom, another on Jerry, [...]. The console was designed like Atari operated: with almost no communication among its parts. So why not go at the design like that?


Who said the N64 didn't have bottlenecks? I certainly didn't. In fact, I specifically said that the only reason the N64 had any speed at all is that its ram was stupid fast. That's the primary bottleneck, same as the Jaguar - all the ram is shared. While graphics are being fetched for display, everyone else has to wait. While graphics are being drawn, everyone else has to wait. While the processor is fetching/storing code/data, everyone else has to wait. Unified ram is a big bottleneck on systems that use it. The main cpu in the N64 at least has decent caches to allow it to stay off the bus most of the time. A cache miss can be a big slow-down if you're not careful as much of the bus time will be going to the RDP to draw the 3D, and the display interface to output the video.


2 minutes ago, Chilly Willy said:

Who said the N64 didn't have bottlenecks? I certainly didn't. In fact, I specifically said that the only reason the N64 had any speed at all is that its ram was stupid fast. That's the primary bottleneck, same as the Jaguar - all the ram is shared. While graphics are being fetched for display, everyone else has to wait. While graphics are being drawn, everyone else has to wait. While the processor is fetching/storing code/data, everyone else has to wait. Unified ram is a big bottleneck on systems that use it. The main cpu in the N64 at least has decent caches to allow it to stay off the bus most of the time. A cache miss can be a big slow-down if you're not careful as much of the bus time will be going to the RDP to draw the 3D, and the display interface to output the video.

Ok. Cool. Gonna try this again:

Can the Atari Jaguar produce a game on par with the N64? Please answer yes or no.



1 minute ago, Jag64 said:

Ok. Cool. Gonna try this again:

Can the Atari Jaguar produce a game on par with the N64? Please answer yes or no.

Yes... and no. It depends on the game. For a pure 3D game like Mario64, the Jaguar isn't up to the task. At least, not compared to the N64. For a 2D game, the Jaguar could compete with any of the other consoles - the example of that being Rayman. The Jaguar version is easily as good as any other port of that game.

 

Let's face it, the Jaguar's 3D is rudimentary at best, needing a lot of babying to get good results. The N64 has a full-on GPU that was on par with or ahead of what was available for the PC at the time. If you ran your Jaguar game code on the GPU in local RAM, the N64's main processor was almost four times as fast on the same code. If you ran the code on the 68000, the N64 did loops around it running backwards on its hands. It's really not that fair to compare the N64 with the Jaguar. I don't think of the N64 as the last of its generation of consoles; I think of it as the FIRST of the next generation, the one that included the PS2 and Xbox.


3 hours ago, Chilly Willy said:

I don't think of the N64 as the last of its generation of consoles; I think of it as the FIRST of the next generation, the one that included the PS2 and Xbox.

 

I've never heard of this thought process before. I'm curious to hear what generation the GameCube is in to you, because it was released before the Xbox.


7 hours ago, Jag64 said:

with all this knowledge about the N64 not being "as 64-bit as the Jaguar," i.e. "where it needs to be," why are Nintendo 64 games so much better looking and fun than Atari Jaguar games?

Jaguar just had a wider bus. That doesn't by itself make it technically superior in any way. Even for that one aspect of performance, what matters is the overall bandwidth of the bus, which is, very roughly speaking, speed x width. The overall theoretical throughput of the infamous 64-bit Jaguar bus was apparently ~100MB/s, while the N64 could do over 550MB/s on its (according to Wikipedia) 9-bit (or 16-bit according to the posts above; I know very little about the N64) memory bus. And of course there are many more factors that go into overall system performance and game quality than peak memory bus throughput. I just wanted to point out again how insignificant the bits in the bit wars were as a measure of system capabilities. But boy did they ever matter in the marketing back then!


Are the bugs related to the GPU going to and from main memory the final blow? Not knowing how anything actually works, I could see a game engine built to stream main memory to the GPU and switch between active and being-loaded chunks of code and data. But if you have to align things and whatever to avoid bugs, then that extra overhead must kill the gains.

 

Am I close to an actual correct thought?


52 minutes ago, Punisher5.0 said:

 

I've never heard of this thought process before. I'm curious to hear what generation the GameCube is in to you, because it was released before the Xbox.

I respect Chilly Willy's opinion a lot, since I have followed his work for a few years… but I also find his point of view curious.

I have always thought of the N64 as being closer to the PS1 than to the Dreamcast, which to me was the first next-gen console… Compared to the DC, the N64 had low poly counts, low framerates, blurry textures, and ugly lighting and transparency FX… but I am no coder, so my opinion might be poop compared to his.

Maybe Chilly Willy can expand a bit more on why he feels that way…


Nintendo tends to do their thing with little thought to how the other companies are doing their own. Most people think of the N64 as late to the party for the PS1/Saturn generation. You can think of it that way. As a programmer, it just doesn't feel like the same generation to me. The main processor supported floating point, an MMU, and kernel/user mode, all things found on the next generation, and none of the previous. The GPU was capable of perspective correct rendering, another thing only found on the next generation rather than the previous. It had much more memory (4 or 8 MB). The clock rates were far higher than the previous generation. Nintendo designed their SDK to isolate the developer from the hardware, again like the next generation and completely opposed to the previous.

 

If you think of it as first of the next gen, a lot of these features all feel more in common. However, being first means everyone after you can see what you've done and try to do better, much as Nintendo did to Sega in the Genesis/SNES generation. The way the SDK for the N64 works, it seemed Nintendo had intended for their next console to be an updated version of the N64. That clearly didn't happen, and Nintendo moved to a new architecture, one it WOULD stick with for at least a couple more generations.

 

Most people think of the start of the next generation as being the Dreamcast in 1998. But if you look at the DC, it's hardly much more powerful or advanced than the N64. Clearly Sega was just seeing what Nintendo did, and then doing it a bit better. Then Sony did that a little better with the PS2. MS was the wild card in this generation, as it wasn't really clear if or when they'd throw their hat into the ring. At one point, it was just Nintendo, Sega, and Sony one-upping each other in turn. But the hardware and SDK for the N64 look more like the sixth gen than the fifth, which is why I consider it, rather than the DC, first in that gen.

It came out about half-way between the two generations, so you could easily include it in either from a release-date point of view. However, being so early meant the N64 ran out its lifespan in the middle of what most people call the sixth generation, so there wasn't much pressure on Nintendo to make as big an improvement in their next system as there had been for the N64. The GameCube seems more like the rest of the sixth-generation systems, so they consider it part of that generation.

I think of it more as Nintendo further divorcing themselves from what the rest of the game community was doing. They were taking themselves out of the race to make the "best" console. They started making improvements to the console as needed for the next generation of GAMES, rather than trying to match or exceed the next generation of hardware. They realized that it was the games that were important, not the hardware. As long as the hardware was capable of what the software needed, they didn't need to compete in the hardware wars like everyone else. Every generation has taken Nintendo even further out of that race, leaving it to Sony and MS to keep driving hardware further. It is kinda funny that in the end, Sony and MS have "standardized" on basically a PC with near-identical specs. They also seem to be getting the idea that it's the games that are important, not the hardware.

Edited by Chilly Willy
speeling :)

2 hours ago, Punisher5.0 said:

 

I've never heard of this thought process before. I'm curious to hear what generation the GameCube is in to you, because it was released before the Xbox.

John Carmack made a comment about how the N64 was handling graphics in a way that was ahead of its time. I should find the quote.


2 hours ago, Gemintronic said:

But if you have to align things and whatever to avoid bugs, then that extra overhead must kill the gains.

There's no real way I know of to measure the perf impact of any wasted alignment bytes in the instruction stream, because removing them results in non-functional code, but the alignment requirements as I understand them were minor enough that it probably isn't a major factor in perf. If you're clever enough, or realistically, if your compiler+optimizer are clever enough, you can re-order the code to use useful instructions to align your jumps rather than no-ops, just like you can with the load stalls and always-executed instructions following conditional branches. At that point, there's no perf impact due to the workarounds.


On 4/29/2022 at 11:11 PM, Cyprian said:

I've just found a website which claims that the PS1's CPU, an R3000A clocked at 33MHz, has 4KB of internal instruction cache and 1KB of data cache. It is connected to main RAM by a 132MB/s bus and reaches 30 MIPS (million instructions per second).

Just for comparison, the Jag's GPU: 16 MIPS, with a 106MB/s main RAM bus.

 

I wonder about that 106MB/s bus performance claimed by Atari. It is a simple calculation: 13MHz multiplied by 8 bytes (64 bits, the bus width). Atari's assumption is simple: the main bus can provide output in one main bus cycle (two GPU cycles).

But we know that the GPU can wait 5, 7, or more cycles for a simple bus response.

It depends on what we are talking about, but generally we are talking about maximum throughput, not the average.

For the R3000 at 33MHz, 30 MIPS seems to be the maximum (to be checked, but it is possible that it cannot execute 1 instruction per cycle continuously). So theoretically, in the best case, it has an instruction throughput of 30 MIPS (~1 instruction/cycle) or 33 MIPS. I don't know the average we can expect with good programming, but it must be around 25 MIPS.

The PS1's 132MB/s memory figure is also a theoretical maximum. For DRAM, 132MB/s with a 32-bit bus at 33MHz can only occur during burst mode with BEDO, SDRAM, or a similar memory type (1 cycle per transfer in burst mode). To this must be added the precharge + RAS-to-CAS delay for any page miss, plus refresh cycles.

 

JRISCs can execute 1 instruction per cycle, so the theoretical maximum throughput is 26.6 MIPS for each of the JRISC cores in the Jaguar (not 16). With good programming you can expect to reach an average of 20 MIPS or more. The JRISC cores have a 4-stage pipeline plus a prefetch mechanism.
Similarly, Jaguar DRAM has a maximum theoretical throughput of 106MB/s in fast page mode: a 64-bit data bus at 26.6MHz with 2 cycles per transfer in FPM gives 106MB/s.

Again, for average speed you have to add the precharge + RAS-to-CAS delay for page misses, plus refresh cycles. On the Jaguar, a precharge + RAS-to-CAS or refresh cycle takes only 3 clock cycles.


5 hours ago, DEATH said:

JRISCs can execute 1 instruction per cycle, so the theoretical maximum throughput is 26.6 MIPS for each of the JRISC cores in the Jaguar (not 16). With good programming you can expect to reach an average of 20 MIPS or more. The JRISC cores have a 4-stage pipeline plus a prefetch mechanism.

good point,

to be honest I'm not sure whether the GPU/DSP do 26 MIPS or 13 MIPS, because I heard that the fastest instruction takes 2 cycles.

 

5 hours ago, DEATH said:

Similarly, Jaguar DRAM has a maximum theoretical throughput of 106MB/s in fast page mode: a 64-bit data bus at 26.6MHz with 2 cycles per transfer in FPM gives 106MB/s.

Again, for average speed you have to add the precharge + RAS-to-CAS delay for page misses, plus refresh cycles. On the Jaguar, a precharge + RAS-to-CAS or refresh cycle takes only 3 clock cycles.

yep, but the GPU has no fast (106MB/s) access to main RAM.


38 minutes ago, Cyprian said:

good point,

to be honest I'm not sure whether the GPU/DSP do 26 MIPS or 13 MIPS, because I heard that the fastest instruction takes 2 cycles.

 

yep, but the GPU has no fast (106MB/s) access to main RAM.

The GPU has 64-bit access to the data bus (yes, there are 64-bit instructions... OK, two, but there are). I've never tried it, but theoretically it can load or store at the maximum bus speed.


1 hour ago, DEATH said:

The GPU has 64-bit access to the data bus (yes, there are 64-bit instructions... OK, two, but there are). I've never tried it, but theoretically it can load or store at the maximum bus speed.

This is a good question for @42bs, but I heard that the GPU needs 5 or more cycles per access to main RAM.

 

1 hour ago, DEATH said:

In fact, the GPU does not need to continuously access the data bus to load or store. If you want to do that, use the blitter. Full data bus speed, guaranteed.

I'm not able to find the figures now, but someone measured that also. And the blitter needs 5 or more cycles when copying data from main RAM to GPU RAM.

 

 

---EDIT---

 

https://www.jagware.org/index.php?/topic/464-blitter-timing/

 

The GPU->DRAM transfer in phrase mode:
Result: 11 cycles per phrase -> 4096*11/8 = 5632 cycles for 4K

 

The DRAM->GPU transfer in phrase mode:
Result: 7 cycles per phrase -> 4096*7/8 = 3584 cycles for 4K

 

The DRAM->GPU speed transfer in phrase mode:
Result: 5 cycles per phrase -> 4096*5/8 = 2560 cycles for 4K

 


 

Edited by Cyprian

This topic is now closed to further replies.