It's 1993, you're in charge of the Jag, what do you do?


A_Gorilla

Yes, a C compiler could generate good GPU/DSP code, but paging in and out of local memory was still needed. (I think Doom's renderer was compiled as well.)

Fixing the main memory bug would allow any C code to run on the GPU/DSP. Maybe it wouldn't be wonderfully optimised assembly, but it would still be faster than 68k asm (or even 68k C), and Atari would have a single toolchain to optimise (or 3rd parties could provide better compilers).

The main memory bug and running C on the RISCs have nothing to do with each other.

Running gpu out of main is not always faster than the 68k.

It would have to be "wonderfully optimized assembly" in order to be useful in main.

 

They are related if you intend to write an entire game in C - not just a group of small routines.

It's interesting that you find cases where the 68k is faster than the GPU running from main? What are they?

 

Good question. That would be lower than 20% of capacity according to AO's website.

 

Let's look at it from a maths point of view. I believe the speed of the Atari ST's 68k at 8MHz was reported as being very roughly 1 MIPS, so at 13.3MHz this should be about 1.66 MIPS. In theory the GPU can reach 26.6 MIPS; in practice this tends to be more like 17 MIPS, in other words about 10x the speed of the 68k. Even if we run at 20%, it's still twice as fast as the 68k, and that's not even taking into account the effects on the bus.
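
The arithmetic in this post can be checked with a short C sketch. The inputs are the poster's rough estimates (about 1 MIPS for an 8MHz 68k, about 17 MIPS in practice for the GPU), not measurements. Scaling the 68k figure linearly with clock speed is an assumption, and the function names are made up for illustration.

```c
#include <stdio.h>

/* Rough estimate only: assumes 68k throughput scales linearly with
   clock speed, starting from the ~1 MIPS at 8 MHz quoted in the post. */
double m68k_mips(double clock_mhz)
{
    return 1.0 * clock_mhz / 8.0;
}

/* GPU throughput at a given fraction of its practical rate
   (the poster's ~17 MIPS figure, taken as an assumption). */
double gpu_mips(double fraction)
{
    return 17.0 * fraction;
}

/* Prints the comparison made in the post. */
void print_estimate(void)
{
    double cpu = m68k_mips(13.3);

    printf("68k at 13.3 MHz: %.2f MIPS\n", cpu);
    printf("GPU at 100%%: %.1f MIPS (%.1fx the 68k)\n",
           gpu_mips(1.0), gpu_mips(1.0) / cpu);
    printf("GPU at 20%%:  %.1f MIPS (%.1fx the 68k)\n",
           gpu_mips(0.2), gpu_mips(0.2) / cpu);
}
```

Under these assumptions the GPU at 20% of its practical rate still comes out at roughly twice the 68k estimate, which matches the claim in the post. None of this models bus contention.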

They are related if you intend to write an entire game in C - not just a group of small routines.

It's interesting that you find cases where the 68k is faster than the GPU running from main? What are they? Are they situations which would run faster if the memory bugs had been fixed?

 

Mainly and usually in small tight loops. Otherwise an unrolled loop runs as fast. So careful coding out in main is still always faster than the 68k by plenty.


So my guess is that the 40MHz rumor started from some 3rd party developer who misinterpreted the manual. Has anyone seen any authoritative quotes on the subject?

- KS

 

I may have mentioned another guy who actually reclocked a Jaguar to 40MHz. He said he put in a HoverStrike cart that ran at a ridiculous 60 FPS for about a minute, and then the unit blew up.


I've yet to see a case where the 68k was quicker than the GPU running from main

 

There is one case where the tight-loop scenario does indeed choke the GPU. But we are talking about very few instructions between the loop points. Once unrolled, it actually ran as fast as it did in local.

You're saying that in that situation the 68k could have done it faster?

It sounds like a very extreme example though, and the kind of thing that could be optimised even in C code for the GPU (i.e. if there's a tight loop, unroll it a few times in the C code).

Not extreme at all. In fact, a C compiler would just make this worse, as it would usually try to minimize instructions and so create this case.


It's the only example, and I am pretty sure it was done using the SMAC assembler and not by hand. My guess is it has something to do with SMAC not handling something properly, most likely the broken JR instruction handling.

The problem was with SMAC and MAC.

The point is, there is no magical solution to fixing and optimizing all the quirks of the Jaguar system.

SMAC helps a lot by allowing code to be executed from main, but it doesn't keep the programmer from writing bad code.

And this is exactly why I've said time and time again: write code for the J-RISCs out in main in hand assembly, as anything else is bound to cause you headaches.


Interesting. What code snippet was it? ( 'unroll loops' in gcc generally increases the number of instructions within a loop )

 

Um... it was assembler using SMAC; it has nothing to do with gcc. GCC would not help at all, nor would any C compiler. There are going to be several places where no C compiler will avoid such a tight loop.

You can tell the C compiler to unroll loops, but then, as Owl has suggested many times, everything is a trade-off.

If you always unroll loops, your code grows significantly.
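
The trade-off described here can be put into rough numbers with a small C sketch. The instruction counts are hypothetical inputs chosen by the caller, not figures for any real Jaguar routine or compiler, and the function names are made up for illustration.

```c
/* Static size of a loop unrolled `unroll` times: the body is repeated,
   while the loop overhead (counter update, compare, branch) appears once. */
unsigned unrolled_code_size(unsigned body_insns, unsigned overhead_insns,
                            unsigned unroll)
{
    return body_insns * unroll + overhead_insns;
}

/* Number of times the loop branch executes to process `items` elements,
   rounded up. A real unrolled loop also needs remainder handling. */
unsigned branches_executed(unsigned items, unsigned unroll)
{
    return (items + unroll - 1) / unroll;
}
```

With a hypothetical 2-instruction body and 3 instructions of loop overhead, unrolling 8 times grows the loop from 5 to 19 instructions while cutting the branches needed for 1000 items from 1000 to 125.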


Also, is it slower because of the main code workaround, or would it still be slower if the GPU didn't have the main code bug?

 

No, it is slower due to bad coding techniques. If you were able to reasonably switch the DRAM to SRAM, there would be no slowdown at all as long as you wrote the code properly. The only slowdown would be from the other parts of the system using the main bus RAM.

Edited by Gorf

Switching DRAM to SRAM would be extremely unlikely though :) I was just curious as to what the routine would be ( all I could think of that might be slower than 68k in main ram might be some kind of polling loop )

 

Like I said, if you could do it reasonably. But yeah, it's like a polling-type loop, which is not something I'd bother doing on the GPU. Use the interrupts if you are waiting on the blitter or OPL with the J-RISCs.

In fact, I'm willing to bet that the problem JagMod was dealing with was waiting on either of those two in a polling-type fashion. Not something I recommend, and also not something SMAC is good at dealing with. As excellent an app as SMAC is, it still has a few issues that need attention before I'd use it for my code.


No, I was just copying memory in a tight loop, one load, one store, and the gpu was slower running from main than the 68k.

As soon as I put multiple load/stores into the loop, it got faster.

But if you are writing something like a strcpy in C for the gpu, that's the kind of assembly the compiler will generate.

 

I'm still a proponent for gpu in main ram, but you have to be careful what you are doing, not everything running on the gpu is faster.
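
The two loop shapes described in this post can be sketched in C as follows, assuming plain 32-bit word copies. The function names are made up for illustration and this is not JagMod's actual code. Whether a given compiler really emits one load and one store per pass for the first version depends on the compiler and its flags.

```c
#include <stddef.h>
#include <stdint.h>

/* Tight loop: one load and one store per iteration, so the loop branch
   is taken once for every word copied. */
void copy_tight(uint32_t *dst, const uint32_t *src, size_t n)
{
    while (n--) {
        *dst++ = *src++;
    }
}

/* Unrolled by four: four loads and four stores per iteration, so the
   loop branch is taken once for every four words copied. */
void copy_unrolled4(uint32_t *dst, const uint32_t *src, size_t n)
{
    while (n >= 4) {
        uint32_t a = src[0], b = src[1], c = src[2], d = src[3];

        dst[0] = a;
        dst[1] = b;
        dst[2] = c;
        dst[3] = d;
        src += 4;
        dst += 4;
        n -= 4;
    }
    /* Copy the remaining 0 to 3 words one at a time. */
    while (n--) {
        *dst++ = *src++;
    }
}
```

Both functions copy the same data. The second simply puts more work between the loop points, which is the change the post reports as making the GPU faster when running from main.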

Ahh, I'd never tried such tight loops, which would explain why I hadn't seen that effect.

I've never had any such bus-hammering experiences in main... then again, I don't really run many (or any) tight loops out there. I have at least a dozen or more instructions between the loop points.
