
It's 1993, you're in charge of the Jag, what do you do?


So my guess is that the 40MHz rumor started from some 3rd party developer who misinterpreted the manual. Has anyone seen any authoritative quotes on the subject?

 

- KS

 

Thank you. (I thought it was Gorf who mentioned it much earlier in the thread, but I'm not sure now)


Yes, a C compiler could generate good GPU/DSP code, but paging in/out of local memory was still needed. (I think Doom's renderer was compiled as well.)

Fixing the main memory bug would allow any C code to run on the GPU/DSP - maybe it wouldn't be wonderfully optimised assembly, but it would still be faster than 68k asm (or even 68k C), and Atari would have a single toolchain to optimise (or 3rd parties could provide better compilers).

The main memory bug and running C on the RISCs have nothing to do with each other.

Running the GPU out of main is not always faster than the 68k.

It would have to be "wonderfully optimized assembly" in order to be useful in main.

 

They are related if you intend to write an entire game in C - not just a group of small routines.

It's interesting that you've found cases where the 68k is faster than the GPU running from main. What are they?

 

Good question. That would be lower than 20% of capacity according to AO's website.

 

Let's look at it from a maths point of view... I believe the speed of the Atari ST's 68k at 8MHz was reported as very roughly 1 MIPS... therefore at 13.3MHz this should be about 1.65 MIPS. In theory the GPU can reach 26.6 MIPS; in practice this tends to be more like 17 MIPS, in other words 10x the speed of the 68k. Even if we only run at 20% of that, it's still twice as fast as the 68k, and that's not even taking into account the effects on the bus.
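Laying that arithmetic out (taking the post's 1 MIPS and 17 MIPS figures at face value; they are rough estimates, not measurements):

$$ 1\,\text{MIPS} \times \tfrac{13.3\,\text{MHz}}{8\,\text{MHz}} \approx 1.66\,\text{MIPS (68k at the Jaguar clock)} $$

$$ \tfrac{17\,\text{MIPS}}{1.66\,\text{MIPS}} \approx 10\times, \qquad 0.20 \times 17\,\text{MIPS} = 3.4\,\text{MIPS} \approx 2\times\ \text{the 68k} $$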


They are related if you intend to write an entire game in C - not just a group of small routines.

It's interesting that you've found cases where the 68k is faster than the GPU running from main. What are they? Are they situations which would run faster if the memory bugs had been fixed?

 

Mainly and usually in small tight loops. Otherwise an unrolled loop runs as fast. So careful coding out in main is still always faster than the 68k by plenty.


So my guess is that the 40MHz rumor started from some 3rd party developer who misinterpreted the manual. Has anyone seen any authoritative quotes on the subject?

- KS

 

I may have mentioned another guy who actually reclocked a Jaguar at 40MHz. Said he put in a HoverStrike cart that ran at a ridiculous 60 FPS for about a minute and then the unit blew up.


The main memory bug and running C on the RISCs have nothing to do with each other.

Running the GPU out of main is not always faster than the 68k.

It would have to be "wonderfully optimized assembly" in order to be useful in main.

 

Holy shit! You ARE still alive!


I've yet to see a case where the 68k was quicker than the GPU running from main

 

There is one case where the tight loop scenario indeed chokes the GPU. But we are talking very few instructions in between the loop points. Once unrolled it actually ran as fast as it did in local.


I've yet to see a case where the 68k was quicker than the GPU running from main

 

There is one case where the tight loop scenario indeed chokes the GPU. But we are talking very few instructions in between the loop points. Once unrolled it actually ran as fast as it did in local.

 

You're saying in that situation the 68k could have done it faster?


I've yet to see a case where the 68k was quicker than the GPU running from main

 

There is one case where the tight loop scenario indeed chokes the GPU. But we are talking very few instructions in between the loop points. Once unrolled it actually ran as fast as it did in local.

 

You're saying in that situation the 68k could have done it faster?

 

It sounds like a very extreme example though - and the kind of thing that could be optimised even in C code for gpu. ( ie - if there's a tight loop, unroll it a few times in C code )
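For what it's worth, a minimal sketch of what "unroll it a few times in C code" could look like at the source level (illustrative only; the routine and names are made up, not anything from the code being discussed):

```c
/* Hypothetical example of manual unrolling in C source.
   The unrolled version assumes n is a multiple of 4. */
void scale_tight(int *buf, int n, int k)
{
    for (int i = 0; i < n; i++)       /* one operation per loop test/branch */
        buf[i] *= k;
}

void scale_unrolled(int *buf, int n, int k)
{
    for (int i = 0; i < n; i += 4) {  /* four operations per loop test/branch */
        buf[i]     *= k;
        buf[i + 1] *= k;
        buf[i + 2] *= k;
        buf[i + 3] *= k;
    }
}
```

The trade-off, as noted later in the thread, is that the unrolled version is bigger, which matters whenever code size is at a premium.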


It's the only example, and I am pretty sure it was done using the SMAC assembler and not by hand. My guess is it has something to do with SMAC not handling something properly, probably the broken JR instruction handling.


I've yet to see a case where the 68k was quicker than the GPU running from main

 

There is one case where the tight loop scenario indeed chokes the GPU. But we are talking very few instructions in between the loop points. Once unrolled it actually ran as fast as it did in local.

 

You're saying in that situation the 68k could have done it faster?

 

It sounds like a very extreme example though - and the kind of thing that could be optimised even in C code for gpu. ( ie - if there's a tight loop, unroll it a few times in C code )

Not extreme at all, in fact, a C compiler would just make this worse as it would usually try to minimize instructions and create this case.


It's the only example, and I am pretty sure it was done using the SMAC assembler and not by hand. My guess is it has something to do with SMAC not handling something properly, probably the broken JR instruction handling.

The problem was with smac and mac.

The point is, there is no magical solution to fixing and optimizing all the quirks of the Jaguar system.

smac helps a lot by allowing code to be executed from main, but it doesn't keep the programmer from writing bad code.


Interesting. What code snippet was it? ( 'unroll loops' in gcc generally increases the number of instructions within a loop )


It's the only example, and I am pretty sure it was done using the SMAC assembler and not by hand. My guess is it has something to do with SMAC not handling something properly, probably the broken JR instruction handling.

The problem was with smac and mac.

The point is, there is no magical solution to fixing and optimizing all the quirks of the Jaguar system.

smac helps a lot by allowing code to be executed from main, but it doesn't keep the programmer from writing bad code.

 

 

And this is exactly why I've said time and time again: write code for the J-RISCs out in main by hand in assembly, as anything else is bound to cause you headaches.


Interesting. What code snippet was it? ( 'unroll loops' in gcc generally increases the number of instructions within a loop )

 

Um... it was assembler using SMAC... it has nothing to do with gcc. GCC would not help at all, nor would any C compiler. There are going to be several places where no C compiler will avoid such a tight loop.


Interesting. What code snippet was it? ( 'unroll loops' in gcc generally increases the number of instructions within a loop )

 

Um... it was assembler using SMAC... it has nothing to do with gcc. GCC would not help at all, nor would any C compiler. There are going to be several places where no C compiler will avoid such a tight loop.

You can tell the C compiler to unroll loops, but then, as Owl has suggested many times, everything is a trade-off. If you always unroll loops, your code grows significantly.


Also, is it slower because of the main code workaround, or would it still be slower if the GPU didn't have the main code bug?


Also, is it slower because of the main code workaround, or would it still be slower if the GPU didn't have the main code bug?

 

No, it is slower due to bad coding techniques. If you were able to reasonably switch the DRAM to SRAM, there would be no slowdown at all as long as you wrote the code properly. The only slowdown would be from the other parts of the system using the main bus RAM.

Edited by Gorf


Switching DRAM to SRAM would be extremely unlikely though :) I was just curious as to what the routine would be (all I could think of that might be slower than the 68k in main RAM was some kind of polling loop).


Switching DRAM to SRAM would be extremely unlikely though :) I was just curious as to what the routine would be (all I could think of that might be slower than the 68k in main RAM was some kind of polling loop).

 

Like I said, if you could do it reasonably. But yeah, it's like a polling-type loop, which is not something I'd bother doing on the GPU. Use the interrupts if you are waiting on the blitter or OPL with the J-RISCs.

In fact I'm willing to bet that the problem JagMod was dealing with was waiting on either of those two in a polling-type fashion. Not something I recommend, and also not something SMAC is good at dealing with. As excellent an app as SMAC is, it still has a few issues that need attention before I'd use it for my code.


No, I was just copying memory in a tight loop, one load, one store, and the gpu was slower running from main than the 68k.

As soon as I put multiple load/stores into the loop, it got faster.

But if you are writing something like a strcpy in C for the gpu, that's the kind of assembly the compiler will generate.

 

I'm still a proponent for gpu in main ram, but you have to be careful what you are doing, not everything running on the gpu is faster.
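A rough sketch of the shape being described, written in C for readability (the actual test above was hand-written GPU assembly, and these function names are made up): one load and one store per branch versus several per branch.

```c
#include <stddef.h>

/* Tight copy: one load + one store per loop test/branch. */
void copy_tight(long *dst, const long *src, size_t n)
{
    while (n--)
        *dst++ = *src++;
}

/* Partially unrolled copy: four loads/stores per loop test/branch.
   Assumes n is a multiple of 4. */
void copy_unrolled(long *dst, const long *src, size_t n)
{
    for (; n; n -= 4) {
        dst[0] = src[0];
        dst[1] = src[1];
        dst[2] = src[2];
        dst[3] = src[3];
        dst += 4;
        src += 4;
    }
}
```

When the body is that small, the loop overhead (and, running from main, the instruction fetches) dominates, which is the situation described above where the 68k kept up.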


No, I was just copying memory in a tight loop, one load, one store, and the gpu was slower running from main than the 68k.

As soon as I put multiple load/stores into the loop, it got faster.

But if you are writing something like a strcpy in C for the gpu, that's the kind of assembly the compiler will generate.

 

I'm still a proponent for gpu in main ram, but you have to be careful what you are doing, not everything running on the gpu is faster.

 

Ahh - I'd never tried such tight loops - which would explain why I hadn't seen that effect.


No, I was just copying memory in a tight loop, one load, one store, and the gpu was slower running from main than the 68k.

As soon as I put multiple load/stores into the loop, it got faster.

But if you are writing something like a strcpy in C for the gpu, that's the kind of assembly the compiler will generate.

 

I'm still a proponent for gpu in main ram, but you have to be careful what you are doing, not everything running on the gpu is faster.

 

Ahh - I'd never tried such tight loops - which would explain why I hadn't seen that effect.

 

 

I've never had any such bus-hammering experiences in main... then again, I don't really run many (or any) tight loops out there. I have at least a dozen or more instructions in between the loop points.

