
Why are frame rates of Jaguar 3D games so low?



Well, you're not the only one bothered by some design choices Microchip made ;)
What design choices do you find objectionable? I think the 14-bit PICs are an elegant architecture, but I have been unimpressed by the later ones. Yes, they improve some things, but too many aspects seem clunky.
I must admit that I don't have much experience with PICs, especially more recent ones, but when I had to use them, several things bothered me in comparison to Atmel's AVR architecture:

 

- I/O registers that return the state of the pin instead of the state of the output latches when read, and bit write instructions that are really read-modify-write on the whole register. What's worse is that the documentation even explains at great length all the problems caused by such a design! (That's on the 16x84 at least -- I know they somewhat fixed the problem on later models by adding registers.)

 

- Lack of registers and limited instruction set. The datasheet claims it's an advantage because it's easier to learn -- well, maybe for some people, but I find it easier to code for the AVR architecture, where you don't have to use so many "tricks" to get your work done.

 

- A RISC core that needs 4 clock cycles to do a NOP, and 8 for a jump... (the AVR only needs 1 and 2, respectively). I'd have to check, but I think that even dsPICs need at least 3 clock cycles to do anything.

 

- Not a design choice per se, but when the errata for your device is longer than ten pages, something is wrong (some models only, fortunately).

 

I know that the PIC architecture was designed earlier than the AVR, but I feel they've not done much of a job of improving it.

 

BTW, I agree with the rest of your post: I am occasionally baffled by the absence of features that seem useful and simple to implement.


AVP is a raycasting engine, not 3D. The Jaguar doesn't have 3D hardware. Take a look at ALL the rest of the 2D game machines of that time and you'll see that not ONE has a 3D game as fast as the Jaguar at the same pixel depth, resolution, etc. The 3DO was the only console that competed, but it had 3D hardware. I've developed raycasting engines faster than AVP at that pixel depth on the Atari Falcon, but it took me a while to get the optimizations right. I've heard that the AVP programmer was rushed and pressured by Atari.


The Jaguar doesn't have 3D hardware.
Not really true: the blitter can be used for hardware-accelerated texture mapping and Gouraud shading (and possibly Z-buffering, but I'm not sure). But it's far from being a complete 3D chipset.

 

 

 

I think I would classify it as hardware-assisted 3D, but not fully 3D.

The fact is, it makes for MUCH more flexible 3D as it is NOT hardwired. Again... even though textured games run horribly slow on the Jag, the quality is much higher than PS1 or N64. If you observe Hover Strike, you will see AMAZING lighting and mipmapping. Just unfortunately at 20 fps.... :( ...again, overuse of the 68k.


AVP is a raycasting engine, not 3D. The Jaguar doesn't have 3D hardware. Take a look at ALL the rest of the 2D game machines of that time and you'll see that not ONE has a 3D game as fast as the Jaguar at the same pixel depth, resolution, etc. The 3DO was the only console that competed, but it had 3D hardware. I've developed raycasting engines faster than AVP at that pixel depth on the Atari Falcon, but it took me a while to get the optimizations right. I've heard that the AVP programmer was rushed and pressured by Atari.

 

 

You've heard correctly then. This was a common practice, it seems, at Atari at the time. I've seen the AvP demo proto, and it runs at what looks to be 30 to 60 FPS all the time. ALL THE TIME. That was before they used the 68k for AI. If they had been given the time to use the GPU for the AI, that game would have rocked a constant 30 fps at least.


- I/O registers that return the state of the pin instead of the state of the output latches when read, and bit write instructions that are really read-modify-write on the whole register. What's worse is that the documentation even explains at great length all the problems caused by such a design! (That's on the 16x84 at least -- I know they somewhat fixed the problem on later models by adding registers.)

 

This issue was indeed fixed on the later designs.
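
As an illustration, here's a minimal sketch of the hazard and of the classic shadow-register workaround on a mid-range PIC (shadowB is a placeholder name for an ordinary file register, not anything defined by Microchip):

    ; The hazard: BSF PORTB,0 reads the PINS, sets bit 0, and writes the
    ; whole port back. If another output pin hasn't settled yet (say it
    ; drives a capacitive load), its latch gets rewritten with a wrong value.
            bsf     PORTB,0         ; may silently corrupt other output latches

    ; The workaround: keep a shadow of the intended latch contents in RAM
    ; and always write the whole port from it, so pins are never read back.
            bsf     shadowB,0       ; update the RAM shadow (placeholder name)
            movf    shadowB,W
            movwf   PORTB           ; write the entire latch from the shadow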

 

- Lack of registers and limited instruction set. The datasheet claims it's an advantage because it's easier to learn -- well, maybe for some people, but I find it easier to code for the AVR architecture, where you don't have to use so many "tricks" to get your work done.

 

I rather like the PIC instruction set, really. Add-with-carry would be nice, but someone invented a pretty good way to implement that in code. The ability to use f-registers as a destination is very handy. Though there are a few instructions/features I would have liked to have seen implemented on the 18x that weren't.
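
(For the curious: I can't say whether this is the exact method meant above, but one widely circulated version of the trick is the following 16-bit add on a mid-range core that lacks ADDWFC; opL/opH and resL/resH are placeholder names:

            movf    opL,W           ; opL/opH, resL/resH: placeholder registers
            addwf   resL,F          ; add low bytes; sets C on overflow
            movf    opH,W           ; assume no carry: W = opH
            btfsc   STATUS,C        ; carry out of the low byte?
            incfsz  opH,W           ; yes: W = opH+1, skipping the add only if
                                    ; that wraps to 0 (high byte then unchanged)
            addwf   resH,F          ; resH += W

The INCFSZ folds the carry into the high-byte add without needing an add-with-carry instruction.)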

 

- A RISC core that needs 4 clock cycles to do a NOP, and 8 for a jump... (the AVR only needs 1 and 2, respectively). I'd have to check, but I think that even dsPICs need at least 3 clock cycles to do anything.

 

That's a function of clock input stage design. Nearly all microcontrollers, or even state machines, need to divide each execution cycle into multiple steps (unless they require multiple execution cycles per instruction, as with the Z80). On the 6502, there are four steps per cycle: (1) rising edge of ph1; (2) falling edge of ph1; (3) rising edge of ph2; (4) falling edge of ph2. Many of the TIA's internal circuits use a similar 4-step cycle (that's why, among other things, playfield pixels are four dots wide).

 

On the PIC (and the TIA), the four steps of a machine cycle are generated using four consecutive pulses of the input clock. On the 6502 and the AVR, they are generated using the rising and falling edges of the input clock, and the rising and falling edges of a delayed input clock.
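
(To put numbers on it: a PIC16 clocked at 20 MHz divides the input by four, so a machine cycle takes 200 ns and the part peaks at 5 MIPS, while an AVR at the same 20 MHz retires most instructions in a single 50 ns clock.)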

 

There are advantages and disadvantages to each approach, but I consider the issue to be one of electronics more so than one of architecture.

 

- Not a design choice per se, but when the errata for your device is longer than ten pages, something is wrong (some models only, fortunately).

 

Generally true of the older ones, the errata representing things that have been discovered over the years and fixed in newer silicon. But some of the errata were really pretty horrible. I wasted a week because of a severe problem in the 16F648 which rendered its internal data EEPROM all but useless (the problem was that when an EEPROM write completed, the processor would sometimes jump to a random address). Grrrr....

 

I know that the PIC architecture was designed earlier than the AVR, but I feel they've not done much of a job of improving it.

 

I generally like the PIC design more than that of most other micros. I only did one project with the AVRtiny architecture; after it was completed, I discovered that the parts the Atmel rep had been pushing would not be available in the quantities we needed for something like nine months, so I ended up having to redo the project on a PIC.

 

The 14-bit PIC architecture isn't perfect, but it's about as good as it can be given the 14-bit instruction size. The 16-bit parts are more of a disappointment. They somewhat recognized the limitation of having only one W register in that they added a MOVFF instruction to move any register to any register, but I considered that rather disappointing.

 

What I would have liked to have seen (and I actually discussed this when I had lunch with the Microchip guys during a job interview, though I'd guess the 18x design may have been too far along to consider it) would have been a new internal register and a couple of new instructions.

 

The new register would be called WSRC; most instructions would copy W to WSRC, and WSRC would be used in place of W for any instruction that takes W as a source operand.

 

There would be two new instructions, though: USEFW and USELW. USEFW would fetch a byte from an F-register and store it into WSRC (leaving W unaffected). USELW would load WSRC with an immediate constant (again without affecting W). Thus, "USEFW reg1 ; ADDWF reg2" would add reg1 to reg2 without affecting W. The two instructions would eliminate the need for a register-to-register move, since they could be combined to the same effect. The interaction of interrupts and the new instructions could be handled by (1) disabling interrupts between USExx and the following instruction, or (2) requiring the ISR code to begin with two MOVWF instructions to store both WSRC and W, and end with a MOVF and a USEFW instruction to restore them, along with (3) using a backup register for WSRC as is done with W when servicing high-speed interrupts.
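
To make that concrete, a hypothetical snippet (USEFW, USELW, and WSRC exist in no real silicon; this just spells out the proposal above):

            USEFW   src             ; hypothetical: WSRC <- src; W untouched
            MOVWF   dst             ; dst <- WSRC: a register-to-register move

            USELW   5               ; hypothetical: WSRC <- 5; W untouched
            ADDWF   count,F         ; count <- count + 5; W keeps its old value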

 

BTW, I agree with the rest of your post: I am occasionally baffled by the absence of features that seem useful and simple to implement.

 

Worse than missing features are misfeatures: things that required circuitry to implement, yet make the device worse than it would be without that circuitry.


I rather like the PIC instruction set, really. Add-with-carry would be nice, but someone invented a pretty good way to implement that in code. The ability to use f-registers as a destination is very handy. Though there are a few instructions/features I would have liked to have seen implemented on the 18x that weren't.
Well, I guess it's a matter of personal taste then ;)

 

That's a function of clock input stage design. Nearly all microcontrollers, or even state machines, need to divide each execution cycle into multiple steps (unless they require multiple execution cycles per instruction, as with the Z80). On the 6502, there are four steps per cycle: (1) rising edge of ph1; (2) falling edge of ph1; (3) rising edge of ph2; (4) falling edge of ph2. Many of the TIA's internal circuits use a similar 4-step cycle (that's why, among other things, playfield pixels are four dots wide).

 

On the PIC (and the TIA), the four steps of a machine cycle are generated using four consecutive pulses of the input clock. On the 6502 and the AVR, they are generated using the rising and falling edges of the input clock, and the rising and falling edges of a delayed input clock.

Interesting -- I didn't know they used this kind of trick to achieve 1-cycle operations. Thanks for the info.

Interesting -- I didn't know they used this kind of trick to achieve 1-cycle operations. Thanks for the info.

 

It's hardly a new trick (as evidenced by the fact that the 6507 did it); there are advantages and disadvantages as compared with dividing down a faster clock.

 

The advantage is obvious: you don't need as fast a clock signal to obtain a certain level of performance. Since RF emissions are often related to the fastest clock speed on a board, reducing the required clock speed will make it easier to meet FCC requirements.

 

This advantage is often not as great as it would seem, however. If one wants a device to run one million instructions per second (as a good tradeoff between performance and power consumption), it's cheaper to use a 4MHz crystal than a 1MHz one. Further, a divide-by-four clock input will be less sensitive to glitches than one that generates multiple clock phases from each cycle.

 

Perhaps the trickiest issue is related to the minimum required lengths of the different clock phases. Generally, the maximum reliable speed of operation for a microprocessor/microcontroller will be affected by voltage and temperature. If the four clock phases are produced via a divide-by-four counter, then it will be necessary to slow the clock input to the point that the clock period exceeds the longest required length of any of the phases. If reliable operation at a particular voltage and temperature requires that phase 3 be at least 249ns long, then the device will not be able to operate above a 4MHz input (1MHz machine cycle) even if the other phases could be shorter. On the other hand, the device will be operable under a wide variety of voltage and temperature conditions if the clock is slowed down sufficiently.
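
(Worked through: with a divide-by-four input stage, each phase lasts one full input-clock period, so every period must be at least 249ns. The input clock therefore can't exceed 1/249ns, roughly 4MHz, giving a 1MHz machine cycle, even if the other three phases would have tolerated much less.)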

 

Devices which derive multiple clock phases from each cycle are a different story. They can gain some advantage from the fact that they can exploit the different required lengths of the different phases, but they may have a hard time generating phases of the required lengths under all conditions. The chip designer has to strike a balance between making the internally-generated phases longer than necessary under all conditions (which would reduce the maximum allowable clock speed below what would otherwise be possible) and reducing the length of the internally-generated phases (which would mean that the chip couldn't be operated at any speed under those conditions where the internal phases were too short).


It's hardly a new trick (as evidenced by the fact that the 6507 did it); there are advantages and disadvantages as compared with dividing down a faster clock.
Well, I'm only beginning my engineering career; I still have stuff to learn! ;)

 

I thought using both clock edges, and delayed clocks, was frowned upon in "clean" synchronous logic (for the reasons you explained above). I didn't know it was this common.

 

P.S.: Just noticed what you had written about the C2000 in one of your earlier posts. That chip looks seriously crippled for a DSP; even the Jaguar's (trying to get back to the original topic here :D) -- which looks more like a tweaked general-purpose RISC core than a "real" DSP -- supports delayed branches and circular addressing.


I thought using both clock edges, and delayed clocks, was frowned upon in "clean" synchronous logic (for the reasons you explained above). I didn't know it was this common.

 

Use of dual non-overlapping phase clocks can result in extremely robust designs when using static logic, and efficient designs when using dynamic logic (especially with NMOS). One of the nice features of this sort of design is that there are no minimum propagation delays for any part of the circuit other than the clock generation itself. Clock skew tolerance is not affected by minimum propagation delays.

 

An alternative approach which is almost as good is to design logic blocks so that all of the inputs are sampled on one clock edge and all the outputs are sampled on the other clock edge. This often ends up being roughly analogous to the dual non-overlapping phase-clock method, except that each subcomponent logic block generates its own dual-phase non-overlapping clocks internally. The propagation delays within the clocking circuits of each subcomponent must be sufficient to ensure proper behavior, but other propagation delays could be reduced to zero with no adverse effect.

 

Many CPLD designs use single-edge triggering for both input and output latching. This is much less robust. Here, the state of all the latches is sampled simultaneously (one hopes) so the system will take on a new state once per clock cycle. The difficulty with this approach is that it can fail if the clock skew exceeds the minimum propagation delay. There are ways of ensuring correct operation even when such clock skew is unavoidable (e.g. by ensuring that any circuit will be clocked before the circuit that feeds it), but this can often increase design complexity and also increases the likelihood of design mistakes.

 

I wish more logic devices followed the design principle of sampling the input on one clock edge while changing the output on another. It avoids race conditions very nicely, and doesn't really require extra circuitry.


Well, since this thread is way off topic... are either of you into FPGA stuff at all? I just got me a Spartan-3E from Xilinx around Christmas and had Pac-Man and the Astrocade running on it. Very cool, Mike J did those. I'm looking to get the T&J chipset on it eventually. My hope is to fix a few bugs, and speed it up considerably. I am told by those who know that with the right Xilinx and the right code we could imagine a 200+ MHz version of the chipset.

 

Lotsa work.. I have the T&J nets.. I don't have a few old Motorola/Toshiba libs required to simulate the chips for testing/debugging. May have to build them all by hand. DOH!


I have someone well experienced already. I just need the libs.
Sorry, we (SCPCD and I) don't have them either.

 

 

Yeah, no sweat man... no one seems to. It looks like it's roll-up-the-sleeves time. I'm quite confident in the guy I have on the project. Libs or not, I aim to see a Jaguar running at ludicrous speed (of course using Spaceballs technology)! :D


You should ask SCPCD about FPGAs; he is already doing great little things :)

 

 

 

I have someone well experienced already. I just need the libs.

 

The same guy who is working on Jaguar 2? How is the project progressing?

 

Yes, but we have not been doing much at all on it with all the current work I have before me. I need a bigger crew..... 5D Stooges? DOH! Nah! We'll just have to do it ourselves.... ;)

 

Fact is, there are a few other folks helping us behind the scenes with a few projects like the debugulator.



 

Long live the debugulator!

 

Perhaps a moderator can split off the OT posts in this thread and give Zero and the other guy their own thread to talk about whatever it is they are talking about that has nothing to do with 3D Jaguar games.


I think I would classify it as hardware-assisted 3D, but not fully 3D.

The fact is, it makes for MUCH more flexible 3D as it is NOT hardwired. Again... even though textured games run horribly slow on the Jag, the quality is much higher than PS1 or N64. If you observe Hover Strike, you will see AMAZING lighting and mipmapping. Just unfortunately at 20 fps.... :( ...again, overuse of the 68k.

 

I agree, it might be slower, but the graphics are superior to the PSX in almost every way. No clipping, warping, moving or anything of that nature.



 

 

Don't forget 16-bit vs. 256 colors, usually. Shoot, I like the shaded games on the Jag. I love the unique and arcadey/cartooney feel they have. There is plenty of untapped potential still. Granted, we won't be doing teraflops with the Jag, but I know there is more to squeeze out of it. The color quality is also better on the Jaguar. It's much crisper.


Reading five pages of anything will do that to you :P

 

I've always preferred large flat-shaded polys as opposed to lousily textured shitty ones, myself.

 

The PSX for the most part just looks awful, while many of the 3D games on the Jaguar (and hell, Genesis and SNES) looked awesome. It's an art style all its own, really. Now that systems are so powerful, we'll never see that again, even in the handheld world, which is always a generation or two behind the console world.

 

But hey, there's always cheap shareware computer games :P

 

Anyhow... so, to my understanding, the reason the games on the Jag look so good is the 68000 chip, 'cause it can do lots of colors and whatnot, but it's also the reason a lot of 3D and 3D-looking stuff runs slowly? Heh, MOD ME UP, baby. I wouldn't mind seeing a few of the 3D games refined to take out the 68000, or at least make it run more efficiently. Or heck, a few new homebrews could use these techniques. I expect to be playing Halo on my Jaguar by the end of the year :P J/K

