why ZX has and atari hasnt?

Rybags · September 9, 2010

Arcade Asteroids would port just as easily to C64 or Apple - the vector processing emulation would just need rework for the local graphics mapping. Of course the speed factor applies - even on the Atari so far it appears there might be some slowdown.

What would be interesting in the 6502 vs Z80 arena is some directly benchmarkable comparisons of common game-related operations such as block moves and the shift/load/and/or/store sequences that games use.

Of course allow optimisations for both - within reason.

frogstar_robot · September 9, 2010

Arcade Asteroids would port just as easily to C64 or Apple - the vector processing emulation would just need rework for the local graphics mapping. Of course the speed factor applies - even on the Atari so far it appears there might be some slowdown.

I know some of that sort of thing has been done on the C-64 but I'm not familiar with any of the systems and titles involved. But frankencode is neato regardless of the port target. Another place you see it is ports between the TI-99 and the Colecovision. They have a graphics chip in common but not the CPU. So you can mostly get identical graphics and sprite output but the actual code will be very, very different. Though such ports may be amenable to a kind of decompile to intermediate form, then recompile to the other.

Divya16 · September 9, 2010

Reverse example would be trying to port YOOMP rendering routine to spectrum...

Try to write this on Z80:

LDA ADDR1,Y ; 4 cycles

ORA ADDR2,Y ; 4 cycles

ORA ADDR3,Y ; 4 cycles

ORA ADDR4,Y ; 4 cycles

STA ADDR5,X ; 5 cycles

=21 cycles

And then take into account different screen organization, and it turns out that even C64 can not reproduce it and it has the same cpu just slower...

If those ADDRn are fixed, then couldn't you have combined tables of ORA combinations? They take memory, but with banked carts the norm, that shouldn't be a big deal. So you have:

LDA ADDR1,Y

ORA ADDR2+3+4,Y

STA ADDR5,X
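To make the combined-table idea concrete, here is a minimal Python model (not 6502 code). The table names and their random contents are hypothetical stand-ins for ADDR1..ADDR4; the cycle counts are the ones quoted above and assume no page-crossing penalty. It only shows that folding the three ORA tables into one gives the same byte, provided those tables really are fixed.

```python
import random

random.seed(1)  # fixed seed so the hypothetical tables are reproducible

# Four hypothetical 256-byte source tables standing in for ADDR1..ADDR4.
addr1, addr2, addr3, addr4 = (
    [random.randrange(256) for _ in range(256)] for _ in range(4)
)

# Precompute ADDR2 | ADDR3 | ADDR4 once. This is only valid while those
# three tables stay fixed and all three reads share the same Y index.
combined = [b | c | d for b, c, d in zip(addr2, addr3, addr4)]

def render_original(y):
    # Models: LDA ADDR1,Y / ORA ADDR2,Y / ORA ADDR3,Y / ORA ADDR4,Y
    return addr1[y] | addr2[y] | addr3[y] | addr4[y]

def render_combined(y):
    # Models: LDA ADDR1,Y / ORA COMBINED,Y
    return addr1[y] | combined[y]

# Cycle counts as quoted in the thread: indexed load/OR = 4, indexed store = 5.
cycles_original = 4 + 4 + 4 + 4 + 5
cycles_combined = 4 + 4 + 5

print(cycles_original, cycles_combined)
```

The saving per byte is real only if the combined table does not have to be rebuilt every frame, which is the objection raised later in the thread.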

Divya16 · September 9, 2010

I know. But this makes the program yet longer (LDIR is only 2 bytes IIRC). Obviously, using a hectare of code you can go down to 8 cycles per byte.

I guess that's the only reason block move would be superior in that it takes up less memory since even on Z80 the unrolled LDA/STA equivalents would be faster than their block move (going by cycles given in this thread).

Poison · September 9, 2010

When I saw the vector engine in the amazing demo Numen, I thought that Atari is the most powerful 8-bit computer, but I still don't understand why atari can do the same in two! colours such as worst ZX :( I am not a coder, only a musician; is this really impossible on Atari?

The_Laird · September 9, 2010

I personally think that Speccy doom looks and sounds fantastic :thumbsup:

Bryan · September 9, 2010

When I saw the vector engine in the amazing demo Numen, I thought that Atari is the most powerful 8-bit computer, but I still don't understand why atari can do the same in two! colours such as worst ZX :( I am not a coder, only a musician; is this really impossible on Atari?

I think what you are asking is, "I do not understand why Atari conversions of Spectrum games have only two colors."

The reason is that on the Atari there is only a certain amount of RAM bandwidth available so you must trade color for resolution. Atari envisioned 160x resolution or less for gaming and 320x resolution for 40-column text. The best Atari games are designed with this limitation in mind.

Poison · September 9, 2010

Yes, games from the ZX have two colours (Jet Pac), but ugly colours in higher resolution (and squares around the player...) are not everything :) You need great playable games. I know Atari has only a few great games, but the question is: is it possible to do the same as the ZX does? I mean Wolf 3D.

+Stephen · September 9, 2010

That Speccy Doom may not be great, but it's certainly better than what we have on the A8 which is nothing.

frogstar_robot · September 9, 2010

Yes, games from the ZX have two colours (Jet Pac), but ugly colours in higher resolution (and squares around the player...) are not everything :) You need great playable games. I know Atari has only a few great games, but the question is: is it possible to do the same as the ZX does? I mean Wolf 3D.

Development on this has yet to go beyond proof of concept but most everyone in this forum knows about this:

It is also now commonplace in A8 demos to have 3D world segments, with Numen being the best known example of this. And that code is a cut down variant of Ken Silverman's Build Engine of Duke3d fame. Still, I know of no fully realized Wolf3d style game.

JamesD · September 9, 2010

Ah, more processor/computer wars. That usually means selecting one little thing that our computer/processor does better than yours and trying to base the argument around that.

RISC simplified the instruction set to reduce die size, which allowed for more registers, and the simple instructions allowed for efficient pipelining. In addition, all addressing modes and instructions work with all registers. The 6502 has none of that, and attempting to call it RISC is just an attempt to make it sound more advanced than it is.

The Z80 is a pig when it comes to clock cycles per instruction.... there's no getting around it. Clearly, no optimization took place when the Z80 was designed as an enhanced 8080. The 64180/Z180 chips in native mode gain over 20% speed vs the Z80 which tells you just how much they could have done with the chip to begin with. But then it was usually clocked faster to make up for that. When you look at modern versions like the Rabbit you end up with very short instruction times due to pipelining... but that wasn't available back then.

The 6502 is pretty fast but it takes many more instructions to do the same thing or to get that speed. When you combine that with the faster clock speed Z80 machines like the ZX were running, the speed advantage is suddenly dependent on what application you are running.

Some magazines said the 6502 was better for pushing around graphics and the Z80 was better for business apps, but I think that idea can be traced to an early article and later authors just repeated it. At the time that was first suggested, I'm sure the Z80 had a large number of business apps from 8080 CP/M before the 6502 had time to build up a base of business apps, and the Z80 was being clocked in the 2MHz range due to the speed/cost of RAM at the time.

The Z80 ldir vs 6502 loop comparison left out something. ldir may take 21 clocks/byte on a Z80, but at twice the clock speed that takes place in 10 1/2 6502 clocks. Just keep that in mind.
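The clock-speed point can be worked through with a small Python sketch. The 3.5 MHz and 1.79 MHz figures are the Spectrum and Atari clocks mentioned elsewhere in this thread; the helper name is hypothetical, and the model assumes no cycles are stolen by video hardware on either machine.

```python
def cycles_to_us(cycles, clock_mhz):
    """Wall-clock time in microseconds for a cycle count at a given clock."""
    return cycles / clock_mhz

Z80_MHZ = 3.5     # assumed Spectrum-class Z80 clock
M6502_MHZ = 1.79  # assumed Atari 8-bit 6502 clock

# LDIR at 21 clocks per byte, as stated above.
ldir_us = cycles_to_us(21, Z80_MHZ)

# The same wall-clock time expressed in 6502 cycles.
ldir_in_6502_cycles = ldir_us * M6502_MHZ

print(f"LDIR: {ldir_us:.2f} us per byte = {ldir_in_6502_cycles:.2f} 6502 cycles")
```

With an exact 2:1 clock ratio the result would be 10.5 cycles; with 3.5 MHz against 1.79 MHz it comes out slightly higher, at roughly 10.7.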

It's also unfair to compare clock cycles to load the A register alone. Consider this... you can load a 16-bit pointer on the Z80 and increment the pointer without ever worrying about crossing 256-byte page boundaries. On the 6502 you have to set up a page zero pointer and index off of it. And if you will be crossing memory page boundaries you have to loop until the end of the current page, then update the high byte of the pointer, etc... However, if you won't cross a 256-byte boundary you can skip the 16-bit math. And FWIW, I think you'll find indexing off of a page zero pointer to be much more costly cycle-wise than just loading the accumulator. Cherry picking features will not give you an accurate comparison of speed.
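The difference in loop structure can be sketched in Python. This is only a model of the control flow, not a timing model; the function names are hypothetical, and `mem` stands in for a 64K address space.

```python
def copy_paged(mem, src, dst, length):
    """6502-style: 8-bit index within a 256-byte block, then bump the pointers."""
    full_blocks, remainder = divmod(length, 256)
    for _ in range(full_blocks):
        for y in range(256):        # Y register runs 0..255
            mem[(dst + y) & 0xFFFF] = mem[(src + y) & 0xFFFF]
        src += 0x100                # increment high byte of the source pointer
        dst += 0x100                # increment high byte of the destination pointer
    for y in range(remainder):      # final partial block
        mem[(dst + y) & 0xFFFF] = mem[(src + y) & 0xFFFF]

def copy_linear(mem, src, dst, length):
    """Z80-style: one pair of 16-bit pointers, simply incremented per byte."""
    for i in range(length):
        mem[(dst + i) & 0xFFFF] = mem[(src + i) & 0xFFFF]
```

Both produce the same result; the paged version just needs the extra outer loop and pointer update whenever the copy is longer than 256 bytes.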

A series of benchmarks would be a much better indicator of overall speed than instruction by instruction comparisons. Picking benchmarks that don't bias things for one cpu or another is another matter. I would suggest categories for small sprites, large sprites, vector math & 3D wireframe graphics, sorting data, etc. I also suggest comparing C compiler output.

As for code size... the Z80 code is going to be smaller due to a larger number of registers, more instructions, and 16 bit support.

I ported a very simple music player for the AY sound chip from the Z80 to the 6803, 6809, and 6502 (ported in that order), then I made all versions interrupt driven. I don't remember the exact sizes but when all was said and done, the 6502 interrupt code was over 175 bytes, and Z80 code was around 150. The 65c02 version makes the code a few bytes smaller mostly due to new stack instructions, I haven't tried a 65816 version yet. Both Z80 and 6502 versions could probably be tweaked to save a few clock cycles and/or instructions but I think after size optimization you would still have similar code size differences as long as end functionality wasn't modified. Speed optimizations would probably make both larger. The 6803 code size varied by which approach I took for the main player loop but I think the faster version was around 135 bytes. The 6809 code was around 105 bytes without changing the DP register which would make the code smaller and faster.

I found similar results with some other code I ported but I don't guarantee that will be consistent with all programs.

popmilo · September 9, 2010

Reverse example would be trying to port YOOMP rendering routine to spectrum...

Try to write this on Z80:

LDA ADDR1,Y ; 4 cycles

ORA ADDR2,Y ; 4 cycles

ORA ADDR3,Y ; 4 cycles

ORA ADDR4,Y ; 4 cycles

STA ADDR5,X ; 5 cycles

=21 cycles

And then take into account different screen organization, and it turns out that even C64 can not reproduce it and it has the same cpu just slower...

If those ADDRn are fixed, then couldn't you have combined tables of ORA combinations. They take memory but with banked carts the norm, that shouldn't be a big deal. So you have:

LDA ADDR1,Y

ORA ADDR2+3+4,Y

STA ADDR5,X

if you use same reasoning further you could involve LDA also:

LDA ADDR1+2+3+4,y

STA ADDR5,x

Problem is in preparing data for this renderer... It seems to me that you would complicate insertion of a new row... it would be those same LDA,ORA,ORA,ORA,STA instructions but in a different place...

But, I must admit I never thought of it this way :ponder:

And I have suspicion that maybe there is something good in your reasoning ...

Have to put it on paper and think about it...

Thanks for the idea!

popmilo · September 9, 2010

...A series of benchmarks would be a much better indicator of overall speed than instruction by instruction comparisons. Picking benchmarks that don't bias things for one cpu or another is another matter. I would suggest categories for small sprites, large sprites, vector math & 3D wireframe graphics, sorting data, etc...

I would like to see a list of simple operations with their times in microseconds for C64, A8, Spectrum and maybe some others...

But you are right, any serious comparison can be made only with longer running benchmarks of more complex operations.

Things like software sprites and texture mappers are good for this.

Making the same looking effect on another computer running as fast as possible...

Anyone for a challenge ?

p.s. remember, imho - this is only for indulging curiosity - no wars intended :cool:

sack-c0s · September 10, 2010

Wouldn't a couple of lookup tables more or less sort that out?

Possibly, but it could end up being more efficient to calculate on the fly. The issue is that to code something like Doom I think you end up mostly drawing columns rather than rows, so the overhead to find screen offsets is higher. I've not looked at it for ages and I'm assuming it uses a form of raycasting, though I could be wrong - that's a point for investigation I think.
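As a rough illustration of the lookup-table approach for column drawing, here is a Python sketch. It assumes the ZX Spectrum's standard 256x192 display file at 0x4000, which the post above does not name explicitly; the function and table names are hypothetical.

```python
def zx_row_address(y):
    """Address of the first byte of pixel row y (0..191) in the Spectrum display file."""
    return 0x4000 | ((y & 0xC0) << 5) | ((y & 0x07) << 8) | ((y & 0x38) << 2)

# One 16-bit entry per pixel row: 192 entries, 384 bytes on the real machine.
ROW_TABLE = [zx_row_address(y) for y in range(192)]

def column_addresses(x_byte, y_top, height):
    """Addresses touched when drawing a vertical strip one byte wide."""
    return [ROW_TABLE[y] + x_byte for y in range(y_top, y_top + height)]

print([hex(a) for a in column_addresses(5, 0, 4)])
```

Because consecutive rows are not consecutive in memory, a column renderer pays for one table lookup (or an equivalent calculation) per pixel row, which is the overhead being weighed here.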

carmel_andrews · September 10, 2010

Well, sticking a Z80 into an A8 wouldn't make the A8 any better than the Spectrum (and anyway, Atari would have to modify the Z80 to make it work with the additional hardware, just like Atari did with its version of the 6502)

Though I did hear rumours that Atari were thinking of a multi-CPU based computer that also used a Z80

Look at it another way: take the Amstrad CPC series and the MSX series (which had slightly better gfx hardware than the Spectrum), yet over 90 percent of whatever was released for the Amstrad CPC/MSX was no better than the equivalent Spectrum version

frogstar_robot · September 10, 2010

Well, sticking a Z80 into an A8 wouldn't make the A8 any better than the Spectrum (and anyway, Atari would have to modify the Z80 to make it work with the additional hardware, just like Atari did with its version of the 6502)

Ignoring that it would be pointless and labor intensive to swap a completely incompatible CPU in, it wouldn't be necessary to modify the Z-80 as long as it can run at some even multiple or submultiple of 1.79 MHz. It would just require more chips. The 400 and the 800 didn't have modified 6502s. They had extra external logic to allow halting the CPU so ANTIC could do its thing. The XLs and XEs consolidated such logic where possible to get the component count down, hence the FREDDIE and the 6502c.

high voltage · September 10, 2010

Look at it another way: take the Amstrad CPC series and the MSX series (which had slightly better gfx hardware than the Spectrum), yet over 90 percent of whatever was released for the Amstrad CPC/MSX was no better than the equivalent Spectrum version

Look at the excellent cartridge range on MSX, thousand times better than Spectrum or CPC rubbish

atariksi · September 10, 2010

Well, sticking a Z80 into an A8 wouldn't make the A8 any better than the Spectrum (and anyway, Atari would have to modify the Z80 to make it work with the additional hardware, just like Atari did with its version of the 6502)

Ignoring that it would be pointless and labor intensive to swap a completely incompatible CPU in, it wouldn't be necessary to modify the Z-80 as long as it can run at some even multiple or submultiple of 1.79 MHz. It would just require more chips. The 400 and the 800 didn't have modified 6502s. They had extra external logic to allow halting the CPU so ANTIC could do its thing. The XLs and XEs consolidated such logic where possible to get the component count down, hence the FREDDIE and the 6502c.

680x0 had the HALT logic as used in the Amiga, so maybe it's better to put a 68000 into an Atari. It also has a way of accessing 8-bit ports since they kept that in for backward compatibility with the 6800.

atariksi · September 10, 2010

Reverse example would be trying to port YOOMP rendering routine to spectrum...

Try to write this on Z80:

LDA ADDR1,Y ; 4 cycles

ORA ADDR2,Y ; 4 cycles

ORA ADDR3,Y ; 4 cycles

ORA ADDR4,Y ; 4 cycles

STA ADDR5,X ; 5 cycles

=21 cycles

And then take into account different screen organization, and it turns out that even C64 can not reproduce it and it has the same cpu just slower...

If those ADDRn are fixed, then couldn't you have combined tables of ORA combinations. They take memory but with banked carts the norm, that shouldn't be a big deal. So you have:

LDA ADDR1,Y

ORA ADDR2+3+4,Y

STA ADDR5,X

if you use same reasoning further you could involve LDA also:

LDA ADDR1+2+3+4,y

STA ADDR5,x

Problem is in preparing data for this renderer... It seems to me that you would complicate insertion of a new row... it would be those same LDA,ORA,ORA,ORA,STA instructions but in a different place...

But, I must admit I never thought of it this way

And I have suspicion that maybe there is something good in your reasoning ...

Have to put it on paper and think about it...

Thanks for the idea!

I'm not into Z80 stuff, but I'm into optimizations and I think putting a table of combined LDA/STA w/o ORA would increase the table exponentially by 256X if those LDAs are actual arbitrary 8-bit values being transformed. But just ORAs being combined to some tables would be like using a multi-input OR gate rather than several OR gates in circuits. Circuits have fan-out issues and software has memory usage issues.

As for comparing Z80 vs. 6502, it's better to compare the 3.5 MHz Z80 to the 1.79 MHz 6502 and then compare the A8 hardware advantage separately. So it's sort of like asking: would you put a Z80 into an A8 at 3.5 MHz or take an A8 at 1.79 MHz?

atariksi · September 10, 2010

Well, sticking a Z80 into an A8 wouldn't make the A8 any better than the Spectrum (and anyway, Atari would have to modify the Z80 to make it work with the additional hardware, just like Atari did with its version of the 6502)

Ignoring that it would be pointless and labor intensive to swap a completely incompatible CPU in, it wouldn't be necessary to modify the Z-80 as long as it can run at some even multiple or submultiple of 1.79 MHz. It would just require more chips. The 400 and the 800 didn't have modified 6502s. They had extra external logic to allow halting the CPU so ANTIC could do its thing. The XLs and XEs consolidated such logic where possible to get the component count down, hence the FREDDIE and the 6502c.

680x0 had the HALT logic as used in the Amiga, so maybe it's better to put a 68000 into an Atari. It also has a way of accessing 8-bit ports since they kept that in for backward compatibility with the 6800.

Just to clarify, 680x0 was available in 8-bit Era so it was a possibility to use: http://en.wikipedia.org/wiki/Motorola_68000.

popmilo · September 11, 2010

...I'm not into Z80 stuff, but I'm into optimizations and I think putting a table of combined LDA/STA w/o ORA would increase the table exponentially by 256X if those LDAs are actual arbitrary 8-bit values being transformed. But just ORAs being combined to some tables would be like using a multi-input OR gate rather than several OR gates in circuits. Circuits have fan-out issues and software has memory usage issues.

Good question... Values are quite specific...

These are the 'patterns' of those values:

1: xx000000

2: 00xx0000

3: 0000xx00

4: 000000xx
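Read as bit fields, the four patterns are four 2-bit pixels already shifted into their final positions, so the ORA chain is just packing them into one screen byte. A minimal Python sketch of that packing follows; the function name is hypothetical, and the interpretation of each `xx` as one 2-bit pixel is an assumption drawn from the patterns above.

```python
def pack_pixels(p0, p1, p2, p3):
    """Pack four 2-bit pixel values (0..3) into one byte, leftmost pixel first."""
    for p in (p0, p1, p2, p3):
        if not 0 <= p <= 3:
            raise ValueError("each pixel must fit in 2 bits")
    # Each term matches one pattern: xx000000, 00xx0000, 0000xx00, 000000xx.
    return (p0 << 6) | (p1 << 4) | (p2 << 2) | p3

print(f"{pack_pixels(1, 2, 3, 0):08b}")
```

Since the four fields never overlap, the order of the OR operations does not matter, but each one still contributes independent data to the byte.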

I don't see possible optimizations there...

And there is no way to take out those ORs outside of drawing code without making >3500 x 65 byte tables (basically prerendering whole frames..).

But that brings no speed increase...

We should make texture renderer in fpga and hook it up on 8-bit computer

JamesD · September 11, 2010

As for comparing Z80 vs. 6502, it's better to compare the 3.5 MHz Z80 to the 1.79 MHz 6502 and then compare the A8 hardware advantage separately. So it's sort of like asking: would you put a Z80 into an A8 at 3.5 MHz or take an A8 at 1.79 MHz?

From an academic standpoint, comparing cpu vs cpu is interesting but it isn't guaranteed to help much once you compare machines.

The Z80 may do pretty well vs the 6502 when they are compared on their own, but once you stick it in a machine the results are going to vary a lot. It suffers heavily when clock cycles are stolen by the rest of the hardware.

One prime example is the NEC TREK. The TREK graphics interface stole cpu cycles and made it so slow its version of

couldn't display the magnifier even though it has the same graphics chip as the CoCo and the CoCo could do it at .89 MHz. The 1MHz 6502 Apple II also had a version of this game called Dung Beetles with the magnifier.

On the other hand, the VZ200 may require some mods to display the same graphics mode but its design doesn't slow the CPU so it should have been able to run it easily.

MSX and Amstrad machines also appear to suffer slowdowns but the Spectrum would be more like the VZ200.

atariksi · September 11, 2010

...I'm not into Z80 stuff, but I'm into optimizations and I think putting a table of combined LDA/STA w/o ORA would increase the table exponentially by 256X if those LDAs are actual arbitrary 8-bit values being transformed. But just ORAs being combined to some tables would be like using a multi-input OR gate rather than several OR gates in circuits. Circuits have fan-out issues and software has memory usage issues.

Good question... Values are quite specific...

These are the 'patterns' of those values:

1: xx000000

2: 00xx0000

3: 0000xx00

4: 000000xx

I don't see possible optimizations there...

And there is no way to take out those ORs outside of drawing code without making >3500 x 65 byte tables (basically prerendering whole frames..).

But that brings no speed increase...

We should make texture renderer in fpga and hook it up on 8-bit computer

Well, the three ORA that index using same Y can be put into one look-up table if the addresses they use are fixed.

Adding hardware doesn't count, but memory expansion using standard means (non-solder) is the norm. PCs started with 64K and went up to 640K in DOS and some even used banked memory at $D000:0000 or above. I remember a PC 8088 that only allowed up to 512K w/o soldering.

atariksi · September 11, 2010

As for comparing Z80 vs. 6502, it's better to compare the 3.5 MHz Z80 to the 1.79 MHz 6502 and then compare the A8 hardware advantage separately. So it's sort of like asking: would you put a Z80 into an A8 at 3.5 MHz or take an A8 at 1.79 MHz?

From an academic standpoint, comparing cpu vs cpu is interesting but it isn't guaranteed to help much once you compare machines.

The Z80 may do pretty well vs the 6502 when they are compared on their own, but once you stick it in a machine the results are going to vary a lot. It suffers heavily when clock cycles are stolen by the rest of the hardware.

One prime example is the NEC TREK. The TREK graphics interface stole cpu cycles and made it so slow its version of

couldn't display the magnifier even though it has the same graphics chip as the CoCo and the CoCo could do it at .89 MHz. The 1MHz 6502 Apple II also had a version of this game called Dung Beetles with the magnifier.

On the other hand, the VZ200 may require some mods to display the same graphics mode but its design doesn't slow the CPU so it should have been able to run it easily.

MSX and Amstrad machines also appear to suffer slowdowns but the Spectrum would be more like the VZ200.

I was looking at it from a design viewpoint. Would an Atari be better with a Z80 at 3.5 MHz or a 6502 at 1.79 MHz? It's a controlled experiment, so all the other hardware stays the same. When you mix in other hardware at the same time, you no longer have a controlled variable.

popmilo · September 11, 2010

...Well, the three ORA that index using same Y can be put into one look-up table if the addresses they use are fixed.

Adding hardware doesn't count, but memory expansion using standard means (non-solder) is the norm. PCs started with 64K and went up to 640K in DOS and some even used banked memory at $D000:0000 or above. I remember a PC 8088 that only allowed up to 512K w/o soldering.

Problem is there are around 3600 bytes created in that way on screen...

Each of them from different 4 addresses...

We would need 3600x65 bytes of tables... around 230k

That is not a big problem (on A8 especially...).

Problem is those tables are not calculated once... texture is scrolling...

It would need 3600 bytes calculated every frame and that is same as existing rendering code...

So... no speed gained there...

Only way to speed it up would be to precalculate whole level animation...

3600 bytes x 64 x 48 (max size of level...) = 10.54 MB... a little too much
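The two memory estimates above can be checked with a few lines of Python. The constant names are hypothetical; the numbers are the ones given in the post, and the megabyte figure assumes 1 MB = 1024 x 1024 bytes.

```python
BYTES_PER_FRAME = 3600   # screen bytes produced by the renderer
TABLE_ENTRIES = 65       # entries per combined table, as stated above
LEVEL_WIDTH = 64         # max level size, first dimension
LEVEL_LENGTH = 48        # max level size, second dimension

# Combined lookup tables for a single frame.
table_bytes = BYTES_PER_FRAME * TABLE_ENTRIES

# Fully precalculated animation for a whole level.
level_bytes = BYTES_PER_FRAME * LEVEL_WIDTH * LEVEL_LENGTH
level_mb = level_bytes / (1024 * 1024)

print(f"tables: {table_bytes} bytes ({table_bytes / 1024:.1f} KB)")
print(f"whole level: {level_bytes} bytes ({level_mb:.2f} MB)")
```

The table figure lands just under 230 KB and the whole-level figure at a little over 10.5 MB, in line with the estimates in the post.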
