VladR

65C02 vs 6502 Cycles and OpCodes


The "30th Lynx Birthday Compo" got me thinking about reusing the three and a half shi*loads of ASM code I wrote over the last few years (covering areas from point cloud, wireframe and flatshading to many other experiments like Rez, StunRunner or StarFox).

1. VTI Spec

- I spent some time googling for PDFs and specs on the Lynx, but the 6 different ones I found are all missing the elusive "VTI spec" that is supposed to contain the exact cycle count of each instruction.

- Could anybody please upload it to this thread?

2. Additional 65C02 instructions

- http://6502.org/tutorials/65c02opcodes.html lists quite a few additional instructions and addressing modes. Some are reaaally good ones and would result in much faster inner loops.

- The more I look into it, the more variations of the 65C02 I find, each with yet another set of extra opcodes.

- Hence the VTI spec would be really helpful to see which additional opcodes are available on the Lynx's 65C02.

3. Actual Frequency

- Given the 16 MHz crystal, I presume the 65C02 clock in the Lynx is 4.00 MHz.

- But all 6 docs I have use the same phrase, "3.6 average". Average? WTF? Surely they didn't implement a dynamic HW clock a century ago?

4. Cycles available for the CPU each frame

- I presume Mikey is the chip that reads the framebuffer data (160x102 at 4 bits per pixel: 8,160 bytes) and, just like ANTIC on the Atari 800, Mikey halts the CPU while it reads the framebuffer data.

- 1 tick is 62.5 ns (1,000,000,000 / 16,000,000).

- As per the doc, there are 5 ticks per read for Mikey, hence 8,160 * 5 = 40,800 ticks. At least. There might be a dozen other things Mikey re-reads each frame (palette, etc.).

- But is it really 5 ticks? Shouldn't it be 4 ticks for each of the 256 bytes within the current page, with 5 ticks only when crossing a page boundary? Has this been verified on HW?

- At 60 fps, there should be 266,667 raw CPU ticks available each frame: (1,000,000,000 / 62.5) / 60 = 266,667.

- Since Mikey steals at least 40,800 ticks, we should at best have 225,867 ticks available each frame (266,667 - 40,800).

- But how many ticks does, for example, LDA #37 take? 8 ticks, I suspect (4 to read the opcode and 4 to read the operand). Hence why the VTI spec would be really helpful.
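The tick arithmetic above can be reproduced with a small Python sketch. It assumes a 16 MHz master clock, 60 fps, the 160x102 4-bit framebuffer and the flat 5 ticks per Mikey read quoted from the doc. `dma_ticks_paged` is a hypothetical model of the "4 ticks within a page, 5 when crossing" alternative raised in the question, not documented Mikey behaviour, and palette or other per-frame reads are ignored.

```python
# Back-of-the-envelope Lynx frame budget using the figures from the post.
# Assumptions (not verified on hardware): 16 MHz master clock, 60 fps,
# 160x102 framebuffer at 4 bits per pixel, flat 5 ticks per Mikey read.

MASTER_CLOCK_HZ = 16_000_000
TICK_NS = 1_000_000_000 / MASTER_CLOCK_HZ  # 62.5 ns per tick


def framebuffer_bytes(width: int, height: int, bits_per_pixel: int) -> int:
    """Bytes needed to store one frame."""
    return width * height * bits_per_pixel // 8


def ticks_per_frame(fps: int) -> float:
    """Raw 16 MHz ticks elapsing during one displayed frame."""
    return MASTER_CLOCK_HZ / fps


def dma_ticks_flat(num_bytes: int, ticks_per_read: int = 5) -> int:
    """Video DMA cost if every read costs the same number of ticks."""
    return num_bytes * ticks_per_read


def dma_ticks_paged(num_bytes: int, in_page: int = 4, page_cross: int = 5) -> int:
    """Hypothetical model: 4 ticks per read inside a 256-byte page,
    5 ticks for the read that starts each new page."""
    pages = (num_bytes + 255) // 256
    return pages * page_cross + (num_bytes - pages) * in_page


fb = framebuffer_bytes(160, 102, 4)
raw = ticks_per_frame(60)
flat = dma_ticks_flat(fb)
paged = dma_ticks_paged(fb)

print(f"tick = {TICK_NS} ns, framebuffer = {fb} bytes")
print(f"raw ticks/frame    = {raw:,.0f}")
print(f"flat 5-tick DMA    = {flat:,} ticks -> {raw - flat:,.0f} left")
print(f"paged 4/5-tick DMA = {paged:,} ticks -> {raw - paged:,.0f} left")
```

With these inputs the flat model reproduces the 40,800 and 225,867 tick figures, while the paged model comes out at 32,672 DMA ticks. Only measurements on real hardware can say which model (if either) matches Mikey.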

 

5. HW pipeline execution time of the 65C02

- I haven't found anything on this. There are 2 scenarios, though I'm heavily leaning towards the first one (it's cheaper and easier to design):

- a) The 65C02 behaves exactly like the 6502 in this regard, regardless of frequency: if LDA #37 takes 2 cycles on the 1.79 MHz 6502, it takes the same 2 cycles on the 4.00 MHz 65C02. It just takes less time (in ns) on the Lynx, that's all.

- b) The 65C02 takes advantage of faster memory and a higher clock, so certain substages of the CPU's fetch/decode/execute pipeline finish sooner and most definitely within the cycle (impossible at 1.79 MHz), hence some instructions should be able to take fewer ticks.
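Scenario (a) can be written down as a tiny Python model: the cycle count stays fixed and only the wall-clock time shrinks with the clock. It assumes the standard 6502 figure of 2 cycles for LDA #imm, the 1.79 MHz and 4.00 MHz clocks from the post, and 4 master-clock ticks per CPU cycle (16 MHz / 4 MHz); none of this is taken from the VTI spec.

```python
# Scenario (a): an instruction's cycle count is fixed; only the wall-clock
# time changes with the clock rate. Assumptions: LDA #imm = 2 cycles
# (standard 6502 figure), 1.79 MHz Atari 8-bit, 4.00 MHz Lynx CPU,
# 4 master-clock ticks per CPU cycle (16 MHz / 4 MHz).

LDA_IMM_CYCLES = 2
TICKS_PER_CYCLE = 4


def instruction_time_ns(cycles: int, clock_mhz: float) -> float:
    """Wall-clock time in ns of an instruction taking `cycles` CPU cycles."""
    return cycles * 1_000.0 / clock_mhz


atari_ns = instruction_time_ns(LDA_IMM_CYCLES, 1.79)
lynx_ns = instruction_time_ns(LDA_IMM_CYCLES, 4.00)
lynx_ticks = LDA_IMM_CYCLES * TICKS_PER_CYCLE

print(f"LDA #37 on Atari 8-bit: {atari_ns:.1f} ns")
print(f"LDA #37 on Lynx:        {lynx_ns:.1f} ns ({lynx_ticks} ticks)")
print(f"speed-up:               {atari_ns / lynx_ns:.3f}x")
```

Under scenario (a) every instruction speeds up by the same 4.00 / 1.79 ratio, and LDA #37 comes out at 8 ticks, the same number guessed in section 4. Scenario (b) would show up as a smaller cycle count on the Lynx side for some opcodes.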

 

6. Suzy FrameBuffer Clear time

- Does anybody have timings on how much faster Suzy is at clearing the framebuffer compared to the CPU?

- Surely the 16 MHz internal clock must fly through the framebuffer like a breeze, unless the internal implementation is botched.

- Though, from my experience with the Blitter on the Jaguar, I wouldn't really be surprised if certain functionality was actually slower in the dedicated HW. I have many examples where I can beat the Blitter's HW implementation with brutally hand-optimized RISC code...

7. SW rasterizing is only marginally faster on Lynx than on Atari 8-bit

- At first look, the frequency difference is 2.235x (4.00 / 1.79). Yaaay. Over twice as fast!

- But wait, a 160x96 framebuffer on the Atari is 3,840 bytes (2 bits per pixel) vs 8,160 on the Lynx (160x102, 4 bits per pixel). That's a 2.125x difference (every frame, just to clear the framebuffer), almost completely wiping out the frequency advantage.

- In flatshading, it gets even worse. On Atari 8-bit, I fill 4 pixels with one STA. On Lynx, just 2.

- Hence, the scanline fill has to execute exactly twice as many stores (e.g. 80 STAs per scanline on the Lynx instead of 40). There goes our 2.235x speed ratio.

- The only advantage is that on the Lynx it will look better, because we have 16 colors and not 4.

- Suzy appears to be capable of 1/2/4-bit sprite handling, but I'm not sure if the CPU can directly do the same, as I haven't yet found how to change the framebuffer color depth to 1 or 2 bits per pixel (BPP).

- I can currently think of only one type of visuals where we wouldn't be slowed down as much: point cloud, where it doesn't matter how many pixels there are per byte, as the computation cost is the same per pixel. Unfortunately, given Suzy's scaling capabilities, a point cloud game on the Lynx would look like an Atari 2600 game on an XBOX. Not worth the effort. It might look interesting (more than double the density of the point cloud at the same framerate), but that's about it. Perhaps as a loading-screen effect...
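The ratios in this section can be checked with a quick Python sketch. It assumes the cost of a clear or fill loop is dominated by one STA per framebuffer byte and that an STA takes the same number of cycles on both machines, so it ignores addressing modes, loop overhead and video DMA on either side; treat the output as a rough comparison, not a benchmark.

```python
# Rough model of section 7: how much of the 4.00/1.79 clock advantage survives
# once framebuffer size and pixels written per STA are taken into account.
# Assumptions: Atari 8-bit 160x96 at 2 bits/pixel, Lynx 160x102 at 4 bits/pixel,
# cost dominated by one STA per framebuffer byte, same cycles per STA on both.


def bytes_per_frame(width: int, height: int, bpp: int) -> int:
    """Framebuffer size in bytes."""
    return width * height * bpp // 8


def stas_per_scanline(width: int, bpp: int) -> int:
    """Stores needed to fill one full-width scanline, one byte per STA."""
    pixels_per_byte = 8 // bpp
    return width // pixels_per_byte


clock_ratio = 4.00 / 1.79
atari_bytes = bytes_per_frame(160, 96, 2)
lynx_bytes = bytes_per_frame(160, 102, 4)
data_ratio = lynx_bytes / atari_bytes

atari_stas = stas_per_scanline(160, 2)
lynx_stas = stas_per_scanline(160, 4)

print(f"clock ratio: {clock_ratio:.3f}x")
print(f"framebuffer bytes: Atari {atari_bytes}, Lynx {lynx_bytes} ({data_ratio:.3f}x)")
print(f"STAs per full scanline: Atari {atari_stas}, Lynx {lynx_stas}")
print(f"net clear speed-up: {clock_ratio / data_ratio:.2f}x")
print(f"net fill speed-up:  {clock_ratio * atari_stas / lynx_stas:.2f}x")
```

Under these assumptions a full clear gains only about 1.05x and a full-width scanline fill about 1.12x, which is the "only marginally faster" result described above.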


Lynx has a 65SC02, so not all of the 65C02 opcodes are there.

AFAIK there are no "illegal" opcodes.


On the Lynx you don't "clear" the screen; you draw a background sprite instead.
As for the timing, a "CLS" sprite takes about 1/3 of a frame to draw.

3. Actual Frequency

- Given the 16 MHz Crystal I presume the 65C02 clock in Lynx is 4.00 MHz

- But all 6 docs I got use the same phrase "3.6 average". Average ? WTF ? Surely they didn't implement dynamic HW clock a century ago ?

 

Where did you find this?!

Here are the original docs about timing:

http://www.monlynx.de/lynx/lynx4.html


BTW, I am pretty sure this "VTI spec" does not exist. At least I haven't found it in 30 years of Lynx programming.


Lynx has an 65SC02, so not all of the 65C02 opcodes are there.

AFAIK no "illegal" opcode.

Yeah, I already have 4 tabs open with different 6502 flavors (and I bet there are more if I searched further), so it'd be really nice to see the full specs of the one inside the Lynx.

On Lynx you do not "clear" the screen. You rather draw the background sprite.

As for the timing, a "CLS" sprite takes about 1/3rd of the screen to draw.

So Suzy can draw that background sprite at 60 fps and still leave you with ~60-70% of the CPU cycles per frame? Now, that would be really nice compared to Atari 8-bit! At 4 MHz, quite a lot can happen at 60 fps, even with double-buffering running!

Where did you find this?!

Here the original docs about timing:

http://www.monlynx.de/lynx/lynx4.html

Like, almost all the docs I googled seem to use the same phrase (looks like it's all copy-paste). I refuse to believe there was a dynamic clock frequency, like in modern CPUs. So I presume this is merely some kind of misunderstanding that just gets passed on (via copy-paste), correct?


Actually, I never saw this "3.6". But someone might have done some statistical calculation, considering the different number of ticks per cycle depending on bus contention and page crossing.
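One way to read a "3.6 average" figure, along the lines suggested above, is as 16 MHz divided by an average number of ticks per CPU cycle. The Python sketch below is purely hypothetical: it assumes every CPU cycle costs either 4 or 5 ticks and shows what mix of the two would average out to 3.6 MHz. The 4/5 split comes from the discussion in this thread, not from a measurement.

```python
# Hypothetical: how an "average" clock such as 3.6 MHz could arise from a
# fixed 16 MHz master clock if some CPU cycles are stretched from 4 to 5 ticks
# (page crossings, bus contention). The 4/5-tick split is an assumption.

MASTER_CLOCK_MHZ = 16.0


def effective_mhz(fraction_slow: float, fast_ticks: int = 4, slow_ticks: int = 5) -> float:
    """Average CPU clock when `fraction_slow` of all cycles take `slow_ticks`."""
    avg_ticks = (1 - fraction_slow) * fast_ticks + fraction_slow * slow_ticks
    return MASTER_CLOCK_MHZ / avg_ticks


def fraction_for(target_mhz: float, fast_ticks: int = 4, slow_ticks: int = 5) -> float:
    """Fraction of slow cycles needed to average out to `target_mhz`."""
    avg_ticks = MASTER_CLOCK_MHZ / target_mhz
    return (avg_ticks - fast_ticks) / (slow_ticks - fast_ticks)


for f in (0.0, 0.25, 0.5, 1.0):
    print(f"{f:4.0%} 5-tick cycles -> {effective_mhz(f):.2f} MHz")
print(f"3.6 MHz average needs ~{fraction_for(3.6):.0%} 5-tick cycles")
```

In this model, 3.6 MHz would correspond to roughly 44% of all cycles being stretched to 5 ticks. The CPU clock itself is never changed; the "average" is only a statistic over stretched cycles.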


BTW, I am pretty sure this "VTI spec" does not exist. At least I haven't found it in 30 years of Lynx programming.

Well, I was hoping somebody snatched it off some FTP before it was pulled [for legal reasons].

It is unfortunate, though. Of course, if I had real HW, I could very easily benchmark the instruction throughput.

But it would take a very slow and tedious debugging session (printing registers and dumping RAM on screen) to figure out which of the new opcodes mentioned in the 65C02 links above are supported and which aren't. Not impossible, for sure. Just, a PDF would obviously be much more comfortable.

I presume no Lynx emulator is cycle-exact like, say, Altirra, correct? I searched through some posts in the Lynx section and noticed pretty huge performance differences in Suzy's sprite throughput in some people's benchmarks on real Lynx.


Actually, I never saw this "3.6". But someone might have done some statistic calculation considering the different number of ticks per cycle depending on bus contention and page crossing.

 

That thought crossed my mind too, but I immediately discarded it, because surely no one sane would average bus contention into the CPU's frequency? The CPU runs at a fixed frequency, and whenever another chip is on the bus, the 65C02 is simply halted.

Though I believe I noticed [in the docs] some dual-ported RAM in Suzy/Mikey, a bit like in the Jaguar...


It is unfortunate, though. Of course, if I had real HW, I could very easily benchmark the instruction throughput.

 

But it would take a very slow and tedious debugging session (via printing registers and dumping RAM on screen) to figure out which new opcodes (of the ones that are mentioned in 65c02 links above) are supported and which aren't. Not impossible, for sure. Just, a PDF would be obviously much more comfortable.

 

I am presuming here that no Lynx emulator is cycle-exact as, say, Altirra - correct ? I searched through some posts in Lynx section and noticed pretty huge performance differences of Suzy's sprite throughput in some of people's benchmarks on real Lynx.

 

Why not start some ultimate benchmarking initiative then? Mikey timers should be more than enough to precisely measure everything we could imagine. Some average values and standard deviations could be calculated. We just need a core framework administering all the tests, with terminal emulation, and writing specific benchmarks should be pure fun. I'm sure that most people here would be eager to participate in running such a suite on their hardware :)
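The host-side bookkeeping for such a suite could be as simple as the Python sketch below. `cycles_per_iteration` and its parameters are hypothetical: it converts elapsed timer counts from repeated runs into CPU cycles per loop iteration and reports the mean and standard deviation. It assumes you know the timer period in microseconds from however the Mikey timer was configured, subtracts an empty-loop measurement as overhead, and uses a nominal 4 MHz CPU clock to express the result in cycles.

```python
import statistics


def cycles_per_iteration(timer_counts, period_us, iterations,
                         cpu_mhz=4.0, overhead_counts=0):
    """Convert raw timer readings into CPU cycles per benchmarked iteration.

    timer_counts    -- elapsed timer counts, one per run (hypothetical input)
    period_us       -- microseconds per timer count (depends on timer setup)
    iterations      -- loop iterations executed per run
    cpu_mhz         -- nominal CPU clock used to express results in cycles
    overhead_counts -- counts measured for an empty loop, subtracted first
    Returns (mean, standard deviation) in CPU cycles per iteration.
    """
    per_run = [
        (count - overhead_counts) * period_us * cpu_mhz / iterations
        for count in timer_counts
    ]
    mean = statistics.mean(per_run)
    # stdev needs at least two samples
    stdev = statistics.stdev(per_run) if len(per_run) > 1 else 0.0
    return mean, stdev
```

For example, 500 counts of 1 µs over 1,000 iterations works out to 0.5 µs per iteration, i.e. 2 cycles at 4 MHz (8 master-clock ticks). Collecting many runs per test lets the standard deviation expose bus-contention effects.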


Why not start some ultimate benchmarking initiative then? Mikey timers should be more than enough to measure precisely everything we could imagine. Some average values and standard deviations could be calculated. We just need some core administrating all the tests with terminal emulation, and writing specific benchmarks should be pure fun. I'm sure that most people here would be eager to participate in running such suit on their hardware :)

Not sure you noticed that I don't have a Lynx, so it would have to be purely "remote testing" by other people. I suppose I could do a does-it-execute test on an emulator, but really, not much could be tested on an emulator once we get to measuring bus access by Mikey and Suzy. Then again, since all games "work" on an emulator, the basic functionality of those chips must clearly be stable, correct?

Yes, you are absolutely correct: writing such benchmarks, especially for me, is pure, crystal-clear excitement, as I'm quite hardcore when it comes to refactoring a routine many times to shave some cycles off.

I'm quite spoiled by my Jaguar Skunkboard, though. On a slow day, I average ~a hundred builds. At this very moment, while I'm shaving some much-needed bytes off the 4 KB GPU cache for my triangle rasterizer, I do a build every 2-3 minutes (reordering RISC instructions turns the purring cat into a touchy b*tch), as I just hit F5 in Notepad++ and the build/deploy script gets executed.

It would be kinda hard to step back from something like that into a build per day (at best)...

This is the first time I've checked eBay for a Lynx. From a cursory look, it would appear that a bare device is around $120-$150 these days? Sounds about right? I still need to look into the current options for deploying builds; it looks like there are at least 2-3 solutions now, some taking longer to deploy than others, etc...


Yeah,I got already 4 tabs with different 6502 flavors open (and I bet there's more if I searched more), so it'd be really nice to see the full specs of the one that is inside Lynx.

 

So, Suzy can draw that background sprite at 60 fps, and still leave you with about ~60-70% of CPU cycles [per frame] available ? Now, that would be really nice compared to Atari 8bit !

If you're interested in high-performance 6502-variants, you might be interested in checking out the HuC6280 that's in NEC's PC Engine/TurboGrafx.

7.16 MHz, with all of the 65C02 instructions, plus a bunch of extra new ones, 2 MBytes of physical address-space memory banking built into the CPU, no cycle-stealing for the video, and full-speed writes to video memory.

Kinda fun to program.
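The built-in banking mentioned above works through eight mapping registers (MPR0-MPR7) inside the HuC6280: the top 3 bits of a 16-bit logical address select a register, whose 8-bit value picks one of 256 banks of 8 KB, giving a 21-bit (2 MB) physical address. The Python sketch below models only that translation; the register contents in the example are arbitrary values chosen for illustration.

```python
from typing import Sequence

BANK_SIZE = 0x2000  # 8 KB per bank


def physical_address(logical: int, mpr: Sequence[int]) -> int:
    """Map a 16-bit logical address to a 21-bit physical address.

    mpr -- the eight mapping-register values (MPR0..MPR7), each 0..255.
    """
    if not 0 <= logical <= 0xFFFF:
        raise ValueError("logical address must fit in 16 bits")
    if len(mpr) != 8:
        raise ValueError("expected exactly 8 mapping registers")
    bank = mpr[logical >> 13] & 0xFF    # top 3 bits select MPR0..MPR7
    offset = logical & (BANK_SIZE - 1)  # low 13 bits: offset inside the 8 KB bank
    return bank * BANK_SIZE + offset


# Register contents chosen for illustration only.
mpr = [0xFF, 0xF8, 0x00, 0x01, 0x02, 0x03, 0x04, 0x00]
print(hex(physical_address(0x0000, mpr)))
print(hex(physical_address(0x2010, mpr)))
print(hex(physical_address(0x7FFF, mpr)))
```

Because the translation happens inside the CPU, code sees a plain 64 KB space and switches banks simply by writing a new value into one of the registers.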


This is my first time I checked eBay for Lynx. From a cursory look, it would appear that a bare device is around $120-$150 these days ? Sounds about right ? I still need to look into the current options how to deploy the builds - looks like there's at least 2-3 solutions now, some take longer to deploy than others, etc...

 

150 USD for a Lynx II is less than I paid back in the day, so I consider it a good deal.

Get yourself one, plus one of the various flash/SD cards. "Burn" BLL onto it, build a nice USB<->ComLynx cable, and debugging on real HW can start.

Downloading at 62,500 Bd isn't the quickest, but I assume it's sufficient for benchmarks.
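To put 62,500 Bd in perspective, here is a rough Python estimate of transfer times. It assumes 11 bits on the wire per byte (start, 8 data, parity, stop) and ignores any protocol overhead of the BLL loader, so real downloads may take longer.

```python
# Rough ComLynx download-time estimate.
# Assumption: 11 bits per byte on the wire (1 start + 8 data + 1 parity + 1 stop);
# loader/protocol overhead is ignored.


def transfer_seconds(num_bytes: int, baud: int = 62_500, bits_per_byte: int = 11) -> float:
    """Seconds needed to send `num_bytes` at `baud` bits per second."""
    return num_bytes * bits_per_byte / baud


for size_kb in (4, 16, 64):
    print(f"{size_kb:3d} KB -> {transfer_seconds(size_kb * 1024):.1f} s")
```

At that rate even a 64 KB image takes only around 11.5 s, which is workable for benchmark runs, if far from an F5-and-go workflow.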


"Burn" BLL on it, build a nice USB<->ComLynx cable and debugging on real HW can start.

Where can I find more details about that interface?


If you're interested in high-performance 6502-variants, you might be interested in checking out the HuC6280 that's in NEC's PC Engine/TurboGrafx.

 

7.16MHz, with all of the 65C02 instructions, plus a bunch of extra new ones, 2MBytes of physical address-space memory banking built into the CPU, no cycle-stealing for the video, and full-speed writes to video memory.

 

Kinda fun to program.

I never got to experience the PC Engine/TurboGrafx behind the Iron Curtain, BITD. It was primarily Atari Land, so that's why the heart wants what it wants :)

Speaking of high-performance 6502, my last 2 years of 6502 coding have basically just been prep for the Eclaire XL. I bought one last year, but am still resisting unpacking the monster, as then I could kiss my Jaguar game goodbye for yet another 6 months :lol:

- EclaireXL has hi-res ANTIC modes: 160x192x16, 320x192x16, 640x192x4

- The CPU has 477,328 cycles per frame

- at 20 fps, you have ~1.4 Mil cycles per frame with double-buffering

- at 12 fps, you have ~2.1 Mil cycles

- I modified my personal cycle-exact 6502 dev emulator (running in Visual Studio) for these hi-res modes, and it's absolutely amazing how complex scenes can be done with 1.4-2.1 Mil cycles, on an 8-bit CPU no less :)


150USD for a Lynx II is less than I paid back in time, so I consider it a good deal.

Get yourself one, plus one of the various flash/SD cards. "Burn" BLL on it, build a nice USB<->ComLynx cable and debugging on real HW can start.

Download with 62500Bd isn't the quickest but for benchmarks sufficient I assume.

Must. Resist. Now

 

Otherwise, ...



I never got to experience PC Engine/TurboGrafx behind the Iron Curtain, BITD. It was primarily an Atari Land, so that's why heart wants what it wants :)

Yep, we're decades-in-time beyond the original lives of these old machines, so developing for them these days is definitely a labor-of-love.

 

The Atari 800 was my 1st-computer-love, and an amazing piece of hardware for its time.


1.VTI Spec

- I spent some time googling for pdfs and specs on Lynx, but the 6 different ones I found are all missing the elusive "VTI spec" that is supposed to contain the exact cycle count of each instruction

- Anybody could please upload it to this thread?

BTW, I am pretty sure this "VTI spec" does not exist. At least I haven't found it in 30 years of Lynx programming.

 

Mikey's 6502 core wasn't designed by Epyx; it seems it was bought from a company called VLSI.

VLSI Technology, Inc. - VTI: https://en.wikichip.org/wiki/vti

Mystery solved.

There is a discussion about "The 65816 from VTI - 1988 datasheet".

Back to the 6502 core: I read that the Lynx I and Lynx II have slightly different instruction sets.

Lynx I:

[image: Lynx I instruction set]

Lynx II:

[image: Lynx II instruction set]

Edited by Cyprian_K
On 6/8/2019 at 2:02 AM, 42bs said:

The VTI data sheet can be found here:
https://archive.org/details/1988_VTI_ASIC

 

See page 225/232 (PDF) and esp. page 234/242(PDF): Machine cycles ...

That looks like a really great find! I will browse the book tomorrow, for sure.

Thanks!
