
7800 hardware facts


RevEng

In an attempt to improve the updated 7800 Software Guide, and emulation in general, I've been trying to figure out which of the inconsistent statements in the Guide and the GCC MARIA Spec docs [1,2] are wrong and which are right.

 

Some folks have kindly scoped and posted some of the timings and inferred information from the schematics.

 

I don't have a scope or keen schematic-reading skills, so to indirectly verify some of the characteristics of the console I wrote a series of routines that test and measure the console in various ways. Based on the above posts and my own results, I intend to eventually update the Guide with a few bits of info...

  • NTSC has 263 lines per frame, while PAL has 313.
  • The NTSC console runs from a 14.3181818MHz crystal, while the PAL console runs from a 14.187576MHz crystal.
  • The DMA cycle stealing in the later MARIA Spec seems to be the closest to the truth. Using those values for cycle stealing in emulators brings them fairly close to real hardware test results.
  • If Holey DMA is enabled and a DL entry for a line is in a hole, there is no cycle penalty at all for skipping it, even with indirect fetches, according to emulation tests that match real hardware.
  • The RIOT timers seem to work, though I found you need to set them twice on real hardware to make them work consistently. For some reason that I haven't figured out, hitting WSYNC during the countdown seems to mess them up and make them unreliable. Possibly there are other unknown pitfalls.
Then there's a bit of oddity I ran into. If you turn off DMA and use the 6502 to turn on and off the background color at precise cycles, you can infer from the slope of the line produced using cycle times for the guessed line width, that the 6502 sees between 112 and 113 6502 cycles per line (closer to 113) which would translate into about 451 Maria cycles. This goes some way to explaining GCC's usage of 452 cycles in their frame diagram in their second document, but not all the way. Since people have scoped that there are 454 Maria clock cycles per line, it seems that the 6502 is being stalled for some other reason. Hopefully someone has an alternate theory or explanation here.
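
To make the cycle arithmetic in the paragraph above easier to follow, here is a small sketch of the conversion between the two clock domains. It assumes a fixed 4:1 ratio of MARIA cycles to 6502 cycles, which is what the figures in this post imply; the function and constant names are made up for illustration.

```python
# Assumption: in 7800 mode one 6502 cycle spans 4 MARIA cycles,
# as implied by the "about 451 Maria cycles" conversion above.
MARIA_PER_CPU = 4

def cpu_to_maria(cpu_cycles):
    """Convert 6502 cycles to MARIA cycles."""
    return cpu_cycles * MARIA_PER_CPU

def maria_to_cpu(maria_cycles):
    """Convert MARIA cycles to 6502 cycles."""
    return maria_cycles / MARIA_PER_CPU

# Candidate line lengths mentioned in the post, in MARIA cycles.
for label, maria in [("inferred from slope", 451),
                     ("GCC frame diagram", 452),
                     ("scoped", 454)]:
    print(f"{label}: {maria} MARIA cycles = {maria_to_cpu(maria)} CPU cycles")
```

With these inputs the three candidates come out as 112.75, 113.0 and 113.5 CPU cycles per line, which is the gap the paragraph above is trying to explain.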

 

Here's my test program. Moving the joystick left and right switches the running test. I don't have a PAL console, so watermarks are provided for NTSC only. Someone with the ability to run a PAL bin on real hardware could assist by reporting if the PAL levels on various tests differ from NTSC, and if they differ, what the new mark is. (move the joystick up and down to reposition the level measure mark)

 

7800test.bas.ntsc.bin

7800test.bas.ntsc.a78

7800test.bas.pal.bin

7800test.bas.pal.a78

 

Current emulators don't implement cycle stealing, and will fail the above tests pretty badly. That should hopefully change very soon.

 

If you're still reading at this point, thanks for suffering through the monster post! I look forward to any comments, theories, questions, criticisms, or debate. :)


The RIOT timers seem to work, though I found you need to set them twice on real hardware to make them work consistently. For some reason that I haven't figured out, hitting WSYNC during the countdown seems to mess them up and make them unreliable.

 

This is interesting! I wrote a RIOT timer test program in batari Basic that runs as expected on my 2600, but (if I remember correctly) has an issue on the 7800 when using the T1024T mode. I wonder if this has any bearing on that?


Walter aka gambler172 kindly ran the tests, and it turns out the PAL levels pretty much match the NTSC. This at least tells us there are no big-effect differences between the regions. (other than PAL having more screen to draw and therefore less off-screen CPU time, but the tests were designed to only start during the on-screen time)

 

 

This is interesting! I wrote a RIOT timer test program in batari Basic that runs as expected on my 2600, but (if I remember correctly) has an issue on the 7800 when using the T1024T mode. I wonder if this has any bearing on that?

I would expect problems to happen only when in the faster 7800 mode, but I'll give this a shot in 2600 mode and see.


  • 3 weeks later...

Then there's a bit of oddity I ran into. If you turn off DMA and use the 6502 to turn on and off the background color at precise cycles, you can infer from the slope of the line produced using cycle times for the guessed line width, that the 6502 sees between 112 and 113 6502 cycles per line (closer to 113) which would translate into about 451 Maria cycles. This goes some way to explaining GCC's usage of 452 cycles in their frame diagram in their second document, but not all the way. Since people have scoped that there are 454 Maria clock cycles per line, it seems that the 6502 is being stalled for some other reason. Hopefully someone has an alternate theory or explanation here.

To refine this, is it possible that MARIA is only generating 113 CPU cycles for each line of its 454 MPU cycles, instead of 113.5? This would account for most of the 112.75 6502 cycle line width I'm seeing with DMA turned off. Anybody ever scope the 6502 clock generated when DMA is turned off?

Edited by RevEng

A "long cycle" automatically inserted at some point might make sense in that cycle alignment vs beam position would be consistent among scanlines (assuming none generated by TIA/RIOT accesses) although cycle-exact timing is much less important to programs than on the computer line.

 

7800 uses the exact same master clock as the XE computers.

Computer has 262 lines * 114 cycles = 29868 per frame

7800 with 263 lines * 113.5 cycles = 29850.5 per frame - 17.5 cycles less, which might in fact be "convenient" given the computer's framerate is slower than the analog broadcast standard.

 

7800 with 263 lines * 114 cycles = 29982 cycles per frame - which would be slower again than broadcast.
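
The frame totals above can be turned into approximate frame rates with a short sketch. It assumes the CPU clock is the 14.3181818MHz master crystal divided by 8 (about 1.79MHz) on both machines; that divider is an assumption of this example, not something measured in this thread, and the names are illustrative.

```python
MASTER_HZ = 14_318_181.8   # NTSC crystal frequency quoted in the opening post
CPU_HZ = MASTER_HZ / 8     # assumption: ~1.79 MHz CPU clock on both machines

def frame_stats(lines, cycles_per_line):
    """Return (CPU cycles per frame, frames per second) for a given geometry."""
    cycles = lines * cycles_per_line
    return cycles, CPU_HZ / cycles

cases = [
    ("computer, 262 x 114", 262, 114),
    ("7800, 263 x 113.5", 263, 113.5),
    ("7800, 263 x 114", 263, 114),
]
for label, lines, per_line in cases:
    cycles, hz = frame_stats(lines, per_line)
    print(f"{label}: {cycles} cycles/frame, {hz:.3f} Hz")
```

Under that assumption the three cases work out to roughly 59.92 Hz, 59.96 Hz and 59.69 Hz, against a broadcast field rate of about 59.94 Hz.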

 

Wouldn't an easy way to test such theories be to just do a program that does a background colour change halfway through a scanline, then does a software wait equal to the exact number of cycles in a frame?


Wouldn't an easy way to test such theories be to just do a program that does a background colour change halfway through a scanline, then does a software wait equal to the exact number of cycles in a frame?

Excellent suggestion. It does look like the line length is indeed 113.5, but now I find myself much deeper down the rabbit hole.

 

I dialed in 29850 cycles instead of 29850.5 cycles to check for a line length of 113.5, since there's no way of delaying half a cycle.
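
As a rough sketch of what this test should show, the snippet below computes how far the colour change would be expected to creep along the scanline each frame for a given line length and software delay. The function name is made up for illustration, and the model ignores everything except the plain cycle counts.

```python
from fractions import Fraction

def drift_per_frame(lines, cycles_per_line, delay_cycles):
    """Expected horizontal drift of the colour change, in CPU cycles per frame.

    Negative means the change lands earlier on the line each frame.
    The result is wrapped into the range [-line/2, +line/2).
    """
    line = Fraction(cycles_per_line)
    raw = Fraction(delay_cycles) - lines * line   # delay minus true frame length
    wrapped = raw % line                          # shift within a single line
    if wrapped >= line / 2:
        wrapped -= line
    return wrapped

# Compare the three candidate line lengths against a 29850-cycle delay.
for cpl in (Fraction(113), Fraction(227, 2), Fraction(114)):
    d = drift_per_frame(263, cpl, 29850)
    print(f"{float(cpl):6.1f} cycles/line -> drift {float(d):+.1f} CPU cycles/frame")
```

If the line really is 113.5 cycles, a 29850-cycle delay should make the change creep by half a CPU cycle (two MARIA cycles) per frame, while 113 or 114 cycles per line would give a much larger jump of 18 cycles per frame.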

 

The line was completely stationary. It wouldn't budge, despite the fact that I mentally willed it to move several times.

 

I banged my head against my original vertical line drawing code, and recreated it as simply as possible. There was nothing wrong with the original code (e.g. I was careful to avoid incurring page crossing penalties) but I found with the new version I was able to get vertical lines with code that assumed the line was 113 cycles. :|

 

I played around some more, and it looks like if I hit the BACKGRND register in even numbers, it causes some strange 0.5 CPU cycle alignment penalty each scanline. This resulted in my carefully counted 29850 cycles in my frame delay to actually become 29850.5 cycles. And the same effect caused my 113 cycle vertical line drawing code to actually take 113.5 cycles and draw a straight line.

 

There appears to be a different, bigger penalty if I use an odd number of BACKGRND hits. This is why I think it's some kind of alignment quirk.

 

Here's a screenshot of some line segments drawn using 2, 3, 4, 5, and 6 sets of register hits. The horizontal timing for each of these lines theoretically adds up to 113 CPU cycles.

 

post-23476-0-41426100-1399068633_thumb.jpg

 

This may or may not be some kind of side-effect of the CPU throttling. BACKGRND is a MARIA register, but close to the TIA range. But if my 3 cycle register hit was slowed down to TIA speeds, the slow 3 cycles would be equivalent to losing 4.5 fast cycles, and I'm not seeing that much of a hit.

 

I'm open if anyone has any theories or ideas to try to tease out more info.

Edited by RevEng

Definitely a good suggestion, but I figured out the weirdness this morning, and I'm back out of the rabbit hole.

 

In a nutshell, the sleep macro I was using does a "nop 0" to eat odd numbers of cycles. Even though TIA isn't really at $00, the range from $00-$1F invokes the TIA slowdown. This led to the weirdness, as the sleep number total was odd or even depending on how many times I hit BACKGRND.

 

[...]if my 3 cycle register hit was slowed down to TIA speeds, the slow 3 cycles would be equivalent to losing 4.5 fast cycles, and I'm not seeing that much of a hit.

Except the clock slowdown only happens during the data-fetch part of the instruction, dummy.

 

Lessons learned from this whole excursion:

  • a line length is exactly 113.5 CPU cycles.
  • an NTSC frame is exactly 29850.5 CPU cycles.
  • there is no clock sync when DMA is turned off. (we already knew that MARIA waits for the CPU to complete an instruction before it halts the CPU for DMA)
  • for any sort of access to $00->$1F, the clock slowdown amounts to an additional 0.5 effective CPU cycles. I confirmed this is true no matter which address mode you use to access the TIA range.
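
The last point, combined with the "nop 0" behaviour of the sleep macro, can be sketched as a tiny cycle counter. This is only a model under stated assumptions: the macro is assumed to emit one 3-cycle "nop 0" for odd counts and 2-cycle NOPs otherwise, and every access to $00-$1F is charged a flat 0.5 cycles. All names here are hypothetical.

```python
TIA_RANGE = range(0x00, 0x20)   # addresses that trigger the slowdown
TIA_PENALTY = 0.5               # extra effective CPU cycles per access (from the list above)

def effective_cycles(instructions):
    """Sum nominal cycles, adding the penalty for each access to the TIA range.

    instructions: iterable of (nominal_cycles, address_or_None).
    """
    total = 0.0
    for cycles, addr in instructions:
        total += cycles
        if addr is not None and addr in TIA_RANGE:
            total += TIA_PENALTY
    return total

def sleep_macro(n):
    """Hypothetical model of the sleep macro: one 'nop 0' if n is odd, then 2-cycle NOPs."""
    if n < 2:
        raise ValueError("cannot sleep for fewer than 2 cycles")
    instrs = []
    if n % 2:
        instrs.append((3, 0x00))   # 'nop 0' reads address $00, inside the TIA range
        n -= 3
    instrs.extend((2, None) for _ in range(n // 2))
    return instrs

for n in (4, 5, 10, 11):
    print(f"sleep {n}: {effective_cycles(sleep_macro(n))} effective cycles")
```

In this model every odd sleep silently costs half a cycle more than requested, which is enough to turn a carefully counted 113-cycle loop into 113.5 cycles.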

I "measured" the amount of DMA cycles used by DLLs that point to NULL DLs with some interesting results.

 

The following image is my vertical line draw routine, tweaked to assume an effective scanline width of 109.5 CPU cycles. The shift happens during "last line" scanlines, and the straight segments are "other line" scanlines. (to use the 7800 Software Guide terminology) The shift appears to be 2/3 of a line width, and since the line width is 3 cycles, the shift represents that "last lines" take an extra 2 CPU cycles compared to "other lines".

 

post-23476-0-42637200-1399220202_thumb.jpg

 

Given that we know a scanline without DMA takes 113.5 cycles...

 

DMA for "other line" scanlines is: 113.5-109.5 CPU cycles = 4 CPU cycles = 16 MPU cycles

DMA for "last line" scanlines is: 113.5-107.5 CPU cycles = 6 CPU cycles = 24 MPU cycles
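
The two subtractions above can be written out as a short sketch. It assumes a 113.5-cycle line without DMA and 4 MARIA ("MPU") cycles per CPU cycle, both taken from earlier in this thread; the function name is illustrative.

```python
MARIA_PER_CPU = 4      # MARIA cycles per 6502 cycle, as used in the figures above
NO_DMA_LINE = 113.5    # CPU cycles per scanline with DMA turned off

def dma_cost(effective_width_cpu):
    """Return (CPU cycles, MARIA cycles) lost to DMA on one scanline."""
    cpu = NO_DMA_LINE - effective_width_cpu
    return cpu, cpu * MARIA_PER_CPU

print("other line:", dma_cost(109.5))   # (4.0, 16.0)
print("last line: ", dma_cost(107.5))   # (6.0, 24.0)
```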

 

This includes startup+shutdown values, since I can't split them out using this technique. It's also worth mentioning that around the time that DMA is occurring I'm using 2-cycle NOPs, so I should be getting near-minimum delays in switching to DMA.

 

GCC lists the DMA times as...

DMA Startup: 5-12 cycles

DMA LL Shutdown: 19-23 cycles

DMA OL Shutdown: 13-17 cycles

 

Atari lists the DMA times as...

DMA Startup: 5-9 cycles

DMA LL Shutdown: 10-13 cycles

DMA OL Shutdown: 4-7 cycles

 

So real hardware appears to use less DMA time than the GCC docs say, and is near the upper limit of what Atari says.
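
For a side-by-side view, here is a sketch that adds the documented startup and shutdown ranges and compares them with the measured totals of 16 and 24. It assumes the documented figures are MARIA cycles and that startup and shutdown simply add up; the dictionary layout and names are made up for illustration.

```python
# Documented (min, max) ranges from the two lists above, assumed to be MARIA cycles.
DOCS = {
    "GCC":   {"startup": (5, 12), "LL": (19, 23), "OL": (13, 17)},
    "Atari": {"startup": (5, 9),  "LL": (10, 13), "OL": (4, 7)},
}
# Measured startup+shutdown totals, in MARIA cycles.
MEASURED = {"OL": 16, "LL": 24}

def total_range(doc, kind):
    """Documented startup + shutdown range for 'OL' or 'LL' scanlines."""
    s_lo, s_hi = DOCS[doc]["startup"]
    e_lo, e_hi = DOCS[doc][kind]
    return s_lo + e_lo, s_hi + e_hi

for doc in DOCS:
    for kind, measured in MEASURED.items():
        lo, hi = total_range(doc, kind)
        where = "below" if measured < lo else "above" if measured > hi else "within"
        print(f"{doc:5} {kind}: documented {lo}-{hi}, measured {measured} ({where})")
```

Under those assumptions the "other line" figure of 16 sits at the top of the Atari range and below the GCC minimum, while the "last line" figure of 24 equals the GCC minimum and is two cycles above the Atari maximum.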

 

Related thoughts... if part of DMA startup is indeed comprised of waiting for the 6502 to finish an instruction, maybe part of it doesn't count from the CPU's perspective.

 

Also, the fact that the startup DMA value is a range is explained in the Software Guide by the fact that the 6502 may be in the middle of a long instruction. So why is the shutdown DMA value a range?


Doesn't MARIA always use /HALT when doing DMA? There is no requirement there to wait for the CPU to finish an instruction. It's only when using /RDY that there's the possibility of 3 successive writes by the CPU before it actually stops.

 

Possibly the varying startup time could be due to the fact the CPU might be executing a long cycle? And DMA difference for RAM vs ROM?

Long shot - maybe a long cycle has to be followed by a short cycle? Maria decides not to halt until it can be sure an RMW instruction involving TIA or RIOT has finished?

Edited by Rybags

Thanks! The distinction between /RDY and /HALT was lost on me. I was basing my expectations here on what I knew of the 2600 pause kit, which I see now uses /RDY.

 

Looking at the shutdown numbers, MARIA/6502 clock phase being the explanation makes a lot of sense. TIA/Riot long-writes would be a factor in startup numbers but not shutdown, and indeed that pattern is in both sets of numbers.


Hello all,

 

Good projects are going on here in the 7800 section!

 

I have been reading here again for some weeks, and I can contribute a bit to the discussion. A long time ago I took some logic traces from the MARIA chip. Here I have a trace of a DL4 fetch.

You can see how HALT goes low (A), two 4-byte Display Lists are fetched together with their data, and then the last DL4.

After that HALT goes high again and the CPU continues its work at $D065.

 

Unfortunately I don't have the VC7800 setup ready to take more traces. I would have to do a lot of work to be able to take good traces, but I plan to come back to it (very) slowly...

 

Greetings,

vassilis

 

post-26516-0-82464500-1399299425_thumb.png


Thanks for sharing the capture. There are some interesting things there. Do you still have the logicport file? I'd love to see the leading and trailing parts of the capture.

 

In the part shown, I see a lot of ROM addresses being accessed, but I'd expect to see more RAM ones for DMA. MARIA requires the DL and DLL to be in RAM.

 

Which game is this captured from? I may be able to track down the assembly code involved by tracing the address access in the MESS debugger.

Edited by RevEng

Hello RevEng,

 

Thank you for your interest. I have attached the lpf file. You must remove the TXT ending.

 

You will have to install the logicport software and run it in demo mode. There you can zoom and measure things. The logicport software is very easy and effective to use.

 

I think the game was Pole Position 2, but I cannot guarantee it; it was too long ago!

 

 

Greetings ,

vassilis

 

 

MARIA_DL4_SHORT_2.LPF.TXT


for any sort of access to $00->$1F, the clock slowdown amounts to an additional 0.5 effective CPU cycles. I confirmed this is true no matter which address mode you use to access the TIA range.

This might explain why MIA works for me in A7800x. I had to alter the DMA startup time to be later to get it and Kung Fu Master to work properly. Keep in mind the version of the MESS source used by A7800x had no accommodation for a scanline timer. But it seems to be in line with your observation of MIA's activity (accessing the TIA), your recent testing, and my lame hack that gets it working on 1.07. :P

  • 1 year later...

On eBay there are some 6532A RIOTs.

 

The "A" means they run at 2 MHz, faster than the one in the 7800.

 

What would happen if one was installed in a 7800?

Should be fine - the speed designation is the maximum the chip can handle. Running it slower will be fine.


  • 1 year later...

Hi,

 

I am making an A7800 emulator, and I am wondering if there are pictorial results from a real console for the tests in the opening post of this thread.

 

So far I have all the DMA logic implemented and everything works correctly, including all the typical troublemakers like Xenophobe, Kung Fu Master, and top line of Centipede.

But, when I run these tests the marker doesn't quite reach to where it is supposed to, and I don't know exactly what the results of tests 6 and 13 should be.

 

Any help is appreciated!


Is there a file for testing?


But, when I run these tests the marker doesn't quite reach to where it is supposed to, and I don't know exactly what the results of tests 6 and 13 should be.

I don't have pictures, but perhaps someone will oblige here soon enough.

 

Those tests require mid-line color changes. I take it your new emulator supports mid-line color changes?
