ANTIC decap and reverse engineering

Curt Vendel · March 30, 2019

The GTIA schematics I have all say 78 on them, so it was definitely in the works and just not ready for prime time.

I am taking all of my schematics to Kinko's this week to get high-quality scans to post, as my previous postings of the POKEY and GTIA schematics were too fuzzy. Sadly no, I do not have the ANTIC schematics, just the chip datasheets.

Yeah, I was hoping to have one and here I have two. Back in the '80s they'd think me mad for being excited about CTIA. I suppose it's still a little pathetic today, but it's like an archeological dig.

What I want to know is if CTIA contains GTIA circuitry with the register bits disabled. That would confirm that CTIA was only shipped because GTIA was behind schedule.

Curt Vendel · March 31, 2019

One thing I can offer is I do have the actual chip plots used to make the dies for about 80 Atari chips, if anyone feels these could benefit them in reverse engineering please let me know.

It's a big endeavor: most of these are anywhere from 4x4 feet to 6x6 feet, and I have to take them to Kinko's, run each layer through a large-format scanner, and get high-res images of each onto a thumb drive.

It can start to get costly, so let me know if this is something everyone would want. I'll get costing and maybe put up a GoFundMe page or something so this can be done.

Edited March 31, 2019 by Curt Vendel

+Stephen · March 31, 2019

I would pay for good scans of anything Atari AVM related!!!

Heck - I'm sure lots of people would donate to this archival. I certainly will.

+Nezgar · April 1, 2019

It can start to get costly, so let me know if this is something everyone would want. I'll get costing and maybe put up a GoFundMe page or something so this can be done.

Heck - I'm sure lots of people would donate to this archival. I certainly will.

I'd probably be open to contributing to the costs, but I'd like to suggest maybe posting a list of the "80 or so Atari Chips" and folks here can help you prioritize them at least with some kind of voting system maybe? I've seen voting threads before, not sure how they are created.

Edit: Just looked at the new topic page - there's a "Manage Topic Poll" link in the top right to set all the selection options. Just allows you to best manage your time and costs in the effort you put in.

Rybags · April 1, 2019

What sort of bucks are we looking at to get a decent set of scans?

ivop · April 1, 2019

One thing I can offer is I do have the actual chip plots used to make the dies for about 80 Atari chips, if anyone feels these could benefit them in reverse engineering please let me know.

I would think so. If these plots contain most/all of the layers, it should be possible to generate a netlist. Maybe this can even be automated if the scans are clear enough.

ijor · April 1, 2019

Sorry for the delay. I am just returning from a trip ...

what does it mean when the signal goes through "circle connected to S01" (S01 I understand as clock signal)

It is a so-called clock coupler: a pass transistor controlled by one phase of the clock. Technically, it works as a transparent (dynamic) latch. While the corresponding clock phase is active, both sides are connected. The latch is then transparent, and the output (usually drawn on the right in the schematics) follows the logic state of the input (usually on the left). When the clock phase is not active, both sides are disconnected. The output is then (usually) floating and retains its previous value due to capacitance. It is dynamic because if the clock is stopped for too long, the output eventually discharges, as any capacitor does.

If the signal traverses two clock couplers controlled by opposite phases of the clock, you get a one-cycle delay. This is similar to a full-blown synchronous flip-flop, except that here the delay is dynamic, not fully static. Modern CMOS logic almost never uses separate clock couplers; discrete flip-flops are used almost as black boxes. But internally, flip-flops still have a similar circuit at the transistor level, just with additional feedback logic to make it static.
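To make the one-cycle delay concrete, here is a minimal behavioral sketch in Python. It is not taken from the schematics: the names `ClockCoupler` and `simulate` are made up for illustration, the input is assumed to change once per cycle at the start of the first phase, and capacitance decay is ignored.

```python
class ClockCoupler:
    """Pass transistor gated by one clock phase; holds its last value when off."""

    def __init__(self, initial=0):
        self.out = initial  # charge stored on the output node

    def step(self, phase_active, value_in):
        if phase_active:
            self.out = value_in  # transparent: output follows the input
        return self.out          # otherwise the node floats and keeps its value


def simulate(inputs):
    """Feed one input value per cycle through two couplers on opposite phases.

    Returns the output of the second coupler as seen during the first
    phase of each cycle.
    """
    first = ClockCoupler()   # gated by phase 1 (hypothetical assignment)
    second = ClockCoupler()  # gated by phase 2
    seen = []
    for value in inputs:
        # Phase 1: the first coupler conducts, the second is off.
        mid = first.step(True, value)
        seen.append(second.step(False, mid))
        # Phase 2: the first coupler is off, the second conducts.
        mid = first.step(False, value)
        second.step(True, mid)
    return seen


print(simulate([1, 0, 1, 1]))
```

With both couplers starting at 0, the output sequence is the input sequence shifted by one cycle, which is the behavior described above.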

Note that in some special cases a node might be connected to multiple inputs through multiple clock couplers. This is used, e.g., in the GTIA high-res logic to be able to change the output at twice the clock frequency. I don't remember for sure, but I don't think ANTIC has anything like that.

There are several "styles" for representing those clock couplers in the schematics. These circles are just one possibility, and it is the style used by the original GTIA schematics. The original POKEY schematics use a completely different style altogether, with "clocked gates". But this is just a cosmetic difference. We don't have original ANTIC schematics, so I used the GTIA style, which I find visually more intuitive.

Edited April 1, 2019 by ijor

ijor · April 1, 2019

As I understand it, each span of logic between opposite clock phases should take half a cycle. I've had bad luck trying to correlate exact delays in the schematic against measurements, though.

Let me know if some circuit is particularly interesting and I can create simulation waveforms. Nothing like a waveform to illustrate the timing.

'p' is probably for positive, to contrast against 'n' for negative.

Correct. But, as I already mentioned long ago, I'm afraid I didn't use a fully consistent naming convention. "N" and "P" are just opposite, that's for sure. But in some cases "N" stands for the inverse of the same signal that doesn't have the "N" prefix. In other cases it is because it just uses inverted logic.

ijor · April 1, 2019

I am taking all of my schematics to Kinko's this week to get high-quality scans to post, as my previous postings of the POKEY and GTIA schematics were too fuzzy. Sadly no, I do not have the ANTIC schematics, just the chip datasheets.

I believe you mentioned once that you have CTIA schematics. I think at this point CTIA could be even more interesting than POKEY or GTIA. But of course, if you can scan all of them, so much the better.

One thing I can offer is I do have the actual chip plots used to make the dies for about 80 Atari chips, if anyone feels these could benefit them in reverse engineering please let me know.

That, of course, would be awesome. I understand you also have some tape-out files you recovered from tapes and that you reverse engineered the format. IIRC you posted a small section of the GTIA layout produced from those files. Those files could also be very useful if you can make them publicly available.

Btw, it is not just for technical reverse engineering reasons. IMHO, there is huge historical value in all this kind of stuff. That's the difference between reverse engineered material (as I did) and original material. They complement each other ideally. Reverse engineered schematics might be technically more accurate, since original schematics might miss last-minute modifications. But original schematics might include circuitry that was eventually disabled, how the author named some signals, the purpose of some logic, etc. Again, IMHO, historically invaluable stuff.

Curt Vendel · April 2, 2019

1. I will complete the full inventory of the chip plots I have in upper storage and make an Excel spreadsheet.

2. I am going to bring the POKEY and GTIA schematics to Kinko's this upcoming weekend to do high-res scans so I can repost better ones onto atarimuseum.com (someone also asked for the 1090 stuff, so I'll bring that too). Then I can find out what scanning the layers of a 4' x 4' chip plot at the highest resolution setting will cost and let everyone know, and from the Excel sheet we can put together a priority list of what should be scanned and posted first.

Curt Vendel · April 2, 2019

I believe you mentioned once that you have CTIA schematics. I think at this point CTIA could be even more interesting than POKEY or GTIA. But of course, if you can scan all of them, so much the better.

That, of course, would be awesome. I understand you also have some tape-out files you recovered from tapes and that you reverse engineered the format. IIRC you posted a small section of the GTIA layout produced from those files. Those files could also be very useful if you can make them publicly available.

Btw, it is not just for technical reverse engineering reasons. IMHO, there is huge historical value in all this kind of stuff. That's the difference between reverse engineered material (as I did) and original material. They complement each other ideally. Reverse engineered schematics might be technically more accurate, since original schematics might miss last-minute modifications. But original schematics might include circuitry that was eventually disabled, how the author named some signals, the purpose of some logic, etc. Again, IMHO, historically invaluable stuff.

The fact that it would also be the very first release of the CTIA chip plot is historically significant too, and a great jumping-off point to start from.

msh · April 2, 2019

Thank you for the details on how the chip works. I have one more question about the symbols. There are "NOT elements" with extra signal connected to the "body" of the element. e.g. when plySngOe, plyDblOe or DlReq-1 is generated (pg.4). What does it mean? In some places the signal is input in some inverted input.

And another one about high level functionality - display list jump operation. It is pretty complex, so not about the whole stuff. The part I cannot understand at all is how DL counter is updated with a new address. There is DlReq-1 signal to push DL counter to the address bus and to increment the counter at the same time. There are signals to update DLH and DLL from data bus derived directly from horizontal counter. But I am missing any kind of temporary storage for DLL data as I cannot update DL counter immediately with DLL I read. This would produce address [DLHold; DLLnew] and I would read DLH from invalid address. Not to mention there happens DL auto-increment too. I hope I am making sense. Not sure how to describe it better. I was thinking maybe the DLL is left "floating" on the address bus between the two clocks? But this seems too crazy to me.

ijor · April 3, 2019

There are "NOT elements" with extra signal connected to the "body" of the element. e.g. when plySngOe, plyDblOe or DlReq-1 is generated (pg.4). What does it mean? In some places the signal is input in some inverted input.

Those are, generically, Super Buffers. DlReq-1 is a Super Inverter (or an inverting Super Buffer), plySngOe is a (non-inverting) Super Buffer, and each one of the ADDR and DBUS drivers has a Super NOR. Other constructs are possible.

Those gates have extra circuitry to improve the analog characteristics. They are mostly used on signals that have a large fanout. Functionally they are identical to the "normal", not "Super", version of the gate. That is, a Super Inverter is functionally exactly the same as an inverter, a Super Buffer the same as a buffer, etc.

In other words, you can safely ignore the extra circuit (gate or wire) that runs in parallel with, and above, the main gate.

And another one about high level functionality - display list jump operation. It is pretty complex, so not about the whole stuff. The part I cannot understand at all is how DL counter is updated with a new address. There is DlReq-1 signal to push DL counter to the address bus and to increment the counter at the same time. There are signals to update DLH and DLL from data bus derived directly from horizontal counter. But I am missing any kind of temporary storage for DLL data as I cannot update DL counter immediately with DLL I read. This would produce address [DLHold; DLLnew] and I would read DLH from invalid address. Not to mention there happens DL auto-increment too. I hope I am making sense. Not sure how to describe it better. I was thinking maybe the DLL is left "floating" on the address bus between the two clocks? But this seems too crazy to me.

The Display List Counter process is pipelined. The counter is not updated on the same cycle that the new value is read. I'll try to post a simulation waveform later to illustrate the timing.

ijor · April 3, 2019

This is a simulation waveform of ANTIC processing a Display List Jump Instruction:

The signal names correspond, mostly, to the ones on the schematics:

DlReq_1 : Fetch display list byte and increment DL counter.
LdDLISTL : Load DL Counter LSB from internal data bus.
LdDLISTH : Load DL Counter MSB from internal data bus.

Display list was set to start at address $4000, and jump to address $6020.

- Cycle 8:
Read jump instruction and increment DL counter (to $4001)

- Cycle 13:
Read target address LSB ($20) and increment DL counter (to $4002).

- Cycle 14:
Read target address MSB ($60) and increment DL counter.
But LdDLISTL is also active and overrides DlReq_1. Then DL counter LSB is loaded and not incremented.

Loading is not directly from the external data bus, but from the internal buffers that in this case store the value read at the previous cycle ($20). DL counter == $4020

- Cycle 15:
DL counter MSB is loaded. DL counter == $6020

Note that both the address drivers and the data bus buffers have latches.
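The cycle-by-cycle description above can be replayed with a small Python sketch. This is only a behavioral illustration under stated assumptions, not the actual ANTIC logic: the function name `run_jump`, the event tuples, and the memory contents are hypothetical, the byte at $4000 is a placeholder for the jump instruction, and the counter is treated as a plain 16-bit incrementer for simplicity.

```python
def run_jump(memory, start, events):
    """Replay (cycle, dl_req, ld_low, ld_high) events.

    Returns a list of (cycle, counter value after that cycle).
    """
    counter = start
    buffer = 0  # internal data-bus latch: holds the byte fetched last
    trace = []
    for cycle, dl_req, ld_low, ld_high in events:
        address = counter   # address driven for this cycle's fetch
        previous = buffer   # value latched on the previous fetch
        if ld_low:
            # LdDLISTL overrides the increment requested by DlReq_1.
            counter = (counter & 0xFF00) | previous
        elif dl_req:
            counter = (counter + 1) & 0xFFFF
        if ld_high:
            counter = (previous << 8) | (counter & 0x00FF)
        if dl_req:
            buffer = memory[address]  # latch the byte read this cycle
        trace.append((cycle, counter))
    return trace


memory = {0x4000: 0x01, 0x4001: 0x20, 0x4002: 0x60}  # 0x01 is a placeholder
events = [
    (8, True, False, False),   # fetch jump instruction
    (13, True, False, False),  # fetch target LSB
    (14, True, True, False),   # fetch target MSB, load LSB from the latch
    (15, False, False, True),  # load MSB from the latch
]
for cycle, counter in run_jump(memory, 0x4000, events):
    print(f"cycle {cycle}: DL counter = ${counter:04X}")
```

The key point the sketch tries to capture is that each load takes its value from the internal latch (the byte fetched one cycle earlier), so no extra temporary register for the LSB is needed.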

msh · April 4, 2019

Wow! This is such a nice picture and explanation.

There are some important bits of information I was not aware of:

- signals like DlReq_1 do not do anything directly. These are just "enablers". The change is happening on S02 descending edge.
- I didn't expect DMA is instantaneous. At the same edge both address and data bus are set. I assumed something like: 1st edge sets address; 2nd edge retrieves data from memory chip.
- cycle 14 must have precise timing "within" the edge. DLL is updated at the same moment when data bus is being updated from $20 => $60. I would expect this to be a hazard state, not by design. Does this work due to some real-world/analog characteristics, or is this clear from the schema?
- actually the same is happening at cycle 15 - DLH is updated during change $60 => $FF

Is there any reason why some of the signals in the picture are dashed? Just curious

ijor · April 4, 2019

signals like DlReq_1 do not do anything directly. These are just "enablers". The change is happening on S02 descending edge.

The chipset is clocked and most of the internal logic is synchronous. Most signals drive logic that eventually reaches a clock coupler. And as I explained in a previous post, a clock coupler does not propagate the input until the starting edge of its phase.

This doesn't necessarily mean that signals like DlReq_1 don't do anything at all directly. It is a matter of point of view, if you want. The signals that I sampled at the simulation are taken after the clock synchronizer and then would change only after the active clock edge. They correspond to the named signals in the schematics and I thought it was the more intuitive node to sample.

But you can sample a node before the clock coupler if you want. Look at the C1 cell on the last page of the schematics. I sampled the Q signal that is after the S01 clock coupler and after one of the inverters. If you sample the node before the S01 coupler, just after the AOI gate (the gate to the left), then that signal would change before the clock edge.

I didn't expect DMA is instantaneous. At the same edge both address and data bus are set. I assumed something like: 1st edge sets address; 2nd edge retrieves data from memory chip.

Of course it is not instantaneous. I performed a so-called "functional" simulation. Timing between clock edges is not meant to be accurate, let alone whatever happens externally to the chip. And the simulation shows the external DATA BUS, which is driven externally on DMA cycles. The exact timing would depend on RAM and other components. I didn't simulate any of that, just a very basic model that drives the data bus with specific values depending on the address.

Regardless, ANTIC does latch the data on the second phase of the clock. If you look at the DBUS Buffer on the first page of the schematics, you will see that there is a clock coupler controlled by S02. That means that whatever happens on the other phase of the clock would be ignored. This is simulated correctly but might be not obvious in the waveform because I included the external data bus only. I added a new waveform, now with the internal data buffers to illustrate better the behavior. It will also probably answer another of your questions.

cycle 14 must have precise timing "within" the edge. DLL is updated at the same moment when data bus is being updated from $20 => $60. I would expect this to be a hazard state, not by design. Does this work due to some real-world/analog characteristics, or is this clear from the schema?

As I said in the previous message, the Display List Counter is not loaded directly from the external bus. It is loaded from the latches of the internal buffers. Again, the new waveform will hopefully make this clearer.

Is there any reason why some of the signals in the picture are dashed? Just curious

It is just an artifact of the way I implemented the simulation. Doesn't really mean anything, sorry about that. Please ignore that and also ignore some glitches that appear on the waveform.

Edited April 5, 2019 by ijor

msh · April 7, 2019

Would you mind creating a waveform for a CHBASE update, please? Starting from the external addr and data bus down to the CHBASE register, including the wrCHBASE signal? If not too complicated, ideally with the horizontal counter. With whatever values, only to have the clock ticks "numbered".

Theory: if CPU writes to CHBASE at the same clock tick when chAddrOe is active (to fetch character data from charset) this value is not used as it is at that time stored in the internal buffers and CHBASE is updated one clock later due to delayed wrCHBASE (passing S01/S02). Correct? (there is still nPHI2 involved, but I hope this can be ignored for this case )

If this is true, there is tricky question. If a CPU write happens exactly one clock tick before chAddrOe; is new, by CPU provided, or old, from CHBASE, value used to set A9/10-15? CHBASE update and addr bus set happens at the same S02 edge...

Edited April 7, 2019 by msh

ijor · April 7, 2019

Would you mind creating a waveform for a CHBASE update, please? Starting from the external addr and data bus down to the CHBASE register, including the wrCHBASE signal? If not too complicated, ideally with the horizontal counter. With whatever values, only to have the clock ticks "numbered".

No problem. Will gladly run a small simulation and post a waveform.

Theory: if CPU writes to CHBASE at the same clock tick when chAddrOe is active (to fetch character data from charset) this value is not used as it is at that time stored in the internal buffers and CHBASE is updated one clock later due to delayed wrCHBASE (passing S01/S02). Correct? (there is still nPHI2 involved, but I hope this can be ignored for this case )

If this is true, there is tricky question. If a CPU write happens exactly one clock tick before chAddrOe; is new, by CPU provided, or old, from CHBASE, value used to set A9/10-15? CHBASE update and addr bus set happens at the same S02 edge...

The same issue exists with other similar registers, and not only on ANTIC but on most chips. There is always one cycle that is the latest at which you can modify a register and still have it take effect (the chip will use the new value). After any later write, the chip will use the "old" value.

I will try to illustrate this with the simulation, but you can easily test this on real hardware and under emulation. It might be even included in Phaeron's Acid test?

There are cases that the write to a register is not well synchronized and the result is not fully deterministic.

Edited April 7, 2019 by ijor

phaeron · April 7, 2019

One cycle delay, if I remember correctly: write to CHBASE on cycle N is first reflected by ANTIC fetches starting on cycle N+2.

If you think about the way cycles work, it makes sense. A cycle isn't an instant, it's a period of time during which internal logic has to run. On a read cycle, the address has to be ready at the start of the cycle so the memory or device has time to provide the data at the end of the cycle. For a write, both address and data have to be ready at the start of the cycle. This means that expecting a write on one cycle to be immediately reflected in the next cycle is tough, you're expecting the device to quickly turn around data it received late in the first cycle to the address bus early in the next cycle. This does happen -- the 6502 reads the opcode at a JMP target immediately after the high address is read -- but with ANTIC there's usually at least one full cycle in between for the address to be calculated. It takes two cycles in between for character modes, where on the first scanline when ANTIC is reading both character names and data it doesn't issue the data read until three cycles later than the name read.
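As a rough illustration of the rule of thumb above, the following Python sketch models which CHBASE value a fetch would see. It simply assumes the N+2 latency quoted above (stated from memory, so treat it as unverified); the function name `chbase_seen_by_fetch`, the cycle numbers, and the register values are hypothetical.

```python
def chbase_seen_by_fetch(initial, writes, fetch_cycle, latency=2):
    """Return the CHBASE value used by an ANTIC fetch on fetch_cycle.

    writes is a list of (cycle, value) pairs. A write on cycle N is assumed
    to be visible to fetches from cycle N + latency onwards.
    """
    value = initial
    for cycle, new_value in sorted(writes):
        if cycle + latency <= fetch_cycle:
            value = new_value
    return value


# A write on cycle 10 is too late for a fetch on cycle 11,
# but is picked up by a fetch on cycle 12.
print(hex(chbase_seen_by_fetch(0x60, [(10, 0x80)], fetch_cycle=11)))
print(hex(chbase_seen_by_fetch(0x60, [(10, 0x80)], fetch_cycle=12)))
```

A model like this is only useful for reasoning about "last cycle that still matters" questions; the real answer for any given register comes from the schematics or from testing on hardware.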

ijor · April 8, 2019

This means that expecting a write on one cycle to be immediately reflected in the next cycle is tough, you're expecting the device to quickly turn around data it received late in the first cycle to the address bus early in the next cycle. This does happen -- the 6502 reads the opcode at a JMP target immediately after the high address is read -- but with ANTIC there's usually at least one full cycle in between for the address to be calculated.

It is possible to read data in one cycle and immediately use that data to drive the address bus, as the 6502 does, but it is expensive. It usually requires extra logic, and it might also make the device slower, in the sense that the maximum frequency of the device might be lower as a consequence. In general, the less pipelining, the lower the maximum frequency.

For the 6502 it is obviously worth it. Otherwise many instructions would take one extra cycle. So there is a direct path from the data bus input to the address bus output buffers. And this is not very expensive for the 6502 because it needs that data-to-address connection anyway for other purposes.

For ANTIC something like that wouldn't make much sense because you (probably) wouldn't gain anything. So register write operations are usually pipelined for at least one cycle.

Edited April 8, 2019 by ijor

ijor · April 11, 2019

Simulation waveform of ANTIC internal processing when writing to CHBASE register.

Two writes are simulated. Display list IR was set to ANTIC mode $7. Memory scan was initialized to $3000 and CHBASE to $60.

The first write stores $80 to CHBASE, but it is just too late and ANTIC uses the "old" ($60) CHBASE value.

The second write stores $A0 to CHBASE one cycle earlier, just in time for ANTIC to use the "new" CHBASE value.

_The Doctor__ · April 11, 2019

1.5 3 1.5 3

msh · April 12, 2019

This is all really valuable feedback.
I understand that in the clock-ticks world there will always be one latest moment (tick) when the input matters. I assume there is some additional timing happening within the tick, which makes sense. I would be really sad to learn there are situations within ANTIC when the behavior is "random".
Anyway, my primary objective is to be able to read the schema at a level to answer my questions (like the last tick which matters for CHBASE update) myself. Could be too ambitious for me, who knows

Looking at the waveform provided, I am curious:

- why is the wrCHBASE only 1/2 clock wide? Address bus seems to keep value for full cycle. Other signals are 1 or 2 clocks wide.
- I want to double-check. At 32.5/33 there is missing update of D(ATA)BUS with character's $21 data. I think it is just "typo".

I made a Photoshopped version with A0-15out signals. Is my picture accurate? This would make it very clean, only A0-15out would be updated on rising edge of S02 while everything else is happening on falling edge...

+Stephen · April 12, 2019

How are you guys doing these simulated waveforms?

ijor · April 12, 2019

I understand that in the clock-ticks world there will always be one latest moment (tick) when the input matters. I assume there is some additional timing happening within the tick, which makes sense.

There is a difference between transparent (asynchronous) latches and edge-triggered flip-flops. A transparent latch propagates the input, and changes the output, for as long as latching is enabled. This is why it is called "transparent". An edge-triggered flip-flop, OTOH, changes the output only on the active edge, hence the name. To work properly, flip-flops and latches require the input to be stable for a minimum time before and after the active edge. These are called setup and hold times. If the input changes too close to the clock edge, this produces a setup or hold timing violation. In such cases the result is not predictable and can even produce metastability. There are tons of resources online about the issue that you might want to check.
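A small Python sketch may help show the difference. It is a purely logical model with hypothetical function names, one sample per time step, and it deliberately does not model setup/hold violations or metastability.

```python
def transparent_latch(samples, initial=0):
    """samples: list of (enable, d). Output follows d whenever enable is high."""
    q = initial
    out = []
    for enable, d in samples:
        if enable:
            q = d          # transparent while enabled
        out.append(q)      # holds the last value while disabled
    return out


def edge_triggered_flip_flop(samples, initial=0):
    """samples: list of (clock, d). Output captures d only on a 0 -> 1 edge."""
    q = initial
    previous_clock = 0
    out = []
    for clock, d in samples:
        if clock and not previous_clock:
            q = d          # capture only on the rising edge
        previous_clock = clock
        out.append(q)
    return out


# Same stimulus for both: the data input toggles while the clock is high.
samples = [(0, 1), (1, 1), (1, 0), (1, 1), (0, 0), (0, 1)]
print(transparent_latch(samples))
print(edge_triggered_flip_flop(samples))
```

With this stimulus the latch output toggles along with the input while the clock is high, while the flip-flop keeps the value it captured at the rising edge.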

I would be really sad to learn there are situations within ANTIC when the behavior is "random".

A modern synchronous design, such as an FPGA core, would typically use edge-triggered flip-flops only. But this was prohibitive back in the old days, because a flip-flop takes too many transistors. So an old-school design might have transparent latches and, in general, other asynchronous logic. As long as it is implemented correctly and carefully planned, this shouldn't necessarily produce synchronization issues. But the designer, for instance, might not have considered (or didn't care about) the possibility of changing some register "on the fly" while it is active.

I'm not aware of any ANTIC synchronization issues. It seems that you can change any ANTIC register on any cycle you want and the result will be fully consistent and predictable. But not every chip is so well behaved. IIRC Bryan found out some time ago that there is an obscure GTIA behavior that might change depending on the temperature (the thread should be somewhere in this forum).

why is the wrCHBASE only 1/2 clock wide? Address bus seems to keep value for full cycle. Other signals are 1 or 2 clocks wide.

Precisely because CHBASE uses async latches (see the DT cell on the last page of the schematics), and the input of the latches (coming from the DBUS internal latches) is ready and stable during half a cycle only.

So the designer added logic for wrCHBASE and other similar signals, like wrPMBASE, to be disabled during S2. You can see wrCHBASE is the result of a NAND that includes S02.

I want to double-check. At 32.5/33 there is missing update of D(ATA)BUS with character's $21 data. I think it is just "typo".

I forgot to simulate the output of the RAM/ROM at address range $Axxx. I attached a new corrected waveform.

I made a Photoshopped version with A0-15out signals. Is my picture accurate? This would make it very clean, only A0-15out ...

Not exactly. Aout, if we consider the exact labeled node, starts changing on S1, one full cycle before the external address bus. A[n]out is not a latch, it is the internal address bus. The latch is actually after this node, starting from the S02 coupler described in the ADDR drivers. I added these nodes on the simulation and I named the bus Aout_2, meaning the S2 latches after Aout.

Edited April 12, 2019 by ijor
