ijor
Everything posted by ijor
-
I would be interested in a fully populated and assembled version assuming the cost would be reasonable.
-
This is very interesting. I wasn't aware of that VIA bug and spent some time researching the issue. The bug is essentially the same: an unreliable synchronization of the external clock that might miss an edge if the external clock's active transition happens too close to the internal clock edge. But the actual logic is quite different, it wasn't copied. I think that at the time this kind of issue wasn't that uncommon. But such issues weren't always noticed, and many times they were just harmless. Say, if once in a while a system missed an edge on a joystick switch, probably nobody realized. OTOH, synchronizing an external clock, as in the VIA and Pokey, is particularly sensitive. A reverse engineered schematic of the VIA edge detector is available here (obviously very different from Pokey's one): https://github.com/ijor/VIA6522/blob/main/viaSchem-EdgeDetector.png
-
Pokey serial shift maximum bitrate

Pokey SIO synchronous mode using an external clock can operate at bitrates much higher than the standard asynchronous mode. The maximum possible bitrate for Pokey is one third of the main clock. In other words, the bit period must be at least 3 PHI2 cycles. This is a limitation of how the Pokey shift registers work.

A shift register is typically based on edge triggered flip flops. Such a shift register can shift in a single cycle, because an edge triggered flip flop can both shift in and shift out in the same cycle. But edge triggered flip flops were very expensive for the small transistor budget of the era. It was common to use much simpler cells based on asynchronous latches rather than edge triggered flip flops.

Asynchronous latches use fewer transistors, but because they are transparent latches, they can't shift on a single edge. They need separate steps for shift in and shift out. Performing the shift in multiple steps requires slightly more complicated control logic. But the control logic is shared by the whole shift register. It is then still cheaper, in terms of the number of transistors, to implement more complicated control logic with simpler cells for each bit.

Pokey shifts in three cycles: one cycle to shift out, two idle half cycles, and a third cycle to shift in. The idle half cycles between both steps are required to ensure they don't overlap.

It is not completely impossible to use "cheap" asynchronous latches and still shift in a single cycle. This is implemented using half cycles controlled by both edges of the clock. But this method requires precise timing and might hinder performance. It is normal to avoid such a method if possible. Pokey uses a safe and efficient method that was surely considered more than fast enough at the time.

This is the main shift register bit cell from my reverse engineered schematics:

[schematic image]

This is a simulation waveform with Pokey receiving using an external clock with a period that is exactly three PHI2 cycles:

[waveform image]

The names of most signals are the same as those used in my Pokey reverse engineered schematics:

PHI2 is the main system clock
rBCLK is the external serial clock as seen internally by Pokey after being synchronized
sdiClock is the single cycle clock pulse generated by Pokey for driving the receiver control logic
ssiShft (in light blue) and ssiTransf (in red) are the main signals produced by the control logic to perform the shifting
ssi_7 and ssi_6 are the actual data bits at the shift register
The cycle counter is, of course, a virtual simulation counter that is not present in real hardware

Shifting starts at the middle of cycle 19, when ssiShft is asserted and enables shift out. ssiTransf is asserted one and a half cycles later, at the start of cycle 21, to enable shift in, and ssi_7 toggles.

This is a similar simulation waveform, now with a clock period of only two PHI2 cycles, which cannot work correctly:

[waveform image]

Note how the red signal ssiTransf never asserts once shifting starts. As an obvious consequence, shifting is not actually performed and the bit cells ssi_7 and ssi_6 stay unchanged.
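To make the two-step idea a bit more concrete, here is a toy Python model of why a register built from transparent latches cannot simply open every cell at once. This is only an illustration under my own assumptions: the function names and the list used as a "holding node" are hypothetical, and none of it is Pokey's actual logic or signal naming.

```python
def shift_all_transparent(cells, data_in):
    """Every latch is open at the same time, so the input races through all cells."""
    value = data_in
    out = []
    for _ in cells:
        out.append(value)  # each open latch just passes along what it sees
    return out


def shift_two_step(cells, data_in):
    """Shift out first, then shift in, with the two steps never overlapping."""
    held = list(cells)            # step 1: every cell's value is copied out and held
    # idle gap: neither step is enabled here, so they cannot overlap
    return [data_in] + held[:-1]  # step 2: each cell loads its neighbour's held value


register = [1, 0, 1, 1, 0, 0, 1, 0]
print(shift_all_transparent(register, 0))  # [0, 0, 0, 0, 0, 0, 0, 0]
print(shift_two_step(register, 0))         # [0, 1, 0, 1, 1, 0, 0, 1]
```

The first function loses the whole register contents in one go, which is the failure the separate shift out and shift in steps are there to prevent.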
-
Phaeron, please add to the debugger GUI wishlist: "Run to Cursor". I find this feature extremely useful
-
Anybody that bought recently from Brad, how long does it take, approximately, to ship? Does it ship with tracking? Thanks,
-
There is no fundamental difference between PHI2/4 (~447 KHz) and PHI2/3 (~596 KHz), except that at PHI2/3 it is much harder to reliably phase align the clock with pure software (see the technical thread for details).
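For reference, the two figures follow directly from dividing the system clock. A minimal sketch, assuming an NTSC machine with PHI2 at about 1.789773 MHz (a PAL machine runs slightly slower); the function name is just for illustration:

```python
PHI2_NTSC_HZ = 1_789_773  # assumption: NTSC system clock


def sync_bitrate_hz(phi2_cycles_per_bit, phi2_hz=PHI2_NTSC_HZ):
    """Bitrate of an external serial clock whose period is a whole number of PHI2 cycles."""
    if phi2_cycles_per_bit < 3:
        # Pokey needs at least 3 PHI2 cycles to shift one bit
        raise ValueError("Pokey cannot shift with fewer than 3 PHI2 cycles per bit")
    return phi2_hz / phi2_cycles_per_bit


for n in (3, 4):
    print(f"PHI2/{n}: {sync_bitrate_hz(n) / 1000:.1f} kHz")
```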
-
CLOCK IN is bidirectional, and the clock can be used for receiving data (to the computer) or for transmitting data (from the computer). CLOCK OUT always outputs the transmit clock. You might argue that this is a bit of over engineering. But this allows simultaneous full duplex communication with different receive and transmit bitrates. I guess this feature was never used until now. I'm actually using it, although not exactly for that purpose, but for determining the phase alignment of the clocks.
-
That's a brilliant idea that, honestly, I hadn't thought of. But I would need to check carefully whether it wouldn't be too tight. Yes, just 3 cycles per frame, but they happen all together, almost one right after the other. In any case, it might still be a good idea to use a "fast" timeout when waiting for the checksum. Fast in the sense of detecting it as soon as possible. This is just for the sake of efficiently retrying a packet in case Pokey missed a clock pulse. Timeouts at other points should normally not happen, so it is perfectly ok to have a long timeout.

No. You technically can have DLIs with ANTIC DMA disabled. But that won't be very useful here. What he is suggesting is not to disable ANTIC DMA, but instead to use a minimal display list, without any actual display, that would just trigger a DLI after so many frames.
-
Timeout management

Handling timeouts at these frequencies is not trivial. Interrupts must be disabled, of course, and at the highest frequencies we can't even afford to spend any cycles checking for a timeout.

There are some timeouts that should never happen under normal conditions, and they are the most difficult to deal with. Once the transfer has started, unless there is a hardware malfunction, no timeout should happen. Personally, I don't care too much about detecting a timeout if I accidentally disconnected the SIO cable or turned off the drive. But I realize that a "professional" commercial product might not want to lock up and would prefer to issue a timeout error, even in those conditions that normally don't happen.

The way to detect a timeout without adding any extra cycles to the main loop is simply to completely unroll the inner, shorter, loop that waits for the incoming byte. Since the loop is already quite tight and doesn't have too many idle cycles, the number of iterations to unroll is very small. Of course, this wouldn't be effective at slower frequencies, where we would need to unroll too many iterations. But then at slower frequencies we could afford a more "conventional" timeout strategy, like decrementing a counter.

This strategy of unrolling the loop doesn't work for the first byte, which could take much more time. So for the first byte we must time out using another method. I used a Pokey timer, but that's not the only way.

When waiting for the first byte I don't wait until the byte has arrived completely. If we did that, we might risk entering the main loop a bit too late. So instead I wait for the first byte just starting to arrive. Fortunately Pokey has a special flag for that in SKSTAT. It is signaled as soon as the first start bit is detected. This way we can start the main loop optimally even before the first byte has completely arrived:

;
; First byte timeout checking SKSTAT bit 1
;
        ldx #$20
        sta STIMER      ; Reset timer
        lda #$02
        sta IRQEN       ; Enable timer #2 IRQ
:wait0
        bit SKSTAT      ; 4   Pokey receiver busy?
        beq :start      ; 2/3 start bit detected by Pokey
        bit IRQST       ; 4
        bne :wait0      ; 2/3 No timer #2 IRQ
        beq :timeOut    ; More than a full byte time elapsed
:start
        stx IRQEN       ; Enable rcv IRQ, disable timer IRQ
        txa             ; $20
        bne :nextByte   ; Unconditional

;
; "Fast" timeout, doesn't add any extra cycles to the loop
;
:waitByte
        and IRQST       ; 4
        beq :gotByte    ; 2/3
        and IRQST       ; 4
        beq :gotByte    ; 2/3
        bne :timeOut    ; More than 1 byte time elapsed
:nextByte
        and IRQST       ; 4
        bne :waitByte   ; 2/3
        ; Fall through without taking the branch,
        ; the fastest path in case we were a bit late
:gotByte
        ...
;
; Might use different timeout when receiving checksum
;
        ...
:timeOut

Again, some implementations might check for a timeout just at the first byte, *AND* at the last byte. Even without a hardware malfunction, a timeout might occur when receiving the checksum. This could happen if Pokey missed a serial clock pulse. It is possible to implement a special strategy at the sender to avoid a timeout even if the computer missed a byte. I.e., the peripheral might send an extra byte or two.
-
Good to know. But again, note that even if it's 5V tolerant, it might not tolerate back current. These are two different things because it needs protection at the output pads, and not just at the inputs. You need to drive the output data signal (SIO DATA IN) as well. It can't be driven by a simple UART. Last thing I forgot. In some cases the computer outputs are very glitchy. That could be a problem for measuring the clock phases. In my case the MCU I used has digital glitch filters that solved the issue.
-
Hi, I'm not very familiar with the Fujinet hardware, nor with the ESP32, but yes, it seems it should work. I understand that the ESP32 has a full GPIO matrix, so that you can drive any pin with the PWM compare timers, right? Because you need to drive the SIO signals accurately, and you can't use a UART. The PWM should allow you to generate arbitrary waveforms, which I understand it does. The only problem I see is the 5V tolerance issue. I understand that the ESP32 is not 5V tolerant. But I guess you have already been using it like that, directly without a level shifter, and the chip didn't blow up. But note that here it is a bit more problematic. Since we are driving these signals actively, not open drain, you are actually injecting back power from the 5V pullups on the computer into the ESP32 (3V) power. The pullups are rather weak, so the current should be low. But I can't say if it is ok or not for the ESP32. Or did you always use push pull drivers?
-
The SIO Clock In signal is bidirectional. That's why the Pokey pin is named "BCLK". There is no hardware handshake. It is an output in async mode, and it is an input when you configure Pokey in synchronous mode. Normally there is no conflict because the signal is, or is supposed to be, open drain. But yes, we must drive it actively at the higher frequencies because the resistor pull up is far too weak. So you should be careful just in case there is a software malfunction. The firmware at the drive side checks that the signal is not being driven before enabling the output, and puts it back to open drain when it is not needed. A current limiting resistor might be recommended as well. Or if using a CPLD, it is possible to drive it high actively for a couple of cycles only. And then when the signal should be already high, tristate it and let the pull up maintain the voltage. It is even possible to perform the same procedure by software with those MCUs that have a separate programmable I/O processor.
-
Very interesting, I had no idea that SuperSalt tests synchronous mode with an external clock. Indeed, it looks like Atari was aware that it could produce errors. Although passing with just one correct byte out of 15 is a bit too much! LOL
-
I was looking at an old schematic. I see now that you are using 74LS07 buffers. That won't work. You need a bidirectional buffer for the clock signal, and you need buffers that are not permanently open drain.
-
Sure. I didn't know the FujiNet connects the SIO clock signals already. Nice! Make sure the SIO clock signals are connected to the MCU hardware compare/capture timer. You also need to be able to drive the SIO actively, not with open drain. I don't see any voltage level shifter in the schematics? Anyway, feel free to contact me by PM.
-
Wow, that's a nice Nucleo! Yep, I think it is good enough. LOL.
-
Sorry for the delay. Busy with real life ...

I mentioned already in the post with the source that I was not posting the timeout management because it is a bit long, complicated, and there are several options. I'll post the timeout code later.

The initial "start sync mode" command frame is: $31, 'y', 0, 0, checksum

As you are saying, there are lots of new options that can be implemented. I just implemented the very basics of the protocol to be able to run the tests.

I'm not very familiar with those drives or the TOMS firmware, but no. As I understand it, the TOMS uses the clock signal just to autodetect the command frame bitrate. It doesn't drive the clock and put Pokey in synchronous mode.
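For anybody wanting to build that command frame on the peripheral side, here is a small sketch of the five bytes. It assumes the frame uses the standard SIO checksum (an 8-bit sum with the carry added back in), which the post above doesn't spell out; the function name is mine.

```python
def sio_checksum(data):
    """8-bit sum with end-around carry, as used by standard SIO frames."""
    total = 0
    for value in data:
        total += value
        if total > 0xFF:
            total = (total & 0xFF) + 1  # add the carry back into the low byte
    return total


# Device $31, command 'y', two zero aux bytes (values taken from the post)
frame = bytes([0x31, ord("y"), 0x00, 0x00])
full_frame = frame + bytes([sio_checksum(frame)])
print(full_frame.hex(" "))  # 31 79 00 00 aa
```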
-
Hi E474,

Great board, isn't it? Insanely powerful for that price!

I'm not using any other components at all, and I am indeed connecting the pins directly to the SIO connector. In theory some kind of buffering might be recommended because these chips aren't really 5V tolerant. They are 5V tolerant only as long as the chip is powered. That means that the chip might be damaged if you power up the Atari but not the board. In this case, actually, the signals are not directly connected to 5V at the Atari, but only through rather weak pullup resistors. So the risk might be low. Also, the buffering gets a bit complicated because at least the output clock must be tristated. If you also consider that the board is so cheap that it is no big deal if you need to replace it, then the whole idea of using buffering might even be silly.

Btw, I accidentally erased the original flash. No big deal, but I would like to get it back, if possible. Do you still have the original factory flash that powers up as an SD card reader? If so, exactly which version of the board do you have?
-
It doesn't exactly bit-bang. But it does have accurate control of the I/O port. To be more precise, the processor writes to a counter that toggles the output at any exact cycle you want. So you actually get cycle accuracy without the need to run cycle accurate code in assembler.

So yes, it is perfectly possible to use two stop bits. Actually, it is possible to make any bit arbitrarily longer and, as a matter of fact, I am making the start bit longer because Pokey needs that. Remember this is synchronous mode, the bit time is determined by the serial clock pulses. So you can make any bit longer or shorter by adjusting the corresponding serial clock pulse.

Once again, there is no need here. Not as long as we use the fast receiver loop that checks IRQST every other byte. But somebody might prefer a more traditional implementation. I used the "slower" loop for a lower bitrate of 4 PHI2 cycles per bit. That would mean ~447 KHz. I didn't try something in the middle, between 3 and 4 PHI2 cycles per bit. But that is perfectly possible to implement, and in that case a longer bit might be a good solution.

Not exactly. That bit has a similar, but not identical, behavior to the IRQST status bit. And I did consider using it at some point. The problem is that this bit is too volatile. The IRQST bit means that a byte has been completed and is ready in SERIN. The SKSTAT bit means that a byte has been completed, AND another byte hasn't started yet. It toggles only during the stop bit, and it toggles back as soon as the next start bit is received. At our fastest bitrate that means that we have only 3 cycles to catch that bit. That is impossible because, even without ram refresh cycles, you can only test SKSTAT every 5 cycles in the best case.
-
Yep, mentioned that above already. I didn't miss this one, but don't worry, I made lots of worse mistakes than that.

Nice idea. I was expecting you'd come up with some optimization.

No, I'm not running from page zero, but that could be done if needed. I think there is no need here. As you both are saying, it would need some reshuffling to take any advantage of the optimization. But I think that routine is optimized enough already. Certainly a good idea anyway.

No, that won't work. The accumulator must have $20 at the top of the loop to check the IRQST status bit.

Not exactly. The loop can take slightly more than 57 cycles because you don't get 9 refresh cycles on every iteration. Let's forget for a moment that we have a loop that processes two bytes and let's assume a simpler, one byte per iteration loop. If the first byte arrives in cycle 0, you don't need to read the next byte in cycle 33. As long as you read it before cycle 66, that is good enough. So if you read the first byte, say, in cycle 1, you have 64 cycles to read the next one. Of course, if you take more than 33 cycles on every iteration, you'll eventually be too late. But as long as you can compensate longer iterations with shorter ones, that's fine.
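The compensation argument can be checked with a few lines of Python. This is a simplified model under my own assumptions: one byte per iteration, byte n arriving at cycle n * 33, and each byte staying readable in SERIN until the next one arrives. The function name and the sample iteration lengths are made up for the illustration.

```python
BYTE_TIME = 33  # PHI2 cycles per byte at the fastest bitrate


def reads_stay_in_window(iteration_cycles, first_read=1):
    """True if every read happens after its byte arrived and before the next byte replaces it."""
    t = first_read  # cycle at which byte 0 was read
    for n, cycles in enumerate(iteration_cycles, start=1):
        t += cycles                       # cycle at which byte n is read
        arrived = n * BYTE_TIME           # byte n is complete here
        replaced = (n + 1) * BYTE_TIME    # byte n+1 overwrites it here
        if not (arrived <= t < replaced):
            return False
    return True


print(reads_stay_in_window([40, 28, 40, 28]))  # long iterations compensated by short ones
print(reads_stay_in_window([40] * 6))          # always too slow: eventually a byte is lost
```

The first call returns True and the second returns False, because the constant 7 cycle overrun accumulates until a read falls past the next byte time.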
-
I don't have fully developed production code. This is more of a proof of concept. You can say that the current code is actually a disorganized collection of routines designed for testing and debugging.

No, the command frame can't be sent at the same speed because Pokey output signals are open drain with rather weak pullups, they can't toggle that fast. I am using a "safe" ~64 KHz bitrate for transmitting the command frame. The maximum frequency depends on the presence or not of the "infamous" capacitors on the SIO signals.

It is possible to perform some extra optimization here. As I said already, I'm not sending the 'Ack' and 'Complete' bytes. But it is also possible to transmit a smaller command frame of just one or two bytes. Not sure it is really worth it, but it is definitely possible.

The computer does send an initial "start sync mode" command. This command is sent using the standard SIO bitrate.
-
Yes, of course. But that would take extra time.
-
The Atari Software

The software at the Atari side is mostly very simple, almost trivial. The only exception is the loop that actually receives the packet, which must be very efficient at the higher frequencies. At the higher frequencies all interrupts, and even Antic DMA, must be disabled.

At the maximum speed we have only 33 cycles per byte in the best case. Even with interrupts and DMA disabled it is still difficult to process a byte, including updating the checksum, in less than 33 cycles. The best code I came up with using a "traditional" polling approach takes 36 cycles in the best case. I omitted the timeout check code here. It is a bit complicated, but it doesn't add more cycles, it only adds more bytes:

; Traditional polling approach
; 36 cycles per iteration in the best case
        ldx #0
:nextByte
        lda #$20        ; 2
:waitByte
        bit IRQST       ; 4
        bne :waitByte   ; 2/3
        stx IRQEN       ; 4 Reset IRQ bit
        sta IRQEN       ; 4 Re enable interrupt
        lda SERIN       ; 4
        sta secBuf,Y    ; 5 modified at run time before the loop
        adc chksum      ; 3
        sta chksum      ; 3
        iny             ; 2
        bne :nextByte   ; 3

There are a couple of variants that would take the same number of cycles. It is also possible to unroll the loop and save a couple of cycles. Still not good enough when you consider that ram refresh can steal up to 9 cycles. I'm not very familiar with the 6502 undocumented instructions. But at first glance it doesn't seem they could help here, at least not enough, and I didn't want to use them here anyway. There are some extremely talented coders here, maybe somebody can come up with a faster loop?

Of course it is possible to compute the checksum separately after the loop. But we don't really gain anything. The extra time to compute the checksum would be about the same as just using a slower bitrate and computing the checksum on the fly. The goal for maximum efficiency is to update the checksum inside the loop ...

It is not difficult to note that just resetting the IRQ takes 8 precious cycles. Unfortunately we must still do this, even with interrupts disabled and using polling, because Pokey doesn't have a separate status bit to check if a byte is available in SERIN (there is no such bit in SKSTAT). Furthermore, this bit doesn't auto reset. Most UART chips automatically reset the corresponding status bit when the CPU reads the receive buffer (SERIN) and/or the status byte itself. Here we must do it "manually".

But do we really need to reset the interrupt? On every byte? Well, not necessarily. We know almost exactly how many cycles each byte takes, 33 cycles in this case. This can be negotiated as part of the protocol if needed. Even when we consider the jitter introduced by ram refresh, we still have a precise enough estimate of the cycle at which the next byte will be ready. So we just check and reset the status bit every other byte. We can read the next byte blindly from SERIN at the right time, so that we are never too early nor too late. The saving is huge:

; As fast as 56 cycles per loop iteration
; That's 28 cycles per byte !
        lda #$20
:nextByte
:waitByte
        and IRQST       ; 4
        bne :waitByte   ; 2/3
        sta IRQEN       ; 4 Reset IRQ bit
        lda SERIN       ; 4
        sta secBuf,Y    ; 5 modified at run time before the loop
        adc prevByte    ; 3 Add previous byte
        adc chksum      ; 3
        sta chksum      ; 3
        iny             ; 2

; Second byte
; Dummy delay to make sure we don't read SERIN too early
        ora chksum      ; 3
        lda SERIN       ; 4
        sta secBuf,Y    ; 5
        sta prevByte    ; 3 Better to process checksum above
        lda #$20        ; 2
        sta IRQEN       ; 4
        iny             ; 2
        bne :nextByte   ; 3

Note that it is ok if one iteration takes a little bit more than two byte times because of ram refresh DMA. We would compensate on the other iteration. If all 9 ram refresh cycles fall into one iteration, then no ram refresh at all would happen on the next one, nor on the previous one.

We have extra time in the first part of the loop, otherwise we might read SERIN too early. That's why the first part updates the checksum for both bytes. Of course, the last byte must be added after the loop (not shown).

Btw, I said in the other thread that the total processing time was computed in real time, and you might think that it is not easy at all to do that with all interrupts disabled and without updating some counter. You are absolutely right. It is computed in real time, but not by the Atari. The ARM MCU at the other side starts a timer when receiving the first read sector command, and it stops when the computer sends a special command. The MCU replies with a packet containing the total elapsed time. I was lazy and I even send the timing in pure ASCII, because it is so much easier to do the binary to ASCII conversion at the powerful ARM side.
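As a sanity check on the cycle counts quoted in the two listings, this short Python tally just adds up the per-instruction cycles from the comments, taking the best case where the wait branch falls through immediately. The list layout and names are mine, only the instructions and cycle values come from the listings.

```python
# (instruction, cycles) in the best case: wait branch not taken, loop branch taken
POLLING_LOOP = [
    ("lda #$20", 2), ("bit IRQST", 4), ("bne :waitByte", 2),
    ("stx IRQEN", 4), ("sta IRQEN", 4), ("lda SERIN", 4),
    ("sta secBuf,Y", 5), ("adc chksum", 3), ("sta chksum", 3),
    ("iny", 2), ("bne :nextByte", 3),
]

FAST_LOOP = [
    # first byte: poll, reset the IRQ bit, update checksum for two bytes
    ("and IRQST", 4), ("bne :waitByte", 2), ("sta IRQEN", 4),
    ("lda SERIN", 4), ("sta secBuf,Y", 5), ("adc prevByte", 3),
    ("adc chksum", 3), ("sta chksum", 3), ("iny", 2),
    # second byte: read blindly, no polling
    ("ora chksum", 3), ("lda SERIN", 4), ("sta secBuf,Y", 5),
    ("sta prevByte", 3), ("lda #$20", 2), ("sta IRQEN", 4),
    ("iny", 2), ("bne :nextByte", 3),
]


def total_cycles(listing):
    return sum(cycles for _, cycles in listing)


print(total_cycles(POLLING_LOOP))    # cycles per byte, traditional loop
print(total_cycles(FAST_LOOP))       # cycles per two bytes, fast loop
print(total_cycles(FAST_LOOP) / 2)   # cycles per byte, fast loop
```

The totals come out at 36 for the traditional loop and 56 (28 per byte) for the fast one, matching the figures in the listing headers.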
-
Using a hardware flip flop inside the computer is a good solution in other similar cases, but here it is not trivial to implement. This serial clock signal is bidirectional, it is sometimes driven by Pokey when using normal SIO async mode, and it is open drain. You can't simply put a unidirectional buffer or flip flop at the SIO port. Anyway, the goal is to avoid any hardware modification.
-
Aligning the serial clock

As noted previously, Pokey can sometimes miss a serial clock pulse. One way to deal with this is simply to implement a reasonable detect and retry strategy which, considering the error rate, would not affect performance too significantly.

If we actually want to avoid the errors, we need to align the serial clock with Pokey's system clock, PHI2. Unfortunately PHI2 is not available at the SIO port. But all the signal outputs produced by Pokey are synchronized with PHI2. It is then still possible to align the clocks to some extent, even without extra hardware.

For this purpose we need Pokey to output a signal at a high enough frequency. We used the output clock signal, the one shown in the traces in a previous post. We measure the delay between our serial clock output and Pokey's output clock. Most MCUs have powerful hardware timers that can measure the delay between two signals rather precisely. From this delay we can compute the current phase alignment of both clocks and adjust ours accordingly.

This clock alignment won't be as precise as using a hardware PLL, but it doesn't have to be. All we need is to avoid a small dangerous phase alignment range that might provoke the error. This adjustment must be performed as frequently as possible. Ideally on every serial clock cycle. If this is implemented in software, it might require a rather fast processor at the higher frequencies.

At the higher frequencies there is one problem, however. As noted, the higher frequencies require the signal to use push pull drivers. It can't be open drain. This is not a big problem for the signals we output, as this is something we can control. But we can't control the Pokey outputs. Pokey outputs are always open drain. This puts a limit on the frequency of the output clock, somewhere below 100 KHz. When our clock signal toggles at close to 600 KHz, it means that we can only align the serial clock every few cycles.

At the highest frequency, PHI2/3, it becomes pretty challenging to align the clocks. And not only because of the higher ratio between both clocks caused by the open drain limitation. To align our clock we temporarily increase or decrease the frequency slightly. But at the maximum bitrate, if we increase the frequency even more, there is a danger that Pokey will need to process a bit in only two PHI2 cycles, which might fail to shift correctly. That means that we cannot increase the frequency blindly at any arbitrary point. We can only do it safely at the start bit, which is already twice as long at this high frequency.

An additional problem is that these signals, depending on the case, might be rather glitchy. You can even see some glitches on the traces I posted earlier. For measuring the output clock delay some kind of glitch filtering might be needed.

We used the output clock, but it is also possible to use the data signal SIO_OUT. The two-tone mode is very handy to create a constant clock on this signal.
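The phase computation itself is just a modulo operation on the measured delay. A rough sketch in Python, under several assumptions of mine: an NTSC PHI2 of about 1.789773 MHz, a delay already measured in nanoseconds by the MCU timer, and a danger window whose bounds are pure placeholders (the real range would have to be determined by testing). The function names are hypothetical too.

```python
PHI2_HZ = 1_789_773                 # assumption: NTSC system clock
PHI2_PERIOD_NS = 1e9 / PHI2_HZ      # roughly 558.7 ns


def phase_ns(delay_ns):
    """Position of our clock edge inside a PHI2 cycle, derived from the measured delay."""
    return delay_ns % PHI2_PERIOD_NS


def needs_adjustment(delay_ns, danger_start_ns, danger_end_ns):
    """True if the current phase falls inside the (placeholder) dangerous range."""
    return danger_start_ns <= phase_ns(delay_ns) < danger_end_ns


# Example: a delay of three full PHI2 cycles plus 100 ns is a phase of about 100 ns
delay = 3 * PHI2_PERIOD_NS + 100
print(round(phase_ns(delay), 3))
print(needs_adjustment(delay, 50, 150))  # window bounds are made up for the example
```

When the check returns True, the sender would stretch or shorten one serial clock pulse slightly to move out of the window, and at PHI2/3 only the start bit is a safe place to shorten anything.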
