



ijor last won the day on July 31 2011

ijor had the most liked content!

Community Reputation

706 Excellent

About ijor

  • Rank
    River Patroller


12,020 profile views
  1. Hi E474, Great board, isn't it? Insanely powerful for that price! I'm not using any other components at all; I am connecting the pins directly to the SIO connector. In theory some kind of buffering might be recommended because these chips aren't really 5V tolerant. They are 5V tolerant only as long as the chip is powered. That means that the chip might be damaged if you power up the Atari but not the board. In this case, actually, the signals are not directly connected to 5V at the Atari, but only through rather weak pullup resistors, so the risk might be low. Buffering also gets a bit complicated because at least the output clock must be tristated. If you also consider that the board is so cheap that it is no big deal if you need to replace it, then the whole idea of using buffering might even be silly. Btw, I accidentally erased the original flash. No big deal, but I would like to get it back, if possible. Do you still have the original factory flash that powers up as an SD card reader? If so, exactly which version of the board do you have?
  2. It doesn't exactly bit-bang, but it does have accurate control of the I/O port. To be more precise, the processor writes to a counter that toggles the output at any exact cycle you want. So you actually get cycle accuracy without the need to run cycle-accurate code in assembler. So yes, it is perfectly possible to use two stop bits. Actually, it is possible to make any bit arbitrarily longer and, as a matter of fact, I am making the start bit longer because Pokey needs that. Remember this is synchronous mode: the bit time is determined by the serial clock pulses. So you can make any bit longer or shorter by adjusting the corresponding serial clock pulse. Once again, there is no need here, not as long as we use the fast receiver loop that checks IRQST every other byte. But somebody might prefer a more traditional implementation. I used the "slower" loop for a lower bitrate of 4 PHI2 cycles per bit. That would mean ~447 KHz. I didn't try something in the middle, between 3 and 4 PHI2 cycles per bit. But that is perfectly possible to implement, and in that case a longer bit might be a good solution. Not exactly. That bit has a similar, but not identical, behavior to the IRQST status bit, and I did consider using it at some point. The problem is that this bit is too volatile. The IRQST bit means that a byte has been completed and is ready in SERIN. The SKSTAT bit means that a byte has been completed AND another bit hasn't started yet. It toggles only during the stop bit, and it toggles back as soon as the next start bit is received. At our fastest bitrate that means that we have only 3 cycles to catch that bit. That is impossible because, even without ram refresh cycles, you can only test SKSTAT every 5 cycles in the best case.
  3. Yep, mentioned that above already. I didn't miss this one, but don't worry, I made lots of worse mistakes than that. Nice idea. I was expecting you'd come up with some optimization. No, I'm not running from page zero, but that could be done if needed. I think there is no need here. As you both are saying, it would need some reshuffling to take any advantage of the optimization. But I think that routine is optimized enough already. Certainly a good idea anyway. No, that won't work. The accumulator must have $20 at the top of the loop to check the IRQST status bit. Not exactly. The loop can take slightly more than 57 cycles because you don't get 9 refresh cycles on every iteration. Let's forget for a moment that we have a loop that processes two bytes and let's assume a simpler loop that processes one byte per iteration. If the first byte arrives in cycle 0, you don't need to read the next byte in cycle 33. As long as you read it before cycle 66, that is good enough. So if you read the first byte in, say, cycle 1, you have 64 cycles to read the next one. Of course, if you take more than 33 cycles on every iteration, you'll eventually be too late. But as long as you can compensate longer iterations with shorter ones, that's fine.
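The compensation argument above can be checked with a small model. This is only a sketch under stated assumptions: bytes complete exactly every 33 cycles, a byte stays readable in SERIN until the next one completes, and the function name `reads_in_time` and the sample iteration lengths are made up for illustration, not taken from the actual loader.

```python
BYTE_TIME = 33  # PHI2 cycles between completed bytes at the maximum bitrate


def reads_in_time(first_read, iteration_cycles, byte_time=BYTE_TIME):
    """Return True if every read lands inside its byte's validity window.

    Assumption: byte k completes at cycle k * byte_time and is overwritten
    when byte k + 1 completes, so read k must satisfy
    k * byte_time <= t < (k + 1) * byte_time.
    """
    t = first_read
    for k in range(len(iteration_cycles) + 1):
        if not (k * byte_time <= t < (k + 1) * byte_time):
            return False
        if k < len(iteration_cycles):
            t += iteration_cycles[k]  # time spent before the next read
    return True


# One long iteration is fine if it still lands before cycle 66.
print(reads_in_time(1, [64]))
# Long and short iterations that average 33 cycles never drift.
print(reads_in_time(1, [42, 24] * 20))
# Always taking 34 cycles drifts by one cycle per byte and eventually fails.
print(reads_in_time(1, [34] * 40))
```

The model ignores where exactly the refresh cycles fall; it only captures the point that the average iteration time matters, not each individual one.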
  4. I don't have fully developed production code. This is more of a proof of concept. You could say that the current code is actually a disorganized collection of routines designed for testing and debugging. No, the command frame can't be sent at the same speed because Pokey output signals are open drain with rather weak pullups; they can't toggle that fast. I am using a "safe" ~64 KHz bitrate for transmitting the command frame. The maximum frequency depends on the presence or absence of the "infamous" capacitors on the SIO signals. It is possible to perform some extra optimization here. As I said already, I'm not sending the 'Ack' and 'Complete' bytes. But it is also possible to transmit a smaller command frame of just one or two bytes. Not sure it is really worth it, but it is definitely possible. The computer does send an initial "start sync mode" command. This command is sent using the standard SIO bitrate.
  5. Yes, of course. But that would take extra time.
  6. The Atari Software

     The software at the Atari side is mostly very simple, almost trivial. The only exception is the loop that actually receives the packet, which must be very efficient at the higher frequencies. At the higher frequencies all interrupts, and even Antic DMA, must be disabled.

     At the maximum speed we have only 33 cycles per byte in the best case. Even with interrupts and DMA disabled it is still difficult to process a byte, including updating the checksum, in less than 33 cycles. The best code I came up with using a "traditional" polling approach takes 36 cycles in the best case. I omitted the timeout check code here. It is a bit complicated, but it doesn't add more cycles, it only adds more bytes:

         ; Traditional polling approach
         ; 36 cycles per iteration in the best case
             ldx #0
         :nextByte
             lda #$20        ; 2
         :waitByte
             bit IRQST       ; 4
             bne :waitByte   ; 2/3
             stx IRQEN       ; 4 Reset IRQ bit
             sta IRQEN       ; 4 Re enable interrupt
             lda SERIN       ; 4
             sta secBuf,Y    ; 5 modified at run time before the loop
             adc chksum      ; 3
             sta chksum      ; 3
             iny             ; 2
             bne :nextByte   ; 3

     There are a couple of variants that would take the same number of cycles. It is also possible to unroll the loop and save a couple of cycles. That is still not good enough when you consider that ram refresh can steal up to 9 cycles. I'm not very familiar with the 6502 undocumented instructions. But at first glance it doesn't seem they could help here, at least not enough, and I didn't want to use them here anyway. There are some extremely talented coders here; maybe somebody can come up with a faster loop?

     Of course it is possible to compute the checksum separately after the loop. But we don't really gain anything. The extra time to compute the checksum would be about the same as just using a slower bitrate and computing the checksum on the fly. The goal for maximum efficiency is to update the checksum inside the loop ...

     It is not difficult to note that just resetting the IRQ takes 8 precious cycles.
     Unfortunately we must still do this, even with interrupts disabled and using polling, because Pokey doesn't have a separate status bit to check if a byte is available in SERIN (there is no such bit in SKSTAT). Furthermore, this bit doesn't auto reset. Most UART chips automatically reset the corresponding status bit when the CPU reads the receive buffer (SERIN) and/or the status byte itself. Here we must do it "manually".

     But do we really need to reset the interrupt? On every byte? Well, not necessarily. We know almost exactly how many cycles each byte takes, 33 cycles in this case. This can be negotiated as part of the protocol if needed. Even when we consider the jitter introduced by ram refresh, we still have a precise enough estimate of the cycle at which the next byte will be ready. So we just check and reset the status bit every other byte. We can read the next byte blindly from SERIN at the right time, neither too early nor too late. The saving is huge:

         ; As fast as 56 cycles per loop iteration
         ; That's 28 cycles per byte !
             lda #$20
         :nextByte
         :waitByte
             and IRQST       ; 4
             bne :waitByte   ; 2/3
             sta IRQEN       ; 4 Reset IRQ bit
             lda SERIN       ; 4
             sta secBuf,Y    ; 5 modified at run time before the loop
             adc prevByte    ; 3 Add previous byte
             adc chksum      ; 3
             sta chksum      ; 3
             iny             ; 2
         ; Second byte
         ; Dummy delay to make sure we don't read SERIN too early
             ora chksum      ; 3
             lda SERIN       ; 4
             sta secBuf,Y    ; 5
             sta prevByte    ; 3 Better to process checksum above
             lda #$20        ; 2
             sta IRQEN       ; 4
             iny             ; 2
             bne :nextByte   ; 3

     Note that it is ok if one iteration takes a little bit more than two byte times because of ram refresh DMA. We would compensate on the other iteration. If all the 9 ram refresh cycles fall into one iteration, then no ram refresh at all would happen on the next one, nor on the previous one. We have extra time in the first part of the loop, otherwise we might read SERIN too early. That's why the first part updates the checksum for both bytes.
     Of course, the last byte must be added after the loop (not shown).

     Btw, I said in the other thread that the total processing time was computed in real time, and you might think that it is not easy at all to do that with all interrupts disabled and without updating some counter. You are absolutely right. It is computed in real time, but not by the Atari. The ARM MCU at the other side starts a timer when receiving the first read sector command, and it stops when the computer sends a special command. The MCU replies with a packet containing the total elapsed time. I was lazy and I even send the timing in pure ASCII, because it is so much easier to do the binary to ASCII conversion at the powerful ARM side.
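As a quick cross-check of the cycle budgets quoted in this post, the per-instruction counts from the comments in the two listings can simply be added up. This is just a tally in Python, not part of the loader; the list names are made up for illustration, and the counts assume the best case where the wait loop falls through on its first test.

```python
# Cycle counts copied from the comments in the two assembly listings.
TRADITIONAL = [
    ("lda #$20", 2), ("bit IRQST", 4), ("bne :waitByte", 2),
    ("stx IRQEN", 4), ("sta IRQEN", 4), ("lda SERIN", 4),
    ("sta secBuf,Y", 5), ("adc chksum", 3), ("sta chksum", 3),
    ("iny", 2), ("bne :nextByte", 3),
]
FAST_FIRST_BYTE = [
    ("and IRQST", 4), ("bne :waitByte", 2), ("sta IRQEN", 4),
    ("lda SERIN", 4), ("sta secBuf,Y", 5), ("adc prevByte", 3),
    ("adc chksum", 3), ("sta chksum", 3), ("iny", 2),
]
FAST_SECOND_BYTE = [
    ("ora chksum", 3), ("lda SERIN", 4), ("sta secBuf,Y", 5),
    ("sta prevByte", 3), ("lda #$20", 2), ("sta IRQEN", 4),
    ("iny", 2), ("bne :nextByte", 3),
]

BYTE_TIME = 33      # PHI2 cycles per incoming byte at the maximum bitrate
REFRESH_WORST = 9   # ram refresh cycles that can fall into one iteration


def total(listing):
    """Sum the cycle counts of a listing."""
    return sum(cycles for _, cycles in listing)


traditional = total(TRADITIONAL)
fast = total(FAST_FIRST_BYTE) + total(FAST_SECOND_BYTE)

print("traditional loop:", traditional, "cycles per byte")
print("fast loop:", fast, "cycles per two bytes,", fast / 2, "per byte")
print("fast loop plus worst-case refresh:", fast + REFRESH_WORST,
      "of", 2 * BYTE_TIME, "available")
```

The traditional loop is already over the 33-cycle budget before any refresh cycles are counted, while the two-byte loop still fits inside two byte times even when all 9 refresh cycles land in the same iteration.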
  7. Using a hardware flip flop inside the computer is a good solution in other similar cases, but here it is not trivial to implement. This serial clock signal is bidirectional (it is sometimes driven by Pokey when using normal SIO async mode) and it is open drain. You can't simply put a unidirectional buffer or flip flop at the SIO port. Anyway, the goal is to avoid any hardware modification.
  8. Aligning the serial clock

     As noted previously, Pokey can sometimes miss a serial clock pulse. One way to deal with this is simply to implement a reasonable detect-and-retry strategy which, considering the error rate, would not affect performance too significantly. If we actually want to avoid the errors, we need to align the serial clock with Pokey's system clock, PHI2. Unfortunately PHI2 is not available at the SIO port. But all the signal outputs produced by Pokey are indeed synchronized with PHI2. It is then still possible to align the clocks to some extent, even without extra hardware.

     For this purpose we need Pokey to output a signal at a high enough frequency. We used the output clock signal, the one shown in the traces in a previous post. We measure the delay between our serial clock output and Pokey's output clock. Most MCUs have powerful hardware timers that can measure the delay between two signals rather precisely. From this delay we can compute the current phase alignment of both clocks and adjust ours accordingly. This clock alignment won't be as precise as using a hardware PLL, but it doesn't have to be. All we need is to avoid a small dangerous phase alignment range that might provoke the error. This adjustment must be performed as frequently as possible, ideally on every serial clock cycle. If this is implemented in software, it might require a rather fast processor at the higher frequencies.

     At the higher frequencies there is one problem, however. As noted, the higher frequencies require the signals to use push pull drivers. They can't be open drain. This is not a big problem for the signals we output, as this is something we can control. But we can't control the Pokey outputs. Pokey outputs are always open drain. This puts a limit on the frequency of the output clock, somewhere below 100 KHz. When our clock signal toggles at close to 600 KHz, it means that we can only align the serial clock every few cycles.
     At the highest frequency, PHI2/3, it becomes pretty challenging to align the clocks, and not only because of the higher ratio between both clocks caused by the open drain limitation. To align our clock we temporarily increase or decrease the frequency slightly. But at the maximum bitrate, if we increase the frequency even more, there is a danger that Pokey will need to process a bit in only two PHI2 cycles, which might fail to shift correctly. That means that we cannot increase the frequency blindly at any arbitrary point. We can only do it safely at the start bit, which is already twice as long at this high frequency.

     An additional problem is that these signals, depending on the case, might be rather glitchy. You can even see some glitches on the traces I posted earlier. For measuring the output clock delay some kind of glitch filtering might be needed. We used the output clock, but it is also possible to use the data signal SIO_OUT. The two-tone mode is very handy to create a constant clock on this signal.
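The phase computation described above can be sketched as follows. This is a simplified model, not the actual MCU firmware: the danger window limits, the guard margin and the function names are all hypothetical placeholders (the real window has to be found by measurement), the NTSC clock value is an assumption, and the measured delay is taken as "our edge time minus Pokey's edge time" so that lengthening our clock period increases it.

```python
PHI2_HZ = 1_789_772.5            # NTSC system clock (assumption); PAL differs
PHI2_PERIOD_NS = 1e9 / PHI2_HZ   # roughly 558.7 ns

# Hypothetical danger window, as fractions of one PHI2 period.
DANGER_START = 0.40
DANGER_END = 0.60
GUARD = 0.05                     # hypothetical safety margin past the window


def phase(delay_ns, period_ns=PHI2_PERIOD_NS):
    """Phase of our clock edge relative to PHI2, as a fraction in [0, 1)."""
    return (delay_ns % period_ns) / period_ns


def correction_ns(delay_ns, period_ns=PHI2_PERIOD_NS):
    """Return how much to lengthen the next clock period (0.0 if safe).

    Only lengthening is modeled, because shortening a 3-cycle bit could
    leave Pokey with too little time to shift.
    """
    p = phase(delay_ns, period_ns)
    if DANGER_START <= p < DANGER_END:
        return (DANGER_END + GUARD - p) * period_ns
    return 0.0


measured = 0.5 * PHI2_PERIOD_NS          # an edge right inside the window
stretch = correction_ns(measured)
print(f"stretch next period by {stretch:.1f} ns,",
      f"new phase {phase(measured + stretch):.2f}")
```

In a real implementation the delay would come from a hardware timer capture after glitch filtering, and the correction would only be applied where it is safe, such as the longer start bit.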
  9. One of the most annoying, yet interesting, issues of synchronous mode is that Pokey might miss serial clock pulses. This seems to be a defect of the edge detector. It doesn't seem to have any relation to the clock frequency. It happens even at the standard SIO 19.2 KHz frequency. The serial clock pulse might be missed when it reaches Pokey too close to the falling edge of the system clock. Note that what matters is the timing inside Pokey, which is quite different from what we can observe externally.

     This is a logic analyzer trace captured when Pokey was transmitting synchronously. It is easier to see the effect of the missing pulse when Pokey is transmitting. Pokey was running a program that was constantly transmitting bytes in increasing order. The trace shows Pokey transmitting hex $0F, $10 and $11. But the middle byte, instead of being $10, became $20. It also produced a frame error and the next byte ($11) started one cycle later. The high bit marked with the red marker number 0 is one cycle too late. It should have been one cycle earlier. The only way this could happen is if Pokey missed one of the previous serial clock pulses. Note that the external serial clock was at 19.2 KHz, showing that the issue affects low frequencies as well.

     This one above is Pokey receiving synchronously. We can't see Pokey's internal interpretation of the received data. But we can see when Pokey detected the end of the byte by monitoring the IRQ signal. All Pokey interrupts were disabled except the serial receive interrupt. So each pulse of the IRQ signal indicates that Pokey completed the reception of one byte. The trace shows 3 IRQ pulses marked with 3 markers, "0" in red, "1" in green and "2" in violet. There are ten serial clock pulses between marker "0" and marker "1" for the second byte. This is normal and expected because each byte has ten bits including the start and the stop bit, even in synchronous mode.
     But between markers "1" and "2", for the third byte, there are 11 cycles, not 10, which in this case can only happen, again, if Pokey missed a pulse of the external serial clock. You might note that the stop bits seem to be in a strange position. Those are actually data bits and not stop bits. Stop bits were sent intentionally with the wrong polarity (as 0 instead of 1). The idea was to produce a single pulse of the data signal, and locate it far from the start and stop bits. This way we can rule out that Pokey was missing the start bit data transition, because here there is no such transition. Of course, this would produce frame errors on every byte, but for the purpose of this test we didn't care.

     The last interesting thing to note in both traces is the pattern of the output clock signal. Pokey has two external clock signals. One is the bidirectional clock that is used for receiving; this is the clock that we drive from the peripheral in synchronous mode. The other is the output clock for transmitting. The transmit clock can be configured to replicate the incoming external clock. This is the configuration used in these traces. In this case Pokey buffers and synchronizes the external clock, then connects it to the clock output on one side, and also to an edge detector that feeds the transmitter logic. As you can see, the output clock doesn't show any missing pulse. It replicates the external pattern faithfully, which means that the serial clock pulse is not entirely missed by Pokey. It seems that only the edge detectors missed the serial clock pulse.
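The transmit trace described above ($10 arriving as $20, with a frame error and the next byte one clock late) is what a simple model predicts when the transmitter skips one clock pulse. The sketch below is only that model, not a description of Pokey's internals: it assumes LSB-first framing with one start and one stop bit, and the function names and the position of the missed pulse are hypothetical.

```python
def frame(byte):
    """Serial frame, LSB first: start bit (0), 8 data bits, stop bit (1)."""
    return [0] + [(byte >> i) & 1 for i in range(8)] + [1]


def observed_with_missed_pulse(byte, missed_at):
    """Line level seen on each external clock when one pulse is missed.

    If the transmitter fails to advance past frame position `missed_at`,
    that bit stays on the line for two clock periods instead of one.
    """
    bits = frame(byte)
    return bits[:missed_at + 1] + bits[missed_at:]


def decode(levels):
    """Decode the first 10 clocked levels as one frame.

    Returns (byte, frame_error); frame_error is True when the level in the
    stop bit position is 0.
    """
    byte = sum(bit << i for i, bit in enumerate(levels[1:9]))
    return byte, levels[9] != 1


levels = observed_with_missed_pulse(0x10, missed_at=2)
byte, frame_error = decode(levels)
print(f"received ${byte:02X}, frame error: {frame_error}, "
      f"clocks used: {len(levels)}")
```

In this model, a pulse missed anywhere before the single 1 bit of $10 shifts that bit one position up, which gives $20, leaves a 0 where the stop bit is expected, and makes the frame take 11 clocks instead of 10.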
  10. Ah, ok, I understand. Yes, that would probably be doable. Good, but not every Nucleo is good enough. At least not if you want to support all the features. And by that I mean clock aligning (no errors) at the maximum frequency (~597 KHz). I'm using a board like this that usually costs even less than a Nucleo: https://www.ebay.com/itm/Core407V-STM32F407VET6-STM32-Cortex-M4-Development-Board-Motherboard-Module-Kit/182289028741
  11. Not all of the SIO protocol is preserved. I'm not using A/C/E because they are a waste in this context, although this could be added if wanted. But yes, the SIO checksum is fully supported and it is computed on the fly while receiving the packet, which, believe me, is not an easy thing to do at this speed. I'll elaborate and post full sources in the other thread.
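For readers not familiar with it, the SIO checksum as it is commonly documented is an 8-bit sum in which any carry out of bit 7 is added back in. The sketch below shows that rule in Python for reference; the function name is made up, and it says nothing about how the Atari-side loader actually schedules the additions.

```python
def sio_checksum(data):
    """8-bit sum of all bytes with end-around carry."""
    total = 0
    for value in data:
        total += value
        if total > 0xFF:
            # Fold the carry back into the low byte.
            total = (total & 0xFF) + 1
    return total


# Example: a read-sector command frame for drive 1, sector 1.
command = bytes([0x31, 0x52, 0x01, 0x00])
print(f"${sio_checksum(command):02X}")
```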
  12. I assume this would be for 800 mode, because otherwise you could use the PBI, or am I missing something? I'm not really familiar with the Incognito hardware. Does it have any kind of MCU or FPGA? Or is it just a CPLD with the firmware running on the Atari CPU? There are two sides here, as in a typical SIO2XX implementation. The Atari needs a special loader, but it is rather small and I don't see a problem with that. However, you need something on the other side to generate the synchronous serial transfer, and it has to be something much faster than Sally or a bare 6502. I used an STM32 MCU. I don't know; in your case you have access to the system (PHI2) clock, I guess, right? This should make it much easier to process the transfer fully synchronously.
  13. This is not a trick or an April 1st joke. It is very real. The video is a live capture of a real Atari 8-bit computer without any hardware modification whatsoever. The numbers displayed were computed in real time while processing the transfer.
  14. This is a detailed technical description of Pokey and SIO synchronous mode using an external clock. It is a continuation of research I started years ago in this thread: https://atariage.com/forums/topic/139769-sio-synchronous-mode I am compiling a detailed document that will be available here together with sources and other technical data: https://github.com/ijor/sioSync In the meantime, here is a quick highlight of the main issues for those technically oriented.

     Max bitrate is one third of the main system clock, which would be ~597 KHz. Pokey can't shift a serial bit in less than 3 cycles. I will elaborate on this later. The start bit must be more than 5 PHI2 cycles long. So the maximum effective speed is about 33 PHI2 cycles per byte, or ~54 KB per second.

     Higher frequencies require using push pull signals, including the serial clock. The pullups on these signals are too weak for open drain operation at the higher frequencies.

     Without special precautions Pokey might sometimes miss a pulse of the serial clock. This would create errors about once every 100 packets transferred, regardless of the bitrate. It is possible to avoid this by aligning the serial clock correctly to the system clock. This can be done purely in software without any hardware modification, although it might be challenging at the highest frequency.

     The software side at the Atari is also challenging at the maximum frequency. At 33/34 cycles per incoming byte both interrupts and Antic DMA must be disabled. Even then, it is difficult, but not impossible, to update the checksum on the fly.

     Will follow up with more details.
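The figures in this summary follow from simple arithmetic on the system clock. The snippet below reproduces them as a sanity check, assuming an NTSC machine (PHI2 at about 1.79 MHz) and a start bit of exactly 6 cycles; the 6-cycle value is an inference from "more than 5" and from the 33-cycle total, not a number stated explicitly in the post.

```python
PHI2_HZ = 1_789_772.5   # NTSC Atari 8-bit system clock (assumption: NTSC)

CYCLES_PER_BIT = 3      # Pokey can't shift a bit in fewer cycles
START_BIT_CYCLES = 6    # assumed: more than 5, twice a normal bit
OTHER_BITS = 9          # 8 data bits plus 1 stop bit

bit_rate_hz = PHI2_HZ / CYCLES_PER_BIT
cycles_per_byte = START_BIT_CYCLES + OTHER_BITS * CYCLES_PER_BIT
bytes_per_second = PHI2_HZ / cycles_per_byte

print(f"bit clock:  {bit_rate_hz / 1000:.1f} KHz")
print(f"byte time:  {cycles_per_byte} PHI2 cycles")
print(f"throughput: {bytes_per_second / 1000:.1f} KB per second")
```

On a PAL machine the system clock is slightly slower, so all three rates scale down accordingly.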
  15. Hyper fast SIO loader at 600 Kbps ... SioSync-Test1.mp4 Using synchronous mode, of course. Technical details, docs and link to source code, shortly in a separate thread.