Everything posted by HiassofT

  1. If possible you should avoid VBI code while SIO is running. Deferred VBIs will most likely never be executed (due to CRITIC being set most of the time), and immediate VBIs will result in transmission errors (overruns / lost bytes) at very high SIO speed (or might also be disabled, if my highspeed code is used :-). So it's best to stick with the init-block method, it's safe to use and you have as many CPU cycles as you like. I've uploaded a Win32 command line EXE of ataricom: http://www.horus.com/~hias/tmp/ataricom-090315.zip There are no docs yet, but at least you get a list of options when calling it without any arguments :-) A short intro: If you just specify a COM file (e.g. "ataricom TEST.COM") it outputs the block numbers, addresses, sizes and other information about the COM file. You'll need these block numbers for the other operations. To merge blocks 2-4 of TEST.COM simply type "ataricom -m 2-4 TEST.COM MERGED.COM". To split block 2 of MERGED.COM at addresses $5000, $6000 and $7000 use "ataricom -s 2,0x5000,0x6000,0x7000 MERGED.COM SPLIT.COM". Using either the "-b" or "-x" option you can specify which blocks to use. For example "ataricom -b 2 -b 4-8" tells ataricom to process only blocks 2 and 4-8. "-x 2 -x 4-8" is the opposite: all blocks except 2 and 4-8 are processed. BTW: The default operation mode is "copy" if you don't use "-m" or "-s". So "ataricom -b 2 IN.COM OUT.COM" will write the second block of IN.COM to OUT.COM. If you have questions, or find bugs in ataricom, feel free to contact me. so long, Hias
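     For illustration, here is a minimal sketch of the COM/XEX block layout that ataricom parses, written as generic assembler pseudo-ops. The addresses, data bytes and the choice of a separate init and run block are made up for this example; they are not taken from ataricom's output.

        .word $FFFF              ; COM signature (required at the start of the file)
        .word $2000              ; block 1: start address
        .word $2005              ; block 1: end address (inclusive, 6 data bytes follow)
        .byte $A9,$00            ; LDA #0
        .byte $8D,$C6,$02        ; STA $02C6 (set the text background colour)
        .byte $60                ; RTS - return to the loader
        .word $02E2              ; block 2: start = INITAD
        .word $02E3              ; block 2: end   = INITAD+1
        .word $2000              ; the loader JSRs $2000 as soon as this block is loaded
        .word $02E0              ; block 3: start = RUNAD
        .word $02E1              ; block 3: end   = RUNAD+1
        .word $2000              ; run address, called after the whole file is loaded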
  2. A simple and easy solution for a loading bar could look like this: Use a simple header block that just sets up the screen/display list. No need for VBI/DLI etc. Then cut the program into several pieces (for example 10). After every piece add a short block that contains the percentage (2 screen bytes) and replaces the currently displayed percentage. This adds just 6 bytes for every "percentage update". If you prefer a "real loading bar", add 2 blocks: one containing the percentage (1 byte) plus an init-block that calls an "update display" routine of your previously loaded code. This adds 5+6=11 bytes for each update. Monitoring SIO could be nasty. You'd have to check for SD/DD, possibly also for 512 bytes/sector when running SDX, and it could also be that SIO isn't involved at all (for example when loading the program from a ramdisk or a flashcart). OTOH manual ticks/updates always work, unless the COM loader code is completely broken :-) BTW: Some time ago I wrote a program ("ataricom", included in AtariSIO - I can also send you a Win32 EXE if you want) to deal with Atari COM files: extract specific blocks, split/merge blocks, add run/init blocks etc. so long, Hias
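     As a rough illustration of the block sizes mentioned above, the extra COM blocks could look like this (generic assembler pseudo-ops; SCREEN, PERCENT and UPDATE are made-up labels and the addresses are just examples, not code from the post):

        SCREEN  = $BC40          ; example GR.0 display memory address
        PERCENT = $3000          ; example address of the percentage variable
        UPDATE  = $3001          ; example address of the redraw routine

        ; 6-byte "percentage update" block: 4-byte header + 2 screen bytes
        .word SCREEN+18          ; start address inside display memory
        .word SCREEN+19          ; end address -> 2 data bytes follow
        .byte $11,$10            ; screen codes for "1" and "0" = 10%

        ; "real loading bar" variant: 5-byte data block + 6-byte init block = 11 bytes
        .word PERCENT            ; 1-byte variable inside the already loaded code
        .word PERCENT
        .byte 10                 ; new percentage value
        .word $02E2              ; INITAD
        .word $02E3
        .word UPDATE             ; loader JSRs UPDATE, which redraws the bar and returns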
  3. Ah, OK, now I get it! Somehow I only thought about busy-waiting (without receiving bytes) until VCOUNT is positive again. Stupid me :-) so long & thanks a lot for sharing your ideas, Hias
  4. You don't even need the Status command. If the FDC detects a (data) CRC error when reading a sector, the floppy returns a command error ("E") instead of a command complete ("C") and the SIOV return status is != 1. But you should also compare the data you read back from the drive with your original data. The floppy might have written wrong data, but this wrong data still might have a correct CRC (the CRC is calculated by the FDC). This might happen if there are undetected SIO transmission errors (the SIO checksum is very simple and weak), defective RAM in the floppy drive or a bug in the floppy ROM - the Happy 1050 bug with highspeed and not enabling fast writes comes to mind. so long, Hias
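     To make the read-back check concrete, here is a hedged sketch of a write-then-verify using the OS device control block and SIOV. The buffer addresses, the sector number and the reduced error handling (just the carry flag) are illustrative only, not code from this thread:

        DDEVIC  = $0300
        DUNIT   = $0301
        DCOMND  = $0302
        DSTATS  = $0303
        DBUFLO  = $0304
        DBUFHI  = $0305
        DTIMLO  = $0306
        DBYTLO  = $0308
        DBYTHI  = $0309
        DAUX1   = $030A
        DAUX2   = $030B
        SIOV    = $E459

        VERIFY  LDA #$31         ; disk drive
                STA DDEVIC
                LDA #1
                STA DUNIT        ; D1:
                LDA #$50         ; 'P' = put sector (no drive-side verify)
                STA DCOMND
                LDA #$80         ; direction: send data to the drive
                STA DSTATS
                LDA #<WRBUF
                STA DBUFLO
                LDA #>WRBUF
                STA DBUFHI
                LDA #7           ; standard 7 second timeout
                STA DTIMLO
                LDA #128         ; single density sector size
                STA DBYTLO
                LDA #0
                STA DBYTHI
                LDA #<720        ; example sector number
                STA DAUX1
                LDA #>720
                STA DAUX2
                JSR SIOV
                BMI ERROR        ; negative status -> SIO reported an error
                LDA #$52         ; 'R' = read the sector back
                STA DCOMND
                LDA #$40         ; direction: receive data from the drive
                STA DSTATS
                LDA #<RDBUF
                STA DBUFLO
                LDA #>RDBUF
                STA DBUFHI
                JSR SIOV
                BMI ERROR
                LDX #0           ; compare the written and the re-read data
        CMPLP   LDA WRBUF,X
                CMP RDBUF,X
                BNE ERROR
                INX
                CPX #128
                BNE CMPLP
                CLC              ; carry clear = data verified OK
                RTS
        ERROR   SEC              ; carry set = SIO error or data mismatch
                RTS

        WRBUF   .res 128         ; reserve 128 bytes (directive depends on your assembler)
        RDBUF   .res 128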
  5. Hi! Mainly because I didn't want to mess around with $D40E. But I think it would be OK to set it to $40 (at the end), like OS rev. A does (as a result of calling SETVBV/$E45C). OTOH if possible I'd like to avoid this. Nice idea! The delay wouldn't be a problem at all. This is the problem: VCOUNT increments every second scanline, but at 126kbit a byte is received almost every scanline. So delaying for (at least) 2 scanlines would be too long. In terms of CPU cycles: we have 141 cycles between each byte, a scanline is 114 cycles (228 color clocks). Ouch. I had some other ideas, but none of them works correctly:
     - One idea was to disable NMIs and do timeout handling according to your idea of using the POKEY timer interrupt in polled mode. I still like the idea, but this involves messing around with $D40E, plus it will be hard (actually I think impossible) to still have correct RTCLOCK ($12..$14) values, incremented once per NMI.
     - Another idea: don't increment $12 and $13 in the VBI code, but do it outside, after SIO is finished (for example check if $14 overflowed and then increment $13/$12). This will work OK most of the time, but fail if $14 overflowed more than once, for example if you get a timeout in a long SIO operation like formatting a disk. This might even happen when a read or write operation times out, which has a default timeout of 7 seconds, and $14 is close to overflowing at the beginning of the SIO operation.
     Personally I don't care if $12..$14 are counting correctly, but some other users might, for example if they are using a software clock based on $12..$14. So: it's all really tight, the NMI code takes just a few cycles too much in the worst case scenarios, but I have no idea how to shorten the code without sacrificing some compatibility. so long, Hias
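     (For reference, these figures follow from the usual POKEY divisor formula, here with the PAL machine clock - back-of-the-envelope arithmetic, not taken from the patch: $f_{bit} = f_{PAL} / (2(D+7))$ with $f_{PAL} \approx 1773447\,\mathrm{Hz}$, so $D=0$ gives $f_{bit} \approx 126.7\,\mathrm{kbit/s}$, and a 10-bit SIO frame takes $10 \cdot 2(D+7) = 140$ machine cycles, i.e. about 1.2 scanlines of 114 cycles each - which is where the roughly 141 cycles per byte and the "almost every scanline" figures above come from.)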
  6. Now to round 2, here's a new version of my patch: http://www.horus.com/~hias/tmp/hipatch-090313.zip I now managed to get reliable operation at 110kbit (divisor 1) without patching the system NMI handler. Divisor 0 results in intermittent errors, especially when system clock 1 ($14) overflows. The trick to achieve this is quite simple and also compatible with the old OS rev. A and B (for example when using the SIO code in a program like MyPicoDos): If the pokey divisor is 4 or less, a fast immediate VBI handler (vector $0222) is installed that ends in a JMP $E462. At the end of the SIO code the old immediate handler is restored. So, at "normal highspeed" (Happy, Speedy etc.) nothing is changed that could cause incompatibilities :-) When using the patched NMI handler pokey divisor 0 runs stable - I did a 5 hour test without a single error. Please note: I renamed the HIPATCH*.COM files, updated the docs, and there are now new diag*.atrs. diag.atr patches the NMI, diag-nonmi.atr doesn't include this patch. The diag-ext*.atr are similar, but contain a modified SIO code that outputs more debug information. so long, Hias
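     A hedged sketch of the "$0222 trick" described above, not the actual patch code: label names are illustrative, and here the vector is swapped via the OS SETVBV call so the update cannot race the VBI NMI (the patch may well do this differently).

        VVBLKI  = $0222          ; immediate VBI vector
        SETVBV  = $E45C          ; OS "set vertical blank vectors" routine
        XITVBV  = $E462          ; OS "exit VBI" entry

        INSTALL LDA VVBLKI       ; remember the old immediate VBI vector
                STA OLDVEC
                LDA VVBLKI+1
                STA OLDVEC+1
                LDA #6           ; 6 = immediate VBI vector
                LDX #>FASTVBI    ; SETVBV convention: X = MSB, Y = LSB
                LDY #<FASTVBI
                JMP SETVBV       ; update $0222 safely

        RESTORE LDA #6           ; put the old handler back after SIO is done
                LDX OLDVEC+1
                LDY OLDVEC
                JMP SETVBV

        FASTVBI JMP XITVBV       ; minimal immediate VBI: go straight to the OS exit

        OLDVEC  .word 0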
  7. A: simpler as in not as simple to use / feature rich... yes. B: costs more? A: yes, it has not as many features as the SIO2SD, especially no display. B: no, it's even cheaper than the SIO2SD, mainly because it doesn't need a display. If you build it all by yourself on a prototype board the total parts cost is some 15 EUR (including VAT - this would then also be some 15 USD excluding VAT). BTW: I think the price for the SDrive NUXX kit is really OK, those large PCBs aren't cheap... I own a SIO2SD (and really like it) and just ordered the parts for 2 SDrives. I'm really curious about how the SDrive is in daily use :-) so long, Hias
  8. That depends. The SD standard defines sizes up to 2GB. Cards with 4GB and more usually use the SDHC standard. There were a few 4GB SD (not SDHC!) cards, but since this size is not officially defined in the SD standard many devices might not be compatible with it. Then: SDHC cards don't work in devices that only support SD. I tried this once with my USB cardreader, it wouldn't even recognize the SDHC card. I haven't tested it with the SIO2SD, but I doubt it supports the SDHC protocol. Anyways: I'd just recommend buying a 2GB SD card. Here in Austria a 2GB Kingston SD card costs approx. 4EUR, a 1GB card is almost the same price (3.50-4 EUR). Smaller cards are almost impossible to get, and if they are available they cost a lot more than 2GB cards (10 EUR and more). Even if you don't need the space for normal usage, you can use the remaining space for a backup of all your Atari stuff. so long, Hias
  9. I think it's better to do this within the loop, as it's faster after all. Since I now (hopefully) got this working, I'll stick to this method (for now). Calculating the checksum is also a little bit complicated as the length of the data block can be any arbitrary value (4 byte status, 12 byte percom, 128/256 byte sectors, some 1k when loading the highspeed code from atariserver, ...). So optimizing it is quite tricky. I just ran some more tests with a modified immediate VBI routine ($222) instead of patching the NMI handler. This would have the benefit that I could install it only temporarily (when doing SIO). But this doesn't work too well, I get occasional timeouts (unnoticed overruns, maybe some race condition again). I also tried disabling the checksum calculation completely, but this didn't help here. The system NMI code seems to take too long (even with my modified code doing a JMP $E462 at the end of the immediate VBI handler). But at pokey divisor 1 the $222 method seems to work fine so far. This is also an important observation, as it makes it possible to use the highspeed code without having to modify the OS ROM - and still get up to divisor 1. I'll try to add this to my patch (for the non-"I" versions), I just hope there's enough space for it in the 1k block. For divisor 0 the only method seems to be patching the OS handler. so long, Hias
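     For reference, a minimal sketch of the simple SIO checksum mentioned above: an 8-bit sum of all block bytes with the carry folded back in after every addition. The buffer and length labels are example values, and blocks larger than 255 bytes would need an outer loop - this is not the patch's actual (optimized) code.

        BUFFER  = $4000          ; example buffer address
        LENGTH  = $4100          ; example address holding the block length (1..255 here)

        CHKSUM  LDA #0
                LDX LENGTH
        SUMLP   CLC
                ADC BUFFER-1,X   ; add the next byte ...
                ADC #0           ; ... and fold the carry back into the sum
                DEX              ; DEX/BNE leave the accumulator and carry logic intact
                BNE SUMLP
                RTS              ; A = checksum byte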
  10. You are welcome! So, does this mean you'd like to try integrating my code into the BlackBox ROM, replacing the BlackBox's SIO code? That could be somewhat tricky; I guess it would be easier to replace the OS ROM in the Atari (which then enables reliable transfers at divisors 3-0) and disable the BlackBox SIO code. I'm not familiar with the BlackBox, but if it uses the OS SIO code when its dip switch is off, this should work. so long, Hias
  11. You could use a simple buffer (bus transceiver, inverter, and/or/nand/... gates - whatever you can find) to de-couple your resistor-ladder from the Antic outputs. Then you should be safe to do whatever you want. so long, Hias
  12. I did some more tests and extended my code. First of all: DLIs are a bad thing during SIO. I set up a DLI in the middle of a graphics 0 screen and then couldn't get faster than pokey divisor 4. Divisor 3 results in very frequent errors. This was my DLI code:

      DLI     PHA
              LDA $D40B
              STA $D40A
              STA $D01A
              PLA
              RTI

      If I remove the "STA $D40A" I can go up to divisor 2 without any problems, but divisor 1 results in occasional errors. Then I tried moving the checksum calculation out of the read loop. This worked, pokey divisor 0 was possible with Antic DMA enabled :-) But since now the checksum calculation adds some additional time, the overall transfer rate was reduced, approximately to the speed of the old code at divisor 1. Here's a table of my results (time in seconds, measured by hand using "date" in an xterm, so not too accurate). Edit: time for reading sectors 4-720 of a DD disk with diag.atr:

      div  DMA  old  new
       0   off   18   21
       0   on    --   23
       1   off   20   23
       1   on    21   25

      So, this didn't really help :-( Next try: I went back to my old code, moved the GETBYTE routine inside the read loop and added a flag to indicate if it should RTS. So, in the read loop I get rid of the "JSR GETBYTE" plus the "RTS", and added 2 new instructions: "BIT GETFLG" "BMI DORTS". Now pokey divisor 0 also works with Antic DMA enabled :-) Since the new code is a little bit longer than the old one I had to move some locations so it fits into the 1k ROM block. Here's my current (development) version: http://www.horus.com/~hias/tmp/hipatch-090308.zip I also changed a few other things in the patch:
      - diag*.atr now start immediately with the tests (no need to press a key before), you can abort a test pressing option, pressing ESC after a test reboots, any other key restarts the test.
      - HISIO*.COM now delay if an error (or warning) occurred, for example if running it on an incompatible OS.
      - patchrom.exe can now also be used with the old Atari 400/800 OSes so you can burn an EPROM with the old OS for your XL/XE computer (installing such an OS into the Atari 400/800 might be tricky, since it also uses the 1k block at $CC00)
      - docs are still not updated :-(
      so long, Hias
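     A hedged sketch of the "GETBYTE inside the read loop" idea described above; the labels, zero-page locations and the omitted error/timeout checks are illustrative only, not the actual patch code. It assumes IRQs are masked (SEI) so the enabled serial-input interrupt is only ever polled:

        SERIN   = $D20D          ; serial input data
        IRQEN   = $D20E          ; write: interrupt enables
        IRQST   = $D20E          ; read: interrupt status (bit 5 = serial input ready, reads 0 when set)
        POKMSK  = $10            ; OS shadow of IRQEN
        BUFR    = $80            ; example zero-page buffer pointer
        GETFLG  = $82            ; bit 7 set -> behave like GETBYTE, clear -> block read
        BUFLEN  = $83            ; bytes to receive (1..255 in this sketch)

        GETBYTE LDA #$80
                STA GETFLG       ; single-byte mode (ACK, COMPLETE, ...)
                BNE RECV
        RDBLK   LDA #$00
                STA GETFLG       ; block mode: stay in the loop
                LDY #0
        RECV    LDA #$20
        WAITIN  BIT IRQST
                BNE WAITIN       ; loop until bit 5 reads 0 = byte received
                LDA SERIN        ; grab the byte as early as possible
                TAX              ; keep it in X while we touch IRQEN
                LDA POKMSK
                AND #$DF
                STA IRQEN        ; clear the serial-input-ready latch ...
                ORA #$20
                STA IRQEN        ; ... and re-arm it for the next byte
                TXA
                BIT GETFLG
                BMI DORTS        ; GETBYTE caller: return with the byte in A
                STA (BUFR),Y     ; read loop: store and continue
                INY
                CPY BUFLEN
                BNE RECV
        DORTS   RTS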
  13. Good idea! I could try to set up the pokey timer so that it triggers an interrupt every second (or some 0.1 seconds when waiting for the command frame ACK). I'd then still use polling mode and add IRQST checks for timer interrupts. The benefit would be that for normal operations (at very high speed) the pokey timer interrupt would (almost) never kick in. Most operations are completed within a fraction of a second, meaning the additional "decrement software timer" code wouldn't be executed. There would still be the VBI, but since I wouldn't normally use CDTMV1 I could optimize the code for the case where CDTMV1=0. But first I'll try moving the checksum code out of the loop - it would be interesting to see if this makes pokey divisor 0 possible. so long & thanks, Hias
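     A heavily hedged sketch of the "POKEY timer in polled mode" timeout idea discussed above: program timer 1 to underflow periodically, keep its interrupt enabled but the CPU I flag set, and poll IRQST bit 0 now and then, decrementing a software counter per tick. The channel/clock choice, tick rate and labels are assumptions, not HiassofT's implementation, and a real version would also disable the timer 1 enable again before returning to the caller.

        AUDF1   = $D200
        AUDC1   = $D201
        STIMER  = $D209          ; write: reload all timers
        IRQEN   = $D20E          ; write
        IRQST   = $D20E          ; read: bit 0 = timer 1 (reads 0 when pending)
        POKMSK  = $10            ; OS shadow of IRQEN
        TICKS   = $80            ; example zero-page cell for the software counter

        SETTMO  LDA #250         ; arbitrary number of ticks until timeout
                STA TICKS
                LDA #255
                STA AUDF1        ; slowest 8-bit divisor; AUDCTL is left as SIO set it
                LDA #0
                STA AUDC1        ; volume 0 - we only want the timer, no sound
                STA STIMER       ; restart the timers
                LDA POKMSK
                ORA #$01         ; enable the timer 1 interrupt (IRQEN bit 0) ...
                STA POKMSK
                STA IRQEN        ; ... but keep the CPU I flag set, so it is only polled
                RTS

        ; called now and then from the wait/receive loops:
        CHKTMO  LDA #$01
                BIT IRQST
                BNE NOTICK       ; bit 0 still 1 -> no timer underflow since last check
                LDA POKMSK
                AND #$FE
                STA IRQEN        ; toggle the enable bit to clear the latched tick ...
                ORA #$01
                STA IRQEN        ; ... and re-arm it
                DEC TICKS
                BEQ TIMEOUT
        NOTICK  CLC              ; carry clear = still within the timeout
                RTS
        TIMEOUT SEC              ; carry set = timeout expired
                RTS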
  14. Hi Poobah! You are absolutely right, this discussion has become pointless. @atariksi: End of discussion. We don't agree. @all others: sorry for wasting your time with basics all of us know for lots of years. Hias
  15. No problem, it's the same thing here :-) Good suggestion! Currently I'm calculating the checksum inside the loop. Moving it outside will speed things up quite a bit. I'll try that when I have time. Yes and no. If CRITIC is not set, the RTCLOCK code isn't executed at all and my NMI code does the standard JMP ($222) stuff. But if CRITIC is set, JMP ($222) is never done - instead my fast RTCLOCK/timer 1 code is executed. This could have a severe impact on other programs.... Before writing this NMI handler code I made some experiments and set $222 to a faster immediate NMI code - which also worked quite well. But then I thought changing the default value of $222 (or $E45F) could cause some problems, so I went the route of changing the base NMI code. I guess both methods have their pros and cons and we might find programs that are incompatible with either one. So, ATM I'm a little bit clueless.... No, I need up to 255 seconds for full compatibility. IIRC most software sets the timeout to 160 (seconds) when formatting a disk, so I have to support at least this. Nice idea, noted :-) I'm already doing that. I read the byte immediately after checking IRQST. Then I reset IRQEN, after that I check SKSTAT for framing and overrun errors. In previous versions I had the LDA SERIN at the very end, but this could lead to a race condition: If the NMI kicks in between checking IRQST and LDA SERIN, and if a new byte is received at that time, GETBYTE will return a wrong byte (the newly received one, not the one that triggered IRQST at first). Usually these errors are caught by the checksum routine, but under evil circumstances one might end up with garbled data. so long & thanks a lot for your suggestions, Hias
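     A hedged sketch of the ordering described above (poll IRQST, read SERIN immediately, reset IRQEN, only then check SKSTAT); timeout handling is omitted and the labels are illustrative, this is not the actual patch code:

        SERIN   = $D20D
        IRQEN   = $D20E          ; write
        IRQST   = $D20E          ; read: bit 5 = serial input ready (reads 0 when set)
        SKSTAT  = $D20F          ; read: bit 7 = framing error, bit 6 = overrun (0 = error)
        SKRES   = $D20A          ; write: reset the SKSTAT error latches
        POKMSK  = $10

        GETBYTE LDA #$20
        WAITIN  BIT IRQST        ; 1) wait until bit 5 reads 0 = byte received
                BNE WAITIN
                LDA SERIN        ; 2) read the data byte right away
                PHA
                LDA POKMSK
                AND #$DF
                STA IRQEN        ; 3) clear the serial-input-ready latch ...
                ORA #$20
                STA IRQEN        ;    ... and enable it again for the next byte
                LDA SKSTAT       ; 4) only now look for framing/overrun errors
                STA SKRES        ;    (and reset the error latches)
                AND #$C0
                CMP #$C0
                BNE SERR         ;    one of the bits reads 0 -> error
                PLA              ; return the byte in A, carry clear = OK
                CLC
                RTS
        SERR    PLA
                SEC              ; carry set = framing or overrun error
                RTS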
  16. We could argue - in a scientific sense - whether the special case s=2*f is an extension of the Nyquist theorem, whether it was some mistake in his original theorem or something else. Earlier I explained why s=2*f doesn't work and what the impacts are (phase/amplitude lost). But that's not the point and I guess neither you nor me are interested in it (actually, I don't care if we call this an extension of Nyquist or something else). The point is you are stating that s=f works, and that's simply not true - at least when using the definitions as they are used by the rest of the world. One last time: Your bitstream at 19200 bit/sec corresponds to a (maximum) frequency of 9600Hz and your receiver samples at 19200Hz. I don't speculate, a lot of people share the same opinion, a lot of books contain this interpretation and this is taught at universities. You are free to disagree, but so far you have failed to show a proof for it. BTW: The phase/amplitude thing is just a simple illustration of why sampling at s=2*f fails (in the general case, when the phase is unknown). Of course this works. Considering our serial transmission this assures that every bit is sampled at least once (sometimes a bit is sampled twice, but this is no problem - no information is lost, but of course when reconstructing/interpreting the sampled data you have to take this into account). I really don't understand what you are trying to say. There is only one relevant clock here, and that's the clock that's associated with the transmission. It doesn't matter if the sender/receiver is a PC running at 3GHz or an Atari running at 1.79MHz. Your clock is 59659Hz, the receiver samples at 59659Hz, the bitrate is 59659 bit/sec and the maximum frequency of the data line carrying the bits is 29829.5Hz. so long, Hias
  17. Hi Rybags! Yes, that's what I'm doing in my SIO code :-) The only interrupt I need is the VBI for handling the timeout stuff (via system timer 1). Implementing timeouts without the VBI would be possible, but it's a PITA - calculating the maximum number of allowed (wait-) loops, decrementing a counter that's able to represent delays up to 255 seconds etc. Then, when I disabled the NMIs, the only things running in the Atari were my SIO code plus Antic DMA - no interrupts at all. Since this didn't work at divisor 0 (I got overrun errors) the bottleneck must be my code. I still need to do some testing. Maybe trying to optimize the (polled) SIO code so that divisor 0 also works with DMA enabled, plus - next thing on my list - investigating how DLIs affect SIO performance. so long, Hias
  18. I thought about this before (especially the RTCLOCK stuff - Copy2000 also does this), but then I decided to stick to the standard OS code, just eliminating attract mode. The chances of breaking some stuff are too high IMO. I ran some tests with NMIs completely disabled (i.e. $D40E set to 0), and pokey divisor 0 only worked with Antic DMA off. So the bottleneck here is (currently) no longer the NMI code but the SIO code, in combination with ANTIC stealing too many cycles. I'm not sure if there's an easy way to solve this issue other than inlining the GETBYTE code (which will increase code size too much) and/or only checking SKSTAT at the end of every data block. Checking SKSTAT only at the end of a data block is also a bad idea since this usually means the code has to run into the timeout (set by DTIMLO) before an error is detected. Pokey signalling an overrun error means that a byte was lost and that the Atari will never receive the 129 (or 257 or whatever) bytes, and framing errors also often mean that pokey got out of sync and some bits (and therefore often one or more bytes) were lost. So while this works fine (and faster) in the case of no errors, it adds quite large delays in case of transmission errors that would otherwise have been detected immediately. so long, Hias BTW: here's my current "fastnmi" code, if you are interested. fastnmi.txt
  19. Hi Peter! Ah, now I got it. But there was a small bug in your code: if the counter reached zero the vector would be called at every VBI. This code (which is identical to the OS code) is really tricky, I made several attempts at optimizing it, some attempts ended in slower code, other attempts in incorrect code :-) But your idea was good. I'm now using the following code (and also added some comments):

      ?NOC1   LDA CDTMV1      ; check lo byte. if it's != 0 the timer is running and must be decremented
              BNE ?TIDEC      ; go decrement, hi byte unchanged because lo != 0
              LDA CDTMV1+1    ; this branch: lo = 0
              BEQ ?TIEXIT     ; timer is zero (disabled, expired), exit from here
              DEC CDTMV1+1
      ?TIDEC  DEC CDTMV1
              BEQ ?TICHK      ; lo reached 0, check if hi also reached 0
      ?TIEXIT PLA             ; timer still running or expired
              RTI
      ?TICHK  LDA CDTMV1+1
              BEQ ?RUNT1      ; hi is also 0, execute timer vector
              PLA             ; timer still running
              RTI

      This adds 2 bytes of code, but saves one cycle whenever the lo byte is not zero. When it's zero it adds one cycle, but in practice this doesn't happen too often so we save some time :-) If you (or anyone else) have some more ideas, just post them. This small piece of code caused me more headaches than I wanted :-) My code uses two different timeouts: one when waiting for the command frame ACK (which is a fixed, short timeout of 2), and another one when waiting for command complete (plus data transmission). The second timeout is calculated from DTIMLO ($0306) and therefore controlled by the user code. I cannot change this timeout, otherwise slow operations (like formatting a disk, or when the drive has to spin up, maybe re-calibrate etc., to read a sector) will fail. so long, Hias
  20. I guess I found the reason why we are not understanding each other: This is not true, I never stated this. Actually, I especially mentioned this case but you snipped off the relevant part of my post #177. Short summary: If you have a clock, sample at s=2*f. If you don't have a clock, sample at s>2*f. As for sampling vs. clock frequency: In the usual case of standard, single clocking (like in Pokey synchronous mode), you sample once per clock cycle (in Pokey: on the falling edge of the clock signal). This means you need to have one full clock cycle per bit. In other words: the frequency of the clock line is twice the (highest) frequency of the data line. Just plot the state of both signals and you'll easily see that the clock runs twice as fast as the data. For example if you transmit a $55 byte it looks like this:

                      start     0       1       2       3       4       5       6       7     stop
            --------+       +-------+       +-------+       +-------+       +-------+       +---------
      DATA          |       |       |       |       |       |       |       |       |       |
                    |       |       |       |       |       |       |       |       |       |
                    +-------+       +-------+       +-------+       +-------+       +-------+

                    +---+   +---+   +---+   +---+   +---+   +---+   +---+   +---+   +---+   +---+
      CLOCK         |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |
                    |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |
            --------+   +---+   +---+   +---+   +---+   +---+   +---+   +---+   +---+   +---+   +--------

      This well known fact has become quite an annoyance in modern technology, because it limits the data rate to half of the maximum of a given transmission channel (for example a cable, bus, ...). The solution to this is to do "double clocking", i.e. you sample on both the rising and the falling edge of the clock signal. But you have to be careful here: although the transmitted clock signal now has the same frequency as the data signal, sampling in the receiver still occurs at twice the frequency of the data signal. Only the way the clock signal is transmitted has changed. so long, Hias
  21. Christoph wrote that sometimes the flash memory might get corrupted or erased, especially when (un-)plugging the device while the Atari was powered on. He had to fix several "dead" SIO2SD devices; the Atmel chip was still OK but the programming was lost. After enabling the brownout detector this didn't happen again. so long, Hias
  22. No. You are using a different definition than the rest of the world. If you'd like the rest of us to be able to understand what you are talking about, it would be wise to use the same definitions. BTW: did you try to read and understand my last posting, especially the part about the frequency of periodic events? So a last, very simple question: If you want to receive data at 19200 bit/sec, how often (at minimum) do you have to look at the input (i.e. IN AL,DX)? If your answer is "19200", then please understand that the rest of the world defines this as a sample rate of 19200Hz. If your answer is something else ("9600" as you have been writing for several pages now), please explain how this should work. so long, Hias
  23. Hi Rybags! OK, you are right. I hadn't thought about this before. I guess I'll change my code back so that it's conforming to the standard again - and just use the shorter VBI code when CRITIC is set. Interesting idea! I'll think about it and do some tests later. so long, Hias
  24. Hi Rybags! Are you sure about this? I couldn't find anything regarding DLIs in the OS source code. The SIO code only sets CRITIC to 1. The NMI code of the OS checks for DLIs at the very beginning - before checking the CRITIC flag. In my code I changed this so that CRITIC also disables DLIs. Having DLIs enabled during very-high-speed SIO wouldn't be a wise idea. And since I don't know if DLIs are enabled or not, I thought it would be best to ignore them. so long, Hias
  25. Just a small hint: After having trouble with the SIO2SD, Christoph (HardwareDoc) from ABBUC discovered that it's better to also enable the brownout detector (BODEN, BODLEVEL) of the Atmel. The fuse bits for the SDrive also didn't enable the detector, so instead of Low=0xFF, High=0xDF it's better to set them to Low=0x3F, High=0xDF. so long, Hias