bugbiter Posted September 20, 2014 Hi everyone, I've written a program which has to load several files into memory, but the reading speed from a real 1050 drive is very slow: instead of the familiar didididididit... noise, it goes dit-dit... dit-dit... dit-dit... All in all it has to load 90K into extended memory, so the slow speed is very annoying. I don't load one big chunk of data with one CIOV call but have to load segments of 32 bytes from the buffer into different memory spots, so I guess my loading routine is a few microseconds too late when it comes to reading the next sector. The floppy has already rotated too far for the next sector, and it has to wait for another spin to get it. I read somewhere that consecutive sectors are laid out physically on the disk in a sort of half-turn pattern to match the speed of read calls from CIOV in a normal loading routine. Is there a tool that can spread the consecutive sectors of a file in a different pattern that can optimize reading speed, e.g. taking the sector that comes 'next' in terms of 'physical degrees' on the disk so that the delay in read processing won't affect reading speed that much?
Rybags Posted September 20, 2014 Normal format is close to optimum, but on a 1050 at least I found that reading the sectors of each track in reverse order is slightly faster. Your problem is likely too many CIO calls per given amount of data, and/or the subsequent processing of the data is taking too long. You'd probably be better off with a bigger buffer: load chunks of 1K or so, then do the processing. A pause every 8 sectors or so is more tolerable than one every sector. In any case, since you're using CIO anyway, you're at the mercy of wherever the DOS that wrote the files put them. In theory you could change the layout so a file is spread among sectors differently, but that is lots of work with no guarantee of benefit.
Irgendwer Posted September 20, 2014 (edited) bugbiter wrote: "Is there a tool that can spread the consecutive sectors of a file in a different pattern that can optimize reading speed, e.g. taking the sector that comes 'next' in terms of 'physical degrees' on the disk so that the delay in read processing won't affect reading speed that much?" Nothing that I have heard of, but that wouldn't help anyway: IMHO you would have to change the 1050's firmware to accomplish the desired effect. AFAIR this was also the design principle of the '1050 Turbo' enhancement. You may find useful information in chapter 9.1 of http://www.strotmann.de/~cas/Infothek/1050Turbo/anleitung_1050_turbo.pdf It could help to address sectors via SIO directly instead of using CIO, but besides abandoning common file formats, this would mean a lot of work 'only' to speed up standard drives. Loading bigger chunks into memory and distributing them afterwards sounds more feasible to me. You could also support a (simple?) compression scheme, trading transfer time for CPU time... (Rybags was faster.)
flashjazzcat Posted September 20, 2014 Agree with Rybags. Minimize I/O by reading as much as possible into an intermediate buffer, and then pull 32-byte records out of that.
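The buffering advice above can be sketched on the host side in Python. This is only an illustration of the pattern (few large reads, many small record moves), not Atari code; the 1K chunk size follows Rybags' "1K or so" suggestion, and the names `read_records` and `CHUNK_SIZE` are made up for this sketch.

```python
import io

RECORD_SIZE = 32     # one 32-byte record, as described in the thread
CHUNK_SIZE = 1024    # intermediate buffer size (assumption: "1K or so")


def read_records(stream, chunk_size=CHUNK_SIZE, record_size=RECORD_SIZE):
    """Yield fixed-size records while issuing only a few large reads."""
    if chunk_size % record_size:
        raise ValueError("chunk_size must be a multiple of record_size")
    while True:
        chunk = stream.read(chunk_size)      # one large I/O call
        if not chunk:
            break
        # Cheap in-memory slicing replaces many small I/O calls.
        for offset in range(0, len(chunk), record_size):
            yield chunk[offset:offset + record_size]


if __name__ == "__main__":
    data = bytes(range(256)) * 16            # 4096 bytes of sample data
    records = list(read_records(io.BytesIO(data)))
    print(len(records), "records from", len(data) // CHUNK_SIZE, "reads")
```

On the Atari the equivalent would be one CIO read per chunk followed by a copy loop; the point of the sketch is only that the number of I/O calls drops by a factor of chunk size divided by record size.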
bugbiter Posted September 20, 2014 Author Hmm... The thing is, I don't have much memory left for a huge buffer. I'm using up virtually all the memory of a 130XE, including the RAM under the OS ROM (E000 to FFFF). I can't use SIOV while the OS is switched off; I'm loading to 4000-8000, copying from 4000 to E000, and then loading the last file to 4000 again. Thanks very much for the Strotmann link. I will try to write a Turbo BASIC routine that saves a file in a modified sector pattern via SIO and see what happens during loading :-)
flashjazzcat Posted September 20, 2014 If you're calling CIOV, you must be re-enabling the OS first, so you can call SIOV too?
Rybags Posted September 20, 2014 Ditching the filing system ties you to disk or ATR images, which can be a disadvantage, e.g. it won't lend itself well to running from an H: type file device on emulation or a file server on SIO2PC emulations. Surely you could spare 1 or 2K during the initial load? If things get tight towards the end, just fall back to something smaller like 100-200 bytes.
bugbiter Posted September 20, 2014 Author flashjazzcat wrote: "If you're calling CIOV, you must be re-enabling the OS first, so you can call SIOV too?" Sure, the OS is only switched off during the 8K copying process from 4000 to E000.
bugbiter Posted September 20, 2014 Author Rybags wrote: "Ditching the filing system ties you to disk or ATR images, which can be a disadvantage, e.g. it won't lend itself well to running from an H: type file device on emulation or a file server on SIO2PC emulations. Surely you could spare 1 or 2K during the initial load? If things get tight towards the end, just fall back to something smaller like 100-200 bytes." Using the upper RAM also ties me to certain DOSes which don't use the RAM under the OS themselves. So I'm going for an ATR which boots XDos, autoloads my program, and then loads the files from D1. This works on SIO2PC as well. During development I'm using WUDSN with Altirra. I load the files from H: there, but I don't see how I could do that with the finished XEX file. You would then be free to use any DOS, and with SpartaDOS, for example, it would crash because it also uses the RAM under the OS.
+CharlieChaplin Posted September 20, 2014 Hmmm, you said you are using the RAM under the OS at $E000-FFFF. How about the rest, $C000-CFFF and $D800-DFFF? That's another 6 kbytes...
bugbiter Posted September 20, 2014 Author Yes, you're right, but this area cannot serve as a buffer because I can't switch the ROM off during I/O. The file segments I have to load are 8K each, so this area is not enough to hold one segment and free up some other area. I could place my program code there; the code switches off all interrupts so it can run while the OS is off, but when the OS is off I don't have the ROM charset anymore... And I cannot load the code there directly, I'd have to copy it there. Maybe I'll try that sometime.
Rybags Posted September 20, 2014 You could read chunks of 256 bytes, then move each one to your 4K under the OS. Once the 4K is full, do your 32-byte moves from there.
ricortes Posted September 20, 2014 Could you explain your technique in a bit more detail for me? I mean, are you using 128-byte or 256-byte sectors, setting up CIOV to only pull 32 bytes at a time, that kind of stuff. Why does the data need decoding/moving? Couldn't it be arranged so that it is contiguous in the format you need? It seems like you could use your current method to construct the memory image you want, then write out the memory as a contiguous block.
bugbiter Posted September 20, 2014 Author I want to do an animation using my interlaced APAC pictures. My .BGP file format for interlaced APAC pictures has the brightness picture as one 4K block and the colour picture as one 4K block after that. My loading routine has to put line #1 (32 bytes) of the brightness block into video memory for frame A, then line #2 into video memory for frame B, then line #3 into A again, and so on. When the brightness block is done, line #1 of the colour block goes into frame B, line #2 into frame A, and so on. You're right. Since I plan the animation program as a standalone thing, I could convert the original .BGP picture file into a format which already has the interlaced line order, so I can load it as one contiguous 8K chunk.
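A host-side converter for that idea might look like the following Python sketch. It assumes exactly what the post describes (a 4K brightness block followed by a 4K colour block, 32 bytes per line, odd-numbered brightness lines going to frame A and odd-numbered colour lines going to frame B) and further assumes that each line keeps its line position within its frame and that frame A is written before frame B. The function name `bgp_to_frames` is made up for this sketch.

```python
LINE_BYTES = 32
LINES = 128                      # 4K block / 32 bytes per line
BLOCK = LINE_BYTES * LINES       # 4096 bytes


def bgp_to_frames(bgp: bytes) -> bytes:
    """Reorder a .BGP image (brightness block + colour block) into
    frame A followed by frame B, each with lines already interlaced."""
    if len(bgp) != 2 * BLOCK:
        raise ValueError("expected exactly 8K of picture data")
    bri, col = bgp[:BLOCK], bgp[BLOCK:]
    frame_a, frame_b = bytearray(), bytearray()
    for n in range(LINES):       # n = 0 is "line #1" in the post
        line = slice(n * LINE_BYTES, (n + 1) * LINE_BYTES)
        if n % 2 == 0:
            frame_a += bri[line]     # brightness lines 1, 3, 5... -> A
            frame_b += col[line]     # colour lines 1, 3, 5...     -> B
        else:
            frame_b += bri[line]     # brightness lines 2, 4, 6... -> B
            frame_a += col[line]     # colour lines 2, 4, 6...     -> A
    return bytes(frame_a + frame_b)
```

The converted 8K file could then be loaded with a single large read straight into video memory, which removes the per-line moves from the loader.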
+JAC! Posted September 20, 2014 The problem with "optimizing" the sector order is that what is optimal for one type of drive will often be even worse for the others (810/1050/1050 Turbo/XF 551). Disks formatted for my 1050 TURBO were a nightmare on every other Atari :-) What I can suggest (apart from giving CIO/SIO a chance by increasing the buffer size) is to simply reduce the amount of data. Graphics, and especially animations, can typically be packed quite efficiently.
Chilly Willy Posted September 20, 2014 What you are talking about is sector interleave, and it is set when the disk is FORMATTED, not when sectors are written. Once the track is formatted, the order of the sectors is fixed. So you need a new format routine to change the sector interleave, but the format is part of the disk drive BIOS, not part of the SIO. You would need to reprogram the disk drive to change that... or write a PC program to format tracks with the proper sector size, count, and the interleave you wish. PC floppy controllers are capable of reading/writing/formatting disks in an Atari-compatible manner, but you need to write your own program for that.
+bob1200xl Posted September 21, 2014 The USDoubler from ICD will format a disk in any interleave you like. Check out the Custom Format command ($66). Bob
Rybags Posted September 21, 2014 Another alternative is to construct the Display Lists so that the non-contiguous frame layout is catered for. That is, if CPU and memory aren't critical during display: you'll lose 2 extra bytes/cycles per scanline, since an LMS is needed for every mode line.
Heaven/TQA Posted September 21, 2014 Sounds familiar from when I was putting the Arsantica demo together, but loading 32 bytes at a time doesn't sound like the best way... I would use a sector-sized I/O buffer of 128 or 256 bytes at least... Or use xBios and LZ4-pack the stuff... With xBios you can load directly into the RAM under the OS... and depack while loading... This is what I am using for Arsantica 2.
bugbiter Posted September 22, 2014 Author Rybags wrote: "Another alternative is to construct the Display Lists so that the non-contiguous frame layout is catered for. That is, if CPU and memory aren't critical during display: you'll lose 2 extra bytes/cycles per scanline, since an LMS is needed for every mode line." In my BGP photo viewer that's exactly the way the Display List works: an LMS in every ANTIC scanline that interlaces Bri and Hue lines. This way the Bri and Hue blocks can stay contiguous in memory. But since I want to do animation, this Display List would force me to change all LMS entries (128 here) on every change of frame, which I didn't manage in one VBI (even including the scan time of the blank areas above and below the display). So I chose to interleave the lines in memory. That way I can switch frames with just one LMS change.
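The trade-off between the two Display List layouts can be made concrete with a small Python sketch that builds both byte sequences. The ANTIC instruction encoding used here (bit 6 set for LMS, $70 for eight blank lines, $41 for jump-and-wait-for-vertical-blank) is standard; the mode number ($0F), the three leading blank instructions, and all addresses are assumptions for illustration, not values taken from the posts.

```python
BLANK_8 = 0x70    # eight blank scanlines
LMS = 0x40        # "load memory scan" bit; two address bytes follow
JVB = 0x41        # jump and wait for vertical blank
MODE = 0x0F       # assumed ANTIC mode for the picture lines
LINES = 128
LINE_BYTES = 32


def word(addr):
    """Little-endian 16-bit address as two bytes."""
    return [addr & 0xFF, addr >> 8]


def dl_per_line_lms(bri_base, hue_base, dl_addr, first_is_bri=True):
    """Display list with an LMS on every line (blocks stay contiguous)."""
    dl = [BLANK_8] * 3
    for n in range(LINES):
        use_bri = (n % 2 == 0) == first_is_bri
        base = bri_base if use_bri else hue_base
        dl += [MODE | LMS] + word(base + n * LINE_BYTES)
    return bytes(dl + [JVB] + word(dl_addr))


def dl_single_lms(frame_addr, dl_addr):
    """Display list with one LMS (lines pre-interleaved in memory)."""
    dl = [BLANK_8] * 3 + [MODE | LMS] + word(frame_addr)
    dl += [MODE] * (LINES - 1)
    return bytes(dl + [JVB] + word(dl_addr))
```

With these assumptions the per-line version is 390 bytes and needs 128 address patches per frame change, while the single-LMS version is 136 bytes and needs one. Note that the single-LMS layout relies on the 4K frame not crossing a 4K boundary, since ANTIC's memory scan counter does not carry across one without a new LMS.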
bugbiter Posted September 22, 2014 Author Chilly Willy wrote: "What you are talking about is sector interleave, and it is set when the disk is FORMATTED, not when sectors are written. Once the track is formatted, the order of the sectors is fixed. So you need a new format routine to change the sector interleave, but the format is part of the disk drive BIOS, not part of the SIO. You would need to reprogram the disk drive to change that... or write a PC program to format tracks with the proper sector size, count, and the interleave you wish. PC floppy controllers are capable of reading/writing/formatting disks in an Atari-compatible manner, but you need to write your own program for that." Yes, I know, the sectors all get their logical number during formatting. I thought it made no difference for reading speed whether the sectors were 'hard' interleaved and read according to their count (1,2,3,4...17,18) or numbered straight and read in an interleaved order (1,10,2,11,3,12...9,18).
bugbiter Posted September 22, 2014 Author JAC! wrote: "The problem with 'optimizing' the sector order is that what is optimal for one type of drive will often be even worse for the others (810/1050/1050 Turbo/XF 551). Disks formatted for my 1050 TURBO were a nightmare on every other Atari :-) What I can suggest (apart from giving CIO/SIO a chance by increasing the buffer size) is to simply reduce the amount of data. Graphics, and especially animations, can typically be packed quite efficiently." Hi JAC! My BGP pictures are heavily dithered, so I guess there won't be a lot of regular patterns that could be compressed much. But I have really never used ANY packing routines on my Atari. Maybe I should try. What can you recommend?
bugbiter Posted September 22, 2014 Author Heaven/TQA wrote: "Sounds familiar from when I was putting the Arsantica demo together, but loading 32 bytes at a time doesn't sound like the best way... I would use a sector-sized I/O buffer of 128 or 256 bytes at least... Or use xBios and LZ4-pack the stuff... With xBios you can load directly into the RAM under the OS... and depack while loading... This is what I am using for Arsantica 2." OK, I'll have a look. As I said, the dithering makes me doubt I will get much of a compression ratio...
+JAC! Posted September 22, 2014 With much dithering, lossless compression will typically not yield much, but on the other hand lossy compression might work without visible artifacts. The master of the latter is algorithm http://www.pouet.net/user.php?who=66775. If you have chroma and luma separated anyway, just pack them separately with ZIP to see what the result would be. That gives a good estimate of the best that would be possible.
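That quick ZIP test can also be scripted. The Python sketch below uses the standard `zlib` module, which implements DEFLATE, the method ZIP uses by default; it assumes the 4K luma block followed by the 4K chroma block described earlier in the thread. The function names and the file name in the usage comment are hypothetical.

```python
import zlib

BLOCK = 4096   # assumed size of each of the luma and chroma blocks


def packed_ratio(data: bytes) -> float:
    """Compressed size divided by original size (lower is better)."""
    return len(zlib.compress(data, 9)) / len(data)


def estimate(bgp: bytes) -> dict:
    """Pack the luma and chroma blocks separately and report both ratios."""
    luma, chroma = bgp[:BLOCK], bgp[BLOCK:2 * BLOCK]
    return {"luma": packed_ratio(luma), "chroma": packed_ratio(chroma)}


# Usage (hypothetical file name):
#   with open("picture.bgp", "rb") as f:
#       print(estimate(f.read()))
```

A ratio close to 1.0 for heavily dithered data would support bugbiter's doubt; a packer small enough to run on the Atari will usually do somewhat worse than this estimate.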
Chilly Willy Posted September 22, 2014 bugbiter wrote: "Yes, I know, the sectors all get their logical number during formatting. I thought it made no difference for reading speed whether the sectors were 'hard' interleaved and read according to their count (1,2,3,4...17,18) or numbered straight and read in an interleaved order (1,10,2,11,3,12...9,18)." Very good point. If you know the order in which the drive formats the track, and you know the amount of overhead used by your program plus the CIO call, you can order the data in non-consecutive sectors to cover that overhead: software-interleave the data instead of hardware-interleaving it. You'll need to make a command-line tool that takes an input data file and an interleave value, and outputs a new data file in the interleaved order, which you then dump to sectors on the drive. Then just try a range of interleave values to see what actually works best. If you have something like SIO2SD or SIO2PC, that wouldn't take more than a day's work, if that.
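The core of such a tool could look like this Python sketch. It assumes single-density tracks of 18 sectors of 128 bytes, data that starts on a track boundary, and raw sector access that bypasses the DOS file system; `read_order` and `interleave` are made-up names. With a step of 9 it reproduces the 1,10,2,11,3,12...9,18 order from the quoted post.

```python
def read_order(sectors_per_track: int, step: int) -> list:
    """Slot (0-based) in which each consecutive logical sector is stored.

    Advance by `step` slots each time; on a collision take the next
    free slot."""
    used = [False] * sectors_per_track
    order = []
    pos = 0
    for _ in range(sectors_per_track):
        while used[pos]:
            pos = (pos + 1) % sectors_per_track
        used[pos] = True
        order.append(pos)
        pos = (pos + step) % sectors_per_track
    return order


def interleave(data: bytes, step: int,
               sectors_per_track: int = 18, sector_size: int = 128) -> bytes:
    """Return `data` rearranged so that logical sector i of every track
    sits in slot read_order(...)[i]. The last track is zero-padded."""
    track_bytes = sectors_per_track * sector_size
    data = data + bytes(-len(data) % track_bytes)
    order = read_order(sectors_per_track, step)
    out = bytearray(len(data))
    for track in range(0, len(data), track_bytes):
        for logical, slot in enumerate(order):
            src = track + logical * sector_size
            dst = track + slot * sector_size
            out[dst:dst + sector_size] = data[src:src + sector_size]
    return bytes(out)


if __name__ == "__main__":
    # Sector numbers (1-based) in the order the loader should request them.
    print([slot + 1 for slot in read_order(18, 9)])
```

The loader then requests sectors in the same `read_order` sequence. Because the drive's own format already interleaves the sector numbers physically, the best step still has to be found by trying a range of values on real hardware, as suggested above.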