tschak909 Posted September 4, 2014
What happens when you drop in a 21MHz 65816 and use a SIDE 2 with CompactFlash? The results are astounding. Amazing, indeed!
But wait, there's more! If you drop in a VBXE, load S_VBXE and CON, and switch to an 80-column VBXE console, YOU GET:
MUUUUUUAAAHAHAHAHAHAHAHAHAHAHAHA
-Thom
flashjazzcat Posted September 4, 2014
Did you see Avery's 50fps video player? He made Antic pull data from SIDE2 at around 400KB/s.
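To put the ~400KB/s figure in per-frame terms, here is a small Python calculation. It assumes KB means 1024 bytes, that the 50fps player ran with PAL timing (312 scanlines per frame, roughly 1.773447MHz machine clock), and that the rate is spread evenly across frames; none of these details are stated in the post.

```python
# Rough per-frame budget for a streaming video player.
# Assumptions (not from the post): KB = 1024 bytes, PAL timing.
PAL_SCANLINES = 312          # scanlines per PAL frame
PAL_CLOCK_HZ = 1_773_447     # approximate PAL machine clock in Hz

def frame_budget(rate_kb_per_s: float, fps: float) -> dict[str, float]:
    """Bytes needed per frame and per scanline, and machine cycles available per byte."""
    bytes_per_s = rate_kb_per_s * 1024
    return {
        "bytes_per_frame": bytes_per_s / fps,
        "bytes_per_scanline": bytes_per_s / fps / PAL_SCANLINES,
        "cycles_per_byte": PAL_CLOCK_HZ / bytes_per_s,
    }

budget = frame_budget(400, 50)
print(f"{budget['bytes_per_frame']:.0f} bytes/frame")        # 8192 bytes/frame
print(f"{budget['bytes_per_scanline']:.1f} bytes/scanline")  # 26.3 bytes/scanline
print(f"{budget['cycles_per_byte']:.2f} cycles/byte")        # 4.33 cycles/byte
```

Under these assumptions, each frame needs 8KB, which works out to a byte roughly every four to five machine cycles.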
tschak909 Posted September 4, 2014 (Author)
Oh yes, I did. Jaw hit floor.
-Thom
phaeron Posted September 4, 2014
Was that with 128-byte sectors? It goes a lot higher with 512-byte sectors, ~400KB/sec. IDE uses 512-byte sectors, so using 128- or 256-byte blocks is inefficient -- it forces the driver to transfer a lot more data than is actually used.

It's worth noting that these timings come from Altirra's '816 emulation, and YMMV on actual accelerators. A rather big factor is whether the transfer buffers and IDE driver are executing from fast memory -- if they are in base system memory, the 65816 will be severely handicapped. Even with fast memory, you can see the effect of hardware register accesses on the inner loop (the timestamp is in the form of frame:vpos:hpos.subcycle):

    23690:259: 67.0 | A=FF:20 X=02 Y=02 (  ) | 00:13F3: AD F0 D5  LDA $D5F0
    23690:259: 69.0 | A=FF:A4 X=02 Y=02 (N ) | 00:13F6: 91 32     STA (BUFRLO),Y
    23690:259: 69.6 | A=FF:A4 X=02 Y=02 (N ) | 00:13F8: C8        INY
    23690:259: 69.8 | A=FF:A4 X=02 Y=03 (  ) | 00:13F9: AD F0 D5  LDA $D5F0
    23690:259: 71.0 | A=FF:50 X=02 Y=03 (  ) | 00:13FC: 91 32     STA (BUFRLO),Y
    23690:259: 71.6 | A=FF:50 X=02 Y=03 (  ) | 00:13FE: C8        INY
    23690:259: 71.8 | A=FF:50 X=02 Y=04 (  ) | 00:13FF: D0 F2     BNE $13F3

$D5F0 is the IDE data register, so accessing it requires going over the chip bus... which then requires slowing down the 65816 from 21MHz to 1.79MHz. This makes the instruction take 15-26 CPU cycles instead of 4: 3 to read the instruction, 0-11 to synchronize, and 12 to read. The loop above would be 27 ideal cycles per word without this effect, but instead it takes 60. (The IDE driver managed to get relocated just right for its inner loop to cross a page boundary... DOH!)

More serious is what happens whenever the SDX library is called, since it executes in place from banked memory in the cartridge.
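The 15-26 cycle figure and the 60-cycle loop can be reproduced with a small timing model. This Python sketch assumes that the trace's subcycle field counts 65816 cycles within one 1.79MHz machine cycle (12 per machine cycle, which matches the 6-cycle STA and 2-cycle INY steps in the trace), that a chip-bus access must wait for the next machine-cycle boundary, and that the CPU is in emulation mode, so the page-crossing BNE costs 4 cycles. The helper names are hypothetical; this is a back-of-the-envelope model, not Altirra's timing code.

```python
# Back-of-the-envelope model of the IDE inner loop in the trace.
# Assumptions: 12 fast (65816) cycles per 1.79MHz machine cycle, and a
# chip-bus access can only start on a machine-cycle boundary.
FAST_PER_SLOW = 12

def trace_time(hpos: int, subcycle: int) -> int:
    """Convert an 'hpos.subcycle' trace stamp to fast cycles within the scanline."""
    return hpos * FAST_PER_SLOW + subcycle

def slow_lda_abs_cycles(start: int) -> int:
    """Cycles for LDA abs from a chip-bus register, beginning at fast cycle `start`."""
    fetch = 3                                # opcode + 2 operand bytes from fast memory
    sync = -(start + fetch) % FAST_PER_SLOW  # wait 0-11 cycles for a slow-cycle boundary
    read = FAST_PER_SLOW                     # the read itself takes one 1.79MHz cycle
    return fetch + sync + read

def loop_cycles(start: int) -> list[int]:
    """Per-instruction cycle counts for one pass of the loop (two bytes = one IDE word)."""
    costs = []
    t = start
    for _ in range(2):
        # LDA $D5F0 (slow register), STA (BUFRLO),Y = 6 cycles, INY = 2 cycles
        for cost in (slow_lda_abs_cycles(t), 6, 2):
            costs.append(cost)
            t += cost
    costs.append(4)  # BNE taken across a page boundary: 3 + 1 page-cross penalty
    return costs

IDEAL = [4, 6, 2, 4, 6, 2, 3]  # same loop with a fast LDA and no page crossing
costs = loop_cycles(trace_time(67, 0))
print(costs)                         # [24, 6, 2, 16, 6, 2, 4]
print(sum(costs), "vs", sum(IDEAL))  # 60 vs 27
print(min(slow_lda_abs_cycles(s) for s in range(12)),
      max(slow_lda_abs_cycles(s) for s in range(12)))  # 15 26
```

The model matches the trace: the first LDA waits 9 cycles for synchronization, the second only 1. Because every chip-bus read ends on a machine-cycle boundary, the rest of the loop runs at a fixed phase, so after the first pass it settles at 60 cycles per word no matter where it started.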
The default '816 settings in Altirra are for all external memory to reside on the chip bus, because it would be impractical for a real accelerator to run those faster unless they were part of the accelerator itself. This completely kneecaps the 65816 because the library code runs at nearly 1.79MHz speed. Another bad case like this would be trying to use a Black Box with an internal '816 -- the SCSI disk transfers would still be slow because the disk driver would be running out of uncached firmware ROM on the slow PBI bus.

The easiest way to avoid this problem is to get as much code as possible into fast RAM, which is also the simplest case for the accelerator to handle. That is complicated by the need to accommodate the original Atari 8-bit architecture and also by the 65816's stubborn insistence on having interrupt vectors, stacks, and direct page only in bank 0. This means that having a way to accelerate at least part of bank 0 is critical for an '816 accelerator to run existing software faster than a stock 6502 -- it's pointless to try to run the '816 faster than 1.79MHz if it has to slow down to that speed virtually all the time to fetch instructions. The Apple IIGS solves this problem by having bank 0 be fast RAM by default and optionally shadowing it with Mega II regions from bank $E0 for Apple II compatibility.
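Why slow instruction fetches are so damaging can be sketched with an Amdahl's-law style estimate. This Python example is a simplification under stated assumptions: the accelerator clock is 12 times the stock clock, every cycle is either fully fast or fully on the 1.79MHz bus, and synchronization waits are ignored, so real results would be somewhat worse. `slow_fraction` is a hypothetical parameter, not a value measured in Altirra.

```python
# Idealized speedup when part of the work must run on the 1.79MHz chip bus.
# Assumption: the 65816 runs at 12x the stock clock; sync waits are ignored.
CLOCK_RATIO = 12

def speedup(slow_fraction: float, ratio: float = CLOCK_RATIO) -> float:
    """Speedup over a stock 1.79MHz CPU when `slow_fraction` of cycles run on the chip bus."""
    if not 0.0 <= slow_fraction <= 1.0:
        raise ValueError("slow_fraction must be between 0 and 1")
    # A slow cycle costs one stock cycle; a fast cycle costs 1/ratio of one.
    relative_time = slow_fraction + (1.0 - slow_fraction) / ratio
    return 1.0 / relative_time

for f in (0.0, 0.1, 0.5, 0.9, 1.0):
    print(f"{f:.0%} of cycles on the chip bus: {speedup(f):.2f}x")
# 0% of cycles on the chip bus: 12.00x
# 10% of cycles on the chip bus: 5.71x
# 50% of cycles on the chip bus: 1.85x
# 90% of cycles on the chip bus: 1.10x
# 100% of cycles on the chip bus: 1.00x
```

Even with only 10% of cycles stuck on the slow bus, the 12x clock yields under 6x, and at 90% the CPU is barely faster than stock, which is the situation described for library code executing from the cartridge.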
Rybags Posted September 4, 2014
The I/O runs with VBlank fully enabled, doesn't it? i.e., CRITIC not set. Some more speed could probably be wrung out of it: a streamlined user VBI that does only the minimum required should free up a few scanlines per frame for a little extra boost.
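The gain from a leaner VBI is roughly the share of the frame those scanlines represent. The sketch below assumes standard Atari frame timing (114 machine cycles per scanline, 262 scanlines per NTSC frame, 312 per PAL frame), uses 3 scanlines as a hypothetical stand-in for "a few", and ignores any Antic DMA during those lines.

```python
# Share of frame time recovered by trimming the vertical blank interrupt.
# Assumption: 3 scanlines is a hypothetical value for "a few"; DMA is ignored.
CYCLES_PER_SCANLINE = 114                      # machine cycles per scanline
SCANLINES_PER_FRAME = {"NTSC": 262, "PAL": 312}

def vbi_savings(scanlines_freed: int, video: str = "NTSC") -> tuple[int, float]:
    """Machine cycles and share of frame time recovered per frame."""
    cycles = scanlines_freed * CYCLES_PER_SCANLINE
    share = scanlines_freed / SCANLINES_PER_FRAME[video]
    return cycles, share

for video in ("NTSC", "PAL"):
    cycles, share = vbi_savings(3, video)
    print(f"{video}: {cycles} cycles/frame, {share:.2%} of frame time")
# NTSC: 342 cycles/frame, 1.15% of frame time
# PAL: 342 cycles/frame, 0.96% of frame time
```

Under these assumptions the gain is around one percent, consistent with the "little extra boost" the post expects.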
tschak909 Posted September 5, 2014 (Author)
Yes, I am using 512-byte sectors...
-Thom
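The sector-size question earlier in the thread comes down to a ratio. Assuming, as phaeron describes, that each 128- or 256-byte logical sector still costs a full 512-byte IDE transfer, the useful share of each transfer is the logical size divided by 512. This Python sketch computes that share and scales a raw rate by it; the 400KB/s input is the figure quoted in the thread, and the scaling is an idealized assumption that ignores per-sector command overhead (which would make small sectors look even worse).

```python
# Idealized cost of small logical sectors on 512-byte IDE media.
# Assumption: every logical sector read costs one full physical sector transfer.
IDE_SECTOR = 512  # bytes per physical IDE sector

def useful_fraction(logical_sector: int, physical_sector: int = IDE_SECTOR) -> float:
    """Share of each physical transfer that is actually used."""
    if logical_sector <= 0 or physical_sector % logical_sector != 0:
        raise ValueError("logical sector size must evenly divide the physical sector size")
    return logical_sector / physical_sector

def effective_rate(raw_kb_per_s: float, logical_sector: int) -> float:
    """Useful throughput when every logical sector costs a full physical transfer."""
    return raw_kb_per_s * useful_fraction(logical_sector)

for size in (128, 256, 512):
    print(f"{size}-byte sectors: {useful_fraction(size):.0%} useful, "
          f"~{effective_rate(400, size):.0f}KB/s")
# 128-byte sectors: 25% useful, ~100KB/s
# 256-byte sectors: 50% useful, ~200KB/s
# 512-byte sectors: 100% useful, ~400KB/s
```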