JamesD (posted August 30, 2016):

OK, I've been trying to use the 65816's MVN instruction to scroll the screen in my 64/80-column text code. I've single-stepped through the code in Altirra, and the MVN seems to be working... right up until the vertical blank interrupt. At least I think that's what's happening. I tried disabling interrupts, but it happens anyway, so is it a non-maskable interrupt? The registers are in 16-bit mode, which would be very bad for 8-bit interrupt code. So I guess I need my own interrupt handler? Is there any sample interrupt code for the Atari out there? I've handled interrupts on other machines, but I don't know the Atari hardware that well. If someone has other ideas about what is causing the problem, I'm open to suggestions.

I'm using the following macro with CA65. I verified the addresses in the .lst file, so that *shouldn't* be the problem. And I used the debug screen in Altirra so I could actually see the proper bytes being moved.

```
;Memory Move Negative. Move from hi to lo address.
;Memory is moved at the rate of seven clock cycles per byte.
.macro MOVEMEMN SourceAddress,DestAddress,BytesToMove,SourceBank,DestBank
    clc
    xce                     ; native mode
    rep #$31                ; set A, X, and Y to 16 bit, clear carry
    .A16                    ; Tell assembler Accumulator is 16 bit
    .I16                    ; Tell assembler Index registers are 16 bit
    LDX #SourceAddress      ; The source address for the memory move
    LDY #DestAddress        ; The destination address for the memory move
    LDA #BytesToMove-1      ; Number of bytes to move, minus 1 (MVN moves A+1 bytes)
    MVN SourceBank,DestBank ; RAM banks, should be 00,00 on 64K system
    sep #$31                ; set A, X, and Y to 8 bits, set carry
    ; sec
    xce                     ; emulated mode
    .A8                     ; Tell assembler Accumulator is 8 bit
    .I8                     ; Tell assembler Index registers are 8 bit
.endmacro
```
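For readers following the macro: on the 65816, MVN takes the byte count minus one in A, copies a byte from the address in X to the address in Y, increments both index registers, decrements A, and repeats until A wraps from $0000 to $FFFF. Because it copies in ascending order, it handles overlapping moves where the destination is below the source, which is the scroll-up case. The Python below is only a sketch of that behavior: the function name `mvn` is hypothetical, the bank operands are ignored (a single 64K bank is assumed), and the 7-cycles-per-byte figure is the one quoted in the macro comment.

```python
def mvn(mem: bytearray, a: int, x: int, y: int) -> tuple:
    """Sketch of 65816 MVN inside one bank (bank operands ignored).

    Copies a + 1 bytes from mem[x...] to mem[y...] in ascending order.
    Returns (A, X, Y, cycles) as they would be after the instruction,
    assuming 7 cycles per byte.
    """
    count = (a & 0xFFFF) + 1
    for _ in range(count):
        mem[y] = mem[x]
        x = (x + 1) & 0xFFFF
        y = (y + 1) & 0xFFFF
        a = (a - 1) & 0xFFFF          # A ends at $FFFF when the move is done
    return a, x, y, 7 * count


# Scroll a tiny 4-row x 4-byte "screen" up by one row.
screen = bytearray(range(16))
before = bytes(screen)
bytes_to_move = 12                      # rows 1..3 move into rows 0..2
result = mvn(screen, bytes_to_move - 1, 4, 0)
```

Note that the last row is left untouched by the move itself, which is why the scroll code still has to clear it separately.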
flashjazzcat (posted August 30, 2016):

Are you using an OS that can handle native mode? Try: http://drac030.krap.pl/en-specyfikacja.php
JamesD (author, posted August 30, 2016):

If Altirra didn't include it, I don't have it. I just downloaded it, but it's an ARC file... and I have nothing that recognizes the compression.
flashjazzcat (posted August 30, 2016):

Altirra will open it as a disk. You'll need SpartaDOS X to read it.
sanny (posted August 30, 2016):

Quoting JamesD: "I tried disabling interrupts, but it happens anyway so it's a non-maskable interrupt?"

Yes, the VBI (vertical blank interrupt) is an NMI.
phaeron (posted August 31, 2016):

To confirm, neither the Atari OS nor Altirra's built-in OS will support running the '816 in native mode with interrupts enabled. Not even m8x8 mode (native mode with 8-bit accumulator and index registers) is safe. Consider what the TXS instruction does. Also, even if you do use a native-mode-capable OS, you should run accelerated (>=3.58MHz), as the overhead for managing interrupts in native mode can be significant.
drac030 (posted August 31, 2016):

On a non-accelerated machine, native interrupts work tolerably, except the DLI, where the delay from saving the context is comparatively huge. High-frequency timer IRQs can also have problems. I can only add that on the Atari it is much more advantageous to use display list (DL) scrolling to scroll a text (terminal) display up and down than to physically copy the screen memory contents (7K in hires).
JamesD (author, posted August 31, 2016):

Quoting drac030: "I can only add that on Atari it is much more advantageous to use DL scrolls to scroll text (terminal) display up and down than physically copying the screen memory contents (7k in hires)."

But one of the things I want to do is share the screen with some graphics... which will be horribly slow. The interrupt may take a few clock cycles, but using the native-mode MVN saves over 6,000 clock cycles. I'm thinking the interrupt overhead is going to be less.
JamesD (author, posted August 31, 2016):

Looks like I'll have to do a little research. I don't need to use any interrupts right now, but I will in the future. But then I don't think I'll be printing text at the same time.
JamesD (author, posted August 31, 2016):

This gets it working, which is my main concern at the moment.

Edit: SEI and CLI still need to be added.

```
;*********************************************
;* 65816 support
;*********************************************
.if Use65816 = 1
.P816
;Memory Move Negative. Move from hi to lo address.
;Memory is moved at the rate of seven clock cycles per byte.
; MVN increments the X and Y registers, and decrements A
.macro MOVEMEMN SourceAddress,DestAddress,BytesToMove,SourceBank,DestBank
    LDA NMIEN               ; save current NMIEN settings
    PHA
    LDA #0
    STA NMIEN               ; turn off non maskable interrupts for 65816
    clc
    xce                     ; native mode
    rep #$31                ; set A, X, and Y to 16 bit, clear carry
    .A16                    ; Tell assembler Accumulator is 16 bit
    .I16                    ; Tell assembler Index registers are 16 bit
    LDX #SourceAddress      ; The source address for the memory move
    LDY #DestAddress        ; The destination address for the memory move
    LDA #BytesToMove-1      ; Number of bytes to move, minus 1 (MVN moves A+1 bytes)
    MVN SourceBank,DestBank ; RAM banks, should be 00,00 on 64K system
    sep #$31                ; set A, X, and Y to 8 bits, set carry
    ; sec
    xce                     ; emulated mode
    .A8                     ; Tell assembler Accumulator is 8 bit
    .I8                     ; Tell assembler Index registers are 8 bit
    PLA
    STA NMIEN               ; restore non maskable interrupts for 65816
.endmacro
.endif
```
phaeron (posted September 1, 2016):

You can't read NMIEN, unfortunately. It's a write-only register.
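Because NMIEN can't be read back, the save/restore in the macro has to get the old value from somewhere else. One common pattern for write-only registers is a shadow copy in RAM that is updated on every write, with the save reading the shadow instead of the hardware. The Python below only models that pattern: `ShadowedRegister` and `with_nmis_disabled` are hypothetical names, and it assumes every write to NMIEN in the program goes through the shadow (nothing in this thread says the OS keeps such a copy). Alternatively, if the program knows which NMIs it wants, it can simply write that known value back (bit 7 enables DLIs, bit 6 enables the VBI).

```python
class ShadowedRegister:
    """Keeps a RAM copy of a write-only register such as NMIEN."""

    def __init__(self, hw_write, initial=0x00):
        self._hw_write = hw_write      # callback standing in for the hardware store
        self.shadow = initial & 0xFF

    def write(self, value):
        self.shadow = value & 0xFF     # update the copy first...
        self._hw_write(self.shadow)    # ...then the register itself


def with_nmis_disabled(reg, work):
    """Save NMIEN from the shadow, write 0, run work(), then restore."""
    saved = reg.shadow
    reg.write(0x00)
    try:
        return work()
    finally:
        reg.write(saved)


writes = []                            # log of values "stored" to the register
nmien = ShadowedRegister(writes.append)
nmien.write(0x40)                      # VBI enabled, DLIs off
result = with_nmis_disabled(nmien, lambda: "block moved")
```

The design choice is simply that the single source of truth moves from the hardware to RAM; it breaks as soon as any code writes NMIEN directly.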
JamesD (author, posted September 1, 2016):

Thanks.
ricortes (posted September 1, 2016):

If you have some time: I think a stock 6502 will write a Graphics 8 screen about 3 times per second. Could you benchmark this to see how much quicker the '816 code runs?
JamesD (author, posted September 1, 2016):

Quoting ricortes: "Could you bench mark this to see how much quicker the 816 code runs?"

Well, as I said earlier, I think the memory move is only 1 clock cycle per byte faster than the unrolled code. But there are 32 x 192 - (8 x 32) = 5,888 bytes to move on a 64-column screen, so that's 5,888 clock cycles faster. An 80-column screen has 40 x 192 - (8 x 40) = 7,360 bytes to move, so that's 7,360 clock cycles faster. There are also a few hundred cycles tied up in the 6502 loop overhead. Most importantly, you can kind of see what's scrolling past on the screen with the MVN version.

Writing a screen full of text runs at the same speed, since I'm not using any 65816 instructions in the text rendering code. And as I said before, about 140 characters are printed every scan of the display. 140 x 60 (I'm using NTSC) = 8,400 characters per second. An 80-column screen holds 1,920 characters. 8,400 / 1,920 = 4.375, so about 4.4 screens per second if you don't scroll the screen.

I'm not sure about a "benchmark", but I plan on doing a side-by-side video. I'm guessing the 65816 will be ahead by at least 4 lines when scrolling one screen of data. That's based on the difference on the MC-10 before and after the change.
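The arithmetic above can be checked with a few lines of Python. It assumes, as the post does, that MVN saves 1 cycle per byte over the unrolled loop, with 192 scanlines, 8-scanline character rows (so 24 text rows), 32 bytes per scanline in 64-column mode and 40 in 80-column mode; the function names are hypothetical.

```python
SCANLINES = 192
CHAR_HEIGHT = 8                          # scanlines per text row
TEXT_ROWS = SCANLINES // CHAR_HEIGHT     # 24 text rows


def bytes_to_scroll(bytes_per_scanline):
    """Bytes moved to scroll the bitmap up by one text row."""
    return bytes_per_scanline * SCANLINES - CHAR_HEIGHT * bytes_per_scanline


def screens_per_second(chars_per_frame, frames_per_second, columns):
    """Full screens of text printed per second when no scrolling happens."""
    return chars_per_frame * frames_per_second / (columns * TEXT_ROWS)


CYCLES_SAVED_PER_BYTE = 1                # the post's estimate for MVN vs unrolled 6502
saved_64 = bytes_to_scroll(32) * CYCLES_SAVED_PER_BYTE   # 64 columns
saved_80 = bytes_to_scroll(40) * CYCLES_SAVED_PER_BYTE   # 80 columns
rate_80 = screens_per_second(140, 60, 80)                # NTSC, 140 chars per frame
```

If the per-byte saving turns out to be larger than 1 cycle, the savings scale linearly with `CYCLES_SAVED_PER_BYTE`.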
drac030 (posted September 1, 2016):

MVN is not very efficient (even if still faster than anything on the 6502), but if speed is the point, you can still try 16-bit LDA/STA (in native mode). Such a pair is 10 clocks, but it transfers two bytes, so it is 5 clock cycles per byte.
JamesD (author, posted September 1, 2016):

I can do that; it's not a huge change from my existing unrolled loop. Just set A to 16 bits, double the step counter, and cut the number of loops in half.

Right now I have another puzzle to solve. The 65816 source code is much smaller than the 6502 code because it eliminates the partially unrolled loop. So why is the XEX file over 1,700 bytes larger? I'll have to compare the .lst files and look at what's being linked in. <sigh>
phaeron (posted September 2, 2016):

Quoting drac030: "you can still try 16-bit LDA/STA (in the native mode). Such a pair is 10 clocks, but it transfers two bytes, so it is 5 clock cycles per byte."

Isn't that 11 clocks? Indexed stores always take the extra page-crossing clock.
JamesD (author, posted September 2, 2016):

Quoting phaeron: "Isn't that 11 clocks?"

That would still be at least 20% faster than MVN, even with the loop overhead in my code. To see which is faster, I may have to disable the VZ code that scans for a key and pauses the display if it detects one.
phaeron (posted September 2, 2016):

You'd have to unroll by at least 3x to beat MVN, according to my calculations.

Seeing as you have interrupts disabled, though, the S and D registers are also usable. That means you should be able to copy aligned pages from the direct page (D) to the stack (S) at just over 3 cycles/byte with PEI (dp), if you're willing to unroll 128x.
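To see where an unroll factor like that comes from, here is a rough model: a loop that copies `unroll` 2-byte steps per pass at `step_cycles` each, plus a fixed `overhead` per pass for updating the index and branching. The 11-cycle indexed 16-bit LDA/STA pair and the 6-cycle PEI (dp) are the figures under discussion, and MVN is 7 cycles per byte; the overhead values below are hypothetical, not measured from JamesD's code, so the break-even point moves with them.

```python
from fractions import Fraction
from typing import Optional

MVN_CYCLES_PER_BYTE = 7


def loop_cycles_per_byte(unroll, step_cycles, overhead):
    """Average cycles per byte for a loop doing `unroll` 2-byte steps per pass."""
    return Fraction(unroll * step_cycles + overhead, 2 * unroll)


def min_unroll_to_beat_mvn(step_cycles, overhead, limit=256) -> Optional[int]:
    """Smallest unroll factor that is strictly faster than MVN, or None."""
    for n in range(1, limit + 1):
        if loop_cycles_per_byte(n, step_cycles, overhead) < MVN_CYCLES_PER_BYTE:
            return n
    return None


indexed_8 = min_unroll_to_beat_mvn(11, 8)     # hypothetical 8-cycle loop overhead
indexed_12 = min_unroll_to_beat_mvn(11, 12)   # hypothetical 12-cycle loop overhead
never = min_unroll_to_beat_mvn(14, 0)         # 7 cycles/byte can only tie MVN
pei_page = loop_cycles_per_byte(128, 6, 10)   # PEI (dp) x128, hypothetical 10-cycle overhead
```

With these assumed overheads the indexed loop needs 3x or 5x unrolling, and a 128x PEI loop lands just over 3 cycles per byte, in line with the post.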
drac030 (posted September 2, 2016):

Quoting phaeron: "Isn't that 11 clocks?"

Indexed, yes, but I meant just a totally unrolled series of LDA/STA abs. It would take something like 45K, but if you have some RAM past the first 64K, this should not be a problem.

Even with interrupts enabled, when the OS takes care of the D register contents, you should be able to reduce this to 4 clocks per byte using LDA/STA zp instead (with the slight overhead of reloading D after every 256 bytes).
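The per-byte figures traded in this exchange can be lined up side by side. The cycle counts are the ones quoted in the thread (assuming a page-aligned direct page and ignoring loop or D-reload overhead), and 7,360 bytes is the 80-column scroll size worked out earlier in the thread; the table layout and function names are only for illustration.

```python
from fractions import Fraction

# (cycles, bytes moved) per step, as quoted in the thread
METHODS = {
    "MVN": (7, 1),
    "16-bit LDA/STA abs, unrolled": (10, 2),
    "16-bit LDA/STA abs,X": (11, 2),
    "16-bit LDA/STA dp, unrolled": (8, 2),   # the quoted 4 clocks per byte
    "PEI (dp), unrolled": (6, 2),
}


def cycles_per_byte(name):
    cycles, nbytes = METHODS[name]
    return Fraction(cycles, nbytes)


def cycles_for(name, total_bytes):
    return cycles_per_byte(name) * total_bytes


SCROLL_80 = 7360
for name in METHODS:
    print(f"{name:30} {float(cycles_per_byte(name)):4.1f} cycles/byte, "
          f"{float(cycles_for(name, SCROLL_80)):8.0f} cycles per 80-column scroll")
```

Real loops add overhead on top of these numbers, so they are lower bounds rather than measured timings.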
JamesD (author, posted September 2, 2016):

My code uses rather large unrolled loops, but I limited the total code size to... um... a bit less than 45K. It uses at least one LDA/STA pair per text row with 64 characters per line, and two pairs as an option. 80 columns always uses two pairs per row. The last LDA/STA pair uses LDA #0 to clear the last line in the same loop. The loop overhead is INX/BEQ/JMP or INX/CMP/BEQ/JMP. At most it executes 255 times with 64 columns, and half that with the larger unrolled code.

I can speed up the MVN code slightly by unrolling the code used to clear the last text row. If I keep 256 or 320 bytes of zeros just above the text screen, I can use the single MVN to clear the last row as well. A good compromise here would be 32 or 40 empty bytes above the screen, and then three more MVN instructions, each taking advantage of the previous MVN to have a larger block it can copy. I could also switch the clear-screen routine to use this approach. With two blank rows above the screen, I only need two more MVN calls.

If I were to completely unroll the loop, I would have the program generate the code. I have already written similar code for the 6803 version that could be used with a RAM expansion for the MC-10. That is untested, though, because no emulator supports it yet and I don't have the real hardware expansion.

Actually, I may even have the 6502 code generated. That way I could generate the appropriate 6502 or 65816 version on the fly at startup. The difference between the two is so small it would be trivial to do.
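The "each MVN copies a larger block" idea for clearing the last text row is a doubling fill: start with a short run of zeros, copy it onto the bytes right after it, and repeat with the now twice-as-long run. Below is a Python sketch with a hypothetical function name. It assumes the row already begins with the zero seed (for example, the first scanline pulled in by the scroll MVN itself) and reads the "two blank rows" case as two zeroed scanlines (64 bytes in 64-column mode).

```python
def fill_by_doubling(mem, start, seed_len, row_len):
    """Extend mem[start:start+seed_len] across row_len bytes by repeated block copies.

    Each pass copies the already-filled prefix onto the bytes right after it,
    so the filled length doubles (the last pass may be shorter).
    Returns the number of block moves used.
    """
    if seed_len <= 0:
        raise ValueError("seed_len must be positive")
    filled = seed_len
    moves = 0
    while filled < row_len:
        chunk = min(filled, row_len - filled)
        dst = start + filled
        mem[dst:dst + chunk] = mem[start:start + chunk]
        filled += chunk
        moves += 1
    return moves


row64 = bytearray(b"\xff" * 300)
row64[:32] = bytes(32)                     # one zeroed scanline, 64-column mode
moves_64 = fill_by_doubling(row64, 0, 32, 256)

row80 = bytearray(b"\xff" * 400)
row80[:40] = bytes(40)                     # one zeroed scanline, 80-column mode
moves_80 = fill_by_doubling(row80, 0, 40, 320)

moves_two_lines = fill_by_doubling(bytearray(256), 0, 64, 256)
```

The copies never overlap (each pass writes only past the filled prefix), so each one maps onto a plain block move: 32 to 64 to 128 to 256 bytes, or 40 to 80 to 160 to 320.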
JamesD (author, posted September 7, 2016):

Routine to detect a 65816 CPU, from "The Fridge": http://www.ffd2.com/fridge/scpu/detect-scpu.s
+Stephen (posted September 7, 2016):

Quoting JamesD: "Routine to detect a 65816 CPU."

That's a great bit of code - short and easy to understand too.
JamesD (author, posted September 8, 2016):

Quoting +Stephen: "That's a great bit of code"

I wish testing the Z80 were so simple. I figured out you could test for a Z180 with its multiply instruction (MLT), as that opcode is an undocumented NEG on the Z80... so in theory, it should work on anything. But some Z80s (mostly FPGA cores) support MLT and eZ80 opcodes, but not the rest of the features. They may even support some undocumented opcodes and Z180 features, which should not go together.
JamesD (author, posted September 9, 2016):

If I only set A to 16 bits...

```
    xce             ; native mode
    rep #$21        ; set A to 16 bit, clear carry
```

can I use

```
    sep #$21        ; set A to 8 bits, set carry
    xce
```

to go back to normal, or do I have to use

```
    sep #$31        ; set A, X, and Y to 8 bits, set carry
    xce
```

Altirra seems to require the latter.
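For reference, REP and SEP only touch the status bits named in their operand: $21 is M|C and $31 is M|X|C. The WDC 65816 documentation also specifies that switching to emulation mode with XCE forces M and X to 1 (8-bit registers). The Python below models just those flag rules for the sequence in the post, assuming a `clc` before the first XCE as in the earlier macro. It is not a model of Altirra, the function names are hypothetical, and it does not explain what Altirra is showing.

```python
C, X, M = 0x01, 0x10, 0x20             # carry, index-width and accumulator-width bits of P


def flags_in(mask):
    """Names of the M/X/C bits selected by a REP/SEP operand."""
    return [name for name, bit in (("M", M), ("X", X), ("C", C)) if mask & bit]


def rep(p, mask):
    return p & ~mask & 0xFF             # REP clears the selected bits


def sep(p, mask):
    return (p | mask) & 0xFF            # SEP sets the selected bits


def xce(p, e):
    """Swap carry with the emulation bit; entering emulation forces M = X = 1."""
    new_e = bool(p & C)
    p = (p & ~C & 0xFF) | (C if e else 0)
    if new_e:
        p |= M | X
    return p, new_e


def round_trip(exit_mask, p=0x34, e=True):
    """clc / xce / rep #$21 / sep #exit_mask / xce, starting in emulation mode."""
    p = rep(p, C)                       # clc
    p, e = xce(p, e)                    # native mode (M and X still 1)
    p = rep(p, 0x21)                    # 16-bit A only, clear carry
    p = sep(p, exit_mask)               # back to 8-bit A, set carry
    return xce(p, e)                    # emulation mode
```

In this model both exit sequences end in the same state, since X was never cleared by `rep #$21` in the first place.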