65816 programming issues

JamesD · August 30, 2016

Ok, I've been trying to use the 65816's MVN instruction to scroll the screen on my 64/80 column text code.
I've single stepped through the code on Altirra, and the MVN seems to be working... right up until the vertical blank interrupt.
At least I think that's what's up.
I tried disabling interrupts, but it happens anyway so it's a non-maskable interrupt?
Registers are in 16 bit mode, so that would be very bad for 8 bit code.
So I guess I need my own interrupt handler?
Is there any sample interrupt code for the Atari out there? I've handled interrupts on other machines, but I don't know the Atari hardware that well.

If someone has other ideas as to what is causing the problem, I'm open to suggestions.

I'm using the following macro within CA65. I verified the addresses in the .lst file so that *shouldn't* be the problem. And I used the debug screen on Altirra so I could actually see the proper bytes being moved.

;Memory Move Negative.  Move from hi to lo address.
;Memory is moved at the rate of seven clock cycles per byte.
.macro	MOVEMEMN	SourceAddress,DestAddress,BytesToMove,SourceBank,DestBank
	clc
	xce					; native mode
	rep	#$31				; set A, X, and Y to 16 bit, clear carry
.A16						; Tell assembler Accumulator is 16 bit
.I16						; Tell assembler Index registers are 16 bit
	LDX	#SourceAddress			; The source address for the memory move
	LDY	#DestAddress			; The destination address for the memory move
	LDA	#BytesToMove-1			; Byte count minus 1, because MVN moves A+1 bytes
	MVN	SourceBank,DestBank		; RAM banks, should be 00,00 on 64K system
	sep	#$31				; set A, X, and Y to 8 bits, set carry
;	sec
	xce					; emulated mode
.A8						; Tell assembler Accumulator is 8 bit
.I8						; Tell assembler Index registers are 8 bit
.endmacro

flashjazzcat · August 30, 2016

Are you using an OS which can handle native mode? Try: http://drac030.krap.pl/en-specyfikacja.php

JamesD · August 30, 2016

If Altirra didn't include it, I don't have it.

I just downloaded it but it's an ARC file... and I have nothing that recognizes the compression.

flashjazzcat · August 30, 2016

Altirra will open it as a disk. You'll need SpartaDOS X to read it.

Edited August 30, 2016 by flashjazzcat

sanny · August 30, 2016

> Ok, I've been trying to use the 65816's MVN instruction to scroll the screen on my 64/80 column text code.
> I've single stepped through the code on Altirra, and the MVN seems to be working... right up until the vertical blank interrupt.
> At least I think that's what's up.
> I tried disabling interrupts, but it happens anyway so it's a non-maskable interrupt?
> Registers are in 16 bit mode, so that would be very bad for 8 bit code.

Yes, the VBI (vertical blank interrupt) is an NMI.

phaeron · August 31, 2016

To confirm, neither the Atari OS nor Altirra's built-in OS will support running '816 native mode with interrupts enabled. Not even m8x8 mode is safe -- consider what the TXS instruction does. Also, even if you do use a native mode capable OS, you should run accelerated (>=3.58MHz) as the overhead for managing interrupts in native mode can be significant.
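
For anyone wondering about the TXS remark, here is a minimal sketch of the effect, assuming standard 65816 behaviour and that no interrupt arrives while it runs. With 8-bit index registers in native mode, TXS copies X with a zero high byte, so the common 6502 idiom of saving and restoring the stack pointer through X moves the stack into page zero.

```asm
.P816
	clc
	xce				; native mode; M and X flags stay set, so A/X/Y are 8 bit
	tsx				; X = low byte of S (S is $01xx at this point)
	txs				; S is now $00xx: the stack has moved into page zero
	sec
	xce				; emulation mode forces the high byte of S back to $01
```

Any 8-bit OS or handler code that does the same tsx/txs dance while the CPU is in native mode would hit this.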

drac030 · August 31, 2016

On a non-accelerated machine native interrupts work tolerably, except the DLI, where the delay from saving the context is comparatively huge. A high-frequency timer IRQ can also have problems.

I can only add that on the Atari it is much more advantageous to scroll a text (terminal) display up and down with display list (DL) scrolls than to physically copy the screen memory contents (7k in hires).

Edited August 31, 2016 by drac030

JamesD · August 31, 2016

> On a non-accelerated machine native interrupts work tolerably, except the DLI, where the delay from saving the context is comparatively huge. Also high frequency timer IRQ can have problems.
>
> I can only add that on Atari it is much more advantageous to use DL scrolls to scroll text (terminal) display up and down than physically copying the screen memory contents (7k in hires).

But one of the things I want to do is share the screen with some graphics... which will be horribly slow.

The interrupt may take a few clock cycles, but using the native mode MVN saves over 6000 clock cycles.

I'm thinking the interrupt overhead is going to be less.

JamesD · August 31, 2016

Looks like I'll have to do a little research.
I don't need to use any interrupts right now, but I will in the future. But then I don't think I'll be printing text at the same time.

JamesD · August 31, 2016

This gets it working which is my main concern at the moment.
*edit* SEI CLI still need to be added

;*********************************************
;* 65816 support
;*********************************************
.if Use65816 = 1
.P816
;Memory Move Negative.  Move from hi to lo address.
;Memory is moved at the rate of seven clock cycles per byte.
; MVN increments the X and Y registers, and decrements A
.macro	MOVEMEMN	SourceAddress,DestAddress,BytesToMove,SourceBank,DestBank
	LDA	NMIEN                ; save current NMIEN settings
	PHA
	LDA	#0
	STA	NMIEN			; turn off non maskable interrupts for 65816
	clc
	xce					; native mode
	rep	#$31				; set A, X, and Y to 16 bit, clear carry
.A16						; Tell assembler Accumulator is 16 bit
.I16						; Tell assembler Index registers are 16 bit
	LDX	#SourceAddress			; The source address for the memory move
	LDY	#DestAddress			; The destination address for the memory move
	LDA	#BytesToMove-1			; Byte count minus 1, because MVN moves A+1 bytes
	MVN	SourceBank,DestBank		; RAM banks, should be 00,00 on 64K system
	sep	#$31				; set A, X, and Y to 8 bits, set carry
;	sec
	xce					; emulated mode
.A8						; Tell assembler Accumulator is 8 bit
.I8						; Tell assembler Index registers are 8 bit
	PLA
	STA	NMIEN			; restore non maskable interrupts for 65816
.endmacro
.endif

Edited August 31, 2016 by JamesD

phaeron · September 1, 2016

You can't read NMIEN, unfortunately. It's a write-only register.
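
A minimal sketch of one way around that, assuming the program keeps its own copy of whatever it last wrote to NMIEN. The names NMIENShadow, DISABLEINTS and RESTOREINTS are hypothetical, the stock OS keeps no shadow for this register, and the initial value of $40 (VBI on, DLI off) is an assumption about the machine's state. NMIEN is taken to be $D40E, as in the macro above.

```asm
; Sketch only: every name here except NMIEN is made up for illustration.
.data
NMIENShadow:	.byte	$40		; assumed: VBI enabled, DLI disabled
.code

.macro	DISABLEINTS
	php					; save the I flag so the IRQ state can be restored
	sei					; mask IRQs
	lda	#0
	sta	NMIEN				; disable VBI and DLI (write-only register)
.endmacro

.macro	RESTOREINTS
	lda	NMIENShadow			; the value this program last wrote to NMIEN
	sta	NMIEN
	plp					; restore the previous I flag
.endmacro
```

Both macros are meant to run in emulation mode, wrapped around the native mode block move. Any code that changes NMIEN elsewhere (for example to turn on DLIs) would have to store the same value in NMIENShadow.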

JamesD · September 1, 2016

> You can't read NMIEN, unfortunately. It's a write-only register.

Thanks

ricortes · September 1, 2016

If you have some time: I think a stock 6502 will write a graphics 8 screen ~3x/sec. Could you benchmark this to see how much quicker the 816 code runs?

JamesD · September 1, 2016

> If you have some time. I think a stock 6502 will write a graphics 8 screen ~3x/sec. Could you benchmark this to see how much quicker the 816 code runs?

Well, as I said earlier, I think the memory move is only 1 clock cycle less than the unrolled code.

But there are 32 x 192 - (8 x 32) bytes to move on a 64 column screen. That's 5,888 clock cycles faster.

An 80 column screen is 40 x 192 - (8 x 40) bytes to move. That's 7,360 clock cycles faster.

There are also a few hundred cycles locked up in the 6502 loop.

Most importantly, you can kinda see what is scrolling past on the screen with the MVN version.

Writing a screen full of text is the same speed, as I'm not using any 65816 instructions in the text rendering code.

And as I said before, about 140 characters are printed every scan of the display.

140 x 60 (I'm using NTSC) = 8400 characters per second.

An 80 column screen holds 1,920 characters.

8,400 / 1,920 = 4.375. So about 4 1/3 screens per second if you don't scroll the screen.

I'm not sure about a "benchmark" but I plan on doing a side by side video.

I'm guessing the 65816 will be ahead by at least 4 lines to scroll one screen of data.

That's based on the difference vs the MC-10 before and after the change.

drac030 · September 1, 2016

MVN is not very efficient (even if still faster than any 6502), but if the speed is the point, you can still try 16-bit LDA/STA (in the native mode). Such a pair is 10 clocks, but it transfers two bytes, so it is 5 clock cycles per byte.

Edited September 1, 2016 by drac030

JamesD · September 1, 2016

I can do that, it's not a huge change from my existing unrolled loop. Just set A to 16 bit, double the step counter, and cut the number of loops in half.
Right now I have another puzzle to solve.
The 65816 source code is much smaller than the 6502 code because it eliminates the partially unrolled loop.
So why is the XEX file over 1700 bytes larger? I'll have to compare .lst files and look what's being linked in.
<sigh>

phaeron · September 2, 2016

> you can still try 16-bit LDA/STA (in the native mode). Such a pair is 10 clocks, but it transfers two bytes, so it is 5 clock cycles per byte.

Isn't that 11 clocks? Indexed stores usually always take the extra page-crossing clock.

JamesD · September 2, 2016

> Isn't that 11 clocks? Indexed stores usually always take the extra page-crossing clock.

That would still be at least 20% faster than MVN even with the loop overhead in my code.

To see which is faster, I may have to disable the VZ code that scans for a key and pauses the display if it detects one.

phaeron · September 2, 2016

You'd have to unroll by at least 3x to beat MVN, according to my calculations.

Seeing as though you have interrupts disabled, though... the S and D registers are also usable. That means you should be able to copy aligned pages from D to S at just over 3 cycles/byte with PEI (dp), if you're willing to unroll 128x.
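
A rough sketch of that PEI idea, assuming native mode, a 16-bit accumulator, IRQs masked, NMIs off, and both pages in bank 0 (the stack and direct page always live there). COPYPAGE and SavedS are hypothetical names. Each PEI takes 6 clocks with D page-aligned and pushes two bytes, which is where the figure of just over 3 cycles per byte comes from.

```asm
; Sketch only. SavedS is a hypothetical 2-byte variable that must not sit
; inside either page being copied.
.P816
.macro	COPYPAGE	SourcePage,DestPage	; both must be $xx00
	tsc
	sta	SavedS				; remember the real stack pointer
	lda	#SourcePage
	tcd					; D = source page, so dp $00-$FF reads from it
	lda	#DestPage+$FF
	tcs					; S = last byte of the destination page
	.repeat	128, I
	pei	($FE-I*2)			; push the word at D+$FE, D+$FC, ... down to D+$00
	.endrepeat
	lda	#0
	tcd					; D back to page zero
	lda	SavedS
	tcs					; restore the stack pointer
.endmacro
```

The pushes run from the top of the page downwards, so each word lands at the same offset in the destination page that it had in the source page.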

Edited September 2, 2016 by phaeron

drac030 · September 2, 2016

> Isn't that 11 clocks? Indexed stores usually always take the extra page-crossing clock.

Indexed yes, but I meant just a totally unrolled series of lda/sta abs. It would take about 45k, but if you have some RAM past the first 64k, this should not be a problem.

Even with interrupts enabled, when the OS takes care of the D register contents, you should be able to reduce this to 4 clocks per byte using lda/sta zp instead (with slight overhead of reloading the D after every 256 bytes).
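
The totally unrolled lda/sta abs series mentioned above could be generated by the assembler rather than typed out. A minimal sketch, assuming native mode with a 16-bit accumulator and both blocks in the current data bank; COPYWORDS is a hypothetical macro name.

```asm
; Sketch only: expands to WordsToMove lda/sta pairs, 6 bytes of code each.
; The accumulator must already be 16 bit when this runs; the opcodes are
; the same either way, so no .A16 is needed for the macro itself.
.P816
.macro	COPYWORDS	SourceAddress,DestAddress,WordsToMove
	.repeat	WordsToMove, I
	lda	SourceAddress+I*2		; 5 clocks: fetch two bytes
	sta	DestAddress+I*2			; 5 clocks: store two bytes
	.endrepeat
.endmacro
```

The copy runs from low to high addresses, so it is safe for an upward scroll where the destination sits below the source and the two blocks overlap.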

Edited September 2, 2016 by drac030

JamesD · September 2, 2016

My code uses rather large unrolled loops but I limited the total code size to... um... a bit less than 45K.
It uses at least 1 lda sta pair per text row on 64 characters per line and 2 pairs as an option.
80 columns always uses 2 pairs per row.

The last lda sta pair uses lda #0 to clear the last line in the same loop.
The loop overhead is an inx beq jmp or inx cmp beq jmp. At most it executes 255 times with 64 columns, half that with the larger unrolled code.

I can speed up the MVN code slightly by unrolling the code used to clear the last text row.
If I keep 256 or 320 bytes of zeros just above the text screen I can use the single MVN to clear the last row as well.
A good compromise here would be 32 or 40 empty bytes above the screen and then use three more MVN instructions, each taking advantage of the previous MVN to have a larger block it can copy. I could also switch the clear screen to use this approach. With two blank rows above the screen I only need 2 more MVN calls.

If I were to completely unroll the loop, I would have the program generate the code. I have already written similar code for the 6803 version that could be used with a RAM expansion for the MC-10. That is untested though because no emulator supports it yet and I don't have the real hardware expansion.
Actually, I may even have the 6502 code generated. That way I could generate the appropriate 6502/65816 version on the fly at startup. The difference between the two is so small that it would be trivial to do.

JamesD · September 7, 2016

Routine to detect a 65816 CPU. From "The Fridge"
http://www.ffd2.com/fridge/scpu/detect-scpu.s

+Stephen · September 7, 2016

> Routine to detect a 65816 CPU. From "The Fridge"
> http://www.ffd2.com/fridge/scpu/detect-scpu.s

That's a great bit of code - short and easy to understand too.

JamesD · September 8, 2016

> That's a great bit of code - short and easy to understand too.

I wish testing the Z80 were so simple.

I figured out you could test for a Z180 with MLT, as that opcode is an undocumented NEG on the Z80... so in theory, it should work on anything.

But some Z80s (mostly FPGA cores) support MLT and eZ80 opcodes, but not the rest of the features.

They may even support some undocumented opcodes and Z180 features which should not go together.

JamesD · September 9, 2016

If I only set the A to 16 bit...

	xce						; native mode
	rep	#$21					; set A to 16 bit, clear carry

can I use

	sep	#$21					; set A to 8 bits, set carry
	xce

To go back to normal, or do I have to use

	sep	#$31					; set A, X, and Y to 8 bits, set carry
	xce

Altirra seems to be requiring me to use the latter.
