
Alternative bankswitching technique?


batari


For F8 bankswitching, to jump from bank 1 to bank 2 I've just been doing STA $1FF9, but execution then continues three bytes later in the new bank (right after the STA), so you need support code at every bankswitch to get back to the proper place, with the code in both banks aligned just right.

 

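Instead, I came up with this idea. In this case the stack pointer is $FF before we start, so a JSR stores its return address at $01FE (low byte) and $01FF (high byte), which the 2600 mirrors at $FE and $FF. Then do a JSR $FFF0 in either bank to switch to the other.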

 

bank 1 contains:

 

FFF0: DEC $FE

FFF2: DEC $FE

FFF4: DEC $FE

FFF6: BNE $FFF9 (always taken unless FE happens to be zero...)

FFF8: RTS

 

bank 2 contains:

FFF0: NOP

FFF1: NOP

FFF2: DEC $FE

FFF4: DEC $FE

FFF6: DEC $FE

FFF8: NOP (This byte should never be accessed, actually)

FFF9: RTS

 

It seems to me that this should decrement the return address on the stack by 3 bytes (if I'm understanding the stack properly). Once the processor fetches the next instruction, it puts $1FF8 or $1FF9 on the address lines, so the banks should switch right away, perhaps even before the address setup time is up. The new byte (an RTS in either case) should then come from the other bank's ROM, so not only do we switch banks, we also return to the exact address of the original JSR, in the other bank.

 

However, I tried this code in an emulator and it didn't work. Is this because the emulators don't work like real hardware or is there a fundamental problem with my idea?
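A quick way to check the return-address arithmetic described above is a small Python model (a simulation, not 6502 code). It assumes S=$FF before the JSR, so DEC $FE hits the low byte of the pushed address through the 2600's stack mirror; the function names are made up for illustration. It also shows one limit of the trick: DEC $FE never borrows into the high byte, so a JSR whose address ends in $FE, $FF or $00 comes back to the wrong place.

```python
def jsr_push(jsr_addr):
    """JSR pushes the address of its own last byte (jsr_addr + 2).
    Returns (byte at $01FF, byte at $01FE) when S starts at $FF."""
    ret = (jsr_addr + 2) & 0xFFFF
    return ret >> 8, ret & 0xFF

def dec_low_byte(hi, lo, times=3):
    """Each DEC $FE changes only the low byte; a borrow never reaches $FF."""
    for _ in range(times):
        lo = (lo - 1) & 0xFF
    return hi, lo

def rts_target(hi, lo):
    """RTS pulls the address and adds 1 before fetching the next opcode."""
    return (((hi << 8) | lo) + 1) & 0xFFFF

# JSR $FFF0 at $F005 (the startup example): comes back to $F005, in the other bank
print(hex(rts_target(*dec_low_byte(*jsr_push(0xF005)))))   # 0xf005

# JSR at $F0FF pushes $F101; three DECs give $F1FE, so the RTS lands on $F1FF
print(hex(rts_target(*dec_low_byte(*jsr_push(0xF0FF)))))   # 0xf1ff
```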

---

Interesting, but sorry, I don't think I fully understand your code here. Why do you have to correct the return address?

 

And I am not sure which happens faster on real hardware: fetching the next instruction or (most likely) switching the bank. A slow bankswitch might cause some problems:

1. The branch in the first bank points to nothing (an additional RTS would help here).

2. After the NOP executes, the following RTS might be fetched, switching back to the wrong bank.

 

Did you have a look at the z26 trace logs?

---

You could just use a jump table at the very end of each bank. After the selected subroutine executes, have it perform a JMP back to address $DFF4 (where a return bankswitch is waiting). Once that is done, an RTS instruction sitting at $FFF7 sends the program back to whatever JSR sent it there. Here's an example that has 3 bankswitched subroutines...

 

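;@ $DFEB (bank 1)
        jmp sub1        ; execute subroutine 1
        jmp sub2        ; execute subroutine 2
        jmp sub3        ; execute subroutine 3

Return:                 ;@ $DFF4
        sta $1ff9       ; select bank 2; next opcode comes from $FFF7
        nop             ; unused byte @ $DFF7


;@ $FFE8 (bank 2)
Jsub1:  sta $1ff8       ; select bank 1; next opcode is jmp sub1 @ $DFEB
Jsub2:  sta $1ff8       ; select bank 1; next opcode is jmp sub2 @ $DFEE
Jsub3:  sta $1ff8       ; select bank 1; next opcode is jmp sub3 @ $DFF1

; 3 unused
        nop
        nop
        nop

;@ $FFF7
        rts             ; caution: an RTS also dummy-reads the next byte ($FFF8, the bank 1 hotspot)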

 

 

So to execute the first subroutine, just JSR to Jsub1 from any point in bank 2. It bankswitches, the corresponding JMP in bank 1 sends it to the subroutine, and a JMP back to address $DFF4 flips it back to the RTS instruction. The overhead is 13 cycles either way: six for the JSR, four for the bankswitch, and three more for the JMP. And you'll only lose 4 bytes of ROM space (3 of which could be reclaimed if you have a 3-byte data table in bank 2 that you can put in place of those NOP instructions). Working backward, you can add as many subroutines as you need, and you can still use a time-critical bankswitch way down at $F000.
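The jump table depends on execution continuing three bytes past each STA, in the newly selected bank. Here's a small Python check of that alignment and of the 13-cycle figure, assuming (from the addresses in the post) that bank 1 is assembled at $Dxxx and bank 2 at $Fxxx, so only the low 12 address bits pick a byte within a 4K bank. `bank_offset` is a hypothetical helper name.

```python
def bank_offset(addr):
    """Byte offset within a 4K bank: $DFEB and $FFEB are the same ROM offset."""
    return addr & 0x0FFF

STA_ABS_LEN = 3                       # STA $1FF8 / STA $1FF9 are 3-byte instructions
jsub = [0xFFE8, 0xFFEB, 0xFFEE]       # bank 2: Jsub1..Jsub3 (STA $1FF8)
jmp_slots = [0xDFEB, 0xDFEE, 0xDFF1]  # bank 1: JMP sub1..sub3

# After the STA switches banks, the next opcode comes from 3 bytes further on
for entry, slot in zip(jsub, jmp_slots):
    assert bank_offset(entry + STA_ABS_LEN) == bank_offset(slot)

# Return path: STA $1FF9 at $DFF4 -> next opcode fetched at offset $FF7 (bank 2's RTS)
assert bank_offset(0xDFF4 + STA_ABS_LEN) == bank_offset(0xFFF7)

JSR, STA_ABS, JMP_ABS, RTS = 6, 4, 3, 6   # 6502 cycle counts
print(JSR + STA_ABS + JMP_ABS)   # 13: JSR Jsub1 -> STA $1FF8 -> JMP sub1
print(JMP_ABS + STA_ABS + RTS)   # 13: JMP Return -> STA $1FF9 -> RTS
```

This only checks alignment and cycle counts; it doesn't model the dummy read an RTS makes of the byte that follows it.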

---

Just Brainstorming:

 

You are in Bank 2 and want to switch to bank 1

 

FFF4: DEC $FE  

FFF6: DEC $FE  

FFF8: NOP (This byte should never be accessed, actually)  

FFF9: RTS  

 

When the processor fetches the opcode from location FFF8, the bankswitch should take place more or less immediately. Depending on how the bankswitching logic is designed (switching delay etc.), you switch before, during or after the opcode fetch.

 

That means you could fetch:

a) the NOP from this bank (the fetch happens before the switch)

b) the RTS from location FFF8 of the other bank (the switch happens before the fetch)

c) ? (I don't know what happens if you switch in the middle of the fetch) maybe you get a mix of RTS and NOP :-)

 

So this code brings you to bank 1, but you did not necessarily fetch the RTS. Can you try replacing your NOP with an RTS, so you can be sure that you fetch an RTS? I don't know how the emulators do the switching, but on real hardware you are only safe if you have an RTS in both banks.

 

But now you have a second problem, which has to do with how the microprocessor actually executes an RTS. Unfortunately the processor will not only fetch the RTS opcode, but will also do a dummy fetch of the NEXT byte, which is at location FFF9! Now guess what happens :-) ... We are back in bank 2 :-(

 

A solution would be to put the RTS at FFF7. Then you would switch banks during the RTS's dummy fetch of FFF8 (at least in theory :-)

 

Here is what RTS does, step by step (the PC+1 is the problem ...):

 

+-------+-------------+--------------------+------------+
| Cycle | Address Bus | Data Bus           | Read/Write |
+-------+-------------+--------------------+------------+
| 1     | PBR,PC      | Op Code            | R          |
| 2     | PBR,PC+1    | Internal Operation | R          |
| 3     | PBR,PC+1    | Internal Operation | R          |
| 4     | 0,S+1       | New PCL-1          | R          |
| 5     | 0,S+2       | New PCH            | R          |
| 6     | 0,S+2       | Internal Operation | R          |
| 1     | PBR,NewPC   | New Op Code        | R          |
+-------+-------------+--------------------+------------+
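The table above is from 65816 documentation (PBR is the 65816's program bank register). The 2600's 6507 has no bank register, and on the NMOS 6502 cycle 3 is a dummy read of the stack rather than of PC+1, but the cycle-2 read of PC+1 is the same. Here is a rough Python model of how that read interacts with the F8 hotspots. It only tracks which bank is selected after each bus access, so it says nothing about the timing question (what the data bus holds during a switch); `F8Cart`, `rts_bus` and `bank_after_rts` are made-up names, and its banks 0/1 are this thread's banks 1/2.

```python
# F8 hotspots: any access to $1FF8 selects bank 0, any access to $1FF9 selects bank 1
HOTSPOTS = {0x1FF8: 0, 0x1FF9: 1}

class F8Cart:
    def __init__(self, bank):
        self.bank = bank

    def touch(self, addr):
        """One bus access; the 6507 only has address lines A0-A12."""
        hotspot_bank = HOTSPOTS.get(addr & 0x1FFF)
        if hotspot_bank is not None:
            self.bank = hotspot_bank

def rts_bus(pc, sp, pulled):
    """Addresses an NMOS 6502 RTS puts on the bus; `pulled` is the return address - 1."""
    return [
        pc,                         # 1: fetch the RTS opcode
        (pc + 1) & 0xFFFF,          # 2: dummy read of the next byte  <- the problem
        0x100 | sp,                 # 3: dummy stack read, S+1
        0x100 | ((sp + 1) & 0xFF),  # 4: pull PCL, S+1
        0x100 | ((sp + 2) & 0xFF),  # 5: pull PCH
        pulled,                     # 6: dummy read, then PC = pulled + 1
    ]

def bank_after_rts(start_bank, rts_addr, sp=0xFD, pulled=0xF004):
    """Which bank is selected once the RTS finishes (sp/pulled are example values)."""
    cart = F8Cart(start_bank)
    for addr in rts_bus(rts_addr, sp, pulled):
        cart.touch(addr)
    return cart.bank

# Bank 1's RTS at $FFF8: its dummy read of $FFF9 selects bank 1 (this thread's "bank 2")
print(bank_after_rts(0, 0xFFF8))   # 1
# RTS at $FFF7 in this thread's "bank 2": the dummy read of $FFF8 selects bank 0
print(bank_after_rts(1, 0xFFF7))   # 0
```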

 

 

 

And let's have a look at the bank 1 code, which should bring us to bank 2:

 

FFF0: DEC $FE  

FFF2: DEC $FE  

FFF4: DEC $FE  

FFF6: BNE $FFF9 (always taken unless FE happens to be zero...)  

FFF8: RTS  

 

I don't really understand why the BNE $FFF9 is needed... but let's assume the branch is taken and we jump to FFF9. Then you switch banks again before, during or after the fetch of whatever is at FFF9. Shouldn't there also be an RTS at FFF9? FFF9 doesn't appear on the address bus until the processor fetches the opcode from there, but once it does, it matters what's in that cell. So I would try an RTS at FFF9 in both banks.

 

I hope I didn't misunderstand what you want to do there ....

---

Interesting, but sorry, I don't think I fully understand your code here. Why do you have to correct the return address?

 

I am correcting the return address so control passes to the other bank at the same address as the JSR, instead of three bytes later as happens when you LDA $1FFx. I wanted to do this because I'm doubling up a 4K game and wanted to avoid moving everything around. I thought it would be particularly useful for startup code or after pressing RESET, since the cartridge could be in either bank at that point: putting a JSR $FFF0 in bank 2 will automatically go to bank 1 without having to add NOPs here and there or jumps in various places.

 

For instance, the typical startup code in bank 1 might look like:

 

F000: SEI

CLD

LDX #$FF

TXS

... rest of code ...

 

If you begin in bank 2 but want to start in bank 1, you can't just start with an F000: LDA $1FF8 without other support code. However, if we put the following in bank 2:

 

F000: SEI

CLD

LDX #$FF

TXS

JSR $FFF0

 

then we'll automatically jump to bank 1 at the right place instead of three bytes ahead. Or at least that's what I wanted to do.

 

A solution would be to put the RTS at FFF7. Then you would switch banks during the dummy fetch of the RTS (at least in theory  

 

Here is what RTS does, step by step ( The PC+1 is the problem ...)

 

I tried this and it worked in StellaX and z26! It didn't work at all in PCAE, though, but does anyone use PCAE? The Stella and z26 emulators must emulate the PC+1 dummy read, because an RTS at $1FF7 did switch from bank 2 to bank 1. By the way, where did you find that cycle-by-cycle info on the 6502? It might be useful for figuring out some more dirty tricks.

 

You could just use a jump table at the very end of each bank.

 

I'll try this if I have problems with my idea. Whether my idea will really work or not depends on the actual bankswitching hardware, as suggested.

---

5 months later...

Raiders of the Lost Ark avoids fixed entry points by writing a short routine into RAM... and then just JMPs to RAM.

 

 

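Bank1:
;store address to JMP to after bankswitch
      LDA    #<LF844                ;2
      STA    $88                    ;3
      LDA    #>LF844                ;2
      STA    $89                    ;3
LDFAD:
;build LDA $FFF9 at $84-$86 ($AD = LDA absolute opcode)
      LDA    #$AD                   ;2
      STA    $84                    ;3
      LDA    #$F9                   ;2
      STA    $85                    ;3
      LDA    #$FF                   ;2
      STA    $86                    ;3
;build JMP at $87 ($4C = JMP absolute opcode; target already in $88-$89)
      LDA    #$4C                   ;2
      STA    $87                    ;3
      JMP.w  $84                    ;3 run the stub in RAM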

 

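Bank2:
LF48B:
;store address to JMP to after bankswitch
      LDA    #<LD024                ;2
      STA    $88                    ;3
      LDA    #>LD024                ;2
      STA    $89                    ;3
LF493:
;build LDA $FFF8 at $84-$86 ($AD = LDA absolute opcode)
      LDA    #$AD                   ;2
      STA    $84                    ;3
      LDA    #$F8                   ;2
      STA    $85                    ;3
      LDA    #$FF                   ;2
      STA    $86                    ;3
;build JMP at $87 ($4C = JMP absolute opcode; target already in $88-$89)
      LDA    #$4C                   ;2
      STA    $87                    ;3
      JMP.w  $84                    ;3 run the stub in RAM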

 

 

That's pretty flexible, since you can jump to any address without worrying about filler. The downside is that you need 6 bytes of temp RAM to hold the 2 instructions (as well as 8 bytes plus a branch of setup code per jump). A regular JMP table would probably be less costly ;)
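For reference, the six bytes the Raiders setup code leaves at $84-$89 decode as two instructions: LDA $FFF9 (LDA $FFF8 in bank 2) followed by a JMP to the saved address. Because the stub executes from RAM, the bank switch caused by the LDA doesn't disturb the instruction stream. A Python sketch that builds the same bytes (`ram_stub` is a made-up helper; $AD and $4C are the LDA absolute and JMP absolute opcodes):

```python
LDA_ABS, JMP_ABS = 0xAD, 0x4C   # 6502 opcodes for LDA abs and JMP abs

def ram_stub(hotspot, target):
    """Bytes for 'LDA hotspot / JMP target' with little-endian operands."""
    return bytes([
        LDA_ABS, hotspot & 0xFF, hotspot >> 8,   # reading the hotspot switches banks
        JMP_ABS, target & 0xFF, target >> 8,     # then jump into the new bank
    ])

stub = ram_stub(0xFFF9, 0xF844)   # the bank 1 -> bank 2 jump from the post
print(stub.hex(" "))              # ad f9 ff 4c 44 f8
```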

---

I like to use RAM for banking like that in my projects, for patches. It's probably not very useful on the Atari, though, because of the limited RAM. Usually I line up entry points, but that can be a hassle if you've got a lot of code already locked in place in different banks. I have a sub in bank 0 that I call from all the other banks, so I have code in RAM that looks like this, and I just call the sub in RAM. X is the bank number, and it is stored at the same spot in every bank.

 

 

load current bank #X

store later in code

switch bank #0

call sub

switch bank #X

return
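Read as a far call, the pseudocode above could be modeled in Python like this. It's only a control-flow sketch: `Cart` and `far_call_bank0` are hypothetical names, and "store later in code" is assumed to mean patching the operand of the final bank switch inside the RAM routine (the routine lives in RAM, so switching ROM banks doesn't pull it out from under the CPU).

```python
class Cart:
    """Hypothetical multi-bank cartridge; each bank stores its own number at the same ROM address."""
    def __init__(self, num_banks):
        self.bank = 0
        self.bank_number_in_rom = list(range(num_banks))

    def read_bank_number(self):
        return self.bank_number_in_rom[self.bank]

def far_call_bank0(cart, sub):
    """Model of the RAM routine: run `sub` with bank 0 selected, then restore the caller's bank."""
    x = cart.read_bank_number()   # load current bank #X
    patched_operand = x           # store later in code (patch the final switch)
    cart.bank = 0                 # switch bank #0
    result = sub(cart)            # call sub
    cart.bank = patched_operand   # switch bank #X
    return result                 # return

cart = Cart(4)
cart.bank = 2                     # caller is running in bank 2
seen = far_call_bank0(cart, lambda c: c.bank)
print(seen, cart.bank)            # 0 2
```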

---

Weirdest thing... I had a dream about this last night. If you want to use RAM to set up custom entry points, you can manipulate the stack instead. Then you use just two bytes of temp RAM that you would probably be using for the stack anyway. All you need to do is put an RTS in the right place.

 

example:

 

Bank 0:

FFF8: RTS

 

Bank 1:

FFF7: RTS

 

Recall from the earlier posts that Kroko pointed out an RTS reads the next byte and throws the value away; the side effect is a bank switch, since we're accessing FFF8 or FFF9. But since it's an RTS, we can also use it to set up an entry point: push an address onto the stack and "return" there. To call the routine, do something like this:

 

LDA #>ROUTINE

PHA

LDA #

PHA

JMP $FFFx ;x=8 if currently in bank 0, 7 in bank 1.

 

I haven't tested this; it's just at the idea phase, but it should work in theory unless I've made a mistake somewhere. The only limitation I see is that ROUTINE can't be at an address ending in 00.
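RTS resumes at the pulled address plus one, so the value on the stack has to be ROUTINE-1. With the high byte taken from ROUTINE itself (the LDA #>ROUTINE above), that only works while ROUTINE-1 stays in the same page, which is where the $xx00 limitation comes from; pushing both bytes of ROUTINE-1 removes it. A Python sketch of that arithmetic (`push_bytes` and `rts_resumes_at` are made-up names):

```python
def push_bytes(routine, high_from_routine=True):
    """(high, low) bytes to push, high first, so that an RTS resumes at `routine`."""
    target = (routine - 1) & 0xFFFF           # RTS adds 1 to what it pulls
    hi = (routine >> 8) if high_from_routine else (target >> 8)
    return hi, target & 0xFF

def rts_resumes_at(hi, lo):
    return (((hi << 8) | lo) + 1) & 0xFFFF

print(hex(rts_resumes_at(*push_bytes(0xF669))))          # 0xf669
print(hex(rts_resumes_at(*push_bytes(0xF700))))          # 0xf800 (wrong page)
print(hex(rts_resumes_at(*push_bytes(0xF700, False))))   # 0xf700
```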

---

I just tried this in A-team, which also uses the Raiders technique. It saved 36 bytes of ROM and at least 4 of RAM, and the game seems to work just fine, at least in Stella. I just want to know if this works on real hardware!

 

Code snippets I changed:

 


in bank 0:
; around LBCD5:

      LDA    #$69   ;2

      STA    $8B    ;3

      LDA    #$F6   ;2

      STA    $8C    ;3

      LDA    #$AD   ;2

      STA    $87    ;3

      LDA    #$F9   ;2

      STA    $88    ;3

      LDA    #$FF   ;2

      STA    $89    ;3

      LDA    #$4C   ;2

      STA    $8A    ;3

      JMP.w  $0087  ;3



in bank 1:

LF617: LDA    #$28   ;2

      STA    $8B    ;3

      LDA    #$B0   ;2

      STA    $8C    ;3

      LDA    #$AD   ;2

      STA    $87    ;3

      LDA    #$F8   ;2

      STA    $88    ;3

      LDA    #$FF   ;2

      STA    $89    ;3

      LDA    #$4C   ;2

      STA    $8A    ;3

      JMP.w  $0087  ;3

 

new code:

 


bank 0:
      LDA    #$F6   ;2
      PHA           ;3
      LDA    #$68   ;2 return address -1 from above
      PHA           ;3
      JMP    $FFF8  ;3
;then 18 NOPs to fill in replaced code

bank 1:
      LDA    #$B0   ;2
      PHA           ;3
      LDA    #$27   ;2 return address -1
      PHA           ;3
      JMP    $FFF7  ;3
;then 18 NOPs

      

Of course, you need to put RTS's at $FFF8 and $FFF7. Anyway, here's the bin:

ateam.zip
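The 36-byte figure checks out. Here's a tally in Python using standard 6502 sizes and cycle counts (LDA #imm: 2 bytes, 2 cycles; STA zp: 2 bytes, 3 cycles; PHA: 1 byte, 3 cycles; JMP abs: 3 bytes, 3 cycles). It only counts the setup code at each call site, not the RAM stub or the RTS that run afterwards:

```python
# Standard 6502 sizes (bytes) and cycle counts for the instructions involved
SIZE = {"LDA #imm": 2, "STA zp": 2, "PHA": 1, "JMP abs": 3}
CYCLES = {"LDA #imm": 2, "STA zp": 3, "PHA": 3, "JMP abs": 3}

old = ["LDA #imm", "STA zp"] * 6 + ["JMP abs"]   # build the RAM stub in $87-$8C, jump to it
new = ["LDA #imm", "PHA"] * 2 + ["JMP abs"]      # push target-1, jump to the RTS

def total(seq, table):
    return sum(table[op] for op in seq)

saved_per_site = total(old, SIZE) - total(new, SIZE)
print(total(old, SIZE), total(new, SIZE), saved_per_site)   # 27 9 18
print(2 * saved_per_site)                                   # 36 (one call site per bank)
print(total(old, CYCLES), total(new, CYCLES))               # 33 13
```

On the RAM side, the old stub occupies $87-$8C (6 bytes), while the new call only borrows 2 stack bytes until the RTS pulls them, which fits the "at least 4" bytes of RAM saved.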

---


That's some really nice code there. Sorry I cannot test it on a real 2600.

 

Stephen Anderson

---

Very nice! It's elegant, it's simple... and it works!

 

Expanding a little on that: if you use the routine a lot, then using LDA/PHA twice to push the address might be overkill. I'm not sure if this would even work, but it might:

 

	jsr	SwapBank1
.word	SomeRoutine

SwapBank1:
; return address is at stack+1
	tsx
	inx
	lda	($00,x)
	sta	temp1
	inc	$01,x
	lda	($00,x)
	sta	temp2
	inx
	txs
	lda	temp1
	pha
	lda	temp2
	pha
	jmp	LFFF7

LFFF7:
	rts



At the expense of a 23-byte routine, you can save four bytes every time you switch banks, and use a more elegant syntax. It could probably be expanded to check the higher bits to see which bank to switch to ($0xxx could be bank 0, $1xxx bank 1, etc.) You'd need to switch banks several times in the code in order to save space, but maybe there's a way the code could be simplified... just a thought, anyway.

---

Looking over it a bit, it can easily be improved, and I had the JSR/RTS address wrong. 18 bytes, no RAM needed:

 

        jsr     SwapBank1
        .word   SomeRoutine-1

SwapBank1:
       ; routine address is located at the return address (stack+1)
        tsx

        inc     $02,x          ; shift routine address pointer to low byte ($ff will roll over!)
        lda     ($01,x)        ; load low byte of routine address
        tay                    ; store it in y

        inc     $02,x          ; shift routine address pointer to high byte
        lda     ($01,x)        ; load high byte of routine address

        sta     ($02,x)        ; rewrite the new routine address
        tya                    ; over the older return address
        sta     ($01,x)

        jmp     LFFF7          ; switch banks

LFFF7:
        rts

---

I bet you could save another byte by hooking it up to the IRQ vector and using BRK + .word ADDRESS-1. It should also save you an INC instruction.

 

Because of the way you're not propagating carry, it would probably be a good idea to make the bank call macro check for the call wrapping around a page boundary, and either complain or add a few NOPs as necessary.
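Such a check is easy to express. Since the routine only ever bumps the low byte of the pointer JSR pushed (the address of the JSR's last byte), that byte and both .word bytes have to share a page. A Python sketch of what the macro would compute, assuming it pads with one-byte NOPs placed before the JSR (`nops_before_bank_call` is a hypothetical name):

```python
def nops_before_bank_call(jsr_addr):
    """NOPs to emit before `jsr SwapBank1 / .word Target-1` (hypothetical macro check).
    The pushed pointer is jsr_addr+2; the routine bumps only its low byte to reach
    the .word at jsr_addr+3 and jsr_addr+4, so all three must be in one page."""
    pad = 0
    while ((jsr_addr + pad + 2) >> 8) != ((jsr_addr + pad + 4) >> 8):
        pad += 1                      # each NOP moves the JSR one byte later
    return pad

print(nops_before_bank_call(0xF100))   # 0
print(nops_before_bank_call(0xF1FC))   # 2: the .word would straddle $F1FF/$F200
```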

---



 

Great idea! If you switch enough, this may save some space. It took me a while to figure out how this works (I've never known a practical use for indexed indirect addressing before). I think you could save one more byte by using sty $01,x, and the sta ($02,x) should be changed to sta $02,x.

 


        jsr     SwapBank1
        .word   SomeRoutine-1

SwapBank1:
       ; routine address is located at the return address (stack+1)
        tsx

        inc     $02,x          ; shift routine address pointer to low byte ($ff will roll over!)
        lda     ($01,x)        ; load low byte of routine address
        tay                    ; store it in y

        inc     $02,x          ; shift routine address pointer to high byte
        lda     ($01,x)        ; load high byte of routine address

        sta     $02,x          ; rewrite the new routine address
                               ; over the older return address
        sty     $01,x

        jmp     LFFF7          ; switch banks

LFFF7:
        rts

 

Regarding the BRK, if you did it this way, you'd probably instead put an RTI at LFFF7 to keep the stack balanced, since it pushes the address plus flags. This would also save the three byte JSR.

---

Oh yeah, BRK would save two bytes of code, not one. Plus two more for losing an INC in the common code.

 

And shouldn't that be INC 1,X not INC 2,X? The low byte of the address should come first in memory because the 6502 is little-endian.
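With the corrections from this thread applied (INC $01,x rather than INC $02,x, plus STA $02,x and STY $01,x), the routine can be simulated in Python. This is a sketch, not a hardware test: it models the 2600 RAM as one page where the $01xx stack addresses alias $00xx, assumes S=$FF before the JSR, and the addresses and helper names are made up for the example. The second call shows the page-crossing problem mentioned above.

```python
ram = bytearray(0x100)      # 2600 RAM seen as page zero; $01xx stack addresses alias it
rom = bytearray(0x10000)    # flat code space for the example

def peek(zp_addr):
    return ram[zp_addr & 0xFF]

def poke(zp_addr, value):
    ram[zp_addr & 0xFF] = value & 0xFF

def jsr(pc, sp):
    """JSR at pc pushes pc+2 (its last byte), high byte first; returns the new S."""
    ret = pc + 2
    poke(sp, ret >> 8)               # $0100+S is the same RAM byte as $00+S
    poke(sp - 1, ret & 0xFF)
    return (sp - 2) & 0xFF

def swap_bank1(sp):
    """Corrected SwapBank1; returns the address where the final RTS resumes."""
    x = sp                                        # tsx
    poke(x + 1, peek(x + 1) + 1)                  # inc $01,x (low byte only, no carry)
    y = rom[peek(x + 1) | peek(x + 2) << 8]       # lda ($01,x) / tay
    poke(x + 1, peek(x + 1) + 1)                  # inc $01,x
    a = rom[peek(x + 1) | peek(x + 2) << 8]       # lda ($01,x)
    poke(x + 2, a)                                # sta $02,x
    poke(x + 1, y)                                # sty $01,x
    # jmp LFFF7 -> rts: pull the rewritten address and add 1
    return ((peek(x + 2) << 8) | peek(x + 1)) + 1

some_routine = 0xF5A0                             # hypothetical target
rom[0xF303] = (some_routine - 1) & 0xFF           # .word SomeRoutine-1 after a
rom[0xF304] = (some_routine - 1) >> 8             # JSR SwapBank1 at $F300
print(hex(swap_bank1(jsr(0xF300, 0xFF))))        # 0xf5a0

# JSR at $F3FC: the .word straddles $F3FF/$F400, the INC wraps the pointer
# to $F300 instead of carrying, and the wrong byte is used as the high byte.
rom[0xF3FF] = (some_routine - 1) & 0xFF
rom[0xF400] = (some_routine - 1) >> 8
print(hex(swap_bank1(jsr(0xF3FC, 0xFF))))        # 0xa0
```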

 

Nice use of TSX and the (n,X) addressing mode for a pointer in the stack, and the 6800-like index register usage. That is so unlike the 6502 code that I'm used to seeing.

 

And there's a good reason why it looks so strange. This is only possible because the 2600 mirrors the stack into the zero page. You couldn't do that trick on an Apple II, 400/800/5200, 7800 in native mode, NES, or just about anything else with a 6502. It's only because the 2600 has so little RAM, with the zero page mirrored into the stack area, that this stack trick is possible!
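The mirroring comes from the 2600's partial address decoding: the RIOT RAM is selected when A12=0, A9=0 and A7=1, and A8 isn't decoded, so $0180-$01FF hits the same 128 bytes as $0080-$00FF. A simplified Python sketch of that decode (it ignores the TIA and the RIOT I/O registers; `ram_index` is a made-up name):

```python
def ram_index(addr):
    """Index (0-127) of the RIOT RAM byte an address selects, or None if it isn't RAM."""
    a = addr & 0x1FFF                        # the 6507 has only 13 address lines
    is_ram = (a & 0x1000) == 0 and (a & 0x0080) != 0 and (a & 0x0200) == 0
    return (a & 0x7F) if is_ram else None    # bits not tested above, such as A8, don't matter

print(ram_index(0x00FE), ram_index(0x01FE))  # 126 126: DEC $FE hits the stacked byte
print(ram_index(0xFFF8))                     # None: cartridge space
```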

 

One other thing about the RTS (or RTI) at FFF7/FFF8. Won't the bank with an RTS at FFF7 also need one at FFF8 in case the reading of FFF8 switches from the other bank?

---


 

If I'm understanding this right, I would guess not: when JMPing to FFF7, the 6502 fetches the opcode in the first cycle and doesn't access FFF8 until the next cycle. And if we JMP to FFF8 from bank 0, we won't switch until we access FFF9 in the next cycle. But it doesn't hurt to put it in anyway, since you can't put data at FFF8 regardless. In the A-Team bin, I did put RTS's at FFF7 and FFF8 in both banks, whether I really have to or not.
