
Bankswitch per jmp/jsr/rts


SvOlli


I just wanted to share this, as it has been in my head for a couple of days now without any concrete use.

 

Jac! inspired me to write some code that does almost transparent bankswitching using jmp and even jsr/rts.

 

First, here's the code for F4 bankswitching, which - of course - needs to be present at the same location in each bank:

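.define HOTSPOTS $FFF4

; bank number = top three bits of the target's high byte

.macro bankjsr _addr
 lda #>(_addr-1)      ; high byte of target-1 (rts adds 1)
 ldy #<(_addr-1)      ; low byte of target-1
 jsr bankjmpcode
.endmacro

.macro bankjmp _addr
 lda #>(_addr-1)
 ldy #<(_addr-1)
 jmp bankjmpcode
.endmacro

.macro bankrts
 jmp bankrtscode
.endmacro

bankjmpcode:
 pha                  ; push high byte of target-1
 asl                  ; rotate the top three bits
 rol                  ; of the high byte
 rol                  ; down into
 rol                  ; bits 0-2
 and #%00000111       ; A = bank number
 tax
 tya
 pha                  ; push low byte of target-1
 lda HOTSPOTS,x       ; access hotspot: switch bank
 rts                  ; continue at target in the new bank

bankrtscode:
 tsx
 lda $02,x            ; high byte of the return address on the stack
 asl
 rol
 rol
 rol
 and #%00000111       ; A = bank number
 tax
 lda HOTSPOTS,x       ; access hotspot: switch bank
 rts                  ; return into the caller's bank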

Disadvantages: bankjmp/bankjsr destroy all registers; bankrts leaves only Y intact.

Advantages: very flexible, bankrts also works when not called via bankjsr, and no fixed RAM address is needed for the jmp destination.

 

For F6/F8 bankswitching, just reduce the number of "rol"s and the number of bits in the "and"s.
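
To see why the number of rotates and the width of the mask go together, here is a small Python model of the shift sequence. It is only a sketch and not part of the original code: it assumes, as the F4 routine implies, that the bank number sits in the top bits of the target address's high byte, and the name `bank_index` is made up for illustration.

```python
def bank_index(high_byte, bank_bits):
    """Model 'asl' followed by bank_bits 'rol's and an 'and' mask.

    bank_bits is 3 for F4 (8 banks), 2 for F6 (4 banks), 1 for F8 (2 banks).
    """
    a = high_byte & 0xFF
    # asl: bit 7 moves into carry, a zero enters at bit 0
    carry, a = a >> 7, (a << 1) & 0xFF
    for _ in range(bank_bits):
        # rol: old carry enters at bit 0, bit 7 becomes the new carry
        carry, a = a >> 7, ((a << 1) & 0xFF) | carry
    return a & ((1 << bank_bits) - 1)


# The result is simply the top bank_bits bits of the high byte.
for hi in range(256):
    assert bank_index(hi, 3) == hi >> 5
    assert bank_index(hi, 2) == hi >> 6
    assert bank_index(hi, 1) == hi >> 7
```

So each bank has to be assembled into its own address range, and the shifts just recover which range the address belongs to.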


For Boulder Dash (and Star Castle) we use a macro when defining a subroutine. This macro defines a variable "<name>_BANK" which contains the bank.

 

So when you call the subroutine, the macro generates the bank variable name from the subroutine name and then determines the bank from that variable.

 

I think if you follow that path, your code could become a bit simpler.

Also an interesting idea. It's like one of the golden rules of C/C++ programming: try to move as many calculations as possible from run-time to compile-time.

 

I want to use that kind of bankswitching for my next (8k-or-bigger) demo.

 

Right now I'm using a jump-table, something like this:

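jumptable:
 .word part1-1, part2-1, part3-1, part4-1, reset
[...]
mainloop:
 jsr vblank
 lda index
 asl                 ; two bytes per table entry
 tax
 lda jumptable+1,x   ; push high byte first
 pha
 lda jumptable,x     ; then low byte
 pha
 rts                 ; "return" to the table entry + 1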

This kind of code can easily be extended to bankswitch the way I introduced here. The rest is just sugar coating, but I like the idea that jsr and rts work almost as expected. Best part: the bankswitching rts code also works when not called via the bankswitching jsr.
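
The `-1` in the jump table entries is there because rts adds one to the address it pulls from the stack. A minimal Python model of the push/push/rts dispatch, for illustration only (`rts_dispatch` is a made-up name, not something from the demo code):

```python
def rts_dispatch(target):
    """Model jumping to `target` by pushing target-1 and executing rts."""
    stack = []
    addr = (target - 1) & 0xFFFF
    stack.append(addr >> 8)      # pha with the high byte first ...
    stack.append(addr & 0xFF)    # ... then pha with the low byte
    lo = stack.pop()             # rts pulls the low byte first,
    hi = stack.pop()             # then the high byte,
    return (((hi << 8) | lo) + 1) & 0xFFFF  # and resumes one byte later


# Works across a page boundary too: $F100-1 = $F0FF is what gets pushed.
assert rts_dispatch(0xF100) == 0xF100
```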


I hardcode all jumps so they are fast. The following fragment is in bank 3 and jumps to bank 4:

 

End of bank 3:


org $3FDF
rorg $FFDF

JMP_KERNEL_3
nop BANK4
JMP_KERNEL_3_BORDER_RIGHT_1
nop BANK4
JMP_KERNEL_3_BORDER_RIGHT_2
nop BANK4

 

End of bank 4:


org [[JMP_KERNEL_3+3] & $0FFF + $4000]
rorg [JMP_KERNEL_3+3]
JMP KERNEL_3
JMP KERNEL_3_BORDER_RIGHT_1
JMP KERNEL_3_BORDER_RIGHT_2

 

In bank 3 I want to jump to KERNEL_3 in bank 4, so I use JMP JMP_KERNEL_3 there. It jumps to the nop, which accesses the BANK4 hotspot and triggers the bankswitch; bank 4 then becomes active and executes JMP KERNEL_3.

 

Maybe the code can be cleaner, I just hacked around :) It has to be expanded if you want to rts back.
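
The control flow of these stubs can be followed with a toy Python model. This is a sketch under assumptions: it treats `nop BANK4` as a 3-byte instruction whose only effect is the bank change, and the data layout and the name `run_trampoline` are invented for illustration. The addresses follow from the `rorg $FFDF` above.

```python
def run_trampoline(banks, bank, pc):
    """Follow nop-hotspot/jmp stubs until a jmp is reached.

    banks maps bank number -> {address: instruction}; an instruction is
    ("nop", new_bank) for a 3-byte nop reading a hotspot, or ("jmp", label).
    """
    while True:
        op, arg = banks[bank][pc]
        if op == "nop":
            bank = arg   # reading the hotspot switches banks ...
            pc += 3      # ... and execution continues behind the nop
        else:
            return bank, arg


banks = {
    3: {0xFFDF: ("nop", 4), 0xFFE2: ("nop", 4), 0xFFE5: ("nop", 4)},
    4: {0xFFE2: ("jmp", "KERNEL_3"),
        0xFFE5: ("jmp", "KERNEL_3_BORDER_RIGHT_1"),
        0xFFE8: ("jmp", "KERNEL_3_BORDER_RIGHT_2")},
}

# JMP JMP_KERNEL_3 in bank 3 ends up at KERNEL_3 in bank 4.
assert run_trampoline(banks, 3, 0xFFDF) == (4, "KERNEL_3")
```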

Edited by roland p

SvOlli,

you might like the bankswitching model I've implemented in my ASDK Framework; it's incredibly simple, all proxy stubs, so there are no cycles to count.

 

I'll be teaching the method presently on my Learn Assembly thread but you'll get it just by looking at it :)


SvOlli,

I just added the chapter, you should read it just for fun - I guarantee you'll get a kick out of it :)

Literately! ;)

 

Looking at your code, you pay an extra 2 * 7 bytes per bankswitching jsr/rts combination. Mine is 24 bytes in total + 6 bytes extra per jsr/rts. This means that at 3 calls both implementations use an equal amount of ROM. Starting with 4 calls, mine needs less space.
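
The break-even point can be checked with a few lines of Python. This just redoes the arithmetic with the byte counts quoted above; `rom_cost` is a made-up helper name.

```python
def rom_cost(fixed, per_call, calls):
    """Total extra ROM bytes: a fixed routine plus a per-call overhead."""
    return fixed + per_call * calls


# Figures quoted in the post: 2 * 7 bytes per call with no shared routine,
# versus 24 bytes shared plus 6 bytes per call.
for calls in range(1, 6):
    print(calls, rom_cost(0, 14, calls), rom_cost(24, 6, calls))
```

Both come out at 42 bytes for 3 calls; from the 4th call on, the shared routine is the smaller one.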

Excellent points SvOlli, both models have their advantages and I would definitely switch to yours with a large number of calls when hitting space constraints!

 

You've got rolls in your code while I have action figures, but I'm surprised none of the advanced coders rolled their eyes at my previous chapters where I first avoid bitwise operations and then neatly dispose of Hexadecimal notation by submerging it in a bath of liquid Helium :)

Edited by Mr SQL

Here is another way to bankswitch. I discovered it long ago, when I had a program that kept intermittently crashing. It involves branching from page $FF to page $FE when the low byte of the branch target is also the low byte of a hotspot address. Since the high byte of the address gets updated after the low byte, the hotspot gets triggered. I jump between all 4 banks at the beginning of overscan. You will have to open the debugger to take a look; otherwise there is nothing but a green screen. ;)

 

 

TestBank.zip
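
For anyone who wants to work out which address gets touched, here is a rough Python model of a taken branch. It is a sketch based on my understanding of the 6502: when a taken branch crosses a page, the CPU does one read with the new low byte but the old high byte before the high byte is corrected. The name `branch_dummy_read` and the example addresses are made up; they are not taken from TestBank.zip.

```python
def branch_dummy_read(branch_addr, offset):
    """Return (target, dummy_read) for a taken 6502 branch.

    branch_addr is the address of the branch opcode, offset the signed
    8-bit displacement. dummy_read is None if no page is crossed.
    """
    next_pc = (branch_addr + 2) & 0xFFFF
    target = (next_pc + offset) & 0xFFFF
    if (target & 0xFF00) == (next_pc & 0xFF00):
        return target, None
    # Low byte is already updated, high byte is still the old one.
    return target, (next_pc & 0xFF00) | (target & 0xFF)


# A branch at $FF10 back to $FEF8 reads $FFF8 on the way.
assert branch_dummy_read(0xFF10, -0x1A) == (0xFEF8, 0xFFF8)
```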


And in this kind of code can be easily extended to bankswitch the way I introduced here. The rest is just sugar coating, but I like the idea, that jsr and rts work almost like expected. Best part is: the bankswitching-rts code does also work, when not called using bankswitch-jsr.

You should have a look at my Star Castle code (link in my blog). There the task scheduler calls subroutines which then return to the scheduler. Or, to make things a bit more efficient (and complicated :)) directly continue with the next task.


Per Thomas's comment, here are the basics of bankswitching under Boulder Dash™...

It's pretty simple, and fairly elegant to use...

First, the macro for defining 'subroutines'. Every routine that lives in a bank is defined using this macro...

 

    MAC DEFINE_SUBROUTINE           ; name of subroutine
BANK_{1}    = _CURRENT_BANK         ; bank in which this subroutine resides
    SUBROUTINE                      ; keep everything local
{1}                                 ; entry point
    ENDM

 

Here's the actual usage...

 


    DEFINE_SUBROUTINE ProcessExplosion
    ; contents...
    rts

 

 

and here's how we call it...

 

		 lda #BANK_ProcessExplosion
		 sta SET_BANK
		 jsr ProcessExplosion

 

So we can easily move the subroutine somewhere else into a different bank -- basically just cut/paste it to anywhere, and all the code gets it right. The bank is correct.

This also makes it very easy to create vector tables, because the BANK_label is automatically calculated for all subroutines. Easy.

 

Just one addition: The banks themselves are also 'auto-calculated' using the following macro...

 

    MAC NEWBANK ; bank name
        SEG {1}
        ORG ORIGIN
        RORG $F000
BANK_START      SET *
{1}             SET ORIGIN / 2048
ORIGIN          SET ORIGIN + 2048
_CURRENT_BANK   SET {1}
    ENDM
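
The numbering done by NEWBANK can be mimicked in a few lines of Python. This is a sketch only: it assumes ORIGIN starts at 0 (the initial value is not shown in the post), and the bank names and `assign_banks` are invented for illustration.

```python
BANK_SIZE = 2048  # bytes per bank, as in the macro


def assign_banks(names, origin=0):
    """Give each bank name the number ORIGIN / 2048, then advance ORIGIN."""
    banks = {}
    for name in names:
        banks[name] = origin // BANK_SIZE
        origin += BANK_SIZE
    return banks


print(assign_banks(["BANK_INIT", "BANK_LOGIC", "BANK_DRAW"]))
```

Because the numbers come from the order of the NEWBANK calls, reordering banks in the source renumbers them without touching any calling code.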

 

 

 

Cheers

A


  • 7 months later...
			 lda #BANK_ProcessExplosion
			 sta SET_BANK
			 jsr ProcessExplosion

This looks very nice, but I am having trouble understanding the above. Is #BANK_ProcessExplosion the address of a bankswitch hotspot, $1FF8 for example?

What is SET_BANK doing? The name implies it is actually doing the bankswitch.

When a bankswitch happens, I thought you ended up in the same place but in the new bank, so how can you guarantee that jsr ProcessExplosion is in the correct place in the other bank?


Here's a method I use:

 

 

   MAC GOTO
   LDY #[(>{0}&$F0)-$10]/$20
   LDX #<{0}
   LDA #>{0}
   JMP Switch_and_Go
   ENDM
 
   MAC GOSUB
   LDY #[(>.&$F0)-$10]/$20
   STY Return_Bank
   LDY #[(>{0}&$F0)-$10]/$20
   LDX #<{0}
   LDA #>{0}
   JSR Switch_and_Go
   ENDM
 
   MAC RETURN
   JMP Switch_and_Return
   ENDM
 
Target = $80 ; or whatever
Target_Lo = Target
Target_Hi = Target+1
Return_Bank = $82 ; or whatever
 
; Then this goes just before the bankswitching hotspots, in all banks
 
Switch_and_Go
   STX Target_Lo
   STA Target_Hi
   LDA Select_Bank,Y
   JMP (Target)
 
Switch_and_Return
   LDY Return_Bank
   LDA Select_Bank,Y
   RTS
 
Select_Bank ; hotspots
   HEX FF FF FF FF ; this is for 16K ROMs

 

This assumes that each 4K bank has its own address range, with bank 0 at $1000, bank 1 at $3000, bank 2 at $5000, etc. To use:

 

 

   ; if you don't care about coming back to the original bank
   GOTO Whatever ; the GOTO macro will figure out what bank to switch to based on the address of Whatever
 
   ; or, if you want to return to the original bank
   GOSUB Whatever ; the GOSUB macro will figure out which bank you're currently in and save it, then figure out which bank you want to switch to
 
   ; to return from the GOSUB
   RETURN

 

I don't know if this is the "best" or "fastest" or "smallest" method, but the idea is to be as "transparent" as possible-- let the macros do the work of figuring out the bank numbers. The downside is that it doesn't allow for multiple levels of GOSUB calls, since there's only one Return_Bank variable. Also, it destroys whatever was in A, X, and Y.
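
The bank-number expression used in the GOTO and GOSUB macros can be checked in Python. This is just the same arithmetic written out, with `bank_of` as a made-up name; it relies on the address layout described above (bank 0 at $1000, bank 1 at $3000, and so on).

```python
def bank_of(addr):
    """Model the macro expression [(>addr & $F0) - $10] / $20."""
    high = (addr >> 8) & 0xFF
    return ((high & 0xF0) - 0x10) // 0x20


for base, bank in [(0x1000, 0), (0x3000, 1), (0x5000, 2), (0x7000, 3)]:
    # Every address inside a 4K bank maps to the same bank number.
    assert bank_of(base) == bank
    assert bank_of(base + 0x0FFF) == bank
```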


Andrew's method works only with 3F or 3E bankswitching, which switches the first 2K of ROM while the last 2K stay fixed.

While we're at it: the bankswitching I introduced in this topic is working fine in a demo (not game) that's due for release in 2014. I've got a subdirectory for each bank and move code from bank to bank just by moving the file from one directory to another.

 

The code also got optimized a bit:

 

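.macro bankjsr _addr
   lda #>(_addr-1)   ; high byte, pushed first
   ldx #<(_addr-1)   ; low byte, pushed second
   jsr bankjmpcode
.endmacro

.macro bankjmp _addr
   lda #>(_addr-1)
   ldx #<(_addr-1)
   jmp bankjmpcode
.endmacro

.macro bankrts
   jmp bankrtscode
.endmacro

bankjmpcode:
   pha               ; push high byte of target-1
   txa
   pha               ; push low byte of target-1
   ; fall through

bankrtscode:
   tsx
   lda $02,x         ; high byte of the address rts will use
   asl
   rol
   rol
   rol
   and #%00000111    ; A = bank number
   tax
   lda HOTSPOTS,x    ; access hotspot: switch bank
   rts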

Now, Y is unchanged, and less ROM is used. :)
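
Why one routine can serve both jumps and returns is easy to see in a little Python model of the stack. It is a sketch that assumes the high byte is pushed before the low byte (the order jsr itself uses) and the F4 layout with the bank in the top three address bits; the function names are made up.

```python
def push_address(stack, addr):
    """Push addr-1 the way jsr or the bankjmp code does: high byte, then low."""
    addr = (addr - 1) & 0xFFFF
    stack.append(addr >> 8)
    stack.append(addr & 0xFF)


def bank_for_rts(stack):
    """Model 'tsx / lda $02,x' plus the F4 shifts: bank of the pending rts."""
    return stack[-2] >> 5


stack = []
push_address(stack, 0x3456)     # return address left behind by a jsr
assert bank_for_rts(stack) == 1
push_address(stack, 0xF000)     # target pushed by bankjmpcode on top of it
assert bank_for_rts(stack) == 7
```

In both cases the byte at $02,x belongs to the address the next rts will use, so the routine does not need to know whether it was entered as a jump or as a return.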


[...] a demo (not game) that's due for release in 2014.

 

Oooh, looking forward to this! If by any chance this happens to be at Revision, expect some competition. ;) (Which would be entirely your own "fault", as your talk at this year's Revision prompted me to start coding a demo for the VCS. :grin: )

 

Anyway, thanks to all for showing the different ways of doing bank switching; I'll borrow some ideas when it comes to linking my demo...


Spotted nicely,

 

but since the code is generated, I'm not sure if I'm gonna include this change. It's generated, because it's also working with F6 and F8 with only a few modifications:

 

F6:

   asl
   rol
   rol
   and #%00000011

F8:

   asl
   rol
   and #%00000001

And sticking to this pattern keeps the code more straightforward. On the other hand, if I should need that one byte in my demo, I know where to start... ;)


Super Mario Bros. on the NES used the following routine, which is not competitive in size, but I just wanted to add it :)

ScreenRoutines:
      lda ScreenRoutineTask        ;run one of the following subroutines
      jsr JumpEngine
    
      .word InitScreen
      .word SetupIntermediate
      .word WriteTopStatusLine
...


JumpEngine:
       asl          ;shift bit from contents of A
       tay
       pla          ;pull saved return address from stack
       sta $04      ;save to indirect
       pla
       sta $05
       iny
       lda ($04),y  ;load pointer from indirect
       sta $06      ;note that if an RTS is performed in next routine
       iny          ;it will return to the execution before the sub
       lda ($04),y  ;that called this routine
       sta $07
       jmp ($06)    ;jump to the address we loaded
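
How JumpEngine finds the table can be followed with a short Python model. This is a sketch: the memory contents and addresses are invented, and `jump_engine` only mirrors the address arithmetic. The key point is that jsr pushes the address of its own last byte, so the .word table starts one byte after the pulled address, which is why the routine does an iny before the first load.

```python
def jump_engine(memory, return_addr, index):
    """Return the table entry that JumpEngine would jump to.

    memory maps address -> byte; return_addr is the value jsr pushed,
    i.e. the address of the last byte of the jsr instruction.
    """
    y = index * 2 + 1                  # asl / tay / iny
    lo = memory[return_addr + y]       # lda ($04),y
    hi = memory[return_addr + y + 1]   # iny / lda ($04),y
    return (hi << 8) | lo              # jmp ($06)


# Hypothetical layout: jsr at $8000-$8002, table of three words from $8003.
memory = {0x8003: 0x00, 0x8004: 0x90,
          0x8005: 0x23, 0x8006: 0x91,
          0x8007: 0xBC, 0x8008: 0x9A}
assert jump_engine(memory, 0x8002, 1) == 0x9123
```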

SuperMarioBros on NES used that routine, which is not competitive in size...

It is just a matter of how often you call that subroutine.

 

SvOlli's macro requires 7 bytes/call, NES 5, mine 3. NES and mine have extra overhead in the called routine.

 

So to find the optimal solution you have to know how many calls you need and then do the math.
