+batari Posted September 22, 2004

For F8 bankswitching, to jump from bank 1 to bank 2 I've just been doing STA $1FF9, but this offsets you by three bytes when jumping to the next bank, and you need to write routines for every bankswitch to get back to the proper place and have the code aligned just right.

Instead, I came up with this idea. In this case the stack pointer is $FF before we start. Then do a JSR $FFF0 in either bank to switch to the other.

Bank 1 contains:

    FFF0: DEC $FE
    FFF2: DEC $FE
    FFF4: DEC $FE
    FFF6: BNE $FFF9   (always taken unless $FE happens to be zero...)
    FFF8: RTS

Bank 2 contains:

    FFF0: NOP
    FFF1: NOP
    FFF2: DEC $FE
    FFF4: DEC $FE
    FFF6: DEC $FE
    FFF8: NOP         (this byte should never be accessed, actually)
    FFF9: RTS

It seems to me that this should decrement the return address on the stack by 3 bytes (if I'm understanding the stack properly). Once the processor tries to fetch the next instruction, it should set the address lines to 1FF8 or 1FF9 and we should switch banks right away, perhaps even before the address set-up time is up. The new byte (RTS in either case) should then be available from the ROM of the other bank, so not only should we switch banks, we should also return to the exact place where we started the original JSR, in the other bank.

However, I tried this code in an emulator and it didn't work. Is this because the emulators don't work like real hardware, or is there a fundamental problem with my idea?

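
The return-address adjustment described in this post can be checked with a little arithmetic. What follows is a minimal Python sketch, not part of the original post: the function name `resume_after_fixup` and the call sites are hypothetical, and it assumes the stack pointer is $FF before the JSR (so the low byte of the return address sits at $FE) and that the bankswitch itself works as hoped.

```python
def resume_after_fixup(jsr_addr, decrements=3):
    """Address the RTS resumes at after the low byte of the stacked
    return address has been decremented `decrements` times."""
    stacked = (jsr_addr + 2) & 0xFFFF             # JSR stacks the address of its last byte
    low = ((stacked & 0xFF) - decrements) & 0xFF  # DEC $FE changes the low byte only
    return (((stacked & 0xFF00) | low) + 1) & 0xFFFF  # RTS adds 1 after pulling


# Hypothetical call site: JSR $FFF0 located at $F005.
print(hex(resume_after_fixup(0xF005)))

# Hypothetical call site near a page start: the low byte wraps around
# and nothing corrects the high byte.
print(hex(resume_after_fixup(0xF0FE)))
```

Under these assumptions the first call site resumes at the address of the JSR itself, which is the goal stated in the post. The second shows a case the three DECs do not cover, because DEC on a single byte never borrows from the high byte.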
Thomas Jentzsch Posted September 22, 2004

Interesting, but sorry, I don't think I fully understand your code here. Why do you have to correct the return address?

And I am not sure what happens faster in real hardware, fetching the next instruction or (most likely) switching the bank. A slow bankswitch might result in some problems:

1. The branch in the first bank points to nothing (an additional RTS would help here).
2. After executing the NOP, the following RTS might be fetched, thus switching back to the wrong bank.

Did you have a look at the z26 trace logs?

Nukey Shay Posted September 22, 2004

You could just use a jump table at the very end of each bank. After the selected subroutine executes, have it perform a JMP back to address $DFF4 (where a return bankswitch is waiting). Once that is done, an RTS instruction sitting at $FFF7 sends the program back to whatever JSR sent it there.

Here's an example that has 3 bankswitched subroutines...

    ;@ $DFEB
           jmp sub1     ;execute subroutine 1
           jmp sub2     ;execute subroutine 2
           jmp sub3     ;execute subroutine 3
    Return:             ;@ $DFF4
           sta $1ff9
           nop          ;unused byte @ $DFF7

    ;@ $FFE8
    Jsub1: sta $1ff8
    Jsub2: sta $1ff8
    Jsub3: sta $1ff8
           ;3 unused
           nop
           nop
           nop
           ;@ $FFF7
           rts

So to execute the first subroutine, just JSR to Jsub1 from any point in bank 2. It bankswitches, the corresponding JMP in bank 1 sends it to the subroutine, and a JMP back to address $DFF4 flips it back to the RTS instruction.

The overhead is 13 cycles either way: six for the JSR, four for the bankswitch, and three more for the JMP. And you'll only lose 4 bytes of ROM space (3 of which could be reclaimed if you have a 3-byte data table in bank 2 that you can put in place of those NOP instructions).

Working backward, you can add in as many subroutines as you need, and you can still use a time-critical bankswitch way down at $F000.

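
The alignment of the jump table in this post can be verified numerically. The following Python sketch is an illustration only, not code from the thread: the helper name `resume_offset` is hypothetical, and it assumes that an absolute-addressed STA occupies 3 bytes and that execution simply continues at the next address in the newly selected bank.

```python
STA_ABS_SIZE = 3  # opcode plus two address bytes


def resume_offset(addr):
    """Offset inside the 4K bank of the first instruction fetched after a
    3-byte hotspot access located at `addr`; the other bank must have
    something sensible at that same offset."""
    return (addr + STA_ABS_SIZE) & 0x0FFF


# Bank 2 stubs from the post, starting at $FFE8.
stubs = {"Jsub1": 0xFFE8, "Jsub2": 0xFFEB, "Jsub3": 0xFFEE}
landing = {name: resume_offset(addr) for name, addr in stubs.items()}

for name, offset in landing.items():
    print(f"{name}: continues at offset ${offset:03X} in bank 1")

# The return switch at $DFF4 in bank 1 should land on the RTS slot in bank 2.
print(f"Return: continues at offset ${resume_offset(0xDFF4):03X} in bank 2")
```

The three stubs land on the three JMP slots that start at $DFEB, and the return switch lands on the RTS at $FFF7, which matches the layout given in the post.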
Kroko Posted September 22, 2004

Just brainstorming: you are in bank 2 and want to switch to bank 1.

    FFF4: DEC $FE
    FFF6: DEC $FE
    FFF8: NOP   (this byte should never be accessed, actually)
    FFF9: RTS

When the processor is fetching the opcode from location FFF8, the bankswitching should take place more or less immediately. Depending on how the bankswitching logic is designed (what switching delay etc.), you switch before, during or after the fetch of the opcode. That means you could fetch:

a) the NOP from this bank (you fetch before the switch)
b) the RTS from location FFF8 of the other bank (you already switched before the fetch)
c) ? (I don't know what happens if you switch in the middle of the fetch; maybe you get a mix of RTS and NOP :-)

So this code brings you to bank 1, but you did not necessarily fetch the RTS. Can you try to replace your NOP with an RTS, so you can be sure that you fetch an RTS? I don't know how the emulators do the switching, but on real hardware you are only safe if you have the RTS two times.

But now you have a second problem, which has to do with how an RTS is actually done by the microprocessor. Unfortunately the processor will not only fetch the RTS opcode, but will also do a dummy fetch of the NEXT byte, which is at location FFF9! Now guess what happens :-) ... We are back in bank 2 :-(

A solution would be to put the RTS at FFF7. Then you would switch banks during the dummy fetch of the RTS (at least in theory :-)

Here is what RTS does, step by step (the PC+1 is the problem ...):

    +-------+-------------+--------------------+------------+
    | Cycle | Address Bus | Data Bus           | Read/Write |
    +-------+-------------+--------------------+------------+
    | 1     | PBR,PC      | Op Code            | R          |
    | 2     | PBR,PC+1    | Internal Operation | R          |
    | 3     | PBR,PC+1    | Internal Operation | R          |
    | 4     | 0,S+1       | New PCL-1          | R          |
    | 5     | 0,S+2       | New PCH            | R          |
    | 6     | 0,S+2       | Internal Operation | R          |
    | 1     | PBR,NewPC   | New Op Code        | R          |
    +-------+-------------+--------------------+------------+

And let's have a look at the bank 1 code which should bring us to bank 2:

    FFF0: DEC $FE
    FFF2: DEC $FE
    FFF4: DEC $FE
    FFF6: BNE $FFF9   (always taken unless $FE happens to be zero...)
    FFF8: RTS

I don't really understand why the BNE $FFF9 is needed, but let's assume the branch is taken and we jump to FFF9. Then you switch banks again before, during or after the fetch of what is at FFF9. Shouldn't there also be an RTS at FFF9? You don't get FFF9 on the address bus before the processor fetches the opcode from there. But when it does so, it's important what's in that cell. So I would try an RTS at FFF9 in both banks.

I hope I didn't misunderstand what you want to do there ....

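
The effect of the dummy fetch described in this post can be illustrated with a tiny model. This is a simplified Python sketch under stated assumptions, not an emulator and not code from the thread: the names `HOTSPOTS`, `rts_rom_reads` and `bank_after_rts` are hypothetical, the bank is assumed to switch instantly whenever $1FF8 or $1FF9 appears on the 13-bit address bus, and only the two ROM reads that precede the stack pulls are modelled.

```python
# Index 0 is the bank selected by $1FF8 ("bank 1" in the early posts),
# index 1 the bank selected by $1FF9 ("bank 2").
HOTSPOTS = {0x1FF8: 0, 0x1FF9: 1}


def rts_rom_reads(pc):
    """ROM addresses an RTS located at `pc` puts on the bus before the
    stack pulls: the opcode fetch, then the discarded read of pc+1."""
    return [pc & 0x1FFF, (pc + 1) & 0x1FFF]


def bank_after_rts(pc, bank):
    """Bank selected once the RTS at `pc` has run, starting in `bank`."""
    for addr in rts_rom_reads(pc):
        bank = HOTSPOTS.get(addr, bank)
    return bank


# RTS at $FFF8 while in the second bank: the fetch selects the first
# bank, but the dummy read of $FFF9 selects the second bank again.
print(bank_after_rts(0xFFF8, bank=1))

# RTS at $FFF7: only the dummy read touches a hotspot ($FFF8).
print(bank_after_rts(0xFFF7, bank=1))
```

In this model the RTS at $FFF8 ends up back in the bank it started from, while the RTS at $FFF7 ends in the other bank, which is the behaviour the post predicts. Whether real cartridge hardware switches quickly enough is a separate question that the model does not answer.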
+batari Posted September 22, 2004

> Interesting, but sorry, I don't think I fully understand your code here. Why do you have to correct the return address?

I am correcting the return address so control will be passed to the other bank at the same address as the JSR, instead of three bytes later as happens when you LDA $1FFx. I wanted to do this because I'm doubling up a 4k game and I wanted to avoid having to move everything around. I thought it would be particularly useful for startup code or after pressing RESET, since the ROM could be in either bank, so putting a JSR $FFF0 in bank 2 will automatically go to bank 1 without having to put in NOPs here and there or jumps in various places.

For instance, the typical startup code in bank 1 might look like:

    F000: SEI
          CLD
          LDX #$FF
          TXS
          ... rest of code ...

If you begin in bank 2 but you want to start in bank 1, you can't just start with a

    F000: LDA $1FF8

without other support code. However, if we put the following in bank 2:

    F000: SEI
          CLD
          LDX #$FF
          TXS
          JSR $FFF0

then we'll automatically jump to bank 1 at the right place instead of three bytes ahead. Or at least that's what I wanted to do.

> A solution would be to put the RTS at FFF7. Then you would switch banks during the dummy fetch of the RTS (at least in theory
> Here is what RTS does, step by step ( The PC+1 is the problem ...)

I tried this and it worked in StellaX and Z26! In PCAE it didn't work at all, though, but does anyone use PCAE? The Stella and Z26 emulators must do the PC+1 thing, because an RTS at $1FF7 did switch from bank 2 to 1.

By the way, where did you find that cycle-by-cycle info on the 6502? That might be useful for figuring out some more dirty tricks.

> You could just use a jump table at the very end of each bank.

I'll try this if I have problems with my idea. Whether my idea will really work or not depends on the actual bankswitching hardware, as suggested.

Kroko Posted September 23, 2004

> By the way, where did you find that cycle-by-cycle info on the 6502? That might be useful for figuring out some more dirty tricks

One good document is "64doc", which I downloaded from ftp://ftp.funet.fi/pub/cbm/documents/chipdata/64doc

I also found some tables by searching Google :-)

Nukey Shay Posted March 8, 2005

Raiders Of The Lost Ark avoids entry points by saving a short routine in RAM and then just JMPing to RAM.

    Bank1:
           ;store address to JMP to after bankswitch
           LDA #<LF844   ;2
           STA $88       ;3
           LDA #>LF844   ;2
           STA $89       ;3
    LDFAD: LDA #$AD      ;2 (LDA $FFF9)
           STA $84       ;3
           LDA #$F9      ;2
           STA $85       ;3
           LDA #$FF      ;2
           STA $86       ;3
           LDA #$4C      ;2 (JMP $)
           STA $87       ;3
           JMP.w $84     ;3

    Bank2:
    LF48B: ;store address to JMP to after bankswitch
           LDA #<LD024   ;2
           STA $88       ;3
           LDA #>LD024   ;2
           STA $89       ;3
    LF493: LDA #$AD      ;2 (LDA $FFF8)
           STA $84       ;3
           LDA #$F8      ;2
           STA $85       ;3
           LDA #$FF      ;2
           STA $86       ;3
           LDA #$4C      ;2 (JMP $)
           STA $87       ;3
           JMP.w $84     ;3

That's pretty flexible, since you can set any point without worrying about filler. The downside is that you need 6 bytes of temp RAM to hold the 2 instructions (as well as 8 bytes plus a branch of setup code per jump). A regular JMP table would probably be less costly.

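
The six RAM bytes assembled by the Raiders routine can be written out directly. This Python sketch is only an illustration of the byte layout, not code from the thread: the function name `ram_stub` is hypothetical, and it assumes that the disassembler label LF844 stands for address $F844.

```python
LDA_ABS = 0xAD  # opcode for LDA absolute
JMP_ABS = 0x4C  # opcode for JMP absolute


def ram_stub(hotspot, target):
    """Six bytes for RAM: LDA hotspot (absolute), then JMP target.
    Both 16-bit operands are stored low byte first."""
    return bytes([
        LDA_ABS, hotspot & 0xFF, hotspot >> 8,
        JMP_ABS, target & 0xFF, target >> 8,
    ])


# Bank 1 case from the post: touch $FFF9, then continue at $F844.
stub = ram_stub(0xFFF9, 0xF844)
print(stub.hex(" "))  # the bytes stored at $84..$89, in order
```

Laid out like this, it is easy to see where the 6 bytes of temporary RAM go: three for the hotspot access and three for the jump.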
Dav Posted March 8, 2005

I like to use RAM for banking like that in my projects for patches. Probably not very useful for Atari though, because of the limited RAM. Usually I line up entry points, but that can be a hassle if you've got a lot of code already locked in for different banks.

I have a sub in bank 0 that I call from all other banks. So I have code in RAM that looks like this. I just call the sub in RAM. X is the bank number and is stored in the same spot in every bank.

    load current bank #X
    store later in code
    switch bank #0
    call sub
    switch bank #X
    return

+batari Posted March 9, 2005

Weirdest thing... I had a dream about this last night.

If you want to use RAM to set up custom entry points, you can manipulate the stack. Then you use just two bytes of temp RAM that you would probably be using for the stack anyway. All you need to do is put an RTS in the right place.

Example:

    Bank 0:
    FFF8: RTS

    Bank 1:
    FFF7: RTS

Now recall from earlier posts: Kroko pointed out that an RTS accesses the next byte and throws the value away, but the side effect is that it will switch banks, since we're accessing FFF8 or FFF9. But since it's an RTS, we can also use it to set up an entry point using stack pushes and "return" there.

To call the routine, do something like this:

    LDA #>ROUTINE
    PHA
    LDA #<(ROUTINE-1)
    PHA
    JMP $FFFx   ;x=8 if currently in bank 0, 7 in bank 1.

I haven't tested this (it's just at the idea phase), but it should work in theory unless I've made a mistake somewhere. The only limitation I see is that ROUTINE can't be on an address ending in 00.

+batari Posted March 9, 2005

I just tried this in A-team, which also uses the Raiders technique. It saved 36 bytes of ROM and at least 4 of RAM, and the game seems to work just fine, at least in Stella. I just want to know if this works on real hardware!

Code snippets I changed:

in bank 0:

    ; around LBCD5:
           LDA #$69      ;2
           STA $8B       ;3
           LDA #$F6      ;2
           STA $8C       ;3
           LDA #$AD      ;2
           STA $87       ;3
           LDA #$F9      ;2
           STA $88       ;3
           LDA #$FF      ;2
           STA $89       ;3
           LDA #$4C      ;2
           STA $8A       ;3
           JMP.w $0087   ;3

in bank 1:

    LF617: LDA #$28      ;2
           STA $8B       ;3
           LDA #$B0      ;2
           STA $8C       ;3
           LDA #$AD      ;2
           STA $87       ;3
           LDA #$F8      ;2
           STA $88       ;3
           LDA #$FF      ;2
           STA $89       ;3
           LDA #$4C      ;2
           STA $8A       ;3
           JMP.w $0087   ;3

new code:

bank 0:

           LDA #$F6      ;2
           PHA           ;3
           LDA #$68      ; return address -1 from above
           PHA           ;3
           JMP $FFF8
           ;then 18 NOPs to fill in replaced code

bank 1:

           LDA #$B0      ;2
           PHA           ;3
           LDA #$27      ; return address -1
           PHA           ;3
           JMP $FFF7
           ;then 18 NOPs

Of course, you need to put RTSs at $FFF8 and $FFF7.

Anyway, here's the bin: ateam.zip

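
The values pushed in the new code follow from how RTS works: it pulls the low byte, then the high byte, and adds 1. The following Python sketch reproduces that arithmetic for the two A-team targets; it is an illustration only, and the helper names `rts_push_bytes` and `low_byte_only_ok` are hypothetical.

```python
def rts_push_bytes(target):
    """Return (first_push, second_push) so that a later RTS resumes at
    `target`. The high byte goes on the stack first; the stacked value
    is target - 1 because RTS increments the address it pulls."""
    ret = (target - 1) & 0xFFFF
    return ret >> 8, ret & 0xFF


def low_byte_only_ok(target):
    """True when subtracting 1 does not borrow from the high byte, so
    pushing the target's own high byte and (low byte - 1) is valid.
    This is the 'address can't end in 00' restriction from the thread."""
    return (target & 0xFF) != 0x00


for target in (0xF669, 0xB028):
    high, low = rts_push_bytes(target)
    print(f"${target:04X}: push ${high:02X}, then ${low:02X}")
```

For $F669 and $B028 this gives the same pairs as the snippets above ($F6/$68 and $B0/$27). Computing the full 16-bit `target - 1` as done here would also sidestep the restriction on addresses ending in 00, at the cost of the high byte no longer matching the routine's own page.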
+Stephen Posted March 9, 2005

> I just tried this in A-team, which also uses the Raiders technique. It saved 36 bytes of ROM and at least 4 of RAM, and the game seems to work just fine, at least in Stella. I just want to know if this works on real hardware!
>
> new code:
>
> bank 0:
>
>        LDA #$F6      ;2
>        PHA           ;3
>        LDA #$68      ; return address -1 from above
>        PHA           ;3
>        JMP $FFF8
>        ;then 18 NOPs to fill in replaced code
>
> bank 1:
>
>        LDA #$B0      ;2
>        PHA           ;3
>        LDA #$27      ; return address -1
>        PHA           ;3
>        JMP $FFF7
>        ;then 18 NOPs
>
> Of course, you need to put RTSs at $FFF8 and $FFF7. Anyway, here's the bin:

That's some really nice code there. Sorry I cannot test it on a real 2600.

Stephen Anderson

Kroko Posted March 9, 2005

> Weirdest thing... I had a dream about this last night

What a nice dream! Your dream came true :-) At least it works fine on the Krokodile Cart.

Blackbird Posted March 10, 2005

Very nice! It's elegant, it's simple... and it works!

Expanding a little on that, if you use the routine a lot, then maybe using lda, pha twice to get the address would be overkill. I'm not sure if this would even work, but it might:

           jsr SwapBank1
           .word SomeRoutine

    SwapBank1: ; return address is at stack+1
           tsx
           inx
           lda ($00,x)
           sta temp1
           inc $01,x
           lda ($00,x)
           sta temp2
           inx
           txs
           lda temp1
           pha
           lda temp2
           pha
           jmp LFFF7

    LFFF7: rts

At the expense of a 23-byte routine, you can save four bytes every time you switch banks, and use a more elegant syntax. It could probably be expanded to check the higher bits to see which bank to switch to ($0xxx could be bank 0, $1xxx bank 1, etc.). You'd need to switch banks several times in the code in order to save space, but maybe there's a way the code could be simplified... just a thought, anyway.

Blackbird Posted March 10, 2005

Looking over it a bit, it can easily be improved, and I had the JSR/RTS address wrong. 18 bytes, no RAM needed:

           jsr SwapBank1
           .word SomeRoutine-1

    SwapBank1: ; routine address is located at the return address (stack+1)
           tsx
           inc $02,x     ; shift routine address pointer to low byte ($ff will roll over!)
           lda ($01,x)   ; load low byte of routine address
           tay           ; store it in y
           inc $02,x     ; shift routine address pointer to high byte
           lda ($01,x)   ; load high byte of routine address
           sta ($02,x)   ; rewrite the new routine address
           tya           ; over the older return address
           sta ($01,x)
           jmp LFFF7     ; switch banks

    LFFF7: rts

Bruce Tomlin Posted March 10, 2005

I bet you could save another byte by hooking it up to the IRQ vector and using BRK + .word ADDRESS-1. It should also save you an INC instruction.

Because of the way you're not propagating carry, it would probably be a good idea to make the bank call macro check for the call wrapping around a page boundary, and either complain or add a few NOPs as necessary.

+batari Posted March 10, 2005

>        jsr SwapBank1
>        .word SomeRoutine-1
>
> SwapBank1: ; routine address is located at the return address (stack+1)
>        tsx
>        inc $02,x     ; shift routine address pointer to low byte ($ff will roll over!)
>        lda ($01,x)   ; load low byte of routine address
>        tay           ; store it in y
>        inc $02,x     ; shift routine address pointer to high byte
>        lda ($01,x)   ; load high byte of routine address
>        sta ($02,x)   ; rewrite the new routine address
>        tya           ; over the older return address
>        sta ($01,x)
>        jmp LFFF7     ; switch banks
>
> LFFF7: rts

Great idea! If you switch enough, this may save some space. It took me a while to figure out how this worked (I've never known a practical use for indexed indirect addressing before).

I think you could save one more byte by using sty $01,x, and the sta ($02,x) should be changed to sta $02,x, I think.

           jsr SwapBank1
           .word SomeRoutine-1

    SwapBank1: ; routine address is located at the return address (stack+1)
           tsx
           inc $02,x     ; shift routine address pointer to low byte ($ff will roll over!)
           lda ($01,x)   ; load low byte of routine address
           tay           ; store it in y
           inc $02,x     ; shift routine address pointer to high byte
           lda ($01,x)   ; load high byte of routine address
           sta $02,x     ; rewrite the new routine address
           sty $01,x     ; over the older return address
           jmp LFFF7     ; switch banks

    LFFF7: rts

Regarding the BRK, if you did it this way, you'd probably instead put an RTI at LFFF7 to keep the stack balanced, since it pushes the address plus flags. This would also save the three-byte JSR.

Bruce Tomlin Posted March 10, 2005

Oh yeah, BRK would save two bytes of code, not one. Plus two more for losing an INC in the common code.

And shouldn't that be INC 1,X, not INC 2,X? The low byte of the address should come first in memory because the 6502 is little-endian.

Nice use of TSX and the (n,X) addressing mode for a pointer in the stack, and the 6800-like index register usage. That is so unlike the 6502 code that I'm used to seeing. And there's a good reason why it looks so strange. This is only possible because the 2600 mirrors the stack into the zero page. You couldn't do that trick on an Apple II, 400/800/5200, 7800 in native mode, NES, or just about anything else with a 6502. It's only because the 2600 has so little RAM that they mirrored the zero page into the stack area that makes this stack trick possible!

One other thing about the RTS (or RTI) at FFF7/FFF8. Won't the bank with an RTS at FFF7 also need one at FFF8, in case the reading of FFF8 switches from the other bank?

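
The SwapBank1 idea discussed in the last few posts can be traced with a small model. This is a Python sketch under stated assumptions, not code from the thread: it takes the INC to act on the low byte of the stacked return address (the INC $01,X form suggested in this post), treats `rom` as a plain dictionary of bytes, and uses the hypothetical names `swap_bank_target` and `call_site_ok` and a hypothetical call site at $F100.

```python
def swap_bank_target(rom, jsr_addr):
    """Follow the inline-address trick: the two bytes after the
    'JSR SwapBank1' hold (routine - 1), low byte first. Returns the
    address at which the final RTS resumes."""
    ret = (jsr_addr + 2) & 0xFFFF          # JSR stacks the address of its last byte
    lo_ptr, hi_ptr = ret & 0xFF, ret >> 8  # stack image: low at S+1, high at S+2

    lo_ptr = (lo_ptr + 1) & 0xFF           # INC $01,X (no carry into the high byte)
    low = rom[(hi_ptr << 8) | lo_ptr]      # LDA ($01,X): low byte of routine - 1
    lo_ptr = (lo_ptr + 1) & 0xFF           # INC $01,X
    high = rom[(hi_ptr << 8) | lo_ptr]     # LDA ($01,X): high byte of routine - 1

    return (((high << 8) | low) + 1) & 0xFFFF  # RTS adds 1 to the pulled address


def call_site_ok(jsr_addr):
    """The inline word must stay on the same page as the last byte of
    the JSR, because the INCs never propagate a carry."""
    return ((jsr_addr + 2) & 0xFF) <= 0xFD


# Hypothetical call site at $F100, inline word $F668 (routine $F669 minus 1).
rom = {0xF103: 0x68, 0xF104: 0xF6}
print(hex(swap_bank_target(rom, 0xF100)))
print(call_site_ok(0xF100), call_site_ok(0xF1FC))
```

A check like `call_site_ok` is the kind of page-boundary test suggested earlier in the thread for the bank call macro: a call site whose inline word straddles a page would make the routine read the wrong bytes.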
Bruce Tomlin Posted March 10, 2005

I just thought of something else possibly cool about using BRK. Because you have an extra byte on the stack, you could avoid using the Y register as temporary storage. Instead of TAY and TYA, you would use INX and TXS.

+batari Posted March 10, 2005

> One other thing about the RTS (or RTI) at FFF7/FFF8. Won't the bank with an RTS at FFF7 also need one at FFF8 in case the reading of FFF8 switches from the other bank?

As long as I'm understanding this right, I would guess not, because when JMPing to FFF7, the 6502 fetches the opcode in the first cycle and doesn't access FFF8 until the next cycle. And if we JMP to FFF8 from bank 0, we won't switch until we access FFF9 in the next cycle. But it doesn't hurt to put it in anyway, since you can't put data at FFF8 regardless.

In the A-team bin, I did put RTSs at FFF7 and FFF8 in both banks, regardless of whether I really have to or not.