I've been kicking around an idea for a versatile bankswitch method that is easier to use than existing methods and would have been technically possible on 1980s hardware, and could be used for games where the author prefers not to enhance them with ARM code.
The basic idea is that it's 32k ROM and 32k RAM. Having a lot of RAM can do a lot of the same things that DPC+ or CDFJ can do. However, looking at the various RAM schemes, I find them a bit difficult to program for.
So, I set this up like FE/SCABS, nominally with 8 banks, so the memory model is eight 4k banks located at addresses $1000-$1FFF, $3000-$3FFF, and so on, up to $F000-$FFFF. I am pretty sure that FE/SCABS requires the stack pointer at $FF to work...? For those not familiar with SCABS, bankswitching is transparent and done with JSR/RTS, and code appears, to the programmer, to live in fixed areas of memory. For example, a JSR to $1000-$1FFF, $3000-$3FFF, $5000-$5FFF, or any alternating 4k block on up to $F000-$FFFF will automatically switch to code appearing to "live" in those areas without any need to directly access the bankswitch hotspots. Then, RTS will return to the original code location, so the memory model is easy to set up and understand, and easy to program for.
Beyond SCABS, the basic idea here is that 32k RAM can be added to a SCABS-like scheme such that it is always available to the programmer easily and without having to swap around blocks of memory in a small 4k space. The RAM can be used as code or data. Also, there is no concept of a "read port" or a "write port" in RAM, and the RAM is available in the full 32k of space in full 4k sections just like the ROM.
In addition, SCABS normally just has banks of code in alternate 4k chunks, but the data itself is only available within the bank it's running in. This new method breaks out of that barrier by also allowing the entirety of the ROM data to be accessed from anywhere at any time without the need to switch banks or swap out areas of ROM/RAM.
How it works:
Just as $1FF (in the stack) is a likely virtual hotspot for SCABS, several zero-page locations serve as virtual hotspots for the reading and writing of RAM/ROM anywhere in memory. First off, you are allowed two levels of JSRs instead of one (SP should be $FF or $FD). For accessing the 32k ROM/RAM as data, the addresses $F3, $F5, $F7, $F9, and $FB are reserved. Each holds the high byte of the address you want to access (and D5-D7 of these bytes are how the hardware "knows" where to read or write data to/from!)
How to access the 32k of RAM for data:
To write to RAM from any location, you use STA ($FA),y. For instance, if $FA is $00, $FB is $32, and y is 2, this instruction will store to RAM at $3202 in your virtual 32k address space (again, at $1000-$1FFF, $3000-$3FFF, and so on.)
To read from RAM you use LDA ($F8),y. For instance, if $F8 is $00, $F9 is $32 and y is 2, this will read from RAM at $3202.
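Here is a rough sketch of the write and read described above, in DASM syntax. The equate names are made up for illustration, and the pointers are set up through the $4xx mirrors as recommended further down. Nothing here has been run on real hardware or in an emulator; it only shows the intended sequence.

```asm
        processor 6502

; Hypothetical equate names; only the addresses come from the scheme.
RAMRD_PTR   = $F8       ; RAM read pointer ($F8 = low, $F9 = high)
RAMWR_PTR   = $FA       ; RAM write pointer ($FA = low, $FB = high)
RAMRD_LO_M  = $4F8      ; mirrors used for setting the pointers up
RAMRD_HI_M  = $4F9
RAMWR_LO_M  = $4FA
RAMWR_HI_M  = $4FB

        org $F000

WriteThenRead:
        lda #$00
        sta RAMWR_LO_M          ; low byte = $00 for both pointers
        sta RAMRD_LO_M
        lda #$32
        sta RAMWR_HI_M          ; high byte = $32 for both pointers
        sta RAMRD_HI_M
        ldy #2
        lda #$A5
        sta (RAMWR_PTR),y       ; store $A5 to RAM at $3202
        lda (RAMRD_PTR),y       ; read RAM at $3202 back into A
        rts
```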
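Unlike practically all RAM routines proposed for the 2600, you can also use read-modify-write instructions with this scheme! Use address $F6. One catch: the 6502 has no INC or DEC with a (zp),y or (zp,x) addressing mode, so the read-modify-write instructions available here are the illegals: SLO, RLA, SRE, RRA, DCP, and ISB. For example: DCP ($F6),y decrements the byte in RAM (and then compares it with A).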
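A minimal sketch of a read-modify-write access in DASM syntax. Since the 6502 has no INC or DEC with an indirect addressing mode, this uses the undocumented DCP, which decrements memory and then compares the result with A. The equate names are hypothetical and the behavior of $F6 is taken from the description above, not from tested hardware.

```asm
        processor 6502

; Hypothetical equate names; only the addresses come from the scheme.
RMW_PTR   = $F6         ; read-modify-write pointer ($F6 = low, $F7 = high)
RMW_LO_M  = $4F6        ; mirrors used for setting the pointer up
RMW_HI_M  = $4F7

        org $F000

DecCounter:
        lda #$00
        sta RMW_LO_M
        lda #$32
        sta RMW_HI_M
        ldy #2
        dcp (RMW_PTR),y         ; RAM at $3202 is decremented, then compared with A
        rts                     ; A is unchanged; N, Z and C reflect the compare
```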
You can do the same with ROM, just as easily.
To read data from anywhere in the 32k ROM you use LDA ($F4),y
There should also be a special "copy from ROM to RAM" location to more quickly set up things like self-modifying RAM kernels (this is undergoing testing, but I think it should work) or simply copying data.
To copy ROM to the same location in RAM, do a dummy read through ($F2),y. A NOP would be ideal, but the 6502 has no NOP (legal or illegal) with a (zp),y or (zp,x) addressing mode, so use CMP ($F2),y if you don't mind flags being trashed. LDA (or the illegal LAX) would work as well if you don't mind the value being loaded into a register as it's written to RAM!
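As a sketch of how the copy hotspot might be used, this copies one 256-byte page from ROM to the same address in RAM. It uses CMP rather than NOP, because there is no NOP opcode with a (zp),y addressing mode for an assembler to emit. The equate names are hypothetical, and the copy behavior itself is still untested as noted above.

```asm
        processor 6502

; Hypothetical equate names; only the addresses come from the scheme.
COPY_PTR  = $F2         ; ROM-to-RAM copy pointer ($F2 = low, $F3 = high)
COPY_LO_M = $4F2        ; mirrors used for setting the pointer up
COPY_HI_M = $4F3

        org $F000

CopyPage:
        lda #$00
        sta COPY_LO_M           ; low byte = 0 so pointer+y stays within the page
        lda #$32
        sta COPY_HI_M           ; page $3200-$32FF
        ldy #0
CopyLoop:
        cmp (COPY_PTR),y        ; dummy read of ROM at $3200+y; hardware copies it to RAM
        iny
        bne CopyLoop            ; 256 bytes
        rts                     ; flags are trashed by CMP, A is not
```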
You can also use, for instance, (zp,x) to access any ROM/RAM. Just be sure zp+x=$F2, $F4, $F6, $F8, or $FA so you use the expected function.
You can use any instruction that has a (zp),y or (zp,x) addressing mode, including illegals such as LAX (zp),y, SAX (zp,x)
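A small sketch of the (zp,x) form in DASM syntax, reading one byte of ROM through the $F4 pointer. With (zp,x) there is no data index, so the pointer holds the full address, and X just picks which hotspot gets used. A base of $00 is my own choice here, and the equate names are hypothetical.

```asm
        processor 6502

; Hypothetical equate names; only the addresses come from the scheme.
ROMRD_PTR  = $F4        ; ROM read pointer ($F4 = low, $F5 = high)
ROMRD_LO_M = $4F4       ; mirrors used for setting the pointer up
ROMRD_HI_M = $4F5

        org $F000

ReadRomByte:
        lda #$10
        sta ROMRD_LO_M
        lda #$52
        sta ROMRD_HI_M          ; pointer = $5210
        ldx #ROMRD_PTR          ; X = $F4, so $00 + X lands on the ROM read hotspot
        lda ($00,x)             ; read ROM at $5210
        rts
```

Loading X with $F8 instead would turn the same instruction into a RAM read through the $F8 pointer.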
The only conventional bankswitch thing here is a RAM/ROM switch. All it does is select whether code is running in ROM or RAM. Its function is similar to old 8-bit home computers that had the OS, BASIC, and/or Kernel code in ROM, but RAM lived "underneath" it and could be swapped into these memory locations instead with a POKE. Just as with the old home computers, it's often useful to copy parts of the system ROM to RAM before you run anything there to avoid a crash.
I haven't worked out the best way to do a RAM/ROM switch, but keeping with the idea here, use $F1 (so the value can be read back at a mirror such as $4F1 to get bank status, if needed). Each of the eight bits stored selects whether that particular "bank" points to RAM or ROM: 0 for ROM and 1 for RAM. Note that this "bank" ONLY applies to code, not data. Data is *always* accessible from anywhere as RAM or ROM regardless of this setting. It only applies to code banks you JSR to or RTS back from. Also, storing to $F1 does NOT have an immediate effect, so the bank you are running code from will NOT be swapped out from under you. The change is only activated by JSR/RTS*.
$F0 is another hotspot that does the same thing as $F1, but only applies to the "second-level" JSR. For example, you can set $F0 to $FF and $F1 to $00, and your entire ROM/RAM areas can then be accessed for running code using only JSR/RTS without needing to change the switches. Could be useful during a kernel.
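To make the RAM/ROM code switch concrete, here is a sketch of main-line code (SP at $FF, so the JSR is a first-level one governed by $F1) that calls a routine in the $3000 bank out of RAM. Which bit goes with which bank isn't pinned down above; this assumes bit n selects bank n, with bank 0 at $1000 and bank 1 at $3000. RamRoutine and the equate names are hypothetical, and the code at $3000 is assumed to have been copied into RAM beforehand.

```asm
        processor 6502

; Hypothetical names. ASSUMPTION: bit n of the switch byte selects bank n,
; where bank 0 = $1000-$1FFF and bank 1 = $3000-$3FFF.
CODE_SEL1_M = $4F1              ; mirror of $F1 (first-level JSR switch)
BANK1_RAM   = %00000010         ; bank 1 runs from RAM, all others from ROM
RamRoutine  = $3000             ; hypothetical routine already copied to RAM

        org $F000

Main:                           ; main-line code, SP assumed to be $FF
        lda #BANK1_RAM
        sta CODE_SEL1_M         ; no immediate effect on the running bank
        jsr RamRoutine          ; switch takes effect here; $3000 runs from RAM
        jmp Main                ; RTS brought us back to ROM in the $F000 bank
```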
Absolute, absolute,x, and absolute,y addressing modes are not currently supported for extended RAM/ROM access, and will instead point to ROM in the current bank just like normal SCABS does. (Note that even if RAM is swapped into the current bank for code execution, a store to the current bank using these addressing modes will not actually store anything.)
Page wrapping is not supported for the (),y indexing, though in theory, maybe it could be? I will look into that. But for now, it would be best to ensure that the pointer+y does not wrap, or better, always set the low byte of the pointer to zero.
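One way to stay clear of page wrapping is to keep the pointer's low byte at zero and step the high byte once per 256 bytes. The sketch below clears the whole 4k RAM bank at $3000 that way. The equate names are hypothetical, and the high byte is incremented through its $4xx mirror rather than in zero page.

```asm
        processor 6502

; Hypothetical equate names; only the addresses come from the scheme.
RAMWR_PTR  = $FA        ; RAM write pointer ($FA = low, $FB = high)
RAMWR_LO_M = $4FA       ; mirrors used for setting and stepping the pointer
RAMWR_HI_M = $4FB

        org $F000

ClearBank:
        lda #$00
        sta RAMWR_LO_M          ; low byte stays 0, so pointer+y never wraps
        lda #$30
        sta RAMWR_HI_M          ; start at $3000
        ldx #16                 ; 16 pages of 256 bytes = 4k
        ldy #0
        lda #$00                ; fill value
ByteLoop:
        sta (RAMWR_PTR),y
        iny
        bne ByteLoop            ; Y is back at 0 after each full page
        inc RAMWR_HI_M          ; next page, stepped through the mirror
        dex
        bne ByteLoop
        rts
```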
Writes to set up the pointer locations $F3, $F5, $F7, $F9, or $FB should be done from mirror locations such as $4F3, $4F5, $4F7, etc, to avoid accidental RAM writing or other issues. The bankswitch hardware may be able to detect some cases of unintended writes to these areas, but it's not certain if it could be foolproof, so be sure to set up equates to mirror locations. This includes not just setting up pointers but, for instance, incrementing a pointer.
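A possible set of equates along those lines, in DASM syntax. All of the names are made up; only the addresses come from the scheme as described. The zero-page names are meant for the (zp),y and (zp,x) operands only, and the _M names are the $4xx mirrors for any store, increment, or read-back.

```asm
        processor 6502

; Hypothetical names; addresses are from the scheme as described.
; Zero-page forms: use ONLY as (zp),y / (zp,x) operands.
COPY_PTR    = $F2       ; ROM-to-RAM copy
ROMRD_PTR   = $F4       ; ROM read
RMW_PTR     = $F6       ; RAM read-modify-write
RAMRD_PTR   = $F8       ; RAM read
RAMWR_PTR   = $FA       ; RAM write

; Mirror forms: use for every ordinary access to these locations.
CODE_SEL2_M = $4F0      ; ROM/RAM code switch, second-level JSR
CODE_SEL1_M = $4F1      ; ROM/RAM code switch, first-level JSR
COPY_LO_M   = $4F2
COPY_HI_M   = $4F3
ROMRD_LO_M  = $4F4
ROMRD_HI_M  = $4F5
RMW_LO_M    = $4F6
RMW_HI_M    = $4F7
RAMRD_LO_M  = $4F8
RAMRD_HI_M  = $4F9
RAMWR_LO_M  = $4FA
RAMWR_HI_M  = $4FB
```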
Care should be taken with clearing RAM and TIA at boot, and the stock routines for clearing these areas should not be used as they typically store to zero page or use the stack. It's best to use a custom routine that simply writes zeros to $400-$4FF.
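Something like the following might do for the boot clear. It is only a sketch: it sets SP to $FF with TXS (which doesn't touch memory) and then zeroes $400-$4FF with an absolute,x store, so nothing goes through zero-page addressing or the stack. $400-$47F mirrors the TIA and $480-$4FF mirrors the console RAM.

```asm
        processor 6502

        org $F000

Reset:
        sei
        cld
        ldx #$FF
        txs                     ; SP = $FF, no memory access involved
        lda #0
        tax                     ; X = 0
ClearLoop:
        sta $0400,x             ; absolute,x store to the TIA/RAM mirrors
        inx
        bne ClearLoop           ; covers $400-$4FF
                                ; falls through with A = 0, X = 0, SP = $FF
```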
Programmers should avoid using (zp,x) or (zp),y instructions that point to locations outside of cartridge space as the results may be unpredictable.
A 32k ROM+32k RAM scheme would be limited to Harmony Encore or a homebrew board with a 32k RAM chip (which is on the drawing board.) Other flashcarts should be able to support it as well.
A 64k ROM+32k RAM version could also be made. How would that work? The first-level JSR (called when SP=$FF) accesses the upper 32k ROM, and a second-level JSR (called when SP=$FD) will use the lower 32k ROM. In addition, a second ROM/RAM hotspot at $F0 could determine the status of ROM/RAM banks for the lower 32k. The ROM data access hotspots will also point to this lower 32k when code is running within a second-level JSR.
Also planned is a "mini" version of this that has 28k ROM and 4k RAM and would work on a standard Harmony, and doesn't require a new homebrew board. The "mini" version has RAM hardwired to $1000-$1FFF and ROM hardwired to $3000-$3FFF, $5000-$5FFF, and so on, up to $F000-$FFFF. Because of this, the "RAM/ROM" code switch isn't needed, and the ROM-to-RAM copy read through ($F2),y will always read the ROM bank indicated by D5-D7 of $F3 and always write out to the single 4k bank of RAM. Even the mini version could make for some fancy kernels with self-modifying code!
This scheme needs a new name! Any suggestions? Preferably not something with "SCABS" in the name...
By the way, I do have a stripped-down proof of concept of some of the ideas here in a private build of Stella, but it needs a lot of work before I can release anything.
*For those concerned that a 6-cycle JSR or RTS is too slow for a kernel, and/or for those who want to use the stack for missiles and the ball in a kernel but still be able to bankswitch, there are alternatives. I think that a JMP ($1FE) or JMP ($1FC) might also bankswitch in 5 cycles and perform a jump, provided you set up the addresses beforehand. Also, a PHA with D5-D7 set in A appropriately may be able to bankswitch to the same location in another bank in just 3 cycles, provided the SP is set to $FF or $FD. I haven't tested this but it seems like either trick should work.