Large 3E cartridge support on the UnoCart

DirtyHairy · May 25, 2018

After adding support for large (512k) 3F cartridges to the UnoCart, I am considering doing the same for 3E cartridges. Depending on the STM32F4x variant on the cart, the UnoCart has either 1024k or 512k flash available. This translates to either up to 512k ROM or 448k ROM, with 160k RAM available to the VCS (up to 512k ROM would be possible in the second case by sacrificing some RAM).

To the best of my knowledge, this would be the first cartridge format that has such a large amount of RAM available. Is anybody interested in this?

Edited May 25, 2018 by DirtyHairy

ZackAttack · May 26, 2018

For the 3E scheme I would think favoring ROM over RAM would make more sense. I just can't imagine a use case for that much RAM in a scheme that doesn't also have some form of ARM acceleration. More ROM can always enable more levels, graphics, or gameplay. More RAM can enable some more complex kernels, but diminishing returns would apply well before 160K.

I believe the Harmony currently supports 3E with up to 32KB ROM and 16KB RAM, and I'm not aware of anyone who has ever come close to exceeding the RAM limit.

32KB of RAM is interesting because you could have a double buffer of a fully unrolled display kernel. Maybe double that if you also include a giant dynamic game world. So I guess my interest would plateau at around 64KB of RAM.

Looking forward to whatever you decide to go with.

Thomas Jentzsch · May 26, 2018

The problem with 3E is that RAM can only be accessed by one bank. This turned out to be a real problem in BD, so Andrew and I thought about more flexible schemes.

3E+ is the final result of this, and I would suggest implementing this one instead.

DirtyHairy · May 26, 2018

32KB of RAM is interesting because you could have a double buffer of a fully unrolled display kernel. Maybe double that if you also include a giant dynamic game world. So I guess my interest would plateau at around 64KB of RAM.

Yeah, I have been thinking along the same lines. An unrolled kernel is about 16k, double that for double buffering, and maybe double again if you are doing a variation of flicker blinds --- this gives 64k. Put another 32k on top for a huge dynamic world, and you end up with 96k. Coincidentally, this would allow for the full 512k ROM even with the "smaller" SOC.
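The arithmetic behind these numbers can be written down as a short sketch. The sizes come from the posts in this thread; the function names are made up for illustration, and the assumption that RAM not handed to the VCS can hold extra ROM data is an inference from the first post rather than a statement about the firmware.

```python
# Sizes in KiB, taken from the figures quoted in this thread.
UNROLLED_KERNEL_K = 16   # "about 16k" for one fully unrolled kernel
DYNAMIC_WORLD_K = 32     # extra RAM for a large dynamic game world
TOTAL_RAM_K = 160        # RAM available on the 512k-flash UnoCart variant
FLASH_ROM_K = 448        # ROM that fits into flash on that variant


def ram_budget_k(double_buffered=True, flicker_variant=True):
    """RAM needed for the kernel scenario sketched above, in KiB."""
    kernel = UNROLLED_KERNEL_K
    if double_buffered:
        kernel *= 2      # one copy on screen, one being rewritten
    if flicker_variant:
        kernel *= 2      # a second set of kernels for alternating frames
    return kernel + DYNAMIC_WORLD_K


def max_rom_k(ram_for_vcs_k):
    """Assumption: RAM not given to the VCS can hold additional ROM data."""
    return FLASH_ROM_K + (TOTAL_RAM_K - ram_for_vcs_k)


budget = ram_budget_k()
print(budget, max_rom_k(budget))   # 96 512
```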

While lots of RAM won't make up for the processing speed of the ARM, fully dynamic kernel code from RAM would increase the number of TIA writes per line to DPC+ / CDF levels. If you use double buffering and can spare a few frames to update the display, this could open up interesting new pathways without using the ARM directly.

The problem with 3E is that RAM can only be accessed by one bank. This turned out to be a real problem in BD, so Andrew and I thought about more flexible schemes.

3E+ is the final result of this, and I would suggest implementing this one instead.

I have been wondering about this as well; 3E has a very restrictive memory layout. If I read the source correctly, 3E+ can support up to 64k ROM and 32k RAM and could be implemented easily. Do you know any ROM that uses this scheme that I could use as a test case?

I have also thought about modifying 3E to support more RAM and ROM in a way that offers flexibility similar to 3E+. One scheme I have come up with is dividing the 4k into four 1k slots. Each bank could hold either 1k of ROM, 1k of RAM (read-only) or 512 bytes of RAM (r/w). In order to switch a bank, you would write four bits to 3E that encode which slot to configure into which mode (ROM, RAM r/o, RAM r/w) and then switch the bank by writing to 3F. This would allow for 256k ROM and 128k RAM and should be a good match for running dynamically generated or patched kernel code from RAM. However, this is pretty much academic without a test case that actually exploits it (beyond simply checking whether the scheme is implemented correctly). I have a few ideas, but no time to try them in the next months, in particular as I haven't tried my hand at anything larger than a test ROM so far. However, if anyone is interested in experimenting with such a scheme, I'm game; that's one of the reasons why I started this thread.
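A minimal sketch of how this two-write scheme could behave, assuming a hypothetical encoding of the four bits written to 3E (bits 0-1 select the slot, bits 2-3 select the mode). The post does not fix an encoding, so the bit layout, the class and the mode numbering below are illustrative only.

```python
from enum import IntEnum


class Mode(IntEnum):
    ROM = 0      # 1k of ROM
    RAM_RO = 1   # 1k of RAM, read-only
    RAM_RW = 2   # 512 bytes of RAM, read/write


class FourSlotBanking:
    """Four 1k slots; a write to 3E picks slot and mode, a write to 3F switches."""

    def __init__(self):
        self.pending = (0, Mode.ROM)          # (slot, mode) latched by 3E
        self.slots = [(Mode.ROM, 0)] * 4      # (mode, bank) per slot

    def write(self, addr, value):
        if addr == 0x3E:
            mode_bits = (value >> 2) & 0x03   # hypothetical: bits 2-3 = mode
            if mode_bits > Mode.RAM_RW:
                return                        # undefined mode: ignored here
            self.pending = (value & 0x03, Mode(mode_bits))
        elif addr == 0x3F:
            slot, mode = self.pending
            self.slots[slot] = (mode, value & 0xFF)


cart = FourSlotBanking()
cart.write(0x3E, (Mode.RAM_RW << 2) | 2)   # configure slot 2 as RAM r/w
cart.write(0x3F, 5)                        # switch bank 5 into that slot

# With an 8-bit bank index: 256 banks of 1k ROM, 256 banks of 512 bytes r/w RAM.
print(256 * 1024 // 1024, 256 * 512 // 1024)   # 256 128
```

The last line reproduces the 256k ROM / 128k RAM figures from the post, which follow directly from an 8-bit bank index.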

Edited May 26, 2018 by DirtyHairy

Thomas Jentzsch · May 27, 2018

I don't think there exists any ROM which uses 3E+ yet (I am (very slowly) converting BD into this).

I like your idea with different RAM modes (BTW: why no write only?). Actually Andrew and I had tried to get larger RAM banks too (DASH). But when I experimented with our approach, this turned out to be overly complicated and restricting again, so I switched to the more usable 3E+. Handling smaller banks is not that much of a hassle if your code already has to take care of bank sizes.

What I don't like in your idea is that you have to do 2 writes to select a bank. The problem is not only the extra CPU cycles for the writes, but also that you have to use 2 registers, so only one register is left to work with. When you need 2 registers (which is quite useful for subroutines), you are forced to store and load one via stack or ZP-RAM.

But the different RAM modes are cool. So assuming that the configuration of the slot layout doesn't change frequently, maybe one could allow permanent configuration of the slots. Then you could configure, e.g. using 3E:

slot 0 as 1k RAM r/o
slot 1 as 1k RAM w/o
slot 2 as 512b RAM r/w
slot 3 as 1k ROM

And later switch in the (RAM or ROM) banks as you like with e.g. 3F.

if preceded by 3E, all 8 bits are used for the bank select (256k ROM and RAM)
if not preceded by 3E, the upper two bits are used for slot select (64k ROM and RAM)

So the lower 64k can be accessed faster than the remaining 192k.
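A minimal sketch of the two decoding paths for a write to 3F, assuming 1k slots as in the list above. How the slot is chosen on the long path is not spelled out in the post; the sketch assumes it comes from the preceding 3E write, and the function name is hypothetical.

```python
def decode_3f(value, slot_from_3e=None):
    """Return (slot, bank) for a write of `value` to 3F.

    slot_from_3e is the slot selected by a preceding 3E write (an assumption),
    or None if the write to 3F was not preceded by one.
    """
    value &= 0xFF
    if slot_from_3e is not None:
        # Long path: all 8 bits select the bank (256 banks of 1k = 256k).
        return slot_from_3e & 0x03, value
    # Short path: upper two bits select the slot, lower six the bank (64k).
    return value >> 6, value & 0x3F


print(decode_3f(0x85))       # (2, 5)
print(decode_3f(0x85, 1))    # (1, 133)
```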

BTW: Since this would currently only require 4 bits in 3E, maybe this could be extended further, e.g. to allow configuring more but smaller slots, or fewer but larger ones (512k ROM), or moving the faster access to a 64k segment in the upper 192k.

Edited May 27, 2018 by Thomas Jentzsch

DirtyHairy · May 27, 2018

Taking a second look at the TIA address space, we should be able to abuse all addresses between 0x2d and 0x3f as banking registers, so there is no need to rely on a particular order of writes. If we keep the smallest slot size at 1024 bytes, we could use one address to set up the slot layout (including switching between 1k, 2k and 4k slots), one address to switch between ROM / RAM and four other addresses to switch between the actual ROM and RAM banks. We could also use another four addresses to store a high byte of the bank address, which would remove the theoretical limit to the ROM and RAM size (we could use the full 192k RAM and 448k / 960k flash on the UnoCart). I like the idea of allowing r/o, w/o and r/w as modes for RAM banks. All in all, a scheme could look like this:

0x3c -- 0x3f: bank select low byte, select bank on write
0x38 -- 0x3b: bank select high byte
0x36: configure address space layout: 4k, 2k+2k, 2x1k + 2k, 2k + 2x1k, 4x1k
0x37: configure bank type: ROM, RAM r/o, RAM w/o, RAM r/w

In order to speed up banking even more, we could add the possibility to store and restore a banking setup

0x34: write to store banking setup in slot 0 .. 255
0x35: write to restore banking setup from slot 0 .. 255
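A minimal sketch of how the bank select and store / restore registers could interact. It assumes that 0x38 and 0x3c belong to slot 0, 0x39 and 0x3d to slot 1, and so on, and it leaves out 0x36 / 0x37 because their encodings are not specified here. The class name is hypothetical.

```python
class BankRegisters:
    """Write-only banking registers at TIA addresses 0x34-0x3f (0x36/0x37 omitted)."""

    def __init__(self):
        self.high = [0] * 4      # latched high bytes, one per slot
        self.active = [0] * 4    # 16-bit bank currently mapped into each slot
        self.saved = {}          # stored banking setups, keyed by 0..255

    def write(self, addr, value):
        value &= 0xFF
        if addr == 0x34:                       # store the current setup
            self.saved[value] = (list(self.high), list(self.active))
        elif addr == 0x35:                     # restore a stored setup
            if value in self.saved:
                high, active = self.saved[value]
                self.high, self.active = list(high), list(active)
        elif 0x38 <= addr <= 0x3B:             # high byte: latch only
            self.high[addr - 0x38] = value
        elif 0x3C <= addr <= 0x3F:             # low byte: switch the bank
            slot = addr - 0x3C
            self.active[slot] = (self.high[slot] << 8) | value


regs = BankRegisters()
regs.write(0x39, 0x01)   # slot 1 high byte
regs.write(0x3D, 0x20)   # slot 1 low byte, switches to bank 0x0120
regs.write(0x34, 7)      # remember this setup as number 7
regs.write(0x3D, 0x00)   # low byte only: bank 0x0100
regs.write(0x35, 7)      # back to the stored setup
print(hex(regs.active[1]))   # 0x120
```

Restoring a stored setup replaces four to eight individual register writes with a single one, which is the speed-up the post is after.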

If we want even more capacity, we could allow for the possibility to swap data into RAM directly from the cartridge image on SD, Supercharger style. This could work by writing a data structure to RAM that describes what to load and where to relocate it, and then triggering the load by writing to a particular address (e.g. 0x33). As with the Supercharger, the 6502 will then have to poll an address from cartridge space for the load to complete and the cartridge to come back online, so the code will have to run from RIOT RAM, but this would allow for huge games (limited only by the 4GB FAT limit on the size of a single file).

Edited May 27, 2018 by DirtyHairy

Thomas Jentzsch · May 28, 2018

I like this, but I would allow maybe even more (granular) address space layouts, down to 256b slots (with some restrictions, e.g. 256b + 3.75k). That way we could configure the bankswitching to emulate all(?) other existing bank switching schemes (especially the old ones).

That sounds useful, though I am not sure here.

0x34, 0x35 look a bit like overkill to me now, but one might find it useful (or even necessary) while actually using the new scheme.

How about storing a jmp address into two registers? So whenever a bank is switched, the code does e.g. JMP (0033). Unless both bytes are 0. And maybe something similar for JSR. This would make banking a bit easier too.

Edited May 28, 2018 by Thomas Jentzsch

ZackAttack · May 29, 2018

Don't forget about the mirrors. Legacy bank schemes avoided differentiating mirrors because of pin counts and complexity. You already have the full address bus, so why not leverage it.

Something I had proposed a while back was to use different mirrors for different page sizes. E.g. storing $05 to $003e would activate the 5th 4KB page. Storing $05 to $013e would activate the 5th 2KB page at $f000-$f7ff. Storing $05 to $023e would activate the 5th 2KB page at $f800-$ffff.

You can take that further and also have different hotspots for different configurations as well. E.g. storing $05 to $003f would activate the specified RAM page as write only. Assuming the RAM is capped at 256KB or less, the top 2 bits can serve as a mode mask: 0?=R/W, 10=RO, 11=WO. That should help conserve how many hotspots are needed.
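A minimal sketch of both ideas: decoding the mirror of the $3e hotspot into a page size and target range, and decoding the mode mask in the top bits of the written value. The function names are hypothetical, and only the three mirrors mentioned above are covered.

```python
def decode_hotspot(addr):
    """Map a write address $003e / $013e / $023e to (page size, base address)."""
    if addr & 0xFF != 0x3E:
        return None
    table = {0: (4096, 0xF000), 1: (2048, 0xF000), 2: (2048, 0xF800)}
    return table.get(addr >> 8)


def decode_ram_select(value):
    """Split a written byte into (mode, page) using the 0? / 10 / 11 mask."""
    value &= 0xFF
    if value & 0x80 == 0:
        return "rw", value & 0x7F      # 0?: read/write, 7 bits of page index
    mode = "wo" if value & 0x40 else "ro"
    return mode, value & 0x3F          # 10 / 11: 6 bits of page index


print(decode_hotspot(0x023E))      # (2048, 63488)
print(decode_ram_select(0xC5))     # ('wo', 5)
```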

I also like the idea of having a metapage hotspot so you can essentially swap 1MB chunks as needed. I assume that one would have some overhead since loading the next 1MB chunk from SD would take some time.

IMO, this would be the closest and most feasible solution to mimicking the flexibility of all the other schemes while also supporting giant carts.

DirtyHairy · May 29, 2018

How about storing a jmp address into two registers? So whenever a bank is switched, the code does e.g. JMP (0033). Unless both bytes are 0. And maybe something similar for JSR. This would make banking a bit easier too.

Unfortunately, we can only write to those registers. Once we read, the TIA will respond and drive the bus. We could still go for some DPC+ like features and introduce data / jump streams if we like, but I would like to keep this out of scope for now.

Don't forget about the mirrors. Legacy bank schemes avoided differentiating mirrors because of pin counts and complexity. You already have the full address bus, so why not leverage it.

Something I had proposed a while back was to use different mirrors for different page sizes. I.E. Storing $05 to $003e would activate the 5th 4KB page. Storing $05 to $013e would activate the 5th 2KB page at $f000-$f7ff. Storing $05 to $023e would activate the 5th 2KB page at $f800-$ffff.

That's pretty clever. Making use of the available mirrors, we could easily support fully configurable slots from 4k down to 256 bytes (at the price of losing one cycle for the non-zeropage access). What about the following: We divide the 4k into 16 chunks of 256 bytes that can be switched either individually or combined as larger chunks. In order to switch banks, we use the TIA mirrors starting from $012d. For each slot size, we use a single set of TIA mirrors, i.e. $012d -- $013f for 4k, $022d -- $023f for 2k, and so on. We use $2f to switch the individual slot modes (bit 0-3 = slot number, bit 4-6 = mode), and use the registers starting from $30 to change the banks for the individual slots. Each slot has a register for the low byte and one register for the high byte, and actual banking is triggered by writes to the low byte (not high bytes or $2f). Altogether, we'd have:

$012f: Change 4k slot mode; bit 4-6 = mode: ROM, RAM r/w, RAM r/o, RAM w/o
$0130 -- $0131: 4k bank lo / 4k bank hi
$022f: Change 2k slot mode: bit 0-3 = slot 0-1, bit 4-6 = mode
$0230 -- $0231: 2k slot 0 bank lo / hi
$0232 -- $0233: 2k slot 1 bank lo / hi
$032f: Change 1k slot mode: bit 0-3 = slot 0-3, bit 4-6 = mode
$0330 -- $0337: 1k slot 0-3 bank lo / hi
$042f: Change 512b slot mode: bit 0-3 = slot 0-7, bit 4-6 = mode
$0430 -- $043f: 512b slot 0-7 bank lo / hi
$052f: Change 256b slot mode: bit 0-3 = slot 0-15, bit 4-6 = mode
$0530 -- $053f: 256b slot 0-7 bank lo / hi
$0630 -- $063f: 256b slot 8-15 bank lo / hi
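The register map above can be decoded mechanically. A minimal sketch, assuming that lo / hi registers alternate starting at $30 (as the 2k entries suggest); the function name and the tuple it returns are made up for illustration.

```python
# Mirror page (high byte of the address) -> slot size in bytes.
SLOT_SIZES = {1: 4096, 2: 2048, 3: 1024, 4: 512, 5: 256, 6: 256}


def decode_register(addr):
    """Return (kind, slot size, slot) for a register address, or None."""
    page, low = addr >> 8, addr & 0xFF
    if page not in SLOT_SIZES:
        return None
    size = SLOT_SIZES[page]
    if low == 0x2F and page != 6:
        return ("mode", size, None)        # slot number comes from the value
    if 0x30 <= low <= 0x3F:
        slot, is_hi = divmod(low - 0x30, 2)
        if page == 6:
            slot += 8                      # $06xx holds 256b slots 8-15
        if slot >= 4096 // size:
            return None                    # no such slot at this size
        return ("bank_hi" if is_hi else "bank_lo", size, slot)
    return None


print(decode_register(0x0232))   # ('bank_lo', 2048, 1)
print(decode_register(0x0531))   # ('bank_hi', 256, 0)
print(decode_register(0x0630))   # ('bank_lo', 256, 8)
```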

Hi bytes and slot modes are "sticky", so after the slots have been configured, switching could usually be done simply by writing the low byte for the corresponding slot. The other functionality we discussed could be placed as follows:

$012d: store banking configuration to slot (up to 256 possible slots, provided there is enough RAM)
$012e: load banking configuration from slot
$022d: load a chunk from the extended image using the relocation descriptor at the specified 16 byte offset in cartridge address space (0 - 4k)

The image would have a small header that declares the amount of RAM, ROM and the number of config slots that are required (so the cartridge can check whether it can support the image and partition its memory accordingly). The initial ROM image is read from the beginning of the image, and chunk loads can be used to swap other parts of the image either into RAM or ROM. Chunk loading from SD is slow and needs to be async, so the code has to be run from RIOT RAM. After writing $022d, the cartridge goes offline, and the code in RIOT RAM has to poll until the cartridge comes back online. The descriptors are simple data structures that define which block of the image should be loaded where into either RAM or ROM.

As an example, the following code would switch to a plain 4k ROM image with 128 byte SARA-style RAM at $f000:

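LDA #$0
STA $012f   ; 4k slot mode: ROM
STA $0131   ; 4k bank hi = 0
STA $0130   ; 4k bank lo = 0, switches in 4k ROM bank 0
LDA #$10
STA $052f   ; 256b slot 0 mode: bits 4-6 = 1 (RAM r/w in the order listed above)
LDA #$0
STA $0531   ; 256b slot 0 bank hi = 0
STA $0530   ; 256b slot 0 bank lo = 0, switches in RAM bank 0 at $f000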

A lot of those writes are just for illustration and could be left out if we define the initial state as 1. all hi bytes zero and 2. all slots in ROM mode. In order to switch to another 4k bank, we'd simply write to $0130 (this would override the RAM bank, however), and if we wanted to switch in the fourth 2k ROM bank at $f800, we'd use

LDA #$03
STA $0232   ; 2k slot 1 bank lo = 3: fourth 2k ROM bank at $f800

(assuming the initial state is as described above and the configuration hasn't been changed since).

What do you think?

Edited May 29, 2018 by DirtyHairy

ZackAttack · May 30, 2018

This mostly sounds good, but I don't really understand what is gained from the explicit mode bits. Isn't it easier to just imply the mode from which address you write to? That's what 3E does, and I've found that easy to work with.

I definitely agree with supporting 4k down to 256 bytes for the size of the slots. Being able to bank in different look up tables could be very useful.

Btw, I posted a prototype side scrolling engine based on 3e to the Castlevania topic a while back. That could easily be adapted to show off what's possible with this new scheme. The biggest reason I didn't develop it further was because I couldn't find hardware that supported a large enough bin to have enough level data. Ideally this gets polished up and made into a batari basic kernel so it's accessible by more of the development community.

DirtyHairy · May 30, 2018

This mostly sounds good, but I don't really understand what is gained from the explicit mode bits. Isn't it easier to just imply the mode from which address you write to? That's what 3E does, and I've found that easy to work with.

I definitely agree with supporting 4k down to 256 bytes for the size of the slots. Being able to bank in different look up tables could be very useful.

Btw, I posted a prototype side scrolling engine based on 3e to the Castlevania topic a while back. That could easily be adapted to show off what's possible with this new scheme. The biggest reason I didn't develop it further was because I couldn't find hardware that supported a large enough bin to have enough level data. Ideally this gets polished up and made into a batari basic kernel so it's accessible by more of the development community.

Having explicit mode registers feels somewhat cleaner to me. If we infer the mode from the location, we need four times the registers we have in my previous version, and the register map would look much messier. That said, I am not much of a VCS programmer, so if something else is easier to work with, I'm open.

Sacrificing a few bits of the lo byte registers would make the split between lo and high bytes messier imho (i.e. 5 bit lo, 8 bit high, 13 bit total).

What I don't like about my suggestion is that activating a large slot will remove any smaller slots that it covers. However, I haven't yet come up with an idea for how to improve this without adding a lot of complexity, and the possibility to store configurations might alleviate this.

Extending your prototype to exploit the resources provided by the new scheme would be great. Once we agree on a scheme (and I find time), I'd start out by implementing it in Stella. Once we have it working there, I'd move to the UnoCart.

ZackAttack · May 30, 2018

The only reason I had for implying the mode is that it makes it a little easier and faster to use a mixture of them. Looking at your proposal again, you already have that covered.

What you proposed sounds fine with me. Just to clarify, will the ROM be viewed as a single array of pages where any page can go into any slot? With 16 bits for each slot, that allows for 16MB with 256 byte pages or 256MB with 4k pages. That's plenty of room for future growth.
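These limits follow directly from a 16-bit bank index (lo plus hi byte) multiplied by the page size. A short check, with a hypothetical helper name:

```python
def max_addressable_bytes(page_size, index_bits=16):
    """Largest image reachable with `index_bits` of bank index per slot."""
    return (1 << index_bits) * page_size


MIB = 1024 * 1024
print(max_addressable_bytes(256) // MIB)     # 16
print(max_addressable_bytes(4096) // MIB)    # 256
```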

Having a large slot wipe out smaller slots is exactly how I'd expect it to work. You probably don't need to worry about that.

DirtyHairy · May 30, 2018

Precisely. That should be plenty of space for centuries to come.

Thomas Jentzsch · May 31, 2018

Having a large slot wipe out smaller slots is exactly how I'd expect it to work. You probably don't need to worry about that.

The opposite is IMO more interesting. E.g. if I map a 256b slot into a 4K slot layout, does this become a 256b +3.75k layout?

ZackAttack · May 31, 2018

The opposite is IMO more interesting. E.g. if I map a 256b slot into a 4K slot layout, does this become a 256b +3.75k layout?

See the 4 permutations below. By map I mean writing to the corresponding bank lo register for a given page size. Are you suggesting that #3 should leave slot c unchanged or is what I've listed below what you're expecting?

1. Map a 4k page to slots 0-f, then map a different 4k page to slots 0-f.

-Slots 0-f contain the second 4k page.

2. Map a .25k page to slot c, then map a different .25k page to slot c.

-Slot c contains the second .25k page. Slots 0-b and d-f are unchanged.

3. Map a .25k page to slot c, then map a 4k page to slots 0-f.

-Slots 0-f contain the 4k page. The .25k page is no longer mapped.

4. Map a 4k page to slots 0-f, then map a .25k page to slot c.

-Slots 0-b and d-f contain the 4k page. Slot c contains the .25k page.
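The four permutations can be checked against a tiny model of the 4k window as sixteen 256-byte chunks. This is only a sketch of the semantics listed above; the function names are hypothetical and modes (ROM / RAM) are left out.

```python
CHUNKS = 16          # sixteen 256-byte slots cover the 4k cartridge window


def new_map():
    """Start with 4k page 0 mapped across the whole window."""
    return [(4096, 0)] * CHUNKS   # (page size, bank) seen through each chunk


def map_page(window, size, slot, bank):
    """Map `bank` of the given page size into `slot`, replacing what was there."""
    n = size // 256
    for i in range(slot * n, slot * n + n):
        window[i] = (size, bank)


# Permutation 4: a 4k page, then a .25k page into slot c.
w = new_map()
map_page(w, 4096, 0, 2)
map_page(w, 256, 0xC, 7)
print(w[0xB], w[0xC], w[0xD])   # (4096, 2) (256, 7) (4096, 2)

# Permutation 3: mapping a 4k page afterwards removes the .25k page again.
map_page(w, 4096, 0, 3)
print(w[0xC])                   # (4096, 3)
```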

Thomas Jentzsch · May 31, 2018

What you listed is what I would have expected.

Another one: After 4., what if the same 0.25k page is mapped to e.g. slot a? Does this invalidate the mapping to slot c? Or do we have the same page mapped to two slots (a+c)?

ZackAttack · May 31, 2018

I would think we'd just have the same page mapped to two slots.

DirtyHairy · May 31, 2018

My proposal behaves as you describe: Writing a bank index to a slot lo register will swap the bank into the specified slot, removing anything else that might have been mapped there (including smaller banks). Mapping a bank into two different slots will do exactly that: the data will be accessible in two different memory areas.

I have thought a bit about how to allow for "sticky" slots. We could use one of the two remaining bits in the mode registers to declare a slot sticky. A sticky slot will not be overridden if a larger slot is swapped over it; instead, it stays persistent. This would allow e.g. SARA style banking: switchable 4k banks with the lowest 256 bytes always reserved for accessing 128 bytes of RAM.
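A minimal sketch of the sticky behaviour, again modelling the 4k window as sixteen 256-byte chunks. Passing the sticky flag as a function argument is a simplification; in the proposal it would be a bit in the mode register. The names are hypothetical.

```python
def new_map():
    """Whole window shows 4k bank 0; entries are (page size, bank, sticky)."""
    return [(4096, 0, False)] * 16


def map_page(window, size, slot, bank, sticky=False):
    """Map a page into a slot, leaving smaller sticky slots untouched."""
    n = size // 256
    for i in range(slot * n, slot * n + n):
        old_size, _, old_sticky = window[i]
        if old_sticky and old_size < size:
            continue                      # a larger slot does not override it
        window[i] = (size, bank, sticky)


# SARA style: the lowest 256 bytes stay reserved while 4k banks are switched.
w = new_map()
map_page(w, 256, 0, 0, sticky=True)       # 256b slot 0, e.g. a RAM bank
map_page(w, 4096, 0, 5)                   # switch to 4k bank 5
print(w[0], w[1])   # (256, 0, True) (4096, 5, False)
```

Writing to the same 256-byte slot again still replaces the sticky page, since only larger slots are blocked in this sketch.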

What I am not sure about is the semantics of writing to the high and mode registers. In my initial proposal, actual change was only triggered by writing to the lo registers. However, I can see some virtue in changes to high and mode registers taking immediate effect, too, and it would be easier to implement. What do you think?

I think I will create a GitHub repo that hosts a Markdown specification of the new format to keep a concise description in a well-defined place. We can then all contribute to making it as clean and comprehensible as possible, and there will be clean documentation for implementations. Do you have a suggestion for a name for the banking format? "Ubercharger" and "Cardzilla" come to mind.

Edited May 31, 2018 by DirtyHairy

Thomas Jentzsch · May 31, 2018

I have thought a bit about how to allow for "sticky" slots. We could use one of the two remaining bits in the mode registers to declare a slot sticky. A sticky slot will not be overridden if a larger slot is swapped over it; instead, it stays persistent. This would allow e.g. SARA style banking: switchable 4k banks with the lowest 256 bytes always reserved for accessing 128 bytes of RAM.

Interesting idea, I like it.

What I am not sure about is the semantics of writing to the high and mode registers. In my initial proposal, actual change was only triggered by writing to the lo registers. However, I can see some virtue in changes to high and mode registers taking immediate effect, too, and it would be easier to implement. What do you think?

IMO both writes should have immediate effect.

As for your suggestion, I think I would prefer to use ZP registers for the most frequent actions. I suppose any kind of configuration happens much less frequently than switching. And for the latter, the low byte is used more frequently. So effectively I would want a layout which allows fast switching among up to 256 banks with a ZP access.

Regarding the name, I think I prefer something more technical, like "MegaFlex".

DirtyHairy · May 31, 2018

IMO both writes should have immediate effect.

Then that's how it will work; it is easier to implement anyway and probably more intuitive, too.

As for your suggestion, I think I would prefer to use ZP registers for the most frequent actions. I suppose any kind of configuration happens much less frequently than switching. And for the latter, the low byte is used more frequently. So effectively I would want a layout which allows fast switching among up to 256 banks with a ZP access.

What we could do is to move the bank lo registers for 4k, 2k, 1k and 512b to $30 -- $3e. The 256b registers will have to stay in the higher pages.

Regarding the name, I think I prefer something more technical, like "MegaFlex".

Hmmm, what about "FlexCart"?

Edited May 31, 2018 by DirtyHairy

Thomas Jentzsch · June 1, 2018

Why "Cart"? This is a scheme, not a cart.

Edited June 1, 2018 by Thomas Jentzsch

DirtyHairy · June 2, 2018

Why "Cart"? This is a scheme, not a cart.

You're right there. One point to MegaFlex. If nobody else contributes any ideas, we can also go for CTZ.

ZackAttack · June 3, 2018

You're right there. One point to MegaFlex. If nobody else contributes any ideas, we can also go for CTZ.

My vote is for MegaFlex. I'm not a big fan of naming things after myself. Besides, CTZ sounds like some sort of Cadillac.

DirtyHairy · June 3, 2018

Let's go for MegaFlex then. I'll create a repo with an initial draft of the spec, hopefully next week.

Edited June 3, 2018 by DirtyHairy

DirtyHairy · June 14, 2018

As a first small step, I have extended 3E. The firmware now supports up to 96k RAM and 512k ROM; however, I have limited RAM for 3E images to 32k for backward compatibility (the Bad Apple demo runs, but it seems to rely on the RAM bank being set mod 32). If you want to experiment with more RAM, you can use the .3EX file extension: this lifts the limit and allows access to the full 96k RAM.
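A minimal sketch of the bank limit described here, assuming the usual 1k RAM bank size of 3E (so 32k is 32 banks and 96k is 96 banks) and assuming out-of-range bank numbers wrap around. The function names are hypothetical, and the wrap-around for .3EX images is an assumption rather than something stated about the firmware.

```python
import os

RAM_BANK_SIZE = 1024     # 3E RAM banks are 1k each


def ram_bank_count(filename):
    """32 banks (32k) for plain 3E images, 96 banks (96k) for .3EX images."""
    ext = os.path.splitext(filename)[1].lower()
    return 96 if ext == ".3ex" else 32


def effective_ram_bank(filename, requested):
    """Assumption: bank numbers beyond the limit wrap around (mod bank count)."""
    return requested % ram_bank_count(filename)


print(effective_ram_bank("demo.3e", 33))     # 1
print(effective_ram_bank("demo.3EX", 33))    # 33
```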

I have attached the modified firmware; the source is available on my fork on GitHub.

firmware.zip
