How is bank switching implemented in a Multi-Cartridge?

retro_doog · March 16, 2016

OK, I had an 80% complete schematic for the design above with dual EEPROM sockets and all of the limitations I originally conceded to, but when I started some initial PCB floor planning I noticed that the cartridge board was looking pretty congested. Then I realized that I forgot all about the SRAM chip! Not willing to give up yet another important feature, I decided to scrap the whole thing and start over at the architecture phase :twisted:

So, now I'm back with a new design. I've decided to take the plunge and go for a unified ROM resource. I'm using a Quad SPI capable 8-pin SOIC that packs up to 8MB in a single tiny chip! My hope is to make a very efficient "Byte-Bang" SQI interface implementation using AVR assembly so I can achieve the 4X speedup over 1-bit SPI. I'll have to see how fast back-to-back port writes can be done in assembly. I'm not at all worried about GROM, since that is slow as molasses (less than half a MHz clock, right?), but I really want to be able to handle straight ROM on this interface as well. I'm upsizing from a 20-pin ATtiny SOIC to a 32-pin ATmega and putting an external crystal down. Worst case I'll overclock the AVR from 20MHz to 30MHz, which appears to be quite stable for most who have tried. Since I'm not using the ADC or any of the internal I/O modules really, I expect to be in good shape. This will prevent me from having to level shift an Xmega processor.

Also, I'll be able to use some sort of serial in-system programming method for the flash, instead of dealing with socketed EPROMs. This would open the door for a future revision with a USB-capable ATmega AVR and being able to flash over USB, or I may just release some USB/UART based "bus-piratey" type flasher for those who want to be able to flash their carts. For the first proto, at least, it'll have to be ISP through a custom external device like my MBED development board expansion port.

Anyway, I just wanted to update everyone and let you know the project is officially underway! The PCB is holding up a couple of prototype multi-board panels I'm waiting to release, so I'm motivated to get this one done quickly so I can send everything out at once and have a ton of fun soldering/reflowing to do in a few weeks.

+acadiel · March 17, 2016

:thumbsup:

retro_doog · March 17, 2016

Haha, thanks acadiel!

Now the board is looking plenty sparse. I may throw down one EEPROM/FLASH socket just in case I can't make ROM timing. I came up with a trick that I'm sure will allow me to meet GROM timing however, maybe even at a crystal-less 8MHz. Stuffing options are always good

Tursi · March 17, 2016

My AVR code easily meets 3x GROM timing at a crystal-less 8MHz, with plenty of indirection and hardware interaction on the side, so I'm pretty sure you aren't going to have any trouble there.

GROMs are blocking devices anyway, so there's no requirement to meet an exact timing. If you're fast, the other GROMs still hold the bus the normal duration anyway, and if you're slow, unless you're really slow, nobody will notice.

retro_doog · March 17, 2016

My AVR code easily meets 3x GROM timing at a crystal-less 8MHz, with plenty of indirection and hardware interaction on the side, so I'm pretty sure you aren't going to have any trouble there.

GROMs are blocking devices anyway, so there's no requirement to meet an exact timing. If you're fast, the other GROMs still hold the bus the normal duration anyway, and if you're slow, unless you're really slow, nobody will notice.

Oh wait. You mean the Console GROMs will hold the GREADY inactive (LOW) until they decide the address is not in their space? So all of the GREADY are wired AND together and are not Totem-Pole or Push-Pull? I thought GROMs were PMOS. Hmmm, I better do some more research on that aspect of the GROM electrical specs. Still, I definitely don't want to be the slowest GROM in the box. Note that your GROMulator has parallel access to the internal flash in the AVR you chose. I'm using an external SQI flash. If I just used 1-bit SPI, the initial flash access would be 8+24+8 clocks at 4MHz (the max SPI can run when the SysClk is 8MHz) and would take 10us plus some overhead to present the parallel data on the bus. Subsequent sequential data cycles would take 8 clocks, so right at 2us plus overhead, which is just about the GROM CLK period. However, if we creep just over the line, we have to wait over 2us until the next GROM CLK to get the data and would effectively be 2x slower. But I'm confident that whatever I lose in Nybble-Banging SQI, I'll get back in the 4x data throughput, so I'll definitely be able to deliver sequential data at the GROM CLK rate.
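For anyone who wants to check the arithmetic above, here is a minimal sketch in C. It only counts serial clocks for the 8+24+8 bit transaction described in the post; the function names are made up for illustration, and it assumes the opcode and address also go out 4 bits wide in quad mode. Real quad-read commands on many flash parts add dummy cycles, which this ignores, and software overhead on the AVR is not modeled at all.

```c
#include <stdint.h>

/* Serial clocks for one random-access read: opcode + 24-bit address + 1 data byte.
   bits_per_clock is 1 for plain SPI, 4 for quad (SQI). Dummy cycles are ignored. */
static uint32_t first_read_clocks(uint32_t bits_per_clock)
{
    const uint32_t total_bits = 8u + 24u + 8u;
    return total_bits / bits_per_clock;
}

/* Serial clocks for each following sequential byte. */
static uint32_t next_byte_clocks(uint32_t bits_per_clock)
{
    return 8u / bits_per_clock;
}

/* Convert a clock count to nanoseconds at a given serial clock rate in Hz. */
static uint32_t clocks_to_ns(uint32_t clocks, uint32_t sck_hz)
{
    return (uint32_t)(((uint64_t)clocks * 1000000000ull) / sck_hz);
}
```

With a 4MHz serial clock, `clocks_to_ns(first_read_clocks(1), 4000000)` gives the 10us figure and `clocks_to_ns(next_byte_clocks(1), 4000000)` gives the 2us figure quoted above. Passing 4 instead of 1 shows the ideal quad-mode case before any bit-banging overhead is added back in.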

Still, I really want to be able to handle ROM cycles with the SQI ROM, so I'm hopeful that I can devise enough tricks, including hand-assembled SQI protocol, falling edge clock tricks, and as a last resort, overclocking, to make ROM timing. Especially since I won't have the option of a hold signal. Before I fab this board, I plan to put my logic analyzer on a real cart and characterize the typical cycle timings at the cart slot. I suspect I may be losing a sliver of time budget compared to the timing diagrams due to the console having to decode the address to generate the ROMG signal. Right now I'm designing a Nybble-Slicer from 8:4 muxes so I can present most of the address nybbles directly to the SQI interface instead of piping them through the MCU. Most of the overhead will be in generating the high order address nybbles and getting them piped to the SQI, as there will be a port read, table lookup, and pointer arithmetic before the bank portion of the address can be sent. I guess the table lookup offset will be in a local variable by then, so at least I won't have to take the array index timing hit.
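To make the "table lookup plus pointer arithmetic" step concrete, here is a rough C sketch of how a flash address could be built and split into nybbles for a quad interface. Everything specific here is an assumption for illustration: the `bank_base` table contents, the 8K window size, and the most-significant-nybble-first ordering are not taken from the actual design, and the real board is described as doing most of the slicing in mux hardware rather than in software.

```c
#include <stdint.h>
#include <stddef.h>

#define WINDOW_SIZE 0x2000u  /* assumed 8K cartridge window */

/* Hypothetical table: flash byte address where each bank of a title starts. */
static const uint32_t bank_base[4] = { 0x000000u, 0x002000u, 0x004000u, 0x006000u };

/* Combine the looked-up bank base with the offset inside the window. */
static uint32_t flash_address(uint8_t bank, uint16_t cpu_addr)
{
    return bank_base[bank & 3u] + (cpu_addr & (WINDOW_SIZE - 1u));
}

/* Split a 24-bit flash address into six nybbles, most significant first. */
static void address_nybbles(uint32_t addr, uint8_t out[6])
{
    for (size_t i = 0; i < 6; i++) {
        out[i] = (uint8_t)((addr >> (20u - 4u * i)) & 0x0Fu);
    }
}
```

Only the upper nybbles depend on the bank lookup. The low three nybbles come straight from the CPU address, which is why routing them around the MCU saves time.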

All of this active discussion is great! It helps me "think out loud" while typing and alerts me to caveats I need to keep an eye out for.

Thanks all

Edited March 17, 2016 by retro_doog

+mizapf · March 17, 2016

Oh wait. You mean the Console GROMs will hold the GREADY inactive(LOW) until they decide the address is not in their space? So all of the GREADY are wired AND together and are not Totem-Pole or Push-Pull?

As far as I understood the GROM behavior, the GREADY output is always 0 except for a short period during access when the data byte becomes available. Thus the GREADY line is gated with the GS (select line) so that the system READY line is not affected at other times. Have a look at the schematics.

Also, a good source of information is M. Bunyard's "Hardware Manual", section 2.5. Unfortunately, there are still some open questions for me. I once considered to build a GROM analyser with an Arduino or Raspberry which sets and samples the data lines and records the levels, but this is a bit more difficult than I thought because the IO pins of the Raspberry or Arduino are not using TTL levels, and you need three voltages, one of them negative. Anyone interested in trying that? I'd love to hear about the results. :-)

Edit: The following diagram is my current theory on GROM operation. Maybe someone can shed some more light on the unknown spots.

Edited March 17, 2016 by mizapf

Tursi · March 17, 2016

but this is a bit more difficult than I thought because the IO pins of the Raspberry or Arduino are not using TTL levels, and you need three voltages, one of them negative.

You don't need negative voltages to talk to the GROM data bus. If you did, my AVR wouldn't work either. I spent plenty of time with a logic analyzer working with the GROMs before I built anything.

Oh wait. You mean the Console GROMs will hold the GREADY inactive(LOW) until they decide the address is not in their space? So all of the GREADY are wired AND together and are not Totem-Pole or Push-Pull?

Correct. My timing measurements suggest that they all do the entire memory cycle and the only difference is whether they drive the data bus. They also all participate in address writes... this is necessary because they all track the GROM address in order to know whether to respond. The only thing I did not test is whether they all respond to address read or not. (I believe that they all do based on simple evidence. For instance, set the GROM address to a non-existent GROM, like >F000 with just Editor/Assembler plugged in. Reading back gives you the correct address, although no chip responds to that range). The data bus is likewise not strongly driven, but I need to double-check the details. Reportedly there is a pull-up on the bus in one direction and a pull-down in the other, so you need to drive it correctly. (TTL can dominate that system, which is how the old GROM devices would override it.)

Note that your GROMulator has parallel access to the internal flash in the AVR you chose.

I know how my system operates, and it has plenty of operations that are slower than the console GROMs. For instance, writing bytes to EEPROM is slower. Waiting for the flash controller to complete a block write or a block erase is slower. But I do all of those in a normal GROM write cycle - it just takes longer.

I've tested using my AVR to run the entire system in slow motion, which was great fun for about 10 minutes, by extending how long I held READY on purpose. I've also tested removing the console GROMs and running the system entirely on my AVR, both at 8MHz and 20MHz, and measuring the performance difference. (Pro-tip: barely noticeable. Traced this to the GPL interpreter which spends about 30:1 cycles interpreting versus reading GROM. You have to be REALLY slow to make a difference and being faster just doesn't matter).

However, if we creep just over the line, we have to wait over 2uS until the next GROM CLK to get the data and would effectively be 2x slower.

If I remember my notes correctly (I'm at work and can't look them up), GROM operations take 14-30 GROM clocks. You aren't anywhere near being slow. You are also not required to use the GROM clock, nothing else in the system relies on it. I take it into my AVR but I don't use it.

Especially since I won't have the option of a hold signal.

The hold (or rather, READY) signal is an important part of being a GROM. I wouldn't recommend leaving it out on purpose even if you don't think you want it today.

Tursi · March 17, 2016

Ah, let me also correct myself saying correct -- Mizapf is right on the operation of READY. GROMs are /always/ "not ready", except when they have completed an operation. They then stay "ready" until the GROM select is de-asserted. This counts for all four possible operations - read data, read address, write data, write address, and all GROMs must participate since the READY lines are merged together.
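The READY behavior described here can be modeled in a few lines of C. This is only a sketch of the logic as stated in the post (not ready by default, ready once the operation completes, held until the select goes away, with all chips' READY outputs merged); the type and function names are invented, and no electrical detail or timing is modeled.

```c
#include <stdbool.h>

/* One GROM (or GROM emulator) as seen from the READY line. */
typedef struct {
    bool op_done;  /* set once the chip has finished the current operation */
} grom_t;

/* A chip reports ready only while selected and after finishing its operation. */
static bool grom_ready(const grom_t *g, bool selected)
{
    return selected && g->op_done;
}

/* De-asserting the select returns the chip to its default "not ready" state. */
static void grom_deselect(grom_t *g)
{
    g->op_done = false;
}

/* Merged READY: the CPU sees ready only when every chip says ready. */
static bool merged_ready(const grom_t *groms, int count, bool selected)
{
    for (int i = 0; i < count; i++) {
        if (!grom_ready(&groms[i], selected)) {
            return false;
        }
    }
    return true;
}
```

In this model a fast emulator that sets `op_done` early does not shorten the cycle, because `merged_ready` stays false until the slowest chip on the line has finished too.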

+mizapf · March 17, 2016

You don't need negative voltages to talk to the GROM data bus. If you did, my AVR wouldn't work either. I spent plenty of time with a logic analyzer working with the GROMs before I built anything.

Oh, that's even better ... I was only relying on Bunyard's description (section 2.5.10)

The GROM is a PMOS chip and requires three bias voltages. These are +5, -5, and -0.8V.

so I did not even try.

As you said you had done some analysis some time ago, you may certainly know some more details than me. As I wrote above, I'm wondering how long the GREADY stays high. Once GREADY goes high, the GROM access is over at the next CPU clock pulse, which will cause a new value on the address bus, clearing the GS line, thus deselecting the GROM. So what happens with GREADY? Will it stay high until GS is asserted again, or return to low? And if so, immediately or on the next GROMCLK pulse? (This is the hatched area in my diagram above.)

Bunyard says in his description that

Considerably more action occurs when the second address byte is moved to the GROM address register. After the byte is actually written into the LSBY of the address counter, a data prefetch occurs. This is easily witnessed by observing the longer second GROM ready with an oscilloscope while loading the full GROM address into the GROM bank.

I understand this as GREADY going high after the address byte load, then during the successive 2.5 GROMCLK pause the prefetch is done, and then GREADY goes low. This would imply that GREADY goes low before the GROM is selected again, and that this occurs as soon as its internal operation is done.

Tursi · March 17, 2016

I need to run my logic analyzer again to get the actual counts, but yeah, the second address byte causes a MUCH longer cycle. The number of cycles, though, I lost that data a while back. It will probably be a few months before I can try again due to work.

GREADY stays ready until after GS goes inactive, as my last post said, but I'll need to run the analyzer again to get clock counts for you (unless someone beats me to it).

retro_doog · March 17, 2016

Hi Tursi,

Sorry up front, I don't know how to properly reference split quotes, so I'll just indent and color any quotes below:

So it looks like I won't have to worry about processing time with GROM in general, so that's good.

The hold (or rather, READY) signal is an important part of being a GROM. I wouldn't recommend leaving it out on purpose even if you don't think you want it today.

Sorry, I may not have been clear. I was referring to not having a way to hold off straight ROM accesses, which I am also trying to implement in a single flash resource.

GROM operations take 14-30 GROM clocks. You aren't anywhere near being slow. You are also not required to use the GROM clock, nothing else in the system relies on it. I take it into my AVR but I don't use it.

Wait, GROMs are more or less synchronous. How can you know when to sample all of the control signals, particularly since access can have multiple phases (Load Addr H, load Addr L, etc.) Most importantly sequential data accesses surely need to be synchronized to the GROM CLK do they not?

Reading back gives you the correct address, although no chip responds to that range). The data bus is likewise not strongly driven, but I need to double-check the details. Reportedly there is a pull-up on the bus in one direction and a pull-down in the other, so you need to drive it correctly. (TTL can dominate that system, which is how the old GROM devices would override it.)

So during an address read, every GROM chip drives the bus at the same time? Normally that would be a nightmare if one of the GROMs had a corrupt address load, but, being PMOS, I guess they can get away with that since there is no possibility of contention. However, I will capitalize on this behavior and not have my GROMulator respond to address reads at all, since the console ROMs will cover reporting the loaded address for me

Hmmm, this does make me realize I may want to put a stage of PMOS open drain buffering on the data bus to prevent possible contention.

Mizapf is right on the operation of READY. GROMs are /always/ "not ready", except when they have completed an operation.

That makes perfect sense to me, however, I may have been confused by this statement then:

If you're fast, the other GROMs still hold the bus the normal duration anyway,

I took this to mean that for every GROM cycle, the CPU doesn't "see" the GREADY until every GROM chip says they are ready. In other words, for a given access I thought you were saying that even if I complete my access super fast and assert my READY, the console GROMs, for instance would block it until they also say "ready". Or were you simply stating that my accesses will be fast, but whenever access occurs to a different GROM those particular slower accesses will bring the average access time down. However, knowing that the devices are PMOS and that the GREADY is active High, I would think that no GROM can override my GREADY High since no PMOS device can drive low.

Reportedly there is a pull-up on the bus in one direction and a pull-down in the other

I'm probably misinterpreting this as well. I could see if NMOS devices are mixed with PMOS devices, that the NMOS drivers would have weak pull-ups and the PMOS drivers would have weak pull-downs, and any TTL (or even CMOS, if TI used any such devices in the console) can easily overdrive an otherwise non-driven bus. However, since the data bus is bi-directional by design, that would suggest that an undriven bus is not actually tri-state, but floats somewhere in the middle due to the resistor divider created by the NMOS pull-ups and PMOS pull-downs. Looks like I'll need to put an oscilloscope on the bus as well as the logic analyzer. I'm working with a PMOS device emulation in my speech synth ROM project, and my goal is to completely understand the circuit, otherwise there is a real possibility that my replacement device, and my multi-cart for that matter, might have slivers of time where there is real bus contention that, over time, can degrade the components and cause long term damage or failure to the system.

I should probably put a scope on the actual ROMS used in the carts as well to see if they are PMOS or NMOS or if they have a TTL output stage(unlikely). There's a very good chance that the parallel ROMs are NMOS as that was a popular technology due to having higher bit density. Actually, it occurs to me that the reason that the GROMs are characterized as weak drivers even by TI's admission, is because they are using under sized PMOS transistors. In general a properly balanced CMOS gate(balanced for speed or drive which is related to speed), the PMOS high side FET is physically larger to achieve the same timing and drive as the NMOS low side FET. I forget how much larger, but close to 2X IIRC. I should pull out my old VLSI design book and look it up. I'd been writing RTL for so long, sometimes I forget about the transistors that are begotten(begat?) from the code

Anyway, pull-ups and pull-downs(and PMOS and NMOS) are pretty easy to spot with a scope. If the rising edge is fast and the falling edge is slow, you have PMOS with a pulldown and vice-versa for NMOS. The other question for technology this old is where the resistors are. I would have no problem believing that the PMOS and NMOS(if any) chips simply have open drain drivers and that the "pull-me" resistors as I generically call them, are passives on the board or a "bus parking" terminator of some kind. It may do me well to hunt down the console schematic if it's out there and see exactly what I'm dealing with before I start "playing on the bus" :-o

Again, thanks for the help and clarifications! If I learn anything new and interesting, I'll be sure to share

Edited March 17, 2016 by retro_doog

+mizapf · March 17, 2016

Have you already had a look at Michael Bunyard's description? You can download it from http://www.hexbus.com/tibooks/ ("Hardware manual").

retro_doog · March 17, 2016

Have you already had a look at Michael Bunyard's description? You can download it from http://www.hexbus.com/tibooks/ ("Hardware manual").

Ah, very helpful, mizapf! The link to that book I had on my iPad, didn't have Appendix G included!

So I see now that there is a mechanism to switch the 2.2K resistors from pull-ups to pull downs on the 8-bit data bus to the GROMs. This also seems to suggest that while the GROMs are PMOS, the chips that interface to it are NMOS and that all MOS devices are open-drain.

Looking at GREADY (GRY in the schematic) is interesting. It has a pull-up! So, it looks like there is a very good chance that the PMOS GROM has an NMOS GREADY signal! So the Not-Ready state could easily be asserted by all chips and you're not ready until every last ROM releases the not ready state.

So… It could be fun to make a multi-cart that also has the console ROMs onboard. You would have to pull the internal console ROMs for it to work, but you could possibly speed up the whole system that way! Does the later v2.2 refer to Console ROM or GROM0? This could be a way to help out those v2.2 folks who can't run ROM-only carts. I may make this an option for the multi-cart. Custom non-extended basic, anyone? Why?…. Why not!

Thanks for pointing me to this resource :thumbsup: The other books will come in handy too. More night time reading for my iPad! :grin:

Edited March 17, 2016 by retro_doog

+mizapf · March 17, 2016

Sorry up front, I don't know how to properly reference split quotes, so I'll just indent and color any quotes below:

You can just hit the "Quote" button of the message to be quoted, so you'll get another quote block in your reply. Your existing text will not be lost.

That makes perfect sense to me.

You have to clean away those lines that are not of interest.

Again, thanks for the help and clarifications!

Tursi · March 18, 2016

I took this to mean that for every GROM cycle, the CPU doesn't "see" the GREADY until every GROM chip says they are ready. In other words, for a given access I thought you were saying that even if I complete my access super fast and assert my READY, the console GROMs, for instance would block it until they also say "ready".

This is the true statement of the pair.

However, an /idle/ GROM asserts "not ready". The CPU is not halted because the GREADY line is not connected to the CPU during non-GROM accesses.

However, I will capitalize on this behavior and not have my GROMulator respond to address reads at all, since the console ROMs will cover reporting the loaded address for me.

This is what most GROM sims did.

Wait, GROMs are more or less synchronous. How can you know when to sample all of the control signals, particularly since access can have multiple phases (Load Addr H, load Addr L, etc.) Most importantly sequential data accesses surely need to be synchronized to the GROM CLK do they not?

GROMs don't control anything, the CPU does. You're interfacing to the CPU. Therefore only the CPU timings (and any associated decode circuitry) matter.

Load Address H and Load Address L are two separate write operations. Likewise, every byte of a sequential data access is a separate CPU read.

The GROM clock is only there to provide a slow enough clock to run the GROM chips. The timing is detached from the rest of the system. The GREADY line takes care of the differences in clock. (You can further see an example of this in the PCode card, which implements its own GROM subsystem, at different addresses, with an even slower clock.)

retro_doog · March 18, 2016

I clearly need to read the GROM spec more closely. I thought the GROM clock was what latched in the address bytes and synchronized the control signals. Also, I thought the auto incrementing sequential data cycles were just a series of GROM clocks with the next sequential byte's data at each GROM clock edge. If this is not the case, then GROM is much slower than I thought so I guess I'll have no problems. It will be more clear when I look at the appropriate flow charts or simply take some logic analyzer dumps.

I do realize the GROM Clock is in a different clock domain, but I assumed these were still synchronous devices. The actual GROM chips surely require this clock and I still plan to synchronize my GROMulator to that same clock. I'm going to run my software state machine for GROM emulation off the GROM clock either by sampling the clock, or, more likely, routing it to a pin change interrupt and emulating a clocked state machine from that mechanism. Doing everything that is in the GROM flow chart completely asynchronously could expose some of those latent bus contention situations I talked about earlier. Also, I would be inclined to believe that the internal address counter in an actual GROM actually counts on GROM clock edges.

So, the big challenge will be straight non-GROM ROM access. My understanding is that I will only have four 333ns CPU clocks total to decode the address, fetch data from the SQI ROM, and present it on the data bus. If so, timing will be pretty tight and I may just have to build the thing and see how good I can get the software and then decide how much overclocking, if any, I need to guarantee timing. Because of that, I may also put a "chicken ROM" down so I can at least have 512K-1M of parallel ROM if I can't get shared serial ROM to meet timing. I have a sort of AVR dev board on hand that is not an Arduino, so maybe I should whip up some test code to see how fast I can emulate the SQI bus.
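To put a number on "pretty tight", here is a small C sketch that converts that window into whole MCU cycles. It takes the four 333ns clocks from the post at face value (an understanding that still needs to be confirmed with a logic analyzer) and assumes one instruction cycle per MCU clock; the function name is made up.

```c
#include <stdint.h>

/* Whole MCU clock cycles that fit inside a time window, rounded down. */
static uint32_t cycles_in_window(uint32_t window_ns, uint32_t mcu_hz)
{
    return (uint32_t)(((uint64_t)window_ns * mcu_hz) / 1000000000ull);
}
```

For the 4 x 333ns = 1332ns window, `cycles_in_window(1332, 20000000)` comes out at 26 cycles at 20MHz and `cycles_in_window(1332, 30000000)` at 39 cycles when overclocked to 30MHz. That budget has to cover address decode, the whole serial transaction, and driving the bus, and any time lost to the console's own decode logic shrinks it further.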

Also, is anyone clamoring for 8K of RAM in the cartridge space? Due to the bank/title select's use of lower address lines, I thought of having only the upper 4K as an SRAM option. That's still 16x what the built-in console has, and many/most are utilizing various schemes to get the proper 32K expansion either with PEB emulation like the CF7+, putting the 32K in the console, or having actual PEBs. The SRAM I'm putting on the cart will be more like the battery-backed Mini-Memory SRAM, although it can be used for volatile data if a title wants that too. However, it will be nonvolatile and NOT need a battery… ever! It's a semi-new technology that has a flash-backed SRAM implemented at the bit level. What I may do with the 8K chip is split it into 2 banks so that if you use the Mini-Memory built-in cart, and later want to use a non-Mini-Memory title that just wants volatile RAM, it won't overwrite the Mini-Memory 4K partition. The descriptor will have a bit that chooses which bank of 4K SRAM is active, if any, for a title.
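A minimal C sketch of that two-bank idea follows. The descriptor bit positions are purely hypothetical (the post only says a descriptor bit will choose the bank), and the >7000->7FFF range assumes "upper 4K" means the top half of the >6000->7FFF cartridge window.

```c
#include <stdint.h>

/* Hypothetical descriptor bits, for illustration only. */
#define DESC_SRAM_ENABLE 0x01u  /* title uses cartridge SRAM at all */
#define DESC_SRAM_BANK   0x02u  /* selects which 4K half of the 8K chip */

/* Map a CPU address to an offset in the 8K SRAM, or -1 if SRAM is not hit. */
static int32_t sram_offset(uint8_t descriptor, uint16_t cpu_addr)
{
    if (!(descriptor & DESC_SRAM_ENABLE)) {
        return -1;                       /* this title has no SRAM mapped */
    }
    if (cpu_addr < 0x7000u || cpu_addr > 0x7FFFu) {
        return -1;                       /* outside the upper 4K window */
    }
    uint32_t bank = (descriptor & DESC_SRAM_BANK) ? 1u : 0u;
    return (int32_t)(bank * 0x1000u + (cpu_addr & 0x0FFFu));
}
```

Two titles with different bank bits see the same CPU addresses but land in different halves of the chip, so a scratch-RAM title cannot clobber the persistent Mini-Memory style partition.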

Edited March 18, 2016 by retro_doog

+InsaneMultitasker · March 18, 2016

If you are going to provide RAM in the cartridge space, there isn't much value in only providing a 4k bank, IMHO. The most common applications are Mini-MEM and supercart, of which the former is pretty limited. I have a well-used 8K supercart, with Editor Assembler, that comes in handy for the programs supporting or requiring that extra space. It doesn't require or use any bank switching, and was the most common implementation.

Tursi · March 18, 2016

I clearly need to read the GROM spec more closely. I thought the GROM clock was what latched in the address bytes and synchronized the control signals.

It probably is, INSIDE the GROM. It's not synced to the CPU control signals.

Also, I thought the auto incrementing sequential data cycles were just a series of GROM clocks with the next sequential byte's data at each GROM clock edge. If this is not the case, then GROM is much slower than I thought so I guess I'll have no problems. It will be more clear when I look at the appropriate flow charts or simply take some logic analyzer dumps.

No. The auto increment feature just means that you don't need to load a new address every time you want a new byte of data - every read increments the address, but each read is still a distinct CPU operation.
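A bare-bones C model of that behavior, as a sketch only: two separate address writes load the counter, and every data read returns a byte and bumps the address, with each call standing in for one distinct CPU access. The shift-in of address bytes and the flat 64K image are simplifying assumptions, and the model leaves out the prefetch and any per-chip address wrap that a real GROM has.

```c
#include <stdint.h>

/* Simplified GROM address/data model; image must point to a 64K array. */
typedef struct {
    uint16_t addr;          /* current GROM address counter */
    const uint8_t *image;   /* GROM contents, indexed by full 16-bit address */
} grom_model_t;

/* One CPU write to the address port: previous low byte moves up, new byte comes in. */
static void grom_write_address(grom_model_t *g, uint8_t byte)
{
    g->addr = (uint16_t)((g->addr << 8) | byte);
}

/* One CPU read of the data port: return the byte, then auto-increment. */
static uint8_t grom_read_data(grom_model_t *g)
{
    uint8_t value = g->image[g->addr];
    g->addr++;
    return value;
}
```

Reading a run of bytes is then two calls to `grom_write_address` followed by one `grom_read_data` call per byte, and the emulator is free to take as long as it needs inside each call because READY holds the CPU off.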

You might be getting a little far ahead of yourself to have the cartridge 80% laid out without actually understanding the hardware you are attempting to reproduce.

I do realize the GROM Clock is in a different clock domain, but I assumed these were still synchronous devices. The actual GROM chips surely require this clock and I still plan to synchronize my GROMulator to that same clock. I'm going to run my software state machine for GROM emulation off the GROM clock either by sampling the clock, or, more likely, routing it to a pin change interrupt and emulating a clocked state machine from that mechanism. Doing everything that is in the GROM flow chart completely asynchronously could expose some of those latent bus contention situations I talked about earlier. Also, I would be inclined to believe that the internal address counter in an actual GROM actually counts on GROM clock edges.

Fill your boots. I spent a lot of time working on my system, and I'm confident that I understand the system quite well. You don't have to take my advice, but my GROM simulation code has been running fine on dozens of consoles for over a year, and on my own for several years beyond that. I don't believe the contentions you are worried about will manifest. But I'm happy to stay out of it.

retro_doog · March 19, 2016

It probably is, INSIDE the GROM. It's not synced to the CPU control signals.

Sure, that's my point. A GROM device is a synchronous device, albeit in its own clock domain. My design will attempt to reproduce the synchronous behavior of an actual GROM. I now have come to realize that you have chosen to implement your design in an asynchronous manner, which was not immediately obvious to me. We're both engineers here, and we know there is more than one way to implement a design. My implementation will be closer to emulation, yours seems to be closer to simulation. Both methods work, as your design has proven.

No. The auto increment feature just means that you don't need to load a new address every time you want a new byte of data - every read increments the address, but each read is still a distinct CPU operation.

Yes, I now realize that GROM operations are a series of individual atomic CPU access cycles and not a monolithic operation like I had assumed. My assumptions made sense to me based on how other sequentially accessed ROMs operate, including TI's own TMS6100 Speech ROMs which I had just completed an AVR based ROMulator for. Since the GROM was not openly specified by TI and I have yet to find an actual TI published timing diagram, I'm left to deal with interpretations based on the work of others like yourself.

You might be getting a little far ahead of yourself to have the cartridge 80% laid out without actually understanding the hardware you are attempting to reproduce.

Haha! I like you. Of course it's not laid out, just schematic and floor-planned. However, I can go all the way to fab and assembly before I need to know exact details. I just run every control signal to the AVR, and mux the two nybbles of the data bus to both the AVR and to the SQI flash, and the rest is just "typing" (my former manager's word for software).

Fill your boots. I spent a lot of time working on my system, and I'm confident that I understand the system quite well. You don't have to take my advice, but my GROM simulation code has been running fine on dozens of consoles for over a year, and on my own for several years beyond that. I don't believe the contentions you are worried about will manifest. But I'm happy to stay out of it.

Your advice is much welcomed, and your knowledge of the system is extremely helpful! I'm choosing a different path by implementing synchronous near-emulation, but that doesn't mean I don't have respect for your design or the work that went into it. :thumbsup: My background is in synchronous ASIC I/O bus and host controller designs including multi-clock domain situations (33/66MHz PCI to 50/100MHz FireWire), so I'm naturally gravitating towards the design style that is in my wheelhouse. Other than quicker response time and probably not responding to address reads, I would like my GROMulator to behave both internally and externally most like an original GROM.

Again, thanks for all the useful input :-D

Edited March 19, 2016 by retro_doog

+acadiel · March 19, 2016

Ah, engineers. They each have their own unique ways of accomplishing something. <grin>

Edit.. just for my own knowledge, how did the GRAM Kracker "override" the console GROMs so that you could run your own GROM 0/1/2 out of the GRAM emulator?

retro_doog · March 23, 2016

Ah, engineers. They each have their own unique ways of accomplishing something. <grin>

Edit.. just for my own knowledge, how did the GRAM Kracker "override" the console GROMs so that you could run your own GROM 0/1/2 out of the GRAM emulator?

I'm not too familiar with the GRAM Kracker myself, but that is a good question.

apersson850 · March 23, 2016

I'm not sure (I don't have any), but I have a feeling that I read somewhere that the GRAM kracker simply "shouts louder", i.e. drive the bus with more power. But again, this may be a mistake.

retro_doog · March 23, 2016

OK, after starting a two-EEPROM version of the AnyCart with an ATtiny and scrapping it...

And then completing a schematic for a mixed EEPROM (for ROM) and ATmega+SQI ROM (for GROM) design, all the way to floor planning, I've decided to park that implementation as well.

Still too many compromises and kludges/muxes/messy stuff.

So. I now have a 75% complete schematic of "Third Time's a Charm" goodness! :grin:

I've decided I don't really want to make just another, albeit improved, version of a ROM/GROM cart. I want to make a cartridge development platform!

So I've jumped straight to a 44-pin ATxmega class processor, which is only a buck more than the 32-pin ATmega I was using before, with the following benefits:

44 pins (of course). It turns out the TQFP44 has the same 0.8mm pin pitch as the 32-pin package, so other than the extra 12 pins, the soldering won't be much of an extra challenge. I can do 1.27mm SOICs without my "Nerd Goggles" and can somewhat painfully handle 0.65mm TSSOPs and 0.5mm VSSOPs, so I'm confident these will go down relatively easily.

32MHz operation, all from the internal oscillator. This actually saves me a $0.50-$0.70 crystal, so the Xmega upgrade is even cheaper! Also, I hear people can run these at 48MHz with no issues, still from the internal oscillator to boot!

USB, baby! My initial code base won't have the hooks in, and I'll be using my ICE to flash the AVR and my custom SPI programmer for the flash, but eventually I'll have boot loader code and a way to flash the cart over USB. It will probably still require preparing a binary, but the cart will ultimately be able to be flashed without cracking it open. I plan to put the mini USB on the side so you can never have it powered by both the console and USB at the same time. If I had the newer white cartridge shells, I could have put it in the cart slot area to the side of the card edge connector. Still, when the cart is inserted, the USB won't show.

Unified SQI flash ROM. Yup, a tiny 8-pin SOIC with up to 8MB (maybe more) of flash.

The only downside is that I have to run the XMEGA off of 3.3V and therefore have to level shift the entire cartridge slot. But I already have that worked out, as well as mixing (well, tristate bussing) the 8-bit data bus with the lower byte of the address bus, since I'll never need both at the same time (the console won't be able to write to GROM).

Even at 48MHz, with 24MHz SPI, I'm still too slow to do straight ROM access with single-bit SPI, but I figure the 32-48MHz boost over the ATmega's 20MHz should make it easier to emulate the SQI bus at a rate faster than single-bit automated SPI can handle. I have an idea where I preload the first 12 bits of the address whenever I go to Idle to get a head start. Then, if a ROM access comes in, I'll only have to send the last 3 nybbles of the address and fetch the data. For the second half of the read cycle, the auto-incrementing behavior of the SQI flash should make it easy to fetch the second (even?) byte. Depending on the order the bytes come out in, the ROM images may have to be "re-endianed" to line up with this, but that's just byte shuffling and can be done in a script if needed.

Now, if we're sitting at Idle with our ROM bank address preloaded into the flash and a GROM access comes in, I just pull the chip select high and start over with loading the GROM base address. The delay is no problem since GROM is Soooooo Slooooooooooow compared to CPU ROM.
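
To make the preload idea concrete, here is a rough host-side sketch in Python (not AVR firmware). It assumes a 24-bit flash address sent most significant nybble first, and it assumes "re-endianing" means swapping each pair of bytes in the image; both are guesses for illustration, and the function names are made up. Note that the three nybbles left to send after the preload cover a 4 KB window.

```python
def split_address(addr):
    """Split a 24-bit address into six nybbles, most significant first.

    Assumption: the flash takes a 24-bit address, one nybble per clock
    in quad mode. Check the actual part's datasheet.
    """
    if not 0 <= addr < (1 << 24):
        raise ValueError("address must fit in 24 bits")
    return [(addr >> shift) & 0xF for shift in range(20, -4, -4)]


def preload_and_remainder(addr):
    """Return (nybbles sent ahead of time at Idle, nybbles sent on access).

    The first three nybbles are the upper 12 bits (the preloaded part);
    the last three are the lower 12 bits, sent when the access arrives.
    """
    nybbles = split_address(addr)
    return nybbles[:3], nybbles[3:]


def swap_byte_pairs(image):
    """Swap every pair of bytes in a ROM image (hypothetical "re-endian" step)."""
    if len(image) % 2:
        raise ValueError("image length must be even")
    out = bytearray(len(image))
    out[0::2] = image[1::2]  # odd-offset bytes move to even offsets
    out[1::2] = image[0::2]  # even-offset bytes move to odd offsets
    return bytes(out)


if __name__ == "__main__":
    early, late = preload_and_remainder(0x07ABCD)
    print("preloaded at Idle:", [hex(n) for n in early])
    print("sent on access:   ", [hex(n) for n in late])
    print(swap_byte_pairs(b"\x01\x02\x03\x04").hex())
```

Whether the byte swap is needed at all depends on which byte the console wants first and how the flash auto-increments, so treat `swap_byte_pairs` as a placeholder for whatever shuffling turns out to be required.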

So, this one's a keeper, and the floor planning is looking good. I have all the level shifters down in the "squeezed" area of the cart where the shell Z-height restrictions are tighter, which is where they want to be anyway, right next to the card edge signals. The only task left is to hook up the busses to the AVR, which is an iterative process that I'm doing in conjunction with layout to make routing easier(i.e. I can reorder signals to make them mostly run parallel to the AVR ports). I probably won't have a lot of time for layout in the next week, but I wanted to make sure the schematic was done so I didn't forget any of the fine details. Still, I have a bunch of PCB stuff backed up, so I'm pretty motivated to get this done so I can send everything out for fab at once.

I'll keep everyone posted.

Edited March 23, 2016 by retro_doog

Tursi · March 23, 2016

acadiel asked: how did the GRAM Kracker "override" the console GROMs so that you could run your own GROM 0/1/2 out of the GRAM emulator?

Just stronger line drivers, literally. I was able to do the same with my AVR (although I would need to check if the current release still has the capability, since I tried to be a more considerate player). "Shouting louder" is a good description. I'm just not sure how much current draw that caused and whether the GRAMKracker used any form of current limiting or just drove the bus directly.
