New 512K flash cartridge design, rewritable by 4A

+FarmerPotato · July 6, 2019

NOTE: The first run of cartridges will be soldered and given out to attendees of Fest 99/4ATX in Austin on Aug 10. Hence the silkscreen logo!

This is a 512k Flash cartridge with these special attributes:

It has the ability to be erased or reprogrammed on the 4A.
It is low-cost: under $7 to build (assuming a quantity of 10).
It is easy to build. All thru-hole components. 2 socketed chips.
Bank switching follows the common non-inverted scheme used by existing mega-carts. Bank is 0 on power-up.
The inexpensive TL866/XGecu programmer is used for one-time programming of the ATF22LV10, or for filling up the Flash.
If using the SST39SF040, it provides 4K sectors, so programs can save data in the cartridge.
Open source hardware.
It supports the AT49F040 (found in UberGROM) or SST39SF040 and similar Flash chips.
Fits in a standard cartridge case.

Use cases

Run the Don't Mess With Texas MegaDemo
Run any other 512k cart image
With the SST Flash chip, write a game to use a large amount of read/write storage in 4K sectors.

Erasing/Reprogramming on the 4A

The example code (below) shows how programming works. (Erase is similar.)

The AT49F040 can only be erased as a whole chip, with the exception of the first 16K, which can be protected.

The SST39SF040 is notable for its 4k sectors, individually erasable.

When erased, the Flash is filled with 1s. Any 1 bit can be programmed to 0 at any time, but only an erase turns a 0 back into a 1.

It's easy to imagine a scheme where a game could append a little bit of data at a time, and later read from the latest save.
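As a rough illustration of that append-and-read-latest idea, here is a minimal Python model. Everything in it is an assumption for illustration: the 16-byte record size, the rule that a record's first byte is never >FF, and the names FlashSector, append_save and latest_save. It only models the "bits go from 1 to 0" behavior of one 4K sector, not the real programming sequence.

```python
ERASED = 0xFF
SECTOR_SIZE = 4096      # one SST39SF040 sector
RECORD_SIZE = 16        # hypothetical fixed-size save record


class FlashSector:
    """Toy model of one flash sector: programming can only clear bits."""

    def __init__(self):
        self.cells = bytearray([ERASED] * SECTOR_SIZE)

    def program(self, offset, value):
        # A programmed cell is the AND of its old and new contents.
        self.cells[offset] &= value

    def erase(self):
        self.cells = bytearray([ERASED] * SECTOR_SIZE)


def append_save(sector, record):
    """Write a record into the first unused slot and return its offset."""
    if len(record) != RECORD_SIZE or record[0] == ERASED:
        raise ValueError("record must be 16 bytes and must not start with >FF")
    for slot in range(0, SECTOR_SIZE, RECORD_SIZE):
        if sector.cells[slot] == ERASED:          # slot still unused
            for i, value in enumerate(record):
                sector.program(slot + i, value)
            return slot
    raise RuntimeError("sector full, erase it before saving again")


def latest_save(sector):
    """Return the most recently appended record, or None if there is none."""
    latest = None
    for slot in range(0, SECTOR_SIZE, RECORD_SIZE):
        if sector.cells[slot] == ERASED:
            break
        latest = bytes(sector.cells[slot:slot + RECORD_SIZE])
    return latest
```

With 16-byte records, one 4K sector holds 256 saves before it needs an erase.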

Implementation

The bank register has 6 bits of address plus two mode bits. The 6 bank bits select one of 64 banks of 8K each, which increases the address range to 2^6 x 2^13 = 2^19 = 512K in the usual way.

M0 M1 B2 B3 B4 B5 B6 B7

M0 M1
0 0 Read mode
0 1 Command mode
1 0 Program even byte
1 1 Program odd byte

Command mode allows the 4A to write special byte sequences to the Flash, to erase it, retrieve the chip ID, or program a byte.

Program mode allows 1 byte to be written to the Flash.
Mode bits are cleared after the expected write is done.
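To make the register layout concrete, here is a small Python sketch of how a write address might decode into mode and bank. It assumes, based only on the constants in the code listing (bank*2, >0080 for command mode, >0100 for program mode), that the register is latched from the address lines of a write to cartridge space. The function names are hypothetical and the hardware is untested.

```python
MODES = {
    0b00: "read",
    0b01: "command",
    0b10: "program even byte",
    0b11: "program odd byte",
}


def decode_bank_write(cpu_address):
    """Split a write address in >6000..>7FFF into (mode name, bank number)."""
    register = (cpu_address >> 1) & 0xFF   # M0 M1 B2..B7, with M0 most significant
    mode = register >> 6                   # top two bits: M0 M1
    bank = register & 0x3F                 # low six bits: B2..B7
    return MODES[mode], bank


def flash_address(bank, cpu_address):
    """Flash address seen by the chip for a CPU address within an 8K bank."""
    return bank * 0x2000 + (cpu_address & 0x1FFF)
```

For example, decode_bank_write(0x6084) gives ("command", 2), matching the "set cmd, bank 2" comment in the listing, and flash_address(2, 0x7555) gives >5555.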

NOTE: This cartridge will not support GROM. It's simply not possible with the chips on board.

Pretty Pictures

Those giant parts are actually supposed to be regular size capacitors. I'm having some KiCad difficulty there.

Code

MYWS   BSS  32
PGM    DATA >0100
ODD    DATA >0080
BAA    BYTE >AA
B55    BYTE >55
BA0    BYTE >A0
       EVEN
OPLWPI EQU  >02E0  * opcode for LWPI
OPLI   EQU  >0200  * opcode for LI R0
OPCLR  EQU  >02C0  * opcode for STST R0, which is better than CLR R0 (no read-before-write)
JUNK   EQU >AA55 

* Program the flash: address R0, data ptr R1, length R2, bank number R3 (bank*2)
* This self-modifying code takes advantage of the fact that LI does not do read-before-write.
FLASHW
       MOV  R11,R10
       ORI  R0,>6000    * ensure R0 in cartridge space
* modify instructions for bank number
       SOC  @PGM,R3
       MOV  R3,@FLSEP  * for program even byte
       SOC  @ODD,R3
       MOV  R3,@FLSOP  * for program odd byte
FLOOP
* modify instructions for addresses and data
       MOV  R0,@FLSEA
       MOV  *R1,@FLSED
       MOV  R0,@FLSOA
       MOV  *R1+,@FLSOD

       BL   @FLASHU
       DATA OPLWPI  * Enter Program mode for even byte in bank R3
FLSEP  DATA JUNK
       DATA OPCLR
       DATA OPLWPI  * Write even byte to address in FLSEA.
FLSEA  DATA JUNK
       DATA OPCLR
       DATA OPLI
FLSED  DATA JUNK
       
       BL   @FLASHU
       DATA OPLWPI  * Enter Program mode for odd byte in bank R3
FLSOP  DATA JUNK
       DATA OPCLR
       DATA OPLWPI  * Write odd byte to address in FLSOA.
FLSOA  DATA JUNK
       DATA OPCLR
       DATA OPLI
FLSOD  DATA JUNK

* next
       LWPI MYWS
       INCT R0
       DECT R2
       JGT  FLOOP

       B   *R10
* the inner loop is 20 words.. may be feasible to copy it to PAD. Or just 32K.


* Byte sequence to unlock Flash for programming. Hardware masks out the read-before write of MOVB, and the unwanted even byte write.
FLASHU
* Enter Command mode with bank 2
       LWPI >6080         bit >80 is for command mode
       STST  R2           set cmd, bank 2
       MOVB @BAA,@>7555   flash address >4000+>1555 = >5555

       STST  R1           set cmd, bank 1
       MOVB @B55,@>6AAA   flash address >2000+>0AAA = >2AAA

       STST  R2           set cmd, bank 2
       MOVB @BA0,@>7555   flash address >4000+>1555 = >5555
       LWPI MYWS
       RT
    
       END
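As a cross-check on the addresses used in FLASHU, this Python sketch works out which bank and cartridge address correspond to each flash address in the unlock sequence. The sequence itself (>5555 gets >AA, >2AAA gets >55, >5555 gets >A0) is the usual JEDEC-style byte-program unlock from the flash datasheets and is stated here as an assumption. The names cpu_view and unlock_steps are made up for this example.

```python
BANK_SIZE = 0x2000        # 8K banks
CART_BASE = 0x6000        # start of cartridge space
COMMAND_MODE = 0x0080     # mode bits 0 1, as in "LWPI >6080" in the listing

# Assumed unlock sequence: (flash address, data byte) pairs.
UNLOCK_PROGRAM = [(0x5555, 0xAA), (0x2AAA, 0x55), (0x5555, 0xA0)]


def cpu_view(flash_addr):
    """Return (bank, CPU address) at which the 4A sees a flash address."""
    bank = flash_addr // BANK_SIZE
    cpu_addr = CART_BASE | (flash_addr % BANK_SIZE)
    return bank, cpu_addr


def unlock_steps():
    """List (bank-select write address, data write address, data byte) per step."""
    steps = []
    for flash_addr, data in UNLOCK_PROGRAM:
        bank, cpu_addr = cpu_view(flash_addr)
        bank_select = CART_BASE | COMMAND_MODE | (bank * 2)
        steps.append((bank_select, cpu_addr, data))
    return steps
```

The bank-select addresses come out as >6084 and >6082, which is what STST R2 and STST R1 write to when the workspace pointer is >6080.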

Risks

The cartridge has not been tested. It is being made at DirtyPCBs.
It might not be ready by Aug 10.
The 22LV10s might turn out to be misprogrammed by the XGecu. (I am verifying one now.)

Edited July 9, 2019 by FarmerPotato
Debug code, title

+Ksarul · July 6, 2019

I will have to look at this--I may be able to do a modified UberGROM board that permits this to work. . .as those two chips can probably be made to fit.

+FarmerPotato · July 6, 2019

50 minutes ago, Ksarul said:

I will have to look at this--I may be able to do a modified UberGROM board that permits this to work. . .as those two chips can probably be made to fit.

You are welcome to it. (But you have an ATmega128 already, why not use that?)

The real tricks are preventing read-before-write, and not screwing up the bank register during a byte write.

I was able to fit all the logic in an ATF22LV10, which only has 10 registers/combinatorial outputs. (PLD file to be released when it's tested.)

I rewrote the code to use LI, which doesn't do a read-before-write, and STST, same.


GDMike · July 9, 2019

Oh man, this is cool. I'd love to be able to get my hands on a working one. Sigh

+FarmerPotato · July 9, 2019

2 hours ago, GDMike said:

Oh man, this is cool. I'd love to be able to get my hands on a working one. Sigh

PM me your address and I'll save one for you.

I stress that this is about to start the testing stage.

The boards are at DirtyPCBs, where the protopack is $20 for about 11 boards (10 cm size. Cartridge is 98 mm by 59 mm I recall.)

I designed and ordered a test jig from OshPark to help verify the logic behavior.

GDMike · July 9, 2019

Thanks!! I'm really trying to get a storage location like this to store Forth Words as some kinda cache. This may or may not do it..as I'm still trying..

+Vorticon · July 10, 2019

Very nice! The 4K storage scheme is equivalent to the SAMS card one, without the need for a PEB card. That is one of the biggest advantages in my view.

+FarmerPotato · July 10, 2019

1 hour ago, Vorticon said:

Very nice! The 4K storage scheme is equivalent to the SAMS card one, without the need for a PEB card. That is one of the biggest advantages in my view.

To clarify, the bank switching is the usual 8K. It is the SST flash chip that supports 4K erasable sectors. They would still be mapped in the cartridge space as 8K banks.

I can't get to 4K banking with the $1.42 PLD I used. It only has 10 bits of storage. Sorry.

Hey, I just did some research and I think replacing the ATF22V10 with the $2 ATF1502 will make 4K paging doable. It has 32 bits of storage. Some Atari folks have used the $3 ATF750 but that's near obsolete. Going full Xilinx like the FinalGROM or NanoPEB means a $9 part. My first goal has been to minimize cost.

By the way, I'm getting 2 thru-hole sockets from Unicorn Electronics which adds $1.09 to the build. PLCC-28 and PLCC-32.

+FarmerPotato · July 10, 2019

I did some initial design of a FAT-ish file system using 256 bytes at a time. Since the flash is rewritable by changing 1s to 0s one byte at a time, but not the reverse, it would work like this:

A program (that had been copied to the 32K) would embed a library of routines to manipulate files (no DSR). It could look up a file in the directory, find a list of sectors (16 bit numbers) and read 256 bytes at a time. Then it could append to the file. Rewriting part of a file would mean that sectors are replaced in the list by new ones, but only if the whole sector list is buffered in memory. Closing the file would mean writing out a clean copy of the entire file, then marking the old file as deleted. There is an allocation table with a bit to mark each sector as used, and another table where sectors are marked deleted.

To keep things tidy, new files would start on 4K boundaries. So whole deleted files could be erased, and the allocation table could be erased and rewritten.
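The two tables described above can be modeled as bitmaps that start out all 1s and only ever have bits cleared, which is what makes them updatable in flash without an erase. The Python sketch below is one possible reading of the spec, not a worked-out design: the class name, the bit order within a byte and the lowest-free-sector allocation policy are all assumptions.

```python
class SectorTables:
    """Allocation and deletion bitmaps that only ever change bits from 1 to 0."""

    def __init__(self, sector_count):
        size = (sector_count + 7) // 8
        self.count = sector_count
        self.used = bytearray([0xFF] * size)      # bit cleared = sector allocated
        self.deleted = bytearray([0xFF] * size)   # bit cleared = sector deleted

    @staticmethod
    def _clear(table, n):
        # Same effect as programming a flash byte: AND with a mask.
        table[n // 8] &= ~(1 << (n % 8)) & 0xFF

    @staticmethod
    def _is_clear(table, n):
        return ((table[n // 8] >> (n % 8)) & 1) == 0

    def allocate(self):
        """Mark the lowest never-used sector as used and return its number."""
        for n in range(self.count):
            if not self._is_clear(self.used, n):
                self._clear(self.used, n)
                return n
        raise RuntimeError("no free sectors, erase deleted space first")

    def delete(self, n):
        self._clear(self.deleted, n)

    def live_sectors(self):
        """Sectors that are allocated and not yet deleted."""
        return [n for n in range(self.count)
                if self._is_clear(self.used, n)
                and not self._is_clear(self.deleted, n)]
```

With 256-byte sectors, 512K holds 2048 sectors, so each bitmap is 2048 / 8 = 256 bytes, which happens to be exactly one sector.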

This is ideally suited to:

large read-only files
appending to files
non-indexed databases (linear search, skipping deleted records)
especially a database of high scores
writing a saved game, or rewriting it

It's remotely possible that this could be done on a system with no 32K. Especially a simple task like saving high scores. VDP RAM would be required to buffer some tables and sectors while tiny routines executed out of PAD (bank switching means the ROM disappears while file sectors are mapped in.) Aha, I have already checked to see if the write byte routine fits into PAD!

HOWEVER, I am not going to work on this file system. I am only working until the testing of the 512K bank switching and writing is done. The file system is just a spec. Too many projects!

+9640News · July 10, 2019

Just thinking out loud here. Could the DSR from the Horizon's Ramdisk be tweaked to use such a system? If so, is there any benefit to using a chip setup with more memory potential?

Again, just thinking out loud.

Another question. I know the FinalGrom has a lot of capabilities that have not been tapped yet. Is the Flash Cartridge setup you are proposing a subset of features on the Final Grom, or something more advanced?

I am just trying to understand the details here.

Beery

+FarmerPotato · July 10, 2019

31 minutes ago, BeeryMiller said:

Just thinking out loud here. Could the DSR from the Horizon's Ramdisk be tweaked to use such a system? If so, is there any benefit to using a chip setup with more memory potential?

Again, just thinking out loud.

Another question. I know the FinalGrom has a lot of capabilities that have not been tapped yet. Is the Flash Cartridge setup you are proposing a subset of features on the Final Grom, or something more advanced?

I am just trying to understand the details here.

Beery

Answering your questions in order:

I think a regular DSR is too complicated to adapt. Cartridge ROM is not searched by the usual DSRLNK? Accessing it in the cartridge space is tricky.

I have read the source though and I think the Horizon DSR is excellent. I would like to use it on a sidecar.

The 512K Flash chips are as big as it can go. The next size up, 1024K, is not compatible with 5V (3.3V only), costs $5, and doesn't come in an easy to solder package.

See Tursi's Dragon's Lair cartridge for an example of surface mount Flash, 1024K I think.

Novice soldering skills must be enough, so no surface mount, only thru-hole.

My cart is not intended to be like FinalGROM, for one thing it's not capable of doing GROM. FinalGROM uses a proper FPGA to do all the emulation. UberGROM uses an ATmega128 microprocessor to emulate GROM, which is a pretty good low-cost solution. My cart just has a cheap logic chip with 10 bits of memory.

My goal is to make the lowest cost multi cart out there (now $8) but still distinguish it from the old 512K multicart by adding save capability.

This design is locked down now (lots and lots of hours put into it...) Upgrading the logic to have 20 or 32 bits is beyond reach. The cheap XGecu programmer doesn't support these chips: ATF750 or ATF1502. Newer logic chips are not easy to solder.

Aside: I'm operating outside the usual path by skipping Xilinx FPGAs that everybody else has used. I'm a sucker with an emotional attachment for the underdogs. (Sound familiar?)

+FarmerPotato · July 10, 2019

Oh crud, my brain just worked out a possible harebrained scheme to keep the >6000 space locked to the first 4K (bank 0) while allowing the >7000 space to map any bank (7 bits). That means many hours more of simulation and testing... NOOOOO!

It's not possible to give the bottom 4K the ability to switch in some different banks.

This would be a choice at the time the ATF22V10 logic is programmed: regular 8K banking, or fixed 4K + banked 4K.

Such a scheme would allow the programmer to have all the library code in bank 0, plus the main program loop, calling lots of subroutines in the other banks. The file system code would execute from bank 0 and use the upper 4K to map in the storage space.
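For comparison with regular 8K banking, here is how the address mapping of the proposed "fixed 4K + banked 4K" option might look, as a Python sketch. This is only a guess at the behavior from the description above: >6000 to >6FFF always shows the first 4K of flash, and >7000 to >7FFF shows one of 128 4K pages selected by a 7-bit number. The function name split_map is hypothetical.

```python
def split_map(cpu_address, page):
    """Map a cartridge-space address to a flash address under the 4K + 4K scheme."""
    if not 0x6000 <= cpu_address <= 0x7FFF:
        raise ValueError("address is outside cartridge space")
    if not 0 <= page < 128:
        raise ValueError("page number must fit in 7 bits")
    offset = cpu_address & 0x0FFF
    if cpu_address < 0x7000:
        return offset                  # lower 4K: locked to the start of flash
    return page * 0x1000 + offset      # upper 4K: any of 128 pages


# 128 pages of 4K cover the whole chip.
assert 128 * 0x1000 == 512 * 1024
```

Library code at >6000 would stay visible no matter which page is selected, so it could switch pages and read or write storage without pulling itself out of the address space.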

Just to re-clarify, this cartridge is not a general-purpose storage device. The program inside it would have to be written specifically to use it.

GDMike · July 10, 2019

2 hours ago, BeeryMiller said:

Just thinking out loud here. Could the DSR from the Horizon's Ramdisk be tweaked to use such a system? If so, is there any benefit to using a chip setup with more memory potential?

Again, just thinking out loud.

Another question. I know the FinalGrom has a lot of capabilities that have not been tapped yet. Is the Flash Cartridge setup you are proposing a subset of features on the Final Grom, or something more advanced?

I am just trying to understand the details here.

Beery

It would be nice to have a ramdisk equivalent with LARGE space. My 3 cents..

+9640News · July 10, 2019

48 minutes ago, GDMike said:

It would be nice to have a ramdisk equivalent with LARGE space. My 3 cents..

Yeah, that was my general thought and I was wondering if it could be used in that manner. Having a 512K ramdisk for < $10 would be an easy decision to make. When I asked it, though, I wondered to myself whether it would be without Extended Basic or E/A support, as best I could tell. Everything would need to run from Basic unless I am missing something.

I am wondering though if the memory mapping on the 512K unit would be the same as on the FinalGrom? Thus, if someone did not want to spend the money for the FinalGrom, they could go this route. And, if someone already had the FinalGrom, if the 512K could be simulated as a cartridge in the FinalGrom?

Beery

Tursi · July 10, 2019

4 hours ago, FarmerPotato said:

See Tursi's Dragon's Lair cartridge for an example of surface mount Flash, 1024K I think.

131,072K, actually.

You can't do a DSR file system from the cartridge port unless you're okay with it only working with programs that search GROM (ie: that support cassette). The address pins are not routed to be able to detect accesses to the DSR space.

You could do such a design that mounts internally or on the side port though. Or I had looked at attaching something to the back of the GROM port and just running one extra wire to detect DSR accesses, that's feasible (QI machines would also need to hook up the missing CRU bits).

GDMike · July 11, 2019

Is ram access just as fast thru PEB 32k or SAMs memory as memory would be at the >6000 space?

What about a ram cache controller at the >6000 space? Haha..I have no idea what I'm asking but it sounds cool..maybe it could be cool so I ask.

But thx for overlooking the source..lol

Edited July 11, 2019 by GDMike

+mizapf · July 11, 2019

Yes, memory at 6000 has multiplex access low-byte/high-byte with the automatically inserted wait states. Only the SRAM at 8300 has a full 16-bit access without wait states. Caching would only make sense if it is much faster than the normal access, so maybe one could think about using a 16-bit 0WS cache for caching the 8-bit data bus transfer, but this would not gain enough speed to justify the additional efforts to move RAM content in and out.

The cartridge port indeed only offers the address bits A3-A15 (for addresses 0000-1FFF), and there is a line ROMG* = A0+A1*+A2* for selecting ROM at that port (and GS* for GROM select).
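The ROMG* equation can be checked with a few lines of Python. This sketch assumes TI bit numbering, where A0 is the most significant address bit, and treats ROMG* as active low. The function name is made up for the example.

```python
def romg_asserted(address):
    """True when ROMG* is low, i.e. the address selects cartridge ROM."""
    a0 = (address >> 15) & 1          # TI numbering: A0 is the MSB
    a1 = (address >> 14) & 1
    a2 = (address >> 13) & 1
    level = a0 | (1 - a1) | (1 - a2)  # ROMG* = A0 + A1* + A2*
    return level == 0
```

The line is low only for A0=0, A1=1, A2=1, which is the 8K window from >6000 to >7FFF. The remaining 13 address bits (A3 to A15) pick the byte within that window.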

+FarmerPotato · July 11, 2019

10 hours ago, GDMike said:

Is ram access just as fast thru PEB 32k or SAMs memory as memory would be at the >6000 space?

What about a ram cache controller at the >6000 space? Haha..I have no idea what I'm asking but it sounds cool..maybe it could be cool so I ask.

But thx for overlooking the source..lol

I think that considering a 512K flash cart as a big RAM is off the track. You could replace the chip with a large SRAM (for about $20) and have a mega-Supercart, but here are the numbers, like mizapf said:

Relative speeds (not including instruction overhead, which is pretty big):

1x: PAD. 16-bit access in 1 cycle each for read and write.

6x: 32K expansion. 8-bit access in 6 cycles for a read or write of 1 or 2 bytes: 2 x (1 cycle + 2 wait states). 6 times slower than PAD.

300x: Flash. Same read access as 32K, but it takes 9 writes to program one byte, plus a lot more instruction fetches and executes. So it will be at least 20 times slower than the 32K RAM per byte, or 40x slower per word. I'm guessing 50x after overhead.

In other words, writing to Flash is best suited for saving small amounts of data or loading a new cart into the thing.
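The arithmetic behind those three figures, spelled out as a small Python sketch. The 50x factor is the guess from the post, not a measurement, and reading the 300 as 6 x 50 is an interpretation. The function name is hypothetical.

```python
def relative_costs(flash_write_factor=50):
    """Relative access cost in CPU cycles, with PAD as the baseline."""
    pad = 1                          # 16-bit access, no wait states
    expansion = 2 * (1 + 2)          # two 8-bit halves, each 1 cycle + 2 wait states
    flash_write = expansion * flash_write_factor   # guessed factor vs. 32K RAM
    return {"PAD": pad, "32K expansion": expansion, "Flash write": flash_write}
```

With the default guess this gives 1, 6 and 300. Plugging in the lower bound of 20x per byte instead gives 120 for Flash writes.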

I spent a lot of hours with the logic analyzer on the side port watching Mini Memory and 32k accesses. I know it pretty well by now. Tursi also made a great document from his logic analyzer of all types of memory cycles. I'll record the bus activity of the flash-writing code listed above to know for sure what the relative speed will be. (I have simulated my tests so far in the WinCUPL simulator from the Windows 95 era and I really, really have grown to hate it.)

+FarmerPotato · July 11, 2019

I'm still waiting for the actual cart boards from DirtyPCBs (where they cost $2.25 each). I held it up until I added the peg hole into the board (oops). EDIT: they were made on 7/15/19, shipped out of Hong Kong on 7/18/19. Hope they get here by Aug 8.

BUT I will have this test rig for the ATF22V10 logic chip (PLD) from OshPark next week. I couldn't work with the PLCC to DIP adaptor that came with the TL866 / XGecu programmer. (It doesn't fit a breadboard, for one.)

It is a PLCC-28 adaptor with two sets of pins: one set on the bottom plugs into a breadboard, while the top set is for a logic analyzer. All the pins on top are labeled.

This will let me test just the bank register, and re-write logic, on the PLD.

[Image: PLCC28adaptor.png]

Edited July 19, 2019 by FarmerPotato

GDMike · July 11, 2019

Very good...I'm sure no one will see your messy work..bwahaha. I got some of these crazy ideas about cache from the 86 era..we used to put cache between HD and controller..and the CPU already had a small cache on it too...so I'm assuming this cpu could benefit from some kinda cache somewheres...

But from what I gather from you guys is...it's too much work per benefit..or it's not addressably possible..-- one word...'shucks'.

Edited July 11, 2019 by GDMike

+FarmerPotato · July 11, 2019

42 minutes ago, GDMike said:

Very good...I'm sure no one will see your messy work..bwahaha. I got some of these crazy ideas about cache from the 86 era..we used to put cache between HD and controller..and the CPU already had a small cache on it too...so I'm assuming this cpu could benefit from some kinda cache somewheres...

But from what I gather from you guys is...it's too much work per benefit..or it's not addressably possible..-- one word...'shucks'.

Making the RAM fast and large is a perfectly fine goal, it's just not possible on the cartridge port.

The 9900 doesn't need cache because its clock is only 3 MHz and cheap SRAM even from the late 80s needs no wait states. Cache is a necessity when your CPU is clocked way faster than the RAM; today your x86 is clocked in gigahertz and your RAM at 800 MHz.

TheBF had some good ideas about replacing the side port with a 16 bit fast memory bus to the PBox (I hope with a backwards compatible adaptor as an option).

Others have put fast 32K memory inside the console. We don't need a cache, we just need RAM in the console that is 16 bit and no wait states.

Modern SRAM is fast and cheap. For instance, this 512Kx8 thru-hole SRAM is $5 and is the type used in SAMS cards (and hopefully future Horizon RAMdisk).

And this 512Kx8 or 256Kx16 part is $3 and $5, operates up to 100 MHz, and comes in a surface mount package that is just a little harder to solder than plain old 0.1" thru-hole chips (SOJ-36 with 0.05" pins; you need a magnifier and a knife-edge K solder tip). It would be hellacomplicated but you could make an in-console 16-bit bus tiny SAMS upgrade around that. Then again, I have more ideas than time or attention span.

GDMike · July 11, 2019

Gotcha..now I got the picture. Yep. Im so happy with my SAMs card. I've been able to use it a lot further with the assistance of TForth..FBForth takes advantage as well as the RXB. I've gotten some code in assy for it and played a little, but I need a little more experience in assy to make use of it. But in TF I was able to make 3 char font sets that were each called by one word at snap! 5 or 6 blocks of code, but in assy I'm always worried about running out of memory just writing code...with file after file of 500+ lines of D/V80.. I just don't know how to manage my code yet..def a learning curve..

+adamantyr · July 11, 2019

5 minutes ago, GDMike said:

Gotcha..now I got the picture. Yep. Im so happy with my SAMs card. I've been able to use it a lot further with the assistance of TForth..FBForth takes advantage as well as the RXB. I've gotten some code in assy for it and played a little, but I need a little more experience in assy to make use of it. But in TF I was able to make 3 char font sets that were each called by one word at snap! 5 or 6 blocks of code, but in assy I'm always worried about running out of memory just writing code...with file after file of 500+ lines of D/V80.. I just don't know how to manage my code yet..def a learning curve..

It's not too bad in assembly, the biggest complication is coming up with a module-based approach with code management. I baked my own for my CRPG work, at some point when I'm done with the game I'll discuss it at length.

A cross-assembler that could build modules that are dynamically assigned to pages IS possible but I honestly think trying to apply a modern design system to what is essentially 8-bit architecture will get too top-heavy. Just my opinion though. I honestly feel C is too much overhead for the TI-99/4a.

Tursi · July 11, 2019

6 hours ago, FarmerPotato said:

TheBF had some good ideas about replacing the side port with a 16 bit fast memory bus to the PBox (I hope with a backwards compatible adaptor as an option).

Others have put fast 32K memory inside the console. We don't need a cache, we just need RAM in the console that is 16 bit and no wait states.

It's also worth noting that Thierry Nouspikel did some tests and successfully operated the TI with no wait states to any RAM or ROM (although this did require static RAM) - only GROM required the wait states. http://www.unige.ch/medecine/nouspikel/ti99/wait.htm

I did actually try part of this but I messed something up, I haven't had a chance to see what I did wrong yet.

+FarmerPotato · July 20, 2019

I've verified that ATF22V10s (combinatorial logic) work as expected when programmed in the cheap XGecu (aka TL866-II).

I spent $10 making this adaptor, just what I needed for testing.
