
FPGA Based Videogame System


kevtris

Interest in an FPGA Videogame System  

682 members have voted

  1. I would pay....

  2. I Would Like Support for...

  3. Games Should Run From...

    • SD Card / USB Memory Sticks
    • Original Cartridges
    • Hopes and Dreams
  4. The Video Interface Should be...



Recommended Posts

It's an FPGA-based console, which means it's designed to receive any core, whether 8-bit or 16-bit. So technically, Kevtris could release the Zimba now with the cores he's already implemented and deliver the 16-bit cores later. Loading a core simply involves having it on an SD card and following a procedure to load it for the game you want to play.

 

My guess is Kev will want to have the Super NES and Genesis cores done before he tackles proper mass production of the Zimba, because being an FPGA console implies a mostly open architecture, and he will not want someone else writing a 16-bit core for the Zimba that doesn't meet his (very high) standards. I would probably do the same in his shoes, to be honest. :)

We agree on the conclusion, but I have different reasons. I think he should wait because if he released it with just the 8-bit systems, it might get a reputation for being just that and initially not sell as well as it would if he waited to include more. Then later, when he adds the 16-bit systems, fewer people may notice, because they already think of it as the 8-bit machine.

 

Also, every core he includes at launch increases demand and what he could charge for it. In other words, selling the hardware with the cores is how he can make money off the software; adding them after the fact is just supporting the console for free, which means less incentive. But if he included all the cores (or at least most of the ones he plans on including), so that the only post-launch support were bug fixes and tweaks, he might earn more and be more motivated to work on those bugs and tweaks.

 

Another thing is that something in the chosen final design could be flawed somehow, and he only realizes he had a big brain fart when he adds a new core: "Oh, Chameleon turds! I forgot to add that thingy to this doodad for the Neo Geo! Dope!"

 

Anyway, he should definitely wait until he has the SNES, Genesis, TurboGrafx-16, PS4, and all the other 16-bit systems.


Also, every core he includes at launch increases demand and what he could charge for it. In other words, selling the hardware with the cores is how he can make money off the software

Who would make money off the "software" running on the Zimba? This isn't Nintendo or some other similar company that will make first-party games for its own system.

 

Interestingly, there was a mention in another thread about GameStop setting itself up to become an outlet for collectible games on physical media after the gaming market goes "mostly download-only" within the next few years. I'd say it's a smart move, and Kevtris could actually capitalize on it with some good timing. Imagine setting foot inside a game shop with all kinds of used cartridge games: "Don't have the console/handheld to play these carts? Don't worry, we sell the Zimba 3000, which can play them all."

 

That does mean making cartridge adaptors for all the supported consoles/handhelds. But hey, even the cartridge adaptors themselves could become collectibles! :D

 

 

Another thing is that something in the chosen final design could be flawed somehow, and he only realizes he had a big brain fart when he adds a new core: "Oh, Chameleon turds! I forgot to add that thingy to this doodad for the Neo Geo! Dope!"

 

Anyway, he should definitely wait until he has the SNES, Genesis, TurboGrafx-16, PS4, and all the other 16-bit systems.

Speaking of brain farts... :P :D


Who would make money off the "software" running on the Zimba? This isn't Nintendo or some other similar company that will make first-party games for its own system.

I'm talking about the cores. The ones included as "launch cores" add value, so if they are included at launch, Kevtris makes money both for the work he did on the hardware and for the cores (software).

 

Speaking of brain farts... :P :D

Whoops! I meant XBOX ONE. :D


I'm really hoping for Master System, Genesis (with Sega CD and 32X compatibility), NES, SNES, Neo Geo, TurboGrafx-16... That could be waaaay more than enough. I'd throw a ton of change at that personally, especially if ROMs could be loaded via SD card without the need for an Everdrive. For that I'd be willing to pay way more than the current latest-gen console price.

 

And what would also be amazing is if Saturn, Dreamcast, PlayStation 1 & 2, and maybe N64 & GameCube were included in the next one. That'd be fantastic. My legitimately purchased, good-condition Dreamcast discs barely work anymore. Sadly, that's my only console that seems to be in the very late stages of dying, so it would be nice to have an alternative way to play Dreamcast games while my scratch-free discs are having problems. :(


Thanks for the kind words everyone :-) that means a lot. I really did want at least one 16-bit core at launch with this thing. PS1 is probably about as high-end as these things can go, due to the speeds involved (CPU speeds, etc.) and the sheer size of the undertaking. The N64 might be doable, but it is probably just out of reach without an insane amount of heroics. One problem I have been grappling with is how to get fast enough memory for things like the SA-1 on the SNES, since that needs about 10MHz access to memory. This doesn't sound fast, but I think it's a lot faster than DDR running at 150MHz+ can handle. It's not the bandwidth that kills you, it's the latency: setting up and performing random reads takes a relatively long time. I have been working to try and solve this problem before advancing. The Game King really showed me the problem of SDRAM latency. I am running it at 125MHz (the max for that chip) and could barely coax 5.6MHz or so of random access out of it due to the latencies involved. There are several options, and those are being explored. Good to get these kinks worked out now, before making the hardware.
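To put rough numbers on that latency problem, here's a minimal back-of-the-envelope sketch in Python. The per-access cycle count is an assumed figure chosen to line up with the ~5.6 MHz observation above, not the actual controller timing:

```python
# Rough model: every fully random SDRAM access pays the whole
# precharge + activate + CAS pipeline, lumped here into one assumed
# cycle count. Not the actual controller, just arithmetic.

SDRAM_CLOCK_HZ = 125_000_000   # Game King SDRAM clock from the post

def random_access_rate_hz(clock_hz, cycles_per_access):
    """Random accesses per second when each one pays the full latency."""
    return clock_hz / cycles_per_access

# An assumed ~22 cycles per access reproduces the observed ~5.6 MHz:
print(random_access_rate_hz(SDRAM_CLOCK_HZ, 22) / 1e6)  # -> ~5.7 MHz

# The SA-1 wants ~10 MHz random access, which at 125 MHz leaves a
# budget of only 12.5 clocks for the entire access:
print(SDRAM_CLOCK_HZ / 10_000_000)  # -> 12.5 cycles
```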

 

That and I kind of want to get this synth project done first as a warm up too. Been working on that as well. Fortunately lots of the work on that will be directly transferable to the Zimba.

 

(edit: too quick on the posting trigger, lol)


Thanks for the kind words everyone :-) that means a lot. I really did want at least one 16-bit core at launch with this thing. PS1 is probably about as high-end as these things can go, due to the speeds involved (CPU speeds, etc.) and the sheer size of the undertaking. The N64 might be doable, but it is probably just out of reach without an insane amount of heroics. One problem I have been grappling with is how to get fast enough memory for things like the SA-1 on the SNES, since that needs about 10MHz access to memory. This doesn't sound fast, but I think it's a lot faster than DDR running at 150MHz+ can handle. It's not the bandwidth that kills you, it's the latency: setting up and performing random reads takes a relatively long time. I have been working to try and solve this problem before advancing. The Game King really showed me the problem of SDRAM latency. I am running it at 125MHz (the max for that chip) and could barely coax 5.6MHz or so of random access out of it due to the latencies involved. There are several options, and those are being explored. Good to get these kinks worked out now, before making the hardware.

 

That and I kind of want to get this synth project done first as a warm up too. Been working on that as well. Fortunately lots of the work on that will be directly transferable to the Zimba.

 

(edit: too quick on the posting trigger, lol)

 

Thanks for the update. Is the Saturn possible?


I think it would be fair enough not to include anything 32-bit unless it's a port of an existing core already shown to work (say, Amiga or ST).

 

There's already a ton of work needed to get the 16-bit machines properly implemented, especially the Neo Geo, which nobody has done yet.


I would like to request Turbo/PC Engine Super CD support. The aging CD/Duo units are very prone to failures due to "rotten" gears and dried capacitors, although the earlier standalone models have proven very reliable. I have immensely enjoyed playing PC Engine and TurboGrafx HuCards on my unmodded TurboGrafx (I use an import adapter for the PCE games), but CD titles are out of the scope of my collecting budget due to the hardware requirements. Since your Zimba 3000 supports SD, I was hoping it would be possible to throw CD images on there once you get PCE/TG16 cores up and running. Also, I hope you support SDXC cards, because disc images will eat up storage quickly. The CD drive doesn't add a ton of features besides Red Book audio and extra storage/RAM. I imagine an FPGA implementation could load games with almost zero wait time! ;-)

 

As for the memory, it may increase costs, but as I understand it, older parallel RAM chips had practically zero latency. A few megs' worth of medium-speed parallel RAM chips could squash most requirements of the 16-bit systems. I know the vintage-style chips are typically 8-bit, but the FPGA could reconfigure the memory to run in parallel for consoles with 16- or 32-bit busses, or in series for 8-bit busses. Eight megabytes' worth of parallel RAM chips (possibly four 2MB 8-bit RAM chips in an FPGA-reconfigurable design, simulating the memory requirements of an 8-, 16-, or 32-bit-wide bus) could cover a lot of systems. The Everdrive 64 v3 used a 64MB RAM chip that loads Conker's Bad Fur Day in 3 seconds or less, so I assume such chips are available with adequate speed. A large, wide-bus parallel RAM chip could handle most consoles, including the ROM storage. Maybe parallel RAM on a slower bus would be more useful than high-speed DDR?
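For what it's worth, here's a sketch in Python of how that four-chip reconfiguration might map addresses. The chip size, chip count, and the mapping itself are all assumptions, purely to illustrate the idea:

```python
# Hypothetical bank of four 2 MB x 8-bit SRAMs, reconfigured per core.
CHIP_SIZE = 2 * 1024 * 1024

def map_access(byte_addr, bus_bits):
    """Return a list of (chip, chip_addr, byte_lane) for one bus access."""
    lanes = bus_bits // 8              # bytes moved per access: 1, 2, or 4
    word_addr = byte_addr // lanes
    if lanes == 1:                     # "in series": chips extend the map
        return [(word_addr // CHIP_SIZE, word_addr % CHIP_SIZE, 0)]
    # "in parallel": each chip permanently serves one byte lane per word
    return [(lane, word_addr, lane) for lane in range(lanes)]

print(map_access(0x200005, 8))   # 8-bit bus: chip 1, offset 5
print(map_access(0x10, 32))      # 32-bit bus: all four chips at word 4
```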


I would like to request Turbo/PC Engine Super CD support. The aging CD/Duo units are very prone to failures due to "rotten" gears and dried capacitors, although the earlier standalone models have proven very reliable. I have immensely enjoyed playing PC Engine and TurboGrafx HuCards on my unmodded TurboGrafx (I use an import adapter for the PCE games), but CD titles are out of the scope of my collecting budget due to the hardware requirements. Since your Zimba 3000 supports SD, I was hoping it would be possible to throw CD images on there once you get PCE/TG16 cores up and running. Also, I hope you support SDXC cards, because disc images will eat up storage quickly. The CD drive doesn't add a ton of features besides Red Book audio and extra storage/RAM. I imagine an FPGA implementation could load games with almost zero wait time! ;-)

 

As for the memory, it may increase costs, but as I understand it, older parallel RAM chips had practically zero latency. A few megs' worth of medium-speed parallel RAM chips could squash most requirements of the 16-bit systems. I know the vintage-style chips are typically 8-bit, but the FPGA could reconfigure the memory to run in parallel for consoles with 16- or 32-bit busses, or in series for 8-bit busses. Eight megabytes' worth of parallel RAM chips (possibly four 2MB 8-bit RAM chips in an FPGA-reconfigurable design, simulating the memory requirements of an 8-, 16-, or 32-bit-wide bus) could cover a lot of systems. The Everdrive 64 v3 used a 64MB RAM chip that loads Conker's Bad Fur Day in 3 seconds or less, so I assume such chips are available with adequate speed. A large, wide-bus parallel RAM chip could handle most consoles, including the ROM storage. Maybe parallel RAM on a slower bus would be more useful than high-speed DDR?

 

Turbo CD support would be awesome of course.

 

Those 8-bit SRAM chips become expensive really quickly once you start stacking them.

 

The trouble with modern RAM, according to kevtris, is (if I understand latency correctly) random-access speed, not sustained speed. It is a tortoise vs. hare situation. Random-access speed, the need to read a few bytes here and write a few bytes there, is heavily affected by latency: you will pay the latency penalty whenever you jump around in strides of 4KB or so, which is frequently the case in the 16-bit systems. Modern RAM is designed for block copying and sustained speed, so when you copy a 4MB block of data, you only pay the initial latency penalty once and the copy completes quickly.
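A quick tortoise-vs-hare calculation in Python; the clock, bus width, and latency numbers here are assumptions for illustration, not measurements:

```python
CLOCK_HZ = 125e6        # assumed SDRAM clock
LATENCY_CYCLES = 20     # assumed row-open + CAS overhead paid per burst
BYTES_PER_CYCLE = 2     # assumed 16-bit bus, one transfer per clock (SDR)

def copy_seconds(total_bytes, burst_bytes):
    """Time to move total_bytes when fetched in bursts of burst_bytes."""
    bursts = total_bytes / burst_bytes
    cycles = bursts * (LATENCY_CYCLES + burst_bytes / BYTES_PER_CYCLE)
    return cycles / CLOCK_HZ

four_mb = 4 * 1024 * 1024
print(copy_seconds(four_mb, four_mb))  # one big block copy: ~0.017 s
print(copy_seconds(four_mb, 1))        # byte-at-a-time:     ~0.69 s
# Same chip, same data, ~40x slower once latency dominates every access.
```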


 

Turbo CD support would be awesome of course.

 

Those 8-bit SRAM chips become expensive really quickly once you start stacking them.

 

The trouble with modern RAM, according to kevtris, is (if I understand latency correctly) random-access speed, not sustained speed. It is a tortoise vs. hare situation. Random-access speed, the need to read a few bytes here and write a few bytes there, is heavily affected by latency: you will pay the latency penalty whenever you jump around in strides of 4KB or so, which is frequently the case in the 16-bit systems. Modern RAM is designed for block copying and sustained speed, so when you copy a 4MB block of data, you only pay the initial latency penalty once and the copy completes quickly.

Yes. I'm no expert on RAM, but it started with SDRAM, which had typical speeds of 66/100/133 MHz. There is high throughput because you've got a 32-bit-wide bus transferring data on each clock. Feel free to skip the following armchair hardware analysis, as I clearly have no idea what I'm talking about...

 

Double Data Rate (DDR) came along with vast improvements in speed. The chips inside 200/266/333/400-class DDR RAM would be equivalent to SDRAM running at up to 100/133/166/200 MHz, about a 50% increase over SDR. DDR sends data twice per clock instead of once, resulting in twice the throughput but only modest improvements in latency.

 

Then you've got DDR2, dual-channel DDR2, DDR3, and DDR3 in dual-, triple-, and quad-channel configurations. Now DDR4 is slow in adoption due to diminishing returns in CPU tech. The CPUs are getting bottlenecked; however, multicore CPUs can take advantage of multiple channels of RAM, etc., etc. Basically, each generation of RAM doubles the throughput without the chips really getting much faster. SDR at 200 MHz, DDR at 400 MHz, DDR2 at 800 MHz, and DDR3 at 1600 MHz would all have pretty much the same amount of latency for kevtris's purposes, if fetching one byte at a time. Compounding the issue, RAM modules have different timing tables based on which speed class they are operated at. Running 400 MHz DDR at 333 MHz, for instance, might result in fewer clock cycles' worth of latency as a tradeoff for lower throughput.
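The "throughput doubles, latency stands still" claim is easy to sanity-check in Python. The CAS latencies below are typical retail figures assumed for illustration:

```python
# First-word (CAS) latency in nanoseconds across memory generations.
modules = {
    # name: (I/O bus clock in MHz, CAS latency in clock cycles)
    "SDR-133   CL3":  (133,  3),
    "DDR-400   CL3":  (200,  3),
    "DDR2-800  CL5":  (400,  5),
    "DDR3-1600 CL11": (800, 11),
}
for name, (mhz, cl) in modules.items():
    print(f"{name}: {cl / mhz * 1000:5.1f} ns")
# SDR ~22.6 ns, DDR ~15.0 ns, DDR2 ~12.5 ns, DDR3 ~13.8 ns:
# transfer rate went up ~12x while first-word latency barely moved.
```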

 

Ultimately, however, the RAM is really only used to efficiently pipe data into the CPU cache. The true low-latency, high-speed memory that can be tweaked at the byte or word level is the CPU cache, where the bulk of operations are carried out.

 

And I'm not even sure data can be fetched from SDRAM or DDR one byte at a time. It may be necessary for the FPGA to read an entire "word" off the RAM module, discard all but the one byte of that "word" it needs, then immediately fetch a new "word" from somewhere else, discard all but one byte again, and perform all intended operations with that byte within a single system clock. Writes would be even worse: it would need to read back the entire word it wants to write to, change the one byte in that word that needs to change, then write the entire word back to the RAM module, assuming single 8-bit writes to an address are not possible. Before you could write a single byte to the RAM, you would need to know the contents of the entire 32-bit word where that byte resides. Most memory probably has a 32-bit or 64-bit bus width, depending on whether the CPU is running 32 or 64 bits. So a CPU architecture designed to read/write one byte at a time, with a maximum of one clock of latency, would be a challenge.
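The read-modify-write dance described above looks like this in Python (a toy word-addressed memory, purely to illustrate the worst case):

```python
mem = [0] * 1024   # hypothetical memory of 1024 x 32-bit words

def write_byte_rmw(byte_addr, value):
    """Write one byte when the memory only accepts whole 32-bit words."""
    word_addr = byte_addr >> 2            # which 32-bit word
    shift = (byte_addr & 3) * 8           # which byte lane in that word
    word = mem[word_addr]                 # 1) read the entire word back
    word &= ~(0xFF << shift)              # 2) clear the target byte
    word |= (value & 0xFF) << shift       # 3) merge in the new byte
    mem[word_addr] = word & 0xFFFFFFFF    # 4) write the entire word out

write_byte_rmw(5, 0xAB)
print(hex(mem[1]))   # 0xab00: byte 1 of word 1 updated, one read + one write
```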

 

An alternative to dealing with 32-bit word lengths on an 8-bit architecture would be to ignore the last three bytes. For instance, only the first 8 bits are used, and the remaining 24 bits of every word could be filled with 00 or FF. Then you wouldn't have to worry about the contents of each word when writing bytes back. This would waste 75% of the RAM module for 8-bit architectures, but that's negligible when cheap RAM modules are available in sizes orders of magnitude larger than what you really need. Larger sizes are often cheaper than smaller ones.

 

Then there's the issue of multiple system busses. You've got the cartridge ROM, battery-backed SRAM, CPU RAM, and graphics RAM all on different busses, which may need to be accessed simultaneously. If a single random access per console system clock is already pushing the boundaries of the RAM module's latency, then adding a secondary read or write to a different address on the same module during the same console clock might break the system. A cache miss in a game console could have devastating consequences.

 

Another disadvantage to using DDR or other RAM, even if it has shorter latencies, is that the FPGA may not be able to keep up. However, you may have access to nonstandard timing. Just like system tweakers can force performance out of their CPUs, it may be possible to force performance out of RAM modules. Suppose a DDR module rated for 400 MHz access has 4-4-4 latency. You may be able to force it to operate under a custom profile; for instance, 100 MHz at 1-1-1 would have the same absolute latency despite clocking it at 1/4 the speed. I have no idea if such a hack would work. Underclocking the RAM module while simultaneously lowering the latency clocks might be useful if your FPGA cannot keep up with the module's rated speed. Again, the module may or may not operate properly under such a custom profile, even if the overall latency were unchanged.
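The custom-profile idea, in numbers: absolute latency is cycles divided by clock, so the two profiles below come out identical (whether a real module would actually accept CL1 is another matter):

```python
def cas_latency_ns(cas_cycles, clock_mhz):
    """Absolute CAS latency in nanoseconds for a given clock."""
    return cas_cycles / clock_mhz * 1000

print(cas_latency_ns(4, 400))  # rated profile, 400 MHz CL4:  10.0 ns
print(cas_latency_ns(1, 100))  # forced profile, 100 MHz CL1: 10.0 ns
```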

 

Just some food for thought. End armchair hardware analysis.

 

@Kevtris: I remember you mentioned some system running at 5-point-something MHz was at the absolute latency threshold of a particular SDRAM module for single-byte access. What about the Turbo/PCE's 7.16 MHz system clock? The holy trinity of the 16-bit console wars is, IMO, pretty much required material for an FPGA console, and if you can't get all three base systems working (SNES expansion chips, Sega add-ons, etc. notwithstanding), it may be a deal-breaker for many people.
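Putting rough numbers on that question, using the ~5.6 MHz figure from earlier in the thread:

```python
achieved_ns = 1e9 / 5.6e6    # ~178.6 ns per random access on that SDRAM
needed_ns   = 1e9 / 7.16e6   # ~139.7 ns per access a 7.16 MHz bus demands
print(f"{achieved_ns:.1f} ns achieved vs {needed_ns:.1f} ns needed")
print(achieved_ns <= needed_ns)   # False: that chip, as driven, is too slow
```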


And I'm not even sure data can be fetched from SDRAM or DDR one byte at a time. It may be necessary for the FPGA to read an entire "word" off the RAM module, discard all but the one byte of that "word" it needs, then immediately fetch a new "word" from somewhere else, discard all but one byte again, and perform all intended operations with that byte within a single system clock. Writes would be even worse: it would need to read back the entire word it wants to write to, change the one byte in that word that needs to change, then write the entire word back to the RAM module, assuming single 8-bit writes to an address are not possible. Before you could write a single byte to the RAM, you would need to know the contents of the entire 32-bit word where that byte resides. Most memory probably has a 32-bit or 64-bit bus width, depending on whether the CPU is running 32 or 64 bits. So a CPU architecture designed to read/write one byte at a time, with a maximum of one clock of latency, would be a challenge.

 

Reading an entire "word" is inconsequential and has no downside, for performance or otherwise; the unused bytes are simply masked in the controller.

 

Given that memory is designed for maximum throughput, it would be a monumental drawback to require that entire words be written to the RAM. SDR/DDR have DQM/DM signals which are effectively byte enables for write data. So no issue there.

 

There shouldn't be an issue using 100% of the RAM for emulated systems. 8- and 16-bit data buses can easily be accommodated by a mux on the data bus and by shifting low-order address bits into the DQM/DM signals, for example. Plenty of 8-bit emulations are doing that already with 16-bit FLASH and/or SRAM.
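A sketch of that low-order-address-bits-into-DQM trick, modeled in Python for an assumed 16-bit-wide SDRAM (two byte lanes, two DQM bits, where a 1 bit masks its lane):

```python
def byte_write_with_dqm(byte_addr, value):
    """One byte write, no read-modify-write: DQM masks the unused lane."""
    word_addr = byte_addr >> 1       # 16-bit words: drop the low address bit
    lane = byte_addr & 1             # low-order address bit picks the lane
    data = (value & 0xFF) << (lane * 8)   # mux the byte onto its lane
    dqm = 0b01 if lane else 0b10          # mask whichever lane is untouched
    return word_addr, data, dqm

print(byte_write_with_dqm(5, 0xAB))  # (2, 0xab00, 0b01): only lane 1 written
```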

 

FYI here is a good explanation of how SDR/DDR works.


Then there's the issue of multiple system busses. You've got the cartridge ROM, battery-backed SRAM, CPU RAM, and graphics RAM all on different busses, which may need to be accessed simultaneously. If a single random access per console system clock is already pushing the boundaries of the RAM module's latency, then adding a secondary read or write to a different address on the same module during the same console clock might break the system. A cache miss in a game console could have devastating consequences.

 

 

This is probably the biggest impediment to emulating the Neo Geo: this, and the relatively large memory devices required on no fewer than five different buses.

 

For simpler systems with modest requirements, modern FPGAs are likely to have sufficient RAM on-chip for some of the memory devices. In these cases the number of buses is almost inconsequential. A lot of 8-bit arcade games, for example, require no external RAM at all on later-generation FPGAs; these games typically have video, attribute, and CPU RAM sizes measured in tens of kB at most. It's generally the (even 8-bit) microcomputer emulations that require external (S)RAM on "hobbyist" FPGAs, and of course the later-generation arcade games/consoles.

 

Some systems can tolerate some latency on some buses (e.g. the 68K) and still run "correctly", though arguably extending the latency beyond the original design specifications affects the accuracy of the emulation. Other buses, such as video memory, cannot tolerate any latency at all. However, in these latter cases it is sometimes possible to pre-fetch into a cache, since the access pattern is known. Again, this will arguably affect emulation accuracy. But in general, yes, multiple buses equate to multiple headaches for the emulation implementer.


KS, typically a 16-bit value is referred to as a word, a 32-bit value as a doubleword, and a 64-bit value as a quadword. No one ever refers to an 8-bit value as a word or a letter. :)

 

Kevtris has not complained of latency issues for the 8-bit systems, nor even for devices like the TurboGrafx-16, which has an 8-bit data bus except to the video memory. The 16-bit systems like the SNES and Genesis, and probably the Neo Geo, use a combination of 8-bit and 16-bit data busses, so these systems may be slightly more efficient with 64-bit SDRAM chips. However, while bandwidth has improved substantially from SDRAM to DDR, DDR2, DDR3, and DDR4 SDRAM, the latencies have gotten worse.

 

Assuming that the SA-1/65C816's memory must respond within one clock cycle, kevtris would need a RAM chip able to respond within 100 ns. SRAM can easily do that; 486 machines often had 20 ns SRAMs.
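The 100 ns figure, derived in Python:

```python
SA1_CLOCK_HZ = 10_000_000          # SA-1 needs roughly one access per clock
budget_ns = 1e9 / SA1_CLOCK_HZ
print(budget_ns)                   # 100.0 ns to complete each access
print(budget_ns / 20)              # a 20 ns SRAM meets that with 5x margin
```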


KS, typically a 16-bit value is referred to as a word, a 32-bit value as a doubleword, and a 64-bit value as a quadword. No one ever refers to an 8-bit value as a word or a letter. :)

Most of my computer science knowledge comes from Wikipedia articles, so I added a disclaimer that my post was purely armchair speculation. I knew I would get a few things wrong. :P

 

Good to know that SDRAM can handle single-byte reads and writes and has signals to deal with this. Also, just using the lower address lines and leaving the upper ones blank or mirrored works too.

 

Also, no joke about the latencies of each successive RAM generation. The AMD PC I built in 2002 with an Athlon XP used 333 MHz 3-3-3 DDR. Now I look at the timing tables for the Radeon 16 GB dual-channel kit (2x8 GB, 1867 MHz DDR3) installed in my Bulldozer rig, and they're downright abysmal.

 

One of the issues with using the FPGA to simulate the console memory isn't just the console itself: if it's running SD cards for storage, it needs the ROMs stored somewhere too, with direct low-latency access. So the SNES might use 128 KB of RAM (not counting the sound chip, etc.), but the games can be up to 6 MB (48 Mbit) for commercial releases, and larger for homebrew ROMs or hacks. The Genesis is in a similar position. These ROMs need direct bus access to work, so if using SD cards, the cart bus and ROM must also be added to the FPGA. If the FPGA isn't big enough, the larger ROMs must be stored in RAM. This also includes any and all expansion chips, such as the SA-1 or Super FX.


KS, typically a 16-bit value is referred to as a word, a 32-bit value as a doubleword, and a 64-bit value as a quadword. No one ever refers to an 8-bit value as a word or a letter. :)

 

Off topic, but 8 bits is a byte and 4 bits is a nibble (one hex digit). :)

 

Actually, to be pedantic, byte and word do not necessarily imply a fixed number of bits... but the architectures where that happened are probably obsolete(?).


Off topic, but 8 bits is a byte and 4 bits is a nibble (one hex digit). :)

 

Actually, to be pedantic, byte and word do not necessarily imply a fixed number of bits... but the architectures where that happened are probably obsolete(?).

Okay, I got "word" wrong. I always thought nibble was a cute term, though. One hex digit. Smaller than a "byte". ;-)


Off topic, but 8 bits is a byte and 4 bits is a nibble (one hex digit). :)

 

Actually, to be pedantic, byte and word do not necessarily imply a fixed number of bits... but the architectures where that happened are probably obsolete(?).

Yeah, I'm remembering back in the mainframe days: we used hex with 4-bit nibble / 8-bit byte / 16-bit word architectures (mostly IBM and DEC), and octal with 3-bit nibble / 6-bit byte / 12-bit word architectures (I'm not even sure of the makers anymore, but maybe Burroughs or Honeywell, plus a couple of minicomputers).

 

It turns out a group of 256 is SO much better than a group of 64, I guess.


It turns out a group of 256 is SO much better than a group of 64, I guess.

According to Wikipedia, octal was convenient early on because it made displays simpler (cheaper), but it became awkward to use when larger numbers were needed.

I learned about octal notation in school but we never used it beyond some simple base conversion exercises...


According to Wikipedia, octal was convenient early on because it made displays simpler (cheaper), but it became awkward to use when larger numbers were needed.

I learned about octal notation in school but we never used it beyond some simple base conversion exercises...

Could be worse. What if we used 5-bit icosidodecimal notation? (2^5 = 32) There are enough letters in the alphabet to do so as an extension of hexadecimal:

01234567 89ABCDEF GHJKLMNP QRTUVWXY

(FYI, I removed I, O, S, and Z because they are too similar to 1, 0, 5, and 2 when handwritten, potentially causing confusion)

 

There's also a 6-bit notation, UUENCODE or something, based on all alphanumeric characters in upper and lower case plus two special characters (10 + 26 + 26 + 2 = 64). That is much more easily expressed as two octal nibbles, though... :P
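For fun, the proposed 32-character alphabet encodes cleanly at 5 bits per digit; a quick Python sketch:

```python
ALPHABET = "0123456789ABCDEFGHJKLMNPQRTUVWXY"   # I, O, S, Z dropped
assert len(ALPHABET) == 32                      # 2**5

def to_base32(n):
    """Render a non-negative integer in the handwriting-safe alphabet."""
    if n == 0:
        return ALPHABET[0]
    digits = []
    while n:
        digits.append(ALPHABET[n & 0b11111])    # low 5 bits pick a digit
        n >>= 5
    return "".join(reversed(digits))

print(to_base32(0xDEADBEEF))   # -> "3FAUFPF"
```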

