I'm more interested in supposing they're not flat-out lying about "hybrid emulation" and figuring out how they might pull this off, instead of figuring out why they can't. "Synchronization is hard" doesn't cut the mustard, and "system timing is unreliable" doesn't make sense. Are you telling me you can't set up a real-time kernel and configure an ARM system to run code out of a non-cached region, or alternatively reconfigure the cache as TCM and run out of there? Nintendo has shipped plenty of games that run at a steady 60 fps on ARM systems, including emulators!

First of all, "full hardware compatibility" is obviously a lie. So reasoning backwards from that to the impossibility of doing what it would take to achieve full hardware compatibility isn't useful.

Let's stipulate that they're not beholden to USB timing shenanigans for reading their pads -- why not GPIOs? That's approximately how it works on the original consoles. And it doesn't matter anyway: simply have one thread continually reading the pad input and parking it in buffers for the main emulation thread to pick up at the moment it needs it. That introduces no latency beyond the GPIOs themselves (nobody can detect it), or beyond USB (only wizards can detect it). They could do all this without compromising their claims of low latency, because marketers have that power.

Wait states on GPIOs for reading the cart: I'm not an expert on this, but a few minutes of googling turns up ARM advertising GPIO operating speeds "in excess of 150 MHz", so this is starting to sound negligible.

How about having to turn the emulator inside out in order to mix the rendering with the main-thread CPU emulation? My answer is: don't. Read/evaluate the PPU and ship the bytes off to another buffer, to be rendered all at once or even pixel by pixel, truly in parallel. Now, the latter isn't the easiest synchronization in the world, but it's still a relatively simple pattern, since it's just one reader and one writer.
We do have problems in situations where the CPU can read data back from the PPU or APU. Consider that the bulk of the novelty in this emulator. In both cases only a subset of the emulation needs to be done on the CPU thread: on the NES, only sprite-0 hit, at a limited level of detail; and the APU... well, maybe about 80% of it needs to live there. That's starting to feel like enough logic that it may need to be broken into a several-stage state machine, with only one stage ticked per clock out to the cart.

Now's a good point to mention that this isn't such mad science that you couldn't prototype it on a workstation with the IDE of your choice, thus accelerating development. You'd only need to make sure you weren't busting your CPU time budgets once the code compiles and runs on ARM. What's more, they could have developed it before even having the hardware ready, given certain assumptions. Any of us could make this emulator right now as a proof of concept before dedicating our careers to getting it put into hardware.

As for the business problem of making all of these emulators in time: ahh, but they can slip on shipping a few modules, and as long as the product is real and proven, they can take some heat and survive. That's the real power of the modular design.

What concerns me more is why a system built this way would have any provision for dumping and storing ROMs. Or even how it would do that -- it can't be done reliably, and it adds an unneeded piece to their mountain of tasks. Maybe it's a value-add for convenience, but their userbase likes plugging in carts, so I don't get it. Maybe it's so they can support patches, translations, etc.