Everything posted by ThomH

  1. To sneak in some extra information on this while I'm sitting at my desk, watching some very slow automated tasks in my day job: Re: the stencil; this is used because the CRT is its own sealed component trying to act like a real CRT. So it's attempting to sync to any incoming signal, and it paints scans to a virtual set of phosphors, which subsequently suffer exponential decay. It actually does this in a manner decoupled from your machine's frame rate, so that if you have a gaming monitor at 120Hz or 144Hz it can provide 120 or 144 distinct frames per second, just as a real-life CRT doesn't atomically flip from one frame to the next, and you gain a big reduction in graphical latency. The stencil is related to the exponential decay — that's just a discrete in-framebuffer effect, as you'd probably guess, but given that machines may not produce regular sync, which can mean that painted pixels sometimes don't reach the edges of the display, an extra means is needed to keep track of which pixels haven't been repainted by the normal scan. A ColecoVision produces an entirely stable display, so it's not actually much help here, other than possibly for clearing up after the bouncing that happens during initial switch on. It is nevertheless essential to allow the virtual CRT to fulfill its stated contract, and in practice a big help to other machines the emulator implements. Re: 'revert to saved'; the Mac version is, as stated, a full native application. Specifically it's an NSDocument/NSDocumentController-type application with XIBs aplenty, utilising an NSOpenGLView, CVDisplayLink, CoreAudio and IOKit for joypads and so on. That's nice because I get all the window management and most of the default menu bar items and their implementations for free. But it means I have to respond properly to the NSDocument messages. So every so often it turns out there's a menu option that isn't working because I'm not responding to a message like I should.
It's just usually not immediately obvious what I've done wrong because I'm relatively weak at AppKit. It's not like I've tried to add a 'revert to saved' feature and failed to test it; I've actually been entirely oblivious to the menu entry. The SDL port, which allows use under Linux and similar environments, is completely distinct. Not one jot of SDL is used on the Mac. I have every intention of switching to Metal given that OpenGL is now deprecated on the Mac, and as of this overhaul the use of OpenGL by a particular binding is entirely optional. In practice it can be specified at runtime, which gives me a fallback for non-Metal Macs. It's just unfortunate that I don't presently know that much about Metal. Tech nerd asides: if you've any experience of pootling around in emulation source code, you might now be imagining a viper's nest of #ifdefs and equivalent runtime selection mechanisms, and a project that will surely die quickly under its own weight. There's none of that. The machine owner supplies the delegates to receive audio and video content, and pushes changes in input controls, and that's the end of that. I keep angling towards a Windows or Android port and, if I ever go that way, that will mean extra code to handle DirectX and/or OpenGL ES, but will require zero lines of difference within the emulation. It'll just be a new consumer, or two, of the information the emulation puts out.
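The decoupled phosphor decay described above can be sketched as a toy model. The time constant, function and names here are invented for illustration; this is not Clock Signal's actual code, just the shape of the idea: each output frame, a texel's brightness depends on the wall-clock time since it was last painted, so a 120Hz or 144Hz monitor gets genuinely distinct intermediate frames regardless of the emulated machine's own rate.

```python
import math

# Hypothetical persistence figure, in seconds; real phosphors vary widely.
DECAY_TIME_CONSTANT = 0.005

def decayed_brightness(painted_brightness, seconds_since_painted):
    """Brightness of a virtual phosphor some time after it was last scanned,
    assuming simple exponential decay."""
    return painted_brightness * math.exp(-seconds_since_painted / DECAY_TIME_CONSTANT)

# A texel painted at full brightness has visibly faded one 144Hz frame later...
one_frame_later = decayed_brightness(1.0, 1 / 144)
assert 0.0 < one_frame_later < 1.0

# ...and is essentially dark after a whole 60Hz frame, which is why a stencil
# is needed to find texels the scan never revisited.
assert decayed_brightness(1.0, 1 / 60) < 0.05
```

The stencil's job in this model is simply to record which texels the most recent scan actually touched, so that everything else can be decayed toward black even when sync is irregular.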
  2. Cool! That's the big one dealt with then, at least. It strikes me now that I have actually never once tried to use the 'revert to saved' menu entry with my emulator. So I've no offhand guesses there; I'll file it and fix it, naturally, but it might be more like weeks than days, especially given that the weekend is almost at an end. Thanks again for all your input and help!
  3. Having a quick review, I'm actually asking for a slightly odd stencil mode — a 1bpp stencil. The only reason I did that was that I really only need one bit. Would you be willing to try the attached build, which ups the request to 8 bits, for no reason other than it being a little more likely to be supported? I'm really grateful for the feedback so far, regardless! Clock Signal 8bpp stencil.zip
  4. It's still only S-Video, so the two colour channels are being QAM encoded and decoded, giving them an upper bandwidth bound, but you'll probably be hard-pressed to spot it. It's a completely different issue at least: GL_FRAMEBUFFER_UNSUPPORTED; presumably from my attempt to use a stencil buffer, though that surprises me a lot since they've been supported in hardware for 20 years. Out of interest, which GPU does your Mac have? I've tested on a 2011 MacBook Air, which I had assumed to be one of the lowliest Macs for 10.13 with its Intel integrated non-Metal-compatible GPU, but I guess that wasn't a safe assumption. I'll do some research. You might be able to use one of the older releases, which don't attempt to use a stencil at all. But we'll see. And the timing will be slightly off — no M1 delay, short SN waits — and you'll get composite video only. There'll be some luminance aliasing too, but that's another thing that would likely be hard to notice with ColecoVision software.
  5. Having tested on a 10.13 Mac, I was easily able to reproduce the issue; it should now be fixed. So I've thrown up a new release.
  6. Attached: a build that allows S-Video output from the ColecoVision. @mumbai it's late, so I might be very askew on my guess, but to me that looks potentially like I've got a dumb race condition on machine startup. So if it correlates at all with running the thing on 10.13 then that's purely luck. Try the attached, but failing that I'll throw myself back at it tomorrow. (There's more complicated threading than in most emulators, primarily in an attempt to reduce latency — e.g. so that decisions about the size of my audio packets are completely distinct from any video-frame-related considerations. In practice I aim for 5–10ms latency on audio.) Clock Signal.zip
  7. Yeah, now I think of it, it's probably very dodgy of me still to claim compatibility back to 10.10 as I haven't tested on anything other than 10.14 for a while; I'll either dig out my 10.13 Mac or, if I can find it for download, put it in a virtual machine. This is absolutely not user error — the app should work exactly like any other native app, including for File -> Open, file associations and — option three — dragging and dropping onto its icon on the Dock. Re: the CRT look, that's actually a consequence of the full composite video pipeline. It's not deliberately downgrading it; that's genuinely the best composite decoding I've so far managed to produce — though the TMS's in-phase video (in NTSC, anyway) makes composite even worse than it usually is. This, I'm certain, is what the Karl Guttag memos mean by the rainbow effect. That excuse being made, the emulator can also do S-Video and RGB pipelines from the TMS. So I'll just enable one or both of those for the ColecoVision. My original thinking was probably a bit hard-line, i.e. that the real ColecoVision has only an RF connector, and composite is as close as I get to RF, so that's that. That should be a really easy change. Definitely possible tomorrow. (EDIT: nerd aside: that's not some dumb "startup effect" making the screen bounce like that when you switch the machine on, by the way; that's because my emulated CRT really needs to establish sync with the input machine. Since it's just decoding a linear video stream and all, it doesn't have foreknowledge of the 2d-ness of it all.)
  8. This emulator has been updated to correct two timing errors: an M1 delay has been added; and the SN-generated WAIT has been extended to the correct length, having been off by over 90% (!). Some people don't always spot sentences about clock dividers. Otherwise, as a reminder, it's: a completely Cocoa-native application on the Mac; that also supports building for SDL for other UNIX platforms; which emulates at full cycle fidelity (the CPU, the VDP, everything); including its Super Game Module emulation; which builds its audio stream by producing a full machine clock-rate audio stream and then filtering that down to whatever your machine can handle (so: all speech effects, etc, should work perfectly, despite the code containing zero special cases); which builds its video stream as genuine composite video and then decodes the same on your GPU; which supports USB and Bluetooth joypads; which puts no restrictions whatsoever on window sizes, or how many ColecoVisions you have running at once, across however many screens (or even tabbed); and which requires no setup whatsoever. Just double click your game and play. It's MIT licensed, and available here. Feedback always appreciated!
  9. It's a little reading between the lines, and not exactly how I stated it above, but my expectation in that area comes from the Z80 manual, specifically the Z80 instruction breakdown by machine cycle that begins on page 85 where e.g. the machine cycle breakdown for ALU A, (IX+d) is described as (page 88): OCF(4), OCF(4), OD(3), IO(5), MR(3) Which, to save you the effort of looking up the key, means: opcode fetch (4 cycles), opcode fetch (4 cycles), operand data read (3 cycles), internal operation (5 cycles), memory read (3 cycles) The whole instruction is of course two opcodes plus the offset that forms the '+d', but the time that it takes to read the offset is 3 cycles rather than 4. So immediately you know that there's no refresh cycle in there, and a relatively safe assumption is that it's just a normal read cycle. That's even though the '+d' comes from the program counter like a real opcode. There's certainly no three-cycle pattern documented that signals M1, and M1 is documented as occurring only during "the op code fetch cycle of an instruction execution". That's elsewhere in the manual, alas, but I think it's likely that it means "opcode fetch" in the same sense as does the timing diagram — i.e. not to include operand data reads. Admittedly that doesn't amount to establishing the absence of an M1 beyond reasonable doubt, but I'd argue it's beyond balance of probability. Sadly I'm not aware of anybody that has ever just connected up an oscilloscope and produced hard evidence either way. EDIT: though, actually, further weak evidence on that particular type of instruction being weird is that, if memory serves, immediate operand instructions like LD (IX+2), 32 put the immediate operand before the offset, which means it's the only instruction where the immediate operand isn't the final thing. Or maybe it's just me that often trips up on that.
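Summing the manual's breakdown makes the argument above concrete: the '+d' read (OD) is a 3-cycle plain read rather than a 4-cycle M1-style opcode fetch, and the pieces total the documented 19 T-states for e.g. ADD A, (IX+d). The cycle names are the manual's; the dictionary keys are just labels:

```python
# Machine-cycle breakdown for ALU A, (IX+d), per the Z80 manual:
# OCF(4), OCF(4), OD(3), IO(5), MR(3).
machine_cycles = {
    "OCF prefix": 4,   # opcode fetch of the DD/FD prefix (an M1 cycle)
    "OCF opcode": 4,   # opcode fetch of the operation proper (an M1 cycle)
    "OD offset":  3,   # operand read of the displacement: an ordinary read
    "IO internal": 5,  # internal operation, computing IX+d
    "MR read":    3,   # memory read of (IX+d)
}
total = sum(machine_cycles.values())
assert total == 19  # matches the documented 19 T-states
```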
  10. Great, thanks for your reply! My previous thread, apart from reaching the wrong conclusion, gained no traction whatsoever so the unprompted mention of a sound chip wait period here was too much to pass up. So I guess it depends most strongly on what you think the SN76489 data sheet is trying to communicate with an "approximately" 32-cycle wait — e.g. is it a fixed period independent of the clock rendered into cycles on the assumption of a 4MHz clock? — but if it were exactly 32 then I'd expect an output cycle to be extended by an additional 31 cycles both naively, given the single cycle of wait that's built in, and by thinking about when WAIT would first be sampled high if the 32 count kicked in exactly on the downward transition of T2. So the whole machine cycle would be 35 cycles. On the separate issue of the M1, yeah, I understand that to be a single cycle delay during the M1 machine cycle. So it's then the same length as every other memory access cycle. So it's five cycles for a plain NOP, as refresh isn't affected, and the differences to more complicated instructions depend on whether they signal M1 during opcode fetch. Speaking extemporaneously only, I think that some of the more compound forms, like the (ix+d)/(iy+d)s, don't actually signal M1 for everything they fetch from the PC — if memory serves then the opcodes after the offset are issued as ordinary read cycles. Or maybe I'm thinking of the offsets. Either way, I'm pretty confident I have it implemented properly in my little simulator.
  11. In case anybody finds this in the future; I had failed to spot the divide-by-eight that precedes the four-cycle number. So the conclusion posited above is incorrect. See http://atariage.com/forums/topic/286986-m1-delay/ for real discussion of this topic.
  12. Apologies, I'm unclear how to interpret this, so I'm going to ask a further follow-up. I'm looking at this timing diagram: I count T1 high, T1 low and T2 high before WR and IORQ go low. So is that not 1.5 cycles before the SN can possibly receive chip enable? (rounded to half cycles, anyway) Then it's another low/high to the first sampling of WAIT, then however long the SN is still holding that for, with that total delay rounded up to a whole number of Z80 clocks because that's how often the Z80 seems to sample, then another low/high/low to complete the machine cycle. ... I think that last part clears it up for me regardless. It really is 32 cycles, in terms of the same clock as the Z80, between chip enable and the WAIT(/READY) line deactivating. I had failed to observe that the same behaviour would happen on an input, but I guess since there's no reason to perform an input that maps to the SN, I hadn't really thought about it. Yes, that sounds like a good excuse.
  13. Actually, belated follow-up on that: is the sound chip delay just a single cycle? The data sheet mentions an "approximately" 32-cycle delay being signalled via the READY line, which goes straight to WAIT, starting from chip select going active. So shouldn't the Z80: begin its output cycle; trigger the SN's chip enable 1.5 cycles after that, when write goes low; 1 cycle after that sample WAIT and spot that it needs to wait; have to wait for the remaining "approximately" 32 - 1 = 31 cycles; then do the last cycle and a half of its output cycle? So that's 31 extra cycles? That sounds like a huge amount, so I'm sure I'm getting something wrong here.
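As a back-of-envelope check, the timeline sketched in that question can be summed directly. These figures are the post's assumptions, not measurements: chip enable 1.5 cycles into the output cycle, WAIT first sampled a cycle later, the SN holding WAIT for the remainder of its roughly 32-cycle window, then 1.5 cycles to finish the machine cycle.

```python
from fractions import Fraction

to_chip_enable = Fraction(3, 2)   # T1 high, T1 low, T2 high before WR/IORQ fall
to_first_wait_sample = 1          # one more low/high before WAIT is sampled
remaining_wait = 32 - 1           # of the ~32 cycles, one has already elapsed
to_finish = Fraction(3, 2)        # the tail of the output machine cycle

total = to_chip_enable + to_first_wait_sample + remaining_wait + to_finish
assert total == 35      # the whole machine cycle, in Z80 clocks
assert total - 4 == 31  # extra cycles versus a normal 4-cycle output
```

Which is consistent with the later conclusion in this thread: a 35-cycle machine cycle, i.e. 31 cycles beyond a normal output.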
  14. Looking at this schematic I see a 74LS74A which appears to be set up to trigger a cycle of WAIT upon every M1. That's not too uncommon for Z80 machines because it extends the opcode fetch read from 2 cycles to 3, moving it into line with most other accesses. It's even offered as a sample external circuit in at least one of the Z80 data sheets. Yet I've never heard about that in ColecoVision world. It would, if I'm reading things correctly, give operations the same cost as on the MSX. So e.g. NOP would take 5 cycles, not 4. Since I trust my schematic reading maybe only 70%, having no formal grounding in electronics, can anybody confirm or deny that the ColecoVision inserts an extra cycle into every M1 machine cycle?
  15. Being primarily an emulator author [of no significance], I may be viewing this from the other end of the spectrum: lack of in-depth documentation makes it an extraordinarily difficult platform to build a realistic model of; this is reflected in the fact that even with only 73 commercial titles, there still isn't an emulator that runs all of them completely without issues; and the consequence is that development is very difficult. If you're programming for something like a C64, a modern emulator can safely be trusted to complete 99% of the job, but on a Lynx I would nowadays be reluctant to write almost anything without verifying it on real hardware regularly. So my argument would be: the absence of exact timing information that can be relied upon is problematic for potential authors of new software because it restricts the tooling. I am absolutely able to believe that the Jaguar is an order of magnitude worse but, no, I've no direct experience.
  16. I believe the opposite; the documentation states that "The interrupt signal comes from the timer when ... This signal then requests the bus control circuit to give the bus to the CPU. If the CPU already has the bus, then this function causes no delays. If Suzy has the bus, then the maximum Suzy latency time could be incurred." so you just may not wake up quickly. Further supporting evidence: the raster bars in the background of Shadow of the Beast (which even the latest Mednafen still doesn't reproduce stably); anecdotally, a very poor attempt I once made at raster bars, which completely destroyed my sprite output; the fact that SPRGO offers feedback — "Write a 1 to start [sprite drawing], at completion of process this bit will be reset to 0. Either setting or clearing this bit will clear the Stop At End Of Current Sprite bit." (whereas obviously if the CPU could not awake until Suzy was done, no declaration of state would be necessary); and the slightly awkward documentation of WAITSUZY, which: "puts the CPU to sleep which allows the sprite engine to run; it returns control to you only after the sprite engine is finished" (and is entirely different from BLL's WAITSUZY, which just spins on the maths flag).
  17. I was going to argue it from the other direction: given that the requirement is a four-byte boundary rather than an 8-byte, it doesn't necessarily have the logic to distinguish 5 4 4 4 4 4 4 4 from 5 4 4 4 5 4 4 4. But, either way, since Mikey seems to access in 8-byte spurts it's bound to be one of those two patterns. Re: refresh, don't forget that — 8-bit or otherwise — the Lynx is a 1989 machine. So it supports CAS-before-RAS and hidden refresh; indeed it almost certainly isn't using classic RAS-only refresh because the advantage of those two is that the row counter is inside the RAM, so that's one less thing for Mikey to keep track of. If and when more is known about Suzy, a test might be to see what the penalty is when drawing with Suzy with the display off, since Mikey won't actually strictly need to do any memory accesses.
  18. Why shouldn't they? I suggested it was a potential area for exploration. I'm going to avoid getting into what's x bit and what's y bit completely. Per the documentation, up to 15 ticks for a cartridge read, up to 20 for an audio shifter read. If you're asking me to guess — even though the main thing I'm saying is that it would potentially be interesting to get into a position where we didn't have to guess — I think that reading the DPRAM would probably be more power efficient as that's just Mikey talking to Mikey. Not Mikey talking to Suzy talking to the cartridge ROM. You just have to figure out which of the audio registers reads from Mikey's dual-ported RAM. I'm an audio dunce, so don't trust me, but I guess that volume is the main one? It's implied that the pause occurs because the four audio channels are available to the CPU only in a round-robin fashion, so you might even do better with a loop that deliberately polls them in reverse order, assuming you can figure out which order is reverse! It is a frustratingly under-documented platform. My understanding was that sleep works for as long as Suzy wants the bus, but doesn't work for power saving. Or doesn't work properly, at least.
  19. I was thinking more like ask Suzy to draw something that isn't even on screen, and link that SCB back to itself. So Suzy will enter an infinite cycle of reading the SCB, reading the source data, not drawing anything, and repeating. Until an interrupt comes, causing Suzy to stop and passing the bus back to Mikey. There's no reason whatsoever to believe it would save more power than `.here BRA .here`, or that it necessarily wouldn't. So it'd just be interesting to know. It doesn't actually decrease memory usage, so polling one of the hardware registers that temporarily blocks the CPU (i.e. cartridge read or the audio shifters) is probably a better guess. Both much worse ideas than if CPU sleep actually worked properly, though I had the opposite impression to Karri: I thought it woke up much too often, rather than potentially never waking up. If it were the former, that'd presumably still be the way to go.
  20. Oh, it's WAI from a power-saving point of view? Given that the intended power-saving hardware doesn't work on a Lynx, it might also be interesting one day to do a search for the optimal way to waste time? E.g. does Suzy running what amount to a no-op loop consume less power than the CPU? What about constantly trying to interact with the cartridge or with the audio subsystem in order to generate CPU waits while you loop waiting for end of frame? Etc.
  21. Yeah, the CPU has to surrender the bus to the blitter so you're potentially losing quite a lot, especially when you consider the mechanism of clipping given the requirements: must work for RLE compressed data, must work for scaled data. So actually, I would dare imagine that the optimal horizontal scroller would do something like keep a set of pre-rendered columns, e.g. each of 32 pixels wide to contain four or two columns of your source tile map. Fill those columns on demand using Suzy, draw whichever of those columns is at least partially visible to the display. Composite your sprites directly in the display; don't worry about updating and maintaining your 32-or-whatever-width cache columns. Just fill them once, then keep them until you discard them. For an eight-way scroller you'd probably do that with macroblocks being squares rather than columns. My personal theory on why so many Lynx games are fairly slow-paced is that they're just optimised for the blurry LCD. Indeed, the fastest scroller I can think of is probably the BMX event from California Games, which was likely written long before the exact screen was selected. EDIT: my top demo-esque programming tip though: don't overlook the possibility of using the blitter to let you be lazier with the 6502. E.g. you can assemble a 160-width one-byte-per-pixel scan line on the 6502 stack, then use Suzy to stretch blit that into 4bpp form at wherever the frame buffer really is.
  22. Ticks are relative to the 16MHz clock. It's jargon from the official hardware documentation, though it's not exactly pervasive. Maybe I was wrong to assume. Anyway, the CPU doesn't actually run at 4MHz. It opens each new instruction with a five-tick random access fetch. Some sort of external lookup table observes the embedded version of the SYNC line and determines some sort of estimate of the number of page mode accesses that can be generated as a function of the opcode, subject to other machine state. To my mind the documentation is a little ambiguous as to whether that table can indicate that only the next byte may be page mode, or if it can potentially enable it for the next two, or possibly even provide a more complicated pattern than that (e.g. LDA (zero),y offers a pretty obvious predictable second location for a page mode access). So the speed at which the CPU executes while it is running is somewhere in the range 3.2MHz to 4MHz depending on instructions used, exactly what that lookup table can specify, and when the video interruptions fall. And, of course, if you're using Suzy for blitting then for those periods the CPU gives up the bus, subtracting some more in terms of how many processing cycles a real game actually expends per second. The processor is documented to be 5 ticks per random access, 4 per page mode. Mikey and Suzy aren't documented. Mikey activity is as implied above: it will pause the CPU; to fetch 8 bytes at a time; taking approximately 2us to do so; the amount of time it uses to fetch and output a single line is fixed, the programmatic frame rate is achieved by adjusting the amount of time between lines. I've done no investigation yet into Suzy, and I don't think anybody else has either, but it is definitely a page mode user. Besides anything else, there is an SCB palette-reading bug if you run the last section over a page boundary.
Somebody with a logic analyser would be able to figure all this stuff out a lot more quickly, but flash carts are much more readily available! The hardware palette is indeed manipulable in real time, but it would help you only with Mikey timings. The CPU does not run at the same time as Suzy (i.e. the blitter). For that I am leaning towards using an interrupt to interrupt Suzy after a fixed amount of time and seeing how far she got.
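The 3.2MHz-to-4MHz range quoted above falls straight out of the documented per-access tick costs against the 16MHz master clock; a trivial check:

```python
MASTER_CLOCK_HZ = 16_000_000  # ticks are relative to the 16MHz clock

RANDOM_ACCESS_TICKS = 5  # documented cost of a random access
PAGE_MODE_TICKS = 4      # documented cost of a page-mode access

# Effective byte rate if every access were random vs every access page mode:
worst_case_hz = MASTER_CLOCK_HZ / RANDOM_ACCESS_TICKS
best_case_hz = MASTER_CLOCK_HZ / PAGE_MODE_TICKS
assert worst_case_hz == 3_200_000  # 3.2MHz
assert best_case_hz == 4_000_000   # 4MHz
```

A real instruction stream mixes the two (at least one five-tick fetch per instruction), so sustained execution sits somewhere between the endpoints, before any Suzy or video stoppages are subtracted.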
  23. It's not all that useful on a Lynx, as there's really not a lot you can do by racing the beam. The only thing happening in real time is output of a frame buffer — there are no hardware sprites, there's no tile map, etc — so palette changes are probably the full extent of potential effects. E.g. if you pause Roadblasters and inspect RAM you'll see that the curved road is just something that has been drawn to a bitmap. Compare and contrast with other 8-bit consoles, and some of the computers, where you'd achieve that by twiddling the scroll offset as the frame runs.
  24. None that I'm aware of, believe it or not. I'm still optimistic I'll be able to try some things soon, but whether I'm even a good candidate for the job time has yet to tell. That said: the programmer selects the refresh rate, and the numbers above show that the number of CPU stoppages increases with greater refresh rates. Each set of data also suggests that there are 10 stoppages per line. Each line being 80 bytes wide leads to my 8-byte buffer guess. So it's a little odd to me that the video output buffer need be only quad-word aligned. But given that it is, a reasonable guess might be that each interruption is: 5 ticks to fetch the first byte; then 4 ticks, 4 ticks, 4 ticks, to fetch the next three in page mode; then 5 to fetch the fifth byte; then 4 ticks, 4 ticks, 4 ticks, to fetch the remaining three. i.e. each time the CPU is paused to grab some more video, it is stopped for 34 ticks. And that happens as often as is necessary to get enough data to draw the display. Further evidence that those numbers are likely to be close to the truth is the difference in loop lengths. It looks like approximately 4us for a loop iteration under normal circumstances, increasing to approximately 6us every time there is a video-related stoppage. 34 ticks would be 2.125us. Otherwise, it's clear from the test above that video fetch occurs instead of refresh, so we can rule out RAS-only refresh. Which might be a helpful observation at some point.
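The guessed burst shape above is easy to sanity-check numerically. To be clear, the per-access tick costs are documented but the burst structure itself is the post's conjecture, not a measurement:

```python
# Conjectured shape of one video-fetch interruption: an 8-byte burst as a
# 5-tick random access plus three page-mode accesses, then the same again.
burst = [5, 4, 4, 4, 5, 4, 4, 4]
ticks_per_burst = sum(burst)
assert ticks_per_burst == 34   # ticks the CPU is stopped per interruption
assert len(burst) == 8         # 8 bytes per stoppage

# 10 stoppages per line covers the 80-byte line width:
assert 10 * len(burst) == 80

# At 16 ticks per microsecond, 34 ticks is 2.125us, consistent with the
# observed ~2us growth in loop iteration time during a stoppage.
assert ticks_per_burst / 16 == 2.125
```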
  25. In principle you could disable video DMA via bit 0 of $FD92, but as far as I'm aware nobody has ever tested whether that takes effect instantly, or is merely latched until end of frame. If you test it out, let somebody know! It took me twenty minutes, but eventually I learnt that 'VSP' is a C64-community term meaning 'variable screen position'. There seems to be an implication of wraparound addressing, so it's scrolling in much the same way as you might on something like a Master System where crossing an 8-pixel boundary means updating only a single row or column of tile indices, not shifting the entire set. So knee-jerk comments: The Lynx is ahead of its time; it's usually treated like a machine a couple of generations or so after the C64 in that the expectation is that every frame is drawn completely afresh into a plain frame buffer using the blitter. No hardware tilemap and usually no partial display updates. Total memory bandwidth is only somewhere between 60% and 100% greater than the C64, but the blitter makes up for the difference as it lets you do those graphics manipulations without also having to pay for an instruction stream. The main hassle is that there are diminishing returns in terms of graphics size; because Suzy has to read a complete SCB for each displayed item, and the graphics data pointed to by SCBs needs to include end-of-line markers and usually a per-line padding byte to avoid hardware bugs, drawing a lot of small things is disproportionately more expensive than drawing fewer large things. If you wanted 8x8 tiles, that might actually be problematic. There's definitely more than enough time to draw and redraw large graphics though. 
Therefore I guess that if you really wanted a perpetually-scrolling view into a buffer where you had to pay for redrawing only the column/row that comes into view, you'd probably just want an uncompressed buffer somewhere in memory that you draw to the display two (for 1d scrolling) or four (for 2d scrolling) times, and which you declare as the drawing destination for Suzy when updating. It's just going to have to be at most 156 pixels wide. So add an obscuring side bar on output?
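The two-blit wraparound draw suggested above can be sketched as pure address arithmetic. Everything here is illustrative: the widths and names are hypothetical (the 152-pixel view is an invented figure chosen to leave an obscuring side bar within the 156-pixel cap), and this is not code from any real Lynx project:

```python
BUFFER_WIDTH = 156   # the buffer's maximum width, per the limit above
VIEW_WIDTH = 152     # hypothetical visible width, leaving a side bar

def wraparound_blits(scroll_x):
    """Return (source_x, dest_x, width) for each of the up-to-two blits
    needed to show a VIEW_WIDTH window into a circular buffer."""
    scroll_x %= BUFFER_WIDTH
    first_width = min(VIEW_WIDTH, BUFFER_WIDTH - scroll_x)
    blits = [(scroll_x, 0, first_width)]
    if first_width < VIEW_WIDTH:
        # Wrap: the remainder of the view comes from the buffer's left edge.
        blits.append((0, first_width, VIEW_WIDTH - first_width))
    return blits

# No wrap needed when the window fits before the buffer's right edge:
assert wraparound_blits(0) == [(0, 0, 152)]
# Near the edge, the view is stitched from the buffer's tail and head:
assert wraparound_blits(150) == [(150, 0, 6), (0, 6, 146)]
```

A 2d scroller would apply the same split vertically as well, giving the up-to-four draws mentioned above.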