
ZackAttack

Members
  • Content Count

    785
  • Joined

  • Last visited

Everything posted by ZackAttack

  1. See the 4 permutations below. By "map" I mean writing to the corresponding bank lo register for a given page size. Are you suggesting that #3 should leave slot c unchanged, or is what I've listed below what you're expecting?
     1. Map a 4k page to slots 0-f, then map a different 4k page to slots 0-f.
        - Slots 0-f contain the second 4k page.
     2. Map a .25k page to slot c, then map a different .25k page to slot c.
        - Slot c contains the second .25k page. Slots 0-b and d-f are unchanged.
     3. Map a .25k page to slot c, then map a 4k page to slots 0-f.
        - Slots 0-f contain the 4k page. The .25k page is no longer mapped.
     4. Map a 4k page to slots 0-f, then map a .25k page to slot c.
        - Slots 0-b and d-f contain the 4k page. Slot c contains the .25k page.
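The four permutations in the post above can be modelled in a few lines of C. This is only a sketch under stated assumptions: the `SlotMap` type and both function names are hypothetical, and numbering the ROM so that 4k page N covers 256-byte pages 16N through 16N+15 is an assumption, not something the post specifies.

```c
#include <assert.h>
#include <stdint.h>

#define SLOT_COUNT 16  /* slots 0-f, 256 bytes (.25k) each, one 4k window */

/* Hypothetical model: each slot records which 256-byte ROM page it shows. */
typedef struct {
    uint32_t page256[SLOT_COUNT];
} SlotMap;

/* Map a 4k page: every slot is overwritten, so any earlier .25k
 * mapping is lost (permutations 1 and 3). */
static void map_4k(SlotMap *m, uint32_t page4k)
{
    for (int s = 0; s < SLOT_COUNT; s++)
        m->page256[s] = page4k * SLOT_COUNT + (uint32_t)s;
}

/* Map a .25k page: only the addressed slot changes
 * (permutations 2 and 4). */
static void map_256(SlotMap *m, int slot, uint32_t page)
{
    assert(slot >= 0 && slot < SLOT_COUNT);
    m->page256[slot] = page;
}
```

Calling `map_4k` and then `map_256` on slot c reproduces permutation 4; reversing the order reproduces permutation 3.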
  2. Just ordered a cart from MacRorie. Going to port over the Strong-ARM driver and see how well bus stuffing works with the STM32.
  3. The only reason I had for implying the mode is that it makes it a little easier and faster to use a mixture of them. Looking at your proposal again, you already have that covered. What you proposed sounds fine to me. Just to clarify, will the ROM be viewed as a single array of pages where any page can go into any slot? With 16 bits for each slot, that allows for 16MB with 256-byte pages or 256MB with 4k pages. That's plenty of room for future growth. Having a large slot wipe out smaller slots is exactly how I'd expect it to work. You probably don't need to worry about that.
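The size figures in the post above follow directly from a 16-bit page number per slot. A small sketch of the arithmetic (the function name is hypothetical):

```c
#include <assert.h>
#include <stdint.h>

/* Bytes addressable when each slot register holds a 16-bit page number. */
static uint64_t addressable_bytes(uint32_t page_size)
{
    const uint64_t page_count = 1u << 16;  /* 65,536 distinct page numbers */
    assert(page_size > 0);
    return page_count * page_size;
}
```

With 256-byte pages this gives 65,536 x 256 = 16MB, and with 4k pages 65,536 x 4,096 = 256MB.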
  4. Not sure if this is a better idea, but couldn't the Memory Protection Unit (MPU) be configured so a protection fault is generated any time an access to LPC RAM is attempted? The handler would fix up the address for reads and writes. If the access was for execution, it would just return to the correct execution point in the STM32 RAM. Fortunately the STM32 is significantly faster, because it's clocked higher and doesn't suffer from the hardware bug that cripples the LPC when running ARM code from ROM. The speedup of all the code that doesn't access RAM would hopefully offset the penalty of fixing up the address when RAM is accessed. If not, the program could probably be patched on the fly to eliminate most of the protection faults.
  5. That mostly sounds good, but I don't really understand what is gained from the explicit mode bits. Isn't it easier to just imply the mode from which address you write to? That's what 3E does, and I've found that easy to work with. I definitely agree with supporting 4k down to 256 bytes for the size of the slots. Being able to bank in different lookup tables could be very useful. Btw, I posted a prototype side-scrolling engine based on 3E to the Castlevania topic a while back. That could easily be adapted to show off what's possible with this new scheme. The biggest reason I didn't develop it further was that I couldn't find hardware that supported a large enough bin to hold enough level data. Ideally this gets polished up and made into a batari Basic kernel so it's accessible to more of the development community.
  6. Don't forget about the mirrors. Legacy bank schemes avoided differentiating mirrors because of pin counts and complexity. You already have the full address bus, so why not leverage it? Something I had proposed a while back was to use different mirrors for different page sizes. E.g. storing $05 to $003e would activate the 5th 4KB page. Storing $05 to $013e would activate the 5th 2KB page at $f000-$f7ff. Storing $05 to $023e would activate the 5th 2KB page at $f800-$ffff. You can take that further and also have different hotspots for different configurations. E.g. storing $05 to $003f would activate the specified RAM page as write only. Assuming the RAM is capped at 256KB or less, the top 2 bits can serve as a mode mask: 0?=R/W, 10=RO, 11=WO. That should help reduce the number of hotspots needed. I also like the idea of having a metapage hotspot so you can essentially swap 1MB chunks as needed. I assume that one would have some overhead, since loading the next 1MB chunk from SD would take some time. IMO, this would be the closest and most feasible way to mimic the flexibility of all the other schemes while also supporting giant carts.
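To make the mirror idea in the post above concrete, here is a sketch of how cart firmware might decode it. The three ROM hotspot addresses and the 0?/10/11 mode mask come from the post; the type names, function names, and the choice to treat the low six bits as the RAM page number are assumptions for illustration only.

```c
#include <assert.h>
#include <stdint.h>

typedef enum { MODE_RW, MODE_RO, MODE_WO } RamMode;

typedef struct {
    int      valid;        /* 0 if the address is not a sketched hotspot */
    uint16_t window_base;  /* first 6507 address of the affected window */
    uint16_t window_size;  /* 4096 or 2048 bytes */
} RomHotspot;

/* Each mirror of $3e selects a different page size and window. */
static RomHotspot decode_rom_hotspot(uint16_t addr)
{
    RomHotspot h = {0, 0, 0};
    switch (addr) {
    case 0x003e: h = (RomHotspot){1, 0xf000, 4096}; break;
    case 0x013e: h = (RomHotspot){1, 0xf000, 2048}; break;
    case 0x023e: h = (RomHotspot){1, 0xf800, 2048}; break;
    default: break;
    }
    return h;
}

/* Top two bits of the value stored to the RAM hotspot:
 * 0? = read/write, 10 = read only, 11 = write only. */
static RamMode decode_ram_mode(uint8_t value)
{
    if ((value & 0x80) == 0)
        return MODE_RW;
    return (value & 0x40) ? MODE_WO : MODE_RO;
}

/* Assumed: the remaining six bits select the RAM page. */
static uint8_t decode_ram_page(uint8_t value)
{
    return value & 0x3f;
}
```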
  7. Did you use the Stella debugger to "trapwrite SWCHA"? That should point you to the spot where the joystick is read. Both joysticks are at the same address. They each get 4 of the 8 bits in that byte. If you're lucky, SWCHA is read once for each player; otherwise you'll have a much more complicated task.
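As an illustration of the "4 of the 8 bits each" layout mentioned above: on the 2600, player 0 uses the high nibble of SWCHA and player 1 the low nibble, and a bit reads 0 while that direction is held. The helper below is a hypothetical sketch written in C for readability; a real game does this in 6502 code.

```c
#include <assert.h>
#include <stdint.h>

/* Active-high direction flags returned by joystick_bits(). */
enum { JOY_UP = 1, JOY_DOWN = 2, JOY_LEFT = 4, JOY_RIGHT = 8 };

/* Extract one player's directions from a raw SWCHA value.
 * The hardware bits are active low, so they are inverted here. */
static uint8_t joystick_bits(uint8_t swcha, int player)
{
    assert(player == 0 || player == 1);
    uint8_t nibble = (player == 0) ? (uint8_t)(swcha >> 4)
                                   : (uint8_t)(swcha & 0x0f);
    return (uint8_t)(~nibble & 0x0f);
}
```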
  8. For the 3E scheme I would think favoring ROM over RAM would make more sense. I just can't imagine a use case for that much RAM in a scheme that doesn't also have some form of ARM acceleration. More ROM can always enable more levels, graphics, or gameplay. More RAM can enable some more complex kernels, but diminishing returns would apply well before 160K. I believe the Harmony currently supports 3E with up to 32KB ROM and 16KB RAM, and I'm not aware of anyone who has ever come close to exceeding the RAM limit. 32KB of RAM is interesting because you could have a double buffer of a fully unrolled display kernel. Maybe double that if you also include a giant dynamic game world. So I guess my interest would plateau at around 64KB of RAM. Looking forward to whatever you decide to go with.
  9. A 160x192 procedurally generated bitmap is working in Stella. I need to optimize a bunch, and then it can be tested with the Harmony.
  10. Updated the POC to include placeholder writes for audio and color. It fits perfectly! Next step is to generate the GRP values from the bitmap buffer.

      Kernel A:
      {
          vcsJmp3(); // AUDV0
          BusStuff(COLUP1, 0x2c);
          vcsJmp3(); // COLUP0
          vcsJmp3(); // COLUP1
          vcsJmp3(); // COLUBK
          BusStuff(COLUPF, 0xcc);
          BusStuff(ENAM1, i);
          vcssta3(RESP0);
          BusStuff(COLUP0, 0x3a);
          vcssta3(RESP0);
          BusStuff(COLUP1, 0x64);
          BusStuff(COLUP0, 0x48);
          BusStuff(ENAM1, i >> 2);
          BusStuff(COLUP0, 0x56);
          vcssta3(RESP0);
          BusStuff(COLUP0, 0x72);
          vcssta3(RESP0);
          vcslda2(aMask);
          BusStuff(COLUP0, 0x80);
          BusStuff(COLUP0, 0x9e);
          vcssta3(RESP0);
          BusStuff(COLUP0, 0xb2);
          BusStuff(COLUP0, 0xe4);
          BusStuff(COLUP0, 0xfa);
          vcslda2(aMask);
          BusStuff(COLUP0, 0x1e);
      }

      Kernel B:
      {
          vcslda2(aMask);
          vcsJmp3(); // AUDV0
          BusStuff(COLUP0, 0x1e);
          BusStuff(COLUP1, 0x2c);
          vcsJmp3(); // COLUP0
          vcsJmp3(); // COLUP1
          vcsJmp3(); // COLUBK
          BusStuff(COLUPF, 0xcc);
          BusStuff(ENAM1, i);
          BusStuff(COLUP0, 0x3a);
          vcssta4(RESP0);
          BusStuff(COLUP0, 0x48);
          vcssta3(RESP0);
          BusStuff(COLUP1, 0x64);
          BusStuff(COLUP0, 0x56);
          BusStuff(ENAM1, i >> 2);
          BusStuff(COLUP0, 0x72);
          vcssta3(RESP0);
          BusStuff(COLUP0, 0x80);
          vcssta3(RESP0);
          vcslda2(aMask);
          BusStuff(COLUP0, 0x9e);
          BusStuff(COLUP0, 0xb2);
          vcssta3(RESP0);
          BusStuff(COLUP0, 0xe4);
          vcslda2(aMask);
      }
  11. Yeah, everyone who tested the most recent attempts reported success. Batari's suggestion to use sta, stx, sty, and sax so only 6 of 8 bits need to be stuffed really made a big difference. Couple that with stuffing high for low failures and it seems to be enough for even the most problematic systems. Unfortunately all that complexity eats up a lot of CPU cycles, but if you're very careful there's just enough time to figure out what value to stuff and where to stuff it between each store instruction.
  12. There's an easy fix for that. Just omit the branch instruction. Each line is 50 bytes of instructions, so the PC will need to be reset back to $f000 every 80 or so lines. When a color hasn't changed you replace the STA COLUxx with JMP $f000, and as long as that happens at least once every 80 lines it will be good to go. I think the bigger problem is going to be the limited RAM in the Harmony cart. 3,840 bytes of graphics data and 384 bytes of color data leaves just 3,968 bytes of RAM to hold the driver, display kernel, and game variables. The kernel code for bus stuffing will be much larger because, in between servicing the 6502 bus, it has to figure out which values to write to the TIA registers based on the graphics buffer. If we did all the bit shifting, ANDing, and ORing during overblank, it would probably take multiple frames just to draw the full bitmap, and the buffer would need to be about 1.5K bigger. Plus there wouldn't be any time left for game logic. We only need to load one of the kernels into RAM each frame, so we could always swap them to save RAM. Obviously that will eat up some CPU time though.
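The numbers in the post above can be checked with a short budget calculation. The 8KB total is an assumption inferred from the post's own sum (3,840 + 384 + 3,968 = 8,192); the constant and function names are hypothetical.

```c
#include <assert.h>

enum {
    ROM_WINDOW_BYTES  = 4096,           /* $f000-$ffff */
    KERNEL_LINE_BYTES = 50,             /* instructions per scanline */
    BITMAP_BYTES      = 160 * 192 / 8,  /* 1 bit per pixel = 3,840 */
    COLOR_BYTES       = 384,
    CART_RAM_BYTES    = 8192            /* assumed: implied by the post's totals */
};

/* Whole scanlines that fit before the PC would run off the 4k window. */
static int lines_before_wrap(void)
{
    return ROM_WINDOW_BYTES / KERNEL_LINE_BYTES;
}

/* RAM left for driver, display kernel, and game variables. */
static int ram_left(void)
{
    int left = CART_RAM_BYTES - BITMAP_BYTES - COLOR_BYTES;
    assert(left >= 0);
    return left;
}
```

`lines_before_wrap()` comes out at 81, which matches the "every 80 or so lines" estimate, and `ram_left()` matches the 3,968 bytes quoted.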
  13. Most of the best selling homebrews have been based on some form of ARM acceleration lately. This ship has sailed. That said, I think the best use of a 160x192 bitmap would be to display a logo screen for a few seconds on startup so it's obvious that the game you're playing uses modern 2600 cart tech. Btw, if you are hoping not to offend, you should refrain from posting things like "Sloppy ARM code"
  14. Are you aware of the debug colors mode in the Stella emulator (alt + , on PC)? That feature, plus enabling/disabling individual players and missiles, should make it easy to see how any game is put together visually.
  15. Indeed, dealing with the extra copies of the missile makes it difficult to leverage them. Fortunately it's manageable when using double-wide spacing. Here's a prototype for a full 160x192 30Hz kernel. Maybe with some more effort this could be optimized further to work without bus stuffing. When I get a chance I'll port it over to the Harmony and implement the actual bitmap. M1 fills in 2 gaps, and BL overwrites a single pixel where GRP0 overlaps between 2 copies. I think interlacing is going to be difficult or impossible. In order to line up everything perfectly between the alternating frames you really need to shift the frame over 24 pixels. Each half of the screen is one of the frames that would need to be flickered at 30Hz to produce the 160 pixel bitmap. fullbitmap.bin fullbitmap.asm
  16. Maybe the way to get to 40 is to use the RESP0 strobe trick for P0 and put a wide double P1 between the P0 triplets. There would be some gaps between players, but it seems feasible to have them align with the spaces between characters and use the ball and missiles to fill in any gaps that are left. 0 0 0 1 0 0 0 1 0 0 0 0 0 1 0 0 0 1 0 0
  17. If you provide a perfectly aligned 40 character demo that is a mock-up for bus stuffing I'll implement it for the encore cart. Since you'll have to use whatever values are in A, X, and Y I think a simple vertical color bar pattern would be fine.
  18. If you take that to the extreme, 34.4fps should be possible. That might actually be tolerable.

      ; Produces a repeating pattern where 31 of 54 pixels can be set (31*60/54)=34.4fps
      ;[--po0--][--gr1--][--po1--][--gr0--][--gr1--][--gr0--][--po0--][--gr1--][--po1--][--gr0--][--gr1--][--gr0--]
      ; ________ 00000000 111111100000000 11111111 ________ 00000000 111111100000000 11111111
      sta RESP0
      sta GRP1
      sta RESP1
      sta GRP0
      sta GRP1
      sta GRP0
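The 34.4fps figure above is the share of pixels that can be set per pass multiplied by the 60Hz field rate. A tiny sketch of that calculation (the function name is hypothetical):

```c
#include <assert.h>

/* Effective refresh rate when only `settable` of every `total` pixels
 * can be written on each pass at the given field rate. */
static double effective_fps(int settable, int total, double field_rate)
{
    assert(total > 0 && settable >= 0 && settable <= total);
    return settable * field_rate / total;
}
```

For 31 of 54 pixels at 60Hz this is 31 x 60 / 54, or roughly 34.4.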
  19. You could always use 3E with enough RAM to buffer the entire screen as code and do lda #, sta zp for everything. After accounting for loop and bank switch overhead you'd probably have enough time for 10-12 TIA writes. That's enough for an asymmetric PF and colored sprites. Of course, writing a new PF to the buffer would need to span a few frames.
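A rough cycle budget shows where the 10-12 estimate above could come from. `lda #imm` takes 2 cycles and `sta zp` takes 3, so each TIA write costs 5 of the 76 CPU cycles in a scanline. The per-line overhead is not stated in the post, so it is left as a parameter here, and the function name is hypothetical.

```c
#include <assert.h>

enum {
    CYCLES_PER_SCANLINE = 76,
    CYCLES_PER_WRITE    = 2 + 3   /* lda #imm (2) + sta zp (3) */
};

/* TIA writes that fit in one scanline after `overhead_cycles` are spent
 * on looping and bank switching (an assumed, tunable figure). */
static int tia_writes_per_line(int overhead_cycles)
{
    assert(overhead_cycles >= 0 && overhead_cycles <= CYCLES_PER_SCANLINE);
    return (CYCLES_PER_SCANLINE - overhead_cycles) / CYCLES_PER_WRITE;
}
```

With no overhead the ceiling is 15 writes; an overhead somewhere between 16 and 26 cycles per line lands in the 10-12 range quoted.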
  20. I would suspect how you're reading the values from the address bus. The JRs and 7800s are notorious for throwing the wrong value on the address bus in between cycles. A12 seems to be the biggest offender. On my JR A12 will always go low for a small portion of each cycle even when transitioning from two locations that both have A12 high. I work around this by predicting the next address value in the Strong-ARM driver but that probably wouldn't work with regular bank switch schemes. Instead you may need to introduce a delay between when the address bus starts changing and when you read it or just read it multiple times and throw out any transient values. Too bad I didn't take you up on your offer for a PCB. I have a 7800, JR and logic analyzer already set up for exactly this type of issue.
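The "read it multiple times and throw out any transient values" suggestion above could be sketched as follows. This is not the Strong-ARM driver's prediction approach, just the simpler fallback the post mentions. `BusReader`, `read_stable_address`, and the simulated `fake_bus` source are all hypothetical names; on real hardware the reader would sample the GPIO port wired to the address lines.

```c
#include <assert.h>
#include <stdint.h>

typedef uint16_t (*BusReader)(void);

/* Re-read the bus until the same value has been seen `stable_reads`
 * times in a row, giving up after `max_reads` samples and returning
 * the most recent value. */
static uint16_t read_stable_address(BusReader read_bus,
                                    int stable_reads, int max_reads)
{
    assert(read_bus && stable_reads >= 1 && max_reads >= stable_reads);
    uint16_t last = read_bus();
    int run = 1;
    for (int n = 1; n < max_reads && run < stable_reads; n++) {
        uint16_t now = read_bus();
        run = (now == last) ? run + 1 : 1;
        last = now;
    }
    return last;
}

/* Simulated bus for trying the filter off-target:
 * A12 glitches low on the first sample, then settles. */
static const uint16_t fake_samples[] = {0x0ffc, 0x1ffc, 0x1ffc, 0x1ffc};
static unsigned fake_index;

static uint16_t fake_bus(void)
{
    uint16_t v = fake_samples[fake_index];
    if (fake_index + 1 < sizeof fake_samples / sizeof fake_samples[0])
        fake_index++;
    return v;
}
```

The trade-off is latency: every extra confirming read eats into the time available to respond within the 6507 bus cycle.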
  21. I looked at the code you provided and played with the bin in the Stella debugger. Since you're using the timer for all 3 portions, that seems to rule out any possibility of variable-length code putting you over/under by a scanline. I suspect the problem is in the TIMER_WAIT macro. My guess is that it needs to do a "sta WSYNC" between each poll of the timer to compensate for the coarse resolution of the timer register that's being checked.
  22. You're right. I totally overlooked that. The first observation about the PF and GRP1 values both being loaded into the A register still stands though. That will definitely cause a problem since the A register will only have whichever one was loaded last. GRP1 is going to be set to PLAYF5.
  23. See above for one problem that I see. LDX has an abs,y addressing mode available. You could use X for PF data and A for GRP data. You're still not at exactly 76 cycles either. Nothing is going to look right until you put in WSYNC or make the kernel exactly 76 cycles. EDIT: Never mind about the WSYNC; it was scrolled off my screen and I didn't see it.
  24. Looks like you keep selecting a new random position each frame that the button is held down. Have you tried adding some additional logic so the reposition only happens when the button is pressed and was previously not pressed? Even a fast button press is going to span a few frames, which would cause what you're seeing.
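The "pressed now and previously not pressed" check described above is a standard edge detector. A minimal sketch in C follows (the original game is presumably 6502 assembly, and the `Button` type and function name are hypothetical):

```c
#include <assert.h>
#include <stdbool.h>

typedef struct {
    bool was_pressed;  /* button state seen on the previous frame */
} Button;

/* Returns true only on the frame where the button goes from
 * released to pressed; holding it down returns false afterwards. */
static bool button_just_pressed(Button *b, bool pressed_now)
{
    assert(b);
    bool edge = pressed_now && !b->was_pressed;
    b->was_pressed = pressed_now;
    return edge;
}
```

Called once per frame, this triggers the reposition a single time per press no matter how many frames the press spans.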
  25. Once you modify your kernel to load PF data from ROM it will put you over 76 cycles. There's another method, faster than masked draw, which could get you back down to 76 cycles, but it'll come at the cost of using more ROM to store the graphics data and will use more RAM too. In order to better advise you about the best kernel solution I need to understand your goals completely. Would you please answer the following?
      - Are you primarily doing this because you want to learn assembly, learn Atari 2600, learn programming, port the game to Atari, or other?
      - Do you have a specific cart technology that you wish to target? E.g. 4KB ROM with no extra RAM, 256KB ROM with extra RAM, ARM coprocessor, etc.
      - Does it have to be written in assembly?
      - Please provide a mockup of the worst case you want to support. Like the mockup you provided with Wally Week, but include the most enemies, objects, etc. you expect to exist on one screen.
      - Will the level design allow for portions of the screen to be shared between rooms, or any other cheap and easy methods to compress the data?
      - How many rooms do you want to include in the final game?
      - How many frames of animation for P0?
      - How many frames of animation for P1? (This should be a sum of the frames for each object that will be represented with P1.)