Everything posted by ZackAttack
-
Figured out the problem with the JSR function: I forgot to account for mirroring. I was waiting for it to access $1a instead of $011a. Version 4 has the properly fixed driver and an updated JSR function. At this point I believe this should be compatible with all systems. I'm thinking about underclocking the ARM processor during development just to add some margin for error. There's also some additional audio in version 4. Turns out I had a lot more space left in the ROM than I realized. There are still some things to work out with the linker script.
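The mirroring issue comes down to address decoding. As a minimal sketch, assuming the standard 2600 memory map (the TIA is selected when A12 and A7 are both low, and the register is chosen by the low six address bits), a driver watching the bus could normalize the address before comparing it. The helper names below are hypothetical and are not taken from the actual driver.

```c
#include <stdbool.h>
#include <stdint.h>

/* The 6507 exposes only 13 address lines (A0-A12). */
#define ADDR_MASK_6507 0x1FFFu

/* TIA is selected when A12 = 0 and A7 = 0 (assumed standard 2600 decoding). */
static bool isTiaAccess(uint16_t addr)
{
    return (addr & 0x1080u) == 0;
}

/* Fold any TIA mirror down to its base register number (0x00-0x3F). */
static uint8_t tiaRegister(uint16_t addr)
{
    return (uint8_t)(addr & 0x3Fu);
}

/* True when a bus address refers to the given TIA register, mirrors included. */
static bool hitsTiaRegister(uint16_t busAddr, uint8_t reg)
{
    busAddr &= ADDR_MASK_6507;
    return isTiaAccess(busAddr) && tiaRegister(busAddr) == reg;
}
```

With this kind of check, a stack push to $011a and a direct write to $1a both count as hits on AUDV1, which is the case the original comparison missed.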
-
Thanks for all the feedback. This was certainly a tricky one. Turns out the problem was the hold time of the last ROM byte injected before a switch to zeropage. The driver change that I made in version 2 was correct, but it resulted in a breaking change to the JSR function. Unfortunately I ran the wrong bin file and thought version 2 was working with the driver change. Turns out version 2 doesn't work anywhere. I'm still trying to figure out how to fix the JSR function to work with the fixed driver, but for now I just hacked the original driver to waste some cycles before tristating the data bus. This seems to resolve it, but it's not as robust as it should be, and I will fix it properly once I figure out this JSR problem. What's interesting is why it worked in my testing but doesn't work for anyone else. I still had the Harmony cart plugged into my test harness, which allows the logic analyzer to be attached to the Atari buses. As CPUWIZ pointed out a long time ago, these "mile long" wires could cause problems. In this case they artificially increased the hold time and compensated for the flawed driver. I plugged my Harmony cart directly into the 7800 and then the problem appeared for me as well. I apologize for this testing failure on my part. Obviously I will test in this configuration from now on and reserve the harness for debugging purposes only. I've uploaded version 3, which uses the hack to extend the hold time and works when plugging the Harmony directly into the Atari. Hopefully this will work for everyone now and serve as a reward for helping me find this problem.
-
Ah, yes. It's strange though. This really doesn't do anything questionable, especially not compared to the bus stuffing drivers we experimented with. It should just look like a normal bank-switched ROM to the 6507, and the 6507 is in charge of all the writes. Darrell, are you flashing the Harmony or loading via the menu from the SD card? There's a good chance this only works when loading it from the menu.
-
Which part do you think is super risky?
-
Would you please try again with the next build in the first post? I changed the driver to wait for the correct ZP address to appear on the bus before tri-stating the data bus. Previously it was tri-stating the data bus as soon as the address changed away from the current ROM location. It's possible that your system is more sensitive to violating the hold time than the ones I'm testing on. Another possibility is that there is a bad bit in the RIOT RAM. Currently there are instructions executed from $80 to $B1. If the driver change doesn't help, I can send you a different program that never jumps execution to RIOT RAM. Thanks for testing this.
-
That's NTSC right?
-
Here is a ROM that uses the power of the Harmony cart to display 10 colored sprites per line. It uses a prototype driver that isn't supported in emulation, so you'll have to put it on a Harmony cart to run it. I'd like to get some feedback on how well this works on different systems. I've tested on an NTSC JR and a 7800 and the results were good. Thanks to rbairos for providing the converted audio and graphics.

Current Build: back-to-the-future4.bin
Edit: 12/18/2017 - Uploaded version 4 with proper driver fix and an additional second of audio samples
Edit: 12/16/2017 - Uploaded version 3 with extra nops to increase hold time
Older Builds: back-to-the-future3.bin

I've included a picture of it running for those who don't have a Harmony and would like to see it. There's more to it than what's in the picture though, so I'd recommend running it if you have the hardware handy.

Here is the code for the display kernel. It's pretty messy due to the extreme optimization and complete misuse of the 6507 JSR instruction. In order to compensate for artifacts from the JSR instruction, the ball is used to mask some pixels. There wasn't enough time to both resize it and enable it every scan line, so CTRLPF is used to do both by changing the size and the PF priority. The ball is always enabled, but it is shown or hidden based on the PF priority.

Since JSR is used to write to GRP1 and GRP0, it's called a second time to write to AUDV1 and AUDV0. This allows for 5-bit sampled audio, though this demo is only using 4 bits because of space constraints. After the GRP and AUD registers are written, a TXS is performed to reset the SP back to GRP1. The X register is preloaded with the address of GRP1 in vblank and keeps that value during the entire frame. Y is used for the color of the rightmost sprite and is loaded on the previous scan line.

JSR values are computed by subtracting the number of bytes consumed by the instructions until the next JSR. Each JSR is effectively loading the values that will be stored by the next JSR when the PC gets pushed to the stack. The JSR must always target an address in the ROM, so D12 is always set. Because there are about 30 bytes between each JSR, there are some values which require D11 to be set as well in order to avoid starting outside of ROM space. This is why 0x800 is added if D12 was cleared by the previous subtraction. This also increases the ball size to 2 so it covers both pixels represented by D12 and D11. Whether or not the ball is shown is determined by D12 in the original value.

There is a slight artifact if the second-to-last sprite has the pattern xxx1000 and the last sprite has a value less than 32 or so. In this case you end up with xxx1100. There simply isn't enough time to move the ball over one in order to mask just D11, so D11 remains visible. This happens only a very small percentage of the time and isn't very noticeable, so it should be acceptable.

It takes about 6 lines of vblank and overscan to fill the audio buffer, position objects, and initialize everything for the next frame. The 6507 runs a routine from ZP RAM for the other 64 lines. This routine updates the audio registers and performs the vsync. During those 64 lines the ARM CPU is currently idle. All the display kernel lookups are done between putting bytes on the 6507 data bus. Hopefully this provides enough time to load in more data from the EEPROM or the SD card. It at least provides ample time for some game logic and audio calculations.

Eventually I'll be posting the entire source to GitHub for everyone to enjoy, but I want to stabilize the design some more first.
for(; i < 192;) {
    // Left group starts cycle 0
    vcsJsr6(jsrl); // G, I Graphics <-e1 results in I=$ff
    vcsWrite5(GRP0, pGraphics[i*10]); // A graphic
    vcsWrite5(GRP1, pGraphics[i * 10 + 2]); // C graphic
    vcsWrite5(GRP0, pGraphics[i * 10 + 4]); // E graphic
    vcsWrite5(COLUP0, pColors[i * 10 + 0] << 1); // A color
    vcslda2(pColors[i * 10 + 2] << 1); // C color
    vcssta4(COLUP1); // C color
    vcslda2(pColors[i * 10 + 4] << 1); // E color
    vcstxs2(); // Should be 36 cycles prior to here
    vcssta3(COLUP0); // E color
    vcslda2(pColors[i * 10 + 6] << 1); // G Color - 41
    // 0x20 - 23 = 0x09 => sample will offset range by 0-15 giving 0x20-0x2f
    vcsJsr6(0x1009 + ((sampleOffset & 1) ? pAudioSamples[sampleOffset >> 1] >> 4 : (pAudioSamples[sampleOffset >> 1] & 0xf)));
    sampleOffset++;
    vcssta3(COLUP1); // G Color
    vcssta3(GRP1); // Flush delay register
    vcssty3(COLUP0); // I Color - 56
    i++;
    jsrr = (((unsigned short)pGraphics[i * 10 + 7]) << 8) | pGraphics[i * 10 + 9]; // H and J graphics bytes
    vcsWrite5(HMP0, 0x80);
    ctrlpfr = ((jsrr & 0x1000) >> 10) ^ 0x5;
    vcssta3(HMP1);
    jsrr = (jsrr | 0x1000) - 33; //31 bytes between JSRs
    vcssta4(HMBL);
    if ((jsrr & 0x1000) == 0) { jsrr += 0x800; ctrlpfr |= 0x10; }
    vcsldy2(pColors[i * 10 + 9] << 1); // J Color
    vcsWrite5(HMOVE, 2);
StaggeredFrame:
    if (i >= 192) { break; }
    // Right group starts cycle -1
    vcsJsr6(jsrr); // H, J Graphics <-de results in J=$ff
    vcsWrite5(GRP0, pGraphics[i * 10 + 1]); // B graphic
    vcsWrite5(GRP1, pGraphics[i * 10 + 3]); // D graphic
    vcsWrite5(GRP0, pGraphics[i * 10 + 5]); // F graphic
    vcsWrite5(COLUP0, pColors[i * 10 + 1] << 1); // B color
    vcsWrite5(COLUP1, pColors[i * 10 + 3] << 1); // D color
    vcslda2(ctrlpfr);
    vcssta3(CTRLPF);
    vcslda2(pColors[i * 10 + 5] << 1); // F color
    vcstxs2(); // Should be 39 cycles prior to here
    vcssta3(COLUP0); // F color
    vcslda2(pColors[i * 10 + 7] << 1); // H Color - 44
    // 0x20 - 22 = 0x0a => sample will offset range by 0-15 giving 0x20-0x2f
    vcsJsr6(0x100a + ((sampleOffset & 1) ? pAudioSamples[sampleOffset >> 1] >> 4 : (pAudioSamples[sampleOffset >> 1] & 0xf)));
    sampleOffset++;
    i++;
    jsrl = (((unsigned short)pGraphics[i * 10 + 6]) << 8) | pGraphics[i * 10 + 8]; // G and I graphics bytes
    vcssta3(COLUP1); // H Color
    ctrlpfl = ((jsrl & 0x1000) >> 10) ^ 0x5;
    vcssta3(GRP1); // Flush delay register
    jsrl = (jsrl | 0x1000) - 30; //28 bytes between JSRs
    vcssty3(COLUP0); // J Color - 59
    if ((jsrl & 0x1000) == 0) { jsrl += 0x800; ctrlpfl |= 0x10; }
    vcsWrite5(CTRLPF, ctrlpfl);
    vcsWrite5(HMCLR, 0x00);
    vcsWrite5(HMOVE, 2);
    vcsldy2(pColors[i * 10 + 8] << 1); // I Color
}
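To make the JSR arithmetic easier to follow on its own, here is a small sketch that pulls the target and CTRLPF calculation out of the kernel. The `JsrSetup` struct and `computeJsr` function are hypothetical names used only for illustration; the constants (0x1000 for D12, 0x800 for D11, 0x5 and 0x10 for CTRLPF) are copied from the kernel code, and `adjust` stands for the per-group subtraction (33 or 30 in the kernel).

```c
#include <stdint.h>

/* Hypothetical container for the two values the kernel derives per JSR. */
typedef struct {
    uint16_t target;  /* address passed to the JSR */
    uint8_t  ctrlpf;  /* CTRLPF value used to show/hide and size the ball */
} JsrSetup;

/* hi/lo are the two graphics bytes that the pushed PC should deliver. */
static JsrSetup computeJsr(uint8_t hi, uint8_t lo, uint16_t adjust)
{
    JsrSetup s;
    uint16_t v = (uint16_t)(((uint16_t)hi << 8) | lo);

    /* D12 of the original value selects the PF priority bit. */
    s.ctrlpf = (uint8_t)(((v & 0x1000u) >> 10) ^ 0x5u);

    /* Force D12 so the target is in ROM, then back off by the adjustment. */
    v = (uint16_t)((v | 0x1000u) - adjust);

    /* If the subtraction borrowed out of D12, set D11 as well and widen the ball. */
    if ((v & 0x1000u) == 0) {
        v = (uint16_t)(v + 0x800u);
        s.ctrlpf |= 0x10u;
    }
    s.target = v;
    return s;
}
```

For example, graphics bytes 0x00 and 0x10 with an adjustment of 33 first give 0x0FEF, which has D12 clear, so the sketch returns target 0x17EF with the wider ball (CTRLPF 0x15).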
-
Keep in mind that C++ is very nearly a superset of C. You can write almost pure C code, put it in a cpp file and call it a C++ project. C++ adds templates and classes, which is what makes it more powerful. Having the object-oriented mindset that's required to take advantage of that power is just one more thing to learn. By itself it's not difficult and I'm sure you'll learn it eventually, but if you're just learning to program it may be easier to take it one step at a time. You also have to deal with the fact that embedded processors are weak and there is no OS to do all the heavy lifting for you. So managing memory and object lifetimes in C++ will definitely up the difficulty some.

CDF took the lessons learned from the original DPC+ driver and the BUS driver experiments to create what I would consider the next evolution of DPC+. Maybe they should call it DPC++. SpiceWare is one of the co-authors, so I'm sure he could provide you with more specifics.

I've also been working on writing a driver for the Harmony/Melody hardware which aims to allow the entire thing to be written in C/C++. Instead of queuing up several data streams for a corresponding 6507 kernel, you simply call some functions to write the different TIA registers. What's really cool about this is that you can do a lot of the display and audio processing while the screen is being drawn, leaving almost all of vblank and overscan free for the ARM processor to do game logic and maybe even read from the SD card. It's very much a work in progress, but it currently works and I used it to make a kernel which achieves 17 TIA writes per scanline without needing to use bus stuffing. If you share more about what you want to build, I can tell you whether this new driver would be a good fit.
-
Sounds like I should stick to varying the vsync timing and make it possible to switch it on or off in case it doesn't work for some. Thanks everyone for the great info!
-
C++ is certainly possible, but I think it would make things harder for you instead of easier and wouldn't produce anything better than you can accomplish with C. Either way you're going to find yourself avoiding the use of malloc and new. Have you outlined exactly what type of game you're looking to make?
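One common way to avoid malloc on a small embedded target is a fixed-size, statically allocated pool. The following is a minimal sketch of that idea, not code from any Harmony project; the `Enemy` type, the pool size, and the function names are all made up for illustration.

```c
#include <stddef.h>

#define MAX_ENEMIES 8  /* hypothetical upper bound chosen at build time */

typedef struct {
    int x, y;
    int inUse;  /* 0 = free slot, 1 = allocated */
} Enemy;

/* All storage is reserved at link time, so RAM usage is known up front. */
static Enemy enemyPool[MAX_ENEMIES];

/* Returns a free slot, or NULL when the pool is exhausted. */
static Enemy *enemyAlloc(void)
{
    for (size_t i = 0; i < MAX_ENEMIES; i++) {
        if (!enemyPool[i].inUse) {
            enemyPool[i].inUse = 1;
            enemyPool[i].x = 0;
            enemyPool[i].y = 0;
            return &enemyPool[i];
        }
    }
    return NULL;
}

/* Marks a slot as free again; passing NULL is allowed and does nothing. */
static void enemyFree(Enemy *e)
{
    if (e != NULL) {
        e->inUse = 0;
    }
}
```

The trade-off is that the maximum object count is fixed, but there is no heap fragmentation and running out of slots is an explicit, checkable condition.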
-
If I understand this correctly, you need to delay vsync by half a scanline and add an extra scanline every other field. This results in each field having a length of 262.5 scanlines.
Normal
V123
4567
V123
4567
Interlaced
V123
4567
89V1 <- vsync (V) delayed to mid point
2345
6789 <- Extra scanline
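The half-scanline offset is easier to see when time is counted in half-lines. This is a rough sketch assuming the standard NTSC figure of 525 lines per interlaced frame (262.5 per field); the function names are invented for illustration and nothing here is specific to the TIA.

```c
#include <stdbool.h>

/* 262.5 scanlines per field = 525 half-lines (assumed NTSC interlace timing). */
#define HALF_LINES_PER_FIELD 525L

/* Position of the start of vsync for a given field, in half-lines. */
static long vsyncStartHalfLines(long field)
{
    return field * HALF_LINES_PER_FIELD;
}

/* True when vsync for this field begins in the middle of a scanline. */
static bool vsyncStartsMidLine(long field)
{
    return (vsyncStartHalfLines(field) % 2) != 0;
}

/* Number of complete scanlines that precede this field's vsync. */
static long wholeLinesBeforeVsync(long field)
{
    return vsyncStartHalfLines(field) / 2;
}
```

Under these assumptions every odd field starts its vsync mid-line and every even field starts on a line boundary, which matches the alternating pattern in the diagram.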
-
medium spaced 6 sprites with color + graphics
ZackAttack replied to rbairos's topic in Atari 2600 Programming
Ok, I've just about got this working on the Harmony cart. I still need to mask off the one pixel with BL and adjust the JSR target address to account for the bytes of instructions between the previous JSR and the current one. Then I'll drop in some actual data instead of a procedurally generated pattern. I also need to implement interlacing, but the flicker isn't too bad with the interwoven pattern.
-
What's the best way to implement an interlaced display? Is adding an extra scanline on odd frames all you need to do, or is it something more complicated with vsync timing? Or something else?
-
DPC+ARM - Part 6, DPC+ Cartridge Layout
ZackAttack commented on SpiceWare's blog entry in SpiceWare's Blog
Thanks for informing us about the MAM bug! I ran into this last night and would have probably spent several days debugging it had I not read about it here.
-
That's what the website says, but do you know of any 512KB roms that actually work on it? Either way it's probably best to just read in the data directly from a file on the SD card. Then you can have up to 4GB in a single file.
-
Harmony doesn't support larger 3E ROMs. You have to make it 32K and give it a .3e extension to play it back on Harmony. Eventually I'm going to enable playing back samples from the SD card, and then we can really have some long audio clips.
-
If the problem is a load ZP in place of what was supposed to be a load #, you could probably find it with a regex search pretty quickly. Something like ld[axy]\s+([$%]|\d) should find all the loads that are loading from a numeric address instead of an immediate value. Of course this assumes you've used labels for all your addresses and won't intentionally be doing lda $f000 instead of lda MyData. It would be cool if you could configure dasm to error out when a load is attempted with a number instead of a label and it's not an immediate addressing mode.
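The same search can be sketched as a small C helper for use in a custom lint script. This is only an illustration, not a dasm feature: `isSuspiciousLoad` is a hypothetical name, and it only handles lines that begin with the mnemonic (optionally indented), so a label on the same line would need to be stripped first.

```c
#include <ctype.h>
#include <stdbool.h>

/* Returns true for lda/ldx/ldy whose operand starts with a number
   ($hex, %binary, or decimal) rather than '#' or a label. */
static bool isSuspiciousLoad(const char *line)
{
    /* Skip indentation. */
    while (*line == ' ' || *line == '\t') {
        line++;
    }

    /* Match the mnemonic case-insensitively. */
    if (tolower((unsigned char)line[0]) != 'l') return false;
    if (tolower((unsigned char)line[1]) != 'd') return false;
    char reg = (char)tolower((unsigned char)line[2]);
    if (reg != 'a' && reg != 'x' && reg != 'y') return false;
    line += 3;

    /* Require at least one blank between mnemonic and operand. */
    if (*line != ' ' && *line != '\t') return false;
    while (*line == ' ' || *line == '\t') {
        line++;
    }

    /* Immediate operands start with '#', labels with a letter or underscore. */
    return *line == '$' || *line == '%' || isdigit((unsigned char)*line);
}
```

As with the regex, an intentional numeric address such as `lda $f000` is reported too, so the check is only useful when addresses are normally written as labels.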
-
I agree with limiting such a list to games that were published to physical carts in some minimum quantity. It would probably be good to call it "Published Homebrews" or similar in order to avoid repeated debates about what qualifies a game to make the list.
-
Good news. Batari gave me permission to use the DPC+ startup code. So I now have a prebuilt routine to initialize Harmony and Melody boards. This allowed me to focus on writing the actual driver code. The following C file will compile and run on my Harmony cart. What's really cool is that it is running from the ROM instead of the RAM. In fact, there are only about 30 lines of assembly that actually run from RAM right now. Everything else is running in ROM. This leaves lots of memory for game and kernel logic. There's still a huge amount of work remaining, but at least it's starting to look like this will work. Since this is a completely new driver, it's not yet supported in any emulators. The only way to run this is with a Melody/Harmony cart.

strong-arm-rainbow-test-pattern.bin

int main()
{
    handleReset();
    int i = 0;
    // Clear memory and registers
    for (i = 0; i < 0x100; i++) {
        vcsWrite5((unsigned char)i, 0);
    }
    vcsJmp3();
    while (1) {
        vcsWrite5(VSYNC, 2);
        vcsWrite5(WSYNC, 2);
        vcsWrite5(WSYNC, 2);
        vcsWrite5(WSYNC, 2);
        vcsWrite5(VSYNC, 0);
        for (i = 0; i < 37; i++) {
            vcsJmp3();
            vcsWrite5(WSYNC, 2);
        }
        vcsWrite5(VBLANK, 0);
        for (i = 0; i < 192; i++) {
            vcsJmp3();
            vcsWrite5(WSYNC, 2);
            vcsWrite5(COLUBK, (unsigned char)i);
        }
        vcsWrite5(VBLANK, 2);
        for (i = 0; i < 30; i++) {
            vcsJmp3();
            vcsWrite5(WSYNC, 2);
        }
    }
}
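As a quick sanity check on the loop counts in the test pattern, the WSYNC writes per pass through the main loop can be added up. The constant names below are made up for this sketch; the values are the loop bounds from the listing.

```c
/* Loop bounds taken from the test pattern; names are illustrative only. */
enum {
    VSYNC_LINES    = 3,    /* three WSYNC writes while VSYNC is on */
    VBLANK_LINES   = 37,
    VISIBLE_LINES  = 192,
    OVERSCAN_LINES = 30
};

/* Total scanlines generated by one pass through the main loop. */
static int linesPerFrame(void)
{
    return VSYNC_LINES + VBLANK_LINES + VISIBLE_LINES + OVERSCAN_LINES;
}
```

The sum is 262, the usual scanline count for a non-interlaced NTSC frame.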
-
Could the second volume register be used to change the loudness of each frame? Then you'd only have to set it once per frame. I think that would work pretty well for video playback.
-
You could always pad the sprite to make it 32 high. Of course, that will waste space.
-
Wouldn't the non-linearity of each individual channel also affect the results? Has anyone documented the graph if only AUDV0 is used? The routine I made only takes up 49 bytes for the code and 32 bytes for the sample buffer (64 4-bit samples packed 2 per byte). I modified it to handle both AUDV0 and AUDV1 and still had 10 bytes of zeropage memory left over. The only problem is that it requires 64 bytes to be copied to ZP each frame instead of 32. So either it's going to require bus stuffing or you'll lose one or two more lines of vblank time. We'll probably need bus stuffing anyway to have enough time to update both audio registers in the middle of a graphics kernel. I'm also wondering why combining two 4-bit registers only produces 5 bits of resolution. Doesn't that mean that on average, for each output value, there are 8 combinations of AUDV0/AUDV1 values that produce that output?
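The combination count can be checked by brute force under a simplifying assumption. The sketch below assumes idealized linear mixing, where the output depends only on AUDV0 + AUDV1; that is an assumption for the sake of the arithmetic, not a measurement of real TIA behavior, which is non-linear as discussed in this thread. The function name is made up.

```c
#include <string.h>

#define LEVELS_PER_CHANNEL 16  /* 4-bit volume register: 0..15 */
#define SUM_LEVELS         31  /* possible sums: 0..30 */

/* Counts how many (AUDV0, AUDV1) pairs produce each sum. */
static void countSumCombinations(unsigned counts[SUM_LEVELS])
{
    memset(counts, 0, SUM_LEVELS * sizeof counts[0]);
    for (int v0 = 0; v0 < LEVELS_PER_CHANNEL; v0++) {
        for (int v1 = 0; v1 < LEVELS_PER_CHANNEL; v1++) {
            counts[v0 + v1]++;
        }
    }
}
```

In this idealized model the 256 pairs map onto 31 distinct sums, just under 5 bits, with a single pair at each extreme and 16 pairs for the middle value of 15. That works out to roughly 8 pairs per output level on average.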
-
Wouldn't 5-bit be too expensive though? That's an extra TIA write each scanline. Also, would doubling the sample rate be a better use of the extra audio bandwidth?
-
While testing a small routine designed to free up the overscan and vblank time for the Harmony processor, I threw together a quick demo for playing back about 30 seconds of sampled audio. The results are surprisingly good. Since the purpose of this was to test a routine that runs in zeropage memory during overscan and vblank, the audio samples are intentionally stored in a very inefficient manner. If someone only cared about playing back audio, it would be possible to fit more than a minute into a single ROM. rbairos did the conversion to 4-bit audio. btf_audio_part1.bin btf_audio_part2.bin
