Posts posted by ZackAttack
-
-
Have you tried another system, Darrell? Seems to me this is a super risky thing to do in a homebrew.
Which part do you think is super risky?
-
doesn't work on my light sixer, I get a blank screen or various lines:
Would you please try again with the next build in the first post? I changed the driver to wait for the correct ZP address to appear on the bus before tri-stating the data bus. Previously it was tri-stating the data bus as soon as the address changed away from the current ROM location. It's possible that your system is more sensitive to violating the hold time than the ones I'm testing on.
Another possibility is that there is a bad bit in the RIOT RAM. Currently there are instructions executed from $80 to $B1. If the driver change doesn't help I can send you a different program that never jumps execution to RIOT RAM.
Thanks for testing this.
-
Update - the above was with the Encore. On my original Harmony I only get a blank screen.
That's NTSC right?
-
Here is a ROM that uses the power of the Harmony cart to display 10 colored sprites per line. It uses a prototype driver that isn't supported in emulation so you'll have to put it on a Harmony cart to run it.
I'd like to get some feedback on how well this works on different systems. I've tested on an NTSC Jr. and a 7800, and the results were good.
Thanks to rbairos for providing the converted audio and graphics.
Current Build:
Edit: 12/18/2017 - Uploaded version 4 with proper driver fix and an additional second of audio samples
Edit: 12/16/2017 - Uploaded version 3 with extra nops to increase hold time
Older Builds:
I've included a picture of it running for those who don't have a Harmony and would like to see it. There's more to it than what's in the picture though, so I'd recommend running it if you have the hardware handy.
Here is the code for the display kernel. It's pretty messy due to the extreme optimization and complete misuse of the 6507 JSR instruction.
To compensate for artifacts from the JSR instruction, the ball is used to mask some pixels. There wasn't enough time to both resize it and enable it every scanline, so CTRLPF is used to do both by changing the size and the PF priority. The ball is always enabled, but it is shown or hidden based on the PF priority.
Since JSR is used to write to GRP1 and GRP0, it's called a second time to write to AUDV1 and AUDV0. This allows for 5-bit sampled audio, though this demo only uses 4 bits because of space constraints. After the GRP and AUD registers are written, a TXS is performed to reset the SP back to GRP1. The X register is preloaded with the address of GRP1 in vblank and keeps that value for the entire frame. Y is used for the color of the rightmost sprite and is loaded on the previous scanline.
JSR values are computed by subtracting the number of bytes consumed by the instructions until the next JSR. Each JSR is effectively loading the values that will be stored by the next JSR when the PC gets pushed to the stack. The JSR must always target an address in the ROM, so D12 is always set. Because there are about 30 bytes between each JSR, some values require D11 to be set as well in order to avoid starting outside of ROM space. This is why 0x800 is added if D12 was cleared by the previous subtraction. This also increases the ball size to 2 so it covers both pixels represented by D12 and D11. Whether or not the ball is enabled is determined by D12 in the original value.
There is a slight artifact if the second-to-last sprite has the pattern xxx1000 and the last sprite has a value less than 32 or so. In this case you end up with xxx1100. There simply isn't enough time to move the ball over one pixel in order to mask just D11, so D11 remains visible. This happens a very small percentage of the time and isn't very noticeable, so it should be acceptable.
It takes about 6 lines of vblank and overscan to fill the audio buffer, position objects, and initialize everything for the next frame. The 6507 runs a routine from ZP RAM for the other 64 lines. This routine updates the audio registers and performs the vsync. During those 64 lines the ARM CPU is currently idle. All the display kernel lookups are done between putting bytes on the 6507 data bus. Hopefully this provides enough time to load in more data from the EEPROM or the SD card. At the very least it provides ample time for some game logic and audio calculations.
Eventually I'll be posting the entire source to GitHub for everyone to enjoy, but I want to stabilize the design some more first.
for (; i < 192;) {
    // Left group starts cycle 0
    vcsJsr6(jsrl); // G, I Graphics <-e1 results in I=$ff
    vcsWrite5(GRP0, pGraphics[i * 10]); // A graphic
    vcsWrite5(GRP1, pGraphics[i * 10 + 2]); // C graphic
    vcsWrite5(GRP0, pGraphics[i * 10 + 4]); // E graphic
    vcsWrite5(COLUP0, pColors[i * 10 + 0] << 1); // A color
    vcslda2(pColors[i * 10 + 2] << 1); // C color
    vcssta4(COLUP1); // C color
    vcslda2(pColors[i * 10 + 4] << 1); // E color
    vcstxs2(); // Should be 36 cycles prior to here
    vcssta3(COLUP0); // E color
    vcslda2(pColors[i * 10 + 6] << 1); // G Color - 41
    // 0x20 - 23 = 0x09 => sample will offset range by 0-15 giving 0x20-0x2f
    vcsJsr6(0x1009 + ((sampleOffset & 1) ? pAudioSamples[sampleOffset >> 1] >> 4 : (pAudioSamples[sampleOffset >> 1] & 0xf)));
    sampleOffset++;
    vcssta3(COLUP1); // G Color
    vcssta3(GRP1); // Flush delay register
    vcssty3(COLUP0); // I Color - 56
    i++;
    jsrr = (((unsigned short)pGraphics[i * 10 + 7]) << 8) | pGraphics[i * 10 + 9]; // H and J graphics bytes
    vcsWrite5(HMP0, 0x80);
    ctrlpfr = ((jsrr & 0x1000) >> 10) ^ 0x5;
    vcssta3(HMP1);
    jsrr = (jsrr | 0x1000) - 33; //31 bytes between JSRs
    vcssta4(HMBL);
    if ((jsrr & 0x1000) == 0) {
        jsrr += 0x800;
        ctrlpfr |= 0x10;
    }
    vcsldy2(pColors[i * 10 + 9] << 1); // J Color
    vcsWrite5(HMOVE, 2);
StaggeredFrame:
    if (i >= 192) {
        break;
    }
    // Right group starts cycle -1
    vcsJsr6(jsrr); // H, J Graphics <-de results in J=$ff
    vcsWrite5(GRP0, pGraphics[i * 10 + 1]); // B graphic
    vcsWrite5(GRP1, pGraphics[i * 10 + 3]); // D graphic
    vcsWrite5(GRP0, pGraphics[i * 10 + 5]); // F graphic
    vcsWrite5(COLUP0, pColors[i * 10 + 1] << 1); // B color
    vcsWrite5(COLUP1, pColors[i * 10 + 3] << 1); // D color
    vcslda2(ctrlpfr);
    vcssta3(CTRLPF);
    vcslda2(pColors[i * 10 + 5] << 1); // F color
    vcstxs2(); // Should be 39 cycles prior to here
    vcssta3(COLUP0); // F color
    vcslda2(pColors[i * 10 + 7] << 1); // H Color - 44
    // 0x20 - 22 = 0x0a => sample will offset range by 0-15 giving 0x20-0x2f
    vcsJsr6(0x100a + ((sampleOffset & 1) ? pAudioSamples[sampleOffset >> 1] >> 4 : (pAudioSamples[sampleOffset >> 1] & 0xf)));
    sampleOffset++;
    i++;
    jsrl = (((unsigned short)pGraphics[i * 10 + 6]) << 8) | pGraphics[i * 10 + 8]; // G and I graphics bytes
    vcssta3(COLUP1); // H Color
    ctrlpfl = ((jsrl & 0x1000) >> 10) ^ 0x5;
    vcssta3(GRP1); // Flush delay register
    jsrl = (jsrl | 0x1000) - 30; //28 bytes between JSRs
    vcssty3(COLUP0); // J Color - 59
    if ((jsrl & 0x1000) == 0) {
        jsrl += 0x800;
        ctrlpfl |= 0x10;
    }
    vcsWrite5(CTRLPF, ctrlpfl);
    vcsWrite5(HMCLR, 0x00);
    vcsWrite5(HMOVE, 2);
    vcsldy2(pColors[i * 10 + 8] << 1); // I Color
}
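For anyone trying to follow the JSR trick in the kernel above, here is a minimal host-side sketch of the two pieces of arithmetic involved. It is not driver code. `Vcs`, `jsr_push`, `JsrFixup` and `jsr_fixup` are hypothetical names made up for illustration, and `adjust` stands for the constant the kernel subtracts (33 or 30). My reading is that this constant is the byte count between JSRs plus 2, because the 6502 pushes the address of the JSR's last byte, but treat that as an assumption.

```c
#include <assert.h>
#include <stdint.h>

/* TIA write addresses from the standard 2600 register map. */
enum { AUDV0 = 0x19, AUDV1 = 0x1A, GRP0 = 0x1B, GRP1 = 0x1C };

/* Hypothetical stand-in for the console: TIA write registers plus the 6507 SP. */
typedef struct {
    uint8_t tia[0x40];
    uint8_t sp;
} Vcs;

/* Models only the stack side effect of a JSR located at jsr_addr: the 6502
 * pushes jsr_addr + 2, high byte first, decrementing SP after each push. */
static void jsr_push(Vcs *vcs, uint16_t jsr_addr)
{
    uint16_t ret = (uint16_t)(jsr_addr + 2);
    vcs->tia[vcs->sp & 0x3F] = (uint8_t)(ret >> 8);
    vcs->sp--;
    vcs->tia[vcs->sp & 0x3F] = (uint8_t)(ret & 0xFF);
    vcs->sp--;
}

typedef struct {
    uint16_t target; /* address handed to the JSR */
    uint8_t ctrlpf;  /* value for CTRLPF (ball size and PF priority) */
} JsrFixup;

/* Mirrors the jsrl/jsrr and ctrlpfl/ctrlpfr arithmetic from the kernel.
 * hi and lo are the two graphics bytes that should end up in GRP1 and GRP0. */
static JsrFixup jsr_fixup(uint8_t hi, uint8_t lo, uint16_t adjust)
{
    JsrFixup f;
    uint16_t value = (uint16_t)(((uint16_t)hi << 8) | lo);

    /* A single 0x800 correction is only enough while adjust <= 0x800. */
    assert(adjust <= 0x0800);

    f.ctrlpf = (uint8_t)(((value & 0x1000) >> 10) ^ 0x5);
    value = (uint16_t)((value | 0x1000) - adjust); /* force D12, step back */
    if ((value & 0x1000) == 0) {                   /* borrow cleared D12 */
        value = (uint16_t)(value + 0x800);         /* set D11 to get back into ROM */
        f.ctrlpf |= 0x10;                          /* wider ball covers D12 and D11 */
    }
    f.target = value;
    return f;
}
```

With SP preset to GRP1, one `jsr_push` lands in GRP1 and GRP0 and the next one lands in AUDV1 and AUDV0, which is why the kernel needs the TXS to rewind SP before the next pair of graphics bytes.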
-
Keep in mind that C++ is mostly a superset of C. You can write pure C code, put it in a .cpp file, and call it a C++ project. C++ adds templates and classes, which is what makes it more powerful. Having the object-oriented mindset that's required to take advantage of that power is just one more thing to learn. By itself it's not difficult and I'm sure you'll learn it eventually, but if you're just learning to program it may be easier to take it one step at a time. You also have to deal with the fact that embedded processors are weak and there is no OS to do all the heavy lifting for you, so managing memory and object lifetimes in C++ will definitely up the difficulty some.
CDF took the lessons learned from the original DPC+ driver and the BUS driver experiments to create what I would consider the next evolution of DPC+. Maybe they should call it DPC++.
SpiceWare is one of the co-authors so I'm sure he could provide you more specifics.
I've also been working on writing a driver for the Harmony/Melody hardware which aims to allow the entire thing to be written in C/C++. Instead of queuing up several data streams for a corresponding 6507 kernel, you simply call some functions to write the different TIA registers. What's really cool about this is that you can do a lot of the display and audio processing while the screen is being drawn, leaving almost all of vblank and overscan free for the ARM processor to do game logic and maybe even read from the SD card. It's very much a work in progress, but it currently works and I used it to make a kernel which achieves 17 TIA writes per scanline without needing bus stuffing. If you share more about what you want to build I can tell you whether this new driver would be a good fit.
-
Sounds like I should stick to varying the vsync timing and make it possible to switch it on or off in case it doesn't work for some. Thanks everyone for the great info!
-
C++ is certainly possible, but I think it would make things harder for you instead of easier and wouldn't produce anything better than you can accomplish with C. Either way you're going to find yourself avoiding the use of malloc and new.
Have you outlined exactly what type of game you're looking to make?
-
If I understand this correctly you need to delay vsync half a scanline and add an extra scanline every other field. This results in each field having a length of 262.5 scanlines.
Normal
V123
4567
V123
4567
Interlaced
V123
4567
89V1 <- vsync (v) delayed to mid point
2345
6789 <- Extra scanline
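To put numbers on that timing, here is a small sketch that works in half scanlines. It assumes the usual 525-line interlaced NTSC frame, so each field is 525 half scanlines, matching the 262.5 figure above. The function names are made up for illustration and none of this has been checked on hardware.

```c
#include <assert.h>

enum { HALF_LINES_PER_FIELD = 525 }; /* 262.5 scanlines per field */

/* Start of vsync for a field, in half scanlines since field 0's vsync. */
static unsigned long vsync_start_half_lines(unsigned long field)
{
    return field * HALF_LINES_PER_FIELD;
}

/* 1 when vsync for this field begins at the midpoint of a scanline. */
static int vsync_starts_mid_line(unsigned long field)
{
    return (int)(vsync_start_half_lines(field) & 1UL);
}

/* Number of scanline boundaries (WSYNCs) crossed between this field's vsync
 * and the next one. The half-line delay itself is not a boundary. */
static unsigned long line_boundaries_in_field(unsigned long field)
{
    return vsync_start_half_lines(field + 1) / 2
         - vsync_start_half_lines(field) / 2;
}
```

Under these assumptions the even fields cross 262 line boundaries and then wait half a line, while the odd fields start mid-line and cross 263, which is where the extra scanline comes from.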
-
Ok, I've just about got this working on the Harmony cart. I still need to mask off the one pixel with BL and fix the JSR target address to account for the bytes of instructions between the previous JSR and the current one. Then I'll drop in some actual data instead of a procedurally generated pattern. I also need to implement interlacing, but the flicker isn't too bad with the interweaved pattern.
-
What's the best way to implement an interlaced display? Is adding an extra scanline on odd frames all you need to do, or is it something more complicated with vsync timing? Or something else?
-
The Harmony Encore supports up to 512K, most likely the 3E driver needs to be updated.
That's what the website says, but do you know of any 512KB ROMs that actually work on it?
Either way it's probably best to just read in the data directly from a file on the SD card. Then you can have up to 4GB in a single file.
-
Note that I have used Stellerator for comparing the two; I haven't yet managed to get those ROMs (or ZackAttack's for that matter) to run on a Harmony encore.
Harmony doesn't support larger 3E ROMs. You have to make it 32k and give it a .3e extension to play it back on Harmony. Eventually I'm going to enable playing back samples from the SD card and then we can really have some long audio clips.
-
If the problem is a load ZP in place of what was supposed to be a load # you could probably find it with a regex search pretty quickly. Something like ld[a-z] [^#]\d should find all the loads that are loading an arbitrary address instead of an immediate value. Of course this is assuming you've used labels for all your addresses and won't intentionally be doing lda $f000 instead of lda MyData.
It would be cool if you could configure dasm to error out when a load is attempted with a number instead of a label and it's not an immediate addressing mode.
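Until dasm can do that, the same check can be sketched as a tiny line scanner. This is only an illustration. `is_suspicious_load` is a made-up helper, it only looks at lines where the mnemonic comes first (no leading label), and it treats any decimal, `$hex`, or `%binary` operand without a `#` as suspicious.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Returns 1 when the line is lda/ldx/ldy with a raw numeric operand that is
 * not immediate, e.g. "lda $80" where "lda #$80" was probably intended. */
static int is_suspicious_load(const char *line)
{
    const unsigned char *p = (const unsigned char *)line;
    int reg;

    assert(line != NULL);
    while (isspace(*p)) p++;                 /* skip indentation */
    if (tolower(p[0]) != 'l' || tolower(p[1]) != 'd') return 0;
    reg = tolower(p[2]);
    if (reg != 'a' && reg != 'x' && reg != 'y') return 0;
    p += 3;
    if (!isspace(*p)) return 0;              /* mnemonic must end here */
    while (isspace(*p)) p++;
    if (*p == '#') return 0;                 /* immediate load, fine */
    if (*p == '$') return isxdigit(p[1]) != 0;
    if (*p == '%') return p[1] == '0' || p[1] == '1';
    return isdigit(*p) != 0;                 /* labels start with a letter */
}
```

Running every source line through this and printing the ones that return 1 should give a short list to review by hand.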
-
I agree with limiting such a list to games that were published to physical carts in some minimum quantity. It would probably be good to call it "Published Homebrews" or similar in order to avoid repeated debates about what qualifies a game to make the list.
-
Good news. Batari gave me permission to use the DPC+ startup code. So I now have a prebuilt routine to initialize harmony and melody boards. This allowed me to focus on writing the actual driver code. The following C file will compile and run on my harmony cart. What's really cool is that it is running from the ROM instead of the RAM. In fact, there's only about 30 lines of assembly that actually run from RAM right now. Everything else is running in ROM. This leaves lots of memory for game and kernel logic.
There's still a huge amount of work remaining, but at least it's starting to look like this will work.
Since this is a completely new driver it's not yet supported in any emulators. The only way to run this is with a melody/harmony cart.
strong-arm-rainbow-test-pattern.bin
int main()
{
    handleReset();
    int i = 0;
    // Clear memory and registers
    for (i = 0; i < 0x100; i++) {
        vcsWrite5((unsigned char)i, 0);
    }
    vcsJmp3();
    while (1) {
        vcsWrite5(VSYNC, 2);
        vcsWrite5(WSYNC, 2);
        vcsWrite5(WSYNC, 2);
        vcsWrite5(WSYNC, 2);
        vcsWrite5(VSYNC, 0);
        for (i = 0; i < 37; i++) {
            vcsJmp3();
            vcsWrite5(WSYNC, 2);
        }
        vcsWrite5(VBLANK, 0);
        for (i = 0; i < 192; i++) {
            vcsJmp3();
            vcsWrite5(WSYNC, 2);
            vcsWrite5(COLUBK, (unsigned char)i);
        }
        vcsWrite5(VBLANK, 2);
        for (i = 0; i < 30; i++) {
            vcsJmp3();
            vcsWrite5(WSYNC, 2);
        }
    }
}
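As a sanity check on the loop counts in that test program, here is the frame budget written out on its own. The enum and function names are just labels I picked for the four sections of the loop. The constants are copied from the code above.

```c
#include <assert.h>

/* Scanlines produced by each section of the while(1) loop above. */
enum {
    VSYNC_LINES = 3,     /* three WSYNCs while VSYNC is on */
    VBLANK_LINES = 37,
    VISIBLE_LINES = 192,
    OVERSCAN_LINES = 30
};

/* Total scanlines generated per pass through the loop. */
static int frame_lines(void)
{
    return VSYNC_LINES + VBLANK_LINES + VISIBLE_LINES + OVERSCAN_LINES;
}

/* Background color written on a visible line: the line number itself. */
static unsigned char rainbow_color(int line)
{
    assert(line >= 0 && line < VISIBLE_LINES);
    return (unsigned char)line;
}
```

The four sections add up to 262 lines, which is the standard NTSC frame length on the 2600.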
-
Could the second volume register be used to change the loudness of each frame? Then you'd only have to set it once per frame. I think that would work pretty well for video playback.
-
You could always pad the Sprite to make it 32 high. Of course that will waste space.
-
I don't think it's a big problem. The non-linearity still gives 31 unique fairly-evenly distributed values, which is very nearly 5-bit, and certainly much better than 4-bit.
Here's a graph plot of the unique values, per DirtyHairy's formula...
Wouldn't the non-linearity of each individual channel also affect the results? Has anyone documented the graph if only AUDV0 is used?
The routine I made only takes up 49 bytes for the code and 32 bytes for the sample buffer (64 4-bit samples packed 2 per byte). I modified it to handle both AUDV0 and AUDV1 and still had 10 bytes of zero-page memory left over. The only problem is that it requires 64 bytes to be copied to ZP each frame instead of 32. So either it's going to require bus stuffing or you'll lose one or two more lines of vblank time. We'll probably need bus stuffing anyway to have enough time to update both audio registers in the middle of a graphics kernel.
I'm also wondering why combining two 4-bit registers only produces 5 bits of resolution. Doesn't that mean that on average, for each output value, there are 8 combinations of AUDV0/AUDV1 values that produce that output?
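Here is a quick way to count it, assuming the two volumes simply add. That linear model is only an assumption for the sake of counting, since the real mixing is non-linear as discussed above. `count_linear_sums` is a made-up helper.

```c
#include <assert.h>

enum { LEVELS = 16, SUMS = 2 * (LEVELS - 1) + 1 }; /* sums run 0..30 */

/* Fills combos[s] with the number of (AUDV0, AUDV1) pairs whose volumes add
 * up to s, and returns how many distinct sums occur. */
static int count_linear_sums(int combos[SUMS])
{
    int distinct = 0;
    int s, v0, v1;

    for (s = 0; s < SUMS; s++) combos[s] = 0;
    for (v0 = 0; v0 < LEVELS; v0++)
        for (v1 = 0; v1 < LEVELS; v1++)
            combos[v0 + v1]++;
    for (s = 0; s < SUMS; s++)
        if (combos[s] > 0) distinct++;
    return distinct;
}
```

Under that assumption the 256 register pairs collapse onto 31 sums, so the average is a little over 8 pairs per output level. The spread is far from even though: the middle sum has 16 pairs while the two extremes have one each.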
-
Wouldn't 5-bit be too expensive though? That's an extra TIA write each scanline. Also, would doubling the sample rate be a better use of the extra audio bandwidth?
-
While testing a small routine designed to free up the overscan and vblank time for the Harmony processor, I threw together a quick demo that plays back about 30 seconds of sampled audio. The results are surprisingly good. Since the purpose of this was to test a routine that runs in zero-page memory during overscan and vblank, the audio samples are intentionally stored in a very inefficient manner. If someone only cared about playing back audio it would be possible to fit more than a minute into a single ROM.
rbairos did the conversion to 4-bit audio.
-
A pixel clock that is 2 CPU cycles wide means the clock for the 6507 is the pixel clock divided by 2 instead of divided by 3 like it is now, right? In other words, the 6507 is clocked 50% faster.
You're also missing a superscalar 6502 with L1 cache. I'm sure some out of order execution would go nicely with timing critical memory mapped I/O.
-
An easy way to set this up would be to start with a simple program which displays a sprite at any coordinate you specify. So the program is drawing one sprite at (30,40) for example. Then modify it so that it changes the position of the sprite every frame. On the even frames it remains located at (30,40) and on the odd frames it moves to (120,70). Now there appears to be two sprites located at (30,40) and (120,70), but they flicker because they are only drawn half the time. Then you need to track the location of each object separately and choose which location, graphics, colors etc to display for each frame.
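The last step might look something like this in C. It is just a sketch of the selection logic with made-up names (`SpritePos`, `sprite_for_frame`), not a full kernel, and a real game would usually carry graphics and color pointers alongside the position.

```c
#include <assert.h>

/* Position of one logical object on screen. */
typedef struct {
    unsigned char x;
    unsigned char y;
} SpritePos;

/* Picks which logical object gets the single hardware sprite this frame by
 * cycling through the list, one object per frame. */
static SpritePos sprite_for_frame(const SpritePos *objects, unsigned count,
                                  unsigned long frame)
{
    assert(count > 0);
    return objects[frame % count];
}
```

With two objects at (30,40) and (120,70), even frames draw the first and odd frames draw the second, so each one is visible half the time.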
-
Halt 6507, take over write signal, update TIA COLUBK register every pixel, game over.
-
That video looks so much better than I was expecting. For some reason my phone seems to do a better job at holding the 60fps. I'm pretty sure you could just attach a 10MB file to your post here. Isn't the limit 50MB?
Did you see my comment about changing the background color per row? Perhaps it would be useful for scenes with 2 bright dominant colors?
Now I'm wondering if we can build a YouTube app for the Atari or maybe include some cut scenes in future games.

Back to the future
in Atari 2600 Programming
Posted
Ah, yes. It's strange though. This really doesn't do anything questionable, especially not compared to the bus stuffing drivers we experimented with. It should just look like a normal bank-switched ROM to the 6507, and the 6507 is in charge of all the writes.
Darrell, are you flashing the Harmony or loading via the menu from the SD card? There's a good chance this only works when loading it from the menu.