ZackAttack Posted November 5, 2015

Here's a small demo program I made to test my 160x192 bitmap rendering algorithm. It uses bus stuffing and 30Hz flicker to draw the bitmap. The first image is randomly generated to seed the game, and each iteration that follows simply applies the rules of Conway's Game of Life. Some pretty cool patterns appear.

http://www.youtu.be/bSWhDHybXDY

Disclaimer: Has not been tested on real hardware yet.
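For readers unfamiliar with the rules mentioned above, here is a minimal sketch of one Game of Life generation on a 160x192 grid. It is not the demo's code: the one-byte-per-cell layout, the names `Grid`, `liveNeighbours` and `step`, and the choice to treat cells beyond the edge as dead are all assumptions made for illustration.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Grid size taken from the post; one byte per cell keeps the sketch simple.
constexpr int kWidth = 160;
constexpr int kHeight = 192;
using Grid = std::array<std::array<std::uint8_t, kWidth>, kHeight>;

// Count the live neighbours of cell (x, y). Cells outside the grid count as
// dead (an assumption; the demo might wrap around instead).
int liveNeighbours(const Grid& g, int x, int y) {
    assert(x >= 0 && x < kWidth && y >= 0 && y < kHeight);
    int count = 0;
    for (int dy = -1; dy <= 1; ++dy) {
        for (int dx = -1; dx <= 1; ++dx) {
            if (dx == 0 && dy == 0) continue;  // skip the cell itself
            const int nx = x + dx;
            const int ny = y + dy;
            if (nx < 0 || nx >= kWidth || ny < 0 || ny >= kHeight) continue;
            count += g[ny][nx];
        }
    }
    return count;
}

// Compute one generation from `cur` into `next`. Reading one buffer and
// writing the other is the double buffering the rules require.
void step(const Grid& cur, Grid& next) {
    for (int y = 0; y < kHeight; ++y) {
        for (int x = 0; x < kWidth; ++x) {
            const int n = liveNeighbours(cur, x, y);
            const bool alive = cur[y][x] != 0;
            // A cell is alive next turn with exactly 3 neighbours,
            // or with 2 neighbours if it is already alive.
            next[y][x] = (n == 3 || (alive && n == 2)) ? 1 : 0;
        }
    }
}
```

Calling `step(a, b)` and then swapping the roles of `a` and `b` gives the next frame; a horizontal row of three live cells should come back as a vertical row of three.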
ScumSoft Posted November 6, 2015

Hello again, it's been a while. Nice to see your project coming along so well. Did you modify the Stella source to allow for single-cycle bus stuffing?

Edited November 6, 2015 by ScumSoft
Andromeda Stardust Posted November 6, 2015

ZackAttack wrote: Disclaimer: Has not been tested on real hardware yet.

This is amazing, thanks. If you PM me the ROM I would be glad to test it on my Harmony for you. I have an older model Harmony, so I'm limited to 32kbyte ROMs though. I've been fascinated on and off by the organic, organism-like formations that have been discovered thus far, some of them highly complex. How did you get your program to work within the memory and CPU limitations of the VCS?
Mountain King Posted November 6, 2015

Great work, Zack! I can test it for you as well if you want.
ZackAttack Posted November 6, 2015

ScumSoft wrote: Hello again it's been a while, nice to see your project coming along nicely. Did you modify the Stella source to allow for single cycle bus stuffing?

I modified the Stella source to support 3-cycle writes using data bus stuffing on a STA ZeroPage instruction, and a 5-cycle double write using a ROL ZeroPage instruction. The first I have personally tested on real hardware. I believe something similar to the latter has been tested by supercat. I plan on releasing the source once I verify it on hardware.

Andromeda Stardust wrote: This is amazing, thanks. If you PM me the ROM I would be glad to test it on my Harmony for you. I have an older model Harmony so I'm limited to 32kbyte ROMs though. I've been fascinated on and off by the organic organism like formations that have been discovered thus far, some highly complex. How did you get your program to work with with the memory and CPU limitations of the VCS?

Mountain King wrote: Great Work Zack! I can test it for you as well if you want.

Thank you both for offering to test this for me. Unfortunately it will not be able to run on a Harmony. The Harmony can do bus stuffing, but I have no idea how to implement it. There is also the problem of RAM. This program double buffers an entire frame, so it ends up using about 50KB of RAM. I think the MCU in the Harmony only has 8KB total. Not sure if that's the original or the Encore though. This is targeting the C++ cart that I've been working on. I had to take a break from working on the hardware side of things because I'm waiting on an LQFP 144 adapter board to get here from China.
Andromeda Stardust Posted November 6, 2015

ZackAttack wrote: I modified stella source to support 3 cycle writes using data bus stuffing on a STA ZeroPage instruction and a 5 cycle double write using a ROL ZeroPage instruction. The first I have personally tested on real hardware. I believe something similar to the latter has been tested by supercat. I plan on releasing the source once I verify it on hardware. Thank you both for offering to test this for me. Unfortunately it will not be able to run on a Harmony. The Harmony can do bus stuffing, but I have no idea how to implement it. There is also the problem of RAM. This program makes use of double buffering an entire frame. So it ends up using about 50KB of RAM. I think the MCU in the Harmony only has 8KB total. Not sure if that's the original or the encore though. This is targeting the C++ cart that I've been working on. I had to take a break from working on the hardware side of things because I'm waiting on a LQFP 144 adapter board to get here from China.

Dude, this would be one impressive tech demo if you can get it off the ground.

Another idea: create a GUI to allow the user to "doodle" on the screen. I often find it more fascinating to watch doodles come to life than pure random seeds. A CX-22, CX-80, trackball or Amiga mouse would be ideally suited for this. The Select switch could start/pause the simulation. Doodle onscreen with the joystick or trackball while holding fire when paused. Reset could clear the slate. Color/BW could shift colors.

I would love to own a cart of this if you ever get it working on hardware.
Thomas Jentzsch Posted November 6, 2015

ZackAttack wrote: I modified stella source to support 3 cycle writes using data bus stuffing on a STA ZeroPage instruction and a 5 cycle double write using a ROL ZeroPage instruction. The first I have personally tested on real hardware. I believe something similar to the latter has been tested by supercat. I plan on releasing the source once I verify it on hardware.

How have you managed to display so many pixels horizontally? And how many are there?
+Propane13 Posted November 6, 2015

What is "bus stuffing"? Never heard of that.
+SpiceWare Posted November 6, 2015

Propane13 wrote: What is "bus stuffing"? Never heard of that.

A way to update TIA registers in 3 cycles, down from the 5 cycles we have in DPC+. It was used back in the day for The Graduate.

I've been using Stella to develop a bus stuffing format for the Harmony/Melody, but it's on the back burner at the moment due to RL. I plan to resume work on it in January. When done, I'll be rebooting Draconian and Frantic to utilize it for even better graphics.
ZackAttack Posted November 6, 2015

Thomas Jentzsch wrote: How have you managed to display so many pixel horizontally? And how many are they?

There are 160 pixels per scanline. It certainly wasn't easy. About half of the pixels are drawn on each alternating frame. Each scanline requires 25 TIA updates to position players and update player graphics. A cycle 74 HMOVE is used to clean up the alignment of the left side. Everything else is very carefully timed TIA updates. It took me a week to finally find a combination that covered everything.

Propane13 wrote: What is "bus stuffing"? Never heard of that.

Another way to describe it is detecting the signal that the 6507 (CPU) is sending to the TIA (GPU) and overriding that signal with your own, i.e. a 5V signal is forced to 0V, changing a 1 bit to a 0.

SpiceWare wrote: A way to update TIA registers in 3 cycles, down from the 5 cycles we have in DPC+. It was used back in the day for The Graduate. I've been using Stella to develop a bus stuffing format for the Harmony/Melody, but it's on the back burner at the moment due to RL. I plan to resume work on it in January. When done, I'll be rebooting Draconian and Frantic to utilize it for even better graphics.

Sounds like 2016 is going to be an awesome year for homebrews.
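The "force 5V to 0V" description above can be modelled as a bitwise AND: the cartridge can only clear bits the CPU is driving high, never set bits the CPU is driving low. The sketch below is only a software model of that idea under this assumption, not the cartridge firmware or the Stella patch; `stuffedValue` and `pullLowMask` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// Value the TIA ends up seeing when the CPU drives `cpuDrives` and the
// cartridge only lets the bits set in `cartWants` stay high.
constexpr std::uint8_t stuffedValue(std::uint8_t cpuDrives, std::uint8_t cartWants) {
    return static_cast<std::uint8_t>(cpuDrives & cartWants);
}

// Which data lines the cartridge would have to pull low to turn
// `cpuDrives` into `target`. Only possible when `target` has no 1 bits
// where the CPU is driving 0, which is why the CPU side writes $ff.
std::uint8_t pullLowMask(std::uint8_t cpuDrives, std::uint8_t target) {
    assert((target & ~cpuDrives) == 0);
    return static_cast<std::uint8_t>(cpuDrives & ~target);
}
```

With the CPU storing `0xff`, any target byte is reachable: for a target of `0x28`, `pullLowMask(0xff, 0x28)` selects every line except bits 3 and 5.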
ScumSoft Posted November 6, 2015

Before my hiatus here, I was writing a new Harmony core centered around bus stuffing for bitmap displays.

Fred's default core polls A12 for ROM access, then decodes the address and figures out what the VCS wants from it. This is why 70% of the ARM is busy feeding the bus and not free to do other stuff. I figured it should have been interrupt driven instead of polled, so I got the source and the schematics and saw that we could use an FIQ on A12 and set it to trigger on the rising edge. I did this all in C and optimized it as best I could. It didn't work because there were a few configuration errors I didn't catch last time.

I revisited my project a week ago and got it working properly, so now all of the ARM cycles are free to do other things while not servicing A12 requests. I still have to implement my stuffing kernel and set up a few things, but the hard part is over. Now you can focus on programming games in C and let the FIQ handle all the updates the VCS is requesting.

I've chosen to load a display kernel run from RIOT RAM and have key A12 requests trigger the bus stuffing sequence, all timed from the ARM. So far the core is only 600 bytes and all of the 8k SRAM is free for display output bitmapping. I also plan to implement the ability to stream in data off SD cards for larger games.

College is keeping me really busy this year, so I'm only dabbling in the code a few hours a week at the moment, but it's looking pretty promising so far. Though I'm sure ZackAttack's implementation is a much beefier solution than the LPC2103 in the Harmony.

Edited November 6, 2015 by ScumSoft
Thomas Jentzsch Posted November 6, 2015

I get only about 10% of all of this, but it sure sounds very intriguing and promising.
ZackAttack Posted November 7, 2015

@ScumSoft I didn't know you were still working on that. Getting it to work interrupt driven is very impressive. I'd be happy to share the CGOL source with you once you have bus stuffing implemented. How are you planning on making the bus stuffing interrupt driven? The RMW is going to be super tricky because it hits the zero-page address 3 cycles in a row and the address bus itself must be overridden for the last 2 cycles. Perhaps you plan on using a built-in timer to trigger the interrupts? Just make sure you don't drive it high and fry an innocent VCS!

Here are some more details on how this is implemented. I took the Stella source code and added a new cartridge type. This cartridge type monitors both the data and address bus and injects values in place of what the 6507 is providing during specific writes to the TIA. Then I built an API on top of that which takes an array of commands to execute. Each command maps to a 6507 assembly instruction and also includes some extra data for the bus stuffing.

The most common command I use is the Write3 command. Each command is 4 bytes. The first is the command ID, then the next 3 are parameters specific to the given command. Write3 only uses 2 of those parameters, address and data. So a Write3(COLUPF, $28) will translate to a STA COLUPF. On the 3rd cycle, where the CPU goes to store the value of A, $ff, the cartridge pulls some bits low in order to make it a $28.

Instead of "racing the beam", we can allocate a command buffer large enough to hold a full frame's worth of commands, including vblank, drawing the screen, and overscan. There are still limitations on when a specific TIA register can be updated due to the time it takes each command to execute, but it's still much easier than trying to do it in assembly.

Drawing a 160x192 bitmap requires a lot of RESP0, RESP1, GRP0 and GRP1 writes. This is accomplished with 30Hz flicker.
This is what each side looks like by itself and then both combined. P0 is orange and P1 is blue. Getting rid of all the gaps at the edges was very difficult. In order to deal with the fact that each cycle is 3 pixels, I split the screen into 24-pixel chunks covered by 2 copies of a player with an 8-pixel gap in between. That left me also dealing with the fact that 24 doesn't divide 160 evenly. To get around that I have to alternate the positions of P0 and P1 on each scanline per half. This is why you see a crazy checkerboard pattern. The single orange column is again related to filling in the left edge. I don't really have a great explanation for that.

Here's the code for the image area. I'm omitting the vblank and overscan areas because there's nothing particularly difficult or interesting in those areas. Just set VSYNC and wait.

1st Half:

    for (int i = 0; i < 192; i++)
    {
        uInt8 *line = image[i];
        switch (i % 2)
        {
        case 0:
        {
            Write45(queue, GRP1, 0xff, GRP0, line[0] << 3 | line[1] >> 5);
            Jmp(queue);
            Jmp(queue);
            Jmp(queue);
            Jmp(queue);
            Jmp(queue);
            Write45(queue, RESP1, 0xff, GRP1, line[2] << 3 | line[3] >> 5);
            Write3(queue, GRP0, line[2] << 3 | line[3] >> 5);
            Write45(queue, RESP0, 0xff, GRP0, line[5] << 3 | line[6] >> 5);
            Write3(queue, GRP1, line[4] << 3 | line[5] >> 5);
            Write45(queue, RESP1, 0xff, GRP1, line[8] << 3 | line[9] >> 5);
            Write3(queue, GRP0, line[7] << 3 | line[8] >> 5);
            Write45(queue, RESP0, 0xff, GRP0, line[11] << 3 | line[12] >> 5);
            Write3(queue, GRP1, line[10] << 3 | line[11] >> 5);
            Write45(queue, RESP1, 0xff, GRP1, line[14] << 3 | line[15] >> 5);
            Write3(queue, GRP0, line[13] << 3 | line[14] >> 5);
            Write45(queue, RESP0, 0xff, GRP0, line[17] << 3 | line[18] >> 5);
            Write3(queue, GRP1, line[16] << 3 | line[17] >> 5);
            Write3(queue, HMOVE, 0xff);
            Write45(queue, GRP0, line[19] << 3, RESP1, 0xff);
            break;
        }
        case 1:
        {
            Write45(queue, GRP0, 0xff, GRP1, line[0] << 3 | line[1] >> 5);
            Jmp(queue);
            Jmp(queue);
            Jmp(queue);
            Jmp(queue);
            Jmp(queue);
            Write45(queue, RESP0, 0xff, GRP0, line[2] << 3 | line[3] >> 5);
            Write3(queue, GRP1, line[2] << 3 | line[3] >> 5);
            Write45(queue, RESP1, 0xff, GRP1, line[5] << 3 | line[6] >> 5);
            Write3(queue, GRP0, line[4] << 3 | line[5] >> 5);
            Write45(queue, RESP0, 0xff, GRP0, line[8] << 3 | line[9] >> 5);
            Write3(queue, GRP1, line[7] << 3 | line[8] >> 5);
            Write45(queue, RESP1, 0xff, GRP1, line[11] << 3 | line[12] >> 5);
            Write3(queue, GRP0, line[10] << 3 | line[11] >> 5);
            Write45(queue, RESP0, 0xff, GRP0, line[14] << 3 | line[15] >> 5);
            Write3(queue, GRP1, line[13] << 3 | line[14] >> 5);
            Write45(queue, RESP1, 0xff, GRP1, line[17] << 3 | line[18] >> 5);
            Write3(queue, GRP0, line[16] << 3 | line[17] >> 5);
            Write3(queue, HMOVE, 0xff);
            Write45(queue, GRP1, line[19] << 3, RESP0, 0xff);
            break;
        }
        }
    }

2nd Half:

    for (int i = 0; i < 192; i++)
    {
        uInt8 *line = image[i];
        switch (i % 2)
        {
        case 0:
        {
            Write4(queue, VBLANK, 0x0);
            Write4(queue, GRP0, line[0] >> 5);
            Write3(queue, GRP1, 0xfe & (line[3] << 4 | line[4] >> 4));
            Write3(queue, COLUBK, 00);
            Jmp(queue);
            Write3(queue, HMP0, 0x30);
            Write3(queue, HMP1, 0xc0);
            Write45(queue, RESP1, 0xff, GRP0, (line[1] << 3 | line[2] >> 5));
            Write3(queue, GRP0, 0x80 & (line[3] << 3));
            Write45(queue, RESP0, 0xff, GRP0, 0xfe & (line[6] << 4 | line[7] >> 4));
            Write3(queue, GRP1, 0x01 & (line[6] >> 4));
            Write45(queue, RESP1, 0xff, GRP1, 0xfe & (line[9] << 4 | line[10] >> 4));
            Write3(queue, GRP0, 0x01 & (line[9] >> 4));
            Write45(queue, RESP0, 0xff, GRP0, 0xfe & (line[12] << 4 | line[13] >> 4));
            Write3(queue, GRP1, 0x01 & (line[12] >> 4));
            Write45(queue, RESP1, 0xff, GRP1, 0xfe & (line[15] << 4 | line[16] >> 4));
            Write3(queue, GRP0, 0x01 & (line[15] >> 4));
            Write45(queue, RESP0, 0xff, GRP0, 0xfe & (line[18] << 4 | line[19] >> 4));
            Write3(queue, GRP1, 0x01 & (line[18] >> 4));
            Write45(queue, HMOVE, 0xff, RESP1, 0xff);
            break;
        }
        case 1:
        {
            Write4(queue, VBLANK, 0x0);
            Write4(queue, GRP1, line[0] >> 5);
            Write3(queue, GRP0, 0xfe & (line[3] << 4 | line[4] >> 4));
            Write3(queue, COLUBK, 00);
            Jmp(queue);
            Write3(queue, HMP0, 0xc0);
            Write3(queue, HMP1, 0x30);
            Write45(queue, RESP0, 0xff, GRP1, (line[1] << 3 | line[2] >> 5));
            Write3(queue, GRP1, 0x80 & (line[3] << 3));
            Write45(queue, RESP1, 0xff, GRP1, 0xfe & (line[6] << 4 | line[7] >> 4));
            Write3(queue, GRP0, 0x01 & (line[6] >> 4));
            Write45(queue, RESP0, 0xff, GRP0, 0xfe & (line[9] << 4 | line[10] >> 4));
            Write3(queue, GRP1, 0x01 & (line[9] >> 4));
            Write45(queue, RESP1, 0xff, GRP1, 0xfe & (line[12] << 4 | line[13] >> 4));
            Write3(queue, GRP0, 0x01 & (line[12] >> 4));
            Write45(queue, RESP0, 0xff, GRP0, 0xfe & (line[15] << 4 | line[16] >> 4));
            Write3(queue, GRP1, 0x01 & (line[15] >> 4));
            Write45(queue, RESP1, 0xff, GRP1, 0xfe & (line[18] << 4 | line[19] >> 4));
            Write3(queue, GRP0, 0x01 & (line[18] >> 4));
            Write45(queue, HMOVE, 0xff, RESP0, 0xff);
            break;
        }
        }
    }
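The 4-byte command format described in this post (one ID byte followed by three parameter bytes) might be laid out roughly as in the sketch below. This is a guess at the shape of the API, not the actual source: the `CommandId` values, the `Command` struct, the use of `std::vector` for the queue and the reference-style `queue` parameter are all assumptions. Only the 4-byte size, the address/data parameters of Write3 and the `COLUPF` register address ($08 in the TIA write map) come from the thread or the hardware.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical command IDs; the real values are not given in the thread.
enum class CommandId : std::uint8_t { Write3 = 1, Jmp = 2 };

// Each command is 4 bytes: an ID followed by three parameter bytes.
struct Command {
    CommandId id;
    std::uint8_t p0;
    std::uint8_t p1;
    std::uint8_t p2;
};
static_assert(sizeof(Command) == 4, "commands are 4 bytes each");

// Assumed queue type: a growable buffer holding a whole frame of commands.
using Queue = std::vector<Command>;

// TIA write address of the playfield color register.
constexpr std::uint8_t COLUPF = 0x08;

// Write3 uses two of the three parameters: TIA address and stuffed data.
void Write3(Queue& queue, std::uint8_t address, std::uint8_t data) {
    queue.push_back({CommandId::Write3, address, data, 0});
}

// Jmp carries no parameters in this sketch; it only burns 3 cycles.
void Jmp(Queue& queue) {
    queue.push_back({CommandId::Jmp, 0, 0, 0});
}
```

Under these assumptions, `Write3(queue, COLUPF, 0x28)` appends the bytes for a STA COLUPF whose stored $ff is stuffed down to $28, and the buffer size in bytes is simply four times the number of queued commands.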
Andromeda Stardust Posted November 7, 2015

This sure is impressive. Maybe someone can use your new "bus stuffing" display kernel to redo the Bad Apple ROM at 160x192 instead of the blocky 40x whatever the current one uses. A powerful ARM decompression system could also be used during vblank to fetch the screen buffer instead of the weak RLE algorithm the original used. I wonder if there are enough write cycles left over to add 15kHz audio to a 160x192 display kernel the way Pitfall II does.

You say 24 does not divide evenly into 160. If necessary you could truncate the screen by 8 pixels on either side and use a 144-pixel display kernel flickered at 30Hz. That would basically be like having three 48-pixel displays back to back.
ZackAttack Posted November 7, 2015

I think 15kHz audio only needs 1 TIA update per scanline. Right now there are 4 available. So I could add audio, background color and foreground color. If the background color is left static, the ball could be used to add a little color to something. That could be useful for distinguishing between players in 2-player games.

When I say 24 doesn't divide 160, the 160 is the visible screen width. So displaying fewer pixels doesn't help with that, because the screen is still 160 wide. Really the problem is that the 8-pixel overlap isn't divisible by 3. The cycle 74 HMOVE adjusts the overlap to be a multiple of 3 so everything stays aligned for the most part.
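The divisibility argument can be checked with a few lines of arithmetic. The sketch below only restates numbers from the thread (160 visible pixels, 3 pixels per CPU cycle, 8-pixel player copies with an 8-pixel gap); the constant and function names are made up for illustration.

```cpp
#include <cassert>

constexpr int kVisibleWidth = 160;   // visible pixels per scanline
constexpr int kPixelsPerCycle = 3;   // pixels drawn per 6507 cycle
constexpr int kPlayerWidth = 8;      // one player copy
constexpr int kGap = 8;              // gap between the two copies
constexpr int kChunkWidth = 2 * kPlayerWidth + kGap;  // 24 pixels

// How many whole 24-pixel chunks fit, and what is left over.
constexpr int fullChunks() { return kVisibleWidth / kChunkWidth; }
constexpr int leftoverPixels() { return kVisibleWidth % kChunkWidth; }

// True when a pixel distance can be hit exactly by CPU-timed writes.
constexpr bool onCycleBoundary(int pixels) {
    return pixels % kPixelsPerCycle == 0;
}

// Convert a cycle-aligned pixel distance into CPU cycles.
int cyclesFor(int pixels) {
    assert(onCycleBoundary(pixels));
    return pixels / kPixelsPerCycle;
}
```

Six chunks cover 144 pixels and leave 16 over, and while a 24-pixel chunk is exactly 8 cycles wide, an 8-pixel offset falls between cycle boundaries, which is the misalignment the HMOVE is described as cleaning up.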
ScumSoft Posted November 8, 2015

ZackAttack wrote: @ScumSoft I didn't know you were still working on that. Getting it to work interrupt driven is very impressive. I'd be happy to share the CGOL source with you once you have bus stuffing implemented. How are you planning on making the bus stuffing interrupt driven? The RMW is going to be super tricky because it hits the zeropage address 3 cycles in a row and the address bus itself must be overridden for the last 2 cycles. Perhaps you plan on using a built in timer to trigger the interrupts? Just make sure you don't drive it high and fry an innocent VCS!

I had stopped working on everything when school started up back in January this year and didn't touch it again till last week when I cleaned out my closet.

The bus stuffing isn't interrupt driven; only the "Hey, A12 went high, I better go service the FIQ routine now" part is. This way the ARM can be crunching C code, getting everything ready for the bitmap output stuffing. The actual stuffing happens while in the FIQ. I am currently working out the logic required at this point, but I should be able to make use of the three timers in the ARM for gating the values we want onto the bus.

I already have a working kernel I wrote for the regular DPC+ core, but it repurposes all the datafetchers and more to display my bitmap output.

2016 will be a great year for more complex games.
Zarek Posted November 9, 2015

Really cool stuff here. Looking forward to more updates on this project!
Thomas Jentzsch Posted November 9, 2015

ZackAttack wrote: I think 15khz audio only needs 1 Tia update per scanline. Right now there are 4 available. So I could add audio, background color and foreground color.

How precise is the timing? While working on Boulder Dash, I found that even very small timing shifts cause very noticeable distortions. In the end I had to make the timing 100% perfect to get rid of them completely. Can the ARM interrupts be that precise?
ZackAttack Posted November 9, 2015

Thomas Jentzsch wrote: How precise is the timing? While working on Boulder Dash, I found that even very small timing shifts cause very noticeable distortions. In the end I had to make the timing 100% perfect. to get rid of them completely. Can the ARM interrupts be that precise?

You get complete control over when to write a TIA register. The only limitation is that there have to be some read cycles mixed in. A Write3 command will always take 3 6507 cycles, and the write always occurs on the third cycle.

Edited December 5, 2015 by ZackAttack
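Because every command has a fixed cycle cost, the timing of a scanline can be checked by simply adding the costs up. The sketch below tallies the command sequences from the kernel code posted earlier in the thread against the 76 CPU cycles in one scanline. The costs for Write3 (STA zero page, 3 cycles) and Write45 (ROL zero page, 5 cycles) come from the thread, and JMP absolute is 3 cycles on the 6507; treating Write4 as 4 cycles is an assumption based on its name. The `Op` enum and helper names are hypothetical.

```cpp
#include <cassert>
#include <vector>

enum class Op { Write3, Write4, Write45, Jmp };

constexpr int kCyclesPerScanline = 76;

// Cycle cost of each command (Write4 = 4 is an assumption).
constexpr int cycles(Op op) {
    return op == Op::Write3  ? 3
         : op == Op::Write4  ? 4
         : op == Op::Write45 ? 5
         : 3;  // Op::Jmp
}

int totalCycles(const std::vector<Op>& line) {
    int sum = 0;
    for (Op op : line) sum += cycles(op);
    return sum;
}

// Command order of one scanline in the "1st Half" kernel.
std::vector<Op> firstHalfLine() {
    std::vector<Op> ops{Op::Write45};
    for (int i = 0; i < 5; ++i) ops.push_back(Op::Jmp);
    for (int i = 0; i < 6; ++i) {
        ops.push_back(Op::Write45);
        ops.push_back(Op::Write3);
    }
    ops.push_back(Op::Write3);   // HMOVE
    ops.push_back(Op::Write45);
    return ops;
}

// Command order of one scanline in the "2nd Half" kernel.
std::vector<Op> secondHalfLine() {
    std::vector<Op> ops{Op::Write4, Op::Write4, Op::Write3, Op::Write3,
                        Op::Jmp, Op::Write3, Op::Write3};
    for (int i = 0; i < 6; ++i) {
        ops.push_back(Op::Write45);
        ops.push_back(Op::Write3);
    }
    ops.push_back(Op::Write45);  // HMOVE + RESPx
    return ops;
}
```

With these costs, both posted scanline sequences add up to exactly 76 cycles, which is consistent with the claim that each write lands on a fully predictable cycle.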
ZackAttack Posted December 5, 2015

So thanks to USPS losing my package, I'm still stuck waiting for parts from China. In the meantime I've been thinking about trying a 4k ROM version of this. Since the RAM is so limited, I think the best resolution I could do would be 30x30, or 900 cells total. I threw together a small test program to see what such limited resolution would look like. It uses 120 bytes of RAM for the screen buffer, which leaves only 8 bytes for implementing the GoL algorithm. I'm pretty sure it's technically feasible though. I'm wondering if it's worth completing.
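One way the 120-byte figure could arise is a bit-packed buffer with 4 bytes per row (32 bits, 2 of them unused), which out of the console's 128 bytes of RAM would leave the 8 bytes mentioned. That layout is an inference, not something stated in the post, and the sketch below is plain C++ for illustration rather than 6507 code; `SmallGrid`, `getCell` and `setCell` are hypothetical names.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr int kSize = 30;        // 30x30 cells
constexpr int kBytesPerRow = 4;  // 32 bits per row, 2 unused (assumed layout)
static_assert(kSize * kBytesPerRow == 120, "screen buffer is 120 bytes");

using SmallGrid = std::array<std::uint8_t, kSize * kBytesPerRow>;

// Index of the byte holding cell (x, y).
std::size_t byteIndex(int x, int y) {
    assert(x >= 0 && x < kSize && y >= 0 && y < kSize);
    return static_cast<std::size_t>(y * kBytesPerRow + x / 8);
}

// Read one cell; the leftmost pixel of each byte is stored in bit 7.
bool getCell(const SmallGrid& g, int x, int y) {
    return ((g[byteIndex(x, y)] >> (7 - x % 8)) & 1) != 0;
}

// Set or clear one cell without disturbing its neighbours in the byte.
void setCell(SmallGrid& g, int x, int y, bool alive) {
    const std::uint8_t mask = static_cast<std::uint8_t>(0x80u >> (x % 8));
    std::uint8_t& b = g[byteIndex(x, y)];
    if (alive) {
        b = static_cast<std::uint8_t>(b | mask);
    } else {
        b = static_cast<std::uint8_t>(b & ~mask);
    }
}
```

The hard part hinted at in the post is that the Game of Life normally needs a second buffer for the next generation, and with only 8 bytes to spare the update would have to be done almost in place.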
Andromeda Stardust Posted December 5, 2015

Bummer your parts were lost. What was in the box?
ZackAttack Posted December 6, 2015

Andromeda Stardust wrote: Bummer your parts were lost. What was in the box?

Just a couple of small circuit boards. The seller sent another for free, so that was good.
DrWho198 Posted December 7, 2015

Nice, I'm impressed by the results. Yet I feel like this cannot be compared to a traditional Atari game. It's nice as a tech experiment, but I don't feel like it's the Atari's work showing these images. For me it is starting to get in line with the Super Game Boy: a device that feeds the graphics to the console and does all the work itself. If you look at it a bit sceptically, you are modding the 2600. Maybe it is more like an expansion unit. Still a nice project though, but it's more of a hardware than a software breakthrough.

I guess it's up to the user to decide what keeps him or her attracted to the 2600. And if games ever get released with this technology, I hope they will be marked with a symbol that shows they are using it. Because as a collector I would avoid it. As a tech junkie I love it.

Edited December 7, 2015 by DrWho198
ZackAttack Posted December 8, 2015

DrWho198 wrote: Nice, I'm impressed by the results. Yet I feel like this can not be compaired to a traditional Atari game. It's nice as a tech experiment but I don't feel like it's the Atari's work showing these images. For me it is starting to get in line with the super gameboy. A device that feeds the graphics to the console and does all the work itself. If you look at it a bit sceptical then you are modding the A2600. Maybe it is more like an expansion unit. Still a nice project though, but it's more of an hardware than software breakthrough. I guess it's up to the user to decide what keeps him/her attracted to the 2600. And if games ever get released with this technology then I hope they will be marked with a symbol that shows they are using it. Because as a collector I would avoid it. As a tech junky I love it.

It certainly isn't a game cart you'd find on store shelves in 1977. I figure as long as it doesn't have any power or video cables coming out of it, it's fair game. We all use emulators running on multi-GHz computers to build 4k games, so why limit ourselves elsewhere in the process? I do get what you're saying about collecting though. I expect this to be more popular among those looking to log some hours playing games. Assuming I can even get it working in the first place.
Andromeda Stardust Posted December 8, 2015

We need a working demo of this. Not for collectors or gamers, but as a proof of concept. If you can run what is essentially fullscreen monochrome video on an Atari 2600, then you can port just about anything. This to me is way cooler and more fascinating than the Bad Apple demo.