Designing a cartridge that supports 100% C/C++ game development


195 replies to this topic

#1 ZackAttack

    Dragonstomper

  • 762 posts
  • Location:Orlando, FL US

Posted Fri May 15, 2015 11:02 AM

I have decided to build a prototype for a new cartridge that will support bus-stuffing and run everything on an ARM microcontroller. The goal for this project is to facilitate the creation of a new generation of Atari 2600 games in a timely manner.

 

Here is a high-level diagram of the overall design. The basic concept is to offload the work of timing TIA register reads/writes and bus-stuffing to the FPGA, thus freeing the MCU to run the game code. There will be two command buffers, each with enough room for at least a single scanline of commands. The FPGA will automatically toggle between the buffers and raise an interrupt on the MCU each time the buffers are swapped. The MCU will then need to fill the other buffer while handling the IRQ. When the MCU is not handling buffer-fill requests, it is free to run whatever game logic it wants.
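A minimal C++ sketch of the double-buffer handshake described above, with the FPGA's buffer swap and the IRQ modeled as an ordinary function call. The `Command` layout, the 32-entry capacity, and the names `CommandBuffers`, `swap` and `push` are hypothetical; the post only specifies two buffers holding at least one scanline of commands each.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical command record; the real encoding is not specified in the post.
struct Command {
    std::uint8_t id;     // command type
    std::uint8_t addr;   // TIA/RIOT register address
    std::uint8_t value;  // value to write (or stuff)
};

constexpr std::size_t kCommandsPerBuffer = 32;  // assumed capacity (>= one scanline)

struct CommandBuffers {
    std::array<std::array<Command, kCommandsPerBuffer>, 2> buf{};
    std::array<std::size_t, 2> count{};
    int active = 0;  // buffer the FPGA is currently consuming

    // FPGA side: switch to the other buffer. The return value stands in for
    // the IRQ and tells the MCU which (now empty) buffer to refill.
    int swap() {
        int drained = active;
        active ^= 1;
        count[drained] = 0;
        return drained;
    }

    // MCU side (inside the IRQ handler): append to the inactive buffer only.
    bool push(int which, const Command& c) {
        if (which == active || count[which] >= kCommandsPerBuffer) return false;
        buf[which][count[which]++] = c;
        return true;
    }
};
```

Refusing writes to the active buffer is the property that lets the FPGA stream one buffer while the MCU fills the other without locking.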

 

The following commands are what I've thought to support so far:
  • 2-cycle NOP
  • 3-cycle register write
  • 4-cycle register write
  • 5-cycle register double write (RMW instruction)
  • JMP to $1000
  • 3-cycle register read
  • 4-cycle register read
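The cycle costs of these commands can be kept in a small table so game code can check that a scanline's worth of commands fits in the 76 CPU cycles the 2600 has per scanline. This is only a sketch: the `Cmd` names and their numeric IDs are hypothetical, and the 3 cycles for the JMP command assume a standard absolute JMP, which the post doesn't state.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Command IDs are hypothetical; cycle counts follow the list above.
enum class Cmd : std::uint8_t {
    Nop2,          // 2-cycle NOP
    Write3,        // 3-cycle register write
    Write4,        // 4-cycle register write
    DoubleWrite5,  // 5-cycle RMW double write
    JmpStart,      // JMP to $1000 (assumed absolute JMP, 3 cycles)
    Read3,         // 3-cycle register read
    Read4,         // 4-cycle register read
};

constexpr int cycles(Cmd c) {
    switch (c) {
        case Cmd::Nop2: return 2;
        case Cmd::Write3: case Cmd::Read3: case Cmd::JmpStart: return 3;
        case Cmd::Write4: case Cmd::Read4: return 4;
        case Cmd::DoubleWrite5: return 5;
    }
    return 0;
}

constexpr int kCyclesPerScanline = 76;  // 6507 cycles per scanline

int totalCycles(const std::vector<Cmd>& line) {
    int sum = 0;
    for (Cmd c : line) sum += cycles(c);
    return sum;
}

bool fitsScanline(const std::vector<Cmd>& line) {
    return totalCycles(line) <= kCyclesPerScanline;
}
```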

 

Frameworks could be built on top of the command structure similar to batari Basic in order to provide some higher level abstractions for those who are just getting started with C++.

 

I would appreciate any feedback on this design, good or bad. If the prototype is successful, I plan to modify Stella so it can support running/debugging C++ games from within Visual Studio 2013. All design documents and source will be made publicly available.

 

cart_design.png

 

Level Converts and Bus Stuffer.png


Edited by ZackAttack, Sat May 16, 2015 6:39 PM.


#2 jaholmes

    Space Invader

  • 36 posts

Posted Fri May 15, 2015 7:50 PM

This sounds like a neat project with a lot of potential.  I guess I'm still wondering what it actually means to "stuff the bus", though.  Except in situations where a ROM would normally drive the data bus, you wouldn't be driving any busses, and so you're really just emulating a ROM.  Presumably you'd have some set of well-written template kernels--chunks of 6502 machine code with all the operands missing--which would be populated in realtime by the FPGA based on some "operand memory", which would be the analog of the TIA from the standpoint of the ARM code.  Or am I off in the weeds?

 

Actually, thinking about this some more, you'd obviously need to be a bit more than a ROM emulator in order to deal with input, for instance.  I guess, since the FPGA would know when it had just handed the CPU an instruction to read from the RIOT's address space, it could snoop the result from the data bus directly when it saw the address go by.  There'd be no need to preserve the resulting register state in the 6502; the next instruction could safely blast over it.  Hmmm... the possibilities...


Edited by jaholmes, Fri May 15, 2015 8:28 PM.


#3 5-11under

    River Patroller

  • 3,398 posts
  • Location:Ontario, Canada

Posted Fri May 15, 2015 8:44 PM

It sounds interesting.

So you don't have to care about chasing the beam, then?

Would you just populate your own display screen "buffer", for instance, in your ARM program, and be able to send tiles and sprites to the buffer? What resolution and color limitations would you likely have?

 

/not anywhere near being a 2600 guru!



#4 ZackAttack

    Dragonstomper

  • Topic Starter
  • 762 posts
  • Location:Orlando, FL US

Posted Fri May 15, 2015 8:50 PM

jaholmes wrote:
> This sounds like a neat project with a lot of potential.  I guess I'm still wondering what it actually means to "stuff the bus", though.  Except in situations where a ROM would normally drive the data bus, you wouldn't be driving any busses, and so you're really just emulating a ROM.

An example would be the STA instruction. Normally you would need a minimum of 5 cycles (LDA # at 2 cycles plus STA ZP at 3 cycles) to update a TIA register. Since the 6502 drives the data bus high with transistors that act as pull-up resistors, it is safe to drive the data bus low during a cycle in which the CPU is driving the bus. So as long as A has $FF loaded into it, we can trim the TIA update down to 3 cycles by pulling the appropriate data lines low while the 6502 is writing the value. Hence the name bus stuffing: you're stuffing in a new value.
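The stuffing trick amounts to a wired AND: the CPU drives its byte, and any line the cartridge pulls low reads as 0. A minimal C++ model, assuming (as the post does) that the cart's pull-down always wins over the CPU's weak high; the function names are illustrative only.

```cpp
#include <cassert>
#include <cstdint>

// Value seen on the data bus when the CPU drives `cpuDrives` and the cart
// pulls the lines set in `cartPullsLow` to 0 (wired-AND assumption).
constexpr std::uint8_t busValue(std::uint8_t cpuDrives, std::uint8_t cartPullsLow) {
    return static_cast<std::uint8_t>(cpuDrives & ~cartPullsLow);
}

// Stuffing can only clear bits, so the CPU must already drive every 1 bit
// of the desired value. With A = $FF, any value is reachable.
constexpr bool canStuff(std::uint8_t cpuDrives, std::uint8_t desired) {
    return (desired & ~cpuDrives) == 0;
}

// Lines the cart has to pull low: every bit that should read as 0.
constexpr std::uint8_t pullLowMask(std::uint8_t desired) {
    return static_cast<std::uint8_t>(~desired);
}

// Cycle cost of one TIA update: LDA #imm + STA zp versus a stuffed STA zp.
constexpr int kCyclesLdaSta = 2 + 3;
constexpr int kCyclesStuffedSta = 3;
```

With A = $FF, `busValue(0xFF, pullLowMask(v))` yields `v` for any byte, which is why the post loads $FF into A before stuffing.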
 

jaholmes wrote:
> Presumably you'd have some set of well-written template kernels--chunks of 6502 machine code with all the operands missing--which would be populated in realtime by the FPGA based on some "operand memory", which would be the analog of the TIA from the standpoint of the ARM code.  Or am I off in the weeds?


That's pretty much it. Each command will map to a 6502 instruction along with some additional data. The 3 cycle write TIA register command will require the Command ID, TIA Register Address, and Desired Value. This command would then feed the STA opcode followed by the Address and then bus stuff the value.
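A sketch of how the FPGA side might expand that 3-cycle write command into bus cycles. $85 is the 6502 opcode for STA zero page; the `BusCycle` structure and the `expandWrite3` function are hypothetical names, and the model ignores the sub-cycle timing discussed later in the thread.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// One 6507 bus cycle as seen by the cartridge (hypothetical model).
struct BusCycle {
    bool cartDrives;    // true: cart supplies the byte (ROM fetch)
    bool stuffed;       // true: CPU is writing and the cart stuffs the value
    std::uint8_t data;  // byte the cart drives or stuffs
};

constexpr std::uint8_t kOpStaZp = 0x85;  // STA zero page, 3 cycles

// Expand a "3-cycle register write" (register address + value) into bus cycles.
std::array<BusCycle, 3> expandWrite3(std::uint8_t tiaReg, std::uint8_t value) {
    return {{
        {true, false, kOpStaZp},  // cycle 1: opcode fetch from "ROM"
        {true, false, tiaReg},    // cycle 2: zero-page operand fetch
        {false, true, value},     // cycle 3: CPU writes A ($FF), cart stuffs value
    }};
}
```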
 

jaholmes wrote:
> Actually, thinking about this some more, you'd obviously need to be a bit more than a ROM emulator in order to deal with input, for instance.  I guess, since the FPGA would know when it had just handed the CPU an instruction to read from the RIOT's address space, it could snoop the result from the data bus directly when it saw the address go by.  There'd be no need to preserve the resulting register state in the 6502; the next instruction could safely blast over it.  Hmmm... the possibilities...

Exactly what I was thinking. The snooped value would then be stored in the retrieved data buffer until cleared by the MCU. If the buffer isn't cleared it will eventually wrap around and overwrite the oldest values first.
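The wrap-around behavior described here is a ring buffer that drops the oldest entry when full. A minimal C++ sketch, assuming that "cleared by the MCU" means the MCU pops values as it reads them; the class name and capacity are illustrative.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Fixed-size buffer of snooped bytes; when full, the oldest value is overwritten.
template <std::size_t N>
class SnoopBuffer {
public:
    void push(std::uint8_t v) {
        data_[(head_ + count_) % N] = v;
        if (count_ < N) ++count_;
        else head_ = (head_ + 1) % N;  // full: the slot just written was the oldest
    }
    // MCU side: read and clear the oldest value. Returns false if empty.
    bool pop(std::uint8_t& out) {
        if (count_ == 0) return false;
        out = data_[head_];
        head_ = (head_ + 1) % N;
        --count_;
        return true;
    }
    std::size_t size() const { return count_; }

private:
    std::array<std::uint8_t, N> data_{};
    std::size_t head_ = 0;
    std::size_t count_ = 0;
};
```

Pushing 1, 2, 3, 4 into a 3-entry buffer leaves 2, 3, 4: the first value is overwritten.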



#5 ZackAttack

    Dragonstomper

  • Topic Starter
  • 762 posts
  • Location:Orlando, FL US

Posted Fri May 15, 2015 9:01 PM

5-11under wrote:
> It sounds interesting.
>
> So you don't have to care about chasing the beam, then?
>
> Would you just populate your own display screen "buffer", for instance, in your ARM program, and be able to send tiles and sprites to the buffer? What resolution and color limitations would you likely have?
>
> /not anywhere near being a 2600 guru!

Technically, I guess you'd no longer need to race the beam. For the best results you'd still need to carefully time the TIA updates; it'll just be a whole lot easier and quicker to set up. A buffer won't be required, but the ARM microcontroller will have plenty of on-board RAM for programs that want to create one. The graphics will still be limited enough to look like a VCS game, but with this design you should be able to achieve better results than what could be done with assembly or even DPC+. Program size will only be limited by the size of the SD card.



#6 jaholmes

    Space Invader

  • 36 posts

Posted Fri May 15, 2015 9:10 PM

ZackAttack wrote:
> Since the 6502 drives the databus high with transistors that act as pullup resistors it is safe to drive the data bus low during a cycle which the CPU is driving the bus.

 

Makes sense.  (Seems a little scary, I don't mind admitting!)  I just found this very informative thread on the subject.



#7 Zarek

    Space Invader

  • 26 posts
  • Location:Australia

Posted Sat May 16, 2015 2:17 PM

Sounds fantastic. I'm a budding C++ programmer with an interest in 6502 assembly, and coincidentally I've been thinking of learning ARM assembly too. I would love to see this come to fruition, even though I'm new haha.



#8 ZackAttack

    Dragonstomper

  • Topic Starter
  • 762 posts
  • Location:Orlando, FL US

Posted Sat May 16, 2015 6:57 PM

Added schematics for logic level translation and bus-stuffing circuits to the first post.

 

The logic level translation is pretty straightforward. D1 will clamp the voltage to 3.3V when the signal is high, and the output will be pulled low by the buffer through the resistor when the input signal is low. An R1 value of 1K should be big enough to limit current to a safe level and small enough to meet timing requirements.
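Some back-of-the-envelope numbers for the R1/D1 shifter, written as C++ constants. Only R1 = 1K and the 3.3V clamp come from the post; the rest are assumptions: a worst-case 5.0V high level (a 74LS output will usually sit lower), an ideal clamp, and a guessed 10pF of pin and wiring capacitance. The 6507 clock is the NTSC 1.193182 MHz.

```cpp
#include <cassert>
#include <cmath>

constexpr double kVHigh5 = 5.0;       // assumed worst-case high level on the 5V side
constexpr double kVClamp = 3.3;       // D1 clamp voltage
constexpr double kR1 = 1000.0;        // R1, ohms
constexpr double kCInput = 10e-12;    // assumed pin + wiring capacitance, farads
constexpr double kCpuHz = 1193182.0;  // NTSC 6507 clock

// Current through R1 into the clamp while the 5V side is high.
constexpr double clampCurrentA() { return (kVHigh5 - kVClamp) / kR1; }

// RC time constant seen by the FPGA input.
constexpr double rcTimeConstantS() { return kR1 * kCInput; }

// One 6507 cycle, roughly 838 ns.
constexpr double cpuCyclePeriodS() { return 1.0 / kCpuHz; }
```

Under these guesses R1 limits the clamp current to about 1.7 mA, and three time constants (about 30 ns) are a small fraction of a CPU cycle, which is consistent with the claim that 1K is a workable compromise.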

 

The Bus-Stuffing will also perform the level translation. The pull-up resistor R2 will always assert a high signal on the bus. The 6507 and peripherals should have no problem pulling it low. The FPGA will pull the data bits low by activating U2. From the FPGA perspective writing a value to the data bus will look the same regardless of whether or not bus-stuffing is taking place.

 

I found a couple of 74LS125s in my parts bin to test with. If those don't work for U2, I may try a slightly different design that uses a 74-series part with open-collector outputs.


Edited by ZackAttack, Sat May 16, 2015 6:57 PM.


#9 jaholmes

    Space Invader

  • 36 posts

Posted Sat May 16, 2015 10:22 PM

I'd probably chicken out and use these (or similar):
http://www.nxp.com/p...LVC4245APW.html

But for a prototype...

Where's an FPGA with 5V I/O when you need one?!

Edited by jaholmes, Sat May 16, 2015 10:41 PM.


#10 jaholmes

    Space Invader

  • 36 posts

Posted Sun May 17, 2015 11:51 AM

And just for kicks--for those situations where C++ just doesn't feel 'retro' enough, and yet you still want a somewhat casual development experience--you could drop a 6502 core into your FPGA. :) Clocked at 32+MHz, you're sure to win the beam race, even with the sloppiest of coding. You could name the cartridge simply "Overclocked", and the label artwork could be a photo of a heatsink/fan assembly. :)

#11 ZackAttack

    Dragonstomper

  • Topic Starter
  • 762 posts
  • Location:Orlando, FL US

Posted Sun May 17, 2015 12:10 PM

Zarek wrote:
> Sounds fantastic. I'm a budding C++ programmer with an interest in 6502 assembly, and coincidentally I've been thinking of learning ARM assembly too. I would love to see this come into fruition, even though I'm new haha.

Thanks for the encouragement and welcome to the forums!

 

jaholmes wrote:
> I'd probably chicken out and use these (or similar):
> http://www.nxp.com/p...LVC4245APW.html
>
> But for a prototype...
>
> Where's an FPGA with 5V I/O when you need one?!

The best I've found so far is this. I figure 5V parts are going to keep getting harder to find, so for future-proofing it would be good not to rely on any special ICs. I should also have mentioned that the design took into account which ICs I have in my parts bin, so I could get started right away.

 

jaholmes wrote:
> And just for kicks--for those situations where C++ just doesn't feel 'retro' enough, and yet you still want a somewhat casual development experience--you could drop a 6502 core into your FPGA. :) Clocked at 32+MHz, you're sure to win the beam race, even with the sloppiest of coding. You could name the cartridge simply "Overclocked", and the label artwork could be a photo of a heatsink/fan assembly. :)

Funny you say that. I was thinking it might be possible to run a modified version of Stella on the ARM MCU to allow the cart to play non-ARM games too. But first I need to sacrifice a few Atari 2600s to bus-stuffing experiments.



#12 ZackAttack

    Dragonstomper

  • Topic Starter
  • 762 posts
  • Location:Orlando, FL US

Posted Sun May 17, 2015 8:31 PM

Finished wiring up the FPGA to the Atari. I just need to write a test program and some VHDL to see if the bus-stuffing circuit works. Then it's on to designing the FPGA portion.

 

WP_20150517_007.jpg

 

U1 is two 74LS245s for the address bus and one 74HCT541 for the data bus (it's what I had lying around).

R1 is 1K.

D1 is the 3.3V Zener Diode that's built into the Altera DE2 FPGA dev board

U2 is two 74LS125

R2 is 10k

 

 



#13 jaholmes

    Space Invader

  • 36 posts

Posted Sun May 17, 2015 9:52 PM

Cool!  Looks like you're wasting no time.  I trust you'll share screenshots of your "extra chatty" Donkey Kong if it works out? :)

 

Should any of your 2600's ICs donate their smoke to science, may they at least be of the socketed sort!



#14 Thomas Jentzsch

    Thrust, Jammed, SWOOPS!, Boulder Dash, THREE·S, Star Castle

  • 24,031 posts
  • Always left from right here!
  • Location:Düsseldorf, Germany, Europe, Earth

Posted Mon May 18, 2015 7:40 AM

I am no hardware expert, but how is this different from the Harmony/Melody? Those are ARM-based too, and people were discussing "bus-stuffing" there too.

 

Also a DPC+ game usually is 95% C++ too. All that's left for the 6507 are the kernel, timers and vertical sync, audio output and controller reads.



#15 Mr SQL

    River Patroller

  • 2,101 posts

Posted Mon May 18, 2015 8:15 AM

Thomas Jentzsch wrote:
> I am no hardware expert, but how is this different from the Harmony/Melody? Those are ARM based too and people were discussing about "bus-stuffing" there too.
>
> Also a DPC+ game usually is 95% C++ too. All that's left for the 6507 are the kernel, timers and vertical sync, audio output and controller reads.

It's different because Zack is talking about a Framework development kit for handling the 6502 kernel, timers, vertical sync; specifically like bB or the ASDK.

 

bB has only limited support for the ARM and runs mostly on the 6502, whereas Zack's proposed framework would leave the developer free to write all of the code in C.



#16 ZackAttack

    Dragonstomper

  • Topic Starter
  • 762 posts
  • Location:Orlando, FL US

Posted Mon May 18, 2015 10:36 AM

The Harmony connects the ARM core directly to the 6507 bus. My design relies on custom hardware in the FPGA to interface with the 6507 busses. This allows the ARM to spend more time running user code and removes the need for any assembly programming.

Even if someone implemented bus-stuffing on the Harmony, there wouldn't be many ARM cycles available for calculating what to stuff in.

#17 ZackAttack

    Dragonstomper

  • Topic Starter
  • 762 posts
  • Location:Orlando, FL US

Posted Mon May 18, 2015 8:37 PM

The good news is my Atari survived the night. The bad news is that I spent most of the night troubleshooting with a faulty multimeter and ran out of time. Eventually I got Quartus to play nice and also figured out that I forgot to wire up one of the power bus strips on the breadboard. Doh!

 

The voltage level converters are working in both directions. I think the circuit to interface the Atari and FPGA is complete at this point. The next step will be to program the FPGA for a basic bus-stuffing test.

 

Here's a shot of the circuit in action. The switches are wired to the red LEDs and then to the databus driver circuit. The green LEDs are wired to the input from the databus.

 

Level Converters Tested.jpg

 

Verilog:

module light (SW, LEDR, GPIO_0, LEDG);
input [17:0] SW;
inout [35:0] GPIO_0;
output [17:0] LEDR;
output [7:0] LEDG;
assign LEDR = SW;
assign LEDG[7:0] = GPIO_0[15:8]; // 15:8 is DataBusIn
assign GPIO_0[7:0] = SW[7:0]; // 7:0 is DataBusOut
endmodule


#18 CPUWIZ

    Commander

  • 34,881 posts
  • I am the one who knocks!
  • Location:SoCal

Posted Mon May 18, 2015 9:47 PM

Before you get too involved, I would suggest building an interface that goes into the cart slot (like you have), but not with mile-long wires, or you may be in for a world of surprises.  Figure out the PCB design in something like Eagle or other CAD software.  Check whether your desired chips fit onto a PCB that fits into donor shells.  Double-check what the parts and assembly would cost.  Realize that the boards will need to be pre-built; most people cannot solder FPGAs etc.!  Just keep all of these things in mind.

 

Fun stuff to play with, either way, good luck. :)



#19 ZackAttack

    Dragonstomper

  • Topic Starter
  • 762 posts
  • Location:Orlando, FL US

Posted Tue May 19, 2015 8:31 PM

CPUWIZ wrote:
> Before you get too involved, I would suggest to build an interface, that goes into the cart slot (like you have), but not with 1mile long wires, you may be in for a world of surprises.  Figure out the PCB design, inside something like Eagle or other CAD software.  Check if your desired chips fit onto a PCB, that fits into donor shells.  Double check what the parts and assembly would cost.  Realize, that the boards will need to be pre-built, most people can not solder FPGA's etc.!  Just keep all of these things in mind.
>
> Fun stuff to play with, either way, good luck. :)

I'm sure you're right about the mile-long wires, lol. Fortunately, I think I'll be able to get away with it for the prototype. Those are some good points about the PCB. I'm not really sure what to do about that yet; I'm certainly not equipped to solder those BGA parts. Maybe using a prototype PCB service would be possible, though that might price it out of reach for many.



#20 ZackAttack

    Dragonstomper

  • Topic Starter
  • 762 posts
  • Location:Orlando, FL US

Posted Tue May 19, 2015 8:40 PM

SUCCESS! Tonight I verified that both bus-stuffing and snooping work as intended. The next step will be to design the command interface and get the microcontroller hooked up to it. The picture is pretty noisy, but that's expected considering my complete disregard for proper RF shielding and whatnot. Here's a video of it in action. The vertical bars correspond to a bit being changed by the bus-stuffing circuit, and the green LEDs show the SWCHA value being snooped off the data bus. Enjoy!
 

 
Game Runs.jpg
 
Attached File: bees.asm (4.35KB)
 
VHDL:
---------------------------------------
-- driver (ESD book figure 2.3), adapted as a bus-stuffing test
--
-- one architecture: ROM emulation, SWCHA snooping, PF1 bus-stuffing
----------------------------------------

library ieee;
use ieee.std_logic_1164.all;

----------------------------------------

entity Light is
port(	
	SW: in std_logic_vector(17 downto 0);
	LEDR: out std_logic_vector (17 downto 0);
	LEDG: out std_logic_vector (7 downto 0);
	GPIO_0: inout std_logic_vector (28 downto 0);
	FL_WE_N: out std_logic;
	FL_CE_N: out std_logic;
	FL_OE_N: out std_logic;
	FL_RST_N: out std_logic;
	FL_DQ: in std_logic_vector(7 downto 0);
	FL_ADDR: out std_logic_vector(21 downto 0)
);
end Light;  

----------------------------------------

architecture behv1 of Light is
begin

    process(SW, GPIO_0, FL_DQ)  -- list every signal the process reads
    begin

	--Enable reading from flash memory
	FL_WE_N <= '1';
	FL_RST_N <= '1';
	FL_OE_N <= '0';
	FL_CE_N <= '0';

	--Only using the lowest 4KB block of flash memory
	FL_ADDR(21 downto 12) <= "0000000000";

	if GPIO_0(28) = '1' then
		--Fetch a byte from ROM and throw it on the data bus
		FL_ADDR(11 downto 0) <= GPIO_0(27 downto 16);
		GPIO_0(7 downto 0) <= FL_DQ;
	else
		--Disable data bus driver since this is outside of ROM address space
		GPIO_0(7 downto 0) <= "11111111";
	end if;

	--Snoop value of SWCHA and display it on green LEDs
	if GPIO_0(28 downto 16) = "0001010000000" then
		LEDG <= GPIO_0(15 downto 8);
	end if;
	
	--Stuff in values on switches when PF1 is written to
	if GPIO_0(28 downto 16) = "0000000001110" then
		GPIO_0(7 downto 0) <= SW(7 downto 0);
	end if;

	end process;

end behv1;

------------------------------------------
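For reasoning about the decode without hardware, the address checks in the VHDL above can be mirrored in a few lines of C++. This is a software model, not FPGA code: it assumes the same full 13-bit comparisons as the VHDL (A12 high selects the cartridge, $0280 is SWCHA, $000E is PF1), and the `Action` names are illustrative.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uint16_t kSwcha = 0x0280;  // RIOT port A (joysticks)
constexpr std::uint16_t kPf1 = 0x000E;    // TIA playfield register 1

enum class Action {
    DriveRom,  // put the flash byte on the data bus
    Stuff,     // stuff the switch values during the PF1 write
    Snoop,     // release the bus and latch the data bus onto the LEDs
    Release,   // drive all ones, i.e. leave the bus alone
};

// addr: the 13 address lines A12..A0, as on GPIO_0(28 downto 16).
Action decode(std::uint16_t addr) {
    addr &= 0x1FFF;                               // the 6507 has 13 address lines
    if (addr & 0x1000) return Action::DriveRom;   // A12 high: cartridge space
    if (addr == kPf1) return Action::Stuff;
    if (addr == kSwcha) return Action::Snoop;
    return Action::Release;
}
```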

Edited by ZackAttack, Thu May 21, 2015 8:34 PM.


#21 jaholmes

    Space Invader

  • 36 posts

Posted Tue May 19, 2015 11:20 PM

Very cool!  (Separately, you may have to share the story of the tag and ink on your TV screen, and why you haven't removed them. :))

 

One thing that springs to mind, though you've probably thought of it:  The final versions of your "stuff" and "snoop" operations are going to require some calibrated delays.  There may be a very nontrivial gap, for instance, between the appearance of a valid address and the reasonable assumption of valid data--easily a half-cycle or more.  While you're reflecting the data lines to LEDs, this is pretty irrelevant, but when you're trying to latch a value for future use, it'll be harder.

 

But perhaps not all that much harder, assuming the FPGA is clocked like 10x+ the rate of the 6502.


Edited by jaholmes, Tue May 19, 2015 11:23 PM.


#22 ZackAttack

    Dragonstomper

  • Topic Starter
  • 762 posts
  • Location:Orlando, FL US

Posted Wed May 20, 2015 5:37 AM

If the VHDL code is doing what I think, it should already be latching the snooped data. I don't think it's a problem, because the address bus changes before the data bus: when the address bus changes, the data is latched. I could be completely wrong, though.

Regarding the ink and sticker: that was a $3 TV from the thrift shop. I suppose I could remove the price tag.

#23 jaholmes

    Space Invader

  • 36 posts

Posted Wed May 20, 2015 3:25 PM

ZackAttack wrote:
> If the VHDL code is doing what I think,it should already be latching the snooped data. I don't think it's a problem because the address bus changes before the data bus. When the address bus changes the data is latched. I could be completely wrong though.
>
> regarding the ink and sticker. That was a $3 TV from the thrift shop. I suppose I could remove the price tag.

Nah, leave it on.  Adds character. :)

 

On the latching, I'm just wondering about these things:

  • As implemented, I believe you have a transparent latch, so as soon as the address matches SWCHA's %0001010000000, your LEDs start to reflect whatever is on the data bus.  Which, at that point, is probably nothing you want to see.  From a look at the 6532's timing diagrams, the address could easily become valid 300-400ns before the data bus stabilizes, so you could be seeing ~400ns worth of garbage.  Or maybe just nothing.  Since the LEDs in the video look stable, I'd tend to guess it's the latter.  A scope would obviously tell.
  • Instead, you could do an opaque latch only when the address transitions away from %0001010000000, but you would need to guarantee the validity of the data at that point, which from the timing diagrams, is a tad troublesome.  The 6502 appears to stipulate a minimum 10ns "hold time" on the data bus before the address bus goes invalid, however the state of the data bus after the address bus goes invalid doesn't have many guarantees attached to it.  At that point, one begins to wonder about the propagation delays in your 74LS245s, which have no documented minimum, but can go as high as 12ns.  Stick a fast one on the address bus and a slow one on the data bus, and you're probably in trouble.

It's a pity that Phi2 isn't brought out to the cartridge.  That would be awesome.  Without it, it seems like a worthy endeavor to reduce the propagation delay of your logic level translation circuitry as much as possible.  Tens of nanoseconds might bring about interesting or "impossible" data.
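The difference between the two latching strategies shows up clearly if the FPGA's view of the bus is modeled as a series of samples. In this hypothetical C++ sketch, the transparent latch follows the data bus while the address matches, while the second latch only updates when the address moves away, using the last sample taken while it still matched. That second approach only works if the sampling is fast enough that the last matching sample holds valid data, which is the hold-time concern raised above.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// One snapshot of the bus as the FPGA might sample it (hypothetical model).
struct Sample {
    std::uint16_t addr;
    std::uint8_t data;
};

// Transparent latch: the output follows the data bus whenever the address matches.
std::vector<std::uint8_t> transparentLatch(const std::vector<Sample>& bus, std::uint16_t match) {
    std::vector<std::uint8_t> out;
    std::uint8_t q = 0;
    for (const Sample& s : bus) {
        if (s.addr == match) q = s.data;
        out.push_back(q);
    }
    return out;
}

// Latch on address leave: hold the old value while the address matches and
// update only when the address moves away, using the last matching sample.
std::vector<std::uint8_t> latchOnLeave(const std::vector<Sample>& bus, std::uint16_t match) {
    std::vector<std::uint8_t> out;
    std::uint8_t q = 0;
    bool prevMatch = false;
    std::uint8_t prevData = 0;
    for (const Sample& s : bus) {
        bool m = (s.addr == match);
        if (prevMatch && !m) q = prevData;
        prevMatch = m;
        prevData = s.data;
        out.push_back(q);
    }
    return out;
}
```

Feeding both a SWCHA access whose first sample is garbage ($FF) and whose second is valid ($3C), the transparent latch briefly shows $FF, while the leave-triggered latch never does.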


Edited by jaholmes, Wed May 20, 2015 4:17 PM.


#24 Zarek

    Space Invader

  • 26 posts
  • Location:Australia

Posted Wed May 20, 2015 7:04 PM

Wow! Cool video, that looks really awesome! That is totally the kind of thing I want to do one day :)



#25 ZackAttack

    Dragonstomper

  • Topic Starter
  • 762 posts
  • Location:Orlando, FL US

Posted Wed May 20, 2015 8:22 PM

jaholmes wrote:
> Nah, leave it on.  Adds character. :)
>
> On the latching, I'm just wondering about these things:
>
> • As implemented, I believe you have a transparent latch, so as soon as the address matches SWCHA's %0001010000000, your LEDs start to reflect whatever is on the data bus.  Which, at that point, is probably nothing you want to see.  From a look at the 6532's timing diagrams, the address could easily become valid 300-400ns before the data bus stabilizes, so you could be seeing ~400ns worth of garbage.  Or maybe just nothing.  Since the LEDs in the video look stable, I'd tend to guess it's the latter.  A scope would obviously tell.
> • Instead, you could do an opaque latch only when the address transitions away from %0001010000000, but you would need to guarantee the validity of the data at that point, which from the timing diagrams, is a tad troublesome.  The 6502 appears to stipulate a minimum 10ns "hold time" on the data bus before the address bus goes invalid, however the state of the data bus after the address bus goes invalid doesn't have many guarantees attached to it.  At that point, one begins to wonder about the propagation delays in your 74LS245s, which have no documented minimum, but can go as high as 12ns.  Stick a fast one on the address bus and a slow one on the data bus, and you're probably in trouble.
>
> It's a pity that Phi2 isn't brought out to the cartridge.  That would be awesome.  Without it, it seems like a worthy endeavor to reduce the propagation delay of your logic level translation circuitry as much as possible.  10's of ns might bring about interesting or "impossible" data.

The LEDs do reflect the garbage that's on the data bus for half a CPU cycle or so. However, when the address changes, the value is latched. So as long as the data is still valid when the address changes, the latched data will be correct. This must be happening, since the LEDs reflect the latched value for all but that 1/2 CPU cycle where the latch is transparent. I should probably include facilities to update the FPGA in the final design, just in case variations in the 245s cause problems down the road.

 

I think the fact that the data bus is being pulled high with 10K resistors will help ensure that the address bus changes before the data bus does. Honestly, I thought those pull-up resistors were going to cause timing problems.

 

Thanks again for all the feedback. It's been very helpful to discuss these potential design issues.

 

Zarek wrote:
> Wow! Cool video, that looks really awesome! That is totally the kind of thing I want to do one day :)

Thanks. Sounds like you're well on your way.





