
Designing a cartridge that supports 100% C/C++ game development



I think the fact that the data bus is being pulled high with 10K resistors will help ensure that the address bus change happens faster than the data bus change. Honestly, I thought those pull-up resistors were going to cause timing problems.

 

Hmm. That's a very attractive thought. If you do end up scoping this out, I'd love to see what the actual timing works out to be!


I'm not really sure how I'd trigger the scope on the address bus changing from the snooped address. Maybe the only way to do it is by bringing out a signal for it to one of the FPGA I/O pins? It might be easier to program the FPGA to serve double duty as a data logger and just log the data and address bus values for a few hundred cycles after the snooped address is detected. If the data logger can be clocked at 100MHz or more I think it would produce a good picture of what the timing really looks like.


Yeah, I was thinking the former, although the logger solution might be better. But if you bring out the results of two completely independent tests--one for a known address, another for known data at that address--as two pins, I think you'd get a good picture of the relative timing. Of course, your known data might appear at other addresses, but if your scope is triggering on the address, that shouldn't matter.


For the Command State Manager I think the address bus can be compared to an expected value in order to generate a clock signal that synchronizes everything. At power-on the initial state will be a hard-coded collection of commands that emulate a reset vector, load $FF into the 6502 Y register, and then stay in an infinite loop of branching to $1000 until the MCU signals that buffer 0 is initialized. From that point on, the state will toggle between buffer 0 and buffer 1.

 

This is a rough idea of what the execution of the 3 cycle register write command will look like.

PC => DB = $84 (STY)
!PC => PC++;
PC => DB = Param0; ZPAddress = Param0;
!PC => PC++;
ZPAddress => DB = Param1
!ZPAddress => Next Command

The FPGA will have its own copy of the 6502 PC, which it will keep in sync on its own. When the PC matches the address bus (PC =>), the STY opcode will be activated on the data bus. Then, when the PC stops matching the address bus (!PC =>), the PC is incremented. When the PC matches the address bus again, the command's first parameter, Param0, is activated on the data bus. Param0 is also stored in the ZPAddress register inside the FPGA. When the PC stops matching the address bus again, the PC is incremented so it will be correct for the next command. At this point the ZPAddress will be compared to the address bus instead of the PC register. Once they match (ZPAddress =>), the second parameter, Param1, will be activated on the data bus. This will be a bus-stuffing operation, since the STY operation being executed by the 6502 will be triggering a write of the $FF stored in the Y register to whatever ZP address was provided. Finally, once the ZPAddress stops matching the address bus (!ZPAddress =>), the next command will be set up and the process will repeat. If the command was the last one in the buffer, the buffers will be switched and the IRQ to the MCU will be activated.
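
To make the sequence above easier to follow, here is a rough software model of one 3-cycle write command. It is only a sketch: the type and function names (`WriteCommand`, `step`, `demoWriteCommand`) are made up for illustration, each call to `step` stands for one observed address bus value, and a returned $FF stands for "bus released" (left to the pull-ups). The real logic would live in the FPGA, not in C++.

```cpp
#include <cstdint>

// Hypothetical model of the 3-cycle register write (STY zp).
enum class Phase { Opcode, Operand, Write, Done };

struct WriteCommand {
    std::uint16_t pc;         // FPGA's private copy of the 6502 PC
    std::uint8_t  zpAddress;  // Param0, also latched as ZPAddress
    std::uint8_t  value;      // Param1, the byte to bus-stuff
    Phase phase;
    bool  matched;            // has the current target address been seen yet?

    constexpr WriteCommand(std::uint16_t startPc, std::uint8_t param0,
                           std::uint8_t param1)
        : pc(startPc), zpAddress(param0), value(param1),
          phase(Phase::Opcode), matched(false) {}

    // Address the model is currently waiting for.
    constexpr std::uint16_t target() const {
        return phase == Phase::Write ? zpAddress : pc;
    }

    // Feed one observed address; returns the byte to drive (0xFF = released).
    constexpr std::uint8_t step(std::uint16_t addr) {
        // "Stops matching" transition: advance to the next phase.
        if (phase != Phase::Done && matched && addr != target()) {
            matched = false;
            if (phase == Phase::Opcode)       { ++pc; phase = Phase::Operand; }
            else if (phase == Phase::Operand) { ++pc; phase = Phase::Write; }
            else                              { phase = Phase::Done; }
        }
        if (phase == Phase::Done || addr != target()) return 0xFF;
        matched = true;
        if (phase == Phase::Opcode)  return 0x84;       // STY zero page opcode
        if (phase == Phase::Operand) return zpAddress;  // Param0
        return value;                                   // Param1 on the write cycle
    }
};

// Walk one command through the addresses a 6502 would produce for STY $0E.
constexpr bool demoWriteCommand() {
    WriteCommand cmd(0x1000, 0x0E, 0xAA);
    return cmd.step(0x1000) == 0x84   // opcode fetch
        && cmd.step(0x1001) == 0x0E   // operand fetch
        && cmd.step(0x000E) == 0xAA   // zero page write cycle
        && cmd.step(0x1002) == 0xFF   // next fetch: this command is finished
        && cmd.phase == Phase::Done
        && cmd.pc == 0x1002;
}
```

The `matched` flag is there so that a phase only ends after its address has actually been seen once, which mirrors the "matches, then stops matching" wording above.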

 

High Level State Machine Diagram:

post-40226-0-90972300-1432261834_thumb.png


Cool! That makes plenty of sense. How do the buffer sizes correlate with the cartridge address space, do you think? I guess they wouldn't need to be very large, as they could be filled extremely quickly by your IRQ handler from a single, logical staging area in the MCU's RAM. Perhaps their size is chosen to keep your FPGA cheap. I'm imagining a situation where a C++ developer wants to think of the screen as having a frame buffer associated with it, and wants to just start plotting pixels all over the place without regard for the beam. Obviously, if you wanted to support such a thing, then we're talking 10's of kbs worth of commands and multiple buffer swaps and refills, which a lazy C++ dev might want hidden from him/herself.


The FPGA will use two small buffers that can only hold a few commands. The exact size will depend on the fpga. Minimum size will be 32 commands though. That way each buffer can hold a full scanline's worth.

 

There are two approaches to how the full frame buffer is stored in the MCU: either as buffered commands or as a bitmap. If commands are buffered, each IRQ can simply copy as many commands as the fpga buffer holds. If a bitmap is used, the irq will have to convert the pixel data into commands.

 

A good estimate for actual command size is 1 byte per CPU cycle. So a 3 cycle write will have a 3 byte command. There are 192 scan lines with 76 cycles each. A frame buffer of commands would take up about 15kb of ARM RAM. So 30kb total since it would be wise to double buffer the command frames.
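
As a quick check of that estimate, the arithmetic can be written out as compile-time constants. The names are illustrative only, and the 1 byte per cycle figure is the rough estimate from the paragraph above, not a measured value.

```cpp
#include <cstddef>

constexpr std::size_t kVisibleScanlines  = 192;
constexpr std::size_t kCyclesPerScanline = 76;
constexpr std::size_t kBytesPerCycle     = 1;   // rough estimate, see above

// Bytes of command data needed to cover the given number of scan lines.
constexpr std::size_t commandFrameBytes(std::size_t scanlines) {
    return scanlines * kCyclesPerScanline * kBytesPerCycle;
}

constexpr std::size_t kOneFrame       = commandFrameBytes(kVisibleScanlines); // 14592
constexpr std::size_t kDoubleBuffered = 2 * kOneFrame;                        // 29184
```

That works out to 14,592 bytes per command frame and 29,184 bytes when double buffered, which is in line with the roughly 15kb and 30kb figures.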

 

The best pixel kernel I'm aware of can do 48 pixels non-interlaced. Bus-stuffing might bump that up a bit though. These kernels are monochrome so only 1bpp. So a pixel buffer would be less than 3kb even with interlacing doubling the horizontal resolution.

 

I agree that the smart/lazy devs will want to program to buffer abstractions. A good framework should do nicely for that.

 

Keep in mind that those who want to push things to the limit would still be free to build their own framework which handles the creation of each command. I suspect a hybrid approach will be useful too. Similar to bB where the playfield is a bitmap and the sprites would be drawn via commands.

 

Btw, the ARM MCU will be an Atmel sam4s variant. They can have 64, 128, or 160 KB of ram. Even triple buffering commands should be no problem.


A good estimate for actual command size is 1 byte per CPU cycle. So a 3 cycle write will have a 3 byte command. There are 192 scan lines with 76 cycles each. A frame buffer of commands would take up about 15kb of ARM RAM. So 30kb total since it would be wise to double buffer the command frames.

Please do not limit your design to 192 scan lines. TVs can easily display ~10% more. Also, we have a lot of people from PAL land, where there are another ~20% more scan lines.


Please do not limit your design to 192 scan lines. TVs can easily display ~10% more. Also, we have a lot of people from PAL land, where there are another ~20% more scan lines.

Just to clarify. The software decides how many scan lines per frame. The size of the fpga buffers will not impact that. Thanks for pointing this out. I will need to keep that in mind when it comes time to design the framework software. Eventually I'll need to find someone with a pal setup to help me test it.


Eventually I'll need to find someone with a pal setup to help me test it.

 

Provided your display can handle it, you can test it out with an NTSC Atari. Just note the colors will be wrong. As an example, the PAL game Wing War:

post-3056-0-71856100-1432319285_thumb.png

 

Looks like this when played on my Atari, which was modded for S-Video and is connected to my C= 1084S monitor:

post-3056-0-98958700-1432319305_thumb.jpg

 

The monitor handles PAL refresh rates just fine, I used to play PAL demos and games all the time on my Amiga 2000HD.

Edited by SpiceWare

Cool! That makes plenty of sense. How do the buffer sizes correlate with the cartridge address space, do you think? I guess they wouldn't need to be very large, as they could be filled extremely quickly by your IRQ handler from a single, logical staging area in the MCU's RAM. Perhaps their size is chosen to keep your FPGA cheap. I'm imagining a situation where a C++ developer wants to think of the screen as having a frame buffer associated with it, and wants to just start plotting pixels all over the place without regard for the beam. Obviously, if you wanted to support such a thing, then we're talking 10's of kbs worth of commands and multiple buffer swaps and refills, which a lazy C++ dev might want hidden from him/herself.

Not necessarily, jaholmes. bB and the ASDK present framebuffers to the lazy developer where they can plot pixels in a very small memory footprint.


Not necessarily, jaholmes. bB and the ASDK present framebuffers to the lazy developer where they can plot pixels in a very small memory footprint.

 

I was just talking about the amount of command data streaming by, not the size of the buffers needed. However, a mutable 12x12 byte array--assuming you want full color pixels--is already beyond the capability of the Atari 2600 hardware. 96x192, as just a random pick, would be nearly 20kb. I'm not sure what trickery could be used to work around the need for some (relatively) fat buffers in that case--at least in the MCU address space. The command buffers obviously needn't be anywhere near that large.
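
For reference, the buffer sizes mentioned above can be worked out like this. It is only a sketch with made-up names; it assumes one byte per full color pixel and uses the 128 bytes of RAM a stock 2600 has in its RIOT chip.

```cpp
#include <cstddef>

constexpr std::size_t kConsoleRamBytes = 128;  // RAM available on a stock 2600

// Size of a frame buffer at the given resolution (default: 1 byte per pixel).
constexpr std::size_t frameBufferBytes(std::size_t width, std::size_t height,
                                       std::size_t bytesPerPixel = 1) {
    return width * height * bytesPerPixel;
}

constexpr bool fitsInConsoleRam(std::size_t bytes) {
    return bytes <= kConsoleRamBytes;
}
```

A 12x12 buffer is 144 bytes, which already exceeds the console's RAM, and 96x192 is 18,432 bytes, which has to live in the MCU's address space.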


Thinking some more about the scan line differences between pal and ntsc it occurred to me that there's no reason a game can't be compatible with both by simply auto-detecting which format and adjusting the command stream accordingly. This should work really well with games that have vertical scrolling since it would be trivial to fill the extra scan lines in pal mode.


It turns out that using the address bus changes to clock the fpga state was an awful idea. I've reworked that part of the design so the sequential logic in the fpga will be clocked by a real clock running at 50MHz or more. Each command will have its own vhdl entity which will implement a finite state machine (FSM). The DataBus outputs from all the commands will be combined with AND gates since driving the bus low is the only thing that has any effect.
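
The AND combining can be pictured with a small software model. This is just an illustration with hypothetical names: each command entity is assumed to output $FF while idle, so only the active command's zero bits reach the bus.

```cpp
#include <cstddef>
#include <cstdint>

constexpr std::uint8_t kIdleOutput = 0xFF;  // an idle command drives nothing low

// AND together the data bus outputs of all command state machines.
template <std::size_t N>
constexpr std::uint8_t combineOutputs(const std::uint8_t (&outputs)[N]) {
    std::uint8_t bus = kIdleOutput;
    for (std::size_t i = 0; i < N; ++i) {
        bus = static_cast<std::uint8_t>(bus & outputs[i]);
    }
    return bus;
}

// Example inputs: one active command among idle ones, and all commands idle.
constexpr std::uint8_t kOneActive[] = {0xFF, 0x84, 0xFF};
constexpr std::uint8_t kAllIdle[]   = {0xFF, 0xFF};
```

With one active command the result is that command's byte ($84 here), and with every command idle the result stays $FF, which leaves the bus to the pull-ups.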

 

Here's the state diagram for the ResetVector command. This command will be what sends the CPU to $1000 for its first instruction to execute. Transition labels define the condition to transition to the next state. Each state has a label followed by the output conditions. It looks pretty simple now, but it took me a long time to get there. All the other commands will follow the same basic pattern of Idle->Start->Data Sequence->Finish->Idle. Write and read commands will need to build up the expected address for states that correspond to non-ROM accesses. They will also need to interface with the buffer manager to pull in parameters.

 

post-40226-0-69717500-1432441234_thumb.png

Edited by ZackAttack

Thinking some more about the scan line differences between pal and ntsc it occurred to me that there's no reason a game can't be compatible with both by simply auto-detecting which format and adjusting the command stream accordingly. This should work really well with games that have vertical scrolling since it would be trivial to fill the extra scan lines in pal mode.

 

Considering that the cartridge drives the frame rate, how are you going to auto-detect NTSC or PAL?


Considering that the cartridge drives the frame rate, how are you going to auto-detect NTSC or PAL?

Can't the 6507 be timed to see how long it takes to execute a set of commands? Since pal runs at a slightly lower clock rate than ntsc it should take slightly longer to execute the set. Then you simply set the frame rate, scanline count and palette accordingly.


Can't the 6507 be timed to see how long it takes to execute a set of commands? Since pal runs at a slightly lower clock rate than ntsc it should take slightly longer to execute the set. Then you simply set the frame rate, scanline count and palette accordingly.

That's why I'm asking, I thought you'd figured out a way. As I understand it the clock drives everything, including the timer, so I don't see how you could time anything to detect a difference.


That's why I'm asking, I thought you'd figured out a way. As I understand it the clock drives everything, including the timer, so I don't see how you could time anything to detect a difference.

I meant a timer in the arm mcu. The arm will be running off a much faster clock so it should have enough timer resolution to discern between pal and ntsc 6507 clock speeds. To make the measurement more accurate we could fill a command buffer with many write wsync commands and time how long it takes until the next irq is triggered. 100 wsync commands would be 7600 6507 cycles, which should make for an easily measurable time difference between pal and ntsc variants.
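
Here is a rough sketch of how the MCU side of that measurement could classify the console. The clock figures are nominal values assumed for illustration (CPU clock = color clock / 3, using 3.579545 MHz for ntsc and 3.546894 MHz for pal), and the names are hypothetical. SECAM is not handled.

```cpp
#include <cstdint>

// Assumed nominal 6507 clocks in Hz (color clock divided by 3).
constexpr std::uint64_t kNtscCpuHz = 3579545 / 3;   // about 1.193 MHz
constexpr std::uint64_t kPalCpuHz  = 3546894 / 3;   // about 1.182 MHz

constexpr std::uint64_t kMeasuredCycles = 100 * 76; // 100 wsync'd scan lines

// Time in nanoseconds the 6507 needs for the given number of cycles.
constexpr std::uint64_t elapsedNs(std::uint64_t cycles, std::uint64_t hz) {
    return cycles * 1000000000ULL / hz;
}

enum class TvSystem { Ntsc, Pal };

// Classify a measured duration by comparing it to the midpoint of the two
// expected durations. The slower pal clock gives the longer time.
constexpr TvSystem classify(std::uint64_t measuredNs) {
    const std::uint64_t ntscNs = elapsedNs(kMeasuredCycles, kNtscCpuHz);
    const std::uint64_t palNs  = elapsedNs(kMeasuredCycles, kPalCpuHz);
    return measuredNs < (ntscNs + palNs) / 2 ? TvSystem::Ntsc : TvSystem::Pal;
}
```

Under these assumptions the two cases differ by a bit under 60 microseconds over 7600 cycles, which is several thousand ticks of a timer running at 100MHz or so. Using more wsync commands would widen the gap further.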


Yeah, I was wondering about what to do with SECAM. Maybe there's a variation in the hardware that can be detected. Regarding the menu, I'd only show it for pal since ntsc can be correctly identified. It also seems like a good setting to save in the arm eeprom so you only have to choose once. That could probably be done with the encore too.


The picture is pretty noisy, but that's expected considering my complete disregard for proper RF shielding and what not.

 

The noise might be a critical issue in the end. For the "normal" flash cartridge that operates with a fast clock it is sufficient to put ~1KOhm resistors between the cartridge connector and the databus [D0-D7]. This removes most of the noise. But I am a bit uncertain about your setup. You might also have to put resistors between A0-A12 and the cart connector. Most likely your noise is not because of RF shielding. It is induced by the A and D bus. Try to isolate all signals you inject into the 2600 from the power supply of your FPGA board as much as you can without breaking your design. Maybe there is a better way than resistors ... I just didn't find it yet.

 


Thanks for the tip about the resistors. After reading your post about the noise I watched the video again and noticed that the signal is only messed up for the portion of the frame where the 6502 is active. So it probably is related to switching noise. I'll try the extra resistors and post another video if it helps.


The noise might be a critical issue in the end. For the "normal" flash cartridge that operates with a fast clock it is sufficient to put ~1KOhm resistors between the cartridge connector and the databus [D0-D7]. This removes most of the noise. But I am a bit uncertain about your setup. You might also have to put resistors between A0-A12 and the cart connector. Most likely your noise is not because of RF shielding. It is induced by the A and D bus. Try to isolate all signals you inject into the 2600 from the power supply of your FPGA board as much as you can without breaking your design. Maybe there is a better way than resistors ... I just didn't find it yet.

 

Adding the 1K resistors to the data bus helped a little, but it breaks bus-stuffing. Adding them to the address bus also helped a little. After staring at the TV for a while it occurred to me that the noise appears to be the worst on cpu cycles where the ROM address space is read from. I reviewed the VHDL code and believe the problem may be there.

	if GPIO_0(28) = '1' then
		--Fetch a byte from ROM and throw it on the data bus
		FL_ADDR(11 downto 0) <= GPIO_0(27 downto 16);
		GPIO_0(7 downto 0) <= FL_DQ;
	else
		--Disable data bus driver since this is outside of ROM address space
		GPIO_0(7 downto 0) <= "11111111";
	end if;

Anytime the CPU is in the ROM address space I'm sending the address to the flash memory on the dev board and sending the flash memory data bus back to the Atari. I think I may be picking up some transient address values due to variance in the resistors that do the level translation. Since the fpga and flash memory operate much faster than the Atari it could potentially throw multiple values on the data bus in a very short period of time.

 

If this is indeed what's causing the problem that would be great. Once the command management vhdl is done the fpga will know which address value comes next and only change its data output once the next address value is detected. I could also build in a small delay to prevent it from thrashing.

 

Maybe I'll attempt to modify the existing fpga code to limit how often a new data output can occur so I can test the hypothesis sooner.

 


I modified the fpga code a bit to try to reduce changes to the data bus output. To make it easy to experiment I wired up two groups of 8 switches to configure the delay before updating data out and how long to keep updating it once the delay expired. A few clocks of the 50MHz clock was enough delay to produce a significant reduction in noise. I tried to take a video to show the difference but my camera seems to amplify the little bit of noise that's left and it ends up looking the same. Based on these results I think I can ignore any remaining noise issues until I finish out the functionality and get the first pcb prototype. I think adjustments to the fpga code will allow for most of the signal issues to be resolved. Here's the latest vhdl code:

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

----------------------------------------

entity Light is
port(
	CLOCK_50: in std_logic;	
	SW: in std_logic_vector(17 downto 0);
	LEDR: out std_logic_vector (17 downto 0);
	LEDG: out std_logic_vector (7 downto 0);
	GPIO_0: inout std_logic_vector (28 downto 0);
	FL_WE_N: out std_logic;
	FL_CE_N: out std_logic;
	FL_OE_N: out std_logic;
	FL_RST_N: out std_logic;
	FL_DQ: in std_logic_vector(7 downto 0);
	FL_ADDR: out std_logic_vector(21 downto 0)
);
end Light;  

----------------------------------------

architecture behv1 of Light is
signal prevAddr: std_logic_vector(12 downto 0);
signal count: unsigned(7 downto 0);
begin

    process(CLOCK_50)
    begin

	if(CLOCK_50'event and CLOCK_50 = '1') then
		--Enable reading from flash memory
		FL_WE_N <= '1';
		FL_RST_N <= '1';
		FL_OE_N <= '0';
		FL_CE_N <= '0';

		--Only using the lowest 4KB block of flash memory
		FL_ADDR(21 downto 12) <= "0000000000";

		if GPIO_0(28) = '1' then
			--Fetch a byte from ROM and throw it on the data bus
			FL_ADDR(11 downto 0) <= GPIO_0(27 downto 16);
			if (prevAddr = GPIO_0(28 downto 16)) then
				count <= count + 1;
			else
				count <= x"00";
			end if;
			if(count > unsigned(SW(7 downto 0)) and count < unsigned(SW(17 downto 10))) then
				GPIO_0(7 downto 0) <= FL_DQ;
			end if;
		else
			--Stuff in values on switches when PF1 is written to
			if GPIO_0(28 downto 16) = "0000000001110" then
				GPIO_0(7 downto 0) <= SW(7 downto 0);
			else
				--Disable data bus driver since this is outside of ROM address space
				GPIO_0(7 downto 0) <= "11111111";
			end if;
		end if;

		--Snoop value of SWCHA and display it on green LEDs
		if GPIO_0(28 downto 16) = "0001010000000" then
			LEDG <= GPIO_0(15 downto 8);
		end if;
		
	
		prevAddr <= GPIO_0(28 downto 16);
	end if;
	end process;

end behv1;

------------------------------------------
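
To show why the delay filters out transient addresses, here is a simplified software model of the counter logic in the ROM branch of the vhdl above. It is a sketch under assumptions: the names are hypothetical, `delay` and `limit` stand in for the two switch groups, the flash data is passed in directly, and the old counter value is used in the comparison because vhdl signals only update after the clock edge.

```cpp
#include <cstdint>

struct GatedRomOutput {
    std::uint16_t prevAddr;
    std::uint8_t  count;    // clocks the address has been stable
    std::uint8_t  driven;   // byte currently on the data bus (0xFF = released)

    constexpr GatedRomOutput() : prevAddr(0), count(0), driven(0xFF) {}

    // One rising edge of the 50MHz clock while the CPU is in ROM space.
    constexpr void clock(std::uint16_t addr, std::uint8_t romByte,
                         std::uint8_t delay, std::uint8_t limit) {
        const std::uint8_t oldCount = count;  // value seen by the comparisons
        count = (addr == prevAddr) ? static_cast<std::uint8_t>(oldCount + 1)
                                   : std::uint8_t{0};
        if (oldCount > delay && oldCount < limit) driven = romByte;
        prevAddr = addr;
    }
};

// Output after the address has been held steady for a number of clocks.
constexpr std::uint8_t drivenAfterStableClocks(int clocks) {
    GatedRomOutput out;
    for (int i = 0; i < clocks; ++i) out.clock(0x1000, 0x42, 2, 10);
    return out.driven;
}

// Output when the address flips every 3 clocks, i.e. never settles.
constexpr std::uint8_t drivenWhileThrashing(int clocks) {
    GatedRomOutput out;
    for (int i = 0; i < clocks; ++i) {
        const std::uint16_t addr = ((i / 3) % 2 == 0) ? 0x1000 : 0x1001;
        out.clock(addr, 0x42, 2, 10);
    }
    return out.driven;
}
```

With a delay setting of 2, a steady address starts driving the bus on the fifth clock in this model, while an address that keeps changing every 3 clocks never gets to drive it at all.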

  • 2 weeks later...

I just stumbled upon the ESP8266 which is a small IC with wifi support that only costs about 5 bucks USD. I'm thinking it would be a good idea to plan for a wifi expansion to the C++ cart in the future. That could allow for network play, easier updates, and maybe even an app store for the 2600.

