To infinity and beyond... (new hardware)

+Spaced Cowboy · December 18, 2018

Quick update on the board-bringup

So the first thing to do, once the ubiquitous LED has lit, is get serial i/o working on the debug port. I duly connected it all up, told the app to io_writechar(huart2, 'X'), hit run, and ... nothing happened. No X for me.

It took a bit of head-scratching to figure out the problem - I'm using the same connector as I use at work (a 10-pin JTAG pinout) because then I have access to all the cool hardware set up for board bringup. To get that to work with the open-source tools though, I'm going through a 20->10-way JTAG adapter, and that board doesn't propagate the serial line signals to the correct pins. No real problem, I soldered a couple of wires to the back of the board and connected it to a different serial->usb converter, so I still have serial output.

Next up was the second serial port. This was a screw-up on my part. The circuit is fine, up until the DB-9 port, where I routed TX to the RX pin and vice versa. I must have looked at that circuit 1000 times, and I still didn't catch it. For this revision, it's not too important: I'll just run a null-modem adapter between the board and the cable to swap the lines over.

Now that writing bare bytes to the serial port was up and running, I wanted printf() output (formatted text out). This is really useful for logging while debugging, so it's always one of the first things I bring up. A simple call to printf() produced nothing, which was odd: printf() calls into _write(), which you in turn have to implement so that it sends characters to the correct serial port via __io_putchar(). I'd done that, but a breakpoint showed that printf() never reached my code. More head-scratching...

There is the concept of a 'weakly linked stub' in programming, where a default (usually empty) chunk of code is linked in if the user doesn't supply code of the same name. I thought that somehow my own code wasn't being called, and the weakly-linked code (which does nothing in this instance) was getting the call instead. I spent quite some time figuring out that this wasn't in fact the case.

Eventually, I was poring over the assembly output and I realized the code for _write() wasn't being compiled into the binary, even though I could see it ... *right there* in the IDE. If you've figured it out, you're ahead of me... The IDE generates this code (syscalls.c) for you and puts it into the source tree, but doesn't actually compile it until you move it to a directory named 'Src'. Wonderful. So, after moving the file over, recompiling, and flashing it down to the board, we have a working printf. "Hello World" rules again.

So, serial out of the way, next to verify the clock speed was actually what I thought it was. There's a function library that comes with STM chips called the HAL (Hardware Abstraction Layer). One of the HAL_RCC_xxx calls will return the frequency in Hz, so it was just a matter of formatting the output using printf() into MHz for ease of reading...

Finally (for this post), I wanted to check out the SDRAM. Using the data sheet, I calculated the parameters for this SDRAM, running it at 100 MHz (which is the max speed for the STM32H7). Enabled byte-lanes, burst-mode, set CAS and RAS timings, and various other things. Tried to access the RAM, and ... nothing. The CPU hit a bus exception and jumped to its exception handler.

So, a bit more reading up on SDRAM - I generally use an SRAM on microcontrollers if I need more memory (I generally don't need anywhere near the memory available on this board), so I'd overlooked that you have to initialise an SDRAM with a particular sequence of commands. Laying down those commands in the correct order with the correct delays, and configuring the SDRAM MODE register to match what the STM32H7 would be sending, I could read and write to locations in the SDRAM memory space.
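
For anyone following along, the usual SDR SDRAM power-up order and the mode-register word can be sketched as below. This is a Python model rather than the firmware, the names are made up for illustration, and the burst length and CAS latency in the example call are assumed values, not necessarily the ones used on this board. The bit layout is the standard JEDEC one (bits 2:0 burst length, bit 3 burst type, bits 6:4 CAS latency, bit 9 write-burst mode).

```python
# Power-up order commonly given in SDR SDRAM data sheets. The exact delay and
# the number of auto-refresh cycles are device-specific (assumed typical here).
INIT_SEQUENCE = [
    "enable the SDRAM clock",
    "wait for the power-up delay (100 us or more, per the data sheet)",
    "PRECHARGE ALL",
    "AUTO REFRESH (repeated, typically 2 to 8 times)",
    "LOAD MODE REGISTER",
    "program the controller's refresh timer",
]

BURST_LENGTH_CODES = {1: 0b000, 2: 0b001, 4: 0b010, 8: 0b011}


def sdram_mode_register(burst_length, cas_latency, interleaved=False,
                        single_write=True):
    """Build the JEDEC SDR SDRAM mode-register word."""
    if burst_length not in BURST_LENGTH_CODES:
        raise ValueError("burst length must be 1, 2, 4 or 8")
    if cas_latency not in (2, 3):
        raise ValueError("this sketch only accepts CAS latency 2 or 3")
    word = BURST_LENGTH_CODES[burst_length]   # bits 2:0
    if interleaved:
        word |= 1 << 3                        # bit 3: burst type
    word |= cas_latency << 4                  # bits 6:4
    if single_write:
        word |= 1 << 9                        # bit 9: single-location writes
    return word


print(hex(sdram_mode_register(burst_length=1, cas_latency=3)))
```

Whatever value ends up in the mode register has to agree with the CAS latency and burst settings programmed into the memory controller, otherwise reads come back misaligned.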

The boot sequence is currently pretty short, but it looks like:

SoS Booting.
SoS Compile date: Dec 18 2018 @ 15:08:54
SoS Booting at 400.00 MHz
SoS SDRAM Memory ok at both BASE and LAST

All of which is duly printed out on boot

The remaining peripherals are:

USB (which I'm going to leave for now, getting the HID service up and running is not a trivial task)
The SD card
The video output
The SPI interfaces to the FPGA
The SPI interface to the slots
The i/o expander

But so far, so good. Nothing has gone wrong that isn't recoverable

Simon.

+Spaced Cowboy · December 21, 2018

Further updates:

The SD card interface isn't playing ball. I think there's something wrong with the middleware I was planning on using, which means I'll either be trying to debug the SD card library I have, or writing one from scratch, which is a bit disappointing. The card responds to several calls, happily tells us its configuration, type and required settings, but then a call to get the card status always returns an error. The code is a bit opaque, so I don't think it'll be too easy to fix. We'll see.

The other side of the board, of course, is the FPGA. There's a fair amount of prep-work involved in bringing up the FPGA - specifying all the i/o pins as being linked to a given named port in the top-level module via constraints, and specifying all the voltage specifications etc. On top of that, there's the actual module code to be written. In my (admittedly, limited) experience creating verilog for FPGAs, it's a good idea to have an overall diagram specifying how all these things are going to interact, so the first draft of that looks like:

There are going to be a few clock-domains implemented using the DCM tiles in the FPGA, so that I only need a single clock input (which runs at 50MHz). The figures are only estimates of what I think I'll be able to get the code to do, so they may well change...

I know I want SPI5 (input SPI from the ARM chip) to be as fast as possible so I can do oversampling and recover the clock relatively accurately.
I know the host bus interface (purple) will only need relatively slow access, since that's a <2 MHz clock on the Atari.
I also know I can synthesize a 6502 core at ~100 MHz, so that'll set the green clock speed. The ARM can receive SPI at up to 100MHz, so SPI6 can piggy-back on that.
I want the decode and Antic code to be "as fast as possible", but that'll be limited by the ANTIC part I suspect. We'll see.

Encouragingly, it's not an overly-complicated design. There are really only 4 paths through the design for data to flow, which means I can separate things nicely. I'll use wide-bit FIFOs, so I can encode data+context into a single word, and this is made easier by the FPGA allowing the use of Block-RAMs as FIFOs with widths up to 72-bits, 512 words deep... plenty for combined data+what-to-do-with-the-data information.
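
As a rough illustration of the data+context idea, here is a Python model of packing one bus event into a single FIFO word. The field layout (8-bit data, 16-bit address, 4-bit tag) and all of the names are hypothetical; only the 72-bit width comes from the description above.

```python
FIFO_WIDTH = 72  # Block-RAM FIFO word width mentioned above

# Hypothetical field layout, least-significant field first: (name, bits)
FIELDS = [("data", 8), ("addr", 16), ("tag", 4)]


def pack_word(**values):
    """Pack named fields into one integer FIFO word."""
    word, shift = 0, 0
    for name, bits in FIELDS:
        value = values.get(name, 0)
        if not 0 <= value < (1 << bits):
            raise ValueError(f"{name} does not fit in {bits} bits")
        word |= value << shift
        shift += bits
    if shift > FIFO_WIDTH:
        raise ValueError("layout is wider than the FIFO word")
    return word


def unpack_word(word):
    """Split a FIFO word back into its named fields."""
    fields, shift = {}, 0
    for name, bits in FIELDS:
        fields[name] = (word >> shift) & ((1 << bits) - 1)
        shift += bits
    return fields


word = pack_word(data=0x9C, addr=0x0231, tag=0x1)
print(hex(word), unpack_word(word))
```

With a layout like this the consumer on the far side of the FIFO never has to guess what a byte means, because the tag travels with it.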

With that out of the way, I have an input/output constraints file to write...

+Spaced Cowboy · January 25, 2019

Just a ping to say I'm still alive, I haven't given up, and it's 6:30 am and I'm posting from work, which is why there's not been ~~much~~ any progress on the board this last month.

I have a deadline to meet in a few weeks, and hopefully after that I'll find the time to carry on working on it - but between my work schedule, school activities, and planning the kids birthday next month, there's just been no time.

+Spaced Cowboy · October 15, 2019

So, just to poke my head up over the trenches for a brief moment, the project isn't abandoned... I just got the impression that it was going to turn out to be too expensive for any significant take-up. To that end, I've been (off and on, it's been a busy year) rejigging it with cost in mind from the get-go, as opposed to thinking about that at the end. It's now a 4-layer (not a 6 or 8-layer) board, and the parts are significantly cheaper. There are of course different trade-offs that have had to be made, so read on.

I've just run the latest incarnation through the cost estimator at PCBway (preferred over Seeed for cost), and got per-board prices of $99 (quantity:5), $65 (quantity:50) and $55 (quantity:100). This is for all the SMT work assembled, and includes the cost of the SMT parts. There's a fair few optional through-hole parts on the board (figure another $10 or so per board), and to keep the costs down I've been assuming that kits with the SMD work done, but with the thru-hole parts left for the buyer to do, would be acceptable. If you're not into soldering at all, I'm sure enterprising individuals would be willing to step into the gap. Closer to the time, I'll get costs for a full assembly, and we can take a view then. For now the P.o.R is to only do SMT assembly (anything with red pads on the diagram below). There's a lot of green-holed through-hole parts, but they're not all necessary - most people will only want the slots, the power-in, and maybe the 3 R-Pi connectors. One benefit is that if you just want the slots and memory capabilities, you're pretty much done once you've added the slots and (probably) a case.

The board currently looks like the below, and the basic design is to have a cable (I'm using a mini-SAS cable) from the back of the XL/XE to this motherboard. We can put the cartridge connector into that back-of-the-computer module, so there's no need to duplicate it on the motherboard. There's a CPLD to manage the fast turnaround times for the /MPD and /EXTSEL signals, and an STM32 handles the slots. Each slot has a dedicated UART with which it can talk to the STM32 whenever it wants, and the STM32 will schedule all the traffic into an ordered sequence of instructions to the host Atari. It seems to me that using a UART (which can be set to run at 115,200 or 1 Mbaud or 4 Mbaud by pulling pins low on the slot connector) is pretty much foolproof - every MCU under the sun has a UART facility

There are a few mentions of the R-Pi4 on the annotated image below, that's because there's an optional R-Pi holder on the underneath of the board on the left-hand side. You can see the extent of the Pi where the dashed/dotted line is. The Pi is connected to the STM32 via the built-in 480 Mbit/sec USB link, which the STM will use to send the video signal (that the CPLD decoded from the Atari's bus activity) down. The R-Pi4 has enough grunt (I think) to take the video data coming in over USB and zoom it up using the GPU/OpenGL to give a full-screen HDMI interface to the XL. Oh, and you get all the facilities that Linux offers too...

The drawback is that a Raspberry Pi takes longer to boot (about 8 seconds on my Pi) than the XL/XE do, so you'll have to switch on the expansion board before you switch on the computer. To be honest, I seem to remember having to do that with anything I plugged into my old ST back in the day... I think this is a reasonable trade-off for more than halving the price of the expansion kit. You pays your money, and ...

To set expectations, I'm hoping that the all-in price (including a 1GB R-Pi4, the cables to connect it in, and SMT assembly) will come to ~$130 assuming there's sufficient interest for the quantity-50 price. I'm also hoping to get myself a rather nice 3D printer this month (it's bonus week, so the yearly toy gets bought...) and I'm definitely keeping an eye on being able to print the case - this printer can do 27x15cm prints...

Anyway, that's the update. Updates are not going to be anywhere near as frequent as at the start of this thread, but the project has certainly not been forgotten

_The Doctor__ · October 15, 2019

Wow! Now that's some great progress, and coming in at a substantially better and quite a bit more affordable price point. You really have been busy!

+Spaced Cowboy · October 15, 2019

One thing I forgot to add in to the above price is the cost of the XL/XE interface board. This is pretty minimal, just a pass-through from one port-type (Atari cartridge) to another (mini-SAS), but it has some high-ish cost items on it (the mini-SAS connector is $7 and the Atari cartridge port is $4.50 from DigiKey, though given that that is through-hole, it might be a lot cheaper to source from Ebay). So figure adding another $18-ish for the board and cable.

Colleton · October 15, 2019

This sounds like a wonderful "must have" to me. What sort of things could someone do with it? A modern(ish) network connection? A real external HDD? A 16 bit CPU and RAM expansion?

+Spaced Cowboy · October 15, 2019

All of those things. If you read back in the thread, some of the things I think you can do:

"Memory Apertures" allow arbitrary remapping of the 32MB of SDRAM on-board into the 6502's memory map
We have access to the /HALT line on the bus. We also know the video requirements for the current ANTIC scanline so we can predict when ANTIC will want access. The rest of the time, we can /HALT the CPU and read/write from/to memory ourselves. The STM is the ultimate co-processor
ANTIC uses the bus to fetch the display list and display data - so we can reconstitute the video display just from the bus accesses, which are conveniently signaled by ANTIC using the /HALT line (and not the /REF line, which we also have access to). This is the basis for the video-out-to-HDMI plan
I think it might be feasible to write software-only peripherals on the R-Pi. It ought to be possible, I think, to have a program running on the R-Pi which has access (via an API) over the USB-bus to the STM32 in some standardized way, and the STM would then vend that software "Device" to the XL/XE as a peripheral sinking/sourcing data
The CPLD has an embarrassment of riches with regard to i/o pins, so I put 4 PMOD interfaces in there. All they need is a cable to be brought out to the outside world. There's also an internal expansion bus (to both CPLD and CPU) to link in anything else in the future that needs more oomph than the stock slots provide for.
Speaking of the CPLD, I have a 6502 design that runs at ~50MHz on that CPLD, and there's sufficient SRAM hanging off the CPLD to make it a full-blown co-processor...
Networking is actually pretty trivial - you need a daemon on the Linux box, and an API in cc65 (or Action! or whatever) that the N: driver can implement. Anything over open, close, read, write can be done with XIO, or you could go native and just memory-map things - so you set up a memory page, 128 bytes of buffer, 128 bytes for control structures, and the driver just reads/writes to those memory locations, which causes transactions to happen to the Pi.
Hard disks (USB-attached) are equally simple - just a buffer for data and transactional API for the transport - it's actually pretty much the same as the network model. The Pi has 2x USB-2 and 2x USB-3 ports exposed...
I've plumbed the audio lines through, so if you can get line audio into the cabling on the Atari side (trivial with a 1088XEL, the pins are right there, a bit more difficult with the stock hardware) you can have audio sent over the HDMI link, in stereo even.
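
To make the display-list point in the list above a bit more concrete: the bytes ANTIC fetches are simple enough to decode in a few lines. The sketch below is a Python model, not the board's firmware, and the function name and tuple format are made up for illustration. It follows the documented ANTIC instruction encoding (low nibble 0 is blank lines, 1 is a jump, 2 to F are display modes, and bit 6 on a display mode means a Load-Memory-Scan address follows). The example bytes are what a standard text-mode display list looks like; the addresses are only illustrative.

```python
def decode_display_list(data):
    """Decode ANTIC display-list bytes into (kind, detail) tuples."""
    out, i = [], 0
    while i < len(data):
        op = data[i]
        mode = op & 0x0F
        i += 1
        if mode == 0:
            # Blank lines: bits 6:4 hold (count - 1)
            out.append(("blank", ((op >> 4) & 0x07) + 1))
        elif mode == 1:
            # Jump; bit 6 set means "jump and wait for vertical blank"
            addr = data[i] | (data[i + 1] << 8)
            i += 2
            out.append(("jvb" if op & 0x40 else "jmp", addr))
        elif op & 0x40:
            # Display mode with LMS: a 16-bit screen address follows
            addr = data[i] | (data[i + 1] << 8)
            i += 2
            out.append(("mode+lms", (mode, addr)))
        else:
            out.append(("mode", mode))
    return out


# Start and end of a text-mode display list
print(decode_display_list([0x70, 0x70, 0x70, 0x42, 0x40, 0x9C]))
print(decode_display_list([0x02, 0x41, 0x20, 0x9C]))
```

Anything watching the bus only has to recognise these few patterns to know which later fetches are screen data and which are character data.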

Using something like a Raspberry Pi to reconstitute the display leads to some interesting possibilities - for example, when the XL is putting out one of the APAC modes, we could recognize it and render it correctly (i.e. no black lines on alternate rows). Things like that. To be honest, just having the gargantuan 32 Mbytes of RAM from the STM32 at hand could lead to some pretty freaking cool things.

I think the Pi is too good a deal to miss out on, so I expect most people to go for it, but you don't *have* to have it - the STM is perfectly capable of running the slots without the Pi being present. If you *do* go for the Pi, though, the goal is to have the best damn developer machine for the Atari that you could reasonably get in real hardware - deploying the binary can be done over the parallel bus, there are 2 HDMI ports (one for the XL display, the other a high-res linux interface for text-editing/compiling) and a relatively fast CPU to work with. I've been playing with the Pi-4 for a few days, and it really is remarkably zippy, considering its cost and form-factor.

+Spaced Cowboy · October 17, 2019

So I didn't intend to have another post this quickly, but this is kind of interesting

Since I have the bus traces from an actual 130XE, I'm writing a bus-decoder in Objective-C on the Mac before I try to do it in verilog. It's far faster to debug it in software when I can run tests as part of the build process. Once I know I have a working algorithm, I can then translate that to verilog for the CPLD.

Annnnyhoo - I noticed something odd...

Under no circumstance was bit 4 of the address vector ever being set. I thought that was a bit weird, so this morning I went and wired everything up again (40 flying leads!) in case I'd missed one somehow in the previous run. Nope. I still didn't get anything on A4 (the low bit of the second nibble). So I wondered whether the analyser interface wire was broken, and changed out the flying lead to a different port (this thing can sample 256 lines in parallel at 2ns resolution!) - still nothing on A4.
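
A check like "was this bit ever set?" is easy to automate over a whole capture. The following is a minimal Python sketch of that idea, not the decoder described in this thread; the function name is made up and the sample addresses are invented for illustration.

```python
def stuck_bits(addresses, width=16):
    """Return (never_high, never_low) bit masks over the sampled addresses."""
    mask = (1 << width) - 1
    seen_high = 0
    seen_low = 0
    for addr in addresses:
        seen_high |= addr          # bits that were 1 at least once
        seen_low |= ~addr & mask   # bits that were 0 at least once
    return (~seen_high & mask, ~seen_low & mask)


# Invented samples in which A4 never goes high
samples = [0xFFCC, 0xFFCD, 0xFFCE, 0xFFCF, 0xFFE0, 0xFFE1, 0x0203, 0x002A]
never_high, never_low = stuck_bits(samples)
print(f"never high: {never_high:#06x}, never low: {never_low:#06x}")
```

A bit that shows up in either mask over a long enough capture is worth a look with the continuity tester.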

So I un-wired everything, took the interface board ...


back to my desk and used a continuity tester to make sure that A4 (3rd pin along on the top row) was connected to the pin correctly, and there hadn't been a fabrication error. Yep, that beeped. So then I double-checked that it *was* in fact the third pin along on the top row that I ought to be sampling for A4, which seems to be the case.

Which leads me to believe that my 130XE is buggered, or at least its A4 line to the outside world is - unless there is some unbeknownst-to-me reason why A4 should never be high?

As an aside, this version of the breakout board has /REF and /HALT brought out onto it, and also vends a cartridge port interface, so I can plug in a cartridge and look at how /MPD and /EXTSEL work when something is actually plugged in.

Here's ANTIC doing the RAM refresh...

... It looks as though /HALT is asserted one clock before /REF is, and the bus refresh address changes when /REF is active. This is how I found the problem, actually - I was looking at Antic counting up the columns, and it does 9 (weirdly, not 8..) accesses within about every 64 uS, heavily front-loaded within those 64 uS, meaning it cycles through refreshing all 256 rows roughly every 1.8 ms (256 / 9 is about 28 lines of 64 uS each). Looking at the address though, it went $FFCC $FFCD $FFCE $FFCF $FFE0 $FFE1 (so the $xxCx ought to have been $xxDx) and looking around, I saw it was a pattern. A previous sequence was $FFAE $FFAF $FFC0, ... where the $xxAx really ought to have been $xxBx.
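
The odd-looking sequences are exactly what a plain incrementing counter looks like when A4 is forced low. A small Python sketch (the helper name is made up) reproduces both of them:

```python
A4 = 1 << 4


def seen_with_a4_stuck_low(addr):
    """What a 16-bit address looks like if A4 can never go high."""
    return addr & ~A4 & 0xFFFF


# The counter really runs $FFDC..$FFE1, but A4 never reaches the analyser
observed = [seen_with_a4_stuck_low(a) for a in range(0xFFDC, 0xFFE2)]
print(" ".join(f"${a:04X}" for a in observed))

# Same for the earlier sequence: $FFBE $FFBF $FFC0
earlier = [seen_with_a4_stuck_low(a) for a in (0xFFBE, 0xFFBF, 0xFFC0)]
print(" ".join(f"${a:04X}" for a in earlier))
```

The first line prints $FFCC $FFCD $FFCE $FFCF $FFE0 $FFE1 and the second $FFAE $FFAF $FFC0, matching the captures.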

So, I think I have a broken external port. I've got my 1088XEL and I'll bring that in and see if I see the same thing (which would be *very* weird) and go from there...

+mytek · October 17, 2019

This is getting very interesting. Can't wait for the mystery of A4 to be revealed.

+Nezgar · October 18, 2019

8 hours ago, Spaced Cowboy said:

So, I think I have a broken external port.

I have one 130XE that had a pin that buckled inwards in the cartridge port. I'm not sure when it happened, but I'm always very careful now inserting cartridges without a rounded edge connector, maybe the sharp edge of a cartridge PCB caught the edge of the pin in the cartridge port... but I was lucky as there are two "fingers" it seems for each contact, so after carefully bending the borked one out of the way, the other half still worked, so I didn't have to replace the whole connector...

Maybe closely inspect your cartridge port to check that all the pins look OK?

_The Doctor__ · October 18, 2019

Half a finger is never a good contact situation; a full two fingers ensures the least resistance and capacitance, providing full drive and flow.

I'd work on flexing that other finger back into shape, it'll provide a much happier connection, ensuring many more insertions with the most number of varied attachment successes.

Edited October 18, 2019 by _The Doctor__

+Spaced Cowboy · October 18, 2019

So, it seems my 130XE is indeed ~~expletive~~ (ahem) ... not fully functional, at least with respect to its external address bus. Here's the screenshot of a 1088XEL trace:

... where I happened to catch Antic doing a refresh of the 256th column, so the address bus is all-on. This image also shows Antic doing memory-fetches for the graphics data, immediately after the DRAM refreshes.

I am a little confused over the accesses at the moment though, it looks as though Antic's memory accesses are to:

$F30F - unknown

$02FC - Internal hardware value for last key pressed?

$F310 .. $F311 .. $F312 .. $F313 .. $F314 .. all unknown

$F3FD - unknown

$F2FD .. $F2FE .. $F2FF .. $F300 .. $F301 .. all unknown

All of these seem to be inside the OS ROM (and not where the charset is at $E000 .. $E3FF) unless my edition of "Mapping The Atari" is out of date, or unless the XEL replaces the ROM somehow?

And yes, I checked that the low bit of the high nibble of the address wasn't always being set - once bitten, twice shy

Still, the reality is that the traces are what they are - so I guess I just have to figure out what they mean now

+Spaced Cowboy · October 19, 2019

Aaaand edit: I'm an idiot.

The address-bus values *between* those above were constant at $E001, so I was assuming that was the CPU (being halted every other clock) constantly accessing the same location. That's not the case. Looking at the traces after Antic has stopped /HALTing the 6502,

... it's clear to see that the CPU is what is accessing $F306,7,8,9,A,... Not Antic. Which means Antic is accessing the $E001 address a lot.

On the XE, $E000 is the standard domestic character-set offset, and I'm assuming the XEL is booting into BASIC (because I didn't have a keyboard or screen attached) and getting the character-based screen data. Looking at other access patterns, it seems this is Antic reading a byte representing a row in a character (in this case the next-to-top row of a space character, whose data is stored at $E000 through $E007). There are other sequences where Antic holds /HALT low for 80 clocks at a time, interleaving 40 read-character-at-screen-position with 40 bytes describing this scan-line's representation of that character.
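
The address arithmetic behind that is simple: each glyph is 8 bytes, one per scan-line, so the fetch address is the character-set base plus 8 times the screen code plus the row. A Python sketch of it follows; the function names are made up for illustration and the 40-column figure is the one quoted above.

```python
CHBASE = 0xE000        # character-set base quoted above
BYTES_PER_GLYPH = 8    # one byte per scan-line of the character
COLUMNS = 40           # characters per text line


def glyph_row_address(code, row, chbase=CHBASE):
    """Address Antic reads for scan-line `row` of screen code `code`."""
    if not 0 <= code < 128 or not 0 <= row < BYTES_PER_GLYPH:
        raise ValueError("code must be 0..127 and row 0..7")
    return chbase + code * BYTES_PER_GLYPH + row


def split_glyph_address(addr, chbase=CHBASE):
    """Inverse: recover (code, row) from a character-data fetch address."""
    return divmod(addr - chbase, BYTES_PER_GLYPH)


print(hex(glyph_row_address(0, 1)))      # space (screen code 0), second row
print(split_glyph_address(0xE001))
print(COLUMNS + COLUMNS)                 # screen fetches + glyph fetches per line
```

So a run of $E001 fetches is just a line of spaces being drawn, and 40 screen bytes plus 40 glyph bytes accounts for the 80-clock /HALT stretches.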

It's all becoming a bit clearer now

This is all going to feed into a script I'm writing which will take these bus-traces as input and attempt to parse out the screen display from them based on triggering callbacks from events it discovers in the trace sequences (clk going high, /halt going low etc.). Doing it in software is a lot easier than doing it in hardware, and I can then just port the algorithm to the hardware world once I know it works in software
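
The event/callback structure described here can be sketched in a few lines. This is a Python illustration of the pattern only, not the actual script; the class, the event names and the sample format (a dict with clk, halt_n and addr) are all assumptions.

```python
class TraceParser:
    """Turn a stream of bus samples into edge events and dispatch callbacks."""

    def __init__(self):
        self.handlers = {}
        self.prev = None

    def on(self, event, callback):
        self.handlers.setdefault(event, []).append(callback)

    def _fire(self, event, sample):
        for callback in self.handlers.get(event, []):
            callback(sample)

    def feed(self, sample):
        prev = self.prev
        if prev is not None:
            if not prev["clk"] and sample["clk"]:
                self._fire("clk_rising", sample)
            if prev["clk"] and not sample["clk"]:
                self._fire("clk_falling", sample)
            if prev["halt_n"] and not sample["halt_n"]:
                self._fire("halt_asserted", sample)
        self.prev = sample


events = []
parser = TraceParser()
parser.on("clk_rising", lambda s: events.append(("clk_rising", s["addr"])))
parser.on("halt_asserted", lambda s: events.append(("halt_asserted", s["addr"])))

# Tiny invented trace: one CPU cycle, then Antic halts the CPU
trace = [
    {"clk": 0, "halt_n": 1, "addr": 0xF306},
    {"clk": 1, "halt_n": 1, "addr": 0xF306},
    {"clk": 0, "halt_n": 0, "addr": 0xE001},
    {"clk": 1, "halt_n": 0, "addr": 0xE001},
]
for sample in trace:
    parser.feed(sample)
print(events)
```

Keeping the edge detection separate from the handlers is what makes the later port easier: each callback becomes a branch in the hardware state machine.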

dmsc · October 20, 2019

Hi!

On 10/18/2019 at 9:27 PM, Spaced Cowboy said:

Aaaand edit: I'm an idiot.

The address-bus values *between* those above were constant at $E001, so I was assuming that was the CPU (being halted every other clock) constantly accessing the same location. That's not the case. Looking at the traces after Antic has stopped /HALTing the 6502,

... it's clear to see that the CPU is what is accessing $F306,7,8,9,A,... Not Antic. Which means Antic is accessing the $E001 address a lot.

On the XE, $E000 is the standard domestic character-set offset, and I'm assuming the XEL is booting into BASIC (because I didn't have a keyboard or screen attached) and getting the character-based screen data. Looking at other access patterns, it seems this is Antic reading a byte representing a row in a character (in this case the next-to-top row of a space character, whose data is stored at $E000 through $E007). There are other sequences where Antic holds /HALT low for 80 clocks at a time, interleaving 40 read-character-at-screen-position with 40 bytes describing this scan-line's representation of that character.

It's all becoming a bit clearer now

Look at the diagrams in the Altirra Hardware Reference Manual ( http://www.virtualdub.org/downloads/Altirra Hardware Reference Manual.pdf ); this is the DMA pattern for MODE 2, with HSCR=0:

As for your problem with A4, I would try to put a pull-up on the line and see if it behaves, as the CPU outputs have very little internal pull-up capability - the levels are mostly TTL compatible.

Have Fun!

+Spaced Cowboy · October 22, 2019

I wanted to test out PCBway as an alternative to Seeed for assembly, so I sent off the board that will plug into the back of the XL/XE as a simple test of the procedures etc. It's just come back and it looks pretty nicely done:

The back row is for test purposes, it's just the pinouts of the port. The front holes are the cartridge expansion (probably through a right-angle connector) and audio-in (for the XEL or anywhere else you can get stereo line-level audio). The connector on the side is a surface-mount mini-SAS connector with a guide-shield for cable insertion. It comes like this, and then you break off the plastic clip and it turns into a two-piece part. The mini-SAS cable looks like

and is really intended to be an internal cable, but it has a locking snap on the top, and I think it'll work well. I get 36 wires down a relatively flexible connector, and it's a lot easier to work with than an IDC cable. When you snap off the plastic cover, the board looks like:

and connecting it all together, it looks like:

+KlasO · October 22, 2019

I am really excited about what will come out of your project!

+Spaced Cowboy · October 29, 2019

All that sampling is turning out to be very useful. I've got some verilog code now:

`timescale 1ns/1ns

////////////////////////////////////////////////////////////////////////////////
// States that the bus-monitor can be in
////////////////////////////////////////////////////////////////////////////////
`define 	BS_NUM		2

`define		BS_WAIT_FALLING_CLOCK			`BS_NUM'h0
`define 	BS_WAIT_READ_ADDR				`BS_NUM'h1
`define 	BS_WAIT_RISING_CLOCK			`BS_NUM'h2
`define 	BS_WAIT_READ_DATA				`BS_NUM'h3

////////////////////////////////////////////////////////////////////////////////
// Bus monitor module. 
//
// Todo: 
// - Make it manage writes as well as reads
// 	- Take input from external mux for what to write to bus
// - Handle setting IRQ appropriately
// - Handle memory apertures and detection
////////////////////////////////////////////////////////////////////////////////
module busa8
	(
	input				clk,			// CPLD clock @ 100 MHz
	input				a8clk,			// A8 clock @ ~1.8MHz
	input				rw_n,			// A8 read/write signal
	input				halt_n,			// A8 /HALT signal
	input				irq_n,			// A8 /IRQ signal
	input				rd5,			// A8 rd5 cartridge signal
	input				s5_n,			// A8 /S5 cartridge select
	input				rsrvd,			// unused
	input				cctl_n,			// A8 /CCTL signal
	input	[15:0]		addr,			// A8 address bus
	input				extsel_n,		// A8 /EXTSEL signal
	input	[7:0]		data,			// A8 data bus
	input				rst_n,			// A8 /RST signal
	input				rd4,			// A8 rd4 cartridge signal
	input				s4_n,			// A8 /S4 signal
	input				mpd_n,			// A8 Math-Pak Disable (/MPD) signal
	input				ref_n,			// A8 Dram refresh (/REF) signal
	input				D1xx_n,			// A8 access to $D1xx
	
	output reg	[15:0]	busAddr,		// buffered address for this cycle
	output reg	[7:0]	busData,		// buffered data for this cycle
	output reg			busDlist,		// busData is display-list data
	output reg			busScreen,		// busData is screen memory data 
	output reg			busChar,		// busData is character-data
	output 				busDram,		// in a dram-refresh cycle
	output 				busHalt			// in a /HALT cycle
	);

    ////////////////////////////////////////////////////////////////////////////
    // Local state
    ////////////////////////////////////////////////////////////////////////////

	// Display-list related
	reg	[15:0]	dlistAddr;				// Current address of the display list
	reg	[15:0]	screenAddr;				// Address of next screen byte
	reg			lmsLo;					// read the LMS low byte
	reg			lmsHi;					// Read the LMS high byte
	
	// Bus timing related, start with clock going low
	reg	[4:0]	delay;					// Clocks to wait until doing something
	
	reg 		inRefresh;				// Whether we're in a dram refresh cycle
	reg	[1:0]	inHalt;					// Whether we're in a halt cycle
	
    ////////////////////////////////////////////////////////////////////////////
    // State machine for the video detection
    ////////////////////////////////////////////////////////////////////////////
	reg 	[`BS_NUM-1:0]	busState;	// State machine

    ////////////////////////////////////////////////////////////////////////////
    // Sync a8clk to the FPGA clock using a 3-bit shift register to avoid
    // metastability due to the different clock rates
    ////////////////////////////////////////////////////////////////////////////
	reg [2:0] clkDetect;  
	always @(posedge clk) 
		if (rst_n == 1'b0)
			clkDetect <= 3'b0;
		else
			clkDetect <= {clkDetect[1:0], a8clk};

    ////////////////////////////////////////////////////////////////////////////
    // We want to know about rising/falling edges to handle bus traffic timing
    ////////////////////////////////////////////////////////////////////////////
	wire clkRising 			= (clkDetect[2:1]==2'b01);   
	wire clkFalling 		= (clkDetect[2:1]==2'b10);  

    ////////////////////////////////////////////////////////////////////////////
    // map the refresh and halt signals
    // halt cycles are one-clock delayed, refresh is this cycle
    ////////////////////////////////////////////////////////////////////////////
	assign busDram			= inRefresh;
	assign busHalt			= inHalt[1];
	
    ////////////////////////////////////////////////////////////////////////////
    // Monitor the bus 
    ////////////////////////////////////////////////////////////////////////////
	always @ (posedge clk)
		if (rst_n == 1'b0)
			begin
				busState 		<= `BS_WAIT_FALLING_CLOCK;
				delay			<= 5'h0;
				busData			<= 8'b0;
				busAddr			<= 16'h0;
				busDlist		<= 1'b0;
				busScreen		<= 1'b0;
				busChar			<= 1'b0;
				inHalt			<= 2'b0;
				inRefresh		<= 1'b0;
			end
		else
			begin
				case (busState)
					// Everything is synced off the falling 8-bit clk, where
					// all the signals are reset
					`BS_WAIT_FALLING_CLOCK:
						begin
							// At the start of the clock cycle, reset things
							busDlist 		<= 1'b0;
							busScreen		<= 1'b0;
							busChar			<= 1'b0;
							if (clkFalling)
								begin
									delay	 		<= 5'h12;
									busState 		<= `BS_WAIT_READ_ADDR;
								end
						end
				
					// We've waited 180ns, sufficient for the address to be
					// stable on the bus, and the /HALT and /REF signals to 
					// be asserted
 					`BS_WAIT_READ_ADDR:
						begin
							delay <= delay -1;
							if (delay == 5'h0)
								begin
									busState 		<= `BS_WAIT_RISING_CLOCK;
									busAddr			<= addr;
									inHalt			<= {inHalt[0],!halt_n};
									inRefresh		<= !ref_n;
									delay			<= 5'h0D;
								end
						end
				
					// We now re-sync to the rising clock signal rather than
					// dead-reckon
					`BS_WAIT_RISING_CLOCK:
						begin
							if (clkRising)
								begin
									// Halt is 1-cycle delayed
									delay	 		<= 5'h12;
									busState 		<= `BS_WAIT_READ_DATA;
								end
						end
						
					
					// Wait another 180ns after the rising edge, sufficient for
					// the data to be stable on the bus, then classify the cycle
					`BS_WAIT_READ_DATA:
						begin
							delay <= delay - 1;
							if (delay == 5'h0)
								begin
									// Writes to SDLSTL/SDLSTH ($0230/$0231)
									// update our copy of the display-list address
									if (busAddr == 16'h0230)
										begin
											dlistAddr[7:0]	 <= data;
											$display("Set DL:Lo to %x", data);
										end
									else if (busAddr == 16'h0231)
										begin
											dlistAddr[15:8]	<= data;
											$display("Set DL:Hi to %x", data);
										end
									else if (addr == dlistAddr)
										begin
											$display("DL data %x @ %x", data, dlistAddr);
											busData 		<= data;
											busDlist		<= 1'b1;
											dlistAddr 		<= dlistAddr+1;
										end
									else if (addr == screenAddr)
										begin
											$display("Screen data %x @ %x", data, screenAddr);
											busData 		<= data;
											busScreen		<= 1'b1;
										end
									else if (inHalt[1] && !inRefresh)
										begin
											$display("Char data %x @ %x", data, busAddr);
											busChar			<= 1'b1;
											busData			<= data;
										end

									busState <= `BS_WAIT_FALLING_CLOCK;
								end
						end
				endcase
			end
			
    ////////////////////////////////////////////////////////////////////////////
    // Handle the Load-Memory-Scan instructions in the display-list stream
    ////////////////////////////////////////////////////////////////////////////
	always @ (posedge clk)
		if (rst_n == 1'b0)
			begin
				lmsLo						<= 1'b0;
				lmsHi						<= 1'b0;
				screenAddr					<= 16'h0;
			end
			
		else if (busDlist)
			begin
				if (busData[3:0] == 0 && (lmsLo == 1'b0) && (lmsHi == 1'b0))
					begin
						lmsLo				<= 1'b0;
						lmsHi				<= 1'b0;
					end	
					
				else if (lmsHi)
					begin
						screenAddr[15:8] 	<= busData;
						lmsHi				<= 1'b0;
						$display("Screen: %x%x", busData, screenAddr[7:0]);
					end
				
				else if (lmsLo)
					begin
						screenAddr[7:0]		<= busData;
						lmsHi 				<= 1'b1;
						lmsLo 				<= 1'b0;
					end
				
				else
					begin
						lmsLo				<= busData[6];
					end
			end
			
		else if (busScreen)
			begin
				screenAddr 	<= screenAddr + 1;
			end
endmodule

... that captures the state of the bus in a manner useful for interpreting the bus traffic as ANTIC data. The above code gets me results like this for the top of the power-on BASIC-enabled screen (where it just prints READY):

DL data 41 @ 9c3d
DL data 20 @ 9c3e
DL data 9c @ 9c3f
Screen: 9c20
Set DL:Hi to 9c [9c40]
Set DL:Lo to 20 [9c40]
DL data 70 @ 9c20
DL data 70 @ 9c21
DL data 70 @ 9c22
DL data 42 @ 9c23
DL data 40 @ 9c24
DL data 9c @ 9c25
Screen: 9c40
Screen data 00 @ 9c40
Screen data 00 @ 9c41
Screen data 00 @ 9c42
Screen data 00 @ 9c43
Screen data 00 @ 9c44
Screen data 00 @ 9c45
Screen data 00 @ 9c46
Screen data 00 @ 9c47
Screen data 00 @ 9c48
Screen data 00 @ 9c49
Screen data 00 @ 9c4a
Screen data 00 @ 9c4b
Screen data 00 @ 9c4c
Screen data 00 @ 9c4d
Screen data 00 @ 9c4e
Screen data 00 @ 9c4f
Screen data 00 @ 9c50
Screen data 00 @ 9c51
Screen data 00 @ 9c52
Screen data 00 @ 9c53
Screen data 00 @ 9c54
Screen data 00 @ 9c55
Screen data 00 @ 9c56
Screen data 00 @ 9c57
Screen data 00 @ 9c58
Screen data 00 @ 9c59
Screen data 00 @ 9c5a
Screen data 00 @ 9c5b
Screen data 00 @ 9c5c
Screen data 00 @ 9c5d
Screen data 00 @ 9c5e
Screen data 00 @ 9c5f
Screen data 00 @ 9c60
Screen data 00 @ 9c61
Screen data 00 @ 9c62
Screen data 00 @ 9c63
Screen data 00 @ 9c64
Screen data 00 @ 9c65
Screen data 00 @ 9c66
Screen data 00 @ 9c67
DL data 02 @ 9c26
Screen data 00 @ 9c68
Screen data 00 @ 9c69
Screen data 32 @ 9c6a
Screen data 25 @ 9c6b
Screen data 21 @ 9c6c
Screen data 24 @ 9c6d
Screen data 39 @ 9c6e
Screen data 00 @ 9c6f
Screen data 00 @ 9c70
Screen data 00 @ 9c71
Screen data 00 @ 9c72
...

... which might not mean much until you realise that the '32','25','21','24','39' spell out READY in the internal character set.
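To make that mapping concrete, here is a minimal C sketch of the internal-screen-code to ATASCII conversion. The function name is made up for illustration and isn't part of the project; the inverse-video bit (bit 7) is simply masked off.

```c
#include <stdint.h>

/* Hypothetical helper, not from the project: convert an Atari internal
 * screen code to its ATASCII value. Bit 7 (inverse video) is dropped. */
uint8_t internal_to_atascii(uint8_t code)
{
    uint8_t c = code & 0x7F;
    if (c < 0x40)
        return (uint8_t)(c + 0x20);   /* 0x00-0x3F -> 0x20-0x5F */
    if (c < 0x60)
        return (uint8_t)(c - 0x40);   /* 0x40-0x5F -> 0x00-0x1F */
    return c;                         /* 0x60-0x7F are unchanged */
}
```

Feeding it 0x32, 0x25, 0x21, 0x24, 0x39 gives 'R', 'E', 'A', 'D', 'Y', and the 0x00 bytes around them come out as spaces.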

The bus signals look like this:

[image: bus signal trace]

... which is exactly what I'd expect. I need to merge in the fetch-character-data that's also happening (I've done it in a different source file) and I'd be able to reconstitute a GRAPHICS 0 screen.
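For reference, the "DL data" bytes in the trace (70, 42, 02, 41) can be decoded with something like the C sketch below. It is a software model of the ANTIC instruction layout as it is usually documented, not a translation of the Verilog above, and the type and function names are invented for the example.

```c
#include <stdint.h>

/* Illustrative names only; not taken from the project. */
typedef enum { DL_BLANK, DL_JUMP, DL_JVB, DL_MODE } dl_kind;

typedef struct {
    dl_kind kind;
    uint8_t mode;         /* ANTIC mode 2..15 for DL_MODE, else 0        */
    uint8_t blank_lines;  /* 1..8 for DL_BLANK, else 0                   */
    uint8_t operands;     /* address bytes following the opcode: 0 or 2  */
} dl_insn;

dl_insn dl_decode(uint8_t b)
{
    dl_insn i = { DL_MODE, 0, 0, 0 };
    uint8_t low = b & 0x0F;

    if (low == 0) {
        /* Blank lines: bits 6..4 hold (count - 1) */
        i.kind = DL_BLANK;
        i.blank_lines = (uint8_t)(((b >> 4) & 0x07) + 1);
    } else if (low == 1) {
        /* Jump; bit 6 set means jump and wait for vertical blank */
        i.kind = (b & 0x40) ? DL_JVB : DL_JUMP;
        i.operands = 2;
    } else {
        /* Display mode; bit 6 is Load-Memory-Scan (2 address bytes follow) */
        i.mode = low;
        i.operands = (b & 0x40) ? 2 : 0;
    }
    return i;
}
```

So 70 is eight blank scan-lines, 42 is a mode-2 text line with an LMS address (the 40 9c that follows), 02 is a plain mode-2 line, and 41 is the jump-and-wait at the end of the list.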

[Edit]

And here's the character data being fetched for the second pixel-row of READY; the first pixel row is all zeros...

Char data 00 @ e001
Char data 00 @ e001
Char data 00 @ e191
Char data 7c @ e129
Char data 7e @ e109
Char data 18 @ e121
Char data 78 @ e1c9
Char data 66 @ e001
Char data 00 @ e001
Char data 00 @ e001
Char data 00 @ e001
Char data 00 @ e001
Char data 00 @ e001
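
Those addresses follow the usual text-mode pattern: character-set base, plus eight bytes per glyph, plus the pixel row. A small C sketch, assuming the character set sits at 0xE000 (which is what the trace shows); the function name is illustrative only.

```c
#include <stdint.h>

/* Illustrative only: address fetched for one pixel row of one glyph.
 * 'code' is the internal screen code, 'row' is the pixel row 0..7. */
uint16_t glyph_row_addr(uint16_t chbase, uint8_t code, uint8_t row)
{
    return (uint16_t)(chbase + ((uint16_t)(code & 0x7F) << 3) + (row & 0x07));
}
```

With row 1, the codes 0x32, 0x25, 0x21, 0x24, 0x39 land on e191, e129, e109, e121 and e1c9, matching the trace, and the blank (0x00) characters land on e001.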

_The Doctor__ · April 19, 2020

I spaced out, can we get an update?

Kyle22 · April 20, 2020

8 hours ago, _The Doctor__ said:

I spaced out, can we get an update?

Nice NecroBump Doc. I nearly forgot about this one.

Thx.

+Spaced Cowboy · April 20, 2020

Much as I would love to give some positive news on that, with the current status of the CV19 pandemic, I've had to change jobs. My last day is in fact tomorrow, and it's been (understandably, I hope you can agree) very hectic over the last few months preparing for an eventuality that I could see coming a mile off.

The family, and keeping bread on the table, come first, so this is shelved for the immediate future, sorry. I'd love to keep it going but there's not much call for old guys in Silicon Valley as it is, so I just don't have the time right now - I simply *have* to make sure that the new job is a success.

_The Doctor__ · July 9, 2020

@Spaced Cowboy ,

Did eye spy a May 19th update?

http://pbxl.oobergeek.net/progress/?post=memories

I did come across a PBI conditioning post by someone here that shortened some signal widths and cleaned things up a bit. I bet that, combined with buffering built into the ECI adapter or PBI connector cable, things will be robust and fly.

Somehow I think you've found a way to deal with all that sort of stuff already though..

-congrats

-j

+Spaced Cowboy · October 2, 2022

I feel, posting to this thread, that I ought to be dressed in cleric garb. Resurrection is something not lightly undertaken, and the consequences could be dramatic.

But it's been a while; life, as it is wont to do, has waned and waxed. Currently the waxing is in the ascendant, and I'm thinking that the light at the end of the tunnel might not in fact be an oncoming train...

So, the electronics scene has also changed dramatically over the last 2 years, and plans made then don't really translate well to plans that could be acted on today. On the other hand, there are new chips / parts continually being launched, and exploring the undiscovered country of the lesser-known MCUs and parts is kind of exciting in itself. I've blathered enough - there's a picture attached. It doesn't quite live yet, but there's life in the ~~old dog~~ completely new design yet...

Basics are the same:

Attaches to the PBI (or in the case below, the ECI/Cartridge on the back of the XE)
Provides memory (8MB in the design below, integrated into the FPGA) which can be swapped in and out of XE/XL memory space on a page-by-page basis
Provides HDMI output, in this case attached directly to the green FPGA daughterboard.

What's new:

The heavy lifting is done using the Tang Nano 9K from Gowin/Sipeed. This has the 8MB of PSRAM built-in, and 9k LUTs, which is a reasonable amount. It can directly drive an HDMI display with suitable HDL (to give an idea, the current video/audio HDMI design takes about 6% of that resource). These are about $15 from AliExpress.
Attached to the FPGA is an RP2040, which is an awesome little microcontroller, not in short supply, and ridiculously cheap. I have 2 8-bit busses linking it directly to the FPGA for full-duplex communication, it has 2 cores which can easily manage 200MHz each, and it also controls a SPI-based ethernet controller.
Ah yes, ethernet. More on that below.
Audio-in. The RP2040 has ADCs on it, and the last port on the back is a 3.5mm stereo jack. Got to have audio-in if I'm going to be piping audio through the HDMI connection.
The price of the BOM ought to be lower than in the past - that Tang Nano 9K packs quite the punch, and the RP2040 is only $1. There are several ancillary things, of course, and the PCB has to be made/assembled, but even with all that, there are far fewer expensive bits in this iteration.

The big new idea is that although "slots" are awesome, and still definitely something I want to see happen, splitting the project into two parts might make it a bit more feasible to get out the door. The success of FujiNet (and that could definitely be incorporated) shows that just connecting the 8-bits up to the internet in a useful way is a huge win. I plan to have a standardized "slot" service that you can run on a PC/Mac/Linux box (maybe even just a PHP script inside a web server) which will let you have "soft" peripherals running on a different computer, connected by ethernet. When part 2 comes along, and the real slots appear, they'll just be a dedicated device-with-ethernet that can pump out data in the same way. A 100BASE-T ethernet connection can completely saturate an XL/XE almost instantly, so tethering hardware slots to the computer isn't really necessary any more.

Another change is that I'm not trying to take over the bus at all. I figure that if you want some off-device "peripheral" to write directly into the 8-bit's memory, it's entirely possible to just set up a memory-aperture (remember those?) to make accesses to those locations come from the FPGA memory, and write directly into that FPGA memory instead. Much easier and less fraught - no need to take over the bus, and push to RAM, just let the XL/XE come get it from FPGA instead without it even realizing. Partly this is "KISS", partly it's pragmatism: there aren't many I/Os on that daughterboard, and simplifying things means fewer control pins needed for output-enable and direction control.
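
As a rough illustration of the aperture idea, here is a software model in C of a page-by-page mapping table. Everything in it is an assumption made for the sketch: 256-byte pages, one table entry per 6502 page, the 8MB of FPGA memory addressed as 32768 pages, and all of the names. The real thing would live in HDL and may look nothing like this.

```c
#include <stdint.h>
#include <stdbool.h>

#define PAGE_SIZE   256u                               /* assumed page size   */
#define HOST_PAGES  256u                               /* 64KB / 256 bytes    */
#define FPGA_PAGES  (8u * 1024u * 1024u / PAGE_SIZE)   /* 8MB -> 32768 pages  */

/* Hypothetical table entry: is this 6502 page redirected, and to where? */
typedef struct {
    bool     mapped;
    uint16_t fpga_page;
} aperture_entry;

static aperture_entry aperture[HOST_PAGES];

/* Redirect one 6502 page to a page of FPGA memory. */
bool aperture_map(uint8_t host_page, uint16_t fpga_page)
{
    if (fpga_page >= FPGA_PAGES)
        return false;
    aperture[host_page].mapped = true;
    aperture[host_page].fpga_page = fpga_page;
    return true;
}

/* Hand the page back to the machine's own memory. */
void aperture_unmap(uint8_t host_page)
{
    aperture[host_page].mapped = false;
}

/* Returns true and fills *fpga_addr if the FPGA should answer this access;
 * returns false if the access belongs to the XL/XE's own memory. */
bool aperture_translate(uint16_t addr, uint32_t *fpga_addr)
{
    const aperture_entry *e = &aperture[addr >> 8];
    if (!e->mapped)
        return false;
    *fpga_addr = ((uint32_t)e->fpga_page << 8) | (addr & 0xFFu);
    return true;
}
```

An off-device "peripheral" would then write into FPGA memory at the translated address, and the 8-bit simply reads it through the mapped page.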

I have a video-out design that synthesizes (nothing actually real yet) which will produce 640x480 in 24-bit colour at 60fps. Video will be constructed by snooping on the bus traffic, so I'll need to implement at least some of Antic. That video output is audio and video, and synthesizes to a healthy 40MHz pixel clock, well in excess of the 27MHz I'd need for a 640x480 image.
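
As a sanity check on the clock figures, the required pixel clock is just total pixels per line times total lines times frame rate, where the totals include blanking. A tiny C sketch; the raster totals in the note below are the commonly published ones and are an assumption here, since the post doesn't say which timing the design uses.

```c
#include <stdint.h>

/* Pixel clock in Hz for a raster whose totals include the blanking periods. */
uint32_t pixel_clock_hz(uint32_t h_total, uint32_t v_total, uint32_t fps)
{
    return h_total * v_total * fps;
}
```

With an 800x525 total raster (the usual one for 640x480) at 60 frames/s this comes to 25.2MHz, and with 858x525 (the usual 720x480 raster) it comes to about 27.03MHz, so a design that closes timing at 40MHz has headroom either way.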

There are a couple of ways I can see the video working out - either frame-buffered or line-buffered. I'd obviously prefer the line-buffering, and the idea there would be to wait for a few lines of Antic data to get into the pipeline, then start the HDMI process fetching pixels and producing sync signals. It'd be a few scan-lines behind the original, but I doubt that'd be detectable. The alternative is to double buffer, and have Antic drawing into buffer #1 while the HDMI is displaying #2, then ping-ponging between them on successive frames. That is probably a lot easier, but it would introduce a frame's delay to the video. I hope to have sufficient control over everything (because it's in an FPGA) that the line-buffering can work out.
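
The line-buffered option can be modelled in software as a small ring of scan-lines, where the reader only starts once a few lines are queued. The C sketch below is purely illustrative: the line width, ring depth, start threshold and all names are hypothetical, and in the FPGA this would be block RAM and counters rather than C.

```c
#include <stdint.h>
#include <stdbool.h>
#include <stddef.h>

#define LINE_PIXELS 384u   /* hypothetical line width                      */
#define RING_LINES  4u     /* hypothetical ring depth                      */
#define START_FILL  2u     /* lines queued before the reader may start     */

typedef struct {
    uint8_t  px[RING_LINES][LINE_PIXELS];
    uint32_t wr;           /* total lines written so far                   */
    uint32_t rd;           /* total lines read so far                      */
    bool     started;      /* set once START_FILL lines have been queued   */
} line_ring;

/* Producer (the Antic side): get the next line to fill, or NULL if full. */
uint8_t *ring_begin_write(line_ring *r)
{
    if (r->wr - r->rd >= RING_LINES)
        return NULL;
    return r->px[r->wr % RING_LINES];
}

/* Producer: commit the line obtained from ring_begin_write(). */
void ring_end_write(line_ring *r)
{
    r->wr++;
    if (r->wr - r->rd >= START_FILL)
        r->started = true;
}

/* Consumer (the HDMI side): next line to show, or NULL if nothing is ready
 * or the initial fill threshold has not been reached yet. */
const uint8_t *ring_read(line_ring *r)
{
    if (!r->started || r->rd == r->wr)
        return NULL;
    return r->px[r->rd++ % RING_LINES];
}
```

The output trails the input by START_FILL lines at start-up, which is the "few scan-lines behind" trade-off; the double-buffered alternative is the same idea with whole frames instead of lines.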

One nice thing about using the RP2040 is that flashing the design "in the field" ought to be a matter of plugging it into a PC/Mac over USB while you hold the button on the back, and dragging a file to the "USB disk" that just appeared. I intend that .UF2 file to include both the new RP2040 firmware and the bitstream for the FPGA. All with one simple method to update.
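
For anyone curious what is inside a .UF2 file: it is a sequence of 512-byte blocks, each with magic numbers, a target address and a payload. The C sketch below shows the block layout and a basic validity check, following the published UF2 format as I understand it. It assumes a little-endian host reading the block straight into the struct, the function name is made up, and it says nothing about how an FPGA bitstream would be packaged alongside the firmware, since the post doesn't describe that.

```c
#include <stdint.h>
#include <stdbool.h>

#define UF2_MAGIC_START0            0x0A324655u  /* "UF2\n"                */
#define UF2_MAGIC_START1            0x9E5D5157u
#define UF2_MAGIC_END               0x0AB16F30u
#define UF2_FLAG_FAMILY_ID_PRESENT  0x00002000u
#define UF2_FAMILY_RP2040           0xE48BFF56u

typedef struct {
    uint32_t magic_start0;
    uint32_t magic_start1;
    uint32_t flags;
    uint32_t target_addr;          /* where the payload should be written  */
    uint32_t payload_size;         /* bytes of data[] actually used        */
    uint32_t block_no;
    uint32_t num_blocks;
    uint32_t file_size_or_family;  /* family ID when the flag is set       */
    uint8_t  data[476];
    uint32_t magic_end;
} uf2_block;

_Static_assert(sizeof(uf2_block) == 512, "UF2 blocks are 512 bytes");

/* Hypothetical check: does this look like a well-formed RP2040 block? */
bool uf2_block_is_rp2040(const uf2_block *b)
{
    return b->magic_start0 == UF2_MAGIC_START0
        && b->magic_start1 == UF2_MAGIC_START1
        && b->magic_end    == UF2_MAGIC_END
        && b->payload_size <= sizeof b->data
        && b->block_no     <  b->num_blocks
        && (b->flags & UF2_FLAG_FAMILY_ID_PRESENT) != 0
        && b->file_size_or_family == UF2_FAMILY_RP2040;
}
```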

As I write, I have a (business) trip to NY coming up, which is going to make the next couple of weeks too busy for much work on this, but since it's got to the stage where I think it might just actually be viable again, I thought I'd drop a line.

I've been waiting to write something like this for what seems to be forever. It's nice to be back.

+MrFish · October 2, 2022

Just so I understand... in short, it's the following...

1. Memory (8 MB)

2. Video/Audio Out (HDMI)

3. Ethernet (100 Mbit)

and in the future...

4. Ethernet communication with PC for soft peripherals/services

5. Ethernet communication with a hardware box that will have cards for upgrades/services

+Spaced Cowboy · October 2, 2022

"is" is something of a misnomer. "will be" would be more appropriate.

Build it first, then 1,2,3, with 4 following shortly afterwards because it's not much of a delta from 1,2,3.

5 would come after all the rest because it'd need more hardware.

But essentially, yes, that's right.

[edit: The 100 Mbit ethernet might not be entirely accurate - better to say >10 Mbit. The interface between the CPU and the ethernet chip will run at ~64 Mbit/s, and there will be overhead. I have seen people get 30-40 Mbit/s of actual transmitted data using this setup though, which still seems like "plenty"]
