speccery

StrangeCart


2 hours ago, Lee Stewart said:

 

The code I slightly modified is from MG’s The Smart Programmer, V2(2), July, 1986. Here is my transcription of it:

Thanks Lee, that is interesting. One silly question (I have only written the receiving end of a DSR, i.e. the actual DSR): the calling convention you mention is:

	BLWP @DSRLNK    
	DATA 8

I am curious about the use of the parameter 8. How is it used? If I have understood properly scratchpad location >8356 must be initialised with the PAB structure pointer (offset 9) in VDP memory. Sorry I haven't taken any time to dig into this.

2 minutes ago, Asmusr said:

If you want to speed things up you should look into how to avoid any VDP reads and how to avoid setting up the VDP write address more than once.

Thanks. Yes, I have the solution for this already: do all processing on the Cortex M4 on the cartridge, write the data destined for VDP memory to a memory area in the cartridge memory, and then have a very simple routine in scratchpad memory for the TMS9900 to just loop through the data and write it to the VDP. Then the whole thing will become VDP memory write bandwidth limited. Simple structures like: count, VDP destination address, count bytes of data.
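
The block format mentioned above (count, VDP destination address, then count bytes of data) could be laid out roughly as follows. This is only a sketch under assumptions: the 16-bit big-endian fields, the zero count used as a terminator, and the names play_blocks and vram are made up for illustration, and the VDP is simulated by a plain array rather than real hardware.

```c
#include <stdint.h>
#include <stddef.h>

#define VRAM_SIZE 16384
static uint8_t vram[VRAM_SIZE];   /* stand-in for the 16K of VDP memory */

/* Walks a list of blocks laid out as:
 *   [count hi][count lo][addr hi][addr lo][count data bytes] ...
 * A count of zero ends the list (assumed convention).
 * Returns the number of data bytes copied. */
static size_t play_blocks(const uint8_t *list)
{
    size_t total = 0;
    for (;;) {
        uint16_t count = (uint16_t)((list[0] << 8) | list[1]);
        if (count == 0)
            break;
        uint16_t addr = (uint16_t)(((list[2] << 8) | list[3]) & (VRAM_SIZE - 1));
        list += 4;
        /* On the real machine the VDP write address would be set once here,
         * followed by a tight loop of byte writes to the VDP data port. */
        for (uint16_t i = 0; i < count; i++)
            vram[(addr + i) & (VRAM_SIZE - 1)] = *list++;
        total += count;
    }
    return total;
}
```

The point of the layout is that the address is set up once per block, so the inner loop is nothing but data writes.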

 

My current scrolling routine already sets VDP address registers very seldom (that is the reason for the vertical strips).

 

But before I go there, I want to first play around with TMS9900 code without help by the Cortex M4 core since that is kind of cheating, although I will be more than happy to cheat once I get to that part of the project 😁

 

But I did do some benchmarking of my prototype scrolling code, comparing the same code on some of the different systems I have:

  • TI-99/4A (8.3 seconds)
  • My "legacy" EP994A FPGA core (around 0.7s @ 100MHz) with 31 wait states but cache enabled.
  • My "new" Icy99 core on the ICE40HX (around 1.5s @ 25 MHz).

I could not run my EP994A FPGA core at full speed due to some stupid versioning reason, and I did not want to run the synthesis again. I did find a bug in my Icy99 core: on the ICE40HX platform the VDP memory is located in external SRAM. It is actually a UMA system (unified memory architecture: just one memory bus shared between the CPU, the VDP, and DMA from the host). With the scrolling routine putting "a lot" of memory transfer load between the CPU and VDP, the VDP screen refresh engine sometimes runs too slowly, which annoyingly causes a temporary loss of vertical sync, i.e. the screen flashes. It is funny how the different projects reveal new issues. By keeping the code in TMS9900 assembly it is actually a lot of fun to test drive it on all of these other systems.

11 hours ago, speccery said:

Thanks Lee, that is interesting. One silly question (I have only written the receiving end of a DSR, i.e. the actual DSR): the calling convention you mention is:

	BLWP @DSRLNK    
	DATA 8

I am curious about the use of the parameter 8. How is it used? If I have understood properly scratchpad location >8356 must be initialised with the PAB structure pointer (offset 9) in VDP memory. Sorry I haven't taken any time to dig into this.

 

The DSRLNK routine reads the data word that follows its call to learn whether the caller wants a high-level device service routine (8) for the device named at the beginning of the file descriptor field in the PAB, or a lower-level subprogram (10), whose name (often only a hex number) is passed in the PAB. When DSRLNK returns, it returns to the address after “DATA 8” or “DATA 10”. For the diskette DSR, 8 is for level 3 DSRs, all of which use the PAB we all know and love, with >8356 pointing to the name-length byte (byte 9) of the PAB in VRAM. Diskette DSR subprogram call 10 is for level 1 and 2 routines. These usually use a 2-byte PAB in VRAM, which holds a counted string containing the name of the subprogram and whose name-length byte is pointed to by >8356, plus a transfer block at FAC or FAC+2 and an additional information block pointed to by the transfer block, both of which are in CRAM.
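
To make the ">8356 points to byte 9 of the PAB" part concrete, here is a small sketch in C. The PAB field offsets follow the standard 10-byte PAB header; everything else (the vram and scratchpad arrays standing in for real memory, and the helper name setup_pab) is an assumption for illustration only.

```c
#include <stdint.h>
#include <string.h>

/* Standard PAB layout: 10 header bytes followed by the device/file name. */
enum {
    PAB_OPCODE  = 0,   /* I/O opcode (open, close, read, ...) */
    PAB_FLAGS   = 1,   /* flags / status */
    PAB_BUFADDR = 2,   /* VDP buffer address (2 bytes) */
    PAB_RECLEN  = 4,   /* logical record length */
    PAB_CHARCNT = 5,   /* character count */
    PAB_RECNUM  = 6,   /* record number (2 bytes) */
    PAB_SCROFF  = 8,   /* screen offset */
    PAB_NAMELEN = 9,   /* length of the name that follows */
    PAB_NAME    = 10   /* first character of the name */
};

static uint8_t vram[16384];       /* simulated VDP RAM */
static uint8_t scratchpad[256];   /* simulated >8300..>83FF */

/* Builds a minimal PAB at pab_addr in (simulated) VRAM and stores the
 * address of its name-length byte at >8356, as a DSRLNK caller would. */
static uint16_t setup_pab(uint16_t pab_addr, uint8_t opcode, const char *name)
{
    size_t len = strlen(name);
    memset(&vram[pab_addr], 0, PAB_NAME);
    vram[pab_addr + PAB_OPCODE]  = opcode;
    vram[pab_addr + PAB_NAMELEN] = (uint8_t)len;
    memcpy(&vram[pab_addr + PAB_NAME], name, len);

    uint16_t ptr = (uint16_t)(pab_addr + PAB_NAMELEN);
    scratchpad[0x56] = (uint8_t)(ptr >> 8);     /* >8356, high byte */
    scratchpad[0x57] = (uint8_t)(ptr & 0xFF);   /* >8357, low byte  */
    return ptr;
}
```

With a PAB at VDP address >1000, the value stored at >8356 would be >1009.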

 

...lee

 

[EDITs: Corrections per @jedimatt42’s post below and, hopefully, some clarifications are in the color of this note.]

Edited by Lee Stewart
ADDITIONAL INFO and CORRECTIONS

That's funny I've written my TIPI ROM with DSRs long before I ever wrote a DSRLNK or a call to one in assembly... LOL... 

 

To be strict, it is the DSRLNK routine in question that reads that word at the return address to determine which NameLink list to search in the ROM headers.

 

8 bytes into the ROM header is where the device service routine list for each ROM begins, and 10 bytes into the ROM header is where the subroutine ( CALL FILES, level2 management routines, etc ) list begins. 

 

Conveniently TI spec'ed both lists with the same structure, so one name searching routine with an offset is all that is needed. Many ( most? ) DSRLNK routines floating around expect this, but not all. 
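
As a rough illustration of "one name searching routine with an offset", here is a sketch in C of walking either list in a single DSR ROM image. It assumes the usual node layout (2-byte link to the next node, 2-byte entry address, 1-byte name length, then the name) and a ROM mapped at >4000. The function and parameter names are made up for the example, and a real DSRLNK also has to scan the CRU addresses to page in each card's ROM, which is left out here.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

#define DSR_BASE 0x4000u   /* DSR ROM window, >4000..>5FFF */

/* Reads a big-endian word at a TMS9900 address inside the ROM image. */
static uint16_t rd16(const uint8_t *rom, uint16_t addr)
{
    return (uint16_t)((rom[addr - DSR_BASE] << 8) | rom[addr - DSR_BASE + 1]);
}

/* list_offset is the value from the DATA word after BLWP @DSRLNK:
 *   8  = device service routine list, 10 = subprogram list.
 * Returns the entry address of the matching node, or 0 if none matches. */
static uint16_t find_entry(const uint8_t *rom, unsigned list_offset,
                           const char *name, size_t name_len)
{
    if (rom[0] != 0xAA)                 /* no valid header in this ROM */
        return 0;
    uint16_t node = rd16(rom, (uint16_t)(DSR_BASE + list_offset));
    while (node != 0) {
        const uint8_t *n = &rom[node - DSR_BASE];
        uint8_t len = n[4];
        if (len == name_len && memcmp(&n[5], name, len) == 0)
            return (uint16_t)((n[2] << 8) | n[3]);
        node = rd16(rom, node);         /* follow the link to the next node */
    }
    return 0;
}
```

Because both lists share the node format, the only difference between a "DATA 8" and a "DATA 10" call in this sketch is the header offset the search starts from.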

 

 

2 hours ago, jedimatt42 said:

That's funny I've written my TIPI ROM with DSRs long before I ever wrote a DSRLNK or a call to one in assembly... LOL... 

 

To be strict, it is the DSRLNK routine in question that reads that word at the return address to determine which NameLink list to search in the ROM headers.

 

8 bytes into the ROM header is where the device service routine list for each ROM begins, and 10 bytes into the ROM header is where the subroutine ( CALL FILES, level2 management routines, etc ) list begins. 

 

Conveniently TI spec'ed both lists with the same structure, so one name searching routine with an offset is all that is needed. Many ( most? ) DSRLNK routines floating around expect this, but not all. 

Thanks, now this makes sense! I did the same, wrote a DSR before starting to think how they get called. The 99/4A would not be the 99/4A, unless the PABs and file buffers were in VDP RAM too :) That was my first thought when working on the DSR. Of course for an unexpanded computer that's nearly all RAM one's got.


Hi,

  Speccery, when we last met here, the thread had branched off into DSRLNK.

  I was wondering if you have had time to work on things like the Pascal emulator, maybe the console GROM override, or just getting what you have ready for people to try out?

Thanks,

Dan

 

On 9/16/2020 at 12:43 AM, dhe said:

Hey Speccery,

  Come back, don't be a stranger!

..and I am back.  No intention of being a stranger. Sorry for the long absence, not intended. It's just that I have been very busy with work etc, more stress due to numerous reasons not under my control. In situations like these I tend to use my spare time for exercise to cope with everything that's going on.

 

In addition to StrangeCart, I'll also discuss some other random off-topic stuff I've worked on a little, in case someone finds it interesting.

 

Anyway, I've only had a little time to work on the StrangeCart, and I've used that time to get the rest of the software plumbing on the cart working properly. This has actually been done standalone, without the TI connected.

 

One of the things I've accomplished is getting the USB serial device running on the microcontroller. This removes some of the wiring mess, as I no longer need to use an external FTDI USB to serial converter to talk to the MCU (view debug messages, issue debug commands). It's also now working the way I wanted: the M0+ core is spending its time monitoring and serving the external bus (i.e. the TI cartridge port), while the more powerful M4 core is handling USB code with interrupts enabled.

 

One thing I want to implement, and this is going to require a little GPL programming (which I have never done), is to modify the standard Mini Memory cartridge running on the StrangeCart so that I could save the 4K SRAM contents into the MCU's flash memory, to achieve the same retention capability the original Mini Memory had with its battery-backed RAM.

 

The other retro computer related stuff I have done is a random mix of different things: on the TI-99/4A front I noticed that someone has ported my EP994A TI-99/4A FPGA core to the MIST. I tried that out, and it seemed to work well. I really want to port my newer and better organised icy99 core to the MIST as well. As a small exercise, and to see that my toolchain for Altera FPGAs is working, I synthesised the MIST TI version myself as well. It is kind of strange to get back to my own code after someone else has ported it to the Altera platform.

 

Another project I built (this was almost too fast to enjoy) was the RC2014 kit I won back in 2016 for the retrochallenge. I can't believe it took me this long: after moving to our current home this summer I found the kit. It wasn't really lost, it was where I left it, but I kinda forgot that I had it. Anyway I enjoyed building it. This is actually probably the first Z80 system I have built, even if the Z80 was the CPU I first learned assembly programming on. The first time I applied power, it worked. In its current form it is one of the base versions of the kit, having just the backplane, Z80 CPU, 32K RAM, a 64K ROM chip and a serial interface. Now that all of this is working I ordered a couple of add-on boards from Tindie, as I want to make my system CP/M capable.

 

The RC2014 is such a simple and lovely system, I feel I am almost obligated to next build a TMS9995 CPU board for it...


One of the things I learned from watching "HIGH SCORE" on Netflix was that some of the games for the SNES (Star Fox) had a co-processor on the cart. I have several ideas where something like that could be useful. The thing I don't understand is how a co-processor on a cart would interact with the normal CPU.

6 hours ago, Asmusr said:

One of the things I learned from watching "HIGH SCORE" on Netflix was that some of the games for the SNES (Star Fox) had a co-processor on the cart. I have several ideas where something like that could be useful. The thing I don't understand is how a co-processor on a cart would interact with the normal CPU.

On the SNES and the Genesis, the system was basically able to get a frame buffer image from the co-processor (over-simplified, but essentially). On the TI we don't have the bandwidth to do exactly that.

 

Interaction between the CPUs isn't too hard though... a common approach is to have a small amount of shared memory that both processors can access (so on the TI, it'd have to be on the cartridge) that can be used to pass information back and forth. It's probably enough to have space for data, and a single byte for control of that data (so that as long as you write that control byte last, you don't need to be too worried about the integrity of the data). For data that must be carefully controlled, I've used Peterson's algorithm successfully in the past and found it pretty simple: https://en.wikipedia.org/wiki/Peterson's_algorithm
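
A minimal sketch of the "data plus one control byte, written last" idea, in C. It is not Peterson's algorithm, just the simpler one-way mailbox. The struct layout, the names, and the convention that the control byte holds the payload length are all assumptions for illustration. C11 atomics are used here so the ordering also holds on a modern multi-core host; on the real hardware the "write the control byte last" rule plays that role.

```c
#include <stdatomic.h>
#include <stdint.h>
#include <string.h>

#define MAILBOX_DATA 16

struct mailbox {
    uint8_t data[MAILBOX_DATA];
    _Atomic uint8_t control;   /* 0 = empty, otherwise payload length */
};

/* Producer: fill in the data first, then publish it by writing the
 * control byte last. Returns 1 on success, 0 if the box is still full
 * or the length is out of range. */
static int mailbox_post(struct mailbox *mb, const void *src, uint8_t len)
{
    if (len == 0 || len > MAILBOX_DATA)
        return 0;
    if (atomic_load_explicit(&mb->control, memory_order_acquire) != 0)
        return 0;
    memcpy(mb->data, src, len);
    atomic_store_explicit(&mb->control, len, memory_order_release);
    return 1;
}

/* Consumer: only touch the data once the control byte is non-zero,
 * then clear the control byte to hand the box back.
 * Returns the payload length, or 0 if nothing was waiting. */
static uint8_t mailbox_take(struct mailbox *mb, void *dst)
{
    uint8_t len = atomic_load_explicit(&mb->control, memory_order_acquire);
    if (len == 0)
        return 0;
    memcpy(dst, mb->data, len);
    atomic_store_explicit(&mb->control, 0, memory_order_release);
    return len;
}
```

Each side only ever writes the control byte when it owns the box, which is why a single byte is enough for this one-producer, one-consumer case.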

 


Well I got the first version of the madness with multiple processors working. It is not anything exciting visually (yet - I hope that changes soon).

I did a test program, which just fills the screen with 'E' characters in graphics mode 2. When a command is sent to the USB serial port of the cartridge, the software does the same again. Not very exciting, right? Right. It's not what, but how. What follows is a bit technical, a nice interplay of three processors:

 

What happens in this test first is that the TI-99/4A boots normally. During boot-up the strangecart does a bunch of stuff:

- the M4 processor core starts, and initializes the hardware of the MCU

- then the M4 copies from its internal Flash memory a small TMS9900 cartridge program to one of the internal SRAM blocks (SRAM 3).

- next the M4 copies from its internal Flash memory a small Cortex M0+ program to another internal SRAM block, and starts this program. It's what I call the "busserver" program, where the M0+ sits in a tight loop waiting for cartridge port accesses, either to ROM or GROM. If it sees a ROM or GROM read request by the TMS9900, it fetches the corresponding data from SRAM 3 and presents it to the TMS9900 on the normal data bus. Then it loops again.

 

At this point the TI-99/4A is still sitting at its boot screen. When the user pushes a key, the normal menu is presented. One of the options is my test program strange1.

 

If the user chooses this, the strange1 cartridge starts. There is some TMS9900 code to initialize the VDP to GM2. After this the TMS9900 writes to a magical memory location, namely >7FFE. The TMS9900 then sits in a loop, and checks if memory location >6166 becomes non-zero. If it does, the TMS9900 branches to the address stored there.

; Signal the StrangeCart that we're ready to go!
        INC  @>7FFE           ; Write cycle to >7FFE
        LWPI WRKSP
; Wait here until the StrangeCart stores a non-zero jump vector
STOP    MOV  @VECTOR,R0
        JEQ  STOP
; We did get a jump vector from StrangeCart. Go there.
        B    *R0
VECTOR  DATA 0
        END  MAIN

The M0+ core maintains a counter of all writes to >7FFE. This counter is read by the M4 core. When the M4 notices that the counter has changed, it knows the TMS9900 is now waiting for something to do.
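
The handshake can be pictured with a small host-side model in C. This is only a sketch of the idea, not the actual firmware: the struct bus_state, the function names, and the choice to count only writes addressed exactly to >7FFE are assumptions for illustration.

```c
#include <stdint.h>

#define CART_BASE  0x6000u   /* cartridge ROM window, >6000..>7FFF */
#define CART_SIZE  0x2000u
#define MAGIC_ADDR 0x7FFEu   /* writes here signal "TMS9900 is ready" */

struct bus_state {
    uint8_t window[CART_SIZE];       /* what the TMS9900 sees as ROM */
    volatile uint32_t magic_writes;  /* bumped by the bus server */
};

/* Bus-server side (the M0+ role): serve a read from the ROM window. */
static uint8_t bus_read(const struct bus_state *bs, uint16_t addr)
{
    return bs->window[(addr - CART_BASE) & (CART_SIZE - 1)];
}

/* Bus-server side: the data of a write is ignored, only writes to the
 * magic address are counted. */
static void bus_write(struct bus_state *bs, uint16_t addr)
{
    if (addr == MAGIC_ADDR)
        bs->magic_writes++;
}

/* M4 side: returns 1 once the counter has moved since the last check,
 * and remembers the new value. */
static int magic_changed(const struct bus_state *bs, uint32_t *last_seen)
{
    uint32_t now = bs->magic_writes;
    if (now == *last_seen)
        return 0;
    *last_seen = now;
    return 1;
}
```

Using a counter rather than a flag means the M4 never has to clear anything on the bus-server side; it only compares against the last value it saw.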

 

Before even checking for the counter to change, the M4 dynamically generates some code, from TMS9900 address >6200 onwards (the following routine is executed by the M4 core):

void vector_tms9900(volatile struct bus_config *bc)
{
	static int count = 0;
	int start = 0x200;	// Our offset for writing the program to execute.

	unsigned vdp_addr = 0;
	int cells = 0;
	while(cells < 768) { 
		unsigned char *p = &__base_Ram3_32[start];
		p = put_magic(p);
		p = put_lwpi(p, 0x8C00);
		p = put_li(p, 1, (vdp_addr & 0xFF) << 8);		// VDP write pointer to beginning of VRAM 1/2
		p = put_li(p, 1, 0x4000 | (vdp_addr & 0x3F00));	// VDP write pointer to beginning of VRAM 2/2
		// Let's write as many characters as we can.
		while(p < &__base_Ram3_32[8*1024-32] && cells < 768) {
			const unsigned char e_char[] = { 0, 0xFE, 0x80, 0x80, 0xFE, 0x80, 0x80, 0xFE };
			for(int j=0; j<8; j++)
				p = put_li(p, 0, e_char[(j+count) & 7] << 8);
			vdp_addr += 8;
			cells++;
		}
		// Ok cannot write more E chars.
		// Write an instruction to return back to where we started.
		p = put_b(p, 0x6156);
		// Finally send the TMS9900 on its merry way.
		launch_tms9900(bc);
		long_wait_magic(bc);	// wait until done
	}
	count++;
}

In the above, the function put_magic writes an "INC @>7FFE" instruction into memory, put_lwpi writes an LWPI instruction, and put_li writes a load immediate instruction (the arguments are the register number and the data word to be loaded). Finally put_b(addr) simply writes a "B @ADDR" instruction. These four opcodes are all that the test software uses for now.
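
The bodies of these helpers are not shown in the post. As a rough sketch of what they might look like, here are four emitters in C using the standard TMS9900 encodings (LI Rn = >0200+n, LWPI = >02E0, B @addr = >0460, INC @addr = >05A0, each followed by a big-endian operand word). The helper put_word and the exact signatures are assumptions modelled on the calls in the routine above.

```c
#include <stdint.h>

/* Stores one 16-bit word big-endian, as the TMS9900 expects. */
static unsigned char *put_word(unsigned char *p, unsigned w)
{
    *p++ = (unsigned char)((w >> 8) & 0xFF);
    *p++ = (unsigned char)(w & 0xFF);
    return p;
}

/* LI Rreg,value : opcode >0200 + register, then the immediate word. */
static unsigned char *put_li(unsigned char *p, int reg, unsigned value)
{
    p = put_word(p, 0x0200u | ((unsigned)reg & 15u));
    return put_word(p, value);
}

/* LWPI wp : opcode >02E0, then the new workspace pointer. */
static unsigned char *put_lwpi(unsigned char *p, unsigned wp)
{
    p = put_word(p, 0x02E0u);
    return put_word(p, wp);
}

/* B @addr : opcode >0460 (symbolic addressing), then the target address. */
static unsigned char *put_b(unsigned char *p, unsigned addr)
{
    p = put_word(p, 0x0460u);
    return put_word(p, addr);
}

/* INC @>7FFE : opcode >05A0 (symbolic addressing), then the address. */
static unsigned char *put_magic(unsigned char *p)
{
    p = put_word(p, 0x05A0u);
    return put_word(p, 0x7FFEu);
}
```

Every one of these instructions is exactly four bytes long, which is what makes the space calculation for the generated code so simple.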

 

Thus, from the viewpoint of the TMS9900, the code generated by vector_tms9900() first executes the magic instruction, sets the workspace pointer to the VDP ports, and then loads a load of stuff to the VDP. In this case that consists of E characters, as defined in the e_char array. The last instruction written is B @>6156, which branches back to the cartridge stub.

 

As the cartridge space is 8K and the first >200 bytes are reserved for the cartridge stub, only 7.5K of code can be written. Each byte written to the VDP takes 4 bytes of cartridge space in my case, since each LI R0,VALUE instruction takes 4 bytes. This is the fastest method to write data to the VDP. The payload is the high byte of VALUE. Writing all 768 characters thus requires 768*8*4=24576 bytes of ROM code. As this does not fit into one bank of cartridge space, the M4 writes as many instructions as possible into the cartridge space. Once the space gets tight, it launches the TMS9900 with the launch_tms9900() function. The launch_tms9900() function waits for the magic variable to increment before returning. As can be seen from the above, the first instruction in the dynamically created TMS9900 code is the one written by put_magic(). Interestingly the M4 core has to wait a long time for this to happen, between 200 and 400 iterations of a C code NOP loop...
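
The space arithmetic can be checked with a few lines of C that mirror the loop bounds of vector_tms9900() above. The constant and function names are made up for this sketch; the numbers (start offset >200, four 4-byte prologue instructions, 8 LI instructions per character cell, and the 32-byte margin at the end of the bank) are taken from the routine.

```c
#define CART_SIZE      8192    /* one bank of cartridge space */
#define STUB_SIZE      0x200   /* reserved for the cartridge stub */
#define PROLOGUE_BYTES 16      /* INC @>7FFE, LWPI, 2 x LI R1: 4 bytes each */
#define BYTES_PER_CELL (8 * 4) /* 8 pattern bytes, one 4-byte LI R0 each */
#define TAIL_MARGIN    32      /* the inner loop stops 32 bytes before the end */
#define TOTAL_CELLS    768

/* How many character cells fit into one generated batch of code. */
static int cells_per_batch(void)
{
    int p = STUB_SIZE + PROLOGUE_BYTES;
    int cells = 0;
    while (p < CART_SIZE - TAIL_MARGIN) {
        p += BYTES_PER_CELL;
        cells++;
    }
    return cells;
}

/* How many times the TMS9900 has to be launched to cover the screen. */
static int batches_needed(void)
{
    int per = cells_per_batch();
    return (TOTAL_CELLS + per - 1) / per;
}
```

With these bounds each batch holds 239 cells, so the 768 cells take four launches (239 + 239 + 239 + 51).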

 

long_wait_magic then waits for the TMS9900 to execute the whole bunch of code. This normally takes 150 000 iterations of a NOP loop on the M4 core. The magic variable increments when the TMS9900 is done writing the stuff to the VDP and jumps to address >6156. This corresponds to the first instruction of the TMS9900 code snippet above.

 

That took a bit of explaining, but it is very simple really. It's just a sequence of events, managed by the M4 processor, which in this example effectively reduces the TMS9900 to a write-to-VDP engine. In a more practical example the M4 would render the next frame, i.e. the next stream of TMS9900 instructions, while the TMS9900 is running the previous stream. In the function above that rendering would happen just before the long_wait_magic() call.

 

 

On 9/19/2020 at 4:47 AM, Tursi said:

On the SNES and the Genesis, the system was basically able to get a frame buffer image from the co-processor (over-simplified, but essentially). On the TI we don't have the bandwidth to do exactly that.

 

Interaction between the CPUs isn't too hard though... a common approach is to have a small amount of shared memory that both processors can access (so on the TI, it'd have to be on the cartridge) that can be used to pass information back and forth. It's probably enough to have space for data, and a single byte for control of that data (so that as long as you write that control byte last, you don't need to be too worried about the integrity of the data). For data that must be carefully controlled, I've used Peterson's algorithm successfully in the past and found it pretty simple: https://en.wikipedia.org/wiki/Peterson's_algorithm

 

Moving off topic, but would a co-processor in a sidecar cartridge be able to push data to the VDP at the same time as the main CPU executed a program on the 16-bit bus in scratchpad? 

6 hours ago, Asmusr said:

Moving off topic, but would a co-processor in a sidecar cartridge be able to push data to the VDP at the same time as the main CPU executed a program on the 16-bit bus in scratchpad? 

My instinct is no... I am pretty sure the 8-bit bus still sees activity even with 16-bit accesses (just that the multiplexer doesn't kick in). I would have to double check (but am feeling lazy), but I thought the VDP was attached to the 16-bit side of the bus anyway...

 

The 9900 has the ability to suspend for external DMA masters, but TI locked that pin down instead of exposing it to the expansion port. Not that that's exactly what you had in mind either. :)

On 10/8/2020 at 7:09 PM, Asmusr said:

Moving off topic, but would a co-processor in a sidecar cartridge be able to push data to the VDP at the same time as the main CPU executed a program on the 16-bit bus in scratchpad? 

The answer is no, like @Tursi wrote. The VDP actually sits on the high byte of the 16-bit bus. The 16-to-8 multiplexer sits between the 16-bit and 8-bit buses. There is no DMA capability anywhere, let alone from an external 8-bit source to the 16-bit bus.

 

We could build another VDP which could reside in the sidecar, with its own memory, co-processor and potential mass storage. However, this VDP could not respond to normal VDP accesses without modifying the console. Perhaps the F18A Mk2 could be used to do something along these lines too.

Edited by speccery

I don't know why, but my projects seem to go in cycles, almost exactly 1-year cycles in this case. I just finished designing the second version of the StrangeCart. This PCB design took many times longer than I expected. It was also a bit educational for me: I tested several things I haven't tried before, mainly much narrower traces and smaller vias. I had serious feature creep, but finally cut several things to bring the design process to an end. I attach a screen capture of the final board from Cuprum, a Gerber file viewer. I sent the board to manufacturing, and it's currently waiting for audit. I decided to test PCBWay for the first time; it will be interesting to see how it goes.

 

The main changes of the board, compared to the first version, are:

  • A completely rerouted design. Nearly all components now on the top side, just a few passives on the other side.
  • Fixed the location of the cartridge hole, I made a mistake in the first design.
  • Fixed font size for silkscreen labels, and added much more information on the silkscreen
  • Added SWD debug connector for the MCU. 10 pin SMD connector with 1.27mm pitch.
  • Added serial flash chip for additional storage. I am going to populate prototypes with 16 megabyte flash chips.
  • Changed buttons to be through hole type, which should enable them to be accessible when mounted inside a cartridge box
  • Since I had some space, I added some prototyping features which are optional and not really relevant for the TI-99/4A but may be useful for other use cases: a place for a 1.2V regulator (or any other voltage; 1.2V is relevant for FPGA prototyping) and two transistor buffers so that Eurorack trigger or gate signals can be processed (they will accept digital inputs in the -12V to 12V range)

The main motivation for the redesign was to have the SWD connector. I also wanted to have the MCU on the top side, making the boards faster to build. I realised that using NXP's MCUXpresso toolchain I can clock the MCU at 150MHz, which is of course much higher than the 96MHz I previously used, making the timing of the bus interface better. More important than raw clock speed, MCUXpresso connected via SWD enables much easier debugging with hardware breakpoints. There is also now a connector for an SPI interface, which could be used for many things, for example connecting to a Gameduino or to an SD card.

 

While placing an order for the components, I have noticed that many semiconductors are out of stock. I have many ideas for add-ons, and eventually did find some interesting chips in stock for future boards. Now that I'm up to speed again with PCB design (assuming this comes out well from manufacturing) I am planning to continue and do a few more designs shortly 🙂

Screenshot 2021-04-05 at 20.10.29.png


Looks like I might be receiving the new boards from manufacturing early next week, perhaps even on Monday.

In anticipation of that, I did some development and tested a feature @dhe asked about in this thread approximately one year ago: whether it would be possible to override the console GROMs using the StrangeCart. The answer is yes, as the picture below shows...

 

The StrangeCart is overriding all of the system GROMs, with 24K of GROM content (6K unused).

 

This is my development computer, it has F18A installed and StrangeCart in the cartridge port. No other peripherals connected, nothing in the expansion bus. I loaded a modified version of system GROMs into the StrangeCart (just text changes). The new version PCBs will support additional Flash memory, which enables much more content to be loaded, especially GROM content.

 

The on-chip flash image on the microcontroller is now using nearly all of flash memory when storing firmware for the processors as well as ROMs and GROMs for Invaders, Extended Basic, Mini Memory, Parsec, Car wars, two 8K test cartridges, 24K of system GROM override and finally @Asmusr's cool Zaxxon demo (64K cartridge). 

 

Thus the current firmware feature set includes:

- Cartridge ROM emulation (paging supported)

- Cartridge GROM emulation

- Mini Memory emulation (ROM, GROM, RAM)

- System GROM override

- Execution of cartridge ROM TMS9900 code dynamically generated by the ARM core on the fly, with a synchronisation system enabling the ARM core to know when the TMS9900 reaches a certain point in the code. This enables the TMS9900 to write data to the VDP with maximum bandwidth.

 

The V2 PCB, if it works, will let me move all ROM and GROM code over to the additional flash chip, leaving a lot of space for firmware development.

 

Oh, and I found some old code of mine: I started to write a Basic interpreter in C back in 1999. I revived that code (a couple of thousand lines of C), and have the interpreter working on my Mac. It's still the beginning and missing a lot of features, but once I have a meaningful amount of functionality I will port that to the StrangeCart and write a simple I/O library for the TMS9900. Then we should have a very fast Basic :) 

IMG_4251.jpg

Edited by speccery

If I understand you correctly, the ARM core could draw, e.g. vector graphics, to a buffer in 32K RAM while the TMS9900 was unloading another buffer to VDP RAM? How much faster is the ARM core than the TMS9900? Do you imagine this could be used as a single game cartridge, and would it be affordable?

9 hours ago, Asmusr said:

If I understand you correctly, the ARM core could draw, e.g. vector graphics, to a buffer in 32K RAM while the TMS9900 was unloading another buffer to VDP RAM? How much faster is the ARM core than the TMS9900? Do you imagine this could be used as a single game cartridge, and would it be affordable?

The processor is a Cortex-M4 currently running at 96MHz. It achieves about 125 Dhrystone MIPS. I would say that in integer operations it is 1000 times faster than the TMS9900, and those are 32-bit integers. It has a hardware floating point unit, enabling it to perform single-cycle floating point (single precision) operations. Thus if single precision floating point is all you need, and it is enough for many things, it will perform those pretty much as fast as integer operations. Compared to the TMS9900 it is a rocket.


It also supports DSP operations, enabling it to compute for example four 8-bit saturating adds per cycle, or to compute multiply-accumulate operations in a single cycle.

 

Since the board plugs into the cartridge port, it appears to the TMS9900 as an 8K range of memory. It cannot access the 32K memory expansion. If you wanted to do vector graphics, the way I would do it is to render the graphics in on-chip RAM first and then construct a series of LI instructions for the TMS9900, to be run with the workspace pointer set to the VDP write address. This TMS9900 instruction stream would reside in the cartridge memory area. While the TMS9900 is running that, i.e. writing to VDP memory, the Cortex-M4 would render the next frame. The microcontroller can simultaneously serve the TMS9900 CPU bus and do other things such as render graphics, since there are two processor cores.

 

The cartridge is affordable: the component cost is probably around 10-15 EUR, excluding the case. Of course assembling the board will also cost something. But cost-wise it is probably similar to a multi-bank ROM cartridge.
The cost can be lower if the design is cost optimized. The board can support multiple games too, of course depending on the memory needs of a game. The new version will have 16 megabytes of Flash ROM.

 

Edited by speccery

I've been doing some testing, measuring the VDP write bandwidth with StrangeCart generated data. It appears that doing LI R0,data instructions with the workspace pointer set at VDP >8C00 only gives me about 127 kbytes per second of bandwidth to the VDP. At 3MHz this would be about 23.5 clock cycles per LI execution. Does this seem right? The instructions are in the cartridge ROM area, so we're talking slow accesses. I should verify the measurements with an oscilloscope; currently I've just looked at the timer of the MCU. The half cycle in 23.5 is probably just some measurement error or other overhead.

Each LI R0,data only writes one byte to the VDP, since the VDP is an 8-bit device only connected to the upper half of the 16-bit data bus.

According to Thierry's tech pages the LI instruction in ideal circumstances takes 12 clocks and has three memory accesses. I assume them to be the opcode fetch, the immediate operand fetch and the write to workspace. Out of those three accesses the two fetches occur over the 8-bit bus; the last write to workspace is to the VDP and should go at maximum speed. According to Thierry's pages the 8-bit bus accesses require 4 additional clocks each, which would mean 12+4+4=20 cycles. So in that case I'm not too far off... and the max speed even with nothing but LI instructions is quite slow. Well, I guess 127 kbytes per second is not bad for the 99/4A. I need to hook up the oscilloscope to check the bus activity, hopefully tomorrow.

2 hours ago, speccery said:

I've been doing some testing, measuring the VDP write bandwidth with StrangeCart generated data. It appears that doing LI R0,data instructions with the workspace pointer set at VDP >8C00 only gives me about 127 kbytes per second of bandwidth to the VDP. At 3MHz this would be about 23.5 clock cycles per LI execution. Does this seem right? The instructions are in the cartridge ROM area, so we're talking slow accesses. I should verify the measurements with an oscilloscope; currently I've just looked at the timer of the MCU. The half cycle in 23.5 is probably just some measurement error or other overhead.

Each LI R0,data only writes one byte to the VDP, since the VDP is an 8-bit device only connected to the upper half of the 16-bit data bus.

According to Thierry's tech pages the LI instruction in ideal circumstances takes 12 clocks and has three memory accesses. I assume them to be the opcode fetch, the immediate operand fetch and the write to workspace. Out of those three accesses the two fetches occur over the 8-bit bus; the last write to workspace is to the VDP and should go at maximum speed. According to Thierry's pages the 8-bit bus accesses require 4 additional clocks each, which would mean 12+4+4=20 cycles. So in that case I'm not too far off... and the max speed even with nothing but LI instructions is quite slow. Well, I guess 127 kbytes per second is not bad for the 99/4A. I need to hook up the oscilloscope to check the bus activity, hopefully tomorrow.

I've estimated LI Rx,data to be 24 cycles, since R0 is actually the VDP write data port, which incurs the 4-cycle penalty as well: 12+4+4+4=24 cycles.
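
The cycle counts and the measured bandwidth can be cross-checked with a couple of one-line helpers. This is just the arithmetic from the posts above written out in C, assuming a nominal 3 MHz clock and one VDP byte per LI instruction; the function names are made up for the sketch.

```c
/* Bytes per second when every LI instruction delivers one byte to the VDP. */
static double bytes_per_second(double clock_hz, int cycles_per_li)
{
    return clock_hz / cycles_per_li;
}

/* Clock cycles spent per byte for a measured transfer rate. */
static double cycles_per_byte(double clock_hz, double bytes_per_sec)
{
    return clock_hz / bytes_per_sec;
}
```

At 3 MHz, 24 cycles per LI works out to 125 000 bytes per second and 20 cycles to 150 000 bytes per second. The measured 127 kbytes per second corresponds to roughly 23.6 cycles per byte, so it sits much closer to the 24-cycle estimate.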


I hope you can get a batch out soon that will be just for folks that want to play with this unusual 99/4a memory / co-processor.

Oh, and the Basic you mentioned sounded really interesting!

7 hours ago, dhe said:

I hope you can get a batch out soon that will be just for folks that want to play with this unusual 99/4a memory / co-processor.

Oh, and the Basic you mentioned sounded really interesting!

Thanks, I'm looking to hand build the first batch which could be good enough for first testers. I wonder what is the level of interest, how many people would be interested... Currently one problem in general is that many chips are in really short supply. If there is some demand, I probably need to order all the chips Mouser has in stock, they only have 12 of them...

4 minutes ago, speccery said:

Thanks, I'm looking to hand build the first batch which could be good enough for first testers. I wonder what is the level of interest, how many people would be interested... Currently one problem in general is that many chips are in really short supply. If there is some demand, I probably need to order all the chips Mouser has in stock, they only have 12 of them...

I'm interested in one.

Edited by RickyDean
spelling
