The GPU registers are dedicated, and yes, they are much faster than accessing memory. The GPU is actually rather inefficient since it was my first attempt at a CPU and I brute-forced a lot of it. If I rewrote it I could probably reduce the cycles per instruction. But the biggest benefit is in being able to make dedicated hardware to support graphics operations. That is what the PIX and DMA were for, but I'm not sure either of those is panning out to be that useful (like the original scheme I had for scrolling, borders, etc.). Since there is not much software (other than a lot of your demos) that uses the PIX or DMA, I have no problem replacing those with hardware that supports more useful operations. There is also a 10-nanosecond-accurate timer in the F18A that I don't think anyone has *ever* used.
It takes a while for concepts to sink into my head, so if you and artrag want to explain in painful low-level detail what you need hardware support for, I'll look into what it would take to make it happen.
Up for the chopping block if not really necessary:
1. second, millisecond, microsecond, nanosecond timer.
2. PIX instruction
4. CPU access to reading registers (takes a lot of FPGA logic to support this)
Let me know.
I don't really think the GPU needs to be faster; I can just optimize my code. If we say that the standard TI-99/4A has been used up to 90% of its potential so far (on the Rasmus scale), I would say that the TI-99/4A+F18A has only been used up to 30% of its potential, and only by a handful of people.
The DMA is fine as it is for copying or writing. I use it in the raycaster for clearing the double buffer.
The feature I suggested would be to be able to scale the (single) bitmap layer. All I would need for this project would be to double the height, which would give you square pixels in the fat pixel mode. More generally you could scale by any value in both x and y, and ultimately I think systems like the SNES allowed you to scale the bitmap in 3D so you could produce something like the floor in my Skyway game.
The feature Artrag suggested to speed up the rendering of the raycaster would be to copy a one-pixel-wide strip of a bitmap stored in VDP RAM to the visible bitmap in VDP RAM, scaling it in the y direction, and taking into account bit mapping and possibly transparency. So unlike the general DMA, it would be specific to the bitmap layer and would work on pixels instead of bytes. To generalize, it should support any size at the source, scaling in both x and y at the destination, and would work with both normal and fat bitmap pixels.
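To make the strip copy concrete, here is a rough software sketch of what such an operation might do. It is only an illustration under my own assumptions: the bitmaps are modelled as Python lists of rows with one colour value per pixel (so it ignores how pixels are actually packed into bytes in VDP RAM), the scaling is plain nearest-neighbour, and the name `blit_scaled_column` and all of its parameters are made up for this example, not anything the F18A provides.

```python
def blit_scaled_column(src, src_x, src_y, src_h,
                       dst, dst_x, dst_y, dst_h,
                       transparent=None):
    """Copy a one-pixel-wide vertical strip from src to dst, scaled in y.

    src and dst are lists of rows, one colour value per pixel (hypothetical
    unpacked model). Pixels equal to `transparent` are not written.
    """
    if src_h <= 0 or dst_h <= 0:
        return
    for i in range(dst_h):
        y = dst_y + i
        if not 0 <= y < len(dst):
            continue  # clip destination rows that fall off the bitmap
        # Nearest-neighbour: map destination row i back to a source row.
        sy = src_y + (i * src_h) // dst_h
        colour = src[sy][src_x]
        if colour != transparent:
            dst[y][dst_x] = colour


# Example: stretch a 4-pixel strip to 8 pixels, treating colour 0 as transparent.
texture = [[1], [2], [0], [3]]           # 1 pixel wide, 4 pixels tall
screen = [[9, 9] for _ in range(8)]      # 2 pixels wide, 8 pixels tall
blit_scaled_column(texture, 0, 0, 4, screen, 1, 0, 8, transparent=0)
column = [row[1] for row in screen]
```

In a raycaster this would be called once per screen column, with `dst_h` derived from the wall distance. A hardware version would presumably step through the source with a fixed-point increment rather than a multiply and divide per pixel, but the mapping is the same idea.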
I'm not arguing strongly to implement any of this because I'm happy to work within the current limitations.