PVcollib

alekmaul · November 11, 2019

Hi all,

I forgot to post here that I'm currently working on a lib based on the work of newcoleco (Amy Purple).

The lib uses the latest SDCC version and is fully functional. I'm now going to write the wiki page to help people install it and make games with it.

It is available here : https://github.com/alekmaul/pvcollib

The next step will be to add SGM support (how to detect it and how to use it) and also F18A support (same functions).

If you want to help me with the lib, you're welcome !

alekmaul · December 3, 2019

New release 1.2.0 pushed on github

digress · December 5, 2019

Looks good. I will try it out on a future project.

It would be nice to have some easier access to the SGM features etc.

alekmaul · December 5, 2019

SGM and f18a are on the way, don't worry. But I need to know how Phoenix handles SGM before (don't know if it uses same ports).

digress · December 5, 2019

I suspect the same. I'm making an SGM & F18A test right now. I am testing it on an F18A/SGM ColecoVision and also on a Phoenix console.

I haven't found any incompatibility.

Edited December 6, 2019 by digress

alekmaul · December 6, 2019

Yep, I got confirmation, it's the same. I can work on it.

I began to integrate the F18A initialization. I will add a default bitmap mode and also adapt the gfx2col tool to have good palette management for the F18A.

If you want to contribute to the lib, don't hesitate.

Edited December 6, 2019 by alekmaul

digress · December 7, 2019

I doubt I could add anything to it that you haven't already covered.

I am, occasionally, writing a C library to reuse game code written with the Coleco library for ColecoVision to produce a PC/VGA game. I have a few routines done already.

The idea is that I could just run a complete Coleco game through it and output a VGA DOS game.

Edited December 7, 2019 by digress

alekmaul · December 9, 2019

Great, I did the same thing when I developed on the GCW0 with the SDL lib.

digress · January 8, 2020

There has been some discussion about using the GPU of the F18A to run code because it runs so much faster than the Z80, specifically for speeding up graphics rendering.

Later, perhaps it would be nice to add vector commands that the F18A GPU could run on, say, the fullscreen bitmap mode where every screen pixel would be definable.

The F18A has 2K for running this code, separate from the VRAM. There could be a few routines to clear the tiles to black and then draw line vectors, and we could see how many per cycle make sense. That would suit games like Asteroids or Star Trek.

Edited January 8, 2020 by digress

Pixelboy · January 8, 2020

I see three main challenges with adding vector-rendering capabilities:

1) Assuming the rendering of the entire screen is raster-based (line by line) then a screen buffer would be needed to draw all the vectors into, before sending the buffer to the TV screen. Assuming we have enough VRAM space to contain all the buffer, how do we quickly clear that buffer (after VBLANK) before drawing the next set of vectors towards the next screen refresh? Should we just redraw all the vectors with black pixels? Would we have time to do that and keep it at 60 frames per second?

2) If "vector objects" are constructed as a series of points defined as an angle + distance from a (0,0) origin point, then scaling and rotation could be done relatively easily around that origin (sine and cosine could be pre-calculated for a distance of 1) as long as the VDP can multiply the given cos and sin results by the required distance, and this implies multiplying floating-point values. Multiplication is always a bitch to compute...

3) If we go with a 256 x 192 display (I'm thinking ColecoVision here) then vectors could be drawn as basic spreads of pixels. But at a higher resolution, it would be good to have a line thickness parameter. Rendering a thicker line has to put more strain on the vector rendering.

There are probably other challenges I'm not thinking of, but those are the ones that come to my mind.
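
As an illustration of point 2, here is a minimal sketch of turning an (angle, distance) vertex into x,y coordinates. It swaps in 8.8 fixed-point integers for the floating-point multiply mentioned above, which is an assumption about what would be practical, not something confirmed for the F18A. The 64-step angle unit, the table and the names `sin64`, `cos64` and `polar_to_xy` are all made up for this example.

```c
#include <stdint.h>

/* Quarter-wave sine table: 64 angle steps per full turn, values scaled
   by 256 (8.8 fixed point), i.e. round(256 * sin(i * 5.625 degrees)). */
const int16_t QUARTER_SINE[17] = {
    0, 25, 50, 74, 98, 121, 142, 162, 181,
    198, 213, 226, 237, 245, 251, 255, 256
};

/* Sine of a "binary angle" (0..63 covers a full turn), times 256. */
int16_t sin64(uint8_t angle)
{
    uint8_t idx = angle & 15;           /* position inside the quadrant */

    switch ((angle >> 4) & 3) {         /* which quadrant */
    case 0:  return QUARTER_SINE[idx];
    case 1:  return QUARTER_SINE[16 - idx];
    case 2:  return (int16_t)-QUARTER_SINE[idx];
    default: return (int16_t)-QUARTER_SINE[16 - idx];
    }
}

/* Cosine is sine shifted by a quarter turn (16 steps). */
int16_t cos64(uint8_t angle)
{
    return sin64((uint8_t)(angle + 16));
}

typedef struct { int16_t x, y; } point_t;

/* Convert one (angle, distance) vertex to x,y around the (0,0) origin.
   The product is divided by 256 to drop the fixed-point fraction. */
point_t polar_to_xy(uint8_t angle, uint8_t distance)
{
    point_t p;
    p.x = (int16_t)(((int32_t)distance * cos64(angle)) / 256);
    p.y = (int16_t)(((int32_t)distance * sin64(angle)) / 256);
    return p;
}
```

With this representation, rotating an object means adding a constant to every vertex angle, and scaling means multiplying every distance, so only one integer multiply per coordinate is left.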

(And I'm now realizing that this thread may not be the best spot to post these comments, as I'd be curious to see what Matthew has to say.)

Edited January 8, 2020 by Pixelboy

digress · January 8, 2020

As for buffering:

If using 1-colour bitmap lines like traditional Asteroids, so everything is white,

full-screen bitmap mode could actually provide 3 full screens of bitmaps of 256x192.

The reason is that ECM3 colour is 3-bit: 3 x 2KB layers. The layers stack to give the final 0-7 colour for that pixel.

000 or 100 for layer 1

000 or 010 for layer 2

000 or 001 for layer 3

However, if the entire palette was white except 0, then you could use layers 2 & 3 to pre-render lines and then update the palette to hide the lines on the other layers.

Layer 1: a bunch of lines in 1-bit colour.

Layer 2: a bunch of lines in 1-bit colour, but its palette entries are set to 0.

Layer 3: a bunch of lines in 1-bit colour, but its palette entries are set to 0.

Then you flip the palette so layer 2 or layer 3 has its coloured lines displayed.
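
The palette trick can be sketched in C as below. This is only a model of the idea, assuming an 8-entry palette indexed by the 3-bit pixel value, with layer 1 as bit 100, layer 2 as bit 010 and layer 3 as bit 001. The colour constants and the name `build_layer_palette` are hypothetical, and no real F18A register access is shown.

```c
#include <stdint.h>

#define COLOR_BLACK 0x000u  /* hypothetical colour values for illustration */
#define COLOR_WHITE 0xFFFu

/* Fill an 8-entry palette so that only pixels set in `visible_layer`
   (1, 2 or 3) show up white. Pixels set only in the other two layers
   map to black, which hides whatever is drawn there. */
void build_layer_palette(uint16_t palette[8], uint8_t visible_layer)
{
    /* layer 1 -> bit 100, layer 2 -> bit 010, layer 3 -> bit 001 */
    uint8_t mask = (uint8_t)(4u >> (visible_layer - 1));
    uint8_t i;

    for (i = 0; i < 8; i++) {
        palette[i] = (i & mask) ? COLOR_WHITE : COLOR_BLACK;
    }
}
```

Calling `build_layer_palette(pal, 2)` and uploading the 8 entries would then switch the display from layer 1 to layer 2 without touching any pattern data.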

Edited January 8, 2020 by digress

Pixelboy · January 8, 2020

I think I understand what you're suggesting, but at a certain point, the canvas data has to be cleared before you can use it for rendering the next screen update.

digress · January 8, 2020

16 minutes ago, Pixelboy said:

I think I understand what you're suggesting, but at a certain point, the canvas data has to be cleared before you can use it for rendering the next screen update.

That would be simple. You would just blank that 2KB section of VRAM with 0's before drawing new lines.

So you display 1 set of rendered lines,

you are blanking 1 layer of rendered lines,

you are drawing 1 layer of rendered lines with the palette for that layer turned to 0,

then you flip which layer is displayed next.

It would work like a charm. The only question would be how many lines you could render per cycle.
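
The three-way rotation above (one layer shown, one being blanked, one being drawn) could be tracked with something like this. It is only bookkeeping, with made-up names (`layer_roles_t`, `rotate_layers`), and it does not depend on how the layers are actually stored in VRAM.

```c
#include <stdint.h>

/* Which layer (1..3) currently plays which role. */
typedef struct {
    uint8_t shown;     /* layer visible through the palette */
    uint8_t clearing;  /* layer being filled with 0's */
    uint8_t drawing;   /* layer receiving the next frame, palette hidden */
} layer_roles_t;

/* Advance one frame: the freshly drawn layer becomes visible, the blank
   layer becomes the drawing target, and the old frame gets erased. */
void rotate_layers(layer_roles_t *r)
{
    uint8_t was_shown = r->shown;

    r->shown    = r->drawing;
    r->drawing  = r->clearing;
    r->clearing = was_shown;
}
```

After three calls every layer is back in its starting role, so each layer gets one full frame to be cleared before it is drawn into again.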

Pixelboy · January 8, 2020

1 hour ago, digress said:

That would be simple. You would just blank that 2KB section of VRAM with 0's before drawing new lines.

So you display 1 set of rendered lines,

you are blanking 1 layer of rendered lines,

you are drawing 1 layer of rendered lines with the palette for that layer turned to 0,

then you flip which layer is displayed next.

It would work like a charm. The only question would be how many lines you could render per cycle.

I want to be sure I understand: Where are you taking that "2kb" from?

256 x 192 = 49152 pixels

49152 bits / (8 bits per byte) = 6144
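
The same arithmetic as a small helper, in case other sizes or depths need checking. The function name `bitmap_bytes` is made up for this example.

```c
#include <stdint.h>

/* Bytes needed for a packed bitmap of the given size and colour depth.
   The bit count is rounded up to a whole number of bytes. */
uint16_t bitmap_bytes(uint16_t width, uint16_t height, uint8_t bits_per_pixel)
{
    uint32_t bits = (uint32_t)width * height * bits_per_pixel;
    return (uint16_t)((bits + 7u) / 8u);
}
```

`bitmap_bytes(256, 192, 1)` gives 6144, three times the 2KB figure, and a 2-bit-per-pixel buffer of the same size would need 12288 bytes.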

digress · January 8, 2020

I think I am wrong now that I have thought about it more. What I suggested would only work for 256 tiles of ECM3.

6144 = 3 x 2048

Normal Coleco memory uses:

2048 for the top third

2048 for the middle third, which is usually just copied from the first tile set of 256

2048 for the bottom third, which is usually just copied from the first tile set of 256

total 6144 bytes

That is the same as a full-screen bitmap, which just redefines the middle and bottom thirds to be unique.

Now ECM3 tiles use only 256 unique tiles for the whole screen, so you would have to reuse them.

But in bitmap mode you can have 768 unique tiles with a custom palette of 16 colours, though you will have colour bleed.

But you wouldn't have more than 1 screen unless you define a second memory location and redirect the F18A towards that new memory location every time you flip.

So you could still have 2 screens. However, those 2 screens could use all 8 colours.

Tursi · January 9, 2020

Consider using the BML (bitmap layer) rather than a tile layer. That will give you a 256x192 with 4 colors and no color bleed. In addition, the BML supports a unique opcode in the GPU, "PIX", which will calculate the address and render a pixel in a single instruction. This can speed up the line draw since you don't need to calculate addresses, just count off the X and Y coordinates.

You do still need to handle clipping yourself. It is also, unfortunately, still 12k, but perhaps the bit plane options you were considering above will allow a tricky page flip. You could change the color palette to turn a layer off instantly then erase it with nobody the wiser. I think you were suggesting this above, but for ECM tiles...

I had thought there was a DMA engine to do the clear, but maybe it didn't happen... I don't see it in the spreadsheet.

Disclaimer: I haven't done the above, this is just from many discussions about it with Matt.
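
For reference, here is a software model of what a pixel write into a 256x192, 4-colour bitmap involves, including the clipping that has to be handled by hand. The packing (4 pixels per byte, leftmost pixel in the top two bits) is an assumption made for this sketch and should be checked against the F18A documentation. The `bml_*` names are made up, and the array only stands in for VRAM.

```c
#include <stdint.h>
#include <string.h>

#define BML_WIDTH   256
#define BML_HEIGHT  192
#define BML_BYTES   ((BML_WIDTH / 4) * BML_HEIGHT)   /* 12288 bytes = 12K */

static uint8_t bml[BML_BYTES];   /* stand-in for the VRAM region */

void bml_clear(void)
{
    memset(bml, 0, sizeof bml);
}

/* Write one 2-bit pixel. Returns 0 if the point was clipped, 1 otherwise.
   ASSUMED layout: 64 bytes per row, pixel 0 of each byte in bits 7..6. */
int bml_plot(int x, int y, uint8_t color)
{
    uint16_t offset;
    uint8_t shift;

    if (x < 0 || x >= BML_WIDTH || y < 0 || y >= BML_HEIGHT) {
        return 0;                                   /* clipped */
    }
    offset = (uint16_t)(y * (BML_WIDTH / 4) + (x >> 2));
    shift  = (uint8_t)((3 - (x & 3)) * 2);
    bml[offset] = (uint8_t)((bml[offset] & ~(3u << shift))
                            | ((color & 3u) << shift));
    return 1;
}

/* Read a pixel back (no clipping check, for testing only). */
uint8_t bml_read(int x, int y)
{
    uint8_t shift = (uint8_t)((3 - (x & 3)) * 2);
    return (uint8_t)((bml[y * (BML_WIDTH / 4) + (x >> 2)] >> shift) & 3u);
}
```

The address and shift calculation in `bml_plot` is the kind of work that, per the description above, the PIX opcode does in a single instruction.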

digress · January 9, 2020

I will do a test with that setup. 12KB is a bit brutal out of 16KB of VRAM though, so it doesn't leave room for much else. I've never used the BML setup either, but I understand it. I could still get a 1-colour sprite layer too.

My thought about palette swapping, where you draw new lines onto the screen and then palette-swap them into view, might still work with some minor erasure of the current foreground image. It wouldn't be as clean as a complete page swap.

I'm sure I can make this work. I would get a kick out of making a vector graphics game this way. battlezone or something like that.

Edited January 9, 2020 by digress

Pixelboy · January 9, 2020

2 hours ago, digress said:

I'm sure I can make this work. I would get a kick out of making a vector graphics game this way. battlezone or something like that.

My wish would be to make it possible to have vector arcade games like Asteroids Deluxe, Eliminator, Gravitar, Lunar Lander, Star Castle and others which are a little too graphically intensive for a vanilla ColecoVision to handle. I could see the CollectorVision Phoenix welcoming home ports of those arcade games, if the F18A can support a vector engine that can display a lot of vectors on the screen.

Edited January 9, 2020 by Pixelboy

matthew180 · January 9, 2020

Sorry for being late to the thread, I tend to try and limit my attention these days so I can focus on getting the MK2 done.

My thoughts as I read through the thread:

The MK2 is going to force some changes since it has 512K of VRAM. Not that this will help the existing F18A, but I am working on changes that will hopefully free up an additional 4K of Block RAM in the FPGA (by not being so greedy with line buffers), which I can make available as VRAM for the current F18A. So a total of 20K instead of 16K. The private 2K of GPU-only RAM will probably become part of that 20K.

If I can get 20K of VRAM, this should help make it possible to double-buffer a pattern table or the bitmap layer, so you will have a full frame to draw the next frame, instead of only the vsync period.

The GPU runs at the internal 100MHz clock, but takes multiple cycles per instruction. You can assume an average of 250ns per instruction when memory-to-memory operations are used. The GPU has full 16-bit read/write access to VRAM as well as the palette registers and general VDP registers. The GPU can easily respond to the horizontal interrupt, or just spin and watch for a certain scan line if you need such capability.
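
To put the 250ns average in perspective, here is a rough budget calculation. It assumes the GPU runs uninterrupted for a whole frame, which is a simplification, and the function name is made up for this example.

```c
#include <stdint.h>

/* Approximate number of GPU instructions that fit in one video frame,
   given an average instruction time in nanoseconds. */
uint32_t gpu_instructions_per_frame(uint32_t ns_per_instruction,
                                    uint32_t frames_per_second)
{
    uint32_t ns_per_frame = 1000000000ul / frames_per_second;
    return ns_per_frame / ns_per_instruction;
}
```

At 250ns per instruction and 60 frames per second this comes to about 66,666 instructions per frame, or 80,000 at 50 frames per second.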

If you use the BML (bitmap layer), as Tursi mentioned, the GPU has a dedicated "PIX" (pixel) instruction that can read/modify/write a pixel based on X,Y coordinates. There is no faster way to write pixels since all the calculations are done in hardware and the actual VRAM update happens in 10ns (although the entire instruction takes several cycles). The PIX instruction can also partially operate in GM2 mode by performing the calculation to find the correct byte and bit to update for a pixel.

The GPU also has a block move processor that can copy bytes of VRAM at 10ns per byte. You absolutely cannot get faster access to VRAM. I realize this does not do much for pixel processing since there is no pixel-per-byte mode, but it can speed up things like clearing large sections of VRAM, shifting the name table to support horizontal and vertical scrolling, etc.

You can change the VRAM address pointer's auto-increment value from -128 to +127 (signed byte). So you can do stride-based VRAM access. Helps do horizontal scrolling of the name-table, for example, if you don't use the GPU.
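
A host-side model of stride-based access, to show why a programmable increment helps: with a step of 32, successive reads walk down one column of a 32x24 name table. The `vram_ptr_t` type and both functions are made up for this sketch, and nothing here touches real VDP ports.

```c
#include <stdint.h>

#define NAME_TABLE_COLS 32
#define NAME_TABLE_ROWS 24

/* Model of a VRAM address pointer with a signed auto-increment. */
typedef struct {
    const uint8_t *vram;  /* stand-in for VRAM contents */
    uint16_t addr;        /* current address */
    int8_t   step;        /* auto-increment, -128..+127 */
} vram_ptr_t;

/* Read one byte, then advance the pointer by its step. */
uint8_t vram_read_next(vram_ptr_t *p)
{
    uint8_t value = p->vram[p->addr];
    p->addr = (uint16_t)(p->addr + p->step);
    return value;
}

/* Fetch one name-table column using a stride equal to the row length. */
void read_column(const uint8_t *name_table, uint8_t col,
                 uint8_t out[NAME_TABLE_ROWS])
{
    vram_ptr_t p;
    uint8_t row;

    p.vram = name_table;
    p.addr = col;
    p.step = NAME_TABLE_COLS;
    for (row = 0; row < NAME_TABLE_ROWS; row++) {
        out[row] = vram_read_next(&p);
    }
}
```

Without the stride, the host would have to set the VRAM address again before each of the 24 reads.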

I don't like "modes", so aside from the modes that are part of the original VDP, i.e. GM1, GM2, T40, etc. you will find that most of the features in the F18A are more like "layers", and can be used in any "mode". For example, the BML is not a "mode", you just turn it on, so it can be used in any mode. Same with sprites, once the F18A is unlocked, sprites are available in all the "modes", and can be used at the same time as the BML. Tile Layer 2 (TL2) is probably the only enhancement that was intended to be used in GM1 mode.

Sprites and tiles can have their own ECM level, so if you don't need ECM3 for tiles, you can use ECM2 and save some VRAM. Sprites can also have their own size, per-sprite, as well as flip-x, and flip-y, to try and help save pattern VRAM.

You can limit the number of patterns available to sprites and tiles, so the size of the pattern tables can be reduced if you don't need all 256 patterns. The reduction in size is by powers of 2, so 256, 128, 64, 32. Can be huge VRAM savings if you manage patterns carefully.
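
A quick way to estimate the savings is sketched below. It assumes 8 bytes per 8x8 pattern per bit plane, and that ECM2 uses 2 bit planes and ECM3 uses 3. Those figures are assumptions for the example, and `pattern_table_bytes` is a made-up name.

```c
#include <stdint.h>

/* Pattern table size in bytes for a given pattern count and number of
   bit planes (ASSUMED: 8 bytes per pattern per plane). */
uint16_t pattern_table_bytes(uint16_t pattern_count, uint8_t bit_planes)
{
    return (uint16_t)(pattern_count * 8u * bit_planes);
}
```

Under those assumptions, 256 patterns at 3 planes take 6144 bytes, while 64 patterns at 2 planes take only 1024 bytes.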

There has been a lot of discussion in the 99/4A subsection about doing vector type graphics and such. If you have not frequented the 99/4A dev sub-forum on these topics, you might want to see what Rasmus and others have discussed and tried.

Pixelboy · January 9, 2020

Thanks for your input, Matthew!

matthew180 · January 10, 2020

Quote

I am aware that the F18A has 2K for running custom code. Can that be used to draw vectors? For example, the game software writes some pixel coordinates somewhere in VRAM, and then a routine would read those coordinates from VRAM and draw a line between them.

Yes, you could do that. The 2K of GPU-only RAM is just that, RAM, but only the GPU in the F18A can address it; meaning you cannot access this RAM via the VDP address-pointer used by the host CPU to access normal VRAM. You have to use the GPU to move data between this RAM and normal VRAM. This limitation is because the GPU has a real 16-bit address bus, and the VRAM address-pointer is only 14-bit. Also keep in mind that the changes coming in the MK2 are going to change this.

Really the GPU RAM is handy for GPU subroutines, exactly like you described. If you make a GPU line draw routine (using the PIX instruction to make it really fast), then you could certainly load pairs of xy line coordinates into VRAM and trigger the GPU routine to process the list. Rasmus may have already written such a routine, and you should definitely check out his F18A demos (easiest way is via his js99er browser-based emulator).

Quote

Can the block move processor write zeros in a range of bytes? I don't mean copying a block of zero-ed bytes between RAM addresses, but just writing zeros as fast as possible. I'm asking because it's not clear to me how much time the F18A needs to clear a 256x192 screen of all vector data.

The F18A has a DMA engine that can copy or fill blocks of memory, so yes, the DMA can do a fill operation instead of a copy. It can also move forwards or backwards through VRAM. Each read/write operation takes 20ns, so the entire 16K VRAM could be processed in 327.6us. The 768 byte name table can be moved in 15.3us, and a 2K pattern table could be cleared in 40.9us. Here is a summary of the DMA registers:

* >6000 to >603F VDP regs
*
* The DMA src, dst, width, height, stride are copied to dedicated counters when
* the DMA is triggered, thus the original values remain unchanged.

NTBA        EQU >6002            ; Name table base address
DMA_SRC     EQU >8000            ; DMA 16-bit src address, MSB first
DMA_DST     EQU >8002            ; DMA 16-bit dst address, MSB first
DMA_W       EQU >8004            ; DMA width
DMA_H       EQU >8005            ; DMA height
DMA_STRIDE  EQU >8006            ; DMA stride
DMA_CMD     EQU >8007            ; DMA command: 0..5 | !INC/DEC | !COPY/FILL
DMA_TRIG    EQU >8008            ; DMA trigger, write any value to address

The addresses of these registers are available only via the GPU. Using the DMA is non-destructive to these registers, so once set up it can be triggered multiple times using the same parameters (if you need to do the same operation multiple times, i.e. always clearing or copying a block of memory, etc.)

The F18A Programming and Resources thread in the 99/4A dev sub-forum has a lot of the features documented. I'm working on getting proper docs written in one place, but until then, asking questions is the best way to get the info. Here is a post where I show how to use the GPU to do the name-table block moves to assist with scrolling:
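
The DMA timings quoted above follow directly from the 20ns per operation figure. A small helper to reproduce them for other block sizes (the names are made up for this example):

```c
#include <stdint.h>

#define DMA_NS_PER_BYTE 20u   /* per read/write operation, as quoted above */

/* Time in nanoseconds for the DMA to process `byte_count` bytes. */
uint32_t dma_time_ns(uint32_t byte_count)
{
    return byte_count * DMA_NS_PER_BYTE;
}
```

This gives 327,680ns for 16K, 15,360ns for a 768-byte name table and 40,960ns for a 2K pattern table, which match the rounded microsecond values in the post.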

https://atariage.com/forums/topic/207586-f18a-programming-info-and-resources/?do=findComment&comment=3604629

digress · January 10, 2020

So I found in the TI forums probably what I would need. I just need to get it converted to an SDCC C equivalent.

Here's the code for a F18A GPU/PIX based line drawing algorithm if anyone's interested. Note that BSTK and RSTK are WinAsm99's names for the F18A GPU opcodes CALL and RET.

*********************************************************************
*
* Draw a line from (x1,y1) to (x2,y2)
*
* Translated from C version at:
* http://rosettacode.org/wiki/Bitmap/Bresenham's_line_algorithm#C
*
* void line(int x1, int y1, int x2, int y2) {
* 
*  int dx = abs(x2-x1), sx = x1<x2 ? 1 : -1;
*  int dy = abs(y2-y1), sy = y1<y2 ? 1 : -1; 
*  int err = (dx>dy ? dx : -dy)/2, e2;
* 
*  for(;;) {
*    setPixel(x1,y1);
*    if (x1==x2 && y1==y2) break;
*    e2 = err;
*    if (e2 >-dx) { err -= dy; x1 += sx; }
*    if (e2 < dy) { err += dx; y1 += sy; }
*  }
* }
*
* R0	x1 value
* R1	y1 value
* R2	x2 value
* R3	y2 value
* R13   Color (0-3)
*
* Modifies registers R0-R10
*
LINE   
*	   Setup variables	   
	   CLR	R6				* R6 is sx = 0
	   MOV	R2,R4				* R4 is dx = x2
	   S	R0,R4				* dx = x2 - x1 
	   JGT	DXPOS
	   DEC 	R6				* sx = -1
	   JMP  LINE1
DXPOS      INC 	R6				* sx = 1
LINE1      ABS	R4				* dx = abs(dx)
	   CLR	R7				* R7 is sy = 0
	   MOV	R3,R5				* R5 is dy = y2
	   S	R1,R5				* dy = y2 - y1 
	   JGT	DYPOS
	   DEC 	R7				* sy = -1
	   JMP  LINE2
DYPOS      INC	R7				* sy = 1
LINE2      ABS	R5				* dy = abs(dy)
	   C	R4,R5				* Compare dx to dy
	   JGT	DXGTR
	   MOV 	R5,R8				* R8 is err = dy
	   NEG 	R8				* err = -dy
	   JMP 	LINE3
DXGTR      MOV	R4,R8				* R8 is err = dx
LINE3      SRA  R8,1				* err = err / 2
	   MOV	R4,R10				
	   NEG	R10				* R10 = -dx
*	   Main loop
LINEL      BSTK	@PLOT	   			* Plot (x1,y1)
	   C	R0,R2				* Compare x1 to x2
	   JNE	CONT				* Continue if x1 != x2
	   C    R1,R3				* Compare y1 to y2
	   JNE	CONT				* Continue if y1 != y2
	   RSTK					* Break
CONT       MOV	R8,R9				* R9 is e2 = err
	   C	R9,R10				* Compare e2 to -dx
	   JLT	LINE4				* Jump if e2 < -dx
	   S	R5,R8				* err -= dy
	   A	R6,R0				* x1 += sx
LINE4      C	R9,R5				* Compare e2 to dy
	   JGT	LINEL				* Loop if e2 > dy
	   A	R4,R8				* err += dx
	   A	R7,R1				* y1 += sy
	   JMP	LINEL				* Loop
*// LINE

*********************************************************************
*
* Plot a pixel at (x,y)
*
* R0	x value
* R1	y value
* R13	color
*
PLOT       MOV	R0,R12
	   SWPB	R12
	   SOC	R1,R12
	   XOP	R12,R13				* PIX
	   RSTK
*// PLOT
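
For comparison, here is the C algorithm from the listing's header comment written out as plain C. This runs on the host CPU and plots through a callback; it is not GPU code and does not use PIX. The `plot_fn` type, `draw_line`, and the small 16x16 test canvas are made up for this sketch.

```c
#include <stdint.h>
#include <stdlib.h>

/* Callback that writes one pixel; the caller decides where it goes. */
typedef void (*plot_fn)(int x, int y, uint8_t color);

/* Bresenham line from (x1,y1) to (x2,y2), both endpoints included. */
void draw_line(int x1, int y1, int x2, int y2, uint8_t color, plot_fn plot)
{
    int dx = abs(x2 - x1), sx = x1 < x2 ? 1 : -1;
    int dy = abs(y2 - y1), sy = y1 < y2 ? 1 : -1;
    int err = (dx > dy ? dx : -dy) / 2, e2;

    for (;;) {
        plot(x1, y1, color);
        if (x1 == x2 && y1 == y2) break;
        e2 = err;
        if (e2 > -dx) { err -= dy; x1 += sx; }
        if (e2 <  dy) { err += dx; y1 += sy; }
    }
}

/* Tiny canvas used only to check the routine. */
static uint8_t canvas[16][16];
static unsigned plotted;

void canvas_plot(int x, int y, uint8_t color)
{
    if (x >= 0 && x < 16 && y >= 0 && y < 16) {
        canvas[y][x] = color;
        plotted++;
    }
}
```

Calling `draw_line(0, 0, 5, 2, 1, canvas_plot)` plots six pixels: (0,0), (1,0), (2,1), (3,1), (4,2) and (5,2).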

Edited January 10, 2020 by digress

5-11under · March 8, 2020

Alek, newbie questions...

I'm assuming I just need to grab the whole file tree?

Any other files needed to compile and link?

How to compile and link?

alekmaul · March 23, 2020

Sorry, forgot to reply to your post @5-11under, everything is explained here : https://github.com/alekmaul/pvcollib/wiki

5-11under · March 23, 2020

Thanks Alek. I was able to get it working. I'll use it for any new projects, for sure. For existing projects, I'm sticking with an old computer to finish them up.
