
Copper (raster tech demo)


PeteE


  • 2 weeks later...

Here's the copper demo "how it works" explanation. 

 

Rotation

 

There are 64 steps of rotation, at 5.625 degrees each.  I calculated the slopes up to 45 degrees, and tried to make repeating slope character chunks that fit in the fewest number of character patterns.
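As a rough illustration of the slope calculation, here is a small Python sketch that converts an angle number to degrees and searches for a small rise/run ratio close to that slope. The function names and the `max_run` limit are assumptions made for this example; the chunk sizes in the post were picked by hand and will not always match the closest fraction.

```python
import math
from fractions import Fraction

STEP_DEGREES = 360 / 64  # 5.625 degrees per rotation step


def angle_degrees(angle):
    """Angle 1 is horizontal (0 degrees), angle 9 is 45, angle 17 is vertical."""
    return (angle - 1) * STEP_DEGREES


def slope_ratio(angle, max_run=10):
    """Closest rise/run fraction whose run is at most max_run characters."""
    slope = math.tan(math.radians(angle_degrees(angle)))
    return Fraction(slope).limit_denominator(max_run)


for angle in range(1, 10):
    ratio = slope_ratio(angle)
    # Printed as run x rise, the same way the post writes "10x1".
    print(angle, angle_degrees(angle), f"{ratio.denominator}x{ratio.numerator}")
```

For angles 2 and 3 this gives 10x1 and 5x1.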

 

Angle 1 is horizontal bars, and uses 1 character for each bar:

[Image: angle 1]

 

Angle 2 uses 10x1 characters per slope: (the most of any angle)

[Image: angle 2]

 

Angle 3 uses 5x1:

[Image: angle 3]

 

Angle 4 uses 3x1 or 4x1: (in an attempt to keep the width of the bars somewhat consistent during rotation)

[Image: angle 4]

 

Angle 5 uses 5x2, with 6 characters:

[Image: angle 5]

 

Angle 6 uses 2x1:

[Image: angle 6]

 

Angle 7 uses 3x2, with 4 chars:

[Image: angle 7]

 

Angle 8 uses 4x3, with 6 chars:

[Image: angle 8]

 

Angle 9 uses 2x1, but with two different offset characters (for consistent bar width)

[Image: angle 9]

 

Angles 10 to 16 are the same as angles 8 down to 2, except flipped diagonally.

 

Angle 17 is vertical bars:

[Image: angle 17]

 

Angles 18 to 32 are the same as angles 16 down to 2, except flipped vertically.

 

Angles 33 to 64 are the same as angles 1 to 32, except the color bars are in reverse order.  So all the image and pattern data is the same; only the palette is copied to the color table in reverse order.
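The mirroring rules above can be sketched in Python as a lookup that reduces any of the 64 angles to a stored base angle plus flip flags. The function name and the tuple layout are hypothetical, made up for this illustration; only the angle ranges come from the post.

```python
def resolve_angle(angle):
    """Map angle 1..64 to (base_angle, flip_diagonal, flip_vertical, reverse_palette).

    base_angle is one of 1..9 or 17, the angles whose patterns are described directly.
    """
    if not 1 <= angle <= 64:
        raise ValueError("angle must be in 1..64")

    # Angles 33..64 reuse 1..32 with the color bars reversed.
    reverse_palette = angle > 32
    a = angle - 32 if reverse_palette else angle

    # Angles 18..32 are 16 down to 2, flipped vertically.
    flip_vertical = a >= 18
    if flip_vertical:
        a = 34 - a

    # Angles 10..16 are 8 down to 2, flipped diagonally.
    flip_diagonal = 10 <= a <= 16
    if flip_diagonal:
        a = 18 - a

    return a, flip_diagonal, flip_vertical, reverse_palette


for angle in (1, 10, 17, 18, 32, 33, 64):
    print(angle, resolve_angle(angle))
```

Note that an angle such as 18 ends up with both flips applied, since it mirrors angle 16, which is itself a diagonal flip of angle 2.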

 

Interleaving

 

The "COPPER 99/4A" text and wide wipe bars exist on a different screen image table and color table.  The black rectangle in the rotated bars images, above, is the same location as the "COPPER 99/4A" text, below.

[Image: COPPER 99/4A text]


The demo interleaves two images together, drawing one line from the rotating bars and one line from the "COPPER 99/4A" image.  The video screen on the TMS9918A is drawn from the top-left to the bottom-right, one scanline at a time, 60 times per second.  On every video scanline, the demo changes the screen image table and color table registers between the two.  This requires that the character patterns also be interleaved: all 8 of the rotation character pattern tables have the wide bars and copper text on the 2nd, 4th, 6th, 8th lines.  So no matter which rotation (pattern table) is being used, the wide bar/copper text image stays the same.
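To make the pattern interleaving concrete, here is a minimal Python sketch of how one 8-byte character pattern could be assembled. It assumes each source character is given as a full 8-row pattern and that the rotation rows land on the 1st, 3rd, 5th and 7th lines; the helper names are made up for this example and are not from the demo source.

```python
def interleave_pattern(rotation_rows, text_rows):
    """Build one 8-byte pattern: rotation rows on lines 1,3,5,7 and
    wide bar / copper text rows on lines 2,4,6,8 (1-based)."""
    if len(rotation_rows) != 8 or len(text_rows) != 8:
        raise ValueError("each pattern must have exactly 8 rows")
    # Index i is 0-based, so odd i is the 2nd, 4th, 6th, 8th line.
    return bytes(text_rows[i] if i % 2 else rotation_rows[i] for i in range(8))


def table_for_scanline(scanline):
    """Which screen image / color table pair is selected on a 0-based scanline."""
    return "copper text" if scanline % 2 else "rotation"


solid = bytes([0xFF] * 8)
empty = bytes(8)
print(interleave_pattern(solid, empty).hex())
print([table_for_scanline(y) for y in range(4)])
```

Because the text rows are identical in every rotation pattern table, switching the pattern table for a new angle only changes what the rotation scanlines show.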

 

VDP Memory Layout

 

0000:02FF Screen image table A (rotation)

0300:037F Sprite list table

0400:07FF Character pattern table for angles 7-9

0800:0AFF Screen image table B (rotation)

0B00:0B1F Color table (rotation)

0B20:0B3F Color table (copper 99/4a)

0C00:0FFF Character pattern table for angles 10-12

1000:12FF Screen image table C (copper 99/4a)

1400:17FF Character pattern table for angles 26-28

1800:187F Sprite pattern table (limited to 4 sprites)

1A00:1FFF Character pattern table for angles 17,22-25

2200:27FF Character pattern table for angles 1-5

2A00:2FFF Character pattern table for angles 13-16

3200:37FF Character pattern table for angles 29-32

3A00:3FFF Character pattern table for angles 18-21

 

The pattern tables actually overlap the other tables in memory, but the characters in the overlapping areas are not used.  Some rotation angle pattern sets use 24 characters per color bar, and other pattern sets use 16 characters per color bar, which leaves room for the screen image and other tables in the upper half.  The black rectangle on rotation images does use character 0, but the color table is set to black on black so the pattern doesn't matter.
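The overlap is easier to see when the listed addresses are turned back into register values. This Python sketch assumes the standard TMS9918A encoding, where register 4 selects the pattern table in 2KB (>0800) steps and register 2 selects the screen image table in 1KB (>0400) steps; the helper names are hypothetical.

```python
PATTERN_TABLE_SIZE = 0x800   # 256 characters * 8 bytes
NAME_TABLE_ALIGN = 0x400     # screen image tables start on 1KB boundaries


def pattern_table_info(first_used_address):
    """Return (register 4 value, table base address, first character actually used)."""
    base = first_used_address & ~(PATTERN_TABLE_SIZE - 1)
    register_value = base // PATTERN_TABLE_SIZE
    first_char = (first_used_address - base) // 8
    return register_value, base, first_char


def name_table_register(address):
    """Register 2 value for a screen image table at the given address."""
    if address % NAME_TABLE_ALIGN:
        raise ValueError("screen image table must be 1KB aligned")
    return address // NAME_TABLE_ALIGN


for start in (0x0400, 0x0C00, 0x1400, 0x1A00, 0x2200, 0x2A00, 0x3200, 0x3A00):
    reg, base, first = pattern_table_info(start)
    print(f"{start:04X}: R4={reg} base={base:04X} first char={first}")
```

So the table listed at 0400 is really the pattern table based at 0000 with only characters 128 and up in use, which is why screen image table A can live at 0000:02FF.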

 

Changing the rotation requires double-buffering the screen image table because 32*24 (768) bytes cannot be written to the VDP memory fast enough during vertical blanking.  The two halves are copied over two frames, and then the register flips to the new image to be displayed.  So the main loop looks like this:

  • Copy upper half of rotation image to table A (or B next iteration)
  • Do interleaving scanlines loop
  • Copy lower half of rotation image to table A (or B)
  • Do interleaving scanlines loop
  • Change character pattern table register to the appropriate set for the current angle
  • Change the screen image table register to table A (or B)
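As a sketch of that loop, the Python below just lists the steps for alternating tables and works out the update rate for a given number of bytes copied per vblank. It is a model of the sequence, not the demo's code; the names and the 60 frames per second figure for NTSC are the only inputs.

```python
FRAME_RATE_HZ = 60          # frames per second on an NTSC TMS9918A
SCREEN_BYTES = 32 * 24      # 768-byte screen image table


def rotation_update_steps(table):
    """The six main-loop steps for one rotation update into table 'A' or 'B'."""
    return [
        f"copy upper half of rotation image to table {table}",
        "interleaving scanlines loop",
        f"copy lower half of rotation image to table {table}",
        "interleaving scanlines loop",
        "set character pattern table register for the current angle",
        f"set screen image table register to table {table}",
    ]


def updates_per_second(bytes_per_frame):
    """Rotation updates per second if only bytes_per_frame fit in one vblank."""
    frames_per_update = -(-SCREEN_BYTES // bytes_per_frame)  # ceiling division
    return FRAME_RATE_HZ / frames_per_update


table = "A"
for _ in range(2):
    for step in rotation_update_steps(table):
        print(step)
    table = "B" if table == "A" else "A"  # draw into the hidden table next time

print(updates_per_second(384), updates_per_second(256))
```

Copying 384 bytes per frame gives 30 updates per second, while 256 bytes per frame would need three frames per update and drop to 20.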

Furthermore, there still wasn't enough time in the vblank even for a fast-RAM unrolled loop copy... maybe 256 bytes per update would have worked, but it would also reduce the frame rate from 30Hz to 20Hz, no thanks.  Instead the data is converted to asm instructions that Load Immediate each byte into a workspace register mapped to the VDP Write Data register.  Like this:

SCR_0A
       LWPI VDPWD
       LI R0,>8800
       LI R0,>8800
... 384 total LI instructions
       LWPI WRKSP
       CLR @BANK0
       RT

Unfortunately this quadruples the size of the data in ROM... 384 bytes becomes 1.5KB, so I can fit only 4 halves (6KB) into one cart bank (8KB.)  So all 32 angles would require 16 banks, and that's why the cartridge is 128KB.
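A build-time generator for these routines could look something like the Python sketch below. It is only a guess at the tooling: the function names are invented, the low byte of each immediate is assumed to be zero as in the listing above, and the figure of 4 halves per bank is taken from the post rather than derived.

```python
LI_BYTES = 4               # one LI instruction: opcode word + immediate word
HALF_SCREEN = 384          # bytes in half of a 32x24 screen image table
BANK_KB = 8                # size of one cartridge bank


def emit_copy_routine(label, data):
    """Return assembly text that writes each byte of data via LI R0."""
    lines = [label, "       LWPI VDPWD"]
    # The data byte goes in the high byte of the immediate (assumption: low byte 00).
    lines += [f"       LI R0,>{b:02X}00" for b in data]
    lines += ["       LWPI WRKSP", "       CLR @BANK0", "       RT"]
    return "\n".join(lines)


def banks_needed(angles, halves_per_bank=4):
    """Banks required when each angle is stored as two half-screen routines."""
    halves = angles * 2
    return -(-halves // halves_per_bank)  # ceiling division


print(emit_copy_routine("SCR_0A", bytes([0x88, 0x88])))
print(HALF_SCREEN * LI_BYTES, "bytes of LI instructions per half")
print(banks_needed(32) * BANK_KB, "KB cartridge")
```

The last two lines print 1536 and 128, matching the 1.5KB per half and the 128KB cartridge size.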

 

 

 

 

 

(More to come...)

Edited by PeteE

On 1/24/2021 at 3:16 PM, fabrice montupet said:

PeteE, do you plan to make a V9938/58 version of your demo? That would be great! I miss your great demo on my computer.

Sorry, I've been trying to make it work, but so far the effect is not stable yet.  I will keep working on it and let you know.


  • 2 weeks later...
On 12/18/2020 at 1:36 PM, PeteE said:

From what I understand about the F18A, it doubles the scanlines to produce the 480p60 VGA output, and therefore must run horizontally twice as fast to scale up by 2.  In my raster test on F18A, the white band appears only every other scanline and twice as wide, which would seem to support that idea.  I suspect it also latches the screen image table register and the color table register at the start of each scanline, @matthew180 can you confirm?

 

But I'm also thinking a better solution would be to detect F18A and run the GPU to change the registers using the F18A-specific horizontal interrupt.

 

On 12/19/2020 at 4:12 PM, Asmusr said:

Well, after some thought I also agree with you. Because there is more to this demo than just the scanline detection, and everybody should have a chance to see it. But maybe there is a simple fix once we know what the issue with the F18A is. I could imagine that it's setting the collision flag on every VGA scanline instead of only setting it on even scanlines. Would that break the demo? I set up my F18A machine today for the first time since corona to be able to test it.

 

The 9918A generates 262 lines of video every 16.6ms, which means every horizontal line takes ~63.6us.  The F18A is generating 525 lines of video every 16.6ms, which means each line is ~31.1us, and each line is displayed twice to match the vertical resolution of the 9918A.  There is no way to "slow down" the horizontal scan rate and still display video on a modern computer monitor in real-time (which would require a 15kHz monitor).  There is nothing to fix.  If I am missing something, please let me know.

 

The F18A "renders" each line at 100MHz which is the only way it has enough time to draw all 32-sprites, both tile layers, and the bitmap layer all at once.  So, the main problem with the timing detection method used in this demo, when running on the F18A, is that by the time the collision is detected, the scan line is done being rendered.  The F18A does not latch any register values during the tile pixel generation, it just happens a lot faster than on the 9918A, so updates during that process would be very hard to pull off, even with the GPU.

 

On the F18A, it is better to do end-of-line detection, since it is designed to provide the sprite collision detection at the original 9918A resolution, i.e. every *other* VGA scan line.  IIRC, this is what the "Don't Mess with Texas" demo does?

 

I need to think about it a little more.


I've been meaning to dig into the FPGA code for the F18A in the Phoenix repository as part of my FPGA learning adventure.  (I did manage to build it into a new empty machine type for the Phoenix, and saw the F18A test screen on HDMI when loaded onto the Phoenix.)  In my video-processing opinion, to more accurately emulate the 9918A, it would need to double-buffer each scanline.  Each scanline would be rendered into a buffer every ~63.3us at the same speed as the 9918A, and meanwhile the other buffer is output to the monitor twice at ~31.1us each, and then the buffers are swapped.  This would delay output by 1 scanline, but should be imperceptible.
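The double-buffering idea can be modeled in a few lines of Python. This is only a behavioral sketch of the proposal, not F18A code: one buffer is filled at the 9918A line rate while the other, holding the previous line, is sent out twice. The generator name is made up for the example.

```python
def scan_doubler(source_lines):
    """Yield each rendered line twice, one source line late (ping-pong buffers)."""
    buffers = [None, None]
    write = 0
    for line in source_lines:
        buffers[write] = line          # render the new line at the slow rate
        read = 1 - write
        if buffers[read] is not None:  # meanwhile output the previous line twice
            yield buffers[read]
            yield buffers[read]
        write = read                   # swap buffers for the next line
    # After the last line is rendered, flush it from its buffer.
    last = 1 - write
    if buffers[last] is not None:
        yield buffers[last]
        yield buffers[last]


print(list(scan_doubler(["line0", "line1", "line2"])))
```

The output order is unchanged; it is just delayed by one source scanline.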

 

The trouble with the copper demo on the F18A is that it tries to update two VDP registers during the border & horizontal-blank outside of the 256-pixels wide active video, at the end of every scan line.  It does this by using status register feedback from sprite collisions to cycle-lock the loop which changes the registers.  Changing the 2 registers takes 64 CPU cycles, or ~21.3us.  (Given the whole scanline is ~63.3us on the 9918A, that's about 190 CPU cycles per loop iteration, which isn't very many instructions.)  When a scanline is doubled and output at 31.1us, the register change happens mid-way through the 2nd scanline, where it glitches the transition in the visible portion of the screen.  The way the character pattern table is interleaved with even and odd lines requires the corresponding color and screen tables be set in the registers.  If the register changes happen too soon or too late, the effect is ruined.
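The cycle figures follow from the 3 MHz clock of the TMS9900 in the 99/4A. A quick Python check, with hypothetical helper names:

```python
CPU_HZ = 3_000_000  # TMS9900 clock in the TI-99/4A


def cycles_to_us(cycles):
    """Duration of a number of CPU cycles in microseconds."""
    return cycles / CPU_HZ * 1_000_000


def us_to_cycles(microseconds):
    """CPU cycles that fit in a time span given in microseconds."""
    return microseconds * CPU_HZ / 1_000_000


print(round(cycles_to_us(64), 1))   # time to change the 2 registers
print(round(us_to_cycles(63.3)))    # cycle budget for one scanline
```

This prints 21.3 and 190, so the two register writes alone use about a third of each scanline's budget.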


2 hours ago, PeteE said:

Each scanline would be rendered into a buffer every ~63.3us at the same speed as the 9918A, and meanwhile the other buffer is output to the monitor twice at ~31.1us each, and then the buffers are swapped.

The scan lines are actually double buffered in the current implementation.  However, the state machine that sequences the memory table accesses and tile expansion runs at the 100MHz clock.  Slowing this process down to finish just-in-time could be done, but it would take a rewrite and would also have to interleave with sprite processing and pattern expansion.  Not impossible, but at this point it would be a lot of effort to rewrite these sections.  If all the F18A did was implement the original 9918A functionality, it would not be so hard, but with all the extra capability, it becomes a lot more complicated.


1 hour ago, matthew180 said:

The scan lines are actually double buffered in the current implementation.  However, the state machine that sequences the memory table accesses and tile expansion runs at the 100MHz clock.  Slowing this process down to finish just-in-time could be done, but it would take a rewrite and would also have to interleave with sprite processing and pattern expansion.  Not impossible, but at this point it would be a lot of effort to rewrite these sections.  If all the F18A did was implement the original 9918A functionality, it would not be so hard, but with all the extra capability, it becomes a lot more complicated.

This is probably a stupid proposal, as I can imagine the F18A Mk1 lacking space, but perhaps an option for the Mk2?

 

What about having 2 "VDP cores" and a frontend that switches between the two depending on the graphics mode/VDP register settings? That way when you are in pure "TMS9918A" mode, the 9918A core is in charge. If you are in F18A mode the F18A core takes over and you get all the bells and whistles. Guess that would require a major rewrite, for what are really edge cases.

(BTW still hoping to see 48/60 rows mode someday ;-)


  • 1 year later...
3 hours ago, fabrice montupet said:

Do you have any news about the V9958 port of the Copper demo?

No, sorry.  I am unable to find a solution without a V9958 to test against.  Would something like an EVPC or EVPC2 with a V9958 work with a NTSC 99/4a?  Does anyone have one I can borrow?


9 hours ago, OLD CS1 said:

Is it possible to get this as a repeating demo rather than exiting to the MTS?  I would like to have this as a demo at my next VCF exhibit.

Is next VCF in 2023?  Sure, that will give me time to do music too :D


2 hours ago, PeteE said:

Is next VCF in 2023?

Maybe 2022?  Dunno yet.

2 hours ago, PeteE said:

Sure, that will give me time to do music too :D

I was thinking about this, too.  Not sure why, but when I watch it I hear the Spaceballs' "State of the Art" demo music in my head.

 

(Flashy flashy video warning...)

 

 

