
Raycasting demo



Added a special-case fixed-point multiplication routine. It is only used when one of the multiplicand values is guaranteed to be in the range -1.0 ... +1.0. This actually covers most of the calls to the multiplication routine and results in a noticeable speed boost.
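The routine itself is not shown in the post, so the following is only a minimal sketch of why a restricted operand range can be cheaper. It assumes 8.8 fixed point stored in plain integers; the names `to_fixed`, `fx_mul` and `fx_mul_unit` are made up for illustration. When one operand is known to lie in -1.0 ... +1.0, its magnitude is either exactly 1.0 or a pure 8-bit fraction, so a single byte-wide partial product is enough.

```python
ONE = 1 << 8  # 1.0 in 8.8 fixed point


def to_fixed(x: float) -> int:
    """Convert a float to 8.8 fixed point (helper for this sketch only)."""
    return int(round(x * ONE))


def fx_mul(a: int, b: int) -> int:
    """General 8.8 multiply: full product, then drop 8 fraction bits."""
    return (a * b) >> 8


def fx_mul_unit(a: int, u: int) -> int:
    """Multiply a by u, where u is known to lie in -1.0 ... +1.0.

    The magnitude of u is either exactly ONE or fits in 8 fraction bits,
    so only one byte-wide partial product is needed.
    """
    assert -ONE <= u <= ONE
    mag = -u if u < 0 else u
    prod = a if mag == ONE else (a * mag) >> 8
    return -prod if u < 0 else prod
```

The special case handles the sign separately, so for negative operands its rounding can differ from `fx_mul` by one least-significant bit.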

 

If anyone wanted to, say, port Wolf3D to the Inty, the frame rate might be good enough now ;)

raycast.zip


  • 10 months later...

I've been taking a second look at my fixed-point 8.8 multiplication routines and it might be possible to optimize them some more. I built a list of multiplication "patterns" and a Java program that validates them and searches for the optimal ones based on whether a pattern is being used for the first stage (low byte) or for the second stage (high byte). I'd like to get it to the point where it can autogenerate the necessary CP1600 assembly; now that would be awesome.

 

As an example, my patterns look something like this:

 

Assuming the value to be multiplied is in R2:R0, encoded in the following format: 0000 0000 HHHH HHHH : LLLL LLLL 0000 0000

And the end 8.8 fixed-point result is to be in R5, with R4 used as an intermediate register

 

Some sample patterns for various multipliers (many more potential variations are possible):

 

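Multiplier   Patterns (alternatives separated by spaces)

0            (empty)
1            +
2            ++    .+    -+++    --..+    --:+    .-.+
3            +++    +.+
4            ++++    ..+    :+

(patterns for the multipliers in between are omitted here)

252          :-:::+
253          -.-.:::+
254          --::::+
255          +.+.+.+.+.+.+.+    -::::+
256          ::::+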

 

 

+ means add R2:R0 to R5:R4 (remember the carry bit!)

- means subtract R2:R0 from R5:R4 (remember the carry/borrow bit!)

. means shift R2:R0 left by 1 bit

: means shift R2:R0 left by 2 bits

* means add R2 to R5 (this is only valid for phase 2, when R0 is always 0); think of it as an abbreviated + operation that is only used to replace the final + in a pattern
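The pattern elements above can be checked with a tiny interpreter. This is a sketch in Python rather than the Java validator mentioned in the post (which is not shown); `evaluate` is a hypothetical name. It tracks the running total in R5:R4 and the current scale of R2:R0, both in units of the original multiplicand, and ignores carries and register widths.

```python
def evaluate(pattern: str) -> int:
    """Return the multiplier that a pattern implements.

    acc models R5:R4 (starting at 0); scale models how far R2:R0 has
    been shifted. Both are in units of the original multiplicand.
    """
    acc, scale = 0, 1
    for op in pattern:
        if op in "+*":      # '*' is the abbreviated final add
            acc += scale
        elif op == "-":
            acc -= scale
        elif op == ".":     # shift left by 1 bit
            scale *= 2
        elif op == ":":     # shift left by 2 bits
            scale *= 4
        else:
            raise ValueError(f"unknown pattern element: {op!r}")
    return acc
```

For example, `evaluate(".-.+")` walks through scale 2, total -2, scale 4, total 2, which matches the table entry for 2.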

 

These are the basic pattern elements, but there are more specialized ones if I'm able to use more registers or use them in different ways. Each operation has a cost in clock cycles, and the Java program can build some extra patterns based on existing ones. I have patterns using the basic operations above for multiplying by any value from 0-256, though there could always be some that I haven't found yet (not including specialized register-specific variants). There's a lot of potential to improve the multiplication speed, I think.
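One way to search for cheap patterns is a shortest-path search over (running total, shift) states. The sketch below is not the Java program from the post; the per-operation costs in `COST` are placeholders, since the real CP1600 cycle counts are not given here, and `cheapest_patterns` and `evaluate` are hypothetical names.

```python
import heapq

# Placeholder costs; substitute the real cycle count of each operation.
COST = {"+": 2, "-": 2, ".": 1, ":": 1}
MAX_SHIFT = 8   # shifting further than 8 bits is never needed for 0-256
LIMIT = 512     # bound on the running total, to keep the search finite


def evaluate(pattern: str) -> int:
    """Multiplier implemented by a pattern (used to validate results)."""
    acc, scale = 0, 1
    for op in pattern:
        if op == "+":
            acc += scale
        elif op == "-":
            acc -= scale
        else:
            scale *= 2 if op == "." else 4
    return acc


def cheapest_patterns(max_target: int = 256) -> dict:
    """Dijkstra over (acc, shift) states; returns {multiplier: pattern}."""
    best = {(0, 0): (0, "")}
    heap = [(0, "", 0, 0)]  # (cost, pattern, acc, shift)
    while heap:
        cost, pat, acc, shift = heapq.heappop(heap)
        if best[(acc, shift)][0] < cost:
            continue  # stale heap entry
        moves = [
            ("+", acc + (1 << shift), shift),
            ("-", acc - (1 << shift), shift),
            (".", acc, shift + 1),
            (":", acc, shift + 2),
        ]
        for op, nacc, nshift in moves:
            if nshift > MAX_SHIFT or abs(nacc) > LIMIT:
                continue
            ncost = cost + COST[op]
            key = (nacc, nshift)
            if key not in best or ncost < best[key][0]:
                best[key] = (ncost, pat + op)
                heapq.heappush(heap, (ncost, pat + op, nacc, nshift))
    # For each multiplier keep the cheapest pattern over all final shifts.
    result = {}
    for (acc, _shift), (cost, pat) in best.items():
        if 0 <= acc <= max_target:
            if acc not in result or cost < result[acc][0]:
                result[acc] = (cost, pat)
    return {m: pat for m, (_cost, pat) in result.items()}
```

With these placeholder costs the search finds `::::+` for 256, the same pattern as in the table; different cycle counts, or the extra register-specific operations, would change which patterns win.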

 

For example:

 

A pattern for multiplying by x+1 can always be built as +(pattern for x).

A pattern for multiplying by x-1 can always be built as -(pattern for x).

Either rule effectively eliminates the need to store separate patterns for odd multipliers, since every odd multiplier is one more or one less than an even one.
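The prefix rule works because the prepended + or - runs while R2:R0 still holds the unshifted value, so it contributes exactly one extra (or one fewer) copy of the multiplicand before the rest of the pattern executes. A small sketch, with hypothetical helper names `plus_one` and `minus_one`:

```python
def evaluate(pattern: str) -> int:
    """Multiplier implemented by a pattern of + - . : elements."""
    acc, scale = 0, 1
    for op in pattern:
        if op == "+":
            acc += scale
        elif op == "-":
            acc -= scale
        else:
            scale *= 2 if op == "." else 4
    return acc


def plus_one(pattern_for_x: str) -> str:
    """Pattern for x + 1, given any pattern for x."""
    return "+" + pattern_for_x


def minus_one(pattern_for_x: str) -> str:
    """Pattern for x - 1, given any pattern for x."""
    return "-" + pattern_for_x
```

Applying `minus_one` to `::::+` gives `-::::+`, which is the second pattern listed for 255.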

 

If I have the value for R2:R0 in a single 16-bit register before splitting it into R2:R0 format, then the ::::+ pattern can be replaced by a single operation that adds it to R5.

 

I'm still investigating more advanced ways of building patterns based on what register combinations are available at different times, but it has the potential to lead to a highly optimized way of multiplying 8.8 fixed-point numbers in native CP1600 code.

Edited by JohnPCAE

This is interesting.

 

I had written a multiply generator some time back for integer multiplies that comes up with similar patterns to what you're computing. It didn't compute fixed-point MPYs, but I thought it might be fun to compare notes. I noticed many of our patterns are similar.

 

Attached is my C code and what it generated, if you'd like to take a look.

 

(The silly ".c.txt" extension is to get around AA's silly file extension restrictions.)

mpyk.asm

mult_by_constant.c.txt


  • 9 months later...
  • 6 months later...
  • 1 month later...

Very good.
By the way, would it be possible to build the screen in a hidden buffer?
How many GRAM tiles do you use now?

It seems you are using 10 x 6 tiles.

How many columns do you render?

If you render 10 * 8 = 80 columns, you could speed up the computation by casting fewer angles, say 40.

All you need to do is set two pixels at a time in your GRAM cards and keep the window 10 tiles wide.

 

About the walls, I see you can reuse the same tile vertically for large portions of the image.

I have the feeling that you "blit" all the GRAM tiles column by column, without exploiting the fact that you can replace a whole tile at once instead of passing over it bit by bit, 8 x 8 = 64 times, during rendering.

From what I see, the time needed to update the GRAM is about one or two frames.
The tearing is very evident.
If you were able to use fewer than 32 tiles, you could swap between two subsets of tiles at each scene update.

 

If, as I think, you blit the whole 10 x 6 tiles bit by bit, then you have room to improve the rendering speed.

 

A simple strategy for filled walls could be:

 

Compute the height of each column into an array in RAM (currently 80 bytes) using your raycasting engine.

For each column, compute how many whole tiles are needed (divide the 80 values by 8 with a shift), and group the columns 8 at a time.

Use a filled tile (no blitting; use GROM card 95) to plot the part common to all 8 columns in a group, i.e. the minimum of the 8 values.

Render the 8 leftover heights into a set of GRAM cards (this time bit by bit, as you do now), using "and" and the minimum above to find the 8 remainders.
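The steps above can be sketched as follows. This is only an illustration of the bookkeeping, not of the actual drawing: it assumes wall heights are plain pixel counts, and the name `split_walls` is made up. Note that a simple `h & 7` gives the leftover only for columns whose tile count equals the group minimum, so the sketch subtracts the common part instead.

```python
CARD = 8  # a card is 8 pixels wide and 8 pixels tall


def split_walls(heights):
    """Split column heights into a shared solid part and leftovers.

    Returns one (common_tiles, remainders) pair per group of 8 columns:
      common_tiles  whole tiles shared by every column in the group,
                    drawable with a solid card and no blitting
      remainders    pixels left over per column, to be drawn bit by bit
    """
    groups = []
    for i in range(0, len(heights), CARD):
        group = heights[i:i + CARD]
        common_tiles = min(h // CARD for h in group)
        remainders = [h - common_tiles * CARD for h in group]
        groups.append((common_tiles, remainders))
    return groups
```

For eight columns with heights 20 to 27 pixels, two solid tiles are common to the whole group and the leftovers range from 4 to 11 pixels.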

Edited by artrag

You make some good points, though skimming them at 4am is causing most of them to sail over my head :)

 

Actually, this program has two modes: GRAM and Colored Squares. The side buttons toggle between the two modes, and the numeric keys set the rendering distance. The frame rate in Colored Squares mode is MUCH faster, for two reasons: there are fewer pixels, and I draw them a whole card at a time.

 

I've made some more optimizations (this time to the main general-purpose multiplication routine as opposed to the special-case one). In colored-squares mode, the frame rate seems noticeably better.

raycast_20150924.zip


I think I see. Well, as a first step, I changed the Colored Squares mode to first determine the wall heights and then render the image all at once. The tearing is no longer visible in that mode. I also fixed several bugs in my multiplication routine and added some text that shows what the side buttons and keypad keys do.

raycast_20150926.zip


Very good. Actually, I think that in colored squares mode you can plot two columns at a time by plotting the repeated blocks with a single access to BACKTAB.

 

That's what I do; I write one card to plot four squares at a time, for a total of 240 writes to BACKTAB.
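As a rough illustration of "one card, four squares": the sketch below packs four 3-bit colors into one BACKTAB word. The bit layout is taken from my recollection of the commonly documented STIC colored-squares format (bit 12 set and bit 11 clear select the mode, and the fourth square's color is split across bits 9-10 and bit 13), so treat it as an assumption to check against the STIC documentation. The function name is hypothetical.

```python
BACKTAB_COLS, BACKTAB_ROWS = 20, 12   # 20 x 12 cards = 240 BACKTAB words


def pack_colored_squares(top_left, top_right, bottom_left, bottom_right):
    """Pack four colors (0-7 each) into one colored-squares card word.

    Assumed layout: bits 0-2, 3-5 and 6-8 hold the first three colors;
    the fourth color uses bits 9-10 for its low bits and bit 13 for its
    high bit; bit 12 = 1 and bit 11 = 0 select colored-squares mode.
    """
    for color in (top_left, top_right, bottom_left, bottom_right):
        if not 0 <= color <= 7:
            raise ValueError("colored squares use colors 0-7 only")
    word = 1 << 12
    word |= top_left
    word |= top_right << 3
    word |= bottom_left << 6
    word |= (bottom_right & 3) << 9
    word |= (bottom_right >> 2) << 13
    return word
```

With one word per card, a full-screen redraw is 20 x 12 = 240 writes, matching the count given above.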

 

I think the major performance bottleneck is in my FixedPtMultiply routine (the full one, not the limited-case one). I'm investigating using Joe's quarter-square implementation, and so far I've switched the limited-case version over to it (though I'm not noticing a performance improvement because I think the limited-case one isn't taking up that much time relative to everything else).
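For readers unfamiliar with the quarter-square method: it replaces a multiply with two table lookups and a subtraction, using the identity a*b = floor((a+b)^2 / 4) - floor((a-b)^2 / 4). The two floors cancel exactly because a+b and a-b always have the same parity. The sketch below shows only the generic identity for unsigned 8-bit operands; it is not Joe's implementation, and the names `QSQR` and `qs_mul` are made up.

```python
# Table of floor(n*n / 4) for every possible sum of two 8-bit values.
QSQR = [(n * n) // 4 for n in range(511)]


def qs_mul(a: int, b: int) -> int:
    """Multiply two unsigned 8-bit values with two lookups and a subtract."""
    assert 0 <= a <= 255 and 0 <= b <= 255
    return QSQR[a + b] - QSQR[abs(a - b)]
```

A full 8.8 multiply would presumably be assembled from several such byte-wide products, which is where the table size versus speed trade-off comes in.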

Edited by JohnPCAE

You should change the color of each column according to its distance, e.g. using different levels of green.

It is a simple and effective way to increase the realism.
It could also work in color stack mode, though with some color clash.

 

This is what you get when you have 256 colors to play with:

 

Edited by artrag

  • 2 weeks later...

I was thinking of maybe trying for a Treasure of Tarmin look at some point, with alternating wall colors. For now, though, here is a new version with (hopefully) improved performance. I added a version of the main multiplication routine that uses the quarter-square method and set the code to use that instead. The shift-and-add version is still there as well, just not used.

raycast_20151012.zip


The speed in colored squares mode seems OK for a game, but you should really use at least two colors for the walls, e.g. dark green for N/S sides and light green for E/W sides. You can get that information from the final step of the raycasting loop.
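To show where that information comes from, here is a sketch of the common grid-stepping (DDA) form of a casting loop. The demo's own loop is not shown in this thread, so everything here is an assumption: the map, the names, and the use of floating point instead of 8.8 fixed point. The axis of the last step before hitting a wall tells you which kind of face was hit.

```python
import math

# Hypothetical 5 x 5 map: '#' is a wall, '.' is open floor.
MAP = [
    "#####",
    "#...#",
    "#...#",
    "#...#",
    "#####",
]


def cast(px, py, angle):
    """Cast one ray; return (distance, side).

    side is 'x' if the last step crossed a vertical grid line (an E/W
    face was hit) and 'y' if it crossed a horizontal one (an N/S face).
    """
    dx, dy = math.cos(angle), math.sin(angle)
    mx, my = int(px), int(py)
    delta_x = abs(1 / dx) if dx else math.inf
    delta_y = abs(1 / dy) if dy else math.inf
    step_x = 1 if dx > 0 else -1
    step_y = 1 if dy > 0 else -1
    side_x = ((mx + 1 - px) if dx > 0 else (px - mx)) * delta_x
    side_y = ((my + 1 - py) if dy > 0 else (py - my)) * delta_y
    while True:
        if side_x < side_y:
            dist, side = side_x, "x"
            side_x += delta_x
            mx += step_x
        else:
            dist, side = side_y, "y"
            side_y += delta_y
            my += step_y
        if MAP[my][mx] == "#":
            return dist, side


def wall_color(side):
    """Two-tone shading as suggested above."""
    return "light green" if side == "x" else "dark green"
```

The sketch relies on the map being fully enclosed by walls, so the loop always terminates.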

 

About the color stack mode, the frame tearing needs an approach like the one we discussed earlier.

 

BTW, for a game, I would focus on coloring the walls in colored squares mode.

This would free up the GRAM for sprites and items.

Edited by artrag

  • 2 months later...

I did a bit more work on the raycasting engine to try to get some more speed out of it. I optimized the casting loop in RenderCS() so that it scales better to more distant walls. You'll only see the difference in colored-squares mode as that's the only routine I worked on, but porting it to the normal Render() routine would be straightforward (the one that deals with F-B mode). Anyway, the frame rate does seem a bit higher in colored-squares mode now.

 

In the back of my mind I've been thinking a bit about what it would take to allow for individual control over wall colors and types, but I wanted to see if I could first wring as much performance out of the engine as possible. I'm not sure how much more speed can be squeezed out of it at this point, but you never know.

raycast_20151224.zip


JLP's default RAM range is $8040 - $9F7F. If you move your RAM16 area down to there, and move your ROM out of that region, then it'd work well with JLP's default RAM range. I can always move the RAM (it's determined by firmware), but usually it's easy enough to rejigger the assembly.

 

EDIT: Also, putting _CARTRAM at $BE00 - $BFF isn't a great idea, as writes in this space will corrupt GRAM if done during vertical blank. There are write-only aliases of GRAM at $7800-$7FFF, $B800-$BFFF and $F800-$FFFF.
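A memory map can be checked against these ranges mechanically. The sketch below uses only the addresses quoted above; the helper names and the example regions in the usage note are hypothetical.

```python
# Write-only GRAM aliases and JLP's default RAM range, as quoted above.
GRAM_ALIASES = [(0x7800, 0x7FFF), (0xB800, 0xBFFF), (0xF800, 0xFFFF)]
JLP_RAM = (0x8040, 0x9F7F)


def overlaps(a, b):
    """True if two inclusive (start, end) address ranges intersect."""
    return a[0] <= b[1] and b[0] <= a[1]


def gram_alias_conflicts(start, end):
    """Return the GRAM alias ranges that a RAM region would collide with."""
    return [r for r in GRAM_ALIASES if overlaps((start, end), r)]


def fits_jlp_ram(start, end):
    """True if the region lies entirely inside JLP's default RAM range."""
    return JLP_RAM[0] <= start and end <= JLP_RAM[1]
```

For instance, a hypothetical 256-word region starting at $BE00 is flagged as colliding with the $B800-$BFFF alias, while a region at $8040-$80FF fits inside JLP's default RAM and collides with nothing.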

Edited by intvnut
