
Raycaster


Asmusr


10 hours ago, artrag said:

I think that all the walls/surfaces in EOB are pre-rendered and "composed" on the screen. A given wall can have only 3 distances and 3 orientations in the field of view.

You can scan the 3x3 cells from the farthest row to the nearest, plotting the pre-rendered pieces on the screen as you go.

Hmm, that would probably produce better results on the 9918A since you wouldn't be limited to the fat pixels, but I don't know how fast it would be. Are there any MSX games using this technique?
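A minimal sketch of the composition approach described above. It assumes a grid map stored as strings with '#' for walls and a view depth of 3 cells; the function names, the grid and the piece keys are hypothetical, and a real renderer would blit a pre-rendered bitmap for each returned key. Walls are collected from the farthest row to the nearest, so nearer pieces overwrite farther ones (painter's order).

```python
# Hypothetical sketch: list the pre-rendered wall pieces to draw for a
# first-person view, visiting the 3x3 cells ahead of the player from the
# farthest row to the nearest (painter's order).

def is_wall(grid, x, y):
    """True if (x, y) is inside the map and holds a wall ('#')."""
    return 0 <= y < len(grid) and 0 <= x < len(grid[y]) and grid[y][x] == "#"


def visible_pieces(grid, px, py, dx, dy, depth=3):
    """Return (kind, distance, lateral) keys in far-to-near drawing order.

    (dx, dy) is the unit facing vector with y growing downwards, so the
    player's right-hand vector is (-dy, dx). lateral is -1/0/+1 for the
    cell to the left, straight ahead, or to the right.
    """
    rx, ry = -dy, dx
    pieces = []
    for d in range(depth, 0, -1):          # farthest row first
        for lat in (-1, 0, 1):
            cx = px + dx * d + rx * lat
            cy = py + dy * d + ry * lat
            if is_wall(grid, cx, cy):
                # Simplification: centre cells show their front face,
                # side cells show the face that borders the corridor.
                kind = "front" if lat == 0 else "side"
                pieces.append((kind, d, lat))
    return pieces


GRID = [
    "#####",
    "#.#.#",
    "##.##",
    "#...#",
    "#...#",
    "#####",
]

# Player at (2, 4) facing north (dx=0, dy=-1).
print(visible_pieces(GRID, 2, 4, 0, -1))
# [('front', 3, 0), ('side', 2, -1), ('side', 2, 1)]
```

Since only a few distances and orientations exist, the set of possible keys is small enough to pre-render, which is what makes this cheaper than casting a ray per column.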


Not having examined the sources, I cannot say what technique the MSX games use, but there is certainly no raycasting involved.

This is a V9938 game that uses something very close to what we described, probably with 4 distances:

 

 

Another (craptastic) MSX game using this 3D view and movement, this time for MSX1:

 

I've found this too

 

This one comes with BASIC source code:

https://www.msx.org/news/software/en/3d-maze-written-in-basic

 

I vote for your raycaster ;-)

Edited by artrag

13 hours ago, Asmusr said:

It took me a while to understand what you're suggesting, and it sounds like an excellent idea, but I think it would take more than a few KB: a half-height column is 96 pixels, and the lower limit per pixel is one 2-byte instruction, so that's 192 bytes per column, or 18 KB for all 96 different heights that fit on screen. But there are also oversized columns that don't fit on screen but still need to be rendered to the full screen height, which takes a lot more bytes.
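The size estimate above can be checked in a few lines. The 2-byte figure assumes every pixel costs one `movb *r0+,*r3+` (register-indirect operands need no extra instruction word), and 8 KB is the size of one cartridge bank. The second estimate is a hypothetical variant in which the routine for height h contains only h instructions; the real layout may differ, and oversized columns are not counted.

```python
import math

BYTES_PER_PIXEL = 2      # one "movb *r0+,*r3+" word per pixel (lower bound)
HALF_HEIGHT = 96         # pixels in a half-height column
HEIGHTS = 96             # distinct column heights that fit on screen
BANK_SIZE = 8 * 1024     # one 8 KB cartridge bank


def worst_case_bytes():
    """Every routine writes a full half-height column (the post's estimate)."""
    return HALF_HEIGHT * BYTES_PER_PIXEL * HEIGHTS


def proportional_bytes():
    """Hypothetical: the routine for height h needs only h instructions."""
    return sum(h * BYTES_PER_PIXEL for h in range(1, HEIGHTS + 1))


def banks(nbytes):
    """Number of 8 KB banks needed to hold nbytes of code."""
    return math.ceil(nbytes / BANK_SIZE)


for label, size in (("worst case", worst_case_bytes()),
                    ("proportional", proportional_bytes())):
    print(f"{label}: {size} bytes, {banks(size)} banks")
# worst case: 18432 bytes, 3 banks
# proportional: 9312 bytes, 2 banks
```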

 

The simplest code is when the height of the column on screen is the same as the texture height. Then each instruction writes exactly one byte:


movb *r0+,*r3+ ; Write texel to screen buffer, advance both pointers
movb *r0+,*r3+ ; Write texel to screen buffer, advance both pointers
movb *r0+,*r3+ ; Write texel to screen buffer, advance both pointers
...

If the column on screen is taller than the texture, some of the texture bytes will be written more than once:


movb *r0,*r3+  ; Write texel, advance screen pointer only
movb *r0,*r3+  ; Write the same texel again
movb *r0+,*r3+ ; Write texel, advance both pointers
...

If the column on screen is shorter than the texture, we have to skip some texture bytes:


movb *r0+,*r3+ ; Write texel, advance both pointers
inc r0         ; Skip one texel
movb *r0+,*r3+ ; Write texel, advance both pointers
inc r0         ; Skip one texel
...

If we need to skip multiple bytes it will be faster to use ai (add immediate) instructions.
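The cases above, plus the `ai` variant, are regular enough to generate mechanically. This is a hypothetical Python generator, not the project's actual tool: it samples the texture with nearest-neighbour offsets `(i * T) // H`, keeps R0 as the texture pointer and R3 as the screen-buffer pointer as in the snippets, uses `inct` (the TMS9900's increment-by-two) for a two-byte skip, and assumes the routine is called with `BL` and returns with `B *R11`.

```python
# Hypothetical generator: emit unrolled TMS9900 source that draws one
# textured column of `column_height` screen pixels from a texture strip of
# `texture_height` bytes. R0 = texture pointer, R3 = screen-buffer pointer.

def texel_offsets(column_height, texture_height):
    """Nearest-neighbour texture offset sampled by each screen pixel."""
    return [(i * texture_height) // column_height for i in range(column_height)]


def skip(n):
    """Instruction that advances R0 by n extra texture bytes."""
    if n == 1:
        return "inc  r0"
    if n == 2:
        return "inct r0"
    return f"ai   r0,{n}"


def unrolled_column(column_height, texture_height):
    offs = texel_offsets(column_height, texture_height)
    lines = []
    for i, off in enumerate(offs):
        if i == len(offs) - 1:
            lines.append("movb *r0,*r3+")       # last pixel: R0 need not move
            break
        step = offs[i + 1] - off
        if step == 0:
            lines.append("movb *r0,*r3+")       # same texel again (stretching)
        else:
            lines.append("movb *r0+,*r3+")      # write and advance one texel
            if step > 1:
                lines.append(skip(step - 1))    # shrinking: skip texels
    lines.append("b    *r11")                   # assumed BL/B *R11 convention
    return lines


print("\n".join(unrolled_column(6, 3)))
```

For a 3-byte texture stretched over 6 pixels this alternates `movb *r0,*r3+` and `movb *r0+,*r3+`, matching the "taller" case; shrinking a 6-byte texture to 3 pixels gives the `movb`/`inc` pairs of the "shorter" case.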

 

I assume that in most cases groups of instructions would repeat themselves periodically, so we could add loops, which may be what we need to fit the code into memory.  

Groups of instructions should repeat periodically, so you could add loops and complete the column with the leftover pixels beyond the last whole period. However, this would cost part of the speed gain, and it would make the code generation more complex. A script that generates the code for a column could implement the same general algorithm you have in ASM, computing the offset of each pixel in the texture, and then convert those offsets into ASM instructions according to their values.
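One way to find that period, assuming the ASM steps through the texture with an 8.8 fixed-point increment (an assumption; the project may use a different format): the offset of pixel i is `(i * step) >> 8`, and advancing p pixels adds exactly `p * step / 256` texels whenever `p * step` is a multiple of 256. The smallest such p is `256 // gcd(step, 256)`, so the pattern of pointer advances repeats with that period, and whatever does not fill a whole period is the remainder.

```python
from math import gcd

FRAC_BITS = 8                # assumed 8.8 fixed-point texture step
ONE = 1 << FRAC_BITS


def fixed_step(column_height, texture_height):
    """Texels per screen pixel in 8.8 fixed point (truncated)."""
    return (texture_height << FRAC_BITS) // column_height


def step_pattern(column_height, texture_height):
    """How many texels R0 advances after each screen pixel."""
    step = fixed_step(column_height, texture_height)
    offs = [(i * step) >> FRAC_BITS for i in range(column_height + 1)]
    return [b - a for a, b in zip(offs, offs[1:])]


def loop_plan(column_height, texture_height):
    """(period, full loop iterations, leftover pixels) for one column."""
    step = fixed_step(column_height, texture_height)
    period = ONE // gcd(step, ONE)
    repeats, remainder = divmod(column_height, period)
    return period, repeats, remainder


print(loop_plan(128, 64))   # (2, 64, 0)
print(loop_plan(80, 64))    # (64, 1, 16)
```

With a 64-texel texture, a 128-pixel column has a period of 2, so a tiny loop body would do; at 80 pixels the period is 64 with 16 leftover pixels, and whenever the step is odd the period is 256, longer than any column, so a loop would gain nothing there.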

You could use a ROM mapper and spread the code for unrolled columns across different pages.

 

If you do not want to fill a ROM mapper with auto-generated code, you could also unroll only the code for the most frequent heights, e.g. column heights between a minimum and a maximum (based on the minimum and maximum distance of the player from the walls), and keep your existing general-purpose code for rendering the remaining heights.
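A small model of that hybrid, with hypothetical names: heights inside a chosen range use a pre-generated routine (represented here by precomputed offsets rather than real TMS9900 code), and every other height falls back to a general-purpose loop. The texture height is assumed fixed, as it would be in the generated routines.

```python
# Model only: in the real program an "unrolled routine" is generated TMS9900
# code; here it is represented by a closure with its offsets precomputed.

def generic_column(texture, height):
    """General-purpose path: compute every texel offset at draw time."""
    t = len(texture)
    return [texture[(i * t) // height] for i in range(height)]


def make_unrolled(texture_height, height):
    """Stand-in for one generated routine: offsets fixed ahead of time."""
    offsets = tuple((i * texture_height) // height for i in range(height))

    def draw(texture):
        return [texture[o] for o in offsets]

    return draw


class ColumnRenderer:
    """Unrolled routines for a range of common heights, generic fallback."""

    def __init__(self, texture_height, min_height, max_height):
        self.routines = {
            h: make_unrolled(texture_height, h)
            for h in range(min_height, max_height + 1)
        }

    def draw(self, texture, height):
        routine = self.routines.get(height)
        if routine is not None:
            return routine(texture)              # fast path
        return generic_column(texture, height)   # rare heights


renderer = ColumnRenderer(texture_height=8, min_height=4, max_height=16)
texture = list(range(8))
print(renderer.draw(texture, 4))       # unrolled: [0, 2, 4, 6]
print(renderer.draw(texture, 32)[:5])  # generic fallback
```

Both paths produce identical pixels, so the range boundaries can be tuned purely for memory versus speed.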

 

 

Edited by artrag

On 8/17/2020 at 10:52 PM, artrag said:

Groups of instructions should repeat periodically, so you could add loops and complete the column with the leftover pixels beyond the last whole period. However, this would cost part of the speed gain, and it would make the code generation more complex. A script that generates the code for a column could implement the same general algorithm you have in ASM, computing the offset of each pixel in the texture, and then convert those offsets into ASM instructions according to their values.

You could use a ROM mapper and spread the code for unrolled columns across different pages.

 

If you do not want to fill a ROM mapper with auto-generated code, you could also unroll only the code for the most frequent heights, e.g. column heights between a minimum and a maximum (based on the minimum and maximum distance of the player from the walls), and keep your existing general-purpose code for rendering the remaining heights.

I did all the groundwork of adding the unrolled texture drawing code to the ROM cartridge before I realized there is a big problem: the textures themselves are also in the ROM cart, and it is not possible to map two banks of the cart into the CPU address space at the same time. The good news is that I still have room in RAM to copy the current textures over, so I'm able to make a demo to see the effects of unrolling the drawing code, but that wouldn't work in a game with lots of textures. The only solution I can think of is to use SAMS memory, which allows multiple 4K pages to be mapped at different locations.


19 minutes ago, artrag said:

Sorry to hear this. Could you put the textures in the non-mapped part of the ROM?

There isn't any non-mapped part of the ROM. A cartridge in the cartridge slot can only be mapped into one 8K memory region, and the standard cartridge design maps all 8K as one page. 


20 hours ago, artrag said:

On MSX, ROMs are usually visible in a 32KB window, divided into 4 pages of 8KB each (or 2 pages of 16KB each).

Sorry for giving bad advice.

It's still a very good suggestion, and I'm almost there.

 

Before this optimization, the routines that take the most time are (approximately):

- Cast rays: 200,000 cycles

- Draw screen: 1,000,000 cycles

- Copy screen to VDP: 200,000 cycles

 

With 3,000,000 cycles per second, 1,400,000 cycles correspond to approximately 2 frames per second.

The 400,000 cycles from the first and last routines will still be there with the optimization, but the hope is to make a big cut in the middle one.
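The frame-rate figure follows directly from the cycle counts above. The halved drawing time below is only a hypothetical target, used to show how the fixed 400,000 cycles for ray casting and the VDP copy cap the achievable rate.

```python
CPU_HZ = 3_000_000  # cycles per second, as in the post


def fps(*routine_cycles):
    """Frames per second if each listed routine runs once per frame."""
    return CPU_HZ / sum(routine_cycles)


CAST, DRAW, UPLOAD = 200_000, 1_000_000, 200_000

print(f"current:     {fps(CAST, DRAW, UPLOAD):.2f} fps")       # 2.14
print(f"draw halved: {fps(CAST, DRAW // 2, UPLOAD):.2f} fps")  # 3.33 (hypothetical)
print(f"draw free:   {fps(CAST, 0, UPLOAD):.2f} fps")          # 7.50, upper limit
```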

 

 

 

Edited by Asmusr

Just to note that in the optimized version that I presented I'm not handling columns taller than the screen correctly, which is apparent when you move close to the walls. The simple solution will be to add a few more ROM banks to deal with those additional heights.
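For the columns taller than the screen, one possible approach is to skip the invisible top part by starting the texture offset at the first visible pixel. This sketch assumes the 192-pixel bitmap screen, vertically centred columns and the same nearest-neighbour sampling as before. Every oversized height then needs its own full-screen routine, which is where the extra banks would go.

```python
SCREEN_HEIGHT = 192


def visible_offsets(column_height, texture_height, screen_height=SCREEN_HEIGHT):
    """Texture offsets for the on-screen part of a vertically centred column."""
    if column_height <= screen_height:
        first, count = 0, column_height
    else:
        first = (column_height - screen_height) // 2   # pixels clipped off the top
        count = screen_height
    return [((first + i) * texture_height) // column_height for i in range(count)]


offs = visible_offsets(384, 64)
print(len(offs), offs[0], offs[-1])   # 192 16 47
```

A column twice the screen height shows only the middle half of a 64-byte texture (offsets 16 to 47), so the generated routine starts at texel 16 instead of 0.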


It is quite a bit faster. Great work! Are you using the SAMS extension and its memory paging?

 

BTW, another possible optimisation of the same kind would be to specialise the code that does column tracing for flat colour walls. You could generate a complete set of unrolled routines for flat colour walls (or probably a single routine with differentiated entry points) where the input colour is in a register.

If textured and flat walls are mixed, the gain could be worth it.
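A possible shape for the "single routine with differentiated entry points", written as a Python generator of TMS9900 source. It assumes the fill byte sits in the high byte of R1 (TMS9900 byte instructions on a register use its most significant byte), R3 points into the screen buffer, and the `FLATnn` labels and maximum height are hypothetical. Each `movb r1,*r3+` is one 2-byte word, so a column of height h is drawn by entering the routine 2 × (max - h) bytes after its start.

```python
# Hypothetical generator for one flat-colour column routine with an entry
# point per height. R1's high byte holds the fill byte, R3 points into the
# screen buffer; label FLATnn marks where nn pixels remain to be written.

def flat_routine(max_height):
    """TMS9900 source lines: max_height copies of movb r1,*r3+ and a return."""
    lines = [f"FLAT{remaining:<3} movb r1,*r3+"
             for remaining in range(max_height, 0, -1)]
    lines.append("        b    *r11")      # assumed BL/B *R11 convention
    return lines


def entry_offset(height, max_height):
    """Byte offset from the routine start at which to enter for `height`."""
    if not 0 <= height <= max_height:
        raise ValueError("height out of range")
    return 2 * (max_height - height)       # each movb r1,*r3+ is one word


print("\n".join(flat_routine(4)))
print(entry_offset(3, 4))   # 2: skip one instruction, draw three pixels
```

One routine covers every height, so the flat-wall code costs about 2 bytes per pixel of the tallest column instead of one routine per height.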

    

Edited by artrag

20 hours ago, artrag said:

It is quite a bit faster. Great work! Are you using the SAMS extension and its memory paging?

 

BTW, another possible optimisation of the same kind would be to specialise the code that does column tracing for flat colour walls. You could generate a complete set of unrolled routines for flat colour walls (or probably a single routine with differentiated entry points) where the input colour is in a register.

If textured and flat walls are mixed, the gain could be worth it.

    

No, I'm not using SAMS yet. I copied 4 textures to RAM, and that's at least as fast as using SAMS.

 

I also thought about optimizing the sky/floor/monochrome wall drawing, but I don't think it's worthwhile to unroll those loops entirely unless the sky/floor are also textured.

In this video I doubled the number of pixels written per wall/floor iteration from 4 to 8, and maybe you can see a slight difference, but I don't think unrolling those loops any further will have any visible effect.

 

 

Edited by Asmusr

Maybe a silly question...

Why do you need 16 pointers in upload_screen?

I was expecting you to use columns of 8 tiles in the pattern name table so that you could write 64 adjacent bytes. That way you only need to set the VRAM pointer 3 times per column, once per tile bank.

This also lets you use a RAM buffer no longer than a single column.
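A sketch of that layout, assuming Graphics II (bitmap) mode, where the 768-entry name table is split into three 256-entry thirds and each third indexes its own 2 KB block of the pattern table. Numbering the patterns down each tile column (column-major) within a third makes the 8 tiles of a column consecutive, so their 64 pattern bytes are adjacent in VRAM and one address setup per third is enough. The pattern table base of 0 is an assumption; it depends on VDP register 4.

```python
PATTERN_BASE = 0x0000    # assumed pattern table address (set by VDP register 4)
THIRD_SIZE = 0x800       # 256 patterns x 8 bytes per screen third


def column_major_name_table():
    """768-entry name table with patterns numbered down each tile column."""
    table = [0] * 768
    for third in range(3):
        for col in range(32):
            for row in range(8):
                table[(third * 8 + row) * 32 + col] = col * 8 + row
    return table


def column_addresses(table, col, third):
    """VRAM addresses of one tile column's pattern bytes, top to bottom."""
    addrs = []
    for row in range(8):
        pattern = table[(third * 8 + row) * 32 + col]
        base = PATTERN_BASE + third * THIRD_SIZE + pattern * 8
        addrs.extend(range(base, base + 8))
    return addrs


table = column_major_name_table()
run = column_addresses(table, 5, 1)
print(hex(run[0]), len(run), run == list(range(run[0], run[0] + 64)))  # 0x940 64 True
```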

Edited by artrag

4 hours ago, artrag said:

Maybe a silly question...

Why do you need 16 pointers in upload_screen?

I was expecting you to use columns of 8 tiles in the pattern name table so that you could write 64 adjacent bytes. That way you only need to set the VRAM pointer 3 times per column, once per tile bank.

This also lets you use a RAM buffer no longer than a single column.

I think you're looking at the master branch, which contains the non-texture mapped code. You need to look at the texture_mapped_unrolled branch.

It's correct that I could use a single-column RAM buffer as things are now, but once I start adding objects on top of the background it would be more difficult. The full-screen buffer also makes the screen update shorter and possibly produces less flicker than a single-column buffer.


2 hours ago, fabrice montupet said:

Just a detail: at some point, the dungeon will need a ceiling. Maybe changing the cyan color to a more suitable one would simulate it?

My plan is to make a textured ceiling, which will perhaps just be a static image.


25 minutes ago, FarmerPotato said:

Hi @Asmusr,

 

I tried assembling the version of Raycaster from GitHub. It uses an xas99.py -w option, which wasn't recognized by the xdt99 version I had, nor by the latest 3.00.

Can you help me with -w?

 

I haven't upgraded to version 3 yet. ;-) The -w option is just to suppress warnings about unused labels.


For the ceiling and floor, the fastest way to render them is differential plotting, i.e. plotting only the part that is needed.

Store an array of column heights (64 bytes) for the current frame in RAM.

If the new height is equal to or greater than the one stored for the previous frame, plot only the wall column, as nothing changes in the ceiling and floor.

If the new height is smaller than the one stored for the previous frame, plot the new column plus only the strips of ceiling and floor between the previous height and the new height.

This way the ceiling and floor are either not plotted at all or plotted only in the area where the two columns differ.
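The bookkeeping can be sketched as follows, assuming a 192-pixel-high screen buffer with walls centred vertically; the function names are hypothetical. `plan_column` returns the row ranges that actually need writing for one column, and `plan_frame` updates the stored heights for the next frame.

```python
SCREEN_HEIGHT = 192


def wall_span(height, screen_height=SCREEN_HEIGHT):
    """(top, bottom) rows of a vertically centred wall column, bottom exclusive."""
    height = min(height, screen_height)
    top = (screen_height - height) // 2
    return top, top + height


def plan_column(prev_height, new_height):
    """Row ranges to redraw for one column under differential plotting."""
    top, bottom = wall_span(new_height)
    ops = [("wall", top, bottom)]
    if new_height < prev_height:
        # The wall shrank: repaint only the strips it no longer covers.
        old_top, old_bottom = wall_span(prev_height)
        if old_top < top:
            ops.append(("ceiling", old_top, top))
        if bottom < old_bottom:
            ops.append(("floor", bottom, old_bottom))
    return ops


def plan_frame(stored_heights, new_heights):
    """Plan every column, then remember the new heights for the next frame."""
    plan = [plan_column(p, n) for p, n in zip(stored_heights, new_heights)]
    stored_heights[:] = new_heights
    return plan


heights = [SCREEN_HEIGHT] * 2   # start "full" so frame 1 paints ceiling and floor
print(plan_frame(heights, [100, 120]))
print(plan_frame(heights, [120, 100]))
```

Initialising the stored heights to the full screen height matters: starting from 0 would mean the ceiling and floor are never painted at all.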

   

 


7 hours ago, artrag said:

For the ceiling and floor, the fastest way to render them is differential plotting, i.e. plotting only the part that is needed.

Store an array of column heights (64 bytes) for the current frame in RAM.

If the new height is equal to or greater than the one stored for the previous frame, plot only the wall column, as nothing changes in the ceiling and floor.

If the new height is smaller than the one stored for the previous frame, plot the new column plus only the strips of ceiling and floor between the previous height and the new height.

This way the ceiling and floor are either not plotted at all or plotted only in the area where the two columns differ.

It's a good suggestion, but it won't work (for the floor at least) if I start adding other objects to the screen buffer. Maybe objects will never overlap the ceiling, so it would work there?
