Asmusr Posted August 17, 2020 Author
10 hours ago, artrag said: I think that all the walls/surfaces in EOB are pre-rendered and "composed" on the screen. A given wall can have only 3 distances and 3 orientations in the field of view. You can scan the 3x3 cells from the farthest row to the closest one while plotting the pre-rendered pieces on the screen.
Hmm, that would probably produce better results on the 9918A since you wouldn't be limited to the fat pixels, but I don't know how fast it would be. Are there any MSX games using this technique?
artrag Posted August 17, 2020 (edited)
Not having examined the sources, I cannot say which technique MSX games use, but for sure there is no raycasting involved.
This is a V9938 game that uses something very close to what we said, probably with 4 distances.
Another (craptastic) MSX game using this 3D view and movement (this time for MSX1).
I've found this too.
This other one comes with sources in BASIC: https://www.msx.org/news/software/en/3d-maze-written-in-basic
I vote for your raycaster.
Edited August 17, 2020 by artrag
artrag Posted August 17, 2020 (edited)
13 hours ago, Asmusr said: It took me a while to understand what you're suggesting, and it sounds like an excellent idea, but I think it would take more than a few KB. A half-height column is 96 pixels, and the lower limit per pixel is one 2-byte instruction, so that's 192 bytes per column, or 18 KB for all 96 different heights that fit on screen. But there are also oversized columns that don't fit on screen but still need to be rendered to the full screen height, which takes a lot more bytes.
The simplest code is when the height of the column on screen is the same as the texture height. Then each instruction writes exactly one byte:
    movb *r0+,*r3+       ; Write to screen buffer and advance texture pointer
    movb *r0+,*r3+       ; Write to screen buffer and advance texture pointer
    movb *r0+,*r3+       ; Write to screen buffer and advance texture pointer
    ...
If the column on screen is taller than the texture, some of the texture bytes will be written more than once:
    movb *r0,*r3+        ; Write to screen buffer
    movb *r0,*r3+        ; Write to screen buffer
    movb *r0+,*r3+       ; Write to screen buffer and advance texture pointer
    ...
If the column on screen is shorter than the texture, we have to skip some texture bytes:
    movb *r0+,*r3+       ; Write to screen buffer and advance texture pointer
    inc  r0              ; Skip one texture byte
    movb *r0+,*r3+       ; Write to screen buffer and advance texture pointer
    inc  r0              ; Skip one texture byte
    ...
If we need to skip multiple bytes it will be faster to use ai (add immediate) instructions. I assume that in most cases groups of instructions would repeat themselves periodically, so we could add loops, which may be what we need to fit the code into memory.

Groups of instructions should repeat periodically, so you could add loops and then finish the column with the remaining pixels that exceed a multiple of the period. However, this would give up part of the speed gain, and it would make the code generation more complex.
Now, a script that generates the code for a column could implement the same general algorithm you have in ASM, computing the offset of each pixel in the texture. Those offsets would then be converted into ASM instructions according to their values. You could use a ROM mapper and spread the code for the unrolled columns across different pages. If you do not want to fill a ROM mapper with auto-generated code, you could also unroll only the code for the most frequent heights, e.g. column heights between a minimum and a maximum (according to the maximum and minimum distance of the player from the walls), and keep the general-purpose code you already have for rendering the remaining heights.
Edited August 18, 2020 by artrag
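A minimal Python sketch of what such a generator script might look like. It is not the script used in the project: the 64-texel texture column, the nearest-neighbour sampling, and the function names are assumptions made for illustration. The register roles (r0 for the texture, r3 for the screen buffer) and the three instruction patterns are taken from the snippets quoted above.

```python
def texture_offsets(screen_height, texture_height=64):
    """Texture row sampled for each screen row (nearest neighbour, integer maths).

    texture_height=64 is an assumption; the thread does not state the value.
    """
    return [(y * texture_height) // screen_height for y in range(screen_height)]


def generate_column(screen_height, texture_height=64):
    """Return TMS9900 source lines that draw one unrolled column.

    r0 points at the texture column, r3 at the screen buffer.
    """
    offsets = texture_offsets(screen_height, texture_height)
    lines = []
    for y, offset in enumerate(offsets):
        # Texel needed by the next screen row; after the last row just step by one.
        nxt = offsets[y + 1] if y + 1 < len(offsets) else offset + 1
        step = nxt - offset
        if step == 0:
            lines.append("movb *r0,*r3+")       # same texel is reused next row
        else:
            lines.append("movb *r0+,*r3+")      # write and advance by one texel
            if step == 2:
                lines.append("inc  r0")         # skip exactly one texel
            elif step > 2:
                lines.append(f"ai   r0,{step - 1}")  # skip several texels at once
    return lines


def code_size(lines):
    """Size in bytes: 'ai' carries an immediate word (4 bytes), the rest are 2 bytes."""
    return sum(4 if line.startswith("ai") else 2 for line in lines)


if __name__ == "__main__":
    for height in (32, 64, 128):
        column = generate_column(height)
        print(height, len(column), "instructions,", code_size(column), "bytes")
```

Summing `code_size` over every height you intend to unroll gives a quick estimate of how many ROM pages the generated code would need before committing to a layout.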
Asmusr Posted August 20, 2020 Author
On 8/17/2020 at 10:52 PM, artrag said: Groups of instructions should repeat periodically, so you could add loops and then finish the column with the remaining pixels that exceed a multiple of the period. However, this would give up part of the speed gain, and it would make the code generation more complex. Now, a script that generates the code for a column could implement the same general algorithm you have in ASM, computing the offset of each pixel in the texture. Those offsets would then be converted into ASM instructions according to their values. You could use a ROM mapper and spread the code for the unrolled columns across different pages. If you do not want to fill a ROM mapper with auto-generated code, you could also unroll only the code for the most frequent heights, e.g. column heights between a minimum and a maximum (according to the maximum and minimum distance of the player from the walls), and keep the general-purpose code you already have for rendering the remaining heights.
I did all the groundwork of adding the unrolled texture drawing code to the ROM cartridge before I realized there is a big problem: the textures themselves are also in the ROM cart, and it is not possible to map two banks of the cart into the CPU address space at the same time. The good news is that I still have room in RAM to copy the current textures over, so I'm able to make a demo to see the effects of unrolling the drawing code, but that wouldn't work in a game with lots of textures. The only solution I can think of is to use SAMS memory, which allows multiple 4K pages to be mapped at different locations.
artrag Posted August 20, 2020
Sorry to hear this. Could you put the textures in the non-mapped part of the ROM?
Asmusr Posted August 20, 2020 Author
19 minutes ago, artrag said: Sorry to hear this. Could you put the textures in the non-mapped part of the ROM?
There isn't any non-mapped part of the ROM. A cartridge in the cartridge slot can only be mapped into one 8K memory region, and the standard cartridge design maps all 8K as one page.
artrag Posted August 20, 2020 (edited)
On MSX, ROMs are usually visible in a 32 KB window, divided into 4 pages of 8 KB each (or 2 pages of 16 KB each). Sorry for having given bad advice.
Edited August 20, 2020 by artrag
jrhodes Posted August 20, 2020 (edited)
2 hours ago, Asmusr said: ... The only solution I can think of is to use SAMS memory, which allows multiple 4K pages mapped at different locations.
Time to put the SAMS memory sidecar I purchased to good use (fun).
Edited August 20, 2020 by jrhodes
Elia Spallanzani fdt Posted August 20, 2020
With SAMS, you could combine adamantyr's game with 3D dungeons.
artrag Posted August 20, 2020
If this is the SAMS expansion, it should be able to show multiple 4 KB (and 8 KB) pages: http://www.unige.ch/medecine/nouspikel/ti99/superams.htm#low-level
Asmusr Posted August 21, 2020 Author (edited)
20 hours ago, artrag said: On MSX, ROMs are usually visible in a 32 KB window, divided into 4 pages of 8 KB each (or 2 pages of 16 KB each). Sorry for having given bad advice.
It's still a very good suggestion, and I'm almost there. Before this optimization, the routines that take a long time are (approximately):
- Cast rays: 200,000 cycles
- Draw screen: 1,000,000 cycles
- Copy screen to VDP: 200,000 cycles
With 3,000,000 cycles per second, 1,400,000 cycles correspond to approximately 2 frames per second. The 400,000 cycles from the first and last routines will still be there with the optimization, but the hope is to make a good cut in the middle one.
Edited August 21, 2020 by Asmusr
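The arithmetic in the post above can be checked with a few lines of Python. The cycle counts and the 3,000,000 cycles per second are the approximate figures quoted in the post; the function and variable names are just for illustration. The second calculation is a hypothetical upper bound: what the frame rate would be if drawing cost nothing and only the 400,000 fixed cycles remained.

```python
CLOCK_HZ = 3_000_000  # cycles per second, as quoted in the post

# Approximate per-frame costs before the optimization, from the post.
STAGE_CYCLES = {
    "cast rays": 200_000,
    "draw screen": 1_000_000,
    "copy screen to VDP": 200_000,
}


def frames_per_second(stage_cycles, clock_hz=CLOCK_HZ):
    """Frame rate if one frame costs the sum of all stage cycle counts."""
    return clock_hz / sum(stage_cycles.values())


baseline = frames_per_second(STAGE_CYCLES)
# Hypothetical best case: drawing is free, ray casting and the VDP copy remain.
ceiling = frames_per_second({**STAGE_CYCLES, "draw screen": 0})

if __name__ == "__main__":
    print(f"baseline: {baseline:.2f} FPS, ceiling with free drawing: {ceiling:.2f} FPS")
```

With these rough numbers the drawing routine accounts for about 70% of the frame, which is why it is the obvious target for unrolling.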
Asmusr Posted August 21, 2020 Author (edited)
Here we go: the time for the drawing routine has been cut in half, resulting in a frame rate of 4-5 FPS.
Attachments: texcaster.rpk, texcaster8.bin
Edited August 21, 2020 by Asmusr
Asmusr Posted August 21, 2020 Author
Just to note that in the optimized version I presented, I'm not handling columns taller than the screen correctly, which is apparent when you move close to the walls. The simple solution will be to add a few more ROM banks to deal with those additional heights.
artrag Posted August 21, 2020 (edited)
It is quite a bit faster. Great work! Are you using the SAMS extension and its paging?
BTW, another possible optimisation of the same kind would be to specialise the column-drawing code for flat-colour walls. You could generate a complete set of unrolled routines for flat-colour walls (or probably a single routine with different entry points) where the input colour is in a register. If textured and flat walls are mixed, the gain could be worth it.
Edited August 21, 2020 by artrag
Asmusr Posted August 21, 2020 Author (edited)
20 hours ago, artrag said: It is quite a bit faster. Great work! Are you using the SAMS extension and its paging? BTW, another possible optimisation of the same kind would be to specialise the column-drawing code for flat-colour walls. You could generate a complete set of unrolled routines for flat-colour walls (or probably a single routine with different entry points) where the input colour is in a register. If textured and flat walls are mixed, the gain could be worth it.
No, I'm not using SAMS yet. I copied 4 textures to RAM, and that's at least as fast as using SAMS.
I also thought about optimizing the sky/floor/monochrome wall drawing, but I don't think it's worthwhile to unroll those loops entirely unless the sky/floor are also textured. In this video I doubled the number of pixels written per wall/floor iteration from 4 to 8, and maybe you can see a slight difference, but I don't think unrolling those loops any further will have any visible effect.
Edited August 22, 2020 by Asmusr
artrag Posted August 21, 2020
Monochrome walls can have a specialized routine of 64 instructions, each writing a single value from a register. Depending on the entry point, you can plot from 1 pixel to 64 pixels, and you can use it for walls, ceiling and floor as well.
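A small Python sketch of how the entry points into such a routine could be worked out. The labels (`fillNN`), the use of r1 for the colour, and the trailing `rt` are hypothetical choices for illustration; only the idea of 64 identical write instructions with variable entry comes from the post. On the TMS9900, `movb r1,*r3+` is a single 16-bit word and byte operations on a register use its most significant byte.

```python
INSTRUCTION_BYTES = 2  # "movb r1,*r3+" assembles to one 16-bit word
MAX_PIXELS = 64        # length of the unrolled routine, as suggested in the post


def fill_routine_source(max_pixels=MAX_PIXELS):
    """Source lines for the unrolled fill; label fillNN plots NN pixels.

    Labels, r1 (colour in the high byte) and r3 (screen pointer) are assumptions.
    """
    lines = [f"fill{n:02d} movb r1,*r3+" for n in range(max_pixels, 0, -1)]
    lines.append("       rt")  # return to caller after the last write
    return lines


def entry_offset(pixels, max_pixels=MAX_PIXELS):
    """Byte offset from the start of the routine that plots exactly `pixels` pixels."""
    if not 1 <= pixels <= max_pixels:
        raise ValueError("pixels must be between 1 and max_pixels")
    # Jumping in later skips the leading (max_pixels - pixels) instructions.
    return (max_pixels - pixels) * INSTRUCTION_BYTES


if __name__ == "__main__":
    print(fill_routine_source()[0])           # first instruction, entry for 64 pixels
    print(entry_offset(64), entry_offset(1))  # offsets of the two extreme entries
```

Because every instruction has the same size, the entry address is a simple linear function of the pixel count, so the caller can compute it directly or look it up in a 64-entry table.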
artrag Posted August 22, 2020 (edited)
Maybe a dumb question... Why do you need 16 pointers in upload_screen? I was expecting you to arrange the name table in columns of 8 tiles so that you can write 64 adjacent bytes. This lets you set the VRAM pointer only 3 times per column, once per tile bank. In this way you can use a RAM buffer no longer than a column.
Edited August 22, 2020 by artrag
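To illustrate the layout described above, here is a Python sketch of a column-major name table and the three VRAM start addresses per column. It assumes the standard TMS9918A bitmap mode with three 2 KB pattern banks of 256 tiles each, 32 tile columns, and a pattern table at VRAM address 0x0000; the table base and all names are assumptions, not taken from the project's source.

```python
PATTERN_TABLE = 0x0000  # assumed VRAM base of the pattern table
BANK_BYTES = 0x0800     # each third of the screen has its own 2 KB pattern bank
ROWS_PER_BANK = 8       # 8 tile rows per third of the screen
BYTES_PER_TILE = 8
COLUMNS = 32
BANKS = 3


def name_table():
    """24x32 name table where tile numbers run down each column within a bank."""
    return [
        [col * ROWS_PER_BANK + (row % ROWS_PER_BANK) for col in range(COLUMNS)]
        for row in range(BANKS * ROWS_PER_BANK)
    ]


def column_run_addresses(col):
    """VRAM start address of the 64-byte run for `col` in each of the 3 banks."""
    first_tile = col * ROWS_PER_BANK
    return [
        PATTERN_TABLE + bank * BANK_BYTES + first_tile * BYTES_PER_TILE
        for bank in range(BANKS)
    ]


if __name__ == "__main__":
    for col in (0, 1, 31):
        print(col, [hex(a) for a in column_run_addresses(col)])
```

With this arrangement the 8 tiles of a column within one bank occupy 8 x 8 = 64 consecutive bytes, so a column upload is three pointer set-ups followed by three 64-byte sequential writes.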
fabrice montupet Posted August 22, 2020
Just a detail: at some point, the dungeon will need a roof. Maybe changing the cyan color to a more suitable one would simulate it?
Asmusr Posted August 22, 2020 Author
4 hours ago, artrag said: Maybe a dumb question... Why do you need 16 pointers in upload_screen? I was expecting you to arrange the name table in columns of 8 tiles so that you can write 64 adjacent bytes. This lets you set the VRAM pointer only 3 times per column, once per tile bank. In this way you can use a RAM buffer no longer than a column.
I think you're looking at the master branch, which contains the non-texture-mapped code. You need to look at the texture_mapped_unrolled branch.
It's correct that I could use a single-column RAM buffer as it is now, but when I start adding objects on top of the background it would be more difficult. The full screen buffer also makes the screen update shorter and possibly less prone to flicker than a single-column buffer.
Asmusr Posted August 22, 2020 Author
2 hours ago, fabrice montupet said: Just a detail: at some point, the dungeon will need a roof. Maybe changing the cyan color to a more suitable one would simulate it?
My plan is to make a textured ceiling, which will perhaps just be a static image.
+FarmerPotato Posted August 23, 2020
Hi @Asmusr, I tried assembling the version of Raycaster from GitHub. It uses the xas99.py -w option, which wasn't recognized by the xdt99 I had, or by the latest 3.00. Can you help me with -w?
Asmusr Posted August 23, 2020 Author
25 minutes ago, FarmerPotato said: Hi @Asmusr, I tried assembling the version of Raycaster from GitHub. It uses the xas99.py -w option, which wasn't recognized by the xdt99 I had, or by the latest 3.00. Can you help me with -w?
I haven't upgraded to version 3 yet. ;-) The -w option is just to suppress warnings about unused labels.
+FarmerPotato Posted August 23, 2020
17 minutes ago, Asmusr said: I haven't upgraded to version 3 yet. The -w option is just to suppress warnings about unused labels.
OK, it's -q to suppress warnings in the versions I have. Thanks!
artrag Posted August 25, 2020
About ceilings and floors: the fastest way to render them is differential plotting, i.e. plot only the part that is needed. Store in RAM an array of column heights (64 bytes) for the current frame. If the new height is equal to or higher than the one stored from the previous frame, plot only the wall column, as nothing changes in the ceiling and floor. If the new height is shorter than the one stored from the previous frame, plot the new column plus only the strips of ceiling and floor between the previous height and the new height. In this way the ceiling and floor are either not plotted at all or plotted only in the area of the difference between the two columns.
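The rule above can be sketched in Python as follows. This is an illustration only: the 192-row screen, vertically centred walls, and all names are assumptions, and heights are taken to be already clamped to the screen height. The function returns the row spans that need plotting for one column, given the height stored from the previous frame.

```python
SCREEN_HEIGHT = 192  # assumed number of rows per column


def spans_to_plot(prev_height, new_height, screen_height=SCREEN_HEIGHT):
    """Return (kind, first_row, end_row_exclusive) spans to draw for one column.

    Walls are assumed vertically centred and heights clamped to the screen.
    """
    new_top = (screen_height - new_height) // 2
    new_bottom = new_top + new_height
    spans = [("wall", new_top, new_bottom)]
    if new_height < prev_height:
        # The wall shrank: repaint only the strips the old wall used to cover.
        prev_top = (screen_height - prev_height) // 2
        prev_bottom = prev_top + prev_height
        spans.insert(0, ("ceiling", prev_top, new_top))
        spans.append(("floor", new_bottom, prev_bottom))
    return spans


def render_frame(prev_heights, new_heights):
    """Plan every column, then return the plan and the heights to store for next frame."""
    plan = [spans_to_plot(p, n) for p, n in zip(prev_heights, new_heights)]
    return plan, list(new_heights)


if __name__ == "__main__":
    print(spans_to_plot(100, 120))  # wall grew: only the wall is plotted
    print(spans_to_plot(120, 100))  # wall shrank: ceiling strip, wall, floor strip
```

Note that this only holds while nothing else is drawn over the ceiling or floor between frames, since the scheme relies on the previous frame's pixels still being in the buffer.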
Asmusr Posted August 25, 2020 Author
7 hours ago, artrag said: About ceilings and floors: the fastest way to render them is differential plotting, i.e. plot only the part that is needed. Store in RAM an array of column heights (64 bytes) for the current frame. If the new height is equal to or higher than the one stored from the previous frame, plot only the wall column, as nothing changes in the ceiling and floor. If the new height is shorter than the one stored from the previous frame, plot the new column plus only the strips of ceiling and floor between the previous height and the new height. In this way the ceiling and floor are either not plotted at all or plotted only in the area of the difference between the two columns.
It's a good suggestion, but it won't work (for the floor at least) if I start adding other objects to the screen buffer. Maybe an object will never overlap the ceiling, so there it would work?