+Black_Tiger Posted January 8, 2019 Share Posted January 8, 2019 I finally got to try this out on real hardware and it is much more impressive than I expected. I think that a good use for this in a game would be for a dungeon crawler RPG like Shining in the Darkness. Have random battles and when encountering an enemy/battle, just switch to a battle screen without the 3D engine. Even just a black background and menus with an enemy visual would be par with many NES RPGs. Quote Link to comment Share on other sites More sharing options...
JohnPCAE Posted January 9, 2019 Author Share Posted January 9, 2019 I'm glad you like it! I found a small bug in the code and so I'm reposting the ZIP file. I'm also posting a newer one that has most of the raycasting code moved to macros. I'm not sure if it's easier or harder to understand, but it does make the source file shorter and lets me unroll some loops for some extra performance. Both versions also have a separate slight optimization that squeezes a bit more performance out of the raycaster engine. Otherwise, the only difference between the two versions is the use of macros that allows for some loop unrolling. raycast_20190106.zip raycast_20190108.zip 1 Quote Link to comment Share on other sites More sharing options...
JohnPCAE Posted January 10, 2019 Author Share Posted January 10, 2019 Small bugfix (you had to be looking along an exact direction for the bug to arise) as well as MORE SPEED! This time the optimization is in the colored-squares drawing routine with the help of some lookup tables. raycast_20190109.zip 1 Quote Link to comment Share on other sites More sharing options...
JohnPCAE Posted January 10, 2019 Author Share Posted January 10, 2019 I found a couple more optimizations in the drawing code that I frankly can't believe I missed So here's a slightly faster version: raycast_20190109a.zip 3 Quote Link to comment Share on other sites More sharing options...
JohnPCAE Posted January 17, 2019 Author Share Posted January 17, 2019 Fixed a couple of bugs and squeezed some more speed out of the raycaster. raycast_20190116.zip 2 Quote Link to comment Share on other sites More sharing options...
JohnPCAE Posted January 17, 2019 Author Share Posted January 17, 2019 *sigh* I was simplifying some of the macros and found another bug that I had to fix. However, I also squeezed out a few more unnecessary instructions from the raycaster, so it's not all bad... raycast_20190117.zip 2 Quote Link to comment Share on other sites More sharing options...
artrag Posted January 17, 2019 Share Posted January 17, 2019 I think that there is some numeric problem somewhere. Under certain angles, you can see across the edges of the walls Quote Link to comment Share on other sites More sharing options...
+Lathe26 Posted January 17, 2019 Share Posted January 17, 2019 I think that there is some numeric problem somewhere. Under certain angles, you can see across the edges of the walls I just tried out the PlayStation Classic playing Cool Boarders 2 (made in 1997) and it had the same issue. Quote Link to comment Share on other sites More sharing options...
JohnPCAE Posted January 18, 2019 Author Share Posted January 18, 2019 Yeah that phenomenon has been driving me batty for years. I think it might be due to roundoff error. If anyone can find a solution, I'm all ears! Quote Link to comment Share on other sites More sharing options...
artrag Posted January 18, 2019 Share Posted January 18, 2019 (edited) The problem is due to the fact that the ray is missing the collision with the walls. John, can you describe in C the DDA algorithm that you use for plotting the rays? I'm not very into the CP1600 assembly, but maybe replicating the core of the ray-casting algorithm in C with the same math we can find the bug more easily. Edited January 18, 2019 by artrag 1 Quote Link to comment Share on other sites More sharing options...
JohnPCAE Posted January 18, 2019 Author Share Posted January 18, 2019 I hate it when my allergies keep me up. I especially hate it when I'm supposed to work the next day. Until the Allegra-D kicks in, I decided to do some sleuthing. I hand-calculated the values it's using for the screenshot you posted and it's indeed due to roundoff error. These are the values in the screenshot:X 0DEEY 0E66A 010AI should probably point out what coordinate values really mean here. All position values are represented in signed 8.8 fixed-point format: the high 8 bits represent the integer part of the value and the low 8 bits represent the fractional part of the value.The meaning of these values with respect to the source code is as follows:XPos 0DEE (in fixed-point form it equates to 0D.EE, or 13.9296875 in decimal)YPos 0E66 (in fixed-point form it equates to 0E.66, or 12.3984375 in decimal)Angle 010A (266 decimal, which equates to 199.5 degrees)The column on the screen with the problem is column 22 (the left edge is column 0 and the right edge is column 39). The way the individual column angles are calculated is as follows: the exact center of the screen corresponds to the A value given above (199.5 degrees in this case). This leaves 20 columns on either side of that angle. In the engine, there are 480 angle "units" to a full circle, or in other words each unit equals 0.75 degrees (that's how I go from A=266 to 199.5 degrees -- multiply by 0.75). Starting from the center of the screen, the column immediately to the left is at angle A - 1, or in degrees, the central angle minus 0.75. Therefore, column 19 represents an angle of 199.5 - 0.75 = 198.75 degrees. Likewise, column 20 represents an angle of 199.5 + 0.75 = 200.25 degrees.For each subsequent column, the angle unit value increments by TWO. The reason for this is that the angle units are scaled for rendering a bitmap that is 80 pixels wide with a 60-degree field of view (the bitmap-mode view that initially displays). That's where the 0.75 multiplier comes from. However, in colored-squares mode the bitmap has only half the pixels, so the units must increment by two from one column to the next to represent the same 60-degree field of view.Putting all this together, the angle units range from A - 39 for column 0 to A + 39 for column 39, or in degrees, Angle - 29.25 to Angle + 29.25 (giving a 58.5-degree field of view). Given this, column 22 (the problematic one in this case) represents a ray angle of A + 5, or 203.25 degrees.When casting rays, the engine actually casts two rays: one that tests at integer X values and one that tests at integer Y values. Testing at integer X values lets it detect walls that face east or west, and testing at integer Y values lets it detect walls that face north or south. It does this by maintaining two pairs of coordinates:(XNext, YEdge) coordinates where XNext is an integer, i.e. XNext & $00FF = 0(XEdge, YNext) coordinates where YNext is an integer, i.e. YNext & $00FF = 0For the first pair of tests, the engine has to look at the viewer's position and calculate where the rays would be at the edges of the "box" that the viewer is currently occupying (the maze is 16 boxes by 16 boxes, where the fixed-point representation of position essentially subdivides each box into a 256x256 grid). In this particular example, the viewer's X position is 0DEE, which means box #13 and a fractional position of $00EE. So for a test where XNext is an integer (0D00), it has to calculate what the Y value would be. Likewise, since the viewer is at Y position 0E66, if the ray is to be at Y = 0E00, it has to calculate what the X value would be.This is where roundoff error becomes a problem.When looking along this angle (west-northwest), the formulas for XEdge and YEdge are as follows:XEdge = XPos - frac(YPos) * cot(CurAng)YEdge = YPos - frac(XPos) * tan(CurAng)So each formula takes the fractional part of the position (which determines the distance to the edge of the box -- get it?), multiplies it by the slope (or the inverse of the slope, depending on which coordinate we are calculating), and then adjusts the coordinate accordingly. Pretty simple. So what actually happens?The program has tables for lots of trigonometric functions, including tan and cos. Each table contains 480 entries, which correspond to the 480 possible angular values. Like everything else, they contain values in 8.8 fixed-point format, which introduces the first level of roundoff error -- limited precision.In this example, cot(CurAng) and tan(CurAng) are:cot = 0253 (02.53 if you think in fixed-point terms, or 2.32421875)tan = 006E (00.6E if you think in fixed-point terms, or 0.4296875)By comparison, the actual values that Calculator returns are:cot(203.25) = 2.3275630393965955716880472651463tan(203.25) = 0.42963390596683602666819711641139Then it has to multiply them by the fractional parts of YPos and XPos, respectively:frac(YPos) * cot(CurAng) = 00.66 * 02.53frac(XPos) * tan(CurAng) = 00.EE * 00.6EWe can hand-calculate the result by ignoring the decimal points and just multiplying the raw values together, then shift the result right by 8 bits:0066 * 0253 = 00ED00EE * 006E = 0066Subtracting from XPos and YPos, respectively, we get the final values:XEdge = XPos - 00ED = 0D01YEdge = YPos - 0066 = 0E00So the coordinates to be tested for the two rays are:(XNext, YEdge) = (0D00, 0E00)(XEdge, YNext) = (0D01, 0E00)If you look at the map definition at the bottom of the source, neither of these coordinates results in a wall hit.They also don't match, which is a red flag. The first ray is landing exactly at the corner (which is problematic in and of itself and can be a separate cause of a blank column). However, that's not the problem in this case. The problem is that it *shouldn't* be landing at that position at all.The second ray is landing just to the right of it. That's the ray that will move up in integral values of Y. If that is the correct ray, it implies that the first ray should be landing somewhere above Y=0E00. We can get a hint by increasing our precision.Instead of shifting the result right by 8 bits when multiplying, let's see what happens when we keep the entire result:0066 * 0253 = ED1200EE * 006E = 6644We can think of it in fixed-point terms as this:00.66 * 02.53 = 00.ED1200.EE * 00.6E = 00.6644Now if we subtract those from XPos and YPos, we get something different:XEdge = XPos - 00ED[12] = 0D00[EE] (think of it as 0D.00EE)YEdge = YPos - 0066[44] = 0DFF[bC] (think of it as 0D.FFBC)XEdge shifted left a tiny bit, but the "box" is still 0D, which doesn't give a wall hit. However, YEdge moved up a tiny bit, enough to move it up to the next "box", which DOES give a wall hit. The extra precision basically brought the two rays closer together.So precision is a problem here. 2 Quote Link to comment Share on other sites More sharing options...
artrag Posted January 18, 2019 Share Posted January 18, 2019 (edited) Without floating point precision, to me the trigonometric approach you are using seems doomed ... If you rise the the number of bits in multiplications and tables to get numeric accuracy, frame rate will be computed in seconds per frame rather than in frames per second. You could use JLP expansion and its HW multiplier. In this case you can count on a super fast Multiply / divide accelerator and you can rise the accuracy as you like, but the result will work only in a JLP cartridge. Another option, if you want to keep the game for plain Intellivision, is to change DDA algorithm and avoid the trigonometric functions at all. If you avoid angles to scan the field of view, you can avoid cos() sin() tan() while tracing the rays. This is a simple implementation of DDA based on the normal to the direction of view https://lodev.org/cgtutor/raycasting.html#The_Basic_Idea_ I used it in C and numerically it was quite stable using in FP 8.8 I've also done a faster implementation in FP 8.8 for z80, if you want I can share it and help to convert it to CP1600 Edited January 18, 2019 by artrag Quote Link to comment Share on other sites More sharing options...
JohnPCAE Posted January 19, 2019 Author Share Posted January 19, 2019 (edited) It is using the DDA approach: after that initial calculation for the two rays, it then always increments by one X unit or one Y unit. Everything I described above relates to the part midway down the page where he calculates the step and initial sideDist values (that page was actually my starting point for the algorithm). The major difference is in how I represent walls in the WorldMap array. The algorithm you linked to has every square as completely solid, which only requires a single 2D array to represent the map (I actually started out with the same scheme at first). My map is represented as four *separate* arrays, with each array representing the walls only along a certain edge of each square (top, bottom, left, and right edges). This allows a square to have walls of different colors depending on what edge you're viewing, as well as potentially making doorways possible. This approach requires that the initial collision check calculate the actual position along the X or Y step value instead of just the distance to that point. It also is why I have to use two rays instead of just one. Maybe the answer lies in revisiting how walls are represented? Edited January 19, 2019 by JohnPCAE Quote Link to comment Share on other sites More sharing options...
artrag Posted January 19, 2019 Share Posted January 19, 2019 From what I see the main difference is that you use angles instead of the vector normal to the view direction If you use the vector normal instead of the angle to scan the scene, you replace cot(CurAng) and tan(CurAng) by this //calculate ray position and direction double cameraX = 2 * x / double(w) - 1; //x-coordinate in camera space double rayDirX = dirX + planeX * cameraX; double rayDirY = dirY + planeY * cameraX; //which box of the map we're in int mapX = int(posX); int mapY = int(posY); //length of ray from current position to next x or y-side double sideDistX; double sideDistY; //length of ray from one x or y-side to next x or y-side double deltaDistX = std::abs(1 / rayDirX); double deltaDistY = std::abs(1 / rayDirY); double perpWallDist; //what direction to step in x or y-direction (either +1 or -1) int stepX; int stepY; int hit = 0; //was there a wall hit? int side; //was a NS or a EW wall hit? //calculate step and initial sideDist if (rayDirX < 0) { stepX = -1; sideDistX = (posX - mapX) * deltaDistX; } else { stepX = 1; sideDistX = (mapX + 1.0 - posX) * deltaDistX; } if (rayDirY < 0) { stepY = -1; sideDistY = (posY - mapY) * deltaDistY; } else { stepY = 1; sideDistY = (mapY + 1.0 - posY) * deltaDistY; } //perform DDA while (hit == 0) { //jump to next map square, OR in x-direction, OR in y-direction if (sideDistX < sideDistY) { sideDistX += deltaDistX; mapX += stepX; side = 0; } else { sideDistY += deltaDistY; mapY += stepY; side = 1; } //Check if ray has hit a wall if (worldMap[mapX][mapY] > 0) hit = 1; Full source is here https://lodev.org/cgtutor/files/raycaster_flat.cpp and its explanation is here https://lodev.org/cgtutor/raycasting.html#The_Basic_Idea_%C2%A0 1 Quote Link to comment Share on other sites More sharing options...
JohnPCAE Posted January 19, 2019 Author Share Posted January 19, 2019 (edited) I understand the math. I should point out that I'm using similar math in my distance calculation: rayDirX is cos(angle) and rayDirY is sin(angle). In calculating the distance above, it divides by rayDirX or rayDirY, respectively, which is equivalent to multiplying by sec(angle) or csc(angle) respectively -- which is exactly what my code does. The only major difference is in how the code steps from one block to the next: I'm tracing two rays at once whereas the algorithm above alternates on which ray to trace based on distance traveled. That aspect is probably something I should look at. A few months ago I couldn't have used that approach because I wouldn't know which face a ray impacted -- as is the case above. However, with everything broken out now into special cases based on the direction of the ray, I should now be able to determine it by implication. Edited January 19, 2019 by JohnPCAE Quote Link to comment Share on other sites More sharing options...
artrag Posted January 19, 2019 Share Posted January 19, 2019 (edited) Your math is equivalent to the one in the sample code only if you use infinite precision, numerically they are different if you use fixed point math. Anyway as you spotted, the sample code is increasing x or y by 1 at each step and can detect witch side of the wall is colliding using only one ray. This could be another important difference Edited January 19, 2019 by artrag Quote Link to comment Share on other sites More sharing options...
artrag Posted January 22, 2019 Share Posted January 22, 2019 (edited) There is another optimization that you can consider in the rendering code.The whole screen can be represented by an array of heights and an array of colors with length equal to the number of columns in the field of view.When you plot a column, if it is of the same color of the one already on the screen, you can plot only the difference in height top and down. Most of the time you end up with plotting one or two pixels top and down the column. If the new height is smaller that the old one, you only plot pixels for ceiling and floor, otherwise you fill the difference with wall color. In both cases you have updated the column by plotting only few pixels. If the color is different then you can keep the portion of ceiling and floor and plot the wall section only. Edited January 22, 2019 by artrag Quote Link to comment Share on other sites More sharing options...
artrag Posted January 27, 2019 Share Posted January 27, 2019 (edited) JohnPCAE I've to apologize, in the end in my code I've ended to use secants and angles (2*pi is divided in 4*256 angles). This is my DDA loop in Z80 ASM.. psect text,class=CODE ; muluw .hl,.bc |ED C3 ;de:hl<-hl*bc ; mulub .a,.b |ED C1 ; hl<- a*b ; mulub .a,.c |ED C9 ; mulub .a,.d |ED D1 ; mulub .a,.e |ED D9 muluw_hl_bc macro db 0xed,0xc3 ;de:hl<-hl*bc endm mulub_a_b macro db 0xED,0xC1 ;hl<- a*b endm global cosine_low global secant_low global _x,_y global _alpha global _fovAngleP global _distP global _mapWidth global _world global _direction global _color global _side global _height global _dist ; //// global variables: ; ; // input for DDA asm routine: ; FP_8_8 x; // x-coordinate player position ; FP_8_8 y; // y-coordinate player position ; ANGLE alpha; // viewing direction ; ANGLE* fovAngleP; // should be initialized to start of fovAngles array ; FP_8_8* distP; // should be initialized to start of dists array ; ; // output from DDA asm routine: ; FP_8_8 dists[NDIRS]; // perpendicular distance per column (one element filled in) ; uchar color; // 'color' of the hit world-block ; uchar side; // side on which the block was hit ; uint height; // visible height for the column ; // the input variables fovAngleP and distP are updated ; ANGLE direction; // required for the texture coord calculation ; // (split in upper and lower 8-bit part) ; ; ; //// Use the asm routine like this: ; ; void draw() ; { ; vdp_setup0(); ; fovAngleP = fovAngles; ; distP = dists; ; for (i = 0; i < NDIRS; ++i) { ; ASM: <the asm routine below> ; ASM: this replaces lines 207-352 from the C code ; ASM: in the future more will be moved from C to asm ; ; C: asm routine above calculates the variables: ; C: color, side, height, direction ; C: (optionally) calculate texture coord ; C: the variables 'a' and 'quadrant' should be ; C: replaced with the lower/upper 8 bits of 'direction' ; C: (upper part still needs '& 3'). ; C: actually draw the column ; } ; } ; global _fast_dda _fast_dda: push ix ; TODO make _mapWidth a power of 2 ld a,(_y+1) ; a = y >> 8 ld b,_mapWidth mulub_a_b ; hl = (y >> * mapWidth ld bc,(_x+1) ; c = x >> 8 ld b,_world/256 add hl,bc push hl ; hl = &world[y >> 8][x >> 8] ld hl,(_fovAngleP) ld e,(hl) inc hl ld d,(hl) ; de = fovAngle ld hl,(_alpha) add hl,de ; hl = alpha + fovAngle ld (_direction),hl ; switch (quadrant) ld a,h ; hl = direction and 3 ; a = quadrant jp z,dda_0 ; case 0 cp 2 jp c,dda_1 ; case 1 jr z,dda_2 ; case 2 ;jr dda_3 ; case 3 dda_3: ; case 3 ld a,l ; a = direction & 255 neg ; 256 - a ld bc,32767 jr z,1f ; a == 0? ld h,secant_low/256 ld l,a ; hl = &secant_low[256-a] ld c,(hl) inc h ; hl = &secant_high[256-a] ld b,(hl) 1: push bc ; bc = lenX ld a,(_x) ; a = xfrac neg jr z,1f ; a == 0? ld h,0 ld l,a muluw_hl_bc ; de:hl = (256 - xfrac) * lenX ld c,h ; bc = de:hl >> 8 ld b,e 1: push bc ; bc = distX ld hl,(_direction) ; l = direction & 255 ld h,secant_low/256; hl = &secant_low[a] ld c,(hl) inc h ; hl = &secant_high[a] ld b,(hl) push bc ; bc = lenY ld hl,(_y) ; l = yfrac ld h,0 muluw_hl_bc ; de:hl = yfrac * lenY ld l,h ld h,e ; hl = distY = de:hl >> 8 pop de ; de = lenY exx pop ix ; ix = distX pop de ; de' = lenX pop hl ; hl' = worldP ld bc,-_mapWidth ; bc' = -mapWidth ; while (1) loop3b: exx ; loop3: ld a,ixh ; distX.high loop3: db 0xdd ld a,h ; distX.high cp h ; distY.high jr c,2f ; ->distX smaller jr nz,1f ; ->distY smaller ; ld a,ixl ; distX.low db 0xdd ld a,l ; distX.low cp l ; distY.low jr c,2f ; -> distX smaller 1: ; distX >= distY exx add hl,bc ; worldP -= mapWidth ld a,(hl) ; a = color or a exx jp nz,end3 add hl,de ; distY += lenY jr loop3 2: ; distX < distY exx inc hl ; worldP += 1 ld a,(hl) ; a = color or a jp nz,end0 add ix,de ; distX += lenX jr loop3b dda_2: ; case 2 ld h,secant_low/256; hl = &secant_low[a] ld c,(hl) inc h ; hl = &secant_high[a] ld b,(hl) push bc ; bc = lenX ld hl,(_x) ; l = xfrac ld h,0 muluw_hl_bc ; de:hl = xfrac * lenX ld b,e ld c,h ; bc = de:hl >> 8 push bc ; bc = distX ld a,(_direction) ; a = direction & 255 neg ld bc,32767 jr z,1f ; a == 0? ld h,secant_low/256 ld l,a ; hl = &secant_low[256 - a] ld c,(hl) inc h ; hl = &secant_high[256 - a] ld b,(hl) 1: push bc ; bc = lenY ld hl,(_y) ; l = yfrac ld h,0 muluw_hl_bc ; de:hl = yfrac * lenY ld l,h ld h,e ; hl = distY = de:hl >> 8 pop de ; de = lenY exx pop ix ; ix = distX pop de ; de' = lenX pop hl ; hl' = worldP ld bc,-_mapWidth ; bc' = -mapWidth ; while (1) loop2b: exx ; loop2: ld a,ixh ; distX.high loop2: db 0xdd ld a,h ; distX.high cp h ; distY.high jr c,2f ; ->distX smaller jr nz,1f ; ->distY smaller ; ld a,ixl ; distX.low db 0xdd ld a,l ; distX.low cp l ; distY.low jr c,2f ; -> distX smaller 1: ; distX >= distY exx add hl,bc ; worldP -= mapWidth ld a,(hl) ; a = color or a exx jp nz,end3 add hl,de ; distY += lenY jr loop2 2: ; distX < distY exx dec hl ; worldP -= 1 ld a,(hl) ; a = color or a jp nz,end2 add ix,de ; distX += lenX jr loop2b dda_1: ; case 1 ld a,l ; a = direction & 255 neg ld bc,32767 jr z,1f ; a == 0? ld h,secant_low/256 ld l,a ; hl = &secant_low[a] ld c,(hl) inc h ; hl = &secant_high[a] ld b,(hl) 1: push bc ; bc = lenX ld hl,(_x) ; l = xfrac ld h,0 muluw_hl_bc ; de:hl = xfrac * lenX ld b,e ld c,h push bc ; bc = distX ld hl,(_direction) ; l = direction & 255 ld h,secant_low/256; hl = &secant_low[a] ld c,(hl) inc h ; hl = &secant_high[a] ld b,(hl) push bc ; bc = lenY ld a,(_y) ; a = yfrac neg jr z,1f ld h,0 ld l,a muluw_hl_bc ld b,e ld c,h 1: ld h,b ld l,c ; hl = distY pop de ; de = lenY exx pop ix ; ix = distX pop de ; de' = lenX pop hl ; hl' = worldP ld bc,_mapWidth ; bc' = mapWidth ; while (1) loop1b: exx ;loop1: ld a,ixh ; distX.high loop1: db 0xdd ld a,h ; distX.high cp h ; distY.high jr c,2f ; ->distX smaller jr nz,1f ; ->distY smaller ; ld a,ixl ; distX.low db 0xdd ld a,l ; distX.low cp l ; distY.low jr c,2f ; -> distX smaller 1: ; distX >= distY exx add hl,bc ; worldP += mapWidth ld a,(hl) ; a = color or a exx jr nz,end1 add hl,de ; distY += lenY jr loop1 2: ; distX < distY exx dec hl ; worldP -= 1 ld a,(hl) ; a = color or a jr nz,end2 add ix,de ; distX += lenX jr loop1b dda_0: ; case 0 ld h,secant_low/256; hl = &secant_low[a] ld c,(hl) inc h ; hl = &secant_high[a] ld b,(hl) push bc ; bc = lenX ld a,(_x) ; a = xfrac neg jr z,1f ; a == 0? ld h,0 ld l,a muluw_hl_bc ld b,e ld c,h ; bc = de:hl >> 8 1: push bc ; bc = distX ld a,(_direction) neg ld bc,32767 jr z,1f ld h,secant_low/256 ld l,a ; hl = &secant_low[256 - a] ld c,(hl) inc h ; hl = &secant_high[256 - a] ld b,(hl) 1: push bc ; bc = lenY ld a,(_y) ; a = yfrac neg jr z,1f ld h,0 ld l,a muluw_hl_bc ld b,e ld c,h 1: ld h,b ld l,c ; hl = distY pop de ; de = lenY exx pop ix ; ix = distX pop de ; de' = lenX pop hl ; hl' = worldP ld bc,_mapWidth ; bc' = mapWidth ; while (1) loop0b: exx ; loop0: ld a,ixh ; distX.high loop0: db 0xdd ld a,h ; distX.high cp h ; distY.high jr c,2f ; ->distX smaller jr nz,1f ; ->distY smaller ; ld a,ixl ; distX.low db 0xdd ld a,l ; distX.low cp l ; distY.low jr c,2f ; -> distX smaller 1: ; distX >= distY exx add hl,bc ; worldP += mapWidth ld a,(hl) ; a = color or a exx jr nz,end1 add hl,de ; distY += lenY jr loop0 2: ; distX < distY exx inc hl ; worldP += 1 ld a,(hl) ; a = color or a jr nz,end0 add ix,de ; distX += lenX jr loop0b end3: ld e,3 ; e = side = NORTH jr end13 end2: ld e,2 ; e = side = EAST jr end02 end1: ld e,1 ; e = side = SOUTH end13: ld b,h ld c,l ; bc = distY jr ddaend end0: ld e,0 ; e = side = WEST end02: db 0xdd ld b,h ; end02: ld b,ixh db 0xdd ld c,l ; bc = distX ;jr ddaend ddaend: ld (_color),a ld a,e ld (_side),a ld (_dist),bc ; bc = non-perpendicular distance ld hl,(_fovAngleP) ld d,(hl) ; d = fovAngle & 255 inc hl ld a,(hl) ; a = fovAngle >> 8 inc hl ld (_fovAngleP),hl ; ++fovAngleP; or a ld a,d jr z,1f neg 1: ld h,cosine_low/256 ld l,a ; hl = &cosine_low[..] ld a,(hl) inc h ; hl = &cosine_high[..] ld h,(hl) ld l,a ; hl = cosine(..) muluw_hl_bc ; de:hl = dist * cosine(..) ; de = perpendicular distance ld hl,(_distP) ld (hl),e inc hl ld (hl),d inc hl ld (_distP),hl ; *distP++ = dist ; Very specialized routine to calculate ; HL = 0x4000 / DE (clipped to [1 .. 16383]) ; When DE is in range ; a) [0x0000 .. 0x0001] -> result is clipped to 16383 ; b) [0x0002 .. 0x0080] -> (result 16 bit, remainder 8 bit, 14 iterations) ; c) [0x0081 .. 0x00FF] -> (result 8 bit, remainder 8 bit, 6+1 iterations) ; d) [0x0100 .. 0x3FFF] -> (result 8 bit, remainder 16 bit, 7 iterations) ; e) [0x4000 .. 0xFFFF] -> result is clipped to 1 Divide: ld a,d or a jr z,divABC cp 0x21 ; for correctness we need this from 0x40 jr nc,divE ; for speed already do this from 0x21 (same result) divD: ; 0x100 <= de < 0x4000 ld hl,128 ; hl = remainder ld a,1 ; a = quotient (bit 0 set, so we can later use 'cpl') ld b,7 ; possibly unroll 7x 2: add hl,hl sbc hl,de jr nc,1f add hl,de 1: adc a,a djnz 2b ld h,0 jr divend3 ; l = cpl(a) divE: ; de >= 0x4000 (or even de >= 0x2100) ld hl,1 jr divend divABC: ; de < 0x100 ; Possible optimization: use LUT for small dividers ; ld h,div_tab_low/256 ; 256-bytes aligned ; ld l,e ; ld a,(hl) ; inc h ; ld h,(hl) ; ld l,a ld a,e cp 128+1 jr c,divAB divC: ; 0x80 < de < 0x100 ld b,6 ; There are exactly 7 significant bits in the result, ; (bit 6 is always 1), so handle 1st iteration special. ; Possibly unroll next loop ld l,2 ; 'set' 'upper' quotient bit (still needs cpl and <<= 6) xor a sub e ; a = 256 - e 3: add a,a jr c,1f ; we actually need 9 bits .. or 8 bits + carry cp e jr c,2f 1: sub e or a ; clear carry flag (needed when jumped to label '1') 2: rl l djnz 3b ld h,0 jr divend2 ; l = cpl(l) divA: ; de <= 1 ld hl,16383 jr divend divAB: ; de <= 0x80 cp 2 jr c,divA divB: ; 2 <= de <= 0x80 ld hl,3 ; lower 2 bits will be <<= 14 and 'cpl' ld a,1 ld b,14 ; possibly unroll 1: add a,a cp e jr c,2f sub e 2: adc hl,hl djnz 1b ld a,h cpl ld h,a divend2: ld a,l divend3: cpl ld l,a divend: ld (_height),hl pop ix ret Edited January 27, 2019 by artrag Quote Link to comment Share on other sites More sharing options...
artrag Posted January 27, 2019 Share Posted January 27, 2019 (edited) this is interesting even if off topic sources here https://bisqwit.iki.fi/jutut/kuvat/programming_examples/portalrendering.html Edited January 27, 2019 by artrag Quote Link to comment Share on other sites More sharing options...
JohnPCAE Posted February 14, 2019 Author Share Posted February 14, 2019 I haven't yet come up with a foolproof method for eliminating the missing corner lines without really killing performance, but in the meantime I added an instruction to my multiplication routine to improve its accuracy. It simply rounds the result instead of just chopping off the lower 8 bits. It seems like the missing corner lines happen a lot less often now. raycast_20190214.zip 3 Quote Link to comment Share on other sites More sharing options...
JohnPCAE Posted March 8, 2020 Author Share Posted March 8, 2020 raycast_20200308.zip Slight speedups to both the card-based and the colored-squares code. I also removed a lot of dead code (older multiplication routines that didn't use quarter-squares multiplication). 2 Quote Link to comment Share on other sites More sharing options...
JohnPCAE Posted March 9, 2020 Author Share Posted March 9, 2020 (edited) raycast_20200309.zip Drat. I found a potential bug in the code that could cause a crash, so here is a new version. It's not all bad, though: I also eliminated a bunch of overhead from the colored-squares ray caster so it will run a bit faster. Edited March 9, 2020 by JohnPCAE 5 Quote Link to comment Share on other sites More sharing options...
eddhell Posted March 11, 2020 Share Posted March 11, 2020 (to everyone involved) I commend your dedication to this, as it took a lot of coffee and dedication from me just to read the entire thread. 1 Quote Link to comment Share on other sites More sharing options...
JohnPCAE Posted March 12, 2020 Author Share Posted March 12, 2020 (edited) raycast_20200312.zip MORE SPEED! (changes to colored-squares mode only) With all the loop unrolling this has, I'm running into space issues and I had to remove more dead code. This version, however, has much better performance when dealing with walls that are far away. That said, I don't know how much more this can be sped up while still using "plain vanilla" hardware. My intent has been to squeeze every bit of execution speed out of it before looking at adding features that could slow it down, such as texture mapping, "holes" (e.g doorways), and mazes larger than 16x16 squares (by tiling multiple mazes). Edited March 12, 2020 by JohnPCAE 2 Quote Link to comment Share on other sites More sharing options...
JohnPCAE Posted March 13, 2020 Author Share Posted March 13, 2020 raycast_20200313.zip Even MORE speed, fixed a crash, and it takes less space. 4 Quote Link to comment Share on other sites More sharing options...
Recommended Posts
Join the conversation
You can post now and register later. If you have an account, sign in now to post with your account.
Note: Your post will require moderator approval before it will be visible.