Raycasting demo

+Black_Tiger · January 8, 2019

I finally got to try this out on real hardware and it is much more impressive than I expected.

I think that a good use for this in a game would be for a dungeon crawler RPG like Shining in the Darkness.

Have random battles and when encountering an enemy/battle, just switch to a battle screen without the 3D engine. Even just a black background and menus with an enemy visual would be par with many NES RPGs.

JohnPCAE · January 9, 2019

I'm glad you like it!

I found a small bug in the code and so I'm reposting the ZIP file. I'm also posting a newer one that has most of the raycasting code moved to macros. I'm not sure if it's easier or harder to understand, but it does make the source file shorter and lets me unroll some loops for some extra performance. Both versions also have a separate slight optimization that squeezes a bit more performance out of the raycaster engine. Otherwise, the only difference between the two versions is the use of macros that allows for some loop unrolling.

raycast_20190106.zip

raycast_20190108.zip

JohnPCAE · January 10, 2019

Small bugfix (you had to be looking along an exact direction for the bug to arise) as well as MORE SPEED! This time the optimization is in the colored-squares drawing routine with the help of some lookup tables.

raycast_20190109.zip

JohnPCAE · January 10, 2019

I found a couple more optimizations in the drawing code that I frankly can't believe I missed So here's a slightly faster version:

raycast_20190109a.zip

JohnPCAE · January 17, 2019

Fixed a couple of bugs and squeezed some more speed out of the raycaster.

raycast_20190116.zip

JohnPCAE · January 17, 2019

*sigh*

I was simplifying some of the macros and found another bug that I had to fix. However, I also squeezed out a few more unnecessary instructions from the raycaster, so it's not all bad...

raycast_20190117.zip

artrag · January 17, 2019

I think that there is some numeric problem somewhere.

Under certain angles, you can see across the edges of the walls

+Lathe26 · January 17, 2019

I think that there is some numeric problem somewhere.

Under certain angles, you can see across the edges of the walls

I just tried out the PlayStation Classic playing Cool Boarders 2 (made in 1997) and it had the same issue. ;-)

JohnPCAE · January 18, 2019

Yeah that phenomenon has been driving me batty for years. I think it might be due to roundoff error. If anyone can find a solution, I'm all ears!

artrag · January 18, 2019

The problem is due to the fact that the ray is missing the collision with the walls. John, can you describe in C the DDA algorithm that you use for plotting the rays? I'm not very into the CP1600 assembly, but maybe replicating the core of the ray-casting algorithm in C with the same math we can find the bug more easily.

Edited January 18, 2019 by artrag

JohnPCAE · January 18, 2019

I hate it when my allergies keep me up. I especially hate it when I'm supposed to work the next day. Until the Allegra-D kicks in, I decided to do some sleuthing.

I hand-calculated the values it's using for the screenshot you posted and it's indeed due to roundoff error. These are the values in the screenshot:

X 0DEE
Y 0E66
A 010A

I should probably point out what coordinate values really mean here. All position values are represented in signed 8.8 fixed-point format: the high 8 bits represent the integer part of the value and the low 8 bits represent the fractional part of the value.

The meaning of these values with respect to the source code is as follows:

XPos 0DEE (in fixed-point form it equates to 0D.EE, or 13.9296875 in decimal)
YPos 0E66 (in fixed-point form it equates to 0E.66, or 12.3984375 in decimal)
Angle 010A (266 decimal, which equates to 199.5 degrees)

The column on the screen with the problem is column 22 (the left edge is column 0 and the right edge is column 39). The way the individual column angles are calculated is as follows: the exact center of the screen corresponds to the A value given above (199.5 degrees in this case). This leaves 20 columns on either side of that angle. In the engine, there are 480 angle "units" to a full circle, or in other words each unit equals 0.75 degrees (that's how I go from A=266 to 199.5 degrees -- multiply by 0.75). Starting from the center of the screen, the column immediately to the left is at angle A - 1, or in degrees, the central angle minus 0.75. Therefore, column 19 represents an angle of 199.5 - 0.75 = 198.75 degrees. Likewise, column 20 represents an angle of 199.5 + 0.75 = 200.25 degrees.

For each subsequent column, the angle unit value increments by TWO. The reason for this is that the angle units are scaled for rendering a bitmap that is 80 pixels wide with a 60-degree field of view (the bitmap-mode view that initially displays). That's where the 0.75 multiplier comes from. However, in colored-squares mode the bitmap has only half the pixels, so the units must increment by two from one column to the next to represent the same 60-degree field of view.

Putting all this together, the angle units range from A - 39 for column 0 to A + 39 for column 39, or in degrees, Angle - 29.25 to Angle + 29.25 (giving a 58.5-degree field of view). Given this, column 22 (the problematic one in this case) represents a ray angle of A + 5, or 203.25 degrees.

When casting rays, the engine actually casts two rays: one that tests at integer X values and one that tests at integer Y values. Testing at integer X values lets it detect walls that face east or west, and testing at integer Y values lets it detect walls that face north or south. It does this by maintaining two pairs of coordinates:

(XNext, YEdge) coordinates where XNext is an integer, i.e. XNext & $00FF = 0
(XEdge, YNext) coordinates where YNext is an integer, i.e. YNext & $00FF = 0

For the first pair of tests, the engine has to look at the viewer's position and calculate where the rays would be at the edges of the "box" that the viewer is currently occupying (the maze is 16 boxes by 16 boxes, where the fixed-point representation of position essentially subdivides each box into a 256x256 grid). In this particular example, the viewer's X position is 0DEE, which means box #13 and a fractional position of $00EE. So for a test where XNext is an integer (0D00), it has to calculate what the Y value would be. Likewise, since the viewer is at Y position 0E66, if the ray is to be at Y = 0E00, it has to calculate what the X value would be.

This is where roundoff error becomes a problem.

When looking along this angle (west-northwest), the formulas for XEdge and YEdge are as follows:

XEdge = XPos - frac(YPos) * cot(CurAng)
YEdge = YPos - frac(XPos) * tan(CurAng)

So each formula takes the fractional part of the position (which determines the distance to the edge of the box -- get it?), multiplies it by the slope (or the inverse of the slope, depending on which coordinate we are calculating), and then adjusts the coordinate accordingly. Pretty simple. So what actually happens?

The program has tables for lots of trigonometric functions, including tan and cos. Each table contains 480 entries, which correspond to the 480 possible angular values. Like everything else, they contain values in 8.8 fixed-point format, which introduces the first level of roundoff error -- limited precision.

In this example, cot(CurAng) and tan(CurAng) are:

cot = 0253 (02.53 if you think in fixed-point terms, or 2.32421875)
tan = 006E (00.6E if you think in fixed-point terms, or 0.4296875)

By comparison, the actual values that Calculator returns are:

cot(203.25) = 2.3275630393965955716880472651463
tan(203.25) = 0.42963390596683602666819711641139

Then it has to multiply them by the fractional parts of YPos and XPos, respectively:

frac(YPos) * cot(CurAng) = 00.66 * 02.53
frac(XPos) * tan(CurAng) = 00.EE * 00.6E

We can hand-calculate the result by ignoring the decimal points and just multiplying the raw values together, then shift the result right by 8 bits:

0066 * 0253 = 00ED
00EE * 006E = 0066

Subtracting from XPos and YPos, respectively, we get the final values:

XEdge = XPos - 00ED = 0D01
YEdge = YPos - 0066 = 0E00

So the coordinates to be tested for the two rays are:

(XNext, YEdge) = (0D00, 0E00)
(XEdge, YNext) = (0D01, 0E00)

If you look at the map definition at the bottom of the source, neither of these coordinates results in a wall hit.

They also don't match, which is a red flag. The first ray is landing exactly at the corner (which is problematic in and of itself and can be a separate cause of a blank column). However, that's not the problem in this case. The problem is that it *shouldn't* be landing at that position at all.

The second ray is landing just to the right of it. That's the ray that will move up in integral values of Y. If that is the correct ray, it implies that the first ray should be landing somewhere above Y=0E00. We can get a hint by increasing our precision.

Instead of shifting the result right by 8 bits when multiplying, let's see what happens when we keep the entire result:

0066 * 0253 = ED12
00EE * 006E = 6644

We can think of it in fixed-point terms as this:

00.66 * 02.53 = 00.ED12
00.EE * 00.6E = 00.6644

Now if we subtract those from XPos and YPos, we get something different:

XEdge = XPos - 00ED[12] = 0D00[EE] (think of it as 0D.00EE)
YEdge = YPos - 0066[44] = 0DFF[bC] (think of it as 0D.FFBC)

XEdge shifted left a tiny bit, but the "box" is still 0D, which doesn't give a wall hit. However, YEdge moved up a tiny bit, enough to move it up to the next "box", which DOES give a wall hit. The extra precision basically brought the two rays closer together.

So precision is a problem here.

artrag · January 18, 2019

Without floating point precision, to me the trigonometric approach you are using seems doomed ...

If you rise the the number of bits in multiplications and tables to get numeric accuracy, frame rate will be computed in seconds per frame rather than in frames per second.

You could use JLP expansion and its HW multiplier. In this case you can count on a super fast Multiply / divide accelerator and you can rise the accuracy as you like, but the result will work only in a JLP cartridge.

Another option, if you want to keep the game for plain Intellivision, is to change DDA algorithm and avoid the trigonometric functions at all.

If you avoid angles to scan the field of view, you can avoid cos() sin() tan() while tracing the rays.

This is a simple implementation of DDA based on the normal to the direction of view

https://lodev.org/cgtutor/raycasting.html#The_Basic_Idea_

I used it in C and numerically it was quite stable using in FP 8.8

I've also done a faster implementation in FP 8.8 for z80, if you want I can share it and help to convert it to CP1600

Edited January 18, 2019 by artrag

JohnPCAE · January 19, 2019

It is using the DDA approach: after that initial calculation for the two rays, it then always increments by one X unit or one Y unit. Everything I described above relates to the part midway down the page where he calculates the step and initial sideDist values (that page was actually my starting point for the algorithm).

The major difference is in how I represent walls in the WorldMap array. The algorithm you linked to has every square as completely solid, which only requires a single 2D array to represent the map (I actually started out with the same scheme at first). My map is represented as four *separate* arrays, with each array representing the walls only along a certain edge of each square (top, bottom, left, and right edges). This allows a square to have walls of different colors depending on what edge you're viewing, as well as potentially making doorways possible. This approach requires that the initial collision check calculate the actual position along the X or Y step value instead of just the distance to that point. It also is why I have to use two rays instead of just one.

Maybe the answer lies in revisiting how walls are represented?

Edited January 19, 2019 by JohnPCAE

artrag · January 19, 2019

From what I see the main difference is that you use angles instead of the vector normal to the view direction

If you use the vector normal instead of the angle to scan the scene, you replace cot(CurAng) and tan(CurAng) by this

//calculate ray position and direction
      double cameraX = 2 * x / double(w) - 1; //x-coordinate in camera space
      double rayDirX = dirX + planeX * cameraX;
      double rayDirY = dirY + planeY * cameraX;
      //which box of the map we're in
      int mapX = int(posX);
      int mapY = int(posY);

      //length of ray from current position to next x or y-side
      double sideDistX;
      double sideDistY;

       //length of ray from one x or y-side to next x or y-side
      double deltaDistX = std::abs(1 / rayDirX);
      double deltaDistY = std::abs(1 / rayDirY);
      double perpWallDist;

      //what direction to step in x or y-direction (either +1 or -1)
      int stepX;
      int stepY;

      int hit = 0; //was there a wall hit?
      int side; //was a NS or a EW wall hit?
      //calculate step and initial sideDist
      if (rayDirX < 0)
      {
        stepX = -1;
        sideDistX = (posX - mapX) * deltaDistX;
      }
      else
      {
        stepX = 1;
        sideDistX = (mapX + 1.0 - posX) * deltaDistX;
      }
      if (rayDirY < 0)
      {
        stepY = -1;
        sideDistY = (posY - mapY) * deltaDistY;
      }
      else
      {
        stepY = 1;
        sideDistY = (mapY + 1.0 - posY) * deltaDistY;
      }
      //perform DDA
      while (hit == 0)
      {
        //jump to next map square, OR in x-direction, OR in y-direction
        if (sideDistX < sideDistY)
        {
          sideDistX += deltaDistX;
          mapX += stepX;
          side = 0;
        }
        else
        {
          sideDistY += deltaDistY;
          mapY += stepY;
          side = 1;
        }
        //Check if ray has hit a wall
        if (worldMap[mapX][mapY] > 0) hit = 1;

Full source is here

https://lodev.org/cgtutor/files/raycaster_flat.cpp

and its explanation is here

https://lodev.org/cgtutor/raycasting.html#The_Basic_Idea_%C2%A0

JohnPCAE · January 19, 2019

I understand the math. I should point out that I'm using similar math in my distance calculation: rayDirX is cos(angle) and rayDirY is sin(angle). In calculating the distance above, it divides by rayDirX or rayDirY, respectively, which is equivalent to multiplying by sec(angle) or csc(angle) respectively -- which is exactly what my code does. The only major difference is in how the code steps from one block to the next: I'm tracing two rays at once whereas the algorithm above alternates on which ray to trace based on distance traveled. That aspect is probably something I should look at. A few months ago I couldn't have used that approach because I wouldn't know which face a ray impacted -- as is the case above. However, with everything broken out now into special cases based on the direction of the ray, I should now be able to determine it by implication.

Edited January 19, 2019 by JohnPCAE

artrag · January 19, 2019

Your math is equivalent to the one in the sample code only if you use infinite precision, numerically they are different if you use fixed point math.

Anyway as you spotted, the sample code is increasing x or y by 1 at each step and can detect witch side of the wall is colliding using only one ray.

This could be another important difference

Edited January 19, 2019 by artrag

artrag · January 22, 2019

There is another optimization that you can consider in the rendering code.
The whole screen can be represented by an array of heights and an array of colors with length equal to the number of columns in the field of view.
When you plot a column, if it is of the same color of the one already on the screen, you can plot only the difference in height top and down.

Most of the time you end up with plotting one or two pixels top and down the column.

If the new height is smaller that the old one, you only plot pixels for ceiling and floor, otherwise you fill the difference with wall color.

In both cases you have updated the column by plotting only few pixels.

If the color is different then you can keep the portion of ceiling and floor and plot the wall section only.

Edited January 22, 2019 by artrag

artrag · January 27, 2019

JohnPCAE

I've to apologize, in the end in my code I've ended to use secants and angles (2*pi is divided in 4*256 angles).

This is my DDA loop in Z80 ASM..


	psect text,class=CODE


; muluw .hl,.bc            |ED C3 ;de:hl<-hl*bc
; mulub .a,.b              |ED C1 ;   hl<- a*b
; mulub .a,.c              |ED C9
; mulub .a,.d              |ED D1
; mulub .a,.e              |ED D9

muluw_hl_bc macro
	db 0xed,0xc3	;de:hl<-hl*bc
	endm

mulub_a_b   macro           
	db 0xED,0xC1 	;hl<- a*b
	endm
 
	global cosine_low
	global secant_low

	global _x,_y
	global _alpha
	global _fovAngleP
	global _distP
	global _mapWidth
	global _world
	global _direction
	global _color
	global _side
	global _height
	global _dist
	
; //// global variables:
;
; // input for DDA asm routine:
;   FP_8_8 x;         // x-coordinate player position
;   FP_8_8 y;         // y-coordinate player position
;   ANGLE alpha;      // viewing direction
;   ANGLE* fovAngleP; // should be initialized to start of fovAngles array
;   FP_8_8* distP;    // should be initialized to start of dists array
;

; // output from DDA asm routine: 
;   FP_8_8 dists[NDIRS]; // perpendicular distance per column (one element filled in)
;   uchar color;         // 'color' of the hit world-block
;   uchar side;          // side on which the block was hit
;   uint height;         // visible height for the column
;   // the input variables fovAngleP and distP are updated
;   ANGLE direction;     // required for the texture coord calculation
;                        //  (split in upper and lower 8-bit part)
;
;
; //// Use the asm routine like this:
;
;   void draw()
;   {
;	vdp_setup0();
;	fovAngleP = fovAngles;
;	distP = dists;
;	for (i = 0; i < NDIRS; ++i) {
;		ASM: <the asm routine below>
;		ASM: this replaces lines 207-352 from the C code
;		ASM: in the future more will be moved from C to asm
;
;		C: asm routine above calculates the variables:
;		C:    color, side, height, direction
;		C: (optionally) calculate texture coord
;		C:    the variables 'a' and 'quadrant' should be
;		C:    replaced with the lower/upper 8 bits of 'direction'
;               C:    (upper part still needs '& 3').
;		C: actually draw the column
;	}
;   }
;

	global _fast_dda
_fast_dda:
	push ix

	; TODO make _mapWidth a power of 2
	ld	a,(_y+1)	; a = y >> 8
	ld	b,_mapWidth
	mulub_a_b		; hl = (y >>  * mapWidth
	ld	bc,(_x+1)	; c = x >> 8
	ld	b,_world/256
	add	hl,bc
	push	hl		; hl = &world[y >> 8][x >> 8]

	ld	hl,(_fovAngleP)
	ld	e,(hl)
	inc	hl
	ld	d,(hl)		; de = fovAngle
	ld	hl,(_alpha)
	add	hl,de		; hl = alpha + fovAngle
	ld	(_direction),hl

	; switch (quadrant)
	ld	a,h		; hl = direction
	and	3		; a = quadrant
	jp	z,dda_0		; case 0
	cp	2
	jp	c,dda_1		; case 1
	jr	z,dda_2		; case 2
	;jr	dda_3		; case 3


dda_3:	; case 3
	ld	a,l		; a = direction & 255
	neg			; 256 - a
	ld	bc,32767
	jr	z,1f		; a == 0?
	ld	h,secant_low/256
	ld	l,a		; hl = &secant_low[256-a]
	ld	c,(hl)
	inc	h		; hl = &secant_high[256-a]
	ld	b,(hl)
1:	push	bc		; bc = lenX

	ld	a,(_x)		; a = xfrac
	neg
	jr	z,1f		; a == 0?
	ld	h,0
	ld	l,a
	muluw_hl_bc		; de:hl = (256 - xfrac) * lenX
	ld	c,h		; bc = de:hl >> 8
	ld	b,e
1:	push	bc		; bc = distX

	ld	hl,(_direction)	; l = direction & 255
	ld	h,secant_low/256; hl = &secant_low[a]
	ld	c,(hl)
	inc	h		; hl = &secant_high[a]
	ld	b,(hl)
	push	bc		; bc = lenY

	ld	hl,(_y)		; l = yfrac
	ld	h,0
	muluw_hl_bc		; de:hl = yfrac * lenY
	ld	l,h
	ld	h,e		; hl = distY = de:hl >> 8

	pop	de		; de = lenY
	exx
	pop	ix		; ix = distX
	pop	de		; de' = lenX
	pop	hl		; hl' = worldP
	ld	bc,-_mapWidth	; bc' = -mapWidth

	; while (1)
loop3b:	exx
; loop3:	ld	a,ixh		; distX.high
loop3:	db 0xdd
	ld	a,h		; distX.high
	cp	h		; distY.high
	jr	c,2f		; ->distX smaller
	jr	nz,1f		; ->distY smaller
	; ld	a,ixl		; distX.low
	db 0xdd
	ld	a,l		; distX.low
	cp	l		; distY.low
	jr	c,2f		; -> distX smaller

1:	; distX >= distY
	exx
	add	hl,bc		; worldP -= mapWidth
	ld	a,(hl)		; a = color
	or	a
	exx
	jp	nz,end3
	add	hl,de		; distY += lenY
	jr	loop3

2:	; distX < distY
	exx
	inc	hl		; worldP += 1
	ld	a,(hl)		; a = color
	or	a
	jp	nz,end0
	add	ix,de		; distX += lenX
	jr	loop3b


dda_2:	; case 2
	ld	h,secant_low/256; hl = &secant_low[a]
	ld	c,(hl)
	inc	h		; hl = &secant_high[a]
	ld	b,(hl)
	push	bc		; bc = lenX

	ld	hl,(_x)		; l = xfrac
	ld	h,0
	muluw_hl_bc		; de:hl = xfrac * lenX
	ld	b,e
	ld	c,h		; bc = de:hl >> 8
	push	bc		; bc = distX

	ld	a,(_direction)	; a = direction & 255
	neg
	ld	bc,32767
	jr	z,1f		; a == 0?
	ld	h,secant_low/256
	ld	l,a		; hl = &secant_low[256 - a]
	ld	c,(hl)
	inc	h		; hl = &secant_high[256 - a]
	ld	b,(hl)
1:	push	bc		; bc = lenY

	ld	hl,(_y)		; l = yfrac
	ld	h,0
	muluw_hl_bc		; de:hl = yfrac * lenY
	ld	l,h
	ld	h,e		; hl = distY = de:hl >> 8

	pop	de		; de = lenY
	exx
	pop	ix		; ix = distX
	pop	de		; de' = lenX
	pop	hl		; hl' = worldP
	ld	bc,-_mapWidth	; bc' = -mapWidth

	; while (1)
loop2b:	exx
; loop2:	ld	a,ixh		; distX.high
loop2:	db 0xdd
	ld	a,h		; distX.high
	cp	h		; distY.high
	jr	c,2f		; ->distX smaller
	jr	nz,1f		; ->distY smaller
	; ld	a,ixl		; distX.low
	db 0xdd
	ld	a,l		; distX.low
	cp	l		; distY.low
	jr	c,2f		; -> distX smaller

1:	; distX >= distY
	exx
	add	hl,bc		; worldP -= mapWidth
	ld	a,(hl)		; a = color
	or	a
	exx
	jp	nz,end3
	add	hl,de		; distY += lenY
	jr	loop2

2:	; distX < distY
	exx
	dec	hl		; worldP -= 1
	ld	a,(hl)		; a = color
	or	a
	jp	nz,end2
	add	ix,de		; distX += lenX
	jr	loop2b


dda_1:	; case 1
	ld	a,l		; a = direction & 255
	neg
	ld	bc,32767
	jr	z,1f		; a == 0?
	ld	h,secant_low/256
	ld	l,a		; hl = &secant_low[a]
	ld	c,(hl)
	inc	h		; hl = &secant_high[a]
	ld	b,(hl)
1:	push	bc		; bc = lenX

	ld	hl,(_x)		; l = xfrac
	ld	h,0
	muluw_hl_bc		; de:hl = xfrac * lenX
	ld	b,e
	ld	c,h
	push	bc		; bc = distX

	ld	hl,(_direction) ; l = direction & 255
	ld	h,secant_low/256; hl = &secant_low[a]
	ld	c,(hl)
	inc	h		; hl = &secant_high[a]
	ld	b,(hl)
	push	bc		; bc = lenY

	ld	a,(_y)		; a = yfrac
	neg
	jr	z,1f
	ld	h,0
	ld	l,a
	muluw_hl_bc
	ld	b,e
	ld	c,h
1:	ld	h,b
	ld	l,c		; hl = distY

	pop	de		; de = lenY
	exx
	pop	ix		; ix = distX
	pop	de		; de' = lenX
	pop	hl		; hl' = worldP
	ld	bc,_mapWidth	; bc' = mapWidth
	
	; while (1)
loop1b:	exx
;loop1:  ld	a,ixh		; distX.high
loop1:	db 0xdd
	ld	a,h		; distX.high
	cp	h		; distY.high
	jr	c,2f		; ->distX smaller
	jr	nz,1f		; ->distY smaller
	; ld	a,ixl		; distX.low
	db 0xdd
	ld	a,l		; distX.low
	cp	l		; distY.low
	jr	c,2f		; -> distX smaller

1:	; distX >= distY
	exx
	add	hl,bc		; worldP += mapWidth
	ld	a,(hl)		; a = color
	or	a
	exx
	jr	nz,end1
	add	hl,de		; distY += lenY
	jr	loop1

2:	; distX < distY
	exx
	dec	hl		; worldP -= 1
	ld	a,(hl)		; a = color
	or	a
	jr	nz,end2
	add	ix,de		; distX += lenX
	jr	loop1b


dda_0:	; case 0
	ld	h,secant_low/256; hl = &secant_low[a]
	ld	c,(hl)
	inc	h		; hl = &secant_high[a]
	ld	b,(hl)
	push	bc		; bc = lenX

	ld	a,(_x)		; a = xfrac
	neg
	jr	z,1f		; a == 0?
	ld	h,0
	ld	l,a
	muluw_hl_bc
	ld	b,e
	ld	c,h		; bc = de:hl >> 8
1:	push	bc		; bc = distX

	ld	a,(_direction)
	neg
	ld	bc,32767
	jr	z,1f
	ld	h,secant_low/256
	ld	l,a		; hl = &secant_low[256 - a]
	ld	c,(hl)
	inc	h		; hl = &secant_high[256 - a]
	ld	b,(hl)
1:	push	bc		; bc = lenY

	ld	a,(_y)		; a = yfrac
	neg
	jr	z,1f
	ld	h,0
	ld	l,a
	muluw_hl_bc
	ld	b,e
	ld	c,h
1:	ld	h,b
	ld	l,c		; hl = distY

	pop	de		; de = lenY
	exx
	pop	ix		; ix = distX
	pop	de		; de' = lenX
	pop	hl		; hl' = worldP
	ld	bc,_mapWidth	; bc' = mapWidth
	
	; while (1)
loop0b:	exx
; loop0:	ld	a,ixh		; distX.high
loop0:	db	0xdd
	ld	a,h		; distX.high
	cp	h		; distY.high
	jr	c,2f		; ->distX smaller
	jr	nz,1f		; ->distY smaller
	; ld	a,ixl		; distX.low
	db	0xdd
	ld	a,l		; distX.low
	cp	l		; distY.low
	jr	c,2f		; -> distX smaller

1:	; distX >= distY
	exx
	add	hl,bc		; worldP += mapWidth
	ld	a,(hl)		; a = color
	or	a
	exx
	jr	nz,end1
	add	hl,de		; distY += lenY
	jr	loop0

2:	; distX < distY
	exx
	inc	hl		; worldP += 1
	ld	a,(hl)		; a = color
	or	a
	jr	nz,end0
	add	ix,de		; distX += lenX
	jr	loop0b


end3:	ld	e,3		; e  = side = NORTH
	jr	end13

end2:	ld	e,2		; e  = side = EAST
	jr	end02

end1:	ld	e,1		; e  = side = SOUTH
end13:	ld	b,h
	ld	c,l		; bc = distY
	jr	ddaend

end0:	ld	e,0		; e = side = WEST

end02:	db 0xdd
	ld	b,h		; end02:	ld	b,ixh
	db	0xdd
	ld	c,l		; bc = distX
	;jr	ddaend

ddaend:	ld	(_color),a
	ld	a,e
	ld	(_side),a
	ld	(_dist),bc	; bc = non-perpendicular distance

	ld	hl,(_fovAngleP)
	ld	d,(hl)		; d = fovAngle & 255
	inc	hl
	ld	a,(hl)		; a = fovAngle >> 8
	inc	hl
	ld	(_fovAngleP),hl	; ++fovAngleP;
	or	a
	ld	a,d
	jr	z,1f
	neg
1:
	ld	h,cosine_low/256
	ld	l,a		; hl = &cosine_low[..]
	ld	a,(hl)		
	inc	h		; hl = &cosine_high[..]
	ld	h,(hl)
	ld	l,a		; hl = cosine(..)
	muluw_hl_bc		; de:hl = dist * cosine(..)

	; de = perpendicular distance
	ld	hl,(_distP)
	ld	(hl),e
	inc	hl
	ld	(hl),d
	inc	hl
	ld	(_distP),hl	; *distP++ = dist


; Very specialized routine to calculate
;   HL = 0x4000 / DE  (clipped to [1 .. 16383])
; When DE is in range
; a) [0x0000 .. 0x0001] -> result is clipped to 16383
; b) [0x0002 .. 0x0080] -> (result 16 bit, remainder  8 bit,  14 iterations)
; c) [0x0081 .. 0x00FF] -> (result  8 bit, remainder  8 bit, 6+1 iterations)
; d) [0x0100 .. 0x3FFF] -> (result  8 bit, remainder 16 bit,   7 iterations)
; e) [0x4000 .. 0xFFFF] -> result is clipped to 1

Divide:
	ld	a,d
	or	a
	jr	z,divABC
	cp	0x21		; for correctness we need this from 0x40
	jr	nc,divE		;   for speed already do this from 0x21 (same result)

divD:	; 0x100 <= de < 0x4000
	ld	hl,128		; hl = remainder
	ld	a,1		; a  = quotient (bit 0 set, so we can later use 'cpl')
	ld	b,7		; possibly unroll 7x
2:	add	hl,hl
	sbc	hl,de
	jr	nc,1f
	add	hl,de
1:	adc	a,a
	djnz	2b
	ld	h,0
	jr	divend3		; l = cpl(a)

divE:	; de >= 0x4000  (or even  de >= 0x2100)
	ld	hl,1
	jr	divend

divABC:	; de < 0x100
	; Possible optimization: use LUT for small dividers
	;   ld	h,div_tab_low/256 ; 256-bytes aligned
	;   ld	l,e
	;   ld  a,(hl)
	;   inc	h
	;   ld	h,(hl)
	;   ld	l,a
	ld	a,e
	cp	128+1
	jr	c,divAB

divC:	; 0x80 < de < 0x100
	ld	b,6		; There are exactly 7 significant bits in the result,
				; (bit 6 is always 1), so handle 1st iteration special.
				; Possibly unroll next loop
	ld	l,2		; 'set' 'upper' quotient bit (still needs cpl and <<= 6)
	xor	a
	sub	e		; a = 256 - e
3:	add	a,a
	jr	c,1f		; we actually need 9 bits  ..  or 8 bits + carry
	cp	e
	jr	c,2f
1:	sub	e
	or	a		; clear carry flag (needed when jumped to label '1')
2:	rl	l
	djnz	3b
	ld	h,0
	jr	divend2		; l = cpl(l)

divA:	; de <= 1 
	ld	hl,16383
	jr	divend

divAB:	; de <= 0x80
	cp	2
	jr	c,divA

divB:	; 2 <= de <= 0x80
	ld	hl,3		; lower 2 bits will be <<= 14 and 'cpl'
	ld	a,1
	ld	b,14		; possibly unroll
1:	add	a,a
	cp	e
	jr	c,2f
	sub	e
2:	adc	hl,hl
	djnz	1b
	ld	a,h
	cpl
	ld	h,a
divend2:
	ld	a,l
divend3:
	cpl
	ld	l,a

divend:	ld	(_height),hl
	pop	ix
	ret

Edited January 27, 2019 by artrag

artrag · January 27, 2019

this is interesting even if off topic

sources here

https://bisqwit.iki.fi/jutut/kuvat/programming_examples/portalrendering.html

Edited January 27, 2019 by artrag

JohnPCAE · February 14, 2019

I haven't yet come up with a foolproof method for eliminating the missing corner lines without really killing performance, but in the meantime I added an instruction to my multiplication routine to improve its accuracy. It simply rounds the result instead of just chopping off the lower 8 bits. It seems like the missing corner lines happen a lot less often now.

raycast_20190214.zip

JohnPCAE · March 8, 2020

raycast_20200308.zip

Slight speedups to both the card-based and the colored-squares code. I also removed a lot of dead code (older multiplication routines that didn't use quarter-squares multiplication).

JohnPCAE · March 9, 2020

raycast_20200309.zip

Drat. I found a potential bug in the code that could cause a crash, so here is a new version. It's not all bad, though: I also eliminated a bunch of overhead from the colored-squares ray caster so it will run a bit faster.

Edited March 9, 2020 by JohnPCAE

eddhell · March 11, 2020

(to everyone involved) I commend your dedication to this, as it took a lot of coffee and dedication from me just to read the entire thread.

JohnPCAE · March 12, 2020

raycast_20200312.zip

MORE SPEED!

(changes to colored-squares mode only)

With all the loop unrolling this has, I'm running into space issues and I had to remove more dead code. This version, however, has much better performance when dealing with walls that are far away. That said, I don't know how much more this can be sped up while still using "plain vanilla" hardware. My intent has been to squeeze every bit of execution speed out of it before looking at adding features that could slow it down, such as texture mapping, "holes" (e.g doorways), and mazes larger than 16x16 squares (by tiling multiple mazes).

Edited March 12, 2020 by JohnPCAE

JohnPCAE · March 13, 2020

raycast_20200313.zip

Even MORE speed, fixed a crash, and it takes less space.

Raycasting demo

Recommended Posts

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Join the conversation

Recently Browsing 0 members