Jump to content
IGNORED

Star Raiders Source Code to be released?


Knimrod

Recommended Posts

I just downloaded the source code. It's super short ! Barely 3,900 lines (+ data), which is nothing for ASM. When browsing, it spans only 5 screens! Amazing, how little ASM code was enough to produce great games in those days !

 

I've been working backwards from a divide routine to duplicate the functionality of the Star Raiders "DIVIDE:" routine.
I just created a loop that keeps subtracting the divisor from the dividend and increments the result each time it's successful.

Once it fails the dividend becomes the remainder.
But the DIVIDE: routine has some odd behavior. It limits the number of loops, plus it shifts the dividend and result each loop.

A normal divide doesn't produce the desired results so it appears this cannot be duplicated by using log without additional changes to the code.

Having looked at the code just briefly, the DIVIDE is called just twice - from inside CALCVH (which calculates xpos,ypos for the stars) - once to divide zpos/xpos and then to divide ypos/xpos.

So, it should not be very difficult to adjust the code at those two places for the I/O differences to your method.

 

Of course, it's called in a loop (CALCV1) for all stars (LDX NSTARS), hence it takes 51% of frame time.

 

I'm pretty sure that approach could be replaced by a different one, while retaining the perspective effect. There are so many ways how to render the starfield from that perspective.

 

What were you planning to do with the performance you hope to gain ? Just improve the framerate or add some effect ?

Link to comment
Share on other sites

To begin with, explosions slow down the game and changing the divide to log lookups should eliminate that.
It also leaves extra cycles for other mods.
Oh, and the extra speed makes it easier to port to other platforms. More time can be spend on software sprites.

  • Like 1
Link to comment
Share on other sites

I found a version of starraiders on this disk that requires the basic cart to load and run. Don't know if someone has gone to a lot of trouble to get rom code and move it down lower in memory changing all pointers etc or it was a disk or cassette version (was one made?)

There are other what it thought were Cart only based games on this disk as well.

Note the Dos version on this disk is.. Dos 1.

DISK_39B old games.ATR

Link to comment
Share on other sites

Using X for the loop counter instead of TEMP4 in DIVIDE: saves about 18 clock cycles per call if my calculations are right.
(DEC 5 cycles - DEX 2 cycles = 3 cycles * 7 loops = 21 - 3 for LDX TEMP4)
So when there is an explosion it saves 12 normal stars + 32 explosion stars or 44 stars * 18 clock cycles each = 792 clock cycles?

There might be a few other places where little stuff like this could be done but all of them offer small returns.
I suppose once added up they could make all the difference that is needed on the explosions.
Then we wouldn't have to worry about issues with LOG and that's still an option.

 

DIVIDE:         
;       A = (TOP/BOTTOM)X80
;
        LDA     #$00            ;CLEAR THE RESULT 
        STA     TEMP3

		stx		TEMP4
		ldx		#$07
;        LDA     #$07            ; NUMBER OF SHIFTS 
;        STA     TEMP4
; SHIFT 0 INTO THE MSBIT
        LSR     TEMP1           ; TOP NUMBER
        ROR     TEMP
        LDA     DISFLG          ; FRONT OR BACK  ? 
        BNE     DIVID1          ; BACK 
        LDA     XPOSH,X         ; BOTTOM NUMBER 
        LSR     A
        STA     PNTR+1
        LDA     XPOSL,X
        ROR     A
        STA     PNTR
        JMP     DIVID2
DIVID1:         
        SEC     
        LDA     #$00
        SBC     XPOSL,X
        STA     PNTR
        LDA     #$00
        SBC     XPOSH,X
        LSR     A
        STA     PNTR+1
        ROR     PNTR
;
DIVID2:         
        ASL     TEMP3           ; SHIFT RESULT 
                                
        SEC                     ; SUBTRACT BOTTOM FROM TOP 
        LDA     TEMP
        SBC     PNTR
        TAY     
        LDA     TEMP1
        SBC     PNTR+1
        BCC     DIVID3          ; BOTTOM GREATER THAN TOP 
;               TOP LARGER
        STA     TEMP1           ; STORE REMAINDER 
        STY     TEMP
        INC     TEMP3           ; ADD 1 TO RESULT 
DIVID3:         
        ASL     TEMP            ; SHIFT TOP
        ROL     TEMP1
        BCC     DIVID4
;               IF TOP IS GREATER THN BOTTOM THEN OVERFLOW
        LDA     #$FF            ; MAX VALUE TO RESULT 
		ldx		TEMP4
        RTS     
DIVID4:
		dex						; next bit
;        DEC     TEMP4           ; NEXT BIT 
        BPL     DIVID2
        LDY     TEMP3           ; RESULT IN Y 
        LDA     PTAB,Y          ; MULTIPLY BY 80  (PTAB) 
		ldx		TEMP4
DIVID5:                         ; ENTRY POINT FROM THINK  *:
        RTS     

 

 

Edited by JamesD
Link to comment
Share on other sites

I've barely looked at the code...

 

how's other stuff like efficiency of routines that actually draw/erase stars?

Well, from the profiling (see page 3 of thread), the top 3 things are:

51% - DIVIDE

2.8 % - Store HPos

2.8 % - Store VPos

 

So, I wouldn't really worry about efficiency of drawing at this point, as there is 1 method that takes over half of each frame (and way more than that during explosions).

 

Once the DIVIDE has been refactored to take under, say, 25% of frame time, then it should make sense to check rendering.

  • Like 1
Link to comment
Share on other sites

Using X for the loop counter instead of TEMP4 in DIVIDE: saves about 18 clock cycles per call if my calculations are right.

(DEC 5 cycles - DEX 2 cycles = 3 cycles * 7 loops = 21 - 3 for LDX TEMP4)

So when there is an explosion it saves 12 normal stars + 32 explosion stars or 44 stars * 18 clock cycles each = 792 clock cycles?

There might be a few other places where little stuff like this could be done but all of them offer small returns.

I suppose once added up they could make all the difference that is needed on the explosions.

Then we wouldn't have to worry about issues with LOG and that's still an option.

So, this would save 792 cycles, potentially - which is how much ? ~3% of frame time ?. These microoptimization only make sense if you are on the threshold of frame time, hence you could fit under current vbl.

Try and see if this was the case.

 

But I believe you would gain far, far more, if you analyzed the full range of input/output values for your divide method and fit it at those 2 places. You got the full dev env, you can build and test the source code at any time.

 

Printing the values for those 12 stars sounds like a pretty fast and easy task. Then, adjusting your method should be straightforward.

 

Right now - what's the rough clock difference between the reference and your DIVIDE ? I understand it does not - yet - account for the differences, but just curious.

Link to comment
Share on other sites

I've barely looked at the code...

 

how's other stuff like efficiency of routines that actually draw/erase stars?

Most critical code seems pretty well optimized.

The problem is, without knowing what the code is doing it's impossible to rewrite it in a better way which is where real savings usually comes from.

If the explosion could be updated half in one frame and half in another you might be able to eliminate the problem.

Always update the first 12 stars and only 16 explosion stars per frame?

 

 

 

So, this would save 792 cycles, potentially - which is how much ? ~3% of frame time ?. These microoptimization only make sense if you are on the threshold of frame time, hence you could fit under current vbl.

Try and see if this was the case.

 

But I believe you would gain far, far more, if you analyzed the full range of input/output values for your divide method and fit it at those 2 places. You got the full dev env, you can build and test the source code at any time.

 

Printing the values for those 12 stars sounds like a pretty fast and easy task. Then, adjusting your method should be straightforward.

 

Right now - what's the rough clock difference between the reference and your DIVIDE ? I understand it does not - yet - account for the differences, but just curious.

I'd be shocked if it were even 3% savings to be honest.

I just ran across it while working with the code.

There is one potential savings related to values that I've already seen.

This is a 16/16 bit divide and the divide returns a max value of $FF.

So any divisor less than 257 ($FFFF/$FF) automatically results in $FF being returned and the loops can be skipped.

But if that doesn't occur very often, we cost more cycles... which that little optimization should offset so we are no worse off.

However, thanks to the weirdness in the divide I haven't figured out, 257 might not actually be the magic number.

*edit*

That might also allow some simplification of the loop but I hadn't really thought about that yet.

Also, the divisor being larger than the dividend should also result in a quick exit.

 

Edited by JamesD
Link to comment
Share on other sites

Well, tough slough ahead of you. We all know how those final routines come to be.

 

The Great Iteration List:

 

1. Yay, I got the DIVIDE thing working !

2. Hmm, this seems unnecessary. If I'll just alter the values in a layer above, I could shave off 2 ops !

3. And I could do the same in this layer right here and reuse the code for something completely unrelated !

4. This is cool, who would have thought you could abuse division to do also this little thing ?

6. Holy crap,if I modify this right here, I could also use this for this new feature ! I'll keep it disabled for now, but I'll surely return to it later to enable it !

7. 2 weeks later - Uhm, why did I call it divide again ? One beautiful summer morning, I am going to update the comments. No, really - I will !

  • Like 1
Link to comment
Share on other sites

Well, tough slough ahead of you. We all know how those final routines come to be.

 

The Great Iteration List:

 

1. Yay, I got the DIVIDE thing working !

2. Hmm, this seems unnecessary. If I'll just alter the values in a layer above, I could shave off 2 ops !

3. And I could do the same in this layer right here and reuse the code for something completely unrelated !

4. This is cool, who would have thought you could abuse division to do also this little thing ?

6. Holy crap,if I modify this right here, I could also use this for this new feature ! I'll keep it disabled for now, but I'll surely return to it later to enable it !

7. 2 weeks later - Uhm, why did I call it divide again ? One beautiful summer morning, I am going to update the comments. No, really - I will !

There's a reason I add comments as I write my code. It never gets done otherwise.

I just have to write a piece of code that calls the divide with every possible value combination and keep track of the max or min value that returns $FF.

Then it's just a matter of adding range checking to the code.

 

I have a hunch where the shifts and multiply come from but I'll need to do some testing to be sure.

Link to comment
Share on other sites

There is one other little issue with log that hasn't been mention yet.

Now, I haven't had to use LOG for 30 years, but it seems to me it's normally a decimal number and the game uses signed magnitude numbers.

 

 

The property of logs that turns division into subtraction is:

 

log(a/b) = log(a) - log(b)

 

but one of the interesting things about logs is that it doesn't matter what base you use. Typically, to get the division result it's written:

 

a/b = exp(log(a) - log(b)

 

but the exp can be replaced with whatever power function. E.g., in python using base 5:

>>> a=123455.
>>> b=4545.
>>> a/b
27.162816281628164
>>> math.log(a/b, 5)
2.0515541258596843
>>> math.log(a,5) - math.log(b,5)
2.0515541258596848
>>> math.pow(5, math.log(a,5) - math.log(b,5))
27.162816281628185

So the trick would be to use some base that would maximize the log tables into 16 bit values. And there's no reason that it has to be floating point; it could be fixed point. The tables will probably be pretty sparse because there won't be enough memory to include every possible log value, so eventually there would have to also be an interpolation routine, but as a first cut just using the closest log value would give a proof of concept.

 

You're right in that log only works with positive values, but since it's division you can just pull the signs out and deal with them at the end. Take the logs of the absolute value and If the signs were the same then the result is positive; if they were different then the result is negative. It looks like the sign is stored in a separate byte which makes things easier.

 

Let's see if I have this right: XPOS appears to be the depth into the screen. Do you have a sense of how wide the range of XPOS values is? Is it using the whole range 0000 - FFFF or is it smaller? I'm guessing it's two's compliment since the initial code comment says that -1 is FFFF (with a zero sign byte) and 0 is 0000 with a 1 sign byte.

 

Do you have an instrumented division routine or could you post a few sample inputs/outputs? I'm sure I can do the algorithm work in python and eventually generate a table of 16bit values and the 6502 assembly code for the table lookups that could be imported into CA65.

Link to comment
Share on other sites

I was expecting to have to use fixed point math.
I haven't run any tests on the range of values yet and I'm not real familiar with most of the code.

I can tell you the dividend/divisor * 80 <= 255 since the results are a byte. I haven't even verified it's actually multiplying by 80 in that table though.
If it is, then we can divide both sides of the equation by 80 and 255/80 = 3.1875 so dividend/divisor <= 3.1875.
That sounds fishy to me. If that's true, the loop could be limited to 3 or 4 instead of 7 and we might even be able to simplify the loop further and some sort of table lookup that doesn't even require LOG starts looking possible.

Edited by JamesD
Link to comment
Share on other sites

Best I can tell, Star Raiders is doing a textbook perspective projection:

 

view_width = tan(field_of_view_angle / 2)

screen_X = screen_center_X + view_Y / view_X * view_width

screen_Y = screen_center_Y + view_Z / view_X * view_width

 

The object positions are stored in view space (relative to the camera), instead of world space. CALCVH drives this by first culling out objects behind the camera (X<0 or X>=0 depending on fore/aft view) and then computes Y/X and Z/X. CALCVH computes abs(Y) and abs(Z) to pass in to DIVIDE, while DIVIDE itself computes abs(X) based on the fore/aft view setting. The division routine is a restoring 16/16 divide that calculates floor(a/b*128); it cranks out one bit at a time by binary long division. The result of the division is then looked up in PTAB, which computes y = (x*81)/256. This scales the division result by the view width, which is set by SCPTAB; this determines the field of view.

 

As shipped, the DIVIDE routine takes ~450 cycles per divide. By unrolling the loop and converting ASL + maybe INC to ROL I got it down to ~360 cycles. Note that the two divides per object use the same divisor, which an improved algorithm might be able to take advantage of.

 

Unfortunately, when I profiled the game the first time, I failed to notice that the rotation routine doesn't run if you aren't moving the joystick. Updated profile attached. ROHELP takes about a fifth of the frame when both axes are rotating. For the Y rotation, it is called twice to compute a rotation around Z as follows:

 

Y' = Y + X/64

X' = X - Y'/64

 

This is a rotation of about 0.9 degrees per frame (atan(1/64)). Interestingly, the /64 divide is dithered using RANDOM, and so it is non-deterministic. Note that the second calculation uses the result of the first, which is a hack to approximate a circle (look up "Minsky rotation").

 

post-16457-0-15477900-1445722554_thumb.png

  • Like 3
Link to comment
Share on other sites

Best I can tell, Star Raiders is doing a textbook perspective projection:

 

view_width = tan(field_of_view_angle / 2)

screen_X = screen_center_X + view_Y / view_X * view_width

screen_Y = screen_center_Y + view_Z / view_X * view_width

 

The object positions are stored in view space (relative to the camera), instead of world space. CALCVH drives this by first culling out objects behind the camera (X<0 or X>=0 depending on fore/aft view) and then computes Y/X and Z/X. CALCVH computes abs(Y) and abs(Z) to pass in to DIVIDE, while DIVIDE itself computes abs(X) based on the fore/aft view setting. The division routine is a restoring 16/16 divide that calculates floor(a/b*128); it cranks out one bit at a time by binary long division. The result of the division is then looked up in PTAB, which computes y = (x*81)/256. This scales the division result by the view width, which is set by SCPTAB; this determines the field of view.

 

As shipped, the DIVIDE routine takes ~450 cycles per divide. By unrolling the loop and converting ASL + maybe INC to ROL I got it down to ~360 cycles. Note that the two divides per object use the same divisor, which an improved algorithm might be able to take advantage of.

 

Unfortunately, when I profiled the game the first time, I failed to notice that the rotation routine doesn't run if you aren't moving the joystick. Updated profile attached. ROHELP takes about a fifth of the frame when both axes are rotating. For the Y rotation, it is called twice to compute a rotation around Z as follows:

 

Y' = Y + X/64

X' = X - Y'/64

 

This is a rotation of about 0.9 degrees per frame (atan(1/64)). Interestingly, the /64 divide is dithered using RANDOM, and so it is non-deterministic. Note that the second calculation uses the result of the first, which is a hack to approximate a circle (look up "Minsky rotation").

 

Shouldn't the division loop 16 times for a 16/16 restoring divide?

And how does a multiply by 128 end up in one byte?

Edited by JamesD
Link to comment
Share on other sites

Shouldn't the division loop 16 times for a 16/16 restoring divide?

And how does a multiply by 128 end up in one byte?

 

Nope, only 8 iterations are needed since only an 8-bit result is produced. It's a 16/16=8 routine. The multiply by 128 happens before the result is truncated to integer and is just a result of the way the divisor is aligned against the dividend when the algorithm runs. If you're simulating it in integer arithmetic, it would be (a*128)/b instead of floor(a/b*128).

Link to comment
Share on other sites

 

Nope, only 8 iterations are needed since only an 8-bit result is produced. It's a 16/16=8 routine. The multiply by 128 happens before the result is truncated to integer and is just a result of the way the divisor is aligned against the dividend when the algorithm runs. If you're simulating it in integer arithmetic, it would be (a*128)/b instead of floor(a/b*128).

Well the comments say:

; A = (TOP/BOTTOM)X80

80 in hex is 128, which I wondered about.

But the multiply is supposedly at the end of the code if you listen to the comments.

        LDY     TEMP3           ; RESULT IN Y 
        LDA     PTAB,Y          ; MULTIPLY BY 80  (PTAB) 

Last I checked, A is a byte so the result of the multiply by 80 has to be a byte.

If you multiply any byte sized integer over 1 you'll overflow that byte. (2 * 128 = 256)

That kind of defeats the purpose.

Hence my confusion

 

As for your analysis of the upper code (perspective projection), I'd say that's accurate. The divide is what threw me.

 

Link to comment
Share on other sites

just quick sunday morning hack of the original ca65 source now for MADS... should work (but not playtested).

 

replace

 

.org with ORG

all damned LSR A, ASL a, ROR A, ROL A with LSR, ASL, etc...

 

all .RES into .DS

 

exchanged the wordtab to assemble correct internal screen byte text:

 

.byte $xx,d"string"

 

 

 

put that on top

 

opt h-f+
; @com.wudsn.ide.asm.outputfileextension=.bin
so it assembles a cart file.
and put the DS segments which are at $2xx- $3xx from end of the source before the game code at $axxx so all segments are in ascending order... that's it...

StarRaiders_mads.zip

Edited by Heaven/TQA
  • Like 1
Link to comment
Share on other sites

btw. I borrowed from Star raiders for an unreleased project of mine the "shoot" sprite animation (before I got the source code)...

 

the shoots are basicly the asteroid sprite EORed with RANDOM.... clever.

 

ah no... it was AND...

 

https://youtu.be/U9adnZlhYt8

;animate test shoot
shootapproaching
lda #$7f
sta $d200
vol1 lda #$aa
sta $d201
lda #$cc
sta $d202
vol2 lda #$aa
sta $d203
lda #$ff
sta $d204
vol3 lda #$ab
sta $d205
lda #$87
sta $d206
vol4 lda #$cb
sta $d207
lda #$00
sta $d208
 
 
;erase old
jsr calc_shoot
lda ex+1
sta 53249
lda 53770
and #7
ora #$70
sta $d013
 
ldy shootcount
lax shoot_offset,y
ldy #$0f
@ lda sprite_data,x
and 53770
sprite_y sta p0+256,y
inx
dey
bpl @-
 
Edited by Heaven/TQA
  • Like 1
Link to comment
Share on other sites

Join the conversation

You can post now and register later. If you have an account, sign in now to post with your account.
Note: Your post will require moderator approval before it will be visible.

Guest
Reply to this topic...

×   Pasted as rich text.   Paste as plain text instead

  Only 75 emoji are allowed.

×   Your link has been automatically embedded.   Display as a link instead

×   Your previous content has been restored.   Clear editor

×   You cannot paste images directly. Upload or insert images from URL.

Loading...
  • Recently Browsing   0 members

    • No registered users viewing this page.
×
×
  • Create New...