Jump to content
IGNORED

Star Raiders Source Code to be released?


Knimrod

Recommended Posts

As it is all 'text book' 3D stars....all optimisations can be applied what we demo coders do :)

While it's been a while when I coded in Atari 800 ASM, that loop's op count sounds too big for such a simple task as star position computation. That's probably a result of the tradeoff 8 KB / speed of the target platform.

 

Now that we can use more RAM, I'd go for analysis of the input range (whether it's in screen-space or world-space) and adjust the look-up tables. If we devoted 8 KB to the LUT, it's enough to cover just one quarter of the screenspace (and offset the other three), we'd only have to interpolate the rest (in couple instructions).

There are so many ways to get the stars projected and at the speed they are moving, noone would be the wiser [as to their underlying technical approach] anyway.

Link to comment
Share on other sites

While just adding 32 (or whatever is the number of explosion stars - don't have the sources here ATM) to the list of active stars is certainly easiest from programmer's perspective, I believe a simple hack could still convey some perspective - so that even if you moved during the explosion, the particles would move [more-or-less] accordingly.

 

Effectively, particles would be a separate routine:

1. Each particle would move along some pre-defined trajectory (a set of precomputed (DeltaX, DeltaY) and a projection acceleration (ProjX, ProjY) ). We could have different sets of those, so each explosion would be unique (or just alter them with another set of offsets to give it a completely random look)

2. To convey the 3D perspective, we would add a pair of (OffsetX, OffsetY) based on current move of the player, hence when player moves up, the stars would go down (on the screen), emulating the perspective

 

The above is basically a set of few additions. No expensive divisions, multiplications or bit-shifting, let alone in a loop (per each particle).

 

 

The full projection transformation of the particles - that is in the code right now - is just a totally unnecessary performance overkill.

  • Like 1
Link to comment
Share on other sites

Well, I'm going to pass on what I discovered last week about the "divide" routine.
As I said before, I replaced the divide with one of my own.

I ended up with stars in a tiny box way out in front or behind me even with a full 16 bit divide... which btw wasn't slowing down the game so it was hitting the exit where you can't divide any more long before reaching 16 bits. That indicates to me that the dividend isn't using the full 16 bits.
I babbled something about 257 being the max value but that's not the dividend.

With the result being a byte, the maximum possible result of the divide is $FF no matter what the input. 127 or $7F if it's a signed value.

But with 7 passes through the loop like the divide uses, the max theoretical value is 7 * $FF or 1785. Anything greater is overflow or $FF.
If the lookup table at the end is in fact a multiply by 80 (not hex) then the max result is actually tiny.
A multiply by 128 ($80) would just involve adding a zero LSB and shifting 1 bit. It certainly wouldn't fit in one byte.

As near as I can tell, the code started out with a standard projection routine and he tried to reduce calculations so he worked back to 8 bits but that didn't fill the screen so he scaled the data. But since I haven't looked at actual numbers yet (I took the weekend off) I really don't have any more to add.

*edit*
Except if the divisor is larger than the dividend I believe it generates overflow as the value.

Edited by JamesD
Link to comment
Share on other sites

Also, here is some pseudo C code that I wrote up just now for the DIVIDE routine. It may help in understanding exactly what's going on.

bits = 7;	// TEMP4
result = 0;	// TEMP3

topnum = topnum >> 1;	// topnum is TEMP,TEMP1

if(!display_flag)
{	// back
	botnum = xpos >> 1;	//botnum is PNTR,PNTR+1
}
else
{
	botnum = (0 - xpos) >> 1;
}

do
{
	result = result << 1;
	topnum = topnum - botnum;
	result += 1;
	topnum = topnum << 1;
	if (topnum > 65535)
	{
		return(255);
	}
	bits--;
}
while (bits > 0);

return(((result+1)*81) >> ;
Edited by Joey Z
  • Like 1
Link to comment
Share on other sites

Wow! That is fantastic! A peta-thanks to Doug Neubauer, Steve Hales and Kevin Savetz, that is a giant leap for the community worldwide! A dream came true after 36 years. That is history and we are all part of it!

 

Besides the original version, there is further a version from Stefan Dorndorf with fast explosions and a score at the end! :-)) Please see attachment.

 

Further, we have here the dual-version:

 

http://atariage.com/forums/topic/184398-star-raiders-dual-anyone-heard-of-this-title/

 

In the early 80's I have heard, that there is a version with not just front and aft view, but with top and down as well as board and starboard! But I have never seen this for real nor do I have a trace of it.

 

All the years, because we fight the Zylons, I dreamed to fly my Viper home:

 

post-32599-0-40439200-1445884530_thumb.jpg

 

maybe this can be done with all views in all directions with a multiplayer of 8 over the web? ;-)

Star Raiders-stuff.zip

Link to comment
Share on other sites

 

Also, here is some pseudo C code that I wrote up just now for the DIVIDE routine. It may help in understanding exactly what's going on.

 

 

Not quite....

 

First problem, the loop does 8 iterations, not 7. TEMP4 is indeed loaded with $07, but the loop uses BPL instead of BNE.

 

The critical part of the division is missing, the check for underflow on the tentative subtraction. The current value is updated and the LSB of the result is set if and only if the tentative subtraction does not underflow. The code is producing one result bit at a time from MSB to LSB, with the catch being that the shift directions are reversed -- the result is shifted left and new bits inserted at the LSB, and the dividend is shifted left instead of the divisor being shifted right.

 

PTAB stores (i*81)>>8 and not ((i+1)*81)>>8. The initialization loop at LDTAB1 copies the high byte to PTAB before the addition occurs.

 

Here's a debugger session showing the output of the original routine:

Altirra> bt divide "divide($%04X,$%04X) = $%%02X -> $%%02X" db(temp1)*256+db(temp) db(xposh+x)*256+db(xposl+x) -- (y) (a)
Tracepoint 0 set at $AA21.
Altirra> gf
divide($05B3,$0C7C) = $3A -> $12
divide($0154,$0C7C) = $0D -> $04
divide($0737,$116A) = $35 -> $10
divide($0931,$116A) = $43 -> $15
divide($09BB,$0CC0) = $61 -> $1E
divide($064E,$0CC0) = $3F -> $13
divide($091B,$1044) = $47 -> $16
divide($092B,$1044) = $48 -> $16
divide($045B,$0A4C) = $36 -> $11
divide($0764,$0A4C) = $5B -> $1C
End of frame reached.
Altirra> ? ($764/2)*128/($a4c/2)
($764/2)*128/($a4c/2) = 91 ($005B)
Altirra> ? ($5b*81)/256
($5b*81)/256 = 28 ($001C)

...and here is the C version:

int bits = 8;
int result = 0;

if (display_flag)
	botnum = -botnum;

topnum >>= 1;
botnum >>= 1;

do {
    result <<= 1;

    if (topnum >= botnum) {
        topnum -= botnum;
        ++result;
    }

    topnum <<= 1;

    if (topnum >= 0x10000)
        return 0xFF;
} while(--bits);

return (result*81) >> 8;

By the way, since the fore/aft computation just appears to negate X, it's actually mirrored (flipped) from the fore view -- an asteroid passing by the left in the fore view appears on the left in aft view.

  • Like 5
Link to comment
Share on other sites

OK, guess that's what I get for rushing :)

 

So it seems DIVIDE gives us (TOPNUM/BOTTOMNUM) rounded down to a 128th, divided by 256, times 81.

In formula form: int(81*int(128*topnum/bottomnum))/256)

 

Seems the routine does long division of the top by the bottom, and this intermediate result is fixed point form, where the form is: 7.6543210

They make an assumption that the top value is less than 2 times the bottom value, or otherwise you just overflow. (result = 255). The intermediate result can be found by any division of the top number by the bottom number, as long as top number is less than 2 times the bottom number, with the answer represented in 128ths of a unit. Since the lookup table's operation is 81*result/256, and you have a maximum result of 255, the output range is just linearly scaled to be from 0 to 80. It appears the actual result of the division itself is not used, just the result from this lookup table.

 

Since the output ranges from 0 to 80 and is an integer, maybe we can compute the division to one less binary digit of accuracy, and get roughly the same results? If we ignore the LSB of the division operation, the most it will put us off by is 1 in the final output. 1 off over a range of 81 is about 1.2% off.

Edited by Joey Z
  • Like 3
Link to comment
Share on other sites

So it seems DIVIDE gives us (TOPNUM/BOTTOMNUM) rounded down to a 128th, divided by 256, times 81.

In formula form: int(81*int(128*topnum/bottomnum))/256)

 

Ah, thanks for that. I instrumented the DIVIDE function to store values as it was going and couldn't understand the output. I didn't realize there was that extra factor of 128, and now the results make sense. Here's some sample data from the function, broken up into 8 byte groups. bytes 0 & 1 are TEMP & TEMP1; bytes 2 & 3 are XPOSL,X and XPOSH,X, byte 4 is an 0xff inserted by my routine just to make sure it got there, byte 5 is the result from the DIVIDE routine, byte 6 is after looking up the value in PTAB. byte 7 is unused.

4100: BA 02 19 0B FF 1F 09 00 1A 05 A6 05 FF 73 24 00  .............s$.
4110: 74 05 A6 05 FF 7B 26 00 67 0F D1 0C FF 99 30 00  t.....&.g.....0.
4120: C2 01 D1 0C FF 11 05 00 01 01 80 07 FF 11 05 00  ................
4130: 03 09 80 07 FF 99 30 00 69 01 79 05 FF 20 0A 00  ......0.i.y.. ..
4140: 1D 03 79 05 FF 48 16 00 77 00 51 24 FF 01 00 00  ..y..H..w.Q$....
4150: 34 01 51 24 FF 04 01 00 92 00 58 20 FF 02 00 00  4.Q$......X ....
4160: E2 01 58 20 FF 07 02 00 CC 0C 70 07 FF DC 45 00  ..X ......p...E.
4170: 82 0C 70 07 FF D7 44 00 60 05 B1 0E FF 2E 0E 00  ..p...D.........
4180: 71 08 B1 0E FF 49 17 00 CF 08 AA 0F FF 47 16 00  q....I.......G..
4190: 53 01 AA 0F FF 0A 03 00 4C 03 1A 09 FF 2E 0E 00  S.......L.......
41A0: 31 01 1A 09 FF 10 05 00 FE 0B 11 0F FF 65 1F 00  1............e..
41B0: 3E 05 11 0F FF 2C 0D 00 FA 00 7E 05 FF 16 06 00  >....,..........
41C0: 42 00 7E 05 FF 06 01 00 A0 00 20 04 FF 13 06 00  B......... .....
41D0: 2A 02 20 04 FF 43 15 00 63 02 66 0F FF 13 06 00  *. ..C..c.f.....
41E0: D9 03 66 0F FF 1F 09 00 37 02 F9 0A FF 19 07 00  ..f.....7.......
41F0: BA 02 F9 0A FF 1F 09 00 1A 05 86 05 FF 76 25 00  .............v%.
4200: 74 05 86 05 FF 7E 27 00 67 0F B1 0C FF 9B 31 00  t.......g.....1.
4210: C2 01 B1 0C FF 11 05 00 01 01 60 07 FF 11 05 00  ................
4220: 03 09 60 07 FF 9C 31 00 69 01 59 05 FF 21 0A 00  ......1.i.Y..!..
4230: 1D 03 59 05 FF 4A 17 00 77 00 B7 24 FF 01 00 00  ..Y..J..w..$....
4240: 34 01 B7 24 FF 04 01 00 92 00 BE 20 FF 02 00 00  4..$....... ....
4250: E2 01 BE 20 FF 07 02 00 CC 0C 48 07 FF E0 46 00  ... ......H...F.
4260: 82 0C 48 07 FF DB 45 00 60 05 91 0E FF 2F 0E 00  ..H...E....../..
4270: 71 08 91 0E FF 4A 17 00 CF 08 8A 0F FF 48 16 00  q....J.......H..
4280: 53 01 8A 0F FF 0A 03 00 4C 03 FA 08 FF 2F 0E 00  S.......L..../..
4290: 31 01 FA 08 FF 10 05 00 FE 0B F1 0E FF 66 20 00  1............f .
42A0: 3E 05 F1 0E FF 2C 0D 00 FA 00 5E 05 FF 17 07 00  >....,....^.....
42B0: 42 00 5E 05 FF 06 01 00 A0 00 00 04 FF 14 06 00  B.^.............
42C0: 2A 02 00 04 FF 45 15 00 63 02 46 0F FF 13 06 00  *....E..c.F.....
42D0: D9 03 46 0F FF 20 0A 00 37 02 D9 0A FF 1A 08 00  ..F.. ..7.......
42E0: BA 02 D9 0A FF 20 0A 00 1A 05 66 05 FF 78 25 00  ..... ....f..x%.
42F0: 74 05 66 05 FF 81 28 00 67 0F 91 0C FF 9C 31 00  t.f...(.g.....1.

E.g. for the values at 42F0:

$ python
Python 2.7.5 (default, Oct 11 2013, 13:42:33) 
[GCC 4.6.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> hex(0x574*128/0x566)
'0x81'
Edited by playermissile
Link to comment
Share on other sites

OK, guess that's what I get for rushing :)

 

So it seems DIVIDE gives us (TOPNUM/BOTTOMNUM) rounded down to a 128th, divided by 256, times 81.

In formula form: int(81*int(128*topnum/bottomnum))/256)

 

Seems the routine does long division of the top by the bottom, and this intermediate result is fixed point form, where the form is: 7.6543210

They make an assumption that the top value is less than 2 times the bottom value, or otherwise you just overflow. (result = 255). The intermediate result can be found by any division of the top number by the bottom number, as long as top number is less than 2 times the bottom number, with the answer represented in 128ths of a unit. Since the lookup table's operation is 81*result/256, and you have a maximum result of 255, the output range is just linearly scaled to be from 0 to 80. It appears the actual result of the division itself is not used, just the result from this lookup table.

 

Since the output ranges from 0 to 80 and is an integer, maybe we can compute the division to one less binary digit of accuracy, and get roughly the same results? If we ignore the LSB of the division operation, the most it will put us off by is 1 in the final output. 1 off over a range of 81 is about 1.2% off.

If it's calculating a fixed point number, we could use that for a LOG lookup.

If you change the rest of the math, can you drop to 8 bit's everywhere else?

Edited by JamesD
Link to comment
Share on other sites

Took a shot at reimplementing the DIVIDE routine using logarithms -- attached. It's an EXE, as I'm too lazy to repack it into a cartridge. I did fix the issue where if you built it as an executable and tried to run it on an XL machine, it would crash because it banked out the kernel ROM.

 

Bit of a warning: replacement division routines must handle zero properly. The hyperspace target indicator will break otherwise.

 

The new routines and tables take about 4.5K but run about three times faster. This helps during the explosions, but the problems now are the rotation, motion, and plot/unplot routines. Profile attached with main routines broken out. With the faster routine, the game is still running 3-4 frames per tick when an explosion occurs and both axes are rotating... still not good. CALCVH is still taking up an entire frame by itself even with the fast divide.

 

StarRaiders-FastDiv.zip

post-16457-0-67221700-1445909566_thumb.png

  • Like 8
Link to comment
Share on other sites

Yeah, that is all great, thank you all.

 

Some ads:

 

This one:
is good, too. :-)))
post-32599-0-60538700-1445961419_thumb.gif
(I don't know, why this isn't animated, in a normal browser it works)
Link to comment
Share on other sites

How about we put star raiders on a 1MB cart? just use all the extra space for a gigantic lookup table? you could normalize the top/bottom numbers first. You know that if topnum >= botnum << 1, then you overflow, so we don't need a table entry for any of that. 1MB gives you nearly 20 bits of lookup table to play around with, could be top 8 bits of normalized topnum, and the most significant 12 bits of the bottom number. bounds on the top and bottom number could allow you to eliminate the 8K from the LUT that you need for the game code itself.

Link to comment
Share on other sites

How about we put star raiders on a 1MB cart? just use all the extra space for a gigantic lookup table?

That would be horribly inefficient - to use 1 MB just for 1 LUT. That game is basically ~ 4 KB of code. Can you imagine how it could be extended if you used 1 MB properly and just swapped code/data in&out per each mission ? You could, quite literally, have a full Mass Effect game (with full galaxy, FPS / exploration/ driving missions) on A800, including in-engine FMVs (limited to A800's capabilities). Of course, you'd need an army of artists (and couple coders) too...

Link to comment
Share on other sites

That would be horribly inefficient - to use 1 MB just for 1 LUT. That game is basically ~ 4 KB of code. Can you imagine how it could be extended if you used 1 MB properly and just swapped code/data in&out per each mission ? You could, quite literally, have a full Mass Effect game (with full galaxy, FPS / exploration/ driving missions) on A800, including in-engine FMVs (limited to A800's capabilities). Of course, you'd need an army of artists (and couple coders) too...

could you imagine how fast the division would go?

  • Like 1
Link to comment
Share on other sites

could you imagine how fast the division would go?

 

Exactly. It seems 1MB is within easy reach for everyone (emulated anyway). I agree with VladR, the memory could be put to better use theoretically, but, an "army of artists" and "at least a couple of coders" probably will never happen. So let's not avoid something because it is not the ultimate, while we hold out for the never materializing.

 

PS: How much could it be sped up with lookup tables that were less memory hungy, so as to be able to at least include people using the real hardware (but with memory upgrades)?

Edited by fujidude
Link to comment
Share on other sites

 

Exactly. It seems 1MB is within easy reach for everyone (emulated anyway). I agree with VladR, the memory could be put to better use theoretically, but, an "army of artists" and "at least a couple of coders" probably will never happen. So let's not avoid something because it is not the ultimate, while we hold out for the never materializing.

 

PS: How much could it be sped up with lookup tables that were less memory hungy, so as to be able to at least include people using the real hardware (but with memory upgrades)?

you are pretty much at the limit of phaerons version which uses log tables. The issue with decreasing memory usage is not that of slowing it down, you lose precision of the math. Otherwise, you just need a different method entirely. Use less memory, and the answers start coming out off by a noticable amount. You can certainly try smaller lookup tables though, and see what results you get out of it. As far as for those with RAM expansions in their atari's, these can be used with a table computed during initalization.

 

Anyway, I don't see how a big LUT is 'wasting' a 1MB flash cart, as you can always replace the game with something else! Anyway, this is also a good application for those with a Veronica cart to play around with. Do the math on the coprocessor using fast division routines.

  • Like 1
Link to comment
Share on other sites

So let's not avoid something because it is not the ultimate, while we hold out for the never materializing.

 

That's some people's modus operandi.

 

Huge table. Faster game. Can happen now. Seems like a better path to walk down than the infinity delay.

  • Like 1
Link to comment
Share on other sites

How about using RAM for the tables, but not going crazy on the size so it will load from SDX on an Incognito or U1M? That would be better than wasting an entire 1M cart. Let it run from hard disk, load into banked RAM and run from there.

Maybe even keep it to 256K so RAMbo XL users can run it.

Even 128K to run on 130XE should be a great improvement over the original.

Link to comment
Share on other sites

Join the conversation

You can post now and register later. If you have an account, sign in now to post with your account.
Note: Your post will require moderator approval before it will be visible.

Guest
Reply to this topic...

×   Pasted as rich text.   Paste as plain text instead

  Only 75 emoji are allowed.

×   Your link has been automatically embedded.   Display as a link instead

×   Your previous content has been restored.   Clear editor

×   You cannot paste images directly. Upload or insert images from URL.

Loading...
  • Recently Browsing   0 members

    • No registered users viewing this page.
×
×
  • Create New...