
New game port WIP: Pentagram


mariuszw


Here is the next preview of Pentagram. The game has been heavily optimized, with the goal of matching the speed of the ZX Spectrum original. It is difficult to measure, but comparing gameplay on Spectrum and Atari (on emulators), it looks like that goal has been achieved. There are many improvements, including rearranging sprite data by separating the mask plane from the data plane, optimizing sprite sorting by changing the data structure from an array of structures (typical for Z80) to a structure of arrays (typical for 6502), caching the list of sprites, preflipping some commonly flipped sprites, and optimizing horizontal flipping of sprites.

 

Last but not least, I added frame skipping, which at the expense of animation smoothness gives better control of the game in places where there are lots of animations (note that the game typically renders 10 sprites per game frame, and in "hot" places it renders 20+ sprites). I included versions with and without frame skipping. Please test both and let me know which one you prefer.

 

Game should work on real Atari, but will not start from DOS, as it loads at $800. You may need appropriate loader. It should also work on NTSC machines.

 

Both versions have unlimited lives cheat as well as disabled title screen to ease testing.

 

For those interested in internals of original games, there is recent reverse engineering effort here: http://retroports.blogspot.com/, with Filmation games disassembled and reimplemented in C for various platforms.

 

I am waiting for your feedback about current version of Pentagram. Thanks!

pg.obx

pg_with_frameskipping.obx


Hmmm,

 

for the final version I would still like to have a higher starting address than $0800, so I can load Pentagram on real A8 without XBIOS (or without a built-in loader in some hardware or OS, like e.g. Ape-loader, QMEG-loader, SPOS-loader, SIDE-loader, etc.). Maybe you can use something like $0A00, $0B00, $0C00 or simply $1000... (or start from $1000 and, when the program gets executed, Exomizer depacks some data to the lower memory) ?!?


As you might have noticed, final releases of my games always load at $2000+ (with some initialization code on $600). Current version of Pentagram is development version, and it is much easier to keep it in a form, which loads directly at execution address.

Link to comment
Share on other sites

What is that arrays of structures vs structures of arrays?

 

Array of structures is defined in C:

 

 

struct
{
    byte x;
    byte y;
    byte z;
    byte a;
} aObjects[MAX_OBJECTS];

 

In order to access field z of element n, you need to multiply n by the size of an array element and add the offset of the field:

 

 

    lda #n
    asl @           ; n*2
    asl @           ; n*4 (each element is 4 bytes)
    tay
    iny
    iny             ; +2 = offset of field z
    lda aObjects,y

 

Structure of arrays is:

 

 

struct
{
    byte x[MAX_OBJECTS];
    byte y[MAX_OBJECTS];
    byte z[MAX_OBJECTS];
    byte a[MAX_OBJECTS];
} aObjects;

 

Here, if you want to access field z of element n, you simply do:

 

 

    ldy #n
    lda aObjects.z,y

 

Second method is very fast on 6502, while Z80 code seems to prefer first method.
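
To make the difference concrete, here is a minimal C sketch of both layouts. It is only an illustration, not code from the port: MAX_OBJECTS is set to an arbitrary value, uint8_t stands in for byte, and the helper names (aos_z_offset, soa_z_offset, get_z_aos, get_z_soa) are made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_OBJECTS 32   /* hypothetical limit, chosen for illustration */

/* Array of structures: the four fields of one object sit next to each other. */
struct ObjectAoS { uint8_t x, y, z, a; };
static struct ObjectAoS objects_aos[MAX_OBJECTS];

/* Structure of arrays: all values of one field sit next to each other. */
static struct {
    uint8_t x[MAX_OBJECTS];
    uint8_t y[MAX_OBJECTS];
    uint8_t z[MAX_OBJECTS];
    uint8_t a[MAX_OBJECTS];
} objects_soa;

/* AoS: byte offset of field z of element n is n * 4 + 2 (two shifts, two increments). */
unsigned aos_z_offset(unsigned n)
{
    assert(n < MAX_OBJECTS);
    return n * (unsigned)sizeof(struct ObjectAoS)
         + (unsigned)offsetof(struct ObjectAoS, z);
}

/* SoA: the index into the z array is just n, so it can go straight into Y. */
unsigned soa_z_offset(unsigned n)
{
    assert(n < MAX_OBJECTS);
    return n;
}

/* Read z through the computed byte offset, the way "lda aObjects,y" would. */
uint8_t get_z_aos(unsigned n)
{
    const uint8_t *base = (const uint8_t *)objects_aos;
    return base[aos_z_offset(n)];
}

/* Read z the way "lda aObjects.z,y" would. */
uint8_t get_z_soa(unsigned n)
{
    return objects_soa.z[soa_z_offset(n)];
}
```

With a one-byte index register, the AoS form also limits you to 64 four-byte elements per 256-byte page, while each SoA array can hold up to 256 entries.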


  • 1 month later...

Long time lurker, just wanted to thank Mariuszw for this and the other conversions he has made. It really is appreciated.

 

I've been playing the latest WIP version of Pentagram quite a bit and haven't encountered any problems yet.

 

For my taste I like the non frame skipping version best, as with the frame skipping version Sabreman appears to "float" around the busy screens a bit too much.

 

The speed difference between this latest version compared to the first non optimized version is incredible. Shows what the 6502 can do in good hands!

Having studied the original source code Mariuszw posted I would love to see how you achieved such a large speed increase.

 

Good luck with the rest of the project, looking forward to see the final version ;-)


Thanks for feedback!

 

I attached source code of the last released version, so you may study which optimizations have been done and how they have been done. The list was presented in the post about release.

 

About the speed, one must remember that the first version of Pentagram posted was just a proof of concept, showing that I can recompile Z80->6502 and have a game running ;) . So there were lots of possibilities to improve code performance. However, what was not easy was getting performance comparable to the Spectrum original, and I am happy it was finally possible. If you have any questions about the code, feel free to ask.

 

The final version is being prepared, although it will not differ very much from the preview. For the frame skipping, I decided to include possibility to turn it on or off from within the game, so everybody can play as he wants.

 

Were you able to actually complete the game? I wonder if room with Pentagram, bringing there amulets and game finishing was working properly.

 

 

 

Pentagram.zip


Thanks so much for releasing the latest source code, for someone still learning the ways of the 6502 this type of stuff is invaluable. Looking forward to comparing the two versions of the source code!

 

As far as the speed of the latest version goes, comparing it side by side to the Spectrum version I think your version seems to be slightly faster than the original Spectrum version so your goal has been achieved.

 

Good idea to be able to select frame skipping on/off in the final game as I'm sure there will be a difference of opinion over which method is best.

 

I've not completed the game yet (it's quite a hard game even with endless lives!). I have activated some of the pyramid things and not found any problems yet. I will report back if I find any problems or if I'm able to complete it.

 

Thanks again for all your work.


Nice work! I'm impressed with the performance you got out of it!

 

I've just finished a 6809 port of Knight Lore for the TRS-80 Coco3. All that is left for me to do is optimise parts of the code as it's a little slower than the Spectrum atm. I need to study yours! ;)

 

Did you find that HFLIP'ping was a big factor? The stars have aligned perfectly and as it stands on the Coco, the Knight Lore sprite data occupies almost exactly 16KB, aligned on two 8KB MMU pages. I plan on pre-HFLIP'ping every sprite and storing them in alternate memory pages. HFLIP then simply requires paging in the right 16KB bank, 2 write operations on the Coco.

 

VFLIP on Knight Lore is only used once - on the menu/text screens for the border. I'm tempted to add a few extra sprites and do away with the VFLIP logic altogether.


Although Mariuszw is the one doing Pentagram from ZX and other isometric or not ports from C64, Knight Lore was ported to Atari 8bit in 2008 by XXL and if I remember it correctly it was from other 6502 machine, the BBC/Electron. He's a member here and you can download the game at Fandal's website:

http://a8.fandal.cz/detail.php?files_id=5710 if you wish.

Yes, I am aware of that, thanks for the heads-up regardless!

 

My port for the Coco3 is from the original ZX Spectrum, which I reverse-engineered completely. Disassembly here. I wanted to use the Spectrum version, rather than the BBC version, simply because it was the original, and I'm also a lot more well-versed in Z80 than 6502.

 

I've also done a C port, which I'm finalising the debugging of atm, completely faithful to the original. It runs on the Amiga, amongst other machines, and I would assume would also port well to the Atari ST. The C port has the option of using the Amstrad CPC graphics, and soon also Mick Farrow's modified graphics.

HFLIP is not really worth spending another 16KB of RAM on ;) The point is that it happens only from time to time, and it is always better to optimize things which execute every frame. In my port I improved hflip by unrolling the loops and providing different code paths for different sprite widths (see hflip_* routines), but that's all.

 

The situation is different, however, with the tiny spider. It has only one sprite, so it is animated by flipping it every frame. For this sprite I store an hflipped version of it. There is also a second sprite which is preflipped in my port: the dragon head, as there are rooms which feature two dragon heads, each looking in a different direction, so this sprite is flipped two times every frame. Having this sprite preflipped gives a good CPU saving.
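
For readers unfamiliar with what an hflip involves, here is a rough C sketch. It assumes a 1-bit-per-pixel sprite stored row by row, so mirroring a row means reversing the byte order and the bit order within each byte. The function names are made up for this illustration; the port's hflip_* routines are unrolled 6502 code and are not reproduced here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Mirror the 8 pixels of one byte: the leftmost pixel becomes the rightmost. */
uint8_t reverse_bits(uint8_t b)
{
    b = (uint8_t)(((b & 0xF0u) >> 4) | ((b & 0x0Fu) << 4));  /* swap nibbles */
    b = (uint8_t)(((b & 0xCCu) >> 2) | ((b & 0x33u) << 2));  /* swap bit pairs */
    b = (uint8_t)(((b & 0xAAu) >> 1) | ((b & 0x55u) << 1));  /* swap neighbours */
    return b;
}

/* Flip a sprite in place, row by row. */
void hflip_sprite(uint8_t *data, size_t width_bytes, size_t height)
{
    assert(width_bytes > 0);
    for (size_t row = 0; row < height; row++) {
        uint8_t *line = data + row * width_bytes;
        size_t left = 0, right = width_bytes - 1;
        while (left < right) {
            /* Swap the outer bytes and mirror the pixels inside each of them. */
            uint8_t tmp = reverse_bits(line[left]);
            line[left] = reverse_bits(line[right]);
            line[right] = tmp;
            left++;
            right--;
        }
        if (left == right)
            line[left] = reverse_bits(line[left]);  /* middle byte of odd widths */
    }
}
```

Preflipping simply means running something like hflip_sprite once ahead of time and keeping both copies, so the per-frame cost for that sprite disappears at the price of extra memory.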

 

What was most important for the performance was sorting the objects before drawing. The algorithm they use basically compares every redrawn sprite against all redrawn sprites, so it is O(n*n), and with 20 objects redrawn it was taking as much as 30-40% of CPU. What I did here was:

1. I changed the structure of objects data, and also pre-compute the values. It is done in the PrepareSorting method. Then, in the sorting method (see LB58A routine), I could use the two 6502 index registers (X and Y) to point to the two compared objects, making the comparison code short and efficient. Not sure how this maps to the 6809, but IIRC there are also two index registers, so this approach may help there.

2. I checked their searching algorithm and found that there are certain paths where the comparison result is known in advance, so the code can take a shortcut. Just look for comments like "; if we are here, then C will be 2,5,8,11,14,17,20,23,26 (3*n+2), but these always go to @@57"

3. Finally I found that the sorting algorithm behaves better if the table of objects is pre-sorted. So I implemented a cache. After sorting, I cache the input table (which contains the objects to sort) and the output table (which contains the sorted objects). During the next frame, I check whether the table of objects to sort has the same contents as the table of objects to sort from the previous frame, and if yes, I use the table of sorted objects from the previous frame as the parameter for the sorting function. See the SortingCache method for details.
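
The cache idea from point 3 can be sketched in C as follows. This is only an illustration of the principle, not the SortingCache routine itself: a plain insertion sort on a single key stands in for the game's depth comparison, and depth_key, MAX_SPRITES, sort_ids, sort_with_cache and the comparisons counter are all names invented for this example.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MAX_SPRITES 32                 /* hypothetical capacity */

static uint8_t depth_key[256];         /* hypothetical sort key per object id */
static uint8_t cached_in[MAX_SPRITES]; /* last frame's unsorted table */
static uint8_t cached_out[MAX_SPRITES];/* last frame's sorted table */
static size_t  cached_n = 0;
unsigned long  comparisons = 0;        /* instrumentation only */

/* Stand-in sort: insertion sort, which is cheap on nearly sorted input. */
static void sort_ids(uint8_t *ids, size_t n)
{
    for (size_t i = 1; i < n; i++) {
        uint8_t id = ids[i];
        size_t j = i;
        while (j > 0) {
            comparisons++;
            if (depth_key[ids[j - 1]] <= depth_key[id])
                break;
            ids[j] = ids[j - 1];
            j--;
        }
        ids[j] = id;
    }
}

/* Sort "in" into "out", seeding the sort with last frame's result if possible. */
void sort_with_cache(const uint8_t *in, uint8_t *out, size_t n)
{
    assert(n <= MAX_SPRITES);
    if (n == cached_n && memcmp(in, cached_in, n) == 0)
        memcpy(out, cached_out, n);    /* same objects: start from old order */
    else
        memcpy(out, in, n);            /* different objects: start from scratch */
    sort_ids(out, n);                  /* keys may have changed, so still sort */
    memcpy(cached_in, in, n);
    memcpy(cached_out, out, n);
    cached_n = n;
}
```

Note that the sort still runs every frame, because objects move and their keys change. The cache only makes sure the sort starts from an order that is probably already correct or nearly so.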

 

The other part optimized was sprite drawing (PrintSprite method). What I did here was:

1. Change the structure to store pixel data and mask at different addresses (processing is faster on 6502).

2. Some background sprites in the game have a mask equal to their pixel data. For these I removed the mask data and wrote a special renderer (PrintSpriteNoMask method).

3. The slowest code path here is rendering sprite which needs rotation - here I used tons of self modifying code to get best performance possible from 6502 (PrintSpriteRotated).
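
As a rough C illustration of points 1 and 2, assume byte-aligned drawing and a mask whose set bits mark pixels covered by the sprite (that mask polarity is an assumption of this sketch, as are the function names). With separate planes, both arrays are walked with the same index, and when the mask equals the pixel data the masked draw collapses to a plain OR:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Masked draw with data and mask stored as two separate planes. */
void draw_masked(uint8_t *screen, const uint8_t *data,
                 const uint8_t *mask, size_t n)
{
    assert(screen != NULL && data != NULL && mask != NULL);
    for (size_t i = 0; i < n; i++) {
        /* Clear the pixels the sprite covers, then put the sprite pixels in. */
        screen[i] = (uint8_t)((screen[i] & (uint8_t)~mask[i]) | data[i]);
    }
}

/* Special case for sprites whose mask equals their data:
   (s & ~d) | d is the same as s | d, so no mask plane is needed. */
void draw_no_mask(uint8_t *screen, const uint8_t *data, size_t n)
{
    assert(screen != NULL && data != NULL);
    for (size_t i = 0; i < n; i++)
        screen[i] |= data[i];
}
```

The no-mask variant saves both the mask storage and one memory read plus one logical operation per byte.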

 

If I had more RAM, I would precache sprites for each screen in their rotated form. Also, it would be worth precaching the background objects drawn at entrances - they are huge sprites, and the game places them in a position where they need rotation, so rendering them is costly. You may see that the game slows down when Sabreman is passing a screen entrance.

 

BTW: Thanks for disassemblies, it helped to understand Z80 code and make improvements. Keep up good work!

The Coco3 comes stock with 128KB, so when Knight Lore is running there's 64KB of free memory sitting there idle. Using another 16KB is neither here nor there. However, in light of your comments, I went back and looked at the objects that require HFLIP in Knight Lore; the answer is, not many. There's several background objects (walls etc) that need flipping, though of course they're only rendered when something moves in front of them. Backgrounds aside, there's just the player, the ghosts and the portcullis that moves up/down. Like your tiny spider, these objects, particularly when there's two ghosts on the screen, could require multiple flips per frame. I'll have to evaluate the likely improvement, though implementing the memory page banking would be quite trivial.

 

I'm more interested in your sorting - I hadn't gotten around to profiling the software yet - and I'm very interested to learn that it spends 30-40% of time doing so! You're right, the 6809 has two index registers. Throughout the game I use X,Y for IX,IY on the Z80 version, which works out nicely. Of course during rendering etc those get saved and re-used. I'll have to digest your optimisations. Part of me is hesitant to use them, only because the intention was to port the software as-is, warts and all, except for routines that must be changed like rendering. There's a couple of bugs in Knight Lore that write to address $0000 which is ROM on the Spectrum, and therefore harmless. Not so on the Coco3 - I've got it mapped to video memory so the screen gets garbage. Those types of bugs I did fix too. Mind you, if it makes the game more playable I might be forced to relent on my idealism! Either way, I'll study your code and try it out to see what savings I get too - thanks again for making the source public!

 

I'm a little confused at your use of the term 'rotation' and how that relates to HFLIP? IIUC they're one-in-the-same!?!

 

What's funny is that you use, eg. "z80_b" to store intermediate results in Z80 registers. I used exactly the same names for mine! ;-)

 

So at the end of it all, do you think there was benefit in using automated translation as a starting point? Or in hindsight would you have hand-coded the entire program in 6502?

 

Depending on the reception to my Coco3 Knight Lore, I'm considering doing the other two filmation games as well (little point going to the effort if no-one wants to play them). They should be much quicker as the 'core engine' has already been done, even if it requires minor tweaks. I'll probably (also) do C ports for them too - maybe.

How about porting this for an Atari with a 65816?

 

Nice idea, but I am not familiar with 65816 programming at all. On the other hand, the higher CPU clock of the 65816 will give a significant performance boost even to the 6502 version of the game, so why write native 65816 code? Moreover, you can have working Spectrum emulation with a 65816 ;-) (see https://www.youtube.com/watch?v=hcwx3JtruLM).

 

I take care, however, not to use illegal opcodes, so Pentagram runs properly on 65816. With the current version of the game there is a problem: it doesn't synchronize with CPU, so it will run too fast on a 65816 with a high clock. I will address this issue in the final release.

Sorting: I read the note on your blog where you stated that changing the sorting algorithm will make the new version work differently than the original. I would agree about the cache, but the other optimizations (using a different data structure and taking shortcuts) don't really change the original game in my opinion, and even these were significant IIRC.

 

Rotation: I am referring to the situation where you need to render a sprite that is not on a byte boundary. In that case the code rotates sprite data by 1,2,...,7 bits depending on the position of the sprite on the screen. This is quite slow - on my Atari version, I need 48 cycles for one byte rendered! For comparison, when a sprite is rendered on a byte boundary (so no rotation), it takes only 24.5 cycles per byte. Having sprites prerotated would help quite a lot, but it would take tons of memory to precache all the data.
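
To show what this rotation means, here is a small C sketch that shifts one sprite row right by 0-7 pixels, assuming 1 bit per pixel with the leftmost pixel in the top bit. The name shift_row is invented for the example; the actual PrintSpriteRotated routine is self-modifying 6502 code and works differently in detail.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Shift a row of width_bytes bytes right by "shift" pixels.
   dst must have room for width_bytes + 1 bytes, because the pixels
   pushed out of the last source byte spill into one extra byte. */
void shift_row(const uint8_t *src, size_t width_bytes,
               unsigned shift, uint8_t *dst)
{
    assert(shift < 8);
    uint8_t carry = 0;                  /* bits pushed out of the previous byte */
    for (size_t i = 0; i < width_bytes; i++) {
        dst[i] = (uint8_t)(carry | (src[i] >> shift));
        carry  = (uint8_t)(src[i] << (8 - shift));
    }
    dst[width_bytes] = carry;           /* spill-over byte, zero when shift is 0 */
}
```

Precaching all 7 shifted variants of a sprite that is w bytes wide and h rows high would cost 7 * (w + 1) * h extra bytes for the pixel data alone (and the same again for a mask plane), which is where the "tons of memory" comes from.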

 

Automated translation: For me it was essential to get thing done ;) Before trying with automated translation, I attempted to make manual translation (i.e. each game routine manually rewritten to 6502) but I found it extremely time consuming and also I was afraid there would be lots of typing mistakes (which usually happens with manual code) and these would be pain to debug and fix - remember I started without commented disassembly, so I didn't really know what each procedure is supposed to do. Also, usually 90% of the program time is spent in 10% of the code, so other 90% doesn't really have to be fast and optimal ;) I am still experimenting with recompiler and I am still happy with the results.

 

Good luck with other ports. I think Alien 8 is more "mission pack" for Knight Lore, so porting this will be probably trivial. Pentagram adds shooting, so it is a little bit more complicated. I also took a look at GunFright and while this is another engine, the code quality is high and structure is beautiful, so I believe if you enjoyed Knight Lore analysis, you will enjoy GunFright too. (And then Nightshade, the same engine as GunFright).

I didn't mean to say it would change the way the game plays, but rather how the game works. The outcome would, in theory, be exactly the same, so I agree with you there! I'm just being a pedantic purist. ;)

 

Ah rotation - I use the term shifted. Now I'm on the same page! I haven't calculated the hit I take using the lookup tables but caching is an interesting idea, especially considering I have 64KB free memory. Considering there's a maximum of 40 sprites on the screen at any one time (and many of them the same), it wouldn't take a silly amount of memory to cache just what's being shown. Obviously as sprites change into other sprites there'd be a need to modify the cache, but it might be worth looking into as well... it all comes down to how much speed I can coax out of simpler optimisations on the Coco3 first. But definitely going to look at sorting given your metrics!

 

Yeah it would be scary to attempt without reverse-engineering the original. Amazingly, that's exactly what Sockmaster did when he ported the Z80-based arcade version of Donkey Kong to the 6809-based Coco3, also adding emulation of (scaled-down) hardware tiles/sprites and sound hardware. The results are simply incredible, and that project has been an inspiration for me for my later porting projects, in particular Donkey Kong for the Neo Geo (in 68K ASM, 50% done). Anyway, the work you've done with automated translation is very impressive indeed!

 

I've taken a preliminary look at Alien 8 and Pentagram and was encouraged by the similarities in the core engine - especially Alien 8. And from what I saw I didn't think shooting was too radical an update; just another sprite whose update handler made it move and destroy other objects?!? Perhaps I'm not remembering properly, or perhaps I'm thinking that's how I would have added it to the basic core?!? I'll have to study it in more detail if/when I get around to it.

 

Right now I'm feeling a little over Knight Lore; 5 ports in 3 different languages is getting a little old. Many years ago I reverse-engineered the arcade Space Invaders - almost but not quite to the degree that is now on Computer Archeology - in order to patch it to run on the TRS-80. I'm thinking of revisiting that just for a diversion from filmation, and tinkering with the idea of porting it to the Coco3 and, since it's already in 8080/Z80, the ZX Spectrum. Both of these machines have plenty of Space Invaders clones/remakes already, so it wouldn't be particularly exciting, but IMHO cool to have the original code/logic running for a 100% faithful port. I'm still in the process of seeing how that would work on the respective screens - the Coco3 at least can do 320x225 which is just enough vertical resolution for invaders if the top two lines of text are moved to the side of the display (which I'm sure you've seen before). Or perhaps I'll do something different (I have a couple of Apple II games in mind), and I'd still like to tackle an Atari 8-bit game. Either way I'll probably do one project before I tackle another filmation game again.
