
Super Space Acer


Tursi


 

I've also been doing a lot of optimizations, since the original code was fairly crude and slow. One of the biggest ones has been replacing inline switch..case statements with function pointers.

 

It was fairly common in my code to have a handler for enemies (for example) in the main loop like so:

switch (enemyType) {
case SAUCER: do this; break;
case JET: do this; break;
case COPTER: do this; break;
}

and so forth, plus generic animation code, tests for specific cases... and so on.

 

In the best case, SDCC will reduce such things to a jump table. This means that it will calculate a list of addresses, one per entry, then calculate the index and just jump over to it. This is pretty good and if that's all it was, it would certainly be enough.

 

But in some cases, SDCC is not happy with the range provided, and needs to include additional tests to make sure the jump table is safe. Or worse, it decides to replace the jump table with a series of comparisons. In addition, I had additional unnecessary code running (for instance, the 'jet' enemy does not animate, and simply moves straight down the screen. But it still ran the animation code and checked for moving off the left and right edges).

 

I was able to replace this, and several state machines, with function pointers, which made a visible difference in performance. Separate functions almost certainly cost more in terms of code space, but not as much as you'd expect once all the special-casing code was removed. And the main enemy handler simply becomes:

enemy_func(idx);

When the enemy is created, in addition to defining its shape, color, and animation, now I also define the function that handles its movement.

 

So what does that look like? The declaration of a function pointer may look a little strange if you haven't used them very much:

void (*en_func[12])(uint8);    // pointer to enemy handler function

Note this is further complicated by being an array -- but the name goes inside the parentheses, and this declares an array of 12 pointers, each to a function that returns void and takes in a single uint8.

 

Initializing it is as simple as passing the name of a matching function:

en_func[k]=enemysaucer;

and you call it much like any other function call. (If not for the array index, it would look identical to a normal function call).

 for (x=0; x<12; x++) {
  if (ent[x] != ENEMY_NONE) {
   en_func[x](x);
  }
 }

(I could probably have defined a function for ENEMY_NONE as well and saved that comparison, but testing for zero is usually faster than a function call's overhead.)
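
To tie the pieces together, here is a minimal, self-contained sketch of the same pattern. Only en_func, ent and ENEMY_NONE mirror the snippets above; MAX_ENEMIES, ent_y, spawn_enemy, run_enemies and the two handlers are made up for illustration and are not from the game's source.

```c
#include <stdint.h>

typedef uint8_t uint8;

#define MAX_ENEMIES 12          /* hypothetical name; matches the 12-entry array above */
enum { ENEMY_NONE = 0, ENEMY_SAUCER, ENEMY_JET };

uint8 ent[MAX_ENEMIES];                 /* enemy type per slot, 0 = empty */
uint8 ent_y[MAX_ENEMIES];               /* hypothetical per-enemy row */
void (*en_func[MAX_ENEMIES])(uint8);    /* one handler pointer per slot */

/* Hypothetical handlers: each one contains only the code its enemy needs. */
static void move_saucer(uint8 idx) { ent_y[idx] += 1; }
static void move_jet(uint8 idx)    { ent_y[idx] += 2; }

/* Pick the handler once, at creation time, instead of on every frame. */
void spawn_enemy(uint8 slot, uint8 type) {
    ent[slot] = type;
    ent_y[slot] = 0;
    en_func[slot] = (type == ENEMY_JET) ? move_jet : move_saucer;
}

/* Main-loop dispatch: no switch, just an indirect call per live enemy. */
void run_enemies(void) {
    uint8 x;
    for (x = 0; x < MAX_ENEMIES; x++) {
        if (ent[x] != ENEMY_NONE) {
            en_func[x](x);
        }
    }
}
```

Calling spawn_enemy() for a couple of slots and then run_enemies() once per frame moves each live enemy with its own routine, and empty slots cost only the compare against ENEMY_NONE.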

 

SDCC's handling of function pointer calls is not completely straightforward:

;enemy.c:276: en_func[x](x);
 ld l,e
 ld h,#0x00
 add hl, hl
 ld bc,#_en_func
 add hl,bc
 ld a, (hl)
 inc hl
 ld h,(hl)
 ld l,a
 push de
 ld a,e
 push af
 inc sp
 call ___sdcc_call_hl
 inc sp
 pop de

This code looks up the array index (which is probably half of it right there; without the array it would be simpler) to get the correct function pointer, then calls "___sdcc_call_hl" to do the actual jump. This is a tight little indirection function, but it's still additional overhead, especially compared to a direct function call, which can be as simple as a single call instruction.

 

Here's a function pointer without the index, to show how expensive the array is. shieldsOn is defined to point to the appropriate shield activation function depending on the ship the player chose.

;superspaceacer.c:995: shieldsOn();
 ld hl,(_shieldsOn)
 call ___sdcc_call_hl

Even though arrays appear to be pretty pricey, SDCC can do some surprisingly clever things. I attempted to unroll a number of loops, but measuring the resulting performance suggested the original, seemingly more complex code, was actually faster. As with all things - if you optimize for performance, you can't say you're done until you measure it. ;)

-bugfix - dying on blimp doesn't corrupt the boss image

 

Didn't know that :o ! Thanks for sharing.


Ran into an odd one this weekend related to optimizing the bank switches. I had the case of a single wrapper function that was getting data from the wrong bank.

 

I had optimized my bank switch macros to look like this:

 

#define SWITCH_IN_BANK7 (*(volatile unsigned char*)0xFFF8); nBank=(unsigned int)0xFFF8;

So first it dereferences the cartridge address to do the bank switch, then it saves off the current bank value. I was very pleased that this assembled to pretty much what I wanted to see:

 

;trampolines.c:163: SWITCH_IN_BANK5;
	ld	a,(#0xFFFA)
	ld	hl,#0xFFFA
	ld	(_nBank),hl

I couldn't ask for too much better than that! And off I went. Then, mysteriously, all of a sudden my scrolltext quit working properly.

 

Now, the code that runs the scroll text, and the text itself, live in different banks. This is okay, since it's not terribly timing critical, and so I had two wrappers for getting the data:

 

const char *winwrapcentr(unsigned char row, const char *out) {
	// special one for win code, can remove when that's replaced
	// need this because its text is in a different bank
	unsigned int old = nBank;
	const char *p;
	SWITCH_IN_BANK5;
	p=centr(row, out);
	SWITCH_IN_PREV_BANK(old);

	return p;
}

char winwrapgetbyte(const char *adr) {
	// another special one for win code - read one byte (so we can
	// get the termination character).
	unsigned int old = nBank;
	char ch;

	SWITCH_IN_BANK5;
	ch = *adr;
	SWITCH_IN_PREV_BANK(old);

	return ch;
}

The first function wraps another function that takes the string and displays it centered on the screen (returning a pointer to the next string). The second function just retrieves a single byte from the text table, and I use it to detect the end of the scroll text.

 

I'd had weird behavior before, but thought I'd solved it. Suddenly, the end of the scroll text was not being detected again.

 

I was baffled - the two functions changed banks in exactly the same way. I tried moving winwrapgetbyte around in case it was being affected by other code, no such luck. When I looked at the assembly, I had no explanation - there was no code at all:

 

;trampolines.c:100: char winwrapgetbyte(const char *adr) {
;	---------------------------------
; Function winwrapgetbyte
; ---------------------------------
_winwrapgetbyte::
	push	ix
	ld	ix,#0
	add	ix,sp
;trampolines.c:103: unsigned int old = nBank;
	ld	de,(_nBank)
;trampolines.c:106: SWITCH_IN_BANK5;
;trampolines.c:107: ch = *adr;
	ld	l,4 (ix)
	ld	h,5 (ix)
	ld	b,(hl)

That was baffling, I thought! I thought maybe the macro was damaged, I tried replacing it with inline code, I tried moving things around - always the same effect. No code emitted. Even more confusing was just 13 lines above, winwrapcentr was doing the right thing:

 

;trampolines.c:88: const char *winwrapcentr(unsigned char row, const char *out) {
;	---------------------------------
; Function winwrapcentr
; ---------------------------------
_winwrapcentr::
	push	ix
	ld	ix,#0
	add	ix,sp
;trampolines.c:91: unsigned int old = nBank;
	ld	bc,(_nBank)
;trampolines.c:93: SWITCH_IN_BANK5;
	ld	a,(#0xFFFA)
	ld	hl,#0xFFFA
	ld	(_nBank),hl

After a LOT of cursing, head-scratching, checking that I was using volatile correctly (I was, although it's unclear whether a bare read statement is valid C), I finally was able to prove what was going on.

 

Despite being marked volatile, which I intended as "do this, no matter what else, and don't reorder it either", SDCC was looking ahead. The clue came when I noticed that it was ALSO not emitting any code for "nBank=0xfff8", even when I put it in different places in the function.

 

The compiler was being too smart for me, and I was being too smart for it. SDCC was able to look at the function and realize that there was little enough going on that it never needed to change nBank. nBank was changed in this function, then changed back. No other function was called, and it was not volatile - so SDCC just left it out (which is correct). Furthermore, there was no side effect from reading 0xFFFA anywhere in this function, so it optimized that away too.

 

Why did it work, then, in ALL the other places? Simpler than I expected - the other cases include function calls. In the event of a function call, SDCC doesn't make any more assumptions about the current function. The volatile read happens and nBank is saved and pushed to the stack.

 

The solution, for now, was to go back to my old method of an assignment:

 

#define SWITCH_IN_BANK5	(*(volatile unsigned char*)0) = (*(volatile unsigned char*)0xFFFA); nBank=(unsigned int)0xFFFA;

The volatile assignment is never optimized away. The code is not too much worse than before, with just one extra 3-byte store, but I still liked seeing it simpler:

 

;trampolines.c:106: SWITCH_IN_BANK5;
	ld	a,(#0xFFFA)
	ld	(#0x0000),a

Note that it still sees that it can optimize away the setting of the global nBank, which is of course perfectly correct behavior, and a nice side effect to having the macro in C instead of assembly.
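
For anyone who wants to play with the shape of this on a PC, here is a rough host-side sketch. It is not the cartridge code: fake_mem is a stand-in array for the Z80 address space, SWITCH_IN_BANK and wrap_getbyte are hypothetical names, and the definition of SWITCH_IN_PREV_BANK is my guess since it isn't shown above. The do { } while (0) wrapper is an addition too; it just keeps a two-statement macro safe inside an unbraced if.

```c
/* Host-side stand-in for the Z80 address space, so this compiles anywhere.
   On the real target the macro would use the absolute addresses directly. */
static volatile unsigned char fake_mem[0x10000];
unsigned int nBank;

/* Read the bank-select address and store the result through a volatile
   location. Both accesses are volatile, and the store is the part that
   was never optimized away in the tests described above. */
#define SWITCH_IN_BANK(addr) \
    do { fake_mem[0] = fake_mem[(addr)]; nBank = (unsigned int)(addr); } while (0)

#define SWITCH_IN_BANK5        SWITCH_IN_BANK(0xFFFA)
#define SWITCH_IN_PREV_BANK(x) SWITCH_IN_BANK(x)   /* assumed definition */

/* Same structure as winwrapgetbyte: save, switch, read, restore. */
char wrap_getbyte(const char *adr) {
    unsigned int old = nBank;
    char ch;

    SWITCH_IN_BANK5;
    ch = *adr;
    SWITCH_IN_PREV_BANK(old);

    return ch;
}
```

After the call, nBank holds whatever it held on entry, which is exactly why the compiler is free to drop the intermediate assignment to it.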

 

The ultimate solution will probably be to use inline assembly for the bank switch itself, but I don't know that there are any registers it's safe to use in any situation, and if I have to push stuff to the stack to switch banks, I'd rather let the compiler write the additional load. Stuff to try, anyway.

 

I'm not sure whether I like that it optimized away the volatile read, though. More than that I know I don't like that it's inconsistent - working in one situation but not in another. Just a case to remember - tools don't always do what you expect. If it's weird, verify! ;)


  • 2 weeks later...

-bugfix - clean up sprites when leaving difficulty select

-bugfix - fix font characters

-bugfix - restore older bank switch mechanism

-bugfix - fix sprite for wide gun powerup

-bugfix - preserve score after win scroller

-bugfix - mines leave properly during BOSS APPROACHING

-bugfix - made boss hit flash last longer

-bugfix - don't draw lives indicator for cruiser or other single-life mode

-added easy/medium/hard mode endings (only one hard ending so far)

-scroll text (now 'medium' ending) is now a smooth scroll

-added proper lowercase to font

-re-did title page for new copyright year

-all enemies now have hitpoints, in particular mines, bombs and swirlies are harder to destroy

-tweaked weapon in one of the secret modes

-replaced boss draw code with explicit draw functions

-improved boss collision detection against player

-cruiser is now invulnerable during its hit shimmy

 

http://youtu.be/DHGgC9pxUNg

 

http://www.harmlesslion.com/cgi-bin/onesoft.cgi?64

 

A lot of things changed in this version, although the ones I'm going to talk about nobody will likely see. ;)

 

The first one is boss collision detection. Because you could sit on top of the boss before, I added an invisible 'cockpit' sprite that moved with it, and would hit you if it collided. It turned out that there was still lots of room to position around it on some bosses, so after a bit of playing around, I came up with a simple solution -- the cockpit sprite chases you, although it's restricted to the boss's bounding box. I took a couple of screenshots showing it implemented with the helicopter sprite, but normally it's invisible.

 

[Two screenshots: the cockpit sprite made visible using the helicopter sprite]

 

This way, no matter where you enter the boss's outline, it's going to get you, and it's still just one sprite with a simple movement routine.
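
A minimal sketch of that idea, assuming pixel coordinates and a simple rectangle for the boss. Box, clamp and chase_player are hypothetical names, and this version snaps straight to the player's position, whereas the game may well move the sprite more gradually.

```c
/* Clamp v into the inclusive range [lo, hi]. */
static int clamp(int v, int lo, int hi) {
    if (v < lo) return lo;
    if (v > hi) return hi;
    return v;
}

typedef struct { int x, y, w, h; } Box;   /* hypothetical boss bounding box */

/* Move the cockpit sprite onto the player, but never outside the boss. */
void chase_player(const Box *boss, int px, int py, int *cx, int *cy) {
    *cx = clamp(px, boss->x, boss->x + boss->w - 1);
    *cy = clamp(py, boss->y, boss->y + boss->h - 1);
}
```

If the player is inside the box, the sprite lands right on top of them; if they are outside, it waits at the nearest edge.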

 

The one that's arguably interesting to most people is the full screen smooth scroll. Most of the old-timers probably know the trick, but some may not. We've had tons of talk in the TI forum about this, (including with some of the folks from here) and I'll just direct you there: http://atariage.com/forums/topic/210888-smooth-scrolling/

 

For my scrolltext, I just used the double-spaced text concept with 8 character sets. ;)

 

By far, however, the most time was spent on getting text to scroll smoothly on the bitmap screen. I had a static picture and I wanted a scrolltext beside it, something like Thunder Force IV did at the end. I settled on 12 characters of text, and got the scroll coded. It worked, and it was tolerable, but it was slower than I liked. I ended up cheating and using half sprites, half text. ;)

 

(Youtube is running slow for me, as usual, it looks like it may take a while for the video to upload).


This is a bug report from a real Colecovision with the Ultimate SD cartridge. I tried with and without the SGM, just in case it was a RAM mirroring issue. As it loads in the boss, sprites start plotting randomly all over the screen. With the 2nd boss, sprites again plot randomly while it loads, and then the sprite generator table becomes totally corrupted when the boss appears.

I'm not sure if it is an issue only on my Colecovision or not.


About the game itself, which is being played in BlueMSX. I really enjoy this game and the music is outstanding. I managed to beat skill 1 and 2, so I'm working on skill 3. I'm using Snowball ship type to get through the game. Skill 3 is a challenge, and I will eventually beat this skill level so I can see the ending.


This is a bug report from a real Colecovision with the Ultimate SD cartridge. I tried with and without the SGM, just in case it was a RAM mirroring issue. As it loads in the boss, sprites start plotting randomly all over the screen. With the 2nd boss, sprites again plot randomly while it loads, and then the sprite generator table becomes totally corrupted when the boss appears.

 

I'm not sure if it is an issue only on my Colecovision or not.

Same for me. Ultimate SD cart, tested with SGM and without.


Thanks for the hardware feedback, that's something I've been wondering recently, as BlueMSX will occasionally break for too-fast VRAM access in the sprite copy code. I guess that confirms it. I will have my real hardware in a couple of weeks and can test properly, I'll make that a priority to kill (I was sure it was fixed at one time... but at least I know where to look).

 

There's one more major gameplay change coming, the enemies need to actually attack in patterns rather than random as today. I've been delaying this as I'm not entirely sure how I'm going to lay it out yet (except for one part where I do know what I want ;) ). I should also warn you, Kiwi, that the only hard ending implemented so far is for one of the secret modes. ;)

 

The smooth scrolling bosses were one of the first updates I did for the game - they are all character graphics. The game unpacks the graphics to VRAM, then during the "BOSS APPROACHING" text, it pre-shifts the graphics a few characters at a time for the boss, creating four character sets with each shifted 2 pixels. When damage is added to the boss that damage needs to be copied to all four sets. The rest of the character graphics (stars and characters) are the same in all four sets, so you don't see them change.

 

The nice thing about having character sets defined (and the ending scrolltext is similar, but with 8 sets for single pixel scroll), is it's sort of like having a scroll register available, you just change the register for the character set, and the screen shifts by a pixel. ;) The downside is all the VRAM it takes up.
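
As a rough sketch of that "scroll register" idea for the four boss sets: the table addresses (0x1000 to 0x2800) are taken from the VRAM map posted further down the thread, and VDP register 4 holds the pattern table base in units of 0x800 bytes. The function name and constants are made up for illustration; the game's actual code isn't shown.

```c
#include <stdint.h>

#define PATTERN_BASE  0x1000u   /* first pre-shifted set, from the VRAM map */
#define PATTERN_SIZE  0x0800u   /* one full 256-character set */

/* Value for VDP register 4 (pattern table base, in units of 0x800 bytes)
   that selects the character set pre-shifted by `shift` pixels (0, 2, 4, 6). */
uint8_t pattern_reg_for_shift(uint8_t shift) {
    uint16_t base = (uint16_t)(PATTERN_BASE + ((shift >> 1) & 3u) * PATTERN_SIZE);
    return (uint8_t)(base / PATTERN_SIZE);
}
```

Writing the returned value to register 4 once per step is the whole scroll; nothing in the screen image table has to change until the shift wraps around.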


  • 3 weeks later...

A minor update with some bugfixes that should get it running on hardware (I had a conflict in how I was handling the end of frame that caused it to be processed twice, when the handler assumed it was in vblank and could write freely).

 

This was mostly a legacy problem - when I first wrote the boss shifting code, I was manually checking the VDP status and using it to move the stars to give the illusion that the game was still processing. As the code improved the sprite engine began to count on being aware of end of frame so we could successfully copy the sprite table to VRAM as quickly as possible - but the boss processing code was subverting the end of frame code, causing it to be executed twice and run over the end of free time. That's fixed now, and BlueMSX, at least, does not complain anymore. I think BlueMSX is a little tighter than hardware by a few scanlines, but better safe than sorry. ;)

 

I do have my real hardware now, but I haven't had a chance to set it up yet, so I've just got emulation testing so far. If anyone tries it before I get a chance to, please let me know if the bosses work now?

 

Anyway, the lesson there is to be consistent when you have timing critical code - try to keep it centralized so you aren't caught off guard by interfering code you have forgotten about. ;)

 

No new video for this one, since the changes are behind the scenes for the most part.

 

Changelist:

 

Behavior Changes:
-bosses with mine droppers launch increasingly difficult mines
-fix ladybug shield behavior to recharge shield when hitting enemies
-no longer recenters player before boss

 

Bugfixes:
-fix random engine corruption during boss battle
-fix timing bug when boss is preparing to enter (should work on hw now)
-change startup to clear interrupt flags
-add scoring flag for cheat mode
-also remember scoremode across reboots

 

http://www.harmlesslion.com/cgi-bin/onesoft.cgi?64


Small update today to release a version tested and working on real hardware.

http://www.harmlesslion.com/cgi-bin/onesoft.cgi?64

 

The issue was an interesting one. During boss entry, there was lots of flicker and (starting with the second boss) graphical corruption. It was a good lesson in bad assumptions. With the reports given, I had assumed VDP overrun was to blame and worked with BlueMSX as well as code review to track down all the overrun cases. The last release contained those fixes, but when I finally burned a ROM, it still had trouble. This particular bug was not actually a hardware issue at all - but bad assumptions on my part. It didn't show up in BlueMSX because of a bad assumption on their part.

 

In the original version of this game twenty years ago, the bosses didn't scroll onto the screen, they just appeared, using a simple function "drawboss()", which used global row and column variables. When I updated it to scroll them on, I decided that I'd just live with the wrap-around and let the row start negative when needed. I calculated how much VRAM was being wasted, and just marked it unused in my map:

// VRAM map:
// >0000 Screen Image Table
// >0300 Sprite Descriptor Table
// >0380 Color Table
// >03A0 (Unused)
// >03C0 Color Table 2 (all white on transparent)
// >03E0 (Unused)
// >0800 Sprite Pattern Table
// >1000 Pattern table (scroll 0)
// >1800 Pattern table (scroll 2)
// >2000 Pattern table (scroll 4)
// >2800 Pattern table (scroll 6)
// >3000 (Unused)
// >3D00 wraparound memory overwritten by boss draw code

Tested in emulation, everything worked, so I shipped it.

 

Some of you might have already figured out what happened, but don't tell just yet.

 

At the beginning of the boss draw function, I calculated a screen address like so:

ptr=(bosscol>>2)+(bossrow<<5)+gIMAGE;

gIMAGE is 0 in this case (and in fact, since I rely on it being zero to avoid other corruption, it might as well not be there. But it's a constant and it's optimized away). So when bossrow is negative, we end up wrapping around the memory pointer. Again, I didn't have a problem with that, after all, I documented the wraparound in a comment! What could go wrong?

 

I had forgotten something about how the VDP address set function works, and about what wraparound would really mean. For instance, the first boss starts at a row of -8. This translates to an address of 0xFF20. No problem, right? The VDP only has up to 0x3FFF of VRAM, and the address bits will mask.
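
Here is the arithmetic as a small sketch you can run on a PC. The function name is hypothetical, and the post doesn't state the column, so the 128 used in the note below is just a value that happens to reproduce 0xFF20.

```c
#include <stdint.h>

#define gIMAGE 0x0000   /* screen image table base, 0 in this game */

/* Same arithmetic as the draw code, truncated to the 16 bits that get sent
   to the VDP. row * 32 is used instead of row << 5 because left-shifting a
   negative value is not well defined in standard C. */
uint16_t boss_screen_addr(int col, int row) {
    return (uint16_t)((col >> 2) + row * 32 + gIMAGE);
}
```

With a row of -8 the row term is -256, so boss_screen_addr(128, -8) comes out as 0xFF20, while the same column at row 0 gives 0x0020.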

 

That's true. The address bits /will/ mask. However, when the address 0xFF20 is written, it's very important to notice that the most significant bit is set. What does this mean? REGISTER WRITE.

 

To back up a little, the VDP is controlled by writing two bytes to the command port. The first byte (the least significant) is latched (remembered), and when the second byte is written, it is examined to see what the command is. The two most significant bits of the second byte define what the operation will do. The most significant bit, when set, tells the VDP to send the latched byte to the VDP register specified in the 3 least significant bits of the command byte. If either of the two most significant bits is set, it tells the VDP not to do a memory prefetch (commonly considered setting a 'write' address). In all cases, the VDP internal address pointer is updated.

 

So what this means is when you write an address of 0x1000, for instance, the VDP sets its internal address pointer to 0x1000, then it performs a prefetch from memory, reading the data at 0x1000 and incrementing the address pointer to 0x1001. When you read, you get the pre-fetched data and a new prefetch occurs. This is meant to allow the CPU to get a byte of VDP data without waiting for an access cycle, the prefetch operation does that in the background.

 

When you write an address of 0x5000, however, the second most significant bit is now set. The VDP still sets its internal address pointer to 0x1000 (because this address pointer only has 14 bits), but the set bit tells it NOT to prefetch data. The prefetch buffer is unmodified and the address remains 0x1000 - this allows the next byte you write to go to 0x1000.

 

When you write an address of 0x8701, this is actually no longer an address operation. The most significant bit being set tags this as a register write. This will take the latched byte (01) and write it to VDP register 7. Internally, however, this "destroys" the address pointer, because it's the only workspace register the VDP has. The address pointer still gets updated, in this case to 0x0701, and the set command bit tells it NOT to prefetch data. This is considered a side effect however.

 

That's all documented. The question becomes what happens when you send something completely nonsensical like 0xFF20. Well, the chip doesn't try to be smart about it - it does exactly what you told it to do:

- the MSB is set, so a register write takes place. There are only three bits of register space (8 registers), so the fourth bit of the nibble is ignored. That means the "register F" becomes register 7, and the value 0x20 is written to it
- the next to MSB bit is set as well. Either of the command bits being set causes a prefetch inhibit, so no prefetch will happen
- the address pointer is updated using the 14 relevant bits, in this case to 0x3F20.
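
The three steps above can be written down as a small decoder. This is only a sketch of the behavior described in this post, not code from the game or from any emulator; VdpCommand and vdp_decode are hypothetical names, and the 16-bit word is the command byte in the high half with the latched byte in the low half.

```c
#include <stdint.h>
#include <stdbool.h>

typedef struct {
    bool     reg_write;   /* bit 15 set: latched byte goes to a register */
    uint8_t  reg;         /* register number, low 3 bits of command byte */
    uint8_t  value;       /* latched (first) byte */
    bool     prefetch;    /* only when both command bits are clear */
    uint16_t address;     /* 14-bit address pointer, always updated */
} VdpCommand;

/* Every field is derived independently, mirroring hardware where each
   bit drives its own operation and ignores the others. */
VdpCommand vdp_decode(uint16_t word) {
    VdpCommand c;
    uint8_t cmd = (uint8_t)(word >> 8);
    c.value     = (uint8_t)(word & 0xFF);
    c.reg_write = (cmd & 0x80) != 0;
    c.reg       = (uint8_t)(cmd & 0x07);
    c.prefetch  = (cmd & 0xC0) == 0;
    c.address   = (uint16_t)(word & 0x3FFF);
    return c;
}
```

Feeding it 0xFF20 reports a write of 0x20 to register 7, no prefetch, and an address of 0x3F20, while 0x1000, 0x5000 and 0x8701 come out as described in the paragraphs above.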

 

Why is this? The trick is to remember that hardware, unlike software, is wired in parallel. It takes MORE logic to make something not happen when some other bit is set - so often, especially on older hardware, bits are tied only to their operation and completely ignore all other bits.

 

Interestingly - this set the address that I expected to be set. What I forgot was that the registers would get messed with as well. The different start addresses caused by the different sizes of the bosses cause different registers to be hit - this is why the first boss only flashes the screen, while the second one changes the sprite pattern table pointer and corrupts all the sprites.

 

Since this is such a simple effect, and BlueMSX works so hard to get the VDP right for the MSX guys, I was surprised that it didn't happen in the emulator. (I had worked these behaviors out for the TI for Classic99 some years back, thinking I was playing catch-up!) But sure enough, right there in the code in VDP.C writeLatch() was an assumption about the command bits being unique (note they correctly update the address):

   case VDP_TMS9929A: case VDP_TMS99x8A:
   if (vdp->vdpKey) {
    vdp->vramAddress = ((UInt16)value << 8 | (vdp->vramAddress & 0xff)) & 0x3fff;
    if (!(value & 0x40)) {
     if (value & 0x80) vdpUpdateRegisters(vdp, value, vdp->vdpDataLatch);
     else readNoTimingCheck(vdp, ioPort);
    }
    vdp->vdpKey = 0;
   }

The fix was fairly simple... just ensure that all three operations are always evaluated - they are not exclusive:

   case VDP_TMS9929A: case VDP_TMS99x8A:
   if (vdp->vdpKey) {
    vdp->vramAddress = ((UInt16)value << 8 | (vdp->vramAddress & 0xff)) & 0x3fff;
    if (value & 0x80) vdpUpdateRegisters(vdp, value, vdp->vdpDataLatch);
    if (!(value & 0xC0)) readNoTimingCheck(vdp, ioPort);
    vdp->vdpKey = 0;
   }



With this fix, the code glitched the same way it does on hardware. Unfortunately, I wasn't able to get a perfectly clean build of BlueMSX (though I did find the 2.8.3 source on SourceForge, I had DirectShow related build issues. I got it to go by updating DirectInput to DI8 and removing the 'Grabber' objects, which isn't ideal. I couldn't post to the forum - are they done with updating it?)

 

Anyway, that was a fun adventure, but now the game should work on hardware AND my copy of the emulator will catch such issues. ;)

 


Accidental duplicate post, but that's okay... the final question I meant to touch on was "how did I fix it?"

 

Well, I wanted to keep the boss draw as fast as possible, because it's quite an expensive area of the game. So I wanted to avoid a test per line. I originally tried precalculating the offsets and using a switch..case to jump to the appropriate draw line, but SDCC really hated that. The code seemed to work, but it took minutes to calculate for just one function (and I have five).

 

I opted to go back to being lazy and allowing the wraparound, but updated my address write macro to mask the most significant byte so that only VDP addresses, never register writes, get sent. It's not the cleanest solution, but it only adds a single AND instruction per row, and it works. And now my comment is also accurate. ;)
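
The actual macro isn't shown here, so the following is only one way such a mask could look, written as a plain function on the full 16-bit command. The name vdp_write_addr_cmd is hypothetical, and setting the 0x4000 write bit is my assumption about how a write address would normally be formed.

```c
#include <stdint.h>

/* Hypothetical helper: build the two-byte "set write address" command.
   Masking to 14 bits clears the register-write bit (bit 15); OR-ing 0x4000
   sets the write bit so the VDP skips the prefetch. */
uint16_t vdp_write_addr_cmd(uint16_t addr) {
    return (uint16_t)((addr & 0x3FFFu) | 0x4000u);
}
```

With this, the wrapped address 0xFF20 goes out as 0x7F20: the pointer still lands on 0x3F20, but no register is touched.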

Edited by Tursi

Kinda funny that I thought the active starfield might be causing serious NMI VRAM corruption, and I was going to suggest disabling it to see if that solved the problem. It wound up being something different. Thank you for explaining the glitch, it was interesting to read. I will try your game on my Colecovision tomorrow, I want to listen to that awesome music again.

Another quirk that the BlueMSX VDP has and the Colecovision doesn't: if you place a high-priority invisible sprite over a visible one, it masks the sprite below it in BlueMSX. There's an invisible sprite over the hockey puck in this old GIF of PONG I made a long time ago.

[Animated GIF: PONG with an invisible sprite over the hockey puck]

On the real Colecovision, the masking doesn't occur. I wound up using a solid color sprite and redrawing the tiles to be single color.


Ah, that's interesting that they get that wrong too... I would have expected the MSX guys to know about invisible sprites. I guess they aren't really that useful. In my testing invisible sprites do not draw, but they still set the collision flag if their pixels collide. It was probably intended to have invisible collision points over a background, if I were to guess. That's a good looking hockey pong though! :)

 

 

 

awesome music

 

I can't say enough good things about RushJet1, he was good to talk to and interested in helping - he even followed up when a year after I bought the music I still hadn't worked on the game. Most of his current chiptune stuff is NES based, but he was willing to go back if I needed anything else (he composed for the SMS, which ports across directly.)


I got the chance to play Super Space Acer. All 5 bosses loaded properly. I couldn't get to the text scroller to see if that one works on the real hardware yet. I will keep trying. ^_^

 

hehe, thanks! I've tested both text scrollers here so they should work! ;)

