


LATEST 'N' NOT-SO-GREATEST version of the Elevator Repairman knockoff:





The sprites are all screwed up because I rewrote part of the kernel and didn't have time to fix the variable setup routines. The kernel works fine; the variables just aren't being set up properly. But that's no big deal.


Oh yeah, I also took out a bunch of stuff (score, 48-char display, etc.) that was using RAM - I needed the RAM and I didn't want to waste time optimizing when the kernel is still in flux, so I just deleted all that stuff. :D


This is a bigger deal: at the suggestion of supercat, I sank a ton of RAM into making the elevators solid. :)

RAM is pretty tight now, but it should be manageable.


However, as I mentioned in the comments to the previous post, the routine that sets up all that RAM for the elevators takes a looooooong time. It took about 65 scanlines but I optimized it (and fixed some bugs) last night so now it takes about 48 scanlines. So I could split it between the overscan and the vblank, but that's such an annoying pain that I'm hoping to come up with something faster.


[EDIT: I knew I could count on you guys! With the suggested optimizations the whole thing runs in about 33 scanlines now - long, but not too long. New binary and source:



The new routine is in the comments; this post is plenty long already.]


Here's the routine in question:


;--new method:
;--Y position of elevators ranges from 0 to 80
;	upon beginning this routine, ElevY will hold values pointing to the tops
;		of the elevators
;	-so first thing to do is to subtract ELEV_HEIGHT from each Y position 
;		so that ElevY holds the	*bottom* of the elevators
;	-after processing the bottom of the elevators, flag that we processed
;		the bottom of that elevator (temp variables) and add ELEV_HEIGHT back to
;		the Y position so that it points to the top.
;	-after processing the top of the elevators, ORA ElevY with 128 (set top bit)
;	-when the lowest Y position (ElevY) is a negative number (top bit set) then we are done.

	ldx #6
InitializeElevYLoop
	lda ElevY,X
	sec
	sbc #ELEV_HEIGHT
	sta ElevY,X
	lda #0
	sta Temp+2,X	;clear flags also
	dex
	bpl InitializeElevYLoop

;--initialize the bottom band with zeroes
	lda #0
	sta ElevRAM
	sta ElevRAM+14
	sta ElevRAM+29
	sta ElevRAM+44
	sta ElevBandHeight

	ldy #0
SetElevRAMLoop
	jsr FindLowestPtSubroutine
;--now Temp holds lowest value
;	and Temp+1 holds index into which elevator is the lowest
	lda Temp
	bmi AllDone
	cmp ElevBandHeight,Y
	beq SameBand
;--else: new band, so
;	bring all values from old band into new band:
	sta ElevBandHeight+1,Y
	lda ElevRAM,Y
	sta ElevRAM+1,Y
	lda ElevRAM+14,Y
	sta ElevRAM+15,Y
	lda ElevRAM+29,Y
	sta ElevRAM+30,Y
	lda ElevRAM+44,Y
	sta ElevRAM+45,Y
	iny
SameBand
	ldx Temp+1
	lda ElevPtrTableLo,X
	sta MiscPtr
	lda (MiscPtr),Y
	eor ElevFlipTable,X
	sta (MiscPtr),Y
	lda Temp+2,X	;flags that determine top or bottom of elev
	eor #$FF
	sta Temp+2,X
	bne JustProcessedElevBottom
;--else that was the top
	lda ElevY,X
	ora #$80
	sta ElevY,X
	bne SetElevRAMLoop	;branch always
JustProcessedElevBottom
;--that was the bottom, now set it for the top
	lda ElevY,X
	clc
	adc #ELEV_HEIGHT
	sta ElevY,X
	bne SetElevRAMLoop	;branch always

AllDone
;--now we have to deal with situation where the highest
;	elevator(s) is at the top line
	lda ElevBandHeight,Y
	cmp #80
	bne NoElevAtVeryTop
	sty Temp+2	;save this value for later



FindLowestPtSubroutine
	lda #255
	sta Temp
	ldx #6
	stx Temp+1
FindLowestPtLoop
	lda ElevY,X
	cmp Temp
	bcs NotLowestPt
	sta Temp
	stx Temp+1
NotLowestPt
	dex
	bpl FindLowestPtLoop
	rts

;--Temp+1 holds index into lowest elevator
;	Temp holds the height of the low elevator


And here's the source if you want to see things with more context:


Recommended Comments

Hey, save 10 cycles with this :lol:


	ldx #0
	ldy #6
InitializeElevYLoop
	lda ElevY,y
	sec
	sbc #ELEV_HEIGHT
	sta ElevY,y
	stx Temp+2,y	;clear flags also
	dey
	bpl InitializeElevYLoop


Are you sure about this one:

Y position of elevators ranges from 0 to 80

Wouldn't the "bottoms" get negative then? (Maybe the sec is superfluous?)


Continue like:

;--initialize the bottom band with zeroes
	sty ElevRAM
	sty ElevRAM+14
	sty ElevRAM+29
	sty ElevRAM+44
	sty ElevBandHeight


For the following loop remember that y already is 0.


Do you need FindLowestPtSubroutine more than once? Eliminating the subroutine call would possibly gain 7*12 cycles, more than a scanline :D


Ok, will dig deeper - after work :ponder:

Link to comment

Hey, save 10 cycles with this :ponder:

Thanks. :lol:

Are you sure about this one:
Y position of elevators ranges from 0 to 80

Wouldn't the "bottoms" get negative then? (Maybe the sec is superfluous?)

Yeah, that was just sloppy commenting on my part; the range is actually 8-80 so, yes, the sec is superfluous. :D

Do you need FindLowestPtSubroutine more than once? Eliminating the subroutine call would possibly gain 7*12 cycles, more than a scanline :D

Good idea, plus it frees two bytes on the stack. :D



Link to comment

I really hate to say this now as you have put a lot of effort into optimising, but I prefer the old stripy elevators. I think the previous ones look like old-fashioned elevator cages, while the new ones just look like blobs to me. However, I guess the new ones look more like the original game, and they are technically more of an achievement.




Link to comment

I really hate to say this now as you have put a lot of effort into optimising, but I prefer the old stripy elevators. I think the previous ones look like old-fashioned elevator cages, while the new ones just look like blobs to me. However, I guess the new ones look more like the original game, and they are technically more of an achievement.




Don't apologize! Technical achievements be damned, it's the game that matters! :lol:


Thanks for the comment. :ponder:

Link to comment

I was going to say something similar, but Chris beat me to it. Now if you can manage gradated colors in your elevators, that would be great, but otherwise I prefer the striped pieces to solid white rectangles.

Link to comment

I was going to say something similar, but Chris beat me to it. Now if you can manage gradated colors in your elevators, that would be great, but otherwise I prefer the striped pieces to solid white rectangles.

That would be cool, but I don't know how that would be possible, since the elevators are all drawn with the PF.

Link to comment

Oh, I guess I'm not paying enough attention. I only got as far as the part where you said:

The sprites are all screwed up

...thinking to myself that this was why the elevators looked like ugly blobs now :ponder:


Full stop and rewind, please. Dead end here. Really.

Link to comment
FindLowestPtLoop
	lda ElevY,X
	cmp Temp
	bcs NotLowestPt
	sta Temp
	stx Temp+1
NotLowestPt
	dex
	bpl FindLowestPtLoop



4+3+3+2+3 = 15 cycles per iteration. The approach in my sample code

	lda pos0
	cmp pos1
	bcc not_1
	ldx #1
	lda pos1
not_1
	cmp pos2
	bcc not_2
	ldx #2
	lda pos2
not_2

takes five cycles per "iteration". More code, but a 3:1 speedup.
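
A rough cycle tally for one step of each version may help here. This is only a sketch: it assumes the positions live in zero page and that no branch crosses a page boundary, and the addresses and the Step label are made up for illustration.

```asm
	processor 6502
pos0	= $80		;hypothetical zero-page addresses
pos1	= $81
	org $F000
Step
	ldx #0
	lda pos0	;A = lowest value so far, X = its index
	cmp pos1	;3 cycles
	bcc not_1	;3 if taken (A is still the lowest), 2 if not
	ldx #1		;2
	lda pos1	;3
not_1
	rts
;--branch taken:     3+3     =  6 cycles per step
;--branch not taken: 3+2+2+3 = 10 cycles per step
```

By the same counting, the looped version costs 15 cycles when the entry is not a new low and 20 when it is (4+3+2+3+3+2+3), so the unrolled step comes out roughly two to two and a half times faster on either path.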

Link to comment

Or, without unrolling:

lda ElevY+6,X
ldx #5
stx Temp+1
cmp ElevY,X
bcc NotLowestPt
stx Temp+1
bpl FindLowestPtLoop
sta Temp

Saves at least 3 cycles/loop (or 6), plus one whole loop.

Link to comment

It possibly is already conceptually flawed.

Shouldn't it be faster to sort a bunch of indexes once, instead of picking 1 elevator out of 7 - 7 times?

Link to comment
Manuel, you are right.


Even some simple bubble sort should be way more efficient than any peephole optimizations.

I thought about that but RAM is so tight that I was reluctant to go that route.

Link to comment

You were asking for speed optimization, not RAM :ponder:

I'm not rejecting that solution out of hand, I'm just saying that I considered it but RAM constraints made me reluctant.


Plus a few other issues:

-what's the maximum time to sort 7 objects? I suppose I should dig out my old CS books and just look it up, or figure it out...

-it is really 14 objects, the top and bottom of each elevator. Or, if I want to deal with it as 7 objects, then after processing the bottom of the elevator I have to reinsert the top of the elevator into the sorted list.


Is there a slick way to resolve those issues? :lol:
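
On the worst-case question: an insertion sort of 7 entries needs at most 6+5+4+3+2+1 = 21 compare-and-shift steps. Below is a sketch of what that could look like, not a drop-in routine. It only orders the 7 bottoms (it does not address the 14-edge problem), and SortIdx, InsIdx, InsKey, OuterX and all the addresses are hypothetical names invented for the example. SortIdx is assumed to already hold the indices 0-6 in some order.

```asm
	processor 6502
ElevY	= $80		;7 bytes of Y positions (address is made up)
SortIdx	= $87		;hypothetical: 7 bytes holding elevator indices 0-6
InsIdx	= $8E		;hypothetical temp: index being inserted
InsKey	= $8F		;hypothetical temp: its Y position
OuterX	= $90		;hypothetical temp: outer loop position
	org $F000
SortElevators
	ldx #1
InsertNext
	stx OuterX
	ldy SortIdx,X	;elevator index to insert
	sty InsIdx
	lda ElevY,Y
	sta InsKey	;its Y position is the sort key
ShiftLoop
	ldy SortIdx-1,X	;elevator in the slot to the left
	lda ElevY,Y
	cmp InsKey
	bcc InsertHere	;left entry is lower: slot found
	beq InsertHere	;equal: keep existing order
	sty SortIdx,X	;else move left entry one slot right
	dex
	bne ShiftLoop
InsertHere
	lda InsIdx
	sta SortIdx,X
	ldx OuterX
	inx
	cpx #7
	bne InsertNext
	rts
```

By a rough count each shift step is about 24 cycles, so the worst case is around 500 cycles of shifting plus loop overhead. If the list is kept from frame to frame, the elevators rarely change order, so most frames would only need the 6 compares of the already-sorted case. The price is 7 more bytes of RAM, which is exactly the sticking point.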

I don't think TJ's loop will work BTW, as it never updates A.

Loop schmoop, the unrolled version is so short that I'd use that anyway. :D

Link to comment
Loop schmoop, the unrolled version is so short that I'd use that anyway. :lol:


Also frees the temp variables, as A and X already exit with the right values. It's pretty cool, I think it might even beat sorting - if my assumptions are correct, it'll save you some 500 cycles :ponder:

Link to comment

When you init all Temp+2,X with 1 instead of 0, you can replace


	lda Temp+2,X	;flags that determine top or bottom of elev
	eor #$FF
	sta Temp+2,X
	bne JustProcessedElevBottom

with

	lsr Temp+2,X	;flags that determine top or bottom of elev
	bcs JustProcessedElevBottom


It'll save 56 cycles I think :ponder:



And replace

	clc
	adc #ELEV_HEIGHT
	sta ElevY,X

with

	adc #ELEV_HEIGHT-1	;carry is already set after the bcs
	sta ElevY,X

then, for some more :lol:


(Don't forget to DEY Y back to 0 after the initing loop!)
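
The 56-cycle figure for the lsr replacement checks out if you count it through. A sketch with the cycle counts written in, assuming the flags sit in zero page (the address and the OldToggle, NewToggle and WasBottom labels are made up for the example):

```asm
	processor 6502
Temp	= $80		;hypothetical address; flags live at Temp+2..Temp+8
	org $F000
;--old: flag toggles between $00 and $FF
OldToggle
	lda Temp+2,X	;4 cycles
	eor #$FF	;2
	sta Temp+2,X	;4  = 10 cycles before the branch
	bne WasBottom	;taken on the first pass ($00 -> $FF)
	rts		;second pass ($FF -> $00): that was the top
;--new: flag starts at 1
NewToggle
	lsr Temp+2,X	;6 cycles
			;first pass:  1 -> 0, carry set   (bottom)
			;second pass: 0 -> 0, carry clear (top)
	bcs WasBottom
	rts
WasBottom
	rts
;--saving: 10-6 = 4 cycles per pass
;	14 passes (7 bottoms + 7 tops) * 4 = 56 cycles
```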

Link to comment
Loop schmoop, the unrolled version is so short that I'd use that anyway. :lol:


Also frees the temp variables, as A and X already exit with the right values. It's pretty cool, I think it might even beat sorting - if my assumptions are correct, it'll save you some 500 cycles :ponder:

I figured that the subroutine to find the lowest point took, on average, just about one scanline; so maximum (15 times through) that takes about 15 scanlines.

Link to comment



Supercat's unrolled find-the-lowest-point routine cut 16 scanlines out of that routine. Combine that with Manuel's optimizations and the whole darn routine fits in a standard vblank with plenty of time left to spare. Well, not plenty, but enough. The routine takes about 30 scanlines now.


Thanks! New version, with source, posted above.


EDIT: Looks like I spoke too soon, there is still a *little* bit of overrun occasionally [fixed by making vblank longer by ~2 lines]. But it's close enough, since I have another 5-scanline subroutine in vblank.


EDIT II: Now I just need to decide which elevators I like best. Any more opinions? I'm kind of leaning towards the striped ones, since some people seem to like them and also I'd rather have the 60+ bytes of RAM back! :ponder:


Also, here's the new, optimized routine:

	ldy #1
	ldx #6
InitializeElevYLoop
	lda ElevY,X
	sec
	sbc #ELEV_HEIGHT
	sta ElevY,X
	sty Temp+2,X	;set flags also
	dex
	bpl InitializeElevYLoop

;--initialize the bottom band with zeroes
	dey	;Y back to zero
	sty ElevRAM
	sty ElevRAM+14
	sty ElevRAM+29
	sty ElevRAM+44
	sty ElevBandHeight
	sty MiscPtr+1

SetElevRAMLoop
	ldx #0
	lda ElevY
	cmp ElevY+1
	bcc Elev2NotLowest
	ldx #1
	lda ElevY+1
Elev2NotLowest
	cmp ElevY+2
	bcc Elev3NotLowest
	ldx #2
	lda ElevY+2
Elev3NotLowest
	cmp ElevY+3
	bcc Elev4NotLowest
	ldx #3
	lda ElevY+3
Elev4NotLowest
	cmp ElevY+4
	bcc Elev5NotLowest
	ldx #4
	lda ElevY+4
Elev5NotLowest
	cmp ElevY+5
	bcc Elev6NotLowest
	ldx #5
	lda ElevY+5
Elev6NotLowest
	cmp ElevY+6
	bcc Elev7NotLowest
	ldx #6
	lda ElevY+6
Elev7NotLowest
;--now A holds lowest value
;	and X holds index into which elevator is the lowest
	stx Temp+1	;save index
	and #$FF	;set flags for value of A
	bmi AllDone
	cmp ElevBandHeight,Y
	beq SameBand
;--else: new band, so
;	bring all values from old band into new band:
	sta ElevBandHeight+1,Y
	lda ElevRAM,Y
	sta ElevRAM+1,Y
	lda ElevRAM+14,Y
	sta ElevRAM+15,Y
	lda ElevRAM+29,Y
	sta ElevRAM+30,Y
	lda ElevRAM+44,Y
	sta ElevRAM+45,Y
	iny
SameBand
	ldx Temp+1
	lda ElevPtrTableLo,X
	sta MiscPtr
	lda (MiscPtr),Y
	eor ElevFlipTable,X
	sta (MiscPtr),Y
	lsr Temp+2,X	;flags that determine top or bottom of elev
	bcs JustProcessedElevBottom
;--else that was the top
	lda ElevY,X
	ora #$80
	sta ElevY,X
	bne SetElevRAMLoop	;branch always
JustProcessedElevBottom
;--that was the bottom, now set it for the top
	lda ElevY,X
	adc #ELEV_HEIGHT-1	;carry is set following bcs
	sta ElevY,X
	jmp SetElevRAMLoop	;branch always

AllDone
;--now we have to deal with situation where the highest
;	elevator(s) is at the top line
	lda ElevBandHeight,Y
	cmp #80
	bne NoElevAtVeryTop
	sty Temp+2	;save this value for later


EDIT III: Just realized that, since X isn't used except to hold the elevator index there is no reason to save and load the index to a temp variable. So I can cut that also.

Link to comment

striped elevators = hott!


In my opinion, and such.


And I think it's about time you actually finished something, hmm...



Link to comment

If you're using the MSB to indicate that you've already done an elevator, how can you handle a screen more than 128 scan lines high? Or are you planning on using half-resolution for your vertical positioning?


BTW, if you revert to striped elevators, but keep the list-based kernel, you'd have oodles of cycles free to handle other fun things. Having two 2-color maids sharing a scan line shouldn't be a problem in that case.

Link to comment
Nah! Unrolling is for wimps! :ponder:


There are times when loop unrolling is absolutely necessary. There are other times when it's clearly mandated by practicality. I would consider this one of the latter cases. There's a 2x speedup from a straightforward unrolling, and the loop is small enough that--even unrolled--it's still a practical size.


BTW, using the SAX instruction I was able to convert my "zig" demo from in-line code to a loop. Anyone remember zig?

Link to comment

And I think it's about time you actually finished something, hmm...

All he has to do is replace the maid with Santa Claus and presto! Elevator Rescue: The 2006 AtariAge Holiday Cart. :ponder:

Link to comment
striped elevators = hott!


I spent the weekend pondering this (and playing Castlevania :D) and striped it is. It will be nice to have that RAM back...

If you're using the MSB to indicate that you've already done an elevator, how can you handle a screen more than 128 scan lines high? Or are you planning on using half-resolution for your vertical positioning?

I am, and will be, using a 2-line kernel, with half-resolution positioning. The elevators will likely never move as slow as 1 line per frame so it isn't a big loss.

BTW, if you revert to striped elevators, but keep the list-based kernel, you'd have oodles of cycles free to handle other fun things. Having two 2-color maids sharing a scan line shouldn't be a problem in that case.

I'll keep that in mind, but I think there will be plenty of time for everything I want to do anyway.

And I think it's about time you actually finished something, hmm...

All he has to do is replace the maid with Santa Claus and presto! Elevator Rescue: The 2006 AtariAge Holiday Cart. :lol:

Oh hush. :ponder:

Link to comment