vdub_bobby


LATEST 'N' NOT-SO-GREATEST version of the Elevator Repairman knockoff:

 

ElevRep20061025.bin

blog-6060-1161875416_thumb.png

 

The sprites are all screwed up because I rewrote part of the kernel and didn't have time to fix the variable setup routines. The kernel works fine; the variables just aren't being set up properly. But that's no big deal.

 

Oh yeah, I also took out a bunch of stuff (score, 48-char display, etc.) that was using RAM - I needed the RAM and I didn't want to waste time optimizing when the kernel is still in flux, so I just deleted all that stuff. :D

 

These are bigger deals: at the suggestion of supercat, I sank a ton of RAM into making the elevators solid. :)

RAM is pretty tight now, but it should be manageable.

 

However, as I mentioned in the comments to the previous post, the routine that sets up all that RAM for the elevators takes a looooooong time. It took about 65 scanlines but I optimized it (and fixed some bugs) last night so now it takes about 48 scanlines. So I could split it between the overscan and the vblank, but that's such an annoying pain that I'm hoping to come up with something faster.

 

[EDIT: I knew I could count on you guys! With the suggested optimizations the whole thing runs in about 33 scanlines now - long, but not too long. New binary and source:

ElevRep20061027.bin

ElevRep20061027source.zip

The new routine is in the comments; this post is plenty long already.]

 

Here's the routine in question:

SetElevRAMSubroutine

;--new method:
;--Y position of elevators ranges from 0 to 80
;--so:
;	upon beginning this routine, ElevY will hold values pointing to the tops
;		of the elevators
;	-so first thing to do is to subtract ELEV_HEIGHT from each Y position 
;		so that ElevY holds the	*bottom* of the elevators
;	-after processing the bottom of the elevators, flag that we processed
;		the bottom of that elevator (temp variables) and add ELEV_HEIGHT back to
;		the Y position so that it points to the top.
;	-after processing the top of the elevators, ORA ElevY with 128 (set top bit)
;	-when the lowest Y position (ElevY) is a negative number (top bit set) then we are done.


ldx #6
InitializeElevYLoop
lda ElevY,X
sec
sbc #ELEV_HEIGHT
sta ElevY,X	
lda #0
sta Temp+2,X;clear flags also
dex
bpl InitializeElevYLoop

;--initialize the bottom band with zeroes
lda #0
sta ElevRAM
sta ElevRAM+14
sta ElevRAM+29
sta ElevRAM+44
sta ElevBandHeight

ldy #0
SetElevRAMLoop
jsr FindLowestPtSubroutine
;--now Temp holds lowest value
;	and Temp+1 holds index into which elevator is the lowest
lda Temp
bmi AllDone
cmp ElevBandHeight,Y
beq SameBand
;--else: new band, so
;	bring all values from old band into new band:
sta ElevBandHeight+1,Y
lda ElevRAM,Y
sta ElevRAM+1,Y
lda ElevRAM+14,Y
sta ElevRAM+15,Y
lda ElevRAM+29,Y
sta ElevRAM+30,Y
lda ElevRAM+44,Y
sta ElevRAM+45,Y
iny
SameBand
ldx Temp+1
lda ElevPtrTableLo,X
sta MiscPtr
lda (MiscPtr),Y
eor ElevFlipTable,X
sta (MiscPtr),Y
lda Temp+2,X	;flags that determine top or bottom of elev
eor #$FF
sta Temp+2,X
bne JustProcessedElevBottom
;--else that was the top
lda ElevY,X
ora #$80
sta ElevY,X
bne SetElevRAMLoop;branch always
JustProcessedElevBottom
;--that was the bottom, now set it for the top
lda ElevY,X
clc
adc #ELEV_HEIGHT
sta ElevY,X
bne SetElevRAMLoop;branch always

AllDone
;--now we have to deal with situation where the highest
;	elevator(s) is at the top line
lda ElevBandHeight,Y
cmp #80
bne NoElevAtVeryTop
dey
NoElevAtVeryTop
sty Temp+2	;save this value for later

rts

;----------------------------------------------------------------------------

FindLowestPtSubroutine
lda #255
sta Temp
ldx #6
stx Temp+1
FindLowestPtLoop
lda ElevY,X
cmp Temp
bcs NotLowestPt
sta Temp
stx Temp+1
NotLowestPt
dex
bpl FindLowestPtLoop

;--Temp+1 holds index into lowest elevator
;	Temp holds the height of the low elevator


rts
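For readers who don't want to trace the 6502 by hand, here is a rough Python model of what the routine above appears to do. It is only a sketch under stated assumptions: ELEV_HEIGHT is assumed to be 8 (the post never gives it), a single bitmask stands in for the four ElevRAM playfield columns and ElevFlipTable, and the special case for an elevator at the very top line is left out. The function name is made up for illustration.

```python
ELEV_HEIGHT = 8  # assumed value; the post does not state it

def build_bands(elev_tops, elev_height=ELEV_HEIGHT):
    """Return (band_start_y, mask) pairs, bottom band first.

    Bit i of mask is set while elevator i is present in that band.
    """
    # Start with every Y pointing at the *bottom* of its elevator.
    y = [top - elev_height for top in elev_tops]
    bottom_done = [False] * len(y)
    done = [False] * len(y)          # stands in for setting the top bit of ElevY
    bands = [(0, 0)]                 # bottom band: starts at line 0, nothing drawn

    while not all(done):
        # Pick the lowest pending edge (the job of FindLowestPtSubroutine).
        idx = min((i for i in range(len(y)) if not done[i]), key=lambda i: y[i])
        if y[idx] != bands[-1][0]:
            # New band: it inherits everything from the band below it.
            bands.append((y[idx], bands[-1][1]))
        start, mask = bands[-1]
        bands[-1] = (start, mask ^ (1 << idx))   # toggle this elevator on or off
        if not bottom_done[idx]:
            bottom_done[idx] = True
            y[idx] += elev_height    # the next edge for this elevator is its top
        else:
            done[idx] = True
    return bands

print(build_bands([16, 24]))
```

Each elevator is toggled on at its bottom edge and off at its top edge, so the kernel only needs one mask per band rather than one per scanline.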

And here's the source if you want to see things with more context:

25 Comments



Hey, save 10 cycles with this :lol:

 

ldx #0
ldy #6
InitializeElevYLoop
lda ElevY,y
sec
sbc #ELEV_HEIGHT
sta ElevY,y	
stx Temp+2,y  ;clear flags also
dey
bpl InitializeElevYLoop

 

Are you sure about this one:

Y position of elevators ranges from 0 to 80

Wouldn't the "bottoms" get negative then? (Maybe the sec is superfluous?)

 

Continue like:

;--initialize the bottom band with zeroes
sty ElevRAM
sty ElevRAM+14
sty ElevRAM+29
sty ElevRAM+44
sty ElevBandHeight

 

For the following loop remember that y already is 0.

 

Do you need FindLowestPtSubroutine more than once? Eliminating the subroutine call would possibly gain 7*12 cycles, more than a scanline :D
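A quick sanity check of that estimate, assuming the standard 6502 costs of 6 cycles each for JSR and RTS and 76 CPU cycles per scanline on the 2600. The 15-call figure is the upper bound vdub_bobby gives later in the thread; the helper name is made up for this sketch.

```python
JSR_CYCLES = 6            # 6502 JSR absolute
RTS_CYCLES = 6            # 6502 RTS
CYCLES_PER_SCANLINE = 76  # CPU cycles per scanline on the Atari 2600

def call_overhead(calls):
    """Cycles spent purely on calling and returning, for `calls` invocations."""
    return calls * (JSR_CYCLES + RTS_CYCLES)

for calls in (7, 15):
    cycles = call_overhead(calls)
    print(calls, "calls:", cycles, "cycles =",
          round(cycles / CYCLES_PER_SCANLINE, 2), "scanlines")
```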

 

Ok, will dig deeper - after work :ponder:

Link to comment

Hey, save 10 cycles with this :ponder:

Thanks. :lol:

Are you sure about this one:
Y position of elevators ranges from 0 to 80

Wouldn't the "bottoms" get negative then? (Maybe the sec is superfluous?)

Yeah, that was just sloppy commenting on my part; the range is actually 8-80 so, yes, the sec is superfluous. :D

Do you need FindLowestPtSubroutine more than once? Eliminating the subroutine call would possibly gain 7*12 cycles, more than a scanline :D

Good idea, plus it frees two bytes on the stack. :D

 

Thanks!

Link to comment

I really hate to say this now as you have put a lot of effort into optimising, but I prefer the old stripy elevators. I think the previous ones look like old-fashioned elevator cages, while the new ones just look like blobs to me. However, I guess the new ones look more like the original game, and they are technically more of an achievement.

 

Apologies,

Chris

Link to comment

I really hate to say this now as you have put a lot of effort into optimising, but I prefer the old stripy elevators. I think the previous ones look like old-fashioned elevator cages, while the new ones just look like blobs to me. However, I guess the new ones look more like the original game, and they are technically more of an achievement.

 

Apologies,

Chris

Don't apologize! Technical achievements be damned, it's the game that matters! :lol:

 

Thanks for the comment. :ponder:

Link to comment

I was going to say something similar, but Chris beat me to it. Now if you can manage gradient colors in your elevators, that would be great, but otherwise I prefer the striped pieces to solid white rectangles.

Link to comment

I was going to say something similar, but Chris beat me to it. Now if you can manage gradient colors in your elevators, that would be great, but otherwise I prefer the striped pieces to solid white rectangles.

That would be cool, but I don't know how that would be possible, since the elevators are all drawn with the PF.

Link to comment

Oh, I guess I'm not paying enough attention. I got it to the part where you said:

The sprites are all screwed up

...thinking by myself that this was why the elevators looked like ugly blobs now :ponder:

 

Fullstop and rewind please. Dead end here. Really.

Link to comment
FindLowestPtLoop
lda ElevY,X
cmp Temp
bcs NotLowestPt
sta Temp
stx Temp+1
NotLowestPt
dex
bpl FindLowestPtLoop


 

4+3+3+2+3 = 15 cycles per iteration. The approach in my sample code

  ldx #0        ;assume element 0 is lowest until proven otherwise
  lda pos0
not_0:
  cmp pos1
  bcc not_1
  ldx #1
  lda pos1
not_1:
  cmp pos2
  bcc not_2
  ldx #2
  lda pos2
not_2:

takes five cycles per "iteration". More code, but a 3:1 speedup.
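As a sketch of the logic only (not the cycle counts), the unrolled compare chain can be modeled in Python like this. The loop here stands in for the repeated cmp/bcc/ldx/lda groups; the function name is hypothetical.

```python
def lowest_unrolled(pos):
    """Keep the running minimum in `a` and its index in `x`, like A and X."""
    x = 0
    a = pos[0]
    for i in range(1, len(pos)):   # one pass per unrolled cmp/bcc/ldx/lda group
        if a >= pos[i]:            # bcc is not taken when A >= pos[i]
            x = i
            a = pos[i]
    return a, x

print(lowest_unrolled([40, 12, 12, 80]))
```

Note that on a tie the later index wins, because bcc only skips the update when A is strictly less than the value being compared.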

Link to comment

Or, without unrolling:

FindLowestPtSubroutine
lda ElevY+6,X
ldx #5
stx Temp+1
FindLowestPtLoop
cmp ElevY,X
bcc NotLowestPt
stx Temp+1
NotLowestPt
dex
bpl FindLowestPtLoop
sta Temp

Saves at least 3 cycles/loop (or 6), plus one whole loop.

Link to comment

It possibly is already conceptually flawed.

Shouldn't it be faster to sort a bunch of indexes once, instead of picking 1 elevator out of 7 - 7 times?

Link to comment
Manuel, you are right.

 

Even some simple bubble sort should be way more efficient than any peephole optimizations.

I thought about that but RAM is so tight that I was reluctant to go that route.

Link to comment

You were asking for speed optimization, not RAM :ponder:

I'm not rejecting that solution out of hand, I'm just saying that I considered it but RAM constraints made me reluctant.

 

Plus a few other issues:

-what's the maximum time to sort 7 objects? I suppose I should dig out my old CS books and just look it up, or figure it out...

-it is really 14 objects, the top and bottom of each elevator. Or, if I want to deal with it as 7 objects, then after processing the bottom of the elevator I have to reinsert the top of the elevator into the sorted list.

 

Is there a slick way to resolve those issues? :lol:
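One way to picture both issues, as a hedged Python sketch rather than anything 2600-ready: treat the 14 edges as one list and sort it once, and use the textbook n(n-1)/2 worst case for bubble sort comparisons. ELEV_HEIGHT = 8 and both function names are assumptions made for illustration.

```python
def sorted_edges(elev_tops, elev_height=8):
    """Build (y, elevator_index, kind) for all edges and sort them once by y."""
    edges = []
    for i, top in enumerate(elev_tops):
        edges.append((top - elev_height, i, "bottom"))
        edges.append((top, i, "top"))
    edges.sort(key=lambda e: e[0])   # stable: ties keep insertion order
    return edges

def bubble_sort_worst_case_compares(n):
    """Worst-case comparison count for a plain bubble sort of n items."""
    return n * (n - 1) // 2

print(sorted_edges([24, 16]))
print(bubble_sort_worst_case_compares(7), bubble_sort_worst_case_compares(14))
```

Sorting all 14 edges avoids reinserting tops, but on real hardware the sorted list itself would need RAM, which is the constraint raised above.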

I don't think TJ's loop will work BTW, as it never updates A.

Loop schmoop, the unrolled version is so short that I'd use that anyway. :D

Link to comment
Loop schmoop, the unrolled version is so short that I'd use that anyway. :lol:

 

Also frees the temp variables, as A and X already exit with the right values. It's pretty cool, I think it might even beat sorting - if my assumptions are correct, it'll save you some 500 cycles :ponder:

Link to comment

When you init all Temp+2,X with 1 instead of 0, you can replace

 

  lda Temp+2,X	 ;flags that determine top or bottom of elev
eor #$FF
sta Temp+2,X
bne JustProcessedElevBottom

 

with

  lsr Temp+2,X	 ;flags that determine top or bottom of elev
  bcs JustProcessedElevBottom

 

It'll save 56 cycles I think :ponder:
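A small Python model of why the flag trick works and where the 56 comes from. It assumes the standard 6502 timings (lda zp,X = 4, eor #imm = 2, sta zp,X = 4, lsr zp,X = 6) and 14 passes, one per elevator edge; the helper names are made up.

```python
def lsr(value):
    """Model 6502 LSR on a byte: returns (shifted value, carry out)."""
    return (value & 0xFF) >> 1, value & 1

def classify_passes(initial_flag=1):
    """Shift the flag twice and report which edge each pass would handle."""
    flag = initial_flag
    kinds = []
    for _ in range(2):
        flag, carry = lsr(flag)
        kinds.append("bottom" if carry else "top")   # bcs JustProcessedElevBottom
    return kinds

OLD_CYCLES = 4 + 2 + 4   # lda zp,X + eor #imm + sta zp,X
NEW_CYCLES = 6           # lsr zp,X
PASSES = 14              # 7 elevators x 2 edges
SAVED = (OLD_CYCLES - NEW_CYCLES) * PASSES

print(classify_passes(), SAVED)
```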

 

Replace

clc
adc #ELEV_HEIGHT
sta ElevY,X

 

with

adc #ELEV_HEIGHT-1
sta ElevY,X

 

then, for some more :lol:

 

(Don't forget to DEY Y back to 0 after the initing loop!)
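The adc change relies on the carry already being set when the bcs branch is taken, so the carry supplies the missing 1. A minimal Python model of binary-mode ADC, with ELEV_HEIGHT assumed to be 8 purely for illustration:

```python
def adc(a, operand, carry):
    """Model binary-mode 6502 ADC: returns (8-bit result, carry out)."""
    total = a + operand + carry
    return total & 0xFF, total >> 8

ELEV_HEIGHT = 8  # assumed value; the post does not state it

with_clc = adc(20, ELEV_HEIGHT, 0)         # clc / adc #ELEV_HEIGHT
without_clc = adc(20, ELEV_HEIGHT - 1, 1)  # adc #ELEV_HEIGHT-1 with carry set
print(with_clc, without_clc)
```

Dropping the clc saves 2 cycles each time the bottom of an elevator is processed.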

Link to comment
Loop schmoop, the unrolled version is so short that I'd use that anyway. :lol:

 

Also frees the temp variables, as A and X already exit with the right values. It's pretty cool, I think it might even beat sorting - if my assumptions are correct, it'll save you some 500 cycles :ponder:

I figured that the subroutine to find the lowest point took, on average, just about one scanline; so maximum (15 times through) that takes about 15 scanlines.

Link to comment

:lol:

 

Supercat's unrolled find-the-lowest-point routine cut 16 scanlines out of that routine. Combined with Manuel's optimizations, the whole darn routine fits in a standard vblank with plenty of time left to spare. Well, not plenty, but enough. The routine takes about 30 scanlines now.

 

Thanks! New version, with source, posted above.

 

EDIT: Looks like I spoke too soon, there is still a *little* bit of overrun occasionally [fixed by making vblank longer by ~2 lines]. But it's close enough, since I have another 5-scanline subroutine in vblank.

 

EDIT II: Now I just need to decide which elevators I like best. Any more opinions? I'm kind of leaning towards the striped ones, since some people seem to like them and also I'd rather have the 60+ bytes of RAM back! :ponder:

 

Also, here's the new, optimized routine:

	ldy #1
ldx #6
InitializeElevYLoop
lda ElevY,X
sec
sbc #ELEV_HEIGHT
sta ElevY,X	
sty Temp+2,X;set flags also
dex
bpl InitializeElevYLoop

;--initialize the bottom band with zeroes
dey	;Y back to zero
sty ElevRAM
sty ElevRAM+14
sty ElevRAM+29
sty ElevRAM+44
sty ElevBandHeight
sty MiscPtr+1

SetElevRAMLoop
	ldx #0
lda ElevY
cmp ElevY+1
bcc Elev2NotLowest
ldx #1
lda ElevY+1
Elev2NotLowest
cmp ElevY+2
bcc Elev3NotLowest
ldx #2
lda ElevY+2
Elev3NotLowest
cmp ElevY+3
bcc Elev4NotLowest
ldx #3
lda ElevY+3
Elev4NotLowest
cmp ElevY+4
bcc Elev5NotLowest
ldx #4
lda ElevY+4
Elev5NotLowest
cmp ElevY+5
bcc Elev6NotLowest
ldx #5
lda ElevY+5
Elev6NotLowest
cmp ElevY+6
bcc Elev7NotLowest
ldx #6
lda ElevY+6
Elev7NotLowest
;--now A holds lowest value
;	and X holds index into which elevator is the lowest
stx Temp+1;save index
and #$FF;set flags for value of A
bmi AllDone
cmp ElevBandHeight,Y
beq SameBand
;--else: new band, so
;	bring all values from old band into new band:
sta ElevBandHeight+1,Y
lda ElevRAM,Y
sta ElevRAM+1,Y
lda ElevRAM+14,Y
sta ElevRAM+15,Y
lda ElevRAM+29,Y
sta ElevRAM+30,Y
lda ElevRAM+44,Y
sta ElevRAM+45,Y
iny
SameBand
ldx Temp+1
lda ElevPtrTableLo,X
sta MiscPtr
lda (MiscPtr),Y
eor ElevFlipTable,X
sta (MiscPtr),Y
lsr Temp+2,X	;flags that determine top or bottom of elev
bcs JustProcessedElevBottom
;--else that was the top
lda ElevY,X
ora #$80
sta ElevY,X
bne SetElevRAMLoop;branch always
JustProcessedElevBottom
;--that was the bottom, now set it for the top
lda ElevY,X
adc #ELEV_HEIGHT-1;carry is set following bcs
sta ElevY,X
jmp SetElevRAMLoop;branch always

AllDone
;--now we have to deal with situation where the highest
;	elevator(s) is at the top line
lda ElevBandHeight,Y
cmp #80
bne NoElevAtVeryTop
dey
NoElevAtVeryTop
sty Temp+2	;save this value for later

rts

EDIT III: Just realized that, since X isn't used except to hold the elevator index there is no reason to save and load the index to a temp variable. So I can cut that also.

Link to comment

striped elevators = hott!

 

In my opinion, and such.

 

And I think it's about time you actually finished something, hmm...

 

:ponder:

Link to comment

If you're using the MSB to indicate that you've already done an elevator, how can you handle a screen more than 128 scan lines high? Or are you planning on using half-resolution for your vertical positioning?

 

BTW, if you revert to striped elevators, but keep the list-based kernel, you'd have oodles of cycles free to handle other fun things. Having two 2-color maids sharing a scan line shouldn't be a problem in that case.

Link to comment
Nah! Unrolling is for wimps! :ponder:

 

There are times when loop unrolling is absolutely necessary. There are other times when it's clearly mandated by practicality. I would consider this one of the latter cases. There's a 2x speedup from a straightforward unrolling, and the loop is small enough that--even unrolled--it's still a practical size.

 

BTW, using the SAX instruction I was able to convert my "zig" demo from in-line code to a loop. Anyone remember zig?

Link to comment

And I think it's about time you actually finished something, hmm...

All he has to do is replace the maid with Santa Claus and presto! Elevator Rescue: The 2006 AtariAge Holiday Cart. :ponder:

Link to comment
striped elevators = hott!

 

I spent the weekend pondering this (and playing Castlevania :D) and striped it is. It will be nice to have that RAM back...

If you're using the MSB to indicate that you've already done an elevator, how can you handle a screen more than 128 scan lines high? Or are you planning on using half-resolution for your vertical positioning?

I am, and will be, using a 2-line kernel, with half-resolution positioning. The elevators will likely never move as slow as 1 line per frame so it isn't a big loss.
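To illustrate why half-resolution positioning sidesteps the 128 limit, here is a small Python sketch. The 192 visible scanlines figure is an assumption (a typical NTSC value, not something stated in this thread), and the helper names are hypothetical.

```python
DONE_BIT = 0x80            # the bit that ora #$80 sets

def mark_done(y):
    """Flag an elevator as fully processed by setting the top bit."""
    return y | DONE_BIT

def is_done(y):
    """True when the top bit is set, which is what bmi tests."""
    return y >= 0x80

VISIBLE_SCANLINES = 192    # assumed typical NTSC picture height
LINES_PER_KERNEL_ROW = 2   # 2-line kernel
positions = VISIBLE_SCANLINES // LINES_PER_KERNEL_ROW

print(positions, positions < DONE_BIT)
# A finished elevator always compares higher than any pending one:
print(min(mark_done(10), 40))
```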

BTW, if you revert to striped elevators, but keep the list-based kernel, you'd have oodles of cycles free to handle other fun things. Having two 2-color maids sharing a scan line shouldn't be a problem in that case.

I'll keep that in mind, but I think there will be plenty of time for everything I want to do anyway.

And I think it's about time you actually finished something, hmm...

All he has to do is replace the maid with Santa Claus and presto! Elevator Rescue: The 2006 AtariAge Holiday Cart. :lol:

Oh hush. :ponder:

Link to comment