Coding tricks


57 replies to this topic

#1 ScumSoft OFFLINE  

ScumSoft

    Moonsweeper

  • 332 posts
  • Location:Polysorbate 60

Posted Wed Mar 2, 2011 5:47 AM

Hello again everyone, I've been busy since last week programming up my games framework (see blog for details).
It's great to be back here and to start coding again.

I have come up with a neat interlaced routine (not sure if something similar has been posted).
It needs to flip a single bit on and off every frame, which sets the interlace routine up for even or odd frames.

During the kernal I manage it all like this:
			ldy Scanlines		   ;[]+3	 ;Our Scanline count for our interlaced loops
											  ;-Loaded from VBLANK to save precious playfield cycles

PF_LOOP:	tya					 ;[3]+2	;Interlace frame check
			eor #1				  ;[5]+2	;Toggle bit, this switches between Draw & Logic scanlines
			and #$01				;[7]+2	;Mask all but bit 0
			beq PF_LOGIC			;[9]+2/3  ;Branch to Logic scanline

PF_DRAW:	;********************************
			; [DRAW SCANLINE]
			; * 96 Scanlines of visible graphics
			; * Scanline [??], S.cyc [11]
			; * PixelPos [-35], Color clock [33]
			;********************************
			
			;***DO STUFF like draw graphics

			jmp PF_Return		   ;[]+3	;Jump out of Draw

PF_LOGIC:   ;********************************
			; [LOGIC]
			; * 96 scanlines for logic
			; * Scanline [39 to ??], S.cyc [12]
			; * PixelPos [-32], Color clock [36]
			;********************************
			;***DO STUFF LIKE
			;***Blank out graphics so they are not visible on this scanline
			;***Process updates etc.
			;-Fall through to PF_Return

PF_Return:  dey					 ;Decrement scanline
			sta WSYNC			   ;-
			bne PF_LOOP			 ;If more scanlines left, loop 
			;All done? Fall through to overscan

And in the Overscan I do this:
	 
			;***Interlaced display settings***
			lda Interlace		   ;This will interlace the display
			eor #1				  ;-toggle bit between 0 and 1
			sta Interlace		   ;-store value for next pass
			bne .ODD				;-if it's a 1 then setup for ODD frames
				lda #191			  ;Set to 192 scanlines total (191 to 0)
				sta Scanlines		 ;-used during PF_LOOP
				lda #33			   ;Compensate 1 more scanline for interlace
				sta TIM64T			;Will total 262 scanlines
				jmp .OS_LOGIC		 ;Proceed to Overscan logic
			
.ODD:	   lda #192				;We want 193 scanlines (192 to 0)
			sta Scanlines		   ;-store value for next pass
			lda #34				 ;Compensate 1 less scanline for interlace
			sta TIM64T			  ;Will total 262 scanlines
			;-proceed to Overscan logic

Is this a typical way of doing an interlaced kernal? Each frame alternates the starting order, Logic->Draw on one frame and Draw->Logic on the next. I find this gives minimal flicker and allows for an entire frame's worth of time for logic and drawing collectively.

What I would like to know is if someone invented a better way to do interlaced kernals. This seems to work fine right now for my game, but I am very curious as to what others have done before me.
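
A possible shortcut for the frame check at the top of PF_LOOP, sketched under the assumption that Y still holds the scanline counter and that nothing after the branch relies on the old accumulator or carry. Shifting bit 0 into the carry gives the same branch direction as the EOR/AND/BEQ sequence (PF_LOGIC is taken when the counter is odd):

```asm
PF_LOOP:	tya					;+2 copy scanline counter
			lsr					;+2 bit 0 -> carry
			bcs PF_LOGIC		;+2/3 odd counter = Logic scanline, even = Draw
```

This is two cycles shorter than the original test, so the S.cyc and color clock notes in the PF_DRAW and PF_LOGIC headers would shift accordingly.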

Okay sleep time, cya all tomorrow.

Edited by ScumSoft, Wed Mar 2, 2011 5:50 AM.


#2 ScumSoft OFFLINE  

ScumSoft

    Moonsweeper

  • 332 posts
  • Location:Polysorbate 60

Posted Thu Mar 3, 2011 5:22 PM

Alright I give. How does one implement a Jump table in 6507 code? I'd like to hold an offset like such to a table of Jumps:

Offset = 0 through 3

JumpTable:
jmp doThis
jmp doThat
jmp doSomething
jmp DONTDOTHAT

And I'd like to call it via jmp JumpTable,Offset to land on the right bounce.

#3 SpiceWare ONLINE  

SpiceWare

    Quadrunner

  • 6,952 posts
  • Medieval Mayhem
  • Location:Planet Houston

Posted Thu Mar 3, 2011 5:50 PM

This is how I did it in Stay Frosty to run the appropriate level specific routines (move fireballs, elevators, etc).  I've been using this method since the 80s when I was coding on the Vic 20, C= 64 and C= 128.

;*****************************
;*		S T A R T		  *
;* Level Specific Processing *
;*****************************
		lda CurrentLevel
		and #LEVEL_MASK ; 32 levels
		asl
		tax
		lda LPjumpTable+1,x
		pha
		lda LPjumpTable,x
		pha
		rts

LPjumpTable:
		.word Level1Processing-1 ;
		.word Level2Processing-1 ;
		.word Level3Processing-1 ;
		.word Level4Processing-1 ;
		.word Level5Processing-1 ;
		.word Level6Processing-1 ;
		.word Level7Processing-1 ;
		.word Level8Processing-1 ;
		.word Level9Processing-1 ;
		.word Level10Processing-1 ;
		.word Level11Processing-1 ;
		.word Level12Processing-1 ;
		.word Level13Processing-1 ;
		.word Level14Processing-1 ;
		.word Level15Processing-1 ;
		.word Level16Processing-1 ;
		.word Level17Processing-1 ;
		.word Level18Processing-1 ;
		.word Level19Processing-1 ;
		.word Level20Processing-1 ;
		.word Level21Processing-1 ;
		.word Level22Processing-1 ;
		.word Level23Processing-1 ;
		.word Level24Processing-1 ;
		.word Level25Processing-1 ;
		.word Level26Processing-1 ;
		.word Level27Processing-1 ;
		.word Level28Processing-1 ;
		.word Level29Processing-1 ;
		.word Level30Processing-1 ;
		.word Level31Processing-1 ;
		.word Level32Processing-1 ;   
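
Applied to the four-entry table from post #2, the same technique might look like the sketch below. It reuses the names from that post (Offset, JumpTable, doThis and so on) and assumes Offset holds 0 through 3. The table stores each address minus one because RTS pulls the address from the stack, adds one, and continues there; the high byte is pushed first because RTS pulls the low byte first.

```asm
		lda Offset			; 0 through 3
		asl					; table entries are 2 bytes each
		tax
		lda JumpTable+1,x	; push the high byte first...
		pha
		lda JumpTable,x		; ...then the low byte
		pha
		rts					; pulls the address, adds 1, continues there

JumpTable:
		.word doThis-1
		.word doThat-1
		.word doSomething-1
		.word DONTDOTHAT-1
```

Each target is reached as if by JMP. If the dispatch code was itself called with JSR, an RTS at the end of the target returns to that caller.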


#4 ScumSoft OFFLINE  

ScumSoft

    Moonsweeper

  • 332 posts
  • Location:Polysorbate 60

Posted Thu Mar 3, 2011 6:03 PM

Oh thank you so much SpiceWare, it's working great!
I had the addressing wrong, I was placing commands in the table and not offsets to the routines.

Once I finish my game, I'll have a huge list of problems I've encountered; I should compile them together with their solutions for others to learn from.
Beginner tutorials are great for getting started, but actual game design problems and their solutions would be a valuable resource, don't you think?

Now I can finally proceed. Thanks again.

#5 ScumSoft OFFLINE  

ScumSoft

    Moonsweeper

  • 332 posts
  • Location:Polysorbate 60

Posted Fri Mar 4, 2011 11:56 PM

Well never mind the first post here, I found a much better way to interlace the frames. Much of that code was not really needed at all :D
Hurray for optimizations!

#6 ScumSoft OFFLINE  

ScumSoft

    Moonsweeper

  • 332 posts
  • Location:Polysorbate 60

Posted Mon Mar 21, 2011 7:22 PM

		; Unused/undefined registers ($285-$294)

			ds 1	; $286
			ds 1	; $287
			ds 1	; $288
TEMP0		ds 1	; $289  Writeable and readable
			ds 1	; $28A
TEMP1		ds 1	; $28B  Writeable and readable
			ds 1	; $28C
			ds 1	; $28D
			ds 1	; $28E
			ds 1	; $28F
			ds 1	; $290
			ds 1	; $291  Mirror of TEMP0 data
			ds 1	; $292
			ds 1	; $293  Mirror of TEMP1 data

I modified VCS.h like this.

I noticed there were a few undefined bytes of memory space, but I wasn't sure where this space is located. So I assigned a name to each and tried storing and loading data from each of them, and this is what I got.

Stella can read and write to the above addresses, but I wasn't sure if a real 2600 could, so I loaded GFX data into TEMP0 and TEMP1 each scanline, slapped it onto my Harmony cart, and behold, they work on a real 2600.

Where in the 2600 are these bytes located? VCS.h defines them in the RIOT chip, but what parts of this chip are actually unused? The other bytes aren't writable, but those 2 are for some reason.

Found a RIOT.txt that explains them as this:

$0286 = (RIOT $06) - Write edge detect control - negative edge, enable int (1)
$0287 = (RIOT $07) - Write edge detect control - positive edge, enable int (1)
$0288 = (RIOT $08) - Write DRA
$0289 = (RIOT $09) - Write DDRA
$028A = (RIOT $0A) - Write DRB
$028B = (RIOT $0B) - Write DDRB
$028C = (RIOT $0C) - Write edge detect control - negative edge, disable int (1)
$028D = (RIOT $0D) - Write edge detect control - positive edge, disable int (1)
$028E = (RIOT $0E) - Write edge detect control - negative edge, enable int (1)
$028F = (RIOT $0F) - Write edge detect control - positive edge, enable int (1)
$0290 = (RIOT $10) - Write DRA
$0291 = (RIOT $11) - Write DDRA
$0292 = (RIOT $12) - Write DRB
$0293 = (RIOT $13) - Write DDRB

I'm not sure what DRA/DDRA/DRB/DDRB pertain to.

Edited by ScumSoft, Mon Mar 21, 2011 7:48 PM.


#7 Nukey Shay OFFLINE  

Nukey Shay

    Sheik Yerbouti

  • 20,782 posts
  • Location:The land of Gorch

Posted Mon Mar 21, 2011 8:12 PM

$0289 and $028B are mirror addresses of SWACNT and SWBCNT (used to define "data direction" of the bits of SWCHA and SWCHB).  In short, you can redefine which bits of the 2 registers you want to use as ram instead of their original configuration...namely, reading the controller ports and console switches.

#8 ScumSoft OFFLINE  

ScumSoft

    Moonsweeper

  • 332 posts
  • Location:Polysorbate 60

Posted Mon Mar 21, 2011 8:24 PM

Thanks nukey! That makes much more sense.
I was looking through the header file and wondered why those bytes were undefined and decided to just check em out.

Okay my side tracked mission is done. Back to coding I go.

#9 ScumSoft OFFLINE  

ScumSoft

    Moonsweeper

  • 332 posts
  • Location:Polysorbate 60

Posted Mon Mar 21, 2011 9:09 PM

I've implemented a software 96x192 pixel display output.
However I am having some issues coming up with a fast method to do mid-byte positioning.

I have a 12-byte display output buffer, and brute forcing a sprite's bits into position would be done like this:
GeneratePixelOffset:
	lax PlayerX			 ;[]+3 Load Players X position 0-95
	ldy XPosTable,X		 ;[]+4 Get amount to shift
	sty ShiftAMT			;[]+3 Store for later
	lsr					 ;[]+2 Divide by 8
	lsr					 ;[]+2
	lsr					 ;[]+2
	tax					 ;[]+2 Use as offset
	lda GFXtable,X		  ;[]+4 Get GFXbuffer slot number
	sta P0GFXslot		   ;[]+3 Save for later
	rts					 ;[]+6
GFXtable:
  .byte $00,$06,$01,$07,$02,$08,$03,$09,$04,$0A,$05,$0B ;GFXbuffer 0-11
			  
XPosTable:
  .byte $00,$01,$02,$03,$04,$05,$06,$07 ;Pixels 00-07
  .byte $00,$01,$02,$03,$04,$05,$06,$07 ;Pixels 08-15
  .byte $00,$01,$02,$03,$04,$05,$06,$07 ;Pixels 16-23
  .byte $00,$01,$02,$03,$04,$05,$06,$07 ;Pixels 24-31
  .byte $00,$01,$02,$03,$04,$05,$06,$07 ;Pixels 32-39
  .byte $00,$01,$02,$03,$04,$05,$06,$07 ;Pixels 40-47
  .byte $00,$01,$02,$03,$04,$05,$06,$07 ;Pixels 48-55
  .byte $00,$01,$02,$03,$04,$05,$06,$07 ;Pixels 56-63
  .byte $00,$01,$02,$03,$04,$05,$06,$07 ;Pixels 64-71
  .byte $00,$01,$02,$03,$04,$05,$06,$07 ;Pixels 72-79
  .byte $00,$01,$02,$03,$04,$05,$06,$07 ;Pixels 80-87
  .byte $00,$01,$02,$03,$04,$05,$06,$07 ;Pixels 88-95

[...]

***THEN IN THE SCANLINE KERNAL***
;Pretend I want to draw the player right now
ldy GFXindex	   ;Which sprite to draw?
lda (GFXplayer),Y  ;fetch GFX data to be drawn

ldx P0GFXslot	  ;calculated outside kernal, 0 to 11

;Position the sprite roughly into the appropriate GFXbuffer byte
sta GFXbuffer,X

;Now rotate into position determined by ShiftAMT
lda ShiftAMT
beq .noShift
lsr
beq Shift1
lsr
beq Shift2

[etc...]

This would take too many cycles just checking to see how many bits to shift.
Then if, say, I am shifting over 3 bits:
shift3:
  lsr GFXbuffer,X
  ror GFXbuffer+1,X
  lsr GFXbuffer,X
  ror GFXbuffer+1,X
  lsr GFXbuffer,X
  ror GFXbuffer+1,X

Way too many cycles over budget.
So I am currently looking into doing some smart masking and bit flipping to avoid this much overhead.

Would a simpler method already be known that I could learn from?

Attached Files


Edited by ScumSoft, Mon Mar 21, 2011 9:14 PM.


#10 RevEng OFFLINE  

RevEng

    River Patroller

  • 2,540 posts
  • bit shoveler
  • Location:Canada

Posted Tue Mar 22, 2011 6:28 AM

If you replace your "ldy XPosTable,X" with "and #7" you'll get the same results without the lookup table.

The fastest method to shift your sprites is to store copies of all of your sprites pre-shifted, trading off rom for cpu time. Then use EOR to place the software sprites into your line ram instead of STA.
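
A minimal sketch of the "and #7" change, assuming the rest of GeneratePixelOffset stays as posted. The AND leaves its result in A rather than Y, so the full position is restored from X afterwards (LAX is the same undocumented opcode the original routine already uses):

```asm
GeneratePixelOffset:
	lax PlayerX			;[]+3 A and X = Player's X position 0-95
	and #7				;[]+2 low 3 bits = amount to shift
	sta ShiftAMT		;[]+3 Store for later
	txa					;[]+2 Restore the full position
	lsr					;[]+2 Divide by 8
	lsr					;[]+2
	lsr					;[]+2
	tax					;[]+2 Use as offset
	lda GFXtable,X		;[]+4 Get GFXbuffer slot number
	sta P0GFXslot		;[]+3 Save for later
	rts					;[]+6
```

By these counts the routine takes the same 31 cycles as the table version, and the 96-byte XPosTable is no longer needed.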

#11 ScumSoft OFFLINE  

ScumSoft

    Moonsweeper

  • 332 posts
  • Location:Polysorbate 60

Posted Tue Mar 22, 2011 2:00 PM

Brilliant! I really appreciate the help.

[edit] Well eor only works on the acc, so to place the result into ram I have to use a sta regardless right?

Edited by ScumSoft, Tue Mar 22, 2011 2:43 PM.


#12 RevEng OFFLINE  

RevEng

    River Patroller

  • 2,540 posts
  • bit shoveler
  • Location:Canada

Posted Tue Mar 22, 2011 3:26 PM

ScumSoft, on Tue Mar 22, 2011 2:00 PM, said:

Brilliant! I really appreciate the help.

[edit] Well eor only works on the acc, so to place the result into ram I have to use a sta regardless right?
No problem.

You'd want to load the value from your sprite table into the accumulator, and then eor it into your ram line-buffer...

 lda (GFXplayer),y
 eor GFXbuffer,x
 sta GFXbuffer,x

...not sure if it was clear in my last post, but using EOR instead of STA has the benefit that if 2 sprites fall into the same GFXbuffer byte, you won't disturb the second one when placing the first.

You could also use ORA instead of EOR, the difference being the effect when software sprites overlap each other.
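
A small worked example of that difference, with made-up bit patterns: the buffer byte is assumed to already hold %00001111 from an earlier sprite, and the incoming sprite byte overlaps it by two pixels.

```asm
	; Merge with EOR (GFXbuffer,x holds %00001111 on entry)
	lda #%00111100		; incoming sprite byte
	eor GFXbuffer,x		; A = %00110011, the two overlapping pixels cancel out
	sta GFXbuffer,x

	; Alternative: merge with ORA (again starting from %00001111 in the buffer)
	lda #%00111100		; incoming sprite byte
	ora GFXbuffer,x		; A = %00111111, the overlapping pixels stay lit
	sta GFXbuffer,x
```

Either way the line buffer has to start out cleared, and with EOR, writing the same sprite byte a second time removes it again.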

#13 ScumSoft OFFLINE  

ScumSoft

    Moonsweeper

  • 332 posts
  • Location:Polysorbate 60

Posted Tue Mar 22, 2011 5:46 PM

Right, that was already completely understood. Having eor $ram modify the ram contents directly would be a nice addressing mode, though it would take the same number of cycles to perform. That is what macros are for, right? :)

okay then.
Lets calculate real quick the ram requirements of storing 7x the amount of data for one 8x8 sprite = 448 bytes.

However being clever I could store half the data by sharing bytes, so half that is 224 bytes per sprite.
My game has 3 different 8x24 player sprites, so with byte sharing that would be 2016 bytes for just the player graphics.
Combine that with the object and monster sprites and I am well over 16k just for graphic data.

I think I'll need to simplify them a bit and cut back on the detail to make it work this way. I had hoped there existed some mathematical trick for dividing a sprite by 4, 6 and 8 without all the shifts.

I'm looking into some other methods of mid-byte positioning and we'll see which one would be most ideal then.

Thanks for the continued support.

#14 RevEng OFFLINE  

RevEng

    River Patroller

  • 2,540 posts
  • bit shoveler
  • Location:Canada

Posted Wed Mar 23, 2011 6:29 AM

Glad to add where I can.

Quote

okay then.
Lets calculate real quick the ram requirements of storing 7x the amount of data for one 8x8 sprite = 448 bytes.
You mean rom requirement, not ram, right?

Storing a pre-shifted sprite should take (#_bytes_width+1)*height*7 bytes. For an 8x8 sprite that would be 2*8*7=112 bytes. Still a lot of rom, though.
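
To make the formula concrete, here is a sketch of what a single row of an 8-pixel-wide sprite looks like when stored pre-shifted. Each of the 7 shift amounts needs 2 bytes because the pixels spill into the neighbouring byte. The label and the solid %11111111 row are invented for illustration.

```asm
PreShiftedRow:					; one sprite row, %11111111, shifted right 1-7 pixels
	.byte %01111111,%10000000	; shift 1
	.byte %00111111,%11000000	; shift 2
	.byte %00011111,%11100000	; shift 3
	.byte %00001111,%11110000	; shift 4
	.byte %00000111,%11111000	; shift 5
	.byte %00000011,%11111100	; shift 6
	.byte %00000001,%11111110	; shift 7
	; 7 shifts * 2 bytes = 14 bytes per row, 8 rows = 112 bytes
```

By the same formula an 8x24 sprite would take 2*24*7 = 336 bytes.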

#15 ScumSoft OFFLINE  

ScumSoft

    Moonsweeper

  • 332 posts
  • Location:Polysorbate 60

Posted Wed Mar 23, 2011 3:23 PM

Oh yes I did, simple typo on my part.

I've realized that there isn't a need to do any of this at all. I simply don't have to do the shifting during the draw phase; I can preshift the data during Vsync and Overscan time, then store the result in an 8-byte buffer for each object to be displayed on screen.
My game is using less than 1/3 of the 128 bytes of ram, so this isn't a problem, as there are only ever 4 objects on the screen at most.

I'll look into using the DPC+ for future improvements to this kernal which provides much more ram space to work with ;)

#16 ScumSoft OFFLINE  

ScumSoft

    Moonsweeper

  • 332 posts
  • Location:Polysorbate 60

Posted Tue Apr 5, 2011 6:08 PM

Is it possible to force a VSYNC instead of performing the Overscan? It seems to work on Stella, but on the real console the TV seems to always perform the Overscan.

I would very much like to tell the TV to skip the Overscan and reset the frame to perform a fast redraw on the screen that would minimize/eliminate flicker on a 4 frame kernal.

It would be nice if possible, as the faster turnaround would shorten the time between phosphor hits, giving brighter colors and minimizing or eliminating the flicker seen. After 4 frames have been drawn, we go to the Overscan and process the needed logic. Lather, rinse, repeat.

Edited by ScumSoft, Tue Apr 5, 2011 6:09 PM.


#17 eshu OFFLINE  

eshu

    Chopper Commander

  • 187 posts

Posted Wed Apr 6, 2011 1:20 AM

In short - no...

In long - on the VCS you need to build up all the parts of the TV signal on the fly. You could slightly reduce the overscan period to produce a non-standard frame rate (above 60 Hz) that may work on some TVs, but the more you reduce it, the fewer TVs it will work on. You're best off working as close to the standard as possible (60 Hz for NTSC, 50 Hz for PAL). What you most definitely cannot do is remove the overscan on some frames and not others, as then you won't even have a static frame size and frame rate - I'd be surprised if any TV would display that.
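
As a rough worked example of the "static frame size" point, the usual NTSC frame budget adds up as sketched below. The equate names are made up for illustration; the line counts are the commonly used NTSC values, and the 262 total matches the figure in the first post.

```asm
VSYNC_LINES		= 3
VBLANK_LINES	= 37
KERNEL_LINES	= 192		; visible picture
OVERSCAN_LINES	= 30
FRAME_LINES		= VSYNC_LINES + VBLANK_LINES + KERNEL_LINES + OVERSCAN_LINES	; = 262
CYCLES_PER_LINE	= 76		; CPU cycles per scanline
; 262 * 76 = 19912 CPU cycles per frame. At the NTSC CPU clock of about
; 1.19 MHz that comes to roughly 59.9 frames per second.
```

Shortening the overscan lowers FRAME_LINES and so raises the frame rate, which is the non-standard signal described above.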

#18 ScumSoft OFFLINE  

ScumSoft

    Moonsweeper

  • 332 posts
  • Location:Polysorbate 60

Posted Wed Apr 6, 2011 2:22 AM

Yeah I figured as much, the 2600 isn't controlling the TV so much as it is walking along with it. I became curious if sending a VSYNC signal would tell the TV to stop what it was doing and return the beam to the VBLANK period prematurely. If this was possible you could refresh the screen a second time much quicker thus hitting the phosphors again in a shorter period. Even if the refresh was off you wouldn't have any roll since you'd be controlling the beam the entire time.

Now then, you can draw however many scanlines you wish, so long as the TV is still sent the VBLANK signal once you're done. Case in point: none of the following demos has an Overscan period. As soon as the desired number of scanlines is drawn, we can hop back to the VBLANK and start drawing again. To avoid the roll, though, the required number of scanlines has to be drawn: in the 170-scanline demo the picture still rolls even though the scanline count is consistent. (I believe this can be avoided with careful timing.)

I was hoping we could force a non-standard refresh rate on the TV, as this would open up some really nice tricks.

I've been plugging along with my game and desire to use a 4-frame flickerblind kernal without any flicker. If I could skip the overscan and refresh the screen once every other frame, this would eliminate the flicker. It works well in Stella, so I might make a version that uses this trick just for emulation play and use my other kernal for the real units.

Well, it was worth a shot.

[edit] Whoops forgot to add the 170scanline.bin

Attached Files


Edited by ScumSoft, Wed Apr 6, 2011 2:31 AM.


#19 ScumSoft OFFLINE  

ScumSoft

    Moonsweeper

  • 332 posts
  • Location:Polysorbate 60

Posted Thu Apr 7, 2011 5:07 PM

I need to optimize this routine; even though I wrote it myself, I can't seem to find a way to make it faster.
Or substitute a different method in its place, but NOT preshifting every sprite and sacrificing rom space for cycles. It's a much better learning experience to think of a way to do this faster.

;********************************
;	 [ROTATE ROUTINES]
;********************************
;I test RotateAMT before entering the routine, skipping when not needed.

RotateBytes:
	lda #15				 ;[]+2 Work on 16 bytes of data 0-15
	sta Counter			 ;[]+3 Set counter 
	lda RotateAMT		   ;[]+3 Fetch rotate amount
	cmp #5				  ;[]+2 Faster to do ROL?
	bpl .doROL			  ;[]+2/3 Branch if yes
	
	;Rotate values 1 through 4
.RORa  
	ldx RotateAMT		   ;[]+3 Load amount to rotate
	ldy Counter			 ;[]+2 Use counter as byte index, Y is trashed so reload counter	
	lda SPRITE,Y			;[]+4 Load Sprite byte to be rotated
.RORloop
	;8-bit rotate
	tay					 ;[]+2 Save byte being worked on
	ror					 ;[]+2 Rotate data into carry
	tya					 ;[]+2 Restore byte being worked on
	ror					 ;[]+2 Shift carry into byte
	dex					 ;[]+2 Countdown rotate amount, worst case cycles for single byte = 52, best = 12
	bne .RORloop			;[]+2/3 Loop if more to rotate
	ldy Counter			 ;[]+3 General counter
	sta SPRITE,Y			;[]+4 Store Result into 16-byte SPRITE buffer
	lda #$FF				;[]+2 Compare value
	dcp Counter			 ;[]+5 Decrement counter, compare with #$FF
	bne .RORa			   ;[]+2/3 Branch if more bytes to rotate
	rts	

.doROL ;Rotate values 5 through 7
	;A holds RotateAMT right now
	eor #6				  ;[]+2 Set value to 1,0, or 3
	bne .ROLa			   ;[]+2/3 Does rotate value = 0
	lda #2				  ;[]+2 Yes then load corrected value
.ROLa
	sta RotateAMT		   ;[]+3 Save new value
.ROLb  
	ldx RotateAMT		   ;[]+3 Load amount to rotate
	ldy Counter			 ;[]+2 Use counter as byte index, Y is trashed so reload counter	 
	lda SPRITE,Y			;[]+4 Load Sprite byte to be rotated
.ROLloop
	;8-bit rotate
	tay					 ;[]+2 Save byte being worked on
	rol					 ;[]+2 Rotate data into carry
	tya					 ;[]+2 Restore byte being worked on
	rol					 ;[]+2 Shift carry into byte
	dex					 ;[]+2 Countdown rotate amount, worst case cycles for single byte = 52, best = 12
	bne .ROLloop			;[]+2/3 Loop if more to rotate
	ldy Counter			 ;[]+3 General counter
	sta SPRITE,Y			;[]+4 Store Result into 16-byte SPRITE buffer
	lda #$FF				;[]+2 Compare value
	dcp Counter			 ;[]+5 Decrement counter, compare with #$FF
	bne .ROLb			   ;[]+2/3 Branch if more bytes to rotate
	rts   

Edited by ScumSoft, Thu Apr 7, 2011 5:21 PM.


#20 Thomas Jentzsch OFFLINE  

Thomas Jentzsch

    Thrust, Jammed, SWOOPS!, Boulder Dash

  • 17,525 posts
  • Always left from right here!
  • Location:Düsseldorf, Germany

Posted Fri Apr 8, 2011 9:48 AM

What is most important here? Reducing average time or maximum time?

Simple improvement: For the ROL loop you better do
  cmp #$80
  rol

Other ideas are:
- store every sprite twice (normal and shifted 4 bits), so that you have to shift less
- unroll the loops and use a jump table to select the correct starting point
- rol/ror with 9 bits (incl. the carry) and fix the extra bit after the last shift.
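
Dropped into the original .ROLloop, the cmp #$80 / rol suggestion might look like the sketch below. It covers the inner loop only and assumes, as in the original, that A holds the sprite byte and X the rotate count.

```asm
.ROLloop
	cmp #$80			;[]+2 carry = bit 7 of A, A itself is unchanged
	rol					;[]+2 old bit 7 re-enters at bit 0: a true 8-bit rotate left
	dex					;[]+2 Countdown rotate amount
	bne .ROLloop		;[]+2/3 Loop if more to rotate
```

That is 9 cycles per pass instead of 13, and since Y is no longer used as scratch, the reload of Y from Counter after the loop should not be needed either.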

#21 Thomas Jentzsch OFFLINE  

Thomas Jentzsch

    Thrust, Jammed, SWOOPS!, Boulder Dash

  • 17,525 posts
  • Always left from right here!
  • Location:Düsseldorf, Germany

Posted Fri Apr 8, 2011 10:19 AM

Here is some untested(!) code:

Your code:
inner loop: 13 cycles; average (2.5 times): 31.5
outer loop: 26 cycles + 31.5; total (16 times): 919 cycles

	lda #>Ror1			  ;[]+2 high byte of the jump vector, set once
	sta .vec+1			  ;[]+3
	...
	ldy RotateAMT		   ;[]+3 Load amount to rotate
	lda RorJmpTbl,y		 ;[]+4
	sta .vec				;[]+3
	ldx #15				  ;[]+2 Use counter as byte index
.RORa
	lda SPRITE,x			;[]+4 Load Sprite byte to be rotated
	and RorAndTbl,y		 ;[]+4
	ror					 ;[]+2
	eor SPRITE,x			;[]+4
	and RorAndTbl,y		 ;[]+4
	eor SPRITE,x			;[]+4 = 18 extra cycles to preserve X and make ror faster
	jmp (.vec)			  ;[]+5
Ror4:
	ror					 ;[]+2
Ror3:
	ror					 ;[]+2
Ror2:
	ror					 ;[]+2
Ror1:
	ror					 ;[]+2
	sta SPRITE,x			;[]+4 Store Result into 16-byte SPRITE buffer
	dex					 ;[]+2 Decrement counter
	bpl .RORa			   ;[]+2/3 Branch if more bytes to rotate
	...

; both tables are indexed with Y = RotateAMT (1..4), so the labels are biased by -1
RorJmpTbl = . - 1
; make sure >Ror1 and >Ror4 are in the same page!
	.byte	<Ror1, <Ror2, <Ror3, <Ror4
RorAndTbl = . - 1
	.byte	%1, %11, %111, %1111

setup code: 17 cycles
shifts (average 2.5 shifts): 5 cycles
loop: 36 + 5 = 41 cycles; total (16 times): 655 cycles

Edited by Thomas Jentzsch, Fri Apr 8, 2011 10:35 AM.


#22 ScumSoft OFFLINE  

ScumSoft

    Moonsweeper

  • 332 posts
  • Location:Polysorbate 60

Posted Fri Apr 8, 2011 2:22 PM

I do appreciate the help, I'll see how much of an impact this new setup makes. What is most important is to get the entire routine to minimal cycles used, So I really like the hybrid shift + preshifted sprite idea, one thing I definitely didn't think of using before.

I am over budget by 456 cycles in the overscan because I moved from 8x8 sprites to 8x16. 8x8 just didn't have the space to make the player look right, but this also added quite a bit of presprite overhead to the routines. I need preshifted data due to the way my software output buffers align to the 96x96 screen space I have set up right now.
The player position registers aren't moving along with the sprite as in your typical game; instead they are stationary and split into 6 sections per frame, then aligned to form a 12x24 software settable block of pixels.

Ok, off to test the code. Be back later on.

#23 Thomas Jentzsch OFFLINE  

Thomas Jentzsch

    Thrust, Jammed, SWOOPS!, Boulder Dash

  • 17,525 posts
  • Always left from right here!
  • Location:Düsseldorf, Germany

Posted Fri Apr 8, 2011 2:28 PM

ScumSoft, on Fri Apr 8, 2011 2:22 PM, said:

I do appreciate the help, I'll see how much of an impact this new setup makes. What is most important is to get the entire routine to minimal cycles used, So I really like the hybrid shift + preshifted sprite idea, one thing I definitely didn't think of using before.

I am over budget by 456 cycles in the overscan
Do you have free cycles in VBlank?

#24 ScumSoft OFFLINE  

ScumSoft

    Moonsweeper

  • 332 posts
  • Location:Polysorbate 60

Posted Fri Apr 8, 2011 6:28 PM

VBLANK time is entirely used for working on the remaining sprites, level construction, sounds and game logic, and masking the buffers to get them ready for drawing. Only the player and monsters are 8x16 sprites and therefore take the most time to work on, so I do them in the overscan first; everything else is an 8x8 sprite and not an issue.

I'll probably rearrange the workload as need be later on; I just need the entire game's functionality in place first. But I wanted to see if there was a way to optimize this routine now, as it dictates how large a sprite I can toss into my game and still have time left over for other things. If 8x8 is the largest feasible sprite to work on, then I have to design the game around this accordingly, see? But I am sure I can get some 8x16 ones in here.

#25 bogax ONLINE  

bogax

    Moonsweeper

  • 277 posts

Posted Fri Apr 8, 2011 11:59 PM

Thomas Jentzsch, on Fri Apr 8, 2011 10:19 AM, said:



	...

.RORa
	lda SPRITE,x			;[]+4 Load Sprite byte to be rotated
	and RorAndTbl,y		 ;[]+4
	ror					 ;[]+2
	eor SPRITE,x			;[]+4
	and RorAndTbl,y		 ;[]+4
	eor SPRITE,x			;[]+4 = 18 extra cycles to preserve X and make ror faster
	jmp (.vec)			  ;[]+5

	...


I don't think you need that first and

eg

.RORa
	lda SPRITE,x	 ; ? abcdefgh	   
	ror			  ; h ?abcdefg
	eor SPRITE,x	 ; h xxxxxxxx
	and RorAndTbl,y  ; h 0000xxxx
	eor SPRITE,x	 ; h abcddefg
	ror			  ; g habcddef
	ror			  ; f ghabcdde
	ror			  ; e fghabcdd
	ror			  ; d efghabcd

an alternative

	lda SPRITE,x	 ; ? abcdefgh
	and RorAndTbl,y  ; ? abcd0000
	clc
	adc SPRITE,x	 ; a bcd0efgh
	ror			  ; h abcd0efg
	ror			  ; g habcd0ef
	ror			  ; f ghabcd0e
	ror			  ; e fghabcd0
	ror			  ; 0 efghabcd
of course you have to invert the mask(s)
and if you know the carry will be clear
you could leave out the clc and possibly
gain a couple cycles


for rol

	lda SPRITE,x	 ; ? abcdefgh
	asl			  ; a bcdefgh0
	adc #$80		 ; b ?cdefgha
	rol			  ; ? cdefghab

ie three cycles per bit if you do them in pairs
but I think you'd need a separate routine for
an odd number of bits
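
One possible way to cover an odd count without a separate routine, offered as a sketch rather than a tested solution, is to follow the pair with the cmp #$80 / rol step suggested earlier in the thread. The comments use the same carry/byte notation as above:

```asm
	lda SPRITE,x	 ; ? abcdefgh
	asl				 ; a bcdefgh0
	adc #$80		 ; b ?cdefgha
	rol				 ; ? cdefghab   rotated left by 2
	cmp #$80		 ; c cdefghab   carry = bit 7, A unchanged
	rol				 ; c defghabc   rotated left by 3
	sta SPRITE,x
```

The two extra instructions cost 4 cycles, so a rotate by 3 comes to 10 cycles between the load and the store.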



