tschak909 Posted August 29, 2016 Share Posted August 29, 2016 When you guys do your cycle counting, do you guys literally just keep erasing and re-writing the cycle counts after recalculating, as you add instructions? Am wondering if any of you have found a trick to save sanity as you're figuring things out in a kernel? -Thom Quote Link to comment Share on other sites More sharing options...
Thomas Jentzsch Posted August 29, 2016 Share Posted August 29, 2016 Yes, I am updating the cycle counts very frequently, especially when I create a kernel without WSYNC or a reflected, asymmetrical playfield. Quote Link to comment Share on other sites More sharing options...
Just Jeff Posted August 29, 2016 Share Posted August 29, 2016 I do, but not every time. I just clean it it up periodically or if I'm tackling a problem. Quote Link to comment Share on other sites More sharing options...
+Andrew Davie Posted August 29, 2016 Share Posted August 29, 2016 When you guys do your cycle counting, do you guys literally just keep erasing and re-writing the cycle counts after recalculating, as you add instructions? Am wondering if any of you have found a trick to save sanity as you're figuring things out in a kernel? -Thom "sanity"?! Ha! Large amounts of Boulder Dash code were cycle-counted. By hand. Over and over. I dream opcode cycle-times. Here's an example; I don't think Thomas will mind releasing this - it's part of our engine draw code, over which we hold copyright... DEFINE_SUBROUTINE DrawIntoStack tsx ;2 stx save_SP ;3 ldx DrawStackPointer ;3 txs ;2 = 10 ldy DSL ;3 ; worst-case DrawStackOne loop = 61 cycles per character (+11 for first one) ; + exit cost which is +10 cycles ; TJ: I count 63 ; + 14 for the wtf2 exit ; + 22 for the end of loop exit ; This loops 80 times/frame and is called 1-2 times/frame, so any saving inside the loop will make up for a lot of overhead outside ; For 80 iterations that is 640 cycles, just for checking INTIM ; Worst case we would have ~5000. Though that will most likely never happen, we have to optimize for it, since ; it also will require a lot of CPU time for updating the screen data. ; TODO: optimize! ; ; idea #4: the mixed draw idea (two different characters have to be redrawn), ; first do a QuickDraw and then a SlowDraw, faster than two SlowDraws ; costs some detection time here, but saves ~240 cylces for drawing the two .loopDrawStack lda INTIM ;4 cmp #SEGTIME_DSL ;2 bcc .exitDrawStack ;2/3= 8/9 STRESS_TIME SEGTIME_DSL lda DrawFlag,y ;4 cmp ScreenBuffer,y ;4 Is the character already there the same as the new one? beq .next0 ;2/3=10/11 yes, so we don't draw anything ; Character is NOT the same. Figure out how it should be drawn. ; If it is in column 0 or 5 then it can be DirectDrawn (indirectly found by a A:A compare) ; If it is the same as its paired character (sharing same PF byte) then it can be DirectDrawn ; The top bit of the ScreenBuffer character indicates the DirectDrawn hint ldx PairedCharacter,y ;4 the "paired" character for this one cmp DrawFlag,x ;4 same as partner character in new drawn screen? bne .notPaired0 ;2/3=10/11 ; Consider two 'paired' characters. Either A:A or A:B ; When we're scanning, and we check the first, if they are NOT paired, then the second character ; can be considered in isolation -- its check comes later, will determine itself if the pair can be written ; If, however, the first character IS paired, then the write below will cause the second check to FAIL ; on the comparison, so the character will not be added to the draw stack. So our first character will ; do the job of drawing BOTH characters to the screen. sta ScreenBuffer+RAM_WRITE,x ;5 mark paired character as drawn already (!!) ora #$80 ;2 = 7 DirectDraw this character 'pair' ; In the case of columns 0 and 5, the X and Y registers will be the same -- no problemo, because ; the last write(below) marks the character as to be direct-drawn. .notPaired0 sta ScreenBuffer+RAM_WRITE,y ;5 NEW character to draw + DirectDraw flag (128) ; The following 'pla' really just increments the draw-stack pointer. Value is unimportant. Unusual! pla ;4 ASSUMPTION IS WE DON'T OvERFLOW DRAW STACK tya ;2 tsx ;2 << now X holds drawstackpointer sta DrawStack+RAM_WRITE,x ;5 = 18 index of character to draw .next0 dey ;2 bmi .finishedDrawStack ;2/3= 4/5 ; unrolled 2nd loop: lda DrawFlag,y ;4 cmp ScreenBuffer,y ;4 Is the character already there the same as the new one? beq .next1 ;2/3=10/11 yes, so we don't draw anything ldx PairedCharacter,y ;4 the "paired" character for this one cmp DrawFlag,x ;4 same as partner character in new drawn screen? bne .notPaired1 ;2/3=10/11 sta ScreenBuffer+RAM_WRITE,x ;5 mark paired character as drawn already (!!) ora #$80 ;2 = 7 DirectDraw this character 'pair' .notPaired1 sta ScreenBuffer+RAM_WRITE,y ;5 NEW character to draw + DirectDraw flag (128) pla ;4 ASSUMPTION IS WE DON'T OvERFLOW DRAW STACK tya ;2 tsx ;2 sta DrawStack+RAM_WRITE,x ;5 = 18 index of character to draw .next1 dey ;2 bpl .loopDrawStack ;2/3= 4/5 ;worst case: 111-4 ;40 loops(-4), max. 2 calls(+20) -> -160+40=-120, +8 bytes ; THE FOLLOWING OPTIMISATION IS STUFFED IF PROCESSOBJSTACK is not first in the vector processor! .finishedDrawStack inc ScreenDrawPhase ;5 = 5 tsx ;2 stx DrawStackPointer ;3 ldx save_SP ;3 txs ;2 = 10 jmp SwitchObjects ;rts ;6 = 6 .exitDrawStack sty DSL ;3 = 3 tsx ;2 stx DrawStackPointer ;3 ldx save_SP ;3 txs ;2 = 10 NoBlanks rts ;6 = 6 2 Quote Link to comment Share on other sites More sharing options...
DanOliver Posted August 29, 2016 Share Posted August 29, 2016 I only did it for screen drawing code and only when I got close to working. After counting and a little tweaking I'd know I was maxed out or N cycles under, Any changes from then would be small so just add/subtract the changes. Kind of like counting cards. Pretty quick the cycle counts are memorized so pretty easy. I never put cycle counts in the source file. Too easy to get out of sync and then be misleading for me. 1 Quote Link to comment Share on other sites More sharing options...
+grafixbmp Posted August 29, 2016 Share Posted August 29, 2016 I use a 2600 scan line chart. Quote Link to comment Share on other sites More sharing options...
gauauu Posted August 31, 2016 Share Posted August 31, 2016 I still want a magic ide that auto-inserts/tracks them for me. Why are we doing a computer's job by hand? (and yes, DASM can assemble so quickly and computers are so powerful, that it should even know if it's crossing a page boundary, and highlight that fact for me as I'm going) Quote Link to comment Share on other sites More sharing options...
+SpiceWare Posted August 31, 2016 Share Posted August 31, 2016 Yep. I wrapped up my latest round of cycle counting last night: MainLoop: ; 11 stx NUSIZ0 ; 3 14 - now 3 close stx NUSIZ1 ; 3 17 - now 3 close sta GRP0 ; 3 20 - blank out Stella copy of P0 (not seen on hardware) ldx #L8_3_MEDIUM; 2 22 - 3 Copies medium & left 8 (early HMOVE) sty GRP1 ; 3 25 @0-41 datastream 5 sty GRP0 ; 3 28 @??-30 datastream 1 - ??=0 for real hardware SLEEP 2 ; 2 30 sty GRP0 ; 3 33 @33-35 datastream 3 SLEEP 3 ; 3 36 sty GRP0 ; 3 39 @39-54 datastream A SLEEP 2 ; 2 41 sty GRP1 ; 3 44 @44-46 datastream 7 sta RESP0 ; 3 47 @47 sty GRP1 ; 3 50 @49-51 datastream 9 sta RESP0 ; 3 53 @53 stx NUSIZ1 ; 3 56 - now 3 medium sta GRP0 ; 3 59 @57-59 - blank out sty GRP1 ; 3 62 @55-62 datastream D sty GRP0 ; 3 65 @63-65 datastream E EntryEven: ; 65 ldx #L8_3_CLOSE ; 2 67 stx HMP1 ; 3 70 sta HMOVE ; 3 73 @73-74 - move left 8 sta RESP0 ; 3 76/0 ;__________________________________________________ dec LoopCounter ; 5 5 beq KernelDone ; 2 7 (3 stx NUSIZ0 ; 3 10 - now 3 close stx NUSIZ1 ; 3 13 - now 3 close ldx #R8_3_MEDIUM; 2 15 sta GRP0 ; 3 18 - blank out Stella copy of P0 (not seen on hardware) sty GRP1 ; 3 21 @0-38 datastream 4 SLEEP 2 ; 2 23 sty GRP0 ; 3 26 @25-27 datastream 0 SLEEP 2 ; 2 28 sty GRP0 ; 3 31 @31-33 datastream 2 SLEEP 2 ; 2 33 sta GRP0 ; 3 36 @??-46 blank out sta RESP0 ; 3 39 @39 sty GRP1 ; 3 42 @41-43 datastream 6 sta RESP0 ; 3 45 @45 sty GRP1 ; 3 48 @47-49 datastream 8 stx.w NUSIZ1 ; 4 52 - now 3 medium sty GRP0 ; 3 55 @49-57 datastream B sty GRP1 ; 3 58 @??-59 datastream C stx NUSIZ0 ; 3 61 - now 3 medium sty GRP0 ; 3 64 @61-67 datastream F stx HMP1 ; 3 67 ldx #R8_3_CLOSE ; 2 69 EntryOdd: ; 69 SLEEP 4 ; 4 73 sta RESP0 ; 3 76/0 @0 ;__________________________________________________ sta HMOVE ; 3 3 dec LoopCounter ; 5 8 bne MainLoop ; 2 10 (3 11) For the bus stuffing 128 pixel display: 3 Quote Link to comment Share on other sites More sharing options...
+SpiceWare Posted August 31, 2016 Share Posted August 31, 2016 DASM can assemble so quickly and computers are so powerful, that it should even know if it's crossing a page boundary macro.h in Collect has a number of macros which take care of that. Search for SAME PAGE BRANCH CHECK as well as DIFFERENT PAGE BRANCH CHECK. 1 Quote Link to comment Share on other sites More sharing options...
gauauu Posted August 31, 2016 Share Posted August 31, 2016 macro.h in Collect has a number of macros which take care of that. Search for SAME PAGE BRANCH CHECK as well as DIFFERENT PAGE BRANCH CHECK. That's slick! But I want it integrated with my editor/IDE. Alongside automatically managed cycle counts. Oh well, I know the drill. If I want it badly enough, I should write it. Quote Link to comment Share on other sites More sharing options...
alex_79 Posted August 31, 2016 Share Posted August 31, 2016 I calculate and write cycle counts in the source whenever timing is crucial.Example: this is a subroutine that receives a byte of data at 19200 baud from a PC RS232 connected to the 2600 right controller port (using a serial-to-TTL level converter or a USB to TTL-serial adapter). ; baudrate mark/space cpu cycles ; duration 2600P 2600N 7800P 7800N ; ; 19200 52.083 us 61.6 62.1 92.4 93.2 ; 9600 104.167 us 123.2 124.3 184.7 186.4 ; 4800 208.333 us 246.3 248.6 369,5 372.9 ; 2400 416.667 us 492.6 497.2 738.9 745.7 ; 1200 833.333 us 985.2 994.3 1477.9 1491.5 ; ; max error allowed 2% ; ; best values for 19200 baud: ; 62 cycles in 2600 mode ; 93 cycles in 7800 mode ; ; PAL: 52.44 us (error +0.69%) ; NTSC: 51.96 us (error -0.23%) ; ; ; TTL serial data ; start bit = 0 ; stop bits = 1 ; idle state = 1 ; ; ; right port: pin 1 = TX (SWCHA D0) ; pin 2 = RX (SWCHA D1) ; BITIN = %00000001 BITOUT = %00000010 SUBROUTINE rxbyte_26 ; end of previous frame stop bit... lda #BITIN ; wait for START bit .loop2 bit SWCHA ;4 bne .loop2 ;2* ;2 - 9 cycles after start of start bit ---> average 5 ; wait about half of the bit duration (31 cycles in 2600 mode) ; 5 (average!) ldy #2 ;2 7 .wait ; 5*y+4=14 dey ; bpl .wait ; ; 21 nop ;2 23 ; main loop ldx #8 ;2 25 .nextbit ; 56 lsr SWCHA ;6 62 31 carry = received bit. PORT A must be configured as INPUT !! ror buffer ;5 5 ldy #8 ;2 7 .wait2 ; 5*y+4=44 dey ;2 bpl .wait2 ;2* ; 51 dex ;2 53 bpl .nextbit ;2* 55 nop 0 ;3 58 bit SWCHA ;4 62 first stop bit rts ;6 returns Z=0 if framing error (A=#BITIN) Quote Link to comment Share on other sites More sharing options...
tschak909 Posted September 3, 2016 Author Share Posted September 3, 2016 When counting, do you first do the pass from the top starting from 0, and then when you reach the bottom of your main kernel loop, do you go back up to the top, and continue counting until 76, or the next WSYNC? Am trying to find the best technique to deal with cycle counting overlap. -Thom Quote Link to comment Share on other sites More sharing options...
Thomas Jentzsch Posted September 3, 2016 Share Posted September 3, 2016 I am starting from 0 after each WSYNC. In kernels without WSYNC I start with 0 (or more) when the count exceeds 76. So the counts are always in parallel with the real scanline count (see Stella debugger: "Scan Cycle"). Quote Link to comment Share on other sites More sharing options...
tschak909 Posted September 3, 2016 Author Share Posted September 3, 2016 KERNloop: LDX #ENABL ; 3 27 TXS ; 2 29 LDA SCANLINE ; 3 32 EOR BALLY2 ; 3 35 AND #$FE ; 2 37 PHP ; 3 40 LDA SCANLINE ; 3 43 EOR BALLY1 ; 3 46 AND #$FE ; 2 48 PHP ; 3 51 INC SCANLINE ; 5 56 STA WSYNC ; 3 59 (---) LDA SCANLINE ; 3 3 LSR ; 2 5 LSR ; 2 7 LSR ; 2 9 SEC ; 2 11 SBC #$06 ; 2 13 TAY ; 2 15 STA WSYNC ; 3 18 LDA PF0_0,Y ; 4 22 STA PF0 ; 3 25 LDA PF1_0,Y ; 4 29 STA PF1 ; 3 32 LDA PF2_0,Y ; 4 36 STA PF2 ; 3 39 INC SCANLINE ; 5 44 STA WSYNC ; 3 47 (---) LDA SCANLINE ; 3 3 EOR BALLY0 ; 3 6 AND #$FC ; 2 8 PHP ; 3 11 INC SCANLINE ; 5 16 LDA SCANLINE ; 3 19 CMP #232 ; 2 21 BCC KERNloop ; 2 24 dodgeball.bin Quote Link to comment Share on other sites More sharing options...
Thomas Jentzsch Posted September 3, 2016 Share Posted September 3, 2016 Yes, about like that. But I use lower case mnemonics, and camel case variables (to distinguish them from constants). Quote Link to comment Share on other sites More sharing options...
Recommended Posts
Join the conversation
You can post now and register later. If you have an account, sign in now to post with your account.
Note: Your post will require moderator approval before it will be visible.