
Counting is not... easy :)


shazz


Hello,

 

I'm new to VCS development. I've read a lot of things on this wonderful website... and I'm learning :) So sorry for any dumb questions...

 

So, I'm displaying an asymmetrical playfield. That's fun, but it seems my cycle counting is not right (or Stella is bugged... but more probably I am).

 

Here is an extract of my kernel:

 

ScanlineLoop

STX COLUBK ; 3

LDA Screen_PF0-1,Y ; 5
STA PF0 ; 3
; -> cycles consumed to reach this point: 2 + 3 (DEY+BNE) + 3 + 5 + 3 = 16 cycles
LDA Screen_PF1-1,Y ; 5
STA PF1 ; 3
LDA Screen_PF2-1,Y ; 5
STA PF2 ; 3

INX ; 2
INX ; 2
SLEEP 4 ; 4
; -> cycles consumed to reach this point: 16 + 5 + 3 + 5 + 3 + 2 + 2 + 4 = 40 cycles < 49-8 limit, early enough to start writing PF0 again
LDA Screen_PF3-1,Y ; 5
STA PF0 ; 3
LDA Screen_PF4-1,Y ; 5
STA PF1 ; 3
LDA Screen_PF5-1,Y ; 5
STA PF2 ; 3

STA WSYNC ; 3

;*********************** scanline frontier

DEY ; 2
BNE ScanlineLoop ; 2+1 

 

- so I change the background color (ahhh rasters....)

- then I poke the PF registers

- and again.

 

This piece of code works, but I don't understand why. Let me explain:

1. As I understand it, I have 22 CPU cycles before PF0 is drawn, and here I count 16 cycles. If I add my 2 INX instructions after those 16 cycles (so 16 + 4 = 20), it should still be OK, but the display glitches... so it's not.

2. Before poking the PF registers a second time, I have a SLEEP 4. If my count is right, 40 cycles are consumed at this point, so I should have 9 cycles left to poke PF0 (which takes 8 cycles), but if I use SLEEP 5, it glitches...

 

So if somebody can explain where I made a mistake...

 

thanks !!!

so fun :)


Yes, but that would be worse, as I count 16 cycles (with LDA abs,Y == 5 cycles) and it seems I'm already at 19 cycles.

And as my Screen_PFx data are located at the end of my program and are quite big, I assumed I was crossing the page boundary...

 

So I still wonder where I missed cycles...



 

 

I quote myself...

 

"Now, what happens at colour clock 148? The TIA starts displaying the second half of the playfield for the scanline in question, of course! Depending on if the playfield is mirrored or not, we will start seeing data from PF2 (mirrored) or from PF0 (non-mirrored)."

 

You don't by any chance have the playfield mirroring wrong?

I'd want to see PF0 PF1 PF2 PF0 PF1 PF2, not PF0 PF1 PF2 PF2 PF1 PF0. If the latter, that would explain your glitches... nothing to do with cycle counts at all!

Cheers

 

A


So...

1. I checked with the Stella debugger: my counting is right :) But that doesn't explain (to me at least) the behavior...

2. No, my CTRLPF register looks OK:

 

LDA #%00000000 ; symmetrical playfield (reflect bit D0 = 0)... to do asymmetrical ;)

STA CTRLPF

 

3. The first glitch appears on the 5th pixel, so in PF1 if I'm not wrong (screenshot: http://shazz.untergr...VCS/source4.png, where you can see a strange vertical bar on the Atari logo).

4. Here is the source code that works: http://shazz.untergr.../source4_en.asm (http://shazz.untergr.../source4_en.bin + http://shazz.untergr.../source4_en.sym)

4.1 If I add 1 CPU cycle before poking PF0, it glitches, so I understand that PF1 should be poked before the 27th cycle and not the 28th as expected (84/3).

4.2 If I add 1 CPU cycle before re-poking PF0 (right PF), it glitches, so I understand that PF0 should be poked before the 48th cycle and not the 49th as expected... but maybe the limit is really 48.666 :) so that may be OK.

 

So 4.2 may be OK, but 4.1 both confuses me and explains things at the same time: the issue was with PF1 (while I was focused on PF0), so it also constrains PF0, as PF0 is processed quickly (8/3 cycles), but I expected 28 cycles...


Hallo enthusi,

 

Yeah, that was intentional, in order to get this extra cycle. The objective of this piece of code was to reach the timing limit just before it starts to glitch. So with this extra cycle, I finish writing PF0 (the second time) at 48 CPU cycles. One more (so 49) and it glitches... but it should not, according to my understanding of the TIA (it starts to draw PF0 at color clock 148, so 49.33 CPU cycles).

 

I'm using DASM.


If your data tables start on page boundaries, then the -1,Y access moves the base address to the $FF byte of the previous page, so every access with Y of 1 or more crosses a page boundary and incurs a one-cycle penalty. You seem to have catered for this in your count of 5 for the access (without the penalty it would be 4). Why you'd want to do it this way, I don't know... waste of cycles!

Get rid of the -1, see if that changes things. If that doesn't help, please post full source code including data.

Cheers

A


Thanks, Andrew, for pointing out my bad habits! Yeah, I reused Chris' code (the TIA painter author) as an example and forgot to think about why the hell there was a -1...

So I removed them.... but it doesn't seem to change anything.

..

- it glitches if I finish poking PF0 at cycle 20 (and therefore PF1 at cycle 28), instead of what the theory says (PF0 at 22 and PF1 at 28), but it is OK with PF0 at 19 (and therefore PF1 at 27)

- it glitches if I finish poking PF0 at cycle 49 (and therefore PF1 at cycle 57), instead of what the theory says (PF0 at 49 and PF1 at 54), but it is OK with PF0 at 48 (and therefore PF1 at 56)!!!

 

all the code is available there : http://shazz.untergrund.net/VCS/forum/

 

I'm starting to wonder whether all these computations make sense... but if I understand correctly, all of this has to be taken into account on the VCS... :)

 

Another related question, if I uncomment

org $F300

INCLUDE "MJJA.ASM"

 

I think I did not understand the "boundary crossing" penalty. I thought it applied when the PC and the address being read were in different pages, but that seems not to be the case, given my calls:

 

LDA Screen_PF0,Y ; -> takes 4 cycles

STA PF0

LDA Screen_PF1,Y ; -> takes 5 cycles

STA PF1

LDA Screen_PF2,Y ; -> takes 5 cycles

 

Why only 4 cycles for Screen_PF0?

 

PC : $F040

Screen_PF0 : $F900

Screen_PF1 : $F9E5

Screen_PF2 : $FAC9



The page-crossing penalty occurs if the address plus the index offset crosses a page boundary. Your Screen_PF0 table starts at the beginning of a page (the least significant byte of the address is 0) and index Y will always be between 0 and 255 inclusive, so there will never be a page-crossing situation for LDA $F900,Y. The other two addresses are not too far from the end of a page, so sometimes there will be page crossings and sometimes not, depending on the value of Y:

 

$F9E5 is below page boundary $FA00.

$FA00 - $F9E5 = $1B (or 27).

So LDA $F9E5,Y will cross the page boundary at $FA00 when Y is 27 or greater, but not when Y is 26 or less.

 

$FAC9 is below page boundary $FB00.

$FB00 - $FAC9 = $37 (or 55).

So LDA $FAC9,Y will cross the page boundary at $FB00 when Y is 55 or greater, but not when Y is 54 or less.
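
The arithmetic above can be checked with a small Python sketch. It is only an illustration: the helper names `first_crossing_index` and `lda_abs_y_cycles` are made up here, and the only inputs are the three table addresses quoted in the question.

```python
def first_crossing_index(base):
    """Smallest Y (0..255) for which base + Y lands in the next page.

    Returns None for a page-aligned base, since base + 255 is then
    still inside the same page.
    """
    low = base & 0xFF
    if low == 0:
        return None
    return 0x100 - low


def lda_abs_y_cycles(base, y):
    """Cycles for LDA abs,Y: 4, plus 1 when base + Y crosses a page."""
    crossed = ((base + y) & 0xFF00) != (base & 0xFF00)
    return 5 if crossed else 4


# Table addresses as given in the thread.
tables = [("Screen_PF0", 0xF900), ("Screen_PF1", 0xF9E5), ("Screen_PF2", 0xFAC9)]

for name, base in tables:
    print(name, hex(base), "first crossing at Y =", first_crossing_index(base))
```

With these addresses, Screen_PF0 never crosses, Screen_PF1 starts crossing at Y = 27 and Screen_PF2 at Y = 55, which matches the figures above.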


I think I did not understand the "boundary crossing" penalty, I thought it was if the difference between the PC and the address to read was in different pages

By the way, there's another type of page-crossing situation, which is where you're doing a branch instruction and the target of the branch instruction is on a different page-- not from the address where the branch instruction starts, but from the address *after* it. It sounds like maybe you were confusing these two types of page crossings.
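
A rough Python sketch of the branch rule, for illustration only: the function name `branch_cycles` and the example addresses are hypothetical, and the cycle counts used are the standard 6502 ones (2 if not taken, 3 if taken, 4 if taken across a page).

```python
def branch_cycles(branch_addr, target_addr, taken=True):
    """Cycles for a 6502 relative branch (BNE, BEQ, ...).

    The page comparison uses the address of the instruction that
    FOLLOWS the 2-byte branch, not the address of the branch itself.
    """
    if not taken:
        return 2
    next_addr = (branch_addr + 2) & 0xFFFF
    crossed = (next_addr & 0xFF00) != (target_addr & 0xFF00)
    return 4 if crossed else 3


# Hypothetical addresses: both branches sit in page $F0 and jump back to $F0F0.
print(branch_cycles(0xF0FD, 0xF0F0))  # next instruction at $F0FF, same page as target
print(branch_cycles(0xF0FE, 0xF0F0))  # next instruction at $F100, different page
```

The second case costs the extra cycle even though the branch opcode and its target are in the same page, because the instruction after the branch is already in the next page.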


Thanks SeatGtGruff !

 

That's perfectly clear ! And yes you're right, I was mixing the 2 page-crossing situations !

 

So now I have aligned my data to a page start (I used ALIGN $100; not sure whether that is the best way to do it, or whether I should use org...) and now, despite the wasted memory, my kernel is more predictable, as all my LDAs take 4 cycles.

 

:-)


So I did my computations again using a stable routine (no more boundary crossing), and here are my results with Stella:

 

ScanlineLoop

STX COLUBK ; 3 (8)
SLEEP 5 ; 5 (13)

; PF left
LDA Screen_PF0,Y ; 4 (17) -> no boundary crossing
STA PF0 ; 3 (20) -> cycles consumed to this point: 20 < 22 cycles; 21 is OK, 22 glitches
LDA Screen_PF1,Y ; 4 (24)
STA PF1 ; 3 (27) -> cycles consumed to this point: 27 <= 84/3=28; 27 is OK, 28 glitches
LDA Screen_PF2,Y ; 4 (31)
STA PF2 ; 3 (34) -> cycles consumed to this point: 34 < 116/3=38.67; 39 is OK !!! 40 glitches...

INX ; 2 (36) (raster stuff)
INX ; 2 (38) -> we are at 38 cycles here
SLEEP 2 ; 2 (40)

; PF right
LDA Screen_PF3,Y ; 4 (44)
STA PF0 ; 3 (47) -> cycles consumed to this point: 47 < 148/3=49.33; 48 is OK, 49 glitches
LDA Screen_PF4,Y ; 4 (51)
STA PF1 ; 3 (54) -> cycles consumed to this point: 54 < 164/3=54.67; 54 is OK, 55 glitches
LDA Screen_PF5,Y ; 4 (58)
STA PF2 ; 3 (61) -> cycles consumed to this point: 61 < 196/3=65.33; 64 is OK, 65 glitches

STA WSYNC ; 3 (64)


;*********************** scanline boundary

DEY ; 2 (2)
BNE ScanlineLoop ; 2+1 (5) 

 

So there is a discrepancy of one cycle between theory and reality: every PF poke has to be done at least one cycle before the TIA starts to draw, otherwise it glitches.
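
For anyone who wants to re-check the running totals in the kernel comments, here is a minimal Python sketch of the tally. It assumes what the thread states: every LDA abs,Y takes 4 cycles (page-aligned tables), and the line starts at 5 cycles (DEY = 2, BNE taken = 3). The names `kernel` and `running_totals` are made up for this example.

```python
# (instruction, cycles) pairs for one pass of the stable kernel above.
kernel = [
    ("STX COLUBK", 3), ("SLEEP 5", 5),
    ("LDA Screen_PF0,Y", 4), ("STA PF0", 3),
    ("LDA Screen_PF1,Y", 4), ("STA PF1", 3),
    ("LDA Screen_PF2,Y", 4), ("STA PF2", 3),
    ("INX", 2), ("INX", 2), ("SLEEP 2", 2),
    ("LDA Screen_PF3,Y", 4), ("STA PF0", 3),
    ("LDA Screen_PF4,Y", 4), ("STA PF1", 3),
    ("LDA Screen_PF5,Y", 4), ("STA PF2", 3),
    ("STA WSYNC", 3),
]


def running_totals(instructions, start):
    """Return (instruction, cycle count after it completes) pairs."""
    totals = []
    count = start
    for name, cycles in instructions:
        count += cycles
        totals.append((name, count))
    return totals


totals = running_totals(kernel, start=5)  # 5 = DEY (2) + BNE taken (3)
for name, count in totals:
    print(f"{name:<18} ({count})")
```

The six playfield stores complete at 20, 27, 34, 47, 54 and 61 cycles, and the STA WSYNC at 64.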

(code updated : http://shazz.untergrund.net/VCS/forum/)

 

Time to test on a real VCS to check this is not a Stella emulation issue.....


I checked: 3 cycles, and all the timings I added in the comments were checked against Stella's S. Cyc. counter, so my counting is right, but Stella doesn't behave as I expect. Maybe this is like the RESPx registers, and there is a short delay (nearly 1 CPU cycle) required for the TIA to take into account the data poked into the PFx registers? Where can I check that?


Yes, there is a short delay of 2 color clocks (2/3 of a machine cycle), although some versions or clones of the VCS have a delay of 3 color clocks (1 machine cycle).

 

Attached is a little table I made that shows when you need to write to a given playfield register.

 

The first two columns list the playfield registers in the order they're drawn in. The first column lists them in the order they're drawn if you're using a repeated playfield (PF0-PF1-PF2-PF0-PF1-PF2), and the second column lists them in the order they're drawn if you're using a reflected playfield (PF0-PF1-PF2-PF2-PF1-PF0). Note that the table lists each pixel, not just the entire register, with the number after the dash being the pixel number within that register (*not* the bit number, which is different). The letter in front of the register name indicates whether the register is being drawn on the left side or the right side of the screen.

 

The third column shows the screen position (numbered from 0 to 159) where the pixel starts. The fourth column shows the color clock (numbered from 0 to 227) where the pixel starts-- basically, the value in the third column plus 68.

 

The fifth column shows (for a standard TIA) the *latest* machine cycle when you can write to the given playfield register to have the new value be used for the given pixel. The sixth column shows the same thing, but for the non-standard TIAs or TIA clones where the delay is a tad longer. The seventh column contains an asterisk wherever columns five and six are different.
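
One way to model the fifth and sixth columns, as a hedged Python sketch and not a copy of the attachment: it assumes that a write "at cycle N" means the STA completes when the running cycle count reads N (color clock 3N), and that the new value has to be in place `delay_clocks` color clocks before the pixel starts. The name `latest_write_cycle` is made up for this example.

```python
CLOCKS_PER_LINE = 228       # color clocks per scanline, numbered 0..227
CLOCKS_PER_CPU_CYCLE = 3    # 3 color clocks per machine cycle


def latest_write_cycle(pixel_clock, delay_clocks):
    """Latest machine cycle at which a playfield write can complete.

    ASSUMPTION: the store takes effect delay_clocks color clocks after
    the cycle count reaches N, i.e. 3 * N + delay_clocks <= pixel_clock.
    The modulo wraps deadlines that fall on the previous scanline.
    """
    deadline = (pixel_clock - delay_clocks) % CLOCKS_PER_LINE
    return deadline // CLOCKS_PER_CPU_CYCLE


# First pixel of each register for a repeated playfield (color clocks).
for name, clock in [("LPF0", 68), ("LPF1", 84), ("LPF2", 116),
                    ("RPF0", 148), ("RPF1", 164), ("RPF2", 196)]:
    print(name, clock, latest_write_cycle(clock, 2), latest_write_cycle(clock, 3))
```

Under this model the two delays give different answers only where the extra color clock pushes the deadline across a machine-cycle boundary, which is what the asterisk column flags.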

playfield_timings_2.txt


Thanks !!

Really interesting & crystal clear !

 

So in my case, I really tried to check the 5th column against reality (for an asymmetrical PF), and my tests using Stella show results that are a little different from what I expected, and so from your table too.

Basically, if I only consider the timings for the first pixel of each PF (as I don't know how to poke them one by one), here are my results:

 

repeated | reflected | pixel | clock | -2/3 | -1
================================================
HBLANK   | HBLANK    |  -68  |  000  |  75  | 75
------------------------------------------------
LPF0-0   | LPF0-0    |  000  |  068  |  21  | 21 *
LPF1-0   | LPF1-0    |  016  |  084  |  27  | 27
LPF2-0   | LPF2-0    |  048  |  116  |  39  | 37 *
------------------------------------------------
RPF0-0   | RPF2-0    |  080  |  148  |  48  | 48
RPF1-0   | RPF2-4    |  096  |  164  |  54  | 53 *
RPF2-0   | RPF1-4    |  128  |  196  |  64  | 64
------------------------------------------------

 

The red numbers come from the tests I did with Stella (not "my" counts). Whatever the cause of the difference, maybe an emulation issue or some luck due to my test PF data, your table explained this gap between theory and reality to me, and I'll use it as a reference! Thanks a lot!

That closes this topic :)
