Why can't the Atari 2600 display better graphics?


pojr

I just stumbled upon this thread, and I completely agree with the original topic starter regarding the 2600 Centipede title screen. I felt the same way the first time I saw it as well 30+ years ago. You can give me all the factual and technical explanations you want, and at the time, limited by 1982 technology and cost efficiency, I will absolutely accept it. But as for present day and/or future technology, I see absolutely no reason why it could not eventually be done.

 

First time I saw VCS Centipede I was at a friend's house and his kid brother was playing it. I thought "yuck" as I was used to a much better looking Centipede clone on my VIC-20.

 

Now, knowing the limitations of the Atari like I do, I can appreciate it for the masterpiece of Atari programming that it really is.

 

While using things like DPC+ can make for a better Centipede, I don't see the mushrooms as being drawn by anything except the playfield - the flicker would be horrific if you tried to draw them with the players. One major benefit of using DPC+ would be the ability for each mushroom to have a unique height, which would let you know how many hits each one had taken. I suspect it would also make true trackball support possible.


Having played the lovely Missile Command-TB hack, I have to say a similar Centipede-TB hack for the 2600 would be fabulous. :)

Not a hack, a homebrew. My understanding of how the trackball works is that you have to take multiple readings during the kernel (the drawing of the game screen), just as is required to obtain a paddle position. I suspect the reason you've not seen a Centipede-TB hack is that there's not enough time left in the kernel to take those readings.


Without DPC you need ~40 cycles for one read (maybe you can optimize this by 5-10 cycles).

 

I never tested how frequently you have to read before the player notices that the trackball becomes less responsive. Probably every 4th scanline would be enough.

 

Here is the CX-22 code:

; CX-22:
    lda    SWCHA                ; 4
    lsr                         ; 2
    lsr                         ; 2
    lsr                         ; 2
    lsr                         ; 2
    and    #BIT_MASK_CX         ; 2
    ldy    trackX_CX22          ; 3
    sta    trackX_CX22          ; 3
    eor    NextTrackTabCX22,y   ; 4 = 24
    beq    .leftCX22            ; 2/3
    eor    #BIT_MASK_CX         ; 2
    bne    .endCX22_X           ; 2/3
;.rightCX22
    ldy    posX_CX22            ; 3
    iny                         ; 2
    bne    .setCX22_X           ; 2/3 = 13/14
    beq    .endCX22_X           ; 3

.leftCX22                       ; 3
    ldy    posX_CX22            ; 3
    beq    .endCX22_X           ; 2/3
    dey                         ; 2 = 10
.setCX22_X
    sty    posX_CX22            ; 3 =  3
.endCX22_X
; total: 31..41
Edited by Thomas Jentzsch

No matter what you do or how you do it, so long as the cart plugs into the Atari cartridge port, and the unmodified system produces a signal that goes to the television, it counts.

I disagree.

 

That would mean that the 2600 is defined only by the TIA. But IMO it is much more: a combination of multiple hardware components.

Edited by Thomas Jentzsch

Without DPC you need ~40 cycles for one read (maybe you can optimize this by 5-10 cycles).

With DPC+ I'd try the following:

 

; in VerticalSync
  ldx #<TrackBallReadings
  stx DF7LOW
  ldx #>TrackBallReadings
  stx DF7HI
 
; periodically during kernel
  lda SWCHA
  sta DF7WRITE

The ARM code can quickly process the data in TrackBallReadings to figure out the movement. It'll complicate things if readings need to happen during Vertical Blank and Overscan, but it would still be doable.

 

I do something similar in Stay Frosty 2 where I stash the collision registers after drawing a platform:

        lda CXM0P               ; 3 26/21 - D7 = M0P1 (snowball hit object)
        sta DF6WRITE            ; 4 30/25
        lda CXP0FB              ; 3 33/28 - D7 = P0PF (frosty hit ice/platform)
                                ;           D6 = P0BL (frosty hit elevator)
        sta DF6WRITE            ; 4 37/32
        lda CXM1P               ; 3 40/35 - D6 = M1P1 (bird hit nose)
        sta DF6WRITE            ; 4 44/39
        lda CXPPMM              ; 3 47/42 - D7 = P0P1 (Frosty hit object)
        sta DF6WRITE            ; 4 51/46
...
        sta CXCLR               ; 3 66 - clear collision data

The ARM code uses the stashed data to figure out if Frosty hit something, and in what horizontal zone it occurred.


 


That's pretty cool to just do a read and store during the kernel. Without using extra hardware, I suppose you could take Thomas's routine and spread it over more lines. That restricts you to multiline kernels, of course.


Ah, my secret plan is coming together - I've got two of the brightest, most talented minds who currently work in the hobby both thinking about Trak-Ball code following my comment about a "Centipede-TB" ...

 

Now I just need a secret lair under a volcano and an analogous plan to control the world with my burgeoning army of disposable henchmen ...


 

And posted to on Mon Aug 31, 2015 10:58 PM

Go pick on someone else.

 

ADDED:

This is a retrogame thread and forum. Old is cool. Any question and any answer is valid regardless of when it was asked and/or responded to.

 

If you feel the need (which I don't believe you do) to clean up this thread, apply for moderator status and begin deleting with post number 64 onwards.

I don't have him on ignore, but I do pay him no heed. I enjoyed when he said LadyBug on the 2600 was impossible, then had to eat crow when the game became reality.


Without DPC you need ~40 cycles for one read (maybe you can optimize this by 5-10 cycles).

 


I thought about this routine. I propose two optimizations to speed it up. Each comes with a price you may not be able to pay, but will give you back cycles in the kernel.

 

1) Use two temporary variables for adjusting the horizontal position. It will save cycles by eliminating all of the branches after updating posX_CX22. Have a variable for moving right, and one for moving left. Increment each separately, and then compare them after the kernel to figure out how much and what way you need to move. I suggest two variables because you have 160 horizontal positions. Using 1 variable won't cut it because you won't know if you moved left or right if you go over 127 counts.

 

Edit: After thinking about this more, one variable that you inc/dec might be fine. It's very unlikely that even the speediest player could ever move the trackball that fast in 1 frame. So just start with 0, inc/dec as needed, and add it to posX_CX22 after the screen is drawn. It'll be easy to handle overflow/underflow at that time. You'll also need to change the always branch to a jump in the routine I posted to avoid possible jitter.

 

2) Expand your "NextTrackTabCX22" table so that you don't have to do any shifts for the SWCHA value. That's an easy 8 cycles. You can claw back the wasted ROM by stuffing other data in the unused bytes. What's even better is if you can find other tables that are indexed by shifting (in this case 4 times), and interlace them all. That way you lose no bytes, and actually gain some by eliminating the shifts from all of the routines. It also makes them run faster, of course.

 

 

Here is what I came up with using those two optimizations. It runs at 24-28 cycles.

BIT_MASK_CX   = $03 << 4



    lda    SWCHA                 ;4  @4
    and    #BIT_MASK_CX          ;2  @6
    ldy    trackX_CX22           ;3  @9
    sta    trackX_CX22           ;3  @12
    eor    NextTrackTabCX22,y    ;4  @16
    bne    .checkRight           ;2³ @18/19
.left:                        
    inc    adjustY_Left          ;5  @23   = 0 at beginning of kernel
    bne    .end_CX22             ;3  @26   always branch

.checkRight:
    eor    #BIT_MASK_CX          ;2  @21
    bne    .end_CX22             ;2³ @23/24
;.right
    inc    adjustY_Right         ;5  @28   = 0 at beginning of kernel
.end_CX22:
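
The routine above only counts steps during the kernel; the counts still have to be applied once the screen is drawn. A minimal sketch of that step follows, assuming the two counters never differ by more than 127 in one frame. MAX_X and the scratch byte delta are hypothetical names introduced for illustration; adjustY_Left, adjustY_Right and posX_CX22 are the names used in the posted code.

```asm
MAX_X = 159                     ; assumed right edge (160 positions, 0..159)

; after the kernel, once per frame
    lda    adjustY_Right        ; steps to the right counted in the kernel
    sec
    sbc    adjustY_Left         ; minus steps to the left = signed delta
    sta    delta                ; hypothetical scratch byte in RAM
    clc
    adc    posX_CX22            ; new position, modulo 256
    bit    delta                ; N = sign of delta, A and carry untouched
    bmi    .movedLeft
;.movedRight
    bcs    .clampRight          ; carry set: wrapped past 255
    cmp    #MAX_X+1
    bcc    .storeX              ; still within 0..MAX_X
.clampRight
    lda    #MAX_X
    bne    .storeX              ; always taken, MAX_X is nonzero
.movedLeft
    bcs    .storeX              ; carry set: result did not go below 0
    lda    #0                   ; carry clear: underflow, clamp to 0
.storeX
    sta    posX_CX22
    lda    #0
    sta    adjustY_Left         ; both counters start the next frame at 0
    sta    adjustY_Right
```

With the single inc/dec variable from the edit, the subtraction at the top simply disappears and that variable takes the place of delta.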

For a console released in 1977, with only 128 bytes [!] of RAM, Atari VCS graphics were the shit. If you are comparing it to games coming out for competing 16-bit consoles in 1989, like Genesis and Turbo Grafx, and later SNES, then you're doing it wrong. Derp.

 

Fun fact: The Atari VCS can display twice as many colors (112) as the NES (56)! :P

Edited by stardust4ever

I thought about this routine. I propose two optimizations to speed it up. Each comes with a price you may not be able to pay, but will give you back cycles in the kernel.

Yup, that's about the fastest you can get using my algorithm (28 cycles). The table becomes a bit large, but that's no major concern anymore.

 

But how about alternative approaches?

 

One idea would be to follow Darrell's suggestion for DPC. One would just need some RAM for that. For scanning every 4th line during a 180 scanline kernel, we need 45 bytes of RAM.

  lda SWCHA          ; 4
  ldx trackIndex     ; 3
  sta trackBuffer,x  ; 4
  inc trackIndex     ; 5 = 16

Or, if you can reserve enough ZP-RAM:

  lda SWCHA          ; 4
  pha                ; 3 = 7 !!!
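
For the PHA variant the stack pointer has to be aimed at the sample buffer before the kernel and restored afterwards. A minimal sketch, assuming a 45-byte buffer (180 scanlines, one read every 4th line) that reuses the name trackBuffer from the snippet above; savedSP and NUM_SAMPLES are hypothetical names.

```asm
NUM_SAMPLES = 45                ; 180 scanlines / 4

; before the kernel
    tsx
    stx    savedSP              ; remember the real stack pointer
    ldx    #<(trackBuffer + NUM_SAMPLES - 1)
    txs                         ; first PHA lands on the last buffer byte

; ... kernel: "lda SWCHA / pha" on every 4th scanline ...

; after the kernel
    ldx    savedSP
    txs                         ; the normal stack is usable again
```

Because PHA decrements the stack pointer, the samples end up in reverse order (the first read sits at the highest address), and the kernel cannot use JSR/RTS or any other stack operation while the pointer is redirected.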

Or how about assuming that the direction will not change during one frame? You detect the direction once per frame (ideally outside the kernel), and then you would not have to distinguish between increasing and decreasing the variable.

 

And then probably you don't have to check if two bits have changed. So you just check if SWCHA changes its value.

  lda SWCHA          ; 4
  cmp lastTrack      ; 3
  beq .noMove        ; 2/3
  sta lastTrack      ; 3
  inc posDiff        ; 5 = 17 
.noMove

Maybe that would allow hacking existing games. Finding 17 cycles in a multi-line kernel looks possible.
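
Outside the kernel, the counted changes then only need to be added or subtracted once. A minimal sketch under the one-direction-per-frame assumption; posX and trackDir are hypothetical names (trackDir holds the direction detected for this frame, bit 7 set meaning the negative direction), and clamping to the screen edges is left out.

```asm
; after the kernel, once per frame
    lda    posX                 ; hypothetical position variable
    bit    trackDir             ; N = bit 7 of trackDir, A is untouched
    bmi    .negative
    clc
    adc    posDiff              ; positive direction: add the change count
    jmp    .storePos
.negative
    sec
    sbc    posDiff              ; negative direction: subtract it
.storePos
    sta    posX
    lda    #0
    sta    posDiff              ; restart the count for the next frame
```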

I did some experiments using Stella (verification on real hardware is necessary!).

Sampling takes place over 180 scanlines, and every scanline is sampled. It seems that no matter how fast I move the mouse, no more than ~20 changes happen during one frame. Also, there are no 2-bit changes between two samples, so we can omit that extra check.

Detection of the direction outside the kernel is a bit tricky and not 100% reliable so far:

  1. I remember the initial sample of SWCHA before the kernel starts.
  2. After the kernel I take the number of changes (posDiff) modulo 4.
  3. For even values, I cannot determine the direction (this is the problem), so I assume the previous one.
  4. For the two odd values, I have two dedicated routines where I compare the last sample of SWCHA with the initial sample and determine the direction. This is reliable because there are no 2-bit changes happening.
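
The four steps above can be sketched as follows. This is not the code from the attached test program: it folds the two dedicated routines into one table lookup, and it assumes the two trackball bits sit in bits 7..6 (all other bits masked to 0), that they step through a 2-bit Gray code, and that the order 00, 01, 11, 10 is the "positive" direction. firstSample, lastSample, temp, trackDir and GraySucc are hypothetical names.

```asm
UpdateDirection:
    lda    posDiff              ; number of changes counted this frame
    lsr                         ; C = bit 0, former bit 1 is now bit 0
    bcc    .done                ; even count: ambiguous, keep old trackDir
    and    #1                   ; 0 if posDiff mod 4 = 1, 1 if mod 4 = 3
    sta    temp
    lda    firstSample          ; sample taken before the kernel, %ab000000
    asl
    rol
    rol                         ; A = %000000ab
    tax
    lda    lastSample           ; last sample of the frame
    cmp    GraySucc,x
    beq    .oneAhead            ; ended one state ahead of the start
    inc    temp                 ; ended one state behind: flip the answer
.oneAhead
    lsr    temp                 ; C = 1 means negative direction
    lda    #0
    ror                         ; move C into bit 7
    sta    trackDir             ; bit 7 set = negative direction
.done
    rts

; successor of each state in the assumed order 00 -> 01 -> 11 -> 10 -> 00
GraySucc
    .byte  $40, $c0, $00, $80
```

A count of 1 ends one state ahead when moving in the positive direction, while a count of 3 ends one state behind, which is why bit 1 of the count inverts the comparison result.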

The problem occurs on rapid direction changes when the speed in the new direction is an even number of changes per frame (2, 4, 6, etc.). Then the cursor continues in the wrong direction. This does not happen very frequently, because movement speed is never 100% constant, but it is still noticeable.

 

Attached is a test program for the CX-80 if you want to check it out. If you move up and down erratically enough, sometimes you will notice a wrong direction response from the cursor.

 

Ideas for improvement are welcome. :)

BTW: The 17 cycle code above requires two extra cycles. Add "and #%11000000" after lda SWCHA. Now we need 19 cycles.

Trackball Test v0.03.bin


I don't have him on ignore, but I do pay him no heed. I enjoyed when he said LadyBug on the 2600 was impossible, then had to eat crow when the game became reality.

 

Show me a post where I said that, or anything even remotely close to that.

 

 

Fun fact: The Atari VCS can display twice as many colors (112) as the NES (56)! :P

 

TIA supports 128 colors, not 112.

 


I am (mainly) looking for the most efficient code for existing ROMs like Centipede or Reactor. So only PLA may work.

 

BTW: I regained the extra 2 cycles by using:

  lda #%00111111      ; 1 = output: the six unused lower bits
  sta SWACNT

This sets the unused lower bits to output, so they never change on reading. That makes masking the bits inside the kernel superfluous.

Edited by Thomas Jentzsch

Attached is a test program for the CX-80 if you want to check it out. If you move up and down erratically enough, sometimes you will notice a wrong direction response from the cursor.

 

Any chance of a CX-22 version? :)

 

(My secret plan seems to be working perfectly. Excellent.)


I am (mainly) looking for the most efficient code for existing ROMs like Centipede or Reactor. So only PLA may work.

If you can expand the ROM a little bit, then maybe you can use PLA if you branch when SWCHA hasn't changed instead of when it has. You'll need to find 3 cycles somewhere else to rejoin the main code when you can. This changes the code to 12/14 cycles, not including the jump back at a later point.

 

Alternatively, if you had X available (which I'm sure you don't, or you would just INX), then you could jump into the middle of an instruction: if you have STA PLA,X (that is, STA $68,X), jumping to its operand byte performs a PLA. Otherwise it stores to your register in ZP, indexed off of $68 (PLA). So you need X as a constant value.


If you can expand the rom a little bit then maybe you can use PLA when you branch if SWCHA hasn't changed instead of has changed.

Good idea. How about this?

  lda     SWCHA       ; 4
  cmp     lastTrack   ; 3
  beq     .noChange   ; 2/3
  sta     lastTrack   ; 3
  .byte   $a9         ;-2   LDA #imm opcode: takes the pla ($68) below as its operand and so skips it
.noChange
  pla                 ; 4 = 14
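
Both paths through this routine take 14 cycles, and the only state that differs afterwards is the stack pointer, which PLA bumps by one for every unchanged sample. One way to recover the number of changes after the kernel is sketched below. It assumes the stack pointer is at a known value when the kernel starts, that the kernel performs a fixed number of reads, and that nothing else touches the stack in between; SP_START, NUM_SAMPLES and temp are hypothetical names, and posDiff is reused from the earlier snippet.

```asm
SP_START    = $ff               ; assumed stack pointer at kernel start
NUM_SAMPLES = 45                ; assumed number of reads per frame

; after the kernel
    tsx                         ; X = SP_START + unchanged samples (mod 256)
    stx    temp
    lda    #<(SP_START + NUM_SAMPLES)
    sec
    sbc    temp                 ; A = NUM_SAMPLES - unchanged = changes
    sta    posDiff
    ldx    #SP_START
    txs                         ; put the stack pointer back
```

The subtraction works modulo 256, so it stays correct even when the stack pointer wraps from $ff to $00 during the kernel.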