
ZackAttack

Posts posted by ZackAttack


  1. Unrolling the loop would also free up some cycles during the loads because you'd no longer need an indexed load. Just hardcode the ZP address for each load. Honestly, I don't see how this would work without unrolling. Even if you don't use Y for loop maintenance, don't you still need it for the indexed loads? That means extra overhead to deal with Y being used for two different things each scan line.

     

    Of course, there's always bus stuffing. PM me if you ever want to switch over to the dark side.


  2. Sooooooooooo bankswitching is infinite and I can literally demake Doom on a 2600? Awesome! Wait, no, that's probably too extreme...maybe I should leave it at Super Mario Bros. 3.

    Trading ROM space for performance has its limits, but with other cart tech, like the ARM processor in the Harmony Encore, an FPS on the 2600 is completely feasible. So is Super Mario Bros. 3.


  3. I am about to develop a 2LK for the "Lock and Load" part of the game, the second part, which will feature:

    • 2 coloured sprites
    • 1 missile
    • a coloured asymmetrical playfield

    A few questions about this.

     

    1. Will the playfield be drawn between bands?
    2. Is the missile going to be limited to just a few pixels per band?
    3. How tall did you plan on making the PF pixels?
    4. What's the largest/tallest PF pixel you'd accept?
    5. Will there be more than one PF image for this mode of the game?
    6. Can the PF be limited to the middle 32 PF pixels so that updating PF0 can be avoided?
    7. What are you trying to draw with the PF?

  4. Plenty of time since you only execute one code path or the other.

     

    Speaking of time: since you're using bands, there's no need for flipdraw or any other overhead on P1. Just pad the graphics and color data with a few bytes if you want to allow a small amount of vertical movement, then offset the pointer to set the vertical position within each band. Another nice thing about the bands is that you can use mask draw for P0 with a mask that's 3 * band height, so a 32-line band would only need 96 bytes for the mask. Then you can do P0 with only 5 cycles of overhead. Just as important, this gets rid of all the branches and makes it a lot easier to make the kernel exactly 76 cycles.

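    ; Updates both sprites in 37 cycles (assuming no page crossings)
    lda (ptrgrp0),y     ; 5  P0 graphics for this line
    and (ptrmask),y     ; 5  mask it off outside the sprite
    sta GRP0            ; 3
    lda (ptrcol0),y     ; 5  P0 color for this line
    sta COLUP0          ; 3
    lda (ptrgrp1),y     ; 5  P1 graphics (padded data, no mask needed)
    sta GRP1            ; 3
    lda (ptrcol1),y     ; 5  P1 color for this line
    sta COLUP1          ; 3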
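
To make the 96-byte figure concrete, here is a minimal Python model of one plausible mask layout. It assumes the table is a run of $00, then a run of $FF, then a run of $00, each one band height long, and that vertical positioning is done by sliding the start of the mask window. The names `build_mask` and `visible_lines` are made up for this sketch and are not from the kernel under discussion.

```python
BAND_HEIGHT = 32  # lines per band, as in the 32-line example above


def build_mask(band_height):
    """Assumed layout: band_height zeros, band_height $FF bytes, band_height zeros."""
    return bytes([0x00] * band_height + [0xFF] * band_height + [0x00] * band_height)


def visible_lines(mask, window_start, band_height):
    """Lines of the band where the mask lets the sprite through.

    window_start models the offset added to the mask pointer;
    y models the index register used by `and (ptrmask),y`.
    """
    return [y for y in range(band_height) if mask[window_start + y] == 0xFF]


mask = build_mask(BAND_HEIGHT)
print(len(mask))                                    # table size in bytes
print(visible_lines(mask, BAND_HEIGHT, BAND_HEIGHT))  # window fully inside the $FF run
print(visible_lines(mask, 40, BAND_HEIGHT))         # window shifted by 8 lines
```

With this layout any window start from 0 to 2 * band height is valid, which is why the table is three band heights long: the sprite can slide fully out of the band in either direction without a branch.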
    
    

    One more thing to consider is how P1 is being positioned between bands. If you want to draw anything between the bands, you may need several kernel fragments that all do the same thing but strobe RESP1 at different times. Beforehand, you figure out which fragment to jump to based on where P1 is to be positioned, then use a few lines of HMOVE for fine positioning. With one line of HMOVE you'd need 160/15, rounded up, = 11 fragments, but with 3 lines of HMOVE you'd only need 160/45, rounded up, = 4 different fragments. You'd be left with at least 4 lines between each P1, but you gain the potential for some playfield or other enhancements.
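
The fragment counts are just a ceiling division. A small sketch, assuming 160 visible color clocks and roughly 15 clocks of HMOVE reach per line as used in the figures above (`fragments_needed` is a name invented for this example):

```python
import math


def fragments_needed(hmove_lines, visible_clocks=160, clocks_per_hmove=15):
    """Coarse RESP1 fragments needed so HMOVE can cover the gaps between them."""
    reach = hmove_lines * clocks_per_hmove  # total fine-positioning range
    return math.ceil(visible_clocks / reach)


for lines in (1, 2, 3):
    print(lines, "HMOVE line(s):", fragments_needed(lines), "fragments")
```

Each extra HMOVE line costs a scan line between bands but cuts the number of kernel fragments, which is the ROM-versus-screen-space trade described above.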


  5. I'm not saying you should get rid of the branch. I'm saying that the branch prevents the P0 graphics from being preloaded for the next line. So instead of 0 you get a garbage value. Maybe in pseudo code it could be easier to see.

     

    if(drawP0) then
      y = graphicForNextLine
    end if

    GRP0 = y   <- y is only set if the condition is true

     

    Instead what you want is this

     

     

    if(drawP0) then
      y = graphicForNextLine
    else
      y = 0
    end if

    GRP0 = y   <- now y is 0 if it's not time to draw P0

    Diff your previously posted asm file with this one to see the small change that makes all the difference: guerrilla-fix-y.asm

     


  6. You have a branch that is skipping the code which loads the value for GRP0 into Y. This is why Y never contains the graphics data. The use of LDA (PTR),Y appears to be fine.

        lda #P0_HEIGHT      ; 2 47
        dcp Player0Offset   ; 5 52
        bcc .p0FlipDraw     ; 2 54 (3 55) <- Branch is taken, leaving Y with offset value
        ldy Player0Offset   ; 3 57

        lda (Player0Clr),y  ; 5 62
        sta COLUP0          ; 3 65
        lda (Player0Ptr),y  ; 5 70
        tay                 ; 2 72

    .p0FlipDraw             ;(3 55)
    

  7. That sample sounds amazing, by the way. Is it writing to the audio registers every scanline, at about 15000Hz?

     

    Yes, AUDC0 and AUDC1 are set to 0 during initialization and then AUDV0 and AUDV1 are written every scan line. Fortunately, the addresses of the AUDVx registers are just before the GRPx registers, so the JSR trick can be used twice in a row to update 4 registers before restoring the SP back to GRP1. This demo only uses 4-bit samples, though. $00 is always written to AUDV1, but that's only because I originally planned on doing 4-bit and only asked for 4-bit samples.
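
As a rough check on the "about 15000Hz" figure, one write per scan line gives a sample rate equal to the line rate. The sketch below assumes standard NTSC timing (3.579545 MHz color clock, 228 color clocks per line); PAL consoles would differ slightly.

```python
NTSC_COLOR_CLOCK_HZ = 3_579_545  # assumed NTSC color clock
COLOR_CLOCKS_PER_LINE = 228      # 160 visible + 68 horizontal blank
CLOCKS_PER_CPU_CYCLE = 3         # the CPU runs at one third of the color clock


def scanline_rate_hz(color_clock_hz=NTSC_COLOR_CLOCK_HZ):
    """Scan lines per second, which is the sample rate at one write per line."""
    return color_clock_hz / COLOR_CLOCKS_PER_LINE


print(round(scanline_rate_hz(), 1))                    # samples per second
print(COLOR_CLOCKS_PER_LINE // CLOCKS_PER_CPU_CYCLE)   # CPU cycles per line
```

So the effective sample rate is a little under 15.7 kHz, and every sample has to fit inside the same 76-cycle line budget as the rest of the kernel.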


  8. Tested version 3 on my 7800 and it works, although there's still some warm-up time to make it perfectly stable.

     

    Sadly, it doesn't seem to fix my problematic Jr. This sequence plays in a loop, so I took a video of one iteration. It looks okay for a bit, then goes into a loop, looks okay again for a bit, then does it again.

    Ok, version 4 has improved the detection algorithm based on your feedback. Please try it out.


  9. Sadly, it doesn't seem to fix my problematic Jr. This sequence plays in a loop, so I took a video of one iteration. It looks okay for a bit, then goes into a loop, looks okay again for a bit, then does it again.

     

    Thanks for posting the video. That is very helpful. During the stuff-low sweep D6 should be detected as a failure but it's not failing when $00 is stuffed. Looks like this is the same problem that alex_79 has. Had it been properly detected I'm certain the correction routine would have fixed it.

     

    Should be a simple change to include all 128 values in the detection routine. I'll try to post a new version soon.


  10. Tested version 3 on my 7800 and it works, although there's still some warm-up time to make it perfectly stable. It just takes a few seconds (like, between 5 and 10), not over two minutes like in a previous test ROM (here's another example with the same console).

     

    Here is the result with the console just turned on:

    [attachment: cold.jpg]

    The spots indicated by the arrows (that in the picture are on), actually flicker on and off for that short warm up time.

     

    After that the image is stable and it's like this:

    [attachment: warm.jpg]

     

     

    I must add that I just noticed that my 7800 shows slight graphics corruption in the "2600BC" demo credits, which makes me wonder if it is starting to fail and if its behaviour is therefore the result of that fact, rather than a different hardware revision. (Sorry for the flash. Seems to be the only way to avoid the blur caused by the scrolling text.)

    [attachment: P1070064.JPG]

    [attachment: P1070066.JPG]

    [attachment: P1070067.JPG]

     

    How many low and high failures were there during the first detection phase? And which bits failed? If you look at Hobo's screenshot you can see that D6 had a stuff-high failure. I assume this would have also appeared as a failure on the stuff-low step before it.

     

    If detection is working properly it indicates your system had 3 bits which failed both high and low. Two of them would have been corrected by varying which register is stored and the third would still appear as a glitch. Most significant bits are corrected first in order to minimize the magnitude of the glitches. For PF it doesn't matter, but for color, move and other TIA registers it would.
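
The selection logic described here can be modelled in a few lines. This is only my reading of the posts, not the actual driver code: a bit is stuffed low if that works, stuffed high if only that works, and bits that fail both ways are handed to the register-based correction, most significant first, up to the limit of 2 bits mentioned for the multiple-register idea. `plan_corrections` and the strategy names are invented for the sketch.

```python
def plan_corrections(low_failures, high_failures, max_register_fixes=2):
    """Pick a strategy per data bit (D7..D0) from the detection results.

    low_failures / high_failures: sets of bit numbers that failed each sweep.
    Returns a dict mapping bit number to a strategy name.
    """
    plan = {}
    failed_both = []
    for bit in range(7, -1, -1):          # walk from D7 down to D0
        if bit not in low_failures:
            plan[bit] = "stuff-low"
        elif bit not in high_failures:
            plan[bit] = "stuff-high"
        else:
            failed_both.append(bit)       # already in MSB-first order
    for rank, bit in enumerate(failed_both):
        # Most significant bits get the limited register fixes first,
        # so any remaining glitch has the smallest possible magnitude.
        plan[bit] = "register-fix" if rank < max_register_fixes else "glitch"
    return plan


# Three bits failing both sweeps, as in the case discussed above.
print(plan_corrections({1, 4, 6}, {1, 4, 6}))
```

In that example D6 and D4 get the register fix and D1 is left as the visible glitch, which matches the "two corrected, third still glitches" outcome described above.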

     

    If it's a detection issue, I could cycle through all 128 values for each bit a few times to improve the chance of detecting the intermittent failures. It appears that this failure only occurs when the value being stuffed is $fc, and even then it's only some of the time. I was also thinking about including a mechanism to allow the detection to be rerun at any given time. Just in case something changes after playing a game for a while.

     

    In my notes from the last round of testing it was the systems that TheHoboInYourRoom and alex_79 tested which were the most problematic. One is already working completely and the other is very close. We'll need to refine the driver some and test across a broader range of systems. Looks like we will be successful soon enough.


  11. Version 3 is a success, with a little interesting warm-up behavior.

    [attachment: bustest-1-a.jpg]

     

    Now, that first photo was taken at the end of the first cycle less than 30 seconds after turning the Atari on, so it was still cold. But a couple bits flickered slightly while the tests were running: D6 of PF1 started flickering near the end of the stuff-low test and continued during the stuff-high test, and D6 of PF2 started flickering (during the stuff-high test) about a minute after the PF1 bit did.

    [attachment: bustest-1-b.jpg]

     

    After the Atari warmed up from playing a game, there was no flickering at all. However, the image at the end of the cycle was never incorrect.

     

    Ok, great. This is exactly what we want to see. The failures were properly detected and compensated for. Obviously the detection doesn't need to be visible or take so long but for debugging purposes it's nice to see it in action. The correction isn't applied until after the detection phase. So it's normal to see glitches at that time. The important thing is that the test pattern was always correct.


  12. Another failure on my Rev. E Jr.

    After a few cycles, the scanning bar started jittering back and forth by a couple color clocks (to the right) as it scanned. This test is actually more successful when my Atari is cold (D0 of both PF1 and PF2 were stuffed correctly before this photo was taken).

    [attachment: bustest-0.jpg]

     

    Was this with the updated version I posted this morning? DirtyHairy discovered a bug which caused things to get worse after the first cycle on systems with at least one stuff-low failure. Version 2 should fix that at least.

     

    Perhaps the stuff-high code has a bug as well. I'll review the code again to be sure. If the code was working properly that screen shot would indicate that 6 of the 8 bits can't be stuffed high or low. Based on your previous test that seems to indicate the code is not working correctly.


  13. Success... sort of :) On my PAL Jr. the first pass reliably ends with the correct test picture. All subsequent runs display two black bars (see my photo below) but, as the first test picture is fine, I suspect that this is not an issue with the algorithm itself, but merely a bug in the subsequent passes.

     

    [attachment: IMG_20171219_233431.jpg]

    Would you post a video of the first few iterations? Hopefully it can help me find the bug.

     

    This is exciting. We may be close to a workable solution.


  14. Here's another attempt which combines my prototype driver, stuffing high and low, detection of failures, and an idea Fred had to use multiple registers and illegal opcodes to correct up to 2 bits. Hopefully this will work on all the machines that previously ran into issues.

     

    This must be run on a harmony cart. Emulators will not know how to load it.

     

    Current build:

    stuff-with-detection-and-correction4.bin

     

    Previous builds:

    stuff-with-detection-and-correction3.bin

    stuff-with-detection-and-correction2.bin

    stuff-with-detection-and-correction1.bin

     

    When you first load the ROM it will show some vertical lines on the sides and a test pattern in PF1; PF2 should be black. A blue bar will sweep across part of the screen. This is the program using collision detection to determine if there are any bits which can't be stuffed low. The blue bar should turn red if a failure is detected.

     

    Next, PF2 should be all on. Any bits that were detected as low failures will now switch to being stuffed high. The blue bar sweeps the screen again, this time to detect high failures, and will turn red as soon as it finds any.

     

    Once the detection is complete, the algorithm will find the optimal combination of stuffing low, high, and using different store instructions. It will then attempt to display the test pattern with this optimal combination. If all is well, the only artifacts should be the vertical bars on the sides of the screen where PF0 is. The rest of it should look like the picture below.

     

    The program will repeat a cycle of low detection, high detection, test pattern indefinitely. However, the correction values are retained, so there shouldn't be any more failures detected after the first iteration if it finds a correct combination to use.

     

    post-40226-0-51406900-1513718580_thumb.png

     

    Edit:

    12/21 version 4 improves failure detection so stuffing high and low are checked with all 128 values for each data bit multiple times. Should now detect intermittent failures much better.

    12/20 version 3 makes P0 positioning more robust, completely restarts detection each cycle, fixes stuff-high driver bug

    Version 2 fixes bug that corrupted state in subsequent passes

     


  15. Figured out the problem with the JSR function. I forgot to account for mirroring: the driver was waiting for an access to $1a instead of $011a. Version 4 has the properly fixed driver and updated JSR function. At this point I believe this should be compatible with all systems. I'm thinking about underclocking the ARM processor during development just to add some margin for error.
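
For anyone following along, the mirroring can be illustrated with a simplified model of the standard 2600 address decoding. The assumptions here are the usual ones: the 6507 exposes 13 address lines, A12 selects the cartridge, and with A12 low, A7 low selects the TIA while A7 high with A9 low selects RIOT RAM. The function name `decode_2600_address` is made up for this sketch. Stack pushes from JSR go to page $01, so a write the CPU "means" for $1a shows up on the bus as $011a.

```python
def decode_2600_address(addr):
    """Return (chip, offset) for a CPU address under the simplified decode above."""
    addr &= 0x1FFF                   # only A0..A12 leave the 6507
    if addr & 0x1000:                # A12 high: cartridge ROM
        return ("ROM", addr & 0x0FFF)
    if not addr & 0x0080:            # A12 low, A7 low: TIA (6 register address bits)
        return ("TIA", addr & 0x3F)
    if not addr & 0x0200:            # A7 high, A9 low: RIOT RAM (128 bytes)
        return ("RAM", addr & 0x7F)
    return ("RIOT_IO", addr & 0x7F)  # A7 high, A9 high: RIOT I/O and timer


# A zero-page access and a stack-page access hit the same TIA register.
print(decode_2600_address(0x001A))
print(decode_2600_address(0x011A))
```

Both calls resolve to the same TIA register, so a bus-watching driver has to match on the mirrored stack-page address rather than the zero-page one when the write comes from a JSR push.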

     

    There's also some additional audio in version 4. Turns out I had a lot more space left in the ROM than I realized. There are still some things to work out with the linker script.


  16. With the new build both Harmony & Encore show this, and the lines move upward on the display.

     

    [attachment: IMG_9580.jpg]

     

     

    Awesome demo! Though I get the same as Spiceware on my 4 Switch VCS with the new build.

     

     

    Same for me on my wood-grain 4-switch. First build shows either a black or dark brown screen, sometimes with a few vertical stripes. Second build is black with dots scrolling up both edges of the screen.

     

    Note, I am still using Harmony version 1.06. I am on Arch Linux, and there is no AUR package for HarmonyCart. I am reading up on making PKGBUILD files, so I can add it to the AUR for other Arch users, although the handful of people who use Arch has very little crossover with the handful of people who have a Harmony.

     

    Thanks for all the feedback. This was certainly a tricky one. Turns out the problem was the hold time of the last ROM byte injected before a switch to zeropage. The driver change that I made in version 2 was correct, but it resulted in a breaking change to the JSR function. Unfortunately I ran the wrong bin file and thought version 2 was working with the driver change. Turns out version 2 doesn't work anywhere :(

     

    I'm still trying to figure out how to fix the JSR function to work with the fixed driver, but for now I just hacked the original driver to waste some cycles before tristating the data bus. This seems to resolve it, but it's not as robust as it should be and I will fix it right once I figure out this JSR problem.

     

    What's interesting is why it worked in my testing but doesn't work for anyone else. I still had the harmony cart plugged into my test harness which allows the logic analyzer to be attached to the Atari busses. As CPUWIZ pointed out a long time ago, these "mile long" wires could cause problems. In this case it caused the hold time to increase artificially and compensated for the flawed driver. I plugged my harmony cart directly into the 7800 and then the problem appeared for me as well. I apologize for this testing failure on my part. Obviously I will test in this configuration from now on and reserve the harness for debugging purposes only.

     

    I've uploaded version 3 which uses the hack to extend the hold time and works when plugging the harmony directly into the Atari. Hopefully this will work for everyone now and serve as a reward for helping me find this problem.
