jaglib and DSP program offsets

Luigi301 · October 6, 2017

What is the proper usage of the A1_CLIP register? I have a 320x200 pixel buffer that I'm blitting my sprites to so I set A1_CLIP to 320x200 and enabled the CLIP_A1 flag in B_CMD. Now for some reason the blitter is only drawing the first line of each sprite? If I don't set CLIP_A1, the sprites are drawn correctly. I don't see anything about that in the manual or on the jagdox blitter page. My B_CMD value is (SRCEN | DSTEN | UPDA1 | UPDA2 | CLIP_A1 | DCOMPEN | LFU_REPLACE).

Edited October 6, 2017 by Luigi301

JagChris · October 6, 2017

This may help you.

http://www.mulle-kybernetik.com/jagdox/blitter.html

http://www.mulle-kybernetik.com/jagdox/dox.html

Whoops nm you already found jagdox

Edited October 6, 2017 by JagChris

Luigi301 · October 11, 2017

That doesn't clarify much. I'm still not sure why it doesn't actually clip to the destination bitmap dimensions I put down.

I'm getting better at GPU programming though, I ported the Amiga's linked list library over to the Jaguar and rewrote the 68K list traversal macros in GPU code so I can have the GPU process sprite display lists now. Yay?

Edited October 11, 2017 by Luigi301

JagChris · October 11, 2017

And since no one is responding I'm guessing you must have gone somewhere that nobody around here has gone before.

If you haven't already, you can check out existing source code and see if there are some clues in there.

http://www.3do.cdinteractive.co.uk/viewtopic.php?f=35&t=3430

If you see any there you are interested in that are listed but not accessible anymore let me know and I will get them to you because I still have them all except for Myst.

Sporadic · October 11, 2017

From what you have shown here, it looks like you are using the clip register correctly. You set it to the size of the window to clip to, and it's always positioned at 0,0, so setting it to your buffer size is correct. I would perhaps check that you packed the 320x200 correctly (width in the low word, height bit-shifted into the high word).

There are also bugs around using the clip stuff but I don't think any appear as what you've shown above.
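
For reference, a minimal sketch of packing the clip window. It assumes the A1_CLIP layout as I understand it from the Jaguar Technical Reference (width in bits 0-14, height in bits 16-30), so check that against your copy of the manual. `pack_clip_window` is a hypothetical helper, not part of jaglib, and the actual register write is left out.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: pack a clip window size for A1_CLIP.
 * Assumed layout: width in bits 0-14, height in bits 16-30. */
static uint32_t pack_clip_window(uint16_t width, uint16_t height)
{
    /* Each field is assumed to be 15 bits wide. */
    assert(width <= 0x7FFF && height <= 0x7FFF);
    return ((uint32_t)height << 16) | width;
}
```

Under that assumption, a 320x200 buffer packs to 0x00C80140.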

Luigi301 · October 13, 2017

Hmm. Well, I've also managed to break my GPU program in such a way that it works in Virtual Jaguar but doesn't blit anything on real hardware, too. That's... annoying to debug.

+CyranoJ · October 13, 2017

Try slow/fast blitter in VJ.

Zerosquare · October 13, 2017

Or ask Shamus to port VJ to the Jag.

Luigi301 · October 14, 2017

Yeah, it's a sync problem with the CPU and GPU both trying to blit at the same time.

Luigi301 · October 14, 2017

Nope, that wasn't it. The problem is that VBCC only word-aligns heap variables and the GPU's load instructions want them to be long-aligned. And VBCC doesn't support the GCC aligned attribute. I know some C compilers will long-align longs (or structs that begin with longs) implicitly but I haven't been able to massage VBCC into doing that. I'd need a GCC version of libc for Jaguar to recompile it in GCC.

Edited October 14, 2017 by Luigi301
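
One compiler-independent workaround for this kind of alignment problem, sketched under assumptions: over-allocate by a few bytes and round the address up to the next multiple of four by hand. `align_up`, `node_storage` and `aligned_node_storage` are hypothetical names for illustration; nothing here is specific to VBCC or jaglib.

```c
#include <stddef.h>
#include <stdint.h>

/* Round addr up to the next multiple of align.
 * align must be a power of two (4 for long alignment). */
static uintptr_t align_up(uintptr_t addr, uintptr_t align)
{
    return (addr + (align - 1)) & ~(align - 1);
}

/* Hypothetical example: a byte buffer that the compiler may have placed
 * on only a word (2-byte) boundary, padded by 3 bytes so that a
 * long-aligned 64-byte block always fits inside it. */
static unsigned char node_storage[64 + 3];

static void *aligned_node_storage(void)
{
    return (void *)align_up((uintptr_t)node_storage, 4);
}
```

The same rounding can be applied to a pointer returned by malloc, as long as the original pointer is kept around for free.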

ggn · October 14, 2017

Raptor BASIC+ actually has gcc 4.6.4 + libc compiled and used as part of its build process. If you want, you can take a look at https://github.com/ggnkua/bcx-basic-Jaguar (start from build.bat and you'll see where the calls to vbcc and rln are being made to compile and link stuff). If you are interested and cannot understand what's happening, I could make a stripped-down version.

Luigi301 · October 15, 2017

Oh, I can look into that, thanks.

Either way, I managed to get it working. The problem was that the pointer in my actual node structure that makes up the display list wasn't long-aligned. Now I've got the real hardware display working again.

Luigi301 · October 23, 2017

I'm writing a vertical shooter and I want to set up a scrolling tilemapped playfield with the object processor. I have a 320x200 window with a 272x200 area I want to scroll, so 17x13 = 221 8bpp tiles on screen, plus a 1bpp text layer. Can the object processor handle that? I don't really know how much DMA time is available on a scanline.

+CyranoJ · October 23, 2017

Easily. But you'll have to use branch objects. Check "Project One" on the reboot site.

RAPTOR API, and by extension, Raptor BASIC+ also have a built in tilemap engine capable of doing what you have described.

Luigi301 · October 23, 2017

Hmm, okay. I think I understand...

The object list needs to start with branch objects that branch to the correct row for this scanline. The last tile in the row branches to whatever is after the tilemap (in my case, a font layer bitmap). So the object processor is only processing the bitmaps that make up a specific row rather than everything. I seem to be having a problem resetting the objects' pixel data pointer and height on each frame though. I tried putting it in my VBlank interrupt but it doesn't seem to be fast enough. I set it to 16x5 identical tiles and it seems to be running out of time.

I'll look at the Project One code and see if I can figure out what's going on. I'm just doing


mobj_background.graphic->p0.height = 200;
mobj_sprites.graphic->p0.height = 200;
mobj_font.graphic->p0.height = 200;


for(uint8_t i=0; i<BACKGROUND_TILES_WIDE*BACKGROUND_TILES_TALL; i++){
  mobj_bg_tiles[i].graphic->p0.height = 16;
  mobj_bg_tiles[i].graphic->p0.data = (uint32_t)shipsheet >> 3;
}

during VBlank.

Edited October 23, 2017 by Luigi301

+CyranoJ · October 23, 2017

The only realistic way to update all that per frame is on the GPU.

Note, that if you branch half way down a bitmap, the OP will render from the top of the image, so you need to clip objects depending on which segment they are in.
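
The clipping described here comes down to a little arithmetic per tile. A rough sketch, assuming the segment starts `skip` lines into the tile: advance the pixel data address by the skipped lines and shorten the height to match. `tile_slice_t`, `clip_tile_top` and the `bytes_per_line` parameter are hypothetical names for illustration, not fields of the object processor or of jaglib.

```c
#include <stdint.h>

typedef struct {
    uint32_t data_offset; /* bytes to add to the tile's pixel data address */
    uint16_t height;      /* lines left to draw */
} tile_slice_t;

/* Clip the top `skip` lines off a tile that is `tile_height` lines tall,
 * with `bytes_per_line` bytes between consecutive lines of pixel data. */
static tile_slice_t clip_tile_top(uint16_t skip, uint16_t tile_height,
                                  uint16_t bytes_per_line)
{
    tile_slice_t s;
    if (skip >= tile_height) {
        /* The whole tile is above the segment: nothing left to draw. */
        s.data_offset = 0;
        s.height = 0;
    } else {
        s.data_offset = (uint32_t)skip * bytes_per_line;
        s.height = (uint16_t)(tile_height - skip);
    }
    return s;
}
```

For a 16-line 8bpp tile stored 16 bytes per line, skipping 5 lines gives an offset of 80 bytes and a height of 11. Since the data field in the code above is the address shifted right by 3, the offset has to stay a multiple of 8 bytes.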

Luigi301 · October 23, 2017

So reset the bitmap structures with the GPU when we hit the VBL rather than the 68k? Okay, I'll give that a try.

Luigi301 · October 25, 2017

It works! I've got a tilemapped background layer.

(The sprite rendering and background rendering programs collide if there are too many sprites on screen but I can fix that another time.)

Luigi301 · November 12, 2017

My 3D code is getting better... I'm using all three processors in parallel though not in an optimized fashion and they definitely block the CPU when they don't need to. The rotation matrix is calculated by the DSP, matrix multiplication is done by the GPU, and lines are drawn with the blitter on the GPU.

Luigi301 · November 27, 2017

Took me a while but I've got perspective projection now! It's damn slow and something's wrong with my matrix that's making it fisheye but it's still cool.

Luigi301 · November 29, 2017

Rotate the cube!

Uh...

Edited November 29, 2017 by Luigi301

Luigi301 · December 5, 2017

I've got most of the render loop running on the GPU now. The perspective matrix and view matrix are precalculated at the start of a frame. The CPU gets an object and its position, yaw/pitch/roll, and scale parameters and passes them to the GPU. The GPU uses these to calculate the model's transform matrix, then calculates the full transformation matrix. Right now the CPU passes each of the object's triangles in turn to the GPU, where they're combined with the transformation matrix, projected, and blitted by the blitter. The next step is to just pass the full list of triangles to the GPU instead.

But look how fast it is now!
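
The step of combining matrices into one full transformation matrix can be sketched in C with 16.16 fixed-point values. This is only an illustration under assumptions: the thread does not show the actual matrix layout, so the row-major 4x4 arrays and the names `fx16`, `fx_mul` and `mat4_mul` are hypothetical.

```c
#include <stdint.h>

typedef int32_t fx16;            /* assumed format: signed 16.16 fixed point */
#define FX_ONE ((fx16)1 << 16)   /* 1.0 in 16.16 */

/* Multiply two 16.16 values using a 64-bit intermediate.
 * Relies on >> being an arithmetic shift for negative values,
 * which is the case on common compilers. */
static fx16 fx_mul(fx16 a, fx16 b)
{
    return (fx16)(((int64_t)a * b) >> 16);
}

/* out = a * b for row-major 4x4 matrices; out must not alias a or b. */
static void mat4_mul(fx16 out[4][4], fx16 a[4][4], fx16 b[4][4])
{
    for (int r = 0; r < 4; r++) {
        for (int c = 0; c < 4; c++) {
            fx16 sum = 0;
            for (int k = 0; k < 4; k++)
                sum += fx_mul(a[r][k], b[k][c]);
            out[r][c] = sum;
        }
    }
}
```

On the GPU itself the 64-bit intermediate is not available in this form, so a real port would need a different multiply; the sketch is only meant to pin down the arithmetic.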

Luigi301 · December 9, 2017

The whole render loop is now running on the GPU. It's still two separate programs (one to produce the transformation matrix and one to project and blit the polygons) but it's faaaaaast now! Removing all the DSP code means I can bolt on the U235 sound engine, too.

Luigi301 · December 11, 2017

Well, this was fun. Working on back-face culling, so I need to calculate normals of polygons. That means I need to do square roots... of fixed-point numbers... on the GPU. Translating the C to GPU code took about an hour and a half.

C:

#define FRACBITS 16
#define ITERS (15 + (FRACBITS >> 1))
FIXED_32 FIXED_SQRT(FIXED_32 val)
{
    //http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.178.3957&rep=rep1&type=pdf
    uint32_t root, remHi, remLo, testDiv, count;
   
    root = 0;    //Clear root
    remHi = 0;   //Clear high part of partial remainder
    remLo = val; //Get argument into low part of partial remainder
    count = ITERS;  //16.16 number
   
    do {
        remHi = (remHi << 2) | (remLo >> 30); remLo <<= 2;
        root <<= 1;
        testDiv = (root << 1) + 1; //test radical
        if(remHi >= testDiv) {
            remHi -= testDiv;
            root++;
        }
    } while(count-- != 0);
           
    return root;
}
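
To sanity-check the routine on a host machine, here is a self-contained sketch. It assumes `FIXED_32` is an unsigned 32-bit value in 16.16 format, since the typedef is not shown in the post. `fixed_sqrt_ref` is a hypothetical name for a copy of the same loop.

```c
#include <stdint.h>

typedef uint32_t FIXED_32;   /* assumed: unsigned 16.16 fixed point */
#define SQRT_FRACBITS 16
#define SQRT_ITERS (15 + (SQRT_FRACBITS >> 1))

static FIXED_32 fixed_sqrt_ref(FIXED_32 val)
{
    uint32_t root = 0, remHi = 0, remLo = val, testDiv;
    uint32_t count = SQRT_ITERS;

    do {
        /* Bring the next two bits of the argument into the remainder. */
        remHi = (remHi << 2) | (remLo >> 30);
        remLo <<= 2;
        root <<= 1;
        testDiv = (root << 1) + 1;   /* candidate for the next result bit */
        if (remHi >= testDiv) {
            remHi -= testDiv;
            root++;
        }
    } while (count-- != 0);

    return root;
}
```

With these assumptions, `fixed_sqrt_ref(0x00040000)` (4.0) returns `0x00020000` (2.0).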

GPU:

FIXED_SQRT:
    ;; Calculate the square root of the fixed-point number in r0.
    ;; Returns the result in r0.
    FRACBITS        .equ    16
    ITERS           .equ    (15 + (FRACBITS >> 1))
 
    SQRT_ROOT       .equr   r20
    SQRT_REM_HI     .equr   r21
    SQRT_REM_LO     .equr   r22
    SQRT_TEST_DIV   .equr   r23
    SQRT_COUNT      .equr   r24
 
    SQRT_THIRTY     .equr   r25
    SQRT_LOOP_CHECK .equr   r29
    SQRT_LOOP_ADDR  .equr   r30
 
    moveq   #0,SQRT_ROOT
    moveq   #0,SQRT_REM_HI
    move    r0,SQRT_REM_LO
    moveq   #ITERS,SQRT_COUNT
 
    moveq   #30,SQRT_THIRTY
    movei   #.sqrt_loop,SQRT_LOOP_ADDR
    movei   #.sqrt_do_loop,SQRT_LOOP_CHECK
 
.sqrt_loop:
    shlq    #2,SQRT_REM_HI
    move    SQRT_REM_LO,TEMP1
    sh      SQRT_THIRTY,TEMP1
    or      TEMP1,SQRT_REM_HI
    shlq    #2,SQRT_REM_LO
 
    shlq    #1,SQRT_ROOT
    move    SQRT_ROOT,SQRT_TEST_DIV
    shlq    #1,SQRT_TEST_DIV
    addq    #1,SQRT_TEST_DIV
 
    cmp     SQRT_TEST_DIV,SQRT_REM_HI
    jump    ge,(SQRT_LOOP_CHECK) ;if remHi >= testDiv
    nop
 
    sub     SQRT_TEST_DIV,SQRT_REM_HI
    addq    #1,SQRT_ROOT
 
.sqrt_do_loop:
    subq    #1,SQRT_COUNT
 
    cmpq    #-1,SQRT_COUNT
    jump    ne,(SQRT_LOOP_ADDR) ; if not -1, keep looping
    nop
 
    move    SQRT_ROOT,r0
 
    GPU_RTS

Edited December 11, 2017 by Luigi301

Tursi · December 12, 2017

Don't know if this helps since you've already ported the code, but one of the classic ways to determine back faces was simply to determine if the points of your triangle are sorted clock-wise or counter-clockwise after projection. You can do that with just a couple of comparisons.
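
A common way to implement this winding test is the sign of the 2D cross product of two edges of the projected triangle, which costs two multiplies and a compare rather than a square root. A minimal sketch, assuming integer screen coordinates and that front faces have positive signed area; which sign counts as front depends on your vertex order and on whether screen y grows downward. `signed_area2` and `is_back_face` are hypothetical names.

```c
#include <stdint.h>

/* Twice the signed area of the projected triangle (x0,y0)-(x1,y1)-(x2,y2).
 * Operands are widened to 32 bits before subtracting so the code does not
 * depend on the width of int. */
static int32_t signed_area2(int16_t x0, int16_t y0,
                            int16_t x1, int16_t y1,
                            int16_t x2, int16_t y2)
{
    int32_t ax = (int32_t)x1 - x0, ay = (int32_t)y1 - y0;
    int32_t bx = (int32_t)x2 - x0, by = (int32_t)y2 - y0;
    return ax * by - bx * ay;
}

/* Assumed convention: positive area means front-facing.
 * Zero-area (edge-on) triangles are culled as well. */
static int is_back_face(int16_t x0, int16_t y0,
                        int16_t x1, int16_t y1,
                        int16_t x2, int16_t y2)
{
    return signed_area2(x0, y0, x1, y1, x2, y2) <= 0;
}
```

The products stay within 32 bits as long as the projected coordinates are within a few thousand pixels of each other, which holds for anything near a 320x200 screen.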
