
jaglib and DSP program offsets


Luigi301


What is the proper usage of the A1_CLIP register? I have a 320x200 pixel buffer that I'm blitting my sprites to so I set A1_CLIP to 320x200 and enabled the CLIP_A1 flag in B_CMD. Now for some reason the blitter is only drawing the first line of each sprite? If I don't set CLIP_A1, the sprites are drawn correctly. I don't see anything about that in the manual or on the jagdox blitter page. My B_CMD value is (SRCEN | DSTEN | UPDA1 | UPDA2 | CLIP_A1 | DCOMPEN | LFU_REPLACE).

 

SjnkKWa.png

Edited by Luigi301

That doesn't clarify much. I'm still not sure why it doesn't actually clip to the destination bitmap dimensions I put down.

 

I'm getting better at GPU programming though, I ported the Amiga's linked list library over to the Jaguar and rewrote the 68K list traversal macros in GPU code so I can have the GPU process sprite display lists now. Yay?

Edited by Luigi301

And since no one is responding I'm guessing you must have gone somewhere that nobody around here has gone before.

 

If you haven't already, you can check the existing source code and see if there are some clues in there.

 

http://www.3do.cdinteractive.co.uk/viewtopic.php?f=35&t=3430

 

If any that interest you are listed there but no longer accessible, let me know and I will get them to you; I still have them all except for Myst.


From what you have shown here, it looks like you are using the clip register correctly. You set it to the size of the clipping window, and the window is always positioned at 0,0, so setting it to your buffer size is correct. I would check that you packed the 320x200 value correctly (bit shifting the 320).

 

There are also bugs around clipping, but I don't think any of them look like what you've shown above.
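For anyone checking their own value, here is a minimal sketch of packing a clip window into one 32-bit word. It assumes the usual Jaguar blitter convention of X (width) in the low word and Y (height) in the high word, 15 significant bits each; that layout is an assumption here and should be verified against the technical reference. pack_a1_clip is a hypothetical helper, not something from jaglib.

```c
#include <stdint.h>

/* Hypothetical helper, not part of jaglib.
 * ASSUMPTION: width lives in bits 0-14 and height in bits 16-30.
 * Check this field layout against the hardware documentation. */
static uint32_t pack_a1_clip(uint32_t width, uint32_t height)
{
    return ((height & 0x7FFFu) << 16) | (width & 0x7FFFu);
}
```

Under that assumption a 320x200 window packs to 0x00C80140, so a value with the two fields swapped or unshifted would be easy to spot in a debugger.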


Nope, that wasn't it. The problem is that VBCC only word-aligns heap variables and the GPU's load instructions want them to be long-aligned. And VBCC doesn't support the GCC aligned attribute. I know some C compilers will long-align longs (or structs that begin with longs) implicitly but I haven't been able to massage VBCC into doing that. I'd need a GCC version of libc for Jaguar to recompile it in GCC.
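One compiler-independent workaround, sketched here as an assumption rather than anything VBCC or jaglib provides, is to over-allocate by three bytes and round the address up to the next multiple of four. align_up and long_align are hypothetical names, and the sketch assumes the toolchain offers uintptr_t in stdint.h.

```c
#include <stddef.h>
#include <stdint.h>

/* Round addr up to the next multiple of align.
 * align must be a power of two. */
static uintptr_t align_up(uintptr_t addr, uintptr_t align)
{
    return (addr + (align - 1)) & ~(align - 1);
}

/* Hypothetical helper: return a long-aligned (4-byte) pointer inside raw.
 * The caller must have reserved at least 3 spare bytes after the payload. */
static void *long_align(void *raw)
{
    return (void *)align_up((uintptr_t)raw, 4);
}
```

For heap data this would mean something like requesting sizeof(struct) + 3 bytes, handing the aligned pointer to the GPU, and keeping the original pointer around for free().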

Edited by Luigi301

Raptor BASIC+ actually has gcc 4.6.4 + libc compiled and used as a part of its build process. If you want, you can take a look at https://github.com/ggnkua/bcx-basic-Jaguar (start from build.bat and you'll see where the calls to vbcc and rln are being made to compile and link stuff). If you are interested and cannot understand what's happening, I could make a stripped-down version.


I'm writing a vertical shooter and I want to set up a scrolling tilemapped playfield with the object processor. I have a 320x200 window with a 272x200 area I want to scroll, so 17x13 = 221 8bpp tiles on screen, plus a 1bpp text layer. Can the object processor handle that? I don't really know how much DMA time is available on a scanline.


Easily. But you'll have to use branch objects. Check "Project One" on the reboot site.

 

The RAPTOR API and, by extension, Raptor BASIC+ also have a built-in tilemap engine capable of doing what you have described.


Hmm, okay. I think I understand...

 

The object list needs to start with branch objects that branch to the correct row for this scanline. The last tile in the row branches to whatever is after the tilemap (in my case, a font layer bitmap). So the object processor is only processing the bitmaps that make up a specific row rather than everything. I seem to be having a problem resetting the objects' pixel data pointer and height on each frame though. I tried putting it in my VBlank interrupt but it doesn't seem to be fast enough. I set it to 16x5 identical tiles and it seems to be running out of time.
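The row selection described above can be modeled in plain C. This is only a sketch of the decision the chain of branch objects makes on each scanline, not real object-list code; TILE_HEIGHT, map_top and tile_row_for_scanline are hypothetical names chosen for the illustration.

```c
#include <stdint.h>

#define TILE_HEIGHT 16  /* assumed tile height in scanlines */

/* Model of the branch-object dispatch: given the current scanline,
 * return the index of the tile row whose bitmaps should be processed,
 * or -1 if the scanline lies outside the tilemap (the list would then
 * fall through to whatever follows, e.g. the font layer). */
static int tile_row_for_scanline(int scanline, int map_top, int rows)
{
    int rel = scanline - map_top;

    if (rel < 0 || rel >= rows * TILE_HEIGHT)
        return -1;
    return rel / TILE_HEIGHT;
}
```

With 13 rows starting at scanline 0, scanlines 0 to 15 select row 0 and scanline 16 selects row 1, so on any one line only a single row of tiles is fetched instead of the whole map.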

 

GGYaXFi.png

 

I'll look at the Project One code and see if I can figure out what's going on. I'm just doing


mobj_background.graphic->p0.height = 200;
mobj_sprites.graphic->p0.height = 200;
mobj_font.graphic->p0.height = 200;


for(uint8_t i=0; i<BACKGROUND_TILES_WIDE*BACKGROUND_TILES_TALL; i++){
  mobj_bg_tiles[i].graphic->p0.height = 16;
  mobj_bg_tiles[i].graphic->p0.data = (uint32_t)shipsheet >> 3;
}
during VBlank.
Edited by Luigi301

  • 3 weeks later...

My 3D code is getting better... I'm using all three processors in parallel though not in an optimized fashion and they definitely block the CPU when they don't need to. The rotation matrix is calculated by the DSP, matrix multiplication is done by the GPU, and lines are drawn with the blitter on the GPU.

 

Qq8buW4.gif


  • 2 weeks later...

I've got most of the render loop running on the GPU now. The perspective matrix and view matrix are precalculated at the start of a frame. The CPU gets an object and its position, yaw/pitch/roll, and scale parameters and passes them to the GPU. The GPU uses these to calculate the model's transform matrix, then calculates the full transformation matrix. Right now the CPU passes each of the object's triangles in turn to the GPU, where they're combined with the transformation matrix, projected, and drawn by the blitter. The next step is to just pass the full list of triangles to the GPU instead.
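Combining the model transform with the precalculated matrices comes down to 4x4 multiplies in 16.16 fixed point. The following is a host-side C sketch of that step, not the GPU code from this project; fixed32, fix_mul, mat4 and the helper functions are hypothetical names, and it assumes a compiler with a 64-bit integer type and column-vector matrices with the translation in the last column.

```c
#include <stdint.h>

typedef int32_t fixed32;               /* 16.16 fixed point (assumed format) */
#define FIX_ONE ((fixed32)1 << 16)

typedef struct { fixed32 m[4][4]; } mat4;

/* Multiply two 16.16 values using a 64-bit intermediate.
 * Relies on arithmetic right shift for negative products, which
 * common compilers provide but the C standard does not guarantee. */
static fixed32 fix_mul(fixed32 a, fixed32 b)
{
    return (fixed32)(((int64_t)a * (int64_t)b) >> 16);
}

/* r = a * b (row-by-column product). */
static mat4 mat4_mul(const mat4 *a, const mat4 *b)
{
    mat4 r;
    int i, j, k;

    for (i = 0; i < 4; i++) {
        for (j = 0; j < 4; j++) {
            fixed32 sum = 0;
            for (k = 0; k < 4; k++)
                sum += fix_mul(a->m[i][k], b->m[k][j]);
            r.m[i][j] = sum;
        }
    }
    return r;
}

static mat4 mat4_identity(void)
{
    mat4 r = {{{0}}};
    int i;

    for (i = 0; i < 4; i++)
        r.m[i][i] = FIX_ONE;
    return r;
}

/* Translation matrix; x, y, z are already 16.16 values. */
static mat4 mat4_translation(fixed32 x, fixed32 y, fixed32 z)
{
    mat4 r = mat4_identity();

    r.m[0][3] = x;
    r.m[1][3] = y;
    r.m[2][3] = z;
    return r;
}
```

In this convention the full matrix would be built once per object as perspective * view * model, so each vertex needs only one matrix-vector multiply afterwards.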

 

But look how fast it is now!

 

VW7kvUr.gif


Well, this was fun. I'm working on back-face culling, so I need to calculate the normals of polygons. That means I need to do square roots... of fixed-point numbers... on the GPU. Translating the C to GPU code took about an hour and a half.

 

C:

#define FRACBITS 16
#define ITERS (15 + (FRACBITS >> 1))
FIXED_32 FIXED_SQRT(FIXED_32 val)
{
    //http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.178.3957&rep=rep1&type=pdf
    uint32_t root, remHi, remLo, testDiv, count;
   
    root = 0;    //Clear root
    remHi = 0;   //Clear high part of partial remainder
    remLo = val; //Get argument into low part of partial remainder
    count = ITERS;  //16.16 number
   
    do {
        remHi = (remHi << 2) | (remLo >> 30); remLo <<= 2;
        root <<= 1;
        testDiv = (root << 1) + 1; //test radical
        if(remHi >= testDiv) {
            remHi -= testDiv;
            root++;
        }
    } while(count-- != 0);
           
    return root;
}
GPU:
FIXED_SQRT:
    ;; Calculate the square root of the fixed-point number in r0.
    ;; Returns the result in r0.
    FRACBITS        .equ    16
    ITERS           .equ    (15 + (FRACBITS >> 1))
 
    SQRT_ROOT       .equr   r20
    SQRT_REM_HI     .equr   r21
    SQRT_REM_LO     .equr   r22
    SQRT_TEST_DIV   .equr   r23
    SQRT_COUNT      .equr   r24
 
    SQRT_THIRTY     .equr   r25
    SQRT_LOOP_CHECK .equr   r29
    SQRT_LOOP_ADDR  .equr   r30
 
    moveq   #0,SQRT_ROOT
    moveq   #0,SQRT_REM_HI
    move    r0,SQRT_REM_LO
    moveq   #ITERS,SQRT_COUNT
 
    moveq   #30,SQRT_THIRTY
    movei   #.sqrt_loop,SQRT_LOOP_ADDR
    movei   #.sqrt_do_loop,SQRT_LOOP_CHECK
 
.sqrt_loop:
    shlq    #2,SQRT_REM_HI
    move    SQRT_REM_LO,TEMP1
    sh      SQRT_THIRTY,TEMP1
    or      TEMP1,SQRT_REM_HI
    shlq    #2,SQRT_REM_LO
 
    shlq    #1,SQRT_ROOT
    move    SQRT_ROOT,SQRT_TEST_DIV
    shlq    #1,SQRT_TEST_DIV
    addq    #1,SQRT_TEST_DIV
 
    cmp     SQRT_TEST_DIV,SQRT_REM_HI ; flags = remHi - testDiv
    jump    mi,(SQRT_LOOP_CHECK) ; skip the subtract if remHi < testDiv
    nop
 
    sub     SQRT_TEST_DIV,SQRT_REM_HI
    addq    #1,SQRT_ROOT
 
.sqrt_do_loop:
    subq    #1,SQRT_COUNT
 
    cmpq    #-1,SQRT_COUNT
    jump    ne,(SQRT_LOOP_ADDR) ; if not -1, keep looping
    nop
 
    move    SQRT_ROOT,r0
 
    GPU_RTS
Edited by Luigi301

Don't know if this helps since you've already ported the code, but one of the classic ways to determine back faces was simply to determine if the points of your triangle are sorted clock-wise or counter-clockwise after projection. You can do that with just a couple of comparisons. :)
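One common way to implement that winding test is the sign of the 2D cross product of two edges of the projected triangle, which costs two multiplies and a sign check and needs no square root. This is a sketch under stated assumptions: screen coordinates with Y pointing down, front faces wound clockwise on screen, and coordinates small enough that the products fit in 32 bits. signed_area2 and is_back_face are hypothetical names.

```c
#include <stdint.h>

/* Twice the signed area of the projected triangle (x0,y0)-(x1,y1)-(x2,y2).
 * With Y pointing down the screen, a positive value means the vertices
 * run clockwise as seen by the viewer. */
static int32_t signed_area2(int32_t x0, int32_t y0,
                            int32_t x1, int32_t y1,
                            int32_t x2, int32_t y2)
{
    return (x1 - x0) * (y2 - y0) - (x2 - x0) * (y1 - y0);
}

/* ASSUMPTION: front faces are clockwise on screen.
 * Degenerate (zero-area) triangles are culled as well. */
static int is_back_face(int32_t x0, int32_t y0,
                        int32_t x1, int32_t y1,
                        int32_t x2, int32_t y2)
{
    return signed_area2(x0, y0, x1, y1, x2, y2) <= 0;
}
```

If the models use the opposite winding, flipping the comparison is the only change needed.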

