Luigi301 Posted October 6, 2017 (edited October 6, 2017)
What is the proper usage of the A1_CLIP register? I have a 320x200 pixel buffer that I'm blitting my sprites to so I set A1_CLIP to 320x200 and enabled the CLIP_A1 flag in B_CMD. Now for some reason the blitter is only drawing the first line of each sprite? If I don't set CLIP_A1, the sprites are drawn correctly. I don't see anything about that in the manual or on the jagdox blitter page. My B_CMD value is (SRCEN | DSTEN | UPDA1 | UPDA2 | CLIP_A1 | DCOMPEN | LFU_REPLACE).
JagChris Posted October 6, 2017 (edited October 6, 2017)
This may help you. http://www.mulle-kybernetik.com/jagdox/blitter.html http://www.mulle-kybernetik.com/jagdox/dox.html Whoops, never mind, you already found jagdox.
Luigi301 Posted October 11, 2017 (edited October 11, 2017)
That doesn't clarify much. I'm still not sure why it doesn't actually clip to the destination bitmap dimensions I put down. I'm getting better at GPU programming though: I ported the Amiga's linked list library over to the Jaguar and rewrote the 68K list traversal macros in GPU code, so I can have the GPU process sprite display lists now. Yay?
JagChris Posted October 11, 2017
And since no one is responding, I'm guessing you must have gone somewhere that nobody around here has gone before. If you haven't already, you can check out existing source code and see if there are some clues in there. http://www.3do.cdinteractive.co.uk/viewtopic.php?f=35&t=3430 If any of the sources listed there interest you but are no longer accessible, let me know and I will get them to you, because I still have them all except for Myst.
Sporadic Posted October 11, 2017
Quote: What is the proper usage of the A1_CLIP register? I have a 320x200 pixel buffer that I'm blitting my sprites to so I set A1_CLIP to 320x200 and enabled the CLIP_A1 flag in B_CMD. Now for some reason the blitter is only drawing the first line of each sprite? If I don't set CLIP_A1, the sprites are drawn correctly. I don't see anything about that in the manual or on the jagdox blitter page. My B_CMD value is (SRCEN | DSTEN | UPDA1 | UPDA2 | CLIP_A1 | DCOMPEN | LFU_REPLACE).
From what you have shown here, it looks like you are using the clip register correctly. You set it to the size of the window to clip to, and it's always positioned at 0,0, so setting it to your buffer size is correct. I would perhaps check that you set the 320x200 correctly (bit shifting the 320). There are also bugs around using the clip stuff, but I don't think any of them appear the way you've described above.
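The packing check suggested above can be sketched in C. This is only a sketch under an assumption that is not confirmed anywhere in the thread: it assumes A1_CLIP follows the layout of the blitter's other X/Y registers, with the height in the upper 16 bits and the width in the lower 16 bits, each limited to 15 bits. Whichever field lives in the upper half is the one that needs the shift, so check the field order against the register description before relying on this. The helper name pack_clip_size is hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper: pack a clip window size into one 32-bit value.
   ASSUMPTION: height occupies bits 16-30 and width occupies bits 0-14.
   Swap the two fields if the register description says otherwise. */
static uint32_t pack_clip_size(uint32_t width, uint32_t height)
{
    return ((height & 0x7FFFu) << 16) | (width & 0x7FFFu);
}
```

Under that assumption, a 320x200 window packs to 0x00C80140 (200 is 0xC8, 320 is 0x140). A value with the two halves swapped, or with one field never shifted, would describe a very different window.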
Luigi301 Posted October 13, 2017
Hmm. Well, I've also managed to break my GPU program in such a way that it works in Virtual Jaguar but doesn't blit anything on real hardware, too. That's... annoying to debug.
+CyranoJ Posted October 13, 2017
Quote: Hmm. Well, I've also managed to break my GPU program in such a way that it works in Virtual Jaguar but doesn't blit anything on real hardware, too. That's... annoying to debug.
Try slow/fast blitter in VJ.
Zerosquare Posted October 13, 2017
Or ask Shamus to port VJ to the Jag.
Luigi301 Posted October 14, 2017
Quote: Try slow/fast blitter in VJ.
Yeah, it's a sync problem with the CPU and GPU both trying to blit at the same time.
Luigi301 Posted October 14, 2017 (edited October 14, 2017)
Nope, that wasn't it. The problem is that VBCC only word-aligns heap variables and the GPU's load instructions want them to be long-aligned. And VBCC doesn't support the GCC aligned attribute. I know some C compilers will long-align longs (or structs that begin with longs) implicitly but I haven't been able to massage VBCC into doing that. I'd need a GCC version of libc for Jaguar to recompile it in GCC.
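One compiler-independent way to get long-aligned storage for dynamically allocated data is to over-allocate by three bytes and round the pointer up to the next multiple of four. This is a minimal sketch of that idea, not the fix used in the thread. The names align_up4 and alloc_long_aligned are hypothetical, and the sketch assumes the compiler provides uintptr_t in stdint.h.

```c
#include <stdint.h>
#include <stdlib.h>

/* Round an address up to the next multiple of 4 (long alignment). */
static uintptr_t align_up4(uintptr_t addr)
{
    return (addr + 3u) & ~(uintptr_t)3u;
}

/* Hypothetical helper: allocate `size` bytes whose start is long-aligned.
   *raw_out receives the pointer that must later be passed to free(). */
static void *alloc_long_aligned(size_t size, void **raw_out)
{
    void *raw = malloc(size + 3u);  /* 3 spare bytes cover the worst case */
    if (raw == NULL) {
        *raw_out = NULL;
        return NULL;
    }
    *raw_out = raw;
    return (void *)align_up4((uintptr_t)raw);
}
```

The caller keeps both pointers: the aligned one for the data the GPU will load, and the raw one for releasing the block.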
ggn Posted October 14, 2017
Raptor BASIC+ actually has gcc 4.6.4 + libc compiled and used as a part of its build process. If you want, you can take a look at https://github.com/ggnkua/bcx-basic-Jaguar (start from build.bat and you'll see where the calls to vbcc and rln are being made to compile and link stuff). If you are interested and cannot understand what's happening, I could make a stripped-down version.
Luigi301 Posted October 15, 2017
Oh, I can look into that, thanks. Either way, I managed to get it working. The problem was that the pointer in my actual node structure that makes up the display list wasn't long-aligned. Now I've got the real hardware display working again.
Luigi301 Posted October 23, 2017
I'm writing a vertical shooter and I want to set up a scrolling tilemapped playfield with the object processor. I have a 320x200 window with a 272x200 area I want to scroll, so 17x13 = 221 8bpp tiles on screen, plus a 1bpp text layer. Can the object processor handle that? I don't really know how much DMA time is available on a scanline.
+CyranoJ Posted October 23, 2017
Quote: I'm writing a vertical shooter and I want to set up a scrolling tilemapped playfield with the object processor. I have a 320x200 window with a 272x200 area I want to scroll, so 17x13 = 221 8bpp tiles on screen, plus a 1bpp text layer. Can the object processor handle that? I don't really know how much DMA time is available on a scanline.
Easily. But you'll have to use branch objects. Check "Project One" on the reboot site. The RAPTOR API and, by extension, Raptor BASIC+ also have a built-in tilemap engine capable of doing what you have described.
Luigi301 Posted October 23, 2017 (edited October 23, 2017)
Hmm, okay. I think I understand... The object list needs to start with branch objects that branch to the correct row for this scanline. The last tile in the row branches to whatever is after the tilemap (in my case, a font layer bitmap). So the object processor is only processing the bitmaps that make up a specific row rather than everything. I seem to be having a problem resetting the objects' pixel data pointer and height on each frame though. I tried putting it in my VBlank interrupt but it doesn't seem to be fast enough. I set it to 16x5 identical tiles and it seems to be running out of time. I'll look at the Project One code and see if I can figure out what's going on. I'm just doing

    mobj_background.graphic->p0.height = 200;
    mobj_sprites.graphic->p0.height = 200;
    mobj_font.graphic->p0.height = 200;
    for(uint8_t i=0; i<BACKGROUND_TILES_WIDE*BACKGROUND_TILES_TALL; i++){
        mobj_bg_tiles[i].graphic->p0.height = 16;
        mobj_bg_tiles[i].graphic->p0.data = (uint32_t)shipsheet >> 3;
    }

during VBlank.
+CyranoJ Posted October 23, 2017
The only realistic way to update all that per frame is on the GPU. Note that if you branch halfway down a bitmap, the OP will render from the top of the image, so you need to clip objects depending on which segment they are in.
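The clipping described here comes down to a little arithmetic per tile row. The sketch below shows only that arithmetic; it does not build real object processor objects. It assumes tiles 16 lines tall (the height used earlier in the thread) and a non-negative scroll offset. The names tile_row_clip, row_clip_t, scroll_y and first_line are hypothetical.

```c
#include <stdint.h>

#define TILE_H 16  /* assumed tile height in lines */

typedef struct {
    int32_t screen_y;    /* first screen line the row occupies */
    int32_t first_line;  /* lines to skip at the top of the tile image */
    int32_t height;      /* lines of the tile actually displayed */
} row_clip_t;

/* Compute the visible slice of on-screen tile row `row` (0 = topmost)
   for a window `window_h` lines tall, scrolled down by `scroll_y` lines.
   Assumes scroll_y >= 0. */
static row_clip_t tile_row_clip(int32_t row, int32_t scroll_y, int32_t window_h)
{
    row_clip_t c;
    int32_t offset = scroll_y % TILE_H;   /* lines hidden above the window */
    int32_t top = row * TILE_H - offset;  /* unclipped top edge of the row */
    int32_t bottom = top + TILE_H;        /* unclipped bottom edge */

    c.first_line = (top < 0) ? -top : 0;
    c.screen_y = (top < 0) ? 0 : top;
    if (bottom > window_h)
        bottom = window_h;
    c.height = bottom - c.screen_y;
    if (c.height < 0)
        c.height = 0;                     /* row lies entirely below the window */
    return c;
}
```

For a clipped row, the object's data pointer would then be advanced by first_line times the tile's bytes per line, and its height set to the computed height, so the OP starts rendering at the right line of the image.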
Luigi301 Posted October 23, 2017
So reset the bitmap structures with the GPU when we hit the VBL rather than the 68k? Okay, I'll give that a try.
Luigi301 Posted October 25, 2017
It works! I've got a tilemapped background layer. (The sprite rendering and background rendering programs collide if there are too many sprites on screen but I can fix that another time.)
Luigi301 Posted November 12, 2017
My 3D code is getting better... I'm using all three processors in parallel, though not in an optimized fashion, and they definitely block the CPU when they don't need to. The rotation matrix is calculated by the DSP, matrix multiplication is done by the GPU, and lines are drawn with the blitter on the GPU.
Luigi301 Posted November 27, 2017
Took me a while but I've got perspective projection now! It's damn slow and something's wrong with my matrix that's making it fisheye but it's still cool.
Luigi301 Posted November 29, 2017 (edited November 29, 2017)
Rotate the cube! Uh...
Luigi301 Posted December 5, 2017
I've got most of the render loop running on the GPU now. The perspective matrix and view matrix are precalculated at the start of a frame. The CPU gets an object and its position, yaw/pitch/roll, and scale parameters and passes them to the GPU. The GPU uses these to calculate the model's transform matrix, then calculates the full transformation matrix. Right now the CPU passes each of the object's triangles in turn to the GPU, where they're combined with the transformation matrix, projected, and blitted by the blitter. The next step is to just pass the full list of triangles to the GPU instead. But look how fast it is now!
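Combining the model, view and perspective matrices into one full transformation matrix is a chain of 4x4 multiplications. Below is a minimal C sketch of one such multiplication in 16.16 fixed point. The 16.16 format, the row-major layout and the names fixed_mul and mat4_mul are assumptions for illustration; the thread does not show the actual matrix code, and a GPU version would have to build the wide product from narrower multiplies.

```c
#include <stdint.h>

typedef int32_t fixed_t;              /* assumed 16.16 fixed point */
#define FIXED_ONE ((fixed_t)1 << 16)  /* 1.0 in 16.16 */

/* Multiply two 16.16 values using a 64-bit intermediate.
   Relies on arithmetic right shift of negative values, which is what
   common compilers do but is implementation-defined in C. */
static fixed_t fixed_mul(fixed_t a, fixed_t b)
{
    return (fixed_t)(((int64_t)a * (int64_t)b) >> 16);
}

/* out = a * b for row-major 4x4 matrices. out must not alias a or b. */
static void mat4_mul(fixed_t out[4][4], fixed_t a[4][4], fixed_t b[4][4])
{
    for (int r = 0; r < 4; r++) {
        for (int c = 0; c < 4; c++) {
            fixed_t sum = 0;
            for (int k = 0; k < 4; k++)
                sum += fixed_mul(a[r][k], b[k][c]);
            out[r][c] = sum;
        }
    }
}
```

With this helper, a full transform could be built once per object as two calls (for example perspective times view, then that result times model), so each vertex needs only a single matrix-vector multiply.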
Luigi301 Posted December 9, 2017
The whole render loop is now running on the GPU. It's still two separate programs (one to produce the transformation matrix and one to project and blit the polygons) but it's faaaaaast now! Removing all the DSP code means I can bolt on the U235 sound engine, too.
Luigi301 Posted December 11, 2017 (edited December 11, 2017)
Well, this was fun. Working on back-face culling, so I need to calculate the normals of polygons. That means I need to do square roots... of fixed-point numbers... on the GPU. Translating the C to GPU code took about an hour and a half.

C:

    #define FRACBITS 16
    #define ITERS (15 + (FRACBITS >> 1))

    FIXED_32 FIXED_SQRT(FIXED_32 val)
    {
        //http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.178.3957&rep=rep1&type=pdf
        uint32_t root, remHi, remLo, testDiv, count;

        root = 0;       //Clear root
        remHi = 0;      //Clear high part of partial remainder
        remLo = val;    //Get argument into low part of partial remainder
        count = ITERS;  //16.16 number
        do {
            remHi = (remHi << 2) | (remLo >> 30);
            remLo <<= 2;
            root <<= 1;
            testDiv = (root << 1) + 1;  //test radical
            if(remHi >= testDiv) {
                remHi -= testDiv;
                root++;
            }
        } while(count-- != 0);

        return root;
    }

GPU:

    FIXED_SQRT:
        ;; Calculate the square root of the fixed-point number in r0.
        ;; Returns the result in r0.

        FRACBITS        .equ    16
        ITERS           .equ    (15 + (FRACBITS >> 1))

        SQRT_ROOT       .equr   r20
        SQRT_REM_HI     .equr   r21
        SQRT_REM_LO     .equr   r22
        SQRT_TEST_DIV   .equr   r23
        SQRT_COUNT      .equr   r24
        SQRT_THIRTY     .equr   r25
        SQRT_LOOP_CHECK .equr   r29
        SQRT_LOOP_ADDR  .equr   r30

        moveq   #0,SQRT_ROOT
        moveq   #0,SQRT_REM_HI
        move    r0,SQRT_REM_LO
        moveq   #ITERS,SQRT_COUNT
        moveq   #30,SQRT_THIRTY
        movei   #.sqrt_loop,SQRT_LOOP_ADDR
        movei   #.sqrt_do_loop,SQRT_LOOP_CHECK

    .sqrt_loop:
        shlq    #2,SQRT_REM_HI
        move    SQRT_REM_LO,TEMP1
        sh      SQRT_THIRTY,TEMP1
        or      TEMP1,SQRT_REM_HI
        shlq    #2,SQRT_REM_LO
        shlq    #1,SQRT_ROOT
        move    SQRT_ROOT,SQRT_TEST_DIV
        shlq    #1,SQRT_TEST_DIV
        addq    #1,SQRT_TEST_DIV
        cmp     SQRT_TEST_DIV,SQRT_REM_HI
        jump    ge,(SQRT_LOOP_CHECK)    ;if remHi >= testDiv
        nop
        sub     SQRT_TEST_DIV,SQRT_REM_HI
        addq    #1,SQRT_ROOT

    .sqrt_do_loop:
        subq    #1,SQRT_COUNT
        cmpq    #-1,SQRT_COUNT
        jump    ne,(SQRT_LOOP_ADDR)     ; if not -1, keep looping
        nop

        move    SQRT_ROOT,r0
        GPU_RTS
Tursi Posted December 12, 2017
Don't know if this helps since you've already ported the code, but one of the classic ways to determine back faces is simply to check whether the points of your triangle are ordered clockwise or counter-clockwise after projection. You can do that with just a couple of comparisons.
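The winding-order test can be sketched in a few lines of C. It takes the sign of the 2D cross product of two edges of the projected triangle, so no normals or square roots are needed. The function names are hypothetical, and so is the convention that counter-clockwise means front-facing; that convention is stated for a y-up coordinate system, and with y pointing down the screen, as on most framebuffers, the sign flips.

```c
#include <stdint.h>

/* Twice the signed area of the projected triangle (x0,y0),(x1,y1),(x2,y2).
   Positive: counter-clockwise in a y-up coordinate system.
   Negative: clockwise. Zero: degenerate (seen edge-on).
   Coordinates are assumed to be screen-sized so the products fit in 32 bits. */
static int32_t signed_area2(int32_t x0, int32_t y0,
                            int32_t x1, int32_t y1,
                            int32_t x2, int32_t y2)
{
    return (x1 - x0) * (y2 - y0) - (x2 - x0) * (y1 - y0);
}

/* Hypothetical culling rule: clockwise or degenerate triangles are skipped. */
static int is_back_face(int32_t x0, int32_t y0,
                        int32_t x1, int32_t y1,
                        int32_t x2, int32_t y2)
{
    return signed_area2(x0, y0, x1, y1, x2, y2) <= 0;
}
```

This costs two multiplies and a handful of subtractions per triangle, and it runs on the already-projected vertices, so it fits naturally just before the blit step.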