
SCPCD

Members
  • Content Count: 103

Everything posted by SCPCD

  1. Hi! I'm currently working on a C app with multiple files, using m68k-atari-mint-gcc. It works great, but after adding some more variables I hit an issue: a variable in the BSS segment is not aligned the way I need. With gcc, all variables are aligned on word boundaries, but some specific ones need longword alignment. I tried adding __attribute__((aligned(4))), giving something like "u8_t toto[1024] __attribute__((aligned(4)))". At build time gcc gives me the warning "warning: alignment of 'toto' is greater than maximum object file alignment. Using 2 [enabled by default]", and in the end the variable is not longword aligned. Why doesn't gcc take the attribute into account, and how can I resolve the issue? Thanks
  2. Are you using the original PSU or another one? I had something similar when using a PSU that didn't provide enough power. The recommended one is 9V 1.2A.
  3. And RAM is 2 cycles, and ROM is 5 cycles minimum.
  4. [$000000, $200000[ = [$200000, $400000[, as it's the same DRAM bank and the memory controller will probably wrap (to be checked, but that's what I remember from my tests 18 years ago). [$400000, $800000[ is another DRAM bank that is not used on the Jaguar. It will always return $FFFFFFFF, as there are pull-ups on the bus.
  5. It seems it's phrase aligned at compile time, but your data section's start address at link time is not aligned, which makes the final result misaligned. Example:

      .text
      .loop
          bra .loop
      .data
      .phrase
      toto: dc.b 0
      .phrase
      titi: dc.b 1

    will be assembled to:

      LTEXT+0 : bra.s LTEXT+0
      LDATA+0 : toto : dc.b 0
      LDATA+1 : dc.b 0
      LDATA+2 : dc.w 0
      LDATA+4 : dc.l 0
      LDATA+8 : titi : dc.b 1

    So if DATA is not properly aligned, toto and titi will be misaligned. Also pay attention when linking multiple files: if no proper alignment is specified between each file, the result can be misaligned.
  6. The GPU object doesn't use YPOS. I initially thought it was a bug, but in fact some versions of the doc seem to be wrong. The latest version I know of (7 Nov 1994) describes the GPU object as "interrupt always", and reading the netlist confirms that. Perhaps it was a forgotten feature? Or they thought it wasn't needed since there is a branch object, and preferred to use more bits to pass information to the GPU?
  7. The GPU has a 32-bit bus, so GPU internal accesses are 32-bit. But it can do 64-bit external accesses, because the Hi-data register is part of the 64-bit Gateway interconnection.

    To speed up DRAM reads/writes, I would probably advise using loadp/storep. For the ROM1 address space, though, I think it's easier to optimise with standard load/store.

    With loadp, you have to be sure hidata is properly loaded before doing the storep or the hidata read. This implies either:
    - waiting until the destination register is written back, using the scoreboard detection, or
    - inserting enough useful instructions before the storep instruction or the hidata read (see my optimised ST2Jag code based on Orion_'s original code).

    With DRAM, you know the memory controller will take the same number of bus cycles to read the data as a standard load/store. So it's easy to insert some instructions (assuming the bus is not taken by a higher-priority master). But with ROM1 you don't (generally) know the cartridge width and the selected speed. In any case, reading 64-bit data takes at least 2x5 cycles at the fastest ROM1 speed with a 32-bit width.

    If you use GPU interrupts:
    - don't use external loads when using loadp.
    - if you access ROM1 with loadp and can't fit enough instructions, the GPU will sit in the scoreboard wait state (as you will probably need the loadp result), which delays the interrupt.
    In other words, you will kill GPU performance, as it will be very difficult to keep the instruction flow going. Using standard loads reduces the overall pending time.

    Anyway, the fastest way to copy data is the Blitter in phrase mode, if there is enough data to copy compared to the time needed to set the blitter registers. But that also requires the blitter to be available.
  8. As described in the "Technical Reference v8", section "Timing Diagram" (and probably in other documents), you can't rely on Chip Select to latch the address. ROM1 accesses are pipelined, so Chip Select remains asserted during a whole "burst" transfer. With the 68k it never happens: the memory controller adds an extra cycle to "translate" the Jaguar bus to the 68k bus width, so the bus is released. By contrast, with the GPU, Blitter or OP, you are 100% sure to hit successive reads/writes in "burst mode" sooner or later. With the DSP, I don't remember whether I did ROM1 tests in the past, so I can't say if it's case 1 or case 2.
  9. I haven't written 68k code for many years now, but how does this work without a reverse movem before the rts? Does that mean a post-compile process adds instructions that are not in this assembly output?
  10. SCPCD

    STNICCC

    Here it is: http://scpcd.free.fr/temp/STNICC/stnicc.html
  11. SCPCD

    STNICCC

    Great job swapd0! I tried it on my JagFPGA and it works very well! It's perfectly fluid, and even too fast in some parts. It finishes at "1:14:48" on the screen counter (60Hz), and at "1:02:66" in real time when it loops!
  12. There is no GPU bug there. If it crashes the way you say, I'm almost 100% sure it's one of the following:
    - the GPU code is wrong (wrong registers used or not properly initialised, wrong addresses/instructions from the build process, wrong alignment)
    - another CPU makes it crash: most probably the 68k, then the Blitter (wrong commands), then the DSP, and least probably the OP.
    What is running on the other CPUs when the GPU crashes? Some games do fucking bad stuff with the GPU (like overwriting the GPU PC at any time, stepping it instruction by instruction, PC checking, no semaphores, etc.). Many use GPU code in main memory to initialise some basic stuff. If the game doesn't run on a real Jaguar with a real Jaguar CD and the original boot ROM, the loading/unpacking process may not have run properly or at the right time, and the GPU will then crash.
  13. You crash the GPU by doing:

      load (r31),r28   ; return address
      addq #4,r1
      addq #2,r28      ; next op
      jump t,(r28)     ; rte
      store r29,(r1)

    In your code R1 = G_FLAGS, so you add 4 to the G_FLAGS pointer instead of to R31 (the stack pointer). I think it should be:

      load (r31),r28   ; return address
      addq #4,r31      ; pop the stack
      addq #2,r28      ; next op
      jump t,(r28)     ; rte
      store r29,(r1)   ; restore flags (executes in the jump delay slot)

    That should work better.
  14. From what I remember, for each line of the ground, a scaled object is modified by a GPU object interrupt. The GPU computes the object's X position, the texture start address and the scale to apply to draw the current (or next?) line.
  15. It's not possible to do 8-bit -> 24-bit with the OP: the CLUT is only 16-bit.
  16. As Orion_ said, 24-bit sprites can't be transparent: there is no transparency check for 24-bit sprites in the OP netlist. But there is a trick used in Iron Soldier 2. As far as I know, 24-bit mode has never been used in games except for fixed screens (like title screens) with very minimal animation/interaction. I've only seen it in 2 games so far (maybe there are others, but I don't think many more):
    - In Iron Soldier, the scrolling picture before the menu screen is 24-bit.
    - In Iron Soldier 2, the weapon selector uses a trick to get transparency in 24-bit mode. It is a 24-bit background, with a 24-bit sprite for the selector used as 16-bit with transparency:

      L00004210: 0000000848008FD3 ; branch 506 < VC, $00004240
      L00004218: 00000008480040D3 ; branch 26 > VC, $00004240
      L00004220: 02FCF80846380170 0000800A0280D026 ; bitmap X:38, Y:46, H:224, IW:320pix, D:$0002FCF8, L:$00004230, BPP:32, DW:320pix, IDX:0, FP:0, PITCH1 | TRANS
      L00004230: 00C4200848054600 00008000E038C128 ; bitmap X:296, Y:192, H:21, IW:56pix, D:$0000C420, L:$00004240, BPP:16, DW:56pix, IDX:0, FP:0, PITCH1 | TRANS
      L00004240: 0000000000000004 ; stop #$0000000000000000
  17. The B_COUNT register should be rewritten before each B_CMD, because the blitter uses B_COUNT directly: at the end of the blit, its value is [0;0]. The blitter also uses the A1_PIXEL and A2_PIXEL registers directly: at the end of the blit, they hold the next [X;Y] pixel coordinates to read and write.
  18. I can't build the sample code (I get an error during C compilation), but I can see some errors:
    - all the "LPOKE PIXEL4|XADDPHR|PITCH1,A2_FLAGS" lines have their arguments reversed. The correct instruction is "LPOKE A2_FLAGS, PIXEL4|XADDPHR|PITCH1".
    - in phrase mode (XADDPHR) you can't use a pixel size below 8 bits, or the blitter goes into an infinite loop. If you have 4-bit pixels and the width is a multiple of a phrase, you can use 8-bit pixels with PIXEL8 and divide the X count in B_COUNT by 2. Otherwise, you need to use pixel mode.
    For the 68k, you don't need to read the blitter status register: the blitter has higher priority than the 68k and freezes it during its blit operation. The blitter releases the bus to the 68k only on 68k interrupts, and when the blitter is idle or stopped. But for compatibility and stability, I highly recommend adding the blitter status check before using it.
    EDIT: I think Raptor uses the Blitter in its OP list updates. You'll probably need to use the flag to disable this feature (I don't know how to do it); otherwise the blitter registers will be overwritten every VBL.
  19. In the past, I also wanted to make an FPGA console board. Then I saw all the existing dev boards, which have almost everything needed at a lower price than making one myself. So I didn't take the time to design the PCB, and targeted an existing board instead. All those boards need is a PCB interface for classic controllers/cartridges and a case, and you're done. Otherwise, what I look for in an FPGA board is:
    - a "big" FPGA (around 77k LEs)
    - HDMI supporting 1080p/60Hz
    - analog audio output
    - SD card
    - Ethernet interface
    - high RAM bandwidth
    - expansion connectors to easily connect other things
  20. In fact it's a very good idea, since an OP scaled bitmap is slower than a standard bitmap object:
    - a scaled object writes 1 pixel per cycle to the line buffer
    - an unscaled object writes 2 pixels per cycle.
    A 2x scaled object therefore takes 4x more time than an unscaled one to write to the line buffer (twice as many output pixels, at half the rate). During all this time, main RAM is locked by the OP. Using the technique above gives a 2x scale with no extra time and leaves much more time for the rest of the system.
  21. The object list is defined like this (dumped at runtime, so some parameters such as Y, H and D are not the initial ones):

      L00021DC0: 00000043F4008FDB ; branch 507 < VC, $00021FA0
      L00021DC8: 00000043F40040CB ; branch 25 > VC, $00021FA0
      L00021DD0: 00000043BE384CD3 ; branch 410 > VC, $00021DF0
      L00021DD8: 00000043BE388CD3 ; branch 410 < VC, $00021DF0
      L00021DE0: 0000000000000CD2 ; gpuobj 410, #0000000000000000
      L00021DE8: 00000043F4008FDB ; branch 507 < VC, $00021FA0
      L00021DF0: 1EA28043C20C8180 0000000280A0C007 ; bitmap X:7, Y:48, H:50, IW:160pix, D:$001EA280, L:$00021E10, BPP:16, DW:160pix, IDX:0, FP:0, PITCH1
      L00021E10: 1C801043C40A0CD0 0000000280A0B00E ; bitmap X:14, Y:410, H:40, IW:320pix, D:$001C8010, L:$00021E20, BPP:8, DW:320pix, IDX:0, FP:0, PITCH1
      L00021E20: 1CB21043F40A0CD0 0000800280A0B00E ; bitmap X:14, Y:410, H:40, IW:320pix, D:$001CB210, L:$00021FA0, BPP:8, DW:320pix, IDX:0, FP:0, PITCH1 | TRANS
      L00021FA0: 0000000000000004 ; stop #$0000000000000000

    The GPU object interrupts the GPU, which does this:

      L00F03030: load (r1), r29     ; A43D r1 = $F02100
      L00F03032: move r0, r30       ; 881E r0 = $F00028
      L00F03034: storew r3, (r30)   ; BBC3 r3 = $0006C1
      L00F03036: subq #2, r30       ; 185E
      L00F03038: storew r28, (r30)  ; BBDC
      L00F0303A: bclr #3, r29       ; 3C7D
      L00F0303C: bset #12, r29      ; 399D
      L00F0303E: load (r31), r28    ; A7FC
      L00F03040: move r2, r31       ; 885F
      L00F03042: addq #2, r28       ; 085C
      L00F03044: jump T, (r28)      ; D380
      L00F03046: store r29, (r1)    ; BC3D

    On GPU object interrupts, the GPU writes the value $6C1 to the VMODE register, which is: VIDEN | MODE_16CRY | CSYNC | BGEN | PWIDTH_332

    On VBL interrupts, the 68k does:

      L000041A6: link a6, #$0000             ; 4E56 0000      NV..
      L000041AA: movem.l d0-d5/a0-a5, -(a7)  ; 48E7 FCFC      H...
      L000041AE: jsr L000040F0.l             ; 4EB9 000040F0  N...@.  ; write OP list
      L000041B4: move.l L00021FC4.l, d0      ; 2039 00021FC4  9....   ; $000EC1
      L000041BA: move.w d0, L00F00028.l      ; 33C0 00F00028  3....(

    and $EC1 is: VIDEN | MODE_16CRY | CSYNC | BGEN | PWIDTH_166

    That's why the top part of the screen looks low-res while the bottom is sharper.
  22. Most games use the 350-pixel (NTSC) / 345-pixel (PAL) mode. Doom uses the 175-pixel (NTSC) / 172-pixel (PAL) mode for the first 200 lines, then switches to the 350-pixel (NTSC) / 345-pixel (PAL) mode for the last 40 lines. This gives a free 2x horizontal zoom for the first 200 lines.
  23. Interesting concept, but:
    - on a real Jaguar, the power LED is between the two subD-HD15 connectors. How do you add an LED between the two USB connectors? Did you modify the top plastic shell to move the LED?
    - the USB connectors seem very close to each other. Is there enough space to plug in two USB cables?
    - the USB connectors and the subD-9 seem very close to each other. Is there enough space to plug in a subD-9 cable and a USB cable? Are the subD-9 and USB connector footprints small enough to sit so close to each other on the PCB? Which USB connectors are used? Type A? Is the USB connector footprint small enough not to touch the power push button?
    - how do you modify the mold to move the subD connectors and leave enough space for the USB connectors?
  24. Just tried it with and without a Team Tap, and it works fine.
  25. The CRY to RGB conversion is done in the pixel data path from the line buffer to the screen. So from the user's point of view it doesn't take any extra cycles, since it's part of the video output stream (see page 23 of Jag_v8.pdf for more information). Internally, the screen drawing logic accounts for the CRY2RGB conversion time by adding extra synchronisation for the 16-bit RGB to 24-bit RGB conversion. This is also what made the variable mode, which allows CRY and RGB pixels at the same time, possible.