SCPCD

About SCPCD

  • Rank
    Star Raider

Profile Information

  • Gender
    Male
  • Location
    France
  • Interests
    Informatic and Electronic

  1. I don't know if this is what you want to achieve, but I gave it a try and made some modifications (search for the SCPCD tags). It seems to work fine on a real Jaguar, JagFPGA and VirtualJaguar. rasters_SCPCD.zip
  2. I don't think adding a load just after the nop will be enough. Each 68k memory access takes 6 cycles on the Jaguar (from what I remember), so around 12 GPU cycles: you will have approximately 12 cycles between each word write. If I'm not mistaken, the "wait_list" loop takes around 6 cycles to complete. Adding extra nops (I would say about 6) before the second load will probably do the trick for test purposes, but it is not a real solution. The best way is indeed a semaphore.
     Edit: After posting, I am no longer sure whether a 68k memory access takes 5 or 6 cycles on the Jaguar. I need to check my notes, but it is higher than on a standard 68k.
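The nop estimate in the post above can be reproduced with a few lines of arithmetic. This is only a sketch: the 2:1 ratio between GPU and 68k cycles is taken from the "6 cycles, so around 12 GPU cycles" statement, each nop is assumed to cost one GPU cycle, and the function name `padding_nops` is made up for the illustration.

```python
# Rough timing budget for the "extra nops" workaround (test purposes only).
# Assumption: one 68k cycle lasts two GPU cycles, as implied by the post.
GPU_CYCLES_PER_68K_CYCLE = 2
# Figure quoted in the post: the "wait_list" loop takes about 6 GPU cycles.
WAIT_LOOP_CYCLES = 6


def padding_nops(m68k_access_cycles, loop_cycles=WAIT_LOOP_CYCLES):
    """GPU cycles of padding needed so the second load happens after
    the next 68k word write, assuming one cycle per nop."""
    gap = m68k_access_cycles * GPU_CYCLES_PER_68K_CYCLE
    return max(0, gap - loop_cycles)


if __name__ == "__main__":
    for access in (5, 6):  # the post is unsure between 5 and 6 cycles
        print(access, "cycle access ->", padding_nops(access), "nops")
```

Whatever the exact access time turns out to be, the padding depends on bus timing, which is why the post recommends a semaphore instead.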
  3. One of the bugs is indeed what CyranoJ describes. I see another one. From what I understand, bank 0 is used for your interrupt service routine and bank 1 for your user code. In your post #21, I don't see any r31 setup (maybe it is in a part of the code we don't have), but now that we have more code, I will explain what I suspect by combining it with post #4.

     Initialisation (#4):

         movei   #init_raster,r0     ; 3
         movei   #G_ENDRAM,stack     ; 3
         jump    t,(r0)              ; 1
         moveta  stack,stack         ; 1

     => G_ENDRAM in b0r31 & b1r31 (stack = r31)

     Dispatcher code (probably b1r31) (#21):

         ; jsr to routine
         movei   #.l0,return
         subqt   #4,r31
         jump    (event)
         store   return,(r31)

     => b1r31 changed and ".l0" written to the stack

     Interrupt code (b0r31) (#4 & #21):

         load    (r31),r30
         addq    #2,r30
         addqt   #4,r31
         jump    (r30)
         store   flags,(gpuFlagsPtr) ; restore interrupt

     => b0r31 changed on interrupt entry and PC written to the stack
     -> RTE: jump back to user code, correcting the stack pointer

     Event RTS (#21):

         ;rts
         moveq   #0,r0               ; skip 0 parameter
         load    (r31),r1
         jump    (r1)
         addqt   #4,r31

     If that is the case, when an interrupt occurs during your user "event" code, your stack content is overwritten, because b0r31 and b1r31 are independent. The interrupt service routine then completes and returns to the user code, and at the end of your user "event" code, when your "rts" happens, the PC goes back to an unintended address.
     Edit: Hmm, in #4 I see a subq #4 for the "user" stack, maybe to reserve one stack slot for the interrupt service routine?
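The suspected failure can be replayed with a small model. It is a sketch under stated assumptions: memory is a plain dictionary, `STACK_TOP`, `RETURN_L0` and `INTERRUPTED_PC` are hypothetical addresses, and interrupt entry is modelled as "decrement bank 0 r31 by 4, then store the PC there", following the description in the post rather than the real code from posts #4 and #21.

```python
# Hypothetical addresses, chosen only for the illustration.
STACK_TOP = 0xF04000       # value loaded into both banks' r31 at init
RETURN_L0 = 0xF03100       # address of the ".l0" return label
INTERRUPTED_PC = 0xF03200  # PC of the event code when the interrupt hits


def event_return_target(user_sp, irq_sp):
    """Return the address the event's "rts" finally jumps to."""
    memory = {}

    # Dispatcher "jsr" (bank 1): subqt #4,r31 / store return,(r31)
    user_sp -= 4
    memory[user_sp] = RETURN_L0

    # Interrupt entry (bank 0): r31 is decremented and the PC is stored
    irq_sp -= 4
    memory[irq_sp] = INTERRUPTED_PC

    # Interrupt exit (bank 0): load (r31),r30 / addq #2,r30 / addqt #4,r31
    resume_pc = memory[irq_sp] + 2
    irq_sp += 4
    assert resume_pc == INTERRUPTED_PC + 2  # the interrupt itself returns fine

    # Event "rts" (bank 1): load (r31),r1 / jump (r1) / addqt #4,r31
    return memory[user_sp]


if __name__ == "__main__":
    # Both banks start at the same address: the return slot is clobbered.
    print(hex(event_return_target(STACK_TOP, STACK_TOP)))
    # User stack starts one slot lower (the "subq #4" noted in the edit).
    print(hex(event_return_target(STACK_TOP - 4, STACK_TOP)))
```

In this model the first call returns the interrupted PC instead of `.l0`, while reserving one slot for the interrupt routine keeps the two pushes apart.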
  4. Can you send me the source code of your test program? I will take a look at it.
  5. There is not enough information about how the interrupt routine is written to give a solution, but this is what I did for my FACTS demo:

     - Main code uses BANK1
     - Interrupt service code uses BANK0

     Initialisation routine:

         GPU_STACK       .equ    $F04000

         ; BANK0 :
         ;--------
         OBJ_FLAGS       .equr   r22
         vbl_counter     .equr   r26
         vbl_interrupts  .equr   r27
         pGflags         .equr   r28
         cGflags         .equr   r29
         cGstack         .equr   r30
         pGstack         .equr   r31

         ;GPU initialisation
         gpu_init:
             movei   #G_FLAGS,pGflags        ; GPU flags
             load    (pGflags),cGflags
             bclr    #3,cGflags
             bset    #7,cGflags              ; enable OP interrupt
             bset    #12,cGflags             ; clear pending interrupt
             bclr    #14,cGflags             ; select bank0
             store   cGflags,(pGflags)       ; update the flags
             nop
             nop
             ;Stack Pointer
             movei   #GPU_STACK,pGstack      ; SP address
             movei   #VblInterrupt,vbl_interrupts
             movei   #OBF,OBJ_FLAGS
             moveq   #0,vbl_counter

     Service routine at slot 3 (GPU Object Interrupts):

         gpu_int_3:
             jump    (vbl_interrupts)
             nop
             nop
             nop
             nop
             nop
             nop
             nop

     GPU Object interrupt routine:

             .long
         VblInterrupt:
             storew  r0,(OBJ_FLAGS)          ; Rr0 Rr22  | -    | -
             load    (pGflags),cGflags       ; Rr28      | Cr22 | -
             addq    #1,vbl_counter          ; #1 Rr26   | Cr29 | W(r22)
             load    (pGstack),cGstack       ; Rr31      | Cr26 | Wr29
             bclr    #3,cGflags              ; #3 Rr29   | Cr30 | Wr26
             addq    #2,cGstack              ; #2 Rr30   | Cr29 | Wr30
             addq    #4,pGstack              ; #4 Rr31   | Cr30 | Wr29
             bset    #12,cGflags             ; #12 Rr29  | Cr31 | Wr30
             jump    T,(cGstack)             ; T Rr30    | Cr29 | Wr31
             store   cGflags,(pGflags)       ; Rr29 Rr28 | -    | Wr29
                                             ; -         | Cr28 | -
                                             ; -         | -    | W(r28)

     In this example, I do the following steps:

     - write to the OBF register as soon as possible so the OP can continue its processing
     - read the GPU flags register
     - increment the VBL counter
     - read the stack
     - clear the IMASK bit
     - correct the address of the instruction that will be executed after the jump
     - increase the stack pointer
     - clear the interrupt flag
     - jump to the new address
     - write the flags back to the register
  6. Each JagtopusProgrammer cost around 360€ to make in 2012. Probably more now, as some of the components are obsolete.
  7. Indexed and offset load/store take 2 more cycles (as wait states) than a standard load/store. Replacing them with a standard load/store plus an addq can be more efficient, because you can rearrange the opcodes to avoid as many wait states as possible (at least 1 for each load/store). The idea is to replace the 2 wait states with 1 addq and 1 other useful instruction.
  8. There are some tips on my website, which I used for the ST2Jag optimization: http://scpcd.free.fr/jag/jag.htm#ST2Jag
     They should be about 99% accurate, from what I remember.

     In the ST2Jag example:

     - The first column describes what is done in the Read cycle, as R[register number].
     - The second column describes what is done in the Compute cycle, as C[register number], together with the current task of the parallel memory controller: "M" (external memory read), "I" (internal memory read) or "R" (GPU register range).
     - The third column describes what is done in the Write cycle, as W[register number].

     For external memory LOAD(B/W/P), the cost depends on bus usage, but a good approximation is around 10 cycles (for ST2Jag, I use 12 cycles to be pessimistic).

     For the "high long word register", there is also an example in the ST2Jag code. If you want to use loadp/storep, you can't issue other load instructions to external memory, as they will trash the "high long word register". It is indeed nearly impossible to use it if there is an external load in a GPU interrupt routine. But, as you can see in my code, you can insert load instructions as long as they only read internal memory.

     For storep, you indeed can't do something like:

         storep  r0, (r1)
         store   r2, (high_word_register)
         nop
         storep  r0, (r3)

     In this case, the high_word_register can be updated by the second instruction before the memory controller has latched the data and written it to the r1 memory address. Whether this happens depends on the current state of the memory controller and on bus activity.

     To avoid this, you can do one of the following:

     - Insert enough instructions between the first and the second instruction. It will be difficult to make this reliable, as it depends on bus activity:

         storep  r0, (r1)
         nop
         nop
         nop
         nop
         nop
         nop
         nop
         nop
         nop
         nop
         store   r2, (high_word_register)
         nop
         storep  r0, (r3)

     - Or add an external load or store instruction between the first and the second instruction. By the time execution reaches the second storep, you are assured that the first storep has completed, because the added load/store triggers a GPU wait state while the memory controller is still in its "work in progress" state:

         storep  r0, (r1)
         store   r4, (somewhere_in_external_memory)
         store   r2, (high_word_register)
         nop
         storep  r0, (r3)

     The R14/R15 indexed and offset load/store forms are useful, but at the cost of an extra (wait state) cycle. In the ST2Jag example, you will see that I replaced them with standard load instructions to give me more reordering possibilities and improve instruction pipelining. But it will probably depend on register availability and on the algorithm.
  9. The other bits are only used by software (through the OB[0-3] registers), as for GPU objects.
  10. STOREP/LOADP are worth it when you can use them, that is, if you can live with the restrictions on those instructions. I would not recommend using DMAEN: I think the headache of making it work the way you want, with all the Tom bugs, is not worth the time/boost ratio. A better way is to optimize the GPU code.
  11. In the most up-to-date JTRM, the YPOS field for the GPU object was removed. In the netlist, YPOS is not used for the GPU object (the state machine always executes gpuint). I think I read somewhere that this feature was removed from the specification and replaced by the use of a branch object.
  12. Bit 3 of the Stop object generates an interrupt if it is set to 1 (documented in the most up-to-date JTRM and confirmed in the netlist). If bit 2 of the INT1 register is enabled, the stop interrupt is then forwarded to the 68K.
  13. SCPCD

    Zero 5 tips?

    Yep it was me playing
  14. The video distortion and the sound glitch are typical of insufficient bandwidth for the OP. Let's verify it:

     - The OP has ~64µs to parse the whole list on each line: 26.59MHz * 64µs = 1701 cycles.
     - The picture is 352 pixels wide in 16bpp, and the Skunkboard is 16-bit at 5 cycles: (352*2)/2 * 5 cycles = 1760 cycles (for the data only; extra cycles are needed to read the object description).

     The scaled-up look is a side effect of this, as the line buffer is swapped while the OP has not yet finished the previous line.

     What you can do to gain some cycles is to reduce the picture horizontally. 352 is much wider than the visible screen, which is 332 with overscan. A 320-wide picture will leave you about 100 cycles for other objects in internal RAM, and if there are not too many objects, and they are not too wide or too heavily scaled, that may be sufficient. If you don't want to edit the picture, you can play with the object's IWIDTH and DATA fields to clip it.

     PS: On a standard cartridge, which is 32-bit at 10 cycles, you will have (352*2)/4 * 10 cycles = 1760 cycles. With faster ROM you can set it to 5 cycles and get 2x the bandwidth. About the Universal header, I would say it defaults to the standard cartridge setting, so 32-bit at 10 cycles, unless it has already been modified.
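The bandwidth check above can be reproduced with a short script. It is only a sketch of the post's own arithmetic: the function name `op_fetch_cycles` is made up for the illustration, and, like the post, it counts pixel data only and ignores the cycles needed to read the object description.

```python
import math

# Cycles available to the OP per scanline: 26.59 MHz * 64 µs, truncated.
LINE_BUDGET = 26_590_000 * 64 // 1_000_000


def op_fetch_cycles(width_px, bits_per_pixel, bus_bits, cycles_per_access):
    """Cycles needed to fetch one line of pixel data over the given bus."""
    bytes_per_line = width_px * bits_per_pixel // 8
    accesses = math.ceil(bytes_per_line / (bus_bits // 8))
    return accesses * cycles_per_access


if __name__ == "__main__":
    cases = [
        ("352 px, 16-bit bus, 5 cycles", 352, 16, 5),
        ("320 px, 16-bit bus, 5 cycles", 320, 16, 5),
        ("352 px, 32-bit bus, 10 cycles", 352, 32, 10),
        ("352 px, 32-bit bus, 5 cycles", 352, 32, 5),
    ]
    for label, width, bus, cycles in cases:
        need = op_fetch_cycles(width, 16, bus, cycles)
        print(f"{label}: {need} cycles, spare {LINE_BUDGET - need}")
```

A negative "spare" value means the OP cannot finish the line in time, which is the situation described for the 352-pixel picture.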