Another weirdness with the GPU


42bs


I have the following code:

	moveq	#0,tmp0
.w	cmpq	#0,tmp0
	jr	eq,.w
	nop

One would expect it to never leave the loop, right? But it does.

Changing it to:

	moveq	#0,tmp0
.w	cmpq	#1,tmp0
	jr	ne,.w
	nop

works as expected!

 

(In the actual code, tmp0 is a flag modified by an interrupt.)

 

I also tried something like this:

	moveq	#0,tmp0
	add	tmp0,tmp1
.w	cmpq	#0,tmp0
	jr	eq,.w
	nop

to be sure tmp0 is really 0 when reaching the cmpq.

 

Has anyone run into this, or does anyone see the problem with it?


I don't see any problem with code #1.

In this part of the code, the moveq writeback happens in cycle 2, so the cmpq is guaranteed to see the right value in tmp0.

 

If tmp0 is not equal to 0 here, I see 2 cases:

- tmp0 is modified by an interrupt

- tmp0 conflicts with a long instruction that precedes this code (load, div...)

 


tmp0 is not modified by an interrupt in the test case, and there is no div before it.

In particular, that would also affect the second example.

What drives me crazy is why the test for equal fails but the test for not-equal does not.

Interestingly, if I duplicate the code in #1, it works?!


Test #2 only proves that tmp0 is not equal to 1, and from test #1, tmp0 seems not to be equal to 0 either.

So I would think that tmp0 is neither 0 nor 1 but something else.

 

Maybe you can add a "store tmp0, (somewhere)" between the cmpq and the jr (or in place of the nop) to see what its true value is.

It will probably help to see what goes wrong.

 


tmp0 was r0; I changed it to r10, same behavior.

1 hour ago, DEATH said:

try with an emulator, place a break just after the nop instruction, look at the tmp0 value

VJ does not show the effect. It actually has problems with the GPUOBJ.

5 hours ago, SCPCD said:

Maybe you can add a "store tmp0, (somewhere)

Changed the code:

	moveq	#0,r0
	moveq	#2,r10
.w	cmpq	#1,r10
	jr	ne,.w
	store	r10,(r0)
	addq	#4,r0
	store	BG,(r0)

$0 and $4 are printed by the 68k. (BG = $f00058)

In the above case (the "ne" check), $0 = $2 and $4 = $4 (the pre-set value).

If I change back to "eq":

	moveq	#0,r0
	moveq	#1,r10
.w	cmpq	#1,r10
	jr	eq,.w
	store	r10,(r0)
	addq	#4,r0
	store	BG,(r0)

It just falls thru and $0 = $1 and $4 = $f00058.

 

_But_ it is somehow related to the OP interrupt. If it does not fire (no GPUOBJ in the OP list), the loop blocks as intended.

The CPU and timer interrupts work w/o problems.

 

IRQ_SP.a	REG 31
IRQ_RTS.a	REG 30
IRQ_FLAGADDR.a	REG 29
IRQ_FLAG.a	REG 28

IRQScratch4.a	REG  4
IRQScratch3.a	REG  3
IRQScratch2.a	REG  2
IRQScratch1.a	REG  1
IRQScratch0.a	REG  0

  org	$f03030
op::
	load	(IRQ_FLAGADDR.a),IRQ_FLAG.a
	movei	#op_irq,IRQScratch0.a
	bset	#9+3,IRQ_FLAG.a
	load	(IRQ_SP.a),IRQ_RTS.a
	jump	(IRQScratch0.a)
	bclr	#3,IRQ_FLAG.a

...

irq_return
	addqt	#2,IRQ_RTS.a
	movefa	IRQ_SP,IRQ_SP.a
	jump	(IRQ_RTS.a)
	store	IRQ_FLAG.a,(IRQ_FLAGADDR.a)

op_irq:
	movei	#$f00026,IRQScratch1.a
	movefa	BG,IRQScratch0.a
	storew	IRQScratch1.a,(IRQScratch1.a) ; resume OP
	moveq	#0,IRQScratch1.a
	jr	irq_return
	storew	IRQScratch0.a,(IRQScratch0.a)
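
For readers following the handler above, the address and bit arithmetic can be checked with a small Python sketch. It assumes the usual G_FLAGS layout (IMASK at bit 3, INT_CLR0..4 at bits 9..13) and one 16-byte vector slot per interrupt starting at $F03000; the helper names are made up for illustration and are not part of the code in this thread.

```python
# Hypothetical helper names. Bit positions and the vector layout are
# assumptions based on the usual G_FLAGS description, not on this thread.
G_RAM = 0xF03000        # start of GPU local RAM, interrupt vectors start here
VECTOR_SIZE = 0x10      # assumed: one 16-byte slot per interrupt source

IMASK_BIT = 3           # interrupt mask bit, cleared on return
INT_CLR_BASE = 9        # INT_CLR0..4 occupy bits 9..13

OP_IRQ = 3              # object processor interrupt number


def vector_address(irq):
    """Address of the vector slot for interrupt number irq."""
    return G_RAM + irq * VECTOR_SIZE


def flags_for_return(saved_flags, irq):
    """Mirror 'bset #9+irq' followed by 'bclr #3' on the saved flags."""
    flags = saved_flags | (1 << (INT_CLR_BASE + irq))  # acknowledge the latch
    flags &= ~(1 << IMASK_BIT)                         # re-enable interrupts
    return flags


print(hex(vector_address(OP_IRQ)))
print(hex(flags_for_return(0x40D8, OP_IRQ)))
```

Under these assumptions the OP vector lands on the `org $f03030` used above, and `bset #9+3` sets the clear bit for interrupt 3 while `bclr #3` drops IMASK.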

 

Edited by 42bs

Any chance your loop resumes execution (following interrupt return) with the wrong bank of registers still active for a few clock cycles?  Then you get unpredictable results, which might randomly seem predictable?

 

I think the store below sets/clears bits to clear the interrupt, with the side effect of returning to the normal active bank of registers.

irq_return
    jump    (IRQ_RTS.a)
    store    IRQ_FLAG.a,(IRQ_FLAGADDR.a)

 

 

Page 53 of SoftRef_V10.pdf, @c Stephen Moss:
"Values written to the G_FLAGS register may not appear to have changed in the following two instructions due to pipe-lining effects. Consequently, writing a value to the flag bits and making use of those flag bits in the following instruction will not work properly. If it is necessary to use flags set by a STORE instruction, then ensure that at least two other instructions lie between the STORE and the flags dependent instruction.
If it is necessary to use flags set by an indexed STORE instruction, then ensure that at least four other instructions lie between the STORE and the flags dependent instruction."


 


Yeah, I've always wondered why the interrupt return routine is written as if it was immune to that. It seems like if it wasn't, it would be almost impossible to use both register banks and interrupts, so I'm really hoping that's not it.

 

I have similar issues in some DSP code that could be explained by this, but I've never managed to isolate them enough to say for sure.


6 hours ago, jguff said:

Any chance your loop resumes execution (following interrupt return) with the wrong bank of registers still active for a few clock cycles?

The Atari code for interrupt handling looks like this, and it also works for other interrupts. Still, the following code works, which means you might be correct.

	moveq	#0,r0
	moveq	#2,r10
	moveta	r10,r10
.w	cmpq	#2,r10
	jr	eq,.w
	store	r10,(r0)
	addq	#4,r0
	store	BG,(r0)

If I change the irq_return:

	addqt	#2,IRQ_RTS.a
	movefa	IRQ_SP,IRQ_SP.a
	store	IRQ_FLAG.a,(IRQ_FLAGADDR.a)
	jump	(IRQ_RTS.a)
	nop

It works w/o the 'moveta'

 

And: my original code, where I use a register as a flag to wait for the interrupt, also works.

Edited by 42bs

In summary, I think the return from interrupt should be:

	addqt	#2,IRQ_RTS.a
	movefa	IRQ_SP,IRQ_SP.a
	moveta	IRQ_RTS.a,IRQ_RTS
	store	IRQ_FLAG.a,(IRQ_FLAGADDR.a)
	jump	(IRQ_RTS.a)
	nop

BTW: I ended up resetting the IRQ stack rather than incrementing it.


Here's a little program I wrote; you might want to read through the code line by line ...

 

It uses 3 interrupts: CPU->GPU, Timer and GPUOBJ.

 

It does not run correctly on emulators as they do not handle OP[0-3] ($f00010) correctly.

 

irq_exp.zip

Edited by 42bs
Typo

I can't rebuild it, but what is expected, and what happens when it doesn't work?

 

I took a quick look and saw some potential errors:

- in "my_irq", at the end, both interrupts (vid & gpu) are cleared even though only one of them is handled at a time:

If a VBL interrupt and a GPU interrupt arrive at almost the same time, you will get an infinite loop in waitGPU, because the pending GPU interrupt will be cleared and no_vi will never be executed.

I don't know whether that can happen with this code sequence, but it may be possible.

=> my_irq should handle all latched interrupts and clear only those it handled, to avoid missing interrupts.

 

- in "StartGPU", the "moveq #1, d7" should come before the "move.l #$5, $f02114", to avoid the case where the GPU finishes before the 68k executes its next instruction.

There should probably also be a check of d7 before the stop, to avoid waiting until after the VBL to detect the end. (With the current code I'm sure that's not the case, because the GPU seems to have a lot of work to do, but with a tiny job it can happen... :) )

 

- in "irq_return", the flags should be restored in the jump's delay slot; it's the only way to do it properly: the jump takes long enough to let the bank switch complete.

If the store is not in the delay slot, pending interrupts may be handled incorrectly: the store before the jump clears IMASK at least two cycles earlier and allows a pending interrupt to start, which can push the wrong return PC address onto the stack.

 

- There is a "storep" in the ".fill" loop, but it can potentially be corrupted by the "load"s in op_irq, as those loads go through the GPU's gateway interface: only loads from $F02000 to $F021FF and from $F03000 to $F0FFFF are internal and don't corrupt HI_DATA.

I'm not sure about $F02200 to $F02FFF, but it should be safe too.

 

- I prefer "addqt #4, sp" over a move but, in this context, it should work the same. ;)
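
The "StartGPU" ordering point in the list above can be modelled with a tiny deterministic Python sketch. It assumes, without having seen the 68k source, that d7 is a "GPU busy" flag which the GPU-completion interrupt clears; the function and step names are hypothetical.

```python
# Toy model of the StartGPU ordering issue. Assumption (not confirmed by the
# thread's source): the completion interrupt clears d7, and the 68k waits
# until d7 becomes 0.

def simulate(order, gpu_done_before_next_68k_instruction):
    """Run the two 68k steps in the given order and return the final d7.

    Returns 1 if the 68k still believes the GPU is busy, 0 otherwise.
    """
    d7 = 0
    gpu_running = False
    for step in order:
        if step == "set_flag":          # moveq #1,d7
            d7 = 1
        elif step == "start_gpu":       # move.l #$5,$f02114
            gpu_running = True
        if gpu_running and gpu_done_before_next_68k_instruction:
            d7 = 0                      # assumed: completion interrupt clears d7
            gpu_running = False
    return d7


racy = simulate(["start_gpu", "set_flag"], True)
safe = simulate(["set_flag", "start_gpu"], True)
print(racy, safe)
```

In this model the racy order leaves d7 set after the GPU has already finished, so nothing is left to clear it; setting the flag first avoids that regardless of how fast the GPU is.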

 

 

I'll check later whether I can spot other potential problems.

 


Ok, rebuilding needs "my" rmac and lyxass.

As for

7 minutes ago, SCPCD said:

in "irq_return", the restoring of the flags should be in the jump slot, it's the only way to do it properly : the jump takes enough time to permit the bank switch.

See my troubles/comments above, which were the original reason for this post: if I do not restore the flags _before_ the JUMP, the "jr eq," fails.

And this applies _only_ if the OP-interrupt is activated.

 

This even works:

	addqt	#2,IRQ_RTS.a
	moveta	IRQ_RTS.a,IRQ_RTS
	store	IRQ_FLAG.a,(IRQ_FLAGADDR.a)
	jump	(IRQ_RTS.a)
	moveta	IRQ_SP,IRQ_SP.a	; this instruction already executes back in bank #1!

 

8 minutes ago, SCPCD said:

- i prefer the "addqt #4, sp" instead of a move but, in this context, it should work the same. ;)

I had this before, but at least JagTris crashes after some time. I checked the game sources I have, and there is not much GPU interrupt code, but the one I found also restores the stack using the top address rather than the "addqt".

11 minutes ago, SCPCD said:

in "my_irq", there is, at the end, a clear of both interrupts (vid & gpu) when only one of both is treated at a time :

Not true, this code:

	lsl.w	#8,d2
	or.w	#C_VIDENA|C_GPUENA,d2
	swap	d2

Clears only the currently latched ones (in d2). The "or.w" enables both (see the swap).

 

13 minutes ago, SCPCD said:

There is a "storep" in the ".fill" loop but it potentially can be corrupt from "load" of the op_irq

Right, if any interrupt comes in during the fill loop, HIDATA gets corrupted. I do not see it in this example, but I have seen it in others.

14 minutes ago, SCPCD said:

in "StartGPU", the "moveq #1, d7" should be before the "move.l #$5, $f02114", to avoid the case where the GPU finish before next instruction of the 68k

Yes, this is a potential race condition.

 


34 minutes ago, SCPCD said:

I can't rebuild it, but what is expected and what happens when it didn't work

This is a working example. It sets the BG color, fills the screen, then clears the BG color to zero. And it uses two GPU objects to set the BG color at different Y positions.


2 hours ago, 42bs said:

Not true, this code:


	lsl.w	#8,d2
	or.w	#C_VIDENA|C_GPUENA,d2
	swap	d2

Clears only the currently latched ones (in d2). The "or.w" enables both (see the swap).

 

 

If d2 = %11 (gpu & vint), the service routine will only handle vint but clear both interrupts. :)
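
To make the d2 = %11 case concrete, here is a minimal Python sketch of the word the 68k snippet builds. The values of C_VIDENA and C_GPUENA ($0001 and $0002) are assumptions, the function names are made up, and the `swap` is left out because it only moves the word to the other half of d2.

```python
# Assumed constant values; the thread only shows the symbol names.
C_VIDENA = 0x0001
C_GPUENA = 0x0002


def ack_word(latched):
    """Model of 'lsl.w #8,d2' followed by 'or.w #C_VIDENA|C_GPUENA,d2'."""
    d2 = (latched << 8) & 0xFFFF      # latched bits move into the clear bits
    d2 |= C_VIDENA | C_GPUENA         # both sources stay enabled
    return d2


def lost_interrupts(latched, serviced):
    """Bits that get acknowledged although their handler never ran."""
    cleared = (ack_word(latched) >> 8) & 0xFF
    return cleared & ~serviced & 0xFF


print(hex(ack_word(0b11)))
print(bin(lost_interrupts(0b11, serviced=0b01)))
```

With only one source latched nothing is lost; with both latched and only the video interrupt serviced, the GPU bit is acknowledged without its handler having run, which is the scenario described above.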

 


Yeah, I was worried about what @SCPCD said about restoring the flags too soon, too. It seems impossible to correctly use two register banks and interrupts if the hardware really behaves as feared. I'm worried the "fixed" version is just getting luckier with some timings than the broken version.


Found a bug in the gpu code :

init::
	movei	#$f02100,IRQ_FLAGADDR
	moveta	IRQ_FLAGADDR,IRQ_FLAGADDR.a

	movei	#1<<14|%11111<<9,r0	; clear all ints, REGPAGE = 1
;	store	r1,(IRQ_FLAGADDR) 
	store	r0,(IRQ_FLAGADDR) ;;; SCPCD => should be r0 not r1, else unexpected data will be set in GFLAGS register and probably wrong bank used in next instructions
	nop
	nop				; wait

	movei	#IRQ_STACK,IRQ_SP
	moveta	IRQ_SP,IRQ_SP.a

	movei	#$f00058,BG

	movei	#1<<14|%11111<<9|%01101<<4,r0
	store	r0,(IRQ_FLAGADDR)
	nop
	nop

Dunno if it will resolve the problem. :)
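
As a cross-check of the two values stored to the flags register in the init code above, here is a short Python sketch that evaluates the expressions and decodes the enable bits. The field positions (INT_ENA0..4 at bits 4..8) and the source names per interrupt number are assumptions based on the usual G_FLAGS description; the variable and function names are made up.

```python
# Field layout and source order are assumptions, not taken from the thread.
REGPAGE = 1 << 14                 # select register bank 1
CLEAR_ALL = 0b11111 << 9          # INT_CLR0..4: acknowledge every latch
ENABLE = 0b01101 << 4             # INT_ENA bits written by the second store

first_store = REGPAGE | CLEAR_ALL
second_store = REGPAGE | CLEAR_ALL | ENABLE

SOURCES = ["CPU", "DSP", "timer", "object processor", "blitter"]


def enabled_sources(flags):
    """Names of the interrupt sources whose INT_ENA bit is set."""
    return [name for i, name in enumerate(SOURCES) if flags & (1 << (4 + i))]


print(hex(first_store), hex(second_store))
print(enabled_sources(second_store))
```

Under these assumptions the second store enables the CPU, timer and object processor interrupts, which matches the three interrupts the test program is said to use.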

 


1 hour ago, cubanismo said:

Yeah, I was worried about what @SCPCD said about restoring the flags too soon, too. It seems impossible to correctly use two register banks and interrupts if the hardware really behaves as feared. I'm worried the "fixed" version is just getting luckier with some timings than the broken version.

Might be a reason, I could not find GPU code with interrupts in the game sources I checked.


Besides that, DSP would likely have the same issue, and I assume everyone is using interrupts there. Maybe they don't use two register banks though.

 

The CD code requires GPU interrupts. It rarely runs during gameplay I would assume, but the Cinepak cutscenes, assuming they all use something like the sample, use multiple GPU interrupts (external routed from Jerry for CD access and GPU objects in object list) too, and I believe the Cinepak codec itself runs in bank 1, but I'd have to go check to be sure.


1 hour ago, cubanismo said:

Besides that, DSP would likely have the same issue, and I assume everyone is using interrupts there. Maybe they don't use two register banks though.

The audio/input engine I wrote (that's been used in a number of Reboot and rB+/Jagstudio games) uses almost every register in both banks, and I'm not aware of any problems.

 

Here's the code I use:

Int_DSP_StackPtr            .equr   r31         ; Interrupt (bank 0) DSP stack
Int_DSP_Flags               .equr   r28         ; Interrupt DSP flags backup
Int_Tmp1                    .equr   r29         ; Interrupt temporary register #1
Int_Tmp2                    .equr   r30         ; Interrupt temporary register #2

; Start of interrupt: save DSP flags
    movei       #D_FLAGS, Int_Tmp1
    load        (Int_Tmp1), Int_DSP_Flags

...

; End of interrupt
    movei       #D_FLAGS, Int_Tmp2              ; get flags address
    bset        #10, Int_DSP_Flags              ; clear the interrupt latch 
    load        (Int_DSP_StackPtr), Int_Tmp1    ; get last instruction address
    bclr        #3, Int_DSP_Flags               ; clear IMASK
    addq        #2, Int_Tmp1                    ; point at next to be executed
    addq        #4, Int_DSP_StackPtr            ; updating the stack pointer
    jump        (Int_Tmp1)                      ; and return
    store       Int_DSP_Flags, (Int_Tmp2)       ; restore flags
Edited by Zerosquare