Session 12: Initialisation


75 replies to this topic

#51 Andrew Davie OFFLINE  

Andrew Davie

    Stargunner

  • Topic Starter
  • 1,782 posts
  • Dr.Boo
  • Location:Tasmania

Posted Sat Jun 28, 2014 4:00 AM

It's a bit like someone coming up to Yoda and thanking him for his advice and calling him "young master"... with no idea he's talking to the most powerful Jedi master of all.



#52 bogax OFFLINE  

bogax

    Dragonstomper

  • 710 posts

Posted Fri Aug 8, 2014 2:30 PM

 

 

Yes, that's a good 8-byte clear. It clears memory but doesn't set the stack pointer.

you should look at Session 24 "Some nice code" for a better one...

        ldx #0 
        txa 
Clear   dex 
        txs 
        pha 
        bne Clear

The above exits with stack pointer set to $FF, all memory zeroed, X and A zero.

 

 how about
 

 ldx #$FF
 txs
 lda #$00
LOOP
 tsx
 pha
 bne LOOP 

 Omegamatrix can stick in
 a few extra pha's at the
 beginning of the loop, as
 long as the total number of
 pha's divides evenly into 256
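
For anyone who wants to check the "divides into 256" condition without firing up an emulator, here is a rough Python sketch of the loop above. It is only a model: clear_with_tsx and pha_per_loop are made-up names, the stack pointer is treated as a plain 8-bit counter, and no TIA timing is simulated.

```python
def clear_with_tsx(pha_per_loop):
    """Model of: ldx #$FF / txs / lda #0 / LOOP: pha... tsx pha bne LOOP.

    The tsx sits just before the last pha, as in the listing above.
    Returns (final SP, list of addresses written).
    """
    sp = 0xFF
    writes = []
    while True:
        for _ in range(pha_per_loop - 1):   # the extra pha's
            writes.append(sp)
            sp = (sp - 1) & 0xFF
        x = sp                              # tsx: X = SP, Z flag follows X
        writes.append(sp)                   # last pha (does not touch flags)
        sp = (sp - 1) & 0xFF
        if x == 0:                          # bne falls through when Z is set
            return sp, writes


for n in (1, 2, 4, 8, 6):
    sp, writes = clear_with_tsx(n)
    print(n, hex(sp), len(writes), len(set(writes)))
```

In this model every count ends with SP = $FF and all 256 addresses touched, but only counts that divide 256 finish in a single pass. Six pha's per loop, for example, wraps around memory and does 768 writes before the branch falls through.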


 

 


Edited by bogax, Fri Aug 8, 2014 2:52 PM.


#53 bogax OFFLINE  

bogax

    Dragonstomper

  • 710 posts

Posted Fri Aug 8, 2014 4:02 PM

 

 

Yes, that is a brilliant, optimal solution. Just stick a CLD on there and you are done.

 

 

I sometimes run into a situation where I need to re-boot or switch to an entirely new kernel (say titlescreen to a playing screen). The main issue to avoid is scanline bounces. The optimal code takes about 36 scanlines to complete, which is too long. I made a routine that saves about 26 scanlines with the trade-off of using many more bytes. I tried to balance the byte cost vs the number of scanlines gained, and this was the best balance I could find:

    cld
    lda    #0
    ldx    #$2C
    txs
.loopClear:
    pha
    pha
    pha
    pha
    pha
    pha
    tsx
    cpx    #$7E
    bne    .loopClear
    ldx    #$FF
    txs


 like this (see above)
 

 ldx #$FF
 txs
 lda #$00
LOOP
 pha
 pha
 pha
 pha
 pha
 pha
 pha
 tsx
 pha
 bne LOOP
 cld

Edited by bogax, Fri Aug 8, 2014 4:04 PM.


#54 Andrew Davie OFFLINE  

Andrew Davie

    Stargunner

  • Topic Starter
  • 1,782 posts
  • Dr.Boo
  • Location:Tasmania

Posted Fri Aug 8, 2014 6:19 PM

 

 how about
 

 ldx #$FF
 txs
 lda #$00
LOOP
 tsx
 pha
 bne LOOP 

 

It's not better as it stands -- 9 bytes instead of 8.  However, in the special case where you need extra speed it's definitely quicker.



#55 LS_Dracon OFFLINE  

LS_Dracon

    Dragonstomper

  • 737 posts

Posted Fri Aug 8, 2014 6:44 PM

 ldx #$FF

txs
lda #$00
LOOP
tsx
pha
bne LOOP

 

Hmm... will this work?

 

     lax #$00

     dex

     txs

Loop

     tsx 

     pha

     bne Loop

 

EDIT : or

 

          lax #$00

Clear dex
          txs
          pha
          bne Clear


Edited by LS_Dracon, Fri Aug 8, 2014 6:51 PM.


#56 Nukey Shay OFFLINE  

Nukey Shay

    Sheik Yerbouti

  • 21,670 posts
  • Location:The land of Gorch

Posted Fri Aug 8, 2014 7:53 PM

Although LAX#imm is supposedly always stable when the argument is zero, undocumented opcodes are not supported on all hardware.

#57 Omegamatrix OFFLINE  

Omegamatrix

    Quadrunner

  • 6,125 posts
  • Location:Canada

Posted Fri Aug 8, 2014 9:55 PM

    cld
    lda    #0
    ldx    #$2C
    txs
.loopClear:
    pha
    pha
    pha
    pha
    pha
    pha
    tsx
    cpx    #$7E
    bne    .loopClear
    ldx    #$FF
    txs

 

I came up with a more optimized solution which I posted in my blog a while ago:

    cld
    lda    #0
    ldx    #CXCLR
    txs
    ldx    #28
.loopClearFaster:
    pha
    pha
    pha
    pha
    pha
    pha
    dex
    bpl    .loopClearFaster
    txs

@Bogax the point of the above code is to balance speed vs bytes used. By starting at CXCLR and working down, a lot more cycles are saved (plus fewer loops by stuffing in multiple PHA's). The above code only takes 10 scanlines and 48 cycles. That's pretty good performance. The compact code takes 36 scanlines and 22 cycles to complete.
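
A rough Python model of the CXCLR version, for anyone wanting to see which addresses it actually touches. Assumptions: CXCLR is $2C as in the usual vcs.h, the stack pointer is modelled as a plain 8-bit counter, and fast_clear is just a name picked for the sketch. Timing is not modelled, so this says nothing about the scanline figures.

```python
CXCLR = 0x2C  # assumed value, as in the usual vcs.h


def fast_clear():
    """Model of the loop above: 29 passes of six pha's, then txs."""
    sp = CXCLR                        # ldx #CXCLR / txs
    x = 28                            # ldx #28
    writes = []
    while True:
        for _ in range(6):            # six pha's
            writes.append(sp)
            sp = (sp - 1) & 0xFF
        x = (x - 1) & 0xFF            # dex
        if x & 0x80:                  # bpl falls through once X goes negative ($FF)
            break
    sp = x                            # txs: X is $FF here
    return sp, x, writes


sp, x, writes = fast_clear()
print(hex(sp), len(writes), hex(min(w for w in writes if w > CXCLR)))
```

The model gives 174 writes: $2C down to $00, then $FF down to $7F. That covers all of RAM ($80-$FF) and every TIA address from $00 to CXCLR, and the final txs leaves SP = $FF (with X also left at $FF rather than 0).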

 

 

You typically only need speed if you are switching kernels... say from a title screen to playing screen, and want to easily avoid scanline bounces.


Edited by Omegamatrix, Fri Aug 8, 2014 10:15 PM.


#58 Andrew Davie OFFLINE  

Andrew Davie

    Stargunner

  • Topic Starter
  • 1,782 posts
  • Dr.Boo
  • Location:Tasmania

Posted Fri Aug 8, 2014 10:27 PM


          lax #$00

Clear dex
          txs
          pha
          bne Clear

 

Very nice!



#59 LS_Dracon OFFLINE  

LS_Dracon

    Dragonstomper

  • 737 posts

Posted Sat Aug 9, 2014 6:35 AM

Although LAX#imm is supposedly always stable when the argument is zero, undocumented opcodes are not supported on all hardware.

Yep.

It's safe on the Atari 2600, I assume, as many homebrews use this opcode. Actually this and DCP.

 

 

 

Very nice!

Thanks but it's just your code with lax ;)



#60 Andrew Davie OFFLINE  

Andrew Davie

    Stargunner

  • Topic Starter
  • 1,782 posts
  • Dr.Boo
  • Location:Tasmania

Posted Sat Aug 9, 2014 7:47 AM

 lax #0
 txs
loop pha
 tsx
 bne loop



#61 LS_Dracon OFFLINE  

LS_Dracon

    Dragonstomper

  • 737 posts

Posted Sat Aug 9, 2014 8:31 AM

Neat!

Saves 2 cycles in the loop!



#62 Omegamatrix OFFLINE  

Omegamatrix

    Quadrunner

  • 6,125 posts
  • Location:Canada

Posted Sat Aug 9, 2014 8:34 AM

Although LAX#imm is supposedly always stable when the argument is zero, undocumented opcodes are not supported on all hardware.

I use LAX all the time, but have never used LXA #IMM as it was reportedly highly unstable. Looking at this page, it may be true that loading zero always works:

 

http://www.oxyron.de.../opcodes02.html

 

 

note to LAX: DO NOT USE!!! On my C128, this opcode is stable, but on my C64-II it loses bits so that the operation looks like this: ORA #? AND #{imm} TAX.

 

I'm writing this opcode as "LXA" because that is how DASM compiles it.



#63 LS_Dracon OFFLINE  

LS_Dracon

    Dragonstomper

  • 737 posts

Posted Sat Aug 9, 2014 9:04 AM

LAX doesn't work with an immediate operand (at least in DASM).

So I used LXA, which works fine on the emulator.

 

BTW, I'm testing these routines and having problems; they don't work.

 

TSX has to come before PHA, but that doesn't make sense to me...



	cld
	lxa #0
	txs

loop	tsx
	pha
	bne loop

Edited by LS_Dracon, Sat Aug 9, 2014 9:06 AM.


#64 Andrew Davie OFFLINE  

Andrew Davie

    Stargunner

  • Topic Starter
  • 1,782 posts
  • Dr.Boo
  • Location:Tasmania

Posted Sat Aug 9, 2014 9:31 AM

 

LAX doesn't work with an immediate operand (at least in DASM).

So I used LXA, which works fine on the emulator.

 

BTW, I'm testing these routines and having problems; they don't work.

 

TSX has to come before PHA, but that doesn't make sense to me...

	cld
	lxa #0
	txs
loop	tsx
	pha
	bne loop

 

The problem seems to be that your (and my) code exits with SP=0, whereas it should be $FF

Add another PHA at the end, like this...

 lxa #0
 txs
loop pha
 tsx
 bne loop
 pha

starts with x=0 and then puts that into SP,  the PHA writes 0 to location 0, and sets the SP to $FF and we loop

when SP is 1, the pha will write 0 to location 1, and SP becomes 0 which is then tsx'd and the loop ends, with SP=0

the final PHA resets the SP to $FF

 

I haven't actually run this. But it looks reasonable.  However, "LXA" is considered unstable and should probably not be used. And there's no LAX immediate as you have pointed out.

 

So...

 lda #0
 tax
 txs
loop pha
 tsx
 bne loop
 pha

It's not so elegant anymore. 9 bytes, but it does have the advantage of a quicker clear (2 cycles saved per loop, 512 cycles in total) at the cost of an extra byte.
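
Since none of this has been run, here is a small Python model of the two loop orders, as a sanity check rather than a proof. Assumptions: both start with A = X = SP = 0, the stack pointer is a plain 8-bit counter, and the function names are made up for the sketch.

```python
def pha_then_tsx(sp=0x00, final_pha=False):
    """loop: pha / tsx / bne loop  (optionally followed by one more pha)."""
    writes = []
    while True:
        writes.append(sp)          # pha
        sp = (sp - 1) & 0xFF
        x = sp                     # tsx sets Z from the new SP
        if x == 0:                 # bne falls through
            break
    if final_pha:
        writes.append(sp)          # trailing pha wraps SP back to $FF
        sp = (sp - 1) & 0xFF
    return sp, writes


def tsx_then_pha(sp=0x00):
    """loop: tsx / pha / bne loop"""
    writes = []
    while True:
        x = sp                     # tsx sets Z from the current SP
        writes.append(sp)          # pha does not change flags
        sp = (sp - 1) & 0xFF
        if x == 0:                 # bne falls through
            break
    return sp, writes


sp_a, writes_a = pha_then_tsx()
sp_b, writes_b = pha_then_tsx(final_pha=True)
sp_c, writes_c = tsx_then_pha()
print(hex(sp_a), len(set(writes_a)))
print(hex(sp_b), len(set(writes_b)))
print(hex(sp_c), len(writes_c))
```

In the model, pha-then-tsx touches all 256 addresses but stops with SP = 0 until the trailing pha is added, while tsx-then-pha starting from SP = 0 drops out of the loop after a single write.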



#65 Omegamatrix OFFLINE  

Omegamatrix

    Quadrunner

  • 6,125 posts
  • Location:Canada

Posted Sat Aug 9, 2014 9:33 AM

 lax #0
 txs
loop pha
 tsx
 bne loop

 

There is a problem with the above code in that the stack pointer is left pointing to 0 after completion.

 

LAX doesn't work with an immediate operand (at least in DASM).

So I used LXA, which works fine on the emulator.

 

BTW, I'm testing these routines and having problems; they don't work.

 

TSX has to come before PHA, but that doesn't make sense to me...



	cld
	lxa #0
	txs

loop	tsx
	pha
	bne loop

The branch in the loop is never taken, as the very first time through TSX brings a value of 0 to X. PHA does not affect any flags.

 

 

Edit: Andrew beat me to it.


Edited by Omegamatrix, Sat Aug 9, 2014 9:39 AM.


#66 LS_Dracon OFFLINE  

LS_Dracon

    Dragonstomper

  • 737 posts

Posted Sat Aug 9, 2014 11:34 AM

That's why the dex is needed: so the stack pointer (txs) enters the loop as $FF.

But then the code gets bigger again.

 

We're so close... :(



#67 Omegamatrix OFFLINE  

Omegamatrix

    Quadrunner

  • 6,125 posts
  • Location:Canada

Posted Sat Aug 9, 2014 12:16 PM

Here's another one. :)

;25 scanlines + 18 cycles (1918 cycles total)
;A  = 0
;X  = 0
;Y  = random
;SP = $FF
;zp ram location $FF = random

    cld
    lda    #0
.loopClear:
    ldx    #$48          ; PHA opcode = $48
    txs
    inx
    bne    .loopClear+1  ; jump between operator and operand to do PHA

This sets the stack correctly, but leaves RAM location $FF untouched. Not clearing $FF is okay for me. It can be used for a random seed, and often programmers use JSR with the stack aligned to $FF anyhow. Starting at $48 instead of 0 or $FF makes the routine quicker.

 

 

Edit: just realized the mirror for the TIA registers starts at $40, so I don't actually clear:

VSYNC
VBLANK
NUSIZ0
NUSIZ1
COLUP0
COLUP1

 

Most of these registers the programmer will set up during the program, so it's still not too bad as long as the user is aware that the initial state of them is unknown.
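
To double-check which locations that routine reaches, here is a rough Python model. Assumptions: writes to $40-$7F land on TIA addresses $00-$3F (the mirror mentioned in the edit), RAM is $80-$FF, and the names in the sketch are made up. Cycle counts are not modelled.

```python
def clear_from_48():
    """Model of: ldx #$48 / txs / inx / bne loop+1, where loop+1 is a pha."""
    x = 0x48                       # ldx #$48 (first pass only)
    writes = []
    while True:
        sp = x                     # txs
        x = (x + 1) & 0xFF         # inx sets Z
        if x == 0:                 # bne falls through
            return sp, x, writes
        writes.append(sp)          # branch lands on the $48 operand byte = pha
        # pha leaves SP one lower, but the next txs overwrites it anyway


sp, x, writes = clear_from_48()
tia_hit = {w & 0x3F for w in writes if w < 0x80}
tia_missed = sorted(set(range(0x40)) - tia_hit)
print(hex(sp), hex(writes[0]), hex(writes[-1]), [hex(r) for r in tia_missed])
```

The model writes $48 through $FE and exits with SP = $FF, so RAM location $FF and TIA addresses $00-$07 are never touched. Of those eight, WSYNC and RSYNC are strobes, which leaves the six registers in the list.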


Edited by Omegamatrix, Sat Aug 9, 2014 12:30 PM.


#68 Omegamatrix OFFLINE  

Omegamatrix

    Quadrunner

  • 6,125 posts
  • Location:Canada

Posted Sat Aug 9, 2014 12:47 PM

I believe I have just come up with an 8 byte solution that includes CLD, and no illegal opcodes:

;39 scanlines + 65 cycles (3029 cycles total)
;A  = 0
;X  = 0
;Y  = random
;SP = $FF

    cld
.loopClear:
    ldx    #$0A          ; ASL opcode = $0A
    inx
    txs
    pha
    bne    .loopClear+1  ; jump between operator and operand to do ASL

It takes the most cycles of any solution, but clears all the TIA registers and RIOT ram. :)


Edited by Omegamatrix, Sat Aug 9, 2014 12:52 PM.


#69 Andrew Davie OFFLINE  

Andrew Davie

    Stargunner

  • Topic Starter
  • 1,782 posts
  • Dr.Boo
  • Location:Tasmania

Posted Sat Aug 9, 2014 7:07 PM

I believe I have just come up with an 8 byte solution that includes CLD, and no illegal opcodes:

;39 scanlines + 65 cycles (3029 cycles total)
;A  = 0
;X  = 0
;Y  = random
;SP = $FF

    cld
.loopClear:
    ldx    #$0A          ; ASL opcode = $0A
    inx
    txs
    pha
    bne    .loopClear+1  ; jump between operator and operand to do ASL

It takes the most cycles of any solution, but clears all the TIA registers and RIOT ram. :)

 

The branch into mid-instruction, which is an asl, is very clever.

However, I'm struggling to understand this. X is effectively initialised at 11 (first time) so that's where the first "a" value goes.  But "a" is undefined -- effectively random.

From the second pass on you do an "asl" every loop, so after 8 loops A is guaranteed to be 0.  And you branch until the Z flag is set (effectively when X gets to 0). So you never clear locations 0 to 10.

And furthermore locations 10 to 17 effectively have randomish data.

This code is bizarre, and this is my third attempt to analyse/respond.



#70 Omegamatrix OFFLINE  

Omegamatrix

    Quadrunner

  • 6,125 posts
  • Location:Canada

Posted Sat Aug 9, 2014 7:18 PM

 

The branch into mid-instruction which is a pha is very clever.

However, I'm struggling to understand this. X is effectively initialised at 11 (first time) so that's where the first "a" value goes.  But "a" is undefined -- effectively random.

From the second pass on you do an "asl" every loop, so after 8 loops A is guaranteed to be 0.  And you branch until the Z flag is set (effectively when X gets to 0). So you never clear locations 0 to 10.

And furthermore locations 10 to 17 effectively have randomish data.

This code is bizarre, and this is my third attempt to analyse/respond.

Hi Andrew,

 

A=0 by the time it hits the TIA mirrors at $40-$7F. It doesn't matter what value A starts with, as it will be zero for the "second time through" as it clears the mirrored addresses. As a bonus you know the carry will also always end up being clear by the end of this routine.



#71 Omegamatrix OFFLINE  

Omegamatrix

    Quadrunner

  • 6,125 posts
  • Location:Canada

Posted Sat Aug 9, 2014 8:16 PM

Does the second routine make sense now?

 SP    REGISTER    VALUE (FROM ACCUMULATOR, which gets ASL'd)
$0B     REFP0      %XXXXXXXX
$0C     REFP1      %XXXXXXX0
$0D     PF0        %XXXXXX00
$0E     PF1        %XXXXX000
$0F     PF2        %XXXX0000
$10     RESP0      %XXX00000
$11     RESP1      %XX000000
$12     RESM0      %X0000000
$13     RESM1      %00000000
$14     RESBL      A=0 from now on

;writes continue to start of TIA mirrors

 SP    REGISTER    VALUE (FROM ACCUMULATOR)
$40     VSYNC       0
$41     VBLANK      0
$42     WSYNC       0
...

;Writes continue through ZP $80-$FF clearing RIOT RAM
;At end of routine TIA registers and RIOT RAM cleared,
;A=X=0, SP = $FF


#72 Nukey Shay OFFLINE  

Nukey Shay

    Sheik Yerbouti

  • 21,670 posts
  • Location:The land of Gorch

Posted Sun Aug 10, 2014 6:00 AM

You typically only need speed if you are switching kernels... say from a title screen to playing screen, and want to easily avoid scanline bounces.

But why would you be clearing ram and registers at that point anyway?  Of all the games I've altered to have more than a single kernel, I've never had to do it.  Powerup only "requires" it because everything is in an unknown state...but even that is too broad of a statement to be using (i.e. you really only need to clear the stuff your regular game init routine misses, or gfx/aud registers that you won't be using at all).



#73 LS_Dracon OFFLINE  

LS_Dracon

    Dragonstomper

  • 737 posts

Posted Sun Aug 10, 2014 8:00 AM

Assuming LXA is as stable as LAX, would this work, with the dex moved out of the loop and the stack set to $FF?

We could test LXA on real hardware. From what I've found searching, the people who say it's unstable are mixing it up with LAX.

   lxa #0 
   dex 
   txs 
loop 
   pha 
   tsx 
   bne loop

EDIT: Definitely unstable, and it's not "lax #imm": it ANDs A with X and loads the result into X.

Since X doesn't start as 0, and neither does A, it's not useful.


Edited by LS_Dracon, Sun Aug 10, 2014 8:16 AM.


#74 Omegamatrix OFFLINE  

Omegamatrix

    Quadrunner

  • 6,125 posts
  • Location:Canada

Posted Sun Aug 10, 2014 9:56 AM

But why would you be clearing ram and registers at that point anyway?  Of all the games I've altered to have more than a single kernel, I've never had to do it.  Powerup only "requires" it because everything is in an unknown state...but even that is too broad of a statement to be using (i.e. you really only need to clear the stuff your regular game init routine misses, or gfx/aud registers that you won't be using at all).

It's just much easier to clean it all. IMHO it also makes the game a lot easier to troubleshoot.



#75 Omegamatrix OFFLINE  

Omegamatrix

    Quadrunner

  • 6,125 posts
  • Location:Canada

Posted Sun Aug 10, 2014 10:05 AM

EDIT: Definitely unstable, and it's not "lax #imm": it ANDs A with X and loads the result into X.

Since X doesn't start as 0, and neither does A, it's not useful.

Although LXA is unstable, it is possible that using 0 for the immediate value could be stable as Nukey described.

 

My notes describe LXA as:

AND byte with accumulator, then transfer accumulator to X register.
 
And the unstable behaviour is described as: 

ORA #? AND #{imm} TAX

 

 

In either case the accumulator is AND'd with the immediate value right before TAX. As long as you are ANDing with 0 you should be okay. That being said I'd still be a little iffy to implement it. Who knows if the behaviour will be different on some consoles?
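
To make that argument concrete, here is a small Python sketch of the unstable behaviour as quoted (ORA #? AND #{imm} TAX). The magic value stands in for the unknown "?" and is an assumption of the model, not something measured on hardware.

```python
def lxa(a, imm, magic):
    """Quoted unstable behaviour: A = (A | magic) & imm, then X = A."""
    a = (a | magic) & imm & 0xFF
    return a, a            # (A, X)


# With an immediate of 0, the unknown bits are always masked away.
results = {lxa(a, 0x00, magic) for a in range(256) for magic in range(256)}
print(results)
```

In this model the result is A = X = 0 for every starting A and every magic value, which is the reasoning behind an immediate of 0 being safe. It says nothing about consoles that might behave differently from the quoted description.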





