Session 12: Initialisation

+Andrew Davie · June 28, 2014

It's a bit like someone coming up to Yoda and thanking him for his advice and calling him "young master"... with no idea he's talking to the most powerful Jedi master of all.

bogax · August 8, 2014

Yes, that's a good 8-byte clear. It clears memory but doesn't set the stack pointer.

You should look at Session 24, "Some nice code", for a better one...
        ldx #0 
        txa 
Clear   dex 
        txs 
        pha 
        bne Clear
The above exits with stack pointer set to $FF, all memory zeroed, X and A zero.
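
For anyone who wants to check that exit state without firing up an emulator, here is a rough Python model of the loop. It is only a sketch under a few assumptions: `clear_8_byte` is a made-up name, memory is a flat 256-entry list standing in for $00-$FF, and hardware side effects of the writes (such as the WSYNC halt) are ignored.

```python
def clear_8_byte(mem):
    """Model: ldx #0 / txa / Clear: dex / txs / pha / bne Clear."""
    x = 0                         # ldx #0
    a = x                         # txa
    writes = 0
    while True:
        x = (x - 1) & 0xFF        # dex: 0 wraps to $FF, Z set when X hits 0
        sp = x                    # txs: no flags affected
        mem[sp] = a               # pha: store A at the stack pointer...
        sp = (sp - 1) & 0xFF      # ...then decrement it ($00 wraps to $FF)
        writes += 1
        if x == 0:                # bne Clear falls through on Z from dex
            return a, x, sp, writes

mem = [0xAA] * 256                # stand-in for power-up garbage (assumed value)
a, x, sp, writes = clear_8_byte(mem)
print(hex(a), hex(x), hex(sp), writes, any(mem))
```

In this model the loop makes 256 writes and leaves A = X = 0 with SP = $FF, which matches the description above.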

how about

 ldx #$FF
 txs
 lda #$00
LOOP
 tsx
 pha
 bne LOOP

Omegamatrix can stick in a few extra pha's at the beginning of the loop, as long as the total number of pha's per pass divides evenly into 256.
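
To see why the PHA count matters, here is a small Python model of that unrolled loop. It is a sketch, not something tested on hardware: `unrolled_clear` and `n` (the total number of PHAs per pass, the last of them after the tsx) are names made up for illustration.

```python
def unrolled_clear(n):
    """Model ldx #$FF / txs / lda #0 / LOOP: (n-1 x pha) / tsx / pha / bne LOOP."""
    sp, writes = 0xFF, 0
    while True:
        for _ in range(n - 1):        # the extra PHAs at the top of the loop
            sp = (sp - 1) & 0xFF
            writes += 1
        z = (sp == 0)                 # tsx sets Z from the stack pointer
        sp = (sp - 1) & 0xFF          # last pha of the pass (flags untouched)
        writes += 1
        if z:                         # bne LOOP falls through
            return writes, sp

for n in (1, 2, 4, 8, 3, 6):
    print(n, unrolled_clear(n))
```

In this model any count that divides 256 finishes in exactly 256 pushes. A count such as 3 or 6 still exits with SP = $FF, but only after the stack pointer has wrapped round, so it pushes 768 times.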

Edited August 8, 2014 by bogax

bogax · August 8, 2014

Yes, that is a brilliant, optimal solution. Just stick a CLD on there and you are done.

I sometimes run into a situation where I need to reboot or switch to an entirely new kernel (say, from a title screen to a playing screen). The main issue to avoid is scanline bounces. The optimal code takes about 36 scanlines to complete, which is too long. I made a routine that saves about 26 scanlines, with the trade-off of using many more bytes. I tried to balance the byte cost against the number of scanlines gained, and this was the best balance I could find:
    cld
    lda    #0
    ldx    #$2C
    txs
.loopClear:
    pha
    pha
    pha
    pha
    pha
    pha
    tsx
    cpx    #$7E
    bne    .loopClear
    ldx    #$FF
    txs

like this (see above)

 ldx #$FF
 txs
 lda #$00
LOOP
 pha
 pha
 pha
 pha
 pha
 pha
 pha
 tsx
 pha
 bne LOOP
 cld

Edited August 8, 2014 by bogax

+Andrew Davie · August 9, 2014

how about

 ldx #$FF
 txs
 lda #$00
LOOP
 tsx
 pha
 bne LOOP

It's not better as it stands -- 9 bytes instead of 8. However, in the special case where you need extra speed it's definitely quicker.
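
A quick way to put numbers on "quicker" is to add up raw CPU cycles. The Python sketch below assumes standard 6502 timings (2 cycles for the register and immediate instructions, 3 for PHA, 3 for a taken branch, 2 for a branch that falls through) and a loop that does not cross a page boundary. `CYCLES` and `loop_cycles` are illustrative names, not anything from the thread.

```python
# Standard 6502 timings, assuming the branch does not cross a page.
CYCLES = {"ldx#": 2, "lda#": 2, "txa": 2, "txs": 2, "tsx": 2,
          "dex": 2, "pha": 3, "bne_taken": 3, "bne_not_taken": 2}

def loop_cycles(body, iterations):
    """Cycles for a loop ending in bne: taken on every pass except the last."""
    per_pass = sum(CYCLES[op] for op in body)
    return ((per_pass + CYCLES["bne_taken"]) * (iterations - 1)
            + per_pass + CYCLES["bne_not_taken"])

# 8-byte version: ldx #0 / txa / Clear: dex / txs / pha / bne Clear
eight_byte = (CYCLES["ldx#"] + CYCLES["txa"]
              + loop_cycles(["dex", "txs", "pha"], 256))

# 9-byte version: ldx #$FF / txs / lda #$00 / LOOP: tsx / pha / bne LOOP
nine_byte = (CYCLES["ldx#"] + CYCLES["txs"] + CYCLES["lda#"]
             + loop_cycles(["tsx", "pha"], 256))

print(eight_byte, nine_byte, eight_byte - nine_byte)
```

These are raw counts only. Both routines also push onto the address that decodes to WSYNC ($02), which on real hardware should stall the CPU until the end of the scanline, so measured figures will not match these totals exactly.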

LS_Dracon · August 9, 2014

 ldx #$FF
 txs
 lda #$00
LOOP
 tsx
 pha
 bne LOOP

Hmm... would this work?

 lax #$00
 dex
 txs
Loop
 tsx
 pha
 bne Loop

EDIT: or

 lax #$00
Clear dex
 txs
 pha
 bne Clear

Edited August 9, 2014 by LS_Dracon

Nukey Shay · August 9, 2014

Although LAX#imm is supposedly always stable when the argument is zero, undocumented opcodes are not supported on all hardware.

Omegamatrix · August 9, 2014

    cld
    lda    #0
    ldx    #$2C
    txs
.loopClear:
    pha
    pha
    pha
    pha
    pha
    pha
    tsx
    cpx    #$7E
    bne    .loopClear
    ldx    #$FF
    txs

I came up with a more optimized solution which I posted in my blog a while ago:

    cld
    lda    #0
    ldx    #CXCLR
    txs
    ldx    #28
.loopClearFaster:
    pha
    pha
    pha
    pha
    pha
    pha
    dex
    bpl    .loopClearFaster
    txs

@Bogax, the point of the above code is to balance speed against bytes used. By starting at CXCLR and working down, a lot more cycles are saved (plus fewer loops, thanks to stuffing in multiple PHA's). The above code only takes 10 scanlines and 48 cycles. That's pretty good performance. The compact code takes 36 scanlines and 22 cycles to complete.

You typically only need speed if you are switching kernels... say from a title screen to playing screen, and want to easily avoid scanline bounces.
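
As a rough check of what the faster routine actually touches, this Python sketch lists the addresses hit by 29 passes of six PHAs starting from CXCLR ($2C, as in the first version of the routine). `pushed_addresses` is a hypothetical helper, and the model ignores everything except the stack pointer arithmetic.

```python
CXCLR = 0x2C   # TIA write address, matching the ldx #$2C in the earlier version

def pushed_addresses(start_sp, loops, pushes_per_loop):
    """Addresses written by `loops` passes of `pushes_per_loop` PHAs."""
    sp, written = start_sp, []
    for _ in range(loops * pushes_per_loop):
        written.append(sp)
        sp = (sp - 1) & 0xFF       # PHA post-decrements, wrapping $00 -> $FF
    return written, sp

# ldx #28 ... dex / bpl runs the body for X = 28 down to 0, i.e. 29 passes.
written, sp = pushed_addresses(CXCLR, 29, 6)
untouched = sorted(set(range(256)) - set(written))
print(len(written), hex(sp), hex(untouched[0]), hex(untouched[-1]))
```

In this model the 174 pushes cover $00-$2C and $7F-$FF and leave the stack pointer at $7E, which is the value the cpx #$7E version tests for. The untouched range $2D-$7E holds no TIA write registers of its own, assuming the TIA mirrors start at $40.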

Edited August 9, 2014 by Omegamatrix

+Andrew Davie · August 9, 2014

 lax #$00
Clear dex
 txs
 pha
 bne Clear

Very nice!

LS_Dracon · August 9, 2014

Although LAX#imm is supposedly always stable when the argument is zero, undocumented opcodes are not supported on all hardware.

Yep.

It's safe on the Atari 2600, I assume, as many homebrews use this opcode. Actually, this and DCP.

Very nice!

Thanks, but it's just your code with lax.

+Andrew Davie · August 9, 2014



 lax #0
 txs
loop pha
 tsx
 bne loop

LS_Dracon · August 9, 2014

Neat!

Saves 2 cycles in the loop!

Omegamatrix · August 9, 2014

Although LAX#imm is supposedly always stable when the argument is zero, undocumented opcodes are not supported on all hardware.

I use LAX all the time, but have never used LXA #IMM as it was reportedly highly unstable. Looking at this page, it may be true that loading zero always works:

http://www.oxyron.de/html/opcodes02.html

note to LAX: DO NOT USE!!! On my C128, this opcode is stable, but on my C64-II it loses bits so that the operation looks like this: ORA #? AND #{imm} TAX.

I'm writing this opcode as "LXA" because that is how DASM compiles it.

LS_Dracon · August 9, 2014

LAX doesn't work with an immediate operand (at least in DASM).

So LXA is working fine in the emulator.

BTW, I've been testing these routines and having problems; they're not working.

TSX has to come before PHA, but that doesn't make sense to me...



	cld
	lxa #0
	txs

loop	tsx
	pha
	bne loop

Edited August 9, 2014 by LS_Dracon

+Andrew Davie · August 9, 2014

LAX doesn't work with an immediate operand (at least in DASM).

So LXA is working fine in the emulator.

BTW, I've been testing these routines and having problems; they're not working.

TSX has to come before PHA, but that doesn't make sense to me...
	cld
	lxa #0
	txs
loop	tsx
	pha
	bne loop

The problem seems to be that your (and my) code exits with SP=0, whereas it should be $FF

Add another PHA at the end, like this...

 lxa #0
 txs
loop pha
 tsx
 bne loop
 pha

It starts with X=0 and puts that into SP; the PHA writes 0 to location 0 and sets SP to $FF, and we loop.

When SP is 1, the PHA writes 0 to location 1 and SP becomes 0, which is then TSX'd, so the loop ends with SP=0.

The final PHA resets SP to $FF.

I haven't actually run this. But it looks reasonable. However, "LXA" is considered unstable and should probably not be used. And there's no LAX immediate as you have pointed out.

So...

 lda #0
 tax
 txs
loop pha
 tsx
 bne loop
 pha

It's not so elegant anymore: 9 bytes. But the loop is 2 cycles shorter per pass, so it does have the advantage of a quicker clear (about 512 cycles saved) at the cost of an extra byte.

Omegamatrix · August 9, 2014

 lax #0
 txs
loop pha
 tsx
 bne loop

There is a problem with the above code in that the stack pointer is left pointing to 0 after completion.

LAX doesn't work with an immediate operand (at least in DASM).

So LXA is working fine in the emulator.

BTW, I've been testing these routines and having problems; they're not working.

TSX has to come before PHA, but that doesn't make sense to me...
	cld
	lxa #0
	txs

loop	tsx
	pha
	bne loop

The branch in the loop is never taken, as the very first time through TSX brings a value of 0 to X. PHA does not affect any flags.
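
The difference between the two orderings can be traced with a few lines of Python. This is a minimal sketch that tracks only the stack pointer and the Z flag, starting from the state left by "lxa #0 / txs" (SP = 0, Z set). `run_loop` and its `order` argument are made-up names.

```python
def run_loop(order, sp=0x00):
    """Model the loop body in the given order, followed by bne loop.

    order is "tsx,pha" or "pha,tsx"; returns (writes, final_sp).
    """
    z = True                     # Z flag left set by loading 0
    writes = 0
    while True:
        for op in order.split(","):
            if op == "pha":      # store A, post-decrement SP, flags untouched
                writes += 1
                sp = (sp - 1) & 0xFF
            else:                # tsx copies SP to X and sets Z from it
                z = (sp == 0)
        if z:                    # bne loop is not taken when Z is set
            return writes, sp

print(run_loop("tsx,pha"))       # tsx sees SP = 0 on the very first pass
print(run_loop("pha,tsx"))       # keeps going until tsx sees SP = 0 after a push
```

In this model the tsx-first version stops after a single write with SP = $FF, while the pha-first version makes all 256 writes but exits with SP = 0, which is why it needs the trailing PHA.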

Edit: Andrew beat me to it.

Edited August 9, 2014 by Omegamatrix

LS_Dracon · August 9, 2014

That's why the dex is needed: so that the stack pointer (via txs) enters the loop as $FF.

But then, the code gets bigger again.

We're so close...

Omegamatrix · August 9, 2014

Here's another one.

;25 scanlines + 18 cycles (1918 cycles total)
;A  = 0
;X  = 0
;Y  = random
;SP = $FF
;zp ram location $FF = random

    cld
    lda    #0
.loopClear:
    ldx    #$48          ; PHA opcode = $48
    txs
    inx
    bne    .loopClear+1  ; jump between operator and operand to do PHA

This sets the stack correctly, but leaves RAM location $FF untouched. Not clearing $FF is okay for me. It can be used for a random seed, and programmers often use JSR with the stack aligned to $FF anyhow. Starting at $48 instead of 0 or $FF makes the routine quicker.
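
The trick in that last branch is easier to see from the bytes. Below is a hedged Python sketch: the byte values are hand-assembled from the standard 6502 opcode table rather than taken from the post, and `CODE`, `OPS` and `decode` are illustrative names only.

```python
# Hand-assembled bytes of the routine above (standard 6502 encodings, assumed).
CODE = bytes([
    0xD8,        # cld
    0xA9, 0x00,  # lda #0
    0xA2, 0x48,  # .loopClear: ldx #$48
    0x9A,        # txs
    0xE8,        # inx
    0xD0, 0xFB,  # bne .loopClear+1 (displacement -5 from the next instruction)
])

# Just enough of the opcode table for this routine: opcode -> (name, length).
OPS = {0xD8: ("cld", 1), 0xA9: ("lda #", 2), 0xA2: ("ldx #", 2),
       0x48: ("pha", 1), 0x9A: ("txs", 1), 0xE8: ("inx", 1), 0xD0: ("bne", 2)}

def decode(start):
    """Decode CODE linearly from offset `start` to the end of the routine."""
    out, pc = [], start
    while pc < len(CODE):
        name, length = OPS[CODE[pc]]
        out.append(name)
        pc += length
    return out

loop_clear = 3                       # offset of the ldx opcode
offset = CODE[-1] - 256              # $FB as a signed 8-bit displacement
target = len(CODE) + offset          # branches are relative to the next instruction
print(target == loop_clear + 1)
print(decode(loop_clear))            # what the CPU sees on the first pass
print(decode(target))                # what it sees on every later pass
```

Decoded from .loopClear the bytes read ldx / txs / inx / bne, and decoded one byte later the $48 operand becomes a PHA, so the same four bytes serve as both the setup and the loop body.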

Edit: I just realized the mirror for the TIA registers starts at $40, so I don't actually clear:

VSYNC
VBLANK
NUSIZ0
NUSIZ1
COLUP0
COLUP1

Most of these registers the programmer will set up during the program, so it's still not too bad as long as the user is aware that the initial state of them is unknown.
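
For reference, here is a Python sketch of which addresses the $48 routine writes and how many raw CPU cycles it takes. It assumes the TIA mirrors at $40-$7F decode the low six address bits, the register names of the usual vcs.h header, standard 6502 timings with no page crossing, and 76 CPU cycles per scanline. `fast_clear` and `TIA_LOW` are made-up names.

```python
# TIA write registers $00-$07, named as in the usual vcs.h header (assumed).
TIA_LOW = ["VSYNC", "VBLANK", "WSYNC", "RSYNC",
           "NUSIZ0", "NUSIZ1", "COLUP0", "COLUP1"]

def fast_clear():
    """Model cld / lda #0 / ldx #$48 / txs / inx / bne -> pha / txs / inx ..."""
    x = 0x48                          # ldx #$48 (only executed once)
    cycles = 2 + 2 + 2                # cld, lda #0, ldx #$48
    written = []
    while True:
        sp = x                        # txs
        x = (x + 1) & 0xFF            # inx: Z is set when X wraps to 0
        cycles += 2 + 2
        if x == 0:
            return written, sp, cycles + 2   # bne not taken: 2 cycles
        cycles += 3                   # bne taken, landing on the $48 operand
        written.append(sp)            # pha stores A at SP; the next txs resets SP
        cycles += 3

written, sp, cycles = fast_clear()
skipped = [addr for addr in range(0x40, 0x100) if addr not in written]
print([hex(addr) for addr in skipped])
print([TIA_LOW[addr & 0x3F] for addr in skipped if addr < 0x80])
print(hex(sp), cycles, divmod(cycles, 76))
```

In this model WSYNC and RSYNC also show up among the skipped mirrors; they are strobe registers with no state to clear, which is presumably why the list above leaves them out. The raw count comes to 1842 cycles, or 24 scanlines + 18 cycles, one scanline under the figure in the code comment. That difference is untested on hardware and may simply come from how the scanlines were counted.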

Edited August 9, 2014 by Omegamatrix

Omegamatrix · August 9, 2014

I believe I have just come up with an 8 byte solution that includes CLD, and no illegal opcodes:

;39 scanlines + 65 cycles (3029 cycles total)
;A  = 0
;X  = 0
;Y  = random
;SP = $FF

    cld
.loopClear:
    ldx    #$0A          ; ASL opcode = $0A
    inx
    txs
    pha
    bne    .loopClear+1  ; jump between operator and operand to do ASL

It takes the most cycles of any solution, but clears all the TIA registers and RIOT ram.

Edited August 9, 2014 by Omegamatrix

+Andrew Davie · August 10, 2014

I believe I have just come up with an 8 byte solution that includes CLD, and no illegal opcodes:
;39 scanlines + 65 cycles (3029 cycles total)
;A  = 0
;X  = 0
;Y  = random
;SP = $FF

    cld
.loopClear:
    ldx    #$0A          ; ASL opcode = $0A
    inx
    txs
    pha
    bne    .loopClear+1  ; jump between operator and operand to do ASL
It takes the most cycles of any solution, but clears all the TIA registers and RIOT ram.

The branch into mid-instruction, which is an asl, is very clever.

However, I'm struggling to understand this. X is effectively initialised at 11 (first time) so that's where the first "a" value goes. But "a" is undefined -- effectively random.

From the second pass on you do an "asl" every loop, so after 8 loops "a" is guaranteed to be 0. And you branch until the Z flag is set (effectively when x gets to 0). So you never clear locations 0 to 10.

And furthermore locations 10 to 17 effectively have randomish data.

This code is bizarre, and this is my third attempt to analyse/respond.

Omegamatrix · August 10, 2014

The branch into mid-instruction which is a pha is very clever.

However, I'm struggling to understand this. X is effectively initialised at 11 (first time) so that's where the first "a" value goes. But "a" is undefined -- effectively random.

From the second pass on you do an "asl" every loop, so after 8 loops "a" is guaranteed to be 0. And you branch until the Z flag is set (effectively when x gets to 0). So you never clear locations 0 to 10.

And furthermore locations 10 to 17 effectively have randomish data.

This code is bizarre, and this is my third attempt to analyse/respond.

Hi Andrew,

A=0 by the time it hits the TIA mirrors at $40-$7F. It doesn't matter what value A starts with, as it will be zero for the "second time through" as it clears the mirrored addresses. As a bonus you know the carry will also always end up being clear by the end of this routine.

Omegamatrix · August 10, 2014

Does the second routine make sense now?

 SP    REGISTER    VALUE (FROM ACCUMULATOR, which gets ASL'd)
$0B     REFP0      %XXXXXXXX
$0C     REFP1      %XXXXXXX0
$0D     PF0        %XXXXXX00
$0E     PF1        %XXXXX000
$0F     PF2        %XXXX0000
$10     RESP0      %XXX00000
$11     RESP1      %XX000000
$12     RESM0      %X0000000
$13     RESM1      %00000000
$14     RESBL      A=0 from now on

;writes continue to start of TIA mirrors

 SP    REGISTER    VALUE (FROM ACCUMULATOR)
$40     VSYNC       0
$41     VBLANK      0
$42     WSYNC       0
...

;Writes continue through ZP $80-$FF clearing RIOT RAM
;At end of routine TIA registers and RIOT RAM cleared,
;A=X=0, SP = $FF
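
The table can be reproduced for any starting accumulator with a short Python model. This is a sketch only: `asl_clear` is a hypothetical name, memory is a plain dict keyed by address, and the TIA is assumed to decode just the low six address bits so that $40-$7F mirror $00-$3F.

```python
def asl_clear(a):
    """Model cld / ldx #$0A / inx / txs / pha / bne -> asl / inx / txs / pha ..."""
    x = 0x0A                     # ldx #$0A (only executed once)
    carry = None                 # unknown until the first asl
    mem = {}
    while True:
        x = (x + 1) & 0xFF       # inx: Z is set when X wraps to 0
        sp = x                   # txs
        mem[sp] = a              # pha stores the current A...
        sp = (sp - 1) & 0xFF     # ...and post-decrements SP
        if x == 0:               # bne not taken
            return mem, a, x, sp, carry
        carry = (a >> 7) & 1     # asl a: bit 7 goes to carry...
        a = (a << 1) & 0xFF      # ...and A shifts left one bit

for start in (0xFF, 0x01, 0x80):   # arbitrary example power-up values
    mem, a, x, sp, carry = asl_clear(start)
    row = " ".join(f"{mem[addr]:08b}" for addr in range(0x0B, 0x15))
    print(f"A=${start:02X}: {row} -> A={a} X={x} SP=${sp:02X} C={carry}")
```

Whatever value A starts with, the model shows it reaching 0 by the write to $13, well before the mirrors at $40, and the direct writes skip only $01-$0A, which the mirrors at $41-$4A cover.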

Nukey Shay · August 10, 2014

You typically only need speed if you are switching kernels... say from a title screen to playing screen, and want to easily avoid scanline bounces.

But why would you be clearing ram and registers at that point anyway? Of all the games I've altered to have more than a single kernel, I've never had to do it. Powerup only "requires" it because everything is in an unknown state...but even that is too broad of a statement to be using (i.e. you really only need to clear the stuff your regular game init routine misses, or gfx/aud registers that you won't be using at all).

LS_Dracon · August 10, 2014

Assuming LXA is as stable as LAX, and with dex removed from the loop and the stack set to $FF, should this work?

We could test LXA on real hardware. I've been searching about it, and the people who say it's not stable seem to be confusing it with LAX.

   lxa #0 
   dex 
   txs 
loop 
   pha 
   tsx 
   bne loop

EDIT: Definitely unstable, and it's not "lax #imm": it ANDs A with X and loads the result into X.

Since neither X nor A starts as 0, it's not useful.

Edited August 10, 2014 by LS_Dracon

Omegamatrix · August 10, 2014

But why would you be clearing ram and registers at that point anyway? Of all the games I've altered to have more than a single kernel, I've never had to do it. Powerup only "requires" it because everything is in an unknown state...but even that is too broad of a statement to be using (i.e. you really only need to clear the stuff your regular game init routine misses, or gfx/aud registers that you won't be using at all).

It's just much easier to clean it all. IMHO it also makes the game a lot easier to troubleshoot.

Omegamatrix · August 10, 2014

EDIT: Definitely unstable, and it's not "lax #imm": it ANDs A with X and loads the result into X.

Since neither X nor A starts as 0, it's not useful.

Although LXA is unstable, it is possible that using 0 for the immediate value could be stable as Nukey described.

My notes describe LXA as:

AND byte with accumulator, then transfer accumulator to X register.

And the unstable behaviour is described as:

ORA #? AND #{imm} TAX

In either case the accumulator is AND'd with the immediate value right before TAX. As long as you are ANDing with 0 you should be okay. That being said I'd still be a little iffy to implement it. Who knows if the behaviour will be different on some consoles?
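
That argument can be written down as a tiny Python model. It is only a sketch of the behaviour as described in the notes above (ORA #? / AND #{imm} / TAX), not a claim about what any particular chip does: `lxa` is a made-up function, `magic` stands for the unpredictable "?" value, and the sample magic values are arbitrary.

```python
def lxa(a, imm, magic):
    """LXA #imm as described in the notes: ORA #magic / AND #imm / TAX.

    `magic` stands for the unpredictable '?' value; returns (new_a, new_x).
    """
    result = (a | magic) & imm
    return result, result

# With imm = 0, every combination of A and the unknown value gives A = X = 0.
always_zero = all(lxa(a, 0x00, m) == (0, 0)
                  for a in range(256) for m in range(256))

# With a non-zero operand the result depends on the unknown value.
results_for_ff = {lxa(0x00, 0xFF, m)[0] for m in (0x00, 0xEE, 0xFF)}
print(always_zero, sorted(hex(r) for r in results_for_ff))
```

Under this model a zero operand always loads zero, while any other operand lets the unknown bits leak through, which is consistent with the caution about relying on it.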
