enthusi

Lynx loader from scratch


Hi,

I am trying to fully understand what is going on there and to write my own loader.

I base the first tests on Karri's nice small loader:

*=$200

stz mapctl
lda #$04
sta serctl
ldx #$00
loop
lda cart0
sta $f000,x
inx
bne loop

jmp $f000

.dsb $200+50-*,0

 

This loads a full page (the ripple counter is supposedly somewhere in the middle of block 0 of bank 0 now).

stage2 is only ~100 bytes long currently.

However, my own stage2 loader starts by selecting block 1 (512-byte block size),

so the ripple counter should be reset and all further data is read from block 1, offset 0. In terms of

the ROM that is 0x200, and in terms of the LNX file it is 0x240 (correct?).

Is there an error in my thinking here? (It doesn't work.)
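
The offset arithmetic in the question can be checked on the host with a few lines of Python. This is only a minimal sketch: the function names are made up for illustration, and it assumes the LNX file is just the raw ROM preceded by a 64-byte header.

```python
LNX_HEADER_SIZE = 64  # assumption: the LNX header is 64 bytes (0x40)

def rom_offset(block, block_size=512, offset_in_block=0):
    """Byte offset inside the raw ROM image for a given block."""
    return block * block_size + offset_in_block

def lnx_offset(block, block_size=512, offset_in_block=0):
    """Same position, but counted from the start of the .lnx file."""
    return LNX_HEADER_SIZE + rom_offset(block, block_size, offset_in_block)

print(hex(rom_offset(1)), hex(lnx_offset(1)))  # 0x200 0x240
```

So block 1, offset 0 is indeed 0x200 in the ROM and 0x240 in the LNX file under these assumptions.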

I encrypt the above code with lynxenc to (only stage1):

 

00000000 ff cc e7 22 43 9e 5a e6 4e c1 47 ba 12 48 0d ff |..."C.Z.N.G..H..|
00000010 f6 ed 22 8e 00 6d 47 57 ac cb c1 6f 79 82 87 99 |.."..mGW...oy...|
00000020 42 e7 71 9c aa de 7f f6 75 a6 fa 1a 3d 01 97 75 |B.q.....u...=..u|
00000030 22 99 43 11 |".C.|

lynxdec agrees on the content at least :)

 

Hard to tell in handysdl or mednafen if that part worked at all.

I thought a BRK also exits handysdl but even starting the stage1 loader with BRK results in 'nothing' happening.

 

My LNX header is pretty certainly ok:

00000000 4c 59 4e 58 00 02 00 00 01 00 41 73 73 65 6d 62 |LYNX......Assemb|
00000010 6c 6f 69 64 73 00 00 00 00 00 00 00 00 00 00 00 |loids...........|
00000020 00 00 00 00 00 00 00 00 00 00 50 72 69 6f 72 41 |..........PriorA|
00000030 72 74 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |rt..............|
00000040 ff cc e7 22 43 9e 5a e6 4e c1 47 ba 12 48 0d ff |..."C.Z.N.G..H..|
00000050 f6 ed 22 8e 00 6d 47 57 ac cb c1 6f 79 82 87 99 |.."..mGW...oy...|
00000060 42 e7 71 9c aa de 7f f6 75 a6 fa 1a 3d 01 97 75 |B.q.....u...=..u|
00000070 22 99 43 11 a9 1a 8d 8a fd a9 0b 8d 8b fd a9 02 |".C.............|
00000080 8d 87 fd a9 01 85 10 20 39 f0 a9 14 85 14 a0 02 |....... 9.......|
....
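
To double-check a header like the one dumped above, it can be unpacked on the host. The following is a hedged sketch: the field layout (magic, two 16-bit block sizes, version, 32-byte cart name, 16-byte manufacturer, rotation, spare) is the commonly described Handy LNX layout and is an assumption here, as are all the names in the code.

```python
import struct

# Assumed LNX header layout, little-endian, 64 bytes in total.
LNX_HEADER = struct.Struct("<4sHHH32s16sB5s")

def parse_lnx_header(data):
    """Unpack the first 64 bytes of an .lnx file into a dict."""
    magic, bank0, bank1, version, name, maker, rotation, _spare = \
        LNX_HEADER.unpack(data[:LNX_HEADER.size])
    if magic != b"LYNX":
        raise ValueError("not an LNX file")
    return {
        "bank0_block_size": bank0,
        "bank1_block_size": bank1,
        "version": version,
        "name": name.rstrip(b"\0").decode("ascii"),
        "manufacturer": maker.rstrip(b"\0").decode("ascii"),
        "rotation": rotation,
    }

# The header bytes from the dump above, rebuilt by hand.
header = (b"LYNX" + bytes([0x00, 0x02, 0x00, 0x00, 0x01, 0x00])
          + b"Assembloids".ljust(32, b"\0")
          + b"PriorArt".ljust(16, b"\0")
          + bytes(6))
info = parse_lnx_header(header)
print(info)
```

Read this way, the dump declares a 512-byte block size for bank 0, which matches the block size used by the stage2 loader.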

 

The stage2 loader starts right after the last byte of the encrypted stage1 code.

Stage2 is not encrypted anymore.

My stage 2 loader does this to set a block address:

(not optimized at all, I know)

 

set_block
.(
lda iodat
and #%11111101
sta tmpiodat

lda block
sta tmp

;pre-roll once into bit 0 (not yet 1)
rol tmp
rol tmp+1

ldx #7
loop_block

rol tmp
rol tmp+1 ;bit 7 of block -> bit 1 of tmp+1

lda tmp+1
and #%00000010
ora tmpiodat
sta iodat

lda #%00000011 ;0
sta sysctl1
lda #%00000010 ;1
sta sysctl1

dex
bpl loop_block
rts
.)
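
One way to "hand-debug" the shifting part of set_block without an emulator is to model the two ROL instructions in Python. This sketch only checks which data bit ends up in bit 1 on each pass. It says nothing about the strobe, IODIR, or restoring IODAT, and it assumes the cart address shift register wants the block number most significant bit first. The function names are illustrative.

```python
def rol(value, carry):
    """Model 65C02 ROL on one byte; returns (result, carry_out)."""
    shifted = (value << 1) | carry
    return shifted & 0xFF, shifted >> 8

def data_bits_sent(block, carry_in=0, tmp_hi=0):
    """Bit-1 values the routine would OR into IODAT, in order."""
    tmp_lo = block & 0xFF
    carry = carry_in                      # carry is undefined on entry
    tmp_lo, carry = rol(tmp_lo, carry)    # pre-roll: rol tmp
    tmp_hi, carry = rol(tmp_hi, carry)    #           rol tmp+1
    bits = []
    for _ in range(8):                    # ldx #7 ... dex / bpl: 8 passes
        tmp_lo, carry = rol(tmp_lo, carry)
        tmp_hi, carry = rol(tmp_hi, carry)
        bits.append((tmp_hi >> 1) & 1)    # and #%00000010
    return bits

print(data_bits_sent(1))  # [0, 0, 0, 0, 0, 0, 0, 1]
```

In this model the bits come out from bit 7 down to bit 0, regardless of the initial carry or the old contents of tmp+1, so the shifting itself looks sound and any fault would have to be elsewhere.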

 

 

and then it starts loading.

The payload starts in the LNX image at 0x240.

 

Can any of you spot an error (in thinking or code) here?

That would be most appreciated :)

Thanks and thanks for all the help/documentation so far!

Martin


If you use handybug you can set a break point in your stage2 loader and then step through your code.

And why reinvent the wheel (meaning the block select)? Do you think you can make a rounder one?


I am running Linux only, so no Handybug, unfortunately, as far as I know?

This is all part of the learning process. I like loaders anyway, wrote several for C64 and would hate to use a black box on Lynx now.

There is always room for improvement, either in the code or at least in one's own understanding of things :)

In fact I even consider it vital for a community to hold up knowledge about the 'inner workings'.

BLL is great (in fact it is pretty awesome) but I don't intend to make use of it (other than of course learning from it).

It is probably worth digging up Handybug somehow and, worst case, setting up a notebook with it. Thanks for the hint!


Well, a VM with win XP should be enough.

BLL's select block is the "official" one from boot ROM, only register saving added.
Its first action is to reset strobe and output 0.

Maybe "hand"-debug your code.

 

Be sure to restore IODAT on leave!


Ah, MS even has a free XP image for VirtualBox it seems. Will try.

I also went for 'hand debug', but even BRK didn't really do what I had hoped it would, so I have doubts that it is even the set_block code that fails (even though I spotted an error).

I will report back here if I make progress.


Hm, XP (32-bit) complains that the handybug.exe I used (from the testers wanted thread) is not a proper Win32 application.


Thanks! Yours works in VirtualBox at least. It seems my Wine is somehow borked on top of that, but I'm fine with something working somewhere ;-)


Splendid, I got it fixed. I will improve it a little and post it here. Thanks, 42BS and Sage!


Find attached a minimal loader that loads a 5500 Byte test program.

Can't be much faster I guess.

Stage2 fits into stack and payload starts at $0200 then.

 

Stage1 is this now:

*= $0200
stz mapctl ;BIOS seems to set this to 3 which I think is fine for most cases
lda #$04
sta serctl
ldx #162 ;(256-94 size of stage2)
loop
lda cart0
sta !$005e,x
inx
bne loop
jmp $100
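
The index trick in this stage1 is easy to verify by hand or with a short script. The sketch below assumes stage2 is 94 bytes, as the comment in the listing says, and that the `!` prefix forces absolute (16-bit) addressing so that `$005e,x` does not wrap around inside the zero page.

```python
STAGE2_SIZE = 94   # from the comment in the listing
BASE = 0x005E      # operand of "sta !$005e,x"

start_x = 256 - STAGE2_SIZE          # value loaded by "ldx #162"
# X counts up until it wraps to 0, so the loop body runs for X = 162..255.
targets = [BASE + x for x in range(start_x, 256)]

print(start_x, hex(targets[0]), hex(targets[-1]), len(targets))
```

With these numbers the loop runs exactly 94 times and the bytes land at $0100 to $015d, which is why the final jump goes to $100 and no separate compare against an end value is needed.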

 

I assume 64KB LNX files are not that common?

I haven't tested this on real hardware yet.

Thanks for your help :)

Cheers,

Martin

love.zip


I kind of see the value in minimizing code. But the Stage1 cannot be smaller than 51 bytes.

 

Would it be possible to use the extra zero-bytes for something useful?

 

There is also the case of AUDIN. As the Lynx I and Lynx II start up with AUDIN in different states I thought it would be a good idea to set it in Stage1.

 

One thing that I was concerned about was being able to load data to any location. That is why I put my 2nd loader at the same spot as the Mikey ROM and screen buffers. That area is somewhat wasted anyway, as you cannot easily run code there because of the registers. Suzy, on the other hand, does not care. So the screen space and the registers can reside at the same memory locations without problems.

 

FFF8-FFFF sacred registers

FC00-FFF7 registers (E018-FFF7 screen buffer)

FB68-FBFF 2nd loader 151 bytes

 

Just thought of sharing this as well. Of course using the stack area is pretty safe too.

 

OMG!!! Your <3 love <3 is soooo cute!

Edited by karri


I see no benefit in a 2nd stage loader. The first stage loader (the one which is decrypted) can be large enough to load the game unless the reason is to start it at $200. But since every game needs variables, it is easier to put those from $200 upward and let the code start later.

Currently, you are "wasting" 29 bytes in stage1.


Yes, of course stage 1 can set up all kinds of things, including a small logo sprite/registers

I will see if I fit in a full loader as well :)

I think it is a bit too tight, but worth an attempt. Going for 2nd stage is certainly faster than decoding a 2nd 51 byte chunk.

Currently I even waste most of Block0, too.


There is usually enough ROM space available. But I agree that decoding takes some time.

So the challenge is to fit a complete loader into 51 bytes ;-)

You do not need the last JMP if you load to $200+52 ;-)


Got it! ;-)

Would be cool if someone could test this on a real Lynx.

A 1-shot loader. I was stupid not thinking of this earlier.

There is still room for improvement, but it is no longer required:

 

lda #1
sta block
jsr $fe00

;hardcoded target length (in blocks)
lda #11
sta blocks2load

load_loop
load_a_full_block
ldy #2;2 pages = 512 Bytes Blocksize
ldx #0
pageloop
lda cart0
target
sta $0300,x
inx
bne pageloop
inc target+2
dey ;pages
bne pageloop

inc block
lda block
jsr $fe00
dec blocks2load
bne load_a_full_block
ready

jmp $0300

1shotload.zip


You could store the number of blocks to load at byte 52 in the ROM and start the loader by reading it.

 

lda cart0 // get number of full blocks to load

sta blocks2load

stz block

load_a_full_block:

inc block

lda block

jsr $fe00

tay // a == 0 after fe00, x == 2

pageloop
lda cart0
target
sta $0300,y
iny
bne pageloop
inc target+2
dex ;pages
bne pageloop
dec blocks2load
bne load_a_full_block
ready
jmp $0300

 

=> 37 bytes, 13 to go ;-)


There is enough space left to do the same for the target address. Right now I assemble and encrypt everything in a Makefile anyway, but it is indeed more generic this way.

Works! Now there is not much space left ;-) (0 bytes in the code below)

 

block = $02
blocks2load = $03
exe=$04

lda cart0 // get number of full blocks to load
sta blocks2load
ldx #1
l1
lda cart0
sta target+1,x
sta exe,x
dex
bpl l1
stz block
load_a_full_block:
inc block
lda block
jsr $fe00
tay // a == 0 after fe00, x == 2
pageloop
lda cart0
target
sta $0300,y
iny
bne pageloop
inc target+2
dex ;pages
bne pageloop
dec blocks2load
bne load_a_full_block
ready
jmp (exe)

 

That's the new generic loader now:

00000000 ff 27 8e df 7a ec e5 9d 40 62 6c 4e 39 0a 36 05 |.'..z...@blN9.6.|
00000010 23 a0 00 ff 7c 51 78 34 3a de d4 da 96 17 3a 61 |#...|Qx4:.....:a|
00000020 6c 26 91 20 be e5 41 e5 51 f5 52 b2 1f 68 ae ed |l&. ..A.Q.R..h..|
00000030 4d ec cb 31 |M..1|

I use it as:

.bin 0,0,"1shotload.enc"
.byte (end_of_game-start_of_game)/BLOCKSIZE+1 ;size of game in blocks
.byte $03,$00 ;big endian!
.dsb (BLOCKSIZE*STARTBLOCK)-*,0;align to a full block for my own loader
start_of_game
.bin 0,0,"game.bin"

end_of_game
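
For anyone who prefers to assemble the image outside the assembler, the same layout can be sketched in Python. This is only an illustration under stated assumptions: `build_rom` and its parameters are hypothetical names, the encrypted loader is taken to be the 52-byte blob above, and the tail padding to whole blocks is my addition so that every block the loader reads actually exists.

```python
BLOCKSIZE = 512  # block size used in this thread

def build_rom(loader_enc, game, load_addr=0x0300, start_block=1):
    """Encrypted loader, block count, load address (high byte first), padding, game."""
    blocks = len(game) // BLOCKSIZE + 1   # same formula as the .byte line above
    if not 1 <= blocks <= 255:
        raise ValueError("block count must fit in one byte")
    header = bytes([blocks, (load_addr >> 8) & 0xFF, load_addr & 0xFF])
    image = bytes(loader_enc) + header
    pad = BLOCKSIZE * start_block - len(image)
    if pad < 0:
        raise ValueError("loader and header do not fit before the start block")
    image += bytes(pad) + bytes(game)
    # Assumption: pad the tail so all counted blocks are present in the file.
    image += bytes((start_block + blocks) * BLOCKSIZE - len(image))
    return image

rom = build_rom(bytes(52), bytes(5500))
print(rom[52], rom[53:55].hex(), len(rom))
```

For the 5500-byte test program this gives 11 blocks, the same value that was hardcoded earlier. Note that the `+1` formula counts one extra block when the size is an exact multiple of the block size, which is harmless but loads 512 bytes more than needed.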


Thank you very much ;-)

Here is a version that runs itself from stack and loads (hardcoded though) to $0200

but I like the more generic one more.

 

;A and X are 0 on entry
;ldx #0
l2
lda code,x
sta $100,x
inx
bpl l2
jmp $100
code
*=$100
lda cart0 // get number of full blocks to load
sta blocks2load
stz block
load_a_full_block:
inc block
lda block
jsr $fe00
tay // a == 0 after fe00, x == 2
pageloop
lda cart0
target
sta $200,y
iny
bne pageloop
inc target+2
dex
bne pageloop
dec blocks2load
bne load_a_full_block
ready
jmp $200


*hmm* I would place the code at $200-sizeof(loader) to remove the last JMP ;-)

Based on Monty Python: Every byte that's wasted ...


Considered that but it's a bit less trivial due to the JSRs in the code (and us hogging the stack ;-).

You'd need to fiddle TXS for that, more ugly than 3 jump bytes in my book ;-)

 

EDIT:

this will do, but no bytes 'saved' anywhere, just a bit more obfuscation at work ;-)

 

ldx #$ff
txs
ldx #33
l2
lda code,x
pha
dex
bpl l2
jmp $1de

 

code
*=$1de
lda cart0 // get number of full blocks to load
sta blocks2load
stz block
load_a_full_block:
inc block
lda block
jsr $fe00
tay // a == 0 after fe00, x == 2
pageloop
lda cart0
target
sta $200,y
iny
bne pageloop
inc target+2
dex
bne pageloop
dec blocks2load
bne load_a_full_block
ready
;jmp $200
code_size=*-$1de
init_size=code-$200
#print code_size
#print init_size
#print 50-(code_size+init_size)
.dsb 50-(code_size+init_size),0
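
Why the PHA loop leaves the code at $1de in forward order can be seen from a small model of the stack. This is a sketch with placeholder bytes instead of the real loader code. It assumes the usual 6502 behaviour that PHA stores at $0100+S and then decrements S, and that 34 bytes are pushed (X = 33 down to 0).

```python
def push_copy(code, sp=0xFF):
    """Model: ldx #len-1 / lda code,x / pha / dex / bpl, starting with S = sp."""
    ram = bytearray(0x200)
    for x in range(len(code) - 1, -1, -1):
        ram[0x100 + sp] = code[x]     # pha writes to $0100+S ...
        sp = (sp - 1) & 0xFF          # ... then decrements S
    return ram, sp

code = bytes(range(1, 35))            # 34 placeholder bytes
ram, sp = push_copy(code)
start = 0x100 + sp + 1
print(hex(start), hex(sp))            # 0x1de 0xdd
```

Because the last byte is pushed first, the copy ends up in normal order from $1de to $1ff. The stack pointer is left at $dd, just below the copied code, so the return address pushed by a later JSR lands underneath the loader rather than on top of it.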


:-) thought of "pha" also ...

 

Since X is 0 on entry, you can use "dex" instead of "ldx #$ff"

    dex
    txs
    ldx #ready-stack_code
copy:
    lda code,x
    pha
    dex
    bpl copy
    bra $1de

or

    ldx #ready-stack_code
copy:
    lda code,x
    pha
    dex
    BPL copy
    LDA cart0 ; get number of full blocks to load
    sta blocks2load
    STZ block
    bra $1e5
code
    ;.org $1e5
stack_code:
load_a_full_block:
    inc block
    lda block
    jsr $fe00
    tay ; a == 0 after fe00, x == 2
pageloop
    lda cart0
target
    sta $200,y
    iny
    bne pageloop
    inc target+2
    dex
    bne pageloop
    dec blocks2load
    bne load_a_full_block
ready

Fewer pushes.

(I like these nonsense-optimizations .... :-) )

Edited by 42bs


Sweet!

See? Reinventing the wheel can be most productive and/or/eor fun ;-)

Attached is a little speed showcase.

It loads 64512 bytes as the first file.

However, this already shows a downside ;-)

It currently loads only full blocks, and the stack-running version does not hide the screen.

No time now for a proper example.

Still, 160x800 pixels load impressively fast, I think.

This was fun.

Cheers,

Martin

bigpic.zip
