Lynx loader from scratch


56 replies to this topic

#1 enthusi OFFLINE  

enthusi

    Dragonstomper

  • 519 posts
  • Location:Potsdam, Germany

Posted Thu Aug 23, 2018 7:07 AM

Hi,

I am trying to fully understand what's going on there and to write my own loader.

I base the first tests on Karri's nice small loader:

*=$200

    stz mapctl    
    lda #$04    
    sta serctl    
    ldx #$00    
loop
    lda cart0    
    sta $f000,x    
    inx               
    bne loop 

    jmp $f000

.dsb $200+50-*,0

 

This loads a full page, i.e. 256 bytes (the ripple counter is supposedly somewhere in the middle of Bank0 now).

stage2 is only ~100 bytes long currently.

However, my own stage2 loader starts by selecting block 1 (512-byte block size), so the ripple counter should be reset and all further data should come from block 1, offset 0. In terms of the ROM that is 0x200, and in terms of the LNX file 0x240 (correct??).

Is there any error in my thinking here? (It doesn't work.)
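
A minimal Python sketch of that offset arithmetic. It assumes the LNX header is 64 bytes long (which matches the dump in this post, where the encrypted loader starts at 0x40); the function names are made up for illustration.

```python
LNX_HEADER_SIZE = 64  # assumption: size of the .lnx header in bytes


def rom_offset(block, offset=0, block_size=512):
    """Offset inside the raw ROM image for a given block and byte offset."""
    return block * block_size + offset


def lnx_offset(block, offset=0, block_size=512):
    """Same position inside the .lnx file, i.e. shifted by the header."""
    return LNX_HEADER_SIZE + rom_offset(block, offset, block_size)


print(hex(rom_offset(1)), hex(lnx_offset(1)))  # 0x200 0x240
```

So under these assumptions the numbers in the question (0x200 in the ROM, 0x240 in the LNX file) are consistent.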

I encrypt the above code with lynxenc to (only stage1):

 

00000000  ff cc e7 22 43 9e 5a e6  4e c1 47 ba 12 48 0d ff  |..."C.Z.N.G..H..|
00000010  f6 ed 22 8e 00 6d 47 57  ac cb c1 6f 79 82 87 99  |.."..mGW...oy...|
00000020  42 e7 71 9c aa de 7f f6  75 a6 fa 1a 3d 01 97 75  |B.q.....u...=..u|
00000030  22 99 43 11                                       |".C.|
 

lynxdec agrees on the content at least :)

 

Hard to tell in handysdl or mednafen if that part worked at all.

I thought a BRK also exits handysdl but even starting the stage1 loader with BRK results in 'nothing' happening.

 

My LNX header is pretty certainly ok:

00000000  4c 59 4e 58 00 02 00 00  01 00 41 73 73 65 6d 62  |LYNX......Assemb|
00000010  6c 6f 69 64 73 00 00 00  00 00 00 00 00 00 00 00  |loids...........|
00000020  00 00 00 00 00 00 00 00  00 00 50 72 69 6f 72 41  |..........PriorA|
00000030  72 74 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |rt..............|
00000040  ff cc e7 22 43 9e 5a e6  4e c1 47 ba 12 48 0d ff  |..."C.Z.N.G..H..|
00000050  f6 ed 22 8e 00 6d 47 57  ac cb c1 6f 79 82 87 99  |.."..mGW...oy...|
00000060  42 e7 71 9c aa de 7f f6  75 a6 fa 1a 3d 01 97 75  |B.q.....u...=..u|
00000070  22 99 43 11 a9 1a 8d 8a  fd a9 0b 8d 8b fd a9 02  |".C.............|
00000080  8d 87 fd a9 01 85 10 20  39 f0 a9 14 85 14 a0 02  |....... 9.......|
....
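
To double-check a header like the one above, a small Python sketch can unpack it. The field layout used here (magic, two 16-bit page sizes, version, 32-byte name, 16-byte manufacturer, rotation, 5 spare bytes) is the commonly documented LNX layout and is an assumption, not something stated in this thread; `parse_lnx_header` is a hypothetical helper.

```python
import struct

# Assumed .lnx header layout, little endian, 64 bytes in total.
LNX_HEADER = struct.Struct("<4sHHH32s16sB5s")


def parse_lnx_header(data):
    """Unpack the first 64 bytes of a .lnx image into a dict."""
    magic, bank0, bank1, version, name, maker, rotation, _spare = \
        LNX_HEADER.unpack(data[:LNX_HEADER.size])
    return {
        "magic": magic,
        "bank0_page_size": bank0,
        "bank1_page_size": bank1,
        "version": version,
        "name": name.rstrip(b"\0").decode("ascii"),
        "manufacturer": maker.rstrip(b"\0").decode("ascii"),
        "rotation": rotation,
    }


# The 64 header bytes from the dump above, rebuilt by hand.
header = (b"LYNX" + bytes([0x00, 0x02, 0x00, 0x00, 0x01, 0x00])
          + b"Assembloids".ljust(32, b"\0")
          + b"PriorArt".ljust(16, b"\0")
          + bytes(6))

info = parse_lnx_header(header)
print(info["bank0_page_size"], info["name"], info["manufacturer"])
```

With this layout the bytes `00 02` decode to a bank 0 page size of 512, which agrees with the 512-byte block size mentioned above.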

 

The stage2 loader starts right after the last byte of the encrypted stage1 code.

Stage2 is not encrypted anymore.

My stage 2 loader does this to set a block address:

(not optimized at all, I know)

 

set_block
.(
    lda iodat
    and #%11111101
    sta tmpiodat
    
    lda block
    sta tmp
    
    ;pre-roll once into bit 0 (not yet 1)
    rol tmp
    rol tmp+1
    
    ldx #7
loop_block    
    
    rol tmp
    rol tmp+1 ;bit 7 of block -> bit 1 of tmp+1
    
    lda tmp+1
    and #%00000010
    ora tmpiodat
    sta iodat
    
    lda #%00000011 ;0
    sta sysctl1
    lda #%00000010 ;1
    sta sysctl1
    
    dex
    bpl loop_block
    rts
.)  
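
The bit order produced by the `rol tmp` / `rol tmp+1` sequence above can be checked with a small Python model. This is only a sketch of the two-byte rotate logic: it does not model IODAT, SYSCTL1 or the cartridge shift register, and it says nothing about which error was later found. The junk values for `tmp+1` and the carry are assumptions standing in for uninitialised state.

```python
def rol16(lo, hi, carry):
    """Model 'rol lo' followed by 'rol hi' on a 6502."""
    c1 = lo >> 7
    lo = ((lo << 1) | carry) & 0xFF
    c2 = hi >> 7
    hi = ((hi << 1) | c1) & 0xFF
    return lo, hi, c2


def posted_bit_order(block, hi_junk=0x5A, carry_junk=1):
    """Return the 8 data bits (bit 1 of tmp+1) in the order the loop emits them."""
    lo, hi, carry = block & 0xFF, hi_junk & 0xFF, carry_junk & 1
    lo, hi, carry = rol16(lo, hi, carry)      # the pre-roll before the loop
    bits = []
    for _ in range(8):                        # ldx #7 ... bpl gives 8 passes
        lo, hi, carry = rol16(lo, hi, carry)
        bits.append((hi >> 1) & 1)            # and #%00000010
    return bits


print(posted_bit_order(0x01))  # [0, 0, 0, 0, 0, 0, 0, 1]
```

In this model the block number goes out most significant bit first, and neither the initial carry nor the old contents of `tmp+1` reach the masked bit within the eight passes.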

 

 

and then it starts loading.

The payload starts in the LNX image at 0x240.

 

Can any of you spot an error (in thinking or code) here?

That would be most appreciated :)

Thanks and thanks for all the help/documentation so far!

Martin



#2 42bs OFFLINE  

42bs

    Chopper Commander

  • 199 posts
  • Location:Germany/Southest West

Posted Thu Aug 23, 2018 7:34 AM

If you use handybug you can set a break point in your stage2 loader and then step through your code.
 

And, why re-invent the wheel (means block select)? Do you think you can make a rounder one?



#3 enthusi OFFLINE  

Posted Thu Aug 23, 2018 7:49 AM

I am running Linux only, so no handybug for me, unfortunately, as far as I know?

This is all part of the learning process. I like loaders anyway, wrote several for C64 and would hate to use a black box on Lynx now.

There is always room for improvement. Either in code or at least in own understanding of things :)

In fact I even consider it vital for a community to hold up knowledge about the 'inner workings'.

BLL is great (in fact it is pretty awesome) but I don't intend to make use of it (other than of course learning from it).

Probably worth digging up that handybug somehow and, worst case, setting up some notebook with it. Thanks for the hint!



#4 42bs OFFLINE  

Posted Thu Aug 23, 2018 7:58 AM

Well, a VM with win XP should be enough.

BLL's select block is the "official" one from boot ROM, only register saving added.
Its first action is to reset strobe and output 0.

Maybe "hand"-debug your code.

 

Be sure to restore IODAT on leave!



#5 enthusi OFFLINE  

Posted Thu Aug 23, 2018 8:15 AM

Ah, MS even has a free XP image for VirtualBox it seems. Will try.

I also went for 'hand debug', but even BRK didn't really do what I had hoped it would, so I doubt that it is even the set_block code that fails (even though I spotted an error there).

I will report back here if I make progress.



#6 enthusi OFFLINE  

Posted Thu Aug 23, 2018 8:55 AM

Hm, XP (32-bit) complains that the handybug.exe I used (from the testers wanted thread) is not a proper Win32 application.



#7 42bs OFFLINE  

Posted Thu Aug 23, 2018 9:19 AM

Maybe try mine:

http://www.monlynx.d...oad/handybug.7z

 

(Only the exe)

 

Edit: Tried it on Ubuntu with Wine: no GFX output (as a Linux addict you might fix this), but you can single-step etc.


Edited by 42bs, Thu Aug 23, 2018 9:28 AM.


#8 enthusi OFFLINE  

Posted Thu Aug 23, 2018 9:30 AM

Thanks! Yours works in VirtualBox at least. Seems my Wine is somehow borked on top of that, but I'm fine with something working somewhere ;-)



#9 enthusi OFFLINE  

Posted Fri Aug 24, 2018 1:31 AM

Splendid, I got it fixed. Will improve it a little and post here. Thanks 42BS and Sage!



#10 enthusi OFFLINE  

Posted Fri Aug 24, 2018 2:18 AM

Find attached a minimal loader that loads a 5500 Byte test program.

Can't be much faster I guess.

Stage2 fits into stack and payload starts at $0200 then.

 

Stage1 is this now:

*=    $0200
    stz mapctl    ;BIOS seems to set this to 3 which I think is fine for most cases
    lda #$04    
    sta serctl    
    ldx #162 ;(256-94 size of stage2)
loop
    lda cart0    
    sta !$005e,x    
    inx        
    bne loop    
    jmp $100
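
The index trick in this stage1 (start X at 162 and let it wrap to 0) can be verified with a few lines of Python. This is just the address arithmetic for `sta $005e,x`; `indexed_copy_window` is a hypothetical helper name.

```python
def indexed_copy_window(base, start_x):
    """Address range written by 'sta base,x' while X counts from start_x up to 255."""
    count = 256 - start_x      # the loop ends when INX wraps X to 0
    first = base + start_x
    last = base + 255
    return first, last, count


first, last, count = indexed_copy_window(0x005E, 162)
print(hex(first), hex(last), count)  # 0x100 0x15d 94
```

So the 94 bytes of stage2 land at $0100-$015d, which is why the final `jmp $100` hits the first loaded byte.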

 

I assume 64KB LNX files are not that common?

I haven't tested this on real hardware yet.

Thanks for your help :)

Cheers,

Martin

Attached Files

  • Attached File  love.zip   5.23KB   36 downloads


#11 karri OFFLINE  

karri

    River Patroller

  • 2,488 posts
  • Location:Espoo, Finland

Posted Fri Aug 24, 2018 3:19 AM

I kind of see the value in minimizing code. But the Stage1 cannot be smaller than 51 bytes.

 

Would it be possible to use the extra zero-bytes for something useful?

 

There is also the case of AUDIN. As the Lynx I and Lynx II start up with AUDIN in different states I thought it would be a good idea to set it in Stage1.

 

One thing that I was concerned about was being able to load data to any place. That is why I put my 2nd loader at the same spot as the Mikey ROM and screen buffers. That area is a bit wasted anyway, as you cannot easily run code there because of the registers. Suzy, on the other hand, does not care. So the screen space and the registers can reside at the same memory locations without problems.

 

FFF8-FFFF sacred registers

FC00-FFF7 registers      E018-FFF7 Screen buffer

FB68-FBFF 2nd loader 151 bytes

 

Just thought of sharing this as well. Of course using the stack area is pretty safe too.

 

OMG!!! Your <3 love <3 is soooo cute!


Edited by karri, Fri Aug 24, 2018 3:21 AM.


#12 42bs OFFLINE  

Posted Fri Aug 24, 2018 3:29 AM

I see no benefit in a 2nd stage loader. The first stage loader (the one which is decrypted) can be large enough to load the game unless the reason is to start it at $200. But since every game needs variables, it is easier to put those from $200 upward and let the code start later.

Currently, you are "wasting" 29 bytes in stage1.



#13 enthusi OFFLINE  

Posted Fri Aug 24, 2018 3:54 AM

Yes, of course stage 1 can set up all kinds of things, including a small logo sprite/registers

I will see if I fit in a full loader as well :)

I think it is a bit too tight, but worth an attempt. Going for 2nd stage is certainly faster than decoding a 2nd 51 byte chunk.

Currently I even waste most of Block0, too.



#14 42bs OFFLINE  

Posted Fri Aug 24, 2018 5:04 AM

There is usually enough ROM space available. But I agree that decoding takes some time.

So the challenge is to fit a complete loader into 51 bytes ;-)

You do not need the last JMP if you load to $200+52 ;-)
 



#15 enthusi OFFLINE  

Posted Fri Aug 24, 2018 5:07 AM

into 50 bytes as far as I know? But good point about the JMP ;-)



#16 42bs OFFLINE  

Posted Fri Aug 24, 2018 5:15 AM

Right, 50.



#17 enthusi OFFLINE  

Posted Fri Aug 24, 2018 5:26 AM

Got it! ;-)

Would be cool if someone could test this on a real Lynx.

A 1-shot loader. I was stupid not thinking of this earlier.

There is even still room for improvement but no longer required:

 

    lda #1
    sta block
    jsr $fe00
    
;hardcoded target length
    lda #11
    sta blocks2load
   
load_loop
load_a_full_block
    ldy #2;2 pages = 512 Bytes Blocksize
    ldx #0
pageloop
    lda cart0
target
    sta $0300,x
    inx
    bne pageloop
    inc target+2
    dey ;pages
    bne pageloop
    
    inc block
    lda block
    jsr $fe00
    dec blocks2load
    bne load_a_full_block
ready    

    jmp $0300

Attached Files



#18 42bs OFFLINE  

Posted Fri Aug 24, 2018 5:50 AM

You could store the blocks to load at byte 52 in the ROM and start the loader with reading it.

 

    lda cart0 // get number of full blocks to load
    sta blocks2load
    stz block
load_a_full_block:
    inc block
    lda block
    jsr $fe00
    tay // a == 0 after fe00, x == 2

pageloop
    lda cart0
target
    sta $0300,y
    iny
    bne pageloop
    inc target+2
    dex ;pages
    bne pageloop
    dec blocks2load
    bne load_a_full_block
ready    
    jmp $0300

 

=> 37 bytes, 13 to go ;-)
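
The byte count can be reproduced by adding up the instruction lengths. The sketch below assumes `block` and `blocks2load` live in zero page (2-byte instructions) and that `cart0` and the target are absolute addresses (3-byte instructions); the size table is standard 65C02 encoding, the list itself is transcribed from the code above.

```python
# (instruction, size in bytes) for the loader above
LOADER = [
    ("lda cart0", 3), ("sta blocks2load", 2), ("stz block", 2),
    ("inc block", 2), ("lda block", 2), ("jsr $fe00", 3), ("tay", 1),
    ("lda cart0", 3), ("sta $0300,y", 3), ("iny", 1), ("bne pageloop", 2),
    ("inc target+2", 3), ("dex", 1), ("bne pageloop", 2),
    ("dec blocks2load", 2), ("bne load_a_full_block", 2), ("jmp $0300", 3),
]

STAGE1_BUDGET = 50  # usable bytes in the encrypted first stage, per this thread

used = sum(size for _, size in LOADER)
print(used, STAGE1_BUDGET - used)  # 37 13
```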



#19 enthusi OFFLINE  

Posted Fri Aug 24, 2018 5:58 AM

Enough space left to try the same for the target, right now I assemble and encrypt everything in a Makefile anyway, but might be more generic this way indeed.

Works! Now not that much space left ;-) (0 in the code below)

 

block =  $02
blocks2load = $03
exe=$04

    lda cart0 // get number of full blocks to load
    sta blocks2load
    ldx #1
l1
    lda cart0
    sta target+1,x
    sta exe,x
    dex
    bpl l1
    stz block
load_a_full_block:
   inc block
   lda block
   jsr $fe00
   tay // a == 0 after fe00, x == 2
pageloop
    lda cart0
target
    sta $0300,y
    iny
    bne pageloop
    inc target+2
    dex ;pages
    bne pageloop
    dec blocks2load
    bne load_a_full_block
ready    
    jmp (exe)

 

That's the new generic loader now:

00000000  ff 27 8e df 7a ec e5 9d  40 62 6c 4e 39 0a 36 05  |.'..z...@blN9.6.|
00000010  23 a0 00 ff 7c 51 78 34  3a de d4 da 96 17 3a 61  |#...|Qx4:.....:a|
00000020  6c 26 91 20 be e5 41 e5  51 f5 52 b2 1f 68 ae ed  |l&. ..A.Q.R..h..|
00000030  4d ec cb 31                                       |M..1|
 

I use it as:

.bin 0,0,"1shotload.enc"
.byte (end_of_game-start_of_game)/BLOCKSIZE+1 ;size of game in blocks
.byte $03,$00 ;big endian!
.dsb (BLOCKSIZE*STARTBLOCK)-*,0;align to a full block for my own loader
start_of_game
.bin 0,0,"game.bin" 

end_of_game
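
A Python sketch of the three parameter bytes that follow the encrypted loader in this layout: the block count, then the load address high byte first. It assumes BLOCKSIZE is 512; `loader_trailer` is a hypothetical helper, not part of the build shown above.

```python
def loader_trailer(game_size, load_address, block_size=512):
    """Return the bytes the loader reads first: block count, address hi, address lo."""
    blocks = game_size // block_size + 1   # same formula as the .byte line above
    if blocks > 255:
        raise ValueError("block count does not fit into one byte")
    return bytes([blocks, load_address >> 8, load_address & 0xFF])


print(loader_trailer(5500, 0x0300).hex())  # 0b0300
```

For the 5500-byte test program this gives 11 blocks, the same value that was hardcoded in the earlier version. Note that the `+1` loads one extra block when the size is an exact multiple of the block size.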



#20 karri OFFLINE  

Posted Fri Aug 24, 2018 6:54 AM

This is so cool. Congrats!



#21 enthusi OFFLINE  

Posted Fri Aug 24, 2018 7:27 AM

Thank you very much ;-)

Here is a version that runs itself from stack and loads (hardcoded though) to $0200

but I like the more generic one more.

 

    ;A and X are 0 on entry
    ;ldx #0
l2    
    lda code,x
    sta $100,x
    inx
    bpl l2
    jmp $100
code
*=$100
    lda cart0 // get number of full blocks to load
    sta blocks2load
    stz block
load_a_full_block:
   inc block
   lda block
   jsr $fe00
   tay // a == 0 after fe00, x == 2
pageloop
    lda cart0
target
    sta $200,y
    iny
    bne pageloop
    inc target+2
    dex
    bne pageloop
    dec blocks2load
    bne load_a_full_block
ready    
    jmp $200



#22 42bs OFFLINE  

Posted Fri Aug 24, 2018 7:51 AM

*hmm* I would place the code at $200-sizeof(loader) to remove the last JMP ;-)

Based on Monty Python: Every byte that's wasted ...



#23 enthusi OFFLINE  

Posted Fri Aug 24, 2018 7:59 AM

Considered that but it's a bit less trivial due to the JSRs in the code (and us hogging the stack ;-).

You'd need to fiddle TXS for that, more ugly than 3 jump bytes in my book ;-)

 

EDIT:

this will do, but no bytes 'saved' anywhere, just a bit more obfuscation at work ;-)

 

    ldx #$ff
    txs
    ldx #33
l2    
    lda code,x
    pha
    dex
    bpl l2
    jmp $1de

 

code
*=$1de
    lda cart0 // get number of full blocks to load
    sta blocks2load
    stz block
load_a_full_block:
   inc block
   lda block
   jsr $fe00
   tay // a == 0 after fe00, x == 2
pageloop
    lda cart0
target
    sta $200,y
    iny
    bne pageloop
    inc target+2
    dex
    bne pageloop
    dec blocks2load
    bne load_a_full_block
ready    
    ;jmp $200
code_size=*-$1de   
init_size=code-$200
#print code_size
#print init_size
#print 50-(code_size+init_size)    
.dsb 50-(code_size+init_size),0
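
Why the entry point is $1de can be seen from a small Python model of the push loop: with the stack pointer at $ff, pushing the code backwards leaves its first byte at $1ff minus (length minus 1), and the last byte at $1ff, so execution runs straight into $200. The model only covers `pha` on page 1; the 34-byte length matches `ldx #33` above, and `push_copy` is a hypothetical name.

```python
def push_copy(code):
    """Model 'ldx #len-1 / lda code,x / pha / dex / bpl' with SP starting at $FF."""
    mem = {}
    sp = 0xFF
    for x in range(len(code) - 1, -1, -1):
        mem[0x100 + sp] = code[x]   # pha writes to $0100+SP ...
        sp = (sp - 1) & 0xFF        # ... and then decrements SP
    entry = 0x100 + sp + 1          # address of code[0]
    return mem, entry, sp


code = bytes(range(34))             # stand-in for the 34 loader bytes
mem, entry, sp = push_copy(code)
print(hex(entry), hex(0x100 + sp))  # 0x1de 0x1dd
```

After the copy the stack pointer sits just below the code, so in this model the return addresses pushed by `jsr $fe00` go to $1dd and $1dc and do not overwrite the loader.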



#24 42bs OFFLINE  

Posted Fri Aug 24, 2018 8:10 AM

:-) thought of "pha" also ...

 

Since X is 0 on entry, you can use "dex" instead of "ldx #$ff":

    dex
    txs
    ldx #ready-stack_code-1 ; index of the last byte, the loop runs down to 0
copy:
    lda code,x
    pha
    dex
    bpl copy
    bra $1de

or

    ldx #ready-stack_code-1 ; index of the last byte, the loop runs down to 0
copy:
    lda code,x
    pha
    dex
    BPL copy
    LDA cart0 ; get number of full blocks to load
    sta blocks2load
    STZ block
    bra $1e5
code
    ;.org $1e5
stack_code:
load_a_full_block:
    inc block
    lda block
    jsr $fe00
    tay ; a == 0 after fe00, x == 2
pageloop
    lda cart0
target
    sta $200,y
    iny
    bne pageloop
    inc target+2
    dex
    bne pageloop
    dec blocks2load
    bne load_a_full_block
ready

less pushes.

(I like these nonsense-optimizations .... :-) )


Edited by 42bs, Fri Aug 24, 2018 8:33 AM.


#25 enthusi OFFLINE  

Posted Fri Aug 24, 2018 9:14 AM

Sweet!

See? Reinventing the wheel can be most productive and/or/eor fun ;-)

Attached is a little speed showcase.

Loading 64512 Byte as first file.

However, this shows already a downside ;-)

It currently only loads full blocks, and the stack-running version does not hide the screen.

No time now for a proper example.

Still 160x800 pixels loaded impressively fast I think.

This was fun.

Cheers,

Martin

Attached Files





