
Lynx loader from scratch


enthusi


  • 2 months later...

While trying to squeeze the STNICC scene into a card, I found that enthusi's loader wastes some space. So here is another version, which fetches the data right after the encrypted loader. Three bytes are left for personal use ;-)

; micro loader
;
; program must start at $1ff, first byte must contain number of
; pages to load (see demo.s), so actual code at $200
;
; Note: Does not clear AUDIN, therefore not for use with bank-switching carts!
;      (lda #$1a; sta $FD8A, and FE00 sets AUDIN (B4) == 0)
;

RCART_0		EQU $fcb2 ; cart data register

BLOCKNR		EQU 0		; zeroed by ROM
PAGECNT		EQU $1ff

	RUN    $0200

	; SP = 3 after ROM, so push 3 bytes plus
	ldx	#(b9+1)-b0+3
cpy:
	stz	$fda0,x		; clear colors
	lda	b0,x		; copy loader
	pha
	dex
	bpl	cpy

	ldy	#51+1		; already 51 bytes loaded from 1st block!
	bra	$200-(b9+1-b1)

	; to be copied into stack
b0:
	dex
	bne	b2
	inc	BLOCKNR		; next block
	lda	BLOCKNR
	jsr	$fe00		; select block
b1:
	ldx	#4		; 4 pages per block
b2:
	lda	RCART_0
DST
	sta	$200-(51+2),y	; first byte goes to $1ff (PAGECNT)
	iny
	bne	b2

	inc	$200-(b9+1-DST)+2	; next dst page

	dec	PAGECNT
	bne	b0
	dc.b 	$80		; opcode "BRA"
	; PAGECNT will be here, if zero => BRA $200
b9:

	; program is here ...

endofbl:
size	set endofbl-$200
free	set 49-size
	echo "Free %Dfree"
	IF free < 0
	echo "Size must be <= 50!"
	ENDIF

	; fill remaining space
	IF free > 0
	REPT	free
	dc.b	$42		; unused space shall not be 0!
	ENDR
	ENDIF
	dc.b 	$00		; end mark!

The program must start at $1ff with the number of pages, so actual code begins at $200.
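
The address arithmetic behind that rule can be checked on a host machine. The Python sketch below is only my reading of the listing, not something from the original post: it assumes Y enters the copy loop at 52 (`ldy #51+1`), that the store base is `$200-(51+2)`, and that every pass after the first copies a full 256 bytes. `pages_needed` is a hypothetical helper name.

```python
# Host-side check of the micro loader's destination arithmetic.
# Assumption: this mirrors one reading of the listing; it is not the author's tool.

STORE_BASE = 0x200 - (51 + 2)   # operand of "sta $200-(51+2),y"
FIRST_Y = 51 + 1                # "ldy #51+1": bytes already consumed from block 0


def first_dest() -> int:
    """Address written by the very first store of the copy loop."""
    return STORE_BASE + FIRST_Y


def first_pass_len() -> int:
    """Bytes copied before Y wraps to 0 for the first time."""
    return 256 - FIRST_Y


def pages_needed(code_len: int) -> int:
    """Value for the count byte at $1ff, given the code size in bytes.

    The count byte itself is part of the first pass, hence the +1.
    """
    total = code_len + 1
    if total <= first_pass_len():
        return 1
    rest = total - first_pass_len()
    return 1 + (rest + 255) // 256   # round up to whole 256-byte passes


if __name__ == "__main__":
    print(hex(first_dest()), first_pass_len(), pages_needed(1000))
```

Under these assumptions the first byte after the encrypted loader lands exactly on PAGECNT, and the first pass is shorter than the later ones, so the page count is not simply the code size divided by 256.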

Edited by 42bs

This one just loads one file. No directory. This can be implemented in the application though.

Here is my Makefile:

all: ml.lnx

ml.lyx: ml_enc.bin demo.bin
	cat ml_enc.bin demo.bin >$@

ml.lnx: ml.lyx
	make_lnx $< -b0 256K -o $@

ml_enc.bin: ml.bin
	lynxenc $< $@

ml.bin: micro_loader.s
	lyxass -d -o $@ $<

demo.bin: demo.s
	lyxass -d -o $@ $<

.PHONY: clean
clean:
	rm -f *.bin
	rm -f ml.lnx ml.lyx


  • 5 months later...

just a hint:

The second stage in the micro loader is not encrypted, so there is no real speed difference compared to your approach.

If you want to do a REAL optimization, you have to choose the filler bytes such that the multiplication is faster.

 

 

The challenge was to fit as much as possible into the first 50 bytes, since this is the minimum.

 

But, oh, I get your point, if the first stage is only a few bytes that loads the rest and we fill the remainder with optimal values, decryption plus additional loading might be quicker.

 

But to find this kind of optimized code one needs exact cycle counts. There is a guy in the 6502-FB group who made a simulator with lots of debugging features. But for the Apple ][.

 

I do not trust handybug's cycle count, but the decryption could be run in another simulator ...

 

Next challenge :-)

 

I did some profiling of that modular multiplication algorithm while it decoded Karri's micro-loader and here are the results:

 

post-23393-0-26741300-1557577553_thumb.png

 

The last four columns show the number of branches taken vs. not taken. Overall it took 1486320 cycles. I think it might be faster if the encoded stream had more zero bits. But how long does it take the Lynx to execute that many cycles? Is it much more than 0.4 s? So is there really any sense in optimizing it?
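
For a rough feel of the time involved, the cycle count can be divided by a clock rate. The sketch below is an estimate only: the 4 MHz figure is the commonly quoted maximum CPU clock of the Lynx, the 3.6 MHz figure is an assumed effective average, and the cycle count came from a run in a different emulator, so real hardware timing may differ.

```python
# Rough conversion of the profiled cycle count into wall-clock time.
# Assumption: both clock rates are illustrative, not measured on a Lynx.

PROFILED_CYCLES = 1_486_320   # figure quoted in the post above


def cycles_to_seconds(cycles: int, clock_hz: float) -> float:
    """Time needed to execute `cycles` CPU cycles at `clock_hz`."""
    return cycles / clock_hz


if __name__ == "__main__":
    for label, hz in (("4 MHz nominal", 4_000_000),
                      ("3.6 MHz assumed average", 3_600_000)):
        print(f"{label}: {cycles_to_seconds(PROFILED_CYCLES, hz):.3f} s")
```

Both assumed rates put the decryption in the neighbourhood of the 0.4 s mentioned in the post.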

Edited by laoo

"So is there really any sense optimizing it?"

 

IMHO, optimizing up to a certain point is a "must". Beyond this point it is just: "because I can" ;-)

 

Yeah, yeah, I know. Premature optimization is my hobby too ;) But the question was: what is at stake here? I don't own the console and really don't know how long it takes to decrypt one 51-byte block. I presume that decrypting one block instead of two is perceivable, but will speeding up the process by a few percent be noticeable? I don't even know how much faster it can get. I could do some tests: generate some blocks with different numbers of zeros and see how many cycles each takes. But overall the means of optimization here are... cumbersome at least. It involves filling the extra space with different values and checking the number of zeros in the result after encoding. Pure brute force. If someone has idle cycles on his/her machine, it can be done in spare time.


On a real (original) Lynx, the short loader is quicker than the tube takes to stabilize.

Also, the ROM zeroes all RAM, which takes far more time than decoding one block.

 

So, finding the optimal block is academic and of no real use. :-)


I see :)

Nevertheless, if someone feels an urge to pursue a Monty Python level of academicity, I could prepare an idle-priority brute-force encrypter that would walk through the whole space of different fillings of the unused bytes in search of an encoded block with as many zeros as possible :-)
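
Such a search could be sketched roughly as follows. This is an illustration under stated assumptions, not the actual tool: `encode` is a placeholder the caller must supply (for example a wrapper around whatever lynxenc does, which is not reimplemented here), `best_filler` and `zero_bits` are hypothetical names, and "more zero bits means faster decryption" is only the hypothesis raised earlier in this thread. The one constraint taken from the loader listing is that filler bytes must not be 0.

```python
# Brute-force search over filler bytes for the unused tail of the first block.
# Assumption: `encode` stands in for the real block encryption; it is NOT provided here.
from itertools import product
from typing import Callable, Iterable, Optional, Tuple


def zero_bits(data: bytes) -> int:
    """Number of 0 bits in `data`."""
    return sum(8 - bin(b).count("1") for b in data)


def best_filler(code: bytes,
                free: int,
                encode: Callable[[bytes], bytes],
                candidates: Iterable[int] = range(1, 256),
                ) -> Optional[Tuple[int, bytes]]:
    """Try every filler of length `free` and keep the one whose encoded
    block has the most zero bits. Returns (zero_bit_count, filler)."""
    pool = tuple(candidates)          # filler value 0 is excluded by default
    best = None
    for filler in product(pool, repeat=free):
        block = code + bytes(filler) + b"\x00"   # trailing end mark
        score = zero_bits(encode(block))
        if best is None or score > best[0]:
            best = (score, bytes(filler))
    return best


if __name__ == "__main__":
    # Toy run with an identity "encoder", just to show the call shape.
    print(best_filler(b"\xff", 1, lambda b: b, candidates=(1, 3, 0x80)))
```

With three free bytes and 255 candidate values each, the full space is about 16.6 million encodings, which fits the "idle priority" framing.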


Also, the ROM zeroes all RAM, which takes far more time than decoding one block.

 

I don't see the RAM being zeroed, as there seem to be garbage values in uninitialized variables. Or they could be leftovers from the decryption process.


This code in the ROM:

 

clearMem:
STZ z01
LDA #$00
eFE1D:
STA (z00),Y
INY
BNE eFE1D
INC z01
BNE eFE1D

 

It is called at the very beginning:

 

eFF80:
LDA $FC88
BEQ end
PLA
eFF86:
PLA
PLA
PLA
LDY #$02
STY $FD8B
INY
STY $FD8A
STY $FFF9
STZ z00
JMP clearMem
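
To see how much of the address space that loop touches, it can be modelled on a host machine. The sketch below is a simplified model, not a cycle-accurate one: it assumes the pointer low byte is 0 (from `STZ z00`), that Y is 3 on entry (from `LDY #$02` followed by `INY`), and it ignores which addresses are RAM and which are hardware or ROM. `clear_mem_stores` is a hypothetical name.

```python
# Simplified model of the clearMem loop: counts the stores it issues.
# Assumption: no memory map is modelled; every STA is just counted.

def clear_mem_stores(start_y: int = 3) -> tuple:
    """Return (store_count, first_address, last_address)."""
    z01 = 0                      # STZ z01 (pointer high byte)
    y = start_y                  # Y as left by the caller
    count = 0
    first = last = None
    while True:
        addr = (z01 << 8) + y    # STA (z00),Y with low byte 0
        if first is None:
            first = addr
        last = addr
        count += 1
        y = (y + 1) & 0xFF       # INY
        if y != 0:               # BNE eFE1D
            continue
        z01 = (z01 + 1) & 0xFF   # INC z01
        if z01 != 0:             # BNE eFE1D
            continue
        break                    # pointer wrapped: loop ends
    return count, first, last


if __name__ == "__main__":
    n, first, last = clear_mem_stores()
    print(n, hex(first), hex(last))
```

Under these assumptions the loop walks nearly the whole 64K address space, skipping only the first three bytes of page zero.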

 


 

I don't see the RAM being zeroed, as there seem to be garbage values in uninitialized variables. Or they could be leftovers from the decryption process.

 

Here's an animation of the contents of page zero before and after decryption. So yes, page zero is littered with leftovers from the decryption process.

 

post-23393-0-57488300-1557658964.gif

 

how did you profile Lynx code?

 

I ran the decryption in Altirra. The processor is roughly the same :)

But it might be a good idea to add a profiler to Handy. It should not be very difficult, given that it has cycle-exact emulation.

