42bs Posted September 17, 2018
Oh, never seen this. Interesting. Not a C64 guy if $00 is being used :-) Are 0,1,2 even initialized? :-) I think this is an early version. I doubt it works.
42bs Posted December 6, 2018
Trying to squeeze the STNICC scene onto a card, I found that enthusi's loader wastes some space. So here is another version, which fetches the data right after the encrypted loader. Three bytes are left for personal use.

; micro loader
;
; program must start at $1ff, first byte must contain number of
; pages to load (see demo.s), so actual code at $200
;
; Note: Does not clear AUDIN, therefore not for use for bank-switching carts!
; (lda #$1a; sta $FD8A, and FE00 sets AUDIN (B4) == 0)
;
RCART_0   EQU $fcb2   ; cart data register
BLOCKNR   EQU 0       ; zeroed by ROM
PAGECNT   EQU $1ff

    RUN $0200
    ; SP = 3 after ROM, so push 3 bytes plus
    ldx #(b9+1)-b0+3
cpy:
    stz $fda0,x             ; clear colors
    lda b0,x                ; copy loader
    pha
    dex
    bpl cpy
    ldy #51+1               ; already 51 bytes loaded from 1st block!
    bra $200-(b9+1-b1)
    ; to be copied into stack
b0:
    dex
    bne b2
    inc BLOCKNR             ; next block
    lda BLOCKNR
    jsr $fe00               ; select block
b1:
    ldx #4                  ; 4 pages per block
b2:
    lda RCART_0
DST sta $200-(51+2),y       ; first byte goes to $1ff (PAGECNT)
    iny
    bne b2
    inc $200-(b9+1-DST)+2   ; next dst page
    dec PAGECNT
    bne b0
    dc.b $80                ; opcode "BRA"
    ; PAGECNT will be here, if zero => BRA $200
b9:
    ; program is here ...
endofbl:
size set endofbl-$200
free set 49-size
    echo "Free %Dfree"
    IF free < 0
    echo "Size must be <= 50!"
    ENDIF
    ; fill remaining space
    IF free > 0
    REPT free
    dc.b $42                ; unused space shall not be 0!
    ENDR
    ENDIF
    dc.b $00                ; end mark!

The program must start at $1ff with the number of pages, so actual code begins at $200.
Edited December 6, 2018 by 42bs
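To make the address arithmetic of the copy loop easier to follow, here is a minimal Python model of it. It is a sketch based on a reading of the listing above, not a tested emulation: the function name `micro_load`, the flat 64 KiB RAM image and the linear cart image are assumptions, and block selection via `jsr $fe00` is modelled simply as continuing to read the cart linearly. The count of 52 already-consumed bytes is taken from `ldy #51+1`.

```python
def micro_load(cart: bytes, consumed: int = 52) -> bytearray:
    """Model the micro loader's copy loop on a flat 64 KiB RAM image."""
    ram = bytearray(0x10000)
    dst_base = 0x200 - (51 + 2)   # operand of "sta $200-(51+2),y", i.e. $1CB
    y = consumed                  # "ldy #51+1"
    pos = consumed                # next unread cart byte (assumed linear image)
    while True:
        ram[dst_base + y] = cart[pos]          # lda RCART_0 / sta dst,y
        pos += 1
        y = (y + 1) & 0xFF                     # iny
        if y:                                  # bne b2
            continue
        dst_base += 0x100                      # inc high byte of the sta operand
        ram[0x1FF] = (ram[0x1FF] - 1) & 0xFF   # dec PAGECNT
        if ram[0x1FF] == 0:                    # falls into BRA +0, i.e. $200
            return ram


# Demo: 52 placeholder bytes, PAGECNT = 2, then 459 made-up program bytes.
program = bytes((i * 7 + 1) & 0xFF for i in range(459))
cart = bytes([0x42]) * 52 + bytes([2]) + program
ram = micro_load(cart)
```

In this model the first pass copies only 204 bytes ($1FF to $2CA) because Y starts at 52, so a PAGECNT of n loads 204 + (n - 1) * 256 bytes, including the PAGECNT byte itself.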
Heaven/TQA Posted December 6, 2018
OK, I only followed this briefly. Does that loader work with loading files, or is it based on loading one file only?
42bs Posted December 6, 2018
This one just loads one file. No directory. This can be implemented in the application though. Here is my Makefile:

all: ml.lnx

ml.lyx: ml_enc.bin demo.bin
	cat ml_enc.bin demo.bin >$@

ml.lnx: ml.lyx
	make_lnx $< -b0 256K -o $@

ml_enc.bin: ml.bin
	lynxenc $< $@

ml.bin: micro_loader.s
	lyxass -d -o $@ $<

demo.bin: demo.s
	lyxass -d -o $@ $<

.PHONY: clean
clean:
	rm -f *.bin
	rm -f ml.lnx ml.lyx
enthusi Posted December 6, 2018
I used a variant of the loader I posted here for the "Lacim's Legacy" demo/preview and simply use my own 'filesystem'.
sage Posted December 6, 2018
Then you do indeed waste some bytes. The micro loader used in current cc65 and lynxdir starts at byte 53.
enthusi Posted December 6, 2018
Not wasted, just not used by the first file. I have routines to arrange ALL files in the end with as few page overlaps as possible. Not that you'd really notice during the game, but I like that.
laoo Posted May 11, 2019
Just a hint: the second stage in the microloader is not encrypted. Thus there is no real speed difference to your approach. If you want to do a REAL optimization, you have to choose the filler bytes such that the multiplication is faster.

The challenge was to fit as much as possible in the first 50 bytes since this is the minimum. But, oh, I get your point: if the first stage is only a few bytes that load the rest and we fill the remainder with optimal values, decryption plus additional loading might be quicker. But to find this kind of optimized code one needs to have exact cycle counts. There is a guy in the 6502-FB group who made a simulator with lots of debugging features. But for the Apple ][. I do not trust handybug's cycle count, but the decryption could be run in another simulator ... Next challenge :-)

I did some profiling of that modular multiplication algorithm while it decoded Karri's micro-loader and here are the results: The last four columns show the number of branches taken vs. not taken. Overall it took 1486320 cycles. I think it might be faster if the encoded stream had more zero bits. But how long does the Lynx take to execute that many cycles? Is it much more than 0.4 s? So is there really any sense optimizing it?
Edited May 11, 2019 by laoo
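The remark that more zero bits might make the decoding faster can be illustrated with a small Python sketch. It assumes a classic shift-and-add modular multiplication, where each one bit of the multiplier triggers an extra conditional addition; the thread does not show the ROM routine itself, so treat this as an illustration of the idea rather than the actual Lynx code. The function name `modmul_count` and the small operands are made up for the example.

```python
def modmul_count(a: int, b: int, n: int, bits: int) -> tuple[int, int]:
    """Return (a * b mod n, number of conditional additions performed)."""
    acc = 0
    adds = 0
    for i in reversed(range(bits)):
        acc = (acc << 1) % n        # always: shift the accumulator
        if (b >> i) & 1:            # only for one bits: add the multiplicand
            acc = (acc + a) % n
            adds += 1
    return acc, adds


dense = modmul_count(123, 0b1011, 1009, 4)   # three one bits in the multiplier
sparse = modmul_count(123, 0b1000, 1009, 4)  # a single one bit
```

Both calls run the same number of shift steps, but the sparse multiplier needs one addition instead of three. That is the effect a search for blocks with many zero bits would try to exploit.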
42bs Posted May 12, 2019
"So is there really any sense optimizing it?"
IMHO, optimizing up to a certain point is a "must". Beyond this point it is just: "because I can".
laoo Posted May 12, 2019
"IMHO, optimizing up to a certain point is a "must". Beyond this point it is just: "because I can""
Yeah, yeah, I know. Premature optimization is my hobby too. But the question was: what is at stake here? I don't own the console and really don't know how long it takes to decrypt one 51-byte block. I presume that decrypting one block instead of two is perceivable, but will speeding up the process by a few percent be visible? I don't even know how much faster it can be. I could do some tests: generate some blocks with different numbers of zeros and see how many cycles each takes. But overall the means of optimization here are... cumbersome at least. It involves filling the extra space with different values and checking the number of zeros in the result after encoding. Pure brute force. If someone has idle cycles on his/her machine, it can be done in spare time.
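As a rough answer to the "how long" question, the cycle count reported earlier in the thread can be converted into seconds. The clock rates below are assumptions, not measurements: 4 MHz is taken as a nominal upper bound for the Lynx CPU and 3.6 MHz as a plausible effective rate. The count itself came from a different 6502 emulator, so this is only a ballpark.

```python
CYCLES = 1_486_320  # figure reported in the thread for the profiled decoding


def seconds(cycles: int, hz: float) -> float:
    """Convert a cycle count into seconds at a given clock rate."""
    return cycles / hz


nominal = seconds(CYCLES, 4_000_000)   # assumed nominal CPU clock
slower = seconds(CYCLES, 3_600_000)    # assumed effective rate with slower memory cycles
print(f"{nominal:.3f} s at 4.0 MHz, {slower:.3f} s at 3.6 MHz")
```

Under these assumptions the whole decoding lands at roughly 0.4 s, so shaving a few percent off would change the result by only a few hundredths of a second.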
42bs Posted May 12, 2019
On a real (original) Lynx, the short loader is quicker than the tube takes to stabilize. Also, the ROM zeroes all RAM, which takes far more time than decoding one block. So, finding the optimal block is academic and of no real use. :-)
laoo Posted May 12, 2019
I see. Nevertheless, if someone feels an urge to pursue a Monty Python level of academicity, I could prepare an idle-priority brute-force encrypter that would walk through the whole space of different fillings of the unused bytes in search of an encoded block with as many zeros as possible :-)
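A brute-force search of this kind could be structured roughly as follows. This is only a sketch: the encoder is passed in as a callable because the real encryption step (done by `lynxenc` in the Makefile above) is not reproduced here, and `toy_encode` is a made-up placeholder used purely so the example runs. Filler bytes are restricted to non-zero values, following the loader source's comment that unused space shall not be 0.

```python
from itertools import product
from typing import Callable


def zero_bits(data: bytes) -> int:
    """Count the zero bits in a byte string."""
    return 8 * len(data) - sum(bin(b).count("1") for b in data)


def best_filler(code: bytes, free: int,
                encode: Callable[[bytes], bytes]) -> tuple[bytes, int]:
    """Try every non-zero filling of `free` bytes; keep the one whose
    encoded block has the most zero bits (first winner on ties)."""
    best, best_score = b"", -1
    for filler in product(range(1, 256), repeat=free):
        score = zero_bits(encode(code + bytes(filler)))
        if score > best_score:
            best, best_score = bytes(filler), score
    return best, best_score


def toy_encode(block: bytes) -> bytes:
    # Placeholder only, NOT the Lynx encryption: XOR each byte with $5A.
    return bytes(b ^ 0x5A for b in block)


result = best_filler(b"\xA9\x00", 1, toy_encode)
```

The search space is 255 to the power of `free`, so even the three spare bytes mentioned earlier mean about 16.6 million encodings, which is why an idle-priority background job is the natural fit.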
Cyprian Posted May 12, 2019
"I did some profiling of that modular multiplication algorithm while it decoded Karri's micro-loader and here are the results:"
How did you profile Lynx code?
+karri Posted May 12, 2019
"Also, the ROM zeroes all RAM, which takes far more time than decoding one block."
I don't see this zeroing the RAM as there seems to be garbage values in uninitialized variables. Or it could be left-overs from the decryption process.
42bs Posted May 12, 2019
This code in the ROM:

clearMem:
    STZ z01
    LDA #$00
eFE1D:
    STA (z00),Y
    INY
    BNE eFE1D
    INC z01
    BNE eFE1D

It is called at the very beginning:

eFF80:
    LDA $FC88
    BEQ end
    PLA
eFF86:
    PLA
    PLA
    PLA
    LDY #$02
    STY $FD8B
    INY
    STY $FD8A
    STY $FFF9
    STZ z00
    JMP clearMem
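For readers who want to trace what that loop touches, here is a small Python model of `clearMem`. It rests on assumptions not stated in the post: `z00`/`z01` are taken to be the zero-page pointer at $00/$01, Y is taken to be 3 on entry (from `LDY #$02` followed by `INY`), and the hardware and ROM mapping at the top of the address space is ignored by using a flat 64 KiB array.

```python
def clear_mem(ram: bytearray, y: int = 3) -> int:
    """Model the clearMem loop on a flat RAM image; returns the store count."""
    ptr_lo = 0x00   # z00, zeroed by "STZ z00" before the jump
    ptr_hi = 0x00   # z01, zeroed by "STZ z01"
    stores = 0
    while True:
        ram[((ptr_hi << 8) | ptr_lo) + y] = 0   # STA (z00),Y with A = 0
        stores += 1
        y = (y + 1) & 0xFF                       # INY
        if y:                                    # BNE eFE1D
            continue
        ptr_hi = (ptr_hi + 1) & 0xFF             # INC z01
        if ptr_hi == 0:                          # BNE eFE1D not taken: done
            return stores


ram = bytearray(b"\xAA" * 0x10000)   # pretend RAM starts out as garbage
stores = clear_mem(ram)
```

In this model the loop performs 65533 stores and never writes to addresses 0 to 2, because Y starts at 3 on the first page.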
laoo Posted May 12, 2019
"I don't see this zeroing the RAM as there seems to be garbage values in uninitialized variables. Or it could be left-overs from the decryption process."
Here's the animation of the content of page zero before and after decryption. So yes, page zero is littered with leftovers from the decryption process.
"how did you profile Lynx code?"
I ran the decryption in Altirra. The processor is roughly the same. But it might be a good idea to add a profiler to Handy. It should not be very difficult, given that it has cycle-exact emulation.