funkheld Posted October 27, 2020 (edited)
What about Effectus 0.5.3 for the demo? I guess it's used a lot.
Greetings
zbyti Posted October 27, 2020
@fenrock if you want to use flames as a benchmark, use $D40B (VCOUNT) ;)
zbyti Posted October 27, 2020 (edited)
@fenrock As far as I understand, you use a lookup table for multiplication. If you want a fair comparison with Mad Pascal, use the {$f $60} directive in the MP suite. I don't remember exactly whether $6000 is a free area, but you can check it.

FASTMUL {$F page}
Seriously fast multiplication (8-bit and 16-bit)
    {$f $70} // fastmul at $7000
Alternative fast multiplication procedures for the BYTE, SHORTINT, WORD, SMALLINT and SHORTREAL types. The procedures occupy 2 KB and are placed starting at address PAGE*256.
fenrock (Author) Posted October 27, 2020
Just now, zbyti said: @fenrock As far as I understand you use lookup table for multiplication, If you want to fair comparison with Mad Pascal use {$f $70} directive in MP suite.

Thanks, I've been wondering about a solution for this, as storing the first 127 squared numbers in an array is kind of avoiding the point of the test. I'll look into this.
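For readers wondering how a multiplication routine can be table-driven without storing the answers directly: one widely used 6502 technique is quarter-square multiplication, which relies on a*b = floor((a+b)^2/4) - floor((a-b)^2/4). The C sketch below only illustrates that idea. Whether Mad Pascal's FASTMUL works exactly this way is an assumption, and the names `qsq`, `init_quarter_squares` and `fast_mul8` are made up for this example.

```c
#include <stdint.h>

/* Quarter-square table: qsq[n] = floor(n*n / 4) for n = 0..510.
   510 is the largest possible a + b for two 8-bit operands,
   and 510*510/4 = 65025 still fits in 16 bits. */
static uint16_t qsq[511];

/* Fill the table once at startup (hypothetical helper). */
void init_quarter_squares(void) {
    for (uint32_t n = 0; n < 511; n++) {
        qsq[n] = (uint16_t)((n * n) / 4);
    }
}

/* 8x8 -> 16 bit multiply using two lookups and one subtraction.
   a+b and |a-b| always have the same parity, so the two floor()
   roundings cancel and the result is exact. */
uint16_t fast_mul8(uint8_t a, uint8_t b) {
    uint16_t sum  = (uint16_t)(a + b);
    uint16_t diff = (uint16_t)(a > b ? a - b : b - a);
    return (uint16_t)(qsq[sum] - qsq[diff]);
}
```

Unlike an array of precomputed squares used directly as results, the table here is only an intermediate step, so the benchmark still performs a general multiplication.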
fenrock (Author) Posted October 27, 2020
@zbyti what is the story behind the md5 benchmark? Is it to test the public Free Pascal implementation here? I may have to find a C version to compare it against.
Gury Posted October 27, 2020
I am glad to see a new actor in the Atari 8-bit world. KickC seems like a good competitor joining the existing languages for our beloved machine. I see it as a good alternative to CC65, with the C language as its basis. Together with Mad Pascal, it can also provide new speed comparisons using the new examples that zbyti is preparing for us, even for other languages. If there is even more interest in this language, it will be a good candidate for support in Mad Studio.
zbyti Posted October 27, 2020
2 minutes ago, fenrock said: @zbyti what is the story behind the md5 benchmark?

From the beginning I didn't want to test language-specific libraries, only the essentials. But @tebe beat VBCC, so I made an exception.
zbyti Posted October 27, 2020 (edited)
@fenrock Monte Carlo Case

CC65 uses:

    ; 8x16 routine with external entry points used by the 16x16 routine in mul.s
    tosmula0:
    tosumula0:
            sta     ptr4
    mul8x16:jsr     popptr1         ; Get left operand (Y=0 by popptr1)
            tya                     ; Clear byte 1
            ldy     #8              ; Number of bits
            ldx     ptr1+1          ; check if lhs is 8 bit only
            beq     mul8x8          ; Do 8x8 multiplication if high byte zero
    mul8x16a:
            sta     ptr4+1          ; Clear byte 2
            lsr     ptr4            ; Get first bit into carry
    @L0:    bcc     @L1
            clc
            adc     ptr1
            tax
            lda     ptr1+1          ; hi byte of left op
            adc     ptr4+1
            sta     ptr4+1
            txa
    @L1:    ror     ptr4+1
            ror     a
            ror     ptr4
            dey
            bne     @L0
            tax
            lda     ptr4            ; Load the result
            rts

Mad Pascal uses:

    ;
    ; Ullrich von Bassewitz, 2009-08-17
    ;
    ; CC65 runtime: 8x8 => 16 unsigned multiplication
    ;
    */
    .proc imulCL
    ptr1 = ecx
    ptr4 = eax
            ldy     #8
            lda     #0
            lsr     ptr4            ; Get first bit into carry
    @L0:    bcc     @L1
            clc
            adc     ptr1
    @L1:    ror     @
            ror     ptr4
            dey
            bne     @L0
            sta     ptr4+1
            rts
    .endp

@tebe explained:
- MP extends the type for multiplication and other operations; for u8 x u8 it assumes a 16-bit result, and only later, during optimization, does it start to reject redundant operations
- the result is written to eax, eax+1 (16b)
- this procedure turned out in the tests to be several scan lines faster than the one used previously:

    .proc imulCL
            lda #$00
            LDY #$09
            CLC
    LOOP    ROR @
            ROR eax
            BCC MUL2
            CLC             ;DEC AUX above to remove CLC
            ADC ecx
    MUL2    DEY
            BNE LOOP
            STA eax+1
            RTS
    .endp

Millfork approach:
byte*byte produces a byte, this is by design. An arithmetic operator never promotes the result to a type larger than the type of its arguments. In order to get a word, you need to explicitly cast one of the arguments to word:

    x = n*word(n)

This causes a call to __mul_u16u8u16, which is defined in m6502/zp_reg.mfk. The same file also contains __mul_u16u16u16 and __mul_u8u8u8, plus all the division and modulo implementations.
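To make the 8x8 => 16 routines above easier to follow, here is a rough C model of the same shift-and-add scheme: the multiplier is shifted out one bit at a time, and the accumulator is rotated right into the low byte of the result. This is a reading aid written for this thread, not code from CC65 or Mad Pascal. The function names are hypothetical, and the second helper only illustrates the "byte*byte produces a byte" behaviour described for Millfork.

```c
#include <stdint.h>

/* C model of the 6502 loop: acc plays the role of A (plus the carry),
   lo plays the role of ptr4/eax, which starts as the multiplier and
   ends as the low byte of the product. */
uint16_t mul8x8_shift_add(uint8_t lhs, uint8_t rhs) {
    uint16_t acc = 0;            /* up to 9 bits right after an add */
    uint8_t lo = rhs;
    uint8_t carry = lo & 1u;     /* lsr ptr4: first multiplier bit */
    lo >>= 1;
    for (uint8_t bit = 0; bit < 8; bit++) {
        if (carry) {
            acc = (uint16_t)(acc + lhs);                /* clc / adc ptr1 */
        }
        carry = lo & 1u;                                /* next multiplier bit */
        lo = (uint8_t)((lo >> 1) | ((acc & 1u) << 7));  /* ror ptr4 */
        acc >>= 1;                                      /* ror a */
    }
    return (uint16_t)((acc << 8) | lo);
}

/* Millfork-style result width: only the low 8 bits survive. */
uint8_t mul8x8_low_byte(uint8_t a, uint8_t b) {
    return (uint8_t)mul8x8_shift_add(a, b);
}
```

With this model, 16 * 16 gives 256 from the first function and 0 from the second, which is why the explicit cast to word matters in Millfork.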
tebe Posted October 27, 2020
2 hours ago, fenrock said: @zbyti what is the story behind the md5 benchmark? Is it to test the public freepascal implementation here? I may have to find a C version to use to compare to it.

https://github.com/tebe6502/Mad-Pascal/blob/master/lib/md5.pas
zbyti Posted October 27, 2020
@fenrock it's funny how big a hole you have in the middle of the screen in landscape
funkheld Posted October 27, 2020 (edited)
Hi, thank you. With KickC the suite.xex is now only 4750 bytes.
Greetings
zbyti Posted October 27, 2020 (edited)
3 hours ago, zbyti said: @fenrock it's funny how big hole you have in the middle of the screen in landscape

    char landscapeBase[] = kickasm {{
        .byte $AA, $96, $90, $90, $7A, $7A, $6E, $6E, $5E, $5E, $56, $56, $52, $50
    }};

Looks like a signed/unsigned char problem to me, but that's only a deduction :]
fenrock (Author) Posted October 27, 2020
37 minutes ago, zbyti said: looks like signed/unsigned char problem to me but that's just only a deduction :]

This is entirely possible, I'll check. I've had to write some fragments for some of the conditional code, and I may have got them wrong. The hole does look like it's right between the 7A and 90 values in the array, so good shout on it being a signed issue.
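The reasoning behind that suspicion can be shown in a few lines of C: $7A has bit 7 clear and $90 has bit 7 set, so the two values order differently depending on whether the bytes are compared as unsigned or as two's-complement signed. The helper names below are hypothetical, and this only illustrates the hypothesis being discussed at this point in the thread.

```c
#include <stdint.h>

/* Interpret a raw byte as a two's-complement signed 8-bit value. */
int as_signed(uint8_t v) {
    return v >= 0x80 ? (int)v - 256 : (int)v;
}

/* Returns 1 if a < b when both bytes are treated as unsigned. */
int less_unsigned(uint8_t a, uint8_t b) {
    return a < b;
}

/* Returns 1 if a < b when both bytes are treated as signed. */
int less_signed(uint8_t a, uint8_t b) {
    return as_signed(a) < as_signed(b);
}
```

Here `less_unsigned(0x7A, 0x90)` is 1, while `less_signed(0x7A, 0x90)` is 0 because $90 reads as -112. A height comparison generated with the wrong signedness would therefore flip exactly at that boundary in the array.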
Wrathchild Posted October 27, 2020
There is no signed or unsigned... only ZUUL
zbyti Posted October 27, 2020
Wake up Ripley! You are not a ZUUL! We are on LV-426 and we are screwed...
fenrock (Author) Posted October 27, 2020
Well, running side by side, kickc (L) vs pascal (R), with no random values changing the heights, there's definitely a problem on the left! I've removed all the signed values, everything *should* be uint8_t. Time to debug.
fenrock (Author) Posted October 27, 2020 (edited)
Fixed it. The array heights were either not getting set correctly, or getting trashed on startup.

EDIT: Well, that was educational. It turned out to be an "out by 1" error: when I was copying the name of the benchmark, it didn't have a terminating 0 at the end of the name, so the copy ran into the next data section, which turned out to be the heights array, and trashed the heights by turning them into screen codes. So it was my fault all along, not the program's.
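The failure mode described here can be reproduced safely in C by modelling the data section as one byte array: a name with no terminating 0, followed directly by the heights. The sketch below is a simulation, not fenrock's actual code. `to_screen_code` uses the usual ATASCII to screen code mapping as an assumption, and `convert_name` is a hypothetical stand-in for the name-copy routine.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed ATASCII -> screen code mapping; bit 7 (inverse video) is kept. */
uint8_t to_screen_code(uint8_t c) {
    uint8_t inverse = c & 0x80;
    uint8_t low = c & 0x7F;
    if (low < 0x20) {
        low = (uint8_t)(low + 0x40);
    } else if (low < 0x60) {
        low = (uint8_t)(low - 0x20);
    }
    return (uint8_t)(inverse | low);
}

/* Converts bytes in place from `start` until a 0 byte is found.
   The mem_size bound exists only to keep this simulation safe;
   the original bug had no such guard. Returns the bytes rewritten. */
size_t convert_name(uint8_t *mem, size_t start, size_t mem_size) {
    size_t i = start;
    while (i < mem_size && mem[i] != 0) {
        mem[i] = to_screen_code(mem[i]);
        i++;
    }
    return i - start;
}
```

With the memory laid out as `'A', 'B', 0xAA, 0x96, 0x00` (name, then two heights, no terminator after the name), the conversion rewrites four bytes and the heights become 0x8A and 0xD6. Adding the missing 0 after the name stops the loop after two bytes and leaves the heights intact.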
zbyti Posted October 27, 2020 (edited)
45 minutes ago, fenrock said: it didn't have a terminating 0 at the end of the name

ZUUL? Was it a joke or a hint? I must read up on C jargon.
fenrock (Author) Posted October 27, 2020
8 minutes ago, zbyti said: ZUUL? it was joke or hint? I must read about C jargon

You called it and I didn't even notice!
zbyti Posted October 27, 2020 (edited)
I checked the latest results... If you've done everything right and the compiler isn't cheating, you are extremely fast on arrays, and the guessing bench also has quite a good score. Nice job!

Quote: You called it and i didn't even notice!

All credit to @Wrathchild :]
fenrock (Author) Posted October 27, 2020 (edited)
1 hour ago, zbyti said: I check last results... If you done everything right and compiler not cheating You are extremely fast on arrays, guessing bench also have quite good score. Nice job! All credits to @Wrathchild :]

The montecarlo is still using the pre-generated sqr() array; I haven't changed it to use fastmul yet.

I changed the guessing game to 10x as it made the difference between using signed byte and normal byte more exaggerated. In the loop of 1000, if I use the signed byte, it reduces the time a lot. That's probably because I copied the asm code for signed comparison from a good website, while my hand-crafted version for unsigned vs signed comparison was rubbish.

I usually try to remember to inspect the code to check it isn't optimizing too much away; sometimes I have to return a value to ensure it's kept in the loop. It's tricky stopping kickc from doing what it's supposed to do.

I'm currently creating an md5.c benchmark, but I'm having to write extra code fragments for the assembler to understand cardinal (4 byte) operations, so it's going slowly.

If the results are fair (they need double checking), then at the moment kickc is:

Landscape: faster (quite a bit)
Chessboard: slower (10%)
QR 1D: faster (much)
Countdown For/While: much slower
Sieve 1028/1899: slower (quite a bit)
Bubble Sort: faster (much)
Montecarlo PI: faster (using fixed table)
YoshPlus: slower (just)
Guessing Game: faster (quite a bit)

I'd really like to have a look at the FOR/WHILE differences, that's quite a lot. And in kickc, they are almost identical whichever way you go.

EDIT: I'm going by the last picture in the benchmarks repository btw. If there's an updated one in the forums, apologies if I have the times wrong.
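The point about returning a value so the loop is kept can be sketched in C as follows. The idea is that a benchmark body whose result is never used may legally be removed by an optimizing compiler, so the result is both returned and stored to a `volatile` variable. This is a generic illustration, assuming the compiler treats `volatile` stores as observable the way standard C does. `bench_sink` and `countdown_sum` are hypothetical names, not part of the benchmark suite.

```c
#include <stdint.h>

/* Stores to a volatile object count as observable behaviour,
   so the compiler is not allowed to drop them. */
volatile uint8_t bench_sink;

/* Countdown loop whose result is used in two visible ways:
   it is written to the sink and returned to the caller. */
uint16_t countdown_sum(uint16_t start) {
    uint16_t sum = 0;
    for (uint16_t i = start; i != 0; i--) {
        sum = (uint16_t)(sum + (i & 0xFF));
    }
    bench_sink = (uint8_t)sum;
    return sum;
}
```

A sufficiently clever optimizer could still shorten the loop itself, so inspecting the generated assembly, as described in the post, remains the reliable check.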
zbyti Posted October 28, 2020 (edited)
To be precise: you are extremely fast on small (byte-indexed) arrays. This is what I was looking for for a chess program (that is why these two "much" tests were created), but I think Mad Pascal will make up something
zbyti Posted October 28, 2020 (edited)
An example of why I didn't write a suite for Action!

Bubble Sort

KickC:

    2A41: A2 00     LDX #$00
    2A43: BC 00 2F  LDY $2F00,X
    2A46: BD 01 2F  LDA $2F01,X
    2A49: 84 FF     STY $FF     ;FPTR2+1
    2A4B: C5 FF     CMP $FF     ;FPTR2+1
    2A4D: B0 07     BCS $2A56
    2A4F: 9D 00 2F  STA $2F00,X
    2A52: 98        TYA
    2A53: 9D 01 2F  STA $2F01,X
    2A56: E8        INX
    2A57: E0 FE     CPX #$FE
    2A59: D0 E8     BNE $2A43
    2A5B: C6 88     DEC $88     ;STMTAB
    2A5D: A9 FF     LDA #$FF
    2A5F: C5 88     CMP $88     ;STMTAB
    2A61: D0 DE     BNE $2A41
    2A63: 60        RTS

Action!:

    31B5: A5 CD     LDA $CD
    31B7: D0 03     BNE $31BC
    31B9: 4C 0B 32  JMP $320B
    31BC: A0 00     LDY #$00
    31BE: 84 CC     STY $CC
    31C0: A9 FD     LDA #$FD
    31C2: C5 CC     CMP $CC
    31C4: B0 03     BCS $31C9
    31C6: 4C 01 32  JMP $3201
    31C9: A6 CC     LDX $CC
    31CB: BD 13 20  LDA $2013,X
    31CE: 85 CA     STA $CA     ;LOADFLG
    31D0: 18        CLC
    31D1: A5 CC     LDA $CC
    31D3: 69 01     ADC #$01
    31D5: 85 AE     STA $AE     ;LELNUM+1
    31D7: A6 AE     LDX $AE     ;LELNUM+1
    31D9: BD 13 20  LDA $2013,X
    31DC: 85 CB     STA $CB
    31DE: A5 CB     LDA $CB
    31E0: C5 CA     CMP $CA     ;LOADFLG
    31E2: 90 03     BCC $31E7
    31E4: 4C FC 31  JMP $31FC
    31E7: A5 CB     LDA $CB
    31E9: A6 CC     LDX $CC
    31EB: 9D 13 20  STA $2013,X
    31EE: 18        CLC
    31EF: A5 CC     LDA $CC
    31F1: 69 01     ADC #$01
    31F3: 85 AE     STA $AE     ;LELNUM+1
    31F5: A5 CA     LDA $CA     ;LOADFLG
    31F7: A6 AE     LDX $AE     ;LELNUM+1
    31F9: 9D 13 20  STA $2013,X
    31FC: E6 CC     INC $CC
    31FE: 4C C0 31  JMP $31C0
    3201: 38        SEC
    3202: A5 CD     LDA $CD
    3204: E9 01     SBC #$01
    3206: 85 CD     STA $CD
    3208: 4C B5 31  JMP $31B5
    320B: A2 21     LDX #$21

For many years Action! was the fastest native compiler on Atari, maybe even on any 8-bit system? Maybe the optimized Advan Basic can sometimes match the Action! speed...

BS.ACT
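For orientation, the KickC listing above corresponds to a very plain bubble sort: an inner loop that walks a byte-indexed table, compares each neighbouring pair and swaps them when the right element is smaller, and an outer loop that simply repeats the pass. The C below is a reconstruction from the disassembly, not the benchmark's actual source. The function name and the `size` parameter are assumptions, since the original appears to use a fixed table at $2F00.

```c
#include <stdint.h>

/* Fixed-length bubble sort over a byte-indexed table.
   The inner loop never shrinks, matching the listing, where X
   always runs from 0 up to the CPX limit. */
void bubble_sort_bytes(uint8_t tab[], uint8_t size) {
    if (size < 2) {
        return;
    }
    for (uint8_t pass = 0; pass < size - 1; pass++) {
        for (uint8_t i = 0; i < size - 1; i++) {
            uint8_t left = tab[i];        /* LDY tab,X   */
            uint8_t right = tab[i + 1];   /* LDA tab+1,X */
            if (right < left) {           /* CMP / BCS skips when right >= left */
                tab[i] = right;
                tab[i + 1] = left;
            }
        }
    }
}
```

Because the index fits in a single byte, the compiler can keep it in the X register and use absolute,X addressing for both loads and stores, which is where most of the difference from the Action! output comes from.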
fenrock (Author) Posted October 30, 2020 (edited)
I've been working on an md5 implementation for the benchmarks, just added to the repo. If you want to compile this yourself, you'll need to download and build the latest kickc, as there's a bunch of fragments for 32 bit integers that needed to be written. Thanks to @JesperGravgaard for doing that.

I haven't done a full implementation like @tebe has done at https://github.com/tebe6502/Mad-Pascal/blob/master/lib/md5.pas; instead, it just creates md5 values. Which is a testament to how good the Mad Pascal version is.

I've also added the latest results to the repo's front page:

The md5 implementation has added about 2k to the binary (there's a lot of static tables for initialising vectors); it's now 7,099 bytes.

I will have some time over the weekend to add more benchmarks. I fancy having a go at the fire one for more eye candy.

EDIT: Cool, the linked image updates when I change it in the repo! Added matrix trans, using 1D arrays, indexed with offsets to emulate 2D.
zbyti Posted October 30, 2020
38 minutes ago, fenrock said: Added matrix trans, using 1D arrays, indexed with offsets to emulate 2D.

In Mad Pascal, 2D arrays rely (under the hood) on multiplication. This is just to clarify.
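The difference between the two approaches can be sketched in C. A 2D access like `m[r][c]` normally compiles to `r * cols + c`, which needs a multiply on the 6502, whereas a 1D array with running offsets only needs additions. The example below assumes "matrix trans" means a transpose-style traversal. The function name and parameters are hypothetical and not taken from the benchmark repo.

```c
#include <stdint.h>

/* Transpose a rows x cols row-major matrix into a cols x rows one,
   using running offsets so the loops contain no multiplication. */
void transpose_offsets(const uint8_t *src, uint8_t *dst,
                       uint8_t rows, uint8_t cols) {
    uint16_t src_off = 0;                 /* equals r * cols + c */
    for (uint8_t r = 0; r < rows; r++) {
        uint16_t dst_off = r;             /* equals c * rows + r */
        for (uint8_t c = 0; c < cols; c++) {
            dst[dst_off] = src[src_off];
            src_off++;
            dst_off = (uint16_t)(dst_off + rows);
        }
    }
}
```

For a 2 x 3 source `{1, 2, 3, 4, 5, 6}` this fills the destination with `{1, 4, 2, 5, 3, 6}`. A fair comparison between compilers should keep in mind which of the two indexing styles each version of the benchmark actually uses.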