KickC Benchmark Tests (cf Mad Pascal) ... WIP

funkheld · October 27, 2020

What about Effectus 0.5.3 for the demo?

I guess it's used a lot

greeting

Edited October 27, 2020 by funkheld

zbyti · October 27, 2020

@fenrock if you want to use flames as a benchmark, use VCOUNT ($D40B) ;)

zbyti · October 27, 2020

@fenrock

As far as I understand, you use a lookup table for multiplication. If you want a fair comparison with Mad Pascal, use the

{$f $60}

directive in the MP suite. I don't remember whether the $6000 area is free, but you can check it.

FASTMUL {$F page}

Seriously fast multiplication (8-bit and 16-bit)

{$f $70}  // fastmul at $7000

Alternative fast multiplication procedures for the BYTE, SHORTINT, WORD, SMALLINT and SHORTREAL types. The procedures occupy 2 KB and are placed starting at address PAGE*256.
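
To illustrate why a table-driven multiply is so much faster than a bit loop, here is a minimal C sketch of the quarter-square technique, a common table-based approach on the 6502. Whether FASTMUL uses exactly this scheme is an assumption, and the names `quarter_sq`, `init_quarter_sq` and `mul8x8_table` are made up for this example.

```c
#include <stdint.h>

/* quarter_sq[n] = floor(n*n / 4) for n = 0..510 (hypothetical table name) */
static uint16_t quarter_sq[511];

/* Fill the table once at startup; on a 6502 it would sit at a fixed page. */
void init_quarter_sq(void) {
    for (uint32_t n = 0; n < 511; n++) {
        quarter_sq[n] = (uint16_t)((n * n) / 4);
    }
}

/* a*b = floor((a+b)^2 / 4) - floor((a-b)^2 / 4):
   two table lookups and one subtraction, no loop over bits. */
uint16_t mul8x8_table(uint8_t a, uint8_t b) {
    uint16_t sum = (uint16_t)(a + b);
    uint16_t diff = (uint16_t)(a > b ? a - b : b - a);
    return (uint16_t)(quarter_sq[sum] - quarter_sq[diff]);
}
```

Call `init_quarter_sq()` once, then `mul8x8_table(a, b)` for each product. The identity is exact because `a+b` and `a-b` always have the same parity, so the two rounding errors cancel.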

Edited October 27, 2020 by zbyti
more info

fenrock · October 27, 2020

Just now, zbyti said:
@fenrock

As far as I understand, you use a lookup table for multiplication. If you want a fair comparison with Mad Pascal, use the
{$f $70}
directive in the MP suite.

Thanks, I've been wondering on a solution for this, as storing the first 127 squared numbers in an array is kind-of avoiding the point of the test.

I'll look into this.

fenrock · October 27, 2020

@zbyti what is the story behind the md5 benchmark?

Is it to test the public freepascal implementation here?

I may have to find a C version to use to compare to it.

Gury · October 27, 2020

I am glad to see a new actor in the Atari 8-bit world. KickC seems like a good competitor to the existing languages for our beloved machine. I see it as a good alternative to CC65, with C as its basis. Together with Mad Pascal, it also opens the way to new speed comparisons using the new examples zbyti is preparing for us, even for other languages. If interest in this language grows, it will be a good candidate for support in Mad Studio.

zbyti · October 27, 2020

2 minutes ago, fenrock said:

@zbyti what is the story behind the md5 benchmark?

From the beginning I didn't want to test language-specific libraries, only the essentials. But @tebe beat VBCC, so I made an exception.

zbyti · October 27, 2020

@fenrock Monte Carlo Case

CC65 uses:

; 8x16 routine with external entry points used by the 16x16 routine in mul.s
tosmula0:
tosumula0:
        sta     ptr4
mul8x16:jsr     popptr1         ; Get left operand (Y=0 by popptr1)

        tya                     ; Clear byte 1
        ldy     #8              ; Number of bits
        ldx     ptr1+1          ; check if lhs is 8 bit only
        beq     mul8x8          ; Do 8x8 multiplication if high byte zero
mul8x16a:
        sta     ptr4+1          ; Clear byte 2

        lsr     ptr4            ; Get first bit into carry
@L0:    bcc     @L1

        clc
        adc     ptr1
        tax
        lda     ptr1+1          ; hi byte of left op
        adc     ptr4+1
        sta     ptr4+1
        txa

@L1:    ror     ptr4+1
        ror     a
        ror     ptr4
        dey
        bne     @L0
        tax
        lda     ptr4            ; Load the result
        rts

Mad Pascal uses:

/*
;
; Ullrich von Bassewitz, 2009-08-17
;
; CC65 runtime: 8x8 => 16 unsigned multiplication
;

*/
.proc	imulCL
ptr1 = ecx
ptr4 = eax
	
	ldy #8
	lda #0

        lsr     ptr4            ; Get first bit into carry
@L0:    bcc     @L1
        clc
        adc     ptr1
@L1:    ror	@
        ror     ptr4
        dey
        bne     @L0
        sta	ptr4+1

	rts
.endp

@tebe explained:

MP extends the type for multiplication and other operations:
for u8 x u8 it assumes a 16-bit result, and only later, during optimization,
does it start to drop redundant operations.

The result is written to eax, eax+1 (16 bits).

In the tests, this procedure turned out to be several scanlines faster than the one used previously:

.proc	imulCL

	lda #$00

	LDY #$09
	CLC
LOOP	ROR @
	ROR eax
	BCC MUL2
	CLC		;DEC AUX above to remove CLC
	ADC ecx
MUL2	DEY
	BNE LOOP

	STA eax+1

	RTS
.endp
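
For readers who don't read 6502 assembly, the routines above are variants of shift-and-add multiplication in which the 16-bit result is rotated right while the multiplier bits fall out of the low byte. A minimal C sketch of that idea follows; the function name is hypothetical and it mirrors the technique, not any one listing instruction for instruction.

```c
#include <stdint.h>

/* 8x8 -> 16 bit unsigned multiply.
   hi plays the role of the accumulator (plus carry), lo the role of eax. */
uint16_t imul_shift_add(uint8_t multiplicand, uint8_t multiplier) {
    uint16_t hi = 0;          /* needs 9 bits right after an add */
    uint8_t lo = multiplier;  /* multiplier bits leave on the right,
                                 result bits enter on the left */
    for (int i = 0; i < 8; i++) {
        if (lo & 1) {
            hi = (uint16_t)(hi + multiplicand);          /* conditional ADC */
        }
        lo = (uint8_t)((lo >> 1) | ((hi & 1) << 7));     /* ROR low byte */
        hi >>= 1;                                        /* ROR accumulator */
    }
    return (uint16_t)((hi << 8) | lo);
}
```

The loop always runs 8 times, so its cost is fixed regardless of the operands, which is what makes a table lookup attractive by comparison.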

Millfork approach:

byte*byte produces a byte, this is by design. An arithmetic operator never promotes the result to a type larger than the type of its arguments.

In order to get a word, you need to explicitly cast one of the arguments to word: x = n*word(n)
This causes a call to __mul_u16u8u16, which is defined in m6502/zp_reg.mfk.
The same file also contains __mul_u16u16u16 and __mul_u8u8u8, plus all the division and modulo implementations.
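
The same distinction can be shown in standard C, where the truncation happens at the point the result is stored rather than in the operator itself. This is only a sketch with hypothetical function names, and it assumes standard integer promotion; a 6502 compiler such as KickC may choose narrower arithmetic.

```c
#include <stdint.h>

/* Result kept in 8 bits: the high byte of the product is discarded,
   comparable to Millfork's byte*byte. */
uint8_t square_u8(uint8_t n) {
    return (uint8_t)(n * n);
}

/* One operand widened first, comparable to n*word(n) in Millfork:
   the full 16-bit product survives. */
uint16_t square_u16(uint8_t n) {
    return (uint16_t)((uint16_t)n * n);
}
```

For `n = 20`, the 8-bit version returns 144 (400 modulo 256) while the widened version returns 400, which is why a benchmark has to state which width it is measuring.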

Edited October 27, 2020 by zbyti
Millfork

tebe · October 27, 2020

2 hours ago, fenrock said:

@zbyti what is the story behind the md5 benchmark?

Is it to test the public freepascal implementation here?

I may have to find a C version to use to compare to it.

https://github.com/tebe6502/Mad-Pascal/blob/master/lib/md5.pas

zbyti · October 27, 2020

@fenrock it's funny how big a hole you have in the middle of the screen in Landscape

[Screenshots: atari001.png, atari002.png, atari003.png]

funkheld · October 27, 2020

Hi, thank you.

With KickC, suite.xex is now only 4750 bytes.

Greetings

Edited October 27, 2020 by funkheld

zbyti · October 27, 2020

3 hours ago, zbyti said:

@fenrock it's funny how big a hole you have in the middle of the screen in Landscape

char landscapeBase[] = kickasm {{
    .byte $AA, $96, $90, $90, $7A, $7A, $6E, $6E, $5E, $5E, $56, $56, $52, $50 
  }};

Looks like a signed/unsigned char problem to me, but that's only a deduction :]

Edited October 27, 2020 by zbyti
snippet

fenrock · October 27, 2020

37 minutes ago, zbyti said:

Looks like a signed/unsigned char problem to me, but that's only a deduction :]

This is entirely possible, I'll check. I've had to write some fragments for some of the conditional code that I may have got wrong.

It does look like it's right between the 7A and 90 values in the array, so good shout on it being a signed issue.
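
Why the 7A/90 boundary looks suspicious can be sketched in a few lines of C: $7A is the largest of the listed heights with the top bit clear, and $90 has it set, so a signed comparison flips the ordering exactly there. The function names are hypothetical, and the cast to `int8_t` relies on the usual two's complement conversion.

```c
#include <stdint.h>

/* Compare two heights as unsigned bytes: 0x90 (144) is above 0x7A (122). */
int is_higher_unsigned(uint8_t a, uint8_t b) {
    return a > b;
}

/* Compare the same bytes as signed: 0x90 becomes -112 and loses to 0x7A. */
int is_higher_signed(uint8_t a, uint8_t b) {
    return (int8_t)a > (int8_t)b;
}
```

Both functions agree for values up to $7F, so a bug of this kind would only show on the taller columns of the landscape.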

Wrathchild · October 27, 2020

There is no signed or unsigned... only ZUUL ?

zbyti · October 27, 2020

Wake up Ripley! You are not a ZUUL! We are on the LV-426 and we are screwed...

fenrock · October 27, 2020

Well, running side by side, kickc (L) vs pascal (R), with no random values changing the heights, there's definitely a problem on the left!

I've removed all the signed values, everything *should* be uint8_t. Time to debug.

fenrock · October 27, 2020

Fixed it.

The array heights were ~~either not getting set correctly, or~~ getting trashed on startup.

EDIT: Well that was educational.

Turned out to be an "out by 1" error, when I was copying the name of the benchmark, it didn't have a terminating 0 at the end of the name, so ran into the next data section, which turned out to be the heights array, and was trashing them by turning them into screen codes.

So it was my fault all along, not the program's.
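
A rough C simulation of this kind of overrun is shown below. It is not the actual suite code: the memory layout, the name "SIEVE", the four sample heights and the simplified screen-code mapping are all assumptions made for illustration, and everything lives in one zero-filled buffer so the demo stays well-defined.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simplified, hypothetical ATASCII -> screen code mapping. */
static uint8_t to_screen_code(uint8_t c) {
    return (c >= 0x20 && c <= 0x5F) ? (uint8_t)(c - 0x20) : c;
}

/* Convert bytes in place until a 0 terminator (or the buffer limit). */
static void convert_until_zero(uint8_t *buf, size_t limit) {
    for (size_t i = 0; i < limit && buf[i] != 0; i++) {
        buf[i] = to_screen_code(buf[i]);
    }
}

/* Lay out a name followed directly by a heights table, convert the name,
   and report whether the heights were modified (1) or left intact (0). */
int heights_trashed(int with_terminator) {
    static const uint8_t name[] = {'S', 'I', 'E', 'V', 'E'};
    static const uint8_t heights[] = {0x5E, 0x56, 0x52, 0x50};
    uint8_t mem[16] = {0};
    size_t pos = 0;

    memcpy(mem + pos, name, sizeof name);
    pos += sizeof name;
    if (with_terminator) {
        mem[pos++] = 0;   /* the byte that was missing */
    }
    memcpy(mem + pos, heights, sizeof heights);

    convert_until_zero(mem, sizeof mem);
    return memcmp(mem + pos, heights, sizeof heights) != 0;
}
```

`heights_trashed(0)` reports the corruption and `heights_trashed(1)` does not, which matches the symptom: only the data section that happened to follow the unterminated name was affected.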

Edited October 27, 2020 by fenrock

zbyti · October 27, 2020

45 minutes ago, fenrock said:

it didn't have a terminating 0 at the end of the name

ZUUL? Was it a joke or a hint? I must read up on C jargon

Edited October 27, 2020 by zbyti
joke or hint

fenrock · October 27, 2020

8 minutes ago, zbyti said:

ZUUL? Was it a joke or a hint? I must read up on C jargon

You called it and i didn't even notice!

zbyti · October 27, 2020

I checked the latest results... If you did everything right and the compiler isn't cheating, you are extremely fast on arrays; the guessing bench also has quite a good score. Nice job!

Quote

You called it and i didn't even notice!

All credits to @Wrathchild :]

Edited October 27, 2020 by zbyti
typo

fenrock · October 27, 2020

1 hour ago, zbyti said:

I checked the latest results... If you did everything right and the compiler isn't cheating, you are extremely fast on arrays; the guessing bench also has quite a good score. Nice job!

All credits to @Wrathchild :]

The montecarlo is still using the pre-generated sqr() array, I haven't changed it to using fastmul yet.

I changed the guessing game to 10x as it made the difference between using signed byte and normal byte more exaggerated. In the loop of 1000, if I use the signed byte, it reduces the time a lot.

Probably because I copied the asm code from a good website for doing signed comparison, but my hand crafted version for unsigned vs signed comparison was rubbish

I usually try and remember to inspect the code to check it isn't optimizing too much away - sometimes I have to return a value to ensure it's kept in the looping. It's tricky stopping kickc from doing what it's supposed to do
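
One way to keep a benchmark loop from being optimized away, sketched in C under the assumption that the compiler honours `volatile`: store the result somewhere the optimizer must treat as observable and also return it. The names `bench_sink` and `countdown_bench` are hypothetical.

```c
#include <stdint.h>

/* A volatile sink: the store to it cannot be removed by the optimizer. */
volatile uint16_t bench_sink;

/* Count down from rounds, accumulating so the loop has a visible result. */
uint16_t countdown_bench(uint16_t rounds) {
    uint16_t acc = 0;
    for (uint16_t i = rounds; i > 0; i--) {
        acc = (uint16_t)(acc + i);
    }
    bench_sink = acc;   /* forces the final value to be produced */
    return acc;
}
```

This keeps the result alive, but a clever compiler may still replace the loop with a closed-form value, so inspecting the generated assembly remains the real check.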

I'm currently creating an md5.c benchmark, but having to write extra code fragments for the assembler to understand cardinal (4 byte) operations, so it's going slowly.
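
To show why MD5 leans so heavily on 32-bit operations, here is a small C sketch of one first-round step: a bitwise select, several 32-bit additions and a left rotation. It follows the published MD5 definition, but the function names are made up here and this is not the code in the benchmark repo.

```c
#include <stdint.h>

/* Rotate a 32-bit value left by s bits (valid for 1 <= s <= 31). */
uint32_t rotl32(uint32_t x, unsigned s) {
    return (x << s) | (x >> (32 - s));
}

/* MD5 round-1 function F: take bits of c where b is 1, bits of d where b is 0. */
uint32_t md5_f(uint32_t b, uint32_t c, uint32_t d) {
    return (b & c) | (~b & d);
}

/* One round-1 step: m is the message word, k the round constant, s the shift. */
uint32_t md5_step(uint32_t a, uint32_t b, uint32_t c, uint32_t d,
                  uint32_t m, uint32_t k, unsigned s) {
    return b + rotl32(a + md5_f(b, c, d) + m + k, s);
}
```

On a 6502 every one of these operations expands to four byte-wide steps with carry handling, which is where the extra 32-bit fragments come in.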

If the results are fair (need double checking), then at the moment kickc is:

Landscape: faster (quite a bit)

Chessboard: slower (10%)

QR 1D: faster (much)

Countdown For/While: much slower

Sieve 1028/1899: slower (quite a bit)

Bubble Sort: faster (much)

Montecarlo PI: faster (using fixed table)

YoshPlus: slower (just)

Guessing Game: faster (quite a bit)

I'd really like to have a look at the FOR/WHILE differences, that's quite a lot. And in kickc, they are almost identical whichever way you go.

EDIT: I'm going on the last picture in the benchmarks repository btw, if there's an updated one in the forums, apologies if I have the times wrong

Edited October 27, 2020 by fenrock

zbyti · October 28, 2020

To be precise: you are extremely fast on small (byte-indexed) arrays. This is what I was looking for for a chess program (that is why these two "much" tests were created), but I think Mad Pascal will make up for it somehow

Edited October 28, 2020 by zbyti
make up ;) :D

zbyti · October 28, 2020

An example of why I didn't write a suite for Action!:

Bubble Sort KickC:

2A41: A2 00     LDX #$00
2A43: BC 00 2F  LDY $2F00,X
2A46: BD 01 2F  LDA $2F01,X
2A49: 84 FF     STY $FF     ;FPTR2+1
2A4B: C5 FF     CMP $FF     ;FPTR2+1
2A4D: B0 07     BCS $2A56
2A4F: 9D 00 2F  STA $2F00,X
2A52: 98        TYA
2A53: 9D 01 2F  STA $2F01,X
2A56: E8        INX
2A57: E0 FE     CPX #$FE
2A59: D0 E8     BNE $2A43
2A5B: C6 88     DEC $88     ;STMTAB
2A5D: A9 FF     LDA #$FF
2A5F: C5 88     CMP $88     ;STMTAB
2A61: D0 DE     BNE $2A41
2A63: 60        RTS

Action!:

31B5: A5 CD     LDA $CD
31B7: D0 03     BNE $31BC
31B9: 4C 0B 32  JMP $320B
31BC: A0 00     LDY #$00
31BE: 84 CC     STY $CC
31C0: A9 FD     LDA #$FD
31C2: C5 CC     CMP $CC
31C4: B0 03     BCS $31C9
31C6: 4C 01 32  JMP $3201
31C9: A6 CC     LDX $CC
31CB: BD 13 20  LDA $2013,X
31CE: 85 CA     STA $CA     ;LOADFLG
31D0: 18        CLC
31D1: A5 CC     LDA $CC
31D3: 69 01     ADC #$01
31D5: 85 AE     STA $AE     ;LELNUM+1
31D7: A6 AE     LDX $AE     ;LELNUM+1
31D9: BD 13 20  LDA $2013,X
31DC: 85 CB     STA $CB
31DE: A5 CB     LDA $CB
31E0: C5 CA     CMP $CA     ;LOADFLG
31E2: 90 03     BCC $31E7
31E4: 4C FC 31  JMP $31FC
31E7: A5 CB     LDA $CB
31E9: A6 CC     LDX $CC
31EB: 9D 13 20  STA $2013,X
31EE: 18        CLC
31EF: A5 CC     LDA $CC
31F1: 69 01     ADC #$01
31F3: 85 AE     STA $AE     ;LELNUM+1
31F5: A5 CA     LDA $CA     ;LOADFLG
31F7: A6 AE     LDX $AE     ;LELNUM+1
31F9: 9D 13 20  STA $2013,X
31FC: E6 CC     INC $CC
31FE: 4C C0 31  JMP $31C0
3201: 38        SEC
3202: A5 CD     LDA $CD
3204: E9 01     SBC #$01
3206: 85 CD     STA $CD
3208: 4C B5 31  JMP $31B5
320B: A2 21     LDX #$21

For many years Action! was the fastest native compiler on Atari, maybe even on any 8-bit system? Maybe the optimized Advan Basic can sometimes match the Action! speed...
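
For reference, the loop both listings implement can be sketched in C as below. The array size of 255 is read off the `CPX #$FE` limit in the KickC listing; the number of outer passes is an assumption, since the initial counter value is not visible in either listing, and `BS_SIZE` and `bubble_sort` are hypothetical names.

```c
#include <stdint.h>

#define BS_SIZE 255  /* indices 0..254; inner loop stops at index $FE */

/* Plain bubble sort over a byte-indexed array, ascending order. */
void bubble_sort(uint8_t a[BS_SIZE]) {
    for (uint8_t pass = 0; pass < BS_SIZE - 1; pass++) {
        for (uint8_t i = 0; i < BS_SIZE - 1; i++) {
            if (a[i + 1] < a[i]) {      /* same test as CMP / BCS above */
                uint8_t tmp = a[i];
                a[i] = a[i + 1];
                a[i + 1] = tmp;
            }
        }
    }
}
```

Because the index fits in one byte, the KickC output can keep it in the X register and use absolute,X addressing for both elements, while the Action! code reloads the index from zero page for every access.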

BS.ACT

Edited October 28, 2020 by zbyti
fastest compiler

fenrock · October 30, 2020

I've been working on an md5 implementation for the benchmarks, just added to the repo.

If you want to compile this yourself, you'll need to download and build the latest kickc as there's a bunch of fragments for 32 bit integers that needed to be written.

Thanks to @JesperGravgaard for doing that.

I haven't done a full implementation like @tebe has done at https://github.com/tebe6502/Mad-Pascal/blob/master/lib/md5.pas, instead, it just creates md5 values.

Which is a testament to how good the Mad Pascal version is.

I've also added the latest results to the repo's front page:

The md5 implementation has added about 2k to the binary (there's a lot of static tables for initialising vectors), it's now 7,099 bytes.

Will have some time over the weekend to add more benchmarks. I fancy having a go at the fire one for more eye candy

EDIT: Cool, the linked image updates when I change it in the repo!

Added matrix trans, using 1D arrays, indexed with offsets to emulate 2D.

Edited October 30, 2020 by fenrock
add size info

zbyti · October 30, 2020

38 minutes ago, fenrock said:

Added matrix trans, using 1D arrays, indexed with offsets to emulate 2D.

In Mad Pascal, 2D arrays rely (under the hood) on multiplication; this is just for clarification.
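
The difference can be sketched in C: a transpose over 1D arrays where the row and column offsets are advanced by addition, so no multiplication is needed per element. The 8x8 size and the names `DIM` and `transpose_1d` are assumptions for this example, not the benchmark's actual code.

```c
#include <stdint.h>

#define DIM 8  /* hypothetical matrix size; DIM*DIM must fit in a byte index */

/* dst[c][r] = src[r][c], with both matrices stored as flat arrays. */
void transpose_1d(const uint8_t src[DIM * DIM], uint8_t dst[DIM * DIM]) {
    uint8_t row_off = 0;                 /* r * DIM, kept by repeated addition */
    for (uint8_t r = 0; r < DIM; r++) {
        uint8_t col_off = r;             /* c * DIM + r, starts at c = 0 */
        for (uint8_t c = 0; c < DIM; c++) {
            dst[col_off] = src[row_off + c];
            col_off = (uint8_t)(col_off + DIM);
        }
        row_off = (uint8_t)(row_off + DIM);
    }
}
```

A true 2D array access such as `m[r][c]` has to compute `r * DIM + c` each time unless the compiler strength-reduces it, which is worth keeping in mind when comparing the two suites on this test.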
