Benchmarking Languages


159 replies to this topic

#26 sometimes99er OFFLINE  

sometimes99er

    River Patroller

  • 3,909 posts
  • Location:Denmark

Posted Sun Jan 24, 2016 10:46 AM

Many languages for the TI, including TurboForth, GPL, GCC and Java, are indeed "compiled", so I don't see any problem with XB being "optimized" by simply compiling it.
 
So this is how I would put it. ;)
 
Language   First Pass    Optimized
Assembly     17 sec         5 sec
TurboForth   48 sec        29 sec
XB         2000 sec        37 sec
FbForth      70 sec        58 sec
GPL          80 sec       none yet
ABASIC      490 sec       none yet


#27 senior_falcon OFFLINE  

senior_falcon

    Dragonstomper

  • 911 posts
  • Location:Lansing, NY, USA

Posted Sun Jan 24, 2016 10:52 AM

Just to make sure we're comparing apples to apples, 51 seconds is probably the right number for compiled XB.  That is the speed running with the normal 32K memory.  The 37 seconds is the time when the 32K is on the 16 bit bus, which I don't think was used for the other times.



#28 Lee Stewart OFFLINE  

Lee Stewart

    River Patroller

  • 3,315 posts
  • Location:Silver Run, Maryland

Posted Sun Jan 24, 2016 11:22 AM

So, that would make it now

Language   First Pass    Optimized
--------   ----------    ---------
Assembly     17 sec         5 sec
fbForth      70 sec        26 sec
TurboForth   48 sec        29 sec
XB         2000 sec        51 sec
GPL          80 sec       none yet
ABASIC      490 sec       none yet

...lee



#29 mizapf OFFLINE  

mizapf

    River Patroller

  • 2,512 posts
  • Location:Germany

Posted Sun Jan 24, 2016 12:22 PM

My benchmarks (on MESS, can check with real iron later), hand-stopped:

 

TI-99/4A (8bit 32K)

Slow: 17.7 s

Optimized: 7.4 s

 

TI-99/4A (16bit 0ws 32K)

Slow: 10.1 s

Optimized: 4.9 s

 

Geneve (GPL mode):

Slow: 5.6 s

Optimized: 3.3 s

 

But that's a bit too short for a reliable timing by hand.



#30 Tursi OFFLINE  

Tursi

    River Patroller

  • Topic Starter
  • 4,746 posts
  • HarmlessLion
  • Location:BUR

Posted Sun Jan 24, 2016 12:50 PM

No, I can't agree that compiling a language is the same as optimizing it -- even if the compiler did full XB, which it does not. (For instance, it's an Integer BASIC rather than Floating point).

 

The goal of the "optimized" column is "how fast can the same environment be made to run by a skilled developer", and it's no longer the same environment. Likewise I would reject stepping outside the language with assembly subroutines and the like. ;)

 

Mizapf -- I don't know what language or code you are testing with those scores. :) But I definitely do not wish to start a new column for every hardware configuration - so no 16-bit RAM tests are needed. It's not a test of the language when the hardware changes. (I'll remove the XB Compiled second score too - I assumed that time was referring to a different runtime).

 

When I did my timing I rounded all fractional seconds down due to being hand timed. :)

 

Still sorting by the first column. ;) The optimized timings aren't really setting a baseline, they're setting a mastery score. ;)

Language   First Pass    Optimized
Assembly     17 sec         5 sec
TurboForth   48 sec        29 sec
Compiled XB  51 sec       none yet
FbForth      70 sec        26 sec
GPL          80 sec       none yet
ABASIC      490 sec       none yet
XB         2000 sec       none yet

Edited by Tursi, Sun Jan 24, 2016 12:54 PM.


#31 mizapf OFFLINE  

mizapf

    River Patroller

  • 2,512 posts
  • Location:Germany

Posted Sun Jan 24, 2016 12:57 PM

It was of course the assembly language version (both the first one (optimized) and your VSBW version (slow)). And I thought it could be interesting to have another number since the 16bit expansion was not that uncommon. OK, I had one.

 

Sure, it's intended to be a language comparison and not a hardware comparison, so if you think it's not quite on topic, you need not add those numbers. :-)



#32 Vorticon OFFLINE  

Vorticon

    River Patroller

  • 2,726 posts
  • Location:Eagan, MN, USA

Posted Sun Jan 24, 2016 9:48 PM

UCSD Pascal benchmark data:

 

First pass: 7300 seconds

Program benchmrk;
uses support,sprite;
var
	x,y,cnt : integer;
begin
	page(output);
	set_screen(2);
	set_spr_size(1);
	set_spr_attribute(1,42,2,0,1,1,0,0);
	cnt := 100;
	while cnt > 0 do
		begin
			for x := 1 to 240 do
				set_spr_attribute(1,42,2,0,1,x,0,0);
			for y := 1 to 176 do
				set_spr_attribute(1,42,2,0,y,240,0,0);
			for x := 240 downto 1 do
				set_spr_attribute(1,42,2,0,176,x,0,0);
			for y := 176 downto 1 do
				set_spr_attribute(1,42,2,0,y,1,0,0);
			cnt := cnt - 1;
		end;
end.

Optimized: 780 seconds

Program benchmrk;
uses support,sprite;
var
	x,y,cnt : integer;
	spr : link;
begin
	page(output);
	set_screen(2);
	set_spr_size(1);
	set_spr_attribute(1,42,2,0,1,1,0,0);
	cnt := 100;
	new(spr);
	spr^.packet := [spr_x_pos,spr_y_pos];
	while cnt > 0 do
		begin
			for x := 1 to 240 do
				begin
					with spr^ do
						begin
							x_pos := x;
							y_pos := 1;
						end;
					set_sprite(1,spr);
				end;
			for y := 1 to 176 do
				begin
					with spr^ do
						begin
							x_pos := 240;
							y_pos := y;
						end;
					set_sprite(1,spr);
				end;
			for x := 240 downto 1 do
				begin
					with spr^ do
						begin	
							x_pos := x;
							y_pos := 176;
						end;
					set_sprite(1,spr);
				end;
			for y := 176 downto 1 do
				begin
					with spr^ do
						begin
							x_pos := 1;
							y_pos := y;
						end;
					set_sprite(1,spr);
				end;
			cnt := cnt - 1;
		end;
end.

Kind of disappointed frankly... Now if I were to include embedded assembly in the mix, it would have been a different matter :) Incidentally, the above code will only find the system libraries (support and sprite) if the compiler disk is in drive 1 (i.e. #4:). Not quite sure why... That said, the UCSD Pascal implementation for the TI is pretty powerful nonetheless, with advanced features such as concurrent processes, segment routines and units, and it's a shame it never became popular within the TI community. When coupled with a few strategic external assembly routines to speed up graphics, it could be quite capable.

 

So latest update:

Language   First Pass     Optimized
Assembly      17 sec         5 sec
TurboForth    48 sec        29 sec
Compiled XB   51 sec       none yet
FbForth       70 sec        26 sec
GPL           80 sec       none yet
ABASIC       490 sec       none yet
XB          2000 sec       none yet
UCSD Pascal 7300 sec       780 sec
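
For a sense of scale, the loop bounds in the Pascal listing give 832 sprite updates per lap, so 83,200 per run. A rough sketch in C turns the two hand-timed totals into an average cost per update. The function names are made up for illustration, and it assumes the whole run time is spent on the sprite updates:

```c
/* Loop bounds taken from the Pascal listing: each lap walks
   240 + 176 + 240 + 176 positions, and the benchmark runs 100 laps. */
enum { X_STEPS = 240, Y_STEPS = 176, LAPS = 100 };

/* Total number of sprite position updates in one benchmark run. */
long long total_moves(void)
{
    return (long long)LAPS * (2 * X_STEPS + 2 * Y_STEPS);
}

/* Average cost of one update in microseconds (rounded down),
   given a hand-timed total in whole seconds. */
long long micros_per_move(long long run_seconds)
{
    return run_seconds * 1000000LL / total_moves();
}
```

With the timings above, that works out to roughly 87.7 ms per update for the first pass and about 9.4 ms for the optimized version.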


#33 sometimes99er OFFLINE  

sometimes99er

    River Patroller

  • 3,909 posts
  • Location:Denmark

Posted Mon Jan 25, 2016 2:09 AM

This seems like a straightforward benchmark, but what does it actually mean to move a sprite around at a rate faster than 1/60s, resulting in visual frames being skipped? ;)


He he. Yes, we look at the "purpose" and "end result" of the XB program, = (setup and) move sprite. And then "optimize" or "convert" (heading straight for integers) and "benchmark" (compare) only "speed".

Indeed, when you can't see the sprite move, the benchmark itself could come in question. We look at the program and "believe", or we may "slow down" the program to "witness" correct movement of sprite. If the count or sprite is off by one or two, it really doesn't matter.
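
To put a number on the skipped frames, here is a minimal sketch in C. It assumes a 60 Hz display and 100 laps of 240 + 176 + 240 + 176 moves, as in the Pascal listing earlier in the thread; the function names are hypothetical.

```c
/* Assumptions: 83,200 moves per run (from the Pascal loop bounds)
   and a 60 Hz display refresh. */
enum { MOVES_PER_RUN = 100 * (240 + 176 + 240 + 176), FRAMES_PER_SECOND = 60 };

/* Whole sprite moves that fall inside one video frame for a given run time.
   A result of 0 means fewer than one move per frame. */
int moves_per_frame(int run_seconds)
{
    return MOVES_PER_RUN / (run_seconds * FRAMES_PER_SECOND);
}

/* Shortest run time, rounded up to whole seconds, at which
   every move gets a frame of its own. */
int min_seconds_without_skips(void)
{
    return (MOVES_PER_RUN + FRAMES_PER_SECOND - 1) / FRAMES_PER_SECOND;
}
```

On those assumptions a 5 second run squeezes about 277 moves into every frame, and only runs longer than roughly 1,387 seconds give each move a frame of its own.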

If we had an XB program moving sprites with values that are not easily represented with integers, let's consider complex floating point exercises, - would the XB program then be considered "unfair" or would we quickly look at "purpose" and "end result", and then rewrite and "emulate". But then I think floating point is available to most of the languages.

Apart from speed, other elements can be used in benchmarking. Accuracy. Size of program. Footprint(s). Time(s) to boot, load, compile and get running - maybe compile on both the TI versus cross-platform. Amount of support besides stock console, eg. cartridge and Memory Expansion. Learning curve, going from this or that. Experience needed, optimizing TurboForth with or without inline assembly may require "Expert".

What I'm saying is probably, XB is good, old, easy, sleazy, sexy and/or quick for certain stuff. And vice versa.

;)

Edited by sometimes99er, Mon Jan 25, 2016 5:15 AM.


#34 TheMole OFFLINE  

TheMole

    Dragonstomper

  • 744 posts
  • Location:Belgium

Posted Mon Jan 25, 2016 4:47 AM

The following GCC program, using Tursi's libti99 (which is the de facto standard utility library for GCC programs), clocks in at 15 seconds (hand-timed). I'll do an optimized version that writes to VDP RAM directly later today.
 
#include "vdp.h"	// libti99 VDP header (include path assumed; adjust to your setup)

int main(int argc, char *argv[])
{
	// Set registers for graphics mode, with 8x8 sprite magnification on
	int regval = set_graphics(VDP_SPR_8x8MAG);
	VDP_SET_REGISTER(VDP_REG_MODE1, regval);

	// Drop sprite 0, with pattern 42 and the color green on the screen at position 1,1
	sprite(0, 42, 2, 1, 1);

	// Loop 100 times
	int x, y, cnt = 100;
	while (cnt)
	{
		for (x = 1; x < 240; x++)
			sprite(0, 42, 2, 1, x);
		for (y = 1; y < 176; y++)
			sprite(0, 42, 2, y, 240);
		for (x = 240; x > 1; x--)
			sprite(0, 42, 2, 176, x);
		for (y = 176; y > 1; y--)
			sprite(0, 42, 2, y, 1);

		cnt--;
	}

	while (1) { /* idle loop */ }
	return 0;
}
So, non hand-optimized C code beats non hand-optimized assembler code. Although I guess that's to be expected when using the -Os flag.

#35 TheMole OFFLINE  

TheMole

    Dragonstomper

  • 744 posts
  • Location:Belgium

Posted Mon Jan 25, 2016 5:03 AM

And the following optimized version takes 5 seconds to run (again, timed by hand):
 
#include "vdp.h"	// libti99 VDP header (include path assumed; adjust to your setup)

#define SPR0_X_VRAM_LOC		0x0301
#define SPR0_Y_VRAM_LOC		0x0300
int main(int argc, char *argv[])
{
	// Set registers for graphics mode
	int regval = set_graphics(VDP_SPR_8x8MAG);
	VDP_SET_REGISTER(VDP_REG_MODE1, regval);

	// Drop a sprite on the screen
	sprite(0, 42, 2, 1, 1);

	// Loop 100 times
	int x, y, cnt = 100;
	while (cnt)
	{
		for (x = 1; x < 240; x++)
		{
			VDP_SET_ADDRESS_WRITE(SPR0_X_VRAM_LOC);
			VDPWD = x;
		}
		for (y = 1; y < 176; y++)
		{
			VDP_SET_ADDRESS_WRITE(SPR0_Y_VRAM_LOC);
			VDPWD = y;
		}
		for (; x > 1; x--)
		{
			VDP_SET_ADDRESS_WRITE(SPR0_X_VRAM_LOC);
			VDPWD = x;
		}
		for (; y > 1; y--)
		{
			VDP_SET_ADDRESS_WRITE(SPR0_Y_VRAM_LOC);
			VDPWD = y;
		}

		cnt--;
	}

	while (1) { /* idle loop */ }
	return 0;
}
Assembler is surely still faster, but 100 loops is not enough to expose a measurable difference.
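
For anyone wondering what the direct VDP writes involve: the TMS9918A takes a VRAM write address as two bytes on its control port, low byte first, then the high six bits with 0x40 set. The sketch below only models that encoding on a host machine. It is not the libti99 macro itself, and the type and function names are invented for illustration. The table base of 0x0300 is inferred from the two #defines above.

```c
#include <stdint.h>

/* The two control-port bytes that select a VRAM address for writing. */
typedef struct {
    uint8_t first;   /* low 8 bits of the address */
    uint8_t second;  /* high 6 bits of the address, with 0x40 set to mean "write" */
} vdp_write_address;

/* Encode a 14-bit VRAM address the way the TMS9918A expects it. */
vdp_write_address encode_write_address(uint16_t vram_address)
{
    vdp_write_address bytes;
    bytes.first  = (uint8_t)(vram_address & 0xFF);
    bytes.second = (uint8_t)(((vram_address >> 8) & 0x3F) | 0x40);
    return bytes;
}

/* Sprite attribute table layout: 4 bytes per sprite, in the order
   Y (field 0), X (field 1), pattern (field 2), colour (field 3). */
uint16_t sprite_attribute_address(uint16_t table_base, unsigned sprite, unsigned field)
{
    return (uint16_t)(table_base + 4u * sprite + field);
}
```

encode_write_address(0x0301) yields the bytes 0x01 and 0x43, which is the pair that has to go out before each X write in the loops above.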

*edit* Since we're updating the table ourselves, here goes:
Language   First Pass     Optimized
GCC           15 sec         5 sec
Assembly      17 sec         5 sec
TurboForth    48 sec        29 sec
Compiled XB   51 sec       none yet
FbForth       70 sec        26 sec
GPL           80 sec       none yet
ABASIC       490 sec       none yet
XB          2000 sec       none yet
UCSD Pascal 7300 sec       780 sec

Edited by TheMole, Mon Jan 25, 2016 5:05 AM.


#36 sometimes99er OFFLINE  

sometimes99er

    River Patroller

  • 3,909 posts
  • Location:Denmark

Posted Mon Jan 25, 2016 5:14 AM

Excellent.  :thumbsup: 
 
Of all these, I believe only Assembly can be made to run faster ("optimized") for the F18A?
 
And how about MLC by moulinaie. And perhaps not forgetting Java by mikeakohn.
 
;)

#37 Willsy OFFLINE  

Willsy

    River Patroller

  • 3,009 posts
  • Location:Uzbekistan (no, really!)

Posted Mon Jan 25, 2016 5:22 AM

So in terms of speed versus programming effort it's clear: we should all be using C, since it is very close to assembly language speed and it is much easier to write the code in C than in Forth or assembly.

Hats off to the GCC compiler for the 9900. Seriously impressive.

#38 TheMole OFFLINE  

TheMole

    Dragonstomper

  • 744 posts
  • Location:Belgium

Posted Mon Jan 25, 2016 5:49 AM

Of all these, I believe only Assembly can be made to run faster ("optimized") for the F18A?

You can target the F18A GPU with GCC as well. I've done this for my two F18A demos (Fire: http://atariage.com/...rces/?p=3278174 and Street Fighter: http://atariage.com/...es/?p=3259078).
 

And how about MLC by moulinaie. And perhaps not forgetting Java by mikeakohn.
 
;)


Would love to see that, the alternative languages deserve some love!

#39 sometimes99er OFFLINE  

sometimes99er

    River Patroller

  • 3,909 posts
  • Location:Denmark

Posted Mon Jan 25, 2016 7:59 AM

So in terms of speed versus programming effort it's clear: we should all be using C, since it is very close to assembly language speed and it is much easier to write the code in C than in Forth or assembly.


I believe there are other factors than speed and ease of programming language, with both being a matter of opinion, taste and where you are (as a programmer etc.) !?

Assembly is compact, close to the chips and you're totally in control. And you can stay with the TI instead of going cross-platform.

C is very nice. Can you compile on the TI ? Personally I quit on GCC just looking at the installation process and time involved.

Java looks very nice. Definitely cross-platform. Relatively easy to set up.

Now, I guess I may have picked up a bit of momentum with Assembly, so I'm not sure I wanna go with the overhead and runtime of C - or TurboForth for that matter. Have to say I can quite easily debug in Assembly. I think I understand Forth, but boy did it lock up on me many times. I'm just wondering why there are so maaany words, when v!, v@, !, @, and others already open up literally the whole world. But sure, one will hardly notice if the cartridge ROM is 8, 16, 32 or even 256KB.

And I guess you may even be more comfortable with Forth and Assembly than, say C and Basic ?

;)

Edited by sometimes99er, Mon Jan 25, 2016 8:00 AM.


#40 Willsy OFFLINE  

Willsy

    River Patroller

  • 3,009 posts
  • Location:Uzbekistan (no, really!)

Posted Mon Jan 25, 2016 8:08 AM

Personally I quit on GCC just looking at the installation process and time

Totally. I have never tried it for exactly that reason.

And I guess you may even be more comfortable with Forth and Assembly than, say C and Basic ?

Yes I think that's true. Assembly first then Forth. Assembly on the 9900 family is the nicest I've ever used. 68k coming a close second place.

Edited by Willsy, Mon Jan 25, 2016 8:09 AM.


#41 Tursi OFFLINE  

Tursi

    River Patroller

  • Topic Starter
  • 4,746 posts
  • HarmlessLion
  • Location:BUR

Posted Mon Jan 25, 2016 9:03 PM

Kind of disappointed frankly... Now if I were to include embedded assembly in the mix, it would have been a different matter :) Incidentally, the above code will only find the system libraries (support and sprite) if the compiler disk is in drive 1 (i.e. #4:). Not quite sure why... That said, the UCSD Pascal implementation for the TI is pretty powerful nonetheless, with advanced features such as concurrent processes, segment routines and units, and it's a shame it never became popular within the TI community. When coupled with a few strategic external assembly routines to speed up graphics, it could be quite capable.


Thanks! I'll have to see at some point if the DSK1 issue is the reason mine wouldn't build.

I am surprised by those scores! But I guess it is a fully multitasking operating system running in GPL. :/

#42 Tursi OFFLINE  

Tursi

    River Patroller

  • Topic Starter
  • 4,746 posts
  • HarmlessLion
  • Location:BUR

Posted Mon Jan 25, 2016 9:05 PM

If we had an XB program moving sprites with values that are not easily represented with integers, let's consider complex floating point exercises, - would the XB program then be considered "unfair" or would we quickly look at "purpose" and "end result", and then rewrite and "emulate". But then I think floating point is available to most of the languages.


Probably true! But I chose these criteria specifically, since the question was speed comparison of the languages, and it's a common choice. Absolutely true that if you want to do floating point math, you are making pain for yourself with a number of these choices and slower may be a better choice. ;) It all depends on what you want.

But no benchmark is inherently "fair" to all languages or all systems. :)

#43 Tursi OFFLINE  

Tursi

    River Patroller

  • Topic Starter
  • 4,746 posts
  • HarmlessLion
  • Location:BUR

Posted Mon Jan 25, 2016 9:09 PM

Assembler is surely still faster, but 100 loops is not enough to expose a measurable difference.


The GCC port to the 9900 does a really nice job. Since there were a lot of early bugs, I've spent a lot of time looking at the resulting assembly code, and it DOES often manage to create optimizations that I would not have thought of, or at least that I wouldn't have expected a compiler to get. It gets a little confused dealing with bytes instead of ints sometimes and produces less efficient code, but not by a lot.

I would expect your optimized version there to look very much like the one I wrote. That the unoptimized version BEATS my unoptimized version is almost certainly because of the overhead of VSBW (which does multiple BLWPs). I wrote libti99 to do things quickly and inline. ;)

#44 senior_falcon OFFLINE  

senior_falcon

    Dragonstomper

  • 911 posts
  • Location:Lansing, NY, USA

Posted Mon Jan 25, 2016 10:16 PM

Using POKEV to move the sprite instead of LOCATE, compiled XB runs in 37 seconds with the normal 32K.  (POKEV is not in the released version, it was just custom written for this test)



#45 sometimes99er OFFLINE  

sometimes99er

    River Patroller

  • 3,909 posts
  • Location:Denmark

Posted Mon Jan 25, 2016 11:57 PM

Probably true! But I chose these criteria specifically, since the question was speed comparison of the languages, and it's a common choice.


Was it really "only" a question of speed - or "how do you think GPL will compare with p-code". Could very well just be speed. ;)

I think it's only natural/common to do "benchmarking" on the speed factor only. It's much simpler to keep other factors (for benchmarking) and requirements *) for each language out of the equation. It may have become irrelevant (over time), but I can understand the popularity/success of TI XB relative to TI GCC, e.g. counting people who have tried one and/or the other (since launch), or counting programs using one or the other.

;)

*) Eg. does it need Memory Expansion. This used to be a show-stopper for many.

Edited by sometimes99er, Tue Jan 26, 2016 12:01 AM.


#46 Vorticon OFFLINE  

Vorticon

    River Patroller

  • 2,726 posts
  • Location:Eagan, MN, USA

Posted Tue Jan 26, 2016 12:42 AM

I think you are overthinking this. It really boils down to speed, notwithstanding the hardware requirements needed to run a specific language (p-code card, memory expansion etc...), as long as no programming "doping" is used with embedded assembly language.
I think this was a fun exercise and an eye opener, particularly regarding GCC and UCSD Pascal :)

#47 sometimes99er OFFLINE  

sometimes99er

    River Patroller

  • 3,909 posts
  • Location:Denmark

Posted Tue Jan 26, 2016 1:21 AM

I think you are overthinking this. It really boils down to speed, ...


Yes. And conclusions have been drawn. I can understand why. But are we all jumping on the GCC wagon ? ;)

Edited by sometimes99er, Tue Jan 26, 2016 2:23 AM.


#48 Tursi OFFLINE  

Tursi

    River Patroller

  • Topic Starter
  • 4,746 posts
  • HarmlessLion
  • Location:BUR

Posted Tue Jan 26, 2016 4:09 AM

Was it really "only" a question of speed - or "how do you think GPL will compare with p-code".


Okay, yes, fair... but speed is what I chose to measure. It's where I focus most of my own efforts and it's also one of the easiest things to measure. ;) Nobody says that there can only be one benchmark program and nobody says that these results are anything more than curiosities. By all means, if you want to see other tests -- create another program, start another thread, and let's start adding dimensions! :)
 

Yes. And conclusions have been drawn. I can understand why. But are we all jumping on the GCC wagon ?


I don't even understand this statement... we have a single table measuring one dimension and only one person said GCC looked like a way to go - the same person said he can't use GCC because of the entry barrier to getting it going in the first place. As for the bandwagon, I've been on the GCC bandwagon since the first release - I barely write assembly anymore. But I don't speak for anyone else. ;)

Elsewhere, I'm going to accept your optimized compiled speed, Senior Falcon, as long as you promise to "someday" release the POKEV extension. ;)
 
Language   First Pass     Optimized
GCC           15 sec         5 sec
Assembly      17 sec         5 sec
TurboForth    48 sec        29 sec
Compiled XB   51 sec        37 sec
FbForth       70 sec        26 sec
GPL           80 sec       none yet
ABASIC       490 sec       none yet
XB          2000 sec       none yet
UCSD Pascal 7300 sec       780 sec

Edited by Tursi, Tue Jan 26, 2016 4:10 AM.
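
One way to read the two columns together is as a speedup factor, first pass divided by optimized. Below is a small sketch in C, using only the rows that have both timings. The struct and function names are made up, and whole percentages are used to avoid implying more precision than hand timing gives.

```c
#include <stddef.h>

/* One table row with both timings, in whole seconds as posted. */
struct timing {
    const char *language;
    int first_pass;
    int optimized;
};

static const struct timing timings[] = {
    { "GCC",           15,   5 },
    { "Assembly",      17,   5 },
    { "TurboForth",    48,  29 },
    { "Compiled XB",   51,  37 },
    { "FbForth",       70,  26 },
    { "UCSD Pascal", 7300, 780 },
};

/* Speedup as a whole percentage, rounded down:
   300 means the optimized run is 3.00x faster. */
int speedup_percent(const struct timing *t)
{
    return t->first_pass * 100 / t->optimized;
}

/* Index of the row whose optimization gained the most. */
size_t biggest_gain(void)
{
    size_t best = 0;
    for (size_t i = 1; i < sizeof timings / sizeof timings[0]; i++) {
        if (speedup_percent(&timings[i]) > speedup_percent(&timings[best]))
            best = i;
    }
    return best;
}
```

By this measure UCSD Pascal gains the most (about 9.4x) and compiled XB the least (about 1.4x), while GCC and assembly land at 3x and 3.4x.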


#49 Tursi OFFLINE  

Tursi

    River Patroller

  • Topic Starter
  • 4,746 posts
  • HarmlessLion
  • Location:BUR

Posted Tue Jan 26, 2016 4:11 AM

I just realized, even though I've written it many times, that it's "Senior Falcon", and I have always read it to myself as "Señor Falcon"... ;)

#50 sometimes99er OFFLINE  

sometimes99er

    River Patroller

  • 3,909 posts
  • Location:Denmark

Posted Tue Jan 26, 2016 5:56 AM

I don't even understand this statement...


:|

Sorry about any barriers. The statement can easily be taken out of context and doesn't stand on its own. Hope my English isn't gibberish and/or my wording doesn't come across as too strong.

:)



