Benchmarking Languages

sometimes99er · January 24, 2016

Many languages for the TI, including TurboForth, GPL, GCC and Java, are indeed "compiled", so I don't see any problem with XB being "optimized" by simply compiling it.

So this is how I would put it.

Language   First Pass    Optimized
Assembly     17 sec         5 sec
TurboForth   48 sec        29 sec
XB         2000 sec        37 sec
FbForth      70 sec        58 sec
GPL          80 sec       none yet
ABASIC      490 sec       none yet

senior_falcon · January 24, 2016

Just to make sure we're comparing apples to apples, 51 seconds is probably the right number for compiled XB. That is the speed running with the normal 32K memory. The 37 seconds is the time when the 32K is on the 16 bit bus, which I don't think was used for the other times.

+Lee Stewart · January 24, 2016

So, that would make it now

Language   First Pass    Optimized
--------   ----------    ---------
Assembly     17 sec         5 sec
fbForth      70 sec        26 sec
TurboForth   48 sec        29 sec
XB         2000 sec        51 sec
GPL          80 sec       none yet
ABASIC      490 sec       none yet

...lee

+mizapf · January 24, 2016

My benchmarks (on MESS, can check with real iron later), hand-stopped:

TI-99/4A (8bit 32K)

Slow: 17.7 s

Optimized: 7.4 s

TI-99/4A (16bit 0ws 32K)

Slow: 10.1 s

Optimized: 4.9 s

Geneve (GPL mode):

Slow: 5.6 s

Optimized: 3.3 s

But that's a bit too short for a reliable timing by hand.

Tursi · January 24, 2016

No, I can't agree that compiling a language is the same as optimizing it -- even if the compiler did full XB, which it does not. (For instance, it's an Integer BASIC rather than Floating point).

The goal of the "optimized" column is "how fast can the same environment be made to run by a skilled developer", and it's no longer the same environment. Likewise I would reject stepping outside the language with assembly subroutines and the like.

Mizapf -- I don't know what language or code you are testing with those scores. But I definitely do not wish to start a new column for every hardware configuration, so no 16-bit RAM tests are needed. It's not a test of the language when the hardware changes. (I'll remove the XB Compiled second score too - I assumed that time was referring to a different runtime).

When I did my timing I rounded all fractional seconds down due to being hand timed.

Still sorting by the first column. The optimized timings aren't really setting a baseline, they're setting a mastery score.

Language   First Pass    Optimized
Assembly     17 sec         5 sec
TurboForth   48 sec        29 sec
Compiled XB  51 sec       none yet
FbForth      70 sec        26 sec
GPL          80 sec       none yet
ABASIC      490 sec       none yet
XB         2000 sec       none yet

Edited January 24, 2016 by Tursi

+mizapf · January 24, 2016

It was of course the assembly language version (both the first one (optimized) and your VSBW version (slow)). And I thought it could be interesting to have another number since the 16bit expansion was not that uncommon. OK, I had one.

Sure, it's intended to be a language comparison and not a hardware comparison, so if you think it's not quite on topic, you need not add those numbers. :-)

+Vorticon · January 25, 2016

UCSD Pascal benchmark data:

First pass: 7300 seconds

Program benchmrk;
uses support,sprite;
var
	x,y,cnt : integer;
begin
	page(output);
	set_screen(2);
	set_spr_size(1);
	set_spr_attribute(1,42,2,0,1,1,0,0);
	cnt := 100;
	while cnt > 0 do
		begin
			for x := 1 to 240 do
				set_spr_attribute(1,42,2,0,1,x,0,0);
			for y := 1 to 176 do
				set_spr_attribute(1,42,2,0,y,240,0,0);
			for x := 240 downto 1 do
				set_spr_attribute(1,42,2,0,176,x,0,0);
			for y := 176 downto 1 do
				set_spr_attribute(1,42,2,0,y,1,0,0);
			cnt := cnt - 1;
		end;
end.

Optimized: 780 seconds

Program benchmrk;
uses support,sprite;
var
	x,y,cnt : integer;
	spr : link;
begin
	page(output);
	set_screen(2);
	set_spr_size(1);
	set_spr_attribute(1,42,2,0,1,1,0,0);
	cnt := 100;
	new(spr);
	spr^.packet := [spr_x_pos,spr_y_pos];
	while cnt > 0 do
		begin
			for x := 1 to 240 do
				begin
					with spr^ do
						begin
							x_pos := x;
							y_pos := 1;
						end;
					set_sprite(1,spr);
				end;
			for y := 1 to 176 do
				begin
					with spr^ do
						begin
							x_pos := 240;
							y_pos := y;
						end;
					set_sprite(1,spr);
				end;
			for x := 240 downto 1 do
				begin
					with spr^ do
						begin	
							x_pos := x;
							y_pos := 176;
						end;
					set_sprite(1,spr);
				end;
			for y := 176 downto 1 do
				begin
					with spr^ do
						begin
							x_pos := 1;
							y_pos := y;
						end;
					set_sprite(1,spr);
				end;
			cnt := cnt - 1;
		end;
end.

Kind of disappointed, frankly... Now if I were to include embedded assembly in the mix, it would have been a different matter. Incidentally, the above code will only find the system libraries (support and sprite) if the compiler disk is in drive 1 (i.e. #4: ). Not quite sure why... That said, the UCSD Pascal TI implementation is pretty powerful nonetheless, with advanced features such as concurrent processes, segment routines and units, and it's a shame it never became popular within the TI community. When coupled with a few strategic external assembly routines to speed up graphics, it could be quite capable.

So latest update:

Language   First Pass     Optimized
Assembly      17 sec         5 sec
TurboForth    48 sec        29 sec
Compiled XB   51 sec       none yet
FbForth       70 sec        26 sec
GPL           80 sec       none yet
ABASIC       490 sec       none yet
XB          2000 sec       none yet
UCSD Pascal 7300 sec       780 sec

sometimes99er · January 25, 2016

This seems like a straightforward benchmark, but what does it actually mean to move a sprite around at a rate faster than 1/60s, resulting in visual frames being skipped?

He he. Yes, we look at the "purpose" and "end result" of the XB program, = (setup and) move sprite. And then "optimize" or "convert" (heading straight for integers) and "benchmark" (compare) only "speed".

Indeed, when you can't see the sprite move, the benchmark itself could come in question. We look at the program and "believe", or we may "slow down" the program to "witness" correct movement of sprite. If the count or sprite is off by one or two, it really doesn't matter.
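A rough way to put numbers on the skipped frames is to compare position updates with displayed frames. The sketch below assumes the loop bounds of the Pascal listing (240 + 176 + 240 + 176 updates per lap, 100 laps) and a 60 Hz display; the function names are made up for illustration, and the C listings in this thread use slightly tighter bounds.

```c
/* Position updates in one lap around the screen, using the loop bounds
   from the Pascal listing (1..240 across, 1..176 down, and back). */
long updates_per_lap(void)
{
    return 240L + 176L + 240L + 176L;
}

/* Average number of sprite position updates per displayed frame for a
   run of `laps` laps that took `seconds` seconds at `frames_per_sec`. */
double updates_per_frame(long laps, double seconds, double frames_per_sec)
{
    double total_updates = (double)(laps * updates_per_lap());
    double total_frames = seconds * frames_per_sec;
    return total_updates / total_frames;
}
```

Under those assumptions, `updates_per_frame(100, 5.0, 60.0)` comes out at roughly 277 updates per frame, so nearly every position is overwritten before it is ever displayed, while `updates_per_frame(100, 2000.0, 60.0)` is below 1, meaning the 2000-second XB run shows every position for at least one frame.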

If we had an XB program moving sprites with values that are not easily represented with integers, let's consider complex floating point exercises, - would the XB program then be considered "unfair" or would we quickly look at "purpose" and "end result", and then rewrite and "emulate". But then I think floating point is available to most of the languages.

Apart from speed, other elements can be used in benchmarking. Accuracy. Size of program. Footprint(s). Time(s) to boot, load, compile and get running, maybe comparing compiling on the TI with cross-platform compiling. Amount of support needed besides the stock console, e.g. cartridge and Memory Expansion. Learning curve, going from this or that. Experience needed: optimizing TurboForth with or without inline assembly may require "Expert".

What I'm saying is probably, XB is good, old, easy, sleazy, sexy and/or quick for certain stuff. And vice versa.

Edited January 25, 2016 by sometimes99er

TheMole · January 25, 2016

The following GCC program, using Tursi's libti99 (which is the de facto standard utility library for GCC programs), clocks in at 15 seconds (hand-timed). I'll do an optimized version that writes to VDP RAM directly later today.

#include "vdp.h"	// libti99 VDP helpers (set_graphics, sprite, VDP_SET_REGISTER)

int main(int argc, char *argv[])
{
	// Set registers for graphics mode, with 8x8 sprite magnification on
	int regval = set_graphics(VDP_SPR_8x8MAG);
	VDP_SET_REGISTER(VDP_REG_MODE1, regval);

	// Drop sprite 0, with pattern 42 and the color green on the screen at position 1,1
	sprite(0, 42, 2, 1, 1);

	// Loop 100 times
	int x, y, cnt = 100;
	while (cnt)
	{
		for (x = 1; x < 240; x++)
			sprite(0, 42, 2, 1, x);
		for (y = 1; y < 176; y++)
			sprite(0, 42, 2, y, 240);
		for (x = 240; x > 1; x--)
			sprite(0, 42, 2, 176, x);
		for (y = 176; y > 1; y--)
			sprite(0, 42, 2, y, 1);

		cnt--;
	}

	while(1) { /* idle loop */ }; return 0;
}

So, non hand-optimized C code beats non hand-optimized assembler code. Although I guess that's to be expected when using the -Os flag.

TheMole · January 25, 2016

And the following optimized version takes 5 seconds to run (again, timed by hand):

#include "vdp.h"	// libti99 VDP helpers (VDP_SET_ADDRESS_WRITE, VDPWD, sprite)

#define SPR0_X_VRAM_LOC		0x0301
#define SPR0_Y_VRAM_LOC		0x0300

int main(int argc, char *argv[])
{
	// Set registers for graphics mode
	int regval = set_graphics(VDP_SPR_8x8MAG);
	VDP_SET_REGISTER(VDP_REG_MODE1, regval);

	// Drop a sprite on the screen
	sprite(0, 42, 2, 1, 1);

	// Loop 100 times
	int x, y, cnt = 100;
	while (cnt)
	{
		for (x = 1; x < 240; x++)
		{
			VDP_SET_ADDRESS_WRITE(SPR0_X_VRAM_LOC);
			VDPWD = x;
		}
		for (y = 1; y < 176; y++)
		{
			VDP_SET_ADDRESS_WRITE(SPR0_Y_VRAM_LOC);
			VDPWD = y;
		}
		for (; x > 1; x--)
		{
			VDP_SET_ADDRESS_WRITE(SPR0_X_VRAM_LOC);
			VDPWD = x;
		}
		for (; y > 1; y--)
		{
			VDP_SET_ADDRESS_WRITE(SPR0_Y_VRAM_LOC);
			VDPWD = y;
		}

		cnt--;
	}

	while(1) { /* idle loop */ }; return 0;
}

Assembler is surely still faster, but 100 loops is not enough to expose a measurable difference.
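Why 100 laps can hide a real difference: the times in this thread are hand-timed (and in Tursi's case rounded down to whole seconds), so two runs only look different once their totals differ by about a second. A small sketch of that arithmetic, assuming a timing resolution of 1000 ms; the function name is made up for illustration and the per-lap differences used in the example are hypothetical, not measured.

```c
/* Smallest number of laps at which a per-lap time difference of
   `diff_per_lap_ms` milliseconds adds up to at least `resolution_ms`
   milliseconds in total. Both arguments must be positive. */
long laps_needed(long diff_per_lap_ms, long resolution_ms)
{
    /* Integer ceiling division. */
    return (resolution_ms + diff_per_lap_ms - 1) / diff_per_lap_ms;
}
```

For example, `laps_needed(10, 1000)` is 100, so at 100 laps anything under roughly 10 ms per lap disappears into the hand timing, and a hypothetical 2 ms per-lap gap would need `laps_needed(2, 1000)`, i.e. 500 laps, before it shows up as a full second.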

*edit* Since we're updating the table ourselves, here goes:

Language   First Pass     Optimized
GCC           15 sec         5 sec
Assembly      17 sec         5 sec
TurboForth    48 sec        29 sec
Compiled XB   51 sec       none yet
FbForth       70 sec        26 sec
GPL           80 sec       none yet
ABASIC       490 sec       none yet
XB          2000 sec       none yet
UCSD Pascal 7300 sec       780 sec

Edited January 25, 2016 by TheMole

sometimes99er · January 25, 2016

Excellent. :thumbsup:

Of all these, I believe only Assembly can be made to run faster ("optimized") on the F18A?

And how about MLC by moulinaie. And perhaps not forgetting Java by mikeakohn.

Willsy · January 25, 2016

So in terms of speed versus programming effort it's clear: we should all be using C, since it's very close to assembly language speed but much easier to write than Forth or assembly.

Hats off to the GCC compiler for the 9900. Seriously impressive.

TheMole · January 25, 2016

Of all these, I believe only Assembly can be made to run faster ("optimized") on the F18A?

You can target the F18A GPU with GCC as well. I've done this for my two F18A demos (Fire: http://atariage.com/forums/topic/207586-f18a-programming-info-and-resources/?p=3278174 and Street Fighter: http://atariage.com/forums/topic/207586-f18a-programming-info-and-resources/?p=3259078).

And how about MLC by moulinaie. And perhaps not forgetting Java by mikeakohn.

Would love to see that, the alternative languages deserve some love!

sometimes99er · January 25, 2016

So in terms of speed versus programming effort it's clear: we should all be using C, since it's very close to assembly language speed but much easier to write than Forth or assembly.

I believe there are other factors than speed and ease of programming language, with both being a matter of opinion, taste and where you are (as a programmer etc.) !?

Assembly is compact, close to the chips and you're totally in control. And you can stay with the TI instead of going cross-platform.

C is very nice. Can you compile on the TI ? Personally I quit on GCC just looking at the installation process and time involved.

Java looks very nice. Definitely cross-platform. Relatively easy to set up.

Now, I guess I may have picked up a bit of momentum with Assembly, so I'm not sure I wanna go with the overhead and runtime of C - or TurboForth for that matter. Have to say I can quite easily debug in Assembly. I think I understand Forth, but boy did it lock up on me many times. I'm just wondering why there are so maaany words, when v!, v@, !, @, and others already open up literally the whole world. But sure, one will hardly notice if the cartridge ROM is 8, 16, 32 or even 256KB.

And I guess you may even be more comfortable with Forth and Assembly than, say C and Basic ?

Edited January 25, 2016 by sometimes99er

Willsy · January 25, 2016

Personally I quit on GCC just looking at the installation process and time

Totally. I have never tried it for exactly that reason.

And I guess you may even be more comfortable with Forth and Assembly than, say C and Basic ?

Yes I think that's true. Assembly first then Forth. Assembly on the 9900 family is the nicest I've ever used. 68k coming a close second place.

Edited January 25, 2016 by Willsy

Tursi · January 26, 2016

Kind of disappointed, frankly... Now if I were to include embedded assembly in the mix, it would have been a different matter. Incidentally, the above code will only find the system libraries (support and sprite) if the compiler disk is in drive 1 (i.e. #4: ). Not quite sure why... That said, the UCSD Pascal TI implementation is pretty powerful nonetheless, with advanced features such as concurrent processes, segment routines and units, and it's a shame it never became popular within the TI community. When coupled with a few strategic external assembly routines to speed up graphics, it could be quite capable.

Thanks! I'll have to see at some point if the DSK1 issue is the reason mine wouldn't build.

I am surprised by those scores! But I guess it is a fully multitasking operating system running in GPL. :/

Tursi · January 26, 2016

If we had an XB program moving sprites with values that are not easily represented with integers, let's consider complex floating point exercises, - would the XB program then be considered "unfair" or would we quickly look at "purpose" and "end result", and then rewrite and "emulate". But then I think floating point is available to most of the languages.

Probably true! But I chose these criteria specifically, since the question was speed comparison of the languages, and it's a common choice. Absolutely true that if you want to do floating point math, you are making pain for yourself with a number of these choices and slower may be a better choice. It all depends on what you want.

But no benchmark is inherently "fair" to all languages or all systems.

Tursi · January 26, 2016

Assembler is surely still faster, but 100 loops is not enough to expose a measurable difference.

The GCC port to the 9900 does a really nice job. Since there were a lot of early bugs, I've spent a lot of time looking at the resulting assembly code, and it DOES often manage to create optimizations that I would not have thought of, or at least that I wouldn't have expected a compiler to get. It gets a little confused dealing with bytes instead of ints sometimes and produces less efficient code, but not by a lot.

I would expect your optimized version there to look very much like the one I wrote. That the unoptimized version BEATS my unoptimized version is almost certainly because of the overhead of VSBW (which does multiple BLWPs). I wrote libti99 to do things quickly and inline.
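For readers who have not poked the VDP directly, here is a host-side sketch of what an inline single-byte VRAM write boils down to on the TMS9918A: two bytes to the address port (low byte, then high byte with 0x40 set to mark a write), then one byte to the data port. This is a simplified mock that runs on a PC, not libti99's code and not the VSBW routine; the struct and function names are invented for illustration, and register writes and reads are left out.

```c
#include <stdint.h>

/* Host-side stand-in for the VDP: 16 KB of VRAM plus the address latch. */
struct mock_vdp {
    uint8_t  vram[0x4000];
    uint16_t address;     /* current VRAM address, auto-incremented */
    uint8_t  latch;       /* first byte of a two-byte address write */
    int      have_latch;  /* nonzero once the first byte has arrived */
};

/* One byte written to the address (control) port. */
void vdp_write_control(struct mock_vdp *vdp, uint8_t value)
{
    if (!vdp->have_latch) {
        vdp->latch = value;           /* low byte of the address */
        vdp->have_latch = 1;
    } else {
        /* High byte: 0x40 marks a write, the low 6 bits are address bits. */
        vdp->address = (uint16_t)(((value & 0x3F) << 8) | vdp->latch);
        vdp->have_latch = 0;
    }
}

/* One byte written to the data port: store it, then step the address. */
void vdp_write_data(struct mock_vdp *vdp, uint8_t value)
{
    vdp->vram[vdp->address] = value;
    vdp->address = (uint16_t)((vdp->address + 1) & 0x3FFF);
}

/* Set the write address, then write a single byte: three port accesses. */
void poke_vram(struct mock_vdp *vdp, uint16_t addr, uint8_t value)
{
    vdp_write_control(vdp, (uint8_t)(addr & 0xFF));
    vdp_write_control(vdp, (uint8_t)((addr >> 8) | 0x40));
    vdp_write_data(vdp, value);
}
```

With the addresses from TheMole's listing, `poke_vram(&vdp, 0x0301, x)` would update sprite 0's X position and `poke_vram(&vdp, 0x0300, y)` its Y position, which is all the optimized loops need per step.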

senior_falcon · January 26, 2016

Using POKEV to move the sprite instead of LOCATE, compiled XB runs in 37 seconds with the normal 32K. (POKEV is not in the released version, it was just custom written for this test)

sometimes99er · January 26, 2016

Probably true! But I chose these criteria specifically, since the question was speed comparison of the languages, and it's a common choice.

Was it really "only" a question of speed - or "how do you think GPL will compare with p-code". Could very well just be speed.

I think it's only natural/common to do "benchmarking" on the speed factor only. It's much simpler to keep other factors (for benchmarking) and requirements *) for each language out of the equation. It may have become irrelevant (over time) but I can understand the relative popularity/success of TI XB compared to TI GCC. E.g. counting people who have tried one and/or the other (since launch), counting programs using one or the other.

*) Eg. does it need Memory Expansion. This used to be a show-stopper for many.

Edited January 26, 2016 by sometimes99er

+Vorticon · January 26, 2016

I think you are overthinking this. It really boils down to speed, notwithstanding the hardware requirements needed to run a specific language (p-code card, memory expansion etc...), as long as no programming "doping" is used with embedded assembly language.

I think this was a fun exercise and an eye opener, particularly regarding GCC and UCSD Pascal.

sometimes99er · January 26, 2016

I think you are overthinking this. It really boils down to speed, ...

Yes. And conclusions have been drawn. I can understand why. But are we all jumping on the GCC wagon ?

Edited January 26, 2016 by sometimes99er

Tursi · January 26, 2016

Was it really "only" a question of speed - or "how do you think GPL will compare with p-code".

Okay, yes, fair... but speed is what I chose to measure. It's where I focus most of my own efforts and it's also one of the easiest things to measure. Nobody says that there can only be one benchmark program and nobody says that these results are anything more than curiosities. By all means, if you want to see other tests -- create another program, start another thread, and let's start adding dimensions!

Yes. And conclusions have been drawn. I can understand why. But are we all jumping on the GCC wagon ?

I don't even understand this statement... we have a single table measuring one dimension and only one person said GCC looked like a way to go - the same person said he can't use GCC because of the entry barrier to getting it going in the first place. As for the bandwagon, I've been on the GCC bandwagon since the first release - I barely write assembly anymore. But I don't speak for anyone else.

Elsewhere, I'm going to accept your optimized compiled speed, Senior Falcon, as long as you promise to "someday" release the POKEV extension.

Language   First Pass     Optimized
GCC           15 sec         5 sec
Assembly      17 sec         5 sec
TurboForth    48 sec        29 sec
Compiled XB   51 sec        37 sec
FbForth       70 sec        26 sec
GPL           80 sec       none yet
ABASIC       490 sec       none yet
XB          2000 sec       none yet
UCSD Pascal 7300 sec       780 sec
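Since the Optimized column is a mastery score rather than a baseline, one way to read the table is as a ratio per language. The sketch below copies the numbers from the table above; the struct layout, the function names, and the choice of 0 to stand for "none yet" are assumptions made for illustration.

```c
#include <stddef.h>

struct result {
    const char *language;
    int first_pass_sec;
    int optimized_sec;   /* 0 stands for "none yet" */
};

/* Figures copied from the table above. */
const struct result results[] = {
    {"GCC", 15, 5},          {"Assembly", 17, 5},
    {"TurboForth", 48, 29},  {"Compiled XB", 51, 37},
    {"FbForth", 70, 26},     {"GPL", 80, 0},
    {"ABASIC", 490, 0},      {"XB", 2000, 0},
    {"UCSD Pascal", 7300, 780},
};

/* First pass time divided by optimized time; 0.0 when there is no
   optimized entry yet. */
double speedup(const struct result *r)
{
    if (r->optimized_sec <= 0)
        return 0.0;
    return (double)r->first_pass_sec / (double)r->optimized_sec;
}

/* Row with the largest speedup, or NULL if no row has an optimized time. */
const struct result *largest_speedup(const struct result *rows, size_t count)
{
    const struct result *best = NULL;
    for (size_t i = 0; i < count; i++) {
        if (rows[i].optimized_sec > 0 &&
            (best == NULL || speedup(&rows[i]) > speedup(best)))
            best = &rows[i];
    }
    return best;
}
```

By this measure UCSD Pascal gained the most from optimization (7300 / 780, a bit over 9x), while GCC and Assembly both land between 3x and 3.5x even though they are far faster in absolute terms.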

Edited January 26, 2016 by Tursi

Tursi · January 26, 2016

I just realized, even though I've written it many times, that it's "Senior Falcon", and I have always read it to myself as "Señor Falcon"...

sometimes99er · January 26, 2016

I don't even understand this statement...

Sorry about any barriers. The statement can easily be taken out of context and doesn't stand on its own. Hope my English isn't gibberish and/or my wording doesn't come across as too strong.
