Jump to content
IGNORED

Speed Warz -- _____ -vs- _____


Omega-TI

Recommended Posts

Here we go! Room to talk about milliseconds!

 

Just as a quick background from where I'm coming from, my first computer was the TI, back around '82. I would have been 17 at the time. Taught myself BASIC from the manuals, graduated up to XB, but never got into other languages for the TI (such as assembly) since I couldn't afford to expand my system. Storage was cassette. I eventually got a PEB around 84, but soon went on to more "modern" machines like the C128.

 

I never got rid of my TI though. Now that I'm old and settled, I can afford to spend a bit and play when possible. I have a section of my basement set aside for vintage computer systems (all setup and working!). Everything from a C= Pet, Coleco Adam, C128D, Atari 130XE, CoCo III, and so on. But the gem of my system is still my TI, expanded a bit with Myarc floppy controller and dual DSDD drives and a Horizon Ramdisk (love this thing!). And of course the F18a. Anybody with real iron needs one of these.

 

After watching the first season of "Halt and Catch Fire", I got the bug again to play in my fun little world. I found an old program I wrote in XB, a simple clone of QBERT, and thought that an interesting project for myself would be to see what I could do to optimize it and update it. I recently posted the original version of the game on this forum.

 

One thing that struck me about the game was how slow it ran, despite really not doing a lot. I decided at that point to start up a set of benchmarks for myself that would test various ways to do the same thing. I found some surprising results and also verified other assumptions. I am incorporating those changes into my game and also continuing to benchmark new things as I think of them. When I feel they are in good enough shape, this is the thread that would be good for me to publish both the benchmarks and the updated game code.

 

We've hashed out the whole RND thing in other areas, but there are still new things to be found. That's why I love discussions. If I had not mentioned my use of PEEK -31880 then I would not have found out about -31879 from "lee". Even if it only shaves a bit of time off, when it comes to XB, every little bit helps.

 

Without going into gory details, here are a few findings I have from my benchmarks:

* REM statements take time. This is true of full REM lines as well as tail comments. If you have a tight loop, put any code comments outside of it.

* Throwing multiple statements on a single line does not have an appreciable effect on execution speed vs those statements being on seperate lines.

* Dedicated functions may not be the fastest way to check something. For example, "IF SGN(J)=1" is slower to execute than "IF J>0".

* 28 sprites on the screen, motionless, have no effect on the program. 28 sprites moving at full speed can cut your program speed by almost 50%!! There are a lot of tests I still need to do in this area, but the takeaway is to be careful with auto-motion with your sprites.

* Using DEF to define and use a function is very slow. You're better off creating a new SUB for that same function.

* There is no appreciable difference in using hard-coded numbers vs variables. CALL SPRITE(#1,.... is the same as CALL SPRITE (#TEST,.....

* Short vs long variable names has no speed impact. If you want readable code and program size is not a big deal, be descriptive in your variable names.

 

All that being said, I'm still refining the benchmarks. I'd be open for any additions to it.

  • Like 4
Link to comment
Share on other sites

* Short vs long variable names has no speed impact. If you want readable code and program size is not a big deal, be descriptive in your variable names.

 

 

 

How did you test this? I am curious, as I don't believe this holds true as the number of variables and size of program increases. Prescan also may come into play here.

  • Like 1
Link to comment
Share on other sites

How did you test this? I am curious, as I don't believe this holds true as the number of variables and size of program increases. Prescan also may come into play here.

Used the following code. Timing is, of course, somewhat flexible due to how fast my finger is on the stopwatch between "start" and "stop".

 

True, these are short programs. If there is a correlation between program size and runtime execution then that could be interesting to investigate.

 

Use sprite with number

100 CALL CLEAR

110 CALL SPRITE(#1,42,9,50,50)

120 CALL SAY("START")

130 FOR I=1 TO 1000

140 CALL POSITION(#1,X,Y)

150 NEXT I

160 CALL SAY("STOP")

170 PRINT X;" ";Y

 

33/1000 = 0.033

 

Use sprite short variable

100 CALL CLEAR

110 A=1

120 CALL SPRITE(#A,42,9,50,50)

130 CALL SAY("START")

140 FOR I=1 TO 1000

150 CALL POSITION(#A,X,Y)

160 NEXT I

170 CALL SAY("STOP")

180 PRINT X;" ";Y

 

33/1000 = 0.033

 

Use sprite long variable

100 CALL CLEAR

110 A23456789012345=1

120 CALL SPRITE(#A23456789012345,42,9,50,50)

130 CALL SAY("START")

140 FOR I=1 TO 1000

150 CALL POSITION(#A23456789012345,X,Y)

160 NEXT I

170 CALL SAY("STOP")

180 PRINT X;" ";Y

 

34/1000 = 0.034

  • Like 1
Link to comment
Share on other sites

Without going into gory details, here are a few findings I have from my benchmarks:
* REM statements take time. This is true of full REM lines as well as tail comments. If you have a tight loop, put any code comments outside of it.
* Throwing multiple statements on a single line does not have an appreciable effect on execution speed vs those statements being on seperate lines.

Have you tried testing RXB multiple subprogram lines like say CALL HCHAR(7,4,42,8,11,8,42,6,22,28,42,3,24,1,42,9) as you can not do this in XB or any other XB?
* Dedicated functions may not be the fastest way to check something. For example, "IF SGN(J)=1" is slower to execute than "IF J>0".

Any command that is subroutine will be slower as the name like SGN must be found and is faster than just a variable alone as in this statement.
* 28 sprites on the screen, motionless, have no effect on the program. 28 sprites moving at full speed can cut your program speed by almost 50%!! There are a lot of tests I still need to do in this area, but the takeaway is to be careful with auto-motion with your sprites.
* Using DEF to define and use a function is very slow. You're better off creating a new SUB for that same function.

This is not true as the DEF function in XB uses much more assembly to execute then a SUB does so a DEF will fetch values faster than SUB depending on how the DEF is written.
* There is no appreciable difference in using hard-coded numbers vs variables. CALL SPRITE(#1,.... is the same as CALL SPRITE (#TEST,.....

I find this hard to believe as I know the GPL code and unlike a hard coded number a routine has to find a Variable location and value. I think you test loop times are to short.
* Short vs long variable names has no speed impact. If you want readable code and program size is not a big deal, be descriptive in your variable names.

Again not true you test loops were much to short. When a variable name is longer it takes the GPL interpreter longer to fetch and use the names. But the loops would have to be insanely long to show this.

All that being said, I'm still refining the benchmarks. I'd be open for any additions to it.

This is a very cool project and to apply a science to this makes this project even cooler to produce.

Edited by RXB
Link to comment
Share on other sites

Without going into gory details, here are a few findings I have from my benchmarks:

* REM statements take time. This is true of full REM lines as well as tail comments. If you have a tight loop, put any code comments outside of it.

* Throwing multiple statements on a single line does not have an appreciable effect on execution speed vs those statements being on seperate lines.

Have you tried testing RXB multiple subprogram lines like say CALL HCHAR(7,4,42,8,11,8,42,6,22,28,42,3,24,1,42,9) as you can not do this in XB or any other XB?

* Dedicated functions may not be the fastest way to check something. For example, "IF SGN(J)=1" is slower to execute than "IF J>0".

Any command that is subroutine will be slower as the name like SGN must be found and is faster than just a variable alone as in this statement.

* 28 sprites on the screen, motionless, have no effect on the program. 28 sprites moving at full speed can cut your program speed by almost 50%!! There are a lot of tests I still need to do in this area, but the takeaway is to be careful with auto-motion with your sprites.

* Using DEF to define and use a function is very slow. You're better off creating a new SUB for that same function.

This is not true as the DEF function in XB uses much more assembly to execute then a SUB does so a DEF will fetch values faster than SUB depending on how the DEF is written.

* There is no appreciable difference in using hard-coded numbers vs variables. CALL SPRITE(#1,.... is the same as CALL SPRITE (#TEST,.....

I find this hard to believe as I know the GPL code and unlike a hard coded number a routine has to find a Variable location and value. I think you test loop times are to short.

* Short vs long variable names has no speed impact. If you want readable code and program size is not a big deal, be descriptive in your variable names.

Again not true you test loops were much to short. When a variable name is longer it takes the GPL interpreter longer to fetch and use the names. But the loops would have to be insanely long to show this.

 

All that being said, I'm still refining the benchmarks. I'd be open for any additions to it.

This is a very cool project and to apply a science to this makes this project even cooler to produce.

 

For the RXB portion, unfortunately I don't have the required hardware to check this out. I play with "real iron" and have not had the time to delve into the land of emulation much. When I get my benchmarks in decent enough shape to share, then anybody with RXB running will be welcome to add their results to it.

 

For the DEF, this is what I have found. Note that I used a bit of a different approach to this, using 3 loops within each test. I went under the assumption that if you are creating a DEF or SUB, the reason was because you wanted to prevent repeating code within your program.

 

Use repeating formula

100 CALL CLEAR

110 CALL SAY("START")

120 FOR I=1 TO 1000

130 X=40*10+1.5*I

140 NEXT I

150 FOR J=1 TO 1000

160 Y=40*10+1.5*J

170 NEXT J

180 FOR K=1 TO 1000

190 Z=40*10+1.5*J

200 NEXT K

210 CALL SAY("STOP")

 

Base loop time = 53/3000 = 0.0177

 

Use predefined function in place of repeating same formula

100 DEF PAY(OT)=40*10+1.5*OT

110 CALL CLEAR

120 CALL SAY("START")

130 FOR I=1 TO 1000

140 X=PAY(I)

150 NEXT I

160 FOR J=1 TO 1000

170 Y=PAY(J)

180 NEXT J

190 FOR K=1 TO 1000

200 Z=PAY(K)

210 NEXT K

220 CALL SAY("STOP")

 

Base loop time = 261/3000 = 0.087

 

Use subprogram for repeating formula

100 CALL CLEAR

110 CALL SAY("START")

120 FOR I=1 TO 1000

130 CALL B3C(J,X)

140 NEXT I

150 FOR J=1 TO 1000

160 CALL B3C(K,Y)

170 NEXT J

180 FOR K=1 TO 1000

190 CALL B3C(L,Z)

200 NEXT K

210 CALL SAY("STOP")

220 SUB B3C(P,Q)

230 Q=40*10+1.5*P

240 SUBEND

 

Base loop time = 78/3000 = 0.026

 

For the other observations, when you say the loops are too short, do you have a length in mind? I had picked 1000 iterations for a few reasons, such as easy math and also not having to wait an insane amount of time for some of the longer tests to run. I also figure that 1000 iterations of something would be enough to show a difference that really matters. For example, let's say there is a difference in execution speed for in a 1 character variable name vs a 15 character variable name. If that difference isn't large enough to make itself felt over 1000 calls, shouldn't that indicate that the difference is not appreciable or relevant? But again, I'm fully open to suggestions.

 

As for the hard coded vs variable thing, my code is posted in msg #6 above. Feel free to run them in XB and double-check my timings.

Link to comment
Share on other sites

A comparison of compiled XB and regular XB:

10 FOR J=1 TO 1000

20 X=RND*100

30 NEXT J

In TI XB this took 71.49 seconds for 71.49 mSec per loop

In compiled XB it took 81.02 seconds for .81 mSec per loop

(after nesting another loop to get to 100000)

So the speed increase was 88x over TI XB.

This must be right around the speed that Forth runs at.

(Note that the compiler uses integer arithmetic.)

Link to comment
Share on other sites

Here's what I found in my speed tests of the new RND function:

I ran these programs for 1 minute in RXB 2012 with the fast RND; RXB 2012; standard XB; TI BASIC:

10 X=RND

20 N=N+1

30 GOTO 10

then

20 N=N+1

30 GOTO 20

 

RXB with the fast RND:

3207 loops = 18.71 mSec/loop; 8915 loops = 6.73 mSec/loop; this gives 11.98 mSec/RND

 

RXB 2012:

855 loops = 70.17 mSec/loop; 8915 loops = 6.73 mSec/loop; this gives 63.44 mSec/RND

 

TI XB was the same as RXB 2012

 

TI BASIC: (just for what it's worth)

3607 loops = 16.63 mSec/loop; 11123 loops = 5.39 mSec/loop; this gives 11.24 mSec/RND

 

So the fast RND generates a random number 5.3x faster than standard XB or RXB 2012

 

How does this translate into the real world of programming? I ran the kaleidoscope program in post #451 of RXB..., after modifying line 540 to end the program rather than loop endlessly.

RXB 2012 took 53.93 seconds to run

RXB with fast RND took 33.21 seconds to run

So this particular program ran 1.62 times faster with the new RND and this is quite noticeable. Your mileage may vary: Kaleidoscope makes pretty heavy use of the RND generator, and most programs would probably see less of a speed increase. Even so, I bet it would be noticeable for most games. This is definitely a worthwhile addition to RXB.

 

 

Translation to fbForth 2.0 takes 1.5 seconds or less—my finger wasn't fast enough.

 

...lee

Link to comment
Share on other sites

A comparison of compiled XB and regular XB:

10 FOR J=1 TO 1000

20 X=RND*100

30 NEXT J

In TI XB this took 71.49 seconds for 71.49 mSec per loop

In compiled XB it took 81.02 seconds for .81 mSec per loop

(after nesting another loop to get to 100000)

So the speed increase was 88x over TI XB.

This must be right around the speed that Forth runs at.

(Note that the compiler uses integer arithmetic.)

 

 

fbForth 2.0 code for this in a 10000 loop takes ~0.33 ms per loop:

: XXX 10000 0 DO 100 RND DROP LOOP ;

The same thing for floating point RND ( FRND ) takes ~1.27 ms per loop:

: YYY 10000 0 DO FRND FDROP LOOP ;

The DROP and FDROP were used to keep the stack from overflowing and probably not quite a fair comparison. I'll throw in variables later to even the playing field.

 

...lee

Link to comment
Share on other sites

We are of course completely ignoring the quality of the random number generator in these comparisons, but I did a rough comparison with a full GCC implementation as well (meaning I used a C-only random number generator instead of using an assembler library or implementation, that might speed it up even further). Also, I did not inline the function (which would take out any time spent on the function call itself) since the other language tests have included that as well. Simply adding "inline" to the function definition would speed this up even more, at the cost of using more memory each place the function is used in real code.

// 16-bit random number generator.
// Based on George Marsaglia's xorshift algorithm
unsigned int rnd_xorshift()
{
  static unsigned int 	x = 1, y = 1;
  unsigned int			t = ( x^( x<<5 ) ); 

  x = y;

  return y = ( y^( y>>1) )^( t^( t>>3 ) );
}

// Simplified version of main function, skipping all screen setup stuff 
void main_loop()
{
  putstring("  Working... ");

  for (loop = 0; loop < 65000; loop++)
  {
    x = rnd_xorshift();
  }

  putstring("  Done!\n");
}

Using the stopwatch, 65000 loops are done in 6.064 seconds (I averaged 10 runs to try to minimize the error margin of doing this manually), which means around 0.09ms per loop, or three times faster than fbForth and ten times faster than compiled Basic.

Apparently the quality of the algorithm itself is pretty good as well, but of course, it's an integer algorithm and it only does two bytes here.

 

Having said that, I am sure that versus the Basics, C wins a lot on the for loop itself already. So all-in-all not a fair test of the speed of the RND algorithm as such, but a good indication of how performance would differ in real world applications.

Link to comment
Share on other sites

 

 

fbForth 2.0 code for this in a 10000 loop takes ~0.33 ms per loop:

: XXX 10000 0 DO 100 RND DROP LOOP ;

It's about the same in TF. It's very fast and fractions of a second are relevant in this test. It's somewhere between 3 and 4 seconds. With only my wrist watch and looking at the screen and then back at my watch that's as much as I can say!

Link to comment
Share on other sites

Hmm, I wonder what the distribution of each is? Is there a short routine that could made to track the results of each?

In my random number benchmarks I do a simple distribution test. I create random numbers from 1 to 10 and keep a count in an array of 10, incrementing each relevant cell as required. I also keep a running total. After the benchmark I print the total/iterations for an average and then print out the array values.

 

I found that TI Console BASIC, XB, and using PEEK -31880 all produced a fairly even distribution and an average hovering around 5 very consistently.

  • Like 1
Link to comment
Share on other sites

In my random number benchmarks I do a simple distribution test. I create random numbers from 1 to 10 and keep a count in an array of 10, incrementing each relevant cell as required. I also keep a running total. After the benchmark I print the total/iterations for an average and then print out the array values.

 

I found that TI Console BASIC, XB, and using PEEK -31880 all produced a fairly even distribution and an average hovering around 5 very consistently.

An even distribution is what you should get, given enough iterations of random numbers. The distribution should look like a plateau. (I did a lot of experimenting with random numbers in MATLAB when I was in school :).)

Link to comment
Share on other sites

Here's a distribution plot of 20000 times through the loop in fbForth 2.0 for numbers from 0 – 99, i.e., 100 RND . Each dotrow is plotted, with 0 at the top. The dot-length of each row represents the frequency of that number:

 

post-29677-0-58328600-1411066412_thumb.png

 

For such a cryptic pseudo-random number generator, the distribution is fairly even. As already pointed out, a perfect distribution should be dead even.

 

...lee

  • Like 1
Link to comment
Share on other sites

 

Is a perfect random distribution is a non-repeating set?

 

Sounds about right. The problem with computers is that this ideal is perhaps approachable, but unattainable in practice—hence, the term "pseudo-random number generator" (PRNG). In fact, for testing purposes, you want computer-generated PRN sequences to be absolutely predictable when started with the same seed. The periodic use of RANDOMIZE to insert a new seed helps to approach true randomness. The program, the results of which are in my last post, starts with whatever number happens to be at 83C0h when fbForth was started.

 

Though a truly random sequence should be virtually unrepeatable, the cumulative set of numbers should be the same, i.e., each number should occur with the same frequency.

 

...lee

  • Like 1
Link to comment
Share on other sites

I set up the program I used in fbForth 2.0 to run the RND loop with or without constant updates to the screen. It will obviously run faster if you tell the program to wait until the end to plot the distribution. The program is called RNDDST and expects 2 numbers on the stack, the number of iterations and a flag (1 = inline plot, 0 = plot at the end). The program is on block 12 of my blocks file, BRESEN. To run the program, that block must first be loaded and then "iter flag RNDDST " executed. The stack signature is

RNDDST   ( iter flag --- )

The program is

BASE->R DECIMAL 0 VARIABLE DST 100 ALLOT  0 CONSTANT NDPLT      
: DSTCLR DST 102 ERASE ;                                        
: DSTLN   0 ROT ROT OVER LINE ;                                 
: DST+  DUP DST + DUP C@ 254 MIN 1+ DUP ROT C!                  
    NDPLT IF DSTLN ELSE DROP DROP THEN ;                        
: DSTPLT  100 0 DO 0 I I DST + C@ I LINE LOOP ;                 
: RNDDST  ( iter flag --- )                                     
    SPLIT DSTCLR ' NDPLT !                                      
     0 DO 100 RND DST+ LOOP NDPLT 0= IF DSTPLT THEN ;           
     
R->BASE

Here's a video of the program running with inline plotting of the distribution:

 

RNDDST4000.wmv

 

...lee

Link to comment
Share on other sites

Mathematically speaking, any pseudo-random generator (without external randomness source) must eventually produce a repeating set, as the relevant state of the computing system only has a limited number of bits.

 

In Unix-like systems like Linux, there are two random generators, /dev/random and /dev/urandom. As the man page states,

 

The random number generator gathers environmental noise from device drivers and other sources into an entropy pool. [...] From this entropy pool random numbers are created.

 

Evaluating the quality of a PRNG is tricky. One thing that comes to my mind is to plot a bitfield, i.e. not the bars as shown above but putting pixels on the bitmap screen. The human perception is quite good when trying to find patterns.

 

On the other hand, I think a sufficiently well distributed value range is good enough for our purposes; we're not doing cryptography. :)

  • Like 1
Link to comment
Share on other sites

Join the conversation

You can post now and register later. If you have an account, sign in now to post with your account.
Note: Your post will require moderator approval before it will be visible.

Guest
Reply to this topic...

×   Pasted as rich text.   Paste as plain text instead

  Only 75 emoji are allowed.

×   Your link has been automatically embedded.   Display as a link instead

×   Your previous content has been restored.   Clear editor

×   You cannot paste images directly. Upload or insert images from URL.

Loading...
  • Recently Browsing   0 members

    • No registered users viewing this page.
×
×
  • Create New...