

Benchmarking Languages


159 replies to this topic

#101 apersson850

    Moonsweeper

  • 418 posts

Posted Sat Jul 1, 2017 10:15 AM

I guess you didn't do all 100 loops. Did you multiply by the wrong value, maybe? I did five passes and multiplied by 20.

 

Anyway, only a few of the programs actually maintain any kind of data structure for sprites. The rest just write to VDP RAM at certain locations, assuming the user knows which VDP RAM location to access. I'll make an optimized Pascal version that also writes to VDP RAM directly, but still in Pascal, just to see what happens. I'm not sure whether the routines in unit sprite will access the sprite data by themselves when there's no countdown data in the sprite record, so I don't know if it works.

 

I've never used sprites for anything in Pascal, so I have no real reference for how fast or slow that handling is. But generally, Pascal runs a couple of times faster than Extended BASIC.

 

Oh, by the way, JamesD wrote earlier that the PME has no registers, but works with a stack, and that the instructions are 16-bit. The p-machine does have some registers, which of course are emulated by the PME, and the instructions are 8-bit. Some of them do carry one or more in-line bytes of immediate data, though, so even if the opcodes are basically 8 bits, some instructions are extended to several bytes. Most data is referenced either by being on the stack or via the stack. In this particular implementation of the PME, variables in the current environment record are referenced through R9 in the TMS 9900, and global data through R14.
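To make the "8-bit opcode plus optional in-line bytes" encoding concrete, here is a minimal Python sketch that walks a p-code byte stream and splits it into instructions. The opcode values and operand counts in `INLINE_BYTES` are hypothetical placeholders, not the real UCSD IV.x numbering, and any variable-length operand encodings of the real p-code are ignored.

```python
# Hypothetical opcode table: opcode byte -> (mnemonic, number of in-line operand bytes).
# The numbers are placeholders for illustration, NOT the real UCSD p-code assignments.
INLINE_BYTES = {
    0x01: ("ADI", 0),  # add integers: both operands come from the evaluation stack
    0x02: ("LOR", 0),  # logical or: both operands come from the evaluation stack
    0x03: ("LDO", 1),  # load global word: one in-line offset byte
    0x04: ("LOD", 2),  # load intermediate word: in-line level and offset bytes
}


def decode(code: bytes) -> list:
    """Split a byte stream into (mnemonic, in-line operands) pairs."""
    out = []
    ipc = 0  # interpreter program counter, a byte address into the code
    while ipc < len(code):
        name, n = INLINE_BYTES[code[ipc]]
        operands = tuple(code[ipc + 1 : ipc + 1 + n])
        if len(operands) != n:
            raise ValueError(f"truncated {name} at byte {ipc}")
        out.append((name, operands))
        ipc += 1 + n  # one opcode byte plus its in-line bytes
    return out
```

For example, `decode(bytes([0x03, 5, 0x03, 6, 0x01]))` yields three instructions from five bytes: two global loads with one in-line byte each, then an add that takes everything from the stack.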


Edited by apersson850, Sun Jul 2, 2017 6:41 AM.


#102 JamesD

    Quadrunner

  • 7,593 posts
  • Location:Flyover State

Posted Sat Jul 1, 2017 7:02 PM

...

Oh, by the way, James D wrote earlier that the PME has no registers, but works with a stack, and that the instructions are 16-bit. The p-machine has some registers, which of course are emulated by the PME, and the instructions are 8-bit. Some of them do have a second byte, though, so even if the instructions basically are 8 bits, some are extended to 16 bits. Most of the data is referenced either by being on the stack, or via the stack. Variables in the current environment record are referenced through R9 in the TMS 9900 in this particular implementation of the PME.

The part about 16 bits... it's the Pascal numeric types that are normally 16 bits, not the opcodes themselves.

Sorry about that. But it does mean a lot of opcodes deal with 16-bit values.
 

 

As for registers... from the P Code Machine Wiki.

 

 

Like many other p-code machines, the UCSD p-Machine is a stack machine, which means that most instructions take their operands from the stack, and place results back on the stack. Thus, the "add" instruction replaces the two topmost elements of the stack with their sum. A few instructions take an immediate argument. Like Pascal, the p-code is strongly typed, supporting boolean (b), character (c), integer (i), real (r), set (s), and pointer (a) types natively.

If by registers you mean the following (from the 'ARCHITECTURE OF THE P-MACHINE' section of the docs), then yeah, it has registers.
But those aren't user registers.

 HARDWARE EMULATION: REGISTERS

 
The P-machine uses 16-bit words, with two 8-bit bytes per word.  
It has an evaluation stack, several registers, and a user memory 
containing a program stack and a heap.  All registers are pointers to 
word-aligned structures, except IPC, which is a pointer to byte-aligned 
instructions.  The registers, sometimes referred to as "pseudo-variables",
are:
 
SP: evaluation Stack Pointer.  A pointer to the current "top" of 
    the evaluation stack (one byte beyond the last byte in use).  In the
    Apple, the evaluation stack uses a portion of the 6502's hardware
    stack, starting in hex memory location 1FF and growing down toward
    hex location 100.  It is used to pass parameters, return function
    values, and as an operand source for many instructions.  The
    evaluation stack is extended by loads, and is cut back by stores
    and arithmetic operations.
 
IPC: Interpreter Program Counter.  Contains the address of the next
     instruction to be executed, in the code segment of the currently
     executing procedure.
 
SEG: SEGment pointer points to the procedure dictionary of the 
     segment to which the currently executing procedure belongs.
     (See this manual's appendix OPERATION OF THE P-MACHINE for
     illustrations.)
 
JTAB: Jump TABle pointer.  A pointer to the table of attributes and
      jump table entries in the procedure code section of the currently
      executing procedure.  (See this manual's appendix OPERATION OF THE
      P-MACHINE for illustrations.)
 
KP: program stacK Pointer.  A pointer to the current top of the
    program stack.  The program stack starts in high user memory and
    grows downward toward the heap.  (See this manual's appendix
    OPERATION OF THE P-MACHINE for illustrations.)
 
MP: Markstack Pointer.  A pointer to the low byte of MSSTAT, in the
    topmost Markstack on the program stack, in the activation record 
    of the currently executing procedure.  Variables local to the current
    procedure are accessed by indexing off MP.
 
NP: New Pointer.  A pointer to the current top of the dynamic heap
    (one byte beyond the last byte in use). The heap starts in low 
    user memory and grows upward toward the program stack.  It contains all
    dynamic variables (see Jensen and Wirth, Chapter 10).  It is
    extended by the standard procedure 'new', and is cut back by the
    standard procedure 'release'.
 
BASE: BASE Procedure.  A pointer to the activation record of the 
      most recently invoked base procedure (lex level 0).  Global (lex
      level 0) variables are accessed by indexing off BASE.
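The NP and KP descriptions above amount to two regions growing toward each other in user memory. A minimal Python sketch of that layout, assuming 16-bit words at even byte addresses; `PMachineMemory` and its methods are illustrative names, with `new`, `mark` and `release` only modelled on the Pascal standard procedures of the same names.

```python
from dataclasses import dataclass


@dataclass
class PMachineMemory:
    """Toy model of the user-memory layout: the heap grows upward from low
    memory (NP) and the program stack grows downward from high memory (KP).
    Addresses are byte addresses; sizes are given in 16-bit words."""
    low: int
    high: int

    def __post_init__(self) -> None:
        self.np = self.low   # New Pointer: one byte beyond the last heap byte in use
        self.kp = self.high  # program stacK Pointer: current top of the program stack

    def _check(self, np: int, kp: int) -> None:
        if np > kp:
            raise MemoryError("heap and program stack collided")

    def new(self, words: int) -> int:
        """Like 'new': extend the heap and return the address of the new block."""
        addr = self.np
        self._check(self.np + 2 * words, self.kp)
        self.np += 2 * words
        return addr

    def mark(self) -> int:
        """Remember the current heap top so it can be released later."""
        return self.np

    def release(self, mark: int) -> None:
        """Like 'release': cut the heap back to a previously marked NP."""
        self.np = mark

    def push_frame(self, words: int) -> int:
        """Reserve an activation record on the program stack (grows downward)."""
        self._check(self.np, self.kp - 2 * words)
        self.kp -= 2 * words
        return self.kp
```

The single `_check` is the point of the design: neither region has a fixed size, so a program that allocates a lot of heap simply leaves less room for procedure calls, and vice versa.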
 

 

This is a perfect example of why the P-Machine isn't very efficient on 8-bit machines.
On a more advanced 16-bit CPU, this might only need 3 opcodes.
But this is what a 6502 has to do to execute a LOR (logical or) opcode for the p-machine.
This doesn't even show the code that decodes the opcode, calls the opcode routine, and the code run after the exit.
The 6502 is the worst of the 8-bit CPUs for this, but more machines had 6502s than anything else.

If opcodes worked on registers, all the slow indexed instructions and stack manipulation would go away, and it would probably be at least 30% faster.
On the 9900, putting data directly into registers would surely be faster than using the stack.
LOR
	TSX
	LDA	P1BASE+3,X
	ORA	P1BASE+1,X
	STA	P1BASE+3,X
	LDA	P1BASE+4,X
	ORA	P1BASE+2,X
	STA	P1BASE+4,X
	INX
	INX
	TXS
	JMP	UPDBY1
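The LOR routine above boils down to "one 16-bit OR becomes two 8-bit ORs plus stack-pointer bookkeeping". A small Python model of that, assuming my reading of the listing: words are stored low byte first, and `sp` plays the role of P1BASE+1,X (the low byte of the top-of-stack word), with the second word directly above it.

```python
def word_at(stack: bytearray, sp: int) -> int:
    """Read a 16-bit word stored low byte first."""
    return stack[sp] | (stack[sp + 1] << 8)


def lor_bytewise(stack: bytearray, sp: int) -> int:
    """Mimic the 6502 LOR routine on a byte-addressed evaluation stack.
    'sp' indexes the low byte of the top word; the second word sits at sp+2.
    Returns the new sp after the top word has been dropped."""
    # Low bytes: LDA P1BASE+3,X / ORA P1BASE+1,X / STA P1BASE+3,X
    stack[sp + 2] |= stack[sp]
    # High bytes: LDA P1BASE+4,X / ORA P1BASE+2,X / STA P1BASE+4,X
    stack[sp + 3] |= stack[sp + 1]
    # Drop the top word: INX / INX / TXS
    return sp + 2
```

On a 16-bit CPU the whole function collapses to `a | b` on two words, which is what the TMS 9900 can do in one SOC instruction.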


Edited by JamesD, Sat Jul 1, 2017 7:06 PM.


#103 apersson850

    Moonsweeper

  • 418 posts

Posted Sun Jul 2, 2017 2:39 AM

The part about the registers you quote is from an earlier version of the p-system, not IV.x, so the registers aren't the same. But it's their equivalents I'm referring to, yes.

The interpreter in the 99/4A is further complicated by the fact that it can run code from CPU RAM, VDP RAM or GROM as well. If you like, I can post the inner part of the interpreter here.



#104 apersson850

    Moonsweeper

  • 418 posts

Posted Sun Jul 2, 2017 6:10 AM

Well, never mind waiting for anybody asking for anything...

Here's the PME (P-Machine Emulator) central parts. Note that the address of PMEFETCH is 8300H when the machine is running, so this is in 16-bit RAM. But there are two different fetch routines, one for code in CPU RAM and the other for code in VDP RAM or GROM. They are both loaded at the same place. The CPU RAM version needs some NOP instructions to occupy the same addresses, since there are six external entry points into the interpreter.

PMEFETCH gets the opcode of the instruction, which is always 8 bits. It then looks into a table that gives the address of the code interpreting that instruction. That code begins with an address at which to start running the interpretation, since it may start directly after the entry, or branch to one of five locations in the interpreter where one, two or three parameters that are in-line in the code (not on the stack) are fetched into R3, R4 and R5 before the interpretation of the actual instruction continues.
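The fetch, table lookup and shared parameter prefetch described above can be sketched as a table-driven dispatch loop. This is a Python illustration only: the opcode numbers and handler names are hypothetical, and the prefetched `params` stand in for what R3, R4 and R5 hold in the TMS 9900 PME.

```python
from typing import Callable


def op_add(m: dict, *params: int) -> None:
    """Like ADI: pop two words, push their 16-bit sum."""
    b, a = m["stack"].pop(), m["stack"].pop()
    m["stack"].append((a + b) & 0xFFFF)


def op_push_const(m: dict, lo: int, hi: int) -> None:
    """Hypothetical 'push constant' whose word is stored in-line, low byte first."""
    m["stack"].append(lo | (hi << 8))


# opcode -> (number of in-line parameters to prefetch, handler); placeholders only.
TABLE = {
    0x10: (0, op_add),
    0x20: (2, op_push_const),
}


def run(code: bytes) -> list:
    m = {"stack": [], "ipc": 0}
    while m["ipc"] < len(code):
        opcode = code[m["ipc"]]           # fetch: always a single byte
        m["ipc"] += 1
        nparams, handler = TABLE[opcode]  # table lookup gives the interpretation
        assert nparams <= 3               # the PME has three slots: R3, R4, R5
        # Shared prefetch step, done once here instead of inside every handler.
        params = code[m["ipc"] : m["ipc"] + nparams]
        m["ipc"] += nparams
        handler(m, *params)
    return m["stack"]
```

Centralising the operand fetch is what lets each instruction's own routine stay short: it finds its in-line operands already loaded.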

 

I've included a few instructions in full detail.

LDO loads a word from the program's global data area. R14 points to that area.

LOD loads data from an enclosing procedure's local data area (environment record). There are two short forms, used to load data from one or two lexical levels up, and one general form that can load from any lexical level above the currently running procedure. R9 points to the current environment record. It's up to the programmer to think about this: using variables more than two levels up takes more time. Local variables work similarly. There are faster instructions for loading the first variables than for those further down among the declarations, so declare short, frequently used variables first. If you declare an array first, you may put all the other variables out of reach of the short local load instructions in one fell swoop.
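The "declare short variables first" advice follows from offsets being assigned in declaration order. A Python sketch, assuming offsets start at word 1 and that short-form loads reach offsets 1 through 16; `SHORT_LIMIT`, `layout` and `short_reachable` are hypothetical names, and the limit of 16 is an assumption, not a figure from this thread.

```python
SHORT_LIMIT = 16  # assumption: short-form loads reach word offsets 1..16


def layout(decls: list) -> dict:
    """Assign word offsets in declaration order, first variable at offset 1."""
    offsets, next_off = {}, 1
    for name, words in decls:
        offsets[name] = next_off
        next_off += words
    return offsets


def short_reachable(decls: list) -> list:
    """Names whose offset is still within reach of the short load forms."""
    return [name for name, off in layout(decls).items() if off <= SHORT_LIMIT]
```

With `[("buf", 100), ("i", 1), ("j", 1)]` the array pushes `i` and `j` to offsets 101 and 102, so only `buf` stays in short range. Declaring `i` and `j` first keeps all three within it.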

ADI and LOR perform addition and logical or (so you can compare with the 6502 code above). It takes seven instructions to execute these codes if they are in CPU RAM. Normally, p-code runs from VDP RAM, where it takes eight instructions to accomplish something the TMS 9900 could have done with one, if the p-code had been converted to native code. But LOR is one byte long, while SOC *SP+,*SP is two. So for this particular instruction, native code gives a speed gain of roughly eight times, at the cost of twice the memory.

 

There are p-code instructions for more complex things too, like calling global or local procedures. They are more like small program segments, and may invoke quite a lot of code, if a segment fault is issued on a call (the called procedure isn't currently in memory, but must be retrieved from disk).

The semaphore operations SIGNAL and WAIT also have their own p-codes, as they must not be interrupted.
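Why giving SIGNAL and WAIT their own p-codes helps: if the interpreter only switches tasks between p-code instructions (an assumption of this sketch), then an operation implemented as a single instruction is atomic without any extra locking. A minimal Python model of such a counting semaphore; the class and its method names are illustrative, not the p-system's implementation.

```python
from collections import deque
from typing import Optional


class Semaphore:
    """Toy counting semaphore with SIGNAL/WAIT semantics. Each method stands
    for one uninterruptible p-code instruction in this model."""

    def __init__(self, count: int = 0) -> None:
        self.count = count
        self.waiting = deque()  # tasks blocked on this semaphore, oldest first

    def wait(self, task: str) -> bool:
        """Return True if the task may continue, False if it must block."""
        if self.count > 0:
            self.count -= 1
            return True
        self.waiting.append(task)
        return False

    def signal(self) -> Optional[str]:
        """Wake the oldest waiting task, or bank the signal in the count."""
        if self.waiting:
            return self.waiting.popleft()
        self.count += 1
        return None
```

If the check-and-decrement in `wait` were split across several p-code instructions, a task switch between them could let two tasks both see `count > 0`.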

 

This is not based on any source code or such, but on my own deciphering and inspection of my p-system on the 99/4A. I needed to understand more than the manual tells you in order to implement pre-emptive multitasking and bit-map mode, so the system could do turtlegraphics.

 

Spoiler

Edited by apersson850, Sun Jul 2, 2017 10:59 AM.


#105 TheBF

    Moonsweeper

  • 308 posts
  • Location:The Great White North

Posted Sun Jul 2, 2017 8:00 AM

This is cool to see inside this system.

 

Can you think of any good reason that the interpreter has three NOP instructions in it?

 

I find it astounding that someone wanted to slow down this critical piece of the system.



#106 Lee Stewart

    River Patroller

  • 3,326 posts
  • Location:Silver Run, Maryland

Posted Sun Jul 2, 2017 8:25 AM

This is cool to see inside this system.

 

Can you think of any good reason that the interpreter has three NOP instructions in it?

 

I find it astounding that someone wanted to slow down this critical piece of the system.

 

Re-read the third line of his response.  :P

 

...lee



#107 TheBF

    Moonsweeper

  • 308 posts
  • Location:The Great White North

Posted Sun Jul 2, 2017 9:48 AM

Got it😐

#108 apersson850

    Moonsweeper

  • 418 posts

Posted Sun Jul 2, 2017 10:04 AM

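Most of these NOP instructions are never actually executed; they are just fillers. When code is read from memory-mapped devices (VDP RAM or GROM), reading doesn't auto-increment the PC (R8), so it has to be advanced with extra INC instructions. The CPU RAM version doesn't need those, so NOPs fill the space to keep the entry points at the same addresses.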
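A Python model of the two fetch variants, to show where the extra INC comes from. It is my reading of the explanation, not the PME's code: in CPU RAM a read through the PC can post-increment it (in the style of TMS 9900 `MOVB *R8+,...`), while a memory-mapped port advances its own internal address, leaving the PME's copy of the PC in R8 behind. `VdpPort`, `fetch_cpu` and `fetch_port` are hypothetical names.

```python
class VdpPort:
    """Memory-mapped video RAM: set an address once, then each data read
    returns the next byte and bumps the device's internal address."""

    def __init__(self, ram: bytes) -> None:
        self.ram, self.addr = ram, 0

    def set_address(self, addr: int) -> None:
        self.addr = addr

    def read(self) -> int:
        value = self.ram[self.addr]
        self.addr += 1  # the device auto-increments its own address...
        return value


def fetch_cpu(ram: bytes, pc: int) -> tuple:
    """CPU RAM: one auto-incrementing indirect read advances the PC for free."""
    return ram[pc], pc + 1


def fetch_port(port: VdpPort, pc: int) -> tuple:
    """VDP/GROM: the byte comes from the port, but the PC register does not move."""
    value = port.read()
    pc += 1  # ...so the PME must bump its PC with an explicit INC
    return value, pc
```

Both functions return the same `(byte, new_pc)`, but the port version does one more step, which is why the CPU RAM version ends up with spare slots to fill.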


Edited by apersson850, Sun Jul 2, 2017 10:56 AM.


#109 JamesD

    Quadrunner

  • 7,593 posts
  • Location:Flyover State

Posted Sun Jul 2, 2017 10:28 AM

I'd be less worried about the NOPs and more worried about how much other code has to be executed just for a single opcode.

BTW, the first language I found that used this sort of interpreter was BCPL, which came a few years before Pascal or Forth.
I'm pretty sure that's where the idea came from.


Edited by JamesD, Sun Jul 2, 2017 10:36 AM.


#110 apersson850

    Moonsweeper

  • 418 posts

Posted Sun Jul 2, 2017 10:48 AM

Obviously, for instructions that correspond directly to a single TMS 9900 instruction, p-code runs about seven times slower than pure assembly, as it takes seven instructions to execute each one.

Instructions that find data in another procedure and the like do of course take longer. But they couldn't execute in a single TMS 9900 instruction either, so the relative overhead is smaller.

If you look at the SLOD2 instruction, it takes 24 TMS 9900 instructions to execute it and six to decode it. Thus the overhead adds only 25%, compared with 600% (seven instructions instead of one) for ADI.
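The overhead figures are simple ratios of instruction counts. A worked version in Python, assuming that for ADI one of the seven TMS 9900 instructions does the actual work (the one a native translation would keep) and six are interpretive overhead. Note these are instruction counts, not clock cycles; TMS 9900 instructions differ in cycle time, so real timings will differ somewhat.

```python
def overhead_pct(decode: int, execute: int) -> float:
    """Extra instructions spent decoding, as a percentage of the useful work."""
    return 100 * decode / execute


def slowdown(decode: int, execute: int) -> float:
    """Total interpreted instruction count divided by the native count."""
    return (decode + execute) / execute


ADI = (6, 1)     # seven instructions in CPU RAM, one of them doing the add
SLOD2 = (6, 24)  # six to decode, 24 to execute
```

`overhead_pct(*ADI)` gives 600.0 and `slowdown(*ADI)` gives 7.0, while `overhead_pct(*SLOD2)` gives 25.0 and `slowdown(*SLOD2)` gives 1.25: the fixed decode cost matters a lot for simple instructions and little for complex ones.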

 

I don't know where the idea of implementing p-code for the UCSD system came from. But it's a fairly old idea to compile a language to some intermediate code and then either translate that all the way to native code, or interpret it. Since they wanted portability, it's very efficient, as implementing the PME on a new platform is a significantly smaller task than modifying the compiler each time.


Edited by apersson850, Sun Jul 2, 2017 10:56 AM.


#111 apersson850

    Moonsweeper

  • 418 posts

Posted Wed Jul 5, 2017 10:11 AM

Just to see how much of the time is consumed by the sprite routines in the Pascal program, I commented them out. The program still runs all the loops and does all the assignments, but it never calls set_sprite. Now it does the 100 loops in 150 seconds.

Next I'll make it write to the sprite attribute table directly, and we'll see what difference that makes.



#112 apersson850

    Moonsweeper

  • 418 posts

Posted Wed Jul 5, 2017 12:54 PM

In the next step, I added an external procedure which simply plugs the values for x and y directly into the sprite attribute table. Execution time is now 166 seconds.

This is of course still nowhere near Forth or pure assembly, but it shows that the p-system and Pascal are, at least normally, significantly faster than Extended BASIC.


Edited by apersson850, Wed Jul 5, 2017 1:01 PM.


#113 apersson850

    Moonsweeper

  • 418 posts

Posted Wed Jul 5, 2017 2:00 PM

Finally, a program that does the same, but without any external procedures: pure Pascal in 273 seconds.

As far as I know, there's no way to poke to VDP memory in Extended BASIC, so that's probably as optimized as it will be. At least not without external functions, like special CALLs implemented on Horizon RAMdisks and such. Thus we have 2000 seconds vs. 273 seconds here. That's in line with what I've experienced before, where Pascal is a couple of times faster than BASIC, but of course not near assembly speed.

There are a few more things you can do to optimize further, but I won't bother now. The step from 780 to 273 seconds still proves that if you know the Pascal system well, you can make it perform better. And the step down to 166 seconds shows that if you use assembly support where it's most needed, you can get some more. But that's true for most languages.

Language   First Pass     Optimized
GCC           15 sec         5 sec
Assembly      17 sec         5 sec
TurboForth    48 sec        29 sec
Compiled XB   51 sec        37 sec
FbForth       70 sec        26 sec
GPL           80 sec       none yet
ABASIC       490 sec       none yet
XB          2000 sec       none yet
UCSD Pascal 7300 sec       273 sec
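To put the table in ratios, here is a small Python calculation over the figures exactly as listed, taking the optimized time where there is one and the first-pass time otherwise. The 166-second assembly-assisted Pascal run is not in the table and is left out.

```python
# (first pass, optimized) in seconds, copied from the table; None = no figure yet.
RESULTS = {
    "GCC": (15, 5),
    "Assembly": (17, 5),
    "TurboForth": (48, 29),
    "Compiled XB": (51, 37),
    "FbForth": (70, 26),
    "GPL": (80, None),
    "ABASIC": (490, None),
    "XB": (2000, None),
    "UCSD Pascal": (7300, 273),
}


def best(lang: str) -> int:
    """Optimized time if available, otherwise the first-pass time."""
    first, opt = RESULTS[lang]
    return opt if opt is not None else first


def speedup(slow: str, fast: str) -> float:
    """How many times faster 'fast' is than 'slow', using best times."""
    return best(slow) / best(fast)
```

`speedup("XB", "UCSD Pascal")` is about 7.3 (2000 / 273), and `speedup("UCSD Pascal", "Assembly")` is about 54.6 (273 / 5).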

Edited by apersson850, Wed Jul 5, 2017 2:06 PM.


#114 JamesD

    Quadrunner

  • 7,593 posts
  • Location:Flyover State

Posted Wed Jul 5, 2017 6:53 PM

Ya know, there was a Pascal parser for GCC.  
It's been dropped in recent versions due to lack of a maintainer, but if that were combined with the 9900 GCC changes, it would probably benchmark right up there with C.
In theory, since it uses strict typing, it should be able to optimize some code that can't be optimized under C.
 



#115 apersson850

    Moonsweeper

  • 418 posts

Posted Thu Jul 6, 2017 3:19 AM

Sure, but as far as I see it, it's not interesting. The value of the UCSD p-system lies in the word "system". It's the whole system, with code and memory management, libraries and such, that I like. So I can live with the fact that it doesn't outrun Forth, and that I occasionally have to write external procedures to get the desired performance.



#116 Vorticon

    River Patroller

  • 2,727 posts
  • Location:Eagan, MN, USA

Posted Thu Jul 6, 2017 7:33 AM

 

Finally, a program that does the same, but without any external procedures: pure Pascal in 273 seconds.

As far as I know, there's no way to poke to VDP memory in Extended BASIC, so that's probably as optimized as it will be. At least not without external functions, like special CALLs implemented on Horizon RAMdisks and such. Thus we have 2000 seconds vs. 273 seconds here. That's in line with what I've experienced before, where Pascal is a couple of times faster than BASIC, but of course not near assembly speed.

There are a few more things you can do to optimize further, but I won't bother now. The step from 780 to 273 seconds still proves that if you know the Pascal system well, you can make it perform better. And the step down to 166 seconds shows that if you use assembly support where it's most needed, you can get some more. But that's true for most languages.

Language   First Pass     Optimized
GCC           15 sec         5 sec
Assembly      17 sec         5 sec
TurboForth    48 sec        29 sec
Compiled XB   51 sec        37 sec
FbForth       70 sec        26 sec
GPL           80 sec       none yet
ABASIC       490 sec       none yet
XB          2000 sec       none yet
UCSD Pascal 7300 sec       273 sec

 

 

Could you please post the source code for that program? I'm very interested in seeing how you did it. BTW, this is more than 7 times faster than XB (2000 vs. 273 seconds), not just a couple of times :)



#117 apersson850

    Moonsweeper

  • 418 posts

Posted Thu Jul 6, 2017 10:42 AM

Sure. Here you go.

Note that the UCSD p-system on the somewhat peculiar TI 99/4A has dual code pools. One in VDP RAM, the other in the 24 K CPU RAM. Normally, the p-system will load pure p-code in the VDP pool. Code containing assembly programs must be loaded in the secondary code pool, as they can't run from video memory. But in this program, I'm writing to the VDP from Pascal. Thus you must use the utility setltype (set language type) to change the type of the code file you run from Pseudo to M_9900.
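A sketch of the pool rule as described: pure p-code goes to the VDP RAM pool, while code marked M_9900 goes to the CPU RAM pool, which is what setltype achieves by relabelling the code file. The Python names are hypothetical and the real loader is certainly more involved. Why Pascal code that writes to the VDP must be relabelled is presumably that the VDP has a single address register for CPU access, so a Pascal procedure that reprograms it to reach the sprite table would disturb the PME if that same procedure's p-code were being fetched through the VDP port; that reasoning is my assumption, not something stated in the post.

```python
from enum import Enum


class CodeType(Enum):
    PSEUDO = "Pseudo"  # pure p-code: normally loaded into the VDP RAM code pool
    M_9900 = "M_9900"  # marked as TMS 9900 code: must run from CPU RAM


def choose_pool(code_type: CodeType) -> str:
    """Hypothetical loader rule: which code pool a code file is loaded into."""
    if code_type is CodeType.M_9900:
        return "CPU"  # secondary pool in the 24 K CPU RAM
    return "VDP"      # primary pool in video RAM
```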

 

This is not very flexible, as the procedure that writes to VDP RAM is fixed to set the position values for sprite #1, nothing else. A few more milliseconds could have been saved by not calling any procedure at all, but putting the code in line in the main program. Then it would also have been possible to only write one coordinate, as the other is fixed in each loop. But I didn't bother. I just wanted to see if it was the versatile and complicated unit sprite which does some overkill for such a simple task as this one, and thus wastes a lot of time. And it obviously is.
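Plugging x and y directly into the sprite attribute table boils down to a handful of port writes. A Python sketch that returns them, assuming the table sits at VDP address $300 (the address TheBF's Forth code in this thread uses; the p-system may place it elsewhere), 4 bytes per sprite (Y, X, pattern, early-clock/colour), and the TI-99/4A VDP ports at >8C02 (write address) and >8C00 (write data). `set_position_writes` is a hypothetical helper, not part of unit sprite.

```python
SAT_BASE = 0x0300               # assumed sprite attribute table address
VDPWD, VDPWA = 0x8C00, 0x8C02   # TI-99/4A VDP write-data and write-address ports


def sat_entry(sprite: int) -> int:
    """Each sprite has 4 bytes: Y, X, pattern, early-clock/colour."""
    return SAT_BASE + 4 * sprite


def set_position_writes(sprite: int, x: int, y: int) -> list:
    """(port, byte) writes that store Y then X for one sprite. The VDP
    auto-increments its address, so X follows Y without a second setup."""
    addr = sat_entry(sprite)
    return [
        (VDPWA, addr & 0xFF),         # address low byte first
        (VDPWA, (addr >> 8) | 0x40),  # then high byte with the 'write' bit set
        (VDPWD, y & 0xFF),
        (VDPWD, x & 0xFF),
    ]
```

Four byte writes per move, versus whatever bookkeeping a general sprite unit does, is the difference this benchmark step is measuring. Writing only the coordinate that changes would save one more byte per iteration, as noted above.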

The time related code is just to make the benchmark timing automatic. A few declarations aren't used; they remain from the first version.

 

Spoiler


#118 Tursi

    River Patroller

  • Topic Starter
  • 4,753 posts
  • HarmlessLion
  • Location:BUR

Posted Thu Jul 6, 2017 12:18 PM

 

As far as I know, there's no way to poke to VDP memory in Extended BASIC, so that's probably as optimized as it will be.

 

I've taken several stabs at it, and never done any better. (Which did surprise me).

 

We don't have TI LOGO in there, anyone want to try that one? ;)



#119 JamesD

    Quadrunner

  • 7,593 posts
  • Location:Flyover State

Posted Thu Jul 6, 2017 1:49 PM

Sure, but as far as I see it, it's not interesting. The value of the UCSD p-system lies in the word "system". It's the whole system, with code and memory management, libraries and such, that I like. So I can live with the fact that it doesn't outrun Forth, and that I occasionally have to write external procedures to get the desired performance.

So writing all the code in Pascal and then compiling pieces that need more speed with a different compiler and making them external procedures isn't interesting?



#120 apersson850

    Moonsweeper

  • 418 posts

Posted Thu Jul 6, 2017 2:05 PM

I thought you were talking about replacing the whole system with a compiler that produced code files which couldn't be loaded under the p-system. As I understand you now, it's completely different. Automating that process is like having a native code generator (which normally does accompany the p-system), but perhaps even better, if it's an optimizing one.

 

How fast do you guys sort 1000 random integers? Using whatever language you like, thus most likely assembly?



#121 Vorticon

    River Patroller

  • 2,727 posts
  • Location:Eagan, MN, USA

Posted Thu Jul 6, 2017 3:20 PM

 

I've taken several stabs at it, and never done any better. (Which did surprise me).

 

We don't have TI LOGO in there, anyone want to try that one? ;)

 

I just might! I also think RXB will be a worthwhile contestant, with its low-level access features.



#122 Vorticon

    River Patroller

  • 2,727 posts
  • Location:Eagan, MN, USA

Posted Thu Jul 6, 2017 3:25 PM

Sure. Here you go.

Note that the UCSD p-system on the somewhat peculiar TI 99/4A has dual code pools. One in VDP RAM, the other in the 24 K CPU RAM. Normally, the p-system will load pure p-code in the VDP pool. Code containing assembly programs must be loaded in the secondary code pool, as they can't run from video memory. But in this program, I'm writing to the VDP from Pascal. Thus you must use the utility setltype (set language type) to change the type of the code file you run from Pseudo to M_9900.

 

This is not very flexible, as the procedure that writes to VDP RAM is fixed to set the position values for sprite #1, nothing else. A few more milliseconds could have been saved by not calling any procedure at all, but putting the code in line in the main program. Then it would also have been possible to only write one coordinate, as the other is fixed in each loop. But I didn't bother. I just wanted to see if it was the versatile and complicated unit sprite which does some overkill for such a simple task as this one, and thus wastes a lot of time. And it obviously is.

The time related code is just to make the benchmark timing automatic. A few declarations aren't used; they remain from the first version.

 

Spoiler

 

Thanks. I think I will need to look this over closely. I was not aware there was a unit called realtime...



#123 apersson850

    Moonsweeper

  • 418 posts

Posted Thu Jul 6, 2017 4:25 PM

Well, only my machine, and one of my friend's (though I doubt his still runs), has a unit realtime. The p-system keeps track of today's date and uses it to tag files when they are created or updated. But you have to key in the date manually, as the system always starts with the date it was last set to. That date is stored on the disk, so when you start the system next time, you get the date the system disk was last used.

If the computer had a battery-backed real time clock, keeping track of date and time, the setting of the date could be done automatically. As there was no such device available on the market at that time, I set about to design and build one. I did of course also write the software to use it. The p-system unit realtime is one part of that software.

I have two editions of the unit realtime: one that works with my own clock hardware and one that works with the Triple-Tech card, which was the first clock card on the market that I came in contact with. A friend of mine bought one, so I wrote a driver for him to use it with the p-system.

My device uses the same clock chip as you find on the P-GRAM card. It's good for precision timing, as it counts down to 1/1000 s. The chip they used on the Triple-Tech card would do seconds only.

I've also designed my card so that I can use the interrupt generation capability of the clock. Thus the card can issue an alarm, via an interrupt, at a pre-programmed time, or use a counter rollover interrupt, and thus issue one every minute or every 1/10 s or every day. The every day interrupt can be used to change the system date not only when logging on, but also during operation of the p-system.

 

Since I can change the interrupt vector in my system (I can overlay the console ROM with RAM), I can, for example, reconfigure the TI to become a controller that takes actions based on time.

 

But in this benchmark the clock is simply used to time the activity. The unit realtime allows dynamic creation of any number of timers (limited only by memory) that can run simultaneously. They can be halted and read individually, and disposed of when no longer needed, to recover the memory they use.
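The timer facility described above (create any number of timers, halt and read each one individually, dispose of it to free its memory) can be sketched in Python like this. `TimerPool` and its methods are hypothetical names, not the interface of unit realtime, and `clock` stands in for the clock chip's 1/1000 s counter.

```python
import time
from typing import Callable


class TimerPool:
    """Any number of independent timers, each started, halted, read and
    disposed of individually. 'clock' returns the current time in seconds."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self.clock = clock
        self.timers = {}  # timer id -> [start time, halt time or None]
        self.next_id = 0

    def create(self) -> int:
        """Start a new timer and return its id."""
        tid = self.next_id
        self.next_id += 1
        self.timers[tid] = [self.clock(), None]
        return tid

    def halt(self, tid: int) -> None:
        """Freeze the timer; later reads return the halted value."""
        if self.timers[tid][1] is None:
            self.timers[tid][1] = self.clock()

    def read_ms(self, tid: int) -> int:
        """Elapsed milliseconds, up to the halt time if the timer is halted."""
        start, stop = self.timers[tid]
        end = stop if stop is not None else self.clock()
        return round((end - start) * 1000)

    def dispose(self, tid: int) -> None:
        """Drop the timer so its memory can be recovered."""
        del self.timers[tid]
```

For a benchmark like this one, you create a timer before the loops, halt it afterwards, read the result and dispose of it.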

 

Anyway, don't look for it in the stock UCSD p-system, as it's not there.



#124 JamesD

    Quadrunner

  • 7,593 posts
  • Location:Flyover State

Posted Thu Jul 6, 2017 6:38 PM

I thought you were talking about replacing the whole system with a compiler that produced code files which couldn't be loaded under the p-system. As I understand you now, it's completely different. Automating that process is like having a native code generator (which normally does accompany the p-system), but perhaps even better, if it's an optimizing one.

 

How fast do you guys sort 1000 random integers? Using whatever language you like, thus most likely assembly?

Well, on smaller projects, you could skip the P-Machine and run at speeds similar to the current GCC C compiler.
But you could also generate native code modules to work with the P-System.  
If you develop something under UCSD that needs more speed in certain parts, make them modules, then compile the code on the PC and have freakish fast speeds while still having the advantages of the P-Machine.  How easy it would be to convert the GCC Pascal output to a UCSD module I don't know.

The problem with the UCSD native code translator is that the output still works like the P-Machine, just without the decoding phase.
It can't take advantage of lots of registers, it uses the stack a lot, etc...
Translation will certainly cut the execution times by quite a bit, but nothing approaching GCC's code generator.
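To illustrate "works like the P-Machine, just without the decoding phase": a template translator maps each p-code to a fixed native sequence that still pushes and pops the evaluation stack. This Python sketch is not the actual output of the UCSD native code generator. The templates are hypothetical TMS 9900 sequences, `SP` is a symbolic register name, and R9 is used as the local-data base as in the PME described earlier in the thread.

```python
# Hypothetical templates: each p-code expands to a fixed TMS 9900 sequence
# that still goes through the evaluation stack.
TEMPLATES = {
    "LDL": ["DECT SP", "MOV @{off}(R9),*SP"],  # push a local word
    "ADI": ["A *SP+,*SP"],                     # pop one word, add it into the new top
    "STL": ["MOV *SP+,@{off}(R9)"],            # pop into a local word
}


def translate(pcode: list) -> list:
    """Expand (opcode, word offset) pairs into native instruction text."""
    out = []
    for op, arg in pcode:
        for line in TEMPLATES[op]:
            out.append(line.format(off=2 * arg))  # word offset -> byte offset
    return out
```

`translate([("LDL", 1), ("LDL", 2), ("ADI", 0), ("STL", 3)])` produces six instructions for `c := a + b`. A register-allocating compiler that kept `a`, `b` and `c` in registers could do the same with two (a MOV and an A), which is the gap JamesD is describing.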
 



#125 TheBF

    Moonsweeper

  • 308 posts
  • Location:The Great White North

Posted Wed Jul 26, 2017 10:11 AM

I have been using this exercise to beat up the low level code for my Sprite routines.

Here is the current state of the benchmark running under indirect threaded Forth (ITC) and direct threaded Forth (DTC), with the top of stack cached in a register.

 

  Lee, 

              I can't figure out how you got FB-Forth to go faster than Turbo Forth in the optimized version.

              Can you double check it when you have nothing better to do?  

 

 

Note: None of my code is updated on GitHub. I need to do some housecleaning there.

 

 

Code:

HEX
VARIABLE CNT

DECIMAL
\ SP.LOC is Forth code that writes to sprite descriptor table
\ : SP.LOC    ( dx dy sprt# -- )  >R >CELL R> ]SDT V! ;
\ alternative to using mtask99 and automotion
: TURSI.BENCH
      GRAPHICS
      PAGE
      1 MAGNIFY
      2 42 0 0 0 SPRITE   \ CAMEL99 uses BASIC color #s
      SP.SHOW
      100 CNT !
      BEGIN
           CNT @ 0> WHILE
           239 0 DO   I   0  0 SP.LOC       LOOP
           175 0 DO 239   I  0 SP.LOC       LOOP
           0 239 DO   I 175  0 SP.LOC   -1 +LOOP
           0 175 DO   0   I  0 SP.LOC   -1 +LOOP
           -1 CNT +!
      REPEAT
;
HEX
300 CONSTANT $300
$300 1+ CONSTANT $301

DECIMAL 
: TURSI.OPT
      GRAPHICS
      PAGE
      1 MAGNIFY
      2 42 0 0 0 SPRITE   \ CAMEL99 uses BASIC color #s
      SP.SHOW
      100 CNT !
      BEGIN
           CNT @ WHILE
           239 0 DO   I $301 VC!     LOOP
           175 0 DO   I $300 VC!     LOOP
           0 239 DO   I $301 VC! -1 +LOOP
           0 175 DO   I $300 VC! -1 +LOOP
           -1 CNT +!
      REPEAT
;  
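How much work one benchmark pass is: the four loops in TURSI.BENCH trace the screen border. A Python sketch that lists the (x, y) positions visited, assuming ANS Forth DO ... LOOP / +LOOP semantics, where a negative +LOOP runs down to and including its limit (older Forths may stop one short).

```python
def bench_positions() -> list:
    """(x, y) positions visited in one pass of TURSI.BENCH."""
    path = []
    path += [(i, 0) for i in range(0, 239)]         # 239 0 DO   I   0 ... LOOP
    path += [(239, i) for i in range(0, 175)]       # 175 0 DO 239   I ... LOOP
    path += [(i, 175) for i in range(239, -1, -1)]  # 0 239 DO   I 175 ... -1 +LOOP
    path += [(0, i) for i in range(175, -1, -1)]    # 0 175 DO   0   I ... -1 +LOOP
    return path
```

That is 239 + 175 + 240 + 176 = 830 position updates per pass, so 100 passes make 83,000 updates. At CAMEL99 DTC's 49-second first pass, that works out to roughly 0.59 ms per update.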

Here are the standings

Language   First Pass     Optimized
GCC           15 sec         5 sec
Assembly      17 sec         5 sec
TurboForth    48 sec        29 sec
CAMEL99 DTC   49 sec        27 sec
Compiled XB   51 sec        37 sec
CAMEL99 ITC   55 sec        29 sec
FbForth       70 sec        26 sec
GPL           80 sec       none yet
ABASIC       490 sec       none yet
XB          2000 sec       none yet
UCSD Pascal 7300 sec       273 sec


Edited by TheBF, Thu Aug 10, 2017 2:33 PM.




