Camel99 Forth Information goes here

+TheBF · January 26, 2022

1 minute ago, Lee Stewart said:
Not sure how this would play in TI Forth and fbForth. The payloads for DOVAR DOCON DOUSER all call DODOES , which falls through to DOCOL and $NEXT . All colon definitions end in ; , which executes $SEMIS , which executes a copy of $NEXT , which actually might make it possible to separate direct calls to $NEXT with your improvement (would also affect DOEXEC ). All of this code is in scratchpad RAM on the 16-bit bus for speed:
DODOES DECT SP
       MOV  W,*SP
       MOV  LINK,W
DOCOL  DECT R
       MOV  IP,*R
       MOV  W,IP
$NEXT  MOV  *IP+,W
DOEXEC MOV  *W+,R1
       B    *R1
$SEMIS MOV  *R+,IP
       MOV  *IP+,W   <<<----copy of $NEXT
       MOV  *W+,R1   <<<----copy of DOEXEC
       B    *R1      <<<---+
...lee

Yes this the traditional way the data words were implemented in FIG Forth.

I think to do this NEXT "improvement" (if it is one) you would change both instances of $NEXT.

AND... you have to add INCT W in DOCOL before you move W into IP.

With the alternative NEXT, W contains the CFA when $NEXT completes and we need the PFA. So DOCOL is slightly slower but it seems NEXT is what matters most.

The other thing I found that broke was any cool code that I wrote with ;CODE. (fast arrays now need a INCT W instruction added)

So it's not free. I was mostly curious at this stage to see if it made a difference. But it is probably not too big to fix up my libraries which are mostly Forth.

For comparison my existing implementation in Forth Assembler. with macros. The following is in scratchpad RAM. ( _exit is your $SEMIS )

l: _exit      IP RPOP,        \ >8388
l: _next                      \ Forth ITC NEXT routine (>838A)
@@9:         *IP+ W  MOV,     \ move CFA into Working register & incr IP 
             *W+  R5 MOV,     \ move contents of CFA to R5 & INCR W 
             *R5  B,          \ branch to the address in R5 

l: _enter     IP RPUSH,       \ push IP register onto the return stack 
              W IP MOV,       \ move PFA into Forth IP register 
              @@9 JMP,

Rather than a duplicate copy of _NEXT for _enter (compiled as DOCOL in Forth words) I just used a cheap JMP instruction and use the saved space for some fast primitives.

The new version became:

l: _exit      IP RPOP,        \ >8388
l: _next                      
@@9:         *IP+ W  MOV,     
             *W   R5 MOV,   
             *R5  B,        

l: _enter     IP RPUSH, 
              W INCT,                                                   
              W IP MOV, 
              @@9 JMP,

So the bottom line is as long as you convert the new W register to the PFA in all the words that need it it should work. I think...

+TheBF · January 26, 2022

In case I wasn't clear:

All the words that need it of course are DOVAR DOCON DODOES DOUSER

+TheBF · January 26, 2022

AND it looks like my VOCABULARY/WORDLISTS needs tweaking too. (Yuk)

BF S/W Rules Apply:

1. Everything is possible

2. Nothing is easy

+TheBF · January 27, 2022

Feeling a little congested this week. Might have the dreaded plague but tests are not to be found.

No big deal. I have not even needed to take an aspirin so its pretty mild. Feels like a cold. Might even be a cold.

In the mean time I took some advice that I saw about SAMS cards posted by Lee and others to update my SAMSINI so it works properly for my 1M card and also on Classic99.

The method recommended is swap the page selection byte and write 16 bits. Then to test if it's good, I only read the byte value and compare to what I put in.

This of course is only good for the 1M card but that's what I own. If anyone needs something different I am happy to write it up for them.

To make sure it works on real hardware I sent this code to Camel99 TTY and it passed.

Spoiler


DECIMAL
 24 USER 'R12  \ address of R12 in any Forth workspace

HEX
: SAMSCARD  ( -- ) 1E00 'R12 ! ;   \ select sams card
\ using machine code so we don't need the CRU library
HEX
\ *set the CRU address in 'R12 before using these words*
  CODE 0SBO  ( -- ) 1D00 ,  NEXT, ENDCODE
  CODE 0SBZ  ( -- ) 1E00 ,  NEXT, ENDCODE
  CODE 1SBO  ( -- ) 1D01 ,  NEXT, ENDCODE
  CODE 1SBZ  ( -- ) 1E01 ,  NEXT, ENDCODE

: SAMS-ON   ( -- ) SAMSCARD 1SBO ;  \ enable mapper
: SAMS-OFF  ( -- ) SAMSCARD 1SBZ ;  \ disable mapper

\ * SAMSINI sets 1Mbyte card to "pass-through" condition
: SAMSINI
       SAMSCARD          \ select SAMS card
       0SBO              \ turn card on
       0                 \ register value stays on stack
       4000 20           \ register address, # SAMS regs
       BOUNDS ( -- 4100 4000)
       DO
           DUP >< I !    \ swap bytes and write 16 bits to reg
           I C@  OVER <> ABORT" SAMSINI failed"
           01 +          \ next passthru value
       2 +LOOP
       0SBZ              \ turn off card
       DROP
;

GDMike · January 27, 2022

Get well soon..rest is always a helper

+TheBF · January 31, 2022

It's nice to see a little progress in my health and the kernel.

I have been beating up my newest kernel where I have moved my stacks around a bit.

I had packed things a little tight up in hi RAM and on occasion things were getting stomped.

There have also been the changes to what I keep in 16bit RAM and the improvements made in how the "executor" words were being compiled.

I found a "performance notes" file in my folders from 2018 and there was this little test program:

0 VARIABLE X
HEX
: TEST  ." Running..."
        0 X !
        BEGIN
           X @ 4 <
        WHILE
           0
           BEGIN
             1+ DUP
             FFFF =
           UNTIL
           DROP
           1 X +!
           X @ 4 =
        REPEAT
        0 X !
        CR ." Done!" ;

Back in 2018 with a different computer and a different version of Classic99 I wrote these results. ( I used TF as the gold standard for speed)

Jan 2018:
Turbo Forth 57 sec.
CAMEL99 57.3 sec.

Re-testing this January I get this:

Jan 2022 on Classic99

Turbo Forth 57.4
Camel 2.69 55.7

There are still tons of programs where TF is faster but I finally caught up to Mark on something.

+TheBF · February 1, 2022

What sort of sort are you talking about?

Before I dive into porting @apersson850 's quick sort code I thought I should get some sense of how other sorts to compare.

I have never coded up a sort completely in assembler so I did a bubble sort as the worst case and the simplest to do.

I have a working test harness already that I used for COMBSORT experiments.

The spoiler has the BUBBLE sort coded in Forth with the supporting code.

This took 5:01.33 to sort 1,000 integers.

(The data set was in reversed order for all tests)

Spoiler


INCLUDE DSK1.ASM9900
INCLUDE DSK1.TOOLS
INCLUDE DSK1.RANDOM
INCLUDE DSK1.ELAPSE
INCLUDE DSK1.VALUES
INCLUDE DSK1.ARRAYS

0 VALUE GAP
VARIABLE SFLAG

DECIMAL
1000 VALUE  SIZE

SIZE 4 + ARRAY ]Q

: XCHG  ( adr1 adr2 n1 n2-- )
        S" OVER @ OVER @ SWAP ROT !  SWAP !" EVALUATE ; IMMEDIATE

\ usage:   X Y XCHG  \ moves contents from X->Y & Y->X
\ makes sort 6% faster than the Forth macro on 1000 item reversed sort
\  CODE XCHG   ( adr1 adr2 -- ) \ exchange values of 2 variables
\             *SP+  R0 MOV,    \ pop adr1 into R0
\             R0 ** W  MOV,    \ adr1 @  W  !
\           *TOS R0 ** MOV,    \ mem2me m move  adr2 to adr1
\              W  *TOS MOV,    \ move value adr1 to adr2
\                  TOS POP,
\                    NEXT,
\ ENDCODE

: BUBBLE ( n -- )  \ for comparison
    BEGIN
      SFLAG ON
      DUP 0
      DO
       I 1+ ]Q @   I ]Q @ <     \ test the values
       IF
          I 1+ ]Q   I ]Q  XCHG  \ exchange if needed
          SFLAG OFF
       THEN
      LOOP
      SFLAG @
   UNTIL
   DROP ;  \ 5:01.33 size=1000

\ testing harness
: ERASE   0 FILL ;

\ load the array with different kinds of mixed up data
: CLEARIT   ( -- ) 0 ]Q SIZE CELLS ERASE ;     \ all the same data
: REVERSED  ( -- ) SIZE  0 DO  SIZE I -  I ]Q !  LOOP ;
: ASCEND    ( -- ) SIZE  0 DO       I    I ]Q !  LOOP ;
: RANDIT    ( -- ) SIZE  0 DO  SIZE RND  I ]Q !  LOOP ;
: TWOIT     ( -- ) CLEARIT   99 45 ]Q !  777 SIZE 2/ ]Q ! ;
: TURTLES   ( -- )
           SIZE 0
            DO
               I     I 1+ ]Q !   \ each 2 elements are reversed
               I 1+     I ]Q !
            2 +LOOP ;

\ use this to print the array
: .Q  ( -- ) CR  SIZE  0 DO  I ]Q @ U. ?BREAK LOOP ;

Replacing the Forth version with Forth Assembler version took the time down to 0:55 seconds.

CODE BUBBLE ( addr n -- ) \ reference sort as the worst case
  TOS TOS ADD,  \  n= n x 2
  *SP TOS ADD,  \  tos= last address
   BEGIN,
       *SP R1  MOV,          \ first array address
        R1 R2  MOV,
           R2  INCT,          \ second array address
            R0 SETO,          \ sortflag=TRUE
        BEGIN,
           R1 **  R2 ** CMP,
           HI IF,
             \ Exchange array values
              R2 **    R5 MOV,  \ temp save
              R1 ** R2 ** MOV,
              R5    R1 ** MOV,
                       R0 CLR,  \ sortflag=false
           ENDIF,
           R1 INCT,
           R2 INCT,
           R2 TOS CMP,
        GTE UNTIL,
        R0 TRUE CI,             \ test sort flag
    EQ UNTIL,                   \ loop until true
    NEXT,
    TOS POP,
ENDCODE

The first level comparison is COMBSORT in Forth which is the simplest improvement I have seen on BUBBLE sort for the least effort.

It gave a result of 10.2 seconds... in Forth.

An all Forth recursive Quicksort comes in at 4.4 seconds.

My next reference point will be to see what an all Assembler version of COMBSORT can do.

apersson850 · February 2, 2022

Quicksort is sensitive to how the data is ordered before you start sorting it. Since the algorithm requires that you split the list of data to sort in two, the selection of where to split it is critical. In an ideal world, you should use the median value as the pivot. But the easiest way to find the median is to sort the list first...

Now if the list is in random order, then the probability that some simpler method to find the median gives a good result is higher than if you use a simple method with an already ordered list. This is why Quicksort frequently takes a longer time to sort an already sorted list than a purely random one.

Willsy · February 2, 2022

S" OVER @ OVER @ SWAP ROT !  SWAP !" EVALUATE ; IMMEDIATE

Inlining code... Niiiiiice! :thumbsup: <applause> :waving:

+TheBF · February 2, 2022

7 hours ago, apersson850 said:

Quicksort is sensitive to how the data is ordered before you start sorting it. Since the algorithm requires that you split the list of data to sort in two, the selection of where to split it is critical. In an ideal world, you should use the median value as the pivot. But the easiest way to find the median is to sort the list first...

Now if the list is in random order, then the probability that some simpler method to find the median gives a good result is higher than if you use a simple method with an already ordered list. This is why Quicksort frequently takes a longer time to sort an already sorted list than a purely random one.

Yes I noticed that with the Forth version. The worst cast was almost 9 seconds for a "sparse" array with only two integers and and the rest fo the elements set to zero.

This combsort that I have, seems to not have so much variability with the input data. It's quite remarkable given its simplicity. In high level language it requires a bit of extra code compared to bubble sort.

So if I get combsort in Assembler, I will have three methods to compare.

I am also wondering if switching your insertion sort to combsort for the final cleanup would be better or worse. Not sure.

On 16 elements Insertion sort is pretty optimal except when the data is completely reversed.

procedure combSort(var a: TIntArray);
  var
    i, gap, temp: integer;
    swapped: boolean;
  begin
    gap := length(a);
    swapped := true;
    while (gap > 1) or swapped do
    begin
      gap := trunc(gap / 1.3);
      if (gap < 1) then 
        gap := 1;
      swapped := false;
      for i := 1 to length(a) - gap do
        if a[i] > a[i+gap] then
        begin
	  temp := a[i];
          a[i] := a[i+gap];
          a[i+gap] := temp;
          swapped := true;
        end;
    end;
  end;

+TheBF · February 2, 2022

6 hours ago, Willsy said:
S" OVER @ OVER @ SWAP ROT !  SWAP !" EVALUATE ; IMMEDIATE
Inlining code... Niiiiiice! <applause>

Thanks. The comp.lang.forth world seems to not like these because you might redefine something later in your code that will change what the text-macro does.

Well.. it is Forth so be careful. because this is legal:

FOOT IF SHOOT THEN

The text macro gives a bit of improvement. But not as much as this would.

: EXCH   INLINE[ OVER @ OVER @ SWAP ROT !  SWAP ! ]  ;

https://github.com/bfox9900/CAMEL99-V2/blob/master/LIB.ITC/INLINE.FTH

I was shocked to discover that this was not as important as I thought it would be. Even re-writing the EXCH word in Assembler only improved the sorting speed by 6%.

Edited February 2, 2022 by TheBF
code error

apersson850 · February 2, 2022

56 minutes ago, TheBF said:

Yes I noticed that with the Forth version. The worst cast was almost 9 seconds for a "sparse" array with only two integers and and the rest of the elements set to zero.

Quicksort picks a value in the list as the key. Then it places all smaller values in the left list, all larger in the right list. Then you redo that on both lists. When your list size is down to two, you're done. But it's more efficient to use insertionsort for short lists, so the program should fall back to that when the lists are small.

What you want to avoid is to have the key being the value at the end of the list, since in that case the two lists you get is one with a single value and another with all the other values. What you want are two lists of equal size. So picking the first item as the key is a bad idea if the list is already sorted. You can pick the key at a random place. If you are lucky, good. But you may be unlcuky. The best thing you can do is to pick the median value. The best way to find that is to sort the list, then look in the middle. But then you're already done when you find the median!

My assembly program does implement using a median for the key. It looks at three values, at the start, middle and end of the list. It will pick the median of those. To make it more efficiently, it actually sorts these three values in order, then pick the middle one. Thus, once we find the median, the values involved are already in their right position, relative each other!

However, if almost all values in the list are zero (or any value, but they are all the same), except for two values, then the probability that the median will also be zero is very high, if you calculate it from three values in the list. That's why this is the situation where Quicksort is as slow as it can be.

There's not much to gain by replacing the Insertionsort with Shellsort, or instead use Comb sort, since Shellsort makes Insertionsort more efficient for long lists, just like Comb Sort makes Bubble sort more efficient for long lists. But in my program, Quicksort has already cut down the length of the lists when they are left over to Insertionsort. Ths list size is no more than eight elements at that time.

apersson850 · February 2, 2022

6 hours ago, Willsy said:

Inlining code... Niiiiiice! <applause>

Ouch! It's as tricky as using the procedure p_machine in Pascal. It allows you to generate p-code inline with your program.

Like if you have two pointers, declared to point to two different data structures, then the compiler will prevent you from making pointer a point to structure b. But this code will circumvent that restriction... It's not considered "neat and tidy".

(* Define opcode for STOre *)
const
  STO = 196;
(* Two variables, ptr_a and ptr_b, are assumed being declared as pointers to different data structures *)

(* Create a variable of type a, and make ptr_a point to it *)
new(ptr_a);

(* Now make pointer ptr_b, which isn't allowed to point at the same data type, do it anyway *)
p_machine(^ptr_b,(ptr_a),STO);

Willsy · February 2, 2022

Yes! Forth is assembler language for a virtual machine. Just like any other assembly language, it gives you infinite rope with which to hang yourself!

+TheBF · February 2, 2022

3 hours ago, apersson850 said:
Ouch! It's as tricky as using the procedure p_machine in Pascal. It allows you to generate p-code inline with your program.

Like if you have two pointers, declared to point to two different data structures, then the compiler will prevent you from making pointer a point to structure b. But this code will circumvent that restriction... It's not considered "neat and tidy".
(* Define opcode for STOre *)
const
  STO = 196;
(* Two variables, ptr_a and ptr_b, are assumed being declared as pointers to different data structures *)

(* Create a variable of type a, and make ptr_a point to it *)
new(ptr_a);

(* Now make pointer ptr_b, which isn't allowed to point at the same data type, do it anyway *)
p_machine(^ptr_b,(ptr_a),STO);

I had NO idea you do that with PCODE.

The example Mark referenced is called text macro in Forth. It just hands the text string to the compiler at compile time so you remove one sub-routine call.

So it's inline, but inline Forth code.

The INLINE[ ] directive I created is more like what you are doing with p_machine(). (OR maybe I don't understand it at all)

In Forth we can get the "execution token" (XT) for each word with tick ( ' )
In a machine code primitive the very next address cell contains real machine code.
Each code word ends with the Forth list reader call NEXT. In my case it's a B *R10 instruction so easy to identify.

With those three pieces of information we can copy the machine code from a kernel primitive into another place in memory.

This allowed one to make contiguous chunks of code that the GForth authors call "super-instructions".

GForth does this by default. I haven't figured out how to do that ... yet.

It all sounds great but the speed up is really at very very most about 50%.

Forth's looping words start to get in the way once you remove NEXT from the primitives in the inner loop.

Here is the guts of the routine to copy kernel code to another location.

\ Heap management words
: HEAP    ( -- addr) H @ ;
: HALLOT  ( n -- )   H +! ;
: H,   ( n -- )    HEAP ! 2 HALLOT ;

045A CONSTANT 'NEXT'  \ 9900 CODE for B *R10   Camel99 Forth's NEXT code

: CODE,  ( xt --)  \ Read code word from kernel, compile into target memory
           >BODY 80 CELLS  ( -- addr len)
           BOUNDS ( -- IPend IPstart)
           BEGIN
              DUP @ 'NEXT' <>  \ the instruction is not 'NEXT'
           WHILE
             DUP @  ( -- IP instruction)
             H,   \ compile instruction
             CELL+  \ advance IP
             2DUP < ABORT" End of code not found"
           REPEAT
           2DROP
;

apersson850 · February 2, 2022

24 minutes ago, TheBF said:

I had NO idea you do that with PCODE.

<snip>

OR maybe I don't understand it at all)

I think you do.

Normally, Pascal programs are converted into p-code, which in turn is interpreted by the PME (P-Machine Emulator). The PME is written in assembly, but normally, Pascal programs aren't converted further than to p-code.

But if you want to do something that violates the rules of Pascal, like assigning a 16-bit integer value to the first two characters in a character string, then you don't have to go to assembly to do that. You can use the p_machine instrinsic to create p-code inline with the Pascal code, to get that illegal thing done anyway.

In this particular example you don't have to do that either, since the UCSD p-system defines two different move procedures. You can call them to move any memory content from any address to any other address. There is also fillchar, which sends the same data to any number of bytes at any location. Can be used to zero all elements of an array much faster than indexing through a loop, for example.

So although Forth normally is much faster than Pascal programs, once you know quite a lot about UCSD Pascal, you can create pretty efficient code there too.

+TheBF · February 2, 2022

4 minutes ago, apersson850 said:

I think you do.

Normally, Pascal programs are converted into p-code, which in turn is interpreted by the PME (P-Machine Emulator). The PME is written in assembly, but normally, Pascal programs aren't converted further than to p-code.

But if you want to do something that violates the rules of Pascal, like assigning a 16-bit integer value to the first two characters in a character string, then you don't have to go to assembly to do that. You can use the p_machine instrinsic to create p-code inline with the Pascal code, to get that illegal thing done anyway.

In this particular example you don't have to do that either, since the UCSD p-system defines two different move procedures. You can call them to move any memory content from any address to any other address. There is also fillchar, which sends the same data to any number of bytes at any location. Can be used to zero all elements of an array much faster than indexing through a loop, for example.

So although Forth normally is much faster than Pascal programs, once you know quite a lot about UCSD Pascal, you can create pretty efficient code there too.

Very nice. Thanks for the lucid explanations.

I should warn you (paraphrasing the USA police) anything you say, can or may be stolen by me for future Forth projects.

Yes the realities of machine speeds means we sometimes have to break the polite language paradigm.

fillchar sounds like the Forth word FILL ( addr len char ) which does the same job at machine optimized speed. It is 10X faster than a Forth loop.

On the 9900 I have a few "non-standard Assembly routines that work a machine word size for a bit more speed and half the iterations. (FILLW SCANW MOVEW etc.)

apersson850 · February 2, 2022

The p-system's intrinsics moveleft and moveright, which can move any number of bytes from one address to another, do a test to see if the data to move and the addresses involved are all even. If so, they move half as many words. Otherwise, they move bytes. So they optimize as much as possible.

I haven't checked if fillchar does the same.

The p-system as a business is since long dead, so you can steal whatever ideas you like.

Edited February 2, 2022 by apersson850

+TheBF · February 3, 2022

OK. Now how cool is this video.

I did some editing on the Quicksort for Pascal so it would assemble under the TI assembler.

I loaded my sorting test set.

Link the object file and run and time it from the Forth command line.

AND... I just had to change the PASCAL stack register to Camel99's stack register in the code, and the Forth stack pointer came across into the external program.

The time of 2.66 seconds slower than the original because @apersson850 's machine has 16 bit memory I believe.

This is running completely in 8 bit RAM, workspace and all.

Here is the slightly altered source file.

Spoiler



*        TITL "QUICKSORT FOR INTEGERS"

* Procedure sorting integers in ascending order
* Called from Pascal host by QSORT(A,N)*
* Declared as PROCEDURE QSORT(VAR A:VECTOR* N:INTEGER)* EXTERNAL*
* Vector should be an array [..] of integer*

* Author: @APERSON850 on Atariage.com

* Modified for TI Assembler, to be called from Camel99 Forth
* 2022 Brian Fox

*--------------
* Workspace registers for subroutine at START
*
* R0  Array base pointer
* R1  Array end pointer
* R2  L
* R3  R
* R4  I
* R5  J
* R6  KEY
* R7  Temporary
* R8  Temporary
* R9  Subfile stack pointer
* R10 Main stackpointer
* R11 Pascal return address (P-system WS)
* R12 Not used
* R13 Calling program's Workspace Pointer
* R14 Calling program's Program Counter
* R15 Calling program's Status Register
*-----------------

* The actual vector in the Pascal-program could be indexed [1..n]
* This routine assumes only that n indicates the number of elements, not the
* last array index.


         DEF  QSORT

LIMIT    EQU    16               *Quick- or Insertionsort limit

*  * Don't need this for Forth **
*        BLWP @QSORT
*        AI   R10,4              *Simulate pop of two words
*         B    *R11              *Back to Pascal host

QSORT    DATA SORTWS,START       *Transfer vector for Quicksort

START    MOV  @12(R13),R10       *Get stackpointer from Forth's WP
         LI   R9,ENDSTK          *SUBFILE STACKPOINTER
         MOV  *R10+,R1           * POP PARAMETER N
         MOV  *R10+,R0           * POP ARRAY POINTER
         DEC  R1
         SLA  R1,1
         A    R0,R1              *CALCULATE ARRAY ENDPOINT

         MOV  R0,R2              *L:=1
         MOV  R1,R3              *R:=N
         MOV  R1,R7
         S    R0,R7
         CI   R7,LIMIT
         JLE  INSERT             *FIGURE OUT IF QUICKSORT IS NEEDED

MNLOOP   MOV  R2,R7
         SRL  R7,1
         MOV  R3,R8
         SRL  R8,1
         A    R8,R7
         ANDI R7,>FFFE           *R7:=INT((L+R)/2)
         MOV  *R7,R8
         MOV  @2(R2),*R7
         MOV  R8,@2(R2)          *A[(L+R)/2]:=:A[L+1]

         C    @2(R2),*R3
         JLT  NOSWP1
         MOV  @2(R2),R8
         MOV  *R3,@2(R2)
         MOV  R8,*R3             *A[L+1]:=:A[R]

NOSWP1  C    *R2,*R3
         JLT  NOSWP2
         MOV  *R2,R8
         MOV  *R3,*R2
         MOV  R8,*R3             *A[L]:=:A[R]

NOSWP2   C    @2(R2),*R2
         JLT  NOSWP3
         MOV  @2(R2),R8
         MOV  *R2,@2(R2)
         MOV  R8,*R2             *A[L+1]:=:A[L]

NOSWP3   MOV  R2,R4
         INCT R4                 *I:=L+1
         MOV  R3,R5              *J:=R
         MOV  *R2,R6             *KEY:=A[L]
         JMP  INCLOP

INRLOP   MOV  *R4,R8             *LOOP UNWRAPPING
         MOV  *R5,*R4
         MOV  R8,*R5             *A[I]:=:A[J]

INCLOP  INCT R4                 *I:=I+1
         C    *R4,R6
         JLT  INCLOP            *A[I]<KEY

DECLOP   DECT R5                 *J:=J-1
         C    *R5,R6
         JGT  DECLOP            *A[J]>KEY

         C    R4,R5
         JLE  INRLOP           *IF I<=J THEN CONTINUE

OUT      MOV  *R2,R8
         MOV  *R5,*R2
         MOV  R8,*R5             *A[L]:=:A[J]

DEL1     MOV  R5,R7              *Quicksort subfiles?
         S    R2,R7              *R7:=J-L
         MOV  R3,R8
         S    R4,R8
         INCT R8                 *R8:=R-I+1
         CI   R7,LIMIT
         JH   DEL2
         CI   R8,LIMIT
         JH   DEL2

         CI   R9,ENDSTK          *LVL=0?
         JEQ  INSERT             *No more Quicksorting at all?

         MOV  *R9+,R2            *POP L
         MOV  *R9+,R3            *POP R
         JMP  MNLOOP

DEL2     C    R7,R8              *Determine what is small and large subfile
         JL   ELSE2

         MOV  R2,@LSFL
         MOV  R5,@LSFR
         DECT @LSFR
         MOV  R4,@SSFL
         MOV  R3,@SSFR
         JMP  DEL3

ELSE2    MOV  R4,@LSFL
         MOV  R3,@LSFR
         MOV  R2,@SSFL
         MOV  R5,@SSFR
         DECT @SSFR

DEL3     CI   R7,LIMIT           *Is small subfile big enough to be sorted by
         JLE  THEN3              *Quicksort?
         CI   R8,LIMIT
         JH   ELSE3

THEN3    MOV  @LSFL,R2           *Don't Quicksort small subfile, only large
         MOV  @LSFR,R3
         JMP  MNLOOP

ELSE3    DECT R9                 *Stack large subfile
         MOV  @LSFR,*R9          *PUSH R
         DECT R9
         MOV  @LSFL,*R9          *PUSH L
         MOV  @SSFL,R2           *Sort small subfile
         MOV  @SSFR,R3
         JMP  MNLOOP

*
* Insertion Sort finishing up
*
INSERT   MOV  R1,R4
         DECT R4                 *I:=N-1
         C    R4,R0
         JL   LEAVE              *Check if any looping at al

FORI     C    @2(R4),*R4
         JGT  NEXTI              *If next is greater than this, it's OK

         MOV  *R4,R6             *KEY:=A[I]
         MOV  R4,R5
         INCT R5                 *J:=I+1

WHILE    MOV  *R5,@-2(R5)        *A[J-1]:=A[J]
         INCT R5                 *J:=J+1
         C    R5,R1
         JH   ENDWHL             *J>N?
         C    *R5,R6             *A[J]<KEY?
         JLT  WHILE
ENDWHL MOV  R6,@-2(R5)           *A[J-1]:=KEY
NEXTI    DECT R4
         C    R4,R0              *Check if passed array base point
         JHE  FORI

LEAVE    RTWP                    * Return Forth workspace
*        AI   R6,4               * remove parameters from Forth stack
         B    @R10               * Branch to Forth interpreter

*--------------
* DATA AREA

SORTWS   BSS  >20             *Workspace for sorting routine
SUBSTK   BSS  >40             *Internal subfile stack
ENDSTK   EQU  SUBSTK+>40      *End of that stack

* variable used by quicksort
LSFL     DATA   0             *Large SubFile Left pointer
LSFR     DATA   0             *Large SubFile Right pointer
SSFL     DATA   0             *Small SubFile Left pointer
SSFR     DATA   0             *Small SubFile Right pointer

         END

Willsy · February 3, 2022

Wow! That's awesome - I might have to steal that! Great job :thumbsup:

+TheBF · February 3, 2022

2 minutes ago, Willsy said:

Wow! That's awesome - I might have to steal that! Great job

Of course my regular rates apply. ?

Have fun. I bet you have a ton of Assembler code you could test this way. The linker as it exists can only do 8K total of code but that's a shit-ton for testing purposes.

And all the data could be in your Forth space anyway.

+TheBF · February 3, 2022

It's a bit faster on random data that completely reversed data.

0.5 seconds!

apersson850 · February 3, 2022

Nice to see you got it to work!

Yes, the algorithm works better on random data. Data where almost all values are the same, except a few, or already sorted data, or reverse sorted - these take longer time to process. That has to do with how the sublists are created in these cases.

At the same time it's interesting to see the time difference due to the memory. It's true that my machine has 64 K RAM 16-bit wide, with no wait state generation. My random list was sorted in about 0.25 seconds, so about half the time. That corresponds to my earlier exterments ,where I found the computer to be roughly twice as fast, when running assembly programs where both the workspace and the code is in expansion memory, which normally is 8-bit wide, with six cycles per memory word access, vs. two cycles for my computer. My reversed list was sorted in about 1.4 seconds or so.

What do you use to do the timing of the procedure?

And finally, when it comes to "the regular rates" - don't forget where this came from! ?

HOME AUTOMATION · February 3, 2022

...Stealing is beneath me. |:) However, what little I have to offer, would be a pittance! :grin:

+TheBF · February 3, 2022

51 minutes ago, apersson850 said:

Nice to see you got it to work!

Yes, the algorithm works better on random data. Data where almost all values are the same, except a few, or already sorted data, or reverse sorted - these take longer time to process. That has to do with how the sublists are created in these cases.

At the same time it's interesting to see the time difference due to the memory. It's true that my machine has 64 K RAM 16-bit wide, with no wait state generation. My random list was sorted in about 0.25 seconds, so about half the time. That corresponds to my earlier exterments ,where I found the computer to be roughly twice as fast, when running assembly programs where both the workspace and the code is in expansion memory, which normally is 8-bit wide, with six cycles per memory word access, vs. two cycles for my computer. My reversed list was sorted in about 1.4 seconds or so.

What do you use to do the timing of the procedure?

And finally, when it comes to "the regular rates" - don't forget where this came from! ?

LOL. Yes indeed. I put your handle in my version of the code for that very reason. Willsy has to pay full freight!.

The timing is done using the screen timeout counter and scaling it.

It's not good if you are doing a lot of VDP actions because interrupts get stopped too frequently. But on code like this it real time tracks very well.

So the timing on Classic99 is very close to real iron from what I can measure so your numbers with your super-duper TI-99 make sense versus these.

Thanks again for the code. It was great fun. The first Assembly attempt showed 99 errors. How's that for a coincidence?

Here is the ELAPSE timer code in Forth

Spoiler


\ ELAPSE.FTH  elapsed time measurment words
\ Thanks to Tom Zimmer for the good ideas in FPC circa 1990
\ Ported to HsForth 08MAR91  Brian Fox Canada

\ Ported to CAMEL99 Nov 29 2017, 
\ Simplified with SEXTAL Dec 6 2018
\ Good for 9 minutes maximum duration

\ *** YOU CANNOT CALL KSCAN WHILE TIMING ***

HEX
83D6 CONSTANT TICKER   \ screen timeout counter increments by 2 /16mS

DECIMAL
: SEXTAL   6 BASE ! ;
: <:>     [CHAR] : HOLD ;
: <.>     [CHAR] . HOLD ;
: TIME$   ( n -- addr len) \ string output is more flexible
          BASE @ >R
          \         100ths        secs           minutes
          0 <#  DECIMAL # # <.> # SEXTAL # <:> DECIMAL #S  #>
          R> BASE ! ;

: .ELAPSED ( -- ) 
           TICKER @ 5 6 */ TIME$
           CR ." Elapsed time ="  TYPE ;

: ELAPSE   ( -- <text> ) 1 PARSE  TICKER OFF  EVALUATE .ELAPSED ;

Camel99 Forth Information goes here

Recommended Posts

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Join the conversation

Recently Browsing 0 members