CUTTING STRINGS FASTER
One the things that BASIC offers the programmer is a handy string data type. Along with the string data type itself there are a set of functions to cut the strings into various pieces and find strings within other strings.
Many implementations of strings rely on copying the sub-string to a string stack or to another string. Copying bytes back and forth in memory eats up a lot of time. This is even more serious in TI-99 BASIC where the strings are held in VDP memory. What if, when you cut a string, you did not copy it? How would that work you ask? One way that has entered the Forth world is to simply record the memory location of the new sub-string and its length in a temporary location. For example take the string “All my ex’s live in Texas”.
Let us say that this string is currently at address >3000 in low memory and it is 25 bytes long plus 1 byte for the count. In ASM it would look like this:
ADOLL BYTE 25
TEXT ‘All my ex’s live in Texas’
In Forth it would be this:
: A$ ( -- addr len) S” All my ex’s live in Texas” ;
The word ADOLL and A$ in both cases are the labels (name) of the strings. So the way we could represent ADOLL is to just use the address and the length. In ASM this would be in registers typically and in Forth they would be on the DATA stack. For this discussion the numbers would be >3000,25. With this representation, If we wanted to do a LEFT$ operation STRING all we have to do is change the number 25 to the new length. That’s it. No copying and we have created a new string!
In Forth we can make LEFT$ like this:
: LEFT$ ( $addr len newlen – $addr len) NIP ;
STRING 12 LEFT$
The Forth word NIP removes the second item on the DATA stack. One command is all we needed to create the function LEFT$. The output of LEFT$ falls onto the DATA stack where you can print it or save it or whatever you want to do.
And RIGHT$ is not much harder:
: RIGHT$ ( $addr len newlen – $addr len) OVER SWAP - 0 MAX /STRING ;
The word /STRING (pronounced “cut string) above is a surprisingly powerful yet very simple ANS Forth word that takes an (address, len) pair and a number.
It subtracts the number from the length and adds the number to the address. This gives us a new string address starting point and a new length.
We add the 0 MAX to prevent a negative number as a size.
/STRING DEMONSTATION CODE
A$ TYPE All my ex’s live in Texas
A$ 4 /STRING TYPE my ex’s live in Texas
In conventional notation /STRING does this:
DEF CUTSTRING (ADDR,LEN,N)
What we did above was equivalent in conventional notation to:
DEF RIGHT$(ADDR,LEN, NEWLEN)
(These Forth functions return 2 outputs so they don’t fit well with functional notation.
The examples are here to help translate the Forth reverse polish notation and stack movement which can make you feel dizzy when you are not used to it)
You can easily see how this is much, much faster than copying the bytes to a new string or buffer. There are no loops and no copying. In ASM you could transfer these new values to new registers or a better way would be to implement a stack using a register as a pointer and pushing the new values onto the stack for later use. Forth uses its own DATA stack for this purpose and as such the output of each string manipulation can be the input for the next string function.
Consider this expression where the output of LEFT$ is on the stack waiting for RIGHT$. Then TYPE takes the address and length from RIGHT$ and prints it on the screen:
A$ 20 LEFT$ 5 RIGHT$ TYPE
Using this technique we can re-create TI BASIC’s SEG$ with only four words!
: SEG$ ( addr len start size -- addr len)
>R \ push the size onto the rstack temporarily
/STRING \ cut the string using the start parameter
DROP \ NOT going to use the length as is, so drop it
R> \ move the SIZE parameter back onto the data stack
So if you do strings this way you do all the processing with the address and len and only copy back to memory when you are finished the string manipulation.
This makes a very fast system.
In the next installment we will use this method to do something you might see in LISP or Python and it runs pretty fast even on our TI-99.
Edited by TheBF, Thu Nov 9, 2017 10:55 PM.