Tutorial: Arrays in Forth

Willsy · October 14, 2017

Introduction

Arrays are something that you soon get to grips with in most programming languages. Just about every programming language supports them.

Except Forth.

Yes. It's a bit strange really, but Forth provides no built-in support for handling arrays of... well... anything, really.

So you can't do arrays in Forth, right? Wrong. The reason you don't get arrays "out of the box" in Forth is simply this:

"In Forth, you can implement your own arrays, and make them work just like you want them to."

In this article, I'm going to present my own solution, which is tiny and solves the "array problem" in a Forth-like way (i.e. making Forth do as much of the work for us, so that we don't have to).

An Array Syntax

The cool thing about Forth is that when you write Forth "code" you're just extending the language. Each new word that you write becomes part of the language, and we can use this (and indeed, you're supposed to) to our advantage. We're going to invent a little array handling lexicon (group of words) to give TurboForth support for 16-bit (1 cell) integer arrays.

Lets take a look at the typical syntax that one uses in BASIC and C for handling arrays of integers. Lets use cats as an example. We'll have a four element array that holds the number of lives for four cats: Tom, Dick, Harry, and er, Fred.

Here is how we might do this in BASIC and C:

Looks okay to me. How would we do this in Forth? Well, it's Forth, and unlike C and BASIC, we can extend the language any way we see fit. We just have to write the code and it just "becomes" part of the language.

When we declare VALUEs in Forth (a kind of variable that pushes its value to the stack when reference it) we give it an initial value via the stack, and a name, like this:

4 VALUE CATS

So, 4 goes on the stack, the word VALUE executes, and he (VALUE) knows he has to create a new entry in the dictionary for this new value. Where does he get the name from? Well, it comes immediately after the word VALUE, so VALUE itself reads the input in front of itself and gets the name and creates it. This would work well for us with arrays: It's Forthy and Forth users would be comfortable with it immediately. In Forth, it's "common sense".
Declaring an Array

How about this to declare an array:

size [] name

So, that's the size of the array (we're only going to talk about integer arrays), then that weird [] thing (which is what is actually going to create the array for us) and then the name of array, so that we can reference it in our code later. So:

9 [] lives

Would declare an integer array of 9 elements, called "lives". That [] symbol is just two square brackets. It's a Forth word (that we're going to create). Remember that in Forth you can call a word anthing you like, as long as it doesn't have a space in its name. So, we'll use [] as a kind of pictogram, since array access in C, Java and Pascal is accomplished with square brackets (BASIC being the odd one, as it uses parenthesis) so [] should be obvious to us that we're dealing with arrays.
Accessing Array Elements

Okay, so we have a way to declare an array. Now we need a way to write data into the elements of the array. Again, lets look at existing Forth practice for this:

Writing to the Array

In Forth, when writing to variables with the word ! we first place the value to store on the stack, then the variable to store it in, then the word ! takes this information off the stack and uses it to perform the write action, like this:

variable score
999 score !

Since anyone that uses Forth would be comfortable with this, it makes sense to use the same idiom if we can. However, we need three items to go on the stack:

The value to write into the array;
The element number in which to write the value;
The name of the array itself.

Like this:

4 5 lives

So, we want to write the value 4 into the 5th element of the lives array. But, just like ! with variables, we need a word to do the writing for us. Well, it's Forth convention to use the symbol ! to represent writing values to something, since ! has been used to write to variables since at least 1970. So, we really should pick something similar for our lexicon that follows this convention, so that it conveys meaning to the reader.

I elect (because I'm writing this article) the word/symbol [!]

4 5 lives [!]

Here, again we're using the [] nomenclature to indicate that we're dealing with an array, but we have that rather cool ! inside the brackets, which is synonymous with ! which means write. So, the above reads as:

Put 4 on the stack (the value we want to write into our array)
Put 5 on the stack (we want to access the 5th element of our array)
The name of the array;
The word [!] which does the actual writing to the array, using the data we have placed on the stack for it.

The stack signature for [!] would be:

value element array --

Which simply means that [!] exepcts the value, the element, and the array itself on the stack, and it removes them when it's done.

Reading from the Array

So, how to read from the Array? Well, when we read variables, we use the word/symbol @ (fetch). @ just needs an address on the stack, and then it uses that address, goes and reads that address and pushes whatever it finds there onto the stack. Again, we should use the same principle, as the concept is already established in Forth and we should avoid completely new paradigms and nomenclatures unless we have no choice; they're just harder to remember later on.

I elect [@] as our word to read from an array. It will need two items on the stack: The element number, and the array itself:

5 lives [@]

So, the above would read the 5th element from the lives array, and place it on the stack. The stack signature is:

element array -- n

So, after [@] executes, it removes the element number and the array reference from the stack, and leaves a value (n) for us - the value that it read from the array.

That's probably enough to get us started writing code. Strap in...

A Veritable Code Smorgasbord

Well, the first word we need to tackle is [] creates an array for us.

A Detour Into CREATE

There is a very useful word in Forth called CREATE which, er, creates stuff for us. Specifically, it creates words in the dictionary. We're going to go on a little detour here and discuss CREATE for while, because it does a little more than just creating a word in the dictionary for us. The words it creates are smart; they actually inherit some special built in behaviour: They push the address of themselves. Let's look at an example:

create chaos

If you type this directly into the TurboForth command line, it will dutifully obey you, and go ahead and create that word for you. You can now execute that word, straight away, which is a little weird, since the word has no code associated with it. Nevertheless, go ahead and try it:

As you can see, when we executed chaos something was pushed on the stack for us. What is that? Well, words created with CREATE reserve one cell (in TurboForth, a 16-bit (2 byte) word) of 'storage space'. In addition, words created with CREATE inherit some behaviour such that, when they are executed, they push the address of this storage space to the stack (if you are familiar with objects, you can think of words created with CREATE as objects of class CREATE, and they have a single static method, which pushes the address of its storage). In memory, it would look like this (let's assume chaos is stored at memory address B000 hex to keep things simple):

The link field is the link to the previously defined word in the dictionary and is not important to us. The length field just holds the length of the name (plus some other info, again not important to us). Then comes the 5 bytes of the name, plus a padding byte since it's an odd length. Then the Code Field. Now things are getting interesting. This is the address of some code that is executed when you execute chaos (this is where chaos' inherited behaviour comes from). So, when you execute chaos it will actually execute some code at address 63E0 (just chosen for illustration purposes - i've no idea where it goes in reality, and I wrote the TurboForth!).

What does that code do? It pushes the address of the parameter field. It's called a parameter field because we can store, well, a parameter in it. Personally I'd have called it a data field, but what do I know?

Lets prove it:

Here, we have created chaos, then we used ! to store 99 into it. Then we read it back.

Let's look at how this works:

Remember that chaos is "pre-defined" to push the address of its parameter/data field onto the stack when we execute it;
We put 99 on the stack;
We execute chaos, and he puts his data field address on the stack
We excute ! (store) which stores 99 into the address provided by chaos.

Then we read it back:

We execute chaos, so he pushes the address of his data field onto the stack;
We execute @ (fetch), so he uses the address supplied by chaos, reads that address, and puts the result on the stack;
We use . (dot) which takes the top item off the stack and prints it on the screen as a number. And there's the 99 that we stored.

If you're thinking "Hang on, that's just the same as a Forth variable" then congratulations. You're right. In fact the definition of VARIABLE in Forth is:

: VARIABLE CREATE ;

The [] Word

So, now that we know that know about CREATE, we can use it (and some other tricks) to build our array defining word for us, which we have called [].

We're going to built this up as we go, illustrating concepts as we go. Little steps.

As we determined above, we need to declare an array of a specific size (which is going to be passed in on the stack), so CREATE is only going to get us so far... We still need a way to reserve some storage space. I'll just go ahead and give you the code, and then we'll break it down. You'll probably have some "aha!" moments along the way. Later on, we're going to make it more sophisticated, but first we'll keep things simple. Here's the code for our array creation word, called [] (which I pronounce out loud as (drum roll...) "create array").

: [] ( size tib:"name" -- ) create dup , cells allot ;

Not a lot of code is it, but there's quite a lot going on here, as you're about to see. Hang in there. It's worth learning this stuff.

First, the stack signature:

size tib:"name" --

This indicates that [] needs a size on the stack, and also needs the name of the array in the Terminal Input Buffer (tib). What on earth is the terminal input buffer? It's the buffer that you are typing stuff into on the TurboForth command line. This is just a fancy way of saying that [] requires the name of the array to follow the [] itself, like this:

10 [] years

So, let's go through step-by-step, and see what happens when you type 10 [] years into TurboForth:

The number 10 goes onto the stack (no surprises there);
The word [] executes, which:
- Executes CREATE, which:
- Reads ahead in the Terminal Input Buffer and identifies the string years;
- Uses this string to create an entry in the dictionary - so we now have a new word in the dictionary called years;
Executes DUP. Remember that 10 that was placed on the stack? We need two of them, so we DUPd it. Now we have 10 10 on the stack;
Executes , (comma). Comma takes a value off the stack and compiles it (or "commas it" as we say) into the current compilation address in memory. What on earth is that? It's a pointer that allows the system to know where the next thing to be compiled should go. Have a look at the table above that shows the chaos word, with the parameter field at the end. Well, when CREATE finishes creating the word, the current compilation address (known as HERE) is pointing right at that parameter field. Now, when we execute , (comma) it takes the value 10 off the stack, and "comma's it" into memory, so it goes right into the parameter/data field. So, in our case, years has 10 sitting in its data field, which we just put there. Why did we just put that in there? Because, later on, we will need to know how big the array is so that we can do bounds checking (i.e. check that we don't do something stupid like write to the 100th element of a 10 element array). So, since we have stored the size of our array "inside" years, we'll be able to access it later. Recall that words created with CREATE push the address of their data field every time, which, in our case, will always contain the size of the array. Sorry to labour the point but it's critical to understand this.
Executes CELLS. This word simply takes what is on the stack (in our case, the first 10 that we typed in at the command line) and multiplies it by two. Why do we need to do that? Well, because next we're going to execute...
ALLOT. This is a neat word. It takes a number off the stack (20 - it was 10, but CELLS multiplied it by two) and reserves that many bytes of memory. We want to reserve cells, which are two bytes, (10 cells = 20 bytes) which is why we used CELLS convert a cell count into bytes). Internally, ALLOT "reserves" the memory by simply adding the value we just computed (20) to HERE (the current compilation address pointer). So, this is how memory would look after creating our array, years:

As you can see, HERE is now pointing at the address just after the last byte of our array. The next word that we create in the dictionary will start "HERE". That's why it's very important that we include bounds checking. If we didn't, then the word [!] (when we get around to writing it) could possibly destroy other words in the dictionary if we supplied an out-of-range value for the element number.

Let's use DUMP to actually look at the array in memory.

First, type COLD to reset TurboForth to a known state, and let it boot. Next, enter the definition for [] as follows:

: [] ( size tib:"name" -- ) create dup , cells allot ;

Now, create an array, like this:

10 [] years

Next, with the Utilities disk in DSK1, type 36 LOAD to load the dump utility. View the array in memory, like this:

' years 50 DUMP

The above line uses the word ' (pronounced tick) to get the address of the word 'years', we then put 50 on the stack, and call DUMP. DUMP takes the address off the stack, and the 50 (which is the number of bytes to display) and shows us the data.

See the 000A there in the second cell? That's the size of the years array, then 20 bytes of reserved space (here they have the value 20 (hex) but they could have any other value (ALLOT does not initialise the allotted space to zero)). After that, you can see the word _vect which is actually a word used by the DUMP utility. Now you can see that without bounds checking (which we haven't done yet) we could potentially write into any area of memory and destroy the dictionary.

Next, we'll add the ability to write to the array...

Writing to the Array with [!]

As discussed above in the Writing To The Array section, our [!] word will need three items on the stack. In order, these are:

The value to write into the array;
The element or index number in which to write the value;
And finally (at the top of the stack) the name of the array itself.

Therefore the stack signature is:

value index array --

As we can see, there is nothing on the output side of the stack comment, meaning that [!] will consume its arguments (they'll be removed) which is standard practice. Let's roll up our sleeves and build the definition up:

: [!] ( value index array -- ) swap cells + cell+ ! ;

So, remember when [!] executes, it expects three items on the stack, as shown in the stack comment.

The first thing it does is SWAP the index and array, such that the index is at the top of the stack. When we use [!] we'll be specifying the index in cells (16-bit words), so we execute CELLS which converts it to bytes.

Now we execute + which adds the newly converted number of bytes to the array address. The stack now looks like this:

value array_address

Okay, pretty good. However, we have a problem... We are storing the size of the array itself in the first cell of the array space, so we need to step over it. We need to add 2 to the address to make it point to the data-space immediately after the array size.

Now, we could simply execute 2 + (or, even better, 2+) to add two to the address, and that would be perfectly fine. However, we're going to use the word cell+ which actually does the exact same thing (adds 2 to the value at the top of the stack). So why use cell+ instead of 2+? Simply because cell+, as a synonym of 2+ describes the intention of the code better. It tells the reader that we're moving to the next cell in memory, rather than adding 2 to your Space Invaders score, for example.

Okay, so we now have an address at the top of the stack, and a value underneath it, which is all that the word ! ("store") needs to do its work, so next we execute ! and the value is stored in the correct place in the array. Let's try the whole thing from scratch. Type COLD into TurboForth to reset it, and enter the following code:

36 load
: [] ( size "name" -- ) create dup , cells allot ;
: [!] ( value index array -- ) swap cells + cell+ ! ;

The "36 load" loads the DUMP utility so that we can inspect memory.

Now type the following:

10 [] years
1949 0 years [!]
1970 1 years [!]
1977 2 years [!]
1978 3 years [!]
1980 4 years [!]
1981 5 years [!]
1985 6 years [!]
' years 50 dump

Here we can see the contents of our array. Here's what we've got:

Note the value 10 (000A) is there (the size of our array). We haven't added bounds checking yet, we'll get [!] and [@] working first, then we'll re-visit them to include bounds checking later.
Next we see 079D which is 1949 in hex (the year Mark Knopfler was born),
then 07B2 (1970, the year this author was born),
then 07B9 (1977, the year Dire Straits released their seminal first album of the same name),
then 07BA (1978, the year of Communique, Dire Straits' second album),
then 07Bc (1980, the year Dire Straits released their third album, Making Movies (you maybe noticing a bit of a theme here)),
then 07BD (1981, the year the TI-99/4A hit the market),
then 07C1 (1985, the year Dire Straits released Brothers in Arms).

Yes, I do know the year of release of every Dire Straits album, and the in-order track listing of every album. I know what you're thinking.

Reading from the Array with [@]

Okay, so far so good. We can write to the array. Now we need to add some code to read from the array. We'll prove it works then re-vist both [!] and [@] and add bounds checking later.

Here's the definition of [@]:

: [@] ( index array -- value ) cell+ swap cells + @ ;

Here's the break-down:

The first thing we do is add 1 cell (2 bytes) to the address of the array (remember, array actually leaves the address of its parameter field on the stack, which we're using to hold the size of the array).
Now we do a SWAP so the index is now on the top of the stack, where we can work with it.
Using CELLS we convert the index, which is expressed in terms of cells, into a byte value (as mentioned earlier, it just does a mulitply by two, so, for example, 5 CELLS represents 10 bytes).
We add the index and the array address together using +. We now have the address to read from on the stack.
We execute @ which reads from the address passed on the stack, and replaces it with the value read from memory.

And we're done.

That wasn't too bad was it? And, crucially, we've just defined our own way of handling arrays, which can be used in any program we write from now on.

Adding Bounds Checking

The above is all very well, but without bounds checking on our arrays, we run the risk of writing to areas outside of the array, with potentially disastrous results.

Consider this:

5 [] my_array
: test ." Hello Mother" cr ;
$2046 14 my_array [!]
$6174 15 my_array [!]
test

See? We can modify code anywhere in the system. Here, we modified a string, so we got away with, but we could have written into the executable code of a word and crashed the system easily. These types of bugs can be very difficult to find, because it would look the bug occurred inside the word that got blasted over, and not in the code that did the blasting!

Adding bounds checking to our word is very simple, and, since the bounds checking will be common to both words, we'll put it in its own word, and then [!] and [@] can both use it.

We'll create a word called [?] which checks the array index against the allowed size, and will either take no action if the index is legal, or abort if it's illegal.

So the stack signature for [?] will be:

index array -- index array

This is a little unusual, as this word does not consume or change its arguments. It's normal convention for words to consume their arguments. However, that would mean that we'd have to use 2DUP to duplicate the index and array in both [!] and [@], and I don't like that. So, I'm going to have [?] do it instead. If [?] detects an out of bounds access, it will abort with an error message and the program will stop running (and the stack will be cleared). If there is no bounds violation, the index and array address will remain on the stack un-changed.

: [?] ( index array -- index array ) over abs over @ > abort" Array index out of bounds" ;

Since both [!] and [@] are going to use this word, it needs to be defined first. Thus, the whole program together becomes:

: [] ( size tib:"name" -- ) create dup , cells allot ;
: [?] ( index array -- index array ) over abs over @ > abort" Array index out of bounds" ;
: [!] ( value index array -- ) [?] swap cells + cell+ ! ;
: [@] ( index array -- value ) [?] cell+ swap cells + @ ;

Before we discuss how the bounds checking works, it's worth taking a moment to reflect on how much code there isn't. The above code compiles to 128 bytes. That's with bounds checking. And not only that, we implemented arrays exactly the way we wanted to, and, we can change it later if there's something we don't like about it.

"In Forth, you have control over everything."

Okay, lets take a look at how [?] works, as things probably look a little confusing in there:

The first thing to note is that when [?] is called, the index and array (at least) are on top of the stack. In the stack diagrams below I'll use green to indicate stack items that were on the stack before the word executes, and red to indicate stack items that the word itself puts there as it's executing. If the bounds are being checked as part of [!] then we actually have value, index, and array address on the top of the stack, but we're only interested in the top two: the index, and the array address.

We use OVER to move a duplicate of the index "over" the array address, to the top of the stack:

index array index

Then we use ABS to check the index value. If the value is 0 or positive, the index will be un-changed. If the index is negative (for some bizarre reason) then ABS will change the index to a positive value. For example: -3 becomes 3.

Now we use over again move a duplicate of the array address "over" the index, to the top of the stack:

index array index array

Now we execute @ which uses the array address on the top of the stack and reads the size of the array, which, as we learned above, is stored in the parameter address of the array. So our stack looks like this:

index array index size

Next we execute > which compares the index to the size. When > executes it consumes the size and index and leaves a flag on the stack (either TRUE (-1) or FALSE (0). If the flag left on the stack is true, ABORT will abort the running program, and issue the error message. Otherwise it will take no action (apart from removing the true/false flag on the stack). Thus at the end of the word, if no index error occurs, then we are just left with the original index and array on the stack, which is used by both [!] and [@].

And there, dear Forthers, is a simple implementation of arrays with bounds checking in 128 bytes. Nothing fancy, nothing flash. Just what's needed.

That's Forth.

Edited October 17, 2017 by Willsy

+TheBF · October 14, 2017

That's a clean approach to arrays in Forth.

I have to come to the conclusion that is this kind of code is clearer and faster ultimately than using the Forth CREATE DOES> construct for arrays.

Nicely done.

Willsy · October 15, 2017

Thanks! A bit wordy/long-winded but I'm trying to give some insight into what's going on under the covers, and a bit about why we do things we the way we do!

It's a shame more folk aren't trying Forth - any of the incarnations!

+Vorticon · October 15, 2017

You need to get that TF manual finished!!!

+jedimatt42 · October 16, 2017

I feel like you get away with using abs in [?] because CELLS does something you didn't describe. Such as shifting the sign bit off the end when. Is that correct?

-M@

+TheBF · October 17, 2017

I think is more of a protection from a negative index getting into the mix.

Also the error detection uses '>" which is a signed operation so this implementation could only handle an array on **32K cells max.

Which of course in a stock TI-99 is big enough since the largest continuous block of memory is 24K

**(because any number bigger than HEX 7FFF (32,767) is treated as negative in signed operations on the 9900)

The definition for CELLS on a 16 bit CPU is just:

: CELLS ( n -- n ) 2* ;

And 2* is typically just duplicate the top of stack value and ADD the 2 together ;

: 2* ( n -- n) DUP + ; \ in high level Forth. But normally coded in Assembler

Nothing is hidden in Forth, but it can be a little harry sifting through all the Assembly language to find the answer.

Edited October 17, 2017 by TheBF

+Lee Stewart · October 17, 2017

I feel like you get away with using abs in [?] because CELLS does something you didn't describe. Such as shifting the sign bit off the end when. Is that correct?

-M@

Nothing special about CELLS . CELLS is a synonym for 2* as @Willsy indicated. It actually doubles the number on the stack by adding it to itself. The array index is a cell offset, but addresses are calculated using byte offsets, hence the requirement to double the cell count (index) to get a byte count, which is what CELLS does. Typing “ 2 CELLS ” leaves “ 4 ” on the stack.

Actually, I don't think ABS should be used at all. In the arrays under discussion, a negative index is an error. Testing the absolute value of a copy of the index may pass, but then allowing the use of the negative value cannot possibly be OK. I think there need to be two tests, one for too large an index and one for a negative index. The latter will catch indices so large they are negative.

...lee

+jedimatt42 · October 17, 2017

That all sounds good... Maybe Willsy wanted to support negative indexes... Or maybe because on a 64k byte addressable machine, 32k cells is 64k bytes.

I would agree that I would prefer to error on an attempt to use a negative number as an index into an array... or if you had a larger memory system, use an unsigned number.

It gets a little more significant if we decide to apply this technique to a bitset. Then you could easily wish to treat the 16 bit array index value as an unsigned number to get 64k bits.

Anyway, nit picking aside, I'm pretty sure the point was, you can build the semantics you want, for the application at hand. And I appreciate the illustration.

To date, I've worked with arrays in Forth more like you would in ASM. A heap of memory, carefully accessed with address arithmetic. But I like up-leveling like this.

I'm starting to think a domain specific language mindset and Forth go hand in hand.

Anyone every try HYPE on one of your Forths? https://web.archive.org/web/20120429040022/http://home.netsurf.de/helge.horch/hype.html

Thanks Willsy!

-M@

+TheBF · October 17, 2017

Have not tried HYPE, but there are quite a few OOP overlays for Forth.

Thanks for the link. I will give it a whirl. And yes. Domain specific languages are how Forth is used very frequently.

Nobody knows it's Forth under the hood (bonnet) because it looks like something else. :-)

OOP Implementations:

Bernd Paisons did one called minioof

https://www.complang.tuwien.ac.at/forth/gforth/Docs-html/Mini_002dOOF.html

The big one was a language called NEON for the Mac. It survives as MOPS Forth for the MAC.

It is pretty cool too.

Willsy · October 17, 2017

That all sounds good... Maybe Willsy wanted to support negative indexes... Or maybe because on a 64k byte addressable machine, 32k cells is 64k bytes.

I would agree that I would prefer to error on an attempt to use a negative number as an index into an array... or if you had a larger memory system, use an unsigned number.

It gets a little more significant if we decide to apply this technique to a bitset. Then you could easily wish to treat the 16 bit array index value as an unsigned number to get 64k bits.

Anyway, nit picking aside, I'm pretty sure the point was, you can build the semantics you want, for the application at hand. And I appreciate the illustration.

To date, I've worked with arrays in Forth more like you would in ASM. A heap of memory, carefully accessed with address arithmetic. But I like up-leveling like this.

I'm starting to think a domain specific language mindset and Forth go hand in hand.

Anyone every try HYPE on one of your Forths? https://web.archive.org/web/20120429040022/http://home.netsurf.de/helge.horch/hype.html

Thanks Willsy!

-M@

Yes, I was just protecting from using negative indexes, though, as Lee points out, it's not a good idea really. I mean, if you calculate an index of -3, making it +3 isn't really improving the situation is it? There's clearly something wrong!

So it would be better to check against the size of the array (as [?] already does) and less-than-zero.

I'll leave that as an exercise for the reader!

+TheBF · October 18, 2017

This kind of thing is commonly solved using the U< operator. ("un-signed less than" for those not familiar with Forth)

If the array size is 1000 for example, and I ask for index 500

500 1000 U< returns TRUE so it's a good index.

BUT... if I ask for index 1001

1001 1000 U< gives me FALSE (1001 is NOT less-than 1000) so I should abort with an error.

And if I ask for index -1 or -2 those numbers are actually 65,535 and 65,534 unsigned.

-1 1000 U< returns FALSE because it is actually 65535 1000 U<.

So U< lets you protect against negative values and "too high" positive values with one comparison.

Neat trick of twos complement arithmetic.

https://en.wikipedia.org/wiki/Two%27s_complement

+TheBF · January 12, 2018

That all sounds good... Maybe Willsy wanted to support negative indexes... Or maybe because on a 64k byte addressable machine, 32k cells is 64k bytes.

I would agree that I would prefer to error on an attempt to use a negative number as an index into an array... or if you had a larger memory system, use an unsigned number.

It gets a little more significant if we decide to apply this technique to a bitset. Then you could easily wish to treat the 16 bit array index value as an unsigned number to get 64k bits.

Anyway, nit picking aside, I'm pretty sure the point was, you can build the semantics you want, for the application at hand. And I appreciate the illustration.

To date, I've worked with arrays in Forth more like you would in ASM. A heap of memory, carefully accessed with address arithmetic. But I like up-leveling like this.

I'm starting to think a domain specific language mindset and Forth go hand in hand.

Anyone every try HYPE on one of your Forths? https://web.archive.org/web/20120429040022/http://home.netsurf.de/helge.horch/hype.html

Thanks Willsy!

-M@

Hey JediMatt,

I finally got around to seeing if I could compile HYPE and it requires the WORD-LIST wordset from Forth 94.

I have not finished implementing that yet so no go.

But it is a pretty small and effective OOP implementation for sure. Much smarter minds than mind created that one.

Thanks again.

+TheBF · April 11, 2021

This is an old thread but I was playing with an 8 Queens benchmark that needed the "classic" Forth array implementation so I thought I would explain them here.

These are super simple to code into your programs and work quite well. They are a typical "you are responsible for your own bugs" Forth way to doing things so there is no safety net.

Here the Forth interpreter is your friend to let you test each word of code before moving on.

Here are the canonical array implementation shown in numerous Forth tutorials:

: CARRAY ( n -- ) CREATE  ALLOT       DOES>  + ;
: ARRAY ( n -- ) CREATE  CELLS ALLOT  DOES>  SWAP CELLS + ;

Edit: Per comments below remove ALIGN from CARRAY definition

A character array is made with CARRAY. ARRAY creates an integer array. In both cases they take a size argument when you declare them.

To use them you give them an index argument followed by the name of the array and they return the address where the data is located, not the value in the array.

Here is how they work:

CREATE "creates" a label in the Forth dictionary. When that label is invoked it returns an address in memory. ( usually the address right after the text label)

ALLOT given an input argument n , allocate n bytes of memory in the Forth dictionary. This has the effect of giving the label we just "created" more data space. (Like BSS directive)

ALIGN

Align the dictionary pointer variable to a CPU "CELL" (integer) boundary. Not needed for an integer ARRAY, but needed if you declare an odd number of bytes in a CARRAY. ALIGN will force the dictionary ahead one byte to an even address. Like EVEN in Assembler.

All of the above happens at compile time, when you declare a new array in your program like this:

DECIMAL
500 CARRAY MYCHARS

DOES> is the action that MYCHARS "does" when you use its name in your program.

In the case of a CARRAY it "does" an addition. But what will it add?

CREATE remember, will return its data address to us. It turns out that this is the base address of our CARRAY. When we invoke MYCHARS, that address just plops onto the data stack for us.

But to use MYCHARS you had to give it an index, so that index was sitting on the data stack first and the address from CREATE plops down on top of it.

That's the very things we were looking for!

Now '+' just adds the top 2 numbers on the data stack and leaves the sum sitting on data stack.

When its finished you have: index+base = address_of_my_byte!

ARRAY is only a bit more complicated because we want to address 2 bytes at a time for 16 bit integers.

The Forth word CELLS is used to compute memory units or "cells" as they are called in Forth.

For our 9900 CPU CELLS just multiplies by two. So 2 CELLS gives you 4 bytes. (On a 32 bit machine 2 CELLS returns 8 and on a 64 bit CPU 2 CELLS returns 16)

So when we create an ARRAY we use CELLS to compute the size in bytes that will be ALLOTed. Using CELLS properly instead 2* means your code can work on 16,32 or 64 bit machines.

At run-time our data stack looks the same with the index 2nd and the address on top.

We use SWAP to reverse the order, CELLS multiplies the index by 2 and then '+' adds the two values together like CARRAY. So on the top of the stack we have that address of our data.

We get our data from the ARRAY with the fetch operator. '@' and we store it with the store '!'.

For CARRAY we use the character fetch 'C@' and the character store 'C!'.

Like this:

DECIMAL
 1000 CARRAY Q     \ make the byte array called Q
 99 6 Q C!         \ put 99 in the 6th location
 6 Q C@ .  ( 99)   \ fetch the byte and print it

1000 ARRAY T       \ make the array T
  1234 3 T !       \ put 1234 into the 3rd cell
  3 T @ .  ( 1234) \ fetch the 3rd cell and print it

This is already longer than I expected so next time I will show how simple it is to double the speed of accessing these arrays with a couple of machine code instructions.

+Lee Stewart · April 11, 2021

8 hours ago, TheBF said:
Here are the canonical array implementation shown in numerous Forth tutorials:
: CARRAY ( n -- ) CREATE  ALLOT  ALIGN     DOES>  + ;
: ARRAY ( n -- ) CREATE  CELLS ALLOT  DOES>  SWAP CELLS + ;

Though polite of the CARRAY definition, ALIGN seems superfluous. The current dictionary pointer reported by HERE is never guaranteed to be word-aligned, which is why CREATE always aligns a definition.

...lee

+TheBF · April 11, 2021

1 minute ago, Lee Stewart said:

Though polite of the CARRAY definition, ALIGN seems superfluous. The current dictionary pointer reported by HERE is never guaranteed to be word-aligned, which is why CREATE always aligns a definition.

...lee

I was thinking more about after ALLOT.

: ALLOT    DP +! ;

17 CARRAY TEST

In my system would leave DP on an odd address.

However, the really solution would be to preface my DOES> code with ALIGN. That's an accident waiting to happen.

It seems you've done it again Dr. Raid.

Thank you.

+Lee Stewart · April 11, 2021

4 hours ago, TheBF said:
I was thinking more about after ALLOT.
: ALLOT    DP +! ;
17 CARRAY TEST

In my system would leave DP on an odd address.

My point is that it does not matter that CARRAY might leave the new character array on an odd, non-word boundary. It should never be assumed that the dictionary pointer is on a word boundary. That is why CREATE always aligns a new definition to a word boundary.

4 hours ago, TheBF said:

However, the really solution would be to preface my DOES> code with ALIGN. That's an accident waiting to happen. It seems you've done it again Dr. Raid. Thank you.

You give me far too much credit! ? I fail to see how ALIGN would be doing anything useful after DOES> —the address of the first array element returned by executing the array’s name should have been aligned by CREATE when the array word was defined.

...lee

+TheBF · April 11, 2021

I have not tested this but CREATE aligns the start of the array to an even address boundary as you said.

Then we do 17 ALLOT and the dictionary is pointing to an odd address.

Next we run DOES> and in my case it "does" this: (cross-compiler T['] behaves as expected. XIMMEDIATE is cross-compiler IMMEDIATE)

: (;CODE) ( -- )  R> LATEST @ NFA>CFA !  ;

: DOES>    ( -- )
           COMPILE (;CODE)
           06A0 COMPILE, T['] DODOES COMPILE,   \ compiles: BL @DODOES
           ; XIMMEDIATE

My concern is that with the dictionary on an ODD address will some code be compiled on an odd boundary?

As I stare at it now I think my fears are un-founded ... I think.

I used to be confused, but now I am not sure.

+Lee Stewart · April 11, 2021

3 hours ago, TheBF said:

My concern is that with the dictionary on an ODD address will some code be compiled on an odd boundary?

As I stare at it now I think my fears are un-founded ... I think.

Comma ( , ) might be the only problem. It will compile a word on an even boundary that steps on the previous byte, but dutifully advances HERE to the next odd address. Any word that uses CREATE will align to an even boundary, however.

...lee

+TheBF · April 14, 2021

I promised that I would talk about a way to make vanilla arrays faster in Forth. This is applicable to most Forth systems but it requires that you understand a little assembly language and that you understand how your Forth uses the CPU registers internally.

A Little Background

Most Forth systems for TI-99 use an idea called Indirect threaded code. (ITC) Without delving into all the dirty details we can summarize by saying that every Forth word has an internal structure that includes the address of some real machine code (native CPU instructions). We can call this address the CODE FIELD address (CFA) and I call the code it points to the EXECUTOR. (I asked on comp.lang.forth what it is called and there seemed to be no consensus on a good name IMHO so this is my choice)

In the text below is a simplified illustration of the dictionary entry for a Forth word. The name of the word is a string with the length typically kept in the first byte.

Each <word> below is the address of another Forth word and the definition ends with the address of <EXIT> which is some code that works like RETURN in BASIC.

The <EXECUTOR> is special. It is the address of real machine code and is actually the interpreter for whatever follows in the Forth word.

<"FORTHWORD"><EXECUTOR><word><word><word><EXIT>

Each type of Forth word points to its own EXECUTOR:

Constants contain the address of a little routine called DOCON
Variables contain the address of a little routine DOVAR
Colon definitions have the address of a routine call DOCOL

Those EXECUTORs are predefined in your Forth system but wouldn't it be cool if you could plug in your own EXECUTOR code for words that you write?

This can be done with the word ;CODE .

Here is the definition of CARRAY (For Camel99 Forth only) where we change the EXECUTOR code to do what we really want, which is, add the index from the data stack to the base address of our array:

: CARRAY ( n -- )
      CREATE  ALLOT        \ compile time
;CODE
      ( n -- addr)         \ RUN time
      W TOS ADD,           \ Working register W has parameter field
      NEXT,
      ENDCODE

Just as before we CREATE the name in the dictionary and ALLOT some memory bytes for the array.

We end that FORTH definition with ;CODE which turns off the compiler (because we are done with that) and so now we are interpreting the text that follows.

(That's correct, the Forth Assembler is interpreted)

Notice there is only one instruction which adds W to the TOS. What does it mean?

W is the "working register" of the Forth system. (R8 is used in Camel99 Forth)

What is important is that after ;CODE the W register holds the data address of the last word that was "CREATEed".

- The data address is the next cell after the CFA and was called the parameter field in earlier years but is now called the DATA field.

TOS is another name for the top-of-stack in Camel99 Forth. It is actually R4 so another way to understand that code is:

R8 R4 ADD,

So at the end of CARRAY after ;CODE we have assembled one machine instruction into memory.

Camel99 Assembler requires that we add the NEXT, macro before ENDCODE so our code returns to Forth.

(Other systems may do this differently. Check your docs)

Here is the trick:

When we declare a CARRAY, ;CODE knows the address of where we assembled our little ADD instruction
It takes that address and stuffs it into code-field-address of every CARRAY that we declare
This means that when we RUN a CARRAY it automatically takes a number from the top of the data stack and adds it to the array address leaving the result on the top of the data stack

The net effect is that our character arrays are about 90% faster.

In practice we don't always want to compile the assembler into our Forth system just to assemble one instruction so we can replace the assembly language with machine code like this:

( The comma "compiles" a number from the DATA stack, into the next available memory in the Forth dictionary and advances the dictionary pointer)

: CARRAY ( n -- )
      CREATE  ALLOT        \ compile time
;CODE ( n -- addr)         \ RUN time
      A108 ,  \ W TOS ADD,  
      NEXT,
      ENDCODE

Integer arrays require only one more instruction to multiply the index by 2 which is easily done with another add instruction

: ARRAY ( n -- )
      CREATE  CELLS ALLOT      \ compile time
;CODE ( n -- addr)             \ RUN time
       A104 ,   \  R4 R4 ADD,  \ 2*  ie: CELLS
       A108 ,   \  W  R4 ADD,  \ base-address+tos=address'
       NEXT,
       ENDCODE

In the next release of Camel99 Forth the code above will be in DSK1.ARRAYS

Remember that Turbo Forth and FbForth also have a W register, but it is a different register number and they do not keep the top-of-stack in a register.

You will use the stack pointer register for that purpose.

Something like this for CARRAY.

W *SP A,

The effect will be the same. Much faster array access.

Keep the Forth

+Lee Stewart · April 14, 2021

1 hour ago, TheBF said:
Integer arrays require only one more instruction to multiply the index by 2 which is easily done with another add instruction
: ARRAY ( n -- )
      CREATE  CELLS ALLOT      \ compile time
;CODE ( n -- addr)             \ RUN time
       A104 ,   \  R4 R4 ADD,  \ 2*  ie: CELLS
       A108 ,   \  W  R4 ADD,  \ base-address+tos=address'
       NEXT,
       ENDCODE

With a one-bit shift instruction using one less memory access than an add instruction, the following should be faster:

: ARRAY ( n -- )
      CREATE  CELLS ALLOT      \ compile time
;CODE ( n -- addr)             \ RUN time
       0A14 ,   \  R4 1  SLA,  \ 2*  ie: CELLS
       A108 ,   \  W  R4 ADD,  \ base-address+tos=address'
       NEXT,
       ENDCODE

This, sadly, does not obtain for fbForth or TurboForth because SLA, only works on registers.

...lee

+TheBF · April 14, 2021

Thanks for that. I will amend accordingly.

+TheBF · January 28, 2022

String Arrays in Forth

I am watching with interest the exploration of @dhe as he works out the details of using C99. The latest work on string arrays made me wonder how we might do this in Forth.

Disclaimer: These examples are all in Forth '94 dialect and will need some tweaking to work on FbForth or TurboForth but either system is perfectly capable of doing this.

So this code creates string literals as a tight list. The start of each of these strings is the length stored in a byte. These are called byte counted strings and are also used in Pascal. A cool trick with these is you can put them end to end in memory and use the length byte to jump through the list. This the advantage of making this kind of array.

The disadvantage is you should not change the strings once they are compiled into memory. (you could but its easy to crash the system this way)

These are a bit different than the C string arrays @DHE made.

The C arrays are of a fixed width because they are using the [,] operators to make a fixed size 2D array in memory.

This also means you have to be careful. If you dimension the array with each string 10 bytes long and then later write an 11 byte string into that array an "undefined behavior" might occur. (It's not a good thing usually)

So as with all things Forth you first have to teach the machine how to do what you want. There are no arrays, just memory.

The amazing thing to me is how we can create something very functional with the simple operations Forth gives us, with a little imagination.

Here is what it took to make arrays and then define one analogous to the C string arrays.

Spoiler


\ diy compact string Literal array                       Jan 27 2022 Brian Fox

\ this word is in the kernel.  Shown for reference
\ Allocate u bytes, "place" string at caddr in memory as byte-counted string
\ : S,  ( caddr u -- ) HERE OVER 1+ ALLOT PLACE  ALIGN ;

: ,"       [CHAR] " PARSE  S, ; \ parse until " and compile into memory

: NEXT$   ( $addr -- $addr')  COUNT + ALIGNED ;
: NTH$    ( $list n -- $addr)  0 ?DO  NEXT$  LOOP ; \ GOTO the nth string

\ syntactic sugar
: LEN  ( $addr -- ) C@ ;   \ fetch the length byte of a string
: {      0 , ;  \ compile 0 to start array
: }      {   ;  \ compile 0 to end array

\ only now can we make the array because we taught Forth how to do it
CREATE []STR
     {
       ," first string"   ," second string"  ," third string"
       ," fourth string"  ," fifth string"
     }

And here is some example code that does something with the strings

Spoiler


\ code to do something
: PRINT$  ( $addr --) CR  COUNT TYPE ;

[]STR 3 NTH$ PRINT$
[]STR 2 NTH$ PRINT$

\ higher order function to do things to each string in an array
: [ALL]  ( addr Xtoken --)
        >R
        BEGIN
          NEXT$ DUP LEN
        WHILE
          DUP  R@ EXECUTE
        REPEAT
        R> DROP
;

[]STR ' PRINT$ [ALL]

Screen shows what happened

apersson850 · January 28, 2022

5 hours ago, TheBF said:

These are called byte counted strings and are also used in Pascal. A cool trick with these is you can put them end to end in memory and use the length byte to jump through the list. This the advantage of making this kind of array.

The disadvantage is you should not change the strings once they are compiled into memory. (you could but its easy to crash the system this way).

What you are doing here isn't a string array, though, but a list containing strings. An array does allow random access of any item inside, without having to traverse the list first.

If you want to both have a flexible memory usage and be able to replace a string, the typical approach in a language like Pascal is to use a linked list. Then each data item contains a string as well as a pointer.

For this to be truly flexible, you need to keep track of the string, the pointer to the next string and the number of words allocated to each variable.

In UCSD Pascal, they are created by varnew, a command which allocates any number of words on the heap. To release the memory, you must use vardispose. So this is similar to using the standard functions new and dispose, except that the first pair allows the number of words allocated to be undetermined at compile time.

Takes more time to handle, of course, but if memory is at a premium, it may still be worth it. And by handling the pointers, you can delete a string in the list, insert a string and, combining both, replace a string with one of a different length.

Willsy · January 28, 2022

Agree with @apersson850 's interpretation - strictly speaking this is a list of strings, but it is neat, and a lovely example of how something useful can be created with very little code, extending the language with a new feature. Nice. :thumbsup:

For TurboForth, as you suspected, some changes were required:

Had to add ALIGNED;
Had to add PLACE - took a while to find the definition. Found it in the end in a proposal doc;
Changed S, to suit.

TF doesn't have ?DO built in (used in the definition of NTH$). It's included as a loadable extension - block 41 on the boot disk (at least on my version!). However, for this code I simply replaced it with DO - not as robust. Just don't call NTH$ with an empty list :ahoy:

Here's the TF version:

\ Jan 27 2022 Brian Fox - modded for TurboForth 2022-01-28 M.Wills

\ a-addr is the first aligned address greater than or equal to addr.
: ALIGNED ( addr -- a-addr ) 1+ $FFFE AND ; \ added

\ place the string identified by c-addr1 u as a counted string at c-addr2
: PLACE ( c-addr1 u c-addr2 -- ) 2dup c! 1+ swap cmove ; \ added

\ Allocate u bytes, "place" string at caddr in memory as byte-counted string
: S, ( caddr u -- ) HERE OVER 1+ ALLOT PLACE ALIGN ;

: ," [compile] s" s, ; \ parse until " and compile into memory

: NEXT$   ( $addr -- $addr')  COUNT + ALIGNED ;
: NTH$    ( $list n -- $addr)  0 DO  NEXT$  LOOP ; \ GOTO the nth string

\ syntactic sugar
: LEN  ( $addr -- ) C@ ;   \ fetch the length byte of a string
: {      0 , ;  \ compile 0 to start array
: }      {   ;  \ compile 0 to end array

CREATE []STR
     {
       ," first string"   ," second string"  ," third string"
       ," fourth string"  ," fifth string"
     }

The test code has an additional DROP at the end. I think you have a small bug-ette in the test code. From my observations, you need an extra DROP at the very end. You are cleaning up the XT that you persisted on the return stack, but fail to drop the DUPed address returned by NEXT$ (if I'm reading it correctly). In my investigations, the value left on the stack when [ALL] exits was a pointer to the 0 termination value at the end of the string list. In any case, adding an additional DROP at the end fixes it:

\ code to do something
: PRINT$  ( $addr --) CR  COUNT TYPE ;

[]STR 3 NTH$ PRINT$
[]STR 2 NTH$ PRINT$

\ higher order function to do things to each string in an array
: [ALL]  ( addr Xtoken --)
        >R
        BEGIN
          NEXT$ DUP LEN
        WHILE
          DUP  R@ EXECUTE
        REPEAT
        R> DROP DROP
;

[]STR ' PRINT$ [ALL]

Thanks Brian! That was a fun distraction for much longer than it should have been! I'm really quite rusty with Forth these days. :roll:

Edited January 28, 2022 by Willsy

apersson850 · January 28, 2022

29 minutes ago, Willsy said:

Agree with @apersson850 's interpretation - strictly speaking this is a list of strings,

I don't want to be nit-picking, but I don't want to spread the wrong terminology either, to those who aren't fully familiar with these concepts.

Tutorial: Arrays in Forth

Recommended Posts

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Join the conversation

Recently Browsing 0 members