
Pascal on the 99/4A


apersson850


On 8/24/2020 at 6:06 AM, Vorticon said:

The array tmpboard, while declared as [11..88] of integer

 

What kind of information are you keeping in that array?  Namely, will bytes work rather than integers?

 

It may be more efficient for the compiler to generate access to bytes in the array by avoiding a left shift of the index to address integers.


9 minutes ago, BillG said:

What kind of information are you keeping in that array?  Namely, will bytes work rather than integers?

 

It may be more efficient for the compiler to generate access to bytes in the array by avoiding a left shift of the index to address integers.

Just integers 1-16. I don't believe UCSD Pascal supports byte types though.


You have to declare it as

 

type
  bytes = 0..255;
  bytearray = packed array[0..88] of bytes;

 

However, the system's procedures to handle packed arrays will usually cost more than the normal indexing. But in this special case, the compiler is smart enough to realize that this is a byte, not a value with some odd number of bits. Thus finding the correct value will generate one p-code, LDB (load byte). This is clearly the most efficient way to address an array with byte-size values.

 

Addressing arrays in general allows for any size of each element. The p-code IXA (index array) handles the address calculation. For an array of integers, the word index into the array is the same as the index itself. But since IXA is a general array index calculator, it will multiply (using assembly instruction MPY) the index you supply with the size of each element (in this case 1), then shift the result left one bit, then add this to the array base, to find the memory address of the desired value. This address is left on the stack, where it's picked up by SIND (short index and load), which returns the actual value.

 

So in this case, using the array as suggested above uses about half the number of (assembly) instructions for each access, and avoids the long MPY instruction altogether. Thus it's both faster and saves memory, even compared to indexing the integers from 11 to 88.

Using a different base than zero, on the other hand, will generate two additional instructions for each access: one to load the base index, the other to do an integer subtraction. That wastes everything you gained from using a byte-sized array.

However, running with range checking enabled will cost you more than the longer of the two alternatives for each access, and that's regardless of how you make up the array itself. Once you know your code is working, and you don't get any index out of range errors, you can turn off range checking with the (*$R-*) compiler directive.

 

To summarize, using a packed array of bytes and turning range checking off will reduce the time to access each element to about 1/5 of what you do now.

You can easily optimize the code on the Pascal level, without losing readability, but it requires that you are well aware of what the compiler will do with your code, and which p-codes the interpreter runs most efficiently.
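Putting the advice in this post together, here is a minimal sketch of what the declaration and access could look like (the name tmpboard is borrowed from the earlier posts; the surrounding program is hypothetical):

```pascal
program boardtest;
(*$R-*)  (* turn range checking off once the code is known to work *)
type
  bytes = 0..255;
  bytearray = packed array[0..88] of bytes;
var
  tmpboard : bytearray;
  i, v : integer;
begin
  for i := 0 to 88 do
    tmpboard[i] := 0;      (* byte-sized element: no IXA/MPY word indexing needed *)
  tmpboard[11] := 16;      (* a subrange is type compatible with integer *)
  v := tmpboard[11] + 1
end.
```

Note the zero base: per the discussion above, a non-zero base like [11..88] would add a load and a subtraction to every access.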


So there is no true byte type in UCSD Pascal as in Turbo Pascal, but you can essentially create one.

8 hours ago, BillG said:

Sure it does.  Use array of char and ord() to "convert" to integer.

Clever! Did not think of that :)
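A minimal sketch of that char trick (the names are made up for illustration; chr() is assumed to be available, as in standard Pascal):

```pascal
program chartrick;
type
  board = packed array[0..88] of char;
var
  b : board;
  v : integer;
begin
  b[0] := chr(16);   (* store the number 16 as a char *)
  v := ord(b[0]);    (* read it back as an integer *)
  writeln(v)
end.
```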

1 hour ago, apersson850 said:

You have to declare it as

type
  bytes = 0..255;
  bytearray = packed array[0..88] of bytes;

To summarize, using a packed array of bytes and turning range checking off will reduce the time to access each element to about 1/5 of what you do now.

This is truly very valuable information and extremely relevant to my project because it's all about accessing and handling arrays. It should be fairly trivial to make the changes since only the declarations change, not the handling of the arrays. I'm going to run some tests and see what kind of speed gains I get.

 


Good! Then the effort to make the post paid off immediately.

 

Yes, by declaring byte = 0..255 you get the same thing as is predefined in some systems. There's no difference once you use it. But the minimum allocation unit for an array, unless it's a packed one, is one word (16 bits). That's also the minimum allocation unit for a packed record. So two bytes can be packed into one word, but two bytes and a boolean will require two words (32 bits).
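As a sketch of those allocation rules (the type names are made up):

```pascal
type
  bytes = 0..255;
  twobytes = packed record      (* two bytes pack into one 16-bit word *)
    lo, hi : bytes
  end;
  threefields = packed record   (* two bytes + a boolean: two words (32 bits) *)
    lo, hi : bytes;
    flag : boolean
  end;
```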

 

The ord() operator in UCSD Pascal is a bit odd. Together with odd() and char(), it allows not only type casting, but also bitwise boolean expressions.
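A sketch of that idiom, assuming the UCSD behavior described above (odd() reinterprets an integer as boolean without changing the bits, the boolean operator then works on the whole word, and ord() casts the result back):

```pascal
program bitops;
var
  a, b, m : integer;
begin
  a := 255;
  b := 129;
  m := ord(odd(a) and odd(b));   (* bitwise AND of a and b *)
  writeln(m)
end.
```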

If you haven't got it, I recommend Advanced UCSD Pascal programming techniques by Eliakim Willner and Barry Demchak. Although written with version IV.13 in mind, a lot of what's in there is relevant to IV.0 as well.

If you can't find it, send a PM to me.

 

The idea above, to define an array of char and then use ord() to read the numeric values, works. But it's an unnecessary deviation from the fact that the data is numeric to begin with. In UCSD Pascal, a subrange (0..255) is always type compatible with the base type (integer), as well as with other subranges of the same base type.


39 minutes ago, Vorticon said:

Unfortunately turning range checking off, changing the array's base index to 0 and using a packed array of bytes for the board representation only resulted in a minimal saving in overall processing time. Oh well...

I believe Aperson showed the p-code interpreter is running about 6-8 instructions for each p-code primitive. That's pretty big overhead at the glacial pace of the 9900.

I have never looked at the GPL interpreter, but it is probably about the same or maybe a bit more, based on the execution speed of GPL.

 

I guess you need a native code Pascal compiler. 


On 9/2/2020 at 1:46 AM, TheBF said:

I believe Aperson showed the p-code interpreter is running about 6-8 instructions for each p-code primitive.

Yes, that's correct. But that also implies that if you can remove a few p-code instructions from a sequence, you also remove quite a lot of assembly instructions.

For the user of one single machine, the advantage of the p-codes is that they are compact. Native code is usually several times larger. So the speed drop from interpretation does give the advantage that even 48 K RAM computers can run pretty big programs. Even without using the segmentation capability of the p-system.

So no, a native code Pascal compiler isn't what a TI 99/4A needs. For computers with larger memories, yes. But not for this little machine.

What you can do for this machine is identify some critical code and rewrite that in assembly.

 


4 hours ago, apersson850 said:

So no, a native code Pascal compiler isn't what a TI 99/4A needs. For computers with larger memories, yes. But not for this little machine.

What you can do for this machine is identify some critical code and rewrite that in assembly.

 

It would depend upon whether you want to create a large program or a small and fast one.


If the program is small, then it's usually also pretty fast, as there isn't very much code to run through.

Unless it's a small activity that's executed in a loop, taking many turns, in which case the task to translate the inner core of the loop to assembly usually isn't too daunting.

 

No, what our system needs is a native code generator, which can translate the major part of a p-code routine to assembly automatically. Then you can write, debug and also use the program as it is. But if you do want to sacrifice some memory for speed, you can let the NCG convert some routines, where the program spends most of its time, to machine code.


2 hours ago, apersson850 said:

No, what our system needs is a native code generator, which can translate the major part of a p-code routine to assembly automatically.

To your point, I made a little optimizer that expands Forth words that you choose to expand. By expand I mean copying the machine code from the kernel routines inline into memory.

When I tested it in inner loops, I could get up to a 2X speed improvement on those parts of the program.

 

So the ideal would be an NCG that makes a linkable UNIT for UCSD Pascal?

 

 


I've never used any NCG. I don't even know if any exists for the TMS 9900. But as far as I've read about them, it seems you start by compiling a Pascal program with a compiler directive to prepare for native code conversion. This directive points out the routines you've deemed interesting to convert. Then you execute the NCG, with the compiled code file as input, and get a new code file, where the appropriate procedure(s) have been converted to native code.

 

For the TI 99/4A, the compiler ignores the code conversion directives, so I don't know exactly what they should accomplish.

But the p-code interpreter does understand the codes indicating that native code is coming in the instruction stream. I'm actually, slooowly, working on a program which would do a limited conversion (just a few instructions handled), to investigate the feasibility of doing this on the TI 99/4A. If that works, then an expansion to convert more instructions isn't very difficult. The issue is to get it to work at all.


11 hours ago, apersson850 said:

I've never used any NCG. I don't even know if any exists for the TMS 9900. But as far as I've read about them, it seems you start by compiling a Pascal program with a compiler directive to prepare for native code conversion.

Thanks for that explanation.

That would be really something for the TI-99. Creating good native code generators is very interesting work. Not for the faint of heart. :) 

 

So if the system can understand that "native code is coming", do we know what form it is expecting? (binary data?) 

If we knew that, then in theory you could write the code in Assembler or even a Forth cross-compiler could be adapted to make a native code block.

I have interfaced Forth to Pascal calling conventions under DOS and it was pretty simple since everything is passed on the stack "behind the curtain" in Pascal.

It might be a whole lot trickier connecting correctly to the UCSD VM however. I have no idea on that.

 

 


I'm not sure if I misunderstand you, or you misunderstood me.

But there's no reason whatsoever why the NCG shouldn't be written in Pascal. This has nothing at all to do with any stack in the p-system. You read the code file, modify the procedure you want to convert, then write the file to disk. When you load it to execute it, the converted parts will run as native code. The other parts will still be p-code. Thus you only need to convert the time-critical parts. There's of course no point in converting parts that are mainly waiting for user input, or only run once to set up some data, etc. The p-system allows single procedures, or even parts of single procedures, to be converted, while the rest remains p-code. The existing native code converters don't convert instructions like "call procedure in external segment", since those run so many instructions (several hundred) that the PME overhead is negligible anyway.

The code to insert instead of the p-code isn't difficult to figure out. At least for a first version, simply patch in a copy of what the PME would execute. Then the overhead of finding the interpreter sequence is removed, but the actual useful code is still executed in the same way.

It is also possible to start analyzing and do further optimization of the generated code, but that's nothing I intend to aim for to begin with.

 

The code to modify will be converted so that a program like this one:

 

Read second local variable to stack    p-code
Add one to value on stack              p-code
Load 22 on stack                       p-code
Multiply TOS by TOS-1                  p-code
Store in second local variable         p-code
Return from program unit               p-code

 

Will instead be converted to something like this (commented with corresponding p-code):

 

Native code following      p-code ; The PME will here branch to the word following this instruction, which is assumed to contain machine code
    DECT SP                ; Read second local variable to stack
    MOV  @4(RECP),*SP
    INC  *SP               ; Add one to value on stack
    DECT SP                ; Load 22 on stack
    LI   R0,22
    MOV  R0,*SP
    CLR  R2                ; Multiply TOS by TOS-1
    MOV  *SP+,R2
    MPY  *SP,R2
    MOV  R3,*SP
    MOV  *SP+,@4(RECP)     ; Store in second local variable
    BL   *R11              ; Return to the p-code interpreter, which will continue with the p-code right after this instruction
Return from program unit   p-code

 

The code snippets above aren't "perfect", i.e. they aren't entirely authentic. But they do show the principle of the code conversion.

We can also see that 6 bytes are replaced by 30, and roughly 48 instructions are replaced by 18. That gives you an idea about the trade-off in size vs. speed.


My confusion was more around how you modify the existing compiler to understand 9900 Assembler opcodes, or how you convince the compiler to convert Pascal to native code.

I assumed that changing the compiler is not possible at least for TI-99, so then I wondered about writing the code manually and putting the binary in system friendly form.

 

If my assumptions about the flexibility of the compiler are wrong then problem solved. :) 

 

 

 


You don't modify the compiler. You let the compiler generate the p-code, which is easier than machine code, since p-code is specifically designed to make a compiler's code generation for a language like Pascal easy. Remember that one of the main pillars in the foundation of the UCSD p-system was usefulness on small (by late 1970's standard) computers.

Then, when you have a code file with p-code, you let the NCG loose on that, and let it convert p-codes to machine code. Which also is a reasonably simple task, since most p-codes can be executed with less than ten machine instructions. Some with only one.

This means that the original code file is intact, since you make a new code file with the converted code. Today, it doesn't matter so much, as the p-system as a whole is dead in everyday life, but back then, that implied that you still had a code file you could run on any machine with a p-system installed. Then you had another, larger but faster, code file that would only run assuming a certain CPU. So you lost no portability, but had the option to gain speed (sacrificing memory).

 

Since the code file would grow substantially, you normally would identify the most time critical parts of the program, convert those and only those.

 

In theory you could convert the p-codes manually, but that's very tedious. Since it will also change the structure of the code files, due to the converted code requiring more space, it's very much more convenient to do it with a program.


Thank you. That makes perfect sense. Building the NCG would be an interesting, albeit significant, project I am sure.

 

I found this in Wikipedia:

 

Niklaus Wirth specified a simple p-code machine in the 1976 book Algorithms + Data Structures = Programs.

The machine had 3 registers - a program counter p, a base register b, and a top-of-stack register t. There were 8 instructions:

  1. lit 0, a : load constant a
  2. opr 0, a : execute operation a (13 operations: RETURN, 5 math functions, and 7 comparison functions)
  3. lod l, a : load variable l,a
  4. sto l, a : store variable l,a
  5. cal l, a : call procedure a at level l
  6. int 0, a : increment t-register by a
  7. jmp 0, a : jump to a
  8. jpc 0, a : jump conditional to a
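The quoted instruction set is small enough that a toy dispatch loop fits in a few lines of Pascal. This sketch handles only lit, the add case of opr, int and jmp, and is purely illustrative (Wirth's book is the authoritative source; in his numbering, opr 0,2 is addition):

```pascal
program toypm;
type
  instr = record
    f : integer;  (* 0=lit, 1=opr(add), 5=int, 6=jmp *)
    a : integer
  end;
var
  code : array[0..9] of instr;
  s : array[0..99] of integer;  (* the evaluation stack *)
  p, t : integer;               (* program counter, top of stack *)
  halted : boolean;
begin
  (* program: push 20, push 22, add, then jump past the end to stop *)
  code[0].f := 0; code[0].a := 20;
  code[1].f := 0; code[1].a := 22;
  code[2].f := 1; code[2].a := 2;   (* opr 2 = add *)
  code[3].f := 6; code[3].a := 10;  (* jmp out of the program *)
  p := 0; t := -1; halted := false;
  while not halted do
  begin
    with code[p] do
    begin
      p := p + 1;
      case f of
        0 : begin t := t + 1; s[t] := a end;             (* lit *)
        1 : begin t := t - 1; s[t] := s[t] + s[t+1] end; (* opr: add *)
        5 : t := t + a;                                  (* int *)
        6 : p := a                                       (* jmp *)
      end
    end;
    if p > 3 then halted := true
  end;
  writeln(s[t])   (* 42 *)
end.
```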

 

These look very familiar to Forth people. :) 

 

Wirth seemed so ahead of the pack back then. 


5 hours ago, TheBF said:

Thank you. That makes perfect sense. Building the NCG would be an interesting, albeit significant, project I am sure.

Well, the p-code instruction set in the version IV p-machine is a bit more complex.

 

But that's not a big issue. The PME not only allows a code file where different procedures use different code types (p-code or machine code); it actually allows the same procedure to go from p-code to machine code to p-code to machine code to... So a first attempt, just to prove the feasibility of the concept, doesn't have to translate more than a single p-code to assembly. If the program still runs, with the same result, that's proof of concept. It's enough to translate ADI (add integer) into the equivalent A *SP+,*SP to see if it works. Such a small change will not increase the speed at all, since ADI will be replaced by the p-code NAT (native code follows), and then the useful instruction, but if it works, then it's just a question of adding conversion of more p-codes.

 

For other reasons, I've already written most of a program that can build a structure that describes the segments and their embedded procedures in a code file. That can be used to find the procedure to convert.

Testing that the PME can actually switch over to native code, and back again, can be done just with the standard compiler and frequent use of the pmachine procedure.

 

Regarding the error from the compiler, the only thing I know is that BACKEND is the 14th segment in the compiler code. And there's indeed an embedded Backend error string in the constant pool of that segment. But why it's issued I don't know.


Just now, apersson850 said:

I looked it up. Too many jumps in the jump table. You have too many conditionals in the same procedure. 

Ah... I guess I have to break things up a bit. Thanks for looking into it.

For my own edification, which reference did you use to figure out the error meaning?

