Pascal on the 99/4A

Pascal p-system

296 replies to this topic

#276 TheBF OFFLINE  

TheBF

    Moonsweeper

  • 319 posts
  • Location:The Great White North

Posted Sat Jul 1, 2017 5:45 AM

Hmmm! I wonder if a Pascal Compiler written in TurboForth would increase the compilation speed.

From the benchmark result Apersson850 posted, there is room for improvement.  Since the Pascal compiler is written in Pascal, the slowdown seems to be in the p-code interpreter implementation.  If the Pascal compiler emitted Forth that was "linked" by TurboForth and then run, I have a strong feeling it would be much faster than it is currently.

 

Of course if the Pascal compiler emitted native TMS9900 code it would be comparable in speed to C that we see from GCC and potentially the compiler would run even faster on the TI-99.  The caveat is that the compiler program might be larger as native code, but with appropriate compiler optimizations it could be made to be comparable size.

 

Pascal normally compiles much faster than C, at speeds comparable to Forth compilation, because C's grammar is more complicated. So I come back to the p-code interpreter as the most probable bottleneck.



#277 apersson850 OFFLINE  

apersson850

    Moonsweeper

  • Topic Starter
  • 421 posts

Posted Sat Jul 1, 2017 7:30 AM

From what I've read, the p-system Pascal compiler isn't doing much in the way of optimizing for speed. If it does anything, it's for compact code. But that's more a question about generating code that "fits" the PME (P-Machine Emulator) than anything else.

 

As far as I remember, writing cnt := pred(cnt); generates faster (and more compact) code than writing cnt := cnt-1;

 

But that's not where the time is spent in the benchmarks. They mainly test the library sprite, a pre-compiled unit which has a lot of features, but obviously doesn't work very fast. Since you can create linked lists with sprite descriptions, lists where each element has a timeout and links to the next item when the time runs out, they offer more functionality than I've seen implemented for other languages. It could even be that they are written so that you can only update the sprites on each interrupt, but that I don't know. Pure speculation. The p-system does have its own sprite auto-motion code, which runs on each VDP interrupt.
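
A minimal Python sketch of the idea described above: a linked list of sprite descriptions, where each element has a timeout and hands over to the next element when the time runs out. This is not the unit sprite API; the names `SpriteAction`, `Sprite` and `tick` are hypothetical, and `tick` merely stands in for whatever runs on each VDP interrupt.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SpriteAction:
    """One element in a list of sprite descriptions (hypothetical layout)."""
    x_velocity: int
    y_velocity: int
    timeout: int                          # ticks before the next element takes over
    next: Optional["SpriteAction"] = None


class Sprite:
    def __init__(self, x, y, action):
        self.x, self.y = x, y
        self.action = action
        self.remaining = action.timeout if action else 0

    def tick(self):
        """Called once per simulated VDP interrupt."""
        if self.action is None:
            return                        # list exhausted, sprite stays put
        self.x += self.action.x_velocity
        self.y += self.action.y_velocity
        self.remaining -= 1
        if self.remaining <= 0:
            # Timeout reached: follow the link to the next description.
            self.action = self.action.next
            self.remaining = self.action.timeout if self.action else 0
```

Once such a list is set up, the main program does nothing more; the sprite runs through its actions by itself, which is the "free-running" aspect. Changing the list while it runs is where extra bookkeeping would come in.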

 

Speaking of interrupts, the p-system also handles concurrency and a keyboard buffer during them. That's a few more instructions to pass through each time. Does Forth scan the keyboard to check for a user break key while words are running?

 

As a general comment, the p-system is designed around the idea of fitting a complete Pascal compiler (equivalent in capability to Turbo Pascal 4, which took a lot of inspiration from UCSD Pascal) and a run-time system, which allows dynamic relocation of code and automatic swapping of code segments from disk during operation, inside the limits imposed by having only 48 K RAM, of which one third is actually video memory. Execution speed takes a hit there.

Another cornerstone of the p-system design is portability. The compiler is the same program regardless of which system you run it on. If it generated native code, it would require substantial changes for each CPU. Thus portability would be lost.

 

Most p-systems do provide the NCG program, a Native Code Generator, which can accept a critical program segment as input and translate it to machine language. The p-code which runs in-line assembly is supported, and the compiler on the TI does support generating in-line p-code directly, but then you have to know about it (it's not documented) and you have to handle getting the assembly in there yourself. I've found it easier to develop the programs in Pascal, when doable, but design them around calling procedures/functions most of the time, instead of large chunks of in-line code. Thus it's relatively easy to rewrite critical things in assembly.


Edited by apersson850, Sat Jul 1, 2017 7:32 AM.


#278 TheBF OFFLINE  

TheBF

    Moonsweeper

  • 319 posts
  • Location:The Great White North

Posted Sat Jul 1, 2017 10:02 AM

You are such an expert on this system. Thanks for all the insights!

 

Forth does almost nothing without explicit code, so no, there are no breaks in the loops. :-)

 

But it can be done by re-defining the loop words like this:

\ example to add BREAK to Forth BEGIN/UNTIL loops
 
: ?BREAK   ( -- )  KEY? ABORT" HALTED BY USER" ;
 
: UNTIL    ( ? --) POSTPONE ?BREAK POSTPONE UNTIL ; IMMEDIATE
 
That's something that is difficult in conventional languages. :-)
 
However, to your point, this would slow the loops down to 1000 loops per second because of the infernal KSCAN.
 
So maybe this is why Pascal is operating slowly.
Maybe you should try a different benchmark that unrolls a loop into many lines of sequential code and see if things speed up?


#279 apersson850 OFFLINE  

apersson850

    Moonsweeper

  • Topic Starter
  • 421 posts

Posted Sun Jul 2, 2017 2:42 AM

No, until proven wrong, I claim that the low performance is due to the implementation of the unit sprite. Again, it's more advanced than any other such implementation, and is really intended to allow free-running lists of sprite actions, running by themselves and updating on the VDP interrupt only. I assume doing modifications to the data structures inside this unit is what takes time.



#280 TheBF OFFLINE  

TheBF

    Moonsweeper

  • 319 posts
  • Location:The Great White North

Posted Sun Jul 2, 2017 8:09 AM

No, until proven wrong, I claim that the low performance is due to the implementation of the unit sprite. Again, it's more advanced than any other such implementation, and is really intended to allow free-running lists of sprite actions, running by themselves and updating on the VDP interrupt only. I assume doing modifications to the data structures inside this unit is what takes time.

ok fair enough.

 

I recently implemented a Sprite control mechanism using a cooperative multi-tasker rather than on an interrupt, so I am very interested in what the P system sprite control does.

Can you show what a "free running list of sprite actions" looks like in Pascal code?

 

I don't want to give you a "make work" project, but perhaps there is some demonstration code somewhere I can look at or a section of the P system manual that describes this functionality.



#281 apersson850 OFFLINE  

apersson850

    Moonsweeper

  • Topic Starter
  • 421 posts

Posted Sun Jul 2, 2017 10:10 AM

You can find the Pascal compiler manual at the WHTech site. The pre-compiled unit sprite is described at page 144. You can also check the chapter before, which describes sound processing in the p-system. As you can see, they've made quite elaborate designs here.



#282 Vorticon OFFLINE  

Vorticon

    River Patroller

  • 2,762 posts
  • Location:Eagan, MN, USA

Posted Sun Jul 2, 2017 2:43 PM

Has the latest update to Classic99 somehow broken the p-code emulation? It can't seem to read the disks at all...



#283 apersson850 OFFLINE  

apersson850

    Moonsweeper

  • Topic Starter
  • 421 posts

Posted Mon Jul 3, 2017 2:50 AM

Of course if the Pascal compiler emitted native TMS9900 code it would be comparable in speed to C that we see from GCC and potentially the compiler would run even faster on the TI-99.  The caveat is that the compiler program might be larger as native code, but with appropriate compiler optimizations it could be made to be comparable size.

It's a bit confusing here, what you mean by "the compiler running".  Are you referring to the compiler itself, when it compiles source code, or the code that it actually compiled?

For the TI 99/4A there's the special issue that it's actually a 32 K RAM machine. The UCSD Pascal needs a 48 K RAM machine to work reasonably well. But they've used the trick that the PME can run code from CPU RAM, VDP RAM and GROM (on the p-code card) to make it possible to implement the UCSD p-system IV.0 on the 99/4A. That wouldn't work if everything was in native code, as you couldn't run any programs from VDP RAM in that case. The p-code card could technically bank-switch a lot more ROM than it does (it has 12 K ROM, where 4 K remains the same, and the other 4 K are two different banks). Now it also has 48 K GROM, and they are easy enough to access, as they are seen through a byte-wide window only.

But this also makes the interpreter slightly slower. Not much when running code in line (the IPC must be separately incremented, so you lose one CPU instruction per p-code), but more so when a jump has to be taken, as it takes longer time to update the VDP or GROM read address than just load a new value to the IPC (which is in R8).

If you jump from code in VDP RAM or GROM to code in CPU RAM, or vice versa, you also have to load the other PME instruction/immediate data fetch routine. It's running in RAM at 8300H for speed, but only one version fits at the same time.
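
The difference between the fetch paths can be illustrated with a small Python model. This is only a sketch under stated assumptions, not the PME's actual code: the class names, the `jump`/`fetch` methods and the counters are all hypothetical. It shows three things from the description above: code in CPU RAM is read through an ordinary pointer, code in VDP RAM or GROM is read through an auto-incrementing address that has to be set up again on every jump, and only one fetch routine can be resident at a time.

```python
class CpuRamFetcher:
    """Code in CPU RAM: the IPC is an ordinary pointer into memory."""
    def __init__(self, memory):
        self.memory = memory
        self.ipc = 0
        self.address_setups = 0       # never incremented for CPU RAM

    def jump(self, target):
        self.ipc = target             # just load a new value

    def fetch(self):
        byte = self.memory[self.ipc]
        self.ipc += 1
        return byte


class PortFetcher:
    """Code in VDP RAM or GROM: read via an auto-incrementing address."""
    def __init__(self, memory):
        self.memory = memory
        self.ipc = 0                  # the interpreter's own copy
        self._device_address = 0      # address held by the device
        self.address_setups = 0

    def jump(self, target):
        self.ipc = target
        self._device_address = target  # stands in for writing the read address
        self.address_setups += 1       # this is the slower part of a jump

    def fetch(self):
        byte = self.memory[self._device_address]
        self._device_address += 1     # the device advances by itself
        self.ipc += 1                 # the IPC must be incremented separately
        return byte


class Pme:
    """Holds one active fetch routine at a time (hypothetical model)."""
    def __init__(self, fetchers):
        self.fetchers = fetchers      # e.g. {"cpu": ..., "vdp": ...}
        self.active = None
        self.routine_loads = 0

    def jump(self, space, target):
        if space != self.active:
            # Stands in for copying the other fetch routine into fast RAM.
            self.active = space
            self.routine_loads += 1
        self.fetchers[space].jump(target)

    def fetch(self):
        return self.fetchers[self.active].fetch()
```

Running straight-line code costs almost the same in both cases; the counters only grow on jumps and on crossings between memory types, which matches where the slowdown is said to come from.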



#284 JamesD OFFLINE  

JamesD

    Quadrunner

  • 7,654 posts
  • Location:Flyover State

Posted Mon Jul 3, 2017 11:01 AM

From what I've read, the p-system Pascal compiler isn't doing much in the way of optimizing for speed. If it does anything, it's for compact code. But that's more a question about generating code that "fits" the PME (P-Machine Emulator) than anything else.

...

To the best of my knowledge, the UCSD Pascal compiler mostly just barfs out code without any optimization.

The stack machine makes it easy to generate code, which keeps the compiler small and able to run in 64K, and it may have eliminated the need for some optimizations.
I haven't looked, but there has to be some sort of tracking of stack use, so the generated code won't be completely horrible.
You could probably speed up the code a bit with a peephole optimizer, but optimizing code for the stack machine's assembly might be more complicated than for registers.
 



#285 apersson850 OFFLINE  

apersson850

    Moonsweeper

  • Topic Starter
  • 421 posts

Posted Mon Jul 3, 2017 11:11 AM

The code doesn't have to be that horrible just because you push and pop data from the stack. But I know for sure that there are things you need to keep track of as a programmer, things a good optimizing compiler would figure out by itself and fix for you. So I still suspect it's a "you asked for it, you got it" compiler. As far as I've read, it will not use the INC and DEC opcodes unless you tell it to in the source (using pred and succ instead of +1 or -1). I haven't checked, even if that's easy enough to do. A p-code disassembler is a part of the system.

 

They make code generation simple, or at least simpler, by using the fact that some of the p-codes are specifically tailored to meet requirements a Pascal program has. Like the one I showed above, to find a local variable in a lexical parent, any level up.

The general approach to speeding up Pascal programs, or rather p-code programs, is the conversion to native code using the Native code generator program. Unfortunately, there's no such program delivered with the TI 99/4A. The p-codes required are supported, though, so if you study the code files enough, you could write your own.


Edited by apersson850, Mon Jul 3, 2017 11:16 AM.


#286 JamesD OFFLINE  

JamesD

    Quadrunner

  • 7,654 posts
  • Location:Flyover State

Posted Mon Jul 3, 2017 11:30 AM

The code doesn't have to be that horrible just because you push and pop data from the stack. But I know for sure that there are things you need to keep track of as a programmer, things a good optimizing compiler would figure out by itself and fix for you. So I still suspect it's a "you asked for it, you got it" compiler. As far as I've read, it will not use the INC and DEC opcodes unless you tell it to in the source (using pred and succ instead of +1 or -1). I haven't checked, even if that's easy enough to do. A p-code disassembler is a part of the system.

 

They make code generation simple, or at least simpler, by using the fact that some of the p-codes are specifically tailored to meet requirements a Pascal program has. Like the one I showed above, to find a local variable in a lexical parent, any level up.

The general approach to speeding up Pascal programs, or rather p-code programs, is the conversion to native code using the Native code generator program. Unfortunately, there's no such program delivered with the TI 99/4A. The p-codes required are supported, though, so if you study the code files enough, you could write your own.

I've written a couple peephole optimizers in the past.
The Z World Z80 Compiler generated several common code sequences, and optimization dropped code size by 20%.  
But a stack based CPU wouldn't benefit from several of those optimizations.
The free 68000 Pascal compiler was horrible and code size was cut by 50% with just 2 optimizations.  
Stuff wouldn't even run in 64K if UCSD were that bad.
But even really good compilers like GCC benefit from peephole optimization, so there has to be some improvement possible.
Even 5% would be a big deal since so much of the system is written in Pascal and compiled by itself.
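
As a rough illustration of what a peephole pass over stack-machine code could look like, here is a minimal Python sketch. The instruction names (`LOAD`, `PUSH`, `ADD`, `SUB`, `INC`, `DEC`, `STORE`) are made up for the example and are not the real p-code mnemonics. It assumes that no jump targets the second instruction of a rewritten pair; a real optimizer would have to check that.

```python
def peephole(code):
    """Replace 'push 1; add' with 'inc' and 'push 1; sub' with 'dec'.

    `code` is a list of tuples such as ("PUSH", 1) or ("ADD",).
    """
    rewrites = {
        (("PUSH", 1), ("ADD",)): ("INC",),
        (("PUSH", 1), ("SUB",)): ("DEC",),
    }
    out = []
    i = 0
    while i < len(code):
        window = tuple(code[i:i + 2])     # look at two instructions at a time
        if window in rewrites:
            out.append(rewrites[window])  # emit the shorter equivalent
            i += 2
        else:
            out.append(code[i])           # no match, copy unchanged
            i += 1
    return out


# cnt := cnt - 1, as a naive code generator might emit it
naive = [("LOAD", "cnt"), ("PUSH", 1), ("SUB",), ("STORE", "cnt")]
optimized = peephole(naive)
# optimized == [("LOAD", "cnt"), ("DEC",), ("STORE", "cnt")]
```

This is the same substitution a programmer makes by hand when writing pred(cnt) instead of cnt-1.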
 



#287 apersson850 OFFLINE  

apersson850

    Moonsweeper

  • Topic Starter
  • 421 posts

Posted Mon Jul 3, 2017 3:34 PM

Could very well be that such optimization would be useful here too. I've not inspected the code in such a way that I looked for typical peephole candidates.



#288 TheBF OFFLINE  

TheBF

    Moonsweeper

  • 319 posts
  • Location:The Great White North

Posted Tue Jul 4, 2017 10:25 AM

It's a bit confusing here, what you mean by "the compiler running".  Are you referring to the compiler itself, when it compiles source code, or the code that it actually compiled?

For the TI 99/4A there's the special issue that it's actually a 32 K RAM machine. The UCSD Pascal needs a 48 K RAM machine to work reasonably well. But they've used the trick that the PME can run code from CPU RAM, VDP RAM and GROM (on the p-code card) to make it possible to implement the UCSD p-system IV.0 on the 99/4A. That wouldn't work if everything was in native code, as you couldn't run any programs from VDP RAM in that case. The p-code card could technically bank-switch a lot more ROM than it does (it has 12 K ROM, where 4 K remains the same, and the other 4 K are two different banks). Now it also has 48 K GROM, and they are easy enough to access, as they are seen through a byte-wide window only.

But this also makes the interpreter slightly slower. Not much when running code in line (the IPC must be separately incremented, so you lose one CPU instruction per p-code), but more so when a jump has to be taken, as it takes longer time to update the VDP or GROM read address than just load a new value to the IPC (which is in R8).

If you jump from code in VDP RAM or GROM to code in CPU RAM, or vice versa, you also have to load the other PME instruction/immediate data fetch routine. It's running in RAM at 8300H for speed, but only one version fits at the same time.

 

 

I was referring to the fact that the compiler is running as a P-code application and if it was a native code app it would be faster.  But as you say, it could never run on a standard TI-99 with regular memory expansion as native code.



#289 TheBF OFFLINE  

TheBF

    Moonsweeper

  • 319 posts
  • Location:The Great White North

Posted Tue Jul 4, 2017 10:25 AM

You can find the Pascal compiler manual at the WHTech site. The pre-compiled unit sprite is described at page 144. You can also check the chapter before, which describes sound processing in the p-system. As you can see, they've made quite elaborate designs here.

Thanks.  The sprite control has some interesting features that are not too hard for me to emulate.



#290 apersson850 OFFLINE  

apersson850

    Moonsweeper

  • Topic Starter
  • 421 posts

Posted Tue Jul 4, 2017 2:02 PM

All right, I did an overhaul of the old Horizon RAMdisk I happened to get, and kind of got it going. I write "kind of", because there seems to be some bad connection somewhere on the card, as a few bytes get destroyed sometimes. Thus I can't really use it, but at least it works well enough to hold the data the p-system needs to think it's there.

As can be seen in this picture, my system now thinks it has seven drives (blocked devices, indicated by a # before the name). There are four regular drives, named PASSYS, DPASCAL, DESIGN and TIMING, and two RAMdisks, named RAMDSK1 and RAMDSK2. The drive called OS is the GROMdisk on the p-code card.

I'm sorry about the reflection from the window, but it's readable.

Anyway, even if this particular RAMdisk has some issues, it's proof of concept. The p-system can use two different RAMdisks, set as DSK5 and DSK6. Then they show up as units #11 and #12. I've yet to find out if a disk reporting as DSK7 will come up as unit #13? I don't know yet. But it doesn't matter too much.

I actually laid my hands on three different Horizon RAMdisks at the same time, so the overhaul will continue with the next one, to see if it works better.

 

The program listed in the file I've referred to before works. But if you install two disks, you must make sure you install one PCB in the character pattern table and the other in the sprite motion table.

 

Regarding the compiler, we'll have to accept it as it is, if we want to run it on a real TI. When the compiler, the source and the object files are all on a RAMdisk, it's at least twice as fast as it normally runs. It compiles the RAMdisk install program (470 lines) in two and a half minutes, with the code on normal floppy disk.


Edited by apersson850, Tue Jul 4, 2017 2:05 PM.


#291 apersson850 OFFLINE  

apersson850

    Moonsweeper

  • Topic Starter
  • 421 posts

Posted Thu Jul 6, 2017 3:27 AM

As I actually laid my hands on three different Horizon disks, I also tried with what I think is the latest model, a Horizon RAMdisk 4000. There's an operating system called ROS 8.14F that came with it.

But this one fails. If I add that drive to the p-system in the same way as I did with the Horizon drive I described in the post above, the p-system will call it alright, but the drive DSR never returns to the caller. I found some old documentation, and it seems that OPA did a lot of work to make sure it would collaborate with virtually everything, but the p-system seems to be one of the few things they didn't get it going with.

 

Since I've written a general DSRloader in Pascal, a program that's capable of loading absolute assembly code (generated by the assembler you get with the p-system) at any address, even if it requires setting a certain CRU bit to access the memory, I can make a simple DSR for the Horizon 4000, to see that it can work. It only needs to implement the sector read/write subprogram, so it's not much to write. Especially since I've already done it once, for my own RAMdisk. I only need to change the memory mapping.

 

I noticed that for the older RAMdisks (there was a ROS version 4 with the one that worked above), the source code was included. But as far as I can see, that seems not to be the case with the version 8.14F. Does anyone know if this is the case?


Edited by apersson850, Thu Jul 6, 2017 3:33 AM.


#292 apersson850 OFFLINE  

apersson850

    Moonsweeper

  • Topic Starter
  • 421 posts

Posted Tue Jul 18, 2017 3:23 PM

It turns out that the later ROS versions, like 8.12 and 8.14F, don't adhere perfectly to the rules for how a DSR should behave. They employ a trick, where they look into internal data for the DSRLNK routine in the console, to determine which subprogram is called on the RAMdisk. This works as long as the normal DSRLNK in the console is calling the card, but the p-system has its own BIOS, with a different DSRLNK, operating on a different principle. Thus this trick fails and Horizon RAMdisks with version 8 don't work with the p-system.

The version 4 I had for the first Horizon RAMdisk I tried does not use this trick, but behaves the way a DSR is supposed to. That's why the older version works. So does my own RAMdisk design, as well as the CorComp RAMdisk a friend of mine once had.

 

A textbook example of why one should have well specified interfaces between software modules, and stick to them!



#293 blakespot OFFLINE  

blakespot

    Chopper Commander

  • 126 posts
  • Location:Alexandria, VA (USA)

Posted Fri Jul 21, 2017 8:35 AM

Speaking of Turbo Pascal, here's perhaps a rarity:

 

8467862667_8242bfbbeb.jpg

 

[ Full photo link ]

 

I bought this in 1987 to use not with a Macintosh but with my Atari 520ST fitted with the Magic Sac cartridge, part of a Macintosh emulator by David Small that worked amazingly well. 

 

 

 

bp



#294 apersson850 OFFLINE  

apersson850

    Moonsweeper

  • Topic Starter
  • 421 posts

Posted Wed Aug 23, 2017 12:37 PM

After writing my own DSR for the Horizon 4000 card I now have, it works fine with the p-code system. Which confirms that there's nothing wrong with my card, but it's the ROS 8 that's at fault. It's too much geared towards the standard 99/4A operating system.



#295 RickyDean ONLINE  

RickyDean

    Dragonstomper

  • 663 posts

Posted Wed Aug 23, 2017 12:53 PM

After writing my own DSR for the Horizon 4000 card I now have, it works fine with the p-code system. Which confirms that there's nothing wrong with my card, but it's the ROS 8 that's at fault. It's too much geared towards the standard 99/4A operating system.

Is this something that you would feel like sharing? I have a couple of p-code cards and various Horizon RAMdisks, including a 4000. If and when I get to the point of making them work together, this could be useful. Thanks.



#296 apersson850 OFFLINE  

apersson850

    Moonsweeper

  • Topic Starter
  • 421 posts

Posted Wed Aug 23, 2017 2:15 PM

Sure. Here it is.
Note that this is "proof of concept", so it's only for a RAMdisk at CRU base 1500H, with 512 K memory, and implements only sector read/write for unit #11 (DSK5). That's the only thing Pascal uses. It also requires an installer to put the code on the card, and another to convince the p-system that there are more than the normal three blocked devices.

 

A more flexible driver could be done, if desired, so you don't need to change it for different configurations. This is based on the DSR I have for my own RAM-disk, which simply uses available memory, not used by the p-system, to implement a simple RAMdisk.
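
To give an idea of how little a sector read/write routine has to compute, here is a minimal Python sketch of the address mapping only. It is not the DSR from this post. The 256-byte sector size is the standard TI disk sector and the 512 K size comes from the note above, but `PAGE_SIZE` and the function name `locate_sector` are assumptions made for the example; the real card's paging scheme may differ.

```python
SECTOR_SIZE = 256             # bytes per sector on standard TI disks
RAMDISK_BYTES = 512 * 1024    # the 512 K card mentioned above
PAGE_SIZE = 2048              # ASSUMPTION: bytes visible through the card's window


def locate_sector(sector):
    """Return (page, offset) for a sector number in paged RAM."""
    total_sectors = RAMDISK_BYTES // SECTOR_SIZE
    if not 0 <= sector < total_sectors:
        raise ValueError(f"sector {sector} outside 0..{total_sectors - 1}")
    byte_address = sector * SECTOR_SIZE
    # Page to select on the card, and where the sector starts inside it.
    return divmod(byte_address, PAGE_SIZE)
```

With these numbers the card holds 2048 sectors. Supporting a different card size or paging scheme would mean changing only the constants, which is the kind of flexibility a more general driver would need.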
 

Spoiler

Edited by apersson850, Wed Aug 23, 2017 2:20 PM.


#297 RickyDean ONLINE  

RickyDean

    Dragonstomper

  • 663 posts

Posted Wed Aug 23, 2017 2:46 PM

Sure. Here it is.
Note that this is "proof of concept", so it's only for a RAMdisk at CRU base 1500H, with 512 K memory, and implements only sector read/write for unit #11 (DSK5). That's the only thing Pascal uses. It also requires an installer to put the code on the card, and another to convince the p-system that there are more than the normal three blocked devices.

 

A more flexible driver could be done, if desired, so you don't need to change it for different configurations. This is based on the DSR I have for my own RAM-disk, which simply uses available memory, not used by the p-system, to implement a simple RAMdisk.
 

Spoiler

Thanks for the share.





