From what I've read, the p-system Pascal compiler isn't doing much in the way of optimizing for speed. If it does anything, it's for compact code. But that's more a question about generating code that "fits" the PME (P-Machine Emulator) than anything else.
As far as I remember, writing cnt := pred(cnt); generates faster (and more compact) code than writing cnt := cnt-1;
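For anyone who wants to compare the two forms themselves, here is a minimal sketch. The program name and starting value are just made up for illustration, and it shows nothing about what p-code the compiler actually emits for either line; you would have to inspect the compiled code to verify the claim.

```pascal
program predDemo;
{ Hypothetical test program: both statements decrement cnt by one. }
var
  cnt: integer;
begin
  cnt := 10;
  cnt := pred(cnt);   { predecessor form }
  cnt := cnt - 1;     { arithmetic form, same effect on cnt }
  writeln(cnt)        { prints 8 }
end.
```

Both statements give the same result, so any difference is purely in the size and speed of the generated code.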
But that's not where the time is spent in the benchmarks. They mainly test the sprite library, a pre-compiled unit which has a lot of features, but obviously isn't very fast. Since you can create linked lists of sprite descriptions, where each element has a timeout and links to the next item when the time runs out, the unit offers more functionality than I've seen implemented for other languages. It could even be that it's written so that the sprites can only be updated on each interrupt, but that I don't know. Pure speculation. The p-system does have its own sprite auto-motion code, which runs on each VDP interrupt.
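To give an idea of what such a list looks like in principle, here is a rough sketch in plain Pascal. This is not the actual layout used by the sprite unit: the record, its field names and the tick loop are all assumptions made up for illustration, and the loop just stands in for whatever the interrupt routine really does.

```pascal
program spriteListSketch;
{ Hypothetical illustration only, not the sprite unit's real data structure. }
type
  framePtr = ^frame;
  frame = record
    pattern: integer;   { assumed: which sprite description to show }
    timeout: integer;   { assumed: number of ticks to stay on this element }
    next: framePtr      { element to switch to when the time runs out }
  end;
var
  a, b, current: framePtr;
  ticksLeft, tick: integer;
begin
  { Build a circular list with two elements. }
  new(a);
  new(b);
  a^.pattern := 1; a^.timeout := 3; a^.next := b;
  b^.pattern := 2; b^.timeout := 2; b^.next := a;

  current := a;
  ticksLeft := current^.timeout;

  { Simulate ten ticks; a real system would do this work per interrupt. }
  for tick := 1 to 10 do
  begin
    ticksLeft := pred(ticksLeft);
    if ticksLeft = 0 then
    begin
      current := current^.next;          { follow the link on timeout }
      ticksLeft := current^.timeout
    end;
    writeln('tick ', tick, ': pattern ', current^.pattern)
  end
end.
```

The point is only that each tick involves a countdown, a test and possibly a pointer chase per sprite, which is more work than simply adding a velocity to a position.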
Speaking about interrupts, the p-system also handles concurrency and a keyboard buffer during them. That's a few more instructions to pass through each time. Does Forth scan the keyboard to check for a user break key while words are running?
As a general comment, the p-system is designed around the idea of fitting a complete Pascal compiler (equivalent to the capabilities found in Turbo Pascal 4, which took a lot of inspiration from UCSD Pascal), and a run-time system which allows for dynamic relocation of code and automatic swapping of code segments from disk during operation, inside the limits imposed by having only 48 K RAM, of which one third actually is video memory. Execution speed takes a hit there.
Another big factor in the p-system design is portability. The compiler is the same program regardless of which system you run it on. If it generated native code, it would require substantial changes for each CPU. Thus portability would be lost.
Most p-systems do provide the NCG program, a Native Code Generator which can accept a critical program segment as input and translate that to machine language. The p-code which runs in-line assembly is supported, and the compiler on the TI does support generating that in-line p-code directly, but then you have to know about it (it's not documented) and you have to work out how to get the assembly in there yourself. I've found it easier to develop the programs in Pascal, when doable, but design them around calling procedures/functions most of the time, instead of large chunks of in-line code. That way it's relatively easy to rewrite the critical parts in assembly.
Edited by apersson850, Sat Jul 1, 2017 7:32 AM.