
I don't know if this will be enlightening or if it will blow people's minds more, but as an example of REBOL and the inner workings of my compiler, here is the piece of REBOL that currently implements the forever method in my language in the native Atari backend:

 
[
    ++ forevers
    reduce [
        break-line label: bind/new to-word ajoin ['forever- forevers #"*"] generated
        terms/1
        break-line 'CLV
        break-line 'BVC  label
    ]
]

It is just a block! of REBOL data and is handled as such. It is executed as REBOL code when the compiler detects a forever method in the code of a program in my language. The to-word ajoin expression generates a new, numbered label for the loop. Bind/new adds this label to the generated context! and binds it there, so it can't conflict with names that the programmer uses in the program. Nevertheless, I append a * to generated labels to also make them stand out textually.

Reduce creates a new block! that contains the inner code terms/1 that forever wraps, plus the wrapping 6502 code that implements the loop. This is also just a REBOL data block!, but now it contains 6502 code in an assembly-like format. Break-line formats this data in lines that are meaningful to a human programmer. It's a binary format, but it's almost always presented in text form.

 

This is quite similar to how Atari BASIC and Mac/65 tokenise source code to achieve greater efficiency, but much more generic.

 

As you can see, the code is smaller than the explanation, which is a really good sign for the expressiveness of the format. (As long as you don't devolve into regular expressions or BrainFuck.)

 

In Rainbow, this REBOL code generates the main loop. From rainbow.list:

    [
        forever-1*
        [
            [
                [LDA-at VCOUNT]
                ASL-A
            ]
            CLC
            ADC-at RTCLOK3
            STA WSYNC
            STA COLBK
        ]
        CLV
        BVC forever-1*
    ]

Again, these are just nested REBOL data block!s that we generate. They contain our assembly-like 6502 code, preserving the analysed structure from the high-level source program where that is useful.

 

This is fundamental REBOL. You can build the data structures you need, you can create the words you want to use, you can bind them to the applicable contexts, you can transform the data into the form you need, format it the way you want, transport it to where you need it, and execute it as code when you need it. It's very different from other languages, but very useful.

Link to comment
Share on other sites

11 hours ago, Kaj de Vos said:

This is fundamental REBOL. You can build the data structures you need, you can create the words you want to use, you can bind them to the applicable contexts, you can transform the data into the form you need, format it the way you want, transport it to where you need it, and execute it as code when you need it. It's very different from other languages, but very useful.

I understand that the transformations could do some peephole optimizations, but how do you plan to do high-level optimizations with such an approach?


I designed it to be optimal for all kinds of optimisations. The data is abstracted through all stages, and I can add the metadata I need for an optimisation. The architecture is broadly similar to LLVM, although LLVM uses more traditional data formats.

 

I am finding that I am doing optimisations differently than they are usually discussed. It seems that most optimising compilers generate unoptimised code and then run ever more optimiser stages over it. That seems backwards to me now. I optimise code as soon as possible. In recent years, LLVM has drawn criticism for not being able to optimise language-specific patterns. I suspect this is related.

 

My ideas about optimisation are changing, so we should establish what we mean. What would you consider high-level optimisations?


2 hours ago, Kaj de Vos said:

My ideas about optimisation are changing, so we should establish what we mean. What would you consider high-level optimisations?

Optimizations done on a higher level than assembly code like mentioned here, so closer to original source code or some intermediate representation. Example of such optimizations here or here.

2 hours ago, Kaj de Vos said:

In recent years, LLVM has drawn criticism for not being able to optimise language-specific patterns. 

Interesting, any links to read about it?


Thanks for the links.

 

The first one is about general program optimisation, which is done manually by the programmer, normally not by compilers.

 

The second link is about all the optimisations VBCC does, high-level and low-level, but limited to C, the only source language it supports, which is a low-level language by today's standards. The documentation doesn't make much distinction between these optimisations; it mostly lists them. What I'm finding out is that some of these optimisations are hugely important, others not so much. VBCC and other compilers make them all optional. It doesn't make sense to me not to do the important optimisations, and for the less important ones I wonder whether they are worth the trouble. At least not now in the 6502 backend, and the C backend feeds into any optimising C compiler you want.

 

Also, optimisations that are very important to perform on the generated C code to make CC65 generate better code, several of which I learned from your essay, are seldom or never mentioned in other compilers and their documentation.

 

Further, VBCC does quite a few optimisations with colourful names that are apparently standard in the industry, to turn bad code into good code. That is not my goal. I expect the programmer to turn in reasonable code, and then I will do my best to generate reasonable C or 6502 code. I wonder how much of the bloat in modern software arises because programmers are taught that they can write bad code and it doesn't matter - in some places and some circumstances, until it does.

 

The third link is what we are looking for: a compiler trying to optimise code generated from high-level languages. It's specific to the Intel compiler, and it's very much focused on loop optimisations. This is an indication that there is apparently not much else that a lower-level backend compiler can do about code generated by frontends for high-level languages.

 

Sorry, I don't have a link to similar thoughts about LLVM. I saw people state it in discussions on forums.

 

In all this, there are a number of problems with wrong priorities, wrong timing and inflexibility, sometimes due to legacy architectures. High-level languages should do their own high-level optimisations when generating intermediate code for lower-level compiler backends, because only they know what their language is about. It's horribly inefficient for a backend to reverse engineer that on the fly, and usually simply not possible. But that's the mantra in the industry: we will allow you to write or generate bad code, and then our compiler will automagically fix it...

 

REBOL is a flexible but high-level language, yet it always struck me how much REBOL code might as well be written in, say, Action!. My compiler starts out by generating such low-level code, and will resort to more dynamic code in the future, only where it's needed. This makes all the difference: don't generate generic high-level code for a generic high-level virtual machine; make it fit the situation. One example is the loop optimisations I do. The loops are fairly abstract and high-level in the source code, but the compiler analyses them to produce loops that are specific to the situation. In turn, this allows CC65 to generate much better code for them.


46 minutes ago, Kaj de Vos said:

I wonder how much of the bloat in modern software arises because programmers are taught that they can write bad code and it doesn't matter - in some places and some circumstances, until it does.

Quite a lot, I believe, but what matters nowadays is not always how quickly the code can run, but how fast it can be written and how easy the maintenance and readability are -> take a look at the popularity of Python.

Also, modern compilers do a lot for you, and the general lesson is to do optimizations only where they are needed (visible bottlenecks), with the help of a profiler.

 


True, but REBOL is superior to Python there, so not everything adds up. And if you want to run on 8-bits, or your customer and boss suddenly demand a faster application, you are stuck.

 

Running a profiler also takes effort and time. I have never gotten around to it. I prefer to know myself where I can write better code.

 

According to your definition, all current optimisations in my compiler are high-level. I only just got around to generating assembly code, so all current optimisations are done before that.


13 hours ago, ilmenit said:

the general lesson is to do optimizations only where they are needed (visible bottlenecks) with help of a profiler.

 

I would modify that slightly - do hand optimisations only when they are needed.


The compiler should be free to do automatic optimisations whenever possible.
Although some hints from the author can change which optimisations the compiler can choose from.
Eg, in C, the volatile keyword on a variable disallows a pile of optimisations but allows the variable to be read from a hardware location.
Some other hints from the author also tell the compiler to optimise for speed or size - usually as a global option via command line flags.

 

This reflects a basic tenet of mine - let the machine do the work.


22 hours ago, ilmenit said:

I wonder how much of the bloat in modern software arises because programmers are taught that they can write bad code and it doesn't matter - in some places and some circumstances, until it does.

What I've always seen through my years as an Analyst/Programmer is that most people have no idea of how the hardware their code runs on works, so they just write code that works.

As an example, one of the systems we used had some global libraries. One produced log files and wasn't particularly quick, so we had an in-house competition to see if anyone could speed the code up.

We ran the original against a large transaction and it took about 33 seconds.

One guy managed to get it down to around 20 seconds.

Another got it to around 5 seconds (but admitted later he had "looked over my shoulder and pinched some code").

The one I wrote did it in about 0.7 seconds.

The main reason was that I am also an electronics engineer with an HNC in Computer Engineering and had good knowledge of how the system would be processing the data.

 

We were using "C" as our programming language.


Nice story. I think you're right.

 

My similar story is that a housemate at university who studied mechanical engineering built a machine for his final assignment, but couldn't get it to work for a long time. Then one day he came home proud and told everyone he had finally gotten his control software to keep up with the machine, by replacing "float" with "int". I didn't know much about C, but it did stun me.

 

I studied computer science myself, but over the years I started noticing a pattern that electrical engineers tend to produce much more practical software. They need it to control their actual interest, the electronics, but they view the software as a necessity that should just work, instead of expanding it until it can read email. They are also used to strict budgets on the electronics parts, so they develop their software the same.

 

In our time on Atari, this was also still true for software engineers; we had strict budgets for program size and speed. These budgets have now gone out the window. They are replaced by budgets on complexity, bug count, runtime, power consumption, crunch time, death marches, missed deadlines, mid-project resets, network attack vectors and company defaults, but most people don't seem to mind much.

 

REBOL is designed by an electronics engineer. Although it can read email. Oh well...


Carl Sassenrath first designed Amiga OS, and REBOL was initially just meant to be the language of the successor OS. So it is designed like an OS, or at least a platform, and should be able to do everything. The features didn't creep in afterwards, but were structured into an architecture. It was only many years later that the mainstream started publishing about the convergence of languages and platforms.


My own experience (35 years embedded) is similar.

 

  1. Get the algorithm right (eg, an O(n) search routine is not as good as an O(log(n)) search routine).
  2. Beware of C++ habits where the values get transparently translated back and forth with each layer.
    Eg, object A stores a figure as an int.
    Object B wants that figure as a string - luckily object A has an interface that translates it to a string.
    Object C wants that figure from B (with operations done to it) but wants it as a float - luckily object B has an interface that translates it to a float.
    Object D wants that figure from C (with operations done to it) but wants it as an int - luckily object C has an interface that translates it to an int.
    More time is spent on wasteful conversions than on the actual operations.
  3. Write the code with an eye towards the platform.
    Saw a colleague write an 8-bit AVR program using 32-bit signed ints throughout. Changed it to 8-bit unsigned ints and it was dramatically smaller and faster.
  4. But do not optimise it to death so that it is unreadable and unmaintainable until after the critical parts are identified by a profiler.
    Replacing code like x*8 with x<<3 just makes life hard for the maintenance programmer. Modern compilers usually do this far better and let you leave the code far cleaner.
    Leave these "tricks" in the 80's where they belong, unless timing is super critical or your compiler is old.

All agreed.

 

My embedded story is from an Atari friend. He got a job at a company making embedded 6502 products. One of the products suddenly became too slow. He found that it contained a multiplication that a colleague had programmed by adding the number to itself in a loop, and the multiplier had become larger.

 

Oddly, in interpreted languages such as REBOL, ancient optimisation tricks such as replacing a multiplication with shifts are still relevant, because the interpreter doesn't do any optimisation on the code. In my language, this will be different.


On 7/15/2021 at 5:33 AM, stepho said:

Leave these "tricks" in the 80's where they belong unless timing is super critical or your compiler is old

That reminds me of a quote in a C programming manual many moons ago, it said:-

Beware any code that has a comment like "/* Neat Trick */"


I always remembered the case in the classic "dragon book" on compiler development. Its central point is the complexity of compilers, and to illustrate that, they wrote about a compiler they knew that had only a few comments in the entire source, one of which read "This code is cursed!".

 

I don't really feel that about my compiler. It's now my largest project, currently around 20,000 lines. For the first time in three decades, it's beating my Atari emulator, which was around 15,000 lines. A few central points of the machinery are fairly complex and tend to shift when small changes are made, but in general, the compiler is quite straightforward. It comes down to the right mix of traditional compiler engineering insights and much simpler REBOL techniques.


9 hours ago, TGB1718 said:

From experience over many years with seasoned C programmers most fall into that category ?

My experience is that non-seasoned programmers do stuff like this:

/*
 * Function: Add(a,b)
 *
 * Add two integer numbers and return its result
 *
 * Parameters:
 *		a - first number
 * 		b - second number
 *
 * Known bugs:
 *		If a + b is too big, it will overflow.
 *		If a + b is too small, it will underflow.
 */

int Add(int a, int b)
{
	int r;			// temp variable

	r = a + b;		// calculate result

	return r;		// return result
}

This is just the beginning. I have seen code where the comments made up 75% or more of the total code size. Most of it is unneeded if you choose your function and variable names right.

 

I'm a big proponent of self-documenting code:

int add_two_integers(int number1, int number2) {
	return number1 + number2;	// XXX might over/underflow
}

 

Edit: something I also detest is this:

	if ( a == b )		// test if a equals b
	{
		c = 42;			// if so, assign 42 to c
	}
	else
	{
		c = 0;			// if not, assign 0 to c
	}

instead of:

	c = (a==b) ? 42 : 0;

Now I hear some people say/think that's unreadable, but if you don't know the language all hope is lost. I read the last example exactly as:

 

check equality and either assign 42 or 0 to c.

 

If you wanted to assign either 1 or 0, you can also write:

	c = (a==b);

 


Agree that over commenting is a waste.

 

Disagree on the other points.
The closer it looks to line noise, the more likely there will be a typo that the eye will gloss over and that the compiler will happily accept.
I'll take verbose but obviously correct code any day over short but potentially wrong code.
You are in a maze of twisty syntax that all look the same.


Fortunately, Meta can do this more elegantly.

a + b  ; Overflow trapped (in the future)
overflow a + b  ; May overflow

Self-documenting, no comments needed.

 

The same flow control is used for either returning a value or discarding it:

c: either a = b [42] [0]

This turned out to be surprisingly hard to generate C code for in all cases. I always thought of C as an orthogonal language supporting expression-based programming, but it isn't.

 

The last example would be:

c: to byte!  a = b

Not as terse as the C version, but more strongly typed.

