Fitzpatrick's Fabulous Future: compilers

Wednesday, August 28, 2019

Show me the optimizations!

Compilers have gotten good at optimizing code. So good, in fact, that we programmers take optimizations for granted. I don't object to optimizations, but I do think we need to rethink their opaqueness.

Optimizations are, in general, good things. They are changes to the code to make it faster and more efficient, while keeping the same functionality. Many times, they are changes that seem small.

For example, the code:

a = f(r * 4 + k)
b = g(r * 4 + k)

can be optimized to

t1 = r * 4 + k
a = f(t1)
b = g(t1)

The common expression r * 4 + k can be computed once, not twice, which reduces the time to execute. (It does require space to store the result, so this optimization is really a trade-off between time and space. Also, it assumes that r and k remain unchanged between the calls to f() and g(); that is, f() must not modify either one.)
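Here is a minimal Python sketch of the same rewrite, applied by hand. Python is used only for illustration, and f, g, f_bump, and the values of r and k are hypothetical stand-ins that do not come from any real program. The sketch shows the two forms agreeing when f() leaves r and k alone, and diverging when f() changes r, which is the assumption a compiler has to verify before it makes this change.

```python
# Hypothetical stand-ins for f() and g(); neither one has side effects.
def f(x):
    return x + 1

def g(x):
    return x * 2

def original(r, k):
    a = f(r * 4 + k)
    b = g(r * 4 + k)   # r * 4 + k is computed a second time
    return a, b

def optimized(r, k):
    t1 = r * 4 + k     # computed once and kept in a temporary
    a = f(t1)
    b = g(t1)
    return a, b

# A case where the rewrite is NOT safe: f_bump() changes r as a side effect.
state = {"r": 3, "k": 2}

def f_bump(x):
    state["r"] += 1    # side effect: r differs when the next expression runs
    return x + 1

def original_with_side_effect():
    state["r"] = 3
    a = f_bump(state["r"] * 4 + state["k"])
    b = g(state["r"] * 4 + state["k"])   # sees the updated r
    return a, b

def optimized_with_side_effect():
    state["r"] = 3
    t1 = state["r"] * 4 + state["k"]
    a = f_bump(t1)
    b = g(t1)                            # still uses the old r
    return a, b
```

With r = 3 and k = 2, original() and optimized() return the same pair, while the two side-effect versions disagree on b. That disagreement is the kind of thing a report of the compiler's changes would make visible.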

Another example:

for i = 1 to 40
    a[i] = r * 4 + k

which can be optimized to:

t1 = r * 4 + k
for i = 1 to 40
    a[i] = t1

In this example, the operation r * 4 + k is repeated inside the loop, yet it does not change from iteration to iteration. (It is invariant during the loop.) The optimization moves the calculation outside the loop, which means it is calculated only once.
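The loop example can be sketched the same way in Python, again as a hand-applied illustration rather than anything a particular compiler emits. The function names, the parameter n, and the unused index 0 (kept so the indices run from 1 to 40 as in the pseudocode) are assumptions made for the example.

```python
def original(r, k, n=40):
    a = [0] * (n + 1)          # index 0 unused, to match "for i = 1 to 40"
    for i in range(1, n + 1):
        a[i] = r * 4 + k       # recomputed on every iteration
    return a

def optimized(r, k, n=40):
    a = [0] * (n + 1)
    t1 = r * 4 + k             # hoisted: computed once, before the loop
    for i in range(1, n + 1):
        a[i] = t1
    return a
```

Both functions fill the array with the same values; the second performs the multiplication and addition once instead of 40 times.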

These are two simple examples. Compilers have made these optimizations for years, if not decades. Today's compilers are much better at optimizing code.

I am less concerned with the number of optimizations, and the types of optimizations, and more concerned with the optimizations themselves.

I would like to see the optimizations.

Optimizations are changes, and I would like to see the changes that the compiler makes to the code.

I would like to see how my code is revised to improve performance.

I know of no compiler that reports the optimizations it makes. Not Microsoft's compilers, not Intel's, not open source. None. And I am not satisfied with that. I want to see the optimizations.

Why do I want to see the optimizations?

First, I want to see how to improve my code. The above examples are trivial, yet instructive. (And I have, at times, written the un-optimized versions of those programs, although on a larger scale and with more variables and calculations to worry about.) Seeing the improvements to the code helps me become a better developer.

Second, I want to see what the compiler is doing. It may be making assumptions that are not true, possibly due to my failure to annotate variables and functions properly. I want to correct those failures.

Third, when reviewing code with other developers, I think we should review not only the original code but also the optimized code. The optimizations may give us insight into our code and data.

It is quite possible that future compilers will provide information about their optimizations. Compilers are sophisticated tools, and they do more than simply convert source code into executable bytes. It is time for them to provide more information to us, the programmers.

Tuesday, June 21, 2016

Compilers and interpreters

Programming languages (with a few exceptions) fall into one of two categories: compiled or interpreted.

Compilers are the natural descendants of assemblers. Assemblers convert text representations of processor-specific operation codes into machine-readable form; compilers convert high-level programs into machine-readable form. Interpreters, on the other hand, read high-level programs and process them, without producing an "executable".

Both forms have advantages. Compiled programs execute faster, and the source code can remain hidden from users, who need only the executable form. Interpreted programs may be slower, but the process of writing (and debugging) tends to be faster and interpreted languages have flexibilities not available in compiled languages.

Programming languages are sometimes created by individuals working without specific sponsorship and direction from a corporation (I call them "enthusiasts"). Other languages are created by corporations, in large, well-planned and well-justified projects.

But is one technique more popular than another? Let's look at the list of popular (according to tiobe.com) languages. Here are the top languages, who created them, whether they are compiled or interpreted, and when they were created:

Java: corporation (Sun); compiled; 1990s
C: enthusiasts (Kernighan and Ritchie); compiled; 1970s
C++: enthusiast (Stroustrup); compiled, derived from C; 1980s
Python: enthusiast (van Rossum); interpreted; 1990s
C#: corporation (Microsoft); compiled; 2000s
PHP: enthusiast (Lerdorf); interpreted; 1990s
JavaScript: individual (Eich); interpreted; 1990s
Perl: enthusiast (Larry Wall); interpreted; 1980s
VB.NET: corporation (Microsoft); compiled; 2000s
Ruby: enthusiast (Matsumoto); interpreted; 1990s
Delphi: corporation (Borland); compiled, derived from Pascal; 1990s
Swift: corporation (Apple); compiled; 2010s
Objective-C: enthusiasts (Cox and Love); compiled, derived from C; 1980s
R: enthusiasts (Ihaka and Gentleman); interpreted, derived from S; 1990s
Matlab: enthusiast (Moler); interpreted; 1970s
SQL: enthusiast (Codd); interpreted; 1970s
D: corporation (Digital Mars); compiled; 2000s
COBOL: government consortium; compiled; 1950s

From this list, a few things are obvious. First, we've invented both compiled and interpreted languages. Second, we've invented both over the age of computers, and continue to do so. It's not that a particular type of language was a fad or has fallen out of favor.

Look at the relationship between the type of creator and the type of language. Enthusiasts tend to create interpreted languages, and corporations tend to create compiled languages. The list above would match this rule perfectly, except for C. (C++ and Objective-C, derived from C, would naturally be compiled.)

But this is a short list, and small sample sizes may be deceptive. Let's look at some more:

APL: enthusiast (Iverson); interpreted; 1950s
BASIC: enthusiasts (Kemeny and Kurtz); interpreted; 1960s
S: enthusiasts (Becker, Wilks, Chambers); interpreted; 1970s
Fortran: corporation (IBM); compiled, derived from assembly language; 1950s
Pascal: enthusiast (Wirth); compiled; 1960s
Eiffel: enthusiast (Meyer); compiled; 1980s
Forth: enthusiast (Moore); interpreted; 1960s
dBase: enthusiast (Ratliff); interpreted; 1970s
Ada: government agency; compiled; 1970s
PL/I: corporation (IBM); compiled; 1960s
Prolog: enthusiasts (Colmerauer, et al.); interpreted; 1970s
AWK: enthusiasts (Aho, Weinberger, and Kernighan); interpreted; 1970s
DIBOL: corporation (DEC); compiled; 1970s
FOCAL: enthusiast (Merrill); interpreted; 1960s

This expanded list shows that enthusiasts *tend* to create interpreted languages, but not always. Corporations, though, create compiled languages. The only interpreted language that might be credited to a corporation is SQL, which was created at IBM, but I've assigned it to E.F. Codd as an enthusiast.

I'm not sure why enthusiasts would create interpreted languages. Perhaps it's more fun that way. Perhaps it's easier. Interpreted languages let you stop a running program, examine the innards of your interpreter, adjust things, and continue running, all useful when debugging the interpreter.

Astute readers will note that my assignment of "enthusiast" or "corporation" to languages may be a bit loose. The designation is sometimes difficult. Kernighan and Ritchie, when creating C, were working for AT&T's Bell Labs. Are they corporation employees or enthusiasts? E.F. Codd worked for IBM when publishing his thoughts on relational databases. Is he an employee or an enthusiast? Wayne Ratliff was working for NASA's JPL when he wrote the first version of dBase and was part of Ashton-Tate when he wrote dBase II. Does that make him an employee? In all of these cases, I feel the individuals involved were doing what they did more as enthusiasts than employees.

On the flip side, I've placed Java and C# in the "corporation" side. Neither of these languages has individuals strongly associated with its origins. Java was a thing presented to us by Sun; C# was presented by Microsoft. Did the creation of these languages involve passionate individuals? Certainly. Were those individuals working on these projects independent of the corporation's needs? I see no evidence of that. (Yet I can easily see Kernighan and Ritchie working late at night to add features to their C compiler.)

I don't know if the assignment of "corporation" or "enthusiast" to a language's origin is important -- but I don't know that it isn't. It may be that enthusiasts will continue to create interpreted languages, and corporations will continue to create compiled languages.

I do think it significant that Java and C# live in between, Java with its JVM and C# with its CLR. Perl and Python have moved in that direction, too. They gain some benefits of interpreted languages and retain some benefits of compiled languages. I expect we will see more languages that use these techniques.
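Python's in-between position can be seen from inside the language. The sketch below uses the built-in compile() and eval() functions and the standard dis module; the expression and the values of r and k are arbitrary choices for illustration. CPython first translates the source text into bytecode, and a virtual machine then executes that bytecode.

```python
import dis

source = "r * 4 + k"

# Step 1: translate the source text into a bytecode object.
code = compile(source, "<example>", "eval")

# Step 2: the virtual machine executes the bytecode with concrete values.
result = eval(code, {"r": 3, "k": 2})

# The bytecode can be listed; the exact instruction names vary by Python version.
instructions = [ins.opname for ins in dis.get_instructions(code)]
```

Here result is 14, and instructions holds the names of the bytecode operations that produced it.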

One more thing. Two other recently developed languages:

Go: corporation (Google); compiled; 2010s
Checked C: corporation (Microsoft); compiled, derived from C; 2010s

So maybe everyone isn't jumping on the "semi-interpreted" wagon.
