Monday, February 27, 2012

Uglies

Programming has seen a number of ugly things, and we programmers (and more specifically, language designers) have improved them.

The GOTO statement

The most famous ugly thing in programming is probably the GOTO statement. First called out by Edsger Dijkstra in his 1968 letter to the Communications of the ACM ("Go To Statement Considered Harmful"), it is the poster child of poor programming practices. The GOTO statement was a direct analog of the assembly language "jump" instruction (often given the mnemonic 'JMP', though it varied from processor to processor), and it allowed for difficult-to-read programs. We improved programming languages with structured programming: 'if/then/else' statements, 'while' loops, and iteration over collections. (The 'goto' statement was omitted from Java, but remains in C, C++, and C#.)
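For illustration, here is a minimal sketch in C of the difference; the array and variable names are invented for the example. The first loop uses a backward goto, the second expresses the same logic as a structured 'for' loop.

```c
#include <stdio.h>

int main(void)
{
    int values[] = { 3, 1, 4, 1, 5 };
    int count = 5;

    /* The old way: a counter and a backward goto. */
    int sum = 0;
    int i = 0;
top:
    if (i < count) {
        sum += values[i];
        i++;
        goto top;
    }

    /* The structured way: the same logic as a 'for' loop. */
    int sum2 = 0;
    for (int j = 0; j < count; j++) {
        sum2 += values[j];
    }

    printf("%d %d\n", sum, sum2);   /* both print 15 */
    return 0;
}
```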

Global variables

Global variables were another form of ugliness, allowing any part of a program to read (or, worse, modify) a variable. One never could tell what value a global variable would contain. They were mandatory in COBOL; present in FORTRAN, C, and C++; and absent from Java and C# (although public static fields can serve much the same role).
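A small sketch in C shows the problem; the names and amounts are invented for the example. Any function can quietly change the global, while the alternative passes the state explicitly so the data flow is visible.

```c
#include <stdio.h>

/* A global: any function can read or change it, so its value at
   any given moment is hard to reason about. */
int total = 0;

void add_sale(int amount)     { total += amount; }
void apply_refund(int amount) { total -= amount; }  /* silently changes the same state */

/* The cleaner alternative: pass the state explicitly. */
int add_sale_pure(int current_total, int amount)
{
    return current_total + amount;
}

int main(void)
{
    add_sale(100);
    apply_refund(30);
    printf("global total: %d\n", total);   /* 70, but only if you know every caller */

    int t = 0;
    t = add_sale_pure(t, 100);
    printf("explicit total: %d\n", t);      /* 100, and the data flow is visible */
    return 0;
}
```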

Arithmetic IF

FORTRAN has the honor of originating the 'arithmetic IF', a three-way branch on the value of an expression. (It was not limited to FORTRAN; the construct showed up later in FOCAL.) Control transferred to one of three statement labels, depending on whether the expression was negative, zero, or positive. This nasty beast was the result of the IBM 704 instruction set, which allowed such a construct in a single instruction. Efficient for the processor, but not so much for the programmers.
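In FORTRAN the statement read IF (X) 10, 20, 30, jumping to label 10, 20, or 30 for negative, zero, or positive X. The structured replacement is an ordinary if/else chain; here is a sketch in C, with the function name and messages invented for the example.

```c
#include <stdio.h>

/* A structured stand-in for FORTRAN's  IF (X) 10, 20, 30 */
void classify(double x)
{
    if (x < 0.0) {
        printf("negative\n");   /* the 'label 10' branch */
    } else if (x == 0.0) {
        printf("zero\n");       /* the 'label 20' branch */
    } else {
        printf("positive\n");   /* the 'label 30' branch */
    }
}

int main(void)
{
    classify(-2.5);
    classify(0.0);
    classify(7.0);
    return 0;
}
```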

Pointers

Pointers were available in Pascal and were the life-blood of C. In Pascal (and in C) they allowed the construction of numerous data structures, many of which were impossible in earlier languages. Yet pointers were also a form of "GOTO in data" and led to lots of headaches. They were eventually replaced by references, which are pointers bound to a known valid entity.
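A minimal sketch in C of the kind of structure pointers made possible: a singly linked list. The struct and function names are invented for the example.

```c
#include <stdio.h>
#include <stdlib.h>

struct node {
    int value;
    struct node *next;   /* a pointer: "GOTO in data" */
};

struct node *push(struct node *head, int value)
{
    struct node *n = malloc(sizeof *n);
    if (n == NULL) {
        return head;     /* allocation failed; leave the list unchanged */
    }
    n->value = value;
    n->next = head;
    return n;
}

int main(void)
{
    struct node *list = NULL;
    list = push(list, 1);
    list = push(list, 2);
    list = push(list, 3);

    for (struct node *p = list; p != NULL; p = p->next) {
        printf("%d\n", p->value);   /* prints 3, 2, 1 */
    }

    /* Every malloc needs a matching free (see the next section). */
    while (list != NULL) {
        struct node *next = list->next;
        free(list);
        list = next;
    }
    return 0;
}
```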

Memory management

The pointers in C (and, to a lesser extent, Pascal) demanded manual memory management. One could allocate memory for anything, but one also had to track that memory and release it when one was finished with it. The later languages of Visual Basic, Perl, Java, C#, Python, and Ruby all replaced manual memory management with garbage collection.
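A sketch in C of the bookkeeping this demands; the buffer size and contents are invented for the example.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void)
{
    char *buffer = malloc(64);   /* we asked for the memory... */
    if (buffer == NULL) {
        return 1;                /* ...and must handle failure ourselves */
    }

    strcpy(buffer, "hello");
    printf("%s\n", buffer);

    free(buffer);                /* ...and must remember to give it back */
    buffer = NULL;               /* ...and avoid using it afterward */

    /* Forget the free() and the memory leaks; free() twice or use the
       pointer after freeing and the program misbehaves.  Garbage-collected
       languages take this entire chore away from the programmer. */
    return 0;
}
```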

Early garbage collection algorithms were unpredictable and often caused performance problems. Later algorithms (and faster processors) made garbage collection practical.

Column-dependent coding

FORTRAN (and to some extent COBOL) used column position as a significant indicator to the compiler. FORTRAN was locked into a restrictive format that specified columns for statement labels, for the statements themselves, and for marking a statement's continuation onto the next 'source card'. COBOL's use of indentation was more advanced: while optional, it was a popular convention, and the idea saw life again in Python's use of indentation to define blocks.

Short variable names

BASIC initially limited variable names to a single letter and an optional digit. (No 'R2D2' for you!) Original FORTRAN limited variable names to six characters. Early PC compilers for BASIC, Pascal, and C had similar restrictions. Modern compilers and interpreters allow variable names longer than I care to type (and I type some long names!).

The overall trend

Looking back, we can see that lots of ugly programming constructs were made for efficiency (arithmetic IF) or due to limitations of memory (short variable names) or processing power (GOTO). Advances in hardware allowed for work to be shifted from programmers to compilers, interpreters, and run-time systems. But here's the thing: advances in programming languages and techniques are much slower than advances in hardware.

Today's computers are more powerful than those of the 1960s by several orders of magnitude. While we have replaced GOTO with structured programming and direct memory management with garbage collection, the change in software is much smaller than the change in hardware.

I tend to think that this effect is caused by the locality of software. We programmers are close to our programs; the hardware is remote, sitting on the far side of the compiler. We can, should we choose, replace the hardware with faster (compatible) equipment or switch to a new target processor by changing the back end of the compiler. In contrast, the programming language constructs are close to us, living inside our heads. We think in our programming languages and are loath to give them up.

Moreover, we programmers often learn to overcome the ugly aspects of programming languages and sometimes develop techniques to leverage them. We become attached to these tricks, and we are quite reluctant to let them go.

If we want to advance the art, we will have to give up the old (ugly) constructs and adopt the new techniques. It is not easy; I myself have had to give up BASIC for C, C for C++, C++ for Java and C#, and C# for Ruby. I have given up unstructured programming for structured programming, structured (procedural) programming for object-oriented programming, and object-oriented programming for immutable-object programming. Each transition has been difficult, requiring me to un-learn the old ways. Yet I find that the new languages and techniques are better and allow me to be more effective.
