Monday, August 19, 2019

The Museum Principle for programming

Programming languages have, as one of their features, variables. A variable is a thing that holds a value and that value can vary over time. A simple example:

The statement

a = 1

defines a variable named 'a' and assigns a value of 1. Later, the program may contain the statement

a = 2 

which changes the value from 1 to 2.

The exact operations vary from language to language. In C and C++, the name is closely associated with the underlying memory for the value. Python and Ruby separate the name from the underlying memory, which means that the name can be re-assigned to point to a different underlying value. In C and C++, the names cannot be changed in that manner. But that distinction has little to do with this discussion. Read on.

Some languages have the notion of constants. A constant is a thing that holds a value and that value cannot change over time. It remains constant. C, C++, and Pascal have this notion. In C, a program can contain the statement

const int a = 1;

A later statement that attempts to change the value of 'a' will cause a compiler error. Python and Ruby have no such notion.

Note that I am referring to constants, not literals such as '1' or '3.14' that appear in the code. These are truly constant and cannot be assigned new values. Some early language implementations did allow such behavior. It was never popular.

The notion of 'constness' is useful. It allows the compiler to optimize the code for certain operations. When applied to a parameter of a function, it informs the programmer that he cannot change the value. In C++, a function of a class can be declared 'const' and then that function cannot modify member variables. (I find this capability helpful to organize code and separate functions that change an object from functions that do not.)

The notion of 'constness' is a specific form of a more general concept, one that we programmers tend to not think about. That concept is 'read but don't write', or 'look but don't touch'. Or as I like to think of it, the "Museum Principle".

The Museum Principle states that you can observe the value of a variable, but you cannot change it. This principle is different from 'constness', which states that the value of a variable cannot (and will not) change. The two are close but not identical. The Museum Principle allows the variable to change; but you (or your code) are not making the change.

It may surprise readers to learn that the Museum Principle has been used already, and for quite a long time.

The idea of "look but don't touch" is implemented in Fortran and Pascal, in loop constructs. In these languages, a loop has an index value. The index value is set to an initial value and later modified for each iteration of the loop. Here are some examples that print the numbers from 1 to 10:

An example in Fortran:

      do 100 i = 1, 10
         write(*,*) 'i =', i
100   continue

An example in Pascal:

for i:= 1 to 10 do
begin
  writeln('i =', i)
end;

In both of these loops, the variable i is initialized to the value 1 and incremented by 1 until it reaches the value 10. The body of each loop prints the value of i.

Now here is where the Museum Principle comes into play: In both Fortran and Pascal, you cannot change the value of i within the loop.

That is, the following code is illegal and will not compile:

In Fortran:

      do 100 i = 1, 10
         i = 20
         write(*,*) 'i =', i
100   continue

In Pascal:

for i:= 1 to 10 do
begin
  i := 20
  writeln('i =', i)
end;

The highlighted lines are not permitted. It is part of the specification for both Fortran and Pascal that the loop index is not to be assigned. (Early versions of Fortran and Pascal guaranteed this behavior. Later versions of the languages, which allowed aliases via pointers, could not.)

Compare this to a similar loop in C or C++:

for (unsigned int i = 1; i <= 10; i++)
{
    printf("%d\n", i);
}

The specifications for the C and C++ languages have no such restriction on loop indexes. (In fact, C and C++ do not have the notion of a loop index; they merely allow a variable to be declared and assigned at the beginning of the loop.)

The following code is legal in C and C++ (and does what you expect):

for (unsigned int i = 1; i <= 10; i++)
{
    i = 20;
    printf("%d\n", i);
}

My point here is not to say that Fortran and Pascal are superior to C and C++ (or that C and C++ are superior to Fortran and Pascal). My point is to show that the Museum Principle is useful.

Preventing changes to a loop index variable is the Museum Principle. The programmer can see the value of the variable, and the value does change, but the programmer cannot change the value. The programmer is constrained.

Some might chafe at the idea of such a restraint. Many have complained about the restrictions of Pascal and lauded the freedom of C. Yet over time, modern languages have implemented the restraints of Pascal, such as bounds-checking and type conversion.

Modern languages often eliminate loop index variables, by providing "for-each" loops that iterate over a collection. This feature is a stronger form of the "look but don't touch" restriction on loop index variables. One cannot complain about Fortran's limitations of loop index variables, unless one also dislikes the 'for-each' construct. A for-each iterator has a loop index, invisible (and untouchable!) inside.

For the "normal" loop (in which the index variable is not modified), there is no benefit from a prohibition of change to the index variable. (The programmer makes no attempt to change it.) It is the unusual loops, the loops which have extra logic for special cases, that benefit. Changing the loop index value is a shortcut, often serving a purpose that is not clear (and many times not documented). Preventing that short-cut forces the programmer to use code that is more explicit. A hassle in the short term, but better in the long term.

Constraints -- the right type of constraints -- are useful to programmers. The "structured programming" method was all about constraints for control structures (loops and conditionals) and the prohibition of "goto" operations. Programmers at the time complained, but looking back we can see that it was the right thing to do.

Constraints on loop index variables are also the right thing to do. Applying the Museum Principle to loop index variables will improve code and reduce errors.

No comments: