Fitzpatrick's Fabulous Future: constraints

Wednesday, October 5, 2022

Success with C++

Having recently written on the possible decline of C++, it is perhaps only fair that I share a success story about C++. The C++ programming language is still alive, and still useful. I should know, because a recent project used C++, and successfully!

The project was to maintain and enhance an existing C++ program. The program was written by other programmers before I arrived, over a period of years. Most of the original developers were no longer on the project. (In other words, a legacy system.)

The program itself is small by today's standards, with fewer than 300,000 lines of source code. It also has an unusual design (by today's standards): The program calculates economic forecasts, using a set of input data. It has no interaction with the user; the calculations are made completely with nothing more than the input data and program logic.

We (the development team) have successfully maintained and enhanced this program by following some rules, and placing some constraints upon ourselves. The goal was to make the code easy to read, easy to debug, and easy to modify. We made some design decisions for performance, but only after our initial design was shown to be slow. These constraints, I think, were key to our success.

We use a subset of C++. The language is large and offers many capabilities; we pick those that are necessary. We use classes. We rarely use inheritance. Instead, we build classes from composition. Thus, we had no problems with slicing of objects. (Slicing is an effect that can occur in C++ when an object of a derived class is copied or assigned, by value, to an object of its base class: the derived part is lost. It generally does not occur in other OOP languages.) There are a very small number of classes that use inheritance, and in those cases we often want slicing.
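To make slicing and the composition alternative concrete, here is a minimal sketch. The class names (Forecast, AdjustedForecast, Rate, Loan) are hypothetical and invented for this illustration; they are not from our program.

```cpp
#include <string>

// Hypothetical base class with a virtual function.
class Forecast
{
public:
    virtual ~Forecast() = default;
    virtual std::string label() const { return "base"; }
};

// Hypothetical derived class that overrides label().
class AdjustedForecast : public Forecast
{
public:
    std::string label() const override { return "adjusted"; }
};

// The parameter is taken by value, so only the Forecast part of the
// argument is copied. The derived part is sliced away.
std::string label_by_value(Forecast forecast)
{
    return forecast.label();
}

// The parameter is taken by reference, so the whole object is seen.
std::string label_by_reference(const Forecast & forecast)
{
    return forecast.label();
}

// Composition: a class holds its parts as members.
// There is no base class, so there is nothing to slice.
class Rate
{
public:
    explicit Rate(double value) : value_(value) {}
    double value() const { return value_; }
private:
    double value_;
};

class Loan
{
public:
    Loan(double principal, Rate rate) : principal_(principal), rate_(rate) {}
    double yearly_interest() const { return principal_ * rate_.value(); }
private:
    double principal_;
    Rate rate_;
};
```

Passing an AdjustedForecast to label_by_value() yields the base label, while label_by_reference() yields the derived one. The Loan class never faces the question, because it contains a Rate rather than inheriting from one.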

We use STL but not Boost. The STL (the Standard Template Library) is enough for our needs, and we use only what we need: strings, vectors, maps, and an occasional algorithm.
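As a rough illustration of that small toolkit, the sketch below uses a string, a vector, a map, and one algorithm (std::sort). The functions median() and median_by_series() are hypothetical examples, not code from our program, and returning zero for an empty series is an assumption made only to keep the sketch safe.

```cpp
#include <algorithm>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Median of a set of values. The vector is passed by value so that
// sorting it does not disturb the caller's data.
double median(std::vector<double> values)
{
    if (values.empty())
    {
        return 0.0;   // assumption: an empty series has a median of zero
    }

    std::sort(values.begin(), values.end());

    const std::size_t middle = values.size() / 2;

    if (values.size() % 2 == 1)
    {
        return values[middle];
    }

    return (values[middle - 1] + values[middle]) / 2.0;
}

// Median for each named series.
std::map<std::string, double> median_by_series(
    const std::map<std::string, std::vector<double>> & series)
{
    std::map<std::string, double> medians;

    for (const auto & entry : series)
    {
        medians[entry.first] = median(entry.second);
    }

    return medians;
}
```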

We followed the Java convention for files, classes, class names, and function names. That is, each class is stored in its own file. (In C++, we have two files, for the header file and the source file.) The name of the file is the name of the class (with a ".h" or ".cpp" extension). The class name uses upper camel-case, with a capital letter at the beginning of each word, for names such as "Year" or "HitAdjustment". Function names use snake-case with all lower-case letters and underscores between words. This naming convention simplified a lot of our code. When creating objects, we could create an object of type Year and name it "year". (The older code used no naming conventions, and many classes had lower-case names, which meant that when creating an object of type "ymd" (for example) we had to pick a name like "my_ymd" and keep track mentally of what was a class name and what was a variable name.)
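A small sketch of the convention follows. The Year class is named in the text above, but its members here (value(), is_leap_year()) and the function starts_in_leap_year() are hypothetical, made up to show the naming pattern.

```cpp
// Year.h (hypothetical contents): the class name matches the file name.
class Year
{
public:
    explicit Year(int value) : value_(value) {}

    int value() const { return value_; }

    // Function names use snake-case.
    bool is_leap_year() const
    {
        return (value_ % 4 == 0 && value_ % 100 != 0) || (value_ % 400 == 0);
    }

private:
    int value_;
};

// Because the class is "Year", the object can simply be "year".
// Class name and variable name cannot be confused.
bool starts_in_leap_year(const Year & year)
{
    return year.is_leap_year();
}
```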

We do not import namespaces. That is, we do not write "using namespace std" or pull in any other namespace. This forces us to specify the namespace for every class and function. While tedious, it provides the benefit that one can easily see where each class or function comes from. There is no need to search through the code, or guess about a function.
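In practice the rule looks something like the sketch below. The function longest_name() is a hypothetical helper, written only to show every standard name carrying its "std::" prefix.

```cpp
#include <string>
#include <vector>

// No "using namespace std;" appears anywhere. Every standard name is
// written with "std::", so a reader sees at once where it comes from.
std::string longest_name(const std::vector<std::string> & names)
{
    std::string longest;

    for (const std::string & name : names)
    {
        if (name.size() > longest.size())
        {
            longest = name;
        }
    }

    return longest;
}
```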

We use operator overloading only for a few classes, and only when the operators are obvious. Most of our code uses function calls. This also reduces guesswork by developers.
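Here is a sketch of where we would draw the line. The Amount class is hypothetical; the point is only that "+" and "==" have one obvious meaning for amounts, while scaling gets a named function.

```cpp
// Hypothetical class holding a sum of money in whole cents.
class Amount
{
public:
    explicit Amount(long cents) : cents_(cents) {}

    long cents() const { return cents_; }

    // Adding two amounts has one obvious meaning, so an operator is fine.
    Amount operator+(const Amount & other) const
    {
        return Amount(cents_ + other.cents_);
    }

    // Comparing two amounts for equality is equally obvious.
    bool operator==(const Amount & other) const
    {
        return cents_ == other.cents_;
    }

    // "Amount * something" could mean several things,
    // so a named function says exactly what happens.
    Amount scaled_by_percent(long percent) const
    {
        return Amount(cents_ * percent / 100);
    }

private:
    long cents_;
};
```

A call such as amount.scaled_by_percent(10) leaves no doubt about the units or the rounding, which an overloaded "*" would hide.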

We have no friend classes and no friend functions. (We could use them, but we don't need them.)

Our attitude towards memory management is casual. Current operating systems provide a 2 gigabyte space for our programs, and that is enough for our needs. (It has been so far.) We avoid pointers and dynamic allocation of memory. STL allocates memory for its objects, and we assume that it will manage that memory properly.
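A minimal sketch of this pointer-free style is below. The Observation struct and the above_threshold() function are hypothetical. The code contains no "new" and no "delete"; the vector allocates its own memory and releases it when it goes out of scope.

```cpp
#include <string>
#include <vector>

// Hypothetical data record, held by value.
struct Observation
{
    std::string name;
    double value;
};

// Returns a new vector by value. The vector manages its own memory,
// so the caller has nothing to free.
std::vector<Observation> above_threshold(
    const std::vector<Observation> & observations,
    double threshold)
{
    std::vector<Observation> selected;

    for (const Observation & observation : observations)
    {
        if (observation.value > threshold)
        {
            selected.push_back(observation);
        }
    }

    return selected;
}
```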

We do not use lambdas or closures. (We could use them, but we don't need them.)

We use spacing in our code to separate sections of code. We also use spacing to denote statements that are split across multiple lines. (A blank in front and a blank after.)

We use simple expressions. This increases the number of source lines, which eases debugging (we can see intermediate results). We let the C++ compiler optimize expressions for "release" builds.
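To show the difference, here is the same (hypothetical) calculation written both ways. Neither function is from our program, and the formula is only a stand-in for a real forecast computation.

```cpp
#include <cmath>

// Compact form: one dense expression. In a debugger there is
// nothing to inspect between the inputs and the result.
double interest_compact(double principal, double rate, int years)
{
    return principal * std::pow(1.0 + rate, years) - principal;
}

// Simple form: one step per line. Each intermediate result
// has a name and can be examined while debugging.
double interest_simple(double principal, double rate, int years)
{
    const double growth_per_year = 1.0 + rate;

    const double total_growth = std::pow(growth_per_year, years);

    const double final_value = principal * total_growth;

    const double interest = final_value - principal;

    return interest;
}
```

The second form is longer, but an optimizing compiler is free to fold the intermediate values together in a release build.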

----

By using a subset of C++, and carefully picking which features make up that subset, we have successfully developed, deployed, and maintained a modest-sized C++ application.

These constraints are not traditionally considered part of the C++ language. We enforce them for our code. Enforcing them provides us with a consistent style of code, and one that we find readable. New team members find that they can read and understand the code, which was one of our goals. We can quickly make changes, test them, and deploy them -- another goal.

These choices work for us, but we don't claim that they will work for other teams. You may have an application that has a different design, a different user interface, or a different set of computations, and it may require a different set of C++ code.

I don't say that you should use these constraints on your project. But I do say this: you may want to consider some constraints for your code style. We found that these constraints let us move forward, slowly at first and then rapidly.

Monday, August 19, 2019

The Museum Principle for programming

Programming languages have, as one of their features, variables. A variable is a thing that holds a value and that value can vary over time. A simple example:

The statement

a = 1

defines a variable named 'a' and assigns a value of 1. Later, the program may contain the statement

a = 2

which changes the value from 1 to 2.

The exact operations vary from language to language. In C and C++, the name is closely associated with the underlying memory for the value. Python and Ruby separate the name from the underlying memory, which means that the name can be re-assigned to point to a different underlying value. In C and C++, the names cannot be changed in that manner. But that distinction has little to do with this discussion. Read on.

Some languages have the notion of constants. A constant is a thing that holds a value and that value cannot change over time. It remains constant. C, C++, and Pascal have this notion. In C, a program can contain the statement

const int a = 1;

A later statement that attempts to change the value of 'a' will cause a compiler error. Python has no such notion, and Ruby issues only a warning when a constant is reassigned.

Note that I am referring to constants, not literals such as '1' or '3.14' that appear in the code. Literals are truly constant and cannot be assigned new values. Some early language implementations did allow literals to be changed. It was never popular.

The notion of 'constness' is useful. It allows the compiler to optimize the code for certain operations. When applied to a parameter of a function, it tells the programmer that the function cannot change the value. In C++, a function of a class can be declared 'const' and then that function cannot modify member variables. (I find this capability helpful to organize code and separate functions that change an object from functions that do not.)
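A short C++ sketch of both uses of 'const' follows. The Counter class and the doubled() function are hypothetical names chosen for the illustration.

```cpp
// Hypothetical class with one function that reads and one that writes.
class Counter
{
public:
    // Declared const: this function cannot modify member variables.
    int value() const { return count_; }

    // Not const: this function changes the object.
    void increment() { count_ = count_ + 1; }

private:
    int count_ = 0;
};

// The parameter is const, so this function can only call
// the const member functions of Counter.
int doubled(const Counter & counter)
{
    // counter.increment();   // would not compile: counter is const here
    return counter.value() * 2;
}
```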

The notion of 'constness' is a specific form of a more general concept, one that we programmers tend to not think about. That concept is 'read but don't write', or 'look but don't touch'. Or as I like to think of it, the "Museum Principle".

The Museum Principle states that you can observe the value of a variable, but you cannot change it. This principle is different from 'constness', which states that the value of a variable cannot (and will not) change. The two are close but not identical. The Museum Principle allows the variable to change; but you (or your code) are not making the change.
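C++ can approximate the Museum Principle with a reference-to-const. The sketch below is my own illustration, with a hypothetical Thermometer class: the visitor holds a read-only view, yet the value behind the view still changes, because the owner changes it.

```cpp
// Hypothetical class that owns a value and is the only one to change it.
class Thermometer
{
public:
    // Visitors receive a read-only view of the reading.
    const int & reading() const { return reading_; }

    // Only the thermometer itself changes the reading.
    void warm_up() { reading_ = reading_ + 1; }

private:
    int reading_ = 20;
};

// Looks at the reading before and after the thermometer warms up,
// and returns how much the observed value changed.
int observed_change(Thermometer & thermometer)
{
    const int & view = thermometer.reading();

    const int before = view;

    thermometer.warm_up();

    // view = 99;   // would not compile: look, but don't touch

    const int after = view;

    return after - before;
}
```

The value seen through "view" is not constant (it differs before and after), which is exactly the distinction between the Museum Principle and 'constness'.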

It may surprise readers to learn that the Museum Principle has been used already, and for quite a long time.

The idea of "look but don't touch" is implemented in Fortran and Pascal, in loop constructs. In these languages, a loop has an index value. The index value is set to an initial value and later modified for each iteration of the loop. Here are some examples that print the numbers from 1 to 10:

An example in Fortran:

      do 100 i = 1, 10
        write(*,*) 'i =', i
  100 continue

An example in Pascal:

for i := 1 to 10 do
begin
  writeln('i =', i)
end;

In both of these loops, the variable i is initialized to the value 1 and incremented by 1 until it reaches the value 10. The body of each loop prints the value of i.

Now here is where the Museum Principle comes into play: In both Fortran and Pascal, you cannot change the value of i within the loop.

That is, the following code is illegal and will not compile:

In Fortran:

      do 100 i = 1, 10
        i = 20
        write(*,*) 'i =', i
  100 continue

In Pascal:

for i := 1 to 10 do
begin
  i := 20;
  writeln('i =', i)
end;

The assignments to i inside the loops ("i = 20" in Fortran and "i := 20" in Pascal) are not permitted. It is part of the specification for both Fortran and Pascal that the loop index is not to be assigned. (Early versions of Fortran and Pascal guaranteed this behavior. Later versions of the languages, which allowed aliases via pointers, could not.)

Compare this to a similar loop in C or C++:

for (unsigned int i = 1; i <= 10; i++)
{
    printf("%u\n", i);
}

The specifications for the C and C++ languages have no such restriction on loop indexes. (In fact, C and C++ do not have the notion of a loop index; they merely allow a variable to be declared and assigned at the beginning of the loop.)

The following code is legal in C and C++ (and does what you expect):

for (unsigned int i = 1; i <= 10; i++)
{
    i = 20;
    printf("%u\n", i);
}

My point here is not to say that Fortran and Pascal are superior to C and C++ (or that C and C++ are superior to Fortran and Pascal). My point is to show that the Museum Principle is useful.

Preventing changes to a loop index variable is the Museum Principle. The programmer can see the value of the variable, and the value does change, but the programmer cannot change the value. The programmer is constrained.

Some might chafe at the idea of such a restraint. Many have complained about the restrictions of Pascal and lauded the freedom of C. Yet over time, modern languages have implemented the restraints of Pascal, such as bounds-checking and strict rules for type conversion.

Modern languages often eliminate loop index variables, by providing "for-each" loops that iterate over a collection. This feature is a stronger form of the "look but don't touch" restriction on loop index variables. One cannot complain about Fortran's limitations of loop index variables, unless one also dislikes the 'for-each' construct. A for-each iterator has a loop index, invisible (and untouchable!) inside.
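C++ itself has such a construct, the range-based "for" loop. A minimal sketch, using a hypothetical total() function:

```cpp
#include <vector>

// Sums the values in a collection with a for-each loop.
double total(const std::vector<double> & values)
{
    double sum = 0.0;

    // There is no visible index here. The position within the
    // collection is kept by the loop itself, out of reach of the body.
    // Declaring "value" const also keeps the body from changing it.
    for (const double value : values)
    {
        sum = sum + value;
    }

    return sum;
}
```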

For the "normal" loop (in which the index variable is not modified), there is no benefit from a prohibition of change to the index variable. (The programmer makes no attempt to change it.) It is the unusual loops, the loops which have extra logic for special cases, that benefit. Changing the loop index value is a shortcut, often serving a purpose that is not clear (and many times not documented). Preventing that shortcut forces the programmer to use code that is more explicit. A hassle in the short term, but better in the long term.
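Here is a sketch of what "more explicit" can look like. The task is hypothetical: sum a list of numbers, where a negative number is a marker meaning "ignore me and the next entry". The first function uses the index shortcut; the second says the same thing with a named flag and never touches an index.

```cpp
#include <cstddef>
#include <vector>

// Shortcut: the body bumps the loop index to skip an entry.
// The reader must notice the extra "i++" to understand the loop.
int sum_with_shortcut(const std::vector<int> & values)
{
    int sum = 0;

    for (std::size_t i = 0; i < values.size(); i++)
    {
        if (values[i] < 0)
        {
            i++;        // skip the entry after the marker
            continue;
        }

        sum = sum + values[i];
    }

    return sum;
}

// Explicit: the special case has a name, and the loop
// visits every entry in order.
int sum_explicit(const std::vector<int> & values)
{
    int sum = 0;
    bool skip_next = false;

    for (const int value : values)
    {
        if (skip_next)
        {
            skip_next = false;
            continue;
        }

        if (value < 0)
        {
            skip_next = true;
            continue;
        }

        sum = sum + value;
    }

    return sum;
}
```

The explicit version is a few lines longer, but the flag documents the special case that the shortcut left unstated.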

Constraints -- the right type of constraints -- are useful to programmers. The "structured programming" method was all about constraints for control structures (loops and conditionals) and the prohibition of "goto" operations. Programmers at the time complained, but looking back we can see that it was the right thing to do.

Constraints on loop index variables are also the right thing to do. Applying the Museum Principle to loop index variables will improve code and reduce errors.
