Fitzpatrick's Fabulous Future: immutable objects

Wednesday, April 20, 2022

Advances in programming come from restraints

Advances in programming come from, to a large extent, advances in programming languages. And those advances in programming languages, unlikely as it seems, are mostly not expanded features but restrictions.

That advances come from restrictions seems counter-intuitive. How do fewer choices make us better programmers?

Let's look at some selected changes in programming languages, and how they enabled better programming.

The first set of restrictions was structured programming. Structured programming introduced the concepts of the IF/THEN/ELSE statement and the WHILE loop. More importantly, structured programming banished the GOTO statement (and its cousin, the IF/GOTO statement). This restriction was an important advancement for programming.

A GOTO statement allows for arbitrary flows of control within programs. Structured programming's IF/THEN/ELSE and WHILE statements (and WHILE's cousin, the FOR statement) force structure onto programs. Arbitrary flows of control were not possible.
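
To make the contrast concrete, here is a small sketch in C++, a language that still has a goto statement. The function names sum_with_goto and sum_with_while are invented for this illustration; both add the integers from 1 to n.

```cpp
// Sum 1..n with jumps: the reader must trace the labels to discover the loop.
int sum_with_goto(int n) {
    int total = 0;
    int i = 1;
top:
    if (i > n) goto done;
    total += i;
    ++i;
    goto top;
done:
    return total;
}

// The same computation with a structured loop: one entry, one exit.
int sum_with_while(int n) {
    int total = 0;
    int i = 1;
    while (i <= n) {
        total += i;
        ++i;
    }
    return total;
}
```

Both functions return the same value for any n; only the second shows its shape at a glance.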

The result was programs that were harder to write but easier to understand, easier to debug, and easier to modify. Structured programming -- the loss of GOTO -- was an advancement in programming.

A similar advance occurred with object-oriented programming. Like structured programming, object-oriented programming was a set of restrictions coupled with a set of new features. In object-oriented programming, those restrictions were encapsulation (hiding data within a class) and the limiting of functions (requiring an instance of the class to execute). Data encapsulation protected data from arbitrary changes; one had to go through functions (in well-designed systems) to change the data. Instance functions were limited to executing on instances of the class, which meant that one had to *have* an instance of the class to call the function. Functions could not be called at arbitrary points in the code.

Both structured programming and object-oriented programming advanced the state of the art for programming. They did it by restricting the choices that programmers could make.

I'm going to guess that future advancements in programming will also come from restrictions in new programming languages. What could those restrictions be?

I have a few ideas.

One idea is immutable objects. This idea has been tested in the functional programming languages. Those languages often have immutable objects, objects which, once instantiated, cannot change their state.

In today's object-oriented programming languages, objects are often mutable. They can change their state, either through functions or direct access of member data.

Functional programming languages take a different view. Objects are immutable: once formed they cannot be changed. Immutable objects enforce discipline in programming: you must provide all of the ingredients when instantiating an object; you cannot partially initialize an object and add things later.

I would like to see a programming language that implements immutable objects. But not perfectly -- I want to allow for some objects that are not immutable. Why? Because the shift to "all objects are immutable" is too much, too fast. My preference is for a programming language to encourage immutable designs and require extra effort to design mutable objects.

A second idea is a limit to the complexity of expressions.

Today's programming languages allow for any amount of complexity in an expression. Expressions can be simple (such as A + 1) or complex (such as A + B/C - sqr(B + 7) / 2), or worse.

I want expressions to be short and simple. This means breaking a complex expression into multiple statements. The only language that I know that placed restrictions on expressions was early FORTRAN, and then only for the index to an array variable. (The required form was I*J+K, where I, J, and K were optional.)

Perhaps we could design a language that limited the number of operations in an expression. Simpler expressions are, well, simpler, and easier to understand and modify. Any expression that contained more than a specific number of operations would be an error, forcing the programmer to refactor the expression.
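
As a rough sketch of what such a refactoring might look like, here is the complex expression from above written both ways in C++. The helper sqr is an assumption: the text does not define it, and it is taken here to mean squaring. The names complex_form and simple_form are likewise invented.

```cpp
// Hypothetical helper: sqr() is assumed to square its argument.
double sqr(double x) { return x * x; }

// The complex form, written as a single expression.
double complex_form(double a, double b, double c) {
    return a + b / c - sqr(b + 7) / 2;
}

// The same calculation with one or two operations per statement.
double simple_form(double a, double b, double c) {
    const double ratio = b / c;
    const double shifted = b + 7;
    const double half_square = sqr(shifted) / 2;
    const double sum = a + ratio;
    return sum - half_square;
}
```

The second form is longer, but each intermediate value has a name that a reader (or a debugger) can inspect.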

A third idea is limits on the size of functions and classes. Large functions and large classes are harder to understand than small functions and small classes. Most programming languages have a style-checker, and most style-checkers issue warnings for long functions or classes with lots of functions.

I want to strengthen those warnings and change them to errors. A function that is too long (I'm not sure how long is too long, but that's another topic) is an error -- and the compiler or interpreter rejects it. The same applies to a class: too many data members, or too many functions, and you get an error.

But like immutable objects, I will allow for some functions to be larger than the limit, and some classes to be more complex than the limit. I recognize that some classes and functions must break the rules. (But the mechanism to allow a function or class to break the rules must be a nuisance, more than a simple '@allowcomplex' attribute.)

Those are the restrictions that I think will help us advance the art of programming. Immutable objects, simple expressions, and small functions and classes.

Of these ideas, I think the immutable objects will be the first to enter mainstream programming. The concept has been implemented, some people have experience with it, and the experience has been positive. New languages that combine object-oriented programming with functional programming (much like Microsoft's F#, which is not so new) will allow more programmers to see the benefits of immutable objects.

I think programming will be better for it.

Sunday, November 20, 2016

Matters of state

One difference between functional programming and "regular" programming is the use of mutable state. In traditional programming, objects or programs hold state, and that state can change over time. In functional programming, objects are immutable and do not change their state over time.

One traditional beginner's exercise for object-oriented programming is to simulate an automated teller machine (ATM). It is often used because it maps an object of the physical world onto an object in the program world, and the operations are nontrivial yet well-understood.

It also defines an object (the ATM) which exists over time and has different states. As people deposit and withdraw money, the state of the ATM changes. (With enough withdrawals the state becomes "out of cash" and therefore "out of service".)

The ATM model is also a good example of how we in the programming industry have been focussed on changing state. For more than half a century, our computational models have used state -- mutable state -- often to the detriment of maintenance and clarity.

Our fixation on mutable state is clear to those who use functional programming languages. In those languages, state is not mutable. Programs may have objects, but objects are fixed and unchanging. Once created, an object may contain state but cannot change. (If you want an object to contain a different state, then you must create a new object with that different state.)
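
Here is one way the ATM exercise might look with an immutable object, sketched in C++. The class name Atm and its member names are assumptions made for this example, not part of the traditional exercise.

```cpp
#include <stdexcept>

// An ATM whose state never changes after construction.
class Atm {
public:
    explicit Atm(long cash) : cash_(cash) {
        if (cash < 0) throw std::invalid_argument("cash cannot be negative");
    }
    long cash() const { return cash_; }
    bool outOfService() const { return cash_ == 0; }

    // Each operation returns a new Atm; the original is untouched.
    Atm deposit(long amount) const { return Atm(cash_ + amount); }
    Atm withdraw(long amount) const {
        if (amount > cash_) throw std::runtime_error("insufficient cash");
        return Atm(cash_ - amount);
    }
private:
    const long cash_;
};
```

A withdrawal does not alter the machine; it yields a second Atm object holding the new state, and the first object still reports its original cash.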

Programmers in the traditional languages of Java and C# got an exposure to this notion with the immutable strings in those languages. A string in Java is immutable; you cannot change its contents. If you want a string with a different content, such as all lower-case letters, you have to create a new object.

Programming languages such as Haskell and Erlang make that notion the norm. Every object is immutable; every object may contain state but cannot be changed.

Why has it taken us more than fifty years to arrive at this, um, well, state?

I have a few ideas. As usual with my explanations, we have to understand our history.

One reason has to do with efficiency. The other reason has to do with mindset.

Reason one: Objects with mutable state were more efficient.

Early computers were less powerful than those of today. With today's computers, we can devote some percentage of processing to memory management and garbage collection. We can afford the automatic memory management. Earlier computers were less powerful, and creating and destroying objects were operations that took significant amounts of time. It was more efficient to re-use the same object and simply change its state rather than create a new object with the new state, point to that new object, and destroy the old object and return its memory to the free pool.

Reason two: Objects with mutable state match the physical world

Objects in the real world hold physical state. Whether it is an ATM or an automobile or an employee's file, the physical version of the object is one that changes over time. Books at a library, in the past, contained a special pocket glued to the back cover used to hold a card which indicated the borrower and the date due back at the library. That card held different state over time; each lending would be recorded -- until the card was filled.

The physical world has few immutable objects. (Technically all objects are mutable, as they wear and fade over time. But I'm not talking about those kinds of changes.) Most objects, especially objects for computation, change and hold state. Cash registers, ATMs, dresser drawers that hold t-shirts, cameras (with film that could be exposed or unexposed), ... just about everything holds state. (Some things do not change, such as bricks and stones used for houses and walkpaths, but those are not used for computation.)

We humans have been computing for thousands of years, and we've been doing it with mutable objects for all of that time. From tally sticks with marks cut by a knife to mechanical adding machines, we've used objects with changing states. It's only in the past half-century that it has been possible to compute with immutable objects.

That's about one percent of the time, which considering everything we're doing, isn't bad. We humans advance our calculation methods slowly. (Consider how long it took to change from Roman numerals to Arabic, and how long it took to accept zero as a number.)

I think the lesson of functional programming (with its immutable objects) is this: We are still in the early days of human computing. We are still figuring out how to calculate, and how to represent those calculations for others to understand. We should not assume that we are "finished" and that programming is "done". We have a long journey ahead of us, one that will bring more changes. We learn as we travel on this journey, and the end -- or even the intermediate points -- is not clear. It is an adventure.

Tuesday, January 13, 2015

Functional programming exists in C++

Some programmers of C++ may look longingly at the new functional programming languages Haskell or Erlang. (Or perhaps the elder languages of Common Lisp or Scheme.) Functional programmi-ness is a Shiny New Thing, and C++ long ago lost its Shiny New Thing luster of object-oriented programming.

Yet C++ programmers, if they look closely enough, can find a little bit of functional programming in their language. Hidden in the C++ specification is a tiny aspect of functional programming. It occurs in C++ constructor initializers.

Initializers are specifications for the initialization of member variables in a constructor. The C++ language provides for default initialization of member variables; initializers override these defaults and let the programmer specify the actions.

Given the class:

class MyInts {
private:
    int a1_;
    int a2_;
public:
    MyInts(void);
};

one can store two integers in an object of type MyInts. The old-style C++ method is to provide 'setter' and 'getter' functions to allow the setting and retrieval of values. Something like:

class MyInts {
private:
    int a1_;
    int a2_;
public:
    MyInts(void);
    // 'setter' and 'getter' pairs for each value
    void setA1(int a1) { a1_ = a1; }
    int getA1(void) const { return a1_; }
    void setA2(int a2) { a2_ = a2; }
    int getA2(void) const { return a2_; }
};

The new-style C++ (been around for years, though) dispenses with the 'setter' functions and uses initializers and parameters in the constructor:

class MyInts {
private:
    int a1_;
    int a2_;
public:
    // both values arrive through the constructor's initializers
    MyInts(int a1, int a2) : a1_(a1), a2_(a2) {}
    int getA1(void) const { return a1_; }
    int getA2(void) const { return a2_; }
};

The result is an efficiently-constructed object. Another result is an immutable object, as one cannot change its state after construction. (The 'setter' functions are gone.) That may or may not be what you want, although in my experience it probably is what you want.

Initializers are interesting. One cannot do just anything in an initializer. You can provide a constant value. You can provide a constructor for a class (if your member variable is an object). You can call a function that provides a value, but it should be either a static function (one that needs no object) or a function outside of the class. (Calling a non-static member function of the object under construction will compile, but the function may read member variables that have not been initialized yet, and that is undefined behavior. It may work, or it may not.)
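
The three kinds of initializer can be sketched in one small class. The class Label, its members, and the helper clampWidth are hypothetical names chosen for this example.

```cpp
#include <string>

class Label {
public:
    Label(const std::string& text, int width)
        : version_(1),                      // a constant value
          text_(text),                      // a constructor of the member's class
          padded_width_(clampWidth(width))  // a static function: needs no object
    {}
    int version() const { return version_; }
    const std::string& text() const { return text_; }
    int paddedWidth() const { return padded_width_; }
private:
    // Static, so it cannot read members that are not yet initialized.
    static int clampWidth(int width) { return width < 0 ? 0 : width; }

    // Members are initialized in this declaration order.
    int version_;
    std::string text_;
    int padded_width_;
};
```

Because clampWidth receives everything it needs as a parameter, the order in which the members are initialized cannot affect its result.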

These restrictions on initializers enforce one of the attributes of functional programming: immutable objects. In my example, I eliminated the 'setter' functions to make objects of MyInts immutable, but that was an intentional effect. I could have left the 'setter' functions in place, and then objects of MyInts would be mutable.

Initializers brook no such nonsense. You have one opportunity to set the value for a member variable (you cannot initialize a member variable more than once). Once it is set, it cannot be changed during the initialization. You cannot call a function that has a side effect of changing a member variable that has been previously set. (Such a call would be to a member function, and while the compiler permits such calls, you should avoid them.)
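
One way to extend the "set once" rule past the end of the constructor is to declare the member variables const. The following is a variation on MyInts offered as a sketch, not the version shown earlier; the static_assert is an extra check added for illustration.

```cpp
#include <type_traits>

class MyInts {
public:
    MyInts(int a1, int a2) : a1_(a1), a2_(a2) {}
    int getA1() const { return a1_; }
    int getA2() const { return a2_; }
private:
    // const members can be set only in the initializer list,
    // so a 'setter' function would not compile.
    const int a1_;
    const int a2_;
};

// With const members the compiler also refuses to assign over an object.
static_assert(!std::is_copy_assignable<MyInts>::value,
              "MyInts objects cannot be overwritten after construction");
```

With this variation the immutability no longer depends on the programmer remembering to leave out the 'setter' functions.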

Initializers provide a small bit of functional programming inside C++. Who would have thought?

Technically, the attributes I have described are not functional programming, but merely immutable objects. Functional programming allows one to treat functions as first class citizens of the language, creating them and passing them to other functions as needed. The initializers in C++ do not allow such constructs.
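
For comparison, here is what "functions as first class citizens" can look like in C++ outside of initializers, using the lambdas and std::function available since C++11. The names transformAll and makeAdder are invented for this sketch.

```cpp
#include <functional>
#include <vector>

// Takes a function as an argument and applies it to every element.
std::vector<int> transformAll(const std::vector<int>& values,
                              const std::function<int(int)>& f) {
    std::vector<int> result;
    result.reserve(values.size());
    for (int v : values) {
        result.push_back(f(v));
    }
    return result;
}

// Creates and returns a new function; 'n' is captured by value.
std::function<int(int)> makeAdder(int n) {
    return [n](int x) { return x + n; };
}
```

Calling transformAll with the function returned by makeAdder(10) adds 10 to each element and leaves the input vector unchanged.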

Tuesday, December 17, 2013

The transition from object-oriented to functional programming

I am convinced that we will move from object-oriented programming to functional programming. I am also convinced that the transition will be a difficult one, more difficult than the transition from structured programming to object-oriented programming.

The transition from structured programming to object-oriented programming was difficult. Object-oriented programming required a new view of programming, a new way of organizing data and code. For programmers who had learned the ways of structured programming, the shift to object-oriented programming meant learning new techniques.

That in itself is not enough to cause the transition to functional programming to be more difficult. Functional programming is, like object-oriented programming, a new view of programming and a new way of organizing data and code. Why would the transition to functional programming be more difficult?

I think the answer lies within the organization of programs. Structured programming, object-oriented programming, and functional programming all specify a number of rules for programs. For structured programming, functions and subroutines should have a single entry point and a single exit point, and IF/THEN/ELSE blocks and FOR/NEXT loops should be used instead of GOTO statements.

Object-oriented programming groups data into classes and uses polymorphism to replace some conditional statements. But object-oriented programming was not totally incompatible with structured programming (or 'procedural programming'). Object-oriented programming allowed for a top level of new design with lower layers of the old design. Many early object-oriented programs had large chunks of procedural code (and some still do to this day). The thin layer of objects simply acted as a container for structured code.
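
A small sketch of polymorphism replacing a conditional, in C++. The shapes and every name here (ShapeKind, areaByKind, Shape, Square, HalfSquare, areaOf) are made up for the illustration.

```cpp
// Procedural style: a type code, and a conditional wherever it is used.
enum ShapeKind { kSquare, kHalfSquare };

double areaByKind(ShapeKind kind, double side) {
    if (kind == kSquare) return side * side;
    return side * side / 2;  // kHalfSquare: a right triangle with two equal legs
}

// Object-oriented style: each class carries its own branch of the conditional.
class Shape {
public:
    virtual ~Shape() {}
    virtual double area() const = 0;
};

class Square : public Shape {
public:
    explicit Square(double side) : side_(side) {}
    double area() const override { return side_ * side_; }
private:
    double side_;
};

class HalfSquare : public Shape {
public:
    explicit HalfSquare(double side) : side_(side) {}
    double area() const override { return side_ * side_ / 2; }
private:
    double side_;
};

// The caller no longer tests a type code; the virtual call selects the code.
double areaOf(const Shape& shape) { return shape.area(); }
```

Note that the bodies of the area functions are still ordinary procedural code, which is the compatibility described above.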

Functional programming doesn't have this same degree of compatibility with object-oriented programming (or structured programming). Functional programming uses immutable objects; object-oriented programming is usually about mutable objects. Functional programming works with sets of data and leverages tail recursion efficiently; object-oriented programming uses the explicit loops and conditional statements of procedural programming.
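
The difference in style can be sketched with a simple sum, written in C++ for consistency with the other examples here. The function names are invented, and one caveat applies: unlike many functional languages, C++ does not promise to turn a tail call into a loop, so the recursive form is shown for its shape rather than its efficiency.

```cpp
#include <cstddef>
#include <vector>

// Procedural style: an explicit loop and a mutable accumulator.
int sumLoop(const std::vector<int>& values) {
    int total = 0;
    for (std::size_t i = 0; i < values.size(); ++i) {
        total += values[i];
    }
    return total;
}

// Functional style: tail recursion. The running total is passed along as a
// parameter, and no variable is modified after it is created.
int sumTail(const std::vector<int>& values, std::size_t index = 0, int total = 0) {
    if (index == values.size()) return total;
    return sumTail(values, index + 1, total + values[index]);
}
```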

The constructs of functional programming work poorly at containing object-oriented constructs. The "trick" of wrapping old code in a containing layer of new code may not work with functional programming and object-oriented programming. It may be better to build functional programming constructs inside of object oriented programming constructs, working from the "inside out" rather than from the "outside in" of the object-oriented transition.

One concept that has helped me transition is that of immutable objects. This is a notion that I have "imported" from functional programming into object-oriented programming. (And I must admit that the idea is not mine, nor is it new; Java's String objects are immutable and have been since its inception.)

The use of immutable objects has improved my object-oriented programs. It has moved me in the direction of functional programming -- a step in the transition.

I believe that we will transition from object-oriented programming to functional programming. I foresee a large effort to do so, and I foresee that some programs will remain object-oriented programs, just as some legacy programs remain procedural programs. I am uncertain of the time frame; it may be in the next five years or the next twenty. (The advantages of functional programming are compelling, so I'm tending to think sooner rather than later.)

Sunday, April 28, 2013

C++ without source (cpp) files

A thought experiment: can we have C++ programs without source files (that is, without .cpp files)?

The typical C++ program consists of header files (.h) and source files (.cpp). The header files provide definitions for classes, and the source files provide the definition of the implementations.

Yet the C++ language allows one to define function implementations in the header files. We typically see this only for short functions. To wit:

random_file.h

class random_class
{
private:
    int foo_;
public:
    // the constructor needs a body, even an empty one
    random_class( int foo ) : foo_(foo) {}
    int foo( void ) const { return foo_; }
};

This code defines a small class that contains a single value and has no methods other than one read-only accessor. The sole member variable is initialized in the constructor.

Here's my idea: Using the concepts of functional programming (namely immutable variables that are initialized in the constructor), one can define a class as a constructor and a bunch of read-only accessors.

If we keep class size to a minimum, we can define all classes in header files. The constructors are simple, and the accessor functions simply return calculated values. There is no need for long methods.

(Yes, we could define long functions in headers, but that seems to be cheating. We allow short functions in headers and exile long functions into .cpp files.)

Such a design is, I think, possible, although perhaps impractical. It may be similar to the chemists' "perfect gas", an abstraction that is nice to conceive but unseen in the real world.

Yet a "perfect gas" of a class (perhaps a "perfect class") may be possible for some classes in a program. Those perfect classes would be small, with few member variables and only accessor functions. Their values would be immutable. The member variables may be objects of smaller classes (perhaps perfect classes) with immutable values of their own.
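
A sketch of two such classes, one built from the other, living entirely in a header. The file name perfect_classes.h and the classes point and rectangle are hypothetical.

```cpp
// perfect_classes.h (hypothetical file name)
#ifndef PERFECT_CLASSES_H
#define PERFECT_CLASSES_H

// A small "perfect class": a constructor and read-only accessors.
class point {
public:
    point(int x, int y) : x_(x), y_(y) {}
    int x() const { return x_; }
    int y() const { return y_; }
private:
    const int x_;
    const int y_;
};

// A perfect class whose member variables are smaller perfect classes.
class rectangle {
public:
    rectangle(const point& top_left, const point& bottom_right)
        : top_left_(top_left), bottom_right_(bottom_right) {}
    // Accessors return calculated values; nothing is stored twice.
    int width() const { return bottom_right_.x() - top_left_.x(); }
    int height() const { return bottom_right_.y() - top_left_.y(); }
    int area() const { return width() * height(); }
private:
    const point top_left_;
    const point bottom_right_;
};

#endif
```

Every function fits on one line, so nothing here needs to be exiled to a .cpp file.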

This may be a way to improve code quality. My experience shows that immutable objects are much easier to code, to use, and to debug. If we build simple immutable classes, then we can code them in header files and we can discard the source files.

Coding without source files -- now there is an idea for the future.

Friday, August 24, 2012

How I fix old code

Over the years (and multiple projects) I have developed techniques for improving object-oriented code. My techniques work for me (and the code that has been presented to me). Here is what I do:

Start at the bottom: Not the base classes, but the bottom-most classes. The classes that are used by other parts of the code, and have no dependencies. These classes can stand alone.

Work your way up: After fixing the bottom classes, move up one level. Fix those classes. Repeat. Working up from the bottom is the only way I have found to be effective. One can have an idea of the final result, a vision of the finished product, but only by fixing the problems at the bottom can one achieve any meaningful results.

Identify class dependencies: To start at the bottom, one must know the class dependencies. Not the class hierarchy, but the dependencies between classes. (Which classes use which other classes at run-time.) I use some custom Perl scripts to parse code and create a list of dependencies. The scripts are not perfect but they give me a good-enough picture. The classes with no dependencies are the bottom classes. Often they are utility classes that perform low-level operations. They are the place to start.
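
Once a list of dependencies exists, finding the bottom classes and the levels above them is mechanical. The following C++ sketch is not the Perl scripts mentioned above; it only illustrates the ordering step, and it assumes that every class appears as a key in the map (with an empty set if it uses nothing). The names Dependencies and bottomUpLevels are invented.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Maps each class name to the names of the classes it uses.
typedef std::map<std::string, std::set<std::string> > Dependencies;

// Groups classes into levels: level 0 uses no other class, level 1 uses only
// level-0 classes, and so on. Classes caught in a dependency cycle are omitted.
std::vector<std::set<std::string> > bottomUpLevels(const Dependencies& deps) {
    std::vector<std::set<std::string> > levels;
    std::set<std::string> placed;
    while (placed.size() < deps.size()) {
        std::set<std::string> level;
        for (const auto& entry : deps) {
            if (placed.count(entry.first)) continue;
            bool ready = true;
            for (const auto& used : entry.second) {
                if (!placed.count(used)) { ready = false; break; }
            }
            if (ready) level.insert(entry.first);
        }
        if (level.empty()) break;  // only cycles (or unknown classes) remain
        placed.insert(level.begin(), level.end());
        levels.push_back(level);
    }
    return levels;
}
```

The first set returned holds the bottom classes, the place to start; each later set can be tackled once the sets before it are done.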

Create unit tests: Tests are your friends! Unit tests for the bottom (stand-alone) classes are generally easy to create and maintain. Tests for higher-level classes are a little trickier, but possible with immutable lower-level classes.

Make objects immutable: The Java String class (and the C# String class) showed us a new way of programming. I ignored it for a long time (too long, in my opinion). Immutable objects are unchangeable, and do not have the "classic" object-oriented functions for setting properties. Instead, they are fixed to their original value. When you want to change a property, the immutable object techniques dictate that instead of modifying an object you create a new object.

I start by making the lowest-level classes immutable, and then working my way up the "chain" of class dependencies.

Make member variables private: Create accessor functions when necessary. I prefer to create "get" accessors only, but sometimes it is necessary to create "set" accessors. I find that it is easier to track and identify access with functions than with member variables, but that may be an effect of Visual Studio. Once the accessors are in place, I forget about the "get" accessors and look to remove the "set" accessors.

Create new constructors: Constructors are your friends. They take a set of data and build an object. Create the ones that make sense for your application.

Fix existing constructors to be complete: Sometimes people use constructors to partially construct objects, relying on the code to call "set" accessors later. Immutable object programming has none of that nonsense: when you construct an object you must provide everything. If you cannot provide everything, then you are not allowed to construct the object! No soup (or object) for you!
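
A complete constructor might look like the following C++ sketch. The class Employee and its validation rules are hypothetical; the point is only that the object either has everything it needs or does not exist.

```cpp
#include <stdexcept>
#include <string>

class Employee {
public:
    // Everything the object needs arrives here; there are no "set" accessors.
    Employee(const std::string& name, int id) : name_(name), id_(id) {
        // Incomplete ingredients mean no object at all.
        if (name.empty()) throw std::invalid_argument("name is required");
        if (id <= 0) throw std::invalid_argument("id must be positive");
    }
    const std::string& name() const { return name_; }
    int id() const { return id_; }
private:
    const std::string name_;
    const int id_;
};
```

Code that holds an Employee never has to ask whether the name has been set yet.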

When possible, make member functions static: Static functions have no object of their own and so no implicit access to member variables; one must pass in all "ingredient" variables. This makes it clear which variables must be defined to call the function. Not all member functions can be static; make the functions called by constructors static when possible. (Really, put the effort into this task.) Calls to static functions can be re-sequenced at will, since they cannot have side effects on the object.

Static functions can also be moved from one class to another, at will. Or at least easier than member functions. It's a good attribute when re-arranging code.
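
A sketch of static functions feeding a constructor, in C++. The class Invoice, its members, and the whole-percent tax calculation are assumptions made for the example.

```cpp
class Invoice {
public:
    Invoice(long subtotal_cents, int tax_percent)
        : subtotal_cents_(subtotal_cents),
          tax_cents_(taxFor(subtotal_cents, tax_percent)),
          total_cents_(totalFor(subtotal_cents,
                                taxFor(subtotal_cents, tax_percent))) {}

    long subtotal() const { return subtotal_cents_; }
    long tax() const { return tax_cents_; }
    long total() const { return total_cents_; }

    // Static: every "ingredient" is a parameter, and no object is needed.
    static long taxFor(long subtotal_cents, int tax_percent) {
        return subtotal_cents * tax_percent / 100;
    }
    static long totalFor(long subtotal_cents, long tax_cents) {
        return subtotal_cents + tax_cents;
    }
private:
    const long subtotal_cents_;
    const long tax_cents_;
    const long total_cents_;
};
```

Because taxFor and totalFor touch no member variables, they can be tested without constructing an Invoice, and they could be moved to another class with little more than a rename.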

Reduce class size: Someone (I don't remember where) claimed that the optimum class size was 70 lines of code. I tend to agree with this size. Bottom classes can easily be expressed in 70 lines. (If not, they are probably composites of multiple elementary classes.) Higher-level classes can often be represented in 70 lines or less, sometimes more. (But never more than 150 lines.)

Reducing class size usually means increasing the number of classes. Your code size may shrink somewhat (my experience shows a reduction of 40 to 60 percent) but it does not reduce to zero. Smaller classes often mean more classes. I find that a system with more, smaller classes is easier to understand than one with fewer, large classes.

Name your classes well: Naming is one of the great challenges of programming. Pick names carefully, and change names when it makes sense. (If your version control system resists changes to class names, get a new version control system. It is the servant, not you!)

Talk with other developers: Discuss changes with other developers. Good developers can provide useful feedback and ideas. (Poor developers will waste your time, though.)

Discuss code with non-developers: Our goal is to create code that can be read by non-developers who are experts in the subject matter. We want them to read our code, absorb it, and provide feedback. We want them to say "yes, that seems right" (or even better, "oh, there is a problem here with this calculation"). To achieve that level of understanding, we need to strip away all of the programming overhead: temporary variables, memory allocation, and sequence/iteration gunk. With immutable object programming, meaningful names, and modern constructs (in C++, that means Boost) we can create high-level routines that are readable by non-programmers.

(Note that we are not asking the non-programmers to write code, merely to read it. That is enough.)

These techniques work for me (and the folks on my projects). Your mileage may vary.

Sunday, October 23, 2011

Functional programming pays off (part 2)

We continue to gain from our use of functional programming techniques.

Using just the "immutable object" technique, we've improved our code and made our programming lives easier. Immutable objects have given us two benefits this week.

The first benefit: less code. We revised our test framework to use immutable objects. Rather than instantiating a test object (which exercises the true object under test) and asking it to run the tests, we now instantiate the test object and it runs the tests immediately. We then simply ask it for the results. Our new code is simpler than before, and contains fewer lines of code.
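
The shape of such a test object can be sketched as follows. This is not the framework described above, which is not shown in this post; the function square and the class SquareTest are hypothetical stand-ins, written in C++.

```cpp
// Hypothetical function under test.
int square(int x) { return x * x; }

// Runs its checks during construction; afterwards it only reports results.
class SquareTest {
public:
    SquareTest() : passed_(countPassed()), total_(3) {}
    int passed() const { return passed_; }
    int failed() const { return total_ - passed_; }
    bool ok() const { return passed_ == total_; }
private:
    // Static, so the checks depend on nothing inside the test object.
    static int countPassed() {
        int count = 0;
        if (square(0) == 0) ++count;
        if (square(3) == 9) ++count;
        if (square(-4) == 16) ++count;
        return count;
    }
    const int passed_;
    const int total_;
};
```

There is no separate "run" step to forget or to call twice: constructing the object is running the tests.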

The second benefit: we can extract classes from one program and add them to another -- and do it easily. This is a big win. Often (too often), extracting a class from one program is difficult, because of dependencies and side effects. The one class requires other classes, not just direct dependencies but classes "to the side" and "above" in order to function. In the end, one must import most of the original system!

With immutable objects, we have eliminated side effects. Our code has no "side" or "above" dependencies, and has fewer direct dependencies. Thus, it is much easier for us to move a class from one program into another.

We took advantage of both of these effects this week, re-organizing our code. We were productive because our code used immutable objects.

Friday, September 30, 2011

Functional programming pays off

I've been using the "immutable objects" technique from functional programming. This technique starts with object-oriented programming and constrains objects to immutables, objects that cannot change once constructed. This is quite different from the traditional object-oriented programming approach, in which objects can change their state.

With the immutable object style, objects can be constructed but not modified. This is constraining, yet it is also freeing. Once I have an object, I know that I can re-arrange code and not affect the object -- it is immutable, after all. Re-arranging code lets us simplify the higher-level functions and make the code more readable.

The new technique was not "natural" -- no change in programming techniques ever is -- and it took some effort to change. I started at the bottom of the object hierarchy, which let me modify objects with no dependencies. This approach was important. I could change the bottom-most objects and not affect the other (high-level) objects. It let me introduce the concept gradually, and with minimal ripples.

Over the past few weeks, I have extended the immutable style upwards, and now most classes are immutable. This change has already yielded results; we can debug problems faster and change the system design quickly, and each time we know that we are introducing no new defects. (A comprehensive set of tests helps, too.)

We now have code that can be read by our (non-programming) subject matter experts, code that works, and code that can be easily changed. This is a win for all of us.

I expect immutable object programming to become popular, and soon.

Monday, August 22, 2011

Immutable Object Programming

I've been working with "Immutable Object Programming" and becoming more impressed with it.

Immutable Object Programming is object-oriented programming with objects that, once created, do not change. It is a technique used in functional programming, and I borrowed it as a transition from traditional object-oriented programming to functional programming.

Immutable Object Programming (IOP) enforces a discipline on the programmer, much like structured programming enforced a discipline on programmers. With IOP, one must assemble all components of an object prior to its creation. The approach of traditional object-oriented programming allows for objects to change state, and this is not possible with IOP. With IOP, you do not want an object to change state. Instead, you want a new object, often an object of a different type. Thus, when you have new information, you construct a new object from the old, adding the information and creating a new object of a similar but different type. (For example, a Sale object and a payment are used to construct a CompletedSale object.)
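
The Sale and payment example might be sketched like this in C++. Only the three class names come from the text; the members, the amounts in cents, and the change calculation are assumptions.

```cpp
#include <stdexcept>

class Sale {
public:
    explicit Sale(long amount_cents) : amount_cents_(amount_cents) {}
    long amountCents() const { return amount_cents_; }
private:
    const long amount_cents_;
};

class Payment {
public:
    explicit Payment(long tendered_cents) : tendered_cents_(tendered_cents) {}
    long tenderedCents() const { return tendered_cents_; }
private:
    const long tendered_cents_;
};

// New information does not modify the Sale; it produces an object of a new type.
class CompletedSale {
public:
    CompletedSale(const Sale& sale, const Payment& payment)
        : amount_cents_(sale.amountCents()),
          change_cents_(changeFor(sale, payment)) {}
    long amountCents() const { return amount_cents_; }
    long changeCents() const { return change_cents_; }
private:
    // The decision logic lives in the construction code.
    static long changeFor(const Sale& sale, const Payment& payment) {
        if (payment.tenderedCents() < sale.amountCents()) {
            throw std::invalid_argument("payment does not cover the sale");
        }
        return payment.tenderedCents() - sale.amountCents();
    }
    const long amount_cents_;
    const long change_cents_;
};
```

The calling code becomes a short, linear run of constructions: a Sale, then a Payment, then a CompletedSale built from the two.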

IOP yields programs that have lots of classes and mostly linear logic. The majority of statements are assignment statements -- often creating an object -- and the logic for iteration and decisions is contained within the constructor code.

As a programmer, I have a good feeling about the programs I write using IOP techniques. It is a feeling of certainty, a feeling that the code is correct. It is a good feeling.

I experienced this feeling once before, when I learned structured programming techniques. At the time, my programs were muddled and difficult to follow. With structured programming techniques, my programs became understandable.

I have not had that feeling since. I did not experience it with object-oriented programming; OOP was difficult to learn and not clarifying.

You can use immutable object programming immediately; it requires no new compiler or language. It requires a certain level of discipline, and a willingness to change. I use it with the C# language; it works with any modern language. (For this conversation, C++ is omitted from the set of modern languages.) I started with the bottom layer of our objects, the ones that are self-contained. Once the "elementary" objects were made immutable, I moved up a layer to the next set of objects. Within a few weeks I was at the highest level of objects in our code.
