Sunday, November 15, 2009

With more than toothpicks

On one rather large, multi-decade project, the developers proclaimed that their program was object-oriented. Yet when I asked to see a class hierarchy chart, they could not provide one. I found this odd, since a hierarchy chart is useful, especially for new members of the team. The developers claimed that they didn't need one, and that new team members picked up the code without it. (The statement was true, although the system was so large and so complex that new members needed about six months to become productive.)

I was suspicious of the code's object-oriented-ness: I suspected that the team was not really using object-oriented techniques.

It turns out that their code was object-oriented, but only to a small degree. They had lots of classes, all derived from framework classes. Their code was a thin layer built atop the framework. Their 'hierarchy' was exactly one layer deep. (Or tall, depending on how you look at it.)

This kind of design is akin to building a house (the application) on a good foundation (the framework) but then building everything out of toothpicks. Well, maybe not toothpicks, but small stones and pieces of wood. Rather than using studs and pre-assembled windows, this team built everything above the foundation, and built it with only what was in the foundation. They created no classes to help them -- nothing that was the equivalent of pre-made cabinets or carpeting.
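To make the analogy concrete, here is a minimal C++ sketch of what one "pre-made cabinet" might look like: an application-level class sitting between the framework and the leaf classes. Every name in it is made up for illustration. FrameworkDialog stands in for whatever base class the framework supplied, and ReportDialog, InvoiceDialog and StatementDialog are hypothetical application classes, not the project's actual code.

```cpp
#include <string>

// Stand-in for a class supplied by the framework (hypothetical).
class FrameworkDialog {
public:
    virtual ~FrameworkDialog() = default;
    virtual std::string Title() const { return "Dialog"; }
};

// Application-level layer: behavior shared by every report dialog is
// written once here instead of being repeated in each leaf class.
class ReportDialog : public FrameworkDialog {
public:
    std::string Title() const override { return "Report: " + Name(); }

protected:
    // Each concrete report supplies only what is unique to it.
    virtual std::string Name() const = 0;
};

// Leaf classes: two layers above the framework instead of one.
class InvoiceDialog : public ReportDialog {
protected:
    std::string Name() const override { return "Invoice"; }
};

class StatementDialog : public ReportDialog {
protected:
    std::string Name() const override { return "Statement"; }
};
```

With this arrangement, calling Title() on an InvoiceDialog through a FrameworkDialog reference yields "Report: Invoice", and a new kind of report needs only a Name() of its own.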

The code was difficult to follow, for many reasons. One of the reasons was the constant shifting of context. Some functions were performed in classes, others were performed in code. Different levels of "height" were mixed in the same code. Here's a (small, made-up) example:

    int print_invoice(Items * items, Customer customer, Terms * terms)
    {
        // make sure customer is valid
        if (!customer.valid()) return ERR_CUST_NOT_VALID;

        // set up printer
        PrinterDialog pdlg;
        if (pdlg.DoModal() == S_OK)
        {
            Printer printer(pdlg.GetName());

            char * buffer = NULL;

            // print the customer name, truncated to 30 characters
            buffer = customer.GetName();
            buffer[30] = '\0';
            printer.Print(buffer);
            delete[] buffer;
            if (customer.IsBusiness())
            {
                 // print the company name, truncated to 35 characters
                 buffer = customer.GetCompany();
                 buffer[35] = '\0';
                 printer.Print(buffer);
            }
            // more lines to print customer info

            for (int i = 0; i < items->Count(); i++)
            {
                 // one extra byte for the terminating '\0'
                 int item_size = (*items)[i].GetSize();
                 char *buffer2 = new char[item_size + 1];
                 buffer2[item_size] = '\0';
                 // (filling buffer2 with the item text is omitted)

                 printer.Print(buffer2);

                 delete[] buffer2;
            }

            // more printing stuff for terms and totals

        }
        return S_OK;
    }

This fictitious code captures the spirit of the problem: A relatively high-level function (printing an invoice) has to deal with very low-level operations (memory allocation). This was not an isolated example -- the entire system was coded in this manner.

The problem with this style of code is the load that it places on the programmer. The poor sap who has to maintain this code (or worse, enhance it) has to mentally bounce up and down, thinking in high-level business functions and low-level technical functions. Each of these is a context switch, in which the programmer must stop thinking about one set of things and start thinking about another set of things. Context switches are expensive. You want to minimize them. If you force programmers to go through them, they will forget things. (For example, in the above code the programmer did not delete the memory allocated for printing the company name. You probably didn't notice it either -- you were too busy shifting from detail-to-general mode.)

Object-oriented programming lets us organize our code, and lets us organize it on our terms -- we get to define the classes and objects. But so few people use it to their advantage.

To be fair, in all of the programming courses and books, I have seen very little advocacy for programmers. It's not a new concept. Gerry Weinberg wrote about "the readability of programs" in his The Psychology of Computer Programming in 1971. And Perl offers many ways to do the same thing, with the guiding principle of "using the one that makes sense". But beyond that, I have seen nothing in courses that strives to make a programmer's job easier. Nor have I seen any management tracts on measuring the complexity of code and designing systems to reduce long-term maintenance costs.

Consequently, new programmers start writing code and group everything into the obvious classes, but stop there. They don't (most of the time) create hierarchies of classes. And why should they? None of their courses covered such a concept. Examples in courses have the same mix of high-level and low-level functions, so programmers have been trained to mix them. The systems they build work -- that is, they produce the desired output -- with mixed contexts, so it can't be that big of a problem.

In one sense, they are right. Programs with mixed contexts can produce the desired output. Of course, so can non-OO programs using structured programming. And so can spaghetti code, using neither OO nor structured programming.

Producing the right output is necessary but not sufficient. The design of the program affects future enhancements and defect corrections. I believe -- but have no evidence -- that mixed-context programs have more defects than well-organized programs. I believe this because a well-organized program should be easier to read, and defects should be easier to spot. High-level functions can contain just business logic and low-level functions can contain just technical details, and a reader of either can focus on the task at hand and not switch between the two.
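As a sketch of that separation, the invoice example might be reorganized along the following lines. The names here (Printer, Customer, Item, fit_to_width, print_customer, print_items) are hypothetical, Printer is a stand-in that merely collects lines, and std::string replaces the hand-managed char buffers. Treat it as an illustration of the idea under those assumptions, not as a drop-in fix.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a real printer class: it only records lines.
struct Printer {
    std::vector<std::string> lines;
    void Print(const std::string& line) { lines.push_back(line); }
};

struct Customer {
    std::string name;
    std::string company;  // empty when the customer is not a business
    bool IsBusiness() const { return !company.empty(); }
};

struct Item {
    std::string description;
};

// Low level: technical detail only. std::string owns its memory,
// so there is nothing to delete and nothing to forget.
std::string fit_to_width(const std::string& text, std::size_t width) {
    return text.substr(0, width);
}

// Middle level: how the customer block and the item list are laid out.
void print_customer(Printer& printer, const Customer& customer) {
    printer.Print(fit_to_width(customer.name, 30));
    if (customer.IsBusiness()) {
        printer.Print(fit_to_width(customer.company, 35));
    }
}

void print_items(Printer& printer, const std::vector<Item>& items) {
    for (const Item& item : items) {
        printer.Print(item.description);
    }
}

// High level: reads as a description of an invoice and nothing else.
void print_invoice(Printer& printer, const Customer& customer,
                   const std::vector<Item>& items) {
    print_customer(printer, customer);
    print_items(printer, items);
}
```

A reader of print_invoice never sees a buffer, and a reader of fit_to_width never sees a customer. Each function can be checked without leaving its own level.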

I think that it is time we focus on the readability of the code, and the stress load that bad code puts on programmers. We have the techniques (object-oriented programming) to organize code into readable form. We have the motive (readable code is easier to maintain). We have the computing power to "afford" what some might consider to be "inefficient" code designs.

All we need now is the will.

