Sunday, November 15, 2009

With more than toothpicks

On one rather large, multi-decade project, the developers proclaimed that their program was object-oriented. Yet when I asked to see a class hierarchy chart, they could not provide one. I found this odd, since a hierarchy chart is useful, especially for new members of the team. The developers claimed that they didn't need one, and that new team members picked up the code without it. (The statement was true, although the system was so large and so complex that new members needed about six months to become productive.)

I was suspicious of the code's object-oriented-ness. I suspected them of not using object-oriented techniques.

It turns out that their code was object-oriented, but only to a small degree. They had lots of classes, all derived from framework classes. Their code was a thin layer built atop the framework. Their 'hierarchy' was exactly one layer deep. (Or tall, depending on how you look at it.)

This kind of design is akin to building a house (the application) on a good foundation (the framework) but then building everything out of toothpicks. Well, maybe not toothpicks, but small stones and pieces of wood. Rather than using studs and pre-assembled windows, this team built everything above the foundation, and built it with only what was in the foundation. They created no classes to help them -- nothing that was the equivalent of pre-made cabinets or carpeting.

The code was difficult to follow, for many reasons. One of them was the constant shifting of context. Some operations were performed by classes; others were coded inline. Different levels of "height" were mixed in the same code. Here's a (small, made-up) example:

    int print_invoice(Items * items, Customer customer, Terms * terms)
    {
        // make sure customer is valid
        if (!customer.valid()) return ERR_CUST_NOT_VALID;

        // set up printer
        PrinterDialog pdlg;
        if (pdlg.DoModal() == S_OK)
        {
            Printer printer(pdlg.GetName());

            char * buffer = NULL;

            buffer = customer.GetName();
            buffer[30] = '\0';
            printer.Print(buffer);
            delete [] buffer;
            if (customer.IsBusiness())
            {
                 buffer = customer.GetCompany();
                 buffer[35] = '\0';
                 printer.Print(buffer);
            }
            // more lines to print customer info

            for (int i = 0; i < items->Count(); i++)
            {
                 int item_size = items->Item(i).GetSize();
                 char * buffer2 = new char[item_size + 1];
                 items->Item(i).GetText(buffer2);
                 buffer2[item_size] = '\0';

                 printer.Print(buffer2);

                 delete [] buffer2;
            }

            // more printing stuff for terms and totals

        }

        return 0;
    }

This fictitious code captures the spirit of the problem: A relatively high-level function (printing an invoice) has to deal with very low-level operations (memory allocation). This was not an isolated example -- the entire system was coded in this manner.

The problem with this style of code is the load that it places on the programmer. The poor sap who has to maintain this code (or worse, enhance it) has to mentally bounce up and down between high-level business functions and low-level technical functions. Each of these is a context switch, in which the programmer must stop thinking about one set of things and start thinking about another set of things. Context switches are expensive. You want to minimize them. If you force programmers to go through them, they will forget things. (For example, in the above code the programmer did not delete the memory allocated for printing the company name. You probably didn't notice it either -- you were too busy shifting from detail-to-general mode.)

Object-oriented programming lets us organize our code, and lets us organize it on our terms -- we get to define the classes and objects. But so few people use it to their advantage.

To be fair, I have seen very little advocacy for programmers in programming courses and books. It's not a new concept. Gerry Weinberg wrote about "the readability of programs" in his The Psychology of Computer Programming in the mid 1970s. And Perl offers many ways to do the same thing, with the guiding principle of "using the one that makes sense". But beyond that, I have seen nothing in courses that strives to make a programmer's job easier. Nor have I seen any management tracts on measuring the complexity of code and designing systems to reduce long-term maintenance costs.

Consequently, new programmers start writing code and group everything into the obvious classes, but stop there. They don't (most of the time) create hierarchies of classes. And why should they? None of their courses covered such a concept. Examples in courses have the same mix of high-level and low-level functions, so programmers have been trained to mix them. The systems they build work -- that is, they produce the desired output -- with mixed contexts, so it can't be that big of a problem.

In one sense, they are right. Programs with mixed contexts can produce the desired output. Of course so can non-OO programs using structured programming. And so can spaghetti code, using neither OO nor structured programming.

Producing the right output is necessary but not sufficient. The design of the program affects future enhancements and defect corrections. I believe -- but have no evidence -- that mixed-context programs have more defects than well-organized programs. I believe this because a well-organized program should be easier to read, and defects should be easier to spot. High-level functions can contain just business logic and low-level functions can contain just technical details, and a reader of either can focus on the task at hand and not switch between the two.

I think that it is time we focus on the readability of the code, and the stress load that bad code puts on programmers. We have the techniques (object-oriented programming) to organize code into readable form. We have the motive (readable code is easier to maintain). We have the computing power to "afford" what some might consider to be "inefficient" code designs.

All we need now is the will.


Wednesday, November 11, 2009

Oh say can you C?

Programmers have two favorite pastimes: arguing about languages and inventing new languages. (Arguing about editors is probably a close third.) When we're not doing one, we're probably doing the other.

I've written about the demise of C++. Yet its predecessor, C, is doing well. So well that people have re-invented C to look more like modern object-oriented languages. Two new languages are "Brace" and "OOC". Brace recasts C syntax to match that of Python, removing braces and using indentation for blocking. OOC is an object-oriented language that is compiled to C.

Improvements to the C language are not new. Objective C was developed in the early 1980s, and C++ itself is a "better" version of C. The early implementations of C++ used a source-to-source translator called 'cfront', which converted C++ into C.

Improvements of this nature happen a lot. Borland improved Pascal, first extending standard Pascal with useful I/O functions and later morphing it into the Delphi product. Microsoft made numerous changes to BASIC, adding features, converting to Visual Basic, and continuing to add (and often change) features. Even FORTRAN was remade into RATFOR, a name derived from 'rational Fortran'. ('Rational Fortran' meant 'looks like C'.)

I'm not sure that Brace will have much in the way of success. Recasting C into Python gets you ... well, something very close to Python. Why exert the effort? If you wanted Python, you should have started with it. Brace does include support for coroutines, something that may appeal to a very narrow audience, and has support for graphics which may appeal to a broader group. But I don't see a compelling reason to move to it. OOC is in a similar situation. My initial take is that OOC is Ruby but with static typing. And if you wanted Ruby... well, you know the rest.

Improvements to C are nice, but I think the improvers miss an important point: C is small enough to fit inside our heads. The C language is simple and can be understood with four concepts: variables, structs, functions, and pointers. Everything in C is built from these four elements, and can be understood in these terms. You can look at C code and compile it with your "cortex compiler". (I'm ignoring atrocities committed by the preprocessor.) The improved versions of C are more complex and understanding a code fragment requires broader knowledge of the program. Every feature of C++ hid something of the code, and made the person reading the code go off and look at other sections.

The most important aspect of a programming language is readability. Programmers read code more often than you think, and they need to understand it. C has this quality. Its derivatives do not. Therefore, there is a cost to using the derivatives. There are also benefits, such as better program organization with object-oriented techniques. The transition from C to C++, or Objective C, or Brace, or OOC is a set of trade-offs, and should be made with care.


Sunday, November 8, 2009

Microsoft Shares its Point but Google Waves

Microsoft and Google are the same, yet different. For example, they both offer collaboration tools. Microsoft offers Sharepoint and Google has announced 'Waves'.

Microsoft Sharepoint is a web-based repository for documents (and anything that passes as a document in the Microsoft universe, such as spreadsheets and presentations). Sharepoint also has a built-in list type that has no counterpart in the desktop world. And Sharepoint can be extended with programs written on the .NET platform.

Google Waves is a web-based repository for conversations -- e-mail threads -- with the addition of anything that passes for a document in the Google universe.

Sharepoint and Waves are similar in that they are built for collaboration. They are also similar in that they use version control to keep previous revisions of documents.

Sharepoint and Waves are different, and their differences say a lot about their respective companies.

Sharepoint is an extension of the desktop. It provides a means for sharing documents, yet it ties in to Microsoft Office neatly. It is a way for Microsoft to step closer to the web and help their customers move.

Waves is an extension of the web forum thread model, tying in to Google documents. It is a way for Google to step closer to the desktop (or functions that are performed on the desktop) and help their customers.

I've used Microsoft Sharepoint and seen demonstrations of Waves. I generally discount demonstrations -- anyone can have a nice demo -- but Google's impressed me.

The big difference is in the approach. Microsoft has introduced Sharepoint as a way for people who use desktops and the desktop metaphor to keep using them. Google, on the other hand, has positioned Waves as a replacement for e-mail.

Why should I mention e-mail? Because e-mail is a big problem for most organizations. E-mail is a model of the paper-based mail system, and not effective in the computer world. We know the problems with e-mail and e-mail threads (reading messages from the bottom up, losing attachments, getting dropped from lists) and the problems are not small. Yet we believed that the problem was in ourselves, not the e-mail concept.

Google has a better way. They move away from e-mail and use a different model, a model of a conversation. People can join and leave as they wish. New joiners can review older messages quickly. Everyone has the latest versions of documents.

And here is the difference between Microsoft and Google. Microsoft created a tool -- Sharepoint -- to address a problem. Sharepoint is nice but frustrating to use; it is an extension of the desktop operating system and expensive to administrate. It offers little for the user and has no concept of e-mail or conversations. Google has taken the bold step of moving to a new concept, thinking (rightfully so in my opinion) that the problems of collaboration cannot be solved with the old metaphors. Google has started with the notion of conversation and built from there.

Just as EDSAC was an electronic version of a mechanical adding machine and EDVAC was a true electronic computer, e-mail is an electronic version of paper mail and Waves is a conversation system. Microsoft is apparently content with e-mail; Google is willing to innovate.


Friday, October 16, 2009

The end of the C++ party

In the future, historians of programming languages will draw a line and say: "this is the point that C++ began its decline". And that point will be prior to today. The party is over for C++, although many of the partygoers are still drinking punch and throwing streamers in the air.

Peter Seibel's blog excerpts comments from the just-released Coders at Work. He lists multiple comments about the C++ language, all of them from detractors.

C++ has had a history of negative comments. Its early history, as a quiet project before the internet and related twitterness, saw comments about C++ through e-mails and usenet. As people became interested in C++, there were more comments (some positive and some negative), but there was the feeling that C++ was the future and the place to go. Negative comments, when made, were directed at the difficulty of learning a new paradigm (object-oriented programming), the implementation (the compiler and libraries), or the support tools (the IDE and debugger). C++ was the shiny new thing.

The arrival of IBM OS/2 and Microsoft Windows also made C++ attractive. OS/2 and Windows use an event-driven model, and object-oriented programs fare better than procedural programs. Microsoft's support for C++ (among other languages) also made it a "safe" choice.

The novelty of a new programming language is a powerful drug, and C++ was a new language. Managers may have been reluctant to move to it (the risks of unknown territory and longer ramp-up for developers) and some programmers too (charges of larger executables and "inefficient generated code") but eventually we (as an industry) adopted it. The euphoria of the new was replaced with the optimism of the next release: "Yes," we told ourselves, "we're having difficulties, but the problem is in our compiler, or our own expertise. Next year will be better!"

And for a while, the next year was better. And the year after that one was better too, because we were becoming better object-oriented programmers and the compilers were getting better.

But there were those who complained. And those who doubted. And there were those who took action.

Sun introduced Java, another object-oriented programming language. For a while, it held the allure of "the new thing". It had its rough spots (performance, IDE) but we overcame them and newer versions were better. And C++ was no longer the one and only choice for object-oriented programming. (I'm ignoring the earlier languages such as LISP and Scheme. They never entered the mainstream.)

Once we had Java, we could look at C++ in a different light. C++ was not the shining superhero that we desired. He was just another shlub that happened to do some things well. C++ was demoted from "all-wonderful" to "just another tool", much to the delight of the early complainers.

Other languages emerged. Python. Ruby. Objective-C. Haskell. Most were object-oriented, but none powerful enough to dislodge C++. The killer (for C++) was Microsoft's C# language. The introduction of C# (and .NET) struck two blows against C++.

First, C# was viewed as a Java clone. Microsoft failed at embracing and extending Java, so they created a direct competitor. By doing so, they gave Java (and its JVM) the stamp of legitimacy.

Second, Microsoft made C# their premier language, demoting C++ below Visual Basic. (Count the number of sample code fragments on the Microsoft web site.) Now Microsoft was saying that C++ wasn't the shiny new thing.

We (in the programming industry) examined our problems with C++, discussed them, debated them, and arrived at a conclusion: most of the problems have been solved, but the one that remains is that C++ is a difficult language. The next version of the compiler will not fix that problem. Nor will more design patterns. Nor will user groups.

The C++ party is over. People are leaving. Not just the folks in Coders at Work, but regular programmers. Companies are finding it hard to hire C++ programmers. Recruiters tell me that C++ programmers want to move on to other things. We as a profession have decided, if not to abandon C++, at least to give it a smaller role.

Which presents a problem for the owners of C++ systems.

The decision to leave C++ has been made at the programmer level. Programmers want out. Very few college graduates learn C++ (or want to learn it).

But the owners of systems (businessmen and managers) have not made the decision to leave C++. For the most part, they want to keep their (now legacy) applications running. They see nothing wrong with C++, just as they saw nothing wrong with C and FORTRAN and COBOL and dBase V. C++ works for them.

In a bizarre, almost Marxist twist, the workers are leaving owners with the means of production (the compilers and IDEs of C++) and moving on to other tools.

C++ has been elevated to the rank of "elder language", joining COBOL and possibly FORTRAN. From this point on, I expect that the majority of comments on C++ will be negative. We have decided to put it out to pasture, to retire it.

There is too much code written in C++ to simply abandon it. Businesses have to maintain their code. Some open source projects will continue to use it. But it will be used grudgingly, as a concession to practicalities. Linux won't be converted to a new language... but the successor to Linux will use something other than C++.


Friday, October 9, 2009

Glass houses

I just went through the experience of renewing my IEEE (and IEEE Computer Society) membership with the IEEE web pages. The transaction was, in a word, embarrassing.

Here is my experience:

- After I logged in, the web site complained that I was attempting to start a second session and left me with an empty window. I had to re-load the renewal page to continue. (Not simply pressing the "reload" button, but re-selecting the IEEE URL.)

- The few pages to process the renewal were straightforward, until I reached the "checkout" page. This page had a collection of errors.

- After entering my credit card number, the site informed me that I had too many characters in the number. I had entered the number with spaces, just as it appears on my credit card and my statements. The site also erased my entry, forcing me to re-enter the entire number.

- I used the "auto-fill" button to retrieve the stored address. The auto-fill did not enter a value for the country, however, nor could I, as the field was disabled. Only after adjusting the street address could I select a country.

- After clicking the "process" button, the web site informed me that I had an invalid value in the "state/province" field. I dutifully reviewed the value supplied by the auto-fill routine, changed it from "MD" to "MD".

- That action fixed the problem with the state/province field, but the web site then erased my credit card number. After entering the credit card number again (the third time), I was able to renew my membership.

If the IEEE (and by association the IEEE Computer Society) cannot create and maintain a check-out web site, a function that has been with us for the past ten years and is considered elementary, then they have little credibility for advice on software design and construction. More than that, if the IEEE cannot get "the basics" right, how can anyone trust them for the advanced concepts?


Thursday, October 8, 2009

A cell phone is not a land-line phone

When you call a land line, you call a place. When you call a cell phone, you call a person.

I heard this idea at a recent O'Reilly conference. (It was either E-Tech or OSCON, but I don't remember. Nor do I remember the speaker.)

In the good ole days, calling a place was the same as calling a person. Mostly. A typical (working-class) person could have two locations: home and office. To discuss business, you called them at their office. To discuss other matters, you called them at their home.

A funny thing happened on the way to the Twenty-first Century: people became mobile, and technology became mobile too.

Mobility is not a new idea. Indeed, one can look at the technological and social changes of the Twentieth Century to see the trend of increasing mobility. Trains, airplanes, hotels, reservation systems... the arrow points from "stay in one place" to "move among locations". Modern-day cell phones and portable internet tablets are logical steps in a long chain.

People have become mobile and businesses will become mobile too.

Yet many people (and many organizations) cling to the old notion of "a person has a place and only one place". Even stronger is the idea "a business has a place and only one place (except for branch offices and subsidiaries)". Our state and federal governments have coded these notions into laws, with concepts of "state of residence" and "permanent address". Many businesses tie their customers to locations, and then build an internal organization based on that assumption (regional sales reps, for example). For customers that have large physical assets such as factories and warehouses, this makes some sense. But for the lightweight customer, one without the anchoring assets, it does not. (Yet businesses -- and governments -- will insist on a declared permanent address because their systems need it.)

Newer businesses are not encumbered with this idea. Twitter and LiveJournal, for example, don't care about your location. They don't have to assess your property, send tax bills, or deliver physical goods. Facebook does allow you to specify a location, but as a convenience for finding other people in your social network. (Facebook does limit you to one physical location, though, so I cannot add my summer home.)

Some businesses go so far as to tie an account to a physical location. Land-line phones are one example, a holdover from the old billing practice of charging based on the distance called. At least one large shipping company uses the "you are always in this place" concept, since it also uses a "charge based on distance" model.

For moving physical boxes in the real world, this may make some sense, but telephone service has all but completely moved to the "pure minutes" model, with no notion of distance. (Calling across country borders is more expensive, but this is a function of politics and rate tariffs and not technology.)

We have separated a person from a single location. Soon we will detach businesses from single locations.


Sunday, October 4, 2009

Limits to Growth

Did you know that Isaac Newton, esteemed scientist and notable man of the Church, once estimated the theoretical maximum height for trees? I didn't, until I read a recent magazine article. It claimed that he calculated the maximum height as 300 feet, using strength and weight formulas.

I have found no other reference to confirm this event, but perhaps the truth of the event is less important than the idea that one can calculate a theoretical maximum.

For trees, the calculation is straightforward. Weight is a function of volume, which grows with the cube of the height. Strength is a function of the cross-section of the trunk, which grows only with the square. Chart both on the same graph and the lines are not parallel; the point at which they cross is the theoretical maximum. (There are a few other factors, such as the density of the wood, and they can be included in the calculation.) The intersection point is the limit, beyond which no tree can grow.

Let's move from trees to software. Are there limits to software? Can we calculate the maximum size of a program or system? Here the computations are more complex. I'm not referring to arbitrary limits such as the maximum number of modules a compiler can handle (although those limits seem to be relegated to our past) but to the size of a program, or of a system of programs.

It's hard to say that there are limits to the size of programs. Our industry, over the past sixty years, has seen programs and systems grow in size and complexity. In the early days, a program of a few hundred lines of code was considered large. Today we have systems with hundreds of millions of lines of code. There seems to be no upper limit.

If we cannot identify absolute limits for programs or systems, can we identify limits to programming teams? It's quite easy to see that a programming team of one person would be limited to the output of a single individual. That individual might be extremely talented and extremely hard-working, or might be an average performer. A team of programmers, in theory, can perform more work than a single programmer. Using simple logic, we could simply add programmers until we achieve the needed capacity.

Readers of The Mythical Man-Month by Fred Brooks will recognize the fallacy of that logic. Adding programmers to a team increases capacity, but also increases the communication load. More programmers need more coordination. Their contributions increase linearly, but coordination effort increases faster than linearly. (Metcalfe's law, which says that communication channels increase as the square of the number of participants, works against you here.) You have a graph with two lines, and at some point they cross. Beyond that point, your project spends more time communicating than coding, and each additional programmer costs more than they produce.

I don't have numbers. Brooks indicated that a good team size was about seven people. That's probably a shock to the managers of large, multi-million LOC projects and their teams of dozens (hundreds?) of programmers. Perhaps Brooks is wrong, and the number is higher.

The important thing is to monitor the complexity. Knowing the trend helps one plan for resources and measure efficiency. Here's my list of important factors. These are the things I would measure:

- The complexity of the data
- The complexity of the operations on the data
- The power of the programming language
- The power of the development tools (debuggers, automated tests)
- The talent of people on the team (programmers, testers, and managers)
- The communication mechanisms used by the team (e-mail, phone, video conference)
- The coordination mechanisms used by the team (meetings, code reviews, documents)
- The rate at which changes are made to the code
- The quality of the code
- The rate at which code is refactored

The last two factors are often overlooked. Changes made to the code can be of high or low quality. High-quality changes are elegant and easy to maintain. Low-quality changes get the work done, but leave the code difficult to maintain. Refactoring improves the code quality while keeping the feature set constant. Hastily-made changes often leave you in a technical hole. These two factors measure the rate at which you are climbing out of the hole. If you aren't measuring these two factors, then your team is probably digging the hole deeper.

So, as a manager, are you measuring these factors?

Or are you digging the hole deeper?