Sunday, November 22, 2009

Open Source Microsoft

A lot has been written about Microsoft's latest moves to open source.

I don't expect Microsoft to turn itself into Google. Or Apache. Or even Sun or Novell. I expect Microsoft to remain Microsoft. I expect them to remain a for-profit business. I expect them to keep some amount of software as closed source.

Here's what happens if Microsoft opens its source code in a significant manner:

First, the notion of open source software becomes legitimate. People who avoided open source software because it was "not what Microsoft does" will have no reason to avoid it. They may start to laud the principles of open source. Many companies, large and small, will look at the non-Microsoft offerings and consider them. (I expect a number of shops to remain dedicated to Microsoft solutions, open or closed.)

Second, the open source community takes a hit. Not the entire community, but a major portion of it. The blow is psychological, not technical. The openness of open source defines the "open source community" and separates it from the large commercial shops like Microsoft. If Microsoft adopts open source (even in part), then the traditional open source community (many of whom are Microsoft-bashers) suffers an identity crisis.

Third, the open source folks who depended on the notion of "we're not Microsoft" will substitute some other mechanism for differentiating themselves from Microsoft. Look for renewed language wars (tricky with Microsoft funding things like IronPython and IronRuby) and possibly the notion of "pure" open source. The latter may catch companies that use a dual approach to software, such as Novell and MySQL.

Microsoft will stay focussed on its goals. The open source community may become splintered, with some folks searching for ways to bash Microsoft, some folks trying to blend Microsoft into their current solutions, and others remaining on their current path.

Could it be that Microsoft has found a way to neutralize the threat of open source software?

Sunday, November 15, 2009

With more than toothpicks

On one rather large, multi-decade project, the developers proclaimed that their program was object-oriented. Yet when I asked to see a class hierarchy chart, they could not provide one. I found this odd, since a hierarchy chart is useful, especially for new members of the team. The developers claimed that they didn't need one, and that new team members picked up the code without it. (The statement was true, although the system was so large and so complex that new members needed about six months to become productive.)

I was suspicious of the code's object-oriented-ness; I suspected that the team was not really using object-oriented techniques.

It turns out that their code was object-oriented, but only to a small degree. They had lots of classes, all derived from framework classes. Their code was a thin layer built atop the framework. Their 'hierarchy' was exactly one layer deep. (Or tall, depending on how you look at it.)

This kind of design is akin to building a house (the application) on a good foundation (the framework) but then building everything out of toothpicks. Well, maybe not toothpicks, but small stones and pieces of wood. Rather than using studs and pre-assembled windows, this team built everything above the foundation, and built it with only what was in the foundation. They created no classes to help them -- nothing that was the equivalent of pre-made cabinets or carpeting.

The code was difficult to follow, for many reasons. One of the reasons was the constant shifting of context. Some functions were performed in classes, others were performed in code. Different levels of "height" were mixed in the same code. Here's a (small, made-up) example:

    int print_invoice(Items * items, Customer customer, Terms * terms)
    {
        // make sure customer is valid
        if (!customer.valid()) return ERR_CUST_NOT_VALID;

        // set up printer
        PrinterDialog pdlg;
        if (pdlg.DoModal() == S_OK)
        {
            Printer printer(pdlg.GetName());

            char * buffer = NULL;

            buffer = customer.GetName();
            buffer[30] = '\0';
            printer.Print(buffer);
            delete[] buffer;
            if (customer.IsBusiness())
            {
                 buffer = customer.GetCompany();
                 buffer[35] = '\0';
                 printer.Print(buffer);
            }
            // more lines to print customer info

            for (int i = 0; i < items->Count(); i++)
            {
                 int item_size = (*items)[i].GetSize();
                 char *buffer2 = new char[item_size + 1];
                 buffer2[item_size] = '\0';

                 printer.Print(buffer2);

                 delete[] buffer2;
            }

            // more printing stuff for terms and totals

        }
        return S_OK;
    }

This fictitious code captures the spirit of the problem: A relatively high-level function (printing an invoice) has to deal with very low-level operations (memory allocation). This was not an isolated example -- the entire system was coded in this manner.

The problem with this style of code is the load that it places on the programmer. The poor sap who has to maintain this code (or worse, enhance it) has to mentally bounce up and down, thinking in high-level business functions and low-level technical functions. Each of these is a context switch, in which the programmer must stop thinking about one set of things and start thinking about another set of things. Context switches are expensive. You want to minimize them. If you force programmers to go through them, they will forget things. (For example, in the above code the programmer did not delete the memory allocated for printing the company name. You probably didn't notice it either -- you were too busy shifting from detail-to-general mode.)

Object-oriented programming lets us organize our code, and lets us organize it on our terms -- we get to define the classes and objects. But so few people use it to their advantage.

To be fair, in all of the programming courses and books, I have seen very little advocacy for programmers. It's not a new concept. Gerry Weinberg wrote about "the readability of programs" in his The Psychology of Computer Programming in 1971. And Perl offers many ways to do the same thing, with the guiding principle of "using the one that makes sense". But beyond that, I have seen nothing in courses that strives to make a programmer's job easier. Nor have I seen any management tracts on measuring the complexity of code and designing systems to reduce long-term maintenance costs.

Consequently, new programmers start writing code and group everything into the obvious classes, but stop there. They don't (most of the time) create hierarchies of classes. And why should they? None of their courses covered such a concept. Examples in courses have the same mix of high-level and low-level functions, so programmers have been trained to mix them. The systems they build work -- that is they produce the desired output -- with mixed contexts, so it can't be that big of a problem.

In one sense, they are right. Programs with mixed contexts can produce the desired output. Of course, so can non-OO programs using structured programming. And so can spaghetti code, using neither OO nor structured programming.

Producing the right output is necessary but not sufficient. The design of the program affects future enhancements and defect corrections. I believe -- but have no evidence -- that mixed-context programs have more defects than well-organized programs. I believe this because a well-organized program should be easier to read, and defects should be easier to spot. High-level functions can contain just business logic and low-level functions can contain just technical details, and a reader of either can focus on the task at hand and not switch between the two.

I think that it is time we focus on the readability of the code, and the stress load that bad code puts on programmers. We have the techniques (object-oriented programming) to organize code into readable form. We have the motive (readable code is easier to maintain). We have the computing power to "afford" what some might consider to be "inefficient" code designs.

All we need now is the will.


Wednesday, November 11, 2009

Oh say can you C?

Programmers have two favorite pastimes: arguing about languages and inventing new languages. (Arguing about editors is probably a close third.) When we're not doing one, we're probably doing the other.

I've written about the demise of C++. Yet its predecessor, C, is doing well. So well that people have re-invented C to look more like modern object-oriented languages. Two new languages are "Brace" and "OOC". Brace recasts C syntax to match that of Python, removing braces and using indentation for blocking. OOC is an object-oriented language that is compiled to C.

Improvements to the C language are not new. Objective C was developed in the early 1980s, and C++ itself is a "better" version of C. The early implementations of C++ were source-to-source compiled with a program called 'cfront'.

Improvements of this nature happen a lot. Borland improved Pascal, first extending standard Pascal with useful I/O functions and later morphing it into the Delphi product. Microsoft made numerous changes to BASIC, adding features, converting to Visual Basic, and continuing to add (and often change) features. Even FORTRAN was remade into RATFOR, a name derived from 'rational Fortran'. ('Rational Fortran' meant 'looks like C'.)

I'm not sure that Brace will have much in the way of success. Recasting C into Python gets you ... well, something very close to Python. Why exert the effort? If you wanted Python, you should have started with it. Brace does include support for coroutines, something that may appeal to a very narrow audience, and has support for graphics which may appeal to a broader group. But I don't see a compelling reason to move to it. OOC is in a similar situation. My initial take is that OOC is Ruby but with static typing. And if you wanted Ruby... well, you know the rest.

Improvements to C are nice, but I think the improvers miss an important point: C is small enough to fit inside our heads. The C language is simple and can be understood with four concepts: variables, structs, functions, and pointers. Everything in C is built from these four elements, and can be understood in these terms. You can look at C code and compile it with your "cortex compiler". (I'm ignoring atrocities committed by the preprocessor.) The improved versions of C are more complex and understanding a code fragment requires broader knowledge of the program. Every feature of C++ hid something of the code, and made the person reading the code go off and look at other sections.
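A small illustration of that hiding, sketched in C++ with a made-up `Money` type (the type and the function `invoice_total` are hypothetical, invented for this example). The addition inside `invoice_total` looks like plain C arithmetic, but it runs a converting constructor and a user-defined `operator+`, and a reader has to find both elsewhere in the program to know what the line does.

```cpp
// Hypothetical type, invented for this illustration.
struct Money {
    long cents;

    // Converting constructor: lets a plain integer stand in for Money.
    Money(long whole_units) : cents(whole_units * 100) {}
};

// User-defined addition. Nothing at the call site reveals that it exists.
Money operator+(const Money& a, const Money& b) {
    Money sum(0);
    sum.cents = a.cents + b.cents;
    return sum;
}

long invoice_total(Money price) {
    // Reads like integer addition, but 5 is first converted to Money
    // (500 cents) and then operator+ is called.
    Money total = price + 5;
    return total.cents;
}
```

In C, the same expression on integers would mean one thing only, and the "cortex compiler" could evaluate it without looking anywhere else.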

The most important aspect of a programming language is readability. Programmers read code more often than you think, and they need to understand it. C had this quality. Its derivatives do not. Therefore, there is a cost to using the derivatives. There are also benefits, such as better program organization with object-oriented techniques. The transition from C to C++, or Objective C, or Brace, or OOC is a set of trade-offs, and should be made with care.


Sunday, November 8, 2009

Microsoft Shares its Point but Google Waves

Microsoft and Google are the same, yet different. For example, they both offer collaboration tools. Microsoft offers Sharepoint and Google has announced 'Waves'.

Microsoft Sharepoint is a web-based repository for documents (and anything that passes as a document in the Microsoft universe, such as spreadsheets and presentations). Sharepoint also has a built-in list that has no counterpart in the desktop world. And Sharepoint can be extended with programs written on the .NET platform.

Google Waves is a web-based repository for conversations -- e-mail threads -- with the addition of anything that passes for a document in the Google universe.

Sharepoint and Waves are similar in that they are built for collaboration. They are also similar in that they use version control to keep previous revisions of documents.

Sharepoint and Waves are different, and their differences say a lot about their respective companies.

Sharepoint is an extension of the desktop. It provides a means for sharing documents, yet it ties in to Microsoft Office neatly. It is a way for Microsoft to step closer to the web and help their customers move.

Waves is an extension of the web forum thread model, tying in to Google documents. It is a way for Google to step closer to the desktop (or functions that are performed on the desktop) and help their customers.

I've used Microsoft Sharepoint and seen a demonstration of Waves. I generally discount demonstrations -- anyone can have a nice demo -- but Google's impressed me.

The big difference is in the approach. Microsoft has introduced Sharepoint as a way for people who use desktops and the desktop metaphor to keep using them. Google, on the other hand, has positioned Waves as a replacement for e-mail.

Why should I mention e-mail? Because e-mail is a big problem for most organizations. E-mail is a model of the paper-based mail system, and not effective in the computer world. We know the problems with e-mail and e-mail threads (reading messages from the bottom up, losing attachments, getting dropped from lists) and the problems are not small. Yet we believed that the problem was in ourselves, not the e-mail concept.

Google has a better way. They move away from e-mail and use a different model, a model of a conversation. People can join and leave as they wish. New joiners can review older messages quickly. Everyone has the latest versions of documents.

And here is the difference between Microsoft and Google. Microsoft created a tool -- Sharepoint -- to address a problem. Sharepoint is nice but frustrating to use; it is an extension of the desktop operating system and expensive to administer. It offers little for the user and has no concept of e-mail or conversations. Google has taken the bold step of moving to a new concept, thinking (rightly so, in my opinion) that the problems of collaboration cannot be solved with the old metaphors. Google has started with the notion of conversation and built from there.

Just as ENIAC was an electronic version of a mechanical adding machine and EDVAC was a true electronic computer, e-mail is an electronic version of paper mail and Waves is a conversation system. Microsoft is apparently content with e-mail; Google is willing to innovate.