Thursday, December 17, 2009

e-Books are not books

The statement "e-Books are not books" is, on its face, a truism. Of course they're not books. They exist as bits and can be viewed only by a "reader", a device or program that renders pixels to a person.

My point is beyond the immediate observation. e-Books are not books, and will have capabilities we do not associate with books today. e-Books are a new form, just as the automobile was not a horseless carriage and a word processor is not a typewriter.

We humans need time to understand a new thing. We didn't "get" electronic computing right away. ENIAC was an electronic version of a mechanical adding machine; a few years later EDVAC was a computer.

Shifts in technology can be big or small. Music was distributed on paper; the transition to 78s and LPs was a major shift. It took us some time to fully appreciate the possibilities of recorded music. The shift to compact discs (small shiny plastic instead of large, warping vinyl) was a small one; little changed in our consumption or in the economic model. The shift to digital music on forms other than shiny plastic discs is a big one, and the usage and economic model will change.

An on-line newspaper is not a newspaper. It will become a form of journalism -- but not one called a newspaper, nor will it have the same capabilities or limitations as ink slapped onto dead trees.

e-Books are not books. The stories and information presented to us on Kindle and Nook readers are the same as in printed books, but that will change. For example, I expect that annotations will become the norm for e-books. Multiple readers will provide annotations, with comments for themselves and for others. (Think of it as today's e-book with Twitter and Google.) One person has blogged about their method for reading books (http://www.freestylemind.com/how-to-read-a-book) and how they keep notes and re-read portions of books for better understanding. Their system uses post-it notes. I predict that future e-Book readers will allow for the creation and storage of personal notes, and the sharing of notes with friends and the world.

Or perhaps e-Books will let us revise books and make corrections. (Think "e-Books today combined with Wikipedia".)

And that is why an e-book is not a book.


Sunday, December 13, 2009

Code first, then design, and then architecture

Actually, the sequence is: Tests first, then code, then design, and finally architecture.

On a recent project, I worked the traditional project sequence backwards. Rather than starting with an architecture and developing a design and then code, we built the code and then formed a design, and later evolved an architecture. We used an agile process, so we had tests supporting us as we worked, and those tests were valuable.

Working backwards seems wrong to many project managers. It breaks with the metaphor of programming as building a house. It goes against the training in project management classes. It goes against the big SDLC processes.

Yet it worked for us. We started with a rudimentary set of requirements. Very rudimentary. Something along the lines of "the program must read these files and produce this output", and nothing more formal or detailed. Rather than put the details in a document, we left the details in the sample files.

Our first task was to create a set of tests. Given the nature of the program, we could use simple scripts to run the program and compare output against a set of expected files. The 'diff' program was our test engine.
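The heart of that test engine can be sketched in a few lines. On the project it was a shell script wrapped around diff(1); below is a self-contained C++ approximation (the function name and stream-based interface are mine, not the project's) that compares captured output against expected output, line by line:

```cpp
#include <istream>
#include <sstream>
#include <string>

// A minimal stand-in for our 'diff' test engine: report whether the
// program's captured output matches the stored expected output.
// The real runs compared files on disk; streams keep this sketch
// self-contained.
bool outputs_match(std::istream& actual, std::istream& expected)
{
    std::string a, e;
    while (true) {
        bool more_a = static_cast<bool>(std::getline(actual, a));
        bool more_e = static_cast<bool>(std::getline(expected, e));
        if (more_a != more_e) return false;  // one stream ran out first
        if (!more_a) return true;            // both exhausted: outputs match
        if (a != e) return false;            // first differing line: fail
    }
}
```

A wrapper ran the program once per sample input, captured the output, and applied this comparison; any mismatch marked the test as failed.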

After we had some tests, we wrote some code and ran the tests. Some passed, but most failed. We weren't discouraged; we expected most of them to fail. We slowly added features to the code and got our tests to pass. As we coded, we thought of more tests and added them to our scripts.

Eventually, we had a program that worked. The code wasn't pretty -- we had made several design compromises as we coded -- but it did provide the desired output. The "standard" process would advance at this point, to formal testing and then deployment. But we had other plans.

We wanted to improve the code. There were several classes that were big and hard to maintain. We knew this by looking at the code. (Even during our coding sessions, we told ourselves "this is ugly".) So we set out to improve the code.

Managers of typical software development efforts might cringe at such an effort. They've probably seen many attempts to improve code fail without delivering any improvement. Or perhaps the programmers say that the code is better, but the manager has no evidence of improvement.

We had two things that helped us. First was our tests. We were refactoring the code, so we knew that the behavior would not change. (If you're refactoring code and you want the behavior to change, then you are not refactoring the code -- you're changing the program.) Our tests kept us honest by catching changes in behavior. When we were done, we had new code that passed all of the old tests.

The second thing we had was class reference diagrams. Not class hierarchy diagrams, but reference diagrams. Class hierarchy diagrams show you the inheritance and container relationships of classes. Reference diagrams give you a different picture, showing you which classes are used by other classes. The difference is subtle but important. The reference diagrams gave us a view of the design. They showed all of our classes, with arrows diagramming the connections between classes. We had several knots of code -- sets of classes with tightly-coupled relationships -- and we wanted a cleaner design.
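A made-up fragment shows the difference. Both kinds of diagram would include these three classes, but they would draw different arrows:

```cpp
#include <type_traits>

// Hypothetical classes, invented for illustration only.
// A hierarchy diagram records inheritance: Invoice is-a Document.
struct Document { virtual ~Document() { } };
struct Invoice : Document { };

// A reference diagram records usage: Printer uses Document, a
// relationship that never appears on a hierarchy diagram.
struct Printer {
    int jobs_printed = 0;
    void Print(const Document&) { ++jobs_printed; }
};
```

On the hierarchy diagram, Printer floats alone; on the reference diagram, an arrow from Printer to Document reveals exactly the kind of coupling we were trying to untangle.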

We got our cleaner design, and we kept the "before" and "after" diagrams. The non-technical managers could see the difference, and commented that the "after" design was a better one.

We repeated this cycle of code-some, refactor-some, and an architecture evolved. We're pretty happy with it. It's easy to understand, allows for changes, and gives us the performance that we need.

Funny thing, though. At the start, a few of us had definite ideas about the "right" architecture for the programs. (Full disclosure: I was one such individual.) Our final architecture, the one that evolved to meet the specific needs of the program as we went along and learned about the task, looked quite different from the initial ideas. If we had picked the initial architecture and stayed with it, our resulting program would be complicated and hard to maintain. Instead, by working backwards, we ended with a better design and better code.

Sometimes, the way forward is to go in reverse.


Saturday, December 5, 2009

Limits to App Growth

Long ago (when dinosaurs roamed the Earth), applications were limited in size. Today, applications are still limited in size, but for different reasons.

Old constraints were hardware and software: the physical size of the computer (memory and disk), the capabilities of the operating system, and the capacities of the compiler. For example, some compilers had a fixed-size symbol table.

Over the decades, physical machines became more capable, and the limits from operating systems and compilers have become less constraining. So much so that they no longer limit the size of applications. Instead, a different factor is the limiting one. That factor is upgrades to tools.

How can upgrades limit the size of an application? After all, new versions of compilers are always "better" than the old. New operating systems give us more features, not fewer.

The problem comes not from the release of new tools, but from the deprecation of the old ones.

New versions of tools often break compatibility with the old version. Anyone who programmed in Microsoft's Visual Basic saw this as Microsoft rolled out version 4, which broke a lot of code. And then again as version 5 broke a lot of code. And then again as VB.NET broke a ... well, you get the idea.

Some shops avoid compiler upgrades. But you can't avoid the future, and at some point you must upgrade. Possibly because you cannot buy new copies of the old compiler. Possibly because another tool (like the operating system) forces you to the new compiler. Sometimes a new operating system requires the use of new features (Windows NT, for example, with its "Ready for Windows NT" logo requirements).

Such upgrades are problematic for project managers. They divert development resources from other initiatives with no increase in business capabilities. They're also hard to predict, since they occur infrequently. One can see that the effort is related to the size of the code, but little beyond that. Will all modules have to change, or only a few? Does the code use a language feature or library call that has changed? Are all of the third-party libraries compatible with the new compiler?

The project team is especially challenged when there is a hard deadline. This can come from the release of a new platform ("we want to be ready for Windows 7 on its release date!") or the expiration of an old component ("Visual Studio 6 won't be supported on Windows Vista"). In these situations, you *have* to convert your system to the new component/compiler/platform by a specific date.

This is the factor that limits your system size. Small systems can be adapted to a new compiler or platform with some effort. Larger systems require more effort. Systems of a certain size will require so much effort that they cannot be converted in time. What's the crossover point? That depends on your code, your tools, and your team's talent. I think that every shop has its own factors. But make no mistake, in every shop there is a maximum size to a system, a size that once crossed will be too large to upgrade before the deadline.

What are the deadlines? That's the evil part of this situation. You're not in control of these deadlines; your vendors create them. For most shops, that's Microsoft, or Sun, or IBM.

Here's the problem for Microsoft shops: MFC libraries.

Lots of applications use MFC. Big systems and small. Commonly used systems and rarely-used ones. All of them dependent on the MFC libraries.

At some point, Microsoft will drop support for MFC. After they drop support, their new tools will not support MFC, and using MFC will become harder. Shops will try to keep the old tools, or try to drag the libraries into new platforms, but the effort won't be small and won't be pretty.

The sunset of MFC won't be a surprise. I'm sure that Microsoft will announce it well in advance. (They've made similar announcements for other tools.) The announcement will give people notice and let them plan for a transition.

But here's the thing: Some shops won't make the deadline. Some applications are so big that their maintainers will be unable to convert them in time. Even if they start on the day Microsoft announces their intent to "sunset" MFC. Their system is too large to meet the deadline.

That's the limit to systems. Not the size of the physical machine, not the size of the compiler's symbol table, but the effort to "stay on the treadmill" of new versions. Or rather, the ability of the development team to keep from falling off the end of the treadmill.

I've picked MFC as the bogeyman in this essay, but there are other dependencies. Compilers, operating systems, third-party libraries, IPv4, Unicode, mobile-ness in apps, the iPhone, Microsoft Office file formats, Windows as a dominant platform, ... the list goes on.

All projects are built on foundations. These foundations can change. You must be prepared to adapt to changes. Are you and your team ready?


Sunday, November 22, 2009

Open Source Microsoft

A lot has been written about Microsoft's latest moves to open source.

I don't expect Microsoft to turn itself into Google. Or Apache. Or even Sun or Novell. I expect Microsoft to remain Microsoft. I expect them to remain a for-profit business. I expect them to keep some amount of software as closed source.

Here's what happens if Microsoft opens its source code in a significant manner:

First, the notion of open source software becomes legitimate. People who avoided open source software because it was "not what Microsoft does" will have no reason to avoid it. They may start to laud the principles of open source. Many companies, large and small, will look at the non-Microsoft offerings and consider them. (I expect a number of shops to remain dedicated to Microsoft solutions, open or closed.)

Second, the open source community takes a hit. Not the entire community, but a major portion of it. The blow is psychological, not technical. The openness of open source defines the "open source community" and separates it from the large commercial shops like Microsoft. If Microsoft adopts open source (even in part), then the traditional open source community (many of whom are Microsoft-bashers) suffers an identity crisis.

Third, the open source folks who depended on the notion of "we're not Microsoft" will substitute some other mechanism for differentiating themselves from Microsoft. Look for renewed language wars (tricky with Microsoft funding things like IronPython and IronRuby) and possibly the notion of "pure" open source. The latter may catch companies that use a dual approach to software, such as Novell and MySQL.

Microsoft will stay focused on its goals. The open source community may become splintered, with some folks searching for ways to bash Microsoft, some folks trying to blend Microsoft into their current solutions, and others remaining on their current path.

Could it be that Microsoft has found a way to neutralize the threat of open source software?

Sunday, November 15, 2009

With more than toothpicks

On one rather large, multi-decade project, the developers proclaimed that their program was object-oriented. Yet when I asked to see a class hierarchy chart, they could not provide one. I found this odd, since a hierarchy chart is useful, especially for new members of the team. The developers claimed that they didn't need one, and that new team members picked up the code without it. (The statement was true, although the system was so large and so complex that new members needed about six months to become productive.)

I was suspicious of the code's object-oriented-ness. I suspected them of not using object-oriented techniques.

It turns out that their code was object-oriented, but only to a small degree. They had lots of classes, all derived from framework classes. Their code was a thin layer built atop the framework. Their 'hierarchy' was exactly one layer deep. (Or tall, depending on how you look at it.)

This kind of design is akin to building a house (the application) on a good foundation (the framework) but then building everything out of toothpicks. Well, maybe not toothpicks, but small stones and pieces of wood. Rather than using studs and pre-assembled windows, this team built everything above the foundation, and built it with only what was in the foundation. They created no classes to help them -- nothing that was the equivalent of pre-made cabinets or carpeting.

The code was difficult to follow, for many reasons. One of the reasons was the constant shifting of context. Some functions were performed in classes, others were performed in code. Different levels of "height" were mixed in the same code. Here's a (small, made-up) example:

    int print_invoice(Items &items, Customer &customer, Terms &terms)
    {
        // make sure customer is valid
        if (!customer.valid()) return ERR_CUST_NOT_VALID;

        // set up printer
        PrinterDialog pdlg;
        if (pdlg.DoModal() == S_OK)
        {
            Printer printer(pdlg.GetName());

            char * buffer = NULL;

            buffer = customer.GetName();
            buffer[30] = '\0';
            printer.Print(buffer);
            delete[] buffer;
            if (customer.IsBusiness())
            {
                buffer = customer.GetCompany();
                buffer[35] = '\0';
                printer.Print(buffer);
            }
            // more lines to print customer info

            for (int i = 0; i < items.Count(); i++)
            {
                int item_size = items[i].GetSize();
                char * buffer2 = new char[item_size + 1];
                // ... copy the item's text into buffer2 ...
                buffer2[item_size] = '\0';

                printer.Print(buffer2);

                delete[] buffer2;
            }

            // more printing stuff for terms and totals

        }

        return 0;
    }

This fictitious code captures the spirit of the problem: A relatively high-level function (printing an invoice) has to deal with very low-level operations (memory allocation). This was not an isolated example -- the entire system was coded in this manner.

The problem with this style of code is the load that it places on the programmer. The poor sap who has to maintain this code (or worse, enhance it) must mentally bounce up and down between high-level business functions and low-level technical functions. Each bounce is a context switch, in which the programmer must stop thinking about one set of things and start thinking about another. Context switches are expensive. You want to minimize them. If you force programmers through them, they will forget things. (For example, in the above code the programmer did not delete the memory allocated for printing the company name. You probably didn't notice it either -- you were too busy shifting from detail to general mode.)

Object-oriented programming lets us organize our code, and lets us organize it on our terms -- we get to define the classes and objects. But so few people use it to their advantage.

To be fair, in all of the programming courses and books I have seen, there is very little advocacy for programmers. It's not a new concept. Gerry Weinberg wrote about "the readability of programs" in his The Psychology of Computer Programming in the mid 1970s. And Perl offers many ways to do the same thing, with the guiding principle of "use the one that makes sense". But beyond that, I have seen nothing in courses that strives to make a programmer's job easier. Nor have I seen any management tracts on measuring the complexity of code and designing systems to reduce long-term maintenance costs.

Consequently, new programmers start writing code and group everything into the obvious classes, but stop there. They don't (most of the time) create hierarchies of classes. And why should they? None of their courses covered such a concept. Examples in courses have the same mix of high-level and low-level functions, so programmers have been trained to mix them. The systems they build work -- that is, they produce the desired output -- with mixed contexts, so it can't be that big of a problem.

In one sense, they are right. Programs with mixed contexts can produce the desired output. Of course so can non-OO programs using structured programming. And so can spaghetti code, using neither OO nor structured programming.

Producing the right output is necessary but not sufficient. The design of the program affects future enhancements and defect corrections. I believe -- but have no evidence -- that mixed-context programs have more defects than well-organized programs. I believe this because a well-organized program should be easier to read, and defects should be easier to spot. High-level functions can contain just business logic and low-level functions can contain just technical details, and a reader of either can focus on the task at hand and not switch between the two.
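As a sketch of what "well-organized" means here, the earlier print_invoice example could be split so that truncation and buffer handling live in one low-level class while the business function reads at a single height. All of the classes below are invented for the illustration, not taken from the project:

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical domain classes, for illustration only.
struct Customer {
    std::string name;
    std::string company;
    bool is_business;
    bool valid() const { return !name.empty(); }
};

struct Item {
    std::string description;
};

// Low-level detail (truncation, buffers) lives in one place...
class ReportPrinter {
public:
    void PrintLine(const std::string& text, std::size_t max_len)
    {
        lines.push_back(text.substr(0, max_len));  // the old buffer[30] trick, handled once
    }
    std::vector<std::string> lines;  // stand-in for real printer output
};

// ...so the high-level function contains only business logic.
int print_invoice(const Customer& customer,
                  const std::vector<Item>& items,
                  ReportPrinter& printer)
{
    if (!customer.valid()) return 1;  // stand-in for ERR_CUST_NOT_VALID

    printer.PrintLine(customer.name, 30);
    if (customer.is_business)
        printer.PrintLine(customer.company, 35);
    for (std::size_t i = 0; i < items.size(); i++)
        printer.PrintLine(items[i].description, 80);

    return 0;
}
```

A reader of print_invoice now thinks only about invoices; a reader of ReportPrinter thinks only about buffers and truncation. Neither has to switch contexts.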

I think that it is time we focus on the readability of the code, and the stress load that bad code puts on programmers. We have the techniques (object-oriented programming) to organize code into readable form. We have the motive (readable code is easier to maintain). We have the computing power to "afford" what some might consider to be "inefficient" code designs.

All we need now is the will.


Wednesday, November 11, 2009

Oh say can you C?

Programmers have two favorite pastimes: arguing about languages and inventing new languages. (Arguing about editors is probably a close third.) When we're not doing one, we're probably doing the other.

I've written about the demise of C++. Yet its predecessor, C, is doing well. So well that people have re-invented C to look more like modern object-oriented languages. Two new languages are "Brace" and "OOC". Brace recasts C syntax to match that of Python, removing braces and using indentation for blocking. OOC is an object-oriented language that is compiled to C.

Improvements to the C language are not new. Objective C was developed in the early 1980s, and C++ itself is a "better" version of C. The early implementations of C++ were translated to C by a program called 'cfront'.

Improvements of this nature happen a lot. Borland improved Pascal, first extending standard Pascal with useful I/O functions and later morphing it into the Delphi product. Microsoft made numerous changes to BASIC, adding features, converting to Visual Basic, and continuing to add (and often change) features. Even FORTRAN was remade into RATFOR, a name derived from 'rational Fortran'. ('Rational Fortran' meant 'looks like C'.)

I'm not sure that Brace will have much in the way of success. Recasting C into Python gets you ... well, something very close to Python. Why exert the effort? If you wanted Python, you should have started with it. Brace does include support for coroutines, something that may appeal to a very narrow audience, and has support for graphics which may appeal to a broader group. But I don't see a compelling reason to move to it. OOC is in a similar situation. My initial take is that OOC is Ruby but with static typing. And if you wanted Ruby... well, you know the rest.

Improvements to C are nice, but I think the improvers miss an important point: C is small enough to fit inside our heads. The C language is simple and can be understood with four concepts: variables, structs, functions, and pointers. Everything in C is built from these four elements, and can be understood in these terms. You can look at C code and compile it with your "cortex compiler". (I'm ignoring atrocities committed by the preprocessor.) The improved versions of C are more complex and understanding a code fragment requires broader knowledge of the program. Every feature of C++ hid something of the code, and made the person reading the code go off and look at other sections.
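Those four elements fit in a fragment small enough to run through the "cortex compiler" in one pass (this made-up example compiles as C or C++):

```cpp
/* All four elements of C in one fragment: a struct, variables,
   a function, and a pointer. */
struct point {
    int x;
    int y;
};

/* a function that takes a pointer to a struct */
int manhattan(const struct point *p)
{
    return p->x + p->y;  /* member access through the pointer */
}
```

Read the fragment once and you can predict its behavior; nothing in it depends on code you cannot see.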

The most important aspect of a programming language is readability. Programmers read code more often than you think, and they need to understand it. C had this quality. Its derivatives do not. Therefore, there is a cost to using the derivatives. There are also benefits, such as better program organization with object-oriented techniques. The transition from C to C++, or Objective C, or Brace, or OOC is a set of trade-offs, and should be made with care.


Sunday, November 8, 2009

Microsoft Shares its Point but Google Waves

Microsoft and Google are the same, yet different. For example, they both offer collaboration tools. Microsoft offers Sharepoint and Google has announced 'Waves'.

Microsoft Sharepoint is a web-based repository for documents (and anything that passes as a document in the Microsoft universe, such as spreadsheets and presentations). Sharepoint also has a built-in list that has no counterpart in the desktop world. And Sharepoint can be extended with programs written on the .NET platform.

Google Waves is a web based repository for conversations -- e-mail threads -- with the addition of anything that passes for a document in the Google universe.

Sharepoint and Waves are similar in that they are built for collaboration. They are also similar in that they use version control to keep previous revisions of documents.

Sharepoint and Waves are different, and their differences say a lot about their respective companies.

Sharepoint is an extension of the desktop. It provides a means for sharing documents, yet it ties in to Microsoft Office neatly. It is a way for Microsoft to step closer to the web and help their customers move.

Waves is an extension of the web forum thread model, tying in to Google documents. It is a way for Google to step closer to the desktop (or functions that are performed on the desktop) and help their customers.

I've used Microsoft Sharepoint and seen demonstrations of Waves. I generally discount demonstrations -- anyone can have a nice demo -- but Google's impressed me.

The big difference is in the approach. Microsoft has introduced Sharepoint as a way for people who use desktops and the desktop metaphor to keep using them. Google, on the other hand, has positioned Waves as a replacement for e-mail.

Why do I mention e-mail? Because e-mail is a big problem for most organizations. E-mail is a model of the paper-based mail system, and not effective in the computer world. We know the problems with e-mail and e-mail threads (reading messages from the bottom up, losing attachments, getting dropped from lists), and the problems are not small. Yet we believed that the problem was in ourselves, not in the e-mail concept.

Google has a better way. They move away from e-mail and use a different model, a model of a conversation. People can join and leave as they wish. New joiners can review older messages quickly. Everyone has the latest versions of documents.

And here is the difference between Microsoft and Google. Microsoft created a tool -- Sharepoint -- to address a problem. Sharepoint is nice but frustrating to use; it is an extension of the desktop operating system and expensive to administer. It offers little for the user and has no concept of e-mail or conversations. Google has taken the bold step of moving to a new concept, thinking (rightfully so in my opinion) that the problems of collaboration cannot be solved with the old metaphors. Google has started with the notion of conversation and built from there.

Just as ENIAC was an electronic version of a mechanical adding machine and EDVAC was a true electronic computer, e-mail is an electronic version of paper mail and Waves is a conversation system. Microsoft is apparently content with e-mail; Google is willing to innovate.