Sunday, January 31, 2010

How much code is too much code?

As an industry, we have over half a century of experience. We should, by now, be in a position to use that experience to decide on "good" and "bad" software. One dimension is size. Let's explore that aspect of software.

Certainly, software is needed. Lines of code are required to get the work done. Without code, nothing happens.

Yet too much code is a bad thing. Smaller programs are easier to understand than larger programs. And easier to modify. And to fix. To paraphrase C. A. R. Hoare: "A simple program has obviously no defects. A complex program has no obvious defects."

So how do we reduce the size of our code? The traditional methods are: eliminate duplicate code, re-factor to combine similar sections, and re-write to use better coding techniques. These are all laudable, but insufficient.

Here's what you need to do:

First, decide that you care about the size of your source code. In my experience, most project managers (and most programmers) care little if at all for the size of the source code. Savvy managers realize that there is a correlation between the size of source code and the quality of the result. They also realize that size has an effect on their team's ability to meet new requirements. But most managers care only for the end result of delivery on time, with as few defects as possible, and ignore the size of the code.

Second, decide that you want to measure your source code. This implies wanting to monitor the code size and take action as you get the measurements. Many project plans are formed up front, with no slack for adjustments. If you're not going to change the project as you get the information, then why bother collecting it?

Third, decide on the measurement frequency. You must measure frequently enough to make a difference, but not so frequently that you waste resources. The measurement must feed into your OODA (Observe-Orient-Decide-Act) loop. Measuring once per year is probably too infrequent. Measuring every day is probably too frequent.

Fourth, pick a measurement. There are different ways to measure source code, each with its own advantages and disadvantages. Here are a few:

- Lines Of Code (LOC): a raw count of the lines of source code, easily obtained with tools like wc. Advantages: it's easy to do -- very easy. Disadvantages: it doesn't account for blank lines or comments, nor does it measure complexity. (I can write a complex algorithm in 20 lines of code or 200 lines, but which is better? The short one may be hard to understand. Or the long one may be inefficient.)

- Source Lines Of Code (SLOC): a count of just the source code, omitting blank lines and comments. Advantage: a more accurate measure than LOC. Disadvantage: harder to do -- you need filters before sending source into wc.

- Function points: a measure of complexity of the task, not the code. Advantage: better for comparing different projects (especially those that use disparate technologies). Disadvantage: much, much harder to compute. Perhaps so much harder that the effort to derive these numbers costs more than the benefits.

- Complexity measure (McCabe or others): a measure of the complexity in the code. Advantages: good for identifying complex areas of code, and comparing different projects. Disadvantages: hard to derive.
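To make the LOC and SLOC measures above concrete, here is a minimal sketch in Python, playing the role of the "filters before wc". The single comment-prefix rule is a simplifying assumption; a real tool would also have to handle block comments and comment markers inside strings:

```python
def loc(text):
    """Raw line count -- the LOC measure."""
    return len(text.splitlines())

def sloc(text, comment_prefix="#"):
    """Source lines only: skip blank lines and full-line comments."""
    count = 0
    for line in text.splitlines():
        stripped = line.strip()
        if stripped and not stripped.startswith(comment_prefix):
            count += 1
    return count

sample = """\
# A tiny example file
x = 1

y = x + 1  # trailing comments still count as source
"""
print(loc(sample))   # 4
print(sloc(sample))  # 2
```

The gap between the two numbers is itself informative: a large LOC with a small SLOC suggests heavy commenting or heavy whitespace, not necessarily a large program.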

Fifth, decide on your goal. A capricious goal of "reduce code to half its current size" is foolish. How do you know your code is too big? (Yes, it probably is larger than it needs to be, based on what I've seen with various software efforts. But how do you know that a reduction to half its current size is wise, or even possible?) A different goal, and one that may be more useful, is to improve understanding. Here are some ideas:

- For projects that use multiple languages, understand the relative size of the language code bases. How much of your project is in C++? How much in C#? And how much in Visual Basic?

- For projects with components or libraries, understand the relative size of the different components. How does the size of source code compare with the size of the development teams?

- One measurement you can take (with enough effort) is the rate of change -- not just an absolute size. Identifying the code that changes most frequently lets you identify the "hot spots" of your code. These areas may be at the most risk for defects.
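The last idea above -- identifying hot spots by rate of change -- lends itself to a quick sketch: count how often each file appears in a version-control log. The log format here is a simplifying assumption (one file path per line, commits separated by blank lines), roughly what Git's `git log --name-only --pretty=format:` produces:

```python
from collections import Counter

def churn(log_text):
    """Rank files by how many commits touched them."""
    counts = Counter(
        line.strip()
        for line in log_text.splitlines()
        if line.strip()
    )
    return counts.most_common()

# Hypothetical log excerpt: three commits, blank-line separated.
sample_log = """\
src/parser.c
src/main.c

src/parser.c

src/parser.c
src/util.c
"""
print(churn(sample_log))
# [('src/parser.c', 3), ('src/main.c', 1), ('src/util.c', 1)]
```

The files at the top of the list are the candidates for "hot spot" scrutiny -- frequent change plus high complexity is where defects tend to cluster.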

I haven't answered the question of "how much is too much". There is no one answer. But with measurements, you may be able to answer the question for yourself.


Wednesday, January 27, 2010

iPad: an off-by-one error

Apple released the iPad today, to responses that varied from "hooray" to "yawn". The cheers came from the Apple fans, who cheer any announcement from Cupertino. The yawns came from the rest of us, who may be wondering about Apple's strategies.

Here's how I see the iPad: It's Apple's response to the netbook market. Netbook computers have gained popularity over the past few years, and the lack of a netbook-class offering in Apple's products is obvious. Rather than produce a smaller MacBook, Apple has chosen the route of a computer without a keyboard. Which is all very good, but also so very 2008.

The yawns from the audience are caused by Apple's focus on netbooks, which are now the old thing. Netbooks are here and part of the ecosystem. The new things are e-books and e-book readers, like the Kindle and the Nook. (I still think that those are the silliest names on the planet -- but that's not the point.)

Apple addressed last year's hot item, not the current hot item. Yeah, they have a nice-looking tablet computer that can display e-books, and they have an agreement with some publishers. But they are not competing in the market.


Sunday, January 24, 2010

Linux updates vs. Windows updates

This past week, Mozilla released a new version of the Firefox web browser. (Version 3.6, for those who are interested.)

Along with the announcement, various pundits and commentators recommended that we update our Firefox to the new version.

For a while, I had to think about that recommendation. Not because I disagree with it (I believe in staying current with all software, including browsers, compilers, and operating systems) but because I didn't understand it. Why was it necessary to make such a recommendation?

After some pondering, and then some use of my computers at home (one of which runs Windows), I understood.

Microsoft Windows has a built-in update system, but it works only with Microsoft products. (And not necessarily all Microsoft products. It won't update a product if the new version requires a new license.)

I've been living in the Linux world. The Linux distros all have update mechanisms -- at least the distros that I have been using, which include SuSE, Ubuntu and its variants, and Debian. The update mechanisms update all of the installed software. If I've selected Firefox, the distros (SuSE, Ubuntu, and Debian) will all install a new version for me. I don't have to do anything -- except run the updates. (And they kick in automatically, but politely ask permission to install the new versions.)

Side note: The different distros run on their own schedules. Some distros send updates immediately, others take a few weeks. Ubuntu is fairly fast, SuSE is a bit slower, and Debian is slower still. So I don't have the latest and greatest versions of everything immediately. But they do get there, eventually.

In the Windows world, the Microsoft updates handle Microsoft products, but beyond that you are on your own. If you want a new version of a non-Microsoft product, the Microsoft update does not help you.

(I have heard of a Linux-like update for Microsoft Windows, one that installs the latest version of open source products, but I have been unable to find it.)

So now I understand. With Linux, I get the latest version, "for free" since I need take no special action. With Windows, I get the latest version of Microsoft software (except when it costs money) and for other software I am on my own.


Sunday, January 17, 2010

Version control is not about versions

We call it version control, but it's really not about version control.

In the distant past, we lived without version control. (Or automated tests. Or e-mail.)

While programs were written by single individuals, things mostly worked. But even a single individual can make mistakes, and a previous version of the program is helpful. The ability to "go back in time" is useful, and a copy of yesterday's version (or last week's, or last year's) is easy enough to keep.

Once program teams became larger than a single person, we needed ways to coordinate the activities of the different team members. It was all too easy to store the code in a common place, let everyone make changes, and have interleaving reads and writes that lost changes.

Thus version control was invented. Version control did two things: it kept previous versions of the code (providing the "back in time" capability for everyone) and it serialized access to files, so all changes were kept.

Modern, professional projects today use version control of some sort. (SourceSafe, CVS, Subversion, Git, ... the list is a long one.) People think that it is a necessity (which it is) but don't understand the real reason for selecting a version control system.

Most people think that version control systems (VCSs) are about versions. They focus on the "back in time" capability of a VCS.

But VCS is about more than "back in time".

Your version control system defines how the team works. It sets the rules for interactions between team members. If you select a "locking" VCS (or configure one that way), you define a process that forces team members to wait for files to become "unlocked" to check in their changes. If you define a process that does not lock files, then you force team members to merge changes with concurrent changes.
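The "locking" discipline can be sketched in a few lines. This is an illustrative stand-in, not how any particular VCS is implemented (the lockfile mechanism and function names are my own): a file may be changed only while you hold its lock, so concurrent edits are serialized rather than merged.

```python
import os

def acquire_lock(path):
    """Try to take the edit lock on a file; return True on success.
    O_CREAT | O_EXCL makes creation atomic: exactly one caller wins."""
    try:
        fd = os.open(path + ".lock", os.O_CREAT | os.O_EXCL | os.O_WRONLY)
        os.close(fd)
        return True
    except FileExistsError:
        return False  # someone else holds the lock; wait your turn

def release_lock(path):
    """Give the lock back so the next team member can edit."""
    os.remove(path + ".lock")
```

A merge-style VCS takes the opposite bet: everyone edits freely, and conflicting changes are reconciled at check-in time. The code is the same either way; what differs is when team members are forced to coordinate.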

The VCS can grant and restrict access to different branches of the storage "tree". You can use this feature, or you can allow all team members access to every area. In the latter case, you trust team members to make the right changes. In the former case, you don't trust your team members.

Here's the pattern that I've seen: shops that use agile methods (daily builds and automated tests, specifically) put less emphasis on their VCS. They let all team members check in files anywhere. They expect team members to run tests prior to check-in. They expect their team members to do the right things. They trust their teammates.

Teams that use waterfall methods (infrequent builds, limited or no automated tests) use their VCS as a control mechanism. They restrict access to certain areas, to let only the select trusted few make changes in those areas. They expect their team members to do the wrong things, and use their VCS to guard against mistakes. They do not trust their teammates.

I'm not saying that one of these is better than the other. (Although in my personal experience I prefer the former.) Both have their advantages. Which you choose depends on your management style and your trust in your people.

Version control is about the team, how it interacts, and how much it trusts itself. And how much the managers trust the team.


Sunday, January 10, 2010

Microsoft steps forward to 1970

I recently came across Microsoft's white paper for spreadsheet compliance. It has the impressive title "Spreadsheet Compliance in the 2007 Microsoft Office System".

Published in 2006, it describes a set of techniques that can be used to ensure the storage and management of spreadsheets comply with regulations.

The document is clearly aimed at corporate managers. It begins with an executive summary, describing the risks of poorly managed spreadsheets and the legal ramifications of noncompliance with regulations.

The good news is that Microsoft recognizes that spreadsheets are important to organizations, contain critical data, and deserve proper design and management. (And if Microsoft recognizes that, then other companies are likely to follow.)

The bad news is that Microsoft recommends the use of waterfall methodology for spreadsheet development. The steps are: define requirements, design, implement, test and verify, and deploy. Microsoft includes a footnote that indicates this process for spreadsheets was based on the waterfall model for software development. (Or perhaps Microsoft does not recommend this model. The text reads "Here is a recommended development approach to creating spreadsheets", phrased so that no one in particular does the recommending. Microsoft, aside from publishing this white paper, makes no direct recommendation for this method. But their white paper recommends it, so Microsoft recommends it. And neither the paper nor Microsoft presents alternatives.)

My first thought on this recommendation was "Does Microsoft really believe that design-up-front waterfall methods are good ways to design spreadsheets?" I find it hard to believe, since Microsoft must be using agile techniques for their products -- or at least some of them. The waterfall model -- specifically the big-design-up-front, everything-in-one-cycle approach -- is simplistic and naive. The plan looks good on paper, and it promises delivery on a specific date, but for anything beyond trivial projects it fails.

Perhaps Microsoft doesn't believe that the waterfall method is appropriate, but instead thinks that their customers believe it to be appropriate. I find this more reasonable; many companies -- especially big companies -- use waterfall for their development cycle. So Microsoft recommends not what it thinks best, but what it thinks will sell.

Or perhaps this development method is not important. The white paper goes on to explain the features in various Microsoft products that can help companies manage their spreadsheets. I specifically say "companies" because the products are enterprise solutions, not suitable for individuals or small companies. In this case, the white paper is not a solution for problems, but marketing material.

In any case, the impression of Microsoft is not flattering.


Friday, January 8, 2010

The Golden Age of Laptops

In 1994, John Dvorak made numerous predictions in his book "Dvorak Predicts". Many of these turned out to be wrong, including "death of the mainframe", "OS/2 over Windows", and, interestingly, "death of the desktop". In the spirit of predictions, I have one of my own: the golden age of laptops is over, and with it goes the golden age of Windows.

I like laptop computers. (My opinion of Windows is somewhat different.) I've been using them since the early 1980s, starting with the NEC-8201A laptop that ran BASIC. (The NEC is quite similar to the TRS-80 model 100.) I've seen the early laptops run DOS on a single floppy disk, I've watched the technology grow to allow for hard drives and CD-ROM drives. Screens have improved from the original non-backlit, low-resolution LCD panels to today's photo-quality displays. Networking has been added, first with wires and later without. Laptop computers extended our ability to work untethered. They are a success.

But laptops have reached their peak. Compared to smart phones and e-book readers, laptops are large, expensive, and complicated. Laptops are taking a back seat to their smaller, more convenient brethren.

I expect that laptops will remain with us, especially for corporate users. We still have desktops; heck, we still have mainframes. We will see modest improvements: longer battery life, better screens, lighter and thinner units. But I expect no great leaps forward. Laptops have gotten about as good as they need to be.

The market is moving to a newer breed of device, or perhaps a set of breeds. The smart phones and e-readers are moving into the lead.

Here's the interesting thing about the new devices: they don't run Windows. Microsoft's attempts at smaller-than-laptop units have been dismal failures. (Zune, anyone? Or a Windows tablet PC?) Non-Windows software has gotten good enough to drive the devices, and Windows isn't necessary for them. Nor is Java or Linux. The Barnes and Noble Nook e-reader uses Android, which is a Linux-y thing, but it isn't the common Linux that you download and boot off of CD-ROM.

We are entering a new world of software, an open frontier of possibilities. I expect that the new market will be fractured, with multiple vendors. In some ways, it doesn't matter, since people don't care what software runs their phone. All they want is to talk with their friends, send text messages and photos, play music, and surf the web.


Saturday, January 2, 2010

Estimates of woe

Software development is in crisis. Our projects overrun estimates on a consistent and too-frequent basis. The typical project overruns estimates by as much as forty-six percent or fifty-six percent, depending on which studies you consult. (Possibly more, if you find the right studies.)

The schedule overrun crisis, and its companion the budget overrun crisis, has been with us for decades.

But first, a question: If a project runs long (that is, it takes longer than the initial estimate), which is incorrect? Was the project run incorrectly? Or was the estimate incorrect? (There is nothing magical about estimates. I can come up with incorrect estimates, such as driving from New York to Chicago in three hours.)

Most solutions I have seen tacitly assume that the estimates were correct and that the project management needed improvement. Managers, with their one hammer of project management, could see the problem as a project-management nail. I hear no programmers or practitioners asking for better project management. Of course, with their software development hammer, they may see the problem as a software development nail.

I see no demand for better estimating skills. I see no critical analysis of estimates and the means used to create them.

Here is a simple (and probably wrong) reason for the lack of desire to improve estimating skills: No one wants them. Programmers don't want them, because estimating is not fun (estimates are not programming, and only programming amounts to fun). Managers don't want better estimating techniques, because they don't want to lose the ability to fudge estimates to meet business goals. Let's look at estimates before continuing that thought.

Estimates for repetitive activities are easy. Housing construction, automobile assembly, and light manufacturing all have repetitive processes. (One car of a certain make and model is identical to another of the same make and model, one house in a development tract is almost identical to the house next door.) We've been building houses, cars, and consumer goods for decades, and we know how much time and resources went into previous models.

Estimates for non-repeating activities are harder. The time and skill for creating original art is hard to predict. So is the time for building a bridge or tunnel in a difficult location. (Bridges and tunnels are hard to begin with, but a difficult location adds uncertainties.)

But all estimates must have a bottom, some grounding in resources and elementary activities. For houses, you need to know about concrete, lumber, plumbing, and wiring. For cars, you need to know about sheet metal, windshields, wiring, suspension systems, ... the list goes on, but you can get to the bottom.

Software development is different. The estimates for development (at least the estimates that I have seen) have had no grounding. The project manager asks the team leads for estimates, the team leads ask developers, and developers, with no one left below them, have to provide some numbers. But they have nothing on which to base them. So they make up numbers, which are provided to the team leads. The team leads review them and suggest changes (possibly based on their development experience, and possibly based on their knowledge of a business-imposed schedule). Team leads forward their numbers to the project manager, who also reviews them based on development experience (if any) and knowledge of the business desires, adjusting them to meet the business needs.

Sometimes managers will keep asking development teams for "better" estimates, waiting until the team "gets it right". Some managers will dictate the solution to the team; others will specify a solution and then ask for an estimate. (I suspect the latter is to allow the development team to "buy in" to the estimate, and to allow the manager a scapegoat should the project run over.)

So here is my theory: managers claim they want accurate estimates, but really don't. Managers want estimates that they can adjust. Accurate, grounded-in-reality estimates would mean that some projects could not be completed in time for business objectives. Estimates from the current "make the numbers fit" system allow managers to adjust estimates and promise to deliver systems on time. Delivering on time is good, but making the promise is better. Telling the business that they cannot have something is a thankless role. Project managers who tell their business that the request cannot be completed on time find that they are given different assignments -- sometimes at different employers. Project managers who make the promise, and then show that they made every effort to complete the project, get to keep their jobs (usually). Many projects run over schedule and over budget, so businesses have little choice but to accept it as the norm.

As long as we have that set of incentives, we will have adjusted estimates. And as long as we have adjusted estimates, we will have overruns.