Sunday, January 31, 2010

How much code is too much code?

As an industry, we have over half a century of experience. We should, by now, be in a position to use that experience to decide on "good" and "bad" software. One dimension is size. Let's explore that aspect of software.

Certainly, software is needed. Lines of code are required to get the work done. Without code, nothing happens.

Yet too much code is a bad thing. Smaller programs are easier to understand than larger programs. And easier to modify. And to fix. "A simple program has obviously no defects. A complex program has no obvious defects."

So how do we reduce the size of our code? The traditional methods are: eliminate duplicate code, refactor to combine similar sections, and rewrite to use better coding techniques. These are all laudable, but insufficient.

Here's what you need to do:

First, decide that you care about the size of your source code. In my experience, most project managers (and most programmers) care little if at all for the size of the source code. Savvy managers realize that there is a correlation between the size of source code and the quality of the result. They also realize that size has an effect on their team's ability to meet new requirements. But most managers care only for the end result of delivery on time, with as few defects as possible, and ignore the size of the code.

Second, decide that you want to measure your source code. This implies wanting to monitor the code size and take action as you get the measurements. Many project plans are formed up front, with no slack for adjustments. If you're not going to change the project as you get the information, then why bother collecting it?

Third, decide on the measurement frequency. You must measure frequently enough to make a difference, but not so frequently that you waste resources. The measurement must feed into your OODA (Observe-Orient-Decide-Act) loop. Measuring once per year is probably too infrequent. Measuring every day is probably too frequent.

Fourth, pick a measurement. There are different ways to measure source code, and each has its advantages. Here are a few:

- Lines Of Code (LOC): a raw count of the lines of source code, easily obtained with tools like wc. Advantages: it's easy to do -- very easy. Disadvantages: it counts blank lines and comments the same as code, and it does not measure complexity. (I can write a complex algorithm in 20 lines of code or 200 lines, but which is better? The short one may be hard to understand. Or the long one may be inefficient.)

- Source Lines Of Code (SLOC): a count of just the source code, omitting blank lines and comments. Advantage: a more accurate measure than LOC. Disadvantage: harder to do -- you need to filter out blank lines and comments before sending the source into wc.

- Function points: a measure of complexity of the task, not the code. Advantage: better for comparing different projects (especially those that use disparate technologies). Disadvantage: much, much harder to compute. Perhaps so much harder that the effort to derive these numbers costs more than the benefits.

- Complexity measure (McCabe or others): a measure of the complexity in the code. Advantages: good for identifying complex areas of code, and comparing different projects. Disadvantages: hard to derive.
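To make the difference between these measurements concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not a replacement for a real metrics tool: the function names (count_loc, count_sloc, rough_complexity) are my own, the SLOC filter recognizes only whole-line comments with a single prefix, and the complexity figure is a rough approximation of McCabe's measure (one plus the number of decision points) that works only on Python source.

```python
import ast


def count_loc(text: str) -> int:
    """Raw line count, blank lines and comments included."""
    return len(text.splitlines())


def count_sloc(text: str, comment_prefix: str = "#") -> int:
    """Count lines that are neither blank nor whole-line comments.

    Assumption: comments start with a single prefix and occupy the
    whole line. Block comments and docstrings are NOT filtered out.
    """
    count = 0
    for line in text.splitlines():
        stripped = line.strip()
        if stripped and not stripped.startswith(comment_prefix):
            count += 1
    return count


# Node types treated as decision points (a choice made for this sketch).
_BRANCH_NODES = (ast.If, ast.For, ast.While, ast.ExceptHandler, ast.IfExp)


def rough_complexity(python_source: str) -> int:
    """Approximate McCabe complexity: 1 + number of decision points."""
    tree = ast.parse(python_source)
    decisions = 0
    for node in ast.walk(tree):
        if isinstance(node, _BRANCH_NODES):
            decisions += 1
        elif isinstance(node, ast.BoolOp):
            # "a and b and c" adds two extra paths, one per operator.
            decisions += len(node.values) - 1
    return 1 + decisions


sample = """# compute absolute value

def absolute(x):
    if x < 0 and x != 0:
        return -x
    return x
"""

print("LOC:", count_loc(sample))
print("SLOC:", count_sloc(sample))
print("complexity:", rough_complexity(sample))
```

For the six-line sample, LOC is 6, SLOC is 4 (the comment and the blank line drop out), and the rough complexity is 3 (the base path, the if, and the "and"). The three numbers answer different questions, which is why picking the measurement is a step of its own.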

Fifth, decide on your goal. A capricious goal of "reduce code to half its current size" is foolish. How do you know your code is too big? (Yes, it probably is larger than it needs to be, based on what I've seen with various software efforts. But how do you know that a reduction to half its current size is wise, or even possible?) A different goal, and one that may be more useful, is to improve understanding. Here are some ideas:

- For projects that use multiple languages, understand the relative size of the language code bases. How much of your project is in C++? How much in C#? And how much in Visual Basic?

- For projects with components or libraries, understand the relative size of the different components. How does the size of source code compare with the size of the development teams?

- One measurement you can take (with enough effort) is the rate of change -- not just an absolute size. Identifying the code that changes most frequently lets you identify the "hot spots" of your code. These areas may be at the most risk for defects.
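Two of these ideas can be sketched in a few lines of Python. This is a rough illustration, and everything in it is an assumption made for the example: the extension-to-language table, the file names, the line counts, and the list of files touched per commit (which in practice you would extract from your version control history). The function names are hypothetical too.

```python
from collections import Counter, defaultdict
from pathlib import PurePath

# Hypothetical mapping from file extension to language name.
LANGUAGE_BY_EXTENSION = {
    ".cpp": "C++",
    ".h": "C++",
    ".cs": "C#",
    ".vb": "Visual Basic",
}


def size_by_language(line_counts):
    """Sum per-file line counts into per-language totals."""
    totals = defaultdict(int)
    for path, lines in line_counts.items():
        suffix = PurePath(path).suffix.lower()
        totals[LANGUAGE_BY_EXTENSION.get(suffix, "other")] += lines
    return dict(totals)


def shares(totals):
    """Convert totals into fractions of the whole code base."""
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {name: 0.0 for name in totals}
    return {name: lines / grand_total for name, lines in totals.items()}


def hot_spots(commits, top=3):
    """Return the files that appear in the most commits.

    `commits` is a list of lists of file paths, one inner list per commit.
    Ties are broken alphabetically so the result is stable.
    """
    changes = Counter(path for files in commits for path in set(files))
    ranked = sorted(changes, key=lambda path: (-changes[path], path))
    return ranked[:top]


# Made-up measurements for illustration only.
line_counts = {
    "engine/core.cpp": 600,
    "engine/core.h": 200,
    "ui/Form.cs": 150,
    "ui/Legacy.vb": 50,
}
commits = [
    ["engine/core.cpp", "ui/Form.cs"],
    ["engine/core.cpp"],
    ["engine/core.cpp", "engine/core.h"],
    ["ui/Form.cs"],
]

totals = size_by_language(line_counts)
print(totals)
print(shares(totals))
print(hot_spots(commits, top=2))
```

With these made-up numbers, C++ accounts for 80% of the lines, and engine/core.cpp is the file that changes most often. Neither number says the code is too big, but both tell you where to look first.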

I haven't answered the question of "how much is too much". There is no one answer. But with measurements, you may be able to answer the question for yourself.

