Wednesday, August 10, 2011

A measure of quality

I propose that the quality of source code is inversely correlated with the amount of duplication within it. That is, the more duplication, the worse the code. Good code will have little or no duplication.

The traditional argument against duplicate code is the increased risk of defects. When I copy and paste code, I copy not only the code but also all defects within the copied code. (Or if requirements later change and we must modify the copied code, we may miss one of the duplicate locations and make an incomplete set of changes.)

Modern languages allow for the consolidation of duplicate code. Subroutines and functions, parent classes, and code blocks as first-class entities let a development team eliminate duplication.
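For example (an illustration of the mechanism only, not code from our project), two call sites that paste the same validation logic can instead share a single function, so a later fix lands in one place:

    # Illustrative only: the order-validation example is invented.
    def is_valid_order(order):
        return order.quantity > 0 and order.customer_id is not None

    def place_order(order):
        if not is_valid_order(order):         # shared check, defined once
            raise ValueError("invalid order")
        # ... proceed with placement ...

    def amend_order(order):
        if not is_valid_order(order):         # same check, same fix, everywhere
            raise ValueError("invalid order")
        # ... proceed with amendment ...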

So let us assume that duplicate code is bad. Is it possible to measure (or even detect) code duplications? The answer is yes. I have done it.

Is it easy to detect duplicate code? Again, the answer is yes. Most developers, after some experience with the code base, will know if there are duplicate sections of code. But is there an automated way to detect duplicate code?

And what about measuring duplicate code? Is it easy (or even possible) to create a metric of duplicate code?

Let's handle these separately.

Identifying duplicate blocks of code within a system can be viewed as a scaled-up version of the same problem between two files. Given two separate source files, how can one find the blocks of code they have in common? The method I used was to run a custom program over the two files, one that identified their common blocks. The program operated like 'diff', but in reverse: instead of finding differences, it found common blocks. (And in fact that is how we wrote our program: we wrote 'diff', and then changed it to output the common blocks rather than the different blocks.)
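I can't show the original program here, but the idea is easy to sketch. The following rough Python illustration (using the standard difflib module rather than our diff-derived code, and an invented minimum block size) prints the blocks of lines two files share:

    # A sketch of the idea, not the original implementation.
    # difflib's SequenceMatcher already knows how to find matching runs of
    # lines; we simply report the matching runs instead of the differences.
    import difflib
    import sys

    def common_blocks(path_a, path_b, min_lines=5):
        with open(path_a) as f:
            lines_a = f.readlines()
        with open(path_b) as f:
            lines_b = f.readlines()
        matcher = difflib.SequenceMatcher(a=lines_a, b=lines_b, autojunk=False)
        for match in matcher.get_matching_blocks():
            if match.size >= min_lines:
                # Report 1-based line numbers and the length of the common run.
                yield (match.a + 1, match.b + 1, match.size)

    if __name__ == "__main__":
        file_a, file_b = sys.argv[1], sys.argv[2]
        for line_a, line_b, size in common_blocks(file_a, file_b):
            print(f"{size} common lines: {file_a}:{line_a} and {file_b}:{line_b}")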

Writing our 'anti-diff' utility (we called it 'common') was hard enough. Writing it in such a way that it was fast was another challenge. (You can learn about some of the techniques by looking for 'how is grep fast' articles on the web.)
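One speed technique worth mentioning (my assumption about what helps here, not a description of our actual implementation) is to hash each line once, so that the repeated comparisons inside the matching algorithm work on small integers rather than full strings:

    # Sketch: fingerprint lines up front so comparisons are integer compares.
    # Normalizing whitespace is optional and makes the matching less literal.
    def fingerprint(lines, normalize=True):
        prints = []
        for line in lines:
            if normalize:
                line = " ".join(line.split())    # collapse runs of whitespace
            prints.append(hash(line))
        return prints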

Once the problem has been solved for two files, you can scale it up to all of the files in your project. But be careful! After a moment's thought, you realize that to find all of the common blocks of code, you must compare every file against every other file, and that comparison scales as O(n^2). That is poor scaling, and we solved it by throwing hardware at the problem. (Fortunately, the comparisons are independent, so the work is parallelizable.)
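A sketch of what the scaled-up driver can look like, assuming the common_blocks() function from the two-file sketch above; the use of a process pool and the worker count are my assumptions, not a record of our setup:

    # Compare every pair of files; each pair is independent, so the
    # O(n^2) work spreads cleanly across processes.
    from itertools import combinations
    from multiprocessing import Pool

    def compare_pair(pair):
        path_a, path_b = pair
        return list(common_blocks(path_a, path_b))   # two-file sketch above

    def compare_all(paths, workers=8):
        pairs = list(combinations(paths, 2))          # n*(n-1)/2 comparisons
        with Pool(workers) as pool:
            per_pair = pool.map(compare_pair, pairs)
        return [block for blocks in per_pair for block in blocks]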

After more thought, you realize that there may be common blocks within a single file, and that you need a special case (and a special utility) to detect them. You are relieved to find that this special case scales as O(n).
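For the within-file case, a single pass with a dictionary of line windows is enough, which is why it stays linear. Again, a sketch of the idea rather than the utility we wrote (the five-line window is an arbitrary choice):

    # Find repeated runs of 'window' consecutive lines within one file.
    # One pass and a dictionary keep this roughly O(n) for a fixed window.
    from collections import defaultdict

    def duplicates_within(path, window=5):
        with open(path) as f:
            lines = [" ".join(line.split()) for line in f]
        seen = defaultdict(list)
        for i in range(len(lines) - window + 1):
            key = tuple(lines[i:i + window])
            seen[key].append(i + 1)                   # 1-based starting line
        return {key: starts for key, starts in seen.items() if len(starts) > 1}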

Eventually, you have a process that identifies the duplicate blocks of code within your source code.

The task of identifying duplications may be hard, but assigning a metric is open to debate. Should a block of 10 lines that is duplicated twice (for a total of three occurrences) count the same as a block of 15 lines duplicated once? Is the longer duplication worse, or is the more frequent duplication the more severe?

We picked a set of "badness factors" and used them to generate reports. We didn't care too much about the specific factors, or the "quantity vs. length" problem. For us, it was more important to use a consistent set of factors, get a consistent set of metrics, and observe the overall trend. (Which went up for a while, and then levelled off and later decreased as we requested a reduction in duplicate code. Having the reports of the most serious problems was helpful in convincing the development team to address the problem.)
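As an illustration only (the weights below are invented, and ours were no more scientific), a score of this general shape is enough to rank the worst offenders and to track the trend:

    # Illustrative "badness" score: longer blocks and extra copies both add
    # to the score. The exact weights matter less than applying the same
    # formula consistently from one report to the next.
    def badness(block_length, occurrences, length_weight=1.0, copy_weight=2.0):
        extra_copies = occurrences - 1    # the first occurrence is "free"
        return extra_copies * (length_weight * block_length + copy_weight)

    def total_badness(blocks):
        # blocks: iterable of (block_length, occurrences) pairs
        return sum(badness(length, count) for length, count in blocks)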

In the end, one must weigh the costs against the benefits. Was the effort of identifying duplicate code worth it? We like to think that it was, for four reasons:

We reduced our code base: we identified and eliminated duplicate code.

We corrected defects: We identified near-identical code and found that the near-duplicates had started as true duplicates, with fixes later applied to some copies but not others. We combined the code and ensured that all code paths had the right fixes.

We demonstrated an interest in the quality of the code: Rather than focus on only the behavior of the code, we took an active interest in the quality of our source code.

We obtained a leading indicator of quality: Regression tests are lagging indicators of quality, observable only after the coding is complete. Duplication, in contrast, can be measured directly from the source code, starting on the first day of the project, so the measurements arrive immediately.

We believe that we get the behavior that we reward. By measuring the code, distributing that information, and imposing soft penalties for duplicate code, we changed the behavior of the development team and improved the quality of our code. We also made it easy to eliminate the duplications, by providing lists of the duplicate blocks and their locations within the code base.
