Sunday, March 30, 2014

How to untangle code: Start at the bottom

Messy code is cheap to make and expensive to maintain. Clean code is not so cheap to create but much less expensive to maintain. If you can start with clean code and keep the code clean, you're in a good position. If you have messy code, you can reduce your maintenance costs by improving your code.

But where to begin? The question is difficult to answer, especially on a large code base. Some ideas are:
  • Re-write the entire code
  • Re-write logical sections of code (vertical slices)
  • Re-write layers of code (horizontal slices)
  • Make small improvements everywhere
All of these ideas have merit -- and risk. For very small code sets, a complete re-write is possible. For a system larger than "small", though, a re-write entails a lot of risk.

Slicing the system (either vertically or horizontally) has the appeal of independent teams. The idea is to assign a number of teams to the project, with each project working on an independent section of code. Since the code sets are independent, the teams can work independently. This is an appealing idea but not always practical. It is rare that a system is composed of independent systems. More often, the system is composed of several mutually-dependent systems, and adjustments to any one sub-system will ripple throughout the code.

One can make small improvements everywhere, but this has its limits. The improvements tend to be narrow in scope and systems often need high-level revisions.

Experience has taught me that improvements must start at the "bottom" of the code and work upwards. Improvements at the bottom layer can be made with minimal changes to higher layers. Note that there are some changes to higher layers -- in most systems there are some affects that ripple "upwards". Once the bottom layer is "clean", one can move upwards to improve the next-higher level.

How to identify the bottom layer? In object-oriented code, the process is easy: classes that can stand alone are the bottom layer. Object-oriented code consists of different classes, and some (usually most) classes depend on other classes. (A "car system" depends on various subsystems: "drive train", "suspension", "electrical", etc., and those subsystems in turn depend on smaller components.)

No matter how complex the hierarchy, there is a bottom layer. Some classes are simple enough that they do not include other classes. (At least not other classes that you maintain. They may contain framework-provided classes such as strings and lists and database connections.)

These bottom classes are where I start. I make improvements to these classes, often making them immutable (so they can hold state but they cannot change state). I change their public methods to use consistent names. I simplify their code. When these "bottom" classes are complex (when they hold many member variables) I split them into multiple classes.

The result is a set of simpler, cleaner code that is reliable and readable.

Most of these changes affect the other parts of the system. I make changes gradually, introducing one or two and then re-building the system and fixing broken code. I create unit tests for the revised classes. I share changes with other members of the team and ask for their input.

I don't stop with just these "bottom" classes. Once cleaned, I move up to the next level of code: the classes than depend only on framework and the newly-cleaned classes. With a solid base of clean code below, one can improve the next layer of classes. The improvements are the same: make classes immutable, use consistent names for functions and variables, and split complex classes into smaller classes.

Using this technique, one works from the bottom of the code to the top, cleaning all of the code and ensuring that the entire system is maintainable.

This method is not without drawbacks. Sometimes there are cyclic dependencies between classes and there is no clear "bottom" class. (Good judgement and re-factoring can usually resolve that issue.) The largest challenge is not technical but political -- large code bases with large development teams often have developers with egos, developers who think that they own part of the code. They are often reluctant to give up control of "their" code. This is a management issue, and much has been written on "egoless programming".

Despite the difficulties, this method works. It is the only method that I have found to work. The other approaches too often run into the problem of doing too much at once. The "bottom up" method allows for small, gradual changes. It reduces risk, but cannot eliminate it. It lets the team work at a measured pace, and lets the team measure their progress (how many classes cleaned).

No comments: