Fitzpatrick's Fabulous Future: refactoring

Showing posts with label refactoring. Show all posts

Tuesday, May 8, 2018

Refactor when you need it

The development cycle for Agile and TDD is simple:

Define a new requirement
Write a test for that requirement
Run the test (and see that it fails)
Change the code to make the test pass
Run the test (and see that it passes)
Refactor the code to make it clean
Run the test again (and see that it still passes)

Notice that refactor step near the end? That is what keeps your code clean. It allows you to write a messy solution quickly.

A working solution gives you a good understanding of the requirement, and its affect on the code. With that understanding, you can then improve the code, making it clear for other programmers. The test keeps your revised solutions correct -- if a cleanup change breaks a test, you have to fix the code.

But refactoring is not limited to after a change. You can refactor before a change.

Why would you do that? Why would you refactor before making any changes? After all, if your code is clean, it doesn't need to be refactored. It is already understandable and maintainable. So why refactor in advance?

It turns out that code is not always perfectly clean. Sometimes we stop refactoring early. Sometimes we think our refactoring is complete when it is not. Sometimes we have duplicate code, or poorly named functions, or overweight classes. And sometimes we are enlightened by a new requirement.

A new requirement can force us to look at the code from a different angle. We can see new patterns, or see opportunities for improvement that we failed to see earlier.

When that happens, we see new ways of organizing the code. Often, the new organization allows for an easy change to meet the requirement. We might refactor classes to hold data in a different arrangement (perhaps a dictionary instead of a list) or break large-ish blocks into smaller blocks.

In this situation, it is better to refactor the code before adding the new requirement. Instead of adding the new feature and refactoring, perform the operations in reverse sequence: refactor and then add the requirement. (Of course, you still test and you can still refactor at the end.) The full sequence is:

Define a new requirement
Write a test for that requirement
Run the test (and see that it fails)
Examine the code and identify improvements
Refactor the code (without adding the new requirement)
Run tests to verify that the code still works (skip the new test)
Change the code to make the test pass
Run the test (and see that it passes)
Refactor the code to make it clean
Run the test again (and see that it still passes)

I've added the new steps in bold.

Agile has taught us is to change our processes when the changes are beneficial. Changing the Agile process is part of that. You can refactor before making changes. You should refactor before making changes, when the refactoring will help you.

Monday, August 8, 2016

Agile is all about code quality

Agile promises clean code. That's the purpose of the 'refactor' phase. After creating a test and modifying the code, the developer refactors the code to eliminate compromises made during the changes.

But how much refactoring is enough? One might flippantly say "as much as it takes" but that's not an answer.

For many shops, the answer seems to be "as much as the developer thinks is needed". Other shops allow refactoring until the end of the development cycle. The first is subjective and opens the development team to the risk of spending too much time on refactoring and not enough on adding features. The second is arbitrary and risks short-changing the refactoring phase and allowing messy code to remain in the system.

Agile removes risk by creating automated tests, creating them before modifying the code, and having developers run those automated tests after all changes. Developers must ensure that all tests pass; they cannot move on to other changes while tests are failing.

This process removes judgement from the developer. A developer cannot say that the code is "good enough" without the tests confirming it. The tests are the deciders of completeness.

I believe that we want the same philosophy for code quality. Instead of allowing a developer to decide when refactoring has reached "good enough", we will instead use an automated process to make that decision.

We already have code quality tools. C and C++ have had lint for decades. Other languages have tools as well. (Wikipedia has a page for static analysis tools.) Some are commercial, others open source. Most can be tailored to meet the needs of the team, placing more weight on some issues and ignoring others. My favorite at the moment is 'Rubocop', a style-checking tool for Ruby.

I expect that Agile processes will adopt a measured approach to refactoring. By using one (or several) code assessors, a team can ensure quality of the code.

Such a change is not without ramifications. This change, like the use of automated tests, takes judgement away from the programmer. Code assessment tools can consider many things, some of which are style. They can examine indentation, names of variables or functions, the length or complexity of a function, or the length of a line of code. They can check the number of layers of 'if' statements or 'while' loops.

Deferring judgement to the style checkers will affect managers as well as programmers. If a developer must refactor code until it passes the style checker, then a manager cannot cut short the refactoring phase. Managers will probably not like this change -- it takes away some control. Yet it is necessary to maintain code quality. By ending refactoring before the code is at an acceptable quality, managers allow poor code to remain in the system, which will affect future development.

Agile is all about code quality.

Sunday, March 30, 2014

How to untangle code: Start at the bottom

Messy code is cheap to make and expensive to maintain. Clean code is not so cheap to create but much less expensive to maintain. If you can start with clean code and keep the code clean, you're in a good position. If you have messy code, you can reduce your maintenance costs by improving your code.

But where to begin? The question is difficult to answer, especially on a large code base. Some ideas are:

Re-write the entire code
Re-write logical sections of code (vertical slices)
Re-write layers of code (horizontal slices)
Make small improvements everywhere

All of these ideas have merit -- and risk. For very small code sets, a complete re-write is possible. For a system larger than "small", though, a re-write entails a lot of risk.

Slicing the system (either vertically or horizontally) has the appeal of independent teams. The idea is to assign a number of teams to the project, with each project working on an independent section of code. Since the code sets are independent, the teams can work independently. This is an appealing idea but not always practical. It is rare that a system is composed of independent systems. More often, the system is composed of several mutually-dependent systems, and adjustments to any one sub-system will ripple throughout the code.

One can make small improvements everywhere, but this has its limits. The improvements tend to be narrow in scope and systems often need high-level revisions.

Experience has taught me that improvements must start at the "bottom" of the code and work upwards. Improvements at the bottom layer can be made with minimal changes to higher layers. Note that there are some changes to higher layers -- in most systems there are some affects that ripple "upwards". Once the bottom layer is "clean", one can move upwards to improve the next-higher level.

How to identify the bottom layer? In object-oriented code, the process is easy: classes that can stand alone are the bottom layer. Object-oriented code consists of different classes, and some (usually most) classes depend on other classes. (A "car system" depends on various subsystems: "drive train", "suspension", "electrical", etc., and those subsystems in turn depend on smaller components.)

No matter how complex the hierarchy, there is a bottom layer. Some classes are simple enough that they do not include other classes. (At least not other classes that you maintain. They may contain framework-provided classes such as strings and lists and database connections.)

These bottom classes are where I start. I make improvements to these classes, often making them immutable (so they can hold state but they cannot change state). I change their public methods to use consistent names. I simplify their code. When these "bottom" classes are complex (when they hold many member variables) I split them into multiple classes.

The result is a set of simpler, cleaner code that is reliable and readable.

Most of these changes affect the other parts of the system. I make changes gradually, introducing one or two and then re-building the system and fixing broken code. I create unit tests for the revised classes. I share changes with other members of the team and ask for their input.

I don't stop with just these "bottom" classes. Once cleaned, I move up to the next level of code: the classes than depend only on framework and the newly-cleaned classes. With a solid base of clean code below, one can improve the next layer of classes. The improvements are the same: make classes immutable, use consistent names for functions and variables, and split complex classes into smaller classes.

Using this technique, one works from the bottom of the code to the top, cleaning all of the code and ensuring that the entire system is maintainable.

This method is not without drawbacks. Sometimes there are cyclic dependencies between classes and there is no clear "bottom" class. (Good judgement and re-factoring can usually resolve that issue.) The largest challenge is not technical but political -- large code bases with large development teams often have developers with egos, developers who think that they own part of the code. They are often reluctant to give up control of "their" code. This is a management issue, and much has been written on "egoless programming".

Despite the difficulties, this method works. It is the only method that I have found to work. The other approaches too often run into the problem of doing too much at once. The "bottom up" method allows for small, gradual changes. It reduces risk, but cannot eliminate it. It lets the team work at a measured pace, and lets the team measure their progress (how many classes cleaned).

Friday, February 1, 2013

Refactoring and code cleanup are everyone's job

Code is like fish. Over time (and a surprisingly short period of time), it "goes bad" and starts to smell. While fish must be discarded, code can be improved.

Code can be messy for a number of reasons. It can be assembled from older (poorly written) systems. It can be developed under aggressive timeframes. The developers can be careless or inexperienced.

You want to improve your code. Messy code is hard to understand, difficult to debug, and problematic to change. Projects with messy code find that they miss deadlines and have a large number of defects.

Refactoring is the process of changing the structure of code while maintaining its behavior. By changing the structure, you can improve the readability and maintainability of the code. By keeping the functionality, you keep all current features.

One might think that assigning the task of refactoring to a subset of the team is sufficient. The idea is that this subteam will improve the code, cleaning up the mess that has developed over time. But I believe that such an approach is ineffective.

Refactoring (and code quality in general) is a task for everyone on the project. The approach of a separate team does not work. Here's why:

The team members dedicated to the task are viewed as a separate team. Usually they are viewed as the elite members of the team; more darkly as diva-developers. Sometimes they are viewed as servants, or a lower caste of the team. Fracturing the team in this way benefits no one.

Other (non refactoring) members of the team can become sloppy. They know that someone will come after them to clean their code. That knowledge sets up an incentive for sloppy code -- or at least removes the incentive for clean code.

The biggest reason, though, is one of numbers. The refactoring team is smaller than the rest of the team. This smaller team is attempting to clean up the mess created by the entire team. Your team's current processes create messy code (for whatever reason) so that larger (non-refactoring) team is still creating a mess. The smaller team attempts to clean while the larger team keeps creating a mess. This doesn't work.

As I see it, the only way to clean code is to get everyone involved. No separate team, no elite squad, no penalty assignments. The team's process must change to create clean code (or to improve messy code). Nothing less will do.

Sunday, December 11, 2011

Tradeoffs

It used to be that we had to write small, fast programs. Processors were slow, storage media (punch cards, tape drives, disc drives) were even slower, and memory was limited. In such a world, programmers were rewarded for tight code, and DP managers were rewarded for maintaining systems at utilization rates of ninety to ninety-five percent of machine capacity. The reason was that a higher rate meant that you needed more equipment, and a lower rate meant that you had purchased (or more likely, leased) too much equipment.

In that world, programmers had to make tradeoffs when creating systems. Readable code might not be fast, and fast code might not be readable (and often the two were true). Fast code won out over readable (slower) code. Small code that squeezed the most out of the hardware won out over readable (less efficient) code. The tradeoffs were reasonable.

The world has changed. Computers have become more powerful. Networks are faster and more reliable. Databases are faster, and we have multiple choices of database designs -- not everything is a flat file or a set of related tables. Equipment is cheap, almost commodities.

This change means that the focus of costs now shifts. Equipment is not the big cost item. CPU time is not the big cost item. Telecommunications is not the big cost item.

The big problem of application development, the big expense that concerns managers, the thing that will get attention, will be maintenance: the time and cost to modify or enhance an existing system.

The biggest factor in maintenance costs, in my mind, is the readability of the code. Readable code is easy to change (possibly). Opaque code is impossible to change (certainly).

Some folks look to documentation, such as design or architecture documents. I put little value in documentation; I have always found the code to be the final and most accurate description of the system. Documents suffer from aging: they were correct some but the system has been modified. Documents suffer from imprecision: they specify some but not all of the details. Documents suffer from inaccuracy: they specify what the author thought the system was doing, not what the system actually does.

Sometimes documentation can be useful. The business requirements of a system can be useful. But I find "System architecture" and "Design overview" documents useless.

If the code is to be the documentation for itself, then it must be readable.

Readability is a slippery concept. Different programmers have different ideas about "readability". What is readable to me may not be readable to you. Over my career, my ideas of readability have changed, as I learned new programming techniques (structured programming, object-oriented programming, functional programming), and even as I learned more about a language (my current ideas of "readable" C++ code are very different from my early ideas of "readable" C++ code).

I won't define readability. I will let each project decide on a meaningful definition of readability. I will list a few ideas that will let teams improve the readability of their code (however they define it).

Version control for source code A shop that is not using version control is not serious about software development. There are several reliable, well-documented and well supported, popular systems for version control. Version control lets multiple team members work together and coordinate their changes.

Automated builds An automated build lets you build the system reliably, consistently, and at low effort. You want the product for the customer to be built with a reliable and consistent method.

Any developer can build the system Developers need to build the system to run their tests. They need a reliable, consistent, low-effort, method to do that. And it has to work with their development environment, allowing them to change code and debug the system.

Automated testing Like version control, automated testing is necessary for a modern shop. You want to test the product before you send it to your customers, and you want the testing to be consistent and reliable. (You also want it easy to run.)

Any developer can test the system Developers need to know that their changes affect only the behaviors that they intend, and no other parts of the system. They need to use the tests to ensure that their changes have no unintended side-effects. Low-effort automated tests let them run the tests often.

Acceptance of refactoring To improve code, complicated classes and modules must be changed into sets of smaller, simpler classes and modules. Refactoring changes the code without changing the external behavior of the code. If I start with a system that passes its tests (automated tests, right?) and I refactor it, it should pass the same tests. When I can rearrange code, without changing the behavior, I can make the code more readable.

Incentives for developers to use all of the above Any project that discourages developers from using automated builds or automated tests, either explicitly or implicitly, will see little or no improvements in readability.

But the biggest technique for readable code is that the organization -- its developers and managers -- must want readable code. If the organization is more concerned with "delivering a quality product" or "meeting the quarterly numbers", then they will trade off readability for those goals.

Fitzpatrick's Fabulous Future