Tuesday, February 6, 2018

The IRS made me a better programmer

We US taxpayers have opinions of the IRS, the government agency tasked with the collection of taxes. Those opinions tend to be strong and tend to fall on the "not favorable" side. Yet the IRS did me a great favor and helped me become a better programmer.

The assistance I received was not through employment at the IRS, nor did they send me a memo entitled "How to be a better programmer". They did give me some information, not related to programming, yet it turned out to be the most helpful advice on programming in my career.

That advice was the simple philosophy: One operation at a time.

The IRS uses this philosophy when designing the forms for tax returns. There are a lot of forms, and some cover rather complex notions and operations, and all must be understandable by the average taxpayer. I've looked at these forms (and used a number of them over the years) and while I may dislike our tax laws, I must admit that the forms are as easy and understandable as tax law permits. (Tax law can be complex with intricate concepts, and we can consider this complexity to be "essential" -- it will be present in any tax form no matter how well you design it.)

Back to programming. How does the philosophy of "one operation at a time" change the way I write programs?

A lot, as it turns out.

The philosophy of "one operation at a time" is directly applicable to programming. Well, my programming, at least. I had, over the years, developed a style of combining operations onto a single line.

Here is a simplified example of my code, using the "multiple operations" style:

harry = y.elements().iterate().select('harry')

It is concise, putting several activities on a single line. This style makes for shorter programs, but not necessarily more understandable programs. Shorter programs are better when the shortness is measured in operations, not raw lines. Packing a bunch of operations -- especially unrelated operations -- onto a single line is not simplifying a program. If anything, it is making it more complex, as we tend to assume that operations on the same line are somehow connected.

I changed my style. I shifted from multi-operation lines to single operation lines, and I was immediately pleased with the result.

Here's the example from above, but with the philosophy of one operation per line:

elements = y.elements()
harry = nil
elements.each do |element|
  harry = element if element.name == 'harry'
end

I have found two immediate benefits from this new style.

The first benefit is a better experience when debugging. When stepping through the multi-line version with the debugger, I can examine intermediate values. Debuggers are line-oriented, so they execute the single-line version all in one go. (While there are ways to force the debugger to execute each function separately, there are no variables to hold the intermediate results.)
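A minimal sketch of the difference, assuming a hypothetical Element struct with a name field and two illustrative helper methods (none of these names come from my real code). Both methods return the last element named 'harry', but only the second leaves a named intermediate value for the debugger to show:

```ruby
# Hypothetical stand-in for whatever objects the real code handles.
Element = Struct.new(:name)

# Multiple operations on one line: the debugger steps over it in one go,
# and the filtered list never gets a name.
def find_harry_chained(elements)
  elements.select { |element| element.name == 'harry' }.last
end

# One operation per line: after the first line runs, `matches` can be
# inspected in the debugger (or printed) before the next step executes.
def find_harry_stepwise(elements)
  matches = elements.select { |element| element.name == 'harry' }
  harry = matches.last
  harry
end

people = [Element.new('tom'), Element.new('harry'), Element.new('sally')]
puts find_harry_chained(people).name
puts find_harry_stepwise(people).name
```

A breakpoint on the second line of find_harry_stepwise is enough to see what the filter produced; the chained version offers no such stopping point.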

The second benefit is that duplicate code is easier to identify. With operations split onto separate lines, duplicate sequences stand out. Sometimes the code is not an exact duplicate, but the structure is the same. Sometimes portions of the code are the same. I can refactor the duplicated code into functions, which simplifies the code (fewer lines) and consolidates common logic in a single place (one point of truth).
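As an illustration, suppose the search loop from the earlier example appeared twice, once for 'harry' and once for 'sally'. Written one operation per line, the two copies differ only in the name, so they collapse into one function. This is a sketch under assumptions: the Element struct and the last_named function are hypothetical names invented for the example.

```ruby
# Hypothetical stand-in for whatever objects the real code handles.
Element = Struct.new(:name)

# The shared structure of the duplicated loops, pulled into one place.
# Returns the last element whose name matches, or nil if none does.
def last_named(elements, wanted)
  found = nil
  elements.each do |element|
    found = element if element.name == wanted
  end
  found
end

people = [Element.new('harry'), Element.new('sally')]
harry = last_named(people, 'harry')
sally = last_named(people, 'sally')
puts harry.name
puts sally.name
```

Each call site is now a single line again, but this time the single line holds a single operation.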

Looking back, I can see that my code is somewhat longer, in terms of lines. (Refactoring common logic reduces it somewhat, but not enough to offset the expansion from splitting operations across lines.)

Yet the longer code is easier to read, easier to explain to others, and easier to fix. And since the programs I write use only a small fraction of the computer's capacity, slightly longer programs cost little. I suspect that compilers (for languages that use them) are optimizing a lot of my "one at a time" operations and condensing them, perhaps better than I can. The executables produced are about the same size as before. Interpreters, too, seem to have little problem with multiple simple statements, and run the "one operation" version of programs just as fast as the "multiple operations" version. (This is my perception; I have not conducted formal time trials of the two versions.)
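A rough time trial could look something like the sketch below. It is only an assumption about how one might measure this, not a measurement I have made: the Element struct, the two search methods, and the seconds_for helper are hypothetical, and the numbers it prints will vary from machine to machine and run to run.

```ruby
# Hypothetical stand-in for whatever objects the real code handles.
Element = Struct.new(:name)

# "Multiple operations" style.
def chained(elements)
  elements.select { |element| element.name == 'harry' }.last
end

# "One operation at a time" style.
def stepwise(elements)
  matches = elements.select { |element| element.name == 'harry' }
  harry = matches.last
  harry
end

# Runs the given block `repetitions` times and returns the elapsed
# wall-clock time in seconds, using the monotonic clock.
def seconds_for(repetitions)
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  repetitions.times { yield }
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
end

people = Array.new(1_000) { |i| Element.new(i.even? ? 'harry' : 'sally') }
puts "chained:  #{seconds_for(1_000) { chained(people) }} seconds"
puts "stepwise: #{seconds_for(1_000) { stepwise(people) }} seconds"
```

A single run like this proves little; repeating it several times and comparing the spread would give a fairer picture.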

Simpler code, easier to debug, and easier to explain to others. What's not to like?
