Showing posts with label readability. Show all posts

Sunday, December 15, 2013

Readable does not necessarily mean what you think it means

An easy way to start an argument among programmers (or possibly a fist-fight) is to ask about the readability of programming languages. Most developers have strong opinions about programming languages, especially when you are not paying them.

I'm not sure that "readability" is an aspect of a programming language. Instead, I think readability is an aspect of a specific program (written in any language). I have used many languages (assembly language, BASIC, FORTRAN, Pascal, COBOL, dBase, RBase, Visual Basic, Delphi, C, C++, Java, C#, Perl, Ruby, and Python, to name the popular ones), and examined many programs (small, large, tutorials, and production-level).

I've seen readable programs in all of these languages (including assembly language). I've seen hard-to-read programs in all of those languages, too.

If readability is not an aspect of a programming language but of the program itself, then what makes a program readable? Lots of people have commented on the topic over the decades. The popular ideas are usually:

  • Use comments
  • Follow coding standards
  • Use meaningful names for variables and functions (and classes)
  • Use structured programming techniques
  • Have high cohesion within modules (or classes) and low coupling

These are all worthy ideas.

I would like to add one more: balance the levels of abstraction. Programs of non-trivial complexity are divided into modules. (For object-oriented programming, these divisions are classes. For structured programming, these are functions and libraries.) A readable program will have a balanced set of modules. By "balanced", I mean in terms of size and complexity.

Given a program and its modules, you can take measurements. Typical measurements include: lines of code (derided by many, and for good reasons), cyclomatic complexity, software "volume", and function points. It doesn't particularly matter which metric you use; any of them gives you the relative sizes of modules. Once you have module sizes, you can plot those sizes on a chart.

Frequently, readable programs have a smooth curve of module sizes. And just as frequently, hard-to-read programs have a jagged curve. Hard-to-read programs tend to have one or a few large modules and a few or many small modules.
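As a rough sketch of that measurement in Python: the module names and sizes below are invented for illustration, and the "largest step" number is only an assumed stand-in for how jagged the curve is, not a standard metric. Any size measure (lines of code, cyclomatic complexity, and so on) could be fed in.

```python
def size_curve(module_sizes):
    """Return the module sizes sorted from largest to smallest."""
    return sorted(module_sizes.values(), reverse=True)


def largest_step(sizes):
    """Largest ratio between neighbouring sizes on the sorted curve.

    A value near 1 suggests a smooth curve; a large value suggests a jagged one.
    Sizes are assumed to be positive numbers.
    """
    return max((a / b for a, b in zip(sizes, sizes[1:])), default=1.0)


def text_chart(module_sizes, width=40):
    """Draw the curve as a simple bar chart made of '#' characters."""
    biggest = max(module_sizes.values())
    lines = []
    for name, size in sorted(module_sizes.items(), key=lambda kv: -kv[1]):
        bar = "#" * max(1, round(width * size / biggest))
        lines.append(f"{name:<8} {bar} {size}")
    return "\n".join(lines)


# Hypothetical data: sizes in lines of code.
smooth = {"report": 400, "ledger": 320, "parser": 250, "dates": 200, "util": 160}
lumpy = {"main": 4000, "parser": 120, "dates": 90, "util": 60, "log": 40}
```

Calling `print(text_chart(lumpy))` shows one long bar followed by a few very short ones, which is the "one large module and many small modules" shape described above.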

I'm not sure why such a correlation exists. (And perhaps it doesn't; I admit that my observations are limited and the correlation may be proven false with a larger data set.)

Yet I have a theory.

As a programmer examines the code, he frequently moves from one module to another, following the logic and the flow of control. Modules divide the program not only into smaller components but also into levels. The different levels of the program handle different levels of abstraction. Low levels contain small units of code, small collections of data. Higher levels contain larger units of code that are composed of smaller units.

Moving from module to module means (often) moving from one level of organization to another, lower, level. With a smooth distribution of module sizes, the cognitive difference between levels is small. A path from the top level to a module on the bottom level may pass through many intermediate modules, but each transition is small and easy to understand.

A program with a "lumpy" distribution of module sizes (and module complexity), in contrast, lacks this set of small jumps from level to level. Its transitions from layer to layer are erratic, some small and some large. This disparity in jump size puts a cognitive load on the reader, making it hard to follow the changes.

If my theory is right, then a readable program is one that consists of layers, with the difference between any two adjacent layers no larger than some critical amount. (And, conversely, a hard-to-read program has multiple layer-pairs that have differences larger than that amount.)
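One way to make the theory concrete, sketched in Python under several assumptions: module size stands in for level of abstraction, a "layer pair" is a caller and the module it calls, and the critical amount is a ratio of 3. The post names no specific value, and the modules and sizes here are hypothetical.

```python
CRITICAL_RATIO = 3.0  # assumed threshold; the theory does not give a number


def big_jumps(calls, sizes, critical_ratio=CRITICAL_RATIO):
    """Return (caller, callee, ratio) for each call whose size gap is too large.

    calls maps a module to the modules it uses; sizes maps a module to a
    positive size measurement (lines of code, complexity, ...).
    """
    jumps = []
    for caller, callees in calls.items():
        for callee in callees:
            larger = max(sizes[caller], sizes[callee])
            smaller = min(sizes[caller], sizes[callee])
            ratio = larger / smaller
            if ratio > critical_ratio:
                jumps.append((caller, callee, round(ratio, 1)))
    return jumps


# Hypothetical program: sizes in lines of code, and who calls whom.
sizes = {"main": 900, "billing": 400, "invoice": 150, "format_money": 60}
calls = {
    "main": ["billing", "format_money"],
    "billing": ["invoice"],
    "invoice": ["format_money"],
}
```

With this data, `big_jumps(calls, sizes)` flags only the direct call from `main` to `format_money`; the path through `billing` and `invoice` reaches the same bottom-level module in steps that each stay under the threshold.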

I think that readability is an attribute of a program, not a programming language. We can write readable programs in any language (well, almost any language; a few obscure languages are designed for obfuscation). I think that the commonly accepted ideas for readable programs (comments, coding standards, meaningful names) are good ideas and helpful for building readable programs. I also think that we must structure our programs to have small differences between levels of abstraction.

Sunday, December 11, 2011

Tradeoffs

It used to be that we had to write small, fast programs. Processors were slow, storage media (punch cards, tape drives, disc drives) were even slower, and memory was limited. In such a world, programmers were rewarded for tight code, and DP managers were rewarded for maintaining systems at utilization rates of ninety to ninety-five percent of machine capacity. The reason was that a higher rate meant that you needed more equipment, and a lower rate meant that you had purchased (or more likely, leased) too much equipment.

In that world, programmers had to make tradeoffs when creating systems. Readable code might not be fast, and fast code might not be readable (and often both were true). Fast code won out over readable (slower) code. Small code that squeezed the most out of the hardware won out over readable (less efficient) code. The tradeoffs were reasonable.

The world has changed. Computers have become more powerful. Networks are faster and more reliable. Databases are faster, and we have multiple choices of database designs -- not everything is a flat file or a set of related tables. Equipment is cheap, almost a commodity.

This change means that the focus of costs now shifts. Equipment is not the big cost item. CPU time is not the big cost item. Telecommunications is not the big cost item.

The big problem of application development, the big expense that concerns managers, the thing that will get attention, will be maintenance: the time and cost to modify or enhance an existing system.

The biggest factor in maintenance costs, in my mind, is the readability of the code. Readable code is easy to change (possibly). Opaque code is impossible to change (certainly).

Some folks look to documentation, such as design or architecture documents. I put little value in documentation; I have always found the code to be the final and most accurate description of the system. Documents suffer from aging: they were correct once, but the system has since been modified. Documents suffer from imprecision: they specify some but not all of the details. Documents suffer from inaccuracy: they specify what the author thought the system was doing, not what the system actually does.

Sometimes documentation can be useful. The business requirements of a system can be useful. But I find "System architecture" and "Design overview" documents useless.

If the code is to be the documentation for itself, then it must be readable.

Readability is a slippery concept. Different programmers have different ideas about "readability". What is readable to me may not be readable to you. Over my career, my ideas of readability have changed, as I learned new programming techniques (structured programming, object-oriented programming, functional programming), and even as I learned more about a language (my current ideas of "readable" C++ code are very different from my early ideas of "readable" C++ code).

I won't define readability. I will let each project decide on a meaningful definition of readability. I will list a few ideas that will let teams improve the readability of their code (however they define it).

Version control for source code: A shop that is not using version control is not serious about software development. There are several reliable, well-documented and well supported, popular systems for version control. Version control lets multiple team members work together and coordinate their changes.

Automated builds: An automated build lets you build the system reliably, consistently, and at low effort. You want the product for the customer to be built with a reliable and consistent method.

Any developer can build the system: Developers need to build the system to run their tests. They need a reliable, consistent, low-effort method to do that. And it has to work with their development environment, allowing them to change code and debug the system.

Automated testing: Like version control, automated testing is necessary for a modern shop. You want to test the product before you send it to your customers, and you want the testing to be consistent and reliable. (You also want it easy to run.)

Any developer can test the system: Developers need to know that their changes affect only the behaviors that they intend, and no other parts of the system. They need to use the tests to ensure that their changes have no unintended side-effects. Low-effort automated tests let them run the tests often.

Acceptance of refactoring: To improve code, complicated classes and modules must be changed into sets of smaller, simpler classes and modules. Refactoring changes the code without changing the external behavior of the code. If I start with a system that passes its tests (automated tests, right?) and I refactor it, it should pass the same tests. When I can rearrange code, without changing the behavior, I can make the code more readable.
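A small illustration of that idea in Python. Everything in it is hypothetical (the invoice example, the 10% discount over 100, the function names); it only shows one function split into smaller ones while the same checks keep passing.

```python
def invoice_total_before(items, tax_rate):
    """Original version: subtotal, discount, and tax all in one function."""
    total = 0
    for price, qty in items:
        total += price * qty
    if total > 100:
        total = total - total * 0.1
    total = total + total * tax_rate
    return round(total, 2)


# Refactored version: the same arithmetic, split into named steps.
def subtotal(items):
    return sum(price * qty for price, qty in items)


def apply_discount(amount):
    return amount - amount * 0.1 if amount > 100 else amount


def add_tax(amount, tax_rate):
    return amount + amount * tax_rate


def invoice_total_after(items, tax_rate):
    return round(add_tax(apply_discount(subtotal(items)), tax_rate), 2)


# The same cases are run against both versions.
CASES = [
    ([(20.0, 3), (15.5, 4)], 0.05),
    ([(10.0, 2)], 0.0),
    ([], 0.2),
]


def same_behaviour():
    """True if the refactored code gives the same result for every case."""
    return all(
        invoice_total_before(items, rate) == invoice_total_after(items, rate)
        for items, rate in CASES
    )
```

The refactored functions deliberately keep the original expressions (for example `amount - amount * 0.1` rather than `amount * 0.9`), so the external behavior stays identical down to floating-point rounding.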

Incentives for developers to use all of the above: Any project that discourages developers from using automated builds or automated tests, either explicitly or implicitly, will see little or no improvements in readability.

But the biggest technique for readable code is that the organization -- its developers and managers -- must want readable code. If the organization is more concerned with "delivering a quality product" or "meeting the quarterly numbers", then they will trade off readability for those goals.