Sunday, December 15, 2013

Readable does not necessarily mean what you think it means

An easy way to start an argument among programmers (or possibly a fist-fight) is to ask about the readability of programming languages. Most developers have strong opinions about programming languages, especially when you are not paying them.

I'm not sure that "readability" is an aspect of a programming language. Instead, I think readability is an aspect of a specific program (written in any language). I have used many languages (assembly language, BASIC, FORTRAN, Pascal, COBOL, dBase, RBase, Visual Basic, Delphi, C, C++, Java, C#, Perl, Ruby, and Python, to name the popular ones), and examined many programs (small, large, tutorials, and production-level).

I've seen readable programs in all of these languages (including assembly language). I've seen hard-to-read programs in all of them, too.

If readability is not an aspect of a programming language but of the program itself, then what makes a program readable? Lots of people have commented on the topic over the decades. The popular ideas are usually:

  • Use comments
  • Follow coding standards
  • Use meaningful names for variables and functions (and classes)
  • Use structured programming techniques
  • Have high cohesion within modules (or classes) and low coupling

These are all worthy ideas.

I would like to add one more: balance the levels of abstraction. Programs of non-trivial complexity are divided into modules. (For object-oriented programming, these divisions are classes. For structured programming, these are functions and libraries.) A readable program will have a balanced set of modules. By "balanced", I mean in terms of size and complexity.

Given a program and its modules, you can take measurements. Typical measurements include lines of code (derided by many, and for good reasons), cyclomatic complexity, software "volume", and function points. It doesn't particularly matter which metric you pick; any of them gives you the relative sizes of modules. Once you have module sizes, you can plot those sizes on a chart.
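As a rough illustration of taking such measurements, here is a minimal Python sketch that uses lines of code as the metric and treats each function as a "module". Both choices are assumptions made for the example, and the names function_sizes and text_chart are hypothetical helpers, not part of any tool.

```python
import ast


def function_sizes(source: str) -> dict[str, int]:
    """Map each function name in the source text to its length in lines."""
    tree = ast.parse(source)
    sizes = {}
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            # end_lineno is filled in by ast.parse on Python 3.8 and later.
            sizes[node.name] = node.end_lineno - node.lineno + 1
    return sizes


def text_chart(sizes: dict[str, int]) -> str:
    """Render the sizes, largest first, as a plain-text bar chart."""
    rows = sorted(sizes.items(), key=lambda item: item[1], reverse=True)
    return "\n".join(f"{name:<12} {'#' * size} {size}" for name, size in rows)
```

Calling text_chart(function_sizes(source)) on the text of a source file gives one bar per function. Any other metric (cyclomatic complexity, for instance) could be substituted for the line count without changing the charting step.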

Frequently, readable programs have a smooth curve of module sizes. And just as frequently, hard-to-read programs have a jagged curve. Hard-to-read programs tend to have one or a few large modules and a few or many small modules.
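One way to put a number on "smooth" versus "jagged" is to sort the module sizes and look at the largest ratio between neighbors. This is only a sketch of one possible measure, not an established metric; the function name largest_step and the sample sizes are invented for illustration.

```python
def largest_step(sizes: list[int]) -> float:
    """Return the largest ratio between neighboring sizes, sorted largest first.

    Assumes every size is a positive number. A value near 1 suggests a
    smooth curve; a large value suggests a jagged one.
    """
    ordered = sorted(sizes, reverse=True)
    ratios = (big / small for big, small in zip(ordered, ordered[1:]))
    return max(ratios, default=1.0)


# Hypothetical module sizes (in lines of code) for two imaginary programs.
smooth = [40, 30, 22, 16, 12]
lumpy = [400, 12, 10, 9, 8]
```

With these made-up numbers, largest_step(smooth) stays below 2, while largest_step(lumpy) is above 30 because of the single 400-line module.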

I'm not sure why such a correlation exists. (And perhaps it doesn't; I admit that my observations are limited and the correlation may be proven false with a larger data set.)

Yet I have a theory.

As programmers examine the code, they frequently move from one module to another, following the logic and the flow of control. Modules divide the program not only into smaller components but also into levels. The different levels of the program handle different levels of abstraction. Low levels contain small units of code and small collections of data. Higher levels contain larger units of code that are composed of smaller units.

Moving from module to module often means moving from one level of organization to another, lower, level. With a smooth distribution of module sizes, the cognitive difference between levels is small. A path from the top level to a module on the bottom level may pass through many intermediate modules, but each transition is small and easy to understand.

A program with a "lumpy" distribution of module sizes (and module complexity), in contrast, lacks this set of small jumps from level to level. Its transitions from layer to layer are just as lumpy: the jumps from the top level to the bottom occur erratically, some small and some large. This disparity in jump size puts a cognitive load on the reader, making the logic hard to follow.

If my theory is right, then a readable program is one that consists of layers, with the difference between any two adjacent layers no larger than some critical amount. (And, conversely, a hard-to-read program has multiple layer-pairs that have differences larger than that amount.)
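The theory can be sketched as a check over a program's call structure: for every call from one module to another, compare their sizes and flag the pairs whose ratio exceeds a threshold. Everything specific here is an assumption made for illustration: module size stands in for level of abstraction, the threshold of 4.0 is arbitrary (the "critical amount" is unknown), and the module names and sizes are invented.

```python
def steep_transitions(
    sizes: dict[str, int],
    calls: dict[str, list[str]],
    max_ratio: float = 4.0,  # arbitrary placeholder for the "critical amount"
) -> list[tuple[str, str, float]]:
    """Return (caller, callee, ratio) for each call with a size ratio above max_ratio.

    Assumes every size is a positive number and every name in calls
    also appears in sizes.
    """
    steep = []
    for caller, callees in calls.items():
        for callee in callees:
            larger = max(sizes[caller], sizes[callee])
            smaller = min(sizes[caller], sizes[callee])
            ratio = larger / smaller
            if ratio > max_ratio:
                steep.append((caller, callee, ratio))
    return steep


# Hypothetical program: sizes in lines of code, plus who calls whom.
sizes = {"main": 40, "report": 25, "format_row": 12, "pad": 2}
calls = {"main": ["report", "pad"], "report": ["format_row"], "format_row": []}
```

For this invented program, steep_transitions(sizes, calls) reports only the call from main to pad: a 40-line module reaching straight down to a 2-line helper, skipping the intermediate levels. Under the theory, a readable program would produce an empty list.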

I think that readability is an attribute of a program, not a programming language. We can write readable programs in any language (well, almost any language; a few obscure languages are designed for obfuscation). I think that the commonly accepted ideas for readable programs (comments, coding standards, meaningful names) are good ideas and helpful for building readable programs. I also think that we must structure our programs to have small differences between levels of abstraction.
