Showing posts with label levels of logic. Show all posts

Sunday, December 15, 2013

Readable does not necessarily mean what you think it means

An easy way to start an argument among programmers (or possibly a fist-fight) is to ask about the readability of programming languages. Most developers have strong opinions about programming languages, especially when you are not paying them.

I'm not sure that "readability" is an aspect of a programming language. Instead, I think readability is an aspect of a specific program (written in any language). I have used many languages (assembly language, BASIC, FORTRAN, Pascal, COBOL, dBase, RBase, Visual Basic, Delphi, C, C++, Java, C#, Perl, Ruby, and Python, to name the popular ones), and examined many programs (small, large, tutorials, and production-level).

I've seen readable programs in all of these languages (including assembly language). I've seen hard-to-read programs in all of those languages, too.

If readability is not an aspect of a programming language but of the program itself, then what makes a program readable? Lots of people have commented on the topic over the decades. The popular ideas are usually:

  • Use comments
  • Follow coding standards
  • Use meaningful names for variables and functions (and classes)
  • Use structured programming techniques
  • Have high cohesion within modules (or classes) and low coupling

These are all worthy ideas.

I would like to add one more: balance the levels of abstraction. Programs of non-trivial complexity are divided into modules. (For object-oriented programming, these divisions are classes. For structured programming, these are functions and libraries.) A readable program will have a balanced set of modules. By "balanced", I mean in terms of size and complexity.

Given a program and its modules, you can take measurements. Typical measurements include: lines of code (derided by many, and for good reasons), cyclomatic complexity, software "volume", and function points. It doesn't particularly matter which metric you use; any of them gives you the relative sizes of the modules. Once you have module sizes, you can plot those sizes on a chart.

Frequently, readable programs have a smooth curve of module sizes. And just as frequently, hard-to-read programs have a jagged curve. Hard-to-read programs tend to have one or a few large modules and a few or many small modules.
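The measurement is easy to sketch. The version below assumes lines of code as the metric and treats each source file as one "module"; both are simplifications, and the names module_sizes and text_chart are made up for this illustration. It counts non-blank lines per file and draws a crude text chart, largest module first.

```python
from pathlib import Path


def module_sizes(root, pattern="*.py"):
    """Return (relative path, non-blank line count) pairs, largest first."""
    root = Path(root)
    sizes = []
    for path in root.rglob(pattern):
        text = path.read_text(encoding="utf-8", errors="replace")
        count = sum(1 for line in text.splitlines() if line.strip())
        sizes.append((str(path.relative_to(root)), count))
    return sorted(sizes, key=lambda item: item[1], reverse=True)


def text_chart(sizes, width=50):
    """Render the sizes as a horizontal bar chart, one module per line."""
    largest = max((count for _, count in sizes), default=0)
    rows = []
    for name, count in sizes:
        # Scale each bar relative to the largest module.
        bar = "#" * (count * width // largest) if largest else ""
        rows.append(f"{name:<30} {count:>6} {bar}")
    return "\n".join(rows)
```

Calling print(text_chart(module_sizes("."))) in a project directory prints the curve. A smooth staircase of bars is the "smooth" case; one very long bar followed by a row of stubs is the "jagged" one.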

I'm not sure why such a correlation exists. (And perhaps it doesn't; I admit that my observations are limited and the correlation may be proven false with a larger data set.)

Yet I have a theory.

As a programmer examines the code, he frequently moves from one module to another, following the logic and the flow of control. Modules divide the program not only into smaller components but also into levels. The different levels of the program handle different levels of abstraction. Low levels contain small units of code, small collections of data. Higher levels contain larger units of code that are composed of smaller units.

Moving from module to module means (often) moving from one level of organization to another, lower, level. With a smooth distribution of module sizes, the cognitive difference between levels is small. A path from the top level to a module on the bottom level may pass through many intermediate modules, but each transition is small and easy to understand.

A program with a "lumpy" distribution of module sizes (and module complexity), in contrast, lacks this set of small jumps from level to level. Its transitions from layer to layer are just as lumpy: the jumps from the top level to the bottom occur erratically, some small and some large. This disparity in jump size puts a cognitive load on the reader, making it hard to follow the changes.

If my theory is right, then a readable program is one that consists of layers, with the difference between any two adjacent layers no larger than some critical amount. (And, conversely, a hard-to-read program has multiple layer-pairs that have differences larger than that amount.)
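To make the idea concrete, here is one hypothetical way to check it. The sketch assumes you can list the layers from top to bottom with a size for each, that the "difference" between two layers is the ratio of their sizes, and that the critical amount is a fixed ratio (max_ratio). All three are assumptions for illustration; the theory itself does not say how to measure the difference.

```python
def large_jumps(layer_sizes, max_ratio=4.0):
    """Return adjacent layer pairs whose size ratio exceeds max_ratio.

    layer_sizes: list of (name, size), ordered from the top layer down.
    max_ratio:   hypothetical stand-in for the "critical amount".
    """
    jumps = []
    for (upper, upper_size), (lower, lower_size) in zip(layer_sizes, layer_sizes[1:]):
        big = max(upper_size, lower_size)
        small = max(min(upper_size, lower_size), 1)  # avoid dividing by zero
        ratio = big / small
        if ratio > max_ratio:
            jumps.append((upper, lower, ratio))
    return jumps


# Invented numbers, for illustration only.
smooth = [("main", 400), ("report", 200), ("format", 100), ("util", 50)]
lumpy = [("main", 3000), ("helper", 20), ("util", 15)]
```

With these made-up figures, large_jumps(smooth) finds nothing to report, while large_jumps(lumpy) flags the drop from "main" to "helper".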

I think that readability is an attribute of a program, not a programming language. We can write readable programs in any language (well, almost any language; a few obscure languages are designed for obfuscation). I think that the commonly accepted ideas for readable programs (comments, coding standards, meaningful names) are good ideas and helpful for building readable programs. I also think that we must structure our programs to have small differences between levels of abstraction.

Sunday, March 6, 2011

One level down

A while back, I was build master for a large project. The project consisted of twenty or so Visual C++ projects ("solutions", in Microsoft's terms) and five C#/.NET projects.

As build master, I had to maintain the build scripts and the system that ran them. The build system itself was a complicated application: a Java program with dozens of classes, XML files for the scripts, and an interface that ran on a web page. Maintaining the build system was expensive, so we chose to rewrite it. The resulting system was a simpler collection of batch files. The batch files looked something like this:


CD project-directory-1
MSDEV /build /solution project-1.sln /project Release
CD ..
CD project-directory-2
MSDEV /build /solution project-2.sln /project Release
CD ..
... repeat for all twenty-five projects

The one feature we needed in the system was for it to stop on an error. That is, if a Visual C++ solution failed to compile, we wanted the build system to stop and report the failure, not continue on and attempt to build the rest of the projects.

Batch files in Windows are not good at stopping. In fact, they are very good at continuing on. You can force a batch file to stop, however. Here's our first attempt:


CD project-directory-1
MSDEV /build /solution project-1.sln /project Release
IF %ERRORLEVEL% NEQ 0 EXIT /B 1
CD ..
CD project-directory-2
MSDEV /build /solution project-2.sln /project Release
IF %ERRORLEVEL% NEQ 0 EXIT /B 1
CD ..
... repeat for all twenty-five projects

This solution is pretty ugly, since it mixes the control of the execution with the tasks of the execution. (Not to mention the repetitiveness of the 'IF/EXIT' command.) The problem was pervasive: we wanted our scripts to stop after a failure in any part of the process, not just compiling projects. Thus we needed 'IF/EXIT' lines sprinkled through the early phases of the job, when we were getting files from version control, and through the later part of the job, when we were bundling files into an install package.

After a bit of thought and several discussions, we implemented a different solution. We wrote our own command processor, one that would feed commands to CMD.EXE one at a time, and check the results of each command. When a command failed, our command processor would stop and report the error.

The result was a much simpler script. We took out the 'IF/EXIT' lines, and the script once again focussed on the task of building our projects.
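A command processor of this kind can be sketched in a few lines. This is not our original code; it is a minimal Python illustration, and the name run_script is made up. It hands each line to the system shell (with shell=True, Python uses CMD.EXE on Windows) and stops at the first non-zero exit code. One wrinkle: because every command runs in its own shell, a CD would not carry over to the next line, so this sketch tracks the working directory itself.

```python
import os
import subprocess


def run_script(commands, cwd=None):
    """Run commands in order and stop at the first failure.

    Returns (failed_command, exit_code), or (None, 0) if all succeeded.
    """
    cwd = os.path.abspath(cwd or os.getcwd())
    for command in commands:
        command = command.strip()
        if not command:
            continue
        # Each command gets a fresh shell, so CD is handled here
        # rather than passed through (an assumption of this sketch).
        if command.upper().startswith("CD "):
            target = os.path.normpath(os.path.join(cwd, command[3:].strip()))
            if not os.path.isdir(target):
                return command, 1
            cwd = target
            continue
        result = subprocess.run(command, shell=True, cwd=cwd)
        if result.returncode != 0:
            return command, result.returncode
    return None, 0
```

Since a file object yields lines, the script file can be passed directly: open it, call run_script on it, and report the returned command and exit code if the first element is not None. The script itself then holds nothing but the CD and build lines.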

With our new command processor in place, we added logic to capture the output of the called programs. We captured the output of the compilers, the source control utilities, and the install packager. This allowed for an even simpler and more focussed script, since we removed the '>log.txt' and '2>errlog.txt' clauses on each line.
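Capturing output at the command-processor level might look like the sketch below. Again, this is an illustration rather than our actual code: run_logged is a made-up name, and for brevity it sends standard output and standard error to a single log file instead of the two files our scripts used.

```python
import subprocess


def run_logged(command, log_path, cwd=None):
    """Run one command through the shell, appending its output to log_path.

    Returns the command's exit code.
    """
    with open(log_path, "a", encoding="utf-8") as log:
        log.write(f"> {command}\n")
        log.flush()  # keep the header ahead of the command's own output
        result = subprocess.run(
            command,
            shell=True,
            cwd=cwd,
            stdout=log,                # standard output goes to the log
            stderr=subprocess.STDOUT,  # standard error is merged into it
        )
    return result.returncode
```

Because the redirection lives in one place, no line of the build script needs a '>log.txt' clause, and changing where the logs go means changing one function.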

Looking back, I realize that our solution was to move the logic for error detection down one level. We took the problem out of the script space and put it into the space of command processing.

Sometimes, pushing a problem to a different level is the right thing to do.