Saturday, January 21, 2012

Premature optimization?

It has been said that premature optimization is the root of all evil. (I read this as a loose translation of "optimizing too early can cause you grief", not as "it makes you a bad person".)

We often optimize our programs and systems. We admire system designers who can build systems that work smoothly and with minimal resources -- that is, systems that are optimized.

But what are optimizations? Most of the time, they are not pure optimizations (which use the smallest amount of system resources) but trade-offs (which use one resource in lieu of another). A simple trade-off optimization is caching: using memory (a cheap resource) to avoid database lookups (an expensive operation).
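
As a rough sketch of that caching trade-off (in Python; the database handle and the 'customers' table are hypothetical, assumed here to behave like a sqlite3 connection):

    # Trade memory for fewer expensive lookups.
    _cache = {}   # in-memory cache: customer_id -> record

    def fetch_customer(customer_id, db):
        """Return a customer record, hitting the database only on a cache miss."""
        if customer_id in _cache:
            return _cache[customer_id]    # cheap: a memory lookup
        record = db.execute(
            "SELECT * FROM customers WHERE id = ?", (customer_id,)
        ).fetchone()                      # expensive: a database round trip
        _cache[customer_id] = record      # spend memory to avoid repeating the work
        return record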

Optimization, or the selection of a specific set of trade-offs, is a good thing as long as the underlying assumptions hold.

Let us consider a long-standing tool in the developer world: version control systems (VCSs). We have used these tools for forty years, starting with SCCS and moving through various generations (RCS, CVS, PVCS, SourceSafe, Subversion, to name a few).

Many version control systems store revisions to files not as whole files but as 'deltas', the changes from one version to another. This decision is a trade-off: spending computation (generating the change list) to reduce disk usage. (The list of differences is often smaller than the revised file.)
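
To make the delta idea concrete, here is a rough sketch using Python's standard difflib (the file names are invented; real version control systems use their own delta formats, but the principle is the same):

    import difflib

    # Read two revisions of the same text file (hypothetical file names).
    old_revision = open("report_v1.txt").readlines()
    new_revision = open("report_v2.txt").readlines()

    # The delta: only the changed lines, plus a little context.
    delta = list(difflib.unified_diff(old_revision, new_revision,
                                      fromfile="v1", tofile="v2"))

    # For a small edit to a large file, the delta is far smaller
    # than storing the revised file in full.
    print(len("".join(delta)), "bytes of delta versus",
          len("".join(new_revision)), "bytes of whole file")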

This trade-off relied on several assumptions:

  • The files stored in the VCS would be text files
  • The changes from one version to another would be a small fraction of the file
  • Disk space would be expensive (compared to the user's time)

It turns out that, some forty years later, these assumptions do not always hold. We are using version control systems for more than source code, including some files that are not text files. (Non-text files are handled poorly by the 'delta' calculation logic, and most VCSs simply give up and store the entire file.) User time is expensive (and getting more so) and disk space is cheap (and also getting more so).

The trade-offs made by version control systems are now working against us. We grumble while our systems generate deltas. We care little that Microsoft Word documents are stored in their entirety.

The latest version control systems ('git' is an example) do away with the notion of deltas. They store the entire file, with various techniques to compress the file and to de-duplicate data. (We still care about disk usage.)
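
A rough sketch of that approach, assuming a simple in-memory store (this illustrates content-addressable storage and de-duplication in general, not git's actual on-disk format):

    import hashlib
    import zlib

    object_store = {}   # digest -> compressed file content

    def store_file(content: bytes) -> str:
        """Store a whole file, keyed by a hash of its content."""
        digest = hashlib.sha1(content).hexdigest()
        if digest not in object_store:        # identical content is stored only once
            object_store[digest] = zlib.compress(content)
        return digest                         # revisions refer to files by digest

    def load_file(digest: str) -> bytes:
        return zlib.decompress(object_store[digest])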

The notion of storing revisions as deltas was an optimization. It is a notion that we are now moving away from. Was it a premature optimization? Was it a trade-off that we made in error? Is it an example of "the root of all evil"?

I think that the answer is no. At the time, with the technology that we had, using deltas was a good trade-off. It reduced our use of resources, and one can justify the claim of optimization. And most importantly, it worked for a long period of time.

An optimization becomes "bad" when the underlying assumptions fail. At that point, the system is "upside down", or de-optimized. (Some might say "pessimized".) When that happens, we want to re-design the system to use a better technique (and thus reduce our use of resources). The cost of that change is part of the equation, and must be tallied. A long-running optimization with a low cost of change is good; a short-lived optimization (especially one with a high 'fix' cost at the end) is bad.
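
As a back-of-the-envelope tally (the numbers are invented, purely to illustrate the shape of the equation):

    yearly_savings = 5000               # resources saved each year the assumptions hold
    years_until_assumptions_fail = 30
    cost_to_redesign = 40000            # the 'fix' cost when the trade-off turns against us

    net_benefit = yearly_savings * years_until_assumptions_fail - cost_to_redesign
    # Positive for a long-running optimization with a low cost of change;
    # it flips negative when the assumptions fail after only a few years.
    print(net_benefit)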

Optimizations are like leased cars. You can get by for a period of time with lower payments, but in the end you must turn in the car (or purchase it). Knowing the length of the lease and the tail-end costs is important in your decision. Optimizing without knowing the costs, in my mind, is the root of all evil.
