Sunday, January 22, 2017

The spreadsheet is a dinosaur

Spreadsheets are dinosaurs. Or more specifically, our current notion of a spreadsheet is dinosaur, a relic from a previous age.

Its not that spreadsheets have not changed. They have changed over the years, mostly by accumulation. Features have been added but core concepts have remained the same.

The original spreadsheet was Visicalc, written for the Apple II in the late 1970s. And while spreadsheets have expanded their capacity and added charts and fonts and database connections, the original concept -- a grid of values and formulas -- has not changed. If we had a time machine, we could pluck a random Visicalc user out of 1979, whisk him to 2017, put him in front of a computer running the latest version of Excel, and he would know what to do. (Aside, perhaps, from the mouse or the touchscreen.)

Spreadsheets are quite the contrast to programming languages and IDEs, which have evolved in that same period. Programming languages have acquired discipline. IDEs have improved editing, syntax highlighting, and debugging. The development process has shifted from "waterfall" to "agile" methods.

Could we improve spreadsheets as we have improved programming languages?

Let's begin by recognizing that improvements are subjective, for both spreadsheets and programming languages. Pascal's adherence to structured programming concepts was lauded as progress by some and decried as oppressive by others. Users of spreadsheets are probably just as opinionated as programmers, so let's avoid the term "improvement" and instead focus on "rigor": Can we improve the rigor of spreadsheets, and assume that improved rigor is accepted as a good thing?

Here are some possible ways to add rigor to spreadsheets:

No forward references Current spreadsheets allow for formulas to reference any cell in the sheet. A formula may use values that are calculated "later" in the sheet, below or to the right. Spreadsheets are relatively clever at determining the proper sequence of calculation, so this is not necessarily a problem. It can be, if a sequence of calculations is self-referencing or "cyclic". Spreadsheets also have logic to identify cyclic calculations, but the work of fixing them is left to the human.

Removing forward references prevents cyclic calculations. By removing forward references, we limit the cells which can be used by a formula. Instead of using any cell, a formula may use only cells above and to the left. (Thus the top left cell may contain a value but not a formula.) With such limits in place, any formula can use only those items that have already been defined, and none of those items can use the current formula.

Not everyone may want to consider the top left corner the "origin". We could allow for each sheet to have an "origin corner" (top left, top, right, bottom left, or bottom right) and require formulas to use cells in the direction of the origin.

Smaller sheets Current spreadsheets allow for large numbers of rows and columns. Large spreadsheets were nice before they could be linked together. Once spreadsheets could be linked, the need for very large sheets evaporated. (Although we humans still too often think that bigger is better.) Smaller sheets force one to organize data. I once worked with a spreadsheet that allowed 52 columns and 128 rows per sheet. At first it was difficult, but with time I learned to work within the restrictions, and my sheets had better structure. Also, it was easier to find and resolve errors.

No absolute coordinates Absolute coordinates, as opposed to relative coordinates, are a hack for the original spreadsheets. They are useful when replicating a formula across multiple cells, and you want to override the default behavior of adjusting cell references.

Instead of absolute coordinates, I find it better to use a named range. (Even for a single cell.) The effect on calculations is the same, and the name of the range provides better information to the reviewer of the spreadsheet.

No coordinates in formulas Extending the last idea, force the use of named ranges for all calculations. (Perhaps this is the programmer in me, familiar with variable names.) Don't use cell references ("A4" or "C15") but require a range name for every source to the formula.

Better auditing The auditing capabilities of Excel are nice, but I find them frustrating and difficult to use. Microsoft chose a visual method for auditing, and I would like an extraction of all formulas for analysis.

Import and export controls on sheets This is an expansion of the "no forward references". It is easy to retrieve values from other sheets, perhaps too easy. One can set of cyclic dependencies across sheets, with sheets mutually dependent on their calculations. Specifying the values that may be retrieved from a spreadsheet (similar to an "export" declaration in some languages) limits the values the values exposed and forces the author to think about each export.

Of course, it would be easy to simply export everything. This avoids thinking and making decisions. To discourage this behavior, we would need a cost mechanism, some penalty for each exposed value. The more values you expose, the more you have to pay. (Rather than a dollar penalty, it may be a quality rating on the spreadsheet.)

None of these changes come for free. All of these changes have the potential to break existing spreadsheets. Yet I think we will see some movement towards them. We rely on spreadsheets for critical calculations, and we need confidence that the computations are correct. Improved rigor builds that confidence.

We may not see a demand for rigor immediately. It may take a significant failure, or a number of failures, before managers and executives demand more from spreadsheet users. When they do, spreadsheet users will demand more from spreadsheets.

1 comment:

Unknown said...

Have you looked at power pivot in Excel, which is sometimes referred to as Modern Excel?