Fitzpatrick's Fabulous Future: programming techniques

Sunday, January 22, 2017

The spreadsheet is a dinosaur

Spreadsheets are dinosaurs. Or more specifically, our current notion of a spreadsheet is dinosaur, a relic from a previous age.

Its not that spreadsheets have not changed. They have changed over the years, mostly by accumulation. Features have been added but core concepts have remained the same.

The original spreadsheet was Visicalc, written for the Apple II in the late 1970s. And while spreadsheets have expanded their capacity and added charts and fonts and database connections, the original concept -- a grid of values and formulas -- has not changed. If we had a time machine, we could pluck a random Visicalc user out of 1979, whisk him to 2017, put him in front of a computer running the latest version of Excel, and he would know what to do. (Aside, perhaps, from the mouse or the touchscreen.)

Spreadsheets are quite the contrast to programming languages and IDEs, which have evolved in that same period. Programming languages have acquired discipline. IDEs have improved editing, syntax highlighting, and debugging. The development process has shifted from "waterfall" to "agile" methods.

Could we improve spreadsheets as we have improved programming languages?

Let's begin by recognizing that improvements are subjective, for both spreadsheets and programming languages. Pascal's adherence to structured programming concepts was lauded as progress by some and decried as oppressive by others. Users of spreadsheets are probably just as opinionated as programmers, so let's avoid the term "improvement" and instead focus on "rigor": Can we improve the rigor of spreadsheets, and assume that improved rigor is accepted as a good thing?

Here are some possible ways to add rigor to spreadsheets:

No forward references Current spreadsheets allow for formulas to reference any cell in the sheet. A formula may use values that are calculated "later" in the sheet, below or to the right. Spreadsheets are relatively clever at determining the proper sequence of calculation, so this is not necessarily a problem. It can be, if a sequence of calculations is self-referencing or "cyclic". Spreadsheets also have logic to identify cyclic calculations, but the work of fixing them is left to the human.

Removing forward references prevents cyclic calculations. By removing forward references, we limit the cells which can be used by a formula. Instead of using any cell, a formula may use only cells above and to the left. (Thus the top left cell may contain a value but not a formula.) With such limits in place, any formula can use only those items that have already been defined, and none of those items can use the current formula.

Not everyone may want to consider the top left corner the "origin". We could allow for each sheet to have an "origin corner" (top left, top, right, bottom left, or bottom right) and require formulas to use cells in the direction of the origin.

Smaller sheets Current spreadsheets allow for large numbers of rows and columns. Large spreadsheets were nice before they could be linked together. Once spreadsheets could be linked, the need for very large sheets evaporated. (Although we humans still too often think that bigger is better.) Smaller sheets force one to organize data. I once worked with a spreadsheet that allowed 52 columns and 128 rows per sheet. At first it was difficult, but with time I learned to work within the restrictions, and my sheets had better structure. Also, it was easier to find and resolve errors.

No absolute coordinates Absolute coordinates, as opposed to relative coordinates, are a hack for the original spreadsheets. They are useful when replicating a formula across multiple cells, and you want to override the default behavior of adjusting cell references.

Instead of absolute coordinates, I find it better to use a named range. (Even for a single cell.) The effect on calculations is the same, and the name of the range provides better information to the reviewer of the spreadsheet.

No coordinates in formulas Extending the last idea, force the use of named ranges for all calculations. (Perhaps this is the programmer in me, familiar with variable names.) Don't use cell references ("A4" or "C15") but require a range name for every source to the formula.

Better auditing The auditing capabilities of Excel are nice, but I find them frustrating and difficult to use. Microsoft chose a visual method for auditing, and I would like an extraction of all formulas for analysis.

Import and export controls on sheets This is an expansion of the "no forward references". It is easy to retrieve values from other sheets, perhaps too easy. One can set of cyclic dependencies across sheets, with sheets mutually dependent on their calculations. Specifying the values that may be retrieved from a spreadsheet (similar to an "export" declaration in some languages) limits the values the values exposed and forces the author to think about each export.

Of course, it would be easy to simply export everything. This avoids thinking and making decisions. To discourage this behavior, we would need a cost mechanism, some penalty for each exposed value. The more values you expose, the more you have to pay. (Rather than a dollar penalty, it may be a quality rating on the spreadsheet.)

None of these changes come for free. All of these changes have the potential to break existing spreadsheets. Yet I think we will see some movement towards them. We rely on spreadsheets for critical calculations, and we need confidence that the computations are correct. Improved rigor builds that confidence.

We may not see a demand for rigor immediately. It may take a significant failure, or a number of failures, before managers and executives demand more from spreadsheet users. When they do, spreadsheet users will demand more from spreadsheets.

Friday, July 26, 2013

Projects must move and grow

Software development projects are like people, in that they grow over time. Technology changes around us, and we must allow for changes to our projects.

The things that change are:

Tools, such as editors and compilers
Processes, such as design reviews and code walk-throughs
Source code

Tools evolve over time. New versions of compilers are released. Sometimes a new version will break existing code. The C++ standards committee works very hard to prevent such breakages. (Although they have made some changes that broke code, after long deliberations.)

A project that practices good hygiene will upgrade tools. The change does not have to be immediate, but it is within a reasonable period of time. (Perhaps six months.)

The "best practices" for software development change over time. Often these changes are made possible with the invention of tools or the release of a new product. Past changes have included the use of version control, the use of lint utilities, code reviews, and automated testing. None of these were available (cheaply) in the 1990s, and they are today. Version control has undergone several generations, from the early PVCS and CVS systems to SourceSafe and Subversion and today's TFS and 'git'.

The source code changes over time, too. Not just for the addition of new features, but improvements to the code. Programming techniques, like tools and best practices, change over time. We moved from procedural programming to object-oriented programming. We've developed patterns such as "Model View Controller" and "Model View ViewModel" which help organize our code and reduce complexity.

Changes for tools, techniques, and source code take time and effort. They must be planned and incorporated into releases. They entail risk; any change can introduce a defect. To make matters worse, such changes are "internal" and offer no direct benefit to the users. The changes are for the benefit of the development team.

I have seen a number of projects start with the then-current set of tools and techniques, only to become established and stay with those tools and techniques. The once-modern project ages into a legacy effort. It is a trap that is all too easy: the demand for new features and bug fixes overwhelm the team and there is no time for non-revenue improvements.

The "no return on investment" argument is difficult to counter. Given finite resources and the choice between a feature that provides revenue against a change that provides no revenue, it is sensible to go with the revenue feature.

Without these internal changes, the project cost rises. The increases are caused by two factors: code complexity and ability to hire staff.

Over time, changes (especially rushed changes) increase the complexity of the code and changes become much more difficult. The code, once neat and organized, becomes messy. Features are added quickly, with compromises made to quality for delivery time. Each additional change adds to the "mess" of the code.

The world of software development advances, but the project remains stuck in its original era. The tools, initially the most modern, age. They become yesterday's tools and techniques.

Another problem is staffing. Few developers are willing to work on a project that has hard-to-maintain code, old tools, and outdated processes. The few that are will do so only at elevated rates. This increases the cost of future maintenance.

Allocating time and effort (and perhaps money) to keep the project up to date is not easy. The payoff is in the long term. A good project manager balances the short-term needs and the long-term goals.

Fitzpatrick's Fabulous Future

Sunday, January 22, 2017

The spreadsheet is a dinosaur

Friday, July 26, 2013

Projects must move and grow

Blog Archive

About Me