Fitzpatrick's Fabulous Future: The provenance of variables

Just about every programming language has the concept of a 'variable', a container of a value. Variables are named 'variable' because their contents can vary -- as opposed to constants. The statement

a = 10

Assigns the variable 'a' a value of '10', denoted in the program as a constant.

(There are some languages which allow for the values of constants to be changed, so one can assign a new value to a constant. It leads to unusual results and is often considered a defect. But I digress.)

The nice thing about variables is that they can vary. The problem with variables is that they vary.

More specifically, when examining a program (say with a debugger), one can see the contents of a variable but one does not know how that value was calculated. Was it assigned? Was the variable incremented? Did the value come from a constant, or was it calculated? When was it assigned?

Here is an idea: Retain the source of values. Modify the notion of a variable. Instead of being a simple container for a value, hold the value and additional information.

For example, a small program:

file f = open("filename")
a = f.read()
b = f.read()
c = (a - b) / 100

Let's assume that the file contains the text "20 4", which is the number 20 followed by a space an then the number 4, all in text format.

In today's programming languages, the variables a, b, and c contains values, and nothing else. The variable 'a' contains 20, the variable 'b' contains '4', and the variable 'c' contains 0.16. Yet they contain no information about how those values were derived.

For a small program such as this example, we can easily look at the code and identify the source of the values. But larger programs are a different story. They are often complex, and the source of a value is not obvious.

With provenance, the variables a, b, and c still contain values, and in addition contain information about those values.

The variable 'a' contains the value 20 and the value 'filename', as that was the source of the value. It would also be possible to contain more information about the file, such as a creation date, a version number (for filesystems that support version numbers), and the position within the file. It can even contain the line number of the assignment, allowing the programmer easy access to the source.

The variable 'b' contains similar information.

The variable 'c' contains information about the variables 'a' and 'b', along with their states at the time of assignment. Consider the revised program:

file f = open("filename")
a = f.read()
b = f.read()
c = (a - b) / 100
... more code
a = 0
b = 1
... more code
d = c * 20

In this program, the variable 'd' is assigned the value 3.2 (assuming that the same file is read) and at that assignment, the variable 'c' holds information about 'a' and 'b' with their initial values of 20 and 4, not their current values of 0 and 1. Thus, a developer can examine the assignment to 'd' and understand the value of 'c'.

In addition to developers, provenance may be useful for financial auditors and examiners. Anyone who cares about the origins of a specific value will find provenance helpful.

Astute readers will be already thinking of the memory requirements for such a scheme. Retaining provenance requires memory -- a lot of memory. A simple variable holding an integer requires four bytes (on many modern systems). With provenance, a 'simple' integer would require the four bytes for the value and as many bytes as required to hold its history. Instead of four bytes, it may require 40, or 400.

Clearly, provenance is not free. It costs memory. It also costs time. Yet the benefits, I think, are clear. So, how to implement it? Some ideas:

- Provenance is needed only when debugging, not during production. Enable it as part of the normal debug information and remove it for 'release' mode.
- Provenance can be applied selectively, to a few variables and not to others.
- Provenance can be implemented selectively. Perhaps one needs only a few pieces of information, such as line number of assignment. Less information requires less memory.

Our computing capacity continues to grow. Processor capabilities, memory size, and storage size, are all increasing faster than program size. That is, our computers are getting bigger, and they are getting bigger faster than our programs are getting bigger. All of that 'extra' space should do something for us, right?

Fitzpatrick's Fabulous Future

Monday, March 27, 2017

The provenance of variables

No comments:

Blog Archive

About Me