Monday, December 13, 2010

The importance of being significant

In engineering computations, we have the notion of "significant figures". This notion tells us how many digits of a number are accurate or "significant", and which digits should be ignored. This sounds worse than it really is; let me provide an example.

If I tell you that I have 100 dollars in my pocket, you will assume that I have *about* 100 dollars. I may have exactly 100 dollars, or I may have 95 or 102 or maybe even 120. My answer is rounded to a nice round number, which is convenient for our conversation. (If I actually have $190, or anything more than $150, I should say "about 200 dollars", since that is the closer round number.) The phrase "100 dollars" is precise to the first digit (the '1' in '100') but not down to the last zero.

On the other hand, if I tell you that I have 100 dollars and 12 cents, then you can assume that I do indeed have $100.12 and not something like $120 or $95. By specifying the 12 cents, I have provided an answer with more significant figures: five in this case, versus one in the earlier answer.

The number of significant figures is, well, significant. Or at least important. It's a factor in calculations that must be included for reliable results. There are rules for performing arithmetic with numbers, and significant figures tell us when we must stop adding digits of precision.

For example, the hypothetical town of Springfield has a population of 15,000. That number has two significant figures. If one person moves into Springfield, is the population now 15,001? The arithmetic we learned in elementary school says that it is, but that math assumes that the 15,000 population figure is precise to all places (five significant figures). In the real world, town populations are estimates (mostly because they change, but slowly enough that the estimate is still usable). The 15,000 figure is precise to two figures; it has limited precision.

When performing calculations with estimates or other numbers with limited precision, the rule is: you cannot increase precision. You have to keep to the original level of precision, or lose precision. (You cannot make numbers more precise than the original measurements, because that is creating fictional information.)

With a town estimate of 15,000 (two "sig-figs"), adding a person to the town yields an estimate of... 15,000. It's as if I told you that I had $100 in my pocket, and then I found a quarter and tucked it into my pocket. How much do I now have in my pocket? It's not $100.25, because that would increase the number of significant figures from one to five, and you cannot increase precision. We have to stick with one digit of precision, so I *still* have to report $100 in my pocket, despite my windfall.
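
To make this concrete, here is a small Python sketch. The helper round_to_sig_figs is my own illustration, not a standard library routine; it simply rounds a raw result back to the precision of the original estimate, reproducing both examples above.

```python
import math

def round_to_sig_figs(value, figs):
    """Round value to the given number of significant figures."""
    if value == 0:
        return 0.0
    # Power of ten of the last significant digit (15000 with 2 figs -> 10^3).
    place = math.floor(math.log10(abs(value))) - figs + 1
    return round(value / 10.0 ** place) * 10.0 ** place

# Springfield: a two-significant-figure estimate, plus one new resident.
print(round_to_sig_figs(15000 + 1, 2))   # 15000.0, not 15001

# Pocket money: one significant figure, plus a found quarter.
print(round_to_sig_figs(100 + 0.25, 1))  # 100.0, not 100.25
```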

In the engineering world, respecting the precision of the initial estimates is important for producing trustworthy results later in the calculations.

I haven't seen this concept carried over to the computer programming world. In programming languages, we have the ability to read and write integers and floating point numbers (and other data types). With integers, we often have the ability to specify the number of character positions for the number; for floating point, we can specify the number of digits and the number of decimal places. But the number of decimal places is not the same as the number of significant figures.
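
The difference is easy to see with ordinary format specifiers. As a small illustration (Python is shown here; C's printf behaves the same way), a fixed-point format controls decimal places while the general %g format controls significant digits, and neither carries that precision forward into later arithmetic:

```python
x = 123.456

print("%.2f" % x)   # 123.46  -- two decimal places
print("%.2g" % x)   # 1.2e+02 -- two significant figures
```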

In my experience, I have seen no programming language or class library address this concept. (Perhaps someone has; if so, please send info.) Knuth covers the concept in great detail in "The Art of Computer Programming" and explains how precision can be lost during computations. (If you want a scary read, go read that section of his work.)
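
To show what such support might look like, here is a rough sketch of a value type that carries its own precision. The SigFig class, its names, and its single addition rule are my own toy illustration, not an existing library; it handles only the simple cases discussed above.

```python
import math

def _order(value):
    """Power of ten of the leading digit (15000 -> 4, 0.25 -> -1)."""
    return math.floor(math.log10(abs(value)))

class SigFig(object):
    """A toy value that carries its count of significant figures."""

    def __init__(self, value, figs):
        self.value = float(value)   # zero values are not handled in this sketch
        self.figs = figs

    def _last_place(self):
        # Power of ten of the last significant digit (15000 with 2 figs -> 3).
        return _order(self.value) - self.figs + 1

    def __add__(self, other):
        # A sum is only good to the coarsest significant place of its operands.
        place = max(self._last_place(), other._last_place())
        total = self.value + other.value
        rounded = round(total / 10.0 ** place) * 10.0 ** place
        return SigFig(rounded, _order(rounded) - place + 1)

    def __repr__(self):
        return "SigFig(%g, figs=%d)" % (self.value, self.figs)

print(SigFig(15000, 2) + SigFig(1, 1))    # SigFig(15000, figs=2)
print(SigFig(100, 1) + SigFig(0.25, 2))   # SigFig(100, figs=1)
```

Multiplication and division follow a different rule (keep the smaller count of significant figures), which a real library would also have to implement, along with proper handling of zero and of measured versus exact values.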

There may be several reasons for our avoidance of significant figures:

It takes effort to compute. Using significant figures in calculations requires that we drag around additional information and perform additional adjustments on the raw results. This is a problem of computational power.

It requires additional I/O. There is more effort to specify the significant figures on input (and, to a lesser extent, on output). This is an argument of language specification, numeric representation, and input/output capacity.

It reduces the image of authority associated with the computer. In Western culture, the computer holds a position of authority over information. Studies have shown that people believe data on computer printouts more readily than data on hand-written documents. This is an issue of psychology.

Some domains don't need it. The banking industry, for example, uses numbers that are precise to a fixed decimal place. When you ask a bank for your balance, it responds with a number precise to the penny, not "about $100" (a short fixed-point sketch follows this list). This is an issue of the domain.
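
For contrast with the estimate arithmetic above, a fixed-point balance calculation (sketched here with Python's decimal module, one common choice for money; the amounts are made up) keeps every penny rather than a level of precision:

```python
from decimal import Decimal

# Bank balances are exact to the penny, not "about $100".
balance = Decimal("100.00") + Decimal("0.25")
print(balance)   # 100.25
```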

My thinking is that all of these arguments made sense in their day, but should be re-examined. We have the computational power and the parsing capabilities for accepting, tracking, and using significant figure information. While banking may be immune to significant figures (and perhaps that is only the accounting side of banking), many other domains need to track the precision of their data.

As for the psychological argument, there is no amount of technology, hardware, or language features that will change our thinking. It is up to us to think about our thinking and change it for the better.
