Showing posts with label engineering. Show all posts

Thursday, November 10, 2011

The insignificance of significant figures in programming languages

If a city with a population figure of 500,000 gets three more residents, the population figure is... 500,000, not 500,003. The reasoning is this: the original figure was accurate only to the first digit (the hundred-thousands digit). It has a finite precision, and adding a number that is smaller than the precision has no effect on the original number.

Significant figures are not the same as "number of decimal places", although many people confuse the two.

Significant figures are needed for calculations with measured quantities. Measurements will have some degree of imprecision, and the rigor of significant figures keeps our calculations honest. The rules for significant figures are more complex (and subtle) than a simple "use 3 decimal places". The number of decimal places will vary, and some calculations may affect positions to the left of the decimal point. (As in our "city with 500,000 residents" example.)
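The rounding described above is straightforward to express in code. Here is a minimal sketch in Python (the function name `round_sig` is my own; it is not a standard library function): it locates the leading digit with a base-10 logarithm and rounds relative to that position, so the rounding point can fall to the left of the decimal point.

```python
import math

def round_sig(x, sig):
    """Round x to sig significant figures."""
    if x == 0:
        return 0.0
    # The exponent of the leading digit determines where rounding happens.
    exponent = math.floor(math.log10(abs(x)))
    return round(x, sig - 1 - exponent)

# A population known to one significant figure stays at 500,000
# even after three residents arrive.
print(round_sig(500_000 + 3, 1))  # 500000
```

Note that the rounding position here (`-5`, the hundred-thousands place) is well to the left of the decimal point, which is exactly what a plain "round to N decimal places" cannot express.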

For a better description of significant figures, see the wikipedia page.

Applications such as Microsoft Excel (or LibreOffice Calc) have no built-in support for significant figures. Nor, to my knowledge, are there plug-ins or extensions to support calculations with significant figures.

Perhaps the lack of support for significant figures is caused by a lack of demand. Most spreadsheets are built to handle money, which is counted (not measured) and therefore does not fall under the domain of significant figures. (Monetary figures are considered to be exact, in most applications.)

Perhaps the lack of support is driven by the confusion between "decimal places" and "significant figures".

But perhaps the biggest reason for a lack of support for significant figures in applications is this: There is no support for significant figures in popular languages.

A quick search for C++, Java, Python, and Ruby yields no corresponding packages. Interestingly, the only language I found with a package for significant figures was Perl: CPAN has the Math::SigFigs package.

So the better question is: Why do programming languages have no support for significant figures?

Wednesday, October 19, 2011

Engineering vs. craft

Some folks consider the development of software to be a craft; others claim that it is engineering.

As much as I would like for software development to be engineering, I consider it a craft.

Engineering is a craft that must work within measurable constraints, and must optimize some measurable attributes. For example, bridges must support a specific, measurable load, and minimize the materials used in construction (again, measurable quantities).

We do not do this for software.

We manage not software but software development. That is, we measure the cost and time of the development effort, but we do not measure the software itself. (The one exception is measuring the quality of the software, but that is a difficult measurement and we usually measure the number and severity of defects, which is a negative measure.)

If we are to engineer software, then we must measure the software. (We can -- and should -- measure the development effort. Those are necessary measurements. But they are not, by themselves, sufficient for engineering.)

What can we measure in software? Here are some suggestions:

 - Lines of code
 - Number of classes
 - Number of methods
 - Average size of classes
 - Complexity (cyclomatic, McCabe, or whatever metric you like)
 - "Boolean complexity" (the number of boolean constants used within code that are not part of initialization)
 - The fraction of classes that are immutable
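Most of these measurements are mechanical to collect. As a sketch (using Python's standard `ast` module on a toy source string; the counting rules, such as "non-blank lines", are my own simplifications), here is how one might gather the first three metrics:

```python
import ast

source = '''
class Greeter:
    def greet(self, name):
        return "Hello, " + name
    def shout(self, name):
        return self.greet(name).upper()
'''

tree = ast.parse(source)

# Lines of code, counted as non-blank lines (one possible convention).
lines_of_code = len([l for l in source.splitlines() if l.strip()])
classes = [n for n in ast.walk(tree) if isinstance(n, ast.ClassDef)]
methods = [n for n in ast.walk(tree) if isinstance(n, ast.FunctionDef)]

print("lines of code:", lines_of_code)  # 5
print("classes:", len(classes))         # 1
print("methods:", len(methods))         # 2
```

The hard part, as argued below, is not collecting such numbers but connecting them to cost.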

Some might find the notion of measuring lines of code abhorrent. I will argue that it is not the metric that is evil, it is the use of it to rank and rate programmers. The misuse of metrics is all too easy and can lead to poor code. (You get what you measure and reward.)

Why do we not measure these things? (Or any other aspect of code?)

Probably because there is no way to connect these metrics to project cost. In the end, project cost is what matters. Without a translation from lines of code (or any other metric) to cost, the metrics are meaningless. The code may be one class of ten thousand lines, or one hundred classes of one hundred lines each; without a conversion factor, the cost of each design is the same. (And the cost of each design is effectively zero, since we cannot convert design decisions into costs.)

Our current capabilities do not allow us to assign cost to design, or code size, or code complexity. The only costs we can measure are the development costs: number of programmers, time for testing, and number of defects.

One day in the future we will be able to convert complexity to cost. When we do, we will move from craft to engineering.

Monday, December 13, 2010

The importance of being significant

In engineering computations, we have the notion of "significant figures". This notion tells us how many digits of a number are accurate or "significant", and which digits should be ignored. This sounds worse than it really is; let me provide an example.

If I tell you that I have 100 dollars in my pocket, you will assume that I have *about* 100 dollars. I may have exactly 100 dollars, or I may have 95 or 102 or maybe even 120. My answer provides information to a nice round number, which is convenient for our conversation. (If I actually have $190, or anything more than $150, I should say "about 200 dollars", since that is the closer round number.) The phrase "100 dollars" is precise to the first digit (the '1' in '100') but not down to the last zero.

On the other hand, if I tell you that I have 100 dollars and 12 cents, then you can assume that I indeed have $100.12 and not something like $120 or $95. By specifying the 12 cents, I have provided an answer with more significant figures: five in the latter case, one in the former.

The number of significant figures is, well, significant. Or at least important. It's a factor in calculations that must be included for reliable results. There are rules for performing arithmetic with numbers, and significant figures tell us when we must stop adding digits of precision.

For example, the hypothetical town of Springfield has a population of 15,000. That number has two significant figures. If one person moves into Springfield, is the population now 15,001? The arithmetic we learned in elementary school says that it is, but that math assumes that the 15,000 population figure is precise to all places (five significant figures). In the real world, town populations are estimates (mostly because they change, but slowly enough that the estimate is still usable). The 15,000 figure is precise to two figures; it has limited precision.

When performing calculations with estimates or other numbers with limited precision, the rule is: you cannot increase precision. You have to keep to the original level of precision, or lose precision. (You cannot make numbers more precise than the original measurements, because that is creating fictional information.)

With a town estimate of 15,000 (two "sig-figs"), adding a person to the town yields an estimate of... 15,000. It's as if I told you that I had $100 in my pocket, and then I found a quarter and tucked it into my pocket. How much do I now have in my pocket? It's not $100.25, because that would increase the number of significant figures from one to five, and you cannot increase precision. We have to stick with one digit of precision, so I *still* have to report $100 in my pocket, despite my windfall.

In the engineering world, respecting the precision of the initial estimates is important for accurate estimates later in the calculations.

I haven't seen this concept carried over to the computer programming world. In programming languages, we have the ability to read and write integers and floating point numbers (and other data types). With integers, we often have the ability to specify the number of character positions for the number; for floating point, we can specify the number of digits and the number of decimal places. But the number of decimal places is not the same as the number of significant figures.
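The distinction shows up even in output formatting. Python, for instance, distinguishes the two in its format specifiers: `f` counts decimal places, while `g` counts significant digits. Neither, however, tracks precision *through* a calculation; they only control display.

```python
x = 1234.5678

# Fixed decimal places: 'f' format.
print(f"{x:.2f}")  # 1234.57  (two decimal places -- six significant figures)

# Significant figures: 'g' format.
print(f"{x:.2g}")  # 1.2e+03  (two significant figures)
print(f"{x:.6g}")  # 1234.57
```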

In my experience, I have seen no programming language or class library address this concept. (Perhaps someone has, if so please send info.) Knuth covers the concept in great detail in "The Art of Computer Programming" and explains how precision can be lost during computations. (If you want a scary read, go read that section of his work.)

There may be several reasons for our avoidance of significant figures:

It takes effort to compute. Using significant figures in calculations requires that we drag around additional information and perform additional adjustments on the raw results. This is a problem of computational power.

It requires additional I/O. There is more effort to specify the significant figures on input (and, to a lesser extent, output). This is an argument of language specification, numeric representation, and input/output capacity.

It reduces the image of authority associated with the computer. In Western culture, the computer holds a place of authority over information. Studies have shown that people believe data on computer printouts more readily than data on hand-written documents. This is an issue of psychology.

Some domains don't need it. The banking industry, for example, uses numbers that are precise to a fixed decimal place. When you ask a bank for your balance, it responds with a number precise to the penny, not "about $100". This is an issue of the domain.

My thinking is that all of these arguments made sense in their day, but should be re-examined. We have the computational power and the parsing capabilities for accepting, tracking, and using significant figure information. While banking may be immune to significant figures (and perhaps that is only the accounting side of banking), many other domains need to track the precision of their data.

As for the psychological argument, there is no amount of technology, hardware, or language features that will change our thinking. It is up to us to think about our thinking and change it for the better.