Fitzpatrick's Fabulous Future

Thursday, April 13, 2017

Slack, efficiency, and choice

Slack is the opposite of efficiency. Slack is excess capacity, unused space, extra money. Slack is waste and therefore considered bad.

Yet things are not that simple. Yes, slack is excess capacity and unused space. And yes, slack can be viewed as waste. But slack is not entirely bad. Slack has value.

Consider the recent (infamous) overbooking event on United Airlines. One passenger was forcibly removed from a flight to make room for a United crew member (who was traveling to work another flight, not working the flight of the incident). United had a fully-booked flight from Chicago to Louisville and needed to move a flight crew from Chicago to Louisville. They asked for volunteers to take other flights; three people took them up on the offer, leaving one seat as an "involuntary re-accommodation".

I won't go into the legal and moral issues of this incident. Instead, I will look at slack.

- The flight had no slack passenger capacity. It was fully booked. (That's usually a good thing for the airline, as it means maximum revenue.)

- The crew had to move from Chicago to Louisville, to start their next assigned flight. It had to be that crew; there was no slack (no extra crew) in Louisville. I assume that there was no other crew in the region that could fill in for the assigned crew. (Keep in mind that crews are regulated as to how much time they can spend working, by union contracts and federal law. This limits the ability of an airline to swap crews like sacks of flour.)

In a perfectly predictable world, we can design, build, and operate systems with no slack. But the world is not perfectly predictable. The world surprises us, and slack helps us cope with those surprises. Extra processing capacity is useful when demand spikes. Extra money is useful for many events, from car crashes to broken water heaters to layoffs.

Slack has value. It buffers us from harsh consequences.

United ran their system with little slack, was subjected to demands greater than expected, and suffered consequences. But this is not really about United or airlines or booking systems. This is about project management, system design, budgeting, and just about any other human activity.

I'm not recommending that you build slack into your systems. I'm not insisting that airlines always leave a few empty seats on each flight.

I'm recommending that you consider slack, and that you make a conscious choice about it. Slack has a cost. It also has benefits. Which has the greater value for you depends on your situation. But don't strive to eliminate slack without thought.
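The trade-off can be put into rough numbers. The sketch below is only an illustration: the demand figures, the per-unit carrying cost, and the shortfall penalty are all made-up assumptions, and `total_cost` is a hypothetical helper, not a standard formula.

```python
def total_cost(capacity, demands, unit_cost, shortfall_penalty):
    """Cost of carrying `capacity` every period, plus a penalty per unit of unmet demand."""
    carrying = capacity * unit_cost * len(demands)
    shortfall = sum(max(0, d - capacity) for d in demands)
    return carrying + shortfall * shortfall_penalty


# Hypothetical demand per period, with one spike at the end.
demands = [90, 95, 100, 100, 120]

# When a shortfall is expensive, the slack pays for itself.
no_slack = total_cost(100, demands, unit_cost=1, shortfall_penalty=10)
with_slack = total_cost(120, demands, unit_cost=1, shortfall_penalty=10)

# When a shortfall is cheap, the same slack is the more costly choice.
no_slack_cheap = total_cost(100, demands, unit_cost=1, shortfall_penalty=2)
with_slack_cheap = total_cost(120, demands, unit_cost=1, shortfall_penalty=2)
```

With these invented numbers, the same extra capacity is worthwhile under one penalty and wasteful under the other, which is the point: the answer depends on the situation.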

Examine. Evaluate. Think. And then decide.

Sunday, April 9, 2017

Looking inwards and outwards

It's easy to categorize languages. Compiled versus interpreted. Static typing versus dynamic. Strongly typed versus weakly typed. Strict syntax versus liberal. Procedural. Object-oriented. Functional. Languages we like; languages we dislike.

One category I have not seen is the mechanism for assuring quality. Its obscurity is not a surprise -- the mechanisms are more a function of the community than of the language itself.

Quality assurance tends to fall into two categories: syntax checking and unit tests. Both aim to verify that programs perform as expected. The former relies on features of the language; the latter relies on tests that are external to the language (or at least external to the compiler or interpreter).

Interestingly, there is a correlation between execution type (compiled or interpreted) and assurance type (language features or tests). Compiled languages (C, C++, C#) tend to rely on features of the language to ensure correctness. Interpreted languages (Perl, Python, Ruby) tend to rely on external tests.

That interpreted languages rely on external tests is not a surprise. The languages are designed for flexibility and do not have the concepts needed to verify the correctness of code. Ruby especially supports the ability to modify objects and classes at runtime, which means that static code analysis must be either extremely limited or extremely sophisticated.

That compiled languages (and the languages I mentioned are strongly and statically typed) rely on features of the language is also not a surprise. IDEs such as Visual Studio can leverage the typing of the language and analyze the code relatively easily.

We could use tests to verify the behavior of compiled code. Some projects do. But many do not, and I surmise from the behavior of most projects that it is easier to analyze the code than it is to build and run tests. That matches my experience. On some projects, I have refactored code (renaming classes or member variables) and checked in changes after recompiling and without running tests. In these cases, the syntax checking of the compiler is sufficient to ensure quality.

But I think that tests will win out in the end. My reasoning is: language features such as strong typing and static analysis are inward-looking. They verify that the code meets certain syntactic requirements.

Tests, when done right, look not at the code but at the requirements. Good tests are built on requirements, not code syntax. As such, tests are more aligned with the user's needs, and not the techniques used to build the code. Tests are more "in touch" with the actual needs of the system.
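A small sketch of the difference, in Python with type hints. The function name and the requirement ("20% off 50.0 should cost 40.0") are invented for illustration. The types all line up, so a type checker has nothing to object to, yet the function does not do what the requirement asks.

```python
def percent_discount(price: float, percent: float) -> float:
    """Intended requirement: return the price after taking `percent` off."""
    # Type-correct, but wrong: this returns the discount amount,
    # not the discounted price.
    return price * percent / 100


def meets_requirement() -> bool:
    """Requirement-based check: 20% off 50.0 should cost 40.0."""
    return percent_discount(50.0, 20.0) == 40.0


result = meets_requirement()  # False: only the test exposes the defect
```

The check is written from the requirement, not from the code, which is why it can catch an error that the inward-looking type rules cannot.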

The syntax requirements of languages are inward-looking. They verify that the code conforms to a set of rules. (This isn't bad, and at times I want C and C++ compilers to require indentation much like Python does.) But conforming to rules, while nice (and possibly necessary), is not sufficient.

Quality software requires looking inward and outward. Good code is easy to read (and easy to change). Good code also performs the necessary tasks, and it is tests -- and only tests -- that can verify that.

Monday, March 27, 2017

The provenance of variables

Just about every programming language has the concept of a 'variable', a container of a value. Variables are named 'variable' because their contents can vary -- as opposed to constants. The statement

a = 10

assigns the variable 'a' a value of '10', denoted in the program as a constant.

(There are some languages which allow for the values of constants to be changed, so one can assign a new value to a constant. It leads to unusual results and is often considered a defect. But I digress.)

The nice thing about variables is that they can vary. The problem with variables is that they vary.

More specifically, when examining a program (say with a debugger), one can see the contents of a variable but one does not know how that value was calculated. Was it assigned? Was the variable incremented? Did the value come from a constant, or was it calculated? When was it assigned?

Here is an idea: Retain the source of values. Modify the notion of a variable. Instead of being a simple container for a value, hold the value and additional information.

For example, a small program:

file f = open("filename")
a = f.read()
b = f.read()
c = (a - b) / 100

Let's assume that the file contains the text "20 4", which is the number 20 followed by a space and then the number 4, all in text format.

In today's programming languages, the variables a, b, and c contain values, and nothing else. The variable 'a' contains 20, the variable 'b' contains 4, and the variable 'c' contains 0.16. Yet they contain no information about how those values were derived.

For a small program such as this example, we can easily look at the code and identify the source of the values. But larger programs are a different story. They are often complex, and the source of a value is not obvious.

With provenance, the variables a, b, and c still contain values, and in addition contain information about those values.

The variable 'a' contains the value 20 and the value 'filename', as that was the source of the value. It would also be possible to contain more information about the file, such as a creation date, a version number (for filesystems that support version numbers), and the position within the file. It can even contain the line number of the assignment, allowing the programmer easy access to the source.

The variable 'b' contains similar information.

The variable 'c' contains information about the variables 'a' and 'b', along with their states at the time of assignment. Consider the revised program:

file f = open("filename")
a = f.read()
b = f.read()
c = (a - b) / 100
... more code
a = 0
b = 1
... more code
d = c * 20

In this program, the variable 'd' is assigned the value 3.2 (assuming that the same file is read) and at that assignment, the variable 'c' holds information about 'a' and 'b' with their initial values of 20 and 4, not their current values of 0 and 1. Thus, a developer can examine the assignment to 'd' and understand the value of 'c'.
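One way to approximate the idea in an existing language is a wrapper class that carries a value together with a note about its origin. The sketch below uses Python. The `Tracked` class, its fields, and the `explain` helper are all hypothetical names invented here, and the file read is replaced by hard-coded values standing in for the contents "20 4".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tracked:
    """A value, a note on where it came from, and the inputs used to compute it."""
    value: float
    source: str
    inputs: tuple = ()

    def _combine(self, other, op, symbol):
        # Plain numbers in an expression are recorded as constants.
        if not isinstance(other, Tracked):
            other = Tracked(other, "constant")
        return Tracked(op(self.value, other.value),
                       f"computed with '{symbol}'",
                       (self, other))

    def __sub__(self, other):
        return self._combine(other, lambda x, y: x - y, "-")

    def __mul__(self, other):
        return self._combine(other, lambda x, y: x * y, "*")

    def __truediv__(self, other):
        return self._combine(other, lambda x, y: x / y, "/")


def explain(t, depth=0):
    """Return the history of a tracked value as indented lines of text."""
    lines = ["  " * depth + f"{t.value} <- {t.source}"]
    for item in t.inputs:
        lines.extend(explain(item, depth + 1))
    return lines


# Stand-ins for the two reads from the file containing "20 4".
a = Tracked(20, "read from 'filename', item 1")
b = Tracked(4, "read from 'filename', item 2")
c = (a - b) / 100

# Rebinding 'a' and 'b' does not disturb the history held by 'c'.
a = Tracked(0, "constant")
b = Tracked(1, "constant")
d = c * 20

history = explain(d)  # one line per value that contributed to 'd'
```

Because each `Tracked` object is immutable and holds references to the objects it was computed from, 'c' keeps the original 20 and 4 even after 'a' and 'b' are assigned new values. It also shows the cost: every value drags its whole history along with it.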

In addition to developers, provenance may be useful for financial auditors and examiners. Anyone who cares about the origins of a specific value will find provenance helpful.

Astute readers will be already thinking of the memory requirements for such a scheme. Retaining provenance requires memory -- a lot of memory. A simple variable holding an integer requires four bytes (on many modern systems). With provenance, a 'simple' integer would require the four bytes for the value and as many bytes as required to hold its history. Instead of four bytes, it may require 40, or 400.

Clearly, provenance is not free. It costs memory. It also costs time. Yet the benefits, I think, are clear. So, how to implement it? Some ideas:

- Provenance is needed only when debugging, not during production. Enable it as part of the normal debug information and remove it for 'release' mode.
- Provenance can be applied selectively, to a few variables and not to others.
- Provenance can be implemented selectively. Perhaps one needs only a few pieces of information, such as line number of assignment. Less information requires less memory.
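As a rough sketch of the first and third ideas, the Python fragment below records only the line number of the assignment, and only when a flag is set. The `track` function and its `enabled` parameter are hypothetical. It relies on `inspect.currentframe()`, which works in CPython but is not guaranteed to be available in every Python implementation.

```python
import inspect


def track(value, enabled=True):
    """Return (value, line number of the caller), or (value, None) when disabled."""
    if not enabled:
        return (value, None)  # 'release' mode: no provenance, no extra cost
    caller = inspect.currentframe().f_back
    return (value, caller.f_lineno)


debug_a = track(20)                    # value plus the line where it was assigned
release_a = track(20, enabled=False)   # value only
```

Keeping a single integer per variable is far cheaper than keeping a full history, at the price of telling the developer where a value was assigned but not how it was computed.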

Our computing capacity continues to grow. Processor capabilities, memory size, and storage size are all increasing faster than program size. That is, our computers are getting bigger, and they are getting bigger faster than our programs are getting bigger. All of that 'extra' space should do something for us, right?

Tuesday, March 14, 2017

To fragment or not fragment, that is the question

First there were punch cards, and they were good. They were a nice, neat representation of data. One record on one card -- what could be easier?

Except that record sizes were limited to 80 bytes. And if you dropped a stack, the cards got out of sequence.

Then there were magtapes, and they were good too. Better than cards, because record sizes could be larger than 80 bytes. Also, if you dropped a tape the data stayed in sequence. But also quite similar to cards: data on magtapes was simply a series of records.

At first, there was one "file" on a tape: you started at the beginning, you read the records until the "end-of-file" mark, and you stopped. Later, we figured out that a single tape could hold multiple files, one after the other.

Except that files were always contiguous data. They could not be expanded on a single tape, since the expanded file would write over a portion of the next file. (Also, reading and writing to the same tape was not possible on many systems.)

So we invented magnetic disks and magnetic drums, and they were good too. Magtapes permitted sequential access, which meant reading the entire file and processing it. Disks and drums allowed for direct access which meant you could jump to a position in the file, read or write a record, and then jump somewhere else in the file. We eventually moved away from drums and stayed with disks, for a number of reasons.

Early disks allocated space much like tapes: a disk could contain several files but data for each file was contiguous. Programmers and system operators had to manage disk space, allocating space for files in advance. Like files on magtapes, files on disks were contiguous and could not be expanded, as the expansion would write over the next file.

And then we invented filesystems. (On DEC systems, they were called "directory structures".) Filesystems managed disk space, which meant that programmers and operators didn't have to.

Filesystems store files not as a long sequence of disk space but as collections of blocks, each block holding a number of bytes. Blocks added to a file could be from any area of the disk, not necessarily in line (or even close) to the original set of blocks. By adding or removing blocks, files could grow or shrink as necessary. The dynamic allocation of disk space was great!
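A toy model makes the mechanism concrete. The Python sketch below is not how any real filesystem is implemented. The `Disk` class and its "lowest free block first" policy are assumptions chosen to keep the example short.

```python
class Disk:
    """A toy disk: a pool of free blocks and a table mapping file names to block lists."""

    def __init__(self, block_count):
        self.free = list(range(block_count))
        self.files = {}

    def append_block(self, name):
        """Grow a file by one block, taking the lowest-numbered free block."""
        if not self.free:
            raise OSError("disk full")
        block = self.free.pop(0)
        self.files.setdefault(name, []).append(block)
        return block

    def delete(self, name):
        """Remove a file and return its blocks to the free pool."""
        self.free.extend(self.files.pop(name))
        self.free.sort()


disk = Disk(8)

# Two files growing at the same time end up interleaved on the disk.
for name in ["a", "b", "a", "b", "a"]:
    disk.append_block(name)
blocks_before = list(disk.files["a"])   # [0, 2, 4]

# Deleting 'b' frees blocks 1 and 3, and 'a' grows into the holes.
disk.delete("b")
disk.append_block("a")
disk.append_block("a")
blocks_after = list(disk.files["a"])    # [0, 2, 4, 1, 3]
```

The file grows without anyone reserving space in advance, but its blocks are no longer adjacent, and no longer even in order.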

Except that files were not contiguous.

When processing a file sequentially, it is faster to access a contiguous file than a non-contiguous file. Each block of data follows its predecessor, so the disk's read/write heads move little. For a non-contiguous file, with blocks of data scattered about the disk, the read/write heads must move from track to track to read each set of blocks. The action of moving the read/write heads takes time, and is therefore considered expensive.
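The cost can be estimated with a deliberately simplified model: count how many tracks the head crosses while reading a file's blocks in order. The Python sketch below assumes four blocks per track and ignores rotational delay, caching, and everything else a real drive does. The block numbers are invented for illustration.

```python
def head_travel(blocks, blocks_per_track=4):
    """Total number of tracks the head crosses when reading blocks in the given order."""
    tracks = [b // blocks_per_track for b in blocks]
    return sum(abs(nxt - cur) for cur, nxt in zip(tracks, tracks[1:]))


contiguous = [0, 1, 2, 3, 4, 5, 6, 7]        # eight blocks on two adjacent tracks
fragmented = [0, 40, 1, 41, 2, 42, 3, 43]    # eight blocks split between distant tracks

contiguous_travel = head_travel(contiguous)  # 1
fragmented_travel = head_travel(fragmented)  # 70
```

Both files hold the same amount of data, yet in this model the fragmented one makes the head travel seventy times as far.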

Veteran PC users may remember utility programs which had the specific purpose of defragmenting a disk. They were popular in the 1990s.

Now, Windows defragments disks as an internal task. No third-party software is needed. No action by the user is needed.

To review: We started with punch cards, which were contiguous. Then we moved to magtapes, and files were still contiguous. Then we switched to disks, at first with contiguous files and then with non-contiguous files.

Then we created utility programs to make the non-contiguous files contiguous again.

Now we have SSDs (Solid-State Disks), which are really large chunks of memory with extra logic to hold values when the power is off. But they are still memory, and the cost of non-contiguous data is low. There are no read/write heads to move across a platter (indeed, there is no platter).

So the effort expended by Windows to defragment files (on an SSD) is not buying us better performance. It may be costing us, as the "defrag" process does consume CPU and does write to the SSD, and SSDs have a limited number of write operations in their lifespan.

So now, perhaps, we're going back to non-contiguous.

Tennis, anyone?

Thursday, February 23, 2017

The (possibly horrifying) killer app for AI

The original (and so far only) "killer app" was the spreadsheet. The specific spreadsheet was VisiCalc (or Lotus 1-2-3, depending on who you ask) and it was the compelling reason to get a personal computer.

We may see a killer app for AI, and from a completely unexpected direction: performance reviews.

Employee performance reviews, in large companies, often work as follows: each employee is rated on a number of items, frequently from 1 to 5 and sometimes as "meets expectations" or "needs improvement". Items range from meeting budgets and delivery dates to soft skills such as communication and leadership.

HR works to ensure that performance reviews are administered fairly, which means as consistently as possible, which often means "one size fits all". Everyone in the organization, from the entry-level developer to the vice president of accounting, has the same performance review form and topics. It leads to developers being rated on "meeting budgets" and vice presidents of accounting being rated on "meeting delivery dates".

Just about everyone fears and dislikes the process. Employees dread the annual (or semiannual) review. Managers take no joy in it either.

This is where AI may be attractive.

Instead of a human-driven process, a company may look for an AI-driven process. The human-administered process is rife with potential for inconsistencies (including favoritism) and opens the company to lawsuits. Instead of expending effort to enforce consistent criteria, HR may choose to implement AI for performance reviews. (Managers may have little say in the decision, and many may be secretly relieved at such a change.)

This is a possibly horrifying concept. The mere idea of a computer (which is what AI is, at bottom) rating and ranking employees may be unwelcome among the ranks. The fear of "computer overlords" from the 1960s is still with us, and I suspect few companies would want to be the first to implement such a system.

I recognize that such a system cannot work in a vacuum. It would need input, starting with a list of job responsibilities, assigned tasks and deadlines, and status reports. Early versions will most likely get many things wrong. Over time, I expect they will improve.

Should we move to AI for performance reviews, I have some observations.

First, AI performance review systems may move outside of companies. Just as payroll processing is often outsourced, performance review systems might be outsourced too. The driver is risk avoidance, and companies that build their own performance review AI systems may build in subtle discrimination against women or minorities. An external supplier would have to warrant their system conforms to anti-discrimination laws -- a benefit to the client company.

Second, automating performance reviews could mean more frequent reviews, and more frequent feedback to employees. The choice of annual as a frequency for performance reviews is driven, I suspect, by two factors. First, they are needed to justify changes in compensation. Second, they are expensive to administer. The former mandates at least one per year; the latter discourages anything more frequent.

But automating performance reviews should reduce effort and cost. Or at least reduce the marginal cost for reviews beyond the annual review.

Another result of more frequent performance reviews? More frequent information to management about the state of their workforce.

In sum, AI offers a way to reduce cost and risk in performance reviews. It also offers more frequent feedback to employees and more frequent information to management. I see advantages to the use of AI for this despised task.

Now all we need to do is bell the cat.

Sunday, February 12, 2017

Databases, containers, and Clarke's first law

A blog post by a (self-admitted) beginner engineer rants about databases inside of containers. The author lays out the case against using databases inside containers, pointing out potential problems from security to configuration time to the problems of holding state within a container. The argument is intense and passionate, although a bit difficult for me to follow. (That, I believe, is due to my limited knowledge of databases and my even more limited knowledge of containers.)

I believe he raises questions which should be answered before one uses databases in containers. So in one sense, I think he is right.

In a larger sense, I believe he is wrong.

For that opinion, I refer to Clarke's first law, which states: When a distinguished but elderly scientist states that something is possible, he is almost certainly right. When he states that something is impossible, he is very probably wrong.

I suspect that it applies to sysadmins and IT engineers just as much as it does to scientists. I also suspect that age has rather little effect. Our case is one of a not-elderly not-scientist claiming that running databases inside of containers is impossible, or at least a Bad Idea and Will Lead Only To Suffering.

My view is that containers are useful, and databases are useful, and many in the IT field will want to use databases inside of containers. Not just run programs that access databases on some other (non-containerized) server, but host the database within a container.

Not only will people want to use databases in containers, there will be enough pressure and enough interested people that they will make it happen. If our current database technology does not work well with containers, then engineers will modify containers and databases to make them work. The result will be, quite possibly, different from what we have today. Tomorrow's database may look and act differently from today's databases. (Just as today's phones look and act differently from phones of a decade ago.)

Utility is one of the driving features of technology. Containers have it, so they will be around for a while. Databases have it (they've had it for decades) and they will be around for a while. One or both may change to work with the other.

We'll still call them databases, though. The term is useful, too.

Monday, February 6, 2017

Software development and economics

One of the delights of working in the IT field is that it interacts with so many other fields. On one side are the "user" areas: user interface design, user experience, and don't forget accessibility and section 508 compliance. On the other side is hardware, networking, latency, CPU design, caching and cache invalidation, power consumption, power dissipation, and photolithography.

And then there is economics. Not the economics of buying a new server, or the economics of cloud computing, but "real" economics, the kind used to analyze nations.

Keynesian economics, in short, says that during an economic downturn the government should spend money even if it means accumulating debt. By spending, the government keeps the economy going and speeds the recovery. Once the economy has recovered, the government reduces spending and pays down the debt.

Thus, Keynesian economics posits two stages: one in which the government accumulates debt and one in which the government reduces debt. A "normal" economy will shift from recession to boom (and back), and the government should shift from debt accumulation to debt payment (and back).

It strikes me that this two-cycle approach to fiscal policy is much like Agile development.

The normal view of Agile development is the introduction of small changes, prioritized and reviewed by stakeholders, and tested with automated means. Yet a different view of Agile shows that it is much like Keynesian economics.

If the code corresponds to the economy, and the development team corresponds to the government, then we can build an analogy. The code shifts from an acceptable state to an unacceptable state, due to a new requirement that is not met. In response, the development team implements the new requirement but does so in a way that incurs debt. (The code is messy and needs to be refactored.) At this point, the development team has incurred technical debt.

But since the requirement has been implemented, the code is now in an acceptable state. (That is, the recession is over and the economy has recovered.) At this point, the development team must pay down the debt, by improving the code.

The two-cycle operation of "code and refactor" matches the economic version of "spend and repay".

The economists have it easy, however. Economic downturns occur, but economic recoveries provide a buffer time between them. Development teams must face stakeholders who, once they have a working system, too often demand additional changes immediately. There is no natural "boom time" to allow the developers to refactor the code. Only strong management can enforce a delay to allow for refactoring.