Sunday, April 23, 2017

Two successes from Microsoft

One success is the Surface tablet. Recent articles state that Microsoft is losing, because other manufacturers are producing devices that surpass Microsoft's Surface tablet.

I have a different view. I consider the Surface tablet a success. It's a success because it keeps Microsoft (and Windows) in the market. Microsoft introduced the Surface as a response to Apple's iPad. Without the Surface, Microsoft would have offerings for desktop PCs, laptop PCs, and phones, but nothing for tablets. The Surface keeps Microsoft in the tablet market, and keeps customers loyal to Microsoft.

The second success is the CloudBook. Last week, a leaked document outlined specifications for a device called a "CloudBook". This appears to be a response to Google's ChromeBook devices, which are lightweight laptops that run ChromeOS and the Chrome browser.

Calling the CloudBook a success is a bit premature. The official CloudBook devices have yet to be released, so we don't know how they will perform and how customers will receive them. (Acer has a laptop that they call a "CloudBook", which is probably a close approximation of the future CloudBooks.)

Yet I believe that CloudBooks will be a success for Microsoft. They keep Microsoft in the market. I think that many businesses will use CloudBooks. They are less expensive than typical laptops, they are easier to administer, and, being browser-focused, their apps store data in the cloud rather than locally. Storing data in the cloud is more secure, and it means that losing a laptop no longer means losing the data on it.

Tuesday, April 18, 2017

Microsoft and programming languages

Should Microsoft develop programming languages? Or interpreters and compilers? Should they continue to develop C# (and F#)? Should they continue to develop the C# compiler?

A world in which Microsoft does not develop programming languages would indeed be different. Microsoft's history is full of programming languages and implementations. They started with an interpreter for BASIC. They quickly followed that with a macro assembler, a FORTRAN compiler, a COBOL compiler, and even a BASIC compiler (to compete with Digital Research's CBASIC compiler). When the C programming language became popular, Microsoft acquired a C compiler and, after much rework over the years, expanded it into the Visual Studio we know today. (Some of Microsoft's offerings were products purchased from other sources, but once in the Microsoft fold they received a lot of changes.)

The compilers, interpreters, editors, and debuggers have all served Microsoft well. But Microsoft treated them as tools of its empire, supporting them and enhancing them when such support and enhancements grew Microsoft, and discarding them when they did not aid Microsoft. Examples of discontinued languages include their Pascal compiler, Visual Basic, and the short-lived Visual J#.

Today, Microsoft supports C#, F#, and VB.NET.

I've been thinking about these languages. Microsoft created C# during their "empire" phase, when Microsoft tried to provide everything for everyone. They had to compete with Java, and C# was their entry. VB.NET was necessary to offer a path from Visual Basic into .NET. F# is the most recent addition, an expedition into functional programming.

All of these languages provided a path that led into (but not out of) the Microsoft world. To use Visual Basic, you had to run Windows. To program in C#, you had to run Windows.

Today, Microsoft is agnostic about operating systems and languages. Azure supports Windows and Linux. Visual Studio works with PHP, JavaScript, Python, and Ruby, among others. Microsoft has opened the C# compiler and .NET framework to non-Microsoft platforms.

Microsoft is no longer using programming languages as a means to drive people to Windows.

That is a significant change. A consequence of that change is a reduction in the importance of programming languages. It may make sense for Microsoft to let other people develop programming languages. Perhaps Microsoft's best strategy is to provide a superior environment for programming and the development of languages.

Microsoft is not the first company to make this transition. IBM did the same with its languages. FORTRAN, PL/I, APL, SQL, and RPG were all invented by IBM and were, at first, proprietary, usable only on IBM equipment. Today, IBM provides services and doesn't need private programming languages to sell hardware.

Microsoft cannot simply drop C#. What would make sense is a gradual, planned transfer to another organization. Look for actions that continue in the direction of open source for C# and .NET.

Thursday, April 13, 2017

Slack, efficiency, and choice

Slack is the opposite of efficiency. Slack is excess capacity, unused space, extra money. Slack is waste and therefore considered bad.

Yet things are not that simple. Yes, slack is excess capacity and unused space. And yes, slack can be viewed as waste. But slack is not entirely bad. Slack has value.

Consider the recent (infamous) overbooking event on United Airlines. One passenger was forcibly removed from a flight to make room for a United crew member (the crew member was needed to work a later flight, not the flight of the incident). United had a fully booked flight from Chicago to Louisville and needed to move a flight crew from Chicago to Louisville. They asked for volunteers to take other flights; three people took them up on the offer, leaving one seat as an "involuntary re-accommodation".

I won't go into the legal and moral issues of this incident. Instead, I will look at slack.

- The flight had no slack passenger capacity. It was fully booked. (That's usually a good thing for the airline, as it means maximum revenue.)

- The crew had to move from Chicago to Louisville, to start their next assigned flight. It had to be that crew; there was no slack (no extra crew) in Louisville. I assume that there was no other crew in the region that could fill in for the assigned crew. (Keep in mind that crews are regulated as to how much time they can spend working, by union contracts and federal law. This limits the ability of an airline to swap crews like sacks of flour.)

In a perfectly predictable world, we can design, build, and operate systems with no slack. But the world is not perfectly predictable. The world surprises us, and slack helps us cope with those surprises. Extra processing capacity is useful when demand spikes. Extra money is useful for many events, from car crashes to broken water heaters to layoffs.

Slack has value. It buffers us from harsh consequences.

United ran their system with little slack, was subjected to demands greater than expected, and suffered consequences. But this is not really about United or airlines or booking systems. This is about project management, system design, budgeting, and just about any other human activity.

I'm not recommending that you build slack into your systems. I'm not insisting that airlines always leave a few empty seats on each flight.

I'm recommending that you consider slack, and that you make a conscious choice about it. Slack has a cost. It also has benefits. Which has the greater value for you depends on your situation. But don't strive to eliminate slack without thought.

Examine. Evaluate. Think. And then decide.

Sunday, April 9, 2017

Looking inwards and outwards

It's easy to categorize languages. Compiled versus interpreted. Static typing versus dynamic. Strongly typed versus weakly typed. Strict syntax versus liberal. Procedural. Object-oriented. Functional. Languages we like; languages we dislike.

One categorization I have not seen is the mechanism for assuring quality. Its obscurity is not a surprise -- the mechanisms are more a function of the community than of the language itself.

Quality assurance tends to fall into two categories: syntax checking and unit tests. Both aim to verify that programs perform as expected. The former relies on features of the language, the latter relies on tests that are external to the language (or at least external to the compiler or interpreter).

Interestingly, there is a correlation between execution type (compiled or interpreted) and assurance type (language features or tests). Compiled languages (C, C++, C#) tend to rely on features of the language to ensure correctness. Interpreted languages (Perl, Python, Ruby) tend to rely on external tests.

That interpreted languages rely on external tests is not a surprise. The languages are designed for flexibility and do not have the concepts needed to verify the correctness of code. Ruby especially supports the ability to modify objects and classes at runtime, which means that static code analysis must be either extremely limited or extremely sophisticated.
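
Python, which I also listed among the interpreted languages, allows the same kind of runtime modification. Here is a minimal sketch (the Invoice class and the add_tax function are invented for illustration); a static scan of the class definition alone cannot know that add_tax will exist:

class Invoice:
    def __init__(self, amount):
        self.amount = amount

def add_tax(self, rate):
    return self.amount * (1 + rate)

# The method is attached at runtime; until this line executes,
# the class has no 'add_tax' member for a static analyzer to find.
Invoice.add_tax = add_tax

print(Invoice(100).add_tax(0.05))   # prints 105.0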

That compiled languages (and the languages I mentioned are strongly and statically typed) rely on features of the language is also not a surprise. IDEs such as Visual Studio can leverage the typing of the language and analyze the code relatively easily.

We could use tests to verify the behavior of compiled code. Some projects do. But many do not, and I surmise from the behavior of most projects that it is easier to analyze the code than it is to build and run tests. That matches my experience. On some projects, I have refactored code (renaming classes or member variables) and checked in changes after recompiling and without running tests. In these cases, the syntax checking of the compiler is sufficient to ensure quality.

But I think that tests will win out in the end. My reasoning is: language features such as strong typing and static analysis are inward-looking. They verify that the code meets certain syntactic requirements.

Tests, when done right, look not at the code but at the requirements. Good tests are built on requirements, not code syntax. As such, tests are more aligned with the user's needs, and not the techniques used to build the code. Tests are more "in touch" with the actual needs of the system.
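
As a sketch of what I mean, here is a small Python test using the standard unittest module (the shipping rule is invented for illustration). The test names and assertions come from a stated requirement -- "orders of 100.00 or more ship free" -- not from the structure of the code:

import unittest

def shipping_cost(order_total):
    # Requirement: orders of 100.00 or more ship free; otherwise shipping is a flat 7.50.
    return 0.0 if order_total >= 100.00 else 7.50

class ShippingRequirements(unittest.TestCase):
    def test_large_orders_ship_free(self):
        self.assertEqual(shipping_cost(150.00), 0.0)

    def test_small_orders_pay_flat_rate(self):
        self.assertEqual(shipping_cost(20.00), 7.50)

if __name__ == "__main__":
    unittest.main()

If the requirement changes, the tests change; the internal structure of the code is free to vary.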

The syntax requirements of languages are inward looking. They verify that the code conforms to a set of rules. (This isn't bad, and at times I want C and C++ compilers to require indentation much like Python does.) But conforming to rules, while nice (and possibly necessary), is not sufficient.

Quality software requires looking inward and outward. Good code is easy to read (and easy to change). Good code also performs the necessary tasks, and it is tests -- and only tests -- that can verify that.

Monday, March 27, 2017

The provenance of variables

Just about every programming language has the concept of a 'variable', a container of a value. Variables are named 'variable' because their contents can vary -- as opposed to constants. The statement

a = 10

assigns the variable 'a' a value of 10, denoted in the program as a constant.

(There are some languages which allow for the values of constants to be changed, so one can assign a new value to a constant. It leads to unusual results and is often considered a defect. But I digress.)

The nice thing about variables is that they can vary. The problem with variables is that they vary.

More specifically, when examining a program (say with a debugger), one can see the contents of a variable but one does not know how that value was calculated. Was it assigned? Was the variable incremented? Did the value come from a constant, or was it calculated? When was it assigned?

Here is an idea: Retain the source of values. Modify the notion of a variable. Instead of being a simple container for a value, hold the value and additional information.

For example, a small program:

file f = open("filename")
a = f.read()
b = f.read()
c = (a - b) / 100

Let's assume that the file contains the text "20 4", which is the number 20 followed by a space and then the number 4, all in text format.

In today's programming languages, the variables a, b, and c contain values, and nothing else. The variable 'a' contains 20, the variable 'b' contains 4, and the variable 'c' contains 0.16. Yet they contain no information about how those values were derived.

For a small program such as this example, we can easily look at the code and identify the source of the values. But larger programs are a different story. They are often complex, and the source of a value is not obvious.

With provenance, the variables a, b, and c still contain values, and in addition contain information about those values.

The variable 'a' contains the value 20 and the value 'filename', as that was the source of the value. It could also store more information about the file, such as a creation date, a version number (for filesystems that support version numbers), and the position within the file. It could even store the line number of the assignment, allowing the programmer easy access to the source.

The variable 'b' contains similar information.

The variable 'c' contains information about the variables 'a' and 'b', along with their states at the time of assignment. Consider the revised program:

file f = open("filename")
a = f.read()
b = f.read()
c = (a - b) / 100
... more code
a = 0
b = 1
... more code
d = c * 20

In this program, the variable 'd' is assigned the value 3.2 (assuming that the same file is read) and at that assignment, the variable 'c' holds information about 'a' and 'b' with their initial values of 20 and 4, not their current values of 0 and 1. Thus, a developer can examine the assignment to 'd' and understand the value of 'c'.

In addition to developers, provenance may be useful for financial auditors and examiners. Anyone who cares about the origins of a specific value will find provenance helpful.
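
To make the idea concrete, here is a rough sketch in Python. The class name, its fields, and the way it records operations are my own invention, intended only to illustrate the scheme described above:

class Provenant:
    # A value that remembers how it was produced.
    def __init__(self, value, source, inputs=()):
        self.value = value      # the ordinary value
        self.source = source    # e.g. a file name, "constant", or an operation
        self.inputs = inputs    # snapshots of the operands at assignment time

    @staticmethod
    def _wrap(x):
        # plain numbers become constants, so they carry provenance too
        return x if isinstance(x, Provenant) else Provenant(x, "constant")

    def __sub__(self, other):
        other = Provenant._wrap(other)
        return Provenant(self.value - other.value, "subtraction", (self, other))

    def __truediv__(self, other):
        other = Provenant._wrap(other)
        return Provenant(self.value / other.value, "division", (self, other))

    def __mul__(self, other):
        other = Provenant._wrap(other)
        return Provenant(self.value * other.value, "multiplication", (self, other))

# Mimic the example above: the values 20 and 4 come from "filename".
a = Provenant(20, "filename")
b = Provenant(4, "filename")
c = (a - b) / 100

print(c.value)             # 0.16
print(c.source)            # division
print(c.inputs[0].source)  # subtraction -- and its inputs still point to "filename"

Because 'c' holds references to its operands as they were at the moment of assignment, later reassignments of 'a' and 'b' do not disturb the recorded history, which is exactly the behavior described for the revised program.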

Astute readers will already be thinking of the memory requirements for such a scheme. Retaining provenance requires memory -- a lot of memory. A simple variable holding an integer requires four bytes (on many modern systems). With provenance, a 'simple' integer would require the four bytes for the value and as many bytes as required to hold its history. Instead of four bytes, it may require 40, or 400.

Clearly, provenance is not free. It costs memory. It also costs time. Yet the benefits, I think, are clear. So, how to implement it? Some ideas:

- Provenance is needed only when debugging, not during production. Enable it as part of the normal debug information and remove it for 'release' mode. (See the sketch after this list.)
- Provenance can be applied selectively, to a few variables and not to others.
- Provenance can be implemented selectively. Perhaps one needs only a few pieces of information, such as the line number of the assignment. Less information requires less memory.
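
Continuing the sketch from earlier, the debug-only idea might look something like this. The PROVENANCE_ON flag and the tracked() helper are, again, invented for illustration, and the sketch reuses the Provenant class defined above:

import os

# Turn provenance on for debug runs, off for release runs.
PROVENANCE_ON = os.environ.get("PROVENANCE", "off") == "on"

def tracked(value, source):
    # In a debug run, wrap the value and keep its history;
    # in a release run, return the plain value and pay no cost.
    return Provenant(value, source) if PROVENANCE_ON else value

a = tracked(20, "filename")

A compiler or interpreter with built-in provenance could make the same choice automatically, much as debug symbols are included or stripped today.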

Our computing capacity continues to grow. Processor capabilities, memory size, and storage size are all increasing faster than program size. That is, our computers are getting bigger, and they are getting bigger faster than our programs are. All of that 'extra' space should do something for us, right?

Tuesday, March 14, 2017

To fragment or not fragment, that is the question

First there were punch cards, and they were good. They were a nice, neat representation of data. One record on one card -- what could be easier?

Except that record sizes were limited to 80 bytes. And if you dropped a stack, the cards got out of sequence.

Then there were magtapes, and they were good too. Better than cards, because record sizes could be larger than 80 bytes. Also, if you dropped a tape, the data stayed in sequence. But magtapes were still quite similar to cards: data on a magtape was simply a series of records.

At first, there was one "file" on a tape: you started at the beginning, you read the records until the "end-of-file" mark, and you stopped. Later, we figured out that a single tape could hold multiple files, one after the other.

Except that files were always contiguous data. They could not be expanded on a single tape, since the expanded file would write over a portion of the next file. (Also, reading and writing to the same tape was not possible on many systems.)

So we invented magnetic disks and magnetic drums, and they were good too. Magtapes permitted sequential access, which meant reading the entire file and processing it in order. Disks and drums allowed for direct access, which meant you could jump to a position in the file, read or write a record, and then jump somewhere else in the file. We eventually moved away from drums and stayed with disks, for a number of reasons.

Early disks allocated space much like tapes: a disk could contain several files but data for each file was contiguous. Programmers and system operators had to manage disk space, allocating space for files in advance. Like files on magtapes, files on disks were contiguous and could not be expanded, as the expansion would write over the next file.

And then we invented filesystems. (On DEC systems, they were called "directory structures".) Filesystems managed disk space, which meant that programmers and operators didn't have to.

Filesystems store files not as a long sequence of disk space but as collections of blocks, each block holding a number of bytes. Blocks added to a file could be from any area of the disk, not necessarily in line (or even close) to the original set of blocks. By adding or removing blocks, files could grow or shrink as necessary. The dynamic allocation of disk space was great!

Except that files were not contiguous.

When processing a file sequentially, it is faster to access a contiguous file than a non-contiguous file. Each block of data follows its predecessor, so the disk's read/write heads move little. For a non-contiguous file, with blocks of data scattered about the disk, the read/write heads must move from track to track to read each set of blocks. The action of moving the read/write heads takes time, and is therefore considered expensive.

Veteran PC users may remember utility programs which had the specific purpose of defragmenting a disk. They were popular in the 1990s.

Now, Windows defragments disks as an internal task. No third-party software is needed. No action by the user is needed.

To review: We started with punch cards, which were contiguous. Then we moved to magtapes, and files were still contiguous. Then we switched to disks, at first with contiguous files and then with non-contiguous files.

Then we created utility programs to make the non-contiguous files contiguous again.

Now we have SSDs (Solid-State Disks), which are really large chunks of memory with extra logic to hold values when the power is off. But they are still memory, and the cost of non-contiguous data is low. There are no read/write heads to move across a platter (indeed, there is no platter).

So the effort expended by Windows to defragment files (on an SSD) is not buying us better performance. It may be costing us, as the "defrag" process does consume CPU and does write to the SSD, and SSDs have a limited number of write operations in their lifespan.

So now, perhaps, we're going back to non-contiguous.

Tennis, anyone?

Thursday, February 23, 2017

The (possibly horrifying) killer app for AI

The original (and so far only) "killer app" was the spreadsheet. The specific spreadsheet was VisiCalc (or Lotus 1-2-3, depending on who you ask) and it was the compelling reason to get a personal computer.

We may see a killer app for AI, and from a completely unexpected direction: performance reviews.

Employee performance reviews, in large companies, often work as follows: each employee is rated on a number of items, frequently on a scale from 1 to 5 and sometimes as "meets expectations" or "needs improvement". Items range from meeting budgets and delivery dates to soft skills such as communication and leadership.

HR works to ensure that performance reviews are administered fairly, which means as consistently as possible, which often means "one size fits all". Everyone in the organization, from the entry-level developer to the vice president of accounting, gets the same performance review form and topics. This leads to developers being rated on "meeting budgets" and vice presidents of accounting being rated on "meeting delivery dates".

Just about everyone fears and dislikes the process. Employees dread the annual (or semiannual) review. Managers take no joy in it either.

This is where AI may be attractive.

Instead of a human-driven process, a company may look for an AI-driven process. The human-administered process is rife with potential for inconsistencies (including favoritism) and opens the company to lawsuits. Instead of expending effort to enforce consistent criteria, HR may choose to implement AI for performance reviews. (Managers may have little say in the decision, and many may be secretly relieved at such a change.)

This is a possibly horrifying concept. The mere idea of a computer (which is what AI is, at bottom) rating and ranking employees may be unwelcome among the ranks. The fear of "computer overlords" from the 1960s is still with us, and I suspect few companies would want to be the first to implement such a system.

I recognize that such a system cannot work in a vacuum. It would need input, starting with a list of job responsibilities, assigned tasks and deadlines, and status reports. Early versions will most likely get many things wrong. Over time, I expect they will improve.

Should we move to AI for performance reviews, I have some observations.

First, AI performance review systems may move outside of companies. Just as payroll processing is often outsourced, performance review systems might be outsourced too. The driver is risk avoidance: companies that build their own performance review AI systems may unintentionally build in subtle discrimination against women or minorities. An external supplier would have to warrant that their system conforms to anti-discrimination laws -- a benefit to the client company.

Second, automating performance reviews could mean more frequent reviews, and more frequent feedback to employees. The choice of annual as a frequency for performance reviews is driven, I suspect, by two factors. First, reviews are needed to justify changes in compensation. Second, they are expensive to administer. The former mandates at least one per year; the latter discourages anything more frequent.

But automating performance reviews should reduce effort and cost. Or at least reduce the marginal cost for reviews beyond the annual review.

Another result of more frequent performance reviews? More frequent information to management about the state of their workforce.

In sum, AI offers a way to reduce cost and risk in performance reviews. It also offers more frequent feedback to employees and more frequent information to management. I see advantages to the use of AI for this despised task.

Now all we need to do is bell the cat.