Saturday, January 28, 2017

A multitude of virtual machines

In IT, the term "virtual machine" has multiple meanings. We use the term to identify pretend servers with pretend disk space and pretend devices, all hosted on a real (or "physical") server. Cloud computing and even plain old (non-cloud) data centers have multiple instances of virtual machines.

We also use the term to identify the pretend processors used by various programming languages. Java has its JVM, C# has the .NET processor, and other languages have their own imaginary processors. It is an old concept, made popular by Java in the mid-1990s but going back to the 1980s with the UCSD p-System and even into the 1960s.

The two types of virtual machines are complementary. The former duplicates the hardware (usually for a PC) and provides virtual instances of everything in a computer: disk, graphics card, network card, USB and serial ports, and even a floppy disk (if you want). The one thing it doesn't emulate is the processor; the hypervisor (the program controlling the virtual machines) runs the guest's instructions on the physical processor.

The latter is a fictitious processor (with a fictitious instruction set) that is emulated by software on the physical processor. It has no associated hardware, and the term "virtual processor" might have been a better choice. (I have no hope of changing the name now, but I will use the term for this essay.)
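To make the idea concrete, here is a minimal sketch in Python of what a virtual processor is at heart: a software loop that fetches and executes instructions no physical chip understands. The three-instruction set here is invented purely for this example.

```python
# A toy virtual processor: a made-up instruction set, executed by software.
# The instruction names (PUSH, ADD, PRINT) are invented for this sketch;
# they do not correspond to any real bytecode format.

def run(program):
    stack = []                      # the virtual machine's only storage
    pc = 0                          # program counter into the instruction list
    while pc < len(program):
        op, *args = program[pc]
        if op == "PUSH":
            stack.append(args[0])   # place a constant on the stack
        elif op == "ADD":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)     # replace the top two values with their sum
        elif op == "PRINT":
            print(stack.pop())
        else:
            raise ValueError(f"unknown instruction: {op}")
        pc += 1

# "Compile" once to this instruction set, run anywhere Python runs.
run([("PUSH", 2), ("PUSH", 3), ("ADD",), ("PRINT",)])   # prints 5
```

A real virtual processor (the JVM, the CLR, BEAM) adds a much richer instruction set, a garbage collector, and usually a just-in-time compiler, but the core is the same loop.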

It is the virtual processor that interests me. Or rather, it is the number of virtual processors that exist today.

We are blessed (or cursed) with a large number of virtual processors. Oracle's Java uses one called the "JVM". Microsoft uses one called the "CLR" (for "Common Language Runtime"). Perl uses a virtual processor (two, actually; one for Perl 5 and a different one for Perl 6). Python uses a virtual processor. Ruby, Erlang, PHP, and JavaScript all use virtual processors.

We are awash in virtual processors. It seems that each language has its own, but that's not true. The languages Groovy, Scala, Clojure, Kotlin, JRuby, and Jython all run on the JVM. Microsoft's CLR runs C#, F#, VB.NET, IronPython, and IronRuby. Even BEAM, the virtual processor for Erlang, supports "joxa", "lfe", "efene", "elixir", "eml", and others.

I will point out that not every language uses a virtual processor. C, C++, Go, and Swift all produce executable code. Their code runs on the real processor. While more efficient, an executable is bound to the processor instruction set, and you must recompile to run on a different processor.

But back to virtual processors. We have a large number of virtual processors. And I have to think: "We've been here before".

The PC world long ago settled on the Intel x86 architecture. Before it did, we had a number of processors, from Intel (the 8080, 8085, 8086, and 8088), Zilog (the Z-80), Motorola (the 6800, 6808, and 6809), and MOS (the 6502).

The mainframe world saw many processors, before the rise of the IBM System/360 processor. Its derivatives are now the standard for mainframes.

Will we converge on a single architecture for virtual processors? I see no reason for such convergence in the commercial languages. Oracle and Microsoft have nothing to gain by adopting the other's technology. Indeed, one using the other would make them beholden to the competition for improvements and corrections.

The open source community is different, and may see convergence. An independent project, providing support for open source languages, may be possible. It may also make sense, allowing the language maintainers to focus on their language-specific features and remove the burden of maintaining the virtual processor. An important factor in such a common virtual processor is the interaction between the language and the virtual processor.

Open source has fragmented and consolidated other components before. Sometimes we settle on a single solution, sometimes on a few. The kernel settled on Linux. The windowing system settled on X, with desktop environments such as KDE built on top. File systems and compiler back ends have likewise narrowed to a handful of choices.

Why not the virtual processor?

Sunday, January 22, 2017

The spreadsheet is a dinosaur

Spreadsheets are dinosaurs. Or, more specifically, our current notion of a spreadsheet is a dinosaur, a relic from a previous age.

It's not that spreadsheets have not changed. They have changed over the years, mostly by accumulation. Features have been added, but the core concepts have remained the same.

The original spreadsheet was VisiCalc, written for the Apple II in the late 1970s. And while spreadsheets have expanded their capacity and added charts and fonts and database connections, the original concept -- a grid of values and formulas -- has not changed. If we had a time machine, we could pluck a random VisiCalc user out of 1979, whisk him to 2017, put him in front of a computer running the latest version of Excel, and he would know what to do. (Aside, perhaps, from the mouse or the touchscreen.)

Spreadsheets are quite the contrast to programming languages and IDEs, which have evolved in that same period. Programming languages have acquired discipline. IDEs have improved editing, syntax highlighting, and debugging. The development process has shifted from "waterfall" to "agile" methods.

Could we improve spreadsheets as we have improved programming languages?

Let's begin by recognizing that improvements are subjective, for both spreadsheets and programming languages. Pascal's adherence to structured programming concepts was lauded as progress by some and decried as oppressive by others. Users of spreadsheets are probably just as opinionated as programmers, so let's avoid the term "improvement" and instead focus on "rigor": Can we improve the rigor of spreadsheets, and assume that improved rigor is accepted as a good thing?

Here are some possible ways to add rigor to spreadsheets:

No forward references: Current spreadsheets allow formulas to reference any cell in the sheet. A formula may use values that are calculated "later" in the sheet, below or to the right. Spreadsheets are relatively clever at determining the proper sequence of calculation, so this is not necessarily a problem. It becomes one when a chain of calculations is self-referencing, or "cyclic". Spreadsheets also have logic to identify cyclic calculations, but the work of fixing them is left to the human.

Removing forward references prevents cyclic calculations. By removing forward references, we limit the cells that a formula can use. Instead of using any cell, a formula may use only cells above and to the left. (Thus the top-left cell may contain a value but not a formula.) With such limits in place, any formula can use only those items that have already been defined, and none of those items can use the current formula.

Not everyone may want to consider the top left corner the "origin". We could allow each sheet to have an "origin corner" (top left, top right, bottom left, or bottom right) and require formulas to use cells in the direction of the origin.
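To make the rule concrete, here is a minimal sketch in Python, assuming a top-left origin and plain references like "B3" (a real checker would also handle absolute references, ranges, and cross-sheet references):

```python
import re

# A sketch of a "no forward references" check, assuming a top-left origin:
# a formula may reference only cells at or above its row AND at or to the
# left of its column, and never itself. Cells ordered this way form a
# partial order, so cyclic references become impossible.

def parse_ref(ref):
    """Split a reference like 'B3' into (column_number, row_number)."""
    letters, digits = re.fullmatch(r"([A-Z]+)([0-9]+)", ref).groups()
    col = 0
    for ch in letters:                    # 'A' -> 1, 'B' -> 2, ..., 'AA' -> 27
        col = col * 26 + (ord(ch) - ord("A") + 1)
    return col, int(digits)

def forward_references(formula_cell, referenced_cells):
    """Return the references that break the rule for the given formula cell."""
    fcol, frow = parse_ref(formula_cell)
    bad = []
    for ref in referenced_cells:
        col, row = parse_ref(ref)
        if col > fcol or row > frow or ref == formula_cell:
            bad.append(ref)
    return bad

print(forward_references("C5", ["A1", "B4", "D2", "C6", "C5"]))  # ['D2', 'C6', 'C5']
```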

Smaller sheets: Current spreadsheets allow for very large numbers of rows and columns. Large sheets were useful before spreadsheets could be linked together. Once spreadsheets could be linked, the need for very large sheets evaporated. (Although we humans still too often think that bigger is better.) Smaller sheets force one to organize data. I once worked with a spreadsheet that allowed 52 columns and 128 rows per sheet. At first it was difficult, but with time I learned to work within the restrictions, and my sheets had better structure. Also, it was easier to find and resolve errors.

No absolute coordinates: Absolute coordinates, as opposed to relative coordinates, are a hack dating back to the original spreadsheets. They are useful when replicating a formula across multiple cells and you want to override the default behavior of adjusting cell references.

Instead of absolute coordinates, I find it better to use a named range. (Even for a single cell.) The effect on calculations is the same, and the name of the range provides better information to the reviewer of the spreadsheet.

No coordinates in formulas: Extending the last idea, force the use of named ranges for all calculations. (Perhaps this is the programmer in me, familiar with variable names.) Don't use cell references ("A4" or "C15"); require a range name for every input to the formula.
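Here is a sketch of how such a rule might be checked, assuming the formula text is available; the pattern is deliberately simple and a real implementation would handle more corner cases:

```python
import re

# Hypothetical check for the "no coordinates in formulas" rule: scan a
# formula's text and flag anything that looks like a cell reference
# (A4, C15, $B$2, Sheet2!D7), with or without absolute-reference dollar signs.
# A real implementation would also need to skip function names such as LOG10.
CELL_REF = re.compile(r"(?:\w+!)?\$?[A-Z]{1,3}\$?[0-9]+\b")

def coordinate_references(formula_text):
    """Return the raw cell references found in a formula's text."""
    return CELL_REF.findall(formula_text)

print(coordinate_references("=A4 * TaxRate + $C$15"))   # ['A4', '$C$15']
print(coordinate_references("=Subtotal * TaxRate"))     # []
```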

Better auditing: The auditing capabilities of Excel are nice, but I find them frustrating and difficult to use. Microsoft chose a visual method for auditing; I would prefer an extraction of all formulas, as text, for analysis.
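As a sketch of the kind of extraction I have in mind (assuming an .xlsx workbook and the openpyxl library; the filename here is made up), a few lines can dump every formula with its location:

```python
# A sketch of formula extraction, assuming an .xlsx workbook and the
# openpyxl library (pip install openpyxl). The filename is hypothetical.
from openpyxl import load_workbook

workbook = load_workbook("budget.xlsx", data_only=False)   # keep formulas, not results

for sheet in workbook.worksheets:
    for row in sheet.iter_rows():
        for cell in row:
            value = cell.value
            if isinstance(value, str) and value.startswith("="):
                # e.g.  Summary!C15  =SUM(C2:C14)
                print(f"{sheet.title}!{cell.coordinate}  {value}")
```

Once the formulas are plain text, they can be searched, compared, and reviewed like any other code.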

Import and export controls on sheets: This is an expansion of the "no forward references" idea. It is easy to retrieve values from other sheets, perhaps too easy. One can set up cyclic dependencies across sheets, with sheets mutually dependent on each other's calculations. Specifying the values that may be retrieved from a sheet (similar to an "export" declaration in some languages) limits the values exposed and forces the author to think about each export.

Of course, it would be easy to simply export everything. This avoids thinking and making decisions. To discourage this behavior, we would need a cost mechanism, some penalty for each exposed value. The more values you expose, the more you have to pay. (Rather than a dollar penalty, it may be a quality rating on the spreadsheet.)
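Here is a minimal sketch of what such a check might look like, assuming each sheet declares its exports as named ranges and the cross-sheet references have already been extracted from the formulas (the sheet and range names are invented for the example):

```python
# Hypothetical export declarations: each sheet lists the names it exposes.
exports = {
    "Detail":  ["Subtotal", "TaxRate"],
    "Summary": ["GrandTotal"],
}

# Cross-sheet references pulled from formulas: (using sheet, source sheet, name).
references = [
    ("Summary", "Detail", "Subtotal"),    # Summary uses Detail!Subtotal   -- fine
    ("Summary", "Detail", "Headcount"),   # Headcount was never exported   -- flagged
]

for user, source, name in references:
    if name not in exports.get(source, []):
        print(f"{user} uses {source}!{name}, which {source} does not export")
```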

None of these changes come for free. All of these changes have the potential to break existing spreadsheets. Yet I think we will see some movement towards them. We rely on spreadsheets for critical calculations, and we need confidence that the computations are correct. Improved rigor builds that confidence.

We may not see a demand for rigor immediately. It may take a significant failure, or a number of failures, before managers and executives demand more from spreadsheet users. When they do, spreadsheet users will demand more from spreadsheets.

Monday, January 16, 2017

Discipline in programming

Programming has changed over the years. We've created new languages and added features to existing languages. Old languages that many consider obsolete are still in use, and still changing. (COBOL and C++ are two examples.)

Looking at individual changes, it is difficult to see a general pattern. But stepping back and getting a broader view, we can see that the major changes have increased discipline and rigor.

The first major change was the use of high-level languages in place of assembly language. Using high-level languages provided some degree of portability across different hardware (one could, theoretically, run the same FORTRAN program on IBM, Honeywell, and Burroughs mainframes). It meant a distant relationship with the hardware and a reliance on the compiler writers.

The next change was structured programming. It changed our notions of flow control, introducing "while", "if/then/else", and "for" structures and discouraging the use of "goto".

Then we adopted relational databases, separate from the application program. They required using an API (later standardized as SQL) rather than accessing data directly, and they required thought and planning for the database.

Relational databases forced us to organize data stored on disk. Object-oriented programming forced us to organize data in memory. We needed object models and, for very large projects, separate teams to manage those models.

Each of these changes added discipline to programming. The shift to compilers required reliable compilers and reliable vendors to support them. Structured programming applied rigor to the sequence of computation. Relational databases applied rigor to the organization of data stored outside of memory, that is, on disk. Object-oriented programming applied rigor to the organization of data stored in memory.

I should note that each of these changes was opposed. Each had naysayers, usually basing their arguments on performance. And to be fair, the initial implementation of each change did have lower performance than the old way. Yet each change had a group of advocates (I call them "the Pascal crowd", after the early devotees to that language) who pushed for the change. Eventually, the new methods were improved and accepted.

The overall trend is towards rigor and discipline. In other words, the Pascal crowd has consistently won the debates.

Which is why, when looking ahead, I think future changes will keep moving in the direction of rigor and discipline. There may be minor deviations from this path, with new languages introducing undisciplined concepts, but I suspect that they will languish. The successful languages will require more thought and more planning, and will prevent more "dangerous" operations.

Functional programming is promising. It applies rigor to the state of our program. Functional programming languages use immutable objects, which, once made, cannot be changed. As the state of the program is the sum of the state of all variables, functional programming demands more thought about the state of our system. That fits the overall trend.
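A small illustration, in Python rather than a true functional language, of what immutability demands: you cannot modify a value in place, so every change in state produces a new value.

```python
from dataclasses import dataclass, replace

# An immutable record: once created, it cannot be modified in place.
@dataclass(frozen=True)
class Account:
    owner: str
    balance: int

def deposit(account, amount):
    # Instead of mutating the account, return a new one with the new state.
    return replace(account, balance=account.balance + amount)

before = Account("Alice", 100)
after = deposit(before, 50)

print(before.balance, after.balance)   # 100 150  (the original is unchanged)
# before.balance = 0   # would raise dataclasses.FrozenInstanceError
```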

So I expect that functional languages, like structured languages and object-oriented languages, will be gradually adopted and their style will be accepted as normal. And I expect more changes, all in the direction of improved rigor and discipline.

Wednesday, January 11, 2017

Microsoft's last "we do it all" project

Today's Microsoft is different from the "evil empire" of yesteryear. Today, Microsoft embraces open source and supports non-Microsoft operating systems, languages, databases, and tools.

But it wasn't always that way.

In an earlier age, Microsoft was the empire. They were big, but the reason people considered them an empire was their attitude. Microsoft had answers for all of your computing needs. Operating system. Utilities. Office suite, including e-mail. Database. Development tools. Accounting packages. Project Management. Browser. The goal was for Microsoft to be the sole source for your computing needs.

Microsoft ensured this by making its tools more capable, more performant (is that a word?), more reliable, and more integrated with other Microsoft technologies than the competition's offerings.

One weakness was the command-line shell, CMD.exe, or as it was known early on, the "DOS box". CMD.exe was a direct clone of the command-line interface from MS-DOS, which was initially a clone of the CP/M command-line interface (itself a copy of DEC's command-line interfaces). Microsoft extended the MS-DOS interface over the years, and even added features in the Windows version.

But Microsoft had to stay compatible with earlier versions, and features had to be inserted into the shell "language", often resulting in a clunky syntax. The decision in MS-DOS to allow the slash character as an option specifier meant that directories had to be separated by backslash. That meant that backslashes could not be used as escape characters (until they could, but only for the double-quote character). Variable names had to be signified with a percent sign, as a dollar sign was allowed as part of a file name (that, too, dated back to CP/M). The compromises cascaded over the years, and the result was a lot of complaints to Microsoft, mostly from developers. (Microsoft gave weight to the opinions of developers, as it knew they were important for future applications.)

Microsoft needed an answer to the complaints. As an empire, Microsoft needed to provide a better shell. They had to provide the best, a product better than the competition. To meet that need, they invented PowerShell.

PowerShell was Microsoft's bigger, better, comprehensive shell. It would fix the problems of CMD and it would offer all needed capabilities. You would not need a competing shell. It had everything, and it was better than all other shells. Its commands were descriptive, not cryptic. Options to commands were consistent. It could run scripts. It had variables (with the 'proper' syntax of dollar signs). It had multiple scopes for variables (something lacking in other shells). It allowed for "pipelining" of commands, and it could pass not just text streams but full .NET objects in the pipeline. It allowed for hooks into the .NET framework.

PowerShell was a shell "done right", with everything you could possibly need.

And it was the last product of Microsoft's "we do it all" strategy.

The problem for Microsoft (and any empire) is that no matter how large you get, the world is always bigger. And since the world is bigger, you cannot provide everything for everyone. No matter how fast or powerful you make your products, someone will want something else, perhaps something small and light. All-encompassing empires are expensive to build and expensive to maintain. Microsoft has come to terms with that concept, and changed its product offerings. Microsoft Azure allows for non-Microsoft technologies such as Linux and Python and PHP. Windows now includes a WSL (Windows Subsystem for Linux) component that runs bash, a popular shell for Linux.

I think this change is good for Microsoft, good for its customers, and good for the industry. For Microsoft, they no longer have to build (and maintain and support) products for everything -- they can focus on their strengths and deliver well-designed and well-supported products without being distracted. Microsoft's customers have a little more work to do, analyzing non-Microsoft products as part of their technology stack. (They can choose to remain with all Microsoft products, but they may miss out on some opportunities.)

The industry, too, benefits. For too long, Microsoft's strategy of supplying everything intimidated people from entering the market. Why invest time and money in a new product for Windows only to see modest success be met with tough competition from Microsoft? I believe that many folks left the Microsoft ecosystem for that reason.

Of course, now Microsoft can concentrate its efforts on its key products and services -- which may change over time. Microsoft may move into new markets; don't think that they will ignore opportunities. But they will enter as a competitor, not as "the evil empire".

Tuesday, January 3, 2017

Predictions for 2017

What will happen in the new year? Let's make some predictions!

Cloud computing and containers will remain popular.

Ransomware will become more prevalent, with a few big name companies (and a number of smaller companies) suffering infections. Individuals will be affected as well. Companies may be spurred to improve their security; "traditional" malware was annoying but ransomware stops operations and costs actual money. Earlier virus programs would require effort from the support team to resolve, and that expense could be conveniently ignored by managers. But this new breed of malware requires an actual payment, and that is harder to ignore. I expect a louder cry for secure operating systems and applications, but effective changes will take time (years).

Artificial Intelligence and Machine Learning will be discussed. A few big players will advertise projects. They will have little effect on "the little guy", small companies, and slow-moving organizations.

Apple will continue to lead the design for laptops and phones. Laptop computers from other manufacturers will lose DVD readers and switch to USB-C (following Apple's design for the MacBook). Apple itself will look for ways to distinguish its MacBooks from other manufacturers' laptops.

Tablet sales will remain weak. We don't know what to do with tablets, at home or in the office. They fill a niche between phones and laptops, but if you have those two you don't need a tablet. If you have a phone and are considering an additional device, the laptop is the better choice. If you have a laptop and are considering an additional device, the phone is the better choice. Tablets offer no unique abilities.

Laptop sales will remain strong. Desktop sales will decline. There is little need for a tower PC, and the prices for laptops are in line with prices for desktops. Laptops offer portability, which is good for telework or group meetings. Tower PCs offer expansion slots, which are good for... um, very little in today's offices.

Tower PCs won't die. They will remain the PC of choice for games, and for specific applications that need the processing power of GPUs. Some manufacturers may drop the desktop configurations, and the remaining manufacturers will be able to raise prices. I won't guess at who will stay in the desktop market.

Amazon.com will grow cloud services but lose market share to Microsoft and Google, who will grow at faster rates. Several small cloud providers will cease operations. If you're using a small provider of cloud services, be prepared to move.

Programming languages will continue to fracture. (Witness the declining scores of the top languages on http://www.tiobe.com/tiobe-index/.) The long trend has been to move away from a few dominant languages and towards a collection of mildly popular languages. This change makes life uncomfortable for managers, because there is no one "safe" language that is "the best" for corporate development. But fear not, because...

Vendor relationships will continue to define the best programming languages for your projects: Java with Oracle, C# with Microsoft, Swift with Apple. If you are a Microsoft shop, your best language is C#. (You may consider F# for special projects.) If you are developing iOS applications, your best language is Swift. For Android apps, you want Java. Managers need not worry too much about difficult decisions for programming languages.

Those are my ideas for the new year. Let's see what really happens!