Tuesday, June 21, 2016

Compilers and interpreters

Programming languages (with a few exceptions) fall into one of two categories: compiled or interpreted.

Compilers are the natural descendants of assemblers. Assemblers convert text representations of processor-specific operation codes into machine-readable form; compilers convert high-level programs into machine-readable form. Interpreters, on the other hand, read high-level programs and process them, without producing an "executable".

Both forms have advantages. Compiled programs execute faster, and the source code can remain hidden from users, who need only the executable form. Interpreted programs may be slower, but the process of writing (and debugging) tends to be faster and interpreted languages have flexibilities not available in compiled languages.

Programming languages are sometimes created by individuals working without specific sponsorship and direction from a corporation (I call them "enthusiasts"). Other languages are created by corporations, in large, well-planned and well-justified projects.

But is one technique more popular than another? Let's look at the list of popular (according to tiobe.com) languages. Here are the top languages, who created them, whether they are compiled or interpreted, and when they were created:

Java: corporation (Sun); compiled; 1990s
C: enthusiasts (Kernighan and Ritchie); compiled; 1970s
C++: enthusiast (Stroustrup); compiled, derived from C; 1980s
Python: enthusiast (van Rossum); interpreted; 1990s
C#: corporation (Microsoft); compiled; 2000s
PHP: enthusiast (Lerdorf); interpreted; 1990s
JavaScript: individual (Eich); interpreted; 1990s
Perl: enthusiast (Larry Wall); interpreted; 1980s
VB.NET: corporation (Microsoft); compiled; 2000s
Ruby: enthusiast (Matsumoto); interpreted; 1990s
Delphi: corporation (Borland); compiled, derived from Pascal; 1990s
Swift: corporation (Apple); compiled; 2010s
Objective-C: enthusiasts (Cox and Love); compiled, derived from C; 1980s
R: enthusiasts (Ihaka and Gentleman); interpreted, derived from S; 1990s
Matlab: enthusiast (Moler); interpreted; 1970s
SQL: enthusiast (Codd); interpreted; 1970s
D: corporation (Digital Mars); compiled; 2000s
COBOL: government consortium; compiled; 1950s

From this list, a few things are obvious. First, we've invented both compiled and interpreted languages. Second, we've invented both over the age of computers, and continue to do so. It's not that a particular type of language was a fad or has fallen out of favor.

Look at the relationship between the type of creator and the language. Enthusiasts create interpreted languages and corporations to create compiled languages. The list above would match this rule perfectly, except for C. (C++ and Objective-C, derived from C, would naturally be compiled.)

But this is a short list, and small sample sizes may be deceptive. Let's look at some more:

APL: enthusiast (Iverson); interpreted; 1950s
BASIC: enthusiasts (Kemeny and Kurtz); interpreted; 1960s
S: enthusiasts (Becker, Wilks, Chambers); interpreted; 1970s
Fortran: corporation (IBM): compiled, derived from assembly language; 1950s
Pascal: enthusiast (Wirth); compiled; 1960s
Eiffel: enthusiast (Meyer); compiled; 1990s
Forth: enthusiast (Moore); interpreted; 1960s
dBase: enthusiast (Ratliff); interpreted; 1970s
Ada: government agency: compiled; 1970s
PL/I: corporation (IBM); compiled; 1960s
Prolog: enthusiasts (Colmerauer, et al.); interpreted; 1970s
AWK: enthusiasts (Aho, Weinberger, and Kernighan); interpreted; 1970s
DIBOL: corporation (DEC); compiled; 1970s
FOCAL: enthusiast (Merrill); interpreted; 1960s

This expanded shows that enthusiasts *tend* to create interpreted languages but not always. Corporations create compiled languages, though. The only interpreted language created by a corporation might be SQL, created by IBM but I've assigned it to E.F. Codd as an enthusiast.

I'm not sure why enthusiasts would create interpreted languages. Perhaps its more fun that way. Perhaps its easier. Interpreted languages let you stop a running program, examine the innards of your interpreter, adjust things, and continue running, all useful when debugging the interpreter.

Astute readers will note that my assignment of "enthusiast" or "corporation" to languages may be a bit loose. The designation is sometimes difficult. Kernighan and Ritchie, when creating C, were working for AT&T's Bell Labs. Are they corporation employees or enthusiasts? E.F. Codd worked for IBM when publishing his thoughts on relational databases. Is he an employee or an enthusiast? Wayne Ratliff was working for NASA's JPL when he wrote the first version of dBase and was part of Ashton-Tate when he wrote dBase II. Does that make him an employee? In all of these cases, I feel the individuals involved were doing what they did more as enthusiasts than employees.

On the flip side, I've placed Java and C# in the "corporation" side. Neither of these languages have individuals strongly associated with their origins. Java was a thing presented to us by Sun; C# was presented by Microsoft. Did the creation of these languages involve passionate individuals? Certainly. Were those individuals working on these projects independent of the corporation's needs? I see no evidence of that. (Yet I can easily see Kernighan and Ritchie working late at night to add features to their C compiler.)

I don't know if the assignment of "corporation" or "enthusiast" to a language's origin is important -- but I don't know that it isn't. It may be that enthusiasts will continue to create interpreted languages, and corporations will continue to create compiled languages.

I do think it significant that Java and C# live in between, Java with its JVM and C# with its CLR. Perl and Python have moved in that direction, too. They gain some benefits of interpreted languages and retain some benefits of compiled languages. I expect we will see more languages that use these techniques.

One more thing. Two other recently developed languages:

Go: corporation (Google); compiler; 2010s
Checked C: corporation (Microsoft); compiled, derived from C; 2010s

So maybe everyone isn't jumping on the "semi-interpreted" wagon.

Thursday, June 2, 2016

The big improvement in programming forty years ago

Programming has been around since the beginning of computers, and seen lots of improvements: symbolic assembly, high-level compilers (COBOL and FORTRAN), structured programming (Pascal), object-oriented programming (Smalltalk, C++), virtual machines (Java, C#), scripting languages (Perl, Python, Ruby)... the list goes on.

Yet a significant improvement in programming occurred forty years ago. It made programming simple -- so simple that a non-programmer could do it. And it was ignored by the programming community.

That improvement was... the electronic spreadsheet.

Programming, at its core, is the organization of data and the processing of that data with a sequence of instructions. The niceties of data structures, objects, and just-in-time compilation are just that: niceties. They are there for the convenience of the programmer.

So how do spreadsheets come into it? Spreadsheets, at their core, organize data and process that data with a series of instructions. (Sound familiar?)

Spreadsheets -- the basic grid of numbers and formulas, without the charts, pivot tables, and VBA code -- are programs. Any spreadsheet can be converted into just about any language, from Fortran or BASIC to Java or Python. (The reverse is not true; only a few simple programs in BASIC or Python can be converted into spreadsheets.)

The improvement that spreadsheets made to programming was immediacy. The "programmer" could see the results of a change right after making a change. That immediate feedback was not available in compiled languages, which require the programmer to save the file, compile the program, and then run it. (IDEs like Turbo Pascal and Visual Studio make those steps easy, but there is still a delay.) Even interpreted languages like BASIC or Ruby require the steps of saving and running.

This improvement in programming, the immediate results of a change in the program, went unnoticed by the programming community. Visicalc was created in 1979, almost forty years ago. At the time, popular programming languages were BASIC, COBOL, Fortran, and Pascal.

Instead of building on the innovation of the spreadsheet, programmers have gone in other directions. Programmers focused on maintainability (structured programming), larger programs (object-oriented programming), version control, automated testing, and response to changing requirements (agile methods).

There has been no (or very little) effort for the immediate feedback that we get with spreadsheets.

For forty years.

At some point, we are going to invent a new programming language, one that provides immediate feedback. (Perhaps a language, editor, and run-time environment, which is what a spreadsheet is.) The advantages are great, as anyone who works with a spreadsheet can attest.

Sunday, May 22, 2016

Small check-ins saved me

With all of the new technology, from cloud computing to tablets to big data, we can forget the old techniques that help us.

This week, I was helped by one of those simple techniques. The technique that helped was frequent, small check-ins to version control systems. I was using Microsoft's TFS, but this technique works with any system: TFS, Subversion, git, CVS, ... even SourceSafe!

Small, frequent changes are easier to review and easier to revert than large changes. Any version control system accepts small changes; the decision to make large or small changes is up to the developer.

After a number of changes, the team with whom I work discovered a defect, one that had escaped our tests. We knew that it was caused by a recent change -- we tested releases and found that it occurred only in the most recent release. That information limited the introduction of the defect to the most recent forty check-ins.

Forty check-ins may seem like a long list, but we quickly identified the specific check-in by using a binary search technique: get the source from the middle revision; if the error occurs move to the earlier half, if not move to the later half and start in that half's middle.

The real benefit occurred when we found the specific check-in. Since all check-ins were small, this check-in was too. (It was a change of five different lines.) It was easy to review the five individual lines and find the error.

Once we found the error, it was easy to make the correction to the latest version of the code, run our tests (which now included an addition test for the specific problem we found), verify that the fix was correct, and continue our development.

A large check-in would have required much more examination, and more time.

Small check-ins cost little and provide easy verification. Why not use them?

Sunday, May 15, 2016

Agile values clean code; waterfall may but doesn't have to

Agile and Waterfall are different in a number of ways.

Agile promises that your code is always ready to ship. Waterfall promises that the code will be ready on a specific date in the future.

Agile promises that your system passes the tests (at least the tests for code that has been implemented). Waterfall promises that every requested feature will be implemented.

There is another difference between Agile and Waterfall. Agile values clean code; Waterfall values code that performs as intended but has no notion of code quality. The Agile cycle includes a step for refactoring, a time for developers to modify the code and improve its design. The Waterfall method has no corresponding step or phase.

Which is not to say that Waterfall projects always result in poorly designed code. It is possible to build well-designed code with Waterfall. Agile explicitly recognizes the value of clean code and allocates time for correcting design errors. Waterfall, in contrast, has its multiple phases (analysis, design, coding, testing, and deployment) with the assumption that working code is clean code -- or code of acceptable quality.

I have seen (and participated in) a number of Waterfall projects, and the prevailing attitude is that code improvements can always be made later, "as time allows". The problem is that time never allows.

Many project managers have the mindset that developers should be working on features with "business value". Typically these changes fall into one of three categories: feature to increase revenue, features to reduce costs, and defect corrections. The mindset also considers any effort outside of those areas to be not adding value to the business and therefore not worthy of attention.

Improving code quality is an investment in the future. It is positioning the code to handle changes -- in requirements or staff or technology -- and reducing the effort and cost of those changes. In this light, Agile is looking to the future, and waterfall is looking to the past (or perhaps only the current release).

Thursday, May 5, 2016

Where have all the operating systems gone?

We used to have lots of operating systems. Every hardware manufacturer built their own operating systems. Large manufacturers like IBM and DEC had multiple operating systems, introducing new ones with new hardware.

(It's been said that DEC became a computer company by accident. They really wanted to write operating systems, but they needed processors to run the them and compilers and editors to give them something to do, so they ended up building everything. It's a reasonable theory, given the number of operating systems they produced.)

In the 1970s CP/M was an attempt at an operating system for different hardware platforms. It wasn't the first; Unix was designed for multiple platforms prior. It wasn't the only one; the UCSD p-System used a virtual processor quite like the virtual machine in the Java JVM and ran on various hardware.

Today we also see lots of operating systems. Commonly used ones include Windows, Linux, Mac OS, iOS, Android, Chrome OS, and even watchOS. But are they really different?

Android and Chrome OS are really variants on Linux. Linux itself is a clone of Unix. Mac OS is derived from NetBSD which in turn is derived from the Berkeley System Distribution of Unix. iOS and watchOS are, according to Wikipedia, "Unix-like", and I assume that they are slim versions of NetBSD with added components.

Which means that our list of commonly-used operating systems becomes:

  • Windows
  • Unix

That's a rather small list. (I'm excluding the operating systems used for special purposes, such as embedded systems in automobiles or machinery or network routers.)

I'm not sure that this reduction in operating systems, this approach to a monoculture, is a good thing. Nor am I convinced that it is a bad thing. After all, a common operating system (or two commonly-used operating systems) means that lots of people know how they work. It means that software written for one variant can be easily ported to another variant.

I do feel some sadness at the loss of the variety of earlier years. The early days of microcomputers saw wide variations of operating systems, a kind of Cambrian explosion of ideas and implementations. Different vendors offered different ideas, in hardware and software. The industry had a different feel from today's world of uniform PCs and standard Windows installations. (The variances between versions of Windows, or even between the distros of Linux, and much smaller than the differences between a Data General minicomputer and a DEC minicomputer.)

Settling on a single operating system is a way of settling on a solution. We have a problem, and *this* operating system, *this* solution, is how we address it. We've settled on other standards: character sets, languages (C# and Java are not that different), storage devices, and keyboards. Once we pick a solution and make it a standard, we tend to not think about it. (Is anyone thinking of new keyboard layouts? New character sets?) Operating systems seem to be settling.

Sunday, May 1, 2016

Waterfall and agile depend on customer relations

The debate between Agile and Waterfall methods for project management seems to have forgotten about customers, and more specifically the commitments made to customers.

Waterfall and Agile methods differ in that Waterfall promises a specific set of functionality on a specific date, and Agile promises that the product is always ready to ship (but perhaps not with all the features you want). These two methods require different techniques for project management, but also imply different relationships with customers.

It's easy to forget that the customers are the people who actually pay for the product (or service). Too often, we focus on the "internal customer" and think of due dates and service level agreements. But let's think about the traditional customers.

If your business promises specific functionality to customers on specific dates, they you probably want the waterfall project management techniques. Agile methods are a poor fit. They may work for the development and testing of the product, they don't mesh with the schedules developed by people making promises to customers.

Lots of companies promise specific deliverables at a future date. Some companies have to deliver releases in time for other, external events, such as Intuit's TurboTax in time for tax season. Microsoft has announced software in advance, sometimes to deter competition. (Microsoft is not alone in this activity.)

Not every company works this way. Google, for example, upgrades its search page and apps (Google Docs, Google Sheets) when they can. They have never announced -- in advance -- new features for their search page. (They do announce changes, sometimes a short period before they implement them, but not too far in advance.) Amazon.com rolls out changes to its sales pages and web service platform as they can. There is no "summer release" and no analog to a new car model year. And they are successful.

If your company promises new versions (or new products) on specific dates, you may want to manage your projects with the Waterfall method. The Agile method will fit into your schedule poorly, as it doesn't promise what Agile promises.

You may also want to review your company's philosophy towards releases. Do you need to release software on specific dates? Must you follow a rigid release schedule? For all of your releases?

Sunday, April 17, 2016

After the spreadsheet

The spreadsheet is a wonderful thing. It performs several functions: it holds and organizes data, it specifies calculations, and it presents results. All of these functions, wrapped into a single package, provides convenience to users. Yet that convenience of the single package will be the spreadsheet's undoing.

For the individual, the spreadsheet is a useful tool. But for the enterprise, the spreadsheet creates perhaps more problems than it solves. Since a spreadsheet file contains the data, formulas, and presentation of data, they are often replicated to share with co-workers (usually via e-mail) and duplicated to process different sets of data (the spring sales figures and then the summer sales figures, for example).

The replication of spreadsheets via e-mail can me mitigated by the use of shared file locations ("network drives") and by online versions of spreadsheets which allow for multiple users. But the bigger problem is the duplication of spreadsheets with minor changes.

The duplication of spreadsheet means the duplication of not only the data (which is often changed) but also the duplication of the formulas and the presentation (which often do not change). Since a spreadsheet contains all three components, a new version of data requires a new copy of all components. There is no way to share only one component, no way to share formulas against new data, or different presentations against data and formulas. This means that, over time, an enterprise of any size accumulates multiple spreadsheets with different data and duplicate formulas and macros -- at least you hope that they are duplicate copies.

The design of spreadsheets -- containing the data, formulas, and presentation in one package -- is a holdover from the days of Visicalc and Lotus 1-2-3. Those programs were developed for the Apple II and the IBM PC. With their ability to run only one program at a time, putting everything into one program made sense -- using one program for data, another for calculation, and a third for presentation was awkward and time-consuming. But that applies to the old single-tasking operating systems. Windows and Mac OS and Linux allow for multiple programs to run at the same time, and windowing systems allow for multiple programs to present information to the user at the same time.

If spreadsheets were being invented now in the age of web services and cloud systems and multi-window displays, their design would probably be quite different. Instead of a single program that performed all functions and a single file that contained data, formulas and presentation, we might have something very different. We might create a system of web services, providing data with some and performing calculations with others. The results could be displayed by yet other functions in other windows, possibly published for co-workers to view.

Such a multi-component system would follow the tenets of Unix, which recommends small, independent programs that read data, perform some processing, and provide data. The data and computations could be available via web services. A central service could "fan out" requests to collect data from one or more services, send that data through one or more computing services, and the provide the data to a presentation mechanism such as a graph in a window or even a printed report.

By separating the formulas and macros from the data, we can avoid needless duplication of both. (While most cases see the duplication of formulas to handle different data sets, sometimes different formulas can be applied to the same data.)

Providing data via web services is easy -- web services do that today. There are even web services to convert data into graphs. What about calculations? What language can be used to perform computations on data sets?

The traditional languages of C# and Java are not the best here; we're replacing spreadsheets with something equally usable by non-programmers (or at least similarly usable). The best candidate may be R, the statistical-oriented language. R is established, cross-platform, and capable. It's also a high-level language, close the the formulas of spreadsheets (and more powerful that Microsoft's VBA, which is used for macros in Excel).

Replacing spreadsheets with a trio of data management, computation, and presentation tools will not be easy. The advantages of the spreadsheet include convenience and familiarity. The advantages of separate components are better integration in cloud systems, leveraging of web services, and easier audits of formulas. It may not happen soon, but I think it will happen eventually.