Tuesday, March 31, 2015

Our tools shape our languages, and our programs

Our applications are shaped by many forces: budget, available talent, time, languages, and tools.

When designing applications, the influence of the language is sometimes overlooked. Yet it should be obvious to anyone who has used multiple languages: different languages have different capabilities. C and its descendants have the notion of pointers, something absent in COBOL, Fortran, and BASIC. Pointers let one build dynamic, complex data structures; such structures are difficult, if not impossible, to build in pointerless languages.

Our tools have an effect on our languages, which can have an effect on our applications.

In the Microsoft world, C# is a simple language, yet the .NET framework of classes is complex and wordy. A simple application in C# requires a significant amount of typing, and a moderately complex application requires a large amount of typing.

Languages such as Perl, Python, and Ruby require less typing. How is it that the .NET framework is so complex? The answer, I believe, is Visual Studio.

Microsoft's development environment is the result of competition and years (decades?) of development. Over that time, Microsoft has added features: debugging, test case management, design documents, and editing features. One significant editing feature is auto-completion with context-sensitive information. When I am typing a C# program and I start typing the name of a variable, Visual Studio gives me a list of existing variable names. (It also knows that the declaration of a variable is different from the use of a variable, and does not prompt me with existing names when I am declaring a new one.) Visual Studio also lists the methods available on objects and provides templates for their parameters.

This assistance from Visual Studio reduces the burden of typing. Thus, applications that use the .NET framework can appear wordy, but the effort to create them is less than it appears. (Some have argued that the assistance from Visual Studio helps reduce errors, as it suggests only valid possibilities.)

Thus, a development tool can affect our languages (and class frameworks), which in turn can affect our applications. When choosing our tools, we should be aware of their properties and how they can affect our projects.

Tuesday, March 24, 2015

Why Alan Turing is a hero to programmers

We in the programming world have few heroes. That is, we have few heroes who are programmers. There are famous figures such as Bill Gates and Steve Jobs, but they were businessmen and visionaries, not programmers.

Yet we do have a few heroes, a few programmers who have risen above the crowd. Here is a short list:

Grace Hopper: Serving in the Navy, she created and advocated the idea of the compiler. At the time, computers were programmed either by wire (physical wires attached to plug-boards) or in assembly language. A compiler, converting English-like statements into machine instructions, was a bold step.

Donald Knuth: His multi-volume work The Art of Computer Programming is comprehensive, covering machine design, assemblers, compilers, searching, sorting, and the limits of digital computation. He also created the TeX typesetting system and the notion of "Literate Programming".

Brian Kernighan and Dennis Ritchie: Ritchie created the C language; together they wrote the book that defined and popularized it.

Larry Wall: Creator of the Perl language, and also of the 'patch' program used to apply source-code changes across systems.

Fred Brooks: He led the development of IBM's operating system for the System/360 computers, and his book The Mythical Man-Month has several interesting observations on the teams that construct software.

Gerald Weinberg: Known for his books on systems analysis and design; I find his The Psychology of Computer Programming even more useful to programmers and programming team managers.

All of these folks are (or were) smart, creative contributors to the programming art. Yet one has a special place on this list: Alan Turing.

Alan Turing, the subject of the recent movie "The Imitation Game", made significant contributions to the programming craft. Among them:

  • Code-breaking in World War II with the electromechanical Bombe
  • The Turing Test
  • Turing Machines
  • The Halting Problem

All of these are impressive. Turing was many things: mathematician, biologist, philosopher, logician.

Of all of his accomplishments, I consider his proof that the halting problem cannot be solved to be the one act that raises him above our other heroes. His work in code-breaking clearly makes him a programmer. His idea of the Turing Test set clear (if perhaps unreachable) goals for artificial intelligence.

The notion of Turing machines, with the corresponding insight that one Turing machine can emulate another, is brilliant, and would almost be enough to make him a hero above others.
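
To make the universality idea concrete, here is a small sketch in Python: one simulator that can run any Turing machine handed to it as data. The incrementer machine below is just an example made up for the demonstration.

    def run_tm(transitions, tape, state="start", blank="_", max_steps=10000):
        """Simulate any Turing machine given as a transition table.

        transitions maps (state, symbol) -> (new_state, symbol_to_write, move),
        where move is -1 (left) or +1 (right). The machine halts when no
        rule applies to the current (state, symbol) pair."""
        cells = dict(enumerate(tape))      # sparse tape; missing cells are blank
        head = 0
        for _ in range(max_steps):
            symbol = cells.get(head, blank)
            if (state, symbol) not in transitions:
                break                      # no rule: the machine halts
            state, write, move = transitions[(state, symbol)]
            cells[head] = write
            head += move
        return "".join(cells.get(i, blank)
                       for i in range(min(cells), max(cells) + 1)).strip(blank)

    # Example machine: add one to a binary number (most significant bit first).
    increment = {
        ("start", "0"): ("start", "0", +1),  # scan right to the end of the number
        ("start", "1"): ("start", "1", +1),
        ("start", "_"): ("carry", "_", -1),  # past the end; turn around and carry
        ("carry", "1"): ("carry", "0", -1),  # 1 plus carry is 0, carry continues
        ("carry", "0"): ("done",  "1", -1),  # 0 plus carry is 1, finished
        ("carry", "_"): ("done",  "1", -1),  # ran off the front; write a new leading 1
    }

    print(run_tm(increment, "1011"))         # prints 1100  (1011 is 11; 11 + 1 = 12)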

Yet it is the halting problem, or more specifically Turing's proof that no general procedure can determine, in advance, whether an arbitrary program will eventually stop, that pushes him across the line. That proof connects programming to mathematics.
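
Turing's argument itself is short enough to sketch in code. Here is the outline in Python; the halts() function is the hypothetical decider that the proof shows cannot exist, so this is a sketch of the reasoning rather than something to run.

    def halts(program, data):
        """Hypothetical: returns True if program(data) would halt, False if not.
        Turing's proof shows no such function can exist for all programs."""
        raise NotImplementedError("no general halting decider is possible")

    def paradox(program):
        # Do the opposite of whatever halts() predicts about program run on itself.
        if halts(program, program):
            while True:          # halts() said "halts", so loop forever
                pass
        else:
            return               # halts() said "loops forever", so halt at once

    # Now consider paradox(paradox).
    # If halts(paradox, paradox) returns True, paradox(paradox) loops forever.
    # If it returns False, paradox(paradox) halts immediately.
    # Either way the supposed decider gives the wrong answer -- a contradiction,
    # so no correct, general-purpose halts() can be written.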

Mathematics, of course, has been with us for centuries. It is as old as counting, and it has rigorous and durable proofs. Euclid's work is more than two millennia old, yet still used today. It is these proofs that make mathematics special -- math is the "Queen of the Sciences", used by all other branches of knowledge.

Mathematics is not without problems. There is a result called Gödel's Incompleteness Theorem, which states that in any consistent system of axioms (rules) rich enough to include the integers, there exist statements that are true yet cannot be proved within that system of axioms. (The Incompleteness Theorem also states that should you add an axiom to the system to allow such a proof, you will find other statements that are true but not provable in the new system.)

That sounds a lot like the halting problem.

But the Incompleteness Theorem came only after thousands of years of mathematical study, and computing is young; we have had digital computers for less than a century. Perhaps a better analogue is a mathematical surprise that occurred earlier in the history of mathematics.

Perhaps Turing's proof is closer to the discovery of irrational numbers. The Greek philosophers were enchanted with mathematics, both geometry and arithmetic. The reduction of physical phenomena to numbers was a specialty of theirs, and they loved integers and ratios. They were quite surprised (and, by some accounts, horrified) to learn that numbers such as the square root of two cannot be represented by an integer or a ratio of integers.

That kind of problem sounds close to our halting problem. For the Greeks, irrational numbers broke their nice, neat world of integers. For us, the halting problem breaks the nice, neat world of predictable programs. (To be clear, most of our programs do run "to completion" and halt. The theory says only that there is no general way to know, in advance, that an arbitrary program will halt. We simply run them and find out.)

Turing gave us the proof that the halting problem cannot be solved. In doing so, he connected programming to mathematics, and (so I argue) connected us early programmers to the early mathematicians.

And that is why I consider Alan Turing a hero above others.

Thursday, March 19, 2015

The oldest high-level language

When discussing languages, we typically divide them into the groups "high-level" and "low-level".

I was taught this division when I started programming, many moons ago. The explanation I received at the time was that "low-level languages deal with machine-specific issues and high-level languages do not". It works, but it is awkward. It leads to the following:

Low-level languages:
  • Assembly
  • Machine language
Mid-level languages:
  • C
  • C++
  • Lisp
High-level languages:
  • Fortran
  • Cobol
  • BASIC
  • Pascal
  • C#
  • Java
  • Awk
  • Perl
  • Python
  • Ruby
  • Eiffel
  • Erlang
  • Haskell
The awkwardness is the imbalance. Almost every language is considered a high-level language.

If every language ends up "winning" in the high-level category, then the rule has little value. So I started thinking about different ways to divide languages.

The old rule is valid: languages which force the programmer to consider machine-specific details are low-level languages. That concept remains. What we need is a refinement, some concept that can split off some of the high-level languages and push them into the "mid-level" category.

For me, that refinement is memory layout.

If a programming language forces the programmer to think about memory layout, then it is, at best, a mid-level language.

This refinement is consistent with our original rule of machine dependency. Machine language and assembly language are low-level languages (by convention), and they require attention to the layout of items in memory. Indeed, machine language and assembly language require the programmer to think about the layout of instructions as well as data.

C and C++, with their pointers, require a fair amount of attention to memory layout. They also have structs, which aggregate data elements in memory, and they give the programmer a fair amount of control over the alignment of elements within a struct. They stay within the mid-level group.
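
For a rough sense of what "thinking about memory layout" means, here is a sketch using Python's ctypes module (which exists precisely for talking to C); a C programmer sees the same effect with sizeof and offsetof.

    from ctypes import Structure, c_char, c_int32, sizeof

    class Record(Structure):
        # One 1-byte field followed by one 4-byte field.
        _fields_ = [("flag", c_char),
                    ("count", c_int32)]

    print(sizeof(Record))          # typically 8, not 5: padding is inserted
    print(Record.flag.offset)      # 0
    print(Record.count.offset)     # typically 4, so the integer is aligned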

Cobol, normally considered a high-level language, drops to mid-level. Anyone who has programmed in Cobol knows that much attention is given to the layout of not only input and output records but also working storage (what most programmers would call "variables").

Fortran also drops to mid-level. Its handling of arrays, its COMMON blocks, and its EQUIVALENCE statement all make the programmer think about memory allocation.

Pascal too moves to mid-level, due to its pointer concept.

With this refinement, we now have the following list:

Low-level languages:
  • Assembly
  • Machine language
Mid-level languages:
  • C
  • C++
  • Lisp
  • Fortran
  • Cobol
  • Pascal
High-level languages:
  • BASIC
  • C#
  • Java
  • Awk
  • Perl
  • Python
  • Ruby
  • Eiffel
  • Erlang
  • Haskell
That addresses my issue of "balance" -- although the low-level group remains small, at only two members. Perhaps there is a further refinement that would move languages into that group. I am willing to let that wait for another day.

But I do notice that, of the high-level languages, BASIC is the oldest -- and by a fair amount. BASIC was designed in the 1960s; the next oldest are Awk (1970s) and Perl (1987). A programmer using BASIC does not have to worry about memory layout; variables are created and reclaimed transparently. The BASIC language is limited, although it has been extended over time.

So using my refined heuristic, the oldest high-level programming language is BASIC.

Thursday, March 12, 2015

Smart watches are watches after all

The wristwatch has been called the "first wearable computer". Not the smart watch, but the classic, mechanical, wind-up wristwatch. I agree -- with the reservation that pocketwatches were arguably the first wearable computing devices.

Apple made a big splash in the news this week, finally announcing its smart watch (among other things). Apple calls the product a "watch". Is it? It seems much more than a plain old watch.

Apple did the same with their iPhone. The iPhone was really a small computer equipped with wireless communication. Yet who wants to carry around a small computer (wireless or not)? It's much easier (psychologically) to carry around a phone.

Looking at our experience with the iPhone, we can see that iPhones (and smart phones in general) are used for many purposes and only occasionally as a phone. We use them to take pictures, to watch movies, to play games, to read e-mail, to send and receive text messages, to navigate, to calculate, to bank online, ... The list is extensive.

Applying that logic to the Apple Watch (and smart watches in general) we can expect many purposes for them. Some of these will duplicate or extend our phones (those small wireless computers): notifications of appointments, navigation, display of e-mail and text messages, and of course to tell the time. Smart watches will offer new functions too: payments at checkout counters, unlocking house doors (equipped with smart locks), unlocking automobiles (and possibly replacing the key entirely), exchanging contact information (virtual business cards), ... the list is extensive.

Smart watches will provide convenience. Smart watches will also add a degree of complexity to our lives. Is my watch up to date? Have I charged it? Is my watch compatible with a merchant's payment system? Does it have a virus or other malware?

We'll call them "watches", since the name "small computing device that I wear on my wrist" is unwieldy. But that was the same issue with the original pocketwatches and wristwatches. They, too, were small computing devices. (Some, in addition to telling time, displayed the phase of the moon and other astronomic data. In the twentieth century, wristwatches often displayed the day of the month.)

So, yes, smart watches are not watches. They do much more than tell time. And yet they are watches, because we define the term "watch" to mean "small computing device that I wear on my wrist". We have defined it that way for more than a century.


Thursday, February 26, 2015

The names of programming languages

A recent project involved a new programming language (a variant of the classic Dartmouth BASIC), which meant I needed a name for the new language. Of course, a new name should be different from existing names, so I researched the names of programming languages.

My first observation was that we, as an industry, have created a lot of programming languages! I usually think of the set of languages as BASIC, FORTRAN, COBOL, Pascal, C, C++, Java, C#, Perl, Python, and Ruby -- the languages that I use currently or have used in the past. If I think about it, I add some other common languages: RPG, Eiffel, F#, Modula, Prolog, LISP, Forth, AWK, ML, Haskell, and Erlang. (These are programming languages that I have either read about or discussed with fellow programmers.)

As I surveyed existing programming languages, I found many more languages. I found extinct languages, and extant languages. And I noticed various things about their names.

Programming languages, except for a few early languages, have names that are easily pronounceable. Aside from the early "A-0" and "B-0", most languages have recognizable names. We switched quickly from designations of letters and numbers to names like FORTRAN and COBOL.

I also noticed that some names last longer than others. Not just the languages, but the names. The best example may be "BASIC". Created in the 1960s, the BASIC language has undergone a number of changes (some of them radical) and has had a number of implementations. Yet despite its changes, the name has remained. The name has been extended with letters ("CBASIC", "ZBASIC", "GW-BASIC"), numbers ("BASIC-80", "BASIC09"), symbols ("BASIC++"), prefix words ("Visual Basic", "True Basic", "Power Basic"), and sometimes suffixes ("BASIC-PLUS"). Each of these names was used for a variant of the original BASIC language, with separate enhancements.

Other long-lasting names include "LISP", "FORTRAN", and "COBOL".

Long-lasting names tend to have two syllables. Longer names do not stay around. The early languages "BACAIC", "COLINGO", "DYNAMO", "FLOW-MATIC", "FORTRANSIT", "JOVIAL", "MATH-MATIC", "MILITRAN", "NELIAC", and "UNICODE" (yes it was a programming language, different from today's character set) are no longer with us.

Single-letter names have little popularity. Aside from C (the one exception), languages such as B, D, and J see limited acceptance. The up-and-coming R language for numeric analysis (derived from S, another single-letter language) may see limited acceptance, based on its name. It may be better to change the name to "R-squared" with the designation "R2".

Our current set of popular languages has two-syllable names: "VB" (pronounced "vee bee"), "C#" ("see sharp"), Java, Python, and Ruby. Even the database language SQL is often pronounced "sequel" to give it two syllables. Popular languages with only one syllable are Perl (which seems to be on the decline), C, and Swift.

PHP and C++ have names with three syllables. Objective-C clocks in with a possibly unwieldy four; perhaps this was an incentive for Apple to change to Swift.

I expect our two-syllable names to stay with us. The languages may change, as they have changed in the past.

As for my new programming language, the one derived from BASIC? I picked a new name, not a variant of "BASIC". Someone has already snagged the name "ACIDIC", so I turned to a synonym for "basic" -- alkaline -- and shortened it to a two-syllable form: Alkyl.

Monday, February 16, 2015

Goodbye, printing

The ability to print has been part of computing for ages. It's been with us since the mainframe era, when it was necessary for developers (to get the results of their compile jobs) and businesspeople (to get the reports needed to run the business).

But printing is not part of the mobile/cloud era. Oh, one can go through various contortions to print from a tablet, but practically no one does. (If any of my Gentle Readers does print from a tablet or smartphone, you can consider yourself a rare bird.)

Printing was really sharing.

Printing served three purposes: to share information (as a report or a memo), to archive data, and to get a bigger picture (larger than a display terminal could show).

Technology has given us better means of sharing information. With the web and mobile, we can send an e-mail, we can post to Facebook or Twitter, we can publish on a blog, we can make files available on web sites... We no longer need to print our text on paper and distribute it.

Archiving was sharing with someone (perhaps ourselves) in the future. It was a means of storing and retrieving data. This, too, can be handled with newer technologies.

Getting the big picture was important in the days of "glass TTY" terminals, those text-only displays of 24 lines of 80 characters each. Printouts were helpful because they offered more text in one view. But now displays are large and can show as much as the old printouts -- at least one page of a printout at a time, which is what we really looked at anyway.

The one aspect of printed documents which remains is that of legal contracts. We rely on signatures, something that is handled easily with paper and not so easily with computers. Until we change to electronic signatures, we will need paper.

But as a core feature of computer systems, printing has a short life. Say goodbye!

Thursday, February 12, 2015

Floating-point arithmetic will die

Floating-point arithmetic is popular. Yet despite its popularity, I foresee its eventual demise. It will be replaced by arbitrary-precision arithmetic, which offers more accuracy.

Here is a list of popular languages (today) and the availability of calculations with something other than floating-point arithmetic:

COBOL        COMP-3 type (decimal fixed precision)
FORTRAN      none
C            none native; can use GMP
C++          GMP package (arbitrary precision)
Java         BigDecimal class (arbitrary precision)
C#           decimal type (a wider, decimal-based floating-point)
Perl         Rat type (arbitrary precision) ('Rat' for 'rational')
Ruby         BigDecimal class (arbitrary precision)
Python       decimal class (arbitrary precision)
JavaScript   big.js library (arbitrary precision)
Swift        none native; can use GMP
Go           'big' package (arbitrary precision)

Almost all of the major languages support something 'better' than floating-point arithmetic.
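
As one example of that support, Python's decimal module (from the list above) lets the programmer choose the working precision; a two-line sketch:

    from decimal import Decimal, getcontext

    getcontext().prec = 50                # work with 50 significant digits
    print(Decimal(1) / Decimal(7))        # 0.142857142857... carried out to 50 digits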

I put the word 'better' in quotes because the change from floating-point to something else (arbitrary precision or fixed decimal precision) is a trade-off. Floating-point arithmetic is fast, at the expense of precision. The IEEE standard for floating-point is good: it represents a wide range of numbers in a small number of bits, and the math maps directly onto hardware. Most processors have hardware floating-point units, which makes the operations very fast.

Arbitrary-precision arithmetic, in contrast, is slow. There are no co-processors to handle it (at least none in mainstream hardware) and a software solution for arbitrary precision is slower than even a software-only solution for floating-point.
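
A rough (and admittedly unscientific) way to see the cost, sketched in Python; the exact numbers vary by machine, but the gap is typically several-fold:

    from decimal import Decimal
    from timeit import timeit

    float_time = timeit("x * y + x", globals={"x": 0.1, "y": 0.2},
                        number=1000000)
    decimal_time = timeit("x * y + x",
                          globals={"x": Decimal("0.1"), "y": Decimal("0.2")},
                          number=1000000)

    print("float:  ", round(float_time, 3), "seconds")
    print("Decimal:", round(decimal_time, 3), "seconds")   # noticeably slower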

Despite the costs, I'm fairly confident that we, as an industry, will switch from floating-point arithmetic to arbitrary-precision arithmetic. Such a change is merely one in a long line of changes, each trading computing performance for programmer convenience.

Consider previous changes of convenience:

  • Moving from assembly language to higher-level languages such as COBOL and FORTRAN
  • Structured programming, which avoided GOTO statements and used IF/ELSE and DO/WHILE statements
  • Object-oriented programming (OOP) which enabled encapsulation and composition
  • Run-time checks on memory access
  • Virtual machines (Java's JVM and .NET's CLR) which allowed more run-time checks and enhanced debugging

Each of these changes was adopted over objections about performance. And while the older technologies remained, they became niche technologies. We still have assembly language, procedural (non-OOP) programming, and systems without virtual machines. But those technologies are used in a small minority of projects. The technologies that offer convenience to the programmer became mainstream.

Floating-point arithmetic costs programmer time. Code that uses floating-point types must be carefully designed for proper operation, and then carefully reviewed and tested. Any changes to floating-point code must be carefully reviewed. All of these reviews must be done by people familiar with the limitations of floating-point arithmetic.

Not only must the designers, programmers, and reviewers be familiar with the limitations of floating-point arithmetic, they must be able to explain them to other folks involved in the project, people who may be unfamiliar with floating-point arithmetic.

When working with floating-point arithmetic, programmers are put in the position of apologizing for the failings of the computer -- failings that are not easily explained. Any school-age child knows that 0.1 + 0.2 - 0.3 is equal to zero, not some small amount close to zero.
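
The gap between the child's answer and the computer's answer is easy to demonstrate; here is a two-line sketch in Python, using the decimal module from the earlier list:

    from decimal import Decimal

    print(0.1 + 0.2 - 0.3)                                    # 5.551115123125783e-17
    print(Decimal("0.1") + Decimal("0.2") - Decimal("0.3"))   # 0.0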

I believe that it is this constant need to explain the failings of floating-point arithmetic that will be its undoing. Programmers will eventually start using arbitrary-precision arithmetic, if for no other reason than to spare themselves the explanations of rounding errors. And for most applications, the extra computing time will be insignificant.

Floating-point, like other fallen technologies, will remain a niche skill. But it will be out of the mainstream. The only question is when.