Thursday, March 19, 2015

The oldest high-level language

When discussing languages, we typically divide them into the groups "high-level" and "low-level".

I was taught this division when I started programming, many moons ago. The explanation I received at the time was that "low-level languages deal with machine-specific issues and high-level languages do not". It works, but it is awkward. It leads to the following:

Low-level languages:
  • Assembly
  • Machine language
Mid-level languages:
  • C
  • C++
  • Lisp
High-level languages:
  • Fortran
  • Cobol
  • BASIC
  • Pascal
  • C#
  • Java
  • Awk
  • Perl
  • Python
  • Ruby
  • Eiffel
  • Erlang
  • Haskell
The awkwardness is the imbalance. Almost every language is considered a high-level language.

If every language ends up "winning" in the high-level category, then the rule has little value. So I started thinking about different ways to divide languages.

The old rule is valid: languages which force the programmer to consider machine-specific details are low-level languages. That concept remains. What we need is a refinement, some concept that can split off some of the high-level languages and push them into the "mid-level" category.

For me, that refinement is memory layout.

If a programming language forces the programmer to think about memory layout, then it is, at best, a mid-level language.

This refinement is consistent with our original rule of machine-dependency. Machine language and assembly language are low-level languages (by convention) and they require attention to the layout of items in memory. Indeed, machine language and assembly language require the programmer to think about the layout of instructions as well as data.

C and C++, with their pointers, require a fair amount of attention to memory layout. They also have structs, which aggregate data elements in memory, and they give the programmer considerable control over the alignment of elements within a struct. They stay within the mid-level group.
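To make the point about structs concrete, here is a minimal sketch in C. The two struct types and the helper functions are hypothetical, invented only for this illustration. It shows that the order of members can change where they land in memory and how large the struct is, which is exactly the kind of detail a C programmer ends up thinking about.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical record, invented for this illustration. */
struct sample {
    char   tag;
    double value;
    char   flag;
};

/* The same three members, reordered so the largest comes first. */
struct sample_reordered {
    double value;
    char   tag;
    char   flag;
};

/* Bytes in struct sample that hold no member data (compiler padding). */
size_t sample_padding(void) {
    return sizeof(struct sample) - (2 * sizeof(char) + sizeof(double));
}

/* Print where each member lands; the numbers depend on compiler and target. */
void print_layout(void) {
    /* The first member of a struct always sits at offset 0. */
    assert(offsetof(struct sample, tag) == 0);
    printf("sample:           tag@%zu value@%zu flag@%zu size=%zu padding=%zu\n",
           offsetof(struct sample, tag),
           offsetof(struct sample, value),
           offsetof(struct sample, flag),
           sizeof(struct sample),
           sample_padding());
    printf("sample_reordered: value@%zu tag@%zu flag@%zu size=%zu\n",
           offsetof(struct sample_reordered, value),
           offsetof(struct sample_reordered, tag),
           offsetof(struct sample_reordered, flag),
           sizeof(struct sample_reordered));
}
```

Calling print_layout() from a main function prints the offsets for whatever compiler and machine you use. On many 64-bit systems the first struct comes out larger than the second because of padding around the double, but the exact numbers are not guaranteed by the language.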

Cobol, normally considered a high-level language, drops to mid-level. Anyone who has programmed in Cobol knows that much attention is given to the layout of not only input and output records but also working storage (what most programmers would call "variables").

Fortran also drops to mid-level. Its handling of arrays, its COMMON blocks, and its EQUIVALENCE statement all make the programmer think about memory allocation.
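For readers who have not used EQUIVALENCE, a rough analogy in C (not Fortran) is a union, in which two names refer to the same storage. The sketch below is only an analogy under that assumption; the type and function names are hypothetical.

```c
#include <assert.h>
#include <string.h>

/* Two views of the same bytes, loosely similar to names joined
   by a Fortran EQUIVALENCE statement. Hypothetical type. */
union overlay {
    float         real_view;
    unsigned char bytes[sizeof(float)];
};

/* Store through one name, read back through the other.
   Returns 1 if both names really share one piece of storage. */
int shares_storage(void) {
    union overlay u;
    float copy;

    u.real_view = 1.0f;

    /* Every member of a union starts at the same address. */
    assert((void *)&u.real_view == (void *)u.bytes);

    /* Rebuild a float from the byte view and compare. */
    memcpy(&copy, u.bytes, sizeof copy);
    return copy == 1.0f;
}
```

The programmer, not the language, has to keep track of which view of the storage is meaningful at any moment. That bookkeeping is the memory-layout burden I am describing.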

Pascal too moves to mid-level, due to its pointer concept.

With this refinement, we now have the following list:

Low-level languages:
  • Assembly
  • Machine language
Mid-level languages:
  • C
  • C++
  • Lisp
  • Fortran
  • Cobol
  • Pascal
High-level languages:
  • BASIC
  • C#
  • Java
  • Awk
  • Perl
  • Python
  • Ruby
  • Eiffel
  • Erlang
  • Haskell
That addresses my concern about balance, although the low-level group remains small, with only two members. Perhaps there is a further refinement that moves languages to that group. I am willing to let that wait for another day.

But I do notice that, of the high-level languages, BASIC is the oldest, and by a fair amount. BASIC was designed in 1964. The next oldest in the group is Awk (1977), and the rest, such as Perl (1987), date from the mid-1980s or later. A programmer using BASIC does not have to worry about memory layout; variables are instantiated and reclaimed transparently. The BASIC language is limited, although it has been extended over time.

So using my refined heuristic, the oldest high-level programming language is BASIC.
