Thursday, August 3, 2023

We'll always have source code

Will AI change programming? Will it eliminate the need for programmers? Will we no longer need programs, or rather, the source code for programs? I think we will always have source code, and therefore always have programmers. But perhaps not as we think of programmers and source code today.

But first, let's review the notions of computers, software, and source code.

Programming has been with us almost as long as we have had electronic computers.

Longer than that, if we include the punch cards used by the Jacquard loom. But let's stick to electronic computers and the programming of them.

The first digital electronic computers were built in the 1940s. They were programmed not by software but by wires -- connecting various wires to various points to perform a specific set of computations. There was no concept of a program -- at least not one for the computer. There were no programming languages and there was no notion of source code.

The 1950s saw the introduction of the stored-program computer. Instead of wiring plug-boards, program instructions were stored in cells inside the computer. We call these instructions "machine code". When programming a computer, machine code is slightly more convenient than wiring plug-boards, but not by much. Machine code consists of a number of instructions, each of which resides at a distinct, sequential location in memory. The processor executes the program by simply reading one instruction from a starting location, executing it, and then reading the next instruction at the next memory address.

Building a program in machine code took a lot of time and required patience and attention to detail. Changing a program often meant inserting instructions, which meant that the programmer had to recalculate all of the destination addresses for loops, branches, and subroutines. With stored-program computers, there was the notion of programming, but not the notion of source code.

Source code exists to be processed by a computer and converted into machine code. We first had source code with symbolic assemblers. Assemblers were (and still are) programs that read a text file and generate machine code. Not just any text file, but a text file that follows specific rules for content and formatting, and specifies a series of machine instructions but as text -- not as numbers. The assembler did the grunt work of converting "mnemonic" codes to numeric machine codes. It also converted numeric and text data to the proper representation for the processor, and calculated the destinations for loops, branches, and subroutines. Revising a program written in assembly language was much easier than revising machine code.
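
To make that concrete, here is a minimal sketch of the work an assembler does, written in Python for a made-up machine (the mnemonics, opcodes, and two-byte instruction format below are all invented for illustration): it converts mnemonic codes to numeric machine codes and calculates the destinations for jumps.

    # A toy two-pass assembler for a made-up machine. The mnemonics, opcodes,
    # and two-byte instruction format are invented for illustration only.
    OPCODES = {"LOAD": 0x01, "ADD": 0x02, "JMP": 0x03, "HALT": 0xFF}

    def assemble(lines):
        labels, address, parsed = {}, 0, []
        # Pass 1: note the address of each label, so jumps can be resolved later.
        for line in lines:
            text = line.split(";")[0].strip()      # drop comments and blank lines
            if not text:
                continue
            if text.endswith(":"):
                labels[text[:-1]] = address
                continue
            parsed.append(text)
            address += 2                           # every instruction is two bytes
        # Pass 2: emit numeric machine code, substituting label addresses.
        code = []
        for text in parsed:
            parts = text.split()
            mnemonic = parts[0]
            operand = parts[1] if len(parts) > 1 else "0"
            value = labels[operand] if operand in labels else int(operand, 0)
            code += [OPCODES[mnemonic], value]
        return bytes(code)

    program = """
    start:
        LOAD 5       ; put 5 in the accumulator
        ADD  7       ; add 7 to it
        JMP  start   ; and loop forever
    """
    print(assemble(program.splitlines()).hex(" "))   # 01 05 02 07 03 00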

Later languages such as FORTRAN and COBOL converted higher-level text into machine code. They, too, had source code.

Early C compilers converted code into assembly code, which then had to be processed by an assembler. This last sequence looked like this:

    C source code --> [compiler] --> assembly source code --> [assembler] --> machine code

I've listed both the C code and the assembly code as "source code", but in reality only the C code is the source code. The assembly code is merely an intermediate form of the code, something generated by machine and later read by machine.

A better description of the sequence is:

    C source code --> [compiler] --> assembly code --> [assembler] --> machine code

I've changed the "assembly source code" to "assembly code". The adjective "source" is not really correct for it. The C program (at the left) is the one and only source.

Later C compilers omitted this intermediate step and generated machine code directly. The sequence became:

    C source code --> [compiler] --> machine code

Now let's consider AI. (You didn't forget about AI, did you?)

AI can be used to create programs in two ways. One is to enhance a traditional programming IDE with AI, and thereby assist the programmer as he (or she) is typing. That's no different from our current process; all we have done is made the editor a bit smarter.

The other way is to use AI directly and ask it to create the program. In this method, a programmer (or perhaps a non-programmer) provides a prompt text to an AI engine and the AI engine creates the entire program, which is then compiled into machine code. The sequence looks like this:

    AI prompt text --> [AI engine] --> source code --> [compiler] --> machine code

Notice that the word "source" has sneaked back into the middle of the stream. The term doesn't belong there; that code is intermediate and not the source. A better description is:

    Source AI prompt text --> [AI engine] --> intermediate code --> [compiler] --> machine code

This description puts the "source" back on the first step of the process. That prompt text is the true source code. One may argue that a prompt text is not really source code, that it is not specific enough, or not Turing-complete, or not formatted like a traditional program. I think that it is the source code. It is created by a human and it is the text used by the computer to generate the machine code that we desire. That makes it the source.

Notice that in this new process with AI, we still have source code. We still have a way for humans to instruct computers. I've been writing about source code as if it were written. Source code has always been written (or typed, or keypunched) in the past. It is possible that future systems recognize human speech and build programs from that (much like on several science fiction TV programs). If so, those spoken words will be the source code.

AI may change the programming world. It may upend the industry. It may force many programmers to learn new skills, or to retire. But humans will always want to express their desires to computers. The way they express them may be through text, or through speech, or (in some far-off day) through direct neural links. Those thoughts will be source code, and we will always have it. The people who create that source code are programmers, so we will always have them.

We will always have source code and programmers, but source code and programming will change over time.

Tuesday, July 26, 2022

Mental models of computers

In the old movie "The Matrix", there is a scene in which the character Cypher is looking at code, and another character, Neo, asks why he looks at the code and not the presentation-level view. Cypher explains that the code is better, because the presentation level is designed to fool us humans. It is at this moment that Neo re-thinks his view of computers.

That scene (some years after the debut of the movie) got me thinking about my view of computers.

My mental model of computers is based on text. That is, I think of a computer as a device that processes text and talks to other devices that process text. The CPU processes text, terminals display text to users and accept text via the keyboard, printers print text, and disks store text. (Disks also store data in binary form, but for me that is simply a strange form of text.)

This model is perhaps not surprising, as my early experiences with computers were with text-oriented devices and text-oriented programs. Those computers included a DEC PDP-8 running BASIC; a DECsystem-10 running TOPS-10 with FORTRAN, Pascal, and a few other text-oriented languages; a Heathkit H-89 running HDOS (an operating system much like DEC's RT-11) and BASIC, assembly language, FORTRAN, C, and Pascal.

The devices I used to interact with computers were text terminals. The PDP-8 used Teletype ASR-33s, which had large mechanical keyboards (way more mechanical than today's mechanical keyboards) and printed text on a long continuous roll of paper. The DECsystem-10 and the H-89 both used CRT terminals (no paper) and mostly text with a few special graphics characters.

In those formative years, all of my experience with computers was for programming. That is, the primary purpose of a computer was to learn programming and to do programming. Keep in mind that this was before much of the technical world we have today. There was no Google, no Netflix, no Windows. Spreadsheets were the new thing, and even they were text-oriented. The few graphs that existed in computing were made on special (read that as "expensive and rare") equipment that few people had.

In my mind, back then, computers were for programming; programming was a process that used text, and computers themselves worked in text, so the two were a good match.

Programming today is still a text-oriented process. The "source code" of programs, the version that we humans write and that computers either compile or interpret into executable code, is text. One can write programs in the Windows "Notepad" program. (One must save them to disk and then tell the compiler to convert that saved file, but that is simply the process to get a program to run.)

So what does this have to do with "The Matrix" and specifically why is that one scene important?

It strikes me that while my experience with computers started with programming and text-oriented devices, not everyone (especially nowadays) has that same experience. Today, people are introduced to computers with cell phones, or possibly tablets. A few may get their first experience on a laptop running Windows or macOS.

All of these are different from my text-based introduction. And all of these are graphics-based. People today, when they first encounter computers, encounter graphical interfaces, and use computers for many things other than programming. People today must have a very different mental model about computers. I saw computers as boxes that processed text. Today, most people see computers as boxes that process graphics, and sound, and voice.

What a shock it must be for someone today to start to learn programming. They are taken out of their comfortable mental model and forced to use text. Some classes begin with simple "hello, world" programs that not only use text source code but also produce text output. How primitive this must seem to people familiar with graphical interfaces! (Some classes begin with simple programs that present web pages, which is a bit better in that the output is familiar, but the source code is still text.)

But this different mental model may be a problem for people entering the programming world. They are moving from a graphical world to a text-based world, and that transition can be difficult. Modern IDE programs ease the transition by allowing many operations in a graphical environment, but the source code remains text.

Do people revolt? Do they reject the text-oriented approach to source code? I imagine that some find the change in mental models difficult, perhaps too difficult, and they abandon programming.

A better question is: Why has no one created a graphical-oriented programming language? Not just a programming language in an IDE -- we already have those. I'm thinking of a new approach to programming, something very different from the text approach of today.

It might be that programming has formed a self-reinforcing loop. Only programmers can create new programming languages and programming environments, and these programmers (obviously) are comfortable with the text paradigm. Perhaps they see no need to make such a large change.

Or it might be that the text model is the best model for programming. Programming is the organization of ideas into clearly specified collections and operations, and text handles that task better than graphics. Visual representations of collections and operations can be clear, but they can also be ambiguous. (But then, text representations can also be ambiguous, so I'm not sure that there is a clear advantage for text.)

Or possibly we simply have not seen the right person to come along, with the right mix of technical skills, graphics abilities, and desire for a visual programming language. It may be that graphical programming languages are possible, and that we just haven't invented them.

I want to think it is the last of these reasons, because that means there is a lot more for us to learn about programming. The introduction of a visual programming language will open new vistas for programming, and applications, and computing.

I want to think that there will always be something new for the programmers.

Thursday, May 5, 2022

C but for Intel X86

Let's step away from current events in technology, and indulge in a small reverie. Let's play a game of "what if" with past technology. Specifically the Intel 8086 processor.

I will start with the C programming language. C was designed for the DEC PDP-7 and PDP-11 processors. Those processors had some interesting features, and the C language reflects that. One example is the increment and decrement operators (the '++' and '--' operators) which map closely to addressing modes on the DEC processors.

Suppose someone had developed a programming language for the Intel 8086, just as Kernighan and Ritchie developed C for the PDP-11. What would it look like?

Let's also suppose that this programming language was developed just as the 8086 was designed, or shortly thereafter. (The 8086 was released in 1978.) We're looking at the 8086 (or the 8088, which has the identical instruction set) and not the later processors.

The computing world in 1978 was quite different from today. Personal computers were just entering the market. Apple and Commodore had computers that used the 6502 processor; Radio Shack had computers that used the Z-80.

Today's popular programming languages didn't exist. There was no Python, no Ruby, no Perl, no C#, and no Java. The common languages were COBOL, FORTRAN, BASIC, and assembler. Pascal was available, for some systems. Those would be the reference point for a new programming language. (C did exist, but only in the limited Unix community.)

Just as C leveraged features of the PDP-7 and PDP-11 processors, our hypothetical language would leverage features of the 8086. What are those features?

One feature that jumps out is text strings. The 8086 has instructions for operating on strings of characters. It seems reasonable that an 8086-centric language would support them. They might even be a built-in type for the language. (Strings do require memory management, but that is a feature of the run-time library, not the programming language itself.)

The 8086 supports BCD (binary-coded decimal) arithmetic. BCD math is rare today, but it was common on IBM mainframes and a common way to encode data for exchange with other computers.

The 8086 had a segmented architecture, with four different segments (code, data, stack, and "extra"). Those four segments map well to Pascal's organization of code, static data, stack, and heap. (C and C-derivatives use the same organization.) A language could support dynamic allocation of memory and recursive functions (two things that were not available in COBOL, FORTRAN, or BASIC). And it could also support a "flat" organization like those used in COBOL and FORTRAN, in which all variables and functions are laid out at link time and fixed in position.

There would be no increment or decrement operators.

Who would build such a language? I suppose a natural choice would be Intel, as they knew the processor best. But then, maybe not, as they were busy with hardware design, and had no operating system on which to run a compiler or interpreter.

The two big software houses for small systems (at the time) were Microsoft and Digital Research. Both had experience with programming languages. Microsoft provided BASIC for many different computer systems, and also had FORTRAN and COBOL. Digital Research provided CBASIC (a compiled BASIC) and PL/M (a derivative of IBM's PL/I).

IBM would probably not create a language for the 8086. They had no offering that used that processor. The IBM PC would arrive in 1981, and IBM didn't consider it a serious computer -- at least not until people started buying them in large quantities.

DEC, at the time successful with minicomputers, also had no offering that used the 8086. DEC offered many languages, but used their own processors.

Our language may have been developed by a hardware vendor, such as Apple or Radio Shack, but they, like Intel, were busy with hardware and did very little in terms of software.

So it may have been either Microsoft or Digital Research. Both companies were oriented toward business, so a language developed by either of them would be oriented toward business. A new language for business might be modelled on COBOL, but COBOL didn't allow for variable-length strings. FORTRAN was oriented toward numeric processing, and it didn't handle strings either. Even Pascal had difficulty with variable-length strings.

My guess is that our new language would mix elements of each of the popular languages. It would be close to Pascal, but with more flexibility for text strings. It would support BCD numeric values, not only in calculations but also in input-output operations. The language would be influenced by COBOL's verbose style.

We might see something like this:

PROGRAM EXAMPLE
DECLARE
  FILE DIRECT CUSTFILE = "CUST.DAT";
  RECORD CUSTREC
    BCD ACCTNUM PIC 9999,
    BCD BALANCE PIC 9999.99,
    STRING NAME PIC X(20),
    STRING ADDRESS PIC X(20);
  FILE SEQUENTIAL TRANSFILE = "TRANS.DAT";
  RECORD TRANSREC
    BCD ACCTNUM PIC 9999,
    BCD AMOUNT PIC 999.99;
  STRING NAME;
  BCD PAYMENT,TAXRATE;
PROCEDURE INIT
  OPEN CUSTFILE, TRANSFILE;
PROCEDURE MAINLINE
  WHILE NOT END(TRANSFILE) BEGIN
    READ TRANSFILE TO TRANSREC;
  END;
CALL INIT;
CALL MAINLINE;

and so forth. This borrows heavily from COBOL; it could equally borrow from Pascal.

It may have been a popular language. Less verbose than COBOL, but still able to process transactions efficiently. Structured programming from Pascal, but with better input-output. BCD data for efficient storage and data transfer to other systems.

It could have been a contender. In an alternate world, we could be using programming languages derived not from C but from this hybrid language. That might solve some problems (such as buffer overflows) but maybe give us others. (What problems, you ask? I don't know. I just now invented the language; I'll need some time to find the problems.)


Wednesday, January 12, 2022

Successful programming languages

The IT world has seen a number of programming languages. Some became popular and some did not.

With more than half a century of experience, we can see some patterns in languages that became popular. First, let's review some of the popular programming languages.

FORTRAN and COBOL were successful because they met a business need (or two, perhaps). The primary need was for a programming language that was easier to understand than assembly language. The secondary need was for a programming language that could be used across computers of different manufacturers, allowing companies to move programs from one vendor's hardware to another. Both FORTRAN and COBOL met those needs.

BASIC was modestly successful on minicomputers and wildly successful on personal computers. It was possible to run BASIC on the small PCs (sometimes with as little as 4K of memory!). It was easy to use, and amateur programmers could write and run programs relatively easily. It filled the technical space of timesharing on minicomputers, and the technical space of personal computers.

C became popular in the Unix world because it was included in Unix distributions. If you ran Unix, you most likely programmed in C. One could say that it was pushed upon the world by AT&T, the owners of Unix.

SQL became successful in the 1980s, just as databases became available and popular. Prior to databases, computer systems offered "file managers" and "file access libraries" which allowed basic operations on records but not tables. Each library had its own set of capabilities and its own API. SQL provided a common set of operations and a common API. It allowed businesses to move easily from one database to another, guaranteed a core set of operations, and permitted a broad set of programmers to work on multiple systems.

C++ became popular because it solved a problem with C programs, namely the organization of large programs. C++ offered object-oriented concepts of classes, inheritance, encapsulation, and polymorphism, each of which helps organize code.

Visual Basic became popular because it provided an easy way to write programs for Windows. Microsoft's earlier Visual C++ required knowledge of the Windows API and lots of discipline. Visual Basic required neither, hiding the Windows API from the programmer and providing safety in the programming language.

Objective-C became popular because Apple used it for programming applications for Macintosh computers. (Later, when Apple switched to the Swift programming language, interest in Objective-C plummeted.)

Java became popular because it promised that people could write programs once and then run them anywhere. It did a good job of delivering on that promise, too.

C# is Microsoft's version of Java (its second version, after Visual J++) and is popular only in that Microsoft pushes it. If Microsoft were to disappear overnight, interest in C# would drop dramatically.

Swift is Apple's language for development of applications for the iPhone, iPad, and other Apple products. It is successful because Apple pushes it upon the world.

JavaScript became popular because it was ubiquitous. Much like BASIC on all PCs, JavaScript was in all browsers, and the combination of HTML, the DOM, and JavaScript allowed for web applications with powerful processing in the browser.

Python became popular because it was a better Perl. Python had a simpler syntax and also had object-oriented programming built in.

Notice that, with the exception of Rust replacing C or C++, these new languages became popular in new spaces. They don't replace an existing popular language. BASIC didn't replace COBOL or FORTRAN; it became popular in the new spaces of timesharing and personal computers. C# didn't replace Java; it joined Visual Basic in the Microsoft space and slowly gained in popularity as Microsoft supported it more than Visual Basic.

So we can see that there are a few reasons that languages become popular:

  • A vendor pushes it
  • It solves a commonly-recognized business problem
  • It fills a technical space

If we accept these as the reasons that languages become popular, we can make some predictions about new languages that become popular. We can say that, if a major vendor pushed a language for its projects (a vendor such as Amazon, for example) then that language would become popular.

Or, we could say that a new technical space would allow a new language to become popular.

Or, if there is a commonly recognized business problem with our current set of programming languages, a new language that solves that problem would become popular.

So what do we see?

If we accept that C and C++ have problems (memory management, buffer overflows) and we accept that those problems are commonly recognized, then we can see the replacement of C and C++ with a different language, one that addresses those problems. The prime contender for that is Rust. We may see a gradual shift from C and C++ programming to Rust, as more and more people develop confidence in Rust and develop a fear of memory management and buffer overrun issues.

One technical space that could provide an opportunity for a new programming language is the Internet of Things (IoT). Appliances and devices must communicate with each other and with servers. I suspect that the IoT space needs protocols more than it needs programming languages, but perhaps there is room for a programming language that works with new protocols to establish trusted connections and, more importantly, trusted updates.

A third area is a teaching language. BASIC was designed, initially, for people not familiar with programming. Pascal was designed to teach structured programming. Do we have a language designed to teach object-oriented programming? To teach functional programming? Do we even have a language to teach basic programming? BASIC and Pascal have long been discarded by the community, and introductory courses now often use Java, JavaScript, or Python which are all rather complex languages. A new, simple language could gain popularity.

As we develop new devices for virtual reality, augmented reality, and other aids, we may need new programming languages.

Those are the areas that I think will see new programming languages.

Friday, August 27, 2021

Replacing a language

Do programming languages ever get replaced? Languages are introduced, and some rise and fall in popularity (and some rise after a fall), but very few ever disappear completely.

There are some languages that have disappeared. Two are FLOW-MATIC and NELIAC, languages from the 1950s.

Some languages have fallen to very low levels of usage. These include BASIC, PL/I, PL/M, and DIBOL.

Some languages never die (or at least haven't died yet). The go-to examples are COBOL and FORTRAN. Organizations continue to use them. Fortran is used for some parts of the R implementation, so anyone using R is in a sense using Fortran. We can include SQL in this list of "immortal" programming languages.

Some languages are replaced with new versions: C# version 6 replaced a very similar C# version 5, for example. VB.NET replaced VB6. And while there are a few people who use the old Visual Basic 6, they are a very small number.

The replacement of a language isn't so much about the language as it is about popularity. Several groups measure the popularity of languages: Tiobe, Stack Overflow, the Popularity of Programming Languages (PYPL)... They all rank the languages and show pretty tables and charts.

The mechanics of popularity is such that the "winners" (the most popular languages) are shown, and the "losers" (the less popular languages) are omitted from the table, or shown in a small note at the bottom. (Not unlike the social scene in US high schools.)

If we use this method, then any language that enters the "top 10" or "top 40" (or "top whatever") chart is popular, and any language that is not in the list is not popular. A language that never enters the "top" list is never really replaced, because it never "made it".

A language that does enter the "top" list and then falls out of it has been replaced. That language may still be used, perhaps by thousands, yet it is now considered a "loser".

For this method (referring to the Tiobe index), languages that have been replaced include: Objective-C, COBOL, Prolog, and Ada. They were popular, once. But now they are not as popular as other languages. (I can almost hear the languages protesting a la Norma Desmond: "We are big! It's the programs that got small!")

Sometimes we can state that one specific language replaced another. VB.NET replaced VB6, because Microsoft engineered and advertised VB.NET to replace VB6. The same holds for Apple and Swift replacing Objective-C. But can we identify a specific language that replaced COBOL? Or a single language that replaced FORTRAN? Did C++ replace C? Is Rust replacing C++?

We can certainly say that, for a specific project, a new language replaces an older one. Perhaps we can say the same for an organization, if they embark on a project to re-write their code in a new language. I'm not sure that we can make the assertion for the entire IT industry. IT is too large, with too many companies and too many individuals, to verify such a claim in absolute terms. All we can rely on is popularity.

But popularity is not the measure of a language. It doesn't measure the language's capabilities, or its reliability, or its ability to run on multiple platforms.

We don't care about popularity for the technical aspects of the language. We care about the popularity of a language for us, for ourselves.

Product managers care about the popularity of a language because of staffing. A popular language will have lots of people who know it, and therefore the manager will have a (relatively) easy time finding candidates to hire. An obscure language will have few candidates, and they may demand a higher wage.

Individuals care about the popularity of a language because it means that there will be lots of companies hiring for that skill. Few companies will hire for an obscure programming language.

This essay has meandered from replacing a language to popularity of languages to the concerns of hiring managers and candidates. That's probably not a coincidence. Economic activity drives a lot of behavior; I see no reason that programming should be exempt. When thinking about a programming language, think about the economics, because that will contribute a lot to the language's life.

Sunday, August 15, 2021

COBOL and Elixir

Someone has created a project to transpile (their word) COBOL to Elixir. I have some thoughts on this idea. But first, let's look at the example they provide.

A sample COBOL program:

      >>SOURCE FORMAT FREE
IDENTIFICATION DIVISION.
PROGRAM-ID. Test1.
AUTHOR. Mike Binns.
DATE-WRITTEN. July 25th 2021
DATA DIVISION.
WORKING-STORAGE SECTION.
01 Name     PIC X(4) VALUE "Mike".
PROCEDURE DIVISION.

DISPLAY "Hello " Name

STOP RUN.

This is "hello, world" in COBOL. Note that it is quite longer than equivalent programs in most languages. Also note that while long, it is still readable. Even a person who does not know COBOL can make some sense of it.

Now let's look at the same program, transpiled to Elixir:

defmodule ElixirFromCobol.Test1 do
  @moduledoc """
  author: Mike Binns
  date written: July 25th 2021
  """

  def main do
    try do
      do_main()
    catch
      :stop_run -> :stop_run
    end
  end 

  def do_main do
    # pic: XXXX
    var_Name = "Mike"
    pics = %{"Name" => {:str, "XXXX", 4}}
    IO.puts "Hello " <> var_Name
    throw :stop_run
  end
end

That is ... a lot of code. More than the code for the COBOL version! Some of that is due to the handling of "STOP RUN" as an exception, which in this small example seems excessive. Why wrap the core function inside a main() that simply exists to trap the exception? (There is a reason. More on that later.)

I'm unsure of the reason for this project. If it is a side project made on a whim, and used for the entertainment (or education) of the author, then it makes sense.

But I cannot see this as a serious project, for a couple of reasons.

First, the produced Elixir code is longer, and in my opinion less readable, than the original COBOL code. I may be biased here, as I am somewhat familiar with COBOL and not at all familiar with Elixir, so I can look at COBOL code and say "of course it does that" but when I look at Elixir code I can only guess and think "well, maybe it does that". Elixir seems to follow the syntax for modern scripting languages such as Python and Ruby, with some unusual operators.

Second, the generated Elixir code provides some constructs which are not used. This is, perhaps, an artifact of generated code. Code generators are good, up to a point. They tend to be non-thinking; they read input, apply some rules, and produce output. They don't see the bigger picture. In the example, the transpiler has produced code that contains the variable "pics", which holds information about the COBOL program's PICTURE clauses, but this "pics" variable is never used.

The "pics" variable hints at a larger problem, which is that the transpiled code is not running the equivalent program but is instead interpreting data to achieve the same output. The Elixir program is, in fact, a tuned interpreter for a specific COBOL program. As an interpreter, its performance will be less than that of a compiled program. Where COBOL can compile code to handle the PICTURE clauses, the Elixir code must look up the PICTURE clause at runtime, decode it, and then take action.

My final concern is the most significant. The Elixir programming language is not a good match for the COBOL language. Theoretically, any program written in a Turing-complete language can be re-written in a different Turing-complete language. That's a nice theory, but in practice converting from one language to another can be easy or can be difficult. Modern languages like Elixir have functional and structured programming constructs. COBOL predates those constructs and has procedural code and global variables.

We can see the impedance mismatch in the Elixir code to catch the "stop run" exception. A COBOL program may contain "STOP RUN" anywhere in the code. The Elixir transpiler project has to build extra code to duplicate this capability. I'm not sure how the transpiler will handle global variables, but it will probably be a method that is equally tortured. Converting code from a non-structured language to a structured programming language is difficult at best and results in odd-looking code.

My point here is not to insult or to shout down the transpiler project. It will probably be an educational experience, teaching the author about Elixir and probably more about COBOL.

My first point is that programs are designed to match the programming language. Programs written in object-oriented languages have object-oriented designs. Programs written in functional languages have functional designs. Programs written in non-structured languages have... non-structured designs. The designs from one type of programming language do not translate readily to a programming language of a different type.

My second point is that we assume that modern languages are better than older languages. We assume that object-oriented languages like C++, C#, and Java are better than (non-OO) structured languages like Pascal and Fortran-77. Some of us assume that functional languages are better than object-oriented languages.

I'm not so sure about those assumptions. I think that object-oriented languages are better at some tasks than merely structured languages, and older structured-only languages are better at other tasks. Object-oriented languages are useful for large systems; they let us organize code into classes and functions, and even larger constructs through inheritance and templates. Dynamic languages like Python and Ruby are good for some tasks but not others.

And I must conclude that even older, non-functional, non-dynamic, non-object-oriented, non-structured programming languages are good for some tasks.

One analogy of programming languages is that of a carpenter's toolbox: full of various tools for different purposes. COBOL, one of the oldest languages, might be considered the hammer, one of the oldest tools. Hammers do not have the ability of saws, drills, tape measures, or levels, but carpenters still use them, when the task is appropriate for a hammer.

Perhaps we can learn a thing or two from carpenters.

Monday, May 10, 2021

Large programming languages considered harmful

I have become disenchanted with the C# programming language. When it was introduced in 2001, I liked the language. But the last few years have seen me less interested. I finally figured out why.

The reason for my disenchantment is the size of C#. The original version was a medium-sized language. It was an object-oriented language, and in many ways a copy of Java (which was also a medium-sized language in 2001).

Over the years, Microsoft has released new versions of C#. Each new version added features, and increased the capabilities of the language. But as Microsoft increased the capabilities, it also increased the size of the language.

The size of a programming language is an imprecise concept. It is more than a simple count of the keywords, or the number of rules for syntax. The measure I like to use is a rough guess of how much space it requires in the head of a programmer; how much brainpower is required to learn the language and how many neurons are needed to remember the different concepts, keywords, and rules of the language.

Such a measure has not been made with any tools, at least not that I know of. All I have is a rough estimate of a language's size. But that rough estimate is good enough to classify languages into small (BASIC, AWK, original FORTRAN), medium (Ruby, Python), and large (COBOL, C#, and Perl).

It may seem natural that languages expand over time. Languages other than C# have been expanded: Java (by Sun and later Oracle), Visual Basic (by Microsoft), C++ (by committee), Perl, Python, Ruby, and even languages such as COBOL and Fortran.

But such expansions of languages worry me. The source of my worry goes back to the "language wars" of the early days of computing.

In the 1960s, 1970s, and 1980s programmers argued (passionately) over programming languages. C vs Pascal, BASIC vs FORTRAN, Assembly language vs... everything.

Those arguments were fueled, mostly in my opinion, by the high cost of changing. Programming languages were not free. Compilers and interpreters were sold (or licensed). Changing languages meant spending for the new language -- and abandoning the investment in the old. And that meant that, once invested in a language, you were loath to give it up. And that meant you would defend that choice of programming language. People would rather fight than switch.

In the 2000s, thanks to open source, compilers and interpreters became free. The financial cost of changing from one language to another disappeared. And that meant that people could switch programming languages. And that meant that people could switch rather than fight.

So why am I worried, now, in 2021, about a new round of language wars?

The reason is the size of programming languages. More specifically, the size of the environment for any one programming language. That environment includes the language, the compiler (or interpreter), the standard library (or common packages used for development), and the IDE. Each of these components requires some amount of effort to learn and remember.

As each of these environments grows, the effort to learn it grows. And that means that the effort to switch from one language to another also grows. Changing from C# to Python, for example, requires not only learning the Python syntax, it also requires learning the common packages that are necessary for effective Python programs and also learning the IDE (probably PyCharm, which is quite different from Visual Studio).

We are rebuilding the barriers between programming languages. The old barrier was financial: it cost a lot to switch from one language to another. The new barrier is not financial but technical: the tools are free but the time to learn them is significant.

Barriers to switching programming languages can put us back in the position of defending our choices. Once again, programmers may rather fight than switch.

Tuesday, December 29, 2020

For readability, BASIC was the best

One language stands alone in terms of readability.

That language is BASIC.

BASIC -- the old, pre-Visual Basic of the 1980s -- has a unique characteristic: a single line can be read and understood.

One may think that a line from any programming language can be read and understood. After all, we read and understand programs all the time, don't we? That's true, but we read entire programs, or large sections of programs. Those large fragments of programs contain information that defines the classes, functions, and variables in the programs, and we use that information to understand the code. But if we strip away that extra information, if we limit ourselves to a single line, then we cannot read and completely understand the code.

Let's look at an example line of code:

a = b * 5

Can you tell what this code does? For certain? I cannot.

A naive assessment is that the code retrieves the value of variable 'b', multiplies it by 5, and stores the result in the variable 'a'. It is easy to assume that the variables 'a' and 'b' are numeric. Yet we don't know that -- we only assume it.

If I tell you that the code is Python code, and that 'b' refers to a string object, then our understanding of this code changes. The code still performs a 'multiply' operation, but 'multiply' for a string object is very different from 'multiply' for a numeric object.
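
Here is that line in plain Python, with 'b' bound first to a string and then to a number. The line itself looks identical each time, but the operation it performs changes:

    b = "abc"
    a = b * 5
    print(a)       # abcabcabcabcabc  (repetition, not arithmetic)

    b = 3
    a = b * 5
    print(a)       # 15  (the same line, now doing arithmetic)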

Instead, if I tell you that the code is C++, then we must identify the type for 'b' (which is not provided in the single line of code) and we must know if the class for 'b' defines the '*' operator. That operation could do anything, from casting b's contents to a number and multiplying by 5 to sending some text to 'cout'.

We like to think we understand the code, but instead we are constantly making assumptions about the code and building an interpretation that is consistent with those assumptions.

But the language BASIC is different.

In BASIC, the line

a = b * 5

or, if you prefer

100 LET A = B * 5

is completely defined. We know that the variable B contains a numeric value. (The syntax and grammar rules for BASIC require that a variable with no trailing sigil is a numeric variable.) We also know that the value of B is defined. (Variables are always defined. If not initialized in our code, they have the value 0.) We know the behavior of the '*' operator -- it cannot be overridden or changed. We know that the variable 'A' is numeric, and that it can receive the results of the multiply operation.

We know these things. We do not need other parts of the program to identify the type for a variable, or a possible redefinition of an operator.

This property of BASIC means that BASIC is readable in a way that other programming languages are not. Other programming languages require knowledge of declarations. All of the C-based languages (C, Objective-C, C++, C#, and even Java) require this. Perl, Python, and Ruby don't have variables; they have names that can refer to any type of object. The only other programming language that comes close is FORTRAN II. It might have had the same readability; it had rules for names of variables and functions.

BASIC's readability is possible because it requires the data type of a variable to be encoded in the name of the variable. This is completely at odds with modern languages, which allow variables to be named with no special markings for type. BASIC used static typing; not only static typing, but overt typing -- the type was expressed in the name.

Static, overt typing was possible in BASIC because BASIC used a limited number of types (numeric, integer, single-precision floating point, double-precision floating point, and string) each of which could be represented by a single punctuation character. Each variable name had a sigil for the type. (Or no sigil, in which case the type was numeric.)

Those sigils were so useful that programmers who switched to Visual Basic kept the idea, through programming style conventions that asked for prefixes on each variable. That effort became unwieldy, as there were many types (Visual Basic used many libraries for Windows functions and classes), and there was no all-encompassing standard and no way to enforce one.

Overt typing is possible with a language that has a limited number of types. It won't work (or it hasn't worked) for object-oriented languages. Those languages are designed for large systems with large code bases. They have built-in types and allow for user-defined types, with no support to indicate the type in the name of the variable. And as we saw with Visual Basic, expressing the different types is complicated.

But that doesn't mean the idea is useless. Overt typing worked for BASIC, a language that was designed for small programs. (BASIC was meant to be a language for teaching the skills of programming. The name was an acronym: Beginner's All-Purpose Symbolic Instruction Code.) Overt typing might be helpful for a small language, one designed for small programs.

It strikes me that cloud computing is the place for small languages. Cloud computing uses multiple processors to perform calculations, and it favors splitting applications into multiple small code bases. A well-designed cloud application consists of lots of small programs. Those small programs don't have to be built with object-oriented programming languages. I expect to see new programming languages for cloud-based computing, programming languages that are designed for small programs.

I'm not recommending that we switch from our current set of programming languages to BASIC. But I do think that the readability of BASIC deserves some attention.

Because programs, large or small, are easier to understand when they are readable.

Tuesday, November 17, 2020

The joy of a small programming language

I recently had the pleasure of reading the reference manual for Dibol (a programming language from the 1970s). And reading the manual was a pleasure. What made it so was the brevity and directness of the manual, the simplicity of the concepts, and the feeling of confidence in my ability to start programming in the language.

Dibol is a language from the minicomputer era, a short period of computing that occurred prior to the PC revolution. Made by DEC, it was an alternative to COBOL that ran on small systems and operated on simple files. Its syntax was influenced by COBOL, but much simpler.

The entire reference manual ran about 30 pages.

Dibol, I think, would have fit well in the Unix environment. Unix systems tended to use text files with flexible formats and varying line lengths, and Dibol was built for text files with fixed-length formats, so the match is not perfect. But Dibol is a language that follows the Unix philosophy: a simple tool to perform a simple task.

I could see it as a companion to AWK, a small language which handles text files and variable-length lines.

Dibol didn't "make it" as a programming language. It lives today, in the form of "Synergy/DE" which is a large superset of the original Dibol. So large, in fact, that it perhaps no longer follows the Unix idea of a simple tool. (I suspect the reference manual is longer than 30 pages.) But Dibol and its successors DBL and Synergy/DE have no significant presence in the popularity lists from Tiobe or PYPL or RedMonk.

We have no small languages today -- at least no small languages that enjoy any amount of popularity. AWK may be the most famous of the small languages, and even it is in the lower ranks on popularity lists. The popular languages are big (C++, Java, Python, C#, R, Visual Basic...).

We should be careful with the popularity indices. The methods used by them may skew the results. Tiobe and PYPL count queries about the languages -- people asking questions. I think we can safely assume that people will ask more questions about large languages than small languages, so complex languages have an advantage in the popularity metrics.

RedMonk also looks at Github and the code posted there. It's not clear if RedMonk is counting number of projects, number of files, or lines of code. (A complex language would probably have more lines of code.) But Dibol and other small languages do not show in RedMonk's list.

So we can conclude that there is no significant use of small programming languages. As a programming language, you're either big or you're unknown.

Which, I think, is a shame. The big languages are hard to learn, and they make it easy to create large, complicated systems. Small languages have the advantage of doing one job and doing it well. We as an industry may be missing out on a lot.

It also means that large programming languages are the "go to" solution for programmers; that programmers prefer large languages. There is some sense in this. A large programming language (C#, Java, take your pick) can handle just about any task. Therefore, a programmer who knows a large programming language is capable (in one sense) of handling any task. Large programming languages give the programmer flexibility, and more opportunity. (I say "in one sense" because knowledge of a programming language is, while necessary, not sufficient for the design of large or specialized applications. System architecture, user requirements, performance analysis, and deployment practices are also required.)

The consistent use of large programming languages means that our solutions will tend to be large. I know of no small Java applications, no small C++ applications, and no small C# applications. There may be small utility programs written in large languages, but I suspect that those utility programs grow over time and become large utility programs.

Large code bases require more effort to maintain than small code bases, so our bias towards large programming languages is, in the long run, costing our customers time and money.

Of course, the alternative (lots of small programs written in small programming languages) can have the problem of multiple languages and not enough people to maintain them. A shop with hundreds of programs written in a dozen or so different programming languages also faces maintenance efforts. Should every programmer in the shop know every programming language used in that shop? That also requires time and effort.

So what should a programmer do? Learn one large language? Learn lots of small languages?

My recommendation is to learn multiple languages, some large and some small. For large languages, pick two of the popular languages C, C++, C#, Java, JavaScript, and Python. Also learn some small languages such as Awk, Lua, and bash.

But don't spend time on Dibol.

Wednesday, October 7, 2020

Platforms are not neutral

We like to think that our platforms (processor, operating system, and operating environment) are neutral, at least when it comes to programming languages. After all, why should a processor care if the code that it executes (machine code) was generated by a compiler for C#, or Java, or Fortran, or Cobol? Ditto for the operating system. Does Windows care if the code was from a C++ program? Does Linux care if the code came from Go?

And if processors and operating systems don't care about code, why should a platform such as .NET or the JVM care about code?

One could argue that the JVM was designed for Java, and that it has the data types and operations that are needed for Java programs and not other languages. That argument is correct: the JVM was built for Java. Yet people have built compilers that convert other languages to the JVM bytecode. That list includes Clojure, Lisp, Kotlin, Groovy, JRuby, and Jython. All of those languages run on the JVM. All of those languages use the same data types as Java.

The argument for .NET is somewhat different. The .NET platform was designed for multiple languages. When announced, Microsoft supplied not only the C# compiler but also compilers for C++, VB, and Visual J++. Other companies have added compilers for many other languages.

But those experiences do not mean that the platforms are unbiased.

The .NET platform, and specifically the Common Language Runtime (CLR) was about interoperability. The goal was to allow programs written in different languages to work together. For example, to call a function in Visual Basic from a function in C++.

To achieve this interoperability, the CLR requires languages to use a common set of data types. These common types include 32-bit integers, 64-bit floating-point numbers, and strings. Prior to .NET, the different language compilers from Microsoft all had different ideas about numeric types and string types. C and C++ used null-terminated strings, but Visual Basic used length-prefixed strings. (Thus, a string in Visual Basic could contain NUL characters, but a string in C or C++ could not.) There were differences in floating-point numeric values also.
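
As a rough sketch of the mismatch (in Python, with the count width and character encoding simplified; these are not the exact layouts Microsoft used), the same text ends up as different bytes under the two conventions:

    text = "Hello"

    # C and C++ convention: the characters followed by a terminating NUL byte.
    c_style = text.encode("ascii") + b"\x00"

    # Classic Visual Basic convention: a length count stored in front of the characters.
    vb_style = len(text).to_bytes(4, "little") + text.encode("ascii")

    print(c_style)    # b'Hello\x00'
    print(vb_style)   # b'\x05\x00\x00\x00Hello'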

Notice that these types are aligned with the C data types. The CLR, and the agreement on data types, works for languages that use data types that match C's data types. The .NET version of Visual Basic (VB.NET) had to change its data types in order to comply with the rules of the CLR. Thus, VB.NET was quite a bit different from the previous Visual Basic.

The CLR works for languages that use C-style data types. The CLR supports custom data types, which is nice, and necessary for languages that do not use C-style data types, but then one loses interoperability, and interoperability was the major benefit of .NET.

The .NET platform favors C-style data types. (Namely integers, floating point, and NUL-terminated strings.)

The JVM also favors C-style data types.

Many languages use C-style data types.

What languages don't use C-style data types?

Cobol, for one. Cobol was developed prior to C, and it has its own ideas about data. It allows numeric values with PICTURE clauses, which can define limits and also formatting. Some examples:

   05  AMOUNT1          PIC 999.99.
   05  AMOUNT2          PIC 99999.99.
   05  AMOUNT3          PIC 999999.99.
   05  AMOUNT4          PIC 99.99.

(The '05' at the front of each line is not a part of the variable, but indicates how the variable is part of a larger structure.)

These four different values are numeric, but they do not align well with any of the CLR types. Thus, they cannot be exchanged with other programs in .NET.

There are compilers for Cobol that emit .NET modules. I don't know how they work, but I suspect that they either use custom types (which are not easily exchanged with modules from other languages) or they convert the Cobol-style data to a C-style value (which would incur a performance penalty).

Pascal has a similar problem with data types. Strings in Pascal are length-count strings, not NUL-terminated strings. Pascal has "sets" which can contain a set of values. The notion of a set translates poorly to other languages. C, C++, Java, and C# can use enums to do some of the work, but sets in Pascal are not quite enums.

Pascal also has definite ideas about memory management and pointers, and those ideas do not quite align with memory management in C (or .NET). With care, one can make it work, but Pascal is not a native .NET language any more than Cobol.

Fortran is another language that predates the .NET platform, and it doesn't work well on it. Fortran is a simpler language than Cobol or Pascal, and it concentrates on numeric values. The numeric types can convert to the CLR numeric types, so compiling and exchanging data is possible.

Fortran's strength was speed. It was (and still is) one of the fastest languages for numeric processing. Its speed is due to its static memory layout, something that I have not seen in compilers for Fortran to .NET modules. Thus, Fortran on .NET loses its advantage. Fortran on .NET is not fast, and I fear it never will be.

Processors, too, are biased. Intel processors handle binary numeric values for integers and floating-point values, but not BCD values. IBM S/360 processors (and their descendants) can handle BCD data. (BCD data is useful for financial transactions because it avoids many issues with floating-point representations.)
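
A small Python example of the kind of problem a decimal representation avoids (Python's decimal module stands in here for BCD arithmetic; it is not BCD, but it illustrates the same point):

    from decimal import Decimal

    # Binary floating point cannot represent 0.10 exactly, so cents drift.
    print(0.10 + 0.20)                                            # 0.30000000000000004
    print(0.10 + 0.20 == 0.30)                                    # False

    # Decimal arithmetic keeps exact cents, which is why financial code prefers it.
    print(Decimal("0.10") + Decimal("0.20"))                      # 0.30
    print(Decimal("0.10") + Decimal("0.20") == Decimal("0.30"))   # True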

Our platforms are biased. We often don't see that bias, most likely because we use only a single platform. (A single processor type, a single operating system, a single programming language.) The JVM and .NET platforms are biased towards C-style data types.

There are different approaches to data and to computation, and we're limiting ourselves by limiting our expertise. I suspect that in the future, developers will rediscover the utility of data types that are not C-style types, especially the PICTURE-specified numeric types of Cobol. As C and its descendants are ill-equipped to handle such data, we will see new languages with the new (old) data types.


Thursday, May 21, 2020

The lessons we programmers learn

We in the programming industry learn from our mistakes, and we create languages to correct our mistakes. Each "generation" of programming language takes the good aspects of previous languages, drops the bad aspects, and adds new improved aspects. (Although we should recognize that "good", "bad", and "improved" are subjective.) Over time, our "programming best practices" are baked into our programming languages.

We learned that assembly language was specific to hardware and forced us to think too much about memory layouts, so we invented high-level languages. COBOL and FORTRAN allowed us to write programs that were portable across computers from different vendors and let us specify variables easily. (I won't say "memory management" here, as early high-level languages did not allow for dynamic allocation of memory the way C and C++ do.)

COBOL, FORTRAN, and BASIC (another high-level language) used GOTO statements and rudimentary IF statements for flow control. We learned that those statements lead to tangled code (some say "spaghetti code"), so we invented structured programming with its "if-then-else" and "do-while" statements. Pascal was one of the first languages to implement structured programming. (It retained the GOTO statement, but it was rarely needed.)

Structured programming was better than non-structured programming, but it was not sufficient for large systems. We learned that large systems need more than if-then-else and do-while to organize the code, so we invented object-oriented programming. The programming languages C++, Java, and C# became popular.

Designing new languages seems like a built-in feature of the human brain. And designing new languages that use the best parts of the old languages while replacing the mistakes of the old languages seems like a good thing.

But this arrangement bothers me.

Programmers who learn the whole trail of languages, from assembly to BASIC to C++ to Java, understand the weaknesses of the early languages and the strengths of later languages. But these programmers are few. Most programmers do not learn the whole sequence; they learn only the current languages which have pruned away all of the mistakes.

We programmers often look forward. We want the latest language, the newest database, the most recent operating system. In looking forward, we don't look back. We don't look at the older systems, and the capabilities that they had.

Those old systems (and languages) had interesting features. Their designers had to be creative to solve certain problems. Many of those solutions were discarded as hardware became more powerful and languages became more structured.

Is it possible that we have, in our enthusiasm to improve programming languages, discarded some ideas that are worthy? Have we thrown out a baby (or two) with the bathwater of poor language features?

If we don't look back, if we leave those abandoned features in the dust heap, how would we know?

Thursday, March 19, 2020

The Lesson from BASIC, Visual Basic, and VB.NET

This week Microsoft announced that VB.NET would receive no enhancements. In effect, Microsoft has announced the death of VB.NET. And while some developers may grieve over the loss of their favorite language, we should look at the lesson of VB.NET.

But first, let's review the history of BASIC, the predecessor of VB.NET.

BASIC has a long history, and Microsoft was there for most of it. Invented in the mid-1960s, BASIC was a simple language designed for timesharing systems and for people who were not programmers. The major competing languages, the ones a programmer could use instead of BASIC, were FORTRAN and COBOL. BASIC, while less powerful, was much easier to use than either of the alternatives.

Small home computers were a natural for BASIC. Microsoft saw an opportunity and built BASIC for the Apple II, Commodore's PET and CBM, Radio Shack's TRS-80, and many others. Wherever you turned, BASIC was available. It was the lingua franca of programming, which made it valuable.

BASIC was popular, but its roots in timesharing made it a text-oriented language. (To be fair, all other languages were text-oriented, too.) As computers became more popular, programmers had to manipulate hardware directly to use special effects such as colors and graphics. Microsoft helped by enhancing the language with commands for graphics and for other hardware such as sound. BASIC remained the premier language for programming because it was powerful enough (or good enough) to get the job done.

Microsoft's Windows posed a challenge to BASIC. Even with its enhancements for graphics, BASIC was not compatible with the event-driven model of Windows. Microsoft's answer was Visual Basic, a new language that shared some keywords with BASIC but little else. The new Visual Basic was a completely different language, even more powerful than the biggest "Disk BASIC" Microsoft ever released. Microsoft's other language for Windows, Visual C++, was powerful but hard to use. Visual Basic was less powerful but much easier to use, and it had better support for COM. The ease of use and COM support provided value to developers.

Microsoft's .NET posed a second challenge to Visual Basic, which was not compatible with the new architecture of the .NET framework. Microsoft's answer was VB.NET, a redesign that looked a lot like C# but with some keywords retained from Visual Basic.

For the past two decades (almost), VB.NET has been a supported language in Microsoft's world, living beside C#. That coexistence now comes to an end, with C# getting upgrades and VB.NET getting... very little.

The problem with VB.NET (as I see it) is that VB.NET was too close, too similar to C#. VB.NET offered little that was different (or better) than C#. Thus, when picking a language for a new project, one could pick C# or VB.NET and be assured that it would work.

But being similar, for programming languages, is not a good thing. Different languages should be different. They should offer different programming constructs and different capabilities, because the differences can provide value.

C++ is different from C, and while C++ can compile and run nearly every C program, the differences between C++ and C are enough that both languages offer value.

Python and Ruby are different enough that both can exist. Both offer value to the programmer.

C# and Java are close cousins, and one could argue that they, too, are too similar to co-exist. In this case, it may be that the sponsoring companies (Microsoft for C#, Oracle for Java) are enough of a difference. For these languages, the relationship with the sponsoring company is the value.

But VB.NET was too close to C#. Anything you could do in VB.NET you could do in C#, and usually with no additional effort. VB.NET offered nothing of distinct value.

We should note that the end of VB.NET does not mean the end of BASIC. There are other versions of BASIC, each quite different from VB.NET and C#. These different versions may continue to thrive and provide value to programmers.

Sometimes, being different is important.

Thursday, March 5, 2020

Programming languages differ

A developer recently ranted about the Go language. The rant was less about the language and more about run-time libraries and interfaces to underlying operating systems and their file systems. The gist of his rant is that Go is not a suitable programming language for performing certain operations on certain operating systems, and therefore he is "done" with Go.

The first part of his rant is correct. I disagree with the conclusion.

We have the idea that programming languages are general-purpose, that any language can be used for any problem. But programming didn't start that way. Programming languages started unequal, with different languages designed for different types of computing. And while languages have become less specific and more general-purpose, they are still not equal.

I think our idea about a general-purpose programming language (or perhaps an all-purpose programming language) started with the IBM System/360  in the 1960s and the PL/1 programming language.

Prior to the System/360, computers had specific purposes. Some computers were designed for numeric processing, and others were designed for transaction processing. Not only were computers specific to a purpose, but programming languages were, too. FORTRAN was for numeric processing, and used on computers built for numeric processing. COBOL was for transaction processing, and used on computers built for commercial processing.

After IBM introduced the System/360, a general-purpose computer suitable for both numeric and commercial processing, it introduced PL/1, a general-purpose programming language suitable for numeric and commercial processing. (A very neat symmetry, with general-purpose hardware using a general-purpose programming language.)

PL/1 saw little popularity, but the notion of a general-purpose programming language did gain popularity and still dominates our mindset. We view programming languages as rough equals, and the choice of language can be made based on factors such as popularity (a proxy for availability of talent) and tool support (such as IDEs and debuggers).

There are incentives to reinforce the notion that a programming language can do all things. One comes from vendors, another comes from managers, and the third comes from programmers.

Vendors have an incentive to push the notion that a language can do everything -- or at least everything the client needs. Systems from a vendor come with some languages but not all languages. Explaining that your languages can solve problems is good marketing. Explaining that they cannot solve every problem is not.

The managers who purchased computers (which were expensive in the early days) wanted validation of their selection. They wanted to hear that their computer could solve the problems of the business. That meant believing in the flexibility and power of the hardware, and of the programming languages.

The third group of believers is programmers. Learning a programming language takes time. The process is an investment. We programmers want to think that we made a good investment. Admitting that a programming language is not suitable for some tasks means that one may have to learn a different programming language. That's another investment of time. It's easier to convince oneself that the current programming language is capable of everything that is needed.

But different programming languages have different strengths -- and different weaknesses. Programming languages are not the same, and they are not interchangeable.

COBOL is good for transaction processing, especially with flat files. But I would not use it for word processing or games.

FORTRAN is good for numeric processing. But I would not use it for word processing, nor for transaction processing.

Object oriented languages such as C++, Java, and C# are good for large applications that require structure and behavior that can be defined and verified by the compiler. (Static types and type checking.)

BASIC and Pascal are good for learning the concepts of programming. Both have been expanded in many ways, and have been used for serious development.

R is good for statistics and rapid analysis of numeric data, and the visualization of data. But I would not use it for machine learning.

Perl, Python, and Ruby are good for prototyping and for small- to medium-size applications. I would not use them for large-scale systems.

We should not assume that every language is good for every task, or every purpose. Languages (and their run-time libraries) are complex. They can be measured in multiple dimensions: complexity of the language, memory management and garbage collection, type safety, support tools, library support, connections to data bases and data sources, vendor support, community support, and more. Each language has strengths and weaknesses.

The developer who ranted against the Go language had criticisms about Go's handling of filesystems on Windows. His complaint is not unfounded; Go has a library that expects a Unix-like filesystem and not a Windows filesystem, and works poorly with Windows. But that doesn't mean that the language is useless!

An old joke tells of a man who consults a doctor. The man lifts his arm and says "Doc, it hurts when I do this." The doctor replies, "Well, don't do that!" While it gets laughs, there is some wisdom in it. Go is a poor language for handling the Windows filesystem; don't use it for that.

A more general lesson is this: know your task, and know your programming language. Understand what you want to accomplish, at a fairly detailed level. Learn more than one programming language and recognize that some are better (at certain things) than others. When selecting a programming language, make an informed decision. Don't write off a language forever because it cannot do a specific task, or works poorly for some projects.

Tuesday, February 25, 2020

The language treadmill

Technology is not a platform, but a treadmill. Readers of a certain age may remember the closing credits of "The Jetsons", in which George Jetson walks the family dog, Astro, on a space-age treadmill that runs a bit faster than he would like. The image is not too far off from the technology treadmill.

We're accustomed to thinking of hardware as changing. This year's laptop computer is better (faster, more powerful, more memory) than last year's laptop computer. We consistently improve processors, memory, storage, displays, ... just about everything. Yet the hardware also changes in kind; today's laptop PCs are a far cry from the original IBM PC. Even desktop PCs are different from the original IBM PC. So different, in fact, that they would not be considered "PC compatible" at the time. (Today's USB-C-attached displays would not connect, today's USB keyboards would not connect, nor would USB thumb drives. Nothing from today's PCs would connect to an IBM PC model 5150, nor would anything from 1982's IBM PC connect to today's PCs.)

The treadmill includes more than hardware. It also includes software. Operating systems are the most visible software that changes; we deal with them every day. Microsoft has made improvements to each version of Windows, and Windows 10 is different from Windows 3.11 and especially Windows 1.0. Apple changes OS X, too; today's version is quite different from the original OS X and even more different from the earlier Apple OS versions.

The treadmill extends beyond hardware and operating systems. It includes applications (Microsoft Word, for example) and it also includes programming languages.

Programming languages, whether governed by committee (C, C++) or individual (Perl, Python) change over time. Many times, a new version is "backwards compatible", meaning that your old code will work with the new compiler or interpreter. But not always.

Some years ago, I worked on a project that built and maintained a large-ish C++ application which was deployed on Windows and Solaris. The C++ language was chosen because both platforms supported that language; compilers were available for Windows and for Solaris. But the compilers were upgraded on different schedules: Microsoft had their schedule for upgrades to their compiler, and Sun had their own (different) schedule for theirs.

Different upgrade schedules could have been a problem, but they weren't. (Until we made them one. More on that later.) Most updates to the compilers were backwards-compatible, so old code could be moved to the new compiler, recompiled, and the resulting program run immediately.

Most updates worked that way.

One update did not. It was an update to the C++ language, one that changed the way C++ worked and just so happened to break existing code. (The C++ governing committee was reluctant to make the change, but it was necessary and I agree with the reasoning.) So the upgrade required changes to the code. But we couldn't make the changes immediately; we had to wait for both compilers (Microsoft and Solaris) to support the new version of C++.

It turns out that we also had to wait for our managers to allocate time to make the code changes for the new compilers. Our time was allocated for new features to the large-ish application, not to technical improvements that provided no new business features. Thus, our code stayed with the old (soon to be outdated) structures.

Eventually, we saw a chain of events that forced us to update our code. Sun released a new version of the Solaris operating system, and we had to install it to stay current with our licenses. Once installed, we learned that the new operating system supported only the new version of the C++ compiler, and our code immediately broke. We could not compile and build our application on Solaris, nor release it to our production servers (which had been updated to the new Solaris).

This caused a scramble in the development teams. Our managers (who had prevented us from modifying the C++ code and moving to the new C++ compiler) were now anxious for us to make the changes, run tests, and deploy our application to our servers. We did make the changes, but doing so required us to stop our current work to modify the code, run the tests, and deploy the application. Those efforts delayed other work for new features.

This is a cautionary tale, illustrating the need to stay up to date with programming tools. It also shows that programming languages change. Our example was with C++, but other languages change. Python has had two version "tracks" for some time (version 2 and version 3) and development and support of version 2 has come to an end. The future of Python is its version 3. (Do you use Python? Are you using version 3?)
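Python's own version split is a small, concrete example of a language change that is not backwards-compatible. Two well-known differences between version 2 and version 3 are enough to break old code:

    # Two well-known differences between Python 2 and Python 3.

    # 1. print was a statement in Python 2; it is a function in Python 3.
    #    Python 2:  print "hello"      (a syntax error in Python 3)
    print("hello")

    # 2. Dividing two integers with / truncated in Python 2 (7 / 2 == 3);
    #    in Python 3 it produces a float.
    print(7 / 2)    # 3.5
    print(7 // 2)   # 3  (the // operator keeps the old truncating behavior)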

Oracle has released Java version 14. Many businesses still use version 8.

There have been changes to JavaScript, TypeScript, C# (and .NET, on which C# rests), and even SQL. Most of these changes are backwards-compatible, so no code changes are required to move to the new version.

You will eventually move to the new operating system, or compiler, or interpreter. Your code may have to change. The changes can be on your schedule, or on someone else's schedule. My advice: be aware of updates, and migrate your programming tools on your own schedule.

Wednesday, February 12, 2020

Advances in platforms and in programming languages

The history of computing can be described as a series of developments, alternating between computing platforms and programming languages. The predominant pattern is one in which hardware is advanced, and then programming languages. Occasionally, hardware and programming languages advance together, but that is less common. (Hardware and system software -- not programming languages -- do advance together.)

The early mainframe computers were single-purpose devices. In the early 21st century, we think of computers as general-purpose devices, handling financial transactions, personal communication, navigation, and games, because our computing devices perform all of those tasks. But in the early days of electronic computing, devices were not so flexible. Mainframe computers were designed for a single purpose; either commercial (financial) processing, or scientific computation. The distinction was visible through all aspects of the computer system, from the processor and representations for numeric values to input-output devices and the characters available.

Once we had those computers, for commercial and for scientific computation, we built languages. COBOL for commercial processing; FORTRAN for scientific processing.

And thus began the cycle of alternating developments: computing platforms and programming languages. The programming languages follow the platforms.

The next advance in hardware was the general-purpose mainframe. The IBM System/360 was designed for both types of computing, and it used COBOL and FORTRAN. But we also continued the cycle of "platform and then language" with the invention of a general-purpose programming language: PL/1.

PL/1 was the intended successor to COBOL and to FORTRAN. It improved the syntax of both languages and was supposed to replace them. It did not. But it was the language we invented after general-purpose hardware, and it fits in the general pattern of advances in platforms alternating with advances in languages.

The next advance was timesharing. This advance in hardware and in system software let people use computers interactively. It was a big change from the older style of scheduled jobs that ran on batches of data.

The language we invented for this platform? It was BASIC. BASIC was designed for interactive use, and also designed to avoid requests of system operators to load disks or tapes. A BASIC program could contain its code and its data, all in one. Such a thing was not possible in earlier languages.

The next advance was minicomputers. The minicomputer revolution (DEC's PDP-8 and PDP-11, along with systems from other vendors) used BASIC (adopted from timesharing) and FORTRAN. Once again, a new platform initially used the languages from the previous platform.

We also invented languages for minicomputers. DEC invented FOCAL (a lightweight FORTRAN) and DIBOL (a lightweight COBOL). Neither replaced its corresponding "heavyweight" language, but invent them we did.

The PC revolution followed minicomputers. PCs were small computers that could be purchased and used by individuals. Initially, PCs used BASIC. It was a good choice: small enough to fit into the small computers, and simple enough that individuals could quickly understand it.

The PC revolution invented its own languages: CBASIC (a compiled form of BASIC), dBase (later named "xbase"), and most importantly, spreadsheets. While not a programming language, a spreadsheet is a form of programming. It organizes data and specifies calculations. I count it as a programming platform.

The next computing platform was GUI programming, made possible with both the Apple Macintosh and Microsoft Windows. These "operating environments" (as they were called) changed programming from text-oriented to graphics-oriented, and required more powerful hardware -- and software. But the Macintosh first used Pascal, and Windows used C, two languages that were already available.

Later, Microsoft invented Visual Basic and provided Visual C++ (a concoction of C++ and macros to handle the needs of GUI programming), which became the dominant languages of Windows. Apple switched from Pascal to Objective-C, which it enhanced for programming the Mac.

The web was another computing advance, bringing two distinct platforms: the server and the browser. At first, servers used Perl and C (or possibly C++); browsers were without a language and had to use plug-ins such as Flash. We quickly invented Java and (somewhat less quickly) adopted it for servers. We also invented JavaScript, and today all browsers provide JavaScript for web pages.

Mobile computing (phones and tablets) started with Objective-C (Apple) and Java (Android), two languages that were convenient for those devices. Apple later invented Swift, to fix problems with the syntax of Objective-C and to provide a better experience for its users. Google invented Go and made it available for Android development, but it has seen limited adoption.

Looking back, we can see a clear pattern. A new computing platform emerges. At first, it uses existing languages. Shortly after the arrival of the platform, we invent new languages for that platform. Sometimes these languages are adopted, sometimes not. Sometimes a language gains popularity much later than expected, as in the case of BASIC, invented for timesharing but used for minicomputers and PCs.

It is a consistent pattern.

Consistent that is, until we get to cloud computing.

Cloud computing is a new platform, much like the web was a new platform, and PCs were a new platform, and general-purpose mainframes were a new platform. And while each of those platforms saw the development of new languages to take advantage of new features, the cloud computing platform has seen... nothing.

Well, "nothing" is a bit harsh and not quite true.

True to the pattern, cloud computing uses existing languages. Cloud applications can be built in Java, JavaScript, Python, C#, C++, and probably Fortran and COBOL. (And there are probably cloud applications that use these languages.)

And we have invented Node.js, a JavaScript runtime that is useful for cloud computing.

But there is no native language for cloud computing. No language that has been designed specifically for cloud computing. (No language of which I am aware. Perhaps there is, lurking in the dark corners of the internet that I have yet to visit.)

Why no language for the cloud platform? I can think of a few reasons:

First, it may be that our current languages are suitable for the development of cloud applications. Languages such as Java and C# may have the overhead of object-oriented design, but that overhead is minimal with careful design. Languages such as Python and JavaScript are interpreted, but that may not be a problem with the scale of cloud processing. Maybe the pressure to design a new language is low.

Second, it may be that developers, managers, and anyone else connected with projects for cloud applications are too busy learning the platform. Cloud platforms (AWS, Azure, GCP, etc.) are complex beasts, and there is a lot to learn. It is possible that we are still learning about cloud platforms and not ready to develop a cloud-specific language.

Third, it may be too complex to develop a cloud-specific programming language. The complexity may reside in separating cloud operations from programming, and we need to understand the cloud before we can understand its limits and the boundaries for a programming language.

I suspect that we will eventually see one or more programming languages for cloud platforms. The new languages may come from the big cloud providers (Amazon, Microsoft, Google) or smaller providers (Dell, Oracle, IBM) or possibly even someone else. Programming languages from the big providers will be applicable for their respective platforms (of course). A programming language from an independent party may work across all cloud platforms -- or may work on only one or a few.

We will have to wait this one out. But keep your eyes open. Programming languages designed for cloud applications will offer exciting advances for programming.

Wednesday, January 8, 2020

Predictions for 2020

I have some predictions for 2020. They may (or may not) be correct.

Hardware: Virtual desktops and the cloud

I expect 2020 to be the "year of the cloud", in a sense. While cloud computing is popular, I see the phrase "the cloud" becoming more popular in the upcoming year, even more popular than cloud computing itself. How can this be? How can the term be used more than the actual thing?

I expect that lots of people will use the term "the cloud" to mean online (or web-based) computing, even in situations that do not use actual, honest-to-goodness cloud computing.

We will see an interest in virtual desktops, specifically for Windows. Today's PCs are real PCs with operating systems and applications. In 2020, look for a push (by Microsoft) for virtual desktops - instances of Windows hosted on servers and accessed by a browser or by Microsoft's Remote Desktop program.

Most folks will call this "Windows in the cloud" or "cloud computing". The former is a more accurate term, but the latter is not too wrong. Virtual desktops will be hosted on servers, with some applications built for cloud computing and others not. Microsoft's Office products (Word, Excel, and Outlook) will all reside in the cloud as true cloud applications. Applications from other vendors will run on virtual desktops but won't be true cloud applications.

Virtual Windows desktops offer several advantages to Microsoft. First, they are paid for as subscriptions, so Microsoft gets a steady cash flow. Second, Microsoft can move customers to the latest version of Windows 10. Third, and perhaps most importantly, Apple is not prepared to offer a matching service. (Apple remains in the world of "fat desktops" which run the operating system and the applications. They cannot move that experience to virtual workstations hosted in the cloud.)

Cloud-based Windows is not for everyone. Some will be unwilling to move to virtual desktops, and some will be unable to move. Anyone who insists on running an older version of Windows will remain in "fat desktop" land. And anyone who cannot install their software (perhaps because they have mislaid the install CD-ROM) will stay with their current desktop computers.

Those who do move to virtual desktops will be able to use simpler, lightweight computers (perhaps Chromebooks, perhaps computers with "Windows in S mode") that run little more than the software necessary to access the virtual desktop. The physical desktop computer will be a "terminal" to the virtual desktop. The benefits for those users will be cheaper hardware, more reliable desktops, invisible backups, versioning of data files, and the ability to shift work from one lightweight desktop "terminal" to another.

Programming languages: More Python, less Perl

Perl was perhaps the first programming language to be designated a "scripting language". (It won the designation, although other languages such as Awk and Csh predate it.)

But Perl has fallen on hard times. Developers have left Perl for Python, and the "Perl 6" effort, delayed and poorly advertised, has now re-branded itself as "Raku" to allow Perl 5 to continue without the cloud of eventual shutdown. The change comes late, and I fear that many have abandoned Perl in favor of Python.

Ruby is in a similar situation, with Python appealing to many developers and interest in Ruby waning.

Python is winning the "scripting race". Many schools teach it, and many experienced developers recommend it as a first language to learn. (I'm in that group.) It has good support with libraries, tools, and documentation. Microsoft supports it in "Visual Studio Code" and in "Visual Studio".

I expect Python to gain in popularity, and Perl and Ruby to decline.

Programming languages: Smaller languages

I expect to see a shift from object-oriented languages (Java, C#, and C++) to smaller scripting languages (Python and perhaps Ruby).

Object-oriented languages are effective for large, complex systems. They were just what we needed for the large, complex applications that ran on PCs and servers -- before cloud computing. With cloud computing, we build systems (large or small) out of services, and we strive to make services small. (A large application can be built of many small services.)

Java, C#, and C++ can be used to build small services, but often Python and other languages can be a better fit. The "big" object-oriented languages carry a lot of baggage to make object-oriented programming work; the smaller languages carry less.

Two possible contenders for building services may be Swift and Visual Basic. Swift is relatively new, and still undergoing changes. Visual Basic has evolved through several generations and Microsoft may create a smaller, lighter version for services. (These are guesses; I have no indication from Apple or Microsoft that such projects are underway or even under consideration.)

Programming languages: Safer languages

The languages Rust and Go are getting a lot of attention. Both are compiled languages, letting one build fast and efficient applications.

Rust and Go may challenge C and C++ as the premier languages for high-performance systems. C and C++ have had a good run, from the 1970s to today. But more and more, safety in programs is becoming important. The design of languages such as Rust and Go gives them an advantage over C and C++.

C and C++ will stay with us, of course. Many large-scale and popular projects are written in them. I don't expect those projects to convert to Rust, Go, or any other language.

I do expect new projects to consider Rust and Go as candidate languages. I also expect teams that convert existing systems from their current technology (which could be COBOL, Fortran, or even C and C++) to think about Rust and Go.

Containers: Nice for deployment, little else

Containers were quite popular in the past year, and I expect that they will remain popular. They are convenient ways of deploying (or sharing with others) a complete application, all ready to go.

Containers offer no benefits beyond that, however, and they are helpful only to the people who must deploy applications. Therefore, containers, like virtual machines, will quietly move to the realm of sysadmins.

Development: better tech than ever

For the year 2020, the picture for development looks rather nice. We have capable languages (and lots of them); stable networks, servers, and operating systems; powerful tools for testing, deployment, and monitoring; and excellent tools for communication and collaboration. Problems in development will not be technical in nature -- which means that the challenges we face will be with people and interactions.

For developers (and project managers), I expect to see interest in collaboration tools. That includes the traditional tools such as version control, item tracking, and messaging tools such as Slack. We might see interest in new tools that have not been introduced as yet.

Summary

I see a bright future for development. We have good tools and good practices. The challenges will be people-oriented, not technology-oriented. Let's enjoy this year!

Wednesday, September 4, 2019

Don't shoot me, I'm only the OO programming language!

There has been a lot of hate for object-oriented programming of late. I use the word "hate" with care, as others have described their emotions as such. After decades of success with object-oriented programming, now people are writing articles with titles like "I hate object-oriented programming".

Why such animosity towards object-oriented programming? And why now? I have some ideas.

First, consider the age of object-oriented programming (OOP) as the primary paradigm for programming. I put the acceptance of OOP somewhere after the introduction of Java (in 1995) and before Microsoft's C# and .NET initiative (in 1999), which makes OOP about 25 years old -- or one generation of programmers.

(I know that object-oriented programming was around much earlier than C# and Java, and I don't mean to imply that Java was the first object-oriented language. But Java was the first popular OOP language, the first OOP language that was widely accepted in the programming community.)

So it may be that the rejection of OOP is driven by generational forces. Object-oriented programming, for new programmers, has been around "forever" and is an old way of looking at code. OOP is not the shiny new thing; it is the dusty old thing.

Which leads to my second idea: What is the shiny new thing that replaces object-oriented programming? To answer that question, we have to answer another: what does OOP do for us?

Object-oriented programming, in brief, helps developers organize code. It is one of several techniques to organize code. Others include Structured Programming, subroutines, and functions.

Subroutines are possibly the oldest technique for organizing code. They date back to the days of assembly language, when code that was executed more than once was called with a "branch" or "call" or "jump subroutine" opcode. Instead of repeating code (and using precious memory), common code could be stored once and invoked as often as needed.

Functions date back to at least Fortran, consolidating common code that returns a value.

For two decades (from the mid-1950s to the mid-1970s), subroutines and functions were the only way to organize code. In the mid-1970s, the structured programming movement introduced an additional way to organize code, with IF/THEN/ELSE and WHILE statements (and an avoidance of GOTO). These techniques worked at a more granular level than subroutines and functions. Structured programming organized code "in the small" and subroutines and functions organized code "in the medium". Notice that we had no way (at the time) to organize code "in the large".

Techniques to organize code "in the large" did come. One attempt was dynamic-linked libraries (DLLs), introduced with Microsoft Windows but also used by earlier operating systems. Another was Microsoft's COM, which organized the DLLs. Neither were particularly effective at organizing code.

Object-oriented programming was effective at organizing code at a level higher than procedures and functions. And it has been successful for the past two-plus decades. OOP let programmers build large systems, sometimes with thousands of classes and millions of lines of code.

So what technique has arrived that displaces object-oriented programming? How has the computer world changed, that object-oriented programming would become despised?

I think it is cloud programming and web services, and specifically, microservices.

OOP lets us organize a large code base into classes (and namespaces which contain classes). The concept of a web service also lets us organize our code, at a level higher than procedures and functions. A web service can be a large thing, using OOP to organize its innards.

But a microservice is different from a large web service. A microservice is, by definition, small. A large system can be composed of multiple microservices, but each microservice must be a small component.

Microservices are small enough that they can be handled by a simple script (perhaps in Python or Ruby) that performs a few specific tasks and then exits. Small programs don't need classes and object-oriented programming. Object-oriented programming adds cost to simple programs with no corresponding benefit.
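To make that concrete, here is a minimal sketch of a microservice-as-script: it reads a small JSON request from standard input, performs one specific task, writes a JSON response, and exits. There are no classes and no object-oriented scaffolding. (The task itself -- converting an amount at a fixed, made-up exchange rate -- is invented purely for the example.)

    #!/usr/bin/env python3
    # A sketch of a "microservice as a script": read a request, perform one
    # specific task, write a response, and exit. No classes are needed.
    import json
    import sys

    RATE_USD_TO_EUR = 0.92   # a made-up fixed rate, for illustration only

    def main():
        request = json.load(sys.stdin)          # e.g. {"amount_usd": 100.0}
        amount_usd = float(request["amount_usd"])
        response = {
            "amount_usd": amount_usd,
            "amount_eur": round(amount_usd * RATE_USD_TO_EUR, 2),
        }
        json.dump(response, sys.stdout)

    if __name__ == "__main__":
        main()

Saved as (say) convert.py, it can be run with something like: echo '{"amount_usd": 100}' | python3 convert.py -- it does its one job and exits, which is about as small as a service can get.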

Programmers building microservices in languages such as Java or C# may feel that object-oriented programming is being forced upon them. Both Java and C# are object-oriented languages, and they mandate classes in your program. A simple "Hello, world!" program requires the definition of at least one class, with at least one static method.

Perhaps languages that are not object-oriented are better for microservices. Languages such as Python, Ruby, or even Perl. If performance is a concern, the compiled languages C and Go are available. (It might be that the recent interest in C is driven by the development of cloud applications and microservices for them.)

Object-oriented programming was (and still is) an effective way to manage code for large systems. With the advent of microservices, it is not the only way. Using object-oriented programming for microservices is overkill. OOP requires overhead that is not helpful for small programs; if your microservice is large enough to require OOP, then it isn't a microservice.

I think this is the reason for the recent animosity towards object-oriented programming. Programmers have figured out that OOP doesn't mix with microservices -- but they don't know why. They feel that something is wrong (which it is) but they don't have the ability to shake off the established programming practices and technologies (perhaps because they don't have the authority).

If you are working on a large system, and using microservices, give some thought to your programming language.

Friday, June 21, 2019

The complexity of programming languages

A recent project saw me examining and tokenizing code for different programming languages. The languages ranged from old languages (COBOL and FORTRAN, among others) to modern languages (Python and Go, among others). It was an interesting project, and I learned quite a bit about many different languages. (By 'tokenize', I mean to identify the type of each item in a program: Variables, identifiers, function names, operators, etc. I was not parsing the code, or building an abstract syntax tree, or compiling the code into op-codes. Tokenizing is the first step of compiling, but a far cry from actually compiling the code.)

One surprising result: newer languages are easier to tokenize than older languages. Python is easier to tokenize than COBOL, and Go is easier to tokenize than FORTRAN.

This is counterintuitive. One would think that older languages would be primitive (and therefore easy to tokenize) and modern languages sophisticated (and therefore difficult to tokenize). Yet my experience shows the opposite.

Why would this be? I can think of two -- no, three -- reasons.

First, the old languages (COBOL, FORTRAN, and PL/I) were designed in the age of punch cards, and punch cards impose limits on source code. COBOL, FORTRAN, and PL/I have few things in common, but one thing that they do have in common is line layout and the "identification" field in columns 73 through 80.

When your program is stored on punch cards, a risk is that someone will drop the deck of cards and the cards will become out of order. Such a thing cannot happen with programs stored in disk files, but with punch cards such an event is a real risk. To recover from that event, the right-most columns were reserved for identification: a code, unique to each line, that would let a card sorter machine (there were such things) put the cards back into their proper order.

The need for an identification column is tied to the punch card medium, yet it became part of each language standard. The COBOL, FORTRAN, and PL/I standards all refer to columns 73 through 80 as reserved for identification, and they could not be used for "real" source code. Programs transferred from punch cards to disk files (when disks became available to programmers) kept the rule for the identification field -- probably to make conversion easy. Later versions of the languages did drop the rule, but the damage had been done. The identification field was part of the language specification.

As part of the language specification, I had to tokenize the identification numbers. Mostly they were not a problem -- just another "thing" to tokenize -- but sometimes they occurred in the middle of a string literal or a comment, which are awkward situations.

Anyway, the tokenization of old languages has its challenges.

New languages don't suffer from such problems. Their source code was never stored on punch cards, and they never had identification fields -- not inside string literals, and not anywhere else.

The tokenization of modern languages is easier for another reason as well. Each language has a set of token types, but older languages have a larger and more varied set. Most languages have identifiers, numeric literals, and operators; COBOL also has picture values and level indicators, and PL/I has attributes and conditions (among other token types).
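As a rough sketch of what that simplicity looks like, a tokenizer for a small, modern-style language can be little more than a handful of patterns. (This is an illustration of the idea, not the tokenizer I built for the project described above.)

    # A sketch of a tokenizer for a simple, modern-style language.
    import re

    TOKEN_SPEC = [
        ("NUMBER",     r"\d+(?:\.\d+)?"),    # numeric literals
        ("IDENTIFIER", r"[A-Za-z_]\w*"),     # names of variables and functions
        ("OPERATOR",   r"[+\-*/=<>!]+"),     # operators
        ("PUNCT",      r"[(),;{}]"),         # punctuation
        ("SKIP",       r"\s+"),              # whitespace, ignored
    ]
    PATTERN = re.compile("|".join(f"(?P<{name}>{rx})" for name, rx in TOKEN_SPEC))

    def tokenize(source):
        # Yield (token_type, text) pairs for a line of source code.
        for match in PATTERN.finditer(source):
            if match.lastgroup != "SKIP":
                yield match.lastgroup, match.group()

    print(list(tokenize("total = rate * 1.5 + base")))
    # [('IDENTIFIER', 'total'), ('OPERATOR', '='), ('IDENTIFIER', 'rate'),
    #  ('OPERATOR', '*'), ('NUMBER', '1.5'), ('OPERATOR', '+'),
    #  ('IDENTIFIER', 'base')]

A language with picture values, level indicators, and a special meaning for columns 73 through 80 needs considerably more machinery than this.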

Which brings me to the second reason for modern languages to have simpler tokenizing requirements: The languages are designed to be easy to tokenize.

It seems to me that, intentionally or not, the designers of modern languages have made design choices that reduce the work for tokenizers. They have built languages that are easy to tokenize, and therefore have simple logic for tokenizers. (All compilers and interpreters have tokenizers; it is a step in converting the source to executable bytes.)

So maybe the simplicity of language tokenization is the result of the "laziness" of language designers.

But I have a third reason, one that I believe is the true reason for the simplicity of modern language tokenizers.

Modern languages are easy to tokenize because they are easy to read (by humans).

A language that is easy to read (for a human) is also easy to tokenize. Language designers have been consciously designing languages to be easy to read. (Python is the leading example, but all designers claim their language is "easy to read".)

Languages that are easy to read are easy to tokenize. It's that simple. We've been designing languages for humans, and as a side effect we have made them easy for computers.

I, for one, welcome the change. Not only does it make my job easier (tokenizing all of those languages) but it makes every developer's job easier (reading code from other developers and writing new code).

So I say three cheers for simple* programming languages!

* Simple does not imply weak. A simple programming language may be easy to understand, yet it may also be powerful. The combination of the two is the real benefit here.