Sunday, April 10, 2016

Complexity of programming languages

Are programming languages becoming more or less complex?

It is a simple question, and like many simple questions, the answer is not so simple.

Let's look at a simple program in some languages. The simple program will print the numbers from 1 to 10. Here are programs in FORTRAN, BASIC, Pascal, C, C++, C#, and Python:

FORTRAN 66 (1966)

      DO 100 I = 1, 10
100   WRITE (6, 200) I
200   FORMAT (I5)
      STOP
      END

BASIC (1965)

10 FOR I = 1 TO 10
20 PRINT I
30 NEXT I
99 END

Pascal (1970)

program hello;
begin
  i: integer;

  for i := 1 to 10 do

    WriteLn(i);
end.

C (1978)

#include "stdlib.h"

int main(void)
{
    int i;

    for (i = 1; i <= 10; i++)

      printf("%d", i);
}

C++ (1983)

#include

int main()

{
    for (unsigned int i = 1; i <= 10; i++)
      std::cout << i << std::endl;

    return 0;

}

C# (2000)

using System;

class Program

{
    public static void Main(string[] args)
    {
        for (int i = 1; i <= 10; i++)
            Console.WriteLine(i);
    }
}

Python (2000)

for i in range(1, 10):
    print(str(i))

From this small sampling, a few things are apparent.

First, the programs vary in length. Python is the shortest with only two lines. It's also a recent language, so are languages becoming more terse over time? Not really, as FORTRAN and BASIC are the next shortest languages (with 5 and 4 lines, respectively) and C#, a contemporary of Python, requires 10 lines.

Second, the formatting of program statements has changed. FORTRAN and BASIC, the earliest languages in this set, have strong notions about columns and lines. FORTRAN limits line length to 72 characters, reserves the first 6 columns for line numbers, and another column for continuation characters (which allows statements to exceed the 72 character limit). BASIC relaxed the column restrictions but add the requirement for line numbers on each line. Pascal, C, C++, and C# care nothing about columns and lines, looking at tokens in the code to separate statements. Python relies on indentation for block definitions.

Some languages (BASIC, C#) are capable of printing things by simply mentioning them. Others languages need specifications. FORTRAN has FORMAT statements to specify the exact format of the output. C has a printf() function that needs similar formatting information. I find the mechanisms of BASIC and C# easier to use than the mechanisms of C and Python.

Let's consider a somewhat more complex program, one that lists a set of prime numbers. We'll look at BASIC and Lua, which span the computing age.

Lua

local N = 100
local M = 10
function PRIME()  -- PROCEDURE DECLARATION;
  local X, SQUARE, I, K, LIM, PRIM -- DECLARATION OF VARIABLES;
  local P, V = {}, {}
  P[1] = 2 -- ASSIGNMENT TO FIRST ELEMENT OF p;
  print(2) -- OUTPUT A LINE CONTAINING THE NUMBER 2;
  X = 1
  LIM = 1
  SQUARE = 4
  for I = 2, N do -- LOOP. I TAKES ON 2, 3, ... N;
    repeat -- STOPS WHEN "UNTIL" CONDITION IS TRUE;
      X = X + 2
      if SQUARE <= X then
        V[LIM] = SQUARE
        LIM = LIM + 1
        SQUARE = P[LIM] * P[LIM]
      end
      local K = 2
      local PRIM = true
      while PRIM and K < LIM do
        if V[K] < X then
          V[K] = V[K] + P[K]
        end
        PRIM = X ~= V[K]
        K = K + 1
      end
    until PRIM -- THIS LINE CLOSES THE REPEAT
    P[I] = X
    print(X)
  end
end
PRIME()


BASIC

100 LET N = 100
110 LET M = 10
200 DIM P(100), V(100)
300 LET P(1) = 2
310 PRINT P(1)
320 LET X = 1
330 LET L = 1
340 LET S = 4
350 FOR I = 2 TO N
360  REM    repeat -- STOPS WHEN "UNTIL" CONDITION IS TRUE;
370   LET X = X + 2
380   IF S > X THEN 420
390    LET V(L) = S
400    LET L = L + 1
410    LET S = P(L)^2
420   REM
430   LET K = 2
440   LET P = 1
450   REM while PRIM and K < LIM do
455   IF P <> 1 THEN 520
460   IF K >= L THEN 520
470    IF V(K) >= X THEN 490
480     LET V(K) = V(K) + P(K)
490    REM
500    LET P = 0
501    IF X = V(K) THEN 510
502    LET P = 1
510    LET K = K + 1
515   GOTO 450
520  IF P <> 1 THEN 360
530  LET P(I) = X
540  PRINT X
550 NEXT I
999 END

Both programs work. They produce identical output.  The version in Lua may be a bit easier to read, given that variable names can be more than a single letter.

They are about the same size and complexity. Two versions of a program, one from today and one from the early years of computing, yet they have similar complexity.

Programming languages encode operations into pseudo-English instructions. If the measure of a programming language's capability is its capacity to represent operations in a minimal number of steps, then this example shows that programming languages have not changed over the past five decades.

Caution is advised. These examples (printing "hello" and calculating prime numbers) may be poor representatives of typical programs. Perhaps we should withhold judgement until we consider more (and larger) programs. After all, very few people in 2016 use BASIC; there must be a reason they have selected modern languages.

Perhaps it is better to keep asking the question and examining our criteria for programming languages.

No comments: