Saturday, February 25, 2012

Upper case and spaces


Early written language (that is, natural language) used only upper case letters and no space characters. The space was a medieval invention. In the ancient world, the phrase "texts from the ancient world" appeared as TEXTSFROMTHEANCIENTWORLD. Our concepts of "upper case" and "lower case" come from Renaissance times, with the invention of movable type and two cases to hold the slugs for letters.

The original art of writing was simply to record the phonemes, allowing the reader to re-create the spoken sounds. Text was for re-enacting a speech, not storing information. The notion of "reading to oneself" had yet to emerge. A reader was little more than a modern-day tape recorder in playback mode, generating sounds for the audience (the true readers of the text).


It is interesting that the development of our programming languages parallels the development of written natural language.

Early programming languages used only upper case letters in their character sets. This was true for FORTRAN, BASIC, COBOL, assembly language, and other languages. As IBM card punches and readers used a limited character set with only upper case letters, we can see that language design followed the available equipment.

Some early programming languages did not care about space characters. FORTRAN was space-agnostic; the parsing of the language was independent of SPACE characters. One could write

      THETA = 10.0
      DO 9 I = 10,20
      ALPHA = BETA * I
      SUM = ALPHA + SUM
9     CONTINUE

or one could write


      THETA=10.0
      DO9I=10,20
      ALPHA=BETA*I
      SUM=ALPHA+SUM
9     CONTINUE


and the FORTRAN compiler would treat them identically. (FORTRAN did care about columns 1 through 6; the first five were reserved for statement labels and column 6 was reserved for the continuation indicator. Statements began in column 7.)
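To make the equivalence concrete, here is a minimal sketch in Python (not FORTRAN, and not how any real compiler was written). It assumes the standard fixed-form card layout: columns 1 through 5 for a statement label, column 6 for the continuation indicator, and the statement itself in columns 7 through 72. The function name normalize_fixed_form is my own invention for illustration. It also ignores one real complication: blanks inside character constants were significant, so a real compiler could not blindly delete every space.

```python
def normalize_fixed_form(line):
    """Split one fixed-form line into (label, continuation, statement).

    Assumed layout: columns 1-5 label, column 6 continuation mark,
    columns 7-72 statement text. Blanks in the statement field are
    dropped, which is a simplification (character constants are ignored).
    """
    padded = line.ljust(72)[:72]          # a punched card had 72 usable columns
    label = padded[0:5].strip()           # columns 1-5
    continuation = padded[5].strip()      # column 6
    statement = padded[6:].replace(" ", "")  # columns 7-72, blanks removed
    return label, continuation, statement


spaced = [
    "      THETA = 10.0",
    "      DO 9 I = 10,20",
    "      ALPHA = BETA * I",
    "      SUM = ALPHA + SUM",
    "9     CONTINUE",
]

compact = [
    "      THETA=10.0",
    "      DO9I=10,20",
    "      ALPHA=BETA*I",
    "      SUM=ALPHA+SUM",
    "9     CONTINUE",
]

# Compare the two programs line by line after normalization.
same = [normalize_fixed_form(a) == normalize_fixed_form(b)
        for a, b in zip(spaced, compact)]
print(all(same))  # prints True
```

Once the blanks in the statement field are discarded, the two programs are the same sequence of characters, which is all the compiler ever saw.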

Such space-agnostic parsing was not unique to FORTRAN. Early versions of the BASIC language were space-agnostic. (Such was not the case for COBOL or assembly language, both of which relied upon SPACE characters to separate tokens.)

In time, we humans learned that written (natural language) text could be read silently, and that space characters improved the readability of the text. Similarly, we human programmers learned that space characters improved the readability of programs (as did blank lines and indentation) and that programs should be read, and should be written to be read. (See The Psychology of Computer Programming by Gerald Weinberg.) While it took several centuries to invent the concept of a space to separate words in natural language, we adopted the concept of whitespace in programming languages in less than a decade.

If we consider spaces and indentation the marks of advanced program-writing, then that leads to the conclusion that the language which uses spaces and indentation most is the most advanced programming language. The languages that use spacing and indentation most, as I understand it, are Python (with its indentation for marking code blocks) and assembly language (with its heavy reliance on spacing and indentation for parsing). All other languages fall after those two.
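For the Python half of that claim, a small sketch shows how seriously the language takes indentation. Python's standard tokenize module reports a change in indentation as a token of its own (INDENT when a block opens, DEDENT when it closes). The sample program and the helper name block_tokens are mine, chosen only for illustration.

```python
import io
import tokenize

# A tiny program with one indented block (the body of the loop).
source = (
    "total = 0\n"
    "for i in range(10, 21):\n"
    "    total = total + i\n"
    "print(total)\n"
)


def block_tokens(text):
    """Return the names of the INDENT and DEDENT tokens in text, in order."""
    names = []
    for tok in tokenize.generate_tokens(io.StringIO(text).readline):
        if tok.type in (tokenize.INDENT, tokenize.DEDENT):
            names.append(tokenize.tok_name[tok.type])
    return names


print(block_tokens(source))  # prints ['INDENT', 'DEDENT']
```

The four spaces before the loop body are not decoration; they reach the parser as a token, much as a brace would in other languages.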
