Thursday, February 22, 2018

Variables are... variable

The nice (and sometimes frustrating) thing about different programming languages is that they handle things, well, differently.

Consider the simple concept of a "variable". It is a thing in a program that holds a value. One might think that programming languages agree on something so simple -- yet they don't.

There are four actions associated with variables: declaration, initialization, assignment, and reference (as in 'use', not a constrained form of pointer).

A declaration tells the compiler or interpreter that a variable exists, and often specifies a type. Some languages require a declaration before a variable can be assigned a value or used in a calculation; others do not.

Initialization provides a value during declaration. This is a special form of assignment.

Assignment assigns a value, and is not part of declaration. It occurs after the declaration, and may occur multiple times. (Some languages do not allow for assignment after initialization.)

A reference to a variable is the use of its value, to compute some other value or to provide the value to a function or subroutine.

It turns out that different languages have different ideas about these operations. Most languages follow these definitions; the differences are in the presence or absence of these actions.

C, C++, and COBOL (to pick a few languages) all require declarations, allow for initialization, and allow for assignment and referencing.

In C and C++ we can write:

int i = 17;
i = 12;
printf("%d\n", i);

This code declares and initializes the variable i as an int with value 17, then assigns the value 12, then calls the printf() function to write the value to the console. COBOL has similar abilities, although the syntax is different.

Perl, Python, and Ruby (to pick different languages) do not have declarations or initialization but do allow for assignment and reference.

In Ruby we can write:

i = 12
puts i

This assigns the value 12 to i and then writes it to the console. Notice that there is no declaration and no type specified for the variable.

Astute readers will point out that Python and Ruby don't have "variables", they have "names". A name is a reference to an underlying object, and multiple names can point to the same object. Java and C# use a similar mechanism for non-trivial objects. The difference is not important for this post.
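For readers who have not met the "names" model, a few lines of Python show it. This is only a small sketch; the names first and second are made up for illustration.

```python
# Two names bound to one list object (the names are illustrative).
first = [1, 2, 3]
second = first          # binds a second name; no copy is made

second.append(4)        # mutate the object through one name

print(first)            # the change is visible through the other name
print(first is second)  # True: both names refer to the same object
```

The append made through second shows up when reading first, because there is only one list.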

BASIC (not Visual Basic or VB.NET, but old-school BASIC) is a bit different. Like Perl, Python, and Ruby it does not have declarations. Unlike those languages, it lets you write a statement that prints the value of an undeclared (and therefore uninitialized and unassigned) variable:

130 PRINT A

This is something that would cause a C compiler to emit errors and refuse to supply an executable. In Python and Ruby, this would cause a run-time error. BASIC handles this with grace, providing a default value of 0 for numeric variables and "" for text (string) variables. (The AWK language also assigns a reasonable value to uninitialized variables.)
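To make the contrast concrete, here is a rough Python sketch. The first part shows the run-time error Python raises for a name that was never assigned. The second part imitates BASIC's defaults with a hypothetical BasicVariables class (it is not a real library), and it assumes the classic BASIC convention that string variable names end in "$".

```python
def read_unassigned():
    """Reference a name that was never assigned and report what happens."""
    try:
        return never_assigned  # deliberately undefined
    except NameError as exc:
        return type(exc).__name__


class BasicVariables(dict):
    """Hypothetical variable store that imitates BASIC's default values."""

    def __missing__(self, name):
        # Assumption: names ending in "$" are strings, all others numeric.
        return "" if name.endswith("$") else 0


print(read_unassigned())       # NameError

variables = BasicVariables()
print(variables["A"])          # 0, like PRINT A on an unassigned variable
print(repr(variables["N$"]))   # '' for a string variable
```

The store never raises for a missing name; it simply answers with the default for that kind of variable, which is the behavior described above.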

FORTRAN has an interesting mix of capabilities. It allows for declarations but does not require them. Every variable has a specific type. When a variable is listed in a declaration, it has the specified type; when a variable is not declared, it has a type based on the first letter of its name! Names beginning with I through N are integer, and all others are real.
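The first-letter rule is simple enough to sketch as a function. This Python version is an illustration only: implicit_type is a made-up helper, and it covers just the default rule (names starting with I through N are integer, everything else is real), not declarations or IMPLICIT statements that override it.

```python
def implicit_type(name: str) -> str:
    """Return the default FORTRAN type for an undeclared variable name."""
    first = name[0].upper()
    # Default rule: I, J, K, L, M, N mean INTEGER; any other letter means REAL.
    return "INTEGER" if "I" <= first <= "N" else "REAL"


for name in ("I", "COUNT", "NUM", "X", "KOUNT"):
    print(name, implicit_type(name))
```

Note that COUNT comes out as REAL, which is why old FORTRAN programs are full of names like KOUNT.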

Like BASIC, FORTRAN lets a variable be referenced without being initialized. Unlike BASIC, it does not provide default values. Instead it blissfully uses whatever values are in memory at the location assigned for the variable. (COBOL, C, and C++ have this behavior too.)

What's interesting is the trend over time. Let's look at a summary of languages and their capabilities, and the year in which they were created:

Languages which require declaration but don't force initialization

COBOL (1950s)
Pascal (1970s)
C (1970s)
C++ (1980s)
Objective-C (1980s)
Java (1995)
C# (2000s)

Languages which require declaration and require initialization (or initialize for you)

Eiffel (1980s)
Go (2009)
Swift (2014)
Rust (2015)

Languages which don't allow declarations and require assignment before reference

Perl (1987)
Python (1989)
Ruby (1990s)

Languages which don't require (or don't allow) declaration and allow reference before assignment

FORTRAN (1950s)
BASIC (1960s)
AWK (1970s)
PowerShell (2000s)

This list of languages is hardly comprehensive, and it ignores the functional programming languages completely. Yet it shows something interesting: there is no trend in how languages handle variables. That is, languages in the 1950s required declarations (COBOL) or didn't (FORTRAN), and later languages require declarations (Go) or don't (Ruby). Early languages allow for initialization, as do later languages. Early languages allow for use-without-assignment, as do later languages.
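The "no trend" observation can be checked mechanically. The Python sketch below records a subset of the languages listed above with the decade given for each, then reports the earliest and latest decade per category. The category labels and the LANGUAGES table are shorthand invented for this example.

```python
# (language, decade) pairs for a subset of the languages listed above.
LANGUAGES = {
    "declare, no forced init": [("COBOL", 1950), ("C", 1970), ("C#", 2000)],
    "declare and init": [("Eiffel", 1980), ("Rust", 2010)],
    "assign before reference": [("Perl", 1980), ("Ruby", 1990)],
    "reference before assign": [("FORTRAN", 1950), ("PowerShell", 2000)],
}


def decade_span(entries):
    """Return the earliest and latest decade among (language, decade) pairs."""
    decades = [decade for _, decade in entries]
    return min(decades), max(decades)


for category, entries in LANGUAGES.items():
    first, last = decade_span(entries)
    print(f"{category}: {first}s to {last}s")
```

The spans overlap heavily: the group that requires declarations and the group that allows reference before assignment both stretch from the 1950s to the 2000s.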

Perhaps a more comprehensive list would show trends over time. Perhaps splitting out the different versions of each language would show a convergence in how variables are handled. Or perhaps not.

It is possible that we (that is, programmers and language designers) don't really know how we want variables to behave in our languages. With more than half a century of experience we're still developing languages with different capabilities.

Or maybe we have, in some way, decided. It's possible that we have decided that we need languages with different capabilities for variables (and therefore different languages). If that is the case, then we will never see a single language become dominant.

That, I think, is a good outcome.

