Fitzpatrick's Fabulous Future: new programming languages

Showing posts with label new programming languages. Show all posts

Wednesday, April 20, 2022

Advances in programming come from restraints

Advances in programming come from, to a large extent, advances in programming languages. And those advances in programming languages, unlikely as it seems, are mostly not expanded features but restrictions.

That advances come from restrictions seems counter-intuitive. How does fewer choices make us better programmers?

Let's look at some selected changes in programming languages, and how they enabled better programming.

The first set of restrictions was structured programming. Structured programming introduced the concepts of the IF/THEN/ELSE statement and the WHILE loop. More importantly, structured programming banished the GOTO statement (and its cousin, the IF/GOTO statement). This restriction was an important advancement for programming.

A GOTO statement allows for arbitrary flows of control within programs. Structured programming's IF/THEN/ELSE and WHILE statements (and WHILE's cousin, the FOR statement) force structure onto programs. Arbitrary flows of control were not possible.

The result was programs that were harder to write but easier to understand, easier to debug, and easier to modify. Structured programming -- the loss of GOTO -- was an advancement in programming.

A similar advance occurred with object-oriented programming. Like structured programming, object-oriented programming was a set of restrictions coupled with a set of new features. In object oriented programming, those restrictions were encapsulation (hiding data within a class) and the limiting of functions (requiring an instance of the class to execute). Data encapsulation protected data from arbitrary changes; one had to go through functions (in well-designed systems) to change the data. Instance functions were limited to executing on instances of the class, which meant that one had to *have* an instance of the class to call the function. Functions could not be called at arbitrary points in the code.

Both structured programming and object-oriented programming advanced the state of the art for programming. They did it by restricting the choices that programmers could make.

I'm going to guess that future advancements in programming will also come from restrictions in new programming languages. What could those restrictions be?

I have a few ideas.

One idea is immutable objects. This idea has been tested in the functional programming languages. Those languages often have immutable objects, objects which, once instantiated, cannot change their state.

In today's object-oriented programming languages, objects are often mutable. They can change their state, either through functions or direct access of member data.

Functional programming languages take a different view. Objects are immutable: once formed they cannot be changed. Immutable objects enforce discipline in programming: you must provide all of the ingredients when instantiating an object; you cannot partially initialize an object and add things later.

I would like to see a programming language that implements immutable objects. But not perfectly -- I want to allow for some objects that are not immutable. Why? Because the shift to "all objects are immutable" is too much, too fast. My preference is for a programming language to encourage immutable designs and require extra effort to design mutable objects.

A second idea is a limit to the complexity of expressions.

Today's programming languages allow for any amount of complexity in an expression. Expressions can be simple (such as A + 1) or complex (such as A + B/C - sqr(B + 7) / 2), or worse.

I want expressions to be short and simple. This means breaking a complex expression into multiple statements. The only language that I know that placed restrictions on expressions was early FORTRAN, and then only for the index to an array variable. (The required form was I*J+K, where I, J, and K were optional.)

Perhaps we could design a language that limited the number of operations in an expression. Simpler expressions are, well, simpler, and easier to understand and modify. Any expression that contained more than a specific number of operations would be an error, forcing the programmer to refactor the expression.

A third idea is limits on the size of functions and classes. Large functions and large classes are harder to understand than small functions and small classes. Most programming languages have a style-checker, and most style-checkers issue warnings for long functions or classes with lots of functions.

I want to strengthen those warnings and change them to errors. A function that is too long (I'm not sure how long is too long, but that's another topic) is an error -- and the compiler or interpreter rejects it. The same applies to a class: too many data members, or too many functions, and you get an error.

But like immutable objects, I will allow for some functions to be larger than the limit, and some classes to be more complex than the limit. I recognize that some classes and functions must break the rules. (But the mechanism to allow a function or class to break the rules must be a nuisance, more than a simple '@allowcomplex' attribute.)

Those are the restrictions that I think will help us advance the art of programming. Immutable objects, simple expressions, and small functions and classes.

Of these ideas, I think the immutable objects will be the first to enter mainstream programming. The concept has been implemented, some people have experience with it, and the experience has been positive. New languages that combine object-oriented programming with functional programming (much like Microsoft's F#, which is not so new) will allow more programmers to see the benefits of immutable objects.

I think programming will be better for it.

Friday, May 17, 2019

Procedures and functions are two different things

The programming language Pascal had many good ideas. Many of those ideas have been adopted by modern programming languages. One idea that hasn't been adopted was the separation of functions and procedures.

Some definitions are in order. In Pascal, a function is a subroutine that accepts input parameters, can access variables in its scope (and containing scopes), performs some computations, and returns a value. A procedure is similar: it accepts input parameters, can access variables in its scope and containing scopes, performs calculations, and ... does not return a value.

Pascal has the notion of functions and a separate notion of procedures. A function is a function and a procedure is a procedure, and the two are different. A function can be used (in early Pascal, must be used) in an expression. It cannot stand alone.

A procedure, in contrast, is a computational step in a program. It cannot be part of an expression. It is a single statement, although it can be part of an 'if' or 'while' statement block.

Functions and procedures have different purposes, and I believe that the creators of Pascal envisioned functions to be unable to change variables outside of themselves. Procedures, I believe, were intended to change variables outside of their immediate scope. In C++, a Pascal-style function would be a function that is declared 'const', and a procedure would be a function that returns 'void'.

This arrangement is different from the C idea of functions. C combines the idea of function and procedure into a single 'function' construct. A function may be designed to return a value, or it may be designed to return nothing. A function may change variables outside of its scope, but it doesn't have to. (It may or may not have "side effects".)

In the competition among programming languages, C won big early on, and Pascal (or rather, the ideas in Pascal), have gained acceptance slowly. The C notion of function has been carried by other popular languages: C++, Java, C#, Python, Ruby, and even Go.

I remember quite clearly learning about Pascal (many years ago) and feeling that C was superior to Pascal due to its single approach. I sneered (mentally) at Pascal's split between functions and procedures.

I have come to regret those feelings, and now see the benefit of separating functions and procedures. When building (or maintaining) large-ish systems in modern languages (C++, C#, Java, Python), I have created functions that follow the function/procedure split. These languages force one to write functions -- there is no construct for a procedure -- yet I designed some functions to return values and others to not return values. The value-returning functions I made 'const' when possible, and avoided side effects. The functions with side effects I designed to not return values. In sum, I built functions and procedures, although the compiler uses only the 'function' construct.

The future may hold programming languages that provide functions and procedures as separate constructs. I'm confident that we will see languages that have these two ideas. Here's why:

First, there is a new class of programming languages called "functional languages". These include ML, Erlang, Haskell, and F#, to name a few. These functional languages use Pascal's original idea of functions as code blocks that perform a calculation with no side effects and return a value. Language designers have already re-discovered the idea of the "pure function".

Second, most ideas from Pascal have been implemented in modern languages. Bounds-checking for arrays. Structured programming. Limited conversion of values from one type to another. The separation of functions and procedures is one more of these ideas.

The distinction between functions and procedures is one more concept that Pascal got right. I expect to see it in newer languages, perhaps over the next decade. The enthusiasts of functional programming will realize that pure functions are not sufficient and that they need procedures. We'll then see variants of functional languages that include procedures, with purists holding on to procedure-less languages. I'm looking forward to the division of labor between functions and procedures; it has worked well for me in my efforts and a formal recognition will help me convey this division to other programmers.

Tuesday, March 13, 2018

Programming languages and character sets

Programming languages are similar, but not identical. Even "common" things such as expressions can be represented differently in different languages.

FORTRAN used the sequence ".LT." for "less than", normally (today) indicated by the sign <, and ".GT." for "greater than", normally the sign >. Why? Because in the early days of computing, programs were written on punch cards, and punch cards used a small set of characters (uppercase alpha, numeric, and a few punctuation). The signs for "greater than" and "less than" were not part of that character set, so the language designers had to make do with what was available.

BASIC used the parentheses to denote both function arguments and variable subscripts. Nowadays, most languages use square brackets for subscripts. Why did BASIC use parentheses? Because most BASIC programs were written on Teletype machines, large mechanical printing terminals with a limited set of characters. And -- you guessed it -- the square bracket characters were not part of that set.

When C was invented, we were moving from Teletypes to paperless terminals. These new terminals supported the entire ASCII character set, including lowercase letters and all of the punctuation available on today's US keyboards. Thus, C used all of the symbols available, including lowercase letters and just about every punctuation symbol.

Today we use modern equipment to write programs. Just about all of our equipment supports UNICODE. The programming languages we create today use... the ASCII character set.

Oh, programming languages allow string literals and identifiers with non-ASCII characters, but none of our languages require the use of a non-ASCII character. No languages make you declare a lambda function with the character λ, for example.

Why? I would think that programmers would like to use the characters in the larger UNICODE set. The larger character set allows for:

Greek letters for variable names
Multiplication (×) and division (÷) symbols
Distinct characters to denote templates and generics

C++ chose to denote templates with the less-than and greater-than symbols. The decision was somewhat forced, as C++ lives in the ASCII world. Java and C# have followed that convention, although its not clear that they had to. Yet the decision is has its costs; tokenizing source code is much harder with symbols that hold multiple meanings. Java and C# could have used the double-angle brackets (« and ») to denote generics.

I'm not recommending that we use the entire UNICODE set. Several glyphs (such as 'a') have different code points assigned (such as the Latin 'a' and the Cyrllic 'a') and having multiple code points that appear the same is, in my view, asking for trouble. Identifiers and names which appear (to the human eye) to be the same would be considered different by the compiler.

But I am curious as to why we have settled on ASCII as the character set for languages.

Maybe its not the character set. Maybe it is the equipment. Maybe programmers (and more specifically, program language designers) use US keyboards. When looking for characters to represent some idea, our eyes fall upon our keyboards, which present the ASCII set of characters. Maybe it is just easier to use ASCII characters -- and then allow UNICODE later.

If that's true (that our keyboard guides our language design) then I don't expect languages to expand beyond ASCII until keyboards do. And I don't expect keyboards to expand beyond ASCII just for programmers. I expect programmers to keep using the same keyboards that the general computing population uses. In the US, that means ASCII keyboards. In other countries, we will continue to see ASCII-with accented characters, Cyrillic, and special keyboards for oriental languages. I see no reason for a UNICODE-based keyboard.

If our language shapes our thoughts, then our keyboard shapes our languages.

Monday, June 5, 2017

Better programming languages let us do more -- and less

We tend to think that better programming languages let us programmers do more. Which is true, but it is not the complete picture.

Better languages also let us do less. They remove capabilities. In doing so, they remove the possibility for errors.

PL/I was better than COBOL and FORTRAN because it let us write free-form source code. In COBOL and FORTRAN, the column in which code appeared was significant. The restrictions were from the technology of the time (punch cards) but once in the language they were difficult to remove.

BASIC was better than FORTRAN because it eliminated FORMAT specifications. FORMAT specifications were necessary to parse input data and format output data. They were precise, opaque, and easy to get wrong. BASIC, with no such specifications, removed the possibility of errors from such specifications. BASIC also fixed the DO loops of FORTRAN and removed restrictions on subscript form. (In FORTRAN, a subscript could not be an arbitrary expression but had to have the form A*B+C. Any component could be zero and omitted so A+C was allowed, as was A*B. But you could not use A+B+C or A/2.)

Pascal was better than BASIC because it limited the use of GOTO statements. In BASIC, you could use a GOTO to transfer control to any other part of the program, including in and out of loops or subroutines. It made for "spaghetti code" which was difficult to understand and debug. Pascal put an end to that, with a constrained form of GOTO.

Java eliminated the need for the explicit 'delete' or 'free' operations on allocated memory. You cannot forget the 'delete' operation -- you can't write one at all! The internal garbage collector recycles memory. In Java, it is much harder to create memory leaks than in C++ and C.

Python forces us to consider indentation as part of the code. In C, C++, Java, and C#, you can write:

initialize();

if (some_condition)
do_something();
do_another_thing();

complete_the_work();

But the code acts in a way you may not expect. Python's use of indentation to specify code organization makes the code clearer. The Python code:

initialize()

if some_condition:
do_something()
do_another_thing()

complete_the_work()

does what you expect.

New programming languages do provide new capabilities. (Often, they are refinements to constructs and concepts that were implemented roughly in earlier programming languages.) A new programming language is a combination of new things we can do and old things we no longer need to do.

When considering a new language (or reviewing the current language for a project), keep in mind not only the things that a new language lets you do, but also the things that it won't let you do.

Thursday, February 26, 2015

The names of programming languages

A recent project involved a new programming language (a variant of the classic Dartmouth BASIC) and therefore saw the need for a name for the new language. Of course, a new name should be different from existing names, so I researched the names of programming languages.

My first observation was that we, as an industry, have created a lot of programming languages! I usually think of the set of languages as BASIC, FORTRAN, COBOL, Pascal, C, C++, Java, C#, Perl, Python, and Ruby -- the languages that I use currently or have used in the past. If I think about it, I add some other common languages: RPG, Eiffel, F#, Modula, Prolog, LISP, Forth, AWK, ML, Haskell, and Erlang. (These a programming languages that I have either read about or discussed with fellow programmers.)

As I surveyed existing programming languages, I found many more languages. I found extinct languages, and extant languages. And I noticed various things about their names.

Programming languages, except for a few early languages, have names that are easily pronounceable. Aside from the early "A-0" and "B-0", most languages have recognizable names. We switched quickly from designations of letters and numbers to names like FORTRAN and COBOL.

I also noticed that some names last longer than others. Not just the languages, but the names. The best example may be "BASIC". Created in the 1960s, the BASIC language has undergone a number of changes (some of them radical) and has had a number of implementations. Yet despite its changes, the name has remained. The name has been extended with letters ("CBASIC", "ZBASIC", "GW-BASIC"), numbers ("BASIC-80", "BASIC09"), symbols ("BASIC++"), prefix words ("Visual Basic", "True Basic", "Power Basic"), and sometimes suffixes ("BASIC-PLUS"). Each of these names was used for a variant of the original BASIC language, with separate enhancements.

Other long-lasting names include "LISP", "FORTRAN", and "COBOL".

Long-lasting names tend to have two syllables. Longer names do not stay around. The early languages "BACAIC", "COLINGO", "DYNAMO", "FLOW-MATIC", "FORTRANSIT", "JOVIAL", "MATH-MATIC", "MILITRAN", "NELIAC", and "UNICODE" (yes it was a programming language, different from today's character set) are no longer with us.

Short names of single letters have little popularity. Aside from C (the one exception), other languages (B, D, J) see limited acceptance. The up-and-coming R language for numeric analysis (derived from S, another single-letter language) may have limited acceptance, based on the name. It may be better to change the name to "R-squared" with the designation "R2".

Our current set of popular languages have two-syllable names: "VB" (pronounced "vee bee"), "C#" ("see' sharp"), Java, Python, and Ruby. Even the database language SQL is pronounced "see' kwell" to give it two syllables. Popular languages with only one syllable are Perl (which seems to be on the decline) C, and Swift.

PHP and C++ have three names with syllables. Objective-C clocks in with a possibly unwieldy four syllables; perhaps this was an incentive for Apple to change to Swift.

I expect our two-syllable names to stay with us. The languages may change, as they have changed in the past.

As for my new programming language, the one that was derived from BASIC? I picked a new name, not a variant of BASIC. As someone has already snagged the name "ACIDIC", I chose the synonym alkaline, but changed it to a two-syllable form: Alkyl.

Tuesday, April 8, 2014

Java and C# really derive from Pascal

In the history of programming, the "C versus Pascal" debate was a heated and sometimes unpleasant discussion on language design. It was fought over the capabilities of programming languages.

The Pascal side advocated restrictive code, what we today call "type safe" code. Pascal was designed as a teaching language, a replacement for BASIC that contained the ideas of structured programming.

The C side advocated liberal code, what we today call "unsafe code". C was designed not to teach but to get the job done, specifically systems programming jobs that required access to the hardware.

The terms "type safe" and "unsafe code" are telling, and they give away the eventual resolution. C won over Pascal in the beginning, at kept its lead for many years, but Pascal (or rather the ideas in Pascal) have been gaining ground. Even the C and C++ standards have been moving towards the restrictive design of Pascal.

Notable ideas in Pascal included:

Structured programming (blocks, 'while' and 'repeat' loops, 'switch/case' flows, limited goto)
Array data type
Array index checking at run-time
Pointer data type
Strong typing, including pointers
Overflow checking on arithmetic operations
Controlled conversions from one type to another
A constant qualifier for variables
Standard features across implementations

Notable ideas in K&R C:

Structured programming (blocks, 'while' and 'repeat' loops, 'switch/case' flows, limited goto)
Array data type (sort of -- really a syntactic trick involving pointers)
No checking of array index (at compile-time or run-time)
Pointer data type
Strong typing, but not for pointers
No overflow checking
Free conversions from one type to another
No 'const' qualifier
Many features were implementation-dependent

For programmers coming from BASIC (or FORTRAN) the structured programming concepts, common in C and Pascal, were appealing. Yet the other aspects of the C and Pascal programming languages were polar opposites.

It's hard to define a clear victor in the C/Pascal war. Pascal got a boost with the UCSD p-System and a large boost with the Turbo Pascal IDE. C was big in the Unix world and also big for programming Windows. Today, Pascal is viewed as a legacy language while C and its derivatives C++, Java, and C# enjoy popularity.

But if C won in name, Pascal won in spirit. The early, liberal K&R C has been "improved" with later standards that limit the ability to implicitly convert data types. K&R C was also enhanced with the 'const' keyword for variables. C++ introduced classes which allow programmers to build their own data types. So do Java and C#, and they eliminate pointers, check array indexes, and standardize operations across platforms. Java and C# are closer to the spirit of Pascal than C.

Yes, there are differences. Java and C# use braces to define blocks, where Pascal used 'BEGIN' and 'END'. Pascal declares variables with the name-and-then-type sequence, while C, Java, and C# use the type-and-then-name sequence. But if you look at the features, especially those Pascal features criticized as reducing performance, you see them in Java and C#.

We had many debates about the C and Pascal programming languages. In the end, it was not the "elegance" of a language or the capabilities of the IDE that solved the argument. Advances in technology neutralized many of our objections. Faster processors and improvements in compilers eliminated the need for speed tricks and allowed for the "performance killing" features in Pascal. And without realizing it, we adopted them, slowly, quietly, and with new names. We didn't adopt the name Pascal, we didn't adopt the syntax of Pascal, but we did adopt the features of Pascal.

Sunday, January 29, 2012

Productivity

Today, I wrote a small program to parse web pages and create a graph of relationships between them. The task got me thinking of the capabilities of programming languages.

Different programming languages offer different levels of "power". (By power, I mean the amount of work that can be done by a specific amount of code. Powerful languages require less code to perform the same task as weak languages.)

To some extent, power can be masked by the job. Some languages are weak but well-suited to some tasks. A jet airliner is a powerful craft, yet sometimes a hang glider is needed. Let's stick to mainstream projects, which operate in the middle range. We can avoid the hang glider tasks, and we can avoid the tasks that need supersonic stealth flight.

The languages for mainstream projects have, over time, improved. From the Elder Days of FORTRAN and COBOL, through the object revolution of C++, to the network era of Java and C#, later languages (and their IDEs and libraries) have been more capable than their predecessors.

Today I used Ruby and its libraries to parse the HTML in web pages, and GraphViz to render the graph of relationships. I was able to build my "system" in a few hours. Most of my work was "standing on the shoulders of giants", since I could leverage powerful tools. To build the same system in the early 1990s would require a bit more work (Ruby had not been invented, although GraphViz was around).

Languages and tools of today are more powerful than languages and tools of a decade ago. No surprise there. C++ is more powerful than C, Java more powerful than C++, C# slightly more powerful than Java (once you include the IDE and libraries), and Python and Ruby are more powerful than C#. The evolutionary ladder of languages moves us up the scale of power.

But let us think about legacy projects. Legacy projects often stay with their original technology. A project started in C++ often stays in C++; converting to a different language often means a complete re-write of the entire code base. (And often the re-write must be done in toto, converting all code at once.)

Project managers are right to evaluate the cost of converting to a different language. It is a significant effort, and the task can divert resources from enhancements and defect fixes.

But project managers must balance the costs of conversion against the costs of not converting.

A naive approach says that the cost of not converting is zero -- the project continues running and new versions are released.

A more savvy understanding of non-conversion includes the opportunity cost, the cost of lower productivity over time.

The decision of conversion to a new language is similar to the decision to purchase a new car. The old car runs, but needs maintenance and gets poor gas mileage. A new car will cost a lot of money up front, and will have lower maintenance and operating costs (and will probably be more reliable). When do you buy the new car?

An additional musing:

A large project using older technology can be viewed as inefficient, since it requires more developers to perform the same work in a similar system that uses a more powerful language. Therefore, the decision to defer conversion to a new language is a decision to increase the development team (or to forego the opportunity to reduce the development team).

Sunday, November 13, 2011

Programming languages exist to make programming easy

We create programming languages to make programming easy.

After the invention of the electronic computer, we invented FORTRAN and COBOL. Both languages made the act of programming easy. (Easier than assembly language, the only other game in town.) FORTRAN made it easy to perform numeric computations, and despite the horror of its input/output methods, it also made it easier to read and write numerical values. COBOL also made it easy to perform computations and input/output operations; it was slanted towards structured data (records containing fields) and readability (longer variable names, and verbose language keywords).

After the invention of time-sharing (and a shortage of skilled programmers), we invented BASIC, a teaching language that linguistically sat between FORTRAN and COBOL.

After the invention of minicomputers (and the ability for schools and research groups to purchase them), we invented the C language, which combined structured programming concepts from Algol and Pascal with the low-level access of assembly language. The combination allowed researchers to connect computers to laboratory equipment and write efficient programs for processing data.

After the invention of graphics terminals and PCs, we invented Microsoft Windows and the Visual Basic language to program applications in Windows. The earlier languages of C and C++ made programming in Windows possible, but Visual Basic was the language that made it easy.

After PCs became powerful enough, we invented Java, which leverage the power to run interpreted byte-code programs, but also (and more significantly) handle threaded applications. Support for threading was built into the Java language.

With the invention of networking, we created HTML and web browsers and Javascript.

I have left out equipment (microcomputers with CP/M, the Xerox Alto, the Apple Macintosh) and languages (LISP, RPG, C#, and others). I'm looking at the large trend using a few data points. If your favorite computer or language is missing, please forgive my arbitrary selections.

We create languages to make tasks easier for ourselves. As we develop new hardware, larger data sets, and new ways of connecting data and devices, we need new languages to handle the capabilities of these new inventions.

Looking forward, what can we see? What new hardware will stimulate the creation of new languages?

Cloud computing is big, and will lead to creative solutions. We're already seeing new languages that have increased rigor in the form of functional programming. We moved from non-structured programming to structured programming to object-oriented programming; in the future I expect us to move to functional programming. Functional programming is a good fit for cloud computing, with its immutable objects and no-side-effect functions.

Mobile programming is popular, but I don't expect a language for mobile apps. Instead, I expect new languages for mobile devices. The Java, C#, and Objective-C languages (from Google, Microsoft, and Apple, respectively) will mutate into languages better suited to small, mobile devices that must run applications in a secure manner. I expect that security, not performance, will be the driver for change.

Big data is on the rise. We'll see new languages to handle the collection, synchronization, querying, and analysis of large data sets. The language 'Processing' is a start in that direction, letting us render data in a visual form. The invention of NoSQL databases is also a start; look for a 'NoSQL standard' language (or possibly several).

The new languages will allow us to handle new challenges. But that doesn't mean that the old languages will go away. Those languages were designed to handle specific challenges, and they handle them well. So well that new languages have not displaced them. (Billing systems are still in COBOL, scientists still use Fortran, and lots of Microsoft Windows applications are still running in Visual Basic.) New languages are optimized for different criteria and cannot always handle the older tasks; I would not want to write a billing system in C, for example.

As the 'space' of our challenges expands, we invent languages to fill that space. Let's invent some languages and meet some new challenges!

Fitzpatrick's Fabulous Future