Sunday, November 20, 2016

Matters of state

One difference between functional programming and "regular" programming is the use of mutable state. In traditional programming, objects or programs hold state, and that state can change over time. In functional programming, objects are immutable and do not change their state over time.

One traditional beginner's exercise for object-oriented programming is to simulate an automated teller machine (ATM). It is often used because it maps an object of the physical world onto an object in the program world, and the operations are nontrivial yet well-understood.

It also defines an object (the ATM) which exists over time and has different states. As people deposit and withdraw money, the state of the ATM changes. (With enough withdrawals the state becomes "out of cash" and therefore "out of service".)
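
To make that concrete, here is a minimal sketch of the mutable-state version of the exercise, in Python. (The class and method names are my own, for illustration only.)

    class ATM:
        """The classic exercise: one object whose state changes over time."""

        def __init__(self, cash_on_hand):
            self.cash_on_hand = cash_on_hand    # state held by the object

        def deposit(self, amount):
            self.cash_on_hand += amount         # changes the object in place

        def withdraw(self, amount):
            if amount > self.cash_on_hand:
                raise RuntimeError("out of cash -- out of service")
            self.cash_on_hand -= amount         # changes the object in place

    atm = ATM(cash_on_hand=500)
    atm.deposit(100)
    atm.withdraw(250)
    print(atm.cash_on_hand)    # 350 -- the same object, now holding new state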

The ATM model is also a good example of how we in the programming industry have been focussed on changing state. For more than half a century, our computational models have used state -- mutable state -- often to the detriment of maintenance and clarity.

Our fixation on mutable state is clear to those who use functional programming languages. In those languages, state is not mutable. Programs may have objects, but objects are fixed and unchanging. Once created, an object may contain state but cannot change. (If you want an object to contain a different state, then you must create a new object with that different state.)
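
Returning to the ATM sketch above: in the immutable style, a deposit or withdrawal does not change the machine object; it returns a new object holding the new state. A minimal Python version (Python does not enforce immutability, so this relies on convention):

    class ImmutableATM:
        """Each operation returns a new object; no object ever changes."""

        def __init__(self, cash_on_hand):
            self.cash_on_hand = cash_on_hand

        def deposit(self, amount):
            return ImmutableATM(self.cash_on_hand + amount)

        def withdraw(self, amount):
            if amount > self.cash_on_hand:
                raise RuntimeError("out of cash -- out of service")
            return ImmutableATM(self.cash_on_hand - amount)

    atm1 = ImmutableATM(500)
    atm2 = atm1.deposit(100)      # a new object with the new state
    print(atm1.cash_on_hand)      # 500 -- the original is unchanged
    print(atm2.cash_on_hand)      # 600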

Programmers in the traditional languages of Java and C# got exposure to this notion through the immutable strings in those languages. A string in Java is immutable; you cannot change its contents. If you want a string with different content, such as all lower-case letters, you have to create a new object.
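
Python strings behave the same way; a quick illustration:

    s = "Hello, World"
    t = s.lower()       # creates a new string object
    print(s)            # Hello, World -- the original string is unchanged
    print(t)            # hello, world
    print(s is t)       # False -- two distinct objects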

Programming languages such as Haskell and Erlang make that notion the norm. Every object is immutable; an object may contain state, but it cannot be changed.

Why has it taken us more than fifty years to arrive at this, um, well, state?

I have a few ideas. As usual with my explanations, we have to understand our history.

One reason has to do with efficiency. The other reason has to do with mindset.

Reason one: Objects with mutable state were more efficient.

Early computers were less powerful than those of today. With today's computers, we can devote some percentage of processing to memory management and garbage collection; we can afford automatic memory management. On earlier computers, creating and destroying objects were operations that took significant amounts of time. It was more efficient to re-use the same object and simply change its state than to create a new object with the new state, point to that new object, and destroy the old object, returning its memory to the free pool.

Reason two: Objects with mutable state match the physical world.

Objects in the real world hold physical state. Whether it is an ATM or an automobile or an employee's file, the physical version of the object is one that changes over time. Library books, in the past, contained a special pocket glued to the back cover that held a card indicating the borrower and the date the book was due back. That card held different state over time; each lending was recorded -- until the card was filled.

The physical world has few immutable objects. (Technically all objects are mutable, as they wear and fade over time. But I'm not talking about those kinds of changes.) Most objects, especially objects for computation, change and hold state. Cash registers, ATMs, dresser drawers that hold t-shirts, cameras (with film that could be exposed or unexposed), ... just about everything holds state. (Some things do not change, such as bricks and stones used for houses and walkways, but those are not used for computation.)

We humans have been computing for thousands of years, and we've been doing it with mutable objects for all of that time. From tally sticks with marks cut by a knife to mechanical adding machines, we've used objects with changing states. It's only in the past half-century that it has been possible to compute with immutable objects.

That's about one percent of the time, which, considering everything we're doing, isn't bad. We humans advance our calculation methods slowly. (Consider how long it took to change from Roman numerals to Arabic numerals, and how long it took to accept zero as a number.)

I think the lesson of functional programming (with its immutable objects) is this: We are still in the early days of human computing. We are still figuring out how to calculate, and how to represent those calculations for others to understand. We should not assume that we are "finished" and that programming is "done". We have a long journey ahead of us, one that will bring more changes. We learn as we travel on this journey, and the end -- or even the intermediate points -- is not clear. It is an adventure.

Saturday, October 29, 2016

For compatibility, look across and not down

The PC industry has always had an obsession with compatibility. Indeed, the first question many people asked about computers was "Is it PC compatible?". A fair question at the time, as most software was written for the IBM PC and would not run on other systems.

Over time, our notion of "PC compatible" has changed. Most people today think of a Windows PC as "an IBM PC compatible PC" when in fact the hardware has changed so much that any current PC is not "PC compatible". (You cannot attach any device from an original IBM PC, including the keyboard, display, or adapter card.)

Compatibility is important -- not for everything but for the right things.

The original IBM PCs were, of course, all "PC compatible" (by definition) and the popular software packages (Lotus 1-2-3, Wordstar, WordPerfect, dBase III) were all "PC compatible" too. Yet one could not move data from one program to another. Text in Wordstar was in Wordstar format, numbers and formulas in Lotus 1-2-3 were in Lotus format, and data in dBase was in dBase format.

Application programs were compatible "downwards" but not "across". That is, they were compatible with the underlying layers (DOS, BIOS, and the PC hardware) but not with each other. To move data from one program to another it was necessary to "print" to a file and read the file into the destination program. (This assumes that both programs had the ability to export and import text data.)

Windows addressed that problem, with its notion of the clipboard and the ability to copy and paste text. The clipboard was not a complete solution, and Microsoft worked on other technologies to make programs more compatible (DDE, COM, DCOM, and OLE). This was the beginning of compatibility between programs.

Networked applications and the web gave us more insight into compatibility. The first networked applications for PCs were the client/server applications built with tools such as PowerBuilder. One PC hosted the database and other PCs sent requests to store, retrieve, and update data. At the time, all of the PCs were running Windows.

The web allowed for variation between client and server. With web servers and capable network software, it was no longer necessary for all computers to use the same hardware and operating systems. A Windows PC could request a web page from a server running Unix. A Macintosh could request a web page from a Linux server.

Web services use the same mechanisms as web pages, and allow for the same variation between client and server.

We no longer need "downwards" compatibility -- but we do need compatibility "across". A server must understand the incoming request. The client must understand the response. In today's world we ensure compatibility through the character set (Unicode) and the data format (commonly HTML, JSON, or XML).
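
As a small illustration, a client needs only to send UTF-8 encoded JSON that the server can parse; neither side cares about the other's hardware or operating system. (The URL and field names below are placeholders, not a real service.)

    import json
    import urllib.request

    # Build a request whose body is JSON, encoded as UTF-8.
    request = urllib.request.Request(
        "https://example.com/api/accounts",
        data=json.dumps({"account": "12345", "action": "balance"}).encode("utf-8"),
        headers={"Content-Type": "application/json; charset=utf-8"},
        method="POST",
    )

    # Any server that understands HTTP and JSON can answer -- Unix, Linux,
    # Windows, or something else entirely.
    with urllib.request.urlopen(request) as response:
        reply = json.loads(response.read().decode("utf-8"))
    print(reply)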

This means that our computing infrastructure can vary. It's no longer necessary to ensure that all of our computers are "PC compatible". I expect variation in computer hardware, as different architectures are used for different applications. Large-scale databases may use processors and memory designs that can handle the quantities of data. Small processors will be used for "internet of things" appliances. Nothing requires them to all use a single processor design.

Thursday, October 27, 2016

Actually, new platforms compel new languages

My previous post claimed that new platforms spur the adoption of new languages. The more I think about it, the more I believe I was wrong.

New platforms don't simply spur the adoption of new languages. They compel the adoption of new languages.

A platform offers a set of capabilities and a set of concepts. Languages are designed around those capabilities and concepts. Change the platform, and you change the capabilities and the concepts, and you need a different language.

For batch processing, COBOL and FORTRAN were acceptable. They didn't work for timeshare systems and they didn't work for microcomputers. Timeshare systems and microcomputers were interactive, and they needed a language like BASIC.

Windows and OS/2's Presentation Manager required a language that could handle event-driven processing, and object-oriented languages (first C++, later Visual Basic and Java) met that need.

Web applications needed a run-time system that was constantly present. We started web applications with Perl and C++ and quickly learned that the startup time for the programs was costing us performance. Java and C# load their run-time systems independently of the application program, and can keep the run-time in memory, which gives better performance.

Changing languages (and the mindset of the underlying platform) is a significant effort. One does not do it lightly, which is why large organizations tend to use older technology.

But where does this leave functional languages?

From my view, I see no platform that requires the use of functional languages. And without a compelling reason to use functional languages, I expect that we won't. Oh, functional languages won't go away; lots of developers use them. (I myself am a fan, although I tend to use Ruby or Python for my own projects.)

But functional languages won't become the popular languages of the day without a reason. Inertia will keep us with other languages.

At least, until a platform arrives that compels the capabilities of functional languages. That platform might be the "internet of things" although I expect the first versions will use the currently popular languages.

Functional languages offer increased reliability. It may be possible to prove certain programs correct, which will be of interest to government agencies, banks, and anyone in the security field. (Turing proved that we cannot prove most programs correct, but I believe that we can prove correct programs that are subject to a set of constraints. Functional languages may offer those constraints.)

I'm not abandoning functional languages. I like what they offer. Yet I recognize that they require an additional level of discipline (much like structured programming and object-oriented programming required additional discipline) and we will switch only when the benefits are higher than the cost.

Sunday, October 23, 2016

Platforms spur languages

We developers like to think that we make languages popular. We like to think that it is we who decide the fate of languages. I'm not so sure. It may be that we have less control than we believe.

I posit that it is platforms, that is, hardware and operating systems, that drive the popularity of programming languages.

The first programming languages, COBOL and FORTRAN, became popular only after computers became accepted in business and, more importantly, as platforms for running applications.

When computers were special-purpose devices, programs were written for the specific computer, usually in machine code or assembly language, with the idea that the program was part of the computer. The thought of moving an application from one computer to another was alien, like toasting bread with a refrigerator.

It was only after our mindset changed that we thought of moving programs. The idea of the computer as a platform, and the subsequent idea of moving a program from one computer to another, led to the idea of "portability" and languages common across computers. Thus, COBOL and FORTRAN were born.

Timesharing, a platform different from batch processing, gave us BASIC. Microcomputers in the late 1970s took BASIC to a higher level, making it one of the most popular languages at the time.

In the age of the IBM PC, computer programs became commercial, and BASIC was not suitable. BASIC offered no way to obscure source code and its performance was limited. Most commercial programs were written in assembly language, a choice made possible by the uniformity of the PC.

Microsoft Windows drove C++ and Visual Basic. While it was possible to write applications for Windows in assembly language, it was tedious and maintenance was expensive. C++ and its object-oriented capabilities made programming in Windows practical. Later, Visual Basic had just enough object-oriented capabilities to make programming in Windows not only possible but also easy.

Java's rise started with the internet and the web, but was also popular because it was "not from Microsoft". In this, Java is an outlier: not driven by a technical platform, but by emotion.

MacOS and iOS raised the popularity of Objective-C, at least until Apple announced Swift as the successor language for development. After that announcement, Objective-C dropped in popularity (and Swift started its rise).

Cloud computing and large data sets ("big data") have given us new languages for data management.

Looking at these data points, it seems that different platforms are favorable to different languages, and the popularity of the platform drives the popularity of the language.

Which is not to say that a language cannot achieve popularity without a platform. Some languages have, but they are few.

Some languages have failed to gain popularity despite other, non-platform inducements. The two largest "failures" that come to mind are PL/I and Ada. PL/I was invented by IBM and enjoyed some popularity, but has faded into all-but-obscurity. Ada was mandated by the US Department of Defense as a standard for all applications, and it, too, has faded.

If IBM at the height of its market control, and the US DoD through mandate, cannot make languages popular, what else (besides platforms) can?

If my theory is correct, then developers and managers should consider platforms when selecting programming languages. Often, developers pick what they know (not an unreasonable choice) and managers pick what is "safe" (also not unreasonable). If the platform is conducive to the language, then the decision is sound. If switching platforms, a different language may be the better option.

Wednesday, October 19, 2016

We prefer horizontal layers, not vertical stacks

Looking back at the 60-plus years of computer systems, we can see a pattern of design preferences. That pattern is an initial preference for vertical design (that is, a complete system from top to bottom) followed by a change to a horizontal divide between a platform and applications on that platform.

A few examples include mainframe computers, word processors, and smart phones.

Mainframe computers, in the early part of the mainframe age, were special-purpose machines. IBM changed the game with its System/360, which was a general-purpose computer. The S/360 could be used by commercial, scientific, or government organizations. It provided a common platform upon which ran application programs. The design was revolutionary, and it has stayed with us. Minicomputers followed the "platform and applications" pattern, as did microcomputers and later IBM's own Personal Computer.

When we think of the phrase "word processor", we think of software, most often Microsoft's "Word" application (which runs on the Windows platform). But word processors were not always purely software. The original word processors were smart typewriters, machines with enhanced capabilities. In the mid-1970s, a word processor was a small computer with a keyboard, display, processing unit, floppy disks for storage, a printer, and software to make it all go.

But word processors as hardware did not last long. We moved away from the all-in-one design. In its place we used the "application on platform" approach, using PCs as the hardware and a word processing application program.

More recently, smart phones have become the platform of choice for photography, music, and navigation. We have moved away from cameras (a complete set of hardware and software for taking pictures), moved away from MP3 players (a complete set of hardware and software for playing music), and moved away from navigation units (a complete set of hardware and software for providing directions). In their place we use smart phones.

(Yes, I know that some people still prefer discrete cameras, and some people still use discrete navigation systems. I myself still use an MP3 player. But the number of people who use discrete devices for these tasks is small.)

I tried thinking of single-use devices that are still popular, and none came to mind. (I also tried thinking of applications that ran on platforms that moved to single-use devices, and also failed.)

It seems we have a definite preference for the "application on platform" design.

What does this mean for the future? For smart phones, possibly not so much -- other than that they will remain popular until a new platform arrives. For the "internet of things", it means that we will see a number of task-specific devices such as thermostats and door locks until an "internet of things" platform comes along, and then all of those task-specific devices will become obsolete (like the task-specific mainframes or word processor hardware).

For cloud systems, perhaps the cloud is the platform and the virtual servers are the applications. Rather than discrete web servers and database servers, the cloud is the platform for web server and database server "applications" that will be containerized versions of the software. The "application on platform" pattern means that cloud and containers will endure for some time, and that they are a good choice for architecture.

Sunday, August 21, 2016

Scale

Software development requires an awareness of scale. That is, the knowledge of the size of the software, and the selection of the right tools and skills to manage the software.

Scale is present in just about every aspect of human activity. We humans have different forms of transportation. We can walk, ride bicycles, and drive automobiles. (We can also run, swim, ride buses, and fly hang gliders, but I will stick to the three most common forms.)

Walking is a simple activity. It requires little in the way of planning, little in the way of equipment, and little in the way of skill.

Riding a bicycle is somewhat higher in complexity. It requires equipment (the bicycle) and some skill (knowing how to ride the bicycle). It requires planning: when we arrive at our destination, what do we do with the bicycle? We may need a lock, to secure the bicycle. We may need a safety helmet. The clothes we wear must be tight-fitting, or at least designed to not interfere with the bicycle mechanism. At this level of complexity we must be aware of the constraints on our equipment.

Driving an automobile is more complex than riding a bicycle. It requires more equipment (the automobile) and more skills (driving). It requires more planning. In addition to the car we will need gasoline. And insurance. And registration. At this level of complexity we must be aware of the constraints on our equipment and also the requirements from external entities.

Translating this to software, we can see that some programs are simple and some are complex.

Tiny programs (perhaps a one-liner in Perl) are simple enough that we can write them, use them, and then discard them, with no other thoughts for maintenance or upkeep. Let's consider them the equivalent of walking.

Small programs (perhaps a page-long script in Ruby) require more thought to prepare (and test) and may need comments to describe some of the inner workings. They can be the equivalent of riding a bicycle.

Large programs (let's jump to sophisticated packages with hundreds of thousands of lines of code) require a language that helps us organize our code; comprehensive sets of unit, component, and system tests; and documentation to record the rationale behind design decisions. These are the analogue of driving an automobile.

But here is where software differs from transportation: software changes. The three modes of transportation (walking, bicycle, automobile) are static and distinct. Walking is walking. Driving is driving. But software is dynamic -- frequently, over time, it grows. Many large programs start out as small programs.

A small program can grow into a larger program. When it does, the competent developer changes the tools and practices used to maintain the code. Which means that a competent programmer must be aware of the scale of the project, and the changes in that scale. As the code grows, a programmer must change his (or her) tools and practice.

There's more, of course.

The growth effect of software extends to the management of project teams. A project may start with a small number of people. Over time, the project grows, and the number of people on the team increases.

The techniques and practices of a small team don't work for larger teams. Small teams can operate informally, with everyone in one room and everyone talking to everyone. Larger teams are usually divided into sub-teams, and the coordination of effort is harder. Informal methods work poorly, and the practices must be more structured, more disciplined.

Enterprise-class projects are even more complex. They require more discipline and more structure than the merely large projects. The structure and discipline are often expressed in bureaucracy, frustrating the whims of "lone cowboy" programmers.

Just as a competent developer changes tools and adjusts practices to properly manage a growing code base, the competent manager must also change tools and adjust practices to properly manage the team. Which means that a competent manager must be aware of the scale of the project, and the changes in that scale. As a project grows, a manager must lead his (or her) people through the changes.

Sunday, August 14, 2016

PC-DOS killed the variants of programming languages

BASIC was the last language with variants. Not "variant" in the sense of the flexible-value type known as "Variant", but in the sense of different implementations. Different dialects.

Many languages have versions. C# has had different releases, as has Java. Perl is transitioning from version 5 (which had multiple sub-versions) to version 6 (which will most likely have multiple sub-versions).  But that's not what I'm talking about.

Some years ago, languages had different dialects. There were multiple implementations with different features. COBOL and FORTRAN both had machine-specific versions. But BASIC had the most variants. For example:

- Most BASICs used the "OPEN" statement to open files, but HP BASIC and GE BASIC used the "FILES" statement which listed the names of all files used in the program. (An OPEN statement lists only one file, and a program may use multiple OPEN statements.)

- Most BASICs used parentheses to enclose variable subscripts, but some used square brackets.

- Some BASICs had "ON n GOTO" statements but some used "GOTO OF n" statements.

- Some BASICs allowed the apostrophe as a comment indicator; others did not.

- Some BASICs allowed for statement modifiers, such as "FOR" or "WHILE" at the end of a statement, and others did not.

These are just some of the differences in the dialects of BASIC. There were others.

What interests me is not that BASIC had so many variants, but that languages since then have not. The last attempt at a dialect of a language was Microsoft's Visual J++, a variant of Java. Microsoft was challenged in court by Sun, and no one has attempted a special version of a language since. Because of this, I place the demise of variants in the year 2000.

There are two factors that come to mind. One is standards; the other is open source.

BASIC was introduced to the industry in the 1960s. There was no standard for BASIC, except perhaps for the Dartmouth implementation, which was the first. The expectation of standards has risen since then, with standards for C, C++, Java, C#, JavaScript, and many others. With clear standards, different implementations of a language would be fairly close.

The argument that open source prevented the creation of variants of languages makes some sense. After all, one does not need to create a new, special version of a language when the "real" language is available for free. Why invest effort into a custom implementation? And the timing of open source is coincidental with the demise of variants, with open source rising just as language variants disappeared.

But the explanation is different, I think. It was not standards (or standards committees) and it was not open source that killed variants of languages. It was the PC and Windows.

The IBM PC and PC-DOS saw the standardization and commoditization of hardware, and the separation of software from hardware.

In the 1960s and 1970s, mainframe vendors and minicomputer vendors competed for customer business. They sold hardware, operating systems, and software. They needed ways to distinguish their offerings, and BASIC was one way that they could do that.

Why BASIC? There were several reasons. It was a popular language. It was easily implemented. It had no official standard, so implementors could add whatever features they wanted. A hardware manufacturer could offer their own, special version of BASIC as a productivity tool. IBM continued this "tradition" with BASIC in the ROM of the IBM PC and an advanced BASIC with PC-DOS.

But PC compatibles did not offer BASIC, and didn't need to. When manufacturers figured out how to build compatible computers, the factors for selecting a PC compatible were compatibility and price, not a special version of BASIC. Software would be acquired separately from the hardware.

Mainframes and minicomputers were expensive systems, sold with operating systems and software. PCs were different creatures, sold with an operating system but not software.

It's an idea that holds today.

With software being sold (or distributed, as open source) separately from the hardware, there is no need to build variants. Commercial languages (C#, Java, Swift) are each managed by a single company, which has an incentive to keep the language standardized. Open source languages (Perl, Python, Ruby) can be had "for free", so why build a special version -- especially when that special version will need constant effort to match the changes in the "original"? Standards-based languages (C, C++) offer certainty to customers, and variants of them offer little advantage.

The only language that has variants today seems to be SQL. That makes sense, as the SQL interpreter is bundled with the database. Creating a variant is a way of distinguishing a product from the competition.

I expect that the commercial languages will continue to evolve along consistent lines. Microsoft will enhance C#, but there will be only the Microsoft implementation (or at least, the only implementation of significance). Oracle will maintain Java. Apple will maintain Swift.

The open source languages will evolve too. But Perl, Python, and Ruby will continue to see single implementations.

SQL will continue to be the outlier. It will continue to see variants, as different database vendors supply them. It will be interesting to see what happens with the various NoSQL databases.