Wednesday, December 28, 2016

Moving to the cloud requires a lot. Don't be surprised.

Moving applications to the cloud is not easy. Existing applications cannot simply be dropped onto cloud servers and be expected to leverage the benefits of cloud computing. And this should not surprise people.

The cloud is a different environment than a web server. (Or a Windows desktop.) Moving to the cloud is a change in platform.

The history of IT has several examples of such changes. Each transition from one platform to another required changes to the code, and often changes to how we *think* about programs.

The operating system

The first changes occurred in the mainframe age. The very first was probably the shift from a raw hardware platform to hardware with an operating system. With raw hardware, the programmer has access to the entire computing system, including memory and devices. With an operating system, the program must request such access through the operating system. It was no longer possible to write directly to the printer; one had to request the use of each device. This change also saw the separation of tasks between programmers and system operators, the latter handling the scheduling and execution of programs. One could not use the older programs; they had to be rewritten to call the operating system rather than communicate with devices.

Timesharing and interactive systems

Timesharing was another change in the mainframe era. In contrast to batch processing (running one program at a time, each program reading and writing data as needed but with no direct interaction with the programmer), timeshare systems interacted with users. Timeshare systems saw the use of on-line terminals, something not available for batch systems. The BASIC language was developed to take advantage of these terminals. Programs had to wait for user input and verify that the input was correct and meaningful. While batch systems could merely write erroneous input to a 'reject' file, timeshare systems could prompt the user for a correction. (If they were written to detect errors.) One could not use a batch program in an interactive environment; programs had to be rewritten.

Minicomputers

The transition from mainframes to minicomputers was, interestingly, one of the simpler conversions in IT history. In many respects, minicomputers were smaller versions of mainframes. IBM minicomputers used the batch processing model that matched its mainframes. Minicomputers from manufacturers like DEC and Data General used interactive systems, following the lead of timeshare systems. In this case, it *was* possible to move programs from mainframes to minicomputers.

Microcomputers

If minicomputers allowed for an easy transition, microcomputers were the opposite. They were small and followed the path of interactive systems. Most ran BASIC in ROM, with no other languages available. The operating systems available (CP/M, MS-DOS, and a host of others) were limited and weak compared to today's, providing no protection for hardware and no multitasking. Every program for microcomputers had to be written from scratch.

Graphical operating systems

Windows (and OS/2 and other systems, for those who remember them) introduced a number of changes to programming. The obvious difference between Windows programs and the older DOS programs was, of course, the graphical user interface. From the programmer's perspective, Windows required event-driven programming, something not available in DOS. A Windows program had to respond to mouse clicks and keyboard entries anywhere on the program's window, which was very different from the DOS text-based input methods. Old DOS programs could not be simply dropped into Windows and run; they had to be rewritten. (Yes, technically one could run the older programs in the "DOS box", but that was not really "moving to Windows".)

Web applications

Web applications, with browsers and servers, HTML and "submit" requests, with CGI scripts and JavaScript and CSS and AJAX, were completely different from Windows "desktop" applications. The intense interaction of a window with fine-grained controls and events was replaced with the large-scale page request, later supplemented by smaller AJAX and AJAX-like web service calls. The separation of user interface (HTML, CSS, JavaScript, and browser) from "back end" (the server) required a complete rewrite of applications.

Mobile apps

Small screen. Touch-based. Storage on servers, not so much on the device. Device processor for handling input; main processing on servers.

One could not drop a web application (or an old Windows desktop application) onto a mobile device. (Yes, you can run Windows applications on Microsoft's Surface tablets. But the Surface tablets are really PCs in the shape of tablets, and they do not use the model used by iOS or Android.)

You had to write new apps for mobile devices. You had to build a collection of web services to be run on the back end. (Not too different from the web application back end, but not exactly the same.)

Which brings us to cloud applications

Cloud applications use multiple instances of servers (web servers, database servers, and others), each hosting services (called "microservices" because each service is less than a full application) communicating through message queues.

One cannot simply move a web application into the cloud. You have to rewrite it to split computation from coordination, the latter handled by queues. Computation must be split into small, discrete services. You must write controller services that make requests to multiple microservices. You must design your front-end apps (which run on mobile devices and web browsers) and establish an efficient API to bridge the front-end apps with the back-end services.
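
As a rough illustration (a Python sketch; the service names, queue contents, and controller logic are invented, not taken from any particular cloud vendor), a controller service coordinates calls to several microservices and hands slower work to a queue that a separate worker drains:

    import queue
    import threading

    # A message queue decouples the controller from slower, background work.
    # In a real cloud deployment this would be a managed queue service;
    # here a standard-library queue.Queue stands in for it.
    work_queue = queue.Queue()

    def pricing_service(item_id):
        # Hypothetical microservice: returns a price for an item.
        return {"item_id": item_id, "price": 9.99}

    def inventory_service(item_id):
        # Hypothetical microservice: returns stock on hand for an item.
        return {"item_id": item_id, "on_hand": 42}

    def audit_worker():
        # Background worker: drains the queue and records each event.
        while True:
            event = work_queue.get()
            if event is None:   # sentinel value: shut down
                break
            print("audit:", event)

    def order_controller(item_id):
        # Controller service: makes requests to multiple microservices,
        # then hands off non-urgent work (auditing) to the queue.
        price = pricing_service(item_id)
        stock = inventory_service(item_id)
        work_queue.put({"event": "order_checked", "item_id": item_id})
        return {"price": price["price"], "in_stock": stock["on_hand"] > 0}

    if __name__ == "__main__":
        worker = threading.Thread(target=audit_worker)
        worker.start()
        print(order_controller("A-100"))
        work_queue.put(None)
        worker.join()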

In other words, you have to rewrite your applications. (Again.)

A different platform requires a different design. This should not be a surprise.


Wednesday, December 14, 2016

Steps to AI

The phrase "Artificial Intelligence" (AI) has been used to describe computer programs that can perform sophisticated, autonomous operations, and it has been used for decades. (One wag puts it as "artificial intelligence is twenty years away... always".)

Along with AI we have the term "Machine Learning" (ML). Are they different? Yes, but the popular usages make no distinction. And for this post, I will consider them the same.

Use of the term waxes and wanes. The AI term was popular in the 1980s and it is popular now. One difference between the 1980s and now: we may have enough computing power to actually pull it off.

Should you jump right into AI? My guess is no. AI has preconditions -- things you should be doing before you make a serious commitment to AI.

First, you need a significant amount of computing power. Second, you need a significant amount of human intelligence. With AI and ML, you are teaching the computer to make decisions. Anyone who has programmed a computer can tell you that this is not trivial.

It strikes me that the necessary elements for AI are very similar to the necessary elements for analytics. Analytics is almost the same as AI -- analyzing large quantities of data -- except it uses humans to interpret the data, not computers. Analytics is the predecessor to AI. If you're successful at analytics, then you are ready to move on to AI. If you haven't succeeded at analytics (or even attempted it), you're not ready for AI.

Of course, one cannot simply jump into analytics and expect to be successful. Analytics has its own prerequisites. Analytics needs data, the tools to analyze the data and render it for humans, and smart humans to interpret the data. If you don't have the data, the tools, and the clever humans, you're not ready for analytics.

But we're not done with levels of prerequisites! The data for analytics (and eventually AI) has its own set of preconditions. You have to collect the data, store the data, and be able to retrieve the data. You have to understand the data, know its origin (including the origin date and time), and know its expiration date (if it has one). You have to understand the quality of your data.

The steps to artificial intelligence are through data collection, metadata, and analytics. Each step has to be completed before you can advance to the next level. (Much like the Capability Maturity Model.) Don't make the mistake of starting a project without the proper experience in place.

Sunday, November 20, 2016

Matters of state

One difference between functional programming and "regular" programming is the use of mutable state. In traditional programming, objects or programs hold state, and that state can change over time. In functional programming, objects are immutable and do not change their state over time.

One traditional beginner's exercise for object-oriented programming is to simulate an automated teller machine (ATM). It is often used because it maps an object of the physical world onto an object in the program world, and the operations are nontrivial yet well-understood.

It also defines an object (the ATM) which exists over time and has different states. As people deposit and withdraw money, the state of the ATM changes. (With enough withdrawals the state becomes "out of cash" and therefore "out of service".)

The ATM model is also a good example of how we in the programming industry have been focussed on changing state. For more than half a century, our computational models have used state -- mutable state -- often to the detriment of maintenance and clarity.

Our fixation on mutable state is clear to those who use functional programming languages. In those languages, state is not mutable. Programs may have objects, but objects are fixed and unchanging. Once created, an object may contain state but cannot change. (If you want an object to contain a different state, then you must create a new object with that different state.)
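
A small sketch (in Python; the class and the amounts are made up for illustration) shows the idea: instead of mutating an ATM's state, each operation returns a new object carrying the new state:

    from dataclasses import dataclass, replace

    @dataclass(frozen=True)   # frozen=True makes instances immutable
    class Atm:
        cash_on_hand: int

        def withdraw(self, amount):
            # Instead of changing this object, return a new one with the new state.
            if amount > self.cash_on_hand:
                raise ValueError("out of cash")
            return replace(self, cash_on_hand=self.cash_on_hand - amount)

    atm = Atm(cash_on_hand=500)
    atm2 = atm.withdraw(200)
    print(atm.cash_on_hand)   # 500 -- the original object is unchanged
    print(atm2.cash_on_hand)  # 300 -- the new state lives in a new object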

Programmers in the traditional languages of Java and C# got an exposure to this notion with the immutable strings in those languages. A string in Java is immutable; you cannot change its contents. If you want a string with a different content, such as all lower-case letters, you have to create a new object.

Programming languages such as Haskell and Erlang make that notion the norm. Every object is immutable; it may contain state but cannot be changed.

Why has it taken us more than fifty years to arrive at this, um, well, state?

I have a few ideas. As usual with my explanations, we have to understand our history.

One reason has to do with efficiency. The other reason has to do with mindset.

Reason one: Objects with mutable state were more efficient.

Early computers were less powerful than those of today. With today's computers, we can devote some percentage of processing to memory management and garbage collection. We can afford the automatic memory management. Earlier computers were less powerful, and creating and destroying objects were operations that took significant amounts of time. It was more efficient to re-use the same object and simply change its state rather than create a new object with the new state, point to that new object, and destroy the old object and return its memory to the free pool.

Reason two: Objects with mutable state match the physical world

Objects in the real world hold physical state. Whether it is an ATM or an automobile or an employee's file, the physical version of the object is one that changes over time. Books at a library, in the past, contained a special pocket glued to the back cover used to hold a card which indicated the borrower and the date due back at the library. That card held different state over time; each lending would be recorded -- until the card was filled.

The physical world has few immutable objects. (Technically all objects are mutable, as they wear and fade over time. But I'm not talking about those kinds of changes.) Most objects, especially objects for computation, change and hold state. Cash registers, ATMs, dresser drawers that hold t-shirts, cameras (with film that could be exposed or unexposed), ... just about everything holds state. (Some things do not change, such as bricks and stones used for houses and walkways, but those are not used for computation.)

We humans have been computing for thousands of years, and we've been doing it with mutable objects for all of that time. From tally sticks with marks cut by a knife to mechanical adding machines, we've used objects with changing states. It's only in the past half-century that it has been possible to compute with immutable objects.

That's about one percent of the time, which, considering everything we're doing, isn't bad. We humans advance our calculation methods slowly. (Consider how long it took to change from Roman numerals to Arabic, and how long it took to accept zero as a number.)

I think the lesson of functional programming (with its immutable objects) is this: We are still in the early days of human computing. We are still figuring out how to calculate, and how to represent those calculations for others to understand. We should not assume that we are "finished" and that programming is "done". We have a long journey ahead of us, one that will bring more changes. We learn as we travel on this journey, and the end -- or even the intermediate points -- is not clear. It is an adventure.

Saturday, October 29, 2016

For compatibility, look across and not down

The PC industry has always had an obsession with compatibility. Indeed, the first question many people asked about computers was "Is it PC compatible?". A fair question at the time, as most software was written for the IBM PC and would not run on other systems.

Over time, our notion of "PC compatible" has changed. Most people today think of a Windows PC as "an IBM PC compatible PC" when in fact the hardware has changed so much that any current PC is not "PC compatible". (You cannot attach any device from an original IBM PC, including the keyboard, display, or adapter card.)

Compatibility is important -- not for everything but for the right things.

The original IBM PCs were, of course, all "PC compatible" (by definition) and the popular software packages (Lotus 1-2-3, Wordstar, WordPerfect, dBase III) were all "PC compatible" too. Yet one could not move data from one program to another. Text in Wordstar was in Wordstar format, numbers and formulas in Lotus 1-2-3 were in Lotus format, and data in dBase was in dBase format.

Application programs were compatible "downwards" but not "across". That is, they were compatible with the underlying layers (DOS, BIOS, and the PC hardware) but not with each other. To move data from one program to another it was necessary to "print" to a file and read the file into the destination program. (This assumes that both programs had the ability to export and import text data.)

Windows addressed that problem, with its notion of the clipboard and the ability to copy and paste text. The clipboard was not a complete solution, and Microsoft worked on other technologies to make programs more compatible (DDE, COM, DCOM, and OLE). This was the beginning of compatibility between programs.

The networked applications and the web gave us more insight into compatibility. The first networked applications for PCs were the client/server applications such as Powerbuilder. One PC hosted the database and other PCs sent requests to store, retrieve, and update data. At the time, all of the PCs were running Windows.

The web allowed for variation between client and server. With web servers and capable network software, it was no longer necessary for all computers to use the same hardware and operating systems. A Windows PC could request a web page from a server running Unix. A Macintosh PC could request a web page from a Linux server.

Web services use the same mechanisms as web pages, and allow for the same variation between client and server.

We no longer need "downwards" compatibility -- but we do need compatibility "across". A server must understand the incoming request. The client must understand the response. In today's world we ensure compatibility through the character set (UNICODE), and the data format (commonly HTML, JSON, or XML).
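
A minimal sketch of that "across" compatibility (in Python; the payload is invented): as long as both sides agree on UTF-8 text and a data format such as JSON, the hardware and operating system on either end do not matter.

    import json

    # The client builds a request and encodes it as UTF-8 JSON...
    request = {"action": "get_page", "path": "/index.html"}
    wire_bytes = json.dumps(request).encode("utf-8")

    # ...and the server, whatever platform it runs on, decodes the same bytes.
    received = json.loads(wire_bytes.decode("utf-8"))
    print(received["action"], received["path"])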

This means that our computing infrastructure can vary. It's no longer necessary to ensure that all of our computers are "PC compatible". I expect variation in computer hardware, as different architectures are used for different applications. Large-scale databases may use processors and memory designs that can handle the quantities of data. Small processors will be used for "internet of things" appliances. Nothing requires them to all use a single processor design.

Thursday, October 27, 2016

Actually, new platforms compel new languages

My previous post claimed that new platforms spur the adoption of new languages. The more I think about it, the more I believe I was wrong.

New platforms don't simply spur the adoption of new languages. They compel the adoption of new languages.

A platform offers a set of capabilities and a set of concepts. Languages are designed around those capabilities and concepts. Change the platform, and you change the capabilities and the concepts, and you need a different language.

For batch processing, COBOL and FORTRAN were acceptable. They didn't work for timeshare systems and they didn't work for microcomputers. Timeshare systems and microcomputers were interactive, and they needed a language like BASIC.

Windows and OS/2's Presentation Manager required a language that could handle event-driven processing, and object-oriented languages (first C++, later Visual Basic and Java) met that need.

Web applications needed a run-time system that was constantly present. We started web applications with Perl and C++ and quickly learned that the startup time for the programs was costing us performance. Java and C# load their run-time systems independently of the application program, and can keep the run-time in memory, which gives better performance.

Changing languages (and the mindset of the underlying platform) is a significant effort. One does not do it lightly, which is why large organizations tend to use older technology.

But where does this leave functional languages?

From my view, I see no platform that requires the use of functional languages. And without a compelling reason to use functional languages, I expect that we won't. Oh, functional languages won't go away; lots of developers use them. (I myself am a fan, although I tend to use Ruby or Python for my own projects.)

But functional languages won't become the popular languages of the day without a reason. Inertia will keep us with other languages.

At least, until a platform arrives that compels the capabilities of functional languages. That platform might be the "internet of things" although I expect the first versions will use the currently popular languages.

Functional languages offer increased reliability. It may be possible to prove certain programs correct, which will be of interest to government agencies, banks, and anyone in the security field. (Turing proved that we cannot prove most programs correct, but I believe that we can prove correct those programs that are subject to a set of constraints. Functional languages may offer those constraints.)

I'm not abandoning functional languages. I like what they offer. Yet I recognize that they require an additional level of discipline (much like structured programming and object-oriented programming required additional discipline) and we will switch only when the benefits are higher than the cost.

Sunday, October 23, 2016

Platforms spur languages

We developers like to think that we make languages popular. We like to think that we are the ones who decide the fate of languages. I'm not so sure. It may be that we're in less control than we believe.

I posit that it is platforms, that is, hardware and operating systems, that drive the popularity of programming languages.

The first programming languages, COBOL and FORTRAN, became popular only after computers became accepted in business and, more importantly, as platforms for running applications.

When computers were special-purpose devices, programs were written for the specific computer, usually in machine code or assembly language, with the idea that the program was part of the computer. The thought of moving an application from one computer to another was alien, like toasting bread with a refrigerator.

It was only after our mindset changed that we thought of moving programs. The idea of the computer as a platform, and the successive idea of moving a program from one computer to another, led to the idea of "portability" and languages common across computers. Thus, COBOL and FORTRAN were born.

Timesharing, a platform different from batch processing, gave us BASIC. Microcomputers in the late 1970s took BASIC to a higher level, making it one of the most popular languages at the time.

In the age of the IBM PC, computer programs became commercial, and BASIC was not suitable. BASIC offered no way to obscure source code and its performance was limited. Most commercial programs were written in assembly language, a choice possible due to the uniformity of the PC.

Microsoft Windows drove C++ and Visual Basic. While it was possible to write applications for Windows in assembly language, it was tedious and maintenance was expensive. C++ and its object-oriented capabilities made programming in Windows practical. Later, Visual Basic had just enough object-oriented capabilities to make programming in Windows not only possible but also easy.

Java's rise started with the internet and the web, but Java was also popular because it was "not from Microsoft". In this, Java is an outlier: not driven by a technical platform, but by emotion.

MacOS and iOS raised the popularity of Objective-C, at least until Apple announced Swift as the successor language for development. After that announcement, Objective-C dropped in popularity (and Swift started its rise).

Cloud computing and large data sets ("big data") have given us new languages for data management.

Looking at these data points, it seems that different platforms are favorable to different languages, and the popularity of the platform drives the popularity of the language.

Which is not to say that a language cannot achieve popularity without a platform. Some languages have, but they are few.

Some languages have failed to gain popularity despite other, non-platform inducements. The two largest "failures" that come to mind are PL/I and Ada. PL/I was invented by IBM and enjoyed some popularity, but has faded into all-but-obscurity. Ada was mandated by the US Department of Defense as a standard for all applications, and it, too, has faded.

If IBM, at the height of its market control, and the US DoD, through mandate, cannot make languages popular, what else (besides platforms) can?

If my theory is correct, then developers and managers should consider platforms when selecting programming languages. Often, developers pick what they know (not an unreasonable choice) and managers pick what is "safe" (also not unreasonable). If the platform is conducive to the language, then the decision is sound. If switching platforms, a different language may be the better option.

Wednesday, October 19, 2016

We prefer horizontal layers, not vertical stacks

Looking back at the 60-plus years of computer systems, we can see a pattern of design preferences. That pattern is an initial preference for vertical design (that is, a complete system from top to bottom) followed by a change to a horizontal divide between a platform and applications on that platform.

A few examples include mainframe computers, word processors, and smart phones.

Mainframe computers, in the early part of the mainframe age, were special-purpose machines. IBM changed the game with its System/360, which was a general-purpose computer. The S/360 could be used for commercial, scientific, or government organizations. It provided a common platform upon which ran application programs. The design was revolutionary, and it has stayed with us. Minicomputers followed the "platform and applications" pattern, as did microcomputers and later IBM's own Personal Computer.

When we think of the phrase "word processor", we think of software, most often Microsoft's "Word" application (which runs on the Windows platform). But word processors were not always purely software. The original word processors were smart typewriters, machines with enhanced capabilities. In the mid-1970s, a word processor was a small computer with a keyboard, display, processing unit, floppy disks for storage, a printer, and software to make it all go.

But word processors as hardware did not last long. We moved away from the all-in-one design. In its place we used the "application on platform" approach, using PCs as the hardware and a word processing application program.

More recently, smart phones have become the platform of choice for photography, music, and navigation. We have moved away from cameras (a complete set of hardware and software for taking pictures), moved away from MP3 players (a complete set of hardware and software for playing music), and moved away from navigation units (a complete set of hardware and software for providing directions). In their place we use smart phones.

(Yes, I know that some people still prefer discrete cameras, and some people still use discrete navigation systems. I myself still use an MP3 player. But the number of people who use discrete devices for these tasks is small.)

I tried thinking of single-use devices that are still popular, and none came to mind. (I also tried thinking of applications that ran on platforms that moved to single-use devices, and also failed.)

It seems we have a definite preference for the "application on platform" design.

What does this mean for the future? For smart phones, possibly not so much -- other than they will remain popular until a new platform arrives. For the "internet of things", it means that we will see a number of task-specific devices such as thermostats and door locks until an "internet of things" platform comes along, and then all of those task-specific devices will become obsolete (like the task-specific mainframes or word processor hardware).

For cloud systems, perhaps the cloud is the platform and the virtual servers are the applications. Rather than discrete web servers and database servers, the cloud is the platform for web server and database server "applications" that are containerized versions of the software. The "application on platform" pattern suggests that cloud and containers will endure for some time, and that they are a good architectural choice.

Sunday, August 21, 2016

Scale

Software development requires an awareness of scale. That is, the knowledge of the size of the software, and the selection of the right tools and skills to manage the software.

Scale is present in just about every aspect of human activity. We humans have different forms of transportation. We can walk, ride bicycles, and drive automobiles. (We can also run, swim, ride busses, and fly hang gliders, but I will stick to the three most common forms.)

Walking is a simple activity. It requires little in the way of planning, little in the way of equipment, and little in the way of skill.

Riding a bicycle is somewhat higher in complexity. It requires equipment (the bicycle) and some skill (knowing how to ride the bicycle). It requires planning: when we arrive at our destination, what do we do with the bicycle? We may need a lock, to secure the bicycle. We may need a safety helmet. The clothes we wear must be tight-fitting, or at least designed to not interfere with the bicycle mechanism. At this level of complexity we must be aware of the constraints on our equipment.

Driving an automobile is more complex than riding a bicycle. It requires more equipment (the automobile) and more skills (driving). It requires more planning. In addition to the car we will need gasoline. And insurance. And registration. At this level of complexity we must be aware of the constraints on our equipment and also the requirements from external entities.

Translating this to software, we can see that some programs are simple and some are complex.

Tiny programs (perhaps a one-liner in Perl) are simple enough that we can write them, use them, and then discard them, with no other thoughts for maintenance or upkeep. Let's consider them the equivalent of walking.

Small programs (perhaps a page-long script in Ruby) require more thought to prepare (and test) and may need comments to describe some of the inner workings. They can be the equivalent of riding a bicycle.

Large programs (let's jump to sophisticated packages with hundreds of thousands of lines of code) require a language that helps us organize our code; comprehensive sets of unit, component, and system tests; and documentation to record the rationale behind design decisions. These are the analogue of driving an automobile.

But here is where software differs from transportation: software changes. The three modes of transportation (walking, bicycle, automobile) are static and distinct. Walking is walking. Driving is driving. But software is dynamic -- frequently, over time, it grows. Many large programs start out as small programs.

A small program can grow into a larger program. When it does, the competent developer changes the tools and practices used to maintain the code. Which means that a competent programmer must be aware of the scale of the project, and the changes in that scale. As the code grows, a programmer must change his (or her) tools and practice.

There's more, of course.

The growth effect of software extends to the management of project teams. A project may start with a small number of people. Over time, the project grows, and the number of people on the team increases.

The techniques and practices of a small team don't work for larger teams. Small teams can operate informally, with everyone in one room and everyone talking to everyone. Larger teams are usually divided into sub-teams, and the coordination of effort is harder. Informal methods work poorly, and the practices must be more structured, more disciplined.

Enterprise-class projects are even more complex. They require more discipline and more structure than the merely large projects. The structure and discipline is often expressed in bureaucracy, frustrating the whims of "lone cowboy" programmers.

Just as a competent developer changes tools and adjusts practices to properly manage a growing code base, the competent manager must also change tools and adjust practices to properly manage the team. Which means that a competent manager must be aware of the scale of the project, and the changes in that scale. As a project grows, a manager must lead his (or her) people through the changes.

Sunday, August 14, 2016

PC-DOS killed the variants of programming languages

BASIC was the last language with variants. Not "variant" as in the flexible-value type known as "Variant", but variants as in different implementations. Different dialects.

Many languages have versions. C# has had different releases, as has Java. Perl is transitioning from version 5 (which had multiple sub-versions) to version 6 (which will most likely have multiple sub-versions).  But that's not what I'm talking about.

Some years ago, languages had different dialects. There were multiple implementations with different features. COBOL and FORTRAN both had machine-specific versions. But BASIC had the most variants. For example:

- Most BASICs used the "OPEN" statement to open files, but HP BASIC and GE BASIC used the "FILES" statement which listed the names of all files used in the program. (An OPEN statement lists only one file, and a program may use multiple OPEN statements.)

- Most BASICs used parentheses to enclose variable subscripts, but some used square brackets.

- Some BASICs had "ON n GOTO" statements but some used "GOTO OF n" statements.

- Some BASICs allowed the apostrophe as a comment indicator; others did not.

- Some BASICs allowed for statement modifiers, such as "FOR" or "WHILE" at the end of a statement, and others did not.

These are just some of the differences in the dialects of BASIC. There were others.

What interests me is not that BASIC had so many variants, but that languages since then have not. The last attempt at a dialect of a language was Microsoft's Visual J++, a variant of Java. Microsoft was challenged in court by Sun, and no one has attempted a special version of a language since. Because of this, I place the demise of variants in the year 2000.

There are two factors that come to mind. One is standards; the other is open source.

BASIC was introduced to the industry in the 1960s. There was no standard for BASIC, except perhaps the original Dartmouth implementation. The expectation of standards has risen since then, with standards for C, C++, Java, C#, JavaScript, and many others. With clear standards, different implementations of languages would be fairly close.

The argument that open source prevented the creation of variants of languages makes some sense. After all, one does not need to create a new, special version of a language when the "real" language is available for free. Why invest effort into a custom implementation? And the timing of open source is coincidental with the demise of variants, with open source rising just as language variants disappeared.

But the explanation is different, I think. It was not standards (or standards committees) and it was not open source that killed variants of languages. It was the PC and Windows.

The IBM PC and PC-DOS saw the standardization and commoditization of hardware, and the separation of software from hardware.

In the 1960s and 1970s, mainframe vendors and minicomputer vendors competed for customer business. They sold hardware, operating systems, and software. They needed ways to distinguish their offerings, and BASIC was one way that they could do that.

Why BASIC? There were several reasons. It was a popular language. It was easily implemented. It had no official standard, so implementors could add whatever features they wanted. A hardware manufacturer could offer their own, special version of BASIC as a productivity tool. IBM continued this "tradition" with BASIC in the ROM of the IBM PC and an advanced BASIC with PC-DOS.

But PC compatibles did not offer BASIC, and didn't need to. When manufacturers figured out how to build compatible computers, the factors for selecting a PC compatible were compatibility and price, not a special version of BASIC. Software would be acquired separately from the hardware.

Mainframes and minicomputers were expensive systems, sold with operating systems and software. PCs were different creatures, sold with an operating system but not software.

It's an idea that holds today.

With software being sold (or distributed, as open source) separately from the hardware, there is no need to build variants. Commercial languages (C#, Java, Swift) are managed by the company, which has an incentive for standardization of the language. Open source languages (Perl, Python, Ruby) can be had "for free", so why build a special version -- especially when that special version will need constant effort to match the changes in the "original"? Standard-based languages (C, C++) offer certainty to customers, and variants on them offer little advantage.

The only language that has variants today seems to be SQL. That makes sense, as the SQL interpreter is bundled with the database. Creating a variant is a way of distinguishing a product from the competition.

I expect that the commercial languages will continue to evolve along consistent lines. Microsoft will enhance C#, but there will be only the Microsoft implementation (or at least, the only implementation of significance). Oracle will maintain Java. Apple will maintain Swift.

The open source languages will evolve too. But Perl, Python, and Ruby will continue to see single implementations.

SQL will continue to be the outlier. It will continue to see variants, as different database vendors supply them. It will be interesting to see what happens with the various NoSQL databases.

Monday, August 8, 2016

Agile is all about code quality

Agile promises clean code. That's the purpose of the 'refactor' phase. After creating a test and modifying the code, the developer refactors the code to eliminate compromises made during the changes.

But how much refactoring is enough? One might flippantly say "as much as it takes" but that's not an answer.

For many shops, the answer seems to be "as much as the developer thinks is needed". Other shops allow refactoring until the end of the development cycle. The first is subjective and opens the development team to the risk of spending too much time on refactoring and not enough on adding features. The second is arbitrary and risks short-changing the refactoring phase and allowing messy code to remain in the system.

Agile removes risk by creating automated tests, creating them before modifying the code, and having developers run those automated tests after all changes. Developers must ensure that all tests pass; they cannot move on to other changes while tests are failing.

This process removes judgement from the developer. A developer cannot say that the code is "good enough" without the tests confirming it. The tests are the deciders of completeness.

I believe that we want the same philosophy for code quality. Instead of allowing a developer to decide when refactoring has reached "good enough", we will instead use an automated process to make that decision.

We already have code quality tools. C and C++ have had lint for decades. Other languages have tools as well. (Wikipedia has a page for static analysis tools.) Some are commercial, others open source. Most can be tailored to meet the needs of the team, placing more weight on some issues and ignoring others. My favorite at the moment is 'Rubocop', a style-checking tool for Ruby.

I expect that Agile processes will adopt a measured approach to refactoring. By using one (or several) code assessors, a team can ensure quality of the code.
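
The post's example above is Rubocop for Ruby; the same idea can be sketched for Python (assuming a linter such as flake8, with its bundled complexity checker, is installed; the directory and thresholds here are arbitrary). The point is to treat the style checker like the test suite: run it in the build and fail the build when it complains.

    import subprocess
    import sys

    # Run the style/complexity checker the same way we run the tests:
    # a non-zero exit code means the refactoring is not yet "good enough".
    result = subprocess.run(
        ["flake8", "--max-complexity", "10", "--max-line-length", "100", "src/"]
    )
    sys.exit(result.returncode)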

Such a change is not without ramifications. This change, like the use of automated tests, takes judgement away from the programmer. Code assessment tools can consider many things, some of which are style. They can examine indentation, names of variables or functions, the length or complexity of a function, or the length of a line of code. They can check the number of layers of 'if' statements or 'while' loops.

Deferring judgement to the style checkers will affect managers as well as programmers. If a developer must refactor code until it passes the style checker, then a manager cannot cut short the refactoring phase. Managers will probably not like this change -- it takes away some control. Yet it is necessary to maintain code quality. By ending refactoring before the code is at an acceptable quality, managers allow poor code to remain in the system, which will affect future development.

Agile is all about code quality.

Sunday, July 31, 2016

Agile pushes ugliness out of the system

Agile differs from Waterfall in many ways. One significant way is that Agile handles ugliness, and Waterfall doesn't.

Agile starts by defining "ugliness" as an unmet requirement. It could be a new feature or a change to an existing one. The Agile process sees the ugliness move through the system, from requirements to test to code to deployment. (Waterfall, in contrast, has the notion of requirements but not the concept of ugliness.)

Let's look at how Agile considers ugliness to be larger than just unmet requirements.

The first stage is an unmet requirement. With the Agile process, development occurs in a set of changes (sometimes called "sprints") with a small set of new requirements. Stakeholders may have a long list of unmet requirements, but a single sprint handles a small, manageable set of them. The "ugliness" is the fact that the system (as it is at the beginning of the sprint) does not perform them.

The second stage transforms the unmet requirements into tests. By creating a test -- an automated test -- the unmet requirement is documented and captured in a specific form. The "ugliness" has been captured and specified.

After capture, changes to code move the "ugliness" from a test to code. A developer changes the system to perform the necessary function, and in doing so changes the code. But the resulting code may be "ugly" -- it may duplicate other code, or it may be difficult to read.

The fourth and last stage (after unmet requirements, capture, and coding) is to remove the "ugliness" of the code. This is the "refactoring" stage, when code is improved without changing the functions it performs. After refactoring, the "ugliness" is gone.
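
A tiny sketch of those stages (in Python; the function, test, and tax rate are invented for illustration): the automated test captures the unmet requirement, a first "ugly" version makes it pass, and the refactored version keeps the test green while removing the ugliness.

    import unittest

    # Stage 2: the unmet requirement, captured as an automated test.
    class TestTotal(unittest.TestCase):
        def test_total_with_tax(self):
            self.assertAlmostEqual(total_with_tax([10.0, 20.0]), 33.0)

    # Stage 3: a first, "ugly" version that made the test pass.
    # def total_with_tax(prices):
    #     t = 0
    #     for p in prices:
    #         t = t + p
    #     t = t + t * 0.10
    #     return t

    # Stage 4: refactored -- same behavior, clearer code, test still green.
    TAX_RATE = 0.10

    def total_with_tax(prices):
        subtotal = sum(prices)
        return subtotal * (1 + TAX_RATE)

    if __name__ == "__main__":
        unittest.main()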

The ability to handle "ugliness" is the unique capability of Agile methods. Waterfall has no concept of code quality. It can measure the number of defects, the number of requirements implemented, and even the number of lines of code, but it doesn't recognize the quality of the code. To Waterfall, the quality of the code is simply its ability to deliver functionality. This means that ugly code can collect, and collect, and collect. There is nothing in Waterfall to address it.

Agile is different. Agile recognizes that code quality is important. That's the reason for the "refactor" phase. Agile transforms requirements into tests, then into ugly code, and finally into beautiful (or at least non-ugly) code. The result is requirements that are transformed into maintainable code.

Tuesday, July 26, 2016

The changing role of IT

The original focus of IT was efficiency and accuracy. Today, the expectation still includes efficiency and accuracy, yet adds increased revenue and expanded capabilities for customers.

IT has been with us for more than half a century, if you count IT as not only PCs and servers but also minicomputers, mainframes, and batch processing systems for accounting and finance.

Computers were originally large, expensive, and fussy beasts. They required a whole room to themselves. Computers cost a lot of money. Mainframes cost hundreds of thousands of dollars (if not millions). They needed a coterie of attendants: operators, programmers, service technicians, and managers.

Even the early personal computers were expensive. A PC in the early 1980s cost three to five thousand dollars. They didn't need a separate room, but they were a significant investment.

The focus was on efficiency. Computers were meant to make companies more efficient, processing transactions and generating reports faster and more accurately than humans.

Because of their cost, we wanted computers to operate as efficiently as possible. Companies who purchased mainframes would monitor CPU and disk usage to ensure that they were operating in the ninety-percent range. If usage was higher than that, they knew they needed to expand their system; if less, they had spent too much on hardware.

Today, we focus less on efficiency and more on growing the business. We view automation and big data as mechanisms for new services and ways to acquire new customers.

That's quite a shift from the "spend just enough to print accounting reports" mindset. What changed?

I can think of two underlying changes.

First, the size and cost of computers have dropped. A cell phone fits in your pocket and costs less than a thousand dollars. Laptop PCs can be acquired for similar prices; Chromebooks for significantly less. Phones, tablets, Chromebooks, and even laptops can be operated by a single person.

The drop in cost means that we can worry less about internal efficiency. Buying a mainframe computer that was too large was an expensive mistake. Buying an extra laptop is almost unnoticed. Investing in IT is like any other investment, with a potential return of new business.

Yet there is another effect.

In the early days of IT (from the 1950s to the 1980s), computers were mysterious and almost magical devices. Business managers were unfamiliar with computers. Many people weren't sure that computers would remain tame, and some feared that they would take over (the company, the country, the world). Managers didn't know how to leverage computers to their full extent. Investors were wary of the cost. Customers resisted the use of computer-generated cards that read "do not fold, spindle, or mutilate".

Today, computers are not mysterious, and certainly not magical. They are routine. They are mundane. And business managers don't fear them. Instead, managers see computers as a tool. Investors see them as equipment. Customers willingly install apps on their phones.

I'm not surprised. The business managers of the 1950s grew up with manual processes. Senior managers might have remembered an age without electricity.

Today's managers are comfortable with computers. They used them as children, playing video games and writing programs in BASIC. The thought that computers can assist the business in various tasks is a natural extension of that experience.

Our view of computers has shifted. The large, expensive, magical computation boxes have shrunk and become cheaper, and are now small, flexible, and powerful computation boxes. Simply owning (or leasing) a mainframe would provide strategic advantage through intimidation; now everyone can leverage server farms, networks, cloud computing, and real-time updates. But owning (or leasing) a server farm or a cloud network isn't enough to impress -- managers, customers, and investors look for business results.

With a new view of computers as mundane, it's no surprise that businesses look at them as a way to grow.

Thursday, July 21, 2016

Spaghetti in the Cloud

Will cloud computing eliminate spaghetti code? The question is a good one, and the answer is unclear.

First, let's understand the term "spaghetti code". According to Wikipedia, the term dates back to the 1970s, and it was probably coined as an argument for structured programming techniques. Unstructured programming was harder to read and understand, and the term drew an analogy to a tangled mess of code.

Spaghetti code was bad. It was hard to understand. It was fragile, and small changes led to unexpected failures. Structured programming was, well, structured and therefore (theoretically) spaghetti programming could not occur under the discipline of structured programming.

But theory didn't work quite right, and even with the benefits of structured programming, we found that we had code that was difficult to maintain. (In other words, spaghetti code.)

After structured programming, object-oriented programming was the solution. Object-oriented programming, with its ability to group data and functions into classes, was going to solve the problems of spaghetti code.

Like structured programming before it, object-oriented programming didn't make all code easy to read and modify.

Which brings us to cloud computing. Will cloud computing suffer from "spaghetti code"? Will we have difficult to read and difficult to maintain systems in the cloud?

The obvious answer is "yes". Companies and individuals who transfer existing (difficult to read) systems into the cloud will have ... difficult-to-understand code in the cloud.

The more subtle answer is... "yes".

The problem of difficult-to-read code lies not in the programming style (unstructured, structured, or object-oriented) but in mutable state. "State" is the combination of values for all variables and changeable entities in a program. For a program with mutable state, these variables change over time. To read and understand the code, one must understand the current state, that is, the current value of all of those variables. But to know the current value of those variables, one must understand all of the operations that led to the current state, and that list can be daunting.

Functional programming (another programming technique) doesn't allow mutable variables. Variables are fixed and unchanging. Once created, they exist and retain their value forever.

With cloud computing, programs (and variables) do not hold state. Instead, state is stored in databases, and programs run "stateless". Programs are simpler too, with a cloud system using smaller programs linked together with databases and message queues.
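
A sketch of what "stateless" means in practice (in Python; the handler and the store are invented): the program keeps no values between requests, so every call reads what it needs from external storage, computes, and writes the result back.

    # A stand-in for an external database or key-value store.
    # In a real cloud system this would be a separate service, not a dict.
    store = {"visits": 0}

    def handle_request(user_id, store):
        # The handler holds no state of its own: it reads the current
        # value, computes, writes the new value, and returns.
        visits = store["visits"] + 1
        store["visits"] = visits
        return {"user": user_id, "visit_number": visits}

    print(handle_request("alice", store))
    print(handle_request("bob", store))   # state lives in the store, not the program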

But that doesn't prevent people from moving large, complicated programs into the cloud. It doesn't prevent people from writing large, complicated programs in the cloud. Some programs in the cloud will be small and easy to read. Others will be large and hard to understand.

So, will spaghetti code exist in the cloud? Yes. But perhaps not as much as in previous technologies.

Tuesday, July 19, 2016

How programming languages change

Programming languages change. That's not news. Yet programming languages cannot change arbitrarily; the changes are constrained. We should be aware of this, and pick our technology with this in mind.

If we think of a programming language as a set of features, then programming languages can change in three ways:

Add a feature
Modify a feature
Remove a feature

The easiest change (that is, the type with the least resistance from users) is adding a feature. That's no surprise; it allows all of the old programs to continue working.

Modifying an existing feature or removing a feature is a difficult business. It means that some programs will no longer work. (If you're lucky, they won't compile, or the interpreter will reject them. If you're not lucky, the compiler or interpreter will accept them but process them differently.)

So as a programming language changes, the old features remain. Look inside a modern Fortran compiler and you will find FORMAT statements and arithmetic IF constructs, elements of Fortran's early days.

When a programming language changes enough, we change its name. We (the tech industry) modified the C language to mandate prototypes, and in doing so we called the revised language "ANSI C". When Stroustrup enhanced C to handle object-oriented concepts, he called it "C with Classes". (We've since named it "C++".)

Sometimes we change not the name but the version number. Visual Basic 4 was quite different from Visual Basic 3, and Visual Basic 5 was quite different from Visual Basic 4 (two of the few examples of non-compatible upgrades). Yet the later versions retained the flavor of Visual Basic, so keeping the name made sense.

Perl 6 is different from Perl 5, yet it still runs old code with a compatibility layer.

Fortran can add features but must remain "Fortranish", otherwise we call it "BASIC" or "FOCAL" or something else. Algol must remain Algol or we call it "C". An enhanced Pascal is called "Object Pascal" or "Delphi".

Language names bound a set of features for the language. Change the feature set beyond the boundary, and you also change the name of the language. Which means that a language can change only so much, in only certain dimensions, while remaining the same language.

When we start a project and select a programming language, we're selecting a set of features for development. We're locking ourselves into a future, one that may expand over time -- or may not -- but will remain centered over its current point. COBOL will always be COBOL, C++ will always be C++, and Ruby will always be Ruby. A COBOL program will always be a COBOL program, a C++ program will always be a C++ program, and a Ruby program will always be a Ruby program.

A lot of this is psychology. We certainly could make radical changes to a programming language (any language) and keep the name. But while we *could* do this, we don't. We make small, gradual changes. The changes to programming languages (I hesitate to use the words "improvements" or "progress") are glacial in nature.

I think that tells us something about ourselves, not the technology.

Sunday, July 10, 2016

Oracle's Java Headaches

Oracle, after its purchase of Sun Microsystems, has found that it owns a number of things including MySQL and Java. MySQL presents obvious problems, as it competes with Oracle's big, expensive database. But Java also presents problems.

Oracle has two challenges with Java. The first (and obvious) challenge is money. Specifically, how does Oracle "monetize" Java? Extracting money from programming languages is possible; Microsoft succeeded with BASIC, Visual Basic, and C#. (Although the last was really profitable through Visual Studio, not the language itself.)

Extracting money from Java remains elusive. Oracle's latest attempt to sue Google has failed. It was a "whale" strategy, designed to obtain a large amount from a single entity. With the loss in court, will Oracle look for a different strategy, perhaps one that looks for fees from more (and smaller) entities?

Revenue is one headache for Oracle. A second headache exists, one that is less obvious, and it may show up in the expense column, not the revenue column.

Oracle's history has been with SQL, a language designed in the 1970s for accessing data. As a programming language, it has been remarkably stable, with only a few changes since its inception. In contrast, programming languages like Visual Basic, C, C++, and C# have seen frequent and sometimes significant changes. Java, too, has seen changes, and it has the "JCP", the Java Community Process, which allows just about anyone to recommend changes to the Java language.

This second challenge is more subtle, and possibly larger, for Oracle. After decades of a stable, unchanging language, is it capable of managing a fast-moving programming language? After decades of maintaining the Oracle database (which I'm sure had lots of internal changes and lots of changes requested by Oracle's relatively few high-paying customers) is Oracle ready to maintain a product used by "the rest of us"?

This is the bigger issue for Oracle. Maintaining the Java code base, adapting it to new platforms, adding features to the language, and putting up with all of the pesky requests from pipsqueaks (highly opinionated pipsqueaks, some of us are) is going to be expensive.

So Oracle is in a squeeze. On one side, Java has no significant revenue. On the other, it has expenses (possibly higher than the expenses for the Oracle database). How will Oracle navigate these straits?

I see a few possible ways forward:

Find a funding mechanism: Perhaps licenses, perhaps advertising. Perhaps a version of Java for the Internet of Things.

Tie Java to the Oracle database: Make Oracle the easiest database to use in Java.

Keep Java but stop development: A fast way to reduce costs, but also a fast way to anger users. (On the other hand, Oracle seems to care little about user opinion.)

Spin off Java: If Java doesn't fit into Oracle's strategy, why bother to keep it?

The last is an interesting idea. IBM might be interested in Java. Google almost certainly would. Microsoft probably not so much -- except perhaps to prevent Google from acquiring it.

Thursday, July 7, 2016

DOS, Windows, sharing, and mobile

Windows, when it arrived on the scene, changed the world of computing. Prior to Windows, DOS ruled the computing world, and it was a limited world. Windows expanded that world with new capabilities. In some ways, mobile operating systems (Android and iOS) move us back towards the ways of DOS.

IBM PCs (or compatibles, or near-compatibles) running DOS were simple devices. They could run one program at a time; running multiple programs at once was not possible. (Technically, it was possible with a "terminate and stay resident" function call, but such programs were few. For this essay, I'll stick to the "regular" programs.)

Windows brought us an expanded view of computing. Instead of running a single program at a time, Windows allowed for multiple. Windows provided a common way to present text and graphics on the screen, to print to printers, and to share data. Windows was a large step upward from the world of DOS.

Mobile operating systems -- Android and iOS -- provide a different experience. Instead of multiple applications and multiple windows, these operating systems present one application (or "app") at a time. Multiple apps may run, but only one has the screen. Thus, you can listen to music, check e-mail, and get a text message all at the same time. Mobile apps keep the multitasking aspect, but reduce the interaction to one app at a time.

Reducing the number of interactive apps to one is a reduction in capabilities, although it is a simpler experience, and one that makes sense for a phone. (I think it also makes sense for a tablet.)

What I don't see in the mobile operating systems (and what I don't see in Windows, either) is improvements in the ability to share data across applications. DOS had files and pipes (concepts lifted from Unix). Windows added the clipboard, and then Dynamic Data Exchange (DDE) and later, drag-and-drop. The clipboard was popular and is still used today. DDE never got traction, and drag-and-drop is limited to files.

Sharing data across applications is difficult. Each application has its own ideas about data. Word processors hold characters, words, sentences, paragraphs, and documents. Spreadsheets hold numeric values, formulas, cells, rows, columns, and sheets. Databases hold rows and columns -- or "documents" (different from word processor documents) for NoSQL databases. The transfer of data from one application to another is not obvious, and therefore the programming of such transfers is not obvious.

But Windows has had the clipboard for thirty years, and DDE and drag-and-drop for almost as long. Have we had no ideas in that time?

Perhaps our current mobile operating systems are the DOSes of today, waiting for a new, bold operating system to provide new capabilities.

Tuesday, June 21, 2016

Compilers and interpreters

Programming languages (with a few exceptions) fall into one of two categories: compiled or interpreted.

Compilers are the natural descendants of assemblers. Assemblers convert text representations of processor-specific operation codes into machine-readable form; compilers convert high-level programs into machine-readable form. Interpreters, on the other hand, read high-level programs and process them, without producing an "executable".
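To make the distinction concrete, here is a minimal sketch in Python of an interpreter for a made-up toy language; the language, its three commands, and the interpret() function are all invented for illustration. The interpreter simply walks the text and acts on each statement, holding its state in a dictionary; a compiler for the same language would instead translate the text into machine code (or bytecode) and write out an executable to run later.

# A toy language with three commands: "set NAME VALUE", "add NAME VALUE",
# and "print NAME". The interpreter executes statements as it reads them,
# keeping its state in a dictionary; no executable file is produced.

def interpret(source):
    variables = {}
    for line in source.splitlines():
        tokens = line.split()
        if not tokens:
            continue
        command = tokens[0]
        if command == "set":
            variables[tokens[1]] = int(tokens[2])
        elif command == "add":
            variables[tokens[1]] += int(tokens[2])
        elif command == "print":
            print(variables[tokens[1]])
        else:
            raise ValueError("unknown command: " + command)

interpret("set x 5\nadd x 3\nprint x")   # prints 8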

Both forms have advantages. Compiled programs execute faster, and the source code can remain hidden from users, who need only the executable form. Interpreted programs may run more slowly, but the process of writing (and debugging) tends to be faster, and interpreted languages offer flexibility not available in compiled languages.

Programming languages are sometimes created by individuals working without specific sponsorship and direction from a corporation (I call them "enthusiasts"). Other languages are created by corporations, in large, well-planned and well-justified projects.

But is one technique more popular than another? Let's look at the list of popular (according to tiobe.com) languages. Here are the top languages, who created them, whether they are compiled or interpreted, and when they were created:

Java: corporation (Sun); compiled; 1990s
C: enthusiasts (Kernighan and Ritchie); compiled; 1970s
C++: enthusiast (Stroustrup); compiled, derived from C; 1980s
Python: enthusiast (van Rossum); interpreted; 1990s
C#: corporation (Microsoft); compiled; 2000s
PHP: enthusiast (Lerdorf); interpreted; 1990s
JavaScript: enthusiast (Eich); interpreted; 1990s
Perl: enthusiast (Larry Wall); interpreted; 1980s
VB.NET: corporation (Microsoft); compiled; 2000s
Ruby: enthusiast (Matsumoto); interpreted; 1990s
Delphi: corporation (Borland); compiled, derived from Pascal; 1990s
Swift: corporation (Apple); compiled; 2010s
Objective-C: enthusiasts (Cox and Love); compiled, derived from C; 1980s
R: enthusiasts (Ihaka and Gentleman); interpreted, derived from S; 1990s
Matlab: enthusiast (Moler); interpreted; 1970s
SQL: enthusiast (Codd); interpreted; 1970s
D: corporation (Digital Mars); compiled; 2000s
COBOL: government consortium; compiled; 1950s

From this list, a few things are obvious. First, we've invented both compiled and interpreted languages. Second, we've invented both throughout the age of computers, and continue to do so. It's not that a particular type of language was a fad or has fallen out of favor.

Look at the relationship between the type of creator and the language. Enthusiasts create interpreted languages, and corporations create compiled languages. The list above would match this rule perfectly, except for C. (C++ and Objective-C, derived from C, would naturally be compiled.)

But this is a short list, and small sample sizes may be deceptive. Let's look at some more:

APL: enthusiast (Iverson); interpreted; 1950s
BASIC: enthusiasts (Kemeny and Kurtz); interpreted; 1960s
S: enthusiasts (Becker, Wilks, Chambers); interpreted; 1970s
Fortran: corporation (IBM); compiled, derived from assembly language; 1950s
Pascal: enthusiast (Wirth); compiled; 1960s
Eiffel: enthusiast (Meyer); compiled; 1990s
Forth: enthusiast (Moore); interpreted; 1960s
dBase: enthusiast (Ratliff); interpreted; 1970s
Ada: government agency; compiled; 1970s
PL/I: corporation (IBM); compiled; 1960s
Prolog: enthusiasts (Colmerauer, et al.); interpreted; 1970s
AWK: enthusiasts (Aho, Weinberger, and Kernighan); interpreted; 1970s
DIBOL: corporation (DEC); compiled; 1970s
FOCAL: enthusiast (Merrill); interpreted; 1960s

This expanded list shows that enthusiasts *tend* to create interpreted languages, but not always. Corporations, though, create compiled languages. The only interpreted language created by a corporation might be SQL, which came out of IBM -- but I've assigned it to E.F. Codd as an enthusiast.

I'm not sure why enthusiasts would create interpreted languages. Perhaps it's more fun that way. Perhaps it's easier. Interpreted languages let you stop a running program, examine the innards of your interpreter, adjust things, and continue running -- all useful when debugging the interpreter.

Astute readers will note that my assignment of "enthusiast" or "corporation" to languages may be a bit loose. The designation is sometimes difficult. Kernighan and Ritchie, when creating C, were working for AT&T's Bell Labs. Are they corporation employees or enthusiasts? E.F. Codd worked for IBM when publishing his thoughts on relational databases. Is he an employee or an enthusiast? Wayne Ratliff was working for NASA's JPL when he wrote the first version of dBase and was part of Ashton-Tate when he wrote dBase II. Does that make him an employee? In all of these cases, I feel the individuals involved were doing what they did more as enthusiasts than employees.

On the flip side, I've placed Java and C# on the "corporation" side. Neither of these languages has individuals strongly associated with its origins. Java was presented to us by Sun; C# was presented by Microsoft. Did the creation of these languages involve passionate individuals? Certainly. Were those individuals working on these projects independent of the corporation's needs? I see no evidence of that. (Yet I can easily see Kernighan and Ritchie working late at night to add features to their C compiler.)

I don't know if the assignment of "corporation" or "enthusiast" to a language's origin is important -- but I don't know that it isn't. It may be that enthusiasts will continue to create interpreted languages, and corporations will continue to create compiled languages.

I do think it significant that Java and C# live in between, Java with its JVM and C# with its CLR. Perl and Python have moved in that direction, too. They gain some benefits of interpreted languages and retain some benefits of compiled languages. I expect we will see more languages that use these techniques.
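CPython is one convenient place to see this middle ground: it compiles source code to bytecode and then interprets that bytecode on a virtual machine, much as the JVM and CLR do for Java and C#. The standard dis module will show the bytecode for a small function; the function below is just an arbitrary example for illustration.

import dis

def count_to_ten():
    for i in range(1, 11):
        print(i)

# CPython has already compiled the function body to bytecode for its virtual
# machine; dis.dis() disassembles that bytecode and prints it.
dis.dis(count_to_ten)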

One more thing. Two other recently developed languages:

Go: corporation (Google); compiled; 2010s
Checked C: corporation (Microsoft); compiled, derived from C; 2010s

So maybe everyone isn't jumping on the "semi-interpreted" wagon.

Thursday, June 2, 2016

The big improvement in programming forty years ago

Programming has been around since the beginning of computers, and seen lots of improvements: symbolic assembly, high-level compilers (COBOL and FORTRAN), structured programming (Pascal), object-oriented programming (Smalltalk, C++), virtual machines (Java, C#), scripting languages (Perl, Python, Ruby)... the list goes on.

Yet a significant improvement in programming occurred forty years ago. It made programming simple -- so simple that a non-programmer could do it. And it was ignored by the programming community.

That improvement was... the electronic spreadsheet.

Programming, at its core, is the organization of data and the processing of that data with a sequence of instructions. The niceties of data structures, objects, and just-in-time compilation are just that: niceties. They are there for the convenience of the programmer.

So how do spreadsheets come into it? Spreadsheets, at their core, organize data and process that data with a series of instructions. (Sound familiar?)

Spreadsheets -- the basic grid of numbers and formulas, without the charts, pivot tables, and VBA code -- are programs. Any spreadsheet can be converted into just about any language, from Fortran or BASIC to Java or Python. (The reverse is not true; only a few simple programs in BASIC or Python can be converted into spreadsheets.)
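As a small, hypothetical illustration: a three-cell sheet in which A1 holds 100, A2 holds 0.05, and A3 holds the formula =A1*A2 translates directly into a few lines of Python (the cell names and values here are invented for this example).

# A1: 100       (a quantity)
# A2: 0.05      (a rate)
# A3: =A1*A2    (the formula)
a1 = 100
a2 = 0.05
a3 = a1 * a2
print(a3)       # 5.0 -- the value the spreadsheet would show in cell A3

The reverse translation breaks down quickly, once a program needs loops, user input, or data structures that have no obvious place on a grid.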

The improvement that spreadsheets made to programming was immediacy. The "programmer" could see the results of a change right after making a change. That immediate feedback was not available in compiled languages, which require the programmer to save the file, compile the program, and then run it. (IDEs like Turbo Pascal and Visual Studio make those steps easy, but there is still a delay.) Even interpreted languages like BASIC or Ruby require the steps of saving and running.

This improvement in programming, the immediate results of a change in the program, went unnoticed by the programming community. Visicalc was created in 1979, almost forty years ago. At the time, popular programming languages were BASIC, COBOL, Fortran, and Pascal.

Instead of building on the innovation of the spreadsheet, programmers have gone in other directions. Programmers focused on maintainability (structured programming), larger programs (object-oriented programming), version control, automated testing, and response to changing requirements (agile methods).

There has been no (or very little) effort for the immediate feedback that we get with spreadsheets.

For forty years.

At some point, we are going to invent a new programming language, one that provides immediate feedback. (Perhaps a language, editor, and run-time environment, which is what a spreadsheet is.) The advantages are great, as anyone who works with a spreadsheet can attest.
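One way to picture such an environment: keep the formulas as data and re-evaluate the whole "sheet" after every change, so the programmer sees updated results immediately. A minimal sketch in Python, with invented cell names and formulas, and no attempt at proper dependency tracking:

# Cells hold either literal values or formulas written as Python expressions
# over other cells. Re-evaluating everything after each change gives the
# immediate feedback that spreadsheets provide.
cells = {"a1": "100", "a2": "0.05", "a3": "a1 * a2"}

def recalculate(cells):
    values = {}
    # Naive fixed-point evaluation: keep passing over the cells until every
    # formula resolves (a real system would build a dependency graph).
    while len(values) < len(cells):
        progressed = False
        for name, formula in cells.items():
            if name in values:
                continue
            try:
                values[name] = eval(formula, {}, values)
                progressed = True
            except NameError:
                pass    # depends on a cell that has not been computed yet
        if not progressed:
            raise ValueError("circular or unresolvable reference")
    return values

print(recalculate(cells))     # {'a1': 100, 'a2': 0.05, 'a3': 5.0}
cells["a2"] = "0.07"          # change one cell ...
print(recalculate(cells))     # ... and the dependent result updates at once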

Sunday, May 22, 2016

Small check-ins saved me

With all of the new technology, from cloud computing to tablets to big data, we can forget the old techniques that help us.

This week, I was helped by one of those simple techniques. The technique that helped was frequent, small check-ins to version control systems. I was using Microsoft's TFS, but this technique works with any system: TFS, Subversion, git, CVS, ... even SourceSafe!

Small, frequent changes are easier to review and easier to revert than large changes. Any version control system accepts small changes; the decision to make large or small changes is up to the developer.

After a number of changes, the team with whom I work discovered a defect, one that had escaped our tests. We knew that it was caused by a recent change -- we tested releases and found that it occurred only in the most recent release. That information limited the introduction of the defect to the most recent forty check-ins.

Forty check-ins may seem like a long list, but we quickly identified the specific check-in with a binary search: get the source from the middle revision; if the error occurs, search the earlier half; if not, search the later half -- and repeat from that half's middle.
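A sketch of that search in Python, assuming a hypothetical test_revision() helper that rebuilds a given revision and returns True when the defect appears. The revision numbers below are invented; git users get the same idea built in as "git bisect".

def find_bad_checkin(revisions, test_revision):
    # revisions is ordered oldest-to-newest; revisions[0] is known good
    # and revisions[-1] is known bad.
    low, high = 0, len(revisions) - 1
    while high - low > 1:
        middle = (low + high) // 2
        if test_revision(revisions[middle]):   # defect already present here?
            high = middle                      # then search the earlier half
        else:
            low = middle                       # otherwise search the later half
    return revisions[high]                     # first revision with the defect

# Hypothetical usage: forty-odd candidate check-ins, with a fake test standing
# in for "get the source, build, and run the failing scenario".
checkins = list(range(1000, 1041))
print(find_bad_checkin(checkins, lambda rev: rev >= 1023))   # prints 1023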

The real benefit occurred when we found the specific check-in. Since all check-ins were small, this check-in was too. (It was a change of five different lines.) It was easy to review the five individual lines and find the error.

Once we found the error, it was easy to make the correction to the latest version of the code, run our tests (which now included an additional test for the specific problem we found), verify that the fix was correct, and continue our development.

A large check-in would have required much more examination, and more time.

Small check-ins cost little and provide easy verification. Why not use them?

Sunday, May 15, 2016

Agile values clean code; waterfall may but doesn't have to

Agile and Waterfall are different in a number of ways.

Agile promises that your code is always ready to ship. Waterfall promises that the code will be ready on a specific date in the future.

Agile promises that your system passes the tests (at least the tests for code that has been implemented). Waterfall promises that every requested feature will be implemented.

There is another difference between Agile and Waterfall. Agile values clean code; Waterfall values code that performs as intended but has no notion of code quality. The Agile cycle includes a step for refactoring, a time for developers to modify the code and improve its design. The Waterfall method has no corresponding step or phase.

Which is not to say that Waterfall projects always result in poorly designed code. It is possible to build well-designed code with Waterfall. Agile explicitly recognizes the value of clean code and allocates time for correcting design errors. Waterfall, in contrast, has its multiple phases (analysis, design, coding, testing, and deployment) with the assumption that working code is clean code -- or code of acceptable quality.

I have seen (and participated in) a number of Waterfall projects, and the prevailing attitude is that code improvements can always be made later, "as time allows". The problem is that time never allows.

Many project managers have the mindset that developers should be working on features with "business value". Typically these changes fall into one of three categories: features to increase revenue, features to reduce costs, and defect corrections. That mindset also considers any effort outside of those areas to add no value to the business, and therefore to be unworthy of attention.

Improving code quality is an investment in the future. It is positioning the code to handle changes -- in requirements or staff or technology -- and reducing the effort and cost of those changes. In this light, Agile is looking to the future, and Waterfall is looking to the past (or perhaps only the current release).

Thursday, May 5, 2016

Where have all the operating systems gone?

We used to have lots of operating systems. Every hardware manufacturer built their own operating systems. Large manufacturers like IBM and DEC had multiple operating systems, introducing new ones with new hardware.

(It's been said that DEC became a computer company by accident. They really wanted to write operating systems, but they needed processors to run them, and compilers and editors to give those systems something to do, so they ended up building everything. It's a reasonable theory, given the number of operating systems they produced.)

In the 1970s, CP/M was an attempt at an operating system for different hardware platforms. It wasn't the first; Unix had been designed for multiple platforms before it. It wasn't the only one; the UCSD p-System used a virtual processor, much like the virtual machine in the Java JVM, and ran on various hardware.

Today we also see lots of operating systems. Commonly used ones include Windows, Linux, Mac OS, iOS, Android, Chrome OS, and even watchOS. But are they really different?

Android and Chrome OS are really variants of Linux. Linux itself is a clone of Unix. Mac OS is derived from NeXTSTEP, which was built on Mach and BSD Unix (the Berkeley Software Distribution). iOS and watchOS are, according to Wikipedia, "Unix-like", and I assume they are slimmed-down versions of the same BSD-derived core with added components.

Which means that our list of commonly-used operating systems becomes:

  • Windows
  • Unix

That's a rather small list. (I'm excluding the operating systems used for special purposes, such as embedded systems in automobiles or machinery or network routers.)

I'm not sure that this reduction in operating systems, this approach to a monoculture, is a good thing. Nor am I convinced that it is a bad thing. After all, a common operating system (or two commonly-used operating systems) means that lots of people know how they work. It means that software written for one variant can be easily ported to another variant.

I do feel some sadness at the loss of the variety of earlier years. The early days of microcomputers saw wide variations of operating systems, a kind of Cambrian explosion of ideas and implementations. Different vendors offered different ideas, in hardware and software. The industry had a different feel from today's world of uniform PCs and standard Windows installations. (The variances between versions of Windows, or even between the distros of Linux, are much smaller than the differences between a Data General minicomputer and a DEC minicomputer.)

Settling on a single operating system is a way of settling on a solution. We have a problem, and *this* operating system, *this* solution, is how we address it. We've settled on other standards: character sets, languages (C# and Java are not that different), storage devices, and keyboards. Once we pick a solution and make it a standard, we tend to not think about it. (Is anyone thinking of new keyboard layouts? New character sets?) Operating systems seem to be settling.


Sunday, May 1, 2016

Waterfall and agile depend on customer relations

The debate between Agile and Waterfall methods for project management seems to have forgotten about customers, and more specifically the commitments made to customers.

Waterfall and Agile methods differ in that Waterfall promises a specific set of functionality on a specific date, and Agile promises that the product is always ready to ship (but perhaps not with all the features you want). These two methods require different techniques for project management, but also imply different relationships with customers.

It's easy to forget that the customers are the people who actually pay for the product (or service). Too often, we focus on the "internal customer" and think of due dates and service level agreements. But let's think about the traditional customers.

If your business promises specific functionality to customers on specific dates, then you probably want the Waterfall project management techniques. Agile methods are a poor fit. They may work for the development and testing of the product, but they don't mesh with the schedules developed by people making promises to customers.

Lots of companies promise specific deliverables at a future date. Some companies have to deliver releases in time for other, external events, such as Intuit's TurboTax in time for tax season. Microsoft has announced software in advance, sometimes to deter competition. (Microsoft is not alone in this activity.)

Not every company works this way. Google, for example, upgrades its search page and its apps (Google Docs, Google Sheets) when it can. It has never announced -- in advance -- new features for its search page. (It does announce changes, sometimes a short period before implementing them, but not too far in advance.) Amazon.com rolls out changes to its sales pages and web service platform as it can. There is no "summer release" and no analog to a new car model year. And they are successful.

If your company promises new versions (or new products) on specific dates, you may want to manage your projects with the Waterfall method. The Agile method will fit into your schedule poorly, as it doesn't promise what Waterfall promises.

You may also want to review your company's philosophy towards releases. Do you need to release software on specific dates? Must you follow a rigid release schedule? For all of your releases?

Sunday, April 17, 2016

After the spreadsheet

The spreadsheet is a wonderful thing. It performs several functions: it holds and organizes data, it specifies calculations, and it presents results. All of these functions, wrapped into a single package, provide convenience to users. Yet that convenience of the single package will be the spreadsheet's undoing.

For the individual, the spreadsheet is a useful tool. But for the enterprise, the spreadsheet creates perhaps more problems than it solves. Since a spreadsheet file contains the data, the formulas, and the presentation, such files are often replicated to share with co-workers (usually via e-mail) and duplicated to process different sets of data (the spring sales figures and then the summer sales figures, for example).

The replication of spreadsheets via e-mail can be mitigated by the use of shared file locations ("network drives") and by online versions of spreadsheets which allow for multiple users. But the bigger problem is the duplication of spreadsheets with minor changes.

The duplication of spreadsheets means duplication not only of the data (which often changes) but also of the formulas and the presentation (which often do not). Since a spreadsheet contains all three components, a new version of the data requires a new copy of all the components. There is no way to share only one component -- no way to share formulas against new data, or different presentations against the same data and formulas. This means that, over time, an enterprise of any size accumulates multiple spreadsheets with different data and duplicate formulas and macros -- at least you hope that they are duplicate copies.

The design of spreadsheets -- containing the data, formulas, and presentation in one package -- is a holdover from the days of Visicalc and Lotus 1-2-3. Those programs were developed for the Apple II and the IBM PC. With their ability to run only one program at a time, putting everything into one program made sense -- using one program for data, another for calculation, and a third for presentation was awkward and time-consuming. But that applies to the old single-tasking operating systems. Windows and Mac OS and Linux allow for multiple programs to run at the same time, and windowing systems allow for multiple programs to present information to the user at the same time.

If spreadsheets were being invented now in the age of web services and cloud systems and multi-window displays, their design would probably be quite different. Instead of a single program that performed all functions and a single file that contained data, formulas and presentation, we might have something very different. We might create a system of web services, providing data with some and performing calculations with others. The results could be displayed by yet other functions in other windows, possibly published for co-workers to view.

Such a multi-component system would follow the tenets of Unix, which recommends small, independent programs that read data, perform some processing, and provide data. The data and computations could be available via web services. A central service could "fan out" requests: collect data from one or more data services, send that data through one or more computing services, and then provide the results to a presentation mechanism such as a graph in a window or even a printed report.
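A rough sketch in Python of that fan-out, with entirely hypothetical service URLs and a plain call_service() helper standing in for whatever web framework a real system would use:

import json
import urllib.request

# Hypothetical endpoints -- invented for illustration only.
DATA_SERVICE    = "https://example.com/data/summer-sales"
COMPUTE_SERVICE = "https://example.com/compute/commissions"
CHART_SERVICE   = "https://example.com/present/bar-chart"

def call_service(url, payload=None):
    # POST the payload (if any) as JSON and return the decoded JSON reply.
    body = None if payload is None else json.dumps(payload).encode("utf-8")
    request = urllib.request.Request(url, data=body,
                                     headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())

def report():
    sales = call_service(DATA_SERVICE)              # 1. collect the data
    results = call_service(COMPUTE_SERVICE, sales)  # 2. apply the "formulas"
    return call_service(CHART_SERVICE, results)     # 3. present the results

The same data service could feed several different computation services, and the same computations could drive several presentations -- exactly the sharing that the single-file spreadsheet prevents.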

By separating the formulas and macros from the data, we can avoid needless duplication of both. (While most cases see the duplication of formulas to handle different data sets, sometimes different formulas can be applied to the same data.)

Providing data via web services is easy -- web services do that today. There are even web services to convert data into graphs. What about calculations? What language can be used to perform computations on data sets?

The traditional languages of C# and Java are not the best fit here; we're replacing spreadsheets with something equally usable by non-programmers (or at least similarly usable). The best candidate may be R, the statistics-oriented language. R is established, cross-platform, and capable. It's also a high-level language, close to the formulas of spreadsheets (and more powerful than Microsoft's VBA, which is used for macros in Excel).

Replacing spreadsheets with a trio of data management, computation, and presentation tools will not be easy. The advantages of the spreadsheet include convenience and familiarity. The advantages of separate components are better integration in cloud systems, leveraging of web services, and easier audits of formulas. It may not happen soon, but I think it will happen eventually.

Thursday, April 14, 2016

Technology winners and losers

Technology has been remarkably stable for the past three decades. The PC dominated the hardware market and Microsoft dominated the software market.

People didn't have to use PCs and Microsoft Windows. They could choose alternative solutions, such as Apple Macintosh computers with Mac OS, or regular PCs with Linux. But the people using out-of-mainstream technology *chose* to use it. They knew what they were getting into. They knew that they would be a minority -- that when they entered a computer shop, most of the offerings would be for the other, regular Windows PCs and not for their configuration.

The market was not always this way. In the years before the IBM PC, different manufacturers provided different systems: the Apple II, the TRS-80, DEC's Pro-325 and Pro-350, the Amiga, ... there were many. All of those systems were swept aside by the IBM PC, and all of the enthusiasts for those systems knew the pain of loss. They had lost their chosen system to the one designated by the market as the standard.

In a recent conversation with a Windows enthusiast, I realized that he was feeling a similar pain in his situation. He was dejected at the dearth of support for Windows phones -- he owned such a phone, and felt left out of the mobile revolution. Windows phones are out-of-mainstream, and many apps do not run on them.

I imagine that many folks in the IT world are feeling the pain of loss. Some because they have Windows phones. Others because they have been loyal Microsoft users for decades, perhaps their entire career, and now Windows is no longer the center of the software world.

This is their first exposure to loss.

Those of us grizzled veterans who remember CP/M or Amiga DOS have had our losses; we know how to cope. The folks who used WordPerfect or Lotus 1-2-3 had to switch to Microsoft products; they know loss too. But no technology has been forced from the market for quite some time. Perhaps the last was IBM's OS/2, back in the 1990s. (Or perhaps Visual Basic, when it was modified into VB.NET.)

But IT consists of more than grizzled veterans.

For someone entering the IT world after the IBM PC (and especially after the rise of Windows), it would be possible -- and even easy -- to enjoy a career in dominant technologies: stay within the Microsoft set of technology and everything was mainstream. Microsoft technology was supported and accepted. Learning Microsoft technologies such as SQL Server and SharePoint meant that you were on the "winning team".

A lot of folks in technology have never known this kind of technology loss. When your entire career has been with successful, mainstream technology, the change is unsettling.

Microsoft Windows Phone is a technology on the edge. It exists, but it is not mainstream. It is a small, oddball system (in the view of the world). It is not the "winning team"; iOS and Android are the popular, mainstream technologies for phones.

As Microsoft expands beyond Windows with Azure and apps for iOS and Android, it competes with more companies and more technologies. Azure competes with Amazon.com's AWS and Google's Compute Engine. Office Online and Office 365 compete with Google Docs. OneDrive competes with Dropbox and Box. Microsoft's technologies are not the de facto standard, not always the most popular, and sometimes the oddball.

For the folks confronting a change from the worldview that Microsoft technology is always the most popular and most accepted to a worldview in which different technologies compete and Microsoft sometimes loses, an example to follow would be ... Microsoft.

Microsoft, after years of dominance with the Windows platform and applications, has widened its view. It is not "the Windows company" but a technology company that supplies Windows. More than that, it is a technology company that supplies Windows, Azure, Linux, and virtual machines. It is a company that supplies Office applications on Windows, iOS, and Android. It is a technology company that supplies SQL Server on Windows and soon Linux.

Microsoft adapts. It changes to meet the needs of the market.

That's a pretty good example.

Sunday, April 10, 2016

Complexity of programming languages

Are programming languages becoming more or less complex?

It is a simple question, and like many simple questions, the answer is not so simple.

Let's look at a simple program in some languages. The simple program will print the numbers from 1 to 10. Here are programs in FORTRAN, BASIC, Pascal, C, C++, C#, and Python:

FORTRAN 66 (1966)

      DO 100 I = 1, 10
100   WRITE (6, 200) I
200   FORMAT (I5)
      STOP
      END

BASIC (1965)

10 FOR I = 1 TO 10
20 PRINT I
30 NEXT I
99 END

Pascal (1970)

program hello;
var
  i: integer;
begin
  for i := 1 to 10 do
    WriteLn(i);
end.

C (1978)

#include "stdlib.h"

int main(void)
{
    int i;

    for (i = 1; i <= 10; i++)

      printf("%d", i);
}

C++ (1983)

#include <iostream>

int main()
{
    for (unsigned int i = 1; i <= 10; i++)
        std::cout << i << std::endl;

    return 0;
}

C# (2000)

using System;

class Program

{
    public static void Main(string[] args)
    {
        for (int i = 1; i <= 10; i++)
            Console.WriteLine(i);
    }
}

Python (2000)

for i in range(1, 11):
    print(i)

From this small sampling, a few things are apparent.

First, the programs vary in length. Python is the shortest with only two lines. It's also a recent language, so are languages becoming more terse over time? Not really, as FORTRAN and BASIC are the next shortest languages (with 5 and 4 lines, respectively) and C#, a contemporary of Python, requires 10 lines.

Second, the formatting of program statements has changed. FORTRAN and BASIC, the earliest languages in this set, have strong notions about columns and lines. FORTRAN limits line length to 72 characters, reserves the first five columns for statement labels, and uses a sixth column for continuation characters (which allow statements to exceed the 72-character limit). BASIC relaxed the column restrictions but added the requirement for line numbers on each line. Pascal, C, C++, and C# care nothing about columns and lines, looking at tokens in the code to separate statements. Python relies on indentation for block definitions.

Some languages (BASIC, C#) are capable of printing things by simply mentioning them. Other languages need specifications. FORTRAN has FORMAT statements to specify the exact format of the output. C has a printf() function that needs similar formatting information. I find the mechanisms of BASIC and C# easier to use than the mechanisms of FORTRAN and C.

Let's consider a somewhat more complex program, one that lists a set of prime numbers. We'll look at BASIC and Lua, which span the computing age.

Lua

local N = 100
local M = 10
function PRIME()  -- PROCEDURE DECLARATION;
  local X, SQUARE, I, K, LIM, PRIM -- DECLARATION OF VARIABLES;
  local P, V = {}, {}
  P[1] = 2 -- ASSIGNMENT TO FIRST ELEMENT OF p;
  print(2) -- OUTPUT A LINE CONTAINING THE NUMBER 2;
  X = 1
  LIM = 1
  SQUARE = 4
  for I = 2, N do -- LOOP. I TAKES ON 2, 3, ... N;
    repeat -- STOPS WHEN "UNTIL" CONDITION IS TRUE;
      X = X + 2
      if SQUARE <= X then
        V[LIM] = SQUARE
        LIM = LIM + 1
        SQUARE = P[LIM] * P[LIM]
      end
      local K = 2
      local PRIM = true
      while PRIM and K < LIM do
        if V[K] < X then
          V[K] = V[K] + P[K]
        end
        PRIM = X ~= V[K]
        K = K + 1
      end
    until PRIM -- THIS LINE CLOSES THE REPEAT
    P[I] = X
    print(X)
  end
end
PRIME()


BASIC

100 LET N = 100
110 LET M = 10
200 DIM P(100), V(100)
300 LET P(1) = 2
310 PRINT P(1)
320 LET X = 1
330 LET L = 1
340 LET S = 4
350 FOR I = 2 TO N
360  REM    repeat -- STOPS WHEN "UNTIL" CONDITION IS TRUE;
370   LET X = X + 2
380   IF S > X THEN 420
390    LET V(L) = S
400    LET L = L + 1
410    LET S = P(L)^2
420   REM
430   LET K = 2
440   LET P = 1
450   REM while PRIM and K < LIM do
455   IF P <> 1 THEN 520
460   IF K >= L THEN 520
470    IF V(K) >= X THEN 490
480     LET V(K) = V(K) + P(K)
490    REM
500    LET P = 0
501    IF X = V(K) THEN 510
502    LET P = 1
510    LET K = K + 1
515   GOTO 450
520  IF P <> 1 THEN 360
530  LET P(I) = X
540  PRINT X
550 NEXT I
999 END

Both programs work. They produce identical output.  The version in Lua may be a bit easier to read, given that variable names can be more than a single letter.

They are about the same size and complexity. Two versions of a program, one from today and one from the early years of computing, yet they have similar complexity.

Programming languages encode operations into pseudo-English instructions. If the measure of a programming language's capability is its capacity to represent operations in a minimal number of steps, then this example shows that programming languages have not improved much over the past five decades.

Caution is advised. These examples (printing a short sequence of numbers and calculating prime numbers) may be poor representatives of typical programs. Perhaps we should withhold judgement until we consider more (and larger) programs. After all, very few people in 2016 use BASIC; there must be a reason they have selected modern languages.

Perhaps it is better to keep asking the question and examining our criteria for programming languages.