Sunday, August 21, 2016

Scale

Software development requires an awareness of scale: knowing the size of the software, and selecting the right tools and skills to manage it.

Scale is present in just about every aspect of human activity. We humans have different forms of transportation. We can walk, ride bicycles, and drive automobiles. (We can also run, swim, ride buses, and fly hang gliders, but I will stick to the three most common forms.)

Walking is a simple activity. It requires little in the way of planning, little in the way of equipment, and little in the way of skill.

Riding a bicycle is somewhat higher in complexity. It requires equipment (the bicycle) and some skill (knowing how to ride the bicycle). It requires planning: when we arrive at our destination, what do we do with the bicycle? We may need a lock, to secure the bicycle. We may need a safety helmet. The clothes we wear must be tight-fitting, or at least designed to not interfere with the bicycle mechanism. At this level of complexity we must be aware of the constraints on our equipment.

Driving an automobile is more complex than riding a bicycle. It requires more equipment (the automobile) and more skills (driving). It requires more planning. In addition to the car we will need gasoline. And insurance. And registration. At this level of complexity we must be aware of the constraints on our equipment and also the requirements from external entities.

Translating this to software, we can see that some programs are simple and some are complex.

Tiny programs (perhaps a one-liner in Perl) are simple enough that we can write them, use them, and then discard them, with no other thoughts for maintenance or upkeep. Let's consider them the equivalent of walking.

Small programs (perhaps a page-long script in Ruby) require more thought to prepare (and test) and may need comments to describe some of the inner workings. They can be the equivalent of riding a bicycle.

Large programs (let's jump to sophisticated packages with hundreds of thousands of lines of code) require a language that helps us organize our code; comprehensive sets of unit, component, and system tests; and documentation to record the rationale behind design decisions. These are the analogue of driving an automobile.

But here is where software differs from transportation: software changes. The three modes of transportation (walking, bicycle, automobile) are static and distinct. Walking is walking. Driving is driving. But software is dynamic -- frequently, over time, it grows. Many large programs start out as small programs.

A small program can grow into a larger program. When it does, the competent developer changes the tools and practices used to maintain the code. Which means that a competent programmer must be aware of the scale of the project, and the changes in that scale. As the code grows, a programmer must change his (or her) tools and practice.

There's more, of course.

The growth effect of software extends to the management of project teams. A project may start with a small number of people. Over time, the project grows, and the number of people on the team increases.

The techniques and practices of a small team don't work for larger teams. Small teams can operate informally, with everyone in one room and everyone talking to everyone. Larger teams are usually divided into sub-teams, and the coordination of effort is harder. Informal methods work poorly, and the practices must be more structured, more disciplined.

Enterprise-class projects are even more complex. They require more discipline and more structure than the merely large projects. The structure and discipline are often expressed in bureaucracy, frustrating the whims of "lone cowboy" programmers.

Just as a competent developer changes tools and adjusts practices to properly manage a growing code base, the competent manager must also change tools and adjust practices to properly manage the team. Which means that a competent manager must be aware of the scale of the project, and the changes in that scale. As a project grows, a manager must lead his (or her) people through the changes.

Sunday, August 14, 2016

PC-DOS killed the variants of programming languages

BASIC was the last language with variants. Not "variant" in the flexible-value type known as "Variant", but in different implementations. Different dialects.

Many languages have versions. C# has had different releases, as has Java. Perl is transitioning from version 5 (which had multiple sub-versions) to version 6 (which will most likely have multiple sub-versions). But that's not what I'm talking about.

Some years ago, languages had different dialects. There were multiple implementations with different features. COBOL and FORTRAN both had machine-specific versions. But BASIC had the most variants. For example:

- Most BASICs used the "OPEN" statement to open files, but HP BASIC and GE BASIC used the "FILES" statement which listed the names of all files used in the program. (An OPEN statement lists only one file, and a program may use multiple OPEN statements.)

- Most BASICs used parentheses to enclose variable subscripts, but some used square brackets.

- Some BASICs had "ON n GOTO" statements but some used "GOTO OF n" statements.

- Some BASICs allowed the apostrophe as a comment indicator; others did not.

- Some BASICs allowed for statement modifiers, such as "FOR" or "WHILE" at the end of a statement, and others did not.

These are just some of the differences in the dialects of BASIC. There were others.

What interests me is not that BASIC had so many variants, but that languages since then have not. The last attempt at a dialect of a language was Microsoft's Visual J++, a variant of Java. Microsoft was challenged in court by Sun, and no one has attempted a special version of a language since. Because of this, I place the demise of variants in the year 2000.

There are two factors that come to mind. One is standards, the other is open source.

BASIC was introduced to the industry in the 1960s. There was no standard for BASIC, except perhaps for the Dartmouth implementation, which was the first implementation. The expectation of standards has risen since then, with standards for C, C++, Java, C#, JavaScript, and many others. With clear standards, different implementations of languages would be fairly close.

The argument that open source prevented the creation of variants of languages makes some sense. After all, one does not need to create a new, special version of a language when the "real" language is available for free. Why invest effort into a custom implementation? And the timing of open source coincides with the demise of variants, with open source rising just as language variants disappeared.

But the explanation is different, I think. It was not standards (or standards committees) and it was not open source that killed variants of languages. It was the PC and PC-DOS.

The IBM PC and PC-DOS saw the standardization and commoditization of hardware, and the separation of software from hardware.

In the 1960s and 1970s, mainframe vendors and minicomputer vendors competed for customer business. They sold hardware, operating systems, and software. They needed ways to distinguish their offerings, and BASIC was one way that they could do that.

Why BASIC? There were several reasons. It was a popular language. It was easily implemented. It had no official standard, so implementors could add whatever features they wanted. A hardware manufacturer could offer their own, special version of BASIC as a productivity tool. IBM continued this "tradition" with BASIC in the ROM of the IBM PC and an advanced BASIC with PC-DOS.

But PC compatibles did not offer BASIC, and didn't need to. When manufacturers figured out how to build compatible computers, the factors for selecting a PC compatible were compatibility and price, not a special version of BASIC. Software would be acquired separately from the hardware.

Mainframes and minicomputers were expensive systems, sold with operating systems and software. PCs were different creatures, sold with an operating system but not software.

It's an idea that holds today.

With software being sold (or distributed, as open source) separately from the hardware, there is no need to build variants. Commercial languages (C#, Java, Swift) are managed by the company, which has an incentive for standardization of the language. Open source languages (Perl, Python, Ruby) can be had "for free", so why build a special version -- especially when that special version will need constant effort to match the changes in the "original"? Standard-based languages (C, C++) offer certainty to customers, and variants on them offer little advantage.

The only language that has variants today seems to be SQL. That makes sense, as the SQL interpreter is bundled with the database. Creating a variant is a way of distinguishing a product from the competition.

I expect that the commercial languages will continue to evolve along consistent lines. Microsoft will enhance C#, but there will be only the Microsoft implementation (or at least, the only implementation of significance). Oracle will maintain Java. Apple will maintain Swift.

The open source languages will evolve too. But Perl, Python, and Ruby will continue to see single implementations.

SQL will continue to be the outlier. It will continue to see variants, as different database vendors supply them. It will be interesting to see what happens with the various NoSQL databases.

Monday, August 8, 2016

Agile is all about code quality

Agile promises clean code. That's the purpose of the 'refactor' phase. After creating a test and modifying the code, the developer refactors the code to eliminate compromises made during the changes.

But how much refactoring is enough? One might flippantly say "as much as it takes" but that's not an answer.

For many shops, the answer seems to be "as much as the developer thinks is needed". Other shops allow refactoring until the end of the development cycle. The first is subjective and opens the development team to the risk of spending too much time on refactoring and not enough on adding features. The second is arbitrary and risks short-changing the refactoring phase and allowing messy code to remain in the system.

Agile removes risk by creating automated tests, creating them before modifying the code, and having developers run those automated tests after all changes. Developers must ensure that all tests pass; they cannot move on to other changes while tests are failing.
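To make the idea concrete, here is a minimal sketch of the test-first cycle in Python, using the standard unittest module. The function word_count and its tests are hypothetical, invented only for illustration; the point is that the tests, not the developer, report whether the change is complete.

```python
import unittest


def word_count(text):
    """Hypothetical function under development: count whitespace-separated words."""
    return len(text.split())


class WordCountTest(unittest.TestCase):
    """Tests written before the code was changed; they define 'done'."""

    def test_empty_string_has_no_words(self):
        self.assertEqual(word_count(""), 0)

    def test_counts_words_separated_by_spaces(self):
        self.assertEqual(word_count("agile is about quality"), 4)


def all_tests_pass():
    """Run the suite and report success or failure as a single boolean."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(WordCountTest)
    result = unittest.TextTestRunner(verbosity=0).run(suite)
    return result.wasSuccessful()
```

A developer (or a build script) would call all_tests_pass() after every change and refuse to move on while it returns False.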

This process removes judgement from the developer. A developer cannot say that the code is "good enough" without the tests confirming it. The tests are the deciders of completeness.

I believe that we want the same philosophy for code quality. Instead of allowing a developer to decide when refactoring has reached "good enough", we will instead use an automated process to make that decision.

We already have code quality tools. C and C++ have had lint for decades. Other languages have tools as well. (Wikipedia has a page for static analysis tools.) Some are commercial, others open source. Most can be tailored to meet the needs of the team, placing more weight on some issues and ignoring others. My favorite at the moment is 'RuboCop', a style-checking tool for Ruby.

I expect that Agile processes will adopt a measured approach to refactoring. By using one (or several) code assessors, a team can ensure quality of the code.

Such a change is not without ramifications. This change, like the use of automated tests, takes judgement away from the programmer. Code assessment tools can consider many things, some of which are style. They can examine indentation, names of variables or functions, the length or complexity of a function, or the length of a line of code. They can check the number of layers of 'if' statements or 'while' loops.
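As a rough sketch of how such an assessor might work, here is a small checker written in Python with the standard ast module. It is not how any particular tool is implemented. The three thresholds and the names assess and nesting_depth are assumptions made for illustration; a real team would tune the limits to its own taste.

```python
import ast

# Assumed thresholds, chosen only for this illustration.
MAX_LINE_LENGTH = 80
MAX_FUNCTION_LINES = 20
MAX_NESTING_DEPTH = 3

# Statements that add a layer of nesting. Note that Python's ast represents
# an 'elif' as an 'if' nested inside the 'else' branch, so it counts as a layer.
NESTING_NODES = (ast.If, ast.While, ast.For)


def nesting_depth(node, depth=0):
    """Return the deepest level of nested if/while/for statements under node."""
    deepest = depth
    for child in ast.iter_child_nodes(node):
        child_depth = depth + 1 if isinstance(child, NESTING_NODES) else depth
        deepest = max(deepest, nesting_depth(child, child_depth))
    return deepest


def assess(source):
    """Return a list of complaints; an empty list means the code passes."""
    complaints = []

    # Check the length of each line of code.
    for number, line in enumerate(source.splitlines(), start=1):
        if len(line) > MAX_LINE_LENGTH:
            complaints.append(f"line {number}: longer than {MAX_LINE_LENGTH} characters")

    # Check the length and nesting depth of each function.
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.FunctionDef):
            length = node.end_lineno - node.lineno + 1
            if length > MAX_FUNCTION_LINES:
                complaints.append(f"{node.name}: {length} lines long")
            depth = nesting_depth(node)
            if depth > MAX_NESTING_DEPTH:
                complaints.append(f"{node.name}: nested {depth} levels deep")

    return complaints
```

Under this scheme, refactoring is finished when assess() returns an empty list for the changed code, in the same way that a change is finished when the tests pass.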

Deferring judgement to the style checkers will affect managers as well as programmers. If a developer must refactor code until it passes the style checker, then a manager cannot cut short the refactoring phase. Managers will probably not like this change -- it takes away some control. Yet it is necessary to maintain code quality. By ending refactoring before the code is at an acceptable quality, managers allow poor code to remain in the system, which will affect future development.

Agile is all about code quality.