Sunday, August 21, 2016

Scale

Software development requires an awareness of scale. That is, knowing the size of the software and selecting the right tools and skills to manage it.

Scale is present in just about every aspect of human activity. We humans have different forms of transportation. We can walk, ride bicycles, and drive automobiles. (We can also run, swim, ride buses, and fly hang gliders, but I will stick to the three most common forms.)

Walking is a simple activity. It requires little in the way of planning, little in the way of equipment, and little in the way of skill.

Riding a bicycle is somewhat higher in complexity. It requires equipment (the bicycle) and some skill (knowing how to ride the bicycle). It requires planning: when we arrive at our destination, what do we do with the bicycle? We may need a lock to secure the bicycle. We may need a safety helmet. The clothes we wear must be tight-fitting, or at least designed to not interfere with the bicycle mechanism. At this level of complexity we must be aware of the constraints on our equipment.

Driving an automobile is more complex than riding a bicycle. It requires more equipment (the automobile) and more skills (driving). It requires more planning. In addition to the car we will need gasoline. And insurance. And registration. At this level of complexity we must be aware of the constraints on our equipment and also the requirements from external entities.

Translating this to software, we can see that some programs are simple and some are complex.

Tiny programs (perhaps a one-liner in Perl) are simple enough that we can write them, use them, and then discard them, with no other thoughts for maintenance or upkeep. Let's consider them the equivalent of walking.

Small programs (perhaps a page-long script in Ruby) require more thought to prepare (and test) and may need comments to describe some of the inner workings. They can be the equivalent of riding a bicycle.

Large programs (let's jump to sophisticated packages with hundreds of thousands of lines of code) require a language that helps us organize our code; comprehensive sets of unit, component, and system tests; and documentation to record the rationale behind design decisions. These are the analogue of driving an automobile.

But here is where software differs from transportation: software changes. The three modes of transportation (walking, bicycle, automobile) are static and distinct. Walking is walking. Driving is driving. But software is dynamic -- frequently, over time, it grows. Many large programs start out as small programs.

A small program can grow into a larger program. When it does, the competent developer changes the tools and practices used to maintain the code. Which means that a competent programmer must be aware of the scale of the project, and the changes in that scale. As the code grows, a programmer must change his (or her) tools and practices.

There's more, of course.

The growth effect of software extends to the management of project teams. A project may start with a small number of people. Over time, the project grows, and the number of people on the team increases.

The techniques and practices of a small team don't work for larger teams. Small teams can operate informally, with everyone in one room and everyone talking to everyone. Larger teams are usually divided into sub-teams, and the coordination of effort is harder. Informal methods work poorly, and the practices must be more structured, more disciplined.

Enterprise-class projects are even more complex. They require more discipline and more structure than the merely large projects. That structure and discipline are often expressed as bureaucracy, frustrating the whims of "lone cowboy" programmers.

Just as a competent developer changes tools and adjusts practices to properly manage a growing code base, the competent manager must also change tools and adjust practices to properly manage the team. Which means that a competent manager must be aware of the scale of the project, and the changes in that scale. As a project grows, a manager must lead his (or her) people through the changes.

Sunday, August 14, 2016

PC-DOS killed the variants of programming languages

BASIC was the last language with variants. Not "variant" in the sense of the flexible-value data type known as "Variant", but variants in the sense of different implementations. Different dialects.

Many languages have versions. C# has had different releases, as has Java. Perl is transitioning from version 5 (which had multiple sub-versions) to version 6 (which will most likely have multiple sub-versions).  But that's not what I'm talking about.

Some years ago, languages had different dialects. There were multiple implementations with different features. COBOL and FORTRAN each had machine-specific versions. But BASIC had the most variants. For example:

- Most BASICs used the "OPEN" statement to open files, but HP BASIC and GE BASIC used the "FILES" statement, which listed the names of all files used in the program. (An OPEN statement lists only one file, and a program may use multiple OPEN statements.)

- Most BASICs used parentheses to enclose variable subscripts, but some used square brackets.

- Some BASICs had "ON n GOTO" statements, but some used "GOTO OF n" statements.

- Some BASICs allowed the apostrophe as a comment indicator; others did not.

- Some BASICs allowed statement modifiers, such as "FOR" or "WHILE" at the end of a statement, and others did not.

These are just some of the differences in the dialects of BASIC. There were others.

What interests me is not that BASIC had so many variants, but that languages since then have not. The last attempt at a dialect of a language was Microsoft's Visual J++, a variant of Java. Microsoft was challenged in court by Sun, and no one has attempted a special version of a language since. Because of this, I place the demise of variants in the year 2000.

There are two factors that come to mind. One is standards, the other is open source.

BASIC was introduced to the industry in the 1960s. There was no standard for BASIC, except perhaps for the Dartmouth implementation, which was the first. The expectation of standards has risen since then, with standards for C, C++, Java, C#, JavaScript, and many others. With clear standards, different implementations of a language stay fairly close to one another.

The argument that open source prevented the creation of variants of languages makes some sense. After all, one does not need to create a new, special version of a language when the "real" language is available for free. Why invest effort into a custom implementation? And the timing fits, with open source rising just as language variants disappeared.

But the explanation is different, I think. It was not standards (or standards committees) and it was not open source that killed variants of languages. It was the PC and PC-DOS.

The IBM PC and PC-DOS saw the standardization and commoditization of hardware, and the separation of software from hardware.

In the 1960s and 1970s, mainframe vendors and minicomputer vendors competed for customer business. They sold hardware, operating systems, and software. They needed ways to distinguish their offerings, and BASIC was one way that they could do that.

Why BASIC? There were several reasons. It was a popular language. It was easily implemented. It had no official standard, so implementors could add whatever features they wanted. A hardware manufacturer could offer their own, special version of BASIC as a productivity tool. IBM continued this "tradition" with BASIC in the ROM of the IBM PC and an advanced BASIC with PC-DOS.

But PC compatibles did not offer BASIC, and didn't need to. When manufacturers figured out how to build compatible computers, the factors for selecting a PC compatible were compatibility and price, not a special version of BASIC. Software would be acquired separately from the hardware.

Mainframes and minicomputers were expensive systems, sold with operating systems and software. PCs were different creatures, sold with an operating system but not software.

It's an idea that holds today.

With software being sold (or distributed as open source) separately from the hardware, there is no need to build variants. Commercial languages (C#, Java, Swift) are each managed by a single company, which has an incentive to keep the language consistent. Open source languages (Perl, Python, Ruby) can be had "for free", so why build a special version -- especially when that special version will need constant effort to match the changes in the "original"? Standards-based languages (C, C++) offer certainty to customers, and variants of them offer little advantage.

The only language that has variants today seems to be SQL. That makes sense, as the SQL interpreter is bundled with the database. Creating a variant is a way of distinguishing a product from the competition.

I expect that the commercial languages will continue to evolve along consistent lines. Microsoft will enhance C#, but there will be only the Microsoft implementation (or at least, the only implementation of significance). Oracle will maintain Java. Apple will maintain Swift.

The open source languages will evolve too. But Perl, Python, and Ruby will continue to see single implementations.

SQL will continue to be the outlier. It will continue to see variants, as different database vendors supply them. It will be interesting to see what happens with the various NoSQL databases.

Monday, August 8, 2016

Agile is all about code quality

Agile promises clean code. That's the purpose of the 'refactor' phase. After creating a test and modifying the code, the developer refactors the code to eliminate compromises made during the changes.

But how much refactoring is enough? One might flippantly say "as much as it takes" but that's not an answer.

For many shops, the answer seems to be "as much as the developer thinks is needed". Other shops allow refactoring until the end of the development cycle. The first is subjective and opens the development team to the risk of spending too much time on refactoring and not enough on adding features. The second is arbitrary and risks short-changing the refactoring phase and allowing messy code to remain in the system.

Agile removes risk by creating automated tests, creating them before modifying the code, and having developers run those automated tests after all changes. Developers must ensure that all tests pass; they cannot move on to other changes while tests are failing.

This process removes judgement from the developer. A developer cannot say that the code is "good enough" without the tests confirming it. The tests are the deciders of completeness.
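As a minimal sketch of that gate (in Ruby with the Minitest test framework; the function name and file layout are hypothetical, not taken from any particular project), the test is written before the change and must pass before the developer moves on:

    # test_titlecase.rb -- hypothetical example of a test written before the code change
    require 'minitest/autorun'
    require_relative 'titlecase'   # assumed to define the titlecase() function under test

    class TestTitlecase < Minitest::Test
      def test_capitalizes_each_word
        # The change is not "done" until this assertion passes.
        assert_equal 'Hello World', titlecase('hello world')
      end
    end

Running the file with 'ruby test_titlecase.rb' reports a pass or a failure; there is no room for the developer to declare the change "good enough" while the test fails.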

I believe that we want the same philosophy for code quality. Instead of allowing a developer to decide when refactoring has reached "good enough", we will instead use an automated process to make that decision.

We already have code quality tools. C and C++ have had lint for decades. Other languages have tools as well. (Wikipedia has a page for static analysis tools.) Some are commercial, others open source. Most can be tailored to meet the needs of the team, placing more weight on some issues and ignoring others. My favorite at the moment is 'Rubocop', a style-checking tool for Ruby.

I expect that Agile processes will adopt a measured approach to refactoring. By using one (or several) code assessors, a team can ensure quality of the code.

Such a change is not without ramifications. This change, like the use of automated tests, takes judgement away from the programmer. Code assessment tools can consider many things, some of which are style. They can examine indentation, names of variables or functions, the length or complexity of a function, or the length of a line of code. They can check the number of layers of 'if' statements or 'while' loops.
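A deeply nested method is the kind of thing such a tool reports. Here is a sketch in Ruby; the method and the order object are invented for illustration, and a checker like Rubocop (depending on its configuration) would flag the nesting in the first version and accept the second:

    # Before: three layers of nested 'if' statements.
    # A style checker would count the nesting depth and flag it.
    def ship_order(order)
      if order
        if order.paid?
          if order.in_stock?
            order.ship
          end
        end
      end
    end

    # After: the same behavior, refactored with guard clauses.
    # The nesting and complexity measures drop accordingly.
    def ship_order(order)
      return unless order
      return unless order.paid?
      return unless order.in_stock?
      order.ship
    end

The refactoring is mechanical and the measurement is objective, which is exactly the point: the tool, not the programmer, decides when the code is clean enough.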

Deferring judgement to the style checkers will affect managers as well as programmers. If a developer must refactor code until it passes the style checker, then a manager cannot cut short the refactoring phase. Managers will probably not like this change -- it takes away some control. Yet it is necessary to maintain code quality. By ending refactoring before the code is at an acceptable quality, managers allow poor code to remain in the system, which will affect future development.

Agile is all about code quality.

Sunday, July 31, 2016

Agile pushes ugliness out of the system

Agile differs from Waterfall in many ways. One significant way is that Agile handles ugliness, and Waterfall doesn't.

Agile starts by defining "ugliness" as an unmet requirement. It could be a new feature or a change to an existing one. The Agile process sees the ugliness move through the system, from requirements to test to code to deployment. (Waterfall, in contrast, has the notion of requirements but not the concept of ugliness.)

Let's look at how Agile treats ugliness as something larger than just unmet requirements.

The first stage is an unmet requirement. With the Agile process, development occurs in a set of changes (sometimes called "sprints") with a small set of new requirements. Stakeholders may have a long list of unmet requirements, but a single sprint handles a small, manageable set of them. The "ugliness" is the fact that the system (as it is at the beginning of the sprint) does not perform them.

The second stage transforms the unmet requirements into tests. By creating a test -- an automated test -- the unmet requirement is documented and captured in a specific form. The "ugliness" has been captured and specified.

After capture, changes to code move the "ugliness" from a test to code. A developer changes the system to perform the necessary function, and in doing so changes the code. But the resulting code may be "ugly" -- it may duplicate other code, or it may be difficult to read.

The fourth and last stage (after unmet requirements, capture, and coding) removes the "ugliness" from the code. This is the "refactoring" stage, when code is improved without changing the functions it performs. After refactoring, the "ugliness" is gone.

The ability to handle "ugliness" is the unique capability of Agile methods. Waterfall has no concept of code quality. It can measure the number of defects, the number of requirements implemented, and even the number of lines of code, but it doesn't recognize the quality of the code. To Waterfall, the quality of the code is simply its ability to deliver functionality. This means that ugly code can collect, and collect, and collect. There is nothing in Waterfall to address it.

Agile is different. Agile recognizes that code quality is important. That's the reason for the "refactor" phase. Agile transforms requirements into tests, then into ugly code, and finally into beautiful (or at least non-ugly) code. The result is requirements that are transformed into maintainable code.

Tuesday, July 26, 2016

The changing role of IT

The original focus of IT was efficiency and accuracy. Today, the expectation still includes efficiency and accuracy, yet adds increased revenue and expanded capabilities for customers.

IT has been with us for more than half a century, if you count IT as not only PCs and servers but also minicomputers, mainframes, and batch processing systems for accounting and finance.

Computers were originally large, expensive, and fussy beasts. They required a whole room to themselves. Mainframes cost hundreds of thousands of dollars (if not millions). They needed a coterie of attendants: operators, programmers, service technicians, and managers.

Even the early personal computers were expensive. A PC in the early 1980s cost three to five thousand dollars. They didn't need a separate room, but they were a significant investment.

The focus was on efficiency. Computers were meant to make companies more efficient, processing transactions and generating reports faster and more accurately than humans.

Because of their cost, we wanted computers to operate as efficiently as possible. Companies that purchased mainframes would monitor CPU and disk usage to ensure that they were operating in the ninety-percent range. If usage was higher than that, they knew they needed to expand their system; if less, they had spent too much on hardware.

Today, we focus less on efficiency and more on growing the business. We view automation and big data as mechanisms for new services and ways to acquire new customers.

That's quite a shift from the "spend just enough to print accounting reports" mindset. What changed?

I can think of two underlying changes.

First, the size and cost of computers have dropped. A cell phone fits in your pocket and costs less than a thousand dollars. Laptop PCs can be acquired for similar prices; Chromebooks for significantly less. Phones, tablets, Chromebooks, and even laptops can be operated by a single person.

The drop in cost means that we can worry less about internal efficiency. Buying a mainframe computer that was too large was an expensive mistake. Buying an extra laptop is almost unnoticed. Investing in IT is like any other investment, with a potential return of new business.

Yet there is another effect.

In the early days of IT (from the 1950s to the 1980s), computers were mysterious and almost magical devices. Business managers were unfamiliar with computers. Many people weren't sure that computers would remain tame, and some feared that they would take over (the company, the country, the world). Managers didn't know how to leverage computers to their full extent. Investors were wary of the cost. Customers resisted the use of computer-generated cards that read "do not fold, spindle, or mutilate".

Today, computers are not mysterious, and certainly not magical. They are routine. They are mundane. And business managers don't fear them. Instead, managers see computers as a tool. Investors see them as equipment. Customers willingly install apps on their phones.

I'm not surprised. The business managers of the 1950s grew up with manual processes. Senior managers might have remembered an age without electricity.

Today's managers are comfortable with computers. They used them as children, playing video games and writing programs in BASIC. The thought that computers can assist the business in various tasks is a natural extension of that experience.

Our view of computers has shifted. The large, expensive, magical computation boxes have shrunk and become cheaper, and are now small, flexible, and powerful computation boxes. Simply owning (or leasing) a mainframe once provided strategic advantage through intimidation; now everyone can leverage server farms, networks, cloud computing, and real-time updates. But owning (or leasing) a server farm or a cloud network isn't enough to impress -- managers, customers, and investors look for business results.

With a new view of computers as mundane, it's no surprise that businesses look at them as a way to grow.

Thursday, July 21, 2016

Spaghetti in the Cloud

Will cloud computing eliminate spaghetti code? The question is a good one, and the answer is unclear.

First, let's understand the term "spaghetti code". The term dates back to the 1970s (according to Wikipedia) and was probably coined as an argument for structured programming techniques. Unstructured programming was harder to read and understand, and the term captured that messiness in a vivid analogy.

Spaghetti code was bad. It was hard to understand. It was fragile, and small changes led to unexpected failures. Structured programming was, well, structured, and therefore (theoretically) spaghetti code could not occur under its discipline.

But theory didn't work quite right, and even with the benefits of structured programming, we found that we had code that was difficult to maintain. (In other words, spaghetti code.)

After structured programming, object-oriented programming was offered as the solution. Object-oriented programming, with its ability to group data and functions into classes, was going to solve the problems of spaghetti code.

Like structured programming before it, object-oriented programming didn't make all code easy to read and modify.

Which brings us to cloud computing. Will cloud computing suffer from "spaghetti code"? Will we have difficult to read and difficult to maintain systems in the cloud?

The obvious answer is "yes". Companies and individuals who transfer existing (difficult to read) systems into the cloud will have ... difficult-to-understand code in the cloud.

The more subtle answer is... "yes".

The problem of difficult-to-read code is not the programming style (unstructured, structured, or object-oriented) but mutable state. "State" is the combination of values for all variables and changeable entities in a program. In a program with mutable state, these variables change over time. To read and understand the code, one must understand the current state, that is, the current value of all of those variables. But to know the current value of those variables, one must understand all of the operations that led to the current state, and that list can be daunting.

Functional programming (another programming technique) doesn't allow mutable variables at all. Variables are fixed and unchanging. Once created, they exist and retain their value forever.

With cloud computing, programs (and variables) do not hold state. Instead, state is stored in databases, and programs run "stateless". Programs are simpler too: a cloud system is composed of smaller programs linked together by databases and message queues.
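A sketch of the difference in Ruby (the names are invented, and the hash stands in for the database or cache a real cloud system would use):

    # Stateful style: the count lives inside the program.
    # To know its current value, a reader must trace every call that came before.
    class HitCounter
      def initialize
        @count = 0
      end

      def record_hit
        @count += 1
      end
    end

    # Stateless style, as in a cloud service: the program keeps nothing
    # between calls; state is read from and written back to external storage.
    def record_hit(store, page)
      store[page] = store.fetch(page, 0) + 1
    end

The stateless form is easier to reason about in isolation, which is one reason small cloud programs connected by databases and queues tend to take that shape.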

But that doesn't prevent people from moving large, complicated programs into the cloud. It doesn't prevent people from writing large, complicated programs in the cloud. Some programs in the cloud will be small and easy to read. Others will be large and hard to understand.

So, will spaghetti code exist in the cloud? Yes. But perhaps not as much as in previous technologies.

Tuesday, July 19, 2016

How programming languages change

Programming languages change. That's not news. Yet programming languages cannot change arbitrarily; the changes are constrained. We should be aware of this, and pick our technology with this in mind.

If we think of a programming language as a set of features, then programming languages can change in three ways:

- Add a feature
- Modify a feature
- Remove a feature

The easiest change (that is, the type with the least resistance from users) is adding a feature. That's no surprise; it allows all of the old programs to continue working.

Modifying an existing feature or removing a feature is a difficult business. It means that some programs will no longer work. (If you're lucky, they won't compile, or the interpreter will reject them. If you're not lucky, the compiler or interpreter will accept them but process them differently.)

So as a programming language changes, the old features remain. Look inside a modern Fortran compiler and you will find FORMAT statements and arithmetic IF constructs, elements of Fortran's early days.

When a programming language changes enough, we change its name. We (the tech industry) modified the C language to mandate prototypes, and in doing so we called the revised language "ANSI C". When Stroustrup enhanced C to handle object-oriented concepts, he called it "C with Classes". (We've since named it "C++".)

Sometimes we change not the name but the version number. Visual Basic 4 was quite different from Visual Basic 3, and Visual Basic 5 was quite different from Visual Basic 4 (two of the few examples of non-compatible upgrades). Yet the later versions retained the flavor of Visual Basic, so keeping the name made sense.

Perl 6 is different from Perl 5, yet it still runs old code with a compatibility layer.

Fortran can add features but must remain "Fortranish", otherwise we call it "BASIC" or "FOCAL" or something else. Algol must remain Algol or we call it "C". An enhanced Pascal is called "Object Pascal" or "Delphi".

Language names bound a set of features for the language. Change the feature set beyond the boundary, and you also change the name of the language. Which means that a language can change only so much, in only certain dimensions, while remaining the same language.

When we start a project and select a programming language, we're selecting a set of features for development. We're locking ourselves into a future, one that may expand over time -- or may not -- but will remain centered over its current point. COBOL will always be COBOL, C++ will always be C++, and Ruby will always be Ruby. A COBOL program will always be a COBOL program, a C++ program will always be a C++ program, and a Ruby program will always be a Ruby program.

A lot of this is psychology. We certainly could make radical changes to a programming language (any language) and keep the name. But while we *could* do this, we don't. We make small, gradual changes. The changes to programming languages (I hesitate to use the words "improvements" or "progress") are glacial in nature.

I think that tells us something about ourselves, not the technology.