Sunday, May 22, 2016

Small check-ins saved me

With all of the new technology, from cloud computing to tablets to big data, it is easy to forget the older techniques that still help us.

This week, I was helped by one of those simple techniques. The technique that helped was frequent, small check-ins to version control systems. I was using Microsoft's TFS, but this technique works with any system: TFS, Subversion, git, CVS, ... even SourceSafe!

Small, frequent changes are easier to review and easier to revert than large changes. Any version control system accepts small changes; the decision to make large or small changes is up to the developer.

After a number of changes, the team with whom I work discovered a defect, one that had escaped our tests. We knew that it was caused by a recent change -- we tested releases and found that it occurred only in the most recent release. That information limited the introduction of the defect to the most recent forty check-ins.

Forty check-ins may seem like a long list, but we quickly identified the specific check-in by using a binary search technique: get the source from the middle revision and test it; if the error occurs, move to the earlier half; if not, move to the later half; then repeat from that half's middle.
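
That bisection is easy to script. Here is a minimal Python sketch of the idea -- the test_revision() helper is hypothetical; imagine it retrieves the given revision, builds it, and returns True when the defect appears:

def find_first_bad(revisions, test_revision):
    # revisions runs oldest to newest; revisions[0] is a known-good
    # check-in and revisions[-1] is known to contain the defect
    low, high = 0, len(revisions) - 1
    while high - low > 1:
        mid = (low + high) // 2
        if test_revision(revisions[mid]):
            high = mid   # defect already present: look in the earlier half
        else:
            low = mid    # defect not yet present: look in the later half
    return revisions[high]  # the first check-in containing the defect

# stand-in for illustration: pretend the defect arrived at check-in 27
print(find_first_bad(list(range(1, 41)), lambda rev: rev >= 27))

With forty check-ins, the loop above needs only five or six builds to isolate the culprit.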

The real benefit occurred when we found the specific check-in. Since all check-ins were small, this check-in was too. (It was a change of five different lines.) It was easy to review the five individual lines and find the error.

Once we found the error, it was easy to make the correction to the latest version of the code, run our tests (which now included an additional test for the specific problem we had found), verify that the fix was correct, and continue our development.

A large check-in would have required much more examination, and more time.

Small check-ins cost little and provide easy verification. Why not use them?

Sunday, May 15, 2016

Agile values clean code; waterfall may but doesn't have to

Agile and Waterfall are different in a number of ways.

Agile promises that your code is always ready to ship. Waterfall promises that the code will be ready on a specific date in the future.

Agile promises that your system passes the tests (at least the tests for code that has been implemented). Waterfall promises that every requested feature will be implemented.

There is another difference between Agile and Waterfall. Agile values clean code; Waterfall values code that performs as intended, but it has no notion of code quality. The Agile cycle includes a step for refactoring, a time for developers to modify the code and improve its design. The Waterfall method has no corresponding step or phase.

Which is not to say that Waterfall projects always result in poorly designed code. It is possible to build well-designed code with Waterfall. Agile explicitly recognizes the value of clean code and allocates time for correcting design errors. Waterfall, in contrast, has its multiple phases (analysis, design, coding, testing, and deployment) with the assumption that working code is clean code -- or code of acceptable quality.

I have seen (and participated in) a number of Waterfall projects, and the prevailing attitude is that code improvements can always be made later, "as time allows". The problem is that time never allows.

Many project managers have the mindset that developers should be working on features with "business value". Typically that work falls into one of three categories: features to increase revenue, features to reduce costs, and defect corrections. The same mindset considers any effort outside of those areas to add no value to the business, and therefore to be unworthy of attention.

Improving code quality is an investment in the future. It is positioning the code to handle changes -- in requirements or staff or technology -- and reducing the effort and cost of those changes. In this light, Agile is looking to the future, and Waterfall is looking to the past (or perhaps only the current release).

Thursday, May 5, 2016

Where have all the operating systems gone?

We used to have lots of operating systems. Every hardware manufacturer built their own operating systems. Large manufacturers like IBM and DEC had multiple operating systems, introducing new ones with new hardware.

(It's been said that DEC became a computer company by accident. They really wanted to write operating systems, but they needed processors to run them, and compilers and editors to give them something to do, so they ended up building everything. It's a reasonable theory, given the number of operating systems they produced.)

In the 1970s, CP/M was an attempt at an operating system for different hardware platforms. It wasn't the first; Unix had been rewritten in C a few years earlier with portability across platforms in mind. It wasn't the only one; the UCSD p-System used a virtual processor, much like Java's JVM, and ran on various hardware.

Today we also see lots of operating systems. Commonly used ones include Windows, Linux, Mac OS, iOS, Android, Chrome OS, and even watchOS. But are they really different?

Android and Chrome OS are really variants of Linux. Linux itself is a clone of Unix. Mac OS is derived (by way of NeXTSTEP) from the Berkeley Software Distribution (BSD) of Unix. iOS and watchOS are, according to Wikipedia, "Unix-like", and I assume that they are slim versions of the same BSD-based core with added components.

Which means that our list of commonly-used operating systems becomes:

  • Windows
  • Unix

That's a rather small list. (I'm excluding the operating systems used for special purposes, such as embedded systems in automobiles or machinery or network routers.)

I'm not sure that this reduction in operating systems, this approach to a monoculture, is a good thing. Nor am I convinced that it is a bad thing. After all, a common operating system (or two commonly-used operating systems) means that lots of people know how they work. It means that software written for one variant can be easily ported to another variant.

I do feel some sadness at the loss of the variety of earlier years. The early days of microcomputers saw wide variations of operating systems, a kind of Cambrian explosion of ideas and implementations. Different vendors offered different ideas, in hardware and software. The industry had a different feel from today's world of uniform PCs and standard Windows installations. (The variations between versions of Windows, or even between distros of Linux, are much smaller than the differences between a Data General minicomputer and a DEC minicomputer.)

Settling on a single operating system is a way of settling on a solution. We have a problem, and *this* operating system, *this* solution, is how we address it. We've settled on other standards: character sets, languages (C# and Java are not that different), storage devices, and keyboards. Once we pick a solution and make it a standard, we tend to not think about it. (Is anyone thinking of new keyboard layouts? New character sets?) Operating systems seem to be settling.


Sunday, May 1, 2016

Waterfall and agile depend on customer relations

The debate between Agile and Waterfall methods for project management seems to have forgotten about customers, and more specifically the commitments made to customers.

Waterfall and Agile methods differ in that Waterfall promises a specific set of functionality on a specific date, and Agile promises that the product is always ready to ship (but perhaps not with all the features you want). These two methods require different techniques for project management, but also imply different relationships with customers.

It's easy to forget that the customers are the people who actually pay for the product (or service). Too often, we focus on the "internal customer" and think of due dates and service level agreements. But let's think about the traditional customers.

If your business promises specific functionality to customers on specific dates, then you probably want Waterfall project management techniques. Agile methods are a poor fit: they may work for the development and testing of the product, but they don't mesh with the schedules developed by the people making promises to customers.

Lots of companies promise specific deliverables at a future date. Some companies have to deliver releases in time for other, external events, such as Intuit's TurboTax in time for tax season. Microsoft has announced software in advance, sometimes to deter competition. (Microsoft is not alone in this activity.)

Not every company works this way. Google, for example, upgrades its search page and apps (Google Docs, Google Sheets) when it can. It has never announced -- in advance -- new features for its search page. (It does announce changes, sometimes a short period before implementing them, but not too far in advance.) Amazon.com rolls out changes to its sales pages and web-service platform as it can. There is no "summer release" and no analog to a new car model year. And both companies are successful.

If your company promises new versions (or new products) on specific dates, you may want to manage your projects with the Waterfall method. The Agile method will fit into your schedule poorly, as it doesn't promise what Waterfall promises.

You may also want to review your company's philosophy towards releases. Do you need to release software on specific dates? Must you follow a rigid release schedule? For all of your releases?

Sunday, April 17, 2016

After the spreadsheet

The spreadsheet is a wonderful thing. It performs several functions: it holds and organizes data, it specifies calculations, and it presents results. All of these functions, wrapped into a single package, provide convenience to users. Yet the convenience of that single package will be the spreadsheet's undoing.

For the individual, the spreadsheet is a useful tool. But for the enterprise, the spreadsheet creates perhaps more problems than it solves. Since a spreadsheet file contains the data, the formulas, and the presentation, spreadsheets are often replicated to share with co-workers (usually via e-mail) and duplicated to process different sets of data (the spring sales figures and then the summer sales figures, for example).

The replication of spreadsheets via e-mail can be mitigated by the use of shared file locations ("network drives") and by online versions of spreadsheets that allow multiple users. But the bigger problem is the duplication of spreadsheets with minor changes.

The duplication of spreadsheets means duplicating not only the data (which often changes) but also the formulas and the presentation (which often do not change). Since a spreadsheet contains all three components, a new set of data requires a new copy of all of them. There is no way to share only one component: no way to apply the same formulas to new data, or a different presentation to the same data and formulas. This means that, over time, an enterprise of any size accumulates multiple spreadsheets with different data and duplicate formulas and macros -- at least you hope that they are duplicate copies.

The design of spreadsheets -- containing the data, formulas, and presentation in one package -- is a holdover from the days of VisiCalc and Lotus 1-2-3. Those programs were developed for the Apple II and the IBM PC. With those machines able to run only one program at a time, putting everything into one program made sense -- using one program for data, another for calculation, and a third for presentation would have been awkward and time-consuming. But that constraint applied only to the old single-tasking operating systems. Windows and Mac OS and Linux allow multiple programs to run at the same time, and windowing systems allow multiple programs to present information to the user at the same time.

If spreadsheets were being invented now in the age of web services and cloud systems and multi-window displays, their design would probably be quite different. Instead of a single program that performed all functions and a single file that contained data, formulas and presentation, we might have something very different. We might create a system of web services, providing data with some and performing calculations with others. The results could be displayed by yet other functions in other windows, possibly published for co-workers to view.

Such a multi-component system would follow the tenets of Unix, which recommends small, independent programs that read data, perform some processing, and emit data. The data and computations could be available via web services. A central service could "fan out" requests to collect data from one or more services, send that data through one or more computing services, and then provide the results to a presentation mechanism such as a graph in a window or even a printed report.
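
As a rough sketch of that fan-out (the URLs below are invented placeholders, not real services), the coordinating piece might look something like this in Python:

import json
from urllib.request import urlopen

# hypothetical services -- placeholder URLs for illustration only
DATA_SERVICES = [
    "https://data.example.com/sales/spring",
    "https://data.example.com/sales/summer",
]
CALC_SERVICE = "https://calc.example.com/quarterly-totals"

def fetch_json(url):
    with urlopen(url) as response:
        return json.loads(response.read())

def fan_out():
    # collect data from one or more data services
    data = [fetch_json(url) for url in DATA_SERVICES]
    # send the collected data through a computing service
    payload = json.dumps(data).encode("utf-8")
    with urlopen(CALC_SERVICE, data=payload) as response:
        results = json.loads(response.read())
    # hand the results to a presentation mechanism -- here, simply print them
    print(results)

The particular calls matter less than the separation: the data, the calculations, and the presentation each sit behind their own interface, so any one of them can be replaced or reused without touching the others.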

By separating the formulas and macros from the data, we can avoid needless duplication of both. (While most cases see the duplication of formulas to handle different data sets, sometimes different formulas can be applied to the same data.)
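
As a small illustration (with invented figures), a formula written once as a function can be applied to the spring figures and the summer figures alike, where spreadsheets would need one copy of the formula in each file:

# one formula, many data sets; the figures below are invented
def net_revenue(sales, costs):
    return sum(sales) - sum(costs)

spring = {"sales": [120, 95, 143], "costs": [60, 55, 70]}
summer = {"sales": [150, 160, 170], "costs": [80, 75, 90]}

for name, figures in (("spring", spring), ("summer", summer)):
    print(name, net_revenue(figures["sales"], figures["costs"]))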

Providing data via web services is easy -- web services do that today. There are even web services to convert data into graphs. What about calculations? What language can be used to perform computations on data sets?

The traditional languages C# and Java are not the best fit here; whatever replaces the spreadsheet should be as usable by non-programmers as the spreadsheet is (or nearly so). The best candidate may be R, the statistics-oriented language. R is established, cross-platform, and capable. It is also a high-level language, close to the formulas of spreadsheets (and more powerful than Microsoft's VBA, which is used for macros in Excel).

Replacing spreadsheets with a trio of data management, computation, and presentation tools will not be easy. The advantages of the spreadsheet include convenience and familiarity. The advantages of separate components are better integration in cloud systems, leveraging of web services, and easier audits of formulas. It may not happen soon, but I think it will happen eventually.

Thursday, April 14, 2016

Technology winners and losers

Technology has been remarkably stable for the past three decades. The PC dominated the hardware market and Microsoft dominated the software market.

People didn't have to use PCs and Microsoft Windows. They could choose alternative solutions, such as Apple Macintosh computers with Mac OS. They could use regular PCs with Linux. But the people using out-of-mainstream technology *chose* to use it. They knew what they were getting into. They knew that they would be in a minority, and that when they entered a computer shop most of the offerings would be for regular Windows PCs and not for their configuration.

The market was not always this way. In the years before the IBM PC, different manufacturers provided different systems: the Apple II, the TRS-80, DEC's Pro-325 and Pro-350, the Amiga, ... there were many. All of those systems were swept aside by the IBM PC, and all of the enthusiasts for those systems knew the pain of loss. They had lost their chosen system to the one designated by the market as the standard.

In a recent conversation with a Windows enthusiast, I realized that he was feeling a similar pain in his situation. He was dejected at the dearth of support for Windows phones -- he owned such a phone, and felt left out of the mobile revolution. Windows phones are out-of-mainstream, and many apps do not run on them.

I imagine that many folks in the IT world are feeling the pain of loss. Some because they have Windows phones. Others because they have been loyal Microsoft users for decades, perhaps their entire career, and now Windows is no longer the center of the software world.

This is their first exposure to loss.

We grizzled veterans who remember CP/M or Amiga DOS have had our losses; we know how to cope. The folks who used WordPerfect or Lotus 1-2-3 had to switch to Microsoft products; they know loss too. But no technology has been forced from the market for quite some time. Perhaps the last was IBM's OS/2, back in the 1990s. (Or perhaps Visual Basic, when it was modified to VB.NET.)

But IT consists of more than grizzled veterans.

For someone entering the IT world after the IBM PC (and especially after the rise of Windows), it would be possible -- and even easy -- to enjoy a career in dominant technologies: stay within the Microsoft set of technologies and everything was mainstream. Microsoft technology was supported and accepted. Learning Microsoft technologies such as SQL Server and SharePoint meant that you were on the "winning team".

A lot of folks in technology have never known this kind of technology loss. When your entire career has been with successful, mainstream technology, the change is unsettling.

Microsoft Windows Phone is a technology on the edge. It exists, but it is not mainstream. It is a small, oddball system (in the view of the world). It is not the "winning team"; iOS and Android are the popular, mainstream technologies for phones.

As Microsoft expands beyond Windows with Azure and apps for iOS and Android, it competes with more companies and more technologies. Azure competes with Amazon.com's AWS and Google's Compute Engine. Office Online and Office 365 compete with Google Docs. OneDrive competes with Dropbox and Box. Microsoft's technologies are not the de facto standard, not always the most popular, and sometimes the oddball.

For the folks whose worldview is shifting from one in which Microsoft technology is always the most popular and most accepted to one in which different technologies compete and Microsoft sometimes loses, an example to follow would be ... Microsoft.

Microsoft, after years of dominance with the Windows platform and applications, has widened its view. It is not "the Windows company" but a technology company that supplies Windows. More than that, it is a technology company that supplies Windows, Azure, Linux, and virtual machines. It is a company that supplies Office applications on Windows, iOS, and Android. It is a technology company that supplies SQL Server on Windows and soon Linux.

Microsoft adapts. It changes to meet the needs of the market.

That's a pretty good example.

Sunday, April 10, 2016

Complexity of programming languages

Are programming languages becoming more or less complex?

It is a simple question, and like many simple questions, the answer is not so simple.

Let's look at a simple program in some languages. The simple program will print the numbers from 1 to 10. Here are programs in FORTRAN, BASIC, Pascal, C, C++, C#, and Python:

FORTRAN 66 (1966)

      DO 100 I = 1, 10
100   WRITE (6, 200) I
200   FORMAT (I5)
      STOP
      END

BASIC (1965)

10 FOR I = 1 TO 10
20 PRINT I
30 NEXT I
99 END

Pascal (1970)

program hello;
var
  i: integer;
begin
  for i := 1 to 10 do
    WriteLn(i);
end.

C (1978)

#include "stdlib.h"

int main(void)
{
    int i;

    for (i = 1; i <= 10; i++)

      printf("%d", i);
}

C++ (1983)

#include <iostream>

int main()
{
    for (unsigned int i = 1; i <= 10; i++)
        std::cout << i << std::endl;

    return 0;
}

C# (2000)

using System;

class Program
{
    public static void Main(string[] args)
    {
        for (int i = 1; i <= 10; i++)
            Console.WriteLine(i);
    }
}

Python (2000)

for i in range(1, 11):
    print(str(i))

From this small sampling, a few things are apparent.

First, the programs vary in length. Python is the shortest with only two lines. It's also a recent language, so are languages becoming more terse over time? Not really, as FORTRAN and BASIC are the next shortest languages (with 5 and 4 lines, respectively) and C#, a contemporary of Python, requires 10 lines.

Second, the formatting of program statements has changed. FORTRAN and BASIC, the earliest languages in this set, have strong notions about columns and lines. FORTRAN limits line length to 72 characters, reserves the first five columns for statement labels, and uses a sixth column for continuation characters (which allow statements to exceed the 72-character limit). BASIC relaxed the column restrictions but added the requirement of a line number on each line. Pascal, C, C++, and C# care nothing about columns and lines, looking at tokens in the code to separate statements. Python relies on indentation to define blocks.

Some languages (BASIC, C#) are capable of printing things by simply mentioning them. Other languages need format specifications. FORTRAN has FORMAT statements to specify the exact format of the output. C has a printf() function that needs similar formatting information. I find the mechanisms of BASIC and C# easier to use than the mechanisms of C and Python.

Let's consider a somewhat more complex program, one that lists a set of prime numbers. We'll look at BASIC and Lua, which span the computing age.

Lua

local N = 100
local M = 10
function PRIME()  -- PROCEDURE DECLARATION;
  local X, SQUARE, I, K, LIM, PRIM -- DECLARATION OF VARIABLES;
  local P, V = {}, {}
  P[1] = 2 -- ASSIGNMENT TO FIRST ELEMENT OF p;
  print(2) -- OUTPUT A LINE CONTAINING THE NUMBER 2;
  X = 1
  LIM = 1
  SQUARE = 4
  for I = 2, N do -- LOOP. I TAKES ON 2, 3, ... N;
    repeat -- STOPS WHEN "UNTIL" CONDITION IS TRUE;
      X = X + 2
      if SQUARE <= X then
        V[LIM] = SQUARE
        LIM = LIM + 1
        SQUARE = P[LIM] * P[LIM]
      end
      local K = 2
      local PRIM = true
      while PRIM and K < LIM do
        if V[K] < X then
          V[K] = V[K] + P[K]
        end
        PRIM = X ~= V[K]
        K = K + 1
      end
    until PRIM -- THIS LINE CLOSES THE REPEAT
    P[I] = X
    print(X)
  end
end
PRIME()


BASIC

100 LET N = 100
110 LET M = 10
200 DIM P(100), V(100)
300 LET P(1) = 2
310 PRINT P(1)
320 LET X = 1
330 LET L = 1
340 LET S = 4
350 FOR I = 2 TO N
360  REM    repeat -- STOPS WHEN "UNTIL" CONDITION IS TRUE;
370   LET X = X + 2
380   IF S > X THEN 420
390    LET V(L) = S
400    LET L = L + 1
410    LET S = P(L)^2
420   REM
430   LET K = 2
440   LET P = 1
450   REM while PRIM and K < LIM do
455   IF P <> 1 THEN 520
460   IF K >= L THEN 520
470    IF V(K) >= X THEN 490
480     LET V(K) = V(K) + P(K)
490    REM
500    LET P = 0
501    IF X = V(K) THEN 510
502    LET P = 1
510    LET K = K + 1
515   GOTO 450
520  IF P <> 1 THEN 360
530  LET P(I) = X
540  PRINT X
550 NEXT I
999 END

Both programs work. They produce identical output.  The version in Lua may be a bit easier to read, given that variable names can be more than a single letter.

They are about the same size and complexity. Two versions of a program, one from today and one from the early years of computing, yet they have similar complexity.

Programming languages encode operations into pseudo-English instructions. If the measure of a programming language's capability is its capacity to represent operations in a minimal number of steps, then this example suggests that programming languages have changed little over the past five decades.

Caution is advised. These examples (printing a short sequence of numbers and calculating prime numbers) may be poor representatives of typical programs. Perhaps we should withhold judgement until we consider more (and larger) programs. After all, very few people in 2016 use BASIC; there must be a reason developers have selected modern languages.

Perhaps it is better to keep asking the question and examining our criteria for programming languages.