Thursday, July 31, 2014

Not so special

The history of computers has been the history of things becoming not special.

First were the mainframes. Large, expensive computers ordered, constructed, delivered, and used as a single entity. Only governments and wealthy corporations could own (or lease) a computer. Once acquired, the device was a singleton: it was "the computer". It was special.

Minicomputers reduced the specialness of computers. Instead of a single computer, a company (or a university) could purchase several minicomputers. Computers were no longer single entities in the organization. Instead of "the computer" we had "the computer for accounting" or "the computer for the physics department".

The opposite of "special" is "commodity", and personal computers brought us into a world of commodity computers. A company could have hundreds (or thousands) of computers, all identical.

Yet some computers retained their specialness. E-mail servers were singletons -- and therefore special. Web servers were special. Database servers were special.

Cloud computing reduces specialness again. With cloud systems, we can create virtual systems on demand, from pre-stocked images. We can store an image of a web server and when needed, instantiate a copy and start using it. We have not a single web server but as many as we need. The same holds for database servers. (Of course, cloud systems are designed to use multiple web servers and multiple database servers.)

In the end, specialness goes away. Computers, all computers, become commodities. They are not special.

Monday, July 28, 2014

Improving code can cause an explosion of classes

Object-oriented programming took the world by storm in the 1990s. Those early days saw a lot of programmers learning new skills.

It took some time to truly learn object-oriented programming. The jump from structured programming (or procedural code) to object-oriented programming was not small. (And is still not small.)

Many early attempts at object-oriented programming were inelegant, if not amateurish. They contained mistakes, though the errors are visible only in hindsight. Programmers who are inexperienced in a new technique make mistakes. (I'm one of them.)

Common problems were:

  • large classes with many purposes
  • long functions (procedural code wrapped in object clothing)
  • excessive inheritance
  • too little inheritance
  • weak encapsulation (little or no use of access control)
  • little or no composition

Legacy systems often contain these problems. More than three decades after their inception, these systems still contain their original design flaws. The problems remain because they are difficult to correct and the return on the investment is unclear. I often argue that a better design reduces future maintenance costs; the counter-argument is that the current development team knows the code and would gain little from an improved design.

When I can convince the system owners of the benefits of improved code (and I am becoming more convincing over time), we see a remarkable transformation in the code.

The most obvious change is in the number of classes. The revised system contains many more classes, often several times the original number. Yet while the number of classes increases, the total number of lines of code decreases. The construction of new classes allows for the consolidation of duplicate code, something that occurs often in legacy systems.

The new classes are usually small. Instead of the large, multipurpose classes of the earlier design, I move functions into small, single-purpose classes. Some classes are mere data containers; others hold one or two elements and provide a small number of functions on those elements. While small, these classes have a big effect on the readability of the code: they remove low-level operations from high-level and mid-level code, allowing the reader to focus on the higher-level operations.
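To make that concrete, here is a hypothetical sketch (the class and member names are my own invention, not taken from any particular system): a low-level date calculation is pulled out of a large, multipurpose class and into a small, single-purpose class.

    #include <string>

    // Before: a large, multipurpose class mixed billing logic with
    // low-level date arithmetic (the names here are hypothetical):
    //
    //     class Invoice {
    //     public:
    //         bool IsOverdue() const { /* twenty lines of date math */ }
    //         // ... dozens of other member functions ...
    //     };

    // After: the date logic moves to a small, single-purpose class.
    class DateRange {
    public:
        DateRange(long start_day, long end_day)
            : start_day_(start_day), end_day_(end_day) {}

        bool Contains(long day) const {
            return day >= start_day_ && day <= end_day_;
        }

        long Length() const { return end_day_ - start_day_; }

    private:
        long start_day_;   // days since some agreed-upon epoch
        long end_day_;
    };

    // The large class now reads at a higher level, and the date logic
    // can replace the copies of it scattered through the legacy code.
    class Invoice {
    public:
        Invoice(const std::string& customer, const DateRange& terms)
            : customer_(customer), terms_(terms) {}

        bool IsOverdue(long today) const { return !terms_.Contains(today); }

    private:
        std::string customer_;
        DateRange terms_;
    };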

Small classes are much easier to test, and much easier to test with automated tools. Even C++ can use automated tests to verify the operation of classes. Automated tests relieve a burden from developers (and testers or "QA" folk) and allow them to direct their efforts to building and maintaining meaningful tests.
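A minimal sketch of such a test, using nothing but the standard library (a real project might adopt a framework such as Google Test or Catch; the DateRange class is the hypothetical one from the sketch above, repeated here so the example stands alone):

    #include <cassert>

    // The hypothetical single-purpose class under test.
    class DateRange {
    public:
        DateRange(long start_day, long end_day)
            : start_day_(start_day), end_day_(end_day) {}
        bool Contains(long day) const {
            return day >= start_day_ && day <= end_day_;
        }
    private:
        long start_day_;
        long end_day_;
    };

    int main() {
        // A small class can be tested in isolation: no database,
        // no user interface, no larger application required.
        DateRange terms(100, 130);

        assert(terms.Contains(100));    // boundary: first day
        assert(terms.Contains(130));    // boundary: last day
        assert(!terms.Contains(131));   // just outside the range

        return 0;   // all assertions passed
    }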

A large number of small classes provides an additional benefit: the ability to group classes into libraries. Large (or large-ish) early object-oriented systems tend to group all of the classes into a single package, usually called "the application". With a large number of classes, the system maintainers see groups of classes emerge (perhaps all of the database classes, or all of the elementary data classes). These groupings can be formalized with libraries. For very large projects, these libraries can be maintained by different teams. Libraries can also be shared across multiple projects, reducing the duplication of effort at a larger scale.
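As a hedged sketch of what formalizing one of those groupings might look like in C++ (the namespace, file name, and class are illustrative, not taken from any actual system), related classes can be collected under a namespace in their own headers and built as a separate library:

    // dataclasses/date_range.h -- one header in a hypothetical library
    // of elementary data classes, built separately from the application.
    #ifndef DATACLASSES_DATE_RANGE_H
    #define DATACLASSES_DATE_RANGE_H

    namespace dataclasses {

    class DateRange {
    public:
        DateRange(long start_day, long end_day)
            : start_day_(start_day), end_day_(end_day) {}
        bool Contains(long day) const {
            return day >= start_day_ && day <= end_day_;
        }
    private:
        long start_day_;
        long end_day_;
    };

    }  // namespace dataclasses

    #endif  // DATACLASSES_DATE_RANGE_H

A separate team could maintain such a library on its own schedule, and other projects could link against it instead of re-implementing the same classes.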

Modernizing legacy systems can lead to an "explosion of classes", and this can be a good thing. Smaller classes are easier to understand and maintain. They can be tested independently. They can be grouped into libraries. Do not fear such an increase in the number of classes in your code.

Wednesday, July 23, 2014

Waterfall caused specialization; agile causes generalization

There are a number of differences between waterfall processes and agile processes. Waterfall defines one long, segmented process; agile uses a series of short iterations. Waterfall specifies a product on a specific date; agile guarantees a shippable product throughout the development process.

Another difference between waterfall and agile is the specialization of the participants. Waterfall projects are divided into phases: analysis, development, testing, and so on, and these phases tend to be long. A waterfall project may run for six months, a year, or several years, and its phases may extend for months, or possibly years. Agile projects have the same activities (analysis, development, testing) but on a much shorter timeframe.

The segmentation of a waterfall project leads to specialization of the participants. It is common to find a waterfall project staffed by analysts, developers, and testers, each a distinct team with its own management. The different teams use tools specific to their tasks: requirements databases for the analysts, compilers and integrated development environments for the developers, and test automation software and test case management systems for the testers.

This specialization is possible due to the long phases of waterfall projects. It is reasonable to have team "A" work on the requirements for a project and then (when the requirements are deemed complete) assign team "B" to the development phase. While team "B" develops the project, team "A" can compose the requirements for another project. Specialization allows for efficient scheduling of personnel.

Agile processes, in contrast, have short iterations of analysis, design, coding, and testing. Instead of months (or years), an iteration may be one or two weeks. With such a short period of time, it makes little sense to have multiple teams work on different aspects. The transfer of knowledge from one team to another, a small task in a multi-year project, is a large effort in a two-week iteration. Such overhead is not practical on a short timescale; the better approach is for a single team to perform all of the tasks. (Also, the two-week iteration is not divided into a neat linear sequence of analysis, design, development, and test. All of those activities occur throughout the iteration, with multiple instances of each.)

A successful agile process needs people who can perform all of the tasks. It needs not specialists but generalists.

Years of waterfall projects have trained people and companies into thinking that specialists are more efficient than generalists. (There may be a bit of Taylorism here, too.) Such thinking is so pervasive that one finds specialization in the company's job descriptions. One can find specific job descriptions for business analysts, developers, and testers (or "QA Analysts").

The shift to agile projects will lead to a re-thinking of specialization. Generalists will be desired; specialists will find it difficult to fit in. Eventually, generalists will become the norm and specialists will become, well, special. Even the job descriptions will change, with the dominant roles being "development team members" with skills in all areas and a few specialist roles for special tasks.

Monday, July 14, 2014

Spreadsheets can help us learn functional programming

Spreadsheets are quite possibly the worst way to learn programming skills. And they may also be the best way to learn the next "wave" of programming skills. A contradiction? Perhaps.

First, by "spreadsheets" I mean the cell grid and its formulas. I am omitting Visual Basic for Applications (VBA) code which can accompany Microsoft Excel sheets.

As a programming environment, spreadsheets are capable and flexible. They let you assemble a set of data and formulas into a meaningful arrangement. They let you format the data. And they provide immediate feedback: the results of a change appear as soon as you make it.

Spreadsheets also violate many of the generally accepted principles of program design. They mix input, data, calculation, and output. They have no mechanisms for structuring calculations or encapsulating data. They have no way to isolate data; everything is "global", and any cell can be used by any other cell.

The lack of structural elements means that spreadsheets tend to "scale up" poorly. A small set of data is easily handled. A somewhat larger set of data (if it is the same type of data) is also manageable. A larger collection of different types of data becomes a challenge. Even with multi-page spreadsheets, one starts allocating regions of a sheet for certain data and certain calculations. These regions become problematic as they grow -- especially if they grow at different rates.

There is no way to condense similar calculations. If ten cells (or one hundred cells) all perform the same operation, they must all contain the same formula. Internally, the spreadsheet may optimize memory usage, but from the "programmer's" point of view, the formulas are repeated. If the general formula must change, it must change in all the cells. (While it is easy to change the formula in one cell and then replicate it to the other cells, it is not always easy to identify which other cells use that formula.)

Spreadsheets offer nothing in the way of a high-level view. Everything is viewed at the cell level: to examine a formula, you must look at the specific cell that contains the formula.

So spreadsheets offer power and immediate feedback, two important aspects of programming. Yet they lack the concepts of structured programming (subroutines, control blocks) and the concepts of object-oriented programming (custom types, encapsulation, inheritance).

With all of these omissions, how can spreadsheets be a good way to learn the next programming style?

The answer is functions.

The next wave of programming (as I see it) is functional programming. With functional programming, one defines and uses functions, and functions are first-class constructs of the language. Functions can be passed as arguments to other functions. They can be constructed by functions and evaluated by functions. The change from object-oriented programming to functional programming is as large as (and maybe larger than) the change from structured programming to object-oriented programming.
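A small C++ illustration of those ideas (the function names are invented for the example): one function is passed as an argument to another, and a third function constructs and returns a new function.

    #include <functional>
    #include <iostream>
    #include <vector>

    // A higher-order function: it accepts another function as an argument.
    double sum_of(const std::vector<double>& values,
                  const std::function<double(double)>& transform) {
        double total = 0.0;
        for (double v : values) {
            total += transform(v);
        }
        return total;
    }

    // A function that constructs and returns a new function.
    std::function<double(double)> make_scaler(double factor) {
        return [factor](double x) { return x * factor; };
    }

    int main() {
        std::vector<double> values = {1.0, 2.0, 3.0};

        // Pass a lambda as an argument.
        double sum_of_squares = sum_of(values, [](double x) { return x * x; });

        // Construct a function "on the fly" and pass it along.
        double doubled_sum = sum_of(values, make_scaler(2.0));

        std::cout << sum_of_squares << " " << doubled_sum << "\n";  // prints 14 12
        return 0;
    }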

Spreadsheets can help us learn functional programming because spreadsheets (the core, non-VBA version of spreadsheets) are all about functions. Every cell contains the result of a function. Once a cell's value is defined, it does not change. (Changing cells in the spreadsheet and pressing the "recalc" button is, in essence, modifying the program and re-executing it.)
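To make the analogy concrete, here is a hedged sketch that models a few spreadsheet cells in C++: each "cell" is a pure expression defined once in terms of other cells, and "recalculation" is simply re-evaluation with different inputs. (The cell names mimic spreadsheet conventions; the code is purely illustrative.)

    #include <iostream>

    // A tiny "worksheet": each cell is a pure function of its inputs,
    // defined once and never modified, much like a spreadsheet formula.
    struct Sheet {
        double a1;                                   // input cell
        double a2;                                   // input cell
        double b1() const { return a1 + a2; }        // =A1+A2
        double b2() const { return b1() * 0.08; }    // =B1*0.08 (a tax rate, say)
        double b3() const { return b1() + b2(); }    // =B1+B2
    };

    int main() {
        Sheet s{100.0, 50.0};
        std::cout << s.b3() << "\n";    // 162: the "displayed" result

        // "Recalc" is just re-evaluation with different inputs;
        // no cell is ever modified in place.
        Sheet t{200.0, 50.0};
        std::cout << t.b3() << "\n";    // 270
        return 0;
    }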

Now, the comparison is not complete. Functional programming lets you pass functions as arguments to other functions and lets you build functions "on the fly", and spreadsheets let you do neither. So designing a spreadsheet is not the same as programming in a functional language.

But programming spreadsheets is a start. It is a jumping-off point. It is an introduction to some of the concepts of functional programming.

If you want to learn functional programming, perhaps a good place to start is with your local spreadsheet. Turn off (or ignore) the VBA or macro programming. Stick with cells, values, and functions. Avoid the "optimize" or "search for result" capabilities. Design spreadsheets that compute things that are easy in "real" programming languages. You may be stuck at first, given the constraints of spreadsheet calculations. But keep at it. You will learn techniques that can help you with the next wave of programming.

Tuesday, July 8, 2014

The center of the universe is moving

The real universe, the one in which we live, with its planets and solar systems and galaxies, has no center. It is "finite but unbounded", which sounds a bit strange until you realize that the surface of the Earth is also finite but unbounded. There is no edge of the Earth, no end, no boundary. Yet it has a finite size. (The Earth as a planet has a center, but the surface of the Earth does not.)

The IT universe does have centers. For decades, the center of the hardware universe has been the desktop PC and the center of the software universe has been Microsoft Windows and applications for Windows.

That is changing. Windows is no longer the software center of the IT universe. The desktop PC is no longer the hardware center of the IT universe.

The center of the IT universe for consumers has shifted to Apple and Google. The popularity of the iPad, the iPhone, and Android phones shows this. Individuals are happy to purchase these devices. PCs, in contrast, are purchased grudgingly. The purchase of a PC does not instill excitement but resentment.

The center of the IT universe for enterprises remains close to PCs and Microsoft Windows, but it too is moving to cloud computing and mobile devices. Microsoft recognizes this; it has been expanding its Azure cloud services and selling tablets and phones. While it has had little success with mobile devices, it does enjoy some with cloud services. Microsoft is supporting multiple operating systems; its Office products now run on Apple iPads and Android devices.

What does this change mean for the rest of us?

Well, for consumers it means that we will see more options. Instead of the old world of "Windows-only applications running on Microsoft Windows on desktops or laptops", we will see services on Azure available on the device of our choosing.

For enterprises, the same options will appear. This fits in with the "Bring Your Own Device" philosophy, which shifts the costs of hardware from employers to employees.

For developers, the picture is more complex. The old method of developing an application (especially an enterprise application) for Windows only (because Windows was the center of the universe) must give way to a process that develops applications for multiple platforms. The new development paradigm is mobile/cloud: apps for multiple client devices backed by a solid cloud design.

Microsoft is supporting this new paradigm. Azure supports non-Microsoft products such as Linux. Visual Studio supports non-Microsoft products such as Git, and now targets iOS and Android in addition to Windows.

Almost overnight, the modern Windows-only applications have graduated to the status of legacy systems.

Thursday, July 3, 2014

Bring back "minicomputer"

The term "minicomputer" is making a comeback.

Late last year, I attended a technical presentation in which the speaker referred to his smart phone as a "minicomputer".

This month, I read an article on a magazine's website that used the term "minicomputer" to describe an ARM device for testing Android version L.

Neither of these devices is a minicomputer.

The term "minicomputer" was coined in the mainframe era, when all computers (well, all electronic computers) were large, required special rooms with dedicated air conditioning, and were attended by a team of operators and field engineers. Minicomputers were smaller, being about the size of a refrigerator and needing only one or two people to care for them. Revolutionary at the time, minicomputers allowed corporate and college departments set up their own computing environments.

I suspect that the term "mainframe" came into existence only after minicomputers obtained a noticeable presence.

In the late 1970s, the term "microcomputer" was used to describe the early personal computers (the Altair 8800, the IMSAI 8080, the Radio Shack TRS-80). But back to minicomputers.

For me and many others, the term "minicomputer" will always represent the department-sized computers made by Digital Equipment Corporation or Data General. But am I being selfish? Do I have the right to lock the term "minicomputer" to that definition?

Upon consideration, the idea of re-introducing the term "minicomputer" may be reasonable. We don't use the term today. Computers are mainframes (that term is still in use), servers, desktops, laptops, tablets, phones, phablets, or ... whatever the open-board Arduino and Raspberry Pi devices are called. So the term "minicomputer" has been, in a sense, abandoned. As an abandoned term, it can be re-purposed.

But what devices should be tagged as minicomputers? The root "mini" implies small, as it does in "minimum" or "minimize". A "minicomputer" should therefore be "smaller than a (typical) computer".

What is a typical computer? In the 1960s, it was the large mainframe. And while mainframes exist today, one can hardly argue that they are typical: laptops, tablets, and phones all outsell them. Embedded systems, found in cars, microwave ovens, and cameras, are probably the most common form of computing device, but I consider them out of the running. First, they are already small, and a smaller computer would be small indeed. Second, most people use those devices without thinking about the computer inside. They use a car, not a "car equipped with onboard computers".

So a minicomputer is something smaller than a desktop PC, a laptop PC, a tablet, or a smartphone.

I'm leaning towards the bare-board computers: the Arduino, the BeagleBone, the Raspberry Pi, and their brethren. These are all small computers in the physical sense, smaller than desktops and laptops. They are also small in power: they typically have low-end processors and limited memory and storage, so they are "smaller" (that is, less capable) than a smartphone.

The open-board computers (excuse me, minicomputers) are also a very small portion of the market, just as their refrigerator-sized namesakes were.

Let's go have some fun with minicomputers!