Monday, July 28, 2014

Improving code can cause an explosion of classes

Object-oriented programming took the world by storm in the 1990s. Those early days saw a lot of programmers learning new skills.

It took some time to truly learn object-oriented programming. The jump from structured programming (or procedural code) to object-oriented programming was not small. (And is still not small.)

Many early attempts at object-oriented programming were inelegant if not amateurish. They contained mistakes, but the errors are only visible in hindsight. Programmers who are inexperienced in a new technique make mistakes. (I'm one of them.)

Common problems were:

  • large classes with many purposes
  • long functions (procedural code wrapped in object clothing)
  • excessive inheritance
  • too little inheritance
  • weak encapsulation (little or no use of access control)
  • little or no composition

Legacy systems often contain these problems. Two decades after their inception, systems contain original design flaws. The problems remain because they are difficult to correct and the return on the investment is unclear. I often argue that a better design reduces maintenance costs in the future, and the counter-argument is that the current development team knows the code and would gain little from an improved design.

When I can convince the system owners of the benefits of improved code (and I am becoming more convincing over time), we see a remarkable transformation in the code.

The most obvious change is in the number of classes. The revised system contains many more classes, often several times the original number. Yet while the number of classes increases, the total number of lines of code decreases. The construction of new classes allows for the consolidation of duplicate code, something that occurs often in legacy systems.

The new classes are usually small. Instead of the large, multipurpose classes of the earlier design, I move functions to small, single-purpose classes. Some classes are mere data containers, others hold one or two elements and provide a small number of functions on those elements. While small, these classes have a big effect on the readability of the code: they eliminate low-level operations from high-level and mid-level code, allowing the reader to focus on the higher level operations.
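
As a rough illustration of such a small, single-purpose class, here is a sketch in Python. The names (DateRange, contains, days, is_billable) are hypothetical, made up for this example, and not taken from any particular system.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class DateRange:
    """Hypothetical small class: holds two elements and a few operations."""
    start: date
    end: date

    def contains(self, day: date) -> bool:
        # Inclusive on both ends.
        return self.start <= day <= self.end

    def days(self) -> int:
        # Number of calendar days covered, counting both endpoints.
        return (self.end - self.start).days + 1


def is_billable(day: date, contract: DateRange) -> bool:
    # Mid-level code reads in terms of the concept, not the comparisons.
    return contract.contains(day)
```

The date comparisons live in one place, so the calling code no longer carries that low-level detail.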

Small classes are much easier to test, and much easier to test with automated tools. Even C++ can use automated tests to verify the operation of classes. Automated tests relieve a burden from developers (and testers or "QA" folk) and allow them to direct their efforts to building and maintaining meaningful tests.
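
To give a sense of how little is needed to test a small class, here is a minimal sketch using Python's standard unittest module. The Money class is hypothetical and exists only for this example; the same idea applies to C++ with a test framework of your choice.

```python
import unittest


class Money:
    """Hypothetical small class: an amount stored as whole cents."""

    def __init__(self, cents: int):
        self.cents = cents

    def add(self, other: "Money") -> "Money":
        # Returns a new object; neither operand is modified.
        return Money(self.cents + other.cents)

    def __eq__(self, other):
        return isinstance(other, Money) and self.cents == other.cents


class MoneyTest(unittest.TestCase):
    def test_add_combines_amounts(self):
        self.assertEqual(Money(150).add(Money(250)), Money(400))

    def test_add_leaves_operands_unchanged(self):
        a = Money(150)
        a.add(Money(250))
        self.assertEqual(a.cents, 150)
```

Saved in a file, these tests can be run with "python -m unittest", with no manual steps.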

A large number of small classes provides an additional benefit: the ability to group classes into libraries. Large (or large-ish) early object-oriented systems tend to group all of the classes into a single package, usually called "the application". With a large number of classes, the system maintainers see groups of classes emerge (perhaps all of the database classes, or all of the elementary data classes). These groupings can be formalized with libraries. For very large projects, these libraries can be maintained by different teams. Libraries can also be shared across multiple projects, reducing the duplication of effort at a larger scale.

Modernizing legacy systems can lead to an "explosion of classes", and this can be a good thing. Smaller classes are easier to understand and maintain. They can be tested independently. They can be grouped into libraries. Do not fear such an increase in the number of classes in your code.

Wednesday, July 23, 2014

Waterfall caused specialization; agile causes generalization

There are a number of differences between waterfall processes and agile processes. Waterfall defines one long, segmented process; agile uses a series of short iterations. Waterfall specifies a product on a specific date; agile guarantees a shippable product throughout the development process.

Another difference between waterfall and agile is the specialization of participants. Waterfall projects are divided into phases (analysis, development, testing, etc.), and these phases tend to be long themselves. A waterfall project may extend for six months, or a year, or several years, and the phases of those projects may extend for months -- or possibly years. Agile projects have the same activities (analysis, development, testing) but on a much shorter timeframe.

The segmentation of a waterfall project leads to specialization of the participants. It is common to find a waterfall project staffed by analysts, developers, and testers, each a distinct team with distinct management teams. The different teams use tools specific to their tasks: databases for requirements, compilers and integrated development environments, test automation software and test case management systems.

This specialization is possible due to the long phases of waterfall projects. It is reasonable to have team "A" work on the requirements for a project and then (when the requirements are deemed complete) assign team "B" to the development phase. While team "B" develops the project, team "A" can compose the requirements for another project. Specialization allows for efficient scheduling of personnel.

Agile processes, in contrast, have short iterations of analysis, design, coding, and testing. Instead of months (or years), an iteration may be one or two weeks. With such a short period of time, it makes little sense to have multiple teams work on different aspects. The transfer of knowledge from one team to another, a small task in a multi-year project, is a large effort on a two-week iteration. Such inefficiencies are not practical for short projects, and the better approach is for a single team to perform all of the tasks. (Also, the two-week iteration is not divided into a neat linear sequence of analysis, design, development, and test. All activities occur throughout the iteration, with multiple instances of each task.)

A successful agile process needs people who can perform all of the tasks. It needs not specialists but generalists.

Years of waterfall projects have trained people and companies into thinking that specialists are more efficient than generalists. (There may be a bit of Taylorism here, too.) Such thinking is so pervasive that one finds specialization in the company's job descriptions. One can find specific job descriptions for business analysts, developers, and testers (or "QA Analysts").

The shift to agile projects will lead to a re-thinking of specialization. Generalists will be desired; specialists will find it difficult to fit in. Eventually, generalists will become the norm and specialists will become, well, special. Even the job descriptions will change, with the dominant roles being "development team members" with skills in all areas and a few specialist roles for special tasks.

Monday, July 14, 2014

Spreadsheets can help us learn functional programming

Spreadsheets are quite possibly the worst way to learn programming skills. And they may also be the best way to learn the next "wave" of programming skills. A contradiction? Perhaps.

First, by "spreadsheets" I mean the cell grid and its formulas. I am omitting Visual Basic for Applications (VBA) code which can accompany Microsoft Excel sheets.

Spreadsheets as a programming environment are capable and flexible. They let one assemble a set of data and formulas into a meaningful arrangement. They let you format the data. They provide immediate feedback, with the results of changes displayed immediately.

Spreadsheets also violate a lot of the generally accepted principles of program design. They mix input, data, calculation, and output. They have no mechanisms for structuring calculations or encapsulating data. They have no way to isolate data; everything is "global" and any cell can be used by any other cell.

The lack of structural elements means that spreadsheets tend to "scale up" poorly. A small set of data is easily handled. A somewhat larger set of data (if it is the same type of data) is also manageable. A larger collection of different types of data becomes a challenge. Even with multi-page spreadsheets, one starts allocating regions of a sheet for certain data and certain calculations. These regions become problematic as they grow -- especially if they grow at different rates.

There is no way to condense similar calculations. If ten cells (or one hundred cells) all perform the same operation, they must all contain the same formula. Internally, the spreadsheet may optimize memory usage, but from the "programmer's" point of view, the formulas are repeated. If the general formula must change, it must change in all the cells. (While it is easy to change the formula in one cell and then replicate it to the other cells, it is not always easy to identify which other cells use that formula.)

Spreadsheets offer nothing in the way of a high-level view. Everything is viewed at the cell level: to examine a formula, you must look at the specific cell that contains the formula.

So spreadsheets offer power and immediate feedback, two important aspects of programming. Yet they lack the concepts of structured programming (subroutines, control blocks) and the concepts of object-oriented programming (custom types, encapsulation, inheritance).

With all of these omissions, how can spreadsheets be a good way to learn the next programming style?

The answer is functions.

The next wave of programming (as I see it) is functional programming. With functional programming, one defines and uses functions, and functions are first-class constructs of the language. Functions can be passed as arguments to other functions. They can be constructed by functions, and evaluated by functions. The change from object-oriented programming to functional programming is as large as (and maybe larger than) the change from structured programming to object-oriented programming.
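
For readers who have not seen first-class functions, a minimal sketch in Python may help. The names apply_twice and make_adder are made up for this illustration.

```python
from typing import Callable


def apply_twice(f: Callable[[int], int], x: int) -> int:
    # A function passed as an argument to another function.
    return f(f(x))


def make_adder(n: int) -> Callable[[int], int]:
    # A function constructed by another function.
    def add_n(x: int) -> int:
        return x + n
    return add_n


add_three = make_adder(3)
result = apply_twice(add_three, 10)  # add_three(add_three(10)) == 16
```

Here add_three is a value like any other: it is built at run time, stored in a variable, and handed to apply_twice.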

Spreadsheets can help us learn functional programming because spreadsheets (the core, non-VBA version of spreadsheets) are all about functions. Every cell contains the result of a function. Once a cell's value is defined, it does not change. (Changing cells in the spreadsheet and pressing the "recalc" button is, in essence, modifying the program and re-executing it.)
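
One way to picture this is to model a tiny sheet in code. The sketch below is only an analogy, not how any real spreadsheet is implemented; the cell names and the recalc helper are assumptions made for the example, and it does not detect circular references.

```python
# Each "cell" is either a constant or a formula over other cells.
sheet = {
    "A1": 100,
    "A2": 250,
    "A3": lambda get: get("A1") + get("A2"),  # like =A1+A2
    "A4": lambda get: get("A3") * 2,          # like =A3*2
}


def recalc(cells: dict) -> dict:
    """Evaluate every cell once; a computed value is never reassigned."""
    values = {}

    def get(name):
        if name not in values:
            cell = cells[name]
            values[name] = cell(get) if callable(cell) else cell
        return values[name]

    return {name: get(name) for name in cells}


first = recalc(sheet)
# "Editing" A1 builds a new program, which is then re-executed.
second = recalc({**sheet, "A1": 1})
```

The original sheet is untouched by the second calculation; the edit produces a new set of definitions rather than changing values in place.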

Now, the comparison is not complete. Functional programming lets you pass functions as arguments to other functions and lets you build functions "on the fly", and spreadsheets let you do neither. So designing a spreadsheet is not the same as programming in a functional language.

But programming spreadsheets is a start. It is a jumping-off point. It is an introduction to some of the concepts of functional programming.

If you want to learn functional programming, perhaps a good place to start is with your local spreadsheet. Turn off (or ignore) the VBA or macro programming. Stick with cells, values, and functions. Avoid the "optimize" or "search for result" capabilities. Design spreadsheets that compute things that are easy in "real" programming languages. You may be stuck at first, given the constraints of spreadsheet calculations. But keep at it. You will learn techniques that can help you with the next wave of programming.

Tuesday, July 8, 2014

The center of the universe is moving

The real universe, the one in which we live, with its planets and solar systems and galaxies, has no center. It is "finite but unbounded" which sounds a bit strange until you realize that the surface of the Earth is also finite but unbounded. There is no edge of the Earth, no end, no boundary. Yet it has a finite size. (The Earth as a planet has a center, but the surface of the Earth does not.)

The IT universe does have centers. For decades, the center of the hardware universe has been the desktop PC and the center of the software universe has been Microsoft Windows and applications for Windows.

That is changing. Windows is no longer the software center of the IT universe. The desktop PC is no longer the hardware center of the IT universe.

The center of the IT universe for consumers has shifted to Apple and Google. The popularity of the iPad, the iPhone, and Android phones shows this. Individuals are happy to purchase these devices. PCs, in contrast, are purchased grudgingly. The purchase of a PC does not instill excitement but resentment.

The center of the IT universe for enterprises remains close to PCs and Microsoft Windows, but it too is moving to cloud computing and mobile devices. Microsoft recognizes this; it has been expanding its Azure cloud services and selling tablets and phones. While it has had little success with mobile devices, it does enjoy some with cloud services. Microsoft is supporting multiple operating systems; its Office products now run on Apple iPads and Android devices.

What does this change mean for the rest of us?

Well, for consumers it means that we will see more options. Instead of the old world of "Windows-only applications running on Microsoft Windows on desktops or laptops", we will see services on Azure available on the device of our choosing.

For enterprises, the same options will appear. This fits in with the "Bring Your Own Device" philosophy, which shifts the costs of hardware from employers to employees.

For developers, the picture is more complex. The old method of developing an application (especially an enterprise application) for Windows only (because Windows was the center of the universe) must give way to a process that develops applications for multiple platforms. The new development paradigm must be mobile/cloud with multiple cloud apps and a solid cloud design.

Microsoft is supporting this new paradigm. Azure supports non-Microsoft products such as Linux. Visual Studio supports non-Microsoft products such as Git, and now targets iOS and Android in addition to Windows.

Almost overnight, the modern Windows-only applications have graduated to the status of legacy systems.

Thursday, July 3, 2014

Bring back "minicomputer"

The term "minicomputer" is making a comeback.

Late last year, I attended a technical presentation in which the speaker referred to his smart phone as a "minicomputer".

This month, I read a magazine website that used the term minicomputer, referring to an ARM device for testing Android version L.

Neither of these devices is a minicomputer.

The term "minicomputer" was coined in the mainframe era, when all computers (well, all electronic computers) were large, required special rooms with dedicated air conditioning, and were attended by a team of operators and field engineers. Minicomputers were smaller, being about the size of a refrigerator and needing only one or two people to care for them. Revolutionary at the time, minicomputers allowed corporate and college departments to set up their own computing environments.

I suspect that the term "mainframe" came into existence only after minicomputers obtained a noticeable presence.

In the late 1970s, the term "microcomputer" was used to describe the early personal computers (the Altair 8800, the IMSAI 8080, the Radio Shack TRS-80). But back to minicomputers.

For me and many others, the term "minicomputer" will always represent the department-sized computers made by Digital Equipment Corporation or Data General. But am I being selfish? Do I have the right to lock the term "minicomputer" to that definition?

Upon consideration, the idea of re-introducing the term "minicomputer" may be reasonable. We don't use the term today. Computers are mainframes (that term is still in use), servers, desktops, laptops, tablets, phones, phablets, or ... whatever the open-board Arduino and Raspberry Pi devices are called. So the term "minicomputer" has been, in a sense, abandoned. As an abandoned term, it can be re-purposed.

But what devices should be tagged as minicomputers? The root "mini" implies small, as it does in "minimum" or "minimize". A "minicomputer" should therefore be "smaller than a (typical) computer".

What is a typical computer? In the 1960s, they were the large mainframes. And while mainframes exist today, one can hardly argue that they are typical: laptops, tablets, and phones are all outselling them. Embedded systems, existing in cars, microwave ovens, and cameras, are probably the most common form of computing device, but I consider them out of the running. First, they are already small and a smaller computer would be small indeed. Second, most people use those devices without thinking about the computer inside. They use a car, not a "car equipped with onboard computers".

So a minicomputer is something smaller than a desktop PC, a laptop PC, a tablet, or a smartphone.

I'm leaning towards the bare-board computers: the Arduino, the BeagleBone, the Raspberry Pi, and their brethren. These are all small computers in the physical sense, smaller than desktops and laptops. They are also small in power; typically they have low-end processors and limited memory and storage, so they are "smaller" (that is, less capable) than a smartphone.

The open-board computers (excuse me, minicomputers) are also a very small portion of the market, just as their refrigerator-sized namesakes were.

Let's go have some fun with minicomputers!

Monday, June 30, 2014

Outsource with open source technologies

In the closed-source world, the market encourages duplicate efforts. Lotus creates and sells a spreadsheet, Borland creates and sells a spreadsheet, Microsoft creates and sells a spreadsheet... you get the idea. Each vendor can differentiate their product and make a profit. Vendors keep their source code closed, so each company must create their own spreadsheet from scratch.

The open source world is different. There is no need to create a competing product from scratch. The Libre Office project includes a word processor and a spreadsheet (among other things) and it is open source. If I wanted to create a competing spreadsheet, I could take the code from Libre Office, modify it (a little or a lot) and redistribute it. (The catch is that I would also have to distribute my modified version of the source code.)

Rather than build my own version with private enhancements, it would be easier to suggest my enhancements to the team that maintains Libre Office. With private enhancements, I have to make the same changes with each new release of Libre Office (assuming I want the latest version); by submitting my enhancements (and getting them included) they then become part of the product and I get them with each update. (Of course, so does everyone else.)

Open source is not "one solution only". It has different software packages that exist in the same "space". There are a multitude of text editors. There are different display managers for Linux. There are multiple windowing systems. One can even argue that the languages Awk, Perl, Python, and Ruby all compete. There can be competing efforts in open source.

The closed-source world does not always provide competition. It has settled on some "winner" programs: Microsoft Word for word processing, Microsoft Excel for spreadsheets, Photoshop for editing pictures. Competitors may emerge, but the cost of entry to the market is high.

In general, I think that the overall trend (for closed source and open source) is to move to a single package. The "network effect" exerts a gentle but consistent pull for a single solution in both worlds. The open source market consolidates more quickly than the closed-source market; for-profit vendors have more to gain by keeping their product in the market. They resist the tug of the network effect.

Open source becomes a more efficient space. With fewer people working to create similar-but-different products, the open source world can work on a more diverse set of problems. Or it can invest less effort for the same result.

Many companies invest effort in core competencies and outsource non-essential activities. Open source may be the cost-effective method for those non-essential activities.

Sunday, June 15, 2014

Untangle code with automated testing

Of all of the tools and techniques for untangling code, the most important is automated testing.

What does automated testing have to do with the untangling of code?

Automated testing provides insurance. It provides a back-stop against which developers can make changes.

The task of untangling code, of making code readable, often requires changes across multiple modules and multiple classes. While a few improvements can be made to single modules (or classes), most require changes in multiple modules. Improvements can require changes to the methods exposed by a class, or remove access to member variables. These changes ripple through other classes.

Moreover, the improvement of tangled code often requires a re-thinking of the organization of the code. You move functions from one class to another. You rename variables. You split classes into smaller classes.

These are significant changes, and they can have significant effects on the operation of the code. Of course, while you want to change the organization of the code you want the results of calculations to remain unchanged. That's how automated tests can help.

Automated tests verify that your improvements have no effect on the calculations.
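
A small sketch in Python shows the idea. Both functions and the test are hypothetical, invented for this illustration: total_before stands in for tangled code, total_after for the reorganized version, and the test checks that the two produce the same results. Amounts are in whole cents to keep the arithmetic exact.

```python
def total_before(prices_cents, tax_percent):
    # Tangled original: accumulation and the tax rule mixed in one loop.
    total = 0
    for p in prices_cents:
        total = total + p + p * tax_percent // 100
    return total


def with_tax(price_cents, tax_percent):
    # The per-item rule, moved into its own function.
    return price_cents * (100 + tax_percent) // 100


def total_after(prices_cents, tax_percent):
    # Untangled version: reads as "sum of the prices with tax".
    return sum(with_tax(p, tax_percent) for p in prices_cents)


def test_refactoring_preserves_totals():
    cases = [([], 5), ([1000], 5), ([1999, 501, 10000], 8)]
    for prices, rate in cases:
        assert total_before(prices, rate) == total_after(prices, rate)
```

In practice the old version goes away once the change is complete, and the test keeps the expected values instead.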

The tests must be automated. Manual tests are expensive: they require time and attention. Manual tests are easy to skip. Manual tests are easy to "flub". Manual tests can be difficult to verify. Automated tests are consistent, accurate, and most of all, cheap. They do not require attention or effort. They are complete.

Automated tests let programmers make significant improvements to the code base and have confidence that their changes are correct. That's how automated tests help you untangle code.