Sunday, August 17, 2014

Reducing the cost of programming

Different programming languages have different capabilities. And not surprisingly, different programming languages have different costs. Over the years, we have found ways of reducing those costs.

Costs include infrastructure (disk space for compiler, memory) and programmer training (how to write programs, how to compile, how to debug). Notice that the load on the programmer can be divided into three: infrastructure (editor, compiler), housekeeping (declarations, memory allocation), and business logic (the code that gets stuff done).

Symbolic assembly code was better than machine code. In machine code, every instruction and memory location must be laid out by the programmer. With a symbolic assembler, the computer did that work.

COBOL and FORTRAN reduced cost by letting the programmer not worry about the machine architecture, register assignment, and call stack management.

BASIC (and time-sharing) made editing easy, eliminated compiling, and made running a program easy. Results were available immediately.

Today we are awash in programming languages. The big ones today (C, Java, Objective-C, C++, BASIC, Python, PHP, Perl, and JavaScript -- according to TIOBE) are all good at different things. That is perhaps not a coincidence. People pick the language best suited to the task at hand.

Still, it would be nice to calculate the cost of the different languages. Or if numeric metrics are not possible, at least rank the languages. Yet even that is difficult.

One can easily state that C++ is more complex than C, and therefore conclude that programming in C++ is more expensive than C. Yet that's not quite true. Small programs in C are easier to write than equivalent programs in C++. Large programs are easier to write in C++, since the ability to encapsulate data and group functions into classes helps one organize the code. (Where 'small' and 'large' are left to the reader to define.)
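
To make the claim concrete, here is a minimal sketch (the account example and all of its names are invented for illustration) of the same idea expressed in the two styles:

    // C style: a struct holds the data, and free functions operate on it.
    // Nothing stops other code from changing the fields directly.
    struct account_c {
        double balance;
    };

    void account_deposit(struct account_c *a, double amount) {
        a->balance += amount;
    }

    // C++ style: the class groups the data with its functions and hides the
    // data behind access control, which helps organize larger programs.
    class Account {
    public:
        void deposit(double amount) { balance_ += amount; }
        double balance() const { return balance_; }
    private:
        double balance_ = 0.0;   // callers cannot touch this directly
    };

    int main() {
        Account account;
        account.deposit(100.0);
        return account.balance() == 100.0 ? 0 : 1;
    }

In a program this small the two versions are nearly equivalent; the organizational benefit appears as the number of types, functions, and call sites grows.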

Some languages are compiled and some are interpreted, and one can argue that a separate step to compile is an expense. (It certainly seems like an expense when I am waiting for the compiler to finish.) Yet languages with compilers (C, C++, Java, C#, Objective-C) all have static typing, which means that the editor built into an IDE can provide information about variables and functions. When editing a program written in one of the interpreted languages, on the other hand, one does not have that help from the editor. The interpreted languages (Perl, Python, PHP, and JavaScript) have dynamic typing, which means that the type of a variable (or function) is not constant but can change as the program runs.
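
A minimal sketch of the distinction (the variable name is invented for illustration): with static typing, the type of a name is fixed at compile time, which is precisely the information an IDE uses to offer completion and checking.

    int main() {
        int count = 42;          // the compiler (and the IDE) know that count is an int
        count = count + 1;       // fine: still an int

        // count = "forty-two";  // rejected at compile time; the type cannot change

        // In a dynamically typed language such as Python, the equivalent
        // reassignment is legal, and the type of the name changes at run time:
        //     count = 42
        //     count = "forty-two"   # allowed; count now refers to a string
        return 0;
    }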

Switching from an "expensive" programming language (let's say C++) to a "reduced cost" programming language (perhaps Python) is not always possible. Programs written in C++ perform better. (On one project, the C++ program ran for several hours; the equivalent program in Perl ran for several days.) C and C++ let one have access to the underlying hardware, something that is not possible in Java or C# (at least not without some add-in trickery, usually involving... C++.)

The line between "cost of programming" and "best language" quickly blurs, and the different dimensions of programming (program design, speed of coding, speed of execution, ability to control hardware) get in the way of nailing down the costs.

In the end, I find that it is easy to rank languages in the order of my preference rather than in an unbiased scheme. And even my preferences are subject to change, given the nature of the project. (Is there existing code? What are other team members using? What performance constraints must we meet?)

Reducing the cost of programming is really about trade-offs. What capabilities do we desire, and what capabilities are we willing to cede? To switch from C++ to C# may mean faster development but slower performance. To switch from PHP to Java may mean better organization of code through classes but slower development. What is it that we really want?

Monday, August 11, 2014

Agile is not compatible with silos

Agile development methods are very different from the traditional Waterfall methods. So different that they can affect the culture of the organization.

Agile makes a different promise than Waterfall. Waterfall promises a specific deliverable on a specific date; Agile promises that you can ship whenever you want.

Agile discourages specialization. An iteration is short yet requires analysis, development, and testing. Such a short cycle does not allow for different individuals to perform different tasks.

Yet the biggest difference between Agile and Waterfall is the partitioning of tasks and the encapsulation of information. Waterfall strives for clean, discrete changes from one phase to another, with information flowing between phases in well-defined documents. The flow between the requirements phase and the development phase is the requirements document (or documents). The test results are presented in a specific document. And so on.

Information in each phase is encapsulated in that phase, and only a small set of information is allowed to transfer (one might say 'leak') to another phase.

The partitioning of tasks and the encapsulation of information leads to silos within the organization. Once separate teams are established for requirements, development, testing, and deployment, tensions arise between teams. The testing team identifies defects that reflect on the development team. The development team blames the requirements team for incomplete or ambiguous specifications.

Agile -- at least Agile for small teams -- has none of that. The fast cycles of feature selection, design, development, and test provide immediate feedback. An ambiguous requirement is spotted early, and it is obvious to everyone. Defects are identified and fixed before implementing the next feature set.

More importantly, an Agile project has one team, and the measurement of success for that team is the delivery of software. That focus on success and the inability to shift blame to another team means that it is harder to establish silos.

Which is not to say that Agile will eliminate all silos. An organization with many Agile projects can still have silos. A large company using an "Agile for large companies" process may develop silos.

But for the most part, I believe Agile processes are incompatible with silos. The involvement of necessary stakeholders; the coordinated work of design, development, and testing; and the fast cycle times all push against silo-ization.

Thursday, July 31, 2014

Not so special

The history of computers has been the history of things becoming not special.

First were the mainframes. Large, expensive computers ordered, constructed, delivered, and used as a single entity. Only governments and wealthy corporations could own (or lease) a computer. Once acquired, the device was a singleton: it was "the computer". It was special.

Minicomputers reduced the specialness of computers. Instead of a single computer, a company (or a university) could purchase several minicomputers. Computers were no longer single entities in the organization. Instead of "the computer" we had "the computer for accounting" or "the computer for the physics department".

The opposite of "special" is "commodity", and personal computers brought us into a world of commodity computers. A company could have hundreds (or thousands) of computers, all identical.

Yet some computers retained their specialness. E-mail servers were singletons -- and therefore special. Web servers were special. Database servers were special.

Cloud computing reduces specialness again. With cloud systems, we can create virtual systems on demand, from pre-stocked images. We can store an image of a web server and when needed, instantiate a copy and start using it. We have not a single web server but as many as we need. The same holds for database servers. (Of course, cloud systems are designed to use multiple web servers and multiple database servers.)

In the end, specialness goes away. Computers, all computers, become commodities. They are not special.

Monday, July 28, 2014

Improving code can cause an explosion of classes

Object-oriented programming took the world by storm in the 1990s. Those early days saw a lot of programmers learning new skills.

It took some time to truly learn object-oriented programming. The jump from structured programming (or procedural code) to object-oriented programming was not small. (And is still not small.)

Many early attempts at object-oriented programming were inelegant if not amateurish. They contained mistakes, but the errors are only visible in hindsight. Programmers who are inexperienced in a new technique make mistakes. (I'm one of them.)

Common problems were:

  • large classes with many purposes
  • long functions (procedural code wrapped in object clothing)
  • excessive inheritance
  • too little inheritance
  • weak encapsulation (little or no use of access control)
  • little or no composition

Legacy systems often contain these problems. Decades after their inception, these systems still contain their original design flaws. The problems remain because they are difficult to correct and the return on the investment is unclear. I often argue that a better design reduces maintenance costs in the future, and the counter-argument is that the current development team knows the code and would gain little from an improved design.

When I can convince the system owners of the benefits of improved code (and I am becoming more convincing over time), we see a remarkable transformation in the code.

The most obvious change is in the number of classes. The revised system contains many more classes, often several times the original number. Yet while the number of classes increases, the total number of lines of code decreases. The construction of new classes allows for the consolidation of duplicate code, something that occurs often in legacy systems.

The new classes are usually small. Instead of the large, multipurpose classes of the earlier design, I move functions to small, single-purpose classes. Some classes are mere data containers; others hold one or two elements and provide a small number of functions on those elements. While small, these classes have a big effect on the readability of the code: they eliminate low-level operations from high-level and mid-level code, allowing the reader to focus on the higher-level operations.
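
A hedged before-and-after sketch (the invoice and money classes, and all of their names, are invented for this example): one small responsibility moves out of a large class into a small, single-purpose class.

    #include <iostream>
    #include <string>

    // Before: a fragment of a large, multipurpose class. Low-level formatting
    // sits next to the business logic (and the validation, persistence,
    // printing, and everything else not shown here).
    class InvoiceOld {
    public:
        std::string formatted_total() const {
            long dollars = total_cents_ / 100;
            long cents   = total_cents_ % 100;
            return "$" + std::to_string(dollars) +
                   (cents < 10 ? ".0" : ".") + std::to_string(cents);
        }
    private:
        long total_cents_ = 0;
    };

    // After: a small, single-purpose class holds the value and the handful
    // of operations on it.
    class Money {
    public:
        explicit Money(long cents) : cents_(cents) {}
        std::string formatted() const {
            long dollars = cents_ / 100;
            long cents   = cents_ % 100;
            return "$" + std::to_string(dollars) +
                   (cents < 10 ? ".0" : ".") + std::to_string(cents);
        }
    private:
        long cents_;
    };

    // The higher-level class now reads at a higher level.
    class Invoice {
    public:
        std::string formatted_total() const { return total_.formatted(); }
    private:
        Money total_{1050};
    };

    int main() {
        Invoice invoice;
        std::cout << invoice.formatted_total() << '\n';   // prints $10.50
        return 0;
    }

Once a class like Money exists, every other class that formatted an amount by hand can use it, which is where the consolidation of duplicate code comes from.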

Small classes are much easier to test, and much easier to test with automated tools. Even C++ can use automated tests to verify the operation of classes. Automated tests relieve a burden from developers (and testers or "QA" folk) and allow them to direct their efforts to building and maintaining meaningful tests.
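
As a minimal sketch (the Counter class is invented for the example), a small class can be verified with nothing more than the standard assert macro; a real project would likely use a unit-test framework, but the idea is the same.

    #include <cassert>

    // A small, single-purpose class is easy to exercise in isolation.
    class Counter {
    public:
        void increment() { ++count_; }
        int value() const { return count_; }
    private:
        int count_ = 0;
    };

    int main() {
        Counter counter;
        assert(counter.value() == 0);
        counter.increment();
        counter.increment();
        assert(counter.value() == 2);
        return 0;   // a failed assert aborts the program and fails the test run
    }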

A large number of small classes provides an additional benefit: the ability to group classes into libraries. Large (or large-ish) early object-oriented systems tend to group all of the classes into a single package, usually called "the application". With a large number of classes, the system maintainers see groups of classes emerge (perhaps all of the database classes, or all of the elementary data classes). These groupings can be formalized with libraries. For very large projects, these libraries can be maintained by different teams. Libraries can also be shared across multiple projects, reducing the duplication of effort at a larger scale.

Modernizing legacy systems can lead to an "explosion of classes", and this can be a good thing. Smaller classes are easier to understand and maintain. They can be tested independently. They can be grouped into libraries. Do not fear such an increase in the number of classes in your code.

Wednesday, July 23, 2014

Waterfall caused specialization; agile causes generalization

There are a number of differences between waterfall processes and agile processes. Waterfall defines one long, segmented process; agile uses a series of short iterations. Waterfall specifies a product on a specific date; agile guarantees a shippable product throughout the development process.

Another difference between waterfall and agile is the specialization of participants. Waterfall projects are divided into phases (analysis, development, testing, and so on), and those phases tend to be long: a waterfall project may extend for six months, a year, or several years, and its individual phases may extend for months -- or possibly years. Agile projects have the same activities (analysis, development, testing) but on a much shorter timeframe.

The segmentation of a waterfall project leads to specialization of the participants. It is common to find a waterfall project staffed by analysts, developers, and testers, each group a distinct team with its own management. The different teams use tools specific to their tasks: databases for requirements, compilers and integrated development environments for development, and test automation software and test case management systems for testing.

This specialization is possible due to the long phases of waterfall projects. It is reasonable to have team "A" work on the requirements for a project and then (when the requirements are deemed complete) assign team "B" to the development phase. While team "B" develops the project, team "A" can compose the requirements for another project. Specialization allows for efficient scheduling of personnel.

Agile processes, in contrast, have short iterations of analysis, design, coding, and testing. Instead of months (or years), an iteration may be one or two weeks. With such a short period of time, it makes little sense to have multiple teams work on different aspects. The transfer of knowledge from one team to another, a small task in a multi-year project, is a large effort on a two-week iteration. Such inefficiencies are not practical for short projects, and the better approach is for a single team to perform all of the tasks. (Also, the two-week iteration is not divided into a neat linear sequence of analysis, design, development, and test. All activities occur throughout the iteration, with multiple instances of each task.)

A successful agile process needs people who can perform all of the tasks. It needs not specialists but generalists.

Years of waterfall projects have trained people and companies into thinking that specialists are more efficient than generalists. (There may be a bit of Taylorism here, too.) Such thinking is so pervasive that one finds specialization in the company's job descriptions. One can find specific job descriptions for business analysts, developers, and testers (or "QA Analysts").

The shift to agile projects will lead to a re-thinking of specialization. Generalists will be desired; specialists will find it difficult to fit in. Eventually, generalists will become the norm and specialists will become, well, special. Even the job descriptions will change, with the dominant roles being "development team members" with skills in all areas and a few specialist roles for special tasks.

Monday, July 14, 2014

Spreadsheets can help us learn functional programming

Spreadsheets are quite possibly the worst way to learn programming skills. And they may also be the best way to learn the next "wave" of programming skills. A contradiction? Perhaps.

First, by "spreadsheets" I mean the cell grid and its formulas. I am omitting Visual Basic for Applications (VBA) code which can accompany Microsoft Excel sheets.

Spreadsheets as a programming environment are capable and flexible. They let one assemble a set of data and formulas into a meaningful arrangement. They let you format the data. And they provide immediate feedback: the results of a change are displayed as soon as it is made.

Spreadsheets also violate a lot of the generally accepted principles of program design. They mix input, data, calculation, and output. They have no mechanisms for structuring calculations or encapsulating data. They have no way to isolate data; everything is "global" and any cell can be used by any other cell.

The lack of structural elements means that spreadsheets tend to "scale up" poorly. A small set of data is easily handled. A somewhat larger set of data (if it is the same type of data) is also manageable. A larger collection of different types of data becomes a challenge. Even with multi-page spreadsheets, one starts allocating regions of a sheet for certain data and certain calculations. These regions become problematic as they grow -- especially if they grow at different rates.

There is no way to condense similar calculations. If ten cells (or one hundred cells) all perform the same operation, they must all contain the same formula. Internally, the spreadsheet may optimize memory usage, but from the "programmer's" point of view, the formulas are repeated. If the general formula must change, it must change in all the cells. (While it is easy to change the formula in one cell and then replicate it to the other cells, it is not always easy to identify which other cells use that formula.)
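
For contrast, here is a sketch of how a general-purpose language condenses the repeated calculation (the tax formula and the numbers are invented for the example): the formula is defined once and applied to as many values as needed, so a change to the formula happens in exactly one place.

    #include <iostream>
    #include <vector>

    // The "formula" is defined once...
    double with_tax(double price) { return price * 1.07; }

    int main() {
        // ...and applied to any number of values. In a spreadsheet, each of
        // these would be a separate cell carrying its own copy of the formula.
        std::vector<double> prices = {10.0, 25.0, 99.99};
        for (double price : prices) {
            std::cout << with_tax(price) << '\n';
        }
        return 0;
    }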

Spreadsheets offer nothing in the way of a high-level view. Everything is viewed at the cell level: to examine a formula, you must look at the specific cell that contains the formula.

So spreadsheets offer power and immediate feedback, two important aspects of programming. Yet they lack the concepts of structured programming (subroutines, control blocks) and the concepts of object-oriented programming (custom types, encapsulation, inheritance).

With all of these omissions, how can spreadsheets be a good way to learn the next programming style?

The answer is functions.

The next wave of programming (as I see it) is functional programming. With functional programming, one defines and uses functions, and functions are first-class constructs of the language. Functions can be passed as arguments to other functions. They can be constructed by functions, and evaluated by functions. The change from object-oriented programming to functional programming is as large as (and maybe larger than) the change from structured programming to object-oriented programming.
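
A minimal sketch of "functions as first-class constructs", in C++11 terms (the function names are invented for illustration): one function is passed as an argument to another, and a second function is built and returned at run time.

    #include <functional>
    #include <iostream>

    // A function passed as an argument to another function...
    int apply_twice(const std::function<int(int)>& f, int x) {
        return f(f(x));
    }

    // ...and a function constructed by another function, "on the fly".
    std::function<int(int)> make_adder(int n) {
        return [n](int x) { return x + n; };
    }

    int main() {
        auto add_three = make_adder(3);
        std::cout << apply_twice(add_three, 10) << '\n';   // prints 16
        return 0;
    }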

Spreadsheets can help us learn functional programming because spreadsheets (the core, non-VBA version of spreadsheets) are all about functions. Every cell contains the result of a function. Once a cell's value is defined, it does not change. (Changing cells in the spreadsheet and pressing the "recalc" button is, in essence, modifying the program and re-executing it.)
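
A hedged sketch of the analogy (the cell names follow spreadsheet convention; the values and formulas are invented): each cell is a pure function of other cells, and nothing in the "program" ever reassigns a cell once it is defined.

    #include <iostream>

    // A three-cell "spreadsheet": A1 holds a value, A2 and A3 hold formulas.
    double A1() { return 5.0; }
    double A2() { return A1() * 2.0; }    // =A1*2
    double A3() { return A1() + A2(); }   // =A1+A2

    int main() {
        // "Recalculating" the sheet is just evaluating the functions again.
        std::cout << A3() << '\n';   // prints 15
        return 0;
    }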

Now, the comparison is not complete. Functional programming lets you pass functions as arguments to other functions and lets you build functions "on the fly", and spreadsheets let you do neither. So designing a spreadsheet is not the same as programming in a functional language.

But programming spreadsheets is a start. It is a jumping-off point. It is an introduction to some of the concepts of functional programming.

If you want to learn functional programming, perhaps a good place to start is with your local spreadsheet. Turn off (or ignore) the VBA or macro programming. Stick with cells, values, and functions. Avoid the "optimize" or "search for result" capabilities. Design spreadsheets that compute things that are easy in "real" programming languages. You may be stuck at first, given the constraints of spreadsheet calculations. But keep at it. You will learn techniques that can help you with the next wave of programming.

Tuesday, July 8, 2014

The center of the universe is moving

The real universe, the one in which we live, the one with planets and solar systems and galaxies, has no center. It is "finite but unbounded", which sounds a bit strange until you realize that the surface of the Earth is also finite but unbounded. There is no edge of the Earth, no end, no boundary. Yet it has a finite size. (The Earth as a planet has a center, but the surface of the Earth does not.)

The IT universe does have centers. For decades, the center of the hardware universe has been the desktop PC and the center of the software universe has been Microsoft Windows and applications for Windows.

That is changing. Windows is no longer the software center of the IT universe. The desktop PC is no longer the hardware center of the IT universe.

The center of the IT universe for consumers has shifted to Apple and Google. The popularity of the iPad, the iPhone, and Android phones shows this. Individuals are happy to purchase these devices. PCs, in contrast, are purchased grudgingly. The purchase of a PC does not instill excitement but resentment.

The center of the IT universe for enterprises remains close to PCs and Microsoft Windows, but it too is moving to cloud computing and mobile devices. Microsoft recognizes this; it has been expanding its Azure cloud services and selling tablets and phones. While it has had little success with mobile devices, it does enjoy some with cloud services. Microsoft is supporting multiple operating systems; its Office products now run on Apple iPads and Android devices.

What does this change mean for the rest of us?

Well, for consumers it means that we will see more options. Instead of the old world of "Windows-only applications running on Microsoft Windows on desktops or laptops", we will see services on Azure available on the device of our choosing.

For enterprises, the same options will appear. This fits in with the "Bring Your Own Device" philosophy, which shifts the costs of hardware from employers to employees.

For developers, the picture is more complex. The old method of developing an application (especially an enterprise application) for Windows only (because Windows was the center of the universe) must give way to a process that develops applications for multiple platforms. The new development paradigm must be mobile/cloud: apps for multiple platforms backed by a solid cloud design.

Microsoft is supporting this new paradigm. Azure supports non-Microsoft products such as Linux. Visual Studio supports non-Microsoft products such as Git, and now targets iOS and Android in addition to Windows.

Almost overnight, the modern Windows-only applications have graduated to the status of legacy systems.