Thursday, December 26, 2013

Big data is about action, and leadership

Big data. One of the trends of the year. A new technology that brings new opportunities for businesses and organizations that use it.

Big data also brings challenges: the resources to collect and store large quantities of data, the tools to analyze and present large quantities of data, and the ability to act on that data. That last item is the most important.

Collecting data, storing it, and analyzing it are mostly matters of technology, and technology is readily available. Data storage (through SAN, NAS, or cloud-based storage) is a matter of money and equipment. Collecting data may be a little harder, since you must decide what to collect and then make the programming changes to perform the actual collection -- but those tasks are not that hard.

Analyzing data is also mostly a matter of technology. We have the computing hardware and the analytic software to "slice and dice" data and serve it up in graphs and visualizations.

The hard part of big data is none of the above. The hard part of big data is deciding a course of action and executing it. Big data gives you information. It gives you insight. And I suspect that it gives you those things faster than your current systems.

It's one thing to collect the data. It's another thing to change your procedures and maybe even your business. Collecting the data is primarily a technology issue. Changing procedures is often a political one. People are (often) reluctant to change. Changes to business plans may shift the balance of power within an organization. Your co-workers may be unwilling to give up some of that power. (Of course, others may be more than happy to gain power.)

The challenge of big data is not in the technology but in the changes driven by big data and the leadership for those changes. Interpreting data, deciding on changes, executing those changes, and repeating that cycle (possibly more frequently than before) is the payoff of big data.

Saturday, December 21, 2013

Files no more

One difference between traditional IT and the new cloud IT is the storage of data. Traditional IT systems (desktop PCs) stored data in files; cloud IT systems store data in ... well, that's not so clear. And the opaqueness of cloud systems may be a good thing.

In the Old World of desktop IT, we stored data in files and used that data in application programs. Often, the data stored in files was stored in a format that was specific to the application: documents would be stored in Microsoft Word format, spreadsheets stored in Microsoft Excel format, etc. The operational model was to run a program, load a file, make changes, and then save the file. The center of the old world was files, with application programs orbiting.

In the New World of cloud computing, we store data in... something... and use that data in applications that run on servers. Thus, with Google Drive (the new name for Google Docs) we store our data on Google's servers and access our data through our browser. Google's servers recall the data and present a view of that data to us through our browser. We can make changes and save the data -- although changes in Google Drive are saved automatically.

Are we storing data in files? Well, perhaps. The data is not stored on our PC, but on Google's servers. That is the magic of "software as a service" -- we can access our data from anywhere.

Getting back to the data. Google must store our data somewhere. Is it stored in a file? Or is it stored as a byte-stream in a datastore like CouchDB or memcached? Our viewpoint on our local PC does not allow us to peer inside of the Google machine, so we have no way to tell how our data is stored.

Yes, I know that Google Drive lets us download our data to a file on our PC. We can pick the location, the name, and even the format for the file. But that file is not the primary home of the data; the download is an "export" operation that extracts data from Google's world and hands it to us. (We can later import that same data back into Google's world, should we want.)

With software as a service (SaaS), our data is stored, but not as files on our local filesystem. Instead, it is stored in the cloud system and the details are hidden from us.

I think that this is an advance. In traditional IT, storing data in files was a necessity, the way to keep information in a form usable by application programs. (At least, it was the method used by the original Unix and DEC operating systems.) The notion of a file was an agreement between the processors of data and the keepers of data.

Files are not the only method of storing data. Many systems store data in databases, organizing data by rows and columns. While the databases themselves may store their data in files, the database client applications see only the database API and manipulate records and columns.
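
As a small illustration of that separation -- a sketch in Python using the standard-library sqlite3 module, with an invented database and table name -- the client code below works entirely in rows and columns through the database API. This particular engine happens to keep its data in a single file, but the client never reads or writes that file directly:

```python
import sqlite3

# Open a database and work through the API; the client never parses
# or writes the underlying storage itself.
conn = sqlite3.connect("documents.db")  # invented name for the example
conn.execute("CREATE TABLE IF NOT EXISTS docs "
             "(id INTEGER PRIMARY KEY, title TEXT, body TEXT)")
conn.execute("INSERT INTO docs (title, body) VALUES (?, ?)",
             ("notes", "draft text"))
conn.commit()

# Retrieval is also expressed as rows and columns, not bytes in a file.
for row in conn.execute("SELECT id, title FROM docs"):
    print(row)

conn.close()
```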

I've been picking on Google Drive, but the same logic applies to any software-as-a-service, including Microsoft's Office 365. When we move a document from our PC into Microsoft's cloud, it is the same as moving it into Google's cloud. Are the bytes stored as a separate file, or are they stored in a different container -- perhaps SQL Server? We don't know, and we don't care.

We don't really care about files, or about filesystems. We care about our data.

Tuesday, December 17, 2013

The transition from object-oriented to functional programming

I am convinced that we will move from object-oriented programming to functional programming. I am also convinced that the transition will be a difficult one, more difficult than the transition from structured programming to object-oriented programming.

The transition from structured programming to object-oriented programming was difficult. Object-oriented programming required a new view of programming, a new way of organizing data and code. For programmers who had learned the ways of structured programming, the shift to object-oriented programming meant learning new techniques.

That in itself is not enough to cause the transition to functional programming to be more difficult. Functional programming is, like object-oriented programming, a new view of programming and a new way of organizing data and code. Why would the transition to functional programming be more difficult?

I think the answer lies within the organization of programs. Structured programming, object-oriented programming, and functional programming all specify a number of rules for programs. For structured programming, the rules were that functions and subroutines should have a single entry point and a single exit point, and that IF/THEN/ELSE blocks and FOR/NEXT loops should be used instead of GOTO statements.

Object-oriented programming groups data into classes and uses polymorphism to replace some conditional statements. But object-oriented programming was not totally incompatible with structured programming (or 'procedural programming'). Object-oriented programming allowed for a top level of new design with lower layers of the old design. Many early object-oriented programs had large chunks of procedural code (and some still do to this day). The thin layer of objects simply acted as a container for structured code.

Functional programming doesn't have this same degree of compatibility with object-oriented programming (or structured programming). Functional programming uses immutable objects; object-oriented programming is usually about mutable objects. Functional programming works on whole collections of data and relies on efficient tail recursion; object-oriented programming uses the explicit loops and conditional statements of procedural programming.
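
The difference shows up even in trivial code. A sketch in Python (the task itself is arbitrary): summing the squares of the even numbers, first with the explicit loop and mutable accumulator of procedural and object-oriented habit, then in a functional style that treats the collection as a whole and mutates nothing.

```python
numbers = range(10)

# Procedural / object-oriented habit: explicit loop, mutable accumulator.
total = 0
for n in numbers:
    if n % 2 == 0:
        total += n * n

# Functional habit: no mutation; the collection is transformed as a unit.
functional_total = sum(n * n for n in numbers if n % 2 == 0)

assert total == functional_total == 120
```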

The constructs of functional programming work poorly at containing object-oriented constructs. The "trick" of wrapping old code in a containing layer of new code may not work with functional programming and object-oriented programming. It may be better to build functional programming constructs inside of object oriented programming constructs, working from the "inside out" rather than from the "outside in" of the object-oriented transition.

One concept that has helped me transition is that of immutable objects. This is a notion that I have "imported" from functional programming into object-oriented programming. (And I must admit that the idea is neither mine nor new; Java's String objects are immutable, and have been since the language's inception.)

The use of immutable objects has improved my object-oriented programs. It has moved me in the direction of functional programming -- a step in the transition.
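
Here is the kind of thing I mean -- a minimal sketch in Python, with class and field names invented for the example: an immutable value object that, instead of being modified in place, returns a new object for every "change".

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)  # frozen=True makes instances immutable
class Account:
    owner: str
    balance: int

    def deposit(self, amount: int) -> "Account":
        # No mutation: return a new Account with the updated balance.
        return replace(self, balance=self.balance + amount)

a1 = Account("alice", 100)
a2 = a1.deposit(50)

print(a1.balance)  # 100 -- the original is untouched
print(a2.balance)  # 150
```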

I believe that we will transition from object-oriented programming to functional programming. I foresee a large effort to do so, and I foresee that some programs will remain object-oriented programs, just as some legacy programs remain procedural programs. I am uncertain of the time frame; it may be in the next five years or the next twenty. (The advantages of functional programming are compelling, so I'm tending to think sooner rather than later.)

Sunday, December 15, 2013

Readable does not necessarily mean what you think it means

An easy way to start an argument among programmers (or possibly a fist-fight) is to ask about the readability of programming languages. Most developers have strong opinions about programming languages, especially when you are not paying them.

I'm not sure that "readability" is an aspect of a programming language. Instead, I think readability is an aspect of a specific program (written in any language). I have used many languages (assembly language, BASIC, FORTRAN, Pascal, COBOL, dBase, RBase, Visual Basic, Delphi, C, C++, Java, C#, Perl, Ruby, and Python, to name the popular ones), and examined many programs (small, large, tutorials, and production-level).

I've seen readable programs in all of these languages (including assembly language). I've seen hard-to-read programs in all of those languages, too.

If readability is not an aspect of a programming language but of the program itself, then what makes a program readable? Lots of people have commented on the topic over the decades. The popular ideas are usually:

  • Use comments
  • Follow coding standards
  • Use meaningful names for variables and functions (and classes)
  • Use structured programming techniques
  • Have high cohesion within modules (or classes) and low coupling

These are all worthy ideas.

I would like to add one more: balance the levels of abstraction. Programs of non-trivial complexity are divided into modules. (For object-oriented programming, these divisions are classes. For structured programming, these are functions and libraries.) A readable program will have a balanced set of modules. By "balanced", I mean in terms of size and complexity.

Given a program and its modules, you can take measurements. Typical measurements include: lines of code (derided by many, and for good reasons), cyclomatic complexity, software "volume", and function points. It doesn't particularly matter which metric you choose; any of them gives you the relative sizes of modules. Once you have module sizes, you can plot those sizes on a chart.
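
A rough sketch of that measurement in Python (the project directory is a placeholder, and the choice of non-blank line counts as the metric is just a convenience; any of the metrics above would do): measure each module, sort the sizes, and look at the shape of the curve.

```python
from pathlib import Path

project = Path("src")  # placeholder for the project's source directory

# Non-blank lines per module (here, per .py file), smallest to largest.
sizes = sorted(
    sum(1 for line in path.read_text(errors="ignore").splitlines()
        if line.strip())
    for path in project.rglob("*.py")
)

# A crude look at the curve: the jump from each module to the next larger one.
for smaller, larger in zip(sizes, sizes[1:]):
    jump = larger / smaller if smaller else float("inf")
    marker = "  <-- large jump" if jump > 2 else ""
    print(f"{smaller:5d} -> {larger:5d}  (x{jump:.1f}){marker}")
```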

Frequently, readable programs have a smooth curve of module sizes. And just as frequently, hard-to-read programs have a jagged curve. Hard-to-read programs tend to have one or a few large modules and a few or many small modules.

I'm not sure why such a correlation exists. (And perhaps it doesn't; I admit that my observations are limited and the correlation may be proven false with a larger data set.)

Yet I have a theory.

As a programmer examines the code, he frequently moves from one module to another, following the logic and the flow of control. Modules divide the program not only into smaller components but also into levels. The different levels of the program handle different levels of abstraction. Low levels contain small units of code, small collections of data. Higher levels contain larger units of code that are composed of smaller units.

Moving from module to module means (often) moving from one level of organization to another, lower, level. With a smooth distribution of module sizes, the cognitive difference between levels is small. A path from the top level to a module on the bottom level may pass through many intermediate modules, but each transition is small and easy to understand.

A program with a "lumpy" distribution of module sizes, in contrast, lacks this set of small jumps from level to level. Instead, a program with a lumpy distribution of module sizes (and module complexity) has a lumpy set of transitions from layer to layer. Instead of a smooth set of jumps from the top level to the bottom, the jumps occur erratically, some small and some large. This disparity in jump size puts a cognitive load on the reader, making it hard to follow the changes.

If my theory is right, then a readable program is one that consists of layers, with the difference between any two adjacent layers no larger than some critical amount. (And, conversely, a hard-to-read program has multiple layer-pairs that have differences larger than that amount.)

I think that readability is an attribute of a program, not a programming language. We can write readable programs in any language (well, almost any language; a few obscure languages are designed for obfuscation). I think that the commonly accepted ideas for readable programs (comments, coding standards, meaningful names) are good ideas and helpful for building readable programs. I also think that we must structure our programs to have small differences between levels of abstraction.

Wednesday, December 11, 2013

Tablets change our expectations of software

PC software, over time, has grown in size and complexity. For any given product, each version had more features (and used more memory) than the previous one.

Microsoft Windows almost broke that pattern. The marketing materials for Windows promised many things; one of them was simplicity. Windows programs would be "easy to use", so easy that they would be "intuitive".

While software under Windows became more consistent (identical commands to open and save files) and programs could share data (via the clipboard), they were not simpler. The steady march of "more features" continued. Today, Microsoft Word and Microsoft Excel are the standards for word processing and spreadsheets, and both are bulging with features, menus, add-ins, and sophisticated scripting. Competing PC software duplicates all of those features.

Software for tablets is following a different model.

The products for tablets released by Apple, Google, and even Microsoft are reduced versions of their PC counterparts. Apple's "Pages" is a word processor, but smaller than Microsoft Word. Google's "Drive" (the app formerly called "Docs") is a word processor with fewer features than Microsoft Word, and a spreadsheet with fewer features than Microsoft Excel. Even Microsoft's versions of Word and Excel for Windows RT omit certain functions.

I see three drivers of this change:

The tablet interface limits features: Tablets have smaller screens and virtual keyboards, and tablet software generally has no drop-down menus. It is quite difficult to translate a complex PC program into the tablet environment.

Users want tablet software to be simple: Our relationship with tablets is more intimate than our relationship with PCs. We carry tablets with us, and generally pick the one we want. PCs, in contrast, stay at a fixed location and are assigned to us (especially in the workplace). We accept complexity in PC apps, but we push back against complexity in tablet apps.

Tablets complement PCs: We use tablets and PCs for different tasks and different types of work. Tablets let us consume data and perform specific, structured transactions. We can check the status of our systems, or view updates, or read news. We can bank online or update time-tracking entries with tablet apps. For long-form composition, we still use PCs with their physical keyboards, high-precision mice, and larger screens.

The demands placed upon tablet software are different from the demands placed upon desktop software. (I consider laptop PCs to be portable desktop PCs.) We want desktop PCs for composition; we want tablets for consumption and structured transactions. Those different demands are pushing software in different directions: complex for desktops, simple for tablets.

Monday, December 9, 2013

Off-the-shelf components still need attention

Typical advice for the builder of a system is "when possible, use commercial products". The idea is that a commercial product is more stable, better maintained, and cheaper in the long run.

Today, that advice might be amended to "use commercial or open source products". The idea is to use a large, ready-made component to do the work, rather than write your own.

For some tasks, a commercial product does make sense. It's better to use the commercial (or open source) compilers for C#, Java, and C++ programs than to write your own compilers. It's better to use the commercial (or open source) word processors and spreadsheets.

Using off-the-shelf applications is a pretty easy decision.

Assembling systems with commercial (or open source) products is a bit trickier.

The issue is dependencies. When building a system, your system becomes dependent on the off-the-shelf components. It also becomes dependent on any external web services.

For some reason, we tend to think of off-the-shelf components (or external web services) as long-lasting. We tend to think that they will endure forever. Possibly because we want to forget about them, to use them like electricity or water -- something that is always available.

Commercial products do not live forever. Popular products such as WordPerfect, Lotus Notes, and Windows XP have all been discontinued. If you build a system on a commercial product and it (the commercial product) goes away, what happens to your system?

Web services do not live forever. Google recently terminated its RSS Reader. Other web services have been born, lived, and died. If you build your system on a web service and it (the web service) goes away, what happens to your system?

Product management is about many things, and one of them is dependency management. A responsible product manager knows about the components used within the product (or has people who know). A responsible product manager keeps tabs on those components (or has people who keep tabs). A responsible product manager has a plan for alternative components, should the current ones be discontinued.

Leveraging off-the-shelf products is a reasonable tactic for system design. It can save time and allow your people to work on the critical, proprietary, custom components that are needed to complete the system. But those off-the-shelf components still require a watchful eye and planning; they do not relieve you of your duties. They can fail, they can be discontinued, and they can break on new versions of the operating system (yet another off-the-shelf component, and another dependency).

Build your system. Rebuild your legacy system. Use off-the-shelf components when they make sense. Use external web services when they make sense.

Stay aware of those components and stay aware of their status in the market. Be prepared for change.

Thursday, December 5, 2013

The future popularity of languages

The popularity of a programming language matters. Popular languages are, well, popular -- they are used by a large number of people. That large user base implies a large set of available programmers who can be hired (important if you are running a project), a large set of supporting documents and web sites (important for managers and hobbyists), and significant sales for tools and training (important if you sell tools and training).

But there are two senses of popular. One sense is the "lots of people like it now" sense; the other is the "lots of people will like it in the future" sense. This distinction was apparent at the beginning of my career, when PCs were new and the established languages, COBOL and FORTRAN, were not the BASIC, Pascal, and C of the PC revolution.

COBOL and FORTRAN were the heavyweights, the languages used by serious people in the business of serious programming. BASIC and Pascal and C (at least on PCs) were the languages of hobbyists and dreamers, "toy" languages on "toy" computers. Yet it was C that gave us C++ and eventually Java and C#, the heavyweight languages of today.

The measurements of language popularity blur these two groups of practitioners and dreamers. The practitioners use the tools for established systems and existing enterprises: in the 1970s and 1980s they used COBOL and today they use C++, C#, and Java. The dreamers of the 1970s used BASIC, C, Forth, and Pascal; today they use... well, what do they use?

The Programming Language Popularity web site contains a number of measures. I think that the most "dreamish" of these is the Github statistics. Github is the site for open source projects of all sizes, from enterprise level down to individual enthusiast. It seems a better measure than the "Craigslist" search (which would be weighted towards corporate projects) or the "Google" search (which would include tutorials and examples but perhaps little in the way of dreams).

The top languages in the "Github" list are:

  • Objective C
  • JavaScript
  • Ruby
  • Java
  • Python
  • PHP

A little later down the list (but still in the top twenty) are: Scala, Haskell, Clojure, Lua, and Erlang.

I think that the "Github" list is a good forward indicator for language popularity. I believe that some of these languages will be the future mainstream development languages.

Which ones exactly, I'm not sure.

Java is already a mainstream language; this index indicates a continued interest. I suspect JavaScript will have a bright future quite soon, with Microsoft supporting it for app development. Apple iOS uses Objective-C, so that one is also fairly obvious.

Languages rise and fall in popularity. Project managers would do well to track the up-and-coming languages. Software tends to live for a long time; if you want to stay with mainstream programming languages, you must move with the mainstream. Looking ahead may help with that effort.

Sunday, December 1, 2013

Echo chambers in the tech world

We have echo chambers in the tech world. Echo chambers are those channels of communication that reinforce certain beliefs, sometimes correct and sometimes not. They exist in the political world, but are not limited to that milieu.

The Apple world has the belief that they are immune to malware, that viruses and other nasty things happen only to Windows and Microsoft products. The idea is "common knowledge", and many Macintosh owners will confirm it. But the idea is more than common; it is self-reinforcing. Should a Macintosh owner say "I'm going to buy anti-virus software", other Mac owners will convince (or attempt to convince) him otherwise.

The echo chamber of the Apple world reinforces the idea that Apple products are not susceptible to attack.

There is a similar echo chamber for the Linux world.

The Microsoft world has an opposite echo chamber, one that insists that Windows is not secure and extra software is required.

These are beliefs, created in earlier times, that endure. People keep the ideas from that earlier time: they have trained themselves to think of Windows as insecure. Microsoft Windows was insecure, but it is now much more secure (though still not perfectly secure). Similarly, Apple products (and Linux) are not completely secure, but people have trained themselves to think that they are.

I will make some statements that people may find surprising and perhaps objectionable:

  • Microsoft Windows is fairly secure (not perfect, but pretty good)
  • Apple MacOS X is not perfect and has security flaws
  • Linux (any variant) is not perfect and has security flaws

We need to be aware of our echo chambers, our dearly-held "common knowledge" that may be false. Such ideas may be comforting, but they lead us away from truth.

Tuesday, November 26, 2013

Microsoft generally doesn't innovate

The cool reception of Microsoft's Surface tablets, Windows 8, and Windows RT has people complaining about Microsoft's (in)ability to innovate. Somehow, people have gotten the idea that Microsoft must innovate to stay competitive.

Looking at the history of Microsoft products, I cannot help but think that innovation has been only a small part of their success. Microsoft has been successful due to its marketing, its contracts, its monopoly on Windows, and its proprietary formats.

Still not convinced? Consider these Microsoft products, and their "innovativeness":

  • MS-DOS: purchased from another company
  • C compiler: purchased from Lattice, initially
  • Windows: a better version of OS/2, or MacOS, or AmigaDOS
  • SourceSafe: purchased from another company
  • Visual Studio: a derivation of their earlier IDE, cloned from Borland's TurboPascal
  • C# and .NET: a copy of Java and the JVM
  • Windows RT: a variant of Apple's iOS
  • Azure: a product to compete with Amazon.com's web services and cloud
  • the Surface tablet: a variant on Apple's iPad
  • Word: a better version of WordPerfect
  • Excel: a better version of Microsoft's Multiplan, made to compete with Lotus 1-2-3
  • the Xbox: a game console to compete with Sony and Nintendo game consoles

I argue that Microsoft is an excellent copier of ideas, not an innovator. None of these products was an innovation by Microsoft.

Some might observe that the list of Microsoft products is much longer than the one I have presented. Others may observe that Microsoft continually improves its products, especially after purchasing one from another company. (That is certainly the case for their C compiler, Visual SourceSafe, and C#.)

To be fair, I should list the innovative products from Microsoft:

  • Microsoft BASIC
  • Visual BASIC

Microsoft is not devoid of innovation. But innovation is not Microsoft's big game. Microsoft is better at copying existing products and technologies, re-casting them into Microsoft's own product line, and improving them over time. Those are their strengths.

People may decry Microsoft's lack of innovation. But this is not a new development. Over its history, Microsoft has focussed on other strategies, and gotten good results.

I don't worry about Windows 8 and the Surface tablets being "non-innovative". They are useful products, and I have confidence in Microsoft's abilities to make them work for customers.

Monday, November 25, 2013

The New Aristocracy

The workplace is an interesting study for psychologists. It has many types of interactions and many types of stratifications of employees. The divisions are always based on rank; the demonstrations of rank are varied.

I worked in one company in which rank was indicated by the type, size, and location of one's workspace. Managers were assigned offices (with doors and solid walls), senior technical people were assigned cubicles next to windows, junior technical employees were assigned cubicles without windows, and contract workers were "doubled up" in windowless cubicles.

In another company, managers were issued color monitors and non-managers were issued (cheaper) monochrome monitors.

We excel at status symbols.

The arrival of tablets (and tablet apps) gives us a new status symbol. It allows us to divide workers into those who work with keyboards and those who work without keyboards. The "new aristocracy" will be, of course, those who work without keyboards. They will be issued tablets, while the "proletariat" will continue to work with keyboards.

I don't expect that this division will occur immediately. Tablets are quite different from desktop PCs and the apps for tablets must be different from desktop apps. It will take time to adapt our current applications to the tablet.

Despite their differences, tablets are -- so far -- much better at consuming information, while PCs are better at composing information. Managers who use information to make decisions will be able to function with tablets, while employees who must prepare the information will continue to do that work on PCs.

I expect that the next big push for tablet applications will be those applications used by managers: project planning software, calendars, dashboards, and document viewers.

The new aristocracy in the office will be those who use tablets.

Thursday, November 21, 2013

The excitement of new tech

GreenArrays has introduced the GA144 chip, which contains 144 F18 processors. They also have a prototyping circuit board for the GA144. These two offerings intrigue me.

The F18 is a processor that uses Forth as its instruction set. That in itself is interesting. Forth is a small, stack-oriented language, initially developed in the 1960s (Wikipedia asserts the origin at 1958) and created to run on diverse architectures. Like C, it is close to hardware and has a small set of native operations. The Forth language lets the user define new "words" and build their own language.

The GA144 has 144 of these processors.

The F18 and the GA144 remind me of the early days of microcomputers, when systems like the Mark-8 and the Altair were available. These "homebrew" systems existed prior to the "commercial" offerings of the Apple II and the Radio Shack TRS-80. They were new things in the world, unlike anything we had seen before.

We were excited by these new microcomputers. We were also ignorant of their capabilities. We knew that they could do things; we didn't know how powerful they would become. Eventually, the commercial systems adopted the IBM PC architecture and the MS-DOS operating system (later, Windows) and became ubiquitous.

I'm excited by the GA144. It's new, it's different, and it's potent. It is a new approach to computing. I don't know where it will take us (or that it will succeed at taking us anywhere) -- but I like that it offers us new options.

Wednesday, November 20, 2013

We need a new UML

The Object Management Group has released a new version of UML. The web site for Dr. Dobb's asks the question: Do You Even Care? It's a proper question.

It's proper because UML, despite a spike of interest in the late 1990s, has failed to move into the mainstream of software development. While the Dr. Dobb's article claims ubiquity ("dozens of UML books published, thousands of articles and blogs posted, and thousands of training classes delivered"), UML is anything but ubiquitous. If anything, UML has been ignored by the latest trends in software: agile development techniques and functional programming. UML assumes large projects and large teams that design the system up front and implement it according to detailed documents. It assumes systems built with mutable objects, while functional programming avoids both objects and mutable state.

UML was built to help us design and build large complex systems. It was meant to abstract away details and let us focus on the structure, using a standard notation that could be recognized and understood by all practitioners. We still need those things -- but UML doesn't work for a lot of projects. We need a new UML, one that can work with smaller projects, agile projects, and functional programming languages.

Sunday, November 17, 2013

The end of complex PC apps

Businesses are facing a problem with technology: PCs (and tablets, and smart phones) are changing. Specifically, they are changing faster than businesses would like.

Corporations have many programs that they use internally. Some corporations build their own software, others buy software "off the shelf". Many companies use a combination of both.

All of the companies with whom I have worked wanted stable platforms on which to build their systems and processes. Whether it was a complex program built in C++, a comprehensive model built in a spreadsheet, or an office suite (word processor, spreadsheet, and e-mail), companies wanted to invest their effort in their custom solutions. They did not want to spend money or time on upgrades and changes to the operating system or commercially available applications.

While they dislike change, corporations are willing to upgrade systems. Corporations want long upgrade cycles. They want gentle upgrade paths, with easy transitions from one version to the next. They were happy with the old Microsoft world: Windows NT, Windows 2000, and Windows XP were excellent examples of the long, gentle upgrades desired by corporations.

That is no longer the world of PCs. The new world sees fast update cycles for operating systems, with major updates that require changes to applications. Companies with custom-made applications have to invest time and effort in updating those applications to match the new operating systems. (Consider Windows Vista and Windows 8.) Companies with off-the-shelf applications have to purchase new versions that run on the new operating systems.

What is a corporation to do?

My guess is that corporations will seek out other platforms and move their apps to those platforms. My guess is that corporations will recognize the cost of frequent change in the PC and mobile platforms, and look for other solutions with lower cost.

If they do, then PCs will lose their title as the center of the development world. The PC platform will no longer be the primary target for applications.

What are the new platforms? I suspect the two "winning" platforms will be web apps (browsers and servers), and mobile/cloud (tablets and phones with virtualized servers). While the front ends for these systems undergo frequent changes, the back ends are relatively stable. The browsers for web apps are mostly stable and they buffer the app from changes to the operating system. Tablets and smart phones undergo frequent updates; this cost can be minimized with simple apps that can be updated easily.

The big trend is away from complex PC applications. These are too expensive to maintain in the new world of frequent updates to operating systems.

Thursday, November 14, 2013

Instead of simplicity, measure complexity

The IEEE Computer Society devoted their November magazine issue to "Simplicity in IT". Simplicity is a desirable trait, but I have found that one cannot measure it. Instead, one must measure its opposite: complexity.

Some qualities cannot be measured. I learned this lesson as a sysadmin, managing disk space for multiple users and groups. We had large but finite disk resources (resources are always finite), shared by different teams. Despite the size of those disks, the teams' combined usage exceeded what we had -- in other words, we "ran out of free space". My job was to figure out "where the space had gone".

I quickly learned that the goal of "where the space had gone" was the wrong one. It is impossible to measure, because space doesn't "go" anywhere. I substituted new metrics: who is using space, how much, and how does that compare to their usage last week? These were possible to measure, and more useful. A developer who uses more than four times the space of the next-heaviest user, and more than ten times the average, is (probably) working inefficiently.

The metric "disk space used by developer" is measurable. The metric "change in usage from last week" is also measurable. In contrast, the metric "where did the unallocated space go" is not.

The measure of simplicity is similar. Instead of measuring simplicity, measure the opposite: complexity. Instead of asking "why is our application (or code, or UI, or database schema) not simple?", ask instead "where is the complexity?"

Complexity in source code can be easily measured. There are a number of commercial tools, a number of open source tools, and I have written a few tools for my own use. Anyone who wants to measure the complexity of their system has tools available to them.

Measuring the change in complexity (such as the change from one week to the next) involves taking measurements at one time and storing them, then taking measurements at a later time and comparing them against the earlier measurements. That is a little more complex than merely taking measurements, but not much more complicated.
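
The bookkeeping can be as simple as this sketch in Python (the snapshot file name and the shape of the measurements are assumptions; the numbers would come from whatever complexity tool you run):

```python
import json
from pathlib import Path

SNAPSHOT = Path("complexity_last_week.json")  # assumed snapshot file

def compare_and_save(current: dict) -> None:
    # Load last week's numbers (if any), report the change, save this week's.
    previous = json.loads(SNAPSHOT.read_text()) if SNAPSHOT.exists() else {}
    for module, value in sorted(current.items()):
        delta = value - previous.get(module, 0)
        print(f"{module:30s} {value:6d}  ({delta:+d} since last snapshot)")
    SNAPSHOT.write_text(json.dumps(current, indent=2))

# Module names and numbers invented for the example.
compare_and_save({"billing.py": 41, "reports.py": 87, "ui_main.py": 130})
```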

Identifying the complex areas of your system gives you an indicator. It shows you the sections of your system that you must change to achieve simplicity. That work may be easy, or may be difficult; a measure of complexity merely points to the problem areas.

* * * *

When I measure code, I measure the following:

  • Lines of code
  • Source lines of code (non-comments)
  • Cyclomatic complexity
  • Boolean constants
  • Number of directly referenced classes
  • Number of indirectly referenced classes
  • Number of directly dependent classes
  • Number of indirectly dependent classes
  • Class interface complexity (a count of member variables and public functions)

I find that these metrics let me quickly identify the "problem classes" -- the classes that cause the most defects. I can work on those classes and simplify the system.
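
Several of these can be collected with the standard library alone. A sketch in Python, using the ast module on Python source (the file comes from the command line, and the cyclomatic figure is the usual rough approximation: branch points plus one):

```python
import ast
import sys

BRANCHES = (ast.If, ast.For, ast.While, ast.ExceptHandler, ast.BoolOp, ast.IfExp)

def report(path: str) -> None:
    source = open(path).read()
    lines = source.splitlines()
    sloc = sum(1 for line in lines
               if line.strip() and not line.strip().startswith("#"))
    print(f"{path}: {len(lines)} lines, {sloc} source lines")
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            # Rough cyclomatic complexity: one plus the branch points.
            complexity = 1 + sum(isinstance(n, BRANCHES) for n in ast.walk(node))
            print(f"  {node.name}: cyclomatic complexity ~{complexity}")

report(sys.argv[1] if len(sys.argv) > 1 else __file__)
```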

Tuesday, November 12, 2013

Measuring technical debt is not enough

I've been working on the issue of technical debt. Technical debt is a big problem, one that affects many projects. (In short, technical debt is the result of short-cuts, quick fixes, and poor design. It leads to code that is complicated, difficult to understand, and risky to change.) Technical debt can exist within source code, a database schema, an interface specification, or any other aspect of a technical product. It can be large or small; it tends to start small and grow over time.

Technical debt is usually easy to identify. Typical indicators are:

  • Poorly named variables, functions or classes
  • Poorly formatted code
  • Duplicate code
  • An excessive use of 'true' and 'false' constants in code
  • Functions with multiple 'return' statements
  • Functions with any number of 'continue' statements
  • A flat, wide class hierarchy
  • A tall, narrow class hierarchy
  • Cyclic dependencies among classes

Most of these indicators can be measured -- with automatic means. (The meaningfulness of names cannot.) Measuring these indicators gives you an estimate of your technical debt. Measuring them over time gives you an estimate of the trajectory of technical debt -- how fast it is growing.
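
As a sketch of that automation in Python -- again the standard-library ast module, checking only a handful of the indicators above, with arbitrary thresholds:

```python
import ast
import sys

def debt_indicators(path: str) -> None:
    tree = ast.parse(open(path).read())
    for node in ast.walk(tree):
        if not isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            continue
        parts = list(ast.walk(node))
        returns = sum(isinstance(n, ast.Return) for n in parts)
        continues = sum(isinstance(n, ast.Continue) for n in parts)
        booleans = sum(isinstance(n, ast.Constant) and isinstance(n.value, bool)
                       for n in parts)
        # Thresholds are arbitrary; tune them to your own code base.
        if returns > 1 or continues > 0 or booleans > 2:
            print(f"{node.name}: {returns} returns, {continues} continues, "
                  f"{booleans} boolean constants")

debt_indicators(sys.argv[1] if len(sys.argv) > 1 else __file__)
```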

But measurements are not enough. What it comes down to is this: I can quantify the technical debt, but not the benefits of fixing it. And that's a problem.

It's a problem because very few people find such information appealing. I'm describing (and quantifying) a problem, and not quantifying (or even describing) the benefits of the solution. Who would buy anything with that kind of a sales pitch?

What we need is a measure of the cost of technical debt. We need to measure the effect of technical debt on our projects. Does it increase the time needed to add new features? Does it increase the risk of defects? Does it drive developers to other projects (thus increasing staff turnover)?

Intuitively, I know (or at least I believe) that we want to reduce technical debt. We want our code to be "clean".

But measuring the effect of technical debt is hard.

Monday, November 11, 2013

Cheap hardware means meetings are expensive

In the early days of computing, everything was precious. Hardware was expensive -- really expensive. Processors, memory, storage, and communications were so expensive that only large businesses and governments could afford computers. (Programmers were relatively cheap.)

Over time, the cost of hardware dropped, and the cost of people rose. In the 1970s and 1980s, there were several charts that showed two lines: a steadily decreasing line for hardware cost and a steadily increasing line for the cost of programmers.

Today, hardware is cheap. Memory is cheap, storage is cheap, network bandwidth is cheap, and even processors are cheap. The Raspberry Pi computer sells for about $30; add a few items and you have a complete system for under $200. Moreover, the GreenArrays GA144 chip offers 144 processors for $20; a usable system for an experimenter will run about $500.

My point is not that hardware has become cheap.

My point is that in the early days, when hardware was expensive, we formed processes and ideas for project management, and those ideas were based on the notion of expensive hardware. Our practices for program design were made to minimize the use of expensive resources. A sound idea -- at the time.

At that time, hardware was expensive and people were cheap. It was better to hold meetings to discuss ideas before testing them on the (expensive) hardware. It was better to hold design reviews. It was better to think and discuss before experimenting. Those ideas are the basis for project management today. Even with Agile Development techniques, the team decides on the features prior to the sprint.

With plentiful hardware (and plentiful software from open source), the cost equations have changed. It is now cheap to experiment and expensive to hold meetings. It is especially expensive to hold meetings in a single place, with all attendees present. The attendees have to travel to the meeting, they have to sit through sections of the meeting that don't apply to them, and they have to return. This cost is (relatively) low when all attendees are already co-located and the meeting is short; the cost grows as you add attendees and topics.

Technology has reduced the cost of hardware. Agile techniques and global competition have reduced the cost of software development. Now we face the cost of meetings. Is anyone measuring the cost of meetings? How much of a project's budget is spent on meetings? Once this number is known, I suspect that meetings will become a target for cost reduction.
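
The arithmetic is not hard; it is just rarely done. A back-of-the-envelope sketch in Python, with every figure invented for the example:

```python
# Invented figures: a recurring one-hour status meeting.
attendees = 12
loaded_hourly_rate = 75.0   # salary plus overhead, in dollars
duration_hours = 1.0
meetings_per_week = 3
project_weeks = 26

per_meeting = attendees * loaded_hourly_rate * duration_hours
per_project = per_meeting * meetings_per_week * project_weeks
print(f"one meeting:   ${per_meeting:,.0f}")
print(f"whole project: ${per_project:,.0f}")
```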

Which is not to say that projects need no coordination. Far from it. As projects become larger and more ambitious, coordination becomes more important.

But the form of coordination doesn't necessarily have to be meetings.

Wednesday, November 6, 2013

More was more, but now less is more

IBM and Microsoft built their empires with the strategy "bigger and more features". IBM mainframes, over time, became larger (in terms of processor speed and memory capacity) and included more features. Microsoft software, over time, became larger (in terms of capacity) and included more features.

It was a successful strategy. IBM and Microsoft could win any "checklist battle" which listed the features of products. For many managers, the product with the largest list of features is the safest choice. (Microsoft and IBM's reputations also helped.)

One downside of large, complicated hardware and large, complicated software is that it leads to large, complicated procedures and data sets. Many businesses developed their operating procedures first around IBM equipment and later around Microsoft software. As those procedures developed, it was natural for complexity to increase over time. New business cases, new exceptions, and special circumstances all add to complexity.

Businesses are trying to leverage mobile devices (tablets and phones) and finding that their long-established applications don't "port" easily to the new devices. They are focussing on the software, but the real issue is their processes. The complex procedures behind the software are making it hard to move business to mobile devices.

The user interfaces on mobile devices limit applications to much simpler operations. Perhaps our desire for simplicity comes from the size of the screen, or the change from mouse to touch, or from the fact that we hold the devices in our hands. Regardless of the reason, we want mobile devices to have simple apps.

Complicated applications of the desktop, with drop-down menus, multiple dialogs, and oodles of options simply do not "work" on a mobile device. We saw this with early hand-held devices such as the popular Palm Pilot and the not-so-popular Microsoft PocketPC. Palm's simple operation won over the more complex Windows CE.

Simplicity is a state of mind, one that is hard to obtain. Complicated software tempts one into complicated processes (so many fonts, so many spreadsheet formula operations, ...). Mobile devices demand simplicity. With mobile, "more" may be more, but it is not better. The successful businesses will simplify their procedures and their underlying business rules (perhaps the MBA crowd will prefer the words "streamline" or "optimize") to leverage mobile devices.


Monday, November 4, 2013

For local storage, we get what we want

I hear many complaints about IT equipment, but I have heard few complaints about the cost of storage (that is, disk drives). It wasn't always this way.

In the early microcomputer era, storage was cassette tape. If you were wealthy, storage was floppy disk. Floppy disk systems (and media) were not cheap. They cost both money and time; once you had the hardware, you needed to write the software to use it. (CP/M fixed some of that.)

The early hard drives were large, hulking beasts that required special power supplies, dedicated cabinets, and extra care. My first hard drive was a 10 Megabyte disc that was the size of an eight inch floppy drive (or an old-style, paper edition of Webster's Dictionary). The cabinet (which contained the hard drive, an actual eight-inch floppy drive, and power supply) was the size of a microwave oven and weighed more than fifty pounds. The retail price was $5000 -- in 1979 dollars, more than an automobile.

Beyond the money, the time necessary to configure such a drive was non-trivial. Operating systems at the time could access at most 8 Megabytes. You could not use the entire hard drive at one time. Hard drives of 10 Megabytes were partitioned into smaller volumes with special software. These partitioning operations also took time.

People who had hard drives really wanted them. People who didn't have them complained about the cost. People who did have them complained about the time to configure them.

Over time, hard disks became smaller in physical size and required less power. Today, the "standard" hard drive is either a 3.5-inch or 2.5-inch drive that holds a Terabyte and costs less than $200 (in 2013 dollars). Adding such a drive to your existing PC is easy: plug it in and the operating system detects it automatically. Operating systems can address most commonly available hard drives; partitioning is no longer necessary. The "cost" of a hard drive, in terms of money and time, is trivial compared to the prices of that earlier age.

Which leads to a question: If we were, earlier, willing to spend time and money (lots of time and lots of money) on a hard drive, why are we unwilling to spend that time and money now? We could, after all, create disk arrays with huge amounts of storage (petabytes) by ganging together multiple hard drives into a cabinet. Fifty pounds of hard drives, power supply, and interface electronics could store lots and lots of data.

But we don't. We accept the market solution of 2 Terabyte drives and live with that.

The market for the early hard drives was the tinkerers and hackers. These were the folks who enjoyed configuring systems and re-writing operating systems. Today, those are a small percentage of the PC market. Current plug-compatible hard drives of 2 Terabytes or less are, for most people, good enough. These two factors tell me that capacity is good, but convenience is better.

We live with what the market offers (except for a few ornery hackers). We live with the trade-off between cost and convenience. I think we recognize that trade-off. I think we understand that we are living with certain choices.

And I think that we don't complain, because we understand that we have made that choice.

Monday, October 28, 2013

The Cult of Accountability

The disappointing performance of the medical insurance exchange web site (the "Obamacare web site") shows the dark side of current project management techniques. After an initial specification, a long wait while multiple teams worked on various parts of the system, and a hasty integration, the web site has numerous, significant problems. Now we have calls from a group I call the "Cult of Accountability". They (the cult) want to know "who is responsible" for the failure.

Big projects often work this way: A large project is assigned to a team (for government projects, the "prime contractor") along with specifications of the deliverable. That team breaks the project into smaller components and assigns components to teams, internal or external (the "sub-contractors") along with specifications for those components. When the work is complete, the work moves in the reverse direction, with the bottom layer of teams providing their components to the next higher layer, those teams assembling the components and providing the results to the next higher layer, until the top team assembles components into a finished product.

This cycle of functional decomposition and specification continues for some number of levels. Notice that each team starts with a specification, divides the work into smaller pieces, and provides specifications to the down-stream teams.

The top-down design and project planning for many projects is a process that defines tasks, assigns resources, and specifies delivery dates up front. It locks in a deliverable of a specified functionality, a particular design, and a desired level of quality, all on an agreed date. It defines the components and assigns responsibility for each component.

The "divide and conquer" strategy works... if the top team knows everything about the desired deliverable and can divide the work into sensible components, and if the down-stream teams know everything about their particular piece. This is the case for work that has already been done, or work that is very similar to previous work. The assembly of automobiles, for example: each car is a "product" and can be assembled by following well-defined tasks. The work can be divided among multiple teams, some external to the company. The specifications for each part, each assembly, each component, are known and understood.

The "divide and conquer" strategy works poorly for projects that are not similar to previous work. Projects in "unexplored territory" contain a large number of "unknowns". Some are "known unknowns" (we know that we need to test the performance of our database with the expected level of transactions) and some are "unknown unknowns" (we didn't realize that our network bandwidth was insufficient until we went to production). "Unknowns" is another word for "surprises".

In project management, surprises are (usually) bad. You want to avoid them. You can investigate issues and resolve questions, if you know about them. (These are the "known unknowns".) But you cannot (by definition) plan for the "unknown unknowns". If you plan for them, they become "known unknowns".

Project planning must include an evaluation of unknowns, and project process must account for them. Projects with few unknowns can be run with "divide and conquer" (or "waterfall") methods. These projects have few latent surprises.

Projects with many unknowns should be managed with agile techniques. These techniques are better at exploring, performing work in small steps and using the experience from one step to guide later steps. They don't provide a specific date for delivery of all features; they provide a constantly working product with features added over time. They avoid the "big bang" at the end of a long development effort. You exchange certainty of feature set for certainty of quality.

The Cult of Accountability will never accept agile methods. They must have agreements, specific and detailed agreements, up front. In a sense, they are planning to fail -- you need the agreements only when something doesn't work and you need a "fall guy". With agile methods, your deliverable always works, so there is no "accountability hunt". There is no need for a fall guy.

Wednesday, October 23, 2013

Healthcare.gov was a "moonshot", but the Moon mission was not

Various folks have referred to the recent project to build and launch the healthcare.gov web site as a "moonshot". They are using the term to describe a project that:

  • is ambitious in scope
  • has a large number of participants
  • occurs in a short and fixed time frame
  • consists of a single attempt that will either succeed or fail

We in IT seem to thrive on "moonshot" type projects.

But I will observe that the NASA Moon project (the Mercury, Gemini, and Apollo missions) was not a "moonshot". NASA ran the project more like an agile project than the typical waterfall project.

Let's compare.

The NASA Moon project was ambitious. One could even call it audacious.

The NASA Moon project involved a (relatively) large number of participants, including rocket scientists, metallurgists, electrical engineers, chemists, psychologists, biologists, and radio specialists. (And many more.)

The NASA Moon project had a fixed schedule of "by the end of the decade" assigned by President Kennedy in 1961.

The NASA Moon project consisted of a number of phases, each with specific goals and each with subprojects. The Mercury flights established the technology and skills to orbit the Earth. The Gemini missions built on Mercury to dock two vehicles in space. The Apollo missions used that experience to reach the Moon.

It's this last aspect that is very different from the healthcare.gov web site project (and also very different from many IT projects). The NASA Moon program was a series of projects, each feeding into the next. NASA started with a high-level goal and worked its way to that goal. They did not start with a "master project plan" that defined every task and intermediate deliverable. They learned as they went and made plans -- sensible plans, based on their newly-won experience -- for later flights.

The healthcare.gov web site is an ambitious project. Its launch has been difficult, and shows many defects. Could it have been built in an agile manner? Would an agile approach have given us a better result?

The web site must perform several major tasks: authenticate users, verify income against government databases, and display valid plans offered by insurance companies. An agile approach would have built the web site in phases. Perhaps the first phase could have allowed people to register and create their profiles, the second verified income, and the third matched users with insurance plans. But such a "phased" release might have been received poorly ("what good is a web site that lets you register but nothing more?") and perhaps would not have been completed in time.

I don't know that agile methods would have made for better results at healthcare.gov.

But I do know that the Moon project was not a "moonshot".

Sunday, October 20, 2013

Java and Perl are not dead yet

Are Java and Perl dead?

The past few weeks have seen members of the Java community claim that Java is not dead. They have also seen members of the Perl community claim that Perl is not dead.

Many developers may want Perl to die (or Java to die, or COBOL to die) but when members of the community claim "X is not dead" then we should review the health of X.

The industry has seen many languages come and go. Of all the languages that we have seen since the beginning of the computer age, most are dead. Don't take my word; consider the languages documented by Jean Sammet in the late 1960s. They include: A-2, A-3, ADAM, AED, AIMACO, Algol, ALTRAN, AMBIT, AMTRAN, APL, and APT (and that is just the entries under 'A'). Of these, the only language that can be said to be alive today is Algol -- and even that is pushing the definition of 'alive'.

Languages die. But not all -- popular languages live on. Long-lasting languages include COBOL, Fortran, C, C++, and Perl. One could add Java and C# to that list, depending on one's definition of "long-lasting". (I think that there is no debate about their popularity.)

Back to Java and Perl.

I think that Java and Perl are very much alive, in that many projects use them. They are also alive in that new versions are built and released by the maintainers. Perl is an open source language, supported by its community. Java is supported by Oracle.

But Java and Perl are, perhaps, not as popular as they used to be. When introduced, Java was a shiny new thing compared to the existing C++ champion. Perl's introduction was quieter and it developed a following slowly. But slow or fast, they were considered the best in programming: Java for object-oriented programming and Perl for scripting.

Today, Java and Perl have competition. Java has C#, a comparable language for object-oriented programming. It also has the JVM languages Scala and Clojure (and others) that have taken the "shiny new thing" title away from Java. Perl has competition from Python and Ruby, two other capable scripting languages.

Java and Perl are no longer the clear leaders. They are no longer the "obvious best" languages for development. When starting a new project, people often pick one of the competition. I think that it is this loss of "obvious best" position that is causing the angst in their respective development communities.

Now, Java and Perl are still capable programming languages. I wouldn't abandon them. For projects that use them today, I would continue to use them.

For new projects... I would consider my options.

Thursday, October 17, 2013

For PCs, small may be the new big thing

PCs have had the same size and shape (roughly) for the past thirty years. While we have seen improvements to processors (faster), memory (more), video adapters (faster and more memory), hard disks (bigger), and communication ports (faster, more, and simpler), the general design of a PC has been stagnant. The color may have changed from IBM's beige to Compaq's brown, to Dell's white, and to HP's black-and-silver, but the PC box has remained... a box.

In the early PC days, the large, spacious box with its expansion slots made sense: PCs needed expansion and customization. The "base" PC was not enough for corporate work. When we bought a PC, we added video cards, memory cards, serial and parallel port cards, terminal emulator cards, and network cards. We even added cards with real-time clocks. It was necessary to open the PC and add these cards.

Over the years, more and more "extra" features became "standard". The IBM PC AT came with a built-in real-time clock, which eliminated one card. Memory increased. Hard drives became larger and faster. The serial ports and parallel ports were replaced by USB ports. Today's PC has enough memory, a capable video card, a big enough hard disk, a network interface, and ample USB ports. (Apple computers have slightly different communication options, but enough.)

The one constant in the thirty years of change has been the size of the PC. The original IBM PC was about the size of today's tower PC. PCs still have the card slots and drive bays for expansion, although few corporate users need such things.

That's about to change. PCs will shrink from their current size to one of two smaller sizes: small and nothing. The small PCs will be roughly the size of the Apple Mac Mini: a box a few inches on a side, with ports and no expansion capabilities. The "nothing" size PCs will be virtual machines, existing only in larger computers. (Let's focus on the "small" size. We can discuss virtual PCs another time.)

The small PCs have all the features of a real PC: processor, memory, storage, video, and communications. They may have some compromises, with perhaps not the fastest processors and the most capable video cards, but they are good enough. They can run Windows or Linux, and the Apple Mac Mini runs OS X, of course. All you need is a display, a keyboard, and a network connection. (These small-form PCs often have wired network interfaces rather than wireless.)

I suppose that we can give credit to Apple for the change. Apple's Mac Mini showed that there was a steady demand for smaller, non-PC-shaped PCs. Intel has its "Next Unit of Computing" or NUC device, a small 4-inch by 4-inch PC with communication ports.

Other manufacturers had built small PCs prior to Apple's Mac Mini (the Shuttle PC is a notable pioneer) but received little notice.

The Arduino, the Raspberry Pi, and the BeagleBone are also small-form devices, designed mainly for tinkerers. I expect little interest from the corporate market in these devices.

But I do expect interest in the smaller "professional" units from Apple and Intel. I also expect to see units from other manufacturers like Lenovo, Asus, HP, and Dell.

Small will be the new big thing.

Monday, October 14, 2013

Executables, source code, and automated tests

People who use computers tend to think of the programs as the "real" software.

Programmers tend to have a different view. They think of the source code as the "real" software. After all, they can always create a new executable from the source code. The generative property of source code gives it priority over the merely runnable property of executable code.

But that logic leads to an interesting conclusion. If source code is superior to executable code because the former can generate the latter, then how do we consider tests, especially automated tests?

Automated tests can be used to "generate" source code. One does not use tests to generate source code in the same, automated manner that a compiler converts source code to an executable, but the process is similar. Given a set of tests, a framework in which to run the tests, and the ability to write source code (and compile it for testing), one can create the source code that produces a program that conforms to the tests.

That was a bit of a circuitous route. Here's the concept in a diagram:


     automated tests --> source code --> executable code


This idea has been used in a number of development techniques. Test-driven development (TDD), extreme programming (XP), and agile methods all use the concept of "test first, then code", in which automated tests are defined first and only then is code changed to conform to the tests.

The advantage of "test first" is that you have tests for all of your code. You are not allowed to write code "because we may need it someday". You either have a test (in which case you write code) or you don't (in which case you don't write code).

A project that follows the "test first" method has tests for all features. If the source code is lost, one can re-create it from the tests. Granted, it might take some time -- this is not a simple re-compile operation. A complex system will have thousands of tests, perhaps hundreds of thousands. Writing code to conform to all of those tests is a manual operation.

But it is possible.

A harder task is going in the other direction, that is, writing tests from the source code. It is too easy to omit cases, to skip functionality, to misunderstand the code. Given the choice, I would prefer to start with tests and write code.

Therefore, I argue that the tests are the true "source" of the system, and the entity we consider "source code" is a derived entity. If I were facing a catastrophe and had to pick one (and only one) of the tests, the source code, or the executable code, I would pick the tests -- provided that they were automated and complete.

Sunday, October 13, 2013

Unstructured data isn't really unstructured

The introduction of NoSQL databases has brought along another concept: unstructured data. Advocates of NoSQL are quick to point out that relational databases are limited to structured data, and NoSQL data stores can handle unstructured (as well as structured) data.

I think that word does not mean what you think it means.

I've seen lots of data, and all of it has been structured. I have yet to meet unstructured data. All data has structure -- the absence of structure implies random data. Even the output of a pseudo-random number generator is structured; it is a series of numeric values.

When people say "structured data", they really mean "a series of objects, each conforming to a specific structure, known in advance". Tables in a relational database certainly follow this rule, as do records in a COBOL data declaration.

NoSQL data stores relax these constraints, but still require that data be structured. If a NoSQL data store uses JSON notation, then each object must be stored in a manner consistent with JSON. The objects in a set may contain different properties, so that one object has a structure quite different from the next object, but each object must be structured.
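A small example may help. The sketch below is plain Python with the standard json module; the field names are made up for illustration. The two objects are headed for the same document collection and share no schema, yet each one is perfectly structured.


     import json

     # Two "customer" documents intended for the same NoSQL collection.
     # They do not share a layout, but each one is fully structured.
     customer_a = {"id": 1, "name": "Ada", "email": "ada@example.com"}
     customer_b = {"id": 2, "name": "Brendan",
                   "phones": ["555-0100", "555-0101"],
                   "loyalty": {"tier": "gold", "points": 1200}}

     for doc in (customer_a, customer_b):
         print(json.dumps(doc))  # both serialize cleanly; only the layouts differ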

This notion is not new. While COBOL and FORTRAN were efficient at processing homogeneous records, Pascal allowed for "variant" records (it used a key field at the beginning of the record to identify the record type and layout).

What is new is that the object layout is not known in advance. Earlier languages and database systems required the design of the data up front. A COBOL program would know about customer records, for example, and a FORTRAN program would know about data observations. The structure of the data was "baked in" to the program. A new type of customer, or a new type of data set, would require a new version of the program or database schema.

NoSQL lets us create new structures without changing the program or schema. We can add new fields and create new objects for storage and processing, without changing the code.

So as I see it, it's not that data is unstructured. The idea is that we have reduced the coupling between the data and the program. Data is still structured, but the structure is not part of the code.

Thursday, October 10, 2013

Hadoop shows us a possible future of computing

Computing has traditionally been processor-centric. The classic model of computing has a "central processing unit" which performs computations. The data is provided by "peripheral devices", processed by the central unit, and then routed back to peripheral devices (the same as the original devices or possibly others). Mainframes, minicomputers, and PCs all use this model. Even web applications use this model.

Hadoop changes this model. It is designed for Big Data, and the size of the data requires a new model. Hadoop stores your data in segments across a number of servers (with redundancy to prevent loss), each segment being 64MB to 2GB in size. If your data is smaller than 64MB, moving to Hadoop will gain you little. But that's not important here.

What is important is Hadoop's model. Hadoop moves away from the traditional computing model. Instead of a central processor that performs all calculations, Hadoop leverages servers that can hold data and also perform calculations.

Hadoop makes several assumptions:

  • The code is smaller than the data (or a segment of data)
  • Code is transported more easily than data (because of size)
  • Code can run on servers

With these assumptions, Hadoop builds a new model of computing. (To be fair, Hadoop may not be the only package that builds this new model of distributed processing -- or even the first. But it has a lot of interest, so I will use it as the example.)

All very interesting. But here is what I find more interesting: the distributed processing model of Hadoop can be applied to other systems. Hadoop's model makes sense for Big Data, and systems with Little (that is, not Big) data should not use Hadoop.

But perhaps smaller systems can use the model of distributed processing. Instead of moving data to the processor, we can store data with processors and move code to the data. A system could be constructed from servers holding data, connected with a network, and mobile code that can execute anywhere. The chief tasks then become identifying the need for code and moving code to the correct location.
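As a rough illustration of "moving code to the data", here is a word-count sketch in the style of Hadoop Streaming, where the mapper and reducer are small scripts that read standard input and write standard output. In a real job these would be two separate scripts handed to the streaming framework, which ships them to the servers holding the data segments; the job configuration is omitted here.


     import sys

     def mapper():
         # Runs on the server holding a segment of the data:
         # emit "word<TAB>1" for every word in the local segment.
         for line in sys.stdin:
             for word in line.split():
                 print("%s\t1" % word)

     def reducer():
         # Receives the mapper output sorted by word and sums the counts.
         current, total = None, 0
         for line in sys.stdin:
             word, count = line.rsplit("\t", 1)
             if word != current:
                 if current is not None:
                     print("%s\t%d" % (current, total))
                 current, total = word, 0
             total += int(count)
         if current is not None:
             print("%s\t%d" % (current, total))


The code that travels is a few dozen lines; the data it runs against may be gigabytes. That asymmetry is the heart of the model.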

That would give us a very different approach to system design.

Sunday, October 6, 2013

Transformation projects

An organization that maintains software can run several types of projects. Some are simple, others are complex. One of the most complex projects can be what I call "transformation projects", which change the nature of the software.

Examples of these transformation projects are:

  • Upgrading to a new version of the compiler or IDE
  • Changing from one database to another
  • Expanding to another computing platform
  • Converting from one language to another
  • Moving from one computing architecture to another

Not all transformation projects are equal. Upgrading a compiler (say, Visual Studio 2010 to Visual Studio 2012) is a minor affair. The existing code still compiles. The existing design still works.

Compare that simple project to changing the system's database (for example, switching from MySQL to MariaDB, or possibly Microsoft SQL Server to Oracle). While SQL databases are mostly compatible, they are not exactly compatible. They use different languages for stored procedures. They have different techniques for the optimization of queries.

Transformation projects frequently add no new business features to the software. Consequently, many organizations perform them rarely, sometimes only when necessary. Such a strategy may be viewed as responsible, since by not spending time on upgrades you can focus your effort on features which bring measurable benefit to the business.

But such a strategy has a cost: you have no practice at these transformation projects, and consequently you often underestimate their time and risks. Frequent (or regular) transformation projects give you experience, which you can use to estimate the time and risk of future similar projects.

I should say "accurately estimate", since you can estimate any project, with or without experience. But experience gives us better estimates. For example, I (and I suspect many people) can accurately estimate the time needed to commute home from the office. They actually do commute home, so they have experience and can (reasonable) assume that future commutes home will take the same amount of time.

An estimate without the corresponding experience is subject to greater variance. As a counter-example to the "time to commute home" example, can you estimate the time it would take to walk to Argentina? (If walking is too long, you may substitute driving.) Such a project contains unknowns: tasks for which we have little or no experience. (In this example those would include crossing national borders, obtaining licenses to drive, and paying for gasoline in foreign countries.)

When an organization avoids transformation tasks (for whatever reason), it foregoes the experience of those projects. When it decides to perform one of those projects (because eventually you will need to perform at least one of them) it has no experience to use in the formulation of the plan. The project schedule may have wildly inaccurate estimates, or may omit important tasks.

The transition from web app to cloud app is such a transformation project. So is the transition from a desktop app to a web app, or a cloud app. If you are working on such a project and find that the project schedule is daunting, difficult to prepare, or just not working for you, look back on your experience with similar projects. If you have none (or very little), then perhaps experience with similar, smaller transformation projects can help.


Monday, September 30, 2013

Agile projects cannot overrun the budget

Some recent trade rags published stories about Agile-run projects and how they overran their budgets. This seems wrong, based on what I know about Agile methods.

First, a little history. Before "agile" projects, the dominant project method was "waterfall". Waterfall is a method for running a project. All projects involve a number of tasks: analysis, design, development, and testing. The waterfall process puts these in strict sequence, with each phase completed before the next begins. Thus, all analysis is performed, then all design, and then all development.

The beauty of waterfall is the schedule. The waterfall method promises a specific deliverable with a specific level of quality on a specific date.

The horror of waterfall is that it rarely delivers on schedule. It is quite common for waterfall projects to overrun their schedule. (Their cost estimates, too.)

Agile is not waterfall. The agile method takes a different approach and makes a different promise. It does not promise a specific deliverable on a specific date. Instead, it makes a promise of deliverability.

An agile project consists of a number of iterations. (Some people call these iterations "sprints". I'm not too attached to either name.) Each iteration adds a small set of features. Each iteration also uses automated tests to ensure that previous features still work. The project starts with a small set of initial features and adds features on each iteration. But the important idea is that the end of each iteration is a working product. It may not have everything you want, but everything that it has does work.

The promise of agile is that you can always deliver your product.

With the agile process, one cannot overrun a budget. (You can always stop when you want.) You can, instead, underdeliver. You may get to the end of your allotted time. You may use all of your allotted funds. But you always have something to deliver. If you run over your budget or schedule, it's because you chose to spend more time or money.

Sunday, September 22, 2013

The microcomputers of today

The microcomputer revolution was started with the MITS Altair 8800, the IMSAI 8080, and smaller computers such as the COSMAC ELF. They were machines made for tinkerers, less polished than the Apple II or Radio Shack TRS-80. They included the bare elements needed for a computer, often omitting the case and power supply. (Tinkerers were expected to supply their own cases and power supplies.)

While less polished, they showed that there was a market for microcomputers, and inspired Apple and Radio Shack (and lots of other vendors) to make and sell microcomputers.

Today sees a resurgence of small, "unpolished" computers that are designed for tinkerers. They include the Arduino, the Raspberry Pi, the BeagleBone, and Intel's MinnowBoard. Like the early, pre-Apple microcomputers, these small systems are the bare essentials (often omitting the power supply and case).

And like the earlier microcomputer craze, they are popular.

What's interesting is that there are no major players in this space. There are no big software vendors supplying software for these new microcomputers.

There were no major software vendors in the early microcomputer space. These systems were offered for sale with minimal (or perhaps zero) software. The CP/M operating system was adopted by users and adapted to their systems. CP/M's appeal was that it could be (relatively, for tinkerers) easily modified for specific systems.

The second generation of microcomputers, the Apple II and TRS-80 and their contemporaries, had a number of differences from the first generation. They were polished: they were complete systems with cases, power supplies, and software.

The second generation of microcomputers had a significant market for software. There were lots of vendors, the largest being Digital Research and Microsoft. Microsoft made its fortune by supplying its BASIC interpreter to just about every hardware vendor.

That market did not include the major players from the mainframe or minicomputer markets. Perhaps they thought that the market dynamics were not profitable -- they had been selling software for thousands of dollars (or tens of thousands) and packages in the microcomputer market sold for hundreds (or sometimes tens).

It strikes me that Microsoft is not supplying software to these new microcomputers.

Perhaps they think that the market dynamics are not profitable.

But these are the first generation of new microcomputers, the "unpolished" systems, made for tinkerers. Perhaps Microsoft will make another fortune in the second generation, as they did with the first microcomputer revolution.

Or perhaps another vendor will.

Thursday, September 19, 2013

Nirvanix collapse is not a cloud failure

The cloud service Nirvanix announced this week that it was closing its doors, and directing its customers to take their data elsewhere. Now, people are claiming this is a failure of cloud technology.

Let's be clear: the problems caused by Nirvanix are not a failure of the cloud. They are a business failure for Nirvanix and a supplier failure for its clients.

Businesses rely on suppliers for many items. No business is totally independent; businesses rely on suppliers for office space, computers, printers and paper, electricity, accounting and payroll services, and many other goods and services.

Suppliers can fail. Failure can be small (a delayed delivery, or the wrong item) or large (going out of business). Businesses must evaluate their suppliers and the risk of failure. Most supplies are commodities and can be easily found through competing suppliers. (Paper, for example.)

Some suppliers are "single-source". Apple, for instance, is the only supplier for its products. IBM PC compatibles are available from a number of sources (Dell, Lenovo, and HP) but MacBooks and iPads are available only from Apple.

Some suppliers are monopolies, and therefore also single sources. Utility companies are often local monopolies; you have exactly one choice for electric power, water, and usually cable TV.

A single-source supplier is a higher risk than a commodity supplier. This is obvious; when a commodity supplier fails you can go to another supplier for the same (or equivalent) item, and when a single-source supplier fails you cannot. It is common for businesses to look for multiple suppliers for the items they purchase.

Cloud services are, for the most part, incompatible, and therefore cloud suppliers are single-source. I cannot easily move my application from Amazon's cloud to Microsoft's cloud, for example. Being single-source, there is a higher risk involved with using them.

Yet many clients of cloud services have bought the argument "when you put something into the cloud, you don't have to worry about administration or back-up". This is false. Of course you have to worry about administration and back-up. You may have less involvement, but the work is still there.

And you also have the risk of supplier failure.

Our society chooses to regulate some suppliers. Utility companies are granted monopolies for efficiency (it makes little sense to run multiple water or power distribution networks) and are regulated to prevent failures. Some non-monopoly companies, such as banks and electricians, are regulated for the safety of the economy or of people.

Other companies, such as payroll companies, are not regulated, and clients must examine the health of a company before committing to them.

I expect that cloud services will be viewed as accounting services: important but not so important as to need regulation. It will be up to clients to choose appropriate suppliers and make contingency plans for failures.

Wednesday, September 18, 2013

Big Data proves the value of open source

Something significant happened with open source software in the past two years. An event that future historians may point to and say "this is when open source software became a force".

That event is Big Data.

Open source has been with us for decades. Yet for all the technologies we have, from the first plug-board computers to smart phones, from the earliest assemblers to the latest language compilers, from the first IDE to Visual Studio, open source software has always copied the proprietary tools. Open source tools have always been implementations of existing ideas. Linux is a functional copy of Unix. The open source compilers and interpreters are for existing languages (C, C++, Fortran, Java). LibreOffice and Open Office are clones of Microsoft Office. Eclipse is an open source IDE, an idea that predates the IBM PC.

Yes, the open source versions of these tools have their own features and advantages. But the ideas behind these tools, the big concepts, are not new.

Big Data is different. Big Data is a new concept, a new entry in the technology toolkit, and its tools are (just about) all open source. Hadoop, NoSQL databases, and many analytics tools are open source. Commercial entities like Oracle and SAS may claim to support Big Data, but their support seems less "Big Data" and more "our product can do that too".

A few technologies came close to being completely open source. Web servers are mostly open source, with stiff competition from Microsoft's (closed source) IIS. The scripting languages (Perl, Python, and Ruby) are all open source, but they are extensions of languages like AWK and the C Shell, which were not initially open source.

Big Data, from what I can see, is the first "new concept" technology that has a clear origin in open source. It is the proof that open source can not only copy existing concepts, but introduce new ideas to the world.

And that is a milestone that deserves recognition.

Tuesday, September 17, 2013

When programming, think like a computer

When programming, it is best to think like a computer. It is tempting to think like a human. But humans think very differently than computers (if we allow that computers think), and thinking like a human leads to complex programs.

This was brought home to me while reading William Conley's "Computer Optimization Techniques" which discusses the solutions to Integer Programming problems and related problems. Many of these problems can be solved with brute-force calculations, evaluating every possible solution and identifying the most profitable (or least expensive).

The programs for these brute-force methods are short and simple. Even in FORTRAN, they run less than fifty lines. Their brevity is due to their simplicity. There is no clever coding, no attempt to optimize the algorithm. The programs take advantage of the computer's strength of fast computation.
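Here is a sketch of that style, in Python rather than FORTRAN, with numbers I made up (not an example from Conley's book): enumerate every candidate production plan, score each one, and keep the best. No cleverness, just fast computation.


     from itertools import product

     profit = [4, 6, 9]   # profit per unit of three hypothetical products
     hours  = [2, 3, 5]   # labor hours per unit
     LIMIT  = 100         # total labor hours available

     best_value, best_plan = -1, None
     for plan in product(range(11), repeat=3):   # 0 to 10 units of each product
         labor = sum(h * n for h, n in zip(hours, plan))
         value = sum(p * n for p, n in zip(profit, plan))
         if labor <= LIMIT and value > best_value:
             best_value, best_plan = value, plan

     print(best_plan, best_value)


Even a modest machine evaluates all 1,331 candidate plans in a blink.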

Humans think very differently. They tire quickly of routine calculations. They can identify patterns and have insights into shortcuts for algorithms. They can take creative leaps to solutions. These are all survival skills, useful for dealing with an uncertain environment and capable predators. But they are quite difficult to encode into a computer program. So hard that it is often more efficient to use brute-force calculations without insights and creative leaps. The time spent making the program "smart" is larger than the time saved by the improved program.

Brute-force is not always the best method for calculations. Sometimes you need a smart program, because the number of computations is staggering. In those cases, it is better to invest the time in improvements. (To his credit, Conley shows techniques to reduce the computations, sometimes by increasing the complexity of the code.)

Computing efficiency (that is, "smart" programs) has been a concern since the first computing machines were made. Necessary at first, the need for efficiency drops over time. Mainframe computers became faster, which allowed for "sloppy" programs ("sloppy" meaning "anything less than maximum efficiency").

Minicomputers were slower than mainframes, significantly less expensive, and another step away from the need for optimized, "smart" programs. PCs were another step. Today, smart phones have more computing power than PCs of a few years ago, at a fraction of the price. Cloud computing, a separate branch in the evolution of computing, offers cheap, readily-available computing power.

I won't claim that computing power is (or will ever be) "too cheap to meter". But it is cheap, and it is plentiful. And with cheap and plentiful computing power, we can build programs that use simple methods.

When writing a computer program, think like a computer. Start with a simple algorithm, one that is not clever. Chances are, it will be good enough.

Monday, September 16, 2013

Software recalls

The IEEE Spectrum web site reports that a software package was recalled. United Healthcare recalled something called "Picis ED Pulsecheck"; the problem relates to notes made by physicians (who would be the users).

I recognize that software for medical records is important. Defects in "normal" software can lose information, but defects in medical records can lead to incorrect diagnoses, incorrect treatments, and complications to the patient -- even death. Software in the medical domain must be correct.

Yet the idea of a "recall" for software seems primitive. Unusual, also; the is the first time I heard of a recall for software.

Recalls make sense for physical products, especially those that pose some danger to the owner, like automobiles with faulty brake systems or lamps with incorrect wiring.

A recall forces the owner to return the product, or bring the product to a service center where it can be repaired.

Software is different from a physical product. It doesn't exist in a tangible form. For recalls of physical products, manufacturers must keep careful records of the repairs made to each unit. With software, I can install it on several computers; do I bring in each copy?

But more than the number of copies, the basic idea of a recall for software seems... wrong. Why force someone to remove software from their location and "bring it in for repair"?

Why not send an update?

In software, we don't think of recalls. Instead we think of updates. (Or "patches", depending on our software subculture.)

All of the major software manufacturers (Microsoft, Apple, Adobe) send updates for their software. Today, those updates are delivered through the internet. Now, perhaps the medical software is on systems that are not connected to the internet (a reasonable security precaution) but updates can be delivered through means other than the internet.

Now, maybe United Healthcare has a good reason for issuing a recall and not sending out updates. Maybe their product is covered by federal or state laws that mandate recalls. Or maybe their corporate mindset is one of products and liability, and they choose to issue recalls. I don't know. (United Healthcare chose not to consult with me before issuing the recall.)

It's not important what United Healthcare does, or why. It's important what you do with your software and your customers. You can issue recalls (if you want) or updates (if you want) or both -- or neither. I encourage you to think about the choices you make. That's the important item here.

Sunday, September 15, 2013

Virtualization and small processors

From the beginning of time (for electronic data processing) we have desired bigger processors. We have wanted shorter clock cycles, more bits, more addressable memory, and more powerful instruction sets, all for processing data faster and more efficiently. With time-sharing we wanted additional controls to separate programs, which led to more complex processors. With networks and malware we added additional complexity to monitor processes.

The history of processors has been a (mostly) steady upwards ramp. I say "mostly" because the minicomputer revolution (ca. 1965) and the microcomputer revolution (1977) saw the adoption of smaller, simpler processors. Yet these smaller processors also increased in complexity over time. (Microprocessors started with the humble 8080 and advanced to the Z-80, the 8086, and the 80286, eventually leading to today's Pentium-derived processors.)

I think that virtualization gives us an opportunity for smaller, simpler processors.

Virtualization creates a world of two levels: the physical and the virtual. The physical processor has to keep the virtual processes running, and keep them isolated. The physical processor is a traditional processor and follows traditional rules: more is better, and keep users out of each others' hair.

But the virtual processors, they can be different. Where is it written that the virtual processor must be the same as the host processor? We've built our systems that way, but is it necessary?

The virtualized machine can be smaller than the physical host, and frequently is. It has less memory, smaller disks, and in general a slower (and usually simpler) processor. Yet a virtual machine is still a full PC.

We understand the computing unit known as a "PC". We've been virtualizing machines in these PC units because it has been easy.

A lot of that "standard PC" contains complexity to handle multiple users.

For cheap, easily created virtual machines, is that complexity really necessary?

It is if we use the virtual PC as we use a physical PC, with multiple users and multiple processes. If we run a web server, then we need that complexity.

But suppose we take a different approach to our use of virtual machines. Suppose that, instead of running a complex program like a web server or a database manager, we handle simple tasks. Let's go further and suppose that we create a virtual machine that is designed to handle only one specific task, and that one task is trivial in comparison to our normal workload.

Let's go even further and say that when the task is done, we destroy the virtual machine. Should we need it again, we can create another one to perform the task. Or another five. Or another five hundred. That's the beauty of virtual machines.

Such a machine would need less "baggage" in its operating system. It would need, at the very least, some code to communicate with the outside world (to get instructions and report the results), the code to perform the work, and... perhaps nothing else. All of the user permissions and memory management "stuff" becomes superfluous.

This virtual machine is something that exists between our current virtual PC and an object in a program. This new thing is an entity of the virtualization manager, yet simpler (much simpler) than a PC with an operating system and application program.

Being much simpler than a PC, this small, specialized virtual machine can use a much simpler processor design. It doesn't need virtual memory management -- we give the virtual processor enough memory. It doesn't need to worry about multiple user processes -- there is only one user process. The processor has to be capable of running the desired program, of course, but that is a lot simpler than running a whole operating system.

A regular PC is "complexity in a box". The designers of virtualization software (VMware, VirtualPC, VirtualBox, etc.) expend large efforts at duplicating PC hardware in the virtual world, and synchronizing that virtual hardware with the underlying physical hardware.

I suspect that in many cases, we don't want virtual PCs. We want virtual machines that can perform some computation and talk to other processors (database servers, web servers, queue servers, etc.).

Small, disposable, virtual machines can operate as one-time use machines. We can instantiate them, execute them, and then discard them. These small virtual machines become the Dixie cups of the processing world. And small virtual machines can use small virtual processors.

I think we may see a renewed interest in small processor design. For virtual processors, "small" means simple: a simple instruction set, a simple memory architecture, a simple system design.

Wednesday, September 11, 2013

Specialization can be good or bad

Technologies have targets. Some technologies, over time, narrow their targets. Two examples are Windows and .NET.

Windows was, at first, designed to run on multiple hardware platforms. The objective was an "operating environment" that would give Microsoft an opportunity to sell software for multiple hardware platforms. There were versions of Windows for the Zenith Z-100 and the DEC Rainbow; these computers had Intel processors and ran MS-DOS but used architectures different from the IBM PC. Later versions of Windows ran on PowerPC, DEC Alpha, and MIPS processors. Those variants have all ceased; Microsoft supports only Intel PC architecture for "real" Windows and the new Windows RT variant for ARM processors, and both of these run on well-defined hardware.

The .NET platform has also narrowed. Instead of machine architectures, the narrowing has been with programming languages. When Microsoft released .NET, it supplied compilers for four languages: C++, C#, Visual Basic, and Visual J#. Microsoft also made bold proclamations about the .NET platform supporting multiple languages; the implications were that other vendors would build compilers and that Java was a "one-trick pony".

Yet the broad support for languages has narrowed. It was clear from the start that Microsoft was supporting C# as "first among equals". The documentation for C# was more complete and better organized than the documentation for other languages. Other vendors did provide compilers for other languages (and some still do), but the .NET world is pretty much C# with a small set of VB fans. Microsoft's forays into Python and Ruby (the IronPython and IronRuby engines) have been spun off as separate projects; the only "expansion" language from Microsoft is F#, used for functional programming.

Another word for this narrowing of technology is "specialization". Microsoft focused Windows on the PC platform; the code became specialized. The .NET ecosystem is narrowing to C#; our code is becoming specialized.

Specialization has its advantages. Limiting Windows to the PC architecture reduced Microsoft's costs and enabled them to optimize Windows for the platform. (Later, Microsoft would become strong enough to specify the hardware platform, and they made sure that advances in PC hardware meshed with improvements in Windows.)

Yet specialization is not without risk. When one is optimized for an environment (such as PC hardware or a language), it is hard to move to another environment. Thus, Windows is a great operating system for desktop PCs but a poor fit on tablets. Windows 8 shows that significant changes are needed to move to tablets.

Similarly, specializing in C# may lead to significant hurdles when new programming paradigms emerge. The .NET platform is optimized for C# and its object-oriented roots. Moving to another programming paradigm (such as functional programming) may prove difficult. The IronPython and IronRuby projects may provide some leverage, as may the F# language, but these are quite small compared to C# in the .NET ecosystem.

Interestingly, the "one-trick pony" environment for Java has expanded to include Clojure, Groovy, and Scala, as well as Jython and JRuby. So not all technologies narrow, and Sun's Oracle's Java may avoid the trap of over-specialization.

Picking the target for your technology is a delicate balance. A broad set of targets leads to performance issues and markets with little return. A narrow set of targets reduces costs but foregoes market penetration (and revenue) and may leave you ill-prepared for a paradigm shift. You have to chart your way between the two.

I didn't say it would be easy.