Thursday, December 26, 2013

Big data is about action, and leadership

Big data. One of the trends of the year. A new technology that brings new opportunities for businesses and organizations that use it.

Big data also brings challenges: the resources to collect and store large quantities of data, the tools to analyze and present large quantities of data, and the ability to act on that data. That last item is the most important.

Collecting data, storing it, and analyzing it are mostly matters of technology, and technology is easily available. Data storage (through SAN or NAS or cloud-based storage) is a matter of money and equipment. Collecting data may be a little harder, since you must decide what to collect and then make the programming changes to perform the actual collection -- but neither task is that hard.

Analyzing data is also mostly a matter of technology. We have the computing hardware and the analytic software to "slice and dice" data and serve it up in graphs and visualizations.

The hard part of big data is none of the above. The hard part of big data is deciding a course of action and executing it. Big data gives you information. It gives you insight. And I suspect that it gives you those things faster than your current systems.

It's one thing to collect the data. It's another thing to change your procedures and maybe even your business. Collecting the data is primarily a technology issue. Changing procedures is often a political one. People are (often) reluctant to change. Changes to business plans may shift the balance of power within an organization. Your co-workers may be unwilling to give up some of that power. (Of course, others may be more than happy to gain power.)

The challenge of big data is not in the technology but in the changes driven by big data and the leadership for those changes. Interpreting data, deciding on changes, executing those changes, and repeating that cycle (possibly more frequently than before) is the payoff of big data.

Saturday, December 21, 2013

Files no more

One difference between traditional IT and the new cloud IT is the storage of data. Traditional IT systems (desktop PCs) stored data in files; cloud IT systems store data in ... well, that's not so clear. And the opaqueness of cloud systems may be a good thing.

In the Old World of desktop IT, we stored data in files and used that data in application programs. Often, the data stored in files was stored in a format that was specific to the application: documents would be stored in Microsoft Word format, spreadsheets stored in Microsoft Excel format, etc. The operational model was to run a program, load a file, make changes, and then save the file. The center of the old world was files, with application programs orbiting.

In the New World of cloud computing, we store data in... something... and use that data in applications that run on servers. Thus, with Google Drive (the new name for Google Docs) we store our data on Google's servers and access our data through our browser. Google's servers recall the data and present a view of that data to us through our browser. We can make changes and save the data -- although changes in Google Drive are saved automatically.

Are we storing data in files? Well, perhaps. The data is not stored on our PC, but on Google's servers. That is the magic of "software as a service" -- we can access our data from anywhere.

Getting back to the data. Google must store our data somewhere. Is it stored in a file? Or is it stored as a byte-stream in a datastore like CouchDB or memcached? Our viewpoint on our local PC does not allow us to peer inside of the Google machine, so we have no way to tell how our data is stored.

Yes, I know that Google Drive lets us download our data to a file on our PC. We can pick the location, the name, and even the format for the file. But that is not the core existence of the data, it is an "export" operation that extracts data from Google's world and hands it to us. (We can later import that same data back into Google's world, should we want.)

With software as a service (SaaS), our data is stored, but not as files on our local filesystem. Instead, it is stored in the cloud system and the details are hidden from us.

I think that this is an advance. In traditional IT, storing data in files was necessary, a way to store information that would be usable by an application program. (At least, it was the method used by the original Unix and DEC operating systems.) The notion of a file was an agreement between the processors of data and the keepers of data.

Files are not the only method of storing data. Many systems store data in databases, organizing data by rows and columns. While the databases themselves may store their data in files, the database client applications see only the database API and manipulate records and columns.
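
As a rough illustration (not tied to any particular product), here is what that separation looks like with Python's built-in sqlite3 module: the client creates and queries rows and columns through the database API, and never deals with the bytes on disk.

    import sqlite3

    # The client's view of the data: rows and columns behind an API. Whether
    # SQLite keeps them in one file, several files, or in memory is an
    # implementation detail hidden behind the connection.
    conn = sqlite3.connect(":memory:")   # could just as easily be a path on disk
    conn.execute("CREATE TABLE documents (id INTEGER PRIMARY KEY, title TEXT, body TEXT)")
    conn.execute("INSERT INTO documents (title, body) VALUES (?, ?)",
                 ("Notes", "Files are one storage model among many."))
    conn.commit()

    for row in conn.execute("SELECT id, title FROM documents"):
        print(row)    # (1, 'Notes') -- records and columns, not bytes in a file

    conn.close()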

I've been picking on Google Drive, but the same logic applies to any software-as-a-service, including Microsoft's Office 365. When we move a document from our PC into Microsoft's cloud, it is the same as moving it into Google's cloud. Are the bytes stored as a separate file, or are they stored in a different container -- perhaps SQL Server? We don't know, and we don't care.

We don't really care about files, or about filesystems. We care about our data.

Tuesday, December 17, 2013

The transition from object-oriented to functional programming

I am convinced that we will move from object-oriented programming to functional programming. I am also convinced that the transition will be a difficult one, more difficult than the transition from structured programming to object-oriented programming.

The transition from structured programming to object-oriented programming was difficult. Object-oriented programming required a new view of programming, a new way of organizing data and code. For programmers who had learned the ways of structured programming, the shift to object-oriented programming meant learning new techniques.

That in itself is not enough to cause the transition to functional programming to be more difficult. Functional programming is, like object-oriented programming, a new view of programming and a new way of organizing data and code. Why would the transition to functional programming be more difficult?

I think the answer lies within the organization of programs. Structured programming, object-oriented programming, and functional programming all specify a number of rules for programs. For structured programming, functions and subroutines should have a single entry point and a single exit point, and IF/THEN/ELSE blocks and FOR/NEXT loops should be used instead of GOTO statements.

Object-oriented programming groups data into classes and uses polymorphism to replace some conditional statements. But object-oriented programming was not totally incompatible with structured programming (or 'procedural programming'). Object-oriented programming allowed for a top level of new design with lower layers of the old design. Many early object-oriented programs had large chunks of procedural code (and some still do to this day). The thin layer of objects simply acted as a container for structured code.
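
A hypothetical sketch of that "thin layer" in Python (the names and the comma-separated file format are invented for the example): the functions are the old procedural code, and the class is little more than a container that holds them together.

    # The old procedural code, unchanged from the earlier design.
    def read_records(path):
        with open(path) as f:
            return [line.strip().split(",") for line in f if line.strip()]

    def total_amount(records):
        return sum(float(record[1]) for record in records)

    # The thin object-oriented wrapper: little more than a container
    # for the structured code above.
    class Ledger:
        def __init__(self, path):
            self._records = read_records(path)

        def total(self):
            return total_amount(self._records)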

Functional programming doesn't have this same degree of compatibility with object-oriented programming (or structured programming). Functional programming uses immutable objects; object-oriented programming is usually about mutable objects. Functional programming works with sets of data and leverages tail recursion efficiently; object-oriented programming uses the explicit loops and conditional statements of procedural programming.
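
A simplified contrast, sketched in Python (which, it should be noted, does not optimize tail calls, so the functional version leans on reduce and a generator rather than recursion; the numbers are invented):

    from functools import reduce

    prices = [19.99, 5.25, 3.50, 42.00]

    # Procedural / object-oriented habit: an explicit loop that mutates a running total.
    total = 0.0
    for p in prices:
        if p > 5.0:
            total += p

    # Functional habit: describe the result as a transformation of the whole
    # collection; no variable is mutated along the way.
    total_fp = reduce(lambda acc, p: acc + p, (p for p in prices if p > 5.0), 0.0)

    assert total == total_fp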

The constructs of functional programming work poorly at containing object-oriented constructs. The "trick" of wrapping old code in a containing layer of new code may not work with functional programming and object-oriented programming. It may be better to build functional programming constructs inside of object-oriented programming constructs, working from the "inside out" rather than from the "outside in" of the object-oriented transition.

One concept that has helped me transition is that of immutable objects. This is a notion that I have "imported" from functional programming into object-oriented programming. (And I must admit that the idea is not mine, nor even new; Java's String objects are immutable and have been since the language's inception.)

The use of immutable objects has improved my object-oriented programs. It has moved me in the direction of functional programming -- a step in the transition.
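
A minimal sketch of the idea in Python, built on namedtuple so that there are no setters (the Money class and its add method are invented for the example):

    from collections import namedtuple

    # An immutable value object: any "change" produces a new object instead.
    class Money(namedtuple("Money", "amount currency")):
        def add(self, other):
            assert self.currency == other.currency
            return Money(self.amount + other.amount, self.currency)

    price = Money(10.0, "USD")
    total = price.add(Money(2.50, "USD"))   # a new Money; price itself is unchanged
    # price.amount = 99.0                   # would raise AttributeError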

I believe that we will transition from object-oriented programming to functional programming. I foresee a large effort to do so, and I foresee that some programs will remain object-oriented programs, just as some legacy programs remain procedural programs. I am uncertain of the time frame; it may be in the next five years or the next twenty. (The advantages of functional programming are compelling, so I'm tending to think sooner rather than later.)

Sunday, December 15, 2013

Readable does not necessarily mean what you think it means

An easy way to start an argument among programmers (or possibly a fist-fight) is to ask about the readability of programming languages. Most developers have strong opinions about programming languages, especially when you are not paying them.

I'm not sure that "readability" is an aspect of a programming language. Instead, I think readability is an aspect of a specific program (written in any language). I have used many languages (assembly language, BASIC, FORTRAN, Pascal, COBOL, dBase, RBase, Visual Basic, Delphi, C, C++, Java, C#, Perl, Ruby, and Python, to name the popular ones), and examined many programs (small, large, tutorials, and production-level).

I've seen readable programs in all of these languages (including assembly language). I've seen hard-to-read programs in all of those languages, too.

If readability is not an aspect of a programming language but of the program itself, then what makes a program readable? Lots of people have commented on the topic over the decades. The popular ideas are usually:

  • Use comments
  • Follow coding standards
  • Use meaningful names for variables and functions (and classes)
  • Use structured programming techniques
  • Have high cohesion within modules (or classes) and low coupling

These are all worthy ideas.

I would like to add one more: balance the levels of abstraction. Programs of non-trivial complexity are divided into modules. (For object-oriented programming, these divisions are classes. For structured programming, these are functions and libraries.) A readable program will have a balanced set of modules. By "balanced", I mean in terms of size and complexity.

Given a program and its modules, you can take measurements. Typical measurements include: lines of code (derided by many, and for good reasons), cyclomatic complexity, software "volume", and function points. It doesn't particularly matter which metric; any of them give you relative sizes of modules. Once you have module sizes, you can plot those sizes on a chart.
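
As a sketch of the simplest possible measurement, a few lines of Python can count non-blank lines per module and print the distribution, largest first. (The "src" directory and the .py extension are assumptions made for this example; any metric and any language would do.)

    import os

    # Measure module size as non-blank lines of code -- a crude but easy metric.
    def module_sizes(root="src"):
        sizes = {}
        for dirpath, _, filenames in os.walk(root):
            for name in filenames:
                if name.endswith(".py"):
                    path = os.path.join(dirpath, name)
                    with open(path) as f:
                        sizes[path] = sum(1 for line in f if line.strip())
        return sizes

    # Print the sizes, largest first; a plot of these values shows the "curve".
    for path, size in sorted(module_sizes().items(), key=lambda kv: kv[1], reverse=True):
        print("%6d  %s" % (size, path))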

Frequently, readable programs have a smooth curve of module sizes. And just as frequently, hard-to-read programs have a jagged curve. Hard-to-read programs tend to have one or a few large modules and a few or many small modules.

I'm not sure why such a correlation exists. (And perhaps it doesn't; I admit that my observations are limited and the correlation may be proven false with a larger data set.)

Yet I have a theory.

As a programmer examines the code, he frequently moves from one module to another, following the logic and the flow of control. Modules divide the program not only into smaller components but also into levels. The different levels of the program handle different levels of abstraction. Low levels contain small units of code, small collections of data. Higher levels contain larger units of code that are composed of smaller units.

Moving from module to module means (often) moving from one level of organization to another, lower, level. With a smooth distribution of module sizes, the cognitive difference between levels is small. A path from the top level to a module on the bottom level may pass through many intermediate modules, but each transition is small and easy to understand.

A program with a "lumpy" distribution of module sizes, in contrast, lacks this set of small jumps from level to level. Instead, a program with a lumpy distribution of module sizes (and module complexity) has a lumpy set of transitions from layer to layer. Instead of a smooth set of jumps from the top level to the bottom, the jumps occur erratically, some small and some large. This disparity in jump size puts a cognitive load on the reader, making it hard to follow the changes.

If my theory is right, then a readable program is one that consists of layers, with the difference between any two adjacent layers no larger than some critical amount. (And, conversely, a hard-to-read program has multiple layer-pairs that have differences larger than that amount.)
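
One hypothetical way to put numbers to those "jumps": take the module sizes along a path from the top level down to a leaf and compare each pair of neighbors. The sizes below are invented, purely to show the difference in shape between a smooth program and a lumpy one.

    # Hypothetical module sizes (in lines of code) along a path from the
    # top-level module down to a leaf.
    smooth_path = [400, 220, 120, 60, 30]   # gentle steps from level to level
    lumpy_path  = [400, 380, 40, 35, 5]     # one huge drop among trivial ones

    def jumps(path):
        # size ratio between each module and the one below it
        return [round(a / float(b), 1) for a, b in zip(path, path[1:])]

    print(jumps(smooth_path))   # [1.8, 1.8, 2.0, 2.0] -- every step is about the same
    print(jumps(lumpy_path))    # [1.1, 9.5, 1.1, 7.0] -- a few steps are much larger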

I think that readability is an attribute of a program, not a programming language. We can write readable programs in any language (well, almost any language; a few obscure languages are designed for obfuscation). I think that the commonly accepted ideas for readable programs (comments, coding standards, meaningful names) are good ideas and helpful for building readable programs. I also think that we must structure our programs to have small differences between levels of abstraction.

Wednesday, December 11, 2013

Tablets change our expectations of software

PC software, over time, has grown in size and complexity. For any given product, each version has had more features (and used more memory) than the previous version.

Microsoft Windows almost changed that. The marketing materials for Windows offered many things; one of them was simplicity. Windows programs would be "easy to use", so easy that they would be "intuitive".

While software under Windows became more consistent (identical commands to open and save files) and programs could share data (via the clipboard), they were not simpler. The steady march of "more features" continued. Today, Microsoft Word and Microsoft Excel are the standards for word processing and spreadsheets, and both are bulging with features, menus, add-ins, and sophisticated scripting. Competing PC software duplicates all of those features.

Software for tablets is following a different model.

The products for tablets released by Apple, Google, and even Microsoft are reduced versions of their PC counterparts. Apple's "Pages" is a word processor, but smaller than Microsoft Word. Google's "Drive" (the app formerly called "Docs") is a word processor with fewer features than Microsoft Word, and a spreadsheet with fewer features than Microsoft Excel. Even Microsoft's versions of Word and Excel for Windows RT omit certain functions.

I see three drivers of this change:

The tablet interface limits features: Tablets have smaller screens and virtual keyboards, and tablet software generally has no drop-down menus. It is quite difficult to translate a complex PC program into the tablet environment.

Users want tablet software to be simple: Our relationship with tablets is more intimate than our relationship with PCs. We carry tablets with us, and generally pick the one we want. PCs, in contrast, stay at a fixed location and are assigned to us (especially in the workplace). We accept complexity in PC apps, but we push back against complexity in tablet apps.

Tablets complement PCs: We use tablets and PCs for different tasks and different types of work. Tablets let us consume data and perform specific, structured transactions. We can check the status of our systems, or view updates, or read news. We can bank online or update time-tracking entries with tablet apps. For long-form composition, we still use PCs with their physical keyboards, high-precision mice, and larger screens.

The demands placed upon tablet software are different from the demands placed upon desktop software. (I consider laptop PCs to be portable desktop PCs.) We want desktop PCs for composition; we want tablets for consumption and structured transactions. Those different demands are pushing software in different directions: complex for desktops, simple for tablets.

Monday, December 9, 2013

Off-the-shelf components still need attention

Typical advice for the builder of a system is often "when possible, use commercial products". The idea is that a commercial product is more stable, better maintained, and cheaper in the long run.

Today, that advice might be amended to "use commercial or open source products". The idea is to use a large, ready-made component to do the work, rather than write your own.

For some tasks, a commercial product does make sense. It's better to use the commercial (or open source) compilers for C#, Java, and C++ programs than to write your own compilers. It's better to use the commercial (or open source) word processors and spreadsheets than to build your own.

Using off-the-shelf applications is a pretty easy decision.

Assembling systems with commercial (or open source) products is a bit trickier.

The issue is dependencies. When you build a system with off-the-shelf components, your system becomes dependent on those components. It also becomes dependent on any external web services you use.

For some reason, we tend to think of off-the-shelf components (or external web services) as long-lasting. We tend to think that they will endure forever, possibly because we want to forget about them, to use them like electricity or water -- something that is always available.

Commercial products do not live forever. Popular products such as WordPerfect, Lotus Notes, and Windows XP have been discontinued or have faded from the market. If you build a system on a commercial product and it (the commercial product) goes away, what happens to your system?

Web services do not live forever. Google recently terminated its RSS Reader. Other web services have been born, lived, and died. If you build your system on a web service and it (the web service) goes away, what happens to your system?

Product management is about many things, and one of them is dependency management. A responsible product manager knows about the components used within the product (or has people who know). A responsible product manager keeps tabs on those components (or has people who keep tabs). A responsible product manager has a plan for alternative components, should the current ones be discontinued.

Leveraging off-the-shelf products is a reasonable tactic for system design. It can save time and allow your people to work on the critical, proprietary, custom components that are needed to complete the system. But those off-the-shelf components still require a watchful eye and planning; they do not relieve you of your duties. They can fail, they can be discontinued, and they can break on new versions of the operating system (yet another off-the-shelf component, and another dependency).

Build your system. Rebuild your legacy system. Use off-the-shelf components when they make sense. Use external web services when they make sense.

Stay aware of those components and stay aware of their status in the market. Be prepared for change.

Thursday, December 5, 2013

The future popularity of languages

The popularity of a programming language matters. Popular languages are, well, popular -- they are used by a large number of people. That large user base implies a large set of available programmers who can be hired (important if you are running a project), a large set of supporting documents and web sites (important for managers and hobbyists), and significant sales for tools and training (important if you sell tools and training).

But there are two senses of popular. One sense is the "lots of people like it now" sense; the other is the "lots of people will like it in the future" sense. This distinction was apparent at the beginning of my career, when PCs were new: the major programming languages were COBOL and FORTRAN, not the BASIC, Pascal, and C of the PC revolution.

COBOL and FORTRAN were the heavyweights, the languages used by serious people in the business of serious programming. BASIC and Pascal and C (at least on PCs) were the languages of hobbyists and dreamers, "toy" languages on "toy" computers. Yet it was C that gave us C++ and eventually Java and C#, the heavyweight languages of today.

The measurements of language popularity blur these two groups of practitioners and dreamers. The practitioners use the tools for established systems and existing enterprises: in the 1970s and 1980s they used COBOL and today they use C++, C#, and Java. The dreamers of the 1970s used BASIC, C, Forth, and Pascal; today they use... well, what do they use?

The Programming Language Popularity web site contains a number of measures. I think that the most "dreamish" of these is the Github statistics. Github is the site for open source projects of all sizes, from enterprise level down to individual enthusiast. It seems a better measure than the "Craigslist" search (which would be weighted towards corporate projects) or the "Google" search (which would include tutorials and examples but perhaps little in the way of dreams).

The top languages in the "Github" list are:

  • Objective C
  • JavaScript
  • Ruby
  • Java
  • Python
  • PHP

A little later down the list (but still in the top twenty) are: Scala, Haskell, Clojure, Lua, and Erlang.

I think that the "Github" list is a good forward indicator for language popularity. I believe that some of these languages will be the future mainstream development languages.

Which ones exactly, I'm not sure.

Java is already a mainstream language; this index indicates a continued interest. I suspect JavaScript will have a bright future quite soon, with Microsoft supporting it for app development. Apple iOS uses Objective-C, so that one is also fairly obvious.

Languages rise and fall in popularity. Project managers would do well to track the up-and-coming languages. Software tends to live for a long time; if you want to stay with mainstream programming languages, you must move with the mainstream. Looking ahead may help with that effort.

Sunday, December 1, 2013

Echo chambers in the tech world

We have echo chambers in the tech world. Echo chambers are those channels of communication that reinforce certain beliefs, sometimes correct and sometimes not. They exist in the political world, but are not limited to that milieu.

The Apple world has the belief that they are immune to malware, that viruses and other nasty things happen only to Windows and Microsoft products. The idea is "common knowledge", and many Macintosh owners will confirm it. But the idea is more than common; it is self-enforcing. Should a Macintosh owner say "I'm going to buy anti-virus software", other Mac owners will convince (or attempt to convince) him otherwise.

The echo chamber of the Apple world enforces the idea that Apple products are not susceptible to attack.

There is a similar echo chamber for the Linux world.

The Microsoft world has an opposite echo chamber, one that insists that Windows is not secure and extra software is required.

These are beliefs, created in earlier times, that endure. People have trained themselves to think of Windows as insecure; Microsoft Windows was insecure once, but it is now much more secure (though not perfectly secure). Similarly, Apple products (and Linux) are not completely secure, yet people have trained themselves to think that they are.

I will make some statements that people may find surprising and perhaps objectionable:

  • Microsoft Windows is fairly secure (not perfect, but pretty good)
  • Apple Mac OS X is not perfect and has security flaws
  • Linux (any variant) is not perfect and has security flaws

We need to be aware of our echo chambers, our dearly-held "common knowledge" that may be false. Such ideas may be comforting, but they lead us away from truth.