Wednesday, January 23, 2013

What you need for analytics

The next fad seems to be analytics. Some might call it "Big Data". Some might argue that the two are not the same. (They're right, the two are different.)

The use of analytics (or Big Data; the argument is the same) requires several things. You need more than just Hadoop and someone who knows the Map-Reduce pattern.

Here's a short list of things to consider:

Your data: You need a way to collect your data. You need people who know where your data resides, the format of your data, and the frequency at which it changes.

Your business: Most organizations have data that encodes information in specific forms. You need people who understand your business and who can interpret your data.

Your management style: Different organizations have different styles of management. You need people who can prepare the data in formats and on frequencies that make sense to your decision-makers. You also need the tools to present it. Those tools can be the analytics software or they can be separate tools and presentation mechanisms, like a printed report.

Resources: Analytics, despite the vendor promises, does not happen "by itself". You need people, computers, storage, networks, and software to make it happen.

An open mind: Once you have all of the above, you can start using analytics. Are you prepared to benefit from it?

Persistence and patience: Analytics systems must be tuned to provide the information you can use. The charts and graphs that contain the really useful results are often not the ones we pick at the start. It is only after we "play with the data" that we identify the pertinent analyses.

A while back, I worked on a project to analyze source code. It was not the typical project for analytics (or Big Data) but we had a lot of code and it was big-ish. We dedicated resources and started with simple analyses (lines of code, lines of non-comment source code). As we developed our knowledge, we changed the analysis. We shifted from simple lines-of-code to complexity, and then duplicate code, and finally to "churn" of the code base and changes to code.

Each of these phases took resources. We needed processing power, storage, and network access. We needed people to code some very specific parsers for our data (the source code) and some custom data storage techniques. (NoSQL was not available at the time.)

Each of these phases took time. We were feeling our way in the dark, which is frequently the case with analytics projects. (You don't know what you're looking for until you find it.) In our case, the useful bit of information was lines of code changed, which turned out to be a consistent predictor for defect counts. (We found a steady pace of 5 defects per 1000 lines of code changed.)
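
As a rough illustration of how such a predictor gets used, here is a minimal Python sketch. The rate of 5 defects per 1000 lines changed is the figure quoted above; the function name and the sample line counts are made up for the example and are not data from the project.

```python
# Rate quoted in the post: about 5 defects per 1000 lines of code changed.
DEFECTS_PER_1000_LINES_CHANGED = 5


def expected_defects(lines_changed: int) -> float:
    """Estimate the defect count from the number of lines changed (hypothetical helper)."""
    if lines_changed < 0:
        raise ValueError("lines_changed must be non-negative")
    return lines_changed * DEFECTS_PER_1000_LINES_CHANGED / 1000


# Hypothetical change sizes, for illustration only.
for lines in (2_000, 12_500):
    print(f"{lines} lines changed: about {expected_defects(lines):.1f} defects expected")
```

The arithmetic is trivial, which is the point: the hard part was the time spent finding out that lines changed was the number worth counting.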

This predictor was useful to determine the effort for testing. (We were, uh, using a technique other than Agile Development.)

But the point is not about development methods. The point is that successful analytics require resources and time. They are often experimental.

Of all the items you need, an open mind is the most important. Once you have the analysis, once you have the result, how do you choose to use it? Do you believe the surprising result? Are you willing to change your processes in light of the analysis? Or do you look for reports that confirm your preconceived beliefs, and keep your current processes in place?
