Showing posts with label real time. Show all posts

Tuesday, December 15, 2015

The new real time

In the 1970s, people in IT pursued the elusive goal of "real time" computing. It was elusive because the term was poorly defined. With no clear objective, any system could be marketed as "real time". Marketing folks recognized the term's popularity, and anything that could remotely be described as "real time" was.

But most people didn't need (or want) "real time" computing. They wanted "fast enough" computing, which generally meant interactive computing (not batch processing) that responded to requests quickly enough that clerks and bank tellers could answer customers' questions in a single conversation. Once we had interactive computing, we didn't look for "real time" and interest in the term waned.

To be fair to "real time", there *is* a definition of it, one that specifies the criteria for a real-time system: roughly, that the system must guarantee a response within specified deadlines, so that a correct result delivered too late counts as a failure. But very few systems actually fall under those criteria, and only a few people in the industry actually care about the term "real time". (Those who do care about the term really do care, though.)

Today, we're pursuing something equally nebulous: "big data".

Lots of people are interested in big data. Lots of marketers are getting involved, too. But do we have a clear understanding of the term?

I suspect that the usage of the term "big data" will follow an arc similar to that of "real time", because the forces driving the interest are similar. Both "real time" and "big data" are poorly defined yet sound cool. Further, I suspect that, like "real time", most people looking for "big data" are really looking for something else. Perhaps they want better analytics ("better" meaning faster, more frequent, richer in interaction and drill-down capabilities, or merely prettier in graphics) for business analysis. Perhaps they want cheaper data storage. Perhaps they want faster development times and fewer challenges with database management.

Whatever the reason, in a few years (I think less than a decade) we will not be using the term "big data" -- except for a few folks who really need it and who really care about it.

Tuesday, July 17, 2012

How big is "big"?

A recent topic of interest in IT has been "big data", sometimes spelled with capitals: "Big Data". We have no hard and fast definition of big data, no specific threshold to cross from "data" to "big data". Does one terabyte constitute "big data"? If not, what about one petabyte?

This puzzle is similar to the question of "real time". Some systems must perform actions in "real time", yet we do not have a truly standard definition of them. If I design a dashboard system for an automobile and equip the automobile with sensors that report data every two seconds, then a real-time dashboard system must, by definition, process each reading before the next one arrives. Should I replace the sensors with units that report data every half second, and the dashboard cannot keep up with the faster rate, then the system is not "real time".

But this means that the definition of "real time" depends not only on the design of the processing unit, but also on the devices with which it communicates. The system may be considered "real time" until we change a component; then it is not.
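The dashboard example reduces to a simple inequality: the system keeps up only if it finishes processing each reading before the next one arrives. The sketch below illustrates this (the function name and the timing figures are illustrative assumptions, not from any actual dashboard system):

```python
def is_real_time(processing_time_s: float, reporting_interval_s: float) -> bool:
    """A system is effectively "real time" for a given sensor only if
    it processes each reading before the next reading arrives."""
    return processing_time_s <= reporting_interval_s

# The same dashboard, which takes 1 second to process a reading,
# is "real time" or not depending solely on the sensors attached to it.
dashboard_processing_s = 1.0

print(is_real_time(dashboard_processing_s, 2.0))  # sensors report every 2 s -> True
print(is_real_time(dashboard_processing_s, 0.5))  # faster sensors, every 0.5 s -> False
```

Nothing about the dashboard changed between the two calls; only the peripheral did.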

I think that the same logic holds for "big data" systems. Today, we consider multiple petabytes to be "big data". Yet in 1990, when PCs had disks of 30 megabytes, a data set of one gigabyte would have been considered "big data". And in the 1960s, a data set of one megabyte would have been "big data".

I think that, in the end, the best we can say is that "big" is as big as we want to define it, and "real time" is as fast as we want to define it. "Big data" will always be larger than the average organization can comfortably handle, and "real time" will always be fast enough to process the incoming transactions.

Which means that we will always have some systems that handle big data (and some that do not), and some systems that run in real time (and some that do not). Using the terms properly will rely not on the capabilities of the core components alone, but on our knowledge of the core and peripheral components. We must understand the whole system to declare it to be "big data" or "real time".