Tuesday, March 4, 2014

After Big Data comes Big Computing

The history of computing is a history of tinkering and revision. We excel at developing techniques to handle new challenges.

Consider the history of programming:

Tabulating machines

  • plug-boards with wires

Von Neumann architecture (mainframes)

  • machine language
  • assembly language
  • compilers (FORTRAN and COBOL)
  • interpreters (BASIC) and timeshare systems

The PC revolution (the IBM PC)

  • assembly language
  • Microsoft BASIC

The Windows age

  • Object-oriented programming
  • Event-driven programming
  • Visual Basic

Virtual machines

  • UCSD p-System
  • Java and the JVM

Dynamic languages

  • Perl
  • Python
  • Ruby
  • JavaScript

This (severely abridged) list of hardware and programming styles shows how we change our technology. Our progress is not a smooth climb from one level to the next but a series of jumps, some of them quite large. It was a large jump from plug-boards to memory-resident programs. It was another large jump to assembly language. One can argue that later jumps were larger or smaller, but those arguments do not change the basic idea.

Notice that we do not know where things are going. We do not see the entire chain up front. In the 1950s, we did not know that we would end up here (in 2014) with dynamic languages and cloud computing. Often we cannot see the next step until it is upon us, and only the best visionaries can see past it.

Big Data is such a jump, enabled by cheap storage and cloud computing. That change in technology is upon us.

Big Data is the acquisition and storage (and use) of large quantities of data. Not just "lots of data" but mind-boggling quantities of data. Data that makes our current "very large" databases look small and puny. Data that contains not only financial transactions but server logs, e-mails, security videos, medical records, and sensor readings from just about any kind of device. (The sensor readings may be from building sensors for temperature, from vehicles for position and speed and engine performance, from packages in transit, from assembly lines, from gardens and parks for temperature and humidity, ... the list is endless.)

But what happens once we acquire and store these mind-boggling heaps of data?

The obvious answer is to do something with it. And we are doing something with it: we use tools like Hadoop to process, analyze, and visualize it.
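To make the Hadoop style concrete: its core programming model is MapReduce, which splits work into a map phase that emits key-value pairs, a shuffle that groups the pairs by key, and a reduce phase that combines each group. Here is a minimal sketch of that model in plain Python -- not Hadoop itself, just the shape of the computation -- counting words. The function names and sample lines are invented for the example.

    # A minimal sketch of the MapReduce idea in plain Python.
    from collections import defaultdict

    def map_phase(lines):
        # Emit a (word, 1) pair for every word on every line.
        for line in lines:
            for word in line.split():
                yield (word.lower(), 1)

    def shuffle(pairs):
        # Group values by key, as Hadoop does between map and reduce.
        groups = defaultdict(list)
        for key, value in pairs:
            groups[key].append(value)
        return groups

    def reduce_phase(groups):
        # Combine each group of values into a single result.
        return {word: sum(counts) for word, counts in groups.items()}

    lines = ["big data needs big computing",
             "big computing needs new languages"]
    print(reduce_phase(shuffle(map_phase(lines))))
    # {'big': 3, 'data': 1, 'needs': 2, 'computing': 2, 'new': 1, 'languages': 1}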

I think Hadoop and its brethren are a good start. We're at the dawn of the "Big Data Age", and we don't really know what we want -- in terms of analyses and tools. We have some tools, and they seem okay.

But this is just the dawn. I think we will develop new techniques and tools to analyze our data. And I suspect those tools and techniques will require lots of computation -- so much computation that someone will coin the term "Big Computing" for the use of mind-boggling amounts of computing power.

Big Computing seems a natural follow-on to Big Data. And just as we have developed languages to handle new programming challenges, we will develop new languages for Big Computing.

We have two hints about programming in the era of Big Computing. One hint is cloud computing, with its ability to scale up as we need more power. We've already seen that programs for the cloud are organized differently from "classic" programs. Cloud programs use small modules connected by message queues. The modules hold no state, which allows the system to route transactions to any available module.
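As a concrete illustration, here is a minimal sketch of that organization, with Python's standard queue module standing in for a real message service (in the cloud this would be something like RabbitMQ or Amazon SQS). The message format and the doubling "work" are invented for the example; the point is that each worker keeps no state between messages, so any number of identical workers can drain the same queue.

    # A minimal sketch of the cloud organization: small stateless
    # workers pulling messages from a shared queue.
    import queue
    import threading

    work_queue = queue.Queue()

    def worker():
        while True:
            message = work_queue.get()
            if message is None:               # sentinel: shut down
                work_queue.task_done()
                return
            result = message["value"] * 2     # stateless: uses only the message
            print("processed message", message["id"], "->", result)
            work_queue.task_done()

    # Start a few interchangeable workers; scaling up means starting more.
    workers = [threading.Thread(target=worker) for _ in range(3)]
    for t in workers:
        t.start()

    for i in range(10):
        work_queue.put({"id": i, "value": i})
    for _ in workers:
        work_queue.put(None)                  # one sentinel per worker
    work_queue.join()

Because the workers hold no state, routing does not matter: any message can go to any worker, and a failed worker can simply be replaced.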

The other hint is at the small end of the computing world, at the chip level. Here we see advances in processor design: more cores, more cache, more parallelism. The GreenArrays GA144 is a chip that contains 144 computers -- not cores, but complete computers. This is another contender for Big Computing.
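For a taste of programming in that many-computer style, here is a minimal sketch using Python's multiprocessing module to split one computation across independent worker processes, each with its own memory. (The GA144 itself is programmed quite differently, in a Forth dialect; this only illustrates the shape of the problem: dividing work among many small, independent computers.)

    # A minimal sketch of many-computer parallelism: divide one
    # computation into slices and give each slice to its own process.
    from multiprocessing import Pool

    def partial_sum(bounds):
        # Each worker sums its own slice, sharing no memory with the others.
        start, stop = bounds
        return sum(i * i for i in range(start, stop))

    if __name__ == "__main__":
        n, chunks = 1000000, 8
        step = n // chunks
        slices = [(i * step, (i + 1) * step) for i in range(chunks)]
        with Pool(processes=chunks) as pool:
            total = sum(pool.map(partial_sum, slices))
        print(total)   # sum of squares below 1,000,000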

I'm not sure what "Big Computing" and its programming will look like, but I am confident that they will be interesting!
