Thursday, October 10, 2013

Hadoop shows us a possible future of computing

Computing has traditionally been processor-centric. The classic model of computing has a "central processing unit" which performs computations. The data is provided by "peripheral devices", processed by the central unit, and then routed back to peripheral devices (the same as the original devices or possibly others). Mainframes, minicomputers, and PCs all use this model. Even web applications use this model.

Hadoop changes this model. It is designed for Big Data, and data at that scale requires a new model. Hadoop stores your data in segments spread across a number of servers (with redundant copies to prevent loss), each segment being 64MB to 2GB. If your data is smaller than 64MB, moving to Hadoop will gain you little. But that's not important here.
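As a rough illustration of the storage side, here is a minimal Python sketch of splitting data into fixed-size segments and storing each one on several servers. It assumes a simple round-robin placement, which is not the placement policy HDFS really uses. The function name place_blocks and its parameters are hypothetical, and a tiny block size stands in for the 64MB-and-up segments.

```python
from typing import Dict, List, Tuple


def place_blocks(data: bytes, block_size: int, servers: List[str],
                 replication: int = 3) -> Dict[int, Tuple[bytes, List[str]]]:
    """Split data into fixed-size blocks and assign each block to
    `replication` distinct servers (hypothetical round-robin policy)."""
    if replication > len(servers):
        raise ValueError("replication factor exceeds number of servers")
    placement: Dict[int, Tuple[bytes, List[str]]] = {}
    for index, start in enumerate(range(0, len(data), block_size)):
        block = data[start:start + block_size]
        # Each block starts at a different server, so copies spread evenly;
        # consecutive servers guarantee the copies land on distinct machines.
        holders = [servers[(index + r) % len(servers)] for r in range(replication)]
        placement[index] = (block, holders)
    return placement
```

Data smaller than block_size yields just one block, so there is nothing to spread across servers and process in parallel. That is the sketch's version of the point about data under 64MB gaining little from Hadoop.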

What is important is Hadoop's model. Hadoop moves away from the traditional computing model. Instead of a central processor that performs all calculations, Hadoop leverages servers that can hold data and also perform calculations.

Hadoop makes several assumptions:

  • The code is smaller than the data (or a segment of data)
  • Code is transported more easily than data (because of size)
  • Code can run on servers
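The three assumptions can be made concrete with a small, single-process Python sketch. The Server class, count_words, and run_everywhere are hypothetical names, and a word count is just an assumed example task. This illustrates the pattern of sending code to the data; it is not Hadoop's own MapReduce API, and in a real cluster the function would be serialized and sent over the network rather than called directly.

```python
from collections import Counter
from typing import Callable, List

# The "mobile code" type: a function from a data segment to a partial result.
Task = Callable[[List[str]], Counter]


class Server:
    """Hypothetical node that stores one segment of data and can run code on it."""

    def __init__(self, name: str, segment: List[str]):
        self.name = name
        self.segment = segment

    def run(self, task: Task) -> Counter:
        # The code comes to the data; only the (small) result goes back.
        return task(self.segment)


def count_words(segment: List[str]) -> Counter:
    """Example task: count the words in whatever segment it is given."""
    counts: Counter = Counter()
    for line in segment:
        counts.update(line.split())
    return counts


def run_everywhere(servers: List[Server], task: Task) -> Counter:
    """Send the same task to every server and merge the partial results."""
    total: Counter = Counter()
    for server in servers:
        total.update(server.run(task))  # Counter.update adds counts together
    return total
```

With two servers holding ["big data", "data moves"] and ["code moves to data"], run_everywhere(servers, count_words) counts "data" three times. Only the per-server Counter objects travel back to the caller; the segments themselves never move.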

With these assumptions, Hadoop builds a new model of computing. (To be fair, Hadoop may not be the only package that builds this new model of distributed processing -- or even the first. But it has a lot of interest, so I will use it as the example.)

All very interesting. But here is what I find more interesting: the distributed processing model of Hadoop can be applied to other systems. Hadoop itself makes sense for Big Data, and systems with Little (that is, not Big) data should not use Hadoop.

But perhaps smaller systems can use the model of distributed processing. Instead of moving data to the processor, we can store data with processors and move code to the data. A system could be constructed from servers holding data, connected with a network, and mobile code that can execute anywhere. The chief tasks then become identifying the need for code and moving code to the correct location.
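One way a small system might handle "identifying the need for code and moving code to the correct location" is sketched below in Python. It assumes, purely for illustration, that each record lives on exactly one node chosen by hashing its key. Node, Cluster, locate, store, and submit are hypothetical names, and everything runs in one process rather than over a real network.

```python
import zlib
from typing import Any, Callable, Dict, List


class Node:
    """Hypothetical server that holds some records and executes code sent to it."""

    def __init__(self, name: str):
        self.name = name
        self.records: Dict[str, Any] = {}

    def execute(self, key: str, fn: Callable[[Any], Any]) -> Any:
        # Run the shipped code against local data; the record never leaves this node.
        return fn(self.records[key])


class Cluster:
    """Hypothetical set of nodes; a key's owner is chosen by a stable hash."""

    def __init__(self, names: List[str]):
        self.nodes = [Node(n) for n in names]

    def locate(self, key: str) -> Node:
        # Finding "the correct location": the same key always maps to the same node.
        return self.nodes[zlib.crc32(key.encode("utf-8")) % len(self.nodes)]

    def store(self, key: str, value: Any) -> None:
        self.locate(key).records[key] = value

    def submit(self, key: str, fn: Callable[[Any], Any]) -> Any:
        # Move the code to the node that holds the data, not the data to the code.
        return self.locate(key).execute(key, fn)
```

zlib.crc32 is used instead of Python's built-in hash() because string hashing is randomized per process, so the same key could map to different nodes from one run to the next. For example, after cluster.store("orders", [120, 80, 45]), cluster.submit("orders", sum) runs sum on the owning node and returns 245.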

That would give us a very different approach to system design.