Wednesday, September 18, 2013

Big Data proves the value of open source

Something significant happened with open source software in the past two years. An event that future historians may point to and say "this is when open source software became a force".

That event is Big Data.

Open source has been with us for decades. Yet for all the technologies we have, from the first plug-board computers to smart phones, from the earliest assemblers to the latest language compilers, from the first IDE to Visual Studio, open source software has always copied the proprietary tools. Open source tools have always been implementations of existing ideas. Linux is a functional copy of Unix. The open source compilers and interpreters are for existing languages (C, C++, Fortran, Java). LibreOffice and Open Office are clones of Microsoft Office. Eclipse is an open source IDE, an idea that predates the IBM PC.

Yes, the open source versions of these tools have their own features and advantages. But the ideas behind these tools, the big concepts, are not new.

Big Data is different. Big Data is a new concept, a new entry in the technology toolkit, and its tools are (just about) all open source. Hadoop, NoSQL databases, and many analytics tools are open source. Commercial entities like Oracle and SAS may claim to support Big Data, but their support seems less "Big Data" and more "our product can do that too".

A few technologies came close to being completely open source. Web servers are mostly open source, with stiff competition from Microsoft's (closed source) IIS. The scripting languages (Perl, Python, and Ruby) are all open source, but they are extensions of languages like AWK and the C Shell, which were not initially open source.

Big Data, from what I can see, is the first "new concept" technology with a clear origin in open source. It is proof that open source can not only copy existing concepts but also introduce new ideas to the world.

And that is a milestone that deserves recognition.
