Saturday, December 21, 2013

Files no more

One difference between traditional IT and the new cloud IT is the storage of data. Traditional IT systems (desktop PCs) stored data in files; cloud IT systems store data in ... well, that's not so clear. And the opacity of cloud systems may be a good thing.

In the Old World of desktop IT, we stored data in files and used that data in application programs. Often the data was stored in a format specific to the application: documents in Microsoft Word format, spreadsheets in Microsoft Excel format, and so on. The operational model was to run a program, load a file, make changes, and then save the file. The center of the old world was files, with application programs orbiting.

In the New World of cloud computing, we store data in... something... and use that data in applications that run on servers. With Google Drive (which now includes Google Docs) we store our data on Google's servers and access it through our browser. Google's servers retrieve the data and present a view of it to us through our browser. We can make changes and save the data -- although changes in Google Drive are saved automatically.

Are we storing data in files? Well, perhaps. The data is not stored on our PC, but on Google's servers. That is the magic of "software as a service" -- we can access our data from anywhere.

Getting back to the data. Google must store our data somewhere. Is it stored in a file? Or is it stored as a byte-stream in a datastore like CouchDB or some key-value store? Our viewpoint on our local PC does not allow us to peer inside the Google machine, so we have no way to tell how our data is stored.

Yes, I know that Google Drive lets us download our data to a file on our PC. We can pick the location, the name, and even the format for the file. But that file is not where the data really lives; it is the product of an "export" operation that extracts data from Google's world and hands it to us. (We can later import that same data back into Google's world, should we want.)

With software as a service (SaaS), our data is stored, but not as files on our local filesystem. Instead, it is stored in the cloud system and the details are hidden from us.

I think that this is an advance. In traditional IT, storing data in files was necessary: it was the way to keep information in a form usable by an application program. (At least, it was the method used by the original Unix and DEC operating systems.) The notion of a file was an agreement between the processors of data and the keepers of data.

Files are not the only method of storing data. Many systems store data in databases, organizing data by rows and columns. While the databases themselves may store their data in files, the database client applications see only the database API and manipulate rows and columns.
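To make that point concrete, here is a minimal sketch in Python using the standard sqlite3 module. The table name, the sample row, and the helper function save_and_load are my own inventions for illustration; nothing here reflects how Google or Microsoft actually store documents. The same client code runs against a database held purely in memory and against one backed by a file on disk, and it cannot tell the difference.

```python
import os
import sqlite3
import tempfile


def save_and_load(conn):
    """Store one document and read it back, using only the database API."""
    conn.execute("CREATE TABLE IF NOT EXISTS documents (title TEXT, body TEXT)")
    conn.execute(
        "INSERT INTO documents (title, body) VALUES (?, ?)",
        ("Files no more", "We care about our data."),
    )
    conn.commit()
    return conn.execute("SELECT title, body FROM documents").fetchall()


# ":memory:" keeps the whole database in RAM; no file is involved.
memory_conn = sqlite3.connect(":memory:")
memory_rows = save_and_load(memory_conn)
memory_conn.close()

# The same function against a database that does live in a file on disk.
with tempfile.TemporaryDirectory() as tmp:
    file_conn = sqlite3.connect(os.path.join(tmp, "docs.db"))
    file_rows = save_and_load(file_conn)
    file_conn.close()

# The client sees identical rows either way.
print(memory_rows == file_rows)
```

The function never opens, names, or reads a file. Where the bytes end up is decided by whoever creates the connection, which is roughly the position we are in with a cloud service.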

I've been picking on Google Drive, but the same logic applies to any software-as-a-service offering, including Microsoft's Office 365. When we move a document from our PC into Microsoft's cloud, it is the same as moving it into Google's cloud. Are the bytes stored as a separate file, or are they stored in a different container -- perhaps SQL Server? We don't know, and we don't care.

We don't really care about files, or about filesystems. We care about our data.
