
Sunday, February 12, 2017

Databases, containers, and Clarke's first law

A blog post by a (self-admitted) beginner engineer rants about databases inside of containers. The author lays out the case against using databases inside containers, pointing out potential problems ranging from security to configuration time to holding state within a container. The argument is intense and passionate, although a bit difficult for me to follow. (That, I believe, is due to my limited knowledge of databases and my even more limited knowledge of containers.)

I believe he raises questions which should be answered before one uses databases in containers. So in one sense, I think he is right.

In a larger sense, I believe he is wrong.

For that opinion, I refer to Clarke's first law, which states: "When a distinguished but elderly scientist states that something is possible, he is almost certainly right. When he states that something is impossible, he is very probably wrong."

I suspect that it applies to sysadmins and IT engineers just as much as it does to scientists, and that age has rather little effect. Our case is one of a not-elderly not-scientist claiming that running databases inside of containers is impossible, or at least a Bad Idea and Will Lead Only To Suffering.

My view is that containers are useful, and databases are useful, and many in the IT field will want to use databases inside of containers. Not just run programs that access databases on some other (non-containerized) server, but host the database within a container.

Not only will people want to use databases in containers, there will be enough pressure and enough interested people that they will make it happen. If our current database technology does not work well with containers, then engineers will modify containers and databases to make them work. The result will be, quite possibly, different from what we have today. Tomorrow's databases may look and act differently from today's. (Just as today's phones look and act differently from phones of a decade ago.)

Utility is one of the driving features of technology. Containers have it, so they will be around for a while. Databases have it (they've had it for decades) and they will be around for a while. One or both may change to work with the other.

We'll still call them databases, though. The term is useful, too.

Sunday, February 17, 2013

Losing data in the cloud of big data

NoSQL databases have several advantages over traditional SQL databases -- in certain situations. I think most folks agree that NoSQL databases are better for some tasks, and SQL databases are better in others. And most discussions about Big Data agree that NoSQL is the tool for Big Data databases.

One aspect that I have not seen discussed is auditing. That is, knowing that we have all of the data we expect to have. Traditional data processing systems (accounting, insurance, banking, etc.) have lots of checks in place to ensure that all transactions are processed and none are lost.

These checks and audits were put in place over a long time. I suspect that each error, when detected, was reviewed and a check was added to prevent such errors, or at least detect them early.

Do we have these checks in our Big Data databases? Is it even possible to build the checks for accountability? Big Data is, by definition, big. Bigger than normal, and bigger than one can conveniently inventory. Big Data can also contain things that are not always auditable. We have the techniques to check bank accounts, but how can we check something non-numeric such as photographs, tweets, and Facebook posts?
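One way to imagine such a check, even for non-numeric data, is a simple count-and-digest reconciliation: hash each record's raw bytes at the source, combine the hashes, and compare against the same computation over what actually landed in the store. This is only a minimal sketch of the idea, not any standard Big Data tooling; the function names and sample records are illustrative.

```python
import hashlib

def batch_digest(records):
    """Record count plus an order-independent digest for a batch.

    Hashing raw bytes works for non-numeric payloads (photographs,
    tweets, posts) where summing balances, as a bank would, does not.
    """
    combined = 0
    for rec in records:
        h = hashlib.sha256(rec).digest()
        combined ^= int.from_bytes(h, "big")  # XOR makes ordering irrelevant
    return len(records), combined

def reconcile(source_records, stored_records):
    """Audit check: did every record that left the source reach the store?"""
    return batch_digest(source_records) == batch_digest(stored_records)

# Illustrative records; one is lost on the way to the store.
source = [b"tweet-1", b"photo-bytes", b"post-3"]
stored = [b"tweet-1", b"post-3"]

print(reconcile(source, stored))                   # False: a record was lost
print(reconcile(source, list(reversed(source))))   # True: order does not matter
```

Note the limits of even this tiny check: the XOR combination cancels duplicate records, and it tells you *that* something was lost, not *what*. Real auditing at Big Data scale would need more than this.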

On the other hand, there are risks in losing data, or subsets of data: an incomplete dataset may be biased, which undermines sampling and projections. How can you trust your data if you don't have the checks in place?