Fitzpatrick's Fabulous Future: NoSQL


Thursday, April 19, 2018

Why no language to replace SQL?

The history of programming is littered with programming languages. Some endure for ages (COBOL, C, Java) and some live briefly (Visual J++). We often develop new languages to replace existing ones (Perl, Python).

Yet one language has endured and has seen no replacements: SQL.

SQL, invented in the 1970s and popularized in the 1980s, has lived a good life with no apparent challengers.

It is an anomaly. Every language I can think of has a "challenger" language. FORTRAN was challenged by BASIC. BASIC was challenged by Pascal. C++ was challenged by Java; Java was challenged by C#. Unix shell programming was challenged by AWK, which in turn was challenged by Perl, which in turn has been challenged by Python.

Yet there have been no (serious) challengers to SQL. Why not?

I can think of several reasons:

Everyone loves SQL and no one wants to change it.
Programmers think of SQL as a protocol (specialized for databases) and not a programming language. Therefore, they don't invent a new language to replace it.
Programmers want to work on other things.
The task is bigger than a programming language. Replacing SQL means designing the language and creating an interpreter (or compiler?), command-line tools (these are programmers, after all), bindings to other languages (Python, Ruby, and Perl at minimum), and data access routines, all while matching the features of SQL databases, including triggers, access controls, transactions, and audit logs.
SQL gets a lot of things right, and works.

I'm betting on the last. SQL, for all of its warts, is effective, efficient, and correct.

But perhaps there is a challenger to SQL: NoSQL.

In one sense, NoSQL is a replacement for SQL. But it is a replacement of more than the language -- it is a replacement of the notion of data structure. NoSQL "databases" store documents and photographs and other things, but they are rarely used to process transactions. NoSQL databases don't replace SQL databases, they complement them. (Some companies move existing data from SQL databases to NoSQL databases, but this is data that fits poorly in the relational structure. They move some of their data but not all of their data out of the SQL database. These companies are fixing a problem, not replacing the SQL language.)

NoSQL is a complement of SQL, not a replacement (and therefore not a true challenger). SQL handles part of our data storage and NoSQL handles a different part.

It seems that SQL will be with us for some time. It is tied to the notion of relational organization, which is a useful mechanism for storing and processing homogeneous data.

Wednesday, March 19, 2014

The fecundity of programming languages

Some programming languages are more rigorous than others. Some programming languages are said to be more beautiful than others. Some programming languages are more popular than others.

And some programming languages are more prolific than others, in the sense that they are the basis for new programming languages.

Algol, for example, influenced the development of Pascal and C, which in turn influenced Java, C# and many others.

FORTRAN influenced BASIC, which in turn gave us CBASIC, Visual Basic, and True Basic.

The Unix shell led to Awk and Perl, which influenced Python and Ruby.

But COBOL has had little influence on languages. Yes, it has been revised, including an object-oriented version. Yes, it guided the PL/I and ABAP languages. But outside of those business-specific languages, COBOL has had almost no effect on programming languages.

Why?

I'm not certain, but I have two ideas: COBOL was an early language, and it was designed for commercial uses.

COBOL is one of the earliest languages, dating back to the 1950s. Other languages of the time include FORTRAN and LISP (and oodles of forgotten languages like A-0 and FLOW-MATIC). We had no experience with programming languages. We didn't know what worked and what didn't work. We didn't know which language features were useful to programmers. Since we didn't know, we had to guess.

For a near-blind guess, COBOL was pretty good. It has been useful in close to its original form for decades, a shark in the evolution of programming languages.

The other reason we didn't use COBOL to create other languages is that it was commercial. It was designed for business transactions. While it ran on general-purpose computers, COBOL was specific to financial applications, and the people who would tinker and build new languages were working in other fields and with computers other than business mainframes.

The tinkerers were using minicomputers (and later, microcomputers). These were not in the financial setting but in universities, where people were more willing to explore new languages. Minicomputers from DEC were often equipped with FORTRAN and BASIC. Unix computers were equipped with C. Microcomputers often came with BASIC baked in, because it was easier for individuals to use.

COBOL's success in the financial sector may have doomed it to stagnancy. Corporations (especially banks and insurance companies) lean conservative with technology and programming; they prefer to focus on profits and not research.

I see a similar future for SQL. As a data description and access language, it does an excellent job. But it is very specific and cannot be used outside of that domain. The up-and-coming NoSQL databases avoid SQL in part, I think, because the SQL language is tied to relational algebra and structured data. I see no languages (well, no popular languages) derived from SQL.

I think the languages that will influence or generate new languages will be those which are currently popular, easily learned, and easily used. They must be available to the tinkerers of today; those tinkerers will be writing the languages of the future. Tinkerers have limited resources, so less expensive languages have an advantage. Tinkerers are also a finicky bunch, with only a few willing to work with ornery products (or languages).

Considering those factors, I think that future languages will come from a set of languages in use today. That set includes C, C#, Java, Python, and JavaScript. I omit a number of candidates, including Perl, C++, and possibly your favorite language. (I consider Perl and C++ difficult languages; tinkerers will move to easier languages. I would like to include FORTH in the list, but it too is a difficult language.)

Sunday, October 13, 2013

Unstructured data isn't really unstructured

The introduction of NoSQL databases has brought along another concept: unstructured data. Advocates of NoSQL are quick to point out that relational databases are limited to structured data, and NoSQL data stores can handle unstructured (as well as structured) data.

I think that word does not mean what you think it means.

I've seen lots of data, and all of it has been structured. I have yet to meet unstructured data. All data has structure -- the absence of structure implies random data. Even the output of a pseudo-random number generator is structured; it is a series of numeric values.

When people say "structured data", they really mean "a series of objects, each conforming to a specific structure, known in advance". Tables in a relational database certainly follow this rule, as do records in a COBOL data declaration.

NoSQL data stores relax these constraints, but still require that data be structured. If a NoSQL data store uses JSON notation, then each object must be stored in a manner consistent with JSON. The objects in a set may contain different properties, so that one object has a structure quite different from the next object, but each object must be structured.

This notion is not new. While COBOL and FORTRAN were efficient at processing homogeneous records, Pascal allowed for "variant" records (it used a key field at the beginning of the record to identify the record type and layout).

What is new is that the object layout is not known in advance. Earlier languages and database systems required the design of the data up front. A COBOL program would know about customer records, for example, and a FORTRAN program would know about data observations. The structure of the data was "baked in" to the program. A new type of customer, or a new type of data set, would require a new version of the program or database schema.

NoSQL lets us create new structures without changing the program or schema. We can add new fields and create new objects for storage and processing, without changing the code.
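A minimal sketch of that idea in Python, assuming a JSON-style store. The documents and field names below are invented for illustration and do not come from any particular NoSQL product. Each object is well-formed JSON, so each one is structured, but no layout is declared in advance, and the code never names the full set of fields:

```python
import json

# Hypothetical documents: one collection, different fields per object.
RAW_DOCUMENTS = [
    '{"type": "customer", "name": "Alice", "email": "alice@example.com"}',
    '{"type": "customer", "name": "Bob", "loyalty_tier": "gold"}',
    '{"type": "photo", "filename": "cat.jpg", "width": 800, "height": 600}',
]


def field_names(raw_documents):
    """Return the set of field names found in each document, in order."""
    return [set(json.loads(raw)) for raw in raw_documents]


def collect(raw_documents, field):
    """Gather the values of one field from whichever documents carry it."""
    values = []
    for raw in raw_documents:
        doc = json.loads(raw)
        if field in doc:
            values.append(doc[field])
    return values


if __name__ == "__main__":
    # The structure is discovered from the data, not baked into the code.
    for names in field_names(RAW_DOCUMENTS):
        print(sorted(names))
    print(collect(RAW_DOCUMENTS, "name"))
```

Adding a document with a brand-new field (the second customer's "loyalty_tier", say) requires no change to either function, which is the reduced coupling in question.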

So as I see it, it's not that data is unstructured. The idea is that we have reduced the coupling between the data and the program. Data is still structured, but the structure is not part of the code.

Sunday, February 17, 2013

Losing data in the cloud of big data

NoSQL databases have several advantages over traditional SQL databases -- in certain situations. I think most folks agree that NoSQL databases are better for some tasks, and SQL databases are better for others. And most discussions about Big Data agree that NoSQL is the tool for Big Data databases.

One aspect that I have not seen discussed is auditing. That is, knowing that we have all of the data we expect to have. Traditional data processing systems (accounting, insurance, banking, etc.) have lots of checks in place to ensure that all transactions are processed and none are lost.

These checks and audits were put in place over a long time. I suspect that each error, when detected, was reviewed and a check was added to prevent such errors, or at least detect them early.
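As an illustration of the kind of check I mean, here is a rough sketch of a "control total" comparison: the sender and the receiver each compute a record count and a sum of amounts, and the two results must agree. The record layout and function names are my own assumptions for the example, not taken from any real accounting system:

```python
from decimal import Decimal


def control_totals(transactions):
    """Return (record count, sum of amounts) for a batch of transactions."""
    count = 0
    total = Decimal("0")
    for txn in transactions:
        count += 1
        total += Decimal(txn["amount"])
    return count, total


def audit(sent, received):
    """Compare control totals computed on the sending and receiving side.

    Returns a list of discrepancies; an empty list means both checks agree.
    """
    problems = []
    sent_count, sent_total = control_totals(sent)
    recv_count, recv_total = control_totals(received)
    if sent_count != recv_count:
        problems.append(f"record count: sent {sent_count}, received {recv_count}")
    if sent_total != recv_total:
        problems.append(f"amount total: sent {sent_total}, received {recv_total}")
    return problems


if __name__ == "__main__":
    # Hypothetical batch; the last record is "lost" in transit.
    sent = [
        {"id": 1, "amount": "10.00"},
        {"id": 2, "amount": "25.50"},
        {"id": 3, "amount": "4.25"},
    ]
    received = sent[:2]
    for problem in audit(sent, received):
        print(problem)
```

Checks like this depend on every record carrying something countable and summable, which is easy for bank transactions and much less obvious for other kinds of data.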

Do we have these checks in our Big Data databases? Is it even possible to build the checks for accountability? Big Data is, by definition, big. Bigger than normal, and bigger than one can conveniently inventory. Big Data can also contain things that are not always auditable. We have the techniques to check bank accounts, but how can we check something non-numeric such as photographs, tweets, and Facebook posts?

On the other hand, there may be risks from losing data, or subsets of data. Incomplete datasets may contain bias, a problem for sampling and projections. How can you trust your data if you don't have the checks in place?

Sunday, January 6, 2013

The revenge of Prolog

I picked up a copy of Clocksin and Mellish, the "Programming in Prolog" text from 1981. (Actually, it's the second edition from 1984.)

This tome gave me a great deal of difficulty back in the late 1980s. For some reason, I just could not "get" the idea of the Prolog language.

Reading it now, however, is another story. Now I "get" the idea of Prolog. A few years of experience can have a large effect on one's comprehension of "new" technology.

What strikes me about Prolog is that it is not really a programming language. At least, not in the procedural sense. (And it isn't. Clocksin and Mellish state this up front, in the preface to the first edition.)

If Prolog isn't a programming language, then what is it? As I see it, Prolog is a language for interrogating databases. Not SQL databases with tables and rows and columns, but databases that hold bits of information that Clocksin and Mellish call "facts".

For example, one can define a set of facts:

male(albert).
male(edward).
female(alice).
female(victoria).
parents(edward, victoria, albert).
parents(alice, victoria, albert).

Then define a relationship:

sister_of(X,Y) :- female(X), parents(X, M, F), parents(Y, M, F).

And then one can run a query:

sister_of(alice, edward).

Prolog will answer "yes", working out that Alice is the sister of Edward.
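For readers who don't know Prolog, the same facts and rule can be sketched in Python. This is only a rough equivalent under my own choice of data structures: it answers the yes/no query, but unlike Prolog it cannot turn the rule around and search for every X that satisfies sister_of(alice, X).

```python
# Facts, copied from the Prolog example above.
MALE = {"albert", "edward"}
FEMALE = {"alice", "victoria"}
# child -> (mother, father), matching parents(Child, Mother, Father).
PARENTS = {
    "edward": ("victoria", "albert"),
    "alice": ("victoria", "albert"),
}


def sister_of(x, y):
    """True when x is female and x and y have the same recorded parents."""
    return (
        x in FEMALE
        and x in PARENTS
        and y in PARENTS
        and PARENTS[x] == PARENTS[y]
    )


if __name__ == "__main__":
    print(sister_of("alice", "edward"))
    print(sister_of("edward", "alice"))
```

Like the Prolog rule as written, this sketch does not check that X and Y are different people, so sister_of("alice", "alice") also succeeds.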

Today, we would call this database of facts a NoSQL database. And that is what the Prolog database is. It is a NoSQL database, that is, a collection of data that does not conform (necessarily) to the strict schema of a relational database. The Prolog database is less capable than modern NoSQL databases: it is limited to text items, numeric items, and aggregations of items. Modern NoSQL databases can hold these and more: pictures, audio files, and all sorts of things.

Finding an early version of the NoSQL database concept in the Prolog language is a pleasant surprise. For me, it validates the notion of NoSQL databases. And it validates the idea of Prolog. Great minds think alike, and similar solutions to problems, separated by time, confirm the value of the solution.

Sunday, July 22, 2012

NoSQL is no big deal -- and that is a big deal

Things move fast in tech, or so they say. I saw this effect in action at the OSCON conference, just last week.

I attended this conference last year, so I can compare the topics of interest for this year against last year.

Last year, NoSQL was the big thing. People were talking about it. Vendors were hawking their NoSQL databases. Developers were talking (incessantly, at times) about their projects that use NoSQL databases. Presenters gave classes on the concepts and proper use of NoSQL databases.

This year, no one was talking about it. (Well, a few people were, but they were a tiny minority.) It wasn't that people had rejected NoSQL databases. In fact, quite the opposite: people had accepted them as normal technology. NoSQL databases are now considered to be "just another tool in our set".

So in the space of twelve months, NoSQL has gone from "the cool new thing" to "no big deal".

And that, I think, is a big deal.
