Data services provide data. So do files. But the two are very different.
In the classic PC world ("classic" meaning desktop applications), the primary storage mechanism is the file. A file is, at its core, a bunch of bytes. Not just a random collection of bytes, but a meaningful collection. That collection could be a text file, a document, a spreadsheet, or any one of a number of possibilities.
In the cloud world, the primary storage mechanism is the data service. That could be an SQL database, a NoSQL database, or a web service that supplies data. A data service provides a collection of values, not a collection of bytes.
Data services are active things. They can perform operations. A data service is much like a query in an SQL database. (One may think of SQL as a data service, if one likes.) You can specify a subset of the data (either columns or rows, or both), the sequence in which the data appears (again, either columns or rows, or both), and the format of the data. For sophisticated services, you can collect data from multiple sources.
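These capabilities -- choosing a subset, a sequence, and a format in one request -- can be sketched with a single query. The example below is a minimal illustration in Python, using the standard sqlite3 module as a stand-in for a data service; the table, column names, and values are invented for the example.

```python
import sqlite3

# In-memory database standing in for a data service's backing store.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL, year INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("East", 100.0, 2014), ("West", 250.0, 2015), ("East", 75.0, 2015)],
)

# One request expresses the subset (which columns, which rows), the
# sequence (ORDER BY), and the format (an aggregate, rounded) -- the
# consumer never touches raw bytes.
rows = conn.execute(
    "SELECT region, ROUND(SUM(amount)) AS total "
    "FROM sales WHERE year = 2015 "
    "GROUP BY region ORDER BY total DESC"
).fetchall()

print(rows)  # [('West', 250.0), ('East', 75.0)]
```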
Data services are much more flexible and powerful than files.
But that's not what is interesting about data services.
What is interesting about data services is the mindset of the programmer.
When a programmer is working with data files, he must think about what he needs, what is in the file, and how to extract what he needs from the file. The file may have extra data (unwanted data rows, or perhaps undesired headings and footings). The file may have extra columns of data. The data may be in a sequence different from the desired sequence. The data may be in a format that is different from what is needed.
The programmer must compensate for all of these things, and write code to handle the unwanted data or the improper formats. Working with files means writing code to match the file.
In contrast, data services -- well-designed data services -- can format the data, filter the data, and clean the data for the programmer. Data services have capabilities that files do not; they are active and can perform operations.
A programmer using files must think "what does the file provide, and how can I convert it to what I need?"; a programmer using data services thinks "what do I need?".
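The two mindsets can be put side by side. In this sketch (Python, with an invented file layout, and sqlite3 standing in for a data service), the file-oriented code must skip the heading and footing and drop the unwanted column by hand, while the service consumer simply states what it needs.

```python
import csv
import io
import sqlite3

raw = "REPORT 2015\nname,dept,salary\nAda,Eng,120\nBob,Sales,95\nEND\n"

# File mindset: write code to match the file -- skip the heading rows
# and the footer, drop the unwanted column, convert the format by hand.
rows = list(csv.reader(io.StringIO(raw)))
data = [(r[0], int(r[2])) for r in rows[2:-1]]  # rows[0-1] are headings, rows[-1] is a footer

# Service mindset: state only what is needed; the service (here a
# database standing in for one) selects and formats the values.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE staff (name TEXT, dept TEXT, salary INTEGER)")
conn.executemany("INSERT INTO staff VALUES (?, ?, ?)",
                 [tuple(r) for r in rows[2:-1]])
wanted = conn.execute("SELECT name, salary FROM staff").fetchall()

print(data)    # [('Ada', 120), ('Bob', 95)]
print(wanted)  # [('Ada', 120), ('Bob', 95)]
```

Both paths arrive at the same values, but the first is code about the file while the second is code about the need.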
With data services, the programmer can think less about what is available and think more about what has to be done with the data. If you're a programmer or a manager, you understand how this change makes programmers more efficient.
If you're writing code or managing projects, think about data services. Even outside of the cloud, data services can reduce the programming effort.
Sunday, June 14, 2015
Saturday, April 27, 2013
"Not invented here" works poorly with cloud services
This week, colleagues were discussing the "track changes" feature of Microsoft Word. They are building an automated system that uses Word at its core, and they were encountering problems with the "track changes" feature.
This problem led me to think about system design.
Microsoft Word, while it has a COM-based engine and a separate UI layer, is a single, all-in-one solution for word processing. Every function that you need (or that Microsoft thinks you need) is included in the package.
This design has advantages. Once Word is installed, you have access to every feature. All of the features work together. A new version of Word upgrades all of the features -- none are left behind.
Yet this design is a form of "not invented here". Microsoft supplies the user interface, the spell-check engine and dictionary, the "track changes" feature, and everything else. Even when there were other solutions available, Microsoft built their own. (Or bought an existing solution and welded it into Word.)
Word's design is also closed. You cannot, for example, replace Microsoft's spell-checker with another one, nor replace the "track changes" feature with your own version control system. You are stuck with the entire package.
This philosophy worked for desktop PC software. It works poorly with cloud computing.
In cloud computing, every feature in your system is a service. Instead of a monolithic program, a system is a collection of services, each providing some small amount of well-defined processing. Cloud computing needs this design to scale to larger workloads; you can add more servers for the services that see more demand.
With a system built of services, you must decide on the visibility of those services. Are they open to all? Closed to only your processes? Or do you allow a limited set of users (perhaps subscribers) to use them?
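One way to picture that decision is as a visibility tier attached to each service. The sketch below is purely hypothetical -- the service names, tiers, and caller identities are all invented -- but it shows the three choices (open, subscriber-only, internal-only) side by side.

```python
# Hypothetical visibility tiers for services in a cloud system.
VISIBILITY = {
    "weather.today": "open",          # open to all callers
    "quotes.realtime": "subscriber",  # limited to paying subscribers
    "billing.adjust": "internal",     # closed to our own processes
}
SUBSCRIBERS = {"acme-corp"}
INTERNAL = {"batch-runner"}

def may_call(service: str, caller: str) -> bool:
    """Decide whether a caller may use a service, per its tier."""
    tier = VISIBILITY.get(service, "internal")  # unknown: most restrictive
    if tier == "open":
        return True
    if tier == "subscriber":
        return caller in SUBSCRIBERS or caller in INTERNAL
    return caller in INTERNAL

print(may_call("weather.today", "anyone"))         # True
print(may_call("quotes.realtime", "anyone"))       # False
print(may_call("billing.adjust", "batch-runner"))  # True
```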
Others must make this decision too. The US Postal Service may provide services for address validation, separate from and independent of mailing letters. Companies like UPS and FedEx may then choose to use those services rather than build their own.
Some companies already do this. Twitter provides information via its API. Lots of start-ups provide information and data.
Existing companies and organizations provide data, or will do so in the future. The government agency NOAA may provide weather information. The New York Stock Exchange may provide stock prices and trade information (again, perhaps only to subscribers). Banks may provide loan payment calculations.
You can choose to build a system in the cloud with only your data and services. Or you can choose to use data and services provided by others. Both have advantages (and risks).
But the automatic reflex of "not invented here" has no place in cloud system design. Evaluate your options and weigh the benefits.
Sunday, April 22, 2012
The big bucket at the center of all things
A lot of organizations have a central database.
The database is the equivalent of a large bucket into which all information is poured. The database is the center of the processing universe for these companies, with application programs relegated to the role of satellites orbiting the database.
The problem with this approach is that the applications are tied to the database schema. When you design your applications and tie them into the central database, you can easily bind them to the schema of the database.
The result is a large collection of applications that all depend on the schema of the database. If you change the schema, you run the risk of breaking some or all of your applications. At minimum you must recompile and redeploy your applications; it may be necessary to redesign some of them.
Notice that this approach scales poorly. As you create new applications, your ability to change the database schema declines. (More specifically, the cost of changing the database schema increases.)
You can make some types of changes to the schema without affecting applications. You can add columns to tables, and you can add tables and views. You can add new entities without affecting existing applications, since they will not be using the new entities. But you cannot rename a table or column, or change the parameters to a stored procedure, without breaking the applications that use those elements. Your ability to change the schema depends on your knowledge of specific dependencies of applications on the database.
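The difference between safe (additive) and breaking schema changes can be demonstrated directly. This sketch uses Python's sqlite3 module; the table, columns, and query are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER, name TEXT)")

# An existing application's query, bound to the current schema.
app_query = "SELECT id, name FROM customer"

# Additive changes are safe: the old query still runs unchanged.
conn.execute("ALTER TABLE customer ADD COLUMN email TEXT")
conn.execute("CREATE VIEW active_customer AS SELECT * FROM customer")
still_works = conn.execute(app_query).fetchall()  # no rows yet, but no error

# A rename breaks every application still using the old name.
conn.execute("ALTER TABLE customer RENAME TO client")
try:
    conn.execute(app_query)
    error = None
except sqlite3.OperationalError as e:
    error = str(e)

print(error)  # the query now fails: the table name it depends on is gone
```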
Cloud computing may help with this problem. Not because cloud computing has scalable processing or scalable database storage. Not because cloud computing uses virtualized servers. And not because cloud computing has neat brand names like "Elastic Cloud" and "Azure".
Cloud computing helps the "database at the center of the universe" by changing the way people think of systems. With cloud computing, designers think in terms of services rather than physical entities. Instead of thinking of a single processor, cloud designers think of processing farms. Instead of thinking of a web server, designers think of web services. Instead of thinking of files, cloud designers think of message queues.
Cloud computing can help solve the problem of the central database by getting people to think of the database as data provided by services, not data defined by a schema. By thinking in terms of data services, application designers then build their applications to consume services. A service layer can map the exposed service to the private database schema. When the schema changes, the service layer can absorb the changes and the applications can remain unchanged.
Some changes to the database schema may bleed through the service layer. Some changes are too large to be absorbed. For those cases, the challenge becomes identifying the applications that use specific data services, a task that I think will be easier than identifying applications that use specific database tables.
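A service layer of this kind can be sketched in a few lines. In the hypothetical example below (Python and sqlite3, with invented table, column, and function names), applications see only the shape returned by get_customers(); the private schema is visible only inside the service layer, so a schema change means rewriting one function rather than every application.

```python
import sqlite3

# Private schema -- only the service layer knows these names.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cust_v2 (cust_id INTEGER, full_name TEXT)")
conn.execute("INSERT INTO cust_v2 VALUES (1, 'Ada')")

def get_customers():
    """The exposed data service: applications depend on this shape,
    not on the table or column names behind it."""
    rows = conn.execute("SELECT cust_id, full_name FROM cust_v2").fetchall()
    return [{"id": i, "name": n} for i, n in rows]

# If cust_v2 is later renamed or split, only get_customers() changes;
# the dictionaries it returns -- and the applications -- stay the same.
print(get_customers())  # [{'id': 1, 'name': 'Ada'}]
```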