Showing posts with label file formats.

Monday, August 24, 2015

The file format wars are over -- and text won

When I first started with computers, files were simple things. Most of them were source code, and a few of them were executables. The source code files (BASIC, FORTRAN, and assembly) were all plain text. The executables were binary, since they contained machine instructions.

That simple world changed with the PC revolution and the plethora of applications that it brought. WordStar used a format that was almost text: ASCII characters, except that the last character of each word had its 8th bit set. Lotus 1-2-3 used a special file format for its worksheets. dBase II (and dBase III, and dBase IV) used a special format for its data.
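To make the WordStar trick concrete, here is a minimal Python sketch. It assumes a simplified model in which a "word" is any run of non-space ASCII characters and the only special treatment is the high bit on each word's last character; real WordStar files also carried control codes for formatting, which this sketch ignores. The function names are hypothetical, chosen for illustration.

```python
# Simplified model (an assumption): plain ASCII text in which the last
# character of each word has its high (8th) bit set. Real WordStar files
# also contained formatting control codes, which this sketch ignores.


def encode_wordstar(text: str) -> bytes:
    """Set the high bit on the last character of each whitespace-delimited word."""
    out = bytearray(text.encode("ascii"))
    for i, b in enumerate(out):
        last_in_word = not chr(b).isspace() and (
            i + 1 == len(out) or chr(out[i + 1]).isspace()
        )
        if last_in_word:
            out[i] = b | 0x80
    return bytes(out)


def decode_wordstar(data: bytes) -> str:
    """Clear the high bit of every byte to recover plain ASCII text."""
    return bytes(b & 0x7F for b in data).decode("ascii")


raw = encode_wordstar("Hello world")
print(raw)                   # b'Hell\xef worl\xe4'
print(decode_wordstar(raw))  # Hello world
```

Masking each byte with 0x7F recovers readable text, which is why such files were "almost text": a strict ASCII reader stumbles on the high-bit bytes, yet one line of code cleans them up.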

There was a "carboniferous explosion" of binary formats. Each and every application had its own format. Binary formatted data was smaller to store, easier to parse, and somewhat proprietary. The last was important for the commercial market; once a customer had lots of data locked in a proprietary format they were unwilling to change to a competitor's product.

The conversion from DOS to Windows changed little. Applications kept their proprietary, binary formats.

Yet recently (that is, with the rise of web services and mobile computing) binary formats have declined. The new favorites are text-based formats: XML, JSON, and YAML.

Lately I have seen no new proprietary binary formats. New formats have been built on one of the text-based formats. Even Microsoft has changed its Office applications (Word, Excel, PowerPoint, and others) to use an XML-based set of files.

This is a big change. Why did it happen?

I can think of several reasons:

First is the existence of the text formats themselves. In the "age of binary formats", XML, JSON, and YAML did not yet exist, and a binary format was simply how one stored data. Everyone did it.

Second is the abundance of storage. With limited storage space, a binary format is smaller and a better fit. With today's available storage that pressure does not exist.

Third is the availability of libraries to parse and construct the text formats. We can easily read and write XML (or JSON, or YAML) with commonly available, tested, working libraries. A proprietary format requires a new (untested) library.
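A small Python sketch of the contrast: the standard `json` module reads and writes a record with no custom code, while a made-up "proprietary" binary layout needs hand-written code that every reading program must duplicate exactly. The record, the `LAYOUT` string, and the `pack_record`/`unpack_record` helpers are hypothetical, invented for this example.

```python
import json
import struct

# A sample record, invented for illustration.
record = {"name": "Widget", "quantity": 12, "price": 3.5}

# Text format: the standard library does all the work, and any JSON library
# in any language can read the result.
text = json.dumps(record)
restored = json.loads(text)

# Hypothetical "proprietary" binary layout: a 16-byte name, a 4-byte signed
# int, and an 8-byte double, little-endian with no padding.
LAYOUT = "<16sid"


def pack_record(r: dict) -> bytes:
    """Write a record in the hypothetical binary layout."""
    return struct.pack(LAYOUT, r["name"].encode("ascii"), r["quantity"], r["price"])


def unpack_record(blob: bytes) -> dict:
    """Read a record back; the reader must know the exact layout."""
    name, quantity, price = struct.unpack(LAYOUT, blob)
    # struct pads "16s" with NUL bytes, so strip them off.
    return {"name": name.rstrip(b"\0").decode("ascii"), "quantity": quantity, "price": price}


blob = pack_record(record)
print(text)       # {"name": "Widget", "quantity": 12, "price": 3.5}
print(len(blob))  # 28
```

The binary record is compact, but nothing in it says what the fields are; a program written in another language, or years later, must learn the layout from somewhere. The JSON text is larger yet self-describing.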

Fourth is the pressure of legislation. Some countries (and some large companies) have mandated the use of open formats, to prevent the lock-in of proprietary data formats.

All of these are good reasons, yet I think there is another factor.

In the past, a file format served the application program. In the data processing world, we thought of applications as "owning" the data, with files being nothing more than a convenient holding space to be used when the application was not running (or when it was processing data from a different file). Programs did not share data -- or on the rare occasions when they did, it was through databases or plain text files.

Today, our mobile device apps share data with cloud-based systems. The cloud-based systems are collections of independent applications performing coordinated work. The nature of mobile/cloud is to share data from one application to another. This sharing between programs (sometimes written in different languages) is easier with standard formats and difficult with proprietary formats.

New systems will be developed with open (text) formats for storage and exchange. That means that our existing systems, the dinosaurs of the processing world with their proprietary formats, will fall out of favor.

I don't expect them to vanish completely. They work, which is an important virtue. Replacing them with a new system (or simply modifying them to use text formats) would be expensive with little apparent return on investment. Yet continuing to use them implies that some amount of data (a significant amount) will be locked within proprietary non-text formats.

Expect calls for people with skills in these file formats.

* * * * *

The recent U.S. Supreme Court decision about Java's API (in Oracle v. Google, the court declined to hear an appeal) means that, for now, APIs and file formats can be considered intellectual property. It may be difficult to reverse-engineer the formats of old systems without the express permission of the vendor. (And if the vendor is out of business or has been sold to a larger company, it may be very difficult to obtain such permission.)

Companies may want to evaluate the risks posed by their data formats.

Tuesday, July 16, 2013

BYOD is about file formats, not software

We tend to think about BYOD (Bring Your Own Device) as freeing the company from expensive software, and freeing employees to select the device and software that works best for them. Individuals can use desktop PCs, laptops, tablets, or smartphones. They can use Microsoft Office or LibreOffice -- or some other tool that lets them exchange files with the team.

One aspect that has been given little thought is upgrades.

Software changes over time. Vendors release new versions. (So do open source projects.)

An obvious question is: How does an organization coordinate changes to new versions?

But the question is not about software.

The real question is: How does an organization coordinate changes to file formats?

Traditional (that is, non-BYOD) shops distribute software to employees. The company buys software licenses and deploys the software. Large companies have teams dedicated to this task. In general, everyone has the same version of software. Software is selected by a standards committee, or the office administrator. The file formats "come along for the ride".

With BYOD, the organization must pick the file formats and the employees pick the software.

An organization needs agreement for the exchange of documents. There must be agreement for the project files for development teams. (Microsoft Visual Studio has one format; the Eclipse IDE has another.) A single individual cannot enforce their choice upon the organization.

For an organization that supports BYOD, think about the formats of information. That standards committee that has nothing to do now that BYOD has been implemented? Assign them the task of file format standardization.

Wednesday, February 20, 2013

Adapt or die

Nothing says "we're a big company" like the sentence "Only resumes in WORD format will be accepted". The use of passive voice is strongly correlated with bureaucracy, as are the imperious capital letters for a (perhaps not-so-humble) product name.

Demanding a single format is also arrogant. First, it says that you have a certain way of doing business, and that you are unwilling to change. Second, it says that you expect everyone else to conform to your way, regardless of their procedures or technology.

I suspect that many of these companies are using this approach from inertia. In the past, when Windows and Microsoft Office dominated the market, one could reasonably expect everyone else to use the same tools.

Times have changed. Microsoft Office is still popular, and common in corporations. Especially so for large corporations. But it is not universal. People and companies (especially start-ups) use other software and other formats. Local, desktop software now includes OpenOffice and LibreOffice. Apple iWork is available. Google Docs (now part of Google Drive) lets you compose and edit documents, spreadsheets, and presentations. Other formats include HTML, XML, and TeX.

Demanding a single format is so 1990.


Interestingly, firms that recruit for tech positions are some of the worst offenders. While I have seen none that ask for a facsimile, most ask for resumes in Microsoft Word format -- and only that format. A small percentage deign to accept PDF.

This limitation strikes me as, well, limiting. Why accept only the one format?

I can think of two reasons.

First, a single format simplifies archiving. People can point to older word processor formats (WordStar, WordPerfect) and claim that documents in these formats are no longer readable. They are right -- those formats are unreadable by modern-day word processors.

But the next version of Microsoft Word will (most likely) drop support for the old ".doc" format. When that happens, all of their old (non-.docx) Word files will be unreadable, too.

The second reason for using a single format is for simpler internal procedures. If everyone in an organization uses the same format, then the organization can standardize on a single word processor, which reduces outlays for software and time for training.

But the extension of this internal standard to external communications seems unwise. It makes others jump through your hoops, which is at best discourteous. It may irritate your customers or candidates, or worse, drive them away. Do you want to lose business over a file format?

Tech recruiters look especially bad when they do this. It changes their image from "with it" to "stodgy" and from "technically capable" to "technically limited". Demanding an old format (such as the soon-to-be-dropped ".doc" format) makes one look behind the times.

If I were a technical recruiting or staffing company, I would want the best candidates for the jobs available, not just those candidates who can jump through arbitrary hoops. (Although that may be exactly what some client companies desire.) I would want to demonstrate flexibility and adaptability to candidates and clients.

Think about your internal standards and your external interactions. Do you adapt to the world, or do you expect the world to adapt to you?