Thursday, August 4, 2022

Eggs and baskets

PCWorld, a venerable IT trade publication turned website, recently lost its YouTube video channel. The channel was disabled (or suspended? or deleted?) and no content was available. For more than eight days.

From what I can discern, IDG's YouTube account was controlled by an IDG e-mail address. Everything worked until IDG became Foundry. Foundry changed all of IDG's e-mail addresses to Foundry addresses but didn't change the account at YouTube. YouTube, seeing no activity on the IDG e-mail address (or maybe getting bounce messages), cancelled the account.

Thus, the PCWorld video channel was unavailable for over a week.

Why didn't PCWorld restore its channel? Or make its content available on another service? 

My guess is that IDG stored all of their video content on YouTube. That is, the only copy was on YouTube. IDG probably relied on YouTube to keep backup copies and multiple servers for disaster recovery. In short, IDG followed the pattern for cloud-based computing.

The one disaster for which IDG failed to prepare was the account cancellation.

I must say here that a lot of this is speculation on my part. I don't work for PCWorld, or at IDG (um, Foundry) or at YouTube. I don't know that the sequence I have outlined is what actually happened.

My point is not to identify exactly what happened.

My point is this: cloud solutions, like any other type of technology, can be fragile. They can be fragile in ways that we do not expect.

The past half-century of computing has shown us that computers fail. They fail in many ways, from physical problems to programming errors to configuration mistakes. Those failures often cause problems with data, sometimes deleting all of it, sometimes deleting part of it, and sometimes modifying (incorrectly) part of the data. We have a lot of experience with failures, and we have built a set of good practices to recover from those failures.

Cloud-based solutions do not eliminate the need for those precautions. While cloud-based solutions offer protection against some problems, they introduce new problems.

Such as account cancellation.

Businesses (and people, often), when entering into agreements, look for some measure of security. Businesses want to know that the companies they pick to be suppliers will be around for some time. They avoid "fly by night" operations.

A risk in cloud-based solutions is account closure. The risk is not that Google (or Oracle) will go out of business, leaving you stranded. The risk is that the cloud supplier will simply stop doing business with you.

I have seen multiple stories about people or businesses who have had their accounts closed, usually for violating the terms of service. When these people or businesses reach out to the cloud provider (a difficult task in itself, as such providers often don't offer phone support), the provider refuses to discuss the issue and refuses to provide any details about the violation. From the customer's perspective, the result is very much as if the cloud provider went out of business. But this behavior cannot be predicted from the normal signal of "a reliable business that will be around for a while".

It may take some time, and a few more stories about sudden, unexplained and uncorrectable account closures, but eventually people (and businesses) will recognize the risk and start taking preventive measures: keeping local copies of data, keeping backups of that data (neither local nor on the main cloud provider), and using a second provider for fail-over.

In other words:

Don't put all of your eggs in one cloud basket.
