Sunday, December 18, 2022

Moving fast and breaking things is not enough

Many have lauded the phrase "Move fast and break things". Uttered by Mark Zuckerberg, founder of Facebook, it became a rallying cry for developing at a fast pace. It is a rejection of the older philosophy of careful analysis, reviewed design, and comprehensive tests. And while that pace has its appeal, it is clear that "move fast and break things", by itself, is not enough.

Moving fast and breaking things results in, obviously, broken things. Broken things can be useful (more on this later) but they are, well, broken. A broken web site does not help customers. A broken database does not produce end-of-month reports. A broken... you get the idea.

Clearly, the one thing that you must do after you break something is to fix it. The fix may be easy or may be difficult, depending on the nature of the failures that occurred. A developer, working in a private sandbox, can break things and then restore them to working order with a "revert" command to the version control system. (This assumes a version control system, which I think in 2022 is a reasonable assumption.)

Moving fast and breaking things in the production environment is most likely a larger problem. One cannot simply revert everything to last night's backup -- today's transactions must be maintained. So we can say that moving fast is safer in developer sandboxes and riskier in production. (Just about everything is riskier in production, I think.)

But breaking things and fixing them is not enough, either. There is little point in breaking something and then fixing it by putting things back as they were.

As I see it, the point of breaking things (and fixing them) is to learn. One can learn about the system: its strengths and weaknesses, how errors are propagated, the dependencies of different components, and the information contained in logs.

With new information, one can fix a system and provide a solution that is better than the previous design. One can identify areas for future improvement. One can understand the limitations of external services and third-party libraries. That knowledge can be used to improve the system, to make it more resilient against failures, and to make it more flexible for future enhancements.

So yes, by all means move fast and break things. But also fix things, and learn about the system.