Mike Conigliaro

The Problem with Other People's Infrastructure Code

We have a problem with instant gratification in our industry. I first wrote about this back in 2008 when I complained about wizards, but these days, I’m more concerned about shared infrastructure code (scripts, Chef cookbooks, CloudFormation templates, etc.). With the proliferation of configuration management systems and similar tools, it’s now quite possible to copy and paste your way to a functional production environment in minutes. Unfortunately, I believe all this instant gratification comes with hidden but significant long-term costs, and it needs to be recognized for what it (usually) is: technical debt.

Lack of Control Means Lack of Consistency

A lot of people like to think that shared infrastructure code is like a software library: composable, and easy to swap in and out with little forethought. The reality is often far from that. Adopting it forces you to accept other people’s infrastructure decisions in a way that can be very difficult to change without a series of painful migrations.

Go down this road enough times and you’ll realize that it’s basically impossible to enforce any kind of consistency across your infrastructure when your implementations come from different authors who are not beholden to your organization, and who have wildly different opinions about the “correct” way to do things. This lack of consistency can very quickly turn onboarding and general management of your environment into a nightmare.

Increased Complexity

For some, the solution to the inconsistency problem is to try to create something configurable enough to work for everyone. Unfortunately, this leads to exponential growth in code complexity, because the author inevitably tries to support every possible configuration on every operating system known to man, and each new option multiplies the number of combinations that have to work (I’m fully convinced this is how Chef got its undeserved reputation for being complicated). That complexity often means more bugs and more time spent troubleshooting, which begins to negate the time you thought you’d saved in the first place.
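
To put rough numbers on that growth, here is a minimal sketch in Python. The option names and values are hypothetical and not taken from any real cookbook or template; the only point is that independent options multiply, so a "works for everyone" implementation has far more combinations to get right than one written for a single environment.

```python
from itertools import product

# Hypothetical options a "works for everyone" cookbook might expose.
# These names and values are illustrative assumptions only.
OPTIONS = {
    "platform": ["ubuntu", "debian", "centos", "amazon"],
    "init_system": ["systemd", "upstart", "sysvinit"],
    "install_method": ["package", "source", "tarball"],
    "tls": [True, False],
}


def count_configurations(options):
    """Count the distinct combinations of option values."""
    return sum(1 for _ in product(*options.values()))


# Every combination the general-purpose author implicitly promises to support.
general = count_configurations(OPTIONS)

# An in-house version that pins each option to the one value you actually use.
pinned = count_configurations(
    {name: values[:1] for name, values in OPTIONS.items()}
)

print(general, pinned)  # prints "72 1"
```

Four modest options already produce 72 combinations, and every additional option multiplies that figure again, while the pinned version only ever has one path to test.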

On Reinventing Wheels

Being able to push a button and get a cluster of machines in a few minutes is only the beginning of a good implementation. What will it take to maintain that cluster over the long term? How do upgrades work? How do you recover from a node failure? How do you avoid getting woken up in the middle of the night? These are the kinds of things every good operations person is thinking about before introducing a new technology to their environment, and having infrastructure code that complements the long-term plans is an integral part of the solution. The alternative is akin to closing your eyes and hoping that the ideas of some random person on the Internet perfectly align with yours.

In this case, “reinventing the wheel” is far from a waste of time, because it’s probably the only way to get a wheel with your exact specifications. More importantly, since reinventing wheels is how we learn to build and maintain wheels, any time spent here will pay huge dividends later in the form of knowing what to do when something goes wrong.

Conclusion

While shared infrastructure code makes for useful examples and a quick way to test-drive things, it’s probably not the best building material for your production environment. Far from being like composable software libraries, careless use of other people’s infrastructure code is more akin to forking a web application that looks similar to what you want rather than starting from scratch and building the simplest thing that will work for you. The former strategy will most likely result in a mess, while the latter will result in a codebase that is far smaller, simpler, more consistent, and easier to maintain over the long term.
