Ontology, Infrastructure Classification, and the Design of Chef

An example ontology specification. CC-Attribution-NoDerivs by gertcha on Flickr
In philosophy, ontology is (as Wikipedia says) “the study of what entities exist or can be said to exist, and how such entities can be grouped, related within a hierarchy, and subdivided according to similarities and differences.” Wikipedia goes on to say that ontology is often paired with taxonomy (the science of classification) in IT applications. Chef, however, was explicitly designed to not be an ontological system, in contrast to many other solutions on the market. Why is that? I’d like to take a few moments to explain the design thinking behind Chef — and why we feel that not being an ontology allows us to be the most flexible and extensible automation platform.

One of the principles of good software design is to provide just enough abstraction to make reasoning about things easier. Too little abstraction, and a developer is always down in the weeds. Too much, and you are shoulders-deep in frameworks that bear no resemblance to earthly objects and are impossible to reason about. One can regard the C language as striking the right balance for developers writing code to interface with hardware. It gets them close enough to physical entities (memory, processors, CPU registers) without forcing them to write assembly language, but also without introducing unnecessary concepts for this problem domain like object-orientation. Perhaps this is why C++ has not been as successful in this space.

From Chef’s inception we have also tried very hard to strike a balance. Because Chef is a thin DSL (domain-specific language) on top of Ruby, we’re able to provide the same user experience that a restrictive ontological framework would give you: namely, a built-in taxonomy of all the basic resources one might configure on a system, plus a defined mechanism to extend that taxonomy. But the developers of Chef cannot possibly anticipate a priori what resources you’re going to want to configure in the future. Thus, the full power of the Ruby language is available to you. Ruby was explicitly chosen as an implementation language for exactly these reasons: it provides the flexibility to operate in both modes. Other languages do not (much as we love Perl).
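To make the "thin DSL on top of Ruby" idea concrete, here is a minimal sketch in plain Ruby. It is a toy and not how Chef itself is implemented: `ToyRecipe`, `Resource`, `register`, and `firewall_rule` are all names I made up for illustration. It shows a small built-in taxonomy, a mechanism to extend it, and ordinary Ruby sitting alongside the DSL calls.

```ruby
# Toy illustration only: this is NOT Chef's implementation or API.
# ToyRecipe, Resource, register, and firewall_rule are hypothetical names.
class ToyRecipe
  Resource = Struct.new(:type, :name, :properties)

  attr_reader :resources

  def initialize
    @resources = []
  end

  # Extend the "taxonomy" by defining a new DSL method for a resource type.
  def self.register(type)
    define_method(type) do |name, **properties|
      @resources << Resource.new(type, name, properties)
    end
  end

  # A couple of built-in types, standing in for a basic taxonomy.
  register :package
  register :service

  # Evaluate a block as a recipe; the block is ordinary Ruby.
  def self.evaluate(&block)
    recipe = new
    recipe.instance_eval(&block)
    recipe
  end
end

# A user adds a resource type the original authors never anticipated.
ToyRecipe.register(:firewall_rule)

recipe = ToyRecipe.evaluate do
  package "nginx", version: "1.24"
  service "nginx", action: :start

  # Plain Ruby (loops, variables, interpolation) is available in the recipe.
  [80, 443].each do |port|
    firewall_rule "allow-#{port}", port: port
  end
end

recipe.resources.each { |r| puts "#{r.type}[#{r.name}] #{r.properties}" }
```

The point of the sketch is the last third: the recipe reads like a declarative document, yet nothing stops the author from dropping into the host language or registering a new resource type.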

This leads me to reflect on the phrase “infrastructure as code” and what the definition of “code” is in this context. I see this term being thrown around by many people to describe their solutions. Not all purported “infrastructure as code” solutions are actually that. Actual program code has certain useful properties. Chief among them are easy composition, extension, introspection, and most important, testability: the ability to formally instantiate mock objects and examine their behavior in a test context. This is distinct from infrastructure as an ontological document, like JSON or YAML. A grammar with control flow or variable substitution does not make something infrastructure as code. It is this property of Chef as being real code that has fueled the fast growth of a testing ecosystem around it. Tools like ChefSpec and Foodcritic could not exist otherwise.
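As a rough illustration of what testability buys you, here is a toy sketch in plain Ruby. None of this is the Chef or ChefSpec API: `configure_web_server` and `FakeSystem` are hypothetical names. The "recipe" is a function that makes a decision and delegates side effects to an object, so a test can hand it a fake and inspect what it tried to do.

```ruby
# Toy illustration only: hypothetical names, not Chef or ChefSpec APIs.

# A "recipe" written as code: it picks a package name based on its input
# and delegates side effects to whatever system object it is given.
def configure_web_server(system, platform:)
  package_name = platform == "debian" ? "apache2" : "httpd"
  system.install(package_name)
  system.start(package_name)
end

# A fake that records calls instead of touching a real machine.
class FakeSystem
  attr_reader :calls

  def initialize
    @calls = []
  end

  def install(name)
    @calls << [:install, name]
  end

  def start(name)
    @calls << [:start, name]
  end
end

fake = FakeSystem.new
configure_web_server(fake, platform: "debian")

# Examine the recorded behavior, as a test would.
expected = [[:install, "apache2"], [:start, "apache2"]]
raise "unexpected calls: #{fake.calls.inspect}" unless fake.calls == expected
puts "recipe behaved as expected on debian"
```

A static JSON or YAML document offers no equivalent seam: there is nothing to call with a fake, only data to validate against a schema.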

Introspection and composition also allow us to extend Chef into problem domains that didn’t exist when Chef was first written: for example, containers, fleet management, or compliance. Clever engineers are able to extend Chef in a fully supported way to manage entire fleets of machines using a toolkit like Chef Provisioning, and indeed are able to pull off tricks like addressing Chef within Chef (see the Cheffish library and the experimental resource cookbook). Conceptually, you could even extend the Chef DSL to express any policy in Chef: to grant user X access to certain GitHub repos, to put them into certain Active Directory groups, even to request that the HR system mark them as an active employee on their first day and to set their weekly pay! All that code is just waiting to be written.

Ever since I first encountered CORBA in college, I’ve been extremely wary of ontology-based solutions. I am particularly unnerved by giant committees like OASIS that are trying to create ontologies to define all of cloud computing using XML. The world changes too quickly to convene subcommittees in order to figure out how to represent new innovations like Joyent’s Triton or AWS Lambda in a timely way, to say nothing of the complexity of trying to document the world using XSDs. Have we learned nothing from the web innovators’ migration away from XML/SOAP and the rise of RESTful web services over HTTP exchanging simple JSON payloads?

Ultimately, we believe Chef is the best automation platform because of its infinite flexibility and extensibility, and yes, explicit rejection of ontological design. We cannot possibly anticipate what systems or technologies customers will want to manage in the future, and besides, why should we be the experts? Our users, the domain experts about those things, should be the ones having a say in how they are configured. We merely bring the toolkit to the table, and in the words of the great Larry Wall, “there’s more than one way to do it”.

In another article, I’ll talk about how the extensibility of Chef can be used to solve problems seemingly unrelated to traditional configuration management like security and compliance. Meanwhile, for more information on the shortcomings of ontologies, I invite you to read David Weinberger’s book, “Everything is Miscellaneous: The Power of the New Digital Disorder”.

Julian Dunn

Julian is a former Chef employee