Current State Clarity, Part 3 – Push Jobs

This post is part 3 of a four-part series. Read Part 1 for an overview of the current state of the ecosystem and Part 2 for a look at Policyfiles.

If you’re unfamiliar with Chef Push Jobs, I dug into the Wayback Machine to find this post from Michael Ducy illustrating how you might use it in your Chef code. It’s a pretty old post. Why I had to turn to the Wayback Machine, and why you won’t find many tutorials illustrating its use, should become clear by the end of this post.

Since its initial release, Push Jobs has become open-source and has seen a number of updates. Push Jobs 2.x is the currently supported version. This post specifically addresses Push Jobs 2.x (though most statements apply to 1.x as well).

The Problem

Chef-client is fundamentally architected as a pull-based system. Some use cases require an immediate, coordinated execution of remote jobs rather than relying on the system to converge over time.
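To make the pull model concrete, here is a minimal Python sketch of what a pull-based agent does on each run: compare desired state with actual state, fix only what has drifted, then wait a fixed interval plus a random splay before pulling again. The resource names and values are hypothetical, and the real chef-client does far more than this; the interval and splay parameters only loosely mirror chef-client’s own interval and splay settings.

```python
import random

# Hypothetical desired state for one node: resource name -> desired value.
DESIRED = {"ntp": "installed", "nginx": "running", "motd": "managed"}


def converge(actual, desired):
    """Bring `actual` in line with `desired`, touching only what has drifted.

    Returns the list of resources that had to change. Running it again
    immediately returns an empty list: convergence is idempotent.
    """
    changes = []
    for resource, want in desired.items():
        if actual.get(resource) != want:
            actual[resource] = want
            changes.append(resource)
    return changes


def next_run_delay(interval_s, splay_s, rng=random):
    """Seconds until the next pull: a fixed interval plus random splay,
    so a fleet of nodes does not hit the server all at once."""
    return interval_s + rng.uniform(0, splay_s)


node = {"ntp": "installed", "nginx": "stopped"}
print(converge(node, DESIRED))  # ['nginx', 'motd']
print(converge(node, DESIRED))  # []
```

Because each run is idempotent, a node that misses one pull simply catches up on the next. The price is latency: nothing happens until the next pull, which is exactly the gap push-style execution tries to close.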

Coordinated, remote execution of arbitrary commands can be solved a number of different ways. In the days before Chef, a system administrator might have run ssh in a for-loop. Since the early days of Chef, many have used knife ssh for this: the knife ssh subcommand invokes SSH commands (in parallel) on a subset of nodes within an organization, selected by the results of a search query made to the Chef server. knife winrm provides similar functionality for Windows-based nodes. While these tools are convenient and easy to use, they have a number of limitations and failure scenarios that reduce their effectiveness. Chef Push Jobs was built to provide a more resilient system.
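For comparison, the “ssh in a for-loop” approach looks roughly like the following Python sketch, which fans a command out to a list of hosts in parallel. The host names you would pass in are hypothetical, and `run_ssh` simply shells out to whatever OpenSSH client is on the PATH. Note what it lacks: no quorum check before starting, no way to stop work already dispatched, and only per-host exit codes to report on.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def run_ssh(host, command, timeout_s=30):
    """Run `command` on `host` with the local OpenSSH client.

    Returns (exit_code, stdout). OpenSSH exits with 255 on connection
    errors, which this loop cannot distinguish from a remote command
    that itself exited 255. A hung host yields (None, "").
    """
    try:
        proc = subprocess.run(
            ["ssh", "-o", "BatchMode=yes", host, command],
            capture_output=True, text=True, timeout=timeout_s,
        )
        return proc.returncode, proc.stdout
    except subprocess.TimeoutExpired:
        return None, ""


def fan_out(hosts, command, runner=run_ssh, max_workers=10):
    """Run `command` on every host in parallel and collect per-host results.

    There is no quorum check, retry, or cancellation: each host simply
    reports whatever happened to it.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = pool.map(lambda host: (host, runner(host, command)), hosts)
        return dict(results)
```

The `runner` parameter exists so the fan-out logic can be exercised without real hosts. With the default runner, a node that is down only shows up after the fact, as an error code in the results.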

Chef Push Jobs is a client-server job system. A “job” is an action or command to be executed against a subset of nodes. Push Jobs allows you to run a job against any set of nodes defined on your Chef server, and those jobs run independently of a chef-client run. Push Jobs is an agent-based approach to remote execution that does not rely on SSH (unlike, for example, knife ssh). Instead, Push Jobs agents are tightly coupled with a Chef server because they rely on it for authentication and search.

Push Jobs: Benefits

Push Jobs has some great built-in functionality. It is written as a flexible and scalable remote execution engine. You can define a job as a single command, a script, or any combination thereof. Jobs can be restricted to run against an explicit list of nodes or against a dynamic set of nodes as defined by the results of a Chef search.
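A job in this sense is a command paired with a target set. The Python sketch below shows the two ways of choosing targets just described: an explicit node list, or a set resolved at request time from a search. The `Job` class, the `inventory` dictionary, and the use of a Python predicate in place of a real Chef search query are all illustrative assumptions, not the Push Jobs data model.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Job:
    """A simplified job: a command plus the nodes it should run on."""
    command: str
    nodes: List[str] = field(default_factory=list)  # explicit target list
    # Stand-in for a Chef search query: a predicate over node attributes.
    search: Optional[Callable[[dict], bool]] = None


def resolve_targets(job: Job, inventory: Dict[str, dict]) -> List[str]:
    """Return the node names a job targets.

    `inventory` maps node name -> attributes and plays the role of the
    Chef server's search index. A search, if given, takes precedence
    over the explicit list.
    """
    if job.search is not None:
        return sorted(name for name, attrs in inventory.items() if job.search(attrs))
    return list(job.nodes)


inventory = {"web1": {"role": "web"}, "web2": {"role": "web"}, "db1": {"role": "db"}}
web_job = Job("chef-client", search=lambda attrs: attrs.get("role") == "web")
print(resolve_targets(web_job, inventory))  # ['web1', 'web2']
```

The difference matters over time: if a new web node registers, resolving the search-based job again picks it up, while an explicit list stays exactly as written.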

Nodes with the Push Jobs client installed report their presence to the Push Jobs server via heartbeat. That presence information lets jobs support the notion of a quorum: if a job requires an 80% quorum of target nodes to be available, but only 75% are available when you request the job, then that job will fail. Once submitted, jobs run asynchronously and in parallel, and you can check on their status to determine whether they succeeded. Push Jobs also provides an API for submitting jobs and reporting on them.
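The heartbeat and quorum logic can be sketched in a few lines of Python. This is a hypothetical model, not the Push Jobs server’s code: the heartbeat timeout is an assumed parameter, the `last_heartbeat` bookkeeping is invented for the sketch, and quorum is expressed only as a percentage of targets. It reproduces the example above, where an 80% quorum fails when only 75% of targets are available.

```python
import time
from typing import Dict, List, Optional


def available_nodes(last_heartbeat: Dict[str, float], targets: List[str],
                    timeout_s: float, now: Optional[float] = None) -> List[str]:
    """Targets whose most recent heartbeat arrived within `timeout_s` seconds.

    `last_heartbeat` maps node name -> Unix timestamp of its last heartbeat.
    """
    now = time.time() if now is None else now
    return [n for n in targets
            if n in last_heartbeat and now - last_heartbeat[n] <= timeout_s]


def quorum_met(available: int, total: int, quorum_percent: int) -> bool:
    """True if at least `quorum_percent` percent of the targets are available.

    Integer arithmetic keeps the boundary exact (no float rounding).
    """
    return available * 100 >= quorum_percent * total


print(quorum_met(8, 10, 80))   # True: 80% available meets an 80% quorum
print(quorum_met(15, 20, 80))  # False: 75% available, so the job fails
```

Checking quorum before dispatch is what distinguishes this from the fan-out loop: the job is refused up front instead of half-running across whichever nodes happen to be reachable.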

Push Jobs functionality is cross-platform (i.e. you can send the same job to Windows and Linux nodes). Client-server communications are fully encrypted and secure. You can restrict which commands can be executed by the Push Jobs agent. In our benchmarks, Push Jobs could manage jobs distributed to approximately 10,000 nodes before we hit scaling issues. Overall, Push Jobs is a handy little package.

Push Jobs: Drawbacks

Push Jobs doesn’t scale as well as a Chef server (though this may not be a problem for many users). The Push Jobs server is not highly available: it only works on a single server. Because it is an extension of the Chef server, it must be installed on the same machine. Therefore, Push Jobs only works with standalone Chef servers. By design, Push Jobs is also incompatible with Hosted Chef.

For security, executable jobs are whitelisted. Therefore, jobs must be defined on each node before they can be executed (though this can be handled by the Push Jobs cookbook, and we whitelist chef-client out-of-the-box). As previously mentioned, Push Jobs not only requires setting up an additional service, it also requires setting up additional per-node agents. The per-node agents do not have the same wide OS support matrix as chef-client. Setting up and using those agents can lead to some chicken-and-egg edge cases in highly complex environments. These concerns mostly fall under operational overhead, and they’re mitigated once everything is configured. But you should be aware these hurdles exist.
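The whitelisting idea can be sketched as a lookup table on each node: a job request names an entry, and the agent runs only the command stored under that name. The Python below illustrates that assumption; the `WHITELIST` contents and the `command_for` helper are hypothetical, and in a real deployment the whitelist lives in each node’s Push Jobs configuration (which the cookbook can manage), not in code like this.

```python
import shlex

# Hypothetical per-node whitelist: job name -> command line the agent will run.
WHITELIST = {
    "chef-client": "chef-client",
    "restart-nginx": "systemctl restart nginx",
}


class JobRejected(Exception):
    """Raised when a requested job name is not on this node's whitelist."""


def command_for(job_name, whitelist=WHITELIST):
    """Translate a requested job name into an argv list, or refuse it.

    The requester supplies only a name; the command text comes from the
    node's own configuration, so arbitrary command strings cannot be run.
    """
    try:
        return shlex.split(whitelist[job_name])
    except KeyError:
        raise JobRejected(f"{job_name!r} is not whitelisted on this node") from None


print(command_for("restart-nginx"))  # ['systemctl', 'restart', 'nginx']
```

The same property that makes this safe is the source of the overhead described above: adding a new kind of job means updating the table on every node that should accept it.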

Job dispatch and reporting have some concerns to be aware of. Once triggered, jobs must complete or fail: jobs that have been dispatched cannot easily be canceled once in flight. As with all heartbeat systems, Push Jobs is susceptible to network partitioning and that may affect your ability to run or resolve the status of some jobs. The places and situations where this may be of concern are rare, but they do exist.
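Why a partition muddies job reporting can be shown with a small, hypothetical state model in Python. The state names and the timeout rule are assumptions for illustration and do not reproduce Push Jobs’ actual job or node states. The model also simplifies “cannot easily be canceled” to “cannot be canceled once dispatched”.

```python
from enum import Enum


class NodeJobState(Enum):
    """Simplified, hypothetical per-node job states."""
    PENDING = "pending"      # accepted, not yet dispatched
    RUNNING = "running"
    SUCCEEDED = "succeeded"
    FAILED = "failed"
    UNKNOWN = "unknown"      # server lost contact; real outcome not known


def resolve_status(state, last_heartbeat, now, timeout_s):
    """Report a node's job status as the server sees it.

    If a node stops heartbeating while its job is RUNNING (it crashed or a
    network partition cut it off), the server cannot tell whether the
    command finished, so the honest answer is UNKNOWN.
    """
    if state is NodeJobState.RUNNING and now - last_heartbeat > timeout_s:
        return NodeJobState.UNKNOWN
    return state


def can_cancel(state):
    """In this model a job can only be withdrawn before it is dispatched."""
    return state is NodeJobState.PENDING


print(resolve_status(NodeJobState.RUNNING, last_heartbeat=100.0, now=200.0, timeout_s=30))
# NodeJobState.UNKNOWN
```

An UNKNOWN node may well have finished its command on the far side of the partition, which is why retrying blindly is risky unless the command itself is safe to run twice.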

What this means for you

Should you be using Push Jobs in your infrastructure? The answer is: it depends.

If the pull-based convergence model of chef-client is enough for your environment, then you should continue to use your working solution. If the pull-based convergence model isn’t currently working for you, we encourage you to try designing for it. Pull-based models have several design benefits that make for more resilient infrastructures.

Push Jobs may be for you if you’re in one of the scenarios where they’re particularly useful:

1. Your remote execution needs surpass what you can get from the simplicity of ‘knife ssh’ or ‘knife winrm’ (which do a lot of the same things as Push Jobs).

2. Re-architecting processes to take advantage of convergent, pull-based methodologies is insufficient for your environment.

3. You run a standalone Chef server and you need a way to easily trigger chef-client runs on demand.

What we’ve seen from users of Push Jobs is that most fit into category 3: they only want to use Push Jobs to trigger remote chef-client runs immediately. For that purpose, Push Jobs is proven to work well out-of-the-box without much fuss, and in that context it is considered feature-complete. As such, Chef Software Inc. isn’t planning to add any new features to Push Jobs in the immediate future.
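For that common case, a run is typically triggered with the `knife job start` subcommand from the knife-push plugin. The Python helper below only assembles the command line; the `--quorum` and `--search` options should be checked against `knife job start --help` for your knife-push version, and the command name must already be whitelisted on the target nodes. The helper names `knife_job_start_argv` and `trigger` are hypothetical.

```python
import shutil
import subprocess
from typing import Iterable, List, Optional


def knife_job_start_argv(command: str = "chef-client",
                         nodes: Iterable[str] = (),
                         search: Optional[str] = None,
                         quorum: Optional[str] = None) -> List[str]:
    """Assemble a `knife job start` command line.

    Targets are either explicit node names or a Chef search query.
    `quorum` is passed through as given, e.g. "90%".
    """
    argv = ["knife", "job", "start"]
    if quorum is not None:
        argv += ["--quorum", quorum]
    argv.append(command)
    if search is not None:
        argv += ["--search", search]
    else:
        argv += list(nodes)
    return argv


def trigger(argv: List[str]) -> int:
    """Run the assembled command; requires knife (with knife-push) on PATH."""
    if shutil.which("knife") is None:
        raise RuntimeError("knife not found on PATH")
    return subprocess.run(argv, check=False).returncode


print(knife_job_start_argv(search="role:webapp", quorum="90%"))
# ['knife', 'job', 'start', '--quorum', '90%', 'chef-client', '--search', 'role:webapp']
```

Keeping assembly separate from execution makes it easy to log or review exactly what will be sent before triggering runs across a fleet.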

What this means for the Chef community

Push Jobs is available, open-source, and ready for anyone to use. It works very well for triggering chef-client runs on demand, and it should also work well in other contexts. If you’re using it in a broader context, we’d love to hear about your experiences via feedback.chef.io, by emailing the Chef Product team, or by reaching out to us in the Chef community Slack.


George Miranda

Former Chef Employee