Automating Heterogeneous Environments at CenturyLink Cloud

Matt Wrock is a Principal Software Engineer at CenturyLink Cloud, where he works on data center automation. Many of you are already familiar with Matt from his blog, Hurry Up and Wait! Matt writes about many aspects of software engineering, including Chef. He spoke on Entering the Chef Ecosystem From a Windows Background at ChefConf 2015, where he also received the title of Awesome Community Chef. Matt is the project founder of Boxstarter, which automates Windows installations, and a contributor to Chocolatey, a package manager for Windows.

We caught up recently with Matt and talked about how he uses Chef in heterogeneous environments. Not only does this mean a mix of Linux and Windows machines, but it also means applications composed of both .NET and Linux components.

Matt says, “Our main application, the CenturyLink Cloud portal, is a .NET app that uses Couchbase, Elasticsearch, RabbitMQ and HAProxy as its core infrastructure components. These all run on Linux and are deployed by Chef. The .NET web application and middle tier services run on Windows inside of IIS and other standalone Windows services. Chef provisions and configures the Windows servers. The .NET apps themselves are deployed by Octopus Deploy, a deployment automation solution in the .NET space. It deploys NuGet packages from our build server, where they are produced, to the application servers. Chef interacts with Octopus Deploy to ensure that newly provisioned Windows servers have the latest release deployed to them. All of these heterogeneous components talk to one another via a variety of SDKs and protocols.”

This diagram shows the interconnections between the various components.


The Chef infrastructure and Octopus Deploy are located in a single data center while the Windows and Linux clusters are mirrored in many data centers.

If you are just starting to use Chef in a heterogeneous environment, Matt suggests that you first decide on a single operating system to run your Chef workstations and any other components that serve as the “hub” of your Chef continuous delivery (CD) pipeline. “At CenturyLink Cloud this pipeline includes our workstations, Chef provisioning nodes, as well as the build servers and agents that run the actual cookbook CD pipeline. If you’re in a mixed environment with both Windows and Linux, you could run all your knife commands and do your general development on either Windows or Linux.

“Both, I think, are OK decisions. It really depends on your team. If most of your infrastructure is on Linux, if most of your engineers are comfortable with Linux, then that makes sense for you. If you’re in a predominantly Windows shop, then it’s probably best to do that on Windows.

“Once you decide on the OS, the Chef ecosystem makes it easy for you, in the sense that most things are plug-in driven. With Chef provisioning or Vagrant or Test Kitchen, it should just be a matter of using the drivers that you need to talk to either your Windows or Linux boxes.

“Over the last year, Chef has taken much of the friction out of this process. With tools like Test Kitchen and knife-windows, a lot of the WinRM stuff works without a lot of configuration or need to know the internals. With Chef it’s easy to get up and running when setting up Windows boxes. You can sit in either one or the other OS but get test environments running in either operating system. One of the main reasons we decided to move forward with Chef as opposed to some of the other configuration management tools out there is that it had the best Windows story.”

Most of the cookbooks his team writes focus solely on either Linux or Windows, but there are a few examples of cross-functionality.

“We do have a recipe, for example, that formats drives. When we provision a box, we might have our chef-provisioning-vsphere driver add a disk to the box if it’s running a server that needs more space. For example, we want our Couchbase servers to have more space than the default so we’ll add a 50GB drive. We’ll do the same for some of our middle-tier Windows boxes. Once a box comes up, it’s a totally different set of commands for volume partitioning and formatting and all that. So there, we’ll have some branching inside of our Chef recipes to handle OS differences.”
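The kind of OS branching Matt describes might look something like the sketch below. It is plain Ruby rather than an actual recipe, and the helper name, device path, disk number and file systems are all illustrative assumptions, not details of CenturyLink Cloud's cookbook.

```ruby
# Hypothetical helper: returns the shell commands a recipe might run to
# prepare a newly attached data disk, branching on the node's platform family.
# Device names, the disk number and the file systems are assumptions.
def disk_prep_commands(platform_family, linux_device: '/dev/sdb', windows_disk_number: 1)
  case platform_family
  when 'windows'
    # One PowerShell pipeline: initialize, partition, then format the disk.
    [
      "Get-Disk -Number #{windows_disk_number} | " \
      'Initialize-Disk -PartitionStyle GPT -PassThru | ' \
      'New-Partition -AssignDriveLetter -UseMaximumSize | ' \
      'Format-Volume -FileSystem NTFS -Confirm:$false'
    ]
  when 'debian', 'rhel'
    # Partition the whole device, then create an ext4 file system on it.
    [
      "parted -s #{linux_device} mklabel gpt mkpart primary ext4 0% 100%",
      "mkfs.ext4 #{linux_device}1"
    ]
  else
    raise ArgumentError, "unsupported platform family: #{platform_family}"
  end
end

puts disk_prep_commands('windows')
puts disk_prep_commands('rhel')
```

In a real recipe the platform family would come from node['platform_family'], and the commands would typically be handed to a powershell_script resource on Windows and an execute resource on Linux.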

Remote node management can be particularly challenging in mixed environments. Matt comments, “At CenturyLink Cloud, we run our workstations on Linux. This means that we can’t access Windows nodes via PowerShell remoting, and we don’t install SSH servers on our Windows nodes. However, we do bootstrap all Windows nodes with our chef-provisioning-vsphere driver, which enables WinRM communication. We can use knife winrm to issue commands on these nodes. For example, when we deploy updated cookbooks to an environment, we roll through each node and force a reconverge. Our internal deployment gem can invoke chef-client on all nodes, regardless of OS, using either WinRM or SSH as a transport protocol, depending on the node.”
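The internal deployment gem is not public, so the following is only a rough sketch of the idea: pick the transport per node and build the matching knife command. The Node struct, the node names and the helper are hypothetical, and connection options such as credentials are deliberately left out.

```ruby
# Hypothetical node record; in practice this data would come from a Chef
# server search rather than a hand-built list.
Node = Struct.new(:name, :platform_family)

# Builds the knife command that would force a reconverge on one node,
# using WinRM for Windows nodes and SSH for everything else.
def reconverge_command(node)
  query = "name:#{node.name}"
  if node.platform_family == 'windows'
    "knife winrm '#{query}' 'chef-client'"
  else
    "knife ssh '#{query}' 'sudo chef-client'"
  end
end

nodes = [Node.new('web01', 'windows'), Node.new('couchbase01', 'rhel')]
nodes.each { |node| puts reconverge_command(node) }
```

Keeping the transport decision in one small function means the rest of a deployment tool can treat every node the same way.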

If you use WinRM in your own environments, you are probably using the WinRM Ruby gem. Matt says, “Many people don’t know that this gem installs a bin executable, rwinrm, that you can use to enter an interactive WinRM session on a Windows node, similar to an interactive SSH session.”

Unit testing is another challenge for mixed environments. A Chef recipe for a Windows node can potentially wrap many PowerShell scripts. In some cases this may be the best approach, but it is nearly impossible to unit test. ChefSpec/RSpec can’t do much more than stub these calls, so you need to capture these tests in Serverspec. That requires a provisioned test node, which means a longer test feedback cycle. Where possible, Matt suggests using Ruby instead of PowerShell scripts. Matt notes, “When we invoke Octopus Deploy to send the latest built release to a newly provisioned web server, we have the option to use either the .NET SDK via PowerShell, a command line .exe, or raw REST calls. Any of these methods can be implemented in a Chef recipe. We use the REST API. The httpclient and webmock Ruby gems make it no more verbose to use REST than PowerShell, with the added benefit that our logic can be unit tested locally, which allows us to iterate on the recipe more quickly and with strong test coverage. Since most branching logic is covered in the unit tests, we usually only need a couple of Serverspec tests to capture the end-to-end scenarios.

“On the other hand, for tasks that you have to do in PowerShell, like volume partitioning, there’s probably no way to accomplish them directly in Ruby without a shell. Windows does not expose a REST API for drive partitioning. You have to use either PowerShell or command executables. You’re going to have to run those tests locally on a provisioned test node to see if the tasks work or not.

“With unit tests I can mock out raw HTTP responses and add functionality very quickly to get a good idea if everything’s working. Then, I’ll write integration tests that make sure things are working end to end. It’s nice to have that fast feedback loop and not have to wait until you have a whole bunch of code before you can see if any of it works.

“Also, as I mentioned in my last post, we do almost all of our work within a Vagrant VM, so we can perform our unit tests right there. It’s much faster than spinning up VMs, and I don’t have to talk over a wire; it’s all local.

“When we began working with Chef we already had a background in test-driven development (TDD). So, when we saw ChefSpec, there was an immediate affinity. Right off the bat, we started writing all these ChefSpec tests. We soon learned that if you’re simply testing if a resource got called with particular attributes, the tests may not provide a lot of value and may need to be changed along with any change to the accompanying recipes. However, ChefSpec comes in very handy with LWRPs or libraries, especially when we have a lot of our own raw logic and we’re not just calling resources.”
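To make the fast-feedback point concrete, here is a minimal sketch of REST logic that can be tested locally with canned HTTP responses. It uses a hand-rolled stub in place of the httpclient and webmock gems so that it runs on the Ruby standard library alone. The class names, the endpoint path and the JSON fields are hypothetical and do not reflect the real Octopus Deploy API.

```ruby
require 'json'

# Hypothetical wrapper around a deployment server's REST API. The endpoint
# path and the "version" field are illustrative assumptions.
class ReleaseClient
  # http is any object responding to get(url) and returning [status, body].
  def initialize(http, base_url)
    @http = http
    @base_url = base_url
  end

  # Returns the latest release version for a project, or nil if none exists.
  def latest_version(project)
    status, body = @http.get("#{@base_url}/projects/#{project}/releases/latest")
    return nil if status == 404
    raise "unexpected status #{status}" unless status == 200

    JSON.parse(body).fetch('version')
  end
end

# Hand-rolled stub standing in for webmock: maps URLs to canned responses
# and answers 404 for anything it does not know about.
class FakeHttp
  def initialize(responses)
    @responses = responses
  end

  def get(url)
    @responses.fetch(url) { [404, ''] }
  end
end

canned = {
  'https://deploy.example.test/projects/portal/releases/latest' =>
    [200, '{"version":"1.2.3"}']
}
client = ReleaseClient.new(FakeHttp.new(canned), 'https://deploy.example.test')
puts client.latest_version('portal')
puts client.latest_version('missing').inspect
```

Because the HTTP layer is injected, the branching around status codes can be exercised in milliseconds, leaving only the end-to-end path for a slower test against a provisioned node.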

For Windows, Matt urges people to be aware of the windows_task resource. It reduces the amount of boilerplate code that you’ll need to write to create a scheduled task. This is particularly useful when performing operations on a Windows node that will be denied when running under a remotely authenticated account or when requiring one’s authentication context to travel a “second hop,” like when accessing a network share. See Matt’s post on this topic for more details on when remote commands can require a scheduled task on Windows. Because the Chef client usually runs locally, this is often not a concern, but it is very relevant when running the Chef client remotely, perhaps via Test Kitchen, knife winrm or Chef provisioning. When recipes succeed locally but fail via these remoting tools, the windows_task resource can help.

Matt also encourages people to consider using DSC. He says, “When we were doing a lot of our initial Windows recipe writing, it was extremely early days, not only for DSC but for Chef integration with DSC. If I were starting now, I would be leveraging DSC far more. As we move forward and come across things we need to automate—for example, we’re about to tackle some of our SQL Server nodes—you better believe we’re going to be looking for DSC resources because that can take away a lot of the work.”

Finally, Matt cautions against including the installation of Windows updates in the initial Chef bootstrapping. “We initially used Boxstarter to install Windows updates when provisioning Windows and quickly identified this as an anti-pattern, primarily because of the time it added to provisioning.” Matt notes that updates take a considerable amount of time to run, involve reboots and introduce added complexity if bootstrapping is being invoked from a remote provisioner. Instead, consider a tool like Packer to preinstall patches into your base OS images.

Matt posts regularly on his own blog and has a lot more advice, particularly on working with Windows. For example, in his post discussing the creation of lightweight Vagrant boxes, he talks about ways to reduce base Windows image sizes.

Roberta Leibovitz

Former Chef Employee