DevOps and Monitoring

Traditional monitoring focuses on low-level items like CPU, memory, and disk utilization. It is still important to collect this data: it helps with capacity planning and can provide useful data points when responding to incidents and outages.

DevOps is a cultural and professional movement focused on how to build and operate high-velocity organizations.

Practicing DevOps means understanding who the customers of a service are and what they need. A DevOps approach to monitoring may start by answering the question, “Is it up?” Starting there encourages discussion and helps people discover what “up” means. Those discussions happen across many parts of the organization to ensure a common understanding. “Up” may mean that your customers can pay you money, that they can stream video from your site, or that they can reserve seats on a flight or in a venue.

Customers’ experience of a single interface is likely provided by a number of backend services working in concert to help the customer complete the task at hand. Understanding “up” means understanding how all of these services work together and which parts are essential.
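To make that idea concrete, here is a minimal sketch of one way to express “up” in code, assuming each backend service has been labeled as essential or not for a given customer task. The service names, the `Service` type, and the `is_up` function are all hypothetical, chosen only for illustration; a real definition would come out of the cross-team discussions described above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Service:
    name: str
    essential: bool  # assumption: the customer task fails without this service


def is_up(services, healthy_names):
    """Return True when every essential service is currently healthy."""
    return all(s.name in healthy_names for s in services if s.essential)


# Hypothetical set of services behind a "customers can pay you money" task.
CHECKOUT = [
    Service("payments", essential=True),
    Service("cart", essential=True),
    Service("recommendations", essential=False),
]

# Recommendations are down, but customers can still pay: the task is "up".
print(is_up(CHECKOUT, {"payments", "cart"}))

# Cart is down, so customers cannot complete the task: it is not "up".
print(is_up(CHECKOUT, {"payments", "recommendations"}))
```

The point of the sketch is that “up” is defined by the customer's task rather than by any single service, so a failure in a non-essential service does not change the answer.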

It’s nearly impossible to talk about monitoring without also discussing alerting. Typically, alerts are sent to people when monitors pick up anomalies. Sometimes these alerts are actionable but, too often, they end up being just noise. For example, there may be a spike in CPU load caused by some batch processing that has no impact on the customer experience and will end once the batch process completes. In that scenario, it is not appropriate to send an alert that could wake someone at 3 AM. Yet this is often what happens. Practicing DevOps means putting our people first, and waking them at 3 AM to tell them about something that is not important and requires no immediate action is inhumane.
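The batch-processing scenario can be sketched as a simple routing rule, assuming each anomaly has already been judged on two questions: does it affect customers, and does it need someone to act right now? The `Anomaly` type, its fields, and the `route` function are hypothetical names for illustration, not part of any particular alerting tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Anomaly:
    description: str
    customer_impact: bool         # assumption: judged against your definition of "up"
    needs_immediate_action: bool  # assumption: a person can and must act right now


def route(anomaly):
    """Page a person only for actionable, customer-affecting anomalies."""
    if anomaly.customer_impact and anomaly.needs_immediate_action:
        return "page"
    # Everything else is recorded for review during working hours.
    return "record"


batch_spike = Anomaly(
    "CPU spike from nightly batch job",
    customer_impact=False,
    needs_immediate_action=False,
)
payments_down = Anomaly(
    "Customers cannot pay",
    customer_impact=True,
    needs_immediate_action=True,
)

print(route(batch_spike))    # record
print(route(payments_down))  # page
```

The data behind the CPU spike is still kept, so it remains available for capacity planning and incident response; it simply does not wake anyone up.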

I recently had the pleasure of joining Leon Adato (@leonadato), Clinton Wolfe (@clintoncwolfe), and Michael Coté (@cote) at THWACKcamp, an online conference hosted by SolarWinds, to discuss these ideas about monitoring and more. We were all part of a panel discussion titled ‘When DevOps Says “Monitor”.’ A recording of the panel, a transcript, and other resources are all freely available now on the THWACKcamp site. Check out the recording and let us know what you think.

How are you reinventing your organization’s approach to monitoring and alerting?

Nathen Harvey

As the VP of Community Development at Chef, Nathen helps the community whip up an awesome ecosystem built around the Chef framework. Nathen also spends much of his time helping people learn about the practices, processes, and technologies that support DevOps, Continuous Delivery, and Web-scale IT. Prior to joining Chef, Nathen spent several years managing operations and infrastructure for a variety of web applications. Nathen is a co-host of the Food Fight Show, a podcast about Chef and DevOps.