api.berkshelf.com outage and corrective actions

Yesterday api.berkshelf.com suffered periodic outages over roughly an eight-hour window. Once we learned of the outage, we worked with the core Berkshelf team to identify the cause: the Heroku dyno was hitting its memory quota. We then helped transfer the instance to a Chef Heroku account so that Chef could cover the cost of moving it to the top-tier dyno, whose additional memory stabilized the instance.

Chef stands behind Berkshelf. Today we held a well-attended post-mortem with members of both the Chef and Berkshelf teams to discuss what corrective actions to take in light of this outage and how Chef can help. The Chef Operations team is going to set up monitoring for the API, and Chef will provide support to resolve any outages. We're also going to have one of our engineering teams work on improving the caching in berkshelf-api to reduce its memory footprint.

We greatly appreciate all the time and effort the Berkshelf team puts into providing this tool. In many ways, they continue to be recognized as rock stars of the Chef community. Going forward, we're going to support them operationally so they can get more rest and keep providing you with awesome open source tools.

Bryan McLellan