Continuous Availability at IBM
What it means to run the cloud that supports very visible event sites.
IBM sponsors many well-known sporting events, such as the US Open and the Wimbledon Championships. The IBM Continuous Availability Services – Events Infrastructure (CAS-EI) team operates and manages the very popular web sites for these events. For example, at this year’s Wimbledon Championships, the team delivered half a billion page views over the two-week tournament. In this article, IBM CAS-EI members Brian O’Connell and Rich Bogdany talk about what it means to run the cloud that supports these very visible sites.
The IBM event cloud
The IBM cloud that the events team uses is, in fact, a hybrid cloud. Some of the team’s workloads run in the public cloud, on SoftLayer, while others run in the private cloud, which is based on OpenStack. Currently, AIX makes up a large part of the infrastructure, along with SUSE Linux and Red Hat Linux. Going forward, however, the team has decided to support a single platform, Red Hat Enterprise Linux (RHEL). They are gradually moving their systems over and bringing them under Chef management as they do so. Their development workstations are a mix of Red Hat Linux, Ubuntu, and OS X.
The overall environment is complex. There is a test environment, a pre-production environment spread across two separate locations, and a production environment spread across seven locations. Each production location is treated separately. It’s not unusual to take a cloud location down for a week or even two weeks at a time, or to keep it at an older version, while the web site itself remains available.
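As a rough illustration of that layout, the following Python sketch models the three environments and checks that the site stays reachable while individual production locations are offline or held at an older version. The location names, the version strings, and the `site_is_available` helper are hypothetical, and the rule that one serving location is enough is an assumption; the article does not describe the team's actual tooling.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Location:
    name: str            # hypothetical label, not a real IBM site name
    version: str         # version of the site deployed at this location
    online: bool = True  # False while the location is down for maintenance


@dataclass
class Environment:
    name: str
    locations: List[Location] = field(default_factory=list)


def build_environment(name: str, count: int, version: str) -> Environment:
    """Create an environment with `count` independently managed locations."""
    locations = [Location(f"{name}-{i + 1}", version) for i in range(count)]
    return Environment(name, locations)


def site_is_available(env: Environment) -> bool:
    """Assumed rule: the site is up if at least one location is serving."""
    return any(loc.online for loc in env.locations)


# Pre-production and production counts come from the article.
# The test environment's location count is not stated; one is assumed.
environments = {
    "test": build_environment("test", 1, "2.0"),
    "pre-production": build_environment("preprod", 2, "2.0"),
    "production": build_environment("prod", 7, "2.0"),
}

production = environments["production"]
production.locations[0].online = False    # taken down for a week or two
production.locations[1].version = "1.9"   # kept at an older version

serving = [loc.name for loc in production.locations if loc.online]
print(site_is_available(production), len(serving))
```

Because each location carries its own version and status, one can be taken out of rotation or left behind on an older release without changing the answer for the site as a whole.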
Rich Bogdany, Software Engineer
Part of our infrastructure hasn’t been ported to Linux and Chef yet. I keep seeing people re-doing things because they’re doing it manually. I keep saying, I can’t wait until that’s on Chef so we’re not doing that.
Along with migrating the rest of their infrastructure to Red Hat Linux and Chef, one of the team’s priorities is to increase the number of customers who are in charge of their own deployments. Brian says, “We already have customers who can push the deployment button, but they’re so close to our team they understand how things work. Now we’re broadening out to the rest of our customer base.”
Of course, there’s no lack of things to do. Brian says, “We’re always going to be iterating and improving. We’re still trying to figure out how to get all our customers on board, and still build everything repeatedly and easily. Let me put it this way: our backlog is growing, not shrinking.”