Stories from the Edge

Complexities of edge computing and how Chef is helping clients overcome them

In today’s real-time, digital-first world, the ability to provide the same level of customer experience at a physical location as online is imperative to the long-term success of any company with edge operations. But given the lack of standards for widely adopted edge computing platforms, bandwidth constraints, and lack of visibility into what’s happening on a given device at a given location, this is no easy task. Layer on mergers, acquisitions, technical debt, and constantly changing hardware specs, and the complexity of trying to accelerate the rate at which app updates and real-time data can be collected on the edge becomes a multi-year, XXL-size project.

During this session Progress Sr. Product Manager Trevor Hess leads a discussion with panelists who will share their perspectives on the complexities of edge computing and how Chef is helping clients overcome them.

We're going to talk about what DevOps in the edge looks like. We're going to talk about how our offerings can help you tackle the opportunities and challenges in the edge market. And then we're going to actually show some real-world use cases from some of our existing customers. But first, let's talk about what that opportunity looks like. So if we look at it from the industry perspective, we're seeing predictions that by 2023, over half of new enterprise IT infrastructure will be deployed at the edge rather than in corporate data centers. That's up from less than 10% today. And we're seeing that probably by 2024 the number of apps at the edge will increase by 800%. That is massive.

But what is the edge? What is edge computing? Well, most of us at this point are familiar with the cloud. The cloud is that other person's computer that we borrow to do all of our things so that we don't have to have a data center anymore, right? Jokes aside, the cloud is our centralized data center, or it's our actual cloud provider if we're using one. We also get the concept of a fog layer in this edge world. So if you're looking at a warehouse or a branch or a store or a restaurant, that's the network or server set that exists inside of that location and allows it to connect back to those cloud or data center areas. And then finally, we actually get to those edge systems, which could be point of sale machines, they could be networked devices, they could be other things that have other devices attached to them, that are basically doing things in these remote locations that affect the business that you're trying to do.

And these devices have different requirements than the sort of requirements we see in the world of the cloud today. These things have potentially low connectivity. If it is a remote location, for example a bike rack somewhere that is managing one of those bike exchanges, it may have moments where it has low connectivity and can only check in from time to time. It may have to connect to other devices to manage those devices as well, and we need to be sure that that system is running and operating so that those child devices can still continue to operate and work as expected. And they're resource and network constrained, because that bandwidth may not always be as great as it is when you have an amazing connection at a data center. We have to think about how we operate in this world.

So there are two major concerns that we have when we look at this. One is how do we actually manage the equipment and the device itself and the software that runs on it. And then of course, the concern that everyone has for everything now, because we're seeing all these major attacks and breaches almost every week, is security. How do you ensure that those devices and those edge machines that you have in your locations are safe and secure? Not only that, but again, if we come back to that idea of lower connectivity, you have to make sure that as those systems are deployed out at scale, each one is updated the way that you expect it to be, or that it is actually able to connect back to whatever it needs to.

Many places today still have individual field teams that go out to locations, again whether that's a warehouse or a restaurant or what have you, to actually physically update the software in those places, because there's not an easy way for them to manage all of those edge devices safely and securely and at scale. They need to be able to make sure that the system comes back online and is operational. If something goes wrong, they need a way to be able to turn that software back to where it was in a working state, or put it up to a new version that is deployed almost immediately. We need to know what the state of those applications is. Are they healthy? Are they reporting the information that we expect them to? What version are they running on? What opportunities do they have? What is the state of their current compliance?

There are lots of these concerns that we have to think about and make sure are rolled out well when we are dealing with a situation where the scale can be thousands of devices. If you think about all of these different satellite locations, you can have hundreds of satellite locations, each one having tens to hundreds of edge devices, and that adds up to thousands of these devices. And again today, many of the processes for updating and managing these devices can be manual. There can be few standards, or just a single team that tries to go out to every single location over the course of months to update the things that need to be updated. And this results in a lot of things that are really hard to manage. How do you replicate a store system or a warehouse system without just building a model of a store itself to test and manage? You wind up with all of these different snowflakes that you need to manage. And you might end up spending a lot of time fixing production failures when you have to go to every location and make sure that each one is up to date and running the right pieces. And when you have to go to the physical boxes to manage these things, it can be very hard to manage the compliance and security updates that you need for these systems. Things can fail for unknown reasons and you have to send a tech out to manage them, and there's no way to get an at-a-glance view of what these systems look like and their state in real time.

Many of you are familiar with us at Chef as experts in the space of DevOps, bringing that policy-as-code model where we see these opportunities to deliver software quickly and safely and at scale. And those practices, those principles, are really key for maintaining and achieving high performance on the edge. And this is where I'm going to pass things over to Heather. She is going to talk a little bit about an article that she actually wrote recently, on why these things are important and what we need to manage.

Thanks, Trevor. Great overview, and thank you for lifting the fog for me on what edge computing is and what some of the challenges are that our clients are facing in their edge computing environments.

As I listen to you and as I've looked at the edge computing environments that customers are dealing with, I always keep asking why DevOps is still not being adopted at mass scale in edge environments for a lot of industries and a lot of customers. So really that's what I'm going to talk about now. When I wrote that article a couple of weeks ago, what really hit me was an article I had read in Computer Weekly a while back. I read that article and I really took away that when it comes to managing the edge, it's not really a DevOps approach that's needed, but an OpsDev approach. And why do I say that? It's because edge totally changes the dynamics around configuration and testing of an application. You're dealing with relatively simple devices on the edge, single-purpose devices, and all of the complexity is built into the app. The apps need to be self-sufficient, they need to be able to run on a variety of different hardware standards, and they need to be able to run in a variety of different conditions. Much of the Ops is actually built into the app: the configuration, the networking protocols, storage protocols, data protocols. In addition, many edge apps also still need to be deployed across other environments and data centers.

So this creates a complex and massive matrix of code bases, pipelines, deployment processes, and operational practices. Really, success on the edge means apps are not only architected to be functionally complete, but also operationally efficient at scale. So what then does the concept of OpsDev look like? Looking at our customers in the field that are successful and are really able to scale their edge operations, and I'm talking about customers that can do updates weekly, nightly, even daily, pushing out new features to their edge environments, I've taken the best practices that they've adopted and summarized them into the five areas you see here.

One, they're using an as-code approach. Machine-readable code that is testable and searchable is a must for scaling automated pipelines in any org. Hopefully if you're a Chef user and you're at ChefConf, you're fully bought into this already.

Two, open source community content. This is just really, really extra important in edge environments. There are just so many devices, so many different configurations, so many different artifacts needed to successfully deploy out to the edge, that having that library of content out there that you can use is critical to building a critical mass of automation assets.

And then the third one, which I think is really the magical piece here, is standardizing application packaging and testing practices. You can't take a lot of the complexity out of an edge app, but you can take out some of the complexity around how that app is built and tested. So by adopting a standard approach to packaging and creating a single immutable artifact that runs the same in production as it does in dev, so many problems are eliminated.

And then the fourth one: implement an edge-friendly deployment automation solution. Not all automation solutions are equal when it comes to edge. Many DevOps tools just don't work well in high-latency and/or air-gapped environments.

Take, for example, agentless solutions: when they lose connectivity, they lose the ability to communicate. That's in addition to other nuances like working with embedded systems or running on bare metal installs. And then the last one, which Trevor talked a lot about, and I love that picture with the tree, is the story about if a tree falls in the forest, does anyone know if they didn't hear it? Same thing with your edge devices, right? The ability to validate in real time is just so important. There's no doubt that in an environment with that many different systems at that scale, failures are going to happen. How quickly you're able to identify them and deal with them is what is critical. And with that, I'll turn things back over to Trevor, who's going to talk about how Chef helps clients on the edge.

Thanks, Heather. So now, as Heather mentioned, we're going to talk about how we can help you tackle these challenges on the edge. As always, at our core is the pattern and process that we've always taken to solving infrastructure, operations, and development problems: an as-code approach. It gives you things that are testable, scalable, and consistent that you can bring to your fleet to make sure that everything is safe and secure. We keep the majority of our code open source, and we have excellent community support. We give folks the ability to create standard packages, so that you can have a single artifact that can be deployed to your systems to manage those applications and deployments. We give you edge-friendly automation capabilities, which we'll talk about in some more detail ahead, where we have air gap support, self-healing infrastructure, as well as roll forward and roll back capabilities for those applications. And of course, there's the real-time validation.

So that means being able to look at that application dashboard in Automate and really see the state of the applications that are running in your fleet, so that you can understand where there's a problem or challenge across your deployment. And the way that we do that is through our Enterprise Automation Stack. Many of you may be using this already. We have a few flavors of it, but each flavor contains our infrastructure management capabilities, our compliance and remediation capabilities, and our application delivery capabilities.

So those are all of the pieces that you may have used already, for those of you who are using it, and for those of you who may be joining us now for the first time: being able to manage your infrastructure as code, being able to manage your compliance as code with the pre-built content packages that we provide, and being able to really build out complex automation for your apps to make sure that they stand up and are built the way that you expect them to be. And these are the things that we can help with here: we provide scalable patterns, full stack visibility, and unified experiences.

So we're talking about edge computing here today and how we can help with that, but we can also help with cloud migrations, continuous delivery, and patch management. Again, we do infrastructure management, compliance, and application delivery, and we've got our Automate system to make it easy for folks to see all of this information come back through. But let's actually go a little bit deeper into our area of application delivery. So I'm going to invite Rahul Goel to speak. He's the new engineering manager over our application delivery capabilities and products. Rahul, do you want to give folks a little bit of an overview of app delivery?

Sure, thank you, Trevor. So this is Rahul here. I'm the engineering manager for the app delivery team, and I joined Progress and Chef six months back. So I'm really super excited to speak with you all in this forum. So we're talking about app delivery. What is Chef app delivery all about? Chef app delivery is an automation solution to build, deploy, and manage applications and services, both stateful and stateless.

We can deploy to many different infrastructure environments, including bare metal and [inaudible]. Some of the key functionality which we support and provide is basically around runtime configurations, managing different environments like dev and test, and unified and consistent packaging. Release management and automation of rollbacks and roll forwards is another area which we have, which we'll talk about in subsequent slides.

So, what does the app delivery framework look like? The app delivery framework consists of several products, and we're going to talk about some of these products here. The first product which we have is the developer workspace, also known as Habitat Studio. It is a developer-friendly workspace where packages can be built and tested in a clean room environment. The next product which we have is for package management: Chef Habitat Builder is the product we are talking about here, and it can be consumed either as a cloud-based or an on-premises solution. Basically, plan files are stored in Chef Habitat Builder, and they can be viewed and accessed. The next product which we're talking about is Habitat Supervisor, which is the runtime agent. The supervisor is more like a process manager with advanced runtime management capabilities. Finally, we have app dashboards in Automate, which provide visibility and validation capabilities.

Essentially, the supervisor reports the health of the application to the app dashboard. So how does it all work? The process of automating applications with Chef Habitat can be roughly divided into three phases: define, package, and deliver. During the definition phase, we go about declaring a manifest, which is essentially how the application should be packaged, delivered, and managed. Once we have defined it, we can build and package it. Once we have created the package, it becomes an artifact, which is also known as a HART file. A HART file can be exported to run in a variety of runtimes, basically with zero refactoring or rewriting. Finally, applications are delivered and managed by Habitat Supervisor. This agent can run on bare metal, in virtualized environments, in containers, or on cloud platforms. The supervisor is also used during the packaging process to ensure that the artifacts which we deploy and run are created in the same way and tested inside of the environment.

What are the advantages which we get from app delivery for edge scenarios? The first advantage is streamlined builds. When we talk about streamlined builds, we are referring to HART files, which are single artifacts consisting of what the app needs. It is a super lightweight and efficient way of packaging apps. The second advantage is about operational overhead and how we reduce it. We have the product which we were talking about, the supervisor, which is a very lightweight agent and has the ability to manage the runtime, watch for updates, and provide health checks. All those functions reduce the operational overhead significantly. The next advantage which we can talk about is managing deployment failures. As we all know, deployments tend to fail sometimes due to various reasons.

And here we again talk about the supervisor, which has advanced capabilities like reconfiguration, supporting different topologies, and being able to be configured with different update strategies. It also has capabilities around rolling forward and rolling back. It can watch for packages to be updated and update automatically as they become available in Builder. Similarly, rollbacks can also be performed seamlessly in case of failures.

OK. So now we've spoken about what app delivery is all about, and what the different products and capabilities are. I think we want to take a little bit of a deeper dive and understand how we define the package, and how that helps to create the streamlined build. Essentially, when we're talking about the definition of a package or a HART file, we're talking about a plan file, which is what we author during our development phase. And here's an example of a plan file, which is a definition for an app consisting of multiple sections: metadata, dependencies, runtime binds, and callbacks.

As we can see in the example, a really cool thing is that we have defined Tomcat as one of the dependencies. That means that we don't have any prerequisite to install Tomcat. Rather, Tomcat gets packaged and deployed as part of the same HART file. We also have the ability to override the default behavior of Chef Habitat in each of the different build phases through build phase callback methods, which are also highlighted in this slide.

This slide basically talks about the end-to-end scenario. We start from the local dev workstation, where the package is defined. It is then committed to source control, and using one of the CI/CD pipelines, we build the package. The package is uploaded to Builder, which acts as a repository for that package. On the other side of the wall, we have edge devices which have a supervisor running. The supervisor has the ability to watch for different package builds in different channels, and it picks up the changes and deploys them on the different edge devices.

Finally, the supervisor reports the health status to the app dashboard in Automate. So this is an example of a use case where we have a web app which is being packaged and deployed using Chef Habitat. It is being managed by a supervisor. And we have an instance of Builder which has a definition of a channel named rb_prod. This channel can be watched for updates. So, for example, right now we have version 7.0 available in that particular channel, and that is the version of the package which is running at runtime for that particular app, and the health of the application is being reported to Chef Automate. So now, if we have to promote a new version 8.0, we do that using a command that basically brings that package into the particular channel named rb_prod.
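The promote-and-watch flow described here can be sketched as a small Python simulation. This is only an illustration under stated assumptions, not Habitat's implementation: the `Channel` and `Supervisor` classes, their methods, and `demo` are hypothetical names, and the real promote command, update strategies, and health reporting are not modeled. Only the channel name rb_prod and the versions 7.0 and 8.0 come from the talk.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Channel:
    """Hypothetical stand-in for a Builder channel such as rb_prod."""
    name: str
    versions: List[str] = field(default_factory=list)  # promotion order, newest last

    def promote(self, version: str) -> None:
        # Promoting makes this version the one that watchers should run.
        self.versions.append(version)

    def demote(self, version: str) -> None:
        # Demoting removes the version, so the previously promoted one is current again.
        self.versions.remove(version)

    def current(self) -> Optional[str]:
        return self.versions[-1] if self.versions else None


@dataclass
class Supervisor:
    """Hypothetical stand-in for the agent running on one edge device."""
    channel: Channel
    running: Optional[str] = None

    def poll(self) -> bool:
        # Compare what is running with what the channel offers; switch on any difference.
        target = self.channel.current()
        if target is not None and target != self.running:
            self.running = target
            return True
        return False


def demo() -> List[Optional[str]]:
    """Replay the scenario: 7.0 running, 8.0 promoted, then 8.0 rolled back."""
    rb_prod = Channel("rb_prod")
    device = Supervisor(rb_prod)
    history = []

    rb_prod.promote("7.0")
    device.poll()
    history.append(device.running)

    rb_prod.promote("8.0")
    device.poll()
    history.append(device.running)

    rb_prod.demote("8.0")
    device.poll()
    history.append(device.running)
    return history
```

Calling `demo()` returns the version the simulated device runs after each step. The design point is that the device pulls from the channel on its own schedule, so a device with intermittent connectivity simply converges the next time it can poll.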

And once that particular version is available, the supervisor has the ability to watch for the update. As soon as the package is available, the supervisor is able to download the package and deploy it in the runtime. If a rollback is needed, as we are seeing in this slide, for example because due to some runtime failures we want to roll back the 8.0 version to the earlier version, then again the supervisor automatically picks up the previous version to complete the rollback.

What are the capabilities for edge we have today? Chef supports different varieties of operating systems and architectures for different products like Infra, Habitat, and InSpec. We also have capabilities in app delivery around automation with Chef Habitat. We also have capabilities around compliance with Chef InSpec to design and continuously test and enforce security and compliance standards.

Finally, we have capabilities around infrastructure management with Infra client.

Now we're going to talk about some real-world use cases from some of our greatest customers who love to do these things with Habitat and with the Enterprise Automation Stack. So one customer that you've probably all heard of, and actually, I think you have heard of all the customers we're going to talk about today: Walmart has an application that runs some of the cameras in their stores and leverages some open source technology to do some image processing. And that software can get very cumbersome. It has lots of configuration points. It can be challenging to customize and challenging to update and deploy out.

So Walmart used Habitat to package that application and make it easier than ever for their teams to package that software, make those updates, and get the monitoring and control that they need to know that the software they're deploying is running the way that they expect it to, with the configurations that they expect as well. That has let their developers focus much more on solving the problems that they have, and not on the platform that they're using.

Another one of our customers is Panera Bread. Panera is probably the largest-scale deployment of Habitat. They are currently managing about 12,000 machines with Chef Infra, running over 58,000 Habitat application instances. They average about 28,000 builds per month. So if you're going into a Panera store, the systems inside of that store are running Habitat to run the software that they need to manage the store and get your orders and everything through that you need. Again, this is one of those scenarios where the team needed to go to locations to deploy the software, and it could take up to a month to get all of the needed software updates out. Now they can do that in a day and monitor those deployments as they roll out from dev to preprod to production. And those DevOps teams, as they're doing those rollouts, no longer need to work nights and weekends to get that done. And again, once those changes have been made, they can see inside of Automate what the rollout looks like and what the health of those deployments is.