This article originally appeared on InfoQ.
I find myself constantly involved in discussions about the various approaches to enterprise application management in the cloud and DevOps era. One thing I have found particularly difficult in these discussions is explaining the fundamental shift from the way we managed applications in a pre-DevOps, pre-cloud world to the way we manage them today. And then I realized why.
A large part of the reason this has been so hard is that we still use the same terminology (logging, monitoring, orchestration…) that has historically described the management of these applications, even though the way we actually implement or consume these features today is fundamentally different. Then I discovered how I could qualify, quantify, and even reconcile the gap between the traditional data center management model and modern DevOps frameworks.
In this post, I want to lay out these thoughts. I'm sure that many others are in the thick of similar discussions, especially in the enterprise world that is still heavily controlled by traditional IT groups, and can use some of the lessons from this experience in their own discussions.
DevOps is Not a Feature!
Faced with this challenge, I believe that what best explains where we have gone wrong is that we started thinking of DevOps as a feature.
Quite often you hear the IT guys point to their Tivoli, BMC, CA or other management solution that they use to manage their data center as the basis for a solution for DevOps - because it's a “standard” within the organization.
They do realize that there are specific gaps they need to bridge to satisfy their DevOps teams, such as continuous deployment scenarios and better automation. However, I’ve found that the most common initial approach to bridging these gaps has been a gradual extension of traditional data center management tools with new features such as continuous deployment tools, automation tools, and other “DevOps add-ons, features, and so-called facilitators”.
How is data center and application management different in the DevOps world?
To answer this question we first need to understand the shift that we're going through, and where we're heading.
I like to use the following slide to describe the shift that we're going through.
In a pre-DevOps world, our entire data center was built under the assumption that each organization has its own special needs and therefore requires a tailored approach and solution for almost every challenge. We used to be very proud of how special our data center was, kept the way it was run secret, and rarely talked about it publicly.
In a post-DevOps world, data centers are built as agile infrastructure that is optimized for the speed at which we can release new products and features, and for the cost of doing so. When we optimize for these goals, being special becomes a barrier, as it results in significantly higher costs and slower processes.
This is where the slide comes in. I find this analogous to the shift that happened in the car industry when it moved from building custom cars to a production line, as at Ford or Toyota. That was a major shift not just in the way cars were designed, but also in the way car organizations were structured. When we optimize for speed and cost, we cannot afford silos, and we cannot afford high-end devices that are optimized for the extreme scenario. Instead, we have to break silos and use commodity resources.
This also leads to a huge culture change. We're now seeing all the "web scale" companies on stage speaking about their solutions, and even sharing them as open source projects. And they're doing this not because they see the things that they are doing as less valuable. Quite the contrary, they do so because they believe that by doing so, they can get better at a faster pace. Bigger, better, faster.
Ok, so how does this map to concrete features?
Even when I present the slide above and all the heads in the room start to nod in agreement, it is never enough.
People nod, but still continue with the approach of adding features to their existing management tools to bridge this gap.
Some of the vendors in this space have even gone a step further and rebranded their solutions, thinking that a different name and a new bundle would make these tools fit into this new DevOps world.
I therefore had to find a way to quantify this gap for the product managers in the room. To do that, I used the table below, which maps the difference between management solutions in a pre- and post-DevOps world:
|Pre-DevOps|Post-DevOps|
|---|---|
|Monolithic|Tool Chain|
|Closed Source|Open Source|
|Limited Scale (x100s) - rely on a centralized database|Web Scale - everything needs to scale out|
|Manage Hosts/Devices|Managing Infrastructure Systems and Clusters|
|Infrastructure Centric|Application Centric|
|Limited Plug-Ins|Future Proof|
Monolithic vs. Tool Chain
In a pre-DevOps world, if you wanted to provide a management solution you had to develop your own logging, monitoring, billing, alerting and any other proprietary systems, simply because there was no other way to do it. This resulted in a fairly monolithic management solution.
In a post-DevOps world, we're looking for a best-of-breed approach in which we select a tool chain that keeps changing and growing fairly rapidly. Every DevOps group tends to select its own set of tools in this chain, most of which come from the open source community. They wouldn’t consider a suite of mostly closed source services all coming from the same provider, because that would both limit their ability to select and integrate new tools into their processes as those tools are introduced, and force them to compromise on the quality of each service. Most monolithic solutions do a fairly average job at each layer (e.g. logging, monitoring...), whereas the individual projects tend to be the best in their domain.
Closed vs. Open Source
In the DevOps world, open source has become a key criterion, whereas many of the traditional management solutions were built as closed source solutions. Contrary to what most people think, the popularity of open source isn't due to the fact that its entry level is free by definition. Open source determines how well one can use or customize a given framework to fill the gaps one sees. What’s more, it creates a community of users who develop skill sets around these tools and allows for more natural integration between tools, among many other aspects that, at the end of the day, have a direct impact on the ability to achieve higher productivity and speed of innovation.
Limited Scale vs. Web Scale
Most traditional management solutions were designed to handle tens or hundreds of services and applications at best. Quite often they are built around a centralized database such as MySQL, whereas in the web-scale world we need to scale to thousands, or even 100,000, nodes in a typical environment. To reach this level of scale, the architecture of the management framework needs to be designed to scale out across the entire stack. This can be achieved by separating functions such as provisioning, logging, load balancing, and real-time monitoring into independent services that can each scale on their own. It also needs to apply other scalability best practices, such as message brokering and asynchronous processing.
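As a minimal sketch of this kind of decoupling, the following uses Python's standard `queue` and `threading` modules as a stand-in for a real message broker. The service names, the event strings, and the worker counts are hypothetical; the only point is that each service consumes asynchronously from its own queue and can be given more workers without touching the others.

```python
import queue
import threading


def run_service(name, events, workers):
    """Process every event with `workers` parallel consumers and return the results."""
    inbox = queue.Queue()  # stand-in for a broker topic (assumption)
    results = []
    lock = threading.Lock()

    def consume():
        while True:
            event = inbox.get()
            if event is None:  # sentinel: no more work for this worker
                return
            with lock:
                results.append(f"{name}:{event}")

    threads = [threading.Thread(target=consume) for _ in range(workers)]
    for t in threads:
        t.start()
    for event in events:
        inbox.put(event)  # producers never wait for a consumer to finish
    for _ in threads:
        inbox.put(None)  # one sentinel per worker
    for t in threads:
        t.join()
    return sorted(results)


# Each service scales on its own: monitoring gets more workers than logging.
log_results = run_service("logging", ["node-1", "node-2"], workers=1)
mon_results = run_service("monitoring", ["node-1", "node-2", "node-3"], workers=3)
```

In a real deployment the queue would be an external broker and the workers separate processes or hosts, but the shape of the design is the same.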
Manage Host/Devices vs. Managing Infrastructure Systems and Clusters
Likewise, traditional tooling was designed to manage hosts and devices, with applications mostly built as a layer on top of those hosts, whereas modern tooling has to manage more sophisticated systems such as software containers and provide application-level monitoring. The host-centric assumption starts to break when we need to manage infrastructure systems and clusters.
If you think about what is required to manage a Hadoop or MongoDB cluster, for example, you’ll find that installing and setting up those clusters is a much more sophisticated process. Let’s take MongoDB orchestration as an example.
For the deployment and installation phase, you’d need to start by creating the MongoDB cluster machines and then set up the network, which usually also requires:
- Opening the client, replication and control ports, and then
- Creating an isolated network for the MongoDB cluster data nodes
Next you’d need to create the relevant number of instances of the MongoDB master and slaves per host, populate the data into MongoDB, and finally publish the peer hosts (i.e. the MongoDB end points).
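The deployment steps above can be sketched as a small dependency-ordered workflow. This is an illustration only: the step names are hypothetical labels for the steps in the text, `run_step` is a placeholder for whatever actually performs each step, and the ordering uses `graphlib.TopologicalSorter` from the Python standard library (3.9+) rather than any particular orchestration tool.

```python
from graphlib import TopologicalSorter

# Hypothetical step names; each step maps to the set of steps it depends on.
DEPLOY_STEPS = {
    "create_machines": set(),
    "open_ports": {"create_machines"},
    "create_isolated_network": {"open_ports"},
    "start_mongodb_instances": {"create_isolated_network"},
    "populate_data": {"start_mongodb_instances"},
    "publish_endpoints": {"populate_data"},
}


def plan(steps):
    """Return an execution order that respects every declared dependency."""
    return list(TopologicalSorter(steps).static_order())


def deploy(steps, run_step):
    """Run each step in dependency order and return the order that was executed."""
    executed = []
    for step in plan(steps):
        run_step(step)  # placeholder for the real provisioning call
        executed.append(step)
    return executed


order = deploy(DEPLOY_STEPS, run_step=lambda step: None)
```

Expressing the workflow as a dependency map rather than a fixed script means new steps can be added without rewriting the sequence by hand.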
That’s just the beginning. Most orchestration these days doesn’t end with the deployment phase; the post-deployment phase is equally important. This, too, requires quite a bit of orchestration, including monitoring, workflows and policies, self-healing, auto-scaling, and maintenance (e.g. snapshots and upgrades), among other considerations.
Orchestrating such a cluster requires more than setting up the infrastructure (compute, storage, and networking). It also requires a process that can interact with the MongoDB or Hadoop cluster, relay the context of the environment to it, and continue the interaction by calling the MongoDB or Hadoop cluster manager. This delegation process is fairly complex, and most traditional tooling was not designed to handle it.
In the DevOps world, on the other hand, managing infrastructure systems and clusters such as database clusters (MongoDB, Hadoop) or an IP Multimedia Subsystem (IMS) is fairly common. Most of these systems and clusters come with their own orchestration and management. That makes the management challenge quite different, as we now need to allow more delegation of responsibility between the various layers of management, rather than assuming a single source of control for everything.
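One way to picture this delegation is the sketch below. Both classes and their methods are hypothetical and do not correspond to any real MongoDB or Hadoop API: the orchestrator owns only the infrastructure step, then hands the new host to the cluster's own manager, which decides internally what joining means.

```python
from dataclasses import dataclass, field


@dataclass
class ClusterManager:
    """Stand-in for a system's own management layer (hypothetical interface)."""
    name: str
    members: list = field(default_factory=list)

    def join(self, host):
        # The cluster, not the orchestrator, decides how to absorb the host.
        self.members.append(host)
        return f"{self.name} accepted {host}"


@dataclass
class Orchestrator:
    """Owns infrastructure only and delegates cluster-level actions."""
    managers: dict
    next_id: int = 1

    def provision_host(self):
        host = f"host-{self.next_id}"
        self.next_id += 1
        return host

    def scale_out(self, cluster):
        host = self.provision_host()  # infrastructure layer
        return self.managers[cluster].join(host)  # delegated to the cluster


mongo = ClusterManager("mongodb")
orchestrator = Orchestrator(managers={"mongodb": mongo})
message = orchestrator.scale_out("mongodb")
```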
Infrastructure vs. Application Centric
Once upon a time, in a pre-DevOps world, most management tools were designed to manage compute, storage and network services. Application management was a layer on top and, quite often, was built in as an afterthought, i.e. under the assumption that the application is not even aware that it is being managed. Therefore, the focus has been on adding management and monitoring capabilities through complex discovery, or even code introspection.
The brave new DevOps world tends to be more application-centric, and management tasks begin as an integrated part of the development process. In more advanced scenarios it is also common to use modeling languages such as TOSCA to orchestrate not just the configuration and installation of applications, but also "Day-2" operations (i.e. post-deployment). We do so simply because that’s the only way in which we can achieve real automation and handle complex tasks such as self-healing and auto-scaling.
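To make the idea of a declarative, application-centric model a little more concrete, here is a minimal sketch in Python. It is not TOSCA syntax; the model keys, the thresholds, and the one-instance step size are all assumptions chosen for illustration. The application declares its own scaling policy, and a small function turns an observed metric into a Day-2 decision.

```python
# Hypothetical application model; the keys and values are illustrative only.
APP_MODEL = {
    "name": "web",
    "min_instances": 2,
    "max_instances": 10,
    "scale_out_above_cpu": 0.80,
    "scale_in_below_cpu": 0.20,
}


def desired_instances(model, current, avg_cpu):
    """Return the instance count the policy asks for, clamped to the model's bounds."""
    if avg_cpu > model["scale_out_above_cpu"]:
        target = current + 1
    elif avg_cpu < model["scale_in_below_cpu"]:
        target = current - 1
    else:
        target = current
    return max(model["min_instances"], min(model["max_instances"], target))
```

For example, `desired_instances(APP_MODEL, 2, 0.9)` asks for one more instance, while a low CPU reading at the minimum instance count leaves the application unchanged.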
Limited Plug-Ins vs. Future proofing
In a world in which the “only constant is change”, we need to be able to continually introduce new frameworks and services: those that we know, as well as those that we don’t yet know exist but are probably under development as we speak. Traditional management solutions often come with a concept of plug-ins, but quite often these plug-ins are fairly limited and complex to implement, and therefore need specific support from the management solution owner.
To really be future proof, we need to be more open than that and allow integration throughout all of the layers of the stack, i.e. compute, network, infrastructure, monitoring, and logging. On top of this, all that integration needs to happen without modifying the management solution itself. In addition, we also need runtime integration, in which we can easily deploy applications that use a different set of cloud resources and infrastructure, or even different versions of the same underlying cloud.
All this needs to happen with complete isolation, and without the need to bring the management layer down every time we want to introduce a new plug-in. For example, a development team could have a local OpenStack cloud running in its own data center with an application that scales out using OpenStack resources, but scales out into the AWS cloud when the local OpenStack has insufficient capacity.
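The OpenStack-to-AWS example can be reduced to a simple placement rule, sketched below. The function name, its parameters, and the "fill local first" policy are assumptions for illustration; a real system would call each cloud's API through its own plug-in.

```python
def place_instances(requested, local_free, local="openstack", overflow="aws"):
    """Fill local capacity first, then send the remainder to the overflow cloud."""
    if requested < 0 or local_free < 0:
        raise ValueError("counts must be non-negative")
    on_local = min(requested, local_free)
    return {local: on_local, overflow: requested - on_local}
```

With five free local slots, `place_instances(8, 5)` keeps five instances on OpenStack and bursts the remaining three to AWS.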
In addition, most traditional management servers come with a predefined set of plug-ins that are pre-integrated with the manager. In practice, we often need to introduce new plug-ins, or to run different versions of the same plug-in side by side. Most traditional management tools do not support this kind of behavior and are fairly static in this regard.
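A minimal sketch of what running different versions of the same plug-in could look like is shown below. The `PluginRegistry` class and the handler strings are hypothetical; the idea is only that plug-ins are keyed by name and version and can be registered while the manager keeps running.

```python
class PluginRegistry:
    """Hypothetical registry that keeps every (name, version) pair side by side."""

    def __init__(self):
        self._plugins = {}

    def register(self, name, version, handler):
        # Registration happens at runtime; nothing is restarted.
        self._plugins[(name, version)] = handler

    def call(self, name, version, *args):
        try:
            handler = self._plugins[(name, version)]
        except KeyError:
            raise LookupError(f"no plug-in {name} {version}") from None
        return handler(*args)


registry = PluginRegistry()
registry.register("openstack", "1.0", lambda vm: f"v1 boot {vm}")
registry.register("openstack", "2.0", lambda vm: f"v2 boot {vm}")
```

Two applications can then pin different versions of the same plug-in without affecting each other.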
So, where are we heading?
Ok. By now I hope that I got you thinking that DevOps is far more than yet another feature to tack on to an existing toolset. Having said that, it is still unclear where we are heading with all this disruption. Is there an end in sight?
To answer this question, I will again turn to an analogy from the car industry.
The car industry is now moving away from a model in which the production line was centered on the manufacturing process and the car, once released, was managed mostly through manual processes. We are now moving to an autonomous process in which the car is continually monitored and equipped with a set of sensors that report continuously on its state and its surrounding environment. With this, we are now able to continuously manage and control the car after it has been released from the manufacturing facility.
Similarly, we're moving from a data center in which most of the processes were centered on getting things installed and deployed to a data center that will ultimately be completely self-managed.
In this data center, most of the application management tasks that are performed manually today can become completely automated. This includes capacity management (through auto-scaling), continuous deployment, self-healing, etc. This isn't science fiction. There is already a growing list of organizations, starting with Google and Netflix, that run their data centers in this exact way.
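As a rough illustration of what self-managed can mean in practice, the sketch below compares a desired state with an observed state and returns the healing actions needed to close the gap. The function, the service names, and the action tuples are hypothetical and not taken from any specific tool.

```python
def reconcile(desired, observed):
    """Compare desired and observed instance counts and return corrective actions."""
    actions = []
    for service, want in sorted(desired.items()):
        healthy = observed.get(service, 0)
        if healthy < want:
            actions.append(("start", service, want - healthy))
        elif healthy > want:
            actions.append(("stop", service, healthy - want))
    return actions
```

Run on a schedule, or on every monitoring event, a loop like this replaces the manual step of noticing a failed instance and restarting it.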