Over the past few weeks I gave a few talks about our latest XAP release and some of the new models we introduced in it. During all those talks I found myself explaining the difference between what I refer to as a Cloud/SaaS-enabled application platform, which is basically what XAP is, and traditional application platforms, and how XAP makes it easy to take an existing static application and turn it into an elastic one.
In this post I’ll try to provide a summary of the motivations behind this concept and the different pieces that comprise it in the XAP Cloud/SaaS-enabled platform.
The motivation
The introduction of virtualization and cloud computing (Public or Private) enables us to create a new machine with a single API call. This is an extremely powerful capability that can change the way we design and deploy applications. Instead of running an application on a static set of machines, you can shrink and grow the application as needed.
The reality
The fact that we can now add machines with an API call is indeed an important milestone; however, most existing applications in the data center are not designed to take advantage of this new capability. The reason is that their pieces are tied together in so many areas that any attempt to stretch them even a little will break the entire application.
Making our application elastic
To make your application completely elastic, you need to get to the point where all the different pieces of the application can shrink and grow while the application is running, while ensuring that no data is lost and the application continues to function as if nothing happened.
In the diagram below I sketch the various pieces that enable us to provide a high level of end-to-end application elasticity with the latest XAP 7.0 release.
Common SLA driven clustering
In many of the existing tier-based products, clustering is handled separately for the web, messaging, application-server, and database tiers. Each tier maintains its own set of assumptions regarding configuration and implementation. That in itself is a huge barrier to achieving end-to-end elasticity, because it turns each tier into a silo, which makes it extremely hard to move all those pieces together when something happens. In addition, the fact that there are so many tiers in our system makes the application extremely inefficient.
With that in mind, the first thing you need to introduce is a common clustering layer that can be shared across all the components in the system. By doing that, you reduce the number of moving parts, achieve consistent behavior across the entire application, and more.
There are basically two parts that comprise an application cluster:
- SLA cluster management – this is the piece responsible for match-making between application components and the available resources. It monitors application behavior, making sure that it meets your performance, latency, and scaling goals.
- Cluster state management – this is the piece that makes sure that when we move pieces of our application around in order to meet a certain SLA, nothing gets broken and no data is lost.
SLA cluster management
At GigaSpaces we refer to this layer as “SLA-driven containers”. An SLA-driven container is basically a lightweight process that can wrap any application and report the status of the contained application. The Container Manager is responsible for maintaining the SLA of a given application when a failover or scaling event occurs. It does so by matchmaking between the required SLA and the available SLA-driven container resources. The SLA-driven container is written in a generic way, so it can wrap and manage other containers as well. Good examples in the Java world are Jetty, GlassFish, Spring, and Mule. Interestingly enough, this set of containers can also host non-Java applications; currently that includes .NET and C++ applications. Once you get to this level of abstraction, you can easily support almost any application in the new, dynamic environment.
With the SLA-driven container concept, you can start any application in various deployment topologies: partitioned, partitioned with backup, replicated, etc. For stateless applications, you’d probably use only partitioning. For the stateful parts of an application you’d use partitioning with backup, or a replicated topology. You can define the maximum and minimum number of instances that should be running at any time, assign the scaling policy that will trigger scaling up or down, and define a high-availability policy to ensure that the minimum number of instances is always running as long as enough machines are available. With 7.0, we added support for “zones” – with a zone you can group a cluster of containers into sub-groups and apply policies at the group level. A good example is using zones to separate the primary and disaster-recovery sites, ensuring that the nodes in our cluster run in separate zones.
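To make this concrete, below is a minimal sketch of such a deployment using the XAP Admin API. The lookup group, space name, zone name, and instance counts are illustrative, and exact method names may vary slightly between versions:

```java
import org.openspaces.admin.Admin;
import org.openspaces.admin.AdminFactory;
import org.openspaces.admin.gsm.GridServiceManager;
import org.openspaces.admin.space.SpaceDeployment;

public class SlaDrivenDeployment {
    public static void main(String[] args) {
        // Discover the running grid ("my-group" is an illustrative lookup group)
        Admin admin = new AdminFactory().addGroup("my-group").createAdmin();
        GridServiceManager gsm = admin.getGridServiceManagers().waitForAtLeastOne();

        // Partitioned-with-backup topology: 4 primaries, 1 backup each,
        // a primary and its backup never share a machine, all pinned to a zone
        gsm.deploy(new SpaceDeployment("myDataGrid")
                .partitioned(4, 1)
                .maxInstancesPerMachine(1)
                .addZone("primary-site"));

        admin.close();
    }
}
```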
Cluster state management
Cluster state management holds the state of the various pieces of our application in a reliable fashion. In most traditional systems, this piece is covered by a centralized database of some sort, or by a file-system-based solution. In a distributed and dynamic environment, this type of solution can be fairly limited, because having a centralized database to manage all your state information creates a huge performance burden and places strict limits on scalability. These limitations led to the implementation of proprietary solutions per product or per layer, which made the problem even worse, because now the state of the application is spread between different state management systems. Ensuring the consistency and reliability of the system in such an environment becomes extremely hard. Trying to dynamically scale and move pieces of the application is nearly impossible under these conditions.
This is where an In-Memory Data Grid (IMDG) comes in handy. You can store the state of all the various pieces of the application on the same IMDG cluster and, by doing that, remove the scaling and performance limitations that exist with the database or file-based approach. In addition, the In-Memory Data Grid was born to run in such a dynamically-scalable environment, so it doesn’t inherit the limitations of a centralized implementation. Finally, the fact that you can manage state in one shared location significantly reduces the number of moving parts in the application, which in turn enables running the application on much less hardware.
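As a rough sketch of what this looks like from the application’s point of view, any component can read and write its state through the same space proxy. The space name and the SessionState class below are hypothetical, introduced only for illustration:

```java
import com.gigaspaces.annotation.pojo.SpaceClass;
import com.gigaspaces.annotation.pojo.SpaceId;
import org.openspaces.core.GigaSpace;
import org.openspaces.core.GigaSpaceConfigurer;
import org.openspaces.core.space.UrlSpaceConfigurer;

public class SharedStateExample {

    // Hypothetical state object; any component of the application can share
    // this kind of object through the grid
    @SpaceClass
    public static class SessionState {
        private String sessionId;
        private String payload;

        public SessionState() {}
        public SessionState(String sessionId, String payload) {
            this.sessionId = sessionId;
            this.payload = payload;
        }

        @SpaceId
        public String getSessionId() { return sessionId; }
        public void setSessionId(String sessionId) { this.sessionId = sessionId; }
        public String getPayload() { return payload; }
        public void setPayload(String payload) { this.payload = payload; }
    }

    public static void main(String[] args) {
        // Connect to the shared, partitioned IMDG ("myDataGrid" is illustrative)
        GigaSpace gigaSpace = new GigaSpaceConfigurer(
                new UrlSpaceConfigurer("jini://*/*/myDataGrid").space()).gigaSpace();

        // Web, messaging, and business-logic components all write their state
        // to the same grid instead of to private stores of their own
        gigaSpace.write(new SessionState("user-42", "cart=3 items"));

        // ...and read it back by template matching
        SessionState template = new SessionState();
        template.setSessionId("user-42");
        SessionState current = gigaSpace.read(template);
        System.out.println(current.getPayload());
    }
}
```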
Managing the cluster through API
Clustering in most of today’s platforms is treated as a “black box” or even “black magic”. It is kept pretty much closed, and often very little is known about the implementation behind it. The only way you can control the clustering behavior is through a configuration file, which tends to be fixed at deployment time and cannot be changed at runtime. In a virtualized environment, these types of assumptions start to break, because it is very hard to anticipate the “right” behavior beforehand, and because the impact of a failure or a scaling event in one component of our system tends to affect other parts of the system in totally unpredictable ways.
XAP 7.0 comes with a brand new set of cluster management APIs aimed at addressing this specific need. This is a revolutionary model in the application platform space, and so far I believe that XAP is the only product that provides such an API. Personally, I feel we’re only starting to scratch the surface of what can be done with this new framework. The most obvious part is the ability to monitor the system at a fine-grained level; the other part is the ability to write custom SLA and deployment automation tools. By automation I mean automating the deployment of an application at deployment time, but also ensuring that the application keeps using its resources in a balanced way at runtime.
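Here is a hedged sketch of what that can look like with the Admin API – first fine-grained monitoring, then a trivial piece of deployment automation. The processing unit name "myApp" is illustrative, and incrementInstance() applies to non-partitioned deployments:

```java
import org.openspaces.admin.Admin;
import org.openspaces.admin.AdminFactory;
import org.openspaces.admin.pu.ProcessingUnit;
import org.openspaces.admin.pu.ProcessingUnitInstance;

public class ClusterMonitor {
    public static void main(String[] args) {
        Admin admin = new AdminFactory().addGroup("my-group").createAdmin();
        ProcessingUnit pu = admin.getProcessingUnits().waitFor("myApp");

        // Fine-grained monitoring: walk every running instance and report
        // which machine currently hosts it
        for (ProcessingUnitInstance instance : pu.getInstances()) {
            System.out.println("Instance " + instance.getInstanceId()
                    + " runs on " + instance.getMachine().getHostAddress());
        }

        // Trivial automation: scale out by one more instance
        pu.incrementInstance();

        admin.close();
    }
}
```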
Putting the two together
There are various systems where state management and application SLA management are implemented by two different technologies and products. That approach is doable and can be a good starting point for moving into a dynamically scalable environment. However, in reality, you will soon find that there is a very strong dependency between the two. The decision on which pieces to move, and where to move them, will in many cases depend on where the application state lives. For example, you would want to ensure that a backup instance won’t run on the same host as its primary, or in some cases, in the same data center as its primary. You’d want to make sure that the application is spread evenly across the different resources. Deciding how to spread the application depends on how granularly you can spread the data, and so on.
One of the things we did in our product to overcome this limitation is to add data awareness to our SLA-driven clustering and SLA awareness to the way we manage our data. By this I mean that we provide a built-in SLA for managing state and business-logic dependencies, as well as transaction support. On the data level we added specific semantics and callbacks to manage the lifecycle of our data instances at runtime. We also use consistent ID assignment for each data instance, which ensures that each instance contains the appropriate set of data even if it changes its physical location.
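For illustration, here is a hypothetical data class showing how the ID and routing assignments are expressed; all orders of a given customer land in the same partition, wherever that partition happens to run:

```java
import com.gigaspaces.annotation.pojo.SpaceClass;
import com.gigaspaces.annotation.pojo.SpaceId;
import com.gigaspaces.annotation.pojo.SpaceRouting;

// Hypothetical order entry; the routing field keeps all data for a given
// customer in the same partition, even if that partition is relocated
@SpaceClass
public class Order {
    private String orderId;
    private String customerId;

    public Order() {}

    @SpaceId(autoGenerate = false)
    public String getOrderId() { return orderId; }
    public void setOrderId(String orderId) { this.orderId = orderId; }

    @SpaceRouting
    public String getCustomerId() { return customerId; }
    public void setCustomerId(String customerId) { this.customerId = customerId; }
}
```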
By putting these two tightly together, we can safely manage the dynamic scaling of a stateful transactional application, and move pieces around when needed, while the application is running!
Virtualizing the entire application
Once you get to the point where you’ve managed to create a shared SLA-driven clustering infrastructure, you can start integrating that model into the specific application layers. Below I propose a gradual integration that can make it simpler to move existing applications into a dynamically-scalable environment, with zero or minimal changes to the code.
Web layer virtualization
The web tier consists of a load balancer and a web container. The load balancer is the gateway to the outside world and acts as a smart router of requests that arrive from the user’s browser to our web container. In most existing systems today, the web container pool is configured as a set of static IP addresses that form a fixed pool of resources for the application. In this way, you can load-balance the web traffic across this fixed set of resources. In stateful web applications, you would also use a load-balancing policy called “sticky sessions” to ensure that the state of a particular user session is kept consistent across that user’s requests.
Sticky sessions place limits on your application’s scalability, because user requests are routed to a fixed server regardless of the load on that server. In addition, if you add new servers to meet the load, only new user sessions can be load-balanced to those new servers.
To make the web layer completely virtualized, you first need to remove the dependency on a fixed set of machines. Instead, you use a dynamic discovery protocol to detect the availability of new web containers as they come and to remove them as they go away. To reduce the dependency on sticky sessions, you can keep the user session in an external In-Memory Data Grid. In this way, the user session is shared with all the web containers. Since the IMDG is external to every request, this involves a network call to load the session into local memory. To reduce this overhead, you can use a “local cache” configuration; the local cache holds frequently-used data items and therefore allows you to load the user session only once and avoid the extra network call overhead.
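As a small illustration, a client-side proxy to the session space can be created with a local cache in front of it, so a given session crosses the network only on first access. The space name is illustrative, and the exact local-cache configuration options vary by version:

```java
import org.openspaces.core.GigaSpace;
import org.openspaces.core.GigaSpaceConfigurer;
import org.openspaces.core.space.UrlSpaceConfigurer;

public class SessionSpaceClient {
    public static void main(String[] args) {
        // Proxy to the session space with a client-side local cache in front of it;
        // frequently-used sessions are served from local memory after the first read
        GigaSpace sessionSpace = new GigaSpaceConfigurer(
                new UrlSpaceConfigurer("jini://*/*/sessionSpace?useLocalCache")
                        .space()).gigaSpace();

        // The web container stores and retrieves session state through this proxy;
        // the session object itself would be a simple annotated POJO, as shown earlier
    }
}
```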
Below is a more detailed description of the components shown in the diagram above.
Dynamic load-balancer
The dynamic load balancer is responsible for listening for the availability of new web containers and updating the load-balancer configuration when a container joins or leaves the network.
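A rough sketch of that idea using the Admin API’s event listeners: when a web container instance joins or leaves, regenerate the load-balancer member list. The println calls stand in for whatever actually updates your load-balancer configuration:

```java
import org.openspaces.admin.Admin;
import org.openspaces.admin.AdminFactory;
import org.openspaces.admin.pu.ProcessingUnitInstance;
import org.openspaces.admin.pu.events.ProcessingUnitInstanceAddedEventListener;
import org.openspaces.admin.pu.events.ProcessingUnitInstanceRemovedEventListener;

public class LoadBalancerUpdater {
    public static void main(String[] args) throws InterruptedException {
        Admin admin = new AdminFactory().addGroup("my-group").createAdmin();

        // Fired whenever a new processing unit instance (e.g. a web container) joins
        admin.getProcessingUnits().getProcessingUnitInstanceAdded().add(
                new ProcessingUnitInstanceAddedEventListener() {
                    public void processingUnitInstanceAdded(ProcessingUnitInstance instance) {
                        System.out.println("Add to pool: "
                                + instance.getMachine().getHostAddress());
                    }
                });

        // Fired whenever an instance leaves (failure or scale-down)
        admin.getProcessingUnits().getProcessingUnitInstanceRemoved().add(
                new ProcessingUnitInstanceRemovedEventListener() {
                    public void processingUnitInstanceRemoved(ProcessingUnitInstance instance) {
                        System.out.println("Remove from pool: "
                                + instance.getMachine().getHostAddress());
                    }
                });

        Thread.sleep(Long.MAX_VALUE); // keep listening
    }
}
```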
Managed Jetty web container and session on top of the space
The managed web container is a wrapper on top of commonly available web containers (currently we support Jetty as the main container). It enables taking an existing web deployment package (a WAR file) and deploying it as-is on our web cluster. In this way you can easily turn any web container into a completely virtualized cluster without changing the web application.
You can also choose which web container to use at deployment time.
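For example, deploying an existing WAR onto the grid through the Admin API might look roughly like this; the file name and instance counts are illustrative:

```java
import java.io.File;

import org.openspaces.admin.Admin;
import org.openspaces.admin.AdminFactory;
import org.openspaces.admin.gsm.GridServiceManager;
import org.openspaces.admin.pu.ProcessingUnitDeployment;

public class DeployWar {
    public static void main(String[] args) {
        Admin admin = new AdminFactory().addGroup("my-group").createAdmin();
        GridServiceManager gsm = admin.getGridServiceManagers().waitForAtLeastOne();

        // Deploy the unchanged WAR; the SLA-driven containers host the Jetty instances
        gsm.deploy(new ProcessingUnitDeployment(new File("mywebapp.war"))
                .numberOfInstances(2)
                .maxInstancesPerMachine(1));

        admin.close();
    }
}
```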
Business logic and data virtualization
In most existing application platforms, the application business logic and the data are handled by different packages. In a JEE environment, the business logic is covered by a separate set of APIs and implementations such as session beans and message-driven beans. Spring provides a more lightweight version, referred to as Remoting, which is now part of EJB 3.x. You can also use other RPC layers such as SOAP, or use JMS explicitly if you want to implement the business logic in an event-driven manner.
The data layer is often treated as a separate layer. In the JEE world, this layer is mapped through Hibernate or JPA. To manage the scalability of the data layer, distributed caching is often used to reduce database contention. Caching can be plugged into Hibernate (this is referred to as a second-level cache) or provided as an explicit layer that the application interacts with directly.
In a distributed environment, there is quite often a dependency between the business logic and the data layer; this is referred to as data affinity. The number of moving parts, the different packages, and the different clustering implementations per package that exist in most of today’s platforms make it quite hard to virtualize this part of the application, let alone ensure data affinity.
As with the web layer, the key to virtualizing this layer is mapping both the data and all the business logic components onto the same underlying clustering architecture. In other words, all the different APIs become just facades that expose specific application semantics on top of the same clustering implementation. Since the business logic and the data layer share the same high-availability and data-routing logic, data affinity is built into the model. In other words, there is no need for an external mapping layer to route a specific business request to where the data is.
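To give a feel for what such a facade can look like, here is a hedged sketch using space-based remoting; the @Routing parameter routes each call to the partition holding that customer’s data, so the request executes next to its state. The OrderService interface and its arguments are hypothetical, and a matching service implementation would be deployed into the grid:

```java
import org.openspaces.core.GigaSpace;
import org.openspaces.core.GigaSpaceConfigurer;
import org.openspaces.core.space.UrlSpaceConfigurer;
import org.openspaces.remoting.ExecutorRemotingProxyConfigurer;
import org.openspaces.remoting.Routing;

public class RemotingClient {

    // A plain service interface; the @Routing parameter sends each call to the
    // partition that owns that customer's data, so logic executes next to its state
    public interface OrderService {
        void placeOrder(@Routing String customerId, String orderId);
    }

    public static void main(String[] args) {
        GigaSpace gigaSpace = new GigaSpaceConfigurer(
                new UrlSpaceConfigurer("jini://*/*/myDataGrid").space()).gigaSpace();

        // Build a dynamic proxy that ships the invocation into the grid, where a
        // deployed implementation of OrderService (not shown) handles it
        OrderService orders = new ExecutorRemotingProxyConfigurer<OrderService>(
                gigaSpace, OrderService.class).proxy();
        orders.placeOrder("customer-42", "order-1001");
    }
}
```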
Async persistency
The database is often a centralized piece in any architecture. It is also a contention point that can lead to devastating results in terms of scalability. At the same time, removing the database completely would require a huge change in our application.
One of the common practices for enabling database scalability is front-ending the database with an In-Memory Data Grid, which is responsible for loading the data from the database into memory and keeping the database in sync by sending batches of updates back to the database asynchronously.
In this way, the user doesn’t need to block until the request is synced with the database, and therefore doesn’t suffer the performance penalty imposed by the database; this eliminates the database contention and the scalability limitations it creates.
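In XAP this write-behind behavior is handled by a mirror service that is configured declaratively rather than coded by hand. The snippet below is only a generic illustration of the write-behind idea itself, not the XAP implementation:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Generic illustration of write-behind: updates are acknowledged once they reach
// the in-memory grid, and a background worker drains them to the database in batches.
public class WriteBehindSketch {
    private final BlockingQueue<String> pendingUpdates = new LinkedBlockingQueue<String>();

    // Called on the fast path: the caller never waits for the database
    public void onGridUpdate(String update) {
        pendingUpdates.offer(update);
    }

    // Background thread: flush accumulated updates in batches
    public void startFlusher() {
        Thread flusher = new Thread(new Runnable() {
            public void run() {
                while (true) {
                    List<String> batch = new ArrayList<String>();
                    try {
                        batch.add(pendingUpdates.take());  // block for the first update
                        pendingUpdates.drainTo(batch, 99); // grab up to 99 more
                        writeBatchToDatabase(batch);       // placeholder for a JDBC batch update
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                        return;
                    }
                }
            }
        });
        flusher.setDaemon(true);
        flusher.start();
    }

    private void writeBatchToDatabase(List<String> batch) {
        // placeholder: issue the batch as a single database round-trip
    }
}
```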
Taking a step-by-step approach
For a “green field” application, taking the approach I outlined above makes the most sense. However, in reality you need to assume that:
1. You can’t afford a complete re-write
2. You have dependencies on existing pieces of the application that cannot be moved into this model easily.
The diagram below shows how to take a step-by-step approach. The idea is to start with things that don’t require changes to the code but still provide substantial value. The best starting point is scaling the web tier, and then adding session high availability and scalability, all while keeping the database and other pieces of the application as-is. You can then scale the application business logic with only small configuration changes – this involves replacing the existing SessionBean and Remoting services with a virtualized implementation of the same services.
Note that you can keep database scaling as the last piece in the architecture, because scaling the data layer requires changes to the way our data and application are architected.
Grow as you need vs. rewrite as you need
There are basically two common approaches toward complete application scaling that I've seen so far.
- Rewrite as you need – don’t worry about scaling at the beginning, and pay the cost only when you need to. The built-in assumption here is that many applications will never get to the point where they need to worry about scaling, and it’s better to avoid optimizing things that you might never need. Having said that, the hero stories around Twitter, Facebook, eBay, and many others who became successful show that they were all forced to go through a very painful rewrite when they reached that point. The cost of that rewrite can be much higher than the actual development cost, because this is a sensitive time when you might lose your market to your competition. One of the stories I heard about Facebook is that it owes its success to the period of consistent failures experienced by MySpace.
- Grow as you need – indeed, in the first stages of a project you don’t necessarily want to spend time optimizing things that you may never need. But that argument assumes the optimization requires lots of effort. As noted above, the step-by-step approach gives you the option to balance effort vs. value at each step; the effort would be the same as with any alternative approach. The main benefit, though, is that when you need to grow, you won’t need to go through a complete rewrite – the transition can be fairly smooth. In addition, you don’t need to switch and shop for different solutions; you already have everything built into the system to support your growth.
As I wrote in a previous post, Platform as a Service: The Next Generation Application Server? – the built-in assumption that scaling is complex is mostly derived from the fact that most current platforms were not designed for scaling. By contrast, the new generation of platforms is designed for scalability from the ground up. When you write your application on Google App Engine or Salesforce, you are not really exposed to all the details of the infrastructure; most of those details are completely abstracted from your code. True, you need to know what’s going on behind the scenes if you want to reach the best performance and avoid potential consistency issues, but that is already a completely different level of effort than what we are used to with existing systems.
An analogy I like to use to describe this change is multi-threaded programming. Before Java, developing multi-threaded applications was fairly rare; most applications were written in a single-threaded manner. When Java built threading into the language, writing multi-threaded applications became the norm rather than the exception. The same thing is happening as we move to cloud and virtualized environments: writing a dynamically scalable application becomes the native way to write an application rather than the exception.