In the first part of this post, I discussed how virtualization and cloud computing, as we know them today, are only a small part of the solution for today’s IT inefficiencies. While new technologies and delivery models have made it much simpler to manage the infrastructure, this is not where our core inefficiencies lie. Virtualization principles must be extended to higher levels of the application stack, to make it easier for all of us to manage, tune and integrate applications. Otherwise we will continue to spend most of our time on things that don’t provide real value to the business – infrastructure, installation, management, tuning, and integration.
In this post, as promised, I’ll show how this missing piece can be filled in practice, by something called Elastic Middleware Services. These are middleware services that can be deployed with one API call, and are completely abstracted from the applications that use them. Elastic Middleware Services are very similar to cloud-based middleware like Amazon SimpleDB, but the major difference is that they are enterprise-grade, and can be purchased and installed in your private data center, for use by your internal enterprise applications.
To better illustrate this idea, I’ll discuss a reference implementation of Elastic Middleware Services which will shortly be released as part of GigaSpaces XAP 7.1. The discussion is intended to illustrate in concrete terms how to fill the missing piece in the virtualization stack. I’ve intentionally focused on the conceptual and fundamental elements of the GigaSpaces implementation, not on specific features and benefits, to make the discussion useful even if you are not using or considering GigaSpaces products.
By the way, I recently hosted a webinar on this topic, together with Uri Cohen, GigaSpaces’ product manager. In the webinar we presented GigaSpaces’ upcoming Elastic Middleware Services. The recorded version of this presentation is available below:
Elastic Middleware Services
The idea is to provide middleware services (messaging, data, Map/Reduce, and more), like the ones offered by cloud providers such as Amazon and Microsoft Azure, in your own local environment. These services simply extend the basic principles of virtualization, as I outlined in the previous post, to the middleware stack:
Basic virtualization principles
- Break big physical resources into smaller logical units
- Decouple the application from the physical resources
- Provide an abstraction that makes all the small units look like one big unit
From a middleware perspective, this means that instead of having one big database, you have lots of small units of data services (also called partitions or shards), which are grouped together through a client side proxy that exposes all of these small units as one big unit to the client that is using them. The client proxy is also responsible for abstracting the client from the physical location of those resources. To scale, all you need to do is add more of those units. The client experiences these additional units as extra capacity added to the service.
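To make the client-side proxy idea concrete, here is a minimal sketch in Java. It is not the GigaSpaces implementation: the class name `PartitionedStoreProxy`, the in-memory maps standing in for remote partitions, and the hash-based routing are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: a client-side proxy that makes many small
// partitions look like one big store to the caller.
class PartitionedStoreProxy {
    // Each partition is modeled as an in-memory map (assumption);
    // a real partition would be a separate process on some machine.
    private final List<Map<String, String>> partitions = new ArrayList<>();

    PartitionedStoreProxy(int partitionCount) {
        if (partitionCount <= 0) {
            throw new IllegalArgumentException("need at least one partition");
        }
        for (int i = 0; i < partitionCount; i++) {
            partitions.add(new HashMap<>());
        }
    }

    // Route a key to a partition by hashing; floorMod keeps the index
    // non-negative even when hashCode() is negative.
    int partitionFor(String key) {
        return Math.floorMod(key.hashCode(), partitions.size());
    }

    void put(String key, String value) {
        partitions.get(partitionFor(key)).put(key, value);
    }

    String get(String key) {
        return partitions.get(partitionFor(key)).get(key);
    }

    // The client sees one logical size although data is spread out.
    int size() {
        int total = 0;
        for (Map<String, String> partition : partitions) {
            total += partition.size();
        }
        return total;
    }
}
```

A caller would only ever write `proxy.put("order-1", "a")` and `proxy.get("order-1")`, never naming a partition or a machine. Note that with plain modulo routing, changing the number of partitions moves keys between partitions, which is one reason a rebalancing step is needed when capacity is added.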
Deploying middleware with one API call
From an API perspective, this is a much simpler approach than traditional middleware, because it introduces a higher-level abstraction to the user, as the deployment snippet later in this post shows.
With this new elastic middleware API, users define requirements in a language that is closer to their domain. For example, “I want to create a data service that will have 10GB of data and can grow up to 100GB of data, I need hot failover and I'm willing to share my deployment with other users of my own organization but not with users of other organizations.”
In response, the Elastic Middleware Services would automatically do the following:
- Allocate the right number of partitions, based on machine availability and the available memory in each machine, to address the capacity required.
- Automatically launch a hot backup for each partition and ensure that it runs on a separate machine from the primary partition, to address the high-availability requirement.
- Make sure that if another organization’s application is already running on the machine, the application won’t deploy on that machine, to address the security requirement.
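The first two steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the actual allocation algorithm: `PartitionPlanner`, `Placement`, the fixed per-partition size, and the round-robin placement are all hypothetical, and the sketch ignores per-machine free memory and the tenant check for brevity. Machine names are assumed to be distinct.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of capacity planning and primary/backup placement.
class PartitionPlanner {

    // One primary/backup pair and the machines chosen to host them.
    static final class Placement {
        final int partitionId;
        final String primaryMachine;
        final String backupMachine;

        Placement(int partitionId, String primaryMachine, String backupMachine) {
            this.partitionId = partitionId;
            this.primaryMachine = primaryMachine;
            this.backupMachine = backupMachine;
        }
    }

    // Partitions needed to hold capacityMb when each partition holds
    // partitionMb, rounded up so the requested capacity is always covered.
    static int partitionsNeeded(long capacityMb, long partitionMb) {
        if (capacityMb <= 0 || partitionMb <= 0) {
            throw new IllegalArgumentException("sizes must be positive");
        }
        return (int) ((capacityMb + partitionMb - 1) / partitionMb);
    }

    // Round-robin placement: the primary goes on machine i and the backup
    // on the next machine, so a pair never shares a host.
    static List<Placement> place(int partitions, List<String> machines) {
        if (machines.size() < 2) {
            throw new IllegalArgumentException("hot backup needs at least two machines");
        }
        List<Placement> plan = new ArrayList<>();
        for (int p = 0; p < partitions; p++) {
            String primary = machines.get(p % machines.size());
            String backup = machines.get((p + 1) % machines.size());
            plan.add(new Placement(p, primary, backup));
        }
        return plan;
    }
}
```

For the 10GB example above, assuming 512MB per partition, `PartitionPlanner.partitionsNeeded(10240, 512)` gives 20 partitions, each of which then gets a backup on a different machine.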
Fully automated through built-in SLA
As I noted in my previous post, one of the major costs of enterprise applications over a period of three years is the cost of maintenance and operations. The main component of maintenance costs is the labor cost, closely tied to the amount of manual work you have to put into your system to make sure that it meets the application’s SLA. The cost grows proportionally to the demand for scale.
With Elastic Middleware, maintenance cost can be substantially reduced through built-in SLA and automation that covers the following aspects:
- Scaling SLA – scale when there is a memory shortage, high CPU utilization, Garbage Collector hiccups, etc. Users can choose to scale automatically when any of these events occurs, and customize their own thresholds (e.g. scale when CPU utilization hits 90%). The Elastic Middleware is integrated with a cloud or virtualization framework, to enable it to automatically pull or release machines as needed.
- Failover – when a machine fails, the Elastic Middleware does one of two things:
  - Scale down – if there are available resources to meet the current workload, the system will automatically scale down and continue to service the application.
  - Rebalance – if there are not enough resources to serve the application, the Elastic Middleware calls the cloud/virtualization pool and starts a new machine. As soon as the new machine starts, it is added to the existing pool of resources, and the application is re-balanced to take advantage of the additional capacity available through the new machine. It is important to point out that if you don’t have a dynamic pool of resources available through virtualization or cloud computing, you could still start a machine manually. Once the machine is started, adding it to the application’s resources and rebalancing would happen in exactly the same way.
- Continuous rebalancing – when new machines are added to the Elastic Middleware, it immediately detects them and re-balances the currently running assets onto the new machines (if necessary), ensuring optimum utilization of all services.
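The decision rules in this list can be summarized in a few lines of Java. This is only a sketch of the logic as described above; the class `SlaPolicy`, its `Action` values, and the choice of CPU and memory percentages as inputs are assumptions, not part of the GigaSpaces API.

```java
// Hypothetical sketch of the scaling and failover decisions.
class SlaPolicy {
    enum Action { NONE, SCALE_OUT, SCALE_DOWN, REBALANCE_ON_NEW_MACHINE }

    private final double cpuThresholdPercent;
    private final double memoryThresholdPercent;

    SlaPolicy(double cpuThresholdPercent, double memoryThresholdPercent) {
        this.cpuThresholdPercent = cpuThresholdPercent;
        this.memoryThresholdPercent = memoryThresholdPercent;
    }

    // Scaling SLA: scale out as soon as either metric reaches its threshold.
    Action onMetrics(double cpuPercent, double memoryPercent) {
        boolean breached = cpuPercent >= cpuThresholdPercent
                || memoryPercent >= memoryThresholdPercent;
        return breached ? Action.SCALE_OUT : Action.NONE;
    }

    // Failover: if the surviving machines can still carry the workload,
    // continue on fewer machines; otherwise ask the resource pool for a
    // new machine and rebalance onto it.
    Action onMachineFailure(long remainingCapacityMb, long workloadMb) {
        return remainingCapacityMb >= workloadMb
                ? Action.SCALE_DOWN
                : Action.REBALANCE_ON_NEW_MACHINE;
    }
}
```

With `new SlaPolicy(90, 70)`, a reading of 95% CPU triggers `SCALE_OUT`, while losing a machine that leaves 4096MB of capacity for a 6144MB workload results in `REBALANCE_ON_NEW_MACHINE`.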
This dynamic SLA is designed to be carried out while the application is running, ensuring no transaction or data loss during the process. In-flight transactions continue to be served with no noticeable hiccups.
Concerned about losing control?
Whenever I introduce the concept of built-in SLA and automation, I get two types of responses. One is “Wow, that’s cool!”, the other is “It sounds like I’ll lose control.” The concern about losing control is valid: in many cases, especially in non-cloud environments, spare capacity rarely exists, so adding a new machine often involves some manual intervention. In addition, when something goes wrong, other parts of the system or even other parts of the organization need to be involved, so we can’t always assume that full automation is possible.
The cruise control analogy
The approach GigaSpaces has taken is similar to the way cruise control works in our cars. You can choose to give up full control under a certain threshold, but at any point you can take the wheel back and resume full control. Like many of the cloud providers, we provide this type of cruise control for our services through an open API that enables management and control of every aspect of our cluster. You can query the available nodes, CPU, application partitions and service components. GigaSpaces Elastic Middleware was built as a layer on top of that API, which means that if the built-in functionality that comes with the Elastic Middleware doesn’t fit your needs, you can go one layer down and write your own custom behavior. For example, you can specify that in your system, scaling or failover events send an alert that will trigger a manual process for adding a new machine. In other words, you have full flexibility to choose when to turn on “cruise control” and when to take ownership.
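To illustrate the difference between leaving cruise control on and taking the wheel, here is a small Java sketch. The interface `ScaleEventHandler` and both implementations are hypothetical names invented for this example; they show the shape of a pluggable policy, not the actual GigaSpaces Admin API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical callback invoked when the middleware decides that a
// service needs more capacity.
interface ScaleEventHandler {
    // Returns true if a machine was added without human involvement.
    boolean onCapacityNeeded(String serviceName);
}

// "Cruise control on": start a machine automatically. The counter
// stands in for a call to a cloud or virtualization pool (assumption).
class AutomaticHandler implements ScaleEventHandler {
    int machinesStarted = 0;

    @Override
    public boolean onCapacityNeeded(String serviceName) {
        machinesStarted++;
        return true;
    }
}

// "Taking the wheel": only record an alert and leave the decision to
// an operator who adds the machine manually.
class AlertOnlyHandler implements ScaleEventHandler {
    final List<String> alerts = new ArrayList<>();

    @Override
    public boolean onCapacityNeeded(String serviceName) {
        alerts.add("Capacity needed for " + serviceName + "; awaiting manual approval");
        return false;
    }
}
```

Swapping one handler for the other changes who makes the decision without touching the application that uses the service.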
SaaS-enabled with built-in multi-tenancy
Multi-tenancy often means that you can run multiple users’ or customers’ applications on a shared resource, thereby reducing the cost of ownership per user. This is considered an area of high complexity, specifically in SaaS applications, because with today’s middleware, the application itself is responsible for mapping between users and application tenants. With Elastic Middleware Services, multi-tenancy is built into our implementation and API. Thus the burden of sharing resources is moved from the application down to the middleware. Each application works with its own dedicated middleware service (data, messaging, etc.) but that middleware service can share resources with other middleware services. In other words, instead of having one big database running on a dedicated machine and splitting it between users at the application level, you can have lots of small databases spread across machines, where each application has its own dedicated database that still shares a machine with the other database instances running on it.
This not only significantly simplifies your code, it also provides better isolation between multiple users, as well as independent life cycle management for each tenant.
There are various tradeoffs between sharing and isolation. With isolation you get better security and control of your own environment, but lower utilization and higher cost. With sharing, you can reduce cost and still achieve reasonable isolation, but not at the same level as when running on a dedicated resource. The Elastic Middleware Services make it possible to define the isolation/sharing at the application level. They do this by introducing the notion of an isolation level:
Currently we support three isolation levels:
- Dedicated – guarantees a dedicated machine allocated per instance of the application.
- Shared private – multiple instances of the application or organization share the same resources, but other departments or organizations are isolated.
- Shared public – everyone shares everything.
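One way to read these three levels is as a rule for whether two service instances may share a machine. The Java sketch below encodes that reading. The enum `Isolation` and the class `IsolationRules` are hypothetical and are not the GigaSpaces `IsolationLevel` type; treating the stricter of the two levels as decisive is also an assumption.

```java
// Hypothetical model of the three isolation levels described above.
enum Isolation { DEDICATED, SHARED_PRIVATE, SHARED_PUBLIC }

class IsolationRules {
    // Decides whether a new instance may be placed on a machine that
    // already hosts another instance.
    static boolean canShareMachine(Isolation requested, String requesterOrg,
                                   Isolation resident, String residentOrg) {
        // A dedicated instance on either side rules out any sharing.
        if (requested == Isolation.DEDICATED || resident == Isolation.DEDICATED) {
            return false;
        }
        // Shared-private on either side allows sharing only within the
        // same organization.
        if (requested == Isolation.SHARED_PRIVATE || resident == Isolation.SHARED_PRIVATE) {
            return requesterOrg.equals(residentOrg);
        }
        // Both sides are shared-public: everyone shares everything.
        return true;
    }
}
```

For example, two `SHARED_PRIVATE` instances from the same organization can land on the same machine, while a `SHARED_PRIVATE` instance and one from a different organization cannot.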
Available both on your local network and on the cloud
To try the new service you don’t need to have a private cloud or run your application on a public cloud. You don’t even need a virtualization layer. All you need is to launch a single GigaSpaces agent (Java process) per machine (normally that process will be started automatically at machine boot).
Once you’ve done this, you can start interacting with your machines and create the desired middleware service through a simple API call. The following snippet shows how you might use this API:
Admin admin = new AdminFactory().createAdmin();
ElasticServiceManager elasticServiceManager = admin.getElasticServiceManagers().waitForAtLeastOne();
// Start a new data-grid
ProcessingUnit pu = elasticServiceManager.deploy(
    new ElasticDataGridDeployment("mygrid")        // give it a name
        .isolationLevel(IsolationLevel.DEDICATED)  // isolation level
        .highlyAvailable(true)                     // set the high availability level
        .elasticity("2GB", "6GB")                  // set the required capacity range
        .jvmSize("512MB")                          // configure the size per VM
        .addSla(new MemorySla(70)));               // define Memory SLA
That’s it! – you just got a full cluster deployed and ready to use.
What does it mean for you?
To sum up, here are the main benefits that Elastic Middleware Services bring to each type of user:
For developers
As a developer, you can get access to the service you want just by calling an API, without worrying about installation or cluster setup. You get access to the service from your existing platform. In other words, you don’t need to run your application on GigaSpaces XAP to use any of those services. You can also pick and choose – use only the services you want, e.g. just the data, data and messaging, etc.
For the (private) data center
From a data-center perspective, you could install Elastic Middleware Services only once in your data-center. The same installation and resources would be shared amongst all users in your organization. If you happen to have virtualization in place, you can do this very easily using the GigaSpaces virtual appliances.
Once you install the system, other users in the organization don’t need to go through the installation process – they can start consuming the middleware services just by calling an API, like they would do on the Amazon cloud when they launch a SimpleDB instance.
In a way, the Elastic Middleware Services give the data center the power to provide higher-level services to the business, rather than just plain infrastructure services, enabling the organization to become more agile.
For public cloud providers
Public cloud providers can be viewed as an outsourced version of the local data center, so they should experience much the same benefits as data centers. In addition, with Elastic Middleware Services, public cloud providers can offer a set of middleware services that are enterprise-ready with extremely high utilization per service. Because the Elastic Middleware Services are fully memory-based, you can use up to 10X fewer machines to achieve the same throughput requirements, compared to a disk-based approach.
Ricky Ho published some interesting performance characteristics of Amazon SQS and SimpleDB in one of his recent posts:
- Network latency and throughput: 20 - 100 ms for SQS access, SimpleDB domain write throughput is 30 - 40 items/sec.
- Eventual consistency: 2 simultaneous requests to dequeue from SQS can both get the same message. SQS sometimes reports empty when there are still messages in the queue
While both SQS and SimpleDB provide scalable data store and messaging services, they are at least 10-50X slower than an equivalent GigaSpaces data and messaging service. Implementing a Map/Reduce scenario with GigaSpaces is going to be significantly simpler and closer to real-time compared to Amazon Elastic Map Reduce (and that’s a topic for a separate post). Another cool feature is the ability to execute your code within the service container, which gives you an even bigger improvement in performance and latency.
The Elastic Middleware Services have much higher utilization and simpler maintenance, compared to equivalent services offered separately to the end-user, because they are provided using a shared cluster with a unified clustering model. With Elastic Middleware Services you get JPA, memcached, key/value (NoSQL), Spring, JMS, Remoting and Map/Reduce in Java or .Net, in a single deployment.
For existing GigaSpaces XAP users
Existing GigaSpaces XAP users can use the Elastic Middleware Services as part of XAP. They’ll benefit from the simplicity provided through the high-level abstraction. In addition, the Elastic Middleware Services make it easier to plug the GigaSpaces components into existing application servers or development environments. They also make it significantly simpler to manage and deploy GigaSpaces XAP in various groups in the organization, because they can all share the same virtual pool of machines in the data center.
Availability
The Elastic Middleware Services will be available as part of our upcoming XAP 7.1 release, due at the end of March 2010. They are currently available for private beta testing. We will also release a public beta of the Elastic Middleware Services through our upcoming 7.1 release candidate version. As always, we welcome feedback and will be very happy to hear about your specific requirements. You can either post a comment on this post or send an email to <pm at gigaspaces dot com>.