
February 2008

February 26, 2008

When virtualization meets SOA

There have been many separate discussions about SOA and virtualization. Only a few addressed how they relate to each other. 

Interestingly, while I was working on this post, Geva Perry brought to my attention Judith Hurwitz’s blog post, "Is Virtualization the foundation of SOA?", in which she made the following points:

I have been doing a lot of thinking lately about virtualization and cloud computing. The more I look at the foundational requirements for virtualization the more I am convinced that there is a close relationship.

…you have physical resources, data, images, application components in containers, etc. How do you make sure that they maintain the state that you desire? How do you ensure that this version of your resources acts like a well oiled machine?

The second comment by Judith brings up another issue that is often ignored, which I refer to as Intra-Application SOA. From an architecture standpoint, there is a huge difference between inter-application SOA, which spans multiple systems such as SAP and Siebel (for example, using Web Services), and intra-application SOA, which applies to a specific application, such as an Order Management System or an eCommerce application. The latter is composed of multiple service components that need to behave as a single application. The consistency, latency, and performance requirements are fundamentally different between "intra" and "inter", and so is the level of granularity at which we break up our application services.

This post focuses on Intra-Application SOA and how we can apply virtualization patterns to simplify the way we turn simple objects (e.g., POJOs) into distributed services. First, I'll start with a passage from the Wikipedia definition of SOA:

SOA separates functions into distinct units (services), which can be distributed over a network and can be combined and reused to create business applications.[2] These services communicate with each other by passing data from one service to another, or by coordinating an activity between two or more services. (Source)

Taking this definition, one way to look at it is that SOA is an evolution of RPC. The main difference is that with RPC we maintain a direct relationship between the client that invokes a business function and the service that delivers it, whereas in SOA we break this direct relationship and can therefore map a business function to several services that together deliver it. This seemingly minor difference opens up an entire world of interesting opportunities related to how we map a business request to the underlying service(s) that implement that request. Below is a summary of some of the possible relationships:

  1. Synchronous – In this case we would like our system to wait until a certain business function is executed, and only then proceed.
  2. Asynchronous – In this case we send a request to execute a certain business function but don't wait for its completion. To track the operation we normally get a logical handle (a.k.a. a Future), which lets us query the status of the request and retrieve the result at a later stage. Quite often the request itself triggers a chain of events, which means that the service that receives the request may not be the service that delivers its result.
  3. Parallel – In this scenario, the application is served by multiple instances of the same service (a common scenario in the case of partitioning). We would like to execute the same operation on all service instances at the same time, and aggregate the results. This pattern is also referred to as Map/Reduce.
  4. Content-Based – The mapping of a certain business request to the service that implements it is based on the content of the request. A common scenario would be to use this method for routing requests to the service instance that contains the relevant data required to perform the request. This is also referred to as data affinity.
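The four relationships above can be sketched against one and the same POJO using only the JDK. This is a minimal illustration, not any particular framework's API: `OrderService`, `InMemoryOrderService`, and `InvocationStyles` are hypothetical names, the "instances" are plain in-process objects rather than remote services, and routing by hash of the customer id is just one assumed way to get data affinity.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.stream.Collectors;

// Hypothetical business interface, used only for illustration.
interface OrderService {
    int countOrders(String customerId);
}

// A plain POJO that holds one partition of the order data.
class InMemoryOrderService implements OrderService {
    private final Map<String, Integer> ordersByCustomer = new HashMap<>();

    void put(String customerId, int orders) {
        ordersByCustomer.put(customerId, orders);
    }

    @Override
    public int countOrders(String customerId) {
        return ordersByCustomer.getOrDefault(customerId, 0);
    }
}

class InvocationStyles {
    // 1. Synchronous: the caller blocks until the result is available.
    static int sync(OrderService service, String customerId) {
        return service.countOrders(customerId);
    }

    // 2. Asynchronous: the caller gets a handle (a Future) and asks for the result later.
    static CompletableFuture<Integer> async(OrderService service, String customerId) {
        return CompletableFuture.supplyAsync(() -> service.countOrders(customerId));
    }

    // 3. Parallel: run the same call on every instance, then reduce (here: sum).
    static int parallel(List<? extends OrderService> instances, String customerId) {
        List<CompletableFuture<Integer>> futures = instances.stream()
                .map(s -> async(s, customerId))
                .collect(Collectors.toList());
        return futures.stream().mapToInt(CompletableFuture::join).sum();
    }

    // 4. Content-based: the request content (customer id) selects the owning instance.
    static int partitionOf(String customerId, int partitions) {
        return Math.floorMod(customerId.hashCode(), partitions);
    }

    static OrderService route(List<? extends OrderService> instances, String customerId) {
        return instances.get(partitionOf(customerId, instances.size()));
    }

    // Builds n partitions and stores each customer on the partition route() will pick.
    static List<InMemoryOrderService> partition(Map<String, Integer> orders, int n) {
        List<InMemoryOrderService> instances = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            instances.add(new InMemoryOrderService());
        }
        orders.forEach((id, count) -> instances.get(partitionOf(id, n)).put(id, count));
        return instances;
    }

    public static void main(String[] args) {
        Map<String, Integer> orders = new HashMap<>();
        orders.put("alice", 2);
        orders.put("bob", 3);
        List<InMemoryOrderService> instances = partition(orders, 2);

        OrderService owner = route(instances, "alice");
        System.out.println(sync(owner, "alice"));          // blocking call
        System.out.println(async(owner, "alice").join());  // result fetched via the handle
        System.out.println(parallel(instances, "alice"));  // summed over all partitions
    }
}
```

Note that `InMemoryOrderService` itself is identical in all four cases; only the code around it changes.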

From a reliability perspective, a request can be made transactional, which means that even if it was invoked asynchronously we are guaranteed that it will not be lost if the service fails while executing it. In the event of such a failure, the request is rolled back and another instance of the service picks it up.
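As a rough, single-threaded simulation of that behavior (not a real transaction manager), the sketch below keeps the request in a pending queue: an instance takes it, and if the instance fails the request is put back so the next instance can pick it up. `TransactionalDispatch` and `dispatch` are hypothetical names, and a thrown `RuntimeException` stands in for a service crash.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;
import java.util.function.Function;

class TransactionalDispatch {
    // Offers the request to each instance in turn. A failure "rolls back" by
    // returning the request to the pending queue, so it is never lost.
    static <Q, R> R dispatch(Q request, List<Function<Q, R>> instances) {
        Deque<Q> pending = new ArrayDeque<>();
        pending.add(request);
        RuntimeException lastFailure = null;

        for (Function<Q, R> instance : instances) {
            Q taken = pending.poll();          // take the request under the "transaction"
            try {
                return instance.apply(taken);  // success: the take is committed
            } catch (RuntimeException e) {
                pending.addFirst(taken);       // failure: roll back, request is pending again
                lastFailure = e;
            }
        }
        throw new IllegalStateException("no instance could process the request", lastFailure);
    }

    public static void main(String[] args) {
        Function<String, String> crashing = q -> { throw new RuntimeException("instance crashed"); };
        Function<String, String> healthy = q -> "processed " + q;

        // The first instance fails, so the second one picks the request up.
        System.out.println(dispatch("order-1", Arrays.asList(crashing, healthy)));
    }
}
```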

A common problem today is that each of the approaches listed above is believed to require a different transport. Quite often it also requires an explicit change in our service implementation. For example, to achieve synchronous invocation we use something such as RMI. For asynchronous invocation we use a messaging system, such as JMS. For parallel execution we have things like Map/Reduce or Master/Worker.

Things become more complex when we're dealing with stateful services, for which we need to add data affinity and transactional consistency. In such cases we need to route requests to where the data is, and make sure that the service invocation uses the same transactional context as the data service that maintains the state. That way, in case of failure, both the state and the invocation are rolled back to a consistent state.


The reason for these limitations is that existing solutions evolved backwards: we designed our systems from the perspective of technical requirements. In other words, we asked ourselves "how do we perform parallel execution?" or "how do we execute asynchronous operations?", and designed our services differently based on the way they were invoked. We did not ask: "what does it mean to run multiple services that serve the same business function?", or "how can we abstract the fact that we're interacting with more than one service instance from the client that is using it?", or "how can we enable clients to choose the method of invocation and the degree of reliability without changing our service implementation?"

This is where virtualization comes to the rescue.   

"[Virtualization is] a technique for hiding the physical characteristics of computing resources from the way in which other systems, applications, or end users interact with those resources. This includes making a single physical resource (such as a server, an operating system, an application, or storage device) appear to function as multiple logical resources; or it can include making multiple physical resources (such as storage devices or servers) appear as a single logical resource." [1] (source)

Now let's look at SOA from a virtualization perspective. Whether we have 1, 10 or 100 service instances serving a certain business function, we should think of them as if they were a single service. The way we choose to interact with these services should be abstracted from the client that is using them. In the simplest scenario, a client should be able to invoke a method on a service as if it were a local instance or a single remote server. The client should be abstracted from the way we route the request, as well as from the physical location of the service instance on the network. Similarly, our system should be smart enough to benefit from the fact that a service is collocated, in which case the client request doesn't need to go through the network at all. Having said that, we do need to enhance the service semantics when we want to map a single request to multiple services in parallel. In such cases we need to introduce a reducer handler that is responsible for aggregating the results from all the services.
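One way to picture this is with a JDK dynamic proxy that makes any number of instances look like a single service to the client. The sketch below is an assumption-laden illustration, not the OpenSpaces implementation: `StockService` and `VirtualService.of` are hypothetical names, the instances are in-process lambdas instead of remote services, and every call is broadcast sequentially and folded with a caller-supplied reducer.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;
import java.util.Arrays;
import java.util.List;
import java.util.function.BinaryOperator;

// Hypothetical business interface, used only for illustration.
interface StockService {
    int totalStock(String sku);
}

class VirtualService {
    // Returns an object that looks like a single T to the client but forwards
    // each call to every instance and aggregates the results with the reducer.
    // Only business methods are expected here; Object methods such as toString
    // would be broadcast as well.
    @SuppressWarnings("unchecked")
    static <T> T of(Class<T> type, List<? extends T> instances, BinaryOperator<Object> reducer) {
        if (instances.isEmpty()) {
            throw new IllegalArgumentException("at least one instance is required");
        }
        InvocationHandler handler = (proxy, method, args) -> {
            Object result = null;
            boolean first = true;
            for (T instance : instances) {
                Object partial;
                try {
                    partial = method.invoke(instance, args);
                } catch (InvocationTargetException e) {
                    throw e.getCause();  // surface the service's own exception
                }
                result = first ? partial : reducer.apply(result, partial);
                first = false;
            }
            return result;
        };
        return (T) Proxy.newProxyInstance(type.getClassLoader(), new Class<?>[] {type}, handler);
    }

    public static void main(String[] args) {
        BinaryOperator<Object> sum = (a, b) -> ((Integer) a) + ((Integer) b);

        // One local instance and three instances are used through the same interface.
        StockService local = of(StockService.class,
                Arrays.<StockService>asList(sku -> 5), sum);
        StockService cluster = of(StockService.class,
                Arrays.<StockService>asList(sku -> 5, sku -> 7, sku -> 1), sum);

        System.out.println(local.totalStock("sku-1"));    // 5
        System.out.println(cluster.totalStock("sku-1"));  // 13
    }
}
```

The client code calling `totalStock` is the same in both cases, which is the point of the abstraction: the number of instances and the aggregation strategy live behind the proxy.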

The most interesting thing about this approach is how simply we can now turn a POJO into a distributed, SOA-based service, without writing additional code and with an advanced level of reliability and flexibility. We don't have to think in advance about how our service is going to be invoked. We don't even need to change our configuration depending on whether the service is collocated or remote. Just think about what that means from a testability perspective. We can write our service in our own IDE, run all our functional tests locally and then, using exactly the same code, run it in a full-fledged distributed deployment.

Uri Cohen wrote a description of the OpenSpaces Service Virtualization Framework on his blog, which illustrates how this pattern can be implemented.

