Space-Based Architecture

June 25, 2009

Google App Engine plus Amazon AWS: Best of both worlds

George Lawton wrote a good summary of my JavaOne talk in his article titled Google App Engine plus Amazon AWS: Best of both worlds:

Google App Engine (GAE) is focused on making development easy, but limits your options. Amazon Web Services is focused on making development flexible, but complicates the development process. Real enterprise applications require both of these paradigms to achieve success… What we really want is the flexibility and performance of AWS and the simplicity and ease of use of GAE.

This is exactly what we have been working on for the past year, leading to the launch of our new cloud platform. With this platform we leverage GigaSpaces XAP as the high-performance scale-out application server and Amazon as the robust and flexible IaaS. Together they form an alternative Platform as a Service geared for enterprise-grade applications. This allows the cloud environment to inherit the high performance, low latency and scalability of the XAP platform, which in turn lets you reach your performance and scaling targets with fewer machines, and therefore at a lower cost.


Real-life case study: Primatics Financial – Risk analysis as a service

Francis de la Cruz and Argyn Kuketayev from Primatics Financial joined me for the presentation. In their part of the session they described their experience in developing a SaaS application for real-time analytics.

Kuketayev described how Primatics used this approach to create a new automatically scaling cloud version of an existing banking application. Primatics initially developed a mortgage securities application that allows banks to estimate the value of a basket of hundreds of thousands of loans. The value of these loans fluctuates as economic conditions change and some portion of home owners cannot afford to make payments on their loans. Banks normally only need to assess the value of these loans at the end of each month, making them an ideal candidate for cloud services like AWS.

From a scalability perspective, the challenge is to provide a highly multi-tenant application that needs to serve many firms, with many users in each firm, each running many jobs at the same time. Implementing such a model can be fairly complex, as you need to manage the life cycle of each job and each user independently and in isolation from one another.


The need for scale 

Trying to build such a service directly on Amazon is going to be fairly complex, as you can learn from George’s summary below:

Primatics wrote the first version of EVOLV:Risk as a hosted web application for a regional bank. The application needed to be fault tolerant so that if one node crashed, they did not have to restart the application over again from the beginning. Kuketayev said that it is not just about the loss of four hours, but the office is trying to close out the month and needs to access data to end the monthly cycle so they can go home.

Using GigaSpaces' toolset they rewrote the entire application infrastructure in about four months to run on top of AWS. Now they can kick off as many instances as required for different banking customers, and each instance runs significantly faster than before. Kuketayev said that it is important for banks that none of their applications run on the same infrastructure as another bank.


The diagram below shows the specific architecture that Primatics ended up using. Those who are familiar with Space-Based Architecture will find it fairly straightforward:

The application is built out of a set of processing units. Each processing unit contains the compute agents in the form of a polling container. The compute agents get a reference to a remote data grid that is shared by all processing units. Each agent gets the job injected into it by the polling container, along with a reference to the data it requires to process the job. Once the job is completed, the result is stored back in the space. The results are flushed back to a database through a mirror service.

In case of a failure, other compute agents are able to pick up from the exact point of failure and continue processing the job as if nothing happened. This is because the state of the job is kept safe in the data grid and not in the agent’s memory.
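
To make the failover behavior concrete, here is a minimal Python sketch. It is not GigaSpaces code: the `Space` class is a hypothetical stand-in for the shared data grid, `run_agent` is a hypothetical compute agent, and the job layout (`loans`, `next_loan`, `total`) is invented for illustration. The only point it tries to show is that an agent which checkpoints its progress in the space can be replaced mid-job by another agent.

```python
class Space:
    """Toy stand-in for the shared data grid: job state keyed by job id."""

    def __init__(self):
        self.jobs = {}

    def write(self, job_id, state):
        self.jobs[job_id] = dict(state)   # store a copy, as a remote grid would

    def read(self, job_id):
        return dict(self.jobs[job_id])


class AgentCrashed(Exception):
    """Raised to simulate a compute agent dying in the middle of a job."""


def run_agent(space, job_id, crash_after=None):
    """Process the remaining loans of a job, checkpointing progress in the space."""
    state = space.read(job_id)
    processed_here = 0
    while state["next_loan"] < len(state["loans"]):
        if crash_after is not None and processed_here == crash_after:
            raise AgentCrashed(job_id)    # local memory is lost, the space is not
        state["total"] += state["loans"][state["next_loan"]]
        state["next_loan"] += 1
        space.write(job_id, state)        # checkpoint after every unit of work
        processed_here += 1
    return state["total"]


space = Space()
space.write("job-1", {"loans": [100, 250, 75, 300], "next_loan": 0, "total": 0})

try:
    run_agent(space, "job-1", crash_after=2)   # first agent dies after two loans
except AgentCrashed:
    pass

resumed_from = space.read("job-1")["next_loan"]   # where the next agent starts
total = run_agent(space, "job-1")                 # a second agent finishes the job
```

The second agent never sees the first agent's memory. It only reads the checkpoint from the space, which is why the job continues from loan index 2 rather than from the beginning.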


image

Kuketayev from Primatics nicely summarized the lessons he learned from trying to build it on his own vs. using GigaSpaces:

Kuketayev said that one of the biggest lessons is that you need to have your infrastructure do the provisioning for you automatically, or otherwise you end up spending a lot of time just turning things on and off. He said they are now using configuration APIs to automate this process, whereas before they were using scripts. This allows for automatic throttling and failover recovery without human intervention.

Kuketayev advised "You need to make sure you use the right tools … You don't want to have to worry about provisioning and reliability. Make sure you have provisioning, failover, monitoring and SLA out of the box."


The full JavaOne presentation is available here:


Final words

For solution providers the size of Primatics, building a risk analysis application as a service wouldn’t be possible without cloud computing. The cloud enabled them to offer their solution as a service without the major investment of building a data center to support it.

Primatics’ experience is not special. One of the benefits of building Software as a Service is that you have one shared environment for all your customers. At the same time, one of the challenges is that in a shared environment, failure becomes more public and will impact ALL your clients. If the system doesn’t scale well, you’re going to be hit twice as hard as in a standalone application.

Building a robust and scalable SaaS application can be fairly complex. A good cloud infrastructure will get you a first-class data center, but it won’t solve your application requirements. What’s interesting with cloud computing is that it forces you to think about the cost and efficiency of your application more than ever before. In the Primatics example, a simulation running on 100 nodes for 3 hours is very likely to hit a failure at some point. A failure late in such a simulation can cost you up to 300 machine hours, not to mention the fact that you might lose the simulation window for the day, and the reputation challenge you’ll be facing with your customers. In addition, putting the data in memory and making the application run 3-5 times faster means that you need between 1/3 and 1/5 of the machine power, which saves roughly 67-80% of the cost of running the application.
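
The arithmetic behind those numbers is simple enough to check in a few lines of Python. This is only a back-of-the-envelope sketch: it assumes cost is proportional to machine hours, that a failure without recovery forces a full rerun, and that a speedup translates directly into fewer machines. The function names are mine, not part of any product.

```python
NODES = 100
HOURS = 3

# Worst case: the run fails right at the end and has to start over,
# so every machine hour spent so far is wasted.
machine_hours_at_risk = NODES * HOURS


def machine_fraction(speedup):
    """Fraction of the original machine power needed after a given speedup."""
    return 1 / speedup


def cost_saving_percent(speedup):
    """Cost saving in percent, assuming cost scales with machine power."""
    return round((1 - machine_fraction(speedup)) * 100)


savings = {speedup: cost_saving_percent(speedup) for speedup in (3, 5)}
```

With these assumptions a 3x speedup saves about 67% and a 5x speedup saves 80%, so the 80% figure corresponds to the upper end of the 3-5x range.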

I believe that the challenges imposed by cloud computing force us to focus on what we do best and avoid investing in areas which are not core to our business. Because the pay-per-use model significantly lowers the cost barrier, going down the path of writing your own infrastructure, as many have tried to do before, will be much more expensive and risky than ever before.


June 01, 2009

GigaSpaces Launches a New Version of its Cloud Computing Framework

We have been working for a while on our new Cloud Computing Framework (CCF). We wanted to wait until we had some real customers and partners behind the platform in production before spreading the word on a larger scale. And now, I am very happy to report that we have reached this point sooner than expected and just in time for the JavaOne event.

The new version includes an enhanced integration with the Amazon billing system, DevPay, designed to enable users to use GigaSpaces as a service on a true pay-per-use model. All you need is an Amazon account and you're done. Unlike other products that are offered in a similar fashion, we decided not to provide just a bundle of our software packed in an Amazon Machine Image (AMI), but instead to provide a tighter integration with EC2 that enables our users to use a single GigaSpaces product for all GigaSpaces versions, as well as customize the installation to include their own software packages and application code on the fly. The new CCF provides a true end-to-end experience that enables our users to deploy and manage their *entire* application, from the load-balancer to the database, in one single click. The latest release of CCF includes support for the new GigaSpaces 7.0 release. Users who are looking to try out the new XAP product can now do that for free without the need to download or install anything. This enables you to try out any of the built-in examples, test the high availability features and run benchmarks in just one click. The new version includes lots of additional improvements to the user experience as well as new features and bug fixes. To view a detailed list of the new features click here.

Those who will be visiting JavaOne this week will be able to join the hands-on lab session and experience a full deployment of a JEE application using CCF. You will also be able to gain insight from one of our customers, Primatics Financial, on how they used CCF to build their Risk Management application as a service – see full details here.

In this post I will try to address some of the most common questions that we often receive when introducing this new service:

Who is using GigaSpaces CCF and Why?

The new release is already used by different customers in the following areas:

ISVs and service providers using CCF for SaaS enablement:

Many ISVs are looking for ways to move their standalone applications into a SaaS-based offering. In doing so they face a few challenges:

1. Scalability – now their application needs to serve not just the users of a certain customer at a certain location but all customers from all locations. This increases the scalability challenge on their product by an order of magnitude.

2. Continuous high availability – one of the challenges that is imposed by having a shared platform for all customers is that the impact of failure can be catastrophic as failure can affect all of your clients at the same time and becomes public fairly quickly.

3. Portability – There are cases where some customers would still require their application to be installed in their local data center for security and latency purposes. Today, this often means you need to develop two separate branches of your product to serve these two scenarios. With GigaSpaces you can write your application once and deploy it in the cloud or in your customer's data center without having to modify your application and without locking your application to a specific cloud provider.

4. Performance – In developing high performance applications such as trading, risk analysis, or SIP servers, keeping the latency at the sub-millisecond level and the throughput at a maximum becomes critical. This is where our XAP middleware has a proven record in running numerous mission-critical production applications.

The following are a few public references that chose to use XAP:

- Primatics Financial is using GigaSpaces CCF to build a high performance risk management solution

- Orbyte is using GigaSpaces CCF for building state of the art high performance trading applications that can be offered as a service.

Other customers include Online Gaming companies that were looking for a simple and cost effective manner to launch new online games and provide a gaming application platform that enables other gaming providers to host their applications easily in a cloud environment. See further details on our online gaming solution here.


Enterprise customers using CCF to improve time to market of new applications and for large scale testing

According to recent research conducted by Forrester, one of the main requirements driving cloud adoption by the enterprise market is that current IT can’t keep up with the business demands of their organization:

  • Capacity planning is too difficult – The current method of provisioning applications based on estimated peak loads has become either too costly or impossible. Most systems do not scale linearly, which means that the amount of resources needed to meet a certain goal can only be found through trial and error.
  • Time to market – launching new applications and services often takes months and sometimes years.
  • The business wants a simple and cost effective way to prototype – not all applications make it to production, but these initiatives still require the business to invest upfront in the purchase and configuration of infrastructure to support these prototypes.
  • Cost saving – There are sets of applications that have fluctuating demand on resources.  Enabling these applications to “lease” a common set of resources only when under heavy demand can lead to huge cost saving and efficiency.

I discussed these enterprise challenges in my previous talks. The most recent one is the talk I gave during the CloudSlam event: Practical Guide to Developing Enterprise Applications on the Cloud- Online Presentation

A good example of an enterprise customer who is already using our CCF in production is a large Telco service provider that searched for ways to reduce the time to market for launching new scalable web applications. This customer was able to cut a project that previously took more than 6 months just for development and testing down to 4 weeks, with only $90 spent on pre-investment.

Large scale testing, performance benchmarks and demo as a service

Users that are not ready to use the cloud to run their production systems can still receive benefits from the cloud for testing and demonstration purposes. There are a few challenges that we tried to address with CCF to make this type of use simpler:

1. Making cloud testing environments identical to your local testing environment – To effectively test your application on the cloud and deploy a production version of your system locally, your application must behave essentially the same in both environments.

2. Setting up a large scale environment of hundreds of nodes can be fairly complex.

3. Troubleshooting failures in such an environment can be a fairly complex and lengthy process.

If your application is already running on our XAP product, then porting it to the cloud, or running it locally in your production environment, will not require any change to your code or configuration. With the new CCF, setting up a large scale environment of hundreds of nodes takes just a few minutes. We designed the CCF to run not only GigaSpaces but also other external services that are essential for tests, such as JMeter, Tomcat servers etc. Users can easily configure their own initialization and command scripts to run when the machine is launched, and define which software packages will be installed on their machines on the fly. We added the ability to easily capture the logs of multiple machines and to monitor the system and middleware behavior at runtime using a zero-installation environment, all through the web.

GigaSpaces and our existing customers and partners have already been using CCF to run live demos, and recently GigaSpaces started using CCF for our own internal large scale testing purposes. Using the cloud for demo purposes has been extremely useful to show cases that previously were hard to demonstrate, such as fail-over and scalability in a real production environment. You can read more on how you can use CCF for running your own demo on the cloud in a similar way, without going through all the challenges we had gone through, in one of my recent posts: Demo as a Service. Similarly, CCF can be ideal for running large scale benchmarks.

Partner ecosystem

We designed our CCF in a way that would enable us to run not just GigaSpaces components but almost any application, such as JEE or Spring based applications. This enables our solution provider partners to use CCF to offer their specialized solutions and offerings for both GigaSpaces and their other software partners.

System Integrators are also looking for ways to gain a competitive advantage by offering turn key solutions at a significantly shorter time to market and lower initial investment. I tried to outline below some of the public references on these two categories:

  • Solution providers

Real time production monitoring - Dynatrace offers enhanced integration with CCF for production monitoring. Dynatrace users are able to monitor in real time how their application is performing and trace their application bottlenecks – see details on this integration here.

Model Driven Development framework - New Technology/enterprise (NT/e) - GigaSystemBuilder helps Java developers prove applications quickly - in a few days - on a local grid or in the cloud, and then rapidly move on to production development, rather than waste time on finger-trouble or implementation details – see more information here as well as a live demo on their deployment manager solution

  • System integrator

In many cases customers are looking for experts that will enable them to gain the benefit of the cloud and bridge some of the skill-set gaps. AAR is one of our leading system integrators in that field who has already delivered several solutions on top of our GigaSpaces CCF for Enterprise customers. For them, using CCF was a tool to gain a competitive edge and provided their customers the ability to launch new applications in a matter of weeks as opposed to months and at a fraction of the setup (development environment, testing environment, staging environment) and hardware costs that are associated with setting up such an environment with traditional projects.

Here is a quote from one of our System Integrators in the UK that describes their experience in using CCF for the Telco Service provider that I mentioned earlier.

“We use GigaSpaces XAP and CCF to deploy a standard JEE web application on EC2. To address security concerns we kept the business logic outside of the cloud on the customer site. We exposed the business logic through a secured web services channel. We use XAP data grid for maintaining the session high availability and for reducing the latency associated with accessing the remote data center. We used the XAP integrated Jetty container for hosting our web application and through that gained the built-in self healing and auto-scaling provided through the CCF. In this way we were able to deal with potential failure or load without human intervention. From a management and monitoring perspective we were able to leverage the integrated ganglia monitoring and integrated visual representation of different metrics, like cpu/memory/network usage of the entire cluster. We used JMX to bind our custom Mbeans to the integrated ganglia UI. Overall GigaSpaces CCF gave us big boost on utilizing ec2”

I must admit that from our end this was probably the smoothest project we had been engaged with. The reason was probably related to the fact that, unlike other projects, we had full control over the environment. This enabled us to minimize potential misconfiguration errors, which are one of the major sources of complexity in setting up such a project. Another benefit was the support. When we faced an issue it was very easy to log in directly to the machine and fix the problem on the spot, without the need to go through a complex process of authorization from IT Security and without the need to create a reproduction package and ship it over.

The customer was able to deploy a new business application quickly without putting all the initial investment upfront. In this way they gained the freedom to decide at any point if the application should continue, be canned or if they should move it entirely to their local IT.

Looking into the future

The future looks very promising as we continue to work on bringing new customers and businesses towards using CCF and the latest XAP 7.0. We are currently working on supporting the Amazon European datacenter and taking advantage of some of the new XAP 7.0 cluster administration API to enable users to provide their own custom SLA deployment. At this point, I would like to ask for your specific feedback and wish list for our next major CCF release.

May 19, 2009

GigaSpaces based solution makes it to the finals of Cisco Developer Contest

I was very pleased to read an email from Leonardo, who was the winner of the OpenSpaces Developer Challenge (a worldwide programming contest using the GigaSpaces application server which was held last year), saying that he is now a finalist in the Cisco developer contest. Here's a bit about him and the application he submitted:

About Leonardo

Leonardo worked for several ISPs in various roles as a network administrator, as a Java programmer for IT consulting firms, and finally as a software architect in high-performance Java EE based projects. He is passionate about parallel programming, distributed computing and, more recently, the semantic web and its applications in software engineering.

Leonardo was the winner of the OpenSpaces Developer Challenge. He enjoys reading about various technologies in the field of computer science. When he is not developing code, he prefers to spend time with family and friends, walk in the park, or watch a movie.

About the application

Resource Management Platform is a proposal to develop an event based platform that leverages AXP, Open Services Gateway initiative (OSGi), Jini and JavaSpaces technologies to enable deployment of IP Multimedia Subsystem (IMS) applications based on Session Initiation Protocol (SIP); more specifically, the Call Session Control Function (CSCF) components. It will have admission control mechanisms to manage call processing.

This solution improves infrastructure manageability for large scale IMS applications. Such a platform will potentially be useful to enable deployment of high-performance, network-based SaaS (Software as a Service) or Cloud Computing solutions at the network edge by leveraging AXP.


You can find the full details about his project here.

Leonardo's project is interesting, because it shows how you can use Space Based Architecture (SBA) for implementing a scalable Telco application and offer it as SaaS application on the cloud.

Interestingly enough, I got another email the week before from Amin Abbaspour, who presented another case study illustrating how you can build a scalable SMS service using SBA, as shown in this diagram:

image 

What the two projects have in common, from an architecture perspective, is that they both represent a highly scalable Event Driven design. The unique thing about Event Driven applications is that they require a combination of messaging, data and service interaction that needs to be tightly orchestrated to meet high performance/low-latency requirements without compromising on consistency, ordering (FIFO) and reliability. This combination of requirements represents one of the hardest challenges in building scalable architectures. Trying to meet this type of challenge in the traditional way, by integrating a messaging system for event delivery, a database or simple caching (like Memcached or TC) for data, and a traditional application server for business logic, is going to lead to a fairly complex architecture. Trying to reach linear scalability and keep the latency low with so many moving parts is close to impossible. This is what makes SBA such a good fit. The main difference with SBA is that it recognizes the strong dependency between messaging, data and business logic. The key is to have one shared clustering, high availability and scalability model for all three components of the architecture. This makes it possible to reduce the number of moving parts and network hops associated with each business transaction, thereby increasing reliability.

On a personal level, I was very pleased to see that the software we are developing is helping people like Leonardo and Amin to build their own careers and put themselves in a unique spot in a highly competitive market.

Good luck Leonardo and Amin!


May 18, 2009

Practical Guide to Developing Enterprise Applications on the Cloud- Online Presentation

According to a recent survey, available skill sets is one of the leading decision factors for organizations in determining which application platform to use, while scalability and availability are next. This reveals one of the main obstacles for bringing enterprise applications to the cloud: How do you take something as disruptive as cloud computing, and bring it to an enterprise environment, without forcing a complete re-write?

In my recent talk at the CloudSlam (online) Conference, I tried to summarize the challenges people face when trying to deploy enterprise applications on the cloud. Try to imagine a scenario where you have an existing JEE system, and your system is under load. At this point you would like to add more machines to accommodate that load. What will happen to your application in this case? Will your application be able to take advantage of the additional capacity? How do you know how many machines should be added to meet the load in the first place?

The answers I usually hear to these questions are fairly consistent. If you’re lucky, you’ll only require configuration changes to be at a point where you can utilize the extra resources. Knowing how many machines to add in order to meet a certain load is yet another challenge. Going through the normal capacity planning process could take weeks with the way systems are currently running.

At the same time, if you’re going to deploy your application in a static manner (reserved instances, static IP, …), it would probably make it simpler to deploy your existing application on the cloud, but it probably won’t make much economic sense. So the question I was trying to answer is how we get from the static application deployment most of us are using today, to a point where we can get end-to-end scaling and elasticity of our application from the load-balancer to the database without going through a complete re-write.

In this presentation, I tried to suggest a set of solutions to overcome the various challenges I mentioned earlier and how to apply them in a gradual manner to avoid huge initial investments and high risk. There were also some interesting questions toward the end, centered specifically around one of the hardest problems – getting the data pieces sorted out in such a dynamic environment. Those who missed this presentation can now view it online:



Note: The voice quality isn’t that great, plus the desktop sharing on Webex didn’t give a clear indication when my desktop was actually shared, so the beginning of the talk is missing the first two or three slides. If you have questions or want to get a copy of the presentation, just shoot me an email.


During the upcoming JavaOne conference there will be a hands-on lab session by Daniel Templeton from Sun Microsystems (PetClinic in the Clouds: Scaling a Classic Enterprise Application) where users can go through the steps and deploy a Pet Clinic application on the cloud.

In addition to that lab, I'm going to have a Technical Session (Alternative to Google Application Engine for Java™ Technology-Based Applications).

For those who are not going to join the JavaOne event and want to get some hands-on experience with how the steps that I outlined in the presentation can be implemented in a real application, I’d recommend looking at the Pet Clinic demo that is available on the Amazon cloud using our cloud framework.

I’m happy to say that the number of customers that are already using this approach through our Enterprise Cloud middleware platform is growing quickly, and we're seeing more and more applications that would traditionally be considered not "cloud compatible" being deployed on the cloud. Below are a few of the public references that were mentioned just recently on Jim Liddle's blog:

April 14, 2009

Designing a Scalable Twitter

Guy Nirpaz, Uri Cohen and Shay Banon came up with an interesting exercise as part of the recent partner training that took place at the GigaSpaces office. In this exercise, the students were asked to come up with a scalable design for Twitter, using Space-Based Architecture.

There are some interesting scalability lessons from this exercise, which are applicable to anyone looking to implement new-style real-time web applications such as the ones used for social networking.

In this post I'll  try to summarize the main patterns to put into place and considerations to make when designing such a scalable architecture.

Background:

For those of you who are not yet familiar with the service, Twitter is sort of an SMS service meets discussion board. You can post short messages (up to 140 characters) that can be shared with a group of subscribers that are referred to as "followers". The main difference between Twitter and other messaging applications is that both SMS and Instant Messaging (IM) applications were designed primarily for one-on-one communications, whereas Twitter was designed primarily for broadcast communications (publish/subscribe, or pub/sub). Another aspect that is special about Twitter is that by default anyone can follow anyone else. In other words, it was designed for open communications, not private, as were IM and SMS.

What are Twitter's scalability challenges?

1. Sending a tweet (a message on Twitter is known as a 'tweet') – The challenge is how to handle an ever-growing volume of tweets, re-tweets and responses that can lead to a viral "message storm".

2. Reading tweets – The challenge is how to handle a large number of concurrent users that continually “listen” for tweets from users (or topics) they follow.

Designing A Scalable Twitter

Choosing the right scalability patterns

Almost every challenge in software architecture has its roots in one of the existing patterns. So the simplest course is to start by looking for those patterns, and choosing the right patterns to scale the application. Looking at many other scalable architectures, we'll begin with a partitioning pattern as the core design principle. By partitioning our Twitter-like application we'll spread the load across a cluster of servers and scale by simply adding more servers (i.e., partitions).  Another important architectural observation about Twitter is that it doesn’t fit into the classic database-centric design that most web applications do. On the flip side, it doesn’t fit well with a messaging-centric design (pub/sub) either. It is a combination of the two.

A pattern that is suitable for this type of collaborative messaging is known as a blackboard pattern.  In our design, we will use those two design patterns -- partitioning and blackboard -- as the foundation for our scalable Twitter application. With the foundation in place, let’s list the requirements and examine how these patterns can be used to scale the app.
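
For readers who have not met the blackboard pattern before, here is a small Python sketch of the idea. It is an illustration only, not the JavaSpaces or GigaSpaces API: the `Blackboard` class and its `write`, `read` and `take` methods are hypothetical names that mimic the tuple-space style of matching entries against a template. Partitioning is deliberately left out of this sketch.

```python
class Blackboard:
    """Toy shared space: writers post entries, readers match them by template."""

    def __init__(self):
        self.entries = []

    def write(self, entry):
        self.entries.append(dict(entry))

    @staticmethod
    def _matches(entry, template):
        # An entry matches when it agrees with every field set in the template.
        return all(entry.get(key) == value for key, value in template.items())

    def read(self, template):
        """Return copies of all matching entries, leaving them on the board."""
        return [dict(e) for e in self.entries if self._matches(e, template)]

    def take(self, template):
        """Remove and return the first matching entry, or None if nothing matches."""
        for index, entry in enumerate(self.entries):
            if self._matches(entry, template):
                return self.entries.pop(index)
        return None


board = Blackboard()
board.write({"type": "tweet", "author": "alice", "text": "hello"})
board.write({"type": "tweet", "author": "bob", "text": "hi alice"})
board.write({"type": "tweet", "author": "alice", "text": "scaling out"})

# A follower "listens" by reading with a template, like a query.
alice_timeline = [e["text"] for e in board.read({"type": "tweet", "author": "alice"})]

# A worker consumes an entry by taking it, like a message off a queue.
taken = board.take({"author": "bob"})
remaining = len(board.read({"type": "tweet"}))
```

The same structure serves as both a data store (`read`) and a message channel (`take`), which is the combination of database-centric and messaging-centric behavior described above.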

Scalability Requirements

We'll assume a relatively extreme scaling requirement:

  • Tweet Volume: 10 billion tweets per day
  • Tweet Storage: 100 Gigabytes per day (with 10:1 compression)

Additional assumptions:

  • Tweets are limited to 140 characters
  • Tweets are immutable, i.e., there are no updates, only inserts
  • Twitter limits client applications to 70 requests per hour
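
As a quick sanity check on these numbers, the following Python sketch works out the implied rates. It assumes one byte per character, decimal gigabytes, and a load spread evenly over the day, none of which is stated in the requirements, so treat the results as rough orders of magnitude.

```python
TWEETS_PER_DAY = 10_000_000_000
SECONDS_PER_DAY = 24 * 60 * 60
MAX_TWEET_BYTES = 140                 # assumption: one byte per character
COMPRESSION_RATIO = 10                # the 10:1 compression from the requirements
STORED_BYTES_PER_DAY = 100 * 10**9    # 100 GB per day, decimal gigabytes

# Average write rate if the load were spread evenly (real peaks will be higher).
avg_tweets_per_second = TWEETS_PER_DAY / SECONDS_PER_DAY

# Upper bound on storage if every tweet used all 140 characters.
worst_case_raw_bytes = TWEETS_PER_DAY * MAX_TWEET_BYTES
worst_case_stored_bytes = worst_case_raw_bytes // COMPRESSION_RATIO

# Average uncompressed tweet size implied by the 100 GB per day figure.
implied_avg_tweet_bytes = STORED_BYTES_PER_DAY * COMPRESSION_RATIO / TWEETS_PER_DAY
```

Under these assumptions the system has to absorb roughly 116 thousand tweets per second on average, and the 100 GB per day storage figure corresponds to an average tweet of about 100 bytes before compression.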

Now that we have the foundational patterns and clear requirements, we can design the architecture. We'll start first with the blackboard system.

Using an In-Memory Data Grid (IMDG) as a Blackboard System

There are several approaches to building a blackboard system. To maximize performance and scalability, we'll store the data in memory, thus avoiding disk I/O, which is often the main cause for contention. For years, Java has provided a model for designing blackboard systems known as JavaSpaces. More recently, distributed caching has become popular and can provide similar capabilities to those of JavaSpaces. Let's examine two popular distributed caching approaches for our blackboard system:

  1. Simple read-mostly caching using memcached
  2. Read/write caching, also known as an In-Memory Data Grid (IMDG)

Choosing between memcached and an IMDG

Memcached enables us to store the data (tweets) in a distributed memory set and read it in a scalable fashion. Having said that, be aware that memcached is not transactionally safe and is not designed for reliability (i.e., it doesn’t support fail-over and high availability). That means that if we use memcached or something similar, we will have to use a database as the back-end. Every tweet posted will have to be written to both memcached and the database in a synchronous fashion to ensure that no tweet is lost. This approach may be good enough for scaling read access; for writes and updates, however, it offers limited scalability.

Unlike memcached, which was designed for simple read-mostly caching, In-Memory Data Grids are designed for handling a read/write scenario, and can therefore act as the system-of-record for both write and read operations. We can still use a database for long-term persistence, but because the IMDG maintains its reliability purely in memory, we can write and update the database asynchronously and avoid hitting the database bottleneck.

Todd Hoff, author of highscalability.com, wrote an interesting summary that covers the different products in this space in a recent post: Are Cloud Based Memory Architectures the Next Big Thing?

Todd provides a clear explanation of how an IMDG works (using GigaSpaces):



  • A POJO (Plain Old Java Object) is written through a proxy using a hash-based data routing mechanism to be stored in a partition on a Processing Unit. Attributes of the object are used as a key. This is straightforward hash based partitioning like you would use with memcached.
  • You are operating through GigaSpace's framework/container so they can automatically handle things like messaging, sending change events, replication, failover, master-worker pattern, map-reduce, transactions, parallel processing, parallel query processing, and write-behind to databases.
  • Scaling is accomplished by dividing your objects into more partitions and assigning the partitions to Processing Unit instances which run on nodes-- a scale-out strategy. Objects are kept in RAM and the objects contain both state and behavior. A Service Grid component supports the dynamic creation and termination of Processing Units.

Back to our Twitter app: Given the scalability requirements, we will need to scale both reads and writes, and therefore, an IMDG is a more suitable approach to implementing the blackboard system.

Now let’s examine how the use of an IMDG as the blackboard system enables us to scale both sending and reading tweets. Let's start by designing the partitioned cluster.

Designing a partition architecture

One of the main considerations in designing a partitioned cluster of any kind is determining the partition key, such as a Customer ID in a CRM application or a Trade ID in a trading application. At first glance, it sounds like a trivial decision, but choosing the right partitioning key requires a deep understanding of the application usage patterns and data model. In the case of Twitter, we could choose to partition the application by the data type, the user, the tweet itself or the followers. Our first goal is selecting a key that will be granular enough to enable scaling the application just by adding more partitions, while making sure that we don't end up with a key that is too fine-grained -- making it sub-optimal for querying purposes.

If we use a timestamp as the key, for example, our application will be optimized for “inserts” (writes). However, even a simple query such as “retrieve the tweets of a certain user” will force us to execute an aggregated query against all partitions. Alternatively, if we partition the data based on user-id, we'll be able to easily spread the load from different users across partitions, and retrieving the tweets of a certain user is going to be resolved in one call to a single partition. We may encounter a problem if a single user generates a significantly higher load than average; in the case of Twitter, however, we can assume that this is not very likely. Partitioning by user-id is a good compromise.

Data capacity analysis

With such extreme requirements it is clear that storing all tweets in memory is going to require huge memory capacity. Very quickly this will become economically prohibitive, so we need to devise a scheme in which the IMDG acts as a buffer for most of the load on the system, and then offloads the data and queries to an underlying persistent storage. In our Twitter example, it is fair to assume that most real-time queries (those that require fast access to the data) can be resolved from data from the last hour or the last 24 hours. Queries that require older data will need to hit the database for the initial call. However, subsequent access to fetch new updates should be resolved purely in-memory.

Using this approach, we'll need about 10 servers, each holding 10GB of data in memory, to accommodate 24 hours of activity. If we also want to back up the data in memory, we will need twice as many servers.
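The sizing above can be checked with a few lines of arithmetic. The following is only a minimal sketch: the class and method names are hypothetical, and the inputs are the figures assumed in this post (10 billion tweets per day, 100 GB per day after compression, 10GB of data per server).

```java
// Back-of-the-envelope sizing for the in-memory tier.
// All names are hypothetical; the inputs are the assumptions stated in this post.
class CapacityPlan {
    static final long TWEETS_PER_DAY = 10_000_000_000L; // 10 billion tweets per day
    static final long STORAGE_GB_PER_DAY = 100;         // after 10:1 compression
    static final long GB_PER_SERVER = 10;               // data held in memory per server
    static final long SECONDS_PER_DAY = 24L * 60 * 60;

    // Average write rate the cluster has to absorb (integer division, rounds down).
    static long tweetsPerSecond() {
        return TWEETS_PER_DAY / SECONDS_PER_DAY;
    }

    // Servers needed to keep the given number of hours of tweets in memory (rounded up).
    static long serversFor(int hoursInMemory, boolean withBackups) {
        long gigabytes = ceilDiv(STORAGE_GB_PER_DAY * hoursInMemory, 24);
        long servers = ceilDiv(gigabytes, GB_PER_SERVER);
        return withBackups ? servers * 2 : servers;
    }

    static long ceilDiv(long a, long b) {
        return (a + b - 1) / b;
    }

    public static void main(String[] args) {
        System.out.println("Average tweets/sec: " + tweetsPerSecond());
        System.out.println("Servers for 24 hours: " + serversFor(24, false));
        System.out.println("Servers for 24 hours with backups: " + serversFor(24, true));
        System.out.println("Servers for 1 hour: " + serversFor(1, false));
    }
}
```

Under these assumptions the cluster has to absorb more than 100,000 writes per second on average, which is why the write path deserves as much attention as the read path.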

Choosing the right eviction policy

It's reasonable to assume that recent data is accessed most and older data is rarely used. To ensure that we get the maximum hit ratio on our memory front-end, let's choose a time-based eviction policy, which always holds the most recent updates in memory. When we reach our memory capacity limit, the oldest data will automatically be evicted from memory. The actual window of time that we will be able to keep in memory obviously depends on the size of the cluster. With an IMDG implementation all tweets are stored in persistent storage, which means that when tweets are evicted they are not deleted from the system.
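To make the eviction behavior concrete, here is a minimal sketch of an oldest-first buffer. It is an illustration only, not how an IMDG implements eviction: the class name is hypothetical, tweets are assumed to arrive in time order, and the persistent store is assumed to already hold every tweet, so eviction only frees memory.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// A fixed-capacity buffer that keeps the most recent entries and evicts the oldest first.
// Evicted entries are only dropped from memory; persistence is assumed to happen elsewhere.
class RecentTweetBuffer {
    private final int capacity;
    private final Deque<String> oldestFirst = new ArrayDeque<>();

    RecentTweetBuffer(int capacity) {
        if (capacity <= 0) {
            throw new IllegalArgumentException("capacity must be positive");
        }
        this.capacity = capacity;
    }

    // Adds a tweet and returns the tweets that were evicted to make room for it.
    List<String> add(String tweet) {
        List<String> evicted = new ArrayList<>();
        while (oldestFirst.size() >= capacity) {
            evicted.add(oldestFirst.removeFirst()); // the oldest entry leaves memory first
        }
        oldestFirst.addLast(tweet);
        return evicted;
    }

    // Current in-memory content, oldest first.
    List<String> snapshot() {
        return new ArrayList<>(oldestFirst);
    }

    public static void main(String[] args) {
        RecentTweetBuffer buffer = new RecentTweetBuffer(2);
        buffer.add("t1");
        buffer.add("t2");
        System.out.println("Evicted: " + buffer.add("t3"));
        System.out.println("In memory: " + buffer.snapshot());
    }
}
```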

Scaling tweet writes:

If we select user-id as the partitioning key, each user tweet will be sent to a specific partition. Multiple users may be routed to the same partition. Usually the algorithm to determine which partition fits a certain user is something like:

routing-key.hashCode() % #of partitions

In GigaSpaces, this is done by marking the routing attribute of our tweet class with an @SpaceRouting annotation.

The web front-end application will call space.write( new Tweet(..),..) to send the tweets. This way there is nothing in our web client code that exposes the fact that the underlying implementation interacts with a cluster of partitions (spaces in GigaSpaces). Those details are abstracted within the space proxy. When the write method is called on the space proxy, it reads the field marked with @SpaceRouting from our Tweet() object and uses this field value to calculate the partition it belongs to. It then uses that value to route the Tweet(..) object to the appropriate partition.
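The routing calculation described above can be sketched in plain Java. This is an illustration of hash-based routing in general, not the GigaSpaces proxy code; the class name is hypothetical. One detail worth noting is that hashCode() can be negative in Java, so the sketch uses Math.floorMod rather than the % operator to keep the result within the valid partition range.

```java
// Maps a routing key (here, a user id) to a partition index.
// Hypothetical helper for illustration; a real space proxy does this internally.
class PartitionRouter {
    private final int partitionCount;

    PartitionRouter(int partitionCount) {
        if (partitionCount <= 0) {
            throw new IllegalArgumentException("partitionCount must be positive");
        }
        this.partitionCount = partitionCount;
    }

    int partitionFor(Object routingKey) {
        // floorMod keeps the result in [0, partitionCount) even when hashCode() is negative.
        return Math.floorMod(routingKey.hashCode(), partitionCount);
    }

    public static void main(String[] args) {
        PartitionRouter router = new PartitionRouter(4);
        for (String userId : new String[] {"alice", "bob", "carol"}) {
            System.out.println(userId + " -> partition " + router.partitionFor(userId));
        }
    }
}
```

Because the same user-id always yields the same partition, all of a user's tweets land together, which is what makes the single-partition read described earlier possible.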

With this approach, the web application can be written in a very simple way and can interact with the entire cluster as if it was a single server.


The data from the memory partitions gets stored asynchronously into a persistent storage. The persistent storage could be a database, but it could also be other things, such as an index search engine based on Compass/Lucene.
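A minimal sketch of this asynchronous (write-behind) flow follows. It is a simplification under stated assumptions: the class is hypothetical, a plain list stands in for the database or search index, and flush() is called explicitly where a real system would run it on a background thread. It also leaves out the in-memory backups that a real IMDG relies on for reliability.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;

// Write-behind sketch: a write is complete once it is in memory,
// and a separate step drains pending entries to the persistent store.
class WriteBehindStore {
    private final Map<Long, String> memory = new ConcurrentHashMap<>();
    private final ConcurrentLinkedQueue<Long> pending = new ConcurrentLinkedQueue<>();
    private final List<String> persistentStore = new ArrayList<>(); // stand-in for a database

    // Fast path: store in memory and remember that the entry still has to be persisted.
    void write(long tweetId, String text) {
        memory.put(tweetId, text);
        pending.add(tweetId);
    }

    // Reads are served from memory and never wait for the persistent store.
    String read(long tweetId) {
        return memory.get(tweetId);
    }

    // Slow path: persists everything that is pending and returns the batch size.
    synchronized int flush() {
        int flushed = 0;
        Long id;
        while ((id = pending.poll()) != null) {
            persistentStore.add(memory.get(id));
            flushed++;
        }
        return flushed;
    }

    synchronized int persistedCount() {
        return persistentStore.size();
    }

    public static void main(String[] args) {
        WriteBehindStore store = new WriteBehindStore();
        store.write(1L, "hello");
        store.write(2L, "world");
        System.out.println("Readable before flush: " + store.read(1L));
        System.out.println("Persisted in batch: " + store.flush());
    }
}
```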

Scaling tweet reads:

To those familiar with messaging systems, at first glance Twitter looks like a classic publish/subscribe application. A closer look, however, reveals that any attempt to implement Twitter with something like a JMS message queue is going to fail to achieve a scalable system. This is especially true if you consider that the system needs to maintain a durable queue for each user. That could easily lead to a scenario in which each tweet is published to thousands of subscribers and every re-tweet can potentially lead to a "message storm".

As I discussed above, the right way to think about this type of application is as a blackboard pattern, just as a blackboard (or these days, a whiteboard) is used by a group of people (followers, in the case of Twitter) to share information and collaborate. When someone writes something on the board, everyone sees it and can choose to react. Unlike messaging (take email for example), we don’t need to send separate messages to each subscriber. Instead everyone is looking at the same board. Everything is also copied from the board to paper. When the board runs out of space, we erase it. And we can always page through the paper copy to access the board history.

In Twitter, this means that each follower is basically polling for messages that the users they follow have posted since the last time they read them. To make things more tangible we can express this type of query with the following SQL syntax:

SELECT * FROM Post WHERE UserID=<id> AND PostedOn > <from date>.

The <from date> will normally be within the last few minutes, if we're constantly looking for new messages.

But there's a caveat. Remember that we partitioned the application by user-id? This means that each user's tweets are stored in a separate partition. How can we read all users' posts? If we poll for each user individually, we will end up with a lot of network calls. The simplest approach would be to execute one call that looks for updates (new tweets) from ALL the users we're following. The pattern we'll use to perform this task is MapReduce. One way to do that with GigaSpaces is through the distributed task API.


The distributed task API is a modern version of the stored procedure. The following snippet shows what such a call would look like:

AsyncFuture<Long> future = gigaSpace.execute(new GetTweetsUpdates());
long result = future.get(); // result will be the number of primary spaces

The GetTweetsUpdates() class contains code that will be injected into each partition and will enable us to look for updates from the users we follow in a single call. Because the call runs in-process, and because the data is stored in-memory, executing such a task is extremely fast compared with the equivalent database stored-procedure operation. Execution is aggregated to the caller implicitly. The caller can use a reducer to aggregate the results into a single result object.
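The scatter-gather idea behind such a task can be sketched without any GigaSpaces classes. In the sketch below, which is an assumption-laden illustration rather than the distributed task API, each partition is a plain in-memory list, the "map" step filters that list next to the data, and the "reduce" step merges the partial results. All class and method names are hypothetical, and the sketch assumes no writes happen while a query is running.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Set;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// A tweet as stored in a partition (hypothetical shape, immutable).
class StoredTweet {
    final String userId;
    final long postedOn; // epoch milliseconds
    final String text;

    StoredTweet(String userId, long postedOn, String text) {
        this.userId = userId;
        this.postedOn = postedOn;
        this.text = text;
    }
}

// One partition: its in-memory data plus the "map" step that runs next to that data.
class TweetPartition {
    private final List<StoredTweet> tweets = new ArrayList<>();

    void write(StoredTweet tweet) {
        tweets.add(tweet);
    }

    // Local equivalent of the SQL query above, for a whole set of followed users.
    List<StoredTweet> updatesFor(Set<String> followed, long since) {
        List<StoredTweet> result = new ArrayList<>();
        for (StoredTweet tweet : tweets) {
            if (followed.contains(tweet.userId) && tweet.postedOn > since) {
                result.add(tweet);
            }
        }
        return result;
    }
}

class FollowedUpdatesQuery {
    // Sends the same query to every partition in parallel, then merges the partial results.
    static List<StoredTweet> execute(List<TweetPartition> partitions, Set<String> followed, long since) {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, partitions.size()));
        try {
            List<Callable<List<StoredTweet>>> tasks = new ArrayList<>();
            for (TweetPartition partition : partitions) {
                tasks.add(() -> partition.updatesFor(followed, since)); // map step
            }
            List<StoredTweet> merged = new ArrayList<>();
            for (Future<List<StoredTweet>> partial : pool.invokeAll(tasks)) {
                merged.addAll(partial.get()); // reduce step
            }
            merged.sort(Comparator.comparingLong((StoredTweet t) -> t.postedOn));
            return merged;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while querying partitions", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a partition task failed", e);
        } finally {
            pool.shutdown();
        }
    }
}
```

The caller issues one logical call regardless of how many users it follows or how many partitions exist, which is the property the distributed task approach is after.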

Scaling the web front-end

Nothing really new here. We'll use a classic web front-end, consisting of a load-balancer and a cluster of web servers that act as a front end to our IMDG instances. The web application will use a single cluster-aware IMDG proxy to send new tweet posts. The IMDG proxy will be responsible for mapping each tweet to the partition that hosts it. That logic is kept completely out of the application code. This allows us to keep our web tier clean and simple.

Keeping the web layer stateless to avoid session stickiness

One common pattern for keeping the web tier scalable is to use a Shared-Nothing Architecture, which basically means that the web tier will be stateless. This requires keeping the user session state external to the web-tier. As previously demonstrated, the IMDG can be used as a high-performance, scalable data store for maintaining shared session state information. This allows us to avoid session stickiness and to scale the web tier without being locked in to a specific server throughout the entire session, for example when that server is overloaded.
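The following sketch illustrates the idea with hypothetical classes: a ConcurrentHashMap stands in for the IMDG, and two web servers that hold no state of their own serve the same session interchangeably. It is a simplified illustration, not GigaSpaces' session-sharing implementation.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Stand-in for the IMDG: a shared store that lives outside any single web server.
class SharedSessionStore {
    private final Map<String, Map<String, String>> sessions = new ConcurrentHashMap<>();

    Map<String, String> session(String sessionId) {
        return sessions.computeIfAbsent(sessionId, id -> new ConcurrentHashMap<>());
    }
}

// A web server that keeps no session state of its own.
class StatelessWebServer {
    private final String name;
    private final SharedSessionStore store;

    StatelessWebServer(String name, SharedSessionStore store) {
        this.name = name;
        this.store = store;
    }

    // Records the page being visited and reports which page the session visited before it.
    String handle(String sessionId, String page) {
        Map<String, String> session = store.session(sessionId);
        String previous = session.put("lastPage", page);
        return name + ": previous page was " + (previous == null ? "none" : previous);
    }

    public static void main(String[] args) {
        SharedSessionStore store = new SharedSessionStore();
        StatelessWebServer web1 = new StatelessWebServer("web-1", store);
        StatelessWebServer web2 = new StatelessWebServer("web-2", store);
        // The load balancer is free to send each request of the session to a different server.
        System.out.println(web1.handle("session-42", "/home"));
        System.out.println(web2.handle("session-42", "/timeline"));
    }
}
```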

For more information on how to scale the web tier, as well as other important capabilities such as self-healing and auto-scaling, see the following tutorial: Scaling Your Web Application.

Making it simple and cost-effective using cloud computing

Twitter is yet another example of a situation in which system load is highly variable and the difference between average load and peak load can be quite significant. In such cases, provisioning our system can be fairly hard and costly. This is where cloud computing and SLA-driven deployments can help us scale on demand and pay only for what we use.

Once we have figured out a way to partition the application, it becomes much simpler to package the application into self-sufficient units (referred to in GigaSpaces as processing-units) and scale the application simply by adding or removing these units on demand. You can learn more about this here.

Final words

Scaling a real-time web application such as Twitter or Facebook introduces unique challenges that are quite different from those of a "classic" database-centric application. The most profound difference is the fact that, unlike traditional sites, Twitter is a heavy read/write application, and not read-mostly. This seemingly minor difference can break most existing models for web application scalability. Using a combination of memcached + MySQL is not going to cut it for this type of application.

The good news is that with the right patterns and set of tools, building a scalable architecture that meets such challenges isn’t that difficult.  There are already plenty of success stories that demonstrate that, such as the following example from highscalability.com: Handle 1 Billion Events Per Day Using a Memory Grid

The proposed architecture is by no means perfect and can be further optimized to achieve even better performance and lower latency, but that will come at the cost of simplicity. I believe that the proposed architecture should get you pretty far as-is. Avoid more advanced optimizations until they become an absolute must.



January 09, 2009

Getting ready for the cloud

In the past few months, I have been speaking at various conferences about cloud computing.
In my presentations, I tried to focus primarily on how one can take practical steps to benefit from the cloud today. To illustrate my points, I used GigaSpaces as the scale-out application server and Amazon EC2 as the cloud infrastructure, and thus demonstrated how one can deploy EXISTING Java applications on the cloud while:

  • Not having to re-write your application
  • Preventing lock-in to a specific cloud provider
  • Enabling seamless portability between your local environment and the cloud environment
    • No code or configuration change is required between the two environments
    • Develop local - test on the cloud
    • Built for iterative development

In addition, I demonstrated the use of our new Cloud tools framework, which enabled me to fully automate the entire application provisioning process on the cloud. This tool enabled me to:

  • Deploy load balancer
  • Cluster web containers
  • Cluster application processing units
  • Deploy a MySQL database connected to EBS

And all that through a single-click deployment.

In the demonstration, I also showed how you can benefit from the cloud:

  • Dynamically scale when the load breaches a certain threshold
  • New web containers are dynamically linked to the load-balancer (no configuration change is required as all is taken care of on the fly).
  • Self healing - if something breaks, scale down to the existing machines while getting a new machine ready.

The entire presentation can be viewed online 


You can also view the online video presentation that I gave at the Cloud Summit in Tel Aviv, where I ran a live demo of our new cloud infrastructure on EC2, demonstrating both dynamic scaling and what happens when a machine fails, in front of a live audience!


The video recording of this presentation is available here



If you want to try to run the demo yourself and you don't have an Amazon account, just drop an email to cloud at gigaspaces dot com and we will send you a free pass to the early access program.


Enjoy!

December 22, 2008

Reducing latency with Sun Real Time JVM

Frederic Pariente, Engineering Manager at Sun Microsystems, posted an interesting summary of a case study with GigaSpaces on the Sun blog: Gigaspaces curbs latency outliers with Java Real Time

"In the context of a customer proof-of-concept this summer and in the light of the 2.0 release of Java Real Time System --JRTS 1.0 had the bad prerequisite of source code changes--, Gigaspaces revisited the opportunity for Java Real Time to serve the low-latency requirements of trading applications. Gigaspaces XAP 6.5, Solaris 10 and both Java 5.0 standard and real-time JVMs were used for the benchmark. The test scenario included a trade matching engine and multiple clients injecting messages at extreme speed. The success criteria was to get guaranteed latency per message under 10 msec, with no code modification to the matching engine"

"..The first lesson learned was that msec latency was achievable with the standard JVM, through some advanced tuning of the JVM command-line options. While the customer had reported application freezes up to 20 sec during garbage collection under heavy load --he was running the JVM with no particular flags, unfortunately default JVM options optimize for throughput--, latencies could be brought down to milliseconds by switching to the Concurrent Garbarge Collector"

"..The second lesson learned was that the number of outliers can be reduced by an order of magnitude by using the real-time JVM. At a small cost in terms of application throughput --lower-- and CPU usage --higher-- of course"


You can see the full detailed benchmark and JVM options in the original post.

How does the Real-Time JVM work?



For those who are not familiar with the Real-Time JVM, Frederic points in his post to a very detailed presentation, Java Real Time for latency-critical banking applications, which I'd recommend looking at. I took one slide from the presentation which I found useful for understanding the general concept behind the Real-Time VM.

[Figure: RT-Java slide showing critical real-time threads and garbage collection]

As can be seen in the diagram above, the RT-JVM introduces a new type of thread, "Critical RT threads". It makes sure that GC will not run while those threads are running, and thereby provides more predictable behavior.

Other references:

Latency is Everywhere and it Costs You Sales - How to Crush it - My Take

December 09, 2008

Latency is Everywhere and it Costs You Sales - How to Crush it - My Take

Over on HighScalability.com, Todd Hoff posted one of the most comprehensive articles on latency that I've read, titled Latency is Everywhere and it Costs You Sales - How to Crush it. It covers almost every aspect of latency, and is a must-read on the subject. Todd provides a good explanation of how Space-Based Architecture helps in reducing latency through collocation of tiers and by utilizing memory to remove the I/O bottleneck:

The thinking is that the primary source of latency in a system centers around accessing disk. So skip the disk and keep everything in memory. Very logical. As memory is an order of magnitude faster than disk it's hard to argue that latency in such a system wouldn't plummet.

Latency is minimized because objects are in kept memory and work requests are directed directly to the machine containing the already in-memory object. The object implements the request behavior on the same machine. There's no pulling data from a disk. There isn't even the hit of accessing a cache server. And since all other object requests are also served from in-memory objects we've minimized the Service Dependency Latency problem as well.

In this post I wanted to summarize my take-aways from Todd’s article and add some of my own thoughts based on my experience with GigaSpaces customers.

Sources for latency – is it the network or the software?

When discussing latency most people fall into one of two main camps: the "networking" camp and the "software architecture" camp. The former tends to think that the impact of software on latency is negligible, especially when it comes to Web applications.

Marc Abrams says "The bulk of this time is the round trip delay, and only a tiny portion is delay at the server. This implies that the bottleneck in accessing pages over the Internet is due to the Internet itself, and not the server speed."

The "software architecture" camp tends to believe that network latency is a given and there is little we can do about it. The bulk of latency that we can control lies within the software/application architecture. Dan Pritchett's Lessons for Managing Latency provides guidelines for an application architecture that addresses latency requirements using loosely-coupled components, asynchronous interfaces, horizontal scale from the start, active/active architecture and by avoiding ACID and pessimistic transactions.

So is it the network or the software?

The simplest way to answer this question is to run a mockup test that removes the impact of the software on latency from the equation.

Global optimization vs Local optimization

To better understand latency optimization we can use the analogy of a plant production line. We have a big pipeline of things that need to get done and we need to look at each element in the pipeline to optimize our production latency. In an earlier post - Moving to Extreme Transactions Processing using Lean methodology - I discussed how we can apply the same principles used in manufacturing line optimization, such as the Lean methodology, to software systems. I tried to illustrate the applicability of some of the core principles of lean methodology. Here’s a recap:

In many cases, we can get more bang for the buck by looking at an extended value-stream, as opposed to a localized one. Local optimization means digging into the latency path in a specific component in our system. With global optimization, however, we look at the entire pipeline and optimize at that level.

An example of local optimization would be looking at our messaging system and lowering the latency of sending a message from point A to point B. An example of global optimization would be looking at the end-to-end transaction. Processing a typical transaction involves sending messages through a messaging system, consuming them, and then updating the database. If we collocate the message queue with the data receiver, we can easily eliminate half of the network hops. Additionally, if we use the same storage for messaging and data, we can avoid the 2-phase commit overhead. At the same time, we can analyze the user experience to see how many clicks it takes to perform a given operation. By reducing the number of clicks we can reduce the perceived latency much more than we can by reducing the time it takes to process each click. It’s easy to see how we can get better latency savings by taking the global optimization view. Global optimization often has much more room for optimization than a local one.

If our system is already designed with a scale-out model, adding more machines and spreading the load is much simpler than trying to apply local optimizations.

Scalability as a major source of latency

One topic that is often missing or less understood in many latency discussions is the impact of scalability on latency.

Todd writes: "We put shards in parallel to increase capacity, but request latency through the system remains the same". This statement is a common fallacy. It assumes that each request is completely independent of the others. In reality, however, if the application is not designed with a scale-out/share-nothing approach, then at some point it will hit shared contention, which makes those supposedly parallel requests dependent on one another. Contention happens when multiple concurrent users or business requests hit a shared resource at the same time. A shared resource might be a hardware resource -- such as CPU, memory or disk -- or a software resource, such as a shared database lock. Shared resources need to be freed before another request can be processed through them. This contention time is proportional to the number of concurrent attempts to consume the shared resource and the duration for which the resource is locked. This is one of the basic principles of Amdahl’s Law, which shows that for a request that spends 10% of its time on a shared lock, processing capacity can never improve by more than 10x, and even a 100x increase in CPU power yields only about a 9x improvement. This contention time, therefore, must be added to our “latency path”. In a non-scalable system this will be proportional to the number of concurrent requests, meaning it will rapidly lengthen as the system load increases, up to the point at which the system will face “resource starvation”. (See further discussion of this here.)
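Amdahl's Law is easy to evaluate for the 10% case discussed above. The sketch below is only a worked calculation with hypothetical class and method names; it uses the standard form of the law, speedup = 1 / (s + (1 - s) / n), where s is the fraction of time spent on the shared (serialized) resource and n is the factor by which the rest of the work is sped up.

```java
// Worked example of Amdahl's Law; names are hypothetical.
class AmdahlsLaw {
    // serialFraction: share of the work serialized on a shared resource (between 0 and 1).
    // resources: factor by which the parallel part is sped up (for example, number of CPUs).
    static double speedup(double serialFraction, double resources) {
        return 1.0 / (serialFraction + (1.0 - serialFraction) / resources);
    }

    // Upper bound on the speedup as resources grow without limit.
    static double maxSpeedup(double serialFraction) {
        return 1.0 / serialFraction;
    }

    public static void main(String[] args) {
        double serial = 0.10; // 10% of the request time is spent on a shared lock
        for (int resources : new int[] {1, 10, 100, 1000}) {
            System.out.printf("%5dx resources -> %.2fx capacity%n", resources, speedup(serial, resources));
        }
        System.out.printf("Upper bound: %.1fx%n", maxSpeedup(serial));
    }
}
```

The curve flattens quickly, which is why removing the shared resource (partitioning) pays off more than adding CPU power behind it.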

Based on my experience, hardware and software contentions are some of the main contributors to latency. This is partly due to the fact that network overhead latency is relatively fixed, while application overhead latency is variable. It is extremely complex to design a fully-optimized software application.

Scalability happens to be one of those things that is often implemented in a non-optimized manner and, as mentioned above, leads to latency. The only way we can reduce the scalability overhead on latency is by reducing the contention points in our application. The typical method for reducing the contention is through “sharding” (partitioning) the access to those resources using a share-nothing approach. Disks are less concurrent than memory, and therefore, removing the dependency on disk access in the critical path of the operation is one of the keys to a latency reduction strategy. This is one of the key principles of Space-Based Architecture.

The impact of peak load provisioning on the latency cost

Another source of latency is related to provisioning. Many web sites use static provisioning based on peak load. But how do we measure peak load? With the introduction of social networks, and phenomena such as the Digg Effect, it becomes extremely hard to predict peak loads, as user traffic is subject to “viral behavior” leading to sudden spikes in traffic. The further ahead we try to plan, the greater the chance of missing the target. This will lead to one of two outcomes: 1) Over-Provisioning -- in which case latency is not harmed, but we unnecessarily pay the cost of more servers and other resources than we normally need. 2) Under-Provisioning -- in which case our site may significantly slow down or even crash.

Use on-demand scaling to smooth the latency peaks

If we can't predict the peak loads accurately, we need to scale the system rapidly whenever we see it is approaching capacity. If the system was not designed for scale-out (linear scalability), the process of scaling will involve a substantial amount of work and tuning, which is time-consuming and therefore defeats the purpose.

A scale-out approach enables us to scale on demand and smooth out the impact of load spikes on application latency by adding servers when the load is up and removing unnecessary ones when the load is reduced. In this way we can cost-effectively control latency.
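As a rough illustration of such a rule, the sketch below decides whether to add or remove a server based on measured latency. Everything in it is an assumption made for the example: the class, the thresholds and the use of two separate thresholds are hypothetical and do not describe the GigaSpaces SLA mechanism.

```java
// Latency-driven scaling decision; thresholds and names are hypothetical.
class ScalingPolicy {
    enum Action { SCALE_OUT, SCALE_IN, HOLD }

    private final double latencySlaMillis;   // scale out when measured latency exceeds this
    private final double scaleInBelowMillis; // scale in when measured latency falls below this
    private final int minServers;            // never scale in below this number of servers

    ScalingPolicy(double latencySlaMillis, double scaleInBelowMillis, int minServers) {
        if (scaleInBelowMillis >= latencySlaMillis) {
            throw new IllegalArgumentException("scale-in threshold must be below the SLA");
        }
        this.latencySlaMillis = latencySlaMillis;
        this.scaleInBelowMillis = scaleInBelowMillis;
        this.minServers = minServers;
    }

    Action decide(double measuredLatencyMillis, int currentServers) {
        if (measuredLatencyMillis > latencySlaMillis) {
            return Action.SCALE_OUT; // SLA breached: add a server
        }
        if (measuredLatencyMillis < scaleInBelowMillis && currentServers > minServers) {
            return Action.SCALE_IN;  // plenty of headroom: remove an unnecessary server
        }
        return Action.HOLD;
    }

    public static void main(String[] args) {
        ScalingPolicy policy = new ScalingPolicy(200.0, 50.0, 1);
        System.out.println(policy.decide(250.0, 2)); // SCALE_OUT
        System.out.println(policy.decide(30.0, 2));  // SCALE_IN
        System.out.println(policy.decide(100.0, 2)); // HOLD
    }
}
```

Keeping a gap between the two thresholds is a deliberate choice in this sketch: it avoids adding and removing servers repeatedly when latency hovers around a single limit.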

Cloud computing and virtualization enable us to build such an “elastic computing” model with significantly less effort than previously necessary. For example, the GigaSpaces Cloud Framework already supports on-demand scalability for web containers.

My colleague Shay Hasidim posted a latency benchmark that measured how low latency is maintained by increasing the number of servers.

[Figure: latency versus number of concurrent users for one, two and three web servers]

From the results above we see that as we increase the number of web servers, the point at which system contention (the scalability barrier) sets in moves to a higher number of concurrent users. With a single server, latency increases starting with 100 concurrent users; with two servers, at 300 concurrent users; and with three servers, at 500 concurrent users.

To ensure linear scalability on the web-tier, we must ensure that the underlying data-tier scales out at the same level, as can be seen in the diagram below. In this case we used GigaSpaces’ In-Memory Data Grid as a front-end to a MySQL database.


[Figure: IMDG scalability compared with theoretical linear scalability]

With the graph above we see that the IMDG scales very close to theoretical linear scalability. The above results were achieved with an IMDG running on 2 partitions. Better scalability can be achieved by increasing the number of partitions.

Read more about how to scale-out the data-tier in Scaling-out MySQL.

To enable this level of on-demand scalability we used our new Cloud Framework, which combines the GigaSpaces SLA-driven container as the application deployment virtualization layer, Amazon EC2 as the machine level virtualization layer, and the GigaSpaces application server as the middleware virtualization layer. This way we can provision new machines as soon as the SLA on the web-tier is breached (measuring latency, in this specific case). When such an event happens we launch new machine instances on EC2. A new web container is provisioned on these machines through the GigaSpaces SLA-driven deployment system. An Apache load-balancer agent is responsible for synchronizing the load-balancer whenever a new web container joins the cluster. Using this approach we can achieve end-to-end dynamic scalability, starting from the load-balancer, through the web-tier and business-tier, and ending with the data-tier.

It is important to note that while this test was performed on EC2, there is nothing that binds the solution specifically to the EC2 environment. In fact, we used the exact same model to enable dynamic scaling on private clouds using GigaSpaces and the Sun Grid Engine, for example. A more detailed description of that is available here.

Data Query latency

Query latency is the time it takes to process a query request and receive the result. There are a few factors that influence query latency:

  1. The time it takes to access the data (read it from file in case it is stored on disk)
  2. Contention - the time spent on a shared lock to access the data
  3. Complexity of the query - the number of calls involved in executing the query

We can address each of those issues as follows:

  1. File systems are not optimized for concurrent access. In addition, file systems are stream-based systems that enforce serialization and de-serialization of the data every time we wish to access it. An easy way to eliminate this overhead is to put the data in memory, which enables access to it using a direct reference. (See further details in InfoQ Article - RAM is the new disk).
  2. We can reduce contention by partitioning the data, which also results in partitioning the lock. Putting the data in-memory also reduces contention because memory is much more concurrent than disk, and we don't need to inherit global file system locks. Instead, each data item can have its own locking. This will enable much more concurrent access to our data.
  3. Collocating the business logic with the data - We can reduce the number of remote calls required for each query using a stored procedure approach, meaning the business logic runs collocated with the data. For partitioned data we will need to use a MapReduce-like pattern to enable execution of the queries on distributed data sources. The fact that our data source is now partitioned enables us to reduce the time it takes to query compared with running the same query in a centralized database for the following reasons:
    • The data-set per partition is smaller; and
    • We can leverage the full capacity of the CPU/memory of each partition to get more power to process the query

Garbage collection impact on latency

Another source of latency that I found missing in Todd's article is the impact of garbage collection. Garbage collection is used in any Java or .NET application. It runs as a background thread that cleans up all the unused objects in the JVM. In early versions of Java, the garbage collection implementation effectively held a synchronized block on the entire memory while it cleaned up those unused objects. The length of this hiccup depends on the size of the memory, the number of CPUs and the number of objects that are freed in each GC cycle. In those early versions of Java it was common practice to use object pooling as a way to reduce this hiccup. Object pooling basically bypassed the GC, and we had to take control of the object lifecycle in our code. Having said that, object pools themselves became a shared resource and a source of contention. As of Java 5 the GC algorithm was improved to enable more concurrent garbage collection. This means that the hiccup time is spread out over time and therefore has less impact on our application's peak performance. This holds true as long as we have enough CPU cycles to spend on GC. The caveat is that if our application consumes 100% of the CPU, none of this optimization is going to help: when the GC hits our system it will compete with our application for CPU time, and the end result is long hiccups again. A real-time VM aims to address this problem by dividing CPU cycles between the application threads and the GC threads in a deterministic manner, i.e. it will slow down our application in some cases to ensure that the GC gets enough cycles, and in return provides more predictable latency at the expense of throughput.
The JVM comes with various switches that give better control over GC behavior and provide a means to adjust GC time. One thing to note about GC optimization is that it tends to be close to a voodoo art: it works in certain scenarios and breaks in others, so it is very hard to find the right combination.
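
Before touching any switches, it helps to measure how much time the GC actually takes. The following is a minimal sketch using the standard `java.lang.management` API; the class name `GcOverhead` and the idea of expressing GC time as a fraction of elapsed time are my own assumptions for illustration, not a prescribed method.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

/** Hypothetical helper that reports how much time the JVM has spent in GC. */
final class GcOverhead {

    /** Approximate accumulated GC time in milliseconds, summed over all collectors. */
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long millis = gc.getCollectionTime(); // -1 when the collector does not report it
            if (millis > 0) {
                total += millis;
            }
        }
        return total;
    }

    /** Fraction of an elapsed interval that was spent in GC. */
    static double gcFraction(long gcMillis, long elapsedMillis) {
        if (elapsedMillis <= 0) {
            throw new IllegalArgumentException("elapsedMillis must be positive");
        }
        return (double) gcMillis / elapsedMillis;
    }
}
```

Sampling `totalGcMillis()` at the start and end of a load test and passing the difference to `gcFraction` gives a rough picture of whether GC is worth tuning at all.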

Avoiding GC hiccups - Avoid over utilization

Based on my experience, the simplest and most effective way to avoid GC hiccups is to avoid over-utilization. This means that we need to plan our system so that it doesn't consume more than 80% of the CPU and memory under peak loads (you can choose a different threshold that may be more appropriate for your organization). For example, if each server can process up to 500 requests per second at 100% utilization, and we have a requirement to process 1000 requests/sec, it is better to provision three machines, each processing roughly 330 requests/sec, rather than two machines that are each maxed out at 500 requests/sec. We also need to make sure that we have the right proportion between memory and CPU. For read/write-intensive applications, I would go with at least one CPU per 2GB, and if possible, even one CPU per 1GB. These rules of thumb should get you most of what is needed in terms of latency. Obviously, if that’s not enough, then you need to dig deeper into the GC flags or consider a real-time JVM, but you should use those options as a last resort.
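
The provisioning arithmetic above can be written down as a small helper. This is a sketch only; `CapacityPlan` and its parameter names are hypothetical, and the 80% target is the rule of thumb from the text rather than a fixed constant.

```java
/** Hypothetical capacity-planning helper for the utilization rule of thumb. */
final class CapacityPlan {

    /**
     * Number of machines needed so that no machine runs above targetUtilization
     * (for example 0.8) at the required peak load.
     */
    static int machinesNeeded(double requiredPerSec, double maxPerMachinePerSec, double targetUtilization) {
        if (requiredPerSec < 0 || maxPerMachinePerSec <= 0
                || targetUtilization <= 0 || targetUtilization > 1) {
            throw new IllegalArgumentException("invalid capacity parameters");
        }
        double usablePerMachine = maxPerMachinePerSec * targetUtilization;
        return (int) Math.ceil(requiredPerSec / usablePerMachine);
    }
}
```

With the numbers from the example, `machinesNeeded(1000, 500, 0.8)` yields 3: each machine has 400 usable requests/sec, so 1000 / 400 = 2.5 rounds up to three machines, each handling roughly 333 requests/sec.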

A note on 64-bit VM provisioning:

Lately I've seen some cases where people thought that they could use 64-bit machines and large heap sizes to reduce the cost of the system (mostly software license costs and machine maintenance fees). The assumption was that because each 64-bit process can manage more memory, they could use fewer machines and fewer processors. What they didn't take into account are the considerations I presented above. The number of CPUs needs to be proportional to the amount of memory, and not just to the number of VM processes running the IMDG. This means that using 64-bit VMs can reduce the number of machines, but might have almost no impact on the number of CPUs the system will use. As for the amount of memory that each process can handle - that number tends to vary widely, so I don't feel comfortable giving a concrete number other than to say that I know of systems using the GigaSpaces product with 8GB per process.

The cost of latency

Everything we do has a $ value associated with it. Latency is no different. Todd mentioned in his post some of the issues related to latency cost.

The cost associated with losing users due to a bad user experience – this measurement is typical for e-commerce, social networking and search engine sites: "Amazon found every 100ms of latency cost them 1% in sales. Google found an extra .5 seconds in search page generation time dropped traffic by 20%."

The cost associated with losing trades – in this case the cost is a measure of the chance of losing business when your competitor can trade faster than you do: “A broker could lose $4 million in revenues per millisecond if his electronic trading platform is 5 milliseconds behind the competition.”

Another aspect that was not mentioned is the operational cost of achieving latency targets. This cost factor applies to latency in the same way it applies to the other operational costs of scalability.

The cost of over-provisioning – if the system was not designed for on-demand scaling, then we are probably spending money on over-provisioning: the system is statically provisioned with more machines than we actually need on average, and we pay the cost of under-utilization (idle resources waiting for peak loads).

The cost of failure – if the system was under-provisioned, then we are likely to face the cost of downtime. According to a Forrester survey conducted with 235 organizations, 33% estimate the hourly cost of downtime at $10k-$100k, 25% at $100k-$500k, and 13% at $500k-$1M.

How cloud computing can help to improve latency and save some of the latency cost

  1. Built for on-demand scalability – cloud computing is a great enabling infrastructure built for on-demand scaling.
  2. Geographically distributed – we can improve latency by running our servers close to the geographical location of the user. Quoting Todd’s article again: "Facebook opened a new datacenter on the east coast in order to save 70 milliseconds"

We now have data centers spread around the globe at our disposal, which makes it easy to run our applications in those different locations and point each user to the closest one, at a fraction of the cost.

It is true that all this can be achieved without cloud computing. But cloud computing has lowered the barrier to entry so that even the smallest startup can apply these optimizations, previously considered a luxury that only big companies could afford.

My 20/80 rules for achieving predictable latency

I'm sure that many readers are aware that out of the many possible sources of latency, there are some that are beyond our control: Internet routers, for example. One of the key questions I ask myself in relation to latency is whether there is a 20/80 rule: what are the 20% of the things I should focus on that will help me reduce 80% of the latency? In this section I'll try to provide the guidelines I use for designing a system for optimum latency.

  1. Focus on application architecture and leave hardware and OS optimizations as a last resort. The performance provided by commodity hardware should be good enough for 80% of cases. In addition, the effort of optimizing hardware and Internet routers might involve a huge investment, and therefore, should be used sparingly. If you’re not sure whether the source of latency in your application is the network or the software, run the tests I mentioned above.
  2. Start with global optimizations – before you begin to optimize the database, the router and the messaging system, look at the entire pipeline of your business request. By looking at that global level, you may find that parallelizing some part of the request, or changing some of the reliability/consistency requirements, may have a much bigger impact than any local optimization.
  3. Use Space-Based Architecture principles (even if you’re not using GigaSpaces) – quoting Todd again:
    1. Co-location of the tiers (logic, data, messaging, presentation) on the same physical machine (but with a shared-nothing architecture so that there is minimal communication between machines)
      1. Assemble/collocate your application components based on their runtime/execution flow dependencies and not based on their function in the system. For example, if each request needs to go through various steps such as parsing, validation, matching and execution, it doesn't make sense to run each of those steps in a separate process/tier. Instead, you can make sure that all of those steps are collocated, and split the application into multiple units, each containing all of those components. In SBA we refer to those units as processing units. This is probably one of the main differences between Space-Based Architecture and tier-based architecture. In a tier-based approach our application is broken down into a presentation tier, a business logic tier and a data tier, whereas in SBA we tend to collocate all of those tiers as much as possible and split the application into multiple horizontal units, each containing all the tiers, to reduce the number of moving parts and network hops.
    2. Co-location of services on the same machine
    3. Maintaining data in memory (caching)
    4. Asynch communication to a persistent store and across geographical locations 
      1. Avoid calling any disk/database operation on the critical path of the execution. With the addition of a data grid we can use in-memory data as the system of record. This enables us to avoid database or file access during the critical path of the user request and delegate the update of the database to an asynchronous operation.
      2. To Todd's summary you can add the other pieces of query optimization that I laid out above, such as the use of Map/Reduce and moving the logic to where the data is located.
  4. Design your system for dynamic scalability – Dynamic scalability doesn't necessarily mean that scaling needs to happen in real time. It means that scaling can be done without changing code and that the scaling cycle is short. In many real-life scenarios “short” could mean a day or even a week.
  5. Provision correctly - Avoid over-utilization.
  6. Other tips for optimizing the architecture:
    1. Decouple application components - Use SOA and EDA to make your application granular enough so that you can easily change the way you assemble the different components of your system without code changes. This flexibility is important as it will allow you to decide at different stages what your business logic pipeline is going to look like. It will allow you to optimize later, such as collocating elements that have strong dependencies among them from a business perspective. 
    2. Abstract your communication layer - Abstracting the network layer enables latency reduction when the components are collocated. This abstraction is also important to enable easy plug-ins of different transports without changing code. Assume that in the future new protocols, transports and other technologies will be introduced. By decoupling your code from the transport you can easily plug them in when they become available.
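
To make the "asynchronous communication to a persistent store" guideline concrete, here is a minimal write-behind sketch. It assumes a single process, uses a `ConcurrentHashMap` as a stand-in for the data grid, and takes the slow store as a callback; `WriteBehindStore` and its method names are hypothetical and not a GigaSpaces API. A real data grid would also replicate the in-memory copy so that a crash before the flush does not lose the update.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.BiConsumer;

/** Hypothetical write-behind store: memory is the system of record, the database is updated later. */
final class WriteBehindStore<K, V> {
    private final Map<K, V> memory = new ConcurrentHashMap<>();
    // A single writer thread keeps the updates in the order they were made.
    private final ExecutorService writer = Executors.newSingleThreadExecutor();
    private final BiConsumer<K, V> slowStore;

    WriteBehindStore(BiConsumer<K, V> slowStore) {
        this.slowStore = slowStore;
    }

    /** Critical path: update memory only and queue the slow write. */
    void put(K key, V value) {
        memory.put(key, value);
        writer.execute(() -> slowStore.accept(key, value));
    }

    /** Reads are served from memory and never touch the slow store. */
    V get(K key) {
        return memory.get(key);
    }

    /** Waits for queued writes to finish; returns false on timeout or interruption. */
    boolean flushAndClose(long timeoutMillis) {
        writer.shutdown();
        try {
            return writer.awaitTermination(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

The caller of `put` pays only for the in-memory update; the cost of the database write moves off the request path.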


In most cases, following these steps gets you most of what you need. It also provides a good basis for eliminating many of the factors that make latency optimization on other layers more difficult. For example, if we run the business logic in a collocated in-process mode we isolate the impact on our code from external factors such as routers. It also provides a good model for troubleshooting and optimizing the system in case latency goes wrong. Rather than dealing with latency as a big-bang project, we should break it down into the levels that enable us to deal with the latency problem in a more gradual manner.

November 27, 2008

Data aggregation pattern for effective monitoring

In my previous post I wrote about two patterns for using a GigaSpaces cluster to solve some of the issues involved in managing distributed applications:

  • Using the space as a scalable alternative to a directory service. With this approach each service publishes itself to the space directory service and a JMX proxy acts as a wrapper that virtualizes the different managed services from the client accessing them.
  • Using the space as a management repository aggregating management data.

Steve Colwill from PSJ published a second post on this topic titled Data Aggregation via JMX and the Grid that covers the second option outlined above.  In summary, Steve suggests that each managed agent will report its aggregate statistics to the space. A JMX façade is used to expose the statistics through a standard JMX API as described in the diagram below:

[Diagram: Aggregating data]

If you think about it, this is yet another proof of the old aphorism, popularized by Butler Lampson and attributed by him to David Wheeler, that "All problems in computer science can be solved by another level of indirection".

The thing about indirection, though, is that it basically moves the problem from one layer (the application layer or management layer in our specific case) to another layer (the infrastructure layer). Now if that infrastructure layer is not in place, we have to create it ourselves, which makes Butler's statement kind of empty. This is where Space-Based Architecture comes in handy. It serves as a general-purpose infrastructure layer that provides a means for solving data distribution, high availability and scalability. In this case we applied the abstraction principle as a way to expose the generic space capabilities through a specific set of APIs (JMX in this case). The combination of the two creates what I often refer to as a middleware virtualization layer. We use the same API, but the implementation of this API is virtualized. In this specific case, our JMX API doesn't point to a physical JMX server; instead it points to a virtualized cloud of servers.

Quoting Steve: 

One of the reasons I'm a fan of GigaSpaces and space-based architectures is that a number of architectural choices that are traditionally hard-wired: transactional/non-transactional, sync or async replication can be changed through configuration only. This enables common design patterns (and therefore components) to be applied to a wide range of application problems, by enabling the data integrity/performance equation to be tweaked at a late stage of application assembly.

 

Summary

The main benefit of this approach compared to the one described in the previous post is that it decouples the client from the managed services. A client doesn't need to maintain a direct connection to the managed service. Most of the logic is kept server-side. In this way we can keep the client side -- which acts as the management façade -- stateless and thin.

Because the space acts as a distributed data-store, it is easier to build aggregated statistics, as all the statistics are placed in one logical entity - the space.

Decoupling enables us to easily add new services/policies to our system without changing the management service. For example, we can add SLA monitors that can listen to the statistics in the space and take action once a certain threshold is breached. This capability makes the space an ideal solution for those looking to build their next-generation management and monitoring application.
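
As a rough illustration of the SLA monitor idea, the sketch below uses an in-process map as a stand-in for the space. Agents report a statistic, and a listener fires an action when the aggregate crosses a threshold. `StatsSpace`, `SlaMonitor` and the choice of average CPU utilization as the statistic are all hypothetical; a real space would deliver these notifications across processes.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Consumer;

/** Hypothetical in-process stand-in for a space that holds the latest statistic per agent. */
final class StatsSpace {
    private final Map<String, Double> latestCpu = new ConcurrentHashMap<>();
    private final List<Consumer<StatsSpace>> listeners = new CopyOnWriteArrayList<>();

    void addListener(Consumer<StatsSpace> listener) {
        listeners.add(listener);
    }

    /** Called by each managed agent; listeners are notified after every report. */
    void report(String agentId, double cpuUtilization) {
        latestCpu.put(agentId, cpuUtilization);
        for (Consumer<StatsSpace> listener : listeners) {
            listener.accept(this);
        }
    }

    /** Aggregate over all agents; 0 when nothing has been reported yet. */
    double averageCpu() {
        return latestCpu.values().stream().mapToDouble(Double::doubleValue).average().orElse(0.0);
    }
}

/** Hypothetical SLA monitor: runs the action once each time the threshold is newly breached. */
final class SlaMonitor implements Consumer<StatsSpace> {
    private final double threshold;
    private final Runnable action;
    private boolean breached;

    SlaMonitor(double threshold, Runnable action) {
        this.threshold = threshold;
        this.action = action;
    }

    @Override
    public synchronized void accept(StatsSpace space) {
        boolean nowBreached = space.averageCpu() > threshold;
        if (nowBreached && !breached) {
            action.run(); // for example, ask the platform for another machine
        }
        breached = nowBreached;
    }
}
```

Note that the monitor is added without touching the agents or the management façade, which is the decoupling benefit described above.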

Cloud management frameworks (for Infrastructure-as-a-Service) are ideal candidates for this, as they have the need for proximity of the management information and application behavior. The management layer acts as a loopback mechanism that can tell the application how it is actually behaving. The application can use this information to adjust itself to meet a given SLA without human intervention.

Having said all that, it is also important to note that the two options presented in Steve's posts are not necessarily mutually exclusive. I could easily see how one could use the approach presented in this post for maintaining the managed dashboard information, and the approach in the previous post to invoke specific operations on a managed service, in case you want to drill down to the managed service level. Having one underlying technology that can be used to serve the invocation, virtualization and data virtualization is yet another benefit of using the space.

October 06, 2008

Making EDA programming simple with JeeWiz

Event Driven Architecture (EDA) is becoming more popular these days, as the drive for loosely coupled and scalable architecture forces us to break our systems into components and integrate them through some sort of workflow. Having said that, thinking in asynchronous events is not a trivial concept to deal with, since we are used to thinking and programming in a synchronous manner.

Space-Based Architecture lends itself very nicely to EDA, because it provides a means to register for events, manage the state of events and trigger different business logic elements based on state changes. This makes the programming of EDA relatively simple compared with some of the other options, such as messaging and database systems. The following diagram shows what a typical EDA would look like in a Space-Based world - you can read the full description here.

Typical EDA with Space Based Architecture
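
The core idea of triggering business logic on state changes can be sketched as follows. This is a deliberately simplified, synchronous, single-process illustration: `MiniSpace`, `Order` and the state names are hypothetical and this is not the GigaSpaces API. A real space would dispatch the events asynchronously and across processing units.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.UnaryOperator;

/** Hypothetical toy "space": handlers register for a state and are triggered by state changes. */
final class MiniSpace {

    /** Immutable event object whose state field drives the workflow. */
    static final class Order {
        final String id;
        final String state;

        Order(String id, String state) {
            this.id = id;
            this.state = state;
        }
    }

    private final Map<String, UnaryOperator<Order>> handlers = new HashMap<>();
    private final List<String> history = new ArrayList<>();

    /** Register business logic to run whenever an object in the given state is written. */
    void onState(String state, UnaryOperator<Order> handler) {
        handlers.put(state, handler);
    }

    /**
     * Write an object and keep dispatching it to the handler registered for its
     * current state until no handler matches. A handler must move the object to a
     * different state, otherwise this loop would never end.
     */
    Order write(Order order) {
        Order current = order;
        while (true) {
            history.add(current.id + ":" + current.state);
            UnaryOperator<Order> handler = handlers.get(current.state);
            if (handler == null) {
                return current;
            }
            current = handler.apply(current);
        }
    }

    List<String> history() {
        return history;
    }
}
```

Each handler knows only the state it reacts to and the state it produces, so steps can be added or reordered without the components knowing about each other.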

While Space-Based Architecture makes EDA relatively simple compared with the alternatives, it can be made even simpler using advanced code generation tools that follow the Model Driven Development pattern.

JeeWiz is one of the leading products in that space: 


"The goal of JeeWiz is to automate software development as much as possible. JeeWiz builds all the code, configuration and build jobs that can be derived from high-level models of a system, achieving unprecedented levels of automation."

Matthew Fowler, Founder and CEO, New Technology/Enterprise Ltd., gave a presentation at our latest London event introducing GigaSystemBuilder using JeeWiz, which enables model-driven development with GigaSpaces. JeeWiz is an Eclipse-based tool that makes it fairly easy to create an entire project. The product itself is highly customizable. Users can use the same model to build their own templates, and in this way automate a large part of their development. The following diagram, taken from Matthew's presentation, shows what a typical development process would look like with JeeWiz.

JeeWiz
Matthew's presentation contains more details about the specific integration with GigaSpaces and what the generated code would look like -- I would highly recommend looking into it. The presentation is available online here. I was also happy to see that the GigaSpacesBuilder Eclipse-plugin is now available for download here. It comes with full documentation and an easy guide to get you through the first steps.

Well done Matthew and the JeeWiz team!

