« February 2008 | Main | April 2008 »

March 2008

March 29, 2008

Scaling Out MySQL

With the recent acquisition of MySQL by Sun, there has been talk about the MySQL open source database now becoming relevant to large enterprises, presumably because it now benefits from Sun's global support, professional services and engineering organizations. In a blog post about the acquisition, SUN CEO Jonathan Schwartz wrote that this is one of his objectives.

Mysql_logoWhile the organizational aspects may have been addressed by the acquisition, MySQL faces some technology limitations which hinder its ability to compete in the enterprise. Like other relational databases, MySQL becomes a scalability bottleneck because it introduces contention among the distributed application components. 

There are basically two approaches to this challenge that I'll touch in this post:

1. Scale your database through database clustering

2. Scale your application, while leaving your existing database untouched by front-ending the database with In-Memory-Data-Grid (IMDG) or caching technologies. The database acts as a persistence store in the background. I refer to this approach as Persistence as a Service (PaaS).

While both options are valid (with pros and cons), in this post I'll focus mostly on the second approach, which introduces some thought-provoking ideas for addressing the challenge.

Disclaimer: While there are various alternative in-memory data grid products, such as Oracle Coherence and IBM ObjectGrid, in this post I'll focus on the GigaSpaces solution, because for obvious reasons I happen to know it better. Having said that, I try to cover the core principles presented here in generic terms as much as possible.

Scaling your database through database clustering:

There are two main approaches for addressing scalability through database clustering:

  • Database replication is used to address concurrent access to the same data. Database replication enables us to load-balance the access to the shared data elements among multiple replicated database instances. In this way we can distribute the load across database servers, and maintain performance even if the number of concurrent users increases.

            Limitations:

  • Limited to "read mostly" scenarios: when it comes to inserts and updates, replication overhead may be a bigger constraint than working with a single server (especially with synchronous replication)
  • Performance: Constrained by disk I/O performance.
  • Consistency: asynchronous replication leads to inconsistency as each database instance might hold a different version of the data. The alternative -- synchronous replication -- may cause significant latency.
  • Utilization/Capacity: replication assumes that all nodes hold the entire data set. This creates two problems:.1) each table holds a large amount of data, which increases query/index complexity. 2) We need to provision (and pay for) more storage capacity with direct proportion to the number of replicated database instances
  • Complexity: most database replication implementations are hard to configure and and are known to cause stability issues.
  • Non-Standard: each database product has different replication semantics, configuration and setup. Moving from one implementation to another might become a nightmare.
  • Database partitioning ("sharding"): database shards/partitions enable the distribution of data on multiple nodes. In other words, each node holds part of the data. This is a better approach for scaling both read and write operations, as well as more efficient use of capacity, as it reduces the volume of data in each database instance.

          Limitations:

  • Limited to applications whose data can be easily partitioned.
  •  Performance: we are still constrained by disk I/O performance
  •  Requires changes to data model: we need to modify the database schema to fit a partitioned model. Many database implementations require that knowledge of which partition  the data resides in is exposed to the application code, which brings us to the next point.
  •  Requires changes to application code: Requires different model for executing aggregated queries (map/reduce and the like).
  •  Static: in most database implementations, adding or changing partitions involves down-time and re-partitioning.
  •  Complex: setting-up database partitions is a fairly complex task, due to the amount of moving parts and the potential of failure during the process.
  •  Non-standard: as with replication, each database product has different replication semantics, configuration and setup. Partitioning introduces more severe limitations, as it often requires changes to our database schema and application code when moving from one database product to another.

Time for a change  -  is database clustering the best we can do?

The fundamental problems with both database replication and database partitioning are the reliance on the performance of the file system/disk and the complexity involved in setting up database clusters. No matter how you turn it around, file systems are fairly ineffective when it comes to concurrency and scaling. This is pure physics:  how fast can disk storage be when every data access must go through serialization/de-serialization to files, as well as mapping from binary format to a usable format? And how concurrent can it be when every file access relies on moving a physical needle between different file sectors? This puts hard limits on latency. In addition, latency is often severely affected by lack of scalability. So putting the two together makes file systems -- and databases, which heavily rely on them -- suffer from limited performance and scalability.

These database patterns evolved under the assumption that memory is scarce and expensive, and that network bandwidth is a bottleneck. Today, memory resources are abundant and available at a relatively low cost. So is bandwidth. These two facts allow us to do things differently than we used to, when file systems were the only economically feasible option.

Scaling through In Memory Caching/Data Grid

It is not surprising that to enhance scalability and performance many Web 2.0 sites use an in-memory caching solution as a front-end to the database. One such popular solution is memcached. Memcached is a simple open source distributed caching solution that uses a protocol level interface to reference data that resides in an external memory server. Memcached enables rudimentary caching and is designed for read-mostly scenarios. It is used mainly as an addition to the LAMP stack.

The simplicity of memcached is both an advantage and a drawback. Memcached is very limited in functionality. For example, it doesn't support transactions, advanced query semantics, and local-cache. In addition, its protocol-based approach requires the application to be explicitly exposed to the cache topology, i.e., it needs to be aware of each server host, and explicitly map operations to a specific node. These limitations prevent us from fully exploiting the memory resources available to us. Instead, we are still heavily relying on the database for most operations.

Enter in-memory Data Grids.

In-memory data grids (IMDG) provide object-based database capabilities in memory, and support core database functionality, such as advanced indexing and querying, transactional semantics and locking. IMDGs also abstract data topology from application code. With this approach, the database is not completely eliminated, but put it in the *right* place. I refer to this model as Persistence as a Service (PaaS). I covered the core principles of this model in this post. Below I'll respond to some of the typical questions I am asked when I present this approach.

How Persistence as a Service works?

With PaaS, we keep the existing databases as-is: same data, same schema and so on. We use a "memory cloud" (i.e., an in-memory data grid) as a front-end to the database. The IMDG loads its initial state from the database and from that point on acts as the "system of record" for our application. In other words, all updates and queries are handled by the IMDG. The IMDG is also responsible for keeping the database in sync. To reduce performance overhead, synchronization with the database is done asynchronously. The rate at which the database is kept in sync is configurable.

The in-memory data model can be different from the one stored in the database. In most cases, the memory-based data model will be partitioned to gain maximum scalability and performance, while the database remains unchanged.

Img1042


How does PaaS improve performance compared to a relational database?

Performance gains over relational databases are achieved because: 

  • PaaS relies on memory as the system of record, and memory is significantly faster and more concurrent than file systems.
  • Data can be accessed by reference, i.e., no need for continuous serialization of data, as with a file system.
  • Data manipulation is performed directly on the in-memory objects. Complex manipulation is easily achieved by running either Java/.Net/C++ code or a SQL query. There is no need for serialization/de-serialization of data or network calls during the process.
  • Reduced contention: instead of placing all data in a single table, and consequently having many clients accessing that table, we split it into many small tables, each of which will be accessed by a smaller number of clients.
  • Parallel aggregated queries: queries that need to span multiple partitions to perform join/sum/max operations can be executed in parallel across the nodes. The fact that the queries run on smaller data sets reduces the time it takes to perform the actual operation on each node. In addition, the fact that queries execute on multiple machines leverages the full CPU and memory power of those machines.
  • In-process local cache: read-mostly operations are cached in the client application local address space. This means that subsequent reads will be executed locally.
  • Avoid Object-Relational Mapping (ORM): read operations are performed directly from memory in object format. Thus, there is no need for O/R mapping overhead at this level. O/R mapping happens in the background either during the initial load process, or during the asynchronous update of the database.

If you keep the database in sync, isn't your solution limited by database performance? 

No. Because:

  • Data is sent asynchronously and in batches
  • Updates are performed in parallel by all partitions.
  • Updates to the database are executed collocated in the same machine as the database through a mirror service. This enables to reduce the network overhead to the data base as well as benefit from specific optimization such as batch operations.
  • The database is not used for high availability purposes. This means that In-flight transactions are not stored in the database, only the end result of the business transactions. This reduces the amount of updates that are sent to the underlying database. Also keep in mind that queries don't really hit the database, only updates and inserts. All this together means that the IMDG acts as a smart buffer to the database. It is common that the number of read/update hits the IMDG receives is 10x higher than the number of hits on the underlying database is seeing. 
  • The database and the application are now decoupled, giving you more options for optimization. For example, there are scenarios where writing to the database is required to ensure the durability of the data.  In this scenario, you can store the data directly in a persistent log (to ensure durability). The log can be updated at a relatively high rate. You can read the data from the persistent log back into the database as an off-line operation. With these options in place we can  easily get to 30,000 to 40,000 updates per second with a single instance of MySQL. If this is not sufficient you can always combine data base clustering  to speed up the data  base  access.

Doesn't asynchronous replication mean that data might be lost in case of failure?
No, because asynchronous replication refers to the transfer of data between the IMDG and the database. The IMDG, however, maintains in-memory backups that are synchronously updated. This means that if one of the nodes in a partitioned cluster failed before the replication to the underlying database took place, its backup will be able to instantly continue from that exact point.

What happens if one of my memory partitions fails?

The backup of that partition takes over and becomes the primary. The data grid cluster-aware proxy re-directs the failed operation to the hot backup implicitly. This enables a smooth transition of the client application during failure -- as if nothing happened. Each primary node may have multiple backups to further reduce the chance of total failure. In addition, the cluster manager detects failure and provisions a new backup instance on one of the available machines. 

What happens if the database fails?
The IMDG maintains a log of all updates and can re-play them as soon as the database becomes available again. It is important to note that during this time the system continues to operate unaffected. The end user will not notice this failure!  

How do I maintain transactional integrity?
The IMDG supports the standard  two-phase commit protocol and XA transactions. Having said that, this model should be avoided as much as possible due to the fact that it introduces dependency among multiple partitions, as well as creates a single point of distributed synchronization in our system. Using a classic distributed transaction model doesn't take advantage of the full linear scalability potential of the partitioned topology. Instead, the recommended approach is to break transactions into small, loosely-coupled services, each of which can be resolved within a single partition. Each partition can maintain transaction integrity using local transactions. This model ensures that in partial failure scenarios the system is kept in a consistent state. 

How is transactional integrity maintained with the database?
As noted above, distributed transactions might introduce a severe performance and scalability bottleneck, especially if done with the database. In addition, attempting to execute transactions with the database violates one of the core principles behind PaaS: asynchronous updates to the database. To avoid this overhead, the IMDG ensures that transactions are resolved purely in-memory and are sent to the database in a single batch. If the update to the database fails, the system will re-try that operation until the update succeeds. 

What types of queries are supported?

  • Template matching (matching object based on class name, class hierarchy, and attribute values)
  • SQL – support range queries, 'like' semantics, etc.
  • Continuous queries – through a combination of notification and SQL.
  • Parallel query (a.k.a Map/Reduce) – queries that are not designated for a specific partition are automatically broadcasted to all partitions and the result is implicitly aggregated on the client side.
  • Iterator – iterate through a large result-set of data.
  • You can find some code snippets of the different query APIs here.

This model relies heavily on partitioning. How do I handle queries that need to span multiple partitions?
Aggregated queries are executed in parallel on all partitions. You can combine this model with stored procedure-like queries to perform more advanced manipulations, such as sum and max. See more details below.   

What about stored procedures and prepared statements?
Because the data is stored in memory, we avoid the use of a proprietary language for stored procedures. Instead, we can use either native Java/.Net/C++ or dynamic languages, such as Groovy and JRuby, to manipulate the data in memory. The IMDG provides native support for executing dynamic languages, routes the query to where the data resides, and enables aggregation of the results back to the client. A reducer can be invoked on the client-side to execute second  level aggregation. See a code example that illustrates how this model works here. 

Can I change these prepared statements and stored procedure equivalents without bringing down the data?
Yes. When you change the script, the script is reloaded to the server while the server is up without the need to bring down the data. The same capability exists in case you need to re-fresh collocated services code on the server-side.  

Do I need to change my application code to use an IMDG?
It depends. There are cases In which introducing an IMDG can be completely seamless and there are cases in which you will need to go through a re-write, depending on the programming model: 

 

 

Nature of Integration with IMDG

Comments/limitations

Hibernate 2nd level cache

Seamless

Best fit for read-mostly applications. Limited performance gain as it still heavily relies on the underlying database.

JDBC

Seamless, but limited

SQL commands written against the IMDG are guarantied to run with other JDBC resources. Doesn't support full SQL 92 and therefore existing applications may require code changes.Recommended for monitoring and administration. Not recommended for application development as it introduces unnecessary O/R mapping complexity.

HashMap

Seamless

Extensions such as timeout and transaction support are available as well. 

GigaSpaces Spring DAO

Partially seamless

Abstracts the transaction handling from the code. Domain model is based on POJOs, and therefore, doesn't require explicit changes, only annotations (annotation can be provided through an external XML file). If our application already uses a DAO pattern then it would require changing the DAO. This allows  narrowing down the scope of changes required to use an IMDG-specific interface. This option is highly recommended for best performance and scalability.

What topologies are supported?
Replicated (synchronous or asynchronous), partitioned, partitioned-with-backup.
See details here.

Do I need to change my code if I switch from one topology to another?

No. The topology is abstracted from the application code. The only caveat is that your code needs to be implemented with partitioning in mind, i.e., moving from a central server or a replicated topology to partitioning doesn't require changes to the code as long as your data includes an attribute that acts as a  routing index.

How are IMDGs and PaaS different from in-memory databases (IMDB)?

The relational model itself doesn't prevents us from taking full advantage of the fact that the data is stored as objects in memory. For example, when we use in-memory storage in an IMDG, we don't need the O/R mapping layer. In addition, we don't need separate languages to perform data manipulation. We can use the native application code, or dynamic languages, for that purpose.

Moreover, one of the fundamental problems with in-memory databases is that relational SQL semantics is not geared to deal with distributed data models. For example, an application that runs on a central server and was uses things like Join, which often maintains references among tables, or even uses aggregated queries such as Sum and Max, doesn't map well to a distributed data model. This is why many existing IMDB implementations only support very basic topologies and often require significant changes to the data schema and application code. This reduces the motivation for using in-memory relational databases, as it lacks transparency.

The GigaSpaces in-memory data grid implementation, for example, exposes a JDBC interface and provides SQL query support. Applications can therefore benefit from best of both worlds: you can read and write objects directly through the GigaSpaces API, query those same objects using SQL semantics, and view and manipulate the entire data set using regular database viewers.

Can I use an existing Hibernate mapping to map data from the database to the IMDG?

Yes. In addition, with PaaS, the Hibernate mapping overhead is reduced as most of it happens in the background, during initial load or during the asynchronous update to the database.

Further information on Hibernate support is available here.

Can I use PaaS with .Net or C++ applications?

Yes. Starting with GigaSpaces 6.5 both Hibernate (Java) and nHibernate (.Net) are supported. C++ applications deffer to the default Hibernate implementation. In addition, with GigaSpaces' new integration with Microsoft Excel, .Net users can easily access data in the IMDG directly from their Excel spreadsheets without writing code!

Final words:

While this approach is generic and can be applied to any database product, MySQL is the most interesting to discuss as it is widely adopted by those who need cost-effective scalability the most, such as web services, social networks and other Web 2.0 applications. In addition, MySQL faced several challenges in penetrating large enterprises. With the acquisition of Sun, MySQL becomes a viable option for such organizations, but still requires the capabilities mentioned above to compete effectively with rival databases. The combination of IMDG/PaaS with MySQL provides a good solution for addressing some of the bigger challenges in cloud-based deployments. More on that in a future post.

March 17, 2008

Scaling out your tier based application in three steps - Now online

A while back I wrote Architecture You Always Wondered About: Lessons Learned at Qcon, where i summarized some of the lessons learned from the Amazon, eBay, Yahoo and LinkedIn architectures, which enabled them to address their scalability requirements.

During that conference I spoke about the limitations of existing tier-based implementations and presented a pattern (Space-Based Architecture) that outlines how one can apply some of these lessons to scale out existing applications in a relatively seamless manner by pushing the complexity to the underlying middleware stack.

The presentation is now available online at InfoQ:

Three steps for turning a tier-based/Spring-application into dynamically scalable services

The theory and principles behind this pattern are based on Lean methodology, which I discussed in Moving to Extreme Transaction Processing using Lean methodology.

  1. Implement a common cluster for the entire system, including messaging and data. A common cluster eliminates the need for redundancy when we introduce fail-over. It also removes the need for a transaction coordinator, as we are no longer dealing with coordination of two separate sub-systems (we also eliminate the need for idempotency).
  2. Remove disk I/O from the critical path of the transaction. The state of the messaging middleware and in-flight transactions will be stored purely in-memory. The system will replication to keep a copy of the data in an alternate memory instance for hot fail-over. Synchronization with the back-end database will be a background process using reliable asynchronous replication. This way we guarantee the consistency and availability of the system purely in-memory, without depending on disk storage. This approach has a nice cost-benefit to it as well, as it reduces the need for expensive disks as part of the infrastructure.
  3. Collocate the business logic with the data. This reduces the network hops as well as the number of moving parts in the system.
  4. Partition. We split (or "shard") the transactions into self-sufficient units of work.

I hope you'll find it interesting.

March 03, 2008

Moving to Extreme Transactions Processing using Lean methodology

In one of my recent posts I used a production line analogy to describe the applicability of methodologies like "Lean" taken from production line optimization to transactional systems.

As a reminder, here's a good explanation about Lean from Amit Rathore's blog:

Ultimately, it focuses on throughput (of whatever is being produced) by taking a strictly system-level view of things. In other words, it doesn’t focus on particular components of the value-stream, but on whether all the components of the chain are working as efficiently as possible, to generate as much overall value as possible.

And from another post by Amit:

In many cases, much more bang for the buck can be got by simply looking at an extended value-stream, as opposed to a localized one.

Transactional systems have many similarities to production lines. In both cases, we're dealing with pipeline optimization.

In this post I provide a more specific example that illustrates how to apply Lean/Agile methodologies to scale-out transactional systems. This new category of transactional systems is also referred to as Extreme Transaction Processing (XTP).

Transactional system example

 We'll start by examining a typical pipeline of a transaction processing system.

The pipeline consist of the following steps:

  • Step 1: Send business request to messaging system
  • Step 2: Take the business request, typically through Message-Driven Beans(MDB), and process it
  • Step 3: Update the state of the transaction
  • Step 4: Trigger an event that will start the next step in our transaction system
  • Repeat 1-4

Ina real world system, the business request might be might be market data feeds, monitored events in a management system, credit card verification requests and so on. The processing of the request might include parsing, validation and/or normalization, which transform the raw feed into something more meaningful. A next step will usually be some sort of "matching" and "processing" – that is where the specific business logic comes in. The full business transaction requires the completion of all these steps.

Adding reliability and consistency

Now let's examine what happens to this system when we add high-availability requirements:

  • Step 1: Send business request to messaging system
  • Step 2: Replicate the message to a back-up node or disk
  • Step 3: Open an XA transaction
  • Step 4: Take the business request (typically through MDB) and process it
  • Step 5: Update the state of the transaction (under the same transaction)
  • Step 6  Replicate the transaction state to maintain high-availability of the database
  • Step 7: Trigger an event that will start the next step in the transaction
  • Step 8:  Commit the transaction (message sent to the transaction coordinator)
  • Step 9: Transaction coordinator call prepare on the messaging system and database
  • Step 10: Transaction coordinator call commit on the messaging and data system.
  • Repeat 1-10

Analysis

The steps listed above constitute what Lean defines as an end-to-end system view. Looking at this view it becomes pretty clear that an attempt to optimize a specific tier (e.g., data, messaging) is only going to produce marginal value (the system is only as strong as its weakest link).

Another thing that we learn is that adding reliability requirements more than doubled the amount of network hops in our system!

The reason for this is related to the tier-based approach. In our case, the tiers are the messaging system and the data-tier. Because each tier is independent of the other, it has its own high-availability model and its own configuration and setup. We need to "pay" the cost of reliability over and over again for each tier. As the tiers represent separate systems, we also needs to add the overhead of external systems, such as a transaction coordinator, to ensure the consistency of the two separate systems.

Adding scalability 

Things become worse when we add scalability requirements.

In most systems, the messaging-tier and data-tier are implemented as centralized servers, and the same applies to the transaction coordinator. What that means is that these centralized points become a bottleneck when we try to scale the system. This bottleneck means that the more transactions we try to process in parallel, the longer it takes to complete each step mentioned above. Furthermore, in many cases the overhead is not linear, especially when disk I/O is involved.

It therefore becomes almost impossible to try and provision our system based on our scaling requirements before we build the entire system and test it. This leads to a huge risk in project management, and that is one of the main reasons many projects managers are surprised to discover  toward the end of the project that the PoC and testing that they conducted earlier, do not represent their *real* system behavior in terms of throughput and latency. In many cases they have to go back to the drawing board for a very painful optimization cycle, or even a complete re-design to meet their goals. At this point, many projects simply fail.

The limitations of scaling on a per-tier basis

Although some implementations of messaging and databases provide partitioning, trying to break those centralized implementations on a per-tier basis -- whether it’s the database or messaging system -- is going to be extremely hard and very complex to maintain. It also introduces another interesting problem, which is the affinity problem. Meaning, if we break each tier into multiple units how do we ensure that the transaction is routed to the "right" unit when the transaction moves from one tier to the other. In many cases this leads to complexity in the code or to a requirement for yet another external system that will act as the coordinator for message routing.

How can Lean and Agile methodologies help

The first step in solving a problem is identifying where the problem is. By taking the "system view" we will have a better picture of where the problem may lie. We can now choose to either address incremental optimizations or eliminate the problem altogether.

Applying incremental optimizations

With incremental optimizations we identify which steps are slowest, and try to improve them. In the example above, the database would seem to be one of the more obvious bottlenecks. We can reduce the time it takes to complete that step by front-ending the database with a distributed cache, also known as an in-memory data grid (IMDG).

We can also try to minimize the use of distributed transactions by adding awareness to our code. In a scenario in which the same transaction may be processed more then once due to failure, this semantic is referred to as idempotent.

The limitations of incremental optimizations

Such optimizations are valid, and in some cases, sufficient. But they are just a painkiller for the symptoms and don't solve the underlying problem, which is a result of the tier-based approach. We also need to remember that writing code that deals with idempotency is complex and error-prone as noted in Johan Strandler's article on InfoQ, New patterns and middleware architecture needed for true linear scalability?

The lack of transactional atomicity between entities and the messaging introduced by this, causes new problems that lurks it's way all up to the business logic; message retries and processes that must be able to handle idempotence.

Achieving end-to-end scalability

Moving to a tier-less approach using Space-Based Architecture (SBA) will help us eliminate all of the bottlenecks described above and transform our transaction processing system into one that fits the definition of Gartner's Extreme Transaction Processing (XTP).

This pattern has the following principles:

  1. Implement a common cluster for the entire system, including messaging and data. A common cluster eliminates the need for redundancy when we introduce fail-over. It also removes the need for a transaction coordinator, as we are no longer dealing with coordination of two separate sub-systems (we also eliminate the need for idempotency).
  2. Remove disk I/O from the critical path of the transaction. The state of the messaging middleware and in-flight transactions will be stored purely in-memory. The system will replication to keep a copy of the data in an alternate memory instance for hot fail-over. Synchronization with the back-end database will be a background process using reliable asynchronous replication. This way we guarantee the consistency and availability of the system purely in-memory, without depending on disk storage. This approach has a nice cost-benefit to it as well, as it reduces the need for expensive disks as part of the infrastructure.
  3. Collocate the business logic with the data. This reduces the network hops as well as the number of moving parts in the system.
  4. Partition. We split (or "shard") the transactions into self-sufficient units of work.

In SBA terminology, steps 1 and 2 above are handled by a Processing Unit. Processing units are the unit of scale and fail-over in the system.

Seamless transition through middleware virtualization

Another aspect of SBA is the abstraction of the API from the underlying implementation. With SBA, the API is basically a façade that exposes certain application semantics (messaging/data). This provides the benefit of the tier-less in-memory approach, without fundamental changes to the existing programming model.

For example, we can still use JMS for as the messaging API, POJOs for implementing the business logic, and DAO for abstracting the data model (more in The Missing Piece in Cloud Computing: Middleware Virtualiztion).

Real-life example: from tier-based to tier-less in 4 weeks!

Toward the end of last year GigaSpaces engaged with a large wireless carrier that was about to launch a new campaign. The company was concerned about the scalability of their existing tier-based Order Management System due a recent failure in an earlier launch. They heard about Space-Based Architecture and  wanted to use GigaSpaces to implement this pattern. A major hurdle was that most of the code was already written for a classic tier-based J2EE model and they had only 4 weeks to integrate with GigaSpaces and go directly into production.

We established a tiger-team that integrated GigaSpaces XAP with their existing application, and within two weeks completed development and started stress testing. The system successfully went live before the holiday shopping season and served an extremely successful campaign. The diagram below shows the before and after architecture of this particular application:

Tierbasedtp_3


Figure 1: Before --  Tier-based order management system

Gridbased_2

Figure 2: After -- Scale-out architecture based on GigaSpaces

Summary:

The end-to-end system view highlights some of the fundamental problems with Tier-Based Architecture (TBA). TBA leads to tier-based thinking: instead of looking at things from an end-to-end perspective, we look at them at the tier level. In retrospect, it's amazing to see how far this thinking has seeped: it defines how we evaluate products, how we tests systems, how we build development teams, and even, to a degree, the organizational structure. It even created an entire ecosystem of products, such as performance monitoring tools, aimed at optimizing the tiers problem. It took the manufacturing industry decades to realize that this is a major problem that requires a new way of thinking. Hopefully, the software industry will come to this realization much quicker. We are already seeing signs of a move away from tier-based thinking. A good example is the transition from the waterfall development approach to Agile and Lean development methodologies. Aligning application architecture with this development model is a natural next step.

The model is mature and there are already many  production references using it, as indicated by the example i described earlier, and the testimonial below from Ashmit Bhattacharya, who runs development for Blackhawk Network, a Safeway subsidiary that is the largest gift card processor in the world:

We were introduced to GigaSpaces by our enterprise service partners, and they decided to come in and integrate with our existing solution as part of the POC. The solution was demonstrated in less than 3 weeks and we had a linearly scalable solution at the end of the exercise. ...
..The best part of the solution in our particular case was the manner in which the solution scaled horizontally. This took a tremendous burden off my architecture teams and we could focus more on functional development of our solution rather than work on the framework.
 

I'm hoping that the work that we're doing at GigaSpaces will help bring this much needed change faster.

 

March 02, 2008

Cloud competing:)

So many talks on clouds..
I found this amusing picture on flickr, i chose to call it "cloud competing" :)

My Photo

Twitter Updates

    follow me on Twitter