I just returned from QCon San Francisco. I really enjoyed the conference: the presentations and panels were of very high quality, and it was intimate enough to meet interesting people in the industry. I particularly liked the discussion with Brian Zimmer of Orbitz, who is using Jini as part of their backbone, as well as Dan Pritchett (eBay), who gave an excellent presentation on Friday on the eBay architecture. Well done to Floyd and Trifork (Aino, Roxanne and the rest of the team) for organizing this event.
In the "Architectures You've Always Wondered About" track, Second Life, eBay, Yahoo, LinkedIn and Orbitz presented how they dealt with different aspects of their applications, such as scalability. I learned quite a few lessons that day that I thought were worth sharing.
Disclaimer: the information below is based on notes taken during the sessions. It is not detailed coverage of the individual presentations; rather, it is my personal interpretation of and conclusions from the aggregate. The original presentations from this track are available here.
The Stack
This topic seems to be rather emotional, mainly between the LAMP and Java camps, as I witnessed myself when I posted Why most large-scale web sites are not written in Java. Although the LAMP stack is quite popular in this domain (large-scale web apps), the environments are very heterogeneous, especially at the big sites: Yahoo and eBay certainly, and this is partially true for Google and Amazon. In fact, one of Google's applications uses GigaSpaces. An obvious explanation for this heterogeneity is that many of the big sites acquired and merged several companies, each with its own implementation stack.
Quite a few sites (LinkedIn, eBay, Orbitz, Yahoo Bix) use Java as their core language. (Note that most of Yahoo's applications *do* use LAMP as their core stack; Yahoo Bix is an exception.) The Linux-Apache-Tomcat-Spring-Hibernate stack is common among the Java-based sites. If I recall correctly, only eBay uses the full J2EE stack, but they seem to be using only a very small part of its functionality. Second Life has a rather unusual architecture: they use mostly C++ and Python, and said that they want to move to Web Services and SOA as a way to address scalability. Orbitz uses Jini as their service framework, and did interesting work with Spring to abstract Jini away from their developers. They also developed remoting functionality with Spring to abstract their service interactions.
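As an illustration of that kind of abstraction, here is a minimal, hypothetical sketch (my own, not Orbitz's code) of hiding a Jini service lookup behind a Spring FactoryBean, so application code depends only on a plain interface; the service name and method are made up:

```java
import org.springframework.beans.factory.FactoryBean;

// Hypothetical service interface -- the name and method are illustrative.
interface FareSearchService {
    String findFares(String origin, String destination);
}

// Application code injects FareSearchService as an ordinary Spring bean;
// the Jini discovery machinery stays hidden behind this factory.
public class JiniServiceFactoryBean implements FactoryBean<FareSearchService> {

    @Override
    public FareSearchService getObject() throws Exception {
        return discoverViaJini();
    }

    @Override
    public Class<?> getObjectType() {
        return FareSearchService.class;
    }

    @Override
    public boolean isSingleton() {
        return true;
    }

    private FareSearchService discoverViaJini() {
        // A real implementation would locate a service proxy through the
        // Jini lookup service (e.g. via ServiceDiscoveryManager); omitted here.
        throw new UnsupportedOperationException("Jini discovery omitted");
    }
}
```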
Integration Strategy
Integration is a big issue for all of these sites, typically because of acquisitions. In the case of eBay and Yahoo, the companies they bought were built with different architectures and, in many cases, different implementation stacks. Their policy is not to interfere with the implementation details (at least not in the early stages), but to focus on how they integrate quickly -- with the end-user experience in mind. Both Yahoo and eBay built a common framework to address these integration requirements. A great deal of effort went into enabling a common user identity management system (Single Sign-On), as well as a load-balancing scheme. eBay chose Apache as their common layer and extended it quite a bit with their own modules. Yahoo built an identity management service designed for extensibility. Extensibility here refers to the ability of each of their applications to add its own data to the user profile, which can then be used for personalization and other purposes.
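To make that concrete, here is a minimal, hypothetical sketch of the idea (my illustration, not Yahoo's actual design): each application attaches its own attributes to a shared profile under its own namespace:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Each application writes its own attributes under its own namespace,
// so new applications can extend the shared profile without schema changes.
public class ExtensibleUserProfile {
    private final String userId;
    private final Map<String, Map<String, String>> attributesByApp =
            new ConcurrentHashMap<>();

    public ExtensibleUserProfile(String userId) {
        this.userId = userId;
    }

    public void put(String appNamespace, String key, String value) {
        attributesByApp
                .computeIfAbsent(appNamespace, ns -> new ConcurrentHashMap<>())
                .put(key, value);
    }

    public String get(String appNamespace, String key) {
        Map<String, String> attrs = attributesByApp.get(appNamespace);
        return attrs == null ? null : attrs.get(key);
    }

    public String getUserId() {
        return userId;
    }
}
```

A personalization service can then read attributes another application stored for the same user, without the identity service knowing anything about either application.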
Application Architecture
Not surprisingly, these applications were designed as tier-based architectures. When the presenters talked about partitioning, it typically applied to the data tier.
Database Tier
MySQL is definitely the most popular database. It's interesting (and surprising) to see the amount of resources that organizations such as eBay and Google spent on extending MySQL, and it's nice to see that they contributed those extensions back to the MySQL community. According to Dan Pritchett (eBay), with their enhancements you can do pretty much the same with MySQL as with an Oracle database. Also see The Future is Cloudy, which gives some context on Google's extensions to MySQL. The Oracle database is still being used by some sites, but usually in conjunction with MySQL.
Most of the sites said that they store data in-memory to minimize the I/O overhead of the database, but they do so only for read-mostly scenarios, where memory is not relied upon as the system of record. One of the presenters (I think it was Dan) mentioned that the reason for the limited use of caching is the nature of their data usage patterns: at any point in time, any of their servers can request any data item, in no particular order. Because the volumes of data they are dealing with are huge, they can't store it all in memory, and these inconsistent access patterns limit the potential performance gains from caching.
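The read-mostly pattern they described boils down to cache-aside: memory accelerates reads, while the database stays the system of record. A minimal sketch (my own, not any presenter's code):

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Function;

// Cache-aside for read-mostly data: reads are served from memory when
// possible; the database remains the system of record.
public class ReadMostlyCache<K, V> {
    private final ConcurrentMap<K, V> cache = new ConcurrentHashMap<>();
    private final Function<K, V> loadFromDatabase;

    public ReadMostlyCache(Function<K, V> loadFromDatabase) {
        this.loadFromDatabase = loadFromDatabase;
    }

    public V get(K key) {
        // On a miss, load from the database and remember the result.
        return cache.computeIfAbsent(key, loadFromDatabase);
    }

    public void invalidate(K key) {
        // Writes go to the database first; the caller then invalidates here.
        cache.remove(key);
    }
}
```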
I think that this claim about the limited benefit of caching should be re-examined, as a lot of progress has been made in this area in recent years, which can change many of the assumptions currently held in this regard (but that is a topic for a different discussion). Many sites use memcached as their caching layer. For example, this case study about TypePad's architecture (which happens to host this blog) indicates the use of memcached to store counts, sets, stats, and heavyweight data.
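For reference, here is roughly what that kind of memcached usage looks like from Java, using the spymemcached client (an illustration only -- TypePad itself runs on Perl; the address and keys are placeholders):

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import net.spy.memcached.MemcachedClient;

public class MemcachedExample {
    public static void main(String[] args) throws IOException {
        // Placeholder address; production setups point at a memcached farm.
        MemcachedClient client =
                new MemcachedClient(new InetSocketAddress("localhost", 11211));

        // Cache a precomputed value for an hour; memcached is a cache,
        // not the system of record.
        client.set("profile:12345", 3600, "serialized-profile-data");

        // On a miss this returns null, and the caller falls back to the DB.
        Object cached = client.get("profile:12345");
        System.out.println(cached);

        client.shutdown();
    }
}
```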
Messaging Tier
To address scalability, there is a trend to move away from a synchronous RPC approach towards asynchronous communications (I discuss this further below, under scalability). One would think that JMS would be used all over the place to address this requirement. It seems, however, that almost every one of the presenters said they built their own messaging stack. The reasons seem to be:
- Requirements for efficient content-based messaging: The type of messaging that is required is neither straight point-to-point nor is it pub/sub. It is more associative in nature. It is a common requirement to look at the message and browse through it before you decide if you want to select it (and the JMS selector interface is rather limited for this)
- Consistency: In order to avoid partial failure and the need for distributed transactions, they store their events in the same partition as the data. In this way they ensure that messages are routed to the same partition as the data (and avoid the use of distributed transactions)
- Fine-tuning: They were able to tune this layer to their specific usage patterns and semantics
LinkedIn refers to this type of messaging as a "data bus" -- a good term for it. This reminds me of the original reason I became interested in JavaSpaces, when I worked on a B2B exchange and had to deal with similar requirements. JavaSpaces does exactly that, i.e., it provides a data bus that combines both the messaging and the data in a single consistent implementation.
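For those unfamiliar with it, here is a minimal sketch of that associative, data-bus style selection with the JavaSpaces API (the entry type and field values are made up, and acquiring the space proxy via Jini lookup is omitted):

```java
import net.jini.core.entry.Entry;
import net.jini.space.JavaSpace;

// Entries expose public fields, which double as the matching template.
public class OrderEvent implements Entry {
    public String symbol;
    public Integer quantity;
    public String status;

    public OrderEvent() {
        // JavaSpaces entries require a public no-arg constructor.
    }
}

class Consumer {
    // A null field acts as a wildcard, so selection is by content,
    // not by destination -- neither point-to-point nor pub/sub.
    OrderEvent takeNewOrder(JavaSpace space, String symbol) throws Exception {
        OrderEvent template = new OrderEvent();
        template.symbol = symbol;
        template.status = "NEW";
        // take() atomically removes the matching entry, combining the
        // "message" and the "data" in one consistent operation.
        return (OrderEvent) space.take(template, null, 10_000);
    }
}
```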
Scalability as an Afterthought
A message that was consistently delivered during the conference by almost all architects was that scalability shouldn't be dealt with as an afterthought. While I agree with this statement, all of the case studies at QCon revealed an interesting fact: most of the sites described were not initially designed for scalability (there is a famous story about eBay starting as a single DLL). Still, it seems that they were able to go through several re-architecture cycles whenever scalability became a big issue. Is there a lesson to learn from this? My view is that because current approaches to scalability introduce a huge level of complexity (not to mention the fact that many developers don't understand scalability well), scalability and time-to-market are perceived as two conflicting goals. In other words, an attempt to achieve one of them is viewed as risking the other. The lesson could therefore be that you don't have to implement the scalability patterns on day one, but you do need to be aware of what scalability means. Even as you're making compromises to address time-to-market requirements, you need to plan ahead for a change at the appropriate time.
Werner Vogels (Amazon's CTO) expressed a similar sentiment when he said: "Scale later. It is soooo difficult to make it right, that sometimes the effort to do it up front is not justified. Or leave it to somebody that has the know-how and has done it already…like Amazon (remember S3 - Virtual Disk, etc.)"
Scalability -- How to Do It Right
And here's what you've all been waiting for (drumroll):
- Asynchronous event-driven design: Avoid as much as possible any synchronous interaction with the data or business logic tier. Instead, use an event-driven approach and workflow
- Partitioning/Shards: You need to design your data model so that it fits the partitioning model (see the sketch below)
- Parallel execution: Parallel execution should be used to get the most out of the available resources. A good place to use parallel execution is for processing user requests: multiple instances of each service can take requests from the messaging system and execute them in parallel. Another is using MapReduce to perform aggregated requests on partitioned data
- Replication (read-mostly): In read-mostly scenarios (LinkedIn seems to fall into this category well), database replication can help load-balance the read load by splitting read requests among the replicated database nodes
- Consistency without distributed transactions: This was one of the hot topics of the conference, which also sparked some discussion during one of the panels I participated in. An argument was made that to reach scalability you have to relax consistency guarantees and handle consistency in your application code, using techniques such as optimistic locking and asynchronous error-handling. It also assumes that you will need to handle idempotency in your code. My argument was that while this pattern addresses scalability, it creates complexity and is therefore error-prone. During another panel, Dan Pritchett argued that there are ways to avoid this level of complexity and still achieve the same goal, as I outlined in this blog post.
- Move the database to the background: There was violent agreement that the database bottleneck can only be solved if database interactions happen in the background.
Quoting Werner Vogels again: "To scale: No direct access to the database anymore. Instead data access is encapsulated in services (code and data together), with a stable, public interface."
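Putting the first three points together, a single-process sketch of the shape of such a design might look like this (entirely my illustration, not code from any of the talks): requests are routed to a partition by hashing their key, and each partition's events are consumed asynchronously by a parallel worker:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;

public class PartitionedEventBus {
    private final List<BlockingQueue<Runnable>> partitions = new ArrayList<>();
    private final ExecutorService workers;

    public PartitionedEventBus(int partitionCount) {
        workers = Executors.newFixedThreadPool(partitionCount);
        for (int i = 0; i < partitionCount; i++) {
            BlockingQueue<Runnable> queue = new LinkedBlockingQueue<>();
            partitions.add(queue);
            // One worker per partition: events for the same key are
            // processed in order, while partitions run in parallel.
            workers.submit(() -> {
                try {
                    while (true) {
                        queue.take().run();
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
    }

    // Asynchronous submission: callers never block on the data tier.
    // Routing by key hash keeps an entity's events (and, in a real system,
    // its data) in the same partition, avoiding distributed transactions.
    public void submit(String key, Runnable event) throws InterruptedException {
        int partition = Math.floorMod(key.hashCode(), partitions.size());
        partitions.get(partition).put(event);
    }
}
```

In a real deployment each partition would be a separate node behind the messaging tier, and aggregated queries would fan out across the partitions MapReduce-style.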
Other Tips
- Yahoo Bix – Developed a network sniffer to monitor calls to the database and optimize their use of Hibernate. This represents one of the interesting trade-offs of a database abstraction layer: by abstracting away what's done behind the scenes, you allow developers to write non-optimized code. The sniffer exposed what actually happens behind the scenes, and the team used this information to optimize their Hibernate code (see the sketch after this list)
- LinkedIn – Uses Lucene (and not a database) for indexing. If you're looking for an efficient way to perform indexing and search on your indexes, a database is probably not the right tool for the job; Lucene provides a much more efficient implementation for that purpose. I would also recommend using Compass. And here is some breaking news: I just heard from Shay Banon, the Compass project owner, that he is working on Lucene integration over an in-memory clustered index (using GigaSpaces). This is quite exciting because it will allow you to store a Lucene index in a distributed manner. It will also allow you to index the content of a Space and perform a Google-like search query!
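If deploying a network sniffer isn't an option, Hibernate's built-in statistics expose some of the same information. A minimal sketch against the standard Hibernate API (the SessionFactory is assumed to be configured elsewhere):

```java
import org.hibernate.SessionFactory;
import org.hibernate.stat.Statistics;

public class HibernateMonitoring {

    // Logs a few counters that hint at non-optimized data access,
    // e.g. an N+1 select pattern shows up as a high entity-load count.
    public static void logStats(SessionFactory sessionFactory) {
        Statistics stats = sessionFactory.getStatistics();
        stats.setStatisticsEnabled(true);

        System.out.println("Queries executed: "
                + stats.getQueryExecutionCount());
        System.out.println("Slowest query time (ms): "
                + stats.getQueryExecutionMaxTime());
        System.out.println("Entities loaded: "
                + stats.getEntityLoadCount());
        System.out.println("2nd-level cache hits/misses: "
                + stats.getSecondLevelCacheHitCount() + "/"
                + stats.getSecondLevelCacheMissCount());
    }
}
```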
Summary: Complexity and Time-to-Market Trade-Offs
The prevailing patterns used for scalability introduce complexity. For example, we need to deal with partial failure scenarios, as well as handle idempotency, as part of our application code. This introduces a high degree of complexity, and it is the major reason why most architects and developers start with simpler approaches that are not scalable, knowing that later they will need to completely redesign the application to meet requirements. Second Life had an entire presentation on this topic.
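To make the idempotency cost concrete, here is a minimal, hypothetical sketch of the kind of bookkeeping every handler ends up carrying so that redelivery after a partial failure is harmless:

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Idempotent consumer: redelivered messages (e.g. after a crash between
// processing and acknowledgment) are detected and skipped.
public class IdempotentHandler {
    // In a real system this set would live in durable storage,
    // ideally in the same partition as the data it protects.
    private final Set<String> processedMessageIds =
            ConcurrentHashMap.newKeySet();

    public void handle(String messageId, Runnable businessLogic) {
        // add() returns false if the ID was already present.
        if (!processedMessageIds.add(messageId)) {
            return; // duplicate delivery -- already processed
        }
        businessLogic.run();
    }
}
```

Multiply this -- plus durable storage for the processed-ID set and the error-handling around it -- by every handler in the system, and the complexity cost becomes clear.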
Our challenge at GigaSpaces is to eliminate this contradiction as much as possible by making scalability simple, so that scalability patterns can be implemented from the get-go, allowing the business to grow incrementally as needed. This was actually the main topic of my presentation at QCon, titled Three Steps to Turning Your Tier-Based Spring Application into Scalable Services.
I'll provide more details on this approach in a future post.