I just returned from QCon San Francisco, and I really enjoyed the conference. The presentations and panels were of very high quality, and the event was intimate enough to meet interesting people in the industry. I particularly enjoyed the discussion with Brian Zimmer of Orbitz, who is using Jini as part of their backbone, as well as with Dan Pritchett (eBay), who gave an excellent presentation on Friday on the eBay architecture. Well done to Floyd and the Trifork team (Aino, Roxanne and the rest) for organizing this event.
In the "Architectures You've Always Wondered About" track, Second Life, eBay, Yahoo, LinkedIn and Orbitz presented how they dealt with different aspects of their applications, such as scalability. There were quite a few lessons I learned that day that seemed worth sharing.
Disclaimer: the information below is based on notes taken during the sessions. It is not detailed coverage of the individual presentations; rather, it is my personal interpretation of, and conclusions from, the aggregate. The original presentations from this track are available here.
This topic seems to be rather emotional, mainly between the LAMP and Java camps, as I witnessed myself when I posted Why most large-scale web sites are not written in Java. Although the LAMP stack is quite popular in this domain (large-scale web apps), the environments are very heterogeneous, especially at the big sites: Yahoo and eBay certainly, and this is partially true for Google and Amazon. In fact, one of Google's applications uses GigaSpaces. An obvious explanation for this heterogeneity is that many of the big sites acquired and merged several companies, each with its own implementation stack.

Quite a few of the sites (LinkedIn, eBay, Orbitz, Yahoo Bix) use Java as their core language. (Note that most of Yahoo's applications *do* use LAMP as their core stack; Yahoo Bix is an exception.) The Linux-Apache-Tomcat-Spring-Hibernate stack is common among the Java-using sites. If I recall correctly, only eBay uses the full J2EE stack, and even they seem to use only a very small part of its functionality.

Second Life has a rather unusual architecture: they use mostly C++ and Python, and said that they want to move to Web Services and SOA as a way to address scalability. Orbitz uses Jini as their service framework, and did interesting work with Spring to abstract Jini away from their developers. They also built remoting functionality with Spring to abstract their service interactions.
Integration is a big issue for all of these sites, typically because of acquisitions. In the case of eBay and Yahoo, the companies they bought were built with different architectures and, in many cases, different implementation stacks. Their policy is not to interfere with the implementation details (at least not in the early stages), but to focus on how they integrate quickly, with the end-user experience in mind. Both Yahoo and eBay built a common framework to address these integration requirements. A great deal of effort went into enabling a common user identity management system (Single Sign-On), as well as a load-balancing scheme. eBay chose Apache as their common module and extended it quite a bit with their own modules. Yahoo built an identity management service designed for extensibility; extensibility here refers to the ability of each of their applications to add its own data to the user profile, which can then be used for personalization and other purposes.
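To make the extensibility idea concrete, here is a minimal sketch (my own illustration, not Yahoo's actual design — all names are hypothetical) of an identity service where each application stores its own namespaced data on a shared user profile, so new acquisitions can plug in without changing the core schema:

```python
# Illustrative sketch of an extensible identity service: each
# application writes its own namespaced fields onto a shared user
# profile, so integrating a new app requires no core schema change.

class UserProfile:
    def __init__(self, user_id):
        self.user_id = user_id
        self._app_data = {}  # app namespace -> dict of app-specific fields

    def put(self, app, key, value):
        self._app_data.setdefault(app, {})[key] = value

    def get(self, app, key, default=None):
        return self._app_data.get(app, {}).get(key, default)


class IdentityService:
    """Single Sign-On style lookup: one profile shared by all apps."""
    def __init__(self):
        self._profiles = {}

    def profile(self, user_id):
        return self._profiles.setdefault(user_id, UserProfile(user_id))


ids = IdentityService()
p = ids.profile("alice")
p.put("mail", "theme", "dark")      # the mail app adds its data...
p.put("photos", "quota_mb", 500)    # ...and the photos app adds its own
assert ids.profile("alice") is p    # same identity seen by every app
```

The point of the namespacing is that personalization code can read any app's fields through one profile object, while each app owns its slice of the data.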
Not surprisingly, these applications were designed as tier-based architectures, and when they talk about partitioning, it typically applies to the data tier.

Database Tier

MySQL is definitely the most popular database. It's interesting (and surprising) to see the amount of resources that organizations such as eBay and Google spent on extending MySQL, and it's nice to see that they contributed those extensions back to the MySQL community. According to Dan Pritchett (eBay), with their enhancements you can do pretty much the same with MySQL as with an Oracle database. Also see The Future is Cloudy, which gives some context on Google's extensions to MySQL. The Oracle database is still being used by some sites, but usually in conjunction with MySQL.

Most of the sites said that they store data in memory to minimize the I/O overhead of the database, but they do so only in read-mostly scenarios, where memory is not relied upon as the system of record. One of the presenters (I think it was Dan) mentioned that the reason for the limited use of caching is the nature of their data usage patterns: at any point in time, each of their servers can request any data item, in no particular order, and because the volumes of data are huge, they can't store it all in memory. These inconsistent access patterns minimize the potential performance gains from caching. I think this statement should be re-examined, as a lot of progress has been made in this area in recent years, which can change many of the assumptions currently held in this regard (but that is a topic for a different discussion).

Many sites use memcached as their caching layer. For example, this case study about TypePad's architecture (which happens to host this blog) indicates the use of memcached to store counts, sets, stats, and heavyweight data.
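The read-mostly pattern described above is usually implemented as cache-aside. Here is a minimal sketch of it, with a plain dict standing in for memcached and another for the database (illustrative only):

```python
# A minimal cache-aside sketch for read-mostly data. The cache is
# never the system of record: misses fall through to the database,
# and writes go to the database first, then invalidate the cached copy.

class CacheAsideStore:
    def __init__(self, db):
        self.db = db        # system of record (here just a dict)
        self.cache = {}     # stand-in for memcached
        self.misses = 0

    def read(self, key):
        if key in self.cache:
            return self.cache[key]
        self.misses += 1
        value = self.db[key]       # database I/O happens only on a miss
        self.cache[key] = value
        return value

    def write(self, key, value):
        self.db[key] = value       # database first...
        self.cache.pop(key, None)  # ...then invalidate, don't update

store = CacheAsideStore({"item:1": "bid=10"})
assert store.read("item:1") == "bid=10"  # miss: loads from the database
assert store.read("item:1") == "bid=10"  # hit: served from memory
assert store.misses == 1
store.write("item:1", "bid=12")
assert store.read("item:1") == "bid=12"  # miss again after invalidation
```

Invalidate-on-write (rather than update-on-write) is the common choice because it avoids racing a concurrent reader that is mid-way through a load.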
To address scalability, there is a trend to move away from a synchronous RPC approach toward asynchronous communications (I discuss this more below when I get to the topic of scalability). One would think that JMS would be used all over the place to address this requirement. It seems, however, that almost every one of the presenters said they built their own messaging stack. The reasons seem to be:
- Efficient content-based messaging: the type of messaging required is neither straight point-to-point nor pub/sub; it is more associative in nature. It is a common requirement to look at a message and browse through it before deciding whether to select it, and the JMS selector interface is rather limited for this.
- Consistency: to avoid partial failure and the need for distributed transactions, they store their events in the same partition as the data, ensuring that messages are routed together with the data they refer to.
- Tunability: building their own stack let them fine-tune this layer based on their specific usage and semantics.
LinkedIn refers to this type of messaging as a "data bus" -- a good term for it. This reminds me of the original reason I became interested in JavaSpaces, when I worked on a B2B exchange and had to deal with similar requirements. JavaSpaces does exactly that: it provides a data bus that combines both the messaging and the data in a single consistent implementation.
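A toy version of this data-bus idea, in the JavaSpaces spirit (my own sketch, not LinkedIn's or anyone's actual code): events live in the same partition as the data, and consumers take them associatively, by matching on content rather than on a queue or topic name:

```python
# A toy "data bus": messages are stored alongside the data in the same
# partition, and consumers select them by matching on message content
# (associative lookup), not by destination name.

class Partition:
    def __init__(self):
        self.data = {}    # partition-local data
        self.events = []  # events stored next to that data

    def write_event(self, event):
        self.events.append(event)

    def take(self, template):
        """Remove and return the first event matching all template fields."""
        for i, event in enumerate(self.events):
            if all(event.get(k) == v for k, v in template.items()):
                return self.events.pop(i)
        return None

p = Partition()
p.data["user:7"] = {"name": "alice"}
p.write_event({"type": "order", "user": "user:7", "amount": 30})
p.write_event({"type": "signup", "user": "user:9"})

# Content-based selection: match on fields, not on a queue name.
evt = p.take({"type": "order", "user": "user:7"})
assert evt["amount"] == 30
assert p.take({"type": "order"}) is None  # already consumed
```

Because the event and the data it touches sit in one partition, both can be updated in a single local transaction, which is exactly how the distributed-transaction problem above is sidestepped.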
Scalability as an Afterthought
A message delivered consistently during the conference by almost all the architects was that scalability shouldn't be dealt with as an afterthought. While I agree with this statement, all of the case studies at QCon revealed an interesting fact: most of the sites described were not initially designed for scalability (there is a famous story about eBay starting as a single DLL). Still, it seems that they were able to go through several re-architecture cycles whenever scalability became a big issue. Is there a lesson to learn from this? My view is that because current approaches to scalability introduce a huge amount of complexity (not to mention that many developers don't understand scalability well), scalability and time-to-market are perceived as two contradictory goals; an attempt to achieve one is viewed as risking the other. The lesson could therefore be that you don't have to implement the scalability patterns on day one, but you do need to be aware of what scalability means: even as you make compromises to address time-to-market requirements, plan ahead so you can make the change at the appropriate time.
Werner Vogels (Amazon CTO) expressed a similar sentiment when he said: "Scale later. It is soooo difficult to make it right, that sometimes the effort to do it up front is not justified. Or leave it to somebody that has the knowhow and has done it already…like Amazon (remember S3 - Virtual Disk, etc.)"
Scalability -- How to Do It Right
And here's what you've all been waiting for (drumroll):
- Asynchronous event-driven design: avoid, as much as possible, any synchronous interaction with the data or business-logic tier. Instead, use an event-driven approach and workflow.
- Partitioning/sharding: design your data model so that it fits the partitioning model.
- Parallel execution: use parallel execution to get the most out of the available resources. A good place for it is processing user requests: multiple instances of each service can take requests from the messaging system and execute them in parallel. Another is using MapReduce to perform aggregate queries over partitioned data.
- Replication (read-mostly): in read-mostly scenarios (LinkedIn seems to fall into this category), database replication can help balance the read load by splitting read requests among the replicated database nodes.
- Consistency without distributed transactions: this was one of the hot topics of the conference, and it sparked some discussion during one of the panels I participated in. An argument was made that to achieve scalability you have to sacrifice consistency and handle it in your application, using techniques such as optimistic locking and asynchronous error handling; this also assumes you will need to handle idempotency in your code. My argument was that while this pattern addresses scalability, it creates complexity and is therefore error-prone. During another panel, Dan Pritchett argued that there are ways to avoid this level of complexity and still achieve the same goal, as I outlined in this blog post.
- Move the database to the background: There was violent agreement that the database bottleneck can only be solved if database interactions happen in the background.
Quoting Werner Vogels again: "To scale: No direct access to the database anymore. Instead data access is encapsulated in services (code and data together), with a stable, public interface."
- Yahoo Bix – developed a network sniffer to monitor calls to the database and optimize their use of Hibernate. This illustrates one of the interesting trade-offs of a database abstraction layer: by hiding what happens behind the scenes, you in a sense allow developers to write non-optimized code. The sniffer exposed what was really happening, and that information was used to optimize the Hibernate code.
- LinkedIn – uses Lucene (and not a database) for indexing. If you're looking for an efficient way to index and search your data, a database is probably not the right tool for the job; Lucene provides a much more efficient implementation for that purpose. I would also recommend Compass. And here is some breaking news: I just heard from Shay Banon, the Compass project owner, that he is working on Lucene integration over an in-memory clustered index (using GigaSpaces). This is quite exciting, because it will allow you to store a Lucene index in a distributed manner, index the content of a Space, and perform a Google-like search query!
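To show why a dedicated index beats a database for search, here is a tiny inverted index — the core structure Lucene builds (a deliberately simplified sketch, not Lucene's actual implementation): each term maps directly to the documents containing it, so a query reads one posting list per term instead of scanning rows the way a LIKE-style SQL query would:

```python
# A tiny inverted index: term -> set of document ids. Queries touch
# only the posting lists of their terms, never the full document set.

from collections import defaultdict

class InvertedIndex:
    def __init__(self):
        self.postings = defaultdict(set)  # term -> doc ids containing it

    def add(self, doc_id, text):
        for term in text.lower().split():
            self.postings[term].add(doc_id)

    def search(self, query):
        """Docs containing every term of the query (AND semantics)."""
        terms = query.lower().split()
        if not terms:
            return set()
        result = set(self.postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= self.postings.get(term, set())
        return result

idx = InvertedIndex()
idx.add(1, "distributed systems scale out")
idx.add(2, "relational systems scale up")
idx.add(3, "lucene indexes text")
assert idx.search("systems scale") == {1, 2}
assert idx.search("lucene") == {3}
```

Lucene adds scoring, tokenization, and on-disk segment files on top of this idea, but the intersection-of-posting-lists query shape is the reason it outperforms a row-scanning database for text search.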
Summary: Complexity and Time-to-Market Trade-Offs
The prevailing patterns used for scalability introduce complexity: we need to deal with partial-failure scenarios and handle idempotency as part of our application code. This high degree of complexity is the major reason most architects and developers start with simpler approaches that are not scalable, knowing that later they will need to completely re-design the application to meet their requirements. Second Life devoted an entire presentation to this topic.
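To make the idempotency burden concrete, here is one common approach, sketched with hypothetical names: when delivery is at-least-once, the handler records the ids of messages it has already applied and skips duplicates, so a redelivery cannot corrupt state:

```python
# Idempotent message handling: at-least-once delivery means the same
# message can arrive twice, so the handler tracks processed ids and
# ignores replays.

class AccountHandler:
    def __init__(self):
        self.balance = 0
        self.processed = set()  # ids of messages already applied

    def handle(self, message):
        if message["id"] in self.processed:
            return False                   # duplicate: ignore safely
        self.balance += message["amount"]
        self.processed.add(message["id"])  # real code must do this atomically
        return True

h = AccountHandler()
msg = {"id": "m-1", "amount": 25}
assert h.handle(msg) is True
assert h.handle(msg) is False  # redelivery has no effect
assert h.balance == 25
```

The comment about atomicity is the crux of the complexity argument above: in a real system, applying the change and recording the message id must happen in one transaction, or a crash between the two reintroduces the duplicate problem.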
Our challenge at GigaSpaces is to eliminate this contradiction as much as possible by making scalability simple, so that scalability patterns can be implemented easily from the get-go, allowing the business to grow incrementally as needed. This was actually the main topic of my presentation at QCon, titled Three Steps to Turning Your Tier-Based Spring Application into Scalable Services.
I'll provide more details on this approach in a future post.