I just returned from QCon San
Francisco. I really enjoyed the conference. The presentations and panels were
of very high quality, and the conference was intimate enough to allow
meeting interesting people in the industry. I particularly liked the discussion
with Brian Zimmer of Orbitz, who is using Jini as part of their backbone, as
well as the eBay team, who gave an excellent presentation on Friday on the eBay
architecture. Well done Floyd and Trifork (Aino, Roxanne and the rest of the team) for the organization of this event.
In the "Architectures You've Always Wondered About" track, Second Life, eBay, Yahoo, LinkedIn and Orbitz presented how they dealt with different aspects of their applications, such as
scalability. There were quite a few lessons that I learned that day that I thought
were worth sharing.
Disclaimer:
the information below is based on notes taken during the sessions. It is not
detailed coverage of the individual presentations, rather it is my personal
interpretation and conclusion from the aggregate. The original presentations
from this track are available here.
The Stack
This topic seems to be a rather emotional one, mainly between the LAMP and Java camps, as I
witnessed myself when I posted Why most large-scale web sites are not written
in Java. Although the LAMP stack is quite popular in this domain (large-scale
web apps), the environments are very heterogeneous, especially at the big
sites: certainly Yahoo and eBay, and this is partially true for Google and
Amazon as well. In fact, one of Google's applications uses GigaSpaces. An obvious
explanation for this heterogeneity is that many of the big sites
acquired and merged several companies, each with its own implementation stack.
Quite a few sites (LinkedIn, eBay, Orbitz, Yahoo Bix) use Java as their core
language. (Note that most of Yahoo's applications *do* use LAMP as their core
stack; Yahoo Bix is an exception.) The Linux-Apache-Tomcat-Spring-Hibernate
stack is common among the Java-using sites. If I recall correctly, only eBay
uses the full J2EE stack, but they seem to be using only a very small part of
its functionality. Second Life has a rather unusual architecture: they use
mostly C++ and Python, and said that they want to move to Web Services and SOA
as a way to address scalability. Orbitz uses Jini as their service framework, and
did interesting work with Spring to abstract Jini away from their developers. They
also developed remoting functionality with Spring to abstract their service
interactions.
Integration Strategy
Integration is a big issue for all of these sites, typically because of acquisitions. In the
case of eBay and Yahoo, the companies they bought were built with different
architectures, and in many cases, different implementation stacks. Their policy
is not to interfere with the implementation details (at least not in the early
stages), but focus on how they integrate quickly -- with the end-user
experience in mind. Both Yahoo and eBay built a common framework to address
these integration requirements. A great deal of effort was made on enabling a
common user identity management system (Single Sign-On), as well as a load-balancing
scheme. eBay chose Apache as their common module and extended it quite a bit
with their own modules. Yahoo built an identity management service designed for
extensibility. Extensibility here refers to the ability of each of their
applications to add its own data to the user profile, which can then be used
for personalization and other purposes.
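To make the extensibility idea concrete, here is a minimal sketch (my own illustration, not Yahoo's actual service) of a profile where each application writes into its own namespace, so the profile can grow without central schema changes:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical extensible user profile: each application stores attributes
// under its own namespace, so new applications can add data without
// affecting the others. Names and structure are illustrative only.
public class UserProfile {
    private final String userId;
    private final Map<String, Map<String, String>> attributesByApp =
            new ConcurrentHashMap<>();

    public UserProfile(String userId) { this.userId = userId; }

    // An application writes its own key/value pairs into its namespace.
    public void put(String app, String key, String value) {
        attributesByApp.computeIfAbsent(app, a -> new ConcurrentHashMap<>())
                       .put(key, value);
    }

    // Reads are scoped to the requesting application's namespace.
    public String get(String app, String key) {
        Map<String, String> appAttrs = attributesByApp.get(app);
        return appAttrs == null ? null : appAttrs.get(key);
    }

    public String getUserId() { return userId; }
}
```

Personalization code can then read whatever its own application wrote, while the Single Sign-On identity remains shared.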
Application Architecture
Not surprisingly, these applications were designed as tier-based architectures. When
they talk about partitioning, it typically applies to the data tier.
Database Tier
MySQL is definitely the most popular database. It's interesting (and surprising) to see the amount of
resources that organizations such as eBay and Google spent on extending MySQL, and
it's nice to see that they contributed those extensions back to the MySQL community.
According to Dan Pritchett (eBay), with their enhancements you can do pretty
much the same with MySQL as with an Oracle database. Also see The Future is
Cloudy, which gives some context on Google's extensions to MySQL. The Oracle
database is still being used by some sites, but usually in conjunction with
MySQL.
Most of the sites said that they store data in-memory to minimize the I/O overhead of the
database, but they do so for read-mostly scenarios, where memory is not relied
upon as the system of record. One of the presenters (I think it was Dan) mentioned
that the reason for the limited use of caching is the nature of their data usage
patterns: at any point in time, each of their servers can request any data item,
in no particular order. Because the volumes of data they deal with are huge,
they can't store it all in memory, and these inconsistent access patterns
minimize the potential performance gains from caching.
I think that this statement should be re-examined, as a lot of progress has been made in this area in
recent years, which can change many of the assumptions currently held in this
regard (but that is a topic for a different discussion). Many sites use memcached
as their caching layer. For example, this case study about TypePad's
architecture (which happens to host this blog) indicates the use of memcached
to store counts, sets, stats, and heavyweight data.
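As a rough illustration of the read-mostly pattern, here is a toy read-through cache in Java. I'm using a ConcurrentHashMap as a stand-in for memcached (this is not a real memcached client); the loader function plays the role of the database, which remains the system of record:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Toy read-through cache: repeated reads are absorbed in memory, while the
// loader (standing in for the database) remains the system of record.
public class ReadThroughCache<K, V> {
    private final Map<K, V> cache = new ConcurrentHashMap<>();
    private final Function<K, V> loader; // e.g. a database query
    private long hits, misses;

    public ReadThroughCache(Function<K, V> loader) { this.loader = loader; }

    public V get(K key) {
        V value = cache.get(key);
        if (value != null) { hits++; return value; }
        misses++;
        value = loader.apply(key); // fall through to the system of record
        if (value != null) cache.put(key, value);
        return value;
    }

    public long hits() { return hits; }
    public long misses() { return misses; }
}
```

The effectiveness of this pattern depends entirely on access locality, which is exactly the point the presenters made: with huge volumes and no locality, the hit rate stays low.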
Messaging Tier
To address scalability, there is a trend to move away from synchronous RPC approach
towards asynchronous communications (I discuss that more below when I get to
the topic of scalability). One would think that JMS would be used all over the
place to address this requirement. It seems, however, that almost every one of
the presenters said they built their own messaging stack. The reasons seem to
be:
- Requirements for efficient content-based
messaging: The type of messaging required is neither straight point-to-point
nor pub/sub; it is more associative in nature. It is a common requirement
to look at a message and browse through it before deciding whether to
select it (and the JMS selector interface is rather limited for this).
- Consistency: To avoid partial failures and the need for
distributed transactions, they store their events in the same partition as the
data. Routing messages and data to the same partition removes the need for
distributed transactions.
- They were able to fine-tune this layer based on their specific usage patterns and semantics.
LinkedIn refers to this type of messaging as a "data bus" - a good term for it.
This reminds me of the original reason I became interested in JavaSpaces, when I worked on a B2B exchange and had to deal with similar requirements. JavaSpaces
does exactly that: it provides a data bus that combines both
messaging and data in a single consistent implementation.
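To illustrate what associative selection looks like, here is a toy data bus in plain Java. This is a sketch of the idea, not the actual JavaSpaces/Jini API: a template whose fields are null acts as a wildcard, in the spirit of JavaSpaces' take(template):

```java
import java.util.Iterator;
import java.util.Objects;
import java.util.concurrent.ConcurrentLinkedQueue;

// Toy illustration of associative (template-based) matching: null fields in
// the template are wildcards, non-null fields must match the stored event.
// Not the real Jini/JavaSpaces API.
public class DataBus {
    public static class Event {
        final String type; final String partition; final String payload;
        public Event(String type, String partition, String payload) {
            this.type = type; this.partition = partition; this.payload = payload;
        }
        public String payload() { return payload; }
    }

    private final ConcurrentLinkedQueue<Event> events = new ConcurrentLinkedQueue<>();

    public void write(Event e) { events.add(e); }

    // Remove and return the first event matching the template, or null.
    public Event take(Event template) {
        for (Iterator<Event> it = events.iterator(); it.hasNext();) {
            Event e = it.next();
            if (matches(template, e)) { it.remove(); return e; }
        }
        return null;
    }

    private static boolean matches(Event t, Event e) {
        return (t.type == null || Objects.equals(t.type, e.type))
            && (t.partition == null || Objects.equals(t.partition, e.partition));
    }
}
```

Unlike a JMS selector, which filters on message headers, the match here is against the content of the event itself, which is what makes the selection "associative".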
Scalability as an Afterthought
A message that was consistently delivered during the conference by
almost all architects was that scalability shouldn't be dealt with as an
afterthought. While I agree with this statement, all of the case studies at
QCon revealed an interesting fact: most of the sites described were not
initially designed for scalability (there is a famous story about eBay starting
as a single DLL). Still, it seems that they were able to go through several re-architecture
cycles whenever scalability became a big issue. Is there a lesson to learn from
this?
My view is that because current approaches to scalability introduce a huge
level of complexity (not to mention the fact that many developers don't understand
scalability well), scalability and time-to-market are perceived as two
contradicting goals: an attempt to achieve one of them is
viewed as risking the other. The lesson could therefore be that you don't have
to implement the scalability patterns on day one, but you do need to be
aware of what scalability means. Even as you're making compromises to address
time-to-market requirements, you need to plan ahead for a change at the
appropriate time.
Werner Vogels (Amazon
CTO) expressed a similar sentiment when he said: "Scale later. It is soooo difficult to make it right, that sometimes the effort to do it up front is not
justified. Or leave it to somebody that has the knowhow and has done it
already…like Amazon (remember S3 - Virtual Disk, etc.)"
Scalability -- How to Do It Right
Here's what you've all been waiting for (drumroll):
- Asynchronous event-driven design: Avoid as much as possible
any synchronous interaction with the data or business logic tier. Instead, use
an event-driven approach and workflow.
- Partitioning/shards: You need to design your data model so
that it fits the partitioning model.
- Parallel execution: Use parallel execution to
get the most out of the available resources. A good place for it is
processing user requests: multiple instances of
each service take requests from the messaging system and execute them in
parallel. Another is using MapReduce to
perform aggregated requests on partitioned data.
- Replication (read-mostly): In read-mostly scenarios
(LinkedIn seems to fall into this category well), database replication can help
load-balance reads by splitting read requests among the replicated
database nodes.
- Consistency without distributed transactions: This was one
of the hot topics of the conference, and it sparked some discussion during
one of the panels I participated in. An argument was made that to reach
scalability you have to sacrifice consistency and handle it in your
application, using techniques such as optimistic locking and asynchronous
error-handling; it also assumes that you will need to handle idempotency in your code. My
argument was that while this pattern addresses scalability, it creates
complexity and is therefore error-prone. During another panel, Dan
Pritchett argued that there are ways to avoid this level of complexity and
still achieve the same goal, as I outlined in this
blog post.
- Move the database to the background: There was violent
agreement that the database bottleneck can only be solved if database
interactions happen in the background.
Quoting Werner Vogels again: "To scale: No direct access to the database anymore. Instead
data access is encapsulated in services (code and data together), with a stable,
public interface."
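As a small illustration of the partitioning point, here is a sketch of hash-based shard routing (the key choice and shard count are illustrative, not any presenter's actual code). The same routing key sends a data item and the events about it to one partition, which is how the presenters avoided distributed transactions:

```java
// Minimal sketch of hash-based partitioning ("shards"): a stable mapping
// from a routing key (e.g. a user id) to a shard index. Because the mapping
// is deterministic, data and the events about that data land on the same
// partition, so a local transaction suffices.
public class ShardRouter {
    private final int shardCount;

    public ShardRouter(int shardCount) { this.shardCount = shardCount; }

    // floorMod keeps the result non-negative even for negative hash codes.
    public int shardFor(String routingKey) {
        return Math.floorMod(routingKey.hashCode(), shardCount);
    }
}
```

A real deployment would also need to handle resharding when the shard count changes (e.g. via consistent hashing), which this sketch deliberately ignores.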
Other Tips
- Yahoo Bix – Developed a network sniffer to monitor calls to the database and
optimize their use of Hibernate. This represents one of the interesting
trade-offs of a database abstraction layer: by abstracting what's done behind
the scenes, you in a sense allow developers to write non-optimized code. The
sniffer exposed what happens behind the scenes, and that information was
used to optimize the Hibernate code.
- LinkedIn – Uses Lucene (and not a database) for
indexing. If you're looking for an efficient way to perform indexing and search
on your indexes, a database is probably not the right tool for the job; Lucene
provides a much more efficient implementation for that purpose. I would also
recommend using Compass. And here is some breaking news: I just heard from Shay Banon, the Compass project owner, that he is
working on Lucene integration over an in-memory clustered index (using GigaSpaces). This is quite exciting
because it will allow you to store a Lucene index in a distributed manner. It
will also allow you to index the content of a Space and perform a Google-like
search query!
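To illustrate why a dedicated index beats a database for search, here is a toy inverted index in Java. This is the core idea behind Lucene, not its actual API: a term lookup becomes a map access instead of the table scan that a SQL LIKE query degenerates to:

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy inverted index: map each term to the set of documents containing it,
// so search is a constant-time map lookup rather than a scan over rows.
public class InvertedIndex {
    private final Map<String, Set<String>> termToDocs = new HashMap<>();

    public void add(String docId, String text) {
        // Naive tokenization on non-word characters; Lucene's analyzers
        // are far more sophisticated (stemming, stop words, etc.).
        for (String term : text.toLowerCase().split("\\W+")) {
            if (!term.isEmpty())
                termToDocs.computeIfAbsent(term, t -> new TreeSet<>()).add(docId);
        }
    }

    public Set<String> search(String term) {
        return termToDocs.getOrDefault(term.toLowerCase(), Collections.emptySet());
    }
}
```

Everything else Lucene adds (ranking, phrase queries, segment merging) builds on this same structure.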
Summary: Complexity and Time-to-Market Trade-Offs
The prevailing patterns used for scalability introduce complexity. For
example, we need to deal with partial failure scenarios, as well as handling
idempotency, as part of our application code. This introduces a high degree of
complexity, and is the major reason why most architects and developers start
with simpler approaches that are not scalable, knowing that later they will need
to completely re-design the application to meet requirements. Second Life had an
entire presentation on this topic.
Our challenge at GigaSpaces is to eliminate this contradiction as much as
possible by making scalability simple, so that scalability patterns can be
easily implemented from the get-go, allowing the business to grow incrementally
as needed. This was actually the main topic of my presentation at QCon,
titled Three
Steps to Turning Your Tier-Based Spring Application into Scalable Services.
I'll provide more details on this approach in a future post.