
October 03, 2007

Why most large-scale Web sites are not written in Java

During the past few weeks I've had discussions with my colleague Geva Perry, trying to answer the question: why are most large-scale Web sites not written in Java?

There is a lot of information in the blogosphere describing the architecture of many popular sites, such as Google, Amazon, eBay, LinkedIn, TypePad, Wikipedia and others.

The folks at Pingdom compiled some of this information, drawing on material from High-Scalability:

[Figure: scalable Web architecture overview compiled by Pingdom]

Looking at these architectures, some observations come to mind: Most of these sites use LAMP as the core runtime stack. Some have gone so far as to develop their own file system (Google, with GFS). Some use caching to relieve the database bottleneck (memcached and the like). Many of them were forced to develop these solutions themselves, because at the time there was no ready-made alternative that could meet their requirements.

The application stack of these Web applications is very different from the stack that mission-critical applications in the financial world are built with. In the financial world, Java -- and to a lesser degree J2EE -- is used extensively. In recent years scalability requirements in capital markets led to a rapid shift in the middleware stack, introducing Compute Grid solutions for virtualization of CPU resources, enabling parallelization of batch applications. Data Grids were also introduced, enabling the virtualization of memory resources. Spring is becoming the common development framework in this world. At GigaSpaces, we're seeing more and more cases where Spring acts as a complete alternative to J2EE.

If we examine both worlds, we can see that both are facing similar challenges related to scalability. Not surprisingly, both ended up introducing similar solutions for addressing the scalability challenges:

On the Data Tier we see the following:

1. Adding a caching layer to take advantage of available memory and reduce I/O overhead
2. Moving from a database-centric approach to partitioning, aka shards
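
To make the two data-tier points concrete, here is a minimal sketch in Python. It is an illustration only, not how any of the sites mentioned above implement their data tier: the class name `ShardedStore`, the in-memory dictionaries standing in for databases, and the MD5-based key routing are all assumptions made for the example.

```python
import hashlib


class ShardedStore:
    """Toy data tier: several in-memory 'databases' (shards) behind a cache.

    Hypothetical example; real systems would use something like memcached
    for the cache and separate database servers for the shards.
    """

    def __init__(self, num_shards=4):
        # Each dict stands in for one database partition.
        self.shards = [dict() for _ in range(num_shards)]
        # Read cache kept in memory to avoid hitting a shard on every read.
        self.cache = {}
        # Counts how often a read had to go to a shard (a cache miss).
        self.backend_reads = 0

    def _shard_for(self, key):
        # Route a key to a shard with a stable hash, so the same key
        # always lands on the same partition.
        digest = hashlib.md5(key.encode("utf-8")).hexdigest()
        return self.shards[int(digest, 16) % len(self.shards)]

    def put(self, key, value):
        self._shard_for(key)[key] = value
        # Invalidate any cached copy so readers never see a stale value.
        self.cache.pop(key, None)

    def get(self, key):
        # Cache-aside read: try memory first, fall back to the shard.
        if key in self.cache:
            return self.cache[key]
        self.backend_reads += 1
        value = self._shard_for(key).get(key)
        if value is not None:
            self.cache[key] = value
        return value
```

Reading the same key twice touches the shard only once; the second read is served from memory. Note that simple modulo routing like this makes adding shards later expensive, since most keys would move; that trade-off is outside the scope of this sketch.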

On the Business Logic Tier:

3. Adding parallelization semantics to the application tier (e.g., MapReduce)
4. Moving to scale-out application models to achieve linear scalability
5. Moving away from the classic two-phase commit and XA for transaction processing (see: Lessons from Pat Helland: Life Beyond Distributed Transactions)
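
As a rough illustration of the parallelization semantics mentioned in item 3, the following Python sketch counts words in the MapReduce style. The function names (`map_count`, `reduce_counts`, `word_count`) are hypothetical, and a local thread pool stands in for what would be a cluster of machines in a real deployment; it shows the shape of the programming model rather than any particular framework.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def map_count(chunk):
    # Map step: each worker counts the words in its own chunk,
    # independently of every other chunk.
    return Counter(chunk.split())


def reduce_counts(left, right):
    # Reduce step: merge two partial results into one.
    return left + right


def word_count(chunks, workers=4):
    # Run the map step over all chunks in parallel, then fold the
    # partial counts together. Threads are used here only to keep the
    # example self-contained.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(map_count, chunks))
    return reduce(reduce_counts, partials, Counter())
```

Because the map step shares no state between chunks, adding workers (item 4, scale-out) requires no change to the application logic; only the merge step has to see more than one chunk's result.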

While there are many similar challenges and, to a certain degree, similar architectures, the two worlds (Web and financial) seem to have taken different routes when it comes to the application stack.

Over at the High-Scalability site, someone posted the question: "Why doesn't anyone use j2ee?"
The answers given in that post can be summarized as follows:

1. LAMP provides a cost-effective solution (most of it relies on a *free* open source stack).
2. Java is still used, but not as the primary language; it serves as one component in either the back-end or the front-end (e.g., servlets).

I have my own thoughts on this matter, but before I jump in I'd be very interested to hear whether anyone has a reasonable explanation for it.

Thoughts?

UPDATE (October 11, 2007): This post generated a very active debate in several places, including TheServerSide, and more recently, on Artima. In this post I respond and give some additional thoughts.
