During the past few weeks I've had discussions with my colleague Geva Perry, trying to answer the question: why are most large-scale Web sites not written in Java?
There is a lot of information in the blogosphere describing the architecture of many popular sites, such as Google, Amazon, eBay, LinkedIn, TypePad, Wikipedia and others.
The folks at Pingdom compiled some of this information, drawing on material from High-Scalability.
Looking at these architectures, some observations come to mind: Most of these sites use LAMP as the core runtime stack. Some have gone so far as to develop their own file system (Google, with GFS). Some use caching to relieve the database bottleneck (memcached and the like). Many of them were forced to develop these solutions themselves, because at the time there was no ready-made alternative that could meet their requirements.
The application stack of these Web applications is very different from the stack that mission-critical applications in the financial world are built with. In the financial world, Java -- and to a lesser degree J2EE -- is used extensively. In recent years scalability requirements in capital markets led to a rapid shift in the middleware stack, introducing Compute Grid solutions for virtualization of CPU resources, enabling parallelization of batch applications. Data Grids were also introduced, enabling the virtualization of memory resources. Spring is becoming the common development framework in this world. At GigaSpaces, we're seeing more and more cases where Spring acts as a complete alternative to J2EE.
If we examine both worlds, we can see that both are facing similar challenges related to scalability. Not surprisingly, both ended up introducing similar solutions for addressing the scalability challenges:
On the Data Tier we see the following:
1. Adding a caching layer to take advantage of available memory and reduce I/O overhead
2. Moving from a database-centric approach to partitioning (a.k.a. sharding)
On the Business Logic Tier:
3. Adding parallelization semantics to the application tier (e.g., MapReduce)
4. Moving to scale-out application models to achieve linear scalability
5. Moving away from the classic two-phase commit and XA for transaction processing (See: Lessons from Pat Helland: Life Beyond Distributed Transactions)
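To make the data-tier points above a bit more concrete, here is a minimal Java sketch of a partitioned, in-memory cache sitting in front of a slower store. It is only an illustration under my own assumptions, not how any of the sites mentioned actually implement it: the class name `ShardedCacheSketch`, the `backingStore` function (standing in for the database) and the hash-modulo routing are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

/** Hypothetical illustration: a partitioned (sharded) read-through cache. */
class ShardedCacheSketch {
    // One in-memory map per shard; in a real deployment each would live on its own node.
    private final List<Map<String, String>> shards = new ArrayList<>();
    // Assumption: stands in for the database; must return a non-null value.
    private final Function<String, String> backingStore;
    // Counts how often we actually had to go to the backing store.
    private final AtomicInteger storeReads = new AtomicInteger();

    ShardedCacheSketch(int shardCount, Function<String, String> backingStore) {
        if (shardCount <= 0) {
            throw new IllegalArgumentException("shardCount must be positive");
        }
        for (int i = 0; i < shardCount; i++) {
            shards.add(new ConcurrentHashMap<>());
        }
        this.backingStore = backingStore;
    }

    /** Routes a key to a shard; floorMod keeps the index non-negative. */
    int shardFor(String key) {
        return Math.floorMod(key.hashCode(), shards.size());
    }

    /** Serves from memory when possible, otherwise loads from the store and caches the result. */
    String get(String key) {
        return shards.get(shardFor(key)).computeIfAbsent(key, k -> {
            storeReads.incrementAndGet();
            return backingStore.apply(k);
        });
    }

    int storeReads() {
        return storeReads.get();
    }

    public static void main(String[] args) {
        ShardedCacheSketch cache = new ShardedCacheSketch(4, key -> "row-for-" + key);
        cache.get("user:42"); // first lookup goes to the backing store
        cache.get("user:42"); // second lookup is served from memory
        System.out.println("shard=" + cache.shardFor("user:42")
                + ", store reads=" + cache.storeReads());
    }
}
```

The second lookup never touches the backing store, which is the whole point of the caching layer. Note that simple hash-modulo routing remaps most keys whenever the shard count changes, so a real system would more likely use something like consistent hashing.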
While there are many similar challenges and, to a certain degree, similar architectures, it seems that the two worlds (Web and Financial) took different routes when it comes to the application stack.
Over at the High-Scalability site, someone posted the question: Why doesn't anyone use j2ee?
The answer given in that post can be summarized as follows:
1. LAMP provides a cost-effective solution (most of it relies on a *free* open source stack).
2. Java is still used, but not as the primary language, i.e., it is used as one component either in the back-end or the front-end (e.g., servlets).
I have my own thoughts on this matter, but I'd be very interested to see whether anyone has a reasonable explanation for it before I jump in.
Thoughts?
UPDATE (October 11, 2007): This post generated a very active debate in several places, including TheServerSide, and more recently, on Artima. In this post I respond and give some additional thoughts.