Why most large-scale Web sites are not written in Java
During the past few weeks I've had discussions with my colleague Geva Perry, trying to answer the question: why are most large-scale Web sites not written in Java?
There is a lot of information in the blogosphere describing the architecture of many popular sites, such as Google, Amazon, eBay, LinkedIn, TypePad, Wikipedia and others.
The folks at Pingdom compiled some of this information, based on data from High-Scalability.
Looking at these architectures, some observations come to mind: Most of these sites use LAMP as the core runtime stack. Some have gone so far as to develop their own file system (Google, GFS). Some use caching to solve the database bottleneck (memcached and the like). Many of them were forced to develop these solutions themselves, as at the time there was no ready-made alternative that could meet their requirements.
The application stack of these Web applications is very different from the stack that mission-critical applications in the financial world are built with. In the financial world, Java -- and to a lesser degree J2EE -- is used extensively. In recent years scalability requirements in capital markets led to a rapid shift in the middleware stack, introducing Compute Grid solutions for virtualization of CPU resources, enabling parallelization of batch applications. Data Grids were also introduced, enabling the virtualization of memory resources. Spring is becoming the common development framework in this world. At GigaSpaces, we're seeing more and more cases where Spring acts as a complete alternative to J2EE.
If we examine both worlds, we can see that both are facing similar challenges related to scalability. Not surprisingly, both ended up introducing similar solutions for addressing the scalability challenges:
On the Data Tier we see the following:
1. Adding a caching layer to take advantage of memory resources availability and reduce I/O overhead
2. Moving from a database-centric approach to partitioning, aka shards
On the Business Logic Tier:
3. Adding parallelization semantics to the application tier (e.g., MapReduce)
4. Moving to scale-out application models to achieve linear scalability
5. Moving away from the classic two-phase commit and XA for transaction processing (See: Lessons from Pat Helland: Life Beyond Distributed Transactions)
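To make the first two items concrete, here is a minimal sketch of a cache-aside read path in front of a hash-partitioned (sharded) store. It is an illustration only, not how any of the sites above actually implement it: the class name ShardedStoreSketch and its methods are hypothetical, plain in-memory maps stand in for both the databases and a cache such as memcached, and the code is not thread-safe.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: cache-aside reads over a hash-partitioned ("sharded") store.
// In-memory maps stand in for the shard databases and for the cache.
class ShardedStoreSketch {
    private final List<Map<String, String>> shards = new ArrayList<>();
    private final Map<String, String> cache = new HashMap<>();
    private int shardReads = 0; // number of reads that had to go to a shard

    ShardedStoreSketch(int shardCount) {
        if (shardCount <= 0) {
            throw new IllegalArgumentException("shardCount must be positive");
        }
        for (int i = 0; i < shardCount; i++) {
            shards.add(new HashMap<>());
        }
    }

    // Route a key to a shard by hashing it; floorMod keeps the index non-negative.
    int shardFor(String key) {
        return Math.floorMod(key.hashCode(), shards.size());
    }

    // Write to the owning shard and invalidate any cached copy.
    void put(String key, String value) {
        shards.get(shardFor(key)).put(key, value);
        cache.remove(key);
    }

    // Cache-aside read: try the cache first, fall back to the shard, then populate the cache.
    String get(String key) {
        String cached = cache.get(key);
        if (cached != null) {
            return cached;
        }
        shardReads++;
        String value = shards.get(shardFor(key)).get(key);
        if (value != null) {
            cache.put(key, value);
        }
        return value;
    }

    int shardReads() {
        return shardReads;
    }

    public static void main(String[] args) {
        ShardedStoreSketch store = new ShardedStoreSketch(4);
        store.put("user:42", "alice");
        store.get("user:42"); // goes to the owning shard
        store.get("user:42"); // served from the cache
        System.out.println("shard reads: " + store.shardReads());
    }
}
```

One design caveat with this simple modulo routing: changing the number of shards remaps most keys, so a real deployment would probably need a lookup directory or a consistent-hashing scheme instead.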
While there are many similar challenges and, to a certain degree, similar architectures, it seems that the two worlds (Web and financial) took different routes when it comes to the application stack.
Over at the High-Scalability site, someone posted the question: Why doesn't anyone use j2ee?
The answer given in that post can be summarized as follows:
1. LAMP provides a cost-effective solution (most of it relies on *free* open source stack).
2. Java is still used, but not as the primary language, i.e., it is used as one component either in the back-end or the front-end (e.g., servlets).
I have my own thoughts on this matter, but I'll be very interested to see if anyone has any reasonable explanation for it, before I jump in.
Thoughts?
UPDATE (October 11, 2007): This post generated a very active debate in several places, including TheServerSide, and more recently, on Artima. In this post I respond and give some additional thoughts.


Funny that you only show simple sites.
Most banks and airlines use J2EE. Most major commerce sites do as well.
The ones listed are all sites that started as personal and small scale projects. Most people will buy more servers before porting an application to a new platform.
Posted by:Paul | October 03, 2007 at 10:07 PM
I don't think it's for lack of trying. I think the majority of new large scale web apps TRY to use Java. I just think they tend to fail miserably (as most large scale projects do regardless of language). The reason Perl and PHP work so well for people is that it's cheap and easy to get started, and the sites grow into being large sites.
Posted by:jc | October 03, 2007 at 10:23 PM
Well, I also think that this list contains sites with relatively simple applications. They may be data intensive, but I don't see a lot of business logic running behind the scenes. Taking LAM aside, when you look at the programming language and the presentation tier, well, it's just much simpler to do this with a scripting language such as PHP, Perl or Python, but still, you can probably find lots of "heavier" sites that use Java for the back-end, and one of the Ps for the front.
Posted by:Michael | October 03, 2007 at 10:24 PM
Um Ebay? Adwords? Gmail? Amazon? All written in Java. In fact I went to a talk at Java One where the Ebay engineers talked about throwing out their C++ system and rewriting the whole thing in Java split up among several thousand Appservers because they needed to scale their development process.
Posted by:justin | October 03, 2007 at 10:29 PM
One more... I also recently went to a talk at hi5.com which is the 10th biggest site in the world according to Alexa and their whole architecture is Java/Linux/PostgreSQL.
Posted by:justin | October 03, 2007 at 10:33 PM
I agree with other posts: a lot of big java applications are just not mentioned here.
Demetrio
Posted by:Demetrio Filocamo | October 03, 2007 at 11:00 PM
I think that you are looking where the light is.
Statistics is statistics is statistics...
If you give me the time, I will conduct a statistical research which will prove pretty much anything you'd like me to.
Few things that come to mind are:
What does the control group look like?
Did they look at product development effort (dev, test, support)? Release cycles?
Did they look at TCO?
Did they look at volumes / scales?
etc' etc'
Posted by:Guy Sayar | October 03, 2007 at 11:12 PM
hi there,
apologies if this has duplicate content(i skipped reading prior comments).
I believe that it is NOT easy to deploy(host) a java app. It is much more expensive to host java apps. Also, there are more people who've had experience hosting PHP/perl/etc apps than java.
I'm expecting Sun to rise to the challenge and make glassfish easily hosted on solaris/linux.
also, java was NOT open source. I'm sure that is NOT a very big concern BUT the main guys who built the app would decide what they would use to build the app. Friendster used java servlets but a later revision rebuilt it using Php.
eBay was able to use sun's team to help build its java apps. Not all teams had access to java/j2ee expertise -- especially startups.
Java 6 performance is really so good with tomcat/glassfish/jboss and there are pretty good caching solutions as open source, TODAY, there's hardly a reason NOT to choose java EXCEPT for deployment(hosting) expertise.
BR,
~A
Posted by:anjan bacchu | October 03, 2007 at 11:51 PM
The Pingdom data primarily lists social network and blog sites. It is fairer to ask: "Why are most social network Web sites, regardless of size, not written in Java?"
I tend to think an important part of the answer is cultural. Most of the people that I know who went off to work on or to start social network sites were web guys - that not so rare breed of people who love eye candy above all else. Most of the Java guys I know are not into eye candy; it's the PHP, Ruby and Perl crowd that is.
I also don't think that the cost of deployment is a primary factor, after all you can develop a web site in Java for free as well.
Posted by:Shane Isbell | October 04, 2007 at 12:18 AM
"LAMP provides a cost-effective solution (most of it relies on *free* open source stack)."
I swear, if I see this pointless drivel again...
So explain to me how things like GlassFish, Eclipse, Derby, PostgreSQL, etc. are not free!?!?!?
This is absolutely idiotic!
Posted by:Commenter | October 04, 2007 at 12:32 AM
You write intelligently, but it appears you did not bother to apply that intelligence to the content behind the writing.
The "question" you pose is ridiculously misguided and is instead you looking for a question to an answer you have already determined.
If you are asking why the sites you've listed don't use Java, well that's a different story and it'd be nice to hear from the developers of those sites.
As to "most" not using Java... you might as well ask why "most bloggers make baseless comments about languages in the form of a question".
You've lost all credibility in my book...
Posted by:Dantelope | October 04, 2007 at 01:37 AM
I agree with Anjan that main reason may be background and cultural. In addition to shared environment (social networks) many of these sites started as small projects, with no budget. The availability of volunteers with PHP/Web and LAMP skills is much bigger than volunteers with enough J2EE experience - it is just sheer size of platform that makes J2EE tougher.
Even if it certainly is possible to build completely free (== no cost) solution with Java as with LAMP (Tomcat/JBoss/Hibernate/MySQL or similar), when it comes to hosting, the footprint of the J2EE solution is bigger and will cost probably more than simple LAMP stack.
Another factor that you did not consider is visibility - many very heavily used systems written in Java do not get much attention because of the mundane, boring, non Web-2-ish nature of service they are providing.
But these are of course just speculations. I do not think we have enough data to draw any conclusions ...
Miro
Posted by:Miro | October 04, 2007 at 02:02 AM
I'm sorry that you feel that the question in the blog is misguided; that wasn't my intent.
That's definitely one of the questions. I would be happy to hear your view on this matter.
The other question that I found interesting (it is more at the architecture level than about the use of a particular language) is:
Why it appears that similar challenge (Scalability) was addressed by different application stack in Web and Financial industries?
I'm basically trying to see if there is anything we can learn from that.
Posted by:Nati Shalom | October 04, 2007 at 08:24 AM
Writing a scalable app is hard in any language. And no language or software that I know of tries to solve the problem from A to Z (data, CPU, memory virtualization, but also latency and network optimization, as your computers will end up in different datacenters across the world).
For example, in Java we have JDBC. JDBC does not help at all in dealing with shards. I'm not aware of any framework that forces you to deal with shards. Shards may be a pain, but they are mandatory for big sites.
And what to say about the database? They sell you big databases with many features that in the end you cannot use if you want to scale!
Building a scalable site imposes some constraints on the way you do things, on the way you code. Nobody has formalized all these constraints and created a framework that would guarantee that a site can scale indefinitely on low-cost PCs.
Posted by:HiHo | October 04, 2007 at 08:43 AM
Hi there,
In my opinion, the fact that most large scale web sites are not written in Java is a result of both technical and financial reasons.
From a technical perspective:
• developing in Php / Perl is very fast and simple whereas JEE is more complex
• historically speaking the knowledge, hosting services and developers are more available
• LAMP proved to be stable and common whereas JEE was more of an infrastructure
• JEE requires application servers that sometimes are overkill for a web system
• The light web languages (Php/Perl) are more flexible to changes in the short run (as part of poor architecture that is based on Non-MVC, of course in the long run the cost of any change is dramatically higher)
• The deployment and testing of java application is far slower and requires relatively strong machines
From a financial perspective:
• JEE developers are far more expensive than Perl / Php
• The learning curve and time to market are longer
• Hosting of JEE application servers is more expensive
In general the light web languages gave faster results and were able to react to a fast changing environment
The websites that were emerging at the end of the 90s went through a lot of changes that were as drastic as rewriting the application from scratch. Most web applications were 80% stateless, and load balancing provided a good solution for scaling.
Regardless of all the “pros” for Php/Perl, I believe that Java will increase dramatically:
Today, when the market is larger and more mature, there is an emerging demand for performance and scalability. Design patterns and open source utilities have a substantial impact on reducing time and cost and on giving more stability and flexibility to JEE applications. Today a JEE application reacts faster and better to changes (if designed correctly), and it provides a more integrated and consistent application than any other language available.
Cheers,
Mickey
Posted by:Mickey Ohayon | October 04, 2007 at 08:46 AM
> Why it appears that similar challenge (Scalability) was addressed by different application stack in Web and Financial industries?
I think that web sites need to scale at the lowest possible cost. In the financial industry they have the money to use their own datacenters, higher end servers, pay for software licenses...
Posted by:HiHo | October 04, 2007 at 08:47 AM
There is no Java hacker in this world :-). Even if they hack on a particular problem, it would take a long time to complete, or they are simply mad. The same can be done in much less time in Perl/Python/Ruby/etc.
I can't agree that Java is not used because it is not an open source language. There are plenty of awesome open source frameworks available in Java for web development, and we can get a lot of help through the community. People very rarely change the language core, and even if they do, there would be a lot of questions about the language's scalability.
Many banks and institutions choose Java/C# because it is the current fashion of the computer era. There are companies who advertise Java/C#. It is more a management choice than a programmer's choice. Your manager advises you to think outside the box while giving you an off-the-shelf component. Is that not rude? :-)
Linux/open source users tend to contribute more to the community because the community is the one who helped them fix their own (yes) issues. Microsoft and others would charge you if you need any help from them :-). They charged us for the undocumented bugs in their own software.
Scripting language (with Meta-language feature) would provide unimaginable flexibilities and new thoughts.
Posted by:Krish | October 04, 2007 at 12:06 PM
I will make an attempt to answer Nati's question:
Why it appears that similar challenge (Scalability) was addressed by different application stack in Web and Financial industries?
Basing it on my experience working for a large, primarily NON-IT company, I can see the mentality that drives the financial world when choosing technology. More than choosing the technology they like to choose Vendors and Partners that would drive their technology. For example, IBM has invested heavily on J2EE and IBM happens to have an everlasting presence in such large companies. Similarly for Oracle, Weblogic. These companies offer a lot more than just a j2ee server. They offer their suite of products and support. These are the driving factors for a non - IT company. The non-IT firms believe going open source way is to not get complete professional support (I disagree though).
Putting things into perspective, I think to a large extent it makes sense for financial , manufacturing etc companies to have their IT be handled by a vendor rather than invest on technologies that need people at hand.
What do you think?
Posted by:Akshay | October 04, 2007 at 04:36 PM
Scaling JEE applications is hard? Have you heard of Terracotta? It makes scaling so easy a caveman can do it...
ps. and it is open source.
http://www.terracotta.org/ I don't work for the company. I've been developing enterprise apps for about 9 years now, and I used Terracotta in my last one and it worked flawlessly. I'm just spreading the word on a good solution to easily scale your apps.
The reason many of these social sites are not written in Java is that they started out as small projects. There are numerous financial, government and private systems that are "mission critical" and scaled, written in Java/EJBs/Spring, etc.
Posted by:Jeryl Cook | October 04, 2007 at 05:01 PM
What about bebo.com ? Isn't it written in Java ? Can anyone confirm ?
Posted by:David Lee | October 04, 2007 at 05:02 PM
Guys, I guess you guys are just thinking too much on technical side. Think about the processes. For security, deployment, pluggable interfaces, SCM, resource management and all that, they involve tons of paperwork or processes to simply put something in place in production. It's not just "Hey Joe, give me write access to the /httpdoc folder so I can deploy my php files". It doesn't work that way in large enterprises. Think about a security department involving 10 people signing off the paper for adding two people in the production to have write access to only one file, do you know how long does it take and how complicated that is? All these are for auditing purposes; the more people work together, the more audit process it requires. Technically, it can be done in both PHP and Java but when it comes to management, JEE and applications give you much more than just technical to make your life easier. This is just my 2 cents.
Posted by:Andy | October 04, 2007 at 05:15 PM
I have read almost all the comments above and agree with a lot of them that a JEE solution is expensive and complicated, but I think we should also look at the fact that these websites, even though they were startups earlier, were built by people who are well aware of all the technical solutions available in the market, and they still chose LAMP because it is easy and fast. These people were also aware that their sites might grow bigger, and as they grew, they still worked well, so we can't argue that these technologies are not scalable. It's only a matter of how you use these technologies to work efficiently for you (of course at a minimum cost..)
Posted by:Rupinder Bir | October 04, 2007 at 05:18 PM
A couple of reasons scalability evolves differently has to do with money and requirements. First, finance and telecom - which are remarkably similar in many ways - have a lot of money to spend on hardware: it is much easier and faster to scale with load balancers, HA databases (such as Oracle), reverse proxies and so on. I've seen web companies, which have less money, handle load balancing at the application level. This scales very poorly, but they save money in the short run. There is no doubt budget plays a part here.
The second difference has to do with high availability requirements. When you are dealing with financial and telecom companies the high availability, fault tolerance and geographic redundancy requirements are much higher than web companies. MySpace is slow with many failed requests but who really cares. If the same thing happens when people are making trades, it's a different matter. Here we are left with two choices (or some combination thereof): use database partitioning or have an expensive cluster - by this I mean two or more active, replicated instances - of databases. If I have the money, using a cluster is a lot easier and quicker to scale. If I have a cheaper DB, like MySQL or SQLServer, this option isn't even on the table from a technical standpoint: I have to go with database partitioning (shards being one example) for scalability.
I would also like to stress that many developers in web companies don't understand how to leverage network architecture but rather think about solving the problem from an application side, hence the focus on database partitioning as the primary option for scalability.
I know there are exception cases to what I have said, Google being one, as they have a lot of money but do database partitioning; they also build their own load-balancers. This is a case of a software company that has the expertise, the money, and generally people who like to geek out thinking it would be a really neat thing to do.
Posted by:Shane Isbell | October 04, 2007 at 09:08 PM
Hi ,
I myself had been struggling with the ease factor that LAMP provides over J2EE.
A lot of the financial companies that I have worked with use J2EE. J2EE certainly scales, but it depends on how you build the system. If you build a truly stateless application, it will absolutely scale. I have seen that scale in one of the biggest brokerages in the US.
The problem happens when a lot of technologies are pushed into J2EE without the disclaimer: "Be cautioned, using this will make your application non-scalable". JSF leads the pack there. It does not scale, because it adds so much state to the app.
So I guess you can build a non-scalable app even in LAMP, and vice versa.
More and more I have come to believe that scalability and performance have now become a function of how well the system is architected rather than a function of J2EE or LAMP.
comments ?
Thanks ,
Krish .
Posted by:Krishnendu Majumdar | October 05, 2007 at 06:38 AM
That's a kind of a question. For what it's worth (maybe 2 cents), here are my thoughts.
I have to say that I am a Java guy; I have years of experience in financial institutions and telcos. I have always worked on large scale enterprise B2B or B2C systems where scalability and performance were at the top of the list of requirements. I've read that Java would lead to higher deployment costs, that the LAMP stack is scalable at the application level, that Java developers are more expensive than anybody else, etc... That's partially true indeed. But I'd say that the difference is that financial institutions and telcos have a more conservative profile when it comes to technology risk management. A heterogeneous stack might be cheaper at first glance, but the total associated risk is higher. Finance and telco would rather accept a slightly less performant and (probably) more expensive technology because it's more manageable and stable. Also, web companies are usually more aggressive and use "cutting edge" technology, whereas finance and telco are usually a step behind.
Posted by:Angelo Andreetto | October 05, 2007 at 10:16 AM