Amazon recently released a new service, SimpleDB, which provides simple persistence and query services for applications running on the Amazon Elastic Compute Cloud (EC2). SimpleDB is part of Amazon's web services (or cloud computing services) strategy and seems to compete quite effectively with Google's alternative stack, as noted by Nitin Borwankar in his excellent post Amazon SimpleDB 101 & Why It Matters:
... a side-by-side comparison makes it clear that Amazon WS in general – and SimpleDB in particular — is superior, for the following reasons:
- Google’s offerings – not only BigTable but GoogleBase, Gdisk, etc. — all have an ad hoc, grab-bag-of-tools feeling to them, devoid of any integrated strategy. Or if there is one, it is well-hidden.
- Amazon WS clearly involves a well-designed master plan aimed at changing the face of software as a service, each new offering akin to a chess piece in a game focused on creating strategic long-term value. And with SimpleDB, the queen has moved to the center.
- Amazon WS is based on the YOYODA principle — You Own Your Own Data, Always. Along with Amazon S3, SimpleDB is a sharp arrow in the quiver of open data proponents.
- Amazon WS includes a built-in, flexible payment system so users are neither forced to offer their app for free nor have an “ad-supported” model forced upon them. Now you can build a data-based web app on SimpleDB and seamlessly charge for it.
SimpleDB is not a database
It's important to note that SimpleDB is not a database per se, and I think the name SimpleDB is more confusing than helpful. There is a very detailed technical comparison between traditional databases and SimpleDB here. The long list of limitations, in areas such as transactional support, query semantics and consistency, is a clear indication that SimpleDB is not just another database and shouldn't be measured as one.
SimpleDB has some thought-provoking capabilities. One of them is "eventual consistency", which means that updates may take time to propagate among all the SimpleDB instances. During that window, a read served by one of the copies may return stale data; once the update has propagated, all copies agree again. Another interesting feature is the ability to add new fields and indexes to existing items on the fly. There is a small caveat, though: all SimpleDB attribute values are strings. This can be seen as either a limitation or an advantage, depending on how you look at it.
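To make the all-strings caveat concrete: when values are compared as strings, numbers sort lexicographically, so "100" comes before "25". A common workaround is to zero-pad numbers before storing them. The following is only a minimal sketch in plain Python, not SimpleDB client code; the helper name `encode_int` and the width of 10 digits are assumptions made for illustration.

```python
def encode_int(value: int, width: int = 10) -> str:
    """Zero-pad a non-negative integer so that string order matches numeric order.

    Hypothetical helper for illustration; not part of any SimpleDB library.
    """
    if value < 0:
        raise ValueError("this sketch handles non-negative integers only")
    text = str(value)
    if len(text) > width:
        raise ValueError("value does not fit in the chosen width")
    return text.zfill(width)


prices = [9, 100, 25]

# Plain string comparison puts "100" before "25" and "9".
naive = sorted(str(p) for p in prices)

# Zero-padded strings sort in the same order as the numbers they encode.
padded = sorted(encode_int(p) for p in prices)
```

The design choice here is to pay the cost at write time: every numeric attribute is encoded once, and range comparisons on the stored strings then behave as expected.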
The list of limitations raises two questions: what is SimpleDB good for, and when should I consider using it? To answer them, I'll start by outlining what seems to be Amazon's main motivation for introducing this new service in the first place:
Motivation behind SimpleDB:
1. Complexity of database set-up in a cloud computing environment. Setting up a database cluster in a regular data center is very complex. Setting up a database cluster in a cloud computing environment is significantly more complex. SimpleDB provides a built-in clustering model designed from the get-go to run in the cloud.
2. Cost-effectiveness. The cost model of EC2 is pay-per-use. Most existing database licensing models don't fit this model. SimpleDB is charged per-use in the same way other compute resources in the Amazon Web Services framework are charged.
3. Optimized for read-mostly web applications. The relational database model is too complex for handling the sparse, unnormalized data that is typical in web applications. SimpleDB was designed to handle only this type of data (more details below) and is therefore a much simpler alternative to traditional databases, whose complexity is mostly due to the very rigid consistency requirements they were designed for.
What is SimpleDB good for?
The features (and limitations) provided by SimpleDB seem to be driven by the needs of e-commerce applications. One typical need of these applications is maintaining a product catalog. Quite often that catalog will vary in structure from one item to another. The same kind of item might have a different set of attributes (which vary, for example, from one vendor to another) and several different values for the same attribute. All of this calls for a much more loosely defined schema.
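The loosely defined catalog described above can be sketched with a small in-memory model. This is a toy stand-in written for illustration, not the SimpleDB API: the class `Catalog` and its `put` and `find` methods are hypothetical names, and the product data is made up. Each item carries its own set of attributes, an attribute may hold several values, and every value is stored as a string.

```python
from collections import defaultdict


class Catalog:
    """Toy schemaless item store (illustrative only, not the SimpleDB API)."""

    def __init__(self):
        # item name -> {attribute name -> set of string values}
        self.items = {}

    def put(self, name, **attrs):
        """Add attribute values to an item, creating the item if needed."""
        item = self.items.setdefault(name, defaultdict(set))
        for key, value in attrs.items():
            item[key].add(str(value))

    def find(self, key, value):
        """Return the sorted names of items whose attribute `key` contains `value`."""
        return sorted(
            name
            for name, attrs in self.items.items()
            if str(value) in attrs.get(key, ())
        )


catalog = Catalog()
catalog.put("book-1", title="Dune", author="Frank Herbert")
catalog.put("shirt-7", color="blue", size="M")
catalog.put("shirt-7", color="white")  # second value for the same attribute

blue_items = catalog.find("color", "blue")
white_items = catalog.find("color", "white")
```

Note that the book and the shirt share no attributes at all, and no schema had to be declared or altered to store either of them.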
From a consistency perspective, an "eventually consistent" state is good enough in most cases. A period of a few seconds of inconsistency isn't a big deal in many e-commerce apps.
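What that window of inconsistency looks like to a reader can be shown with a deliberately simplified simulation. It assumes a write is applied to one replica immediately and reaches the others only when `propagate` is called; the class `EventuallyConsistentStore` and its methods are hypothetical, and real replication in SimpleDB is not exposed or controlled this way.

```python
class EventuallyConsistentStore:
    """Toy model of replicas that converge only after propagation."""

    def __init__(self, replicas=3):
        self.replicas = [{} for _ in range(replicas)]
        self.pending = []  # (replica index, key, value) updates not yet applied

    def write(self, key, value):
        # The first replica sees the write at once; the rest are queued.
        self.replicas[0][key] = value
        for index in range(1, len(self.replicas)):
            self.pending.append((index, key, value))

    def read(self, key, replica):
        # A read is served by a single replica and may therefore be stale.
        return self.replicas[replica].get(key)

    def propagate(self):
        # Apply queued updates in order; afterwards all replicas agree.
        for index, key, value in self.pending:
            self.replicas[index][key] = value
        self.pending.clear()


store = EventuallyConsistentStore()
store.write("stock", "12")

stale = store.read("stock", replica=2)   # the update has not arrived yet
store.propagate()
fresh = store.read("stock", replica=2)   # replicas have converged
```

For a product catalog, the stale read simply means a shopper briefly sees the old value, which is usually an acceptable price for the simpler clustering model.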
I can also see how SimpleDB can be useful for maintaining directory service information, such as user profiles, configuration management information and so on. It can also be used as a shared repository between multiple sites.
Final words
SimpleDB seems to address a need that I have seen referred to as Document-Driven Databases, in which records aren’t grouped by their structure but by their attributes. ORM tools, such as Hibernate or the Active Record pattern, attempt to address this requirement by hiding the underlying relational model. They still inherit the complexity and limitations of that model, however. Having said that, SimpleDB is clearly not a solution for every scenario. In fact, it solves only a limited set of scenarios, such as the one described above. Because it is a disruptive technology, I expect that it will take some time before there are enough experience and established patterns to use it correctly in an architecture.
SimpleDB was introduced mainly because of the limitations of existing database implementations and how poorly they fit the cloud computing model. There are other approaches that can be used to address these limitations, some of which I covered in my recent posts PaaS – Persistence as a Service (using Hibernate) (which discusses how you can address such requirements while keeping the data in the existing database) and The Missing Piece in Cloud Computing: Middleware Virtualization (which provides a broader context on the need for virtualization of the entire middleware stack, not just the data store, to make better use of cloud computing).
There are other solutions, some of which are in the making as we speak, such as the integration of Lucene/Compass and GigaSpaces. I'm sure that there are other solutions that aim to solve this challenge that I'm not aware of, so my recommendation is simple: before going down the SimpleDB path, take a good look at your application requirements, and make sure that it is the right solution for your problems.