In Putting the Database Where It Belongs, I reviewed the role that I think the database should take in today's scalable architecture. I introduced a concept that I refer to as PaaS – Persistence as a Service, as the right place for it. In this post I'll go into more detail on what that means:
New middleware components, such as Data Grids, provide a means to utilize memory resources -- which are available today at a much higher capacity and lower cost. These memory resources can be used as the "system-of-record" for in-flight transactional information, or state (pretty much the role the RDBMS played with file systems). The application no longer needs to interact directly with the database. Instead it interacts with the In-Memory Data Grid (IMDG), which takes care of the consistency, availability and reliability of our application state.
With this approach, the database will be used as a mechanism for the durability of our information, similar to the role played by today's Data Warehouses. As with Data Warehouses, the application doesn't need to interact directly with the database; it uses asynchronous communication to handle the synchronization with the back-end database.
If the In-Memory Data Grid is the system of record, persisting the data to disk is simply a background process, completely decoupled from the application. We can easily change the persistent store implementation, how the data is persisted, the rate at which it is synchronized, and so on. All this can be done outside the context of our application. By "outside" I mean that neither the application code nor its configuration is affected by any of these changes. This is possible because we no longer need to embed database *awareness* into our application runtime: we simply run a persistence service on the network, which takes care of it. It is the role of the Data Grid to implicitly take care of the synchronization with the persistence service.
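To make the decoupling concrete, here is a minimal write-behind sketch in plain Java. It is not the API of any particular Data Grid product: `PersistentStore`, `WriteBehindGrid`, `flush` and `startBackgroundSync` are hypothetical names I made up for illustration, and a single in-process map stands in for a distributed grid.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for the back-end database (or any other durable store).
interface PersistentStore {
    void save(String key, String value);
}

// Hypothetical single-node stand-in for an In-Memory Data Grid with write-behind.
final class WriteBehindGrid {
    // One queued change waiting to be written to the persistent store.
    private static final class Change {
        final String key;
        final String value;
        Change(String key, String value) { this.key = key; this.value = value; }
    }

    private final Map<String, String> memory = new ConcurrentHashMap<>();
    private final BlockingQueue<Change> pending = new LinkedBlockingQueue<>();
    private final PersistentStore store;

    WriteBehindGrid(PersistentStore store) { this.store = store; }

    // The application writes to memory only; the change is queued for later persistence.
    void put(String key, String value) {
        memory.put(key, value);
        pending.add(new Change(key, value));
    }

    // Reads are served from memory, the system of record.
    String get(String key) { return memory.get(key); }

    // Drains all queued changes to the persistent store and returns how many were written.
    int flush() {
        List<Change> batch = new ArrayList<>();
        pending.drainTo(batch);
        for (Change c : batch) {
            store.save(c.key, c.value);
        }
        return batch.size();
    }

    // Runs flush() on a background thread at a fixed delay; the caller shuts the scheduler down.
    ScheduledExecutorService startBackgroundSync(long periodMillis) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleWithFixedDelay(this::flush, periodMillis, periodMillis, TimeUnit.MILLISECONDS);
        return scheduler;
    }
}
```

Code that calls `put` and `get` never sees the store. Swapping the `PersistentStore` implementation or changing the sync period touches only the wiring, which is the property described above. A real grid would also replicate the in-memory state for availability; this sketch leaves that out.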
How do we integrate this model with existing applications and databases?
Hibernate is a widely used O/R mapping solution. It provides a good solution for mapping the (in-memory) object view to the (persistent) relational view.
Using Hibernate Second-Level Cache
Hibernate's Second-Level Cache provides basic integration designed for read-mostly scenarios, in which we use the Hibernate API to abstract the interaction with the database. Hibernate can delegate a query to an in-memory cache and save the I/O overhead of retrieving data that an earlier query already loaded. The benefit of this approach is that it can be made transparent to Hibernate users. The downside is threefold: it is limited to read-mostly scenarios, it doesn't address the scalability challenge, and it still requires tight coupling between our application and the database.
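The caching pattern behind this can be sketched in a few lines of plain Java. This is an illustration of a read-through cache in general, not Hibernate's actual Second-Level Cache API; `ReadThroughCache`, `evict` and `loadCount` are hypothetical names, and the `loader` function stands in for a database query.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Hypothetical read-through cache: repeated reads of the same key skip the database.
final class ReadThroughCache<K, V> {
    private final Map<K, V> cache = new ConcurrentHashMap<>();
    private final Function<K, V> loader;          // stand-in for a database query
    private final AtomicInteger loads = new AtomicInteger();

    ReadThroughCache(Function<K, V> loader) { this.loader = loader; }

    // Returns the cached value, loading it from the database only on a miss.
    V get(K key) {
        return cache.computeIfAbsent(key, k -> {
            loads.incrementAndGet();
            return loader.apply(k);
        });
    }

    // Writes still go to the database elsewhere; the cached copy must then be invalidated.
    void evict(K key) { cache.remove(key); }

    // Number of times the loader (the "database") was actually hit.
    int loadCount() { return loads.get(); }
}
```

The `evict` method hints at why this works best for read-mostly data: every update still goes through the database and has to invalidate the cached copy, so the database remains on the write path.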
Moving Hibernate Mapping to the Background
We can overcome these limitations quite easily by putting the In-Memory Data Grid in front of Hibernate. Instead of interacting directly with Hibernate from within our application, we interact with the Data Grid. The Data Grid handles the synchronization with the Persistence Service, which uses a Hibernate plug-in to map between the object view and the relational view. The mapping is done outside the context of the application. It is done in the background and therefore provides much better performance and scalability.
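Here is a rough sketch of where the mapping lives in this arrangement. It uses plain Java and no Hibernate calls. All names (`Order`, `OrderMapper`, `PersistenceService`, `GridFrontEnd`) are hypothetical, the mapper function stands in for the Hibernate plug-in, and a list of column maps stands in for a relational table. The sketch is single-threaded for brevity.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;
import java.util.function.Function;

// Hypothetical domain object; the application only ever works with this type.
final class Order {
    final long id;
    final String status;
    Order(long id, String status) { this.id = id; this.status = status; }
}

// Hypothetical object-to-row mapping, standing in for what an O/R mapper would do.
final class OrderMapper {
    static Map<String, Object> toRow(Order o) {
        Map<String, Object> row = new LinkedHashMap<>();
        row.put("ID", o.id);
        row.put("STATUS", o.status);
        return row;
    }
}

// Hypothetical Persistence Service: the only component that knows about rows and columns.
final class PersistenceService<T> {
    private final Function<T, Map<String, Object>> mapper;
    private final List<Map<String, Object>> table = new ArrayList<>(); // stand-in for a table

    PersistenceService(Function<T, Map<String, Object>> mapper) { this.mapper = mapper; }

    void persist(T object) { table.add(mapper.apply(object)); }

    List<Map<String, Object>> rows() { return table; }
}

// Hypothetical grid front-end: the application reads and writes objects here.
final class GridFrontEnd<T> {
    private final Map<Long, T> memory = new HashMap<>();
    private final Queue<T> pending = new ArrayDeque<>();
    private final PersistenceService<T> service;

    GridFrontEnd(PersistenceService<T> service) { this.service = service; }

    void write(long id, T object) {
        memory.put(id, object);   // visible to readers immediately
        pending.add(object);      // mapped and persisted later
    }

    T read(long id) { return memory.get(id); }

    // In a real deployment this would run in the background, off the application's call path.
    void syncInBackground() {
        T next;
        while ((next = pending.poll()) != null) {
            service.persist(next);
        }
    }
}
```

The application calls only `write` and `read`; the object-to-row mapping runs when `syncInBackground` hands objects to the `PersistenceService`, so its cost is not paid on the application's call path.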
A detailed explanation of how to integrate Hibernate's Second-Level Cache with a Data Grid (specifically GigaSpaces'), as well as how to use the Data Grid Hibernate plug-in, is provided here.
I believe that this shift in the data-middleware stack is only the first step in a bigger shift toward a new middleware stack, which I'll discuss in future posts.