PaaS – Persistence as a Service (using Hibernate)
In "Putting the Database Where It Belongs", I reviewed the role I think the database should play in today's scalable architectures, and introduced a concept I call PaaS (Persistence as a Service) as the right place for it. In this post I'll go into more detail on what that means.
New middleware components, such as Data Grids, make it possible to exploit memory resources, which today are available at much higher capacity and lower cost. These memory resources can serve as the "system of record" for in-flight transactional information, or state (much the same role the RDBMS played relative to the file system). The application no longer needs to interact directly with the database. Instead, it interacts with the In-Memory Data Grid (IMDG), which takes care of the consistency, availability and reliability of the application state.
With this approach, the database becomes a mechanism for the durability of our information, similar to the role played by today's Data Warehouses. As with Data Warehouses, the application doesn't need to interact directly with the database; it uses asynchronous communication to handle synchronization with the back-end database.
If the In-Memory Data Grid is the system of record, persisting the data to disk is simply a background process, completely decoupled from the application. We can easily change the persistent store implementation, as well as how the data is persisted, at what rate it is synchronized, and so on. All this can be done outside the context of our application. By "outside" I mean that neither the application code nor its configuration is affected by any of these changes. This is possible because we no longer need to embed database *awareness* in our application runtime: we simply run a Persistence Service on the network, which takes care of it. It is the role of the Data Grid to handle the synchronization with the Persistence Service implicitly.
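The write-behind idea above can be sketched in plain Java. This is a minimal illustration under stated assumptions, not GigaSpaces' implementation: `PersistentStore`, `RecordingStore` and `WriteBehindGrid` are hypothetical names, a `ConcurrentHashMap` stands in for the Data Grid, and a single background thread stands in for the Persistence Service.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical plug-in point: any durable store (RDBMS, files, ...) implements this.
interface PersistentStore {
    void save(String key, String value);
}

// Trivial store for illustration; a real one might write to a database.
class RecordingStore implements PersistentStore {
    final List<String> writes = new CopyOnWriteArrayList<>();

    public void save(String key, String value) {
        writes.add(key + "=" + value);
    }
}

// In-memory "system of record": reads and writes are served from memory,
// and persistence happens asynchronously on a background thread.
class WriteBehindGrid {
    private final Map<String, String> memory = new ConcurrentHashMap<>();
    // A single thread keeps writes to the store in submission order.
    private final ExecutorService persister = Executors.newSingleThreadExecutor();
    private final PersistentStore store;

    WriteBehindGrid(PersistentStore store) {
        this.store = store;
    }

    void put(String key, String value) {
        memory.put(key, value);                          // the caller returns once memory is updated
        persister.submit(() -> store.save(key, value)); // durability is handled in the background
    }

    String get(String key) {
        return memory.get(key); // never touches the persistent store
    }

    // Stop accepting work and wait for pending writes, e.g. on shutdown.
    boolean shutdownAndFlush(long timeoutMillis) {
        persister.shutdown();
        try {
            return persister.awaitTermination(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

Because the application only calls `put` and `get`, swapping `RecordingStore` for a database-backed `PersistentStore`, or changing how writes are batched, does not touch application code.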
How do we integrate this model with existing applications and databases?
Hibernate is a widely used O/R mapping solution. It provides a good solution for mapping the (in-memory) object view to the (persistent) relational view.
Using Hibernate Second-Level Cache
Hibernate has features such as the Second-Level Cache that provide basic integration designed for read-mostly scenarios, in which we can use the Hibernate API to abstract the interaction with the database. We can use Hibernate to delegate our query to an in-memory cache and save the I/O overhead of retrieving data that was already loaded before this query. The benefit of this approach is that it can be made transparent to Hibernate users. The downside is that it is limited to read-mostly scenarios, it doesn't address the scalability challenge, and it still requires tight coupling between our application and the database.
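To make the read-mostly trade-off concrete, here is a plain-Java sketch of the caching pattern a second-level cache applies. It does not use Hibernate's actual cache SPI; `FakeDatabase` and `ReadMostlyCache` are hypothetical classes written only for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for the database behind Hibernate; counts round trips.
class FakeDatabase {
    private final Map<Long, String> rows = new HashMap<>();
    int reads = 0;
    int writes = 0;

    synchronized String select(long id) { reads++; return rows.get(id); }

    synchronized void update(long id, String value) { writes++; rows.put(id, value); }
}

// Read-mostly cache in front of the database, in the spirit of a second-level cache.
class ReadMostlyCache {
    private final Map<Long, String> cache = new ConcurrentHashMap<>();
    private final FakeDatabase db;

    ReadMostlyCache(FakeDatabase db) { this.db = db; }

    String find(long id) {
        String cached = cache.get(id);
        if (cached != null) return cached;  // hit: no database I/O
        String loaded = db.select(id);      // miss: go to the database
        if (loaded != null) cache.put(id, loaded);
        return loaded;
    }

    void update(long id, String value) {
        db.update(id, value); // writes still go synchronously to the database...
        cache.remove(id);     // ...and invalidate the cached copy
    }
}
```

Repeated reads of the same id are served from memory, but every update still makes a synchronous round trip to the database on the application's thread, which is the coupling described above.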
Moving Hibernate Mapping to the Background
We can overcome these limitations quite easily by placing the In-Memory Data Grid in front of Hibernate. Instead of interacting directly with Hibernate from within our application, we interact with the Data Grid. The Data Grid handles the synchronization with the Persistence Service, which uses a Hibernate plug-in to map between the object view and the relational view. The mapping is done outside the context of the application, in the background, and therefore provides much higher performance and scalability.
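The division of labour described above, where the application talks only to the grid and a separate Persistence Service performs the object-to-relational mapping in the background, can be sketched as follows. This is not the GigaSpaces Hibernate plug-in API: `RelationalMapper`, `OrderMapper`, `PersistenceService` and the `orders` table are hypothetical, and a list of SQL strings stands in for a real database connection.

```java
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.LinkedBlockingQueue;

// Domain object as the application sees it in the Data Grid.
final class Order {
    final long id;
    final String status;
    Order(long id, String status) { this.id = id; this.status = status; }
}

// Hypothetical plug-in contract standing in for the Hibernate-based mapper:
// it turns the object view into the relational view.
interface RelationalMapper<T> {
    String toSql(T entity);
}

class OrderMapper implements RelationalMapper<Order> {
    public String toSql(Order o) {
        // Illustrative only: a real mapper would use bound parameters, not concatenation.
        return "UPDATE orders SET status='" + o.status + "' WHERE id=" + o.id;
    }
}

// Runs outside the application: drains changes published by the grid and maps them.
class PersistenceService<T> implements Runnable {
    private final BlockingQueue<T> changes = new LinkedBlockingQueue<>();
    private final RelationalMapper<T> mapper;
    final List<String> executed = new CopyOnWriteArrayList<>(); // stands in for a JDBC connection

    PersistenceService(RelationalMapper<T> mapper) { this.mapper = mapper; }

    // Called by the grid, never by application code.
    void onChange(T entity) { changes.add(entity); }

    public void run() {
        try {
            while (true) {
                T entity = changes.take();           // blocks until the grid publishes a change
                executed.add(mapper.toSql(entity));  // mapping happens off the application thread
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();      // stop when the service is shut down
        }
    }

    Thread start() {
        Thread worker = new Thread(this, "persistence-service");
        worker.setDaemon(true);
        worker.start();
        return worker;
    }

    // Bounded wait until n changes have been mapped; convenient for demos and tests.
    boolean awaitMapped(int n, long timeoutMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (executed.size() < n) {
            if (System.nanoTime() > deadline) return false;
            Thread.yield();
        }
        return true;
    }
}
```

Replacing `OrderMapper` with a different mapping, or pointing the service at another store, changes nothing on the application side, which only ever writes `Order` objects to the grid.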


A detailed explanation of how to integrate Hibernate's Second-Level Cache with a Data Grid (specifically GigaSpaces'), as well as how to use the Data Grid Hibernate Plug-In, is provided here.
I believe that this shift in the data-middleware stack is only the first step in a bigger shift toward a new middleware stack which I'll discuss in future posts.

This is quite an interesting concept. I am also looking at the same kind of option using products like GigaSpaces EDG. I would like to know your views on the following:
a) We can have the data grid at the enterprise level and also at the application level (a data grid in a cluster of applications of the same type, e.g. shopping-cart apps deployed on multiple machines). At the enterprise level, the data in the grid will mostly represent the domain model, so the grid mostly acts as a data-services grid. At the application level, however, the data will be the domain model plus data belonging to other infrastructure services such as authentication, authorization and scheduling. My question is: what type of data should be exchanged through the grid? Will overusing the data grid also create performance bottlenecks?
b) How do we ensure data security in an enterprise grid? For example, multiple applications may consume data from the grid; a few of them should have access to sensitive data, while the others should be restricted. Is that possible?
Posted by:Prashanthjee S | September 24, 2007 at 12:00 PM
Hi Prashanthjee
In general, the data that should reside in the Data Grid is the data that is used frequently and lies on the critical path of the application in terms of performance and latency. There are different usage patterns you can use to overcome capacity limitations. For example, you could use a local cache to load data on demand into your application; this suits read-mostly scenarios, where the application is likely to read the same information repeatedly. For read/write scenarios you should use partitioning; in this case the capacity is driven by the number of partitions and the capacity per partition.
There are various levels of isolation you can apply. Cluster-level isolation is done at the discovery phase: each group of nodes can be associated with a specific group name and be found only under that group name. At the space level, a space can block access based on authentication. If that is not enough, you can block requests at the content level: a user can be granted access only to specific entries of an authorized class that contain a specific attribute value, and be blocked from all the rest.
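The capacity rule for read/write data (number of partitions times capacity per partition) can be illustrated with a hash-partitioned sketch. `PartitionedGrid` is a hypothetical, single-process stand-in; in a real Data Grid each partition would live in a separate JVM, and the routing scheme shown (hash code modulo partition count) is an assumption, not necessarily the one GigaSpaces uses.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical hash-partitioned store: each key is routed to exactly one partition,
// so total capacity grows with the number of partitions.
class PartitionedGrid {
    private final List<Map<String, String>> partitions = new ArrayList<>();
    private final int capacityPerPartition;

    PartitionedGrid(int partitionCount, int capacityPerPartition) {
        for (int i = 0; i < partitionCount; i++) partitions.add(new HashMap<>());
        this.capacityPerPartition = capacityPerPartition;
    }

    int partitionFor(String key) {
        // floorMod keeps the index non-negative even for negative hash codes
        return Math.floorMod(key.hashCode(), partitions.size());
    }

    boolean put(String key, String value) {
        Map<String, String> p = partitions.get(partitionFor(key));
        if (!p.containsKey(key) && p.size() >= capacityPerPartition) return false; // partition full
        p.put(key, value);
        return true;
    }

    String get(String key) {
        return partitions.get(partitionFor(key)).get(key);
    }

    long totalCapacity() {
        return (long) partitions.size() * capacityPerPartition;
    }
}
```

With 4 partitions of 1,000 entries each, `totalCapacity()` is 4,000; adding partitions raises capacity without changing the client code that calls `put` and `get`.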
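The three isolation levels can be modelled as successive filters. Everything here (`SpaceEntry`, `SpaceNode`, `Isolation` and its methods) is hypothetical and single-process; it shows the decision each level makes, not the GigaSpaces security API.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Hypothetical entry type stored in the space.
final class SpaceEntry {
    final String className;
    final Map<String, String> attributes;
    SpaceEntry(String className, Map<String, String> attributes) {
        this.className = className;
        this.attributes = attributes;
    }
}

// Hypothetical node descriptor seen during discovery.
final class SpaceNode {
    final String name;
    final String group;
    SpaceNode(String name, String group) { this.name = name; this.group = group; }
}

final class Isolation {
    // 1. Cluster level: a client only discovers nodes registered under its group name.
    static List<SpaceNode> discover(List<SpaceNode> all, String group) {
        return all.stream().filter(n -> n.group.equals(group)).collect(Collectors.toList());
    }

    // 2. Space level: reject users that fail authentication.
    static boolean authenticate(String user, Set<String> knownUsers) {
        return knownUsers.contains(user);
    }

    // 3. Content level: allow only entries of an authorized class with a specific attribute value.
    static Predicate<SpaceEntry> contentRule(String allowedClass, String attribute, String requiredValue) {
        return e -> e.className.equals(allowedClass)
                && requiredValue.equals(e.attributes.get(attribute));
    }
}
```

Each level narrows what a client can reach: first which nodes it can find, then whether it may talk to a space at all, and finally which individual entries it may see.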
In any event, I'd recommend that you also look at the paper "Data-Awareness and Low-Latency on the Enterprise Grid", which covers what it means to use a Data Grid at the enterprise level.
Posted by:Nati Shalom | September 26, 2007 at 04:34 PM
Nati,
I have worked with JavaSpaces and currently work on Spring applications that use Hibernate for persistence. The applications I develop are web-based and run on Tomcat.
I deploy the Spring app as a jar file in Tomcat, since there is a servlet/Struts/JSP front end. It's all collocated.
How can I scale out the application? I have read a number of SBA papers, and I am a bit confused about the PU. My servlets have to interact with business services and return results to JSPs. How can I scale out this model with respect to the PU, and where will Tomcat be? Will it run inside a Space Based Container, as mentioned in a white paper on Mule? I am still trying to figure out where Tomcat fits, etc.
Also, is there an equivalent of the pub/sub model in JavaSpaces? I know take is a kind of queue (PTP), in that once an entry is taken it is no longer available.
I have always been a fan of JavaSpaces, and you guys are doing a great job with SBA. I want to give a presentation to management on all this, and that's why I am asking.
I also read your white paper on Mule and GigaSpaces, and I use Mule for integration needs as well. As with Tomcat, where will the Mule container run? Will it run inside the Space Based Container you allude to?
In web apps, where will you manage session data? Is it in the IMDG?
Thanks in advance, and great to see SBA evolving.
Regards,
Hari
Posted by:Hari | September 29, 2007 at 01:57 PM
Hi Hari,
The question of how to integrate the web front end is an interesting one. Basically, you have several options:
1. Use our PU and deploy it onto our Grid Containers (the PU starts a Space). You can then communicate with the PUs and their business logic (through the Space) from your web container using our Spring integration (you connect to the remote Space and operate on it).
2. Start the Space and your "PU" (which is just a Spring application context) from within your web module deployed into Tomcat.
3. Start Tomcat within our Grid Container. This is the most interesting solution, though currently not a very simple one. We are working on making it simple.
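Options 1 and 2 differ mainly in where the Space runs, so one way to keep that choice open is to have the web tier depend only on a narrow interface. The sketch below uses hypothetical names (`SimpleSpace`, `EmbeddedSpace`, `CartController`); it is not GigaSpaces' Spring integration, and the remote variant of option 1 is only described in a comment rather than implemented.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical minimal space contract the web tier codes against.
interface SimpleSpace {
    void write(String key, Object value);
    Object read(String key);
}

// Option 2: a space embedded in the web module's own JVM (e.g. started from its Spring context).
class EmbeddedSpace implements SimpleSpace {
    private final Map<String, Object> entries = new ConcurrentHashMap<>();
    public void write(String key, Object value) { entries.put(key, value); }
    public Object read(String key) { return entries.get(key); }
}

// What a servlet or Struts action would delegate to. It only sees SimpleSpace,
// so whether the space is embedded (option 2) or a proxy to a remote PU (option 1)
// is a wiring decision made outside this class.
class CartController {
    private final SimpleSpace space;
    CartController(SimpleSpace space) { this.space = space; }

    String addItem(String sessionId, String item) {
        String key = "cart:" + sessionId;
        Object current = space.read(key);
        String updated = current == null ? item : current + "," + item;
        space.write(key, updated);
        return updated;
    }
}
```

Keeping per-session cart state in the space rather than in the servlet container is one way such a design could also address where session data lives.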
Cheers,
Shay Banon
System Architect at GigaSpaces
Posted by:Shay Banon | October 01, 2007 at 10:17 AM