Spring

January 09, 2009

Getting ready for the cloud

In the past few months, I have been speaking at various conferences about cloud computing.
In my presentations, I tried to focus primarily on how one can take practical steps to benefit from the cloud today. To illustrate my points, I used GigaSpaces as the scale-out application server and Amazon EC2 as the cloud infrastructure, and thus demonstrated how one can deploy EXISTING Java applications on the cloud while:

  • Not having to re-write your application
  • Preventing lock-in to a specific cloud provider
  • Enabling seamless portability between your local environment and the cloud environment
    • No code or configuration change is required between the two environments
    • Develop locally - test on the cloud
    • Built for iterative development

In addition, I demonstrated the use of our new Cloud tools framework which enabled me to fully automate the entire application provisioning process on the cloud. This tool enabled me to:

  • Deploy a load balancer
  • Cluster web containers
  • Cluster application processing units
  • Deploy a MySQL database connected to EBS

And all that through a single-click deployment.

In the demonstration, I also showed how you can benefit from the cloud:

  • Dynamically scale when the load breaches a certain threshold
  • New web containers are dynamically linked to the load-balancer (no configuration change is required as all is taken care of on the fly).
  • Self-healing - if something breaks, scale down to the existing machines while getting a new machine ready.
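
The scaling and self-healing behavior listed above can be pictured with a small sketch. This is not the cloud tools framework itself: the class `ContainerPool`, its method names and the threshold value are all hypothetical, and the "load balancer" is just an in-memory list. It only illustrates the two rules, assuming load is reported as one total number.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of a pool of web containers behind a load balancer.
final class ContainerPool {
    private final List<String> containers = new ArrayList<>();
    private final double scaleUpThreshold; // average load per container that triggers scale-out
    private int nextId = 1;

    ContainerPool(int initialSize, double scaleUpThreshold) {
        if (initialSize < 1) {
            throw new IllegalArgumentException("initialSize must be at least 1");
        }
        this.scaleUpThreshold = scaleUpThreshold;
        for (int i = 0; i < initialSize; i++) {
            addContainer();
        }
    }

    // Starting a container and linking it to the load balancer is one step here:
    // the list doubles as the load balancer's routing table.
    private void addContainer() {
        containers.add("web-" + nextId++);
    }

    // Adds one container when the average load per container breaches the threshold.
    // Returns true if the pool was scaled out.
    boolean onLoadSample(double totalLoad) {
        double loadPerContainer = totalLoad / containers.size();
        if (loadPerContainer > scaleUpThreshold) {
            addContainer();
            return true;
        }
        return false;
    }

    // Self-healing: the failed container is dropped right away, so traffic falls back
    // to the remaining ones, and a replacement is added (instantly in this model;
    // a real machine would take time to become ready).
    void onContainerFailure(String name) {
        if (containers.remove(name)) {
            addContainer();
        }
    }

    List<String> activeContainers() {
        return new ArrayList<>(containers);
    }
}
```

With a pool of two containers and a threshold of 100, a total load of 150 leaves the pool alone, while a total load of 250 adds a third container.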

The entire presentation can be viewed online 


You can also view the online video presentation that I gave at the Cloud Summit in Tel Aviv, where I ran a live demo of our new cloud infrastructure on EC2, demonstrating both dynamic scaling and what happens when a machine fails in front of a live audience!


The video recording of this presentation is available here



If you want to try to run the demo yourself and you don't have an Amazon account - just drop an email to cloud at gigaspaces dot com and we will send you a free pass to the early access program.


Enjoy!

August 18, 2008

GigaSpaces XAP 6.5/6.6 new releases

GigaSpaces 6.5 was released at the end of June, and we are now working on the 6.6 release, with the first milestone already publicly available. These are major milestones in a series of upcoming releases all aimed at strengthening our proposition as a Scale-Out App Server. Our main goal is to significantly simplify the process of achieving scalability, including scaling an EXISTING application within days and without enforcing a complete re-architecture.

I refer to this as our "Seamless Scaling" or "Simple Scaling" initiative. You can read some of the rationale behind this initiative in my previous post Can scalability be made seamless. This is a very ambitious goal, and by no means are we finished. In addition to the tremendous enhancements already put in place, we have a long-term roadmap that covers many aspects of the product.

The efforts we have undertaken (as well as those on our roadmap) involve enhancements to our development frameworks in Java, .Net and C++, mostly around the abstraction layer, including supporting standard APIs that enable us to inject many of our Data Grid and event-driven capabilities through annotations and configuration, with zero or minimal changes to application code.

They also involve increasing robustness, making large-scale deployments simpler to deploy and manage. Other efforts include extensive integration with popular frameworks, such as Spring and Mule, and recently we also added Web framework integration with Jetty. All this is designed to make the end-to-end scaling experience extremely simple and native. Judging by recent feedback we received, some of it publicly referenceable, it looks like we're making great strides in achieving our goal. I particularly liked the following quote by one of our existing customers, Monte Paschi Group, who built a new pricing system with GigaSpaces. Their full case study is available here. I chose the following quote from this study as it highlights some of the benefits that we don't always emphasize enough - development simplicity:

The development team is happy, too, since the architecture has been
greatly simplified compared to the multi-layered application server system.
"We're not a software company, we're a financial company," Santini
explains. "We didn't have weeks or months to study the technology. Our
main goal was to use it to achieve our goals. GigaSpaces XAP allowed us
to do that right out of the box."

You can read the full details of the 6.5 release here. For convenience, we grouped the long list of features into separate Java, .Net and C++ categories, and provided detailed descriptions that outline the rationale and the value behind each feature.

In this post I'll try to highlight some of the important features and provide insight into our future plan.

Seamless scaling using the Service Virtualization Framework

At the application layer the most notable feature is the Service Virtualization Framework. The Service Virtualization Framework (SVF) can be seen as a major enhancement to Session Beans in EJB3 and Spring Remoting. This framework enables you to write your business logic as a POJO and deploy the services across a cluster of machines, while providing a single client proxy that virtualizes all those instances as if they were a single server. For more details I recommend reading a new white paper covering the concept behind this framework and how it can help you build scalable, high-performance SOA and event-driven applications simply. The white paper is available here.
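
To make the "single client proxy over many POJO instances" idea concrete, here is a minimal sketch using the JDK's `java.lang.reflect.Proxy`. It is not the SVF API: `PricingService`, `SimplePricingService` and `VirtualizedProxy` are hypothetical names, the instances live in one JVM rather than on a cluster, and round-robin is just one routing policy assumed for illustration.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Business logic is a plain interface plus a POJO implementation.
interface PricingService {
    String quote(String symbol);
}

final class SimplePricingService implements PricingService {
    private final String node;

    SimplePricingService(String node) {
        this.node = node;
    }

    @Override
    public String quote(String symbol) {
        return symbol + " priced on " + node;
    }
}

final class VirtualizedProxy {
    private VirtualizedProxy() {}

    // Returns one proxy that spreads calls over all instances in round-robin order,
    // so the caller sees a single service.
    static <T> T roundRobin(Class<T> serviceInterface, List<? extends T> instances) {
        if (instances.isEmpty()) {
            throw new IllegalArgumentException("at least one instance is required");
        }
        AtomicInteger next = new AtomicInteger();
        Object proxy = Proxy.newProxyInstance(
                serviceInterface.getClassLoader(),
                new Class<?>[] {serviceInterface},
                (self, method, args) -> {
                    int index = Math.floorMod(next.getAndIncrement(), instances.size());
                    T target = instances.get(index);
                    try {
                        return method.invoke(target, args);
                    } catch (InvocationTargetException e) {
                        throw e.getCause(); // surface the service's own exception
                    }
                });
        return serviceInterface.cast(proxy);
    }
}
```

A client that holds the proxy returned by `VirtualizedProxy.roundRobin(PricingService.class, instances)` calls `quote` as if there were one server; with two instances, successive calls alternate between them.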

Seamless scaling of popular development frameworks

We enhanced and expanded our integration with popular development frameworks. The purpose of the integration is to provide end-to-end seamless scaling to such frameworks in a way that doesn't require changes to the application. A good example is our Mule-ESB support. Mule users can take existing Mule 2.0 applications and significantly improve performance and scalability by plugging in the GigaSpaces runtime into the Mule Framework. The good news is that the wiring happens outside of your application code at a couple of levels:

  1. Connector level – leveraging our messaging layer as the transport for Mule
  2. Clustering level – taking advantage of our clustering capabilities, enabling the internal Mule data structure to span across multiple machines for scalability and high-availability

This integration is provided as part of our open source framework, OpenSpaces. We hope and anticipate that these integrations will be used as a reference by other frameworks looking for ways to provide similar levels of scalability and reliability. A good example for that already happening is the work David Greco performed by integrating the Camel open source ESB with GigaSpaces.

With 6.6 we also added out-of-the-box integration with Jetty. This was done in collaboration with the Webtide team (the company behind Jetty), who have given us excellent support throughout the process. What I like about this integration is that it enables taking an EXISTING web application packaged as a WAR and dynamically scaling it across a pool of machines. With this approach, you also get session replication injected into your existing application without touching your code or WAR package. If you're willing to make slight configuration changes, you can get a caching reference injected into your application. There is a new example that shows what it takes to scale an existing web application. The example uses the Spring Pet Clinic application and deploys it on a GigaSpaces cluster. The full example is available here.

Removing the language barrier

For decades languages have been treated almost as a religion by developers. As an ex-CORBA guy, I know how painful language interoperability is to deal with, and how often it requires compromises on functionality, performance and complexity. At GigaSpaces, we realized that there is no reason that different languages shouldn't be treated simply as forms of writing business logic. They each provide different value. For example, .Net provides better integration with Windows applications. C++ provides better performance in certain areas and provides low-level APIs for integration with many third-party libraries.

Persistence: 6.5 has some major enhancements for implementing our Persistence-as-a-Service model in .Net through support of NHibernate. This enables .Net applications to integrate with existing databases and have their own database mapping layer with a native .Net API. This comes along with quite extensive performance improvements. You can see some of the results here for .Net, and here for C++.

One of our goals for 6.5 was to bring the same level of scalability and simplicity we provide for Java to .Net and to C++, without compromising performance. Unlike some of the alternatives in the market, we don't just provide remote access to our Java runtime, but provide complete application server capabilities in these two languages -- as well as complete interoperability among all three. Java, C++ and .Net services and clients can run and share the same process and leverage that to remove the network call overhead often required when a call crosses language boundaries. An immediate benefit is that you're able to run your C++ and .Net business logic where the data is. Furthermore, you can now leverage our existing SLA-driven deployment to automate the deployment of Java and .Net applications. This means that instead of running each server process manually, you have a single deployment command that will make sure that your servers are running on the appropriate machines, that your backups are running on different hosts from your primaries, that if one machine goes down a new instance will take over immediately, or if one is not available, as soon as it becomes available -- all that without any human intervention!
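
The placement rules described above (backups on different hosts from their primaries, and a new instance taking over after a failure) can be sketched in a few lines. This is only an illustration under simplifying assumptions, not how the SLA-driven deployment is implemented: `SlaPlacementSketch`, `Assignment`, `place` and `recover` are hypothetical names, host names are assumed to be distinct, and a `null` backup stands for "pending until a host becomes available".

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class SlaPlacementSketch {
    private SlaPlacementSketch() {}

    // Where one partition runs. A null backupHost means the backup is still pending.
    static final class Assignment {
        final String primaryHost;
        final String backupHost;

        Assignment(String primaryHost, String backupHost) {
            this.primaryHost = primaryHost;
            this.backupHost = backupHost;
        }
    }

    // Initial deployment: the primary of partition p goes to host p and its backup to the
    // next host (wrapping around), so with distinct host names they never share a machine.
    static Map<Integer, Assignment> place(int partitions, List<String> hosts) {
        if (hosts.size() < 2) {
            throw new IllegalArgumentException("need at least two hosts to separate primary and backup");
        }
        Map<Integer, Assignment> plan = new LinkedHashMap<>();
        for (int p = 0; p < partitions; p++) {
            String primary = hosts.get(p % hosts.size());
            String backup = hosts.get((p + 1) % hosts.size());
            plan.put(p, new Assignment(primary, backup));
        }
        return plan;
    }

    // Failover: promote the backup where the primary was lost, then look for a surviving
    // host for the new backup. Partitions untouched by the failure keep their assignment.
    // This sketch does not handle losing a primary whose backup was still pending.
    static Map<Integer, Assignment> recover(
            Map<Integer, Assignment> plan, String failedHost, List<String> survivingHosts) {
        Map<Integer, Assignment> result = new LinkedHashMap<>();
        for (Map.Entry<Integer, Assignment> entry : plan.entrySet()) {
            Assignment current = entry.getValue();
            boolean primaryLost = failedHost.equals(current.primaryHost);
            boolean backupLost = failedHost.equals(current.backupHost);
            if (!primaryLost && !backupLost) {
                result.put(entry.getKey(), current);
                continue;
            }
            String primary = primaryLost ? current.backupHost : current.primaryHost;
            String backup = null;
            for (String host : survivingHosts) {
                if (!host.equals(primary) && !host.equals(failedHost)) {
                    backup = host;
                    break;
                }
            }
            result.put(entry.getKey(), new Assignment(primary, backup));
        }
        return result;
    }
}
```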

Dynamic language support

The Java framework guys realized the need to support dynamic languages as part of Java, making the JVM a common platform for running various languages. GigaSpaces XAP 6.5 leverages this, and provides enhanced support for Groovy, JRuby and JavaScript.

Dynamic language support enables writing business logic in Groovy/JRuby/JavaScript and executing it on the GigaSpaces cluster. One of the common use cases for this capability is to provide an elegant alternative to stored procedures. This means you can write business logic in Groovy, for example, that will be executed directly on the data grid nodes. With this, you can write your own custom data queries and aggregation functions, and execute them where the data is. Beyond the performance benefit that you gain from running the logic collocated with the data, you gain the benefit of using dynamic languages, i.e., you can add new functions on the fly without the need to deal with class-versioning and class-loading issues and without the need to bring the data down whenever you do that. In this way you can add new functions while the system is running and continues to serve other applications.
This feature leverages the SVF mentioned above. This means that you can choose to run these dynamic procedures synchronously, asynchronously, in parallel, etc. Now isn't that cool?
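
The "stored procedure alternative" idea above boils down to shipping a function to each node, running it next to the data and combining the partial results. The sketch below shows that shape in plain Java, with lambdas standing in for Groovy/JRuby/JavaScript scripts and in-memory lists standing in for data grid partitions. `GridPartition`, `GridClientSketch` and their methods are hypothetical names, not the GigaSpaces API.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// One data grid node holding a slice of the data (modelled as an in-memory list).
final class GridPartition {
    private final List<Integer> values;
    // Procedures registered at runtime, keyed by name; a stand-in for dynamically loaded scripts.
    private final Map<String, Function<List<Integer>, Integer>> procedures = new ConcurrentHashMap<>();

    GridPartition(List<Integer> values) {
        this.values = new ArrayList<>(values);
    }

    void register(String name, Function<List<Integer>, Integer> body) {
        procedures.put(name, body);
    }

    // Runs the procedure against this partition's data; only the small result leaves the node.
    int execute(String name) {
        Function<List<Integer>, Integer> body = procedures.get(name);
        if (body == null) {
            throw new IllegalArgumentException("unknown procedure: " + name);
        }
        return body.apply(Collections.unmodifiableList(values));
    }
}

final class GridClientSketch {
    private final List<GridPartition> partitions;

    GridClientSketch(List<GridPartition> partitions) {
        this.partitions = new ArrayList<>(partitions);
    }

    // Adds a new procedure to every partition without restarting anything or moving data.
    void deploy(String name, Function<List<Integer>, Integer> body) {
        for (GridPartition partition : partitions) {
            partition.register(name, body);
        }
    }

    // Executes the procedure on every partition and sums the partial results.
    int executeAndSum(String name) {
        int total = 0;
        for (GridPartition partition : partitions) {
            total += partition.execute(name);
        }
        return total;
    }
}
```

The loop in `executeAndSum` is sequential for brevity; running the per-partition calls in parallel or asynchronously would not change the structure.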

Click here for code snippets and detailed descriptions of this feature.

Data awareness everywhere

Throughout all of our development efforts, we are making sure data-awareness is maintained across the entire stack. Data-awareness means that a method invocation on the new Service Virtualization Framework can be routed to a particular service instance based on the data associated with that service instance. It also means that when you send a message through our JMS implementation, we will be able to route it to the JMS partition that manages the relevant data. Unlike alternative solutions, this is native to our environment, meaning that there is no need for external integration and complexity to achieve this behavior.
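
As a rough illustration of data-aware routing, the sketch below sends every call for a given account to the partition that owns that account, chosen by hashing the routing key. The names `AccountPartition` and `DataAwareRouter` are hypothetical, and hash-modulo is an assumed routing function for the example, not a statement of how the product routes internally.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// One partition: it owns the balances of the accounts routed to it.
final class AccountPartition {
    private final Map<String, Long> balances = new HashMap<>();

    void deposit(String accountId, long amount) {
        balances.merge(accountId, amount, Long::sum);
    }

    long balance(String accountId) {
        return balances.getOrDefault(accountId, 0L);
    }
}

final class DataAwareRouter {
    private final List<AccountPartition> partitions = new ArrayList<>();

    DataAwareRouter(int partitionCount) {
        if (partitionCount < 1) {
            throw new IllegalArgumentException("partitionCount must be at least 1");
        }
        for (int i = 0; i < partitionCount; i++) {
            partitions.add(new AccountPartition());
        }
    }

    // The same routing key always maps to the same partition,
    // so the invocation lands where the relevant data lives.
    AccountPartition route(String routingKey) {
        return partitions.get(Math.floorMod(routingKey.hashCode(), partitions.size()));
    }

    void deposit(String accountId, long amount) {
        route(accountId).deposit(accountId, amount);
    }

    long balance(String accountId) {
        return route(accountId).balance(accountId);
    }
}
```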

Click here to view a code snippet and detailed description of how routing is handled in the Service Virtualization Framework.

Performance, Performance and more Performance...

Improving performance remains a constant goal for all of our releases. As the product matures, finding places in the product where performance can be further optimized is getting harder, and I therefore am always surprised when one of the developers comes up with some creative idea around performance.

In this release we improved performance on several fronts -- including Java, .Net and C++ -- which involved significant optimizations of object serialization and multi-core scalability. For the latter we are working with Azul, and making it part of our testing environment, as well as other multi-core systems such as Sun Niagara. You can see some of the figures and details here, here (comparison with the previous release of .Net) and here (C++).

We conducted detailed comparisons of latency and throughput of a "classic" transactional application based on the standard JEE model (using JBoss, JMS, Spring, Hibernate) with the same application but using GigaSpaces as the messaging and data-layer -- and eventually replaced the entire JEE stack with a GigaSpaces + Spring stack. It is important to note that throughout this process, the business logic code remained untouched. The initial results of these tests can be found here:

[Figures: latency and throughput comparison charts]

You can find the details of the code that was used in this test and the migration steps in a new whitepaper that is now available on our site here. Uri Cohen wrote up in his blog (The Space as a Messaging Backbone) some of the interesting findings from this analysis, which showed the difference between end-to-end measurement and point optimization, and why in some cases putting a distributed cache in front of a database is not going to be enough.

What's next:

For obvious reasons I can't expose our entire roadmap as of yet. What I can say for sure is that we're going to continue improving the level of seamless and simple scalability provided by our platform.

We view the partnership and integration with other frameworks as strategic, and we're going to continue with that effort. One of the frameworks we are planning on working on is GlassFish.

We already announced our first cloud offering, designed to run on Amazon EC2, and including partnerships and integration with RightScale and Cohesive FT. If I'm not mistaken, this was the first Java application server available in a pay-per-use model, designed to meet the needs of enterprises and ISVs that want to offer their applications on the cloud, including as Software-as-a-Service. We're going to put a lot of effort into making cloud deployments simpler, enabling our customers to use it on their local virtualized environments (private clouds) and on public clouds (Amazon EC2, GoGrid, FlexiScale, AppNexus and others), or even a combination of the two, without changing their applications. We're now working on a new version that enables provisioning a cluster of machines, deploying the application on said cluster and opening up an administrative console for the cluster -- all with a single click. This is already working in an internal beta. We're planning to provide a preview release by next month. With the availability of Windows-based clouds, we will be providing our .Net application platform as a cloud offering as well.

On the API and Standards front, we recently joined the OSGi Alliance, where we expect to play an active role. We are also looking into ways through which we can strengthen our compliance with some of the latest standards on the JEE stack, such as EJB 3.0 and JPA. The challenge is not just basic API mapping, but how to do it in a way that doesn't break our scale-out architecture and doesn't create complexity. Unfortunately, previous versions of the EJB spec weren't a good fit. EJB 3.0 looks much more promising.

On the .Net front, we're going to continue with our performance optimization project. We're also working on making our .Net offering fit natively within a .Net development environment by providing better development and installation packages that fit better with the .Net spirit. We are also looking into ways to simplify the testing and debugging process. For pure .Net users we will make the .Net version available as a standalone package at a reduced price (details will follow).

On the C++ front, we're going to provide our customers with an open source version of our C++ binding and a complete package that will enable them to compile and build our C++ binding with their own set of dependencies, libraries and compiler versions and flags. This will also allow using the current C++ framework as a broad integration framework for third-party tools and languages.

There's much more than I could cover in this post. I tried to put together what I thought are the highlights of the release. As it's impractical to cover such a wide spectrum of topics in a single post, we started a process in which different people from our R&D and field engineering teams will post on specific aspects of the product and best-practices for using GigaSpaces in existing web, financial, online gaming and other applications.


Be part of our next release:

As we are now making the decisions about what to include in our 7.0 release, it would be nice to hear your feedback and specific requests for enhancements or new features. You can either send me a direct email or send it to PM at GigaSpaces.com. Alternatively, if you think that you have a good idea that other users might be interested in, you can implement it on our community site – OpenSpaces.org.

The new GigaSpaces XAP 6.5 is available for download here.

July 29, 2008

Can scaling be made seamless?

Putting together the two words "seamless scaling" in front of a technical audience is a very dangerous thing to do. The technically savvy folks are walking around with plenty of scars from previous attempts to scale their systems - enough to know that "scaling" and "seamless" couldn't be further apart. But nevertheless, in this post I'm going to take the risk and do just that :)

Basically what I'm going to try and argue is that while scaling can't be made seamless across the board, there are different techniques to make scaling seamless in certain scenarios, or at least very close to seamless. I will use GigaSpaces as an example of how to achieve seamless migration of existing JEE applications into a scale-out model, with zero or minimal change to the code. I'll also outline our general principles, which I believe are applicable to any application seeking seamless scaling.

The seamless scaling dogma

There has been a lot of discussion over the past year about different patterns of scalability. I devoted quite a few of my posts to this topic. Most of them centered around architecture - how we can use partitioning to avoid a data bottleneck, how we can use in-memory implementations to get better performance and concurrency compared to implementations based on the file-system, and how we can use an asynchronous event-driven architecture as a better way to scale our business logic.

Randy Shoup outlined these principles nicely in his InfoQ article, Scalability Best Practices: Lessons from eBay. The dogma behind all these discussions and panels was that scaling requires a very rare set of skills, which average developers don't have, and that's why we're still seeing plenty of online system failures. The most recent was the iPhone launch failure.

Does scaling really have to be complex?

Well, if you look at Network Attached Storage as an example, you'll see there are alternatives to the traditional dogma around scaling. With storage systems, we don't really think of scaling that much. More than that - our applications don't really need to be aware of whether they run over a local disk or a network-attached device. We can scale by adding disks, even hot-swapping them in some cases, while our application is still running.

Now imagine what the world would look like if it wasn't that simple. Imagine if our application needed to be aware of what's behind the scenes of these storage devices and had to be re-written to deal with these scaling issues. It's not that hard to imagine, is it? Most likely we would still be talking about storage-related system failures as a result of bad architecture and implementation issues. But we don't have anything to talk about, because storage gave us a level of abstraction that enabled almost everyone, regardless of their skill-set, to deal with scaling without being an expert at it, or really even thinking about it much at all.

Can we learn any lessons from NAS about our ability to achieve seamless scalability?

Let's look at the conditions that made seamless scaling with storage possible:

  1. Well-defined interface (or abstraction)
  2. Interface that fits the shared-nothing approach to make it suitable for scaling
  3. Simple interface
  4. Widely-used interface

Now if we examine these principles as they apply to other layers of the application stack, we'll get a decent answer as to why we haven't been able to apply the same level of seamless scaling - which storage already provides - to these other layers.

In the data layer, the most commonly used interface is SQL. SQL fits criteria (1) and (4) well but doesn't meet (2) and (3). HashTable fits (2) and (3) well but unfortunately is less commonly used in distributed systems. JavaSpaces, like HashTable, fits (2) and (3) but is even less commonly used than HashTable. In the messaging tier, JMS fits well with (1), (3) and (4) but doesn't lend itself well to (2), and so on. And these are the cases where there is a well-defined standard. Unfortunately, in other layers of our applications it's even harder to find a well-defined standard that fits all of these criteria.

To overcome this complexity, there have been other attempts to use the JVM bytecode as a lowest common denominator and introduce seamless scaling not at the middleware API level, but at the JVM level using bytecode manipulation. This seems like an elegant solution to the problem; however, most of the existing distributed systems were not written as standalone Java applications that get distributed by some sort of magic, so it fails mainly on the 4th criterion - it mainly fits new applications that were designed with certain assumptions in mind about how the standalone Java code would behave in a distributed environment.

Now to the point - can we scale seamlessly?

Those who expect a simple yes-or-no answer to this question are going to be disappointed - there is no clear answer, because it depends on the specific application scenario, the way the application was written and the maturity of various standards around these applications.

In general I would say that Java-framework-based applications are in better condition than applications based on other frameworks, due to the maturity of the standards and the advanced layer of abstractions that are now available as part of frameworks such as Spring and Mule.

Seamless scaling at the application layer would most likely mean the ability to plug-in different underlying scalable implementations at the middleware layer (data, messaging, business-logic, presentation). The use of abstraction layers such as IOC in Spring/Mule and the new EJB3 abstraction gives more freedom to plug in different implementations that don't necessarily conform to the exact same standard API. That means that your code can remain intact when you plug in a different messaging implementation, for example, whether it is a JMS implementation, a space-based messaging, or remoting.
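
To show what "the code can remain intact when you plug in a different messaging implementation" looks like, here is a minimal sketch. The business class depends only on a small interface, and the transport is chosen at wiring time (in a Spring or Mule application that wiring would live in configuration rather than in code). All names here (`MessageChannel`, `OrderService`, `CentralQueueChannel`, `PartitionedChannel`) are hypothetical, and the two channels are in-memory stand-ins rather than real JMS or space-based transports.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// The abstraction the business logic is written against.
interface MessageChannel {
    void send(String message);
}

// Business logic never names a concrete transport.
final class OrderService {
    private final MessageChannel channel;

    OrderService(MessageChannel channel) {
        this.channel = channel;
    }

    void placeOrder(String orderId) {
        channel.send("order-placed:" + orderId);
    }
}

// Stand-in for a centralized queue, such as a single JMS destination.
final class CentralQueueChannel implements MessageChannel {
    final Queue<String> queue = new ArrayDeque<>();

    @Override
    public void send(String message) {
        queue.add(message);
    }
}

// Stand-in for a partitioned transport: messages are spread over several queues.
final class PartitionedChannel implements MessageChannel {
    final List<Queue<String>> partitions = new ArrayList<>();

    PartitionedChannel(int partitionCount) {
        if (partitionCount < 1) {
            throw new IllegalArgumentException("partitionCount must be at least 1");
        }
        for (int i = 0; i < partitionCount; i++) {
            partitions.add(new ArrayDeque<>());
        }
    }

    @Override
    public void send(String message) {
        partitions.get(Math.floorMod(message.hashCode(), partitions.size())).add(message);
    }

    int totalMessages() {
        int total = 0;
        for (Queue<String> partition : partitions) {
            total += partition.size();
        }
        return total;
    }
}
```

Swapping `new OrderService(new CentralQueueChannel())` for `new OrderService(new PartitionedChannel(3))` changes how messages are distributed without touching `OrderService`.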

Some cases are going to be easier than others. For example, taking a SessionBean and scaling it by having multiple instances of that service running over a pool of machines, while viewing them all as if they were a single server, can be done through configuration changes only. We can do pretty much the same thing to the messaging layer, where we will have a virtual queue and topic rather than a centralized server.

On the data layer things are trickier, as most of the commonly used standards in this area don't fit criterion (2) very well. If our data model is built with a complex object graph, or if our queries depend on complex joins, then we're not going to be able to scale it out without changes to the code or to the domain model. But even in these more difficult cases, it's possible to minimize the scope of change by using the DAO pattern, declarative transactions and annotations as a mapping layer on top of the domain model. This means that even if the change can't be completely seamless, it will nevertheless be quite simple to achieve.

Learning from the GigaSpaces experience

At this point I'd like to use our specific experience at GigaSpaces to describe the methods we used to enable seamless scaling:

  • Use standard APIs, but only when it makes sense. For years we chose not to implement large parts of the JEE standard, such as EJB and Entity beans, because they didn't fit the scale-out environment and were too tightly bound to the database. What I'm trying to say is that implementing a standard API is not always going to make the transition to a scale-out model seamless, so you should be careful which standard you pick.
  • Leverage existing abstractions to plug in different implementations that are based on other APIs or technologies than the one originally used. We use this principle quite extensively in our OpenSpaces framework, to map our own transaction handlers and Remoting abstraction, to enable seamless scaling of SessionBeans, etc.
  • Use annotations for mapping between different models.
  • Use aspects to add new behavior when it makes sense. We use aspects in several cases such as filters/remoting aspects and security aspects. We will probably be using aspects more to address a more advanced level of serialization.
  • Apply more tightly coupled integration to specific products/frameworks. A good example of that is our Spring, Mule and upcoming web tier integration. This sort of integration enables an end-to-end seamless scaling story that makes the user experience significantly better. On the .Net side our integration with Office and Excel enables something equivalent.
  • Use open source as a tool to open up the framework for extensions and other integration work. This is something that we introduced quite recently through our new OpenSpaces.org community site and found it to be a useful tool with many extensions already available. GigaSpaces users implemented their own extensions and made them available through the community site. The most recent one has been Camel integration.
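
As a small illustration of the "annotations for mapping" method in the list above, the sketch below marks one field of a POJO as its routing key and reads it by reflection to pick a partition. The `@RoutingKey` annotation is defined here purely for the example; it is hypothetical and should not be confused with the annotations OpenSpaces actually ships, and the same goes for `Trade` and `RoutingKeyReader`.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.Objects;

// Hypothetical marker: "partition instances of this class by this field".
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface RoutingKey {}

// The domain class stays a POJO; the annotation is the only scale-out related addition.
final class Trade {
    @RoutingKey
    private final String accountId;
    private final double amount;

    Trade(String accountId, double amount) {
        this.accountId = accountId;
        this.amount = amount;
    }

    String getAccountId() {
        return accountId;
    }

    double getAmount() {
        return amount;
    }
}

final class RoutingKeyReader {
    private RoutingKeyReader() {}

    // Finds the first field marked @RoutingKey and returns its value.
    static Object routingValue(Object pojo) {
        for (Field field : pojo.getClass().getDeclaredFields()) {
            if (field.isAnnotationPresent(RoutingKey.class)) {
                field.setAccessible(true);
                try {
                    return field.get(pojo);
                } catch (IllegalAccessException e) {
                    throw new IllegalStateException("cannot read routing field " + field.getName(), e);
                }
            }
        }
        throw new IllegalArgumentException(pojo.getClass().getName() + " has no @RoutingKey field");
    }

    // Maps the routing value onto one of partitionCount partitions.
    static int partitionFor(Object pojo, int partitionCount) {
        return Math.floorMod(Objects.hashCode(routingValue(pojo)), partitionCount);
    }
}
```

Two `Trade` objects with the same `accountId` always land on the same partition, while the domain class itself contains no partitioning logic.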

Real life examples

Of course, this isn't just a theoretical discussion - we've been attempting to achieve this level of seamless scaling in practice since we introduced our middleware virtualization stack, which was our first attempt to address scaling of existing applications and not just new applications.

We have been involved in numerous scenarios of scaling out existing applications. An interesting example is detailed in Mickey's recent blog post, in which he describes in more detail how he was able to scale out a JBoss/Oracle RAC-based application. Mickey provides a good description with code snippets that show the before and after effects, both in terms of code changes and obviously scaling and performance. You can find the details of that experience here. The bottom line of this case study is the fact that he was able to get that application from 15 tx/sec to 1500 tx/sec in less than 4 days! For me, measuring the time it takes to move your EXISTING application and see the immediate results is the ultimate measure. You have to agree that if the transition to a scale-out model wasn't seamless, it wouldn't have been possible to do in such a short time, and more importantly, without ripping and replacing the entire application. In Mickey's case, we started with decoupling of the database to get the initial scaling, and replaced the other layers incrementally.

Summary

Storage taught us the lesson of seamless scaling. Seamless scaling can be achieved on other layers of our application as well, using a combination of Standard APIs, Abstractions, Aspects and tailored integration. In most cases, seamless scaling would mean no changes to our application code but would require changes to configuration and packaging. Not all layers can make a fully seamless transition. But in those more difficult cases, we can use the same principles to significantly minimize the changes required for scaling.

In this post I wanted to share some of our GigaSpaces experience in that area, as I believe many of the lessons and principles are pretty generic and can be applied to any project/product. At this point it is also important to note that this is not a one-off proposition. It's a continuous effort and requires a long-term roadmap and commitment. We've been struggling with this for years and applied every possible method to achieve this goal. Some required significant re-factoring of our entire infrastructure. The latest one has been the addition of our OpenSpaces framework as an open source development framework based on Spring. With this change, we can easily support more APIs and frameworks, as well as build an entire ecosystem around it that will enable others to apply the same model to even more frameworks and applications very easily.

You may wonder why we, as a commercial company, would want to do this - after all it also means that GigaSpaces can be replaced much more easily. Well, the reason is fairly simple - we believe that our success and adoption will be much higher if we can get to the point where scaling any application through GigaSpaces won't require any changes to code. It took a few years and an intensive effort to get to a point where I can feel comfortable using the two words "Seamless Scaling". Now we're starting to see the fruits of that effort - just see the recent post by Seon Lee, who appears to be one of the Mule users: Mule 2.0 + GigaSpaces 6.5 = Pure Sex:

Gigaspaces released 6.5 with API integration with Mule 2.0 … this is just plain awesome. You can use Gigaspaces as the transport (e.g. in place of JMS) and quickly get a SBA up and running utilizing the same concepts I used at RHG when we were servicing B2B problems. You also get the advantage of the clustering ability and fault tolerance that comes with Gigaspaces – which is just pure sex – not to mention all the other great features that come with this advanced Javaspaces implementation (i.e. management tools, monitoring tools, data partitioning, performance features like batching).

I expect to see even more along those lines with our latest 6.6 release, which includes Seamless Scaling of Web applications - check that out!

July 21, 2008

GigaSpaces is Available on Apache Camel

Apache Camel is a Spring-based integration framework.
I was happy to see that David Greco released a JavaSpace connector for Camel based on GigaSpaces.

Quoting from David's description of the connector:

"The javaspace: component is a transport for working with any JavaSpace compliant implementation, this component has been tested with both the Blitz implementation and the GigaSpace implementation .
This component can be used for sending and receiving any object inheriting from the Jini Entry class, it's also possible to pass an id (Spring Bean) of a template that can be used for reading/taking the entries from the space.
This component can be also used for sending/receiving any serializable object acting as a sort of generic transport. The JavaSpace component contains a special optimization for dealing with the BeanExchange. It can be used, then, for invoking remotely a POJO using as a transport a JavaSpace.
This latter feature can be used for an easy implementation of the master/worker pattern where a POJO provides the business logic for the worker.
Look at the test cases for seeing the various usage option for this component."

Interestingly enough, I'm seeing more and more use of space-based transports to drive this new type of scale-out integration framework. Beyond the space transport, I believe that Camel users can also use the space as a data store for sharing state between the various services, without needing to go to a database just for that purpose.

Nice work David!





June 16, 2008

Meet us @ TSSJS Prague

TheServerSide has a good record of picking nice spots for their conferences, and this year's Java Symposium in Prague is no exception.

It's looking to be a fun event, as I'm going to meet not just lots of old friends, but also the winners of our first OpenSpaces developer contest. I've already written about the contest and some of the submissions in a previous post. If you haven't already, check it out, as we decided to continue with the contest next year and use TSSJS as an opportunity for attendees to apply for "early bird scholarships" worth $1,000 each -- all this at an award ceremony we're holding in Prague during the show. Besides free booze and food at this event, we're going to show a nice video featuring the judges of the competition. Those who worked with Alit through the current competition will probably be happy to know that she is going to lead this part of the show together with John Davies, who was one of the judges.

I'm going to be there with some of my colleagues from GigaSpaces, namely Shay Banon and Uri Cohen.
My presentation is titled Getting ready for the cloud, but it really talks about the next wave in distributed computing, in which clouds play an important role and have the potential to change many of the things we used to do in the past.

Banon will be talking about some of the work that he's been doing with Mule, Lucene/Compass and Spring in his session Beyond Data Grids. I've seen him discussing some of these topics in Las Vegas this year, so I know it's going to be really interesting. Last time it sparked many questions about how clustering technologies can deal with scaling challenges, how in-memory data grids can replace or co-exist with traditional databases, and how they can be applied to different frameworks given real life examples.

Uri is going to talk about his experience in building scalable web 2.0 applications using Ajax, Tomcat and Spring MVC, and running on the Amazon EC2 cloud. He will discuss specific patterns for dealing with Ajax scaling issues, and also provide patterns and tips for moving from a tier-based to a scale-out model based on recent work he's done with JBoss and, of course, GigaSpaces.

The TSS event is also going to be a good opportunity for us to show some of the latest developments in our upcoming 6.5 release, such as the new Service Virtualization Framework (based on Spring Remoting), dynamic language support, extended support for Hibernate and enhanced database integration, built-in Maven support, support for Spring 2.5 annotations, and enhanced administrative and real-time monitoring.

Mule users will also benefit from our extensive support for the Mule ESB. We're also going to show some of the latest developments with EC2 and cloud computing environments. Even though TSS events tend to be Java-centric, I believe that Java users will be happy to learn about our interoperability among Java, C++ and .Net. For those unfamiliar with it, I would recommend giving it a closer look, as it provides a high-performance and extremely simple way of making the language barrier pretty much obsolete.

There is much more to it than I can cover in this post. In fact, we realized that an entire post will not be enough to cover all the relevant content of our 6.5 release, so expect to see several dedicated posts in the coming weeks -- here and on the GigaSpaces Blog -- covering different aspects of new features, including some "behind-the-scenes" stories. Stay tuned!

April 30, 2008

Cool Projects on OpenSpaces.org

The OpenSpaces.org community site launched in January. I was surprised by the rapid adoption of OpenSpaces since then, with lots of interesting innovations on things I didn't even think of. I'm sure that some of the projects will be very useful to many OpenSpaces users. This shows the value of an ecosystem and community. Given the right tools, people will start collaborating and sharing things that otherwise would be buried on their hard disks, or in their minds.

The OpenSpaces.org site also provides a great tool for GigaSpaces Partners and individuals in the general developer community to expose their skills by publishing valuable content. A good example is GridDynamics, a GigaSpaces partner, who invested time and effort on producing high quality, well-documented projects.

The same goes for various people on the GigaSpaces team who came up with great ideas based on work that they did with customers. They use the OpenSpaces.org platform to share the tools they developed with other users in the community who might have similar needs. For example, the OpenSpaces demos project shows how to integrate Ajax, Spring MVC and OpenSpaces to scale a typical web application (market data front end, in this specific case).

Another good example is TGris, an extension of the testing grid framework that we use internally at GigaSpaces, and which several customers showed interest in for automating the testing of their own applications (note that the tool is not specific to OpenSpaces).

Another class of interesting projects are those that integrate OpenSpaces with various frameworks and APIs. These projects simplify the integration and adoption process, and shorten time-to-value. Good examples are the projects that provide integration for OpenSpaces/GigaSpaces with Amazon SimpleDB, JPA, and Memcached, as well as the Cache Integration project, which enables OpenSpaces/GigaSpaces support for many frameworks, such as Acegi Security, Cocoon, Jetty, iBatis, OpenJPA, Velocity and others.

Other people built entire functional applications,  such as Leonardo Gocalves's  GoDo - Goods Donation System (see details below), and Jim Liddle's MobileGSFeed, which provides a scalable solution for handling Atom feeds through the iPhone. Jim actually runs our Sales in the UK & Ireland. Never in my dreams did I imagine that OpenSpaces.org would be used by sales guys :-)

Anyway, I'm very pleased to let you know that we reached an important milestone for OpenSpaces two weeks ago when we reached the deadline of the developer contest. Fourteen candidates made it to the final stages. Only three will be finalists. A distinguished panel of judges interviewed each contestant. The judges are Adrian Colyer, CTO, SpringSource; Joe Ottinger, Editor, TheServerSide.com; John Davies; Julian Brown, Architecture Consultant, RWE;  Keerat Sharma, Platform Engineer, Gallup; and Ross Mason, Co-founder and CTO, MuleSource.

All of the candidates put up a real good fight and made it very hard for the judges to reach their final decision. The winners of the contest will be announced in a nice venue in Prague during TheServerSide Java Symposium event. Stay tuned for updates on the exact date and venue here and on The GigaSpaces Blog and web site. We also intend to publish interviews with each of the finalist project owners and post them in a blog.

Here are some of the interesting projects (in alphabetical order). The full list of projects can be found here.

Please join one of the projects or start a new one yourself. If you already developed something, but are concerned about the time it will take to initiate a new project -- don't be! It is extremely easy and quick to start a new project and if you need any help, we're ready to support you.

 

 

 

 

April 29, 2008

Nice article on Space-Based Architecture

I just came across a very good article by Clara Ko summarizing the concept behind Space-Based Architecture on Java Pulse. The article is based on our recent white paper Scaling Spring Application In 4 Steps. Clara's article is available here. I highly recommend reading it.

I particularly liked the way Clara summarized the problem with tier-based architectures:

GigaSpaces argues that tier-based architecture is the nemesis to linear scalability. In fact, the architecture itself is the bottleneck, not unoptimized component. This is because in a tier-based architecture, as scability patches are introduced, the communication between the tiers become more complicated and expensive. The complex interaction between tiers and between physical machines in the deployment cause problems with latency and data consistency. Possible problems with tier-based architecture are bottlenecked access to the database, bottlenecked access to a centralized messaging provider, messaging overhead, network latency, inefficient clustering, and unreliable failover recovery. GigaSpaces addresses each of those problems by providing a platform that enables space-based architecture.

Couldn't say it any better.

March 29, 2008

Scaling Out MySQL

With the recent acquisition of MySQL by Sun, there has been talk about the MySQL open source database now becoming relevant to large enterprises, presumably because it now benefits from Sun's global support, professional services and engineering organizations. In a blog post about the acquisition, Sun CEO Jonathan Schwartz wrote that this is one of his objectives.

While the organizational aspects may have been addressed by the acquisition, MySQL faces some technology limitations which hinder its ability to compete in the enterprise. Like other relational databases, MySQL becomes a scalability bottleneck because it introduces contention among the distributed application components.

There are basically two approaches to this challenge that I'll touch on in this post:

1. Scale your database through database clustering

2. Scale your application, while leaving your existing database untouched by front-ending the database with In-Memory-Data-Grid (IMDG) or caching technologies. The database acts as a persistence store in the background. I refer to this approach as Persistence as a Service (PaaS).

While both options are valid (with pros and cons), in this post I'll focus mostly on the second approach, which introduces some thought-provoking ideas for addressing the challenge.

Disclaimer: While there are various alternative in-memory data grid products, such as Oracle Coherence and IBM ObjectGrid, in this post I'll focus on the GigaSpaces solution, because for obvious reasons I happen to know it better. Having said that, I try to cover the core principles presented here in generic terms as much as possible.

Scaling your database through database clustering:

There are two main approaches for addressing scalability through database clustering:

  • Database replication is used to address concurrent access to the same data. Database replication enables us to load-balance the access to the shared data elements among multiple replicated database instances. In this way we can distribute the load across database servers, and maintain performance even if the number of concurrent users increases.

    Limitations:

      • Limited to "read mostly" scenarios: when it comes to inserts and updates, replication overhead may be a bigger constraint than working with a single server (especially with synchronous replication).
      • Performance: constrained by disk I/O performance.
      • Consistency: asynchronous replication leads to inconsistency, as each database instance might hold a different version of the data. The alternative -- synchronous replication -- may cause significant latency.
      • Utilization/Capacity: replication assumes that all nodes hold the entire data set. This creates two problems: 1) each table holds a large amount of data, which increases query/index complexity; 2) we need to provision (and pay for) more storage capacity in direct proportion to the number of replicated database instances.
      • Complexity: most database replication implementations are hard to configure and are known to cause stability issues.
      • Non-standard: each database product has different replication semantics, configuration and setup. Moving from one implementation to another might become a nightmare.

  • Database partitioning ("sharding"): database shards/partitions enable the distribution of data on multiple nodes. In other words, each node holds part of the data. This is a better approach for scaling both read and write operations, and it makes more efficient use of capacity, as it reduces the volume of data in each database instance.

    Limitations:

      • Limited to applications whose data can be easily partitioned.
      • Performance: we are still constrained by disk I/O performance.
      • Requires changes to the data model: we need to modify the database schema to fit a partitioned model. Many database implementations require that the application code knows which partition the data resides in, which brings us to the next point.
      • Requires changes to application code: aggregated queries need a different execution model (map/reduce and the like).
      • Static: in most database implementations, adding or changing partitions involves down-time and re-partitioning.
      • Complex: setting up database partitions is a fairly complex task, due to the number of moving parts and the potential for failure during the process.
      • Non-standard: as with replication, each database product has different partitioning semantics, configuration and setup. Partitioning introduces more severe limitations, as it often requires changes to our database schema and application code when moving from one database product to another.
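
To make the partitioning limitations a bit more concrete, here is a minimal Java sketch of hash-based shard routing done in application code. The class name ShardRouter and the JDBC URLs are hypothetical placeholders of mine, not part of MySQL or any other product; the point is only to show how knowledge of the partitions leaks into the application.

```java
import java.util.List;

/** Hypothetical sketch: maps a routing key (e.g. a customer id) to one of N database shards. */
final class ShardRouter {
    private final List<String> shardUrls;

    ShardRouter(List<String> shardUrls) {
        if (shardUrls.isEmpty()) {
            throw new IllegalArgumentException("at least one shard is required");
        }
        this.shardUrls = List.copyOf(shardUrls);
    }

    /** Returns the index of the shard that owns the given key. */
    int shardIndexFor(Object routingKey) {
        // floorMod keeps the result non-negative even for negative hash codes.
        return Math.floorMod(routingKey.hashCode(), shardUrls.size());
    }

    /** Returns the connection URL of the shard that owns the given key. */
    String shardUrlFor(Object routingKey) {
        return shardUrls.get(shardIndexFor(routingKey));
    }

    public static void main(String[] args) {
        // Placeholder URLs, for illustration only.
        ShardRouter router = new ShardRouter(List.of(
                "jdbc:mysql://db0.example.com/app",
                "jdbc:mysql://db1.example.com/app",
                "jdbc:mysql://db2.example.com/app"));
        System.out.println(router.shardUrlFor(42L));
    }
}
```

Note that with this simple modulo scheme, changing the number of shards changes the owner of most keys, which is one way to see why adding partitions usually means re-partitioning the data.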

Time for a change  -  is database clustering the best we can do?

The fundamental problems with both database replication and database partitioning are the reliance on the performance of the file system/disk and the complexity involved in setting up database clusters. No matter how you turn it around, file systems are fairly ineffective when it comes to concurrency and scaling. This is pure physics:  how fast can disk storage be when every data access must go through serialization/de-serialization to files, as well as mapping from binary format to a usable format? And how concurrent can it be when every file access relies on moving a physical needle between different file sectors? This puts hard limits on latency. In addition, latency is often severely affected by lack of scalability. So putting the two together makes file systems -- and databases, which heavily rely on them -- suffer from limited performance and scalability.

These database patterns evolved under the assumption that memory is scarce and expensive, and that network bandwidth is a bottleneck. Today, memory resources are abundant and available at a relatively low cost. So is bandwidth. These two facts allow us to do things differently than we used to, when file systems were the only economically feasible option.

Scaling through In Memory Caching/Data Grid

It is not surprising that to enhance scalability and performance many Web 2.0 sites use an in-memory caching solution as a front-end to the database. One such popular solution is memcached. Memcached is a simple open source distributed caching solution that uses a protocol level interface to reference data that resides in an external memory server. Memcached enables rudimentary caching and is designed for read-mostly scenarios. It is used mainly as an addition to the LAMP stack.

The simplicity of memcached is both an advantage and a drawback. Memcached is very limited in functionality. For example, it doesn't support transactions, advanced query semantics, or a local cache. In addition, its protocol-based approach requires the application to be explicitly exposed to the cache topology, i.e., it needs to be aware of each server host, and explicitly map operations to a specific node. These limitations prevent us from fully exploiting the memory resources available to us. Instead, we are still heavily relying on the database for most operations.

Enter in-memory Data Grids.

In-memory data grids (IMDG) provide object-based database capabilities in memory, and support core database functionality, such as advanced indexing and querying, transactional semantics and locking. IMDGs also abstract data topology from application code. With this approach, the database is not completely eliminated, but put in the *right* place. I refer to this model as Persistence as a Service (PaaS). I covered the core principles of this model in this post. Below I'll respond to some of the typical questions I am asked when I present this approach.

How does Persistence as a Service work?

With PaaS, we keep the existing databases as-is: same data, same schema and so on. We use a "memory cloud" (i.e., an in-memory data grid) as a front-end to the database. The IMDG loads its initial state from the database and from that point on acts as the "system of record" for our application. In other words, all updates and queries are handled by the IMDG. The IMDG is also responsible for keeping the database in sync. To reduce performance overhead, synchronization with the database is done asynchronously. The rate at which the database is kept in sync is configurable.

The in-memory data model can be different from the one stored in the database. In most cases, the memory-based data model will be partitioned to gain maximum scalability and performance, while the database remains unchanged.
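
The flow described above can be sketched in a few lines of Java. This is only an illustration under my own assumptions, not the GigaSpaces API: the class WriteBehindStore, its method names, and the databaseWriter callback are all hypothetical, and plain strings stand in for real objects.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Consumer;

/** Hypothetical write-behind store: memory is the system of record, the database trails it. */
final class WriteBehindStore {
    private final Map<String, String> memory = new ConcurrentHashMap<>();
    private final BlockingQueue<Map.Entry<String, String>> pending = new LinkedBlockingQueue<>();
    private final Consumer<List<Map.Entry<String, String>>> databaseWriter;
    private final int batchSize;

    WriteBehindStore(Consumer<List<Map.Entry<String, String>>> databaseWriter, int batchSize) {
        this.databaseWriter = databaseWriter;
        this.batchSize = batchSize;
    }

    /** Updates are applied in memory immediately and only queued for the database. */
    void put(String key, String value) {
        memory.put(key, value);
        pending.add(Map.entry(key, value));
    }

    /** Reads never touch the database. */
    String get(String key) {
        return memory.get(key);
    }

    /** Drains up to batchSize queued updates and hands them to the database in one call. */
    int flushOnce() {
        List<Map.Entry<String, String>> batch = new ArrayList<>();
        pending.drainTo(batch, batchSize);
        if (!batch.isEmpty()) {
            databaseWriter.accept(batch);
        }
        return batch.size();
    }

    public static void main(String[] args) {
        WriteBehindStore store = new WriteBehindStore(
                batch -> System.out.println("writing " + batch.size() + " updates to the database"), 100);
        store.put("order-1", "NEW");
        store.put("order-1", "SHIPPED");
        System.out.println(store.get("order-1")); // served from memory
        store.flushOnce();                        // in practice driven by a background timer
    }
}
```

In a real deployment flushOnce would be called periodically by a background thread, and the interval and batch size are the knobs behind "the rate at which the database is kept in sync is configurable".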



How does PaaS improve performance compared to a relational database?

Performance gains over relational databases are achieved because: 

  • PaaS relies on memory as the system of record, and memory is significantly faster and more concurrent than file systems.
  • Data can be accessed by reference, i.e., no need for continuous serialization of data, as with a file system.
  • Data manipulation is performed directly on the in-memory objects. Complex manipulation is easily achieved by running either Java/.Net/C++ code or a SQL query. There is no need for serialization/de-serialization of data or network calls during the process.
  • Reduced contention: instead of placing all data in a single table, and consequently having many clients accessing that table, we split it into many small tables, each of which will be accessed by a smaller number of clients.
  • Parallel aggregated queries: queries that need to span multiple partitions to perform join/sum/max operations can be executed in parallel across the nodes. The fact that the queries run on smaller data sets reduces the time it takes to perform the actual operation on each node. In addition, the fact that queries execute on multiple machines leverages the full CPU and memory power of those machines.
  • In-process local cache: read-mostly operations are cached in the client application local address space. This means that subsequent reads will be executed locally.
  • Avoid Object-Relational Mapping (ORM): read operations are performed directly from memory in object format. Thus, there is no need for O/R mapping overhead at this level. O/R mapping happens in the background either during the initial load process, or during the asynchronous update of the database.
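
As one illustration from the list above, the in-process local cache can be sketched as follows. The class LocalCache and the remoteRead function are hypothetical names of mine; a real data grid would also push invalidations to the client, which I only hint at here.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Hypothetical near cache: repeated reads of the same key are served from the client's own heap. */
final class LocalCache<K, V> {
    private final Map<K, V> local = new ConcurrentHashMap<>();
    private final Function<K, V> remoteRead;

    LocalCache(Function<K, V> remoteRead) {
        this.remoteRead = remoteRead;
    }

    /** Returns the cached value, going to the remote grid only on a miss. */
    V get(K key) {
        return local.computeIfAbsent(key, remoteRead);
    }

    /** Must be called when the grid reports a change, otherwise local reads go stale. */
    void invalidate(K key) {
        local.remove(key);
    }

    public static void main(String[] args) {
        LocalCache<String, String> cache = new LocalCache<>(key -> {
            System.out.println("remote read for " + key);
            return "value-of-" + key;
        });
        cache.get("quote:IBM"); // goes to the grid
        cache.get("quote:IBM"); // served locally
    }
}
```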

If you keep the database in sync, isn't your solution limited by database performance? 

No. Because:

  • Data is sent asynchronously and in batches
  • Updates are performed in parallel by all partitions.
  • Updates to the database are executed through a mirror service collocated on the same machine as the database. This reduces the network overhead of database access and allows specific optimizations such as batch operations.
  • The database is not used for high availability purposes. This means that in-flight transactions are not stored in the database, only the end result of the business transactions. This reduces the number of updates that are sent to the underlying database. Also keep in mind that queries don't really hit the database, only updates and inserts. All this together means that the IMDG acts as a smart buffer to the database. It is common that the number of read/update hits the IMDG receives is 10x higher than the number of hits the underlying database sees.
  • The database and the application are now decoupled, giving you more options for optimization. For example, there are scenarios where writing to the database is required to ensure the durability of the data. In this scenario, you can store the data directly in a persistent log (to ensure durability). The log can be updated at a relatively high rate. You can read the data from the persistent log back into the database as an off-line operation. With these options in place we can easily get to 30,000 to 40,000 updates per second with a single instance of MySQL. If this is not sufficient, you can always add database clustering to speed up database access.

Doesn't asynchronous replication mean that data might be lost in case of failure?
No, because asynchronous replication refers to the transfer of data between the IMDG and the database. The IMDG, however, maintains in-memory backups that are synchronously updated. This means that if one of the nodes in a partitioned cluster fails before the replication to the underlying database took place, its backup will be able to instantly continue from that exact point.

What happens if one of my memory partitions fails?

The backup of that partition takes over and becomes the primary. The data grid cluster-aware proxy re-directs the failed operation to the hot backup implicitly. This enables a smooth transition of the client application during failure -- as if nothing happened. Each primary node may have multiple backups to further reduce the chance of total failure. In addition, the cluster manager detects failure and provisions a new backup instance on one of the available machines. 
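
Here is a toy, single-JVM Java sketch of the primary/backup idea. It is an assumption-laden simulation of mine (the class ReplicatedPartition and its methods are hypothetical), with two in-memory maps standing in for two machines; it only shows why a synchronously updated backup can take over without losing acknowledged writes.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical partition with one synchronously updated in-memory backup. */
final class ReplicatedPartition {
    private Map<String, String> primary = new HashMap<>();
    private Map<String, String> backup = new HashMap<>();

    /** A write returns only after both copies hold the value (synchronous replication). */
    synchronized void put(String key, String value) {
        primary.put(key, value);
        backup.put(key, value);
    }

    synchronized String get(String key) {
        return primary.get(key);
    }

    /** Simulates losing the primary: the backup is promoted and a fresh backup is seeded from it. */
    synchronized void failPrimary() {
        primary = backup;
        backup = new HashMap<>(primary);
    }

    public static void main(String[] args) {
        ReplicatedPartition partition = new ReplicatedPartition();
        partition.put("account-7", "100");
        partition.failPrimary();
        System.out.println(partition.get("account-7")); // still available after failover
    }
}
```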

What happens if the database fails?
The IMDG maintains a log of all updates and can re-play them as soon as the database becomes available again. It is important to note that during this time the system continues to operate unaffected. The end user will not notice this failure!  
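
A rough Java sketch of such an update log is shown below. Again, ReplayLog and the tryWriteToDatabase callback are hypothetical names I made up for illustration; the sketch keeps updates in order and stops replaying at the first failure so that nothing is applied out of sequence.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Predicate;

/** Hypothetical redo log: updates wait in order until the database accepts them. */
final class ReplayLog {
    private final Deque<String> log = new ArrayDeque<>();
    private final Predicate<String> tryWriteToDatabase; // returns false while the database is down

    ReplayLog(Predicate<String> tryWriteToDatabase) {
        this.tryWriteToDatabase = tryWriteToDatabase;
    }

    /** Records an update; the caller is not blocked by the state of the database. */
    void record(String update) {
        log.addLast(update);
    }

    /** Replays from the oldest entry and stops at the first failure, so ordering is preserved. */
    int replay() {
        int applied = 0;
        while (!log.isEmpty() && tryWriteToDatabase.test(log.peekFirst())) {
            log.removeFirst();
            applied++;
        }
        return applied;
    }

    int pending() {
        return log.size();
    }

    public static void main(String[] args) {
        boolean[] databaseUp = {false};
        ReplayLog replayLog = new ReplayLog(update -> databaseUp[0]);
        replayLog.record("UPDATE orders SET status='SHIPPED' WHERE id=1");
        System.out.println("applied while down: " + replayLog.replay());
        databaseUp[0] = true;
        System.out.println("applied after recovery: " + replayLog.replay());
    }
}
```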

How do I maintain transactional integrity?
The IMDG supports the standard  two-phase commit protocol and XA transactions. Having said that, this model should be avoided as much as possible due to the fact that it introduces dependency among multiple partitions, as well as creates a single point of distributed synchronization in our system. Using a classic distributed transaction model doesn't take advantage of the full linear scalability potential of the partitioned topology. Instead, the recommended approach is to break transactions into small, loosely-coupled services, each of which can be resolved within a single partition. Each partition can maintain transaction integrity using local transactions. This model ensures that in partial failure scenarios the system is kept in a consistent state. 

How is transactional integrity maintained with the database?
As noted above, distributed transactions might introduce a severe performance and scalability bottleneck, especially if done with the database. In addition, attempting to execute transactions with the database violates one of the core principles behind PaaS: asynchronous updates to the database. To avoid this overhead, the IMDG ensures that transactions are resolved purely in-memory and are sent to the database in a single batch. If the update to the database fails, the system will re-try that operation until the update succeeds. 

What types of queries are supported?

  • Template matching (matching object based on class name, class hierarchy, and attribute values)
  • SQL – support range queries, 'like' semantics, etc.
  • Continuous queries – through a combination of notification and SQL.
  • Parallel query (a.k.a. Map/Reduce) – queries that are not designated for a specific partition are automatically broadcast to all partitions and the result is implicitly aggregated on the client side.
  • Iterator – iterate through a large result-set of data.
  • You can find some code snippets of the different query APIs here.
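
For readers who haven't seen template matching before, here is a small plain-Java sketch of the idea: a template is an object of the same class, and any field left null acts as a wildcard. This is my own simplified illustration using reflection (the TemplateMatching and Order classes are hypothetical), not the actual GigaSpaces or JavaSpaces implementation.

```java
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration of template matching: null template fields are wildcards. */
final class TemplateMatching {

    /** Example entry type; public fields keep the reflection simple. */
    static class Order {
        public String customer;
        public Integer quantity;

        Order(String customer, Integer quantity) {
            this.customer = customer;
            this.quantity = quantity;
        }
    }

    /** An entry matches when it is an instance of the template's class and every non-null template field is equal. */
    static boolean matches(Object template, Object entry) {
        if (!template.getClass().isInstance(entry)) {
            return false;
        }
        try {
            for (Field field : template.getClass().getFields()) {
                Object wanted = field.get(template);
                if (wanted != null && !wanted.equals(field.get(entry))) {
                    return false;
                }
            }
            return true;
        } catch (IllegalAccessException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Returns every entry in the (simulated) space that matches the template. */
    static <T> List<T> readAll(List<T> space, T template) {
        List<T> result = new ArrayList<>();
        for (T entry : space) {
            if (matches(template, entry)) {
                result.add(entry);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Order> space = List.of(new Order("acme", 5), new Order("acme", 9), new Order("globex", 5));
        // All orders for customer "acme", regardless of quantity.
        System.out.println(readAll(space, new Order("acme", null)).size());
    }
}
```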

This model relies heavily on partitioning. How do I handle queries that need to span multiple partitions?
Aggregated queries are executed in parallel on all partitions. You can combine this model with stored procedure-like queries to perform more advanced manipulations, such as sum and max. See more details below.   
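
The scatter/gather pattern behind such a query can be sketched in plain Java as follows. The class ScatterGather and the method parallelSum are hypothetical, and in-memory lists stand in for remote partitions; the shape (run a partial aggregation on every partition in parallel, then reduce on the client) is what matters.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.function.ToLongFunction;
import java.util.stream.Collectors;

/** Hypothetical scatter/gather helper: partial results per partition, reduced on the client. */
final class ScatterGather {

    /** Runs the same partial aggregation on every partition in parallel, then sums the results. */
    static <P> long parallelSum(List<P> partitions, ToLongFunction<P> partialSum) {
        // Scatter: start one asynchronous task per partition.
        List<CompletableFuture<Long>> futures = partitions.stream()
                .map(p -> CompletableFuture.supplyAsync(() -> partialSum.applyAsLong(p)))
                .collect(Collectors.toList());
        // Gather/reduce: combine the per-partition results on the client side.
        return futures.stream().mapToLong(CompletableFuture::join).sum();
    }

    public static void main(String[] args) {
        List<List<Integer>> partitions = List.of(List.of(1, 2, 3), List.of(10, 20));
        long total = parallelSum(partitions, p -> p.stream().mapToLong(Integer::longValue).sum());
        System.out.println(total);
    }
}
```

A max query has the same shape: each partition returns its local maximum and the client takes the maximum of those.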

What about stored procedures and prepared statements?
Because the data is stored in memory, we avoid the use of a proprietary language for stored procedures. Instead, we can use either native Java/.Net/C++ or dynamic languages, such as Groovy and JRuby, to manipulate the data in memory. The IMDG provides native support for executing dynamic languages, routes the query to where the data resides, and enables aggregation of the results back to the client. A reducer can be invoked on the client-side to execute second  level aggregation. See a code example that illustrates how this model works here. 

Can I change these prepared statements and stored procedure equivalents without bringing down the data?
Yes. When you change the script, it is reloaded on the server while the server is up, without the need to bring down the data. The same capability exists in case you need to refresh collocated services code on the server side.

Do I need to change my application code to use an IMDG?
It depends. There are cases in which introducing an IMDG can be completely seamless and there are cases in which you will need to go through a re-write, depending on the programming model:

 

 

  • Hibernate 2nd level cache
      Nature of integration with IMDG: Seamless
      Comments/limitations: Best fit for read-mostly applications. Limited performance gain as it still heavily relies on the underlying database.

  • JDBC
      Nature of integration with IMDG: Seamless, but limited
      Comments/limitations: SQL commands written against the IMDG are guaranteed to run with other JDBC resources. Doesn't support full SQL 92, and therefore existing applications may require code changes. Recommended for monitoring and administration. Not recommended for application development, as it introduces unnecessary O/R mapping complexity.

  • HashMap
      Nature of integration with IMDG: Seamless
      Comments/limitations: Extensions such as timeout and transaction support are available as well.

  • GigaSpaces Spring DAO
      Nature of integration with IMDG: Partially seamless
      Comments/limitations: Abstracts the transaction handling from the code. The domain model is based on POJOs and therefore doesn't require explicit changes, only annotations (annotations can be provided through an external XML file). If our application already uses a DAO pattern, only the DAO needs to change, which narrows the scope of changes required to use an IMDG-specific interface. This option is highly recommended for best performance and scalability.

What topologies are supported?
Replicated (synchronous or asynchronous), partitioned, partitioned-with-backup.
See details here.

Do I need to change my code if I switch from one topology to another?

No. The topology is abstracted from the application code. The only caveat is that your code needs to be implemented with partitioning in mind, i.e., moving from a central server or a replicated topology to partitioning doesn't require changes to the code as long as your data includes an attribute that acts as a  routing index.
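
To show what "abstracted from the application code" can look like, here is a small Java sketch. All names in it (TopologyDemo, DataStore, SingleNodeStore, PartitionedStore) are hypothetical and of my own making; the application logic in saveAndLoad stays the same whichever topology is plugged in, because every call already carries a routing key.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration: the same application code runs against different topologies. */
final class TopologyDemo {

    /** Application code depends only on this interface. */
    interface DataStore {
        void write(String routingKey, String value);
        String read(String routingKey);
    }

    /** Central-server topology: everything lives in one map. */
    static final class SingleNodeStore implements DataStore {
        private final Map<String, String> data = new HashMap<>();
        public void write(String routingKey, String value) { data.put(routingKey, value); }
        public String read(String routingKey) { return data.get(routingKey); }
    }

    /** Partitioned topology: the routing key decides which partition owns the data. */
    static final class PartitionedStore implements DataStore {
        private final List<Map<String, String>> partitions = new ArrayList<>();

        PartitionedStore(int partitionCount) {
            for (int i = 0; i < partitionCount; i++) {
                partitions.add(new HashMap<>());
            }
        }

        private Map<String, String> owner(String routingKey) {
            return partitions.get(Math.floorMod(routingKey.hashCode(), partitions.size()));
        }

        public void write(String routingKey, String value) { owner(routingKey).put(routingKey, value); }
        public String read(String routingKey) { return owner(routingKey).get(routingKey); }
    }

    /** Unchanged application logic, whichever topology is plugged in. */
    static String saveAndLoad(DataStore store, String accountId, String balance) {
        store.write(accountId, balance);
        return store.read(accountId);
    }

    public static void main(String[] args) {
        System.out.println(saveAndLoad(new SingleNodeStore(), "account-7", "100"));
        System.out.println(saveAndLoad(new PartitionedStore(4), "account-7", "100"));
    }
}
```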

How are IMDGs and PaaS different from in-memory databases (IMDB)?

The relational model itself prevents us from taking full advantage of the fact that the data is stored as objects in memory. For example, when we use in-memory storage in an IMDG, we don't need the O/R mapping layer. In addition, we don't need separate languages to perform data manipulation. We can use the native application code, or dynamic languages, for that purpose.

Moreover, one of the fundamental problems with in-memory databases is that relational SQL semantics are not geared to deal with distributed data models. For example, an application that runs on a central server and uses things like joins, which often rely on references among tables, or even aggregated queries such as Sum and Max, doesn't map well to a distributed data model. This is why many existing IMDB implementations only support very basic topologies and often require significant changes to the data schema and application code. This reduces the motivation for using in-memory relational databases, as they lack transparency.

The GigaSpaces in-memory data grid implementation, for example, exposes a JDBC interface and provides SQL query support. Applications can therefore benefit from best of both worlds: you can read and write objects directly through the GigaSpaces API, query those same objects using SQL semantics, and view and manipulate the entire data set using regular database viewers.

Can I use an existing Hibernate mapping to map data from the database to the IMDG?

Yes. In addition, with PaaS, the Hibernate mapping overhead is reduced as most of it happens in the background, during initial load or during the asynchronous update to the database.

Further information on Hibernate support is available here.

Can I use PaaS with .Net or C++ applications?

Yes. Starting with GigaSpaces 6.5 both Hibernate (Java) and nHibernate (.Net) are supported. C++ applications defer to the default Hibernate implementation. In addition, with GigaSpaces' new integration with Microsoft Excel, .Net users can easily access data in the IMDG directly from their Excel spreadsheets without writing code!

Final words:

While this approach is generic and can be applied to any database product, MySQL is the most interesting to discuss as it is widely adopted by those who need cost-effective scalability the most, such as web services, social networks and other Web 2.0 applications. In addition, MySQL faced several challenges in penetrating large enterprises. With its acquisition by Sun, MySQL becomes a viable option for such organizations, but still requires the capabilities mentioned above to compete effectively with rival databases. The combination of IMDG/PaaS with MySQL provides a good solution for addressing some of the bigger challenges in cloud-based deployments. More on that in a future post.

January 06, 2008

What a year!

For the past few days I've been trying to write a 2007 summary, but I found this task to be extremely difficult because so many things happened this year on so many fronts. I thought that it would probably be best to start by thanking all of our customers who chose GigaSpaces. Special thanks to those who were kind enough to be public about their choice, as you can see here, here and here.

This year we saw that many customers found that GigaSpaces and caching solutions are almost incomparable due to the tremendous progress we have made with our 6.0 release (More details provided below). This led to an interesting trend: Customers who already worked with one of the alternative caching products decided to add GigaSpaces to their environment in conjunction with those caching products. In this context GigaSpaces acts as a high-performance SOA platform through the use of our SLA-Driven Container, as well as the messaging bus and Spring components. In fact, in Q4 we closed a very large enterprise deal with one of the leading investment banks for exactly this reason.

I expect that during 2008 we'll see more of this as we plan to provide official support for these sorts of mixed environments. If you think about it, it makes perfect sense. In the JEE world this is a fairly common scenario. For example, it's common to have WebLogic as the application platform in conjunction with other components, such as databases or messaging middleware from other providers, even though WebLogic offers these components. I call that a middleware mash-up :)

Another interesting trend that we've seen is a new class of customers using GigaSpaces:

  1. Customers from other industries, such as Web 2.0 and online gaming (as you can imagine, there is similarity between the two)
  2. Customers who chose GigaSpaces mainly to improve the reliability and responsiveness of their online services. This included online banking applications as well as an order management system for the launch of a cool phone in Europe (I apologize that at this point I can't give more details).

We've also seen more and more customers who chose Spring+Hibernate as their middleware stack and wanted to maintain the high availability, scalability and performance of their Spring application without the complexity of J2EE. Rod Johnson of SpringSource gives a nice explanation of this in a presentation, which is available online here (see slide 33), and in this blog post:

Is it a Tomcat, or the Elephant in the Room?

"...An interesting force is that the highest performing grid/clustering solutions are not the app servers themselves, but specialist solutions such as GigaSpaces, Oracle Coherence and IBM ObjectGrid. There is no reason that HA features need to be associated with Java EE servers."

From a product perspective, we had ambitious goals to enable the scaling-out of stateful applications (the entire application, not just the data) in a way that will be as simple as running it on a single machine. The other challenge was to enable the transition of EXISTING tier-based applications to the scale-out model in a *seamless* manner, just by plugging in our runtime implementation.

We managed to achieve these goals by working on several fronts:    

  • Develop a blueprint based on best practices to achieve scaling, latency, performance and reliability in a scale-out model. We refer to this blueprint as Space Based Architecture (SBA).
  • Extend our product to support SBA by adding built-in components, such as the processing-unit, and event containers.
  • Provide an end-to-end approach for virtualization of the application in a scale-out model through:
    • Middleware virtualization - (Data, Messaging, Services)
    • Deployment virtualization - through our SLA driven container
  • Provide seamless transition from the tier-based model to a scale-out model through API facades and declarative abstractions. 

To validate our assumptions, we took a classic transactional application based on the tier-based model and compared it with SBA in terms of complexity, performance and latency. We applied the SAME CODE to both tests (the only change was within the DAO). The results of the test showed, beyond any reasonable doubt, that Space-Based Architecture can drive applications to linearly scalable performance and flat latency during scaling events. You can view the results of this test in the following presentation: Scale Out Your Spring Applications in 3 Simple Steps. More details can be found here.
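To make the "only the DAO changed" point concrete, here is a minimal Java sketch of the idea. It is not the code from the benchmark and it does not use the GigaSpaces API: the names `Order`, `OrderDao`, `CentralStoreOrderDao` and `PartitionedOrderDao` are hypothetical, and plain in-memory maps stand in for both the central database and the partitioned store. What it shows is that the business logic depends only on the DAO interface, so the storage model can be swapped underneath it.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class DaoSwapSketch {

    /** Hypothetical domain object, used only for this illustration. */
    static final class Order {
        final String id;
        final double amount;
        Order(String id, double amount) { this.id = id; this.amount = amount; }
    }

    /** The single seam that differs between the two deployments. */
    interface OrderDao {
        void save(Order order);
        Order findById(String id);
    }

    /** Stand-in for the tier-based DAO; a real one would talk to a central database. */
    static final class CentralStoreOrderDao implements OrderDao {
        private final Map<String, Order> table = new HashMap<>();
        public void save(Order order) { table.put(order.id, order); }
        public Order findById(String id) { return table.get(id); }
    }

    /** Stand-in for a scale-out DAO: each order is routed to a partition by key hash. */
    static final class PartitionedOrderDao implements OrderDao {
        private final List<Map<String, Order>> partitions = new ArrayList<>();
        PartitionedOrderDao(int partitionCount) {
            for (int i = 0; i < partitionCount; i++) partitions.add(new HashMap<>());
        }
        private Map<String, Order> partitionFor(String id) {
            // floorMod keeps the index non-negative even for negative hash codes
            return partitions.get(Math.floorMod(id.hashCode(), partitions.size()));
        }
        public void save(Order order) { partitionFor(order.id).put(order.id, order); }
        public Order findById(String id) { return partitionFor(id).get(id); }
    }

    /** Business logic: identical in both deployments, it only sees the interface. */
    static double totalFor(OrderDao dao, String... ids) {
        double total = 0;
        for (String id : ids) {
            Order order = dao.findById(id);
            if (order != null) total += order.amount;
        }
        return total;
    }

    /** Runs the same scenario against whichever DAO is passed in. */
    static double runScenario(OrderDao dao) {
        dao.save(new Order("a-1", 100.0));
        dao.save(new Order("b-2", 50.0));
        return totalFor(dao, "a-1", "b-2", "missing");
    }

    public static void main(String[] args) {
        System.out.println(runScenario(new CentralStoreOrderDao()));
        System.out.println(runScenario(new PartitionedOrderDao(4)));
    }
}
```

Both calls in `main` run exactly the same scenario code and produce the same total; only the object passed in differs. In a Spring application that choice would typically be made in the bean configuration rather than in code.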

On the data virtualization front, we introduced capabilities such as PaaS - Persistency as a Service and .Net support that addresses both performance and the completeness of our Java-based feature set, including support for SBA (not just caching), and pure .Net components, such as SessionState. We already have customers who have been in production for some time with a pure .Net system, built entirely on top of GigaSpaces.

Later in the year we introduced our partnership with Microsoft and our strategic integration effort with Excel. You can find more details on this exciting solution in this white paper. I also gave an online demonstration of it in the latest Grid Association event available online here.

We have also made significant progress with our C++ version, and have prospects running the private beta of our new C++ support. The new C++ release is going to revolutionize the simplicity with which C++ applications can be written in a distributed environment. A public beta will be available by the end of January.

What's interesting is that we were able to maintain native API support in each language and complete interoperability among Java, .Net and C++ using what we refer to as Portable Binary Serialization format (PBS). This enables interoperability without compromising on performance! We are even able to run C++ services as part of our SLA driven container, which means that you can deploy C++ services just by pointing your service to the dll or shared library. This means that the C++ service can also run embedded within our OpenSpaces Processing Unit.

We made a huge effort to ensure that all these new capabilities are simple to use. Our choice of Spring for our new development framework - OpenSpaces - paid a lot of dividends, as it enabled us to simplify not just how you wire the caching component to your application, but also how you develop, test and scale your entire application in a "cloud" environment. To Spring users, OpenSpaces will feel like a native extension of their existing development framework, where complex things such as transaction handling, partitioning, scaling, fail-over and deployment are abstracted from their development code. Here's an example of feedback we received from one new user who tried our platform, Srini Penchikala:

I have been trying GigaSpaces on my local computer to learn more about
the framework. It's a great implementation that supports simple design and
a scalable and performant solution. Integration with Spring framework makes
GigaSpaces even more powerful.

You and your team have created a great product.

Another effort that helped us achieve simplicity is our award-winning documentation wiki and its Quick Start Guide and webcasts, which provide a walkthrough for writing your first scalable application.

In 2007 we also launched several major community and open source initiatives. One such initiative was our open source development framework, OpenSpaces. Another one was the Start-Up Program (which complements our existing free community edition). We also announced our killer application contest, in which users can put their skills in distributed programming and scaling to the test and win a $10k grand prize. Some of the interesting submissions that have already been made include applications that will scale on Amazon EC2, scaling the Lucene search engine, scaling gaming applications, integrating OpenSpaces with PHP and other cool ideas. We're going to continue down this path during 2008, and open up more aspects of GigaSpaces, such as our Product Management processes. See below what one of our users had to say on our forum:

All I want to say is that the forum is awesome....so quick response. Thank you all involved with GigaSpaces for providing such nice support

We will also be officially announcing our new developer community portal, OpenSpaces.Org, by the end of January. It is a collaboration platform where users in the GigaSpaces community can share code, suggest and contribute extensions, and provide tools and examples.

Now, if this wasn't enough innovation, wait until you see what we have planned for cloud computing :) We started an effort to enable GigaSpaces deployment on the Amazon Elastic Compute Cloud (EC2). We're about to release our second update of this and the plan is to have it fully integrated with our product release in 2008.

When I look back I'm quite proud of how a company our size is able to get all of this done. Indeed, we wouldn't have been able to achieve all of this if it weren't for the tremendous effort of our R&D team. It started with some new strategic hires and continued with a major investment in making our development process extremely efficient by applying new agile development methodologies such as Scrum (see this from Guy Nirpaz, EVP of R&D, here).

We also made a substantial investment in automating our testing process by developing a distributed testing framework we call Tigris. This framework enables us to run tests on multiple platforms (Windows, Linux, Solaris), Java (1.4, 1.5, 1.6, Sun, JRockit, IBM), .Net, C++ (Windows 32/64, Linux 32/64) with different cluster sizes and topologies -- all fully automated. It has become so successful that our customers are actually pressing us to provide them with our testing framework and to consult with them on our methodologies.
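To give a feel for how quickly such a test matrix grows, here is a small Java sketch that enumerates every combination of a few dimensions. It is only an illustration and not how Tigris itself is implemented; the class name `TestMatrixSketch` and the cluster sizes are assumptions made up for the example, while the platform and Java version values come from the list above.

```java
import java.util.ArrayList;
import java.util.List;

class TestMatrixSketch {

    /** Cartesian product of the dimensions; each result is one test configuration. */
    static List<List<String>> combinations(List<List<String>> dimensions) {
        List<List<String>> result = new ArrayList<>();
        result.add(new ArrayList<>()); // start with a single empty configuration
        for (List<String> dimension : dimensions) {
            List<List<String>> next = new ArrayList<>();
            for (List<String> partial : result) {
                for (String value : dimension) {
                    // extend every partial configuration with every value of this dimension
                    List<String> extended = new ArrayList<>(partial);
                    extended.add(value);
                    next.add(extended);
                }
            }
            result = next;
        }
        return result;
    }

    public static void main(String[] args) {
        List<List<String>> dimensions = List.of(
            List.of("Windows", "Linux", "Solaris"),
            List.of("Java 1.4", "Java 1.5", "Java 1.6"),
            List.of("2 nodes", "4 nodes")); // hypothetical cluster sizes
        for (List<String> config : combinations(dimensions)) {
            System.out.println(String.join(" / ", config));
        }
    }
}
```

Even these three small dimensions yield 3 x 3 x 2 = 18 configurations, which is why running the full matrix by hand is not practical and automation pays off.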

2008 looks very promising. I see great opportunity for GigaSpaces to become the de-facto scale-out platform. I expect to see continued adoption from the Spring community, which is seeking a solution for scalability and high-availability without the complexity of J2EE. I also expect to see greater adoption in the Web 2.0 market, as we're adding support for dynamic languages, as well as integration with web frameworks. As I mentioned, we're going to continue our effort to support Amazon EC2 as part of our platform to enable a simple evaluation of our product, run distributed testing and run a full-blown on-demand production environment. I expect that the significant progress on the .Net and C++ fronts with our next release will enable us to further penetrate those areas as well.

I'm very excited by all this progress and by working closely with customers who are building their mission-critical applications on top of GigaSpaces. What's nice about it is that our customers never cease to amaze me with their appetite for achieving even more. They are moving so fast in building large clusters that one of our next goals is to allow simple, out-of-the-box large cluster deployment and massive object sizes. With all this excitement, I decided that this year I am finally going to spend some time and write a book...

Happy New Year!

December 20, 2007

Opinionated Architecture

I first heard the term "opinionated architecture" in Keith Donald's presentation during The Spring Experience. He used this term to describe the emergence of new web frameworks, such as Grails and Ruby on Rails. The term immediately caught my attention, as it describes in two words what we've been trying to achieve with OpenSpaces and Spring. Owen Taylor explains that very well in his recent blog post, Opinionated architecture - blue prints without the middleman. I particularly liked this testimonial:

As it happens, I have watched the growth of OpenSpaces over the last many months with the skeptical eye of a - er skeptic - not realizing that the purpose of the framework is not only buzzword compliance and the implementation of popular development practices, but in the context of GigaSpaces and what is offered to our prospects and customers, a truly opinionated framework that takes the vastly flexible GigaSpaces infrastructure and service implementations and distills *the* successful use of it all into enforceable use of the Spring programmatic approach and technologies, thus empowering the neophyte and thought-leader alike with the unmistakably opinionated path to success.
The beauty of the switch to the prescriptive and formulaic from the free and often flailing style, is the stunningly rapid improvement in adoption rate, early prototyping successes, and sustained production-systems that continue to reinforce my growing certainty that we got our bit of it right this time.
