In the past year, Intel issued a series of powerful chips under the new Nehalem microarchitecture, with large numbers of cores and extensive memory capacity. This new class of chips is part of a bigger Intel initiative referred to as Tera-Scale Computing. Cisco has released its Unified Computing System (UCS), equipped with unique extended memory and a high-speed network within the box, which is specifically geared to take advantage of this type of CPU architecture.
This new class of hardware has the potential to revolutionize the IT landscape as we know it.
In Part-I of this post, I want to focus primarily on the potential implications for application architecture, and more specifically for the application platform landscape.
In Part-II of this post I will use the new GigaSpaces offering on top of Cisco UCS as a reference implementation of the concept from Part-I.
What is Tera-Scale Computing?
Quoting Intel’s Tera-Scale Computing Research Vision:
By scaling multi-core architectures to 10s to 100s of cores and embracing a shift to parallel programming, we aim to improve performance and increase energy-efficiency.
"Tera" means 1 trillion, or 1,000,000,000,000. Our vision is to create platforms capable of performing trillions of calculations per second (teraflops) on trillions of bytes of data (terabytes).
To put it in simple words, Tera-scale is a commoditized version of the mainframe. With this new technology, we could easily create supercomputers at an affordable price.
What are the potential implications on current deployments?
One of the more trivial and probably more common use cases that comes with the introduction of these new powerful machines is the ability to consolidate more applications and virtual machines onto less hardware. In this post, however, I'll focus mostly on the implications for large-scale distributed applications, which are becoming more popular as the demand for scale continues to grow and cloud-based deployments become more common and affordable.
Is it the end of distributed systems?
For anyone who deals with large-scale deployments, it might feel at first glance that we're back in the mainframe days. However, a closer look yields a completely different picture.
Quoting from my previous post Scale out vs Scale up:
The increased hardware capacity will enable us to manage more data in a shorter amount of time. In addition, the demand for more reliability through redundancy, as well as the need for better utilization through the sharing of resources driven by SaaS/Cloud environments, will force us even more than before towards scale-out and distributed architecture.
So, in essence, what we can expect to see is an evolution where the predominant architecture will be scale-out, but the resources in that architecture will get bigger and bigger, making it possible to manage more data without increasing the complexity of managing it. To maximize the utilization of these bigger resources, we will have to combine a scale-up approach as well.
More specifically, the implications can be broken into two categories:
- Density – we can serve the same capacity and workload of our existing applications on significantly less hardware. This yields an immediate benefit in operational costs: reduced maintenance, cooling and power, space, cabling, etc.
- Capacity – we could get tens or hundreds of times the capacity and processing power from a cluster of the same size and cost as today's.
The second point is where I see the greater potential. Imagine running your entire application completely in-memory, i.e. the disk becomes the new tape. The fact that we can now hold terabytes of data in memory with only a few boxes means we can do things we couldn't have dreamt of before: complex analytics that used to take hours can run in real time, we can run new classes of complex fraud detection algorithms, and we can recognize malicious attacks before they happen. We could correlate customer trends and increase the conversion rate of visitors to our e-commerce sites. As a telco provider, we could serve multimedia and social networking activities, process and render videos and images quickly, and provide more targeted and personalized commercials. We could build online gaming applications that run 3D animation at a fraction of the cost. And the list goes on and on…
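To make the in-memory idea a bit more concrete, here is a minimal, self-contained sketch of this style of processing: an aggregation and a filter run in parallel across cores over data held entirely in memory. The Transaction type and the tiny data set are invented for the example; a tera-scale deployment would obviously partition terabytes of such data across many cores and machines.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class InMemoryAnalytics {

    // Hypothetical record representing one row of transaction data held in memory.
    record Transaction(String customerId, double amount, boolean flagged) {}

    public static void main(String[] args) {
        // In a tera-scale setup this collection would span terabytes across many nodes;
        // here it is just a tiny in-memory list to keep the sketch runnable.
        List<Transaction> transactions = List.of(
                new Transaction("c1", 120.0, false),
                new Transaction("c2", 75.5, true),
                new Transaction("c1", 300.0, false));

        // Parallel aggregation: total spend per customer, computed across all available cores.
        Map<String, Double> spendPerCustomer = transactions.parallelStream()
                .collect(Collectors.groupingBy(
                        Transaction::customerId,
                        Collectors.summingDouble(Transaction::amount)));

        // A simple fraud-style filter run in parallel over the same in-memory data.
        long flaggedCount = transactions.parallelStream()
                .filter(Transaction::flagged)
                .count();

        System.out.println(spendPerCustomer + ", flagged=" + flaggedCount);
    }
}
```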
The Challenge
As with any new technology, exploiting its full potential involves change. The main challenge is with existing applications. Existing (large-scale) applications were not built to utilize the new multi-core, network, and memory capacity, simply because they were written at a time when memory was expensive and available only in small capacities of a few GB at most, the network was considered a bottleneck, and multi-core didn't exist or was just beginning to emerge. In addition, most of these applications were designed to run against a centralized database or some sort of centralized storage, which makes it even harder to take advantage of the power behind this new class of machines.
So while this new class of hardware holds tremendous promise, there is a small caveat that is worth noting in Intel's vision:
By scaling multi-core architectures from tens to hundreds of cores and embracing a shift to parallel programming...
In other words, Intel recognizes the fact that to take full advantage of this new class of hardware, we need to embrace a new architecture that better lends itself to parallel programming.
Tera-Scale Reference Architecture
The diagram below, taken from the Intel reference architecture, points to the fact that in order to take advantage of the new underlying multi-core architecture we need a specialized software platform that is thread-aware and can abstract the application from the details of the underlying infrastructure.
In other words, we rely on the application platform to become the glue that enables us to migrate our existing applications from the current centralized model to a more parallel and decentralized model. To make the transition simple, the platform will need to continue to support the existing interfaces and APIs, but replace them with more scalable implementations. At the same time, the platform needs to expose new capabilities that enable the development of new services designed with parallel distribution in mind.
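One way to picture "keep the interface, swap the implementation" is a plain java.util.Map that the platform backs with a partitioned, in-memory store. The PartitionedInMemoryMap below is a hypothetical, single-process stand-in, not a real product API; it only hints at how keys could be sharded so that each partition can later live on its own core or node.

```java
import java.util.AbstractMap;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.stream.Collectors;

// Hypothetical example: the application keeps coding against java.util.Map,
// while the platform swaps in a partitioned, in-memory implementation underneath.
public class PartitionedInMemoryMap<K, V> extends AbstractMap<K, V> {

    private final ConcurrentHashMap<K, V>[] partitions;

    @SuppressWarnings("unchecked")
    public PartitionedInMemoryMap(int partitionCount) {
        partitions = new ConcurrentHashMap[partitionCount];
        for (int i = 0; i < partitionCount; i++) {
            partitions[i] = new ConcurrentHashMap<>();
        }
    }

    // Route each key to a partition; in a real platform a partition could be a
    // core-affine local segment or a remote node, with the application code unchanged.
    private ConcurrentHashMap<K, V> partitionFor(Object key) {
        return partitions[Math.floorMod(key.hashCode(), partitions.length)];
    }

    @Override
    public V put(K key, V value) {
        return partitionFor(key).put(key, value);
    }

    @Override
    public V get(Object key) {
        return partitionFor(key).get(key);
    }

    @Override
    public Set<Map.Entry<K, V>> entrySet() {
        // Merge the partitions into one view; fine for a sketch, not for terabytes.
        return java.util.Arrays.stream(partitions)
                .flatMap(p -> p.entrySet().stream())
                .collect(Collectors.toSet());
    }

    public static void main(String[] args) {
        Map<String, String> cache = new PartitionedInMemoryMap<>(8); // same Map API as before
        cache.put("user:1", "alice");
        System.out.println(cache.get("user:1"));
    }
}
```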
There are two fundamental requirements of such a platform:
Scalable Software Infrastructure
- Highly parallel – The platform itself needs to be designed with extreme parallelism in its internal engine as well as expose parallel programming and event-driven semantics to the application.
- In-Memory – Memory is the only physical device that can manage highly concurrent transactions. To achieve maximum utilization the platform must not rely on disk or devices that do not lend themselves to concurrent processing.
- Scale-up and Out – The platform needs to support a seamless transition between scale-up within a machine and scale-out between machines without changing the application code (see the sketch after this list).
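A minimal sketch of the scale-up/scale-out point, assuming a hypothetical EventRouter interface: the application only ever calls route(key, event). The LocalThreadRouter maps partitions to worker threads inside a single machine (scale-up); a networked router that forwards events to remote nodes (scale-out) could be swapped in behind the same interface without touching the application code.

```java
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Consumer;

// Hypothetical sketch: the application submits events against one routing interface;
// whether a partition is a local worker thread (scale-up) or a remote node (scale-out)
// is an implementation detail hidden from the application code.
public class PartitionedProcessing {

    interface EventRouter {
        void route(String key, String event);
    }

    // Scale-up flavor: partitions are worker threads inside one big machine.
    static class LocalThreadRouter implements EventRouter {
        private final List<ExecutorService> partitions;
        private final Consumer<String> handler;

        LocalThreadRouter(int partitionCount, Consumer<String> handler) {
            this.partitions = java.util.stream.IntStream.range(0, partitionCount)
                    .mapToObj(i -> Executors.newSingleThreadExecutor())
                    .toList();
            this.handler = handler;
        }

        @Override
        public void route(String key, String event) {
            int idx = Math.floorMod(key.hashCode(), partitions.size());
            // Events with the same key always land on the same partition,
            // so per-key ordering is preserved while partitions run in parallel.
            partitions.get(idx).submit(() -> handler.accept(event));
        }
    }

    // A remote router would implement the same interface but forward the event
    // over the network to the node that owns the partition; the caller cannot tell.

    public static void main(String[] args) {
        LocalThreadRouter router = new LocalThreadRouter(4, e -> System.out.println("processed " + e));
        router.route("order-42", "payment-received");
        router.route("order-42", "shipped");
        router.partitions.forEach(ExecutorService::shutdown); // let the demo JVM exit
    }
}
```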
Simple – Pre-Integrated
The complexity often associated with tuning software and hardware together is going to increase exponentially in this highly parallel environment. Therefore, we can't afford to think of the software platform and the hardware infrastructure as two separate things. Instead, it must be the responsibility of the platform to provide a pre-integrated and tuned environment, which includes:
- Hardware and Software tightly integrated – As the application is exposed only to the platform and not to the operating system, a tightly integrated platform has much more room than the application itself to apply proprietary hardware optimizations. It can integrate with lower-level pieces of the hardware, such as tuning network routing, or moving the execution environment to the right core to minimize NUMA effects and take full advantage of the CPU cache.
- Zero configuration – The platform should come pre-tuned and expose a simple setup that can be applied to the entire platform in a consistent way.
- Fully automated deployment – Deploying even the most complex application should involve a single deploy command that in turn wires all the pieces together programmatically, without exposing any manual step to the end user. This also includes the ability to deal with post-deployment events, such as recovery processes and auto-scaling. It implies that the platform needs built-in Dev-Ops APIs that are baked into its core and expose full control over the infrastructure services (memory, machine, CPU, network) and the middleware services (data, messaging and processing, web, etc.). A hypothetical sketch of what such an API might look like appears after this list.
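To illustrate the "single deploy command" idea, here is a hypothetical sketch of what such a Dev-Ops style API could look like from the application owner's side. None of the names (DeploymentDescriptor, ScalingRule, Platform.deploy) come from a real product; they simply show the entire deployment, including post-deployment scaling behavior, being expressed in one place.

```java
// Hypothetical Dev-Ops API sketch -- the DeploymentDescriptor, ScalingRule and
// Platform names are invented for illustration and do not refer to a real product.
public class DeployExample {

    record ScalingRule(String metric, double threshold, int stepInstances) {}

    record DeploymentDescriptor(String application, int initialInstances,
                                long memoryPerInstanceMb, ScalingRule scalingRule) {}

    // Stand-in for the platform: in a real pre-integrated platform this single call
    // would provision machines, wire data/messaging/web services and start monitoring.
    static class Platform {
        static void deploy(DeploymentDescriptor descriptor) {
            System.out.printf("deploying %s with %d instances, %d MB each%n",
                    descriptor.application(), descriptor.initialInstances(),
                    descriptor.memoryPerInstanceMb());
            System.out.printf("auto-scale when %s > %.2f by +%d instances%n",
                    descriptor.scalingRule().metric(),
                    descriptor.scalingRule().threshold(),
                    descriptor.scalingRule().stepInstances());
        }
    }

    public static void main(String[] args) {
        // The entire deployment, including post-deployment scaling behavior,
        // is expressed once and handed to the platform -- no manual steps.
        Platform.deploy(new DeploymentDescriptor(
                "fraud-detection",
                4,
                8_192,
                new ScalingRule("cpu-utilization", 0.75, 2)));
    }
}
```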
In Part-II of this post, I discuss the specific implementation of this concept in our new GigaSpaces Tera-Scale solution for Cisco UCS.