Steven Robbins published an interesting article on InfoQ titled "RAM is the new disk".
In the comment thread, Steven Robbins quoted Tim Bray and others comparing file system performance to memory:
Memory is the new disk! With disk speeds growing very slowly and memory chip capacities growing exponentially, in-memory software architectures offer the prospect of orders-of-magnitude improvements in the performance of all kinds of data-intensive applications. Small (1U, 2U) rack-mounted servers with a terabyte or more of memory will be available soon, and will change how we think about the balance between memory and disk in server architectures....
It raises the following questions:
What if the disk were RAM-based? Does that mean that all we need to do is replace the current disks with RAM technology to gain speed? The title of the article leads people to think along those lines.
My own take:
It's not just the speed of memory compared to disk that makes the difference. It's not even the extra benefit of collocating CPU and memory. What's really important is that disk is a sequential storage medium, designed primarily to store a stream of bytes, not tables of data. That means that if you want to store data objects, you need to serialize them into bytes and map them to the sectors in the file system that hold those bytes. Maintaining an index to this data is a relatively expensive operation, since every additional index is stored as a copy of the original data and there is no real option to access data by reference. If you think about it, an existing RDBMS is basically a mapping layer between a tabular data representation and sequential storage. A large part of existing database implementations is devoted to addressing the impedance mismatch between the two representation models. All this complexity doesn't really exist when we're dealing with memory. That means that if we take existing databases and run them on memory-based devices, we're basically going to force the limitations of sequential storage representations onto memory.
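To make the serialization point concrete, here is a minimal Python sketch of storing objects in a byte stream. It is only an illustration under stated assumptions: an in-memory `io.BytesIO` buffer stands in for the disk, records are encoded as length-prefixed JSON, and the names `append_record`, `read_record` and `index_by_id` are hypothetical, not taken from any real database.

```python
import io
import json

def append_record(store, record):
    """Serialize a record to bytes, append it, and return its byte offset."""
    payload = json.dumps(record).encode("utf-8")
    store.seek(0, io.SEEK_END)
    offset = store.tell()
    # A 4-byte length prefix tells the reader how many bytes to decode.
    store.write(len(payload).to_bytes(4, "big"))
    store.write(payload)
    return offset

def read_record(store, offset):
    """Seek to an offset and deserialize the record stored there."""
    store.seek(offset)
    length = int.from_bytes(store.read(4), "big")
    return json.loads(store.read(length).decode("utf-8"))

# Hypothetical stand-in for a disk file: a flat stream of bytes.
store = io.BytesIO()

# The index cannot point at an object, only at a position in the stream.
index_by_id = {}
for rec in [{"id": 1, "name": "alice"}, {"id": 2, "name": "bob"}]:
    index_by_id[rec["id"]] = append_record(store, rec)

# Every lookup pays for a seek plus a deserialization and yields a new copy.
first = read_record(store, index_by_id[2])
second = read_record(store, index_by_id[2])
```

Here `first` and `second` hold equal data but are two separate objects, which is the mapping overhead described above: the application never touches the stored object itself, only a freshly decoded copy of it.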
To exploit the real value of memory-based resources, we need a different approach, and implementations that assume data can be accessed by reference: that objects can be accessed directly from our application, in our native application domain, without a complex mapping layer.
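As a rough illustration of access by reference, the Python sketch below keeps plain objects in memory and builds two indexes over them. The `Customer` class, its fields, and the index names are assumptions made up for this example.

```python
from dataclasses import dataclass

@dataclass
class Customer:
    # Hypothetical domain object used only for this illustration.
    id: int
    name: str
    city: str

customers = [
    Customer(1, "alice", "Paris"),
    Customer(2, "bob", "Oslo"),
]

# Each index holds references to the same objects, not copies of the data.
by_id = {c.id: c for c in customers}
by_name = {c.name: c for c in customers}

# An update made through one index is visible through every other index,
# with no serialization step and no index maintenance.
by_id[2].city = "Rome"
```

After the update, `by_name["bob"].city` is also `"Rome"`, because both indexes point at the same object. Adding another index costs one reference per entry rather than another copy of the data.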
At this point I'd like to end with Tim's last remark:
Disk will become the new tape, and will be used in the same way, as a sequential storage medium (streaming from disk is reasonably fast) rather than as a random-access medium (very slow). Tons of opportunities there to develop new products that can offer 10x-100x performance improvements over the existing ones.