File systems are the software that organises and manages data on storage devices. Commonly, desktop users interact with files, directories (folders) and volumes (drives). This is the way the file system presents the information to the user. It used to be the case that the mapping between the presentation and the storage of the information was direct. Modern file systems are complicated beasts: they feature several abstraction mechanisms that maintain a fixed on-disk file format while allowing for improvements in the logical operations that decide where data is stored. They are modular, in that they can accept plugins that provide additional functionality, such as compression or encryption of data. Frequently they optimise reads from and writes to the storage medium so as to attain higher performance. Barring the promises of holographic storage [1], storage technology has remained largely the same over the past three decades. Sadly, for most computer users, the typical file system found on their computer has not progressed much either.
A few days ago, I watched Hans Reiser’s techtalk on video.google.com. He is not the most charismatic, eloquent or gripping speaker, yet his talk reminded me of my younger years at university, when I spent quite some time playing with (read: hacking on) file systems. Of course, unlike his, my time was arguably spent ‘in vain’, as I was merely getting acquainted with the technology and exploring. A main theme around which his speech, and Reiser4 itself, revolves is the unification of namespaces in the modern operating system. This refers to the ‘integration’ of different file formats and storage layers (database, file, or other), including metadata, under a common namespace, allowing a single tool to access and control such information, in more or less the same way that a web search engine can access information on many sites, all of different ‘type’ and ‘content’. As Reiser himself states, it is a utopian idea, although he believes that keeping this goal as the driving idea behind his work is enough to warrant a significant outcome. His talk goes on with a technical review of how he and his team of (mostly) Russian programmers effectively reworked the internals of the Reiser file system for version 4, to overcome some of the performance and slack problems that trouble most file systems. Some of his work is reminiscent of much earlier work by IBM on its mainframe file systems; it is astonishing how many good things are simply ignored in the ‘innovation-lacking’ PC world.
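For readers unfamiliar with the term, slack is the space lost when a file does not exactly fill the last block allocated to it. Here is a minimal sketch in Python of how it adds up, assuming a naive allocator that hands out whole fixed-size blocks. The function name `slack_bytes`, the 4096-byte block size and the sample file sizes are my own illustrative choices, not anything taken from Reiser4.

```python
def slack_bytes(file_size: int, block_size: int = 4096) -> int:
    """Bytes wasted when a file is stored in whole fixed-size blocks.

    Assumes a naive allocator: every file occupies an integer number
    of blocks, and the unused tail of the last block is lost.
    """
    if file_size < 0 or block_size <= 0:
        raise ValueError("file_size must be >= 0 and block_size > 0")
    # Ceiling division using integers only (no floating point rounding).
    blocks_needed = -(-file_size // block_size)
    return blocks_needed * block_size - file_size


def total_slack(file_sizes, block_size: int = 4096) -> int:
    """Sum of the slack over a collection of file sizes (in bytes)."""
    return sum(slack_bytes(size, block_size) for size in file_sizes)


if __name__ == "__main__":
    # Hypothetical file sizes in bytes, chosen only for illustration.
    sizes = [100, 5000, 4096, 12289]
    for size in sizes:
        print(f"{size:>6} bytes -> {slack_bytes(size):>5} bytes of slack")
    print("total slack:", total_slack(sizes))
```

Small files suffer the most under this scheme: a 100-byte file still occupies a full block. An allocation algorithm that reduces slack has to find a way to store such tails more compactly without giving up too much speed.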
While Hans Reiser’s approach is most definitely interesting (I found his talk on re-engineering the basic allocation algorithm of the file system compelling, as it minimises slack while maintaining very high performance, and I also appreciated his modular design), there has been a lot of interesting work on a now-dead niche operating system, work that remained beyond the realm of everyday use for many years, until very recently. This is the work of Dominic Giampaolo. Dominic is an American programmer who, after a job at Be Inc., where he designed BeFS, the BeOS file system (arguably, for quite a while, one of the most advanced and innovative personal computer file systems around), and various short stints at other technology companies (QNX, Google and SGI all have a small part in his career), ended up at Apple Computer. There he brought much of his file system expertise and insight to an effort that, to this day, has resulted in, first, the journalling version of HFS+ (HFS+ Journaling), which ‘saved’ Apple some face by breathing some life into the ageing HFS+, and, more recently, Spotlight. Giampaolo’s work on BeFS has been documented and published in a book, Practical File System Design, which you can now legally find online, for free, in PDF format.
If you are into file system design it is an interesting read.
His work on what users see as Spotlight, together with part of his work on BeFS, is more or less what Microsoft had promised but will not manage to ship with Vista when it arrives (hopefully) early next year: WinFS, the metadata-enabled file system for Windows. Apple’s first foray into the world of rich metadata management started in April 2005, when the (still current) version of OS X (Tiger) shipped. Future versions of the OS are expected to depend largely on this new functionality.
There is a lot to be read about file systems, and the PC, widespread as it is, has long had some of the most primitive of them. Arguably this is due to Microsoft’s complete dominance of the market, which slowed the evolution of file systems to a crawl, as well as to the rapid advance in processing and storage technology, which made the inefficiencies of existing file system designs easy to ignore. It might also be argued that computer use has not been sophisticated enough to need more advanced file systems. In a way, most computer users are still using antiquated technology: NTFS, the (Windows) New Technology File System, is most definitely the most backwards journalling file system out there today and, sadly, it can be found on the majority of computers worldwide. While it has some interesting traits (B+trees for indexing, an extensible metadata-based file structure), a token perhaps of the initial designers’ insight and skill, its development since has been much less interesting.
File systems may seem boring, but they are a fundamental part of managing information on computers. While a lot of work has taken place in other layers of the OS, the development of more advanced file systems has been very slow. Along with the evolution of user programs (usability, UI) and indexing (agents, AI, data mining, data fusion), advanced file systems will enable a much friendlier and richer interaction with, and organisation of, information: a better computing experience. This will become even more desirable with the ever-increasing amount of information stored on computers.
For the (very few I guess) people interested in this, I am embedding Hans Reiser’s techtalk below (courtesy of Google).