LinuxWorld

Kernel space: Chunkfs and reiser4

As disk capacities continue to grow faster than data transfer rates improve, we will need better filesystem software to head off an fsck crisis.

One of the fundamental problems facing filesystem developers is that, while disks are getting both larger and faster, the rate at which they are growing exceeds the rate at which they are speeding up. As a result, the time required to read an entire disk is growing. There is little joy in waiting for a filesystem checker to do its thing during a system reboot, so the prospect of ever-longer fsck delays is understandably lacking in appeal. Unfortunately, that is the direction in which things are going. Journaling filesystems can help avoid fsck, but only in situations where the filesystem has not suffered any sort of corruption.
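A little arithmetic shows why the trend matters. The sketch below uses made-up capacity and throughput figures (illustrative assumptions, not numbers from any real drive) to compute the minimum time needed just to read every byte of a disk once. Full-disk read time is only a rough proxy for fsck time, since a checker mostly reads metadata, but the amount of metadata tends to grow with the size of the filesystem too.

```python
MB = 10**6
GB = 10**9
TB = 10**12


def full_read_seconds(capacity_bytes: int, throughput_bytes_per_s: int) -> float:
    """Lower bound on the time to read every byte of a disk sequentially."""
    return capacity_bytes / throughput_bytes_per_s


# Hypothetical drives: capacity grows 10x while sequential throughput only doubles.
old = full_read_seconds(100 * GB, 50 * MB)   # 2000 s
new = full_read_seconds(1 * TB, 100 * MB)    # 10000 s
print(f"old disk: {old / 60:.0f} min, new disk: {new / 60:.0f} min")
# prints: old disk: 33 min, new disk: 167 min
```

With these assumed numbers, a tenfold gain in capacity paired with a twofold gain in throughput makes a full read five times slower.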

Given that filesystem checks are something we have to deal with, it's worth thinking about how we might make them faster in the era of terabyte disks. One longstanding idea for improving the situation was recently posted in the form of chunkfs, "fs fission for faster fsck." The core idea is to take a filesystem and split it into several independent filesystems, each of which maintains its own clean/dirty state. Should things go wrong, only those sub-filesystems which were active at the time of failure need to be checked.
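To make the core idea concrete, here is a toy model of per-chunk clean/dirty tracking. The class and method names (`Chunk`, `ChunkedFs`, `begin_write()`, `end_write()`, `chunks_to_check()`) are hypothetical and do not come from the chunkfs code; real chunkfs keeps this state on disk, whereas this sketch only keeps an in-memory flag per chunk.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Chunk:
    chunk_id: int
    dirty: bool = False  # True while the chunk may hold incomplete updates


@dataclass
class ChunkedFs:
    chunks: list[Chunk] = field(default_factory=list)

    def begin_write(self, chunk_id: int) -> None:
        # Only the chunk being modified is marked dirty; the rest stay clean.
        self.chunks[chunk_id].dirty = True

    def end_write(self, chunk_id: int) -> None:
        # Once the chunk's updates are safely on disk, it is clean again.
        self.chunks[chunk_id].dirty = False

    def chunks_to_check(self) -> list[int]:
        # After a crash, only chunks still marked dirty need an fsck pass.
        return [c.chunk_id for c in self.chunks if c.dirty]


fs = ChunkedFs([Chunk(i) for i in range(8)])
fs.begin_write(2)
fs.begin_write(5)
fs.end_write(5)
# Simulated crash here: chunk 2 was still being written.
print(fs.chunks_to_check())  # prints: [2]
```

In this model, a crash during a write to chunk 2 leaves the other seven chunks marked clean, so the checker can skip them entirely.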

Like many experimental filesystem developments, chunkfs is built upon ext2. Internally, it is a series of separate ext2 filesystems that appear as a single filesystem to the layers above it. Each chunk can be maintained independently by the filesystem code, but the individual chunks are not visible outside of the filesystem. The idea is relatively simple, though, as always, there are a few pesky details to work out.

One is that inode numbers in the larger chunkfs filesystem must be unique. Each chunk, however, maintains its own list of inodes starting with number one, so inode numbers will be reused from one chunk to the next. Chunkfs makes these numbers unique by putting the chunk number in the upper eight bits of every inode number. As a result, there is a maximum of 256 chunks in any chunkfs filesystem.
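The inode-numbering scheme amounts to simple bit packing. The sketch below assumes 32-bit inode numbers (the width ext2 uses), which leaves 24 bits for the per-chunk inode number; the article only specifies the upper eight bits, so the overall width and the helper names `make_ino()` and `split_ino()` are assumptions for illustration.

```python
from __future__ import annotations

INO_BITS = 32                       # assumption: 32-bit inode numbers, as in ext2
CHUNK_BITS = 8                      # upper eight bits hold the chunk number
LOCAL_BITS = INO_BITS - CHUNK_BITS  # 24 bits remain for the per-chunk inode number
LOCAL_MASK = (1 << LOCAL_BITS) - 1
MAX_CHUNKS = 1 << CHUNK_BITS        # 256


def make_ino(chunk: int, local_ino: int) -> int:
    """Combine a chunk number and a per-chunk inode number into one unique number."""
    if not 0 <= chunk < MAX_CHUNKS:
        raise ValueError("chunk number does not fit in the upper 8 bits")
    if not 1 <= local_ino <= LOCAL_MASK:  # per-chunk numbering starts at one
        raise ValueError("per-chunk inode number out of range")
    return (chunk << LOCAL_BITS) | local_ino


def split_ino(ino: int) -> tuple[int, int]:
    """Recover (chunk, per-chunk inode number) from a chunkfs-style inode number."""
    return ino >> LOCAL_BITS, ino & LOCAL_MASK


# Inode 12 exists in both chunk 0 and chunk 3, but the combined numbers differ.
print(make_ino(0, 12), make_ino(3, 12))  # prints: 12 50331660
print(split_ino(50331660))               # prints: (3, 12)
```

Eight bits give exactly 256 possible chunk numbers, which is where the 256-chunk ceiling comes from; widening the chunk field would raise that limit at the cost of fewer inodes per chunk, unless the inode number itself grows.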

A trickier problem comes about when a file grows. The filesystem will try to allocate additional file blocks in the chunk where the file was originally created. Should that chunk fill up, however, something else needs to happen; it would not be good for the filesystem to return "no space" errors when free space does exist in other chunks. The answer here is the creation of a "continuation inode." These inodes track the allocation of blocks in a different chunk; they look much like files in their own right, but they are part of a larger array of block allocations. The "real" inode for a given file can have pointers to up to four continuation inodes in different chunks; if more are needed, each continuation inode can, itself, point to another four continuations. Thus, continuation inodes can be chained to create files of arbitrary length.
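The allocation policy can be modeled as a small tree with a fan-out of four. This is a hypothetical sketch, not the chunkfs allocator: the article does not say which chunk receives a new continuation inode or where in the chain it is attached, so this version simply picks the lowest-numbered chunk with free space and hangs the new continuation off the first inode (in breadth-first order) that still has a free pointer slot. `Inode`, `Volume`, and `append_block()` are invented names, and block numbers are just a running counter.

```python
from __future__ import annotations

import errno
from collections import deque
from dataclasses import dataclass, field
from typing import Iterator

FANOUT = 4  # per the article: up to four continuation pointers per inode


@dataclass
class Inode:
    chunk: int                                                # chunk holding this inode
    blocks: list[int] = field(default_factory=list)           # blocks allocated in that chunk
    continuations: list[Inode] = field(default_factory=list)  # at most FANOUT children


class Volume:
    """Toy chunked volume: only tracks a free-block count per chunk (hypothetical)."""

    def __init__(self, free_per_chunk: list[int]) -> None:
        self.free = list(free_per_chunk)
        self.next_block = 0  # running counter standing in for real block numbers

    @staticmethod
    def _walk(root: Inode) -> Iterator[Inode]:
        # Breadth-first walk over the real inode and its continuation tree.
        queue = deque([root])
        while queue:
            node = queue.popleft()
            yield node
            queue.extend(node.continuations)

    def _add(self, node: Inode) -> int:
        # Take one free block from the node's chunk and record it in the node.
        self.free[node.chunk] -= 1
        block = self.next_block
        self.next_block += 1
        node.blocks.append(block)
        return block

    def append_block(self, root: Inode) -> int:
        # 1. Use any inode in the tree whose chunk still has space; the real
        #    inode comes first, so the home chunk is preferred.
        for node in self._walk(root):
            if self.free[node.chunk] > 0:
                return self._add(node)
        # 2. Otherwise pick a chunk with free space for a new continuation inode.
        target = next((c for c, n in enumerate(self.free) if n > 0), None)
        if target is None:
            raise OSError(errno.ENOSPC, "no free blocks in any chunk")
        # 3. Attach it to the first inode with a free pointer slot. Once the real
        #    inode has FANOUT continuations, new ones go one level deeper.
        parent = next(n for n in self._walk(root) if len(n.continuations) < FANOUT)
        cont = Inode(chunk=target)
        parent.continuations.append(cont)
        return self._add(cont)


# Usage: home chunk 0 holds two blocks; later blocks spill into chunks 1, 2 and 4.
vol = Volume([2, 1, 1, 0, 3])
f = Inode(chunk=0)
for _ in range(7):
    vol.append_block(f)
print(f.blocks, [c.chunk for c in f.continuations])  # prints: [0, 1] [1, 2, 4]
```

Because every inode, real or continuation, holds at most four pointers, the tree can keep growing downward, which is what allows files of arbitrary length. "No space" is reported only when every chunk is full.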

Comments (3)

Kernel space: Chunkfs and reiser4 (Anonymous, June 8, 2007, 10:04 am)
I am glad to hear this (though I have been hearing this for a while). I have been using reiserfs (3) for at least 5 years (I use linux almost exclusively at home...

More than 256 chunks (Valerie Henson, July 31, 2007, 11:19 am)
256 chunks is an accidental limitation of Amit's implementation. The implementation I'm working on has 64-bit chunk ids. In general, this is the case for most...

UnionFS? (Anonymous, June 11, 2007, 3:29 pm)
Linux-based NAS vendor Agami Systems uses this general notion for their NAS system, except their chunks use XFS for the base filesystem rather than EXT (there are...
