choosing a file system

Rob Mueller robm at fastmail.fm
Fri Jan 2 00:19:52 EST 2009


> There are /lots/ of (comparative) tests done: The most recent I could
> find with a quick Google is here:
>
> http://www.phoronix.com/scan.php?page=article&item=ext4_benchmarks

Almost every filesystem benchmark I've ever seen is effectively useless for 
comparing what's best for a cyrus mail server. They try to show the 
maximums/minimums of a bunch of discrete operation types (e.g. streaming IO, 
creating files, deleting files, lots of small random reads, etc.) running on 
near-empty volumes.

What none of them show is what happens to a filesystem when it's holding a 
real-world cyrus mail spool/index:

* 100,000's of directories
* 10,000,000's of files
* 1-1,000,000 files per directory
* files continuously being created and deleted (emails)
* data being appended to existing files (cyrus.* files)
* lots of fsync calls all over the place (every lmtp append has multiple 
fsyncs, as well as various imap actions)
* run over the course of multiple years of continuous operations
* with a filesystem that's 60-90% full depending on your usage levels

There are serious fragmentation issues going on here that no benchmark even 
comes close to simulating.
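
To make the list above concrete, here is a rough Python sketch of the kind 
of churn I mean: small files created and deleted at random across many 
directories, an index file in each directory that keeps getting appended 
to, and an fsync after every write. It's only an illustration, not a 
benchmark. The function name age_spool, the ".msg" file naming, the sizes 
and the delete ratio are all made up for the example, and "cyrus.index" is 
just standing in for the cyrus.* files. To get anywhere near the real thing 
you'd have to run something like this for a very long time on a volume 
that's already mostly full.

```python
import os
import random


def age_spool(root, n_dirs=20, n_ops=2000, delete_ratio=0.45, seed=0):
    """Churn a directory tree with a mail-spool-like create/append/delete mix.

    All parameters are illustrative assumptions, not measured values.
    """
    rng = random.Random(seed)
    dirs = []
    for i in range(n_dirs):
        d = os.path.join(root, f"user{i:04d}")
        os.makedirs(d, exist_ok=True)
        dirs.append(d)

    live = []  # paths of message files currently on disk
    created = deleted = 0
    for op in range(n_ops):
        if live and rng.random() < delete_ratio:
            # "Expunge": remove a randomly chosen existing message file.
            idx = rng.randrange(len(live))
            live[idx], live[-1] = live[-1], live[idx]
            path = live.pop()
            d = os.path.dirname(path)
            os.unlink(path)
            deleted += 1
        else:
            # "Delivery": write one small file and fsync it.
            d = rng.choice(dirs)
            path = os.path.join(d, f"{op}.msg")
            with open(path, "wb") as f:
                f.write(os.urandom(rng.randint(500, 20000)))
                f.flush()
                os.fsync(f.fileno())
            live.append(path)
            created += 1
        # Either way, append a record to that directory's index file
        # and fsync it as well.
        with open(os.path.join(d, "cyrus.index"), "ab") as f:
            f.write(b"x" * 64)
            f.flush()
            os.fsync(f.fileno())
    return {"created": created, "deleted": deleted, "live": len(live)}
```

Point it at a scratch directory on the filesystem under test, e.g. 
age_spool("/mnt/test/spool", n_dirs=1000, n_ops=1000000), and watch how IO 
behaviour changes as the run goes on rather than looking at a single 
number at the end.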

Now from our experience, I can tell you that ext3 really does poorly on this 
workload compared to reiserfs. We had two identical servers, one all 
reiserfs and one all ext3. The ext3 one started out ok, but over the course 
of a few weeks/months, it got worse and worse and was eventually 
being completely crushed by IO load. The machine running reiserfs had no 
problems at all, even though it had more users on it and was growing 
at the same rate as the other machine.

Yes, we did have directory indexing enabled (we had it turned on from the 
start), and we tried different data modes like data=writeback and 
data=ordered, but that didn't help either.
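
If you're comparing data modes yourself, it's worth confirming which one a 
volume is actually mounted with. A small sketch, assuming the usual 
/proc/mounts layout of "device mountpoint fstype options dump pass"; the 
helper name and the sample line are made up for the example. Note that 
directory indexing is a filesystem feature rather than a mount option, so 
it won't show up here.

```python
def mount_options(mounts_text, mount_point):
    """Return the mount options for mount_point as a dict, or None.

    mounts_text is text in /proc/mounts format. If the mount point is
    listed more than once, the last entry wins.
    """
    found = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4 or fields[1] != mount_point:
            continue
        opts = {}
        for item in fields[3].split(","):
            # "data=writeback" -> ("data", "writeback"); "rw" -> ("rw", "")
            key, _, value = item.partition("=")
            opts[key] = value
        found = opts
    return found


# Hypothetical sample line, not taken from a real server.
sample = "/dev/sdb1 /var/spool/imap ext3 rw,noatime,data=writeback 0 0\n"
opts = mount_options(sample, "/var/spool/imap")
print(opts.get("data", "(not listed)"))
```

On a live machine you'd pass in open("/proc/mounts").read() instead of the 
sample string. A missing data= entry doesn't necessarily mean anything; 
some kernels may simply not list the default mode.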

To be honest, I don't know why exactly, and working out what's causing IO 
bottlenecks is not easy. We just went back to reiserfs.

Some previous comments I've made:

http://www.irbs.net/internet/info-cyrus/0412/0042.html
http://lists.andrew.cmu.edu/pipermail/info-cyrus/2006-October/024119.html

> The problem with reiserfs is... well. The developers have explicitly
> stated that the development of v3 has come to its end, and there was the

In this particular case, I'm really almost happy with this! Reiserfs has 
been very stable for us for at least 5 years, and I'm almost glad no-one is 
touching it, because invariably people working on something will introduce 
new weird edge case bugs. This XFS bug was a while back, but it shows how 
apparently just adding 'some "sparse" endian annotations' caused a bug.

http://oss.sgi.com/projects/xfs/faq.html#dir2

That one was really nasty, even the xfs_repair tool couldn't fix it for a 
while!

Having said that, there have been some bugs over the last few years with 
reiserfs, but the kernel developers will still help with bug fixes if 
you find them and can trace them down.

http://blog.fastmail.fm/2007/09/21/reiserfs-bugs-32-bit-vs-64-bit-kernels-cache-vs-inode-memory/
http://lkml.org/lkml/2005/7/12/396
http://lkml.org/lkml/2008/6/17/9

Rob


