choosing a file system

Rob Mueller robm at fastmail.fm
Fri Jan 2 21:21:15 EST 2009


> Now see, I've had almost exactly the opposite experience.  Reiserfs seemed
> to start out well and work consistently until the filesystem reached a
> certain size (around 160GB, ~30m files) at which point backing it up would
> start to take too long and at around 180GB would take nearly a week.  This
> forced us to move to ext3 and it doesn't seem to degrade that way.  We did,
> however, also move from a single partition to 8 of them, so that obviously
> has some effect as well.

As you noted, changing two variables at once doesn't help you determine 
which was the problem!

Multiple partitions will definitely allow more parallelism, which helps 
speed things up, and that's one of the other things we have done over 
time. Basically we went from a few large volumes to hundreds of 
300G(data)/15G(meta) volumes. One of our machines has 40 data volumes + 40 
meta data volumes + the standard FS mounts.

$ mount | wc -l
92

We've found that splitting the data up into more volumes + more cyrus 
instances seems to help as well because it seems to reduce overall 
contention points in the kernel + software (eg filesystem locks spread 
across multiple mounts, db locks are spread across multiple dbs, etc)

Also, one thing I failed to mention is that for the data volumes, you 
should definitely be using the "notail" mount option. Unfortunately that's 
not the default, and I think it probably should be. Tail packing is neat 
for saving space, but it reduces the average meta-data density, which makes 
stat()ing lots of files in a directory a lot slower. I think that's what you 
might have been seeing. Of course you also mounted with 
"noatime,nodiratime" on both?

I think that's another problem with a lot of filesystem benchmarks: they 
don't find out what the right mount "tuning" options are for the filesystem 
being tested. Arguing that "the default should be fine" is clearly wrong, 
because every sane person uses "noatime", so you're already doing some 
tuning, and you should find out what's best for the filesystem you are 
trying.

For the record, we use:

noatime,nodiratime,notail,data=ordered

On all our reiserfs volumes.
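
If you want to check whether your own reiserfs entries carry those options, 
something like the following rough sketch might help. It is not a script we 
run here; the function names (missing_opts, check_reiserfs_fstab) and the 
WANTED_OPTS variable are made up for illustration, and it assumes the usual 
fstab layout of "device mountpoint fstype options dump pass". It only looks 
at what is written in the file, not at what the kernel actually has mounted.

```shell
#!/bin/sh
# Options to look for; adjust to taste (hypothetical variable name).
WANTED_OPTS="noatime nodiratime notail"

# Print the wanted options that are absent from a comma-separated
# mount option string, e.g. "rw,noatime".
missing_opts() {
    have=",$1,"
    missing=""
    for opt in $WANTED_OPTS; do
        case "$have" in
            *",$opt,"*) ;;                      # option present
            *) missing="$missing $opt" ;;       # option absent
        esac
    done
    echo "${missing# }"
}

# Walk an fstab-style file (default /etc/fstab) and report every
# reiserfs entry that lacks one or more of the wanted options.
check_reiserfs_fstab() {
    while read -r dev mnt fstype opts rest; do
        case "$dev" in
            "#"*|"") continue ;;                # skip comments and blanks
        esac
        [ "$fstype" = "reiserfs" ] || continue
        m=$(missing_opts "$opts")
        if [ -n "$m" ]; then
            echo "$mnt: missing $m"
        fi
    done < "${1:-/etc/fstab}"
    return 0
}
```

Source the file and run "check_reiserfs_fstab" with no argument to scan 
/etc/fstab, or pass it the path of another fstab-style file. It prints one 
line per reiserfs entry that is missing something and nothing otherwise.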

Rob


