choosing a file system
Rob Mueller
robm at fastmail.fm
Fri Jan 2 21:21:15 EST 2009
> Now see, I've had almost exactly the opposite experience. Reiserfs seemed
> to start out well and work consistently until the filesystem reached a
> certain size (around 160GB, ~30m files), at which point backing it up would
> start to take too long, and at around 180GB it would take nearly a week.
> This forced us to move to ext3, and it doesn't seem to degrade that way. We
> did, however, also move from a single partition to 8 of them, so that
> obviously has some effect as well.
As you noted, changing two variables at once doesn't help you determine
which was the problem!
Multiple partitions will definitely allow more parallelism, which helps
speed things up, and that is one of the other things we have done over
time. Basically we went from a few large volumes to hundreds of
300G(data)/15G(meta) volumes. One of our machines has 40 data volumes + 40
meta-data volumes + the standard FS mounts.
$ mount | wc -l
92
We've found that splitting the data up into more volumes + more cyrus
instances seems to help as well, because it seems to reduce overall
contention points in the kernel + software (e.g. filesystem locks are spread
across multiple mounts, db locks are spread across multiple dbs, etc).
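With this many volumes, a per-type breakdown is more useful than a bare line count. A minimal sketch, assuming a Linux box where /proc/mounts is available; the function name count_mounts_by_type is made up for illustration:

```shell
#!/bin/sh
# Count mounted filesystems per type, most common type first.
# Assumes the input is in /proc/mounts format:
#   device mountpoint fstype options dump pass
count_mounts_by_type() {
    mounts_file="${1:-/proc/mounts}"
    awk '{ count[$3]++ } END { for (t in count) print count[t], t }' \
        "$mounts_file" | sort -rn
}
```

Calling count_mounts_by_type with no argument reads /proc/mounts; passing a file name lets you try it on a saved copy of the mount table.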
Also, one thing I failed to mention is that for the data volumes you
should definitely be using the "notail" mount option. Unfortunately that's
not the default, and I think it probably should be. Tail packing is neat
for saving space, but it reduces the average meta-data density, which makes
"stat"ing lots of files in a directory a lot slower. I think that's what you
might have been seeing. Of course you also mounted with "noatime,nodiratime"
on both?
I think that's another problem with a lot of filesystem benchmarks: they
don't find out what the right mount "tuning" options are for the benchmark.
Arguing that "the default should be fine" is clearly wrong. Every sane
person uses "noatime", so you're already doing some tuning, and you should
find out what's best for the filesystem you are trying.
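One rough way to compare mount options on your own data is to time a stat-heavy walk of a large directory under each setting. This is only a sketch: the function name time_stat_walk is hypothetical, the timing has one-second resolution, and a repeat run will mostly hit the cache, so it only means something on a cold cache.

```shell
#!/bin/sh
# Rough timing of a stat-heavy directory walk.
# "ls -lR" has to lstat every entry to print sizes and times.
time_stat_walk() {
    dir="$1"
    start=$(date +%s)
    ls -lR "$dir" > /dev/null
    end=$(date +%s)
    # Count regular files so runs on different trees can be compared.
    count=$(find "$dir" -type f | wc -l | tr -d ' ')
    echo "$count files, $((end - start))s"
}
```

Run it against the same directory tree on each filesystem or option set you are comparing, and change only one thing between runs.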
For the record, we use:
noatime,nodiratime,notail,data=ordered
on all our reiserfs volumes.
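With dozens of volumes it is easy to miss an option on one of them. The sketch below checks an fstab-format file for reiserfs entries that lack any of the four options listed above. The function name audit_reiserfs_fstab is made up for illustration, and it only checks what is written in the file, not what the kernel actually has mounted.

```shell
#!/bin/sh
# Report reiserfs entries in an fstab-format file that are missing any of
# the options from the list above.
# fstab fields: device mountpoint fstype options dump pass
audit_reiserfs_fstab() {
    fstab="${1:-/etc/fstab}"
    awk -v want="noatime,nodiratime,notail,data=ordered" '
        $1 ~ /^#/ { next }                 # skip comment lines
        $3 == "reiserfs" {
            n = split(want, w, ",")
            split($4, have, ",")
            missing = ""
            for (i = 1; i <= n; i++) {
                found = 0
                for (j in have) if (have[j] == w[i]) found = 1
                if (!found) missing = missing (missing == "" ? "" : ",") w[i]
            }
            if (missing != "") print $2 ": missing " missing
        }
    ' "$fstab"
}
```

An entry mounted at /data2 with only "noatime" would be reported as "/data2: missing nodiratime,notail,data=ordered"; entries that carry all four options produce no output.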
Rob