choosing a file system
ram <ram at netcore.co.in>
Sat Jan 3 01:16:41 EST 2009
On Sat, 2009-01-03 at 13:21 +1100, Rob Mueller wrote:
> > Now see, I've had almost exactly the opposite experience. Reiserfs
> > seemed to start out well and work consistently until the filesystem
> > reached a certain size (around 160GB, ~30m files), at which point
> > backing it up would start to take too long, and at around 180GB it
> > would take nearly a week. This forced us to move to ext3, and it
> > doesn't seem to degrade that way. We did, however, also move from a
> > single partition to 8 of them, so that obviously has some effect as
> > well.
>
> As you noted, changing two variables at once doesn't help you determine
> which was the problem!
>
> Multiple partitions definitely allow more parallelism, which helps
> speed things up, and that is one of the other things we have done over
> time. Basically we went from a few large volumes to hundreds of
> 300G(data)/15G(meta) volumes. One of our machines has 40 data volumes
> + 40 meta data volumes + the standard FS mounts.
>
> $ mount | wc -l
> 92
>
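(Just to make that concrete for anyone trying this: I imagine the split
looks something like the following in imapd.conf. The partition names
and paths here are invented, and the data/meta split assumes the
metapartition support from Cyrus 2.3+.)

    partition-vol01:     /mnt/data01/imap
    metapartition-vol01: /mnt/meta01/imap
    partition-vol02:     /mnt/data02/imap
    metapartition-vol02: /mnt/meta02/imap
    # keep the small, hot files on the meta volumes
    metapartition_files: header index cache expunge squat
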
> We've found that splitting the data up into more volumes + more cyrus
> instances helps as well, because it seems to reduce overall contention
> points in the kernel + software (eg filesystem locks are spread across
> multiple mounts, db locks are spread across multiple dbs, etc).
>
Running multiple Cyrus instances with different DBs? How do we do that?
In my experience the ultimate I/O-contention point is the mailboxes.db
file, and that has to be a single file.
Do you mean dividing the users across different Cyrus instances? That
sounds like a maintenance headache, IMHO.
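
(I suppose each instance would need its own configdirectory, and
therefore its own mailboxes.db, plus its own cyrus.conf with different
ports. A rough sketch, with invented paths:)

    # instance A
    /usr/cyrus/bin/master -C /etc/imapd-a.conf -M /etc/cyrus-a.conf &
    # instance B: imapd-b.conf sets a different configdirectory,
    # so this instance gets a separate mailboxes.db
    /usr/cyrus/bin/master -C /etc/imapd-b.conf -M /etc/cyrus-b.conf &
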
I had the feeling that whatever optimizations are done at the FS level
would give us at most a 5-10% benefit.
We migrated from ext3 to reiserfs on our Cyrus servers with 30k
mailboxes, and I am not sure I saw a great benefit in terms of iowait.
At peak times I always see an iowait of 40-60%.
But the new solid-state disks seem very promising. They are claimed to
give 30x the throughput of a 15k RPM disk. If I/O improves 30-fold,
that should make all these optimizations unnecessary.
As my boss used to tell me: good hardware always compensates for
not-so-good software.
> Also one thing I did fail to mention: for the data volumes, you should
> definitely be using the "notail" mount option. Unfortunately that's
> not the default, and I think it probably should be. Tail packing is
> neat for saving space, but it reduces the average meta-data density,
> which makes "stat"ing lots of files in a directory a lot slower. I
> think that's what you might have been seeing. Of course you also
> mounted "noatime,nodiratime" on both?
>
> I think that's another problem with a lot of filesystem benchmarks:
> not finding out what the right mount "tuning" options are for your
> benchmark. Arguing that "the defaults should be fine" is clearly
> wrong; every sane person uses "noatime", so you're already doing some
> tuning, and you should find out what's best for the filesystem you
> are trying.
>
> For the record, we use:
>
> noatime,nodiratime,notail,data=ordered
>
> On all our reiserfs volumes.
>
> Rob
>
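
(For reference, in /etc/fstab that would look something like the lines
below; the devices and mount points are invented:)

    /dev/sdb1  /mnt/data01  reiserfs  noatime,nodiratime,notail,data=ordered  0 0
    /dev/sdc1  /mnt/meta01  reiserfs  noatime,nodiratime,notail,data=ordered  0 0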