choosing a file system
ram <ram at netcore.co.in>
Sat Jan 3 01:16:41 EST 2009
On Sat, 2009-01-03 at 13:21 +1100, Rob Mueller wrote:
> > Now see, I've had almost exactly the opposite experience. Reiserfs
> > seemed to start out well and work consistently until the filesystem
> > reached a certain size (around 160GB, ~30m files), at which point
> > backing it up would start to take too long, and at around 180GB it
> > would take nearly a week. This forced us to move to ext3, and it
> > doesn't seem to degrade that way. We did, however, also move from a
> > single partition to 8 of them, so that obviously has some effect as
> > well.
>
> As you noted, changing two variables at once doesn't help you determine
> which was the problem!
>
> Multiple partitions definitely allow more parallelism, which helps
> speed things up, and that is one of the other things we have done over
> time. Basically we went from a few large volumes to hundreds of
> 300G(data)/15G(meta) volumes. One of our machines has 40 data volumes
> + 40 meta data volumes + the standard FS mounts.
>
> $ mount | wc -l
> 92
>
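(Just to make that concrete for anyone trying this: I imagine the split
looks something like the following in imapd.conf. The partition names
and paths here are invented, and the data/meta split assumes the
metapartition support from Cyrus 2.3+.)

    partition-vol01:     /mnt/data01/imap
    metapartition-vol01: /mnt/meta01/imap
    partition-vol02:     /mnt/data02/imap
    metapartition-vol02: /mnt/meta02/imap
    # keep the small, hot files on the meta volumes
    metapartition_files: header index cache expunge squat
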
> We've found that splitting the data up into more volumes + more cyrus
> instances helps as well, because it seems to reduce overall contention
> points in the kernel + software (eg filesystem locks are spread across
> multiple mounts, db locks are spread across multiple dbs, etc).
>
Running multiple Cyrus instances with different DBs? How do we do that?
In my experience the ultimate I/O-contention point is the mailboxes.db
file, and that has to be a single file.
Do you mean dividing the users across different Cyrus instances? That
sounds like a maintenance headache, IMHO.
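
(I suppose each instance would need its own configdirectory, and
therefore its own mailboxes.db, plus its own cyrus.conf with different
ports. A rough sketch, with invented paths:)

    # instance A
    /usr/cyrus/bin/master -C /etc/imapd-a.conf -M /etc/cyrus-a.conf &
    # instance B: imapd-b.conf sets a different configdirectory,
    # so this instance gets a separate mailboxes.db
    /usr/cyrus/bin/master -C /etc/imapd-b.conf -M /etc/cyrus-b.conf &
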
I had the feeling that whatever optimizations are done at the FS level
would give us at most a 5-10% benefit.
We migrated from ext3 to reiserfs on our Cyrus servers with 30k
mailboxes, and I am not sure I saw a great benefit in terms of iowait.
At peak times I always see an iowait of 40-60%.
But the new solid-state disks seem very promising. They are claimed to
give 30x the throughput of a 15k RPM disk. If I/O improves 30-fold,
that should make all these optimizations unnecessary.
As my boss used to tell me: good hardware always compensates for
not-so-good software.
> Also one thing I did fail to mention: for the data volumes, you should
> definitely be using the "notail" mount option. Unfortunately that's
> not the default, and I think it probably should be. Tail packing is
> neat for saving space, but it reduces the average meta-data density,
> which makes "stat"ing lots of files in a directory a lot slower. I
> think that's what you might have been seeing. Of course you also
> mounted "noatime,nodiratime" on both?
>
> I think that's another problem with a lot of filesystem benchmarks:
> not finding out what the right mount "tuning" options are for your
> benchmark. Arguing that "the defaults should be fine" is clearly
> wrong; every sane person uses "noatime", so you're already doing some
> tuning, and you should find out what's best for the filesystem you
> are trying.
>
> For the record, we use:
>
> noatime,nodiratime,notail,data=ordered
>
> On all our reiserfs volumes.
>
> Rob
>
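
(For reference, in /etc/fstab that would look something like the lines
below; the devices and mount points are invented:)

    /dev/sdb1  /mnt/data01  reiserfs  noatime,nodiratime,notail,data=ordered  0 0
    /dev/sdc1  /mnt/meta01  reiserfs  noatime,nodiratime,notail,data=ordered  0 0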