choosing a file system

LALOT Dominique dom.lalot at gmail.com
Wed Dec 31 05:47:49 EST 2008


Thanks for everybody. That was an interesting thread. Nobody seems to use a
NetApp appliance, may be due to NFS architecture problems.

I believe I'll look to ext4 that seemed to be available in last kernel, and
also to Solaris, but we are not enough to support another OS.

Dom

And Happy New Year !

2008/12/31 Bron Gondwana <brong at fastmail.fm>

> On Tue, Dec 30, 2008 at 02:43:14PM -0700, Shawn Nock wrote:
> > Bron and the fastmail guys could tell you more about reiserfs... we've
> > used RH&SuSE/reiserfs/EMC for quite a while and we are very happy.
>
> Yeah, sure could :)
>
> You can probably find plenty of stuff from me in the archives about our
> setup - the basic things are:
>
> * separate metadata on RAID1 10kRPM (or 15kRPM in the new boxes) drives.
> * data files on RAID5 big slow drives - data IO isn't a limiting factor
> * 300Gb "slots" with 15Gb associated meta drives, like this:
>
> /dev/sdb6             14016208   8080360   5935848  58% /mnt/meta6
> /dev/sdb7             14016208   8064848   5951360  58% /mnt/meta7
> /dev/sdb8             14016208   8498812   5517396  61% /mnt/meta8
> /dev/sdd2            292959500 248086796  44872704  85% /mnt/data6
> /dev/sdd3            292959500 242722420  50237080  83% /mnt/data7
> /dev/sdd4            292959500 248840432  44119068  85% /mnt/data8
>
> as you can see, that balances out pretty nicely.  We also store
> per-user bayes databases on the associated meta drives.
>
> We balance our disk usage by moving users between stores when usage
> reaches 88% on any partition.  We get emailed if it goes above 92%
> and paged if it goes above 95%.
>
> Replication.  We have multiple "slots" on each server, and since
> they are all the same size, we have replication pairs spread pretty
> randomly around the hosts, so the failure of any one drive unit
> (SCSI attached SATA) or imap server doesn't significantly overload
> any one other machine.  By using Cyrus replication rather than,
> say, DRBD, a filesystem corruption should only affect a single
> partition, which won't take so long to fsck.
>
> Moving users is easy - we run a sync_server on the Cyrus master, and
> just create a custom config directory with symlinks into the tree on
> the real server and a rewritten piece of mailboxes.db so we can
> rename them during the move if needed.  It's all automatic.
>
> We also have a "CheckReplication" perl module that can be used to
> compare two ends to make sure everything is the same.  It does full
> per-message flags checks, random sha1 integrity checks, etc.
> Does require a custom patch to expose the GUID (as DIGEST.SHA1)
> via IMAP.
>
> I lost an entire drive unit on the 26th.  It stopped responding.
> 8 x 1TB drives in it.
>
> I tried rebooting everything, then switched the affected stores over
> to their replicas.  Total downtime for those users of about 15
> minutes because I tried the reboot first just in case (there's a
> chance that some messages were delivered and not yet replicated,
> so it's better not to bring up the replica uncleanly until you're
> sure there's no other choice)
>
> In the end I decided that it wasn't recoverable quickly enough to
> be viable, so chose new replica pairs for the slots that had been
> on that drive unit (we keep some empty space on our machines for
> just this eventuality) and started up another handy little script
> "sync_all_users" which runs sync_client -u for every user, then
> starts the rolling sync_client again at the end.  It took about
> 16 hours to bring everything back to fully replicated again.
>
> Bron.
>



-- 
Dominique LALOT
Ingénieur Systèmes et Réseaux
http://annuaire.univmed.fr/showuser?uid=lalot
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.andrew.cmu.edu/pipermail/info-cyrus/attachments/20081231/9741ea07/attachment.html 


More information about the Info-cyrus mailing list