choosing a file system
dom.lalot at gmail.com
Wed Dec 31 05:47:49 EST 2008
Thanks for everybody. That was an interesting thread. Nobody seems to use a
NetApp appliance, may be due to NFS architecture problems.
I believe I'll look to ext4 that seemed to be available in last kernel, and
also to Solaris, but we are not enough to support another OS.
And Happy New Year !
2008/12/31 Bron Gondwana <brong at fastmail.fm>
> On Tue, Dec 30, 2008 at 02:43:14PM -0700, Shawn Nock wrote:
> > Bron and the fastmail guys could tell you more about reiserfs... we've
> > used RH&SuSE/reiserfs/EMC for quite a while and we are very happy.
> Yeah, sure could :)
> You can probably find plenty of stuff from me in the archives about our
> setup - the basic things are:
> * separate metadata on RAID1 10kRPM (or 15kRPM in the new boxes) drives.
> * data files on RAID5 big slow drives - data IO isn't a limiting factor
> * 300Gb "slots" with 15Gb associated meta drives, like this:
> /dev/sdb6 14016208 8080360 5935848 58% /mnt/meta6
> /dev/sdb7 14016208 8064848 5951360 58% /mnt/meta7
> /dev/sdb8 14016208 8498812 5517396 61% /mnt/meta8
> /dev/sdd2 292959500 248086796 44872704 85% /mnt/data6
> /dev/sdd3 292959500 242722420 50237080 83% /mnt/data7
> /dev/sdd4 292959500 248840432 44119068 85% /mnt/data8
> as you can see, that balances out pretty nicely. We also store
> per-user bayes databases on the associated meta drives.
> We balance our disk usage by moving users between stores when usage
> reaches 88% on any partition. We get emailed if it goes above 92%
> and paged if it goes above 95%.
> Replication. We have multiple "slots" on each server, and since
> they are all the same size, we have replication pairs spread pretty
> randomly around the hosts, so the failure of any one drive unit
> (SCSI attached SATA) or imap server doesn't significantly overload
> any one other machine. By using Cyrus replication rather than,
> say, DRBD, a filesystem corruption should only affect a single
> partition, which won't take so long to fsck.
> Moving users is easy - we run a sync_server on the Cyrus master, and
> just create a custom config directory with symlinks into the tree on
> the real server and a rewritten piece of mailboxes.db so we can
> rename them during the move if needed. It's all automatic.
> We also have a "CheckReplication" perl module that can be used to
> compare two ends to make sure everything is the same. It does full
> per-message flags checks, random sha1 integrity checks, etc.
> Does require a custom patch to expose the GUID (as DIGEST.SHA1)
> via IMAP.
> I lost an entire drive unit on the 26th. It stopped responding.
> 8 x 1TB drives in it.
> I tried rebooting everything, then switched the affected stores over
> to their replicas. Total downtime for those users of about 15
> minutes because I tried the reboot first just in case (there's a
> chance that some messages were delivered and not yet replicated,
> so it's better not to bring up the replica uncleanly until you're
> sure there's no other choice)
> In the end I decided that it wasn't recoverable quickly enough to
> be viable, so chose new replica pairs for the slots that had been
> on that drive unit (we keep some empty space on our machines for
> just this eventuality) and started up another handy little script
> "sync_all_users" which runs sync_client -u for every user, then
> starts the rolling sync_client again at the end. It took about
> 16 hours to bring everything back to fully replicated again.
Ingénieur Systèmes et Réseaux
-------------- next part --------------
An HTML attachment was scrubbed...
More information about the Info-cyrus