Hi,

I would not discount using reiserfs (v3) by any means. It's still by far a better choice of filesystem for Cyrus than Ext3 or Ext4. I haven't really seen anyone run tests with Ext4, but I imagine it will be roughly on par with Ext3.

As for NFS: NFS itself isn't that bad; it's just that people tend to find ways to use it incorrectly, and that only ends up leading to failure.

Scott

On Dec 31, 2008, at 2:47 AM, LALOT Dominique wrote:

Thanks, everybody. That was an interesting thread. Nobody seems to use a NetApp appliance, maybe due to NFS architecture problems.

I think I'll look at ext4, which seems to be available in the latest kernel, and also at Solaris, but there aren't enough of us to support another OS.

Dom

And Happy New Year!

2008/12/31 Bron Gondwana <brong@fastmail.fm>:

On Tue, Dec 30, 2008 at 02:43:14PM -0700, Shawn Nock wrote:
> Bron and the fastmail guys could tell you more about reiserfs... we've
> used RH&SuSE/reiserfs/EMC for quite a while and we are very happy.

Yeah, sure could :)

You can probably find plenty of stuff from me in the archives about our setup - the basic things are:

* separate metadata on RAID1 10kRPM (or 15kRPM in the new boxes) drives.
* data files on RAID5 big slow drives - data IO isn't a limiting factor.
* 300GB "slots" with 15GB associated meta drives, like this:

  /dev/sdb6   14016208    8080360   5935848  58%  /mnt/meta6
  /dev/sdb7   14016208    8064848   5951360  58%  /mnt/meta7
  /dev/sdb8   14016208    8498812   5517396  61%  /mnt/meta8
  /dev/sdd2  292959500  248086796  44872704  85%  /mnt/data6
  /dev/sdd3  292959500  242722420  50237080  83%  /mnt/data7
  /dev/sdd4  292959500  248840432  44119068  85%  /mnt/data8

As you can see, that balances out pretty nicely. We also store per-user bayes databases on the associated meta drives.
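For reference, a data/metadata split along these lines is roughly what a minimal imapd.conf sketch might look like (assuming Cyrus 2.3-style metapartition support; the slot name, paths, and file list here are illustrative, not FastMail's actual config):

  # message files on the big, slow RAID5 volume
  partition-slot6: /mnt/data6/slot6

  # cyrus.header, cyrus.index, cyrus.cache etc. on the fast RAID1 volume
  metapartition-slot6: /mnt/meta6/slot6
  metapartition_files: header index cache expunge squat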
We balance our disk usage by moving users between stores when usage reaches 88% on any partition. We get emailed if it goes above 92% and paged if it goes above 95%.

Replication: we have multiple "slots" on each server, and since they are all the same size, we have replication pairs spread pretty randomly around the hosts, so the failure of any one drive unit (SCSI-attached SATA) or IMAP server doesn't significantly overload any one other machine. By using Cyrus replication rather than, say, DRBD, a filesystem corruption should only affect a single partition, which won't take so long to fsck.

Moving users is easy - we run a sync_server on the Cyrus master, and just create a custom config directory with symlinks into the tree on the real server and a rewritten piece of mailboxes.db, so we can rename them during the move if needed. It's all automatic.

We also have a "CheckReplication" perl module that can be used to compare the two ends and make sure everything is the same. It does full per-message flag checks, random SHA1 integrity checks, etc. It does require a custom patch to expose the GUID (as DIGEST.SHA1) via IMAP.

I lost an entire drive unit on the 26th. It stopped responding. 8 x 1TB drives in it.

I tried rebooting everything, then switched the affected stores over to their replicas. Total downtime for those users was about 15 minutes, because I tried the reboot first just in case (there's a chance that some messages were delivered and not yet replicated, so it's better not to bring up the replica uncleanly until you're sure there's no other choice).

In the end I decided that it wasn't recoverable quickly enough to be viable, so I chose new replica pairs for the slots that had been on that drive unit (we keep some empty space on our machines for just this eventuality) and started up another handy little script, "sync_all_users", which runs sync_client -u for every user and then starts the rolling sync_client again at the end. It took about 16 hours to bring everything back to fully replicated again.

Bron.

--
Dominique LALOT
Systems and Networks Engineer
http://annuaire.univmed.fr/showuser?uid=lalot
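For illustration, the "sync_all_users" approach Bron describes above might look roughly like the following sketch (FastMail's actual script isn't shown in the thread; the user-list file and sync_client being on $PATH are assumptions):

  #!/bin/sh
  # Hypothetical sketch only - not the real sync_all_users script.
  # Assumes /tmp/userlist holds one userid per line.

  while read user; do
      # push this user's mailboxes to the new replica
      sync_client -u "$user" || echo "sync failed for $user" >&2
  done < /tmp/userlist

  # once every user has been pushed, restart rolling replication
  # so the replica keeps up with the sync log from here on
  exec sync_client -r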