Thanks, everybody. That was an interesting thread. Nobody seems to use a NetApp appliance, perhaps due to NFS architecture problems.<br><br>I think I'll look at ext4, which seems to be available in the latest kernel, and also at Solaris, but we don't have enough people to support another OS.<br>
<br>Dom<br><br>And Happy New Year !<br><br><div class="gmail_quote">2008/12/31 Bron Gondwana <span dir="ltr"><<a href="mailto:brong@fastmail.fm">brong@fastmail.fm</a>></span><br><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">
<div class="Ih2E3d">On Tue, Dec 30, 2008 at 02:43:14PM -0700, Shawn Nock wrote:<br>
> Bron and the fastmail guys could tell you more about reiserfs... we've<br>
> used RH&SuSE/reiserfs/EMC for quite a while and we are very happy.<br>
<br>
</div>Yeah, sure could :)<br>
<br>
You can probably find plenty of stuff from me in the archives about our<br>
setup - the basic things are:<br>
<br>
* separate metadata on RAID1 10kRPM (or 15kRPM in the new boxes) drives.<br>
* data files on RAID5 big slow drives - data IO isn't a limiting factor<br>
* 300GB "slots" with 15GB associated meta drives, like this:<br>
<br>
/dev/sdb6 14016208 8080360 5935848 58% /mnt/meta6<br>
/dev/sdb7 14016208 8064848 5951360 58% /mnt/meta7<br>
/dev/sdb8 14016208 8498812 5517396 61% /mnt/meta8<br>
/dev/sdd2 292959500 248086796 44872704 85% /mnt/data6<br>
/dev/sdd3 292959500 242722420 50237080 83% /mnt/data7<br>
/dev/sdd4 292959500 248840432 44119068 85% /mnt/data8<br>
<br>
as you can see, that balances out pretty nicely. We also store<br>
per-user bayes databases on the associated meta drives.<br>
<br>
We balance our disk usage by moving users between stores when usage<br>
reaches 88% on any partition. We get emailed if it goes above 92%<br>
and paged if it goes above 95%.<br>
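That escalation policy can be sketched in a few lines. This is a minimal illustration only; the threshold constants come from the figures above, but the `actions_for` helper is an invented name, not FastMail's actual tooling:

```python
# Hypothetical sketch of the disk-usage escalation policy described above:
# rebalance users at 88%, email admins at 92%, page someone at 95%.
MOVE, EMAIL, PAGE = 0.88, 0.92, 0.95

def actions_for(usage_fraction):
    """Return the escalation actions for one partition's usage (0.0-1.0)."""
    acts = []
    if usage_fraction > MOVE:
        acts.append("rebalance")   # start moving users off this store
    if usage_fraction > EMAIL:
        acts.append("email")       # notify the admins
    if usage_fraction > PAGE:
        acts.append("page")        # wake someone up
    return acts
```

So a partition at 93% would trigger both a rebalance and an email, but no page.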
<br>
Replication. We have multiple "slots" on each server, and since<br>
they are all the same size, we have replication pairs spread pretty<br>
randomly around the hosts, so the failure of any one drive unit<br>
(SCSI attached SATA) or imap server doesn't significantly overload<br>
any one other machine. By using Cyrus replication rather than,<br>
say, DRBD, a filesystem corruption should only affect a single<br>
partition, which won't take so long to fsck.<br>
<br>
Moving users is easy - we run a sync_server on the Cyrus master, and<br>
just create a custom config directory with symlinks into the tree on<br>
the real server and a rewritten piece of mailboxes.db so we can<br>
rename them during the move if needed. It's all automatic.<br>
<br>
We also have a "CheckReplication" perl module that can be used to<br>
compare two ends to make sure everything is the same. It does full<br>
per-message flags checks, random sha1 integrity checks, etc.<br>
It does require a custom patch to expose the GUID (as DIGEST.SHA1)<br>
via IMAP.<br>
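The core comparison that check performs can be sketched as follows. This is illustrative only: the real CheckReplication module is Perl and talks IMAP, while `compare_ends` and its data shapes are invented here:

```python
# Hypothetical sketch of the per-message comparison: given maps of
# {uid: (flags, sha1)} fetched from each end, report every mismatch.
def compare_ends(master, replica):
    """master/replica: {uid: (frozenset_of_flags, sha1_hex)} -> problems."""
    problems = []
    for uid in sorted(set(master) | set(replica)):
        if uid not in replica:
            problems.append((uid, "missing on replica"))
        elif uid not in master:
            problems.append((uid, "missing on master"))
        elif master[uid][1] != replica[uid][1]:
            problems.append((uid, "sha1 mismatch"))    # integrity check
        elif master[uid][0] != replica[uid][0]:
            problems.append((uid, "flags differ"))     # per-message flags
    return problems
```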
<br>
I lost an entire drive unit on the 26th. It stopped responding.<br>
8 x 1TB drives in it.<br>
<br>
I tried rebooting everything, then switched the affected stores over<br>
to their replicas. Total downtime for those users was about 15<br>
minutes, because I tried the reboot first just in case. (There's a<br>
chance that some messages were delivered and not yet replicated,<br>
so it's better not to bring up the replica uncleanly until you're<br>
sure there's no other choice.)<br>
<br>
In the end I decided that it wasn't recoverable quickly enough to<br>
be viable, so chose new replica pairs for the slots that had been<br>
on that drive unit (we keep some empty space on our machines for<br>
just this eventuality) and started up another handy little script<br>
"sync_all_users" which runs sync_client -u for every user, then<br>
starts the rolling sync_client again at the end. It took about<br>
16 hours to bring everything back to fully replicated again.<br>
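The shape of that "sync_all_users" script is simple. `sync_client -u` and `sync_client -r` (rolling mode) are real Cyrus commands, but this wrapper is a sketch of my own, not the actual FastMail script:

```python
# Hypothetical sketch of "sync_all_users": one full per-user sync for
# every user, then restart rolling replication at the end.
import subprocess

def sync_commands(users):
    """Build the command list: one sync_client -u per user, then -r."""
    cmds = [["sync_client", "-u", u] for u in users]
    cmds.append(["sync_client", "-r"])  # resume rolling replication
    return cmds

def sync_all_users(users, run=subprocess.run):
    for cmd in sync_commands(users):
        run(cmd, check=True)
```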
<font color="#888888"><br>
Bron.<br>
</font></blockquote></div><br><br clear="all"><br>-- <br>Dominique LALOT<br>Systems and Network Engineer<br><a href="http://annuaire.univmed.fr/showuser?uid=lalot">http://annuaire.univmed.fr/showuser?uid=lalot</a><br>