Recommendations for a 15000 Cyrus Mailboxes

Bron Gondwana brong at fastmail.fm
Wed Apr 11 06:26:37 EDT 2007


On Tue, Apr 10, 2007 at 04:58:23PM +0200, Timo Schoeler wrote:
> On Tue, 10 Apr 2007 06:56:43 -0500
> >    1. Linux LVM over a 600 GB RAID 10 ( 4 x 300 GB)
> >    2. Which filesystem seems to be the better ? ext3 ? xfs ?
> > reiserfs ?
> 
> Do NEVER use XFS on GNU/Linux. (C)XFS is a brilliant FS on sgi's IRIX
> machines, I never lost even a single in more than ten years.
> 
> On GNU/Linux the implementation totally sucks. I'll stop my rant on
> GNU/Linux now ;)
> 
> My guess: ext3. ReiserFS has some very annoying weaknesses that may
> affect you.

We had the opposite experience.  Turn off tails on Reiserfs and you'll
still get better storage rates than ext3, and the difference in heavily
loaded performance is amazing.  We have machines that have been humming
along just fine for months with significantly more users on them than
they had with the abortive ext3 build.

We also apply one patch to reiserfs.  It's a one-liner that uses
generic_write rather than the Hans Reiser special, which is 5% faster
but can deadlock under heavy load.

We were in the process of working with Namesys people to get that one
resolved back into the kernel when priorities got a little refocussed
for them.

Also, split meta is really valuable, with much faster drives for the
meta partition.  We have data on giant SATA RAID5 arrays (+ replication,
+ backups) and meta on 10kRPM SATA in RAID1.

Here's 300 seconds worth of IO on one of our servers:

Device:    rrqm/s wrqm/s   r/s   w/s  rsec/s  wsec/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await  svctm  %util
sda          0.05  11.17  5.35  5.95   55.94  136.92    27.97    68.46    17.07     0.00    0.42   0.22   0.24
sdb          0.01 128.52 43.72 50.74  687.77 1434.09   343.88   717.05    22.46     2.41   25.54   2.86  27.02
sdc          0.01 156.61 83.14 73.47 1219.23 1840.68   609.61   920.34    19.54     3.91   24.98   2.47  38.64
sdd          0.00  52.79  7.49 14.55  100.58  538.95    50.29   269.48    29.01     0.58   26.26   5.52  12.17
sde          4.55  52.16 17.98 16.58  218.33  549.94   109.17   274.97    22.24     1.36   39.47   5.11  17.67
sdf          0.33 255.53 99.87 36.73 3669.49 2334.67  1834.74  1167.34    43.95     1.62   11.85   2.76  37.64
sdg          0.00  60.41 31.85 31.18  552.26  732.64   276.13   366.32    20.38     1.63   25.92   4.40  27.76
sdh          2.28  51.95 23.36 11.57 1662.08  508.31   831.04   254.16    62.14     0.47   13.58   7.84  27.38
sdi          2.12  46.52 17.52 13.10 1457.10  476.69   728.55   238.34    63.14     0.45   14.68   6.44  19.71

sda is the system drive.  The pairs are:

meta:   sdb  sdc  |  sdf  sdg
data:   sdd  sde  |  sdh  sdi

Each group of 4 devices is one external drive unit with 4 fast small
drives in RAID1 and 8 big slow drives in RAID5.

Within the drives the layout is:
  [*-*-] [--*-]  [*-v-v] [-v*v-]

(bigger drives in the second unit)

Where 'v' is a separate low-IO data storage partition that's not
used for IMAP, '*' is a master and '-' is a replica.

As you can imagine, the stats are a bit random there, but this is
a pretty typical low-load time data set.  It's only really daytime
in Europe and parts of Asia at the moment, and we don't have as
many users there as the US.
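
To make the meta/data split easier to see, here is a minimal Python sketch that totals the figures above by role. The eight rows are copied from the iostat output (the system drive sda is left out), and the meta/data mapping is the one listed in the pairs table. The names parse and totals_by_role are only illustrative.

```python
# Rows copied verbatim from the iostat output above (sda, the system drive, omitted).
IOSTAT = """\
sdb          0.01 128.52 43.72 50.74  687.77 1434.09   343.88   717.05    22.46     2.41   25.54   2.86  27.02
sdc          0.01 156.61 83.14 73.47 1219.23 1840.68   609.61   920.34    19.54     3.91   24.98   2.47  38.64
sdd          0.00  52.79  7.49 14.55  100.58  538.95    50.29   269.48    29.01     0.58   26.26   5.52  12.17
sde          4.55  52.16 17.98 16.58  218.33  549.94   109.17   274.97    22.24     1.36   39.47   5.11  17.67
sdf          0.33 255.53 99.87 36.73 3669.49 2334.67  1834.74  1167.34    43.95     1.62   11.85   2.76  37.64
sdg          0.00  60.41 31.85 31.18  552.26  732.64   276.13   366.32    20.38     1.63   25.92   4.40  27.76
sdh          2.28  51.95 23.36 11.57 1662.08  508.31   831.04   254.16    62.14     0.47   13.58   7.84  27.38
sdi          2.12  46.52 17.52 13.10 1457.10  476.69   728.55   238.34    63.14     0.45   14.68   6.44  19.71
"""

COLUMNS = ["rrqm/s", "wrqm/s", "r/s", "w/s", "rsec/s", "wsec/s", "rkB/s",
           "wkB/s", "avgrq-sz", "avgqu-sz", "await", "svctm", "%util"]

# Role of each device, as given in the pairs table.
ROLE = {"sdb": "meta", "sdc": "meta", "sdf": "meta", "sdg": "meta",
        "sdd": "data", "sde": "data", "sdh": "data", "sdi": "data"}


def parse(text):
    """Return {device: {column: value}} for every well-formed iostat row."""
    rows = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != len(COLUMNS) + 1:
            continue  # skip headers, blank lines, anything unexpected
        rows[fields[0]] = dict(zip(COLUMNS, map(float, fields[1:])))
    return rows


def totals_by_role(rows):
    """Sum requests/s and kB/s (reads plus writes) per role."""
    totals = {}
    for dev, stats in rows.items():
        bucket = totals.setdefault(ROLE[dev], {"iops": 0.0, "kB/s": 0.0})
        bucket["iops"] += stats["r/s"] + stats["w/s"]
        bucket["kB/s"] += stats["rkB/s"] + stats["wkB/s"]
    return totals


if __name__ == "__main__":
    for role, t in sorted(totals_by_role(parse(IOSTAT)).items()):
        print(f"{role}: {t['iops']:.2f} req/s, {t['kB/s']:.2f} kB/s")
```

On this sample the meta devices take roughly 451 requests/s against roughly 122 for the data devices, which is the pattern that makes the faster meta drives worthwhile.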
 
> Best: ZFS on Solaris ;)

Have they fixed the mmap problems yet?  Otherwise, yeah - it looks
pretty funky.  I like the concepts.

I'm also interested in reiser4 if/when it stabilises.  It seems to
have been designed with our workloads specifically in mind!  There's
also dualfs which was mentioned on the lkml recently that I'd love to
play with if it's ever ported forward to the 2.6 series.

> >    3. Which options to format the filesystem ? according to the chosen
> >       filesystem

No options.  We mount:

rw,noatime,nodiratime,notail,data=journal

I'm not sure that notail is needed with more recent kernels, because
there were patches that supposedly fixed the issue with that, but why
mess with what works!
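
If you want to make sure every mail partition really carries those options, a small check over fstab-style lines is enough. This is only a sketch: the device names and mount points in the sample are hypothetical, and it assumes the usual fstab field order (device, mount point, filesystem type, options, dump, pass).

```python
# The mount options quoted above.
REQUIRED = {"rw", "noatime", "nodiratime", "notail", "data=journal"}


def missing_options(fstab_line, required=REQUIRED):
    """Return the required options absent from one fstab entry, sorted."""
    fields = fstab_line.split()
    options = set(fields[3].split(","))
    return sorted(required - options)


def check_fstab(fstab_text, fstype="reiserfs"):
    """Map mount point -> missing options for every entry of the given type."""
    problems = {}
    for line in fstab_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and comments
        fields = line.split()
        if len(fields) < 4 or fields[2] != fstype:
            continue
        missing = missing_options(line)
        if missing:
            problems[fields[1]] = missing
    return problems


# Hypothetical sample; real device names and mount points will differ.
SAMPLE = """\
# device    mountpoint   type      options                                     dump pass
/dev/sdb1   /mnt/meta1   reiserfs  rw,noatime,nodiratime,notail,data=journal   0    0
/dev/sdd1   /mnt/data1   reiserfs  rw,noatime                                  0    0
/dev/sda1   /            ext3      defaults                                    0    1
"""

if __name__ == "__main__":
    print(check_fstab(SAMPLE))
```

The check reads fstab-style text rather than the live mount table, so it tells you what is configured, not necessarily what the kernel reports.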

> >    4. Which pop3 / imap proxy to use ?

nginx.  Without a doubt.  Not only is it amazingly blindingly efficient
with epoll (and probably kqueue if you went FreeBSD), but it has a very
responsive and active author.  Don't read the code though, it's very
well written and tidy, but it will break your brain.  Here's someone who
ENJOYS writing state machines in C.
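
For anyone who has not used nginx as a mail proxy: it does not pick the backend itself, it asks a small HTTP service (configured with the auth_http directive) which server each login should go to. The sketch below shows roughly what such a service looks like in Python. The user table, passwords and backend addresses are all hypothetical placeholders, and a real deployment would check credentials against whatever store it already has.

```python
import hmac
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical data: user -> (password, backend IP). nginx expects an IP
# address, not a hostname, in the Auth-Server response header.
USERS = {
    "alice": ("secret1", "10.0.0.11"),
    "bob": ("secret2", "10.0.0.12"),
}
PORTS = {"imap": 143, "pop3": 110}


def auth_response(headers):
    """Build the reply headers for one nginx auth_http request."""
    user = headers.get("Auth-User", "")
    password = headers.get("Auth-Pass", "")
    protocol = headers.get("Auth-Protocol", "imap")

    expected, backend = USERS.get(user, (None, None))
    ok = (
        expected is not None
        and protocol in PORTS
        and hmac.compare_digest(password.encode(), expected.encode())
    )
    if not ok:
        # Any status other than "OK" is passed to the client as the error;
        # Auth-Wait asks nginx to delay before allowing another attempt.
        return {"Auth-Status": "Invalid login or password", "Auth-Wait": "3"}
    return {
        "Auth-Status": "OK",
        "Auth-Server": backend,
        "Auth-Port": str(PORTS[protocol]),
    }


class AuthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        reply = auth_response(self.headers)
        # nginx reads the result from the headers; the HTTP status stays 200.
        self.send_response(200)
        for name, value in reply.items():
            self.send_header(name, value)
        self.end_headers()


def serve(port=9000):
    """Run the auth service on localhost (the port is an arbitrary choice)."""
    HTTPServer(("127.0.0.1", port), AuthHandler).serve_forever()
```

Calling serve() starts the service. Because the routing decision is a plain lookup, the same table is a natural place to record which Cyrus instance each user lives on.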

> >    5. Single instance or multiple instances of cyrus ? taking in mind
> >       that there should be the option to recover a mailbox or some
> > mail of a mailbox without having to shut down the whole cyrus system.

I like small.  It keeps the mailboxes.db small, and hence easier to scan
in a hurry.  If nothing else, it will improve IMAP LIST performance.
That said, all users who share mailboxes will need to be in the same
instance.
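
The sharing constraint is easy to state in code: users connected by any chain of shared mailboxes form a group, and a group has to land on one instance as a whole. Here is a rough Python sketch of that idea using union-find plus a simple largest-group-first placement. The function names and the placement policy are illustrative assumptions, not something Cyrus provides.

```python
def group_users(users, shares):
    """Group users so that any two who share a mailbox are in the same group.

    users  -- iterable of user names
    shares -- iterable of (user_a, user_b) pairs that share at least one mailbox
    """
    parent = {u: u for u in users}
    for a, b in shares:
        parent.setdefault(a, a)
        parent.setdefault(b, b)

    def find(u):
        # Follow parent links to the root, halving the path as we go.
        while parent[u] != u:
            parent[u] = parent[parent[u]]
            u = parent[u]
        return u

    for a, b in shares:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    groups = {}
    for u in parent:
        groups.setdefault(find(u), []).append(u)
    return sorted(sorted(g) for g in groups.values())


def assign_instances(groups, n_instances):
    """Place whole groups on instances, largest first, onto the emptiest one."""
    instances = [[] for _ in range(n_instances)]
    for group in sorted(groups, key=len, reverse=True):
        min(instances, key=len).extend(group)
    return instances


if __name__ == "__main__":
    groups = group_users(["a", "b", "c", "d", "e"], [("a", "b"), ("b", "c")])
    print(groups)
    print(assign_instances(groups, 2))
```

This balances by user count only. Balancing by mailbox size or IO load would need a different weight per group.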

> >    6. Best way to perform backups ? LVM snapshots ? shutting down some
> >       cyrus partitions ? RAID10 hot swap ?

I'm in the middle of rewriting ours.  It used to just be files, which was
really easy because they never change(tm).  It turns out not to be strictly
true if someone deletes and recreates the folder, so the new one is going
to be all UUID based (what with our MD5 UUID patch) and store cyrus.index
files as well.  We have a nice cyrus.index and cyrus.header parser, though
I sure would be happier if mailboxes.db included the folder uniqueid in it.
And cyrus.index for that matter.
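
To illustrate why keying on IDs rather than paths helps, here is a small Python sketch of a backup store where each message file is saved under the MD5 of its content and each folder is tracked by its unique ID instead of its name. This is not our backup code, just the general idea. The folder ID is assumed to be supplied by the caller, the manifest is a plain dict, and cyrus.index and cyrus.header handling is left out.

```python
import hashlib
import os
import shutil


def file_md5(path, chunk_size=1 << 20):
    """Return the hex MD5 of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_message_file(name):
    """Cyrus spool message files are named '<uid>.', e.g. '42.'."""
    return name.endswith(".") and name[:-1].isdigit()


def backup_folder(folder_id, spool_dir, store_dir, manifest):
    """Back up one folder; return how many new blobs were copied.

    folder_id -- the folder's unique ID (assumed to be known to the caller)
    manifest  -- dict of folder_id -> {message file name: content digest}
    """
    os.makedirs(store_dir, exist_ok=True)
    entries = manifest.setdefault(folder_id, {})
    copied = 0
    for name in sorted(os.listdir(spool_dir)):
        if not is_message_file(name):
            continue  # skip cyrus.index, cyrus.header, cyrus.cache, ...
        src = os.path.join(spool_dir, name)
        digest = file_md5(src)
        dst = os.path.join(store_dir, digest)
        if not os.path.exists(dst):
            shutil.copyfile(src, dst)  # content not seen before
            copied += 1
        entries[name] = digest
    return copied
```

Because the manifest is keyed by folder ID, a folder that is deleted and recreated under the same name gets its own entry instead of silently inheriting the old one, and identical messages are stored only once.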

> >    7. Any other suggestion will be welcome.
> > 
> > Thanks a lot !!

Don't do it.  It will make you pull your hair out.  Email is full of
spammers and scammers and it's a pointless waste of time.  Tell your
erstwhile users to find something better to do with their lives.

Bron.

