<html>
<head>
<meta content="text/html; charset=ISO-8859-1"
http-equiv="Content-Type">
</head>
<body text="#000000" bgcolor="#FFFFFF">
Michael,<br>
I'd like to thank you for having written up such a succinct and
reasonable description of a well-thought-out murder installation.
Lots of good information here, especially for people who may be
considering a move like yours. This could be the bones of a good
Wiki article.<br>
<br>
Cheers,<br>
-nic<br>
<br>
<div class="moz-cite-prefix">On 09/22/2014 06:20 AM, Michael Menge
wrote:<br>
</div>
<blockquote
cite="mid:20140922132042.1365581s5qwf0qoa@webmail.uni-tuebingen.de"
type="cite">Hi,
<br>
<br>
3 weeks ago we changed our cyrus imap servers from stand-alone
<br>
systems to a cyrus murder cluster. We have ~44000 accounts,
<br>
~457000 mailboxes, and 2x6.5 TB of mail.
<br>
<br>
In our previous setup we had 6 cyrus imap 2.4.17 servers running as
KVM
<br>
VMs with 8 GB memory and 4 cores each, on an HP Blade center (G7
Blades).
<br>
Each server was running 2 cyrus instances: one master system and one
replica
<br>
of one of the other servers. We used DNS CNAMEs to distribute our
users to
<br>
our servers. The filesystems are stored on two Infortrend iSCSI
RAIDs, so
<br>
that the replica is not on the same iSCSI system as the master.
<br>
<br>
In our new setup each server is running 3 - 4 cyrus instances:
<br>
one frontend, one backend, one replica and, on one of the servers,
<br>
the cyrus mupdate master. ClusterIP is used to distribute the
access
<br>
to our frontend instances. The backends and replicas are only
listening
<br>
on private IPs.
<br>
<br>
If one server goes down, we will switch that ClusterIP bucket to
one
<br>
of the other servers, and we will restart the replica as backend by
changing
<br>
the config and swapping the IP of the replica with the IP of the
backend. This
<br>
is much faster than updating the mailbox location of all the
affected
<br>
mailboxes.
<br>
<br>
If the mupdate master is down we start it on one of the other
servers,
<br>
using the mailboxdb of the frontend and running "ctl_mboxlist -m
-a"
<br>
on all backend instances.
<br>
<br>
Since the migration we discovered some small issues and some bugs.
<br>
<br>
1. Usually Cyrus is not CPU bound. One exception is the mupdate
master:
<br>
keeping encrypted connections to all frontends and establishing
<br>
new encrypted connections from the backends for every mailbox
creation,
<br>
rename and removal was too much for the 4 cores, so we added 4
additional
<br>
cores to the VMs.
<br>
<br>
2. Our frontend instances use IMAPS and POP3S and don't allow
STARTTLS.
<br>
But we had to use IMAP and POP3 with STARTTLS on our backends,
as
<br>
the frontends will always use STARTTLS over IMAP and POP3 to
proxy
<br>
the connection.
<br>
<br>
<br>
3. We see more IOERRORs in our cyrus logs. In the standalone
<br>
cyrus imap an IOERROR indicated corruption in one of the cyrus
files,
<br>
but that is not the case for the new errors we have found:
<br>
<br>
a) "reading message: unexpected end of file": as far as I can
tell,
<br>
this is triggered by the IMAP APPEND command. I suspect it happens when
the
<br>
connection between frontend and backend is lost or the
frontend
<br>
dies during upload of the message.
<br>
<br>
b) "opening index %s: Invalid mailbox name": the mailbox name
seems to
<br>
be fine in most cases. I have only figured out why the
mailbox
<br>
name was considered invalid in one case (the string
"Posteingang"
<br>
was translated by the client and the name "INBOX" is
reserved).
<br>
<br>
It would help if the string IOERROR were not used in these
cases,
<br>
and if the mailbox name were always logged consistently with
the
<br>
unixhierarchysep option.
<br>
<br>
<br>
4. Deleting a mailbox with delete_mode: delayed can create a
corrupt
<br>
mailbox in the DELETED tree. In the logs we found the
following:
<br>
<br>
be/beimap[62020]: Rename: user.LoginID.Mail.drafts ->
DELETED.user.LoginID.Mail.drafts.5416CD11
<br>
<br>
be/beimap[62020]: MUPDATE: can't commit mailbox entry for
'DELETED.user.LoginID.Mail.drafts.5416CD11'
<br>
be/beimap[62020]: Deleted mailbox
DELETED.user.LoginID.Mail.drafts.5416CD11
<br>
<br>
and on the next cyr_expire run
<br>
<br>
be/cyr_expire[144388]: IOERROR: opening index
DELETED.user.LoginID.Mail.drafts.5416CD11: System I/O error
<br>
<br>
In the filesystem, DELETED/user/LoginID/Mail/drafts was an empty
directory.
<br>
I couldn't find any hints why the mupdate master couldn't
commit the
<br>
mailbox entry, but as "5416CD11" is the timestamp of the
action, I am
<br>
certain that the mailbox did not exist in the mailboxdb before.
And as
<br>
this only happens in some rare cases I suspect a race
condition.
<br>
<br>
5. Some frontend imapd processes receive a SIGSEGV.
<br>
As this seems to happen in the OpenSSL library I asked on their
mailing list,
<br>
but didn't receive an answer yet. At the end you will find a
backtrace of the
<br>
core dump.
<br>
<br>
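The hexadecimal suffix from item 4 can be decoded to check when the
<br>
delete happened. A minimal Python sketch, assuming the suffix is the
<br>
deletion time in Unix seconds and the name is in the internal dotted
<br>
form (the function name is made up for illustration, it is not part
<br>
of Cyrus):
<br>
<pre>
```python
from datetime import datetime, timezone


def deleted_suffix_to_utc(name):
    """Decode the trailing hex field of a DELETED.* mailbox name.

    Assumption: the last dot-separated field is Unix time in hexadecimal.
    """
    suffix = name.rsplit(".", 1)[-1]
    return datetime.fromtimestamp(int(suffix, 16), tz=timezone.utc)


if __name__ == "__main__":
    print(deleted_suffix_to_utc("DELETED.user.LoginID.Mail.drafts.5416CD11"))
```
</pre>
For the mailbox name above this prints 2014-09-15 11:27:13+00:00.
<br>
<br>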
I would be glad if changes regarding the logging of IOERRORs
<br>
and mailbox names were included in Cyrus 2.5.
<br>
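<br>
Until then the two cases from item 3 could be separated from other
<br>
IOERROR lines with a small script. A rough Python sketch, assuming
<br>
the message texts quoted above appear verbatim in the log lines (the
<br>
labels and the function name are hypothetical, not Cyrus terminology):
<br>
<pre>
```python
import re

# Message fragments taken from the two cases described in item 3.
# Every other IOERROR line is still treated as possible corruption.
KNOWN_PATTERNS = [
    re.compile(r"reading message: unexpected end of file"),
    re.compile(r"opening index .*: Invalid mailbox name"),
]


def classify_ioerror(line):
    """Return 'other', 'known' or 'suspect' for one log line."""
    if "IOERROR" not in line:
        return "other"
    if any(p.search(line) for p in KNOWN_PATTERNS):
        return "known"
    return "suspect"


if __name__ == "__main__":
    sample = ("be/cyr_expire[144388]: IOERROR: opening index "
              "DELETED.user.LoginID.Mail.drafts.5416CD11: System I/O error")
    print(classify_ioerror(sample))
```
</pre>
Anything it marks as "suspect" would still need a manual look.
<br>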
<br>
Regarding 4. and 5.: are these known bugs? I could not find any
matching
<br>
entries in the bug tracker. If they are not known I would add them
to the bug tracker.
<br>
<br>
Regards
<br>
<br>
Michael Menge
<br>
<br>
----- ldd imapd ----
<br>
linux-vdso.so.1 => (0x00007fff3ffed000)
<br>
libsasl2.so.2 => /usr/lib64/libsasl2.so.2 (0x00007f40e62a8000)
<br>
libssl.so.0.9.8 => /usr/lib64/libssl.so.0.9.8
(0x00007f40e6052000)
<br>
libcrypto.so.0.9.8 => /usr/lib64/libcrypto.so.0.9.8
(0x00007f40e5cb2000)
<br>
libz.so.1 => /lib64/libz.so.1 (0x00007f40e5a9c000)
<br>
libwrap.so.0 => /lib64/libwrap.so.0 (0x00007f40e5891000)
<br>
libnsl.so.1 => /lib64/libnsl.so.1 (0x00007f40e5678000)
<br>
libc.so.6 => /lib64/libc.so.6 (0x00007f40e52ff000)
<br>
libdl.so.2 => /lib64/libdl.so.2 (0x00007f40e50fb000)
<br>
libresolv.so.2 => /lib64/libresolv.so.2 (0x00007f40e4ee3000)
<br>
/lib64/ld-linux-x86-64.so.2 (0x00007f40e64f7000)
<br>
<br>
--- bt on imapd core dump ----
<br>
#0 0x000000000080e130 in ?? ()
<br>
#1 0x00007fe5a839334f in ssl3_get_message (s=0x80e430,
st1=8347825, stn=-1470427072, mt=&lt;optimized out&gt;,
max=102400, ok=0x7fffcc974d08)
<br>
at s3_both.c:522
<br>
#2 0x00007fe5a838ba0d in ssl3_get_key_exchange (s=0x0) at
s3_clnt.c:1103
<br>
#3 0x00007fe5a838dff8 in ssl3_connect (s=0x80e430) at
s3_clnt.c:316
<br>
#4 0x000000000046a177 in tls_start_clienttls (readfd=16,
writefd=16, layerbits=0x7fffcc975104, authid=0x7fffcc975108,
ret=0x7e1fa0,
<br>
sess=0x7e1fa8) at tls.c:1311
<br>
#5 0x00000000004669f4 in do_starttls (s=0x7e16a0,
tls_cmd=0x78a4d0 &lt;imap_protocol+208&gt;) at backend.c:201
<br>
#6 0x0000000000467217 in backend_authenticate (s=0x7e16a0,
prot=0x78a400 &lt;imap_protocol&gt;, mechlist=0x7fffcc976468,
<br>
userid=0x7f5c90 "REPLACED_LOGINID", cb=0x80de30,
status=0x7fffcc976460) at backend.c:378
<br>
#7 0x0000000000467a1a in backend_connect
(ret_backend=0x7e16a0, server=0x7a8960 &lt;partition.17660&gt;
"ma03.mail.localhost",
<br>
prot=0x78a400 &lt;imap_protocol&gt;, userid=0x7f5c90
"REPLACED_LOGINID", cb=0x0, auth_status=0x0) at backend.c:552
<br>
#8 0x0000000000426603 in proxy_findserver (server=0x7a8960
&lt;partition.17660&gt; "ma03.mail.localhost", prot=0x78a400
&lt;imap_protocol&gt;,
<br>
userid=0x7f5c90 "REPLACED_LOGINID", cache=0x7a3010
&lt;backend_cached&gt;, current=0x7a3008 &lt;backend_current&gt;,
inbox=0x7a3000 &lt;backend_inbox&gt;,
<br>
clientin=0x7be450) at proxy.c:179
<br>
#9 0x0000000000426beb in proxy_findinboxserver
(userid=0x7f5b20 "REPLACED_LOGINID") at imap_proxy.c:145
<br>
#10 0x00000000004197c8 in cmd_list (tag=0x7f3720 "42.117",
listargs=0x7fffcc977510) at imapd.c:6036
<br>
#11 0x000000000040c9ee in cmdloop () at imapd.c:1574
<br>
#12 0x000000000040aea5 in service_main (argc=2, argv=0x7b9010,
envp=0x7fffcc97b650) at imapd.c:946
<br>
#13 0x0000000000409ba4 in main (argc=6, argv=0x7fffcc97b618,
envp=0x7fffcc97b650) at service.c:582
<br>
-----------------------------
<br>
<br>
<br>
<br>
<br>
<br>
<br>
--------------------------------------------------------------------------------
<br>
M.Menge Tel.: (49) 7071/29-70316
<br>
Universität Tübingen Fax.: (49) 7071/29-5912
<br>
Zentrum für Datenverarbeitung mail:
<a class="moz-txt-link-abbreviated" href="mailto:michael.menge@zdv.uni-tuebingen.de">michael.menge@zdv.uni-tuebingen.de</a>
<br>
Wächterstraße 76
<br>
72074 Tübingen<br>
<br>
</blockquote>
<br>
<pre class="moz-signature" cols="72">--
Nic Bernstein <a class="moz-txt-link-abbreviated" href="mailto:nic@onlight.com">nic@onlight.com</a>
Onlight, Inc. <a class="moz-txt-link-abbreviated" href="http://www.onlight.com">www.onlight.com</a>
219 N. Milwaukee St., Suite 2a v. 414.272.4477
Milwaukee, Wisconsin 53202
</pre>
</body>
</html>