switch to cyrus murder (aggregator) feedback
Michael Menge
michael.menge at zdv.uni-tuebingen.de
Mon Sep 22 07:20:42 EDT 2014
Hi,
3 weeks ago we changed our Cyrus IMAP servers from standalone
systems to a Cyrus murder cluster. We have ~44000 accounts,
~457000 mailboxes, and 2x6.5 TB of mail.
In our previous setup we had 6 Cyrus IMAP 2.4.17 servers running as KVM
VMs with 8 GB memory and 4 cores each, on an HP Blade center (G7 Blades).
Each server was running 2 Cyrus instances: one master system and one replica
of one of the other servers. We used DNS CNAMEs to distribute our users to
our servers. The filesystems are stored on two Infortrend iSCSI RAIDs, so
that the replica is not on the same iSCSI system as the master.
In our new setup each server is running 3 - 4 Cyrus instances:
one frontend, one backend, one replica and, on one of the servers,
the Cyrus mupdate master. ClusterIP is used to distribute the access
to our frontend instances. The backends and replicas are only listening
on private IPs.
If one server goes down, we will switch that ClusterIP bucket to one
of the other servers, and we will restart the replica as backend by changing
the config and swapping the IP of the replica with the IP of the backend. This
is much faster than updating the mailbox location of all the affected
mailboxes.
If the mupdate master is down we start it on one of the other servers,
using the mailboxdb of the frontend and running "ctl_mboxlist -m -a"
on all backend instances.
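
The resync step above could be scripted roughly as follows. This is only a
sketch: the config file paths are hypothetical (one imapd.conf per backend
instance), and it assumes the usual Cyrus options, -C for an alternate
config file and -w (together with -m) to only report what would be done.
Please check the ctl_mboxlist man page of your version before relying on it.

```python
import subprocess

def build_resync_command(config_path, dry_run=True):
    """Build the ctl_mboxlist call for one backend instance."""
    # -m: sync the local mailbox list with the mupdate master
    # -a: treat the local mailbox list as authoritative
    cmd = ["ctl_mboxlist", "-C", config_path, "-m", "-a"]
    if dry_run:
        # -w: assumed to print the planned changes without applying them
        cmd.append("-w")
    return cmd

def resync_backends(config_paths, dry_run=True):
    """Run the resync for every backend instance, stopping on the first error."""
    for path in config_paths:
        subprocess.run(build_resync_command(path, dry_run), check=True)

# Hypothetical config files, one per backend instance on this host.
BACKEND_CONFIGS = ["/etc/imapd-be1.conf", "/etc/imapd-be2.conf"]

for conf in BACKEND_CONFIGS:
    print(" ".join(build_resync_command(conf)))
```

The dry run is the default here on purpose, so that a real resync has to be
requested explicitly with resync_backends(BACKEND_CONFIGS, dry_run=False).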
Since the migration we discovered some small issues and some bugs.
1. Usually Cyrus is not CPU bound. One exception is the mupdate master:
keeping encrypted connections to all frontends and establishing
new encrypted connections from the backends for every mailbox creation,
rename and removal was too much for the 4 cores, so we added 4 additional
cores to the VMs.
2. Our frontend instances use IMAPS and POP3S and don't allow STARTTLS.
But we had to use IMAP and POP3 with STARTTLS on our backends, as
the frontends will always use STARTTLS over IMAP and POP3 to proxy
the connection.
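
A quick way to verify that a backend really offers STARTTLS on the plain
IMAP port might look like the sketch below. It uses Python's imaplib; the
port 143 and the timeout are assumptions, and the timeout argument needs
Python 3.9 or newer.

```python
import imaplib

def offers_starttls(capabilities):
    """Return True if an IMAP capability list advertises STARTTLS."""
    return any(str(cap).upper() == "STARTTLS" for cap in capabilities)

def backend_offers_starttls(host, port=143, timeout=10):
    """Connect without TLS and inspect the capabilities the server announces."""
    conn = imaplib.IMAP4(host, port, timeout=timeout)
    try:
        return offers_starttls(conn.capabilities)
    finally:
        conn.logout()
```

For example, backend_offers_starttls("ma03.mail.localhost") would have to
return True for a frontend to be able to proxy to that backend.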
3. We see more IOERRORs in our Cyrus logs. In the standalone
Cyrus IMAP setup, an IOERROR indicated corruption in one of the Cyrus files,
but that is not the case for the new errors we have found:
a) "reading message: unexpected end of file". As far as I can tell,
this is triggered by the IMAP APPEND command. I suspect it happens when the
connection between frontend and backend is lost or the frontend
dies during upload of the message.
b) "opening index %s: Invalid mailbox name". The mailbox name seems to
be fine in most cases. I have only figured out why the mailbox
name was considered invalid in one case (the string "Posteingang"
was translated by the client and the name "INBOX" is reserved).
It would help if the string IOERROR were not used in these cases,
and if the mailbox name were always logged consistently with the
unixhierarchysep option.
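
Until the logging changes, a log filter could work around both points. The
sketch below is only an illustration: it assumes that both messages appear
on a log line together with the string IOERROR, and that internal mailbox
names use "." as separator and "^" for a literal dot when unixhierarchysep
is enabled. The function names are made up for this example.

```python
import re

# The two messages from a) and b) that did not indicate file corruption.
NON_CORRUPTION_PATTERNS = [
    re.compile(r"reading message: unexpected end of file"),
    re.compile(r"opening index .*: Invalid mailbox name"),
]

def is_non_corruption_ioerror(line):
    """True for IOERROR lines that match one of the two messages above."""
    if "IOERROR" not in line:
        return False
    return any(p.search(line) for p in NON_CORRUPTION_PATTERNS)

def to_unix_hiersep(internal_name):
    """Rewrite an internal mailbox name the way a unixhierarchysep user sees it."""
    # Assumption: "." separates levels internally, "^" stands for a literal dot.
    return internal_name.replace(".", "/").replace("^", ".")

print(to_unix_hiersep("user.LoginID.Mail.drafts"))
```

Lines such as the "System I/O error" from cyr_expire are deliberately not
matched, so they would still show up as real IOERRORs.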
4. Deleting a mailbox with delete_mode: delayed can create a corrupt
mailbox in the DELETED tree. In the logs we found the following:
be/beimap[62020]: Rename: user.LoginID.Mail.drafts ->
DELETED.user.LoginID.Mail.drafts.5416CD11
be/beimap[62020]: MUPDATE: can't commit mailbox entry for
'DELETED.user.LoginID.Mail.drafts.5416CD11'
be/beimap[62020]: Deleted mailbox DELETED.user.LoginID.Mail.drafts.5416CD11
and on the next cyr_expire run
be/cyr_expire[144388]: IOERROR: opening index
DELETED.user.LoginID.Mail.drafts.5416CD11: System I/O error
In the filesystem, DELETED/user/LoginID/Mail/drafts was an empty directory.
I couldn't find any hints why the mupdate master couldn't commit the
mailbox entry, but as "5416CD11" is the timestamp of the action, I am
certain that the mailbox did not exist in the mailboxdb before. And as
this only happens in some rare cases I suspect a race condition.
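
The hex suffix of a DELETED mailbox can be turned back into a date to match
it against the logs. A small sketch, assuming the suffix is the Unix
timestamp of the deletion written in hexadecimal (the function name is only
for illustration):

```python
from datetime import datetime, timezone

def deleted_suffix_to_datetime(mailbox_name):
    """Decode the trailing hex timestamp of a DELETED.* mailbox name as UTC."""
    suffix = mailbox_name.rsplit(".", 1)[-1]
    return datetime.fromtimestamp(int(suffix, 16), tz=timezone.utc)

when = deleted_suffix_to_datetime("DELETED.user.LoginID.Mail.drafts.5416CD11")
print(when.isoformat())  # 2014-09-15T11:27:13+00:00
```

Note that the result is in UTC, while syslog timestamps are usually in
local time.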
5. Some frontend imapd processes receive a SIGSEGV.
As this seems to happen in the OpenSSL library, I asked on their mailing list,
but haven't received an answer yet. At the end you will find a backtrace of the
core dump.
I would be glad if changes regarding the logging of IOERRORs
and mailbox names would be included in Cyrus 2.5.
Regarding 4. and 5.: are these known bugs? I could not find any matching
entries in the bug tracker. If they are not known, I will add them to
the bug tracker.
Regards
Michael Menge
----- ldd imapd ----
linux-vdso.so.1 => (0x00007fff3ffed000)
libsasl2.so.2 => /usr/lib64/libsasl2.so.2 (0x00007f40e62a8000)
libssl.so.0.9.8 => /usr/lib64/libssl.so.0.9.8 (0x00007f40e6052000)
libcrypto.so.0.9.8 => /usr/lib64/libcrypto.so.0.9.8 (0x00007f40e5cb2000)
libz.so.1 => /lib64/libz.so.1 (0x00007f40e5a9c000)
libwrap.so.0 => /lib64/libwrap.so.0 (0x00007f40e5891000)
libnsl.so.1 => /lib64/libnsl.so.1 (0x00007f40e5678000)
libc.so.6 => /lib64/libc.so.6 (0x00007f40e52ff000)
libdl.so.2 => /lib64/libdl.so.2 (0x00007f40e50fb000)
libresolv.so.2 => /lib64/libresolv.so.2 (0x00007f40e4ee3000)
/lib64/ld-linux-x86-64.so.2 (0x00007f40e64f7000)
--- bt on imapd core dump ----
#0 0x000000000080e130 in ?? ()
#1 0x00007fe5a839334f in ssl3_get_message (s=0x80e430,
st1=8347825, stn=-1470427072, mt=<optimized out>, max=102400,
ok=0x7fffcc974d08)
at s3_both.c:522
#2 0x00007fe5a838ba0d in ssl3_get_key_exchange (s=0x0) at s3_clnt.c:1103
#3 0x00007fe5a838dff8 in ssl3_connect (s=0x80e430) at s3_clnt.c:316
#4 0x000000000046a177 in tls_start_clienttls (readfd=16,
writefd=16, layerbits=0x7fffcc975104, authid=0x7fffcc975108,
ret=0x7e1fa0,
sess=0x7e1fa8) at tls.c:1311
#5 0x00000000004669f4 in do_starttls (s=0x7e16a0,
tls_cmd=0x78a4d0 <imap_protocol+208>) at backend.c:201
#6 0x0000000000467217 in backend_authenticate (s=0x7e16a0,
prot=0x78a400 <imap_protocol>, mechlist=0x7fffcc976468,
userid=0x7f5c90 "REPLACED_LOGINID", cb=0x80de30,
status=0x7fffcc976460) at backend.c:378
#7 0x0000000000467a1a in backend_connect (ret_backend=0x7e16a0,
server=0x7a8960 <partition.17660> "ma03.mail.localhost",
prot=0x78a400 <imap_protocol>, userid=0x7f5c90
"REPLACED_LOGINID", cb=0x0, auth_status=0x0) at backend.c:552
#8 0x0000000000426603 in proxy_findserver (server=0x7a8960
<partition.17660> "ma03.mail.localhost", prot=0x78a400 <imap_protocol>,
userid=0x7f5c90 "REPLACED_LOGINID", cache=0x7a3010
<backend_cached>, current=0x7a3008 <backend_current>, inbox=0x7a3000
<backend_inbox>,
clientin=0x7be450) at proxy.c:179
#9 0x0000000000426beb in proxy_findinboxserver (userid=0x7f5b20
"REPLACED_LOGINID") at imap_proxy.c:145
#10 0x00000000004197c8 in cmd_list (tag=0x7f3720 "42.117",
listargs=0x7fffcc977510) at imapd.c:6036
#11 0x000000000040c9ee in cmdloop () at imapd.c:1574
#12 0x000000000040aea5 in service_main (argc=2, argv=0x7b9010,
envp=0x7fffcc97b650) at imapd.c:946
#13 0x0000000000409ba4 in main (argc=6, argv=0x7fffcc97b618,
envp=0x7fffcc97b650) at service.c:582
-----------------------------
--------------------------------------------------------------------------------
M.Menge Tel.: (49) 7071/29-70316
Universität Tübingen Fax.: (49) 7071/29-5912
Zentrum für Datenverarbeitung mail:
michael.menge at zdv.uni-tuebingen.de
Wächterstraße 76
72074 Tübingen