Switch to Cyrus Murder (aggregator): feedback

Michael Menge michael.menge at zdv.uni-tuebingen.de
Mon Sep 22 07:20:42 EDT 2014


Hi,

Three weeks ago we changed our Cyrus IMAP servers from standalone
systems to a Cyrus Murder cluster. We have ~44,000 accounts,
~457,000 mailboxes and 2x6.5 TB of mail.

In our previous setup we had 6 Cyrus IMAP 2.4.17 servers running as KVM
VMs with 8 GB memory and 4 cores each, on an HP blade center (G7 blades).
Each server ran 2 Cyrus instances: one master and one replica of one of
the other servers. We used DNS CNAMEs to distribute our users across our
servers. The filesystems are stored on two Infortrend iSCSI RAIDs, so
that a replica is never on the same iSCSI system as its master.
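The placement rule above (every server runs its own master plus the replica of another server, and a replica never shares an iSCSI RAID with its master) can be checked mechanically. This is a minimal Python sketch, assuming a simple ring assignment (server i replicates to server i+1) and hypothetical names (mail01..mail06, raid-a, raid-b); it is not our actual configuration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instance:
    server: str  # VM that runs this cyrus instance
    role: str    # "master" or "replica"
    serves: str  # master whose mail this instance holds
    raid: str    # iSCSI RAID holding the instance's filesystem


def ring_layout(servers, raids=("raid-a", "raid-b")):
    """Server i runs its own master and the replica of server i-1.

    Master data alternates between the RAIDs; each replica is put on the
    RAID that follows its master's RAID, so the two never coincide.
    """
    n = len(servers)
    layout = []
    for i, name in enumerate(servers):
        layout.append(Instance(name, "master", name, raids[i % len(raids)]))
        layout.append(Instance(servers[(i + 1) % n], "replica", name,
                               raids[(i + 1) % len(raids)]))
    return layout


def check_layout(layout):
    """Return problems: a replica on the same host or RAID as its master."""
    masters = {inst.serves: inst for inst in layout if inst.role == "master"}
    problems = []
    for inst in layout:
        if inst.role != "replica":
            continue
        master = masters[inst.serves]
        if inst.server == master.server:
            problems.append(f"replica of {inst.serves} shares host {inst.server}")
        if inst.raid == master.raid:
            problems.append(f"replica of {inst.serves} shares {inst.raid}")
    return problems


if __name__ == "__main__":
    servers = [f"mail{i:02d}" for i in range(1, 7)]
    print(check_layout(ring_layout(servers)))  # []
```

The ring keeps every VM at exactly two instances (one master, one replica), which matches the "2 instances per server" layout described above.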

In our new setup each server runs 3-4 Cyrus instances: one frontend,
one backend, one replica and, on one of the servers, the Cyrus mupdate
master. ClusterIP is used to distribute access across our frontend
instances. The backends and replicas only listen on private IPs.

If one server goes down, we switch its ClusterIP bucket to one of the
other servers, and we restart the replica as a backend by changing its
config and swapping the IP of the replica with the IP of the backend.
This is much faster than updating the mailbox location of all the
affected mailboxes.
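The failover steps above can be summarised as a small planning function. This Python sketch only returns the plan as text and executes nothing; the server names, private IPs and the wording of the Cyrus steps are hypothetical. The one external detail it relies on is the Linux CLUSTERIP target's /proc interface, where writing "+N" to /proc/net/ipt_CLUSTERIP/<cluster-ip> makes the local node also answer for bucket N; verify that against your kernel's documentation before relying on it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Server:
    name: str
    bucket: int       # ClusterIP node number answered by this server's frontend
    backend_ip: str   # private IP of this server's backend instance
    replica_of: str   # server whose backend this server's replica mirrors


def failover_plan(servers, failed, cluster_ip, takeover_host=None):
    """Return the ordered steps to take over for a failed server (not executed)."""
    by_name = {s.name: s for s in servers}
    dead = by_name[failed]
    # The replica of the failed backend lives on whichever server mirrors it.
    replica_host = next((s for s in servers if s.replica_of == failed), None)
    if replica_host is None:
        raise ValueError(f"no server holds a replica of {failed}")
    # Any surviving server may take the ClusterIP bucket; default to the replica host.
    fe_host = by_name[takeover_host] if takeover_host else replica_host
    if fe_host.name == failed:
        raise ValueError("takeover host must be a surviving server")
    return [
        f"{fe_host.name}: echo +{dead.bucket} > /proc/net/ipt_CLUSTERIP/{cluster_ip}",
        f"{replica_host.name}: stop replica instance of {failed}",
        f"{replica_host.name}: switch replica config of {failed} to backend mode",
        f"{replica_host.name}: take over backend IP {dead.backend_ip}",
        f"{replica_host.name}: start former replica as backend of {failed}",
    ]


if __name__ == "__main__":
    servers = [
        Server("mail01", 1, "10.0.0.11", "mail03"),
        Server("mail02", 2, "10.0.0.12", "mail01"),
        Server("mail03", 3, "10.0.0.13", "mail02"),
    ]
    for step in failover_plan(servers, "mail01", "192.0.2.10"):
        print(step)
```

Keeping the plan as data makes it easy to review before anything touches the cluster, and the backend IP swap is why no mailbox locations have to change in the mupdate database.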

If the mupdate master is down, we start it on one of the other servers
using the mailboxdb of the frontend, and then run "ctl_mboxlist -m -a"
on all backend instances.
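Because several Cyrus instances share each VM, the resync has to be run against each backend instance's own configuration; Cyrus tools select an alternate config file with -C. This Python sketch only builds the command lines; the ssh transport, the "cyrus" login and the config paths are hypothetical, ctl_mboxlist may live outside the PATH on your distribution, and the "-m -a" flags are passed exactly as in the procedure above.

```python
import shlex


def resync_commands(backends, login="cyrus"):
    """Build one remote command per backend instance (nothing is executed).

    backends: iterable of (host, imapd_conf_path) pairs, one per backend instance.
    """
    commands = []
    for host, conf in backends:
        # -C points ctl_mboxlist at this instance's imapd.conf
        remote = shlex.join(["ctl_mboxlist", "-C", conf, "-m", "-a"])
        commands.append(shlex.join(["ssh", f"{login}@{host}", remote]))
    return commands


if __name__ == "__main__":
    for cmd in resync_commands([("mail01", "/etc/cyrus/be/imapd.conf"),
                                ("mail02", "/etc/cyrus/be/imapd.conf")]):
        print(cmd)
```

shlex.join (Python 3.8+) quotes the remote command as a single argument, so paths with unusual characters survive the ssh hop.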

Since the migration we have discovered some small issues and some bugs.

1. Usually Cyrus is not CPU bound. One exception is the mupdate master:
    keeping encrypted connections to all frontends while also handling a
    new encrypted connection from a backend for every mailbox creation,
    rename and removal was too much for 4 cores, so we added 4 more
    cores to the VMs.

2. Our frontend instances use IMAPS and POP3S and don't allow STARTTLS.
    But we had to offer IMAP and POP3 with STARTTLS on our backends, as
    the frontends always use STARTTLS over IMAP and POP3 to proxy
    the connection.


3. We see more IOERRORs in our Cyrus logs. In the standalone setup, an
    IOERROR indicated corruption in one of the Cyrus files, but that is
    not the case for the new errors we have found:

    a) "reading message: unexpected end of file": as far as I can tell,
       this is triggered by the IMAP APPEND command. I suspect it happens
       when the connection between frontend and backend is lost, or the
       frontend dies, during the upload of the message.

    b) "opening index %s: Invalid mailbox name": the mailbox name seems
       to be fine in most cases. I have only figured out why the name was
       considered invalid in one case (the string "Posteingang" was
       translated by the client, and the name "INBOX" is reserved).

    It would help if the string IOERROR were not used in these cases,
    and if the mailbox name were always logged consistently with the
    unixhierarchysep option.


4. Deleting a mailbox with delete_mode: delayed can create a corrupt
    mailbox in the DELETED tree. In the logs we found the following:

    be/beimap[62020]: Rename: user.LoginID.Mail.drafts -> DELETED.user.LoginID.Mail.drafts.5416CD11
    be/beimap[62020]: MUPDATE: can't commit mailbox entry for 'DELETED.user.LoginID.Mail.drafts.5416CD11'
    be/beimap[62020]: Deleted mailbox DELETED.user.LoginID.Mail.drafts.5416CD11

    and on the next cyr_expire run:

    be/cyr_expire[144388]: IOERROR: opening index DELETED.user.LoginID.Mail.drafts.5416CD11: System I/O error

    In the filesystem, DELETED/user/LoginID/Mail/drafts was an empty
    directory. I couldn't find any hint as to why the mupdate master
    couldn't commit the mailbox entry, but as "5416CD11" is the
    (hexadecimal) timestamp of the action, I am certain that the mailbox
    did not exist in the mailboxdb before. As this only happens in rare
    cases, I suspect a race condition.

5. Some frontend imapd processes receive a SIGSEGV.
    As this seems to happen inside OpenSSL, I asked on their mailing list,
    but haven't received an answer yet. At the end you will find a
    backtrace of the core dump.
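Related to items 2 and 5 (the crashing frontend is in its client-side STARTTLS towards a backend): a quick way to confirm that a backend instance advertises STARTTLS on plain IMAP and STLS on plain POP3 is to read its capabilities with Python's standard imaplib and poplib modules. This is a sketch; the factory parameter exists only so the check can be exercised without a live server, and poplib's capa() raises an error if the server does not support the CAPA command.

```python
import imaplib
import poplib


def imap_offers_starttls(host, port=143, factory=imaplib.IMAP4):
    """True if the plain IMAP greeting/capabilities include STARTTLS."""
    conn = factory(host, port)
    try:
        # imaplib stores the server's capabilities as an upper-cased tuple
        return "STARTTLS" in conn.capabilities
    finally:
        conn.logout()


def pop3_offers_stls(host, port=110, factory=poplib.POP3):
    """True if the POP3 CAPA response lists STLS."""
    conn = factory(host, port)
    try:
        # capa() returns a dict keyed by capability name
        return "STLS" in conn.capa()
    finally:
        conn.quit()
```

Against a real backend this would be called as, for example, imap_offers_starttls("10.0.0.11"), where the address is a hypothetical private backend IP.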

I would be glad if changes regarding the logging of IOERRORs
and mailbox names were included in Cyrus 2.5.
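To illustrate the mailbox-name part of this request: Cyrus keeps names internally with '.' as the hierarchy separator, while clients of a server with unixhierarchysep enabled see '/'. The Python sketch below is a simplified model, not Cyrus's own conversion code, of rendering an internal name in client-visible form before logging. It assumes the usual convention that a literal '.' inside a name is stored internally as '^' when unixhierarchysep is on; log_line and its output format are hypothetical.

```python
def to_external(internal_name: str, unixhierarchysep: bool) -> str:
    """Render an internal mailbox name in the client-visible form (simplified).

    Internally '.' separates hierarchy levels. With unixhierarchysep enabled
    the client separator is '/', and a literal '.' in a name is stored as '^'.
    """
    if not unixhierarchysep:
        return internal_name
    # Convert separators first, then turn escaped dots back into real dots.
    return internal_name.replace(".", "/").replace("^", ".")


def log_line(message: str, internal_name: str, unixhierarchysep: bool) -> str:
    """Hypothetical log format: message plus the name as the user would see it."""
    return f"{message}: {to_external(internal_name, unixhierarchysep)}"
```

The order of the two replacements matters: swapping '^' first would turn escaped dots into separators.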

Regarding 4. and 5.: are these known bugs? I could not find any matching
entries in the bug tracker. If they are not known, I will add them to
the bug tracker.
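For item 4, two small checks follow from the observations there: the suffix of a DELETED name can be decoded to confirm when the delete happened, and the partition can be scanned for empty leaf directories like the one cyr_expire stumbled over. This Python sketch assumes the suffix is a hexadecimal Unix timestamp (as in 5416CD11) and takes the partition path as a parameter; the scan only reports and removes nothing.

```python
import os
from datetime import datetime, timezone


def deleted_timestamp(deleted_name: str) -> datetime:
    """Decode the hex Unix timestamp at the end of a DELETED.* mailbox name."""
    suffix = deleted_name.rsplit(".", 1)[1]
    return datetime.fromtimestamp(int(suffix, 16), tz=timezone.utc)


def empty_dirs(root: str):
    """Yield directories under root that contain neither files nor subdirectories."""
    for dirpath, dirnames, filenames in os.walk(root):
        if not dirnames and not filenames:
            yield dirpath
```

For the entry in item 4, deleted_timestamp("DELETED.user.LoginID.Mail.drafts.5416CD11") returns 2014-09-15 11:27:13 UTC, and a real mailbox directory would normally contain at least cyrus.index, so an empty leaf under DELETED is a good candidate for a leftover.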

Regards

     Michael Menge

-----  ldd imapd ----
linux-vdso.so.1 =>  (0x00007fff3ffed000)
libsasl2.so.2 => /usr/lib64/libsasl2.so.2 (0x00007f40e62a8000)
libssl.so.0.9.8 => /usr/lib64/libssl.so.0.9.8 (0x00007f40e6052000)
libcrypto.so.0.9.8 => /usr/lib64/libcrypto.so.0.9.8 (0x00007f40e5cb2000)
libz.so.1 => /lib64/libz.so.1 (0x00007f40e5a9c000)
libwrap.so.0 => /lib64/libwrap.so.0 (0x00007f40e5891000)
libnsl.so.1 => /lib64/libnsl.so.1 (0x00007f40e5678000)
libc.so.6 => /lib64/libc.so.6 (0x00007f40e52ff000)
libdl.so.2 => /lib64/libdl.so.2 (0x00007f40e50fb000)
libresolv.so.2 => /lib64/libresolv.so.2 (0x00007f40e4ee3000)
/lib64/ld-linux-x86-64.so.2 (0x00007f40e64f7000)

--- bt on imapd core dump ----
    #0  0x000000000080e130 in ?? ()
    #1  0x00007fe5a839334f in ssl3_get_message (s=0x80e430, st1=8347825, stn=-1470427072, mt=<optimized out>, max=102400, ok=0x7fffcc974d08)
     at s3_both.c:522
    #2  0x00007fe5a838ba0d in ssl3_get_key_exchange (s=0x0) at s3_clnt.c:1103
    #3  0x00007fe5a838dff8 in ssl3_connect (s=0x80e430) at s3_clnt.c:316
    #4  0x000000000046a177 in tls_start_clienttls (readfd=16, writefd=16, layerbits=0x7fffcc975104, authid=0x7fffcc975108, ret=0x7e1fa0,
     sess=0x7e1fa8) at tls.c:1311
    #5  0x00000000004669f4 in do_starttls (s=0x7e16a0, tls_cmd=0x78a4d0 <imap_protocol+208>) at backend.c:201
    #6  0x0000000000467217 in backend_authenticate (s=0x7e16a0, prot=0x78a400 <imap_protocol>, mechlist=0x7fffcc976468,
     userid=0x7f5c90 "REPLACED_LOGINID", cb=0x80de30, status=0x7fffcc976460) at backend.c:378
    #7  0x0000000000467a1a in backend_connect (ret_backend=0x7e16a0, server=0x7a8960 <partition.17660> "ma03.mail.localhost",
     prot=0x78a400 <imap_protocol>, userid=0x7f5c90 "REPLACED_LOGINID", cb=0x0, auth_status=0x0) at backend.c:552
    #8  0x0000000000426603 in proxy_findserver (server=0x7a8960 <partition.17660> "ma03.mail.localhost", prot=0x78a400 <imap_protocol>,
     userid=0x7f5c90 "REPLACED_LOGINID", cache=0x7a3010 <backend_cached>, current=0x7a3008 <backend_current>, inbox=0x7a3000 <backend_inbox>,
     clientin=0x7be450) at proxy.c:179
    #9  0x0000000000426beb in proxy_findinboxserver (userid=0x7f5b20 "REPLACED_LOGINID") at imap_proxy.c:145
    #10 0x00000000004197c8 in cmd_list (tag=0x7f3720 "42.117", listargs=0x7fffcc977510) at imapd.c:6036
    #11 0x000000000040c9ee in cmdloop () at imapd.c:1574
    #12 0x000000000040aea5 in service_main (argc=2, argv=0x7b9010, envp=0x7fffcc97b650) at imapd.c:946
    #13 0x0000000000409ba4 in main (argc=6, argv=0x7fffcc97b618, envp=0x7fffcc97b650) at service.c:582
-----------------------------






--------------------------------------------------------------------------------
M.Menge                                Tel.: (49) 7071/29-70316
Universität Tübingen                   Fax.: (49) 7071/29-5912
Zentrum für Datenverarbeitung          mail: michael.menge at zdv.uni-tuebingen.de
Wächterstraße 76
72074 Tübingen


More information about the Info-cyrus mailing list