switch to cyrus murder (aggregator) feedback
Nic Bernstein
nic at onlight.com
Mon Sep 22 09:24:41 EDT 2014
Michael,
I'd like to thank you for having written up such a succinct and
reasonable description of a well-thought-out murder installation. Lots
of good information here, especially for people who may be considering a
move like yours. This could be the bones of a good Wiki article.
Cheers,
-nic
On 09/22/2014 06:20 AM, Michael Menge wrote:
> Hi,
>
> 3 weeks ago we changed our Cyrus IMAP servers from standalone
> systems to a Cyrus murder cluster. We have ~44000 accounts,
> ~457000 mailboxes, and 2x6.5 TB of mail.
>
> In our previous setup we had 6 Cyrus IMAP 2.4.17 servers running as KVM
> VMs with 8 GB memory and 4 cores each, on an HP blade center (G7 blades).
> Each server was running 2 Cyrus instances: one master and one replica
> of one of the other servers. We used DNS CNAMEs to distribute our users
> to our servers. The filesystems are stored on two Infortrend iSCSI RAIDs,
> so that the replica is not on the same iSCSI system as its master.
>
> In our new setup each server is running 3 - 4 Cyrus instances:
> one frontend, one backend, one replica and, on one of the servers,
> the Cyrus mupdate master. ClusterIP is used to distribute the access
> to our frontend instances. The backends and replicas are only listening
> on private IPs.
>
> If one server goes down, we will switch that ClusterIP bucket to one
> of the other servers, and we will restart the replica as a backend by
> changing the config and swapping the IP of the replica with the IP of
> the backend. This is much faster than updating the mailbox location of
> all the affected mailboxes.
>
> If the mupdate master is down we start it on one of the other servers,
> using the mailboxdb of the frontend and running "ctl_mboxlist -m -a"
> on all backend instances.
>
> Since the migration we discovered some small issues and some bugs.
>
> 1. Usually Cyrus is not CPU bound. One exception is the mupdate master:
> keeping encrypted connections to all frontends and establishing
> new encrypted connections from the backends for every mailbox creation,
> rename and removal was too much for the 4 cores, so we added 4
> additional cores to the VMs.
>
> 2. Our frontend instances use IMAPS and POP3S and don't allow STARTTLS.
> But we had to use IMAP and POP3 with STARTTLS on our backends, as
> the frontends will always use STARTTLS over IMAP and POP3 to proxy
> the connection.
>
>
> 3. We see more IOERRORs in our Cyrus logs. In standalone
> Cyrus IMAP, an IOERROR indicated corruption in one of the Cyrus files,
> but that is not the case for the new errors we have found:
>
> a) "reading message: unexpected end of file": as far as I can tell,
> this is triggered by the IMAP APPEND command. I suspect it happens
> when the connection between frontend and backend is lost or the
> frontend dies during upload of the message.
>
> b) "opening index %s: Invalid mailbox name": the mailbox name seems to
> be fine in most cases. I have only figured out why the mailbox
> name was considered invalid in one case (the string "Posteingang"
> was translated by the client, and the name "INBOX" is reserved).
>
> It would help if the string IOERROR were not used in these cases,
> and if the mailbox name were always logged consistently with the
> unixhierarchysep option.
>
>
> 4. Deleting a mailbox with delete_mode: delayed can create a corrupt
> mailbox in the DELETED tree. In the logs we found the following:
>
> be/beimap[62020]: Rename: user.LoginID.Mail.drafts -> DELETED.user.LoginID.Mail.drafts.5416CD11
> be/beimap[62020]: MUPDATE: can't commit mailbox entry for 'DELETED.user.LoginID.Mail.drafts.5416CD11'
> be/beimap[62020]: Deleted mailbox DELETED.user.LoginID.Mail.drafts.5416CD11
>
> and on the next cyr_expire run:
>
> be/cyr_expire[144388]: IOERROR: opening index DELETED.user.LoginID.Mail.drafts.5416CD11: System I/O error
>
> In the filesystem, DELETED/user/LoginID/Mail/drafts was an empty
> directory. I couldn't find any hints as to why the mupdate master
> couldn't commit the mailbox entry, but as "5416CD11" is the timestamp
> of the action, I am certain that the mailbox did not exist in the
> mailboxdb before. And as this only happens in rare cases, I suspect
> a race condition.
>
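
The "5416CD11" suffix mentioned above can be checked against the log
times by decoding it. A minimal sketch, assuming the suffix is a Unix
timestamp (seconds since the epoch) written in hexadecimal, as the
description above implies; the function name decode_deleted_suffix is
hypothetical and not part of Cyrus:

```python
from datetime import datetime, timezone


def decode_deleted_suffix(suffix: str) -> datetime:
    """Interpret a DELETED.* name suffix as hex seconds since the epoch (assumption)."""
    seconds = int(suffix, 16)
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


# Mailbox name taken from the log excerpt; the last dot-separated
# component is the suffix to decode.
name = "DELETED.user.LoginID.Mail.drafts.5416CD11"
suffix = name.rsplit(".", 1)[1]
print(decode_deleted_suffix(suffix).isoformat())
```

Comparing the decoded time with the syslog time of the Rename entry
should show whether both belong to the same delete operation.
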
> 5. Some frontend imapd processes receive a SIGSEGV.
> As this seems to happen in the OpenSSL library, I asked on their
> mailing list, but haven't received an answer yet. At the end you will
> find a backtrace of the core dump.
>
> I would be glad if changes regarding the logging of IOERRORs
> and mailbox names were included in Cyrus 2.5.
>
> Regarding 4. and 5.: are these known bugs? I could not find any matching
> entries in the bug tracker. If they are not known, I will add them to
> the bug tracker.
>
> Regards
>
> Michael Menge
>
> ----- ldd imapd ----
> linux-vdso.so.1 => (0x00007fff3ffed000)
> libsasl2.so.2 => /usr/lib64/libsasl2.so.2 (0x00007f40e62a8000)
> libssl.so.0.9.8 => /usr/lib64/libssl.so.0.9.8 (0x00007f40e6052000)
> libcrypto.so.0.9.8 => /usr/lib64/libcrypto.so.0.9.8 (0x00007f40e5cb2000)
> libz.so.1 => /lib64/libz.so.1 (0x00007f40e5a9c000)
> libwrap.so.0 => /lib64/libwrap.so.0 (0x00007f40e5891000)
> libnsl.so.1 => /lib64/libnsl.so.1 (0x00007f40e5678000)
> libc.so.6 => /lib64/libc.so.6 (0x00007f40e52ff000)
> libdl.so.2 => /lib64/libdl.so.2 (0x00007f40e50fb000)
> libresolv.so.2 => /lib64/libresolv.so.2 (0x00007f40e4ee3000)
> /lib64/ld-linux-x86-64.so.2 (0x00007f40e64f7000)
>
> --- bt on imapd core dump ----
> #0 0x000000000080e130 in ?? ()
> #1 0x00007fe5a839334f in ssl3_get_message (s=0x80e430, st1=8347825, stn=-1470427072, mt=<optimized out>, max=102400, ok=0x7fffcc974d08) at s3_both.c:522
> #2 0x00007fe5a838ba0d in ssl3_get_key_exchange (s=0x0) at s3_clnt.c:1103
> #3 0x00007fe5a838dff8 in ssl3_connect (s=0x80e430) at s3_clnt.c:316
> #4 0x000000000046a177 in tls_start_clienttls (readfd=16, writefd=16, layerbits=0x7fffcc975104, authid=0x7fffcc975108, ret=0x7e1fa0, sess=0x7e1fa8) at tls.c:1311
> #5 0x00000000004669f4 in do_starttls (s=0x7e16a0, tls_cmd=0x78a4d0 <imap_protocol+208>) at backend.c:201
> #6 0x0000000000467217 in backend_authenticate (s=0x7e16a0, prot=0x78a400 <imap_protocol>, mechlist=0x7fffcc976468, userid=0x7f5c90 "REPLACED_LOGINID", cb=0x80de30, status=0x7fffcc976460) at backend.c:378
> #7 0x0000000000467a1a in backend_connect (ret_backend=0x7e16a0, server=0x7a8960 <partition.17660> "ma03.mail.localhost", prot=0x78a400 <imap_protocol>, userid=0x7f5c90 "REPLACED_LOGINID", cb=0x0, auth_status=0x0) at backend.c:552
> #8 0x0000000000426603 in proxy_findserver (server=0x7a8960 <partition.17660> "ma03.mail.localhost", prot=0x78a400 <imap_protocol>, userid=0x7f5c90 "REPLACED_LOGINID", cache=0x7a3010 <backend_cached>, current=0x7a3008 <backend_current>, inbox=0x7a3000 <backend_inbox>, clientin=0x7be450) at proxy.c:179
> #9 0x0000000000426beb in proxy_findinboxserver (userid=0x7f5b20 "REPLACED_LOGINID") at imap_proxy.c:145
> #10 0x00000000004197c8 in cmd_list (tag=0x7f3720 "42.117", listargs=0x7fffcc977510) at imapd.c:6036
> #11 0x000000000040c9ee in cmdloop () at imapd.c:1574
> #12 0x000000000040aea5 in service_main (argc=2, argv=0x7b9010, envp=0x7fffcc97b650) at imapd.c:946
> #13 0x0000000000409ba4 in main (argc=6, argv=0x7fffcc97b618, envp=0x7fffcc97b650) at service.c:582
> -----------------------------
>
> --------------------------------------------------------------------------------
>
> M.Menge Tel.: (49) 7071/29-70316
> Universität Tübingen Fax.: (49) 7071/29-5912
> Zentrum für Datenverarbeitung mail:
> michael.menge at zdv.uni-tuebingen.de
> Wächterstraße 76
> 72074 Tübingen
>
--
Nic Bernstein nic at onlight.com
Onlight, Inc. www.onlight.com
219 N. Milwaukee St., Suite 2a v. 414.272.4477
Milwaukee, Wisconsin 53202