From brong at fastmailteam.com Tue Aug 13 05:22:02 2019
From: brong at fastmailteam.com (Bron Gondwana)
Date: Tue, 13 Aug 2019 19:22:02 +1000
Subject: Cassandane now requires read access to syslog as the cyrus user
Message-ID: <7b630f31-4d02-4deb-8307-d20ba9fea6a5@www.fastmail.com>

I've just pushed some changes to Cassandane to scan syslog for known "badness" (basically anything with ERROR in it for now, but it's easy to add other things).

Unfortunately, that does require that the cyrus user can access the syslog. I've added the template to cassandane.ini.example to choose which file to look at, default is /var/log/syslog.

On my Ubuntu machine I just ran "adduser cyrus adm" to grant it access - I'm sure there's similar tricks on other machines, or syslog can be set up to send just cyrus data to a special file and point cassandane there.

This allows Instance to watch syslog and die if there are any unhandled IOERROR / DBERROR / SYNCERROR lines, which picks up new sets of bugs automatically :)

There are a couple of cases already where I've put in tests which drain the syslog ($self->{instance}->getsyslog) and check for the errors they expect. There will need to be more, so some tests will be failing right now.

Cheers,

Bron.

--
  Bron Gondwana, CEO, Fastmail Pty Ltd
  brong at fastmailteam.com

From brong at fastmailteam.com Mon Aug 26 07:31:18 2019
From: brong at fastmailteam.com (Bron Gondwana)
Date: Mon, 26 Aug 2019 21:31:18 +1000
Subject: Minutes of calls for the past couple of weeks
Message-ID: <6ff392ad-f654-4028-9255-fac64bce4084@www.fastmail.com>

Since I forgot to send them last week, here's minutes for this week and last week!

Calls happen at 11am UTC (7am US Eastern, 9pm Australian Eastern) on Mondays at: https://zoom.us/j/598343302

2019-08-26:

Just Bron and ellie this week!

Bron:

* Fastmail production hit an fd leak in calalarmd handling of email sending of delayed emails - team effort with Ken and various internal ops and developers to track down the cause. It turns out that the sendmail-pipe backend just wasn't closing the fds, but a couple of memory leaks were also picked up during the valgrind runs.
* It turns out prot_printamap was buggy with zero length strings basically forever! This manifested in cyr_dbtool batch mode occasionally printing foo \n rather than foo ""\n when the value pointer was to an atom character even though the length was zero.
* Intermediates remain the gift that keeps on giving. It turns out that "deleted intermediate" T di was a thing in the mailboxes.db - it didn't clean up, it didn't promote correctly to a non-deleted intermediate when new children were created, and it didn't appear via JMAP, leading to the children re-parenting up a level and being unmovable/undeletable due to the update pointing to the wrong place! They have been nuked entirely. A deleted folder now doesn't care if it used to be an intermediate, it's just deleted - and hence it cleans up!
* EmailSubmission/set was missing the onSuccess* actions due to the side effect of only implementing create initially, but setting up the framework for the other actions. When they got filled out, the later onSuccess handlers were still incomplete. This manifested as a failure of "undo send" to move the email back to Drafts and set the $draft flag.
* A crashed server recovery led to the discovery that sync_crc.basic == 0 and sync_crc.annot == 0 was the default for an empty mailbox, and also the signal for "CRCs not included in this replication command from an old server, just ignore them!". This led to not noticing a message on the replica when the master was empty.
  The fix that remains backwards compatible to 2.x is to update the crc_annot code in the new Cyrus to calculate annotation CRCs from a base value which isn't zero (chose 12345678 because it's sent in decimal over the wire, so it's really obvious!) - hence it's easy to distinguish an empty folder from no data.
* Now that we're using calalarmd for sending emails 20 seconds after getting them, sending up to 10 seconds early is no longer as small a deal as it was with calendar alarms! Now we look up to 10 seconds ahead to see if something is coming sooner and run again at the next expected time. This means that something created in the last 10 seconds might take a full 10 seconds to run, but otherwise we'll wake at the time the next predicted action has to run.
* There was a locking inversion with the jmapcache for contacts and calendars. It was caching within a read query over the same table. It now builds a hash and caches after the select is finished. It's still not 100% failure proof because sqlite3 locking is kinda janky around multiple writers, so it may need to be protected by a namelock. We can't rely on conversations.db locking because it's done during a /get and the nice thing about JMAP /get is that it can all run with shared locks and read-only mailboxes now since it "never writes". Opportunistic caching in this context doesn't really count as a write!
* On another topic... the combined IETF calext / calconnect call will be at 11:00 US Eastern (aka, Philadelphia time where the meeting is) on Wednesday, October 9th.

ellie:

* fixed the crazy nxm mailboxes.db hit when counting quotas in promstatsd, still more optimisations to experiment with but it's not completely junk anymore
* fixed the Metadata.shared cass test to expect the 12345678 default annot crc instead of 0

Evidence collected from git logs and internal slack channels suggests that Ken has mostly been working on Snooze support during the past week, when not involved in debugging expeditions.
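
To illustrate the sync_crc fix in the 2019-08-26 notes above, here is a minimal Python sketch. It is not the Cyrus crc_annot code (which is C): the XOR-of-CRC32 folding, the function names annot_crc and crcs_match, and the tuple layout of an annotation are all assumptions made for illustration. Only the base value 12345678 and the "0 means not sent" convention come from the minutes.

```python
import zlib

# From the minutes: 0 on the wire doubles as "CRC not included, ignore it".
CRC_NOT_SENT = 0
# From the minutes: the new non-zero base for annotation CRCs.
ANNOT_CRC_BASE = 12345678


def annot_crc(annotations, base=ANNOT_CRC_BASE):
    """Fold one CRC32 per annotation onto a base value with XOR.

    Hypothetical stand-in for the real calculation; each annotation is
    an (entry, userid, value) tuple of strings.
    """
    crc = base
    for entry, userid, value in annotations:
        data = f"{entry}\0{userid}\0{value}".encode()
        crc ^= zlib.crc32(data)
    return crc


def crcs_match(local, remote):
    """Compare CRCs, skipping the check when the peer sent none."""
    if remote == CRC_NOT_SENT:
        return True  # treated as "old server, no CRC supplied"
    return local == remote


# Replica still holds an annotation; master mailbox is empty.
replica = annot_crc([("/specialuse", "admin", "\\Sent")])
old_style_empty = annot_crc([], base=0)   # empty mailbox with a zero base
new_style_empty = annot_crc([])           # empty mailbox with the new base

print(crcs_match(replica, old_style_empty))  # mismatch goes unnoticed
print(crcs_match(replica, new_style_empty))  # mismatch is detected
```

With a zero base, the empty master is indistinguishable from "no CRC supplied", so the stale replica passes the check; with a non-zero base, the same comparison fails as it should.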

2019-08-19:

Present: ellie, Ken, Bron

Bron:

* Changed some flags usage in EmailSubmission handling: https://github.com/cyrusimap/cyrus-imapd/pull/2852
* Robert may be able to join remotely for CalConnect
* Biggest thing will be JSContact
* There should be people from Ribose there, so might be worth it!
* There's a locking problem somewhere in calendar alarms and the Fastmail pusher where the pusher connects back via JMAP to fetch the calendar data and it triggers an sqldb_exec error saying that the DB is already locked.
* syslog error checking - Ken is having issues with it. Maybe we need to update the tests to only check syslog if the config has syslog checks enabled.

Ken:

* has finally fixed the rebase issues for the uuid-by-mailbox commits.
* difficult parts are changes to conversations and annotations
* would like to do some more testing, but is close to ready to put on master!
* Discussed what to do with Snooze and how to handle IMAP and Sieve rules.

ellie:

* updated some virtdomains and sieve docs
* fixed ipurge mboxevents bug
* ye olde promstatsd update is back, 2nd time a charm
* fixed a couple of bugs in my mboxlist_find* api change from a few weeks ago (doh)

--
  Bron Gondwana, CEO, Fastmail Pty Ltd
  brong at fastmailteam.com

From thomas.cataldo at bluemind.net Wed Aug 28 06:04:34 2019
From: thomas.cataldo at bluemind.net (Thomas Cataldo)
Date: Wed, 28 Aug 2019 12:04:34 +0200
Subject: replication together with object storage (cyrus 3.0)
Message-ID: <82459351-E135-4675-B4F4-5E398BEDB0C6@bluemind.net>

Hi,

We are working with a replication enabled cyrus 3.0.8. Our replication endpoint is not another cyrus but I don't think it matters here.

I am working on enabling object storage support with our setup.

Our replication settings look like this:

sync_log: 1
sync_log_channels: core
core_sync_authname: admin0
core_sync_password: xxxxx
core_sync_realm: repl_realm
core_sync_host: 172.16.167.129
core_sync_port: 2501
core_sync_repeat_interval: 0
core_sync_try_imap: 0

We made a copy of objectstore_dummy.c and simplified it to hardcode the path to /dummy-sds/ for testing purposes.

The object storage/archive setup looks like this:

object_storage_enabled: 1
archive_enabled: 1
archive_days: 0
archive_maxsize: 0
archive_keepflagged: 0

From an imap point of view, everything works fine, emails are written through the object storage and are read from there.

# ls -l /dummy-sds/
...
-rw------- 1 cyrus mail 1491 Aug 28 11:35 ae3aaefa50d04042bc369e29e9c819ed600d3a03
-rw------- 1 cyrus mail 863 Aug 28 11:35 f996f4ad4e4c37c3c2cc852214fdeca778d9c43f

The problem occurs when sync_client triggers its replication code.

From our logs we see:

Aug 28 11:35:23 bm1804 cyrus/lmtp[49143]: creating sql_db /var/spool/cyrus/data/bm-master__devenv_blue/domain/d/devenv.blue/t/user/tom/message.db
Aug 28 11:35:23 bm1804 cyrus/imap[116313]: USAGE admin at devenv.blue user: 0.000000 sys: 0.010090
Aug 28 11:35:23 bm1804 cyrus/sync_client[60125]: MAILBOXES devenv.blue!user.admin.Sent devenv.blue!user.tom
Aug 28 11:35:23 bm1804 cyrus/lmtp[49143]: Delivered: to mailbox: devenv.blue!user.tom
Aug 28 11:35:23 bm1804 cyrus/sync_client[60125]: MAILBOX devenv.blue!user.admin.Sent
Aug 28 11:35:23 bm1804 cyrus/sync_client[60125]: IOERROR: Failed to read file /var/spool/cyrus/data/bm-master__devenv_blue/domain/d/devenv.blue/a/user/admin/Sent/4.
Aug 28 11:35:23 bm1804 cyrus/lmtp[49143]: USAGE tom at devenv.blue user: 0.000000 sys: 0.011507
Aug 28 11:35:23 bm1804 cyrus/sync_client[60125]: MAILBOX devenv.blue!user.tom
Aug 28 11:35:23 bm1804 cyrus/sync_client[60125]: IOERROR: Failed to read file /var/spool/cyrus/data/bm-master__devenv_blue/domain/d/devenv.blue/t/user/tom/1.
Aug 28 11:35:23 bm1804 cyrus/sync_client[60125]: MAILBOXES devenv.blue!user.tom
Aug 28 11:35:23 bm1804 cyrus/imap[33172]: session initialised for admin at devenv.blue
Aug 28 11:35:23 bm1804 cyrus/imap[33172]: login: bm1804.devenv.blue [172.16.167.129] admin at devenv.blue PLAIN User logged in SESSIONID=

From a protocol point of view we receive (REPL C is what sync_client sends us, REPL S is what we respond):

2019-08-28 09:35:23,310 [vert.x-eventloop-thread-1] n.b.b.c.r.s.ReplicationSession INFO - REPL C: [frame-00000001]: GET MAILBOXES (devenv.blue!user.admin.Sent devenv.blue!user.tom)
2019-08-28 09:35:23,326 [vert.x-eventloop-thread-1] n.b.b.c.r.s.ReplicationSession INFO - REPL S: [frame-00000001]: * MAILBOX %(UNIQUEID 2971165a-f29e-46f1-8d97-ec2f62eb3e94 MBOXNAME devenv.blue!user.admin.Sent SYNC_CRC 583232180 SYNC_CRC_ANNOT 0 LAST_UID 3 HIGHESTMODSEQ 6 RECENTUID 3 RECENTTIME 1566981904 LAST_APP... [truncated]
2019-08-28 09:35:23,329 [vert.x-eventloop-thread-1] n.b.b.c.r.s.ReplicationSession INFO - REPL C: [frame-00000002]: APPLY RESERVE %(PARTITION bm-master__devenv_blue MBOXNAME (devenv.blue!user.admin.Sent devenv.blue!user.tom) GUID (f996f4ad4e4c37c3c2cc852214fdeca778d9c43f ae3aaefa50d04042bc369e29e9c819ed600d3a03))
2019-08-28 09:35:23,332 [vert.x-eventloop-thread-1] n.b.b.c.r.s.ReplicationSession INFO - REPL S: [frame-00000002]: * MISSING (f996f4ad4e4c37c3c2cc852214fdeca778d9c43f ae3aaefa50d04042bc369e29e9c819ed600d3a03) OK success
2019-08-28 09:35:23,334 [vert.x-eventloop-thread-1] n.b.b.c.r.s.ReplicationSession INFO - REPL C: [frame-00000003]: APPLY MESSAGE (NIL)
2019-08-28 09:35:23,334 [vert.x-eventloop-thread-1] n.b.b.c.r.s.ReplicationSession INFO - REPL S: [frame-00000003]: OK success
2019-08-28 09:35:23,337 [vert.x-eventloop-thread-1] n.b.b.c.r.s.ReplicationSession INFO - REPL C: [frame-00000004]: APPLY MAILBOX %(UNIQUEID 2971165a-f29e-46f1-8d97-ec2f62eb3e94 MBOXNAME devenv.blue!user.admin.Sent SYNC_CRC 3650189183 SYNC_CRC_ANNOT 0 LAST_UID 4
HIGHESTMODSEQ 7 RECENTUID 3 RECENTTIME 1566981904 LAST_APPENDDATE 1566984923 POP3_LAST_LOGIN 0 POP3_SHOW_AFTER 0 UIDVALIDITY 1566980951 PARTITION bm-master__devenv_blue ACL "admin0 lrswipkxtecdan E8F1E28F-8778-442B-A581-C6B613CE9555 at devenv.blue lrswipktecdan " OPTIONS P ANNOTATIONS (%(ENTRY /specialuse USERID admin at devenv.blue VALUE {5+}{t28.bin})) RECORD (%(UID 4 MODSEQ 7 LAST_UPDATED 1566984923 FLAGS (\Seen) INTERNALDATE 1566984923 SIZE 863 GUID f996f4ad4e4c37c3c2cc852214fdeca778d9c43f)))

The problem seems pretty "obvious": sync_client sends APPLY RESERVE for the 2 bodies (from the Sent folder + the copy for the recipient inbox), then it tries to read the bodies, but I imagine the sync_client code is not correctly object-storage enabled and only tries the local filesystem path instead of asking the object storage.

It could also be an expected behaviour, as the replica(s) can share the object storage with the master and should reply that nothing is missing.

What is your point of view on that? Could you give me a pointer to where in the sync_client code I should look to "object storage"-enable it?

Quite unrelated, but we enjoyed reading your stuff about sync improvements and having multiple sync streams between master & replica. If we can help on that we would be happy to.

Regards,
Thomas.

Thomas Cataldo
Directeur Technique
(+33) 6 42 25 91 38
BlueMind
+33 (0)5 81 91 55 60
Hotel des Télécoms, 40 rue du village d'entreprises
31670 Labège, France
www.bluemind.net / https://blog.bluemind.net/fr/
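
The behaviour Thomas suspects is missing (try the local spool path, then fall back to the object store keyed by message GUID) can be sketched as follows. This is a hedged illustration in Python, not Cyrus code: sync_client is written in C, and the names read_message_body and make_dir_store are hypothetical. The only detail taken from the message is that the test object store keeps one file per GUID in a flat directory, as /dummy-sds/ does.

```python
from pathlib import Path
from typing import Callable, Optional

# A fetcher takes a message GUID and returns the body, or None if absent.
Fetcher = Callable[[str], Optional[bytes]]


def make_dir_store(root: Path) -> Fetcher:
    """Hypothetical object store: one file per GUID in a flat directory."""
    def fetch(guid: str) -> Optional[bytes]:
        candidate = root / guid
        try:
            return candidate.read_bytes()
        except FileNotFoundError:
            return None
    return fetch


def read_message_body(spool_path: Path, guid: str, fetch: Fetcher) -> bytes:
    """Read a message from the local spool, else from the object store."""
    try:
        return spool_path.read_bytes()
    except FileNotFoundError:
        pass  # archived messages are not on the local partition
    body = fetch(guid)
    if body is None:
        raise FileNotFoundError(
            f"{spool_path} not in spool and GUID {guid} not in object store"
        )
    return body
```

A caller would pass the spool path it already computes today plus the record's GUID; whether the real fix belongs in sync_client or in a shared message-loading helper is a question for the Cyrus developers.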