Minutes of calls for the past couple of weeks
brong at fastmailteam.com
Mon Aug 26 07:31:18 EDT 2019
Since I forgot to send them last week, here are the minutes for this week and last week!
Calls happen at 11am UTC (7am US Eastern, 9pm Australian Eastern) on Mondays at:
Just Bron and ellie this week!
* Fastmail production hit an fd leak in calalarmd's handling of delayed email sending - team effort with Ken and various internal ops and developers to track down the cause. It turns out that the sendmail-pipe backend just wasn’t closing the fds, but a couple of memory leaks were also picked up during the valgrind runs.
* It turns out prot_printamap has been buggy with zero-length strings basically forever! This manifested in cyr_dbtool batch mode occasionally printing foo \n rather than foo ""\n when the value pointer pointed at an atom character even though the length was zero.
* Intermediates remain the gift that keeps on giving. It turns out that “deleted intermediate” T di was a thing in the mailboxes.db - it didn’t clean up, it didn't promote correctly to a non-deleted intermediate when new children were created, and it didn’t appear via JMAP, leading to the children re-parenting up a level and being unmovable/undeletable due to the update pointing to the wrong place! They have been nuked entirely. A deleted folder now doesn’t care if it used to be an intermediate, it’s just deleted - and hence it cleans up!
* EmailSubmission/set was missing the onSuccess* actions due to the side effect of only implementing create initially, but setting up the framework for the other actions. When they got filled out, the later onSuccess handlers were still incomplete. This manifested as a failure of “undo send” to move the email back to Drafts and set the $draft flag.
* A crashed server recovery led to the discovery that sync_crc.basic == 0 and sync_crc.annot == 0 was the default for an empty mailbox, and also the signal for “CRCs not included in this replication command from an old server, just ignore them!”. This led to not noticing a message on the replica when the master was empty. The fix that remains backwards compatible to 2.x is to update the crc_annot code in the new Cyrus to calculate annotation CRCs from a base value which isn’t zero (chose 12345678 because it’s sent in decimal over the wire, so it’s really obvious!) - hence it’s easy to distinguish an empty folder from no data.
* Now that we’re using calalarmd for sending emails 20 seconds after getting them, sending up to 10 seconds early is no longer as small a deal as it was with calendar alarms! Now we look up to 10 seconds ahead to see if something is coming sooner and run again at the next expected time. This means that something created in the last 10 seconds might take a full 10 seconds to run, but otherwise we’ll wake at the time the next predicted action has to run.
* There was a locking inversion with the jmapcache for contacts and calendars. It was caching within a read query over the same table. It now builds a hash and caches after the select is finished. It’s still not 100% failure proof because sqlite3 locking is kinda janky around multiple writers, so it may need to be protected by a namelock. We can’t rely on conversations.db locking because it’s done during a /get and the nice thing about JMAP /get is that it can all run with shared locks and read-only mailboxes now since it “never writes”. Opportunistic caching in this context doesn't really count as a write!
* On another topic… the combined IETF calext / calconnect call will be at 11:00 US Eastern (aka, Philadelphia time where the meeting is) on Wednesday, October 9th.
* fixed the crazy nxm mailboxes.db hit when counting quotas in promstatsd, still more optimisations to experiment with but it’s not completely junk anymore
* fixed the Metadata.shared cass test to expect the 12345678 default annot crc instead of 0
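For the prot_printamap item above, here is a rough Python sketch of the intended behaviour: decide between atom and quoted-string output only after checking the length. The function name format_amap and the simplified atom test are assumptions for illustration, not the actual Cyrus C code.

```python
def format_amap(buf: bytes, length: int) -> bytes:
    """Format the first `length` bytes of buf as an atom or a quoted string.

    Hypothetical helper: a simplified model, not the real prot_printamap.
    """
    data = buf[:length]
    # Check the length first: a zero-length value is never an atom,
    # no matter what byte the buffer happens to start with.
    if length > 0 and all(0x20 < c < 0x7F and c not in b'(){%*"\\]' for c in data):
        return data
    # Simplified quoting: escape backslash and double quote only.
    escaped = data.replace(b"\\", b"\\\\").replace(b'"', b'\\"')
    return b'"' + escaped + b'"'
```

With this ordering, a value whose buffer starts with an atom character but whose length is zero comes out as "" rather than as nothing at all.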
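For the annotation CRC change above, a minimal sketch of why a non-zero base helps. It assumes the folder-level CRC is an XOR of per-annotation CRC32 values, which is an illustrative assumption and may not match how the real sync_crc code combines them. The function names are hypothetical; only the 12345678 base and the "zero means CRCs were not sent" convention come from the notes.

```python
import zlib

ANNOT_CRC_BASE = 12345678  # base value from the notes; easy to spot in decimal
CRC_NOT_SENT = 0           # an old server that omits CRCs looks like zero

def annot_crc(annotations, base=ANNOT_CRC_BASE):
    """Fold per-annotation CRC32 values into one CRC via XOR (assumed scheme)."""
    crc = base
    for entry, value in annotations:
        crc ^= zlib.crc32(f"{entry}\x00{value}".encode())
    return crc

def crcs_match(local, remote):
    """Zero from the remote means 'no CRC supplied', so skip the comparison."""
    if remote == CRC_NOT_SENT:
        return True
    return local == remote
```

With a base of 0, an empty folder on the remote side produces 0 and the comparison is skipped, so a stray annotation on the local side goes unnoticed. With the non-zero base, the empty folder produces 12345678 and the mismatch is detected.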
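For the calalarmd timing change above, the wake-up logic can be sketched roughly as follows. This is an assumption-laden illustration: next_wake and due_now are hypothetical names, times are plain seconds, and the real daemon reads its pending actions from its own database.

```python
LOOKAHEAD = 10  # seconds to look ahead, per the notes

def due_now(now, pending_times):
    """Actions whose scheduled time has arrived; nothing is run early."""
    return sorted(t for t in pending_times if t <= now)

def next_wake(now, pending_times, lookahead=LOOKAHEAD):
    """Wake at the earliest action inside the lookahead window,
    or at the end of the window if nothing is scheduled in it."""
    horizon = now + lookahead
    upcoming = [t for t in pending_times if now < t <= horizon]
    return min(upcoming) if upcoming else horizon
```

An action created after the lookahead query has run is not seen until the next wake-up, which is why something new can wait up to the full 10 seconds.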
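For the jmapcache item above, the "build a hash, then cache after the select is finished" pattern might look like this sketch using Python's sqlite3 module. The objs table, its columns, and the render callback are hypothetical stand-ins, not the actual Cyrus schema or code.

```python
import sqlite3

def get_with_cache(db, ids, render):
    """Read rows, collect cache misses in a dict, and write the cache
    only after the SELECT has finished (hypothetical schema)."""
    results, to_cache = {}, {}
    placeholders = ",".join("?" for _ in ids)
    cur = db.execute(
        f"SELECT id, data, cached FROM objs WHERE id IN ({placeholders})",
        list(ids),
    )
    for obj_id, data, cached in cur:
        if cached is None:
            cached = render(data)
            to_cache[obj_id] = cached  # defer the write; the read is still open
        results[obj_id] = cached
    cur.close()
    # The read query is done, so updating the same table no longer overlaps it.
    with db:
        db.executemany(
            "UPDATE objs SET cached = ? WHERE id = ?",
            [(value, obj_id) for obj_id, value in to_cache.items()],
        )
    return results
```

This only removes the write-inside-read overlap within one connection. It does nothing about several writers contending for the same database, which is where the namelock mentioned above would come in.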
Evidence collected from git logs and internal Slack channels suggests that Ken has mostly been working on Snooze support during the past week, when not involved in debugging expeditions.
Present: ellie, Ken, Bron
* Changed some flags usage in EmailSubmission handling: https://github.com/cyrusimap/cyrus-imapd/pull/2852
* Robert may be able to join remotely for CalConnect
* Biggest thing will be JSContact
* There should be people from Ribose there, so might be worth it!
* There’s a locking problem somewhere in calendar alarms and the Fastmail pusher, where the pusher connects back via JMAP to fetch the calendar data and it triggers an sqldb_exec error saying that the DB is already locked.
* syslog error checking - Ken is having issues with it. Maybe we need to update the tests to only check syslog if the config has syslog checks enabled.
* has finally fixed the rebase issues for the uuid-by-mailbox commits.
* difficult parts are changes to conversations and annotations
* would like to do some more testing, but is close to ready to put on master!
* Discussed what to do with Snooze and how to handle IMAP and Sieve rules.
* updated some virtdomains and sieve docs
* fixed ipurge mboxevents bug
* ye olde promstatsd update is back, 2nd time a charm
* fixed a couple of bugs in my mboxlist_find* api change from a few weeks ago (doh)
Bron Gondwana, CEO, Fastmail Pty Ltd
brong at fastmailteam.com