From rjbs at fastmailteam.com Mon Nov 4 20:04:25 2019
From: rjbs at fastmailteam.com (Ricardo Signes)
Date: Mon, 04 Nov 2019 20:04:25 -0500
Subject: time for cyrus-imap v3.2?
Message-ID: <2993c77c-cb3a-4634-a21f-b965baf3eb1d@dogfood.fastmail.com>

So, I think the plan was to cut a stable Cyrus 3.2 after we had stable JMAP. Is that time now? We talked about this on the Zoom call today.

Cyrus master is pretty stable for JMAP core and mail. I think we need to do one more pass through to look for places where Cyrus extensions might leak through without the correct `using` options, but apart from that, I don't think we expect its mail API to change apart from bugfixes.

The other part of the conversation was declaring pre-3 releases EOL except for security fixes.

I don't have much of a horse in this race, but it felt like a bit of a looming question.

-- 
Ricardo Signes (rjbs)

From brong at fastmailteam.com Tue Nov 5 00:44:42 2019
From: brong at fastmailteam.com (Bron Gondwana)
Date: Tue, 05 Nov 2019 16:44:42 +1100
Subject: time for cyrus-imap v3.2?
In-Reply-To: <2993c77c-cb3a-4634-a21f-b965baf3eb1d@dogfood.fastmail.com>
References: <2993c77c-cb3a-4634-a21f-b965baf3eb1d@dogfood.fastmail.com>
Message-ID: <586c93a2-0de0-4ed2-b68b-36d7756eeb24@www.fastmail.com>

On Tue, Nov 5, 2019, at 12:04, Ricardo Signes wrote:
> So, I think the plan was to cut a stable Cyrus 3.2 after we had stable JMAP. Is that time now? We talked about this on the Zoom call today.

I think we're pretty close to it. The big question is: do we fork what will eventually become 3.2 and keep stabilising on it while we ship UUID mailboxes on master, or do we finish 3.2 before we merge uuid mailboxes?

> Cyrus master is pretty stable for JMAP core and mail. I think we need to do one more pass through to look for places where Cyrus extensions might leak through without the correct `using` options, but apart from that, I don't think we expect its mail API to change apart from bugfixes.

Yep, legit. The one big thing still missing there is PushSubscriptions. I'd be keen to finish writing that. I mean:

https://github.com/cyrusimap/cyrus-imapd/issues?q=is%3Aopen+is%3Aissue+label%3A3.2

We should probably do a push and resolve all of those, then boom let's go.

> The other part of the conversation was declaring pre-3 releases EOL except for security fixes.
>
> I don't have much of a horse in this race, but it felt like a bit of a looming question.

Generally the pattern has been "current stable and oldstable are supported" - so that would be 3.2 and 3.0 once we release 3.2. I'm good with that.

Bron.

-- 
Bron Gondwana, CEO, Fastmail Pty Ltd
brong at fastmailteam.com

From thomas.cataldo at bluemind.net Tue Nov 5 01:30:47 2019
From: thomas.cataldo at bluemind.net (Thomas Cataldo)
Date: Tue, 5 Nov 2019 07:30:47 +0100
Subject: Which imap command to rename a root mailbox while maintaining its partition
In-Reply-To: <6d9174c8-a1f2-9268-571f-e2459fd08cc5@fastmail.com>
References: <13F3250B-9B46-4C37-81DF-2F68A5D93F0B@bluemind.net> <6d9174c8-a1f2-9268-571f-e2459fd08cc5@fastmail.com>
Message-ID: <742CEDA9-4792-4176-B938-234A7808FDD1@bluemind.net>

> On 29 Oct 2019, at 13:13, Ken Murchison wrote:
>
> x RENAME
>
> should work

Agree, but it does not :-)

At least with version 3.0.8:

localhost> info user/ren@devenv.blue
{user/ren@devenv.blue}:
  private:
    check: NIL
    checkperiod: NIL
    comment: NIL
    sort: NIL
    specialuse: NIL
    thread: NIL
    expire: NIL
    news2mail: NIL
    sieve: NIL
    squat: NIL
  shared:
    check: NIL
    checkperiod: NIL
    comment: NIL
    sort: NIL
    specialuse: NIL
    thread: NIL
    annotsize: 0
    duplicatedeliver: false
    expire: NIL
    lastpop: NIL
    lastupdate: 4-Nov-2019 15:32:13 +0000
    news2mail: NIL
    partition: bm-master__devenv_blue
    pop3newuidl: true
    pop3showafter: NIL
    sharedseen: false
    sieve: NIL
    size: 32310
    squat: NIL
    synccrcs: 2599665889 0
    uniqueid: ee8ede37-153a-4650-bf94-3da7d4f52043

An IMAP session as admin:

telnet localhost 1143
Trying ::1...
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
* OK [CAPABILITY IMAP4rev1 LITERAL+ ID ENABLE AUTH=PLAIN SASL-IR] server ready
. login admin0 admin
. OK [CAPABILITY IMAP4rev1 LITERAL+ ID ENABLE ACL RIGHTS=kxten QUOTA MAILBOX-REFERRALS NAMESPACE UIDPLUS NO_ATOMIC_RENAME UNSELECT CHILDREN MULTIAPPEND BINARY CATENATE CONDSTORE ESEARCH SEARCH=FUZZY SORT SORT=MODSEQ SORT=DISPLAY SORT=UID THREAD=ORDEREDSUBJECT THREAD=REFERENCES THREAD=REFS ANNOTATEMORE ANNOTATE-EXPERIMENT-1 METADATA LIST-EXTENDED LIST-STATUS LIST-MYRIGHTS LIST-METADATA WITHIN QRESYNC SCAN XLIST XMOVE MOVE SPECIAL-USE CREATE-SPECIAL-USE DIGEST=SHA1 X-REPLICATION URLAUTH URLAUTH=BINARY LOGINDISABLED COMPRESS=DEFLATE X-QUOTA=STORAGE X-QUOTA=MESSAGE X-QUOTA=X-ANNOTATION-STORAGE X-QUOTA=X-NUM-FOLDERS IDLE] User logged in SESSIONID=
. RENAME user/ren@devenv.blue user/rename@devenv.blue bm-master__devenv_blue
. NO Cross-server or cross-partition move w/rename not supported
. RENAME user/ren@devenv.blue user/rename@devenv.blue
* OK rename user/ren@devenv.blue user/rename@devenv.blue
* OK rename user/ren/Drafts@devenv.blue user/rename/Drafts@devenv.blue
* OK rename user/ren/Junk@devenv.blue user/rename/Junk@devenv.blue
* OK rename user/ren/Outbox@devenv.blue user/rename/Outbox@devenv.blue
* OK rename user/ren/Sent@devenv.blue user/rename/Sent@devenv.blue
* OK rename user/ren/Trash@devenv.blue user/rename/Trash@devenv.blue
. OK Completed

But if I use the version without an explicit partition, the new mailbox ends up in the default partition:

localhost> info user/rename@devenv.blue
{user/rename@devenv.blue}:
  private:
    check: NIL
    checkperiod: NIL
    comment: NIL
    sort: NIL
    specialuse: NIL
    thread: NIL
    expire: NIL
    news2mail: NIL
    sieve: NIL
    squat: NIL
  shared:
    check: NIL
    checkperiod: NIL
    comment: NIL
    sort: NIL
    specialuse: NIL
    thread: NIL
    annotsize: 0
    duplicatedeliver: false
    expire: NIL
    lastpop: NIL
    lastupdate: 4-Nov-2019 16:43:36 +0000
    news2mail: NIL
    partition: default
    pop3newuidl: true
    pop3showafter: NIL
    sharedseen: false
    sieve: NIL
    size: 32310
    squat: NIL
    synccrcs: 2599665889 0
    uniqueid: ee8ede37-153a-4650-bf94-3da7d4f52043

which forces me to issue a second command in my IMAP session:

. RENAME user/rename@devenv.blue user/rename@devenv.blue bm-master__devenv_blue
* OK rename user/rename@devenv.blue user/rename@devenv.blue
* OK rename user/rename/Drafts@devenv.blue user/rename/Drafts@devenv.blue
* OK rename user/rename/Junk@devenv.blue user/rename/Junk@devenv.blue
* OK rename user/rename/Outbox@devenv.blue user/rename/Outbox@devenv.blue
* OK rename user/rename/Sent@devenv.blue user/rename/Sent@devenv.blue
* OK rename user/rename/Trash@devenv.blue user/rename/Trash@devenv.blue
. OK Completed

Which moves the mailbox to the partition where I want it (its original one).

The problem with the non-atomic rename is that our replication target receives data belonging to the default partition, which is not desired or expected.
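The two-step workaround described above can be sketched as a small helper that only builds the command lines, without talking to a server. This is a minimal illustration, not anything from Cyrus itself: the function name `rename_commands`, the example mailbox names and the partition name `part1` are hypothetical, the optional third RENAME argument is the Cyrus-specific partition form seen in the session above, and no IMAP quoting of mailbox names is attempted.

```python
def rename_commands(old, new, partition, tag="."):
    """Build the two IMAP command lines of the workaround (hypothetical helper).

    Step 1 renames the user mailbox; on the affected version it lands on
    the default partition. Step 2 renames the mailbox onto itself with an
    explicit partition argument, which moves it back where it belongs.
    """
    return [
        f"{tag} RENAME {old} {new}",
        f"{tag} RENAME {new} {new} {partition}",
    ]


if __name__ == "__main__":
    # Example values are made up for illustration only.
    for line in rename_commands(
        "user/old@example.com", "user/new@example.com", "part1"
    ):
        print(line)
```

Between the two commands the mailbox sits on the default partition, which is exactly the window in which a replication target can see data for the wrong partition.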
Thomas Cataldo
Directeur Technique

(+33) 6 42 25 91 38

BlueMind
+33 (0)5 81 91 55 60
Hotel des Télécoms, 40 rue du village d'entreprises
31670 Labège, France
www.bluemind.net / https://blog.bluemind.net/fr/

From brong at fastmailteam.com Tue Nov 5 03:08:27 2019
From: brong at fastmailteam.com (Bron Gondwana)
Date: Tue, 05 Nov 2019 19:08:27 +1100
Subject: Re: Which imap command to rename a root mailbox while maintaining its partition
In-Reply-To: <742CEDA9-4792-4176-B938-234A7808FDD1@bluemind.net>
References: <13F3250B-9B46-4C37-81DF-2F68A5D93F0B@bluemind.net> <6d9174c8-a1f2-9268-571f-e2459fd08cc5@fastmail.com> <742CEDA9-4792-4176-B938-234A7808FDD1@bluemind.net>
Message-ID:

Wow - this looks like a bug in partition selection for user rename then :( We should fix that.

https://github.com/cyrusimap/cyrus-imapd/issues/2907

Cheers,

Bron.

-- 
Bron Gondwana, CEO, Fastmail Pty Ltd
brong at fastmailteam.com

From michael.menge at zdv.uni-tuebingen.de Tue Nov 5 06:56:51 2019
From: michael.menge at zdv.uni-tuebingen.de (Michael Menge)
Date: Tue, 05 Nov 2019 12:56:51 +0100
Subject: time for cyrus-imap v3.2?
In-Reply-To: <586c93a2-0de0-4ed2-b68b-36d7756eeb24@www.fastmail.com>
References: <2993c77c-cb3a-4634-a21f-b965baf3eb1d@dogfood.fastmail.com> <586c93a2-0de0-4ed2-b68b-36d7756eeb24@www.fastmail.com>
Message-ID: <20191105125651.Horde.4rXtHfxEcnRCUcTjGN08mnd@webmail.uni-tuebingen.de>

Hi all,

there are some bugs in cyrus 3.0/3.1 that I would like to see fixed, and I want to make sure that these changes can either be included after 3.2 is released or will be fixed before 3.2 is released:

#2659 allow rename back from deleted mailbox when conversations is enabled
#2599 bug renaming/deleting special use folders in murder setup
#2598 squat search_engine not used

Also, fixing "#2774 Murder does not work with TLS" would be appreciated; if that is not possible, the murder documentation should at least be updated.

Quoting my mail
https://lists.andrew.cmu.edu/pipermail/cyrus-devel/2018-July/004297.html

> Quoting ellie timoney :
>>
>> Anyway, it looks to me like the STARTTLS support in mupdate is just
>> fundamentally broken at the moment, and my recommendation is to
>> not use it. If your IMAP servers need to connect to an mupdate
>> server that's not within their trusted network, I guess you'd need
>> to set up a VPN for it or something along those lines (but I'm no
>> network specialist).
>>
> could you add a warning in the relevant murder/installation guides
> and manuals?
Quoting Bron Gondwana :

> On Tue, Nov 5, 2019, at 12:04, Ricardo Signes wrote:
>> So, I think the plan was to cut a stable Cyrus 3.2 after we had
>> stable JMAP. Is that time now? We talked about this on the Zoom
>> call today.
>
> I think we're pretty close to it. The big question is: do we fork
> what will eventually become 3.2 and keep stabilising on it while we
> ship UUID mailboxes on master, or do we finish 3.2 before we merge
> uuid mailboxes.
>
>> Cyrus master is pretty stable for JMAP core and mail. I think we
>> need to do one more pass through to look for places where Cyrus
>> extensions might leak through without the correct `using` options,
>> but apart from that, I don't think we expect its mail API to change
>> apart from bugfixes.
>
> Yep, legit. The one big thing still missing there is
> PushSubscriptions. I'd be keen to finish writing that. I mean:
>
> https://github.com/cyrusimap/cyrus-imapd/issues?q=is%3Aopen+is%3Aissue+label%3A3.2
>
> We should probably do a push and resolve all of those, then boom let's go.

--------------------------------------------------------------------------------
M.Menge                                Tel.: (49) 7071/29-70316
Universität Tübingen                   Fax.: (49) 7071/29-5912
Zentrum für Datenverarbeitung          mail: michael.menge at zdv.uni-tuebingen.de
Wächterstraße 76
72074 Tübingen

From brong at fastmailteam.com Tue Nov 5 07:25:59 2019
From: brong at fastmailteam.com (Bron Gondwana)
Date: Tue, 05 Nov 2019 23:25:59 +1100
Subject: time for cyrus-imap v3.2?
In-Reply-To: <20191105125651.Horde.4rXtHfxEcnRCUcTjGN08mnd@webmail.uni-tuebingen.de>
References: <2993c77c-cb3a-4634-a21f-b965baf3eb1d@dogfood.fastmail.com> <586c93a2-0de0-4ed2-b68b-36d7756eeb24@www.fastmail.com> <20191105125651.Horde.4rXtHfxEcnRCUcTjGN08mnd@webmail.uni-tuebingen.de>
Message-ID: <8ec13b46-cf19-4681-806a-c708c2d376d6@www.fastmail.com>

I've tagged those 4 issues for 3.2.

We're going to try to work out what work is necessary for 3.2 to be done, so knowing that these are important is valuable.

Cheers,

Bron.
-- 
Bron Gondwana, CEO, Fastmail Pty Ltd
brong at fastmailteam.com
From me at anatoli.ws Tue Nov 5 11:44:46 2019
From: me at anatoli.ws (Anatoli)
Date: Tue, 5 Nov 2019 13:44:46 -0300
Subject: time for cyrus-imap v3.2?
In-Reply-To: <8ec13b46-cf19-4681-806a-c708c2d376d6@www.fastmail.com>
References: <2993c77c-cb3a-4634-a21f-b965baf3eb1d@dogfood.fastmail.com> <586c93a2-0de0-4ed2-b68b-36d7756eeb24@www.fastmail.com> <20191105125651.Horde.4rXtHfxEcnRCUcTjGN08mnd@webmail.uni-tuebingen.de> <8ec13b46-cf19-4681-806a-c708c2d376d6@www.fastmail.com>
Message-ID:

Hi All!

Bron, for deployments I manage these issues are also important:

* #1763 (Backups for SMB (lock entire server for a moment while taking a snapshot)). Don't know if there was any progress on this. Basically, a short (milliseconds to a few seconds) global write lock is needed on all data structures.

* #1765 (Move SNMP out from master into a separate daemon) and related pending PR #2100. Ellie had significant progress on this, don't know what's blocking it, but this issue basically blocks any further work on privilege separation like chroot as the main process should retain root while running and the forked children should proceed with setuid & chroot.

* #2373 (Shared xDAV (CalDAV/CardDAV) resources are not discoverable). Dilyan Palauzov sent a diff for this in the github repo and there was a discussion with Ken on possible implementations (shared xDAV resources): https://lists.andrew.cmu.edu/pipermail/cyrus-devel/2018-May/004263.html. I guess it had enough progress to try to close it.

* #2372 ([FR] ACL on autocreate folders). Basically, for automatic "anyone p" ACL in plus+addressing folders.

And there are 46 open PRs in the repo. Maybe they could be reviewed and merged too?

Regards,
Anatoli

From brong at fastmailteam.com Tue Nov 5 16:20:10 2019
From: brong at fastmailteam.com (Bron Gondwana)
Date: Wed, 06 Nov 2019 08:20:10 +1100
Subject: time for cyrus-imap v3.2?
In-Reply-To:
References: <2993c77c-cb3a-4634-a21f-b965baf3eb1d@dogfood.fastmail.com> <586c93a2-0de0-4ed2-b68b-36d7756eeb24@www.fastmail.com> <20191105125651.Horde.4rXtHfxEcnRCUcTjGN08mnd@webmail.uni-tuebingen.de> <8ec13b46-cf19-4681-806a-c708c2d376d6@www.fastmail.com>
Message-ID: <00a9b398-b305-4bb4-90a0-b528c2d21651@www.fastmail.com>

On Wed, Nov 6, 2019, at 03:44, Anatoli via Cyrus-devel wrote:
> Hi All!
>
> Bron, for deployments I manage these issues are also important:

First of all - thanks for writing this up. It really helps!

> * #1763 (Backups for SMB (lock entire server for a moment while taking a
> snapshot)). Don't know if there was any progress on this. Basically, a
> short (milliseconds to a few seconds) global write lock is needed on all
> data structures.

This is not easy unfortunately with all the different datastructures, because it means that everything else which takes a lock is going to need to first take a global shared lock before it does anything else, and that's going to have a performance and complexity impact on everything - because you have to find them ALL or you might wind up with lock inversions down the line.

> * #1765 (Move SNMP out from master into a separate daemon) and related
> pending PR #2100. Ellie had significant progress on this, don't know
> what's blocking it, but this issue basically blocks any further work on
> privilege separation like chroot as the main process should retain root
> while running and the forked children should proceed with setuid & chroot.

Good point - this is something that Greg was close to having done many years ago, but we're not using SNMP so it hasn't caused us stress. Happy to put that on the consideration list for 3.2.
The downside of making the list of tasks for 3.2 really long is that it could block releasing something which is otherwise still a good improvement over 3.0 and not a regression... *sigh*. But this one will be a good win, so let's do it!

> * #2373 (Shared xDAV (CalDAV/CardDAV) resources are not discoverable).
> Dilyan Palauzov sent a diff for this in the github repo and there was a
> discussion with Ken on possible implementations (shared xDAV resources):
> https://lists.andrew.cmu.edu/pipermail/cyrus-devel/2018-May/004263.html.
> I guess it had enough progress to try to close it.

Labeled. I'm keen to have an answer to it somehow or other.

> * #2372 ([FR] ACL on autocreate folders). Basically, for automatic
> "anyone p" ACL in plus+addressing folders.

Yep - labeled. OK, the hard bit here isn't implementing (as ellie pointed out) - it's design. We want to make sure we create an interface that people can keep using reliably into the future. I'll have a chat with ellie about this one.

> And there are 46 open PRs in the repo. Maybe they could be reviewed and
> merged too?

Yeah, maybe! Frustratingly the next couple of Cyrus call times aren't going to work for me: I've got a 7am Melbourne time meeting next Tuesday, then I'll be in Singapore for IETF where the Cyrus meeting time is 5am.

One downside of pretty much everyone involved in direct Cyrus development being at Fastmail is that we discuss a lot of things in our private slack channel or internal mailing lists where we don't have to be quite so careful about stripping anything that could identify an internal customer... but it does create an impression that there's less happening than you'd otherwise see... and I haven't even posted the meeting minutes recently because they've been taken into a Dropbox paper doc and then languished there :(

Sorry.

Cheers,

Bron.

-- 
Bron Gondwana, CEO, Fastmail Pty Ltd
brong at fastmailteam.com
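The locking concern raised in the reply to #1763 (every code path that takes a lock would first have to take a global shared lock, so that a snapshot can briefly take it exclusively) can be illustrated with a small sketch. This is only an illustration of the general pattern using `flock` on a single lock file; it is not how Cyrus implements locking, and the names `global_lock`, `try_snapshot` and the lock file path are hypothetical.

```python
import fcntl
import os
from contextlib import contextmanager


@contextmanager
def global_lock(path, exclusive=False):
    """Hold a shared (default) or exclusive flock on the global lock file.

    Assumption for this sketch: every writer wraps its work in the shared
    form *before* taking any finer-grained lock, so the lock order is
    always global first, then per-mailbox. Missing one call site is what
    opens the door to lock inversions.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX if exclusive else fcntl.LOCK_SH)
        yield
    finally:
        os.close(fd)  # closing the descriptor releases the lock


def try_snapshot(path, take_snapshot):
    """Run take_snapshot() only if nobody holds the shared lock right now."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        try:
            # Non-blocking so a busy server is reported rather than stalled.
            fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError:
            return False
        take_snapshot()
        return True
    finally:
        os.close(fd)
```

The extra `flock` call on every operation is the performance cost mentioned in the thread, and the requirement that every lock-taking path be wrapped is the complexity cost.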
From ellie at fastmail.com Tue Nov 5 17:18:57 2019
From: ellie at fastmail.com (ellie timoney)
Date: Wed, 06 Nov 2019 09:18:57 +1100
Subject: time for cyrus-imap v3.2?
In-Reply-To: <00a9b398-b305-4bb4-90a0-b528c2d21651@www.fastmail.com>
References: <2993c77c-cb3a-4634-a21f-b965baf3eb1d@dogfood.fastmail.com> <586c93a2-0de0-4ed2-b68b-36d7756eeb24@www.fastmail.com> <20191105125651.Horde.4rXtHfxEcnRCUcTjGN08mnd@webmail.uni-tuebingen.de> <8ec13b46-cf19-4681-806a-c708c2d376d6@www.fastmail.com> <00a9b398-b305-4bb4-90a0-b528c2d21651@www.fastmail.com>
Message-ID: <8d755ed9-e5c1-416c-b033-d9d515f464f0@www.fastmail.com>

On Wed, Nov 6, 2019, at 8:20 AM, Bron Gondwana wrote:
>> * #1765 (Move SNMP out from master into a separate daemon) and related
>> pending PR #2100. Ellie had significant progress on this, don't know
>> what's blocking it, but this issue basically blocks any further work on
>> privilege separation like chroot as the main process should retain root
>> while running and the forked children should proceed with setuid & chroot.
>
> Good point - this is something that Greg was close to having done many years ago, but we're not using SNMP so it hasn't caused us stress. Happy to put that on the consideration list for 3.2.

The plan with SNMP is to throw it in the bin, and the prometheus stuff is its replacement. I have an open PR to do the "throwing in the bin" but haven't merged it yet. The prometheus stuff is merged on master, but I don't think we (FM) are making full use of the latest iteration of it (which fixed a bunch of performance issues) yet, and I won't quite trust it until I see it under load. I would like to have this considered stable for 3.2, but inasmuch as is currently possible the work is already done and sitting on master for people to play with.

From ellie at fastmail.com Tue Nov 5 17:24:39 2019
From: ellie at fastmail.com (ellie timoney)
Date: Wed, 06 Nov 2019 09:24:39 +1100
Subject: time for cyrus-imap v3.2?
In-Reply-To: <586c93a2-0de0-4ed2-b68b-36d7756eeb24@www.fastmail.com>
References: <2993c77c-cb3a-4634-a21f-b965baf3eb1d@dogfood.fastmail.com> <586c93a2-0de0-4ed2-b68b-36d7756eeb24@www.fastmail.com>
Message-ID: <64e9e39f-9d37-4040-bac0-6d9f980efa93@www.fastmail.com>

On Tue, Nov 5, 2019, at 4:44 PM, Bron Gondwana wrote:
> On Tue, Nov 5, 2019, at 12:04, Ricardo Signes wrote:
>> So, I think the plan was to cut a stable Cyrus 3.2 after we had stable JMAP. Is that time now? We talked about this on the Zoom call today.
>
> I think we're pretty close to it. The big question is: do we fork what will eventually become 3.2 and keep stabilising on it while we ship UUID mailboxes on master, or do we finish 3.2 before we merge uuid mailboxes.

I don't think we can include uuid mailboxes in 3.2 -- it's too new/untested, whereas this is a "stable release". (But I don't think you were proposing this.)

Whether we fork the 3.2 branch now, or wait until we're closer to releasing it, doesn't really matter to me. Though if we have a bunch of stuff we're still stabilising, it's always easier to do that work on master only rather than juggling it on two branches. But either way, it does mean the mailboxes-by-id branch needs to keep sitting on the side and being rebased until after 3.2 becomes its own branch.

From brong at fastmailteam.com Tue Nov 5 18:56:44 2019
From: brong at fastmailteam.com (Bron Gondwana)
Date: Wed, 06 Nov 2019 10:56:44 +1100
Subject: time for cyrus-imap v3.2?
In-Reply-To: <64e9e39f-9d37-4040-bac0-6d9f980efa93@www.fastmail.com>
References: <2993c77c-cb3a-4634-a21f-b965baf3eb1d@dogfood.fastmail.com> <586c93a2-0de0-4ed2-b68b-36d7756eeb24@www.fastmail.com> <64e9e39f-9d37-4040-bac0-6d9f980efa93@www.fastmail.com>
Message-ID: <55f3cf60-c5d3-4fec-b074-b4a16d5a5286@www.fastmail.com>

On Wed, Nov 6, 2019, at 09:24, ellie timoney wrote:
> On Tue, Nov 5, 2019, at 4:44 PM, Bron Gondwana wrote:
>> On Tue, Nov 5, 2019, at 12:04, Ricardo Signes wrote:
>>> So, I think the plan was to cut a stable Cyrus 3.2 after we had stable JMAP. Is that time now? We talked about this on the Zoom call today.
>>
>> I think we're pretty close to it. The big question is: do we fork what will eventually become 3.2 and keep stabilising on it while we ship UUID mailboxes on master, or do we finish 3.2 before we merge uuid mailboxes.
>
> I don't think we can include uuid mailboxes in 3.2 -- it's too new/untested, whereas this is a "stable release". (But I don't think you were proposing this.)

No - the idea is to fork 3.2 just before uuid mailboxes lands. The question is:

1) fork now, put all other fixes on both branches.
2) do the 3.2 prep work first on master, then fork that before merging uuid mailboxes.

> Whether we fork the 3.2 branch now, or wait until we're closer to releasing it, doesn't really matter to me. Though if we have a bunch of stuff we're still stabilising, it's always easier to do that work on master only rather than juggling it on two branches. But either way, it does mean the mailboxes-by-id branch needs to keep sitting on the side and being rebased until after 3.2 becomes its own branch.

Yeah, that's the challenge isn't it :) Which is less work / safer / more understandable?

Bron.

-- 
Bron Gondwana, CEO, Fastmail Pty Ltd
brong at fastmailteam.com
URL: From ellie at fastmail.com Tue Nov 5 19:06:58 2019 From: ellie at fastmail.com (ellie timoney) Date: Wed, 06 Nov 2019 11:06:58 +1100 Subject: time for cyrus-imap v3.2? In-Reply-To: <55f3cf60-c5d3-4fec-b074-b4a16d5a5286@www.fastmail.com> References: <2993c77c-cb3a-4634-a21f-b965baf3eb1d@dogfood.fastmail.com> <586c93a2-0de0-4ed2-b68b-36d7756eeb24@www.fastmail.com> <64e9e39f-9d37-4040-bac0-6d9f980efa93@www.fastmail.com> <55f3cf60-c5d3-4fec-b074-b4a16d5a5286@www.fastmail.com> Message-ID: <73586969-164c-4abe-a635-e66afc773e9e@www.fastmail.com> On Wed, Nov 6, 2019, at 10:56 AM, Bron Gondwana wrote: > On Wed, Nov 6, 2019, at 09:24, ellie timoney wrote: >> On Tue, Nov 5, 2019, at 4:44 PM, Bron Gondwana wrote: >>> On Tue, Nov 5, 2019, at 12:04, Ricardo Signes wrote: >>>> So, I think the plan was to cut a stable Cyrus 3.2 after we had stable JMAP. Is that time now? We talked about this on the Zoom call today. >>> >>> I think we're pretty close to it. The big question is: do we fork what will eventually become 3.2 and keep stabilising on it while we ship UUID mailboxes on master, or do we finish 3.2 before we merge uuid mailboxes. >> >> I don't think we can include uuid mailboxes in 3.2 -- it's too new/untested, whereas this is a "stable release". (But I don't think you were proposing this.) > > No - the idea is to fork 3.2 just before uuid mailboxes lands. The question is: > > 1) fork now, put all other fixes on both branches. > 2) do the 3.2 prep work first on master, then fork that before merging uuidmailboxes. > >> Whether we fork the 3.2 branch now, or wait until we're closer to releasing it, doesn't really matter to me. Though if we have a bunch of stuff we're still stabilising, it's always easier to do that work on master only rather than juggling it on two branches. But either way, it does mean the mailboxes-by-id branch needs to keep sitting on the side and being rebased until after 3.2 becomes its own branch. 
> > Yeah, that's the challenge isn't it :) Which is less work / safer / more understandable. Once we land mailboxes-by-id on master, I think there's gonna be a lot of big differences between 3.2 and master, which will make fixes-on-both a nuisance (no easy cherry-picks). So I think I've convinced myself we need to get 3.2 to a place we're happy with first, to avoid all the duplication. Though... I suppose the same duplication happens anyway, but in the form of rebases to the mailboxes-by-id branch instead... -------------- next part -------------- An HTML attachment was scrubbed... URL: From ellie at fastmail.com Wed Nov 6 00:02:35 2019 From: ellie at fastmail.com (ellie timoney) Date: Wed, 06 Nov 2019 16:02:35 +1100 Subject: =?UTF-8?Q?Re:_Which_imap_command_to_rename_a_root_mailbox_while_maintain?= =?UTF-8?Q?ing_its_partition?= In-Reply-To: References: <13F3250B-9B46-4C37-81DF-2F68A5D93F0B@bluemind.net> <6d9174c8-a1f2-9268-571f-e2459fd08cc5@fastmail.com> <742CEDA9-4792-4176-B938-234A7808FDD1@bluemind.net> Message-ID: <1e854b7f-8240-4484-bd7b-cf89227e2d63@www.fastmail.com> "x RENAME old new" is now fixed such that the renamed mailbox will remain on its original partition (on both the master and cyrus-imapd-3.0 branches), instead of accidentally going through the choose-a-partition logic. You still can't provide an explicit partition unless the mailbox name isn't changing, even if the partition you name is the original partition. On Tue, Nov 5, 2019, at 7:08 PM, Bron Gondwana wrote: > Wow - this looks like a bug in partition selection for user rename then :( We should fix that. > > https://github.com/cyrusimap/cyrus-imapd/issues/2907 > > Cheers, > > Bron. 
> > On Tue, Nov 5, 2019, at 17:30, Thomas Cataldo wrote: >> >> >> > On 29 Oct 2019, at 13:13, Ken Murchison wrote: >> > >> > x RENAME >> > >> > >> > should work >> >> >> Agree, but it does not :-) >> >> At least with version 3.0.8 : >> >> localhost> info user/ren at devenv.blue >> {user/ren at devenv.blue}: >> private: >> check: NIL >> checkperiod: NIL >> comment: NIL >> sort: NIL >> specialuse: NIL >> thread: NIL >> expire: NIL >> news2mail: NIL >> sieve: NIL >> squat: NIL >> shared: >> check: NIL >> checkperiod: NIL >> comment: NIL >> sort: NIL >> specialuse: NIL >> thread: NIL >> annotsize: 0 >> duplicatedeliver: false >> expire: NIL >> lastpop: NIL >> lastupdate: 4-Nov-2019 15:32:13 +0000 >> news2mail: NIL >> partition: bm-master__devenv_blue >> pop3newuidl: true >> pop3showafter: NIL >> sharedseen: false >> sieve: NIL >> size: 32310 >> squat: NIL >> synccrcs: 2599665889 0 >> uniqueid: ee8ede37-153a-4650-bf94-3da7d4f52043 >> >> >> An IMAP session as admin : >> >> telnet localhost 1143 >> Trying ::1... >> Trying 127.0.0.1... >> Connected to localhost. >> Escape character is '^]'. >> * OK [CAPABILITY IMAP4rev1 LITERAL+ ID ENABLE AUTH=PLAIN SASL-IR] server ready >> . login admin0 admin >> . OK [CAPABILITY IMAP4rev1 LITERAL+ ID ENABLE ACL RIGHTS=kxten QUOTA MAILBOX-REFERRALS NAMESPACE UIDPLUS NO_ATOMIC_RENAME UNSELECT CHILDREN MULTIAPPEND BINARY CATENATE CONDSTORE ESEARCH SEARCH=FUZZY SORT SORT=MODSEQ SORT=DISPLAY SORT=UID THREAD=ORDEREDSUBJECT THREAD=REFERENCES THREAD=REFS ANNOTATEMORE ANNOTATE-EXPERIMENT-1 METADATA LIST-EXTENDED LIST-STATUS LIST-MYRIGHTS LIST-METADATA WITHIN QRESYNC SCAN XLIST XMOVE MOVE SPECIAL-USE CREATE-SPECIAL-USE DIGEST=SHA1 X-REPLICATION URLAUTH URLAUTH=BINARY LOGINDISABLED COMPRESS=DEFLATE X-QUOTA=STORAGE X-QUOTA=MESSAGE X-QUOTA=X-ANNOTATION-STORAGE X-QUOTA=X-NUM-FOLDERS IDLE] User logged in SESSIONID= >> . RENAME user/ren at devenv.blue user/rename at devenv.blue bm-master__devenv_blue >> . 
NO Cross-server or cross-partition move w/rename not supported >> . RENAME user/ren at devenv.blue user/rename at devenv.blue >> * OK rename user/ren at devenv.blue user/rename at devenv.blue >> * OK rename user/ren/Drafts at devenv.blue user/rename/Drafts at devenv.blue >> * OK rename user/ren/Junk at devenv.blue user/rename/Junk at devenv.blue >> * OK rename user/ren/Outbox at devenv.blue user/rename/Outbox at devenv.blue >> * OK rename user/ren/Sent at devenv.blue user/rename/Sent at devenv.blue >> * OK rename user/ren/Trash at devenv.blue user/rename/Trash at devenv.blue >> . OK Completed >> >> But if I use the version without an explicit partition, the new mailbox ends up in : >> >> > info user/rename at devenv.blue >> {user/rename at devenv.blue}: >> private: >> check: NIL >> checkperiod: NIL >> comment: NIL >> sort: NIL >> specialuse: NIL >> thread: NIL >> expire: NIL >> news2mail: NIL >> sieve: NIL >> squat: NIL >> shared: >> check: NIL >> checkperiod: NIL >> comment: NIL >> sort: NIL >> specialuse: NIL >> thread: NIL >> annotsize: 0 >> duplicatedeliver: false >> expire: NIL >> lastpop: NIL >> lastupdate: 4-Nov-2019 16:43:36 +0000 >> news2mail: NIL >> partition: default >> pop3newuidl: true >> pop3showafter: NIL >> sharedseen: false >> sieve: NIL >> size: 32310 >> squat: NIL >> synccrcs: 2599665889 0 >> uniqueid: ee8ede37-153a-4650-bf94-3da7d4f52043 >> >> >> which forces me issue a second command in my imap session : >> >> . 
RENAME user/rename at devenv.blue user/rename at devenv.blue bm-master__devenv_blue >> * OK rename user/rename at devenv.blue user/rename at devenv.blue >> * OK rename user/rename/Drafts at devenv.blue user/rename/Drafts at devenv.blue >> * OK rename user/rename/Junk at devenv.blue user/rename/Junk at devenv.blue >> * OK rename user/rename/Outbox at devenv.blue user/rename/Outbox at devenv.blue >> * OK rename user/rename/Sent at devenv.blue user/rename/Sent at devenv.blue >> * OK rename user/rename/Trash at devenv.blue user/rename/Trash at devenv.blue >> . OK Completed >> >> >> Which moves the mailbox to the partition where I want it (its original one). >> >> >> The problem with the non-atomic rename is that our replication target receives data belonging to the default partition, which is not desired or expected. >> >> >> >> >> Thomas Cataldo >> Directeur Technique >> >> (+33) 6 42 25 91 38 >> >> BlueMind >> +33 (0)5 81 91 55 60 >> Hotel des T?l?coms, 40 rue du village d'entreprises >> 31670 Lab?ge, France >> www.bluemind.net / https://blog.bluemind.net/fr/ >> > > -- > Bron Gondwana, CEO, Fastmail Pty Ltd > brong at fastmailteam.com > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From dilyan.palauzov at aegee.org Wed Nov 6 13:44:23 2019 From: dilyan.palauzov at aegee.org (=?UTF-8?Q?=D0=94=D0=B8=D0=BB=D1=8F=D0=BD_?= =?UTF-8?Q?=D0=9F=D0=B0=D0=BB=D0=B0=D1=83=D0=B7=D0=BE=D0=B2?=) Date: Wed, 06 Nov 2019 18:44:23 +0000 Subject: #2373 (Shared xDAV (CalDAV/CardDAV) resources are not discoverable) / Re: time for cyrus-imap v3.2? 
In-Reply-To:
References: <2993c77c-cb3a-4634-a21f-b965baf3eb1d@dogfood.fastmail.com> <586c93a2-0de0-4ed2-b68b-36d7756eeb24@www.fastmail.com> <20191105125651.Horde.4rXtHfxEcnRCUcTjGN08mnd@webmail.uni-tuebingen.de> <8ec13b46-cf19-4681-806a-c708c2d376d6@www.fastmail.com>
Message-ID: <4a325b594fadc209222e0a4b9e63a7116198bdd8.camel@aegee.org>

Hello,

> * #2373 (Shared xDAV (CalDAV/CardDAV) resources are not discoverable).
> Dilyan Palauzov sent a diff for this in the github repo and there was a
> discussion with Ken on possible implementations (shared xDAV resources):
> https://lists.andrew.cmu.edu/pipermail/cyrus-devel/2018-May/004263.html.
> I guess it had enough progress to try to close it.

as far as I remember there is some DAV sharing progress in the upcoming 3.2 release, but I do not know the details. I guess it handles just scheduling of events of shared/unowned calendars, but not the discovery.

At CalDAV level the options for discovery of foreign collections are:

- clients (CUAs) support the resource-sharing model described at https://evertpot.com/webdav-caldav-carddav-sharing/ , or
- CUAs discover other users by utilizing the DAV:principal-property-search to discover the home-sets of another user, from there obtain the accessible calendars, and the CUA memorizes to which calendar it has subscribed.

I am not aware of any CUA which does any of the above. The third option is

- CUAs do not change, but the server changes. Once the properties of a home-set (user4/#calendars or user7/#addressbooks) have READ/LOOKUP rights, these home-sets are returned to the CUA when the CUA asks for the accessible home-sets. On the next step the CUA iterates over all returned home-sets and fetches the collections to which it has access (READ/LOOKUP ACL).

If there is a calendar that does not belong to any user (created with a hack command over IMAP), then this calendar could be a part of a fictional /dav/calendars/user/anonymous at domain/ home-set.
Such no-owner calendars have to my knowledge no-advantage over calendars belonging to the user/ namespace. The advantage of the last method is, that clients do not have to add any new features. The disadvantage is, that all users see all calendars instantly they have access to, and these can be too much. In particular, clients that have implemented the first two models/workflows, do not have any advantage over clients which have not implemented that two workflows. Greetings ????? From me at anatoli.ws Wed Nov 6 23:46:41 2019 From: me at anatoli.ws (Anatoli) Date: Thu, 7 Nov 2019 01:46:41 -0300 Subject: time for cyrus-imap v3.2? In-Reply-To: <00a9b398-b305-4bb4-90a0-b528c2d21651@www.fastmail.com> References: <2993c77c-cb3a-4634-a21f-b965baf3eb1d@dogfood.fastmail.com> <586c93a2-0de0-4ed2-b68b-36d7756eeb24@www.fastmail.com> <20191105125651.Horde.4rXtHfxEcnRCUcTjGN08mnd@webmail.uni-tuebingen.de> <8ec13b46-cf19-4681-806a-c708c2d376d6@www.fastmail.com> <00a9b398-b305-4bb4-90a0-b528c2d21651@www.fastmail.com> Message-ID: <9670005b-ff6d-ad71-fd11-1c77654f438f@anatoli.ws> Bron, Thanks for your detailed reply and the work the FM team is doing! > This is not easy unfortunately with all the different datastructures, > because it means that everything else which takes a lock is going to > need to first take a global shared lock before it does anything else, > and that's going to have a performance and complexity impact on > everything - because you have to find them ALL or you might wind up > with lock inversions down the line. One solution I see is to have a separate single lock, not even a lock per se, but a barrier. I.e. before every write operation there's an atomic check for a flag that write operations should pause. So the code checks for the flag to be ON with some atomic operation like __atomic_load and if it's ON, it sleeps for some milliseconds and tries again. If the barrier is OFF, it continues. 
And before even checking for the barrier, there's an atomic increment (could be something like __atomic_fetch_add) of current write operations in execution.

The pseudocode for the worker thread before it starts a write operation would be like this:

__start:
    atomic_inc(write_operations_in_execution_counter)
    atomic_load(write_barrier)

    if (write_barrier == ON) {
        atomic_dec(write_operations_in_execution_counter)
        sleep(100ms)
        goto __start
    } else {
        perform_write_operation_with_its_own_locks()
        atomic_dec(write_operations_in_execution_counter)
    }

And the code that sets the barrier would look something like this:

    atomic_store(write_barrier, ON)

__check_write_counter:
    atomic_load(write_operations_in_execution_counter)

    if (write_operations_in_execution_counter == 0) {
        sync_data_to_disk()
        signal_data_ready()
        wait_for_lock_release_with_timeout(5s)
        atomic_store(write_barrier, OFF)
    } else {
        sleep(1ms)
        goto __check_write_counter
    }

So, basically in the normal case the overhead would be about 3 (there are 3 atomic operations) * 1.35 (the overhead of atomic RAM access compared to a RAM read with a cache miss) * pointer_read_with_cache_miss, which is IMO negligible - the overall code is not so tuned for performance to worry about 4 RAM reads with cache misses.

There could be a minimal contention when the barrier is set to ON, as the working threads are incrementing, checking barrier, decrementing the write_operations_in_execution_counter and the thread that set the barrier checks it for == 0, but taking into account the timings of the involved operations like sleeps (with context switches + different sleep intervals) and the inc, check, dec window, this should not be a problem. And it could be rewritten a bit to avoid even this small contention.

> Good point - this is something the Greg was close to having done many
> years ago, but we're not using snmp so it hasn't caused us stress.
> Happy to put that on the consideration list for 3.2.
If this is done, I'd try to implement chroot for Cyrus and then pledge & unveil for OpenBSD build. > Yep - labeled. OK, the hard bit here isn't implementing (as ellie > pointed out) - it's design. We want to make sure we create an > interface that people can keep using reliably into the future. I'll > have a chat with ellie about this one. Please let me know if you'd like my feedback once you decide with Ellie on possible directions. Thanks! Anatoli On 5/11/19 18:20, Bron Gondwana wrote: > On Wed, Nov 6, 2019, at 03:44, Anatoli via Cyrus-devel wrote: >> Hi All! >> >> Bron, for deployments I manage these issues are also important: > > First of all - thanks for writing this up.? It really helps! > >> * #1763 (Backups for SMB (lock entire server for a moment while taking a >> snapshot)). Don't know if there was any progress on this. Basically, a >> short (milliseconds to a few seconds) global write lock is needed on all >> data structures. > > This is not easy unfortunately with all the different datastructures, > because it means that everything else which takes a lock is going to > need to first take a global shared lock before it does anything else, > and that's going to have a performance and complexity impact on > everything - because you have to find them ALL or you might wind up with > lock inversions down the line. > >> * #1765 (Move SNMP out from master into a separate daemon) and related >> pending PR #2100. Ellie had significant progress on this, don't know >> what's blocking it, but this issue basically blocks any further work on >> privilege separation like chroot as the main process should retain root >> while running and the forked children should proceed with setuid & chroot. > > Good point - this is something the Greg was close to having done many > years ago, but we're not using snmp so it hasn't caused us stress.? > Happy to put that on the consideration list for 3.2. 
> > The downside of making the list of tasks for 3.2 really long is that it > could block releasing something which is otherwise still a good > improvement over 3.0 and not a regression... *sigh*.? But this one will > be a good win, so let's do it! > >> * #2373 (Shared xDAV (CalDAV/CardDAV) resources are not discoverable). >> Dilyan Palauzov sent a diff for this in the github repo and there was a >> discussion with Ken on possible implementations (shared xDAV resources): >> https://lists.andrew.cmu.edu/pipermail/cyrus-devel/2018-May/004263.html. >> I guess it had enough progress to try to close it. > > Labeled.? I'm keen to have an answer to it somehow or other. > >> >> * #2372 ([FR] ACL on autocreate folders). Basically, for automatic >> "anyone p" ACL in plus+addressing folders. > > Yep - labeled.? OK, the hard bit here isn't implementing (as ellie > pointed out) - it's design.? We want to make sure we create an interface > that people can keep using reliably into the future.? I'll have a chat > with ellie about this one. > >> And there are 46 open PRs in the repo. Maybe they could be reviewed and >> merged too? > > Yeah, maybe!? Frustratingly the next couple of Cyrus call times aren't > going to work for me, I've got a 7am Melbourne time meeting next > Tuesday, then I'll be in Singapore for IETF where the Cyrus meeting time > is 5am. > > One downside of pretty much everyone involved in direct Cyrus > development being at Fastmail is that we discuss a lot of things in our > private slack channel or internal mailing lists where we don't have to > be quite so careful about stripping anything that could identify an > internal customer... but it does create an impression that there's less > happening than you'd otherwise see... and I haven't even posted the > meeting minutes recently because they've been taken into a Dropbox paper > doc and then langished there :(? Sorry. > > Cheers, > > Bron. > > -- > ? Bron Gondwana, CEO, Fastmail Pty Ltd > ? 
brong at fastmailteam.com > > From brong at fastmailteam.com Thu Nov 7 17:56:54 2019 From: brong at fastmailteam.com (Bron Gondwana) Date: Fri, 08 Nov 2019 09:56:54 +1100 Subject: time for cyrus-imap v3.2? In-Reply-To: <9670005b-ff6d-ad71-fd11-1c77654f438f@anatoli.ws> References: <2993c77c-cb3a-4634-a21f-b965baf3eb1d@dogfood.fastmail.com> <586c93a2-0de0-4ed2-b68b-36d7756eeb24@www.fastmail.com> <20191105125651.Horde.4rXtHfxEcnRCUcTjGN08mnd@webmail.uni-tuebingen.de> <8ec13b46-cf19-4681-806a-c708c2d376d6@www.fastmail.com> <00a9b398-b305-4bb4-90a0-b528c2d21651@www.fastmail.com> <9670005b-ff6d-ad71-fd11-1c77654f438f@anatoli.ws> Message-ID: <4f649f79-217b-4c4a-9000-abedd933a6f2@www.fastmail.com> On Thu, Nov 7, 2019, at 15:46, Anatoli via Cyrus-devel wrote: > Bron, > > Thanks for your detailed reply and the work the FM team is doing! > > > This is not easy unfortunately with all the different datastructures, > > because it means that everything else which takes a lock is going to > > need to first take a global shared lock before it does anything else, > > and that's going to have a performance and complexity impact on > > everything - because you have to find them ALL or you might wind up > > with lock inversions down the line. > > One solution I see is to have a separate single lock, not even a lock > per se, but a barrier. I.e. before every write operation How does that maintain consistency? I guess you don't get skew between files, but you still have to do crash recovery on every file. There's no single place to put this either. I think I still prefer the idea of a shared locks that wraps every single other lock, such that taking the snapshot pauses every attempt to start a new lock until it's done, so you always get a completely clean read equivalent to a clean shutdown. Bron. -- Bron Gondwana, CEO, Fastmail Pty Ltd brong at fastmailteam.com -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From hyc at highlandsun.com Fri Nov 8 15:06:23 2019 From: hyc at highlandsun.com (Howard Chu) Date: Fri, 8 Nov 2019 20:06:23 +0000 Subject: time for cyrus-imap v3.2? In-Reply-To: <4f649f79-217b-4c4a-9000-abedd933a6f2@www.fastmail.com> References: <2993c77c-cb3a-4634-a21f-b965baf3eb1d@dogfood.fastmail.com> <586c93a2-0de0-4ed2-b68b-36d7756eeb24@www.fastmail.com> <20191105125651.Horde.4rXtHfxEcnRCUcTjGN08mnd@webmail.uni-tuebingen.de> <8ec13b46-cf19-4681-806a-c708c2d376d6@www.fastmail.com> <00a9b398-b305-4bb4-90a0-b528c2d21651@www.fastmail.com> <9670005b-ff6d-ad71-fd11-1c77654f438f@anatoli.ws> <4f649f79-217b-4c4a-9000-abedd933a6f2@www.fastmail.com> Message-ID: <9f076a08-3401-6d39-d0e2-cd789705ca79@highlandsun.com> Bron Gondwana wrote: > On Thu, Nov 7, 2019, at 15:46, Anatoli via Cyrus-devel wrote: >> Bron, >> >> Thanks for your detailed reply and the work the FM team is doing! >> >> > This is not easy unfortunately with all the different datastructures, >> > because it means that everything else which takes a lock is going to >> > need to first take a global shared lock before it does anything else, >> > and that's going to have a performance and complexity impact on >> > everything - because you have to find them ALL or you might wind up >> > with lock inversions down the line. >> >> One solution I see is to have a separate single lock, not even a lock >> per se, but a barrier. I.e. before every write operation > > How does that maintain consistency?? I guess you don't get skew between files, but you still have to do crash recovery on every file.? There's no single place > to put this either. > > I think I still prefer the idea of a shared locks that wraps every single other lock, such that taking the snapshot pauses every attempt to start a new lock > until it's done, so you always get a completely clean read equivalent to a clean shutdown. 
Not to sound like a broken record, but - if you were using named databases in LMDB for all of these separate data structures, you would get atomic snapshots for free. -- -- Howard Chu CTO, Symas Corp. http://www.symas.com Director, Highland Sun http://highlandsun.com/hyc/ Chief Architect, OpenLDAP http://www.openldap.org/project/ From me at anatoli.ws Mon Nov 11 22:50:10 2019 From: me at anatoli.ws (Anatoli) Date: Tue, 12 Nov 2019 00:50:10 -0300 Subject: time for cyrus-imap v3.2? In-Reply-To: <4f649f79-217b-4c4a-9000-abedd933a6f2@www.fastmail.com> References: <2993c77c-cb3a-4634-a21f-b965baf3eb1d@dogfood.fastmail.com> <586c93a2-0de0-4ed2-b68b-36d7756eeb24@www.fastmail.com> <20191105125651.Horde.4rXtHfxEcnRCUcTjGN08mnd@webmail.uni-tuebingen.de> <8ec13b46-cf19-4681-806a-c708c2d376d6@www.fastmail.com> <00a9b398-b305-4bb4-90a0-b528c2d21651@www.fastmail.com> <9670005b-ff6d-ad71-fd11-1c77654f438f@anatoli.ws> <4f649f79-217b-4c4a-9000-abedd933a6f2@www.fastmail.com> Message-ID: Bron, The proposed algo is a barrier before any single-lock. In itself it's a single lock, but the same code (the pseudocode for the *worker thread* in my previous mail) should be inserted at *every* single-lock/write operation location. If there's no need to pause, the overhead is non-existent. If a pause is requested, all worker threads would pause at the entrance to any single-lock/write code. It would make the entire Cyrus daemon to complete all pending write operations and pause new ones. At this stage, if I understand it correctly, the data on disk would be in a consistent state, ready to take a snapshot or to perform some other operation. Without that, if we just take a snapshot of the fs, it could happen that a) some files are not written entirely (i.e. caught in the middle of a write operation) or b) the contents of some files are newer than the other, i.e. the logical write operation was not atomic (e.g. mail data is written but indexes are not updated yet or something similar). 
Maybe I didn't understand you correctly. Do you mean that finishing all writes and pausing new ones is not enough to guarantee an integral state of files on disk? If it's the case, what would have to be done to guarantee it (i.e. to make it like Cyrus was shutdown normally)? Regards, Anatoli On 7/11/19 19:56, Bron Gondwana wrote: > On Thu, Nov 7, 2019, at 15:46, Anatoli via Cyrus-devel wrote: >> Bron, >> >> Thanks for your detailed reply and the work the FM team is doing! >> >> > This is not easy unfortunately with all the different datastructures, >> > because it means that everything else which takes a lock is going to >> > need to first take a global shared lock before it does anything else, >> > and that's going to have a performance and complexity impact on >> > everything - because you have to find them ALL or you might wind up >> > with lock inversions down the line. >> >> One solution I see is to have a separate single lock, not even a lock >> per se, but a barrier. I.e. before every write operation > > How does that maintain consistency?? I guess you don't get skew between > files, but you still have to do crash recovery on every file.? There's > no single place to put this either. > > I think I still prefer the idea of a shared locks that wraps every > single other lock, such that taking the snapshot pauses every attempt to > start a new lock until it's done, so you always get a completely clean > read equivalent to a clean shutdown. > > Bron. > -- > ? Bron Gondwana, CEO, Fastmail Pty Ltd > ? brong at fastmailteam.com > > From brong at fastmailteam.com Tue Nov 12 04:20:10 2019 From: brong at fastmailteam.com (Bron Gondwana) Date: Tue, 12 Nov 2019 20:20:10 +1100 Subject: time for cyrus-imap v3.2? 
In-Reply-To: References: <2993c77c-cb3a-4634-a21f-b965baf3eb1d@dogfood.fastmail.com> <586c93a2-0de0-4ed2-b68b-36d7756eeb24@www.fastmail.com> <20191105125651.Horde.4rXtHfxEcnRCUcTjGN08mnd@webmail.uni-tuebingen.de> <8ec13b46-cf19-4681-806a-c708c2d376d6@www.fastmail.com> <00a9b398-b305-4bb4-90a0-b528c2d21651@www.fastmail.com> <9670005b-ff6d-ad71-fd11-1c77654f438f@anatoli.ws> <4f649f79-217b-4c4a-9000-abedd933a6f2@www.fastmail.com> Message-ID: <3fe53aee-1088-4a2f-b7fe-9b0f02b60539@www.fastmail.com> On Tue, Nov 12, 2019, at 14:50, Anatoli wrote: > Bron, > > The proposed algo is a barrier before any single-lock. In itself it's a > single lock, but the same code (the pseudocode for the *worker thread* > in my previous mail) should be inserted at *every* single-lock/write > operation location. If there's no need to pause, the overhead is > non-existent. If a pause is requested, all worker threads would pause at > the entrance to any single-lock/write code. > > It would make the entire Cyrus daemon to complete all pending write > operations and pause new ones. At this stage, if I understand it > correctly, the data on disk would be in a consistent state, ready to > take a snapshot or to perform some other operation. "complete all pending write operations and pause new ones" How do you know when the current write operations are finished? > Without that, if we just take a snapshot of the fs, it could happen that > a) some files are not written entirely (i.e. caught in the middle of a > write operation) or b) the contents of some files are newer than the > other, i.e. the logical write operation was not atomic (e.g. mail data > is written but indexes are not updated yet or something similar). > > Maybe I didn't understand you correctly. Do you mean that finishing all > writes and pausing new ones is not enough to guarantee an integral state > of files on disk? If it's the case, what would have to be done to > guarantee it (i.e. to make it like Cyrus was shutdown normally)? 
I mean that to finish all writes and pause new ones, you need to know that the writes are finished. And not just writes, but sets of writes that are held under a lock together. The way I know to do this is a single global lock with the following properties:

1) every action which locks any file within Cyrus for writing takes a SHARED global lock before it takes the write lock on the file.

2) the SHARED lock is held for the duration of the writes, and released once the writes are finished.

3) the "backup utility" takes an EXCLUSIVE lock on the global lock, which will only be granted once each write is finished. It then takes a snapshot, and releases the EXCLUSIVE lock.

This guarantees full consistency.

The question that always exists for locks is "what granularity" - too wide, and you hold the lock for a long time. Too narrow, and you take and release it very frequently, adding overhead.

My first and most dumbest theory is to go quite wide - add the lock in every runloop and command line utility such that it's held for the entire running of the loop or the utility! Mostly these are done within a fraction of a second. The one place that might be interesting is FETCH 1:* RFC822.PEEK or similar in imapd, where we already have some locking magic that holds a shared namelock on the mailbox to stop repacking while it releases the index lock to allow other actions on the mailbox in the meanwhile.

So we could go down a layer and only lock when we lock mailboxes or cyrusdbs, and refcount the global lock. This seems more likely to be a good layer, and not too horrible.

The other thing is that we'll need to assert that the lock isn't being held during each daemon's command loop, so that bugs don't leak out to deadlock entire servers.

And I think that's nearly it :)

Bron.

--
Bron Gondwana, CEO, Fastmail Pty Ltd
brong at fastmailteam.com
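The three properties listed in the message above, plus the refcounting and the "assert it isn't held" check, could be sketched in C roughly as follows. This is only an illustration under stated assumptions: every function name, the lock-file path parameter and the `example_snapshot` stub are hypothetical and are not Cyrus API, the lock is taken with `flock(2)` rather than whatever Cyrus would actually use, and the refcount is per process and not thread-safe.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/file.h>
#include <unistd.h>

/* Hypothetical names throughout; none of this is existing Cyrus code. */
static int global_fd = -1;       /* fd holding the SHARED lock, or -1 */
static int global_refcount = 0;  /* nested acquisitions in this process */

/* Properties 1 and 2: take the global lock SHARED before any per-file
 * write lock, and keep it until the whole set of writes is finished.
 * Nested calls only bump the refcount. */
int global_lock_shared(const char *path)
{
    if (global_refcount > 0) {
        global_refcount++;
        return 0;
    }
    int fd = open(path, O_RDWR | O_CREAT, 0600);
    if (fd < 0) return -1;
    if (flock(fd, LOCK_SH) != 0) {   /* blocks while a snapshot holds LOCK_EX */
        close(fd);
        return -1;
    }
    global_fd = fd;
    global_refcount = 1;
    return 0;
}

/* Drop one reference; the lock is released when the last one goes. */
void global_unlock_shared(void)
{
    assert(global_refcount > 0);
    if (--global_refcount > 0) return;
    flock(global_fd, LOCK_UN);
    close(global_fd);
    global_fd = -1;
}

/* For the "assert the lock isn't held in the command loop" check. */
int global_lock_is_held(void)
{
    return global_refcount > 0;
}

/* Property 3: the backup side takes the lock EXCLUSIVE, which is only
 * granted once every SHARED holder has released, then snapshots. */
int snapshot_under_exclusive_lock(const char *path, int (*take_snapshot)(void))
{
    int fd = open(path, O_RDWR | O_CREAT, 0600);
    if (fd < 0) return -1;
    if (flock(fd, LOCK_EX) != 0) {
        close(fd);
        return -1;
    }
    int r = take_snapshot();
    flock(fd, LOCK_UN);
    close(fd);
    return r;
}

/* Stand-in for the real snapshot step (illustration only). */
static int snapshots_taken = 0;
int example_snapshot(void)
{
    snapshots_taken++;
    return 0;
}
```

A daemon's command loop could start each iteration with `assert(!global_lock_is_held())`, so that a leaked reference fails loudly in that one process instead of stalling the next snapshot.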
From me at anatoli.ws Wed Nov 13 00:12:12 2019
From: me at anatoli.ws (Anatoli)
Date: Wed, 13 Nov 2019 02:12:12 -0300
Subject: time for cyrus-imap v3.2?
In-Reply-To: <3fe53aee-1088-4a2f-b7fe-9b0f02b60539@www.fastmail.com>
References: <2993c77c-cb3a-4634-a21f-b965baf3eb1d@dogfood.fastmail.com> <586c93a2-0de0-4ed2-b68b-36d7756eeb24@www.fastmail.com> <20191105125651.Horde.4rXtHfxEcnRCUcTjGN08mnd@webmail.uni-tuebingen.de> <8ec13b46-cf19-4681-806a-c708c2d376d6@www.fastmail.com> <00a9b398-b305-4bb4-90a0-b528c2d21651@www.fastmail.com> <9670005b-ff6d-ad71-fd11-1c77654f438f@anatoli.ws> <4f649f79-217b-4c4a-9000-abedd933a6f2@www.fastmail.com> <3fe53aee-1088-4a2f-b7fe-9b0f02b60539@www.fastmail.com>
Message-ID:

> How do you know when the current write operations are finished?

The pseudocode from my previous email:

__start:

atomic_inc(write_operations_in_execution_counter)

atomic_load(write_barrier)

if (write_barrier == ON) {

  atomic_dec(write_operations_in_execution_counter)
  sleep(100ms)
  goto __start

} else {

  *perform_write_operation_with_its_own_locks()*
  atomic_dec(write_operations_in_execution_counter)

}

Here perform_write_operation_with_its_own_locks() is a function (one among many) that today performs a write operation (the algorithm implies that all occurrences of write operations should be wrapped with this pseudocode).

Once a write function returns (which would mean that this particular write operation completed), the write_operations_in_execution_counter is decremented. When all operations complete, the counter would be == 0, which is detected by the barrier thread and it proceeds with the sync/fsync to have all files fully written to disk & notifies the external process waiting to take a snapshot. Then it turns off the barrier and the daemon continues normal operation.

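The worker-side pseudocode above could be written with the GCC/Clang __atomic built-ins roughly as follows. This is a minimal single-process sketch only: the two variable names come from the pseudocode, the function names are hypothetical, and none of this is existing Cyrus code. Sequentially consistent ordering is assumed so that "increment, then check the barrier" on the worker side pairs correctly with "raise the barrier, then check the counter" on the barrier side.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <time.h>

/* Names taken from the pseudocode; hypothetical, not Cyrus symbols. */
static int write_barrier;                          /* 0 = OFF, 1 = ON */
static int write_operations_in_execution_counter;

/* Raise or lower the barrier (the barrier side would call this). */
void write_barrier_set(bool on)
{
    __atomic_store_n(&write_barrier, on ? 1 : 0, __ATOMIC_SEQ_CST);
}

/* One attempt to enter a write section; true means the caller may write. */
bool write_section_try_enter(void)
{
    __atomic_fetch_add(&write_operations_in_execution_counter, 1,
                       __ATOMIC_SEQ_CST);
    if (__atomic_load_n(&write_barrier, __ATOMIC_SEQ_CST)) {
        /* Barrier is ON: back out so the barrier side can reach zero. */
        __atomic_fetch_sub(&write_operations_in_execution_counter, 1,
                           __ATOMIC_SEQ_CST);
        return false;
    }
    return true;
}

/* Blocking form of the "goto __start" loop, with the 100ms back-off. */
void write_section_enter(void)
{
    struct timespec backoff = { .tv_sec = 0, .tv_nsec = 100 * 1000 * 1000 };
    while (!write_section_try_enter())
        nanosleep(&backoff, NULL);
}

/* Called after the wrapped write operation has returned. */
void write_section_leave(void)
{
    int prev = __atomic_fetch_sub(&write_operations_in_execution_counter, 1,
                                  __ATOMIC_SEQ_CST);
    assert(prev > 0);   /* leave without a matching enter is a bug */
}

/* What the barrier side polls until it reads zero. */
int writers_in_flight(void)
{
    return __atomic_load_n(&write_operations_in_execution_counter,
                           __ATOMIC_SEQ_CST);
}
```

Each write operation would then sit between write_section_enter() and write_section_leave(). The sketch does not address where these variables live when the workers are separate processes.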
This is similar to a single global lock as you describe it, and it sort of behaves like it, but it doesn't need to be acquired as such every time and it has practically 0 overhead so it could be placed inside high-frequency loops. It's similar to how modern lock-free concurrency is implemented. The point is that it's opportunistic locking which could be implemented with very low overhead.

And if the code inside the perform_write_operation_with_its_own_locks() is guaranteed not to block on other perform_write_operation_with_its_own_locks() functions, then this implementation would be lock-free for when the barrier is OFF, i.e. no potential deadlocks. For when it's ON, there also would be no deadlocks, but the operation as such won't be fully lock-free.

Also, the way I propose to implement the barrier thread, it won't block the server for more than a configurable number of seconds no matter what (with this, the entire implementation would be lock-free (if we consider the snapshot as part of the process), i.e. it guarantees progress in a finite amount of time, though some threads could starve):

atomic_store(write_barrier, ON)

__check_write_counter:

atomic_load(write_operations_in_execution_counter)

if (write_operations_in_execution_counter == 0) {

  sync_data_to_disk()
  signal_data_ready()
  *wait_for_lock_release_with_timeout(5s)*
  atomic_store(write_barrier, OFF)

} else {

  sleep(1ms)
  goto __check_write_counter

}

Here the wait_for_lock_release_with_timeout(5s) function will wait for the release-lock signal for 5 seconds and would turn off the barrier whether or not the external operation (snapshot-taking backup tool) has completed, so the server would continue its normal operation once the 5s timeout expires.

So while the barrier thread waits for the release-lock signal, the backup tool performs a snapshot and then sends the release-lock signal. The result of the signaling indicates whether the lock was released before or not.
If the backup tool receives the code indicating that the lock was released before, it would mean that the snapshot that was taken could be inconsistent.

In this case the backup tool could try to perform the operation again or proceed in another way (e.g. to notify the admin that the snapshot takes more than the preconfigured lock-wait time). Again this is the opportunistic locking, i.e. we try to perform an operation without a guarantee of success, so we don't need to wait indefinitely, again providing a lock-free guarantee. If we succeed, then all is good. If not, we try again or abandon the task with an error.

And all this would be external to cyrus, it would be implemented in the backup utility.

I guess the best way to start with this is to identify all places where data write operations occur (I suppose this is where the mail data and all sorts of databases are written). Once they are identified they could be tweaked a bit for better concurrency and lockability and then we could analyze how to wrap them with a global lock/barrier.

Regards,
Anatoli

From brong at fastmailteam.com Wed Nov 13 00:25:38 2019
From: brong at fastmailteam.com (Bron Gondwana)
Date: Wed, 13 Nov 2019 16:25:38 +1100
Subject: time for cyrus-imap v3.2?
In-Reply-To:
References: <2993c77c-cb3a-4634-a21f-b965baf3eb1d@dogfood.fastmail.com> <586c93a2-0de0-4ed2-b68b-36d7756eeb24@www.fastmail.com> <20191105125651.Horde.4rXtHfxEcnRCUcTjGN08mnd@webmail.uni-tuebingen.de> <8ec13b46-cf19-4681-806a-c708c2d376d6@www.fastmail.com> <00a9b398-b305-4bb4-90a0-b528c2d21651@www.fastmail.com> <9670005b-ff6d-ad71-fd11-1c77654f438f@anatoli.ws> <4f649f79-217b-4c4a-9000-abedd933a6f2@www.fastmail.com> <3fe53aee-1088-4a2f-b7fe-9b0f02b60539@www.fastmail.com>
Message-ID:

Given that Cyrus is unix processes, not threads, what mechanism are you proposing for the atomic here?

I'd be keen to see something shaped like a pull request against Cyrus showing how this would interact with the existing locking architecture.

Cheers,

Bron.

--
Bron Gondwana, CEO, Fastmail Pty Ltd
brong at fastmailteam.com

From me at anatoli.ws Wed Nov 13 01:06:22 2019
From: me at anatoli.ws (Anatoli)
Date: Wed, 13 Nov 2019 03:06:22 -0300
Subject: time for cyrus-imap v3.2?
In-Reply-To:
References: <2993c77c-cb3a-4634-a21f-b965baf3eb1d@dogfood.fastmail.com> <586c93a2-0de0-4ed2-b68b-36d7756eeb24@www.fastmail.com> <20191105125651.Horde.4rXtHfxEcnRCUcTjGN08mnd@webmail.uni-tuebingen.de> <8ec13b46-cf19-4681-806a-c708c2d376d6@www.fastmail.com> <00a9b398-b305-4bb4-90a0-b528c2d21651@www.fastmail.com> <9670005b-ff6d-ad71-fd11-1c77654f438f@anatoli.ws> <4f649f79-217b-4c4a-9000-abedd933a6f2@www.fastmail.com> <3fe53aee-1088-4a2f-b7fe-9b0f02b60539@www.fastmail.com>
Message-ID: <323fc5b2-c2ad-f272-062d-8b8382bdcce1@anatoli.ws>

> Given that Cyrus is unix processes, not threads, what mechanism are
> you proposing for the atomic here?

__atomic_load/store and __atomic_fetch_add/sub: https://gcc.gnu.org/onlinedocs/gcc/_005f_005fatomic-Builtins.html. These are not related to threads or processes, these are compiler built-ins that leverage processor-specific instructions like CAS (compare-and-swap). The compiler knows which instructions to use for each arch. More info here: https://lwn.net/Articles/509102/. Clang/LLVM also has similar built-ins: https://libcxx.llvm.org/atomic_design_a.html.

So instead of using some old POSIX synchro primitives, we go one level lower and implement them ourselves in opportunistic lock-free way with compiler built-ins. The built-ins are basically inline asm implementation abstractions available for all archs supported by the compiler.

> I'd be keen to see something shaped like a pull request against Cyrus
> showing how this would interact with the existing locking
> architecture.

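For the built-ins to coordinate separate unix processes, the counter and the barrier flag have to live in memory that all of those processes map. A minimal sketch of that idea follows, assuming Linux/glibc (MAP_ANONYMOUS), GCC or Clang, and a platform where atomic operations on int are lock-free; struct write_gate and both function names are hypothetical. Because an anonymous mapping is only inherited across fork(), services that are exec'd would presumably need a file-backed or named shared mapping instead, which this sketch does not cover.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical shared state; not an existing Cyrus structure. */
struct write_gate {
    int barrier;   /* 0 = OFF, 1 = ON */
    int writers;   /* write operations currently in execution */
};

/* Map the gate into memory that stays shared across fork().
 * Anonymous mappings start zero-filled, so both fields begin at 0. */
struct write_gate *write_gate_create(void)
{
    void *p = mmap(NULL, sizeof(struct write_gate), PROT_READ | PROT_WRITE,
                   MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}

/* Fork nchildren processes that each increment the shared counter
 * `iterations` times. Returns the final counter value, or -1 on error. */
int write_gate_fork_test(struct write_gate *g, int nchildren, int iterations)
{
    int failed = 0;

    assert(g != NULL);
    for (int c = 0; c < nchildren; c++) {
        pid_t pid = fork();
        if (pid < 0) {
            failed = 1;
            break;
        }
        if (pid == 0) {
            for (int i = 0; i < iterations; i++)
                __atomic_fetch_add(&g->writers, 1, __ATOMIC_SEQ_CST);
            _exit(0);
        }
    }
    while (wait(NULL) > 0)
        ;   /* reap every child before reading the total */

    return failed ? -1 : __atomic_load_n(&g->writers, __ATOMIC_SEQ_CST);
}
```

If the increments were not atomic across processes, the total returned by write_gate_fork_test() would usually fall short of nchildren * iterations.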
The issue is I don't know the internals of cyrus enough to be able to identify all the places where write operations occur and I don't have enough understanding of the entire sync/write logic to be able to provide a working solution.

But once the write operations are encapsulated in separate functions and there's a list of all of them, I could implement the efficient global locking and the backup tool that would leverage it.

Regards,
Anatoli

From a_s_y at sama.ru Sat Nov 16 07:32:23 2019
From: a_s_y at sama.ru (Sergey)
Date: Sat, 16 Nov 2019 16:32:23 +0400
Subject: FR: Reaching limits should be logged (issues 2913)
Message-ID: <201911161632.23641.a_s_y@sama.ru>

Hello.

Some limits can be configured in imapd.conf (such as maxlogins_per_host, maxlogins_per_user, popminpoll). Reaching limits should be logged for properly diagnostic: https://github.com/cyrusimap/cyrus-imapd/issues/2913

I created this issue but encounter a problem to add patches. I put them here.

--
Regards, Sergey
-------------- next part --------------
A non-text attachment was scrubbed...
Name: cyrus-2.5.13-logging-limit.diff
Type: text/x-diff
Size: 3654 bytes
Desc: not available
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: cyrus-3.0.11-logging-limit.diff
Type: text/x-diff
Size: 4372 bytes
Desc: not available
URL:

From dilyan.palauzov at aegee.org Tue Nov 19 15:56:49 2019
From: dilyan.palauzov at aegee.org (Дилян Палаузов)
Date: Tue, 19 Nov 2019 20:56:49 +0000
Subject: Debugging Deadlocks
Message-ID: <92cca1d7baac62ef2b3cbe3f59a771796aba19dd.camel@aegee.org>

Hello,

I run cyrus imap 3.0.x with some private changes. Sometimes when I stop the master process, it uses one CPU core at 100% for 5 minutes. After the fifth minute, systemd enforces kill -9. When I attach to the master process, I see that some janitor is doing some work, but I have not checked the details.

Has anybody experienced this?

I have very few users, but one of the users (me) uses many clients simultaneously. Let's say two IMAP clients, making 4-6 connections in parallel, and three CalDAV clients, making an estimated 3-6 connections in parallel. The httpd process is behind a proxy and most of the time the proxy server manages to serialize the requests, so in fact a single httpd process handles the requests. At least, under normal circumstances no second running httpd process is visible. Under normal circumstances I also see a single lmtpd process and many imapd processes.

On some days I observe that the IMAP client cannot obtain the list of new messages; it just times out. This could be because of deadlocks, but it could also be because on that particular day the IO is extremely slow and thus the problem is not within cyrus. Sometimes I observe afterwards that the INBOX index is being rebuilt.
Sometimes, after the INBOX index is rebuilt, things start working.

So on such days I suspect that there is some deadlock. Let's say, if there are two or more long-running lmtpd processes, then I suspect a deadlock.

What approach can I use to find where the deadlock is, and how can I get rid of it? I can attach to a process with strace, get the current backtrace and variable values with GDB, and see (e.g. with lsof) which files are opened in which mode. But I do not know what to look for. Or rather, when I try investigating, almost always I see a process rebuilding my INBOX index and after waiting, waiting, waiting, eventually the index is rebuilt.

How can I find out why the index was broken?

Greetings
Дилян

From thomas.cataldo at bluemind.net Thu Nov 21 05:50:46 2019
From: thomas.cataldo at bluemind.net (Thomas Cataldo)
Date: Thu, 21 Nov 2019 11:50:46 +0100
Subject: [RFC] multiplexing cyrus replication with log/log-run sharding & multiple sync_client
Message-ID: <62498D72-A5C0-4D9C-B675-3E263B6C66B7@bluemind.net>

Hi,

In our workload, cyrus replication latency is pretty critical as we serve most read requests from the replica. Having a single network channel between master & replica is a big issue for us.

Trying to improve our latency, we implemented the following approach: instead of writing "channel/log" we write "channel/log.". We compute our shard key this way:

# cat log.0
APPEND devenv.blue!user.tom.Sent
MAILBOX devenv.blue!user.tom.Sent

# cat log.2
SEEN tom at devenv.blue 9f799278-a6cd-45b7-9546-0e861d5e15d6

root at bm1804:/var/lib/cyrus/sync/core# cat log.3
APPEND devenv.blue!user.sga
MAILBOX devenv.blue!user.sga

We compute a hashcode of the first argument. We normalize it so that devenv.blue!user.tom.Sent and devenv.blue!user.tom have the same hashcode, then we take "hashcode % shard_count" to figure out which log file to use.

We patched sync_client to add a "-i ". sync_client -i 0 will process log.0 and use log-run.0, etc.

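The normalize-then-hash step could look roughly like the sketch below. It is an illustration under assumptions, not the code from the attached diff: the hash function (32-bit FNV-1a) is a stand-in because the message does not say which hash is used, all function names are hypothetical, and only mailbox names of the form [domain!]user.<name>[.<folder>...] are handled. SEEN entries, whose first argument is a user id rather than a mailbox name, would need their own normalization.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Length of the prefix that identifies the owning mailbox, e.g.
 * "devenv.blue!user.tom.Sent" -> length of "devenv.blue!user.tom".
 * Names that do not match [domain!]user.<name> are used whole. */
size_t shard_key_len(const char *mboxname)
{
    const char *local = strchr(mboxname, '!');
    local = local ? local + 1 : mboxname;

    if (strncmp(local, "user.", 5) != 0)
        return strlen(mboxname);

    const char *end = strchr(local + 5, '.');
    return end ? (size_t)(end - mboxname) : strlen(mboxname);
}

/* 32-bit FNV-1a over the first len bytes; a stand-in hash (assumption). */
uint32_t shard_hash(const char *s, size_t len)
{
    uint32_t h = 2166136261u;
    for (size_t i = 0; i < len; i++) {
        h ^= (unsigned char)s[i];
        h *= 16777619u;
    }
    return h;
}

/* Index of the log file (log.N) that an entry for this mailbox goes to. */
unsigned shard_for(const char *mboxname, unsigned shard_count)
{
    assert(shard_count > 0);
    return shard_hash(mboxname, shard_key_len(mboxname)) % shard_count;
}
```

With this normalization, every folder of one user lands in the same shard, so the entries for that user are still processed in order by a single sync_client.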

We don't spawn sync_client from cyrus.conf; we prefer systemd tricks:

/lib/systemd/system/bm-cyrus-syncclient@.service is a template, and we then enable:
systemctl enable bm-cyrus-syncclient@{0..3}
to spawn 4 sync_client processes.

Attached is a diff of what we changed.

As a side note, our usage forbids moving a mailbox folder into another mailbox (i.e. moving user.tom.titi into user.sga.stuff is forbidden in our setup). I guess this approach would be problematic when moving a mailbox subfolder to another mailbox, as the two might be sharded to separate log files.

Any feedback on this approach? I read that you planned to turn sync_client into a sync daemon. Any schedule estimate on that?

Regards,
Thomas.

sync_client systemd configuration template:
/lib/systemd/system/bm-cyrus-syncclient@.service (%i is expanded to 42 by systemd when you enable syncclient@42)

[Unit]
Description=BlueMind Cyrus sync_client service
After=bm-cyrus-imapd.service
PartOf=bm-cyrus-imapd.service
ConditionPathExists=!/etc/bm/bm-cyrus-imapd.disabled

[Service]
Type=forking
Environment=CONF=/etc/imapd.conf
ExecStartPre=/usr/bin/find /var/lib/cyrus/sync -name 'log*.%i' -type f -exec rm -f {} \;
ExecStart=/usr/sbin/sync_client -C $CONF -t 1800 -n core -i %i -l -r
SuccessExitStatus=75
RemainAfterExit=no
Restart=always
RestartSec=5s
TimeoutStopSec=20s

[Install]
WantedBy=bm-cyrus-imapd.service

Thomas Cataldo
Directeur Technique

(+33) 6 42 25 91 38

BlueMind
+33 (0)5 81 91 55 60
Hotel des Télécoms, 40 rue du village d'entreprises
31670 Labège, France
www.bluemind.net / https://blog.bluemind.net/fr/
-------------- next part --------------
A non-text attachment was scrubbed...
Name: replication_multiplexing.diff
Type: application/octet-stream
Size: 23847 bytes
Desc: not available
URL:
-------------- next part --------------

From brong at fastmailteam.com Thu Nov 21 10:25:17 2019
From: brong at fastmailteam.com (Bron Gondwana)
Date: Thu, 21 Nov 2019 23:25:17 +0800
Subject: Re: [RFC] multiplexing cyrus replication with log/log-run sharding & multiple sync_client
In-Reply-To: <62498D72-A5C0-4D9C-B675-3E263B6C66B7@bluemind.net>
References: <62498D72-A5C0-4D9C-B675-3E263B6C66B7@bluemind.net>
Message-ID: <67980f54-4ad8-46cd-89d3-6c597b465e18@www.fastmail.com>

Wow, interesting. That definitely works, though I'd probably normalise everything to the user ID so that the seen and mailbox events for the same user got the same channel.

We're looking at similar things for our setup too, either sharding or even per user logs with a daemon which farms users out to multiple channels.

As for when we'd look at a sync daemon: probably next year. We're planning to land uuid based storage soon, which means that renaming users and mailboxes is really fast, then looking at replication channels on top of that would make more sense, because otherwise user renames become tricky.

I'll have a look at the diff when it isn't 11:30pm for me.

Cheers,

Bron

--
  Bron Gondwana, CEO, Fastmail Pty Ltd
  brong at fastmailteam.com

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From brong at fastmailteam.com Thu Nov 21 10:37:49 2019
From: brong at fastmailteam.com (Bron Gondwana)
Date: Fri, 22 Nov 2019 02:37:49 +1100
Subject: Re: [RFC] multiplexing cyrus replication with log/log-run sharding & multiple sync_client
In-Reply-To: <67980f54-4ad8-46cd-89d3-6c597b465e18@www.fastmail.com>
References: <62498D72-A5C0-4D9C-B675-3E263B6C66B7@bluemind.net> <67980f54-4ad8-46cd-89d3-6c597b465e18@www.fastmail.com>
Message-ID: <5ce1a879-4e7e-4a16-8cc3-91d4eec50e8b@dogfood.fastmail.com>

Oh I should just add the other thing that you might be interested in that I've got some initial stabs at - synchronous replication.
Embedding the sync_client logic into mailbox commit such that any action that writes to a mailbox does a pass through and creates a "SYNC APPLY MAILBOX" dlist stanza and shoves it down the wire at a replica.

There's a couple of bits missing so far - it needs a way to upload the message content as well, which I'm probably just going to embed directly in the RECORD component - it's kind of ugly but it makes it a single DLIST - and obviously it needs to not fail entirely if the replica is down, so it might be fire and forget or it might have a timeout after which it syslogs but returns.

It builds on top of the existing SINCE_MODSEQ and SINCE_UIDNEXT logic that's already in master, and will also want a "sync cache" - which will store the remote MAILBOX line for each mailbox, so you can generate a SYNC APPLY without first having to do a SYNC GET to find the current remote state. Assuming the replica hasn't changed in the meanwhile, this will allow for single round trip apply of changes rather than the current 4 round trips for an APPEND. (truly, it's 4 round trips!)

S0 SYNCGET MAILBOX user.cassandane
S1 SYNCAPPLY RESERVE
S2 SYNCAPPLY MESSAGE
S3 SYNCAPPLY MAILBOX

In my plan, the "SYNCGET MAILBOX" would not be needed, because you'd already know the remote state. The "RESERVE" would not be needed because you'd already know from the local conversations.db that this message wasn't listed in any other mailbox or with a UID less than the UIDNEXT of the remote mailbox from that known remote state, so all you'd have left is the SYNCAPPLY MESSAGE and the SYNCAPPLY MAILBOX.

So it's just a matter of merging those into a single round trip with some nice combined format, and you're done :)

Bron.

--
  Bron Gondwana, CEO, Fastmail Pty Ltd
  brong at fastmailteam.com

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From brong at fastmailteam.com Sun Nov 24 08:06:29 2019
From: brong at fastmailteam.com (Bron Gondwana)
Date: Mon, 25 Nov 2019 00:06:29 +1100
Subject: CalendarEvents: should we patch the patch, or patch the model?
Message-ID: <3a17db21-4bae-4382-aeb2-1e9244d56cb2@dogfood.fastmail.com>

This is triggered by the fact that our current handling of updates to calendar participants on overridden events is broken, and the reason is this:

'recurrenceOverrides/2019-03-01T09:00:00/participants~1baz/participationStatus' => 'accepted',

This is a patch against an event with the following overrides.

"recurrenceOverrides": {
  "2019-02-01T09:00:00": {
    "duration": "PT2H"
  },
  "2019-03-01T09:00:00": {
    "participants/baz": {
      "@type": "Participant",
      "sendTo": {
        "imip": "mailto:baz@local"
      },
      "email": "baz@local",
      "name": "Baz",
      "kind": "individual",
      "roles": {
        "attendee": true
      },
      "locationId": "loc3",
      "participationStatus": "needs-action",
      "expectReply": true,
      "scheduleSequence": 0
    }
  }
},

And the following participants at the top level:

"participants": {
  "bar": {
    "@type": "Participant",
    "sendTo": {
      "imip": "mailto:bar@local"
    },
    "email": "bar@local",
    "name": "Bar",
    "kind": "individual",
    "roles": {
      "attendee": true
    },
    "locationId": "loc2",
    "participationStatus": "needs-action",
    "expectReply": true,
    "scheduleSequence": 0
  },
  "foo": {
    "@type": "Participant",
    "sendTo": {
      "imip": "mailto:foo@local"
    },
    "email": "foo@local",
    "name": "Foo",
    "kind": "individual",
    "roles": {
      "owner": true,
      "attendee": true,
      "chair": true
    },
    "locationId": "loc1",
    "participationStatus": "accepted",
    "expectReply": false,
    "scheduleSequence": 0,
    "participationComment": "Sure; see you \\"soon\\"!"
  }
},

Note that if 'baz' was also present on the top level, the patch with this syntax would be:

'recurrenceOverrides/2019-03-01T09:00:00/participants~1baz~1participationStatus' => 'accepted',

And theoretically if there were no participants on the top level event (not possible in this model) it would be:

'recurrenceOverrides/2019-03-01T09:00:00/participants/baz/participationStatus' => 'accepted',

Because there would be a rich participants object in the override.
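
To make the three path forms above concrete, here is a minimal sketch in Python of how the '~1' sequences arise. It assumes the patch paths follow JSON Pointer escaping (RFC 6901: '~' becomes '~0', '/' becomes '~1') with an implicit leading '/', as JMAP patch objects do; the helper names escape_segment and patch_path are made up for this example.

```python
def escape_segment(segment: str) -> str:
    """Escape one key for use as a JSON Pointer path segment (RFC 6901)."""
    # "~" must be escaped before "/", otherwise "~1" would be double-escaped.
    return segment.replace("~", "~0").replace("/", "~1")


def patch_path(segments) -> str:
    """Join already-split keys into a patch path with an implicit leading '/'."""
    return "/".join(escape_segment(s) for s in segments)


override = ["recurrenceOverrides", "2019-03-01T09:00:00"]

# The override stores the whole participant under the key "participants/baz".
print(patch_path(override + ["participants/baz", "participationStatus"]))

# The override stores only the key "participants/baz/participationStatus".
print(patch_path(override + ["participants/baz/participationStatus"]))

# The override holds a full "participants" object (the fully resolved model).
print(patch_path(override + ["participants", "baz", "participationStatus"]))
```

The three printed paths correspond, in order, to the three patches quoted above. Only the last one is independent of which keys the server happened to choose when it minimised the override.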

*My contention is this:*

We should always patch the model, not the wire representation. The wire representation is a compact representation of the event, in which each recurrence override is just a patch of what needs to be changed, but critically *an entire copy of the event is a valid patch as well*.

I would argue that we should always patch the overrides as if we were patching a complete copy of the event, so that the last format is always correct. Otherwise, we're assuming that the server always calculates the patch in the same way, and there's no guarantee of that.

This means the client would need to be changed rather than Cyrus to fix our production bug.

I have vague memories of discussing this with Robert and deciding that patching the entire model as if the data structure was fully resolved is the only sane approach, now that I come back and dig into this some more. This means that it doesn't matter how the calendar event is stored on the server, because it always has a way to expand every recurrence out to the full object, and then calculate the minimised version for storage and /get again.

This also means that we never get weird '~1' items in the patch paths when we update an event over the wire, which is kinda nice - and the same patch will still apply cleanly even if 'baz' got invited to the master event in the meantime.

Bron.

--
  Bron Gondwana, CEO, Fastmail Pty Ltd
  brong at fastmailteam.com

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From dilyan.palauzov at aegee.org Mon Nov 25 09:29:01 2019
From: dilyan.palauzov at aegee.org (Дилян Палаузов)
Date: Mon, 25 Nov 2019 14:29:01 +0000
Subject: The master janitor goes crazy / Re: Debugging Deadlocks
In-Reply-To: <92cca1d7baac62ef2b3cbe3f59a771796aba19dd.camel@aegee.org>
References: <92cca1d7baac62ef2b3cbe3f59a771796aba19dd.camel@aegee.org>
Message-ID: <78928faba6a46f1b60e31d29d1061668a372cda3.camel@aegee.org>

Hello,

> I run cyrus imap 3.0.x with some private changes.
>
> Sometimes when I stop the master process, the master process utilizes one CPU core to 100% for 5 minutes. After the fifth
> minute, systemd enforces kill -9. When I attach to the master process, I see that some janitor does some work, but I
> have not checked the details. Has anybody experienced this?

I run cyrus imap. At some moment I recompile and reinstall the binaries, which in theory means that the binaries detect this change and restart themselves. At some moment I call "systemctl stop cyrus-imap", which I guess sends SIGTERM to the master process. Then the CPU utilization of the master process goes to 100%. In the systemd service file I have TimeoutStopSec=320. After this time, the master process is still running and systemd sends 9/SIGKILL. Reinstalling the binaries before shutting down is not required for the CPU to go to 100%: it can also happen without reinstalling (and thus without triggering self-restarting of) the imapd/httpd binaries.

The 100% CPU loop is entered on shutdown often, but not always.

I have a webmail client and to speed things up it uses SquirrelMail's IMAP Proxy (http://www.imapproxy.org/, a caching IMAP proxy). It is recommended in the installation manual of Horde/IMP. The IMAP caching proxy connects to 127.0.0.2:143 (and is therefore permitted to skip the TLS overhead).
In master conf I have a line

  imaplocal cmd="imapd -C /usr/local/etc/cyrus/imapdlocal.conf" listen="127.0.0.2:imap" prefork=0

When the CPU goes to 100% on shutdown I connect with gdb to the master process. Below is the full backtrace. Does somebody have an explanation why the master process enters a never-ending loop?

I am not saying that all of the above information has to be relevant to the answer. Has somebody else experienced these effects? Any suggestions how to investigate this deeper?

Greetings
  Дилян

---
warning: Could not load vsyscall page because no executable was specified
Reading symbols from /usr/local/libexec/master...
Attaching to program: /usr/local/libexec/master, process 9247
Reading symbols from /usr/local/lib/libcyrus_min.so.0...
Reading symbols from /lib/libuuid.so.1...
Reading symbols from /usr/local/lib/libgssapi_krb5.so.2...
Reading symbols from /usr/local/lib/libkrb5.so.3...
Reading symbols from /usr/local/lib/libk5crypto.so.3...
Reading symbols from /usr/local/lib/libcom_err.so.3...
Reading symbols from /usr/local/lib/libkrb5support.so.0...
Reading symbols from /usr/local/lib/libpcreposix.so.0...
(No debugging symbols found in /usr/local/lib/libpcreposix.so.0)
Reading symbols from /usr/local/lib/libpcre.so.1...
(No debugging symbols found in /usr/local/lib/libpcre.so.1)
Reading symbols from /usr/local/lib/libxml2.so.2...
Reading symbols from /usr/local/lib/liblzma.so.5...
(No debugging symbols found in /usr/local/lib/liblzma.so.5)
Reading symbols from /usr/local/lib/libical.so.3...
Reading symbols from /usr/local/lib/libicalss.so.3...
Reading symbols from /usr/local/lib/libicalvcal.so.3...
Reading symbols from /usr/local/lib/libicui18n.so.63...
Reading symbols from /usr/local/lib/libicuuc.so.63...
Reading symbols from /usr/local/lib/libicudata.so.63...
(No debugging symbols found in /usr/local/lib/libicudata.so.63)
Reading symbols from /usr/local/lib/libsqlite3.so.0...
(No debugging symbols found in /usr/local/lib/libsqlite3.so.0)
Reading symbols from /usr/local/lib/libz.so.1...
(No debugging symbols found in /usr/local/lib/libz.so.1)
Reading symbols from /lib64/libm.so.6...
Reading symbols from /lib64/libdl.so.2...
Reading symbols from /lib64/libpthread.so.0...
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Reading symbols from /lib64/libc.so.6...
Reading symbols from /lib64/ld-linux-x86-64.so.2...
Reading symbols from /lib64/libresolv.so.2...
Reading symbols from /usr/local/lib/libdb-18.1.so...
Reading symbols from /usr/local/lib64/libstdc++.so.6...
Reading symbols from /usr/local/lib64/libgcc_s.so.1...
Reading symbols from /usr/local/lib64/libssl.so.1.1...
Reading symbols from /usr/local/lib64/libcrypto.so.1.1...
Reading symbols from /lib64/libnss_db.so.2...
Reading symbols from /lib64/libnss_files.so.2...
Reading symbols from /lib64/libnss_dns.so.2...
0x0000000000405406 in child_janitor (now=...) at master/master.c:1192
1192            janitor_position = janitor_position % child_table_size;
(gdb) bt f
  Id   Target Id                                  Frame
* 1    Thread 0x7f6a08759780 (LWP 9247) "master" 0x0000000000405406 in child_janitor (now=...) at master/master.c:1192
#0  0x0000000000405406 in child_janitor (now=...)
    at master/master.c:1192
        i = 9299
        p = 0x4132e0
        c = 0x0
#1  0x0000000000409dd7 in main (argc=4, argv=0x7ffea3075108) at master/master.c:2600
        i = 14
        ready_fds = 3
        total_children = 11
        tv = {
          tv_sec = 0,
          tv_usec = 0
        }
        msg = {
          message = 1,
          service_pid = 28219
        }
        maxfd = 41
        tvptr = 0x0
        interrupted = 0
        pidfile = 0x40c4f0 "/var/run/cyrus-master.pid"
        pidfile_lock = 0x2135ba0 "/usr/local/etc/cyrus/imapdlocal.conf"
        startup_pipe = {6, 7}
        pidlock_fd = -1
        i = 14
        opt = -1
        close_std = 1
        daemon_mode = 1
        error_log = 0x0
        alt_config = 0x0
        fd = 3
        rfds = {
          fds_bits = {266272, 0 }
        }
        r = 1
        now = {
          tv_sec = 1574690925,
          tv_usec = 958878
        }
        p = 0x0
quit
Detaching from program: /usr/local/libexec/master, process 9247
[Inferior 1 (process 9247) detached]

From ellie at fastmail.com Tue Nov 26 17:17:13 2019
From: ellie at fastmail.com (ellie timoney)
Date: Wed, 27 Nov 2019 09:17:13 +1100
Subject: The master janitor goes crazy / Re: Debugging Deadlocks
In-Reply-To: <78928faba6a46f1b60e31d29d1061668a372cda3.camel@aegee.org>
References: <92cca1d7baac62ef2b3cbe3f59a771796aba19dd.camel@aegee.org> <78928faba6a46f1b60e31d29d1061668a372cda3.camel@aegee.org>
Message-ID:

Can you strace the master process next time it's spinning at 100%? What is it doing at that time?

From ellie at fastmail.com Wed Nov 27 18:34:21 2019
From: ellie at fastmail.com (ellie timoney)
Date: Thu, 28 Nov 2019 10:34:21 +1100
Subject: The master janitor goes crazy / Re: Debugging Deadlocks
In-Reply-To:
References: <92cca1d7baac62ef2b3cbe3f59a771796aba19dd.camel@aegee.org> <78928faba6a46f1b60e31d29d1061668a372cda3.camel@aegee.org>
Message-ID: <25d97486-b257-44bb-b47a-3ddc9b16d5de@www.fastmail.com>

Saw something similar just now when I killed a cassandane run off prematurely.
One cyrus master process wound up spinning like this: pselect6(13, [8 9 11 12], NULL, NULL, NULL, {[], 8}) = 1 (in [11]) pselect6(13, [8 9 11 12], NULL, NULL, NULL, {[], 8}) = 1 (in [11]) pselect6(13, [8 9 11 12], NULL, NULL, NULL, {[], 8}) = 1 (in [11]) pselect6(13, [8 9 11 12], NULL, NULL, NULL, {[], 8}) = 1 (in [11]) pselect6(13, [8 9 11 12], NULL, NULL, NULL, {[], 8}) = 1 (in [11]) pselect6(13, [8 9 11 12], NULL, NULL, NULL, {[], 8}) = 1 (in [11]) pselect6(13, [8 9 11 12], NULL, NULL, NULL, {[], 8}) = 1 (in [11]) pselect6(13, [8 9 11 12], NULL, NULL, NULL, {[], 8}) = 1 (in [11]) pselect6(13, [8 9 11 12], NULL, NULL, NULL, {[], 8}) = 1 (in [11]) 0x0000555ac7124a97 in child_janitor (now=...) at master/master.c:1221 1221 janitor_position = janitor_position % child_table_size; (gdb) bt #0 0x0000555ac7124a97 in child_janitor (now=...) at master/master.c:1221 #1 0x0000555ac712a67a in main (argc=10, argv=0x7ffdc1fe78b8) at master/master.c:2812 Haven't dug further yet, but it looks similar to your report On Wed, Nov 27, 2019, at 9:17 AM, ellie timoney wrote: > Can you strace the master process next time it's spinning at 100%? > What is it doing at that time? > > On Tue, Nov 26, 2019, at 1:29 AM, ????? ???????? wrote: > > Hello, > > > > > I run cyrus imap 3.0.x with some private changes. > > > > > > Sometimes when stop the master process, the master process utilizes one CPU core to 100% for 5 minutes. After the fifth > > > minute, systemd enforces kill -9. When I attach to the maste process, I see that it some janitor does some work, but I > > > have not checked the details. Has anybody experienced this? > > > > I run cyrus imap. At some moment I recompile and reinstall the > > binaries, which in theory means that the binaries > > detect this change and restart theirselves. At some moment I call > > "systemctl stop cyrus-imap" which I guess sends > > SIGTERM to the master process. Then the CPU utilization of the master > > process goes to 100%. 
In the systemd service > > file I have TimeoutStopSec=320 . After this time, the master process > > continues running and systemd sends 9/SIGKILL. It > > is not necessary that on re-installing the binaries, and then shutting > > down the CPU goes to 100%: it is possible that > > the CPU goes to 100%, without reinstalling (and thus triggering > > self-restarting) of the imapd/httpd binaries. > > > > It is often, but not always, that this 100% CPU loop is entered on shutdown. > > > > I have a webmail client and to speedup things it uses SquirrelMail's > > IMAP Proxy (http://www.imapproxy.org/ a Caching > > IMAP proxy). It is recommended in the installation manual of > > Horde/IMP. The IMAP caching proxy connects to > > 127.0.0.2:143 (and is therefore permitted to skip the TLS overload). > > In master conf I have a line > > ?imaplocal cmd="imapd -C /usr/local/etc/cyrus/imapdlocal.conf" > > listen="127.0.0.2:imap" prefork=0?. > > > > When the CPU goes to 100% on shutdown I connect with gdb to the master > > process. Below is the full backtrace. Does > > somebody have an explanation why the master process enters a never > > ending loop? > > > > I do not say that all above information has to be involved in the > > anwer. Has somebody else experienced this effects? > > Any suggestions how to investigate this deeper? > > > > Greetings > > ????? > > > > --- > > warning: Could not load vsyscall page because no executable was > > specified > > Reading symbols from /usr/local/libexec/master... > > Attaching to program: /usr/local/libexec/master, process 9247 > > Reading symbols from /usr/local/lib/libcyrus_min.so.0... > > Reading symbols from /lib/libuuid.so.1... > > Reading symbols from /usr/local/lib/libgssapi_krb5.so.2... > > Reading symbols from /usr/local/lib/libkrb5.so.3... > > Reading symbols from /usr/local/lib/libk5crypto.so.3... > > Reading symbols from /usr/local/lib/libcom_err.so.3... > > Reading symbols from /usr/local/lib/libkrb5support.so.0... 
> > Reading symbols from /usr/local/lib/libpcreposix.so.0...
> > (No debugging symbols found in /usr/local/lib/libpcreposix.so.0)
> > Reading symbols from /usr/local/lib/libpcre.so.1...
> > (No debugging symbols found in /usr/local/lib/libpcre.so.1)
> > Reading symbols from /usr/local/lib/libxml2.so.2...
> > Reading symbols from /usr/local/lib/liblzma.so.5...
> > (No debugging symbols found in /usr/local/lib/liblzma.so.5)
> > Reading symbols from /usr/local/lib/libical.so.3...
> > Reading symbols from /usr/local/lib/libicalss.so.3...
> > Reading symbols from /usr/local/lib/libicalvcal.so.3...
> > Reading symbols from /usr/local/lib/libicui18n.so.63...
> > Reading symbols from /usr/local/lib/libicuuc.so.63...
> > Reading symbols from /usr/local/lib/libicudata.so.63...
> > (No debugging symbols found in /usr/local/lib/libicudata.so.63)
> > Reading symbols from /usr/local/lib/libsqlite3.so.0...
> > (No debugging symbols found in /usr/local/lib/libsqlite3.so.0)
> > Reading symbols from /usr/local/lib/libz.so.1...
> > (No debugging symbols found in /usr/local/lib/libz.so.1)
> > Reading symbols from /lib64/libm.so.6...
> > Reading symbols from /lib64/libdl.so.2...
> > Reading symbols from /lib64/libpthread.so.0...
> > [Thread debugging using libthread_db enabled]
> > Using host libthread_db library "/lib64/libthread_db.so.1".
> > Reading symbols from /lib64/libc.so.6...
> > Reading symbols from /lib64/ld-linux-x86-64.so.2...
> > Reading symbols from /lib64/libresolv.so.2...
> > Reading symbols from /usr/local/lib/libdb-18.1.so...
> > Reading symbols from /usr/local/lib64/libstdc++.so.6...
> > Reading symbols from /usr/local/lib64/libgcc_s.so.1...
> > Reading symbols from /usr/local/lib64/libssl.so.1.1...
> > Reading symbols from /usr/local/lib64/libcrypto.so.1.1...
> > Reading symbols from /lib64/libnss_db.so.2...
> > Reading symbols from /lib64/libnss_files.so.2...
> > Reading symbols from /lib64/libnss_dns.so.2...
> > 0x0000000000405406 in child_janitor (now=...)
> > at master/master.c:1192
> > 1192        janitor_position = janitor_position % child_table_size;
> > (gdb) bt f
> >   Id   Target Id         Frame
> > * 1    Thread 0x7f6a08759780 (LWP 9247) "master" 0x0000000000405406 in
> > child_janitor (now=...) at master/master.c:1192
> > #0  0x0000000000405406 in child_janitor (now=...) at
> > master/master.c:1192
> >         i = 9299
> >         p = 0x4132e0
> >         c = 0x0
> > #1  0x0000000000409dd7 in main (argc=4, argv=0x7ffea3075108) at
> > master/master.c:2600
> >         i = 14
> >         ready_fds = 3
> >         total_children = 11
> >         tv = {
> >           tv_sec = 0,
> >           tv_usec = 0
> >         }
> >         msg = {
> >           message = 1,
> >           service_pid = 28219
> >         }
> >         maxfd = 41
> >         tvptr = 0x0
> >         interrupted = 0
> >         pidfile = 0x40c4f0 "/var/run/cyrus-master.pid"
> >         pidfile_lock = 0x2135ba0 "/usr/local/etc/cyrus/imapdlocal.conf"
> >         startup_pipe = {6, 7}
> >         pidlock_fd = -1
> >         i = 14
> >         opt = -1
> >         close_std = 1
> >         daemon_mode = 1
> >         error_log = 0x0
> >         alt_config = 0x0
> >         fd = 3
> >         rfds = {
> >           fds_bits = {266272, 0 }
> >         }
> >         r = 1
> >         now = {
> >           tv_sec = 1574690925,
> >           tv_usec = 958878
> >         }
> >         p = 0x0
> > quit
> > Detaching from program: /usr/local/libexec/master, process 9247
> > [Inferior 1 (process 9247) detached]
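
For anyone reading the strace output at the top of this message: the repeated "pselect6(...) = 1 (in [11])" lines show the call returning at once, every time, with the same descriptor reported as readable. That is the usual signature of a descriptor that stays readable because nothing consumes what is waiting on it. The following is only a minimal Python sketch of that symptom under that assumption. It is not Cyrus code, the function name count_immediate_wakeups is made up for illustration, and it says nothing about why master.c ends up in this state.

```python
import os
import select


def count_immediate_wakeups(drain: bool, rounds: int = 5) -> int:
    """Count how many of `rounds` select() calls report the pipe readable.

    Hypothetical helper for illustration only; not part of Cyrus.
    """
    read_fd, write_fd = os.pipe()
    try:
        # One pending byte stands in for an unread message on the descriptor.
        os.write(write_fd, b"x")
        wakeups = 0
        for _round in range(rounds):
            # Zero timeout so the sketch never blocks. The strace shows a
            # NULL timeout, i.e. the real call blocks until something is ready.
            ready, _unused_w, _unused_x = select.select([read_fd], [], [], 0)
            if read_fd in ready:
                wakeups += 1
                if drain:
                    # Reading the byte clears the readable condition.
                    os.read(read_fd, 1)
        return wakeups
    finally:
        os.close(read_fd)
        os.close(write_fd)


if __name__ == "__main__":
    print("without draining:", count_immediate_wakeups(drain=False))
    print("with draining:   ", count_immediate_wakeups(drain=True))
```

Without draining, every round wakes up immediately, which is what a loop spinning at 100% CPU looks like from strace. With draining, only the first round does.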