From dilyan.palauzov at aegee.org Mon Dec 2 07:18:13 2019
From: dilyan.palauzov at aegee.org (Дилян Палаузов)
Date: Mon, 02 Dec 2019 12:18:13 +0000
Subject: The master janitor goes crazy / Re: Debugging Deadlocks
In-Reply-To: <25d97486-b257-44bb-b47a-3ddc9b16d5de@www.fastmail.com>
References: <92cca1d7baac62ef2b3cbe3f59a771796aba19dd.camel@aegee.org>
 <78928faba6a46f1b60e31d29d1061668a372cda3.camel@aegee.org>
 <25d97486-b257-44bb-b47a-3ddc9b16d5de@www.fastmail.com>
Message-ID:

Hello Ellie,

this is exactly what I see (countless pselect calls), but in my case the
second parameter of pselect is a much larger array. I just observed that on
killing master, it terminates all cyrus processes but two (httpd and
notifyd). I then tried to attach gdb to those processes. This does not work,
however, since the processes have become zombies.

Greetings
  Дилян

On Thu, 2019-11-28 at 10:34 +1100, ellie timoney wrote:
> Saw something similar just now when I killed a cassandane run off prematurely. One cyrus master process wound up spinning like this:
>
> pselect6(13, [8 9 11 12], NULL, NULL, NULL, {[], 8}) = 1 (in [11])
> pselect6(13, [8 9 11 12], NULL, NULL, NULL, {[], 8}) = 1 (in [11])
> pselect6(13, [8 9 11 12], NULL, NULL, NULL, {[], 8}) = 1 (in [11])
> pselect6(13, [8 9 11 12], NULL, NULL, NULL, {[], 8}) = 1 (in [11])
> pselect6(13, [8 9 11 12], NULL, NULL, NULL, {[], 8}) = 1 (in [11])
> pselect6(13, [8 9 11 12], NULL, NULL, NULL, {[], 8}) = 1 (in [11])
> pselect6(13, [8 9 11 12], NULL, NULL, NULL, {[], 8}) = 1 (in [11])
> pselect6(13, [8 9 11 12], NULL, NULL, NULL, {[], 8}) = 1 (in [11])
> pselect6(13, [8 9 11 12], NULL, NULL, NULL, {[], 8}) = 1 (in [11])
>
> 0x0000555ac7124a97 in child_janitor (now=...) at master/master.c:1221
> 1221        janitor_position = janitor_position % child_table_size;
> (gdb) bt
> #0  0x0000555ac7124a97 in child_janitor (now=...)
>     at master/master.c:1221
> #1  0x0000555ac712a67a in main (argc=10, argv=0x7ffdc1fe78b8)
>     at master/master.c:2812
>
> Haven't dug further yet, but it looks similar to your report
>
> On Wed, Nov 27, 2019, at 9:17 AM, ellie timoney wrote:
> > Can you strace the master process next time it's spinning at 100%?
> > What is it doing at that time?
> >
> > On Tue, Nov 26, 2019, at 1:29 AM, Дилян Палаузов wrote:
> > > Hello,
> > >
> > > > I run cyrus imap 3.0.x with some private changes.
> > > >
> > > > Sometimes when I stop the master process, it uses 100% of one CPU core for 5 minutes. After the fifth
> > > > minute, systemd enforces kill -9. When I attach to the master process, I see that some janitor does some work, but I
> > > > have not checked the details. Has anybody experienced this?
> > >
> > > I run cyrus imap. At some moment I recompile and reinstall the
> > > binaries, which in theory means that the binaries
> > > detect this change and restart themselves. At some moment I call
> > > "systemctl stop cyrus-imap" which I guess sends
> > > SIGTERM to the master process. Then the CPU utilization of the master
> > > process goes to 100%. In the systemd service
> > > file I have TimeoutStopSec=320. After this time, the master process
> > > is still running and systemd sends 9/SIGKILL.
> > > Reinstalling the binaries before shutting down is not required for
> > > the CPU to go to 100%: it also happens
> > > without reinstalling (and thus without triggering
> > > self-restart of the imapd/httpd binaries).
> > >
> > > This 100% CPU loop is entered often, but not always, on shutdown.
> > >
> > > I have a webmail client and to speed things up it uses SquirrelMail's
> > > IMAP Proxy (http://www.imapproxy.org/ a Caching
> > > IMAP proxy). It is recommended in the installation manual of
> > > Horde/IMP. The IMAP caching proxy connects to
> > > 127.0.0.2:143 (and is therefore permitted to skip the TLS overhead).
> > > In master conf I have a line
> > >   imaplocal cmd="imapd -C /usr/local/etc/cyrus/imapdlocal.conf"
> > >   listen="127.0.0.2:imap" prefork=0
> > >
> > > When the CPU goes to 100% on shutdown I attach gdb to the master
> > > process. Below is the full backtrace. Does
> > > somebody have an explanation why the master process enters a
> > > never-ending loop?
> > >
> > > I do not say that all the above information has to be relevant to the
> > > answer. Has somebody else experienced this effect?
> > > Any suggestions how to investigate this deeper?
> > >
> > > Greetings
> > >   Дилян
> > >
> > > ---
> > > warning: Could not load vsyscall page because no executable was
> > > specified
> > > Reading symbols from /usr/local/libexec/master...
> > > Attaching to program: /usr/local/libexec/master, process 9247
> > > Reading symbols from /usr/local/lib/libcyrus_min.so.0...
> > > Reading symbols from /lib/libuuid.so.1...
> > > Reading symbols from /usr/local/lib/libgssapi_krb5.so.2...
> > > Reading symbols from /usr/local/lib/libkrb5.so.3...
> > > Reading symbols from /usr/local/lib/libk5crypto.so.3...
> > > Reading symbols from /usr/local/lib/libcom_err.so.3...
> > > Reading symbols from /usr/local/lib/libkrb5support.so.0...
> > > Reading symbols from /usr/local/lib/libpcreposix.so.0...
> > > (No debugging symbols found in /usr/local/lib/libpcreposix.so.0)
> > > Reading symbols from /usr/local/lib/libpcre.so.1...
> > > (No debugging symbols found in /usr/local/lib/libpcre.so.1)
> > > Reading symbols from /usr/local/lib/libxml2.so.2...
> > > Reading symbols from /usr/local/lib/liblzma.so.5...
> > > (No debugging symbols found in /usr/local/lib/liblzma.so.5)
> > > Reading symbols from /usr/local/lib/libical.so.3...
> > > Reading symbols from /usr/local/lib/libicalss.so.3...
> > > Reading symbols from /usr/local/lib/libicalvcal.so.3...
> > > Reading symbols from /usr/local/lib/libicui18n.so.63...
> > > Reading symbols from /usr/local/lib/libicuuc.so.63...
> > > Reading symbols from /usr/local/lib/libicudata.so.63...
> > > (No debugging symbols found in /usr/local/lib/libicudata.so.63)
> > > Reading symbols from /usr/local/lib/libsqlite3.so.0...
> > > (No debugging symbols found in /usr/local/lib/libsqlite3.so.0)
> > > Reading symbols from /usr/local/lib/libz.so.1...
> > > (No debugging symbols found in /usr/local/lib/libz.so.1)
> > > Reading symbols from /lib64/libm.so.6...
> > > Reading symbols from /lib64/libdl.so.2...
> > > Reading symbols from /lib64/libpthread.so.0...
> > > [Thread debugging using libthread_db enabled]
> > > Using host libthread_db library "/lib64/libthread_db.so.1".
> > > Reading symbols from /lib64/libc.so.6...
> > > Reading symbols from /lib64/ld-linux-x86-64.so.2...
> > > Reading symbols from /lib64/libresolv.so.2...
> > > Reading symbols from /usr/local/lib/libdb-18.1.so...
> > > Reading symbols from /usr/local/lib64/libstdc++.so.6...
> > > Reading symbols from /usr/local/lib64/libgcc_s.so.1...
> > > Reading symbols from /usr/local/lib64/libssl.so.1.1...
> > > Reading symbols from /usr/local/lib64/libcrypto.so.1.1...
> > > Reading symbols from /lib64/libnss_db.so.2...
> > > Reading symbols from /lib64/libnss_files.so.2...
> > > Reading symbols from /lib64/libnss_dns.so.2...
> > > 0x0000000000405406 in child_janitor (now=...) at master/master.c:1192
> > > 1192        janitor_position = janitor_position % child_table_size;
> > > (gdb) bt f
> > >   Id   Target Id                                  Frame
> > > * 1    Thread 0x7f6a08759780 (LWP 9247) "master" 0x0000000000405406 in
> > > child_janitor (now=...) at master/master.c:1192
> > > #0  0x0000000000405406 in child_janitor (now=...)
> > >     at master/master.c:1192
> > >         i = 9299
> > >         p = 0x4132e0
> > >         c = 0x0
> > > #1  0x0000000000409dd7 in main (argc=4, argv=0x7ffea3075108) at
> > > master/master.c:2600
> > >         i = 14
> > >         ready_fds = 3
> > >         total_children = 11
> > >         tv = {
> > >           tv_sec = 0,
> > >           tv_usec = 0
> > >         }
> > >         msg = {
> > >           message = 1,
> > >           service_pid = 28219
> > >         }
> > >         maxfd = 41
> > >         tvptr = 0x0
> > >         interrupted = 0
> > >         pidfile = 0x40c4f0 "/var/run/cyrus-master.pid"
> > >         pidfile_lock = 0x2135ba0 "/usr/local/etc/cyrus/imapdlocal.conf"
> > >         startup_pipe = {6, 7}
> > >         pidlock_fd = -1
> > >         i = 14
> > >         opt = -1
> > >         close_std = 1
> > >         daemon_mode = 1
> > >         error_log = 0x0
> > >         alt_config = 0x0
> > >         fd = 3
> > >         rfds = {
> > >           fds_bits = {266272, 0 }
> > >         }
> > >         r = 1
> > >         now = {
> > >           tv_sec = 1574690925,
> > >           tv_usec = 958878
> > >         }
> > >         p = 0x0
> > > quit
> > > Detaching from program: /usr/local/libexec/master, process 9247
> > > [Inferior 1 (process 9247) detached]

From dilyan.palauzov at aegee.org Mon Dec 2 07:46:42 2019
From: dilyan.palauzov at aegee.org (Дилян Палаузов)
Date: Mon, 02 Dec 2019 12:46:42 +0000
Subject: cyrus.cache causes IOERROR: offset greater than cache size
Message-ID: <4efb00c86df2dbea21c332edfec5a5750fd62f0c.camel@aegee.org>

Hello,

sometimes I get these messages in the logs:

Dec 01 01:30:50 mail cyrus/cyr_expire[13952]: IOERROR: offset greater than cache size 5243456 2288(0)
Dec 01 01:30:50 mail cyrus/cyr_expire[13952]: IOERROR: invalid cache record for user.u1 uid 40568 (System I/O error)
Dec 01 01:30:54 mail cyrus/cyr_expire[13952]: IOERROR: offset greater than cache size 5244620 2288(0)
Dec 01 01:30:54 mail cyrus/cyr_expire[13952]: IOERROR: invalid cache record for user.u1 uid 40569 (System I/O error)
Dec 01 01:30:55 mail cyrus/cyr_expire[13952]: IOERROR: offset greater than cache size 5247552 2288(0)
Dec 01 01:30:55 mail cyrus/cyr_expire[13952]: IOERROR: invalid cache record for user.u1 uid 40571 (System I/O error)
Dec 01 01:30:55 mail cyrus/cyr_expire[13952]: IOERROR: invalid cache record for user.u1 uid 15463 (Mailbox format corruption detected)
Dec 01 01:30:55 mail cyrus/cyr_expire[13952]: IOERROR: cache entry truncated 1072 1835101728 2288(0)
Dec 01 01:30:55 mail cyrus/cyr_expire[13952]: IOERROR: invalid cache record for user.u1 uid 15464 (System I/O error)
Dec 01 01:30:55 mail cyrus/cyr_expire[13952]: IOERROR: cache entry truncated 2080 1131376244 2288(0)
Dec 01 01:30:55 mail cyrus/cyr_expire[13952]: IOERROR: invalid cache record for user.u1 uid 15465 (System I/O error)
Dec 01 01:30:55 mail cyrus/cyr_expire[13952]: IOERROR: offset greater than cache size (priority)3136 2288(0)
Dec 01 01:30:55 mail cyrus/cyr_expire[13952]: IOERROR: invalid cache record for user.u1 uid 15466 (System I/O error)
Dec 01 01:30:55 mail cyrus/cyr_expire[13952]: IOERROR: offset greater than cache size (priority)3976 2288(0)

Often they come from cyr_expire, but not always. They can also come from lmtpd.

When a cyrus.cache inconsistency is detected, the cyrus.cache is rebuilt. This
means reading a lot of files from the disk. During the reconstruction some
locks are held, so effectively a lot of processes (lmtpd, imapd, httpd) are
started and all of them wait for the lock to be released. This cache rebuild
sometimes seems to happen very often. The problem is that on slow hard disks
this repack operation can take hours and cyr_expire runs for hours.

My reading of the code is that new records are only appended to cyrus.cache
and there is some lock ensuring the consistency of the append operation.

I have not invested that much time in reading the code. How is expunging
supposed to happen with regard to cyrus.cache? Is cyrus.cache supposed to be
repacked whenever a message is unlink()ed, or where is the code for removing
entries from cyrus.cache?
How can I debug the cause of the invalid cache record?

I assume that the cached records are kept until the corresponding message
file is removed from the disk.

The cyr_expire output also contains:

Dec 01 01:37:37 mail cyrus/cyr_expire[13952]: IOERROR: conversations_audit on load: /var/imap//user/s/s2.conversations
B25572d90ed3363c1
0 (713535 1 0 0 () ((18 713534 1 1 0)) () PleaseconfirmyourNNNNregistrationnow. 0 ())

What am I supposed to do with this message?

Regards
  Дилян

From brong at fastmailteam.com Mon Dec 2 17:51:27 2019
From: brong at fastmailteam.com (Bron Gondwana)
Date: Tue, 03 Dec 2019 09:51:27 +1100
Subject: Changing JMAP IDs for Calendar and Contacts to be server generated
Message-ID:

Hi All,

This was discussed in today's Cyrus call, but I figured I should put it here
for a public note and to cover the discussion in more detail :)

Fastmail has a "caldav_sync" tool, which replicates calendars from outside.
Right now we rewrite the UID both ways in order to allow uniqueness of UIDs
within our system, because we also constrain each UID to only exist once in
all of a user's calendars (because of scheduling). This is variously buggy
and annoying.

Looking at various solutions for embedding mailboxid as well into the JMAP
id, we came to the conclusion that the best move was actually to generate
JMAP IDs synthetically on first receipt of a UID, and maintain that ID across
changes.

This has a couple of other good benefits:
* doesn't use random junk off the wire as part of ids
* can maintain the JMAP id even when moving between different calendars
* fixed length IDs for JMAP, whereas UIDs can be quite long from some services
* restricted character set means we don't have to escape parts of the UID (which is not ObjectId safe)

All together, a big win. The same as not using the Message-Id header from
emails, we won't use the UID from calendars or contacts.
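
The "generate on first receipt, then keep it" idea can be sketched as follows. This is only an illustration under stated assumptions, not the Cyrus implementation: the class name `JmapIdMap`, the in-memory dict standing in for a persistent store, and the choice of a random 32-character hex string as the id format are all hypothetical.

```python
import uuid


class JmapIdMap:
    """Hypothetical per-user map from a calendar/contact UID to a synthetic JMAP id."""

    def __init__(self):
        # In-memory stand-in for whatever persistent store would hold the mapping.
        self._by_uid = {}

    def id_for(self, uid: str) -> str:
        """Return the JMAP id for uid, generating one the first time the UID is seen."""
        jmap_id = self._by_uid.get(uid)
        if jmap_id is None:
            # 32 lowercase hex characters: fixed length, restricted character set,
            # and nothing taken from the UID that arrived off the wire.
            jmap_id = uuid.uuid4().hex
            self._by_uid[uid] = jmap_id
        return jmap_id


ids = JmapIdMap()
first = ids.id_for("a very long UID / with odd characters @example.com")
again = ids.id_for("a very long UID / with odd characters @example.com")
other = ids.id_for("another-uid")
```

Because the map is keyed by UID alone, and a UID exists only once across a user's calendars, the id does not change when the event moves to a different calendar.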
I'm looking at potential options for an upgrade path for existing events -
possibly even rewriting them on disk! It will definitely need a dav_db
rewrite.

Bron.

--
  Bron Gondwana, CEO, Fastmail Pty Ltd
  brong at fastmailteam.com

From me at anatoli.ws Mon Dec 2 23:05:35 2019
From: me at anatoli.ws (Anatoli)
Date: Tue, 3 Dec 2019 01:05:35 -0300
Subject: Cyrus webdav with Joplin
In-Reply-To:
References: <5d3f67e8-2faf-9cd8-a3a5-f4aba87861f2@anatoli.ws>
Message-ID: <18ddf123-6d07-a97b-483b-65a57778e39f@anatoli.ws>

The meth_mkcol function (and others in http_dav.c?) probably should be
checked thoroughly; it looks like under some conditions it would be better
to use different status codes. I'm forwarding this mail to cyrus-devel@ and
CC'ing Ken, who probably knows this part better than anyone.

WebDAV is an HTTP extension, so it is guided by the HTTP standard RFC 7231
(HTTP/1.1) with additions by its own standard RFC 4918 (WebDAV), and then
MKCOL is further extended by RFC 5689 (Extended MKCOL). To me it seems that
in some aspects the latter two contradict the first one.

From RFC 7231 (HTTP/1.1 [1]):

   The 403 (Forbidden) status code indicates that the server understood
   the request but refuses to *authorize* it.

From RFC 4918 (WebDAV [2]):

   403 (Forbidden) - This indicates at least one of two conditions: 1)
   the server does not allow the creation of collections at the given
   location in its URL namespace, or 2) the parent collection of the
   Request-URI exists but cannot accept members.

The second condition is what could be used here (the target URL can't accept
the specified member, which is the current behavior of Cyrus), but it has
nothing to do with authorization as defined by HTTP/1.1 for 403.

RFC 7231 (HTTP/1.1 [3]): The 405 (Method Not Allowed) status code indicates
that the method received in the request-line is known by the origin server
but *not supported by the target resource* ([2]), which in this case would
mean that the URI on which MKCOL is tried does not allow the MKCOL method at
all, which is not true.

From RFC 4918 (WebDAV [2]):

   405 (Method Not Allowed) - MKCOL can only be executed on an unmapped
   URL.

???

RFC 7231 (HTTP/1.1 [4]): The 409 (Conflict) status code indicates that the
request could not be completed due to a *conflict with the current state of
the target resource*, which in this case is the URI on which MKCOL is tried,
and this is exactly the case: the path already contains a collection, so "the
request could not be completed due to a conflict with the current state of
the target resource".

From RFC 4918 (WebDAV [2]):

   409 (Conflict) - A collection cannot be made at the Request-URI until
   one or more intermediate collections have been created. The server
   MUST NOT create those intermediate collections automatically.

Additionally, RFC 7231 (HTTP/1.1 [5]): The 404 (Not Found) status code
indicates that the origin server *did not find a current representation for
the target resource*, which IMO is the case when a/b is not found when a/b/c
creation is requested, but the WebDAV RFC says it's 409 Conflict, go figure.

BTW, citing the HTTP/1.1 RFC: The origin server MUST generate an Allow
header field in a 405 response containing a list of the target resource's
currently supported methods [3].
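
Read literally, the RFC 4918 wording quoted above for MKCOL could be sketched as the decision function below. This is only an illustration of the quoted rules, not how http_dav.c is structured: the names `mkcol_status` and `response_headers`, their parameters, and the list of allowed methods passed in are all made up for the example.

```python
from http import HTTPStatus


def mkcol_status(target_exists: bool, parent_exists: bool,
                 parent_accepts_members: bool = True) -> HTTPStatus:
    """Pick a MKCOL status code following the RFC 4918 section 9.3.1 wording."""
    if target_exists:
        # "MKCOL can only be executed on an unmapped URL."
        return HTTPStatus.METHOD_NOT_ALLOWED
    if not parent_exists:
        # Intermediate collections are missing and must not be auto-created.
        return HTTPStatus.CONFLICT
    if not parent_accepts_members:
        # The parent exists but cannot accept members.
        return HTTPStatus.FORBIDDEN
    return HTTPStatus.CREATED


def response_headers(status: HTTPStatus, allowed_methods: list) -> dict:
    """RFC 7231 requires an Allow header on a 405 response."""
    if status is HTTPStatus.METHOD_NOT_ALLOWED:
        return {"Allow": ", ".join(allowed_methods)}
    return {}


existing = mkcol_status(target_exists=True, parent_exists=True)
orphan = mkcol_status(target_exists=False, parent_exists=False)
headers = response_headers(existing, ["OPTIONS", "GET", "PROPFIND"])
```

Under this reading, a MKCOL on a collection that already exists gets 405 with an Allow header, which is what Joplin expects; the RFC 7231 reading argued for above would return 409 in that branch instead.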

[1] https://tools.ietf.org/html/rfc7231#section-6.5.3
[2] https://tools.ietf.org/html/rfc4918#section-9.3.1
[3] https://tools.ietf.org/html/rfc7231#section-6.5.5
[4] https://tools.ietf.org/html/rfc7231#section-6.5.8
[5] https://tools.ietf.org/html/rfc7231#section-6.5.4

On 2/12/19 07:13, Johan Hattne wrote:
> Hi Anatoli;
>
> Thanks for your reply; I'll be focusing on the MKCOL for now:
>
> I don't know about permission to overwrite quite yet, but from looking at the source it seems the break (at https://github.com/cyrusimap/cyrus-imapd/blob/master/imap/http_dav.c#L5590) is what causes HTTP_FORBIDDEN to be returned. Now looking at the code in the client (https://github.com/laurent22/joplin/blob/master/ReactNativeClient/lib/file-api-driver-webdav.js#L164) it appears Joplin is expecting 405, or possibly 409, given the explanation in the comment following line 164.
>
> Given all that, it would seem to me that Cyrus should possibly change the aforementioned break to a return HTTP_CONFLICT, or HTTP_NOT_ALLOWED if the comment in Joplin is correct. I haven't tested this yet (nor have I read the RFCs thoroughly), but I'd be happy to submit a pull request if this all checks out. Opinions?
>
> // Best wishes; Johan
>
>> On Dec 1, 2019, at 10:57, Anatoli wrote:
>>
>> Hi Johan,
>>
>> In RFC 7231 (HTTP 1.1) section 3.1.1.5
>> (https://tools.ietf.org/html/rfc7231#section-3.1.1.5) it says that the
>> Content-Type (CT) header SHOULD be present, otherwise the recipient may
>> interpret the body the way it wants, so IMO no problem on the Cyrus side
>> here. For application/json for example it MUST be present;
>> application/xml doesn't demand that, but not sending it IMO is not good
>> behavior for interoperability.
>>
>> For a collection that exists, does the user that makes the request have
>> the rights to overwrite the collection? If not, 403 is the correct SC
>> (status code).
>> 405 should be used when the specified method is not
>> allowed at all on the specified path, independently of the current
>> server state, which is not the case here.
>>
>> So, again IMO no problem on the Cyrus side here, but if the user has
>> sufficient rights, instead of 403 I'd use "409 Conflict" which is the
>> recommended SC when a record with specified ID/name already exists.
>>
>> Regards,
>> Anatoli
>>
>> On 28/11/19 04:40, Johan Hattne wrote:
>>> Dear all;
>>>
>>> I'm trying to get Joplin (https://joplinapp.org) to work with Cyrus's webdav module, and I've run into two issues:
>>>
>>> (1) When attempting to MKCOL a collection that already exists, Cyrus is responding with a 403, rather than a 405, which is what Joplin expects.
>>>
>>> (2) Cyrus returns an error if the Content-type isn't set where additional XML-formatted information is required in a POST to complete a request.
>>>
>>> My skimming of the relevant RFCs now leads me to believe that Cyrus is right on both counts; however, I don't know enough about this to say for sure. Can anyone here confirm, or are these genuine Cyrus bugs?
>>>
>>> // Best wishes; Johan
>>> ----
>>> Cyrus Home Page: http://www.cyrusimap.org/
>>> List Archives/Info: http://lists.andrew.cmu.edu/pipermail/info-cyrus/
>>> To Unsubscribe:
>>> https://lists.andrew.cmu.edu/mailman/listinfo/info-cyrus

From dilyan.palauzov at aegee.org Tue Dec 3 12:27:34 2019
From: dilyan.palauzov at aegee.org (Дилян Палаузов)
Date: Tue, 03 Dec 2019 17:27:34 +0000
Subject: cyrus.cache causes IOERROR: offset greater than cache size
In-Reply-To: <4efb00c86df2dbea21c332edfec5a5750fd62f0c.camel@aegee.org>
References: <4efb00c86df2dbea21c332edfec5a5750fd62f0c.camel@aegee.org>
Message-ID:

Hello,

it turned out that after emitting the messages below, cyrus.cache (3.0) was
not self-repaired and stayed bogus. In fact, reconstruct also does not repair
cyrus.cache, unless cyrus.index is deleted. If cyrus.index is present and
cyrus.cache is missing, reconstruct creates a four-byte file.

Greetings
  Дилян

On Mon, 2019-12-02 at 12:46 +0000, Дилян Палаузов wrote:
> Hello,
>
> sometimes I get these messages in the logs:
>
> Dec 01 01:30:50 mail cyrus/cyr_expire[13952]: IOERROR: offset greater than cache size 5243456 2288(0)
> Dec 01 01:30:50 mail cyrus/cyr_expire[13952]: IOERROR: invalid cache record for user.u1 uid 40568 (System I/O error)
> Dec 01 01:30:54 mail cyrus/cyr_expire[13952]: IOERROR: offset greater than cache size 5244620 2288(0)
> Dec 01 01:30:54 mail cyrus/cyr_expire[13952]: IOERROR: invalid cache record for user.u1 uid 40569 (System I/O error)
> Dec 01 01:30:55 mail cyrus/cyr_expire[13952]: IOERROR: offset greater than cache size 5247552 2288(0)
> Dec 01 01:30:55 mail cyrus/cyr_expire[13952]: IOERROR: invalid cache record for user.u1 uid 40571 (System I/O error)
> Dec 01 01:30:55 mail cyrus/cyr_expire[13952]: IOERROR: invalid cache record for user.u1 uid 15463 (Mailbox format
> corruption detected)
> Dec 01 01:30:55 mail cyrus/cyr_expire[13952]: IOERROR: cache entry truncated 1072 1835101728 2288(0)
> Dec 01 01:30:55 mail cyrus/cyr_expire[13952]: IOERROR: invalid cache record for user.u1 uid 15464 (System I/O error)
> Dec 01 01:30:55 mail cyrus/cyr_expire[13952]: IOERROR: cache entry truncated 2080 1131376244 2288(0)
> Dec 01 01:30:55 mail cyrus/cyr_expire[13952]: IOERROR: invalid cache record for user.u1 uid 15465 (System I/O error)
> Dec 01 01:30:55 mail cyrus/cyr_expire[13952]: IOERROR: offset greater than cache size (priority)3136 2288(0)
> Dec 01 01:30:55 mail cyrus/cyr_expire[13952]: IOERROR: invalid cache record for user.u1 uid 15466 (System I/O error)
> Dec 01 01:30:55 mail cyrus/cyr_expire[13952]: IOERROR: offset greater than cache size (priority)3976 2288(0)
>
> Often they come from cyr_expire, but not always. They can also come from lmtpd.
>
> When a cyrus.cache inconsistency is detected, the cyrus.cache is rebuilt. This means reading a lot of files from the
> disk. During the reconstruction some locks are held, so effectively a lot of processes (lmtpd, imapd, httpd) are
> started and all of them wait for the lock to be released. This cache rebuild sometimes seems to happen very often.
> The problem is that on slow hard disks this repack operation can take hours and cyr_expire runs for hours.
>
> My reading of the code is that new records are only appended to cyrus.cache and there is some lock ensuring the
> consistency of the append operation.
>
> I have not invested that much time in reading the code. How is expunging supposed to happen with regard to cyrus.cache?
> Is cyrus.cache supposed to be repacked whenever a message is unlink()ed, or where is the code for removing
> entries from cyrus.cache? How can I debug the cause of the invalid cache record?
>
> I assume that the cached records are kept until the corresponding message file is removed from the disk.
>
> The cyr_expire output also contains:
>
> Dec 01 01:37:37 mail cyrus/cyr_expire[13952]: IOERROR: conversations_audit on load: /var/imap//user/s/s2.conversations
> B25572d90ed3363c1
> 0 (713535 1 0 0 () ((18 713534 1 1 0)) () PleaseconfirmyourNNNNregistrationnow. 0 ())
>
> What am I supposed to do with this message?
>
> Regards
>   Дилян
> From ellie at fastmail.com Tue Dec 3 22:22:11 2019 From: ellie at fastmail.com (ellie timoney) Date: Wed, 04 Dec 2019 14:22:11 +1100 Subject: The master janitor goes crazy / Re: Debugging Deadlocks In-Reply-To: References: <92cca1d7baac62ef2b3cbe3f59a771796aba19dd.camel@aegee.org> <78928faba6a46f1b60e31d29d1061668a372cda3.camel@aegee.org> <25d97486-b257-44bb-b47a-3ddc9b16d5de@www.fastmail.com> Message-ID: So, using my strace output from the other week as an example: > pselect6(13, [8 9 11 12], NULL, NULL, NULL, {[], 8}) = 1 (in [11]) The arguments here are: * nfds: 13 (12+1) * readfds: [8 9 11 12] * writefds: NULL * exceptfds: NULL * timeout: NULL * sigmask: {[], 8} The interesting bits are: * we don't have a timeout (so this pselect would block forever if nothing became ready) * we're only waiting for fds to become readable (not writeable or having exceptions) * we don't have a sigmask set (empty array of 8-byte objects) The return value of 1 means that 1 of the fds was ready, and I surmise that "(in [11])" is telling us that it was fd 11 from the readfds set that was ready (for reading). The fact these pselect calls are all the same tells me that either: a lot is happening on fd 11 and we're not keeping up, or that there's data waiting on fd 11 and we keep ignoring it (so it keeps telling us it's there). The gdb backtrace isn't really useful here I don't think, I think it's coincidental that when we each attached a debugger we both happened to be at that particular line in child_janitor. Once we're in shutdown, child_janitor is the only thing doing much work, and that line is the top of its loop. I think the really useful information to collect next time this happens (and while the master process is still running) is: * What does lsof tell us about that ready file descriptor (in the example, fd 11)? I would be very interested to know if it's a client socket, or an internal messaging socket (that service processes use to tell master their status). 
* If you can attach a debugger and step through a couple of iterations of master's big "for (;;) {" loop, what path is it taking? What decisions is it making? * Without the debugger, if you let it run like this for 30 seconds or more, does a syslog line like this appear? https://github.com/cyrusimap/cyrus-imapd/blob/96d194de82d3dbe124a359069bd21f5cba7519ba/master/master.c#L1240-L1244 Cheers, ellie On Mon, Dec 2, 2019, at 11:18 PM, ????? ???????? wrote: > Hello Ellie, > > this is exactly what I see (countless pselect calls), but I have as > second parameter of pselect a much larger array. I > just observed that on killing master, it terminates all cyrus processes > but two (httpd and notifyd). Then I try to > connect to that processes (gdb). This does not work, however, since > the processes are moved to zombie status. > > Greetings > ????? > > On Thu, 2019-11-28 at 10:34 +1100, ellie timoney wrote: > > Saw something similar just now when I killed a cassandane run off prematurely. One cyrus master process wound up spinning like this: > > > > pselect6(13, [8 9 11 12], NULL, NULL, NULL, {[], 8}) = 1 (in [11]) > > pselect6(13, [8 9 11 12], NULL, NULL, NULL, {[], 8}) = 1 (in [11]) > > pselect6(13, [8 9 11 12], NULL, NULL, NULL, {[], 8}) = 1 (in [11]) > > pselect6(13, [8 9 11 12], NULL, NULL, NULL, {[], 8}) = 1 (in [11]) > > pselect6(13, [8 9 11 12], NULL, NULL, NULL, {[], 8}) = 1 (in [11]) > > pselect6(13, [8 9 11 12], NULL, NULL, NULL, {[], 8}) = 1 (in [11]) > > pselect6(13, [8 9 11 12], NULL, NULL, NULL, {[], 8}) = 1 (in [11]) > > pselect6(13, [8 9 11 12], NULL, NULL, NULL, {[], 8}) = 1 (in [11]) > > pselect6(13, [8 9 11 12], NULL, NULL, NULL, {[], 8}) = 1 (in [11]) > > > > 0x0000555ac7124a97 in child_janitor (now=...) at master/master.c:1221 > > 1221 janitor_position = janitor_position % child_table_size; > > (gdb) bt > > #0 0x0000555ac7124a97 in child_janitor (now=...) 
at master/master.c:1221 > > #1 0x0000555ac712a67a in main (argc=10, argv=0x7ffdc1fe78b8) > > at master/master.c:2812 > > > > Haven't dug further yet, but it looks similar to your report > > > > On Wed, Nov 27, 2019, at 9:17 AM, ellie timoney wrote: > > > Can you strace the master process next time it's spinning at 100%? > > > What is it doing at that time? > > > > > > On Tue, Nov 26, 2019, at 1:29 AM, ????? ???????? wrote: > > > > Hello, > > > > > > > > > I run cyrus imap 3.0.x with some private changes. > > > > > > > > > > Sometimes when stop the master process, the master process utilizes one CPU core to 100% for 5 minutes. After the fifth > > > > > minute, systemd enforces kill -9. When I attach to the maste process, I see that it some janitor does some work, but I > > > > > have not checked the details. Has anybody experienced this? > > > > > > > > I run cyrus imap. At some moment I recompile and reinstall the > > > > binaries, which in theory means that the binaries > > > > detect this change and restart theirselves. At some moment I call > > > > "systemctl stop cyrus-imap" which I guess sends > > > > SIGTERM to the master process. Then the CPU utilization of the master > > > > process goes to 100%. In the systemd service > > > > file I have TimeoutStopSec=320 . After this time, the master process > > > > continues running and systemd sends 9/SIGKILL. It > > > > is not necessary that on re-installing the binaries, and then shutting > > > > down the CPU goes to 100%: it is possible that > > > > the CPU goes to 100%, without reinstalling (and thus triggering > > > > self-restarting) of the imapd/httpd binaries. > > > > > > > > It is often, but not always, that this 100% CPU loop is entered on shutdown. > > > > > > > > I have a webmail client and to speedup things it uses SquirrelMail's > > > > IMAP Proxy (http://www.imapproxy.org/ a Caching > > > > IMAP proxy). It is recommended in the installation manual of > > > > Horde/IMP. 
The IMAP caching proxy connects to > > > > 127.0.0.2:143 (and is therefore permitted to skip the TLS overload). > > > > In master conf I have a line > > > > ?imaplocal cmd="imapd -C /usr/local/etc/cyrus/imapdlocal.conf" > > > > listen="127.0.0.2:imap" prefork=0?. > > > > > > > > When the CPU goes to 100% on shutdown I connect with gdb to the master > > > > process. Below is the full backtrace. Does > > > > somebody have an explanation why the master process enters a never > > > > ending loop? > > > > > > > > I do not say that all above information has to be involved in the > > > > anwer. Has somebody else experienced this effects? > > > > Any suggestions how to investigate this deeper? > > > > > > > > Greetings > > > > ????? > > > > > > > > --- > > > > warning: Could not load vsyscall page because no executable was > > > > specified > > > > Reading symbols from /usr/local/libexec/master... > > > > Attaching to program: /usr/local/libexec/master, process 9247 > > > > Reading symbols from /usr/local/lib/libcyrus_min.so.0... > > > > Reading symbols from /lib/libuuid.so.1... > > > > Reading symbols from /usr/local/lib/libgssapi_krb5.so.2... > > > > Reading symbols from /usr/local/lib/libkrb5.so.3... > > > > Reading symbols from /usr/local/lib/libk5crypto.so.3... > > > > Reading symbols from /usr/local/lib/libcom_err.so.3... > > > > Reading symbols from /usr/local/lib/libkrb5support.so.0... > > > > Reading symbols from /usr/local/lib/libpcreposix.so.0... > > > > (No debugging symbols found in /usr/local/lib/libpcreposix.so.0) > > > > Reading symbols from /usr/local/lib/libpcre.so.1... > > > > (No debugging symbols found in /usr/local/lib/libpcre.so.1) > > > > Reading symbols from /usr/local/lib/libxml2.so.2... > > > > Reading symbols from /usr/local/lib/liblzma.so.5... > > > > (No debugging symbols found in /usr/local/lib/liblzma.so.5) > > > > Reading symbols from /usr/local/lib/libical.so.3... > > > > Reading symbols from /usr/local/lib/libicalss.so.3... 
> > > > Reading symbols from /usr/local/lib/libicalvcal.so.3... > > > > Reading symbols from /usr/local/lib/libicui18n.so.63... > > > > Reading symbols from /usr/local/lib/libicuuc.so.63... > > > > Reading symbols from /usr/local/lib/libicudata.so.63... > > > > (No debugging symbols found in /usr/local/lib/libicudata.so.63) > > > > Reading symbols from /usr/local/lib/libsqlite3.so.0... > > > > (No debugging symbols found in /usr/local/lib/libsqlite3.so.0) > > > > Reading symbols from /usr/local/lib/libz.so.1... > > > > (No debugging symbols found in /usr/local/lib/libz.so.1) > > > > Reading symbols from /lib64/libm.so.6... > > > > Reading symbols from /lib64/libdl.so.2... > > > > Reading symbols from /lib64/libpthread.so.0... > > > > [Thread debugging using libthread_db enabled] > > > > Using host libthread_db library "/lib64/libthread_db.so.1". > > > > Reading symbols from /lib64/libc.so.6... > > > > Reading symbols from /lib64/ld-linux-x86-64.so.2... > > > > Reading symbols from /lib64/libresolv.so.2... > > > > Reading symbols from /usr/local/lib/libdb-18.1.so... > > > > Reading symbols from /usr/local/lib64/libstdc++.so.6... > > > > Reading symbols from /usr/local/lib64/libgcc_s.so.1... > > > > Reading symbols from /usr/local/lib64/libssl.so.1.1... > > > > Reading symbols from /usr/local/lib64/libcrypto.so.1.1... > > > > Reading symbols from /lib64/libnss_db.so.2... > > > > Reading symbols from /lib64/libnss_files.so.2... > > > > Reading symbols from /lib64/libnss_dns.so.2... > > > > 0x0000000000405406 in child_janitor (now=...) at master/master.c:1192 > > > > 1192 janitor_position = janitor_position % child_table_size; > > > > ?(gdb) bt f > > > > Id Target Id Frame > > > > * 1 Thread 0x7f6a08759780 (LWP 9247) "master" 0x0000000000405406 in > > > > child_janitor (now=...) at master/master.c:1192 > > > > #0 0x0000000000405406 in child_janitor (now=...) 
> > > > at master/master.c:1192
> > > >         i = 9299
> > > >         p = 0x4132e0
> > > >         c = 0x0
> > > > #1 0x0000000000409dd7 in main (argc=4, argv=0x7ffea3075108) at
> > > > master/master.c:2600
> > > >         i = 14
> > > >         ready_fds = 3
> > > >         total_children = 11
> > > >         tv = {
> > > >           tv_sec = 0,
> > > >           tv_usec = 0
> > > >         }
> > > >         msg = {
> > > >           message = 1,
> > > >           service_pid = 28219
> > > >         }
> > > >         maxfd = 41
> > > >         tvptr = 0x0
> > > >         interrupted = 0
> > > >         pidfile = 0x40c4f0 "/var/run/cyrus-master.pid"
> > > >         pidfile_lock = 0x2135ba0 "/usr/local/etc/cyrus/imapdlocal.conf"
> > > >         startup_pipe = {6, 7}
> > > >         pidlock_fd = -1
> > > >         i = 14
> > > >         opt = -1
> > > >         close_std = 1
> > > >         daemon_mode = 1
> > > >         error_log = 0x0
> > > >         alt_config = 0x0
> > > >         fd = 3
> > > >         rfds = {
> > > >           fds_bits = {266272, 0 }
> > > >         }
> > > >         r = 1
> > > >         now = {
> > > >           tv_sec = 1574690925,
> > > >           tv_usec = 958878
> > > >         }
> > > >         p = 0x0
> > > > quit
> > > > Detaching from program: /usr/local/libexec/master, process 9247
> > > > [Inferior 1 (process 9247) detached]

From ellie at fastmail.com Tue Dec 3 22:27:17 2019
From: ellie at fastmail.com (ellie timoney)
Date: Wed, 04 Dec 2019 14:27:17 +1100
Subject: Cyrus IMAPd version 3.1.8
Message-ID: <2f1624ab-d716-499f-9c97-72fa2edc8ff8@www.fastmail.com>

The Cyrus team is pleased to announce the immediate availability of a new version of Cyrus IMAP: 3.1.8

This is a snapshot of the master branch, and should be considered for testing purposes and bleeding-edge features only.

It is available as a git tag, which can be found here:

    https://github.com/cyrusimap/cyrus-imapd/releases/tag/cyrus-imapd-3.1.8

Join us on Github at https://github.com/cyrusimap/cyrus-imapd to report issues, join in the deliberations of new features for the next Cyrus IMAP release, and to contribute to the documentation.

On behalf of the Cyrus team,

ellie

From David.Luong at interoptechnologies.com Wed Dec 4 12:37:19 2019
From: David.Luong at interoptechnologies.com (Luong, David)
Date: Wed, 4 Dec 2019 12:37:19 -0500
Subject: Error building cyrus-imapd-3.1.8
Message-ID:

Hi,

I'm building with the following options:

$ autoreconf -i -s
$ ./configure --enable-http --enable-jmap --enable-autocreate --enable-murder --enable-idled --enable-xapian --prefix=/usr/cyrus
$ make

The build is not completed with the following errors.

make[2]: Leaving directory `/cyrus/cyrus-imapd-3.1.8'
Making all in perl/annotator
make[2]: Entering directory `/cyrus/cyrus-imapd-3.1.8/perl/annotator'
make[2]: *** No rule to make target `all'. Stop.
make[2]: Leaving directory `/cyrus/cyrus-imapd-3.1.8/perl/annotator'
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory `/cyrus/cyrus-imapd-3.1.8'
make: *** [all] Error 2

Please advise.

Regards,
David.

From brong at fastmailteam.com Thu Dec 5 06:52:36 2019
From: brong at fastmailteam.com (Bron Gondwana)
Date: Thu, 05 Dec 2019 22:52:36 +1100
Subject: Cyrus IMAPd version 3.1.8
In-Reply-To: <2f1624ab-d716-499f-9c97-72fa2edc8ff8@www.fastmail.com>
References: <2f1624ab-d716-499f-9c97-72fa2edc8ff8@www.fastmail.com>
Message-ID: <44ddb82b-b952-4227-9ee9-31e19c3a3c84@dogfood.fastmail.com>

FYI: this is almost exactly what Fastmail is running in production right now - it has about 4 minor commits beyond current production, and is missing the handful of Fastmail specific magic!

OldRev: cyrus-imapd-3.1.8
NewRev: fmstable-20191203v1

Removes the following commits:
44210c59 2019-12-02 brong: jmap: don't need to update the alive value, mailbox.c already did that on write
463f6291 2019-12-03 brong: caldav: fix bogus && to & in read_cb
41f6117e 2019-12-02 brong: caldav: scheduling enabled should always be checked on the shared annotation (aka: owner)
3d69f08c 2019-12-03 rsto: jmap_mail: report "xapian" perf filter for contact group searches
469cacc0 2019-12-04 rsto: jmap_mail: move Identity data to jmap:submission capability
dc91d2d4 2019-12-04 rsto: jmap_mail: reject mutable search in queryChanges
0b757d88 2019-12-04 rsto: jmap_mail_query: don't crash for nested multipart alternatives
9b30ee3f 2019-12-04 ellie: release notes for 3.1.8
18d157e0 2019-12-04 ellie: fix cve link in 3.1.7 release notes
96d194de 2019-12-04 ellie: developer release 3.1.8

Adds the following commits:
1c6ed3ad 2015-03-30 brong: Fastmail Secrets (no rated)
5ba2dbee 2015-03-30 brong: Fastmail ONLY - make assertion failures and fatal errors into coredumps
26d563c1 2015-03-30 brong: Fastmail ONLY - Remove sieve action string
0e71d55c 2017-08-18 brong: Fastmail ONLY - don't fiddle timezone data in http_caldav_sched.c
61da4794 2018-06-26 brong: Fastmail ONLY - re-apply the VEVENTS ONLY patch for alarms
c51f3989 2019-02-06 rsto: Fastmail ONLY - mailbox owners always have ACL_ADMIN in JMAP
2f3e9516 2015-08-07 brong: mkdebian: fastmail build script (v29)

This is from the attached "GitBranchDiff" script. All the commits listed as only being on "master" will be merged into Fastmail production next week when we rebase our build on the 3.1.8 tag (and possibly more changes from master too)

Cheers,

Bron.

On Wed, Dec 4, 2019, at 14:27, ellie timoney wrote:
> The Cyrus team is pleased to announce the immediate availability of a new version of Cyrus IMAP: 3.1.8
>
> This is a snapshot of the master branch, and should be considered for testing purposes and bleeding-edge features only.
>
> It is available as a git tag, which can be found here:
>
> https://github.com/cyrusimap/cyrus-imapd/releases/tag/cyrus-imapd-3.1.8
>
> Join us on Github at https://github.com/cyrusimap/cyrus-imapd to report issues, join in the deliberations of new features for the next Cyrus IMAP release, and to contribute to the documentation.
>
> On behalf of the Cyrus team,
>
> ellie

--
  Bron Gondwana, CEO, Fastmail Pty Ltd
  brong at fastmailteam.com

-------------- next part --------------
A non-text attachment was scrubbed...
Name: GitBranchDiff.pl
Type: application/x-perl
Size: 1865 bytes
Desc: not available

From ellie at fastmail.com Thu Dec 5 17:30:13 2019
From: ellie at fastmail.com (ellie timoney)
Date: Fri, 06 Dec 2019 09:30:13 +1100
Subject: Error building cyrus-imapd-3.1.8
In-Reply-To:
References:
Message-ID:

Hi David,

That smells like a missing dependency. Have you reviewed https://www.cyrusimap.org/dev/imap/developer/compiling.html ?

Looking at the error, and glancing at the dependencies list, I wonder if you need 'perl-devel'. It's listed as a developer-only dependency, but because you're building from a git tag and not a distribution tarball (where some things with tricky dependencies have been pre-compiled), you will probably need some or all of the developer dependencies as well.

It would be nice if configure would report the missing dependency, instead of succeeding and then letting the build fail. If you can track down which missing dependency caused this problem, please let us know and I'll update configure to complain about it. :)

Cheers,

ellie

On Thu, Dec 5, 2019, at 4:37 AM, Luong, David wrote:
> Hi,
>
> I'm building with the following options:
>
> $ autoreconf -i -s
> $ ./configure --enable-http --enable-jmap --enable-autocreate --enable-murder --enable-idled --enable-xapian --prefix=/usr/cyrus
> $ make
>
> The build is not completed with the following errors.
>
> make[2]: Leaving directory `/cyrus/cyrus-imapd-3.1.8'
> Making all in perl/annotator
> make[2]: Entering directory `/cyrus/cyrus-imapd-3.1.8/perl/annotator'
> make[2]: *** No rule to make target `all'. Stop.
> make[2]: Leaving directory `/cyrus/cyrus-imapd-3.1.8/perl/annotator'
> make[1]: *** [all-recursive] Error 1
> make[1]: Leaving directory `/cyrus/cyrus-imapd-3.1.8'
> make: *** [all] Error 2
>
> Please advise.
>
> Regards,
> David.

From dilyan.palauzov at aegee.org Tue Dec 10 17:52:44 2019
From: dilyan.palauzov at aegee.org (Дилян Палаузов)
Date: Tue, 10 Dec 2019 22:52:44 +0000
Subject: 8074597e mailbox.c: release cache files when locking index
Message-ID:

Hello,

I have the problem that on 3.0 the cyrus.cache sometimes gets truncated to 4 bytes (usually, but not always, by cyr_expire). Reconstructing cyrus.cache for the Inbox then takes a great deal of I/O and time, and after a while cyrus.cache gets truncated again.

Does this commit https://github.com/cyrusimap/cyrus-imapd/commit/e8074597e84cfb62cc fix the problem? Is it useful for 3.0? Does somebody see similar problems in syslog? What problem does this commit solve?

Greetings
Дилян

From David.Luong at interoptechnologies.com Wed Dec 11 18:40:17 2019
From: David.Luong at interoptechnologies.com (Luong, David)
Date: Wed, 11 Dec 2019 18:40:17 -0500
Subject: Error building cyrus-imapd-3.1.8
In-Reply-To:
References:
Message-ID:

Hi Ellie,

I finally resolved the dependencies with the following packages.

$ yum install python-docutils.noarch -y
$ yum install python-sphinx -y
$ yum install python-pygments.noarch -y
$ yum install python3-pip.noarch -y

Regards,
David.
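
A quick way to confirm that the packages listed above are actually visible to the interpreter is sketched below. This is only an illustration: the mapping from yum package names to importable module names (docutils, sphinx, pygments, pip) is an assumption that is not stated anywhere in this thread, and the helper name missing_modules is made up for the example. Run it with the same Python interpreter the build uses, since a package installed for one Python version is not visible to another.

```python
import importlib.util

# Hypothetical mapping from the yum packages named above to the Python
# modules they are assumed to provide; adjust it for your distribution.
ASSUMED_MODULES = {
    "python-docutils": "docutils",
    "python-sphinx": "sphinx",
    "python-pygments": "pygments",
    "python3-pip": "pip",
}


def missing_modules(package_to_module):
    """Return the package names whose module the importer cannot find."""
    missing = []
    for package, module in package_to_module.items():
        # find_spec returns None when a top-level module is not installed.
        if importlib.util.find_spec(module) is None:
            missing.append(package)
    return missing


if __name__ == "__main__":
    for package in missing_modules(ASSUMED_MODULES):
        print(f"possibly missing: {package}")
```

An empty result only says the modules can be found; it does not prove that the build will succeed.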

From rjbs at fastmailteam.com Fri Dec 13 09:59:03 2019
From: rjbs at fastmailteam.com (Ricardo Signes)
Date: Fri, 13 Dec 2019 09:59:03 -0500
Subject: yearly release cycle
Message-ID: <76ffb8a6-9204-445e-98e8-5ac19e4e8a3f@dogfood.fastmail.com>

Hey, remember last month when I asked about releasing Cyrus v3.2?

That thread had some more conversation about what needs to get done before v3.2, and I wanted to come back to it and turn some things on their head.

Right now, we're talking about Cyrus releases being feature-bound. "We'll release v3.2 when feature X is done." I think we're not being well-served by that. As feature X is delayed (for various reasons that we can't easily eliminate), it doesn't just delay the feature, but also all the other minor bugfixes and optimizations that we've made in the master branch. Also, it sets up the idea that we delay releases for the sake of fixes, instead of releasing the fixes that are ready.

That is: every additional criterion for a new release is another doorway to delay. Instead of opening those doors, I would rather try to eliminate all of them.

I propose that instead of tying releases to milestones, we tie them to the calendar. For the sake of full disclosure: I am modeling this suggestion on the release cycle of perl, which I ran for several years. I found the process more than satisfactory, then.

1. A new *unstable release* of Cyrus is made every month. We promise only that it compiled and passed the Cassandane test suite on the release manager's computer. It might contain regressions from previous unstable releases, it might have crashers or corruptors. We try to avoid any of these, but the goal here is a snapshot for easy month-to-month testing. These are the odd-middle-digit releases. (3.3.x)

2. A new *major release* of Cyrus is made every year. We will have tested it on as many configurations as we can readily test. We will have, some time before the release, frozen the branch for risky changes, to reduce churn.
In the meantime, new work lives in feature branches. (The changelogs from each unstable release provide a good basis for the whole-year changelog!) These are the even-middle-digit third-digit-zero releases. (3.4.0)

3. A new *maintenance release* of Cyrus is made for the last two stable releases when there are enough fixes to critical bugs to warrant it. These are the even-middle-digit third-digit-nonzero releases (3.4.1)

For the above to work, some more properties need to be maintained.

Maintenance releases should be no-brainers to install, so they must only fix regressions, crashers, security vulnerabilities, and the like. This means that once you're on 3.4.0, you can always upgrade within the 3.4 series with a minimum risk. It also means you get no optimizations, features, and the like.

Major releases must clearly document any incompatible changes or upgrade steps required. Because non-regression bugfixes aren't backported, we want everyone to be able to upgrade from major release to major release, so incompatible changes must be kept to a minimum.

In part, this is just "don't kill off a feature people use just because it's a little annoying." The more important one is "don't introduce half-baked things that might need to change," because people will come to rely on them before you get the updates finished. For features that will require multiple years to get right, they have to go behind a default-off configuration option. I'd strongly suggest they all have a uniform substring like "unstable". That way, when a complaint comes in that the behavior of JMAP calendaring has changed, we can reply, "well, to use it, you had to turn on the unstable_jmap_calendaring" option.

If we go with this policy, we'll need to...

1. identify what issues are *blockers* to v3.2.0, meaning they're regressions from v3.0 and would reasonably prevent someone from upgrading; this does *not* include all known bugs, since they may be bugs that already exist in the last stable release!

2. pick a release target for v3.2.0; I will arbitrarily suggest March 2 as "not too far off, but far off enough that we can get things in order"; also, if you're American, March 2 is 3/2 ;-)

3. produce a changelog, and especially identify what changes in master need documentation as "incompatible changes"

4. produce a list of changes in master that should be put behind an unstable configuration option and then do it

5. decide when to stop merging non-release-related things to master

6. make a plan for who will do monthly snapshot releases

I've spoken with ellie and Bron about just a few of these, such that I don't think it's all crazy. (ellie notes, correctly, I think, that the first set of releases like this will be the hard ones, where we work out things like "how do we keep track of incompatibilities, upgrade steps, and also how do we make snapshots dead easy to release.") If there's general agreement, I am definitely ready to pitch in and help try to make it work!

--
rjbs

From murch at fastmail.com Fri Dec 13 10:12:34 2019
From: murch at fastmail.com (Ken Murchison)
Date: Fri, 13 Dec 2019 10:12:34 -0500
Subject: yearly release cycle
In-Reply-To: <76ffb8a6-9204-445e-98e8-5ac19e4e8a3f@dogfood.fastmail.com>
References: <76ffb8a6-9204-445e-98e8-5ac19e4e8a3f@dogfood.fastmail.com>
Message-ID:

This all seems reasonable to me and I'm in favor of moving forward with this plan.

On 12/13/19 9:59 AM, Ricardo Signes wrote:
>
> Hey, remember last month when I asked about releasing Cyrus v3.2?
>
> That thread had some more conversation about what needs to get done
> before v3.2, and I wanted to come back to it and turn some things on
> their head.
>
> Right now, we're talking about Cyrus releases being feature-bound.
> "We'll release v3.2 when feature X is done." I think we're not being
> well-served by that.
> [...]
>
> produce a list of changes in master that should be put behind an
> unstable configuration option and then do it
>
> 5.
>
> decide when to stop merging non-release-related things to master
>
> 6.
>
> make a plan for who will do monthly snapshot releases
>
> I've spoken with ellie and Bron about just a few of these, such that I
> don't think it's all crazy. (ellie notes, correctly, I think, that the
> first set of releases like this will be the hard ones, where we work
> out things like "how do we keep track of incompatibilities, upgrade
> steps, and also how do we make snapshots dead easy to release.") If
> there's general agreement, I am definitely ready to pitch in and help
> try to make it work!
>
> --
> rjbs

--
Ken Murchison
Cyrus Development Team
Fastmail US LLC

From me at anatoli.ws Tue Dec 17 12:58:26 2019
From: me at anatoli.ws (Anatoli)
Date: Tue, 17 Dec 2019 14:58:26 -0300
Subject: yearly release cycle
In-Reply-To: <76ffb8a6-9204-445e-98e8-5ac19e4e8a3f@dogfood.fastmail.com>
References: <76ffb8a6-9204-445e-98e8-5ac19e4e8a3f@dogfood.fastmail.com>
Message-ID: <0dd011b9-1300-8596-d2e1-1169ffa7da0e@anatoli.ws>

Hi Ricardo,

I find some ideas in the proposed changes interesting. I especially like these: "uniform substring like 'unstable'" (with startup-time warnings if they change or become stable) and "Maintenance releases should be no-brainers to install... This means that once you're on 3.4.0, you can always upgrade within the 3.4 series with a minimum risk. It also means you get no optimizations, features, and the like.", though I wouldn't limit the fixes to only the critical ones. Any bugfix that doesn't change the behavior is included in the stable maintenance releases.

But I couldn't understand from the description what are the benefits of tying major releases to certain calendar dates vs to make a release when certain desired features are implemented and well tested.
What happens if some major new feature, that is a must for a new major release to be published in a week, just isn't stable enough yet? Would it have to wait for an entire year to be included in the next major release? Or would you release it anyway as a stable release with known issues?

Then, when you implement a new large feature, who would test it? Today, for example, I (as an advanced user and a potential community dev) can run 3.1 branch at some semi-production deployments (and I sometimes do) and report issues. If, with the new scheme, you only guarantee that the unstable branch just compiles, certainly I wouldn't be using it anywhere, and probably neither would other users. Then pre-production testing of new features would be exclusively the developers' task, with obvious limitations.

So when the devs are sure that a new feature works well (in their setups and for their use cases), it is included in the next major stable release... and suddenly a lot of migrating users start finding issues. That could create an impression that the new stable releases of Cyrus are not that stable at all.

> As feature X is delayed (for various reasons that we can't easily
> eliminate), it doesn't just delay the feature, but also all the other
> minor bugfixes and optimizations that we've made in the master branch.

Why would a new feature of a stable release (3.2.0) delay bugfixes in the current stable branch (3.0)?

> Also, it sets up the idea that we delay releases for the sake of
> fixes, instead of releasing the fixes that are ready.

I don't understand what you mean here, but with the current scheme (AFAIK) the bug fixes go to the current stable branch (3.0) and all users receive them without delays.

New development happens in dev. When some new feature is stable according to the devs (well tested in all environments available to them), it is published as a new minor release in unstable branch (3.1.x).
These are expected to be fully functional releases, just not yet proven bug-free by time and the community.

I'm not sure how new major releases are managed today, but it could be done this way: at some point in time, when devs decide that the unstable/3.1 branch has accumulated enough features to be published, 3.1 is frozen for new features and it becomes 3.2.0-RC1 so the community in general could start testing the new candidate stable version in their test deployments. If issues are found, they are fixed in RC2, RC3, and so on until no issues are reported for, say, 1 month. Then the last issue-free RC becomes the 3.2.0 release.

At the same time, when 3.1 is frozen for new features, a 3.3 branch is created and new features start landing there.

And the current stable branch 3.0 receives bug fixes as usual during all this time. New optimizations probably won't be included in the 3.0.x maintenance versions, but that's OK IMO. It's stable, not cutting-edge, after all. But it is bug-free to the extent possible. All bugs, major and minor (without behavior changes), are fixed there immediately.

This is a typical release cycle of many server projects. The main advantage over date-bound releases is that the releases are published when they are ready, not when we reach some specific point in time. The disadvantage of the potential for delays could be mitigated by defining certain criteria for the features to be included in each major release. Also, some flexible dates could be defined, e.g. to publish a major release every 6-12 months.

Regards,
Anatoli

On 13/12/19 11:59, Ricardo Signes wrote:
> Hey, remember last month when I asked about releasing Cyrus v3.2?
>
> That thread had some more conversation about what needs to get done
> before v3.2, and I wanted to come back to it and turn some things on
> their head.
>
> Right now, we're talking about Cyrus releases being feature-bound.
> "We'll release v3.2 when feature X is done."
> [...]
>
> produce a list of changes in master that should be put behind an
> unstable configuration option and then do it
>
> 5.
>
> decide when to stop merging non-release-related things to master
>
> 6.
>
> make a plan for who will do monthly snapshot releases
>
> I've spoken with ellie and Bron about just a few of these, such that I
> don't think it's all crazy. (ellie notes, correctly, I think, that the
> first set of releases like this will be the hard ones, where we work out
> things like "how do we keep track of incompatibilities, upgrade steps,
> and also how do we make snapshots dead easy to release.") If there's
> general agreement, I am definitely ready to pitch in and help try to
> make it work!
>
> --
> rjbs

From rjbs at fastmailteam.com Tue Dec 17 22:08:20 2019
From: rjbs at fastmailteam.com (Ricardo Signes)
Date: Tue, 17 Dec 2019 22:08:20 -0500
Subject: yearly release cycle
In-Reply-To: <0dd011b9-1300-8596-d2e1-1169ffa7da0e@anatoli.ws>
References: <76ffb8a6-9204-445e-98e8-5ac19e4e8a3f@dogfood.fastmail.com> <0dd011b9-1300-8596-d2e1-1169ffa7da0e@anatoli.ws>
Message-ID: <876bd6ef-b002-4da0-8ed9-ecd941db77c1@dogfood.fastmail.com>

On Tue, Dec 17, 2019, at 12:58, Anatoli wrote:
> Hi Ricardo,

Hi!

> But I couldn't understand from the description what are the benefits of
> tying major releases to certain calendar dates vs to make a release when
> certain desired features are implemented and well tested.

By promising a new major release every year, you know that when your significant improvement to Cyrus is accepted, it will very likely be released within a year. Right now, users who added a major feature in 2017 are still waiting for a stable release. For example, Sieve duplicate detection was implemented in March 2017. I don't think we have a stable version that has this feature. If this had been a contribution from a potential repeat contributor, it's easy to imagine that they'd have given up in frustration, by now. (Good thing it was good ol' reliable Ken!)
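
For readers following the numbering discussed in this thread, the proposed scheme (odd middle digit for monthly snapshots, even middle digit with a zero third digit for yearly major releases, even middle digit with a nonzero third digit for maintenance releases) can be sketched as a small classifier. The function name and the returned labels are illustrative assumptions and are not part of Cyrus or of the proposal itself.

```python
def classify_release(version):
    """Classify an 'X.Y.Z' version string under the proposed numbering scheme."""
    # Split "3.4.1" into its three numeric components.
    _first, middle, third = (int(part) for part in version.split("."))
    if middle % 2 == 1:
        # Odd middle digit: monthly unstable snapshot (e.g. 3.3.x).
        return "unstable snapshot"
    if third == 0:
        # Even middle digit, third digit zero: yearly major release (e.g. 3.4.0).
        return "major release"
    # Even middle digit, third digit nonzero: maintenance release (e.g. 3.4.1).
    return "maintenance release"
```

Under this sketch, 3.3.2 would be a snapshot, 3.4.0 a major release, and 3.4.1 a maintenance release.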
The problem with "we will release when X is ready" is that X might not be ready in a year, meaning all the little things don't get released. Also, you can't shove those into maintenance releases, because the little things can still be destabilizing, so upgrading is less likely to be trouble-free.

In the event that a cool new feature isn't quite ready a month before release, I would argue: yes, it has to wait another year. I think it will be pretty rare that this happens, though. If it comes up, of course, an exception to the rules could be discussed, but in my experience, it won't. These kinds of features, in largely volunteer-staffed projects, are rarely good at sticking to a timeline.

> Then, when you implement a new large feature, who would test it?

1. new large features should have tests written for them, which should be run by developers and dedicated test runners
2. some people always like to run snapshot releases; I have often done coding on dev releases of languages, and some people will surely run their personal services on snapshots
3. feature authors write features so they can use them; this means they're also both motivated and likely to use them before they're declared ready for general release

I feel pretty strongly that #3 is the big test. We're almost always close to bleeding edge Cyrus at work, because we rely on tons of new features added since cyrus-imapd-3.0.0. We know that many, many of these have been heavily tested in the real world, and we want to declare them generally ready for use, and then be able to do the same regularly as we move forward.

> Today, for example, I (as an advanced user and a potential community dev) can run 3.1 branch at some semi-production deployments (and I sometimes do) and report issues. If, with the new scheme, you only guarantee that the unstable branch just compiles, certainly I wouldn't be using it anywhere, and probably neither would other users.
> Then pre-production testing of new features would be exclusively the developers' task, with obvious limitations.

I think you are seriously overestimating the kind of stability guarantee you get from a 3.1 release. It's really not much more than the proposed snapshot releases, but on a looser timetable. Mostly, we get our current feedback from master, rather than snapshots, because there are fewer known snapshot deployments. Deploying snapshots regularly will give more points where we're specifically asking for feedback. (Also, I guaranteed Cassandane tests would pass, which is a *far* stronger guarantee than compilation.)

My expectation is that in reality, the snapshots will be, at any given time, very close to what Fastmail is running in production, or at least in (real, used by real people for real mail) testing environments.

> So when the devs are sure that a new feature works well (in their setups and for their use cases), it is included in the next major stable release... and suddenly a lot of migrating users start finding issues. That could create an impression that the new stable releases of Cyrus are not that stable at all.

I expect these features will have been heavily tested over the course of the time between releases.

> I don't understand what you mean here, but with the current scheme (AFAIK) the bug fixes go to the current stable branch (3.0) and all users receive them without delays.

There are two kinds of bugfixes. Some are "there is an obvious regression or crasher." Others are "there has long been a bug that meant that SELECT would fail on mUTF-7 sequences containing three hyphens in a row, and I fixed it!" The intent here is to include only the first category in new maintenance releases, because that optimizes maint releases for stability, making them easy to install without fear. The other fixes are put into the next possible snapshot for inclusion in the next major release.

I think your major concerns are:

1.
new features might languish for a longer time than needed to be known stable
2. snapshots will be less reliable under this regime than before

I feel strongly that #1 will not be the case. We can always talk about making an interim major release if it comes up, but I am predicting that it will not, and if it does, we will think about it and decide that the effort to make sure we feel good about an unexpected major release is not enough to push us to rush. I acknowledge that reasonable people can disagree on this, but the good thing is: we can wait and see!

I disagree that #2 will be the case. Master does not churn with very much untested code, and I'm hoping we will slow it down even further by putting more features into feature branches until they're more battle-tested. That will get us more "this snapshot introduces feature X, which has been tested by production users and load!" rather than "master has been growing feature X in pieces for months, and it's all a bit weird." In general, with Fastmail probably-always running a fast-forward of a snapshot in testing, I feel pretty confident about snapshot use for similar under-load testing elsewhere.

--
rjbs

From ellie at fastmail.com Tue Dec 17 22:47:58 2019
From: ellie at fastmail.com (ellie timoney)
Date: Wed, 18 Dec 2019 14:47:58 +1100
Subject: Cyrus IMAPd version 3.1.9
Message-ID:

The Cyrus team is pleased to announce the immediate availability of a new version of Cyrus IMAP: 3.1.9

This is a snapshot of the master branch, and should be considered for testing purposes and bleeding-edge features only.

It is available as a git tag, which can be found here:

https://github.com/cyrusimap/cyrus-imapd/releases/tag/cyrus-imapd-3.1.9

Join us on GitHub at https://github.com/cyrusimap/cyrus-imapd to report issues, join in the deliberations of new features for the next Cyrus IMAP release, and to contribute to the documentation.
On behalf of the Cyrus team,

ellie

From dilyan.palauzov at aegee.org Wed Dec 18 05:13:06 2019
From: dilyan.palauzov at aegee.org (Дилян Палаузов)
Date: Wed, 18 Dec 2019 10:13:06 +0000
Subject: yearly release cycle
In-Reply-To: <76ffb8a6-9204-445e-98e8-5ac19e4e8a3f@dogfood.fastmail.com>
References: <76ffb8a6-9204-445e-98e8-5ac19e4e8a3f@dogfood.fastmail.com>
Message-ID: <321e02a902102e977ea31b303dd122afedb45cb1.camel@aegee.org>

Hello!

This is a very good idea! In particular, it makes the gap between development code and stable code smaller. Thus fixes for the stable code will be very similar to fixes on the development code.

Of course, providing fixes, such as optimizations, only makes sense if it is predictable whether the changes will be integrated within a reasonable time.

The email of Quanah Gibson-Mount from 25 July about the general policy on integrating patches in Cyrus SASL is not answered.

Will the time-based release policy also apply to Cyrus SASL?

The documentation of Cyrus IMAP, in its invisible parts, needs some tweaking; for example, the table of contents should be loop-free. In March I submitted a fix at https://github.com/cyrusimap/cyrus-imapd/pull/2703 which is still pending. By now I have forgotten the details, so even if somebody starts integrating this and has questions, I am not willing to re-read how Sphinx works (I have not used Sphinx since then) and work out why I did things in a particular way. Each release announcement encourages contributions to the documentation.

Regards
Дилян

On Fri, 2019-12-13 at 09:59 -0500, Ricardo Signes wrote:
> Hey, remember last month when I asked about releasing Cyrus v3.2?
>
> That thread had some more conversation about what needs to get done before v3.2, and I wanted to come back to it and turn some things on their head.
>
> Right now, we're talking about Cyrus releases being feature-bound. "We'll release v3.2 when feature X is done." I think we're not being well-served by that. As feature X is delayed (for various reasons that we can't easily eliminate), it doesn't just delay the feature, but also all the other minor bugfixes and optimizations that we've made in the master branch. Also, it sets up the idea that we delay releases for the sake of fixes, instead of releasing the fixes that are ready.
>
> That is: every additional criterion for a new release is another doorway to delay. Instead of opening those doors, I would rather try to eliminate all of them.
>
> I propose that instead of tying releases to milestones, we tie them to the calendar. For the sake of full disclosure: I am modeling this suggestion on the release cycle of perl, which I ran for several years. I found the process more than satisfactory, then.
>
> A new unstable release of Cyrus is made every month. We promise only that it compiled and passed the Cassandane test suite on the release manager's computer. It might contain regressions from previous unstable releases; it might have crashers or corruptors. We try to avoid any of these, but the goal here is a snapshot for easy month-to-month testing. These are the odd-middle-digit releases. (3.3.x)
>
> A new major release of Cyrus is made every year. We will have tested it on as many configurations as we can readily test. We will have, some time before the release, frozen the branch for risky changes, to reduce churn. In the meantime, new work lives in feature branches. (The changelogs from each unstable release provide a good basis for the whole-year changelog!) These are the even-middle-digit third-digit-zero releases. (3.4.0)
>
> A new maintenance release of Cyrus is made for the last two stable releases when there are enough fixes to critical bugs to warrant it.
> These are the even-middle-digit third-digit-nonzero releases (3.4.1)
>
> For the above to work, some more properties need to be maintained.
>
> Maintenance releases should be no-brainers to install, so they must only fix regressions, crashers, security vulnerabilities, and the like. This means that once you're on 3.4.0, you can always upgrade within the 3.4 series with minimal risk. It also means you get no optimizations, features, and the like.
>
> Major releases must clearly document any incompatible changes or upgrade steps required. Because non-regression bugfixes aren't backported, we want everyone to be able to upgrade from major release to major release, so incompatible changes must be kept to a minimum.
>
> In part, this is just "don't kill off a feature people use just because it's a little annoying." The more important one is "don't introduce half-baked things that might need to change," because people will come to rely on them before you get the updates finished. Features that will require multiple years to get right have to go behind a default-off configuration option. I'd strongly suggest they all have a uniform substring like "unstable". That way, when a complaint comes in that the behavior of JMAP calendaring has changed, we can reply, "well, to use it, you had to turn on the unstable_jmap_calendaring option."
>
> If we go with this policy, we'll need to...
>
> identify what issues are blockers to v3.2.0, meaning they're regressions from v3.0 and would reasonably prevent someone from upgrading; this does not include all known bugs, since they may be bugs that already exist in the last stable release!
>
> pick a release target for v3.2.0; I will arbitrarily suggest March 2 as "not too far off, but far off enough that we can get things in order"; also, if you're American, March 2 is 3/2 ;-)
>
> produce a changelog, and especially identify what changes in master need documentation as "incompatible changes"
>
> produce a list of changes in master that should be put behind an unstable configuration option and then do it
>
> decide when to stop merging non-release-related things to master
>
> make a plan for who will do monthly snapshot releases
>
> I've spoken with ellie and Bron about just a few of these, such that I don't think it's all crazy. (ellie notes, correctly, I think, that the first set of releases like this will be the hard ones, where we work out things like "how do we keep track of incompatibilities, upgrade steps, and also how do we make snapshots dead easy to release.") If there's general agreement, I am definitely ready to pitch in and help try to make it work!
>
> --
> rjbs
>
>

From rjbs at fastmailteam.com Fri Dec 20 21:02:32 2019
From: rjbs at fastmailteam.com (Ricardo Signes)
Date: Fri, 20 Dec 2019 21:02:32 -0500
Subject: yearly release cycle
In-Reply-To: <321e02a902102e977ea31b303dd122afedb45cb1.camel@aegee.org>
References: <76ffb8a6-9204-445e-98e8-5ac19e4e8a3f@dogfood.fastmail.com> <321e02a902102e977ea31b303dd122afedb45cb1.camel@aegee.org>
Message-ID: <3b3208ef-d5fd-4ccc-bc80-5d6ce03ce9bb@beta.fastmail.com>

On Wed, Dec 18, 2019, at 05:13, Дилян Палаузов wrote:
> The email of Quanah Gibson-Mount from 25 July about the general policy on integrating patches in Cyrus SASL is not answered.
>
> Will the time-based release policy also apply to Cyrus SASL?

I think there was some discussion / decision on this a while back, but I don't remember. cyrus-sasl always floats just outside my field of vision... I *think* I'll be talking to Ken on Monday, who can clear things up.

--
rjbs
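As a reading aid for the numbering scheme proposed earlier in this thread (odd middle digit for monthly snapshots such as 3.3.x, even middle digit with a zero third digit for yearly major releases such as 3.4.0, even middle digit with a nonzero third digit for maintenance releases such as 3.4.1), here is a minimal Python sketch. The function name classify_release and the returned labels are hypothetical, chosen only for illustration; nothing in the thread says such a helper exists in the Cyrus tree.

```python
import re


def classify_release(version: str) -> str:
    """Classify a three-part version string under the proposed scheme.

    Hypothetical helper: the labels "snapshot", "major" and
    "maintenance" are illustrative names, not Cyrus terminology.
    """
    match = re.fullmatch(r"(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"expected three dot-separated numbers, got {version!r}")
    _first, middle, third = (int(part) for part in match.groups())

    # Odd middle digit: monthly unstable snapshot (e.g. 3.3.x).
    if middle % 2 == 1:
        return "snapshot"
    # Even middle digit, third digit zero: yearly major release (e.g. 3.4.0).
    if third == 0:
        return "major"
    # Even middle digit, third digit nonzero: maintenance release (e.g. 3.4.1).
    return "maintenance"
```

Under this reading, the 3.1.9 release announced in this thread falls into the snapshot category, which matches how that announcement describes it.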
From quanah at symas.com Fri Dec 20 22:15:01 2019
From: quanah at symas.com (Quanah Gibson-Mount)
Date: Fri, 20 Dec 2019 19:15:01 -0800
Subject: yearly release cycle
In-Reply-To: <3b3208ef-d5fd-4ccc-bc80-5d6ce03ce9bb@beta.fastmail.com>
References: <76ffb8a6-9204-445e-98e8-5ac19e4e8a3f@dogfood.fastmail.com> <321e02a902102e977ea31b303dd122afedb45cb1.camel@aegee.org> <3b3208ef-d5fd-4ccc-bc80-5d6ce03ce9bb@beta.fastmail.com>
Message-ID:

--On Friday, December 20, 2019 9:02 PM -0500 Ricardo Signes wrote:

> I think there was some discussion / decision on this a while back, but I don't remember. cyrus-sasl always floats just outside my field of vision... I think I'll be talking to Ken on Monday, who can clear things up.

Last August, Ken and I were discussing myself and Howard Chu getting commit access to the cyrus-sasl portion of the project. It had been agreed to be done, but then never occurred. Howard and I are still interested and willing in this, particularly given cyrus-sasl's importance to OpenLDAP.

Regards,
Quanah

--

Quanah Gibson-Mount
Product Architect
Symas Corporation
Packaged, certified, and supported LDAP solutions powered by OpenLDAP:

From murch at fastmail.com Sat Dec 21 08:52:45 2019
From: murch at fastmail.com (Ken Murchison)
Date: Sat, 21 Dec 2019 08:52:45 -0500
Subject: yearly release cycle
In-Reply-To:
References: <76ffb8a6-9204-445e-98e8-5ac19e4e8a3f@dogfood.fastmail.com> <321e02a902102e977ea31b303dd122afedb45cb1.camel@aegee.org> <3b3208ef-d5fd-4ccc-bc80-5d6ce03ce9bb@beta.fastmail.com>
Message-ID: <02c18fb9-2848-53a2-b561-4c0db961e178@fastmail.com>

Quanah,

I will try to make this happen next week.

On 12/20/19 10:15 PM, Quanah Gibson-Mount wrote:
>
>
> --On Friday, December 20, 2019 9:02 PM -0500 Ricardo Signes wrote:
>
>> I think there was some discussion / decision on this a while back, but I don't remember. cyrus-sasl always floats just outside my field of vision...
>> I think I'll be talking to Ken on Monday, who can clear things up.
>
> Last August, Ken and I were discussing myself and Howard Chu getting commit access to the cyrus-sasl portion of the project. It had been agreed to be done, but then never occurred. Howard and I are still interested and willing in this, particularly given cyrus-sasl's importance to OpenLDAP.
>
> Regards,
> Quanah
>
> --
>
> Quanah Gibson-Mount
> Product Architect
> Symas Corporation
> Packaged, certified, and supported LDAP solutions powered by OpenLDAP:
>

--
Ken Murchison
Cyrus Development Team
Fastmail US LLC

From ellie at fastmail.com Sun Dec 22 19:36:03 2019
From: ellie at fastmail.com (ellie timoney)
Date: Mon, 23 Dec 2019 11:36:03 +1100
Subject: yearly release cycle
In-Reply-To: <02c18fb9-2848-53a2-b561-4c0db961e178@fastmail.com>
References: <76ffb8a6-9204-445e-98e8-5ac19e4e8a3f@dogfood.fastmail.com> <321e02a902102e977ea31b303dd122afedb45cb1.camel@aegee.org> <3b3208ef-d5fd-4ccc-bc80-5d6ce03ce9bb@beta.fastmail.com> <02c18fb9-2848-53a2-b561-4c0db961e178@fastmail.com>
Message-ID:

I tracked down Quanah's github account from a recent pull request, and sent through an invitation to the cyrusimap organisation.

Not sure what Howard Chu's email address or github username is? I can invite him too once I know.

Cheers,

ellie

On Sun, Dec 22, 2019, at 12:52 AM, Ken Murchison wrote:
> Quanah,
>
> I will try to make this happen next week.
>
>
> On 12/20/19 10:15 PM, Quanah Gibson-Mount wrote:
> >
> >
> > --On Friday, December 20, 2019 9:02 PM -0500 Ricardo Signes wrote:
> >
> >> I think there was some discussion / decision on this a while back, but I don't remember. cyrus-sasl always floats just outside my field of vision... I think I'll be talking to Ken on Monday, who can clear things up.
> >
> > Last August, Ken and I were discussing myself and Howard Chu getting commit access to the cyrus-sasl portion of the project.
> > It had been agreed to be done, but then never occurred. Howard and I are still interested and willing in this, particularly given cyrus-sasl's importance to OpenLDAP.
> >
> > Regards,
> > Quanah
> >
> > --
> >
> > Quanah Gibson-Mount
> > Product Architect
> > Symas Corporation
> > Packaged, certified, and supported LDAP solutions powered by OpenLDAP:
> >
>
> --
> Ken Murchison
> Cyrus Development Team
> Fastmail US LLC
>
>

From quanah at symas.com Mon Dec 23 10:03:00 2019
From: quanah at symas.com (Quanah Gibson-Mount)
Date: Mon, 23 Dec 2019 07:03:00 -0800
Subject: yearly release cycle
In-Reply-To:
References: <76ffb8a6-9204-445e-98e8-5ac19e4e8a3f@dogfood.fastmail.com> <321e02a902102e977ea31b303dd122afedb45cb1.camel@aegee.org> <3b3208ef-d5fd-4ccc-bc80-5d6ce03ce9bb@beta.fastmail.com> <02c18fb9-2848-53a2-b561-4c0db961e178@fastmail.com>
Message-ID: <108C557194349F9EF5D7E2EC@[192.168.1.144]>

--On Monday, December 23, 2019 11:36 AM +1100 ellie timoney wrote:

> I tracked down Quanah's github account from a recent pull request, and sent through an invitation to the cyrusimap organisation.
>
> Not sure what Howard Chu's email address or github username is? I can invite him too once I know.

Thanks Ellie! His github username is "hyc".

Regards,
Quanah

--

Quanah Gibson-Mount
Product Architect
Symas Corporation
Packaged, certified, and supported LDAP solutions powered by OpenLDAP:

From ellie at fastmail.com Mon Dec 23 16:15:53 2019
From: ellie at fastmail.com (ellie timoney)
Date: Tue, 24 Dec 2019 08:15:53 +1100
Subject: yearly release cycle
In-Reply-To: <108C557194349F9EF5D7E2EC@[192.168.1.144]>
References: <76ffb8a6-9204-445e-98e8-5ac19e4e8a3f@dogfood.fastmail.com> <321e02a902102e977ea31b303dd122afedb45cb1.camel@aegee.org> <3b3208ef-d5fd-4ccc-bc80-5d6ce03ce9bb@beta.fastmail.com> <02c18fb9-2848-53a2-b561-4c0db961e178@fastmail.com> <108C557194349F9EF5D7E2EC@[192.168.1.144]>
Message-ID: <29715f37-38d5-45ea-8acb-4eec0f8faffe@www.fastmail.com>

Thanks, invite sent!
:)

On Tue, Dec 24, 2019, at 2:03 AM, Quanah Gibson-Mount wrote:
>
>
> --On Monday, December 23, 2019 11:36 AM +1100 ellie timoney wrote:
>
> > I tracked down Quanah's github account from a recent pull request, and sent through an invitation to the cyrusimap organisation.
> >
> > Not sure what Howard Chu's email address or github username is? I can invite him too once I know.
>
> Thanks Ellie! His github username is "hyc".
>
>
>
> Regards,
> Quanah
>
> --
>
> Quanah Gibson-Mount
> Product Architect
> Symas Corporation
> Packaged, certified, and supported LDAP solutions powered by OpenLDAP:
>
>