From dilyan.palauzov at aegee.org  Mon Dec  2 07:18:13 2019
From: dilyan.palauzov at aegee.org (Дилян Палаузов)
Date: Mon, 02 Dec 2019 12:18:13 +0000
Subject: The master janitor goes crazy / Re: Debugging Deadlocks
In-Reply-To: <25d97486-b257-44bb-b47a-3ddc9b16d5de@www.fastmail.com>
References: <92cca1d7baac62ef2b3cbe3f59a771796aba19dd.camel@aegee.org>
 <78928faba6a46f1b60e31d29d1061668a372cda3.camel@aegee.org>
 <feb5a91b-56d0-4ac2-b16c-e207362be8f3@www.fastmail.com>
 <25d97486-b257-44bb-b47a-3ddc9b16d5de@www.fastmail.com>
Message-ID: <cd241d576c9902da1da519b7d6d4699eb74a84d2.camel@aegee.org>

Hello Ellie,

this is exactly what I see (countless pselect calls), but in my case the second parameter of pselect is a much larger
array.  I just observed that on killing master, it terminates all cyrus processes but two (httpd and notifyd).  Then I
try to attach to those processes with gdb.  This does not work, however, since the processes have become zombies.
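
To compare descriptor sets, the rfds value from the gdb dump quoted further down (fds_bits = {266272, 0 <repeats 15
times>}) can be decoded into descriptor numbers.  A minimal sketch, assuming glibc's x86-64 fd_set layout (64-bit
words, fd n stored at bit n % 64 of word n // 64); the helper name fds_from_bits is made up for illustration:

```python
def fds_from_bits(fds_bits, word_bits=64):
    """Return the descriptor numbers whose bits are set in an fd_set dump.

    Assumption: glibc layout on x86-64, where fd n lives in word n // 64
    at bit position n % 64.
    """
    fds = []
    for word_index, word in enumerate(fds_bits):
        for bit in range(word_bits):
            if word & (1 << bit):
                fds.append(word_index * word_bits + bit)
    return fds


# Value from the gdb dump: fds_bits = {266272, 0 <repeats 15 times>}
print(fds_from_bits([266272] + [0] * 15))  # [5, 12, 18]
```

That is three descriptors, which is consistent with ready_fds = 3 in the same dump.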

Greetings
  Дилян

On Thu, 2019-11-28 at 10:34 +1100, ellie timoney wrote:
> Saw something similar just now when I killed a cassandane run off prematurely. One cyrus master process wound up spinning like this:
> 
> pselect6(13, [8 9 11 12], NULL, NULL, NULL, {[], 8}) = 1 (in [11])
> pselect6(13, [8 9 11 12], NULL, NULL, NULL, {[], 8}) = 1 (in [11])
> pselect6(13, [8 9 11 12], NULL, NULL, NULL, {[], 8}) = 1 (in [11])
> pselect6(13, [8 9 11 12], NULL, NULL, NULL, {[], 8}) = 1 (in [11])
> pselect6(13, [8 9 11 12], NULL, NULL, NULL, {[], 8}) = 1 (in [11])
> pselect6(13, [8 9 11 12], NULL, NULL, NULL, {[], 8}) = 1 (in [11])
> pselect6(13, [8 9 11 12], NULL, NULL, NULL, {[], 8}) = 1 (in [11])
> pselect6(13, [8 9 11 12], NULL, NULL, NULL, {[], 8}) = 1 (in [11])
> pselect6(13, [8 9 11 12], NULL, NULL, NULL, {[], 8}) = 1 (in [11])
> 
> 0x0000555ac7124a97 in child_janitor (now=...) at master/master.c:1221
> 1221	        janitor_position = janitor_position % child_table_size;
> (gdb) bt
> #0  0x0000555ac7124a97 in child_janitor (now=...) at master/master.c:1221
> #1  0x0000555ac712a67a in main (argc=10, argv=0x7ffdc1fe78b8)
>     at master/master.c:2812
> 
> Haven't dug further yet, but it looks similar to your report
> 
> On Wed, Nov 27, 2019, at 9:17 AM, ellie timoney wrote:
> > Can you strace the master process next time it's spinning at 100%?  
> > What is it doing at that time?
> > 
> > > On Tue, Nov 26, 2019, at 1:29 AM, Дилян Палаузов wrote:
> > > Hello,
> > > 
> > > > I run cyrus imap 3.0.x with some private changes.
> > > > 
> > > > > Sometimes when I stop the master process, it uses 100% of one CPU core for 5 minutes.  After the fifth
> > > > > minute, systemd enforces kill -9. When I attach to the master process, I see that the janitor is doing some work, but I
> > > > > have not checked the details.  Has anybody experienced this?
> > > 
> > > > I run cyrus imap.   At some point I recompile and reinstall the 
> > > > binaries, which in theory means that the running processes
> > > > detect this change and restart themselves.  At some point I call 
> > > > "systemctl stop cyrus-imap" which I guess sends
> > > > SIGTERM to the master process.   Then the CPU utilization of the master 
> > > > process goes to 100%.  In the systemd service
> > > > file I have TimeoutStopSec=320 . After this time, the master process 
> > > > is still running and systemd sends 9/SIGKILL.  
> > > > Reinstalling the binaries before shutting down is not required 
> > > > for the CPU to go to 100%: it also happens
> > > > without reinstalling (and thus without triggering the 
> > > > self-restart of) the imapd/httpd binaries.
> > > > 
> > > > This 100% CPU loop is often, but not always, entered on shutdown.
> > > 
> > > > I have a webmail client and to speed things up it uses SquirrelMail's 
> > > > IMAP Proxy (http://www.imapproxy.org/ a Caching
> > > > IMAP proxy).  It is recommended in the installation manual of 
> > > > Horde/IMP.  The IMAP caching proxy connects to
> > > > 127.0.0.2:143 (and is therefore permitted to skip the TLS overhead).  
> > > > In master conf I have a line
> > > > "imaplocal     cmd="imapd -C /usr/local/etc/cyrus/imapdlocal.conf" 
> > > > listen="127.0.0.2:imap" prefork=0".
> > > 
> > > When the CPU goes to 100% on shutdown I connect with gdb to the master 
> > > process.  Below is the full backtrace.  Does
> > > somebody have an explanation why the master process enters a never 
> > > ending loop?
> > > 
> > > > I am not saying that all of the above information has to be relevant to the 
> > > > answer.  Has somebody else experienced these effects? 
> > > > Any suggestions on how to investigate this further?
> > > 
> > > Greetings
> > >   ?????
> > > 
> > > ---
> > > warning: Could not load vsyscall page because no executable was 
> > > specified
> > > Reading symbols from /usr/local/libexec/master...
> > > Attaching to program: /usr/local/libexec/master, process 9247
> > > Reading symbols from /usr/local/lib/libcyrus_min.so.0...
> > > Reading symbols from /lib/libuuid.so.1...
> > > Reading symbols from /usr/local/lib/libgssapi_krb5.so.2...
> > > Reading symbols from /usr/local/lib/libkrb5.so.3...
> > > Reading symbols from /usr/local/lib/libk5crypto.so.3...
> > > Reading symbols from /usr/local/lib/libcom_err.so.3...
> > > Reading symbols from /usr/local/lib/libkrb5support.so.0...
> > > Reading symbols from /usr/local/lib/libpcreposix.so.0...
> > > (No debugging symbols found in /usr/local/lib/libpcreposix.so.0)
> > > Reading symbols from /usr/local/lib/libpcre.so.1...
> > > (No debugging symbols found in /usr/local/lib/libpcre.so.1)
> > > Reading symbols from /usr/local/lib/libxml2.so.2...
> > > Reading symbols from /usr/local/lib/liblzma.so.5...
> > > (No debugging symbols found in /usr/local/lib/liblzma.so.5)
> > > Reading symbols from /usr/local/lib/libical.so.3...
> > > Reading symbols from /usr/local/lib/libicalss.so.3...
> > > Reading symbols from /usr/local/lib/libicalvcal.so.3...
> > > Reading symbols from /usr/local/lib/libicui18n.so.63...
> > > Reading symbols from /usr/local/lib/libicuuc.so.63...
> > > Reading symbols from /usr/local/lib/libicudata.so.63...
> > > (No debugging symbols found in /usr/local/lib/libicudata.so.63)
> > > Reading symbols from /usr/local/lib/libsqlite3.so.0...
> > > (No debugging symbols found in /usr/local/lib/libsqlite3.so.0)
> > > Reading symbols from /usr/local/lib/libz.so.1...
> > > (No debugging symbols found in /usr/local/lib/libz.so.1)
> > > Reading symbols from /lib64/libm.so.6...
> > > Reading symbols from /lib64/libdl.so.2...
> > > Reading symbols from /lib64/libpthread.so.0...
> > > [Thread debugging using libthread_db enabled]
> > > Using host libthread_db library "/lib64/libthread_db.so.1".
> > > Reading symbols from /lib64/libc.so.6...
> > > Reading symbols from /lib64/ld-linux-x86-64.so.2...
> > > Reading symbols from /lib64/libresolv.so.2...
> > > Reading symbols from /usr/local/lib/libdb-18.1.so...
> > > Reading symbols from /usr/local/lib64/libstdc++.so.6...
> > > Reading symbols from /usr/local/lib64/libgcc_s.so.1...
> > > Reading symbols from /usr/local/lib64/libssl.so.1.1...
> > > Reading symbols from /usr/local/lib64/libcrypto.so.1.1...
> > > Reading symbols from /lib64/libnss_db.so.2...
> > > Reading symbols from /lib64/libnss_files.so.2...
> > > Reading symbols from /lib64/libnss_dns.so.2...
> > > 0x0000000000405406 in child_janitor (now=...) at master/master.c:1192
> > > 1192	        janitor_position = janitor_position % child_table_size;
> > > > (gdb) bt f
> > >   Id   Target Id                                 Frame 
> > > * 1    Thread 0x7f6a08759780 (LWP 9247) "master" 0x0000000000405406 in 
> > > child_janitor (now=...) at master/master.c:1192
> > > #0  0x0000000000405406 in child_janitor (now=...) at 
> > > master/master.c:1192
> > >         i = 9299
> > >         p = 0x4132e0 <ctable+16224>
> > >         c = 0x0
> > > #1  0x0000000000409dd7 in main (argc=4, argv=0x7ffea3075108) at 
> > > master/master.c:2600
> > >         i = 14
> > >         ready_fds = 3
> > >         total_children = 11
> > >         tv = {
> > >           tv_sec = 0,
> > >           tv_usec = 0
> > >         }
> > >         msg = {
> > >           message = 1,
> > >           service_pid = 28219
> > >         }
> > >         maxfd = 41
> > >         tvptr = 0x0
> > >         interrupted = 0
> > >         pidfile = 0x40c4f0 "/var/run/cyrus-master.pid"
> > >         pidfile_lock = 0x2135ba0 "/usr/local/etc/cyrus/imapdlocal.conf"
> > >         startup_pipe = {6, 7}
> > >         pidlock_fd = -1
> > >         i = 14
> > >         opt = -1
> > >         close_std = 1
> > >         daemon_mode = 1
> > >         error_log = 0x0
> > >         alt_config = 0x0
> > >         fd = 3
> > >         rfds = {
> > >           fds_bits = {266272, 0 <repeats 15 times>}
> > >         }
> > >         r = 1
> > >         now = {
> > >           tv_sec = 1574690925,
> > >           tv_usec = 958878
> > >         }
> > >         p = 0x0
> > > quit
> > > Detaching from program: /usr/local/libexec/master, process 9247
> > > [Inferior 1 (process 9247) detached]
> > > 
> > > 
> > > 
> > > 


From dilyan.palauzov at aegee.org  Mon Dec  2 07:46:42 2019
From: dilyan.palauzov at aegee.org (Дилян Палаузов)
Date: Mon, 02 Dec 2019 12:46:42 +0000
Subject: cyrus.cache causes IOERROR: offset greater than cache size
Message-ID: <4efb00c86df2dbea21c332edfec5a5750fd62f0c.camel@aegee.org>

Hello,

sometimes I get these messages in the logs:

Dec 01 01:30:50 mail cyrus/cyr_expire[13952]: IOERROR: offset greater than cache size 5243456 2288(0)
Dec 01 01:30:50 mail cyrus/cyr_expire[13952]: IOERROR: invalid cache record for user.u1 uid 40568 (System I/O error)
Dec 01 01:30:54 mail cyrus/cyr_expire[13952]: IOERROR: offset greater than cache size 5244620 2288(0)
Dec 01 01:30:54 mail cyrus/cyr_expire[13952]: IOERROR: invalid cache record for user.u1 uid 40569 (System I/O error)
Dec 01 01:30:55 mail cyrus/cyr_expire[13952]: IOERROR: offset greater than cache size 5247552 2288(0)
Dec 01 01:30:55 mail cyrus/cyr_expire[13952]: IOERROR: invalid cache record for user.u1 uid 40571 (System I/O error)
Dec 01 01:30:55 mail cyrus/cyr_expire[13952]: IOERROR: invalid cache record for user.u1 uid 15463 (Mailbox format
corruption detected)
Dec 01 01:30:55 mail cyrus/cyr_expire[13952]: IOERROR: cache entry truncated 1072 1835101728 2288(0)
Dec 01 01:30:55 mail cyrus/cyr_expire[13952]: IOERROR: invalid cache record for user.u1 uid 15464 (System I/O error)
Dec 01 01:30:55 mail cyrus/cyr_expire[13952]: IOERROR: cache entry truncated 2080 1131376244 2288(0)
Dec 01 01:30:55 mail cyrus/cyr_expire[13952]: IOERROR: invalid cache record for user.u1 uid 15465 (System I/O error)
Dec 01 01:30:55 mail cyrus/cyr_expire[13952]: IOERROR: offset greater than cache size (priority)3136 2288(0)
Dec 01 01:30:55 mail cyrus/cyr_expire[13952]: IOERROR: invalid cache record for user.u1 uid 15466 (System I/O error)
Dec 01 01:30:55 mail cyrus/cyr_expire[13952]: IOERROR: offset greater than cache size (priority)3976 2288(0)

Often it is connected to cyr_expire, but not always. It can also be lmtpd.
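
To get an overview of which processes and UIDs are affected, the "invalid cache record" lines can be tallied.  A
minimal sketch, assuming the syslog format shown in the excerpt above; the regular expression and the function name
tally_invalid_records are illustrative only and not part of Cyrus:

```python
import re
from collections import Counter

# Matches the "invalid cache record" lines as they appear in the log excerpt.
RECORD_RE = re.compile(
    r"cyrus/(?P<proc>\w+)\[\d+\]: IOERROR: invalid cache record "
    r"for (?P<mailbox>\S+) uid (?P<uid>\d+)"
)


def tally_invalid_records(lines):
    """Count hits per (process, mailbox) and collect the UIDs per mailbox."""
    counts = Counter()
    uids = {}
    for line in lines:
        m = RECORD_RE.search(line)
        if m is None:
            continue  # other IOERROR lines are ignored here
        counts[(m["proc"], m["mailbox"])] += 1
        uids.setdefault(m["mailbox"], []).append(int(m["uid"]))
    return counts, uids


sample = [
    "Dec 01 01:30:50 mail cyrus/cyr_expire[13952]: IOERROR: invalid cache record for user.u1 uid 40568 (System I/O error)",
    "Dec 01 01:30:54 mail cyrus/cyr_expire[13952]: IOERROR: invalid cache record for user.u1 uid 40569 (System I/O error)",
    "Dec 01 01:30:55 mail cyrus/cyr_expire[13952]: IOERROR: offset greater than cache size 5247552 2288(0)",
]
counts, uids = tally_invalid_records(sample)
print(counts)
print(uids)
```

Fed with the full journal output, this shows whether the same UIDs keep coming back and which service reports them.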

When a cyrus.cache inconsistency is detected, the cyrus.cache is rebuilt. This means reading a lot of files from the
disk.  During the reconstruction some locks are held, so in effect a lot of processes (lmtpd, imapd, httpd) are
started and all of them wait for the lock to be released.  This cache rebuild sometimes seems to happen very often. 
The problem is that on slow hard disks this repack operation can take hours, so cyr_expire runs for hours.

My reading of the code is that new records are only appended to cyrus.cache and there is some lock ensuring the
consistency of the append operation.

I have not invested much time in reading the code.  How is expunging supposed to work with regard to cyrus.cache? 
Is cyrus.cache supposed to be repacked whenever a message is unlink()ed, or where is the code for removing
entries from cyrus.cache?  How can I debug the cause of the invalid cache record?  

I assume that the cached records are kept, until the corresponding message file is removed from the disk.

The cyr_expire output also contains:

Dec 01 01:37:37 mail cyrus/cyr_expire[13952]: IOERROR: conversations_audit on load: /var/imap//user/s/s2.conversations
B25572d90ed3363c1
 0 (713535 1 0 0 () ((18 713534 1 1 0)) () PleaseconfirmyourNNNNregistrationnow. 0 ())

What am I supposed to do with this message?

Regards
  Дилян


From brong at fastmailteam.com  Mon Dec  2 17:51:27 2019
From: brong at fastmailteam.com (Bron Gondwana)
Date: Tue, 03 Dec 2019 09:51:27 +1100
Subject: Changing JMAP IDs for Calendar and Contacts to be server generated
Message-ID: <db2d1a00-bfce-4efc-b9aa-08ac0192e543@beta.fastmail.com>

Hi All,

This was discussed in today's Cyrus call, but I figured I should put it here for a public note and to cover the discussion in more detail :)

Fastmail has a "caldav_sync" tool, which replicates calendars from outside. Right now we rewrite the UID both ways in order to allow uniqueness of UIDs within our system, because we also constrain each UID to only exist once in all of a user's calendars (because of scheduling).

This is variously buggy and annoying.

Looking at various solutions for embedding mailboxid as well into the JMAP id, we came to the conclusion that the best move was actually to generate JMAP IDs synthetically on first receipt of a UID, and maintain that ID across changes. This has a couple of other good benefits:

* doesn't use random junk off the wire as part of ids
* can maintain the JMAP id even when moving between different calendars
* fixed length IDs for JMAP, whereas UIDs can be quite long from some services
* restricted character set means we don't have to escape parts of the UID (which is not ObjectId safe)
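
As a rough illustration of "generate on first receipt, then keep it stable", here is a minimal sketch. It is not the Cyrus implementation: the class name IdAllocator, the 16-character length, the alphabet, and the in-memory dict (standing in for whatever persistent storage ends up holding the mapping) are all assumptions made for the example.

```python
import secrets
import string

# Restricted alphabet and fixed length are illustrative choices only.
ALPHABET = string.ascii_lowercase + string.digits
ID_LENGTH = 16


class IdAllocator:
    """Assigns a synthetic ID the first time a UID is seen and keeps it stable."""

    def __init__(self):
        self._by_uid = {}   # stands in for persistent per-user storage
        self._used = set()

    def id_for(self, uid):
        existing = self._by_uid.get(uid)
        if existing is not None:
            return existing  # same UID, same ID, whichever calendar it is in
        while True:
            candidate = "".join(secrets.choice(ALPHABET) for _ in range(ID_LENGTH))
            if candidate not in self._used:
                break
        self._used.add(candidate)
        self._by_uid[uid] = candidate
        return candidate


alloc = IdAllocator()
first = alloc.id_for("a-very-long-uid@remote.example/with spaces & odd chars")
again = alloc.id_for("a-very-long-uid@remote.example/with spaces & odd chars")
print(first == again, len(first))
```

Because the ID is keyed on the UID rather than on the mailbox, it survives a move between calendars, and nothing from the wire ends up inside it.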

Altogether, a big win. Just as we don't use the Message-Id header from emails, we won't use the UID from calendars or contacts.

I'm looking at potential options for upgrade path for existing events - possibly even rewriting them on disk! It will definitely need a dav_db rewrite.

Bron.

--
 Bron Gondwana, CEO, Fastmail Pty Ltd
 brong at fastmailteam.com


From me at anatoli.ws  Mon Dec  2 23:05:35 2019
From: me at anatoli.ws (Anatoli)
Date: Tue, 3 Dec 2019 01:05:35 -0300
Subject: Cyrus webdav with Joplin
In-Reply-To: <D6FE59F0-21C2-4664-974D-34FD5252E9B6@hattne.se>
References: <CE341ABB-1B25-4516-B7A7-C3B9C484F52B@hattne.se>
 <5d3f67e8-2faf-9cd8-a3a5-f4aba87861f2@anatoli.ws>
 <D6FE59F0-21C2-4664-974D-34FD5252E9B6@hattne.se>
Message-ID: <18ddf123-6d07-a97b-483b-65a57778e39f@anatoli.ws>

The meth_mkcol function (and others in http_dav.c?) should probably be
checked thoroughly; it looks like under some conditions it would be
better to use different status codes.

I'm forwarding this mail to cyrus-devel@ and CC'ing Ken who probably
knows this part better than anyone.

WebDAV is an HTTP extension so it is guided by the HTTP standard RFC
7231 (HTTP/1.1) with additions by its own standard RFC 4918 (WebDAV) and
then MKCOL is further extended by RFC 5689 (Extended MKCOL).

To me it seems that in some respects the latter two contradict the first one.

From RFC 7231 (HTTP/1.1 [1]): The 403 (Forbidden) status code indicates
that the server understood the request but refuses to *authorize* it.

From RFC 4918 (WebDAV [2]): 403 (Forbidden) - This indicates at least
one of two conditions: 1) the server does not allow the creation of
collections at the given location in its URL namespace, or 2) the parent
collection of the Request-URI exists but cannot accept members.

The second condition is what could be used here (the target URL can't
accept the specified member which is the current behavior of Cyrus), but
it has nothing to do with authorization as defined by HTTP/1.1 for 403.


RFC 7231 (HTTP/1.1 [3]): The 405 (Method Not Allowed) status code
indicates that the method received in the request-line is known by the
origin server but *not supported by the target resource* ([2]), which in
this case would mean that the URI on which MKCOL is tried does not allow
MKCOL method at all, which is not true.

From RFC 4918 (WebDAV [2]): 405 (Method Not Allowed) - MKCOL can only be
executed on an unmapped URL. ???


RFC 7231 (HTTP/1.1 [4]): The 409 (Conflict) status code indicates that
the request could not be completed due to a *conflict with the current
state of the target resource*, which in this case is the URI on which
MKCOL is tried and this is exactly the case: the path already contains a
collection so "the request could not be completed due to a conflict with
the current state of the target resource".

From RFC 4918 (WebDAV [2]): 409 (Conflict) - A collection cannot be made
at the Request-URI until one or more intermediate collections have been
created. The server MUST NOT create those intermediate collections
automatically.


Additionally, RFC 7231 (HTTP/1.1 [5]): The 404 (Not Found) status code
indicates that the origin server *did not find a current representation
for the target resource*, which IMO is the case when a/b is not found
when a/b/c creation is requested, but the WebDAV RFC says it's 409
Conflict, go figure.

BTW, citing the HTTP/1.1 RFC: The origin server MUST generate an Allow
header field in a 405 response containing a list of the target
resource's currently supported methods [3].

[1] https://tools.ietf.org/html/rfc7231#section-6.5.3
[2] https://tools.ietf.org/html/rfc4918#section-9.3.1
[3] https://tools.ietf.org/html/rfc7231#section-6.5.5
[4] https://tools.ietf.org/html/rfc7231#section-6.5.8
[5] https://tools.ietf.org/html/rfc7231#section-6.5.4
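
To make the WebDAV reading concrete, here is a minimal sketch of the status
selection as listed in RFC 4918 section 9.3.1 [2]. It is only an
illustration of the RFC text quoted above, not what Cyrus currently does;
the function name mkcol_status and its parameters are hypothetical.

```python
from http import HTTPStatus


def mkcol_status(target_exists, parent_exists, parent_accepts_members=True):
    """Pick a response status for MKCOL following RFC 4918 section 9.3.1."""
    if target_exists:
        # "MKCOL can only be executed on an unmapped URL."
        return HTTPStatus.METHOD_NOT_ALLOWED  # 405
    if not parent_exists:
        # Intermediate collections are missing and must not be auto-created.
        return HTTPStatus.CONFLICT  # 409
    if not parent_accepts_members:
        # Parent exists but cannot accept members.
        return HTTPStatus.FORBIDDEN  # 403
    return HTTPStatus.CREATED  # 201


print(mkcol_status(target_exists=True, parent_exists=True))    # existing collection
print(mkcol_status(target_exists=False, parent_exists=False))  # a/b missing for a/b/c
```

Under this reading the "collection already exists" case lands on 405, and
per RFC 7231 [3] such a response would also have to carry an Allow header.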

On 2/12/19 07:13, Johan Hattne wrote:
> Hi Anatoli;
> 
> Thanks for your reply; I'll be focusing on the MKCOL for now:
> 
> I don't know about permission to overwrite quite yet, but from looking at the source it seems the break (at https://github.com/cyrusimap/cyrus-imapd/blob/master/imap/http_dav.c#L5590) is what causes HTTP_FORBIDDEN to be returned.  Now looking at the code in the client (https://github.com/laurent22/joplin/blob/master/ReactNativeClient/lib/file-api-driver-webdav.js#L164) it appears Joplin is expecting 405, or possibly 409, given the explanation in the comment following line 164.
> 
> Given all that, it would seem to me that Cyrus should possibly change the aforementioned break to a return HTTP_CONFLICT, or HTTP_NOT_ALLOWED if the comment in Joplin is correct.  I haven't tested this yet (nor have I read the RFCs thoroughly), but I'd be happy to submit a pull request if this all checks out.  Opinions?
> 
> // Best wishes; Johan
> 
>> On Dec 1, 2019, at 10:57, Anatoli <me at anatoli.ws> wrote:
>>
>> Hi Johan,
>>
>> In RFC 7231 (HTTP 1.1) section 3.1.1.5
>> (https://tools.ietf.org/html/rfc7231#section-3.1.1.5) it says that CT
>> header SHOULD be present, otherwise the recipient may interpret it the
>> way it wants, so IMO no problem on the Cyrus side here. For
>> application/json for example it MUST be present, application/xml doesn't
>> demand that, but not sending it IMO is not a good behavior for
>> interoperability.
>>
>> For collection that exists, does the user that makes the request have
>> the rights to overwrite the collection? If not, 403 is the correct SC
>> (status code). 405 should be used when the specified method is not
>> allowed at all on the specified path, independently of the current
>> server state, which is not the case here.
>>
>> So, again IMO no problem on the Cyrus side here, but if the user has
>> sufficient rights, instead of 403 I'd use "409 Conflict" which is the
>> recommended SC when a record with specified ID/name already exists.
>>
>> Regards,
>> Anatoli
>>
>> On 28/11/19 04:40, Johan Hattne wrote:
>>> Dear all;
>>>
>>> I'm trying to get Joplin (https://joplinapp.org) to work with Cyrus's webdav module, and I've run into two issues:
>>>
>>> (1) When attempting to MKCOL a collection that already exists, Cyrus is responding with a 403, rather than a 405, which is what Joplin expects.
>>>
>>> (2) Cyrus returns an error if the Content-type isn't set where additional XML-formatted information is required in a POST to complete a request.
>>>
>>> My skimming of the relevant RFCs now leads me to believe that Cyrus is right on both counts; however, I don't know enough about this to say for sure.  Can anyone here confirm, or are these genuine Cyrus bugs?
>>>
>>> // Best wishes; Johan
>>> ----
>>> Cyrus Home Page: http://www.cyrusimap.org/
>>> List Archives/Info: http://lists.andrew.cmu.edu/pipermail/info-cyrus/
>>> To Unsubscribe:
>>> https://lists.andrew.cmu.edu/mailman/listinfo/info-cyrus
>>>
> 

From dilyan.palauzov at aegee.org  Tue Dec  3 12:27:34 2019
From: dilyan.palauzov at aegee.org (Дилян Палаузов)
Date: Tue, 03 Dec 2019 17:27:34 +0000
Subject: cyrus.cache causes IOERROR: offset greater than cache size
In-Reply-To: <4efb00c86df2dbea21c332edfec5a5750fd62f0c.camel@aegee.org>
References: <4efb00c86df2dbea21c332edfec5a5750fd62f0c.camel@aegee.org>
Message-ID: <fb5bd4d308aef74733f03482a828bda17ffaf23a.camel@aegee.org>

Hello,

it turned out that after emitting the messages below, cyrus.cache (3.0) was not self-repaired and stayed corrupt.  In
fact, reconstruct also does not repair cyrus.cache, unless cyrus.index is deleted.  If cyrus.index is present and
cyrus.cache is missing, reconstruct creates a four-byte file.

Greetings
  Дилян

On Mon, 2019-12-02 at 12:46 +0000, Дилян Палаузов wrote:
> Hello,
> 
> sometimes I get in the logs these messages:
> 
> Dec 01 01:30:50 mail cyrus/cyr_expire[13952]: IOERROR: offset greater than cache size 5243456 2288(0)
> Dec 01 01:30:50 mail cyrus/cyr_expire[13952]: IOERROR: invalid cache record for user.u1 uid 40568 (System I/O error)
> Dec 01 01:30:54 mail cyrus/cyr_expire[13952]: IOERROR: offset greater than cache size 5244620 2288(0)
> Dec 01 01:30:54 mail cyrus/cyr_expire[13952]: IOERROR: invalid cache record for user.u1 uid 40569 (System I/O error)
> Dec 01 01:30:55 mail cyrus/cyr_expire[13952]: IOERROR: offset greater than cache size 5247552 2288(0)
> Dec 01 01:30:55 mail cyrus/cyr_expire[13952]: IOERROR: invalid cache record for user.u1 uid 40571 (System I/O error)
> Dec 01 01:30:55 mail cyrus/cyr_expire[13952]: IOERROR: invalid cache record for user.u1 uid 15463 (Mailbox format
> corruption detected)
> Dec 01 01:30:55 mail cyrus/cyr_expire[13952]: IOERROR: cache entry truncated 1072 1835101728 2288(0)
> Dec 01 01:30:55 mail cyrus/cyr_expire[13952]: IOERROR: invalid cache record for user.u1 uid 15464 (System I/O error)
> Dec 01 01:30:55 mail cyrus/cyr_expire[13952]: IOERROR: cache entry truncated 2080 1131376244 2288(0)
> Dec 01 01:30:55 mail cyrus/cyr_expire[13952]: IOERROR: invalid cache record for user.u1 uid 15465 (System I/O error)
> Dec 01 01:30:55 mail cyrus/cyr_expire[13952]: IOERROR: offset greater than cache size (priority)3136 2288(0)
> Dec 01 01:30:55 mail cyrus/cyr_expire[13952]: IOERROR: invalid cache record for user.u1 uid 15466 (System I/O error)
> Dec 01 01:30:55 mail cyrus/cyr_expire[13952]: IOERROR: offset greater than cache size (priority)3976 2288(0)
> 
> Often it is connected to cyr_expire, but not always. It can also be lmtpd.
> 
> When a cyrus.cache inconsistency is detected, the cyrus.cache is rebuilt. This means reading a lot of files from the
> disk.  During the reconstruction some locks are held, so in effect a lot of processes (lmtpd, imapd, httpd) are
> started and all of them wait for the lock to be released.  This cache rebuild sometimes seems to happen very often. 
> The problem is that on slow hard disks this repack operation can take hours, so cyr_expire runs for hours.
> 
> My reading of the code is that new records are only appended to cyrus.cache and there is some lock ensuring the
> consistency of the append operation.
> 
> I have not invested much time in reading the code.  How is expunging supposed to work with regard to cyrus.cache? 
> Is cyrus.cache supposed to be repacked whenever a message is unlink()ed, or where is the code for removing
> entries from cyrus.cache?  How can I debug the cause of the invalid cache record?  
> 
> I assume that the cached records are kept, until the corresponding message file is removed from the disk.
> 
> The cyr_expire output also contains:
> 
> Dec 01 01:37:37 mail cyrus/cyr_expire[13952]: IOERROR: conversations_audit on load: /var/imap//user/s/s2.conversations
> B25572d90ed3363c1
>  0 (713535 1 0 0 () ((18 713534 1 1 0)) () PleaseconfirmyourNNNNregistrationnow. 0 ())
> 
> What am I supposed to do with this message?
> 
> Regards
>   Дилян
> 


From ellie at fastmail.com  Tue Dec  3 22:22:11 2019
From: ellie at fastmail.com (ellie timoney)
Date: Wed, 04 Dec 2019 14:22:11 +1100
Subject: The master janitor goes crazy / Re: Debugging Deadlocks
In-Reply-To: <cd241d576c9902da1da519b7d6d4699eb74a84d2.camel@aegee.org>
References: <92cca1d7baac62ef2b3cbe3f59a771796aba19dd.camel@aegee.org>
 <78928faba6a46f1b60e31d29d1061668a372cda3.camel@aegee.org>
 <feb5a91b-56d0-4ac2-b16c-e207362be8f3@www.fastmail.com>
 <25d97486-b257-44bb-b47a-3ddc9b16d5de@www.fastmail.com>
 <cd241d576c9902da1da519b7d6d4699eb74a84d2.camel@aegee.org>
Message-ID: <c4fca564-147c-4be2-9fe6-b702b61199e3@www.fastmail.com>

So, using my strace output from the other week as an example:

> pselect6(13, [8 9 11 12], NULL, NULL, NULL, {[], 8}) = 1 (in [11])

The arguments here are:

* nfds: 13 (12+1)
* readfds: [8 9 11 12]
* writefds: NULL
* exceptfds: NULL
* timeout: NULL
* sigmask: {[], 8}

The interesting bits are: 

* we don't have a timeout (so this pselect would block forever if nothing became ready)
* we're only waiting for fds to become readable (not writeable or having exceptions)
* the signal mask is empty, so no signals are blocked during the call (the [] is the empty signal set, and the 8 is the size of that set in bytes)

The return value of 1 means that 1 of the fds was ready, and I surmise that "(in [11])" is telling us that it was fd 11 from the readfds set that was ready (for reading).

The fact these pselect calls are all the same tells me that either: a lot is happening on fd 11 and we're not keeping up, or that there's data waiting on fd 11 and we keep ignoring it (so it keeps telling us it's there).
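
The second case (data waiting that never gets read) is easy to reproduce in isolation. This is a minimal sketch using Python's select.select in place of pselect, with a socketpair standing in for whatever is behind the busy fd; none of it is taken from master.c.

```python
import select
import socket

# A connected pair of sockets stands in for whatever is behind the busy fd.
reader, writer = socket.socketpair()
writer.send(b"x")  # one byte arrives and is never read

ready_counts = []
for _ in range(5):
    # master passes a NULL timeout; the 1 s timeout here only keeps the
    # demo from hanging if something goes wrong.
    readable, _, _ = select.select([reader], [], [], 1.0)
    ready_counts.append(len(readable))
    # Deliberately no reader.recv(): the data stays queued.

print(ready_counts)  # [1, 1, 1, 1, 1]

reader.recv(1)  # draining the fd ends the spin
readable, _, _ = select.select([reader], [], [], 0)
print(len(readable))  # 0

reader.close()
writer.close()
```

Readiness is level-triggered: as long as the byte sits unread, every call returns immediately with the same fd, which is what the identical strace lines look like.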

I don't think the gdb backtrace is really useful here; I think it's coincidental that when we each attached a debugger we both happened to be at that particular line in child_janitor.  Once we're in shutdown, child_janitor is the only thing doing much work, and that line is the top of its loop.

I think the really useful information to collect next time this happens (and while the master process is still running) is:

* What does lsof tell us about that ready file descriptor (in the example, fd 11)?  I would be very interested to know if it's a client socket, or an internal messaging socket (that service processes use to tell master their status).

* If you can attach a debugger and step through a couple of iterations of master's big "for (;;) {" loop, what path is it taking?  What decisions is it making?

* Without the debugger, if you let it run like this for 30 seconds or more, does a syslog line like this appear? https://github.com/cyrusimap/cyrus-imapd/blob/96d194de82d3dbe124a359069bd21f5cba7519ba/master/master.c#L1240-L1244
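
If lsof isn't at hand, the same first-level information is available from /proc on Linux. A minimal sketch, assuming a Linux /proc filesystem and sufficient permission to read the target process's fd directory; the helper names describe_fd and list_fds are made up for illustration.

```python
import os


def describe_fd(pid, fd):
    """Return what /proc says the descriptor points at, e.g. 'socket:[12345]'."""
    return os.readlink(f"/proc/{pid}/fd/{fd}")


def list_fds(pid):
    """Map every open descriptor of a process to its /proc link target."""
    fd_dir = f"/proc/{pid}/fd"
    result = {}
    for name in os.listdir(fd_dir):
        try:
            result[int(name)] = os.readlink(os.path.join(fd_dir, name))
        except FileNotFoundError:
            # The descriptor was closed between listdir() and readlink().
            pass
    return result


if __name__ == "__main__":
    # Demo against our own process; for master, pass its pid and the ready fd.
    r, w = os.pipe()
    print(describe_fd(os.getpid(), r))
    os.close(r)
    os.close(w)
```

A "socket:[inode]" or "pipe:[inode]" target at least distinguishes a socket from a pipe or a regular file; lsof is still the easier way to find out what is on the other end.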

Cheers,

ellie

On Mon, Dec 2, 2019, at 11:18 PM, Дилян Палаузов wrote:
> Hello Ellie,
> 
> this is exactly what I see (countless pselect calls), but in my case the 
> second parameter of pselect is a much larger array.  I
> just observed that on killing master, it terminates all cyrus processes 
> but two (httpd and notifyd).  Then I try to
> attach to those processes with gdb.  This does not work, however, since 
> the processes have become zombies.
> 
> Greetings
>   Дилян
> 
> On Thu, 2019-11-28 at 10:34 +1100, ellie timoney wrote:
> > Saw something similar just now when I killed a cassandane run off prematurely. One cyrus master process wound up spinning like this:
> > 
> > pselect6(13, [8 9 11 12], NULL, NULL, NULL, {[], 8}) = 1 (in [11])
> > pselect6(13, [8 9 11 12], NULL, NULL, NULL, {[], 8}) = 1 (in [11])
> > pselect6(13, [8 9 11 12], NULL, NULL, NULL, {[], 8}) = 1 (in [11])
> > pselect6(13, [8 9 11 12], NULL, NULL, NULL, {[], 8}) = 1 (in [11])
> > pselect6(13, [8 9 11 12], NULL, NULL, NULL, {[], 8}) = 1 (in [11])
> > pselect6(13, [8 9 11 12], NULL, NULL, NULL, {[], 8}) = 1 (in [11])
> > pselect6(13, [8 9 11 12], NULL, NULL, NULL, {[], 8}) = 1 (in [11])
> > pselect6(13, [8 9 11 12], NULL, NULL, NULL, {[], 8}) = 1 (in [11])
> > pselect6(13, [8 9 11 12], NULL, NULL, NULL, {[], 8}) = 1 (in [11])
> > 
> > 0x0000555ac7124a97 in child_janitor (now=...) at master/master.c:1221
> > 1221	        janitor_position = janitor_position % child_table_size;
> > (gdb) bt
> > #0  0x0000555ac7124a97 in child_janitor (now=...) at master/master.c:1221
> > #1  0x0000555ac712a67a in main (argc=10, argv=0x7ffdc1fe78b8)
> >     at master/master.c:2812
> > 
> > Haven't dug further yet, but it looks similar to your report
> > 
> > On Wed, Nov 27, 2019, at 9:17 AM, ellie timoney wrote:
> > > Can you strace the master process next time it's spinning at 100%?  
> > > What is it doing at that time?
> > > 
> > > > On Tue, Nov 26, 2019, at 1:29 AM, Дилян Палаузов wrote:
> > > > Hello,
> > > > 
> > > > > I run cyrus imap 3.0.x with some private changes.
> > > > > 
> > > > > > Sometimes when I stop the master process, it uses 100% of one CPU core for 5 minutes.  After the fifth
> > > > > > minute, systemd enforces kill -9. When I attach to the master process, I see that the janitor is doing some work, but I
> > > > > > have not checked the details.  Has anybody experienced this?
> > > > 
> > > > > I run cyrus imap.   At some point I recompile and reinstall the 
> > > > > binaries, which in theory means that the running processes
> > > > > detect this change and restart themselves.  At some point I call 
> > > > > "systemctl stop cyrus-imap" which I guess sends
> > > > > SIGTERM to the master process.   Then the CPU utilization of the master 
> > > > > process goes to 100%.  In the systemd service
> > > > > file I have TimeoutStopSec=320 . After this time, the master process 
> > > > > is still running and systemd sends 9/SIGKILL.  
> > > > > Reinstalling the binaries before shutting down is not required 
> > > > > for the CPU to go to 100%: it also happens
> > > > > without reinstalling (and thus without triggering the 
> > > > > self-restart of) the imapd/httpd binaries.
> > > > > 
> > > > > This 100% CPU loop is often, but not always, entered on shutdown.
> > > > 
> > > > > I have a webmail client and to speed things up it uses SquirrelMail's 
> > > > > IMAP Proxy (http://www.imapproxy.org/ a Caching
> > > > > IMAP proxy).  It is recommended in the installation manual of 
> > > > > Horde/IMP.  The IMAP caching proxy connects to
> > > > > 127.0.0.2:143 (and is therefore permitted to skip the TLS overhead).  
> > > > > In master conf I have a line
> > > > > "imaplocal     cmd="imapd -C /usr/local/etc/cyrus/imapdlocal.conf" 
> > > > > listen="127.0.0.2:imap" prefork=0".
> > > > 
> > > > When the CPU goes to 100% on shutdown I connect with gdb to the master 
> > > > process.  Below is the full backtrace.  Does
> > > > somebody have an explanation why the master process enters a never 
> > > > ending loop?
> > > > 
> > > > > I am not saying that all of the above information has to be relevant to the 
> > > > > answer.  Has somebody else experienced these effects? 
> > > > > Any suggestions on how to investigate this further?
> > > > 
> > > > Greetings
> > > >   Дилян
> > > > 
> > > > ---
> > > > warning: Could not load vsyscall page because no executable was specified
> > > > Reading symbols from /usr/local/libexec/master...
> > > > Attaching to program: /usr/local/libexec/master, process 9247
> > > > Reading symbols from /usr/local/lib/libcyrus_min.so.0...
> > > > Reading symbols from /lib/libuuid.so.1...
> > > > Reading symbols from /usr/local/lib/libgssapi_krb5.so.2...
> > > > Reading symbols from /usr/local/lib/libkrb5.so.3...
> > > > Reading symbols from /usr/local/lib/libk5crypto.so.3...
> > > > Reading symbols from /usr/local/lib/libcom_err.so.3...
> > > > Reading symbols from /usr/local/lib/libkrb5support.so.0...
> > > > Reading symbols from /usr/local/lib/libpcreposix.so.0...
> > > > (No debugging symbols found in /usr/local/lib/libpcreposix.so.0)
> > > > Reading symbols from /usr/local/lib/libpcre.so.1...
> > > > (No debugging symbols found in /usr/local/lib/libpcre.so.1)
> > > > Reading symbols from /usr/local/lib/libxml2.so.2...
> > > > Reading symbols from /usr/local/lib/liblzma.so.5...
> > > > (No debugging symbols found in /usr/local/lib/liblzma.so.5)
> > > > Reading symbols from /usr/local/lib/libical.so.3...
> > > > Reading symbols from /usr/local/lib/libicalss.so.3...
> > > > Reading symbols from /usr/local/lib/libicalvcal.so.3...
> > > > Reading symbols from /usr/local/lib/libicui18n.so.63...
> > > > Reading symbols from /usr/local/lib/libicuuc.so.63...
> > > > Reading symbols from /usr/local/lib/libicudata.so.63...
> > > > (No debugging symbols found in /usr/local/lib/libicudata.so.63)
> > > > Reading symbols from /usr/local/lib/libsqlite3.so.0...
> > > > (No debugging symbols found in /usr/local/lib/libsqlite3.so.0)
> > > > Reading symbols from /usr/local/lib/libz.so.1...
> > > > (No debugging symbols found in /usr/local/lib/libz.so.1)
> > > > Reading symbols from /lib64/libm.so.6...
> > > > Reading symbols from /lib64/libdl.so.2...
> > > > Reading symbols from /lib64/libpthread.so.0...
> > > > [Thread debugging using libthread_db enabled]
> > > > Using host libthread_db library "/lib64/libthread_db.so.1".
> > > > Reading symbols from /lib64/libc.so.6...
> > > > Reading symbols from /lib64/ld-linux-x86-64.so.2...
> > > > Reading symbols from /lib64/libresolv.so.2...
> > > > Reading symbols from /usr/local/lib/libdb-18.1.so...
> > > > Reading symbols from /usr/local/lib64/libstdc++.so.6...
> > > > Reading symbols from /usr/local/lib64/libgcc_s.so.1...
> > > > Reading symbols from /usr/local/lib64/libssl.so.1.1...
> > > > Reading symbols from /usr/local/lib64/libcrypto.so.1.1...
> > > > Reading symbols from /lib64/libnss_db.so.2...
> > > > Reading symbols from /lib64/libnss_files.so.2...
> > > > Reading symbols from /lib64/libnss_dns.so.2...
> > > > 0x0000000000405406 in child_janitor (now=...) at master/master.c:1192
> > > > 1192	        janitor_position = janitor_position % child_table_size;
> > > > (gdb) bt f
> > > >   Id   Target Id                                 Frame 
> > > > * 1    Thread 0x7f6a08759780 (LWP 9247) "master" 0x0000000000405406 in child_janitor (now=...) at master/master.c:1192
> > > > #0  0x0000000000405406 in child_janitor (now=...) at master/master.c:1192
> > > >         i = 9299
> > > >         p = 0x4132e0 <ctable+16224>
> > > >         c = 0x0
> > > > #1  0x0000000000409dd7 in main (argc=4, argv=0x7ffea3075108) at master/master.c:2600
> > > >         i = 14
> > > >         ready_fds = 3
> > > >         total_children = 11
> > > >         tv = {
> > > >           tv_sec = 0,
> > > >           tv_usec = 0
> > > >         }
> > > >         msg = {
> > > >           message = 1,
> > > >           service_pid = 28219
> > > >         }
> > > >         maxfd = 41
> > > >         tvptr = 0x0
> > > >         interrupted = 0
> > > >         pidfile = 0x40c4f0 "/var/run/cyrus-master.pid"
> > > >         pidfile_lock = 0x2135ba0 "/usr/local/etc/cyrus/imapdlocal.conf"
> > > >         startup_pipe = {6, 7}
> > > >         pidlock_fd = -1
> > > >         i = 14
> > > >         opt = -1
> > > >         close_std = 1
> > > >         daemon_mode = 1
> > > >         error_log = 0x0
> > > >         alt_config = 0x0
> > > >         fd = 3
> > > >         rfds = {
> > > >           fds_bits = {266272, 0 <repeats 15 times>}
> > > >         }
> > > >         r = 1
> > > >         now = {
> > > >           tv_sec = 1574690925,
> > > >           tv_usec = 958878
> > > >         }
> > > >         p = 0x0
> > > > quit
> > > > Detaching from program: /usr/local/libexec/master, process 9247
> > > > [Inferior 1 (process 9247) detached]
> > > > 
> > > > 
> > > > 
> > > > 
> 
>

From ellie at fastmail.com  Tue Dec  3 22:27:17 2019
From: ellie at fastmail.com (ellie timoney)
Date: Wed, 04 Dec 2019 14:27:17 +1100
Subject: Cyrus IMAPd version 3.1.8
Message-ID: <2f1624ab-d716-499f-9c97-72fa2edc8ff8@www.fastmail.com>

The Cyrus team is pleased to announce the immediate availability of a new version of Cyrus IMAP: 3.1.8

This is a snapshot of the master branch, and should be considered for testing purposes and bleeding-edge features only. It is available as a git tag, which can be found here:

https://github.com/cyrusimap/cyrus-imapd/releases/tag/cyrus-imapd-3.1.8

Join us on GitHub at https://github.com/cyrusimap/cyrus-imapd to report issues, join the discussion of new features for the next Cyrus IMAP release, and contribute to the documentation.

On behalf of the Cyrus team, 

ellie

From David.Luong at interoptechnologies.com  Wed Dec  4 12:37:19 2019
From: David.Luong at interoptechnologies.com (Luong, David)
Date: Wed, 4 Dec 2019 12:37:19 -0500
Subject: Error building cyrus-imapd-3.1.8
Message-ID: <DA0D4AEF.24224%David.Luong@interoptechnologies.com>

Hi,

I'm building with the following options:

$ autoreconf -i -s
$ ./configure --enable-http --enable-jmap --enable-autocreate --enable-murder --enable-idled --enable-xapian --prefix=/usr/cyrus
$ make

The build does not complete; it fails with the following errors.

make[2]: Leaving directory `/cyrus/cyrus-imapd-3.1.8'
Making all in perl/annotator
make[2]: Entering directory `/cyrus/cyrus-imapd-3.1.8/perl/annotator'
make[2]: *** No rule to make target `all'.  Stop.
make[2]: Leaving directory `/cyrus/cyrus-imapd-3.1.8/perl/annotator'
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory `/cyrus/cyrus-imapd-3.1.8'
make: *** [all] Error 2


Please advise.

Regards,
David.


From brong at fastmailteam.com  Thu Dec  5 06:52:36 2019
From: brong at fastmailteam.com (Bron Gondwana)
Date: Thu, 05 Dec 2019 22:52:36 +1100
Subject: Cyrus IMAPd version 3.1.8
In-Reply-To: <2f1624ab-d716-499f-9c97-72fa2edc8ff8@www.fastmail.com>
References: <2f1624ab-d716-499f-9c97-72fa2edc8ff8@www.fastmail.com>
Message-ID: <44ddb82b-b952-4227-9ee9-31e19c3a3c84@dogfood.fastmail.com>

FYI: this is almost exactly what Fastmail is running in production right now - it has about 4 minor commits beyond current production, and is missing the handful of Fastmail-specific magic!

OldRev: cyrus-imapd-3.1.8
NewRev: fmstable-20191203v1

Removes the following commits:
 44210c59 2019-12-02 brong: jmap: don't need to update the alive value, mailbox.c already did that on write
 463f6291 2019-12-03 brong: caldav: fix bogus && to & in read_cb
 41f6117e 2019-12-02 brong: caldav: scheduling enabled should always be checked on the shared annotation (aka: owner)
 3d69f08c 2019-12-03 rsto: jmap_mail: report "xapian" perf filter for contact group searches
 469cacc0 2019-12-04 rsto: jmap_mail: move Identity data to jmap:submission capability
 dc91d2d4 2019-12-04 rsto: jmap_mail: reject mutable search in queryChanges
 0b757d88 2019-12-04 rsto: jmap_mail_query: don't crash for nested multipart alternatives
 9b30ee3f 2019-12-04 ellie: release notes for 3.1.8
 18d157e0 2019-12-04 ellie: fix cve link in 3.1.7 release notes
 96d194de 2019-12-04 ellie: developer release 3.1.8

Adds the following commits:
 1c6ed3ad 2015-03-30 brong: Fastmail Secrets (no rated)
 5ba2dbee 2015-03-30 brong: Fastmail ONLY - make assertion failures and fatal errors into coredumps
 26d563c1 2015-03-30 brong: Fastmail ONLY - Remove sieve action string
 0e71d55c 2017-08-18 brong: Fastmail ONLY - don't fiddle timezone data in http_caldav_sched.c
 61da4794 2018-06-26 brong: Fastmail ONLY - re-apply the VEVENTS ONLY patch for alarms
 c51f3989 2019-02-06 rsto: Fastmail ONLY - mailbox owners always have ACL_ADMIN in JMAP
 2f3e9516 2015-08-07 brong: mkdebian: fastmail build script (v29)

This is from the attached "GitBranchDiff" script.

All the commits listed as only being on "master" will be merged into Fastmail production next week when we rebase our build on the 3.1.8 tag (and possibly more changes from master too)

Cheers,

Bron.

On Wed, Dec 4, 2019, at 14:27, ellie timoney wrote:
> The Cyrus team is pleased to announce the immediate availability of a new version of Cyrus IMAP: 3.1.8
> 
> This is a snapshot of the master branch, and should be considered for testing purposes and bleeding-edge features only. It is available as a git tag, which can be found here:
> 
> https://github.com/cyrusimap/cyrus-imapd/releases/tag/cyrus-imapd-3.1.8
> 
> Join us on Github at https://github.com/cyrusimap/cyrus-imapd to report issues, join in the deliberations of new features for the next Cyrus IMAP release, and to contribute to the documentation.
> 
> On behalf of the Cyrus team, 
> 
> ellie

--
 Bron Gondwana, CEO, Fastmail Pty Ltd
 brong at fastmailteam.com

-------------- next part --------------
A non-text attachment was scrubbed...
Name: GitBranchDiff.pl
Type: application/x-perl
Size: 1865 bytes
Desc: not available
URL: <http://lists.andrew.cmu.edu/pipermail/cyrus-devel/attachments/20191205/4d0e1781/attachment.pl>

From ellie at fastmail.com  Thu Dec  5 17:30:13 2019
From: ellie at fastmail.com (ellie timoney)
Date: Fri, 06 Dec 2019 09:30:13 +1100
Subject: Error building cyrus-imapd-3.1.8
In-Reply-To: <DA0D4AEF.24224%David.Luong@interoptechnologies.com>
References: <DA0D4AEF.24224%David.Luong@interoptechnologies.com>
Message-ID: <d4918175-e2b8-4f14-9b87-fbe46017d306@www.fastmail.com>

Hi David,

That smells like a missing dependency. Have you reviewed https://www.cyrusimap.org/dev/imap/developer/compiling.html ?

Looking at the error, and glancing at the dependencies list, I wonder if you need 'perl-devel'. It's listed as a developer-only dependency, but because you're building from a git tag and not a distribution tarball (where some things with tricky dependencies have been pre-compiled), you will probably need some or all of the developer dependencies as well.

It would be nice if configure reported the missing dependency, instead of succeeding and leaving the build to fail later. If you can track down which missing dependency caused this problem, please let us know and I'll update configure to complain about it. :)

Cheers,

ellie

On Thu, Dec 5, 2019, at 4:37 AM, Luong, David wrote:
> Hi,
> 
> I'm building with the following options:
> 
> $ autoreconf -i -s
> $ ./configure --enable-http --enable-jmap --enable-autocreate --enable-murder --enable-idled --enable-xapian --prefix=/usr/cyrus
> $ make
> 
> The build is not completed with the following errors.
> 
> make[2]: Leaving directory `/cyrus/cyrus-imapd-3.1.8'
> Making all in perl/annotator
> make[2]: Entering directory `/cyrus/cyrus-imapd-3.1.8/perl/annotator'
> make[2]: *** No rule to make target `all'. Stop.
> make[2]: Leaving directory `/cyrus/cyrus-imapd-3.1.8/perl/annotator'
> make[1]: *** [all-recursive] Error 1
> make[1]: Leaving directory `/cyrus/cyrus-imapd-3.1.8'
> make: *** [all] Error 2
> 
> 
> Please advise.
> 
> Regards,
> David.
> 
> 
> 
> 

From dilyan.palauzov at aegee.org  Tue Dec 10 17:52:44 2019
From: dilyan.palauzov at aegee.org (=?UTF-8?Q?=D0=94=D0=B8=D0=BB=D1=8F=D0=BD_?= =?UTF-8?Q?=D0=9F=D0=B0=D0=BB=D0=B0=D1=83=D0=B7=D0=BE=D0=B2?=)
Date: Tue, 10 Dec 2019 22:52:44 +0000
Subject: 8074597e mailbox.c: release cache files when locking index
Message-ID: <b6c01ed48cc117b87c241fb12129f2841bac62be.camel@aegee.org>

Hello,

I have the problem that on 3.0 the cyrus.cache sometimes gets truncated to 4 bytes (usually, but not always, by
cyr_expire). Reconstructing cyrus.cache on Inbox then takes a great deal of I/O and time, and after a while
cyrus.cache gets truncated again.

Does the commit https://github.com/cyrusimap/cyrus-imapd/commit/e8074597e84cfb62cc fix this problem? Is it useful for
3.0? Does anybody see similar problems in syslog? What problem does this commit solve?

Greetings
  Дилян


From David.Luong at interoptechnologies.com  Wed Dec 11 18:40:17 2019
From: David.Luong at interoptechnologies.com (Luong, David)
Date: Wed, 11 Dec 2019 18:40:17 -0500
Subject: Error building cyrus-imapd-3.1.8
In-Reply-To: <d4918175-e2b8-4f14-9b87-fbe46017d306@www.fastmail.com>
References: <DA0D4AEF.24224%David.Luong@interoptechnologies.com>
 <d4918175-e2b8-4f14-9b87-fbe46017d306@www.fastmail.com>
Message-ID: <DA16D9DB.24C31%David.Luong@interoptechnologies.com>

Hi Ellie,

I finally resolved the dependencies by installing the following packages.

$ yum install python-docutils.noarch -y
$ yum install python-sphinx -y
$ yum install python-pygments.noarch -y
$ yum install python3-pip.noarch -y
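
A quick way to check that the Python modules behind those packages are visible to the interpreter before re-running the build can be sketched as follows. This is only a sketch under assumptions: docutils, sphinx and pygments are the import names these packages normally provide, the helper name missing_modules is made up for this example, and the thread does not establish which of the packages the Cyrus build actually required.

```python
import importlib.util

# Import names assumed to correspond to the packages listed above
# (python-docutils, python-sphinx, python-pygments). The mapping is an
# assumption for illustration, not something taken from the Cyrus build.
DOC_MODULES = ["docutils", "sphinx", "pygments"]


def missing_modules(names):
    """Return the subset of top-level module names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(DOC_MODULES)
    if missing:
        print("not importable:", ", ".join(missing))
    else:
        print("all documentation modules are importable")
```

An empty result only means the modules can be found by this particular Python interpreter; it says nothing about other build dependencies such as perl-devel.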

Regards,
David.

From: Cyrus-devel <cyrus-devel-bounces+david.luong=interoptechnologies.com at lists.andrew.cmu.edu> on behalf of ellie timoney <ellie at fastmail.com>
Date: Thursday, December 5, 2019 at 4:30 PM
To: "cyrus-devel at lists.andrew.cmu.edu" <cyrus-devel at lists.andrew.cmu.edu>
Subject: Re: Error building cyrus-imapd-3.1.8

Hi David,

That smells like a missing dependency.  Have you reviewed https://www.cyrusimap.org/dev/imap/developer/compiling.html ?

Looking at the error, and glancing at the dependencies list, I wonder if you need 'perl-devel'.  It's listed as a developer-only dependency, but because you're building from a git tag and not a distribution tarball (where some things with tricky dependencies have been pre-compiled), you will probably need some or all of the developer dependencies as well.

It would be nice if configure would report the missing dependency, instead of succeeding and then the build fails.  If you can track down which missing dependency caused this problem, please let us know and I'll update configure to complain about it. :)

Cheers,

ellie

On Thu, Dec 5, 2019, at 4:37 AM, Luong, David wrote:
Hi,

I'm building with the following options:

$ autoreconf -i -s
$ ./configure --enable-http --enable-jmap --enable-autocreate --enable-murder --enable-idled --enable-xapian --prefix=/usr/cyrus
$ make

The build is not completed with the following errors.

make[2]: Leaving directory `/cyrus/cyrus-imapd-3.1.8'
Making all in perl/annotator
make[2]: Entering directory `/cyrus/cyrus-imapd-3.1.8/perl/annotator'
make[2]: *** No rule to make target `all'.  Stop.
make[2]: Leaving directory `/cyrus/cyrus-imapd-3.1.8/perl/annotator'
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory `/cyrus/cyrus-imapd-3.1.8'
make: *** [all] Error 2


Please advise.

Regards,
David.



From rjbs at fastmailteam.com  Fri Dec 13 09:59:03 2019
From: rjbs at fastmailteam.com (Ricardo Signes)
Date: Fri, 13 Dec 2019 09:59:03 -0500
Subject: yearly release cycle
Message-ID: <76ffb8a6-9204-445e-98e8-5ac19e4e8a3f@dogfood.fastmail.com>

Hey, remember last month when I asked about releasing Cyrus v3.2 <https://lists.andrew.cmu.edu/pipermail/cyrus-devel/2019-November/004509.html>?

That thread had some more conversation about what needs to get done before v3.2, and I wanted to come back to it and turn some things on their head.

Right now, we're talking about Cyrus releases being feature-bound. "We'll release v3.2 when feature X is done." I think we're not being well-served by that. As feature X is delayed (for various reasons that we can't easily eliminate), it doesn't just delay the feature, but also all the other minor bugfixes and optimizations that we've made in the master branch. Also, it sets up the idea that we delay releases for the sake of fixes, instead of releasing the fixes that are ready.

That is: every additional criterion for a new release is another doorway to delay. Instead of opening those doors, I would rather try to eliminate all of them.

I propose that instead of tying releases to milestones, we tie them to the calendar. For the sake of full disclosure: I am modeling this suggestion on the release cycle of perl <https://metacpan.org/pod/perlpolicy>, which I ran for several years. I found the process more than satisfactory, then.

 1. A new *unstable release* of Cyrus is made every month. We promise only that it compiled and passed the Cassandane test suite on the release manager's computer. It might contain regressions from previous unstable releases, it might have crashers or corruptors. We try to avoid any of these, but the goal here is a snapshot for easy month-to-month testing. These are the odd-middle-digit releases. (3.3.x)

 2. A new *major release* of Cyrus is made every year. We will have tested it on as many configurations as we can readily test. We will have, some time before the release, frozen the branch for risky changes, to reduce churn. In the meantime, new work lives in feature branches. (The changelogs from each unstable release provide a good basis for the whole-year changelog!) These are the even-middle-digit third-digit-zero releases. (3.4.0)

 3. A new *maintenance release* of Cyrus is made for the last two stable releases when there are enough fixes to critical bugs to warrant it. These are the even-middle-digit third-digit-nonzero releases (3.4.1)
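
The numbering rule in the three items above can be sketched as a small classifier. This is only an illustration of the proposal: the function name classify_release is made up, and it assumes plain three-part version strings such as "3.4.1".

```python
def classify_release(version):
    """Classify a version string under the numbering proposed above.

    Returns "unstable", "major", or "maintenance".
    """
    _major, minor, patch = (int(part) for part in version.split("."))
    if minor % 2 == 1:
        return "unstable"      # odd middle digit, e.g. 3.3.x
    if patch == 0:
        return "major"         # even middle digit, third digit zero, e.g. 3.4.0
    return "maintenance"       # even middle digit, third digit nonzero, e.g. 3.4.1


for v in ("3.3.2", "3.4.0", "3.4.1"):
    print(v, classify_release(v))
```

Under this rule the current 3.1.x snapshots count as unstable, and 3.2.0 would be the first major release of the new scheme.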

For the above to work, some more properties need to be maintained.

Maintenance releases should be no-brainers to install, so they must only fix regressions, crashers, security vulnerabilities, and the like. This means that once you're on 3.4.0, you can always upgrade within the 3.4 series with a minimum risk. It also means you get no optimizations, features, and the like.

Major releases must clearly document any incompatible changes or upgrade steps required. Because non-regression bugfixes aren't backported, we want everyone to be able to upgrade from major release to major release, so incompatible changes must be kept to a minimum.

In part, this is just "don't kill off a feature people use just because it's a little annoying." The more important one is "don't introduce half-baked things that might need to change," because people will come to rely on them before you get the updates finished. For features that will require multiple years to get right, they have to go behind a default-off configuration option. I'd strongly suggest they all have a uniform substring like "unstable". That way, when a complaint comes in that the behavior of JMAP calendaring has changed, we can reply, "well, to use it, you had to turn on the unstable_jmap_calendaring" option.
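
To make the "uniform substring" idea concrete, here is a minimal sketch of how enabled unstable options could be picked out of a configuration. Everything except the option name unstable_jmap_calendaring is hypothetical: the dict stands in for whatever the real configuration parser produces, and the other option names and the warning text are invented for the example.

```python
UNSTABLE_MARKER = "unstable"  # the uniform substring suggested above


def enabled_unstable_options(config):
    """Return the names of enabled options whose name contains the marker.

    `config` is a plain dict of option name -> bool, standing in for
    whatever the real configuration parser would produce.
    """
    return sorted(
        name for name, enabled in config.items()
        if enabled and UNSTABLE_MARKER in name
    )


# Hypothetical configuration: only unstable_jmap_calendaring is named in
# the text above, the other option names are invented for the example.
example = {
    "unstable_jmap_calendaring": True,
    "unstable_example_feature": False,
    "example_stable_option": True,
}

for name in enabled_unstable_options(example):
    print(f"warning: {name} is enabled; its behavior may change between releases")
```

Because the marker is a plain substring, a single check like this covers every such option without keeping a separate list of them.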

If we go with this policy, we'll need to...

 1. identify what issues are *blockers* to v3.2.0, meaning they're regressions from v3.0 and would reasonably prevent someone from upgrading; this does *not* include all known bugs, since they may be bugs that already exist in the last stable release!

 2. pick a release target for v3.2.0; I will arbitrarily suggest March 2 as "not too far off, but far off enough that we can get things in order"; also, if you're American, March 2 is 3/2 ;-)

 3. produce a changelog, and especially identify what changes in master need documentation as "incompatible changes"

 4. produce a list of changes in master that should be put behind an unstable configuration option and then do it

 5. decide when to stop merging non-release-related things to master

 6. make a plan for who will do monthly snapshot releases

I've spoken with ellie and Bron about just a few of these, such that I don't think it's all crazy. (ellie notes, correctly, I think, that the first set of releases like this will be the hard ones, where we work out things like "how do we keep track of incompatibilities, upgrade steps, and also how do we make snapshots dead easy to release.") If there's general agreement, I am definitely ready to pitch in and help try to make it work!

--
rjbs


From murch at fastmail.com  Fri Dec 13 10:12:34 2019
From: murch at fastmail.com (Ken Murchison)
Date: Fri, 13 Dec 2019 10:12:34 -0500
Subject: yearly release cycle
In-Reply-To: <76ffb8a6-9204-445e-98e8-5ac19e4e8a3f@dogfood.fastmail.com>
References: <76ffb8a6-9204-445e-98e8-5ac19e4e8a3f@dogfood.fastmail.com>
Message-ID: <f837ecde-a44e-30f9-d975-04ddddd6416d@fastmail.com>

This all seems reasonable to me and I'm in favor of moving forward with 
this plan.


On 12/13/19 9:59 AM, Ricardo Signes wrote:
>
> Hey, remember last month when I asked about releasing Cyrus v3.2 
> <https://lists.andrew.cmu.edu/pipermail/cyrus-devel/2019-November/004509.html>?
>
> That thread had some more conversation about what needs to get done 
> before v3.2, and I wanted to come back to it and turn some things on 
> their head.
>
> Right now, we're talking about Cyrus releases being feature-bound. 
> "We'll release v3.2 when feature X is done." I think we're not being 
> well-served by that. As feature X is delayed (for various reasons that 
> we can't easily eliminate), it doesn't just delay the feature, but 
> also all the other minor bugfixes and optimizations that we've made in 
> the master branch. Also, it sets up the idea that we delay releases 
> for the sake of fixes, instead of releasing the fixes that are ready.
>
> That is: every additional criteria for a new release is another 
> doorway to delay. Instead of opening those doors, I would rather try 
> to eliminate all of them.
>
> I propose that instead of tying releases to milestones, we tie them to 
> the calendar. For the sake of full disclosure: I am modeling this 
> suggestion on the release cycle of perl 
> <https://metacpan.org/pod/perlpolicy>, which I ran for several years. 
> I found the process more than satisfactory, then.
>
> 1.
>
>     A new /unstable release/ of Cyrus is made every month. We promise
>     only that it compiled and passed the Cassandane test suite on the
>     release manager's computer. It might contain regressions from
>     previous unstable releases, it might have crashers or corruptors.
>     We try to avoid any of these, but the goal here is a snapshot for
>     easy month-to-month testing. These are the odd-middle-digit
>     releases. (3.3.x)
>
> 2.
>
>     A new /major release/ of Cyrus is made every year. We will have
>     tested it on as many configurations as we can readily test. We
>     will have, some time before the release, frozen the branch for
>     risky changes, to reduce churn. In the meantime, new work lives in
>     feature branches. (The changelogs from each unstable release
>     provide a good basis for the whole-year changelog!) These are the
>     even-middle-digit third-digit-zero releases. (3.4.0)
>
> 3.
>
>     A new /maintenance release/ of Cyrus is made for the last two
>     stable releases when there are enough fixes to critical bugs to
>     warrant it. These are the even-middle-digit third-digit-nonzero
>     releases (3.4.1)
>
> For the above to work, some more properties need to be maintained.
>
> Maintenance releases should be no-brainers to install, so they must 
> only fix regressions, crashers, security vulnerabilities, and the 
> like. This means that once you're on 3.4.0, you can always upgrade 
> within the 3.4 series with a minimum risk. It also means you get no 
> optimizations, features, and the like.
>
> Major releases must clearly document any incompatible changes or 
> upgrade steps required. Because non-regression bugfixes aren't 
> backported, we want everyone to be able to upgrade from major release 
> to major release, so incompatible changes must be kept to a minimum.
>
> In part, this is just "don't kill off a feature people use just 
> because it's a little annoying." The more important one is "don't 
> introduce half-baked things that might need to change," because people 
> will come to rely on them before you get the updates finished. For 
> features that will require multiple years to get right, they have to 
> go behind a default-off configuration option. I'd strongly suggest 
> they all have a uniform substring like "unstable". That way, when a 
> complaint comes in that the behavior of JMAP calendaring has changed, 
> we can reply, "well, to use it, you had to turn on the 
> unstable_jmap_calendaring" option.
>
> If we go with this policy, we'll need to...
>
> 1.
>
>     identify what issues are /blockers/ to v3.2.0, meaning they're
>     regressions from v3.0 and would reasonably prevent someone from
>     upgrading; this does /not/ include all known bugs, since they may
>     be bugs that already exist in the last stable release!
>
> 2.
>
>     pick a release target for v3.2.0; I will arbitrarily suggest March
>     2 as "not too far off, but far off enough that we can get things
>     in order"; also, if you're American, March 2 is 3/2 ;-)
>
> 3.
>
>     produce a changelog, and especially identify what changes in
>     master need documentation as "incompatible changes"
>
> 4.
>
>     produce a list of changes in master that should be put behind an
>     unstable configuration option and then do it
>
> 5.
>
>     decide when to stop merging non-release-related things to master
>
> 6.
>
>     make a plan for who will do monthly snapshot releases
>
> I've spoken with ellie and Bron about just a few of these, such that I 
> don't think it's all crazy. (ellie notes, correctly, I think, that the 
> first set of releases like this will be the hard ones, where we work 
> out things like "how do we keep track of incompatibilities, upgrade 
> steps, and also how do we make snapshots dead easy to release.") If 
> there's general agreement, I am definitely ready to pitch in and help 
> try to make it work!
>
> --
> rjbs
>
>
-- 
Ken Murchison
Cyrus Development Team
Fastmail US LLC


From me at anatoli.ws  Tue Dec 17 12:58:26 2019
From: me at anatoli.ws (Anatoli)
Date: Tue, 17 Dec 2019 14:58:26 -0300
Subject: yearly release cycle
In-Reply-To: <76ffb8a6-9204-445e-98e8-5ac19e4e8a3f@dogfood.fastmail.com>
References: <76ffb8a6-9204-445e-98e8-5ac19e4e8a3f@dogfood.fastmail.com>
Message-ID: <0dd011b9-1300-8596-d2e1-1169ffa7da0e@anatoli.ws>

Hi Ricardo,

I find some of the ideas in the proposed changes interesting. I especially
like the "uniform substring like 'unstable'" (with startup-time warnings if
such options change or become stable) and "Maintenance releases should be
no-brainers to install... This means that once you're on 3.4.0, you can
always upgrade within the 3.4 series with a minimum risk. It also means
you get no optimizations, features, and the like.", though I wouldn't
limit the fixes to only the critical ones. Any bugfix that doesn't
change the behavior should be included in the stable maintenance releases.

But I couldn't understand from the description what the benefits are of
tying major releases to certain calendar dates, versus making a release
when certain desired features are implemented and well tested.

What happens if some major new feature, that is a must for a new major
release to be published in a week, just isn't stable enough yet? Would
it have to wait for an entire year to be included in the next major
release? Or would you release it anyway as a stable release with known
issues?

Then, when you implement a new large feature, who would test it? Today,
for example, I (as an advanced user and a potential community dev) can
run 3.1 branch at some semi-production deployments (and I sometimes do)
and report issues. If, with the new scheme, you only guarantee that the
unstable branch just compiles, certainly I wouldn't be using it
anywhere, and probably neither would other users. Then pre-production
testing of new features would be exclusively the developers' task, with
obvious limitations.

So when the devs are sure that a new feature works well (in their setups
and for their use cases), it is included in the next major stable
release... and suddenly a lot of migrating users start finding issues.
That could create an impression that the new stable releases of Cyrus
are not that stable at all.


> As feature X is delayed (for various reasons that we can't easily
> eliminate), it doesn't just delay the feature, but also all the other
> minor bugfixes and optimizations that we've made in the master branch.

Why would a new feature of a stable release (3.2.0) delay bugfixes in
the current stable branch (3.0)?

> Also, it sets up the idea that we delay releases for the sake of
> fixes, instead of releasing the fixes that are ready.

I don't understand what you mean here, but with the current scheme
(AFAIK) the bug fixes go to the current stable branch (3.0) and all
users receive them without delay. New development happens in dev. When
some new feature is stable according to the devs (well tested in all
environments available to them), it is published as a new minor release
in the unstable branch (3.1.x). These are expected to be fully functional
releases, just not yet proven bug-free by time and the community.

I'm not sure how new major releases are managed today, but it could be
done this way: at some point in time, when devs decide that the
unstable/3.1 branch has accumulated enough features to be published, 3.1
is frozen for new features and it becomes 3.2.0-RC1 so the community in
general could start testing the new candidate stable version in their
test deployments. If issues are found, they are fixed in RC2, RC3, and
so on until no issues are reported for, say, 1 month. Then, the last
issues-free RC becomes 3.2.0 release.

At the same time, when 3.1 is frozen for new features, a 3.3 branch is
created and new features start landing there.

And the current stable branch 3.0 receives bug fixes as usual during all
this time. New optimizations probably won't be included in the 3.0.x
maintenance versions, but that's OK IMO. It's stable, not cutting-edge
after all. But it is bug-free to the extent possible. All bugs, major
and minor (without behavior changes), are fixed there immediately.

This is a typical release cycle of many server projects. The main
advantage over date-bound releases is that the releases are published
when they are ready, not when we reach some specific point in time.

The disadvantage of the potential for delays could be mitigated by
defining certain criteria for the features to be included in each major
release. Also, some flexible dates could be defined, e.g. to publish a
major release every 6-12 months.

Regards,
Anatoli

On 13/12/19 11:59, Ricardo Signes wrote:
> Hey, remember last month when I asked about releasing Cyrus v3.2
> <https://lists.andrew.cmu.edu/pipermail/cyrus-devel/2019-November/004509.html>?
> 
> That thread had some more conversation about what needs to get done
> before v3.2, and I wanted to come back to it and turn some things on
> their head.
> 
> Right now, we're talking about Cyrus releases being feature-bound.
> "We'll release v3.2 when feature X is done." I think we're not being
> well-served by that. As feature X is delayed (for various reasons that
> we can't easily eliminate), it doesn't just delay the feature, but also
> all the other minor bugfixes and optimizations that we've made in the
> master branch. Also, it sets up the idea that we delay releases for the
> sake of fixes, instead of releasing the fixes that are ready.
> 
> That is: every additional criterion for a new release is another doorway
> to delay. Instead of opening those doors, I would rather try to
> eliminate all of them.
> 
> I propose that instead of tying releases to milestones, we tie them to
> the calendar. For the sake of full disclosure: I am modeling this
> suggestion on the release cycle of perl
> <https://metacpan.org/pod/perlpolicy>, which I ran for several years. I
> found the process more than satisfactory, then.
> 
>  1.
> 
>     A new /unstable release/ of Cyrus is made every month. We promise
>     only that it compiled and passed the Cassandane test suite on the
>     release manager's computer. It might contain regressions from
>     previous unstable releases, it might have crashers or corruptors. We
>     try to avoid any of these, but the goal here is a snapshot for easy
>     month-to-month testing. These are the odd-middle-digit releases. (3.3.x)
> 
>  2.
> 
>     A new /major release/ of Cyrus is made every year. We will have
>     tested it on as many configurations as we can readily test. We will
>     have, some time before the release, frozen the branch for risky
>     changes, to reduce churn. In the meantime, new work lives in feature
>     branches. (The changelogs from each unstable release provide a good
>     basis for the whole-year changelog!) These are the even-middle-digit
>     third-digit-zero releases. (3.4.0)
> 
>  3.
> 
>     A new /maintenance release/ of Cyrus is made for the last two stable
>     releases when there are enough fixes to critical bugs to warrant it.
>     These are the even-middle-digit third-digit-nonzero releases (3.4.1)
> 
> For the above to work, some more properties need to be maintained.
> 
> Maintenance releases should be no-brainers to install, so they must only
> fix regressions, crashers, security vulnerabilities, and the like. This
> means that once you're on 3.4.0, you can always upgrade within the 3.4
> series with a minimum risk. It also means you get no optimizations,
> features, and the like.
> 
> Major releases must clearly document any incompatible changes or upgrade
> steps required. Because non-regression bugfixes aren't backported, we
> want everyone to be able to upgrade from major release to major release,
> so incompatible changes must be kept to a minimum.
> 
> In part, this is just "don't kill off a feature people use just because
> it's a little annoying." The more important one is "don't introduce
> half-baked things that might need to change," because people will come
> to rely on them before you get the updates finished. For features that
> will require multiple years to get right, they have to go behind a
> default-off configuration option. I'd strongly suggest they all have a
> uniform substring like "unstable". That way, when a complaint comes in
> that the behavior of JMAP calendaring has changed, we can reply, "well,
> to use it, you had to turn on the unstable_jmap_calendaring" option.
> 
> If we go with this policy, we'll need to...
> 
>  1.
> 
>     identify what issues are /blockers/ to v3.2.0, meaning they're
>     regressions from v3.0 and would reasonably prevent someone from
>     upgrading; this does /not/ include all known bugs, since they may be
>     bugs that already exist in the last stable release!
> 
>  2.
> 
>     pick a release target for v3.2.0; I will arbitrarily suggest March 2
>     as "not too far off, but far off enough that we can get things in
>     order"; also, if you're American, March 2 is 3/2 ;-)
> 
>  3.
> 
>     produce a changelog, and especially identify what changes in master
>     need documentation as "incompatible changes"
> 
>  4.
> 
>     produce a list of changes in master that should be put behind an
>     unstable configuration option and then do it
> 
>  5.
> 
>     decide when to stop merging non-release-related things to master
> 
>  6.
> 
>     make a plan for who will do monthly snapshot releases
> 
> I've spoken with ellie and Bron about just a few of these, such that I
> don't think it's all crazy. (ellie notes, correctly, I think, that the
> first set of releases like this will be the hard ones, where we work out
> things like "how do we keep track of incompatibilities, upgrade steps,
> and also how do we make snapshots dead easy to release.") If there's
> general agreement, I am definitely ready to pitch in and help try to
> make it work!
> 
> -- 
> rjbs
> 
> 
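
To make the numbering in the quoted proposal concrete, here is a minimal
Python sketch that classifies a version string as an unstable snapshot, a
major release, or a maintenance release, following the odd/even middle-digit
convention described above. The function name classify_release and the strict
three-part MAJOR.MIDDLE.PATCH format are assumptions made for illustration;
they are not part of Cyrus or its tooling.

```python
import re


def classify_release(version: str) -> str:
    """Classify a version string under the proposed numbering scheme.

    Hypothetical helper: assumes a strict MAJOR.MIDDLE.PATCH format.
    """
    match = re.fullmatch(r"(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"expected MAJOR.MIDDLE.PATCH, got {version!r}")
    middle = int(match.group(2))
    patch = int(match.group(3))
    if middle % 2 == 1:
        # Odd middle digit: monthly unstable snapshot, e.g. 3.3.x
        return "unstable"
    if patch == 0:
        # Even middle digit, third digit zero: yearly major release, e.g. 3.4.0
        return "major"
    # Even middle digit, third digit nonzero: maintenance release, e.g. 3.4.1
    return "maintenance"


if __name__ == "__main__":
    for v in ("3.3.2", "3.4.0", "3.4.1"):
        print(v, classify_release(v))
```

Under this reading, a maintenance release is recognisable from the version
number alone, which is what lets an administrator treat any upgrade within
one even-numbered series as low-risk.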

From rjbs at fastmailteam.com  Tue Dec 17 22:08:20 2019
From: rjbs at fastmailteam.com (Ricardo Signes)
Date: Tue, 17 Dec 2019 22:08:20 -0500
Subject: yearly release cycle
In-Reply-To: <0dd011b9-1300-8596-d2e1-1169ffa7da0e@anatoli.ws>
References: <76ffb8a6-9204-445e-98e8-5ac19e4e8a3f@dogfood.fastmail.com>
 <0dd011b9-1300-8596-d2e1-1169ffa7da0e@anatoli.ws>
Message-ID: <876bd6ef-b002-4da0-8ed9-ecd941db77c1@dogfood.fastmail.com>

On Tue, Dec 17, 2019, at 12:58, Anatoli wrote:
> Hi Ricardo,

Hi!

> But I couldn't understand from the description what are the benefits of
> tying major releases to certain calendar dates vs to make a release when
> certain desired features are implemented and well tested.

By promising a new major release every year, you know that when your significant improvement to Cyrus is accepted, it will very likely be released within a year. Right now, users who added a major feature in 2017 are still waiting for a stable release. For example, Sieve duplicate detection was implemented in March 2017. I don't think we have a stable version that has this feature. If this had been a contribution from a potential repeat contributor, it's easy to imagine that they'd have given up in frustration by now. (Good thing it was good ol' reliable Ken!)

The problem with "we will release when X is ready" is that X might not be ready in a year, meaning all the little things don't get released. Also, you can't shove those into maintenance releases, because the little things can still be destabilizing, which makes a maintenance upgrade less likely to be trouble-free.

In the event that a cool new feature isn't quite ready a month before release, I would argue: yes, it has to wait another year. I think it will be pretty rare that this happens, though. If it comes up, of course, an exception to the rules could be discussed, but in my experience, it won't. These kinds of features, in largely volunteer-staffed projects, are rarely good at sticking to a timeline.

> Then, when you implement a new large feature, who would test it?

1. new large features should have tests written for them, which should be run by developers and dedicated test runners
2. some people always like to run snapshot releases; I have often done coding on dev releases of languages, and some people will surely run their personal services on snapshots
3. feature authors write features so they can use them; this means they're also both motivated and likely to use them before they're declared ready for general release

I feel pretty strongly that #3 is the big test. We're almost always close to bleeding edge Cyrus at work, because we have tons of new features that we rely on since cyrus-imapd-3.0.0. We know that many, many of these have been heavily tested in the real world, and we want to declare them generally ready for use, and then be able to do the same regularly as we move forward.

> Today,
> for example, I (as an advanced user and a potential community dev) can
> run 3.1 branch at some semi-production deployments (and I sometimes do)
> and report issues. If, with the new scheme, you only guarantee that the
> unstable branch just compiles, certainly I wouldn't be using it
> anywhere, and probably neither would other users. Then pre-production
> testing of new features would be exclusively the developers' task, with
> obvious limitations.

I think you are seriously overestimating the kind of stability guarantee you get from a 3.1 release. It's really not much more than the proposed snapshot releases, but on a looser timetable. Mostly, we get our current feedback from master, rather than snapshots, because there are fewer known snapshot deployments. Deploying snapshots regularly will give more points where we're specifically asking for feedback. (Also, I guaranteed Cassandane tests would pass, which is a *far* stronger guarantee than compilation.)

My expectation is that in reality, the snapshots will be, at any given time, very close to what Fastmail is running in production, or at least in (real, used by real people for real mail) testing environments.

> So when the devs are sure that a new feature works well (in their setups
> and for their use cases), it is included in the next major stable
> release... and suddenly a lot of migrating users start finding issues.
> That could create an impression that the new stable releases of Cyrus
> are not that stable at all.

I expect these features will have been heavily tested over the course of the time between releases.

> I don't understand what you mean here, but with the current scheme
> (AFAIK) the bug fixes go to the current stable branch (3.0) and all
> users receive them without delays.

There are two kinds of bugfixes. Some are "there is an obvious regression or crasher." Others are "there has long been a bug that meant that SELECT would fail on mUTF-7 sequences containing three hyphens in a row, and I fixed it!" The intent here is to include only the first category in new maintenance releases, because that optimizes maint releases for stability, making them easy to install without fear. The other fixes are put into the next possible snapshot for inclusion in the next major release.

I think your major concerns are:

1. new features might languish for a longer time than needed to be known stable
2. snapshots will be less reliable under this regime than before

I feel strongly that #1 will not be the case. We can always talk about making an interim major release if it comes up, but I am predicting that it will not, and if it does, we will think about it and decide that the effort to make sure we feel good about an unexpected major release is not enough to push us to rush. I acknowledge that reasonable people can disagree on this, but the good thing is: we can wait and see!

I disagree that #2 will be the case. Master does not churn with very much untested code, and I'm hoping we will slow it down even further by putting more features into feature branches until they're more battle-tested. That will get us more "this snapshot introduces feature X, which has been tested by production users and load!" rather than "master has been growing feature X in pieces for months, and it's all a bit weird."

In general, with Fastmail probably-always running a fast-forward of a snapshot in testing, I feel pretty confident about snapshot use for similar under-load testing elsewhere.

-- 
rjbs

From ellie at fastmail.com  Tue Dec 17 22:47:58 2019
From: ellie at fastmail.com (ellie timoney)
Date: Wed, 18 Dec 2019 14:47:58 +1100
Subject: Cyrus IMAPd version 3.1.9
Message-ID: <e0e52428-9197-480b-b864-9467be5d177b@www.fastmail.com>

The Cyrus team is pleased to announce the immediate availability of a new version of Cyrus IMAP: 3.1.9

This is a snapshot of the master branch, and should be considered for testing purposes and bleeding-edge features only. It is available as a git tag, which can be found here:

https://github.com/cyrusimap/cyrus-imapd/releases/tag/cyrus-imapd-3.1.9

Join us on Github at https://github.com/cyrusimap/cyrus-imapd to report issues, join in the deliberations of new features for the next Cyrus IMAP release, and to contribute to the documentation.

On behalf of the Cyrus team, 

ellie

From dilyan.palauzov at aegee.org  Wed Dec 18 05:13:06 2019
From: dilyan.palauzov at aegee.org (=?UTF-8?Q?=D0=94=D0=B8=D0=BB=D1=8F=D0=BD_?= =?UTF-8?Q?=D0=9F=D0=B0=D0=BB=D0=B0=D1=83=D0=B7=D0=BE=D0=B2?=)
Date: Wed, 18 Dec 2019 10:13:06 +0000
Subject: yearly release cycle
In-Reply-To: <76ffb8a6-9204-445e-98e8-5ac19e4e8a3f@dogfood.fastmail.com>
References: <76ffb8a6-9204-445e-98e8-5ac19e4e8a3f@dogfood.fastmail.com>
Message-ID: <321e02a902102e977ea31b303dd122afedb45cb1.camel@aegee.org>

Hello!

This is a very good idea!  In particular, it narrows the gap between development code and stable code.  Thus fixes
for the stable code will be very similar to fixes for the development code.

Of course, providing fixes, like optimizations, only makes sense if it is predictable that the changes will be
integrated in a reasonable time.

The email of Quanah Gibson-Mount from 25 July about the general policy on integrating patches in Cyrus SASL has not
been answered.

Will the time-based release policy also apply to Cyrus SASL?

The documentation of Cyrus IMAP, in its invisible parts, needs some tweaking; for example, the table of contents should
be loop-free.  In March I submitted a fix at https://github.com/cyrusimap/cyrus-imapd/pull/2703 which is still pending.
By now I have forgotten the details, so even if somebody starts integrating this and has questions, I am not willing to
reread how Sphinx works (I have not used Sphinx since then) and work out why I did things in a particular way.

Each release announcement encourages contributions to the documentation.

Regards
  Дилян



From rjbs at fastmailteam.com  Fri Dec 20 21:02:32 2019
From: rjbs at fastmailteam.com (Ricardo Signes)
Date: Fri, 20 Dec 2019 21:02:32 -0500
Subject: yearly release cycle
In-Reply-To: <321e02a902102e977ea31b303dd122afedb45cb1.camel@aegee.org>
References: <76ffb8a6-9204-445e-98e8-5ac19e4e8a3f@dogfood.fastmail.com>
 <321e02a902102e977ea31b303dd122afedb45cb1.camel@aegee.org>
Message-ID: <3b3208ef-d5fd-4ccc-bc80-5d6ce03ce9bb@beta.fastmail.com>

On Wed, Dec 18, 2019, at 05:13, Дилян Палаузов wrote:
> The email of Quanah Gibson-Mount from 25 July about the general policy on integrating patches in Cyrus SASL is not
> answered.
> 
> Will the time-based release policy also apply to Cyrus SASL?

I think there was some discussion / decision on this a while back, but I don't remember. cyrus-sasl always floats just outside my field of vision... I *think* I'll be talking to Ken on Monday, who can clear things up.

-- 
rjbs

From quanah at symas.com  Fri Dec 20 22:15:01 2019
From: quanah at symas.com (Quanah Gibson-Mount)
Date: Fri, 20 Dec 2019 19:15:01 -0800
Subject: yearly release cycle
In-Reply-To: <3b3208ef-d5fd-4ccc-bc80-5d6ce03ce9bb@beta.fastmail.com>
References: <76ffb8a6-9204-445e-98e8-5ac19e4e8a3f@dogfood.fastmail.com>
 <321e02a902102e977ea31b303dd122afedb45cb1.camel@aegee.org>
 <3b3208ef-d5fd-4ccc-bc80-5d6ce03ce9bb@beta.fastmail.com>
Message-ID: <BC98C0F0B8831414F0B5F812@[192.168.1.144]>


--On Friday, December 20, 2019 9:02 PM -0500 Ricardo Signes 
<rjbs at fastmailteam.com> wrote:

> I think there was some discussion / decision on this a while back, but I
> don't remember.  cyrus-sasl always floats just outside my field of
> vision...  I think I'll be talking to Ken on Monday, who can clear things
> up.

Last August, Ken and I discussed giving Howard Chu and me commit access
to the cyrus-sasl portion of the project.  This was agreed on, but it
never happened.  Howard and I are still interested and willing,
particularly given cyrus-sasl's importance to OpenLDAP.

Regards,
Quanah

--

Quanah Gibson-Mount
Product Architect
Symas Corporation
Packaged, certified, and supported LDAP solutions powered by OpenLDAP:
<http://www.symas.com>

From murch at fastmail.com  Sat Dec 21 08:52:45 2019
From: murch at fastmail.com (Ken Murchison)
Date: Sat, 21 Dec 2019 08:52:45 -0500
Subject: yearly release cycle
In-Reply-To: <BC98C0F0B8831414F0B5F812@[192.168.1.144]>
References: <76ffb8a6-9204-445e-98e8-5ac19e4e8a3f@dogfood.fastmail.com>
 <321e02a902102e977ea31b303dd122afedb45cb1.camel@aegee.org>
 <3b3208ef-d5fd-4ccc-bc80-5d6ce03ce9bb@beta.fastmail.com>
 <BC98C0F0B8831414F0B5F812@[192.168.1.144]>
Message-ID: <02c18fb9-2848-53a2-b561-4c0db961e178@fastmail.com>

Quanah,

I will try to make this happen next week.



-- 
Ken Murchison
Cyrus Development Team
Fastmail US LLC


From ellie at fastmail.com  Sun Dec 22 19:36:03 2019
From: ellie at fastmail.com (ellie timoney)
Date: Mon, 23 Dec 2019 11:36:03 +1100
Subject: yearly release cycle
In-Reply-To: <02c18fb9-2848-53a2-b561-4c0db961e178@fastmail.com>
References: <76ffb8a6-9204-445e-98e8-5ac19e4e8a3f@dogfood.fastmail.com>
 <321e02a902102e977ea31b303dd122afedb45cb1.camel@aegee.org>
 <3b3208ef-d5fd-4ccc-bc80-5d6ce03ce9bb@beta.fastmail.com>
 <BC98C0F0B8831414F0B5F812@[192.168.1.144]>
 <02c18fb9-2848-53a2-b561-4c0db961e178@fastmail.com>
Message-ID: <a10a8327-b1ae-4f27-b9f2-b2886bc4a840@www.fastmail.com>

I tracked down Quanah's github account from a recent pull request, and sent through an invitation to the cyrusimap organisation.

Not sure what Howard Chu's email address or github username is?  I can invite him too once I know.

Cheers,

ellie


From quanah at symas.com  Mon Dec 23 10:03:00 2019
From: quanah at symas.com (Quanah Gibson-Mount)
Date: Mon, 23 Dec 2019 07:03:00 -0800
Subject: yearly release cycle
In-Reply-To: <a10a8327-b1ae-4f27-b9f2-b2886bc4a840@www.fastmail.com>
References: <76ffb8a6-9204-445e-98e8-5ac19e4e8a3f@dogfood.fastmail.com>
 <321e02a902102e977ea31b303dd122afedb45cb1.camel@aegee.org>
 <3b3208ef-d5fd-4ccc-bc80-5d6ce03ce9bb@beta.fastmail.com>
 <BC98C0F0B8831414F0B5F812@[192.168.1.144]>
 <02c18fb9-2848-53a2-b561-4c0db961e178@fastmail.com>
 <a10a8327-b1ae-4f27-b9f2-b2886bc4a840@www.fastmail.com>
Message-ID: <108C557194349F9EF5D7E2EC@[192.168.1.144]>


--On Monday, December 23, 2019 11:36 AM +1100 ellie timoney 
<ellie at fastmail.com> wrote:

> I tracked down Quanah's github account from a recent pull request, and
> sent through an invitation to the cyrusimap organisation.
>
> Not sure what Howard Chu's email address or github username is?  I can
> invite him too once I know.

Thanks Ellie!  His github username is "hyc".

<https://github.com/hyc>

Regards,
Quanah

--

Quanah Gibson-Mount
Product Architect
Symas Corporation
Packaged, certified, and supported LDAP solutions powered by OpenLDAP:
<http://www.symas.com>

From ellie at fastmail.com  Mon Dec 23 16:15:53 2019
From: ellie at fastmail.com (ellie timoney)
Date: Tue, 24 Dec 2019 08:15:53 +1100
Subject: yearly release cycle
In-Reply-To: <108C557194349F9EF5D7E2EC@[192.168.1.144]>
References: <76ffb8a6-9204-445e-98e8-5ac19e4e8a3f@dogfood.fastmail.com>
 <321e02a902102e977ea31b303dd122afedb45cb1.camel@aegee.org>
 <3b3208ef-d5fd-4ccc-bc80-5d6ce03ce9bb@beta.fastmail.com>
 <BC98C0F0B8831414F0B5F812@[192.168.1.144]>
 <02c18fb9-2848-53a2-b561-4c0db961e178@fastmail.com>
 <a10a8327-b1ae-4f27-b9f2-b2886bc4a840@www.fastmail.com>
 <108C557194349F9EF5D7E2EC@[192.168.1.144]>
Message-ID: <29715f37-38d5-45ea-8acb-4eec0f8faffe@www.fastmail.com>

Thanks, invite sent! :)
