use valgrind / Re: SIGSEGV in cyrus-imapd 3.0.7 mupdate

ellie timoney ellie at fastmail.com
Thu Jul 12 23:36:23 EDT 2018


>From what I'm seeing here, it looks like mupdate calls tls_init_serverengine() for each new STARTTLS session, and then calls tls_shutdown_serverengine()  when that session ends.

The thing is though, the TLS state that these functions manage is something like a singleton, it should only exist once for each process.  And in fact, tls_init_serverengine() enforces this (which is why you don't see the dh_params being allocated for each connection: it only lets itself run once).  This works fine for all the single-threaded service programs, but mupdate is a threaded application and can have multiple connected client sessions in the same process.  The comments at the top of imap/tls.c are instructive:

> * DESCRIPTION
> *       This module is the interface between Cyrus Imapd and the OpenSSL library.
> *       As of now only one filedescriptor can be handled, so only one
> *       TLS channel can be open at a time.

I can't see anything in the code that makes me suspect this comment is out of date.  It doesn't look the slightest bit thread-safe.

At the moment, it looks to me like the STARTTLS support was added to mupdate with no consideration of its thread-safety, and I'm left to assume that people aren't really using STARTTLS for their mupdate connections, otherwise this probably would've been tripped over long ago.  (Or, maybe it was, and someone said "don't do that", and then it all got lost to history?  It would be interesting to hear from the front lines.)

Anyway, it looks to me like the STARTTLS support in mupdate is just fundamentally broken at the moment, and my recommendation is to not use it.  If your IMAP servers need to connect to an mupdate server that's not within their trusted network, I guess you'd need to set up a VPN for it or something along those lines (but I'm no network specialist).

I don't honestly see this being fixed any time soon -- it would require either:

 * a big rewrite of imap/tls.c to make it thread-safe
 * a big rewrite of mupdate to make it single-threaded like the rest of Cyrus
 * a big rewrite of mupdate to make it do its own TLS handling (rather than using imap/tls.c)

All of these are serious tasks with serious testing requirements, especially considering the need to interact with OpenSSL correctly.  Even if someone does produce patches for master, they won't make it back to the 3.0 series.

Sorry to be the bearer of annoying news!

ellie

On Fri, Jul 13, 2018, at 12:57 PM, ellie timoney wrote:
> I'm still digging, but if you amend your log patch to also NULL out 
> dh_params after it frees it, that would at least prevent it trying to 
> free it over and over:
> 
>   #if (OPENSSL_VERSION_NUMBER >= 0x0090800fL)
> -        if (dh_params) DH_free(dh_params);
> +        if (dh_params) {
> +               syslog(LOG_CRIT, "dh_params will be freed %p", dh_params);
> +               DH_free(dh_params);
> +               dh_params = NULL;
> +               syslog(LOG_CRIT, "dh_params were freed %p", dh_params);
> +       }
>   #endif
> 
> On Fri, Jul 13, 2018, at 12:47 PM, ellie timoney wrote:
> > > so it seems to me that the dh_params were set once on startup but  
> > > freed for each closed connection
> > 
> > Yikes :o
> > 
> > On Thu, Jul 12, 2018, at 7:09 PM, Michael Menge wrote:
> > > Hi,
> > > 
> > > Дилян had suggested to add some debug outputs to imap/tls.c
> > > 
> > > --------
> > > 
> > > diff --git a/imap/tls.c b/imap/tls.c
> > > --- a/imap/tls.c
> > > +++ b/imap/tls.c
> > > @@ -893,7 +893,9 @@ EXPORTED int     tls_init_serverengine(const char *ident,
> > > 
> > >   #if (OPENSSL_VERSION_NUMBER >= 0x0090800fL)
> > >       /* Load DH params for DHE-* key exchanges */
> > > +    syslog(LOG_CRIT, "dh_params will be set, current value=%p", dh_params);
> > >       dh_params = load_dh_param(server_key_file, server_cert_file);
> > > +    syslog(LOG_CRIT, "dh_params were set current_value=%p", dh_params);
> > >       SSL_CTX_set_tmp_dh(s_ctx, dh_params);
> > >   #endif
> > > 
> > > @@ -1308,7 +1310,11 @@ EXPORTED int tls_shutdown_serverengine(void)
> > >           }
> > > 
> > >   #if (OPENSSL_VERSION_NUMBER >= 0x0090800fL)
> > > -        if (dh_params) DH_free(dh_params);
> > > +        if (dh_params) {
> > > +               syslog(LOG_CRIT, "dh_params will be freed %p", dh_params);
> > > +               DH_free(dh_params);
> > > +               syslog(LOG_CRIT, "dh_params were freed %p", dh_params);
> > > +       }
> > >   #endif
> > >       }
> > > -----
> > > 
> > > I did run mupdate with this debug output
> > > 
> > > 
> > > Jul 12 10:17:25 mx02 mu/mupdate[6537]: dh_params will be set, current  
> > > value=(nil)
> > > Jul 12 10:17:25 mx02 mu/mupdate[6537]: inittls: Loading DH parameters  
> > > from file
> > > Jul 12 10:17:25 mx02 mu/mupdate[6537]: dh_params were set  
> > > current_value=0x7fc7541b9600
> > > Jul 12 10:17:25 mx02 mu/mupdate[6537]: starttls: TLSv1.2 with cipher  
> > > ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits new) no authentication
> > > Jul 12 10:17:25 mx02 mu/mupdate[6537]: login: msmx02.mail.localhost  
> > > [10.23.21.78] cyrus PLAIN+TLS User logged in
> > > Jul 12 10:18:37 mx02 mu/mupdate[6537]: starttls: TLSv1.2 with cipher  
> > > ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits new) no authentication
> > > Jul 12 10:18:37 mx02 mu/mupdate[6537]: login: msmx02.mail.localhost  
> > > [10.23.21.78] cyrus PLAIN+TLS User logged in
> > > Jul 12 10:18:37 mx02 mu/mupdate[6537]: dh_params will be freed 0x7fc7541b9600
> > > Jul 12 10:18:37 mx02 mu/mupdate[6537]: dh_params were freed 0x7fc7541b9600
> > > Jul 12 10:18:37 mx02 mu/mupdate[6537]: starttls: TLSv1.2 with cipher  
> > > ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits new) no authentication
> > > Jul 12 10:18:37 mx02 mu/mupdate[6537]: login: msmx02.mail.localhost  
> > > [10.23.21.78] cyrus PLAIN+TLS User logged in
> > > Jul 12 10:18:37 mx02 mu/mupdate[6537]: dh_params will be freed 0x7fc7541b9600
> > > Jul 12 10:18:38 mx02 mu/master[6534]: process type:SERVICE  
> > > name:mupdate path:/usr/local/libexec/mupdate age:89.131s pid:6537  
> > > signaled to death by signal 11 (Segmentation fault, core dumped)
> > > Jul 12 10:18:38 mx02 mu/master[6534]: service mupdate/ipv4 pid 6537 in  
> > > READY state: terminated abnormally
> > > 
> > > so it seems to me that the dh_params were set once on startup but  
> > > freed for each closed connection
> > > 
> > > Regards
> > > 
> > >      Michael Menge
> > > 
> > > 
> > > Quoting Michael Menge <michael.menge at zdv.uni-tuebingen.de>:
> > > 
> > > > Hi Дилян,
> > > >
> > > >
> > > > Quoting Дилян Палаузов <Dilyan.Palauzov at aegee.org>:
> > > >
> > > >> Hello Michael,
> > > >>
> > > >> this is likely either a memory mishandling issue (use after free(),
> > > >> double free(), invalid read()/write()...), which gets evident if cyrus
> > > >> is run under valgrind --tool=memcheck.  I run it with
> > > >>
> > > >> valgrind --num-callers=30 --leak-check=full --track-origins=yes --read-
> > > >> var-info=yes --show-leak-kinds=all --trace-children=yes --track-fds=yes
> > > >> /usr/local/cyrus/bin/master -D &> log-file-memcheck
> > > >>
> > > >
> > > > thanks for the valgind options. Valgrind did find some "Invalid read  
> > > > of size 4"
> > > > and "size 8" in DH_free as well as "Invalid write of size 4" in  
> > > > CRYPTO_add_lock
> > > > and "size 8" in OPENSSL_cleanse. As far as i can tell the memory was  
> > > > always free'd
> > > > before by CRYPTO_free (mem.c:434).
> > > >
> > > > I have attached the full log
> > > >
> > > >>
> > > >> Another reason can be multi-threaded inconsistencies: mutexes locked in
> > > >> inconsistent order by differnt threads (while this is not a cause for
> > > >> crash, it leads to deadlock), mutexes locked in one thread and unlocked
> > > >> in another or alike.  This can be detected by valgrind/helgrind
> > > >>
> > > >> valgrind --tool=helgrind --num-callers=30 --leak-check=full --track-
> > > >> origins=yes --read-var-info=yes --trace-children=yes --track-fds=yes
> > > >> /usr/local/cyrus/bin/master -D &> log-file-helgrind
> > > >>
> > > >
> > > > I had to remove some options here, as my valgrind didn't know them with
> > > > --tool=helgrind
> > > >
> > > > I used /usr/bin/valgrind --tool=helgrind --num-callers=30  
> > > > --read-var-info=yes --trace-children=yes --track-fds=yes  
> > > > /usr/local/libexec/master -C /etc/imapd_mu.conf -M  
> > > > /etc/cyrus_mu.conf -p /var/run/cyurs_mu.pid -D &>  
> > > > /tmp/cyrus-mupdate-log-file-helgrind
> > > >
> > > > Valgrind did find some "Possible data race during read" and "This  
> > > > conflicts with a previous write"
> > > > I have attached the full log as well
> > > >
> > > >
> > > >
> > > >> Of course, it is useful to have debugging symbols.  Compiling with -O3
> > > >> migh make things faster, while compiling with -O0 will make it run
> > > >> considerably slower. If ld (=ld.bfd) is very, very recent (e.g. 2.31),
> > > >> then link with  -Wl,-z,noseparate-code otherwise valgrind cannot read
> > > >> the debug info (https://bugs.kde.org/show_bug.cgi?id=395682).
> > > >>
> > > >> Then it shall be possible to see in one or the other log files what is
> > > >> relevant for causing SIGSERV, in particular the first place where bad
> > > >> operation happens, which might be some time before the crash.
> > > >>
> > > >
> > > > I hope this helps untangle the web of multiple threads, race conditions
> > > > and use after free, to find the root of this problem.
> > > >
> > > > If there is any more that i can do to help let me know.
> > > >
> > > >>
> > > >> On Mon, 2018-07-02 at 17:24 +0200, Michael Menge wrote:
> > > >>> Hi,
> > > >>>
> > > >>> we are in the process of setting up our new production mailserver with
> > > >>> cyrus-imapd 3.0.7 on RHEL 7.5 Servers.
> > > >>>
> > > >>> At the moment we encounter many crashes (SIGSEGV) of the mupdate process on
> > > >>> the mupdate master instance. As soon as we issue a command that updates
> > > >>> multiple mailboxes in short time we trigger a SIGSEGV.
> > > >>>
> > > >>>> sam user/zrstes* cyrus all
> > > >>>
> > > >>> Setting ACL on user/zrstes1...OK.
> > > >>> Setting ACL on user/zrstes1/Mail...OK.
> > > >>> Setting ACL on user/zrstes1/Mail/drafts...OK.
> > > >>> Setting ACL on user/zrstes1/Mail/s-spam...OK.
> > > >>> Setting ACL on user/zrstes1/Mail/sent...OK.
> > > >>> Setting ACL on user/zrstes1/Mail/trash...OK.
> > > >>> Setting ACL on user/zrstes1/Mail/v-spam...OK.
> > > >>> Setting ACL on user/zrstes2...cyrus: lrswipkxtea: no connection to server
> > > >>>
> > > >>> I suspect that at this time the connection from the backend to the
> > > >>> mupdate master
> > > >>> is lost as the mupdate process received the SIGSEGV
> > > >>>
> > > >>> (gdb) bt
> > > >>> #0  0x00007fb49c000098 in ?? ()
> > > >>> #1  0x00007fb4a9a59a85 in DH_free (r=0x7fb49c1b9600) at dh_lib.c:194
> > > >>> #2  0x00007fb4aa84dbef in tls_shutdown_serverengine () at imap/tls.c:1311
> > > >>> #3  0x0000000000404075 in conn_free (C=0x7fb4840009f0) at  
> > > >>> imap/mupdate.c:379
> > > >>> #4  0x00000000004062c0 in thread_main (rock=0x0) at imap/mupdate.c:1330
> > > >>> #5  0x00007fb4a771add5 in start_thread (arg=0x7fb4a22d9700) at
> > > >>> pthread_create.c:308
> > > >>> #6  0x00007fb4a718fb3d in clone () at
> > > >>> ../sysdeps/unix/sysv/linux/x86_64/clone.S:113
> > > >>>
> > > >>>
> > > >>> Further information, as DH_free is in the bt, we have NOT configured
> > > >>> dh_params in
> > > >>> our ssl certificate/key or in the imapd.conf
> > > >>>
> > > >
> > > > Kind regards
> > > >
> > > >      Michael Menge
> > > >
> > > >
> > > >
> > > > --------------------------------------------------------------------------------
> > > > M.Menge                                Tel.: (49) 7071/29-70316
> > > > Universität Tübingen                   Fax.: (49) 7071/29-5912
> > > > Zentrum für Datenverarbeitung          mail:  
> > > > michael.menge at zdv.uni-tuebingen.de
> > > > Wächterstraße 76
> > > > 72074 Tübingen
> > > 
> > > 
> > > 
> > > --------------------------------------------------------------------------------
> > > M.Menge                                Tel.: (49) 7071/29-70316
> > > Universität Tübingen                   Fax.: (49) 7071/29-5912
> > > Zentrum für Datenverarbeitung          mail:  
> > > michael.menge at zdv.uni-tuebingen.de
> > > Wächterstraße 76
> > > 72074 Tübingen
> > > 


More information about the Cyrus-devel mailing list