Cyrus 2.5.9 imapd children and tcp_keepalive & idle socket questions

Andy Dorman adorman at ironicdesign.com
Wed Sep 28 10:59:46 EDT 2016


On 09/27/2016 07:08 PM, ellie timoney via Info-cyrus wrote:
> I wonder if we can see what the old processes are actually doing...
>
> Is there any chance you're able to obtain a backtrace of these?
>
> You can use a command something like:
>
>       gdb -p pid /path/to/imapd
>
> where pid is the one of the imapd process ids (which is the last column
> in that ls output from /run/cyrus/proc/) and where the correct path to
> wherever your imapd executable actually lives (I think this might be
> "/usr/lib/cyrus/bin/imapd" on Debian).  You should run this command as
> the same user that your cyrus server runs as (probably "cyrus").
>
> Once you're in gdb, look for output like "Reading symbols from
> /path/to/imapd...".  If this line also contains "no debugging symbols
> found", then we can't get anything useful like this, just "quit" -> "y"
> to exit gdb.  But if it does find debugging symbols, great: run "bt" to
> obtain a backtrace, then "quit" -> "y" to exit, and send through the
> output from the session (probably best to paste the output into a text
> file and then attach that to your email).
>
> Another possibility is to trace the system calls.  You can use a command
> something like this (also as the "cyrus" user):
>
>     strace -p pid
>
> Do not blindly email the output from this command, it may contain
> sensitive/private content!
>
> Let this run for 10 minutes or so, then press ctrl-c to stop it, and
> then I guess kind of try to describe the output, and we can go from
> there.  If it's very noisy, you've caught it in the middle of doing
> something. Wait a little while and try again, or try a different imapd
> pid: we're trying to see what the inactive ones are waiting for.
>
> Cheers,
>
> ellie

Ellie, thanks for doing the above "extended debug" version of a reply. 
While not a total "nube" to debugging compiled processes, I have not 
done it for a long time and never with Linux, so I am unfamiliar with 
the tools.  Your words are a great help.

I installed and ran gdb on one of our "problem" servers against an 
"idled" pid, 16168, and as you suspected, I found this.

Reading symbols from /usr/lib/cyrus/bin/imapd...(no debugging symbols 
found)...done.

FYI, I confirmed this pid is indeed Idle according to this line in 
/run/cyrus/proc/16168

imap	gertrude.ironicdesign.com [192.168.0.16]	donations at cogift.co 
cogift.co!user.donations	Idle

So I have installed and started strace against idled pid 16168 and am 
collecting the output to a file.

sudo -u cyrus strace -p 16168 -t -o cyrusIdled

16168 is a process that was started about two hours ago (before I saw 
your email) after I restarted cyrus to clear the old processes before we 
reached the max. I added -t to get a time stamp on each line in case I 
need to correlate anything weird with what is going on with the 
Blackberry (like perhaps it is unable to connect as the owner drives 
around?)

I am tailing the log and so far I am seeing a 60-second cycle with 
opening three files (donations.lock, cyrus.index, and cyrus.header) and 
then ending with closing all three.

Below is the strace output of a typical cycle I am seeing.

Please note the first and last lines in this output snippet.  As you can 
see by the time stamps, the first line is written at the end of the 
previous 1 minute cycle, but what you can't see here is that it is 
written without the "= 0 (Timeout)".  The next 1 min cycle begins by 
adding the "= 0 (Timeout)" to the end of that line and finishes with the 
same line again without the "= 0 (Timeout)".

So it looks like the pselect call is timing out after 60 seconds and the 
cycle starts over.  I have no idea if this is normal.  I may install 
strace on our dev server and see what the strace output looks like with 
a normal IMAP client like Thunderbird.

09:39:08 pselect6(16, [0 15], NULL, NULL, {60, 0}, {[], 8}) = 0 (Timeout)
09:40:08 rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
09:40:08 gettimeofday({1475073608, 554369}, NULL) = 0
09:40:08 
open("/run/cyrus/lock/domain/W/cogift.co/U/user/donations.lock", 
O_RDWR|O_CREAT|O_TRUNC, 0666) = 19
09:40:08 fcntl(19, F_SETLKW, {l_type=F_RDLCK, l_whence=SEEK_SET, 
l_start=0, l_len=0}) = 0
09:40:08 fcntl(12, F_SETLKW, {l_type=F_RDLCK, l_whence=SEEK_SET, 
l_start=0, l_len=0}) = 0
09:40:08 fstat(12, {st_mode=S_IFREG|0600, st_size=218344, ...}) = 0
09:40:08 stat("/var/lib/cyrus/mailboxes.db", {st_mode=S_IFREG|0600, 
st_size=218344, ...}) = 0
09:40:08 fcntl(12, F_SETLKW, {l_type=F_UNLCK, l_whence=SEEK_SET, 
l_start=0, l_len=0}) = 0
09:40:08 
open("/var/spool/cyrus/mail/domain/W/cogift.co/U/user/donations/cyrus.index", 
O_RDWR) = 20
09:40:08 fstat(20, {st_mode=S_IFREG|0600, st_size=1688, ...}) = 0
09:40:08 mmap(NULL, 16384, PROT_READ, MAP_SHARED, 20, 0) = 0x7f05ab7a9000
09:40:08 fcntl(20, F_SETLKW, {l_type=F_WRLCK, l_whence=SEEK_SET, 
l_start=0, l_len=0}) = 0
09:40:08 gettimeofday({1475073608, 555550}, NULL) = 0
09:40:08 
stat("/var/spool/cyrus/mail/domain/W/cogift.co/U/user/donations/cyrus.index", 
{st_mode=S_IFREG|0600, st_size=1688, ...}) = 0
09:40:08 
open("/var/spool/cyrus/mail/domain/W/cogift.co/U/user/donations/cyrus.header", 
O_RDONLY) = 21
09:40:08 fstat(21, {st_mode=S_IFREG|0600, st_size=207, ...}) = 0
09:40:08 mmap(NULL, 207, PROT_READ, MAP_SHARED, 21, 0) = 0x7f05ab7a8000
09:40:08 munmap(0x7f05ab7a8000, 207)    = 0
09:40:08 fcntl(20, F_SETLKW, {l_type=F_UNLCK, l_whence=SEEK_SET, 
l_start=0, l_len=0}) = 0
09:40:08 gettimeofday({1475073608, 556060}, NULL) = 0
09:40:08 open("/var/lib/cyrus/msg/shutdown", O_RDONLY) = -1 ENOENT (No 
such file or directory)
09:40:08 fcntl(14, F_SETLKW, {l_type=F_RDLCK, l_whence=SEEK_SET, 
l_start=0, l_len=0}) = 0
09:40:08 fstat(14, {st_mode=S_IFREG|0600, st_size=336, ...}) = 0
09:40:08 stat("/var/lib/cyrus/user_deny.db", {st_mode=S_IFREG|0600, 
st_size=336, ...}) = 0
09:40:08 fcntl(14, F_SETLKW, {l_type=F_UNLCK, l_whence=SEEK_SET, 
l_start=0, l_len=0}) = 0
09:40:08 fcntl(20, F_SETLKW, {l_type=F_RDLCK, l_whence=SEEK_SET, 
l_start=0, l_len=0}) = 0
09:40:08 gettimeofday({1475073608, 556577}, NULL) = 0
09:40:08 
stat("/var/spool/cyrus/mail/domain/W/cogift.co/U/user/donations/cyrus.index", 
{st_mode=S_IFREG|0600, st_size=1688, ...}) = 0
09:40:08 close(21)                      = 0
09:40:08 
open("/var/spool/cyrus/mail/domain/W/cogift.co/U/user/donations/cyrus.header", 
O_RDONLY) = 21
09:40:08 fstat(21, {st_mode=S_IFREG|0600, st_size=207, ...}) = 0
09:40:08 mmap(NULL, 207, PROT_READ, MAP_SHARED, 21, 0) = 0x7f05ab7a8000
09:40:08 munmap(0x7f05ab7a8000, 207)    = 0
09:40:08 fcntl(20, F_SETLKW, {l_type=F_UNLCK, l_whence=SEEK_SET, 
l_start=0, l_len=0}) = 0
09:40:08 gettimeofday({1475073608, 557191}, NULL) = 0
09:40:08 close(21)                      = 0
09:40:08 close(20)                      = 0
09:40:08 munmap(0x7f05ab7a9000, 16384)  = 0
09:40:08 close(19)                      = 0
09:40:08 rt_sigprocmask(SIG_BLOCK, [INT QUIT ALRM TERM CHLD], [], 8) = 0
09:40:08 pselect6(16, [0 15], NULL, NULL, {60, 0}, {[], 8}

I have no idea if the pselect timeout is important, but I wanted to 
point out that behavior since it is not apparent from just looking at 
the output after the fact.

I am going to let this run all day and will scan it for anything 
different from this 60-second cycle and will post results back to the 
list around this time tomorrow.

Thanks again for all your help.

-- 
Andy Dorman
Ironic Design, Inc.
AnteSpam.com

CONFIDENTIALITY NOTICE: This message is for the named person's use only. 
It may contain confidential, proprietary or legally privileged 
information. No confidentiality or privilege is waived or lost by any 
erroneous transmission. If you receive this message in error, please 
immediately destroy it and notify the sender. You must not, directly or 
indirectly, use, disclose, distribute, or copy any part of this message 
if you are not the intended recipient.



More information about the Info-cyrus mailing list