terminated abnormally
Patrick Radtke
phr2101 at columbia.edu
Thu Mar 30 14:16:04 EST 2006
On Mar 23, 2006, at 8:27 AM, Ken Murchison wrote:
> Patrick Radtke wrote:
>> Every day we get a number of 'terminated abnormally' messages on
>> our frontends.
>> When they happen just occasionally then everything still seems to
>> run fine.
>> However sometimes, there will be a large number of them in a short
>> time span and this is a sure sign that people are having trouble
>> logging in or are being disconnected.
>> here is examples from the log
>> Mar 22 10:50:07 tofu master[19506]: service pop3s pid 18311 in
>> READY state: terminated abnormally
>> Mar 22 10:50:10 tofu master[19506]: service imaps pid 18128 in
>> READY state: terminated abnormally
>> Mar 22 10:50:20 tofu master[19506]: service imaps pid 18348 in
>> READY state: terminated abnormally
>> Mar 22 10:50:23 tofu master[19506]: service imaps pid 13063 in
>> READY state: terminated abnormally
>> Mar 22 10:50:32 tofu master[19506]: service pop3s pid 18404 in
>> READY state: terminated abnormally
>> Mar 22 10:50:33 tofu master[19506]: service pop3s pid 18218 in
>> READY state: terminated abnormally
>> Mar 22 11:01:42 valoney master[2581]: service imaps pid 25617 in
>> READY state: terminated abnormally
>> Mar 22 11:01:46 valoney master[2581]: service pop3s pid 25682 in
>> READY state: terminated abnormally
>> Mar 22 11:01:47 valoney master[2581]: service imaps pid 25903 in
>> READY state: terminated abnormally
>> searching on just a fail process id shows
>> Mar 22 11:01:38 valoney imaps[25903]: imaps failed: [local]
>> Mar 22 11:01:47 valoney master[2581]: service imaps pid 25903 in
>> READY state: terminated abnormally
>> Mar 22 11:01:51 valoney master[2581]: service imaps pid 25903:
>> while trying to process message 0x2: not registered yet
>> Mar 22 11:01:52 valoney master[2581]: service imaps pid 25903 in
>> UNKNOWN state: processing message 0x2
>> ar 22 11:01:04 valoney pop3s[25440]: starttls: TLSv1 with cipher
>> RC4-MD5 (128/128 bits new) no authentication
>> Mar 22 11:01:20 valoney pop3s[25440]: login:
>> pool-141-155-147-109.ny5030.east.verizon.net [141.155.147.109]
>> jg2378 plaintext+TLS User logged in
>> Mar 22 11:01:21 valoney pop3s[25440]: failed to bind to address
>> 128.59.48.36: Cannot assign requested address
>> Mar 22 11:01:21 valoney pop3s[25440]: Doing a peer verify
>> Mar 22 11:01:21 valoney pop3s[25440]: Doing a peer verify
>> Mar 22 11:01:21 valoney pop3s[25440]: starttls: TLSv1 with cipher
>> AES256-SHA (256/256 bits new) no authentication
>> Mar 22 11:01:36 valoney pop3s[25440]: pop3s failed: [local]
>> Mar 22 11:01:37 valoney pop3s[25440]: Fatal error:
>> tls_start_servertls() failed
>> Mar 22 11:01:41 valoney master[2581]: service pop3s pid 25440 in
>> READY state: terminated abnormally
>> Mar 22 11:01:52 valoney master[2581]: service pop3s pid 25440:
>> while trying to process message 0x2: not registered yet
>> Mar 22 11:01:52 valoney master[2581]: service pop3s pid 25440 in
>> UNKNOWN state: processing message 0x2
>
> I've never seen behavior like this before. You have a pop3d which
> looks like it successfully does TLS+PLAIN, but then attempts TLS
> again. What I don't know is if the client has disconnected in
> between, or if there is some kind of race condition.
Ken, is the second TLS just proxyd connecting to a backend?
Syslog on debug (with today's CVS cyrus 2.3 )shows:
Mar 30 13:15:05 hotdog imap[18602]: starttls: TLSv1 with cipher
AES256-SHA (256/256 bits new) no authentication
Mar 30 13:15:07 hotdog imap[18602]: ptload(): fetched cache record
(phr2101)(mark 1143741894, current 1143742507, limit 1143738907)
Mar 30 13:15:07 hotdog imap[18602]: ptload returning data
Mar 30 13:15:07 hotdog imap[18602]: canonified phr2101 -> phr2101
Mar 30 13:15:07 hotdog imap[18602]: login: asiago.cc.columbia.edu
[128.59.59.74] phr2101 PLAIN+TLS User logged in
Mar 30 13:15:07 hotdog imap[18602]: Doing a peer verify
Mar 30 13:15:07 hotdog imap[18602]: Doing a peer verify
Mar 30 13:15:07 hotdog imap[18602]: received server certificate
Mar 30 13:15:07 hotdog imap[18602]: starttls: TLSv1 with cipher
AES256-SHA (256/256 bits new) no authentication
Mar 30 13:15:07 hotdog imap[18602]: ptload(): fetched cache record
(phr2101)(mark 1143741894, current 1143742507, limit 1143738907)
Mar 30 13:15:07 hotdog imap[18602]: ptload returning data
Mar 30 13:15:07 hotdog imap[18602]: canonified phr2101 -> phr2101
Mar 30 13:15:07 hotdog imap[18602]: ptload(): fetched cache record
(murder)(mark 1143741895, current 1143742507, limit 1143738907)
Mar 30 13:15:07 hotdog imap[18602]: ptload returning data
Mar 30 13:15:07 hotdog imap[18602]: canonified murder -> murder
Mar 30 13:15:07 hotdog imap[18602]: open: user phr2101 opened INBOX
on bratwurst.cc.columbia.edu
Proxyd is getting a server certificate which makes me think the 2nd
tls is just encrypting the frontend to backend communication.
Additionally it also looking up the 'murder' user which is the proxy
user we use when frontends log into backends
Increasing the log level to debug shows a lot of
Mar 30 13:20:09 mockmeat master[23360]: service imaps pid 26574 in
BUSY state: terminated abnormally
is that normal? (These ones are being terminated while BUSY instead
of when READY). These logs are from 2.2.12
The BUSY ones look like
Mar 30 13:20:05 mockmeat imaps[26574]: accepted connection
Mar 30 13:20:05 mockmeat imaps[26574]: imaps failed: [local]
Mar 30 13:20:08 mockmeat master[23360]: process 26574 exited, status 75
Mar 30 13:20:09 mockmeat master[23360]: service imaps pid 26574 in
BUSY state: terminated abnormally
here is a different (different from my initial email) looking READY
error
Mar 30 13:19:01 mockmeat master[1494]: about to exec /usr/bin/proxyd
Mar 30 13:19:02 mockmeat imaps[1494]: executed
Mar 30 13:19:55 mockmeat imaps[1494]: accepted connection
Mar 30 13:19:56 mockmeat imaps[1494]: imaps failed: [local]
Mar 30 13:19:59 mockmeat master[23360]: process 1494 exited, status 75
Mar 30 13:20:01 mockmeat master[23360]: service imaps pid 1494 in
READY state: terminated abnormally
In these one is looks like something is failing in
'tls_start_servertls'.
Does [local] just mean the client's host couldn't be determined?
Other strangeness I saw was
Mar 30 13:19:39 mockmeat master[23360]: service sieve pid 1496:
changing from service pop3 due to received message
Mar 30 13:19:01 mockmeat master[1496]: about to exec /usr/bin/timsieved
Mar 30 13:19:02 mockmeat sieve[1496]: executed
Mar 30 13:19:03 mockmeat sieve[1496]: warning: can't get client
address: Connection reset by peer
Mar 30 13:19:04 mockmeat sieve[1496]: accepted connection
Mar 30 13:19:05 mockmeat master[23360]: process 1496 exited, status 0
Mar 30 13:19:39 mockmeat master[23360]: service sieve pid 1496:
changing from service pop3 due to received message
What's going on here? Master looks up the pid in ctable and finds
that the service id doesn't match what it is expecting?
and finally
Mar 30 13:19:36 mockmeat master[23360]: imaps has -12 workers?!?
Looking through the code these last 2 sections are marked ' /*
paranoia */'
I imagine this signifies non-normal behavior... but I'm not sure
about causes.
Our load balancer (CSM) probes the frontends every 6 seconds or so to
confirm that they are responding. The CSM makes a connection and then
disconnects right away. Could this be triggering something in Cyrus?
Its hard to think so since the errors generally are all during peak
usage, and much less in the evening even though the CSM is always
probing at the same rate.
-Patrick
More information about the Info-cyrus
mailing list