negative pop3d worker count

Henrique de Moraes Holschuh hmh at debian.org
Thu Apr 7 18:55:10 EDT 2005


On Thu, 07 Apr 2005, Wolfgang Powisch wrote:
> What should be the expected behaviour when reaching the maxchilds limit.

No more children are spawned, with a possible off-by-one error (i.e. you
could end up with maxchilds+1 children, but that's it).

> What's the reason for the negative worker number ?

A bug somewhere.  The current code is supposed to bitch very loudly if
anything weird happens (which is what the UNKNOWN stuff is all about).  But
it might not be 100% correct, and I need to have another look at it.  I did
find a few weirdnesses in the original 2.1 patch I submitted, which became
the root of the child tracking code in 2.2.  At the time, I looked in 2.2
and could not find the same bug there, so I didn't look much further.

> Apr  5 15:23:19 imap01 master[4065]: service pop3 pid 17896: while 
> trying to process message 0x3: not registered yet

This is bad.  Is 2.2's master multithreaded now?  If it is not, it really
should be impossible for the code to receive messages from services before
they are registered (because master should try to process all the pending
messages *after* it finishes registering every child it just spawned).
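
To make the ordering argument concrete, here is a toy sketch in Python.  It
is NOT the code in master.c; the class, the pids and the log wording are
made up for illustration (the message text only imitates the log line quoted
above).  It just shows why a single-threaded loop that registers every child
it spawned before touching the pending messages should never see the "not
registered yet" case:

```python
from collections import deque


class ToyMaster:
    """Hypothetical stand-in for master's child table and message queue."""

    def __init__(self):
        self.registered = set()   # pids the master knows about
        self.pending = deque()    # (pid, message) pairs not handled yet
        self.log = []

    def register(self, pid):
        self.registered.add(pid)

    def process_pending(self):
        # Handle every queued message, complaining about unknown senders.
        while self.pending:
            pid, msg = self.pending.popleft()
            if pid in self.registered:
                self.log.append(f"pid {pid}: message {msg:#x} ok")
            else:
                self.log.append(
                    f"pid {pid}: while trying to process message "
                    f"{msg:#x}: not registered yet"
                )


def run(register_first):
    """Spawn two made-up children that each send a message right away."""
    master = ToyMaster()
    spawned = [101, 102]
    master.pending.extend((pid, 0x3) for pid in spawned)
    if register_first:
        for pid in spawned:
            master.register(pid)
        master.process_pending()
    else:
        # Wrong order: messages are handled before registration finishes.
        master.process_pending()
        for pid in spawned:
            master.register(pid)
    return master.log
```

With run(True) every message is accepted; with run(False) every message
hits the "not registered yet" branch, which is the symptom in the log.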

The other possible scenario for that to happen is when dead children are
forgotten too early.  The code used to wait at least 2s AFTER receiving a
SIGCHLD before forgetting a dead child, but either this is not enough in
your case, or there is a bug in it.  You can change the hardwired default in
master.c and recompile to test it.  I should tweak that stuff to be
auto-tunable one of these days.
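
The idea of the grace period, as a rough Python sketch.  Again this is an
illustration and not what master.c looks like: the class and method names
are invented, and the only number taken from the real thing is the 2s
default.  A dead child stays in the table until the grace period has
elapsed, so late messages from it can still be matched to a known pid:

```python
class ChildTable:
    """Hypothetical child table with delayed forgetting of dead children."""

    def __init__(self, grace_seconds=2.0):
        self.grace_seconds = grace_seconds
        self.alive = set()
        self.dead_since = {}  # pid -> time the child was reaped

    def add(self, pid):
        self.alive.add(pid)

    def reap(self, pid, now):
        # Called when the child is reported dead: remember it, and when.
        self.alive.discard(pid)
        self.dead_since[pid] = now

    def janitor(self, now):
        # Forget only children that have been dead for the whole period.
        expired = [pid for pid, t in self.dead_since.items()
                   if now - t >= self.grace_seconds]
        for pid in expired:
            del self.dead_since[pid]

    def knows(self, pid):
        return pid in self.alive or pid in self.dead_since
```

Time is passed in explicitly (the "now" arguments) so the behaviour is easy
to check by hand: a child reaped at t=10 is still known at t=11 and is
forgotten by a janitor run at t=12 or later.  Raising grace_seconds is the
toy equivalent of changing the hardwired default.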

I remember squashing a bug in Debian's 2.1 master code that caused it not to
process *all* available messages at every iteration (instead, it processed
only one).  This caused messages to pile up on busy sites and triggered the
above issues.  As I said, I think 2.2 does not have this bug, but it is
something that could be checked.
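
A small simulation of why that matters, in Python.  It is a sketch with
invented names and numbers, not the master loop itself: each pass through
the loop some messages arrive, and the loop either drains the whole queue or
handles just one message.

```python
from collections import deque


def backlog_after(iterations, arrivals_per_iteration, drain_all):
    """Return how many messages are still queued after the given passes."""
    queue = deque()
    for i in range(iterations):
        # Messages from children arriving during this pass.
        for j in range(arrivals_per_iteration):
            queue.append((i, j))
        if drain_all:
            # Fixed behaviour: handle everything that is available.
            while queue:
                queue.popleft()
        elif queue:
            # Buggy behaviour: handle a single message per pass.
            queue.popleft()
    return len(queue)
```

With 3 arrivals per pass over 10 passes, the one-at-a-time loop is left with
20 unprocessed messages while the draining loop is left with none.  On a
busy site the backlog keeps growing, so the master ends up acting on stale
messages about children whose state has already changed.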

The fact that master STOPPED processing more workers means the resilience
code against such bugs isn't quite right yet.  So we have two bugs, at the
very least.

-- 
  "One disk to rule them all, One disk to find them. One disk to bring
  them all and in the darkness grind them. In the Land of Redmond
  where the shadows lie." -- The Silicon Valley Tarot
  Henrique Holschuh
---
Cyrus Home Page: http://asg.web.cmu.edu/cyrus
Cyrus Wiki/FAQ: http://cyruswiki.andrew.cmu.edu
List Archives/Info: http://asg.web.cmu.edu/cyrus/mailing-list.html



