followup: stuck lmtpd processes

Henrique de Moraes Holschuh hmh at debian.org
Wed Sep 24 11:57:37 EDT 2003


On Wed, 24 Sep 2003, Etienne Goyer wrote:
> On Wed, Sep 24, 2003 at 11:13:06AM -0300, Henrique de Moraes Holschuh wrote:
> > It is not a general solution when you hit glibc/kernel bugs, but I can
> > certainly live with it IF I manage to track down a version of glibc and
> > kernel that won't deadlock, that we can recommend. Either that, or allow for
> > runtime-switchable behaviours (I am willing to code this).
> 
> This is not good enough.  You can't recommend a specific kernel/glibc
> version; this is dictated by the distribution people use.  You can't
> just recommend using the latest either, because a lot of (most?) people
> will prefer to use an older, well-known, stable (Debian stable, RedHat 7.3)
> distribution.

They must use the "slow" version then.  What I want is a deadlock test
program that the user can run to verify if his kernel/glibc combination is
fubar or works as documented.  I don't have time to write one right now,
though.

I did check ALL the documentation already, and ALL of it says that SIGALRM
MUST interrupt the blocked syscall, and that the syscall HAS to return EINTR.
So, it is a bug.  So, it needs to be squashed, and people have to either patch
or upgrade their systems... or deal with diminished performance.
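
A deadlock test along those lines could be sketched roughly as below.  This
is only a minimal sketch, not Cyrus code: the function name
alarm_interrupts_lock(), the temp file path and the one-second alarm are all
made up for illustration.  It assumes the SIGALRM handler is installed with
sigaction() and without SA_RESTART, which is the case where the blocked
fcntl(F_SETLKW) is expected to fail with EINTR.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Empty handler: its only job is to make the blocked syscall return. */
static void on_alarm(int sig) { (void)sig; }

/* Fill in a write-lock request covering the whole file. */
static void whole_file_wrlock(struct flock *fl)
{
    memset(fl, 0, sizeof *fl);
    fl->l_type = F_WRLCK;
    fl->l_whence = SEEK_SET;   /* l_start = 0, l_len = 0: entire file */
}

/* Hypothetical helper.  Returns 1 if alarm() interrupted the blocked
 * F_SETLKW with EINTR, 0 if the call returned some other way, and -1 if
 * the test could not be set up.  On a broken system it may never return. */
int alarm_interrupts_lock(void)
{
    char path[] = "/tmp/alarmlockXXXXXX";   /* illustrative path */
    int ready[2];
    int result = -1;
    char c;
    int fd = mkstemp(path);

    if (fd < 0) return -1;
    if (pipe(ready) < 0) { close(fd); unlink(path); return -1; }

    pid_t pid = fork();
    if (pid < 0) {
        close(ready[0]); close(ready[1]); close(fd); unlink(path);
        return -1;
    }
    if (pid == 0) {
        /* Child: take the lock, tell the parent, hold it until killed. */
        struct flock fl;
        whole_file_wrlock(&fl);
        close(ready[0]);
        if (fcntl(fd, F_SETLK, &fl) < 0) _exit(1);
        if (write(ready[1], "x", 1) != 1) _exit(1);
        for (;;) pause();
    }

    /* Parent: wait until the child really holds the lock. */
    close(ready[1]);
    if (read(ready[0], &c, 1) == 1) {
        struct sigaction sa, old;
        struct flock fl;

        memset(&sa, 0, sizeof sa);
        sa.sa_handler = on_alarm;
        sigemptyset(&sa.sa_mask);
        sa.sa_flags = 0;                     /* deliberately no SA_RESTART */
        sigaction(SIGALRM, &sa, &old);

        whole_file_wrlock(&fl);
        alarm(1);                            /* fire while we are blocked */
        int rc = fcntl(fd, F_SETLKW, &fl);   /* should block, then fail */
        int saved_errno = errno;
        alarm(0);
        sigaction(SIGALRM, &old, NULL);

        result = (rc == -1 && saved_errno == EINTR) ? 1 : 0;
    }

    /* Clean up the lock holder and the temp file. */
    kill(pid, SIGKILL);
    waitpid(pid, NULL, 0);
    close(ready[0]);
    close(fd);
    unlink(path);
    return result;
}
```

A return value of 1 would mean the combination behaves as documented.  On an
affected system the call would presumably hang rather than return 0, so a
real test program would want some outer watchdog around it.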

> The obvious solution is to not use alarm() to interrupt a blocking
> syscall, but to use non-blocking calls with select() instead.  I

Yuck.  AFAIK, that means rewriting the entire lock system.  Maybe we could
do that using a sort of 'lock server' inside master, but this would also be
slower than alarm() and fcntl/flock, I think.  This is a last-resort
possibility, and I don't think it is worth it since we already have a
workaround that works.
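
For comparison, the simplest non-blocking variant I can think of is a polling
loop around fcntl(F_SETLK), sketched below.  This is an assumption about what
such code might look like, not anything that exists in Cyrus: the name
lock_with_timeout() and the 10 ms polling step are invented for illustration.
Note that it polls with nanosleep() rather than select(), since select() does
not wait on file locks.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper.  Try to take a whole-file write lock on fd without
 * ever blocking in the kernel.  Returns 0 once the lock is held; returns -1
 * with errno set to ETIMEDOUT if roughly timeout_ms passed without getting
 * it, or to the fcntl() error for any other failure. */
int lock_with_timeout(int fd, long timeout_ms)
{
    const long step_ms = 10;                          /* polling interval */
    struct timespec pause_for = { 0, step_ms * 1000000L };
    long waited_ms = 0;

    for (;;) {
        struct flock fl;
        memset(&fl, 0, sizeof fl);
        fl.l_type = F_WRLCK;
        fl.l_whence = SEEK_SET;                       /* whole file */

        if (fcntl(fd, F_SETLK, &fl) == 0)
            return 0;                                 /* got the lock */

        /* POSIX allows either EACCES or EAGAIN when the lock is held
         * by someone else; anything different is a real error. */
        if (errno != EACCES && errno != EAGAIN)
            return -1;

        if (waited_ms >= timeout_ms) {
            errno = ETIMEDOUT;
            return -1;
        }

        nanosleep(&pause_for, NULL);                  /* back off, retry */
        waited_ms += step_ms;
    }
}
```

The cost is visible right there: every waiter wakes up on each polling step
whether or not the lock was released, and a released lock is only noticed at
the next step, which is why I would expect it to be slower than a blocking
lock interrupted by alarm().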

> And please don't scoff at it as "a problem with Linux, not Cyrus".  Linux
> may well be broken (I can't tell), but it still constitutes the vast
> majority of Cyrus installations (I would believe), and thus merits being
> accommodated.

Something that works in Linux, sure.  Something that works in broken Linux?
No.  Fix the breakage in Linux, instead.  That's our strength, and I *will*
stick to it as a Debian maintainer.

There is a workaround that works in Linux, which people can use right now.

There is a proper Unix way to do it (using alarm(); this needs to be added
to Cyrus IMHO) that *might* not work in certain Linux glibc/kernel
combinations.

It is clearly time to track down the glibc/kernel bug, and squash it, and
deploy the general case (and better code) using alarm() in Cyrus...  There
is no reason why this cannot be done in such a way that the user can disable
it, at configure phase.

Now, if other Unixes have stupid lock and alarm() bugs, that deadlock
testing code would be even more useful... :-)

-- 
  "One disk to rule them all, One disk to find them. One disk to bring
  them all and in the darkness grind them. In the Land of Redmond
  where the shadows lie." -- The Silicon Valley Tarot
  Henrique Holschuh



