Cyrus file locking issues
Lawrence Greenfield
leg+ at andrew.cmu.edu
Thu Feb 20 12:31:17 EST 2003
* a generic lock debugging strategy...
The first question is what OS are you running on? If this is Linux,
applying the poll-style locking will probably mask whatever the
problem is. If it's something else:
. processes get stuck waiting for a lock (truss shows stuck process in
fcntl)
. the process holding the lock is likely either
(a) waiting for input or
(b) waiting for another lock
. use lsof to find out who's holding the lock, and then truss/gdb to
find out what that process is currently doing
. if that process is making progress (not waiting for input or a lock)
then it's something I haven't seen and is either a Cyrus bug or an
OS bug.
. if that process is waiting for input, it's likely a user doing an
APPEND to the folder. a gdb backtrace will confirm this. if so,
there's little you can do except wait for the process to time out
(1/2 hour). sometimes TLS can confuse the idle timer, in which case
killing the process is the only possibility.
. if that process is waiting for input and it's not during an APPEND,
it's almost certainly a Cyrus bug and we'd love a backtrace.
. if that process is waiting for a file lock, find out who's holding
it and repeat this process.
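
The "find out who's holding it and repeat" step above is just a walk along a chain of waiters, and a rough Python sketch of it might look like the following. Everything here is an assumption for illustration: the function name and the two maps (`waiting_on`, `holder_of`) are hypothetical, and you would fill them in by hand from what truss and lsof show. It is not part of Cyrus.

```python
def trace_lock_chain(start_pid, waiting_on, holder_of):
    """Follow "who holds the lock I am waiting for" until the chain ends.

    waiting_on: pid -> name of the lock that pid is blocked on
                (a pid that is not blocked on a lock is simply absent)
    holder_of:  lock name -> pid currently holding that lock
    Both maps are hypothetical, filled in by hand from truss/lsof output.
    """
    chain = [start_pid]
    seen = {start_pid}
    pid = start_pid
    while pid in waiting_on:
        lock = waiting_on[pid]
        holder = holder_of.get(lock)
        if holder is None:
            # lsof did not tell us who holds this lock; stop here.
            return chain, "unknown holder for " + lock
        chain.append(holder)
        if holder in seen:
            # We came back to a process already on the chain.
            return chain, "cycle (deadlock)"
        seen.add(holder)
        pid = holder
    # The last process is not waiting on a lock: check it with truss/gdb
    # to see whether it is waiting for input or making progress.
    return chain, "root holder is not waiting on a lock"


# Made-up pids and lock names, purely for illustration.
waiting_on = {101: "lock-A", 202: "lock-B"}
holder_of = {"lock-A": 202, "lock-B": 303}
chain, verdict = trace_lock_chain(101, waiting_on, holder_of)
print(chain, verdict)
```

The process at the end of the chain is the one worth a truss or gdb backtrace; a cycle would point at a real deadlock rather than a stalled client.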
* In your case
You aren't running on Linux, so it's unlikely that
the poll-style locks are going to fix your problem.
It's possible that the cluster filesystem is slightly buggy with file
locks. (Cyrus is a very heavy user of file locks.) Search for possible
OS bugs.
I think you need to iterate on "what's the process that's holding the
lock doing" before we can make any better guesses.
good luck,
Larry