Cyrus file locking issues

Lawrence Greenfield leg+ at andrew.cmu.edu
Thu Feb 20 12:31:17 EST 2003


* a generic lock debugging strategy...

The first question is what OS are you running on? If this is Linux,
applying the poll-style locking will probably mask whatever the
problem is. If it's something else:

. processes get stuck waiting for a lock (truss shows stuck process in
  fcntl)

. the process holding the lock is likely either
  (a) waiting for input or
  (b) waiting for another lock

. use lsof to find out who's holding the lock, and then truss/gdb to
  find out what that process is currently doing

. if that process is making progress (not waiting for input or a lock)
  then it's something I haven't seen and is either a Cyrus bug or an
  OS bug.

. if that process is waiting for input, it's likely a user doing an
  APPEND to the folder. a gdb backtrace will confirm this. if so,
  there's little you can do except wait for the process to time out
  (1/2 hour). sometimes TLS can confuse the idle timer, in which case
  killing the process is the only possibility.

. if that process is waiting for input and it's not during an APPEND,
  it's almost certainly a Cyrus bug and we'd love a backtrace.

. if that process is waiting for a file lock, find out who's holding
  it and repeat this process.
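
If lsof isn't handy, the "who's holding the lock" step can also be
approximated with fcntl's F_GETLK query. This is only a rough sketch, not
anything shipped with Cyrus: the function name lock_holder is made up for
illustration, and the struct flock layout below is an assumption that holds
for 64-bit Linux and differs on other platforms.

```python
import fcntl
import os
import struct

# ASSUMPTION: struct flock layout on 64-bit Linux:
#   short l_type; short l_whence; off_t l_start; off_t l_len; pid_t l_pid;
# Other systems order or size these fields differently.
FLOCK_FMT = "hhqqi4x"


def lock_holder(path):
    """Return the PID holding a conflicting fcntl lock on path, or None.

    Hypothetical helper for illustration; it only sees fcntl-style locks.
    """
    fd = os.open(path, os.O_RDWR)
    try:
        # Ask the kernel: "could I take a write lock on the whole file?"
        # l_start=0 with l_len=0 means "from the start to end of file".
        probe = struct.pack(FLOCK_FMT, fcntl.F_WRLCK, os.SEEK_SET, 0, 0, 0)
        reply = fcntl.fcntl(fd, fcntl.F_GETLK, probe)
    finally:
        os.close(fd)
    l_type, _whence, _start, _len, l_pid = struct.unpack(FLOCK_FMT, reply)
    # F_UNLCK in the reply means nothing would block the lock request.
    return None if l_type == fcntl.F_UNLCK else l_pid
```

Run it from a separate debugging process, not from inside the server:
closing a descriptor drops any fcntl locks the calling process itself
holds on that file. The PID it returns is the one to attach truss/gdb to.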

* In your case

You aren't running on Linux, so it's unlikely that the poll-style locks
are going to fix your problem.
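
For anyone unfamiliar with the term, here is a rough illustration of the
general idea, assuming "poll-style" means retrying a non-blocking lock
request in a loop instead of blocking inside fcntl. It is not the Cyrus
patch itself; poll_lock and its timeout/interval parameters are
hypothetical names chosen for this sketch.

```python
import errno
import fcntl
import time


def poll_lock(fd, timeout=5.0, interval=0.1):
    """Try for an exclusive lock without blocking; True once it is held.

    Hypothetical helper: returns False if the lock is still busy when
    the timeout expires.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # LOCK_NB makes the request fail at once instead of waiting.
            fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
            return True
        except OSError as exc:
            # EACCES/EAGAIN mean "someone else holds it"; anything else
            # is a real error and is passed on to the caller.
            if exc.errno not in (errno.EACCES, errno.EAGAIN):
                raise
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

Note that a loop like this only changes how the waiter behaves. If the
holder never releases the lock, the waiter still never gets it.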

It's possible that the cluster filesystem is slightly buggy with file
locks. (Cyrus is a very heavy user of file locks.) Search for possible
OS bugs.

I think you need to iterate on "what's the process that's holding the
lock doing" before we can make any better guesses.

good luck,
Larry




