locking problems with 2.1.9

John Wade jwade at oakton.edu
Wed Nov 6 10:25:37 EST 2002


Hi Pete,

I assume you are using flat seen files. If so, I ran into this problem on
2.0.16 and came up with a workaround, which others ported to 2.1.3. It
is based on flock, but you might be able to use the same basic
technique with fcntl. See http://servercc.oakton.edu/~jwade/cyrus/

The flat file locking code is broken in a very strange way. I first
attributed it to Linux kernel problems, since I could never reproduce the
exact scenario that I saw in my gdb stack traces. However, others reported
this problem on enough other platforms (including Solaris) that I think the
bug is in the Cyrus code. It will take a far better C programmer than I to
track it down. What I saw was that the process holding the lock that
everyone else was waiting on was invariably an imapd process, and it was
trying to lock a file that it already had a lock on. Meanwhile, even
though the file was locked, other processes had managed to replace it.

The workaround I came up with is to have all file lock attempts time
out rather than wait indefinitely. This kills the initial imapd process
that has the problem, and the lmtpds etc. are no longer blocked. For us,
this happens between one and three times a day (the patch I created logs
each occurrence to syslog).
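
To illustrate the time-out idea, here is a minimal sketch. It is not the
patch itself and is not taken from the Cyrus source. The helper name
lock_with_timeout, the log message, and the 100 ms polling interval are
all made up for the example. It assumes flock-style locking and retries a
non-blocking lock until a deadline passes, instead of blocking forever.

```c
#include <errno.h>
#include <sys/file.h>
#include <syslog.h>
#include <time.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of Cyrus): try to take an exclusive
 * flock() on fd, giving up after roughly timeout_secs seconds.
 * Returns 0 on success, -1 on failure; on time-out errno is ETIMEDOUT.
 */
int lock_with_timeout(int fd, int timeout_secs)
{
    time_t deadline = time(NULL) + timeout_secs;
    struct timespec pause = { 0, 100 * 1000 * 1000 }; /* 100 ms, arbitrary */

    for (;;) {
        /* LOCK_NB makes flock() fail immediately instead of blocking. */
        if (flock(fd, LOCK_EX | LOCK_NB) == 0)
            return 0;

        /* Anything other than "someone else holds it" is a real error. */
        if (errno != EWOULDBLOCK && errno != EINTR)
            return -1;

        if (time(NULL) >= deadline) {
            /* Log the time-out so stuck locks show up in syslog. */
            syslog(LOG_ERR, "lock on fd %d timed out after %d seconds",
                   fd, timeout_secs);
            errno = ETIMEDOUT;
            return -1;
        }

        nanosleep(&pause, NULL); /* wait a little before retrying */
    }
}
```

A caller would treat a -1 return as fatal for that process (for example,
log and exit), which is what unblocks everyone queued behind it. The same
polling approach should carry over to fcntl by using F_SETLK in place of
the non-blocking flock call, though I have not tested that here.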

Hope this helps,
John

pete at wookie.oit.umass.edu wrote:

> We are experiencing locking problems with cyrus 2.1.9 on a Solaris 8
> system using fcntl and skiplist (except flat for subscriptions).
> We've seen the following issues:
>
>   * Lmtpds try to acquire a lock on a cyrus.seen file and never get it;
>     they stack up as mail comes in.
>   * In syslog we see 'IOERROR: reading message: unexpected end of file'
>   * In various partitions' 'stage.' directories we see hundreds of
>     messages stacked up waiting for - surprise - users who seem to
>     be having the locking issues.
>   * Some users have cyrus.seen.NEW lying around in their folders.
>
> The above problems exist for only a handful of users; the other 12k
> users seem to be user'ing along without difficulty.  But when the
> other 18k users move to this box it might get worse...
>
> All users were transferred from a different Solaris system (cyrus
> 1.5.27)  to this new one using rsync (mail/folder dirs, quota,
> subscriptions).
>
> Any pointers or suggestions would be helpful!




