bernhard at intevation.de
Fri Apr 7 11:07:21 EDT 2006
On Fri, Apr 07, 2006 at 10:35:44AM -0400, Ken Murchison wrote:
> Bernhard Reiter wrote:
> >we are currently debugging a problem with the Kolab Groupware server
> ><http://www.kolab.org/>. Cyrus imapd is an important core component of
> >the Kolab server and Kolab makes heavy use of mailbox annotations.
> >We are using skiplist as backend for the annotations.db and some users
> >experienced corruptions of the annotations.db.
> >These problems always follow the same pattern:
> >1. Something goes wrong (this is the most interesting part as we
> > don't know by now _what_ exactly goes wrong.)
> >2. A notice on a partial transaction is written to the log:
> > imap: skiplist recovery /kolab/var/imapd/annotations.db: found
> > partial txn, not replaying
> >3. When trying to restart imapd it fails and writes to the log:
> > imap: DBERROR: skiplist recovery
> > /kolab/var/imapd/annotations.db: A9FC8 should be ADD or DELETE
> >The actual defect in the annotations.db skiplist file always follows
> >a constant pattern too:
> >In the "log" part of the skiplist file, a run of null bytes appears
> >exactly between two valid transactions, with roughly the length of an
> >ADD node:
> > 0012e910: 7365 7276 6572 2064 6965 0074 6578 742f
> > 0012e920: 706c 6169 6e00 4424 2e1e 0000 0012 e9c4
> > 0012e930: 0013 47f4 0013 496c 0013 4ce8 ffff ffff
> >                                         ^^^^^^^^^
> >                                         regular end of ADD entry
> > 0012e940: 0000 00ff 0000 0004 0011 79b4 0000 00ff
> >           ^^^^^^^^^ ^^^^^^^^^^^^^^^^^^^ ^^^^^^^^^
> >           COMMIT    DELETE entry        COMMIT
> > 0012e950: 0000 0000 0000 0000 0000 0000 0000 0000
> > 0012e960: 0000 0000 0000 0000 0000 0000 0000 0000
> > 0012e970: 0000 0000 0000 0000 0000 0000 0000 0000
> > 0012e980: 0000 0000 0000 0000 0000 0000 0000 0000
> > 0012e990: 0000 0000 0000 0000 0000 0000 0000 0000
> > 0012e9a0: 0000 0000 0000 0000 0000 0000 0000 0000
> > 0012e9b0: 0000 0000 0000 0000 0000 0000 0000 0000
> > 0012e9c0: 0000 0000 0000 0002 0000 0032 706f 6c79
> >                     ^^^^^^^^^
> >                     Regular start of ADD entry
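To make the size of the gap explicit, here is a minimal sketch that rebuilds the bytes from 0x12e940 onwards and measures the run of zero words. It assumes records are aligned to 32-bit big-endian words, as they appear to be in the dump; the function name zero_word_runs is only illustrative and is not part of Cyrus.

```python
import struct

BASE = 0x12E940  # address of the first dump line used below

# Bytes transcribed from the dump above (whitespace is ignored by fromhex).
dump = bytes.fromhex(
    "000000ff 00000004 001179b4 000000ff"    # COMMIT, DELETE entry, COMMIT
    + "00000000" * 28                         # 0x12e950 .. 0x12e9bf, all zero
    + "00000000 00000002 00000032 706f6c79"   # ADD entry starts at 0x12e9c4
)

# Assumption: the log is a sequence of 32-bit big-endian words.
words = struct.unpack(">%dI" % (len(dump) // 4), dump)


def zero_word_runs(words, base, min_words=4):
    """Yield (address, length_in_bytes) for each run of >= min_words zero words."""
    start = None
    for i, w in enumerate(words):
        if w == 0:
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_words:
                yield base + 4 * start, 4 * (i - start)
            start = None
    if start is not None and len(words) - start >= min_words:
        yield base + 4 * start, 4 * (len(words) - start)


for addr, length in zero_word_runs(words, BASE):
    print("zero gap at 0x%x, %d bytes" % (addr, length))
```

For this dump the gap begins right after the second COMMIT and ends where the next ADD entry starts, which is also the address (0012 e9c4) stored among the pointers of the preceding ADD entry.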
> >Issue 840 in the official Kolab bug tracker contains some more
> >in-depth information on the problem and our analysis so far:
> >Does anyone here on the list have an idea what the cause of this could
> >possibly be? Any hints on how to continue debugging and any sound
> >theory on the background of this problem would be highly appreciated.
> I haven't dug into the skiplist code enough to have any *good* thoughts
> on this, but my gut is that either there is a locking problem, an
> off-by-one error or a process is dying in the middle of a txn.
> The Kolab stuff probably hits annotations.db a lot more than a
> standalone Cyrus server, and is probably uncovering a race condition or
> something else related to heavy use.
> >One more catch: we have found no way so far to reproduce this
> >problem. But we have customers who experience it on a more or less
> >regular basis (about once a month).
> This obviously makes it tough to track down. Is there any pattern to
> when it happens?
The gory details are behind the issue840 link above.
Here is a short summary:
It frequently happens when large mailboxes are initially uploaded.
This usually means a .pst of up to 2 GByte is "mapped" by an Outlook Connector
that copies the emails and objects to imap.
Those uploads trigger a lot of SELECT, CREATE, SETANNOTATION sequences
to create the folder hierarchy first.
We have only looked closely at cases
on GNU/Linux with Linux 2.6.x, little endian, ext3,
but this is our largest user base, so it does not say much.
Using an SMP kernel on a hyperthreading or multiprocessor machine also
raised chances of this occurring.
We believe there was one case without an SMP kernel, though.
We have a setup that tries to mirror this pattern as well as we can
with a script, but it has failed to reproduce the problem so far.
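For anyone who wants to try something similar, the following is a rough sketch of how such a load pattern could be generated. It is not our actual script. The folder names, the "/" hierarchy separator, the /vendor/kolab/folder-type entry with value "mail", and the CREATE-before-SELECT order are all assumptions chosen for illustration, and the functions hierarchy and upload_commands are hypothetical. It only produces the command lines; sending them over several parallel connections is left out.

```python
from itertools import count


def hierarchy(root, depth, width):
    """Return folder names for a tree of the given depth and width, parents first."""
    names = [root]
    level = [root]
    for _ in range(depth):
        # Assumption: "/" is the hierarchy separator on the server.
        level = ["%s/f%d" % (parent, i) for parent in level for i in range(width)]
        names.extend(level)
    return names


def upload_commands(folders, entry="/vendor/kolab/folder-type", value="mail"):
    """Yield tagged IMAP command lines: CREATE, SELECT, SETANNOTATION per folder."""
    tags = count(1)
    for name in folders:
        for cmd in (
            'CREATE "%s"' % name,
            'SELECT "%s"' % name,
            # Assumed annotation entry and value, for illustration only.
            'SETANNOTATION "%s" "%s" ("value.shared" "%s")' % (name, entry, value),
        ):
            yield "a%04d %s" % (next(tags), cmd)


if __name__ == "__main__":
    for line in upload_commands(hierarchy("INBOX/pst", 2, 3)):
        print(line)
```

Since SMP machines seem to raise the chances, a real test would presumably need to run several such sequences concurrently against the same server.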