problematic upgrade 2.3.16 -> 2.4.3

Paul Dekkers Paul.Dekkers at surfnet.nl
Wed Nov 10 14:49:47 EST 2010


Hi,

I intentionally waited for a few 2.4 releases so that the first nasty
bugs would be squashed ;-)

In a small test-setup everything was fine. But on a box with actual
users on it, I seem to have some more problems :-(

The machine runs FreeBSD 8.1 (64-bit) with Cyrus from the port. I'm
using ZFS for the imap and metadata partitions, my /var/imap is on UFS.
This was all fine with 2.3.16.

Initially I had problems with my own mail. A sync_client seemed to stall
on my mailbox, and truss told me it was waiting for a lock:

open("/var/imap/lock/user/paul.lock",O_RDWR|O_CREAT|O_TRUNC,0666) = 4 (0x4)

I didn't really get why. Quitting my 2 Thunderbird sessions solved that.

Also, a pop3 session took forever while this lock was held. I had imap
clients open to my INBOX; the logs showed that authentication succeeded,
but after PASS nothing came back for so long that I quit the telnet
session.

I see a lot of other disturbing errors, like reconstruct dumping core,
and all kinds of IOERRORs:

lmtp[94506]: Failed to append cache to user.bla for 747
lmtp[94506]: Index upgrade failed: user.bla
lmtp[94506]: IOERROR: locking index user.bla: No such file or directory
master[93701]: process 94506 exited, signaled to death by 6

And another user now gives this error:

imap[95315]: Failed to append cache to user.astrid.Junk for 35
imap[95315]: Index upgrade failed: user.astrid.Junk
imap[95315]: IOERROR: locking index user.astrid.Junk: Bad file descriptor

Not sure what 35 means anyway; there's no file with that name.

Reconstruct dumped core for another user:

reconstruct[94584]: reconstructing user.bla
reconstruct[94584]: Failed to append cache to user.bla for 747
reconstruct[94584]: Index upgrade failed: user.bla
reconstruct[94584]: IOERROR: locking index user.bla: No such file or directory
kernel: pid 94584 (reconstruct), uid 60: exited on signal 6

... another folder after reconstruction:

imaps[94936]: IOERROR: invalid cache record for user.paul.Sent uid 180 (System I/O error)

That's strange.

And now my sync_client is complaining with:

sync_client[94963]: Fatal error: waitpid failed

ARGH! Any advice? Could these problems be FreeBSD related?

Sounds like a huge reconstruct might be worth it. I'm still trying to
figure out whether the reconstruct -G -r user.paul I did on my inbox
solved most issues for my user.
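
To decide which mailboxes would need a reconstruct, the error lines
quoted above could be tallied per mailbox. This is only a rough sketch:
the function name affected_mailboxes and the SAMPLE list are made up for
illustration, and the patterns cover only the four message formats shown
in this mail, not everything Cyrus may log.

```python
import re
from collections import defaultdict

# Patterns copied from the log lines quoted in this mail.
# Assumption: other Cyrus error formats exist and are simply ignored here.
PATTERNS = [
    re.compile(r"Failed to append cache to (?P<mbox>\S+) for \d+"),
    re.compile(r"Index upgrade failed: (?P<mbox>\S+)"),
    re.compile(r"IOERROR: locking index (?P<mbox>\S+):"),
    re.compile(r"IOERROR: invalid cache record for (?P<mbox>\S+) uid \d+"),
]


def affected_mailboxes(lines):
    """Return a dict mapping mailbox name to the number of matching error lines."""
    counts = defaultdict(int)
    for line in lines:
        for pattern in PATTERNS:
            match = pattern.search(line)
            if match:
                counts[match.group("mbox")] += 1
                break  # count each log line at most once
    return dict(counts)


# Sample input: lines taken from the excerpts above.
SAMPLE = [
    "lmtp[94506]: Failed to append cache to user.bla for 747",
    "lmtp[94506]: Index upgrade failed: user.bla",
    "lmtp[94506]: IOERROR: locking index user.bla: No such file or directory",
    "master[93701]: process 94506 exited, signaled to death by 6",
    "imaps[94936]: IOERROR: invalid cache record for user.paul.Sent uid 180 "
    "(System I/O error)",
]

if __name__ == "__main__":
    # Print the mailboxes sorted by name, with their error counts.
    for mbox, count in sorted(affected_mailboxes(SAMPLE).items()):
        print(f"{count:3d}  {mbox}")
```

Feeding the real syslog file to affected_mailboxes instead of SAMPLE
would give a list of candidates to reconstruct one at a time.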

Regards,
Paul

P.S. I think there was one cosmetic issue. I reconstructed one user's
mailbox, and after restarting the sync_client, on my replica, I noticed:
syncserver[35643]: Deleted mailbox user.hmm
... while this mailbox was not deleted, and fortunately it was properly
synced.


More information about the Info-cyrus mailing list