problematic upgrade 2.3.16 -> 2.4.3
Paul Dekkers
Paul.Dekkers at surfnet.nl
Wed Nov 10 14:49:47 EST 2010
Hi,
I intentionally waited a few 2.4 releases so that the first nasty bugs
would be squashed ;-)
In a small test-setup everything was fine. But on a box with actual
users on it, I seem to have some more problems :-(
The machine runs FreeBSD 8.1 (64-bit) with Cyrus from the port. I'm
using ZFS for the imap and metadata partitions, my /var/imap is on UFS.
This was all fine with 2.3.16.
Initially I had problems with my own mail. A sync_client seemed to stall
on my mailbox, and truss told me it was waiting for a lock:
open("/var/imap/lock/user/paul.lock",O_RDWR|O_CREAT|O_TRUNC,0666) = 4 (0x4)
I didn't really get why. Quitting my 2 Thunderbird sessions solved that.
Also, a pop3 session took forever while this lock was held. I had imap
clients open on my INBOX; the logs showed that authentication succeeded,
but after PASS it hung for so long that I quit the telnet session.
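For what it's worth, a rough way to check whether some other process still holds a lock file like the one in the truss output is sketched below. This is only a sketch: it assumes the lock is an fcntl-style advisory lock, which I have not verified for this build, and `is_locked` is a name I made up. It opens the file without O_TRUNC and releases its own lock immediately.

```python
import errno
import fcntl
import os


def is_locked(path):
    """Return True if another process holds a conflicting fcntl lock on path.

    Assumption: the lock in question is an fcntl-style advisory lock.
    """
    fd = os.open(path, os.O_RDWR)
    try:
        try:
            # Non-blocking attempt at an exclusive lock on the whole file.
            fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except OSError as exc:
            if exc.errno in (errno.EACCES, errno.EAGAIN):
                return True  # somebody else holds it
            raise
        # We got the lock, so nobody else had it; give it back right away.
        fcntl.lockf(fd, fcntl.LOCK_UN)
        return False
    finally:
        os.close(fd)
```

It would be called as `is_locked("/var/imap/lock/user/paul.lock")`, run as a user that may open the file read-write. It only says whether the lock is held, not by whom.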
I see a lot of other disturbing errors, like reconstruct dumping core,
and all kinds of IOERRORs:
lmtp[94506]: Failed to append cache to user.bla for 747
lmtp[94506]: Index upgrade failed: user.bla
lmtp[94506]: IOERROR: locking index user.bla: No such file or directory
master[93701]: process 94506 exited, signaled to death by 6
And another user now gives this error:
imap[95315]: Failed to append cache to user.astrid.Junk for 35
imap[95315]: Index upgrade failed: user.astrid.Junk
imap[95315]: IOERROR: locking index user.astrid.Junk: Bad file descriptor
Not sure what the 35 means anyway; there's no file with that name.
Reconstruct dumped core for another user:
reconstruct[94584]: reconstructing user.bla
reconstruct[94584]: Failed to append cache to user.bla for 747
reconstruct[94584]: Index upgrade failed: user.bla
reconstruct[94584]: IOERROR: locking index user.bla: No such file or directory
kernel: pid 94584 (reconstruct), uid 60: exited on signal 6
... another folder after reconstruction:
imaps[94936]: IOERROR: invalid cache record for user.paul.Sent uid 180 (System I/O error)
That's strange.
And now my sync_client is complaining with:
sync_client[94963]: Fatal error: waitpid failed
ARGH! Any advice? Could these problems be FreeBSD related?
Sounds like a huge reconstruct might be worth it; I'm still trying to figure
out if the reconstruct -G -r user.paul I did on my inbox solved most
issues for my user.
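To get an idea of which mailboxes are affected before reconstructing everything, the log lines quoted above could be tallied per mailbox. A minimal sketch, assuming the messages keep exactly the wording shown in this mail and that mailbox names contain no spaces; `affected_mailboxes` and `PATTERNS` are names I made up:

```python
import re
from collections import Counter

# One pattern per error message quoted in this mail; group 1 is the mailbox.
# Assumption: mailbox names contain no whitespace.
PATTERNS = [
    re.compile(r"Failed to append cache to (\S+) for \d+"),
    re.compile(r"Index upgrade failed: (\S+)"),
    re.compile(r"IOERROR: locking index (\S+):"),
    re.compile(r"IOERROR: invalid cache record for (\S+) uid \d+"),
]


def affected_mailboxes(lines):
    """Count matching error lines per mailbox name."""
    counts = Counter()
    for line in lines:
        for pattern in PATTERNS:
            match = pattern.search(line)
            if match:
                counts[match.group(1)] += 1
                break  # at most one hit per log line
    return counts
```

Feeding it the lines of the mail log (wherever syslog puts them on the box in question) gives a per-mailbox error count, which at least shows whether the damage is limited to a few users.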
Regards,
Paul
P.S. I think there was one cosmetic issue. I reconstructed one user's
mailbox, and after restarting the sync_client, I noticed on my replica:
syncserver[35643]: Deleted mailbox user.hmm
... while this mailbox was not deleted, and fortunately it was properly
synced.