reconstruct caused mailboxes (skiplist) corruption?
bruder at haxent.com.br
Fri Nov 12 14:10:20 EST 2010
We saw something similar:
syslog() messages ended up 'on the wire' (IMAP, POP3, etc.) when we
restarted syslog on an in-production Cyrus backend.
In summary: DON'T stop syslog while Cyrus is running.
On 11/11/2010 07:54 PM, Bron Gondwana wrote:
> On Thu, Nov 11, 2010 at 02:24:47PM -0200, Henrique de Moraes Holschuh wrote:
>> On Thu, 11 Nov 2010, Paul Dekkers wrote:
>>> Uhoh! And then I looked at mailboxes.db: It looks like part completely
>>> rewritten, including the skiplist header, and the first line now said:
>>> user.bla: System I/O error System I/O error
>> This is something that has plagued cyrus for a long time. Can we find a
>> way to actually keep tabs on our FDs so it cannot ever happen again,
>> please? I recall reports of crap showing inside prot streams 10 years
>> ago... if now it is leaking into even worse places, well...
> It's a standalone program. Reconstruct was running all by itself.
>> This probably needs a redesign of master/service fd-passing protocol,
>> and of prot streams to be fixed for good. While at it, we should
>> switch the master/service interaction to a modern design, since the
>> operating systems worth bothering with nowadays deal sanely with the
>> thundering herd effect, and all of them have proper socket event support
>> (epoll-like. Would require one of the event abstraction libraries,
>> though, so as to support linux/bsd/solaris with minimum fuss).
> Since that wasn't the issue - why on earth was it allowed to have fd 2
> in the first place? Is Cyrus closing fd 2, or is truss closing it??
> There was no issue outside truss, it was when it ran under truss that
> the issue happened.
> Here's the start of an strace of a reconstruct run on my machine:
> execve("/usr/cyrus/bin/reconstruct", ["/usr/cyrus/bin/reconstruct", "-C", "/tmp/ct-slot2/etc/imapd.conf", "-s"], [/* 20 vars */]) = 0
> brk(0) = 0x12f1000
> access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or directory)
> mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fceb52d8000
> access("/etc/ld.so.preload", R_OK) = -1 ENOENT (No such file or directory)
> open("db-4.6/lib/tls/x86_64/libsasl2.so.2", O_RDONLY) = -1 ENOENT (No such file or directory)
> open("db-4.6/lib/tls/libsasl2.so.2", O_RDONLY) = -1 ENOENT (No such file or directory)
> open("db-4.6/lib/x86_64/libsasl2.so.2", O_RDONLY) = -1 ENOENT (No such file or directory)
> open("db-4.6/lib/libsasl2.so.2", O_RDONLY) = -1 ENOENT (No such file or directory)
> open("/etc/ld.so.cache", O_RDONLY) = 3
> Notice the first fd allocated: 3.
> And here's a run under truss on FreeBSD:
> [root@cyrus1 /var/imap]# sudo -u cyrus truss /usr/local/cyrus/bin/reconstruct user.foo
> __sysctl(0x7fffffffe390,0x2,0x7fffffffe3ac,0x7fffffffe3a0,0x0,0x0) = 0 (0x0)
> mmap(0x0,672,PROT_READ|PROT_WRITE,MAP_ANON,-1,0x0) = 34366398464 (0x80065a000)
> munmap(0x80065a000,672) = 0 (0x0)
> __sysctl(0x7fffffffe400,0x2,0x800763428,0x7fffffffe3f8,0x0,0x0) = 0 (0x0)
> mmap(0x0,32768,PROT_READ|PROT_WRITE,MAP_PRIVATE|MAP_ANON,-1,0x0) = 34366398464 (0x80065a000)
> issetugid(0x80065b015,0x800654cc4,0x80076fc50,0x80076fc20,0x6351,0x0) = 0 (0x0)
> open("/etc/libmap.conf",O_RDONLY,0666) ERR#2 'No such file or directory'
> access("/usr/lib/libsasl2.so.2",0) ERR#2 'No such file or directory'
> access("/usr/local/lib/libsasl2.so.2",0) = 0 (0x0)
> open("/usr/local/lib/libsasl2.so.2",O_RDONLY,035431400) = 2 (0x2)
> Note the first fd allocated: 2!!!!!
> The question is - why is fd 2 being allocated? Is it necessary to explicitly
> open stderr? The function that's scribbling all over everything is com_err,
> which is supposed to be a BSD error reporting library, it SHOULD know what
> it's doing...
> Bron ( a while later, fd 2 gets re-used as the mailboxes.db handle, and hence
> the mess is created )
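
The failure described above depends on open() returning the lowest unused descriptor: if fd 2 is closed when the program starts, a later open() can be given fd 2, and anything written to stderr then lands in that file. A common defensive measure is to occupy fds 0, 1 and 2 before opening anything else. The following is a minimal sketch of that idea, not code from Cyrus; the helper name ensure_std_fds is hypothetical, and it assumes /dev/null exists and is openable.

```c
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper, not part of Cyrus: make sure fds 0, 1 and 2 are
 * occupied before the program opens any file of its own.
 * Returns 0 on success, -1 if /dev/null cannot be opened. */
int ensure_std_fds(void)
{
    int fd;

    /* open() returns the lowest unused descriptor, so keep opening
     * /dev/null until the result is above 2. Any of 0, 1, 2 that were
     * closed are now pointing at /dev/null and stay that way. */
    do {
        fd = open("/dev/null", O_RDWR);
        if (fd < 0)
            return -1;
    } while (fd <= STDERR_FILENO);

    close(fd);   /* the extra descriptor above 2 is not needed */
    return 0;
}
```

Called as the first thing in main(), this would mean a stray write to stderr goes to /dev/null (or to the real stderr, if it was open) rather than into whatever file later happened to receive fd 2.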