reconstruct caused mailboxes (skiplist) corruption?

Bron Gondwana brong at fastmail.fm
Thu Nov 11 16:54:24 EST 2010


On Thu, Nov 11, 2010 at 02:24:47PM -0200, Henrique de Moraes Holschuh wrote:
> On Thu, 11 Nov 2010, Paul Dekkers wrote:
> > Uhoh! And then I looked at mailboxes.db: It looks like part of it was
> > completely rewritten, including the skiplist header, and the first line
> > now said:
> > user.bla: System I/O error System I/O error
> 
> This is something that has plagued cyrus for a long time.  Can we find a
> way to actually keep tabs on our FDs so it cannot ever happen again,
> please?  I recall reports of crap showing inside prot streams 10 years
> ago... if now it is leaking into even worse places, well...

It's a standalone program.  Reconstruct was running all by itself.
 
> This probably needs a redesign of master/service fd-passing protocol,
> and of prot streams to be fixed for good.   While at it, we should
> switch the master/service interaction to a modern design, since the
> operating systems worth bothering with nowadays deal sanely with the
> thundering herd effect, and all of them have proper socket event support
> (epoll-like. Would require one of the event abstraction libraries,
> though, so as to support linux/bsd/solaris with minimum fuss).

That wasn't the issue here, though.  Why on earth was anything allowed to
get fd 2 in the first place?  Is Cyrus closing fd 2, or is truss closing it?

There was no issue outside truss; the problem only appeared when
reconstruct ran under truss.

Here's the start of an strace of a reconstruct run on my machine:

execve("/usr/cyrus/bin/reconstruct", ["/usr/cyrus/bin/reconstruct", "-C", "/tmp/ct-slot2/etc/imapd.conf", "-s"], [/* 20 vars */]) = 0
brk(0)                                  = 0x12f1000
access("/etc/ld.so.nohwcap", F_OK)      = -1 ENOENT (No such file or directory)
mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fceb52d8000
access("/etc/ld.so.preload", R_OK)      = -1 ENOENT (No such file or directory)
open("db-4.6/lib/tls/x86_64/libsasl2.so.2", O_RDONLY) = -1 ENOENT (No such file or directory)
open("db-4.6/lib/tls/libsasl2.so.2", O_RDONLY) = -1 ENOENT (No such file or directory)
open("db-4.6/lib/x86_64/libsasl2.so.2", O_RDONLY) = -1 ENOENT (No such file or directory)
open("db-4.6/lib/libsasl2.so.2", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/etc/ld.so.cache", O_RDONLY)      = 3


Notice the first fd allocated: 3.

And here's a run under truss on FreeBSD:

[root@cyrus1 /var/imap]# sudo -u cyrus truss /usr/local/cyrus/bin/reconstruct user.foo
__sysctl(0x7fffffffe390,0x2,0x7fffffffe3ac,0x7fffffffe3a0,0x0,0x0) = 0 (0x0)
mmap(0x0,672,PROT_READ|PROT_WRITE,MAP_ANON,-1,0x0) = 34366398464 (0x80065a000)
munmap(0x80065a000,672)		     = 0 (0x0)
__sysctl(0x7fffffffe400,0x2,0x800763428,0x7fffffffe3f8,0x0,0x0) = 0 (0x0)
mmap(0x0,32768,PROT_READ|PROT_WRITE,MAP_PRIVATE|MAP_ANON,-1,0x0) = 34366398464 (0x80065a000)
issetugid(0x80065b015,0x800654cc4,0x80076fc50,0x80076fc20,0x6351,0x0) = 0 (0x0)
open("/etc/libmap.conf",O_RDONLY,0666)	     ERR#2 'No such file or directory'
access("/usr/lib/libsasl2.so.2",0)	 ERR#2 'No such file or directory'
access("/usr/local/lib/libsasl2.so.2",0)     = 0 (0x0)
open("/usr/local/lib/libsasl2.so.2",O_RDONLY,035431400) = 2 (0x2)

Note the first fd allocated: 2!!!!!
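
For anyone wondering how an open() can come back with 2: POSIX hands out the
lowest-numbered unused descriptor, so if a process starts (or ends up) with
stderr closed, the very next open gets fd 2.  A minimal sketch of that rule,
in Python for brevity (reconstruct itself is C).  The child script and the
function name are made up for illustration, and it assumes a POSIX-like
system:

```python
import subprocess
import sys

# Child process: close stderr, then open something and report which fd it got.
CHILD = r"""
import os
os.close(2)                              # simulate a process left without stderr
fd = os.open(os.devnull, os.O_RDONLY)    # stand-in for the first library open
os.write(1, str(fd).encode())            # report the number on stdout
"""

def first_fd_with_stderr_closed():
    """Return the fd number the child's first open() received."""
    result = subprocess.run(
        [sys.executable, "-c", CHILD],
        stdin=subprocess.DEVNULL,        # make sure fds 0 and 1 are in use
        stdout=subprocess.PIPE,
        stderr=subprocess.DEVNULL,
        check=True,
    )
    return int(result.stdout)

if __name__ == "__main__":
    print(first_fd_with_stderr_closed())
```

With fds 0 and 1 occupied and 2 closed, the child's open lands on fd 2, which
matches what the truss output shows for libsasl2.so.2.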


The question is: why is fd 2 being allocated?  Is it necessary to explicitly
open stderr?  The function that's scribbling all over everything is com_err,
which is supposed to be a BSD error reporting library, so it SHOULD know what
it's doing...
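
On the "explicitly open stderr" question: one common defensive idiom is to
make sure fds 0, 1 and 2 are all occupied before opening anything else, by
pointing any missing one at /dev/null.  This is only a sketch of that idiom,
not something the Cyrus code is claimed to do; ensure_standard_fds is a
hypothetical name, and it is Python rather than C purely for brevity:

```python
import os

def ensure_standard_fds():
    """Fill any closed slot among fds 0-2 with /dev/null.

    Returns the list of fd numbers that had to be filled (empty if
    stdin, stdout and stderr were already open).
    """
    filled = []
    while True:
        # open() returns the lowest free fd, so this plugs holes in order.
        fd = os.open(os.devnull, os.O_RDWR)
        if fd > 2:
            os.close(fd)     # all three standard slots are taken; done
            return filled
        filled.append(fd)    # keep it open so later opens cannot land here

if __name__ == "__main__":
    print(ensure_standard_fds())
```

Run at the very start of a program, this means a later error message sent to
fd 2 goes to /dev/null at worst, instead of into whatever file happened to be
opened next.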

Bron ( a while later, fd 2 gets re-used as the mailboxes.db handle, and hence
       the mess is created )
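
The failure mode itself is easy to reproduce in miniature.  The sketch below
is an illustration only: the file name, its contents and the function name
are invented, it assumes a POSIX-like system, and it is Python rather than C
for brevity.  A child process with stderr closed opens a stand-in "database"
file, which therefore receives fd 2, and then an error message is written to
fd 2 the way a stderr-based reporter would:

```python
import os
import subprocess
import sys
import tempfile

# Child: lose stderr, open the "database", then report an error on fd 2.
CHILD = r"""
import os, sys
os.close(2)                              # stderr is gone
db = os.open(sys.argv[1], os.O_RDWR)     # lowest free fd is 2, so the file gets it
os.write(2, b"System I/O error\n")       # the "error message" lands in the file
"""

def scribble_demo():
    """Return the fake database's contents after the child has run."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "fake-mailboxes.db")
        with open(path, "wb") as f:
            f.write(b"HEADER-HEADER-HEADER\nrecord\n")
        subprocess.run(
            [sys.executable, "-c", CHILD, path],
            stdin=subprocess.DEVNULL,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            check=True,
        )
        with open(path, "rb") as f:
            return f.read()

if __name__ == "__main__":
    print(scribble_demo())
```

The write starts at offset 0 of the freshly opened file, so the start of the
header is replaced by the error text while the tail of the file survives.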

