From baconm at email.unc.edu Wed Jun 17 09:44:16 2009
From: baconm at email.unc.edu (Michael Bacon)
Date: Wed, 17 Jun 2009 09:44:16 -0400
Subject: MUPDATE database problems -- the importance of thread safety
In-Reply-To: <00BFAADC8FC3B60673A4103D@dhcp00032.its.unc.edu>
References: <00BFAADC8FC3B60673A4103D@dhcp00032.its.unc.edu>
Message-ID: <74CFE7AD3EF3987A9FE96DAB@dhcp00032.its.unc.edu>

It turns out that this was an issue with mupdate being a multi-threaded
daemon, and in a critical place in the non-blocking prot code (in
prot_flush_internal()), the behavior relies on the value of errno. If it's
EAGAIN, the write will try again, otherwise it sets s->error and quits.
Naturally, being a global variable normally, errno doesn't work terribly
well in multi-threaded code unless the necessary thread safety switch is
passed to the compiler. Hence, when thread #5 was getting a -1 from the
write(2) system call, it was reading errno as 0, rather than EAGAIN as it
should have been.

The solution, should anyone else run into this, is as simple as
recompiling with the thread safety switch. (In the case of Sun's SPro,
it's -mt. I think it's -mthread for gcc, but I'm not sure.) Maddening that
the fix was that simple, as I spent two solid weeks hunting for the
dratted bug.

I have two requests to the CVS maintainers out there. First, the below
patch to current CVS isn't terribly comprehensive, and doesn't narrow it
down from about a dozen places s->error could be set, but at least would
have given SOME kind of indication on the server that something had gone
wrong, and might have saved me about a week of hunting.

Secondly, I am very weak in the ways of autoconf, but it strikes me that
since Cyrus now builds mupdate as multithreaded by default (good decision,
IMO), autoconf should make some attempt to figure out what thread safety
switch is appropriate and add it to CFLAGS.
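For anyone wanting to see the pattern in isolation, here is a minimal
sketch of a retry-on-EAGAIN write loop of the kind described above. It is
not the Cyrus prot code: write_all is a hypothetical helper, and waiting
with poll() is an assumption made for the example. What it illustrates is
that errno has to be copied immediately after the failing call, and that
the copy is only trustworthy in a threaded program when the build gives
each thread its own errno (-mt with Sun's compiler, as noted above; for
gcc the option to look at is probably -pthread, which is an assumption
here and was not verified in this thread).

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper, not part of Cyrus: write len bytes to a possibly
 * non-blocking descriptor. Returns 0 on success, or the errno value that
 * ended the attempt so the caller can log it. */
int write_all(int fd, const char *buf, size_t len)
{
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n >= 0) {
            buf += n;
            len -= (size_t)n;
            continue;
        }

        /* Copy errno at once; any later library call may overwrite it.
         * In a threaded build without per-thread errno, this read is
         * exactly where a stale value such as 0 could show up. */
        int saved = errno;

        if (saved == EINTR)
            continue;                      /* interrupted, just retry */
        if (saved == EAGAIN || saved == EWOULDBLOCK) {
            struct pollfd p = { .fd = fd, .events = POLLOUT };
            (void)poll(&p, 1, -1);         /* wait until writable */
            continue;
        }
        return saved;                      /* real error: report it */
    }
    return 0;
}
```

A caller would treat any nonzero return as the reason the flush failed and
log it, which is the kind of server-side indication asked for above.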
Regards,
Michael Bacon
ITS Messaging
UNC Chapel Hill

--- prot.c      23 Apr 2009 17:10:07 -0000      1.97
+++ prot.c      17 Jun 2009 13:34:26 -0000
@@ -1038,6 +1038,8 @@
     /* If we are exiting with an error, we should clear our memory buffer
      * and set our return code */
     if(s->error) {
+       syslog(LOG_DEBUG, "prot_flush_internal: Error -- %s", s->error);
+
        s->ptr = s->buf;
        s->cnt = s->maxplain;
        return EOF;

--On June 13, 2009 4:22:03 PM -0400 Michael Bacon wrote:

> Hello all,
>
> We're in the middle of trying to move from our single server installation
> to a new murder installation on all new hardware. We're getting into the
> late stages of setup, when we've run into a killer problem with getting
> the old server to sync up with the MUPDATE server so that we can migrate
> off of it. We're under a deadline to get the expensive new hardware
> rolled out into production, so any help would be enormously appreciated.
>
> The test installation with a test backend of, oh, a couple dozen
> mailboxes worked flawlessly. Syncing happened just as it was supposed
> to, and everything looked good for production. The next step was to
> start the old server syncing its database with the MUPDATE server, and
> that's where we're stuck.
>
> The initial sync from the old backend works just fine. During the second
> sync, however (ctl_mboxlist -m), the backend connects to the MUPDATE
> server, executes a LIST , and then the server returns
> somewhere between 2500-10,000 lines (of a 830k+ mailboxes database), and
> freezes. A combination of telemetry logs and truss output shows that
> the server records itself as having sent more data than the client
> receives, but truss'ing the client shows the client expectantly waiting
> in a read state. (The server continues to spin in a
> fstat/stat/fcntl/fcntl cycle on the mailboxes database, which as far as
> I can tell is normal behavior for the skiplist driver, but still looks
> really weird in a truss.)
> > Now, here's where it gets even weirder: if I connect using mupdatetest > and issue the same LIST command and let it run, the command runs to > completion without error. However, if I at some point use flow control > on my ssh session and hit ^S, then a ^Q, the scrolling continues > briefly, and then the server hangs in a very similar way as above. To > make things even odder, when I run a super-aggressive truss on the > process (truss -aeflE -v all), the error never occurs. It's as if > slowing down the mupdate process keeps it out of whatever error state it > gets into. > > To make matters stranger, when I used the berkeley-hash driver on the > MUPDATE mboxlist, the MUPDATE server fails to return anything from a LIST > command, even when its database is full of matching entries. When > ctl_mboxlist -m is run, an assert() fails and the process exits without > performing any work. > > Because of all of this, I suspect something going wrong with a buffer > filling up ungracefully somewhere. The spot I'm attacking right now is > the 64-bit build -- I'm spending the weekend in the office rebuilding > everything as 32 bit instead (libraries from the ground up), in case > there's some problem with a different interpretation of size_t or some > such thing in the 64-bit world. I'll share any findings in a few days, > but I wanted to get this out earlier. > > We've eliminated hardware, OS, network, and compiler-specific errors by > trying uploading the same database from numerous different clients to > numerous different servers. (See the combinations tried below). I'm > open to any and all suggestions at this point. 
> > Michael Bacon > ITS Messaging > UNC Chapel Hill > > > > Current system information: > Hardware: Sun T5220s (Sparc CoolThreads architecture) running Solaris 10 > Build: 64-bit binaries built using the Sun SPro compiler (to get > CoolThreads optimizations) > Configuration: tlscache, duplicate, and mboxlist_db all defined to > skiplist > > > Combinations tried: (backend client -> mupdate server) > (all builds currently 64 bit 2.3.13) > > Sun 6800+Sol 9+gcc build -> Sun 5220+Sol 10+spro build > Sun 6800+Sol 9+gcc build -> Sun 5120+Sol 10+spro build > Sun 280R+Sol 9+gcc build -> Sun 5220+Sol 10+spro build > Sun 280R+Sol 9+gcc build -> Same machine, separate cyrus install over > localhost > Sun 5220+Sol 10+spro build -> Sun 5220+Sol 10+spro build > Sun 5220+Sol 10+spro build -> Sun 280R+Sol 9+gcc build > We tried others too, but this covers most of the important combinations, > I think. > ---- > Cyrus Home Page: http://cyrusimap.web.cmu.edu/ > Cyrus Wiki/FAQ: http://cyrusimap.web.cmu.edu/twiki > List Archives/Info: http://asg.web.cmu.edu/cyrus/mailing-list.html From wes at umich.edu Wed Jun 17 14:16:44 2009 From: wes at umich.edu (Wesley Craig) Date: Wed, 17 Jun 2009 14:16:44 -0400 Subject: MUPDATE database problems -- the importance of thread safety In-Reply-To: <74CFE7AD3EF3987A9FE96DAB@dhcp00032.its.unc.edu> References: <00BFAADC8FC3B60673A4103D@dhcp00032.its.unc.edu> <74CFE7AD3EF3987A9FE96DAB@dhcp00032.its.unc.edu> Message-ID: Please open a report in bugzilla and mark it was a "blocker". Thanks for finding the issue. :wes On 17 Jun 2009, at 09:44, Michael Bacon wrote: > It turns out that this was an issue with mupdate being a multi- > threaded daemon, and in a critical place in the non-blocking prot > code (in prot_flush_internal()), the behavior relies on the value > of errno. If it's EAGAIN, the write will try again, otherwise it > sets s->error and quits. 
Naturally, being a global variable
> normally, errno doesn't work terribly well in multi-threaded code
> unless the necessary thread safety switch is passed to the
> compiler. Hence, when thread #5 was getting a -1 from the write(2)
> system call, it was reading errno as 0, rather than EAGAIN as it
> should have been.
>
> The solution, should anyone else run into this, is as simple as
> recompiling with the thread safety switch. (In the case of Sun's
> SPro, it's -mt. I think it's -mthread for gcc, but I'm not sure.)
> Maddening that the fix was that simple, as I spent two solid weeks
> hunting for the dratted bug.
>
> I have two requests to the CVS maintainers out there. First, the
> below patch to current CVS isn't terribly comprehensive, and
> doesn't narrow it down from about a dozen places s->error could be
> set, but at least would have given SOME kind of indication on the
> server that something had gone wrong, and might have saved me about
> a week of hunting.
>
> Secondly, I am very weak in the ways of autoconf, but it strikes me
> that since Cyrus now builds mupdate as multithreaded by default
> (good decision, IMO), autoconf should make some attempt to figure
> out what thread safety switch is appropriate and add it to CFLAGS.

From baconm at email.unc.edu Thu Jun 18 17:44:19 2009
From: baconm at email.unc.edu (Michael Bacon)
Date: Thu, 18 Jun 2009 17:44:19 -0400
Subject: Repeat recovers on databases
In-Reply-To: <272BC2FE0F18300DCCDCDA04@dhcp00032.its.unc.edu>
References: <272BC2FE0F18300DCCDCDA04@dhcp00032.its.unc.edu>
Message-ID: <0E5A945BF34FF898C9F1E390@dhcp00032.its.unc.edu>

Another one stomped here. This time, it's a 32/64 bit issue. myinit in
cyrusdb_skiplist.c assumes that time_t is 4 bytes long, and writes out
that many from the current timestamp when creating $confdir/db/skipstamp.
On 64-bit Solaris, time_t is 8 bytes (it's typedef'ed as a long).
I'm forgetting my Who's Who of big and little endian chips, but my guess is
that on x86 systems, the first four bytes are the ones with the real data
in them, so there's actually meaningful data that gets written out. On
Sparc, though, no such luck.

So, when ctl_cyrusdb decides to recover the database, it writes out four
bytes of data, all of which happen to be zeroes. Henceforth, every process
that looks at the database goes, "oh, look, the database needs
recovering!" then spends 55 seconds recovering it before it does any
meaningful work, then proceeds to write out 4 bytes of zeroes into the
skipstamp file. The next process comes along, reads the skipstamp file,
and goes, "oh, look, the database needs recovering!"

The fix for it is below. I will also open a bugzilla issue for this.

Always remember boys and girls, when you ASS-UM-E the bit size of types,
you make lots of ASSemblers go "UM...." exponentially.

Michael Bacon
ITS Messaging
UNC Chapel Hill

===================================================================
RCS file: /cvs/src/cyrus/lib/cyrusdb_skiplist.c,v
retrieving revision 1.64
diff -u -r1.64 cyrusdb_skiplist.c
--- cyrusdb_skiplist.c  8 Oct 2008 15:47:08 -0000       1.64
+++ cyrusdb_skiplist.c  18 Jun 2009 21:42:30 -0000
@@ -239,7 +239,7 @@
        if (r != -1) r = ftruncate(fd, 0);
        a = htonl(global_recovery);
-       if (r != -1) r = write(fd, &a, 4);
+       if (r != -1) r = write(fd, &a, sizeof(time_t));
        if (r != -1) r = close(fd);
        if (r == -1) {
@@ -252,7 +252,7 @@
        fd = open(sfile, O_RDONLY, 0644);
        if (fd == -1) r = -1;
-       if (r != -1) r = read(fd, &a, 4);
+       if (r != -1) r = read(fd, &a, sizeof(time_t));
        if (r != -1) r = close(fd);
        if (r == -1) {

--On June 15, 2009 10:07:34 AM -0400 Michael Bacon wrote:

> This appears to be an issue in addition to the freeze-ups we're having.
> > Given all the dumping and undumping I'm doing in the name of debugging, > this may not be surprising, but I keep seeing instances where a database > gets into some state wherein any process that opens it decides to run a > recover on it before doing anything. Running a ctl_cyrusdb -r, even with > all other processes stopped, does not seem to change this behavior. The > next time a cyrus process starts up, whether it's an imapd, mupdate, or > ctl_mboxlist, the process goes and does a recover before doing anything > else. > > Has anyone else seen this? I've seen it on brand-new, newly "undumped" > databases in the past week. > > Michael Bacon > ITS Messaging > UNC Chapel Hill > ---- > Cyrus Home Page: http://cyrusimap.web.cmu.edu/ > Cyrus Wiki/FAQ: http://cyrusimap.web.cmu.edu/twiki > List Archives/Info: http://asg.web.cmu.edu/cyrus/mailing-list.html From brong at fastmail.fm Thu Jun 18 19:47:53 2009 From: brong at fastmail.fm (Bron Gondwana) Date: Fri, 19 Jun 2009 09:47:53 +1000 Subject: Repeat recovers on databases In-Reply-To: <0E5A945BF34FF898C9F1E390@dhcp00032.its.unc.edu> References: <272BC2FE0F18300DCCDCDA04@dhcp00032.its.unc.edu> <0E5A945BF34FF898C9F1E390@dhcp00032.its.unc.edu> Message-ID: <20090618234753.GA4196@brong.net> On Thu, Jun 18, 2009 at 05:44:19PM -0400, Michael Bacon wrote: > Another one stomped here. This time, it's a 32/64 bit issue. myinit in > cyrusdb_skiplist.c assumes that type_t is 4 bytes long, and writes out > that many from the current timestamp when creating $confdir/db/skipstamp. > On 64-bit Solaris, time_t is 8 bytes (it's typedef'ed as a long). I'm > forgetting my Who's Who of big and little endian chips, but my guess is > that on x86 systems, the first four bytes are the ones with the real data > in them, so there's actually meaningful data that gets written out. On > Sparc, though, no such luck. Er, yeah. Ouch. Damn. I want to make it an 8 bit value, but that would be an incompatible format change to skiplists. 
At which time I would do a bunch of other stuff too. I do have a
cyrusdb_skiplist2.c file floating around somewhere that does it
(checksums for one thing). I was even thinking of doing something
really evil with ordering on checkpoint, but I never got around to
running the numbers to see if it made point.

Basically instead of:

level: 1   2   1   3   2   1   1   2
key  : aaa bbb ccc ddd eee fff ggg hhh

It would lay the records out like this:

level: 3   2   2   2   1   1   1   1
key  : ddd bbb eee hhh aaa ccc fff ggg

The advantage being that for a lookup, the "next record" at the same
level would be directly after the current one, so readahead would be
more likely to hit the next node for the search case. It would be a
fair bit more random for enumerating though, so I don't know if it's
really sane (and of course as you make changes, it all gets more random
until the next checkpoint anyway)

So anyway, will definitely fix the immediate issue!

Thanks,

Bron.

From brong at fastmail.fm Thu Jun 18 20:09:16 2009
From: brong at fastmail.fm (Bron Gondwana)
Date: Fri, 19 Jun 2009 10:09:16 +1000
Subject: Repeat recovers on databases
In-Reply-To: <0E5A945BF34FF898C9F1E390@dhcp00032.its.unc.edu>
References: <272BC2FE0F18300DCCDCDA04@dhcp00032.its.unc.edu> <0E5A945BF34FF898C9F1E390@dhcp00032.its.unc.edu>
Message-ID: <20090619000916.GB5674@brong.net>

On Thu, Jun 18, 2009 at 05:44:19PM -0400, Michael Bacon wrote:
> The fix for it is below. I will also open a bugzilla issue for this.

I think this is actually a better fix that keeps things in the
right type on to disk. Can you please test it on your platform.

Thanks,

Bron.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0001-Use-correctly-sized-variable-for-recovery-time.patch
Type: text/x-diff
Size: 2927 bytes
Desc: not available
Url : http://lists.andrew.cmu.edu/pipermail/cyrus-devel/attachments/20090619/1465c6bf/attachment.bin

From brong at fastmail.fm Thu Jun 18 19:57:03 2009
From: brong at fastmail.fm (Bron Gondwana)
Date: Fri, 19 Jun 2009 09:57:03 +1000
Subject: Repeat recovers on databases
In-Reply-To: <0E5A945BF34FF898C9F1E390@dhcp00032.its.unc.edu>
References: <272BC2FE0F18300DCCDCDA04@dhcp00032.its.unc.edu> <0E5A945BF34FF898C9F1E390@dhcp00032.its.unc.edu>
Message-ID: <20090618235703.GA5674@brong.net>

On Thu, Jun 18, 2009 at 05:44:19PM -0400, Michael Bacon wrote:
> Another one stomped here. This time, it's a 32/64 bit issue. myinit in
> cyrusdb_skiplist.c assumes that time_t is 4 bytes long, and writes out
> that many from the current timestamp when creating $confdir/db/skipstamp.

Actually, reading the code, that's not strictly true:

>        a = htonl(global_recovery);
> -      if (r != -1) r = write(fd, &a, 4);
> +      if (r != -1) r = write(fd, &a, sizeof(time_t));

It writes "a", which is the result of calling htonl on global_recovery.

If htonl isn't returning a 32 bit value of the lower order bytes of the
value that it's given, then this bug is going to be causing a LOT more
problems than just this. We assume this works in quite a few other
places in the code, including the timestamp value in the skiplist header
itself, and in places throughout the mailbox code too.

"htonl" => "host to net long" by my reading. There's also htonll for 64
bit values. Is your platform creating net longlongs?

    time_t a;

There's the actual bug. That should be

    bit32 a;

Bron.
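The diagnosis above can be sketched without a SPARC box. Assuming a
big-endian host where time_t is 8 bytes, a timestamp that fits in 32 bits
occupies the last four bytes of the variable, so writing only the first
four bytes stores zeroes. The helpers below (store_be64, store_be32,
load_be32) are hypothetical names used only for illustration; they build
the byte layouts by hand so the result is the same on any machine, and
they are not Cyrus functions.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Lay out v as an 8-byte big-endian integer, the way a 64-bit time_t
 * would sit in memory on a big-endian host. */
void store_be64(unsigned char out[8], uint64_t v)
{
    for (int i = 7; i >= 0; i--) {
        out[i] = (unsigned char)(v & 0xff);
        v >>= 8;
    }
}

/* Lay out v as a 4-byte big-endian integer, i.e. network byte order
 * held in a variable that really is 32 bits wide. */
void store_be32(unsigned char out[4], uint32_t v)
{
    for (int i = 3; i >= 0; i--) {
        out[i] = (unsigned char)(v & 0xff);
        v >>= 8;
    }
}

/* Read a 4-byte big-endian stamp back, as a reader of the stamp file
 * would after read(fd, &a, 4). */
uint32_t load_be32(const unsigned char in[4])
{
    return ((uint32_t)in[0] << 24) | ((uint32_t)in[1] << 16) |
           ((uint32_t)in[2] << 8)  |  (uint32_t)in[3];
}
```

With a stamp such as 1245361459, load_be32 over the first four bytes of
the 8-byte layout gives 0, while the 4-byte layout round-trips. That is
the difference between "time_t a" and a 32-bit "a" when exactly four
bytes are written and read.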

From brong at fastmail.fm Fri Jun 19 01:46:41 2009
From: brong at fastmail.fm (Bron Gondwana)
Date: Fri, 19 Jun 2009 15:46:41 +1000
Subject: Repeat recovers on databases
In-Reply-To: <20090619044714.GA2143@boogie.lpds.sztaki.hu>
References: <272BC2FE0F18300DCCDCDA04@dhcp00032.its.unc.edu><0E5A945BF34FF898C9F1E390@dhcp00032.its.unc.edu><20090619000916.GB5674@brong.net> <20090619044714.GA2143@boogie.lpds.sztaki.hu>
Message-ID: <1245390401.8807.1321145869@webmail.messagingengine.com>

On Fri, 19 Jun 2009 06:47 +0200, "Gabor Gombas" wrote:
> On Fri, Jun 19, 2009 at 10:09:16AM +1000, Bron Gondwana wrote:
>
> > @@ -192,6 +192,18 @@ struct db_list {
> >  static time_t global_recovery = 0;
> >  static struct db_list *open_db = NULL;
> >
> > +#define BIT32_MAX 4294967295U
> > +
> > +#if UINT_MAX == BIT32_MAX
> > +typedef unsigned int bit32;
> > +#elif ULONG_MAX == BIT32_MAX
> > +typedef unsigned long bit32;
> > +#elif USHRT_MAX == BIT32_MAX
> > +typedef unsigned short bit32;
> > +#else
> > +#error dont know what to use for bit32
> > +#endif
> > +
>
> If you're touching this code, why not use standard stdint.h types like
> uint32_t here?

Yeah, I was just thinking that actually. Mostly because, well - that's
what's already there!

I'll do a stdint rewrite of it all some time soon :)

Bron.
--
  Bron Gondwana
  brong at fastmail.fm

From baconm at email.unc.edu Fri Jun 19 09:40:17 2009
From: baconm at email.unc.edu (Michael Bacon)
Date: Fri, 19 Jun 2009 09:40:17 -0400
Subject: Repeat recovers on databases
In-Reply-To: <20090619000916.GB5674@brong.net>
References: <272BC2FE0F18300DCCDCDA04@dhcp00032.its.unc.edu> <0E5A945BF34FF898C9F1E390@dhcp00032.its.unc.edu> <20090619000916.GB5674@brong.net>
Message-ID:

Right, right, I suppose changing database formats is somehow "bad..." :)

This fix also works -- thanks.

-Michael

--On June 19, 2009 10:09:16 AM +1000 Bron Gondwana wrote:

> On Thu, Jun 18, 2009 at 05:44:19PM -0400, Michael Bacon wrote:
>> The fix for it is below.
I will also open a bugzilla issue for this. > > I think this is actually a better fix that keeps things in the > right type on to disk. Can you please test it on your platform. > > Thanks, > > Bron. From baconm at email.unc.edu Fri Jun 19 15:43:43 2009 From: baconm at email.unc.edu (Michael Bacon) Date: Fri, 19 Jun 2009 15:43:43 -0400 Subject: Repeat recovers on databases In-Reply-To: <20090618235703.GA5674@brong.net> References: <272BC2FE0F18300DCCDCDA04@dhcp00032.its.unc.edu> <0E5A945BF34FF898C9F1E390@dhcp00032.its.unc.edu> <20090618235703.GA5674@brong.net> Message-ID: <12A1FE1EA0F010454F35FE96@dhcp00032.its.unc.edu> --On June 19, 2009 9:57:03 AM +1000 Bron Gondwana wrote: > On Thu, Jun 18, 2009 at 05:44:19PM -0400, Michael Bacon wrote: >> Another one stomped here. This time, it's a 32/64 bit issue. myinit in >> cyrusdb_skiplist.c assumes that type_t is 4 bytes long, and writes out >> that many from the current timestamp when creating >> $confdir/db/skipstamp. > > Actually, reading the code, that's not strictly true: > >> a = htonl(global_recovery); >> - if (r != -1) r = write(fd, &a, 4); >> + if (r != -1) r = write(fd, &a, sizeof(time_t)); > > It writes "a", which is the result of calling htonl on global_recovery. > > If htonl isn't returning a 32 bit value of the lower order bytes of the > value that it's given, then this bug is going to be causing a LOT more > problems than just this. We assume this works in quite a few other > places in the code, including the timestamp value in the skiplist header > itself, and in places throughout the mailbox code too. > > "htonl" => "host to net long" by my reading. There's also htonll for 64 > bit values. Is your platform creating net longlongs? Good question -- this may be a Solaris bug after all. Solaris clearly defines in the man page that htonl is supposed to return a uint32_t from htonl, but looking at sys/byteorder.h, that's um, not being enforced... 

#if defined(_BIG_ENDIAN) && !defined(ntohl) && !defined(__lint)
/* big-endian */
#define ntohl(x)        (x)
#define ntohs(x)        (x)
#define htonl(x)        (x)
#define htons(x)        (x)

#elif !defined(ntohl) /* little-endian */

I think I may give our friends out in CA a call here...

-Michael

From baconm at email.unc.edu Fri Jun 19 16:12:40 2009
From: baconm at email.unc.edu (Michael Bacon)
Date: Fri, 19 Jun 2009 16:12:40 -0400
Subject: Repeat recovers on databases
In-Reply-To: <12A1FE1EA0F010454F35FE96@dhcp00032.its.unc.edu>
References: <272BC2FE0F18300DCCDCDA04@dhcp00032.its.unc.edu> <0E5A945BF34FF898C9F1E390@dhcp00032.its.unc.edu> <20090618235703.GA5674@brong.net> <12A1FE1EA0F010454F35FE96@dhcp00032.its.unc.edu>
Message-ID: <332C5ACDA5A2292A07DB6498@dhcp00032.its.unc.edu>

(Dropping info-cyrus on the followup)

--On June 19, 2009 3:43:43 PM -0400 Michael Bacon wrote:

> --On June 19, 2009 9:57:03 AM +1000 Bron Gondwana
> wrote:
>
>> On Thu, Jun 18, 2009 at 05:44:19PM -0400, Michael Bacon wrote:
>>> Another one stomped here. This time, it's a 32/64 bit issue. myinit in
>>> cyrusdb_skiplist.c assumes that time_t is 4 bytes long, and writes out
>>> that many from the current timestamp when creating
>>> $confdir/db/skipstamp.
>>
>> Actually, reading the code, that's not strictly true:
>>
>>>        a = htonl(global_recovery);
>>> -      if (r != -1) r = write(fd, &a, 4);
>>> +      if (r != -1) r = write(fd, &a, sizeof(time_t));
>>
>> It writes "a", which is the result of calling htonl on global_recovery.
>>
>> If htonl isn't returning a 32 bit value of the lower order bytes of the
>> value that it's given, then this bug is going to be causing a LOT more
>> problems than just this. We assume this works in quite a few other
>> places in the code, including the timestamp value in the skiplist header
>> itself, and in places throughout the mailbox code too.
>>
>> "htonl" => "host to net long" by my reading. There's also htonll for 64
>> bit values. Is your platform creating net longlongs?
> > Good question -- this may be a Solaris bug after all. Solaris clearly > defines in the man page that htonl is supposed to return a uint32_t from > htonl, but looking at sys/byteorder.h, that's um, not being enforced... > ># if defined(_BIG_ENDIAN) && !defined(ntohl) && !defined(__lint) > /* big-endian */ ># define ntohl(x) (x) ># define ntohs(x) (x) ># define htonl(x) (x) ># define htons(x) (x) > ># elif !defined(ntohl) /* little-endian */ > > I think I may give our friends out in CA a call here... I've put in a ticket with Sun on this, but in thinking about this, I'm pretty sure this kind of definition is widespread (on our Linux 2.6.9 login cluster it's the same story in netinet/in.h), so while I can point it out to Sun, expecting strong typing to come out of the byteorder functions is probably a general mistake. Since the functions explicitly want a uint32_t or a uint16_t as the argument, the 100% proper thing to do would seem to me to do an explicit typecast in the argument to these functions. If it's just a null macro, that solves the problem, and if it's a real function, it's good form anyway. -Michael From baconm at email.unc.edu Fri Jun 19 18:10:12 2009 From: baconm at email.unc.edu (Michael Bacon) Date: Fri, 19 Jun 2009 18:10:12 -0400 Subject: Repeat recovers on databases In-Reply-To: <332C5ACDA5A2292A07DB6498@dhcp00032.its.unc.edu> References: <272BC2FE0F18300DCCDCDA04@dhcp00032.its.unc.edu> <0E5A945BF34FF898C9F1E390@dhcp00032.its.unc.edu> <20090618235703.GA5674@brong.net> <12A1FE1EA0F010454F35FE96@dhcp00032.its.unc.edu> <332C5ACDA5A2292A07DB6498@dhcp00032.its.unc.edu> Message-ID: --On June 19, 2009 4:12:40 PM -0400 Michael Bacon wrote: > (Dropping info-cyrus on the followup) > > --On June 19, 2009 3:43:43 PM -0400 Michael Bacon > wrote: > >> --On June 19, 2009 9:57:03 AM +1000 Bron Gondwana >> wrote: >> >>> On Thu, Jun 18, 2009 at 05:44:19PM -0400, Michael Bacon wrote: >>>> Another one stomped here. This time, it's a 32/64 bit issue. 
myinit >>>> in cyrusdb_skiplist.c assumes that type_t is 4 bytes long, and writes >>>> out that many from the current timestamp when creating >>>> $confdir/db/skipstamp. >>> >>> Actually, reading the code, that's not strictly true: >>> >>>> a = htonl(global_recovery); >>>> - if (r != -1) r = write(fd, &a, 4); >>>> + if (r != -1) r = write(fd, &a, sizeof(time_t)); >>> >>> It writes "a", which is the result of calling htonl on global_recovery. >>> >>> If htonl isn't returning a 32 bit value of the lower order bytes of the >>> value that it's given, then this bug is going to be causing a LOT more >>> problems than just this. We assume this works in quite a few other >>> places in the code, including the timestamp value in the skiplist header >>> itself, and in places throughout the mailbox code too. >>> >>> "htonl" => "host to net long" by my reading. There's also htonll for 64 >>> bit values. Is your platform creating net longlongs? >> >> Good question -- this may be a Solaris bug after all. Solaris clearly >> defines in the man page that htonl is supposed to return a uint32_t from >> htonl, but looking at sys/byteorder.h, that's um, not being enforced... >> >># if defined(_BIG_ENDIAN) && !defined(ntohl) && !defined(__lint) >> /* big-endian */ >># define ntohl(x) (x) >># define ntohs(x) (x) >># define htonl(x) (x) >># define htons(x) (x) >> >># elif !defined(ntohl) /* little-endian */ >> >> I think I may give our friends out in CA a call here... > > I've put in a ticket with Sun on this, but in thinking about this, I'm > pretty sure this kind of definition is widespread (on our Linux 2.6.9 > login cluster it's the same story in netinet/in.h), so while I can point > it out to Sun, expecting strong typing to come out of the byteorder > functions is probably a general mistake. Since the functions explicitly > want a uint32_t or a uint16_t as the argument, the 100% proper thing to > do would seem to me to do an explicit typecast in the argument to these > functions. 
If it's just a null macro, that solves the problem, and if > it's a real function, it's good form anyway. Okay, so here's a patch to go against current CVS+Bron's last patch which converts everything over to uint32_t and does explicit typecasting on the arguments to all byteorder calls. This passes my basic, non-production tests, but it may not be the way folks want to proceed, so I'll float it out there for feedback. I realize this requires C99 spec compliance, but is that still problematic in 2009? -Michael -------------- next part -------------- A non-text attachment was scrubbed... Name: skiplist_uint32.patch Type: application/octet-stream Size: 14006 bytes Desc: not available Url : http://lists.andrew.cmu.edu/pipermail/cyrus-devel/attachments/20090619/f6a5a6c6/attachment-0001.obj From brong at fastmail.fm Fri Jun 19 20:54:25 2009 From: brong at fastmail.fm (Bron Gondwana) Date: Sat, 20 Jun 2009 10:54:25 +1000 Subject: Repeat recovers on databases In-Reply-To: <332C5ACDA5A2292A07DB6498@dhcp00032.its.unc.edu> References: <272BC2FE0F18300DCCDCDA04@dhcp00032.its.unc.edu><0E5A945BF34FF898C9F1E390@dhcp00032.its.unc.edu><20090618235703.GA5674@brong.net><12A1FE1EA0F010454F35FE96@dhcp00032.its.unc.edu> <332C5ACDA5A2292A07DB6498@dhcp00032.its.unc.edu> Message-ID: <1245459265.10318.1321282001@webmail.messagingengine.com> On Fri, 19 Jun 2009 16:12 -0400, "Michael Bacon" wrote: > (Dropping info-cyrus on the followup) > > --On June 19, 2009 3:43:43 PM -0400 Michael Bacon > wrote: > > > --On June 19, 2009 9:57:03 AM +1000 Bron Gondwana > > wrote: > ># if defined(_BIG_ENDIAN) && !defined(ntohl) && !defined(__lint) > > /* big-endian */ > ># define ntohl(x) (x) > ># define ntohs(x) (x) > ># define htonl(x) (x) > ># define htons(x) (x) > > > ># elif !defined(ntohl) /* little-endian */ > > > > I think I may give our friends out in CA a call here... 
> > I've put in a ticket with Sun on this, but in thinking about this, I'm > pretty sure this kind of definition is widespread (on our Linux 2.6.9 > login > cluster it's the same story in netinet/in.h), so while I can point it out > to Sun, expecting strong typing to come out of the byteorder functions is > probably a general mistake. Since the functions explicitly want a > uint32_t > or a uint16_t as the argument, the 100% proper thing to do would seem to > me > to do an explicit typecast in the argument to these functions. If it's > just a null macro, that solves the problem, and if it's a real function, > it's good form anyway. I think it's entirely our fault for storing the result in a time_t, which was 64 bits, and of course it got mapped to the last 4 bytes as follows: 0 0 0 0 t t t t And then we treated it like a string and wrote just the first 4 bytes. It's not Sun's bug, it was Cyrus'. The correct thing to do (and the change that I made in the patch I sent) was to store it in a 32 bit value: t t t t I'm working on a patch to replace the whole lot with uint32_t anyway - standard types for the win :) Bron. -- Bron Gondwana brong at fastmail.fm From brong at fastmail.fm Wed Jun 24 23:34:36 2009 From: brong at fastmail.fm (Bron Gondwana) Date: Thu, 25 Jun 2009 13:34:36 +1000 Subject: Incorrect size calculations on bogus messages Message-ID: <20090625033435.GA9422@brong.net> Here's a funny one. I've recreated it as a simple testcase which I'll paste below. Basically, a message with invalid mime structure causes cyrus to put the wrong "size" information in its headers. Seems some spammers have been generating these, and they show up as replication errors because the index size doesn't match the file size. [brong at imap3 hm]$ cat /mnt/data8/slot308/store23/data/b/user/brong/390978. 
Return-Path: Received: from compute2.internal (compute2.internal [10.202.2.42]) by store23m.internal (Cyrus v2.3.14-fmsvn18904-c7f26adc) with LMTPA; Wed, 24 Jun 2009 21:53:09 -0400 X-Sieve: CMU Sieve 2.3 X-Spam-score: 1.4 X-Spam-hits: BAYES_20 -0.74, MISSING_MID 0.001, NO_RECEIVED -0.001, NO_RELAYS -0.001, TVD_SPACE_RATIO 2.219, BAYES_USED user X-Spam-source: IP='127.0.0.1', Host='unk', Country='unk', FromHeader='fm', MailFrom='fm' X-Spam-charsets: X-Attached: ForwardedMessage X-Resolved-to: brong at fastmail.fm X-Mail-from: brong at fastmail.fm Received: from test ([10.202.2.231]) by compute2.internal (LMTPProxy); Wed, 24 Jun 2009 21:53:08 -0400 Date: 20 Jun 2009 07:21:45 -0000 MIME-Version: 1.0 To: brong at fastmail.fm Subject: bogusmessage From: brong at fastmail.fm Content-Type: multipart/mixed; boundary="=_31ff156115c676d4fc4fe82130032447" Message-ID: --=_31ff156115c676d4fc4fe82130032447 Content-Transfer-Encoding: Content-Type: message/rfc822; name="ForwardedMessage"; Content-Disposition: inline; filename="ForwardedMessage"; --=_31ff156115c676d4fc4fe82130032447-- [brong at imap3 hm]$ ls -la /mnt/data8/slot308/store23/data/b/user/brong/390978. -rw------- 1 cyrus mail 1189 Jun 24 21:53 /mnt/data8/slot308/store23/data/b/user/brong/390978. [brong at imap3 hm]$ utils/oneoff/index_uids.pl -u 390978 -D /mnt/meta8/slot308/store23/meta/b/user/brong/cyrus.index Uid: 390978 InternalDate: 1245894789 SentDate: 1245513600 Size: 1147 HeaderSize: 961 ContentOffset: 961 CacheOffset: 1066472 LastUpdated: 1245894810 SystemFlags: 00000000000000000000000000000000 UserFlags: 00000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000 ContentLines: 5 CacheVersion: 2 MessageGuid: a8c26e46c4ce83fb5d77d360f024e3bbaa8d7371 Modseq: 14869 ======================= So, the file on disk is 1189 bytes long, but the cyrus.index says the size is 1147 bytes. 

The reason for this is that cyrus builds the
bodystructure and calculates the size of all
the component parts rather than just using the
actual file size.

I guess my question is - is there any reason not
to just put the actual size-in-bytes of the file
into the index header record? Envelope parsing
might be slightly messed up, but at least the
basics will be OK.

Bron.

From murch at andrew.cmu.edu Thu Jun 25 10:36:53 2009
From: murch at andrew.cmu.edu (Ken Murchison)
Date: Thu, 25 Jun 2009 10:36:53 -0400
Subject: Incorrect size calculations on bogus messages
In-Reply-To: <20090625033435.GA9422@brong.net>
References: <20090625033435.GA9422@brong.net>
Message-ID: <4A438B85.9090002@andrew.cmu.edu>

I wonder if we should just reject these messages in lmtpd.

Bron Gondwana wrote:
> Here's a funny one. I've recreated it as a simple testcase which I'll
> paste below. Basically, a message with invalid mime structure causes
> cyrus to put the wrong "size" information in its headers.
>
> Seems some spammers have been generating these, and they show up as
> replication errors because the index size doesn't match the file size.
>
> [brong at imap3 hm]$ cat /mnt/data8/slot308/store23/data/b/user/brong/390978.
> Return-Path:
> Received: from compute2.internal (compute2.internal [10.202.2.42])
>     by store23m.internal (Cyrus v2.3.14-fmsvn18904-c7f26adc) with LMTPA;
>     Wed, 24 Jun 2009 21:53:09 -0400
> X-Sieve: CMU Sieve 2.3
> X-Spam-score: 1.4
> X-Spam-hits: BAYES_20 -0.74, MISSING_MID 0.001, NO_RECEIVED -0.001, NO_RELAYS -0.001,
>     TVD_SPACE_RATIO 2.219, BAYES_USED user
> X-Spam-source: IP='127.0.0.1', Host='unk', Country='unk', FromHeader='fm', MailFrom='fm'
> X-Spam-charsets:
> X-Attached: ForwardedMessage
> X-Resolved-to: brong at fastmail.fm
> X-Mail-from: brong at fastmail.fm
> Received: from test ([10.202.2.231])
>     by compute2.internal (LMTPProxy); Wed, 24 Jun 2009 21:53:08 -0400
> Date: 20 Jun 2009 07:21:45 -0000
> MIME-Version: 1.0
> To: brong at fastmail.fm
> Subject: bogusmessage
> From: brong at fastmail.fm
> Content-Type: multipart/mixed;
>     boundary="=_31ff156115c676d4fc4fe82130032447"
> Message-ID:
>
> --=_31ff156115c676d4fc4fe82130032447
> Content-Transfer-Encoding:
> Content-Type: message/rfc822;
>     name="ForwardedMessage";
> Content-Disposition: inline;
>     filename="ForwardedMessage";
> --=_31ff156115c676d4fc4fe82130032447--
>
> [brong at imap3 hm]$ ls -la /mnt/data8/slot308/store23/data/b/user/brong/390978.
> -rw------- 1 cyrus mail 1189 Jun 24 21:53 /mnt/data8/slot308/store23/data/b/user/brong/390978.
>
> [brong at imap3 hm]$ utils/oneoff/index_uids.pl -u 390978 -D /mnt/meta8/slot308/store23/meta/b/user/brong/cyrus.index
> Uid: 390978
> InternalDate: 1245894789
> SentDate: 1245513600
> Size: 1147
> HeaderSize: 961
> ContentOffset: 961
> CacheOffset: 1066472
> LastUpdated: 1245894810
> SystemFlags: 00000000000000000000000000000000
> UserFlags: 00000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
> ContentLines: 5
> CacheVersion: 2
> MessageGuid: a8c26e46c4ce83fb5d77d360f024e3bbaa8d7371
> Modseq: 14869
>
> =======================
>
> So, the file on disk is 1189 bytes long, but the
> cyrus.index says the size is 1147 bytes.
>
> The reason for this is that cyrus builds the
> bodystructure and calculates the size of all
> the component parts rather than just using the
> actual file size.
>
> I guess my question is - is there any reason not
> to just put the actual size-in-bytes of the file
> into the index header record? Envelope parsing
> might be slightly messed up, but at least the
> basics will be OK.
>
> Bron.
>
>

--
Kenneth Murchison
Systems Programmer
Carnegie Mellon University

From carson at taltos.org Thu Jun 25 14:29:38 2009
From: carson at taltos.org (Carson Gaspar)
Date: Thu, 25 Jun 2009 11:29:38 -0700
Subject: Incorrect size calculations on bogus messages
In-Reply-To: <4A438B85.9090002@andrew.cmu.edu>
References: <20090625033435.GA9422@brong.net> <4A438B85.9090002@andrew.cmu.edu>
Message-ID: <4A43C212.1000909@taltos.org>

Ken Murchison wrote:
> I wonder if we should just reject these messages in lmtpd.

I wouldn't complain. When I was at Morgan Stanley I worked with Victor
Duchovni on a MIME canonicalizer. We discovered all _sorts_ of
"interesting" MIME and base64 issues. It is possible to create a mail
message in such a way that 6 different mail clients will see 6 different
attachments.
If you realize that your antivirus is just such a client, the security
issues quickly become apparent... And don't get me started on the ZIP
format...

--
Carson

From lists at egidy.de Fri Jun 26 03:26:40 2009
From: lists at egidy.de (Gerd v. Egidy)
Date: Fri, 26 Jun 2009 09:26:40 +0200
Subject: Incorrect size calculations on bogus messages
In-Reply-To: <4A43C212.1000909@taltos.org>
References: <20090625033435.GA9422@brong.net> <4A438B85.9090002@andrew.cmu.edu> <4A43C212.1000909@taltos.org>
Message-ID: <200906260926.40535.lists@egidy.de>

Hi Carson,

> > I wonder if we should just reject these messages in lmtpd.
>
> I wouldn't complain. When I was at Morgan Stanley I worked with Victor
> Duchovni on a MIME canonicalizer.

is this an open source project available somewhere?

Kind regards,

Gerd

--
Address (better: trap) for people I really don't want to get mail from:
jonas at cactusamerica.com

From lists at egidy.de Fri Jun 26 03:35:30 2009
From: lists at egidy.de (Gerd v. Egidy)
Date: Fri, 26 Jun 2009 09:35:30 +0200
Subject: Incorrect size calculations on bogus messages
In-Reply-To: <4A438B85.9090002@andrew.cmu.edu>
References: <20090625033435.GA9422@brong.net> <4A438B85.9090002@andrew.cmu.edu>
Message-ID: <200906260935.30929.lists@egidy.de>

Hi Ken,

> I wonder if we should just reject these messages in lmtpd.

if your mailer daemon has already accepted the mail but lmtp rejects it, you
have to create a bounce message. When the message is spam you'll usually get a
faked sender address and have problems delivering the bounce. I've been
creating all kinds of solutions to get rid of such bounces.

So please don't add another case where lmtp rejects a message without at the
same time creating a filter for the mailer daemon which uses exactly the same
criteria for rejection. The filter could be created for the milter interface
and thus work for sendmail and postfix.
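One way to keep the rejection criteria identical in both places would be to
put the test in a single function that a milter and lmtpd both call. The
sketch below is an illustration under assumptions only:
part_headers_unterminated() is a hypothetical name, not an existing Cyrus or
milter function, and the criterion it applies (a part's headers running
straight into the next boundary line with no blank line, as in the sample
message earlier in the thread) is a guess at what makes that message "bogus".
Boundary matching is simplified to a prefix comparison.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Scanner states: inside a part's header block, or anywhere else. */
enum scan_state { OUTSIDE_PART_HEADERS, IN_PART_HEADERS };

/* True if the line of length len starts with "--" followed by boundary. */
static int is_boundary_line(const char *line, size_t len,
                            const char *boundary, size_t blen)
{
    return len >= blen + 2 && line[0] == '-' && line[1] == '-' &&
           memcmp(line + 2, boundary, blen) == 0;
}

/*
 * Hypothetical shared criterion: return 1 if some part's header block is
 * cut off by a boundary line before any blank line was seen, else 0.
 * `msg` is the whole message as a NUL-terminated string; `boundary` is the
 * boundary parameter value without the leading "--".
 */
int part_headers_unterminated(const char *msg, const char *boundary)
{
    enum scan_state state = OUTSIDE_PART_HEADERS;
    const char *p = msg;
    size_t blen;

    assert(msg != NULL && boundary != NULL);
    blen = strlen(boundary);

    while (*p != '\0') {
        const char *eol = strchr(p, '\n');
        size_t raw = eol ? (size_t)(eol - p) : strlen(p);
        size_t len = raw;

        /* Ignore the CR of a CRLF line ending. */
        if (len > 0 && p[len - 1] == '\r')
            len--;

        if (is_boundary_line(p, len, boundary, blen)) {
            if (state == IN_PART_HEADERS)
                return 1;   /* headers ran straight into a boundary */
            /* "--boundary--" closes the multipart; "--boundary" opens a part. */
            if (len >= blen + 4 && p[blen + 2] == '-' && p[blen + 3] == '-')
                state = OUTSIDE_PART_HEADERS;
            else
                state = IN_PART_HEADERS;
        } else if (state == IN_PART_HEADERS && len == 0) {
            state = OUTSIDE_PART_HEADERS;   /* blank line ends the headers */
        }

        p += raw;
        if (eol)
            p++;
    }
    return 0;
}
```

A filter in front of the MTA could run this check once the full message has
been received and refuse it during the SMTP dialogue, so that no bounce has to
be generated; lmtpd applying the same function would then be using exactly the
same test.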

Kind regards,

Gerd

--
Address (better: trap) for people I really don't want to get mail from:
jonas at cactusamerica.com

From woods-cyrus at weird.com Sat Jun 27 21:47:42 2009
From: woods-cyrus at weird.com (Greg A. Woods)
Date: Sat, 27 Jun 2009 21:47:42 -0400
Subject: Incorrect size calculations on bogus messages
In-Reply-To: <200906260935.30929.lists@egidy.de>
References: <20090625033435.GA9422@brong.net> <4A438B85.9090002@andrew.cmu.edu> <200906260935.30929.lists@egidy.de>
Message-ID:

At Fri, 26 Jun 2009 09:35:30 +0200, "Gerd v. Egidy" wrote:
Subject: Re: Incorrect size calculations on bogus messages
>
> if your mailer daemon has already accepted the mail but lmtp rejects it, you
> have to create a bounce message. When the message is spam you'll usually get a
> faked sender address and have problems delivering the bounce. I've been
> creating all kinds of solutions to get rid of such bounces.
>
> So please don't add another case where lmtp rejects a message without at the
> same time creating a filter for the mailer daemon which uses exactly the same
> criteria for rejection. The filter could be created for the milter interface
> and thus work for sendmail and postfix.

Seconded, and agreed 10^100 fold and more.

Never EVER put policy rejections in the LDA -- only ever in the MTA.

Backscatter attacks, be they purposeful or "accidental", are never fun.
Unfortunately they are an ongoing reality for all too many sites.

--
Greg A. Woods
+1 416 218-0098 VE3TCP RoboHack
Planix, Inc. Secrets of the Weird

From carson at taltos.org Sun Jun 28 05:14:06 2009
From: carson at taltos.org (Carson Gaspar)
Date: Sun, 28 Jun 2009 02:14:06 -0700
Subject: Incorrect size calculations on bogus messages
In-Reply-To: <200906260926.40535.lists@egidy.de>
References: <20090625033435.GA9422@brong.net> <4A438B85.9090002@andrew.cmu.edu> <4A43C212.1000909@taltos.org> <200906260926.40535.lists@egidy.de>
Message-ID: <4A47345E.30000@taltos.org>

Gerd v. Egidy wrote:
> Hi Carson,
>
>>> I wonder if we should just reject these messages in lmtpd.
>> I wouldn't complain. When I was at Morgan Stanley I worked with Victor
>> Duchovni on a MIME canonicalizer.
>
> is this an open source project available somewhere?

Not as far as I know, sadly.

--
Carson