From wes at umich.edu Tue Aug 5 15:23:11 2008
From: wes at umich.edu (Wesley Craig)
Date: Tue, 5 Aug 2008 15:23:11 -0400
Subject: bug in the proxy module ...
In-Reply-To: <7EF5DBE4C76A7B4DA655334E9F2BFD26591E7131D5@FRSPX100.fr01.awl.atosorigin.net>
References: <4B9DAB02-1117-4349-AD5C-38E4278AABE7@umich.edu> <484E8AAB.6000305@andrew.cmu.edu> <21FEDCA5-428A-4DDD-936B-F428731ABEC8@umich.edu> <7EF5DBE4C76A7B4DA655334E9F2BFD26591E7131D5@FRSPX100.fr01.awl.atosorigin.net>
Message-ID:

I think this is fundamentally the same as calling select, no?

I've done a little more analysis of when proxy_check_input() is called -- just two places in imapd. The first is when proxying for the IDLE command. In that case, the MUA issues the IDLE command, and if the backend supports IDLE, the proxy sends the IDLE command to the backend. The proxy then sits for whatever the idle timeout is in proxy_check_input(). The old version of this code was fine, because proxy_check_input() is called in a loop. So if there was more data, it would be copied on the next pass. A slight refinement might be to prioritize copying from the server to the client over observing that the client has provided new input, thus breaking the loop.

The other imapd call to proxy_check_input() is in the main loop, prior to accepting the next command. Prioritizing copies over checking for client input would probably improve this case as well.

The pop3d bug in particular is related to how bitpipe() is written, i.e., it doesn't have any protocol knowledge at all. You'll also notice that bitpipe() is responsible for flushing the data, proxy_check_input() doesn't do that. Changing proxy_check_input() to handle server to client IO copies first and then return would give the outer loop a chance to run. Simply backing out this change:

    https://bugzilla.andrew.cmu.edu/cgi-bin/cvsweb.cgi/src/cyrus/imap/proxy.c.diff?r1=1.3;r2=1.4

will probably fix pop3d.
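As a rough illustration of the "copy server output first, then return" ordering described above, here is a minimal sketch. It uses plain file descriptors and poll() rather than Cyrus's prot layer, and check_input_once() is a hypothetical stand-in, not the real proxy_check_input(); the return values are likewise an assumption made for the example.

```c
#include <poll.h>
#include <unistd.h>

/*
 * Hypothetical stand-in for proxy_check_input(), using raw file
 * descriptors instead of Cyrus prot streams.
 *
 * Returns  1 if the client has input the caller must handle,
 *          0 if there is nothing for the caller to do yet (one buffer
 *            was copied from the server, or nothing was ready),
 *         -1 on error or EOF from the server.
 */
static int check_input_once(int client_in, int server_in, int client_out,
                            int timeout_ms)
{
    struct pollfd fds[2];
    char buf[4096];
    ssize_t n;

    fds[0].fd = server_in;  fds[0].events = POLLIN;  fds[0].revents = 0;
    fds[1].fd = client_in;  fds[1].events = POLLIN;  fds[1].revents = 0;

    if (poll(fds, 2, timeout_ms) < 0) return -1;

    /* Server output takes priority: copy at most one buffer, then return
     * so the outer loop can poll again.  There is no inner read loop, so
     * a read of exactly sizeof(buf) bytes never leads to a second,
     * possibly blocking, read. */
    if (fds[0].revents & (POLLIN | POLLHUP)) {
        n = read(server_in, buf, sizeof(buf));
        if (n <= 0) return -1;
        /* Simplification: a short write is treated as an error. */
        if (write(client_out, buf, (size_t)n) != n) return -1;
        return 0;
    }

    /* Only report client input once the server has nothing pending. */
    if (fds[1].revents & POLLIN) return 1;

    return 0;
}
```

An outer loop would call this repeatedly and only parse the next client command when it returns 1. Whether that ordering is enough to avoid the pipelining problem mentioned in the CVS log is exactly the open question in this thread.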
A "more correct" fix for the problem referenced in that change:

    make sure we send all available data, not just one buffer full.
    this solves a pipelining problem where a response to a command run
    on a proxy could be output in the middle of a response to a command
    run on a backend

is probably the re-ordering. I think there is in fact no need to "send all available data", since the outer loop should see to it that proxy_check_input() is called multiple times. Ideally, the outer loop would also encapsulate whatever protocol specific knowledge is required -- none in the case of callers like bitpipe(), significant in the case of imapd.

Perhaps someone knows how to exercise the pipelining problem referenced in the above change?

:wes

ps The "reordering" can probably be achieved by adding an else to this if:

    if ((err = prot_error(pin)) != NULL) {
        ...
    } else {
        return 0;
    }
    ...

after backing out the change above. This way proxy_check_input() will never tell the caller that there's "unhandled" input from the client until no IO copying can be done.

On 29 Jul 2008, at 03:54, Poujol Christophe wrote:
> This is a proposal to solve the possible deadlock in the proxy.
>
> The problem was: if the length of the last packet received by the proxy from a backend process is 4096 bytes, the proxy expects to receive more data while the backend process waits for commands.
>
> I propose a shortcut:
> if the proxy receives 4096 bytes, it puts the last byte back into the buffer and sends the first 4095 bytes through the output stream (the last byte will be read on the next loop).
> if the proxy receives fewer than 4096 bytes, it works as before.
>
> The initial code could be modified from:
>
>     if (pout) {
>         const char *err;
>         char buf[4096];
>         int c;
>
>         do {
>             c = prot_read(pin, buf, sizeof(buf));
>
>             if (c == 0 || c < 0) break;
>             prot_write(pout, buf, c);
>         } while (c == sizeof(buf));
>
>         if ((err = prot_error(pin)) != NULL) {
>
> into
>
>     if (pout) {
>         const char *err;
>         char buf[4096];
>         int c;
>
>         do {
>             c = prot_read(pin, buf, sizeof(buf));
>
>             if (c == 0 || c < 0) break;
>
>             if (c == sizeof(buf)) {
>                 prot_ungetc(buf[sizeof(buf) - 1], pin);
>                 prot_write(pout, buf, c - 1);
>             }
>             else {
>                 prot_write(pout, buf, c);
>             }
>
>         } while (c == sizeof(buf));
>
>         if ((err = prot_error(pin)) != NULL) {
>
> -----Original Message-----
> From: cyrus-devel-bounces at lists.andrew.cmu.edu [mailto:cyrus-devel-bounces at lists.andrew.cmu.edu] On behalf of Wesley Craig
> Sent: Wednesday, 11 June 2008 21:47
> To: Ken Murchison
> Cc: cyrus-devel at lists.andrew.cmu.edu
> Subject: Re: bug in the proxy module ...
>
> On 10 Jun 2008, at 10:07, Ken Murchison wrote:
>> Any suggestions? I'm off thinking about other things at the moment.
>
> The comment associated with the change is:
>
>     make sure we send all available data, not just one buffer full.
>     this solves a pipelining problem where a response to a command run
>     on a proxy could be output in the middle of a response to a command
>     run on a backend
>
> Both versions call prot_select() once. The old code The new code (attempts) to copy input to output until end of input, but since it's only called prot_select() once, that's a problem. There are a couple of possibilities, perhaps you're more familiar with prot and its byzantine usage, but here's my analysis:
>
> 1) Instead of looping on the size of the read, we loop until prot_read() returns == 0 or < 0. This assumes that pin isn't set to allow blocking.
> I don't like this solution, since I'm not terribly interested in an exhaustive analysis of every possible pin that proxy_check_input() might get. Maybe you know something I don't, tho.
>
> 2) Introduce prot_select() into the read/write loop. This will allow you to know that there's still input available really without blocking. Of course, if it's a very large block of data, you might not see the next block, return control to the calling function, and get the same pipelining problem mentioned in the CVS log above. Assuming you're not worried about that scenario, it's a good solution because it introduces the idea that output from the backend server is handled prior to input from the client.
>
> 3) Continuing on the precedence idea above, split the loop handling so that backend output is always handled first. Also, always return control to the caller if you ever have backend output. This way, you'll only ever take input from the client if the backend isn't sending anything. I doubt this solves the race mentioned in (2), either tho.
>
> 4) Restructure the routines calling proxy_check_input to know the structure of the commands being sent and the corresponding responses. This is the surest way to fix the above problem, i.e., don't let the proxy server respond to a command until the response to the command sent by the backend is done. Of course, this is a huge pain, probably involving a ton of additional code.

From dimma at higis.ru Tue Aug 5 15:57:23 2008
From: dimma at higis.ru (Dmitriy Kirhlarov)
Date: Tue, 05 Aug 2008 23:57:23 +0400
Subject: ptloader problem
Message-ID: <4898B0A3.40106@higis.ru>

Hi, list

We found a problem: when ptloader is built with LDAP support by gcc4 on the amd64 platform, it doesn't work.

After investigating the ptloader core with gdb, we found the cause. (I'm sorry for a possibly improper problem description.)

1. ldap.h has this hint:
----
#if LDAP_DEPRECATED
LDAP_F( char ** )
ldap_get_values LDAP_P((    /* deprecated, use ldap_get_values_len */
    LDAP *ld,
    LDAPMessage *entry,
    LDAP_CONST char *target ));
----

2. Cyrus builds without "-DLDAP_DEPRECATED" by default, so ldap_get_values is treated as returning "int32".

3. ptloader running:
3.1 calls libldap
3.2 libldap gets values from the server
3.3 returns a pointer to ptloader as int64
3.4 ptloader gets it as _int32_ and dumps core

My test configuration:
cyrus-imapd-2.3.{8,11} with ldap support
cyrus-sasl-saslauthd-2.1.22 with ldap support
openldap 2.{3,4}
FreeBSD 7.0 amd64

This configuration works very well on FreeBSD 6.x amd64.

userbase in ldap, authentication over saslauthd, authorization over ptloader.

How can I report a bug to the developers? I can provide my configs and detail the test procedure, if needed.

WBR
Dmitriy

From mloftis at wgops.com Fri Aug 15 14:07:48 2008
From: mloftis at wgops.com (Michael Loftis)
Date: Fri, 15 Aug 2008 12:07:48 -0600
Subject: 2.2.13 authentication problems?
Message-ID: <29A9FC9A1BB0AE7ECBED365F@ZOP-MACTEL.local>

Our 2.2.13 frontends seem to have some...weird authentication problems with our (one remaining) 2.1 backend. After some indeterminate amount of time or transactions they can no longer authenticate to the backends, but ONLY the imap proxyd's. The error sent to the client is "Server(s) unavailable", and the frontend logs "couldn't authenticate to backend server: bad protocol / cancel" -- the backend doesn't appear to see any auth attempt, just a STARTTLS ... after that I can't follow since it's TLS.

So *ANY* pointers other than "upgrade" would be appreciated.

Please note everything was working until we brought other 2.2 backends into production, so I'm thinking some bug wherein the frontends are not resetting the SASL state or something, and after communicating with a 2.2 backend, have trouble (somehow??) communicating with our 2.1 backend.
I can authenticate just fine manually with AUTHENTICATE PLAIN using openssl s_client, so it's not the backend.

It's exceedingly difficult to upgrade this particular 2.1 box, partly because you can't migrate mailboxes off of 2.1 servers (again because of the TLS stuff, I patched our backends to allow PLAIN because there was no other option back then, we do NOT store plain text passwords and we're not using Kerb, so the ONLY option to us is PLAIN).

As a complete side note let me reregister an old gripe of mine -- the TLS/SSL/etc requirement with PLAIN is still one of the most silly things. Plain text between the backends on the same switch should be allowed, it would sure make this debuggable. At the very least, it's a local policy decision, not something that should be hardcoded. I could be using IPSEC between the hosts, or some other external security mechanism, or anything, but you make NO allowance for that.

--
"Genius might be described as a supreme capacity for getting its possessors into trouble of all kinds."
-- Samuel Butler

From mloftis at wgops.com Fri Aug 15 14:17:39 2008
From: mloftis at wgops.com (Michael Loftis)
Date: Fri, 15 Aug 2008 12:17:39 -0600
Subject: 2.2.13 authentication problems?
In-Reply-To: <29A9FC9A1BB0AE7ECBED365F@ZOP-MACTEL.local>
References: <29A9FC9A1BB0AE7ECBED365F@ZOP-MACTEL.local>
Message-ID: <4C527F5898A88A4EAF8A6517@ZOP-MACTEL.local>

Also LMTP is NOT affected either, it appears to ONLY be proxyd (2.2)->imapd (2.1) -- and even then apparently only after the same proxyd talks to a 2.2 based backend. Everything was fine until we moved some of our production systems into 2.2, only an occasional error after the frontend upgrades. It bothered me but I dismissed it. The only reason it was occasional was because we only had a few testing mailboxes on the new 2.2 backend. We then upgraded an old 2.1 backend to 2.2 and that's when the problems really started.
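Returning to the ptloader report earlier in this digest: the failure described there is consistent with a pointer-returning function being called without a visible prototype, so that the caller treats the result as a 32-bit int. The sketch below only simulates that effect with integer arithmetic; seen_through_int() is a hypothetical helper, and the sign-extension behaviour shown is what common amd64 compilers do, not something confirmed for this particular build.

```c
#include <stdint.h>

/*
 * What a caller would see if a function returning a 64-bit pointer were
 * called through an implicit "int f()" declaration: only the low 32 bits
 * survive, and they are sign-extended when converted back to a pointer.
 * The narrowing conversion is implementation-defined in C; gcc and clang
 * wrap modulo 2^32.
 */
static uint64_t seen_through_int(uint64_t real_pointer)
{
    int32_t as_int = (int32_t)(uint32_t)real_pointer;  /* truncation */
    return (uint64_t)(int64_t)as_int;                  /* sign extension */
}
```

A pointer below 4 GB with bit 31 clear round-trips unchanged, which may be why the same build can appear to work on one system and crash on another. Building with LDAP_DEPRECATED defined, or moving to ldap_get_values_len() as the ldap.h comment suggests, would presumably give the compiler the real prototype.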
From wes at umich.edu Fri Aug 15 15:24:52 2008
From: wes at umich.edu (Wesley Craig)
Date: Fri, 15 Aug 2008 15:24:52 -0400
Subject: 2.2.13 authentication problems?
In-Reply-To: <29A9FC9A1BB0AE7ECBED365F@ZOP-MACTEL.local>
References: <29A9FC9A1BB0AE7ECBED365F@ZOP-MACTEL.local>
Message-ID:

On 15 Aug 2008, at 14:07, Michael Loftis wrote:
> Our 2.2.13 frontends seem to have some...weird authentication problems with our (one remaining) 2.1 backend. After some indeterminate amount of time or transactions they can no longer authenticate to the backends, but ONLY the imap proxyd's. The error sent to the client is Server(s) unavailable, and the frontend logs couldn't authenticate to backend server: bad protocol / cancel -- the backend doesn't appear to see any auth attempt, just a STARTTLS ... after that I can't follow since it's TLS.

There are tools that will decrypt the session. See wireshark, ettercap, etc. Without doing an exhaustive search, I expect most do.

> Please note everything was working until we brought other 2.2 backends into production, so I'm thinking some bug wherein the frontends are not resetting the SASL state or something, and after communicating with a 2.2 backend, have trouble (somehow??) communicating with our 2.1 backend.

That's a good guess. I've recently found a place in the 2.3 code where the protocol structure for IMAP was being edited during connection establishment. Since my proxyd was communicating with several different backend versions, the (incorrect) change to the IMAP protocol description was causing a core dump.

> As a complete side note let me reregister an old gripe of mine -- the TLS/SSL/etc requirement with PLAIN is still one of the most silly things.

"allowplaintext: yes" doesn't work for you? I never ran 2.1, and haven't run 2.2 in years, so maybe that option is newer....

:wes

From mloftis at wgops.com Fri Aug 15 15:54:50 2008
From: mloftis at wgops.com (Michael Loftis)
Date: Fri, 15 Aug 2008 13:54:50 -0600
Subject: 2.2.13 authentication problems?
In-Reply-To:
References: <29A9FC9A1BB0AE7ECBED365F@ZOP-MACTEL.local>
Message-ID: <200D55C5060BF6C2DBC89810@ZOP-MACTEL.local>

--On August 15, 2008 3:24:52 PM -0400 Wesley Craig wrote:

> On 15 Aug 2008, at 14:07, Michael Loftis wrote:
>> Our 2.2.13 frontends seem to have some...weird authentication problems with our (one remaining) 2.1 backend. After some indeterminate amount of time or transactions they can no longer authenticate to the backends, but ONLY the imap proxyd's. The error sent to the client is Server(s) unavailable, and the frontend logs couldn't authenticate to backend server: bad protocol / cancel -- the backend doesn't appear to see any auth attempt, just a STARTTLS ... after that I can't follow since it's TLS.
>
> There are tools that will decrypt the session. See wireshark, ettercap, etc. Without doing an exhaustive search, I expect most do.
>
>> Please note everything was working until we brought other 2.2 backends into production, so I'm thinking some bug wherein the frontends are not resetting the SASL state or something, and after communicating with a 2.2 backend, have trouble (somehow??) communicating with our 2.1 backend.
>
> That's a good guess. I've recently found a place in the 2.3 code where the protocol structure for IMAP was being edited during connection establishment. Since my proxyd was communicating with several different backend versions, the (incorrect) change to the IMAP protocol description was causing a core dump.

Can you point me to any code lines so maybe I can start looking? Might be it's just not causing a core dump in my version but it's still causing auth issues "somehow".

>> As a complete side note let me reregister an old gripe of mine -- the TLS/SSL/etc requirement with PLAIN is still one of the most silly things.
>
> "allowplaintext: yes" doesn't work for you? I never ran 2.1, and haven't run 2.2 in years, so maybe that option is newer....

Nope, never did as far as I know. It'll allow PLAIN but *ONLY* in conjunction with TLS or SSL. Otherwise it won't present the mechanism and will refuse it if tried. It *WILL* work with IMAP LOGIN or POP3 USER+PASS commands w/o TLS/SSL though. W/o that PLAIN won't be accepted at all. At least this is the behavior I've observed in 2.2 and 2.1.

>
> :wes

--
"Genius might be described as a supreme capacity for getting its possessors into trouble of all kinds."
-- Samuel Butler

From wes at umich.edu Fri Aug 15 17:23:58 2008
From: wes at umich.edu (Wesley Craig)
Date: Fri, 15 Aug 2008 17:23:58 -0400
Subject: 2.2.13 authentication problems?
In-Reply-To: <200D55C5060BF6C2DBC89810@ZOP-MACTEL.local>
References: <29A9FC9A1BB0AE7ECBED365F@ZOP-MACTEL.local> <200D55C5060BF6C2DBC89810@ZOP-MACTEL.local>
Message-ID: <583D1549-64C8-455E-B216-FFC06E6EFDF1@umich.edu>

On 15 Aug 2008, at 15:54, Michael Loftis wrote:
> Can you point me to any code lines so maybe I can start looking? Might be it's just not causing a core dump in my version but it's still causing auth issues "somehow".

The bug I'm thinking of was introduced in 2.3, so it won't be the same. But looking over imap/backend.c is probably worth the effort. You can also introduce some logging, if you don't want to bother with SSL decryption.

>> "allowplaintext: yes" doesn't work for you? I never ran 2.1, and haven't run 2.2 in years, so maybe that option is newer....
>
> Nope, never did as far as I know. It'll allow PLAIN but *ONLY* in conjunction with TLS or SSL. Otherwise it won't present the mechanism and will refuse it if tried. It *WILL* work with IMAP LOGIN or POP3 USER+PASS commands w/o TLS/SSL though.
> W/o that PLAIN won't be accepted at all. At least this is the behavior I've observed in 2.2 and 2.1.

Hm, you might try examining the SASL secprops, of both the client (proxyd) and the server (the backend).

:wes

From mloftis at wgops.com Fri Aug 15 18:41:20 2008
From: mloftis at wgops.com (Michael Loftis)
Date: Fri, 15 Aug 2008 16:41:20 -0600
Subject: 2.2.13 authentication problems?
In-Reply-To: <583D1549-64C8-455E-B216-FFC06E6EFDF1@umich.edu>
References: <29A9FC9A1BB0AE7ECBED365F@ZOP-MACTEL.local> <200D55C5060BF6C2DBC89810@ZOP-MACTEL.local> <583D1549-64C8-455E-B216-FFC06E6EFDF1@umich.edu>
Message-ID: <7B9A3A2E64B7309F352488C5@ZOP-MACTEL.local>

--On August 15, 2008 5:23:58 PM -0400 Wesley Craig wrote:

> On 15 Aug 2008, at 15:54, Michael Loftis wrote:
>> Can you point me to any code lines so maybe I can start looking? Might be it's just not causing a core dump in my version but it's still causing auth issues "somehow".
>
> The bug I'm thinking of was introduced in 2.3, so it won't be the same. But looking over imap/backend.c is probably worth the effort. You can also introduce some logging, if you don't want to bother with SSL decryption.

Yeah I'm just not quite sure where to start. I already discovered (and had to patch out) the errant bind() call in backend_connect. Found out that was removed in 2.3 (most of our frontend proxies mention their 'servername' as 127.0.0.1 in /etc/hosts, WORSE they connect to the backends using an entirely different interface unrelated to the servername.)

From jc at irbs.com Tue Aug 19 19:55:03 2008
From: jc at irbs.com (John Capo)
Date: Tue, 19 Aug 2008 19:55:03 -0400
Subject: mbexamine and block size cache files
Message-ID: <20080819235503.GA56814@exuma.irbs.com>

mbexamine depends on cache file entries being NULL terminated and some (many) are not. This problem cache file's size is a multiple of the filesystem block size resulting from a reconstruct -G.

-rw------- 1 cyrus cyrus 4096 Aug 19 18:44 cyrus.cache

End of the cache file hex dump.

0FC0 3E00 0000 0000 0000 0000 0000 0000 0030 [>..............0]
0FD0 222A 2A2A 6A70 6D74 6D74 6364 732A 2A2A ["***jpmtmtcds***]
0FE0 7765 656B 6C79 6368 616E 6765 732D 6A61 [weeklychanges-ja]
0FF0 6E75 6172 7932 326E 6474 6F32 3574 6822 [nuary22ndto25th"]

Mbexamine cores attempting to print that subject due to the missing NULL.

To>{21}
Cc>{0}
Bcc>{0}
Bus error

Mbexamine appears to be the only thing that depends on NULL terminated strings in the cache file.

Quick fix patch attached that lets me get on with the task at hand.

To>{21}
Cc>{0}
Bcc>{0}
Subjct>{48}"***jpmtmtcds***weeklychanges-january22ndto25th"

John Capo

-------------- next part --------------
Index: imap/mbexamine.c
===================================================================
RCS file: /usr/local/CVS/src/cyrus-imapd/imap/mbexamine.c,v
retrieving revision 1.3
diff -u -r1.3 mbexamine.c
--- imap/mbexamine.c 25 Apr 2008 14:29:50 -0000 1.3
+++ imap/mbexamine.c 19 Aug 2008 23:22:58 -0000
@@ -396,8 +396,14 @@
         printf(" Bcc>{%d}%s\n", CACHE_ITEM_LEN(cacheitem),
                cacheitem + CACHE_ITEM_SIZE_SKIP);
         cacheitem = CACHE_ITEM_NEXT(cacheitem);
-        printf("Subjct>{%d}%s\n", CACHE_ITEM_LEN(cacheitem),
-               cacheitem + CACHE_ITEM_SIZE_SKIP);
+        printf("Subjct>{%d}", CACHE_ITEM_LEN(cacheitem));
+        fflush(stdout);
+
+        if (CACHE_ITEM_LEN(cacheitem))
+            fwrite(cacheitem + CACHE_ITEM_SIZE_SKIP, CACHE_ITEM_LEN(cacheitem), 1, stdout);
+
+        printf("\n");
+        fflush(stdout);

         if(flag) break;
     }

From wes at umich.edu Tue Aug 19 20:19:55 2008
From: wes at umich.edu (Wesley Craig)
Date: Tue, 19 Aug 2008 20:19:55 -0400
Subject: mbexamine and block size cache files
In-Reply-To: <20080819235503.GA56814@exuma.irbs.com>
References: <20080819235503.GA56814@exuma.irbs.com>
Message-ID: <5C4D9A17-E7D9-40ED-AFB4-6C9B79F5EC63@umich.edu>

I wonder if:

    printf("Subjct>{%d}%.*s\n", CACHE_ITEM_LEN(cacheitem), CACHE_ITEM_LEN(cacheitem),
           cacheitem + CACHE_ITEM_SIZE_SKIP);

wouldn't do the trick?

:wes

On 19 Aug 2008, at 19:55, John Capo wrote:
> mbexamine depends on cache file entries being NULL terminated and some (many) are not.

From thomas.jarosch at intra2net.com Wed Aug 20 03:42:38 2008
From: thomas.jarosch at intra2net.com (Thomas Jarosch)
Date: Wed, 20 Aug 2008 09:42:38 +0200
Subject: Code question about mycanonifyid() in lib/auth_unix.c
Message-ID: <200808200942.40368.thomas.jarosch@intra2net.com>

Hello Ken,

I've noticed a little piece of code and wanted to ask about the original idea behind it.

mycanonifyid() has a special code path for identifiers that begin with "group:". In that case we call getgrnam() and then copy the resulting group name into the buffer.

I'm wondering why the code does this. For example, could getgrnam() return a group alias name when querying an LDAP server?

Either the group name can change (so we need to check the buffer as in the attached cyrus-imapd-protect-buffer.patch) or it will never change and we can drop the strcpy() like in the cyrus-imapd-remove-unused-strcpy.patch.

Cheers,
Thomas

-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.andrew.cmu.edu/pipermail/cyrus-devel/attachments/20080820/1f077d95/attachment.html
-------------- next part --------------
A non-text attachment was scrubbed...
Name: cyrus-imapd-protect-buffer.patch
Type: text/x-patch
Size: 502 bytes
Desc: not available
Url : http://lists.andrew.cmu.edu/pipermail/cyrus-devel/attachments/20080820/1f077d95/attachment.bin
-------------- next part --------------
A non-text attachment was scrubbed...
Name: cyurs-imapd-remove-unused-strcpy.patch
Type: text/x-patch
Size: 440 bytes
Desc: not available
Url : http://lists.andrew.cmu.edu/pipermail/cyrus-devel/attachments/20080820/1f077d95/attachment-0001.bin

From jc at irbs.com Wed Aug 20 09:56:51 2008
From: jc at irbs.com (John Capo)
Date: Wed, 20 Aug 2008 09:56:51 -0400
Subject: mbexamine and block size cache files
In-Reply-To: <5C4D9A17-E7D9-40ED-AFB4-6C9B79F5EC63@umich.edu>
References: <20080819235503.GA56814@exuma.irbs.com> <5C4D9A17-E7D9-40ED-AFB4-6C9B79F5EC63@umich.edu>
Message-ID: <20080820135651.GA73439@exuma.irbs.com>

Quoting Wesley Craig (wes at umich.edu):
> I wonder if:
>
>     printf("Subjct>{%d}%.*s\n", CACHE_ITEM_LEN(cacheitem), CACHE_ITEM_LEN(cacheitem),
>            cacheitem + CACHE_ITEM_SIZE_SKIP);
>
> wouldn't do the trick?

That does work on FreeBSD 6.3.

>
> :wes
>
> On 19 Aug 2008, at 19:55, John Capo wrote:
> >mbexamine depends on cache file entries being NULL terminated and
> >some (many) are not.

From wes at umich.edu Wed Aug 20 10:30:14 2008
From: wes at umich.edu (Wesley Craig)
Date: Wed, 20 Aug 2008 10:30:14 -0400
Subject: mbexamine and block size cache files
In-Reply-To: <20080820135651.GA73439@exuma.irbs.com>
References: <20080819235503.GA56814@exuma.irbs.com> <5C4D9A17-E7D9-40ED-AFB4-6C9B79F5EC63@umich.edu> <20080820135651.GA73439@exuma.irbs.com>
Message-ID: <0F2DE84E-8A35-4432-B79B-D0875F044FBE@umich.edu>

On 20 Aug 2008, at 09:56, John Capo wrote:
> That does work on FreeBSD 6.3.

If many entries are not NULL terminated, perhaps all of the string formats should be %.*s with a specified length, then. Are you willing to submit a tested patch?
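For reference, a minimal sketch of the length-bounded %.*s approach discussed above. The item layout (a 4-byte big-endian length followed by that many bytes, with no terminator) is an assumption modeled on the hex dump earlier in this thread; item_len() and format_item() are hypothetical helpers, not the Cyrus CACHE_ITEM_* macros.

```c
#include <stdint.h>
#include <stdio.h>

/* Assumed layout: 4-byte big-endian length, then `length` bytes of text
 * that are NOT guaranteed to be followed by a terminating zero byte. */
static uint32_t item_len(const unsigned char *item)
{
    return ((uint32_t)item[0] << 24) | ((uint32_t)item[1] << 16) |
           ((uint32_t)item[2] << 8)  |  (uint32_t)item[3];
}

/* Format "label>{len}text" into out, reading at most len bytes of text. */
static int format_item(char *out, size_t outsz, const char *label,
                       const unsigned char *item)
{
    uint32_t len = item_len(item);

    /* The precision argument consumed by %.*s must be an int; a real
     * implementation should reject lengths larger than INT_MAX. */
    return snprintf(out, outsz, "%s>{%u}%.*s", label, (unsigned)len,
                    (int)len, (const char *)(item + 4));
}
```

One difference from the fwrite() variant in the first patch: %.*s still stops at an embedded zero byte, whereas fwrite() emits exactly the stated number of bytes.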
:wes

From jc at irbs.com Wed Aug 20 16:20:16 2008
From: jc at irbs.com (John Capo)
Date: Wed, 20 Aug 2008 16:20:16 -0400
Subject: mbexamine and block size cache files
In-Reply-To: <0F2DE84E-8A35-4432-B79B-D0875F044FBE@umich.edu>
References: <20080819235503.GA56814@exuma.irbs.com> <5C4D9A17-E7D9-40ED-AFB4-6C9B79F5EC63@umich.edu> <20080820135651.GA73439@exuma.irbs.com> <0F2DE84E-8A35-4432-B79B-D0875F044FBE@umich.edu>
Message-ID: <20080820202016.GA81890@exuma.irbs.com>

Quoting Wesley Craig (wes at umich.edu):
> On 20 Aug 2008, at 09:56, John Capo wrote:
> >That does work on FreeBSD 6.3.
>
> If many entries are not NULL terminated, perhaps all of the string formats should be %.*s with a specified length, then. Are you willing to submit a tested patch?

Looking closer at the problem, NULL terminated strings are not by design. Each entry is padded to a 4 byte boundary and a NULL will be in the cache file if the entry length is not a 4 byte multiple. If the entry is a 4 byte multiple, the high order byte of the next entry length will be 0, terminating the string, if there is a next entry and if the next entry length is <= 0x00FFFFFF.

So, anything that uses strings from the cache file must use the length and not depend on NULL terminated strings.

I will patch and test mbexamine.

John Capo
Tuffmail.com

From jc at irbs.com Wed Aug 20 18:25:36 2008
From: jc at irbs.com (John Capo)
Date: Wed, 20 Aug 2008 18:25:36 -0400
Subject: mbexamine and block size cache files
In-Reply-To: <0F2DE84E-8A35-4432-B79B-D0875F044FBE@umich.edu>
References: <20080819235503.GA56814@exuma.irbs.com> <5C4D9A17-E7D9-40ED-AFB4-6C9B79F5EC63@umich.edu> <20080820135651.GA73439@exuma.irbs.com> <0F2DE84E-8A35-4432-B79B-D0875F044FBE@umich.edu>
Message-ID: <20080820222536.GA87403@exuma.irbs.com>

Quoting Wesley Craig (wes at umich.edu):
> On 20 Aug 2008, at 09:56, John Capo wrote:
> >That does work on FreeBSD 6.3.
>
> If many entries are not NULL terminated, perhaps all of the string formats should be %.*s with a specified length, then. Are you willing to submit a tested patch?

Tested patch attached.

John Capo
Tuffmail.com

-------------- next part --------------
--- ../../cyrus-imapd-2.3.12/imap/mbexamine.c 2008-04-21 16:58:34.000000000 -0400
+++ mbexamine.c 2008-08-20 16:36:09.000000000 -0400
@@ -365,13 +365,13 @@

         cacheitem = mailbox.cache_base + CACHE_OFFSET(i);

-        printf(" Envel>{%d}%s\n", CACHE_ITEM_LEN(cacheitem),
+        printf(" Envel>{%d}%.*s\n", CACHE_ITEM_LEN(cacheitem), CACHE_ITEM_LEN(cacheitem),
                cacheitem + CACHE_ITEM_SIZE_SKIP);
         cacheitem = CACHE_ITEM_NEXT(cacheitem);
-        printf("BdyStr>{%d}%s\n", CACHE_ITEM_LEN(cacheitem),
+        printf("BdyStr>{%d}%.*s\n", CACHE_ITEM_LEN(cacheitem), CACHE_ITEM_LEN(cacheitem),
                cacheitem + CACHE_ITEM_SIZE_SKIP);
         cacheitem = CACHE_ITEM_NEXT(cacheitem);
-        printf(" Body>{%d}%s\n", CACHE_ITEM_LEN(cacheitem),
+        printf(" Body>{%d}%.*s\n", CACHE_ITEM_LEN(cacheitem), CACHE_ITEM_LEN(cacheitem),
                cacheitem + CACHE_ITEM_SIZE_SKIP);
         cacheitem = CACHE_ITEM_NEXT(cacheitem);

@@ -381,22 +381,22 @@
 #endif

         cacheitem = CACHE_ITEM_NEXT(cacheitem);
-        printf("CacHdr>{%d}%s\n", CACHE_ITEM_LEN(cacheitem),
+        printf("CacHdr>{%d}%.*s\n", CACHE_ITEM_LEN(cacheitem), CACHE_ITEM_LEN(cacheitem),
                cacheitem + CACHE_ITEM_SIZE_SKIP);
         cacheitem = CACHE_ITEM_NEXT(cacheitem);
-        printf(" From>{%d}%s\n", CACHE_ITEM_LEN(cacheitem),
+        printf(" From>{%d}%.*s\n", CACHE_ITEM_LEN(cacheitem), CACHE_ITEM_LEN(cacheitem),
                cacheitem + CACHE_ITEM_SIZE_SKIP);
         cacheitem = CACHE_ITEM_NEXT(cacheitem);
-        printf(" To>{%d}%s\n", CACHE_ITEM_LEN(cacheitem),
+        printf(" To>{%d}%.*s\n", CACHE_ITEM_LEN(cacheitem), CACHE_ITEM_LEN(cacheitem),
                cacheitem + CACHE_ITEM_SIZE_SKIP);
         cacheitem = CACHE_ITEM_NEXT(cacheitem);
-        printf(" Cc>{%d}%s\n", CACHE_ITEM_LEN(cacheitem),
+        printf(" Cc>{%d}%.*s\n", CACHE_ITEM_LEN(cacheitem), CACHE_ITEM_LEN(cacheitem),
                cacheitem + CACHE_ITEM_SIZE_SKIP);
         cacheitem = CACHE_ITEM_NEXT(cacheitem);
-        printf(" Bcc>{%d}%s\n", CACHE_ITEM_LEN(cacheitem),
+        printf(" Bcc>{%d}%.*s\n", CACHE_ITEM_LEN(cacheitem), CACHE_ITEM_LEN(cacheitem),
                cacheitem + CACHE_ITEM_SIZE_SKIP);
         cacheitem = CACHE_ITEM_NEXT(cacheitem);
-        printf("Subjct>{%d}%s\n", CACHE_ITEM_LEN(cacheitem),
+        printf("Subjct>{%d}%.*s\n", CACHE_ITEM_LEN(cacheitem), CACHE_ITEM_LEN(cacheitem),
                cacheitem + CACHE_ITEM_SIZE_SKIP);

         if(flag) break;

From thomas.jarosch at intra2net.com Thu Aug 21 04:49:39 2008
From: thomas.jarosch at intra2net.com (Thomas Jarosch)
Date: Thu, 21 Aug 2008 10:49:39 +0200
Subject: [patch] got_signal should be volatile
Message-ID: <200808211049.40407.thomas.jarosch@intra2net.com>

Hello Ken,

attached is a small patch to make "got_signal" volatile. My local SVN history says it's a patch from Debian created in 2003.

This should make signal handling more robust in case the value of got_signal gets cached in a register.

Applies fine to cyrus-imapd 2.3.12p2.

Thomas

-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.andrew.cmu.edu/pipermail/cyrus-devel/attachments/20080821/1f6f04ca/attachment.html
-------------- next part --------------
A non-text attachment was scrubbed...
Name: cyrus-imapd-2.1.15-volatile.patch
Type: text/x-patch
Size: 252 bytes
Desc: not available
Url : http://lists.andrew.cmu.edu/pipermail/cyrus-devel/attachments/20080821/1f6f04ca/attachment.bin

From thomas.jarosch at intra2net.com Thu Aug 21 05:01:00 2008
From: thomas.jarosch at intra2net.com (Thomas Jarosch)
Date: Thu, 21 Aug 2008 11:01:00 +0200
Subject: [patch] mailbox select option for ipurge
Message-ID: <200808211101.01196.thomas.jarosch@intra2net.com>

Hello Ken,

here's a repost of a patch from 2005 that somehow got lost on cyrus-devel:
(http://marc.info/?l=cyrus-devel&m=112134511310688&w=2)

The patch still works and applies fine to cyrus-imapd 2.3.12p2

Thomas

-----------------------------------------------------------
Hello,

I needed a way to purge only a specific mailbox using the ipurge utility. "-f" wasn't the way to go as it's recursive.

Please have a look at the attached patch. The new option was first called "-F mailbox", but it can happen too easily that you specify "-f mailbox" and delete all your mail, so I changed it to "-M mailbox". The verbose output was also improved to show which folders were skipped.

The patch was developed for cyrus-imapd 2.2.12 and applies cleanly to HEAD.

Is this mailing list or bugzilla the right place to submit patches?

Best regards,
Thomas Jarosch
-----------------------------------------------------------

-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.andrew.cmu.edu/pipermail/cyrus-devel/attachments/20080821/8e15f172/attachment.html
-------------- next part --------------
A non-text attachment was scrubbed...
Name: cyrus-imapd-ipurge-mailbox.patch
Type: text/x-patch
Size: 4003 bytes
Desc: not available
Url : http://lists.andrew.cmu.edu/pipermail/cyrus-devel/attachments/20080821/8e15f172/attachment.bin

From thomas.jarosch at intra2net.com Thu Aug 21 05:16:59 2008
From: thomas.jarosch at intra2net.com (Thomas Jarosch)
Date: Thu, 21 Aug 2008 11:16:59 +0200
Subject: [patch] set INTERNALDATE as mtime on append
Message-ID: <200808211117.00205.thomas.jarosch@intra2net.com>

Hello Ken,

attached is a small patch that sets the mtime of the corresponding file to the INTERNALDATE of a mail.

We use a perl script to restore single mailboxes from backup via IMAP and this enables us to restore the arrival time, too.

This was the last one from my patch queue ;-)

Thomas

--
Address (better: trap) for people I really don't want to get mail from: sophia.hope at cactusamerica.com

-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.andrew.cmu.edu/pipermail/cyrus-devel/attachments/20080821/fabb4c3c/attachment-0001.html
-------------- next part --------------
A non-text attachment was scrubbed...
Name: cyrus-imapd-2.2.12-internaldate.patch
Type: text/x-patch
Size: 915 bytes
Desc: not available
Url : http://lists.andrew.cmu.edu/pipermail/cyrus-devel/attachments/20080821/fabb4c3c/attachment-0001.bin

From thomas.jarosch at intra2net.com Thu Aug 21 11:01:46 2008
From: thomas.jarosch at intra2net.com (Thomas Jarosch)
Date: Thu, 21 Aug 2008 17:01:46 +0200
Subject: [patch] Improve unix socket permissions
Message-ID: <200808211701.47556.thomas.jarosch@intra2net.com>

Hello everyone,

currently unix sockets get created by cyrus-master with ownership of "root.root" and file mode 0777.

Attached patch makes the user, group and file mode configurable. If nothing is specified in cyrus.conf, it defaults to CYRUS_USER (+group of the user) and mode 660 for improved security.
Would be nice if someone on BSD / unix could give it a try as the file mode is set via umask() instead of chmod() to prevent a race condition during creation of the socket. The patch runs fine with cyrus-imapd 2.3.12p2 on linux. Cheers, Thomas -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.andrew.cmu.edu/pipermail/cyrus-devel/attachments/20080821/3b392b8c/attachment.html -------------- next part -------------- A non-text attachment was scrubbed... Name: cyrus-imapd-unix-socket-permissions.patch Type: text/x-patch Size: 6070 bytes Desc: not available Url : http://lists.andrew.cmu.edu/pipermail/cyrus-devel/attachments/20080821/3b392b8c/attachment.bin From wes at umich.edu Thu Aug 21 11:29:32 2008 From: wes at umich.edu (Wesley Craig) Date: Thu, 21 Aug 2008 11:29:32 -0400 Subject: [patch] mailbox select option for ipurge In-Reply-To: <200808211101.01196.thomas.jarosch@intra2net.com> References: <200808211101.01196.thomas.jarosch@intra2net.com> Message-ID: <67F3C3C0-A936-4FAA-81BE-D9C70E333C47@umich.edu> On 21 Aug 2008, at 05:01, Thomas Jarosch wrote: > Is this mailinglist or bugzilla the right place to submit patches? The mailing list is a good place to discuss changes. For example, before a patch is written, it's a good place to get input. Or after a patch is written, the list is a good place to garner support for a patch you're having trouble getting accepted. Or, get input on how a patch should change in order to get it accepted. The point is discussion, tho. The bugzilla is good for tracking changes. For example, you mention that this patch to ipurge was posted to the list around 2005 and then lost. Losing patches is a shame. The bugzilla is there to prevent changes from being lost. 
:wes From bawood at umich.edu Mon Aug 25 16:38:35 2008 From: bawood at umich.edu (bawood) Date: Mon, 25 Aug 2008 16:38:35 -0400 Subject: mailboxes.db vs ctl_mboxlist -d Message-ID: <200808251638.36004.bawood@umich.edu> Apologies for missing this topic when it came up originally. It seems like having all the fields tab separated in a text dump is a good reason. When you are trying to pull data out of a text file using unix tools it is significantly easier if the fields are all tab delimited, rather than tabs and spaces intermixed. We have several scripts that use the text dump of the database to gather data and generate various reports, which all broke after updating to 2.3.12. Brian

On Tue, 04 Sep 2007, Ken Murchison wrote:
> Probably no good reason. I would gladly accept a patch to fix this, if
> the undump code can be rewritten to accept either dump format.
>
> David Carter wrote:
> > Is there a reason why "ctl_mboxlist -d" generates output of the form?:
> >
> >   user.dpc99\t0 default\tdpc99\tlrswipcda\t
> >                        ^^
> >                        tab here
> >
> > The actual values in mailboxes.db are stored as:
> >
> >   Key:   user.dpc99
> >   Value: 0 default dpc99\tlrswipcda\t
> >                   ^
> >                   space here
> >
> > Consequently "cyr_dbtool /var/imap/mailboxes.db set < file" expects input lines of the form:
> >
> >   user.dpc99\t0 default dpc99\tlrswipcda\t
> >                        ^
> >                        space here
> >
> > I just spent a rather puzzled half hour playing with cyr_dbtool, wondering why utilities had started to segfault and abort() in amusing ways.
> >
> > Entirely my own fault, but it does demonstrate the danger of using tabs and spaces inconsistently. I think that cyr_dbtool is correct and ctl_mboxlist is in the wrong here.
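Ken's condition above, an undump that accepts either dump format, could look roughly like the following. This is only a sketch and not the Cyrus undump code: `parse_dump_line` and `struct dump_rec` are made-up names, and the reading of the three fields after the mailbox name as a numeric type, a partition name and an ACL string is an assumption based on the examples in David's message.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical record layout; the field names are assumptions. */
struct dump_rec {
    char name[256];
    int  mbtype;
    char partition[64];
    char acl[512];
};

/* Parse "name<TAB>type<SP>partition<SP or TAB>acl".
 * Returns 0 on success, -1 on malformed or oversized input. */
static int parse_dump_line(const char *line, struct dump_rec *out)
{
    const char *p;
    char *end;
    size_t n;

    assert(line != NULL && out != NULL);

    /* mailbox name: everything up to the first tab */
    p = strchr(line, '\t');
    if (!p) return -1;
    n = (size_t)(p - line);
    if (n == 0 || n >= sizeof(out->name)) return -1;
    memcpy(out->name, line, n);
    out->name[n] = '\0';
    p++;

    /* numeric type, followed by a single space in both formats */
    out->mbtype = (int)strtol(p, &end, 10);
    if (end == p || *end != ' ') return -1;
    p = end + 1;

    /* partition: ends at a space OR a tab, whichever the dump used */
    n = strcspn(p, " \t");
    if (n == 0 || p[n] == '\0' || n >= sizeof(out->partition)) return -1;
    memcpy(out->partition, p, n);
    out->partition[n] = '\0';
    p += n + 1;

    /* the rest is the ACL, internal tabs included; drop a trailing newline */
    n = strcspn(p, "\r\n");
    if (n >= sizeof(out->acl)) return -1;
    memcpy(out->acl, p, n);
    out->acl[n] = '\0';
    return 0;
}
```

The only format-sensitive step is the partition delimiter, so scripts that split on tabs and tools that expect the stored value would both be served by the same reader.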
From wes at umich.edu Tue Aug 26 15:21:09 2008 From: wes at umich.edu (Wesley Craig) Date: Tue, 26 Aug 2008 15:21:09 -0400 Subject: Code question about mycanonifyid() in lib/auth_unix.c In-Reply-To: <200808200942.40368.thomas.jarosch@intra2net.com> References: <200808200942.40368.thomas.jarosch@intra2net.com> Message-ID: <101F64DD-2543-4CA9-8919-285E8E74A30F@umich.edu> On 20 Aug 2008, at 03:42, Thomas Jarosch wrote: > I've noticed a little piece of code and wanted to ask about the > original idea behind it. In mycanonifyid() there is a special code path > if the identifier begins with "group:". If so, we call getgrnam() > and then copy the resulting group name into the buffer. > I'm wondering why the code does this? > For example, could getgrnam() return a group alias name when querying an > LDAP server? > Either the group name can change (so we need to check the buffer as > in the attached cyrus-imapd-protect-buffer.patch) or it will never > change and we can drop the strcpy() like in the > cyrus-imapd-remove-unused-strcpy.patch.

I think the comment is pretty illuminating:

/* This used to be far more restrictive, but many sites seem to ignore the
 * ye olde Unix conventions of username. Specifically, we used to
 * - drop case on the buffer
 * - disallow lots of non-alpha characters ('-', '_', others)
 * Now we do neither of these, but impose a very different policy based on
 * the character map above.
 */

If you were using nss_ldap, as you mention, and the user types a group like "MiXeDcAsE", the LDAP server's matching rules will allow that to match "mixedcase" -- the group name is typically "cn", which is a case-ignore-string, in LDAP parlance. Since Cyrus isn't restricting the case, one hopes getgrnam() is applying some sort of name canonicalization. (Examining nss_ldap from PADL, I see that the group name returned is indeed the one provided by the LDAP server.) The memberof function provided by auth_unix.c is case sensitive, you'll notice.
So, I think cyrus-imapd-protect-buffer.patch is appropriate (committed). :wes From jc at irbs.com Thu Aug 28 18:17:44 2008 From: jc at irbs.com (John Capo) Date: Thu, 28 Aug 2008 18:17:44 -0400 Subject: 2.3.12 transaction problem within skiplist DB->foreach() Message-ID: <20080828221744.GA71671@exuma.irbs.com>

. OK User logged in
. rename user/abox user/bbox
* BYE Fatal error: Internal error: assertion failed: cyrusdb_skiplist.c: 627: db->lock_status == UNLOCKED

This results from attempting to update annotations.db in a DB->foreach() callback.

assert(db->lock_status == UNLOCKED) in write_lock()

cmd_rename() -> annotatemore_rename() -> annotatemore_findall() -> foreach() -> rename_cb()

/* foreach allows for subsidary mailbox operations in 'cb'.
   if there is a txn, 'cb' must make use of it.
*/

That comment makes sense, but the transaction structure created in foreach() is not made available to the callback. foreach() does provide the transaction structure it created when foreach() returns, but that's too late for the callback to use the transaction.

Sometimes the same assert() is hit when syncserver tries to create a new mailbox. I haven't looked into this one yet.

Aug 28 14:04:24 m4 syncserver[77241]: Failed to access inbox for yadda
Aug 28 14:04:25 m4 syncserver[77241]: Fatal error: Internal error: assertion failed: cyrusdb_skiplist.c: 622: db->lock_status == UNLOCKED
Aug 28 14:04:25 m4 syncserver[77241]: skiplist: closed while still locked

Foreach bug, annotation code DB API violation, or ???

If the comment above is correct, the bug is foreach() not providing the transaction to the callback. Annotation code may be the only place where updates are done from a foreach() callback. Finding all the rocks and procs is tedious.
John Capo From jc at irbs.com Thu Aug 28 19:52:19 2008 From: jc at irbs.com (John Capo) Date: Thu, 28 Aug 2008 19:52:19 -0400 Subject: 2.3.12 transaction problem within skiplist DB->foreach() In-Reply-To: <20080828221744.GA71671@exuma.irbs.com> References: <20080828221744.GA71671@exuma.irbs.com> Message-ID: <20080828235219.GA76085@exuma.irbs.com> I'm convinced this is a foreach() bug. myforeach() is a copy of myfetch() with a loop in the middle for the callback. This bug has been there since day one but showed up as just another unexplained IOERROR message until the asserts were added in 2.3.12. Patch attached that's running on two test boxes. John Capo Tuffmail.com Quoting John Capo (jc at irbs.com): > . OK User logged in > . rename user/abox user/bbox > * BYE Fatal error: Internal error: assertion failed: cyrusdb_skiplist.c: 627: db->lock_status == UNLOCKED > > This results from attempting to update annotations.db in a DB->foreach() callback. > > assert(db->lock_status == UNLOCKED) in write_lock() > > cmd_rename() -> annotatemore_rename() -> annotatemore_findall() -> foreach() -> rename_cb() > > /* foreach allows for subsidary mailbox operations in 'cb'. > if there is a txn, 'cb' must make use of it. > */ > > That comment makes sense but the transaction structure created in > foreach() is not made available to the callback. foreach() does > provide the transaction structure it created when foreach() returns > but that's too late for the callback to use the transaction. > > Sometimes the same assert() is hit when syncserver tries to create > a new mailbox. I haven't looked into this one yet. > > Aug 28 14:04:24 m4 syncserver[77241]: Failed to access inbox for yadda > Aug 28 14:04:25 m4 syncserver[77241]: Fatal error: Internal error: assertion failed: cyrusdb_skiplist.c: 622: db->lock_status == UNLOCKED > Aug 28 14:04:25 m4 syncserver[77241]: skiplist: closed while still locked > > Foreach bug, annotation code DB API violation, or ???
> > If the comment above is correct, the bug is foreach() not providing > the transaction to the callback. Annotation code may be the only place > where updates are done from a foreach() callback. Finding all the rocks > and procs is tedious. > > John Capo > > >
-------------- next part --------------
Index: lib/cyrusdb_skiplist.c
===================================================================
RCS file: /usr/local/CVS/src/cyrus-imapd/lib/cyrusdb_skiplist.c,v
retrieving revision 1.5
diff -u -r1.5 cyrusdb_skiplist.c
--- lib/cyrusdb_skiplist.c 25 Apr 2008 14:29:52 -0000 1.5
+++ lib/cyrusdb_skiplist.c 28 Aug 2008 23:02:38 -0000
@@ -1079,6 +1079,12 @@
         if ((r = newtxn(db, &t))) return r;
         tp = &t;
+
+        *tid = xmalloc(sizeof(struct txn));
+        memcpy(*tid, tp, sizeof(struct txn));
+        (*tid)->ismalloc = 1;
+        db->current_txn = *tid;
+
     } else {
         assert(db->current_txn == *tid);
         tp = *tid;
@@ -1152,6 +1158,8 @@
     }
     if (tid) {
+        /* Will this ever be true when *tid is allocated above???
+         */
         if (!*tid) {
             /* return the txn structure */

From brong at fastmail.fm Thu Aug 28 20:33:42 2008 From: brong at fastmail.fm (Bron Gondwana) Date: Fri, 29 Aug 2008 10:33:42 +1000 Subject: 2.3.12 transaction problem within skiplist DB->foreach() In-Reply-To: <20080828221744.GA71671@exuma.irbs.com> References: <20080828221744.GA71671@exuma.irbs.com> Message-ID: <1219970022.6610.1271082843@webmail.messagingengine.com> On Thu, 28 Aug 2008 18:17:44 -0400, "John Capo" said: > . OK User logged in > . rename user/abox user/bbox > * BYE Fatal error: Internal error: assertion failed: cyrusdb_skiplist.c: > 627: db->lock_status == UNLOCKED > > This results from attempting to update annotations.db in a DB->foreach() > callback. > > assert(db->lock_status == UNLOCKED) in write_lock() > > cmd_rename() -> annotatemore_rename() -> annotatemore_findall() -> > foreach() -> rename_cb() > > /* foreach allows for subsidary mailbox operations in 'cb'. > if there is a txn, 'cb' must make use of it.
> */ > > That comment makes sense but the transaction structure created in > foreach() is not made available to the callback. foreach() does > provide the transaction structure it created when foreach() returns > but that's too late for the callback to use the transaction. > > Sometimes the same assert() is hit when syncserver tries to create > a new mailbox. I haven't looked into this one yet. > > Aug 28 14:04:24 m4 syncserver[77241]: Failed to access inbox for yadda > Aug 28 14:04:25 m4 syncserver[77241]: Fatal error: Internal error: > assertion failed: cyrusdb_skiplist.c: 622: db->lock_status == UNLOCKED > Aug 28 14:04:25 m4 syncserver[77241]: skiplist: closed while still locked > > Foreach bug, annotation code DB API violation, or ??? > > If the comment above is correct, the bug is foreach() not providing > the transaction to the callback. Annotation code may be the only place > where updates are done from a foreach() callback. Finding all the rocks > and procs is tedious.

If you get this assert, it's an API violation. Specifically, it's an API violation that previously would have caused corruption. I put the assert there to ensure that it died instead of silently corrupting your skiplist file back when I was debugging everything to do with skiplist files. Traditionally in Cyrus code, you put the transaction inside your rock, and use it for any subsidiary database calls. Personally, I think the database access interface blows goats, but it's what we have to work with unless we do a pretty major refactor of a lot of code! I did extend it so you can do read-only queries without knowing the transaction, but if you're writing, you really should be passed a copy of the outer transaction. I've never seen the syncserver assert. I'd be tempted to change those asserts to syslog statements which include the database name and key name, and just return DBERROR to the calling code. Bron.
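The "transaction inside your rock" convention Bron mentions can be pictured with a toy model. None of the types or functions below are the real cyrusdb interface: `toydb`, `toy_store`, `toy_foreach`, `toy_rename_cb` and `toy_rename_all` are stand-ins invented for this sketch, and the transaction is reduced to a write counter.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy stand-ins; these are not the cyrusdb types or calls. */
struct txn { int writes; };
struct toydb { struct txn *current_txn; };

typedef int foreach_cb(void *rock, const char *key);

/* Toy write: fails unless the caller holds the database's open transaction
 * (the real code asserts at this point instead of returning an error). */
static int toy_store(struct toydb *db, const char *key, struct txn **tid)
{
    (void)key;
    if (!tid || !*tid || db->current_txn != *tid) return -1;
    (*tid)->writes++;
    return 0;
}

/* Toy iteration over a fixed key list under the caller's transaction. */
static int toy_foreach(struct toydb *db, foreach_cb *cb, void *rock,
                       struct txn **tid)
{
    static const char *keys[] = { "user.abox", "user.abox.Sent" };
    size_t i;
    int r = 0;

    assert(tid && *tid && db->current_txn == *tid);
    for (i = 0; i < sizeof(keys) / sizeof(keys[0]) && !r; i++)
        r = cb(rock, keys[i]);
    return r;
}

/* The rock carries the database handle and the outer transaction. */
struct rename_rock { struct toydb *db; struct txn **tid; };

static int toy_rename_cb(void *rock, const char *key)
{
    struct rename_rock *rr = rock;
    return toy_store(rr->db, key, rr->tid);   /* write under the outer txn */
}

/* Returns the number of writes done by the callback, or -1 on error. */
static int toy_rename_all(void)
{
    struct toydb db = { NULL };
    struct txn t = { 0 }, *tid = &t;
    struct rename_rock rr = { &db, &tid };

    db.current_txn = tid;                     /* "begin" the outer txn */
    if (toy_foreach(&db, toy_rename_cb, &rr, &tid)) return -1;
    db.current_txn = NULL;                    /* "commit" */
    return t.writes;
}
```

The point of the pattern is that the callback never opens a transaction of its own; every write it makes goes through the pointer it was handed in the rock.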
-- Bron Gondwana brong at fastmail.fm From brong at fastmail.fm Thu Aug 28 20:54:24 2008 From: brong at fastmail.fm (Bron Gondwana) Date: Fri, 29 Aug 2008 10:54:24 +1000 Subject: 2.3.12 transaction problem within skiplist DB->foreach() In-Reply-To: <20080828235219.GA76085@exuma.irbs.com> References: <20080828221744.GA71671@exuma.irbs.com> <20080828235219.GA76085@exuma.irbs.com> Message-ID: <1219971264.8703.1271085479@webmail.messagingengine.com> On Thu, 28 Aug 2008 19:52:19 -0400, "John Capo" said: > I'm convinced this is a foreach() bug. myforeach() is a copy of > myfetch() with a loop in the middle for the callback. This bug has > been there since day one but showed up as just another unexplained > IOERROR message until the asserts were added in 2.3.12. > > Patch attached that's running on two test boxes.

Oooh, I see what you mean. Yes - of course. The double pointer 'txn **tid' gets passed in, and may be needed DURING the foreach by subsidiary code, but that's not possible because it doesn't actually get set until the end of the function. How annoying. Your patch looks viable, though it causes unreachable code later in the function, and I'd probably update myfetch to be matching in behaviour just for clarity. I'll put that together now. Bron.

> Quoting John Capo (jc at irbs.com): > > . OK User logged in > > . rename user/abox user/bbox > > * BYE Fatal error: Internal error: assertion failed: cyrusdb_skiplist.c: 627: db->lock_status == UNLOCKED > > > > This results from attempting to update annotations.db in a DB->foreach() callback. > > > > assert(db->lock_status == UNLOCKED) in write_lock() > > > > cmd_rename() -> annotatemore_rename() -> annotatemore_findall() -> foreach() -> rename_cb() > > > > /* foreach allows for subsidary mailbox operations in 'cb'. > > if there is a txn, 'cb' must make use of it. > > */ > > > > That comment makes sense but the transaction structure created in > > foreach() is not made available to the callback.
foreach() does > > provide the transaction structure it created when foreach() returns > > but that's too late for the callback to use the transaction. > > > > Sometimes the same assert() is hit when syncserver tries to create > > a new mailbox. I haven't looked into this one yet. > > > > Aug 28 14:04:24 m4 syncserver[77241]: Failed to access inbox for yadda > > Aug 28 14:04:25 m4 syncserver[77241]: Fatal error: Internal error: assertion failed: cyrusdb_skiplist.c: 622: db->lock_status == UNLOCKED > > Aug 28 14:04:25 m4 syncserver[77241]: skiplist: closed while still locked > > > > Foreach bug, annotation code DB API violation, or ??? > > > > If the comment above is correct, the bug is foreach() not providing > > the transaction to the callback. Annotation code may be the only place > > where updates are done from a foreach() callback. Finding all the rocks > > and procs is tedious. > > > > John Capo > > > > > > -- Bron Gondwana brong at fastmail.fm From jc at irbs.com Thu Aug 28 21:10:19 2008 From: jc at irbs.com (John Capo) Date: Thu, 28 Aug 2008 21:10:19 -0400 Subject: 2.3.12 transaction problem within skiplist DB->foreach() In-Reply-To: <1219971264.8703.1271085479@webmail.messagingengine.com> References: <20080828221744.GA71671@exuma.irbs.com> <20080828235219.GA76085@exuma.irbs.com> <1219971264.8703.1271085479@webmail.messagingengine.com> Message-ID: <20080829011019.GA77531@exuma.irbs.com> Quoting Bron Gondwana (brong at fastmail.fm): > > On Thu, 28 Aug 2008 19:52:19 -0400, "John Capo" said: > > I'm convinced this is a foreach() bug. myforeach() is a copy of > > myfetch() with a loop in the middle for the callback. This bug has > > been there since day one but showed up as just another unexplained > > IOERROR message until the asserts were added in 2.3.12. > > > > Patch attached that's running on two test boxes. > > Oooh, I see what you mean. Yes - of course.
The double pointer 'txn **tid' > gets passed in, and may be needed DURING the foreach by subsidiary code, > but that's not possible because it doesn't actually get set until the end of > the function. How annoying.

Best I can tell, the annotation code is the only place that updates during a foreach(). We use annotations to select name space per user, duplicate delivery per user, and per folder expire times, so the bug bites bad here. Replication has been unstable now that all servers are running 2.3.12. I suspect this bug is part of it.

> Your patch looks viable, though it causes unreachable code later in the > function, and I'd probably update myfetch to be matching in behaviour > just for clarity. I'll put that together now.

I wasn't willing to remove that code but I didn't think it would ever be hit. John Capo

> > Bron. > > > Quoting John Capo (jc at irbs.com): > > > . OK User logged in > > > . rename user/abox user/bbox > > > * BYE Fatal error: Internal error: assertion failed: cyrusdb_skiplist.c: 627: db->lock_status == UNLOCKED > > > > > > This results from attempting to update annotations.db in a DB->foreach() callback. > > > > > > assert(db->lock_status == UNLOCKED) in write_lock() > > > > > > cmd_rename() -> annotatemore_rename() -> annotatemore_findall() -> foreach() -> rename_cb() > > > > > > /* foreach allows for subsidary mailbox operations in 'cb'. > > > if there is a txn, 'cb' must make use of it. > > > */ > > > > > > That comment makes sense but the transaction structure created in > > > foreach() is not made available to the callback. foreach() does > > > provide the transaction structure it created when foreach() returns > > > but that's too late for the callback to use the transaction. > > > > > > Sometimes the same assert() is hit when syncserver tries to create > > > a new mailbox. I haven't looked into this one yet.
> > > > > > Aug 28 14:04:24 m4 syncserver[77241]: Failed to access inbox for yadda > > > Aug 28 14:04:25 m4 syncserver[77241]: Fatal error: Internal error: assertion failed: cyrusdb_skiplist.c: 622: db->lock_status == UNLOCKED > > > Aug 28 14:04:25 m4 syncserver[77241]: skiplist: closed while still locked > > > > > > Foreach bug, annotation code DB API violation, or ??? > > > > > > If the comment above is correct, the bug is foreach() not providing > > > the transaction to the callback. Annotation code may be the only place > > > where updates are done from a foreach() callback. Finding all the rocks > > > and procs is tedious. > > > > > > John Capo > > > > > > > > > > -- > Bron Gondwana > brong at fastmail.fm > From brong at fastmail.fm Thu Aug 28 22:15:28 2008 From: brong at fastmail.fm (Bron Gondwana) Date: Fri, 29 Aug 2008 12:15:28 +1000 Subject: 2.3.12 transaction problem within skiplist DB->foreach() In-Reply-To: <20080829011019.GA77531@exuma.irbs.com> References: <20080828221744.GA71671@exuma.irbs.com> <20080828235219.GA76085@exuma.irbs.com> <1219971264.8703.1271085479@webmail.messagingengine.com> <20080829011019.GA77531@exuma.irbs.com> Message-ID: <1219976128.17589.1271093859@webmail.messagingengine.com> On Thu, 28 Aug 2008 21:10:19 -0400, "John Capo" said: > Quoting Bron Gondwana (brong at fastmail.fm): > > > > On Thu, 28 Aug 2008 19:52:19 -0400, "John Capo" said: > > > I'm convinced this is a foreach() bug. myforeach() is a copy of > > > myfetch() with a loop in the middle for the callback. This bug has > > > been there since day one but showed up as just another unexplained > > > IOERROR message until the asserts were added in 2.3.12. > > > > > > Patch attached that's running on two test boxes. > > > > Oooh, I see what you mean. Yes - of course. The double pointer 'txn **tid' > > gets passed in, and may be needed DURING the foreach by subsidiary code, > > but that's not possible because it doesn't actually get set until the end of > > the function.
How annoying. > > Best I can tell, the annotation code is the only place that updates > during a foreach(). We use annotations to select name space per > user, duplicate delivery per user, and per folder expire times, so > the bug bites bad here. Replication has been unstable now that all > servers are running 2.3.12. I suspect this bug is part of it. > > > Your patch looks viable, though it causes unreachable code later in the > > function, and I'd probably update myfetch to be matching in behaviour > > just for clarity. I'll put that together now. > > I wasn't willing to remove that code but I didn't think it would > ever be hit.

I've always hated the whole transaction management stuff. My patch is still a bit cut-and-pasty, but at least it has some good points:

a) tidptr name for all double pointers
b) tid name for the current tid in all functions where it's used enough to care about dereferencing just once.
c) remove the stupid "malloc" flag and just always malloc, it's not that expensive.
d) following on from that, creation is initialisation. Put all the transaction start logic in one place.

I'd be even happier to merge a bunch more of the logic into a supporter function rather than having 4 copies of it, but this will do for now. By the way - I've only compile tested it so far... I'll do some more testing before suggesting it for real :) -- Bron Gondwana brong at fastmail.fm -------------- next part -------------- A non-text attachment was scrubbed... 
Name: cyrus-skiplist-locking-rework-2.3.12.diff Type: text/x-patch Size: 12299 bytes Desc: not available Url : http://lists.andrew.cmu.edu/pipermail/cyrus-devel/attachments/20080829/f4c845b6/attachment-0001.bin From brong at fastmail.fm Fri Aug 29 02:31:27 2008 From: brong at fastmail.fm (Bron Gondwana) Date: Fri, 29 Aug 2008 16:31:27 +1000 Subject: [PATCH] Set lock pointer at the start of foreach (Re: 2.3.12 transaction problem within skiplist DB->foreach()) In-Reply-To: <20080828221744.GA71671@exuma.irbs.com> References: <20080828221744.GA71671@exuma.irbs.com> Message-ID: <20080829063127.GA16173@brong.net> On Thu, Aug 28, 2008 at 06:17:44PM -0400, John Capo wrote: > Aug 28 14:04:24 m4 syncserver[77241]: Failed to access inbox for yadda > Aug 28 14:04:25 m4 syncserver[77241]: Fatal error: Internal error: assertion failed: cyrusdb_skiplist.c: 622: db->lock_status == UNLOCKED > Aug 28 14:04:25 m4 syncserver[77241]: skiplist: closed while still locked > > Foreach bug, annotation code DB API violation, or ???

Ok - it was a foreach bug :(

Ken,

Please find attached a tested patch that fixes the problem and refactors the bloody mess that was locking code into a couple of nice, neat functions. Specifically, it:

a) does away with non-malloc transactions. They added needless complexity (I already had a separate patch dealing with some of that which this one obsoletes). A malloc doesn't cost that much.
b) moves the malloc into newtxn
c) adds a lock_or_update function which can be called whenever you need to be in a write-lock, and it just does the right thing, either updating the current txn or calling newtxn.
d) standardises variable naming: tidptr for the 'struct txn **' everywhere, tid for a single pointer. So code looks the same in all functions.
e) fixes John's bug :) By putting the transaction into the double pointer at the start rather than the end of the foreach. In fact, by always updating the pointer at the start. It really simplifies the pointer handling considerably.

This is the refactor I should have done back when I was dealing with this area adding current_txn checks and all that jazz. It was a mess of copy-paste code. I'm running this on one server now, with no problems. I'm quite confident that it's correct. I'll be updating the rest of our servers in a moment. Regards, Bron. -------------- next part -------------- A non-text attachment was scrubbed... Name: cyrus-skiplist-locking-rework-2.3.12.diff Type: text/x-diff Size: 12936 bytes Desc: not available Url : http://lists.andrew.cmu.edu/pipermail/cyrus-devel/attachments/20080829/9ec253a1/attachment.bin From wes at umich.edu Fri Aug 29 09:36:40 2008 From: wes at umich.edu (Wesley Craig) Date: Fri, 29 Aug 2008 09:36:40 -0400 Subject: [PATCH] Set lock pointer at the start of foreach (Re: 2.3.12 transaction problem within skiplist DB->foreach()) In-Reply-To: <20080829063127.GA16173@brong.net> References: <20080828221744.GA71671@exuma.irbs.com> <20080829063127.GA16173@brong.net> Message-ID: <8275B86A-F955-43BC-AFCD-CF551194504D@umich.edu> On 29 Aug 2008, at 02:31, Bron Gondwana wrote: > Please find attached a tested patch that fixes the problem and > refactors > the bloody mess that was locking code into a couple of nice, neat > functions.

A quick review looks good to me. I'll review it closely this afternoon and deploy it on a test cluster. Presuming I find no issues I'll commit it thereafter. Any chance of submitting it to bugzilla? :wes
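For readers without the scrubbed attachment, the shape of the fix described in this thread (a lock_or_update helper, and publishing the transaction through the double pointer before the foreach loop runs) can be modelled in a few lines. This is a toy model under stated assumptions, not the code from cyrus-skiplist-locking-rework-2.3.12.diff: the struct layouts, `toy_store`, `toy_foreach` and `toy_commit` are invented here, and `newtxn` and `lock_or_update` only borrow the names used in the messages.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy model: names follow the thread's vocabulary, bodies are invented. */
enum lock_state { UNLOCKED, WRITELOCKED };
struct txn { int updates; };
struct toydb { enum lock_state lock_status; struct txn *current_txn; };

/* Start a transaction: always malloc, take the write lock, record it. */
static int newtxn(struct toydb *db, struct txn **tidptr)
{
    struct txn *tid;

    assert(db->lock_status == UNLOCKED);    /* the assertion from the logs */
    tid = calloc(1, sizeof(*tid));
    if (!tid) return -1;
    db->lock_status = WRITELOCKED;
    db->current_txn = tid;
    *tidptr = tid;                          /* published immediately */
    return 0;
}

/* Join the caller's transaction if there is one, otherwise start one. */
static int lock_or_update(struct toydb *db, struct txn **tidptr)
{
    if (*tidptr) {
        assert(db->current_txn == *tidptr);
        (*tidptr)->updates++;
        return 0;
    }
    return newtxn(db, tidptr);
}

/* Toy write: all it does is enter (or re-enter) the write transaction. */
static int toy_store(struct toydb *db, struct txn **tidptr)
{
    return lock_or_update(db, tidptr);
}

typedef int foreach_cb(struct toydb *db, struct txn **tidptr);

/* *tidptr is set before the loop, so callbacks can write through it. */
static int toy_foreach(struct toydb *db, int nkeys, foreach_cb *cb,
                       struct txn **tidptr)
{
    int i, r = lock_or_update(db, tidptr);

    for (i = 0; i < nkeys && !r; i++)
        r = cb(db, tidptr);
    return r;
}

/* Release the toy transaction and the lock. */
static void toy_commit(struct toydb *db, struct txn **tidptr)
{
    free(*tidptr);
    *tidptr = NULL;
    db->current_txn = NULL;
    db->lock_status = UNLOCKED;
}
```

In the buggy arrangement the pointer was only filled in when foreach() returned, so a callback that tried to write had no transaction to pass along and ran into the UNLOCKED assertion in the write-lock path.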