From robm at fastmail.fm Wed Aug 5 23:01:37 2009
From: robm at fastmail.fm (Rob Mueller)
Date: Thu, 6 Aug 2009 13:01:37 +1000
Subject: Moving users between servers with replication
Message-ID: <9782B1C3942C4E3287A1785BB5A17887@Atticus>

We currently use the cyrus replication system to move users between servers. This works really nicely. We have code that creates a temporary cyrus.conf file with the appropriate contents that allows us to do a single sync_client -u to copy everything about a user from one server to another.

However we recently discovered a problem. If that user has a shared seen state, it doesn't work because the shared seen state is stored in the special "anyone" seen file. Of course, we can't just add "sync_client -u anyone", because that would replace the entire shared seen state file on the target server with the one from the source server; we only want the records for shared folders to be copied.

Net result is that the replication system is a great way to move users safely from one server to another, *except* for users with shared seen state. Really annoying.

Possible options:

1. When you sync a mailbox, sync_client checks the anyone seen file for a corresponding mailbox uniqueid record and syncs that.

2. Rather than using the skiplist seen state, shared mailbox seen state is actually stored as a bit per message in the cyrus.index file (like the old seen state was).

3. Actually, it would be nice if user seen state + shared seen state (really 99.9% of cases) was stored as 2 separate bits in the cyrus.index, and only seen state for other users was in skiplist files.

Discuss?

Rob

From brong at fastmail.fm Tue Aug 25 04:42:23 2009
From: brong at fastmail.fm (Bron Gondwana)
Date: Tue, 25 Aug 2009 18:42:23 +1000
Subject: FastMail Cyrus Patches for upstream
Message-ID: <20090825084223.GA9787@brong.net>

UP-FRONT notice. I'd particularly love feedback on the index format change.
Here's the executive overview:

index header: replace SPARE4 with HEADER_CRC - a CRC32 of the rest of the header.

index record: add two additional 32bit values, CACHE_CRC and RECORD_CRC. CACHE_CRC is a crc32 of the entire cache record, and RECORD_CRC is a crc32 of the entire index record (including CACHE_CRC) - providing integrity checking all the way through.

Total additional cost: 8 bytes per message plus some CPU time creating and checking the CRCs. Benefit - immediate index corruption detection. I think this is a good thing - in theory the underlying layers should be providing perfect abstractions, but in practice a memory error, disk error or even erratic cable can cause transient failures - and if we write those incorrect values back to the file they last forever.

Ok - onto the main show!

I've got a small pile of patches for upstream... some quite old and heavily tested, and a couple because I want to grab dibs on index minor_version 11 before someone else claims it and makes a total mess of our patch management!

OK - here we go ( sorry about the long URLS - you can just go to http://cyrus.brong.fastmail.fm/ and follow the links, or of course hit github at http://github.com/brong/cyrus-imapd/ )

http://cyrus.brong.fastmail.fm/patches/imapd/0004-Rewrite-mailbox_cache_size-to-populate-a-pointer-str.patch

Use a struct of individual cache items rather than macros, which allows sanity checks on the cache record to detect corruption and avoid crashing!

NOTE: I'd love to do this with index records as well, but it's an awful lot of work. I'll be doing that slowly as time permits.
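A minimal sketch of the two per-record checksums from the overview above, for readers who want to see the idea in code. This is not the code from the patches: the 16-byte record layout, the field offsets, and the names crc32_buf, record_seal and record_check are all hypothetical, big-endian storage is an assumption, and a plain bitwise CRC-32 stands in for zlib's crc32() so the sketch has no dependencies. It also follows the "cache_crc of zero means not yet computed" convention described further down in this message.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record layout: 8 payload bytes, then CACHE_CRC, then RECORD_CRC. */
#define REC_SIZE       16
#define OFF_CACHE_CRC   8
#define OFF_RECORD_CRC 12

/* Bitwise CRC-32 (reflected polynomial 0xEDB88320); slow but dependency-free. */
static uint32_t crc32_buf(const unsigned char *buf, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;

    assert(buf != NULL || len == 0);
    for (size_t i = 0; i < len; i++) {
        crc ^= buf[i];
        for (int bit = 0; bit < 8; bit++)
            crc = (crc >> 1) ^ ((crc & 1u) ? 0xEDB88320u : 0u);
    }
    return crc ^ 0xFFFFFFFFu;
}

/* Store and load a 32-bit value in big-endian order (assumed on-disk order). */
static void put32(unsigned char *p, uint32_t v)
{
    p[0] = (unsigned char)(v >> 24);
    p[1] = (unsigned char)(v >> 16);
    p[2] = (unsigned char)(v >> 8);
    p[3] = (unsigned char)v;
}

static uint32_t get32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Fill in CACHE_CRC first, then RECORD_CRC over everything before it,
 * so the record checksum also protects the cache checksum. */
static void record_seal(unsigned char *rec,
                        const unsigned char *cache, size_t cache_len)
{
    put32(rec + OFF_CACHE_CRC, crc32_buf(cache, cache_len));
    put32(rec + OFF_RECORD_CRC, crc32_buf(rec, OFF_RECORD_CRC));
}

/* Returns 1 if both checksums match, 0 otherwise.
 * A stored CACHE_CRC of zero is treated as "not computed yet" and skipped. */
static int record_check(const unsigned char *rec,
                        const unsigned char *cache, size_t cache_len)
{
    uint32_t stored_cache_crc;

    if (get32(rec + OFF_RECORD_CRC) != crc32_buf(rec, OFF_RECORD_CRC))
        return 0;
    stored_cache_crc = get32(rec + OFF_CACHE_CRC);
    if (stored_cache_crc == 0)
        return 1;
    return stored_cache_crc == crc32_buf(cache, cache_len);
}
```

A writer would call record_seal() on every path that modifies the record, and a reader would call record_check() before trusting it; a flipped bit anywhere in the first 12 bytes then shows up as a RECORD_CRC mismatch.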
http://cyrus.brong.fastmail.fm/patches/imapd/0021-Complete-rewrite-of-charset-handling-using-Perl.patch
http://cyrus.brong.fastmail.fm/patches/imapd/0022-Pass-a-pre-utf-8-encoded-body-to-sieve-for-tests.patch
http://cyrus.brong.fastmail.fm/patches/imapd/0023-Add-iso-8859-10-11-13-14-16-charset-support.patch
http://cyrus.brong.fastmail.fm/patches/imapd/0024-Fix-iso-2202-kr-and-support-euc-kr-as-well.patch
http://cyrus.brong.fastmail.fm/patches/imapd/0025-Convert-to-unicode-5.1.patch

NOTE - this is a huge patchset, and it's had an enormous amount of work done on it! This completely changes the charset encoding pathways within Cyrus. It gives unicode 5.1 support, a bunch of new character sets, and full utf-8 support in sieve scripts. It also allows search with whitespace to work by compressing whitespace to a single space rather than removing it entirely.

http://cyrus.brong.fastmail.fm/patches/imapd/0031-CRC32-functions.patch
http://cyrus.brong.fastmail.fm/patches/imapd/0032-Add-version-11-mailbox-header-with-crc32-fields.patch

This is the new one! It's not entirely complete in its behaviour yet: it only syslogs for issues, and it's not syslogging on all paths that read the index and cache records yet. I'll work on adding those over time. I _believe_ it's creating crc32s on all paths that modify the record, which is the important thing!

I've chosen to only implement crc32 where zlib is available, putting stubs that return 0 otherwise. It would be easy enough to copy in the public domain crc32 code that's out there if we want to support everyone.

I've also chosen to ignore a cache_crc of zero, so that we can upgrade indexes without a huge IO hit as we read the entire cache to find the initial values. It means one in 2^32 records won't have integrity protection. I can live with that.

Comments please. Once I've dumped this stuff in CVS I'd love to cut another release with all the cool new features so other people can use them.
The charset support in particular is a nice user-visible thing that fixes a bunch of bugzilla bugs and makes fixing others a lot easier.

Bron.

From murch at andrew.cmu.edu Tue Aug 25 07:06:21 2009
From: murch at andrew.cmu.edu (Ken Murchison)
Date: Tue, 25 Aug 2009 07:06:21 -0400
Subject: FastMail Cyrus Patches for upstream
In-Reply-To: <20090825084223.GA9787@brong.net>
References: <20090825084223.GA9787@brong.net>
Message-ID: <4A93C5AD.3010601@andrew.cmu.edu>

The CRC additions make sense to me. I will look at the rest of the patches when I get back from vacation next week.

--
Kenneth Murchison
Systems Programmer
Carnegie Mellon University

From bawood at umich.edu Wed Aug 26 14:47:29 2009
From: bawood at umich.edu (Brian Awood)
Date: Wed, 26 Aug 2009 14:47:29 -0400
Subject: FastMail Cyrus Patches for upstream
In-Reply-To: <20090825084223.GA9787@brong.net>
References: <20090825084223.GA9787@brong.net>
Message-ID: <200908261447.29658.bawood@umich.edu>

Lately we've been chasing cache file issues here pretty regularly and we just found something yesterday that might indicate what's occurring. While I like the idea of keeping a checksum of the records, it probably won't solve the issues we are seeing.

It seems to probably relate to delayed expunge, the cyr_expire and unexpunge processes. After an unexpunge, some cache records end up with a negative offset from the previous message. E.g, the cache record for message 110 may have a location earlier in the file than the record for message 109. Cyrus tries to calculate the length of record 109, gets a negative size, then hilarity and lots of error logs ensue.

Initially, it seems like index might be corrupted with a bogus offset for message 110, which just happens to be within range, but it isn't.
The record it points to is valid and matches the message on disk. So it seems like there's just a bug somewhere causing the cache file to be written out of order, but I think there are possibly other issues being caused by calculating the cache record length each time it's needed. One example might be that we periodically find a cache file that is unnecessarily large for the number of messages that are present (sometimes to the point of not being able to mmap it).

It seems like all these would be non-issues if the length of each cache record was stored along with the offset in the index file rather than trying to calculate it based on the offsets. Does anyone know if there a reason the cache record length isn't stored along with the offset?

Brian

From brong at fastmail.fm Wed Aug 26 21:42:14 2009
From: brong at fastmail.fm (Bron Gondwana)
Date: Thu, 27 Aug 2009 11:42:14 +1000
Subject: FastMail Cyrus Patches for upstream
In-Reply-To: <200908261447.29658.bawood@umich.edu>
References: <20090825084223.GA9787@brong.net> <200908261447.29658.bawood@umich.edu>
Message-ID: <20090827014214.GA13194@brong.net>

On Wed, Aug 26, 2009 at 02:47:29PM -0400, Brian Awood wrote:
> Lately we've been chasing cache file issues here pretty regularly and
> we just found something yesterday that might indicate what's
> occurring. While I like the idea of keeping a checksum of the
> records, it probably won't solve the issues we are seeing.

No, probably not!

> It seems to probably relate to delayed expunge, the cyr_expire and
> unexpunge processes. After an unexpunge, some cache records end up
> with a negative offset from the previous message. E.g, the cache
> record for message 110 may have a location earlier in the file than
> the record for message 109.
Cyrus tries to calculate the length of > record 109, gets a negative size, then hilarity and lots of error > logs ensue. I have a fix for that which avoids the hilarity. It's in that queue, and it makes all cache accesses go through one API which does bounds checking. > Initially, it seems like index might be corrupted with a bogus offset > for message 110, which just happens to be within range, but it isn't. > The record it points to is valid and matches the message on disk. So > it seems like there's just a bug somewhere causing the cache file to > be written out of order, but I think there are possibly other issues > being caused by calculating the cache record length each time it's > needed. One example might be that we periodically find a cache file > that is unnecessarily large for the number of messages that are > present (sometimes to the point of not being able to mmap it). No - the calculation of record length each time it's needed is only a problem when the pointer into the cache file is garbage. In which case getting the correct length bit of garbage is still going to ruin your day later on when it tries to parse the internal structure of the cache record and gets all confused. The only thing that makes it work OK is pointing to the correct location. There's a bug in process_records as used by ipurge even in the current release (I think - I pushed the fix to CVS a while back), whereby it used an incompatible set of flags. Basically you can't do an immediate delete EVER when delayed delete is present, or the cache offsets get screwed. The easy fix was to make ipurge a delayed delete. I'm not sure that unexpunge is fixed at all... because I don't use it. But I think it's OK now. > It seems like all these would be non-issues if the length of each > cache record was stored along with the offset in the index file > rather than trying to calculate it based on the offsets. 
Does anyone
> know if there a reason the cache record length isn't stored along
> with the offset?

No, they'd still be issues. The correct fix is to ensure there's no code path where legitimate use creates invalid cache pointers. The CRC checking is orthogonal to that except that it will make it even more clear when (if, say if) we mess it up.

Bron.

From rob at nofocus.org Thu Aug 27 00:33:13 2009
From: rob at nofocus.org (Robert Banz)
Date: Wed, 26 Aug 2009 21:33:13 -0700
Subject: FastMail Cyrus Patches for upstream
In-Reply-To: <20090827014214.GA13194@brong.net>
References: <20090825084223.GA9787@brong.net> <200908261447.29658.bawood@umich.edu> <20090827014214.GA13194@brong.net>
Message-ID: <3C6E63D5-D99E-4608-B82C-52190F2F21B4@nofocus.org>

Bron,

As a happy user of cyrus, I'd just like to put a quick thanks out there to the contributions that you and the rest of your crew at fastmail make.

It's great stuff. I wish more organizations had the inclination to contribute back as much.

-rob

From brong at fastmail.fm Thu Aug 27 00:57:24 2009
From: brong at fastmail.fm (Bron Gondwana)
Date: Thu, 27 Aug 2009 14:57:24 +1000
Subject: FastMail Cyrus Patches for upstream
In-Reply-To: <3C6E63D5-D99E-4608-B82C-52190F2F21B4@nofocus.org>
References: <20090825084223.GA9787@brong.net> <200908261447.29658.bawood@umich.edu> <20090827014214.GA13194@brong.net> <3C6E63D5-D99E-4608-B82C-52190F2F21B4@nofocus.org>
Message-ID: <1251349044.16341.1331844671@webmail.messagingengine.com>

On Wed, 26 Aug 2009 21:33 -0700, "Robert Banz" wrote:
>
> Bron,
>
> As a happy user of cyrus, I'd just like to put a quick thanks out
> there to the contributions that you and the rest of your crew at
> fastmail make.
>
> It's great stuff. I wish more organizations had the inclination to
> contribute back as much.

We prefer not to maintain our own patches outside the main tree for too long - amongst other things it leads to more incompatibilities!

But hey - appreciate the thanks :)

Bron.
--
Bron Gondwana
brong at fastmail.fm

From brong at fastmail.fm Thu Aug 27 01:08:47 2009
From: brong at fastmail.fm (Bron Gondwana)
Date: Thu, 27 Aug 2009 15:08:47 +1000
Subject: Index Upgrade Bug
Message-ID: <20090827050847.GA22232@brong.net>

Doh!

From the end of "mailbox_read_index_header":

    if (!mailbox_doing_reconstruct &&
        (mailbox->minor_version < MAILBOX_MINOR_VERSION)) {
        return IMAP_MAILBOX_BADFORMAT;
    }

Now - this is unreachable because earlier in the same scope we have:

    if ((mailbox->start_offset < OFFSET_HEADER_SIZE) ||
        (mailbox->record_size < INDEX_RECORD_SIZE) ||
        (mailbox->minor_version < MAILBOX_MINOR_VERSION)) {
        if (mailbox_upgrade_index(mailbox))
            return IMAP_IOERROR;

        syslog(LOG_INFO, "Index upgrade: %s (%d -> %d)", mailbox->name,
               mailbox->minor_version, MAILBOX_MINOR_VERSION);

        /* things might have been changed out from under us. reread */
        return mailbox_open_index(mailbox);
    }

(the syslog line is added by one of my patches - it's not in upstream)

It's also bogus, because it doesn't stop us operating on future version mailboxes. I believe the logic is reversed, and it should be:

    if (!mailbox_doing_reconstruct &&
        (mailbox->minor_version > MAILBOX_MINOR_VERSION)) {
        return IMAP_MAILBOX_BADFORMAT;
    }

or even:

    if (!mailbox_doing_reconstruct &&
        (mailbox->minor_version != MAILBOX_MINOR_VERSION)) {
        return IMAP_MAILBOX_BADFORMAT;
    }

Ken - comments? Does this make sense to you? I can see this would have broken over the past few index changes - resized uuid => guid, modseq being added, etc. In general you just can't accurately update a mailbox of a future version and ensure all the fields are in sync.

Symptom here (we hit this upgrading all our servers) was that there were checksum errors galore. Older imapds that were running across the upgrade rewrote parts of the index record without updating the checksums.

THANKFULLY (more to the point, good design) there are no crash bugs or unaligned writes.
The code is careful about using the index_header field for record size when seeking to the correct offset, but the INDEX_RECORD_SIZE constant for reading and writing buffers. Still - this is going to be an issue for sites upgrading to the new CRC32 protected indexes. It is necessary to stop cyrus before upgrading and then start it again afterwards so you never had processes with two different index minor versions coded into them running at the same time. Bron. From bawood at umich.edu Thu Aug 27 13:58:27 2009 From: bawood at umich.edu (Brian Awood) Date: Thu, 27 Aug 2009 13:58:27 -0400 Subject: FastMail Cyrus Patches for upstream In-Reply-To: <20090827014214.GA13194@brong.net> References: <20090825084223.GA9787@brong.net> <200908261447.29658.bawood@umich.edu> <20090827014214.GA13194@brong.net> Message-ID: <200908271358.28500.bawood@umich.edu> On Wednesday 26 August 2009 @ 21:42, Bron Gondwana wrote: > > I have a fix for that which avoids the hilarity. It's in that > queue, and it makes all cache accesses go through one API which > does bounds checking. I updated my clone of your git repository and took a look at the changes. Having an API to access cache information will definitely be an improvement over using macros all over the place. :) We really appreciate your contributions and usually have at least a couple Fastmail patches in our code base. Anyway, I hope you didn't/don't take my comments the wrong way. I was just hoping that if the next release would be bumping the index revision, we could also add the cache record lengths at the same time. We'd be willing to contribute a patch for it. However, if it looks like no one agrees with our assessment we won't expend the effort, since there's no way we could maintain our own revision of the meta files. Re-reading what I wrote below, I realize that I made it sound like our cache offsets are out of bounds of the file, but that isn't the case. 
The offsets we see are completely valid and point to the correct cache records. > > Initially, it seems like index might be corrupted with a bogus > > offset for message 110, which just happens to be within range, > > but it isn't. The record it points to is valid and matches the > > message on disk. So it seems like there's just a bug somewhere > > causing the cache file to be written out of order, but I think > > there are possibly other issues being caused by calculating the > > cache record length each time it's needed. One example might be > > that we periodically find a cache file that is unnecessarily > > large for the number of messages that are present (sometimes to > > the point of not being able to mmap it). > > No - the calculation of record length each time it's needed is only > a problem when the pointer into the cache file is garbage. In which > case getting the correct length bit of garbage is still going to > ruin your day later on when it tries to parse the internal > structure of the cache record and gets all confused. I understand your point, but it assumes the cache records are in sequential order which, unfortunately, isn't always the case. Conversely, if you have a bogus offset, using it to calculate the record length will be bogus too. I could create some examples to demonstrate what I'm trying to explain, if anyone is interested. > The only thing that makes it work OK is pointing to the correct > location. > > There's a bug in process_records as used by ipurge even in the > current release (I think - I pushed the fix to CVS a while back), > whereby it used an incompatible set of flags. Basically you can't > do an immediate delete EVER when delayed delete is present, or the > cache offsets get screwed. The easy fix was to make ipurge a > delayed delete. I'm not sure that unexpunge is fixed at all... > because I don't use it. But I think it's OK now. 
I remember seeing your post about that; we aren't currently using ipurge, though we give users access to the unexpunge functionality via a webapp.

> > It seems like all these would be non-issues if the length of each
> > cache record was stored along with the offset in the index file
> > rather than trying to calculate it based on the offsets. Does
> > anyone know if there a reason the cache record length isn't
> > stored along with the offset?
>
> No, they'd still be issues. The correct fix is to ensure there's
> no code path where legitimate use creates invalid cache pointers.
> The CRC checking is orthogonal to that except that it will make it
> even more clear when (if, say if) we mess it up.

The point I'm trying to make is that the offsets are legitimate, it's just that they aren't in the order the code assumes them to be. Take for example a user who expunges every other message, or all of the most recent messages except for the first few. When they then copy the remaining messages to another mailbox, the code ends up copying all of the cache records, instead of just the records of the remaining messages. I'm not sure yet how we end up with offsets for higher UIDs which are smaller than offsets for lower UIDs, but when you calculate the record length with two of these valid offsets, the length is definitely not valid! Passing a negative length to write() is a fail. ;)

I suspect you might start to detect at least some of these cases in the form of apparently incorrect cache record checksums.
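The arithmetic behind the failure described above can be illustrated with a tiny sketch. It is not Cyrus code: the function names and the offset values are made up, and it only assumes that a record's length is inferred as the next record's offset minus its own.

```c
#include <assert.h>
#include <stddef.h>

/* Length of record i, inferred from where record i+1 starts.
 * Returned as a signed value so an out-of-order pair is visible. */
static long len_from_next_offset(const unsigned long *offsets, size_t n, size_t i)
{
    assert(i + 1 < n);
    return (long)offsets[i + 1] - (long)offsets[i];
}

/* Returns 1 only if every offset is strictly greater than the one before,
 * which is the ordering the subtraction above silently relies on. */
static int offsets_in_order(const unsigned long *offsets, size_t n)
{
    for (size_t i = 1; i < n; i++) {
        if (offsets[i] <= offsets[i - 1])
            return 0;
    }
    return 1;
}
```

With hypothetical offsets { 4, 900, 300 }, each offset can point at a perfectly valid record, yet the inferred length of the second record is 300 - 900 = -600, which is the kind of value that ends up being handed to write().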
Thanks, Brian From brong at fastmail.fm Thu Aug 27 18:55:31 2009 From: brong at fastmail.fm (Bron Gondwana) Date: Fri, 28 Aug 2009 08:55:31 +1000 Subject: FastMail Cyrus Patches for upstream In-Reply-To: <200908271358.28500.bawood@umich.edu> References: <20090825084223.GA9787@brong.net> <200908261447.29658.bawood@umich.edu> <20090827014214.GA13194@brong.net> <200908271358.28500.bawood@umich.edu> Message-ID: <20090827225531.GA7742@brong.net> On Thu, Aug 27, 2009 at 01:58:27PM -0400, Brian Awood wrote: > > On Wednesday 26 August 2009 @ 21:42, Bron Gondwana wrote: > > > > I have a fix for that which avoids the hilarity. It's in that > > queue, and it makes all cache accesses go through one API which > > does bounds checking. > > I updated my clone of your git repository and took a look at the > changes. Having an API to access cache information will definitely > be an improvement over using macros all over the place. :) We > really appreciate your contributions and usually have at least a > couple Fastmail patches in our code base. > > Anyway, I hope you didn't/don't take my comments the wrong way. I was > just hoping that if the next release would be bumping the index > revision, we could also add the cache record lengths at the same > time. We'd be willing to contribute a patch for it. However, if it > looks like no one agrees with our assessment we won't expend the > effort, since there's no way we could maintain our own revision of > the meta files. Feedback is great. I really appreciate it. I originally actually would have agreed with you that the cache record length should be in the index record (possibly replacing one of header_size or content_offset, since they're always the same!) - but I don't think the argument for having it holds water. a) it's a pain to update. For cache_crc I've special-cased "0" so we don't have to read every single cache record. 
You'd have to do the same here, or have an "options" value like pop3_uidl so that a mailbox could be "upgraded" during a process_records run or something. Don't worry, I've already considered this for my "get shared seen back into the index file rather than in a bloody huge non-findable skiplist file that can't be replicated to a new server easily". It's doable. but: b) it's un-necessary. If the record is not broken, then parsing out the length only takes 10 operations, and it's not like we have to do it that often. > Re-reading what I wrote below, I realize that I made it sound like our > cache offsets are out of bounds of the file, but that isn't the case. > The offsets we see are completely valid and point to the correct > cache records. So there shouldn't be any problem then... > > > Initially, it seems like index might be corrupted with a bogus > > > offset for message 110, which just happens to be within range, > > > but it isn't. The record it points to is valid and matches the > > > message on disk. So it seems like there's just a bug somewhere > > > causing the cache file to be written out of order, but I think > > > there are possibly other issues being caused by calculating the > > > cache record length each time it's needed. One example might be > > > that we periodically find a cache file that is unnecessarily > > > large for the number of messages that are present (sometimes to > > > the point of not being able to mmap it). > > > > No - the calculation of record length each time it's needed is only > > a problem when the pointer into the cache file is garbage. In which > > case getting the correct length bit of garbage is still going to > > ruin your day later on when it tries to parse the internal > > structure of the cache record and gets all confused. > > I understand your point, but it assumes the cache records are in > sequential order which, unfortunately, isn't always the case. 
> Conversely, if you have a bogus offset, using it to calculate the > record length will be bogus too. I could create some examples to > demonstrate what I'm trying to explain, if anyone is interested. Ahh - there were a couple of bugs. One that wrote everything between two records to the file, and another that truncated incorrectly on failed append, leaving all sorts of extra junk in cache files. They should both be fixed in 2.3.14 I think. Though cache files could still contain junk on disk until a rewrite gets the whole thing! > > The only thing that makes it work OK is pointing to the correct > > location. > > > > There's a bug in process_records as used by ipurge even in the > > current release (I think - I pushed the fix to CVS a while back), > > whereby it used an incompatible set of flags. Basically you can't > > do an immediate delete EVER when delayed delete is present, or the > > cache offsets get screwed. The easy fix was to make ipurge a > > delayed delete. I'm not sure that unexpunge is fixed at all... > > because I don't use it. But I think it's OK now. > > I remember seeing your post about that, we aren't currently using > ipurge, though we give users access to the unexpunge functionality > via a webapp. Unexpunge is the only thing I can think of that might give out of order cache records. Though I _think_ it may actually be totally broken with respect to cache records at the moment. I would recommend a reconstruct after any unexpunge at the moment. > > > It seems like all these would be non-issues if the length of each > > > cache record was stored along with the offset in the index file > > > rather than trying to calculate it based on the offsets. Does > > > anyone know if there a reason the cache record length isn't > > > stored along with the offset? > > > > No, they'd still be issues. The correct fix is to ensure there's > > no code path where legitimate use creates invalid cache pointers. 
> > The CRC checking is orthogonal to that except that it will make it > > even more clear when (if, say if) we mess it up. > > The point I'm trying to make is that the offsets are legitimate, it's > just that they aren't in the order the code assumes them to be. Take > for example if a user expunges every other message, or all of the > most recent messages except for the first few. Then they copy the > remaining messages to another mailbox, the code ends up copying all of > the cache records, instead of just the records of the remaining > messages. I'm not sure yet how we end up with offsets for higher > UIDs which are smaller than offsets for lower UIDs, but when you > calculate the record length with two of these valid offsets, the > length is definitely not valid! Passing a negative length to write() > is a fail. ;) Ok - so that's a bug that needs fixing, if it's still present in CVS. I _thought_ I had fixed it. Fixing this bug doesn't require having the record length in the index file though. It can be done just by reading the record itself - which we now have a nice easy API for, which returns the length and does CRC and internal structure checks at the same time! > I suspect you might start to detect at least some of these cases in > the form of, apparently incorrect, cache record checksums. Definitely. If anything is going odd, chances are the record checksums will be wrong! Bron. From bawood at umich.edu Thu Aug 27 21:19:28 2009 From: bawood at umich.edu (Brian Awood) Date: Thu, 27 Aug 2009 21:19:28 -0400 Subject: FastMail Cyrus Patches for upstream In-Reply-To: <20090827225531.GA7742@brong.net> References: <20090825084223.GA9787@brong.net> <200908271358.28500.bawood@umich.edu> <20090827225531.GA7742@brong.net> Message-ID: <200908272119.28948.bawood@umich.edu> On Thursday 27 August 2009 @ 18:55, Bron Gondwana wrote: > > a) it's a pain to update. For cache_crc I've special-cased "0" so > we don't have to read every single cache record.
You'd have to do > the same here, or have an "options" value like pop3_uidl so that a > mailbox could be "upgraded" during a process_records run or > something. Don't worry, I've already considered this for my "get > shared seen back into the index file rather than in a bloody huge > non-findable skiplist file that can't be replicated to a new server > easily". It's doable. True, it would be a pain to update existing files. A low impact way might be to only generate it during things like an append or reconstruct. > b) it's un-necessary. If the record is not broken, then parsing > out the length only takes 10 operations, and it's not like we have > to do it that often. Also true, but if you had the length, it could be: read cache data of length x, calculate the crc of the buffer, verify and then parse it if it's valid. I suppose it's 6 of one and a 1/2 dozen of the other, it just makes more sense to me if the data was verified before parsing. > Ahh - there were a couple of bugs. One that wrote everything > between two records to the file, and another that truncated > incorrectly on failed append, leaving all sorts of extra junk in > cache files. They should both be fixed in 2.3.14 I think. Though > cache files could still contain junk on disk until a rewrite gets > the whole thing! We're running 2.3.14 + local & some Fastmail patches. I think the truncation of the file on a failed append works correctly, but it seems like extra records are still getting copied. > Unexpunge is the only thing I can think of that might give out of > order cache records. Though I _think_ it may actually be totally > broken with respect to cache records at the moment. I would > recommend a reconstruct after any unexpunge at the moment. unexpunge seems to be ok, based on some simple testing. It doesn't touch the cache file at all, and seems to just be moving records back to the index. I think we'll be reconstructing unexpunged mailboxes for now anyway. 
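A minimal sketch of the verify-before-parse ordering suggested earlier in this message, assuming the record length and a stored checksum are both available up front. The names crc32_ieee() and record_is_intact() are hypothetical, and the common IEEE CRC-32 is assumed here; the thread does not say which CRC-32 variant the patches use.

```c
#include <stddef.h>
#include <stdint.h>

/* Bitwise IEEE CRC-32 (reflected polynomial 0xEDB88320).
 * Assumption: this variant is for illustration only. */
static uint32_t crc32_ieee(const unsigned char *data, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; bit++)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return crc ^ 0xFFFFFFFFu;
}

/* Hypothetical helper: returns 1 if the buffer matches the stored CRC.
 * A stored CRC of 0 is treated as "not recorded yet" and accepted,
 * mirroring the special-cased 0 mentioned for cache_crc in the thread. */
static int record_is_intact(const unsigned char *rec, size_t len,
                            uint32_t stored_crc)
{
    if (stored_crc == 0)
        return 1;
    return crc32_ieee(rec, len) == stored_crc;
}
```

With the length known in advance, a caller would read len bytes, call record_is_intact(), and only hand the buffer to the parser when it returns 1.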
> Ok - so that's a bug that needs fixing, if it's still present in > CVS. > > I _thought_ I had fixed it. Fixing this bug doesn't require having > the record length in the index file though. It can be done just by > reading the record itself - which we now have a nice easy API for, > which returns the length and does CRC and internal structure checks > at the same time! Great! I'm looking forward to testing out this code, cache files have been our main ongoing issue. Brian From brong at fastmail.fm Thu Aug 27 22:30:22 2009 From: brong at fastmail.fm (Bron Gondwana) Date: Fri, 28 Aug 2009 12:30:22 +1000 Subject: FastMail Cyrus Patches for upstream In-Reply-To: <200908272119.28948.bawood@umich.edu> References: <20090825084223.GA9787@brong.net> <200908271358.28500.bawood@umich.edu> <20090827225531.GA7742@brong.net> <200908272119.28948.bawood@umich.edu> Message-ID: <1251426622.10342.1332013455@webmail.messagingengine.com> On Thu, 27 Aug 2009 21:19 -0400, "Brian Awood" wrote: > > On Thursday 27 August 2009 @ 18:55, Bron Gondwana wrote: > > > > a) it's a pain to update. For cache_crc I've special-cased "0" so > > we don't have to read every single cache record. You'd have to do > > the same here, or have an "options" value like pop3_uidl so that a > > mailbox could be "upgraded" during a process_records run or > > something. Don't worry, I've already considered this for my "get > > shared seen back into the index file rather than in a bloody huge > > non-findable skiplist file that can't be replicated to a new server > > easily". It's doable. > > True, it would be a pain to update existing files. A low impact way > might be to only generate it during things like an append or > reconstruct. Yep - that's pretty much how the CRC32 on the cache records already works. > > b) it's un-necessary. If the record is not broken, then parsing > > out the length only takes 10 operations, and it's not like we have > > to do it that often. 
> > Also true, but if you had the length, it could be: read cache data of > length x, calculate the crc of the buffer, verify and then parse it > if it's valid. I suppose it's 6 of one and a 1/2 dozen of the other, > it just makes more sense to me if the data was verified before > parsing. Yeah - it does make a bit more sense when you put it like that. Does it make 8 bytes per record worth of sense though? (we need to pad to 8 byte multiples for 64 bit modseq alignment.) I'll have another look at that in a second. > > Ahh - there were a couple of bugs. One that wrote everything > > between two records to the file, and another that truncated > > incorrectly on failed append, leaving all sorts of extra junk in > > cache files. They should both be fixed in 2.3.14 I think. Though > > cache files could still contain junk on disk until a rewrite gets > > the whole thing! > > We're running 2.3.14 + local & some Fastmail patches. I think the > truncation of the file on a failed append works correctly, but it > seems like extra records are still getting copied. Right - that would be a bug then. I'll poke around the code. Just to confirm - the symptom is copied messages out of order causing the intermediate cache records to be copied as well? > > Unexpunge is the only thing I can think of that might give out of > > order cache records. Though I _think_ it may actually be totally > > broken with respect to cache records at the moment. I would > > recommend a reconstruct after any unexpunge at the moment. > > unexpunge seems to be ok, based on some simple testing. It doesn't > touch the cache file at all, and seems to just be moving records back > to the index. I think we'll be reconstructing unexpunged mailboxes > for now anyway. Yeah, that's the problem though, because the copied-back records might have the wrong offsets unless everything was kept in sync properly. > > Ok - so that's a bug that needs fixing, if it's still present in > > CVS.
> > > > I _thought_ I had fixed it. Fixing this bug doesn't require having > > the record length in the index file though. It can be done just by > > reading the record itself - which we now have a nice easy API for, > > which returns the length and does CRC and internal structure checks > > at the same time! > > Great! I'm looking forward to testing out this code, cache files have > been our main ongoing issue. Cool. Sounds good! We've been OK with cache files for a while now, but we had a RAID incident (tech pulled the wrong drive) which made us want better integrity checking just-in-case[tm]. Bron. -- Bron Gondwana brong at fastmail.fm From bawood at umich.edu Fri Aug 28 00:45:06 2009 From: bawood at umich.edu (Brian Awood) Date: Fri, 28 Aug 2009 00:45:06 -0400 Subject: FastMail Cyrus Patches for upstream In-Reply-To: <1251426622.10342.1332013455@webmail.messagingengine.com> References: <20090825084223.GA9787@brong.net> <200908272119.28948.bawood@umich.edu> <1251426622.10342.1332013455@webmail.messagingengine.com> Message-ID: <200908280045.06966.bawood@umich.edu> On Thursday 27 August 2009 @ 22:30, Bron Gondwana wrote: > > Yeah - it does make a bit more sense when you put it like that. > Does it make 8 bytes per record worth of sense though? (we need > to pad to 8 byte multiples for 64 bit modseq alignment.) I'll have > another look at that in a second. It seems worth it to me, I suppose it might use slightly more ram, but the extra disk usage should be negligible. > Right - that would be a bug then. I'll poke around the code. > Just to confirm - the symptom is copied messages out of > order causing the intermediate cache records to be copied > as well? Yeah, one place I noticed a problem was in index_copysetup(), there's CACHE_OFFSET(msgno+1) - CACHE_OFFSET(msgno); and cache_end - CACHE_OFFSET(msgno); > Yeah, that's the problem though, because the copied-back records > might have the wrong offsets unless everything was kept in sync > properly.
Remember that the index records contain the correct offset, but the cache records are "out of order". So doing something like, CACHE_OFFSET(msgno+1) - CACHE_OFFSET(msgno) returns a negative value. It could also return too large of a value if there are UIDs in the expunged state in between msgno+1 and msgno. But it looks like using your cache_parserecord() instead should correct those issues. Brian From brong at fastmail.fm Fri Aug 28 00:59:18 2009 From: brong at fastmail.fm (Bron Gondwana) Date: Fri, 28 Aug 2009 14:59:18 +1000 Subject: Index Upgrade Bug Message-ID: <20090828045918.GA17853@brong.net> Found something else fun. It appears that squatter considers a short string (< 4 bytes) too short to search on, but rather than just skipping over it for the prefilter, it bails on the whole prefiltering stage! I'm in two minds about considering _ANY_ error non-fatal and just grounds for not restricting the search space actually, but short strings are clearly bogus. SEARCH BODY "rare string" SUBJECT "hi" will currently wind up scanning every single message even if you have squatted the mailbox, while: SEARCH BODY "rare string" SUBJECT "hello" will not. Annoying. Here's the patch:

diff --git a/imap/search_engines.c b/imap/search_engines.c
index 7b013a1..8e7b8b8 100644
--- a/imap/search_engines.c
+++ b/imap/search_engines.c
@@ -161,6 +161,8 @@ static int search_strlist(SquatSearchIndex* index, struct mailbox* mailbox,
         memset(tmp, 0, len);
         if (squat_search_execute(index, s, strlen(s), fill_with_hits, &r)
             != SQUAT_OK) {
+            if (squat_get_last_error() == SQUAT_ERR_SEARCH_STRING_TOO_SHORT)
+                return 1; /* The rest of the search is still viable */
             syslog(LOG_DEBUG, "SQUAT string list search failed on string %s "
                    "with part types %s", s, part_types);
             return 0;

I'll be applying this one to CVS soon as well. I can't see any reason not to do it!
It's tested, too :) - we're in the process of rolling out an "auto squat while the caches are hot" system - basically it logs body searches and then runs squatter (-i) if the prefilter wasn't very efficient (part of monitorcyrus.pl, the only cyrus change is an auditlog: line for bodysearches) Bron. From brong at fastmail.fm Fri Aug 28 01:08:35 2009 From: brong at fastmail.fm (Bron Gondwana) Date: Fri, 28 Aug 2009 15:08:35 +1000 Subject: FastMail Cyrus Patches for upstream In-Reply-To: <200908280045.06966.bawood@umich.edu> References: <20090825084223.GA9787@brong.net> <200908272119.28948.bawood@umich.edu> <1251426622.10342.1332013455@webmail.messagingengine.com> <200908280045.06966.bawood@umich.edu> Message-ID: <20090828050835.GB17853@brong.net> On Fri, Aug 28, 2009 at 12:45:06AM -0400, Brian Awood wrote: > > Right - that would be a bug then. I'll poke around the code. > > Just to confirm - the symptom is copied messages out of > > order causing the intermediate cache records to be copied > > as well? > > Yeah, one place I noticed a problem was in index_copysetup(), there's > CACHE_OFFSET(msgno+1) - CACHE_OFFSET(msgno); > and > cache_end - CACHE_OFFSET(msgno); You're quite right! It should just be one case: copyargs->copymsg[copyargs->nummsg].cache_len = mailbox_cacherecord_index(mailbox, msgno, 0); > > Yeah, that's the problem though, because the copied-back records > > might have the wrong offsets unless everything was kept in sync > > properly. > > Remember that the index records contain the correct offset, but the > cache records are "out of order". So doing something like, > CACHE_OFFSET(msgno+1) - CACHE_OFFSET(msgno) > returns a negative value. It could also return too large of a value > if there are UIDs in the expunged state in between msgno+1 and msgno. > But it looks like using your cache_parserecord() instead should > correct those issues. Yep :) I've rolled that patch back down into my cache parsing updates.
Note I frequently rebase my "fastmail" branch on github, so it's a bit of a pain to follow! Unfortunately there's really no choice, because you have to rebase in and out of CVS anyway. Bron.
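The length problem discussed in this thread can be sketched in a few lines of C. This is an illustration under stated assumptions, not Cyrus code: the record layout is a toy one (TOY_NUM_FIELDS fields, each a 4-byte big-endian length followed by that many bytes, with no padding), and len_by_subtraction() and len_by_parsing() are hypothetical names. The first mirrors the CACHE_OFFSET(msgno+1) - CACHE_OFFSET(msgno) idea; the second derives the length from the record itself, which is the approach the thread settles on.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_NUM_FIELDS 2 /* hypothetical; not the real Cyrus field count */

/* Read a 32-bit big-endian value. */
static uint32_t be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Length by subtracting neighbouring offsets. Only meaningful when records
 * sit in message order with nothing between them; the result is signed so
 * that an out-of-order pair shows up as a negative length. */
static long len_by_subtraction(const size_t *offsets, size_t msgno)
{
    return (long)offsets[msgno + 1] - (long)offsets[msgno];
}

/* Length by walking the record's own length-prefixed fields.
 * Returns 0 if the record would run past the end of the buffer. */
static size_t len_by_parsing(const unsigned char *buf, size_t size,
                             size_t offset)
{
    size_t pos = offset;

    assert(buf != NULL);
    for (int i = 0; i < TOY_NUM_FIELDS; i++) {
        if (pos > size || size - pos < 4)
            return 0;               /* no room for the length prefix */
        uint32_t flen = be32(buf + pos);
        pos += 4;
        if (size - pos < flen)
            return 0;               /* field body is truncated */
        pos += flen;
    }
    return pos - offset;
}
```

If message 1's record is stored at offset 12 and message 2's at offset 0, subtraction yields -12, while parsing each record at its own offset yields the true length regardless of where its neighbours are.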