From brong at fastmail.fm Tue Feb 10 00:22:50 2009
From: brong at fastmail.fm (Bron Gondwana)
Date: Tue, 10 Feb 2009 16:22:50 +1100
Subject: Git web interface and server experimental copies
Message-ID: <20090210052250.GA25154@brong.net>

Since the backend server I've got access to is firewalled a little
too much for public demonstrations, I've run all this stuff up on
the shared machine unix10.  Hope nobody minds!

http://unix10.andrew.cmu.edu:8044/cgit.cgi

Notice at the bottom of each page, a clone URL, like:

git://unix10.andrew.cmu.edu/cyrus

You can just run:

git clone git://unix10.andrew.cmu.edu/cyrus
cd cyrus
git checkout origin/fastmail

And you should be looking at a copy of my fastmail tree :)

The "master" branch should be following cvs.  I have a shell
script that will bring it up to date easily.

So - let me know what you think.  That's 'cgit'.  It offers easy
viewing of specific changes and stuff too.  Not quite as funky as
github, but it's a "proof of concept" that this stuff should run
pretty easily on the cmu servers.  If I can compile and run it
from my home directory!

(cgit is running under lighttpd, I picked the port out of thin air).

Here's my daemon commands:

/afs/andrew.cmu.edu/usr3/brong/local/sbin/lighttpd -f /afs/andrew.cmu.edu/usr3/brong/etc/lighttpd.conf
git daemon --export-all --base-path=/afs/andrew.cmu.edu/usr3/brong/git/work/ --detach

Enjoy,

Bron ( haven't tried running up redmine on AFS yet, that would
       be... fun )

From marc+cyrus at marcbrockschmidt.de Fri Feb 13 10:32:48 2009
From: marc+cyrus at marcbrockschmidt.de (Marc Brockschmidt)
Date: Fri, 13 Feb 2009 16:32:48 +0100
Subject: [PATCH?] recursive RENAME not working for long UIDs
Message-ID: <87y6wa8i8v.fsf@pindar.marcbrockschmidt.de>

Heya,

It seems that in a setup with long UIDs, the recursive rename of
folders doesn't work (cf
https://bugzilla.andrew.cmu.edu/show_bug.cgi?id=3120). I prepared a
patch that seems to fix the issue, but did so by removing a bit of
code I didn't really understand.
We now would like to use this (or some other fix), but I would love
to hear from someone with more clue about the code if this could be
a problem.

For reference, here's the problematic bit of code:

    /* Find fixed-string pattern prefix */
    for (p = pat; *p; p++) {
        if (*p == '*' || *p == '%' || *p == '?' || *p == '@') break;
    }
    prefixlen = p - pattern;
    *p = '\0';

    /*
     * If user.X.* or INBOX.* can match pattern,
     * search for those mailboxes next
     */
    if (userid &&
        (!strncmp(usermboxname+domainlen, pattern, usermboxnamelen-domainlen-1) ||
         !strncasecmp("inbox.", pattern, prefixlen < 6 ? prefixlen : 6))) {
[imap/mboxlist.c]

The prefix-finding cuts off long UIDs in the middle, thus letting the
usermboxname comparison evaluate to false. Limiting the condition in
the fixed prefix search to actually only matching on globs fixes the
problem for us. Here's the proposed patch:

--- imap/mboxlist.c     2009-01-21 22:33:18.000000000 +0100
+++ imap/mboxlist.c.new 2009-01-21 22:36:29.000000000 +0100
@@ -1996,7 +1996,7 @@

     /* Find fixed-string pattern prefix */
     for (p = pat; *p; p++) {
-        if (*p == '*' || *p == '%' || *p == '?' || *p == '@') break;
+        if (*p == '*' || *p == '%') break;
     }
     prefixlen = p - pattern;
     *p = '\0';

So, was this a correct solution or am I missing some bit of the big
picture here?

Marc
--
Computer science jargon, simply explained 163: SMD
   "Schwer Montierbare Dinger" (things that are hard to mount)
   (Holger Köpke)
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 196 bytes
Desc: not available
Url : http://lists.andrew.cmu.edu/pipermail/cyrus-devel/attachments/20090213/5d2f5bc0/attachment.bin

From brong at fastmail.fm Mon Feb 23 22:17:26 2009
From: brong at fastmail.fm (Bron Gondwana)
Date: Tue, 24 Feb 2009 14:17:26 +1100
Subject: RFC: Charset Conversion Routines
Message-ID: <20090224031726.GA13743@brong.net>

I'm in the process of rewriting the lib/mkchartable.c
and lib/charset.c with the eventual goal being a more
flexible charset conversion API that can be used to
make sieve rules match on the decoded values, and
other funky things.

It turns out to be quite a lot of changes.  My initial
work in progress is up here:

http://github.com/brong/cyrus-imapd/commit/863b5b51dd27f184fa00de4ec5a6aca3308fc30e

As you can see, it's quite a bit of code.


Anyway - I'd like some feedback on a couple of things:

a) It's going to use a little more CPU this way, because
   instead of having a table that converts _directly_ from
   the source charset to utf-8 in search-canonical-form,
   it does one conversion to unicode characters (16bit),
   then another table converts that into a stream of zero
   to 15 characters (yes, something expands to 15 separate
   codepoints, no, I don't want to know what it is!)

   Finally a third pass converts to utf-8 from the
   character codepoints.

b) Should we make this 32bit unicode characters while we're
   at it, and extend the UTF-8 converter?

c) For that matter, should we just be outsourcing all this
   crap to another library?  Does anyone know a good library
   that can do what Cyrus does (take one character at a time
   and keep state?)

d) Whitespace compression.  I'm currently mapping all
   whitespace to ' ' instead of '', and then either stripping
   all ' ' from the string, or only outputting them if the
   previous character on the output string was not a space.
   Rob tells me that there are some issues with asian charsets
   and space not having any meaning - how best to handle?
e) Interfaces, interfaces, interfaces.  At the moment we have:

* charset_compilepat - for use in:
  * charset_searchstring
  * charset_searchfile
* charset_decode_mimebody - and
  * charset_encode_mimebody
* charset_extractfile

My current implementation that I'm working on uses "int flags"
as an extra parameter to each of these, allowing CHARSET_CANON
and CHARSET_STRIPSPACE to be passed down to the translation
layer.

Would people be happy with that as an interface?  It's
somewhat invasive, needing changes through lots of imap/*.c and
sieve/*.c files.

Bron.

From alexey.melnikov at isode.com Tue Feb 24 06:13:47 2009
From: alexey.melnikov at isode.com (Alexey Melnikov)
Date: Tue, 24 Feb 2009 11:13:47 +0000
Subject: RFC: Charset Conversion Routines
In-Reply-To: <20090224031726.GA13743@brong.net>
References: <20090224031726.GA13743@brong.net>
Message-ID: <49A3D66B.4010702@isode.com>

Bron Gondwana wrote:

>I'm in the process of rewriting the lib/mkchartable.c
>and lib/charset.c with the eventual goal being a more
>flexible charset conversion API that can be used to
>make sieve rules match on the decoded values, and
>other funky things.
>
>It turns out to be quite a lot of changes.  My initial
>work in progress is up here:
>
>http://github.com/brong/cyrus-imapd/commit/863b5b51dd27f184fa00de4ec5a6aca3308fc30e
>
>As you can see, it's quite a bit of code.
>
>
>Anyway - I'd like some feedback on a couple of things:
>
>a) It's going to use a little more CPU this way, because
>   instead of having a table that converts _directly_ from
>   the source charset to utf-8 in search-canonical-form,
>   it does one conversion to unicode characters (16bit),
>   then another table converts that into a stream of zero
>   to 15 characters (yes, something expands to 15 separate
>   codepoints, no, I don't want to know what it is!)
>
>   Finally a third pass converts to utf-8 from the
>   character codepoints.
>
>b) Should we make this 32bit unicode characters while we're
>   at it, and extend the UTF-8 converter?
>
>
Yes! And upgrade the tables to Unicode 5.1.0. And also change the
normalization to conform to RFC 5051.

>c) For that matter, should we just be outsourcing all this
>   crap to another library?  Does anyone know a good library
>   that can do what Cyrus does (take one character at a time
>   and keep state?)
>
>
I am not sure about that, but if people know a good library...

From alexey.melnikov at isode.com Tue Feb 24 06:20:31 2009
From: alexey.melnikov at isode.com (Alexey Melnikov)
Date: Tue, 24 Feb 2009 11:20:31 +0000
Subject: RFC: Charset Conversion Routines
In-Reply-To: <20090224031726.GA13743@brong.net>
References: <20090224031726.GA13743@brong.net>
Message-ID: <49A3D7FF.70001@isode.com>

Bron Gondwana wrote:

>d) Whitespace compression.  I'm currently mapping all
>   whitespace to ' ' instead of '', and then either stripping
>   all ' ' from the string, or only outputting them if the
>   previous character on the output string was not a space.
>   Rob tells me that there are some issues with asian charsets
>   and space not having any meaning - how best to handle?
>
>
I think no matter what you do with whitespace compression, it might
not work for some languages. So I wouldn't worry too much about this,
as long as this procedure is optional (or can be controlled by a
configuration option or a client).

>e) Interfaces, interfaces, interfaces.  At the moment we have:
>
>* charset_compilepat - for use in:
>  * charset_searchstring
>  * charset_searchfile
>* charset_decode_mimebody - and
>  * charset_encode_mimebody
>* charset_extractfile
>
>My current implementation that I'm working on uses "int flags"
>as an extra parameter to each of these, allowing CHARSET_CANON
>and CHARSET_STRIPSPACE to be passed down to the translation
>layer.
>
This looks sensible. Another alternative is to implement whitespace
compression in another function, layered on top of the charset API.

>Would people be happy with that as an interface?  It's
>somewhat invasive, needing changes through lots of imap/*.c and
>sieve/*.c files.
>
>Bron.
>
>

From alexey.melnikov at isode.com Tue Feb 24 06:24:19 2009
From: alexey.melnikov at isode.com (Alexey Melnikov)
Date: Tue, 24 Feb 2009 11:24:19 +0000
Subject: RFC: Charset Conversion Routines
In-Reply-To: <49A3D7FF.70001@isode.com>
References: <20090224031726.GA13743@brong.net> <49A3D7FF.70001@isode.com>
Message-ID: <49A3D8E3.2030002@isode.com>

Also, if you can change the code to use size_t (and eliminate
warnings) at least internally in charset.c, that would be great too.
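
To make the two whitespace policies from point d) concrete, and the
suggestion of layering them in a separate function on top of the
charset API, here is a minimal sketch in C. It is only an illustration
under stated assumptions, not code from the Cyrus tree: the function
name compress_whitespace and the flag names SKETCH_STRIPSPACE and
SKETCH_MERGESPACE are hypothetical, it works on plain bytes with
isspace() rather than on decoded Unicode codepoints, and it uses
size_t for all lengths as requested in the last message.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical flags for this sketch only; the thread names
 * CHARSET_STRIPSPACE but does not give its value or semantics in code. */
#define SKETCH_STRIPSPACE 0x1 /* drop every whitespace byte */
#define SKETCH_MERGESPACE 0x2 /* collapse each whitespace run to one ' ' */

/*
 * Copy len bytes from src to dst, applying the whitespace policy
 * selected by flags.  dst must have room for len + 1 bytes.
 * Returns the number of bytes written, not counting the final NUL.
 */
size_t compress_whitespace(const char *src, size_t len, char *dst, int flags)
{
    size_t out = 0;
    size_t i;

    for (i = 0; i < len; i++) {
        unsigned char c = (unsigned char)src[i];

        if (isspace(c)) {
            if (flags & SKETCH_STRIPSPACE)
                continue;               /* strip all whitespace */
            if (flags & SKETCH_MERGESPACE) {
                /* emit a space only if the previous output byte
                 * was not already a space */
                if (out > 0 && dst[out - 1] == ' ')
                    continue;
                c = ' ';                /* map all whitespace to ' ' */
            }
        }
        dst[out++] = (char)c;
    }
    dst[out] = '\0';
    return out;
}
```

With SKETCH_MERGESPACE, an input such as "hello \t\n world  again"
comes out with single spaces between the words; with SKETCH_STRIPSPACE
the words are run together. Keeping this outside the conversion tables
leaves the policy optional, which matters for languages where spaces
carry no meaning.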