Re-designing cyrus.cache format

Sébastien Michel sebastien.michel at atos.net
Tue Feb 12 13:29:53 EST 2013


2013/2/12 Bron Gondwana <brong at fastmail.fm>:
> One of the perennial topics on #cyrus is "what about a more configurable set of cached headers".
>

Indeed.

> As you can see, there are some normalised things from some headers.  The same information is normalised in a DIFFERENT way in the ENVELOPE, and then there is a BODYSTRUCTURE and a BODY response.

Yes, it's redundant.

>
> 1) keep the BODYSTRUCTURE, it's the result of parsing the entire message, and can't be calculated cheaply again
> 2) keep the SECTION data (possibly along with the bodystructure) - it's the offsets for the various parts of the message, same issue
> 3) add a list of "SUPPRESSED HEADERS".  This would list any header which is present in the file, but NOT in the cache.
> 4) cache every other header, including all the To:, From:, Subject:, etc - in as close to raw form as possible.
>
> The entire list of headers to suppress would initially be:
>
> received
> dkim-signature
> domainkey-signature
> domainkey-x509
>
> But it would be configurable as an imapd.conf option.
>
> NOTE: you can still infer the presence or absence just by querying the suppressed list - so for many messages the entire suppressed list would just be 'received'.
>
> This should take fairly similar space to what we have now, be more flexible, and be more future-proof.

However, I think the cache file is already big today, and that causes extra disk I/O.
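
To make the size question concrete, here is a minimal sketch of how a
record could be split under the proposal. It is only an illustration:
the function name, the return shape and the default set are my own
assumptions (the set mirrors the four headers listed above), and none
of this is the actual cyrus.cache layout.

```python
# Hypothetical sketch of the proposed split; not the real cyrus.cache format.
DEFAULT_SUPPRESSED = frozenset({
    "received",
    "dkim-signature",
    "domainkey-signature",
    "domainkey-x509",
})


def split_headers(raw_headers, suppressed=DEFAULT_SUPPRESSED):
    """Split a raw header block into (cached_text, suppressed_names).

    cached_text keeps every non-suppressed header byte-for-byte, including
    folded continuation lines. suppressed_names lists, in sorted order, the
    suppressed headers that were actually present in the message.
    """
    cached = []
    present = set()
    keep = True
    for line in raw_headers.splitlines(keepends=True):
        if line[:1] in (" ", "\t"):
            # Folded continuation line: it follows the fate of its header.
            if keep:
                cached.append(line)
            continue
        name = line.split(":", 1)[0].strip().lower()
        keep = name not in suppressed
        if keep:
            cached.append(line)
        else:
            present.add(name)
    return "".join(cached), sorted(present)


raw = (
    "Received: from a by b;\r\n\tTue, 12 Feb 2013 13:29:53 -0500\r\n"
    "From: Alice <alice@example.com>\r\n"
    "Subject: hello\r\n"
    "DKIM-Signature: v=1; a=rsa-sha256\r\n"
)
cached_text, suppressed_names = split_headers(raw)
```

Comparing len(cached_text) with the size of the current cache record
for a sample of real messages would show whether "fairly similar
space" holds in practice.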

> No matter how you want to parse the fields, the original values are what you've got!  Even if you change the list of headers you suppress, each cache record is complete in itself, so there's no loss of fidelity.
>
> It means a little more CPU to calculate the ENVELOPE, but seriously... I don't think it's a worry in the current world, and it's not so commonly requested anyway.

Completely agree.
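
As a rough illustration of why the extra CPU should be small, the
sketch below derives a few ENVELOPE-style fields on demand from raw
cached headers. It uses Python's standard email package purely as a
stand-in; the function name and the returned dictionary are my own
assumptions, and a real IMAP ENVELOPE has more fields and a different
address structure.

```python
from email.parser import HeaderParser
from email.utils import getaddresses


def envelope_fields(cached_headers):
    """Derive a few ENVELOPE-style fields from raw cached header text."""
    msg = HeaderParser().parsestr(cached_headers)

    def addresses(name):
        # getaddresses() returns (display name, addr-spec) pairs.
        return getaddresses(msg.get_all(name, []))

    return {
        "date": msg.get("Date"),
        "subject": msg.get("Subject"),
        "from": addresses("From"),
        "to": addresses("To"),
        "message_id": msg.get("Message-ID"),
    }


env = envelope_fields(
    "From: Alice <alice@example.com>\n"
    "To: bob@example.com\n"
    "Subject: hello\n"
)
```

The parsing cost is paid only when a client actually asks for the
ENVELOPE, instead of storing a second normalised copy in every record.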

> =====
>
> Thoughts?

Your proposal sounds good. It is quite close to Dovecot's current
behaviour, according to its documentation:

>Cache file may contain the following information for messages:
>
>    Message headers (some, not all)
>    Sent date (parsed Date: header)
>    Received date (IMAP's INTERNALDATE field)
>    Physical and virtual message sizes
>    Message's parsed MIME structure, allowing to quickly read only a specific MIME part (IMAP's FETCH BODY[1.2.3] command)
>    IMAP's BODY and BODYSTRUCTURE fields
>        If both are used, only BODYSTRUCTURE is saved, since BODY can be generated from it
>    IMAP's ENVELOPE isn't cached currently. Instead the headers used to build it are cached directly.

I also like the ability to drop old cached data that is no longer
needed, and the adaptive behaviour that depends on how the IMAP
clients work:
http://wiki2.dovecot.org/IndexFiles
http://wiki2.dovecot.org/Design/Indexes/Cache

However, I wonder what happens when a webmail user asks to sort
the mails by sender if the From headers are not all cached!
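
To show the case I have in mind, here is a small sketch of a sort by
sender with a fallback for messages whose From header is missing from
the cache. Everything in it is hypothetical: the dict-based cache, the
read_headers_from_disk callable (standing in for opening the message
file) and the sort key, which is a simplification of what IMAP SORT
actually compares.

```python
from email.parser import HeaderParser
from email.utils import parseaddr


def sender_key(uid, cache, read_headers_from_disk):
    """Return a lowercase sort key for the sender of message `uid`.

    cache: dict mapping uid -> raw cached header text (From may be absent).
    read_headers_from_disk: hypothetical callable(uid) -> full raw headers.
    """
    sender = HeaderParser().parsestr(cache.get(uid, "")).get("From")
    if sender is None:
        # Cache miss: the expensive path, one message file read per miss.
        full = HeaderParser().parsestr(read_headers_from_disk(uid))
        sender = full.get("From", "")
    return parseaddr(sender)[1].lower()


def sort_by_sender(uids, cache, read_headers_from_disk):
    """Sort uids by sender address, reading from disk only on cache misses."""
    return sorted(
        uids, key=lambda uid: sender_key(uid, cache, read_headers_from_disk)
    )
```

With a large mailbox and a cold cache, every miss becomes a separate
file read, which is the cost I am wondering about.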

Regards,
Sébastien

