Re-designing cyrus.cache format

Bron Gondwana brong at fastmail.fm
Tue Feb 12 04:44:42 EST 2013


One of the perennial topics on #cyrus is "what about a more configurable set of cached headers".

There are a couple of things about that.  One is that the current cache format is "interesting".  Here's one cache record in my "dumper" format:

------------------------------------------------
ENVELOPE: ("Sun, 01 Jan 2012 06:00:01 +0300" "jabber.ru mailing list memberships reminder" ((NIL NIL "mailman-owner" "jabber.ru")) ((NIL NIL "mailman-bounces" "jabber.ru")) ((NIL NIL "mailman-owner" "jabber.ru")) ((NIL NIL "brong" "fastmail.fm")) NIL NIL NIL "<mailman.137.1325383201.14742.mailman at jabber.ru>")
BODYSTRUCTURE: ("TEXT" "PLAIN" ("CHARSET" "us-ascii") NIL NIL "7BIT" 1070 23 NIL NIL NIL NIL)
BODY: ("TEXT" "PLAIN" ("CHARSET" "us-ascii") NIL NIL "7BIT" 1070 23)
SECTION: 0:(0:2218 2218:1070 4294901760) (0:2218 2218:1070 0) ()
HEADERS: X-Spam-score: 2.4
X-Spam-hits: BAYES_50 0.8, DCC_CHECK 1.5, RP_MATCHES_RCVD 0.1, BAYES_USED user,
  SA_VERSION 3.3.1
X-Spam-source: IP='79.137.226.13', Host='mx.jabber.ru', Country='RU', FromHeader='ru',
  MailFrom='ru'
X-Resolved-to: brong at fastmail.fm
X-Delivered-to: brong at fastmail.fm
X-Mail-from: mailman-bounces at jabber.ru
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Message-ID: <mailman.137.1325383201.14742.mailman at jabber.ru>
Precedence: bulk
List-Id: <mailman.jabber.ru>
Errors-To: mailman-bounces at jabber.ru
X-Truedomain-Domain: jabber.ru
X-Truedomain-SPF: Pass
X-Truedomain-DKIM: Pass
X-Truedomain: Neutral

FROM: <mailman-owner at jabber.ru>
TO: <brong at fastmail.fm>
CC: 
BCC: 
SUBJECT: "jabber.ru mailing list memberships reminder"
------------------------------------------------

As you can see, there are some normalised things from some headers.  The same information is normalised in a DIFFERENT way in the ENVELOPE, and then there's a BODYSTRUCTURE and a BODY response on top of that.

We have already changed the normalisation rules here a couple of times.

There are two benefits to caching all this pre-parsed data.

1: reduced CPU usage re-parsing the fields for fast responses.
2: reduced IO because .cache files are a single file, so readahead benefits apply.

Really, "2" is the only thing of value these days.  Pretty much the entire benefit of the cyrus.cache is reduced IO compared to mapping in each message file.

So - I would propose this:

1) keep the BODYSTRUCTURE, it's the result of parsing the entire message, and can't be calculated cheaply again
2) keep the SECTION data (possibly along with the bodystructure) - it's the offsets for the various parts of the message, same issue
3) add a list of "SUPPRESSED HEADERS".  This would list any header which is present in the file, but NOT in the cache.
4) cache every other header, including all the To:, From:, Subject:, etc - in as close to raw form as possible.

The entire list of headers to suppress would initially be:

received
dkim-signature
domainkey-signature
domainkey-x509

But it would be configurable as an imapd.conf option.
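To make the split concrete, here is a rough sketch in Python of how a raw header block might be divided into the part that gets cached and the list of suppressed names.  This is not Cyrus code: the function name `split_headers`, the `DEFAULT_SUPPRESS` constant and the return shape are all made up for illustration, and only the four header names come from the list above.

```python
# Hypothetical sketch; names and record layout are illustrative only.
DEFAULT_SUPPRESS = frozenset({
    "received",
    "dkim-signature",
    "domainkey-signature",
    "domainkey-x509",
})


def split_headers(raw, suppress=DEFAULT_SUPPRESS):
    """Return (cached_text, suppressed_names) for a raw header block.

    Fields whose lowercased name is in `suppress` are left out of the
    cached text; their names are recorded once each, in order of first
    appearance.  Everything else is kept as-is, including folded
    continuation lines.
    """
    cached = []
    suppressed = []
    keep = True
    for line in raw.splitlines(keepends=True):
        if line[:1] in (" ", "\t"):
            # Continuation line: follows the fate of the field it belongs to.
            if keep:
                cached.append(line)
            continue
        name, sep, _ = line.partition(":")
        if not sep:
            # Not a header line (e.g. the blank line ending the headers).
            break
        lname = name.strip().lower()
        keep = lname not in suppress
        if keep:
            cached.append(line)
        elif lname not in suppressed:
            suppressed.append(lname)
    return "".join(cached), suppressed
```

Because the suppress set is just a parameter, changing the configured list only affects records written afterwards; older records still carry their own suppressed list.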

NOTE: you can still infer the presence or absence of any header just by querying the suppressed list - anything that is neither in the cache nor on that list isn't in the message at all.  For many messages the entire suppressed list would just be 'received'.
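As a small illustration of that inference, the check might look something like the following Python sketch.  The function name `header_status` and the idea of passing the names in as sets are assumptions for the example, not an existing Cyrus interface.

```python
def header_status(name, cached_names, suppressed_names):
    """Classify a header using only the cache record.

    Both collections are assumed to hold lowercased header names.
    """
    key = name.lower()
    if key in cached_names:
        return "cached"      # value can be served straight from the cache
    if key in suppressed_names:
        return "suppressed"  # present, but only in the message file
    return "absent"          # not in the message at all
```

Only the "suppressed" case ever needs the message file to be opened.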

This should take fairly similar space to what we have now, be more flexible, and be more future-proof.  No matter how you want to parse the fields, the original values are what you've got!  Even if you change the list of headers you suppress, each cache record is complete in itself, so there's no loss of fidelity.

It means a little more CPU to calculate the ENVELOPE, but seriously... I don't think it's a worry in the current world, and it's not so commonly requested anyway.
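For a feel of what "calculate the ENVELOPE" from raw cached headers involves, here is a minimal Python sketch using the standard library's email package.  The helper names `addr_list` and `envelope` are invented for the example, and it deliberately skips things a real implementation would need (group syntax, RFC 2047 handling, IMAP string quoting).  The field order and the rule that Sender and Reply-To fall back to From follow the IMAP ENVELOPE definition.

```python
from email import message_from_string
from email.utils import getaddresses


def addr_list(msg, field):
    """Return (name, adl, mailbox, host) tuples for a field, or None."""
    out = []
    for name, addr in getaddresses(msg.get_all(field, [])):
        if not addr:
            continue
        mailbox, sep, host = addr.rpartition("@")
        if not sep:
            # No "@" at all: treat the whole thing as the mailbox.
            mailbox, host = addr, ""
        out.append((name or None, None, mailbox or None, host or None))
    return out or None


def envelope(raw_headers):
    """Build an ENVELOPE-like tuple from a raw header block."""
    msg = message_from_string(raw_headers)
    from_ = addr_list(msg, "From")
    return (
        msg.get("Date"),
        msg.get("Subject"),
        from_,
        addr_list(msg, "Sender") or from_,    # defaults to From
        addr_list(msg, "Reply-To") or from_,  # defaults to From
        addr_list(msg, "To"),
        addr_list(msg, "Cc"),
        addr_list(msg, "Bcc"),
        msg.get("In-Reply-To"),
        msg.get("Message-ID"),
    )
```

The work is one pass over a handful of short header lines per message, which is the "little more CPU" being traded for not storing a second, pre-normalised copy.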

=====

Thoughts?

Bron.
-- 
  Bron Gondwana
  brong at fastmail.fm


