Wiki, information on database back ends.

Tue Sep 14 18:46:06 EDT 2010

On Tue, Sep 14, 2010 at 02:24:17PM -0500, Patrick Goetz wrote:
> On 09/14/2010 12:26 PM, Dave McMurtrie wrote:
> >
> >The Cyrus wiki's content has been mostly moved to
> >http://www.cyrusimap.org/ except for what I considered to be useless or
> >outdated content.
> >
> 
> Hmmm, the takeaway message is that the wiki is rather light on useful,
> timely content.  <:)
> 
> There's been some discussion on the Debian cyrus list about how to
> automate upgrades from cyrus 2.n.k to cyrus 2.m.j.  Jeroen van
> Meeuwen (on both lists) suggested that the cyrus RPM package
> features a utility called cyrus-imapd.cvt_cyrusdb_all which might be
> useful for this.

Yeah - we were talking about this the other day on instant messaging,
Jeroen and I!  I've written something a bit nicer.  Basically, I
ripped out the "guts" of cvt_cyrusdb and stuck it in lib/cyrusdb.c.
Then I wrote a "detect" function that checks the file's magic and figures
out whether it is berkeley, berkeley-hash or skiplist.
Then for each file it checks if the type matches the configuration
value, and converts if it doesn't.

This runs as part of ctl_cyrusdb -r at startup.
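
A minimal sketch of the detect-and-compare idea, not the actual code in
lib/cyrusdb.c.  The names detect_backend and needs_conversion are made
up for illustration, and the magic values (the skiplist header string,
and the Berkeley DB btree/hash magic numbers at byte offset 12) are
assumptions that should be checked against your Cyrus and libdb sources:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed magic values; verify against your Cyrus and libdb sources. */
#define SKIPLIST_MAGIC   "\241\002\213\015skiplist file"
#define BDB_MAGIC_OFFSET 12
#define BDB_BTREE_MAGIC  0x053162u
#define BDB_HASH_MAGIC   0x061561u

/* Hypothetical helper: return the backend name suggested by the first
 * bytes of a database file, or NULL if the header is not recognised. */
const char *detect_backend(const unsigned char *buf, size_t len)
{
    static const char skip_magic[] = SKIPLIST_MAGIC;
    const size_t skip_len = sizeof(skip_magic) - 1;  /* drop trailing NUL */
    uint32_t magic;

    assert(buf != NULL);

    if (len >= skip_len && memcmp(buf, skip_magic, skip_len) == 0)
        return "skiplist";

    if (len < BDB_MAGIC_OFFSET + sizeof(magic))
        return NULL;

    /* Read the 4-byte magic in host byte order without unaligned access. */
    memcpy(&magic, buf + BDB_MAGIC_OFFSET, sizeof(magic));
    if (magic == BDB_BTREE_MAGIC)
        return "berkeley";
    if (magic == BDB_HASH_MAGIC)
        return "berkeley-hash";

    return NULL;
}

/* Hypothetical helper: convert only when the type was recognised and
 * differs from the configured backend; unknown files are left alone. */
int needs_conversion(const char *detected, const char *configured)
{
    return detected != NULL && strcmp(detected, configured) != 0;
}
```

A caller would read the first few dozen bytes of each database file,
pass them to detect_backend, and run the conversion only when
needs_conversion returns nonzero.  Leaving unrecognised files untouched
is a design choice of this sketch, not something confirmed above.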

> I've been looking at this script, and it mostly appears to be using
> cvt_cyrusdb to convert particular db files to Cyrus skiplists and
> then back again to the original db backend format.  I can't follow
> the script completely as it seems to rely on DB configuration
> details found in the imapd.conf file I don't have in my Debian
> 2.1.16 imap server, and it's also not clear how the script is run.
> 
> This raises a number of questions, though:
> 
> 1.
> Cyrus skiplists?  I thought all the DB files were in Berkeley DB
> format.  I tried to find some documentation on skiplists, but only
> found an old message to the developer list from Bron Gondwana
> discussing skiplist bugs
> (http://markmail.org/message/zbaq765brbg2acfj).

Yes, Cyrus Skiplists.  It's a DB format written entirely inside
Cyrus.  They're quite stable now.  The only real downside is that
the lock is global per database - they don't have any concept of
row locking, so concurrency can suffer.  This usually isn't a
big problem.  At FastMail we've had ALL our databases in skiplist
for a couple of years now.

> On the other hand, this guy talks about converting all Berkeley DB
> files to skiplists because of perceived libdb bugs:
> http://www.mail-archive.com/info-cyrus@lists.andrew.cmu.edu/msg31953.html

I'm currently trying to find someone (either inside Opera or elsewhere)
to help me debug Cyrus' use of BDB and see if we can do it better.
I suspect the BDB problems are more with how we're using it than
with BDB itself.

> Skiplists: what are they, when and why use them?  Either I'm a bad
> googler or documentation seems to be lacking.

lib/cyrusdb_skiplist.c - knock yourself out :)

They're very good for sequential reads - "foreach" and friends.  It's
a very lightweight format, which provides pretty good locality of
data - so it's fairly cache friendly.

> 2.
> The Red Hat cvt_cyrusdb_all script seems to assume a specific set of
> database files.  Is the set of cyrus imap DB files fixed, and if so
> what are they?  Is there any documentation on what each database
> file contains? This would be very useful to people trying to convert
> older cyrus IMAP installations to new ones.

Pretty much, yes.  There are a handful of files - plus the per-user
seen, sub and quota files.  Seen is skiplist and sub is a flat file.
Quota is its own special format.  Here's the listing of the main
databases:

dblist[] = {
    { FNAME_MBOXLIST,       &config_mboxlist_db,    1 },
    { FNAME_QUOTADB,        &config_quota_db,       1 },
    { FNAME_ANNOTATIONS,    &config_annotation_db,  1 },
    { FNAME_DELIVERDB,      &config_duplicate_db,   0 },
    { FNAME_TLSSESSIONS,    &config_tlscache_db,    0 },
    { FNAME_PTSDB,          &config_ptscache_db,    0 },
    { FNAME_STATUSCACHEDB,  &config_statuscache_db, 0 },
    { NULL,                 NULL,                   0 }
};

The only three you really need to care about are mboxlist,
quota and annotations - and of those, quota probably doesn't
exist if you've got "quotalegacy".  By legacy I mean, we use
it - because it has less lock contention and is more reliable.

Anyway.  Discard the ones with '0' in the archive value (the
last column), because they're just caches and the format has
probably changed anyway - but upgrade your mboxlist and
annotation files.
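
As a rough sketch of that rule: walk the table and keep only the
entries whose archive flag is set.  The struct, the short labels and
the function name dbs_to_upgrade are invented for illustration; the
real table holds FNAME_* paths and pointers to the configured backends:

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for the real dblist entries. */
struct db_entry {
    const char *label;  /* illustrative label, not the real FNAME_* path */
    int archive;        /* 1 = real data worth upgrading, 0 = cache */
};

static const struct db_entry dblist[] = {
    { "mboxlist",    1 },
    { "quota",       1 },
    { "annotations", 1 },
    { "deliver",     0 },
    { "tlssessions", 0 },
    { "ptscache",    0 },
    { "statuscache", 0 },
    { NULL,          0 }
};

/* Store the labels of the databases worth upgrading in out[] (at most
 * max of them) and return how many were stored.  Entries with
 * archive == 0 are caches and are skipped. */
size_t dbs_to_upgrade(const struct db_entry *list, const char **out,
                      size_t max)
{
    size_t n = 0;

    assert(list != NULL && out != NULL);

    for (; list->label != NULL; list++) {
        if (list->archive && n < max)
            out[n++] = list->label;
    }
    return n;
}
```

With the table above this picks out mboxlist, quota and annotations;
everything else can simply be removed and left for Cyrus to recreate.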

Skiplist hasn't changed format in approximately forever.
I have considered upgrading it (mainly to add some more
internal integrity checks), but the benefits haven't
outweighed the costs yet.  I did write a skiplist-2 file
format at one point and start playing with it, but that
was years ago.

> 3.
> The discussion of DB backends leads one to wonder if this means
> Berkeley DB or skiplists, or if other backends are used, too?  Is
> there any documentation on this?

There's flat - and Ken added some SQL support (sqlite,
mysql and postgresql) a little while back, though I
haven't tested it yet.

No, there's not much documentation.  I'm working on
fixing that too.  I wrote up an outline of what I want
to document on the old wiki - not sure if it's been
ported across, but I have a copy in my email as well.
I'll paste it below.

Bron.

====================================================

Here's an overview of what needs to be documented.

---++ On Disk Format

   * mailbox

      * cyrus.header

      * cyrus.index

      * cyrus.cache

      * cyrus.squat (stub for now)

   * message files (rfc822)

   * file naming

      * dir hashing algorithms
      * config variables (including partitions)

      * domain split

   * db subformats

      * quota

      * seen

      * sub

      * mboxlist

      * deliver

      * annotations

      * statuscache

   * sieve

   * sync log files

   * proc files

   * "special" - shutdown, etc.

   * db formats: skiplist, flat, berkeley, quotalegacy

---++ Locking

   * name locks

   * cyrus.index locks

   * deadlock prevention

---++ Index API

   * how it works

   * how the "client view" is kept in sync

---++ Replication

   * wire format (dlist)

   * full protocol overview

   * locking considerations

   * sync_crc - calculation and purpose

   * split brain recovery

---++ Reconstruct

   * how it works now

   * flags and purpose (also, man page)

---++ mbdump

   * still needs to be rewritten to use dlist!

   * incremental dumps

---++ Internal APIs

   * seqset_

   * buf_

   * charset_

   * prot_

There's lots of stuff that needs to be either documented
or updated to make Cyrus development viable for people
who aren't Bron right now.  Lots has changed!