uniqueid based paths (was: Minutes 16/3)

Thomas Jarosch thomas.jarosch at intra2net.com
Mon Mar 23 07:09:23 EDT 2015


Hi Bron,

On Saturday, 21. March 2015 06:32:39 Bron Gondwana wrote:
> Pretty sure we've already had this dance.  I did build a half-way-house
> solution a few years ago, which looked like this:
> 
> b/user/brong/user.brong.subfolder/
> 
> It had the advantage that no email folder contained another folder, so you
> could rename the folder on disk with a rename(2) call.
> 
> But it was still racy, it had filesystem length constraints, and is still
> more expensive than the uniqueid-based rename.

Interesting concept. I think the filesystem length constraints are the
biggest issue with this solution. Some people go crazy with subfolder
structures.

Anyway, let's skip this part ;)

> But we haven't forgotten about backup and restore.
> 
> > My colleague Gerd v. Egidy wrote this the last time on the topic:
> > -------------------
> > Both are binary structures which are cyrus version dependent and require
> > a working cyrus installation of this version to be of any value. This
> > means selecting the files to restore from backup has to be tightly
> > integrated with a running cyrus which really complicates things.
> 
> We are now going to be storing the ENTIRE mailboxes.db value in
> cyrus.header, to the point that mailboxes.db can be lost, and you can
> just run a reconstruct command to read the filesystem and rebuild it from
> cyrus.header.  Yay.

Nice.

> > It is also a big burden for version-independent long-term storage.
> > Currently I can take a snapshot from a 10 year old cyrus version, throw
> > away all mailboxes.db and cyrus.headers, and recreate a working cyrus
> > 2.4 installation  from that just using simple shell scripts. I lose
> > flags, acls and other metadata, but I get all the mails and folders,
> > and that is what counts most for the users.
> 
> Yeah, or you could use the complete, incremental backup format that Ellie
> is working on, based on my discussions with David Carter who wrote the
> original replication system and various mailing list threads, plus the
> backup system which works at FastMail already.
> 
> Then you don't lose flags, acls and other metadata either.

Starting back in the Cyrus 2.1.x days, we had severe problems with
corrupted cyrus.header files when restoring from backup. Those files
had been written by faulty hardware and then crashed reconstruct on
restore, so we skipped them entirely.

Last week we again had a problem that made reconstruct 2.4.17 choke:
somehow an entry in cyrus.index was broken. I watched it via strace:

reconstruct looped through all messages in the folder and mmapped
each message. The entry in cyrus.index seems to contain a binary zero
at its beginning, because the filename reconstruct tried to open was
just the folder name with no message file at all. mmap returned an
error and reconstruct bailed out.
I think I reported this before, but I'm not sure.
The fix was to throw away cyrus.index and do another reconstruct run.

Will the new backup format be text-based or binary?
Sorry, I somehow missed the discussion about that.

> > If we switch to uniqueid based paths, what about providing symlinks to a
> > "virtual" directory structure? Updating symlinks would probably not be
> > atomic since we can't do a cyrus.header update and update a symlink at
> > the same time. Though we could only complete the "transaction" in
> > mailboxes.db when all steps are done. So this could work even when
> > imapd is shut down in the middle of a rename.
> 
> We already had this discussion too.  This happens really rarely.  A better
> solution is a human readable and machine parseable version of the folder
> name in the cyrus.header.
> 
> How often is your cyrus instance totally down?  Not very.  And it will be
> easy to have a tool which can parse the mailboxes.db or cyrus.headers to
> find the mailbox you need.

Tech support is another use case here.

People often call and say they can't find a message in some folder
or that some Kolab data is not up to date. We ssh into the machine
and then use e.g. Midnight Commander to browse the user's folders.

With uniqueid-based paths, it won't be easy to use Unix tools to grep
the message store of a single user only. You first need to work out
the list of that user's folders and then limit your "view" to that
folder list.

A tool that lists the user folder -> uniqueid-based path mapping
in a machine-parsable (= scriptable) way might help here.
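
A minimal sketch of what such a tool could print, assuming the
folder name -> uniqueid mapping has already been read from mailboxes.db
by some other means (not shown). The function name and the
"uuid/<uniqueid>" spool layout are hypothetical placeholders, not the
actual on-disk format:

```python
import posixpath


def list_user_folders(mapping, user, spool_root="/var/spool/cyrus"):
    """Return 'folder<TAB>path' lines for all folders of one user.

    mapping: dict of internal mailbox name -> uniqueid (assumed input).
    The uuid/<uniqueid> directory layout is an illustrative assumption.
    """
    prefix = "user." + user
    lines = []
    for name in sorted(mapping):
        # Match the user's INBOX and its subfolders, but not e.g. "user.brongx".
        if name == prefix or name.startswith(prefix + "."):
            path = posixpath.join(spool_root, "uuid", mapping[name])
            lines.append(name + "\t" + path)
    return lines
```

The tab-separated output can be fed straight into cut, xargs or grep
to restrict a search to the folders of a single user.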

> > What about writing out the folder information from cyrus.header
> > also as JSON text files? This file is not interpreted by cyrus normally,
> > but it could be used by other tools and aid long term storage of
> > backups.
> 
> So JSON vs DLIST is an interesting question that I could be swayed on.
> DLIST is more IMAPpy, but that's not necessarily the most important thing.

JSON has the advantage that it's known to many people already
and is easily supported by many programming languages.
Not sure about DLIST in this regard.
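
To illustrate the point, here is a small sketch of writing and reading
such a side file with nothing but a standard library. The field names
used in the example ("name", "uniqueid", "acl") are made up for
illustration and are not the real cyrus.header contents:

```python
import json


def folder_info_to_json(info):
    """Serialize folder metadata to stable, human-readable JSON text."""
    # sort_keys gives a deterministic layout, which keeps backup diffs small.
    return json.dumps(info, indent=2, sort_keys=True) + "\n"


def folder_info_from_json(text):
    """Parse the JSON text back into a plain dict."""
    return json.loads(text)
```

Any other tool can then read the file back without linking against
Cyrus code.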

> Now annotations.db is somewhat interesting.  I'm really tempted to store
> all the mailbox annotations in a different file within the mailbox,
> potentially "as well". The advantage of a central annotations.db is speed
> of looking up an annotation on every single mailbox - but that's what
> statuscache and friends are for. You find yourself paying the mailbox
> open price if you want to look at virtual annotations like 'dupdeliver'
> anyway.
> 
> Then both mailboxes.db AND annotations.db would be rebuildable from just a
> spool copy.

Speed of lookup might be a concern, especially for Kolab groupware users.
Some Kolab clients collect the Kolab "folder type" on application startup by
querying all the annotations for the user at once. This is fast right now.


One more thing regarding uniqueid paths and the duplication
of metadata in mailboxes.db + cyrus.header:

- Do you plan on doing any (periodic) sanity check
  that the metadata in mailboxes.db matches cyrus.header?

  If such a sanity check fails, how is it going to be resolved?
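
  Just to make the question concrete, a rough sketch of the detection
  half of such a check. It assumes both sources have already been parsed
  into dicts of mailbox name -> metadata (the parsing and the function
  name are hypothetical), and it deliberately leaves the resolution
  question open:

```python
def compare_metadata(db_entries, header_entries):
    """Return (mailbox, problem) tuples for every disagreement.

    db_entries / header_entries: dicts of mailbox name -> metadata,
    assumed to be parsed from mailboxes.db and cyrus.header elsewhere.
    """
    problems = []
    for name in sorted(set(db_entries) | set(header_entries)):
        if name not in header_entries:
            problems.append((name, "missing in cyrus.header"))
        elif name not in db_entries:
            problems.append((name, "missing in mailboxes.db"))
        elif db_entries[name] != header_entries[name]:
            problems.append((name, "mismatch"))
    return problems
```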


Cheers,
Thomas


