Cyrus 2.5 status

Sat Jan 10 12:52:35 EST 2015

> > > I would put the history of names that it had in the cyrus.header too,
> > > so it can be recovered.
> > 
> > This will only help finding the correct name when you have a folder, but
> > not help finding a folder when you only got the path+name.
> 
> I don't understand what you mean.  From the cyrus.header file you can easily
> see the folder name.  From the uniqueid in the mailboxes.db (new format),
> you can figure the path name.  It will be something like `cyr_info datapath
> $foldername` to convert that into a path.

Suppose all you have is the filesystem data of a mailbox store in your backup,
and no working Cyrus installation that could parse a mailboxes.db. How do you
find the filesystem path of the specific folder you are looking for?

> It will be possible to create the recovery unless cyrus.header is corrupted
> AND mailboxes.db is corrupted, in which case you would need to inspect the
> individual emails to see who owned them.

Both are binary structures which depend on the Cyrus version and require a
working Cyrus installation of that version to be of any value. This means
that selecting the files to restore from backup has to be tightly integrated
with a running Cyrus, which really complicates things.

It is also a big burden for version-independent long-term storage. Currently I
can take a snapshot from a 10-year-old Cyrus version, throw away all
mailboxes.db and cyrus.header files, and recreate a working Cyrus 2.4
installation from that using just simple shell scripts. I lose flags, ACLs and
other metadata, but I get all the mails and folders, and that is what counts
most for the users.
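
A minimal sketch of the first step of such a recovery, written in Python
rather than shell for brevity. It assumes the current layout, where the
directory path below the spool mirrors the folder name and message files are
named "<uid>." (digits followed by a dot). The function name `list_folders`
is made up for this example, and hashed spool directories are not handled.

```python
import os
import re

# Assumption: message files are named "<uid>." (digits followed by a dot).
MESSAGE_FILE = re.compile(r"^\d+\.$")


def list_folders(spool):
    """Return [(folder_name, message_count)] for every directory below spool.

    Nothing Cyrus-specific is parsed: cyrus.header, cyrus.index and
    mailboxes.db are simply ignored.
    """
    folders = []
    for dirpath, dirnames, filenames in os.walk(spool):
        dirnames.sort()  # deterministic, parent-before-child order
        rel = os.path.relpath(dirpath, spool)
        if rel == os.curdir:
            continue  # the spool root itself is not a folder
        count = sum(1 for name in filenames if MESSAGE_FILE.match(name))
        folders.append((rel.replace(os.sep, "/"), count))
    return folders
```

Intermediate directories such as "user" show up with a count of 0 and can be
filtered out by whatever recreates the folders afterwards.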

> > It is also a big plus for debugging & maintenance purposes to be able to
> > directly see the folders on the filesystem. E.g. doing a grep, lsof, ftop,
> > ncdu,...
> 
> Sure.  It's getting in the way of some serious optimisation possibilities
> though

All I'm asking for is to keep the folder structures intact as additional 
symlinks.

Let's say you have mail folders like this:

user/foo
user/foo/bar
user/foo/bar/baz

If I understood you correctly, this would become something like this on disk:

/var/spool/imap/uuid/1/234
/var/spool/imap/uuid/6/789
/var/spool/imap/uuid/a/bcd

I think adding something like this would not really hinder or complicate 
optimization much:

/var/spool/imap/user/foo -> ../uuid/1/234

/var/spool/imap/uuid/1/234/... -> ../../../user/foo
/var/spool/imap/uuid/1/234/bar -> ../../6/789

/var/spool/imap/uuid/6/789/... -> ../../1/234
/var/spool/imap/uuid/6/789/baz -> ../../a/bcd

/var/spool/imap/uuid/a/bcd/... -> ../../6/789

This way you can navigate the folder structure up and down with any regular
file manager or shell script in a nearly natural manner.

Whether "..." is the right symlink name for "up one folder" is of course open
to discussion.
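
The symlink layout above can be sketched as follows. This is only an
illustration under assumptions: `link_layout` and the `uuid_dirs` mapping
(folder name to storage directory relative to the spool) are hypothetical
names, and in a real server the mapping would come from wherever the
uniqueid is recorded.

```python
import os

UP = "..."  # name of the "up one folder" link, as floated above


def link_layout(spool, uuid_dirs):
    """Create name -> uuid and parent/child symlinks below spool.

    uuid_dirs maps a folder name such as "user/foo/bar" to its storage
    directory relative to spool, e.g. "uuid/6/789" (hypothetical mapping).
    """
    def symlink(link, target_abs):
        # Relative targets keep the tree usable after restoring it elsewhere.
        os.symlink(os.path.relpath(target_abs, os.path.dirname(link)), link)

    for name, store in sorted(uuid_dirs.items()):
        store_abs = os.path.join(spool, store)
        os.makedirs(store_abs, exist_ok=True)
        parent, _, leaf = name.rpartition("/")
        if parent in uuid_dirs:
            # Sub-folder: child link in the parent, "up" link in the child.
            parent_abs = os.path.join(spool, uuid_dirs[parent])
            os.makedirs(parent_abs, exist_ok=True)
            symlink(os.path.join(parent_abs, leaf), store_abs)
            symlink(os.path.join(store_abs, UP), parent_abs)
        else:
            # Top-level folder: entry point by name, "up" link back to it.
            entry = os.path.join(spool, name)
            os.makedirs(os.path.dirname(entry), exist_ok=True)
            symlink(entry, store_abs)
            symlink(os.path.join(store_abs, UP), entry)
```

Called with the three folders from the example, this produces the same link
targets as listed above, so user/foo/bar/baz resolves to uuid/a/bcd.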

> while not
> needing to add tons of complex rollback code to rename.

You propose to keep the data in cyrus.header and mailboxes.db - these are 
separate databases. So you still need to keep some kind of rollback in place 
in case the second of these writes fails. Adding a filesystem write & rollback 
to that should not be too big a problem.
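
To illustrate why one more step is cheap once any rollback exists, here is a
generic sketch of the pattern. It is not how Cyrus implements rename;
`run_with_rollback` and the step list are hypothetical.

```python
def run_with_rollback(steps):
    """Run (do, undo) pairs in order; on failure undo the completed ones.

    steps is a list of (do, undo) callables, for example one pair each for
    the cyrus.header write, the mailboxes.db write and the symlink update.
    """
    undo_stack = []
    try:
        for do, undo in steps:
            do()
            undo_stack.append(undo)  # only completed steps get undone
    except Exception:
        for undo in reversed(undo_stack):
            undo()
        raise
```

With this shape, the filesystem write is just a third pair appended to the
list; the rollback logic itself does not change.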

Kind regards,

Gerd