Fixing rename properly
brong at fastmail.fm
Tue May 31 20:41:51 EDT 2016
I got paged at 3am last night due to a replication mess caused by a complex set of renames and possibly partial rename failures (I haven't read the logs to be sure).
The underlying issue was folders with duplicate UNIQUEIDs, so yeah - probably a failed rename. Fine.
But the root cause here is that folder locations and renames are bogus:
a) filename paths contain folder names, which not only restricts the valid names depending on the platform, but also carries all the shell quoting risks you might imagine.
b) renaming folders involves a bunch of IO, and we wound up removing the fast rename codepath because it's messy with subfolders and was never really safe.
The solution here is to store files on disk under the folder's uniqueid, and update the replication protocol to replicate folders by uniqueid. It also needs mailboxes.db changes, but the $RACL$ work has already set up a nice way to do that.
The migration path is easy. Existing paths stay the same. If any folder is the source of a rename, it gets converted first and then the rename happens. All new folders get uniqueid paths. Then you can just run a task that walks the folders and does exactly what rename does now: link the files across, update mailboxes.db to record that the folder is uniqueid-pathed, and remove the old files. Should be pretty easy.
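A minimal Python sketch of that migration task for one folder, assuming the folder is a plain directory of files whose subdirectories are other folders (as in the current layout). `MailboxRecord`, its `uniqueid_path` flag and `migrate_folder` are hypothetical stand-ins, not Cyrus code; the flag represents the mailboxes.db update. The order (link, then update the record, then unlink) is chosen so that a crash at any step leaves a complete copy that the record points at, and a rerun finishes the cleanup.

```python
import os
from dataclasses import dataclass


@dataclass
class MailboxRecord:
    """Hypothetical stand-in for a mailboxes.db entry."""
    name: str
    uniqueid: str
    uniqueid_path: bool = False  # hypothetical flag: folder stored by uniqueid?


def migrate_folder(rec: MailboxRecord, old_dir: str, new_dir: str) -> None:
    """Move one folder's files to its uniqueid directory the way rename does:
    hard-link them across, mark the record, then remove the old copy."""
    if not rec.uniqueid_path:
        os.makedirs(new_dir, exist_ok=True)
        for name in sorted(os.listdir(old_dir)):
            src = os.path.join(old_dir, name)
            dst = os.path.join(new_dir, name)
            # Only plain files belong to this folder; subdirectories are
            # subfolders and get migrated on their own turn.
            if os.path.isfile(src) and not os.path.exists(dst):
                os.link(src, dst)  # hard link: no message data is copied
        rec.uniqueid_path = True  # hypothetical mailboxes.db write
    # Remove the old copy; safe to repeat if a previous run crashed here.
    if os.path.isdir(old_dir):
        for name in os.listdir(old_dir):
            path = os.path.join(old_dir, name)
            if os.path.isfile(path):
                os.unlink(path)
        if not os.listdir(old_dir):  # keep it while subfolders still live here
            os.rmdir(old_dir)
```

Subfolder directories are deliberately left in place, which is exactly the "messy with subfolders" part of today's rename: the old directory can only disappear once everything under it has been converted.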
We'll need to fix a bunch of tools of course. And we'll want to store the name history in cyrus.header so reconstruct can still fix things.
OK, file naming. My plan is this:
* screw 'domain/*', it's horrible. Usernames are user@domain, and filesystems cope fine with that.
* user paths are $spool/user/$username/$uniqueid
* shared paths are $spool/$toplevel/$uniqueid
(yeah, so you can't have more than 32k folders in a single user or single toplevel, since they all become subdirectories of one directory; I think that's OK)
Hashing, if enabled, is still on the first letter of the second folder (here, the username), so my INBOX would be:
/var/spool/imap/b/brong@fastmail.fm/48902a4f-73c2-4e0f-ad4a-3e324fd33853/
This does mean that you can't move folders between users without doing a copy. I think that's OK: it's the 1% case against the 99% case. I would actually happily just reject those outright and make the user create a new folder with a new UNIQUEID, copy the data, and delete the old one.
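Because the path depends only on the owner and the uniqueid, the rename policy described above becomes simple. The sketch below is hypothetical (`Folder`, `disk_path`, `rename_folder` and `CrossOwnerRename` are illustrative names, not Cyrus code) and models only user folders: a same-owner rename changes just the name, which would be a mailboxes.db update with no file IO, while a cross-owner move is rejected outright.

```python
import os
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Folder:
    owner: str     # username, e.g. "brong@fastmail.fm" (hypothetical model)
    name: str      # folder name, e.g. "INBOX.Archive"
    uniqueid: str


class CrossOwnerRename(Exception):
    """A move that would need a data copy under the uniqueid layout."""


def disk_path(spool: str, folder: Folder) -> str:
    # $spool/user/$username/$uniqueid: the folder name is not part of the path.
    return os.path.join(spool, "user", folder.owner, folder.uniqueid)


def rename_folder(folder: Folder, new_owner: str, new_name: str) -> Folder:
    """Same-owner renames change only the name (a metadata update);
    cross-owner moves are refused so the user copies into a new folder."""
    if new_owner != folder.owner:
        raise CrossOwnerRename(
            f"{folder.name!r} belongs to {folder.owner!r}; create a new folder "
            f"for {new_owner!r} and copy the messages instead")
    return replace(folder, name=new_name)  # same uniqueid, same directory
```

Keeping the uniqueid on rename is what makes the disk path stable; a cross-owner copy would instead get a fresh UNIQUEID, so two folders can never end up sharing one.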
For later consideration - storing all the individual mail messages in a per-user GUID pool instead of by UID. Same basic logic of not duplicating work or doing extra IO, but there's more to think about here.
brong at fastmail.fm