Seen databases
Bron Gondwana
brong at fastmail.fm
Wed Apr 28 22:58:22 EDT 2010
On Thu, Apr 29, 2010 at 08:44:54AM +1000, Rob Mueller wrote:
> Whether to go seen_db or seen_bigdb, that's trickier. seen_db is
> what almost everyone uses now, but seen_bigdb seems almost sane
> since in most cases, the user's own seen state will be in the
> cyrus.index.
That's what I figured...
> There's one issue with seen_bigdb though: you really would have to
> use a real DB (e.g. bdb or skiplist), not the text file db.
Yes, definitely. We use skiplist for seen_db at the moment anyway.
Also seen_db is what most people use, so it's pretty well tested.
> The other issue I can see is that seen db is indexed by folder
> uniqueid. How "unique" are folder IDs? They're generated in a pretty
> ad hoc fashion, and it's always scared me that it might be too easy
> to generate clashes (when restoring from backups especially), which
> would be especially bad for a seen_bigdb.
It doesn't really matter for a seen_bigdb, because they'll be keyed
by user AND uniqueid - meaning they are no more likely to generate
clashes than they were before under seen_db.
Besides, they only matter within the non-user folders now.
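
To make the keying point concrete, here is a minimal sketch. It assumes a composite key built from the userid and the mailbox uniqueid; the NUL separator, the seen_key helper and the dict standing in for a skiplist file are all hypothetical, not the actual on-disk format.

```python
def seen_key(userid: str, uniqueid: str) -> bytes:
    # Hypothetical key layout: userid, a NUL separator, then the uniqueid.
    return userid.encode() + b"\x00" + uniqueid.encode()


# A plain dict stands in for the skiplist (or bdb) file here.
seen_bigdb = {}

# Two users whose folders happen to share a uniqueid still get
# separate records, because the userid is part of the key.
seen_bigdb[seen_key("alice", "4f3a2b1c")] = "1:120"
seen_bigdb[seen_key("bob", "4f3a2b1c")] = "1:87"
```

A uniqueid clash only overwrites seen state if the same user sees both folders, which is the same exposure as one seen_db file per user.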
More interesting is the potential for clashes during replication, which
would generate a rename event across users. That could get super-ugly!
But it's not a high risk - the ad hoc uniqueid is a hash of the folder
name concatenated with the uidvalidity, so you'd have to have a hash
collision and creation at the same second. Restore from backup after
a rename is the disaster case. The best way to protect against that is
to move the cyrus.header data into a central DB and scan it for matches
before creating an entry. Either key an "index" db against the uniqueid
directly, or just do a full table scan. The IMAP "LIST" command already
does a full table scan, so it can't be TOO expensive :)
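
A rough sketch of that check, under stated assumptions: CRC32 stands in for whatever hash is really used, and the MailboxList class, its method names and the in-memory dicts are hypothetical stand-ins for a central DB. It shows both lookup options (an index keyed by uniqueid, and a full scan) and refuses to create an entry whose uniqueid is already taken.

```python
import zlib


def make_uniqueid(name: str, uidvalidity: int) -> str:
    # Assumption: CRC32 stands in for whatever hash Cyrus really uses.
    return format(zlib.crc32(f"{name}{uidvalidity}".encode()), "08x")


class MailboxList:
    """Hypothetical central store for what lives in cyrus.header today."""

    def __init__(self):
        self.by_name = {}      # mailbox name -> uniqueid
        self.by_uniqueid = {}  # optional index: uniqueid -> mailbox name

    def find_by_scan(self, uniqueid):
        # Full table scan over every mailbox entry.
        for name, uid in self.by_name.items():
            if uid == uniqueid:
                return name
        return None

    def find_by_index(self, uniqueid):
        # Direct lookup in the index keyed by uniqueid.
        return self.by_uniqueid.get(uniqueid)

    def create(self, name, uidvalidity, uniqueid=None):
        # A restore from backup would pass the old uniqueid explicitly.
        if name in self.by_name:
            raise ValueError(f"mailbox {name} already exists")
        if uniqueid is None:
            uniqueid = make_uniqueid(name, uidvalidity)
        owner = self.find_by_index(uniqueid)
        if owner is not None:
            raise ValueError(f"uniqueid {uniqueid} already used by {owner}")
        self.by_name[name] = uniqueid
        self.by_uniqueid[uniqueid] = name
        return uniqueid


mboxes = MailboxList()
uid = mboxes.create("user.alice.Sent", 1272495502)

# Simulated restore that tries to reuse an existing uniqueid.
try:
    mboxes.create("user.bob.Restored", 1272495600, uniqueid=uid)
    clash_detected = False
except ValueError:
    clash_detected = True
```

What to do once a clash is detected (reject, or assign a fresh uniqueid) is left open here; the sketch simply rejects.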
Bron.