Seen databases

Rob Mueller robm at fastmail.fm
Wed Apr 28 18:44:54 EDT 2010


> So - in most cases there will be no $user.seen file any
> more.  I'm wondering if there is actually any benefit in
> supporting three different operating modes for seen, or
> if we should standardise on one. The choices are either
> seen_db (advantage - less can go corrupt if anything
> goes wrong) or seen_bigdb (advantage - only one file,
> reduces the "stat" call and inode caching cost)

I think in this case, reducing options is a good idea. seen_local is legacy 
anyway, and why keep two other options when, as far as I can tell, everyone 
only ever uses one of them?

So I'd say go with one option. It means there are fewer variables and fewer 
things to debug.

Whether to go with seen_db or seen_bigdb is trickier. seen_db is what almost 
everyone uses now, but seen_bigdb seems almost sane, since in most cases the 
user's own seen state will be in the cyrus.index.

There's one issue with seen_bigdb though: you really would have to use a 
real DB (e.g. bdb or skiplist), not the text file db.

The other issue I can see is that the seen db is indexed by folder uniqueid. 
How "unique" are folder IDs? They're generated in a pretty ad hoc fashion, and 
it's always scared me that it might be too easy to generate clashes 
(especially when restoring from backups), which would be especially bad for a 
seen_bigdb.
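
To make the clash worry concrete, here's a rough Python sketch. It is a toy 
model, not Cyrus code: the SeenStore class, its methods, the (userid, 
uniqueid) key layout and the ID value are all made up for illustration. It 
only shows what happens when two different folders end up with the same 
uniqueid.

```python
# Toy model only: not Cyrus code. Seen state is kept per user and keyed by
# folder uniqueid, which is an assumed simplification of the real layout.

class SeenStore:
    def __init__(self):
        # (userid, folder uniqueid) -> set of seen message UIDs
        self._records = {}

    def mark_seen(self, userid, uniqueid, uid):
        self._records.setdefault((userid, uniqueid), set()).add(uid)

    def seen_uids(self, userid, uniqueid):
        # Return a copy so callers cannot mutate the stored record.
        return set(self._records.get((userid, uniqueid), set()))


store = SeenStore()

# Two different folders that ended up with the same uniqueid, for example
# after a restore from backup (hypothetical value).
inbox_id = "4a3f0c1e2b5d6f70"
restored_id = "4a3f0c1e2b5d6f70"

store.mark_seen("rob", inbox_id, 17)

# The restored folder has never been opened, yet it reports UID 17 as seen,
# because both folders map to the same record.
clash = store.seen_uids("rob", restored_id)
```

Nothing in the store can tell the two folders apart once the IDs match, so 
the only real protection is making sure the IDs never clash in the first 
place.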

Rob


