Seen databases

Ken Murchison murch at andrew.cmu.edu
Tue May 4 15:59:51 EDT 2010


I've been thinking about this for a while and I keep coming back to the 
same answer.

seen_local is legacy and I wouldn't expect to find this in the wild 
anymore.  I don't think we should waste cycles doing anything with it.

I don't recall why seen_bigdb was created by one of my predecessors, but 
it's not used in production at CMU.  I don't think it's the way to go, 
even with your Seen state changes.

The reason is that I think the distributed Seen state offered by seen_db 
is the best for sites with a large number of shared mailboxes, such as 
CMU.  We currently have over 14,000 shared mailboxes that are called 
bulletin boards on campus (used to be a lot more when we also had 
non-binary newsgroups).  And we have tens of thousands of users reading 
these mailboxes and maintaining their own Seen state.  Using the current 
divide-and-conquer approach, where we keep each user's Seen state in a 
separate database, seems the most sane to me, rather than having several 
hundred or thousand handles open to a single database.
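
To make the difference between the two layouts concrete, here is a rough 
sketch in C.  It is not the actual Cyrus code: the function names, the 
"confdir/user/<first letter>/<userid>.seen" path and the "userid/mboxid" 
key format are all assumptions made up for illustration.  The point is 
only that the per-user layout maps each user to their own small file, 
while a single big database has to carry the user in every key.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical helper (per-user layout): build the path of one user's
 * own seen database.  The directory scheme is an assumption.
 * Returns 0 on success, -1 on empty userid or too-small buffer. */
static int seen_path_for_user(const char *confdir, const char *userid,
                              char *buf, size_t buflen)
{
    int n;

    if (!userid || !*userid) return -1;
    n = snprintf(buf, buflen, "%s/user/%c/%s.seen",
                 confdir, userid[0], userid);
    return (n < 0 || (size_t)n >= buflen) ? -1 : 0;
}

/* Hypothetical helper (single-database layout): every record lives in
 * one file, so the key must name both the user and the mailbox.
 * The "userid/mboxid" key format is an assumption.
 * Returns 0 on success, -1 if the key does not fit in buf. */
static int bigdb_key(const char *userid, const char *mboxid,
                     char *buf, size_t buflen)
{
    int n = snprintf(buf, buflen, "%s/%s", userid, mboxid);
    return (n < 0 || (size_t)n >= buflen) ? -1 : 0;
}
```

With the first helper, each reader of a shared mailbox touches only their 
own file; with the second, every process holds a handle on the same 
database, which is the situation I'd like to avoid.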

Any change that will affect the performance or stability of CMU's 
current environment would not be a good thing.


Bron Gondwana wrote:
> At the moment Cyrus appears to support 3 seen backends:
> 
> * seen_local:
>     stores all the seen data for all users in a file in
>     the spool directory.  Legacy.
> 
> * seen_db:
>     as far as I can see, everyone uses this.  It's the only
>     one that replication's SETSEEN_ALL command works with
>     for sure.
> 
> * seen_bigdb:
>     one single database for ALL users' seen data.
> 
> Now - I'm in two minds.  I've already made one HUGE change
> to how seen is handled, in that it's a system_flag in the
> index record for the owner of the mailbox for user.*
> mailboxes now.  Also recentuid is in the index header for
> the owner.  This catches 99% of cases, reducing IO, since
> compulsory CONDSTORE means we're always updating the
> record for seen changes anyway.
> 
> So - in most cases there will be no $user.seen file any
> more.  I'm wondering if there is actually any benefit in
> supporting three different operating modes for seen, or
> if we should standardise on one. The choices are either
> seen_db (advantage - less can go corrupt if anything
> goes wrong) or seen_bigdb (advantage - only one file,
> reduces the "stat" call and inode caching cost)
> 
> For that matter - if we standardised all $user.sub files
> into a subscription.db, we'd cut yet another bunch of
> tiny files.  I'll probably leave that one alone for now,
> since otherwise these changes will get totally out of
> hand...
> 
> Speaking of which, I'm probably due to write another
> update on how my future branch work is going!
> 
> Anyway - the reason I'm writing this is: I can see
> that I'm going to need to provide a "seen_user_foreach"
> API which calls a function with each given seen record
> name... and I'm wondering if I should write 3 or just
> not bother and standardise on one.
> 
> Bron.
> 
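
For reference, a callback-style "seen_user_foreach" along the lines 
described above might look roughly like the sketch below.  This is only 
an illustration over an in-memory array: the record struct, the callback 
signature and the sample callback are assumptions, not an existing Cyrus 
interface, and a real version would walk whichever backend is configured.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical record: which user has seen state for which mailbox. */
struct seen_record {
    const char *userid;
    const char *mboxid;
};

/* Hypothetical callback type: called once per matching record.
 * "rock" is an opaque pointer handed through to the callback.
 * A non-zero return value stops the iteration. */
typedef int seen_foreach_cb(const char *mboxid, void *rock);

/* Call cb for every record belonging to userid.
 * Returns 0 if all records were visited, or the first non-zero
 * value returned by cb. */
static int seen_user_foreach(const struct seen_record *recs, size_t nrecs,
                             const char *userid,
                             seen_foreach_cb *cb, void *rock)
{
    size_t i;

    for (i = 0; i < nrecs; i++) {
        int r;

        if (strcmp(recs[i].userid, userid) != 0) continue;
        r = cb(recs[i].mboxid, rock);
        if (r) return r;
    }
    return 0;
}

/* Sample callback: count the records visited; rock points to an int. */
static int count_cb(const char *mboxid, void *rock)
{
    (void)mboxid;
    ++*(int *)rock;
    return 0;
}
```

If the callback signature is kept backend-neutral like this, callers would 
not need to know which seen backend sits underneath.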

-- 
Kenneth Murchison
Systems Programmer
Carnegie Mellon University

