seen_db format
Ken Murchison
ken at oceana.com
Tue Sep 27 12:44:14 EDT 2005
Raymond Sundland wrote:
> I was looking at implementing a Learn Spam / Learn Ham feature on my
> server. Basically, I’ll have a cron job read users’ Learn Spam
> folders and use SpamAssassin’s learn function. Pretty basic stuff,
> nothing magical going on here. SpamAssassin’s Bayesian learn function,
> however, requires that you also teach it what ham is, so I want to scan
> the user’s inbox for ham as well.
>
> So here’s the trick:
>
> I want to read the seen-state db of the user’s inbox to make sure the
> user has “seen” the message. The code will assume that if the user has
> seen the message and has not moved it to the Learn Spam folder within a
> time period (say 3 hours), the message is ham, and will learn it as such.
>
> I changed the seenstate_db parameter in imapd.conf to use flat files so
> I could look at the file format, and got this:
>
> 7b4434cf433945c5 1 1127830200 1 1127829445 1
>
> I don’t plan to keep it as a flat file; I was going to convert it back to
> skiplist and use the Perl CPAN module Algorithm::SkipList to read the
> skiplist instead. Here’s what I make of the entry so far, but I would
> like confirmation of what each field means:
>
> 7b4434cf433945c5 - hash of the file, but I can’t figure out what kind of
> hash this is
>
> 1127830200 - last time the message was viewed
>
> 1127829445 - either the first time the message was viewed, the time it
> was entered into the db, or something else ;)
>
> As for the ‘1’s, I assume at least one of these has to do with the fact
> that the message is stored as file “1.” in the user’s inbox, but I don’t
> know what the others denote. Hence the question.
>
> Can anyone shed light on this for me?
Look at doc/internal/database-formats.html in the Cyrus distro.
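For a quick check against the sample entry above, here is a minimal Python sketch for decoding one flat-file record. It assumes the value layout is version, last-read time, last UID, last-change time, then the list of seen UIDs (the order Cyrus 2.x's seen_db.c writes), and that the key is the mailbox's unique id rather than a hash of a file; confirm both against database-formats.html for the release you run. The names SeenRecord, parse_seen_line and uid_is_seen are illustrative only.

```python
# Sketch: decode one Cyrus seen-state record from a flat-file seen db.
# Assumption: value = "<version> <lastread> <lastuid> <lastchange> <seenuids>"
# (Cyrus 2.x seen_db.c order) and the key is the mailbox unique id.
from dataclasses import dataclass


@dataclass
class SeenRecord:
    uniqueid: str    # key: mailbox unique id (assumed, see above)
    version: int     # record format version
    lastread: int    # Unix time the user last read the mailbox
    lastuid: int     # last UID recorded by the seen state
    lastchange: int  # Unix time the seen state last changed
    seenuids: str    # IMAP sequence set of seen UIDs, e.g. "1:5,7"


def parse_seen_line(line: str) -> SeenRecord:
    """Split one 'key<whitespace>value' line into its fields."""
    key, version, lastread, lastuid, lastchange, *rest = line.split()
    return SeenRecord(key, int(version), int(lastread), int(lastuid),
                      int(lastchange), " ".join(rest))


def uid_is_seen(seenuids: str, uid: int) -> bool:
    """True if uid falls inside a sequence set such as '1:5,7'."""
    for part in filter(None, seenuids.split(",")):
        first, _, last = part.partition(":")
        lo, hi = int(first), int(last or first)
        if min(lo, hi) <= uid <= max(lo, hi):
            return True
    return False
```

With the sample line from the question, parse_seen_line gives lastread 1127830200, lastuid 1, lastchange 1127829445 and seenuids "1", so uid_is_seen(rec.seenuids, 1) is True.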
> Also, if I were to use the Perl module to open the seen state db quickly
> to read entries, could this corrupt the seen information?
It shouldn't, since Cyrus allows simultaneous access to the same mailbox
anyway, but it's always safer to get this information in protocol (via IMAP).
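Getting the seen state in protocol could look roughly like the sketch below, which uses only Python's standard imaplib. The host, credentials and mailbox name are placeholders, and how a cron job authenticates for each user (per-user passwords or some admin login your server supports) is left open. The mailbox is selected read-only so nothing, including the Seen flag, gets modified; seen_uids and parse_uid_search are hypothetical helper names.

```python
# Sketch: list the UIDs flagged Seen in a mailbox over IMAP (stdlib only).
# Host, user, password and mailbox are placeholders supplied by the caller.
import imaplib


def parse_uid_search(data: list) -> list[int]:
    """Turn an imaplib UID SEARCH payload such as [b'1 4 9'] into ints."""
    uids = []
    for chunk in data:
        if chunk:  # skip None / empty results
            uids.extend(int(tok) for tok in chunk.split())
    return uids


def seen_uids(host: str, user: str, password: str,
              mailbox: str = "INBOX") -> list[int]:
    """Return the UIDs carrying the Seen flag in `mailbox`."""
    conn = imaplib.IMAP4_SSL(host)
    try:
        conn.login(user, password)
        # readonly=True issues EXAMINE, so no flags are changed
        typ, _ = conn.select(mailbox, readonly=True)
        if typ != "OK":
            raise RuntimeError(f"cannot open {mailbox!r}")
        typ, data = conn.uid("SEARCH", "SEEN")
        if typ != "OK":
            raise RuntimeError("UID SEARCH SEEN failed")
        return parse_uid_search(data)
    finally:
        conn.logout()
```

Going through IMAP this way avoids depending on the on-disk seen db format, which may change between Cyrus releases.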
--
Kenneth Murchison Oceana Matrix Ltd.
Software Engineer 2495 Main St. - Suite 401
716-604-0088 x26 Buffalo, NY 14214
--PGP Public Key-- http://www.oceana.com/~ken/ksm.pgp
More information about the Info-cyrus mailing list