seen_db format

Raymond Sundland raymond at sundland.com
Tue Sep 27 12:21:18 EDT 2005


Actually, to correct myself: I realized after I sent this that the Perl
module I mentioned does not read skiplists but creates them within Perl.  In
any case, I will just switch to Berkeley DB, since there are indeed Perl
modules I can use, even though BDB is not as fast.  The same questions
apply, however.

 

Thanks again.

 

-----Original Message-----
From: info-cyrus-bounces at lists.andrew.cmu.edu
[mailto:info-cyrus-bounces at lists.andrew.cmu.edu] On Behalf Of Raymond
Sundland
Sent: Tuesday, September 27, 2005 10:22 AM
To: info-cyrus at lists.andrew.cmu.edu
Subject: seen_db format

 

I was looking at implementing a Learn Spam / Learn Ham feature on my server.
Basically, I'll have a cronjob read users' Learn Spam folders and feed them to
SpamAssassin's learn function.  Pretty basic stuff, nothing magical going on
here.  SpamAssassin's Bayesian learn function, however, requires that you also
teach it what ham is, so I want to scan each user's inbox for ham too.
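
A minimal sketch of what the cronjob could run for each folder.  The
`--spam` and `--ham` options are sa-learn's own; the helper name and the
spool paths are made up for illustration and would need to match the real
mailbox layout on the server.

```python
import subprocess
from pathlib import Path


def sa_learn_command(kind, folder):
    """Build the sa-learn argument list for one folder.

    kind is "spam" or "ham"; folder is the directory holding the messages.
    """
    if kind not in ("spam", "ham"):
        raise ValueError("kind must be 'spam' or 'ham'")
    return ["sa-learn", "--" + kind, str(folder)]


def learn(kind, folder):
    """Run sa-learn on the folder (needs SpamAssassin installed)."""
    return subprocess.run(sa_learn_command(kind, folder), check=True)


if __name__ == "__main__":
    # Hypothetical spool locations, shown only as an example.
    print(sa_learn_command("spam", Path("/var/spool/imap/user/alice/Learn Spam")))
    print(sa_learn_command("ham", Path("/var/spool/imap/user/alice")))
```

Building the argument list separately from running it makes it easy to log
or dry-run what the cronjob would do before letting it train anything.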

 

So here's the trick:

 

I want to read the seen-state db of the user's inbox to make sure the user
has "seen" the message.  The code will assume that if the user has seen the
message and has not moved it to the Learn Spam folder within a time period
(say 3 hours), then the message is ham, and will learn it as such.
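
The rule above could be sketched roughly as follows.  The function and
parameter names are made up for illustration, and it assumes the seen flag
and the time the message was seen can be obtained somehow, which is exactly
what the rest of this message is asking about.

```python
from datetime import datetime, timedelta, timezone

# Grace period before a seen message is treated as ham ("say 3 hours").
GRACE_PERIOD = timedelta(hours=3)


def is_ham(seen, seen_at, in_learn_spam, now, grace=GRACE_PERIOD):
    """Return True if a message should be learned as ham.

    seen          -- whether the user has seen the message
    seen_at       -- when it was seen (ignored if seen is False)
    in_learn_spam -- whether the user moved it to the Learn Spam folder
    now           -- current time
    """
    if not seen or in_learn_spam:
        return False
    return now - seen_at >= grace


if __name__ == "__main__":
    now = datetime(2005, 9, 27, 18, 0, tzinfo=timezone.utc)
    seen_at = datetime(2005, 9, 27, 14, 10, tzinfo=timezone.utc)
    print(is_ham(True, seen_at, False, now))
```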

 

I modified the seenstate_db parameter in imapd.conf to use flat files to
take a look at the format of the file and got this:

 

7b4434cf433945c5        1 1127830200 1 1127829445 1

 

I don't plan to keep it as a flat file; I was going to convert it back to
skiplist and use the CPAN Perl module Algorithm::SkipList to read the
skiplist instead.  Here's what I make of the entry so far, but I would like
confirmation as to what each field means:

 

7b4434cf433945c5 - hash of the file, but I can't figure out what kind of
hash this is

1127830200  - last time the message was viewed

1127829445  - either the first time the message was viewed -or-
              the time it was entered into the db -or-
              something else ;)

 

As for the '1's, I assume at least one of these entries has to do with the
fact it's the 1. file in the user's inbox, but I don't know what the others
denote.  Hence the question.
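
One way to sanity-check the timestamp guesses is to split the flat-file line
on whitespace and decode anything that looks like epoch seconds.  This is
only a sketch: the "ten digits means a Unix timestamp" test is a heuristic,
and the fields are reported by position because their real meaning is the
open question here.

```python
from datetime import datetime, timezone

# The entry exactly as it appeared in the flat file.
LINE = "7b4434cf433945c5        1 1127830200 1 1127829445 1"


def split_entry(line):
    """Split a flat-file line into its key and the list of value fields."""
    key, *fields = line.split()
    return key, fields


def as_utc(field):
    """Interpret a numeric field as Unix epoch seconds, in UTC."""
    return datetime.fromtimestamp(int(field), tz=timezone.utc)


key, fields = split_entry(LINE)
print("key:", key)
for position, field in enumerate(fields):
    # Heuristic only: a 10-digit number is assumed to be epoch seconds.
    if len(field) == 10 and field.isdigit():
        print(position, field, as_utc(field).isoformat())
    else:
        print(position, field)
```

Both ten-digit fields decode to times on 27 September 2005, shortly before
the original message was sent, which fits the reading that they are Unix
timestamps.  It does not say which event each one records.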

 

Can anyone shed light on this for me?

 

Also, if I were to use the perl module to open the seen state db quickly to
read entries, could this cause a corruption of the seen information?

 

Thanks.

 
