seen_db format

Raymond Sundland raymond at sundland.com
Tue Sep 27 13:20:41 EDT 2005


Thanks Ken.

Regarding the mailbox UID, how is that UID determined, and is there any way to
backtrack a UID to an actual mailbox (either on the filesystem or in the
user.xxx.folder format)?
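
One possible answer, offered as an assumption rather than anything confirmed in this thread: the 16 hex digits 7b4434cf433945c5 in the record quoted below look like two 8-digit halves. The second half, 0x433945c5 = 1127826885, is a Unix time within an hour of the record's other timestamps, which would fit Cyrus setting a new mailbox's UIDVALIDITY from the clock. If that reading holds, the first half is presumably a hash of the mailbox name and cannot be reversed, but the second half can be matched against the UIDVALIDITY of each of the user's mailboxes (for example, as reported by IMAP STATUS <mailbox> (UIDVALIDITY)). The helper names below are hypothetical:

```python
from typing import Dict, List, Tuple


def split_uniqueid(uniqueid: str) -> Tuple[int, int]:
    """Split a 16-hex-digit mailbox uniqueid into (name_hash, uidvalidity).

    ASSUMPTION: the first 8 hex digits are a hash of the mailbox name and
    the last 8 are the mailbox's UIDVALIDITY.
    """
    if len(uniqueid) != 16:
        raise ValueError("expected 16 hex digits, got %r" % (uniqueid,))
    return int(uniqueid[:8], 16), int(uniqueid[8:], 16)


def candidate_mailboxes(uniqueid: str, uidvalidities: Dict[str, int]) -> List[str]:
    """Names of mailboxes whose UIDVALIDITY matches the uniqueid's second half.

    `uidvalidities` maps mailbox name -> UIDVALIDITY, e.g. gathered with
    IMAP STATUS <mailbox> (UIDVALIDITY) for every mailbox the user owns.
    """
    _, uidvalidity = split_uniqueid(uniqueid)
    return sorted(name for name, uv in uidvalidities.items() if uv == uidvalidity)


if __name__ == "__main__":
    # Only the uniqueid comes from the thread; the UIDVALIDITY map is made up.
    print(split_uniqueid("7b4434cf433945c5")[1])  # 1127826885
    print(candidate_mailboxes("7b4434cf433945c5",
                              {"INBOX": 1127826885,
                               "INBOX.Learn Spam": 1127826990}))  # ['INBOX']
```

Matching on UIDVALIDITY alone is a heuristic, so treat a hit as a candidate: two mailboxes could in principle share a value, and the layout should be checked against the source for your Cyrus version.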

-----Original Message-----
From: Ken Murchison [mailto:ken at oceana.com] 
Sent: Tuesday, September 27, 2005 12:44 PM
To: Raymond Sundland
Cc: info-cyrus at lists.andrew.cmu.edu
Subject: Re: seen_db format

Raymond Sundland wrote:

> I was looking at implementing a Learn Spam / Learn Ham feature on my
> server.  Basically, I'll have a cron job read users' Learn Spam
> folders and run SpamAssassin's learn function on them.  Pretty basic
> stuff, nothing magical going on here.  SpamAssassin's Bayesian learning,
> however, requires that you also teach it what ham is, so I want to scan
> the user's inbox for ham too.
>
> So here's the trick:
>
> I want to read the seen-state db for the user's inbox to make sure the
> user has "seen" the message.  The code will assume that if the user has
> seen the message and has not moved it to the Learn Spam folder within a
> set time period (say 3 hours), then the message is ham, and will learn
> it as such.
>
> I changed the seenstate_db parameter in imapd.conf to use flat files so
> I could look at the format of the file, and got this:
>
> 7b4434cf433945c5        1 1127830200 1 1127829445 1
>
> I don't plan to keep it as a flat file; I will convert it back to
> skiplist and use the Perl CPAN module Algorithm::SkipList to read the
> skiplist instead.  Here's what I make of the entry so far, but I would
> like confirmation of what each field means:
>
> 7b4434cf433945c5 - a hash of the file, but I can't figure out what kind
> of hash this is
>
> 1127830200 - the last time the message was viewed
>
> 1127829445 - either the first time the message was viewed, or the time
> it was entered into the db, or something else ;)
>
> As for the '1's, I assume at least one of them has to do with the fact
> that it's the first file in the user's inbox, but I don't know what the
> others denote.  Hence the question.
>
> Can anyone shed light on this for me?

Look at doc/internal/database-formats.html in the Cyrus distro.
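
For reference, here is how the record Raymond quoted might be split up. This is a sketch that assumes the flat-file line is '<uniqueid><whitespace><value>' and that the value holds five fields: a format version, then lastread, lastuid, lastchange, and a sequence set of seen UIDs. Check those names and meanings against database-formats.html for your Cyrus release; SeenRecord, parse_seen_line and is_seen are illustrative labels, not a Cyrus API:

```python
from typing import NamedTuple


class SeenRecord(NamedTuple):
    uniqueid: str    # key: the mailbox uniqueid
    version: int     # record format version
    lastread: int    # Unix time of the user's last read (assumed meaning)
    lastuid: int     # highest UID at that read (assumed meaning)
    lastchange: int  # Unix time the seen state last changed (assumed meaning)
    seenuids: str    # sequence set of seen UIDs, e.g. "1:5,7"


def parse_seen_line(line: str) -> SeenRecord:
    """Parse one flat-file line: '<uniqueid><whitespace><value>'."""
    key, value = line.split(None, 1)
    version, lastread, lastuid, lastchange, *rest = value.split()
    return SeenRecord(key, int(version), int(lastread), int(lastuid),
                      int(lastchange), rest[0] if rest else "")


def is_seen(uid: int, seenuids: str) -> bool:
    """True if uid lies in a sequence set such as '1:5,7'."""
    for part in filter(None, seenuids.split(",")):
        start, _, end = part.partition(":")
        lo, hi = sorted((int(start), int(end or start)))
        if lo <= uid <= hi:
            return True
    return False


if __name__ == "__main__":
    rec = parse_seen_line("7b4434cf433945c5\t1 1127830200 1 1127829445 1")
    print(rec.lastuid, rec.seenuids, is_seen(1, rec.seenuids))  # 1 1 True
```

Read this way, the entry would say the format version is 1, the highest UID at the last read was 1, and UID 1 is marked seen; none of the '1's would be a file position as such.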


> Also, if I were to use the Perl module to open the seen state db quickly
> to read entries, could this corrupt the seen information?

It shouldn't, since Cyrus allows simultaneous access to the same mailbox
anyway, but it's always safer to get this information in-protocol (via IMAP).
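
Whichever source supplies the seen state (the seen db, or the \Seen flag over IMAP, for example from a UID FETCH of (FLAGS INTERNALDATE) with Python's imaplib), the ham rule from the original question reduces to a small filter. As far as I can tell the seen record carries only mailbox-wide timestamps, so this sketch assumes the 3-hour window is measured from each message's arrival (INTERNALDATE); InboxMessage and ham_candidates are hypothetical names:

```python
from typing import Iterable, List, NamedTuple

GRACE_SECONDS = 3 * 60 * 60  # the "say 3 hours" window from the question


class InboxMessage(NamedTuple):
    uid: int
    arrived: int  # INTERNALDATE as Unix time (assumed reference point)
    seen: bool    # \Seen flag, or membership in the seen db's seen-UID set


def ham_candidates(messages: Iterable[InboxMessage], now: int,
                   grace: int = GRACE_SECONDS) -> List[int]:
    """UIDs of INBOX messages that are seen and have stayed in INBOX for at
    least `grace` seconds, i.e. were not moved to Learn Spam in the window."""
    return [m.uid for m in messages if m.seen and now - m.arrived >= grace]


if __name__ == "__main__":
    now = 1127840000
    inbox = [InboxMessage(1, now - 4 * 3600, True),   # seen, old enough
             InboxMessage(2, now - 3600, True),       # seen, too recent
             InboxMessage(3, now - 5 * 3600, False)]  # old, never seen
    print(ham_candidates(inbox, now))  # [1]
```

Messages the user has already moved to Learn Spam are no longer in INBOX, so they drop out of this list on their own.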


-- 
Kenneth Murchison     Oceana Matrix Ltd.
Software Engineer     2495 Main St. - Suite 401
716-604-0088 x26      Buffalo, NY 14214
--PGP Public Key--    http://www.oceana.com/~ken/ksm.pgp


More information about the Info-cyrus mailing list