seen_db format

Ken Murchison ken at oceana.com
Tue Sep 27 15:27:14 EDT 2005


Raymond Sundland wrote:

> Thanks Ken.
> 
> Regarding the mailbox UID, how is that UID determined, and is there any way to
> backtrack a UID to an actual mailbox (either on the filesystem or in the
> user.xxx.folder format)?

It's in cyrus.header for the mailbox.  See doc/internal/mailbox-format.html

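To go from a unique ID back to a mailbox, one rough approach is to scan the
spool for cyrus.header files that contain the ID. The following is a minimal
Python sketch, not a tested tool. It assumes the ID is stored as plain ASCII
text inside cyrus.header and it does not parse the header layout at all. The
function name and the spool path in the usage note are made up for
illustration.

```python
import os


def find_mailboxes_by_uniqueid(spool_root, unique_id):
    """Return directories under spool_root whose cyrus.header mentions unique_id."""
    needle = unique_id.encode("ascii")
    matches = []
    for dirpath, _dirnames, filenames in os.walk(spool_root):
        if "cyrus.header" not in filenames:
            continue
        with open(os.path.join(dirpath, "cyrus.header"), "rb") as fh:
            # Plain substring match; the header layout is deliberately not parsed.
            if needle in fh.read():
                matches.append(dirpath)
    return sorted(matches)
```

Something like find_mailboxes_by_uniqueid("/var/spool/imap", "7b4434cf433945c5")
would then list candidate directories (the path is an assumption; use the
partition configured in imapd.conf). The directory path should correspond to
the mailbox name, give or take any directory hashing the server is set up
to use.
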

> 
> -----Original Message-----
> From: Ken Murchison [mailto:ken at oceana.com] 
> Sent: Tuesday, September 27, 2005 12:44 PM
> To: Raymond Sundland
> Cc: info-cyrus at lists.andrew.cmu.edu
> Subject: Re: seen_db format
> 
> Raymond Sundland wrote:
> 
> 
>>I was looking at implementing a Learn Spam / Learn Ham feature on my 
>>server.  Basically, I'll have a cronjob to read users' Learn Spam 
>>folders and use spamassassin's learn function.  Pretty basic stuff, 
>>nothing magical going on here.  SpamAssassin's Bayesian learn function, 
>>however, requires you teach it what ham is as well, so I want to scan 
>>the user's inbox for ham as well.
>>
>>So here's the trick
>>
>>I want to read the seen-state db of the user's inbox to make sure the 
>>user has "seen" the message.  The code will assume that if the user has 
>>seen the message and has not moved it to the Learn Spam folder within a 
>>time period (say 3 hours), then the message is Ham and learn it as such.
>>
>>I modified the seenstate_db parameter in imapd.conf to use flat files to 
>>take a look at the format of the file and got this:
>>
>>7b4434cf433945c5        1 1127830200 1 1127829445 1
>>
>>I don't plan to keep it as a flat file; I was going to convert it back to 
>>skiplist and use the Perl CPAN module Algorithm::SkipList to read the 
>>skiplist instead.  Here's what I make of the entry so far, but I would 
>>like confirmation as to what each field means:
>>
>>7b4434cf433945c5 - hash of the file, but I can't figure out what kind of 
>>hash this is
>>
>>1127830200 - last time the message was viewed
>>
>>1127829445 - either the first time the message was viewed -or-
>>             the time it was entered into the db -or-
>>             something else ;)
>>
>>As for the '1's, I assume at least one of these entries has to do with 
>>the fact that it's the first file in the user's inbox, but I don't know 
>>what the others denote.  Hence the question.
>>
>>Can anyone shed light on this for me?
> 
> 
> Look at doc/internal/database-formats.html in the Cyrus distro.
> 
> 
> 
>>Also, if I were to use the perl module to open the seen state db quickly 
>>to read entries, could this cause a corruption of the seen information?
> 
> 
> It shouldn't, since Cyrus allows simultaneous access to the same mailbox 
> anyway, but it's always safer to get this information in protocol (via IMAP).
> 
> 

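For reference, the flat-file record quoted above can be split into fields
along these lines. This is a sketch based on my reading of
doc/internal/database-formats.html, where the key is the mailbox unique ID
and the value is: version, last read time, last read UID, last change time,
list of seen UIDs. Treat that field order as an assumption and check it
against the document shipped with your version. The sketch further assumes
the seen list is written as comma-separated UIDs and lo:hi ranges with no
embedded spaces. All names are invented for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SeenRecord:
    unique_id: str      # key: mailbox unique ID (hex string)
    version: int
    last_read: int      # seconds since the epoch
    last_uid: int
    last_change: int    # seconds since the epoch
    seen_uids: List[int]


def expand_uid_list(spec):
    """Expand a list such as '1:3,7' into [1, 2, 3, 7]."""
    uids = []
    for part in spec.split(","):
        if not part:
            continue
        lo, _, hi = part.partition(":")
        uids.extend(range(int(lo), int(hi or lo) + 1))
    return uids


def parse_seen_line(line):
    """Parse one whitespace-separated flat-file seen record."""
    fields = line.split()
    if len(fields) < 5:
        raise ValueError("expected at least 5 fields, got %d" % len(fields))
    # The seen list is absent when nothing has been read yet.
    seen = fields[5] if len(fields) > 5 else ""
    return SeenRecord(fields[0], int(fields[1]), int(fields[2]),
                      int(fields[3]), int(fields[4]), expand_uid_list(seen))
```

Under that reading, the first '1' in the sample line is the format version,
the second is the last read UID, and the trailing '1' is the list of seen
UIDs, here just message UID 1.
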

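Getting the same information in protocol could look roughly like this with
Python's standard imaplib. It is a sketch only: the helper name is invented,
and the host and credentials in the usage note are placeholders. The mailbox
is opened read-only so the check itself does not alter any flags.

```python
import imaplib


def seen_uids(conn, mailbox="INBOX"):
    """Return the UIDs of messages flagged \\Seen in mailbox (opened read-only)."""
    status, _ = conn.select(mailbox, readonly=True)
    if status != "OK":
        raise RuntimeError("cannot select %s" % mailbox)
    status, data = conn.uid("SEARCH", None, "SEEN")
    if status != "OK":
        raise RuntimeError("UID SEARCH failed")
    # data[0] is a space-separated byte string of UIDs, empty if none match.
    return [int(u) for u in (data[0] or b"").split()]
```

Usage would be something like conn = imaplib.IMAP4_SSL("imap.example.com"),
then conn.login(user, password), then seen_uids(conn). Since seen state is
kept per user, the connection has to be authenticated as the user whose
state you want.
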
-- 
Kenneth Murchison     Oceana Matrix Ltd.
Software Engineer     2495 Main St. - Suite 401
716-604-0088 x26      Buffalo, NY 14214
--PGP Public Key--    http://www.oceana.com/~ken/ksm.pgp
