seen_db format
Ken Murchison
ken at oceana.com
Tue Sep 27 15:27:14 EDT 2005
Raymond Sundland wrote:
> Thanks Ken.
>
> Regarding the mailbox UID, how is that UID determined, and is there any way to
> backtrack a UID to an actual mailbox (either on the filesystem or in the
> user.xxx.folder format)?
It's in cyrus.header for the mailbox. See doc/internal/mailbox-format.html.
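
For going from a unique ID back to a mailbox directory, here is a minimal sketch in Python. It assumes the unique ID is stored as plain ASCII text somewhere inside each mailbox's cyrus.header (check doc/internal/mailbox-format.html for the exact layout in your version), and that you can read the spool directory. The function name and the spool path in the usage note are illustrative, not part of Cyrus.

```python
import os


def find_mailbox_by_unique_id(spool_root, unique_id):
    """Return the mailbox directories whose cyrus.header mentions unique_id.

    Assumption: the unique ID appears as ASCII text in cyrus.header.
    This does a plain substring match and does not parse the header.
    """
    needle = unique_id.encode("ascii")
    matches = []
    for dirpath, _dirnames, filenames in os.walk(spool_root):
        if "cyrus.header" not in filenames:
            continue
        # Read as bytes: the header may begin with non-text magic bytes.
        with open(os.path.join(dirpath, "cyrus.header"), "rb") as fh:
            if needle in fh.read():
                matches.append(dirpath)
    return sorted(matches)
```

Usage would look something like find_mailbox_by_unique_id("/var/spool/imap", "7b4434cf433945c5"), where the spool path is whatever your partition setting in imapd.conf points at. Because it is only a substring match, treat any hit as a candidate and confirm it by hand.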
>
> -----Original Message-----
> From: Ken Murchison [mailto:ken at oceana.com]
> Sent: Tuesday, September 27, 2005 12:44 PM
> To: Raymond Sundland
> Cc: info-cyrus at lists.andrew.cmu.edu
> Subject: Re: seen_db format
>
> Raymond Sundland wrote:
>
>
>>I was looking at implementing a Learn Spam / Learn Ham feature on my
>>server. Basically, I'll have a cronjob to read users' Learn Spam
>>folders and use spamassassin's learn function. Pretty basic stuff,
>>nothing magical going on here. SpamAssassin's Bayesian learn function,
>>however, requires you teach it what ham is as well, so I want to scan
>>the user's inbox for ham as well.
>>
>>
>>
>>So here's the trick
>>
>>
>>
>>I want to read the seen-state db of the user's inbox to make sure the
>>user has "seen" the message. The code will assume that if the user has
>>seen the message and has not moved it to the Learn Spam folder within a
>>time period (say 3 hours), then the message is Ham and learn it as such.
>>
>>
>>
>>I modified the seenstate_db parameter in imapd.conf to use flat files to
>>take a look at the format of the file and got this:
>>
>>
>>
>>7b4434cf433945c5 1 1127830200 1 1127829445 1
>>
>>
>>
>>I don't plan to keep it as a flat file; I will convert it back to
>>skiplist and use the perl CPAN module Algorithm::SkipList to read the
>>skiplist instead. Here's what I make of the entry so far, but I would
>>like a confirmation as to what each field means:
>>
>>
>>
>>7b4434cf433945c5 - hash of the file, but I can't figure out what kind of
>>hash this is
>>
>>1127830200 - last time the message was viewed
>>
>>1127829445 - either the first time the message was viewed -or-
>>             the time it was entered into the db -or-
>>             something else ;)
>>
>>
>>
>>As for the '1's, I assume at least one of these entries has to do with
>>the fact it's the first file in the user's inbox, but I don't know what the
>>others denote. Hence the question.
>>
>>
>>
>>Can anyone shed light on this for me?
>
>
> Look at doc/internal/database-formats.html in the Cyrus distro.
>
>
>
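
As a rough illustration of reading such an entry, here is a sketch in Python. It assumes the layout is: mailbox unique ID as the key, followed by version, last-read time, last-read UID, last-change time, and a list of seen UIDs. That layout is my reading of database-formats.html and should be checked against the copy shipped with your Cyrus version. The names SeenEntry and parse_seen_line are made up for this example.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class SeenEntry:
    mailbox_id: str        # key: the mailbox unique ID (not a hash of a file)
    version: int           # assumed: record format version
    last_read: datetime    # assumed: last time the mailbox was read
    last_uid: int          # assumed: highest UID present at last read
    last_change: datetime  # assumed: last time the seen list changed
    seen_uids: str         # assumed: seen UIDs, e.g. "1" or "1:5,7"


def parse_seen_line(line):
    """Parse one flat-file seen entry under the assumed field layout."""
    # Split into at most six fields so the trailing UID list stays intact.
    parts = line.split(None, 5)
    if len(parts) < 5:
        raise ValueError("expected at least 5 fields, got %d" % len(parts))
    seen_uids = parts[5].strip() if len(parts) == 6 else ""
    return SeenEntry(
        mailbox_id=parts[0],
        version=int(parts[1]),
        last_read=datetime.fromtimestamp(int(parts[2]), tz=timezone.utc),
        last_uid=int(parts[3]),
        last_change=datetime.fromtimestamp(int(parts[4]), tz=timezone.utc),
        seen_uids=seen_uids,
    )


entry = parse_seen_line("7b4434cf433945c5 1 1127830200 1 1127829445 1")
```

If that reading is right, the two large numbers are per-mailbox timestamps rather than per-message ones, so the entry alone would not say when an individual message was first viewed.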
>>Also, if I were to use the perl module to open the seen state db quickly
>>to read entries, could this cause a corruption of the seen information?
>
>
> It shouldn't, since Cyrus allows simultaneous access to the same mailbox
> anyway, but it's always safer to get this information in protocol (via IMAP).
>
>
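
To make the "via IMAP" route concrete, here is a minimal sketch using Python's standard imaplib. The host name, credentials, and the helper names open_connection and seen_uids are placeholders for illustration. It selects the mailbox read-only, so the check itself does not change any flags, and asks the server for the UIDs of messages that carry the \Seen flag.

```python
import imaplib


def open_connection(host, user, password):
    """Open an authenticated IMAP-over-TLS connection (placeholder values)."""
    conn = imaplib.IMAP4_SSL(host)
    conn.login(user, password)
    return conn


def seen_uids(conn, mailbox="INBOX"):
    """Return the UIDs of messages in mailbox that carry the \\Seen flag."""
    # readonly=True issues EXAMINE instead of SELECT, so no state is modified.
    status, _ = conn.select(mailbox, readonly=True)
    if status != "OK":
        raise RuntimeError("could not open mailbox %r" % mailbox)
    status, data = conn.uid("SEARCH", "SEEN")
    if status != "OK":
        raise RuntimeError("UID SEARCH failed")
    # data[0] is a space-separated byte string of UIDs, empty if none match.
    return [int(uid) for uid in data[0].split()] if data and data[0] else []


# Example use (needs a real server, so it is left commented out):
# conn = open_connection("imap.example.com", "someuser", "secret")
# print(seen_uids(conn))
# conn.logout()
```

Note that IMAP only reports whether a message is flagged \Seen, not when it was seen, so a "seen and untouched for 3 hours" rule would need the script to remember when it first noticed each UID as seen.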
--
Kenneth Murchison Oceana Matrix Ltd.
Software Engineer 2495 Main St. - Suite 401
716-604-0088 x26 Buffalo, NY 14214
--PGP Public Key-- http://www.oceana.com/~ken/ksm.pgp