DSPAM training integration

Duncan Gibb Duncan.Gibb at SiriusIT.co.uk
Fri Jan 30 05:11:31 EST 2009


Sebastian Maus wrote:

WC> The point of a patch like Dmitriy proposes is for training DSPAM through
WC> user actions in the MUA.  For instance, logging (or otherwise handling)
WC> the contents of the X-DSPAM-Signature when the user refiles messages to
WC> or from their Spam folder.

SM> we did something like that for training personal SpamAssassin Bayes
SM> databases.  Each customer has three SPAM folders (Filtered,
SM> LearnSPAM, LearnHAM). If mail gets dropped into one of the "Learn*"
SM> folders, a cron job (which is in fact a self written, customized
SM> IMAP client) finds them, feeds them to a sa-learn process that
SM> uses the customer's Bayes database and finally deletes and expunges
SM> the folder's content.

Our typical solution is similar, but uses a long-running Perl script to
look up IMAP servers and the users expected on each from LDAP, then
connect with Mail::IMAPClient as the user (authz as an admin agent).  It
reads user-filed false negatives from a learn-spam folder, runs them
through a learning command (typically SA's "spamc -L"), and drops them
in a spam folder (where sieve would have put them if the MTA had got it
right the first time).  False positives are read from a different folder
and moved back into the user's INBOX once they've been learnt.  The
folder structure and learning command are configurable but currently
have to be the same across a particular installation.
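
As a rough illustration only (the real agent is a perl script using
Mail::IMAPClient, not this), one pass over a single user's learn folders
might look like the Python sketch below.  The folder names, the RULES
table and the function names are all assumptions made up for the
example.  The LDAP lookup and the admin proxy login are left out, so
`conn` is taken to be an already-authenticated imaplib-style connection.

```python
import subprocess

# Hypothetical folder layout and learn commands.  The post only says these
# are configurable per installation; the names here are invented.
RULES = [
    # (source folder, learn command, destination folder)
    ("learn-spam", ["spamc", "-L", "spam"], "spam"),
    ("learn-ham", ["spamc", "-L", "ham"], "INBOX"),
]


def run_learner(command, raw_message):
    """Feed one raw message (bytes) to the learning command on stdin."""
    # check=False: spamc reports learn results through non-zero exit codes.
    subprocess.run(command, input=raw_message, check=False)


def process_folder(conn, source, command, destination, learn=run_learner):
    """Learn every message in `source`, then move it to `destination`.

    `conn` is an authenticated imaplib.IMAP4-style connection.
    Returns the number of messages learnt and moved.
    """
    status, _ = conn.select(source)
    if status != "OK":
        return 0  # folder missing for this user; nothing to do
    status, data = conn.search(None, "ALL")
    if status != "OK" or not data or not data[0]:
        return 0
    handled = 0
    for num in data[0].split():
        status, parts = conn.fetch(num, "(RFC822)")
        if status != "OK":
            continue
        learn(command, parts[0][1])
        # Plain IMAP4 has no MOVE, so copy and flag the original deleted.
        status, _ = conn.copy(num, destination)
        if status == "OK":
            conn.store(num, "+FLAGS", "\\Deleted")
            handled += 1
    conn.expunge()
    return handled


def process_user(conn, rules=RULES, learn=run_learner):
    """Apply every rule to one user's mailbox; return total messages handled."""
    return sum(process_folder(conn, src, cmd, dst, learn) for src, cmd, dst in rules)
```

A daemon would call process_user() for each user in turn, which is
where the cycle time discussed below comes from.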

I think what Giuseppe is asking for is a mechanism whereby the user's
action triggers the learning command directly, rather than relying on an
external agent (e.g. your cron job or our daemon) to come round and look
at what the user has done.  That would save a fair amount of CPU resource
in large installations - and be more responsive from the user's point of
view.  An SME deployment would typically have a daemon cycle time around
all mailboxes of several minutes - longer if users have filed lots of
messages to learn.  In a large deployment agent-based learning would be
slower (or need multiple instances running over different servers or
groups of users).


Cheers


Duncan

-- 
Duncan Gibb, Technical Director
Sirius Corporation plc - The Open Source Experts
http://www.siriusit.co.uk/
Tel: +44 870 608 0063

