Sieve failure on deliverdb corruption...
robm at fastmail.fm
Thu Jun 19 00:43:37 EDT 2003
Good point about complete LMTP failure. Our automated systems, which send a
test email and check for its arrival every 20 minutes, would have picked up
such a failure pretty quickly and contacted us... not to mention all the users :)
On the other hand... we have a script that continuously tails the Cyrus log,
looking for certain conditions to alert us about. I've just added a check so
that if it finds a 'DB_RUNRECOVER' error, it will automatically run:
su cyrus -c "cd /var/imap/db; /usr/bin/db_recover"
which should hopefully fix the problem without us even having to get
involved if it ever happens again :)
(If the problem still persists, it then sends us an SMS.)
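A rough Python sketch of the log-watching approach just described. The 'DB_RUNRECOVER' match and the db_recover command come from this message; the rest is assumed for illustration: `follow()` is a simplified `tail -f` that ignores log rotation, the 10-minute `RETRY_WINDOW_SECS` used to decide that a problem "still persists" is an invented threshold, and `alert` stands in for whatever actually sends the SMS.

```python
import subprocess
import time
from typing import Callable, Iterable, Iterator, Optional

# The recovery command quoted in this message, run as the cyrus user.
RECOVER_CMD = ["su", "cyrus", "-c", "cd /var/imap/db; /usr/bin/db_recover"]

# Assumption: if DB_RUNRECOVER shows up again this soon after an automatic
# recovery, treat the problem as persisting and page a human instead.
RETRY_WINDOW_SECS = 600.0


def follow(path: str) -> Iterator[str]:
    """Yield lines appended to `path`, like a bare `tail -f`.

    Simplified: does not notice log rotation or truncation.
    """
    with open(path) as fh:
        fh.seek(0, 2)  # start at the current end of the file
        while True:
            line = fh.readline()
            if line:
                yield line
            else:
                time.sleep(1.0)


def run_db_recover() -> bool:
    """Run db_recover; True if it exited with status 0."""
    return subprocess.run(RECOVER_CMD).returncode == 0


def watch(lines: Iterable[str],
          recover: Callable[[], bool],
          alert: Callable[[str], None],
          now: Callable[[], float] = time.monotonic) -> None:
    """Run `recover` on DB_RUNRECOVER; call `alert` if that doesn't help."""
    last_recovery: Optional[float] = None
    for line in lines:
        if "DB_RUNRECOVER" not in line:
            continue
        t = now()
        if last_recovery is not None and t - last_recovery < RETRY_WINDOW_SECS:
            # Recovered recently and the error is back: escalate.
            alert("DB_RUNRECOVER persists after db_recover: " + line.strip())
            continue
        last_recovery = t
        if not recover():
            alert("db_recover failed: " + line.strip())
```

Wired up, it might look like `watch(follow("/var/log/imapd.log"), run_db_recover, send_sms)`, where the log path and `send_sms` are hypothetical and site-specific. Passing `recover`, `alert` and `now` in as parameters keeps the policy testable without touching a live mail store.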
Is there any way you can detect an error where db_recover needs to be run,
and run it? If that then fails, you terminate lmtpd. Basically, try to
fix it, and only fail if you really can't...
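The "try to fix, and only fail if you really can't" policy, sketched in Python rather than in lmtpd's C. `NeedsRecovery` is a hypothetical exception standing in for however the database layer reports DB_RUNRECOVER, and `recover` is whatever runs recovery. One caveat worth weighing: Berkeley DB's documentation says recovery should only be run while no other thread of control is using the environment, so triggering it from inside one of many running lmtpd processes would need coordination that this sketch does not attempt.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


class NeedsRecovery(Exception):
    """Hypothetical: raised when the DB layer reports DB_RUNRECOVER."""


def with_recovery(op: Callable[[], T], recover: Callable[[], bool]) -> T:
    """Run `op`; if it needs recovery, recover once and retry.

    If recovery fails, or the retried `op` still raises, the exception
    propagates so the caller can shut down (e.g. terminate lmtpd).
    """
    try:
        return op()
    except NeedsRecovery:
        if not recover():
            raise  # could not fix it: give up
    return op()  # a second NeedsRecovery propagates to the caller
```

Retrying only once keeps a persistently broken database from turning into a retry loop; after that, failing loudly makes the problem easy to detect.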
----- Original Message -----
From: "Rob Siemborski" <rjs3 at andrew.cmu.edu>
To: "Rob Mueller" <robm at fastmail.fm>
Cc: "Ken Murchison" <ken at oceana.com>; <info-cyrus at lists.andrew.cmu.edu>
Sent: Thursday, June 19, 2003 2:35 PM
Subject: Re: Sieve failure on deliverdb corruption...
> On Thu, 19 Jun 2003, Rob Mueller wrote:
> > 1. Duplicate DB errors cause mail delivery to fail
> > a) Email lost/bounced. Really bad.
> It depends on the type of failure.
> If we're returning 5xx error codes, then yes, this is the case. Otherwise,
> we have a case where sieve isn't answering (say, if it fatal()s as in my
> other message) or it's returning temp failures. In this case, the MTA
> should queue for several days, and no mail should be lost (provided your
> administrator/monitoring system is awake).
> > 2. Duplicate DB errors cause sieve to fail
> > 3. Duplicate DB errors cause multiple vacation responses/redirects
> These are harder to detect in an automated fashion, and thus are likely to
> persist for longer than just killing lmtpd totally.
> Offhand, I'm not totally convinced killing lmtpd is the right course of
> action, but it does make the failure very easy to detect (and the logs
> tend to make it obvious that a corrupted deliver.db is the problem).
> Rob Siemborski * Andrew Systems Group * Cyert Hall 207 * 412-268-7456
> Research Systems Programmer * /usr/contributed Gatekeeper
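Rob Siemborski's distinction between lost mail and queued mail comes down to which reply code lmtpd hands back to the MTA. A minimal Python sketch of that mapping follows; the `Outcome` names and reply texts are illustrative assumptions, not what Cyrus lmtpd actually sends, though the codes follow the usual SMTP/LMTP convention that 4xx means "try again later" and 5xx means "give up".

```python
from enum import Enum


class Outcome(Enum):
    DELIVERED = 1
    DELIVER_DB_ERROR = 2  # duplicate-delivery DB unusable (e.g. corrupted)
    NO_SUCH_USER = 3


# Illustrative replies; codes follow SMTP/LMTP convention
# (2xx success, 4xx temporary failure, 5xx permanent failure).
REPLIES = {
    Outcome.DELIVERED: "250 2.1.5 Ok",
    # 4xx: the MTA keeps the message queued and retries later.
    Outcome.DELIVER_DB_ERROR: "451 4.3.0 Temporary local error, try again later",
    # 5xx: the MTA gives up and bounces the message to the sender.
    Outcome.NO_SUCH_USER: "550 5.1.1 Mailbox unknown",
}


def lmtp_reply(outcome: Outcome) -> str:
    """Reply line for a given recipient outcome (hypothetical mapping)."""
    return REPLIES[outcome]


def is_temporary(reply: str) -> bool:
    """True for 4xx replies, which the MTA should queue and retry."""
    return reply.startswith("4")
```

Under this mapping a corrupted deliver.db yields a 451, so the message sits in the MTA queue for as long as the MTA's retry policy allows instead of bouncing.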