Experience with duplicate delivery database deadlocks?
Paul M Fleming
pfleming at siumed.edu
Fri Aug 20 13:16:40 EDT 2004
We had to limit the number of lmtpd processes and let sendmail do the
queuing. We're on smaller hardware, with far fewer accounts and
messages/day, but on a PIII with 1 GB of RAM we found that 10-12 lmtpd
processes was the sweet spot for consistently preventing deadlocks.
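
For reference, the cap is set with the maxchild option on the lmtp
service in the SERVICES section of cyrus.conf. A minimal sketch follows;
the service name, socket path and prefork value are assumptions for
illustration, not copied from our config, so adjust them to match yours:

```
SERVICES {
  # Cap the number of concurrent lmtpd children at 12.
  # Service name "lmtpunix" and the socket path are illustrative assumptions.
  lmtpunix  cmd="lmtpd" listen="/var/imap/socket/lmtp" prefork=0 maxchild=12
}
```

Once the cap is reached, master should not start further lmtpd
children, so deliveries beyond that have to wait in the MTA's queue
until a slot frees up.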
Rob Carter wrote:
>
> Gentlefolk,
>
> Does anyone have experiences they'd be willing to share with combatting
> deadlocks within a BDB 3.3 duplicate delivery database on a high-traffic Cyrus
> v2.1.16 (or earlier 2.1.x) server? We're running a 60,000+ user/1.2 million
> message/day Cyrus postoffice on an 8-way Solaris system, and recently, we've
> started running into increasingly frequent deadlock problems with the
> duplicate suppression database.
>
> The symptoms we're seeing are probably what you'd expect -- our cyrus.conf is
> set to allow up to 120 lmtpd children to run simultaneously, and when we hit a
> deadlock condition in the duplicate suppression database, we find that all 120
> of our running lmtpds lock up waiting for write locks in the database.
> "truss" shows them all stuck in "lwp_sema_wait()" calls. Inspection of the
> duplicate database after the fact sometimes shows corruption (usually null
> page pointers reported by db_verify), but sometimes shows nothing -- it's
> possible that we're seeing two different problems with the same end effect,
> but I suspect the database corruption is actually a side-effect of the
> deadlock problem...
>
> We've come up with a work-around that at least allows us to correct the
> situation without performing a master restart (with 4000+ simultaneous IMAPS
> connections, a master restart isn't something we can routinely do,
> unfortunately) -- renaming the duplicate delivery database and its log and
> __db* files and then kill -15'ing all the running lmtpds seems to get us back
> to a functional state with a fresh duplicate suppression database. We're up
> to seeing this happen a bit more than once a day now, though, and it's
> becoming seriously annoying.
>
> We're using the db3_nosync mechanism (with BDB version 3.3.11) for our dup
> suppression database -- one option we're strongly considering is switching to
> the regular "db3" mechanism (without the nosync option) to try to avoid the
> deadlocks, but we're a bit concerned about what that may do to lmtp
> throughput. Turning off duplicate suppression is...politically untenable...at
> this point...
>
> We've also considered running the db3 "db_deadlock" routine to periodically
> detect and try to correct deadlock conditions in the duplicate suppression
> database, but that's also somewhat scary -- it's unclear to us exactly what
> the behavior of an lmtpd awaiting a lock in the duplicate suppression database
> would be when its waiting lock got terminated by the db_deadlock daemon...
>
> Anyone have any experience or wisdom to share about either possible solution,
> or about other things that you've seen work in similar situations? At this
> point, upgrading to 2.2.x is on our radar, but probably not something we can
> approach before mid-semester (2-3 months out), so suggestions for solutions
> with Cyrus v2.1.x would be most appreciated...
>
> --Thanx much,
> --Rob Carter--
> ---
> Cyrus Home Page: http://asg.web.cmu.edu/cyrus
> Cyrus Wiki/FAQ: http://cyruswiki.andrew.cmu.edu
> List Archives/Info: http://asg.web.cmu.edu/cyrus/mailing-list.html