Experience with duplicate delivery database deadlocks?

Derrick J Brashear shadow at andrew.cmu.edu
Sun Aug 22 23:02:20 EDT 2004


On Fri, 20 Aug 2004, Rob Carter wrote:

> Gentlefolk,
>
> Does anyone have experiences they'd be willing to share with combatting 
> deadlocks within a BDB 3.3 duplicate delivery database on a high-traffic 
> Cyrus v2.1.16 (or earlier 2.1.x) server?  We're running a 60,000+ user/1.2 
> million message/day Cyrus postoffice on an 8-way Solaris system, and 
> recently, we've started running into increasingly frequent deadlock problems 
> with the duplicate suppression database.
>
> The symptoms we're seeing are probably what you'd expect -- our cyrus.conf is 
> set to allow up to 120 lmtpd children to run simultaneously, and when we hit a 
> deadlock condition in the duplicate suppression database, we find that all 
> 120 of our running lmtpds lock up waiting for write locks in the database. 
> "truss" shows them all stuck in "lwp_sema_wait()" calls.  Inspection of the 
> duplicate database after the fact sometimes shows corruption (usually null 
> page pointers reported by db_verify), but sometimes shows nothing -- it's 
> possible that we're seeing two different problems with the same end effect, 
> but I suspect the database corruption is actually a side-effect of the 
> deadlock problem...

We're not running on anything that hefty, which is probably why we haven't 
seen it. I also don't expect hardware that beefy to come my way for 
testing any time soon, which is somewhat unfortunate.

> We've also considered running the db3 "db_deadlock" routine to periodically 
> detect and try to correct deadlock conditions in the duplicate suppression 
> database, but that's also somewhat scary -- it's unclear to us exactly what 
> the behavior of an lmtpd awaiting a lock in the duplicate suppression 
> database would be when its waiting lock got terminated by the db_deadlock 
> daemon...

It seems like it would be better to detect the deadlocks and work out where 
and why they happen, but I need to review some code before I can offer any 
useful suggestions in that vein.
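
On the question of what a waiting process sees: as far as I know, the
documented Berkeley DB behavior is that when the deadlock detector picks a
lock request as the victim, the blocked call returns DB_LOCK_DEADLOCK, and
the caller is expected to abort its transaction and may then retry. Whether
lmtpd's duplicate delivery code handles that return this way is exactly
what I have not checked yet. The sketch below only illustrates the generic
abort-and-retry pattern. It does not link against Berkeley DB: the constant
SKETCH_LOCK_DEADLOCK, the function run_with_deadlock_retry() and the fake
database are all hypothetical names made up for this example, not Cyrus or
Berkeley DB code.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for Berkeley DB's DB_LOCK_DEADLOCK return code. The real value
 * comes from <db.h>; this one is an assumption for the sketch only. */
#define SKETCH_LOCK_DEADLOCK (-1)

/* One attempt at the work: in real code this would begin a transaction,
 * do the put, and commit. On a deadlock return it must abort the
 * transaction before handing the error back to the caller. */
typedef int (*txn_body_fn)(void *ctx);

/* Call body once, then retry up to max_retries more times, but only while
 * it keeps reporting a deadlock. Returns the result of the last attempt
 * and, if attempts_out is non-NULL, stores how many attempts were made. */
int run_with_deadlock_retry(txn_body_fn body, void *ctx,
                            int max_retries, int *attempts_out)
{
    int attempts = 0;
    int r;

    assert(body != NULL && max_retries >= 0);

    do {
        r = body(ctx);
        attempts++;
    } while (r == SKETCH_LOCK_DEADLOCK && attempts <= max_retries);

    if (attempts_out != NULL)
        *attempts_out = attempts;
    return r;
}

/* Hypothetical fake "database" that reports a deadlock a fixed number of
 * times and then succeeds, so the retry loop can be exercised without
 * Berkeley DB installed. */
struct fake_db {
    int deadlocks_left;
};

int fake_duplicate_mark(void *ctx)
{
    struct fake_db *db = ctx;

    if (db->deadlocks_left > 0) {
        db->deadlocks_left--;
        return SKETCH_LOCK_DEADLOCK;   /* pretend we were chosen as victim */
    }
    return 0;                          /* pretend the write committed */
}
```

With a fake_db whose deadlocks_left is 2 and max_retries of 5, the call
returns 0 after three attempts. Capping the retries is a deliberate choice
here: a process that is repeatedly picked as the victim gives up and
reports the error instead of spinning forever.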

---
Cyrus Home Page: http://asg.web.cmu.edu/cyrus
Cyrus Wiki/FAQ: http://cyruswiki.andrew.cmu.edu
List Archives/Info: http://asg.web.cmu.edu/cyrus/mailing-list.html



