cpu and cyrus

Wed Aug 31 16:40:24 EDT 2011

On Wed, Aug 31, 2011 at 11:51:03AM -0700, Maria McKinley wrote:
> On 8/31/11 11:44 AM, Wesley Craig wrote:
> > On 31 Aug 2011, at 14:36, Maria McKinley wrote:
> >> Anyway, here is an example of some processes that are getting big:
> >>
> >>   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
> >> 24328 cyrus     20   0  147m 6236 5004 R 27.4  0.2  22:07.21 imapd
> >> 27549 cyrus     20   0  147m 6512 5116 R 27.4  0.2 155:41.26 imapd
> >> 30097 cyrus     20   0  147m 6280 5052 R 27.1  0.2  93:44.08 imapd
> >>
> >>
> >> Unfortunately I can't tell you any more about these processes, since they
> >> were just cut and pasted from a terminal where I checked top before
> >> restarting cyrus, and cyrus has not regrown yet.
> >
> > It would be handy if you'd gather some metrics from a loaded but not pathological machine, and then additional metrics once it was in a bad state.  Also, getting strace output from a run-away (if it is run-away) process would be handy.
> >
> > :wes
> 
> 
> Not sure how helpful this is, but you can see where I have
> restarted cyrus twice in the last 24 hours, and three times in the last
> week, and how this has affected the load, etc. of this machine.
> 
> http://www.shadlen.org/munin/Servers/ella.shadlen.org/index.html

I can't believe you capture the uptime graph ;)

Seriously though - it's an infinite loop.  There are a few different
places it could be, and my money is on a bogus .seen file.  There were
definitely some infinite loops in there, and the bugs in skiplist db
locking in 2.2 mean you could have any old rubbish show up over time.

So I'm guessing it's a particular folder access that triggers the
runaway process each time.
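
To get the strace output asked for above, the runaway has to be caught
before the next restart.  Here is a minimal sketch, not a tested tool,
that picks likely runaways out of top-style lines by accumulated CPU
time.  The function names and the 20-minute threshold are my own
assumptions, and it assumes the default 12-column top layout with TIME+
shown as minutes:seconds.hundredths, as in the paste above.

```python
# Sketch: flag imapd processes whose accumulated CPU time looks runaway.
# Names and the 20-minute default threshold are illustrative assumptions.

def parse_cpu_minutes(time_field):
    """Convert a TIME+ value such as '155:41.26' to minutes (float)."""
    minutes, seconds = time_field.split(":")
    return int(minutes) + float(seconds) / 60.0


def find_runaways(top_lines, command="imapd", min_cpu_minutes=20.0):
    """Return (pid, cpu_minutes) for matching processes over the threshold."""
    runaways = []
    for line in top_lines:
        fields = line.split()
        # Skip the header, blank lines, and anything not in 12-column form.
        if len(fields) < 12 or not fields[0].isdigit():
            continue
        pid, time_field, cmd = int(fields[0]), fields[10], fields[11]
        if cmd != command:
            continue
        cpu_minutes = parse_cpu_minutes(time_field)
        if cpu_minutes >= min_cpu_minutes:
            runaways.append((pid, cpu_minutes))
    return runaways


# The three processes quoted earlier in this thread.
sample = [
    "  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND",
    "24328 cyrus     20   0  147m 6236 5004 R 27.4  0.2  22:07.21 imapd",
    "27549 cyrus     20   0  147m 6512 5116 R 27.4  0.2 155:41.26 imapd",
    "30097 cyrus     20   0  147m 6280 5052 R 27.1  0.2  93:44.08 imapd",
]

for pid, minutes in find_runaways(sample):
    # Print a command to run by hand; nothing is attached automatically.
    print(f"strace -p {pid}   # about {minutes:.0f} CPU minutes so far")
```

Feeding it the output of "top -b -n 1" should work the same way.  If
the loop is touching files, the strace output may show which mailbox or
.seen file is involved; if it is spinning purely in userspace there may
be little or nothing to see.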

Bron.