mupdate cpu, thread timeouts

Wesley Craig wes at umich.edu
Mon Jul 12 14:01:20 EDT 2010


On 02 Jul 2010, at 09:29, John Madden wrote:
> I'm concerned about the listener_lock timeouts.

The listener_lock timeout means that the thread waited around for 60  
seconds to see if a connection was going to arrive.  Since it didn't,  
it timed out and that thread went away.  The pthread routine that you  
sit in for 60 seconds is pthread_cond_timedwait().  Perhaps your  
pthread implementation or kernel is implementing a busy wait?

>>> Jul  1 15:16:54 imap mupdate[18203]: unready for connections
>>> Jul  1 15:16:54 imap mupdate[18203]: synchronizing mailbox list with
>>> master mupdate server
>>
>> are the interesting messages.  It says to me that the connection to
>> the mupdate master is being lost.  However, there ought to be an
>> error message to that effect, which I don't see.  What's happening on
>> the mupdate master?
>
> On both the frontend and master, mupdate consumes 100% of the cpu  
> for a few minutes.  I agree, it seems like the update is failing  
> and then restarting.  How do I prevent that?  It went on like this  
> for a few hours twice yesterday, then cleared itself up and it  
> hasn't happened since.
>
> We have been in the process of adding about 100,000 more users over  
> the last few days (so 500k mailboxes).  Is it possible for a  
> frontend to get out of sync with the master to the point where  
> catch-up periods like this become necessary?  I thought each  
> mailbox creation was synchronous across the murder so I'm thinking  
> not, but the timing is interesting.

Mailbox creation is synchronous between a backend and the mupdate  
master.  Frontends are streamed updates from the mupdate master,  
typically every few seconds, so they can definitely get behind.   
imapd & lmtpd on the frontends "kick" the slave mupdate if a mailbox  
they are looking for is missing.  The kick is meant to ensure that  
the slave mupdate is up to date.

I don't think the problem is adding the mailboxes, per se.  The only  
time a slave tries to resync is when the connection to the master is  
lost, or the slave THINKS the connection to the master is lost.  If  
the mupdate master is very busy doing something else and can't  
respond to NOOPs issued by mupdate slaves, then the slaves will  
consider the connection to be lost, drop the connection, and attempt  
to resync.  Since resyncing is a resource intensive activity (and  
single-threaded on the mupdate master, to boot), this resync can  
begin a thrashing cycle of dropped connections between the mupdate  
slaves and the master.  Bad news, and best avoided...

> Can I do anything with the prefork parameter for mupdate to spread  
> things out on more cpu's or increase concurrency?

Prefork doesn't do anything useful for mupdate -- it's about forking  
& accepting connections, not about threads.  The mupdate master is  
multithreaded in many situations.  The mupdate slave on the frontends  
is almost never multithreaded, but it does share code with the  
mupdate master so you see messages about threads.  I suspect that  
mupdate on master & slave are consuming 100% of CPU on one CPU  
because the slave is attempting to update.  That's a synchronous,  
single threaded activity on both, so I would expect it to take a lot  
of CPU and to only be on one CPU.

:wes

