mupdate cpu, thread timeouts
wes at umich.edu
Mon Jul 12 14:01:20 EDT 2010
On 02 Jul 2010, at 09:29, John Madden wrote:
> I'm concerned about the listener_lock timeouts.
The listener_lock timeout means that the thread waited 60 seconds to
see if a connection was going to arrive. Since none did, the wait
timed out and that thread exited. The pthread routine the thread sits
in for those 60 seconds is pthread_cond_timedwait(), which should
block without consuming CPU. Perhaps your pthread implementation or
kernel is implementing it as a busy wait?
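To illustrate, here is a minimal C sketch of an idle worker waiting on a condition variable with pthread_cond_timedwait() and exiting when the wait times out. The struct and the listener_* function names are hypothetical, not taken from the Cyrus source, and the timeout is a parameter so that the 60 seconds above (or something shorter, for testing) can be passed in.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical listener state for illustration; not Cyrus's actual struct. */
struct listener {
    pthread_mutex_t lock;
    pthread_cond_t cond;
    int pending;              /* connections waiting for a worker */
};

static void listener_init(struct listener *l)
{
    pthread_mutex_init(&l->lock, NULL);
    pthread_cond_init(&l->cond, NULL);
    l->pending = 0;
}

/* Accepting side: record a new connection and wake one waiting worker. */
static void listener_post(struct listener *l)
{
    pthread_mutex_lock(&l->lock);
    l->pending++;
    pthread_cond_signal(&l->cond);
    pthread_mutex_unlock(&l->lock);
}

/* Worker side: sleep until a connection arrives or timeout_sec elapses.
 * Returns true if a connection was claimed; false means the wait timed
 * out and the idle worker thread should exit. */
static bool listener_wait(struct listener *l, int timeout_sec)
{
    struct timespec deadline;
    bool claimed = false;

    assert(l != NULL && timeout_sec >= 0);

    /* pthread_cond_timedwait() takes an absolute deadline, measured on
     * CLOCK_REALTIME unless the condvar was created with another clock. */
    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_sec += timeout_sec;

    pthread_mutex_lock(&l->lock);
    while (l->pending == 0) {
        /* The thread blocks here until signalled or the deadline passes;
         * the loop only re-checks after (possibly spurious) wakeups. */
        int rc = pthread_cond_timedwait(&l->cond, &l->lock, &deadline);
        if (rc != 0)          /* ETIMEDOUT, or an error: give up */
            break;
    }
    if (l->pending > 0) {
        l->pending--;
        claimed = true;
    }
    pthread_mutex_unlock(&l->lock);
    return claimed;
}
```

On a sound pthread implementation the waiting thread sleeps in pthread_cond_timedwait() until it is signalled or the deadline passes, so idle workers like this should show essentially no CPU time; a process pegged at 100% while its threads only wait would point at the busy wait suspected above.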
>>> Jul 1 15:16:54 imap mupdate: unready for connections
>>> Jul 1 15:16:54 imap mupdate: synchronizing mailbox list with
>>> master mupdate server
>> are the interesting messages. It says to me that the connection to
>> the mupdate master is being lost. However, there ought to be an
>> error message to that effect, which I don't see. What's happening on
>> the mupdate master?
> On both the frontend and master, mupdate consumes 100% of the cpu
> for a few minutes. I agree, it seems like the update is failing
> and then restarting. How do I prevent that? It went on like this
> for a few hours twice yesterday, then cleared itself up and it
> hasn't happened since.
> We have been in the process of adding about 100,000 more users over
> the last few days (so 500k mailboxes). Is it possible for a
> frontend to get out of sync with the master to the point where
> catch-up periods like this become necessary? I thought each
> mailbox creation was synchronous across the murder so I'm thinking
> not, but the timing is interesting.
Mailbox creation is synchronous between a backend and the mupdate
master. Frontends receive streamed updates from the mupdate master,
typically every few seconds, so they can definitely fall behind.
imapd & lmtpd on the frontends "kick" the slave mupdate if a mailbox
they are looking for is missing. The kick is meant to ensure that
the slave mupdate is up to date.
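A minimal C sketch of the kick-on-miss idea, assuming a frontend that checks its local copy of the mailbox list, kicks the slave on a miss, and retries the lookup once. The mboxlist struct, kick_slave() and frontend_lookup() are hypothetical, and the kick is modeled here as an instant copy of the master's list, which glosses over how the real slave mupdate catches up.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_MAILBOXES 16

/* Hypothetical in-memory mailbox list, for illustration only. */
struct mboxlist {
    const char *names[MAX_MAILBOXES];
    int count;
};

static bool mboxlist_has(const struct mboxlist *l, const char *name)
{
    for (int i = 0; i < l->count; i++)
        if (strcmp(l->names[i], name) == 0)
            return true;
    return false;
}

/* Hypothetical stand-in for kicking the slave mupdate: the slave simply
 * copies the master's list, i.e. it catches up immediately. */
static void kick_slave(struct mboxlist *slave, const struct mboxlist *master)
{
    *slave = *master;
}

/* Frontend lookup: consult the local (slave) copy first; on a miss, kick
 * the slave and retry once, since the copy may merely be behind. */
static bool frontend_lookup(struct mboxlist *slave,
                            const struct mboxlist *master,
                            const char *name, int *kicks)
{
    assert(slave != NULL && master != NULL && kicks != NULL);
    if (mboxlist_has(slave, name))
        return true;
    kick_slave(slave, master);
    (*kicks)++;
    return mboxlist_has(slave, name);
}
```

A mailbox the slave already knows costs no kick; one created on a backend but not yet streamed to the frontend costs one kick; a mailbox that truly does not exist still fails after the kick.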
I don't think the problem is adding the mailboxes, per se. The only
time a slave tries to resync is when the connection to the master is
lost, or the slave THINKS the connection to the master is lost. If
the mupdate master is very busy doing something else and can't
respond to NOOPs issued by mupdate slaves, then the slaves will
consider the connection to be lost, drop the connection, and attempt
to resync. Since resyncing is a resource-intensive activity (and
single-threaded on the mupdate master, to boot), this resync can
begin a thrashing cycle of dropped connections between the mupdate
slaves and the master. Bad news, and best avoided...
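The failure mode can be sketched as a toy C model, assuming a slave declares the connection lost when a NOOP goes unanswered for more than a fixed timeout, and that the master serves resyncs one at a time and answers no NOOPs while doing so. slave_presumes_lost() and rounds_until_stable() are hypothetical names, and any timeout values passed in are illustrative, not Cyrus defaults.

```c
#include <assert.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical keepalive rule: the slave sent a NOOP at noop_sent and last
 * heard from the master at last_reply. With no reply and more than timeout
 * seconds gone, it treats the connection as lost, even if the master is
 * merely busy. */
static bool slave_presumes_lost(time_t noop_sent, time_t last_reply,
                                time_t now, int timeout)
{
    if (last_reply >= noop_sent)
        return false;                 /* master answered: still alive */
    return now - noop_sent > timeout;
}

/* Toy model of the thrashing cycle, not Cyrus code. Each round the master
 * serves the pending resyncs one after another (single-threaded), so it is
 * unresponsive for pending * resync_secs. If that exceeds the NOOP timeout,
 * every slave that was still connected drops and resyncs next round.
 * Returns the round in which things settle, or -1 if still thrashing
 * after max_rounds. */
static int rounds_until_stable(int n_slaves, int resync_secs,
                               int timeout, int max_rounds)
{
    int pending = 1;                  /* one slave lost its connection */

    assert(n_slaves >= 1 && max_rounds >= 1);
    for (int round = 1; round <= max_rounds; round++) {
        int busy = pending * resync_secs;
        int still_connected = n_slaves - pending;

        pending = (busy > timeout) ? still_connected : 0;
        if (pending == 0)
            return round;
    }
    return -1;
}
```

In this toy model, with two or more slaves the cycle never settles once a single resync takes longer than the NOOP timeout: each round of resyncs knocks the remaining slaves off, and they in turn keep the master busy for the next round. Only a shorter resync or a longer timeout breaks the loop.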
> Can I do anything with the prefork parameter for mupdate to spread
> things out on more CPUs or increase concurrency?
Prefork doesn't do anything useful for mupdate -- it's about forking
& accepting connections, not about threads. The mupdate master is
multithreaded in many situations. The mupdate slave on the frontends
is almost never multithreaded, but it does share code with the
mupdate master so you see messages about threads. I suspect that
mupdate on both master and slave is consuming 100% of one CPU
because the slave is attempting to resync. That's a synchronous,
single-threaded activity on both, so I would expect it to take a lot
of CPU, and only on one CPU.
More information about the Info-cyrus mailing list