Rolling replication and many pop3d

Andy Bennett andyjpb at ashurst.eu.org
Wed Nov 23 20:07:40 EST 2011


Hi,

>>> I enabled replication between two servers with version 2.4.10 cyrus.
>>> I set the option for the rolling replication, and it works fine but
>>> obviously I have a high CPU load.
>>> Unfortunately after 10 minutes of running processes pop3d increasing
>>> from 50 to over 200, making the server unusable for customers.
>>> Can you tell me why this increase is abnormal?
>> Can you use something like 'top' to work out which processes are
>> consuming most of the CPU time?

Thanks for the screenshots.


> This screenshot of top before  :
> 
> http://www.digicolor.net/cyrus/img1.jpg

This shows a load average of around 1. On Linux, that means that over
the past 1 and 5 minutes there has, on average, been about one process
either running or waiting in the run queue, ready to go (processes
blocked in uninterruptible IO are counted too).
This is therefore not an entirely idle machine. I see you're running a
nameserver and a few other things: it looks like they've been busy on
the CPU, but not excessively so.

What's more worrying is the 4.5% of CPU time spent "waiting" (iowait).
This time is accrued when a CPU is idle while processes are unable to
run because they are blocked on outstanding IO.
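If you want to watch that figure without a screenshot, here is a minimal
sketch, assuming a Linux host (the field layout of /proc/stat is
Linux-specific). It samples the aggregate "cpu" line twice and reports
the iowait share of the elapsed CPU time, which is roughly what top's
"wa" shows:

```python
import os
import time


def read_cpu_times(path="/proc/stat"):
    """Return the aggregate 'cpu' line of /proc/stat as a list of ints."""
    with open(path) as f:
        for line in f:
            if line.startswith("cpu "):
                return [int(x) for x in line.split()[1:]]
    raise RuntimeError("no aggregate cpu line in " + path)


def iowait_percent(before, after):
    """Share of CPU time spent in iowait between two samples.

    Linux field order: user nice system idle iowait irq softirq steal
    guest guest_nice. guest/guest_nice are already included in user/nice,
    so only the first eight fields go into the total.
    """
    deltas = [a - b for a, b in zip(after, before)]
    total = sum(deltas[:8])
    return 100.0 * deltas[4] / total if total else 0.0


# Only meaningful on Linux, where /proc/stat exists.
if __name__ == "__main__" and os.path.exists("/proc/stat"):
    first = read_cpu_times()
    time.sleep(1)
    second = read_cpu_times()
    print(f"iowait: {iowait_percent(first, second):.1f}%")
```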



> and after 20 minutes of rolling replication:
> 
> http://www.digicolor.net/cyrus/img5.jpg

This shows a load average of around 7 and 30% of CPU time spent in
iowait. This machine does not seem to be managing well with the IO load
of rolling replication.



>> Can you use something like 'vmstat 1' to show us how much I/O there is
>> on the system?
> 
> This screenshot of top before  :
> 
> http://www.digicolor.net/cyrus/img3.jpg

This shows a system that is not reading anything from disk (bi). A small
number of blocks are being written out to disk (bo). Each line
represents activity for a period of 1 second, as specified by the
parameter to 'vmstat'.


> and after 20 minutes of rolling replication:
> 
> http://www.digicolor.net/cyrus/img6.jpg

This system is writing to disk (bo), but very unevenly: sometimes it's
getting 7,000 blocks out per second and at other times only 1,000.
Depending on your block size, this probably represents only a few MB per
second. The last column shows the iowait CPU percentage, and it's rather
large.
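To put numbers on "a few MB per second": the vmstat(8) man page says
Linux blocks are currently 1024 bytes, although older kernels may report
other sizes. A minimal conversion sketch, assuming 1 KiB blocks (pass a
different block_size_bytes if your kernel differs):

```python
def blocks_to_mib_per_s(blocks_per_s, block_size_bytes=1024):
    """Convert a vmstat bi/bo figure (blocks per second) to MiB/s."""
    return blocks_per_s * block_size_bytes / (1024 * 1024)


# The range seen in the "after" vmstat screenshot.
for blocks in (1_000, 7_000):
    print(f"{blocks:>5} blocks/s = {blocks_to_mib_per_s(blocks):.1f} MiB/s")
```

With 1 KiB blocks that works out at roughly 1 to 7 MiB/s.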

What IO subsystem do you have on this machine? What filesystem are you
using?
The IO on this machine appears to be struggling significantly.

I did a quick test on my laptop. I have a 2.5", 7,200rpm 200GB disk.

I ran this in my home directory to cause every file to be read from disk:

-----
$ find . -type f -print0 | xargs -0 cat > /dev/null
-----

'vmstat 1' gives lines like this:

-----
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
 1  1   1680  30600    272 2575688    0    0 12376   576 2141 3424  3  6 47 43
 1  1   1680  32396    272 2576736    0    0 29988     0 1882 3632  5  7 47 41
 0  1   1680  33868    272 2576660    0    0 46820     0 2304 4443  4  8 48 39
 1  0   1680  33416    272 2578600    0    0 36716     0 2067 3733  3  7 48 42
 0  1   1680  34000    272 2581944    0    0 50432     0 1164 2983  3  6 50 42
 0  1   1680  31876    272 2585320    0    0 46464    64 1223 2964  3  8 49 40
 1  1   1680  30288    272 2588672    0    0 51712     0 1380 3658  3  7 46 43
 0  1   1680  29836    272 2590552    0    0 59776     0 1288 3549  4  7 47 42
 0  1   1680  30324    272 2592948    0    0 58368     0 1287 3568  2  7 49 41
 1  1   1680  30308    272 2593108    0    0 12800    18  917 1673  2  2 49 46
-----

Those bi figures (roughly 12,000 to 60,000 blocks per second) are an
order of magnitude greater than the bo figures you're seeing. As you can
see, my bi rate dips a little whenever there is some bo, but that's
because I've only got a single spindle.
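If it helps when comparing runs, here is a small sketch that averages
columns of captured 'vmstat 1' output. It is fed the laptop sample above;
the function name is just for illustration, and the MiB/s line again
assumes 1 KiB blocks:

```python
SAMPLE = """\
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
 1  1   1680  30600    272 2575688    0    0 12376   576 2141 3424  3  6 47 43
 1  1   1680  32396    272 2576736    0    0 29988     0 1882 3632  5  7 47 41
 0  1   1680  33868    272 2576660    0    0 46820     0 2304 4443  4  8 48 39
 1  0   1680  33416    272 2578600    0    0 36716     0 2067 3733  3  7 48 42
 0  1   1680  34000    272 2581944    0    0 50432     0 1164 2983  3  6 50 42
 0  1   1680  31876    272 2585320    0    0 46464    64 1223 2964  3  8 49 40
 1  1   1680  30288    272 2588672    0    0 51712     0 1380 3658  3  7 46 43
 0  1   1680  29836    272 2590552    0    0 59776     0 1288 3549  4  7 47 42
 0  1   1680  30324    272 2592948    0    0 58368     0 1287 3568  2  7 49 41
 1  1   1680  30308    272 2593108    0    0 12800    18  917 1673  2  2 49 46
"""


def column_means(text, columns=("bi", "bo", "wa")):
    """Average the named columns over every data row of vmstat output."""
    rows = [line.split() for line in text.splitlines() if line.strip()]
    # The column-name row is the first one containing "bi"; anything above
    # it (e.g. the "procs ---memory---" banner) is ignored.
    start = next(i for i, row in enumerate(rows) if "bi" in row)
    header = rows[start]
    # Keep only numeric rows, which also skips vmstat's repeated headers.
    data = [row for row in rows[start + 1:] if row[0].isdigit()]
    return {name: sum(int(row[header.index(name)]) for row in data) / len(data)
            for name in columns}


means = column_means(SAMPLE)
for name, value in means.items():
    print(f"mean {name}: {value:.1f}")  # bi 40545.2, bo 65.8, wa 41.9
# Assuming 1 KiB blocks, as on current Linux vmstat:
print(f"read throughput: {means['bi'] / 1024:.1f} MiB/s")  # 39.6 MiB/s
```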

Please can you run the same test?

Can you track the source of all those writes in img3?
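One way to track them down, sketched here assuming a Linux kernel built
with per-task IO accounting (so /proc/[pid]/io exists) and running as
root so that the cyrus-owned processes are readable. It samples each
process's write_bytes counter twice and lists the biggest writers in
between. If iotop is installed, it gives a similar per-process view
interactively.

```python
import os
import time

INTERVAL = 5  # seconds between samples (arbitrary choice)


def read_write_bytes():
    """Map pid -> (command name, write_bytes) for every readable process."""
    result = {}
    for pid in filter(str.isdigit, os.listdir("/proc")):
        try:
            with open(f"/proc/{pid}/comm") as f:
                name = f.read().strip()
            with open(f"/proc/{pid}/io") as f:
                # Lines look like "write_bytes: 123456".
                fields = dict(line.split(": ", 1) for line in f)
            result[int(pid)] = (name, int(fields["write_bytes"]))
        except (OSError, KeyError, ValueError):
            # Process exited, or we lack permission to read it.
            continue
    return result


def top_writers(before, after, n=10):
    """Return (bytes written, pid, name) for the n biggest writers."""
    deltas = [(written - before[pid][1], pid, name)
              for pid, (name, written) in after.items() if pid in before]
    return sorted(deltas, reverse=True)[:n]


# Only meaningful on Linux, where /proc exists.
if __name__ == "__main__" and os.path.isdir("/proc"):
    first = read_write_bytes()
    time.sleep(INTERVAL)
    second = read_write_bytes()
    for delta, pid, name in top_writers(first, second):
        print(f"{delta / 1024:10.0f} KiB  {pid:>7}  {name}")
```

Bear in mind this counts bytes each process caused to be written to
storage, so it is an approximation rather than an exact per-disk figure.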





Please can you tell us more about the type of machine you are trying to
run this on?


Thanks for the info and screenshots so far.




>> Are most of the pop3d processes sleeping in iowait?
>> Do you use any other servers, such as imapd?
> Yes I have imapd
> 
> This is screenshot of pstree before :
> 
> http://www.digicolor.net/cyrus/img2.jpg
> 
> 
> This is screenshot of pstree after  :
> 
> http://www.digicolor.net/cyrus/img4.jpg
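To answer the earlier question about whether most of the pop3d processes
are sleeping in iowait, something like the following sketch may help. It
assumes Linux and that the Cyrus services show up under the command names
pop3d and imapd. It tallies the state letter from /proc/[pid]/stat: "D"
(uninterruptible sleep, usually waiting on disk) is the interesting one,
and on Linux those processes also count towards the load average.

```python
import os
from collections import Counter


def parse_stat(stat):
    """Return (comm, state) from a /proc/[pid]/stat line.

    The format is "pid (comm) state ...". comm may itself contain spaces
    or ')', so split on the last ')'.
    """
    close = stat.rindex(")")
    return stat[stat.index("(") + 1:close], stat[close + 2]


def process_states(names=("pop3d", "imapd")):
    """Count process states (R, S, D, ...) per command name."""
    counts = {name: Counter() for name in names}
    for pid in filter(str.isdigit, os.listdir("/proc")):
        try:
            with open(f"/proc/{pid}/stat") as f:
                comm, state = parse_stat(f.read())
        except OSError:
            continue  # process exited between listdir() and open()
        if comm in counts:
            counts[comm][state] += 1
    return counts


# Only meaningful on Linux, where /proc exists.
if __name__ == "__main__" and os.path.isdir("/proc"):
    for name, states in process_states().items():
        print(name, dict(states))
```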





Regards,
@ndy

-- 
andyjpb at ashurst.eu.org
http://www.ashurst.eu.org/
0x7EBA75FF


