High Load Avg and Context Switches

Jeremy Sanders jsanders at teklinks.com
Mon Mar 17 23:21:52 EST 2003


Why would pop3d have all the hot backup db files open? If 50 pop3d
processes are trying to access the same 4 files, I can see why there
would be some contention there....

pop3d      1423 cyrus  mem    REG       58,9   270336    324606 /var/spool/imap/db/__db.002
pop3d      1423 cyrus  mem    REG       58,9    98304    324607 /var/spool/imap/db/__db.003
pop3d      1423 cyrus  mem    REG       58,9 17063936    324642 /var/spool/imap/db/__db.004
pop3d      1423 cyrus  mem    REG       58,9    32768    324827 /var/spool/imap/db/__db.005
pop3d      1423 cyrus  mem    REG       72,2    45415   3597474 /lib/libnss_files-2.2.5.so
pop3d      1423 cyrus  mem    REG       72,2    46117   3597482 /lib/libnss_nisplus-2.2.5.so
pop3d      1423 cyrus  mem    REG       72,2  1402035   1733410 /lib/i686/libc-2.2.5.so
pop3d      1423 cyrus    5u   REG       58,9  7703932    324844 /var/spool/imap/db/log.0000000022
pop3d      1423 cyrus    6r   REG       58,9  7703932    324844 /var/spool/imap/db/log.0000000022
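
For what it's worth, here's roughly how I'm counting how many pop3d
processes have each of those region files open or mapped (just a
sketch; the paths are taken from the lsof output above):

  for f in /var/spool/imap/db/__db.00[2-5]; do
      printf '%s: ' "$f"
      # lsof lists every process holding the file open or mem-mapped
      lsof "$f" 2>/dev/null | grep -c '^pop3d'
  done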


Thanks,

Jeremy



>>> "Jeremy Sanders" <jsanders at teklinks.com> 03/17/03 04:15PM >>>
Hello,

I'm running Cyrus IMAP 2.1.12 on a Red Hat 7.3 box with kernel
2.4.20. The imap partition is on a Compaq RA4100 with a Compaq Fibre
Channel HBA in the server. It is an LVM ext3 partition mounted with
noatime,data=ordered.
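
For reference, the relevant fstab entry looks roughly like this (the
device name here is a placeholder, not the real one):

  /dev/vg00/imap  /var/spool/imap  ext3  noatime,data=ordered  1 2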

Here is vmstat output; notice the spike in context switches (the cs column):

[root@mailserv2 root]# vmstat -n 2
   procs                      memory    swap          io     system         cpu
 r  b  w   swpd    free   buff  cache  si  so    bi    bo   in     cs  us  sy  id
 0 36  3    512 1045092 101396 194348   0   0    92   589  334    542   9  28  64
 2 32  2    512 1040448 101408 194504   0   0    54   802  621   1393  61  12  27
 0 36  2    512 1040500 101408 194492   0   0     0     0  152    420  34   2  64
 0 40  2    512 1040412 101428 194516   0   0     0   124  235    197   1   1  98
 0 10  1    512 1042044 101484 194488   0   0    68   778  388    417   7   4  89
 4  6  1    512 1042312 101548 194500   0   0     0   778  794 245291   3  72  25
 2 10  3    512 1042328 101564 194508   0   0     0   154  299 303798  17  83   0
 2 14  3    512 1042320 101564 194512   0   0     0     0  178 378398   1  99   0
 2 18  3    512 1041972 101572 194520   0   0     0   104  190 358011   0 100   0
 2 15  2    512 1038820 101604 195456   0   0    42  1804  434  58432  25  33  42
 0 20  1    512 1038480 101632 195452   0   0     0   798  194    197   0   2  98
 0 30  1    512 1038476 101632 195452   0   0     0     0  152    229   0   1  99
 0 56  1    512 1037040 101668 195588   0   0    64   512  330    926  14  12  74
 2 56  1    512 1036748 101668 195580   0   0     0   240  194    573  10   8  82
 2  7  1    512 1035512 101796 195648   0   0    60  1858 1245   1424  13  12  75
 0 13  1    512 1035460 101816 195648   0   0     0   198  161    252  18   1  81
 0 14  1    512 1035460 101816 195648   0   0     0     0  129     75   0   0  99
 0 15  2    512 1035444 101832 195648   0   0     0    32  142     88   1   0  99
 0  0  0    512 1037516 101940 195024   0   0     6   726  534    680   3   3  94
 0  0  0    512 1037712 102012 194932   0   0     0   614  569    968   4   4  92
 0  0  0    512 1037664 102056 194916   0   0     8   600  335    511   1   2  97


Its I/O read activity is also high, as would be expected from a server
that is being popped by 1,000 Outlook clients continuously. The load
average ranges from 11 to 48. If it's around 11-15 the server runs
fine; if it gets over 20, the server is noticeably slower. Up until
last week the load average was consistently below 5. The processor is
mostly idle. I thought the ext3 partition might be corrupt, so I made
a new LVM partition and cpio'ed the data over to the new filesystem.
That didn't help either. We've also adjusted elvtune parameters in
both directions without any appreciable difference. The only change
that has had a positive impact so far was switching from data=journal
to data=ordered.
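
For the record, the elvtune adjustments looked roughly like this (the
device name and latency values are placeholders; I don't have the
exact numbers we tried in front of me):

  # show the current elevator latencies for the array's block device
  elvtune /dev/sda
  # example of lowering the read and write latencies
  elvtune -r 1024 -w 2048 /dev/sda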

We also pruned the delivery database and increased the
/proc/sys/fs/file-max parameter.
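
Roughly what we ran, for reference (the retention period and file-max
value below are illustrative, not the exact figures we used):

  # prune duplicate-delivery entries older than 3 days, as the cyrus user
  su cyrus -c 'ctl_deliver -E 3'
  # raise the system-wide open file limit
  echo 65536 > /proc/sys/fs/file-max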

Any help would be appreciated.

Thanks,



Jeremy Sanders, CCNP RHCE CNE
Senior System Engineer
Teklinks, Inc.
205-249-5988



