Cyrus-imapd memory tuning

Marco falon at ruparpiemonte.it
Mon Mar 10 04:49:52 EDT 2014


Hello cyrus users,

  I have a Cyrus-imapd server with 2400 mailboxes, accessed over IMAP by an
Open-Xchange client. A few days ago this server ran out of memory and the
OOM killer started sacrificing children:

2014-03-04T15:25:48.927562+01:00 ucstore-csi kernel: imapd: page  
allocation failure. order:1, mode:0x20
2014-03-04T15:25:48.934114+01:00 ucstore-csi kernel: Pid: 18151, comm:  
imapd Not tainted 2.6.32-279.el6.x86_64 #1
2014-03-04T15:25:48.934118+01:00 ucstore-csi kernel: Call Trace:
2014-03-04T15:25:48.934122+01:00 ucstore-csi kernel: <IRQ>   
[<ffffffff8112759f>] ? __alloc_pages_nodemask+0x77f/0x940
2014-03-04T15:25:48.934123+01:00 ucstore-csi kernel:  
[<ffffffff81161d62>] ? kmem_getpages+0x62/0x170
2014-03-04T15:25:48.934123+01:00 ucstore-csi kernel:  
[<ffffffff8116297a>] ? fallback_alloc+0x1ba/0x270
2014-03-04T15:25:48.934124+01:00 ucstore-csi kernel:  
[<ffffffff811623cf>] ? cache_grow+0x2cf/0x320
2014-03-04T15:25:48.934124+01:00 ucstore-csi kernel:  
[<ffffffff811626f9>] ? ____cache_alloc_node+0x99/0x160
2014-03-04T15:25:48.934125+01:00 ucstore-csi kernel:  
[<ffffffff811634db>] ? kmem_cache_alloc+0x11b/0x190
2014-03-04T15:25:48.934125+01:00 ucstore-csi kernel:  
[<ffffffff8142dc68>] ? sk_prot_alloc+0x48/0x1c0
2014-03-04T15:25:48.934127+01:00 ucstore-csi kernel:  
[<ffffffff8142df32>] ? sk_clone+0x22/0x2e0
2014-03-04T15:25:48.934128+01:00 ucstore-csi kernel:  
[<ffffffff8147bb86>] ? inet_csk_clone+0x16/0xd0
2014-03-04T15:25:48.934128+01:00 ucstore-csi kernel:  
[<ffffffff81494ae3>] ? tcp_create_openreq_child+0x23/0x450
2014-03-04T15:25:48.934129+01:00 ucstore-csi kernel:  
[<ffffffff8149239d>] ? tcp_v4_syn_recv_sock+0x4d/0x310
2014-03-04T15:25:48.934129+01:00 ucstore-csi kernel:  
[<ffffffff81494886>] ? tcp_check_req+0x226/0x460
2014-03-04T15:25:48.934130+01:00 ucstore-csi kernel:  
[<ffffffff81491dbb>] ? tcp_v4_do_rcv+0x35b/0x430
2014-03-04T15:25:48.934132+01:00 ucstore-csi kernel:  
[<ffffffff814935be>] ? tcp_v4_rcv+0x4fe/0x8d0
2014-03-04T15:25:48.934133+01:00 ucstore-csi kernel:  
[<ffffffff811acdd7>] ? end_bio_bh_io_sync+0x37/0x60
2014-03-04T15:25:48.934133+01:00 ucstore-csi kernel:  
[<ffffffff814712dd>] ? ip_local_deliver_finish+0xdd/0x2d0
2014-03-04T15:25:48.934134+01:00 ucstore-csi kernel:  
[<ffffffff81471568>] ? ip_local_deliver+0x98/0xa0
2014-03-04T15:25:48.934134+01:00 ucstore-csi kernel:  
[<ffffffff81470a2d>] ? ip_rcv_finish+0x12d/0x440
2014-03-04T15:25:48.934135+01:00 ucstore-csi kernel:  
[<ffffffff81470fb5>] ? ip_rcv+0x275/0x350
2014-03-04T15:25:48.934135+01:00 ucstore-csi kernel:  
[<ffffffff8143a7bb>] ? __netif_receive_skb+0x49b/0x6f0
2014-03-04T15:25:48.934137+01:00 ucstore-csi kernel:  
[<ffffffff8143ca38>] ? netif_receive_skb+0x58/0x60
2014-03-04T15:25:48.934138+01:00 ucstore-csi kernel:  
[<ffffffffa00aea9d>] ? vmxnet3_rq_rx_complete+0x36d/0x880 [vmxnet3]
2014-03-04T15:25:48.934138+01:00 ucstore-csi kernel:  
[<ffffffff812871e0>] ? swiotlb_map_page+0x0/0x100
2014-03-04T15:25:48.934139+01:00 ucstore-csi kernel:  
[<ffffffffa00af203>] ? vmxnet3_poll_rx_only+0x43/0xc0 [vmxnet3]
2014-03-04T15:25:48.934139+01:00 ucstore-csi kernel:  
[<ffffffff8143f193>] ? net_rx_action+0x103/0x2f0
2014-03-04T15:25:48.934140+01:00 ucstore-csi kernel:  
[<ffffffff81073ec1>] ? __do_softirq+0xc1/0x1e0
2014-03-04T15:25:48.934140+01:00 ucstore-csi kernel:  
[<ffffffff810db800>] ? handle_IRQ_event+0x60/0x170
2014-03-04T15:25:48.934142+01:00 ucstore-csi kernel:  
[<ffffffff81073f1f>] ? __do_softirq+0x11f/0x1e0
2014-03-04T15:25:48.934143+01:00 ucstore-csi kernel:  
[<ffffffff8100c24c>] ? call_softirq+0x1c/0x30
2014-03-04T15:25:48.934143+01:00 ucstore-csi kernel:  
[<ffffffff8100de85>] ? do_softirq+0x65/0xa0
2014-03-04T15:25:48.934144+01:00 ucstore-csi kernel:  
[<ffffffff81073ca5>] ? irq_exit+0x85/0x90
2014-03-04T15:25:48.934144+01:00 ucstore-csi kernel:  
[<ffffffff81505af5>] ? do_IRQ+0x75/0xf0
2014-03-04T15:25:48.934145+01:00 ucstore-csi kernel:  
[<ffffffff8100ba53>] ? ret_from_intr+0x0/0x11
2014-03-05T15:38:32.815336+01:00 ucstore-csi kernel: Out of memory:  
Kill process 1778 (irqbalance) score 1 or sacrifice child
2014-03-05T15:38:32.815336+01:00 ucstore-csi kernel: Killed process  
1778, UID 0, (irqbalance) total-vm:9140kB, anon-rss:88kB, file-rss:4
kB
2014-03-05T15:38:32.815338+01:00 ucstore-csi kernel: imapd invoked  
oom-killer: gfp_mask=0xd0, order=1, oom_adj=0, oom_score_adj=0
2014-03-05T15:38:32.815339+01:00 ucstore-csi kernel: imapd cpuset=/  
mems_allowed=0
2014-03-05T15:38:32.815339+01:00 ucstore-csi kernel: Pid: 19228, comm:  
imapd Not tainted 2.6.32-279.el6.x86_64 #1
2014-03-05T15:38:32.815340+01:00 ucstore-csi kernel: Call Trace:
2014-03-05T15:38:32.815340+01:00 ucstore-csi kernel:  
[<ffffffff810c4971>] ? cpuset_print_task_mems_allowed+0x91/0xb0
2014-03-05T15:38:32.815341+01:00 ucstore-csi kernel:  
[<ffffffff811170e0>] ? dump_header+0x90/0x1b0
2014-03-05T15:38:32.815341+01:00 ucstore-csi kernel:  
[<ffffffff812146fc>] ? security_real_capable_noaudit+0x3c/0x70
2014-03-05T15:38:32.815343+01:00 ucstore-csi kernel:  
[<ffffffff81117562>] ? oom_kill_process+0x82/0x2a0
2014-03-05T15:38:32.815344+01:00 ucstore-csi kernel:  
[<ffffffff811174a1>] ? select_bad_process+0xe1/0x120
2014-03-05T15:38:32.815344+01:00 ucstore-csi kernel:  
[<ffffffff811179a0>] ? out_of_memory+0x220/0x3c0
2014-03-05T15:38:32.815345+01:00 ucstore-csi kernel:  
[<ffffffff811276be>] ? __alloc_pages_nodemask+0x89e/0x940
2014-03-05T15:38:32.815345+01:00 ucstore-csi kernel:  
[<ffffffff8115c1da>] ? alloc_pages_current+0xaa/0x110
2014-03-05T15:38:32.815346+01:00 ucstore-csi kernel:  
[<ffffffff811253ce>] ? __get_free_pages+0xe/0x50
2014-03-05T15:38:32.815346+01:00 ucstore-csi kernel:  
[<ffffffff81069464>] ? copy_process+0xe4/0x13c0
2014-03-05T15:38:32.815348+01:00 ucstore-csi kernel:  
[<ffffffff8104452c>] ? __do_page_fault+0x1ec/0x480
2014-03-05T15:38:32.815349+01:00 ucstore-csi kernel:  
[<ffffffff812718b1>] ? cpumask_any_but+0x31/0x50
2014-03-05T15:38:32.815349+01:00 ucstore-csi kernel:  
[<ffffffff8106a7d4>] ? do_fork+0x94/0x460
2014-03-05T15:38:32.815350+01:00 ucstore-csi kernel:  
[<ffffffff81081ba1>] ? do_sigaction+0x91/0x1d0
2014-03-05T15:38:32.815350+01:00 ucstore-csi kernel:  
[<ffffffff810d69e2>] ? audit_syscall_entry+0x272/0x2a0
2014-03-05T15:38:32.815351+01:00 ucstore-csi kernel:  
[<ffffffff81009598>] ? sys_clone+0x28/0x30
2014-03-05T15:38:32.815351+01:00 ucstore-csi kernel:  
[<ffffffff8100b413>] ? stub_clone+0x13/0x20
2014-03-05T15:38:32.815353+01:00 ucstore-csi kernel:  
[<ffffffff8100b0f2>] ? system_call_fastpath+0x16/0x1b
2014-03-05T15:38:32.815354+01:00 ucstore-csi kernel: Mem-Info:
2014-03-05T15:38:32.815354+01:00 ucstore-csi kernel: Node 0 DMA per-cpu:
2014-03-05T15:38:32.815355+01:00 ucstore-csi kernel: CPU    0: hi:     
0, btch:   1 usd:   0
2014-03-05T15:38:32.815355+01:00 ucstore-csi kernel: CPU    1: hi:     
0, btch:   1 usd:   0
2014-03-05T15:38:32.815356+01:00 ucstore-csi kernel: Node 0 DMA32 per-cpu:
2014-03-05T15:38:32.815356+01:00 ucstore-csi kernel: CPU    0: hi:   
186, btch:  31 usd:   0
2014-03-05T15:38:32.815358+01:00 ucstore-csi kernel: CPU    1: hi:   
186, btch:  31 usd:   0
2014-03-05T15:38:32.815358+01:00 ucstore-csi kernel: Node 0 Normal per-cpu:
2014-03-05T15:38:32.815359+01:00 ucstore-csi kernel: CPU    0: hi:   
186, btch:  31 usd:   0
2014-03-05T15:38:32.815359+01:00 ucstore-csi kernel: CPU    1: hi:   
186, btch:  31 usd:   9
2014-03-05T15:38:32.815360+01:00 ucstore-csi kernel:  
active_anon:1076363 inactive_anon:208842 isolated_anon:14
2014-03-05T15:38:32.815360+01:00 ucstore-csi kernel: active_file:128  
inactive_file:422 isolated_file:15
2014-03-05T15:38:32.815362+01:00 ucstore-csi kernel: unevictable:0  
dirty:0 writeback:0 unstable:0
2014-03-05T15:38:32.815363+01:00 ucstore-csi kernel: free:148958  
slab_reclaimable:29256 slab_unreclaimable:148642
2014-03-05T15:38:32.815363+01:00 ucstore-csi kernel: mapped:862  
shmem:3101 pagetables:329229 bounce:0
2014-03-05T15:38:32.815364+01:00 ucstore-csi kernel: Node 0 DMA  
free:15660kB min:124kB low:152kB high:184kB active_anon:0kB  
inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB  
isolated(anon):0kB isolated(file):0kB present:15268kB mlocked:0kB  
dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB  
slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB  
bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
2014-03-05T15:38:32.815364+01:00 ucstore-csi kernel: lowmem_reserve[]:  
0 3000 8050 8050
2014-03-05T15:38:32.815365+01:00 ucstore-csi kernel: Node 0 DMA32  
free:525368kB min:25140kB low:31424kB high:37708kB  
active_anon:1410856kB inactive_anon:352760kB active_file:0kB  
inactive_file:44kB unevictable:0kB isolated(anon):0kB  
isolated(file):0kB present:3072160kB mlocked:0kB dirty:0kB  
writeback:0kB mapped:2288kB shmem:8380kB slab_reclaimable:44624kB  
slab_unreclaimable:155772kB kernel_stack:46984kB pagetables:242404kB  
unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0  
all_unreclaimable? no
2014-03-05T15:38:32.815367+01:00 ucstore-csi kernel: lowmem_reserve[]:  
0 0 5050 5050
2014-03-05T15:38:32.815367+01:00 ucstore-csi kernel: Node 0 Normal  
free:54804kB min:42316kB low:52892kB high:63472kB  
active_anon:2894596kB inactive_anon:482608kB active_file:512kB  
inactive_file:1644kB unevictable:0kB isolated(anon):76kB  
isolated(file):60kB present:5171200kB mlocked:0kB dirty:0kB  
writeback:0kB mapped:1160kB shmem:4024kB slab_reclaimable:72400kB  
slab_unreclaimable:438796kB kernel_stack:4496kB pagetables:1074512kB  
unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:338  
all_unreclaimable? no
2014-03-05T15:38:32.815368+01:00 ucstore-csi kernel: lowmem_reserve[]: 0 0 0 0
2014-03-05T15:38:32.815368+01:00 ucstore-csi kernel: Node 0 DMA: 1*4kB  
1*8kB 0*16kB 1*32kB 2*64kB 1*128kB 0*256kB 0*512kB 1*1024kB 1*2048kB  
3*4096kB = 15660kB
2014-03-05T15:38:32.815369+01:00 ucstore-csi kernel: Node 0 DMA32:  
128262*4kB 978*8kB 78*16kB 29*32kB 6*64kB 0*128kB 0*256kB 0*512kB  
0*1024kB 1*2048kB 0*4096kB = 525480kB
2014-03-05T15:38:32.815370+01:00 ucstore-csi kernel: Node 0 Normal:  
12511*4kB 8*8kB 39*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB  
0*2048kB 1*4096kB = 54828kB
2014-03-05T15:38:32.815373+01:00 ucstore-csi kernel: 13536 total  
pagecache pages
2014-03-05T15:38:32.815373+01:00 ucstore-csi kernel: 9873 pages in swap cache
2014-03-05T15:38:32.815374+01:00 ucstore-csi kernel: Swap cache stats:  
add 1527415, delete 1517542, find 37369649/37407554
2014-03-05T15:38:32.815374+01:00 ucstore-csi kernel: Free swap  = 0kB
2014-03-05T15:38:32.815375+01:00 ucstore-csi kernel: Total swap = 4194296kB
2014-03-05T15:38:32.815375+01:00 ucstore-csi kernel: 2097136 pages RAM
2014-03-05T15:38:32.815377+01:00 ucstore-csi kernel: 81706 pages reserved
2014-03-05T15:38:32.815377+01:00 ucstore-csi kernel: 18873 pages shared
2014-03-05T15:38:32.815378+01:00 ucstore-csi kernel: 1847951 pages non-shared

On average I have at most about 400 simultaneous connections and no memory
problems. I think a network issue (a stalled DNS or LDAP server) caused
connections to suddenly climb to 3500, and imapd processes kept being
spawned until memory was exhausted.

My server is:
Red Hat Enterprise Linux Server release 6.3 (Santiago)
Under normal conditions, free and vmstat report something like this:

              total       used       free     shared    buffers     cached
Mem:       8061976    7651020     410956          0    1355964    3412788
-/+ buffers/cache:    2882268    5179708
Swap:      4194296      32180    4162116

procs -----------memory---------- ---swap-- -----io---- --system--  
-----cpu-----
  r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us  
sy id wa st
  2  0  32180 386880 1356476 3423712    0    0   643   327   25   18  
10  4 81  5  0

current cyrus.conf:
SERVICES {
   # add or remove based on preferences
   imap          cmd="imapd" listen="imap" prefork=5
   pop3          cmd="pop3d" listen="pop3" prefork=3
   sieve         cmd="timsieved" listen="sieve" prefork=0
   lmtp          cmd="lmtpd -a" listen="lmtp" prefork=0
}



I need to prevent memory exhaustion when some oddity causes clients to
effectively DoS Cyrus, so I would like to configure the maxchild parameter
for the imap service. Given the known amount of system RAM, I want to choose
a value that avoids memory problems during normal operation.
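I'm thinking of something like this in cyrus.conf (maxchild=400 is just a
placeholder, not a value I have validated):

```
   imap          cmd="imapd" listen="imap" prefork=5 maxchild=400
```

As I understand it, maxchild caps how many imapd children master will fork,
so a connection storm would hit that ceiling instead of driving the box into
the OOM killer.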

I see that an imapd process takes 22-25 MB on average. With 8 GB of RAM the
server should already be swapping with fewer than 400 connections; it does
not, so this estimate is either wrong or far too conservative. I suspect I
should look at the difference between RSS and shared (SHR) memory when
sizing the number of imapd processes, but I'm not sure.
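As a back-of-the-envelope sketch of the sizing I have in mind (every number
below is an assumption: the ~12 MB of unique memory per imapd is something I
would still have to measure, e.g. as RSS minus SHR in top, and the 2 GB
reserve is a guess):

```shell
# Illustrative maxchild sizing: divide the RAM budget left for imapd by the
# unique (non-shared) memory of one imapd child. All numbers are
# placeholders, not measurements from this server.
ram_kb=8061976        # total RAM, from the 'free' output above
reserve_kb=2097152    # assume ~2 GB kept back for kernel, page cache, other daemons
per_imapd_kb=12288    # assume ~12 MB unique memory per imapd (RSS - SHR)
maxchild=$(( (ram_kb - reserve_kb) / per_imapd_kb ))
echo "maxchild ~ $maxchild"   # ~485 with these assumptions
```

Is this the right shape of calculation, or is there a better rule of thumb?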

Could you help me with this tuning? In particular, I'm interested in the
relationship between memory usage and the maxchild setting for imapd.

Meanwhile I would also like to tune the maxfds parameter. With lsof I
measure about 60 open files per imapd process. With 400 imapd processes
that would mean 60*400 = 24000 file descriptors system-wide. Something must
be wrong with this reasoning, because my current limit ('ulimit -n') is
4096 and I have never had problems. Should I perhaps count only 'Running'
processes when computing this threshold?
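To make my arithmetic explicit (60 fds per process is my lsof measurement,
400 processes is my typical peak; I understand 'ulimit -n' limits each
process individually, while the system-wide ceiling is fs.file-max):

```shell
# The fd arithmetic behind my question. Each imapd only needs its own ~60
# descriptors against the per-process limit; the 60*400 total would only
# matter against the system-wide fs.file-max.
fds_per_imapd=60     # measured with lsof on one imapd
imapd_procs=400      # typical number of simultaneous imapd children
total_fds=$(( fds_per_imapd * imapd_procs ))
echo "per-process: $fds_per_imapd  system-wide total: $total_fds"
```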

Thank you very much for any hints.
Marco
