Cyrus-imapd memory tuning
Marco
falon at ruparpiemonte.it
Mon Mar 10 04:49:52 EDT 2014
Hello Cyrus users,
I have a Cyrus-imapd server with 2400 mailboxes, accessed over IMAP by an
Open-Xchange client. A few days ago this server ran out of memory and the
kernel started sacrificing child processes:
2014-03-04T15:25:48.927562+01:00 ucstore-csi kernel: imapd: page allocation failure. order:1, mode:0x20
2014-03-04T15:25:48.934114+01:00 ucstore-csi kernel: Pid: 18151, comm: imapd Not tainted 2.6.32-279.el6.x86_64 #1
2014-03-04T15:25:48.934118+01:00 ucstore-csi kernel: Call Trace:
2014-03-04T15:25:48.934122+01:00 ucstore-csi kernel: <IRQ> [<ffffffff8112759f>] ? __alloc_pages_nodemask+0x77f/0x940
2014-03-04T15:25:48.934123+01:00 ucstore-csi kernel: [<ffffffff81161d62>] ? kmem_getpages+0x62/0x170
2014-03-04T15:25:48.934123+01:00 ucstore-csi kernel: [<ffffffff8116297a>] ? fallback_alloc+0x1ba/0x270
2014-03-04T15:25:48.934124+01:00 ucstore-csi kernel: [<ffffffff811623cf>] ? cache_grow+0x2cf/0x320
2014-03-04T15:25:48.934124+01:00 ucstore-csi kernel: [<ffffffff811626f9>] ? ____cache_alloc_node+0x99/0x160
2014-03-04T15:25:48.934125+01:00 ucstore-csi kernel: [<ffffffff811634db>] ? kmem_cache_alloc+0x11b/0x190
2014-03-04T15:25:48.934125+01:00 ucstore-csi kernel: [<ffffffff8142dc68>] ? sk_prot_alloc+0x48/0x1c0
2014-03-04T15:25:48.934127+01:00 ucstore-csi kernel: [<ffffffff8142df32>] ? sk_clone+0x22/0x2e0
2014-03-04T15:25:48.934128+01:00 ucstore-csi kernel: [<ffffffff8147bb86>] ? inet_csk_clone+0x16/0xd0
2014-03-04T15:25:48.934128+01:00 ucstore-csi kernel: [<ffffffff81494ae3>] ? tcp_create_openreq_child+0x23/0x450
2014-03-04T15:25:48.934129+01:00 ucstore-csi kernel: [<ffffffff8149239d>] ? tcp_v4_syn_recv_sock+0x4d/0x310
2014-03-04T15:25:48.934129+01:00 ucstore-csi kernel: [<ffffffff81494886>] ? tcp_check_req+0x226/0x460
2014-03-04T15:25:48.934130+01:00 ucstore-csi kernel: [<ffffffff81491dbb>] ? tcp_v4_do_rcv+0x35b/0x430
2014-03-04T15:25:48.934132+01:00 ucstore-csi kernel: [<ffffffff814935be>] ? tcp_v4_rcv+0x4fe/0x8d0
2014-03-04T15:25:48.934133+01:00 ucstore-csi kernel: [<ffffffff811acdd7>] ? end_bio_bh_io_sync+0x37/0x60
2014-03-04T15:25:48.934133+01:00 ucstore-csi kernel: [<ffffffff814712dd>] ? ip_local_deliver_finish+0xdd/0x2d0
2014-03-04T15:25:48.934134+01:00 ucstore-csi kernel: [<ffffffff81471568>] ? ip_local_deliver+0x98/0xa0
2014-03-04T15:25:48.934134+01:00 ucstore-csi kernel: [<ffffffff81470a2d>] ? ip_rcv_finish+0x12d/0x440
2014-03-04T15:25:48.934135+01:00 ucstore-csi kernel: [<ffffffff81470fb5>] ? ip_rcv+0x275/0x350
2014-03-04T15:25:48.934135+01:00 ucstore-csi kernel: [<ffffffff8143a7bb>] ? __netif_receive_skb+0x49b/0x6f0
2014-03-04T15:25:48.934137+01:00 ucstore-csi kernel: [<ffffffff8143ca38>] ? netif_receive_skb+0x58/0x60
2014-03-04T15:25:48.934138+01:00 ucstore-csi kernel: [<ffffffffa00aea9d>] ? vmxnet3_rq_rx_complete+0x36d/0x880 [vmxnet3]
2014-03-04T15:25:48.934138+01:00 ucstore-csi kernel: [<ffffffff812871e0>] ? swiotlb_map_page+0x0/0x100
2014-03-04T15:25:48.934139+01:00 ucstore-csi kernel: [<ffffffffa00af203>] ? vmxnet3_poll_rx_only+0x43/0xc0 [vmxnet3]
2014-03-04T15:25:48.934139+01:00 ucstore-csi kernel: [<ffffffff8143f193>] ? net_rx_action+0x103/0x2f0
2014-03-04T15:25:48.934140+01:00 ucstore-csi kernel: [<ffffffff81073ec1>] ? __do_softirq+0xc1/0x1e0
2014-03-04T15:25:48.934140+01:00 ucstore-csi kernel: [<ffffffff810db800>] ? handle_IRQ_event+0x60/0x170
2014-03-04T15:25:48.934142+01:00 ucstore-csi kernel: [<ffffffff81073f1f>] ? __do_softirq+0x11f/0x1e0
2014-03-04T15:25:48.934143+01:00 ucstore-csi kernel: [<ffffffff8100c24c>] ? call_softirq+0x1c/0x30
2014-03-04T15:25:48.934143+01:00 ucstore-csi kernel: [<ffffffff8100de85>] ? do_softirq+0x65/0xa0
2014-03-04T15:25:48.934144+01:00 ucstore-csi kernel: [<ffffffff81073ca5>] ? irq_exit+0x85/0x90
2014-03-04T15:25:48.934144+01:00 ucstore-csi kernel: [<ffffffff81505af5>] ? do_IRQ+0x75/0xf0
2014-03-04T15:25:48.934145+01:00 ucstore-csi kernel: [<ffffffff8100ba53>] ? ret_from_intr+0x0/0x11
2014-03-05T15:38:32.815336+01:00 ucstore-csi kernel: Out of memory: Kill process 1778 (irqbalance) score 1 or sacrifice child
2014-03-05T15:38:32.815336+01:00 ucstore-csi kernel: Killed process 1778, UID 0, (irqbalance) total-vm:9140kB, anon-rss:88kB, file-rss:4kB
2014-03-05T15:38:32.815338+01:00 ucstore-csi kernel: imapd invoked oom-killer: gfp_mask=0xd0, order=1, oom_adj=0, oom_score_adj=0
2014-03-05T15:38:32.815339+01:00 ucstore-csi kernel: imapd cpuset=/ mems_allowed=0
2014-03-05T15:38:32.815339+01:00 ucstore-csi kernel: Pid: 19228, comm: imapd Not tainted 2.6.32-279.el6.x86_64 #1
2014-03-05T15:38:32.815340+01:00 ucstore-csi kernel: Call Trace:
2014-03-05T15:38:32.815340+01:00 ucstore-csi kernel: [<ffffffff810c4971>] ? cpuset_print_task_mems_allowed+0x91/0xb0
2014-03-05T15:38:32.815341+01:00 ucstore-csi kernel: [<ffffffff811170e0>] ? dump_header+0x90/0x1b0
2014-03-05T15:38:32.815341+01:00 ucstore-csi kernel: [<ffffffff812146fc>] ? security_real_capable_noaudit+0x3c/0x70
2014-03-05T15:38:32.815343+01:00 ucstore-csi kernel: [<ffffffff81117562>] ? oom_kill_process+0x82/0x2a0
2014-03-05T15:38:32.815344+01:00 ucstore-csi kernel: [<ffffffff811174a1>] ? select_bad_process+0xe1/0x120
2014-03-05T15:38:32.815344+01:00 ucstore-csi kernel: [<ffffffff811179a0>] ? out_of_memory+0x220/0x3c0
2014-03-05T15:38:32.815345+01:00 ucstore-csi kernel: [<ffffffff811276be>] ? __alloc_pages_nodemask+0x89e/0x940
2014-03-05T15:38:32.815345+01:00 ucstore-csi kernel: [<ffffffff8115c1da>] ? alloc_pages_current+0xaa/0x110
2014-03-05T15:38:32.815346+01:00 ucstore-csi kernel: [<ffffffff811253ce>] ? __get_free_pages+0xe/0x50
2014-03-05T15:38:32.815346+01:00 ucstore-csi kernel: [<ffffffff81069464>] ? copy_process+0xe4/0x13c0
2014-03-05T15:38:32.815348+01:00 ucstore-csi kernel: [<ffffffff8104452c>] ? __do_page_fault+0x1ec/0x480
2014-03-05T15:38:32.815349+01:00 ucstore-csi kernel: [<ffffffff812718b1>] ? cpumask_any_but+0x31/0x50
2014-03-05T15:38:32.815349+01:00 ucstore-csi kernel: [<ffffffff8106a7d4>] ? do_fork+0x94/0x460
2014-03-05T15:38:32.815350+01:00 ucstore-csi kernel: [<ffffffff81081ba1>] ? do_sigaction+0x91/0x1d0
2014-03-05T15:38:32.815350+01:00 ucstore-csi kernel: [<ffffffff810d69e2>] ? audit_syscall_entry+0x272/0x2a0
2014-03-05T15:38:32.815351+01:00 ucstore-csi kernel: [<ffffffff81009598>] ? sys_clone+0x28/0x30
2014-03-05T15:38:32.815351+01:00 ucstore-csi kernel: [<ffffffff8100b413>] ? stub_clone+0x13/0x20
2014-03-05T15:38:32.815353+01:00 ucstore-csi kernel: [<ffffffff8100b0f2>] ? system_call_fastpath+0x16/0x1b
2014-03-05T15:38:32.815354+01:00 ucstore-csi kernel: Mem-Info:
2014-03-05T15:38:32.815354+01:00 ucstore-csi kernel: Node 0 DMA per-cpu:
2014-03-05T15:38:32.815355+01:00 ucstore-csi kernel: CPU 0: hi: 0, btch: 1 usd: 0
2014-03-05T15:38:32.815355+01:00 ucstore-csi kernel: CPU 1: hi: 0, btch: 1 usd: 0
2014-03-05T15:38:32.815356+01:00 ucstore-csi kernel: Node 0 DMA32 per-cpu:
2014-03-05T15:38:32.815356+01:00 ucstore-csi kernel: CPU 0: hi: 186, btch: 31 usd: 0
2014-03-05T15:38:32.815358+01:00 ucstore-csi kernel: CPU 1: hi: 186, btch: 31 usd: 0
2014-03-05T15:38:32.815358+01:00 ucstore-csi kernel: Node 0 Normal per-cpu:
2014-03-05T15:38:32.815359+01:00 ucstore-csi kernel: CPU 0: hi: 186, btch: 31 usd: 0
2014-03-05T15:38:32.815359+01:00 ucstore-csi kernel: CPU 1: hi: 186, btch: 31 usd: 9
2014-03-05T15:38:32.815360+01:00 ucstore-csi kernel: active_anon:1076363 inactive_anon:208842 isolated_anon:14
2014-03-05T15:38:32.815360+01:00 ucstore-csi kernel: active_file:128 inactive_file:422 isolated_file:15
2014-03-05T15:38:32.815362+01:00 ucstore-csi kernel: unevictable:0 dirty:0 writeback:0 unstable:0
2014-03-05T15:38:32.815363+01:00 ucstore-csi kernel: free:148958 slab_reclaimable:29256 slab_unreclaimable:148642
2014-03-05T15:38:32.815363+01:00 ucstore-csi kernel: mapped:862 shmem:3101 pagetables:329229 bounce:0
2014-03-05T15:38:32.815364+01:00 ucstore-csi kernel: Node 0 DMA free:15660kB min:124kB low:152kB high:184kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15268kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
2014-03-05T15:38:32.815364+01:00 ucstore-csi kernel: lowmem_reserve[]: 0 3000 8050 8050
2014-03-05T15:38:32.815365+01:00 ucstore-csi kernel: Node 0 DMA32 free:525368kB min:25140kB low:31424kB high:37708kB active_anon:1410856kB inactive_anon:352760kB active_file:0kB inactive_file:44kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:3072160kB mlocked:0kB dirty:0kB writeback:0kB mapped:2288kB shmem:8380kB slab_reclaimable:44624kB slab_unreclaimable:155772kB kernel_stack:46984kB pagetables:242404kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
2014-03-05T15:38:32.815367+01:00 ucstore-csi kernel: lowmem_reserve[]: 0 0 5050 5050
2014-03-05T15:38:32.815367+01:00 ucstore-csi kernel: Node 0 Normal free:54804kB min:42316kB low:52892kB high:63472kB active_anon:2894596kB inactive_anon:482608kB active_file:512kB inactive_file:1644kB unevictable:0kB isolated(anon):76kB isolated(file):60kB present:5171200kB mlocked:0kB dirty:0kB writeback:0kB mapped:1160kB shmem:4024kB slab_reclaimable:72400kB slab_unreclaimable:438796kB kernel_stack:4496kB pagetables:1074512kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:338 all_unreclaimable? no
2014-03-05T15:38:32.815368+01:00 ucstore-csi kernel: lowmem_reserve[]: 0 0 0 0
2014-03-05T15:38:32.815368+01:00 ucstore-csi kernel: Node 0 DMA: 1*4kB 1*8kB 0*16kB 1*32kB 2*64kB 1*128kB 0*256kB 0*512kB 1*1024kB 1*2048kB 3*4096kB = 15660kB
2014-03-05T15:38:32.815369+01:00 ucstore-csi kernel: Node 0 DMA32: 128262*4kB 978*8kB 78*16kB 29*32kB 6*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 1*2048kB 0*4096kB = 525480kB
2014-03-05T15:38:32.815370+01:00 ucstore-csi kernel: Node 0 Normal: 12511*4kB 8*8kB 39*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 1*4096kB = 54828kB
2014-03-05T15:38:32.815373+01:00 ucstore-csi kernel: 13536 total pagecache pages
2014-03-05T15:38:32.815373+01:00 ucstore-csi kernel: 9873 pages in swap cache
2014-03-05T15:38:32.815374+01:00 ucstore-csi kernel: Swap cache stats: add 1527415, delete 1517542, find 37369649/37407554
2014-03-05T15:38:32.815374+01:00 ucstore-csi kernel: Free swap = 0kB
2014-03-05T15:38:32.815375+01:00 ucstore-csi kernel: Total swap = 4194296kB
2014-03-05T15:38:32.815375+01:00 ucstore-csi kernel: 2097136 pages RAM
2014-03-05T15:38:32.815377+01:00 ucstore-csi kernel: 81706 pages reserved
2014-03-05T15:38:32.815377+01:00 ucstore-csi kernel: 18873 pages shared
2014-03-05T15:38:32.815378+01:00 ucstore-csi kernel: 1847951 pages non-shared
On average I have at most about 400 simultaneous connections and no memory
problems. I think a network issue (a stalled DNS or LDAP server) caused the
connection count to jump suddenly to 3500; new imapd processes kept being
forked until memory was exhausted.
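For reference, this is roughly how I count concurrent IMAP sessions during
such an event (a sketch; it assumes netstat from net-tools and IMAP on the
standard port 143):

netstat -tn | awk '$4 ~ /:143$/ && $6 == "ESTABLISHED"' | wc -l   # established IMAP sessions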
My server is:
Red Hat Enterprise Linux Server release 6.3 (Santiago)
Under normal conditions I read something like this (free, then vmstat):
             total       used       free     shared    buffers     cached
Mem:       8061976    7651020     410956          0    1355964    3412788
-/+ buffers/cache:    2882268    5179708
Swap:      4194296      32180    4162116

procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu-----
 r  b   swpd   free    buff   cache  si  so   bi   bo  in  cs us sy id wa st
 2  0  32180 386880 1356476 3423712   0   0  643  327  25  18 10  4 81  5  0
My current cyrus.conf:
SERVICES {
  # add or remove based on preferences
  imap   cmd="imapd"     listen="imap"  prefork=5
  pop3   cmd="pop3d"     listen="pop3"  prefork=3
  sieve  cmd="timsieved" listen="sieve" prefork=0
  lmtp   cmd="lmtpd -a"  listen="lmtp"  prefork=0
}
I need to prevent memory exhaustion when some oddity makes the clients
effectively DoS Cyrus. So I would like to configure the Cyrus maxchild
parameter for imap, choosing a value that avoids memory trouble while still
covering normal work, given the known amount of system RAM.
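A sketch of what I mean (maxchild=400 is only an assumed placeholder, not a
value I trust yet; that number is exactly what I want help deriving):

SERVICES {
  # maxchild caps how many imapd processes the master will fork;
  # 400 is a guess to be replaced by a properly derived value
  imap   cmd="imapd"     listen="imap"  prefork=5 maxchild=400
  pop3   cmd="pop3d"     listen="pop3"  prefork=3
  sieve  cmd="timsieved" listen="sieve" prefork=0
  lmtp   cmd="lmtpd -a"  listen="lmtp"  prefork=0
}

Once maxchild is reached, further connections should queue in the listen
backlog instead of forking more processes, which bounds worst-case memory.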
I see that an imapd process takes on average 22-25 MB. With 8 GB of RAM the
server should then already swap at fewer than 400 connections; it does not,
so this estimate must be wrong, or far too conservative. I suspect I should
instead look at the difference between RSS and SHR (resident minus shared)
memory when sizing the number of imapd processes, but I'm not sure.
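This is how I sample the unique (resident minus shared) footprint per process
(a sketch reading /proc/PID/statm; it assumes a 4 kB page size and needs root
to see every process):

for p in $(pgrep imapd); do
    # statm: field 2 = resident pages, field 3 = shared (file-backed) pages
    awk -v pid=$p '{printf "%d: %.1f MB unique\n", pid, ($2 - $3) * 4 / 1024}' /proc/$p/statm
done

Summing the unique parts and counting the shared text/libraries only once
should give a much less pessimistic total than 400 * 25 MB.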
Could you help me with this tuning? In particular I'm interested in the
relation between memory usage and the maxchild limit on imapd processes.
Meanwhile I would also like to tune the maxfds parameter. With lsof I measure
about 60 open files per imapd process. With 400 imapd processes that would
mean a system-wide total of 60*400 = 24000 descriptors, yet my current
'ulimit -n' limit is 4096 and I have never had problems, so this reasoning
must be wrong or far too conservative (perhaps simply because ulimit -n
applies per process, not system-wide). Should I count only 'Running'
processes when computing this threshold?
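For completeness, this is how I count descriptors per process without lsof
(a sketch using /proc; run as root):

for p in $(pgrep imapd); do
    echo "$p $(ls /proc/$p/fd 2>/dev/null | wc -l)"
done | sort -n -k2 | tail -5   # the five imapd processes holding the most fds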
Thank you very much for any hints.
Marco