ZFS doing insane I/O reads
Pascal Gienger
pascal.gienger at uni-konstanz.de
Tue Feb 28 01:43:07 EST 2012
Le 28/02/2012 07:13, Ram a écrit :
> This is a 16GB Ram server running Linux Centos 5.5 64 bit.
> There seems to be something definitely wrong .. because all the memory
> on the machine is free.
> (I don't seem to have fsstat on my server .. I will have to get it
> compiled )
ZFS as FUSE?
We have Solaris 10 on x86 (amd64), and we noticed that ZFS needs _RAM_:
the more, the better.
On Solaris, using "mdb" you can look at the memory consumption (in pages
of physical memory):
bash-3.2# mdb -k
Loading modules: [ unix krtld genunix specfs dtrace uppc pcplusmp
cpu.generic zfs sockfs ip hook neti sctp arp usba fcp fctl qlc lofs sata
fcip random crypto logindmux ptm ufs mpt mpt_sas ]
> ::memstat
Page Summary                Pages               MB  %Tot
------------     ----------------  ---------------  ----
Kernel                    6052188            23641   36%
ZFS File Data             4607758            17999   27%
Anon                      2115097             8262   13%
Exec and libs                6915               27    0%
Page cache                  82665              322    0%
Free (cachelist)           433268             1692    3%
Free (freelist)           3477076            13582   21%

Total                    16774967            65527
Physical                 16327307            63778
As this is early in the morning, there are plenty of free pages in RAM
(about 4 million, freelist plus cachelist), and the memory-mapped
executables of Cyrus IMAPd and its shared libraries consume only 6915
pages, i.e. 27 MB. There are 1779 connections at this moment.
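The MB column follows directly from the page counts: with 4 KB pages on
Solaris x86 (an assumption for this platform), pages * 4096 / 2^20 gives
MB. A quick sanity check, sketched in Python:

```python
PAGE_SIZE = 4096  # bytes per page on Solaris x86/amd64 (assumed here)

def pages_to_mb(pages: int) -> int:
    """Convert a count of physical pages to whole megabytes."""
    return pages * PAGE_SIZE // (1024 * 1024)

# Values taken from the ::memstat output above:
print(pages_to_mb(6915))     # Exec and libs -> 27
print(pages_to_mb(4607758))  # ZFS File Data -> 17999
```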
We had to go from 32 GB to 64 GB per node due to extreme lags in IMAP
spool processing. Even with 64 GB, when memory pressure from the Kernel
and Anon segments (pages mapped without an underlying file: classical
malloc(), or mmap on /dev/zero after copy-on-write) squeezes the ARC,
there are slight degradations in access times during "high volume"
hours.

Another idea we had was to use a fast SSD as a second-level ARC (L2ARC),
added as a "cache" device on the zpool command line; since the ARC
evicts roughly LRU-style, the blocks containing the "cyrus.*" files
should end up there. The problem lies in the fact that a pool combining
a local cache device with remote SAN (Fibre Channel) storage cannot be
imported automatically on another machine without "replacing" the
"faulty" (missing) cache device. And for the price of an FC-enabled SSD
you can buy MUCH more RAM.
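Adding such a cache device is a one-liner; a sketch, where the pool name
"imappool" and the device name are placeholders for this example:

```
# Add a local SSD as an L2ARC ("cache") device to an existing pool.
zpool add imappool cache c4t0d0

# Inspect the cache vdev and the L2ARC kstat counters:
zpool status imappool
kstat -m zfs -n arcstats | grep l2
```

Note that this is exactly the configuration that breaks automatic import
on a second node: the other machine sees the pool with a missing cache
device.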
Does your CentOS system have some kind of tracing facility to look at
the block numbers that are being read constantly? On Solaris I use
DTrace for that, and also for file-based I/O, to see WHICH files get
read and written when there is starvation.
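For the Solaris side, the DTrace io provider gives a quick per-file view
of the read storm; a minimal sketch (run as root):

```
# Count block-I/O starts per file path (io provider).
dtrace -n 'io:::start { @[args[2]->fi_pathname] = count(); }'

# Or aggregate bytes requested per file at the read(2) syscall level:
dtrace -n 'syscall::read:entry { @[fds[arg0].fi_pathname] = sum(arg2); }'
```

On CentOS 5 there is no DTrace; blktrace/blkparse or SystemTap would be
the closest equivalents for watching which blocks and files are hit.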
--
Pascal Gienger Jabber/XMPP/Mail: pascal.gienger at uni-konstanz.de
University of Konstanz, IT Services Department ("Rechenzentrum")
Building V, Room V404, Phone +49 7531 88 5048, Fax +49 7531 88 3739
G+: https://plus.google.com/114525323843315818983/
More information about the Info-cyrus mailing list