ZFS doing insane I/O reads
Pascal Gienger
pascal.gienger at uni-konstanz.de
Tue Feb 28 01:43:07 EST 2012
Le 28/02/2012 07:13, Ram a écrit :
> This is a 16GB Ram server running Linux Centos 5.5 64 bit.
> There seems to be something definitely wrong .. because all the memory
> on the machine is free.
> (I don't seem to have fsstat on my server .. I will have to get it
> compiled )
ZFS as FUSE?
We have Solaris 10 on x86 (amd64), and we noticed that ZFS needs _RAM_:
the more, the better.
On Solaris, using "mdb" you can look at the memory consumption (in pages
of physical memory):
bash-3.2# mdb -k
Loading modules: [ unix krtld genunix specfs dtrace uppc pcplusmp
cpu.generic zfs sockfs ip hook neti sctp arp usba fcp fctl qlc lofs sata
fcip random crypto logindmux ptm ufs mpt mpt_sas ]
> ::memstat
Page Summary                Pages               MB  %Tot
------------     ----------------  ---------------  ----
Kernel                    6052188            23641   36%
ZFS File Data             4607758            17999   27%
Anon                      2115097             8262   13%
Exec and libs                6915               27    0%
Page cache                  82665              322    0%
Free (cachelist)           433268             1692    3%
Free (freelist)           3477076            13582   21%

Total                    16774967            65527
Physical                 16327307            63778
As this is early in the morning, there are plenty of free pages in RAM
(about 4 million, freelist plus cachelist), and the memory-mapped
executables of Cyrus IMAPd and its shared libraries consume only 6915
pages, i.e. 27 MB. There are 1779 connections at this moment.
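The MB column follows directly from the page counts: with 4 KB pages on
Solaris x86 (an assumption for this platform), pages * 4096 / 2^20 gives
MB. A quick sanity check, sketched in Python:

```python
PAGE_SIZE = 4096  # bytes per page on Solaris x86/amd64 (assumed here)

def pages_to_mb(pages: int) -> int:
    """Convert a count of physical pages to whole megabytes."""
    return pages * PAGE_SIZE // (1024 * 1024)

# Values taken from the ::memstat output above:
print(pages_to_mb(6915))     # Exec and libs -> 27
print(pages_to_mb(4607758))  # ZFS File Data -> 17999
```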
We had to go from 32 GB to 64 GB per node due to extreme lags in IMAP
spool processing. Even with 64 GB, when memory pressure from the Kernel
and Anon segments (pages mapped without an underlying file: classical
malloc(), or mmap on /dev/zero after copy-on-write) squeezes the ARC,
there are slight degradations in access times during "high volume"
hours.

Another idea we had was to use a fast SSD as a second-level ARC (L2ARC),
added as a "cache" device on the zpool command line; since the ARC
evicts roughly LRU-style, the blocks containing the "cyrus.*" files
should end up there. The problem lies in the fact that a pool combining
a local cache device with remote SAN (Fibre Channel) storage cannot be
imported automatically on another machine without "replacing" the
"faulty" (missing) cache device. And for the price of an FC-enabled SSD
you can buy MUCH more RAM.
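Adding such a cache device is a one-liner; a sketch, where the pool name
"imappool" and the device name are placeholders for this example:

```
# Add a local SSD as an L2ARC ("cache") device to an existing pool.
zpool add imappool cache c4t0d0

# Inspect the cache vdev and the L2ARC kstat counters:
zpool status imappool
kstat -m zfs -n arcstats | grep l2
```

Note that this is exactly the configuration that breaks automatic import
on a second node: the other machine sees the pool with a missing cache
device.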
Does your CentOS system have some kind of tracing facility to look at
the block numbers that are being read constantly? On Solaris I use
DTrace for that, and also for file-based I/O, to see WHICH files get
read and written when there is starvation.
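For the Solaris side, the DTrace io provider gives a quick per-file view
of the read storm; a minimal sketch (run as root):

```
# Count block-I/O starts per file path (io provider).
dtrace -n 'io:::start { @[args[2]->fi_pathname] = count(); }'

# Or aggregate bytes requested per file at the read(2) syscall level:
dtrace -n 'syscall::read:entry { @[fds[arg0].fi_pathname] = sum(arg2); }'
```

On CentOS 5 there is no DTrace; blktrace/blkparse or SystemTap would be
the closest equivalents for watching which blocks and files are hit.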
--
Pascal Gienger Jabber/XMPP/Mail: pascal.gienger at uni-konstanz.de
University of Konstanz, IT Services Department ("Rechenzentrum")
Building V, Room V404, Phone +49 7531 88 5048, Fax +49 7531 88 3739
G+: https://plus.google.com/114525323843315818983/
More information about the Info-cyrus mailing list