Load spikes when new email arrives

Fri Jan 25 10:37:59 EST 2013

On Fri, Jan 25, 2013 at 10:26 AM, Blake Hudson <blake at ispn.net> wrote:

>
> francis picabia wrote the following on 1/25/2013 7:55 AM:
>
>
> On Thu, Jan 24, 2013 at 12:22 PM, Blake Hudson <blake at ispn.net> wrote:
>
>>
>>
>>  There are a couple suggestions I'd like to put forth. First, improper
>> partition alignment is generally masked by the controller cache. I
>> strongly encourage you to check that your RAID array is making use of
>> this cache by enabling the WriteBack caching option on this array,
>> especially if your PERC card has a BBU (I think this was optional on
>> perc 5). You can install the MegaCLI tool from LSI to verify this (can
>> also be checked from OpenManage or reboot into the controller BIOS).
>>
>>
> Thanks for this tip.  It put me on to what is wrong.
>
> Jan 18 07:25:39 myserv Server Administrator: Storage Service EventID:
> 2335  Controller event log: BBU disabled; changing WB virtual disks to WT:
> Controller 0 (PERC 5/i Integrated)
>
> Bingo!  We had write back all along, and the performance tanked when it
> fell back to write through.  I was wondering why my policy change attempts
> were flipping back when I tried testing WB this morning!
>
> This explains everything we've been seeing.  Wow.  Gotta call Dell.
>
> Thanks everyone for the assistance.  I didn't think a battery which shows
> OK status in omreport could wound us!
>
>
>  The PERC cards will disable write-back caching while the BBU is
> charging/exercising. However, within a few hours the BBU should return to
> normal status. In rare instances, people on the Dell mailing list have
> reported that their caching status never returns to write-back - even after
> attempting to force write-back caching on the array. Attempts at power
> cycling or firmware flashing are tried, but seem to be futile in most
> cases. Often, replacement of the card is necessary. I'm unsure if it's the
> battery, the card, or some software setting, but I would definitely follow
> up with Dell.
>

I found the problem as I was attempting to set a policy of write back (wb):
the command reported success, but the status still showed write through.
Then I saw entries in the logs predating my use of the omconfig command with
writepolicy, and that was the clue.
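
For anyone wanting to look for the same clue, here is a minimal sketch that
scans a syslog file for the fallback event. The function name and the
/var/log/messages path are assumptions; the search string is copied from the
Server Administrator message quoted above, and other controllers or OMSA
versions may word it differently.

```shell
# Sketch: print log entries where the controller dropped write-back (WB)
# virtual disks to write-through (WT) because the BBU was disabled.
# find_wb_fallback is a hypothetical helper name, not an OMSA tool.
find_wb_fallback() {
    # $1: log file to scan; "-" or no argument reads standard input
    grep -F 'BBU disabled; changing WB virtual disks to WT' "${1:--}"
}
```

It could be called as find_wb_fallback /var/log/messages; the exit status is
non-zero when no such entry is found.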

This sample command is a way to dump the controller's log to disk:

/opt/dell/srvadmin/sbin/omconfig storage controller controller=0 action=exportlog

Then search the exported log for "Absolute" (this works at least on the PERC 5/i):

grep Absolute /var/log/lsi_0125.log

T27:     Absolute State of Charge  : 33 %
T27:     Absolute State of Charge  : 29 %
T27:     Absolute State of Charge  : 33 %
T27:     Absolute State of Charge  : 29 %
T27:     Absolute State of Charge  : 33 %
T27:     Absolute State of Charge  : 29 %

I believe 30% is the threshold below which write back is disabled.
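
A rough sketch of how that check could be automated against an exported log.
The function name is made up for illustration, and the 30% default is only my
guess from the readings above, not a documented Dell value. It looks at the
last "Absolute State of Charge" line in the file.

```shell
# Sketch: warn when the most recent charge reading in an exported controller
# log is below a threshold. bbu_charge_check is a hypothetical helper name.
bbu_charge_check() {
    # $1: exported log file ("-" or omitted reads standard input)
    # $2: minimum acceptable charge in percent (default 30, an assumption)
    awk -v min="${2:-30}" '
        /Absolute State of Charge/ {
            # The reading follows the last colon, e.g. " 29 %"
            n = split($0, parts, ":")
            charge = parts[n] + 0
            seen = 1
        }
        END {
            if (!seen) { print "UNKNOWN: no charge reading found"; exit 3 }
            if (charge < min) {
                print "WARNING: BBU charge " charge "% is below " min "%"
                exit 1
            }
            print "OK: BBU charge " charge "%"
        }
    ' "${1:--}"
}
```

Run from cron as bbu_charge_check /var/log/lsi_0125.log 30, the non-zero exit
status on a low reading can be used to trigger an alert.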

I'm glad we caught this as there are a number of Dell 2950 systems
in a similar state or about to be.

>
> On the next server (or array) you configure, I would attempt to align your
> partitions as you've investigated. Sector 2048 seems to be a good starting
> position for most RAID levels. I have no conclusive evidence that a
> different file system or alignment improves my performance, because I've
> never done a fair side by side test with controlled inputs. However, we use
> ext4 and do align our partitions using RAID10 on 15k SAS drives for all our
> Cyrus installs. I have found some issues with the newer systems that I
> attribute to the move from ext3 to ext4 which can result in MySQL
> replication problems on power loss/freeze, but these issues are very rare
> and usually easy to recover from in our environment. I also notice that new
> systems always perform better than the old systems, even with identical
> hardware - I've often attributed this to fragmentation.
>
>
This exercise has been useful to compile a set of things we want to ensure
on future system setups for best performance:

1. RAID 10
2. ext4  (fsync problem gone)
3. align partitions to MB, not cylinders
4. use write-back writepolicy and automate check of the BBU status
   (don't ever want to reach a BBU disabled state)
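
For item 3, a small sketch of the alignment arithmetic: a partition is aligned
to a MB boundary when its starting byte offset is a multiple of 1048576. The
function name is hypothetical, and the 512-byte default sector size is an
assumption that should be adjusted for other disks.

```shell
# Sketch: succeed when a partition start sector falls on a 1 MiB boundary.
# is_mib_aligned is a hypothetical helper name.
is_mib_aligned() {
    # $1: start sector; $2: sector size in bytes (default 512, an assumption)
    ima_start=$1
    ima_size=${2:-512}
    [ $(( (ima_start * ima_size) % 1048576 )) -eq 0 ]
}
```

With 512-byte sectors, a start at sector 2048 passes and the old cylinder
style start at sector 63 fails. On Linux the start sector of a partition can
be read from sysfs, for example /sys/block/sda/sda1/start (device names here
are only examples).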

According to Red Hat's feature request Bugzilla report, converting from ext3
to ext4 is an unsupported mash-up.  They recommend only creating new ext4
file systems.

Thanks to all for their recommendations.