<br><br><div class="gmail_quote">On Fri, Jan 25, 2013 at 10:26 AM, Blake Hudson <span dir="ltr"><<a href="mailto:blake@ispn.net" target="_blank">blake@ispn.net</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div text="#000000" bgcolor="#FFFFFF">
<br>
<div>francis picabia wrote the following on
1/25/2013 7:55 AM:<br>
</div><div><div class="h5">
<blockquote type="cite"><br>
<div class="gmail_quote">On Thu, Jan 24, 2013 at 12:22 PM, Blake
Hudson <span dir="ltr"><<a href="mailto:blake@ispn.net" target="_blank">blake@ispn.net</a>></span>
wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div>
<div><br>
<br>
</div>
</div>
>>> There are a couple of suggestions I'd like to put forth. First,
>>> improper partition alignment is generally masked by the controller
>>> cache. I strongly encourage you to check that your RAID array is
>>> making use of this cache by enabling the WriteBack caching option
>>> on this array, especially if your PERC card has a BBU (I think this
>>> was optional on the PERC 5). You can install the MegaCLI tool from
>>> LSI to verify this (it can also be checked from OpenManage, or by
>>> rebooting into the controller BIOS).
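
[For anyone following the same trail: with LSI's MegaCLI the check
looks roughly like this. The binary name and install path vary by
package (MegaCli vs. MegaCli64, often under /opt/MegaRAID/MegaCli),
so treat this as a sketch rather than the exact invocation:]

    # Show the current cache policy of all logical drives on all adapters
    /opt/MegaRAID/MegaCli/MegaCli64 -LDInfo -Lall -aALL | grep -i 'cache policy'

    # Show BBU state (charge level, relearn status, etc.)
    /opt/MegaRAID/MegaCli/MegaCli64 -AdpBbuCmd -GetBbuStatus -aALL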
>> Thanks for this tip. It put me on to what is wrong.
>>
>> Jan 18 07:25:39 myserv Server Administrator: Storage Service
>> EventID: 2335 Controller event log: BBU disabled; changing WB
>> virtual disks to WT: Controller 0 (PERC 5/i Integrated)
>>
>> Bingo! We had write back all along, and the performance tanked when
>> it fell back to write through. I was wondering why my policy change
>> attempts were flipping back when I tried testing WB this morning!
>>
>> This explains everything we've been seeing. Wow. Gotta call Dell.
>>
>> Thanks everyone for the assistance. I didn't think a battery which
>> shows OK status in omreport could wound us!
> The PERC cards will disable write-back caching while the BBU is
> charging/exercising. However, within a few hours the BBU should
> return to normal status. In rare instances, people on the Dell
> mailing list have reported that their caching status never returns
> to write-back, even after attempting to force write-back caching on
> the array. Attempts at power cycling or firmware flashing are tried,
> but seem to be futile in most cases. Often, replacement of the card
> is necessary. I'm unsure if it's the battery, the card, or some
> software setting, but I would definitely follow up with Dell.

I found the problem as I was attempting to set a policy of write back
(wb): the command reported success, but the status still showed write
through. Then I saw entries in the logs predating my use of the
omconfig command with writepolicy, and that was the clue.

This sample command is a way to dump the controller's log to disk:

    /opt/dell/srvadmin/sbin/omconfig storage controller controller=0 action=exportlog
<br>Then look for "Absolute" (at least true for Perc 5/i):<br><br>grep Absolute /var/log/lsi_0125.log <br><br>T27: Absolute State of Charge : 33 % <br>T27: Absolute State of Charge : 29 % <br>T27: Absolute State of Charge : 33 % <br>
T27: Absolute State of Charge : 29 % <br>T27: Absolute State of Charge : 33 % <br>T27: Absolute State of Charge : 29 % <br><br>I believe at below 30% it is the threshold where write back is disabled.<br><br>
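
For reference, the policy change attempts mentioned above were of this
form (a sketch, with vdisk=0 assumed; on our affected controller the
policy silently reverted to WT):

    /opt/dell/srvadmin/sbin/omconfig storage vdisk controller=0 vdisk=0 \
        action=changepolicy writepolicy=wb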
I'm glad we caught this as there are a number of Dell 2950 systems in
a similar state or about to be.
> On the next server (or array) you configure, I would attempt to
> align your partitions as you've investigated. Sector 2048 seems to
> be a good starting position for most RAID levels. I have no
> conclusive evidence that a different file system or alignment
> improves my performance, because I've never done a fair side-by-side
> test with controlled inputs. However, we use ext4 and do align our
> partitions using RAID10 on 15k SAS drives for all our Cyrus
> installs. I have found some issues with the newer systems that I
> attribute to the move from ext3 to ext4, which can result in MySQL
> replication problems on power loss/freeze, but these issues are very
> rare and usually easy to recover from in our environment. I also
> notice that new systems always perform better than the old systems,
> even with identical hardware; I've often attributed this to
> fragmentation.
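
On alignment: starting the first partition at sector 2048 (a 1 MB
boundary) is easy to do and verify with parted. A minimal sketch,
assuming a fresh disk at the placeholder device /dev/sdX:

    # Create an msdos label and one partition starting at sector 2048
    parted -s /dev/sdX mklabel msdos
    parted -s /dev/sdX mkpart primary 2048s 100%

    # Ask parted whether partition 1 is optimally aligned
    parted /dev/sdX align-check opt 1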
This exercise has been useful to compile a set of things we want to
ensure on future system setups for best performance:

1. RAID 10
2. ext4 (fsync problem gone)
3. align partitions to MB boundaries, not cylinders
4. use the write-back writepolicy and automate a check of the BBU
   status (we don't ever want to reach a BBU-disabled state again);
   see the sketch below
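
A minimal sketch of the automated check in item 4, assuming OMSA is
installed and everything lives on controller 0. The omreport path and
the grep patterns are assumptions about its output format and would
need adjusting to what your version actually prints:

    #!/bin/sh
    # Cron-able watchdog: complain if the PERC BBU is not Ready, or if
    # any vdisk has fallen back to Write Through.
    OMREPORT=/opt/dell/srvadmin/bin/omreport

    $OMREPORT storage battery controller=0 | grep -q 'Ready' \
        || echo "PERC BBU on controller 0 is not Ready" \
           | mail -s "BBU alert: $(hostname)" root

    $OMREPORT storage vdisk controller=0 | grep -q 'Write Through' \
        && echo "A vdisk on controller 0 fell back to Write Through" \
           | mail -s "Write policy alert: $(hostname)" root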
According to Red Hat's feature-request Bugzilla report, converting
ext3 to ext4 in place is an unsupported mash-up. They recommend only
creating ext4 filesystems fresh.

Thanks to all for their recommendations.