<br><br><div class="gmail_quote">On Fri, Jan 25, 2013 at 10:26 AM, Blake Hudson <span dir="ltr"><<a href="mailto:blake@ispn.net" target="_blank">blake@ispn.net</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div text="#000000" bgcolor="#FFFFFF">
<br>
<div>francis picabia wrote the following on
1/25/2013 7:55 AM:<br>
</div><div><div class="h5">
<blockquote type="cite"><br>
<div class="gmail_quote">On Thu, Jan 24, 2013 at 12:22 PM, Blake
Hudson <span dir="ltr"><<a href="mailto:blake@ispn.net" target="_blank">blake@ispn.net</a>></span>
wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div>
<div><br>
<br>
</div>
</div>
>>> There are a couple of suggestions I'd like to put forth. First,
>>> improper partition alignment is generally masked by the controller
>>> cache. I strongly encourage you to check that your RAID array is
>>> making use of this cache by enabling the WriteBack caching option
>>> on this array, especially if your PERC card has a BBU (I think this
>>> was optional on the PERC 5). You can install the MegaCLI tool from
>>> LSI to verify this (it can also be checked from OpenManage, or by
>>> rebooting into the controller BIOS).
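
[For anyone following the same trail: with LSI's MegaCLI the check
looks roughly like this. The binary name and install path vary by
package (MegaCli vs. MegaCli64, often under /opt/MegaRAID/MegaCli),
so treat this as a sketch rather than the exact invocation:]

    # Show the current cache policy of all logical drives on all adapters
    /opt/MegaRAID/MegaCli/MegaCli64 -LDInfo -Lall -aALL | grep -i 'cache policy'

    # Show BBU state (charge level, relearn status, etc.)
    /opt/MegaRAID/MegaCli/MegaCli64 -AdpBbuCmd -GetBbuStatus -aALL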
>> Thanks for this tip. It put me on to what is wrong.
>>
>> Jan 18 07:25:39 myserv Server Administrator: Storage Service
>> EventID: 2335 Controller event log: BBU disabled; changing WB
>> virtual disks to WT: Controller 0 (PERC 5/i Integrated)
>>
>> Bingo! We had write back all along, and the performance tanked when
>> it fell back to write through. I was wondering why my policy change
>> attempts were flipping back when I tried testing WB this morning!
>>
>> This explains everything we've been seeing. Wow. Gotta call Dell.
>>
>> Thanks everyone for the assistance. I didn't think a battery which
>> shows OK status in omreport could wound us!
> The PERC cards will disable write-back caching while the BBU is
> charging/exercising. However, within a few hours the BBU should
> return to normal status. In rare instances, people on the Dell
> mailing list have reported that their caching status never returns
> to write-back, even after attempting to force write-back caching on
> the array. Attempts at power cycling or firmware flashing are tried,
> but seem to be futile in most cases. Often, replacement of the card
> is necessary. I'm unsure if it's the battery, the card, or some
> software setting, but I would definitely follow up with Dell.

I found the problem as I was attempting to set a policy of write back
(wb): the command reported success, but the status still showed write
through. Then I saw entries in the logs predating my use of the
omconfig command with writepolicy, and that was the clue.

This sample command is a way to dump the controller's log to disk:

    /opt/dell/srvadmin/sbin/omconfig storage controller controller=0 action=exportlog
<br>Then look for "Absolute" (at least true for Perc 5/i):<br><br>grep Absolute /var/log/lsi_0125.log <br><br>T27: Absolute State of Charge : 33 % <br>T27: Absolute State of Charge : 29 % <br>T27: Absolute State of Charge : 33 % <br>
T27: Absolute State of Charge : 29 % <br>T27: Absolute State of Charge : 33 % <br>T27: Absolute State of Charge : 29 % <br><br>I believe at below 30% it is the threshold where write back is disabled.<br><br>
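
For reference, the policy change attempts mentioned above were of this
form (a sketch, with vdisk=0 assumed; on our affected controller the
policy silently reverted to WT):

    /opt/dell/srvadmin/sbin/omconfig storage vdisk controller=0 vdisk=0 \
        action=changepolicy writepolicy=wb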
I'm glad we caught this as there are a number of Dell 2950 systems in
a similar state or about to be.
> On the next server (or array) you configure, I would attempt to
> align your partitions as you've investigated. Sector 2048 seems to
> be a good starting position for most RAID levels. I have no
> conclusive evidence that a different file system or alignment
> improves my performance, because I've never done a fair side-by-side
> test with controlled inputs. However, we use ext4 and do align our
> partitions using RAID10 on 15k SAS drives for all our Cyrus
> installs. I have found some issues with the newer systems that I
> attribute to the move from ext3 to ext4, which can result in MySQL
> replication problems on power loss/freeze, but these issues are very
> rare and usually easy to recover from in our environment. I also
> notice that new systems always perform better than the old systems,
> even with identical hardware; I've often attributed this to
> fragmentation.
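
On alignment: starting the first partition at sector 2048 (a 1 MB
boundary) is easy to do and verify with parted. A minimal sketch,
assuming a fresh disk at the placeholder device /dev/sdX:

    # Create an msdos label and one partition starting at sector 2048
    parted -s /dev/sdX mklabel msdos
    parted -s /dev/sdX mkpart primary 2048s 100%

    # Ask parted whether partition 1 is optimally aligned
    parted /dev/sdX align-check opt 1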
This exercise has been useful to compile a set of things we want to
ensure on future system setups for best performance:

1. RAID 10
2. ext4 (fsync problem gone)
3. align partitions to MB boundaries, not cylinders
4. use the write-back writepolicy and automate a check of the BBU
   status (we don't ever want to reach a BBU-disabled state again);
   see the sketch below
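
A minimal sketch of the automated check in item 4, assuming OMSA is
installed and everything lives on controller 0. The omreport path and
the grep patterns are assumptions about its output format and would
need adjusting to what your version actually prints:

    #!/bin/sh
    # Cron-able watchdog: complain if the PERC BBU is not Ready, or if
    # any vdisk has fallen back to Write Through.
    OMREPORT=/opt/dell/srvadmin/bin/omreport

    $OMREPORT storage battery controller=0 | grep -q 'Ready' \
        || echo "PERC BBU on controller 0 is not Ready" \
           | mail -s "BBU alert: $(hostname)" root

    $OMREPORT storage vdisk controller=0 | grep -q 'Write Through' \
        && echo "A vdisk on controller 0 fell back to Write Through" \
           | mail -s "Write policy alert: $(hostname)" root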
According to Red Hat's feature-request Bugzilla report, converting
ext3 to ext4 in place is an unsupported mash-up. They recommend only
creating ext4 filesystems fresh.

Thanks to all for their recommendations.