<div class="gmail_quote">On Wed, Jan 23, 2013 at 5:25 PM, Andrew Morgan <span dir="ltr">&lt;<a href="mailto:morgan@orst.edu" target="_blank">morgan@orst.edu</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

<div class="im">On Wed, 23 Jan 2013, francis picabia wrote:<br>

<br>

</div><div class="im"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

Thanks for the response.  I have been checking my iostat whenever there is<br>

a number of messages in the active queue.<br>

<br>

Here is a sample snapshot from a script I run (ignoring the first<br>

iostat output of averages):<br>

<br>

Active in queue: 193<br>

12:47:01 up 5 days,  5:23,  6 users,  load average: 14.11, 9.22, 4.67<br>

<br>

Device:         rrqm/s   wrqm/s   r/s   w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await  svctm  %util<br>

sda5              3.25   281.00 19.75 129.50   654.00  3384.00    27.06 5.53   36.24   6.69  99.80<br>

<br>

svctm is about the same as when not under load and it went above 7 only<br>

once.<br>

Then there is this comment about the validity of tracking svctm:<br>

<a href="http://www.xaprb.com/blog/2010/09/06/beware-of-svctm-in-linuxs-iostat/" target="_blank">http://www.xaprb.com/blog/<u></u>2010/09/06/beware-of-svctm-in-<u></u>linuxs-iostat/</a><br>

<br>

%util often gets close to 100% when there is a queue to process.<br>

<br>

sda5 is where the cyrus mail/imap lives.  Our account names all begin with<br>

numbers, so almost all mail accounts are under the q folder.<br>

</blockquote>

<br></div>

Okay, I didn&#39;t realize svctm could be suspect, although I guess that makes sense in a RAID array.  What about your await times?  Does await increase during peak loads?<br>

<br>

It seems pretty clear from iostat that you are IO bound on writes during mail delivery.  As Vincent said in his reply, RAID5 performs poorly during writes.  Each small write actually consumes 4 disk operations (read old data, read old parity, write new data, write new parity).  If you can live with the slight additional risk, turn on write caching on the PERC 5/i if you haven&#39;t already.  I think they call it &quot;write-back&quot; versus &quot;write-through&quot;.<br>


<br>

If you can handle it, you would probably be a lot happier converting that RAID5 set to RAID10.  You&#39;ll lose a disk worth of capacity, but get double the write performance.<br>

<br>

However, what is your real goal?  Do you want to deliver mail more quickly, or do you want to reduce your load average?  You can probably reduce your load average and perhaps gain a bit of speed by tweaking the lmtp maxchild limit.  If you really need to deliver mail more quickly, then you need to throw more IOPS at it.<br>


<br>

Let&#39;s keep this discussion going!  There are lots of ways to tune for performance.  I&#39;ve probably missed some.  :)<br>

<br></blockquote></div><br>In another email discussion on the Red Hat mailing list, I&#39;ve confirmed we have<br>an issue with partition alignment.  This is getting to be quite the mess<br>out there.  One posting speculated that there are thousands of<br>

disk partitions misaligned with their RAID stripe size.  fdisk and<br>OS installers were late getting updated for the new TB-sized disks<br>and for SSDs as well.  Misalignment might account<br>for a 5 to 30% performance hit.<br>
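A minimal sketch of the alignment check, in Python.  The 64 KiB stripe size and the 512-byte sector size are assumptions for illustration, not values read from our controller; the start sectors are the old fdisk default (63) and the 1 MiB default used by newer tools (2048).<br>

```python
def is_aligned(start_sector: int, stripe_kib: int, sector_bytes: int = 512) -> bool:
    """Return True if the partition's first byte falls on a stripe boundary."""
    start_bytes = start_sector * sector_bytes
    return start_bytes % (stripe_kib * 1024) == 0


# Assumed values: 64 KiB stripe, 512-byte sectors.
for start in (63, 2048):
    print(start, is_aligned(start, stripe_kib=64))
```

The real start sector can be read from the partition table (for example with fdisk in sector units) and the real stripe size from the RAID controller configuration.<br>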

<br>I&#39;ve checked and my cyrus lmtpd process count<br>never exceeds 11 under load.<br>await jumps to 150-195 ms at worst.<br><br>If I&#39;m already at IO saturation, I can&#39;t see how a higher lmtpd limit<br>would help.<br>
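As a sanity check on the iostat sample quoted above, here is a rough sketch relating the columns to each other.  It assumes queue depth is roughly the IO rate times await (Little&#39;s law), and that svctm is, as I understand it, derived from %util and the IO rate rather than measured directly.  The function names are my own.<br>

```python
def implied_queue_depth(r_s: float, w_s: float, await_ms: float) -> float:
    """Little's law: average queue size = IO rate * average time in system."""
    return (r_s + w_s) * await_ms / 1000.0


def implied_util_pct(r_s: float, w_s: float, svctm_ms: float) -> float:
    """%util implied by the IO rate and svctm (svctm is itself util / IO rate)."""
    return (r_s + w_s) * svctm_ms / 1000.0 * 100.0


# Values from the sda5 sample: r/s=19.75, w/s=129.50, await=36.24, svctm=6.69
print(round(implied_queue_depth(19.75, 129.50, 36.24), 2))  # compare with avgqu-sz 5.53
print(round(implied_util_pct(19.75, 129.50, 6.69), 1))      # compare with %util 99.80
```

Both figures land close to what iostat reported, which fits the point that svctm carries no information beyond %util and the IO rate; await is the more useful number to watch.<br>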

<br>My goal is to keep the system load reasonable so it stays responsive for<br>mailbox access by the end users.  Right now we get Nagios alerts<br>about 6 times a day for excessive load.  If I can spread the mail<br>queue workload into a hill instead of a sharp peak on the Cacti<br>

load graph, that would be good.  There are minutes around the peaks<br>where the queue is emptied and we have only 5 messages<br>inbound per minute.<br><br>In hindsight, I agree RAID 10 should have been implemented.<br>At the time, four years ago, getting lots of space was the<br>

priority as space needs always grow.  We&#39;ve never seen load<br>issues until this month, and it seems to coincide with a<br>general increase in email volume and traffic.  Our primary<br>MX is also getting hit more than normal.<br>
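For the RAID 5 versus RAID 10 comparison, a back-of-the-envelope sketch using the usual write penalties (4 disk operations per small write on RAID 5, as described above, and 2 on RAID 10 because each write goes to both mirrors).  The disk count and per-disk IOPS are hypothetical placeholders, not measurements from this server.<br>

```python
RAID5_WRITE_PENALTY = 4   # read data, read parity, write data, write parity
RAID10_WRITE_PENALTY = 2  # write to both halves of the mirror


def random_write_iops(disks: int, iops_per_disk: float, write_penalty: int) -> float:
    """Rough ceiling on small random writes per second for the whole array."""
    return disks * iops_per_disk / write_penalty


# Hypothetical array: 4 disks at 150 IOPS each.
raid5 = random_write_iops(4, 150, RAID5_WRITE_PENALTY)
raid10 = random_write_iops(4, 150, RAID10_WRITE_PENALTY)
print(raid5, raid10, raid10 / raid5)
```

Whatever the real disk figures are, the ratio stays at 2, which matches the "double the write performance" estimate; controller write-back cache can change the picture for bursts.<br>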

<br><br>