Slow lmtpd

Rob Mueller robm at fastmail.fm
Mon Mar 5 17:13:52 EST 2007


> Can values way above 100% be trusted? If so, it's pretty bad (this is
> from a situation where there are 200 lmtp processes, which is the
> current limit I set):

I've never seen over 100%, and it doesn't seem to make sense, so I'm 
guessing it's a bogus value.

> avg-cpu:  %user   %nice %system %iowait   %idle
>           2.53    0.00    5.26   89.98    2.23

However, this shows that the system is mainly waiting on IO, as we expected.

> Device:    rrqm/s wrqm/s   r/s   w/s  rsec/s  wsec/s    rkB/s    wkB/s
> avgrq-sz avgqu-sz   await  svctm  %util
> etherd/e0.0
>             0.00   0.00  5.87 235.02  225.10 2513.77   112.55  1256.88
> 11.37     0.00  750.32 750.32 18074.51

Ugh, if you line those up, await = 750.32.

await - The average time (in milliseconds) for I/O requests issued to the 
device to be served. This includes the time spent by the requests in queue 
and the time spent servicing them.

So it's taking 0.75 seconds on average to service an IO request, which is 
really bad.
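
If you want to keep an eye on this, the numbers above come from iostat's 
extended output; something like the following (assuming the sysstat iostat) 
refreshes the per-device stats every 5 seconds so you can watch the await 
and %util columns while mail is being delivered:

    # extended per-device statistics, refreshed every 5 seconds;
    # watch the await (ms) and %util columns for the AoE device
    iostat -x 5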

> Load average tends to get really high. It starts increasing really fast
> after the number of lmtpd processes reaches the limit set in cyrus.conf,
> and can easily get to 150 or 200. One of the moments where the problem

Makes sense. There are 200 lmtpd processes waiting on IO, and in Linux at 
least, the load average is basically the number of processes not in the 
"sleep" state, i.e. running or blocked in uninterruptible IO wait.
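
As a quick sanity check (a sketch, assuming a procps-style ps), you can 
count the processes sitting in uninterruptible IO wait ("D" state), which 
are the ones inflating the load average:

    # count all processes blocked in uninterruptible sleep (D state)
    ps -eo state,comm | awk '$1 == "D"' | wc -l

    # count just the lmtpd processes stuck waiting on IO
    ps -eo state,comm | awk '$1 == "D" && $2 == "lmtpd"' | wc -l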

Really you never want that many lmtpd processes; if they're all in use, it's 
clear you've got an IO problem. Limiting it to 10 or so is probably a 
reasonable number to avoid complete IO saturation and long IO service delays.
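
As a sketch, capping lmtpd is just the maxchild setting on the lmtp service 
in the SERVICES section of cyrus.conf; the socket path and service name here 
are only illustrative, so adjust them to match your existing entry:

    SERVICES {
      # example only: cap lmtpd at 10 concurrent children
      lmtpunix  cmd="lmtpd" listen="/var/imap/socket/lmtp" prefork=0 maxchild=10
    }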

> - The ones that don't have the problem use local disks instead of AoE
> - The ones that don't have the problem are limited to 2000 domains
> (around 8000 accounts), while the one using the AoE storage serves 4000
> domains (around 20000 accounts).
>
> Anyone running cyrus with that many accounts?

Yes, no problem, though using local disks.

I think the problem is probably the latency that AoE introduces into the 
disk path. A couple of questions:

1. How many disks in the AoE array?
2. Are they all one RAID array, or multiple RAID arrays? What type?
3. Are they one volume, or multiple volumes?

Because of the latency of system <-> drive IO, the thing you want to try to 
do is allow the OS to send more outstanding requests in parallel. The 
problem is I don't know where in the FS <-> RAID <-> AoE path the 
serialising bits are, so I'm not sure what the best way to increase 
parallelism is, but the usual things to try are more RAID arrays with fewer 
drives per array, and more volumes per RAID array. This gives more places 
for parallelism to occur, assuming there's not something holding an 
internal lock somewhere.

Some of our machines have 4 RAID arrays divided up into 40 separate 
filesystems/volumes.
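
For what it's worth, splitting the spool across volumes on the Cyrus side is 
just a matter of defining one partition per filesystem in imapd.conf; the 
paths and partition names below are purely illustrative:

    # hypothetical imapd.conf fragment: one spool partition per volume
    defaultpartition: default
    partition-default: /vol01/spool
    partition-vol02: /vol02/spool
    partition-vol03: /vol03/spool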

Rob


