Huge performance problems after updating from 2.4 to 2.5.9

Wolfgang Breyha wbreyha at gmx.net
Thu Sep 29 09:14:34 EDT 2016


Hi!

I can add another story of that type, but with a different setup:

We already migrated our ten backends to 2.5.7 step by step some months ago
and upgraded to 2.5.9 lately. We never had any performance issues on them. All
of them have done a full "reconstruct -V max", and special-use metadata is set
to step in for the static xlist that is missing in 2.5 frontends.

Yesterday I did the final step by updating our mupdate server and the murder
frontends. No problems on the mupdate server so far.

The configs are mostly the same as for 2.4 (except for the changed names).
The only key change was from skiplist to twoskip for all databases:
annotation_db: twoskip
duplicate_db: twoskip
ptscache_db: twoskip
mboxlist_db: twoskip
seenstate_db: twoskip
statuscache_db: twoskip
subscription_db: flat
tls_sessions_db: twoskip
userdeny_db: flat

All of them are placed in /var/spool/imap/config, except statuscache and
tlscache, which reside in /dev/shm.

After migrating the frontends yesterday evening I already noticed a higher
load and decided to keep an eye on it.

Our 3 frontends are:
CentOS 6 (vmware based)
16GB RAM
4 CPUs

They usually have ~4000-4500 concurrent imap(s) connections at peak time and
had a load of ~1-3 with Cyrus 2.4. Swap was not used.

This morning we had the same connection count with a load of ~100-250 with
Cyrus 2.5.9 and twoskip. No swap was used. top shows a huge number of imap
processes in running state. Another thing I noticed was that changes in
mailboxes.db on the backends reached the mupdate server, but didn't come
through to the frontends anymore, even after 15 minutes.

I changed mailboxes.db and tlscache back to skiplist and load went down to
5-15. So, much better but still 5 times as high as with 2.4 and still causing
headaches.

Currently I have no idea what causes the load besides twoskip (which I can
live without, since skiplist never caused us any trouble).

I compared "USAGE" log lines from two days ago (2.4) with those from 2.5 using
twoskip (ts) and skiplist (sl).

2.4   : user: 0.384322, sys: 0.029056 (23538 USAGE lines, 09:00-10:00)
2.4   : user: 0.353515, sys: 0.030814 (25790 USAGE lines, 13:00-14:00)
2.5 ts: user: 0.903491, sys: 0.028196 (23429 ...        , 08:00-09:00)
2.5 ts: user: 1.130391, sys: 0.032894 (27709 ...        , 09:00-10:00)
2.5 sl: user: 0.864270, sys: 0.026580 (29567 ...        , 13:00-14:00)
2.5 sl: user: 1.077462, sys: 0.032952 (25879 ...        , 14:00-15:00)
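
For anyone who wants to produce comparable numbers, here is a minimal Python
sketch of how such per-hour figures could be computed from syslog. It assumes
the figures above are means over all USAGE lines in the given hour, and it
assumes a log line shape of "USAGE <userid> user: <sec> sys: <sec>". The
function name and the sample lines are invented for illustration and are not
taken from real logs.

```python
import re
from statistics import mean

# Assumed line shape (not confirmed by the post):
#   "... USAGE <userid> user: <seconds> sys: <seconds>"
USAGE_RE = re.compile(r"USAGE \S+ user: (\d+\.\d+) sys: (\d+\.\d+)")


def average_usage(lines):
    """Return (mean_user, mean_sys, count) over all matching USAGE lines."""
    user_times, sys_times = [], []
    for line in lines:
        match = USAGE_RE.search(line)
        if match:
            user_times.append(float(match.group(1)))
            sys_times.append(float(match.group(2)))
    if not user_times:
        # No USAGE lines found; avoid calling mean() on an empty list.
        return 0.0, 0.0, 0
    return mean(user_times), mean(sys_times), len(user_times)


# Hypothetical sample lines, invented for illustration only.
sample = [
    "Sep 29 09:00:01 fe1 imaps[1234]: USAGE alice user: 0.400000 sys: 0.030000",
    "Sep 29 09:00:02 fe1 imaps[1235]: USAGE bob user: 0.200000 sys: 0.010000",
    "Sep 29 09:00:03 fe1 imaps[1236]: some unrelated log line",
]

mean_user, mean_sys, count = average_usage(sample)
print(f"user: {mean_user:.6f}, sys: {mean_sys:.6f} ({count} USAGE lines)")
```

To get one result per hour, the log would first be filtered by timestamp
(for example with grep) and each slice passed to average_usage() separately.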

Comparing all three, I wonder why skiplist makes such a big difference.

lmtpd shows no relevant difference across all three.

Most likely I will go for horizontal scaling by adding a 4th frontend.

Greetings, Wolfgang
-- 
Wolfgang Breyha <wbreyha at gmx.net> | http://www.blafasel.at/
Vienna University Computer Center | Austria


