Upgrade 2.1.12 - 2.2.10 weirdness (subscribe -> cpu bound)

Gilles Bruno Gilles.Bruno at ujf-grenoble.fr
Fri Dec 3 14:00:16 EST 2004


Hi,

Sorry for this long mail... and for my poor English.

Since the announcement of CAN-2004-1015, we have been (slowly and cautiously) 
upgrading our Cyrus IMAP servers. Everything went fine when we upgraded Cyrus 
2.1.15 and 2.2.8 to 2.2.10, but we have run into problems upgrading our main 
server, which currently runs Cyrus imapd 2.1.12.

** What we did:

This server runs Cyrus 2.1.12 + SASL 2.1.12 + db3 and hosts ~3000 users and 
140 GB (RAID 5) of e-mail (about 20,000,000 messages). We decided, 
"prudently", to migrate its content to another Cyrus 2.2.10 server; the "old" 
2.1.12 server is *still* untouched.

We copied its mail spool, mailboxes.db (flat), and the user.seen and user.sub 
files to a freshly installed FreeBSD 4-STABLE + db4 (db41-4.1.25) + SASL 
2.1.18 + Cyrus 2.2.10 machine.

For its backend databases we use the following settings:

annotation_db:   skiplist          (unused on 2.1.12)
duplicate_db:    berkeley-nosync   (DB3 on 2.1.12)
mboxlist_db:     flat              (unchanged)
ptscache_db:     berkeley          (unused on 2.1.12)
quota_db:        flat              (unused - no quotas)
seenstate_db:    flat              (unchanged)
subscription_db: flat              (unchanged)
tlscache_db:     berkeley-nosync   (unused - no tls)

(we use the *same* DB backends on our old 2.1.12 server)

We reconstructed its mail spool twice (su cyrus -c "/usr/cyrus/bin/reconstruct 
-rf user") and ran chk_cyrus: both completed without any errors.


** What happened next:

So far, we have seen no visible problems: users can happily use their 
mailboxes, and neither seen states nor ACLs have been lost.

*But* when a single user wants to modify his subscriptions (using 
Mozilla/Thunderbird: "File" -> "Subscribe..."), the imapd process takes 
"ages" (~20 s), and, worse, this imapd eats ~80% CPU on a dual Xeon 2.8 GHz 
with 1 GB of RAM!

About 20 s later (even if the user has only 20 mailboxes), it returns the 
correct list, but we are really frightened of what will happen with ~500 
simultaneous users :/

... Furthermore, we run exactly the same Cyrus imapd binaries, on the same 
hardware (Dell PE2650) and OS (FreeBSD 4-STABLE), on another server 
(successfully upgraded from 2.2.8 to 2.2.10) without any problem: there, the 
subscribe dialog appears without any delay or CPU "plateau".

We ktraced the imapd process on both servers and found no differences (apart 
from the delays).


** Our conclusions so far:

  . it's not an I/O issue: there is no activity on the dedicated RAID 
partition (AHC 39160 controller), and the iostat figures stay practically flat


  . neither ctl_cyrusdb -r nor chk_cyrus complains, and there are no 
"suspicious" log entries

  . it's probably not a user.sub DB problem: we tried converting the user.sub 
files to skiplist and to Berkeley DB, and even recreating them, without any 
success

---------------- sample dialog (mailbox names obscured/removed):
        27 lsub "" "INBOX.*"\r
        (snip)
        27 OK Completed (0.000 secs 26 calls)\r

        28 list "" "INBOX.%"\r
        (snip)
        28 OK Completed (0.008 secs 31 calls)\r

        29 list "" "INBOX.%.%"\r
        (snip)
        29 OK Completed (0.016 secs 18 calls)\r

        30 lsub "" "user.*"\r
        (snip)
        30 OK Completed (0.000 secs 1 calls)\r

        31 list "" "user.%"\r
        (snip)
        31 OK Completed (6.227 secs 1 calls)\r    <- ###### THIS ONE

        32 list "" "user.%.%"\r
        (snip)
        32 OK Completed (6.375 secs 1 calls)\r    <- ###### THIS ONE

        33 lsub "" "*"\r
        (snip)
        33 OK Completed (0.008 secs 28 calls)\r

        34 list "" "%"\r
        (snip)
        * LIST (\\HasChildren) "." "INBOX"\r
        * LIST (\\HasChildren) "." "XXX"\r      (shared mb)
        34 OK Completed (6.305 secs 34 calls)\r   <- ###### THIS ONE

        35 list "" "%.%"\r
        * LIST (\\HasChildren) "." "INBOX.XXX"\r
        * LIST (\\HasNoChildren) "." "INBOX.YYY"\r
        (snip)
        * LIST (\\HasNoChildren) "." "crip-visio.gdfgdfg"\r    (shared mb)
        35 OK Completed (6.492 secs 32 calls)\r   <- ###### THIS ONE

        36 IDLE\r
        36 OK Completed\r

        37 close\r
        38 logout\r
----------------
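To check that the delay follows the LIST pattern rather than the client, the same sequence of commands can be replayed and timed outside Thunderbird. The following is a minimal sketch using Python's standard imaplib module; the helper name time_commands and the command list are our own assumptions, not anything shipped with Cyrus. Note that imaplib passes arguments through unquoted, so the quotes are part of the strings.

```python
import time

# (verb, reference, pattern) triples taken from the sample dialog.
# imaplib sends arguments verbatim, so the IMAP quotes are kept in the strings.
COMMANDS = [
    ("lsub", '""', '"INBOX.*"'),
    ("list", '""', '"INBOX.%"'),
    ("list", '""', '"INBOX.%.%"'),
    ("lsub", '""', '"user.*"'),
    ("list", '""', '"user.%"'),
    ("list", '""', '"user.%.%"'),
    ("lsub", '""', '"*"'),
    ("list", '""', '"%"'),
    ("list", '""', '"%.%"'),
]


def time_commands(conn, commands=COMMANDS):
    """Run each LIST/LSUB command on an already logged-in connection
    (for example an imaplib.IMAP4 object) and return a list of
    (verb, pattern, status, elapsed_seconds) tuples."""
    results = []
    for verb, reference, pattern in commands:
        method = getattr(conn, verb)  # conn.list or conn.lsub
        start = time.perf_counter()
        status, _lines = method(reference, pattern)
        elapsed = time.perf_counter() - start
        results.append((verb, pattern, status, elapsed))
    return results
```

Typical use would be `conn = imaplib.IMAP4("imap.example.org")`, then `conn.login(user, password)`, then printing each tuple returned by `time_commands(conn)`; the host name and credentials are placeholders. Running it once as an ordinary user and once as the admin account would give a like-for-like comparison with the cyradm test.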

  . the ktrace shows (of course) many system calls on the mailboxes.db file; 
apparently, the LIST "" "user.%" commands take ages to complete (our flat 
mailboxes.db file has 127,000 lines), but when we run a single 
"listmailboxes %" or "listmailboxes %.%" under cyradm, it completes at 
normal speed...
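For reference, RFC 3501 defines two LIST wildcards: `*` matches any characters including the hierarchy delimiter, while `%` matches any characters except the delimiter. The sketch below (function names are hypothetical, and this is not how Cyrus itself implements LIST) translates a pattern into a regular expression and walks a sorted list of mailbox names starting from the pattern's literal prefix, counting how many entries it has to examine. Under this simplified model, "user.%", "%" and "%.%" have to touch essentially every entry of a 127,000-line mailboxes.db, whereas "INBOX.%", presumably mapped internally to the user's own user.<login> subtree, touches only a handful.

```python
import bisect
import re


def imap_pattern_to_regex(pattern, delim="."):
    """Translate an IMAP LIST pattern into an anchored regular expression.
    '*' matches anything, including the hierarchy delimiter;
    '%' matches anything except the hierarchy delimiter (RFC 3501)."""
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")
        elif ch == "%":
            parts.append("[^" + re.escape(delim) + "]*")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts) + r"\Z")


def literal_prefix(pattern):
    """Return the part of the pattern before the first wildcard."""
    for i, ch in enumerate(pattern):
        if ch in "*%":
            return pattern[:i]
    return pattern


def prefix_scan(sorted_names, pattern, delim="."):
    """Return (matching names, number of entries examined).
    Assumes sorted_names is in plain Python string order; Cyrus'
    own on-disk ordering of mailboxes.db may differ."""
    regex = imap_pattern_to_regex(pattern, delim)
    prefix = literal_prefix(pattern)
    matches, examined = [], 0
    # Jump to the first name that could start with the literal prefix...
    i = bisect.bisect_left(sorted_names, prefix)
    # ...then walk forward until names no longer share that prefix.
    while i < len(sorted_names) and sorted_names[i].startswith(prefix):
        examined += 1
        if regex.match(sorted_names[i]):
            matches.append(sorted_names[i])
        i += 1
    return matches, examined
```

This only makes the asymmetry between the INBOX.* and user.* timings plausible; it does not explain why the same binaries behave differently on the other 2.2.10 server.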

                           =-=-=-=

Could any "gurus" out there enlighten us? We're running out of candles for 
our voodoo rituals... and of course (thanks, Mr. Murphy) we have to migrate 
quickly: our off-campus IMAP access has been blocked since Wed. 25/11.

If it is related to mailboxes.db (???), would a single reconstruct -rf 
starting from an empty mailboxes.db file help (although we already ran it 
twice, with no differences)?


Thanks for your patience with this long mail,

Best regards,


Gilles BRUNO
System Admin
University Joseph Fourier - France
---
Cyrus Home Page: http://asg.web.cmu.edu/cyrus
Cyrus Wiki/FAQ: http://cyruswiki.andrew.cmu.edu
List Archives/Info: http://asg.web.cmu.edu/cyrus/mailing-list.html




More information about the Info-cyrus mailing list