Cyrus Murder 2.3 - questions/problems

Mirosław Jaworski mjaw at ikp.pl
Fri May 19 07:13:25 EDT 2006


Hi

I am new to this list, although I have been running a couple of Cyrus
installations successfully for a couple of years.

I decided to migrate my biggest installation, a single big box
( ~40k mailboxes ), to a somewhat more fault-tolerant x86 farm.
Having had good experience with Cyrus, I decided to test Cyrus Murder.

I started from
http://asg.web.cmu.edu/cyrus/download/imapd/install-murder.html
trying to build a setup with 2 frontends and 2 backends.

After some time I found the unified backend/frontend feature in 2.3 and
decided to give it a shot.

My current lab configuration is a single FreeBSD 5.4 box with jails,
each with a murder-enabled Cyrus 2.3 installed. The jails are
- 10.10.10.1 ( mupdate master )
- 10.10.10.2 ( mupdate slave )

Problems:
- mupdate eating CPU
  - while starting the first node ( with mupdate master ), "mupdate -m"
    eats a lot of CPU time

start of the first node with mupdate master:

May 19 12:00:55 lab master[99930]: process started
May 19 12:00:55 lab ctl_cyrusdb[99931]: recovering cyrus databases
May 19 12:00:55 lab ctl_cyrusdb[99931]: skiplist: recovered /var/imap/mailboxes.db (2 records, 580 bytes) in 0 seconds
May 19 12:00:55 lab ctl_cyrusdb[99931]: skiplist: recovered /var/imap/annotations.db (0 records, 144 bytes) in 0 seconds
May 19 12:00:55 lab ctl_cyrusdb[99931]: done recovering cyrus databases
May 19 12:00:55 lab master[99930]: ready for work
May 19 12:00:55 lab ctl_cyrusdb[99932]: checkpointing cyrus databases
May 19 12:00:55 lab ctl_cyrusdb[99932]: done checkpointing cyrus databases

  - when I start the second node, the master's mupdate stops eating CPU
    time and the slave's starts

start of the second node with mupdate slave:

May 19 12:03:58 lab master[99982]: process started
May 19 12:03:58 lab ctl_cyrusdb[99983]: recovering cyrus databases
May 19 12:03:58 lab ctl_cyrusdb[99983]: skiplist: recovered /var/imap/mailboxes.db (1 record, 568 bytes) in 0 seconds
May 19 12:03:58 lab ctl_cyrusdb[99983]: skiplist: recovered /var/imap/annotations.db (0 records, 144 bytes) in 0 seconds
May 19 12:03:58 lab ctl_cyrusdb[99983]: done recovering cyrus databases
May 19 12:03:58 lab master[99982]: ready for work
May 19 12:03:58 lab ctl_cyrusdb[99984]: checkpointing cyrus databases
May 19 12:03:58 lab ctl_cyrusdb[99984]: done checkpointing cyrus databases
May 19 12:03:58 lab mupdate[99933]: no user in db
May 19 12:03:58 lab mupdate[99933]: login: mail1.test.pl [10.10.10.1] mupdate DIGEST-MD5 User logged in
May 19 12:03:58 lab mupdate[99985]: successful mupdate connection to 10.10.10.1
May 19 12:03:58 lab mupdate[99985]: unready for connections
May 19 12:03:58 lab mupdate[99985]: synchronizing mailbox list with master mupdate server
May 19 12:03:58 lab mupdate[99985]: mailbox list synchronization complete

  - ktrace of the CPU-eating mupdate doesn't show anything interesting:

 72078 mupdate  0.000000 CALL  kse_wakeup(0x810d990)
 72078 mupdate  0.000017 RET   kse_wakeup 0
 72078 mupdate  0.000029 RET   kse_release 0
 72078 mupdate  0.000038 CALL  kse_release(0x812bfac)
 72078 mupdate  0.000048 CALL  kse_wakeup(0x810d990)
 72078 mupdate  0.000052 RET   kse_wakeup 0
 72078 mupdate  0.000059 RET   kse_release 0
 72078 mupdate  0.000067 CALL  gettimeofday(0xbfa8de58,0)
 72078 mupdate  0.000073 RET   gettimeofday 0
 72078 mupdate  0.000077 CALL  select(0x7,0xbfa8deb0,0,0,0xbfa8dea8)
 72078 mupdate  0.000087 RET   select 0
 72078 mupdate  0.000091 CALL  gettimeofday(0xbfa8de58,0)
 72078 mupdate  0.000096 RET   gettimeofday 0
 72078 mupdate  0.000102 CALL  kse_wakeup(0x810da10)
 72078 mupdate  0.000107 RET   kse_wakeup 0
 72078 mupdate  0.000115 RET   kse_release 0
 72078 mupdate  0.000123 CALL  kse_release(0x812ffac)
 72078 mupdate  0.000135 CALL  kse_release(0x8113fac)
 72078 mupdate  0.000146 CALL  kse_wakeup(0x810da10)
 72078 mupdate  0.000151 RET   kse_wakeup 0
 72078 mupdate  0.000158 RET   kse_release 0
 72078 mupdate  0.000165 CALL  kse_wakeup(0x810d410)
 72078 mupdate  0.000170 RET   kse_wakeup 0
 72078 mupdate  0.000177 RET   kse_release 0
 72078 mupdate  0.000183 CALL  gettimeofday(0xbfaadfa0,0)
 72078 mupdate  0.000187 RET   gettimeofday 0
 72078 mupdate  0.000194 CALL  kse_release(0x8113fac)
 72078 mupdate  0.000206 CALL  gettimeofday(0xbfa8dfa0,0)
 72078 mupdate  0.000212 RET   gettimeofday 0
 72078 mupdate  0.000220 CALL  kse_release(0x812bfac)
 72078 mupdate  0.000233 CALL  gettimeofday(0xbfa9de58,0)
 72078 mupdate  0.000238 RET   gettimeofday 0
 72078 mupdate  0.000243 CALL  select(0x7,0xbfa9deb0,0,0,0xbfa9dea8)
 72078 mupdate  0.000249 RET   select 0

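The whole excerpt spans only about a quarter of a millisecond, and select()
keeps returning 0 right away, so mupdate appears to spin rather than sleep :/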
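
A rough way to put numbers on such a trace, as a sketch only: the Python
below tallies the CALL entries per syscall and measures the time span
covered. The name summarize_kdump is made up for illustration, and the code
assumes the third column is an elapsed timestamp in seconds, which is how
the excerpt above reads.

```python
import re
from collections import Counter

# pid, command, timestamp, CALL/RET, syscall name (layout assumed from the excerpt)
LINE = re.compile(r"^\s*(\d+)\s+(\S+)\s+(\d+\.\d+)\s+(CALL|RET)\s+(\w+)")


def summarize_kdump(text):
    """Return (Counter of CALLs per syscall, seconds between first and last entry)."""
    calls = Counter()
    stamps = []
    for line in text.splitlines():
        m = LINE.match(line)
        if not m:
            continue  # skip anything that is not a CALL/RET record
        stamps.append(float(m.group(3)))
        if m.group(4) == "CALL":
            calls[m.group(5)] += 1
    elapsed = max(stamps) - min(stamps) if stamps else 0.0
    return calls, elapsed


# Four lines copied from the trace above, just to exercise the parser.
SAMPLE = """\
 72078 mupdate  0.000067 CALL  gettimeofday(0xbfa8de58,0)
 72078 mupdate  0.000073 RET   gettimeofday 0
 72078 mupdate  0.000077 CALL  select(0x7,0xbfa8deb0,0,0,0xbfa8dea8)
 72078 mupdate  0.000087 RET   select 0
"""

calls, elapsed = summarize_kdump(SAMPLE)
for name, count in calls.most_common():
    print(f"{name}: {count} call(s)")
print(f"span: {elapsed * 1e6:.0f} microseconds")
```

Fed a longer kdump capture, the same tally gives a calls-per-second figure
that is easier to compare between the master and the slave than eyeballing
the raw output.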

- "synchronizing mailbox list" by mupdate seems to work one way only
  - adding a mailbox on 10.10.10.1
    the mailbox is visible ( cyradm's lm command ) on 10.10.10.1,
    but not on 10.10.10.2

logging in to 10.10.10.1 and adding user.test1, syslog:

May 19 12:18:17 lab imap[412]: no user in db
May 19 12:18:17 lab imap[412]: login: mail1.test.pl [10.10.10.1] cyradm DIGEST-MD5 User logged in

  - adding a mailbox on 10.10.10.2
    the mailbox is visible on 10.10.10.2 and, almost immediately, also on
    10.10.10.1

logging in to 10.10.10.2 and adding user.test2 ( mupdate follows ), syslog:

May 19 12:18:39 lab imap[420]: no user in db
May 19 12:18:39 lab imap[420]: login: mail2.test.pl [10.10.10.2] cyradm DIGEST-MD5 User logged in
May 19 12:18:43 lab mupdate[99933]: no user in db
May 19 12:18:43 lab mupdate[99933]: login: mail1.test.pl [10.10.10.1] mupdate DIGEST-MD5 User logged in

- proxying doesn't work (?) as expected (?)
  ( user.test1 mailbox on 10.10.10.1, user.test2 on 10.10.10.2 )

logging in directly on the node where the user's mailbox exists works:

# telnet 10.10.10.1 110
Trying 10.10.10.1...
Connected to 10.10.10.1.
Escape character is '^]'.
+OK mail1.test.pl Cyrus POP3 v2.3.3 server ready <3899096700.1148034763@mail1.test.pl>
user test1
+OK Name is a valid mailbox
pass test
+OK Mailbox locked and ready
quit
+OK
Connection closed by foreign host.

# telnet 10.10.10.2 110
Trying 10.10.10.2...
Connected to 10.10.10.2.
Escape character is '^]'.
+OK mail2.test.pl Cyrus POP3 Murder v2.3.3 server ready <2557214129.1148034910@mail2.test.pl>
user test2
+OK Name is a valid mailbox
pass test
+OK Mailbox locked and ready
quit
+OK
Connection closed by foreign host.

but ( notice the different results when connecting to the master and to
the slave as a user whose mailbox is on the other node ):

connecting to the node with the master mupdate as a user whose mailbox
is on the 2nd node:

# telnet 10.10.10.1 110
Trying 10.10.10.1...
Connected to 10.10.10.1.
Escape character is '^]'.
+OK mail1.test.pl Cyrus POP3 v2.3.3 server ready <2621659328.1148035730@mail1.test.pl>
user test2
+OK Name is a valid mailbox
pass test
-ERRSASL(-4): no mechanism available: No worthy mechs found   

It looks like 10.10.10.1 tried to proxy the connection to 10.10.10.2
but couldn't authenticate to 10.10.10.2, syslog:

May 19 12:49:00 lab pop3[1240]: no user in db
May 19 12:49:00 lab pop3[1240]: login: mail1.test.pl [10.10.10.1] test2 plaintext User logged in
May 19 12:49:00 lab pop3[1240]: No worthy mechs found
May 19 12:49:00 lab pop3[1240]: couldn't authenticate to backend server: no mechanism available
May 19 12:49:00 lab pop3[1240]: couldn't authenticate to backend server

Why the limitation? If the user already logs in with weak plaintext, do
we really risk anything more by using it inside the murder too?

connecting to the node with the slave mupdate as a user whose mailbox
is on the 1st node:

# telnet 10.10.10.2 110
Trying 10.10.10.2...
Connected to 10.10.10.2.
Escape character is '^]'.
+OK mail2.test.pl Cyrus POP3 Murder v2.3.3 server ready <1730543175.1148036623@mail2.test.pl>
user test1
+OK Name is a valid mailbox
pass test
-ERR [SYS/PERM] Unable to locate maildrop: Mailbox does not exist

That case is simple :/ 10.10.10.2 doesn't know that this is a valid
mailbox located on 10.10.10.1 :/
syslog:

May 19 13:03:48 lab pop3[1359]: no user in db
May 19 13:03:48 lab pop3[1359]: login: mail2.test.pl [10.10.10.2] test1 plaintext User logged in
May 19 13:03:51 lab pop3[1359]: Unable to locate maildrop user.test1: Mailbox does not exist
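
The four manual telnet checks above could be scripted roughly like this.
It is only a sketch: it uses Python's standard poplib, the helper names
try_login, check_matrix and report are hypothetical, and the addresses,
users and password are simply the lab values from this message.

```python
import poplib


def try_login(host, user, password, port=110, timeout=10, pop3_factory=poplib.POP3):
    """Attempt a plaintext USER/PASS login; return (ok, detail)."""
    try:
        conn = pop3_factory(host, port, timeout=timeout)
    except (OSError, poplib.error_proto) as exc:
        return False, f"connect failed: {exc}"
    try:
        banner = conn.getwelcome().decode(errors="replace")
        conn.user(user)
        conn.pass_(password)
        return True, banner
    except poplib.error_proto as exc:
        # poplib raises error_proto for any -ERR reply; keep the server's text.
        return False, str(exc)
    finally:
        try:
            conn.quit()
        except (poplib.error_proto, OSError):
            pass  # the server may already have dropped the connection


def check_matrix(hosts, users, password, login=try_login):
    """Try every user against every node and collect the results."""
    return {(h, u): login(h, u, password) for h in hosts for u in users}


def report(results):
    """Print one line per (node, user) pair."""
    for (host, user), (ok, detail) in sorted(results.items()):
        print(f"{host} {user}: {'OK' if ok else 'FAIL'} {detail}")
```

Calling report(check_matrix(["10.10.10.1", "10.10.10.2"], ["test1", "test2"], "test"))
from inside the lab should then show, in one go, which node/user
combinations get proxied and which fail, so the test is easy to repeat
after each configuration change.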


Questions:
- does each machine in a unified Cyrus Murder setup need to have
  a distinct name ( imapd.conf's "servername:" )?
  - when all the servers had the same servername, traffic wasn't
    proxied to the proper box at all, I believe, hence I gave
    each node a distinct name.
  - I would prefer to show the same FQDN to the users no matter which
    node they connect to

# telnet 10.10.10.1 110
Trying 10.10.10.1...
Connected to 10.10.10.1.
Escape character is '^]'.
+OK mail1.test.pl Cyrus POP3 v2.3.3 server ready <1261877316.1148034195@mail1.test.pl>
    ^^^^^^^^^^^^^

- authentication 
  - what's the minimal set of authentication entries I need to add
    to the nodes' SASL databases? It looks like I need to add at least
    each user on each node with that node's realm. Correct or wrong?
   
Regards

M

-- 
Miroslaw "Psyborg" Jaworski
GCS/IT d- s+:+ a C++$ UBI++++$ P+++$ L- E--- W++(+++)$ N++ o+ K- w-- O-
M- V- PS+ PE++ Y+ PGP t 5? X+ R++ !tv b++(+++) DI++ D+ G e* h++ r+++ y?


