sync_client fails to exit when manual replication and rolling replication are combined (2.3.16-8)

Simpson, John R john_simpson at reyrey.com
Fri Oct 15 15:42:21 EDT 2010


Greetings all,

With the help of this list, I've successfully upgraded our lab 2.3.7 (RHEL/CentOS packaged) server to 2.3.16-8 and tested rolling replication, manual replication by user, and manual replication by mailbox.  Everything was going better than expected until I shut down cyrus-imapd and /var/log/maillog started filling up with DB errors.

If I shut down cyrus-imapd with rolling replication enabled and have not run sync_client manually, both Cyrus and sync_client shut down cleanly.

However, if I have run sync_client manually while rolling replication is enabled, the rolling replication instance will not exit.  Instead, it appears to start spawning subprocesses and throwing database errors.  The change in database errors shown below (from "reference count went negative" to "PANIC: fatal region error detected") appears to coincide with the completion of "Exporting cyrus-imapd databases".  The critical DB error messages continue until sync_client is killed.

I've run "ctl_cyrusdb -r" as suggested by the "run recovery" message.

Below are the steps that reproduce the problem, /var/log/maillog, the most relevant portions of imapd.conf and cyrus.conf, and the packages installed on the system.  cyrus-imapd-2.3.16-8 was built with "rpmbuild -ba" on CentOS 5.4 64-bit using http://www.invoca.ch/pub/packages/cyrus-imapd/cyrus-imapd-2.3.16-8.src.rpm.  The cyrus-sasl and db4 packages are from CentOS.  Please let me know if any other information would be useful.

Thank you for your help.

Best regards,

John


# /usr/lib/cyrus-imapd/sync_client -v -u testuser@testdomain.net
USER testuser@testdomain.net
ADDSUB testuser@testdomain.net INBOX
# date ; service cyrus-imapd stop
Fri Oct 15 14:51:34 EDT 2010
Shutting down cyrus-imapd:                                 [  OK  ]
Exporting cyrus-imapd databases:                           [  OK  ]

Oct 15 14:50:58 eml-store04 sync_client[23742]: USER received NO response: IMAP_MAILBOX_NONEXISTENT Failed to access inbox for testuser@testdomain.net: Mailbox does not exist

  NOTE: Despite this message, the user appears identical on the 
  master and replica when checked with ctl_mboxlist -d.

Oct 15 14:51:35 eml-store04 master[22922]: attempting clean shutdown on SIGQUIT
Oct 15 14:51:35 eml-store04 master[22922]: process 22950 exited, status 75
Oct 15 14:51:35 eml-store04 master[22922]: process 22949 exited, status 75
Oct 15 14:51:35 eml-store04 master[22922]: process 22948 exited, status 75
Oct 15 14:51:35 eml-store04 master[22922]: process 22947 exited, status 75
Oct 15 14:51:35 eml-store04 master[22922]: process 22946 exited, status 75
Oct 15 14:51:35 eml-store04 master[22922]: process 22945 exited, status 75
Oct 15 14:51:35 eml-store04 master[22922]: process 22944 exited, status 75
Oct 15 14:51:35 eml-store04 master[22922]: process 22943 exited, status 75
Oct 15 14:51:35 eml-store04 master[22922]: process 22939 exited, status 75
Oct 15 14:51:35 eml-store04 master[22922]: process 22938 exited, status 75
Oct 15 14:51:35 eml-store04 master[22922]: process 22937 exited, status 75
Oct 15 14:51:35 eml-store04 master[22922]: process 22936 exited, status 75
Oct 15 14:51:35 eml-store04 master[22922]: process 22935 exited, status 75
Oct 15 14:51:35 eml-store04 master[22922]: process 22934 exited, status 75
Oct 15 14:51:35 eml-store04 master[22922]: process 22933 exited, status 75
Oct 15 14:51:35 eml-store04 master[22922]: process 22932 exited, status 75
Oct 15 14:51:35 eml-store04 master[22922]: process 22931 exited, status 75
Oct 15 14:51:35 eml-store04 master[22922]: All children have exited, closing down
Oct 15 14:51:35 eml-store04 sync_client[23914]: DBERROR db4: region 1 (environment): reference count went negative
Oct 15 14:51:35 eml-store04 sync_client[23916]: DBERROR db4: region 1 (environment): reference count went negative
Oct 15 14:51:35 eml-store04 sync_client[23919]: DBERROR db4: region 1 (environment): reference count went negative
Oct 15 14:51:35 eml-store04 sync_client[23925]: DBERROR db4: region 1 (environment): reference count went negative
Oct 15 14:51:35 eml-store04 sync_client[23929]: DBERROR db4: region 1 (environment): reference count went negative
... many more ...
Oct 15 14:51:41 eml-store04 sync_client[25331]: DBERROR db4: region 1 (environment): reference count went negative
Oct 15 14:51:41 eml-store04 sync_client[25332]: DBERROR db4: region 1 (environment): reference count went negative
Oct 15 14:51:41 eml-store04 sync_client[25333]: DBERROR db4: PANIC: fatal region error detected; run recovery
Oct 15 14:51:41 eml-store04 sync_client[25333]: DBERROR: critical database situation
Oct 15 14:51:41 eml-store04 sync_client[25353]: DBERROR db4: PANIC: fatal region error detected; run recovery
Oct 15 14:51:41 eml-store04 sync_client[25353]: DBERROR: critical database situation
... continue until sync_client is killed ...
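
For anyone trying to reproduce this, the sync_client DBERROR lines can be tallied by message text to see where the "reference count went negative" errors give way to the PANIC errors. The following is only a minimal sketch: summarize_dberrors is a hypothetical helper name, and it assumes the syslog line format shown above.

```shell
# Hypothetical helper (not part of Cyrus): count sync_client DBERROR
# lines in a syslog-style file, grouped by message text.
# Assumes lines look like:
#   <date> <host> sync_client[<pid>]: DBERROR ...
summarize_dberrors() {
    # Keep only DBERROR lines logged by sync_client.
    grep 'sync_client\[[0-9]*]: DBERROR' "$1" \
        | sed 's/^.*sync_client\[[0-9]*]: //' \
        | sort | uniq -c | sort -rn
    # sed strips the timestamp, host, and PID so identical messages
    # collapse; the final sort lists the most frequent message first.
}
```

It would be called as "summarize_dberrors /var/log/maillog", where the log path is the one used on this system and may differ elsewhere.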


From /etc/cyrus.conf:
START {
  # do not delete this entry!
  recover       cmd="ctl_cyrusdb -r"
  # this is only necessary if using idled for IMAP IDLE
  idled         cmd="idled"
  syncclient    cmd="/usr/lib/cyrus-imapd/sync_client -r -o" listen="csync"
}

From /etc/imapd.conf:
  ## Added for replication -- Master
  sync_host: eml-replica04.asddev.reyrey.com
  sync_authname: xyz
  sync_password: abc
  sync_compress: 0
  sync_log: 1
  guid_mode: sha1

Packages installed:
  cyrus-imapd-2.3.16-8
  cyrus-imapd-utils-2.3.16-8
  cyrus-sasl-2.1.22-5.el5_4.3
  cyrus-sasl-lib-2.1.22-5.el5_4.3
  cyrus-sasl-lib-2.1.22-5.el5_4.3
  cyrus-sasl-plain-2.1.22-5.el5_4.3
  cyrus-sasl-plain-2.1.22-5.el5_4.3
  db4-4.3.29-10.el5_5.2
  db4-4.3.29-10.el5_5.2
  db4-utils-4.3.29-10.el5_5.2
  postfix-2.3.3-2.1.el5_2


John Simpson 
Senior Software Engineer, I. T. Engineering and Operations


