replication fails after 2.3.9 -> 2.3.11

Paul Dekkers Paul.Dekkers at surfnet.nl
Sun Feb 24 13:58:19 EST 2008


Hi,

Paul Dekkers wrote:
> Simon Matter wrote:
>>> Paul Dekkers wrote:
>>>
>>>> I finally found a moment for upgrading my 2.3.9 install (using Simon's
>>>> RPMs on Red Hat 4.6, 64-bit) to 2.3.11-3 (leaving the config files
>>>> untouched), after which it seems that replication isn't working properly
>>>> anymore.
>>> While it seems that only replication is failing for now, is it
>>> possible to revert to the previous version? (For me that implies
>>> that I'll have to rpm -e and install the previous RPM, I suppose.)
>>> Not sure if I'd like that, but I really, really liked having my
>>> replication running.
>> You could do that with "rpm -Uvh --oldpackage ...".
> 
> Ah, thanks for that. I might do that if I can't get it to work soon.
> 
> (Judging from the changes/upgrade notes, I guess nothing dramatic or
> irreversible changed in any of the databases/formats. I haven't touched
> the GUID bits yet, so I guess I should be fine downgrading.)

I have now reverted to the old packages. Although regular operation is
fine again, I still have some problems with replication and/or my
replica, even though at first the downgrade seemed to solve everything.

After the downgrade I was pleasantly surprised that I could log in on
the replica (I just started imapd on the replica to test this; this was
my "downgrade test") before manually replicating user.paul from the
master. After this sync, I was unable to SELECT my INBOX on the replica
(everything was OK on the master). A reconstruct of user.paul on the
replica solved this.

While strace-ing it, it appears that imapd hung on the seen file, either
during the mmap of it or during the fcntl, but I'm not convinced that
this file was faulty: I could easily cvt it to flat and back to skiplist
without that solving the issue (and without any other errors).
(This could be completely unrelated; I've seen this before, where imapd
hung and consumed 100% CPU until the folder was reconstructed.)

But it now seems that every folder that was successfully synced under
2.3.11 needs a reconstruct on the replica. The replica logs:

syncserver[19616]: cmd_status_work_sub(): UIDs out of order!
last message repeated 304 times
master[19519]: process 19616 exited, signaled to death by 7
master[19519]: service syncserver pid 19616 in BUSY state: terminated abnormally

Fortunately, I see no such errors on my master. But, well: with 2.3.11
on both systems it seemed that I had to reconstruct all folders on my
master, and now that I have downgraded to 2.3.9 it seems that I need to
reconstruct all (touched) folders on my replica. (At least that does not
consume user CPU, of course.)

I'm not 100% sure whether I'm better off than before the downgrade. I'll
find out after reconstructing some more users, I suppose (which takes ages).

Any clues/suggestions are welcome :-)

Paul


>>>> If I run the sync_client, just a simple -u paul, I see in my logs:
>>>>
>>>> sync_client[18493]: SETMODSEQ received BAD response: Syntax error in Setflags: Invalid modseq
>>>> sync_client[18493]: Error in do_user(paul): bailing out!
>>>>
>>>> Before the upgrade, I'm sure replication was working properly. I
>>>> checked, both servers are really running the same versions of
>>>> everything.
>>>>
>>>> I noticed that if I strace the sync_client, the folder on which it bails
>>>> out is always the same. If I reconstruct that folder and re-run the sync
>>>> (for the whole user, or just that mailbox), the process continues up to
>>>> the next folder that causes it to bail out (although it doesn't bail out
>>>> on every folder).
>>>>
>>>> There were more strange log items related to the sync_client:
>>>>
>>>> sync_client[18232]: USER: Invalid type 1 response from server
>>>> sync_client[18232]: Discarding: 0000000000000000000000000000000000000000 ()
>>>> sync_client[18232]: Discarding: 2 0 0000000000000000000000000000000000000000 ()
>>>> sync_client[18232]: Discarding: 3 0 0000000000000000000000000000000000000000 ()
>>>> sync_client[18232]: Discarding: 4 0 0000000000000000000000000000000000000000 (\answered)
>>>> sync_client[18232]: Discarding: 5 0 0000000000000000000000000000000000000000 ()
>>>>
>>>> and a bunch more, like:
>>>>
>>>> sync_client[18232]: Discarding: archief.thuispc
>>>> ...
>>>> sync_client[18232]: sync_eatlines_unsolicited(): resynchronised okay
>>>> ...
>>>> sync_client[18232]: Processing sync log file /data/config/imap/sync/log-18231 failed: Bad protocol
>>>> sync_client[18231]: process 18232 exited, status 1
>>>>
>>>> Any clue why replication stopped working properly for me after the
>>>> upgrade?
>>> There is more sync-related ugliness in my logging, although I suppose
>>> this is the most harmless one:
>>>
>>> sync_client[19532]: Hit upload limit 0 at UID 180958 for user.paul.Junk, sending
>>>
>>> ... I don't recall seeing it before. (And a limit of 0?!)
>>>
>>> What is worse is that sync_client now also segfaults on the
>>> rolling log as soon as I start a sync_client -v -r -f log:
>>>
>>> MAILBOXES user.henny user.henny.Email lists.IETF-announce
>>> user.paul.Drafts archief.netmaster.spam user.elise
>>> Segmentation fault
>>>
>>> And my kernel logs that as:
>>>
>>> sync_client[18881]: segfault at 0000000000000000 rip 0000002a96054a30 rsp 0000007fbfffda08 error 4
>>>
>>> ... unfortunately, the sync log is only getting bigger, and I didn't
>>> realize that running a sync_client -r -f log would take that much IO
>>> and CPU (or perhaps that changed in this version too).
>>>
>>> I'm not sure whether running a reconstruct on all mailboxes is an
>>> option; it would take a huge amount of time. And somehow I don't
>>> think it makes sense.
>>>
>>> I'll include my imapd.conf below, in case that is useful.
>>>
>>> Paul
>>>
>>> P.S. Hmm, and I intentionally skipped 2.3.10 as I believe that people
>>> were having problems with it, and waited a bit before moving to 2.3.11 :-S
>> I'm not using replication but IIRC there were some changes between 2.3.9
>> and 2.3.11 which have to be addressed when using replication. Did you
>> carefully check the upgrade instructions? Maybe there is something you
>> have to do.
> 
> I did have a look at that, but I'm afraid there's nothing in there that
> I missed. I didn't touch the GUIDs (and your RPM leaves guid_mode at its
> default, which is "off"). There are a couple of changes in replication
> that might just be the cause of my problems, but I'm afraid none of them
> is clearly related. (At least there's nothing I should have done that I
> didn't.)
> 
> I actually run replication with 2.3.11 on a different machine without
> problems, but that's a small setup and on FreeBSD instead of Red Hat.
> But I know what the differences with the RPM are (the manual is very
> helpful there), so I don't expect anything RPM-specific. (And there
> was actually a fix in 2.3.11 for delayed delete in combination with
> replication, so even though the invoca RPM enables delayed delete by
> default, I think it should work.)
> 
>> Another note: be aware that the invoca RPM has some changed defaults for
>> imapd.conf (as stated in the manpage). If one of those features doesn't
>> play nicely with replication, it won't show up for people who don't have
>> those options enabled. Options that come to mind are:
>>
>> delete_mode: delayed
>> expunge_mode: delayed
>> flushseenstate: 1
> 
> The delete_mode default is indeed new; I hadn't changed that setting,
> although I had for expunge_mode, but setting it back to "immediate"
> doesn't help me either. (And I actually found that I now have a folder
> "DELETED" that also got replicated (before it crashed again) ;-) but,
> surprisingly, to a different partition than on my master. Oh well.)
> 
>> Sorry if it doesn't really help.
> 
> Well, thanks for replying!
> (The suggestion how to revert to the previous Cyrus is useful, and I'm
> afraid I'll need it.)
> 
> Paul
> 
> 
>> Simon
>>
>>> My imapd.conf on the master:
>>>
>>> configdirectory: /data/config/imap
>>> defaultpartition: imap4
>>> partition-imap1: /data/imap1
>>> partition-imap2: /data/imap2
>>> partition-imap3: /data/imap3
>>> partition-imap4: /data/imap4
>>> sievedir: /data/config/sieve
>>> hashimapspool: false
>>>
>>> md5_dir: /data/config/md5
>>>
>>> allowanonymouslogin: no
>>> allowplaintext: yes
>>> plaintextloginpause: 0
>>> admins: cyrus
>>> sasl_pwcheck_method: saslauthd
>>> sasl_mech_list: PLAIN LOGIN
>>> #sasl_pwcheck_method: auxprop
>>>
>>> duplicatesuppression: 1
>>> quotawarn: 90
>>> postuser: shared
>>> lmtp_downcase_rcpt: yes
>>> username_tolower: yes
>>>
>>> sieveuserhomedir: false
>>> unix_group_enable: 1
>>>
>>> sync_host: ...
>>> sync_authname: cyrus
>>> sync_password: ...
>>>
>>> sync_machineid: 2
>>> sync_log: true
>>>
>>> # default invoca-rpm db definitions on this machine!
>>> ## explicit database definitions (from the past)
>>> ##duplicate_db: skiplist
>>> ## deliver.db: Berkeley DB (Btree, version 8, native byte-order)
>>> #duplicate_db: berkeley
>>> #mboxlist_db: skiplist
>>> ## mailbox keys?
>>> #mboxkey_db: skiplist
>>> #seenstate_db: skiplist
>>> #subscription_db: flat
>>> ##tlscache_db: skiplist
>>> ## tls_sessions.db: Berkeley DB (Btree, version 8, native byte-order)
>>> #tlscache_db: berkeley
>>> #annotation_db: skiplist
>>> ##ptscache_db: skiplist
>>> #ptscache_db: berkeley
>>> #quota_db: quotalegacy
>>>
>>> # without this, I got errors in my test-setup using the dovecot imaptest
>>> expunge_mode: immediate
>>>
>>> ----
>>> Cyrus Home Page: http://cyrusimap.web.cmu.edu/
>>> Cyrus Wiki/FAQ: http://cyrusimap.web.cmu.edu/twiki
>>> List Archives/Info: http://asg.web.cmu.edu/cyrus/mailing-list.html
>>>
>>


