sync_client problems

Bron Gondwana brong at fastmail.fm
Fri May 12 09:16:56 EDT 2017


Just read the 148 file and see if there is corruption.  I suspect that's the cause.

Bron

On Fri, 12 May 2017, at 23:09, Eric Cunningham wrote:
> I have snapshot backups of that account, that particular message and the 
> various cyrus db files.  What can I provide for review?
> 
> 
> On 5/11/17 7:38 PM, Bron Gondwana wrote:
> > It looked like the '148.' file was corrupted in some way.  I assume you've deleted all the evidence now, so we can't see how!  Oh well :(
> >
> > Bron.
> >
> > On Fri, 12 May 2017, at 04:55, Eric Cunningham wrote:
> >> Just to close the loop on this, once the corrupted cdm-lit account on my
> >> copy server successfully reconstructed, the backlog of replication has
> >> completed successfully and is now caught up.
> >>
> >> -Eric
> >>
> >> On 05/11/2017 02:20 PM, Eric Cunningham wrote:
> >>> -G didn't seem to help, but I tried a "reconstruct -f -r" command
> >>> without having previously deleted all occurrences of the cyrus.* files
> >>> and it was then successful.
> >>>
> >>>
> >>>
> >>> On 05/11/2017 01:44 PM, Patrick Boutilier wrote:
> >>>> Try a -G with reconstruct?
> >>>>
> >>>> -G     Force  re-parsing  of the underlying message (checks GUID
> >>>> correctness).  Reconstruct with -G should fix all possible individual
> >>>> message issues, including corrupted data files.
> >>>>
> >>>>
> >>>> On 05/11/2017 02:37 PM, Eric Cunningham wrote:
> >>>>> I have to walk this back.  In looking slightly further back in my
> >>>>> logfiles, before every instance of a failure to sync some folder, I
> >>>>> see a common error reported prior to every "bailing out! Bad
> >>>>> protocol" error - "IOERROR: GUID mismatch
> >>>>> /var/spool/cyrus/mail/c/user/cdm-lit/Sent/148."
> >>>>>
> >>>>> May 11 12:44:09 imap1 sync_client[48590]: IOERROR: GUID mismatch
> >>>>> /var/spool/cyrus/mail/c/user/cdm-lit/Sent/148.
> >>>>> May 11 12:44:22 imap1 sync_client[48590]: IOERROR: zero length
> >>>>> response to MAILBOX (end of file reached)
> >>>>> May 11 12:44:22 imap1 sync_client[48590]: do_folders(): update
> >>>>> failed: user.cdm-lit.Sent 'Bad protocol'
> >>>>> May 11 12:44:22 imap1 sync_client[48590]: sync_mailboxes: doing 1000
> >>>>> May 11 12:44:22 imap1 sync_client[48590]: IOERROR: zero length
> >>>>> response to MAILBOXES (end of file reached)
> >>>>> May 11 12:44:22 imap1 sync_client[48590]: sync_mailboxes: doing 1000
> >>>>> May 11 12:44:22 imap1 sync_client[48590]: IOERROR: zero length
> >>>>> response to MAILBOXES (end of file reached)
> >>>>> May 11 12:44:22 imap1 sync_client[48590]: sync_mailboxes: doing 1000
> >>>>> May 11 12:44:22 imap1 sync_client[48590]: IOERROR: zero length
> >>>>> response to MAILBOXES (end of file reached)
> >>>>> May 11 12:44:22 imap1 sync_client[48590]: sync_mailboxes: doing 1000
> >>>>> May 11 12:44:22 imap1 sync_client[48590]: IOERROR: zero length
> >>>>> response to MAILBOXES (end of file reached)
> >>>>> May 11 12:44:22 imap1 sync_client[48590]: sync_mailboxes: doing 1000
> >>>>> May 11 12:44:22 imap1 sync_client[48590]: IOERROR: zero length
> >>>>> response to MAILBOXES (end of file reached)
> >>>>> May 11 12:44:22 imap1 sync_client[48590]: sync_mailboxes: doing 1000
> >>>>> May 11 12:44:22 imap1 sync_client[48590]: IOERROR: zero length
> >>>>> response to MAILBOXES (end of file reached)
> >>>>> May 11 12:44:22 imap1 sync_client[48590]: IOERROR: zero length
> >>>>> response to MAILBOXES (end of file reached)
> >>>>> May 11 12:44:22 imap1 sync_client[48590]: IOERROR: zero length
> >>>>> response to UNMAILBOX (end of file reached)
> >>>>> May 11 12:44:22 imap1 sync_client[48590]: folder_delete(): failed:
> >>>>> user.guest-student-coordinator.Trash.Aarflot 'Bad protocol'
> >>>>> May 11 12:44:22 imap1 sync_client[48590]: Error in do_sync(): bailing
> >>>>> out! Bad protocol
> >>>>> May 11 12:44:22 imap1 sync_client[48590]: Processing sync log file
> >>>>> /var/spool/CyrusDB-Sieve/cyrusdb/sync/log-run failed: Bad protocol
> >>>>>
> >>>>> When I recontructed this cdm-lit account, it ran successfully on my
> >>>>> master host, but fails on my copy host:
> >>>>>
> >>>>> [cyrus at imap2 ~]$ reconstruct -f -r user.cdm-lit
> >>>>> user.cdm-lit
> >>>>> user.cdm-lit.Sea Trail 2013
> >>>>> user.cdm-lit.Sent uid 1 found - adding
> >>>>> user.cdm-lit.Sent uid 2 found - adding
> >>>>> user.cdm-lit.Sent uid 3 found - adding
> >>>>> user.cdm-lit.Sent uid 4 found - adding
> >>>>> ....
> >>>>> user.cdm-lit.Sent uid 146 found - adding
> >>>>> user.cdm-lit.Sent uid 147 found - adding
> >>>>> user.cdm-lit.Sent uid 148 found - adding
> >>>>> fatal error: Internal error: assertion failed: imap/mailbox.c: 2847:
> >>>>> record->size
> >>>>>
> >>>>> Since I couldn't get a successful reconstruct on this account, I
> >>>>> deleted it and recreated it from my master host.  However, I'm still
> >>>>> unable to get a successful reconstruct with "failed to read index
> >>>>> header" for every subfolder and "fatal error: Internal error:
> >>>>> assertion failed: imap/mailbox.c: 2847: record->size"
> >>>>>
> >>>>> Any ideas on how to correct this so I can see if I can then get past
> >>>>> the replication error?
> >>>>>
> >>>>> Thanks!
> >>>>>
> >>>>> -Eric
> >>>>>
> >>>>>
> >>>>> On 05/11/2017 11:46 AM, Eric Cunningham wrote:
> >>>>>> Thanks Bron, but that doesn't seem to work for me, unless I'm
> >>>>>> missing something.
> >>>>>>
> >>>>>> I ran reconstructs for this account on both my master and copy hosts:
> >>>>>>
> >>>>>> [cyrus at imap1 ~]$ reconstruct -f -r user.scramer
> >>>>>> user.scramer
> >>>>>> user.scramer.Archive
> >>>>>> user.scramer.Archives
> >>>>>> user.scramer.Archives.2004
> >>>>>> ...
> >>>>>> user.scramer.Trash.IS Networking and Operations
> >>>>>> user.scramer.Trash.IS Security
> >>>>>> user.scramer.Trash.IS Servers
> >>>>>> user.scramer.Trash.IS Staff
> >>>>>> user.scramer.Trash.IS Surveys
> >>>>>> ...
> >>>>>>
> >>>>>>
> >>>>>> [cyrus at imap2 ~]$ reconstruct -f -r user.scramer
> >>>>>> user.scramer
> >>>>>> user.scramer.Archive
> >>>>>> user.scramer.Archives
> >>>>>> user.scramer.Archives.2004
> >>>>>> ...
> >>>>>> user.scramer.Trash.IS Networking and Operations
> >>>>>> user.scramer.Trash.IS Security
> >>>>>> user.scramer.Trash.IS Servers
> >>>>>> user.scramer.Trash.IS Staff
> >>>>>> user.scramer.Trash.IS Surveys
> >>>>>> ...
> >>>>>>
> >>>>>>
> >>>>>> I then restart the replication against the log-run file and it again
> >>>>>> fails on a folder that no longer exists in that account:
> >>>>>>
> >>>>>> ...
> >>>>>> May 11 11:40:51 imap1 sync_client[60696]: sync_mailboxes: doing 1000
> >>>>>> May 11 11:40:51 imap1 sync_client[60696]: IOERROR: zero length
> >>>>>> response to MAILBOXES (end of file reached)
> >>>>>> May 11 11:40:51 imap1 sync_client[60696]: IOERROR: zero length
> >>>>>> response to MAILBOXES (end of file reached)
> >>>>>> May 11 11:40:51 imap1 sync_client[60696]: IOERROR: zero length
> >>>>>> response to UNMAILBOX (end of file reached)
> >>>>>> May 11 11:40:51 imap1 sync_client[60696]: folder_delete(): failed:
> >>>>>> user.scramer.Trash.IS Software Management 'Bad protocol'
> >>>>>> May 11 11:40:51 imap1 sync_client[60696]: Error in do_sync():
> >>>>>> bailing out! Bad protocol
> >>>>>> May 11 11:40:51 imap1 sync_client[60696]: Processing sync log file
> >>>>>> /var/spool/CyrusDB-Sieve/cyrusdb/sync/log-run failed: Bad protocol
> >>>>>>
> >>>>>>
> >>>>>> -Eric
> >>>>>>
> >>>>>>
> >>>>>> On 05/11/2017 02:14 AM, Bron Gondwana wrote:
> >>>>>>> Looks like you have a corrupted mailboxes database - if you run a
> >>>>>>> reconstruct at each end then it should be fine.  Replication won't
> >>>>>>> bail for an actually-doesn't-exist mailbox, but it can't fix broken
> >>>>>>> mailboxes, that's what reconstruct is for.
> >>>>>>>
> >>>>>>> Bron.
> >>>>>>>
> >>>>>>>
> >>>>>>> On Thu, 11 May 2017, at 01:01, Eric Cunningham wrote:
> >>>>>>>> Hi list, I'm running cyrus replication on cyrus 2.5.10.  It seems
> >>>>>>>> whenever a folder is encountered in log-run that doesn't exist on the
> >>>>>>>> client and/or the server, the replication process logs "Error in
> >>>>>>>> do_sync(): bailing out! Bad protocol".  Sometimes, it dies
> >>>>>>>> completely,
> >>>>>>>> other times it restarts at the beginning of the log-run file.  But
> >>>>>>>> when
> >>>>>>>> it re-encounters the folder in question, the same failure occurs
> >>>>>>>> and the
> >>>>>>>> problem continuously loops until some manual action is taken.  I
> >>>>>>>> have a
> >>>>>>>> checker script that attempts to restart a stopped sync but I haven't
> >>>>>>>> tried to code it to get past these types of folder errors.  In the
> >>>>>>>> meantime, replication effectively stops and the replication log
> >>>>>>>> continuously builds in size.  You might imagine my horror at
> >>>>>>>> discovering
> >>>>>>>> yesterday that replication of my production server with 1200 accounts
> >>>>>>>> had been failing for over 7 weeks unnoticed, generating 9.5 million
> >>>>>>>> lines in the replication log files.  I had gotten out of the habit of
> >>>>>>>> daily monitoring of replication as it had been running fine and I
> >>>>>>>> eventually moved on to other things.  I'm trying to get
> >>>>>>>> replication to
> >>>>>>>> run through the enormous backlog but am running into more similar
> >>>>>>>> folder
> >>>>>>>> issues that cause the process to fail, I make manual fixes, and start
> >>>>>>>> the process again.  In my experience with this, reconstruction of the
> >>>>>>>> affected folders on the master host and copy host has no effect.
> >>>>>>>>
> >>>>>>>> Can this otherwise excellent feature be made more robust to work past
> >>>>>>>> these folder issues?
> >>>>>>>>
> >>>>>>>> Here are a couple of examples I've observed thus far of how the
> >>>>>>>> process
> >>>>>>>> is failing for me.
> >>>>>>>>
> >>>>>>>> --------------
> >>>>>>>>
> >>>>>>>> In this example, the subfolder "Fly America Foreign Travel" no longer
> >>>>>>>> existed on the master host but did exist on the copy host.  To
> >>>>>>>> resolve
> >>>>>>>> this, I had to manually delete that folder from the copy host.
> >>>>>>>>
> >>>>>>>> May  9 15:17:48 imap1 sync_client[34614]: inefficient replication
> >>>>>>>> (8 >
> >>>>>>>> 5) user.jtaft.PO.Fly America Foreign Travel
> >>>>>>>> May  9 15:17:48 imap1 sync_client[34614]: FULLMAILBOX received NO
> >>>>>>>> response: IMAP_MAILBOX_NONEXISTENT No Such Mailbox
> >>>>>>>> May  9 15:17:48 imap1 sync_client[34614]: do_folders(): update
> >>>>>>>> failed:
> >>>>>>>> user.jtaft.PO.Fly America Foreign Travel 'Mailbox does not exist'
> >>>>>>>> May  9 15:17:48 imap1 sync_client[34614]: IOERROR: Mailbox does
> >>>>>>>> not exist
> >>>>>>>> May  9 15:17:48 imap1 sync_client[34614]: Error in do_sync(): bailing
> >>>>>>>> out! Mailbox does not exist
> >>>>>>>> May  9 15:17:48 imap1 sync_client[34614]: Processing sync log file
> >>>>>>>> /var/spool/CyrusDB-Sieve/cyrusdb/sync/log-run failed: Mailbox does
> >>>>>>>> not exist
> >>>>>>>> May  9 15:17:48 imap1 sync_client[34614]: Reprocessing sync log file
> >>>>>>>> /var/spool/CyrusDB-Sieve/cyrusdb/sync/log-run
> >>>>>>>>
> >>>>>>>> --------------
> >>>>>>>>
> >>>>>>>> In this example, the subfolder "Sohn" did not exist on the master
> >>>>>>>> host
> >>>>>>>> nor on the copy host.  To resolve, I had to edit log-run and
> >>>>>>>> completely
> >>>>>>>> remove all entries for this "Sohn" subfolder.
> >>>>>>>>
> >>>>>>>> May 10 10:13:26 imap1 sync_client[47011]: IOERROR: zero length
> >>>>>>>> response
> >>>>>>>> to MAILBOXES (end of file reached)
> >>>>>>>> May 10 10:13:26 imap1 sync_client[47011]: IOERROR: zero length
> >>>>>>>> response
> >>>>>>>> to MAILBOXES (end of file reached)
> >>>>>>>> May 10 10:13:26 imap1 sync_client[47011]: IOERROR: zero length
> >>>>>>>> response
> >>>>>>>> to UNMAILBOX (end of file reached)
> >>>>>>>> May 10 10:13:26 imap1 sync_client[47011]: folder_delete(): failed:
> >>>>>>>> user.gg-chair.Trash.Sohn 'Bad protocol'
> >>>>>>>> May 10 10:13:26 imap1 sync_client[47011]: Error in do_sync(): bailing
> >>>>>>>> out! Bad protocol
> >>>>>>>> May 10 10:13:26 imap1 sync_client[47011]: Processing sync log file
> >>>>>>>> /var/spool/CyrusDB-Sieve/cyrusdb/sync/log-run failed: Bad protocol
> >>>>>>>>
> >>>>>>>> --------------
> >>>>>>>>
> >>>>>>>> Often, errors are logged without "bailing out!" and replication
> >>>>>>>> continues to run:
> >>>>>>>>
> >>>>>>>> May 10 10:45:00 imap1 sync_client[87438]: sync_mailboxes: doing 1000
> >>>>>>>> May 10 10:45:01 imap1 sync_client[87438]: sync_mailboxes: doing 1000
> >>>>>>>> May 10 10:45:24 imap1 sync_client[87438]: IOERROR: GUID mismatch
> >>>>>>>> /var/spool/cyrus/mail/c/user/cdm-lit/Sent/148.
> >>>>>>>> May 10 10:45:37 imap1 sync_client[87438]: IOERROR: zero length
> >>>>>>>> response
> >>>>>>>> to MAILBOX (end of file reached)
> >>>>>>>> May 10 10:45:37 imap1 sync_client[87438]: do_folders(): update
> >>>>>>>> failed:
> >>>>>>>> user.cdm-lit.Sent 'Bad protocol'
> >>>>>>>> May 10 10:45:37 imap1 sync_client[87438]: sync_mailboxes: doing 1000
> >>>>>>>>
> >>>>>>>> --------------
> >>>>>>>>
> >>>>>>>> Thank you, list!
> >>>>>>>>
> >>>>>>>> -Eric
> >>>>>>>>
> >>>>>>>> ----
> >>>>>>>> Cyrus Home Page: http://www.cyrusimap.org/
> >>>>>>>> List Archives/Info: http://lists.andrew.cmu.edu/pipermail/info-cyrus/
> >>>>>>>> To Unsubscribe:
> >>>>>>>> https://lists.andrew.cmu.edu/mailman/listinfo/info-cyrus
> >>>>>>>
> >>>>>>>
> >>>>>>
> >>>>>
> >>>>> ----
> >>>>> Cyrus Home Page: http://www.cyrusimap.org/
> >>>>> List Archives/Info: http://lists.andrew.cmu.edu/pipermail/info-cyrus/
> >>>>> To Unsubscribe:
> >>>>> https://lists.andrew.cmu.edu/mailman/listinfo/info-cyrus
> >>>>
> >>>>
> >>>>
> >>>> ----
> >>>> Cyrus Home Page: http://www.cyrusimap.org/
> >>>> List Archives/Info: http://lists.andrew.cmu.edu/pipermail/info-cyrus/
> >>>> To Unsubscribe:
> >>>> https://lists.andrew.cmu.edu/mailman/listinfo/info-cyrus
> >>>>
> >>>
> >>
> >> ----
> >> Cyrus Home Page: http://www.cyrusimap.org/
> >> List Archives/Info: http://lists.andrew.cmu.edu/pipermail/info-cyrus/
> >> To Unsubscribe:
> >> https://lists.andrew.cmu.edu/mailman/listinfo/info-cyrus
> >
> >
> 


-- 
  Bron Gondwana
  brong at fastmail.fm


More information about the Info-cyrus mailing list