sync_client problems
Eric Cunningham
eric at whoi.edu
Fri May 12 09:46:17 EDT 2017
Yes, file 148 is completely corrupted.
On 5/12/17 9:16 AM, Bron Gondwana wrote:
> Just read the 148 file and see if there is corruption. I suspect that's the cause.
>
> Bron
>
> On Fri, 12 May 2017, at 23:09, Eric Cunningham wrote:
>> I have snapshot backups of that account, that particular message and the
>> various cyrus db files. What can I provide for review?
>>
>>
>> On 5/11/17 7:38 PM, Bron Gondwana wrote:
>>> It looked like the '148.' file was corrupted in some way. I assume you've deleted all the evidence now, so we can't see how! Oh well :(
>>>
>>> Bron.
>>>
>>> On Fri, 12 May 2017, at 04:55, Eric Cunningham wrote:
>>>> Just to close the loop on this, once the corrupted cdm-lit account on my
>>>> copy server successfully reconstructed, the backlog of replication has
>>>> completed successfully and is now caught up.
>>>>
>>>> -Eric
>>>>
>>>> On 05/11/2017 02:20 PM, Eric Cunningham wrote:
>>>>> -G didn't seem to help, but I tried a "reconstruct -f -r" command
>>>>> without having previously deleted all occurrences of the cyrus.* files
>>>>> and it was then successful.
>>>>>
>>>>>
>>>>>
>>>>> On 05/11/2017 01:44 PM, Patrick Boutilier wrote:
>>>>>> Try a -G with reconstruct?
>>>>>>
>>>>>> -G Force re-parsing of the underlying message (checks GUID
>>>>>> correctness). Reconstruct with -G should fix all possible individual
>>>>>> message issues, including corrupted data files.
>>>>>>
>>>>>>
>>>>>> On 05/11/2017 02:37 PM, Eric Cunningham wrote:
>>>>>>> I have to walk this back. In looking slightly further back in my
>>>>>>> logfiles, before every instance of a failure to sync some folder, I
>>>>>>> see a common error reported prior to every "bailing out! Bad
>>>>>>> protocol" error - "IOERROR: GUID mismatch
>>>>>>> /var/spool/cyrus/mail/c/user/cdm-lit/Sent/148."
>>>>>>>
>>>>>>> May 11 12:44:09 imap1 sync_client[48590]: IOERROR: GUID mismatch
>>>>>>> /var/spool/cyrus/mail/c/user/cdm-lit/Sent/148.
>>>>>>> May 11 12:44:22 imap1 sync_client[48590]: IOERROR: zero length
>>>>>>> response to MAILBOX (end of file reached)
>>>>>>> May 11 12:44:22 imap1 sync_client[48590]: do_folders(): update
>>>>>>> failed: user.cdm-lit.Sent 'Bad protocol'
>>>>>>> May 11 12:44:22 imap1 sync_client[48590]: sync_mailboxes: doing 1000
>>>>>>> May 11 12:44:22 imap1 sync_client[48590]: IOERROR: zero length
>>>>>>> response to MAILBOXES (end of file reached)
>>>>>>> May 11 12:44:22 imap1 sync_client[48590]: sync_mailboxes: doing 1000
>>>>>>> May 11 12:44:22 imap1 sync_client[48590]: IOERROR: zero length
>>>>>>> response to MAILBOXES (end of file reached)
>>>>>>> May 11 12:44:22 imap1 sync_client[48590]: sync_mailboxes: doing 1000
>>>>>>> May 11 12:44:22 imap1 sync_client[48590]: IOERROR: zero length
>>>>>>> response to MAILBOXES (end of file reached)
>>>>>>> May 11 12:44:22 imap1 sync_client[48590]: sync_mailboxes: doing 1000
>>>>>>> May 11 12:44:22 imap1 sync_client[48590]: IOERROR: zero length
>>>>>>> response to MAILBOXES (end of file reached)
>>>>>>> May 11 12:44:22 imap1 sync_client[48590]: sync_mailboxes: doing 1000
>>>>>>> May 11 12:44:22 imap1 sync_client[48590]: IOERROR: zero length
>>>>>>> response to MAILBOXES (end of file reached)
>>>>>>> May 11 12:44:22 imap1 sync_client[48590]: sync_mailboxes: doing 1000
>>>>>>> May 11 12:44:22 imap1 sync_client[48590]: IOERROR: zero length
>>>>>>> response to MAILBOXES (end of file reached)
>>>>>>> May 11 12:44:22 imap1 sync_client[48590]: IOERROR: zero length
>>>>>>> response to MAILBOXES (end of file reached)
>>>>>>> May 11 12:44:22 imap1 sync_client[48590]: IOERROR: zero length
>>>>>>> response to UNMAILBOX (end of file reached)
>>>>>>> May 11 12:44:22 imap1 sync_client[48590]: folder_delete(): failed:
>>>>>>> user.guest-student-coordinator.Trash.Aarflot 'Bad protocol'
>>>>>>> May 11 12:44:22 imap1 sync_client[48590]: Error in do_sync(): bailing
>>>>>>> out! Bad protocol
>>>>>>> May 11 12:44:22 imap1 sync_client[48590]: Processing sync log file
>>>>>>> /var/spool/CyrusDB-Sieve/cyrusdb/sync/log-run failed: Bad protocol
>>>>>>>
>>>>>>> When I recontructed this cdm-lit account, it ran successfully on my
>>>>>>> master host, but fails on my copy host:
>>>>>>>
>>>>>>> [cyrus at imap2 ~]$ reconstruct -f -r user.cdm-lit
>>>>>>> user.cdm-lit
>>>>>>> user.cdm-lit.Sea Trail 2013
>>>>>>> user.cdm-lit.Sent uid 1 found - adding
>>>>>>> user.cdm-lit.Sent uid 2 found - adding
>>>>>>> user.cdm-lit.Sent uid 3 found - adding
>>>>>>> user.cdm-lit.Sent uid 4 found - adding
>>>>>>> ....
>>>>>>> user.cdm-lit.Sent uid 146 found - adding
>>>>>>> user.cdm-lit.Sent uid 147 found - adding
>>>>>>> user.cdm-lit.Sent uid 148 found - adding
>>>>>>> fatal error: Internal error: assertion failed: imap/mailbox.c: 2847:
>>>>>>> record->size
>>>>>>>
>>>>>>> Since I couldn't get a successful reconstruct on this account, I
>>>>>>> deleted it and recreated it from my master host. However, I'm still
>>>>>>> unable to get a successful reconstruct with "failed to read index
>>>>>>> header" for every subfolder and "fatal error: Internal error:
>>>>>>> assertion failed: imap/mailbox.c: 2847: record->size"
>>>>>>>
>>>>>>> Any ideas on how to correct this so I can see if I can then get past
>>>>>>> the replication error?
>>>>>>>
>>>>>>> Thanks!
>>>>>>>
>>>>>>> -Eric
>>>>>>>
>>>>>>>
>>>>>>> On 05/11/2017 11:46 AM, Eric Cunningham wrote:
>>>>>>>> Thanks Bron, but that doesn't seem to work for me, unless I'm
>>>>>>>> missing something.
>>>>>>>>
>>>>>>>> I ran reconstructs for this account on both my master and copy hosts:
>>>>>>>>
>>>>>>>> [cyrus at imap1 ~]$ reconstruct -f -r user.scramer
>>>>>>>> user.scramer
>>>>>>>> user.scramer.Archive
>>>>>>>> user.scramer.Archives
>>>>>>>> user.scramer.Archives.2004
>>>>>>>> ...
>>>>>>>> user.scramer.Trash.IS Networking and Operations
>>>>>>>> user.scramer.Trash.IS Security
>>>>>>>> user.scramer.Trash.IS Servers
>>>>>>>> user.scramer.Trash.IS Staff
>>>>>>>> user.scramer.Trash.IS Surveys
>>>>>>>> ...
>>>>>>>>
>>>>>>>>
>>>>>>>> [cyrus at imap2 ~]$ reconstruct -f -r user.scramer
>>>>>>>> user.scramer
>>>>>>>> user.scramer.Archive
>>>>>>>> user.scramer.Archives
>>>>>>>> user.scramer.Archives.2004
>>>>>>>> ...
>>>>>>>> user.scramer.Trash.IS Networking and Operations
>>>>>>>> user.scramer.Trash.IS Security
>>>>>>>> user.scramer.Trash.IS Servers
>>>>>>>> user.scramer.Trash.IS Staff
>>>>>>>> user.scramer.Trash.IS Surveys
>>>>>>>> ...
>>>>>>>>
>>>>>>>>
>>>>>>>> I then restart the replication against the log-run file and it again
>>>>>>>> fails on a folder that no longer exists in that account:
>>>>>>>>
>>>>>>>> ...
>>>>>>>> May 11 11:40:51 imap1 sync_client[60696]: sync_mailboxes: doing 1000
>>>>>>>> May 11 11:40:51 imap1 sync_client[60696]: IOERROR: zero length
>>>>>>>> response to MAILBOXES (end of file reached)
>>>>>>>> May 11 11:40:51 imap1 sync_client[60696]: IOERROR: zero length
>>>>>>>> response to MAILBOXES (end of file reached)
>>>>>>>> May 11 11:40:51 imap1 sync_client[60696]: IOERROR: zero length
>>>>>>>> response to UNMAILBOX (end of file reached)
>>>>>>>> May 11 11:40:51 imap1 sync_client[60696]: folder_delete(): failed:
>>>>>>>> user.scramer.Trash.IS Software Management 'Bad protocol'
>>>>>>>> May 11 11:40:51 imap1 sync_client[60696]: Error in do_sync():
>>>>>>>> bailing out! Bad protocol
>>>>>>>> May 11 11:40:51 imap1 sync_client[60696]: Processing sync log file
>>>>>>>> /var/spool/CyrusDB-Sieve/cyrusdb/sync/log-run failed: Bad protocol
>>>>>>>>
>>>>>>>>
>>>>>>>> -Eric
>>>>>>>>
>>>>>>>>
>>>>>>>> On 05/11/2017 02:14 AM, Bron Gondwana wrote:
>>>>>>>>> Looks like you have a corrupted mailboxes database - if you run a
>>>>>>>>> reconstruct at each end then it should be fine. Replication won't
>>>>>>>>> bail for an actually-doesn't-exist mailbox, but it can't fix broken
>>>>>>>>> mailboxes, that's what reconstruct is for.
>>>>>>>>>
>>>>>>>>> Bron.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Thu, 11 May 2017, at 01:01, Eric Cunningham wrote:
>>>>>>>>>> Hi list, I'm running cyrus replication on cyrus 2.5.10. It seems
>>>>>>>>>> whenever a folder is encountered in log-run that doesn't exist on the
>>>>>>>>>> client and/or the server, the replication process logs "Error in
>>>>>>>>>> do_sync(): bailing out! Bad protocol". Sometimes, it dies
>>>>>>>>>> completely,
>>>>>>>>>> other times it restarts at the beginning of the log-run file. But
>>>>>>>>>> when
>>>>>>>>>> it re-encounters the folder in question, the same failure occurs
>>>>>>>>>> and the
>>>>>>>>>> problem continuously loops until some manual action is taken. I
>>>>>>>>>> have a
>>>>>>>>>> checker script that attempts to restart a stopped sync but I haven't
>>>>>>>>>> tried to code it to get past these types of folder errors. In the
>>>>>>>>>> meantime, replication effectively stops and the replication log
>>>>>>>>>> continuously builds in size. You might imagine my horror at
>>>>>>>>>> discovering
>>>>>>>>>> yesterday that replication of my production server with 1200 accounts
>>>>>>>>>> had been failing for over 7 weeks unnoticed, generating 9.5 million
>>>>>>>>>> lines in the replication log files. I had gotten out of the habit of
>>>>>>>>>> daily monitoring of replication as it had been running fine and I
>>>>>>>>>> eventually moved on to other things. I'm trying to get
>>>>>>>>>> replication to
>>>>>>>>>> run through the enormous backlog but am running into more similar
>>>>>>>>>> folder
>>>>>>>>>> issues that cause the process to fail, I make manual fixes, and start
>>>>>>>>>> the process again. In my experience with this, reconstruction of the
>>>>>>>>>> affected folders on the master host and copy host has no effect.
>>>>>>>>>>
>>>>>>>>>> Can this otherwise excellent feature be made more robust to work past
>>>>>>>>>> these folder issues?
>>>>>>>>>>
>>>>>>>>>> Here are a couple of examples I've observed thus far of how the
>>>>>>>>>> process
>>>>>>>>>> is failing for me.
>>>>>>>>>>
>>>>>>>>>> --------------
>>>>>>>>>>
>>>>>>>>>> In this example, the subfolder "Fly America Foreign Travel" no longer
>>>>>>>>>> existed on the master host but did exist on the copy host. To
>>>>>>>>>> resolve
>>>>>>>>>> this, I had to manually delete that folder from the copy host.
>>>>>>>>>>
>>>>>>>>>> May 9 15:17:48 imap1 sync_client[34614]: inefficient replication
>>>>>>>>>> (8 >
>>>>>>>>>> 5) user.jtaft.PO.Fly America Foreign Travel
>>>>>>>>>> May 9 15:17:48 imap1 sync_client[34614]: FULLMAILBOX received NO
>>>>>>>>>> response: IMAP_MAILBOX_NONEXISTENT No Such Mailbox
>>>>>>>>>> May 9 15:17:48 imap1 sync_client[34614]: do_folders(): update
>>>>>>>>>> failed:
>>>>>>>>>> user.jtaft.PO.Fly America Foreign Travel 'Mailbox does not exist'
>>>>>>>>>> May 9 15:17:48 imap1 sync_client[34614]: IOERROR: Mailbox does
>>>>>>>>>> not exist
>>>>>>>>>> May 9 15:17:48 imap1 sync_client[34614]: Error in do_sync(): bailing
>>>>>>>>>> out! Mailbox does not exist
>>>>>>>>>> May 9 15:17:48 imap1 sync_client[34614]: Processing sync log file
>>>>>>>>>> /var/spool/CyrusDB-Sieve/cyrusdb/sync/log-run failed: Mailbox does
>>>>>>>>>> not exist
>>>>>>>>>> May 9 15:17:48 imap1 sync_client[34614]: Reprocessing sync log file
>>>>>>>>>> /var/spool/CyrusDB-Sieve/cyrusdb/sync/log-run
>>>>>>>>>>
>>>>>>>>>> --------------
>>>>>>>>>>
>>>>>>>>>> In this example, the subfolder "Sohn" did not exist on the master
>>>>>>>>>> host
>>>>>>>>>> nor on the copy host. To resolve, I had to edit log-run and
>>>>>>>>>> completely
>>>>>>>>>> remove all entries for this "Sohn" subfolder.
>>>>>>>>>>
>>>>>>>>>> May 10 10:13:26 imap1 sync_client[47011]: IOERROR: zero length
>>>>>>>>>> response
>>>>>>>>>> to MAILBOXES (end of file reached)
>>>>>>>>>> May 10 10:13:26 imap1 sync_client[47011]: IOERROR: zero length
>>>>>>>>>> response
>>>>>>>>>> to MAILBOXES (end of file reached)
>>>>>>>>>> May 10 10:13:26 imap1 sync_client[47011]: IOERROR: zero length
>>>>>>>>>> response
>>>>>>>>>> to UNMAILBOX (end of file reached)
>>>>>>>>>> May 10 10:13:26 imap1 sync_client[47011]: folder_delete(): failed:
>>>>>>>>>> user.gg-chair.Trash.Sohn 'Bad protocol'
>>>>>>>>>> May 10 10:13:26 imap1 sync_client[47011]: Error in do_sync(): bailing
>>>>>>>>>> out! Bad protocol
>>>>>>>>>> May 10 10:13:26 imap1 sync_client[47011]: Processing sync log file
>>>>>>>>>> /var/spool/CyrusDB-Sieve/cyrusdb/sync/log-run failed: Bad protocol
>>>>>>>>>>
>>>>>>>>>> --------------
>>>>>>>>>>
>>>>>>>>>> Often, errors are logged without "bailing out!" and replication
>>>>>>>>>> continues to run:
>>>>>>>>>>
>>>>>>>>>> May 10 10:45:00 imap1 sync_client[87438]: sync_mailboxes: doing 1000
>>>>>>>>>> May 10 10:45:01 imap1 sync_client[87438]: sync_mailboxes: doing 1000
>>>>>>>>>> May 10 10:45:24 imap1 sync_client[87438]: IOERROR: GUID mismatch
>>>>>>>>>> /var/spool/cyrus/mail/c/user/cdm-lit/Sent/148.
>>>>>>>>>> May 10 10:45:37 imap1 sync_client[87438]: IOERROR: zero length
>>>>>>>>>> response
>>>>>>>>>> to MAILBOX (end of file reached)
>>>>>>>>>> May 10 10:45:37 imap1 sync_client[87438]: do_folders(): update
>>>>>>>>>> failed:
>>>>>>>>>> user.cdm-lit.Sent 'Bad protocol'
>>>>>>>>>> May 10 10:45:37 imap1 sync_client[87438]: sync_mailboxes: doing 1000
>>>>>>>>>>
>>>>>>>>>> --------------
>>>>>>>>>>
>>>>>>>>>> Thank you, list!
>>>>>>>>>>
>>>>>>>>>> -Eric
>>>>>>>>>>
>>>>>>>>>> ----
>>>>>>>>>> Cyrus Home Page: http://www.cyrusimap.org/
>>>>>>>>>> List Archives/Info: http://lists.andrew.cmu.edu/pipermail/info-cyrus/
>>>>>>>>>> To Unsubscribe:
>>>>>>>>>> https://lists.andrew.cmu.edu/mailman/listinfo/info-cyrus
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>> ----
>>>>>>> Cyrus Home Page: http://www.cyrusimap.org/
>>>>>>> List Archives/Info: http://lists.andrew.cmu.edu/pipermail/info-cyrus/
>>>>>>> To Unsubscribe:
>>>>>>> https://lists.andrew.cmu.edu/mailman/listinfo/info-cyrus
>>>>>>
>>>>>>
>>>>>>
>>>>>> ----
>>>>>> Cyrus Home Page: http://www.cyrusimap.org/
>>>>>> List Archives/Info: http://lists.andrew.cmu.edu/pipermail/info-cyrus/
>>>>>> To Unsubscribe:
>>>>>> https://lists.andrew.cmu.edu/mailman/listinfo/info-cyrus
>>>>>>
>>>>>
>>>>
>>>> ----
>>>> Cyrus Home Page: http://www.cyrusimap.org/
>>>> List Archives/Info: http://lists.andrew.cmu.edu/pipermail/info-cyrus/
>>>> To Unsubscribe:
>>>> https://lists.andrew.cmu.edu/mailman/listinfo/info-cyrus
>>>
>>>
>>
>
>
More information about the Info-cyrus
mailing list