Replication failed 3.0.5 -> 3.0.13, now 3.0.13->3.0.13
Olaf Frączyk
olaf at navi.pl
Tue Apr 21 08:30:53 EDT 2020
Thank you for the telemetry hint :)
I don't use the syncserver - the replication is done via IMAP port on
the replica side. I have no idea how to have strace spawned by cyrus
master process. When I attach later to imapd using strace -p I'm afraid
some info already will be lost.
The syncserver is marked as deprecated in the docs, so I went with the
more modern option. Maybe here is the problem ;)
The funny thing is that from time to time the replication progresses a
little. I don't like non-repetitive behavior ;)
Thanks again for the hints.
On 2020-04-21 14:13, Michael Menge wrote:
> Hi,
>
> Quoting Olaf Frączyk <olaf at navi.pl>:
>
>> The current situation is:
>>
>> 1. Replica:
>>
>> stopped and started the replica
>>
>> no activity on replica - iotop and top show nothing
>>
>> the only messages on replica is incoming connection from master
>>
>> 2. Master:
>>
>> when I run sync_client -r I still get:
>>
>> Apr 21 12:38:36 ifs sync_client[29518]: Reprocessing sync log file
>> /var/lib/imap/sync/log-run
>> Apr 21 12:43:27 ifs sync_client[29518]: IOERROR: zero length response
>> to MAILBOXES (idle for too long)
>> Apr 21 12:43:27 ifs sync_client[29518]: IOERROR: zero length response
>> to RESTART (idle for too long)
>> Apr 21 12:43:27 ifs sync_client[29518]: Error in do_sync(): bailing
>> out! Bad protocol
>> Apr 21 12:43:27 ifs sync_client[29518]: Processing sync log file
>> /var/lib/imap/sync/log-run failed: Bad protocol
>>
>> 3. There is 27 GB of about 45 GB replicated and there is no further
>> progress
>>
>> 4. How to find out why the replica doesn't respond?
>>
>
> You can enable telemetry logging on the replica by creating a folder
> /var/lib/imap/log/<SYNC_AUTHNAME>
> where <SYNC_AUTHNAME> is the value of the sync_authname. If you give
> cyrus write permissions for this folder
> it will create log-files for each process and what it received and
> send from/to the sync_client with timestamps.
>
> Also you can try to use strace on the syncserver process to figure out
> which files it is accessing
>
>
>> On 2020-04-21 11:18, Olaf Frączyk wrote:
>>> I also found out that when I see on master:
>>>
>>> Apr 21 11:12:38 ifs sync_client[27996]: IOERROR: zero length
>>> response to MAILBOXES (idle for too long)
>>> Apr 21 11:12:38 ifs sync_client[27996]: IOERROR: zero length
>>> response to RESTART (idle for too long)
>>> Apr 21 11:12:38 ifs sync_client[27996]: Error in do_sync(): bailing
>>> out! Bad protocol
>>> Apr 21 11:12:38 ifs sync_client[27996]: Processing sync log file
>>> /var/lib/imap/sync/log-run failed: Bad protocol
>>>
>>> I also get on the replica:
>>>
>>> Apr 21 11:12:38 skink1 imap[5775]: Connection reset by peer, closing
>>> connection
>>>
>>> But I also see, that before it happens, there is no activity both on
>>> the replica and on the master for some time.
>>>
>>> So maybe the imap server process is not recovering correctly in the
>>> longlock situation?
>>>
>>> On 2020-04-21 11:07, Olaf Frączyk wrote:
>>>> Hi,
>>>>
>>>> When I run sync_client -r on the master I see the following on the
>>>> replica:
>>>>
>>>> Apr 21 10:56:15 skink1 imap[5775]: mailbox: longlock
>>>> navi.pl!user.olaf for 1.7 seconds
>>>> Apr 21 10:56:20 skink1 imap[5775]: mailbox: longlock
>>>> navi.pl!user.piotr for 2.0 seconds
>>>> Apr 21 10:56:23 skink1 imap[5775]: mailbox: longlock
>>>> navi.pl!user.olaf for 2.9 seconds
>>>> Apr 21 10:56:26 skink1 imap[5775]: mailbox: longlock
>>>> navi.pl!user.piotr for 3.0 seconds
>>>>
>>>> The mailboxes have several GB in Inbox folder and several GB in
>>>> subfolders. The inbox folders have about 20,000 messages, the
>>>> subfolders upto 15,000
>>>>
>>>> Could it cause problems?
>
> the longlock is normaly not the problem. While on process has the lock
> no other process can write to the mailbox, but on the replica there
> normaly
> is no other process that should access the mailbox
>
>>>>
>>>> Maybe I should move the sync_client from START section to SERVICES,
>>>> it seems that it is not automatically restarted
>>>>
> I havend tried starting sync_client in the service section.
> I start the sync_client via systemd.
>
>>>> On 2020-04-21 08:47, Michael Menge wrote:
>>>>> Hi Olaf
>>>>>
>>>>>
>>>>> Quoting Olaf Frączyk <olaf at navi.pl>:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I upgraded to 3.0.13 but it didn't help.
>>>>>>
>>>>>> This time it copied about 18GB
>>>>>>
>>>>>> in the logs I still see:
>>>>>>
>>>>>> 1 - inefficient replication
>>>>>>
>>>>>> 2 - IOERROR: zero length response to MAILBOXES (idle for too long)
>>>>>> IOERROR: zero length response to RESTART (idle for too long)
>>>>>> Error in do_sync(): bailing out! Bad protocol
>>>>>>
>>>>>> But I have no idea what can I do next and why it fails
>>>>>>
>>>>>> Apr 21 02:24:46 ifs sync_client[12656]: IOERROR: zero length
>>>>>> response to MAILBOXES (idle for too long)
>>>>>> Apr 21 02:24:46 ifs sync_client[12656]: IOERROR: zero length
>>>>>> response to RESTART (idle for too long)
>>>>>> Apr 21 02:24:46 ifs sync_client[12656]: Error in do_sync():
>>>>>> bailing out! Bad protocol
>>>>>
>>>>> do you see any errors on the syncserver side. The error look like the
>>>>> sync_client is waiting for a reply of the server.
>
>
> --------------------------------------------------------------------------------
>
> M.Menge Tel.: (49) 7071/29-70316
> Universität Tübingen Fax.: (49) 7071/29-5912
> Zentrum für Datenverarbeitung mail:
> michael.menge at zdv.uni-tuebingen.de
> Wächterstraße 76
> 72074 Tübingen
>
> ----
> Cyrus Home Page: http://www.cyrusimap.org/
> List Archives/Info: http://lists.andrew.cmu.edu/pipermail/info-cyrus/
> To Unsubscribe:
> https://lists.andrew.cmu.edu/mailman/listinfo/info-cyrus
More information about the Info-cyrus
mailing list