Replication failed 3.0.5 -> 3.0.13, now 3.0.13->3.0.13
Michael Menge
michael.menge at zdv.uni-tuebingen.de
Tue Apr 21 09:00:56 EDT 2020
Quoting Olaf Frączyk <olaf at navi.pl>:
> Thank you for the telemetry hint :)
>
> I don't use the syncserver - the replication is done via IMAP port
> on the replica side. I have no idea how to have strace spawned by
> cyrus master process. When I attach later to imapd using strace -p
> I'm afraid some info already will be lost.
>
You can try "/usr/bin/strace <strace_option> <path_to_imapd>/imapd
<imapd_options>" as cmd
or you can use prefork=1 in the service to prefok one imap process and connect
to this one before you start the sync_client.
> The syncserver is marked as deprecated in the docs, so I went with
> the more modern option. Maybe here is the problem ;)
>
i am using syncserver since cyrus-imapd 2.3. So I had no reason to change it.
> The funny thing is that from time to time the replication progresses
> a little. I don't like non-repetitive behavior ;)
>
> Thanks again for the hints.
>
> On 2020-04-21 14:13, Michael Menge wrote:
>> Hi,
>>
>> Quoting Olaf Frączyk <olaf at navi.pl>:
>>
>>> The current situation is:
>>>
>>> 1. Replica:
>>>
>>> stopped and started the replica
>>>
>>> no activity on replica - iotop and top show nothing
>>>
>>> the only messages on replica is incoming connection from master
>>>
>>> 2. Master:
>>>
>>> when I run sync_client -r I still get:
>>>
>>> Apr 21 12:38:36 ifs sync_client[29518]: Reprocessing sync log file
>>> /var/lib/imap/sync/log-run
>>> Apr 21 12:43:27 ifs sync_client[29518]: IOERROR: zero length
>>> response to MAILBOXES (idle for too long)
>>> Apr 21 12:43:27 ifs sync_client[29518]: IOERROR: zero length
>>> response to RESTART (idle for too long)
>>> Apr 21 12:43:27 ifs sync_client[29518]: Error in do_sync():
>>> bailing out! Bad protocol
>>> Apr 21 12:43:27 ifs sync_client[29518]: Processing sync log file
>>> /var/lib/imap/sync/log-run failed: Bad protocol
>>>
>>> 3. There is 27 GB of about 45 GB replicated and there is no
>>> further progress
>>>
>>> 4. How to find out why the replica doesn't respond?
>>>
>>
>> You can enable telemetry logging on the replica by creating a
>> folder /var/lib/imap/log/<SYNC_AUTHNAME>
>> where <SYNC_AUTHNAME> is the value of the sync_authname. If you
>> give cyrus write permissions for this folder
>> it will create log-files for each process and what it received and
>> send from/to the sync_client with timestamps.
>>
>> Also you can try to use strace on the syncserver process to figure
>> out which files it is accessing
>>
>>
>>> On 2020-04-21 11:18, Olaf Frączyk wrote:
>>>> I also found out that when I see on master:
>>>>
>>>> Apr 21 11:12:38 ifs sync_client[27996]: IOERROR: zero length
>>>> response to MAILBOXES (idle for too long)
>>>> Apr 21 11:12:38 ifs sync_client[27996]: IOERROR: zero length
>>>> response to RESTART (idle for too long)
>>>> Apr 21 11:12:38 ifs sync_client[27996]: Error in do_sync():
>>>> bailing out! Bad protocol
>>>> Apr 21 11:12:38 ifs sync_client[27996]: Processing sync log file
>>>> /var/lib/imap/sync/log-run failed: Bad protocol
>>>>
>>>> I also get on the replica:
>>>>
>>>> Apr 21 11:12:38 skink1 imap[5775]: Connection reset by peer,
>>>> closing connection
>>>>
>>>> But I also see, that before it happens, there is no activity both
>>>> on the replica and on the master for some time.
>>>>
>>>> So maybe the imap server process is not recovering correctly in
>>>> the longlock situation?
>>>>
>>>> On 2020-04-21 11:07, Olaf Frączyk wrote:
>>>>> Hi,
>>>>>
>>>>> When I run sync_client -r on the master I see the following on
>>>>> the replica:
>>>>>
>>>>> Apr 21 10:56:15 skink1 imap[5775]: mailbox: longlock
>>>>> navi.pl!user.olaf for 1.7 seconds
>>>>> Apr 21 10:56:20 skink1 imap[5775]: mailbox: longlock
>>>>> navi.pl!user.piotr for 2.0 seconds
>>>>> Apr 21 10:56:23 skink1 imap[5775]: mailbox: longlock
>>>>> navi.pl!user.olaf for 2.9 seconds
>>>>> Apr 21 10:56:26 skink1 imap[5775]: mailbox: longlock
>>>>> navi.pl!user.piotr for 3.0 seconds
>>>>>
>>>>> The mailboxes have several GB in Inbox folder and several GB in
>>>>> subfolders. The inbox folders have about 20,000 messages, the
>>>>> subfolders upto 15,000
>>>>>
>>>>> Could it cause problems?
>>
>> the longlock is normaly not the problem. While on process has the lock
>> no other process can write to the mailbox, but on the replica there normaly
>> is no other process that should access the mailbox
>>
>>>>>
>>>>> Maybe I should move the sync_client from START section to
>>>>> SERVICES, it seems that it is not automatically restarted
>>>>>
>> I havend tried starting sync_client in the service section.
>> I start the sync_client via systemd.
>>
>>>>> On 2020-04-21 08:47, Michael Menge wrote:
>>>>>> Hi Olaf
>>>>>>
>>>>>>
>>>>>> Quoting Olaf Frączyk <olaf at navi.pl>:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> I upgraded to 3.0.13 but it didn't help.
>>>>>>>
>>>>>>> This time it copied about 18GB
>>>>>>>
>>>>>>> in the logs I still see:
>>>>>>>
>>>>>>> 1 - inefficient replication
>>>>>>>
>>>>>>> 2 - IOERROR: zero length response to MAILBOXES (idle for too long)
>>>>>>> IOERROR: zero length response to RESTART (idle for too long)
>>>>>>> Error in do_sync(): bailing out! Bad protocol
>>>>>>>
>>>>>>> But I have no idea what can I do next and why it fails
>>>>>>>
>>>>>>> Apr 21 02:24:46 ifs sync_client[12656]: IOERROR: zero length
>>>>>>> response to MAILBOXES (idle for too long)
>>>>>>> Apr 21 02:24:46 ifs sync_client[12656]: IOERROR: zero length
>>>>>>> response to RESTART (idle for too long)
>>>>>>> Apr 21 02:24:46 ifs sync_client[12656]: Error in do_sync():
>>>>>>> bailing out! Bad protocol
>>>>>>
>>>>>> do you see any errors on the syncserver side. The error look like the
>>>>>> sync_client is waiting for a reply of the server.
>>
>>
>> -------------------------------------------------------------------------------- M.Menge Tel.: (49)
>> 7071/29-70316
>> Universität Tübingen Fax.: (49) 7071/29-5912
>> Zentrum für Datenverarbeitung mail:
>> michael.menge at zdv.uni-tuebingen.de
>> Wächterstraße 76
>> 72074 Tübingen
>>
>> ----
>> Cyrus Home Page: http://www.cyrusimap.org/
>> List Archives/Info: http://lists.andrew.cmu.edu/pipermail/info-cyrus/
>> To Unsubscribe:
>> https://lists.andrew.cmu.edu/mailman/listinfo/info-cyrus
> ----
> Cyrus Home Page: http://www.cyrusimap.org/
> List Archives/Info: http://lists.andrew.cmu.edu/pipermail/info-cyrus/
> To Unsubscribe:
> https://lists.andrew.cmu.edu/mailman/listinfo/info-cyrus
--------------------------------------------------------------------------------
M.Menge Tel.: (49) 7071/29-70316
Universität Tübingen Fax.: (49) 7071/29-5912
Zentrum für Datenverarbeitung mail:
michael.menge at zdv.uni-tuebingen.de
Wächterstraße 76
72074 Tübingen
More information about the Info-cyrus
mailing list