Replication failed 3.0.5 -> 3.0.13, now 3.0.13->3.0.13

Michael Menge michael.menge at zdv.uni-tuebingen.de
Tue Apr 21 09:00:56 EDT 2020


Quoting Olaf Frączyk <olaf at navi.pl>:

> Thank you for the telemetry hint :)
>
> I don't use the syncserver - the replication is done via IMAP port  
> on the replica side. I have no idea how to have strace spawned by  
> cyrus master process. When I attach later to imapd using strace -p  
> I'm afraid some info already will be lost.
>

You can try "/usr/bin/strace <strace_option> <path_to_imapd>/imapd  
<imapd_options>" as cmd
or you can use prefork=1 in the service to prefok one imap process and connect
to this one before you start the sync_client.



> The syncserver is marked as deprecated in the docs, so I went with  
> the more modern option. Maybe here is the problem ;)
>

i am using syncserver since cyrus-imapd 2.3. So I had no reason to change it.


> The funny thing is that from time to time the replication progresses  
> a little. I don't like non-repetitive behavior ;)
>
> Thanks again for the hints.
>
> On 2020-04-21 14:13, Michael Menge wrote:
>> Hi,
>>
>> Quoting Olaf Frączyk <olaf at navi.pl>:
>>
>>> The current situation is:
>>>
>>> 1. Replica:
>>>
>>> stopped and started the replica
>>>
>>> no activity on replica - iotop and top show nothing
>>>
>>> the only messages on replica is incoming connection from master
>>>
>>> 2. Master:
>>>
>>> when I run sync_client -r I still get:
>>>
>>> Apr 21 12:38:36 ifs sync_client[29518]: Reprocessing sync log file  
>>> /var/lib/imap/sync/log-run
>>> Apr 21 12:43:27 ifs sync_client[29518]: IOERROR: zero length  
>>> response to MAILBOXES (idle for too long)
>>> Apr 21 12:43:27 ifs sync_client[29518]: IOERROR: zero length  
>>> response to RESTART (idle for too long)
>>> Apr 21 12:43:27 ifs sync_client[29518]: Error in do_sync():  
>>> bailing out! Bad protocol
>>> Apr 21 12:43:27 ifs sync_client[29518]: Processing sync log file  
>>> /var/lib/imap/sync/log-run failed: Bad protocol
>>>
>>> 3. There is 27 GB of about 45 GB replicated and there is no  
>>> further progress
>>>
>>> 4. How to find out why the replica doesn't respond?
>>>
>>
>> You can enable telemetry logging on the replica by creating a  
>> folder /var/lib/imap/log/<SYNC_AUTHNAME>
>> where <SYNC_AUTHNAME> is the value of the  sync_authname. If you  
>> give cyrus write permissions for this folder
>> it will create log-files for each process and what it received and  
>> send from/to the sync_client with timestamps.
>>
>> Also you can try to use strace on the syncserver process to figure  
>> out which files it is accessing
>>
>>
>>> On 2020-04-21 11:18, Olaf Frączyk wrote:
>>>> I also found out that when I see on master:
>>>>
>>>> Apr 21 11:12:38 ifs sync_client[27996]: IOERROR: zero length  
>>>> response to MAILBOXES (idle for too long)
>>>> Apr 21 11:12:38 ifs sync_client[27996]: IOERROR: zero length  
>>>> response to RESTART (idle for too long)
>>>> Apr 21 11:12:38 ifs sync_client[27996]: Error in do_sync():  
>>>> bailing out! Bad protocol
>>>> Apr 21 11:12:38 ifs sync_client[27996]: Processing sync log file  
>>>> /var/lib/imap/sync/log-run failed: Bad protocol
>>>>
>>>> I also get on the replica:
>>>>
>>>> Apr 21 11:12:38 skink1 imap[5775]: Connection reset by peer,  
>>>> closing connection
>>>>
>>>> But I also see, that before it happens, there is no activity both  
>>>> on the replica and on the master for some time.
>>>>
>>>> So maybe the imap server process is not recovering correctly in  
>>>> the longlock situation?
>>>>
>>>> On 2020-04-21 11:07, Olaf Frączyk wrote:
>>>>> Hi,
>>>>>
>>>>> When I run sync_client -r on the master I see the following on  
>>>>> the replica:
>>>>>
>>>>> Apr 21 10:56:15 skink1 imap[5775]: mailbox: longlock  
>>>>> navi.pl!user.olaf for 1.7 seconds
>>>>> Apr 21 10:56:20 skink1 imap[5775]: mailbox: longlock  
>>>>> navi.pl!user.piotr for 2.0 seconds
>>>>> Apr 21 10:56:23 skink1 imap[5775]: mailbox: longlock  
>>>>> navi.pl!user.olaf for 2.9 seconds
>>>>> Apr 21 10:56:26 skink1 imap[5775]: mailbox: longlock  
>>>>> navi.pl!user.piotr for 3.0 seconds
>>>>>
>>>>> The mailboxes have several GB in Inbox folder and several GB in  
>>>>> subfolders. The inbox folders have about 20,000 messages, the  
>>>>> subfolders upto 15,000
>>>>>
>>>>> Could it cause problems?
>>
>> the longlock is normaly not the problem. While on process has the lock
>> no other process can write to the mailbox, but on the replica there normaly
>> is no other process that should access the mailbox
>>
>>>>>
>>>>> Maybe I should move the sync_client from START section to  
>>>>> SERVICES, it seems that it is not automatically restarted
>>>>>
>> I havend tried starting sync_client in the service section.
>> I start the sync_client via systemd.
>>
>>>>> On 2020-04-21 08:47, Michael Menge wrote:
>>>>>> Hi Olaf
>>>>>>
>>>>>>
>>>>>> Quoting Olaf Frączyk <olaf at navi.pl>:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> I upgraded to 3.0.13 but it didn't help.
>>>>>>>
>>>>>>> This time it copied about 18GB
>>>>>>>
>>>>>>> in the logs I still see:
>>>>>>>
>>>>>>> 1 - inefficient replication
>>>>>>>
>>>>>>> 2 - IOERROR: zero length response to MAILBOXES (idle for too long)
>>>>>>> IOERROR: zero length response to RESTART (idle for too long)
>>>>>>> Error in do_sync(): bailing out! Bad protocol
>>>>>>>
>>>>>>> But I have no idea what can I do next and why it fails
>>>>>>>
>>>>>>> Apr 21 02:24:46 ifs sync_client[12656]: IOERROR: zero length  
>>>>>>> response to MAILBOXES (idle for too long)
>>>>>>> Apr 21 02:24:46 ifs sync_client[12656]: IOERROR: zero length  
>>>>>>> response to RESTART (idle for too long)
>>>>>>> Apr 21 02:24:46 ifs sync_client[12656]: Error in do_sync():  
>>>>>>> bailing out! Bad protocol
>>>>>>
>>>>>> do you see any errors on the syncserver side. The error look like the
>>>>>> sync_client is waiting for a reply of the server.
>>
>>
>> -------------------------------------------------------------------------------- M.Menge                                Tel.: (49)  
>> 7071/29-70316
>> Universität Tübingen                   Fax.: (49) 7071/29-5912
>> Zentrum für Datenverarbeitung          mail:  
>> michael.menge at zdv.uni-tuebingen.de
>> Wächterstraße 76
>> 72074 Tübingen
>>
>> ----
>> Cyrus Home Page: http://www.cyrusimap.org/
>> List Archives/Info: http://lists.andrew.cmu.edu/pipermail/info-cyrus/
>> To Unsubscribe:
>> https://lists.andrew.cmu.edu/mailman/listinfo/info-cyrus
> ----
> Cyrus Home Page: http://www.cyrusimap.org/
> List Archives/Info: http://lists.andrew.cmu.edu/pipermail/info-cyrus/
> To Unsubscribe:
> https://lists.andrew.cmu.edu/mailman/listinfo/info-cyrus



--------------------------------------------------------------------------------
M.Menge                                Tel.: (49) 7071/29-70316
Universität Tübingen                   Fax.: (49) 7071/29-5912
Zentrum für Datenverarbeitung          mail:  
michael.menge at zdv.uni-tuebingen.de
Wächterstraße 76
72074 Tübingen



More information about the Info-cyrus mailing list