Replication failed 3.0.5 -> 3.0.13, now 3.0.13->3.0.13

Michael Menge michael.menge at zdv.uni-tuebingen.de
Tue Apr 21 08:13:05 EDT 2020


Hi,

Quoting Olaf Frączyk <olaf at navi.pl>:

> The current situation is:
>
> 1. Replica:
>
> stopped and started the replica
>
> no activity on replica - iotop and top show nothing
>
> the only messages on replica is incoming connection from master
>
> 2. Master:
>
> when I run sync_client -r I still get:
>
> Apr 21 12:38:36 ifs sync_client[29518]: Reprocessing sync log file  
> /var/lib/imap/sync/log-run
> Apr 21 12:43:27 ifs sync_client[29518]: IOERROR: zero length  
> response to MAILBOXES (idle for too long)
> Apr 21 12:43:27 ifs sync_client[29518]: IOERROR: zero length  
> response to RESTART (idle for too long)
> Apr 21 12:43:27 ifs sync_client[29518]: Error in do_sync(): bailing  
> out! Bad protocol
> Apr 21 12:43:27 ifs sync_client[29518]: Processing sync log file  
> /var/lib/imap/sync/log-run failed: Bad protocol
>
> 3. There is 27 GB of about 45 GB replicated and there is no further progress
>
> 4. How to find out why the replica doesn't respond?
>

You can enable telemetry logging on the replica by creating a folder  
/var/lib/imap/log/<SYNC_AUTHNAME>
where <SYNC_AUTHNAME> is the value of the  sync_authname. If you give  
cyrus write permissions for this folder
it will create log-files for each process and what it received and  
send from/to the sync_client with timestamps.

Also you can try to use strace on the syncserver process to figure out  
which files it is accessing


> On 2020-04-21 11:18, Olaf Frączyk wrote:
>> I also found out that when I see on master:
>>
>> Apr 21 11:12:38 ifs sync_client[27996]: IOERROR: zero length  
>> response to MAILBOXES (idle for too long)
>> Apr 21 11:12:38 ifs sync_client[27996]: IOERROR: zero length  
>> response to RESTART (idle for too long)
>> Apr 21 11:12:38 ifs sync_client[27996]: Error in do_sync(): bailing  
>> out! Bad protocol
>> Apr 21 11:12:38 ifs sync_client[27996]: Processing sync log file  
>> /var/lib/imap/sync/log-run failed: Bad protocol
>>
>> I also get on the replica:
>>
>> Apr 21 11:12:38 skink1 imap[5775]: Connection reset by peer,  
>> closing connection
>>
>> But I also see, that before it happens, there is no activity both  
>> on the replica and on the master for some time.
>>
>> So maybe the imap server process is not recovering correctly in the  
>> longlock situation?
>>
>> On 2020-04-21 11:07, Olaf Frączyk wrote:
>>> Hi,
>>>
>>> When I run sync_client -r on the master I see the following on the replica:
>>>
>>> Apr 21 10:56:15 skink1 imap[5775]: mailbox: longlock  
>>> navi.pl!user.olaf for 1.7 seconds
>>> Apr 21 10:56:20 skink1 imap[5775]: mailbox: longlock  
>>> navi.pl!user.piotr for 2.0 seconds
>>> Apr 21 10:56:23 skink1 imap[5775]: mailbox: longlock  
>>> navi.pl!user.olaf for 2.9 seconds
>>> Apr 21 10:56:26 skink1 imap[5775]: mailbox: longlock  
>>> navi.pl!user.piotr for 3.0 seconds
>>>
>>> The mailboxes have several GB in Inbox folder and several GB in  
>>> subfolders. The inbox folders have about 20,000 messages, the  
>>> subfolders upto 15,000
>>>
>>> Could it cause problems?

the longlock is normaly not the problem. While on process has the lock
no other process can write to the mailbox, but on the replica there normaly
is no other process that should access the mailbox

>>>
>>> Maybe I should move the sync_client from START section to  
>>> SERVICES, it seems that it is not automatically restarted
>>>
I havend tried starting sync_client in the service section.
I start the sync_client via systemd.

>>> On 2020-04-21 08:47, Michael Menge wrote:
>>>> Hi Olaf
>>>>
>>>>
>>>> Quoting Olaf Frączyk <olaf at navi.pl>:
>>>>
>>>>> Hi,
>>>>>
>>>>> I upgraded to 3.0.13 but it didn't help.
>>>>>
>>>>> This time it copied about 18GB
>>>>>
>>>>> in the logs I still see:
>>>>>
>>>>> 1 - inefficient replication
>>>>>
>>>>> 2 - IOERROR: zero length response to MAILBOXES (idle for too long)
>>>>> IOERROR: zero length response to RESTART (idle for too long)
>>>>> Error in do_sync(): bailing out! Bad protocol
>>>>>
>>>>> But I have no idea what can I do next and why it fails
>>>>>
>>>>> Apr 21 02:24:46 ifs sync_client[12656]: IOERROR: zero length  
>>>>> response to MAILBOXES (idle for too long)
>>>>> Apr 21 02:24:46 ifs sync_client[12656]: IOERROR: zero length  
>>>>> response to RESTART (idle for too long)
>>>>> Apr 21 02:24:46 ifs sync_client[12656]: Error in do_sync():  
>>>>> bailing out! Bad protocol
>>>>
>>>> do you see any errors on the syncserver side. The error look like the
>>>> sync_client is waiting for a reply of the server.


--------------------------------------------------------------------------------
M.Menge                                Tel.: (49) 7071/29-70316
Universität Tübingen                   Fax.: (49) 7071/29-5912
Zentrum für Datenverarbeitung          mail:  
michael.menge at zdv.uni-tuebingen.de
Wächterstraße 76
72074 Tübingen



More information about the Info-cyrus mailing list