Backend reboot replication lost

Willy Offermans Willy at Offermans.Rompen.nl
Sat Mar 1 07:43:02 EST 2014


Hello Michael and cyrus friends,

On Fri, Feb 28, 2014 at 01:07:45PM +0100, Michael Menge wrote:
> Hi
> 
> Quoting Willy Offermans <Willy at Offermans.Rompen.nl>:
> 
> >Dear cyrus friends,
> >
> >Once more my backend server was rebooted. I did not find any messages in my
> >logs nor did I receive any screen messages, that the replication was
> >stopped. I wonder what will happen in a production environment, when the
> >server reboots without my notice. Replication will fail and I will not be
> >able to guarantee full recovery. In my opinion this is unacceptable.
> >
> >Best would be to incorporate a message system about failure in the
> >sync_client code.
> >
> >I found some entries in the logs of the backend server about access of the
> >replication user: every 10 minutes the user logs on to the backend server.
> >Most probably to replicate the mails. I might use this behavior as a sign
> >of a working replication mechanism. It is only indirect, but it tells me
> >that there is at least some activity from the client to the backend. I
> >wonder why the user is logging on every 10 minutes. Does it mean that the
> >mails, received for the last 9 minutes or so, are not replicated?
> >
> >I'm not very experienced in coding, but I will try to dig into the
> >sync_client code and see how things are organised.
> >
> >I restarted the replication by executing ``sync_client -r'' on the client.
> >I do not even know if this is the right step to take to reactivate
> >replication. Can someone confirm? I can see in the logs of the backend,
> >that the replication user logs on every 10 minutes again. I take that as a
> >positive sign, that ``sync_client -r'' restarts the replication, but I have
> >no clue about inconsistencies or other possible checks.
> >
> >
> 
> If you have configured rolling replication, every change will be logged
> to the {configdirectory}/sync/log file. The 'sync_client -r' will check
> for this file, move it to {configdirectory}/sync/log-pid, process the file
> and then check again for a new {configdirectory}/sync/log.
> 
> If 'sync_client -r' is not running or has crashed, {configdirectory}/sync/log
> will grow. So by checking the file size of the log you know whether your
> replica is up to date.
> 
> If sync_client stops and there is a log-pid file present,
> run "sync_client -r -f {configdirectory}/sync/log-pid"
> and check the exit code. If it is 0, you can remove the
> log-pid file and restart 'sync_client -r'; if not, check the logs
> for errors.
> 
> 
> 

The backend rebooted once more. This will still happen several times, I'm
afraid, for reasons not related to cyrus. It gives me the opportunity to 
play with replication.

There were several log files in /var/imap/sync:
log
log-4508
log-74001
log-5600
(I do not remember the exact numbers)

I removed the old log-pid files, leaving log-4508 and the log file. I assumed
that the old log-pid files were left over from previous reboots.

I had a look at the log file. It was a listing of my mailboxes, in seemingly
random order and with duplicate entries. I assumed the entries were connected
to the reception of incoming mails. However, I have no clue how the entries are
related to the replication process. Maybe someone can shed light on this.

I followed your procedure:

a) sync_client -r -f /var/imap/sync/log-4508
b) sync_client -r

It seemingly worked well. No messages whatsoever. So, without any other
proof, I take this for success.
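
As an illustration only, step a) and the exit-code check that Michael
described could be wrapped in a small script. This is a minimal sketch in
Python, not something taken from the Cyrus distribution. The function names
are made up for this example, replaying the oldest file first is my own
assumption, and it assumes that no 'sync_client -r' is running while the
leftover files are replayed.

```python
import subprocess
from pathlib import Path


def run_sync_client(log_file):
    """Replay one leftover log file and return the exit code of sync_client."""
    # Same command as step a); sync_client must be on the PATH.
    return subprocess.run(["sync_client", "-r", "-f", str(log_file)]).returncode


def replay_leftover_logs(sync_dir, replay=run_sync_client):
    """Replay every log-<pid> file in sync_dir and remove those that succeed.

    Returns the list of files whose replay failed; these are kept on disk.
    The plain 'log' file is not touched, because 'log-*' does not match it.
    """
    failed = []
    # Oldest first is an assumption of this sketch, not a documented rule.
    leftovers = sorted(Path(sync_dir).glob("log-*"),
                       key=lambda p: p.stat().st_mtime)
    for log_file in leftovers:
        if replay(log_file) == 0:
            log_file.unlink()          # exit code 0: the file can be removed
        else:
            failed.append(log_file)    # non-zero: keep it and check the logs
    return failed
```

Calling replay_leftover_logs("/var/imap/sync") and then starting
'sync_client -r' again would correspond to steps a) and b). An empty return
value means every leftover file was replayed with exit code 0.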

I would like to note three things:

1) This procedure should be written somewhere in the manual.
2) There is still a need for a double check of successful replication.
3) There is still a need for a message about failure of replication, caused
by a reboot or another lost connection.
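
Until something like that exists, Michael's hint about the size of the log
file could be turned into a simple check for points 2) and 3). The sketch
below is only an assumption of how that might look: the function name and
both thresholds are hypothetical and would have to be tuned per site. It
warns when the log file has grown large, or when a log-pid file has not been
modified for a while, which may indicate that a sync_client died.

```python
import time
from pathlib import Path


def replication_warnings(sync_dir, max_bytes=64 * 1024,
                         max_age_seconds=900, now=None):
    """Return a list of warnings; an empty list means no sign of trouble.

    max_bytes and max_age_seconds are illustrative defaults, not values
    recommended by the Cyrus documentation.
    """
    now = time.time() if now is None else now
    sync_dir = Path(sync_dir)
    warnings = []

    # A growing 'log' file suggests that nothing is processing it.
    log = sync_dir / "log"
    if log.exists():
        size = log.stat().st_size
        if size > max_bytes:
            warnings.append(f"{log} has grown to {size} bytes")

    # A log-<pid> file that has not changed for a long time was probably
    # left behind by a sync_client that stopped.
    for leftover in sorted(sync_dir.glob("log-*")):
        age = now - leftover.stat().st_mtime
        if age > max_age_seconds:
            warnings.append(f"{leftover} is {int(age)} seconds old")

    return warnings
```

Printing the result of replication_warnings("/var/imap/sync") from a cron
job would give at least an indirect message when replication stops, since
cron can mail any output to the administrator.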

-- 
Met vriendelijke groeten,
With kind regards,
Mit freundlichen Gruessen,
De jrus wah,

Willy

*************************************
 W.K. Offermans

