Basic two host replication scenario, SSL failure

Bron Gondwana brong at fastmail.fm
Mon Jul 11 08:47:42 EDT 2011


On Mon, Jul 11, 2011 at 11:40:25AM +0300, Ivan Lezhnjov Jr. wrote:
> On Sun, Jul 10, 2011 at 11:07 AM, Bron Gondwana <brong at fastmail.fm> wrote:
> > I would love to replace monitorsync with better logic in sync_client
> > itself, but have not yet got to it!
> 
> So, what does "monitorsync" essentially do with those log files?

It runs sync_client -r -f $file on each file it finds that doesn't have
a corresponding sync_client process with the same PID actually alive
(it checks the ps output).  If the sync_client run succeeds it deletes
the file; otherwise it emails us about the error and tries again on the
next run (from cron every 10 minutes).
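
For illustration only, here is a rough Python sketch of that loop.  It
is not the actual monitorsync script.  The "log-<PID>" file naming and
the helper names (pid_alive, orphaned_logs, replay) are assumptions
made for the example, and it probes the PID with signal 0 rather than
parsing ps output, so unlike the real thing it does not confirm that
the live process is really a sync_client.

```python
import os
import subprocess
from pathlib import Path


def pid_alive(pid):
    """Return True if some process with this PID exists.

    Simplification: signal 0 only probes for existence; it does not
    check that the process is actually a sync_client.
    """
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # exists, but belongs to another user
    return True


def orphaned_logs(sync_dir, is_alive=pid_alive):
    """Yield log files whose PID suffix matches no live process.

    Assumes files are named "log-<PID>" (hypothetical naming).
    """
    for path in sorted(Path(sync_dir).glob("log-*")):
        suffix = path.name.split("-", 1)[1]
        if suffix.isdigit() and not is_alive(int(suffix)):
            yield path


def replay(sync_dir, run=subprocess.run, is_alive=pid_alive):
    """Replay each orphaned log; delete on success, collect failures."""
    failures = []
    for path in orphaned_logs(sync_dir, is_alive):
        result = run(["sync_client", "-r", "-f", str(path)])
        if result.returncode == 0:
            path.unlink()          # replayed cleanly, nothing left to do
        else:
            failures.append(path)  # keep the file for the next cron run
    return failures
```

A cron job would call replay() on the sync log directory every 10
minutes and mail out whatever comes back in the failures list.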

> Jul 11 11:21:14 imapsite-master syncserver[14019]: SSL_accept() timed
> out -> fail
> Jul 11 11:21:14 imapsite-master syncserver[14019]: STARTTLS failed:
> imapsite-replica [10.10.0.188]

Sounds like broken authentication.

> ============================== B switched to master
> 
> Jul 11 11:33:45 imapsite-replica sync_client[29199]: couldn't
> authenticate to backend server: no mechanism available
> Jul 11 11:33:45 imapsite-replica sync_client[29479]: couldn't
> authenticate to backend server: no mechanism available

And that's definitely broken authentication or different
configurations.

> > Yeah, of course.  You're doing it wrong[tm].  In theory the sync system
> > can recover from an accidental split brain like this, but it's not
> > ideal.
> 
> I'd be happy to learn what I'm doing exactly wrong :)

Changing stuff under the cyrus instances by rsyncing stuff around.
It also looks like you don't have the same authentication details or
configs at each end (modulo the bits that actually start and stop
the sync_client).

The only difference between our master and replica configs these days
is that sync_client only gets started on the master.  Actually, we
don't start it from cyrus.conf any more, we bring up the master first,
and then run sync_client from the init script after the master is
fully running.
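
As a sketch of that ordering (not our actual init script), something
like the following would do: wait until the master accepts TCP
connections, and only then launch sync_client.  The host, the port
(143), the timeouts, and the function names are assumptions for the
example, and "sync_client -r" here means a rolling replication run.

```python
import socket
import subprocess
import time


def wait_for_port(host, port, timeout=60.0, interval=1.0):
    """Return True once host:port accepts a TCP connection.

    Returns False if nothing is listening by the time `timeout`
    seconds have passed.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)


def start_replication(host="localhost", port=143, timeout=60.0,
                      launch=subprocess.Popen):
    """Start sync_client only after the master is reachable."""
    if not wait_for_port(host, port, timeout=timeout):
        raise RuntimeError("master did not come up; not starting sync_client")
    return launch(["sync_client", "-r"])
```

An accepting port is only a rough proxy for "fully running"; a real
init script might want a stronger check before starting replication.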

> > > Replication doesn't work now. The question is can it work after doing
> > > this?
> >
> > You've got your rsynced spool and meta out of sync.  You will need to run
> > a full reconstruct -G to fix this, which will replace the incorrect metadata
> > with what's now in the spool.
> 
> Thanks for the tip. Good to know that ;)

Reconstruct is pretty good.  It can deal with most situations where
cyrus is confused.  If you find any where it doesn't, file a bug and
I'll get it fixed!

> > > So, that's all I have to say perhaps. I would really appreciate any help
> > > with this. This seems like a basic, trivial scenario to me but I just
> > > can't seem to get cyrus-imap working right.
> >
> > It's not as trivial as it should be yet - and you can mess yourself up
> > particularly if you go rsyncing stuff between machines!  If you have one
> > host which is "correct" (host B in this case) I recommend that you do a
> > full reconstruct -r -G on it, and then discard the replica and restart
> > replication from scratch.
> 
> I've also just tried to apply these tips. Namely, when B (switched to
> master) failed to push changes to A (switched to replica), I did the
> following:
> - stopping the service and then "discarding replica" by removing
> /var/{lib/imap,spool/imap}
> - restarting the service with replica role configuration (which is
> correct by the way)
> 
> Anything else I could try or check?
> 
> PS: sorry for direct message to your inbox Bron :)

No worries - though I should warn you that I'm going on holiday for a
couple of weeks starting tomorrow, so answers may not always be prompt.

Bron.

