How to clean up after syncserver

Bron Gondwana brong at fastmail.fm
Tue Apr 14 04:45:20 EDT 2009


On Tue, Apr 14, 2009 at 09:43:10AM +0200, Michael Glad wrote:
> Bron Gondwana wrote:
> > On Mon, 13 Apr 2009 21:29 +0200, "Michael Glad" <glad at cs.au.dk> wrote:
> >   
> >> Cyrus being restarted / sync server abending, apparently causes it to 
> >> leave sub dirs in the 'sync.' dirs containing hard links to messages.
> >> They currently sum to 40k+ links on one of my replicas :-( .
> >>     
> >
> > That's a lot of sync server failures!
> >
> >   
> I restart cyrus on my replicas once a day as part of the backup 
> procedure. And whatever hard links

Gosh - what a pain.  I would get too annoyed by the "sync_client
bailing out" notifications (we have a daemon that watches the
imapd.log and makes a noise for anything that isn't an everyday
notice).

> that sync_server has left in its sync. subdirs contribute to the 40k count.
> 
> And it seems that sync_server has a potential of leaving links:
> 
> [root at replica# date; find sync. g/user/glad -inum 136643179 -ls
> Tue Apr 14 09:25:44 CEST 2009
> 136643179    4 -rw-------   3 cyrus    mail          935 Apr 14 09:15 sync./29594/65.
> 136643179    4 -rw-------   3 cyrus    mail          935 Apr 14 09:15 g/user/glad/71229.
> 136643179    4 -rw-------   3 cyrus    mail          935 Apr 14 09:15 g/user/glad/Incoming/499.
> 
> Why doesn't it (after 10 minutes & counting) remove the sync./29594/65.
> link after successfully having created the two proper links (I have a
> Sieve script that files a copy of all mails in the Incoming folder)?

Within a single sync run it will keep all temporary files around
just in case there's a COPY coming up that it wants to serve
from the existing file (thanks to GUID uniqueness it can tell
that it's the same file).

Once the single run is finished, then it will unlink all the
files it used.
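
To tell files that are merely waiting for the current run to
finish from ones that really were left behind, something along
these lines could be used.  This is only a rough sketch: the
function name and the age threshold are made up for illustration,
it just reports and never deletes, and it assumes nothing about
the layout below the 'sync.' directory beyond "files in subdirs".

```python
import os
import time


def stale_staged_files(sync_dir, min_age_seconds, now=None):
    """List (path, link_count, age_seconds) for old files under sync_dir.

    Hypothetical helper: it only reports, it never unlinks anything.
    """
    now = time.time() if now is None else now
    found = []
    for dirpath, _dirnames, filenames in os.walk(sync_dir):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)
            except FileNotFoundError:
                # The file was unlinked while we were walking the tree.
                continue
            age = now - st.st_mtime
            if age >= min_age_seconds:
                # A link count above 1 means the message is also linked
                # from somewhere else, e.g. a mailbox directory.
                found.append((path, st.st_nlink, age))
    return sorted(found)
```

Called as stale_staged_files("sync.", 24 * 3600) from the
directory holding 'sync.', it would list only files older than a
day, which no single sync run should still be holding on to.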

It's probably taking so long because you stopped the replica
for a while, so it has a big file to process.  We split them
into 10k chunks when restarting, just so that progress can be
monitored a bit more easily.

> As I write this sentence, 15 minutes have passed and the link is gone.
> But leaving all those links around for extended periods seems like an
> invitation for bad things to happen and for garbage to pile up. :-(

So don't shut down your replica then.  We do our backups by
fcntl locking the cyrus.* files (header, index, expunge in that
order) and then duplicating them.  There's no need to lock the
data files - they never change, so you either get the whole
thing or nothing - and if nothing then the file was already
deleted by now.
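
A minimal sketch of that lock-then-copy step for one mailbox
directory, not our actual backup code.  The function name, the
choice of a shared (read) lock and the handling of a missing
file are all assumptions; only the file names and the lock order
come from the description above.

```python
import fcntl
import os
import shutil

# Lock order as described: header, index, expunge.
META_FILES = ("cyrus.header", "cyrus.index", "cyrus.expunge")


def snapshot_metadata(mailbox_dir, dest_dir):
    """Copy the cyrus.* files while holding fcntl locks on all of them.

    Hypothetical helper; returns the names that were copied.
    """
    os.makedirs(dest_dir, exist_ok=True)
    held = []  # (name, open file) pairs, in lock order
    copied = []
    try:
        for name in META_FILES:
            try:
                fh = open(os.path.join(mailbox_dir, name), "rb")
            except FileNotFoundError:
                continue  # assumption: a missing file is simply skipped
            held.append((name, fh))
            # Shared lock (an assumption); blocks until it is granted.
            fcntl.lockf(fh, fcntl.LOCK_SH)
        for name, fh in held:
            # Copy from the handle that holds the lock.  Opening and
            # closing a second descriptor for the same file would drop
            # the POSIX lock held by this process.
            with open(os.path.join(dest_dir, name), "wb") as out:
                shutil.copyfileobj(fh, out)
            copied.append(name)
    finally:
        for _name, fh in reversed(held):
            fcntl.lockf(fh, fcntl.LOCK_UN)
            fh.close()
    return copied
```

The message files themselves can then be copied afterwards
without any locking, for the reason given above.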

I'm tempted to write a backup utility that can give you a
consistent snapshot of a single user as a tar file.  The
paths would be such that you could just concatenate a whole
lot of them together and gzip it as the backup!

(also an incremental one - given a previous tar file, back
up just what's changed)
 
> >> I noticed them during an as yet unsuccessful attempt to find out why 
> >> message body inconsistencies now and then occur
> >> between master and replica.
> >>     
> >
> > We should have fixed that a while back - but yes, for a little while it would
> > just open the hardlinked file and truncate it without unlinking first - causing
> > random corruption.  Really, REALLY annoying.  The bug certainly doesn't
> > exist in 2.3.14 - I spent quite a while auditing all the paths that open files!
> >   
> I've browsed the sync_server code and have noticed that care is being 
> taken not to overwrite
> left-over links but it seems that this is actually still happening. I'm 
> quite sure that I see the
> problems with 2.3.14. 

I'm guessing they are old ones.  Have you consistency checked 
your entire server?  If your replica has ALWAYS been your
replica then a reconstruct -G on every folder followed by a 
sync_client -u on every user should fix it (it will take
a little while, but at least it can be done one user at a
time with a reconstruct -r -G and then a sync_client -u)
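
The one-user-at-a-time version could be driven by something
like the sketch below.  The function names are invented, and the
"user.<name>" mailbox naming is an assumption that depends on
your hierarchy separator.  Which host each command has to run
on is left to the caller: pass in a `run` function that executes
the command wherever it belongs.

```python
import subprocess


def resync_commands(user):
    """Return the two command lines for one user, in the order to run them."""
    mailbox = "user." + user  # assumption: default "." hierarchy separator
    return [
        ["reconstruct", "-r", "-G", mailbox],
        ["sync_client", "-u", user],
    ]


def resync_user(user, run=subprocess.run):
    """Run both steps for one user, stopping at the first failure.

    `run` defaults to subprocess.run on the local host; swap in your own
    callable if a step has to be executed elsewhere.
    """
    for cmd in resync_commands(user):
        run(cmd, check=True)
```

Looping resync_user() over the user list keeps the damage small
if one mailbox fails: check=True makes it raise at that user
rather than carrying on.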

Regards,

Bron ( annoyingly, I haven't set reconstruct to tell you if
       the GUID of any message changes.  I probably should! )


More information about the Info-cyrus mailing list