lmtpd triggering a delivery.db checkpointing (Cyrus 2.3.16)

Bron Gondwana brong at fastmail.fm
Tue May 17 19:51:02 EDT 2016


On Tue, May 17, 2016, at 22:51, Eric Luyten via Info-cyrus wrote:
> On Tue, May 17, 2016 11:45 am, Simon Matter wrote:
> >> Hi,
> >>
> >> Several times a month our server freezes up on deliveries and the system
> >> load average shoots up into the hundreds. Things quickly return to normal
> >> between one and two minutes later but this has always puzzled me.
> >>
> >> Today I was watching the system from up close when it happened.
> >>
> >> May 17 10:59:14 XXXX lmtp[24980]: skiplist: checkpointed
> >> /ssd/cyrs/imap/deliver.db (223062 records, 25295200 bytes) in 119 seconds
> >>
> >> I took a quick dive into the code but could not find where and when lmtpd
> >> is supposed to trigger a delivery.db checkpointing action.
> >
> > Isn't it controlled by 'checkpoint    cmd="ctl_cyrusdb -c" period=30' in
> > cyrus.conf?
> 
> 
> Okay, I think I found the code in lib/cyrusdb_skiplist.c.
> 
> We do indeed have the (default) 'checkpoint  cmd="ctl_cyrusdb -c" period=30'
> entry in cyrus.conf, 30 referring to the number of minutes between invocations.
> 
> We prune deliver.db every night at 00:55 with -E 1
> 
> 
> So I guess the phenomenon I witnessed this morning correlates with server
> busyness in the area of deliveries.
> A Cyrus Wiki page hints at reducing the number of minutes down from 30.
> 
> "The most common one is that you need to checkpoint the cyrusdb more often.
>  This can be done with a simple ctl_cyrusdb -c. If you do this very often,
>  the amount of log that needs to be recovered will be significantly shorter.
>  We recommend doing this at least once every half hour, and more often on
>  busy sites. "
> (http://cyrusimap.web.cmu.edu/mediawiki/index.php/FAQ)

Urgh: 2.3.x.

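Sadly, that's not really hooked up nicely, and the terminology is really muddy.
The "skiplist: checkpointed" line in your log is not ctl_cyrusdb's checkpoint
at all. Skiplist databases rewrite themselves as a more compact file when they
reach a certain ratio of ADD records to INORDER records, and that rewrite runs
inside whichever process commits the write that crosses the threshold. On a
busy deliver.db that's an lmtpd, which is why it was lmtp that logged it.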

This isn't exposed outside cyrusdb_skiplist.c until 2.5, and it's not hooked into
ctl_cyrusdb's "checkpoint" operation, which just calls a sync on each database
engine:

    case CHECKPOINT:
        r2 = (*(dblist[i].env))->sync();

and then takes a backup of the files with:

        r2 = (*(dblist[i].env))->archive((const char**) archive_files,
                                         backup1);


sync does nothing:

static int mysync(void)
{
    return 0;
}
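To make that dispatch concrete, here is a minimal sketch with a hypothetical engine table (`struct db_engine`, `checkpoint_all()` and `counting_archive()` are illustrative names, not Cyrus code). The checkpoint step only ever calls sync and then archive, so with a skiplist-style sync that just returns 0, the only real work left is copying files.

```c
#include <stddef.h>

/* Hypothetical, stripped-down engine table modelled on the
 * (*(dblist[i].env))->sync() / ->archive() calls quoted above.
 * These are NOT the real Cyrus structures. */
struct db_engine {
    const char *name;
    int (*sync)(void);
    int (*archive)(const char **fnames, const char *dirname);
};

/* skiplist-style sync: nothing to flush, so it just succeeds. */
int skiplist_sync(void)
{
    return 0;
}

/* Stand-in archive that only counts calls; the real one copies each
 * file into the backup directory (see myarchive). */
int archive_calls = 0;

int counting_archive(const char **fnames, const char *dirname)
{
    (void)fnames;
    (void)dirname;
    archive_calls++;
    return 0;
}

/* The CHECKPOINT step as described: sync each engine, then archive its
 * files.  Nothing in here ever compacts a skiplist file. */
int checkpoint_all(const struct db_engine *engines, size_t n,
                   const char **files, const char *backup_dir)
{
    for (size_t i = 0; i < n; i++) {
        int r = engines[i].sync();
        if (r) return r;
        r = engines[i].archive(files, backup_dir);
        if (r) return r;
    }
    return 0;
}
```

With a skiplist engine in the table, checkpoint_all() "succeeds" after doing nothing but copies; the expensive rewrite from the lmtp log is never reached from here.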


archive takes copies of the files (without even locking!)

static int myarchive(const char **fnames, const char *dirname)
{
    int r;
    const char **fname;
    char dstname[1024], *dp;
    int length, rest;
    
    strlcpy(dstname, dirname, sizeof(dstname));
    length = strlen(dstname);
    dp = dstname + length;
    rest = sizeof(dstname) - length;

    /* archive those files specified by the app */
    for (fname = fnames; *fname != NULL; ++fname) {
        syslog(LOG_DEBUG, "archiving database file: %s", *fname);
        strlcpy(dp, strrchr(*fname, '/'), rest);
        r = cyrusdb_copyfile(*fname, dstname);
        if (r) {
            syslog(LOG_ERR,
                   "DBERROR: error archiving database file: %s", *fname);
            return CYRUSDB_IOERROR;
        }
    }

    return 0;
}

...

These are identical right up to 3.0, though they're factored out into
"generic sync" and "generic archive". So for skiplist databases,
ctl_cyrusdb's checkpoint doesn't actually do much worthwhile work: it
copies the files and never triggers the compacting rewrite.
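If you want a backup copy that a concurrent writer can't change underneath you, the usual fix is to hold a shared lock for the whole copy. A minimal sketch using POSIX fcntl() record locks follows; `copy_with_read_lock()` is a hypothetical helper, and it only helps if the writers take fcntl() locks on the same file (Cyrus can be built with different locking back ends, so check which one yours uses).

```c
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Copy src to dst while holding a shared (read) fcntl() lock on src, so a
 * cooperating writer that takes an exclusive fcntl() lock cannot rewrite
 * the file mid-copy.  Returns 0 on success, -1 on error.
 * Assumption: writers lock with fcntl() on the same file. */
int copy_with_read_lock(const char *src, const char *dst)
{
    int in = open(src, O_RDONLY);
    if (in < 0) return -1;

    struct flock fl;
    memset(&fl, 0, sizeof fl);
    fl.l_type = F_RDLCK;          /* shared lock: other readers still allowed */
    fl.l_whence = SEEK_SET;
    fl.l_start = 0;
    fl.l_len = 0;                 /* 0 means "through end of file" */
    if (fcntl(in, F_SETLKW, &fl) < 0) {   /* wait until writers are done */
        close(in);
        return -1;
    }

    int out = open(dst, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (out < 0) {
        close(in);                /* closing the fd releases the lock */
        return -1;
    }

    char buf[65536];
    ssize_t n;
    int rc = 0;
    while ((n = read(in, buf, sizeof buf)) > 0) {
        ssize_t off = 0;
        while (off < n) {         /* handle short writes */
            ssize_t w = write(out, buf + off, (size_t)(n - off));
            if (w < 0) { rc = -1; break; }
            off += w;
        }
        if (rc) break;
    }
    if (n < 0) rc = -1;
    if (close(out) < 0) rc = -1;
    close(in);                    /* releases the fcntl() lock */
    return rc;
}
```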

At least in 3.0 you can use cyr_dbtool to checkpoint a database
explicitly if you want to:

sudo -u cyrus cyr_dbtool /var/imap/deliver.db skiplist repack
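Conceptually, a repack replays the appended log over the compact base and keeps only what is still live. Below is a minimal in-memory sketch of that idea. The record kinds, `should_repack()` threshold and `repack()` are illustrative assumptions, not the skiplist on-disk format or its real trigger. A file-backed repack would also write the result to a new file and rename it over the old one, and that I/O is the part tmpfs makes cheap.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical record kinds, loosely named after skiplist's INORDER, ADD
 * and DELETE records.  This is NOT the on-disk skiplist format. */
enum rec_type { REC_INORDER, REC_ADD, REC_DELETE };

struct rec {
    enum rec_type type;
    const char *key;
    const char *val;            /* ignored for REC_DELETE */
};

/* Assumed trigger for illustration: repack once the appended log holds
 * more than min_log records beyond the size of the compact base.  The
 * real threshold is whatever cyrusdb_skiplist.c implements. */
int should_repack(size_t inorder_count, size_t log_count, size_t min_log)
{
    return log_count > inorder_count + min_log;
}

static int cmp_key(const void *a, const void *b)
{
    return strcmp(((const struct rec *)a)->key,
                  ((const struct rec *)b)->key);
}

/* Replay recs[0..n) in order and write the surviving entries, sorted by
 * key and marked REC_INORDER, into out (which must hold n entries).
 * A later ADD replaces an earlier value; a DELETE drops the key.
 * Returns the number of live entries.  O(n^2) lookups, for clarity. */
size_t repack(const struct rec *recs, size_t n, struct rec *out)
{
    size_t live = 0;

    for (size_t i = 0; i < n; i++) {
        size_t j = 0;
        while (j < live && strcmp(out[j].key, recs[i].key) != 0)
            j++;

        if (recs[i].type == REC_DELETE) {
            if (j < live)
                out[j] = out[--live];   /* fill the hole with the last entry */
        } else if (j < live) {
            out[j].val = recs[i].val;   /* newest value wins */
        } else {
            out[live].type = REC_INORDER;
            out[live].key = recs[i].key;
            out[live].val = recs[i].val;
            live++;
        }
    }

    qsort(out, live, sizeof *out, cmp_key);  /* restore key order */
    return live;
}
```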

But you're running 2.3.x, so none of my last 6 years of work are
available to you!

---------------------------------------

What we do at FastMail to make deliver.db not suck is store it on tmpfs.  The repack is tons faster.  Sure, you lose it over a full server restart, but all you lose is the duplicate suppression.  If you wanted to be really clever about it, you could copy the file out during the shutdown script (and maybe once an hour otherwise), then copy it back onto tmpfs during startup.

duplicate_db_path: /var/run/cyrus/duplicate.db

(where /var/run is a tmpfs on our systems)
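The save/restore trick works with any copy that never leaves a half-written file behind. Here is a minimal sketch, assuming you call it from your own init scripts or a cron job. `snapshot_file()` is a hypothetical helper that copies into `dst.tmp`, fsyncs it, and rename()s it over `dst`. Run it tmpfs to disk at shutdown (and hourly, if you like), and disk to tmpfs at startup before Cyrus starts, e.g. `snapshot_file("/var/run/cyrus/duplicate.db", "/var/lib/cyrus/duplicate.db.saved")`, where the second path is purely illustrative.

```c
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Copy src to dst atomically: write "<dst>.tmp", fsync it, then rename()
 * it over dst, so a crash mid-copy never leaves a truncated snapshot.
 * Hypothetical helper for init scripts; not part of Cyrus.
 * Returns 0 on success, -1 on error. */
int snapshot_file(const char *src, const char *dst)
{
    char tmp[4096];
    if (snprintf(tmp, sizeof tmp, "%s.tmp", dst) >= (int)sizeof tmp)
        return -1;                      /* path too long */

    int in = open(src, O_RDONLY);
    if (in < 0) return -1;
    int out = open(tmp, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (out < 0) { close(in); return -1; }

    char buf[65536];
    ssize_t n = 0;
    int rc = 0;
    while (rc == 0 && (n = read(in, buf, sizeof buf)) > 0) {
        for (ssize_t off = 0; off < n; ) {   /* handle short writes */
            ssize_t w = write(out, buf + off, (size_t)(n - off));
            if (w < 0) { rc = -1; break; }
            off += w;
        }
    }
    if (n < 0) rc = -1;
    if (rc == 0 && fsync(out) < 0) rc = -1;  /* data on disk before rename */
    if (close(out) < 0) rc = -1;
    close(in);

    /* tmp sits next to dst, so rename() stays on one filesystem and is atomic */
    if (rc == 0 && rename(tmp, dst) < 0) rc = -1;
    if (rc != 0) unlink(tmp);
    return rc;
}
```

An hourly copy of a live file carries the same no-locking caveat as myarchive() above; the shutdown and startup copies avoid it because nothing has the file open at those points.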

Bron.

-- 
  Bron Gondwana
  brong at fastmail.fm

