Sophia database => fail

Bron Gondwana brong at fastmail.fm
Sat Jun 18 19:20:32 EDT 2016


They're not cross-comparable.  Unless you have an LMDB that runs under Xapian somehow, I'm constrained by the databases which Xapian supports.

Which also means that, for this architecture, I'm constrained by the fact that there's no way to write non-fsyncing transactions to Xapian in such a way that a crash leaves me with a stable old snapshot rather than a broken database.

We're not really using meta any more anyway - we compact directly from tmpfs down to data daily, and repack the data dailies every week.  When data grows to more than about 20% of the size of the archive db, we repack them all together.
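
A rough sketch of that schedule, for illustration only. The function name, the step labels, the choice of which weekday counts as the weekly run, and the rule that the full repack replaces the weekly one are all assumptions made up for this example; only the daily/weekly cadence and the "about 20%" threshold come from the description above.

```python
# Threshold from the "about 20%" figure above; the constant name is hypothetical.
ARCHIVE_RATIO = 0.20

# Hypothetical: treat day 6 (Sunday, if Monday is 0) as the weekly run.
WEEKLY_RUN_DAY = 6


def plan_steps(data_bytes: int, archive_bytes: int, day_of_week: int) -> list:
    """Return the repack steps a scheduled run would perform, in order."""
    # The daily compaction from tmpfs down to the data tier always runs.
    steps = ["compact tmpfs -> data"]
    if data_bytes > ARCHIVE_RATIO * archive_bytes:
        # Data has outgrown its share of the archive: repack everything together.
        steps.append("repack data + archive together")
    elif day_of_week == WEEKLY_RUN_DAY:
        # Otherwise, once a week, repack the accumulated daily data databases.
        steps.append("repack data dailies")
    return steps
```

For example, `plan_steps(30, 100, 2)` would schedule the full repack, because 30 is more than 20% of 100.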

"compact" is two things - it's merging N databases into one (N can be one), and it's also (optionally) an index scan to find dead records and remove them.

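To make the two meanings of "compact" concrete, here is a toy sketch. It is not the Xapian API: each database is modelled as a plain dict, and `compact` and `is_live` are hypothetical names. Letting later sources win on duplicate keys is also just an assumption of the sketch.

```python
from typing import Callable, Dict, Iterable, Optional


def compact(sources: Iterable[Dict[str, str]],
            is_live: Optional[Callable[[str], bool]] = None) -> Dict[str, str]:
    """Merge N toy databases into one, optionally dropping dead records."""
    merged: Dict[str, str] = {}
    for db in sources:  # N can be one
        for key, value in db.items():
            # The optional scan: skip records that the caller says are dead.
            if is_live is not None and not is_live(key):
                continue
            merged[key] = value
    return merged
```

With a single source and an `is_live` filter this is a pure cleanup pass; with several sources and no filter it is a pure merge.
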
But anyway, that's Xapian, and I'm not so concerned about Xapian except that I'd probably store it on object storage or change it out for some Elasticsearch cluster in my theoretical local-cache-only design.

Bron.

On Sun, Jun 19, 2016, at 08:59, Howard Chu wrote:
> Howard Chu via Cyrus-devel wrote:
> > Bron Gondwana wrote:
> >> A good example of what I want is the way that the xapianactive file works
> >> in Cyrus search at FastMail:
> >>
> >> https://blog.fastmail.com/2014/12/01/email-search-system/
> >>
> >> Because only the most recent database is writable (in this case on tmpfs,
> >> because we don't need 100% reliability for search, it only takes about 20
> >> minutes to scan every mailbox and reindex the stuff that was on tmpfs after
> >> a crash)
> 
> Also, since you're using tmpfs, this in-memory benchmark is relevant.
> http://lmdb.tech/bench/inmem/
> >>
> >> Every other database is read-only - and you can compact multiple of them
> >> together into a single database and then atomically switch the old ones out
> >> and the new one in with a single very quick xapianactive rewrite - so it's
> >> acceptable to stop the world while doing that.
> >
> > This sounds like a lot of bother, particularly the bit about "checking if
> > tmpfs is full". It's also a bit confusing because you talk about "compacting"
> > which I interpret as "cleaning out empty/unused space inside a DB" but in
> > context it sounds like you really mean "merging" - combining multiple DBs into
> > a single DB.
> >
> > If I were building this system with LMDB there would be no separate temp and
> > meta tiers. LMDB would just mmap the DB on the SSD and let the OS buffer cache
> > keep the hot pages in RAM. I'm not really sure I'd bother with multiple DBs
> > either, there's nothing to compact. The data tier would be no different from
> > the meta tier.
> >
> > When you say you can reindex the stuff on tmpfs quickly, that means you're
> > only reindexing the most recent N emails?
> >
> 
> 
> -- 
>    -- Howard Chu
>    CTO, Symas Corp.           http://www.symas.com
>    Director, Highland Sun     http://highlandsun.com/hyc/
>    Chief Architect, OpenLDAP  http://www.openldap.org/project/


-- 
  Bron Gondwana
  brong at fastmail.fm

