Cyrus 2.5, xapian, Sphinx and index sizes

Bron Gondwana brong at fastmail.fm
Wed Sep 24 18:50:42 EDT 2014


On Thu, Sep 25, 2014, at 12:23 AM, Sebastian Hagedorn wrote:
> > sdb1-20 are LUKS encrypted partitions on a single hardware RAID6 volume
> > with 12 x 2Tb WD RE4 drives. md2 is also LUKS encrypted, but it's a
> > software RAID1e with 3 x 2Tb WD RE4 drives. md1 is 400Gb Intel DC3700
> > drives in software RAID1.  It's not using LUKS because the drives support
> > encryption on-disk, so we're using that.
> 
> What do you use LUKS for? My best guess would be to make it easier to toss 
> out broken drives without having to worry about personal data remaining on 
> them?

Absolutely.  We're running in remote datacentres, and it's so much easier to know that when we shut a machine down, there's no risk to personal data.  Likewise, we can send disks back for RMA without worrying about what's on them.  Our servers aren't CPU-bound, so the encryption overhead doesn't hurt performance significantly, and of course the SSDs don't need LUKS because they encrypt on-disk.

> > So how do we structure our search?  It's complicated.  There are 4
> > "tiers" of storage.  The first tier is tmpfs, the second is ssd (it's not
> > used much though), the third is on the search partition, and the 4th is
> > ALSO on the search partition, but it's there for archive purposes, so we
> > can compact most of the long-term search down to a single index without
> > having to rewrite it every week.
> 
> So you only use fast storage for writing? Isn't there a big performance hit 
> for searches on the data and archive partitions? I wonder why you don't use 
> SSDs for those.

Cost, of course.

It's tolerable, because there's never more than one thread writing to the disk, and searches are much rarer than updates.  We might reconsider this when the cost equation changes, but it's still so much better than what we had before that a 1-2 second wait, even for quite big (multi-gigabyte) accounts, is acceptable.
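
For anyone trying to picture the tiering, here is a rough toy model in Python.  It is only a sketch under assumptions, not how Cyrus or Xapian implement it: the tier names ("temp", "meta", "data", "archive") are labels chosen for this illustration, and the TieredIndex class and its methods are hypothetical.  It shows the three behaviours described in this thread: new mail is indexed only into the fastest tier, a search has to consult every tier, and compaction merges upper tiers down into a slower one.

```python
from dataclasses import dataclass, field

# Tier labels are assumptions for this sketch, ordered fastest to slowest:
# tmpfs, SSD, search partition, archive area on the search partition.
TIER_ORDER = ["temp", "meta", "data", "archive"]


@dataclass
class TieredIndex:
    # Each tier maps a term to the set of message ids containing it.
    tiers: dict = field(default_factory=lambda: {t: {} for t in TIER_ORDER})

    def add(self, msg_id, text):
        """Index new mail into the fastest tier only."""
        for term in text.lower().split():
            self.tiers["temp"].setdefault(term, set()).add(msg_id)

    def search(self, term):
        """Consult every tier and merge the hits."""
        hits = set()
        for name in TIER_ORDER:
            hits |= self.tiers[name].get(term.lower(), set())
        return hits

    def compact(self, sources, dest):
        """Merge the source tiers into dest and empty the sources."""
        merged = self.tiers[dest]
        for name in sources:
            if name == dest:
                continue  # never wipe the tier we are merging into
            for term, ids in self.tiers[name].items():
                merged.setdefault(term, set()).update(ids)
            self.tiers[name] = {}


idx = TieredIndex()
idx.add(1, "quarterly report attached")
idx.compact(["temp", "meta"], "data")   # periodic compaction to disk
idx.add(2, "report follow-up")          # new mail lands in tmpfs again
print(sorted(idx.search("report")))     # prints [1, 2]
```

The point of the toy is the cost trade-off: every search pays for a lookup in the slow tiers, but writes only ever touch the fast one, which is acceptable when searches are much rarer than updates.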

> > I'll attach the xapian_compact.pl script to this email.
> 
> Why is there no job for archiving? You don't really do that manually, I 
> suppose?

It happens automatically whenever we move a user to a new server, or in a few other situations.  Otherwise it's at least semi-manual.  The next time will probably be when I do a full re-index of everyone, once we add support for a ton of character sets that are frequent enough in email that we should be converting and indexing them.

Bron.

-- 
  Bron Gondwana
  brong at fastmail.fm

