LARGE single-system Cyrus installs?

Mon Nov 19 05:30:31 EST 2007

On Mon, Nov 19, 2007 at 08:50:16AM +0000, Ian G Batten wrote:
>
> On 17 Nov 07, at 0909, Rob Mueller wrote:
>>
>> This shouldn't really be a problem. Yes, the whole file is locked for the
>> duration of the write; however, there should be only 1 fsync per
>> "transaction", which is what would introduce any latency. The actual
>> writes to the db file itself should be basically instant, as the OS
>> should just cache them.
>
> One thing that's worth noting for ZFS-ites is that on ZFS, you can have 
> multiple writer threads in a file simultaneously, which UFS can only do for 
> directio under certain conditions I can't recall.  That's a win for 
> overlapping transactions into a file-based database.   We're not hitting 
> mailboxes.db remotely rapidly enough for this to be an issue, but I can 
> imagine it being so for big shops.
>
> In production releases of ZFS, fsync() essentially triggers sync() (fixed in
> Solaris Next).  So if you anticipate a lot of writes (and hence fsync()s)
> to mailboxes.db, then you don't want mailboxes.db in the same ZFS filesystem
> as things with lots of un-sync'd writes going on.  I've broken up
> /var/imap for ease of taking and rolling back snapshots, but it has the
> handy side-effect of isolating deliver.db and mailboxes.db from all the
> metadata partitions.

Skiplist requires two fsync calls per transaction (a single
untransactioned action also counts as one transaction), and
it also locks the entire file for the duration of that
transaction, so you can't have two writes happening at
once.  I haven't built Cyrus on our Solaris box, so I don't
know whether it uses fcntl there.  It certainly does on the
Linux systems, and it can fall back to flock if fcntl isn't
available.
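To make that cost concrete, here is a minimal C sketch of a commit that holds an exclusive whole-file lock and issues two fsync() calls. It is not Cyrus's skiplist code: the record layout, the "COMMIT" marker and the function names are hypothetical. It locks the entire file with fcntl(F_SETLKW) and l_len = 0, and falls back to flock() at compile time when fcntl locking isn't available. The first fsync makes the record durable before the commit marker is written; the second makes the marker durable, so any other writer waits through both disk flushes. Note that fcntl locks are held per process, so they serialize separate Cyrus processes rather than threads within one process.

```c
#define _DEFAULT_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/file.h>
#include <unistd.h>

/* Take an exclusive lock on the whole file, blocking until it is free. */
static int lock_whole_file(int fd)
{
#if defined(F_SETLKW)
    struct flock fl;
    memset(&fl, 0, sizeof fl);
    fl.l_type = F_WRLCK;      /* exclusive (write) lock */
    fl.l_whence = SEEK_SET;
    fl.l_start = 0;
    fl.l_len = 0;             /* 0 means "through end of file": whole file */
    return fcntl(fd, F_SETLKW, &fl);
#else
    return flock(fd, LOCK_EX); /* fallback when fcntl locks are missing */
#endif
}

static int unlock_whole_file(int fd)
{
#if defined(F_SETLKW)
    struct flock fl;
    memset(&fl, 0, sizeof fl);
    fl.l_type = F_UNLCK;      /* l_start = l_len = 0: release whole file */
    fl.l_whence = SEEK_SET;
    return fcntl(fd, F_SETLK, &fl);
#else
    return flock(fd, LOCK_UN);
#endif
}

/* Hypothetical commit: append a record, then a commit marker,
 * all under one whole-file lock, with one fsync after each write. */
static int commit_record(int fd, const char *rec, size_t len)
{
    static const char marker[] = "COMMIT\n"; /* hypothetical marker */
    const size_t mlen = sizeof marker - 1;
    int rc = -1;

    if (lock_whole_file(fd) < 0)
        return -1;
    /* Any other process trying to write now blocks in lock_whole_file(). */
    if (lseek(fd, 0, SEEK_END) < 0) goto done;
    if (write(fd, rec, len) != (ssize_t)len) goto done;
    if (fsync(fd) < 0) goto done;              /* fsync #1: record durable */
    if (write(fd, marker, mlen) != (ssize_t)mlen) goto done;
    if (fsync(fd) < 0) goto done;              /* fsync #2: commit durable */
    rc = 0;
done:
    unlock_whole_file(fd);
    return rc;
}
```

Calling commit_record(fd, "user.bron\n", 10) on an open descriptor appends the record and the marker; a second process calling it at the same time blocks until the first has finished both fsyncs.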

> In my darker moments, by the way, I'm tempted to put deliver.db into
> tmpfs.  For a planned reboot I could copy it somewhere stable, and I
> could periodically dump it out to disk.  But if I lost it, the
> consequences aren't serious, and it's most of the write load through
> that particular filesystem.

Sounds pretty reasonable to me.
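The tmpfs idea depends on getting a usable copy of deliver.db back onto stable storage. A minimal C sketch of one way to do the periodic dump, assuming the snapshot and its temp file live on the same disk-backed filesystem: copy to a temp file, fsync it, then rename() it over the previous snapshot, so a crash mid-dump leaves the old snapshot intact. The function name and paths are hypothetical, and the sketch does not coordinate with Cyrus at all, so it assumes delivery is paused or that an occasionally inconsistent copy is acceptable (in line with the point that losing deliver.db isn't serious).

```c
#define _DEFAULT_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: copy src to tmp, fsync tmp, then atomically
 * rename tmp over dst. tmp and dst must be on the same filesystem. */
static int snapshot_file(const char *src, const char *dst, const char *tmp)
{
    char buf[65536];
    ssize_t n;
    int rc = -1;

    int in = open(src, O_RDONLY);
    if (in < 0)
        return -1;
    int out = open(tmp, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (out < 0) {
        close(in);
        return -1;
    }

    while ((n = read(in, buf, sizeof buf)) > 0) {
        char *p = buf;
        while (n > 0) {                 /* handle short writes */
            ssize_t w = write(out, p, (size_t)n);
            if (w < 0) goto done;
            p += w;
            n -= w;
        }
    }
    if (n < 0) goto done;               /* read error */
    if (fsync(out) < 0) goto done;      /* data on disk before the rename */
    rc = 0;
done:
    close(in);
    if (close(out) < 0)
        rc = -1;
    if (rc == 0 && rename(tmp, dst) < 0) /* replaces dst atomically */
        rc = -1;
    if (rc != 0)
        unlink(tmp);                    /* never leave a half-written temp */
    return rc;
}
```

A cron job might call snapshot_file("/tmpfs/deliver.db", "/var/imap/deliver.db.saved", "/var/imap/deliver.db.saved.tmp"), where all three paths are made up for illustration. For the rename itself to survive a crash you would also fsync the containing directory, which this sketch skips.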

>>
>> Still, you have a point that mailboxes.db is a global point of contention,
>> and it is accessed a lot, so blocking all processes on it for a write could
>> be an issue.
>>
>> Which makes me even more glad that we've split up our servers into lots of
>> small cyrus instances, even fewer points of contention...

Yeah, it's nice.  It's a pain that the entire mailboxes.db blocks
on writes, but it sure keeps the skiplist format simple.  I'd be
interested to see if there are cases where a transaction is kept
open longer than it needs to be, though.

Bron.