ReiserFS and general Cyrus filesystem usage information - was Re: best filesystem for imap server

Rob Mueller robm at fastmail.fm
Thu Dec 2 17:14:34 EST 2004


> Ordered would be best for a Cyrus spool, and I guess Data would be best on
> MTAs (when they have a small enough queue lifetime for most messages, and
> the journal is large enough).

I'd probably just test both and see which one gives you better
performance. We actually found that data=journal performed better, but we
never worked out exactly why. This seems to be another case of most
benchmarks != real world!
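If you do want to test, one crude starting point is to time a run of small fsync()ed file writes under each data= mode in turn, remounting between runs. The C below is a minimal sketch that assumes a delivery looks roughly like "create file, write, fsync, close". `time_fsync_deliveries` and the `fsbench-N` file names are hypothetical and are not taken from Cyrus. As noted above, numbers like these may not match real-world load.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper, not part of Cyrus: create `count` files of `size`
 * bytes in `dir`, fsync()ing each one the way a durable delivery would.
 * Returns elapsed wall-clock seconds, or -1.0 on any error. */
double time_fsync_deliveries(const char *dir, int count, size_t size)
{
    char path[4096];
    char buf[4096];
    struct timespec start, end;

    assert(size <= sizeof buf);
    memset(buf, 'x', size);

    clock_gettime(CLOCK_MONOTONIC, &start);
    for (int i = 0; i < count; i++) {
        snprintf(path, sizeof path, "%s/fsbench-%d", dir, i);
        int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
        if (fd < 0)
            return -1.0;
        /* fsync() makes each file durable before the next one is written */
        if (write(fd, buf, size) != (ssize_t)size || fsync(fd) != 0) {
            close(fd);
            return -1.0;
        }
        if (close(fd) != 0)
            return -1.0;
    }
    clock_gettime(CLOCK_MONOTONIC, &end);

    return (double)(end.tv_sec - start.tv_sec) +
           (double)(end.tv_nsec - start.tv_nsec) / 1e9;
}
```

Call it once per data= mode against a scratch directory on the filesystem under test, e.g. time_fsync_deliveries("/var/spool/imap/bench", 1000, 4096) (a hypothetical path), and treat the result as a hint rather than a verdict.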

> Indeed. Although why mailboxes.db (when using the BDB backend, anyway) has
> so much IO I have no idea.  Once read, BDB should be doing IPC to fetch it
> from in-memory cache, not thrashing the disk.  Unless writes to
> mailboxes.db are very common.

This was a skiplist mailboxes.db. But again, there were 3 things on the
NVRAM drive:
1. skiplist mailboxes.db
2. skiplist .seen files
3. quota files

I don't have a breakdown of which of those was causing the most IO load,
but it's quite possible (even probable) that it wasn't mailboxes.db; the
other 2 sets of files get a LOT of writes (and that was even with noatime
and nodiratime set, which are definitely filesystem options you should be
using).
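Since noatime and nodiratime are easy to miss on one mount, a quick check can help. This is a minimal sketch using glibc's setmntent()/getmntent()/hasmntopt() from <mntent.h>. `report_atime_mounts` is a hypothetical name, and it only sees the options listed in the table you hand it (for example /proc/mounts or /etc/fstab).

```c
#define _GNU_SOURCE
#include <assert.h>
#include <mntent.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: scan an fstab/mtab-format table (e.g. /proc/mounts)
 * and print each mount of type `fstype` lacking noatime or nodiratime.
 * Returns how many such mounts were found, or -1 if the table can't be read. */
int report_atime_mounts(const char *table, const char *fstype)
{
    struct mntent *m;
    int missing = 0;

    assert(table != NULL && fstype != NULL);
    FILE *fp = setmntent(table, "r");
    if (fp == NULL)
        return -1;

    while ((m = getmntent(fp)) != NULL) {
        if (strcmp(m->mnt_type, fstype) != 0)
            continue;
        /* hasmntopt() matches whole options, not arbitrary substrings */
        int noatime = hasmntopt(m, "noatime") != NULL;
        int nodiratime = hasmntopt(m, "nodiratime") != NULL;
        if (!noatime || !nodiratime) {
            printf("%s on %s: noatime=%s nodiratime=%s\n",
                   m->mnt_fsname, m->mnt_dir,
                   noatime ? "yes" : "no", nodiratime ? "yes" : "no");
            missing++;
        }
    }
    endmntent(fp);
    return missing;
}
```

For example, report_atime_mounts("/proc/mounts", "reiserfs") prints each reiserfs mount missing either option and returns how many it found.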

> No doubts about that one (since we're talking about an NVRAM drive here).
> I wonder if it is such a great idea on a slow device (disk), though.  Do you
> have this data?

No. I did once notice that lots of stat() calls are several times slower on
HDs with the tails option on, so we turn tails off (the notail mount option)
on all our HDs.
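One rough way to compare stat() cost with and without tails is to time an lstat() over every entry of a directory, as a scan of a mailbox directory might do. This is a minimal sketch, not the measurement we actually ran. `mean_stat_usec` is a hypothetical name, and the timing also includes readdir() overhead.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <dirent.h>
#include <stdio.h>
#include <sys/stat.h>
#include <time.h>

/* Hypothetical helper: lstat() every entry of `dir` and return the mean
 * time per entry in microseconds (0.0 if the directory yields no entries),
 * or -1.0 on error.  The timing includes readdir() overhead. */
double mean_stat_usec(const char *dir)
{
    char path[4096];
    struct stat st;
    struct timespec start, end;
    struct dirent *de;
    long calls = 0;

    assert(dir != NULL);
    DIR *d = opendir(dir);
    if (d == NULL)
        return -1.0;

    clock_gettime(CLOCK_MONOTONIC, &start);
    while ((de = readdir(d)) != NULL) {
        snprintf(path, sizeof path, "%s/%s", dir, de->d_name);
        if (lstat(path, &st) != 0) {
            closedir(d);
            return -1.0;
        }
        calls++;
    }
    clock_gettime(CLOCK_MONOTONIC, &end);
    closedir(d);

    if (calls == 0)
        return 0.0;
    double usec = (double)(end.tv_sec - start.tv_sec) * 1e6 +
                  (double)(end.tv_nsec - start.tv_nsec) / 1e3;
    return usec / (double)calls;
}
```

The dentry and inode caches will absorb repeated runs, so compare cold-cache numbers (for example right after a fresh mount) on the same data mounted with and without notail.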

> You mean the patches on the threads, or patches available somewhere else
> (where?)

Check the kernel mailing list for the ext3 one; I think it is, or soon will
be, in the 2.6 mainline.

For reiserfs, Vladimir Saveliev from Namesys told us: "The exampled scenario
of deadlock happens when user buffer is prepared by mmap(2)-ing a file to
which we are to write(2). Suggested patch in fs/reiserfs/:"

--- file.c~     2004-10-02 12:29:33.223660850 +0400
+++ file.c      2004-10-08 10:03:03.001561661 +0400
@@ -1137,6 +1137,8 @@
        return result;
     }

+    return generic_file_write(file, buf, count, ppos);
+
     if ( unlikely((ssize_t) count < 0 ))
         return -EINVAL;


We've applied this and haven't had a problem since.
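The access pattern Vladimir describes, a write(2) whose source buffer is an mmap(2) of the destination file, can be exercised with a few lines of C. This is a minimal sketch under assumptions: `write_from_own_mapping` is a hypothetical name, and it is not certain that this exact sequence triggers the unpatched reiserfs deadlock (on an affected kernel it may hang). On a filesystem without the bug it should simply return the page size.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: create `path` as one zero-filled page, map that page
 * read-only, and pwrite() the still-unfaulted mapping into the same file
 * just past the mapped page.  Returns bytes written, or -1 on error. */
ssize_t write_from_own_mapping(const char *path)
{
    long page = sysconf(_SC_PAGESIZE);
    assert(page > 0);

    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;
    if (ftruncate(fd, page) != 0) {
        close(fd);
        return -1;
    }

    const char *map = mmap(NULL, (size_t)page, PROT_READ, MAP_SHARED, fd, 0);
    if (map == MAP_FAILED) {
        close(fd);
        return -1;
    }

    /* The source buffer is backed by the file being written, so copying it
     * in the write path has to fault the page in from that same file. */
    ssize_t n = pwrite(fd, map, (size_t)page, (off_t)page);

    munmap((void *)map, (size_t)page);
    close(fd);
    return n;
}
```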

> How stable and stress-tested is data=ordered? and what about the full
> journalling (which might be a good thing on MTAs)?

It seems well tested to me. We haven't seen any problems at all, and we do a
lot of IO...

Rob

---
Cyrus Home Page: http://asg.web.cmu.edu/cyrus
Cyrus Wiki/FAQ: http://cyruswiki.andrew.cmu.edu
List Archives/Info: http://asg.web.cmu.edu/cyrus/mailing-list.html




More information about the Info-cyrus mailing list