Is there a limit to the number of mailboxes in cyrus

Rob Mueller robm at fastmail.fm
Thu Sep 8 19:29:40 EDT 2005


>>>FWIW, I've experimented with 750k mailboxes on a single system with 8GB 
>>>RAM and we
>>>plan to put that number in production in a couple of months here.
>>
>> Ouch, 750k?  How many concurrent accesses?
>>
>
> We currently have 1.6M, 1.2M and 940k mailboxes in 3 boxes with fiber to a 
> single emc storage, all boxes dual Xeon 3.4Ghz EMT64T with 4G.

We tend to have quite large mailbox lists, but not as large as this. The 
biggest issues we've found with large mailbox lists are:

1. Number of concurrent connections.

If you support/encourage IMAP usage, you tend to end up with quite a few 
more connections than with POP. Although IMAP connections can technically be 
very long lived, we find there are lots of short ones (mostly due to clients 
like Outlook Express, which on a "sync" pass does a logout and login for 
each *folder* in a user's account!) along with some long ones. With about 
650,000 folders on one machine (about 130,000 users), at peak times we see 
around 3500 imapd processes. We run Linux 2.6 and find that this is about 
the maximum number of processes we want to have. Although the kernel is just 
about O(1) for everything these days, there does seem to be an elbow point 
around the 5000-process mark where latency and load averages on the server 
start to climb.

2. Size of mailboxes.db file

With a large mailbox list, you probably want to use the skiplist format for 
mailboxes.db. Part of the skiplist implementation, however, is that the 
entire file is mmap'ed into memory. While this is generally fine, since 
every process shares the same file-backed mapping, with really large 
mailboxes.db files you can end up with huge page tables.

For instance, the mailboxes.db for the 650,000 folders above is about 100M 
in size. With 4k pages, that means each process needs 25,600 pages just to 
mmap that file into its address space. If you have more than 4GB of RAM, you 
have to use x86_64 or PAE mode in Linux, and both of these mean that each 
page requires a 64-bit page table entry (8 bytes). If you have 3500 
processes, then...

3500 * 25600 * 8 = 716800000 = 683M

Yes, that's about 700M of memory just to hold the page table entries mapping 
that file in all your processes, with no actual data in it at all!!! It also 
means that on 32-bit PAE kernels you MUST enable the highmem page-table 
option (CONFIG_HIGHPTE), or else you'll see a lot of low-memory pressure.
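
To redo that arithmetic for your own setup, here's a back-of-the-envelope 
sketch in Python (the figures are just the ones from the example above; plug 
in your own db size and process count):

    # Rough page-table overhead of mmap'ing mailboxes.db in every imapd
    # process. Figures match the example above; substitute your own.
    db_size   = 100 * 1024 * 1024   # mailboxes.db is ~100M
    page_size = 4096                # 4k pages
    pte_size  = 8                   # 64-bit PTE under PAE or x86_64
    processes = 3500                # imapd processes at peak

    pages_per_process = db_size // page_size            # 25,600 pages
    overhead = processes * pages_per_process * pte_size

    print(f"{overhead} bytes (~{overhead // 2**20}M) of page table entries")
    # -> 716800000 bytes (~683M)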

3. IO

CPU isn't an issue; IO definitely is. Cyrus uses minimal CPU on today's 
hardware, but it is still an IO hog. That's part of the reason we sponsored 
the meta-data split patches that have gone into 2.3, so that you can 
separate the email store and the cyrus.* meta files onto different 
partitions/spindles to improve overall performance. Where possible, split 
out:

user.seen state files
quota files
cyrus.* files
email spool files

onto separate spindles/partitions. At least that way you'll be able to use 
something like "iostat -p ALL 120" to see which parts of your system are 
generating the most IO.
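
In case it's useful, here's a rough imapd.conf sketch of that kind of split 
under 2.3 (the option names and the metapartition file list are from memory, 
so treat them as assumptions and check the imapd.conf man page for your 
version before using them):

    # imapd.conf sketch -- untested, option names assumed, verify against
    # the imapd.conf(5) man page for your Cyrus version
    configdirectory: /var/imap          # seen state, quota files, mailboxes.db
    mboxlist_db: skiplist               # skiplist mailboxes.db, as in point 2

    # mail spool on its own spindle
    partition-default: /vol/spool/imap

    # cyrus.* meta files on a separate spindle (the 2.3 meta-data split)
    metapartition-default: /vol/meta/imap
    metapartition_files: header index cache expunge squat

The paths here are made up; the point is just that the spool, the meta files 
and the config directory (which holds the seen state and quota files) can 
each sit on their own spindle.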

Rob



