Is there a limit to the number of mailboxes in Cyrus?
Rob Mueller
robm at fastmail.fm
Thu Sep 8 19:29:40 EDT 2005
>>>FWIW, I've experimented with 750k mailboxes on a single system with 8GB
>>>RAM and we plan to put that number in production in a couple of months
>>>here.
>>
>> Ouch, 750k? How many concurrent accesses?
>>
>
> We currently have 1.6M, 1.2M and 940k mailboxes on 3 boxes with fiber to a
> single EMC storage array, all boxes dual Xeon 3.4GHz EM64T with 4GB.
We tend to have quite large mailbox lists, but not as large as this. The
biggest issues we've found with large mailbox lists are:
1. Number of concurrent connections.
If you support/encourage IMAP usage, then you tend to end up with quite a
few more connections than with POP. Although IMAP connections can
technically be very long-lived, we find there are lots of short connections
(mostly due to clients like Outlook Express, which during a "sync" pass does
a logout and login for each *folder* in a user's account!) along with some
long ones. With about 650,000 folders on one machine (about 130,000 users),
at peak times we see about 3500 imapd processes. We run Linux 2.6 and find
that's about the most processes we want on a box. Although the kernel is
just about O(1) for everything these days, there does seem to be an elbow
point around the 5000 process mark where latency and load averages on the
server start to climb.
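For a rough feel of the ratios involved, here's some illustrative
arithmetic based on the figures above (the projected user count is a
made-up example, not a measurement):

    # Rough concurrency arithmetic from the figures quoted above.
    users = 130000
    peak_imapd = 3500
    concurrency = peak_imapd / users       # ~2.7% of users at peak
    print("peak concurrency: %.1f%%" % (concurrency * 100))

    # Project the same ratio onto a hypothetical larger user base:
    projected_users = 200000
    print("projected peak imapd: %d" % (projected_users * concurrency))
    # ~5384, past the ~5000-process elbow point described above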
2. Size of mailboxes.db file
With a large mailboxes.db, you probably want to use the skiplist format.
Part of the skiplist implementation, however, is that the entire file is
mmap'ed into memory. While this is generally fine, since every process
shares the same file-backed mapping, with really large mailboxes.db files
you can end up with huge page tables.
For instance, the mailboxes.db for the 650,000 folders above is about 100M
in size. With pages being 4k each, that means each process needs 25,600
page table entries just to map that file into its address space. If you
have > 4GB of RAM, you have to use x86_64 or PAE mode in Linux, and either
way each page requires a 64-bit page table entry (8 bytes). If you have
3500 processes, then:
3500 * 25600 * 8 = 716,800,000 bytes = ~683M
Yes, that's nearly 700M of memory just to hold the page tables mapping that
one file across all your processes, with no actual data in it at all! It
also means that on a 32-bit PAE kernel you MUST enable CONFIG_HIGHPTE, or
else you'll have lots of low-memory pressure.
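If you want to replay that arithmetic with your own numbers, here's a
small Python sketch (the file size, page size and process count are the
figures from this post; swap in your own):

    # Page table cost of N processes mmap'ing the same file. The file's
    # pages are shared, but each process keeps its own page table
    # entries for the mapping.
    PAGE_SIZE = 4096      # bytes; standard x86 page
    PTE_SIZE = 8          # bytes per entry under x86_64 or PAE

    def mmap_pte_overhead(file_bytes, processes):
        pages = file_bytes // PAGE_SIZE
        return pages * PTE_SIZE * processes

    overhead = mmap_pte_overhead(100 * 1024 * 1024, 3500)
    print(overhead // 2**20, "MB of page tables")   # 683 MB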
3. IO
CPU isn't an issue; IO definitely is. Cyrus uses minimal CPU on today's
hardware, but it is still an IO hog. That's part of the reason we sponsored
the meta-data split patches that have gone into 2.3, so that you can
separate the email store and the cyrus.* files onto different
partitions/spindles to improve overall performance. Where possible, split
out:
user.seen state files
quota files
cyrus.* files
email spool files
onto separate spindles/partitions (see the configuration sketch below). At
least that way you'll be able to use something like "iostat -p ALL 120" to
see which parts of your system are generating the most IO.
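As a rough sketch of what that split can look like in a 2.3-style
imapd.conf (the paths here are made up for illustration; check the
imapd.conf man page for the exact option names and the file classes your
version supports):

    # imapd.conf: illustrative partition layout, one spindle each
    configdirectory: /var/imap
    partition-default: /var/spool/imap           # email spool files
    # metadata split from the 2.3 patches: put cyrus.* files elsewhere
    metapartition-default: /var/spool/imap-meta
    metapartition_files: header index cache expunge squat

The seen-state and quota files live under configdirectory, so splitting
those out typically means mounting (or symlinking) the user/ and quota/
subdirectories onto their own spindles.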
Rob