ctl_mboxlist virtual memory exhausted !

Bron Gondwana brong at fastmail.fm
Tue Mar 25 18:27:46 EDT 2008


On Tue, 25 Mar 2008 09:31:40 +0100, "Brasseur Valery" <Valery.Brasseur at atosorigin.com> said:
> Hi,
> 
> I am running 2.3.11, with 2 million users (4M mailboxes ;-)
> 
> When trying to do a ctl_mboxlist -m, after some time (a few seconds!) I
> get a "virtual memory exhausted", and I can see that the process is
> allocating more than 3GB of memory!

Ouch.  That hurts.

> Has anyone else encountered this?
> Is there any way to work around it?

We split our Cyrus instance up into 300GB data partitions.  We currently have
56 stores (112 partitions thanks to replication).  Obviously you need infrastructure
to manage this, and some form of frontend proxy to direct user logins to the correct
store (we use nginx).  Further, any users who need to share mailboxes must be on the
same store.
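
To illustrate the routing decision such a frontend has to make, here is a minimal
sketch in Python.  It is not our actual nginx setup: the user names, store names,
hostnames and both lookup tables are hypothetical, and a real deployment would keep
the mapping in a database that the proxy queries at login time.

```python
# Hypothetical user-to-store directory (illustration only).
STORE_OF_USER = {
    "alice@example.com": "store17",
    "bob@example.com": "store17",    # shares mailboxes with alice
    "carol@example.com": "store42",
}

# Hypothetical store-to-backend mapping: (host, IMAP port).
BACKEND_OF_STORE = {
    "store17": ("imap17.internal.example.com", 143),
    "store42": ("imap42.internal.example.com", 143),
}


def backend_for(user):
    """Return the (host, port) of the backend that holds this user's mailboxes."""
    return BACKEND_OF_STORE[STORE_OF_USER[user]]


def can_share(user_a, user_b):
    """Shared mailboxes only work when both users live on the same store."""
    return STORE_OF_USER[user_a] == STORE_OF_USER[user_b]
```

The proxy calls something like backend_for() once per login and forwards the
connection; can_share() is the check you would run before placing or moving a user.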

Still, things are a lot faster when your average mailboxes.db is only 4.5MB in size
(having just checked the one for the store my mailbox is on).

> I also get a lot of skiplist corruption when mailboxes.db is around
> 700MB, and the mupdate process uses 100% of the CPU when that
> happens!
> Any ideas are welcome!

Are you using:

http://cyrus.brong.fastmail.fm/patches/cyrus-skiplist-safelock-2.3.11.diff
http://cyrus.brong.fastmail.fm/patches/cyrus-skiplist-state-2.3.11.diff
http://cyrus.brong.fastmail.fm/patches/cyrus-skiplist-transactions-2.3.10.diff

Finally, are you running a 32-bit operating system?  With a 700MB mailboxes.db
being mmapped, you might be pushing close to the available process address space.
Running a 64-bit kernel would probably help a lot there (you will of course need
to have 64-bit hardware!)
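
As a rough illustration of the address-space arithmetic, here is a small Python
sketch.  The 3 GiB figure for usable user address space is an assumption (a common
split on 32-bit Linux; other systems differ), and the helper names are made up
for this example.

```python
import struct

GIB = 1024 ** 3
MIB = 1024 ** 2


def pointer_bits():
    """Width of a native pointer in this Python build (32 or 64 on common platforms)."""
    return struct.calcsize("P") * 8


def map_fraction(file_bytes, usable_bytes):
    """Fraction of the usable address space that one mapping of the file would take."""
    return file_bytes / usable_bytes


# Assumption: about 3 GiB of user address space for a 32-bit process.
USABLE_32BIT = 3 * GIB

fraction = map_fraction(700 * MIB, USABLE_32BIT)
print(f"{pointer_bits()}-bit build; one 700 MiB map takes {fraction:.0%} of 3 GiB")
```

Under that assumption a single mapping already takes a bit under a quarter of the
space, and it has to be contiguous, so heap growth and any second mapping of the
file can run the process out of room well before physical memory is the problem.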

I would seriously recommend against having 2 million users all on one machine
on disaster recovery grounds.  It takes far too long to copy that much data
onto modern drives, so if you lose your drive unit then getting users back up
and running means about a week of sitting there watching data copy.  Yes,
I have done that before.  That's why we now run partitions that can rebuild
from scratch in 6 hours.
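
The arithmetic behind those numbers can be sketched as follows.  This assumes the
rebuild time is dominated by a sustained copy rate, which is a simplification, and
the function names are invented for the example.

```python
GB = 10 ** 9


def implied_rate(data_bytes, hours):
    """Sustained bytes per second needed to move data_bytes in the given hours."""
    return data_bytes / (hours * 3600.0)


def copy_hours(data_bytes, bytes_per_second):
    """Hours needed to move data_bytes at a sustained rate."""
    return data_bytes / bytes_per_second / 3600.0


# A 300 GB partition rebuilt in 6 hours implies roughly 13.9 MB/s sustained.
rate = implied_rate(300 * GB, 6)

# At that same (assumed) rate, a week of copying moves 28 times as much data.
week_bytes = rate * 7 * 24 * 3600

print(f"{rate / 1e6:.1f} MB/s; one week at that rate is {week_bytes / 1e12:.1f} TB")
```

The point is only that recovery time scales linearly with the amount of data behind
a single failure, so smaller partitions put a hard cap on how long users are down.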

Bron.
-- 
  Bron Gondwana
  brong at fastmail.fm


