LARGE single-system Cyrus installs?
Rob Mueller
robm at fastmail.fm
Thu Oct 4 19:32:52 EDT 2007
> Anyhow, just wondering if we're the lone rangers on this particular
> edge of the envelope. We alleviated the problem short-term by
> recycling some V240 class systems with arrays into Cyrus boxes
> with about 3,500 users each, and brought our 2 big Cyrus units
> down to 13K-14K users each which seems to work okay.
FastMail has many hundreds of thousands of users in a fully replicated setup
spread across 10 backend servers (plus separate MX/spam/web/frontend servers).
We use IBM servers with some off-the-shelf SATA-to-SCSI RAID DAS (e.g.
http://www.areasys.com/area.aspx?m=PSS-6120). Hardware will die at some
stage; that's what replication is for.
Over the years we've tuned a number of things to get the best possible
performance. The biggest things we found:
1. Using the status cache was a big win for us
I did some analysis at one stage and found that most IMAP clients issue
STATUS calls to every mailbox a user has on a regular basis (every 5 minutes
or so on average, though users can usually change the interval) so they can
update the unread count on every mailbox. The default STATUS implementation
has to iterate over the entire cyrus.index file to get the unread count.
Although cyrus.index is the smallest file, with tens of thousands of users
connected and their clients doing this regularly for every folder, you either
have to have enough memory to keep every cyrus.index hot in memory, or every
5-15 minutes you'll be forcing a re-read of gigabytes of data from disk, or
you need a better way.
The better way was to have a status cache.
http://cyrus.brong.fastmail.fm/#cyrus-statuscache-2.3.8.diff
This helped reduce meta data IO a lot for us.
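The idea behind the patch can be sketched in a few lines (this is a
hypothetical illustration in Python, not the actual Cyrus code, which stores
the cache in a cyrusdb file): remember each mailbox's STATUS result alongside
a validity token such as the index file's mtime, and only fall back to the
full cyrus.index scan when the mailbox has actually changed.

```python
# Hypothetical sketch of a STATUS cache. The real Cyrus patch keeps
# this in a database keyed by mailbox; the shape of the logic is the same.

class StatusCache:
    def __init__(self):
        self._cache = {}  # mailbox -> (index_mtime, status_dict)

    def get(self, mailbox, index_mtime, compute_status):
        """Return the cached STATUS if the index file is unchanged,
        otherwise recompute it (the expensive cyrus.index scan) and cache it."""
        entry = self._cache.get(mailbox)
        if entry is not None and entry[0] == index_mtime:
            return entry[1]  # cache hit: no index scan needed
        status = compute_status(mailbox)
        self._cache[mailbox] = (index_mtime, status)
        return status

    def invalidate(self, mailbox):
        # Called on anything that changes the mailbox (APPEND, EXPUNGE, ...)
        self._cache.pop(mailbox, None)
```

With clients polling every 5 minutes but most mailboxes changing far less
often, almost every STATUS becomes a cache hit instead of an index read.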
2. Split your email data + metadata IO
With the 12-drive SATA-to-SCSI arrays, we use 4 x 150G 10k RPM WD Raptor
drives + 8 x (largest you can get) drives. We then build 2 x 2-drive RAID1 +
2 x 4-drive RAID5 arrays. We use the RAID1 arrays for the meta data (cyrus.*
except squatter) and the RAID5 arrays for the email data. We find the
email-to-meta ratio is about 20-to-1 (higher if you have squatter files), so
150G of meta will support up to 3T of email data fine.
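The capacity arithmetic works out like this (the Raptor size is from the text;
the large-drive size is an illustrative assumption, since "largest you can
get" varies over time):

```python
# Capacity sketch for the 12-drive layout described above.
raptor_gb = 150   # 4 x 150G WD Raptors (from the text)
big_gb = 750      # 8 x "largest you can get" -- 750G is an assumed figure

meta_usable = 2 * raptor_gb           # 2 x 2-drive RAID1 = 2 mirrors of 150G
spool_usable = 2 * (4 - 1) * big_gb   # 2 x 4-drive RAID5 = 3 data drives each

ratio = 20                            # observed email-to-meta ratio
max_email_gb = meta_usable * ratio    # email the meta capacity can serve

print(meta_usable, spool_usable, max_email_gb)  # 300 4500 6000
```

So each 150G RAID1 mirror covers the metadata for roughly 3T of spool, and the
pair covers ~6T, comfortably ahead of what the RAID5 arrays hold.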
From our iostat data, this seems to be a nice balance. A rough estimate from
iostat shows the meta data getting 2x the rkB/s and 3x the wkB/s of the
email spool, even though it's 1/20th the data size and we have the status
cache patch! Basically the meta data is "very hot", so optimising access to
it is important.
3. Not really related to Cyrus, but we switched from perdition to nginx as a
frontend POP/IMAP proxy a while back. If you've got lots of IMAP
connections, it's a really worthwhile improvement.
http://blog.fastmail.fm/2007/01/04/webimappop-frontend-proxies-changed-to-nginx/
4. Lots of other little things
a) putting the proc dir on tmpfs is a good idea
b) make sure you have the right filesystem (on Linux, reiserfs is much
better than ext3, even with ext3's dir hashing) and journaling modes
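For (a), a mount along these lines does the trick (the path is illustrative;
use the proc directory under your own imapd.conf configdirectory):

```shell
# Put Cyrus's proc directory on tmpfs so the per-connection proc files
# never hit disk. Path depends on your configdirectory setting.
mount -t tmpfs -o size=64m,mode=0750 tmpfs /var/imap/proc

# Or persistently, via a line in /etc/fstab:
# tmpfs  /var/imap/proc  tmpfs  size=64m,mode=0750  0  0
```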
> That is our hypothesis right now, that the application has certain limits
> and if you go beyond a certain number of very active users on a
> single backend bad things happen.
Every application has that problem at some point. Consider something that
uses CPU only, where every new unit of work takes the CPU 0.1 seconds: you
can handle up to 10 units of work arriving per second, no problem. If 11
units per second arrive, then after 1 second you'll have done 10 and have 1
unit still to do, but another 11 units arrive in the next second, and so on.
In theory, your outstanding work queue grows forever.
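The queue growth is easy to see in a toy simulation (a model of the argument
above, not a Cyrus benchmark):

```python
# Toy queueing model: a server that completes 10 units of work per second,
# with work arriving at a fixed rate. Above capacity, the backlog grows
# without bound; at or below capacity, it stays flat.

def backlog_after(arrival_rate, service_rate=10, seconds=60):
    """Outstanding units of work after `seconds` of steady arrivals."""
    queue = 0.0
    for _ in range(seconds):
        queue += arrival_rate                    # work arriving this second
        queue = max(0.0, queue - service_rate)   # work completed this second
    return queue

print(backlog_after(10))  # 0.0  -- at capacity, the queue stays empty
print(backlog_after(11))  # 60.0 -- 1 extra unit/sec piles up, forever
```

One unit per second over capacity is only a 10% overload, but after a minute
there's already a full minute's worth of work queued.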
Cyrus isn't CPU limited by a long shot, but it can easily become IO limited.
The same effect happens with IO; it's just more noticeable because disks are
slow. If you start issuing IO requests faster than the disk system can
service them, the IO queue grows quickly and the system starts crawling.
The only ways to improve it are to reduce your IOPs (e.g. fewer users, or
optimise the application to issue fewer IOPs in some way) or to increase the
IOPs your disk system can handle (e.g. more spindles, faster spindles,
NVRAM, etc).
That's what 1 (reduce the IOPs the application generates) and 2 (put hot
data on faster spindles) above are both about, rather than the other option
(fewer users per server).
Rob
More information about the Info-cyrus mailing list