UC Davis Cyrus Incident September 2007
Andrew Morgan
morgan at orst.edu
Wed Oct 17 13:36:38 EDT 2007
On Tue, 16 Oct 2007, Vincent Fox wrote:
> So here's the story of the UC Davis (no, not Berkeley) Cyrus conversion.....
[snip]
> 5th STEP: Cyrus migration
> ====================
>
> The politics of an educational environment dictate that you MUST do massive
> changeouts like this during summer quarter. So the last couple of months
> of summer we were busily migrating all the UWash users to Cyrus.
> About 29K users to ms1, and 23K users to ms2. Everything worked great.
> Typically about 500 Cyrus processes running.
>
> 6th STEP: The excrement hits the rotating blades
> ===================================
>
> About a week before classes actually start is when all the kids start moving
> back into town and mailing all their buds. We saw process numbers go
> from 500-ish to as high as 5,000. Load would climb radically after passing
> 2,000 processes and systems became slow to respond. This persisted for
> 4 days with us on the phone with Ken & Jeff and anyone else who would
> talk to us, trying to find the right tweaks on the Cyrus software. We tried
> moving to quota-legacy, using BDB for the delivery database, and a few other
> suggested tweaks, but none brought us substantial relief.
I feel your pain. The first week of fall term is always the time when we
see how well we did our planning and testing. :)
Luckily, we haven't had problems like this with Cyrus, but there are
several software upgrades that have bitten us in the ass in the same way.
For example, we upgraded to Horde3 this summer. Everything was humming
along nicely (increased load averages, but still snappy) until the first
day of fall term. Then the MySQL server load average climbed to 100 and
Horde slowed to a crawl. It took about 4 days to figure out that a
particularly obnoxious SQL query was the problem and needed an additional
index (in hindsight, it was pretty obvious).
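
The missing-index failure mode can be reproduced in miniature. What follows is only a sketch under stated assumptions: it uses SQLite through Python's standard sqlite3 module rather than MySQL, and the table `prefs`, its columns, and the index name `prefs_uid_idx` are hypothetical stand-ins, not the actual Horde schema or the query that bit us. It shows the query planner switching from a full table scan to an index lookup once a suitable index exists.

```python
import sqlite3


def query_plan(conn, sql, params=()):
    """Return the planner's human-readable detail lines for a query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The last column of each row holds the plan description text.
    return [row[-1] for row in rows]


# Hypothetical table standing in for a per-user preferences table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE prefs (uid TEXT, name TEXT, value TEXT)")
conn.executemany(
    "INSERT INTO prefs VALUES (?, ?, ?)",
    [(f"user{i}", "theme", "default") for i in range(1000)],
)

sql = "SELECT value FROM prefs WHERE uid = ?"

# Without an index the planner has to scan every row.
plan_before = query_plan(conn, sql, ("user42",))

# Add the index on the column used in the WHERE clause.
conn.execute("CREATE INDEX prefs_uid_idx ON prefs (uid)")
plan_after = query_plan(conn, sql, ("user42",))

print(plan_before)
print(plan_after)
```

On MySQL the analogous check is to prefix the suspect query with EXPLAIN and see whether it reports using an index. A full scan that is cheap with a handful of summer users can become very expensive once everyone logs in at the same time.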
So, some years we get things right and fall term runs smoothly. Some
years things go badly. :)
Whenever possible, I really prefer to make gradual changes and slowly
ramp-up into production numbers.
This is a fascinating story, so please keep us all posted with your
findings!
Andy
More information about the Info-cyrus mailing list