Endgame: Cyrus big install at UC Davis

Vincent Fox vbfox at ucdavis.edu
Tue Feb 19 15:50:09 EST 2008


So for those of you who recall back that far.....

UC Davis switched to Cyrus, and as soon as fall quarter started
and students started hitting our servers hard, they collapsed.
Load would climb to what SEEMED to be (for a 32-thread T2000)
a moderate value of 5+, and then performance would fall off a cliff.
People were timing out; overall it was REALLY bad here for several
days, with lots of pressure....

We are running Solaris 10 u4 and using a ZFS pool for the mail store.

Conversations with the list and with the developers at CMU seemed
to provide little relief. Eventually we moved accounts off onto more
servers that were hastily scavenged, and once we got below 8,000
accounts per server we could breathe.

After poking at things for quite a while, we found that fsync delays
were significant. In the end we believe this is not a Cyrus problem
per se, but an area where ZFS had not yet been optimized. We finally
proved this by running the filebench "varmail" workload against ZFS
and UFS and measuring much larger fsync times on ZFS. Oh, if only we
had heard of this particular benchmark earlier!
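For anyone who wants a quick first look before setting up filebench, here is a rough sketch of the kind of measurement involved: write many small files and time each fsync. This is NOT filebench or its varmail workload, just an illustration assuming Python 3 on the host; the function names, the 4 KiB message size, and the MAIL_STORE_DIR setting are all hypothetical.

```python
import os
import statistics
import tempfile
import time

# Hypothetical setting: point this at a directory on the file system
# under test (e.g. the ZFS or UFS mail store). None means the system temp dir.
MAIL_STORE_DIR = None


def fsync_latencies(directory, count=200, size=4096):
    """Write `count` small files into `directory`, fsync each one, and
    return the per-file fsync latency in milliseconds."""
    payload = os.urandom(size)  # stand-in for a small mail message
    latencies = []
    for i in range(count):
        path = os.path.join(directory, "fsync-test-%d.tmp" % i)
        fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
        try:
            os.write(fd, payload)
            start = time.perf_counter()
            os.fsync(fd)  # the call whose latency we care about
            latencies.append((time.perf_counter() - start) * 1000.0)
        finally:
            os.close(fd)
            os.unlink(path)  # remove the test file so nothing is left behind
    return latencies


def summarize(latencies):
    """Return (median, max) of a list of latencies."""
    return statistics.median(latencies), max(latencies)


if __name__ == "__main__":
    # Work in a scratch subdirectory so the real mail store is untouched.
    with tempfile.TemporaryDirectory(dir=MAIL_STORE_DIR) as d:
        med, worst = summarize(fsync_latencies(d))
        print("median %.2f ms, max %.2f ms" % (med, worst))
```

Run it once with MAIL_STORE_DIR on ZFS and once on UFS and compare the medians; a large gap points the same way our varmail results did, but a real benchmark like filebench models concurrency and mail-store access patterns far better than this loop.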

We found we could bring them even by turning off the ZIL, or by using
OpenSolaris Nevada_78, which has many ZFS performance optimizations.
Disabling the ZIL isn't as bad as it sounds; it makes ZFS equivalent to
ext3 in data=ordered or data=writeback mode, which is what we are all
used to. Everyone cares about FILESYSTEM consistency and performance.
In my experience, admins who run ext3 with data=journal are rare.

We debated using OpenSolaris or going back to UFS, but eventually our
management got someone at Sun to give us an IDR patch for 10u4 with the
ZFS fixes in place. Now all our systems are running this patch
and humming along fine.

Anyhow, if anyone else is thinking about using ZFS, there's a lot to
recommend it as long as you are aware of what I just outlined. If you
are going to have a large number of users, you'll want OpenSolaris, or
wait for a later Solaris 10 update with the ZFS performance optimizations.
Dunno if that will be u5 or u6.
