Another 2.4 upgrade horror story
bhill at physics.ucsd.edu
Sun Sep 30 12:47:09 EDT 2012
On Sep 25, 2012, at 11:57 AM, Deniss <cyrus at sad.lv> wrote:
> On 25.09.2012 15:28, Eric Luyten wrote:
>> On Tue, September 25, 2012 2:01 pm, Sebastian Hagedorn wrote:
>>> about three weeks ago we upgraded our Cyrus installation from 2.3.x to 2.4.16.
>>> We were aware of the reindexing issue, so we took precautionary
>>> measures, but they didn't help a lot. We've got about 7 TB of mail data for
>>> almost 200,000 mailboxes. We did the upgrade on a Sunday and had told our
>>> users that mail access wouldn't be possible for the whole day. After the
>>> actual software upgrade we ran distributed scripts that triggered the index
>>> upgrades. We started with the largest mailboxes. The idea was that after those
>>> that took the longest had been upgraded, the rest should be OK overnight and
>>> early Monday. However, even though our storage infrastructure was kept at 99 %
>>> I/O saturation, progress was much slower than anticipated.
>>> Ultimately the server was virtually unusable for the whole Monday and
>>> parts of Tuesday. The last mailbox was finally upgraded on Thursday, although
>>> on Wednesday most things were already working normally.
>>> I realize that some of our problems were caused by infrastructure that's
>>> not up to current standards, but nonetheless I would really urge you to never
>>> again use an upgrade mechanism like that. Give admins a chance to upgrade
>>> indexes in the background and over time.
>> Thank you for sharing your experiences.
>> As a site willing/needing to upgrade from 2.3.16 to 2.4.X this fall, we
>> are interested in learning about your storage backend characteristics.
>> What read/write IOPS rates were you registering before/during/after your
>> upgrade process ?
>> I'd understand your reluctance to share this information in a public forum.
>> No offence taken whatsoever !
>> Kind regards,
>> Eric Luyten, Computing Centre VUB/ULB, Eric.Luyten at vub.ac.be
> The migration from 2.3 to 2.4 took about one year for our installation.
> We converted ~200 TB of user data.
> As a first step, we spread the data across many nodes using Cyrus replication.
> Next, we converted the nodes one by one on weekend nights to
> minimize the I/O load generated by users.
> Cyrus reads all data from disk to generate the new indexes, so the
> conversion is limited mainly by disk I/O; CPU is pretty cheap nowadays.
> We got a rate of around 500 GB in 8 hours for a forced reindex at 100% disk load.
> We started the forced reindex with the most active users, while still allowing
> users to log in and trigger a reindex of their own mailboxes.
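For a rough sense of scale, the figures quoted above can be turned into a throughput and a total duration. This is only back-of-the-envelope arithmetic, assuming the numbers mean 500 gigabytes per 8 hours and 200 terabytes in total (decimal units), and that the rate would stay constant on a single node. The function names are made up for illustration.

```python
# Back-of-the-envelope arithmetic from the figures quoted in this thread.
# Assumptions (not from the thread): decimal units (1 TB = 1000 GB,
# 1 GB = 1000 MB) and a constant reindex rate.

def reindex_rate_mb_per_s(gigabytes: float, hours: float) -> float:
    """Average throughput in MB/s for a given amount reindexed in a given time."""
    return gigabytes * 1000 / (hours * 3600)

def hours_to_reindex(total_gigabytes: float,
                     gigabytes_per_window: float,
                     window_hours: float) -> float:
    """Hours needed to reindex total_gigabytes at the observed rate."""
    return total_gigabytes / gigabytes_per_window * window_hours

rate = reindex_rate_mb_per_s(500, 8)             # about 17.4 MB/s
total_hours = hours_to_reindex(200_000, 500, 8)  # 3200 hours on one node

print(f"rate: {rate:.1f} MB/s")
print(f"single-node time for 200 TB: {total_hours:.0f} h "
      f"({total_hours / 24:.0f} days)")
```

A single node at that rate would need months of continuous reindexing, which fits with the data having been spread over many nodes first.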
Sorry for hijacking this thread, but I'm curious: what is the preferred method of forcing a reindex on a mailbox? I know a reindex is triggered when a user logs in and accesses the mailbox, but I would like to divide up the users and perform the reindex in chunks.
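To make the "chunks" idea concrete, here is a minimal sketch of the batching part only. The mailbox names are invented, and the reindex command is deliberately a placeholder (`echo`): the right Cyrus invocation depends on the installed version and should be taken from its man pages, not from this example.

```python
from typing import Iterator, List, Sequence

def chunked(items: Sequence[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive batches of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])

def build_commands(mailboxes: Sequence[str],
                   command_prefix: Sequence[str]) -> List[List[str]]:
    """Build one argument list per mailbox from a command prefix."""
    return [list(command_prefix) + [mbox] for mbox in mailboxes]

# Hypothetical mailbox names, for illustration only.
mailboxes = ["user.alice", "user.bob", "user.carol", "user.dave", "user.erin"]

# Placeholder command: substitute the real reindex invocation for your version.
REINDEX_COMMAND = ["echo", "would reindex"]

batches = list(chunked(mailboxes, 2))
for number, batch in enumerate(batches, start=1):
    print(f"batch {number}:")
    for command in build_commands(batch, REINDEX_COMMAND):
        print("  " + " ".join(command))
```

Each batch could then be run in its own maintenance window, with the batch size tuned to whatever I/O load the storage can tolerate.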
Bryan D. Hill
UCSD Physics Computing Facility
CTBP Systems Support
9500 Gilman Dr. # 0319
La Jolla, CA 92093
bhill at ucsd.edu