Another 2.4 upgrade horror story

Sun Sep 30 12:47:09 EDT 2012

On Sep 25, 2012, at 11:57 AM, Deniss <cyrus at sad.lv> wrote:

> 
> 
> On 25.09.2012 15:28, Eric Luyten wrote:
>> On Tue, September 25, 2012 2:01 pm, Sebastian Hagedorn wrote:
>>> Hi,
>>> 
>>> 
>>> about three weeks ago we upgraded our Cyrus installation from 2.3.x to 2.4.16.
>>> We were aware of the reindexing issue, so we took precautionary
>>> measures, but they didn't help a lot. We've got about 7 TB of mail data for
>>> almost 200,000 mailboxes. We did the upgrade on a Sunday and had told our
>>> users that mail access wouldn't be possible for the whole day. After the
>>> actual software upgrade we ran distributed scripts that triggered the index
>>> upgrades. We started with the largest mailboxes. The idea was that after those
>>> that took the longest had been upgraded, the rest should be OK overnight and
>>> early Monday. However, even though our storage infrastructure was kept at 99 %
>>> I/O saturation, progress was much slower than anticipated.
>>> 
>>> 
>>> Ultimately the server was virtually unusable for the whole Monday and
>>> parts of Tuesday. The last mailbox was finally upgraded on Thursday, although
>>> on Wednesday most things were already working normally.
>>> 
>>> I realize that some of our problems were caused by infrastructure that's
>>> not up to current standards, but nonetheless I would really urge you to never
>>> again use an upgrade mechanism like that. Give admins a chance to upgrade
>>> indexes in the background and over time.
>> 
>> 
>> +1
>> 
>> 
>> Sebastian,
>> 
>> 
>> Thank you for sharing your experiences.
>> 
>> As a site willing/needing to upgrade from 2.3.16 to 2.4.X this fall, we
>> are interested in learning about your storage backend characteristics.
>> 
>> What read/write IOPS rates were you registering before/during/after your
>> upgrade process ?
>> 
>> I'd understand your reluctance to share this information in a public forum.
>> No offence taken whatsoever !
>> 
>> 
>> Kind regards,
>> Eric Luyten, Computing Centre VUB/ULB,     Eric.Luyten at vub.ac.be
> 
> 
> The migration from 2.3 to 2.4 took about one year for our installation.
> We converted ~200 TB of user data.
> As a first step, we spread the data across many nodes using Cyrus replication.
> Next, we converted the nodes one by one on weekend nights to
> minimize the I/O load generated by users.
> Cyrus reads all the data from disk to generate the new indexes, so the
> conversion is limited mainly by disk I/O; CPU is pretty cheap nowadays.
> We got a rate of around 500 GB in 8 hours for a forced reindex at 100% disk load.
> We started the forced reindex with the most active users, meanwhile allowing
> users to log in and trigger the reindex of their own mailboxes.
> 
> 
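
For anyone planning a similar upgrade, the figures quoted above allow a rough back-of-the-envelope estimate. The sketch below assumes that the forced-reindex rate of about 500 GB in 8 hours carries over to another site, which it may well not, and that 7 TB is counted as 7000 GB. The function names are made up for illustration.

```python
def reindex_rate_gb_per_hour(gb_done: float, hours: float) -> float:
    """Observed reindex throughput in GB per hour."""
    if hours <= 0:
        raise ValueError("hours must be positive")
    return gb_done / hours


def estimated_hours(total_gb: float, rate_gb_per_hour: float) -> float:
    """Hours needed to reindex total_gb at a constant rate."""
    if rate_gb_per_hour <= 0:
        raise ValueError("rate must be positive")
    return total_gb / rate_gb_per_hour


# Rate reported in the quoted message: ~500 GB in 8 hours at 100% disk load.
rate = reindex_rate_gb_per_hour(500, 8)   # 62.5 GB/hour

# Assumption: a 7 TB spool (7000 GB) reindexed on one node at that same rate.
hours = estimated_hours(7000, rate)       # 112.0 hours

print(f"{rate} GB/h, {hours} h (~{hours / 24:.1f} days)")
```

At that rate a single node would need well over four days of saturated I/O for 7 TB, which is at least in the same ballpark as the Sunday-to-Thursday experience described earlier in the thread.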

Sorry for hijacking this thread, but I'm curious: what is the preferred method of forcing a reindex on a mailbox? I know it triggers when a user logs in and accesses the mailbox. I would like to divide up users and perform the reindex in chunks.
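
To make the "in chunks" part concrete, here is a minimal sketch of what I have in mind. It only handles the batching; the actual reindex step is passed in as a callable, since which command to use is exactly the open question (it might be a wrapper that runs Cyrus's `reconstruct` on the mailbox as the cyrus user, but that is an assumption). The `user.<name>` mailbox naming assumes the default hierarchy separator, and all function names are hypothetical.

```python
from typing import Callable, Iterable, Iterator, List


def chunked(items: Iterable[str], size: int) -> Iterator[List[str]]:
    """Yield successive lists of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    batch: List[str] = []
    for item in items:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:  # leftover partial batch
        yield batch


def mailbox_name(user: str) -> str:
    # Assumption: default namespace with "." as hierarchy separator.
    return f"user.{user}"


def reindex_in_chunks(
    users: Iterable[str],
    chunk_size: int,
    reindex: Callable[[str], None],
) -> List[List[str]]:
    """Call `reindex` for every user's mailbox, one chunk at a time.

    Returns the chunks in the order they were processed, so the caller
    can log progress or pause between chunks.
    """
    processed: List[List[str]] = []
    for batch in chunked(users, chunk_size):
        for user in batch:
            reindex(mailbox_name(user))
        processed.append(batch)
    return processed


# Usage with a stand-in reindex step that only records what it was given.
seen: List[str] = []
chunks = reindex_in_chunks(["alice", "bob", "carol"], 2, seen.append)
```

In practice one would order the user list (for example largest or most active mailboxes first, as described above) before chunking, and run one chunk per maintenance window.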

Thanks,
Bryan

---
Bryan D. Hill
UCSD Physics Computing Facility
CTBP Systems Support

9500 Gilman Dr.  # 0319
La Jolla, CA 92093
+1-858-534-5538
bhill at ucsd.edu
AIM:  pozvibesd
Web:  http://www.physics.ucsd.edu/pcf
