Huge performance problems after updating from 2.4 to 2.5.8
Bron Gondwana
brong at fastmail.fm
Fri Jul 15 08:34:53 EDT 2016
Squatter indexes will be broken without the latest patches on the 2.5 branch. The data gets indexed differently, and I had to patch the squatter code to search both styles.
Bron.
On Fri, Jul 15, 2016, at 22:17, Andre Felipe Machado via Info-cyrus wrote:
> Hello,
> Maybe, after such an upgrade, the squatter metadata indexes were lost and you should run an incremental squatter on your mailboxes again,
> even before the scheduled run in the EVENTS section of /etc/cyrus.conf.
> Regards.
> Andre Felipe
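>
> An incremental run along these lines may help; the binary path and mailbox pattern below are illustrative, so check `man squatter` for your install:
>
> ```shell
> # Incrementally (re)build squat indexes for all user mailboxes.
> # -i = incremental, -r = recurse, -v = verbose; run as the cyrus user.
> # Path and pattern are examples only -- adjust for your layout.
> sudo -u cyrus /usr/cyrus/bin/squatter -i -r -v 'user.*'
> ```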
>
>
>
> Hynek Schlawack via Info-cyrus <info-cyrus at lists.andrew.cmu.edu> wrote ..
> > Hello,
> >
> > we’ve updated one of our Cyrus IMAP backends from 2.4 to 2.5.8 on FreeBSD 10.3
> > with ZFS and now we have an operational emergency.
> >
> > Cyrus IMAPd starts fine and keeps working for about 5 to 20 minutes (rather sluggishly,
> > though). At some point the server load starts growing and eventually explodes, until
> > we have to restart the IMAP daemons, which buys us another 5 to 20 minutes.
> >
> > It doesn’t really matter if we run `reconstruct` in the background or not.
> >
> >
> > # Observations:
> >
> > 1. While healthy, the imapd daemons’ states are mostly `select` or `RUN`. Once
> > things get critical, they are mostly in `zfs` (but do occasionally switch).
> > 2. Customers report that their mail clients are re-downloading all e-mails. That’s
> > obviously extra bad given we seem to be running into some kind of I/O problem. Running
> > `truss` on busy imapd processes seems to confirm that.
> > 3. Once hell breaks loose, I/O collapses even on other file systems/hard disks.
> > 4. `top` mentions processes in `lock` state – sometimes even more than 200.
> > That’s nothing we see on our other backends.
> > 5. There seems to be a correlation between processes hanging in `zfs` state and
> > `truss` showing them accessing mailboxes.db. Don’t know if it’s related, but
> > soon after the upgrade, mailboxes.db broke and we had to reconstruct it.
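> >
> > For reference, rebuilding mailboxes.db is a dump/undump cycle with ctl_mboxlist; a sketch, with illustrative paths, to be run with Cyrus stopped:
> >
> > ```shell
> > # Dump the mailbox list to a flat file, then rebuild mailboxes.db from it.
> > # Paths are examples only; stop Cyrus before the undump step.
> > /usr/cyrus/bin/ctl_mboxlist -d > /var/tmp/mboxlist.dump
> > /usr/cyrus/bin/ctl_mboxlist -u < /var/tmp/mboxlist.dump
> > ```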
> >
> >
> > # Additional key data:
> >
> > - 25,000 accounts
> > - 4.5 TB data
> > - 64 GB RAM, no apparent swapping
> > - 16 cores CPU
> > - nginx in front of it.
> >
> > ## zpool iostat 5
> >
> >               capacity     operations    bandwidth
> > pool        alloc   free   read  write   read  write
> > ----------  -----  -----  -----  -----  -----  -----
> > tank        4.52T   697G    144  2.03K  1.87M  84.2M
> > tank        4.52T   697G     84    730  2.13M  3.94M
> > tank        4.52T   697G    106    904  2.78M  4.52M
> > tank        4.52T   697G    115    917  3.07M  5.11M
> > tank        4.52T   697G    101   1016  4.04M  5.06M
> > tank        4.52T   697G    124  1.03K  3.27M  6.59M
> >
> > Which doesn’t look unusual.
> >
> > The data used to be on HDDs and worked fine with an SSD ZIL. After the upgrade
> > and the ensuing problems, we tried a Hail Mary and replaced the HDDs with SSDs, to no
> > avail (we migrated a ZFS snapshot for that).
> >
> > So we do *not* believe it’s really a traditional I/O bottleneck, since it only
> > started *after* the upgrade to 2.5 and did not go away after adding SSDs. The change
> > notes led us to believe that there shouldn’t be an I/O storm due to mailbox
> > conversions, but is that true in every case? How could we double-check? Observation
> > #2 above leads us to believe that there are in fact some metadata problems.
> > We’re reconstructing in the background, but that’s going to take days, which
> > is sadly time we don’t really have.
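> >
> > One way to spread that days-long run across cores is to reconstruct per-mailbox in parallel; a hedged sketch (paths, the parallelism level, and the mailbox filter are all illustrative):
> >
> > ```shell
> > # Feed top-level user mailbox names to reconstruct, four at a time.
> > # ctl_mboxlist -d prints one mailbox per line (name first); paths,
> > # -P4, and the grep pattern are examples only.
> > /usr/cyrus/bin/ctl_mboxlist -d | awk '{print $1}' | grep '^user\.' | \
> >     xargs -n1 -P4 sudo -u cyrus /usr/cyrus/bin/reconstruct
> > ```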
> >
> > ## procstat -w 1 of an active imapd
> >
> > PID PPID PGID SID TSID THR LOGIN WCHAN EMUL COMM
> > 45016 43150 43150 43150 0 1 toor zfs FreeBSD ELF64 imapd
> > 45016 43150 43150 43150 0 1 toor zfs FreeBSD ELF64 imapd
> > 45016 43150 43150 43150 0 1 toor zfs FreeBSD ELF64 imapd
> > 45016 43150 43150 43150 0 1 toor - FreeBSD ELF64 imapd
> > 45016 43150 43150 43150 0 1 toor - FreeBSD ELF64 imapd
> > 45016 43150 43150 43150 0 1 toor zfs FreeBSD ELF64 imapd
> > 45016 43150 43150 43150 0 1 toor zfs FreeBSD ELF64 imapd
> > 45016 43150 43150 43150 0 1 toor zfs FreeBSD ELF64 imapd
> > 45016 43150 43150 43150 0 1 toor - FreeBSD ELF64 imapd
> > 45016 43150 43150 43150 0 1 toor zfs FreeBSD ELF64 imapd
> > 45016 43150 43150 43150 0 1 toor *vm objec FreeBSD ELF64 imapd
> > 45016 43150 43150 43150 0 1 toor zfs FreeBSD ELF64 imapd
> > 45016 43150 43150 43150 0 1 toor zfs FreeBSD ELF64 imapd
> > 45016 43150 43150 43150 0 1 toor - FreeBSD ELF64 imapd
> > 45016 43150 43150 43150 0 1 toor zfs FreeBSD ELF64 imapd
> > 45016 43150 43150 43150 0 1 toor zfs FreeBSD ELF64 imapd
> > 45016 43150 43150 43150 0 1 toor zfs FreeBSD ELF64 imapd
> > 45016 43150 43150 43150 0 1 toor zfs FreeBSD ELF64 imapd
> > 45016 43150 43150 43150 0 1 toor zfs FreeBSD ELF64 imapd
> > 45016 43150 43150 43150 0 1 toor - FreeBSD ELF64 imapd
> > 45016 43150 43150 43150 0 1 toor select FreeBSD ELF64 imapd
> > 45016 43150 43150 43150 0 1 toor select FreeBSD ELF64 imapd
> > 45016 43150 43150 43150 0 1 toor select FreeBSD ELF64 imapd
> > 45016 43150 43150 43150 0 1 toor select FreeBSD ELF64 imapd
> > 45016 43150 43150 43150 0 1 toor select FreeBSD ELF64 imapd
> > 45016 43150 43150 43150 0 1 toor select FreeBSD ELF64 imapd
> >
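> > To see where in the kernel those `zfs` waits actually sit, kernel stack traces for the stuck threads can be sampled with procstat’s -kk flag (the PID matches the output above):
> >
> > ```shell
> > # Print kernel stack traces for every thread of the busy imapd,
> > # a few times in a row, to catch the hot lock/wait path.
> > for i in 1 2 3; do procstat -kk 45016; sleep 1; done
> > ```
> >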
> >
> > Has anyone had similar problems (and, ideally, solved them)?
> >
> > Are there any known incompatibilities between Cyrus 2.5.8 and FreeBSD/ZFS?
> >
> > Has anyone ever successfully downgraded from 2.5.8 back to 2.4?
> >
> > Do we have any other options?
> >
> > Any help would be *very much* appreciated!
--
Bron Gondwana
brong at fastmail.fm