Huge performance problems after updating from 2.4 to 2.5.8
Andre Felipe Machado
andremachado at techforce.com.br
Fri Jul 15 08:17:41 EDT 2016
Hello,
Perhaps the squatter metadata indexes were lost during the upgrade. If so, you should run an incremental squatter on your mailboxes again now,
rather than waiting for the scheduled run in the EVENTS section of /etc/cyrus.conf.
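
For illustration, a one-off incremental run could be wrapped roughly like this. It is only a sketch: it assumes `squatter` is on the PATH of the Cyrus user, `user.jsmith` is a made-up mailbox prefix, and the helper names are mine. The flags are squatter's `-i` (incremental), `-r` (recurse into sub-mailboxes) and `-v` (verbose); please check them against the man page of your installed version before relying on them.

```python
import shlex
import subprocess


def squatter_argv(mailbox_prefix, squatter="squatter"):
    """Build the argument list for an incremental, recursive squatter run."""
    # -v: verbose, -i: incremental update, -r: include sub-mailboxes
    return [squatter, "-v", "-i", "-r", mailbox_prefix]


def run_incremental_squatter(mailbox_prefix, dry_run=True):
    """Print the command (dry run) or execute it and return its exit code."""
    argv = squatter_argv(mailbox_prefix)
    if dry_run:
        print("would run:", shlex.join(argv))
        return 0
    # Must be executed as the Cyrus user on the backend itself.
    return subprocess.run(argv, check=False).returncode


if __name__ == "__main__":
    # "user.jsmith" is a hypothetical mailbox prefix, used only as an example.
    run_incremental_squatter("user.jsmith")
```

Doing a dry run first on a single account seems prudent on a server that is already struggling with I/O.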
Regards.
Andre Felipe
Hynek Schlawack via Info-cyrus <info-cyrus at lists.andrew.cmu.edu> wrote ..
> Hello,
>
> we’ve updated one of our Cyrus IMAP backends from 2.4 to 2.5.8 on FreeBSD 10.3
> with ZFS and now we have an operational emergency.
>
> Cyrus IMAPd starts fine and keeps working for about 5 to 20 minutes (rather sluggishly,
> though). At some point the server load starts growing and eventually explodes, until
> we have to restart the IMAP daemons, which gives us another 5 to 20 minutes.
>
> It doesn’t really matter if we run `reconstruct` in the background or not.
>
>
> # Observations:
>
> 1. While healthy, the imapd daemons’ states are mostly `select` or `RUN`. Once
> things get critical they are mostly in `zfs` (but do occasionally switch).
> 2. Customers report that their mail clients are downloading all e-mails. That’s
> obviously extra bad given we seem to have run into some kind of I/O problem. Running
> `truss` on busy imapd processes seems to confirm that.
> 3. Once hell breaks loose, IO collapses even on other file systems/hard disks.
> 4. `top` mentions processes in `lock` state – sometimes even more than 200.
> That’s nothing we see on our other backends.
> 5. There seems to be a correlation between processes hanging in `zfs` state and
> `truss` showing them accessing mailboxes.db. Don’t know if it’s related, but
> soon after the upgrade, mailboxes.db broke and we had to reconstruct it.
>
>
> # Additional key data:
>
> - 25,000 accounts
> - 4.5 TB data
> - 64 GB RAM, no apparent swapping
> - 16 cores CPU
> - nginx in front of it.
>
> ## zpool iostat 5
>
>                capacity     operations    bandwidth
> pool        alloc   free   read  write   read  write
> ----------  -----  -----  -----  -----  -----  -----
> tank        4.52T   697G    144  2.03K  1.87M  84.2M
> tank        4.52T   697G     84    730  2.13M  3.94M
> tank        4.52T   697G    106    904  2.78M  4.52M
> tank        4.52T   697G    115    917  3.07M  5.11M
> tank        4.52T   697G    101   1016  4.04M  5.06M
> tank        4.52T   697G    124  1.03K  3.27M  6.59M
>
> Which doesn’t look special.
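
One thing to keep in mind when reading these numbers: as far as I know, the first report of `zpool iostat <interval>` shows averages since boot, so only the later rows describe the current load. A small sketch for averaging the per-interval rows follows. It assumes the K/M/G/T suffixes are powers of 1024, and the function names are made up for this example.

```python
UNITS = {"K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}


def parse_size(text):
    """Turn a zpool-style number such as '1.03K' or '730' into a float."""
    suffix = text[-1].upper()
    if suffix in UNITS:
        return float(text[:-1]) * UNITS[suffix]
    return float(text)


def interval_rows(iostat_output, pool):
    """Return (read_ops, write_ops, read_bw, write_bw) per interval for `pool`.

    The first matching row is dropped because it is assumed to hold the
    since-boot averages rather than a real interval.
    """
    rows = []
    for line in iostat_output.splitlines():
        fields = line.lstrip("> ").split()  # tolerate quoted mail text
        if len(fields) == 7 and fields[0] == pool:
            rows.append(tuple(parse_size(f) for f in fields[3:]))
    return rows[1:]


SAMPLE = """\
tank 4.52T 697G 144 2.03K 1.87M 84.2M
tank 4.52T 697G 84 730 2.13M 3.94M
tank 4.52T 697G 106 904 2.78M 4.52M
tank 4.52T 697G 115 917 3.07M 5.11M
tank 4.52T 697G 101 1016 4.04M 5.06M
tank 4.52T 697G 124 1.03K 3.27M 6.59M
"""

if __name__ == "__main__":
    rows = interval_rows(SAMPLE, "tank")
    mean_write_ops = sum(r[1] for r in rows) / len(rows)
    mean_write_bw = sum(r[3] for r in rows) / len(rows)
    print(f"intervals: {len(rows)}")
    print(f"mean write ops/s: {mean_write_ops:.0f}")
    print(f"mean write bandwidth: {mean_write_bw / 1024**2:.2f} MiB/s")
```

Comparing these averages between a healthy phase and a critical phase would show whether the pool itself is actually busier when the load explodes.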
>
> The data used to be on HDDs and worked fine with an SSD ZIL. After the upgrade
> and ensuing problems we tried a Hail Mary by replacing the HDDs with SSDs, to no
> avail (migrated a ZFS snapshot for that).
>
> So we do *not* believe it’s really a traditional I/O bottleneck since it only
> started *after* the upgrade to 2.5 and did not go away by adding SSDs. The change
> notes led us to believe that there shouldn’t be any I/O storm due to mailbox
> conversions, but is that true in every case? How could we double-check? Observation
> #2 from above leads us to believe that there are in fact some metadata problems.
> We’re reconstructing in the background but that’s going to take days; which
> is sadly time we don’t really have.
>
> ## procstat -w 1 of an active imapd
>
>   PID  PPID  PGID   SID  TSID THR LOGIN    WCHAN     EMUL          COMM
> 45016 43150 43150 43150     0   1 toor     zfs       FreeBSD ELF64 imapd
> 45016 43150 43150 43150     0   1 toor     zfs       FreeBSD ELF64 imapd
> 45016 43150 43150 43150     0   1 toor     zfs       FreeBSD ELF64 imapd
> 45016 43150 43150 43150     0   1 toor     -         FreeBSD ELF64 imapd
> 45016 43150 43150 43150     0   1 toor     -         FreeBSD ELF64 imapd
> 45016 43150 43150 43150     0   1 toor     zfs       FreeBSD ELF64 imapd
> 45016 43150 43150 43150     0   1 toor     zfs       FreeBSD ELF64 imapd
> 45016 43150 43150 43150     0   1 toor     zfs       FreeBSD ELF64 imapd
> 45016 43150 43150 43150     0   1 toor     -         FreeBSD ELF64 imapd
> 45016 43150 43150 43150     0   1 toor     zfs       FreeBSD ELF64 imapd
> 45016 43150 43150 43150     0   1 toor     *vm objec FreeBSD ELF64 imapd
> 45016 43150 43150 43150     0   1 toor     zfs       FreeBSD ELF64 imapd
> 45016 43150 43150 43150     0   1 toor     zfs       FreeBSD ELF64 imapd
> 45016 43150 43150 43150     0   1 toor     -         FreeBSD ELF64 imapd
> 45016 43150 43150 43150     0   1 toor     zfs       FreeBSD ELF64 imapd
> 45016 43150 43150 43150     0   1 toor     zfs       FreeBSD ELF64 imapd
> 45016 43150 43150 43150     0   1 toor     zfs       FreeBSD ELF64 imapd
> 45016 43150 43150 43150     0   1 toor     zfs       FreeBSD ELF64 imapd
> 45016 43150 43150 43150     0   1 toor     zfs       FreeBSD ELF64 imapd
> 45016 43150 43150 43150     0   1 toor     -         FreeBSD ELF64 imapd
> 45016 43150 43150 43150     0   1 toor     select    FreeBSD ELF64 imapd
> 45016 43150 43150 43150     0   1 toor     select    FreeBSD ELF64 imapd
> 45016 43150 43150 43150     0   1 toor     select    FreeBSD ELF64 imapd
> 45016 43150 43150 43150     0   1 toor     select    FreeBSD ELF64 imapd
> 45016 43150 43150 43150     0   1 toor     select    FreeBSD ELF64 imapd
> 45016 43150 43150 43150     0   1 toor     select    FreeBSD ELF64 imapd
>
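
In the 26 samples above, the process is waiting on `zfs` in 14 of them. To compare that ratio across several imapd processes, the WCHAN column can be tallied with a few lines of Python. This is a rough sketch: it assumes the EMUL column is always two words ("FreeBSD ELF64") and COMM is a single word, and `tally_wchan` is just a name I picked.

```python
from collections import Counter


def tally_wchan(procstat_output):
    """Count the WCHAN values in `procstat -w` output for one process."""
    counts = Counter()
    for line in procstat_output.splitlines():
        fields = line.lstrip("> ").split()  # tolerate quoted mail text
        # Skip blank lines and the header row.
        if not fields or not fields[0].isdigit():
            continue
        # Columns: PID PPID PGID SID TSID THR LOGIN WCHAN EMUL(2 words) COMM.
        # WCHAN may contain a space ("*vm objec"), so take everything between
        # the seven leading columns and the three trailing words.
        wchan = " ".join(fields[7:-3])
        counts[wchan] += 1
    return counts


# Short excerpt of the sample above, only to keep the example small.
SAMPLE = """\
PID PPID PGID SID TSID THR LOGIN WCHAN EMUL COMM
45016 43150 43150 43150 0 1 toor zfs FreeBSD ELF64 imapd
45016 43150 43150 43150 0 1 toor - FreeBSD ELF64 imapd
45016 43150 43150 43150 0 1 toor *vm objec FreeBSD ELF64 imapd
45016 43150 43150 43150 0 1 toor zfs FreeBSD ELF64 imapd
45016 43150 43150 43150 0 1 toor select FreeBSD ELF64 imapd
"""

if __name__ == "__main__":
    for wchan, n in tally_wchan(SAMPLE).most_common():
        print(f"{wchan:10} {n}")
```

Feeding it a longer capture from a healthy phase and one from a critical phase would put numbers on observation #1.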
>
> Has anyone had similar problems (and got them solved, ideally!)?
>
> Are there any known incompatibilities between Cyrus 2.5.8 and FreeBSD/ZFS?
>
> Has anyone ever successfully downgraded from 2.5.8 back to 2.4?
>
> Do we have any other options?
>
> Any help would be *very much* appreciated!