Bad logins bogging down server
Michael Sofka
sofkam at rpi.edu
Tue Sep 19 09:52:06 EDT 2017
Follow up....
The botnet is still hammering away, checking those old accounts. But
the bottleneck appears to have been saslauthd threads. Doubling the
thread count from 5 to 10 has resolved the problem for now. (It might
also explain the occasional slow IMAP responses I've observed.) I
will run more experiments to see just how high the thread count should
be, and I've got a list of other optimizations I will try. This note is
in case somebody else sees the same problem.
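
A back-of-the-envelope sketch of why the saslauthd thread count can become the bottleneck, assuming each thread handles one authentication at a time. The per-authentication latency and the arrival rate below are hypothetical illustration values, not measurements from these servers, and the function names are made up for this example.

```python
def max_auth_rate(threads: int, seconds_per_auth: float) -> float:
    """Upper bound on authentications per second for a fixed pool of
    workers that each handle one request at a time."""
    return threads / seconds_per_auth


def backlog_after(arrival_rate: float, threads: int,
                  seconds_per_auth: float, elapsed_seconds: float) -> float:
    """Requests still waiting after elapsed_seconds, assuming a constant
    arrival rate and an initially empty queue."""
    surplus = arrival_rate - max_auth_rate(threads, seconds_per_auth)
    return max(0.0, surplus * elapsed_seconds)


if __name__ == "__main__":
    # Hypothetical inputs: 0.5 s per PAM/krb5 round trip, 15 logins/s
    # arriving (bots plus real users), observed over 15 minutes.
    latency = 0.5
    arrivals = 15.0
    window = 15 * 60
    for threads in (5, 10):
        print(threads, "threads:",
              max_auth_rate(threads, latency), "auth/s capacity,",
              backlog_after(arrivals, threads, latency, window),
              "waiting after 15 min")
```

With these made-up inputs, 5 threads fall steadily behind while 10 keep up, which is the same shape as the symptom: connections pile up waiting on authentication even though the Kerberos servers and the imapd limits are nowhere near saturated.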
Mike
On 09/16/2017 07:41 AM, Michael D. Sofka wrote:
> I'm seeking help from the collective wisdom of the Cyrus world.
>
> In the past two days we have seen first a doubling, and then a
> quadrupling+ of badlogins to Cyrus. These appear to be coming from a
> botnet, in that the IPs are spread around in a way that evades fail2ban.
> It got so bad Friday afternoon, that we took the extraordinary step of
> blocking off-campus connections to IMAP (email can still be read via
> Webmail and the VPN).
>
> The symptoms are that connections grow, and grow and grow until
> authentication slows, holding open connections longer and longer. It
> takes about 15 minutes for the connection count to reach the point at
> which service is interrupted. Friday night an attempt was made to
> re-enable off-campus IMAP, but the bots were still at it and service was
> again disrupted.
>
> But the number of connections does not appear close to the maximum
> permitted by Cyrus.
>
> We have a Murder cluster: Three front-end servers, Two back-end
> servers, Two replication servers.
> The front-end servers are Ubuntu 14.04, Cyrus 2.4.17. The back-end and
> replication servers are Ubuntu 16.04, Cyrus 2.4.18. (Upgrading
> front-ends on the short list.)
>
> Authentication is via saslauthd, configured to use PAM, which is using
> krb5. Kerberos is running on three different kerberos servers. Load on
> the kerberos servers is light, and the kerb-admin says nowhere close to
> saturated. In fact, it handled much higher numbers of authentications
> before imapproxy on the Webmail service. (That was years ago, previous
> kerb servers, so there is still the possibility the kerberos servers are
> somehow slowed....)
>
> Each Front-end server is configured for 5000 imapd on 143, and 5000 on
> port 993. Netstat shows about 4-5,000 imap connections per front-end
> server when authentication slows. There are well under 5000 imapd
> processes of either type. And after the Friday evening test re-allowing
> off-campus IMAP, the network admin reported about 1600 connections to
> port 993 in total as IMAP authentication slowed to a crawl.
>
> We are not close to file-max on any of the servers.
>
> imapd.conf has a 10 second delay for a badlogin.
>
> There are some mupdate log entries from around the time of the Friday
> afternoon problems, when I was restarting front-end servers to recover:
>
> Thread timed out waiting for listener_lock
> Worker thread finished, for a total of 3 (2 spare)
>
> There have been no mupdate log entries since. What does this mean?
> There are entries in syslog when mupdate is restarted, stating that it
> could not reset the file limit to 5k. mupdate_connections_max is 1024,
> so the failure to reset should have no effect, unless that is the limit
> being hit. But I see no log entries indicating that.
>
> Any other resources or limits in either Cyrus or Linux (Debian) that I
> should look at?
>
> Thank you in advance for any help.
>
> Mike
>
>
--
Michael D. Sofka sofkam at rpi.edu
ITI Sr. Systems Programmer, Email, TeX, Epistemology
Rensselaer Polytechnic Institute, Troy, NY. http://www.rpi.edu/~sofkam/