Bad logins bogging down server

Tue Sep 19 09:52:06 EDT 2017

Follow up....

The botnet is still hammering away, checking those old accounts.  But
the bottleneck appears to have been the saslauthd threads.  Doubling the
thread count from 5 to 10 has resolved the problem for now.  (This might
even explain the occasional slow IMAP responses I've observed.)  I will
run more experiments to see just how high the thread count should be,
and I have a list of other optimizations to try.  I'm posting this note
in case somebody else sees the same problem.
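For anyone sizing this themselves: a rough way to think about the thread
count is Little's law, where the number of busy saslauthd workers is
about the authentication attempt rate times the average time one
PAM/krb5 round trip takes.  The sketch below is only that
back-of-the-envelope model.  The rates, latencies, and the 1.5x headroom
factor are assumptions, not measurements from our servers, and the
function name is made up.  (On Debian/Ubuntu the thread count is
normally set with THREADS= in /etc/default/saslauthd, which is passed to
saslauthd as -n.)

```python
import math


def saslauthd_threads_needed(attempts_per_sec: float,
                             mean_auth_secs: float,
                             headroom: float = 1.5) -> int:
    """Rough worker-thread estimate using Little's law.

    Busy workers ~= arrival rate * mean service time; the headroom factor
    (an assumed safety margin) absorbs bursts and slow Kerberos replies.
    """
    if attempts_per_sec < 0 or mean_auth_secs < 0 or headroom < 1:
        raise ValueError("rate and time must be >= 0 and headroom >= 1")
    busy = attempts_per_sec * mean_auth_secs
    return max(1, math.ceil(busy * headroom))


# Hypothetical load: 40 attempts/s, each authentication taking 0.25 s.
print(saslauthd_threads_needed(40, 0.25))  # prints 15
```

With those assumed numbers, 5 threads would be oversubscribed (10
requests in service on average), while 10 threads would run with no
margin at all.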

Mike

On 09/16/2017 07:41 AM, Michael D. Sofka wrote:
> I'm seeking help from the collective wisdom of the Cyrus world.
> 
> In the past two days we have seen first a doubling, and then more than
> a quadrupling, of bad logins to Cyrus.  These appear to be coming from a
> botnet, in that the source IPs are spread around in a way that evades
> fail2ban.  It got so bad Friday afternoon that we took the extraordinary
> step of blocking off-campus connections to IMAP (email can still be read
> via Webmail and the VPN).
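Why a spread-out botnet slips past fail2ban: fail2ban bans an address
only after it has produced maxretry failures within findtime seconds, so
hundreds of addresses each failing once or twice never trip the rule,
even though the total failure rate is large.  The sketch below is a
simplified model of that per-IP rule, not fail2ban's actual code; the
thresholds and the documentation-range addresses are illustrative.

```python
from collections import defaultdict, deque


def banned_ips(events, maxretry=5, findtime=600):
    """Simplified per-IP ban rule in the spirit of fail2ban.

    `events` is an iterable of (timestamp_secs, ip) failed logins, sorted
    by time.  An IP is banned once it has `maxretry` failures inside a
    sliding `findtime`-second window.
    """
    recent = defaultdict(deque)  # ip -> timestamps of recent failures
    banned = set()
    for ts, ip in events:
        window = recent[ip]
        window.append(ts)
        # Drop failures that have aged out of the window.
        while window and ts - window[0] >= findtime:
            window.popleft()
        if len(window) >= maxretry:
            banned.add(ip)
    return banned


# One noisy address: 10 failures in 10 seconds -> banned.
single = [(t, "198.51.100.7") for t in range(10)]
# 250 addresses, 2 failures each over 500 seconds -> nobody banned.
botnet = [(t, f"192.0.2.{t % 250}") for t in range(500)]
print(banned_ips(single))  # prints {'198.51.100.7'}
print(banned_ips(botnet))  # prints set()
```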
> 
> The symptoms are that connections grow and grow and grow until
> authentication slows, which holds connections open longer and longer.
> It takes about 15 minutes for the number of connections to reach the
> point at which service is interrupted.  Friday night an attempt was made
> to re-enable off-campus IMAP, but the bots were still at it, and service
> was again disrupted.
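The "grow and grow" pattern is what a fixed-size worker pool does once
requests arrive faster than it can finish them: the backlog climbs
steadily instead of leveling off.  The fluid-queue sketch below
illustrates this; the arrival rate, thread counts, and 0.25-second
authentication time are assumed values chosen only to show the shape,
not numbers measured here.

```python
def simulate_backlog(arrivals_per_sec: float, threads: int,
                     auth_secs: float, seconds: int) -> list:
    """Fluid approximation of an authentication worker pool.

    Each simulated second, new requests join the queue and the pool
    completes at most threads / auth_secs of them.  Returns the number
    of requests still waiting at the end of each second.
    """
    capacity = threads / auth_secs  # completions per second
    queue = 0.0
    history = []
    for _ in range(seconds):
        queue += arrivals_per_sec
        queue -= min(queue, capacity)  # serve as many as capacity allows
        history.append(queue)
    return history


# 30 requests/s against 5 threads (capacity 20/s) vs. 10 threads (40/s),
# over 15 minutes.
print(simulate_backlog(30, 5, 0.25, 900)[-1])   # prints 9000.0
print(simulate_backlog(30, 10, 0.25, 900)[-1])  # prints 0.0
```

In this toy model every queued request is also a connection held open
while it waits, so the backlog and the connection count climb together.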
> 
> But the number of connections does not appear to be close to the
> maximum permitted by Cyrus.
> 
> We have a Murder cluster: three front-end servers, two back-end
> servers, and two replication servers.
> The front-end servers are Ubuntu 14.04, Cyrus 2.4.17.  The back-end and
> replication servers are Ubuntu 16.04, Cyrus 2.4.18.  (Upgrading the
> front-ends is on the short list.)
> 
> Authentication is via saslauthd, configured to use PAM, which uses
> krb5.  Kerberos runs on three different Kerberos servers.  Load on the
> Kerberos servers is light, and the Kerberos admin says they are nowhere
> close to saturated.  In fact, they handled much higher numbers of
> authentications before imapproxy was added to the Webmail service.
> (That was years ago, on previous Kerberos servers, so it is still
> possible that the Kerberos servers are somehow slowed....)
> 
> Each front-end server is configured for 5000 imapd processes on port
> 143 and 5000 on port 993.  Netstat shows about 4,000-5,000 IMAP
> connections per front-end server when authentication slows, yet there
> are well under 5000 imapd processes of either type.  And after the
> Friday evening test of re-allowing off-campus IMAP, the network admin
> reported about 1600 connections to port 993 in total while IMAP
> authentication slowed to a crawl.
> 
> We are not close to file-max on any of the servers.
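One quick way to confirm the system-wide file headroom on Linux is
/proc/sys/fs/file-nr, whose three fields are allocated handles,
allocated-but-unused handles, and the fs.file-max limit.  A small
sketch (the function name is just for illustration):

```python
import os


def file_handle_headroom(file_nr_text: str) -> float:
    """Fraction of fs.file-max still available, from /proc/sys/fs/file-nr.

    The file holds three integers: allocated handles, allocated-but-unused
    handles (usually 0 on 2.6 and later kernels), and the system-wide max.
    """
    allocated, unused, maximum = (int(x) for x in file_nr_text.split()[:3])
    return 1.0 - (allocated - unused) / maximum


if __name__ == "__main__" and os.path.exists("/proc/sys/fs/file-nr"):
    with open("/proc/sys/fs/file-nr") as f:
        print(f"{file_handle_headroom(f.read()):.1%} of fs.file-max free")
```

This only covers the system-wide limit; the per-process limit
(RLIMIT_NOFILE, what `ulimit -n` shows) is separate and can be reached
long before file-max.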
> 
> imapd.conf has a 10-second delay after a bad login.
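The badlogin delay also matters for connection counts: every failed
login keeps its connection (and its imapd process) busy for the whole
pause, so by Little's law the number of connections tied up is roughly
the bad-login rate times (authentication time + delay).  A
back-of-the-envelope sketch; the rates and function names are
hypothetical:

```python
def connections_tied_up(bad_logins_per_sec: float, auth_secs: float,
                        badlogin_delay_secs: float) -> float:
    """Little's law: concurrent connections = arrival rate * time held."""
    return bad_logins_per_sec * (auth_secs + badlogin_delay_secs)


def implied_bad_login_rate(connections: float, auth_secs: float,
                           badlogin_delay_secs: float) -> float:
    """Inverse: the bad-login rate that would keep `connections` busy."""
    return connections / (auth_secs + badlogin_delay_secs)


# Hypothetical: 100 bad logins/s, 0.5 s to authenticate, 10 s pause.
print(connections_tied_up(100, 0.5, 10))  # prints 1050.0
# Upper-bound reading of the ~1600 port-993 connections, assuming every
# one of them were a bad login sitting in a 10 s pause with instant auth.
print(implied_bad_login_rate(1600, 0.0, 10))  # prints 160.0
```

Because the held time includes authentication, any slowdown in
saslauthd lengthens it and raises the connection count further.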
> 
> There are some mupdate log entries from around the time of the Friday
> afternoon problems, when I was restarting front-end servers to recover:
>
>     Thread timed out waiting for listener_lock
>     Worker thread finished, for a total of 3 (2 spare)
>
> There have been no mupdate log entries since.  What does this mean?
> There are also entries in syslog, when mupdate is restarted, stating
> that it could not reset the file limit to 5k.  mupdate_connections_max
> is 1024, so the failure to reset should have no effect, unless that is
> the limitation.  But I see no log entries indicating that.
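On the "could not reset the file limit" message: an unprivileged process
can raise its soft RLIMIT_NOFILE only up to its hard limit, so one
plausible (unconfirmed) explanation is that the hard limit in mupdate's
environment is below 5000.  The sketch below shows those rules using
Python's resource module; the function name is made up, and this is not
how Cyrus itself sets the limit.

```python
import resource


def raise_nofile_limit(target: int) -> int:
    """Try to raise this process's soft RLIMIT_NOFILE to `target`.

    Without privilege, the soft limit can only go up to the hard limit,
    so the request is capped there.  Returns the soft limit now in effect.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft == resource.RLIM_INFINITY:
        return soft  # already unlimited; nothing to raise
    wanted = target if hard == resource.RLIM_INFINITY else min(target, hard)
    if wanted > soft:
        try:
            resource.setrlimit(resource.RLIMIT_NOFILE, (wanted, hard))
        except (ValueError, OSError):
            pass  # refused (e.g. above fs.nr_open); keep the old limit
    return resource.getrlimit(resource.RLIMIT_NOFILE)[0]


if __name__ == "__main__":
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    print(f"before: soft={soft} hard={hard}")
    print(f"after asking for 5000: soft={raise_nofile_limit(5000)}")
```

Running `ulimit -Hn` in the same environment that launches the Cyrus
master would show which hard limit applies there.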
> 
> Are there any other resources or limits in either Cyrus or Linux
> (Debian) that I should look at?
> 
> Thank you in advance for any help.
> 
> Mike

-- 
Michael D. Sofka               sofkam at rpi.edu
ITI Sr. Systems Programmer,   Email, TeX, Epistemology
Rensselaer Polytechnic Institute, Troy, NY.  http://www.rpi.edu/~sofkam/