The master janitor goes crazy / Re: Debugging Deadlocks

Дилян Палаузов dilyan.palauzov at aegee.org
Thu Jan 9 07:14:58 EST 2020


Hello,

I had some more observations.  After `killall -9 master` there are zombies, so maybe the master thinks that some process
IDs still have to finish, while the PIDs that are actually ready to be reaped are different.  I have not dug into this further.
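
A quick way to cross-check whether a given PID really is a zombie is to look at the state field in /proc/<pid>/stat.  Below is a small Linux-only sketch (nothing to do with Cyrus itself; the helper name is made up) that prints 1 for a zombie, 0 for a live process and -1 if the PID does not exist:

    /* zombie-check.c -- Linux-only diagnostic sketch, not part of Cyrus */
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <sys/types.h>

    /* 1 = zombie, 0 = alive in some other state, -1 = no such pid */
    static int is_zombie(pid_t pid)
    {
        char path[64], buf[512];
        FILE *f;
        char *p;

        snprintf(path, sizeof(path), "/proc/%d/stat", (int) pid);
        f = fopen(path, "r");
        if (!f) return -1;
        if (!fgets(buf, sizeof(buf), f)) { fclose(f); return -1; }
        fclose(f);

        /* /proc/<pid>/stat looks like "3143 (httpd) Z 3121 ...";
         * the state letter follows the last ')' of the command name. */
        p = strrchr(buf, ')');
        if (!p || p[1] != ' ') return -1;
        return (p[2] == 'Z') ? 1 : 0;
    }

    int main(int argc, char **argv)
    {
        for (int i = 1; i < argc; i++) {
            pid_t pid = (pid_t) atoi(argv[i]);
            printf("%d: %d\n", (int) pid, is_zombie(pid));
        }
        return 0;
    }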

I also had problems with the I/O: a lot of read/write operations on the disk made the syslog daemon very slow.  This in turn
meant that each syslog() call took a very long time (a minute or three) to return, because the syslog daemon needed that much
time to get around to this particular call and conclude that it caused no write to the disk.
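
A tiny probe like the one below can confirm that the time really goes into syslog() itself (just a sketch, not Cyrus code; the program name is arbitrary).  It times a single syslog() call with CLOCK_MONOTONIC, so a slow syslog daemon shows up directly as a long return time:

    /* syslog-latency.c -- minimal probe, not Cyrus code */
    #include <stdio.h>
    #include <syslog.h>
    #include <time.h>

    int main(void)
    {
        struct timespec t0, t1;

        openlog("syslog-latency-probe", LOG_PID, LOG_MAIL);

        clock_gettime(CLOCK_MONOTONIC, &t0);
        syslog(LOG_DEBUG, "latency probe");        /* the call that may block */
        clock_gettime(CLOCK_MONOTONIC, &t1);

        printf("syslog() returned after %.1f ms\n",
               (t1.tv_sec - t0.tv_sec) * 1000.0 +
               (t1.tv_nsec - t0.tv_nsec) / 1e6);

        closelog();
        return 0;
    }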

After I both improved the I/O on the system and removed some syslog() calls in Cyrus IMAP, the shutdown works fast.  I
have not done extensive experiments, but in the past the janitor often went crazy on shutdown, and now this no longer
happens.

Greetings
  Дилян

On Wed, 2020-01-08 at 16:02 +1100, ellie timoney wrote:
> Okay, just got distracted by this happening again.  The way I'm reproducing it is by ^C'ing a Cassandane run (which cleanly shuts down the cyrus instances it had started) during the JMAP tests, and I guess I'm sometimes getting lucky on the timing and hitting this.
> 
> On Wed, Dec 4, 2019, at 2:22 PM, ellie timoney wrote:
> > I think the really useful information to collect next time this happens 
> > (and while the master process is still running) is:
> > 
> > * What does lsof tell us about that ready file descriptor (in the 
> > example, fd 11)?  I would be very interested to know if it's a client 
> > socket, or an internal messaging socket (that service processes use to 
> > tell master their status).
> 
> master  3121 cyrus   11u  IPv4           49559608      0t0      TCP localhost:9146 (LISTEN)
> 
> This is the port the http service was listening on.
> 
> > * If you can attach a debugger and step through a couple of iterations 
> > of master's big "for (;;) {" loop, what path is it taking?  What 
> > decisions is it making?
> 
> So the problem pops up here:
> 
>             /* connections */
>             if (y >= 0 && Services[i].ready_workers == 0 &&
>                 Services[i].nactive < Services[i].max_workers &&
>                 !service_is_fork_limited(&Services[i])) {
>                 if (verbose > 2)
>                     syslog(LOG_DEBUG, "listening for connections for %s/%s",
>                            Services[i].name, Services[i].familyname);
>                 FD_SET(y, &rfds);
>                 if (y > maxfd) maxfd = y;
>             }
> 
> (gdb) p Services[i]
> $28 = {name = 0x55df363956c0 "http", listen = 0x55df36395170 "127.0.0.1:9146", 
>   proto = 0x55df36395680 "tcp", exec = 0x55df363956e0, babysit = 0, 
>   associate = 0, family = 2, familyname = 0x55df35a8c780 "ipv4", socket = 11, 
>   stat = {12, 13}, desired_workers = 0, max_workers = 2147483647, maxfds = 0, 
>   maxforkrate = 0, ready_workers = 0, nforks = 1, nactive = 1, 
>   nconnections = 1, forkrate = 0, nreadyfails = 0, lastreadyfail = 0, 
>   last_interval_start = {tv_sec = 1578455582, tv_usec = 342312}, 
>   interval_forks = 1}
> 
> The http service has shut down already... sort of. It's hanging around as a defunct process:
> cyrus     3143  3121  0 14:53 ?        00:00:00 [httpd] <defunct>
> 
> We have no ready_workers, and nactive is less than max_workers, so we're adding the service socket (fd 11) to the listen set.  If verbose were large enough, I think we would be logging "listening for connections for %s/%s" during this loop.
> 
> So, we have data on the socket for this service, and if we weren't already in_shutdown we'd be trying to spawn a new service process to handle it.  But we ARE in_shutdown, so we don't spawn a service, so the data on the socket remains unhandled.
> 
> We don't simply finish shutting down, because that nactive=1 in the service table entry means we think we still have children, so we call child_janitor and then go round the loop again.
> 
> Sooooo we're waiting for child_janitor to clean up after the http service, I guess.
> 
> Inside child_janitor, here's the entry:
> (gdb) print **p
> $36 = {pid = 3143, service_state = SERVICE_STATE_BUSY, janitor_deadline = 0, 
>   si = 1, 
>   desc = 0x55df36397f30 "type:SERVICE name:http path:/dev/shm/cyrus/main/libexec/httpd", spawntime = {tv_sec = 1578455582, tv_usec = 682977}, sighuptime = -1, 
>   next = 0x0}
> 
> So, it thinks the http process is still busy, so we're spinning and waiting for it...  but the defunct state tells us it's already exited, and the system is just waiting for the parent process (master) to waitpid() it before getting rid of it completely.
> 
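As a side note, here is a self-contained sketch (nothing to do with Cyrus) of what <defunct> means in this situation: the child has already exited, but it stays in the process table as a zombie until its parent calls waitpid() on it.

    #include <stdio.h>
    #include <sys/types.h>
    #include <sys/wait.h>
    #include <unistd.h>

    int main(void)
    {
        pid_t pid = fork();
        if (pid < 0) return 1;
        if (pid == 0) _exit(0);        /* child exits immediately */

        sleep(30);                     /* meanwhile "ps -ef" shows the child as <defunct> */

        int status;
        waitpid(pid, &status, 0);      /* only now does the zombie disappear */
        printf("reaped %d, exit status %d\n", (int) pid,
               WIFEXITED(status) ? WEXITSTATUS(status) : -1);
        return 0;
    }
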
> It's pretty clear that there's still client data on that socket, but I'm not sure if "shut down Cyrus while there's still a client sending data to httpd" is enough to trigger it, or if it's more subtle.  The data pending on the socket is the reason we see pselect() called over and over again, but I don't think it's the cause of the problem: without that pending data, we'd still be hanging here instead of shutting down cleanly, but we'd be blocked in pselect() instead.
> 
> We _should_ have gotten a sigchld from httpd shutting down, and then reap_child should have waitpid'd it and cleaned up the child table, so we shouldn't be spinning like this.  So I guess the failure occurred somewhere in the sigchld handling, but by the time I notice it spinning it's too late to see what happened.  Did the sigchld go missing?  Did reap_child do the wrong thing?  Who knows!
> 
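One thing worth keeping in mind when chasing this: SIGCHLD is not queued, so several children exiting close together can be collapsed into a single delivery.  The usual defensive pattern (a generic sketch below, not the actual reap_child) is to have the handler only set a flag and to drain all exited children with a WNOHANG loop in the main loop:

    #include <signal.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/types.h>
    #include <sys/wait.h>
    #include <unistd.h>

    static volatile sig_atomic_t child_exited;

    static void sigchld_handler(int sig)
    {
        (void) sig;
        child_exited = 1;              /* do the real work outside the handler */
    }

    int main(void)
    {
        struct sigaction sa;
        memset(&sa, 0, sizeof(sa));
        sa.sa_handler = sigchld_handler;
        sigemptyset(&sa.sa_mask);
        sigaction(SIGCHLD, &sa, NULL);

        /* several children exiting close together... */
        for (int i = 0; i < 3; i++)
            if (fork() == 0) _exit(0);

        sleep(1);                      /* ...may arrive as a single SIGCHLD */

        if (child_exited) {
            pid_t pid;
            int status;
            child_exited = 0;
            /* so keep calling waitpid() until nothing is left to reap */
            while ((pid = waitpid(-1, &status, WNOHANG)) > 0)
                printf("reaped %d\n", (int) pid);
        }
        return 0;
    }

If the reaping only ever handles one child per delivery, coalesced signals can leave a child behind, which would look just like the stuck BUSY entry above.
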
> I don't have any further insight at the moment.  I guess if I can find a way to reproduce it reasonably reliably, then I can litter reap_child with debug output and see what it reports just before it goes pear-shaped....
> 
> Cheers,
> 
> ellie


