Cyrus backend crashing (Solaris)

Kenneth Marshall ktm at
Mon Jul 19 12:39:53 EDT 2010

On Mon, Jul 19, 2010 at 05:04:18PM +0100, David Mayo wrote:
> We are running a Cyrus Murder with one proxy/MUPDATE master and one 
> backend server which uses Cyrus Replication to replicate onto another 
> host. All hosts are running Cyrus imapd 2.3.13 on Solaris 10 Update 7.
> Since moving all our mailboxes to the backend server in March we have 
> seen 4 crashes on the backend server where it has refused to accept new 
> LMTP connections[1] and, although the logs show it is accepting IMAP 
> connections, no clients can get any response from the IMAP server. There 
> don't appear to be any suspicious log entries leading up to the event on 
> the proxy server, and nothing at all on the backend server.
> When trying to diagnose the issue, on any attempt to run ps, prstat or 
> to HUP the syslogd process (to set the log level for imapd to "debug") 
> the command hangs and cannot be exited with Ctrl+C. Similarly, attempts 
> to kill the master process or shut down the system (even bypassing the 
> shutdown scripts by using "reboot") do not have any effect other than 
> hanging the shell in which the commands were issued. New shells can be 
> opened and certain commands run, but we aren't much closer to knowing 
> precisely what is wrong. The only way to bring the system back is to 
> reset it via the on board console.
> My suspicion is that something in Cyrus's behaviour is tickling a Solaris 
> bug, but I wanted to check with other Cyrus admins to see if they have 
> seen similar behaviour and have tracked it down to anything in particular.
> Regards,
> Dave.
> David Mayo
> Networks/Systems Administrator
> University of Bath Computing Services, UK
> [1]
> imapd.log:
> Jul  8 11:04:51 lmtpunix[984]: [ID 130975 
> mail.error] connect( failed: Connection timed out
> exim log:
> 2010-07-08 11:04:51 1OWnyD-0000Fo-Ic == XXXXX at R=imap 
> T=cyrus_lmtp defer (-46): LMTP error after end of data: 451 4.4.3 Remote 
> server unavailable
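
The state described above (connections are accepted, but no greeting ever
comes back) can be told apart from a plain refused or timed-out connection
with a small probe run from another host. What follows is only a minimal
sketch in Python, not anything shipped with Cyrus: the function name
probe_greeting, the host name backend.example.org and the LMTP port 2003
are assumptions for illustration and need adjusting to the local setup.

```python
import socket


def probe_greeting(host, port, timeout=5.0):
    """Connect to a TCP service and wait for its greeting line.

    Returns one of:
      'ok: <first line>'  connected and a greeting arrived
      'no-connect: ...'   connection refused or timed out
      'no-greeting'       connection accepted, but nothing was sent in time
      'closed'            peer closed the connection without sending anything
    """
    try:
        # The timeout applies to the connect and to later reads on the socket.
        sock = socket.create_connection((host, port), timeout=timeout)
    except OSError as exc:
        return "no-connect: %s" % exc
    with sock:
        try:
            data = sock.recv(512)
        except socket.timeout:
            # This is the case reported in the thread: accepted, but silent.
            return "no-greeting"
        except OSError as exc:
            return "no-connect: %s" % exc
    if not data:
        return "closed"
    return "ok: " + data.decode("ascii", "replace").splitlines()[0]


if __name__ == "__main__":
    # Hypothetical backend name; 143 is IMAP, 2003 is assumed for LMTP.
    for port in (143, 2003):
        print(port, probe_greeting("backend.example.org", port))
```

Run from cron on a separate machine, a "no-greeting" result would at least
timestamp the start of the hang, which helps when the backend itself logs
nothing.
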

Bad hardware can also cause symptoms of this type. Can you run
diagnostics? I know that that is often hard to do on a production


