robustness ...

Rob Siemborski rjs3 at andrew.cmu.edu
Fri Aug 1 09:59:30 EDT 2003


On Fri, 1 Aug 2003, Jure Pecar wrote:

> Why am I writing all this?
> I still think that imapd should not crash or do other unreasonable things
> (like looping with 100% CPU consumption) when confronted with corrupted
> files. I have had many interesting experiences on our production system, where
> the fs crashed badly. IMHO Cyrus still needs some work in the robustness area.

Of course, once you're running on a corrupted file system, all bets are
off.  Any number of things could be wrong: the binaries could have been
damaged, files may have been reassembled incorrectly, or even be missing
entirely.

Cyrus does go to great lengths to defend itself against crashes during
transactional operations (so that data isn't partially committed if the
system crashes), but defending against general filesystem corruption is
an entirely different animal.

Given the amount of memory mapping involved in Cyrus, asking it to
operate successfully on a corrupt filesystem is sort of like telling
any program to operate in the face of unreliable main memory.  Sure, if
you're very, very careful you may be able to get some semblance of
correct behavior, but you'll also take a huge performance hit in the common
(uncorrupted) case, and you still may not be able to survive in the
corruption case.

Cyrus does provide tools to help recover from filesystem crashes.  These
include the database recovery utilities, the chk_cyrus utility (which we
wrote after a severe filesystem crash of our own!), reconstruct, and so
on.

Should we strive to do better?  Probably, but when faced with a decision
of whether to track down a problem in Cyrus during normal operation or
one that only shows up in the face of filesystem corruption, I'm
going to have to pick the former almost every time, since it has wider
applicability and there already exist tools to restore the Cyrus data store
to a consistent state.

In any case, if you are so worried about resilience in your software, why
are you using alpha quality software on your production system?

-Rob

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Rob Siemborski * Andrew Systems Group * Cyert Hall 207 * 412-268-7456
Research Systems Programmer * /usr/contributed Gatekeeper





More information about the Info-cyrus mailing list