choosing a file system
Bron Gondwana
brong at fastmail.fm
Sat Jan 10 04:56:00 EST 2009
On Fri, Jan 09, 2009 at 05:20:02PM +0200, Janne Peltonen wrote:
> I've even been playing a little with userland ZFS, but it's far from
> usable in production (was a nice little toy, though, and a /lot/ faster
> than could be believed).
Yeah - zfs-on-fuse is not something I'd want to trust production data
to.
> I think other points concerning why not to change to another OS
> completely for the benefits available in ZFS were already covered by
> Bron, so I'm not going to waste bandwidth any more with this matter. :)
I did get a bit worked up about it ;)
Thankfully, fsck prompts don't bother me much, because my response
when fsck is required is pretty simple these days :)
a) it's a system partition - reinstall. Takes 10 minutes from start to
finish (ok, 15 on some of the bigger servers, the POST being the
extra time) and doesn't blat data partitions.
Our machines are installed using FAI to bring the base operating
system up and install the "fastmail-server" Debian package, which
pulls in all the packages we use as dependencies. It then checks
out the latest revision of our subversion repository and runs
"make -C conf install", which sets up everything else.
This is all configured per-role and per-machine in a config file
which contains lots of little micro-languages optimised for being
easy to read in a 'diff -u', since that's what our subversion
commit hook emails us.
b) if it's a cyrus partition, nuke the data and meta partitions and
re-sync all users from the replicated pair.
c) if it's a VFS partition, nuke it and let the automated balancing
script fill it back up in its own time (this is the nicest one,
all key-value based with sha1. I know I'll probably have to
migrate the whole thing to sha3 at some stage, but happy to wait
until it's finalised)
d) oh yeah, mysql. That's replicated between two machines as well,
and dumped with ibbackup every night. If we lose one of these
we restore from the previous night's backup and let replication
catch up. It's never happened (yet) on the primary pair - I've
had to rebuild a few slaves though, so the process is well tested.
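For the curious, here's a minimal sketch of the content-addressed idea
behind (c) (class name and on-disk layout invented for illustration, not
our actual code): each blob lives at a path derived from the sha1 of its
contents, so a nuked partition can be refilled from any replica, and the
digest algorithm is a single parameter to swap when sha3 is finalised.

```python
import hashlib
import os
import tempfile

class BlobStore:
    """Minimal content-addressed store: every value is written to a path
    derived from the digest of its content, so keys never need to be
    coordinated and a lost partition can be refilled from replicas."""

    def __init__(self, root, algo="sha1"):
        self.root = root
        self.algo = algo  # moving to e.g. "sha3_256" later is one change here

    def _path(self, digest):
        # fan out into subdirectories to keep directory sizes sane
        return os.path.join(self.root, digest[:2], digest[2:])

    def put(self, data):
        digest = hashlib.new(self.algo, data).hexdigest()
        path = self._path(digest)
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "wb") as f:
            f.write(data)
        return digest

    def get(self, digest):
        with open(self._path(digest), "rb") as f:
            data = f.read()
        # verify on read: corruption shows up as a digest mismatch
        assert hashlib.new(self.algo, data).hexdigest() == digest
        return data
```

The nice property for rebalancing is that the key *is* the checksum, so
any copy of a blob is as authoritative as any other.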
So - no filesystem is sacred. Except for bloody out1 with its 1000+
queued postfix emails and no replication. It's been annoying me for
over a year now, because EVERYTHING ELSE is replicated. We've got
some new hardware in place, so I'm investigating drbd as an option
here. Not convinced. It still puts us at the mercy of a filesystem
crash.
I'd prefer a higher-level replication solution, but I don't know of
any product that replicates outbound mail queues nicely between
multiple machines while guaranteeing that every mail will be
delivered at least once. If there's a machine failure, the only
acceptable failure mode is that the second machine doesn't know
the message has already been delivered, so it delivers it again.
That's what I want.
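To make those semantics concrete, here's a toy model of the queue
replication described (pure illustration, not any real product's API):
acknowledge a mail only once every replica holds it, record deliveries
on every replica after the send, and on takeover resend anything the
survivor has no delivery record for.

```python
class ReplicatedOutboundQueue:
    """Toy model of at-least-once outbound queue replication.  A message
    is only accepted once every replica has it, and a delivery record is
    written to every replica after the send.  If the sending machine dies
    before the record reaches a replica, that replica resends on
    takeover: the only possible failure mode is a duplicate delivery,
    never a lost mail."""

    def __init__(self, n_replicas=2):
        self.pending = [set() for _ in range(n_replicas)]
        self.delivered = [set() for _ in range(n_replicas)]

    def enqueue(self, msg_id):
        # only return "OK" to the submitter once every replica has it
        for p in self.pending:
            p.add(msg_id)

    def record_delivery(self, msg_id, reached_replicas):
        # the sender tries to tell every replica it delivered; a crash
        # can mean only some of them hear about it
        for i in reached_replicas:
            self.delivered[i].add(msg_id)

    def to_redeliver(self, surviving_replica):
        # on takeover, resend everything the survivor has no record for
        return sorted(self.pending[surviving_replica]
                      - self.delivered[surviving_replica])
```

The duplicate-not-loss trade-off drops straight out of the last method:
a missing delivery record can only ever cause an extra send.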
I'd also like a replication mode for our IMAP server that guaranteed
the message was actually committed to disk on both machines before
returning OK to the lmtpd or imapd. That's a whole lot of work
though.
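The acknowledgement rule itself is simple to state, though. A sketch
(not Cyrus code; the class and the stores are invented for the example):

```python
class SyncReplicatedAppend:
    """Sketch of a 'commit on both sides before OK' append: both stores
    must report the message durable before the caller (lmtpd/imapd in
    the real system) is told the append succeeded."""

    def __init__(self, primary, replica):
        self.primary = primary    # stands in for the local spool
        self.replica = replica    # stands in for the remote replica

    def append(self, msg):
        self.primary.append(msg)  # stands in for write + local fsync
        self.replica.append(msg)  # stands in for ship + remote fsync
        # only now, with both copies durable, does the caller see success;
        # any exception above means the client never gets OK and retries
        return "OK"
```

The hard part in practice is everything this sketch hides: the latency
of waiting on the remote fsync, and retry handling when the replica is
slow rather than dead.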
(we actually lost an entire external drive unit the other day, and
had to move replicas to new machines. ZFS wouldn't have helped here;
the failure was hardware. We would still have had perfectly good
filesystems that were offline. Can't serve up emails while offline)
Bron.
More information about the Info-cyrus mailing list