choosing a file system
brong at fastmail.fm
Fri Jan 9 00:08:00 EST 2009
On Thu, Jan 08, 2009 at 08:57:18PM -0800, Robert Banz wrote:
> On Jan 8, 2009, at 4:46 PM, Bron Gondwana wrote:
>> On Thu, Jan 08, 2009 at 08:01:04AM -0800, Vincent Fox wrote:
>>> (Summary of filesystem discussion)
>>> You left out ZFS.
>>> Sometimes Linux admins remind me of Windows admins.
>>> I have adminned a half-dozen UNIX variants professionally but
>>> keep running into admins who only do ONE and for whom every
>>> problem is solved with "how can I do this with one OS only?"
There's a significant upfront cost to learning a whole new system
for one killer feature, especially if it comes along with significant
regressions in lots of other features (like a non-sucky userland
out of the box). Applying patches on Solaris seems to be a choice
between incredibly low-level command line tools and booting up a whole
graphical environment on a machine in a datacentre on the other side
of the world.
>> We run one zfs machine. I've seen it report issues on a scrub
>> only to not have them on the second scrub. While it looks shiny
>> and great, it's also relatively new.
> You'd be surprised how unreliable disks and the transport between the
> disk and host can be. This isn't a ZFS problem, but a statistical
> certainty as we're pushing a large amount of bits down the wire.
> You can, with a large enough corpus, have on-disk data corruption, or
> data corruption that appeared in flight to the disk, or in the
> controller, that your standard disk CRCs can't correct for. As we keep
> pushing the limits, data integrity checking at the filesystem layer --
> before the information is presented for your application to consume --
> has basically become a requirement.
> BTW, the reason that the first scrub saw the error, and the second scrub
> didn't, is that the first scrub fixed it -- that's the job of a ZFS scrub.
# zpool status -v rpool
status: One or more devices has experienced an error resulting in data
        corruption. Applications may be affected.
action: Restore the file in question if possible. Otherwise restore the
        entire pool from backup.
 scrub: scrub in progress for 0h0m, 0.69% done, 1h40m to go

        NAME          STATE     READ WRITE CKSUM
        rpool         ONLINE       0     0     0
          mirror      ONLINE       0     0     0
            c5t0d0s0  ONLINE       0     0     0
            c5t4d0s0  ONLINE       0     0     0

errors: Permanent errors have been detected in the following files:
If that's an "error that the scrub fixed" then it's a really badly
written error message.
The same error didn't show up on the next scrub, which was what confused me.
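
To make the two cases concrete, here is a toy sketch of a checksummed
two-way mirror. It is not how ZFS is implemented; the class name ToyMirror,
the scrub() method and the use of SHA-256 are all assumptions made up for
illustration. It only shows the general idea being argued about: a block
that fails its checksum on one side can be rewritten from the good side
(so a second scrub comes back clean), while a block that fails on both
sides cannot be repaired and would be expected to keep showing up.

```python
import hashlib


def checksum(data: bytes) -> str:
    """Checksum kept separately from the data it protects."""
    return hashlib.sha256(data).hexdigest()


class ToyMirror:
    """Hypothetical two-way mirror: two copies of each block, one checksum."""

    def __init__(self, blocks):
        self.sums = [checksum(b) for b in blocks]
        self.sides = [list(blocks), list(blocks)]

    def scrub(self):
        """Verify every copy; return (repaired, permanent) block indexes."""
        repaired, permanent = [], []
        for i, expected in enumerate(self.sums):
            good = [s for s in self.sides if checksum(s[i]) == expected]
            if len(good) == len(self.sides):
                continue                  # every copy verified
            if not good:
                permanent.append(i)       # no intact copy left to repair from
                continue
            for side in self.sides:
                side[i] = good[0][i]      # rewrite all copies from a good one
            repaired.append(i)
        return repaired, permanent


mirror = ToyMirror([b"alpha", b"beta", b"gamma"])
mirror.sides[0][1] = b"bXta"              # silent corruption on one side only
first = mirror.scrub()
second = mirror.scrub()
print(first, second)
```

In this model a repairable error disappears after the first scrub, and an
error on both sides of the mirror is reported by every scrub until the data
is restored from somewhere else.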