choosing a file system

Fri Jan 9 00:08:00 EST 2009

On Thu, Jan 08, 2009 at 08:57:18PM -0800, Robert Banz wrote:
>
> On Jan 8, 2009, at 4:46 PM, Bron Gondwana wrote:
>
>> On Thu, Jan 08, 2009 at 08:01:04AM -0800, Vincent Fox wrote:
>>> (Summary of filesystem discussion)
>>>
>>> You left out ZFS.
>>>
>>> Sometimes Linux admins remind me of Windows admins.
>>>
>>> I have adminned a half-dozen UNIX variants professionally but
>>> keep running into admins who only do ONE and for whom every
>>> problem is solved with "how can I do this with one OS only?"

There's a significant upfront cost to learning a whole new system
for one killer feature, especially if it comes along with significant
regressions in lots of other features (like a non-sucky userland
out of the box).  Applying patches on Solaris seems to be a choice
between incredibly low-level command-line tools and booting up a whole
graphical environment on a machine in a datacentre on the other side
of the world.

>> We run one zfs machine.  I've seen it report issues on a scrub
>> only to not have them on the second scrub.  While it looks shiny
>> and great, it's also relatively new.
>
> You'd be surprised how unreliable disks and the transport between the  
> disk and host can be. This isn't a ZFS problem, but a statistical  
> certainty as we're pushing a large number of bits down the wire.
>
> You can, with a large enough corpus, have on-disk data corruption, or
> data corruption that appeared in flight to the disk, or in the
> controller, that your standard disk CRCs can't correct for. As we keep
> pushing the limits, data integrity checking at the filesystem layer --  
> before the information is presented for your application to consume --  
> has basically become a requirement.
>
> BTW, the reason that the first scrub saw the error, and the second scrub 
> didn't, is that the first scrub fixed it -- that's the job of a ZFS 

# zpool status -v rpool
  pool: rpool
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: http://www.sun.com/msg/ZFS-8000-8A
 scrub: scrub in progress for 0h0m, 0.69% done, 1h40m to go
config:

        NAME          STATE     READ WRITE CKSUM
        rpool         ONLINE       0     0     0
          mirror      ONLINE       0     0     0
            c5t0d0s0  ONLINE       0     0     0
            c5t4d0s0  ONLINE       0     0     0

errors: Permanent errors have been detected in the following files:

        //dev/dsk

-------

If that's an "error that the scrub fixed" then it's a really badly
written error message.

The same error didn't show up on the next scrub, which is what confused me.
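
To make the two cases being argued about concrete, here is a toy sketch
in Python of a checksummed two-way mirror with a scrub pass.  It is not
ZFS code and makes no claim about ZFS internals: the Mirror class, its
methods, and the use of SHA-256 are all assumptions made up for
illustration.  It only shows the general idea of repairing a block from
a good copy versus reporting it when no good copy exists.

```python
import hashlib


class Mirror:
    """Toy mirror: every block is stored on each side, and its
    checksum is kept separately from the data (hypothetical model)."""

    def __init__(self, sides=2):
        self.disks = [dict() for _ in range(sides)]
        self.checksums = {}

    def write(self, block_id, data):
        # Record the expected checksum, then store a copy on every side.
        self.checksums[block_id] = hashlib.sha256(data).digest()
        for disk in self.disks:
            disk[block_id] = data

    def scrub(self):
        """Verify every copy of every block.

        Returns (repaired, permanent): block ids that were fixed from
        a good copy, and block ids with no good copy on any side.
        """
        repaired, permanent = [], []
        for block_id, expected in self.checksums.items():
            good = None
            bad_sides = []
            for i, disk in enumerate(self.disks):
                if hashlib.sha256(disk[block_id]).digest() == expected:
                    good = disk[block_id]
                else:
                    bad_sides.append(i)
            if not bad_sides:
                continue
            if good is None:
                permanent.append(block_id)
            else:
                # Overwrite each bad copy with the verified good one.
                for i in bad_sides:
                    self.disks[i][block_id] = good
                repaired.append(block_id)
        return repaired, permanent


pool = Mirror()
pool.write("a", b"hello")
pool.write("b", b"world")

pool.disks[0]["a"] = b"hellp"   # corrupt one side of block "a"
pool.disks[0]["b"] = b"w0rld"   # corrupt both sides of block "b"
pool.disks[1]["b"] = b"worlD"

first = pool.scrub()
second = pool.scrub()
print(first, second)
```

In this toy model a block that is bad on only one side is repaired and
stays quiet on the next scrub, while a block that is bad on both sides
is reported again every time.  That is the model's behaviour only; it
does not explain why the error above went away.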

Bron.