Re: [zfs-discuss] ZFS vs FAT



>>>>> "jh" == Johan Hartzenberg <jhartzen@XXX> writes:

    jh> Regarding the second type there has been a few bugs in the ZFS
    jh> code, but compared to the bugs in other file systems,
    jh> remarkably few.

fine, provided you replace ``other filesystems'' with ``other
brand-new unproven filesystems.''

I believe there are more unresolved reports of corruption with ZFS
than with UFS or ext3.

The things that bother me about the ZFS corruption reports:

 * for the mailing list posters, there's no followup.  The people
   treating ZFS roughly and experiencing corruption tend to be the
   ones without support contracts.

   except for the one case of the guy with iSCSI that said, ``be sure
   to have ZFS-level redundancy if you're using iSCSI to vendor RAID
   storage hardware''.  For him it almost sounded as if he'd asked for
   support, and they told him ``sorry you're SOL.  we've root-caused
   the problem, and the workaround is, use ZFS-level redundancy.  no
   other fix is available in stable Solaris.''  He did not say that,
   though; I'm guessing.  I suppose we should find the post and ask
   him.

 * since ZFS has no fsck tool, it can't seem to make up its mind about
   how aggressive to be in recovering.  For UFS, FFS, ext3, reiser,
   and XFS filesystems, fsck would switch into an aggressive mode,
   ``manual intervention required,'' and the manual intervention was
   basically to just say ``fsck --simon-sez'' and cross your
   fingers.  The meaning was, ``try to recover as much data as
   possible, admitting that you've failed to keep your integrity
   guarantees.''  There's no such mode with ZFS.  There is some
   handling of sporadic checksum errors, but the special-cases for
   invalid metadata don't seem to be fully fleshed out.

   ZFS, like many other obnoxious tools in Solaris, is often
   extremely obstinate about not letting you do something which it
   thinks could cause strange behavior.  I think its short implicit
   fsck on import is very conservative, and if it rejects your pool,
   there is no recourse, no --simon-sez switch.

   this is arguably a good approach.  It should be a goal that
   eventually you can import any sequence of bits without ever
   panicking the kernel (though that doesn't seem to be the case now).
   And corruption caused by bugs should be fixed at the source, not
   after the fact---either one accomplishes the same thing, except for
   people using out-of-date software, who, once things settle down,
   shouldn't exist.

   but they've yet to prove it's a good approach, and that they can
   actually finish at the level of quality required to make it valid.

   And sometimes it seems like they care more about never having to
   say, ``yes, you paid for support, and no, we lack the resources and
   talent to fix your problem,'' than they do about writing a system
   least likely to lose data.  I'm concerned that this lack of a
   --simon-sez flag has more to do with blaming you rather than Sun
   for losing data than it does with actually believing in the good
   approach I described above.  The meaning of the
   flag, ``admit you've failed to keep your integrity guarantees, and
   proceed with recovery,'' is I think something they never want to
   express.  The fact that checksum errors are blamed on
   disk/cable/controller problems but occur in other circumstances
   further argues for this mindset.  And if that mindset is going to
   pervade the entire filesystem's design, or at least the real
   recovery features are going to be sealed behind a user interface
   that tries to blame the operator and prevent him from taking
   certain recovery actions, it'll be a seriously unattractive
   feature.
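
To make the difference between the two recovery policies concrete,
here is a toy sketch in Python.  It is not ZFS code and does not
describe how ZFS or any fsck actually works: Block, make_block,
read_strict and read_salvage are all names I made up, and the only
assumption is that every block carries a checksum recorded at write
time.  The strict reader refuses everything on the first bad block,
the way a rejected import does; the salvage reader is the
``--simon-sez'' idea: hand back whatever still verifies and say
plainly what was lost.

```python
import zlib
from dataclasses import dataclass


@dataclass
class Block:
    data: bytes
    checksum: int  # CRC32 recorded when the block was written (assumption)


def make_block(data: bytes) -> Block:
    """Build a block together with its checksum."""
    return Block(data, zlib.crc32(data))


def is_valid(block: Block) -> bool:
    """True if the stored checksum still matches the data."""
    return zlib.crc32(block.data) == block.checksum


def read_strict(blocks):
    """Conservative policy: refuse everything if any block fails."""
    for i, block in enumerate(blocks):
        if not is_valid(block):
            raise IOError(f"block {i} failed checksum; refusing to import")
    return [block.data for block in blocks]


def read_salvage(blocks):
    """Hypothetical '--simon-sez' policy: keep what verifies, report the rest."""
    good, bad = {}, []
    for i, block in enumerate(blocks):
        if is_valid(block):
            good[i] = block.data
        else:
            bad.append(i)
    return good, bad
```

With three blocks of which the middle one is damaged, read_strict
raises and returns nothing, while read_salvage returns blocks 0 and 2
along with the list [1] of blocks it could not vouch for.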
