>>>>> "jc" == Joel CARNAT <joel@xxx> writes:

    jc> where:
    jc> - Windows just goes blue screen
    jc> - Linux froze the shell where the access to the disk was
    jc>   started (cd, cp, or such)
    jc> - NetBSD dropped thousands of messages

IMHO, if the error is in a data block, the kernel should return EIO to whatever was reading. If the error is in metadata, it should remount the filesystem read-only and return EIO as needed. If the error is on an mmap'ed page, I don't know what it should do. And if the error is on the swap partition, it should kill any process it was trying to swap in, and blacklist those disk pages.

This is exactly the sort of thing Solaris claims to do in Solaris 10, since they started their ``green line'' marketing campaign. They claim they can retire memory modules and CPUs as well. But it's a god damned steaming pile of lies. Solaris panics or freezes forever, sometimes when just one component of a supposedly-mirrored ZFS vdev goes away. On the mailing list they say ``ZFS is not integrated with FMA yet.'' Any year now. At least they know what they're _trying_ to do, though!

I've found that, in general, NetBSD and (recent) Linux can usually get through a 'dd conv=noerror,sync bs=512' on a failing disk, though it sometimes takes a week. Other than that capability, all bets are off.

As for why it freezes parts of your system totally unrelated to the block that went illegible: probably because the disk takes tens of seconds to return failure, and won't service other requests in the mean time. I could imagine a world where this is fixed. There are ``mode pages'' on the disk that you could tweak on the pre-Jobs Mac OS using ``FWB Hard Disk Toolkit'', to ask the disk to return failure sooner.
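The EIO/remount/kill policy at the top of this message could be sketched like so. This is purely a toy model in Python; `Filesystem`, `handle_bad_block`, and the "kind" strings are made up for illustration and correspond to no real kernel interface:

```python
import errno

class Filesystem:
    """Toy model of the per-block-type error policy (hypothetical names)."""

    def __init__(self):
        self.read_only = False
        self.blacklisted_pages = set()

    def handle_bad_block(self, kind, page=None):
        # Decide what the kernel should do for an unreadable block.
        if kind == "data":
            return ("EIO", errno.EIO)             # just fail the read
        if kind == "metadata":
            self.read_only = True                 # remount read-only...
            return ("EIO", errno.EIO)             # ...and fail the access
        if kind == "swap":
            if page is not None:
                self.blacklisted_pages.add(page)  # never use that page again
            return ("SIGKILL", None)              # kill the process being swapped in
        raise ValueError("unknown block kind: %r" % (kind,))
```

Note the asymmetry: a bad data block hurts only the one reader, while bad metadata poisons the whole filesystem, which is why the latter forces read-only.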
If the disk could be convinced to continue servicing a tagged queue while concurrently doing the retry, that might help through a different mechanism, one that gives the disk more freedom in its recovery protocol, though you would need some combination of the two tricks to avoid filling the short queue with read requests for illegible sectors. But I've never heard of anyone using either approach. Since most disk-drive customers don't do it, it seems like you would need to hold a software support contract for your disk's firmware to pull it off consistently. The mode pages aren't always implemented, and a marginal-disk simulator would be valuable.

I wonder what EMC and Hitachi do. For writes, it's no problem because they have their fancy NVRAMs. For reads, maybe just indulge disks that freeze up: if a disk doesn't answer within some subsecond interval, don't wait for it to report failure. Dispatch the same read to another RAID component, on expiry of the OS's _own_ timer rather than the disk's? Update a disk-demerit counter, and retire the disk if it freezes too often?

Anyway, what I mean to say is: I think (good) RAID avoids more than just data loss. Without it, collecting logs from a slowly-failing disk to find and fix the problem is harder. Things like mirrored swap make sense.
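The OS-side timer plus demerit counter might look roughly like this. Again a toy sketch, not any real RAID driver: the `MirrorComponent` class, the 0.5-second budget, and the demerit threshold of 3 are all invented, and the stall is modeled as a stored `latency` number rather than a real blocked I/O:

```python
DEADLINE_S = 0.5    # the OS's _own_ timer, not the disk's internal timeout
MAX_DEMERITS = 3    # retire a disk that freezes this many times

class MirrorComponent:
    """Simulated mirror half; `latency` stands in for how long a read stalls."""

    def __init__(self, name, latency, data=b"sector"):
        self.name, self.latency, self.data = name, latency, data
        self.demerits = 0
        self.retired = False

def redundant_read(components):
    # Try each live mirror component in turn.  If one would blow our
    # deadline, charge it a demerit and dispatch the same read to the
    # next component instead of waiting for the disk to report failure.
    for disk in components:
        if disk.retired:
            continue
        if disk.latency > DEADLINE_S:
            disk.demerits += 1
            if disk.demerits >= MAX_DEMERITS:
                disk.retired = True   # freezes too often: stop using it
            continue
        return disk.data
    raise IOError("all mirror components failed or timed out")
```

The point of the design is that the slow disk keeps accumulating evidence in its demerit counter even while reads are being served from the healthy mirror, so you get both the log trail and the fast-path read.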