Solaris iSCSI

1. to check the initiator password used for discovery (the globally-configured
   password in IET; there's no bidirectional auth for send-targets):

       iscsiadm list discovery-address -v

2. if the IET ``OutgoingUser'' doesn't match (the less-often-used one, which
   authenticates the target to the initiator), Solaris says, backwards-soundingly:

       WARNING: iscsi connection(147) login failed - login failed to authenticate with target
       WARNING: iscsi connection(147) login failed - Initiator could not be successfully authenticated. (0x02/0x01)

3. if the IET ``IncomingUser'' doesn't match, Solaris gives the same error:

       WARNING: iscsi connection(116) login failed - login failed to authenticate with target
       WARNING: iscsi connection(116) login failed - Initiator could not be successfully authenticated. (0x02/0x01)
       WARNING: /iscsi/disk@0000iqn.2006-11.chaos.inner.th3h.fishstick%3Asdg0001,0 (sd0):
                offline or reservation conflict

   but here IET will log something it does not log in case (2):

       iscsid: CHAP initiator auth.: authentication of iqn.1986-03.com.sun:01:0003ba2d0f67.46e47e5d failed (wrong secret!?)

   (a sketch of a matching two-sided CHAP config follows this list.)

4. the iSCSI initiator loves to fail to commit its config changes promptly to
   stable storage; I'm not sure even 'sync' will do it. If you're running
   iscsiadm at '-m milestone=none', be sure to remount the root filesystem
   read/write first, please!

5. ZFS _still_ (Nevada b71) loves to panic the whole machine when drives go
   away, _sometimes_. the write-panic:

       panic: ZFS: I/O failure (write on off 0: zio 30011a31460 [L0 bplist] 4000L/4000P
       DVA[0]=<0:504e30000:4000> DVA[1]=<0:1080f1c000:4000> DVA[2]=<0:3780374000:4000>
       fletcher4 uncompressed BE contiguou
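Since all three auth failures above come down to which username/secret pair
lives where, here's roughly how the two sides have to line up. The usernames
and secrets are made up, and I haven't re-verified every iscsiadm flag against
the man page, so treat it as a sketch (also, as I recall, Solaris insists on
CHAP secrets of 12 to 16 characters):

    # /etc/ietd.conf on the Linux target.
    # a global IncomingUser, outside any Target block, is what
    # send-targets discovery checks; see item (1).
    IncomingUser discuser discsecret123

    Target iqn.2006-11.chaos.inner.th3h.fishstick:sdg
        # IncomingUser: initiator authenticates to the target; item (3)
        IncomingUser chapuser chapsecret123
        # OutgoingUser: target authenticates to the initiator; item (2)
        OutgoingUser tgtuser tgtsecret1234
        Lun 0 Path=/dev/sdg,Type=fileio

    # and on the Solaris initiator:
    bash-3.00# iscsiadm modify initiator-node --authentication CHAP
    bash-3.00# iscsiadm modify initiator-node --CHAP-name chapuser
    bash-3.00# iscsiadm modify initiator-node --CHAP-secret
    [prompts; enter chapsecret123]

    # only if you want the OutgoingUser direction checked too: enable
    # bidirectional auth per-target and store the target's name/secret.
    bash-3.00# iscsiadm modify target-param -B enable \
                   iqn.2006-11.chaos.inner.th3h.fishstick:sdg
    bash-3.00# iscsiadm modify target-param --CHAP-name tgtuser \
                   iqn.2006-11.chaos.inner.th3h.fishstick:sdg
    bash-3.00# iscsiadm modify target-param --CHAP-secret \
                   iqn.2006-11.chaos.inner.th3h.fishstick:sdg
    [prompts; enter tgtsecret1234]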
-----8<-----

other zfs problems:

    bash-3.00# zpool status
      pool: aboveground
     state: ONLINE
    status: One or more devices has experienced an unrecoverable error.  An
            attempt was made to correct the error.  Applications are unaffected.
    action: Determine if the device needs to be replaced, and clear the errors
            using 'zpool clear' or replace the device with 'zpool replace'.
       see: http://www.sun.com/msg/ZFS-8000-9P
     scrub: resilver in progress, 0.51% done, 15h20m to go
    config:

            NAME           STATE     READ WRITE CKSUM
            aboveground    ONLINE       0     0     0
              mirror       ONLINE       0     0     0
                c3t0d0     ONLINE       0     0    18
                c3t1d0     ONLINE       0     0     0
                c3t7d0     ONLINE       0     0     0

    errors: No known data errors

first of all, 'zpool status' should always return instantly, like metastat and
gmirror status do. Instead, it likes to hang quietly for unpredictable amounts
of time. A status command should not be touching any disk. If it is, it's by
definition doing more than just reporting ZFS's current state: it's prodding
at the guts of ZFS, asking ``is this device still unopenable?'' or something.
There's no way to get the status as it stood before you mucked with it by
typing 'zpool status' and setting off whatever state machines that fires.

here, c3t0d0 was briefly unavailable, making the mirror DEGRADED. The mirror
``resilvered'' as soon as c3t0d0 disappeared; I don't know what it means to
resilver a single component. No doubt it's something very clever and important
(maybe part of why we don't need a metadb with ZFS?), but with one component,
whatever it's doing really shouldn't be called the same thing as resilvering,
IMHO. It doesn't take a similar amount of time to a real resilver, either:
with one component it goes fast.

but more importantly, when c3t0d0 came back: no resilvering! The mirror stays
out of sync. SVM and gmirror have some kind of fast-resync, and when that
doesn't work they just do a full rebuild. I'm not sure I've ever seen one of
those fast-resyncs actually work (are they just too fast for me? are they
secretly happening all the time, hiding more errors? or is fast-resync only
used when a full resync is started, then interrupted?), but at least they
never leave the mirror out of sync, and they certainly don't do so with no
error message beyond these mysterious CKSUM errors that drizzle in over the
next few weeks of running the silently desynced mirror. In my case, from time
to time ZFS will detach c3t0d0 again for having excessive CKSUM errors, and I
have to 'zpool clear ...' it!

----

aside: what I'm pretty sure I _have_ seen work:

* if you 'zpool offline <pool> <device>', disconnect the device, reconnect
  the device, and 'zpool online ...' it, ZFS will kick iSCSI to reopen the
  session and do a very quick resilver, just a few seconds, and there won't
  be any CKSUM errors trickling in over the next few weeks. It won't work if
  you just unplug the device without 'zpool offline ...'ing it first, though,
  so it's no good for a scratchy firewire connection, or a brief network
  outage for an STP tree recalculation, or the like.

* if you have an SVM mirror rebuilding and reboot before it finishes, it'll
  restart the copy from where it left off.

----

anyway, back to the narrative. next, I did 'zpool attach aboveground c3t0d0
c3t7d0'. That's the cause of the resilver in progress above. but c3t7d0 says
ONLINE. It doesn't say degraded, or resilvering, or anything. SVM and gmirror
will give the proper status of a component during a rebuild; here, I don't
know which one is rebuilding. This time, for me, the resilver got up to
almost 3%:

    bash-3.00# zpool status
      pool: aboveground
     state: ONLINE
    status: One or more devices has experienced an unrecoverable error.  An
            attempt was made to correct the error.  Applications are unaffected.
    action: Determine if the device needs to be replaced, and clear the errors
            using 'zpool clear' or replace the device with 'zpool replace'.
       see: http://www.sun.com/msg/ZFS-8000-9P
     scrub: resilver in progress, 2.37% done, 15h26m to go
    config:

            NAME           STATE     READ WRITE CKSUM
            aboveground    ONLINE       0     0     0
              mirror       ONLINE       0     0     0
                c3t0d0     ONLINE       0     0    20
                c3t1d0     ONLINE       0     0     0
                c3t7d0     ONLINE       0     0     0

    errors: No known data errors

great. So far so good. Then this happened:

    bash-3.00# zpool status
      pool: aboveground
     state: DEGRADED
    status: One or more devices has experienced an unrecoverable error.  An
            attempt was made to correct the error.  Applications are unaffected.
    action: Determine if the device needs to be replaced, and clear the errors
            using 'zpool clear' or replace the device with 'zpool replace'.
       see: http://www.sun.com/msg/ZFS-8000-9P
     scrub: resilver in progress, 0.01% done, 415h6m to go
    config:

            NAME           STATE     READ WRITE CKSUM
            aboveground    DEGRADED     0     0     0
              mirror       DEGRADED     0     0     0
                c3t0d0     DEGRADED     0     0 1.59K  too many errors
                c3t1d0     ONLINE       0     0     0
                c3t7d0     ONLINE       0     0     0

    errors: No known data errors

Remember when I said a component going _away_ starts a resilver? Well, that's
happened. We're back to zero. But what is it really doing right now? Is it
actually syncing from the good copy onto the newly-attached component, c3t1d0
-> c3t7d0? Or is it doing that maybe-metadb-marking thing it does, where it
says ``resilvering'' even though only one component is involved? When this
resilver is over, how do I know c3t7d0 is really, truly up to date? Going by
the past, out-of-sync components aren't flagged, and resilvering components
aren't flagged, either.
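The best verification I can come up with is to scrub the whole pool once the
resilver claims to be finished, and stare at the CKSUM column: if c3t7d0 is
actually stale, the scrub should surface that as checksum errors all at once
instead of letting them drizzle in over weeks. That's my guess about what a
scrub does with a silently desynced component, not a documented promise:

    bash-3.00# zpool scrub aboveground
    [wait until 'zpool status' reports the scrub completed]
    bash-3.00# zpool status -v aboveground
    [a nonzero CKSUM count on c3t7d0 means it wasn't really up to date;
     'zpool clear aboveground c3t7d0' resets the counter afterwards]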
From this status message I have no idea what's going on, and because ZFS's
failure to properly resilver is what _caused_, er, whatever we're looking at
above, I'm out of trust, too. I want to know exactly which vdev ZFS is
``resilvering'' right now, because I don't believe it any more. The closest I
can come is this:

    bash-3.00# zpool iostat -v
                     capacity     operations    bandwidth
    pool           used  avail   read  write   read  write
    ------------  -----  -----  -----  -----  -----  -----
    aboveground    260G  38.2G     24     10  1.48M  80.6K
      mirror       260G  38.2G     24     10  1.48M  80.6K
        c3t0d0        -      -     13      7  1.10M  82.3K
        c3t1d0        -      -      6      8   571K  81.3K
        c3t7d0        -      -      0     76    414  4.82M
    ------------  -----  -----  -----  -----  -----  -----

apparently, even though c3t0d0 is DEGRADED, it's still getting significant
read bandwidth, so I was wrong above: degraded doesn't mean disabled. In our
case there are checksum errors because the disk is out of sync with the
mirror, but one thing I worry about is that failing disks usually start by
operating at 1/1000th normal speed, so if ZFS marks such a disk degraded and
keeps using it, the whole array will be so uselessly slow it may as well be
failed. I hope they've thought of that. At least there are no reads to the
empty disk. That's comforting.

Now, what happens when you try to boot your system and one of the components
of a ZFS mirror isn't there? If it's an iSCSI component, god help you. The
system will never come up. You have to boot with 'boot -m milestone=none'.
Now what?

    # zpool status aboveground
    NOTICE: iscsi connection(25) unable to connect to target SENDTARGETS_DISCOVERY
    NOTICE: iscsi discovery failure - SendTargets (010.100.100.135)
    NOTICE: iscsi connection(27) unable to connect to target SENDTARGETS_DISCOVERY
    NOTICE: iscsi discovery failure - SendTargets (010.100.100.138)
      pool: aboveground
     state: UNAVAIL
    status: One or more devices could not be opened.  There are insufficient
            replicas for the pool to continue functioning.
    action: Attach the missing device and online it using 'zpool online'.
       see: http://www.sun.com/msg/ZFS-8000-3C
     scrub: none requested
    config:

            NAME           STATE     READ WRITE CKSUM
            aboveground    UNAVAIL      0     0     0  insufficient replicas
              mirror       UNAVAIL      0     0     0  insufficient replicas
                c3t1d0     UNAVAIL      0     0     0  cannot open
                c3t7d0     UNAVAIL      0     0     0  cannot open

    # zpool export aboveground
    NOTICE: iscsi connection(29) unable to connect to target SENDTARGETS_DISCOVERY
    NOTICE: iscsi discovery failure - SendTargets (010.100.100.135)
    NOTICE: iscsi connection(31) unable to connect to target SENDTARGETS_DISCOVERY
    NOTICE: iscsi discovery failure - SendTargets (010.100.100.138)
    internal error: No such device
    Abort

I thought I would export the pool, reboot, fix iSCSI, and then import the
pool. Seems reasonable? But ZFS is full of this chicken-and-egg crap. You
can't export the pool because it's unavailable. No, you can't 'zpool export
-f' it either. Why the hell not? I'd have to delete my zpool.cache to
accomplish this, which makes the system forget all its pools and causes major
problems (I have /usr on ZFS), like not being able to run 'zpool import'
because zpool and its libraries live on /usr. This is _exactly_ why on Unix
we normally don't tolerate binary config files: they're walled behind tools
that forbid you from editing them the way you'd like to in a serious
emergency. The Windows Registry is one of the ugliest mistakes I can imagine,
yet here on Solaris we have one too, and we don't even get RegEdit.
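If you do end up going the zpool.cache route, it seems better to move the
file aside than to delete it, so there's something to put back when it all
goes wrong. A sketch of the last resort, not a recommendation: root has to be
remounted read/write first (same as below), and with zpool and its libraries
living on a ZFS /usr this can saw off the branch you're standing on:

    bash-3.00# /sbin/mount -o remount,rw /
    bash-3.00# mv /etc/zfs/zpool.cache /etc/zfs/zpool.cache.quarantined
    [reboot. the cache is empty, so no pools are opened automatically.]
    bash-3.00# zpool import aboveground
    [re-imports the pool and starts rebuilding zpool.cache]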
I got it to boot this way:

    ok boot -m milestone=none
    [boots. enter root password for maintenance.]
    bash-3.00# /sbin/mount -o remount,rw /
    [otherwise iscsiadm won't update /etc/iscsi/*]
    bash-3.00# /sbin/mount /usr
    bash-3.00# /sbin/mount /var
    bash-3.00# /sbin/mount /tmp
    bash-3.00# iscsiadm remove discovery-address 10.100.100.135
    bash-3.00# iscsiadm remove discovery-address 10.100.100.138
    bash-3.00# iscsiadm remove discovery-address 10.100.100.138
    iscsiadm: unexpected OS error
    iscsiadm: Unable to complete operation
    [good. it's gone.]
    bash-3.00# sync
    bash-3.00# lockfs -fa
    bash-3.00# reboot

and after the reboot:

    iscsiadm add discovery-address ...
    iscsiadm list discovery-address -v
    [verify targets appear underneath each discovery IP address.]
    iscsiadm list target -vS
    [verify targets have open sessions and exist. oops. they don't.]
    format -e
    [verify targets have an IDENTIFY name. they do.]
    iscsiadm list target -vS
    [now targets do have sessions.]
    zpool status
    [pool is online.]
    zfs list
    [pool's filesystems are there.]
    ls /export
    [not mounted.]
    zfs mount -a
    ls /export
    [mounted. however, some NFS shares are not exported! some are, just not
     all of them. damnit!]
    zpool export aboveground
    zpool import aboveground
    [do the ZFS swizzle again, and now NFS is working.]
    [instead of the export/import, I'm now trying
     'svcadm refresh network/nfs/server'. working so far.]

Now, go restart all the NFS clients, because NFS being ``stateless'' or
whatever became a lie at some point. These days I find netbooted clients get
unstable and slowly crash if the NFS server goes away. They seem to come back
when the server comes back, but then apps start dying. This happens even if
they're not using NFS swap. The whole thing is frustratingly fragile and
reminds me of Windows. but of course the real truth is that I could still
never build and maintain this whole mess on a different platform. I'm just
sad, because NFS really did used to work properly w.r.t. server rebooting,
something Windows could never manage. I could possibly go disable NFSv4
everywhere...

In my configuration, I have one big Solaris box with a 'boot' pool that runs
off direct-attached mirrored FC-AL disks, and an 'aboveground' pool that runs
off iSCSI disks inside Linux boxes, each running the IET daemon. The Linux
boxes netboot off the Solaris box, off shares created inside the 'boot' pool.
This way, Linux isn't dependent on any of its own single disks, only on
mirrored storage, and I don't have to face the i386 redundant-booting
disaster: ``No operating system'', ``Press to continue or Ctrl-W to enter
WizzBangArrayWizard,'' all that garbage. OpenPROM boots Solaris off the
mirror, and Linux boots off Solaris over the network.

Unfortunately, because even post-FMA-integration ZFS still waits forever for
iSCSI targets to appear, I have to go through this 'ok boot -m milestone=none'
dance every time I reboot the Solaris box!
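Since I get to do this dance on every reboot, here's the post-reboot half
collected into one script for next time. It's just the commands from above,
in order; the discovery addresses and pool name are from my setup, and
feeding 'format' from /dev/null is only a guess at a non-interactive way to
prod the targets like I did by hand:

    #!/sbin/sh
    # re-add iSCSI discovery and bring the pool's shares back.
    iscsiadm add discovery-address 10.100.100.135
    iscsiadm add discovery-address 10.100.100.138
    iscsiadm list discovery-address -v   # targets should appear under each address
    format -e < /dev/null                # prod the disks so sessions open
    iscsiadm list target -vS             # verify targets now have open sessions
    zpool status aboveground             # pool should be online
    zfs mount -a                         # mount the pool's filesystems
    svcadm refresh network/nfs/server    # instead of the export/import swizzle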