Recovering from the crash of 1998-08-27

Q. You say the recovery is done, but I don't see my old folders in pine. Where are my files?

It is likely that recovered files will have the same names as existing files, or that you've changed the way you organize your home directory since the crash. I decided the best plan would be to put everything I recovered in ~/restore-19980827. For the rest of this document, ~ = your home directory.

In the best case, this tree will look exactly like your home directory did at the time of the crash. I've made one addition: your old mail inbox is in ~/restore-19980827/mail/inbox-restore-19980827. On a normal running system, your inbox resides outside your home directory, so I moved your old inbox into your old home directory.

Also, you may find some litter in ~/restore-19980827/lost+found. In some cases, the name of a file or directory was lost, but its contents were not. These files and directories are in lost+found.

For some unlucky users, the name of your home directory was lost. In that case, you will very likely still find most of your files, but they'll be in ~/restore-19980827/lost+found/#3391015 (or some other number) instead of in ~/restore-19980827 like the lucky users. For the next question, I'm assuming you're a lucky user. If you're not, and you can't intuitively figure out what to do, just read on a little further.

Q. All I care about is getting my old email. How do I do this?

If you do this, and find that you don't have any folders in the ``Restore'' collection, or if you find you have only an ``inbox'' folder in the ``Restore'' collection, then you are one of the unlucky ones who needs to read the ``I care about more than email'' question below. Don't give up until you've read it.

Q. When you were recovering the data, did you read my email?

No.

I saw the names of files in lost+found, but that's it. I didn't inspect the contents of anything owned by a user, and as far as I can tell no files were recovered without having their ownership recovered, too.

Q. I remember some of the motd's mentioned the filesystem was ``corrupted,'' and that this is the source of the complication, which led to your procrastination and the nine-month delay. What do you mean, ``corrupted?''

One can count on a hard disk to eventually wear out or break--the only difference between the two is in how long it works before it fails. Hard disks are properly regarded as disposable.

Like hotel doorknobs, hard disks have computers in them. The computer part does not ``wear out.'' The computer part can only ``break''--only the physical spinning medium and moving heads of the disk are expected to ``wear out.'' Indeed, that's what happened to audrey's old disk.

Usually this means only part of the disk goes bad. Data contained within blocks that went bad is lost, while other data is still available.

When only a few blocks go bad rather than the entire disk, as happened with audrey's disk, the filesystem is ``corrupted.'' This is a state of the filesystem that is conceptually similar to the ``partly worn out'' state of the disk: some data can be read from it, but not all data.

Q. The disk went partly bad. Certain regions of the disk became inaccessible. So, only the files stored in those regions were affected, yes? I can assume the corruption affected a certain small percentage of the disk, and that my files will lie within that region with a certain probability, right? That makes sense--I think I understand what's going on now. So, what is the loss percentage, anyway?

No, you don't understand. Not yet.

To think this way understates the complexity of a filesystem. The data in files is stored on a disk. Also stored are the names of files, and maps to find what part of the disk holds their contents. The names and block allocation maps of files are called ``metadata''--data which is not part of any file on the disk, but is still stored on the disk.

To achieve some immediate understanding: a block that went bad could contain metadata, and thus make many files inaccessible. Or, when a block goes bad inside a directory, files are not lost but the _names_ of the files are lost, so after recovery they show up in the lost+found directory as files with numbered names like #3391015.
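To make the ``lost names, surviving contents'' case concrete, here is a toy model in Python. It is a deliberate simplification, not how any real filesystem lays out its data: a hypothetical ToyFS class keeps file contents in an inode table and names in a single directory table. Losing part of the directory table (the bad block) removes names only, and an fsck-like pass then reattaches every unnamed inode as lost+found/#<inode number>. Note how losing one small table can orphan many files, which is why a ``loss percentage'' of disk blocks says little about how many files are affected.

```python
from dataclasses import dataclass, field


# Hypothetical toy filesystem (illustration only): inode numbers map to file
# contents, and one directory table maps names to inode numbers.
@dataclass
class ToyFS:
    inodes: dict[int, bytes] = field(default_factory=dict)   # inode -> contents
    directory: dict[str, int] = field(default_factory=dict)  # name -> inode

    def add(self, name: str, ino: int, data: bytes) -> None:
        self.inodes[ino] = data
        self.directory[name] = ino

    def lose_directory_entries(self, names: list[str]) -> None:
        """Simulate a bad block inside the directory: the names vanish,
        but the inodes (and the data they point to) are untouched."""
        for name in names:
            del self.directory[name]

    def fsck(self) -> dict[str, int]:
        """Reattach every inode that no name points to under
        lost+found/#<inode>, roughly as fsck does. Returns the new entries."""
        referenced = set(self.directory.values())
        orphans = {f"lost+found/#{ino}": ino
                   for ino in sorted(self.inodes) if ino not in referenced}
        self.directory.update(orphans)
        return orphans


if __name__ == "__main__":
    fs = ToyFS()
    fs.add("mail/saved-messages", 3391015, b"From alice ...\n")
    fs.add("mail/sent-mail", 3391016, b"From bob ...\n")
    fs.add("notes.txt", 3391017, b"todo\n")
    fs.lose_directory_entries(["mail/saved-messages", "mail/sent-mail"])
    print(fs.fsck())
    # {'lost+found/#3391015': 3391015, 'lost+found/#3391016': 3391016}
```

Both mail folders keep their contents; only their names are gone.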

In actuality, I doubt that the major part of the corruption we experienced is the fault of disk blocks going bad. I can discuss this in more detail if you like.

Q. What this comes down to is simple: you've lost my files once. What is the likelihood you will lose them again?

If we had been running NetBSD when this disk started going bad, far less data would have been lost. Also, the recovery was complicated by the fact that Linux could not read the marginal disk at all--I had to read the marginal disk with NetBSD, rewrite the data onto a good disk with NetBSD, and then let Linux fsck and mount the good copy. Even if I hadn't been such a creative procrastinator, I doubt I could have recovered the data before I had NetBSD running.
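The general idea of copying a marginal disk onto a good one can be sketched in Python. This is not necessarily how the copy was made here; it only illustrates the common technique (the same idea as dd with conv=noerror,sync): read block by block, and when a block fails with an I/O error, write zeros in its place so every readable block stays at its original offset for fsck to work with. The 512-byte block size and the function name rescue_copy are assumptions for illustration, and os.pread is only available on Unix-like systems.

```python
import os

BLOCK_SIZE = 512  # assumed sector size; real devices may use a different one


def rescue_copy(src_path: str, dst_path: str, block_size: int = BLOCK_SIZE) -> list[int]:
    """Copy src to dst one block at a time. Blocks that raise an I/O error
    are written as zeros so every good block keeps its original offset.
    Returns the byte offsets of blocks that could not be read."""
    bad_offsets = []
    src = os.open(src_path, os.O_RDONLY)
    try:
        size = os.lseek(src, 0, os.SEEK_END)  # total size of disk or image
        with open(dst_path, "wb") as dst:
            for offset in range(0, size, block_size):
                want = min(block_size, size - offset)
                try:
                    chunk = os.pread(src, want, offset)
                except OSError:
                    chunk = b""
                    bad_offsets.append(offset)
                # Pad failed or short reads with zeros to keep alignment.
                dst.write(chunk.ljust(want, b"\0"))
    finally:
        os.close(src)
    return bad_offsets
```

A filesystem checker run on the copy then sees the bad blocks as zeros rather than as read errors.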

However, all sorts of things can lead to losing data. It's unlikely that any future disaster would go down exactly as this one did. The only way I can significantly improve the safety of your data is to store it in more than one place, either with a replicating filesystem like Coda, or with regular tape backups.

Since I do neither of these things, the likelihood that I will lose your files again is high.

If you want to perform your own backups, you may find zip and zipsplit useful tools. You can pack your email into floppy-disk-sized chunks and safely FTP it onto even a relatively primitive desktop machine. I recommend writing each chunk onto at least two floppies. I also recommend verifying each of the ZIP files after you write it, using StuffIt, WinZip, or 'pkunzip -t'. I will help you further with this if you want.
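zip and zipsplit remain the suggested tools. If you would rather script the same idea, here is a minimal Python sketch using the standard zipfile module: it groups files into floppy-sized ZIP archives and then tests each archive, much as 'pkunzip -t' does. Assumptions: the size budget is the roughly 1,457,664 bytes usable on a FAT-formatted 1.44 MB floppy minus a rough allowance for ZIP overhead; grouping uses uncompressed sizes as a conservative estimate; a single file larger than one floppy is refused rather than split; and files are stored under their base names, so two files with the same name in different directories would collide. The function names are hypothetical.

```python
import os
import zipfile

FLOPPY_BYTES = 1_457_664   # approx. usable space on a FAT-formatted 1.44 MB floppy
MARGIN = 16 * 1024         # rough allowance for ZIP headers and central directory


def pack_into_chunks(paths, prefix, limit=FLOPPY_BYTES - MARGIN):
    """Group files so each group's *uncompressed* total stays under `limit`,
    then write each group as its own ZIP file named <prefix><n>.zip.
    A file larger than `limit` by itself is refused rather than split."""
    groups, current, used = [], [], 0
    for path in paths:
        size = os.path.getsize(path)
        if size > limit:
            raise ValueError(f"{path} is too large for one chunk")
        if current and used + size > limit:
            groups.append(current)
            current, used = [], 0
        current.append(path)
        used += size
    if current:
        groups.append(current)

    archives = []
    for n, group in enumerate(groups, start=1):
        name = f"{prefix}{n}.zip"
        with zipfile.ZipFile(name, "w", compression=zipfile.ZIP_DEFLATED) as zf:
            for path in group:
                # Stored under the base name only (see assumptions above).
                zf.write(path, arcname=os.path.basename(path))
        archives.append(name)
    return archives


def verify(archive):
    """Re-read every member and check its CRC, like 'pkunzip -t'.
    ZipFile.testzip() returns None when every member is intact."""
    with zipfile.ZipFile(archive) as zf:
        return zf.testzip() is None
```

Run verify() again on the copy you FTP to your desktop machine, not just on the original, since the point is to catch damage in transfer or on the floppy.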

However, I have no ETA for when I will start backing up your data regularly.

Q. On second thought, I care about more than email. There were some other files I had, that I want to look at. Also, now that I understand the corruption better, I suspect I may have mail folders without names in lost+found. How can I inspect the recovered tree more closely?

If you are comfortable using the UNIX shell and fileutils, you probably wouldn't ask this question--you'd just poke around in ~/restore-19980827 and find what you need.
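Since the Berkeley mail folders pine uses are plain text files that begin with a ``From '' separator line, a short script can point out likely mail folders among the numbered files in lost+found. This is a heuristic sketch, not a pine feature: the function names are hypothetical, and any file that happens to start with ``From '' will be listed too.

```python
import os


def looks_like_mbox(path: str) -> bool:
    """Heuristic: a Berkeley mbox folder begins with a 'From ' separator line."""
    try:
        with open(path, "rb") as fh:
            return fh.read(5) == b"From "
    except OSError:
        return False


def find_mail_folders(root: str) -> list[str]:
    """Walk `root` (for example ~/restore-19980827/lost+found) and return
    the sorted paths of regular files that look like mail folders."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path) and looks_like_mbox(path):
                hits.append(path)
    return sorted(hits)


if __name__ == "__main__":
    lost = os.path.expanduser("~/restore-19980827/lost+found")
    for path in find_mail_folders(lost):
        print(path)
```

Each path printed is a candidate to open in pilot or pine and, if it is a mail folder, to move into your folder collection.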

For those of you that aren't comfortable with the shell, the makers of everyone's-favorite-email-program have written a file browser, too. It's called pilot. Just type 'pilot' at the prompt where you normally type 'pine'.

With pilot, you can navigate graphically into the restore-19980827/lost+found directory and view whatever files you like. The Berkeley mail folders that pine uses by default are ordinary text files, so you can easily view them and see that they contain messages. When you find a mail folder, you can move it into pine's folder collection with the R)ename command. For example:

If you still can't get any of your old email back, send me a message, and I will try to help you. If I can't, at least I'll send you a Tootsie Pop to cheer you up.

Q. How can I recover my pine addressbook?

Your address book is a file named .addressbook which will hopefully show up in the restore directory. If it does, you should be able to see it there with pilot.

If it's there, getting at its contents is easy.

Merging your old address book with your current address book is slightly more difficult, but it _can_ be done. The procedure is predictable and consistent.

You can add the entries all at once if you like, or pine will give you a list of all the entries and let you add them one-at-a-time with (painfully) extensive interaction. Either way, you will be warned before an entry in your existing addressbook is overwritten with an old one.
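If you want to know before merging which old entries pine will warn you about, you can compare nicknames in the two files yourself. The sketch below rests on an assumption about the file format that you should check against your own files: one entry per line, tab-separated, nickname first, with lines that begin with a space continuing the previous entry. It only reads the files and never modifies either address book; the function names are hypothetical.

```python
def read_nicknames(path: str) -> dict[str, str]:
    """Map nickname -> rest of the entry.
    ASSUMPTION: tab-separated fields with the nickname first; a line that
    starts with a space continues the previous entry."""
    entries: dict[str, str] = {}
    last = None
    with open(path, encoding="latin-1") as fh:
        for raw in fh:
            line = raw.rstrip("\n")
            if not line.strip():
                continue  # skip blank lines
            if line.startswith(" ") and last is not None:
                entries[last] += " " + line.strip()  # continuation line
                continue
            nick, _, rest = line.partition("\t")
            entries[nick] = rest
            last = nick
    return entries


def compare_addressbooks(old_path: str, new_path: str):
    """Return (collisions, only_old): nicknames present in both books, where
    pine would warn before overwriting, and nicknames found only in the old
    book, which can be added without conflict."""
    old = read_nicknames(old_path)
    new = read_nicknames(new_path)
    collisions = sorted(set(old) & set(new))
    only_old = sorted(set(old) - set(new))
    return collisions, only_old
```

For example, compare_addressbooks(os.path.expanduser("~/restore-19980827/.addressbook"), os.path.expanduser("~/.addressbook")) would list the nicknames that need a decision during the merge.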

I do not use addressbooks myself, so I tested this procedure with two synthetic pine 4.05 addressbooks. The addressbook format has changed since pine 3.96, and I am not sure how that change affects this procedure. I am optimistic that the procedure will still work, but you can email me if it does not.

When you are done merging, you probably want to go back to S)etup A)ddressbooks and remove your 'restore' addressbook.

Q. How long do I have to get stuff out of the recovered tree before you delete it?

I do not plan to delete the recovered tree. I do not plan to make any further changes to your home directory as a result of this incident. If you feel the directory is clutter, you will have to delete it yourself.


disk recovery FAQ / Miles Nordin <carton@Ivy.NET>
Last update (UTC timezone): $Id: disk-recovery-faq.html,v 1.4 2004/09/08 07:38:51 carton Exp $