mdadm dropped disk, won't re-add
John Paul Adrian Glaubitz
2012-02-15 13:58:42 UTC
Hello,

I have a rather big problem with my Linux software RAID5.

It consists of 4 SATA disks, each 1 TB in size, resulting in a 3 TB RAID5
volume (/dev/md0, assembled from /dev/sd{b,c,d,e}1).

Today, mdadm kicked disk sde1 from the RAID because the cable seemed to
be causing problems. I shut down the machine, replaced the cable and tried
re-adding the disk; however, mdadm refused to add the drive.

So I re-partitioned sde and added sde1 as a new device; mdadm instantly
started rebuilding the RAID. Unfortunately, during the rebuild, mdadm
decided to kick sdc1, and I have now ended up with two failed drives.

I have tried re-adding sdc1 with the --re-add command, but mdadm again
refuses to re-add the drive.

I haven't changed anything since, as I don't know what to do next. I
don't want to cause any further damage to the RAID and hope that someone
knows how to restore it.

My primary question is whether mdadm actually deletes any important data
on the remaining disks (sd{b,c,d}1) while rebuilding or whether it just
writes data to the newly added disk sde1.

mdadm is version 3.2.3, kernel is Linux 3.2.0 on Debian Wheezy.

Can anyone give further advice?

I'm attaching the output of mdadm -E /dev/sd{b,c,d,e}1.
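
When comparing `mdadm -E` output across members, the per-device event counter is usually the first thing to look at, since a member that was kicked earlier typically lags behind the others. A minimal sketch for pulling that number out, assuming the dump contains a line of the form `Events : <number>`; `events_of` is a hypothetical helper name, not part of mdadm:

```shell
#!/bin/sh
# Hypothetical helper: read `mdadm -E` output on stdin, print the event count.
# Assumes a line of the form "Events : <number>" is present in the dump.
events_of() {
    awk -F' : ' '/^[ \t]*Events[ \t]*:/ { gsub(/[ \t]/, "", $2); print $2; exit }'
}

# Usage on the members from this thread (needs root; -E only reads):
#   for d in /dev/sd[bcde]1; do
#       printf '%s %s\n' "$d" "$(mdadm -E "$d" | events_of)"
#   done
```

Members whose counts match (or nearly match) are the ones a forced assembly can normally use.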

Kind Regards,

Adrian
Robin Hill
2012-02-15 14:45:36 UTC
Post by John Paul Adrian Glaubitz
Hello,
I have a rather big problem with my Linux software RAID5.
It consists of 4 SATA disks, each 1 TB in size, resulting in a 3 TB RAID5
volume (/dev/md0, assembled from /dev/sd{b,c,d,e}1).
Today, mdadm kicked disk sde1 from the RAID because the cable seemed to
be causing problems. I shut down the machine, replaced the cable and tried
re-adding the disk; however, mdadm refused to add the drive.
So I re-partitioned sde and added sde1 as a new device; mdadm instantly
started rebuilding the RAID. Unfortunately, during the rebuild, mdadm
decided to kick sdc1, and I have now ended up with two failed drives.
I have tried re-adding sdc1 with the --re-add command, but mdadm again
refuses to re-add the drive.
That's a safety measure. If mdadm can't actually re-add the drive, it
fails rather than silently falling back to an --add (as older mdadm
versions did), which could lose data.
Post by John Paul Adrian Glaubitz
I haven't changed anything since, as I don't know what to do next. I
don't want to cause any further damage to the RAID and hope that someone
knows how to restore it.
My primary question is whether mdadm actually deletes any important data
on the remaining disks (sd{b,c,d}1) while rebuilding or whether it just
writes data to the newly added disk sde1.
It just writes data/parity to the newly added disk. The only writes
to the remaining disks will be if other applications are writing to the
array during the rebuild process.
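
To illustrate why a rebuild only needs to read the surviving members, here is a toy model of a single RAID5 stripe, using shell arithmetic. The byte values are made up for illustration, and real md works on whole chunks with rotating parity, so this is only a sketch of the XOR idea:

```shell
#!/bin/sh
# Toy model of one RAID5 stripe: three data bytes plus their XOR parity.
# The values are invented for illustration only.
d0=165; d1=60; d2=15
parity=$(( d0 ^ d1 ^ d2 ))

# Rebuilding a replaced member only READS the survivors and XORs them;
# here we pretend the disk holding d2 was swapped out and recompute it.
rebuilt_d2=$(( d0 ^ d1 ^ parity ))

echo "parity=$parity rebuilt_d2=$rebuilt_d2"
```

The recomputed value equals the original `d2`, and nothing was written to the other three "members" along the way.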
Post by John Paul Adrian Glaubitz
mdadm is version 3.2.3, kernel is Linux 3.2.0 on Debian Wheezy.
Can anyone give further advice?
What errors does dmesg give about why sdc1 was failed? You'll need to
fix that before you try recovering the array. If it's a drive error then
using ddrescue to clone it (or as much of it as possible) to sde1 would
probably be your best bet, then get a replacement drive.
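
A rough sketch of that diagnosis-and-clone step, for the case where sdc really is failing. It defaults to a dry run that only prints the commands. The `run` and `clone_member` helpers and the map file path `/root/sdc1.map` are assumptions made for illustration; the two-pass pattern (quick copy first, retries second) is a common way to use GNU ddrescue, not something prescribed in this thread:

```shell
#!/bin/sh
# DRY_RUN=1 (the default) only prints the commands instead of running them.
DRY_RUN=${DRY_RUN:-1}
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Clone as much as possible of a failing member onto another partition.
# $1 = source, $2 = destination (will be OVERWRITTEN), $3 = map file.
clone_member() {
    # Pass 1: copy the easy areas quickly (-n), allow a device as output (-f).
    run ddrescue -f -n "$1" "$2" "$3"
    # Pass 2: go back over the bad areas, retrying up to three times (-r3).
    run ddrescue -f -r3 "$1" "$2" "$3"
}

# Look for kernel messages about the kicked disk or its SATA link.
run sh -c 'dmesg | grep -i -E "sdc|ata[0-9]+"'

# Device names are the ones from this thread; the map path is hypothetical.
clone_member /dev/sdc1 /dev/sde1 /root/sdc1.map
```

Set `DRY_RUN=0` only after double-checking source and destination, since swapping them would overwrite the disk you are trying to save.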

Once you've fixed that issue then you should be able to force assemble
the array (mdadm -S /dev/md0; mdadm -Af /dev/md0) and continue/restart
the recovery process. I'd recommend doing a fsck on the filesystem
afterwards as well, especially if you've replaced sdc.

If the force assembly fails then try it with added verbosity (mdadm -S
/dev/md0; mdadm -Afvvv /dev/md0) and post the output from that (and from
dmesg) and hopefully someone will be able to figure out what's going
wrong.
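
The stop / force-assemble / retry-verbosely sequence above could be wrapped up roughly as follows. This is a sketch under assumptions: it defaults to a dry run, `run` and `force_assemble` are hypothetical helper names, and assembling without listing member devices relies on mdadm finding the array via mdadm.conf or its superblocks:

```shell
#!/bin/sh
# DRY_RUN=1 (the default) only prints the commands instead of running them.
DRY_RUN=${DRY_RUN:-1}
run() { if [ "$DRY_RUN" = "1" ]; then echo "+ $*"; else "$@"; fi; }

# Stop the array and force-assemble it (long forms of -S and -Af).
# If that fails, repeat with extra verbosity so the output can be posted.
force_assemble() {
    md=$1
    run mdadm --stop "$md"
    if ! run mdadm --assemble --force "$md"; then
        run mdadm --stop "$md"
        run mdadm --assemble --force --verbose --verbose --verbose "$md"
        return 1
    fi
    # Show array state and recovery progress.
    run cat /proc/mdstat
}

force_assemble /dev/md0
```

Once recovery has finished, a filesystem check on /dev/md0 (for example a read-only `fsck -n` first) would follow, as recommended above.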

Cheers,
Robin
--
___
( ' } | Robin Hill <***@robinhill.me.uk> |
/ / ) | Little Jim says .... |
// !! | "He fallen in de water !!" |
John Paul Adrian Glaubitz
2012-02-15 23:01:26 UTC
Hi,
Post by Robin Hill
Post by John Paul Adrian Glaubitz
I have tried re-adding sdc1 with the --re-add command, but mdadm again
refuses to re-add the drive.
That's a safety measure. If it can't actually re-add the drive then it
fails, rather than changing to do an --add instead (as older mdadm
versions did), potentially losing data.
Aha, thanks for clarifying.
Post by Robin Hill
Post by John Paul Adrian Glaubitz
My primary question is whether mdadm actually deletes any important data
on the remaining disks (sd{b,c,d}1) while rebuilding or whether it just
writes data to the newly added disk sde1.
It just writes data/parity to the newly added disk. The only writes
to the remaining disks will be if other applications are writing to the
array during the rebuild process.
Great :). I was hoping so.
Post by Robin Hill
Post by John Paul Adrian Glaubitz
Can anyone give further advice?
What errors does dmesg give about why sdc1 was failed? You'll need to
fix that before you try recovering the array. If it's a drive error then
using ddrescue to clone it (or as much of it as possible) to sde1 would
probably be your best bet, then get a replacement drive.
Those were errors related to the cable: the SATA link failed. The disk itself
is OK and the SMART log is clean.
Post by Robin Hill
Once you've fixed that issue then you should be able to force assemble
the array (mdadm -S /dev/md0; mdadm -Af /dev/md0) and continue/restart
the recovery process. I'd recommend doing a fsck on the filesystem
afterwards as well, especially if you've replaced sdc.
It did work; the RAID is now rebuilding. I actually had a friend with more
expertise (he is a casual kernel hacker himself) take a look at it, and
he fixed everything.

Basically, he reassembled the array from sd{b,c,d}1 with the --force option,
corrected the partitioning on the sde disk (I created a partition larger than
on the other disks accidentally, so he just copied the partition table from
one of the other disks in the array) and then added sde1 as a new disk.
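
For anyone landing here with the same problem, the steps just described might look roughly like the sketch below. It is a reconstruction, not the exact commands that were run: using `sfdisk` to copy the partition table and picking sdb as the template disk are assumptions, `run` and `recover_with_new_disk` are hypothetical helper names, and the script defaults to a dry run that only prints the commands:

```shell
#!/bin/sh
# DRY_RUN=1 (the default) only prints the commands instead of running them.
DRY_RUN=${DRY_RUN:-1}
run() { if [ "$DRY_RUN" = "1" ]; then echo "+ $*"; else "$@"; fi; }

# $1 = array, $2 = healthy disk used as partition-table template,
# $3 = replacement disk, $4 = partition on it to add to the array,
# remaining arguments = surviving member partitions.
recover_with_new_disk() {
    md=$1; template=$2; newdisk=$3; newpart=$4
    shift 4
    # Bring the array up degraded from the surviving members.
    run mdadm --stop "$md"
    run mdadm --assemble --force "$md" "$@"
    # Copy the partition layout so the new partition matches the others.
    run sh -c "sfdisk -d $template | sfdisk $newdisk"
    # Add the fresh partition; md then starts rebuilding onto it.
    run mdadm "$md" --add "$newpart"
    run cat /proc/mdstat
}

recover_with_new_disk /dev/md0 /dev/sdb /dev/sde /dev/sde1 \
    /dev/sdb1 /dev/sdc1 /dev/sdd1
```

Rewriting a partition table on the wrong disk is destructive, so the device arguments deserve a second look before setting `DRY_RUN=0`.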

The RAID is now rebuilding and will be finished in 4 hours.

Thanks a lot for your quick help!

Adrian