Discussion:
--assume-clean on raid5/6
b***@emc.com
2010-08-06 01:19:25 UTC
Permalink
Hi all,

I've read in the list archives that use of --assume-clean on raid5
(raid6?) is not safe if the member drives are not in sync, but it's
not clear to me why. I can see the content of a written raid5
array change if I fail a drive out of the array (created w/
--assume-clean), but data that I write prior to failing a drive remains
intact. Perhaps I'm missing something. Could somebody elaborate on the
danger/risk of using --assume-clean? Thanks in advance.

Brian
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to ***@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Stefan /*St0fF*/ Hübner
2010-08-07 12:28:55 UTC
Permalink
Hi Brian,

--assume-clean skips the initial resync, which is a time-saving idea if
the next thing you do is create a filesystem on the array. But keep in
mind: unless the disks are known to contain matching data (for example,
all zeros), the parity will probably not be consistent with the data, so
relying on the array's redundancy for reads would be a bad idea.
But if the next thing you do is create LVM/a filesystem etc., then every
bit read from the array will have been written to before (and is
therefore in sync).
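For RAID5's XOR parity specifically, fully zeroed members really would be in sync, since the XOR of all-zero chunks is itself zero; this also bears on Brian's later question about pre-zeroed drives. A quick illustration of the arithmetic (not how md computes parity internally):

```python
def xor_parity(chunks):
    # XOR all data chunks together to form the RAID5 parity chunk
    out = bytearray(len(chunks[0]))
    for chunk in chunks:
        for i, b in enumerate(chunk):
            out[i] ^= b
    return bytes(out)

CHUNK = 16
zeroed = [bytes(CHUNK)] * 3                      # three freshly zeroed data chunks
assert xor_parity(zeroed) == bytes(CHUNK)        # parity is zero: already consistent

dirty = [bytes([1]) * CHUNK, bytes([2]) * CHUNK, bytes([4]) * CHUNK]
assert xor_parity(dirty) == bytes([7]) * CHUNK   # non-zero data needs real parity
```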

Stefan
Post by b***@emc.com
Hi all,
I've read in the list archives that use of --assume-clean on raid5
(raid6?) is not safe if the member drives are not in sync, but it's
not clear to me why. I can see the content of a written raid5
array change if I fail a drive out of the array (created w/
--assume-clean), but data that I write prior to failing a drive remains
intact. Perhaps I'm missing something. Could somebody elaborate on the
danger/risk of using --assume-clean? Thanks in advance.
Brian
Neil Brown
2010-08-08 08:56:02 UTC
Permalink
On Sat, 07 Aug 2010 14:28:55 +0200
Post by Stefan /*St0fF*/ Hübner
Hi Brian,

--assume-clean skips the initial resync, which is a time-saving idea if
the next thing you do is create a filesystem on the array. But keep in
mind: unless the disks are known to contain matching data (for example,
all zeros), the parity will probably not be consistent with the data, so
relying on the array's redundancy for reads would be a bad idea.
But if the next thing you do is create LVM/a filesystem etc., then every
bit read from the array will have been written to before (and is
therefore in sync).
There is an important point that this misses.

When md updates a block on a RAID5 it will sometimes use a
read-modify-write cycle which reads the old block and old parity,
subtracts the old block from the parity block and then adds the new
block to the parity block. Then it writes the new data block and the
new parity block.

If the old parity was correct for the old stripe, then the new parity
will be correct for the new stripe. But if the old was wrong then the
new will be wrong.

So if you use --assume-clean then the parity may well be wrong and
could remain wrong even when you write new data. If you then lose a
device, the data for that device will be computed using wrong parity
and you will get wrong data - hence data corruption.

So you should only use --assume-clean if you know the array really is
'clean'.

RAID1/RAID10 cannot suffer from this, so --assume-clean is quite safe
with those array types.
The current implementation of RAID6 never does read-modify-write, so
--assume-clean is currently safe with RAID6 too. However, I do not
promise that RAID6 might not change to use read-modify-write cycles in
some future implementation. So I would not recommend using
--assume-clean on RAID6 just to avoid the resync cost.
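Neil's argument can be sketched numerically with XOR parity, using single-byte "blocks" for illustration (real md operates on pages and rotates parity across drives):

```python
def rmw_update(parity, old_block, new_block):
    # RAID5 read-modify-write: subtract the old block from parity,
    # then add the new one. For XOR parity both operations are XOR.
    return parity ^ old_block ^ new_block

d0, d1 = 0x0F, 0x33           # two data "blocks" on a 3-drive array
good_parity = d0 ^ d1         # what an initial resync would have written
bad_parity = 0x5A             # stale junk left behind by --assume-clean

new_d0 = 0xAA                 # overwrite d0 via read-modify-write
p_good = rmw_update(good_parity, d0, new_d0)
p_bad = rmw_update(bad_parity, d0, new_d0)

assert p_good == new_d0 ^ d1  # correct parity stays correct
assert p_bad != new_d0 ^ d1   # wrong parity stays wrong after the write

# Now "lose" the drive holding d1 and reconstruct it from the rest:
assert (p_good ^ new_d0) == d1   # clean array: data recovered
assert (p_bad ^ new_d0) != d1    # assume-clean array: wrong data returned
```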

NeilBrown
b***@emc.com
2010-08-08 14:17:01 UTC
Permalink
-----Original Message-----
Sent: Sunday, August 08, 2010 4:56 AM
Subject: Re: --assume-clean on raid5/6
There is an important point that this misses.
When md updates a block on a RAID5 it will sometimes use a
read-modify-write cycle which reads the old block and old parity,
subtracts the old block from the parity block and then adds the new
block to the parity block. Then it writes the new data block and the
new parity block.
If the old parity was correct for the old stripe, then the new parity
will be correct for the new stripe. But if the old was wrong then the
new will be wrong.
So if you use --assume-clean then the parity may well be wrong and
could remain wrong even when you write new data. If you then lose a
device, the data for that device will be computed using wrong parity
and you will get wrong data - hence data corruption.
So you should only use --assume-clean if you know the array really is
'clean'.
Thanks for the information guys. I was actually attempting to test whether this could occur with a high-level sequence similar to the following:

- dd /dev/urandom data to 4 small partitions (~10MB each).
- Create a raid5 with --assume-clean on said partitions.
- Write a small bit of data (32 bytes) to the beginning of the md, capture an image of the md to a file.
- Fail/remove a drive from the md, capture a second md file image.
- cmp the file images to see what changed, and read back the first 32 bytes of data.

In this scenario I do observe differences in the file image, but my data remains intact. I ran this sequence multiple times, each time failing a different drive in the array and also tried to stop/restart the array (with a drop_caches in between) before the drive failure step. This leads to my question: is there a write test that can reproduce data corruption under this scenario, or is the rmw cycle some kind of optimization that is not so deterministic?
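As a thought experiment, the failure mode Neil describes can be forced in a toy simulation of the steps above (a single-parity layout with a fixed parity disk and one chunk per write; real md rotates parity and may service a small write with a reconstruct-write that recomputes parity from the whole stripe rather than a read-modify-write, which is one plausible reason the test above left the written data intact):

```python
import random

random.seed(42)
NDATA, CHUNK, NSTRIPES = 3, 16, 4    # 3 data disks + 1 parity disk (toy layout)

def xor(*chunks):
    out = bytearray(CHUNK)
    for c in chunks:
        for i, b in enumerate(c):
            out[i] ^= b
    return bytes(out)

# "mdadm --create --assume-clean" on dirty partitions: random contents, no resync
disks = [[bytes(random.randrange(256) for _ in range(CHUNK))
          for _ in range(NSTRIPES)] for _ in range(NDATA + 1)]
PARITY = NDATA                       # index of the parity disk (fixed, unlike md)

def write_rmw(stripe, disk, new_chunk):
    # read-modify-write: fold old data out of the parity, fold new data in
    disks[PARITY][stripe] = xor(disks[PARITY][stripe], disks[disk][stripe], new_chunk)
    disks[disk][stripe] = new_chunk

def reconstruct(stripe, failed):
    # rebuild the failed disk's chunk from all surviving chunks
    return xor(*(disks[d][stripe] for d in range(NDATA + 1) if d != failed))

payload = b"0123456789abcdef"
write_rmw(0, 0, payload)             # like writing a few bytes to the start of the md

# Reading the written data back from the surviving disk is fine (the observation above):
assert disks[0][0] == payload

# But fail the drive that holds the payload and rebuild it from parity:
assert reconstruct(0, failed=0) != payload    # corrupted: stale parity was reused

# After a proper resync (recompute parity from the data), reconstruction works:
disks[PARITY][0] = xor(*(disks[d][0] for d in range(NDATA)))
assert reconstruct(0, failed=0) == payload
```

In other words, whether the corruption is visible depends on which update path is taken for a given write, so a passing read-back test does not by itself prove the parity is correct.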

Also out of curiosity, would --assume-clean be safe on a raid5 if the drives were explicitly zeroed beforehand? Thanks again.

Brian
RAID1/RAID10 cannot suffer from this, so --assume-clean is quite safe
with those array types.
The current implementation of RAID6 never does read-modify-write, so
--assume-clean is currently safe with RAID6 too. However, I do not
promise that RAID6 might not change to use read-modify-write cycles in
some future implementation. So I would not recommend using
--assume-clean on RAID6 just to avoid the resync cost.
NeilBrown