Discussion:
error injection
Jojy Varghese
2011-09-28 19:48:37 UTC
Hi
I am trying to dynamically add error injection to my virtual
disk(LVM) for testing+ debugging purpose. I saw "faulty" personality
module in the kernel and was wondering if there was any documentation
on its usage. I am not looking to set up a RAID but a simple mapped
device. So the basic use case is that I need to be able to dynamically
add/remove error sectors and also be able to have granular error
configuration like read error, read+write error etc.

thanks in advance
Jojy
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to ***@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
NeilBrown
2011-09-28 23:11:28 UTC
The 'faulty' md personality is described briefly in the 'md.4' man page which
is included in the mdadm distribution.
I've included the relevant part below.

Configuring the type of faults is described in mdadm.8 under the '-p,
--layout=' section, so you can adjust the settings using mdadm --grow.
For example:
mdadm -B /dev/md0 -l faulty -n1 /dev/sda

will build a 'faulty' device which provides access to /dev/sda, but
can introduce faults. Initially, no faults are introduced.

mdadm -G /dev/md0 --layout=rt400

will tell md0 to generate a read error every 400 requests, but not to
remember the error (rt == read-transient).

mdadm -G /dev/md0 --layout=rp400

will create a persistent error every 400 reads; subsequent reads of the same
block will produce the same error. At most 50 persistent errors can be
recorded.

mdadm -G /dev/md0 --layout=clear

will stop producing new errors.

mdadm -G /dev/md0 --layout=flush

will forget all persistent errors.
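The command sequence above can be scripted. This Python sketch only assembles the argument lists and prints them; it executes them (as root) only when dry_run is False. The helper names faulty_build_cmd, faulty_layout_cmd and run are hypothetical, and the long options --build, --level, --raid-devices, --grow and --layout are the long forms of the -B, -l, -n, -G and -p options used above. Using --build rather than --create avoids writing a superblock to the component device.

```python
import subprocess


def faulty_build_cmd(md_dev, component):
    """Equivalent of: mdadm -B <md_dev> -l faulty -n1 <component> (no superblock)."""
    return ["mdadm", "--build", md_dev, "--level=faulty", "--raid-devices=1", component]


def faulty_layout_cmd(md_dev, layout):
    """Equivalent of: mdadm -G <md_dev> --layout=<layout>, e.g. 'rt400', 'clear', 'flush'."""
    return ["mdadm", "--grow", md_dev, f"--layout={layout}"]


def run(cmd, dry_run=True):
    """Print the command; only execute it (needs root) when dry_run is False."""
    print(" ".join(cmd))
    if not dry_run:
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    md, disk = "/dev/md0", "/dev/sda"      # example devices from the message above
    run(faulty_build_cmd(md, disk))        # no faults initially
    run(faulty_layout_cmd(md, "rt400"))    # transient read error every 400 read requests
    run(faulty_layout_cmd(md, "rp400"))    # persistent read error every 400 read requests
    run(faulty_layout_cmd(md, "clear"))    # stop generating new errors
    run(faulty_layout_cmd(md, "flush"))    # forget remembered persistent errors
```

Keeping the argument lists separate from execution makes it easy to log or review a fault-injection plan before touching a real device.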


from md.4:

FAULTY
The FAULTY md module is provided for testing purposes. A faulty array
has exactly one component device and is normally assembled without a
superblock, so the md array created provides direct access to all of
the data in the component device.

The FAULTY module may be requested to simulate faults to allow testing
of other md levels or of filesystems. Faults can be chosen to trigger
on read requests or write requests, and can be transient (a subsequent
read/write at the address will probably succeed) or persistent
(subsequent read/write of the same address will fail). Further, read
faults can be "fixable" meaning that they persist until a write request
at the same address.

Fault types can be requested with a period. In this case, the fault
will recur repeatedly after the given number of requests of the
relevant type. For example if persistent read faults have a period of
100, then every 100th read request would generate a fault, and the
faulty sector would be recorded so that subsequent reads on that sector
would also fail.

There is a limit to the number of faulty sectors that are remembered.
Faults generated after this limit is exhausted are treated as
transient.

The list of faulty sectors can be flushed, and the active list of
failure modes can be cleared.
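The behaviour described in md.4 (transient, persistent and fixable faults, periods, a cap on remembered sectors, clear and flush) can be illustrated with a small simulation. This is a toy model written from the man-page text, not the kernel's faulty driver: the class FaultyModel, its method names and the max_faults=50 default (taken from the limit mentioned earlier in this message) are assumptions for illustration, and write-all is not modelled.

```python
READ, WRITE = "read", "write"

# Which request type advances each mode's counter, and what kind of fault
# gets remembered for the sector (None = transient, nothing remembered).
MODES = {
    "write-transient": (WRITE, None),
    "write-persistent": (WRITE, "write"),
    "read-transient": (READ, None),
    "read-persistent": (READ, "read"),
    "read-fixable": (READ, "fixable"),
}


class FaultyModel:
    """Toy model of the fault behaviour described in md.4 (not kernel code)."""

    def __init__(self, max_faults=50):
        self.max_faults = max_faults
        self.active = {}  # mode -> [period or None, requests left until next fault]
        self.bad = {}     # sector -> "read" | "write" | "fixable"

    def set_mode(self, mode, period=None):
        if mode not in MODES:
            raise ValueError(f"unknown mode: {mode}")
        # Without a period the fault fires once, on the next relevant request.
        self.active[mode] = [period, period or 1]

    def clear(self):
        """'clear' / 'none': stop generating new faults."""
        self.active.clear()

    def flush(self):
        """'flush': forget all remembered (persistent) faults."""
        self.bad.clear()

    def request(self, op, sector):
        """Return True if the request succeeds, False if it fails."""
        failed = False
        kind = self.bad.get(sector)
        if kind == "fixable" and op == WRITE:
            del self.bad[sector]          # a write repairs a fixable read fault
        elif kind == op or (kind == "fixable" and op == READ):
            failed = True                 # previously recorded persistent fault
        for mode, state in list(self.active.items()):
            mode_op, remember = MODES[mode]
            if mode_op != op:
                continue
            state[1] -= 1
            if state[1] > 0:
                continue
            failed = True
            # Past the limit, new faults are not recorded, i.e. they act as transient.
            if remember and len(self.bad) < self.max_faults:
                self.bad[sector] = remember
            if state[0]:
                state[1] = state[0]       # periodic mode: re-arm the counter
            else:
                del self.active[mode]     # one-shot mode is used up
        return not failed
```

For example, with set_mode("write-persistent", 3) every third write fails and its sector keeps failing on later writes, while reads of that sector still succeed.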


from mdadm.8:

When setting the failure mode for level faulty, the options are:
write-transient, wt, read-transient, rt, write-persistent, wp,
read-persistent, rp, write-all, read-fixable, rf, clear, flush,
none.

Each failure mode can be followed by a number, which is used as
a period between fault generation. Without a number, the fault
is generated once on the first relevant request. With a number,
the fault will be generated after that many requests, and will
continue to be generated every time the period elapses.

Multiple failure modes can be current simultaneously by using
the --grow option to set subsequent failure modes.

"clear" or "none" will remove any pending or periodic failure
modes, and "flush" will clear any persistent faults.
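A layout string is a mode name, optionally followed by a period, so "wp40" means write-persistent with a period of 40. The Python sketch below splits such a string according to the naming rules quoted above. It is a hypothetical client-side check (parse_faulty_layout is not part of mdadm), and mdadm does its own parsing; it only accepts the names listed in this excerpt.

```python
import re

# Short aliases from the mdadm.8 list above, mapped to their long names.
ALIASES = {
    "wt": "write-transient",
    "rt": "read-transient",
    "wp": "write-persistent",
    "rp": "read-persistent",
    "rf": "read-fixable",
    "none": "clear",  # "clear" and "none" have the same effect
}
NAMES = {"write-transient", "read-transient", "write-persistent",
         "read-persistent", "write-all", "read-fixable", "clear", "flush"}

_LAYOUT = re.compile(r"([a-z-]+)(\d*)")


def parse_faulty_layout(text):
    """Split e.g. 'wp40' into ('write-persistent', 40); a period of None means one-shot."""
    m = _LAYOUT.fullmatch(text.strip().lower())
    if not m:
        raise ValueError(f"malformed layout: {text!r}")
    name, digits = m.groups()
    name = ALIASES.get(name, name)
    if name not in NAMES:
        raise ValueError(f"unknown failure mode: {text!r}")
    return name, (int(digits) if digits else None)
```

Checking the string before calling mdadm --grow catches typos such as a stray "=" left over from copying a command out of an encoded email.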



NeilBrown
Jojy Varghese
2011-09-29 00:59:49 UTC
Thanks Neil. I tried setting my sda7 partition to generate write
errors every 40 bytes (writing 1 byte at a time). I did the following:

1. Created an array with:
mdadm -C /dev/md/me0 -l faulty -n1 /dev/sda7

After this step I can see /dev/md127, and when I run mdadm -D /dev/md127,
I get:

/dev/md127:
Version : 1.2
Creation Time : Wed Sep 28 17:35:50 2011
Raid Level : faulty
Array Size : 969410424 (924.50 GiB 992.68 GB)
Raid Devices : 1
Total Devices : 1
Persistence : Superblock is persistent

Update Time : Wed Sep 28 17:35:50 2011
State : clean
Active Devices : 1
Working Devices : 1
Failed Devices : 0
Spare Devices : 0

Name : eng-dev16.lab.local:me0 (local to host eng-dev16.lab.local)
UUID : 96f4be10:312f9574:f40107aa:d9f278ba
Events : 0

Number Major Minor RaidDevice State
0 8 7 0 active sync /dev/sda7


2. Set the write fault mode with:

mdadm -G /dev/md/me0 --layout=wp40



After this, when I write more than 40 bytes to /dev/md127, I don't get any
I/O errors. I am sure I am doing something wrong here.


Any help is much appreciated.

Thanks
Jojy
NeilBrown
2011-09-29 01:08:36 UTC
Post by Jojy Varghese
Thanks Neil. I tried setting my sda7 partition to generate write
md doesn't see byte writes. It sees sectors or more - usually whole pages or
groups of pages.
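A quick way to see why 40 one-byte writes never reach a wp40 threshold is to count which 512-byte sectors the bytes cover. In this Python sketch, sectors_touched is a hypothetical helper that only does the address arithmetic; it does not model the page cache.

```python
SECTOR = 512  # md addresses the device in 512-byte sectors


def sectors_touched(offset, size, count):
    """Number of distinct sectors covered by `count` back-to-back writes of `size` bytes."""
    if size <= 0 or count <= 0:
        return 0
    first = offset // SECTOR
    last = (offset + size * count - 1) // SECTOR
    return last - first + 1


if __name__ == "__main__":
    print(sectors_touched(0, 1, 40))    # 40 one-byte writes -> 1
    print(sectors_touched(0, 512, 40))  # 40 full-sector writes -> 40
```

All 40 bytes fall inside sector 0, so from md's point of view there is a single sector of data; written through the page cache it typically arrives as one write request, far short of the 40 requests that wp40 counts.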
Post by Jojy Varghese
mdadm -C /dev/md/me0 -l faulty -n1 /dev/sda7
-C will write a superblock to /dev/sda7 which you don't really want. It
doesn't hurt, but I always used -B (--build) to avoid any metadata.
Post by Jojy Varghese
mdadm -G /dev/md/me0 --layout=wp40
After this, when I write more than 40 bytes to /dev/md127, I don't get any
I/O errors. I am sure I am doing something wrong here.
When you write to /dev/md127 it will just go into the page cache and
eventually be flushed to the device in one write.
Use O_DIRECT or O_SYNC and it will be flushed out more quickly, but always
write at least 512 bytes at a time.
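Following that advice, a test writer should open the device with O_SYNC (or O_DIRECT) and write whole 512-byte units. The Python sketch below does that; write_blocks is a hypothetical helper, and we assume a failed request surfaces as an OSError (typically EIO) from the write call. It overwrites the target with zeros, so point it only at a scratch faulty device.

```python
import mmap
import os

SECTOR = 512  # md sees I/O in units of 512-byte sectors, never single bytes


def write_blocks(path, nblocks, block_size=SECTOR, direct=False):
    """Synchronously write `nblocks` zero-filled blocks starting at offset 0.

    Returns a list of (block_index, errno) pairs for the writes that failed.
    """
    if block_size <= 0 or block_size % SECTOR:
        raise ValueError("block_size must be a positive multiple of 512")
    flags = os.O_WRONLY | os.O_SYNC      # O_SYNC: each write is pushed to the device
    if direct:
        flags |= os.O_DIRECT             # Linux-only; bypasses the page cache
    fd = os.open(path, flags)
    # Anonymous mmap memory is page-aligned and zero-filled, which also meets
    # the buffer-alignment requirement of O_DIRECT.
    buf = mmap.mmap(-1, block_size)
    failures = []
    try:
        for i in range(nblocks):
            try:
                os.pwrite(fd, buf, i * block_size)
            except OSError as exc:
                failures.append((i, exc.errno))
    finally:
        buf.close()
        os.close(fd)
    return failures
```

For example, after mdadm -G /dev/md0 --layout=wp40, calling write_blocks("/dev/md0", 100, direct=True) would be expected to report roughly every 40th block as failed, provided each block reaches md as a separate write request.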

NeilBrown
Post by Jojy Varghese
Any help is much appreciated.
Thanks
Jojy
Jojy Varghese
2011-09-29 02:06:17 UTC
Thanks Neil. Also, is there any way to find the current fault blocks being set?
NeilBrown
2011-09-29 02:12:26 UTC
Thanks Neil. Also, is there any way to find the current fault blocks being set?
No. All you can get is what is shown in "/proc/mdstat".

It wouldn't be too hard to add something to /proc/mdstat or /sys/.... to show
that information.
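Since /proc/mdstat is the only place the fault settings appear, a script can at least pull out what is shown there. This Python sketch assumes the faulty personality prints tokens of the form Name=N(P) for each active mode plus nfaults=K somewhere in the array's block of lines; that format is our reading of the kernel's faulty driver and may differ between kernel versions, so check it against your own /proc/mdstat. parse_faulty_mdstat is a hypothetical name, and as Neil says, the individual faulty sectors are not listed, only counts.

```python
import re

_MODE = re.compile(r"\b([A-Za-z]+)=(\d+)\((\d+)\)")  # assumed: Name=N(P)
_NFAULTS = re.compile(r"\bnfaults=(\d+)")            # assumed: nfaults=K


def parse_faulty_mdstat(mdstat_text, md_name="md0"):
    """Return ({mode: (N, P)}, nfaults) for md_name, or None if the array is absent."""
    block, inside = [], False
    for line in mdstat_text.splitlines():
        if line.startswith(md_name + " :"):
            inside = True                 # start of this array's entry
        elif inside and not line.strip():
            break                         # a blank line ends the entry
        if inside:
            block.append(line)
    if not block:
        return None
    text = " ".join(block)
    modes = {name: (int(n), int(p)) for name, n, p in _MODE.findall(text)}
    m = _NFAULTS.search(text)
    return modes, (int(m.group(1)) if m else None)
```

On a live system the input is simply the contents of /proc/mdstat, e.g. parse_faulty_mdstat(open("/proc/mdstat").read(), "md127").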

NeilBrown
