Discussion:
Raid5 reshape
Tim
2006-06-16 06:34:21 UTC
Hello all,

I'm sorry if this is a silly question, but I've been digging around for
a few days now and have not found a clear answer, so I'm tossing it out
to those who know it best.

I see that as of a few rc's ago, 2.6.17 has had the capability of adding
additional drives to an active raid 5 array (w/ the proper ver of mdadm,
of course). I cannot, however, for the life of me find out exactly how
one goes about doing it! I would love if someone could give a
step-by-step on what needs to be changed in, say, mdadm.conf (if
anything), and what args you need to throw at mdadm to start the reshape
process.

As a point of reference, here's my current mdadm.conf:


DEVICE /dev/sda1
DEVICE /dev/sdb1
DEVICE /dev/sdc1
ARRAY /dev/md0 devices=/dev/sda1,/dev/sdb1,/dev/sdc1 level=5 num-devices=3


I will be adding the devices /dev/sde1 and /dev/sdf1 (when I can find
out how :)

Thanks in advance,
-Tim

Neil Brown
2006-06-16 06:41:03 UTC
Post by Tim
Hello all,
I'm sorry if this is a silly question, but I've been digging around for
a few days now and have not found a clear answer, so I'm tossing it out
to those who know it best.
I see that as of a few rc's ago, 2.6.17 has had the capability of adding
additional drives to an active raid 5 array (w/ the proper ver of mdadm,
of course). I cannot, however, for the life of me find out exactly how
one goes about doing it! I would love if someone could give a
step-by-step on what needs to be changed in, say, mdadm.conf (if
anything), and what args you need to throw at mdadm to start the reshape
process.
DEVICE /dev/sda1
DEVICE /dev/sdb1
DEVICE /dev/sdc1
ARRAY /dev/md0 devices=/dev/sda1,/dev/sdb1,/dev/sdc1 level=5 num-devices=3
May I suggest:

DEVICE /dev/sd?1
ARRAY /dev/md0 UUID=whatever

it would be a lot safer.
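
If it helps, the UUID can be read straight off the array. As a rough
sketch (the exact output format may vary a little between mdadm versions):

mdadm --detail /dev/md0 | grep UUID

or let mdadm generate the whole ARRAY line for you, which you can then
append to mdadm.conf:

mdadm --detail --scan
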
Post by Tim
I will be adding the devices /dev/sde1 and /dev/sdf1 (when I can find
out how :)
mdadm /dev/md0 --add /dev/sde1 /dev/sdf1
mdadm --grow /dev/md0 --raid-disks=5
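
The reshape then runs in the background and the array stays usable while
it proceeds. To keep an eye on it, the standard status commands (shown
further down this thread) are enough:

cat /proc/mdstat
mdadm --detail /dev/md0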

NeilBrown
Nigel J. Terry
2006-06-16 16:00:49 UTC
Post by Neil Brown
Post by Tim
Hello all,
I'm sorry if this is a silly question, but I've been digging around for
a few days now and have not found a clear answer, so I'm tossing it out
to those who know it best.
I see that as of a few rc's ago, 2.6.17 has had the capability of adding
additional drives to an active raid 5 array (w/ the proper ver of mdadm,
of course). I cannot, however, for the life of me find out exactly how
one goes about doing it! I would love if someone could give a
step-by-step on what needs to be changed in, say, mdadm.conf (if
anything), and what args you need to throw at mdadm to start the reshape
process.
DEVICE /dev/sda1
DEVICE /dev/sdb1
DEVICE /dev/sdc1
ARRAY /dev/md0 devices=/dev/sda1,/dev/sdb1,/dev/sdc1 level=5 num-devices=3
DEVICE /dev/sd?1
ARRAY /dev/md0 UUID=whatever
it would be a lot safer.
Post by Tim
I will be adding the devices /dev/sde1 and /dev/sdf1 (when I can find
out how :)
mdadm /dev/md0 --add /dev/sde1 /dev/sdf1
mdadm --grow /dev/md0 --raid-disks=5
NeilBrown
This might be an even sillier question, but I'll ask it anyway...

If I add a drive to my RAID5 array, what happens to the ext3 filesystem
on top of it? Does it grow automatically? Do I have to take some action
to use the extra space?

Thanks

Nigel
Tim T
2006-06-16 16:14:17 UTC
You have to grow the ext3 fs separately, with ext2resize /dev/mdX. Keep in
mind this can only be done off-line.

-Tim
Post by Nigel J. Terry
Post by Neil Brown
Post by Tim
Hello all,
I'm sorry if this is a silly question, but I've been digging around for
a few days now and have not found a clear answer, so I'm tossing it out
to those who know it best.
I see that as of a few rc's ago, 2.6.17 has had the capability of adding
additional drives to an active raid 5 array (w/ the proper ver of mdadm,
of course). I cannot, however, for the life of me find out exactly how
one goes about doing it! I would love if someone could give a
step-by-step on what needs to be changed in, say, mdadm.conf (if
anything), and what args you need to throw at mdadm to start the reshape
process.
DEVICE /dev/sda1
DEVICE /dev/sdb1
DEVICE /dev/sdc1
ARRAY /dev/md0 devices=/dev/sda1,/dev/sdb1,/dev/sdc1 level=5
num-devices=3
DEVICE /dev/sd?1
ARRAY /dev/md0 UUID=whatever
it would be a lot safer.
Post by Tim
I will be adding the devices /dev/sde1 and /dev/sdf1 (when I can find
out how :)
mdadm /dev/md0 --add /dev/sde1 /dev/sdf1
mdadm --grow /dev/md0 --raid-disks=5
NeilBrown
This might be an even sillier question, but I'll ask it anyway...
If I add a drive to my RAID5 array, what happens to the ext3
filesystem on top of it? Does it grow automatically? Do I have to take
some action to use the extra space?
Thanks
Nigel
Neil Brown
2006-06-16 22:28:41 UTC
Post by Tim T
You have to grow the ext3 fs separately, with ext2resize /dev/mdX. Keep in
mind this can only be done off-line.
ext3 can be resized online. I think ext2resize in the latest release
will "do the right thing" whether it is online or not.

There is a limit to the amount of expansion that can be achieved
on-line. This limit is set when making the filesystem. Depending on
which version of ext2-utils you used to make the filesystem, it may or
may not already be prepared for substantial expansion.

So if you want to do it on-line, give it a try.... or ask on the
ext3-users list for particular details on what versions you need and
how to see if your fs can be expanded.
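
As a rough sketch only (check the man pages for the versions you actually
have installed), the off-line route would be something like:

umount /dev/md0
e2fsck -f /dev/md0
resize2fs /dev/md0
mount /dev/md0 /your/mountpoint

and the on-line route, where supported, is just resize2fs (or ext2online
from the ext2resize package) on the mounted filesystem. With no size
argument resize2fs grows the filesystem to fill the enlarged device.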

NeilBrown
Nigel J. Terry
2006-06-16 22:39:10 UTC
Post by Neil Brown
Post by Tim T
You have to grow the ext3 fs separately, with ext2resize /dev/mdX. Keep in
mind this can only be done off-line.
ext3 can be resized online. I think ext2resize in the latest release
will "do the right thing" whether it is online or not.
There is a limit to the amount of expansion that can be achieved
on-line. This limit is set when making the filesystem. Depending on
which version of ext2-utils you used to make the filesystem, it may or
may not already be prepared for substantial expansion.
So if you want to do it on-line, give it a try.... or ask on the
ext3-users list for particular details on what versions you need and
how to see if your fs can be expanded.
NeilBrown
Thanks for all the advice. One final question, what kernel and mdadm
versions do I need?

Nigel
Neil Brown
2006-06-16 22:44:49 UTC
Post by Nigel J. Terry
Thanks for all the advice. One final question, what kernel and mdadm
versions do I need?
For resizing raid5:

mdadm-2.4 or later
linux-2.6.17-rc2 or later
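
A quick way to check what you are currently running:

uname -r
mdadm --version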

NeilBrown
Nigel J. Terry
2006-06-17 19:21:39 UTC
Post by Neil Brown
Post by Nigel J. Terry
Thanks for all the advice. One final question, what kernel and mdadm
versions do I need?
mdadm-2.4 or later
linux-2.6.17-rc2 or later
NeilBrown
Ok, I tried and screwed up!

I upgraded my kernel and mdadm.
I set the grow going and all looked well, so as it said it was going to
take 430 minutes, I went to Starbucks. When I came home there had been a
power cut, but my UPS had shut the system down. When power returned I
rebooted. Now I think I had failed to set the new partition on /dev/hdc1
to Raid Autodetect, so it didn't find it at reboot. I tried to hot add
it, but now I seem to have a deadlock situation. Although --detail shows
that it is degraded and recovering, /proc/mdstat shows it is reshaping.
In truth there is no disk activity and the count in /proc/mdstat is not
changing. I guess the only good news is that I can still mount the
device and my data is fine. Please see below...
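
(For reference, and only as a sketch: the partition type for in-kernel
autodetection is "fd" (Linux raid autodetect), which can be set with
fdisk's 't' command -- e.g. run "fdisk /dev/hdc", then t, pick the
partition, enter fd, and w to write the table.)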

Any ideas what I should do next? Thanks

Nigel

[***@homepc ~]# uname -a
Linux homepc.nigelterry.net 2.6.17-rc6 #1 SMP Sat Jun 17 11:05:52 EDT
2006 x86_64 x86_64 x86_64 GNU/Linux
[***@homepc ~]# mdadm --version
mdadm - v2.5.1 - 16 June 2006
[***@homepc ~]# mdadm --detail /dev/md0
/dev/md0:
Version : 00.91.03
Creation Time : Tue Apr 18 17:44:34 2006
Raid Level : raid5
Array Size : 490223104 (467.51 GiB 501.99 GB)
Device Size : 245111552 (233.76 GiB 250.99 GB)
Raid Devices : 4
Total Devices : 4
Preferred Minor : 0
Persistence : Superblock is persistent

Update Time : Sat Jun 17 15:15:05 2006
State : clean, degraded, recovering
Active Devices : 3
Working Devices : 4
Failed Devices : 0
Spare Devices : 1

Layout : left-symmetric
Chunk Size : 128K

Reshape Status : 6% complete
Delta Devices : 1, (3->4)

UUID : 50e3173e:b5d2bdb6:7db3576b:644409bb
Events : 0.3211829

Number Major Minor RaidDevice State
0 8 1 0 active sync /dev/sda1
1 8 17 1 active sync /dev/sdb1
2 3 65 2 active sync /dev/hdb1
3 0 0 3 removed

4 22 1 - spare /dev/hdc1
[***@homepc ~]# cat /proc/mdstat
Personalities : [raid5] [raid4]
md0 : active raid5 sdb1[1] sda1[0] hdc1[4](S) hdb1[2]
490223104 blocks super 0.91 level 5, 128k chunk, algorithm 2 [4/3]
[UUU_]
[=>...................] reshape = 6.9% (17073280/245111552)
finish=86.3min speed=44003K/sec

unused devices: <none>
[***@homepc ~]#

Neil Brown
2006-06-17 21:55:11 UTC
Post by Nigel J. Terry
Any ideas what I should do next? Thanks
Looks like you've probably hit a bug. I'll need a bit more info
though.
Post by Nigel J. Terry
Personalities : [raid5] [raid4]
md0 : active raid5 sdb1[1] sda1[0] hdc1[4](S) hdb1[2]
490223104 blocks super 0.91 level 5, 128k chunk, algorithm 2 [4/3]
[UUU_]
[=>...................] reshape = 6.9% (17073280/245111552)
finish=86.3min speed=44003K/sec
unused devices: <none>
This really makes it look like the reshape is progressing. How
long after the reboot was this taken? How long after hdc1 was hot
added (roughly)? What does it show now?

What happens if you remove hdc1 again? Does the reshape keep going?

What I would expect to happen in this case is that the array reshapes
into a degraded array, then the missing disk is recovered onto hdc1.

NeilBrown
Nigel J. Terry
2006-06-17 22:01:46 UTC
Post by Neil Brown
Post by Nigel J. Terry
Any ideas what I should do next? Thanks
Looks like you've probably hit a bug. I'll need a bit more info
though.
Post by Nigel J. Terry
Personalities : [raid5] [raid4]
md0 : active raid5 sdb1[1] sda1[0] hdc1[4](S) hdb1[2]
490223104 blocks super 0.91 level 5, 128k chunk, algorithm 2 [4/3]
[UUU_]
[=>...................] reshape = 6.9% (17073280/245111552)
finish=86.3min speed=44003K/sec
unused devices: <none>
This really makes it look like the reshape is progressing. How
long after the reboot was this taken? How long after hdc1 was hot
added (roughly)? What does it show now?
What happens if you remove hdc1 again? Does the reshape keep going?
What I would expect to happen in this case is that the array reshapes
into a degraded array, then the missing disk is recovered onto hdc1.
NeilBrown
I don't know how long the system was reshaping before the power went
off, and then I had to restart when the power came back. It claimed it
was going to take 430 minutes, so 6% would be about 25 minutes, which
could make good sense; certainly it looked like it was working fine when
I went out.

Now nothing is happening, it shows:

[***@homepc ~]# cat /proc/mdstat
Personalities : [raid5] [raid4]
md0 : active raid5 sdb1[1] sda1[0] hdc1[4](S) hdb1[2]
490223104 blocks super 0.91 level 5, 128k chunk, algorithm 2 [4/3]
[UUU_]
[=>...................] reshape = 6.9% (17073280/245111552)
finish=2281.2min speed=1665K/sec

unused devices: <none>
[***@homepc ~]#

so the only thing changing is the time till finish.

I'll try removing and adding /dev/hdc1 again. Will it make any
difference if the device is mounted or not?

Nigel
Nigel J. Terry
2006-06-17 22:05:12 UTC
Post by Nigel J. Terry
Post by Neil Brown
Post by Nigel J. Terry
Any ideas what I should do next? Thanks
Looks like you've probably hit a bug. I'll need a bit more info
though.
Post by Nigel J. Terry
Personalities : [raid5] [raid4]
md0 : active raid5 sdb1[1] sda1[0] hdc1[4](S) hdb1[2]
490223104 blocks super 0.91 level 5, 128k chunk, algorithm 2
[4/3] [UUU_]
[=>...................] reshape = 6.9% (17073280/245111552)
finish=86.3min speed=44003K/sec
unused devices: <none>
This really makes it look like the reshape is progressing. How
long after the reboot was this taken? How long after hdc1 was hot
added (roughly)? What does it show now?
What happens if you remove hdc1 again? Does the reshape keep going?
What I would expect to happen in this case is that the array reshapes
into a degraded array, then the missing disk is recovered onto hdc1.
NeilBrown
I don't know how long the system was reshaping before the power went
off, and then I had to restart when the power came back. It claimed it
was going to take 430 minutes, so 6% would be about 25 minutes, which
could make good sense; certainly it looked like it was working fine
when I went out.
Personalities : [raid5] [raid4]
md0 : active raid5 sdb1[1] sda1[0] hdc1[4](S) hdb1[2]
490223104 blocks super 0.91 level 5, 128k chunk, algorithm 2
[4/3] [UUU_]
[=>...................] reshape = 6.9% (17073280/245111552)
finish=2281.2min speed=1665K/sec
unused devices: <none>
so the only thing changing is the time till finish.
I'll try removing and adding /dev/hdc1 again. Will it make any
difference if the device is mounted or not?
Nigel
Tried remove and add, made no difference:
[***@homepc ~]# mdadm /dev/md0 --remove /dev/hdc1
mdadm: hot removed /dev/hdc1
[***@homepc ~]# cat /proc/mdstat
Personalities : [raid5] [raid4]
md0 : active raid5 sdb1[1] sda1[0] hdb1[2]
490223104 blocks super 0.91 level 5, 128k chunk, algorithm 2 [4/3]
[UUU_]
[=>...................] reshape = 6.9% (17073280/245111552)
finish=2321.5min speed=1636K/sec

unused devices: <none>
[***@homepc ~]# mdadm /dev/md0 --add /dev/hdc1
mdadm: re-added /dev/hdc1
[***@homepc ~]# cat /proc/mdstat
Personalities : [raid5] [raid4]
md0 : active raid5 hdc1[4](S) sdb1[1] sda1[0] hdb1[2]
490223104 blocks super 0.91 level 5, 128k chunk, algorithm 2 [4/3]
[UUU_]
[=>...................] reshape = 6.9% (17073280/245111552)
finish=2329.3min speed=1630K/sec

unused devices: <none>
[***@homepc ~]#

Neil Brown
2006-06-17 22:17:26 UTC
OK, thanks for the extra details. I'll have a look and see what I can
find, but it'll probably be a couple of days before I have anything
useful for you.

NeilBrown
Nigel J. Terry
2006-06-17 22:19:46 UTC
Post by Neil Brown
OK, thanks for the extra details. I'll have a look and see what I can
find, but it'll probably be a couple of days before I have anything
useful for you.
NeilBrown
OK, I'll try and be patient :-) At least everything else is working.

Let me know if you need to ssh to my machine.

Nigel
Nigel J. Terry
2006-06-18 12:57:27 UTC
Post by Neil Brown
OK, thanks for the extra details. I'll have a look and see what I can
find, but it'll probably be a couple of days before I have anything
useful for you.
NeilBrown
This from dmesg might help diagnose the problem:

md: Autodetecting RAID arrays.
md: autorun ...
md: considering sdb1 ...
md: adding sdb1 ...
md: adding sda1 ...
md: adding hdc1 ...
md: adding hdb1 ...
md: created md0
md: bind<hdb1>
md: bind<hdc1>
md: bind<sda1>
md: bind<sdb1>
md: running: <sdb1><sda1><hdc1><hdb1>
raid5: automatically using best checksumming function: generic_sse
generic_sse: 6795.000 MB/sec
raid5: using function: generic_sse (6795.000 MB/sec)
md: raid5 personality registered for level 5
md: raid4 personality registered for level 4
raid5: reshape will continue
raid5: device sdb1 operational as raid disk 1
raid5: device sda1 operational as raid disk 0
raid5: device hdb1 operational as raid disk 2
raid5: allocated 4268kB for md0
raid5: raid level 5 set md0 active with 3 out of 4 devices, algorithm 2
RAID5 conf printout:
--- rd:4 wd:3 fd:1
disk 0, o:1, dev:sda1
disk 1, o:1, dev:sdb1
disk 2, o:1, dev:hdb1
...ok start reshape thread
md: syncing RAID array md0
md: minimum _guaranteed_ reconstruction speed: 1000 KB/sec/disc.
md: using maximum available idle IO bandwidth (but not more than 200000
KB/sec) for reconstruction.
md: using 128k window, over a total of 245111552 blocks.
Unable to handle kernel NULL pointer dereference at 0000000000000000 RIP:
<0000000000000000>{stext+2145382632}
PGD 7c3f9067 PUD 7cb9e067 PMD 0
Oops: 0010 [1] SMP
CPU 0
Modules linked in: raid5 xor usb_storage video button battery ac lp
parport_pc parport floppy nvram snd_intel8x0 snd_ac97_codec snd_ac97_bus
snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device
snd_pcm_oss snd_mixer_oss ehci_hcd ohci1394 ieee1394 sg snd_pcm uhci_hcd
i2c_nforce2 i2c_core forcedeth ohci_hcd snd_timer snd soundcore
snd_page_alloc dm_snapshot dm_zero dm_mirror dm_mod ext3 jbd sata_nv
libata sd_mod scsi_mod
Pid: 1432, comm: md0_reshape Not tainted 2.6.17-rc6 #1
RIP: 0010:[<0000000000000000>] <0000000000000000>{stext+2145382632}
RSP: 0000:ffff81007aa43d60 EFLAGS: 00010246
RAX: ffff81007cf72f20 RBX: ffff81007c682000 RCX: 0000000000000006
RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff81007cf72f20
RBP: 0000000002090900 R08: 0000000000000000 R09: ffff810037f497b0
R10: 0000000b44ffd564 R11: ffffffff8022c92a R12: 0000000000000000
R13: 0000000000000100 R14: 0000000000000000 R15: 0000000000000000
FS: 000000000066d870(0000) GS:ffffffff80611000(0000) knlGS:0000000000000000
CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 0000000000000000 CR3: 000000007bebc000 CR4: 00000000000006e0
Process md0_reshape (pid: 1432, threadinfo ffff81007aa42000, task
ffff810037f497b0)
Stack: ffffffff803dce42 0000000000000000 000000001d383600 0000000000000000
0000000000000000 0000000000000000 0000000000000000 0000000000000000
0000000000000000 0000000000000000
Call Trace: <ffffffff803dce42>{md_do_sync+1307}
<ffffffff802640c0>{thread_return+0}
<ffffffff8026411e>{thread_return+94}
<ffffffff8029925d>{keventd_create_kthread+0}
<ffffffff803dd3d9>{md_thread+248}
<ffffffff8029925d>{keventd_create_kthread+0}
<ffffffff803dd2e1>{md_thread+0} <ffffffff80232cb1>{kthread+254}
<ffffffff8026051e>{child_rip+8}
<ffffffff8029925d>{keventd_create_kthread+0}
<ffffffff802640c0>{thread_return+0} <ffffffff80232bb3>{kthread+0}
<ffffffff80260516>{child_rip+0}

Code: Bad RIP value.
RIP <0000000000000000>{stext+2145382632} RSP <ffff81007aa43d60>
CR2: 0000000000000000
<6>md: ... autorun DONE.
Neil Brown
2006-06-19 02:03:29 UTC
Yes, that helps a lot, thanks.

The problem is that the reshape thread is restarting before the array
is fully set-up, so it ends up dereferencing a NULL pointer.

This patch should fix it.
In fact, there is a small chance that next time you boot it will work
without this patch, but the patch makes it more reliable.

There definitely should be no data-loss due to this bug.

Thanks,
NeilBrown



### Diffstat output
./drivers/md/md.c | 6 ++++--
./drivers/md/raid5.c | 3 ---
2 files changed, 4 insertions(+), 5 deletions(-)

diff .prev/drivers/md/md.c ./drivers/md/md.c
--- .prev/drivers/md/md.c 2006-05-30 15:07:14.000000000 +1000
+++ ./drivers/md/md.c 2006-06-19 12:01:47.000000000 +1000
@@ -2719,8 +2719,6 @@ static int do_md_run(mddev_t * mddev)
}

set_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
- md_wakeup_thread(mddev->thread);
-
if (mddev->sb_dirty)
md_update_sb(mddev);

@@ -2738,6 +2736,10 @@ static int do_md_run(mddev_t * mddev)

mddev->changed = 1;
md_new_event(mddev);
+
+ md_wakeup_thread(mddev->thread);
+ md_wakeup_thread(mddev->sync_thread);
+
return 0;
}


diff .prev/drivers/md/raid5.c ./drivers/md/raid5.c
--- .prev/drivers/md/raid5.c 2006-06-19 11:56:41.000000000 +1000
+++ ./drivers/md/raid5.c 2006-06-19 11:56:44.000000000 +1000
@@ -2373,9 +2373,6 @@ static int run(mddev_t *mddev)
set_bit(MD_RECOVERY_RUNNING, &mddev->recovery);
mddev->sync_thread = md_register_thread(md_do_sync, mddev,
"%s_reshape");
- /* FIXME if md_register_thread fails?? */
- md_wakeup_thread(mddev->sync_thread);
-
}

/* read-ahead size must cover two whole stripes, which is
Nigel J. Terry
2006-06-19 21:42:41 UTC
Post by Neil Brown
Yes, that helps a lot, thanks.
The problem is that the reshape thread is restarting before the array
is fully set-up, so it ends up dereferencing a NULL pointer.
This patch should fix it.
In fact, there is a small chance that next time you boot it will work
without this patch, but the patch makes it more reliable.
There definitely should be no data-loss due to this bug.
Thanks,
NeilBrown
Neil

That seems to have fixed it. The reshape is now progressing and there are no apparent errors in dmesg. Details below.

I'll send another confirmation tomorrow when hopefully it has finished :-)

Many thanks for a great product and great support.

Nigel

[***@homepc ~]# cat /proc/mdstat
Personalities : [raid5] [raid4]
md0 : active raid5 sdb1[1] sda1[0] hdc1[4](S) hdb1[2]
490223104 blocks super 0.91 level 5, 128k chunk, algorithm 2 [4/3]
[UUU_]
[=>...................] reshape = 7.9% (19588744/245111552)
finish=6.4min speed=578718K/sec

unused devices: <none>
[***@homepc ~]# mdadm --detail /dev/md0
/dev/md0:
Version : 00.91.03
Creation Time : Tue Apr 18 17:44:34 2006
Raid Level : raid5
Array Size : 490223104 (467.51 GiB 501.99 GB)
Device Size : 245111552 (233.76 GiB 250.99 GB)
Raid Devices : 4
Total Devices : 4
Preferred Minor : 0
Persistence : Superblock is persistent

Update Time : Mon Jun 19 17:38:42 2006
State : clean, degraded, recovering
Active Devices : 3
Working Devices : 4
Failed Devices : 0
Spare Devices : 1

Layout : left-symmetric
Chunk Size : 128K

Reshape Status : 8% complete
Delta Devices : 1, (3->4)

UUID : 50e3173e:b5d2bdb6:7db3576b:644409bb
Events : 0.3287189

Number Major Minor RaidDevice State
0 8 1 0 active sync /dev/sda1
1 8 17 1 active sync /dev/sdb1
2 3 65 2 active sync /dev/hdb1
3 0 0 3 removed

4 22 1 - spare /dev/hdc1
[***@homepc ~]#

Neil Brown
2006-06-19 22:46:28 UTC
Post by Nigel J. Terry
That seems to have fixed it. The reshape is now progressing and
there are no apparent errors in dmesg. Details below.
Great!
Post by Nigel J. Terry
I'll send another confirmation tomorrow when hopefully it has finished :-)
Many thanks for a great product and great support.
And thank you for being a patient beta-tester!

NeilBrown
Nigel J. Terry
2006-06-19 22:53:37 UTC
Post by Neil Brown
Post by Nigel J. Terry
That seems to have fixed it. The reshape is now progressing and
there are no apparent errors in dmesg. Details below.
Great!
Post by Nigel J. Terry
I'll send another confirmation tomorrow when hopefully it has finished :-)
Many thanks for a great product and great support.
And thank you for being a patient beta-tester!
NeilBrown
Neil - I see myself more as being an "idiot-proof" tester than a
beta-tester...

One comment - As I look at the rebuild, which is now over 20%, the time
till finish makes no sense. It did make sense when the first reshape
started. I guess your estimating / averaging algorithm doesn't work for
a restarted reshape. A minor cosmetic issue - see below

Nigel
[***@homepc ~]$ cat /proc/mdstat
Personalities : [raid5] [raid4]
md0 : active raid5 sdb1[1] sda1[0] hdc1[4](S) hdb1[2]
490223104 blocks super 0.91 level 5, 128k chunk, algorithm 2 [4/3]
[UUU_]
[====>................] reshape = 22.7% (55742816/245111552)
finish=5.8min speed=542211K/sec

unused devices: <none>
[***@homepc ~]$


Mike Hardy
2006-06-19 23:29:53 UTC
Post by Nigel J. Terry
One comment - As I look at the rebuild, which is now over 20%, the time
till finish makes no sense. It did make sense when the first reshape
started. I guess your estimating / averaging algorithm doesn't work for
a restarted reshape. A minor cosmetic issue - see below
Nigel
Personalities : [raid5] [raid4]
md0 : active raid5 sdb1[1] sda1[0] hdc1[4](S) hdb1[2]
490223104 blocks super 0.91 level 5, 128k chunk, algorithm 2 [4/3]
[UUU_]
[====>................] reshape = 22.7% (55742816/245111552)
finish=5.8min speed=542211K/sec
Unless something has changed recently the parity-rebuild-interrupted /
restarted-parity-rebuild case shows the same behavior.

It's probably the same chunk of code (I haven't looked, bad hacker!
bad!), but I thought I'd mention it in case Neil goes looking.

The "speed" is truly impressive though. I'll almost be sorry to see it
fixed :-)

-Mike
Nigel J. Terry
2006-06-19 23:33:07 UTC
Post by Mike Hardy
Unless something has changed recently the parity-rebuild-interrupted /
restarted-parity-rebuild case shows the same behavior.
It's probably the same chunk of code (I haven't looked, bad hacker!
bad!), but I thought I'd mention it in case Neil goes looking.
The "speed" is truly impressive though. I'll almost be sorry to see it
fixed :-)
-Mike
I'd love to agree about the speed, but this has been the longest 5.8
minutes of my life... :-)
Neil Brown
2006-06-19 23:33:15 UTC
Post by Nigel J. Terry
One comment - As I look at the rebuild, which is now over 20%, the time
till finish makes no sense. It did make sense when the first reshape
started. I guess your estimating / averaging algorithm doesn't work for
a restarted reshape. A minor cosmetic issue - see below
Nigel
Personalities : [raid5] [raid4]
md0 : active raid5 sdb1[1] sda1[0] hdc1[4](S) hdb1[2]
490223104 blocks super 0.91 level 5, 128k chunk, algorithm 2 [4/3]
[UUU_]
[====>................] reshape = 22.7% (55742816/245111552)
finish=5.8min speed=542211K/sec
Hmmm..... I see.
This should fix that, but I don't expect you to interrupt your reshape
to try it.

Thanks,
NeilBrown


### Diffstat output
./drivers/md/md.c | 8 +++++---
./include/linux/raid/md_k.h | 3 ++-
2 files changed, 7 insertions(+), 4 deletions(-)

diff .prev/drivers/md/md.c ./drivers/md/md.c
--- .prev/drivers/md/md.c 2006-06-19 11:52:55.000000000 +1000
+++ ./drivers/md/md.c 2006-06-20 09:30:57.000000000 +1000
@@ -2717,7 +2717,7 @@ static ssize_t
sync_speed_show(mddev_t *mddev, char *page)
{
unsigned long resync, dt, db;
- resync = (mddev->curr_resync - atomic_read(&mddev->recovery_active));
+ resync = (mddev->curr_mark_cnt - atomic_read(&mddev->recovery_active));
dt = ((jiffies - mddev->resync_mark) / HZ);
if (!dt) dt++;
db = resync - (mddev->resync_mark_cnt);
@@ -4688,8 +4688,9 @@ static void status_resync(struct seq_fil
*/
dt = ((jiffies - mddev->resync_mark) / HZ);
if (!dt) dt++;
- db = resync - (mddev->resync_mark_cnt/2);
- rt = (dt * ((unsigned long)(max_blocks-resync) / (db/100+1)))/100;
+ db = (mddev->curr_mark_cnt - atomic_read(&mddev->recovery_active))
+ - mddev->resync_mark_cnt;
+ rt = (dt/2 * ((unsigned long)(max_blocks-resync) / (db/100+1)))/100;

seq_printf(seq, " finish=%lu.%lumin", rt / 60, (rt % 60)/6);

@@ -5204,6 +5205,7 @@ void md_do_sync(mddev_t *mddev)

j += sectors;
if (j>1) mddev->curr_resync = j;
+ mddev->curr_mark_cnt = io_sectors;
if (last_check == 0)
/* this is the earliers that rebuilt will be
* visible in /proc/mdstat

diff .prev/include/linux/raid/md_k.h ./include/linux/raid/md_k.h
--- .prev/include/linux/raid/md_k.h 2006-06-20 09:31:22.000000000 +1000
+++ ./include/linux/raid/md_k.h 2006-06-20 09:31:58.000000000 +1000
@@ -148,9 +148,10 @@ struct mddev_s

struct mdk_thread_s *thread; /* management thread */
struct mdk_thread_s *sync_thread; /* doing resync or reconstruct */
- sector_t curr_resync; /* blocks scheduled */
+ sector_t curr_resync; /* last block scheduled */
unsigned long resync_mark; /* a recent timestamp */
sector_t resync_mark_cnt;/* blocks written at resync_mark */
+ sector_t curr_mark_cnt; /* blocks scheduled now */

sector_t resync_max_sectors; /* may be set by personality */

Nigel J. Terry
2006-06-19 23:35:10 UTC
Post by Neil Brown
Post by Nigel J. Terry
One comment - As I look at the rebuild, which is now over 20%, the time
till finish makes no sense. It did make sense when the first reshape
started. I guess your estimating / averaging algorithm doesn't work for
a restarted reshape. A minor cosmetic issue - see below
Nigel
Personalities : [raid5] [raid4]
md0 : active raid5 sdb1[1] sda1[0] hdc1[4](S) hdb1[2]
490223104 blocks super 0.91 level 5, 128k chunk, algorithm 2 [4/3]
[UUU_]
[====>................] reshape = 22.7% (55742816/245111552)
finish=5.8min speed=542211K/sec
Hmmm..... I see.
This should fix that, but I don't expect you to interrupt your reshape
to try it.
Thanks,
NeilBrown
I have nothing better to do; I'll give it a go and let you know...
Nigel J. Terry
2006-06-20 10:35:29 UTC

Well good news and bad news I'm afraid...

Well, I would like to be able to tell you that the time calculation now
works, but I can't. Here's why: when I rebooted with the newly built
kernel, it hit the magic 21 reboots and hence decided to check the array
for clean. This normally takes about 5-10 minutes, but this time it took
several hours, so I went to bed! I suspect that it was doing the full
reshape or something similar at boot time.

Now I am not sure that this makes good sense in a normal environment.
This could keep a server down for hours or days. I might suggest that if
such work is required, the clean check be postponed until the next boot and
the reshape allowed to continue in the background.

Anyway the good news is that this morning, all is well, the array is
clean and grown as can be seen below. However, if you look further below
you will see the section from dmesg which still shows RIP errors, so I
guess there is still something wrong, even though it looks like it is
working. Let me know if I can provide any more information.

Once again, many thanks. All I need to do now is grow the ext3 filesystem...

Nigel

[***@homepc ~]# mdadm --detail /dev/md0
/dev/md0:
Version : 00.90.03
Creation Time : Tue Apr 18 17:44:34 2006
Raid Level : raid5
Array Size : 735334656 (701.27 GiB 752.98 GB)
Device Size : 245111552 (233.76 GiB 250.99 GB)
Raid Devices : 4
Total Devices : 4
Preferred Minor : 0
Persistence : Superblock is persistent

Update Time : Tue Jun 20 06:27:49 2006
State : clean
Active Devices : 4
Working Devices : 4
Failed Devices : 0
Spare Devices : 0

Layout : left-symmetric
Chunk Size : 128K

UUID : 50e3173e:b5d2bdb6:7db3576b:644409bb
Events : 0.3366644

Number Major Minor RaidDevice State
0 8 1 0 active sync /dev/sda1
1 8 17 1 active sync /dev/sdb1
2 3 65 2 active sync /dev/hdb1
3 22 1 3 active sync /dev/hdc1
[***@homepc ~]# cat /proc/mdstat
Personalities : [raid5] [raid4]
md0 : active raid5 sdb1[1] sda1[0] hdc1[3] hdb1[2]
735334656 blocks level 5, 128k chunk, algorithm 2 [4/4] [UUUU]

unused devices: <none>
[***@homepc ~]#

But from dmesg:

md: Autodetecting RAID arrays.
md: autorun ...
md: considering sdb1 ...
md: adding sdb1 ...
md: adding sda1 ...
md: adding hdc1 ...
md: adding hdb1 ...
md: created md0
md: bind<hdb1>
md: bind<hdc1>
md: bind<sda1>
md: bind<sdb1>
md: running: <sdb1><sda1><hdc1><hdb1>
raid5: automatically using best checksumming function: generic_sse
generic_sse: 6795.000 MB/sec
raid5: using function: generic_sse (6795.000 MB/sec)
md: raid5 personality registered for level 5
md: raid4 personality registered for level 4
raid5: reshape will continue
raid5: device sdb1 operational as raid disk 1
raid5: device sda1 operational as raid disk 0
raid5: device hdb1 operational as raid disk 2
raid5: allocated 4268kB for md0
raid5: raid level 5 set md0 active with 3 out of 4 devices, algorithm 2
RAID5 conf printout:
--- rd:4 wd:3 fd:1
disk 0, o:1, dev:sda1
disk 1, o:1, dev:sdb1
disk 2, o:1, dev:hdb1
...ok start reshape thread
md: syncing RAID array md0
md: minimum _guaranteed_ reconstruction speed: 1000 KB/sec/disc.
md: using maximum available idle IO bandwidth (but not more than 200000
KB/sec) for reconstruction.
md: using 128k window, over a total of 245111552 blocks.
Unable to handle kernel NULL pointer dereference at 0000000000000000 RIP:
<0000000000000000>{stext+2145382632}
PGD 7c3f9067 PUD 7cb9e067 PMD 0
Oops: 0010 [1] SMP
CPU 0
Modules linked in: raid5 xor usb_storage video button battery ac lp
parport_pc parport floppy nvram snd_intel8x0 snd_ac97_codec snd_ac97_bus
snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device
snd_pcm_oss snd_mixer_oss ehci_hcd ohci1394 ieee1394 sg snd_pcm uhci_hcd
i2c_nforce2 i2c_core forcedeth ohci_hcd snd_timer snd soundcore
snd_page_alloc dm_snapshot dm_zero dm_mirror dm_mod ext3 jbd sata_nv
libata sd_mod scsi_mod
Pid: 1432, comm: md0_reshape Not tainted 2.6.17-rc6 #1
RIP: 0010:[<0000000000000000>] <0000000000000000>{stext+2145382632}
RSP: 0000:ffff81007aa43d60 EFLAGS: 00010246
RAX: ffff81007cf72f20 RBX: ffff81007c682000 RCX: 0000000000000006
RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff81007cf72f20
RBP: 0000000002090900 R08: 0000000000000000 R09: ffff810037f497b0
R10: 0000000b44ffd564 R11: ffffffff8022c92a R12: 0000000000000000
R13: 0000000000000100 R14: 0000000000000000 R15: 0000000000000000
FS: 000000000066d870(0000) GS:ffffffff80611000(0000) knlGS:0000000000000000
CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 0000000000000000 CR3: 000000007bebc000 CR4: 00000000000006e0
Process md0_reshape (pid: 1432, threadinfo ffff81007aa42000, task
ffff810037f497b0)
Stack: ffffffff803dce42 0000000000000000 000000001d383600 0000000000000000
0000000000000000 0000000000000000 0000000000000000 0000000000000000
0000000000000000 0000000000000000
Call Trace: <ffffffff803dce42>{md_do_sync+1307}
<ffffffff802640c0>{thread_return+0}
<ffffffff8026411e>{thread_return+94}
<ffffffff8029925d>{keventd_create_kthread+0}
<ffffffff803dd3d9>{md_thread+248}
<ffffffff8029925d>{keventd_create_kthread+0}
<ffffffff803dd2e1>{md_thread+0} <ffffffff80232cb1>{kthread+254}
<ffffffff8026051e>{child_rip+8}
<ffffffff8029925d>{keventd_create_kthread+0}
<ffffffff802640c0>{thread_return+0} <ffffffff80232bb3>{kthread+0}
<ffffffff80260516>{child_rip+0}

Code: Bad RIP value.
RIP <0000000000000000>{stext+2145382632} RSP <ffff81007aa43d60>
CR2: 0000000000000000
<6>md: ... autorun DONE.
Neil Brown
2006-06-21 06:02:39 UTC
Post by Nigel J. Terry
Well good news and bad news I'm afraid...
Well, I would like to be able to tell you that the time calculation now
works, but I can't. Here's why: when I rebooted with the newly built
kernel, it hit the magic 21 reboots and hence decided to check the array
for clean. This normally takes about 5-10 minutes, but this time it took
several hours, so I went to bed! I suspect that it was doing the full
reshape or something similar at boot time.
What "magic 21 reboots"?? md has no mechanism to automatically check
the array after N reboots or anything like that. Or are you thinking
of the 'fsck' that does a full check every so-often?
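
That periodic check is an ext3/e2fsprogs feature rather than md; tune2fs
shows and controls it. For example (hypothetically, on this array):

tune2fs -l /dev/md0 | grep -i 'mount count'
tune2fs -c 0 /dev/md0    # disable the every-N-mounts check; use with care
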
Post by Nigel J. Terry
Now I am not sure that this makes good sense in a normal environment.
This could keep a server down for hours or days. I might suggest that if
such work is required, the clean check be postponed until the next boot and
the reshape allowed to continue in the background.
An fsck cannot tell if there is a reshape happening, but the reshape
should notice the fsck and slow down to a crawl so the fsck can complete...
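
The throttling is driven by the md sysctls; roughly:

cat /proc/sys/dev/raid/speed_limit_min    # the guaranteed floor, 1000 KB/sec by default
cat /proc/sys/dev/raid/speed_limit_max    # the ceiling when the array is otherwise idle

which correspond to the "minimum _guaranteed_" and "not more than" numbers
in the dmesg output above.
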
Post by Nigel J. Terry
Anyway the good news is that this morning, all is well, the array is
clean and grown as can be seen below. However, if you look further below
you will see the section from dmesg which still shows RIP errors, so I
guess there is still something wrong, even though it looks like it is
working. Let me know if I can provide any more information.
Once again, many thanks. All I need to do now is grow the ext3 filesystem...
.....
Post by Nigel J. Terry
...ok start reshape thread
md: syncing RAID array md0
md: minimum _guaranteed_ reconstruction speed: 1000 KB/sec/disc.
md: using maximum available idle IO bandwidth (but not more than 200000
KB/sec) for reconstruction.
md: using 128k window, over a total of 245111552 blocks.
<0000000000000000>{stext+2145382632}
PGD 7c3f9067 PUD 7cb9e067 PMD 0
....
Post by Nigel J. Terry
Process md0_reshape (pid: 1432, threadinfo ffff81007aa42000, task
ffff810037f497b0)
Stack: ffffffff803dce42 0000000000000000 000000001d383600 0000000000000000
0000000000000000 0000000000000000 0000000000000000 0000000000000000
0000000000000000 0000000000000000
Call Trace: <ffffffff803dce42>{md_do_sync+1307}
<ffffffff802640c0>{thread_return+0}
<ffffffff8026411e>{thread_return+94}
<ffffffff8029925d>{keventd_create_kthread+0}
<ffffffff803dd3d9>{md_thread+248}
That looks very much like the bug that I already sent you a patch for!
Are you sure that the new kernel still had this patch?

I'm a bit confused by this....

NeilBrown
Nigel J. Terry
2006-06-24 22:58:59 UTC
Neil

Well I did warn you that I was an idiot... :-) I have been attempting to
work out exactly what I did and what happened. All I have learned is
that I need to keep better notes.

Yes, the 21 mounts is an fsck, nothing to do with RAID. However, it is
still noteworthy that this took several hours to complete with the raid
also reshaping rather than the few minutes I have seen in the past. Some
kind of interaction there.

I think that the kernel I was using had both the fixes you had sent me
in it, but I honestly can't be sure - Sorry. In the past, that bug
caused it to fail immediately and the reshape to freeze. This appeared
to occur after the reshape, maybe a problem at the end of the reshape
process. Probably, however, I screwed up, and I have no way to retest.

Finally, just a note to say that the system continues to work just fine
and I am really impressed. Thanks again.

Nigel
