Discussion:
Fixes for DDF with LSI BIOS RAID
m***@arcor.de
12 years ago
I have been trying to run MD with a BIOS RAID from LSI (LSI Mega Software RAID).
While md nicely detected the RAID setup (2 RAID-1 volumes) in the first place,
I found a few problems. Most importantly, MD would start an initialization after
every reboot. This is not a systemd issue here, I was working with an old distro
with SysV init (but current mdadm). The problem was that the "init_state" flag
was reset after every boot (more precisely, the BIOS restored a different DDF
structure with the init_state flag cleared, ignoring the value mdadm had set).
The second patch in the series is the one that solves this problem. The other two
are enhancements.

Please review.

Martin

PS: These patches fix RAID1. I noticed that there are more severe problems
when I create a RAID 10 in the BIOS. I'll try to fix that, too.

[PATCH 1/3] DDF: cleanly save the secondary DDF structure
[PATCH 2/3] DDF: use existing locations for primary and secondary DDF structure
[PATCH 3/3] DDF: increase seq number when writing meta data
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to ***@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
m***@arcor.de
12 years ago
From: Martin.Wilck <***@arcor.de>

Cleanly increase the seq number when the DDF structures are
written, instead of always setting it back to 1.
---
super-ddf.c | 9 ++++++++-
1 files changed, 8 insertions(+), 1 deletions(-)

diff --git a/super-ddf.c b/super-ddf.c
index 7fe038e..2f284f3 100644
--- a/super-ddf.c
+++ b/super-ddf.c
@@ -2445,7 +2445,14 @@ static int __write_init_super_ddf(struct supertype *st)
ddf->anchor.secondary_lba = d->secondary_lba;
else
ddf->anchor.secondary_lba = __cpu_to_be64(size - 32*1024*2);
- ddf->anchor.seq = __cpu_to_be32(1);
+ if (ddf->primary.seq != 0xffffffff)
+ ddf->anchor.seq = __cpu_to_be32(
+ __be32_to_cpu(ddf->primary.seq)+1);
+ else if (ddf->secondary.seq != 0xffffffff)
+ ddf->anchor.seq = __cpu_to_be32(
+ __be32_to_cpu(ddf->secondary.seq)+1);
+ else
+ ddf->anchor.seq = __cpu_to_be32(1);
memcpy(&ddf->primary, &ddf->anchor, 512);
memcpy(&ddf->secondary, &ddf->anchor, 512);
--
1.7.1
m***@arcor.de
12 years ago
From: Martin Wilck <***@arcor.de>

Some RAID BIOSes apparently use hard-coded LBA offsets (presumably
from the end of the disk) for the primary and secondary DDF
structures, ignoring the values given in the DDF anchor. This is
broken BIOS behavior, but it causes any changes made by MD
(e.g. setting the init_state flag after a full initialization)
to be "forgotten" after the next reboot.

This patch fixes this by reusing the existing LBA locations if
available. Verified that this fixes MD with the LSI Mega Software
RAID BIOS.
---
super-ddf.c | 28 ++++++++++++++++++++++++----
1 files changed, 24 insertions(+), 4 deletions(-)

diff --git a/super-ddf.c b/super-ddf.c
index c336db4..7fe038e 100644
--- a/super-ddf.c
+++ b/super-ddf.c
@@ -421,6 +421,9 @@ struct ddf_super {
char *devname;
int fd;
unsigned long long size; /* sectors */
+ unsigned long long primary_lba; /* sectors */
+ unsigned long long secondary_lba; /* sectors */
+ unsigned long long workspace_lba; /* sectors */
int pdnum; /* index in ->phys */
struct spare_assign *spare;
void *mdupdate; /* hold metadata update */
@@ -666,8 +669,16 @@ static int load_ddf_local(int fd, struct ddf_super *super,
dl->fd = keep ? fd : -1;

dl->size = 0;
- if (get_dev_size(fd, devname, &dsize))
+ if (get_dev_size(fd, devname, &dsize)) {
dl->size = dsize >> 9;
+ }
+ /* If the disks have different sizes, the LBAs will differ
+ between phys disks.
+ At this point here, the values in super->active must be valid
+ for this phys disk. */
+ dl->primary_lba = super->active->primary_lba;
+ dl->secondary_lba = super->active->secondary_lba;
+ dl->workspace_lba = super->active->workspace_lba;
dl->spare = NULL;
for (i = 0 ; i < super->max_part ; i++)
dl->vlist[i] = NULL;
@@ -2422,9 +2433,18 @@ static int __write_init_super_ddf(struct supertype *st)
*/
get_dev_size(fd, NULL, &size);
size /= 512;
- ddf->anchor.workspace_lba = __cpu_to_be64(size - 32*1024*2);
- ddf->anchor.primary_lba = __cpu_to_be64(size - 16*1024*2);
- ddf->anchor.secondary_lba = __cpu_to_be64(size - 31*1024*2);
+ if (d->workspace_lba != 0)
+ ddf->anchor.workspace_lba = d->workspace_lba;
+ else
+ ddf->anchor.workspace_lba = __cpu_to_be64(size - 32*1024*2);
+ if (d->primary_lba != 0)
+ ddf->anchor.primary_lba = d->primary_lba;
+ else
+ ddf->anchor.primary_lba = __cpu_to_be64(size - 16*1024*2);
+ if (d->secondary_lba != 0)
+ ddf->anchor.secondary_lba = d->secondary_lba;
+ else
+ ddf->anchor.secondary_lba = __cpu_to_be64(size - 32*1024*2);
ddf->anchor.seq = __cpu_to_be32(1);
memcpy(&ddf->primary, &ddf->anchor, 512);
memcpy(&ddf->secondary, &ddf->anchor, 512);
--
1.7.1
m***@arcor.de
12 years ago
From: Martin Wilck <***@arcor.de>

So far, mdadm only saved the header of the secondary structure.
With this patch, the full secondary DDF structure is saved
consistently.
---
super-ddf.c | 136 +++++++++++++++++++++++++++++++++-------------------------
1 files changed, 77 insertions(+), 59 deletions(-)

diff --git a/super-ddf.c b/super-ddf.c
index 3b3c1f0..c336db4 100644
--- a/super-ddf.c
+++ b/super-ddf.c
@@ -2317,17 +2317,86 @@ static int remove_from_super_ddf(struct supertype *st, mdu_disk_info_t *dk)
*/
#define NULL_CONF_SZ 4096

-static int __write_init_super_ddf(struct supertype *st)
+static int __write_ddf_structure(struct dl *d, struct ddf_super *ddf, __u8 type, char *null_aligned)
{
+ unsigned long long sector;
+ struct ddf_header *header;
+ int fd, i, n_config, conf_size;
+
+ fd = d->fd;
+
+ switch (type) {
+ case DDF_HEADER_PRIMARY:
+ header = &ddf->primary;
+ sector = __be64_to_cpu(header->primary_lba);
+ break;
+ case DDF_HEADER_SECONDARY:
+ header = &ddf->secondary;
+ sector = __be64_to_cpu(header->secondary_lba);
+ break;
+ default:
+ return 0;
+ }
+
+ header->type = type;
+ header->openflag = 0;
+ header->crc = calc_crc(header, 512);
+
+ lseek64(fd, sector<<9, 0);
+ if (write(fd, header, 512) < 0)
+ return 0;
+
+ ddf->controller.crc = calc_crc(&ddf->controller, 512);
+ if (write(fd, &ddf->controller, 512) < 0)
+ return 0;
+
+ ddf->phys->crc = calc_crc(ddf->phys, ddf->pdsize);
+ if (write(fd, ddf->phys, ddf->pdsize) < 0)
+ return 0;
+ ddf->virt->crc = calc_crc(ddf->virt, ddf->vdsize);
+ if (write(fd, ddf->virt, ddf->vdsize) < 0)
+ return 0;
+
+ /* Now write lots of config records. */
+ n_config = ddf->max_part;
+ conf_size = ddf->conf_rec_len * 512;
+ for (i = 0 ; i <= n_config ; i++) {
+ struct vcl *c = d->vlist[i];
+ if (i == n_config)
+ c = (struct vcl*)d->spare;
+
+ if (c) {
+ c->conf.crc = calc_crc(&c->conf, conf_size);
+ if (write(fd, &c->conf, conf_size) < 0)
+ break;
+ } else {
+ unsigned int togo = conf_size;
+ while (togo > NULL_CONF_SZ) {
+ if (write(fd, null_aligned, NULL_CONF_SZ) < 0)
+ break;
+ togo -= NULL_CONF_SZ;
+ }
+ if (write(fd, null_aligned, togo) < 0)
+ break;
+ }
+ }
+ if (i <= n_config)
+ return 0;
+
+ d->disk.crc = calc_crc(&d->disk, 512);
+ if (write(fd, &d->disk, 512) < 0)
+ return 0;

+ return 1;
+}
+
+static int __write_init_super_ddf(struct supertype *st)
+{
struct ddf_super *ddf = st->sb;
- int i;
struct dl *d;
- int n_config;
- int conf_size;
int attempts = 0;
int successes = 0;
- unsigned long long size, sector;
+ unsigned long long size;
char *null_aligned;

if (posix_memalign((void**)&null_aligned, 4096, NULL_CONF_SZ) != 0) {
@@ -2355,6 +2424,7 @@ static int __write_init_super_ddf(struct supertype *st)
size /= 512;
ddf->anchor.workspace_lba = __cpu_to_be64(size - 32*1024*2);
ddf->anchor.primary_lba = __cpu_to_be64(size - 16*1024*2);
+ ddf->anchor.secondary_lba = __cpu_to_be64(size - 31*1024*2);
ddf->anchor.seq = __cpu_to_be32(1);
memcpy(&ddf->primary, &ddf->anchor, 512);
memcpy(&ddf->secondary, &ddf->anchor, 512);
@@ -2363,64 +2433,12 @@ static int __write_init_super_ddf(struct supertype *st)
ddf->anchor.seq = 0xFFFFFFFF; /* no sequencing in anchor */
ddf->anchor.crc = calc_crc(&ddf->anchor, 512);

- ddf->primary.openflag = 0;
- ddf->primary.type = DDF_HEADER_PRIMARY;
-
- ddf->secondary.openflag = 0;
- ddf->secondary.type = DDF_HEADER_SECONDARY;
-
- ddf->primary.crc = calc_crc(&ddf->primary, 512);
- ddf->secondary.crc = calc_crc(&ddf->secondary, 512);
-
- sector = size - 16*1024*2;
- lseek64(fd, sector<<9, 0);
- if (write(fd, &ddf->primary, 512) < 0)
+ if (!__write_ddf_structure(d, ddf, DDF_HEADER_PRIMARY, null_aligned))
continue;

- ddf->controller.crc = calc_crc(&ddf->controller, 512);
- if (write(fd, &ddf->controller, 512) < 0)
- continue;
-
- ddf->phys->crc = calc_crc(ddf->phys, ddf->pdsize);
-
- if (write(fd, ddf->phys, ddf->pdsize) < 0)
+ if (!__write_ddf_structure(d, ddf, DDF_HEADER_SECONDARY, null_aligned))
continue;

- ddf->virt->crc = calc_crc(ddf->virt, ddf->vdsize);
- if (write(fd, ddf->virt, ddf->vdsize) < 0)
- continue;
-
- /* Now write lots of config records. */
- n_config = ddf->max_part;
- conf_size = ddf->conf_rec_len * 512;
- for (i = 0 ; i <= n_config ; i++) {
- struct vcl *c = d->vlist[i];
- if (i == n_config)
- c = (struct vcl*)d->spare;
-
- if (c) {
- c->conf.crc = calc_crc(&c->conf, conf_size);
- if (write(fd, &c->conf, conf_size) < 0)
- break;
- } else {
- unsigned int togo = conf_size;
- while (togo > NULL_CONF_SZ) {
- if (write(fd, null_aligned, NULL_CONF_SZ) < 0)
- break;
- togo -= NULL_CONF_SZ;
- }
- if (write(fd, null_aligned, togo) < 0)
- break;
- }
- }
- if (i <= n_config)
- continue;
- d->disk.crc = calc_crc(&d->disk, 512);
- if (write(fd, &d->disk, 512) < 0)
- continue;
-
- /* Maybe do the same for secondary */
-
lseek64(fd, (size-1)*512, SEEK_SET);
if (write(fd, &ddf->anchor, 512) < 0)
continue;
--
1.7.1
Martin Wilck
12 years ago
Post by m***@arcor.de
> PS: These patches fix RAID1. I noticed that there are more severe problems
> when I create a RAID 10 in the BIOS. I'll try to fix that, too.

I've dug a little further, and from what I understand, the RAID10
concept in DDF ("DDF_2SPANNED" is what my BIOS creates) is incompatible
with that of MD RAID.

So I guess there is no easy solution, short of implementing an entirely
new RAID mapping in md.

Another possibility might be to set up the "spanning" block map of the
DDF secondary RAID using dm. There would be some minor problems to
overcome first, though:

1 - Currently, if I configure RAID10 in the BIOS, md will only
configure the first of the two basic RAID1 arrays.
2 - When md is stopped and writes back the metadata, it will change
the seq number only on the first of the basic arrays, thus corrupting
the metadata of the RAID set.

I'd suggest that md should bail out if a DDF secondary RAID level is
encountered, since that mapping is currently unsupported. That would
certainly be better than corrupting the metadata.

Regards
Martin



Martin Wilck
12 years ago
Hi,

I wonder whether anybody has taken a look at the patches I sent a month ago.
I'd appreciate some feedback.

Martin