Discussion:
Bad sequential performance of RAID5 with a lot of disk seeks
P. Gautschi
2014-10-07 04:44:49 UTC
I've created a RAID5 array on 5 identical SATA disks. Doing some performance
measurements with dd, I get disappointing results.
A dd with bs=1M on a btrfs created on md0 transfers about 110 MB/s (both
read and write).
A dd directly on md0 has the same write speed but only about 20 MB/s on read.
In all of the tests I hear the disks constantly seeking. This was also the
case during creation of the array.
I also created a RAID4 to make sure that I wasn't getting fooled by the
stripe layout of RAID5. With that I get about 110 MB/s for write and
230 MB/s for read on md0, but the constant seeking is still present for
both read and write, and during creation of the array.
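For reference, the tests were essentially of this form (count and flags
here are illustrative, not the exact invocations):

dd if=/dev/md0 of=/dev/null bs=1M count=4096   # read test on the bare array
dd if=/dev/zero of=/dev/md0 bs=1M count=4096   # write test (test array only!)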

Why do the disks perform so many seek operations? I would expect a
sequential access on md0 to cause a sequential access on the individual
disks.

I have to add that I did something unusual: I created the RAID4/5 with a
chunk size of 4KiB. The idea was that when I use btrfs with its default
nodesize of 16KiB, every node write will fill a full stripe (4 data disks
x 4KiB chunk = 16KiB), so there won't be any RMW at all. (Fortunate both
for performance and for integrity in a power-loss situation.)
Nevertheless, I think a sequential access on the array should cause a
sequential access on the disks for any chunk size, as long as the
read/write block size is an exact multiple of (numdisks-1)*chunksize.
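For completeness, the setup was along these lines (device names here are
placeholders, not necessarily the ones I used):

mdadm --create /dev/md0 --level=5 --raid-devices=5 --chunk=4 /dev/sd[b-f]
mkfs.btrfs -n 16384 /dev/md0   # default 16KiB nodesize, spelled out explicitly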

Is there any explanation for the seeks and how do I get rid of them?

Patrick
Robin Hill
2014-10-07 07:43:47 UTC
Post by P. Gautschi
I've created a RAID5 array on 5 identical SATA disks. Doing some performance
measurements with dd, I get disappointing results.
[...]
Is there any explanation for the seeks and how do I get rid of them?
After creating the arrays, did you wait for them to finish syncing? The
array is created in degraded mode initially and then rebuilds onto the
additional disk (this is the fastest way to do things, unless you know
the disks are all zeroed initially). Until this rebuild is complete,
it'll be competing with any other disk activity.
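For example, you can watch the progress and final state with:

cat /proc/mdstat          # shows a "resync = ...%" line while the build runs
mdadm --detail /dev/md0   # reports "State : clean" once it has finished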

Cheers,
Robin
--
___
( ' } | Robin Hill <***@robinhill.me.uk> |
/ / ) | Little Jim says .... |
// !! | "He fallen in de water !!" |
P. Gautschi
2014-10-07 07:54:22 UTC
Yes, I did wait for the syncing to complete before accessing md0.
I had the seeks during the syncing, and afterwards when reading from or
writing to the array.

Patrick
Post by Robin Hill
After creating the arrays, did you wait for them to finish syncing? The
array is created in degraded mode initially and then rebuilds onto the
additional disk (this is the fastest way to do things, unless you know
the disks are all zeroed initially). Until this rebuild is complete,
it'll be competing with any other disk activity.
Robin Hill
2014-10-07 09:25:48 UTC
Post by P. Gautschi
Yes, I did wait for the syncing to complete before accessing md0.
I had the seeks during the syncing, and afterwards when reading from or
writing to the array.
Hmm, shouldn't be seeking then.

What does the SMART info show for the drives - are there any reallocated
blocks? A large number of those scattered over the disk would certainly
cause seeking for both reads and writes.
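For example (member device names are placeholders):

smartctl -A /dev/sdb | grep -i -e Reallocated -e Pending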

It's also worth checking whether there's anything else that would be
accessing the disks in the background (monitoring/indexing/etc).
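iotop can help here, for instance:

iotop -o   # only list the processes/threads actually doing IO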

I can't think of anything else that would be causing reads to seek - SMR
disks or write-intent bitmaps would only affect writes.

Cheers,
Robin
--
___
( ' } | Robin Hill <***@robinhill.me.uk> |
/ / ) | Little Jim says .... |
// !! | "He fallen in de water !!" |
P. Gautschi
2014-10-07 10:36:19 UTC
Post by Robin Hill
What does the SMART info show for the drives - are there any reallocated
blocks? A large number of those scattered over the disk would certainly
cause seeking for both reads and writes.
I will check the SMART data this evening, but I don't think that is what's
causing the seeks. The sound is very constant and lasts for the whole time
the array is syncing.
I will also run a dd directly on the individual disks to compare.
Post by Robin Hill
It's also worth checking whether there's anything else that would be
accessing the disks in the background (monitoring/indexing/etc).
Unlikely, because I have not yet created a filesystem after setting up
the RAID4.
Post by Robin Hill
I can't think of anything else that would be causing reads to seek - SMR
disks or write-intent bitmaps would only affect writes.
Exactly.

Is there any way or tool to monitor all disk read/write commands - not only
the count or amount but every access with LBA and length?

Patrick
Robin Hill
2014-10-07 11:05:26 UTC
Post by P. Gautschi
Is there any way or tool to monitor all disk read/write commands - not only
the count or amount but every access with LBA and length?
You can do:
echo 1 > /proc/sys/vm/block_dump

That will write out all disk IO to the kernel log (process ID,
read/write and block offset only though). It can be very verbose,
especially if you have a lot of other things running on the system, but
you should be able to grep out the necessary lines. Echoing 0 will
switch it back off again.
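A rough sketch of a session (sizes and the grep pattern are just examples):

echo 1 > /proc/sys/vm/block_dump
dd if=/dev/md0 of=/dev/null bs=1M count=1024
dmesg | grep 'dd('   # lines look roughly like "dd(1234): READ block 123456 on sdb"
echo 0 > /proc/sys/vm/block_dump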

Otherwise there are probably ways to get more specific results via the
kernel auditing system, but that's nothing I've played with.

Cheers,
Robin
--
___
( ' } | Robin Hill <***@robinhill.me.uk> |
/ / ) | Little Jim says .... |
// !! | "He fallen in de water !!" |
XiaoNi
2014-10-08 09:05:32 UTC
Post by P. Gautschi
I've created a RAID5 array on 5 identical SATA disks. Doing some performance
measurements with dd, I get disappointing results.
[...]
Why do the disks perform so many seek operations? I think a sequential
access on md0 should cause a sequential access on the individual disks.
Hi P. Gautschi

How did you find the seek operations? Did you use blktrace or other
commands? Can you give the detailed commands and information?
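blktrace, for instance, can show every request with sector and size;
something like:

blktrace -d /dev/sdb -o - | blkparse -i -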

BTW, I think even if you use a larger chunk size when creating the RAID,
there is little RMW: the raid5 code waits for a short period to allow a
full-stripe write to accumulate before falling back to RMW.
Post by P. Gautschi
I have to add that I did something unusual: I created the RAID4/5 with a
chunk size of 4KiB. The idea was that when I use btrfs with its default
nodesize of 16KiB, every node write will fill a full stripe, so there
won't be any RMW at all.
[...]