aboutsummaryrefslogtreecommitdiffstats
path: root/drivers/md
Commit message (Collapse)AuthorAgeFilesLines
...
* | | dm table: fix write same supportMike Snitzer2013-05-101-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If device_not_write_same_capable() returns true then the iterate_devices loop in dm_table_supports_write_same() should return false. Reported-by: Bharata B Rao <bharata.rao@gmail.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Cc: stable@vger.kernel.org # v3.8+ Signed-off-by: Alasdair G Kergon <agk@redhat.com>
* | | dm bufio: avoid a possible __vmalloc deadlockMikulas Patocka2013-05-101-1/+23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch uses memalloc_noio_save to avoid a possible deadlock in dm-bufio. (it could happen only with large block size, at most PAGE_SIZE << MAX_ORDER (typically 8MiB). __vmalloc doesn't fully respect gfp flags. The specified gfp flags are used for allocation of requested pages, structures vmap_area, vmap_block and vm_struct and the radix tree nodes. However, the kernel pagetables are allocated always with GFP_KERNEL. Thus the allocation of pagetables can recurse back to the I/O layer and cause a deadlock. This patch uses the function memalloc_noio_save to set per-process PF_MEMALLOC_NOIO flag and the function memalloc_noio_restore to restore it. When this flag is set, all allocations in the process are done with implied GFP_NOIO flag, thus the deadlock can't happen. This should be backported to stable kernels, but they don't have the PF_MEMALLOC_NOIO flag and memalloc_noio_save/memalloc_noio_restore functions. So, PF_MEMALLOC should be set and restored instead. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Cc: stable@kernel.org Signed-off-by: Alasdair G Kergon <agk@redhat.com>
* | | dm snapshot: fix error return code in snapshot_ctrWei Yongjun2013-05-101-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | Return -ENOMEM instead of success if unable to allocate pending exception mempool in snapshot_ctr. Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn> Cc: stable@vger.kernel.org Signed-off-by: Alasdair G Kergon <agk@redhat.com>
* | | dm cache: fix error return code in cache_createWei Yongjun2013-05-101-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | Return -ENOMEM if memory allocation fails in cache_create instead of 0 (to avoid NULL pointer dereference). Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn> Cc: stable@vger.kernel.org Signed-off-by: Alasdair G Kergon <agk@redhat.com>
* | | dm stripe: fix regression in stripe_width calculationMike Snitzer2013-05-101-5/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fix a regression in the calculation of the stripe_width in the dm stripe target which led to incorrect processing of device limits. The stripe_width is the stripe device length divided by the number of stripes. The group of commits in the range f14fa69 ("dm stripe: fix size test") to eb850de ("dm stripe: support for non power of 2 chunksize") interfered with each other (a merging error) and led to the stripe_width being set incorrectly to the stripe device length divided by chunk_size * stripe_count. For example, a stripe device's table with: 0 33553920 striped 3 512 ... should result in a stripe_width of 11184640 (33553920 / 3), but due to the bug it was getting set to 21845 (33553920 / (512 * 3)). The impact of this bug is that device topologies that previously worked fine with the stripe target are no longer considered valid. In particular, there is a higher risk of seeing this issue if one of the stripe devices has a 4K logical block size. Resulting in an error message like this: "device-mapper: table: 253:4: len=21845 not aligned to h/w logical block size 4096 of dm-1" The fix is to swap the order of the divisions and to use a temporary variable for the second one, so that width retains the intended value. Signed-off-by: Mike Snitzer <snitzer@redhat.com> Cc: stable@vger.kernel.org # 3.6+ Signed-off-by: Alasdair G Kergon <agk@redhat.com>
* | | Merge branch 'for-3.10/drivers' of git://git.kernel.dk/linux-blockLinus Torvalds2013-05-0829-0/+15775
|\ \ \ | |/ / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull block driver updates from Jens Axboe: "It might look big in volume, but when categorized, not a lot of drivers are touched. The pull request contains: - mtip32xx fixes from Micron. - A slew of drbd updates, this time in a nicer series. - bcache, a flash/ssd caching framework from Kent. - Fixes for cciss" * 'for-3.10/drivers' of git://git.kernel.dk/linux-block: (66 commits) bcache: Use bd_link_disk_holder() bcache: Allocator cleanup/fixes cciss: bug fix to prevent cciss from loading in kdump crash kernel cciss: add cciss_allow_hpsa module parameter drivers/block/mg_disk.c: add CONFIG_PM_SLEEP to suspend/resume functions mtip32xx: Workaround for unaligned writes bcache: Make sure blocksize isn't smaller than device blocksize bcache: Fix merge_bvec_fn usage for when it modifies the bvm bcache: Correctly check against BIO_MAX_PAGES bcache: Hack around stuff that clones up to bi_max_vecs bcache: Set ra_pages based on backing device's ra_pages bcache: Take data offset from the bdev superblock. mtip32xx: mtip32xx: Disable TRIM support mtip32xx: fix a smatch warning bcache: Disable broken btree fuzz tester bcache: Fix a format string overflow bcache: Fix a minor memory leak on device teardown bcache: Documentation updates bcache: Use WARN_ONCE() instead of __WARN() bcache: Add missing #include <linux/prefetch.h> ...
| * | bcache: Use bd_link_disk_holder()Kent Overstreet2013-04-301-17/+35
| | | | | | | | | | | | Signed-off-by: Kent Overstreet <koverstreet@google.com>
| * | bcache: Allocator cleanup/fixesKent Overstreet2013-04-302-25/+50
| | | | | | | | | | | | | | | | | | | | | | | | | | | The main fix is that bch_allocator_thread() wasn't waiting on garbage collection to finish (if invalidate_buckets had set ca->invalidate_needs_gc); we need that to make sure the allocator doesn't spin and potentially block gc from finishing. Signed-off-by: Kent Overstreet <koverstreet@google.com>
| * | bcache: Make sure blocksize isn't smaller than device blocksizeKent Overstreet2013-04-241-2/+6
| | | | | | | | | | | | | | | | | | | | | Sanity check to make sure we don't end up doing IO the device doesn't support. Signed-off-by: Kent Overstreet <koverstreet@google.com>
| * | bcache: Fix merge_bvec_fn usage for when it modifies the bvmKent Overstreet2013-04-221-9/+8
| | | | | | | | | | | | | | | | | | | | | | | | Stacked md devices reuse the bvm for the subordinate device, causing problems... Reported-by: Michael Balser <michael.balser@profitbricks.com> Signed-off-by: Kent Overstreet <koverstreet@google.com>
| * | bcache: Correctly check against BIO_MAX_PAGESKent Overstreet2013-04-201-5/+4
| | | | | | | | | | | | | | | | | | | | | bch_bio_max_sectors() was checking against BIO_MAX_PAGES as if the limit was for the total bytes in the bio, not the number of segments. Signed-off-by: Kent Overstreet <koverstreet@google.com>
| * | bcache: Hack around stuff that clones up to bi_max_vecsKent Overstreet2013-04-201-0/+9
| | | | | | | | | | | | Signed-off-by: Kent Overstreet <koverstreet@google.com>
| * | bcache: Set ra_pages based on backing device's ra_pagesKent Overstreet2013-04-201-0/+4
| | | | | | | | | | | | Signed-off-by: Kent Overstreet <koverstreet@google.com>
| * | bcache: Take data offset from the bdev superblock.Kent Overstreet2013-04-203-57/+100
| | | | | | | | | | | | | | | | | | | | | Add a new superblock version, and consolidate related defines. Signed-off-by: Gabriel de Perthuis <g2p.code+bcache@gmail.com> Signed-off-by: Kent Overstreet <koverstreet@google.com>
| * | bcache: Disable broken btree fuzz testerKent Overstreet2013-04-081-2/+4
| | | | | | | | | | | | | | | Reported-by: <sasha.levin@oracle.com> Signed-off-by: Kent Overstreet <koverstreet@google.com>
| * | bcache: Fix a format string overflowKent Overstreet2013-04-081-2/+2
| | | | | | | | | | | | | | | Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Kent Overstreet <koverstreet@google.com>
| * | bcache: Fix a minor memory leak on device teardownKent Overstreet2013-04-081-1/+3
| | | | | | | | | | | | | | | Reported-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Kent Overstreet <koverstreet@google.com>
| * | bcache: Use WARN_ONCE() instead of __WARN()Kent Overstreet2013-04-081-1/+1
| | | | | | | | | | | | Signed-off-by: Kent Overstreet <koverstreet@google.com>
| * | bcache: Add missing #include <linux/prefetch.h>Geert Uytterhoeven2013-04-082-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | m68k/allmodconfig: drivers/md/bcache/bset.c: In function ‘bset_search_tree’: drivers/md/bcache/bset.c:727: error: implicit declaration of function ‘prefetch’ drivers/md/bcache/btree.c: In function ‘bch_btree_node_get’: drivers/md/bcache/btree.c:933: error: implicit declaration of function ‘prefetch’ Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Kent Overstreet <koverstreet@google.com>
| * | bcache: Sparse fixesKent Overstreet2013-04-084-90/+92
| | | | | | | | | | | | Signed-off-by: Kent Overstreet <koverstreet@google.com>
| * | bcache: Don't export utility code, prefix with bch_Kent Overstreet2013-03-2813-101/+89
| | | | | | | | | | | | | | | | | | Signed-off-by: Kent Overstreet <koverstreet@google.com> Cc: linux-bcache@vger.kernel.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | bcache: Fix for the build fixesKent Overstreet2013-03-251-1/+0
| | | | | | | | | | | | | | | | | | | | | | | | Commit 82a84eaf7e51ba3da0c36cbc401034a4e943492d left a return 0 in closure_debug_init(). Whoops. Signed-off-by: Kent Overstreet <koverstreet@google.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | bcache: Style/checkpatch fixesKent Overstreet2013-03-2510-56/+51
| | | | | | | | | | | | | | | | | | | | | | | | | | | Took out some nested functions, and fixed some more checkpatch complaints. Signed-off-by: Kent Overstreet <koverstreet@google.com> Cc: linux-bcache@vger.kernel.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | bcache: Build fixes from test robotKent Overstreet2013-03-255-5/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | config: make ARCH=i386 allmodconfig All error/warnings: drivers/md/bcache/bset.c: In function 'bch_ptr_bad': >> drivers/md/bcache/bset.c:164:2: warning: format '%li' expects argument of type 'long int', but argument 4 has type 'size_t' [-Wformat] -- drivers/md/bcache/debug.c: In function 'bch_pbtree': >> drivers/md/bcache/debug.c:86:4: warning: format '%li' expects argument of type 'long int', but argument 4 has type 'size_t' [-Wformat] -- drivers/md/bcache/btree.c: In function 'bch_btree_read_done': >> drivers/md/bcache/btree.c:245:8: warning: format '%lu' expects argument of type 'long unsigned int', but argument 4 has type 'size_t' [-Wformat] -- drivers/md/bcache/closure.o: In function `closure_debug_init': >> (.init.text+0x0): multiple definition of `init_module' >> drivers/md/bcache/super.o:super.c:(.init.text+0x0): first defined here Signed-off-by: Kent Overstreet <koverstreet@google.com> Cc: Fengguang Wu <fengguang.wu@intel.com> Cc: linux-bcache@vger.kernel.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | bcache: A block layer cacheKent Overstreet2013-03-2329-0/+15683
| | | | | | | | | | | | | | | | | | | | | | | | | | | Does writethrough and writeback caching, handles unclean shutdown, and has a bunch of other nifty features motivated by real world usage. See the wiki at http://bcache.evilpiepirate.org for more. Signed-off-by: Kent Overstreet <koverstreet@google.com>
* | | Merge branch 'for-3.10/core' of git://git.kernel.dk/linux-blockLinus Torvalds2013-05-0811-202/+104
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull block core updates from Jens Axboe: - Major bit is Kents prep work for immutable bio vecs. - Stable candidate fix for a scheduling-while-atomic in the queue bypass operation. - Fix for the hang on exceeded rq->datalen 32-bit unsigned when merging discard bios. - Tejuns changes to convert the writeback thread pool to the generic workqueue mechanism. - Runtime PM framework, SCSI patches exists on top of these in James' tree. - A few random fixes. * 'for-3.10/core' of git://git.kernel.dk/linux-block: (40 commits) relay: move remove_buf_file inside relay_close_buf partitions/efi.c: replace useless kzalloc's by kmalloc's fs/block_dev.c: fix iov_shorten() criteria in blkdev_aio_read() block: fix max discard sectors limit blkcg: fix "scheduling while atomic" in blk_queue_bypass_start Documentation: cfq-iosched: update documentation help for cfq tunables writeback: expose the bdi_wq workqueue writeback: replace custom worker pool implementation with unbound workqueue writeback: remove unused bdi_pending_list aoe: Fix unitialized var usage bio-integrity: Add explicit field for owner of bip_buf block: Add an explicit bio flag for bios that own their bvec block: Add bio_alloc_pages() block: Convert some code to bio_for_each_segment_all() block: Add bio_for_each_segment_all() bounce: Refactor __blk_queue_bounce to not use bi_io_vec raid1: use bio_copy_data() pktcdvd: Use bio_reset() in disabled code to kill bi_idx usage pktcdvd: use bio_copy_data() block: Add bio_copy_data() ...
| * \ \ Merge branch 'writeback-workqueue' of ↵Jens Axboe2013-04-0216-153/+338
| |\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq into for-3.10/core Tejun writes: ----- This is the pull request for the earlier patchset[1] with the same name. It's only three patches (the first one was committed to workqueue tree) but the merge strategy is a bit involved due to the dependencies. * Because the conversion needs features from wq/for-3.10, block/for-3.10/core is based on rc3, and wq/for-3.10 has conflicts with rc3, I pulled mainline (rc5) into wq/for-3.10 to prevent those workqueue conflicts from flaring up in block tree. * Resolving the issue that Jan and Dave raised about debugging requires arch-wide changes. The patchset is being worked on[2] but it'll have to go through -mm after these changes show up in -next, and not included in this pull request. The three commits are located in the following git branch. git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq.git writeback-workqueue Pulling it into block/for-3.10/core produces a conflict in drivers/md/raid5.c between the following two commits. e3620a3ad5 ("MD RAID5: Avoid accessing gendisk or queue structs when not available") 2f6db2a707 ("raid5: use bio_reset()") The conflict is trivial - one removes an "if ()" conditional while the other removes "rbi->bi_next = NULL" right above it. We just need to remove both. The merged branch is available at git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq.git block-test-merge so that you can use it for verification. The test merge commit has proper merge description. While these changes are a bit of pain to route, they make code simpler and even have, while minute, measureable performance gain[3] even on a workload which isn't particularly favorable to showing the benefits of this conversion. ---- Fixed up the conflict. Conflicts: drivers/md/raid5.c Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | | | block: Add bio_alloc_pages()Kent Overstreet2013-03-231-13/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | More utility code to replace stuff that's getting open coded. Signed-off-by: Kent Overstreet <koverstreet@google.com> CC: Jens Axboe <axboe@kernel.dk> CC: NeilBrown <neilb@suse.de>
| * | | | block: Convert some code to bio_for_each_segment_all()Kent Overstreet2013-03-232-9/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | More prep work for immutable bvecs: A few places in the code were either open coding or using the wrong version - fix. After we introduce the bvec iter, it'll no longer be possible to modify the biovec through bio_for_each_segment_all() - it doesn't increment a pointer to the current bvec, you pass in a struct bio_vec (not a pointer) which is updated with what the current biovec would be (taking into account bi_bvec_done and bi_size). So because of that it's more worthwhile to be consistent about bio_for_each_segment()/bio_for_each_segment_all() usage. Signed-off-by: Kent Overstreet <koverstreet@google.com> CC: Jens Axboe <axboe@kernel.dk> CC: NeilBrown <neilb@suse.de> CC: Alasdair Kergon <agk@redhat.com> CC: dm-devel@redhat.com CC: Alexander Viro <viro@zeniv.linux.org.uk>
| * | | | block: Add bio_for_each_segment_all()Kent Overstreet2013-03-231-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | __bio_for_each_segment() iterates bvecs from the specified index instead of bio->bv_idx. Currently, the only usage is to walk all the bvecs after the bio has been advanced by specifying 0 index. For immutable bvecs, we need to split these apart; bio_for_each_segment() is going to have a different implementation. This will also help document the intent of code that's using it - bio_for_each_segment_all() is only legal to use for code that owns the bio. Signed-off-by: Kent Overstreet <koverstreet@google.com> CC: Jens Axboe <axboe@kernel.dk> CC: Neil Brown <neilb@suse.de> CC: Boaz Harrosh <bharrosh@panasas.com>
| * | | | raid1: use bio_copy_data()Kent Overstreet2013-03-231-3/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This doesn't really delete any code _yet_, but once immutable bvecs are done we can just delete the rest of the code in that loop. Signed-off-by: Kent Overstreet <koverstreet@google.com> CC: Jens Axboe <axboe@kernel.dk> CC: NeilBrown <neilb@suse.de>
| * | | | raid1: Refactor narrow_write_error() to not use bi_idxKent Overstreet2013-03-231-18/+18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | More bi_idx removal. This code was just open coding bio_clone(). This could probably be further improved by using bio_advance() instead of skipping over null pages, but that'd be a larger rework. Signed-off-by: Kent Overstreet <koverstreet@google.com> CC: Jens Axboe <axboe@kernel.dk> CC: NeilBrown <neilb@suse.de>
| * | | | raid5: use bio_reset()Kent Overstreet2013-03-231-14/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Had to shuffle the code around a bit (where bi_rw and bi_end_io were set), but shouldn't really be anything tricky here Signed-off-by: Kent Overstreet <koverstreet@google.com> CC: Jens Axboe <axboe@kernel.dk> CC: NeilBrown <neilb@suse.de>
| * | | | raid1: use bio_reset()Kent Overstreet2013-03-231-18/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Signed-off-by: Kent Overstreet <koverstreet@google.com> CC: Jens Axboe <axboe@kernel.dk> CC: NeilBrown <neilb@suse.de>
| * | | | raid10: Use bio_reset()Kent Overstreet2013-03-231-22/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | More prep work for immutable bio vecs, mainly getting rid of references to bi_idx. bio_reset was being open coded in a few places. The one in sync_request was a bit nontrivial to convert, so could use some extra eyeballs. Signed-off-by: Kent Overstreet <koverstreet@google.com> CC: Jens Axboe <axboe@kernel.dk> CC: NeilBrown <neilb@suse.de> Acked-by: NeilBrown <neilb@suse.de>
| * | | | block: Add submit_bio_wait(), remove from mdKent Overstreet2013-03-232-38/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Random cleanup - this code was duplicated and it's not really specific to md. Also added the ability to return the actual error code. Signed-off-by: Kent Overstreet <koverstreet@google.com> CC: Jens Axboe <axboe@kernel.dk> CC: NeilBrown <neilb@suse.de> Acked-by: Tejun Heo <tj@kernel.org>
| * | | | block: Remove bi_idx referencesKent Overstreet2013-03-232-2/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | For immutable bvecs, all bi_idx usage needs to be audited - so here we're removing all the unnecessary uses. Most of these are places where it was being initialized on a bio that was just allocated, a few others are conversions to standard macros. Signed-off-by: Kent Overstreet <koverstreet@google.com> CC: Jens Axboe <axboe@kernel.dk>
| * | | | block: Change bio_split() to respect the current value of bi_idxKent Overstreet2013-03-232-4/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In the current code bio_split() won't be seeing partially completed bios so this doesn't change any behaviour, but this makes the code a bit clearer as to what bio_split() actually requires. The immediate purpose of the patch is removing unnecessary bi_idx references, but the end goal is to allow partial completed bios to be submitted, which along with immutable biovecs enables effecient bio splitting. Some of the callers were (double) checking that bios could be split, so update their checks too. Signed-off-by: Kent Overstreet <koverstreet@google.com> CC: Jens Axboe <axboe@kernel.dk> CC: Lars Ellenberg <drbd-dev@lists.linbit.com> CC: Neil Brown <neilb@suse.de> CC: Martin K. Petersen <martin.petersen@oracle.com>
| * | | | block: Use bio_sectors() more consistentlyKent Overstreet2013-03-235-30/+27
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Bunch of places in the code weren't using it where they could be - this'll reduce the size of the patch that puts bi_sector/bi_size/bi_idx into a struct bvec_iter. Signed-off-by: Kent Overstreet <koverstreet@google.com> CC: Jens Axboe <axboe@kernel.dk> CC: "Ed L. Cashin" <ecashin@coraid.com> CC: Nick Piggin <npiggin@kernel.dk> CC: Jiri Kosina <jkosina@suse.cz> CC: Jim Paris <jim@jtan.com> CC: Geoff Levand <geoff@infradead.org> CC: Alasdair Kergon <agk@redhat.com> CC: dm-devel@redhat.com CC: Neil Brown <neilb@suse.de> CC: Steven Rostedt <rostedt@goodmis.org> Acked-by: Ed Cashin <ecashin@coraid.com>
| * | | | block: Add bio_end_sector()Kent Overstreet2013-03-236-17/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Just a little convenience macro - main reason to add it now is preparing for immutable bio vecs, it'll reduce the size of the patch that puts bi_sector/bi_size/bi_idx into a struct bvec_iter. Signed-off-by: Kent Overstreet <koverstreet@google.com> CC: Jens Axboe <axboe@kernel.dk> CC: Lars Ellenberg <drbd-dev@lists.linbit.com> CC: Jiri Kosina <jkosina@suse.cz> CC: Alasdair Kergon <agk@redhat.com> CC: dm-devel@redhat.com CC: Neil Brown <neilb@suse.de> CC: Martin Schwidefsky <schwidefsky@de.ibm.com> CC: Heiko Carstens <heiko.carstens@de.ibm.com> CC: linux-s390@vger.kernel.org CC: Chris Mason <chris.mason@fusionio.com> CC: Steven Whitehouse <swhiteho@redhat.com> Acked-by: Steven Whitehouse <swhiteho@redhat.com>
| * | | | md: Convert md_trim_bio() to use bio_advance()Kent Overstreet2013-03-231-13/+4
| | |/ / | |/| | | | | | | | | | | | | | | | | | | | | | Signed-off-by: Kent Overstreet <koverstreet@google.com> CC: Jens Axboe <axboe@kernel.dk> CC: NeilBrown <neilb@suse.de> Acked-by: NeilBrown <neilb@suse.de>
* | | | block_device_operations->release() should return voidAl Viro2013-05-072-6/+2
| |_|/ |/| | | | | | | | | | | | | | | | | | | | | | | The value passed is 0 in all but "it can never happen" cases (and those only in a couple of drivers) *and* it would've been lost on the way out anyway, even if something tried to pass something meaningful. Just don't bother. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | | MD: ignore discard request for hard disks of hybid raid1/raid10 arrayShaohua Li2013-04-302-2/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In SSD/hard disk hybid storage, discard request should be ignored for hard disk. We used to be doing this way, but the unplug path forgets it. This is suitable for stable tree since v3.6. Cc: stable@vger.kernel.org Reported-and-tested-by: Markus <M4rkusXXL@web.de> Signed-off-by: Shaohua Li <shli@fusionio.com> Signed-off-by: NeilBrown <neilb@suse.de>
* | | md: bad block list should default to disabled.NeilBrown2013-04-301-6/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Maintenance of a bad-block-list currently defaults to 'enabled' and is then disabled when it cannot be supported. This is backwards and causes problem for dm-raid which didn't know to disable it. So fix the defaults, and only enabled for v1.x metadata which explicitly has bad blocks enabled. The problem with dm-raid has been present since badblock support was added in v3.1, so this patch is suitable for any -stable from 3.1 onwards. Cc: stable@vger.kernel.org (3.1+) Reported-by: Jonathan Brassow <jbrassow@redhat.com> Signed-off-by: NeilBrown <neilb@suse.de>
* | | md: raid1/raid10 md devices leak memory when stoppingHirokazu Takahashi2013-04-302-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Hi. Raid1 and raid10 devices leak memory every time they stop. This is a patch for linux-3.9.0-rc7 to fix this problem. Thanks, Hirokazu Takahashi. Signed-off-by: Hirokazu Takahashi <taka@valinux.co.jp> Signed-off-by: NeilBrown <neilb@suse.de>
* | | DM RAID: Add message/status support for changing sync actionJonathan Brassow2013-04-241-2/+109
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | DM RAID: Add message/status support for changing sync action This patch adds a message interface to dm-raid to allow the user to more finely control the sync actions being performed by the MD driver. This gives the user the ability to initiate "check" and "repair" (i.e. scrubbing). Two additional fields have been appended to the status output to provide more information about the type of sync action occurring and the results of those actions, specifically: <sync_action> and <mismatch_cnt>. These new fields will always be populated. This is essentially the device-mapper way of doing what MD controls through the 'sync_action' sysfs file and shows through the 'mismatch_cnt' sysfs file. Signed-off-by: Jonathan Brassow <jbrassow@redhat.com> Signed-off-by: NeilBrown <neilb@suse.de>
* | | MD: Export 'md_reap_sync_thread' functionJonathan Brassow2013-04-242-50/+50
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | MD: Export 'md_reap_sync_thread' function Make 'md_reap_sync_thread' available to other files, specifically dm-raid.c. - rename reap_sync_thread to md_reap_sync_thread - move the fn after md_check_recovery to match md.h declaration placement - export md_reap_sync_thread Signed-off-by: Jonathan Brassow <jbrassow@redhat.com> Signed-off-by: NeilBrown <neilb@suse.de>
* | | md: don't update metadata when stopping a read-only array.NeilBrown2013-04-241-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | read-only arrays should stay that way as much as possible. Updating the metadata - which could be triggered by a re-add while assembling the array metadata - should be avoided. Signed-off-by: NeilBrown <neilb@suse.de>
* | | md: Allow devices to be re-added to a read-only array.NeilBrown2013-04-241-26/+57
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When assembling an array incrementally we might want to make it device available when "enough" devices are present, but maybe not "all" devices are present. If the remaining devices appear before the array is actually used, they should be added transparently. We do this by using the "read-auto" mode where the array acts like it is read-only until a write request arrives. Current an add-device request switches a read-auto array to active. This means that only one device can be added after the array is first made read-auto. This isn't a problem for RAID5, but is not ideal for RAID6 or RAID10. Also we don't really want to switch the array to read-auto at all when re-adding a device as this doesn't really imply any change. So: - remove the "md_update_sb()" call from add_new_disk(). This isn't really needed as just adding a disk doesn't require a metadata update. Instead, just set MD_CHANGE_DEVS. This will effect a metadata update soon enough, once the array is not read-only. - Allow the ADD_NEW_DISK ioctl to succeed without activating a read-auto array, providing the MD_DISK_SYNC flag is set. In this case, the device will be rejected if it cannot be added with the correct device number, or has an incorrect event count. - Teach remove_and_add_spares() to be careful about adding spares when the array is read-only (or read-mostly) - only add devices that are thought to be in-sync, and only do it if the array is in-sync itself. - In md_check_recovery, use remove_and_add_spares in the read-only case, rather than open coding just the 'remove' part of it. Reported-by: Martin Wilck <mwilck@arcor.de> Signed-off-by: NeilBrown <neilb@suse.de>
* | | md/raid10: Allow skipping recovery when clean arrays are assembledMartin Wilck2013-04-241-0/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When an array is assembled incrementally with mdadm -I -R and the array switches to "active" mode, md starts a recovery. If the array was clean, the "fullsync" flag will be 0. Skip the full recovery in this case, as RAID1 does (the code was actually copied from the sync_request() method of RAID1). Signed-off-by: Martin Wilck <mwilck@arcor.de> Signed-off-by: NeilBrown <neilb@suse.de>