aboutsummaryrefslogtreecommitdiffstats
path: root/drivers/md
Commit message (Collapse)AuthorAgeFilesLines
...
| * | md/raid6: Fix raid-6 read-error correction in degraded stateGabriele A. Trombetti2010-05-071-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fix: Raid-6 was not trying to correct a read-error when in singly-degraded state and was instead dropping one more device, going to doubly-degraded state. This patch fixes this behaviour. Tested-by: Janos Haar <janos.haar@netcenter.hu> Signed-off-by: Gabriele A. Trombetti <g.trombetti.lkrnl1213@logicschema.com> Reported-by: Janos Haar <janos.haar@netcenter.hu> Signed-off-by: NeilBrown <neilb@suse.de> Cc: stable@kernel.org
| * | md/raid5: fix previous patch.NeilBrown2010-04-231-17/+18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Previous patch changes stripe and chunk_number to sector_t but mistakenly did not update all of the divisions to use sector_dev(). This patch changes all the those divisions (actually the '%' operator) to sector_div. Signed-off-by: NeilBrown <neilb@suse.de> Cc: stable@kernel.org Tested-by: Stefan Lippers-Hollmann <s.l-h@gmx.de>
| * | md/raid5: allow for more than 2^31 chunks.NeilBrown2010-04-201-12/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | With many large drives and small chunk sizes it is possible to create a RAID5 with more than 2^31 chunks. Make sure this works. Reported-by: Brett King <king.br@gmail.com> Signed-off-by: NeilBrown <neilb@suse.de> Cc: stable@kernel.org
| * | include cleanup: Update gfp.h and slab.h includes to prepare for breaking ↵Tejun Heo2010-03-3014-1/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | implicit slab.h inclusion from percpu.h percpu.h is included by sched.h and module.h and thus ends up being included when building most .c files. percpu.h includes slab.h which in turn includes gfp.h making everything defined by the two files universally available and complicating inclusion dependencies. percpu.h -> slab.h dependency is about to be removed. Prepare for this change by updating users of gfp and slab facilities include those headers directly instead of assuming availability. As this conversion needs to touch large number of source files, the following script is used as the basis of conversion. http://userweb.kernel.org/~tj/misc/slabh-sweep.py The script does the followings. * Scan files for gfp and slab usages and update includes such that only the necessary includes are there. ie. if only gfp is used, gfp.h, if slab is used, slab.h. * When the script inserts a new include, it looks at the include blocks and try to put the new include such that its order conforms to its surrounding. It's put in the include block which contains core kernel includes, in the same order that the rest are ordered - alphabetical, Christmas tree, rev-Xmas-tree or at the end if there doesn't seem to be any matching order. * If the script can't find a place to put a new include (mostly because the file doesn't have fitting include block), it prints out an error message indicating which .h file needs to be added to the file. The conversion was done in the following steps. 1. The initial automatic conversion of all .c files updated slightly over 4000 files, deleting around 700 includes and adding ~480 gfp.h and ~3000 slab.h inclusions. The script emitted errors for ~400 files. 2. Each error was manually checked. Some didn't need the inclusion, some needed manual addition while adding it to implementation .h or embedding .c file was more appropriate for others. This step added inclusions to around 150 files. 3. The script was run again and the output was compared to the edits from #2 to make sure no file was left behind. 4. Several build tests were done and a couple of problems were fixed. e.g. lib/decompress_*.c used malloc/free() wrappers around slab APIs requiring slab.h to be added manually. 5. The script was run on all .h files but without automatically editing them as sprinkling gfp.h and slab.h inclusions around .h files could easily lead to inclusion dependency hell. Most gfp.h inclusion directives were ignored as stuff from gfp.h was usually wildly available and often used in preprocessor macros. Each slab.h inclusion directive was examined and added manually as necessary. 6. percpu.h was updated not to include slab.h. 7. Build test were done on the following configurations and failures were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my distributed build env didn't work with gcov compiles) and a few more options had to be turned off depending on archs to make things build (like ipr on powerpc/64 which failed due to missing writeq). * x86 and x86_64 UP and SMP allmodconfig and a custom test config. * powerpc and powerpc64 SMP allmodconfig * sparc and sparc64 SMP allmodconfig * ia64 SMP allmodconfig * s390 SMP allmodconfig * alpha SMP allmodconfig * um on x86_64 SMP allmodconfig 8. percpu.h modifications were reverted so that it could be applied as a separate patch and serve as bisection point. Given the fact that I had only a couple of failures from tests on step 6, I'm fairly confident about the coverage of this conversion patch. If there is a breakage, it's likely to be something in one of the arch headers which should be easily discoverable easily on most builds of the specific arch. Signed-off-by: Tejun Heo <tj@kernel.org> Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
| * | Merge branch 'for-linus' of git://neil.brown.name/mdLinus Torvalds2010-03-185-41/+60
| |\ \ | | | | | | | | | | | | | | | | * 'for-linus' of git://neil.brown.name/md: md: deal with merge_bvec_fn in component devices better.
| * | | Driver core: Constify struct sysfs_ops in struct kobj_typeEmese Revfy2010-03-072-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Constify struct sysfs_ops. This is part of the ops structure constification effort started by Arjan van de Ven et al. Benefits of this constification: * prevents modification of data that is shared (referenced) by many other structure instances at runtime * detects/prevents accidental (but not intentional) modification attempts on archs that enforce read-only kernel data at runtime * potentially better optimized code as the compiler can assume that the const data cannot be changed * the compiler/linker move const data into .rodata and therefore exclude them from false sharing Signed-off-by: Emese Revfy <re.emese@gmail.com> Acked-by: David Teigland <teigland@redhat.com> Acked-by: Matt Domsch <Matt_Domsch@dell.com> Acked-by: Maciej Sosnowski <maciej.sosnowski@intel.com> Acked-by: Hans J. Koch <hjk@linutronix.de> Acked-by: Pekka Enberg <penberg@cs.helsinki.fi> Acked-by: Jens Axboe <jens.axboe@oracle.com> Acked-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* | | | md: don't insist on valid event count for spare devices.NeilBrown2010-05-181-6/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Devices which know that they are spares do not really need to have an event count that matches the rest of the array, so there are no data-in-sync issues. It is enough that the uuid matches. So remove the requirement that the event count is up-to-date. We currently still write out and event count on spares, but this allows us in a year or 3 to stop doing that completely. Signed-off-by: NeilBrown <neilb@suse.de>
* | | | md: simplify updating of event count to sometimes avoid updating spares.NeilBrown2010-05-182-20/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When updating the event count for a simple clean <-> dirty transition, we try to avoid updating the spares so they can safely spin-down. As the event_counts across an array must be +/- 1, this means decrementing the event_count on a dirty->clean transition. This is not always safe and we have to avoid the unsafe time. We current do this with a misguided idea about it being safe or not depending on whether the event_count is odd or even. This approach only works reliably in a few common instances, but easily falls down. So instead, simply keep internal state concerning whether it is safe or not, and always assume it is not safe when an array is first assembled. Signed-off-by: NeilBrown <neilb@suse.de>
* | | | md/raid6: Fix raid-6 read-error correction in degraded stateGabriele A. Trombetti2010-05-181-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fix: Raid-6 was not trying to correct a read-error when in singly-degraded state and was instead dropping one more device, going to doubly-degraded state. This patch fixes this behaviour. Tested-by: Janos Haar <janos.haar@netcenter.hu> Signed-off-by: Gabriele A. Trombetti <g.trombetti.lkrnl1213@logicschema.com> Reported-by: Janos Haar <janos.haar@netcenter.hu> Signed-off-by: NeilBrown <neilb@suse.de> Cc: stable@kernel.org
* | | | md: restore ability of spare drives to spin down.NeilBrown2010-05-181-2/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Some time ago we stopped the clean/active metadata updates from being written to a 'spare' device in most cases so that it could spin down and say spun down. Device failure/removal etc are still recorded on spares. However commit 51d5668cb2e3fd1827a55 broke this 50% of the time, depending on whether the event count is even or odd. The change log entry said: This means that the alignment between 'odd/even' and 'clean/dirty' might take a little longer to attain, how ever the code makes no attempt to create that alignment, so it could take arbitrarily long. So when we find that clean/dirty is not aligned with odd/even, force a second metadata-update immediately. There are already cases where a second metadata-update is needed immediately (e.g. when a device fails during the metadata update). We just piggy-back on that. Reported-by: Joe Bryant <tenminjoe@yahoo.com> Signed-off-by: NeilBrown <neilb@suse.de> Cc: stable@kernel.org
* | | | md: Fix read balancing in RAID1 and RAID10 on drives > 2TBNeilBrown2010-05-182-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | read_balance uses a "unsigned long" for a sector number which will get truncated beyond 2TB. This will cause read-balancing to be non-optimal, and can cause data to be read from the 'wrong' branch during a resync. This has a very small chance of returning wrong data. Reported-by: Jordan Russell <jr-list-2010@quo.to> Cc: stable@kernel.org Signed-off-by: NeilBrown <neilb@suse.de>
* | | | md/linear: standardise all printk messagesNeilBrown2010-05-181-8/+12
| | | | | | | | | | | | | | | | | | | | | | | | md/linear:mdname: Signed-off-by: NeilBrown <neilb@suse.de>
* | | | md/raid0: tidy up printk messages.NeilBrown2010-05-181-45/+65
| | | | | | | | | | | | | | | | | | | | | | | | | | | | All messages now start md/raid0:md-device-name: Signed-off-by: NeilBrown <neilb@suse.de>
* | | | md/raid10: tidy up printk messages.NeilBrown2010-05-181-30/+42
| | | | | | | | | | | | | | | | | | | | | | | | | | | | All raid10 printk messages now start md/raid10:md-device-name: Signed-off-by: NeilBrown <neilb@suse.de>
* | | | md/raid1: improve printk messagesNeilBrown2010-05-181-28/+29
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Make sure the array name is included in a uniform way in all printk messages. Signed-off-by: NeilBrown <neilb@suse.de>
* | | | md/raid5: improve consistency of error messages.NeilBrown2010-05-181-80/+69
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Many 'printk' messages from the raid456 module mention 'raid5' even though it may be a 'raid6' or even 'raid4' array. This can cause confusion. Also the actual array name is not always reported and when it is it is not reported consistently. So change all the messages to start: md/raid:%s: where '%s' becomes e.g. md3 to identify the particular array. Signed-off-by: NeilBrown <neilb@suse.de>
* | | | md: remove EXPERIMENTAL designation from RAID10NeilBrown2010-05-181-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | RAID10 has been available for quite a while now and is quite well tested, so we can remove the EXPERIMENTAL designation. Reported-by: Eric MSP Veith <eveith@wwweb-library.net> Signed-off-by: NeilBrown <neilb@suse.de>
* | | | md: allow integers to be passed to md/levelDan Williams2010-05-181-10/+14
| | | | | | | | | | | | | | | | | | | | | | | | e.g. allow md to interpret 'echo 4 > md/level' as a request for raid4. Signed-off-by: Dan Williams <dan.j.williams@intel.com>
* | | | md: notify mdstat waiters of level changeDan Williams2010-05-181-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Level modifications change the output of mdstat. The mdmon manager thread is interested in these events for external metadata management. Signed-off-by: Dan Williams <dan.j.williams@intel.com>
* | | | md/raid4: permit raid0 takeoverDan Williams2010-05-181-15/+17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | For consistency allow raid4 to takeover raid0 in addition to raid5 (with a raid4 layout). Signed-off-by: Dan Williams <dan.j.williams@intel.com>
* | | | md/raid1: delay reads that could overtake behind-writes.NeilBrown2010-05-183-8/+22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When a raid1 array is configured to support write-behind on some devices, it normally only reads from other devices. If all devices are write-behind (because the rest have failed) it is possible for a read request to be serviced before a behind-write request, which would appear as data corruption. So when forced to read from a WriteMostly device, wait for any write-behind to complete, and don't start any more behind-writes. Signed-off-by: NeilBrown <neilb@suse.de>
* | | | md/raid1: fix confusing 'redirect sector' message.NeilBrown2010-05-181-4/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This message seems to suggest the named device is the one on which a read failed, however it is actually the device that the read will be redirected to. So make the message a little clearer. Reported-by: Tim Burgess <ozburgess@gmail.com> Signed-off-by: NeilBrown <neilb@suse.de>
* | | | md: don't unregister the thread in mddev_suspendNeilBrown2010-05-181-7/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is - unnecessary because mddev_suspend is always followed by a call to ->stop, and each ->stop unregisters the thread, and - a problem as it makes it awkwards to suspend and then resume a device as we will want later. Signed-off-by: NeilBrown <neilb@suse.de>
* | | | md: factor out init code for an mddevNeilBrown2010-05-181-17/+22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | This is a simple factorisation that makes mddev_find easier to read. Signed-off-by: NeilBrown <neilb@suse.de>
* | | | md: pass mddev to make_request functions rather than request_queueNeilBrown2010-05-189-26/+18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We used to pass the personality make_request function direct to the block layer so the first argument had to be a queue. But now we have the intermediary md_make_request so it makes at lot more sense to pass a struct mddev_s. It makes it possible to have an mddev without its own queue too. Signed-off-by: NeilBrown <neilb@suse.de>
* | | | md: call md_stop_writes from md_stopNeilBrown2010-05-181-4/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This moves the call to the other side of set_readonly, but that should not be an issue. This encapsulates in 'md_stop' all of the functionality for internally stopping the array, leaving all the interactions with externalities (sysfs, request_queue, gendisk) in do_md_stop. Signed-off-by: NeilBrown <neilb@suse.de>
* | | | md: split md_set_readonly out of do_md_stopNeilBrown2010-05-181-39/+51
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Using do_md_stop to set an array to read-only is a little confusing. Now most of the common code has been factored out, split md_set_readonly off in to a separate function. Signed-off-by: NeilBrown <neilb@suse.de>
* | | | md: factor md_stop_writes out of do_md_stop.NeilBrown2010-05-181-15/+22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Further refactoring of do_md_stop. This one requires some explanation as it takes code from different places in do_md_stop, so some re-ordering happens. We only get into this part of do_md_stop if there are no active opens of the device, so no writes can be happening and the device must have been flushed. In md_stop_writes we want to stop any internal sources of writes - i.e. resync - and flush out the metadata. The only code that was previously before some of this code is code to clean up the queue, the mddev, the gendisk, or sysfs, all of which is probably better after code that makes active changes (i.e. triggers writes). Signed-off-by: NeilBrown <neilb@suse.de>
* | | | md: start to refactor do_md_stopNeilBrown2010-05-181-43/+55
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | do_md_stop is large and clunky, so hard to understand. This is a first step of refactoring, pulling two simple sub-functions out. Signed-off-by: NeilBrown <neilb@suse.de>
* | | | md: factor do_md_run to separate accesses to ->gendiskNeilBrown2010-05-181-8/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | As part of relaxing the binding between an mddev and gendisk, we separate do_md_run into two functions. md_run does all the work internal to md do_md_run calls md_run and makes and changes to gendisk that are required. Signed-off-by: NeilBrown <neilb@suse.de>
* | | | md: remove ->changed and related code.NeilBrown2010-05-184-25/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We set ->changed to 1 and call check_disk_change at the end of md_open so that bd_invalidated would be set and thus partition rescan would happen appropriately. Now that we call revalidate_disk directly, which sets bd_invalidates, that indirection is no longer needed and can be removed. Signed-off-by: NeilBrown <neilb@suse.de>
* | | | md: don't reference gendisk in getgeoNeilBrown2010-05-181-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Using ->array_sectors rather than get_capacity() is more direct and is a step towards relaxing the tight connection between mddev and gendisk. Signed-off-by: NeilBrown <neilb@suse.de>
* | | | md: move io accounting out of personalities into md_make_requestNeilBrown2010-05-187-45/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | While I generally prefer letting personalities do as much as possible, given that we have a central md_make_request anyway we may as well use it to simplify code. Also this centralises knowledge of ->gendisk which will help later. Signed-off-by: NeilBrown <neilb@suse.de>
* | | | md/raid5: small tidyup in raid5_align_endioNeilBrown2010-05-181-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Diving through ->queue to find mddev is unnecessarily complex - there is an easier path to finding mddev, so use that. Signed-off-by: NeilBrown <neilb@suse.de>
* | | | md: add support for raid5 to raid4 conversionNeilBrown2010-05-181-0/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is unlikely to be wanted, but we may as well provide it for completeness. Signed-off-by: NeilBrown <neilb@suse.de>
* | | | md: notify level changes through sysfs.Maciej Trela2010-05-181-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Level changes can be very significant, so make sure to notify them via sysfs. Signed-off-by: Maciej Trela <maciej.trela@intel.com> Signed-off-by: NeilBrown <neilb@suse.de>
* | | | md: Relax checks on ->max_disks when external metadata handling is used.NeilBrown2010-05-181-3/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When metadata is being managed by user-space, md doesn't know what the maximum number of devices allowed in an array is so ->max_disks is 0. In this case we should allow any (+ve) number of disks. Signed-off-by: NeilBrown <neilb@suse.de>
* | | | md: Correctly handle device removal via sysfsMaciej Trela2010-05-181-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Writing "none" to "../md/dev-xx/slot" removes that device from being an active part of the array, but it didn't set ->raid_disk to -1 to record this fact. Signed-off-by: Maciej Trela <Maciej.Trela@intel.com> Signed-off-by: NeilBrown <neilb@suse.de>
* | | | md: Add support for Raid0->Raid10 takeoverTrela, Maciej2010-05-182-51/+155
| | | | | | | | | | | | | | | | | | | | Signed-off-by: Maciej Trela <maciej.trela@intel.com> Signed-off-by: NeilBrown <neilb@suse.de>
* | | | md: Add support for Raid5->Raid0 and Raid10->Raid0 takeoverTrela, Maciej2010-05-183-6/+129
| | | | | | | | | | | | | | | | | | | | Signed-off-by: Maciej Trela <maciej.trela@intel.com> Signed-off-by: NeilBrown <neilb@suse.de>
* | | | md:Add support for Raid0->Raid5 takeoverTrela Maciej2010-05-182-0/+40
| | | | | | | | | | | | | | | | | | | | Signed-off-by: Maciej Trela <maciej.trela@intel.com> Signed-off-by: NeilBrown <neilb@suse.de>
* | | | md: don't use mddev->raid_disks in raid0 or raid10 while array is active.NeilBrown2010-05-182-13/+18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In a subsequent patch we will make it possible to change mddev->raid_disks while a RAID0 or RAID10 array is active. This is part of the process of reshaping such an array. This means that we cannot use this value while processes requests (it is OK to use it during initialisation as we are locked against changes then). Both RAID0 and RAID10 have the same value stored in the private data structure, so use that value instead. Signed-off-by: NeilBrown <neilb@suse.de>
* | | | md: discard StateChanged device flag.NeilBrown2010-05-182-9/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This was needed when sysfs files could only be 'notified' from process context. Now that we have sys_notify_direct, we can call it directly from an interrupt. Signed-off-by: NeilBrown <neilb@suse.de>
* | | | drivers/md: Remove unnecessary casts of void *H Hartley Sweeten2010-05-186-21/+21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | void pointers do not need to be cast to other pointer types. Signed-off-by: H Hartley Sweeten <hsweeten@visionengravers.com> Signed-off-by: NeilBrown <neilb@suse.de>
* | | | md: expose max value of behind writes counterPaul Clements2010-05-182-1/+29
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Keep track of the maximum number of concurrent write-behind requests for an md array and exposed this number in sysfs at md/bitmap/max_backlog_used Writing any value to this file will clear it. This allows userspace to be involved in tuning bitmap/backlog. Signed-off-by: Paul Clements <paul.clements@steeleye.com> Signed-off-by: NeilBrown <neilb@suse.de>
* | | | md: remove some dead fields from mddev_sNeilBrown2010-05-181-3/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | These fields have never been used. commit 4b6d287f627b5fb6a49f78f9e81649ff98c62bb7 added them, but also added identical files to bitmap_super_s, and only used the latter. So remove these unused fields. Signed-off-by: NeilBrown <neilb@suse.de>
* | | | md/raid1: fix counting of write targets.NeilBrown2010-05-181-2/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There is a very small race window when writing to a RAID1 such that if a device is marked faulty at exactly the wrong time, the write-in-progress will not be sent to the device, but the bitmap (if present) will be updated to say that the write was sent. Then if the device turned out to still be usable as was re-added to the array, the bitmap-based-resync would skip resyncing that block, possibly leading to corruption. This would only be a problem if no further writes were issued to that area of the device (i.e. that bitmap chunk). Suitable for any pending -stable kernel. Cc: stable@kernel.org Signed-off-by: NeilBrown <neilb@suse.de>
* | | | md: manage redundancy group in sysfs when changing level.NeilBrown2010-05-173-13/+38
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Some levels expect the 'redundancy group' to be present, others don't. So when we change level of an array we might need to add or remove this group. This requires fixing up the current practice of overloading ->private to indicate (when ->pers == NULL) that something needs to be removed. So create a new ->to_remove to fill that role. When changing levels, we may need to add or remove attributes. When changing RAID5 -> RAID6, we both add and remove the same thing. It is important to catch this and optimise it out as the removal is delayed until a lock is released, so trying to add immediately would cause problems. Cc: stable@kernel.org Signed-off-by: NeilBrown <neilb@suse.de>
* | | | md: remove unneeded sysfs files more promptlyNeilBrown2010-05-171-10/+31
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When an array is stopped we need to remove some sysfs files which are dependent on the type of array. We need to delay that deletion as deleting them while holding reconfig_mutex can lead to deadlocks. We currently delay them until the array is completely destroyed. However it is possible to deactivate and then reactivate the array. It is also possible to need to remove sysfs files when changing level, which can potentially happen several times before an array is destroyed. So we need to delete these files more promptly: as soon as reconfig_mutex is dropped. We need to ensure this happens before do_md_run can restart the array, so we use open_mutex for some extra locking. This is not deadlock prone. Cc: stable@kernel.org Signed-off-by: NeilBrown <neilb@suse.de>
* | | | md/linear: avoid possible oops and array stopNeilBrown2010-05-171-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Since commit ef286f6fa673cd7fb367e1b145069d8dbfcc6081 it has been important that each personality clears ->private in the ->stop() function, or sets it to a attribute group to be removed. linear.c doesn't. This can sometimes lead to an oops, though it doesn't always. Suitable for 2.6.33-stable and 2.6.34. Signed-off-by: NeilBrown <neilb@suse.de> Cc: stable@kernel.org