aboutsummaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
* md: remove raid5 compute_block and compute_parity5Dan Williams2007-07-131-124/+0
| | | | | | | replaced by raid5_run_ops Signed-off-by: Dan Williams <dan.j.williams@intel.com> Acked-By: NeilBrown <neilb@suse.de>
* md: handle_stripe5 - request io processing in raid5_run_opsDan Williams2007-07-131-58/+13
| | | | | | | | | I/O submission requests were already handled outside of the stripe lock in handle_stripe. Now that handle_stripe is only tasked with finding work, this logic belongs in raid5_run_ops. Signed-off-by: Dan Williams <dan.j.williams@intel.com> Acked-By: NeilBrown <neilb@suse.de>
* md: handle_stripe5 - add request/completion logic for async expand opsDan Williams2007-07-131-12/+38
| | | | | | | | | | | | | | | | When a stripe is being expanded bulk copying takes place to move the data from the old stripe to the new. Since raid5_run_ops only operates on one stripe at a time these bulk copies are handled in-line under the stripe lock. In the dma offload case we poll for the completion of the operation. After the data has been copied into the new stripe the parity needs to be recalculated across the new disks. We reuse the existing postxor functionality to carry out this calculation. By setting STRIPE_OP_POSTXOR without setting STRIPE_OP_BIODRAIN the completion path in handle stripe can differentiate expand operations from normal write operations. Signed-off-by: Dan Williams <dan.j.williams@intel.com> Acked-By: NeilBrown <neilb@suse.de>
* md: handle_stripe5 - add request/completion logic for async read opsDan Williams2007-07-132-29/+26
| | | | | | | | | | | | | | | | When a read bio is attached to the stripe and the corresponding block is marked R5_UPTODATE, then a read (biofill) operation is scheduled to copy the data from the stripe cache to the bio buffer. handle_stripe flags the blocks to be operated on with the R5_Wantfill flag. If new read requests arrive while raid5_run_ops is running they will not be handled until handle_stripe is scheduled to run again. Changelog: * cleanup to_read and to_fill accounting * do not fail reads that have reached the cache Signed-off-by: Dan Williams <dan.j.williams@intel.com> Acked-By: NeilBrown <neilb@suse.de>
* md: handle_stripe5 - add request/completion logic for async check opsDan Williams2007-07-131-19/+65
| | | | | | | | | | | | | | | | | | | | | | | Check operations are scheduled when the array is being resynced or an explicit 'check/repair' command was sent to the array. Previously check operations would destroy the parity block in the cache such that even if parity turned out to be correct the parity block would be marked !R5_UPTODATE at the completion of the check. When the operation can be carried out by a dma engine the assumption is that it can check parity as a read-only operation. If raid5_run_ops notices that the check was handled by hardware it will preserve the R5_UPTODATE status of the parity disk. When a check operation determines that the parity needs to be repaired we reuse the existing compute block infrastructure to carry out the operation. Repair operations imply an immediate write back of the data, so to differentiate a repair from a normal compute operation the STRIPE_OP_MOD_REPAIR_PD flag is added. Changelog: * remove test_and_set/test_and_clear BUG_ONs, Neil Brown Signed-off-by: Dan Williams <dan.j.williams@intel.com> Acked-By: NeilBrown <neilb@suse.de>
* md: handle_stripe5 - add request/completion logic for async compute opsDan Williams2007-07-132-36/+115
| | | | | | | | | | | | | | | | | | | | | | | | | | | handle_stripe will compute a block when a backing disk has failed, or when it determines it can save a disk read by computing the block from all the other up-to-date blocks. Previously a block would be computed under the lock and subsequent logic in handle_stripe could use the newly up-to-date block. With the raid5_run_ops implementation the compute operation is carried out a later time outside the lock. To preserve the old functionality we take advantage of the dependency chain feature of async_tx to flag the block as R5_Wantcompute and then let other parts of handle_stripe operate on the block as if it were up-to-date. raid5_run_ops guarantees that the block will be ready before it is used in another operation. However, this only works in cases where the compute and the dependent operation are scheduled at the same time. If a previous call to handle_stripe sets the R5_Wantcompute flag there is no facility to pass the async_tx dependency chain across successive calls to raid5_run_ops. The req_compute variable protects against this case. Changelog: * remove the req_compute BUG_ON Signed-off-by: Dan Williams <dan.j.williams@intel.com> Acked-By: NeilBrown <neilb@suse.de>
* md: handle_stripe5 - add request/completion logic for async write opsDan Williams2007-07-131-23/+138
| | | | | | | | | | | | | | | | | | | | | | | | After handle_stripe5 decides whether it wants to perform a read-modify-write, or a reconstruct write it calls handle_write_operations5. A read-modify-write operation will perform an xor subtraction of the blocks marked with the R5_Wantprexor flag, copy the new data into the stripe (biodrain) and perform a postxor operation across all up-to-date blocks to generate the new parity. A reconstruct write is run when all blocks are already up-to-date in the cache so all that is needed is a biodrain and postxor. On the completion path STRIPE_OP_PREXOR will be set if the operation was a read-modify-write. The STRIPE_OP_BIODRAIN flag is used in the completion path to differentiate write-initiated postxor operations versus expansion-initiated postxor operations. Completion of a write triggers i/o to the drives. Changelog: * make the 'rcw' parameter to handle_write_operations5 a simple flag, Neil Brown * remove test_and_set/test_and_clear BUG_ONs, Neil Brown Signed-off-by: Dan Williams <dan.j.williams@intel.com> Acked-By: NeilBrown <neilb@suse.de>
* md: common infrastructure for running operations with raid5_run_opsDan Williams2007-07-131-9/+58
| | | | | | | | | | | | | | | | | | | | | | All the handle_stripe operations that are to be transitioned to use raid5_run_ops need a method to coherently gather work under the stripe-lock and hand that work off to raid5_run_ops. The 'get_stripe_work' routine runs under the lock to read all the bits in sh->ops.pending that do not have the corresponding bit set in sh->ops.ack. This modified 'pending' bitmap is then passed to raid5_run_ops for processing. The transition from 'ack' to 'completion' does not need similar protection as the existing release_stripe infrastructure will guarantee that handle_stripe will run again after a completion bit is set, and handle_stripe can tolerate a sh->ops.completed bit being set while the lock is held. A call to async_tx_issue_pending_all() is added to raid5d to kick the offload engines once all pending stripe operations work has been submitted. This enables batching of the submission and completion of operations. Signed-off-by: Dan Williams <dan.j.williams@intel.com> Acked-By: NeilBrown <neilb@suse.de>
* md: raid5_run_ops - run stripe operations outside sh->lockDan Williams2007-07-132-3/+614
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When the raid acceleration work was proposed, Neil laid out the following attack plan: 1/ move the xor and copy operations outside spin_lock(&sh->lock) 2/ find/implement an asynchronous offload api The raid5_run_ops routine uses the asynchronous offload api (async_tx) and the stripe_operations member of a stripe_head to carry out xor+copy operations asynchronously, outside the lock. To perform operations outside the lock a new set of state flags is needed to track new requests, in-flight requests, and completed requests. In this new model handle_stripe is tasked with scanning the stripe_head for work, updating the stripe_operations structure, and finally dropping the lock and calling raid5_run_ops for processing. The following flags outline the requests that handle_stripe can make of raid5_run_ops: STRIPE_OP_BIOFILL - copy data into request buffers to satisfy a read request STRIPE_OP_COMPUTE_BLK - generate a missing block in the cache from the other blocks STRIPE_OP_PREXOR - subtract existing data as part of the read-modify-write process STRIPE_OP_BIODRAIN - copy data out of request buffers to satisfy a write request STRIPE_OP_POSTXOR - recalculate parity for new data that has entered the cache STRIPE_OP_CHECK - verify that the parity is correct STRIPE_OP_IO - submit i/o to the member disks (note this was already performed outside the stripe lock, but it made sense to add it as an operation type The flow is: 1/ handle_stripe sets STRIPE_OP_* in sh->ops.pending 2/ raid5_run_ops reads sh->ops.pending, sets sh->ops.ack, and submits the operation to the async_tx api 3/ async_tx triggers the completion callback routine to set sh->ops.complete and release the stripe 4/ handle_stripe runs again to finish the operation and optionally submit new operations that were previously blocked Note this patch just defines raid5_run_ops, subsequent commits (one per major operation type) modify handle_stripe to take advantage of this routine. Changelog: * removed ops_complete_biodrain in favor of ops_complete_postxor and ops_complete_write. * removed the raid5_run_ops workqueue * call bi_end_io for reads in ops_complete_biofill, saves a call to handle_stripe * explicitly handle the 2-disk raid5 case (xor becomes memcpy), Neil Brown * fix race between async engines and bi_end_io call for reads, Neil Brown * remove unnecessary spin_lock from ops_complete_biofill * remove test_and_set/test_and_clear BUG_ONs, Neil Brown * remove explicit interrupt handling for channel switching, this feature was absorbed (i.e. it is now implicit) by the async_tx api * use return_io in ops_complete_biofill Signed-off-by: Dan Williams <dan.j.williams@intel.com> Acked-By: NeilBrown <neilb@suse.de>
* raid5: replace custom debug PRINTKs with standard pr_debugDan Williams2007-07-131-58/+58
| | | | | | | | | Replaces PRINTK with pr_debug, and kills the RAID5_DEBUG definition in favor of the global DEBUG definition. To get local debug messages just add '#define DEBUG' to the top of the file. Signed-off-by: Dan Williams <dan.j.williams@intel.com> Acked-By: NeilBrown <neilb@suse.de>
* raid5: refactor handle_stripe5 and handle_stripe6 (v3)Dan Williams2007-07-132-786/+756
| | | | | | | | | | | | | | | | | | | | | | | | | | | handle_stripe5 and handle_stripe6 have very deep logic paths handling the various states of a stripe_head. By introducing the 'stripe_head_state' and 'r6_state' objects, large portions of the logic can be moved to sub-routines. 'struct stripe_head_state' consumes all of the automatic variables that previously stood alone in handle_stripe5,6. 'struct r6_state' contains the handle_stripe6 specific variables like p_failed and q_failed. One of the nice side effects of the 'stripe_head_state' change is that it allows for further reductions in code duplication between raid5 and raid6. The following new routines are shared between raid5 and raid6: handle_completed_write_requests handle_requests_to_failed_array handle_stripe_expansion Changes: * v2: fixed 'conf->raid_disk-1' for the raid6 'handle_stripe_expansion' path * v3: removed the unused 'dirty' field from struct stripe_head_state * v3: coalesced open coded bi_end_io routines into return_io() Signed-off-by: Dan Williams <dan.j.williams@intel.com> Acked-By: NeilBrown <neilb@suse.de>
* async_tx: add the async_tx apiDan Williams2007-07-1314-50/+1294
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The async_tx api provides methods for describing a chain of asynchronous bulk memory transfers/transforms with support for inter-transactional dependencies. It is implemented as a dmaengine client that smooths over the details of different hardware offload engine implementations. Code that is written to the api can optimize for asynchronous operation and the api will fit the chain of operations to the available offload resources. I imagine that any piece of ADMA hardware would register with the 'async_*' subsystem, and a call to async_X would be routed as appropriate, or be run in-line. - Neil Brown async_tx exploits the capabilities of struct dma_async_tx_descriptor to provide an api of the following general format: struct dma_async_tx_descriptor * async_<operation>(..., struct dma_async_tx_descriptor *depend_tx, dma_async_tx_callback cb_fn, void *cb_param) { struct dma_chan *chan = async_tx_find_channel(depend_tx, <operation>); struct dma_device *device = chan ? chan->device : NULL; int int_en = cb_fn ? 1 : 0; struct dma_async_tx_descriptor *tx = device ? device->device_prep_dma_<operation>(chan, len, int_en) : NULL; if (tx) { /* run <operation> asynchronously */ ... tx->tx_set_dest(addr, tx, index); ... tx->tx_set_src(addr, tx, index); ... async_tx_submit(chan, tx, flags, depend_tx, cb_fn, cb_param); } else { /* run <operation> synchronously */ ... <operation> ... async_tx_sync_epilog(flags, depend_tx, cb_fn, cb_param); } return tx; } async_tx_find_channel() returns a capable channel from its pool. The channel pool is organized as a per-cpu array of channel pointers. The async_tx_rebalance() routine is tasked with managing these arrays. In the uniprocessor case async_tx_rebalance() tries to spread responsibility evenly over channels of similar capabilities. For example if there are two copy+xor channels, one will handle copy operations and the other will handle xor. In the SMP case async_tx_rebalance() attempts to spread the operations evenly over the cpus, e.g. cpu0 gets copy channel0 and xor channel0 while cpu1 gets copy channel 1 and xor channel 1. When a dependency is specified async_tx_find_channel defaults to keeping the operation on the same channel. A xor->copy->xor chain will stay on one channel if it supports both operation types, otherwise the transaction will transition between a copy and a xor resource. Currently the raid5 implementation in the MD raid456 driver has been converted to the async_tx api. A driver for the offload engines on the Intel Xscale series of I/O processors, iop-adma, is provided in a later commit. With the iop-adma driver and async_tx, raid456 is able to offload copy, xor, and xor-zero-sum operations to hardware engines. On iop342 tiobench showed higher throughput for sequential writes (20 - 30% improvement) and sequential reads to a degraded array (40 - 55% improvement). For the other cases performance was roughly equal, +/- a few percentage points. On a x86-smp platform the performance of the async_tx implementation (in synchronous mode) was also +/- a few percentage points of the original implementation. According to 'top' on iop342 CPU utilization drops from ~50% to ~15% during a 'resync' while the speed according to /proc/mdstat doubles from ~25 MB/s to ~50 MB/s. The tiobench command line used for testing was: tiobench --size 2048 --block 4096 --block 131072 --dir /mnt/raid --numruns 5 * iop342 had 1GB of memory available Details: * if CONFIG_DMA_ENGINE=n the asynchronous path is compiled away by making async_tx_find_channel a static inline routine that always returns NULL * when a callback is specified for a given transaction an interrupt will fire at operation completion time and the callback will occur in a tasklet. if the the channel does not support interrupts then a live polling wait will be performed * the api is written as a dmaengine client that requests all available channels * In support of dependencies the api implicitly schedules channel-switch interrupts. The interrupt triggers the cleanup tasklet which causes pending operations to be scheduled on the next channel * Xor engines treat an xor destination address differently than a software xor routine. To the software routine the destination address is an implied source, whereas engines treat it as a write-only destination. This patch modifies the xor_blocks routine to take a an explicit destination address to mirror the hardware. Changelog: * fixed a leftover debug print * don't allow callbacks in async_interrupt_cond * fixed xor_block changes * fixed usage of ASYNC_TX_XOR_DROP_DEST * drop dma mapping methods, suggested by Chris Leech * printk warning fixups from Andrew Morton * don't use inline in C files, Adrian Bunk * select the API when MD is enabled * BUG_ON xor source counts <= 1 * implicitly handle hardware concerns like channel switching and interrupts, Neil Brown * remove the per operation type list, and distribute operation capabilities evenly amongst the available channels * simplify async_tx_find_channel to optimize the fast path * introduce the channel_table_initialized flag to prevent early calls to the api * reorganize the code to mimic crypto * include mm.h as not all archs include it in dma-mapping.h * make the Kconfig options non-user visible, Adrian Bunk * move async_tx under crypto since it is meant as 'core' functionality, and the two may share algorithms in the future * move large inline functions into c files * checkpatch.pl fixes * gpl v2 only correction Cc: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Acked-By: NeilBrown <neilb@suse.de>
* xor: make 'xor_blocks' a library routine for use with async_txDan Williams2007-07-138-23/+38
| | | | | | | | | | | | | | | | | | The async_tx api tries to use a dma engine for an operation, but will fall back to an optimized software routine otherwise. Xor support is implemented using the raid5 xor routines. For organizational purposes this routine is moved to a common area. The following fixes are also made: * rename xor_block => xor_blocks, suggested by Adrian Bunk * ensure that xor.o initializes before md.o in the built-in case * checkpatch.pl fixes * mark calibrate_xor_blocks __init, Adrian Bunk Cc: Adrian Bunk <bunk@stusta.de> Cc: NeilBrown <neilb@suse.de> Cc: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
* dmaengine: make clients responsible for managing channelsDan Williams2007-07-135-167/+224
| | | | | | | | | | | | | | | | | | | | | | | | | | | | The current implementation assumes that a channel will only be used by one client at a time. In order to enable channel sharing the dmaengine core is changed to a model where clients subscribe to channel-available-events. Instead of tracking how many channels a client wants and how many it has received the core just broadcasts the available channels and lets the clients optionally take a reference. The core learns about the clients' needs at dma_event_callback time. In support of multiple operation types, clients can specify a capability mask to only be notified of channels that satisfy a certain set of capabilities. Changelog: * removed DMA_TX_ARRAY_INIT, no longer needed * dma_client_chan_free -> dma_chan_release: switch to global reference counting only at device unregistration time, before it was also happening at client unregistration time * clients now return dma_state_client to dmaengine (ack, dup, nak) * checkpatch.pl fixes * fixup merge with git-ioat Cc: Chris Leech <christopher.leech@intel.com> Signed-off-by: Shannon Nelson <shannon.nelson@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Acked-by: David S. Miller <davem@davemloft.net>
* dmaengine: refactor dmaengine around dma_async_tx_descriptorDan Williams2007-07-134-253/+474
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The current dmaengine interface defines mutliple routines per operation, i.e. dma_async_memcpy_buf_to_buf, dma_async_memcpy_buf_to_page etc. Adding more operation types (xor, crc, etc) to this model would result in an unmanageable number of method permutations. Are we really going to add a set of hooks for each DMA engine whizbang feature? - Jeff Garzik The descriptor creation process is refactored using the new common dma_async_tx_descriptor structure. Instead of per driver do_<operation>_<dest>_to_<src> methods, drivers integrate dma_async_tx_descriptor into their private software descriptor and then define a 'prep' routine per operation. The prep routine allocates a descriptor and ensures that the tx_set_src, tx_set_dest, tx_submit routines are valid. Descriptor creation and submission becomes: struct dma_device *dev; struct dma_chan *chan; struct dma_async_tx_descriptor *tx; tx = dev->device_prep_dma_<operation>(chan, len, int_flag) tx->tx_set_src(dma_addr_t, tx, index /* for multi-source ops */) tx->tx_set_dest(dma_addr_t, tx, index) tx->tx_submit(tx) In addition to the refactoring, dma_async_tx_descriptor also lays the groundwork for definining cross-channel-operation dependencies, and a callback facility for asynchronous notification of operation completion. Changelog: * drop dma mapping methods, suggested by Chris Leech * fix ioat_dma_dependency_added, also caught by Andrew Morton * fix dma_sync_wait, change from Andrew Morton * uninline large functions, change from Andrew Morton * add tx->callback = NULL to dmaengine calls to interoperate with async_tx calls * hookup ioat_tx_submit * convert channel capabilities to a 'cpumask_t like' bitmap * removed DMA_TX_ARRAY_INIT, no longer needed * checkpatch.pl fixes * make set_src, set_dest, and tx_submit descriptor specific methods * fixup git-ioat merge * move group_list and phys to dma_async_tx_descriptor Cc: Jeff Garzik <jeff@garzik.org> Cc: Chris Leech <christopher.leech@intel.com> Signed-off-by: Shannon Nelson <shannon.nelson@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Acked-by: David S. Miller <davem@davemloft.net>
* I/OAT: fix I/OAT for kexecDan Aloni2007-07-111-0/+13
| | | | | | | | | | | | Under kexec, I/OAT initialization breaks over busy resources because the previous kernel did not release them. I'm not sure this fix can be considered a complete one but it works for me. I guess something similar to the *_remove method should occur there.. Signed-off-by: Dan Aloni <da-x@monatomic.org> Signed-off-by: Chris Leech <christopher.leech@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
* I/OAT: warning fixAndrew Morton2007-07-111-10/+16
| | | | | | | | net/ipv4/tcp.c: In function 'tcp_recvmsg': net/ipv4/tcp.c:1111: warning: unused variable 'available' Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Chris Leech <christopher.leech@intel.com>
* I/OAT: Only offload copies for TCP when there will be a context switchChris Leech2007-07-111-3/+7
| | | | | | | | The performance wins come with having the DMA copy engine doing the copies in parallel with the context switch. If there is enough data ready on the socket at recv time just use a regular copy. Signed-off-by: Chris Leech <christopher.leech@intel.com>
* I/OAT: Add documentation for the tcp_dma_copybreak sysctlChris Leech2007-07-111-0/+6
| | | | Signed-off-by: Chris Leech <christopher.leech@intel.com>
* ioatdma: Remove the use of writeq from the ioatdma driverChris Leech2007-07-111-6/+4
| | | | | | | There's only one now anyway, and it's not in a performance path, so make it behave the same on 32-bit and 64-bit CPUs. Signed-off-by: Chris Leech <christopher.leech@intel.com>
* ioatdma: Remove the wrappers around read(bwl)/write(bwl) in ioatdmaChris Leech2007-07-112-150/+28
| | | | Signed-off-by: Chris Leech <christopher.leech@intel.com>
* drivers/dma: handle sysfs errorsJeff Garzik2007-07-111-2/+20
| | | | | | | From: Jeff Garzik <jeff@garzik.org> Signed-off-by: Jeff Garzik <jeff@garzik.org> Signed-off-by: Chris Leech <christopher.leech@intel.com>
* ioatdma: Push pending transactions to hardware more frequentlyChris Leech2007-07-111-2/+2
| | | | | | | | Every 20 descriptors turns out to be to few append commands with newer/faster CPUs. Pushing every 4 still cuts down on MMIO writes to an acceptable level without letting the DMA engine run out of work. Signed-off-by: Chris Leech <christopher.leech@intel.com>
* Linux 2.6.22Linus Torvalds2007-07-081-1/+1
| | | | | | | | Woo-hoo. I'm sure somebody will report a "this doesn't compile, and I have a new root exploit" five minutes after release, but it still feels good ;) Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* Merge master.kernel.org:/pub/scm/linux/kernel/git/bart/ide-2.6Linus Torvalds2007-07-083-2/+3
|\ | | | | | | | | | | * master.kernel.org:/pub/scm/linux/kernel/git/bart/ide-2.6: qd65xx: fix PIO mode selection sis5513: adding PCI-ID
| * qd65xx: fix PIO mode selectionBartlomiej Zolnierkiewicz2007-07-081-2/+1
| | | | | | | | | | | | | | | | | | | | PIO4 is a maximum PIO mode supported by a driver. Using "255" as a max_mode argument to ide_get_best_pio_mode() could result in wrong timings being used by a driver (for "pio" equal to 5) or OOPS (for "pio" values > 5 && < 255). Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com> Acked-by: Sergei Shtylyov <sshtylyov@ru.mvista.com> Reviewed-by: Alan Cox <alan@lxorguk.ukuu.org.uk>
| * sis5513: adding PCI-IDUwe Koziolek2007-07-082-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The SiS966 has one additional PCI-ID 1180. If the chipset is using this PCI-ID, the primary channel is connected to the first PATA-port. The secondary channel is connected to SATA-ports in IDE emulation mode. The legacy IO-ports are used. The including of the PCI-ID into pata_sis is not sufficient, because the legacy driver in drivers/ide is initialized before pata_sis. Signed-off-by: Uwe Koziolek <uwe.koziolek@gmx.net> Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
* | Fix permission checking for the new utimensat() system callLinus Torvalds2007-07-081-3/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 1c710c896eb461895d3c399e15bb5f20b39c9073 added the utimensat() system call, but didn't handle the case of checking for the writability of the target right, when the target was a file descriptor, not a filename. We cannot use vfs_permission(MAY_WRITE) for that case, and need to simply check whether the file descriptor is writable. The oops from using the wrong function was noticed and narrowed down by Markus Trippelsdorf. Cc: Ulrich Drepper <drepper@redhat.com> Cc: Markus Trippelsdorf <markus@trippelsdorf.de> Cc: Andrew Morton <akpm@linux-foundation.org> Acked-by: Al Viro <viro@ftp.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | mm: double mark_page_accessed() in read_cache_page_async()Peter Zijlstra2007-07-081-1/+0
|/ | | | | | | | | | | | | | | Fix a post-2.6.21 regression. read_cache_page_async() has two invocations of mark_page_accessed() which will launch pages right onto the active list. Remove the first one, keeping the latter one. This avoids marking unwanted pages active (in the retry loop). Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Nick Piggin <nickpiggin@yahoo.com.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* DLM must depend on SYSFSAdrian Bunk2007-07-071-1/+1
| | | | | | | | | | | | | | | | | | | | | The dependency of DLM on SYSFS got lost in commit 6ed7257b46709e87d79ac2b6b819b7e0c9184998 resulting in the following compile error with CONFIG_DLM=y, CONFIG_SYSFS=n: <-- snip --> ... LD .tmp_vmlinux1 fs/built-in.o: In function `dlm_lockspace_init': /home/bunk/linux/kernel-2.6/linux-2.6.22-rc6-mm1/fs/dlm/lockspace.c:231: undefined reference to `kernel_subsys' fs/built-in.o: In function `configfs_init': /home/bunk/linux/kernel-2.6/linux-2.6.22-rc6-mm1/fs/configfs/mount.c:143: undefined reference to `kernel_subsys' make[1]: *** [.tmp_vmlinux1] Error 1 <-- snip --> Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* Clean up E7520/7320/7525 quirk printk.Dave Jones2007-07-071-3/+2
| | | | | | | | | | | | | | | The printk level in this printk is bogus, as the previous printk didn't have a terminating \n resulting in .. Intel E7520/7320/7525 detected.<6>Disabling irq balancing and affinity It also never printed a \n at all in the case where we didn't do the quirk. Change it to only make noise if it actually does something useful. Signed-off-by: Dave Jones <davej@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* include/linux/kallsyms.h must #include <linux/errno.h>Adrian Bunk2007-07-071-0/+1
| | | | | | | | | | | | | | | | | | | | | | This patch fixes the following 2.6.22 regression with CONFIG_KALLSYMS=n: <-- snip --> ... CC arch/m32r/kernel/traps.o In file included from /home/bunk/linux/kernel-2.6/linux-2.6.22-rc6-mm1/arch/m32r/kernel/traps.c:14: /home/bunk/linux/kernel-2.6/linux-2.6.22-rc6-mm1/include/linux/kallsyms.h: In function 'lookup_symbol_name': /home/bunk/linux/kernel-2.6/linux-2.6.22-rc6-mm1/include/linux/kallsyms.h:66: error: 'ERANGE' undeclared (first use in this function) /home/bunk/linux/kernel-2.6/linux-2.6.22-rc6-mm1/include/linux/kallsyms.h:66: error: (Each undeclared identifier is reported only once /home/bunk/linux/kernel-2.6/linux-2.6.22-rc6-mm1/include/linux/kallsyms.h:66: error: for each function it appears in.) /home/bunk/linux/kernel-2.6/linux-2.6.22-rc6-mm1/include/linux/kallsyms.h: In function 'lookup_symbol_attrs': /home/bunk/linux/kernel-2.6/linux-2.6.22-rc6-mm1/include/linux/kallsyms.h:71: error: 'ERANGE' undeclared (first use in this function) make[2]: *** [arch/m32r/kernel/traps.o] Error 1 <-- snip --> Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* Fix use-after-free oops in Bluetooth HID.David Woodhouse2007-07-071-9/+9
| | | | | | | | | | | | | | | | When cleaning up HIDP sessions, we currently close the ACL connection before deregistering the input device. Closing the ACL connection schedules a workqueue to remove the associated objects from sysfs, but the input device still refers to them -- and if the workqueue happens to run before the input device removal, the kernel will oops when trying to look up PHYSDEVPATH for the removed input device. Fix this by deregistering the input device before closing the connections. Signed-off-by: David Woodhouse <dwmw2@infradead.org> Acked-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* slub: remove useless EXPORT_SYMBOLChristoph Lameter2007-07-061-1/+0
| | | | | | | | | | | | | | kmem_cache_open is static. EXPORT_SYMBOL was leftover from some earlier time period where kmem_cache_open was usable outside of slub. (Fixes powerpc build error) Signed-off-by: Chrsitoph Lameter <clameter@sgi.com> Cc: Johannes Berg <johannes@sipsolutions.net> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* MAINTAINERS new kernel janitors mlmaximilian attems2007-07-061-1/+1
| | | | | | | | | | davem kindly moved the list from osdl to vger. Signed-of-by: maximilian attems <max@stro.at> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* GEODE: reboot fixup for geode machines with CS5536 boardsAndres Salomon2007-07-062-2/+15
| | | | | | | | | | | | Writing to MSR 0x51400017 forces a hard reset on CS5536-based machines, this has the reboot fixup do just that if such a board is detected. Acked-by: Jordan Crouse <jordan.crouse@amd.com> Signed-off-by: Andres Salomon <dilinger@debian.org> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Cc: Andi Kleen <ak@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* Merge branch 'master' of ↵Linus Torvalds2007-07-066-8/+20
|\ | | | | | | | | | | | | | | | | | | | | | | | | master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 * 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6: [NETPOLL]: Fixups for 'fix soft lockup when removing module' [NET]: net/core/netevent.c should #include <net/netevent.h> [NETFILTER]: nf_conntrack_h323: add checking of out-of-range on choices' index values [NET] skbuff: remove export of static symbol SCTP: Add scope_id validation for link-local binds SCTP: Check to make sure file is valid before setting timeout SCTP: Fix thinko in sctp_copy_laddrs()
| * [NETPOLL]: Fixups for 'fix soft lockup when removing module'Jarek Poplawski2007-07-051-4/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | >From my recent patch: > > #1 > > Until kernel ver. 2.6.21 (including) cancel_rearming_delayed_work() > > required a work function should always (unconditionally) rearm with > > delay > 0 - otherwise it would endlessly loop. This patch replaces > > this function with cancel_delayed_work(). Later kernel versions don't > > require this, so here it's only for uniformity. But Oleg Nesterov <oleg@tv-sign.ru> found: > But 2.6.22 doesn't need this change, why it was merged? > > In fact, I suspect this change adds a race, ... His description was right (thanks), so this patch reverts #1. Signed-off-by: Jarek Poplawski <jarkao2@o2.pl> Signed-off-by: David S. Miller <davem@davemloft.net>
| * [NET]: net/core/netevent.c should #include <net/netevent.h>Adrian Bunk2007-07-051-0/+1
| | | | | | | | | | | | | | | | Every file should include the headers containing the prototypes for its global functions. Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: David S. Miller <davem@davemloft.net>
| * [NETFILTER]: nf_conntrack_h323: add checking of out-of-range on choices' ↵Jing Min Zhao2007-07-051-1/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | index values Choices' index values may be out of range while still encoded in the fixed length bit-field. This bug may cause access to undefined types (NULL pointers) and thus crashes (Reported by Zhongling Wen). This patch also adds checking of decode flag when decoding SEQUENCEs. Signed-off-by: Jing Min Zhao <zhaojingmin@vivecode.com> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
| * [NET] skbuff: remove export of static symbolJohannes Berg2007-07-051-1/+0
| | | | | | | | | | | | | | skb_clone_fraglist is static so it shouldn't be exported. Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: David S. Miller <davem@davemloft.net>
| * SCTP: Add scope_id validation for link-local bindsVlad Yasevich2007-07-051-0/+4
| | | | | | | | | | | | | | | | | | | | SCTP currently permits users to bind to link-local addresses, but doesn't verify that the scope id specified at bind matches the interface that the address is configured on. It was report that this can hang a system. Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * SCTP: Check to make sure file is valid before setting timeoutVlad Yasevich2007-07-051-1/+9
| | | | | | | | | | | | | | | | | | | | In-kernel sockets created with sock_create_kern don't usually have a file and file descriptor allocated to them. As a result, when SCTP tries to check the non-blocking flag, we Oops when dereferencing a NULL file pointer. Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * SCTP: Fix thinko in sctp_copy_laddrs()Vlad Yasevich2007-07-051-1/+1
| | | | | | | | | | | | | | | | Correctly dereference bytes_copied in sctp_copy_laddrs(). I totally must have spaced when doing this. Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | Merge branch 'upstream' of git://ftp.linux-mips.org/pub/scm/upstream-linusLinus Torvalds2007-07-069-19/+44
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * 'upstream' of git://ftp.linux-mips.org/pub/scm/upstream-linus: [MIPS] Fix scheduling latency issue on 24K, 34K and 74K cores [MIPS] Add macros to encode processor revisions. [MIPS] RM7000: Enable ICACHE_REFILLS_WORKAROUND_WAR. [MIPS] SMTC: Fix cut'n'paste bug in Kconfig.debug [MIPS] Change libgcc-style functions from lib-y to obj-y [MIPS] Fix timer/performance interrupt detection [MIPS] AP/SP: Avoid triggering the 34K E125 performance issue [MIPS] 64-bit TO_PHYS_MASK macro for RM9000 processors
| * | [MIPS] Fix scheduling latency issue on 24K, 34K and 74K coresRalf Baechle2007-07-062-2/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The idle loop goes to sleep using the WAIT instruction if !need_resched(). This has is suffering from from a race condition that if if just after need_resched has returned 0 an interrupt might set TIF_NEED_RESCHED but we've just completed the test so go to sleep anyway. This would be trivial to fix by just disabling interrupts during that sequence as in: local_irq_disable(); if (!need_resched()) __asm__("wait"); local_irq_enable(); but the processor architecture leaves it undefined if a processor calling WAIT with interrupts disabled will ever restart its pipeline and indeed some processors have made use of the freedom provided by the architecture definition. This has been resolved and the Config7.WII bit indicates that the use of WAIT is safe on 24K, 24KE and 34K cores. It also is safe on 74K starting revision 2.1.0 so enable the use of WAIT with interrupts disabled for 74K based on a c0_prid of at least that. Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
| * | [MIPS] Add macros to encode processor revisions.Ralf Baechle2007-07-061-0/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | Older processors used to encode processor version and revision in two 4-bit bitfields, the 4K seems to simply count up and even newer MTI cores have switched to use the 8-bits as 3:3:2 bitfield with the last field as the patch number. Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
| * | [MIPS] RM7000: Enable ICACHE_REFILLS_WORKAROUND_WAR.Ralf Baechle2007-07-061-7/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The RM7000 processors and the E9000 cores have a bug (though PMC-Sierra opposes it being called that) where invalid instructions in the same I-cache line worth of instructions being fetched may case spurious exceptions. The workaround for this was only enabled for E9000 cores; enable it also for all RM7000-based platforms. Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
| * | [MIPS] SMTC: Fix cut'n'paste bug in Kconfig.debugRalf Baechle2007-07-061-1/+1
| | | | | | | | | | | | | | | | | | This effectivly turned the SMTC_IDLE_HOOK_DEBUG debug option into a no-op. Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
| * | [MIPS] Change libgcc-style functions from lib-y to obj-yRalf Baechle2007-07-061-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | Reported by Eugene Surovegin <ebs@ebshome.net>. If only modules were users of these functions they did not get linked into the kernel proper, so later module loads would fail as well. Signed-off-by: Ralf Baechle <ralf@linux-mips.org>