aboutsummaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
* Merge branch 'for-linus' of git://git.infradead.org/ubi-2.6Linus Torvalds2007-04-2730-0/+12114
|\ | | | | | | | | | | | | | | * 'for-linus' of git://git.infradead.org/ubi-2.6: UBI: remove unused variable UBI: add me to MAINTAINERS JFFS2: add UBI support UBI: Unsorted Block Images
| * UBI: remove unused variableArtem Bityutskiy2007-04-271-1/+0
| | | | | | | | Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
| * UBI: add me to MAINTAINERSArtem Bityutskiy2007-04-271-0/+8
| | | | | | | | Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
| * JFFS2: add UBI supportArtem Bityutskiy2007-04-273-0/+42
| | | | | | | | | | | | | | This patch make JFFS2 able to work with UBI volumes via the emulated MTD devices which are directly mapped to these volumes. Signed-off-by: Artem Bityutskiy <dedekind@infradead.org>
| * UBI: Unsorted Block ImagesArtem B. Bityutskiy2007-04-2726-0/+12065
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | UBI (Latin: "where?") manages multiple logical volumes on a single flash device, specifically supporting NAND flash devices. UBI provides a flexible partitioning concept which still allows for wear-levelling across the whole flash device. In a sense, UBI may be compared to the Logical Volume Manager (LVM). Whereas LVM maps logical sector numbers to physical HDD sector numbers, UBI maps logical eraseblocks to physical eraseblocks. More information may be found at http://www.linux-mtd.infradead.org/doc/ubi.html Partitioning/Re-partitioning An UBI volume occupies a certain number of erase blocks. This is limited by a configured maximum volume size, which could also be viewed as the partition size. Each individual UBI volume's size can be changed independently of the other UBI volumes, provided that the sum of all volume sizes doesn't exceed a certain limit. UBI supports dynamic volumes and static volumes. Static volumes are read-only and their contents are protected by CRC check sums. Bad eraseblocks handling UBI transparently handles bad eraseblocks. When a physical eraseblock becomes bad, it is substituted by a good physical eraseblock, and the user does not even notice this. Scrubbing On a NAND flash bit flips can occur on any write operation, sometimes also on read. If bit flips persist on the device, at first they can still be corrected by ECC, but once they accumulate, correction will become impossible. Thus it is best to actively scrub the affected eraseblock, by first copying it to a free eraseblock and then erasing the original. The UBI layer performs this type of scrubbing under the covers, transparently to the UBI volume users. Erase Counts UBI maintains an erase count header per eraseblock. This frees higher-level layers (like file systems) from doing this and allows for centralized erase count management instead. The erase counts are used by the wear-levelling algorithm in the UBI layer. The algorithm itself is exchangeable. Booting from NAND For booting directly from NAND flash the hardware must at least be capable of fetching and executing a small portion of the NAND flash. Some NAND flash controllers have this kind of support. They usually limit the window to a few kilobytes in erase block 0. This "initial program loader" (IPL) must then contain sufficient logic to load and execute the next boot phase. Due to bad eraseblocks, which may be randomly scattered over the flash device, it is problematic to store the "secondary program loader" (SPL) statically. Also, due to bit-flips it may become corrupted over time. UBI allows to solve this problem gracefully by storing the SPL in a small static UBI volume. UBI volumes vs. static partitions UBI volumes are still very similar to static MTD partitions: * both consist of eraseblocks (logical eraseblocks in case of UBI volumes, and physical eraseblocks in case of static partitions; * both support three basic operations - read, write, erase. But UBI volumes have the following advantages over traditional static MTD partitions: * there are no eraseblock wear-leveling constraints in case of UBI volumes, so the user should not care about this; * there are no bit-flips and bad eraseblocks in case of UBI volumes. So, UBI volumes may be considered as flash devices with relaxed restrictions. Where can it be found? Documentation, kernel code and applications can be found in the MTD gits. What are the applications for? The applications help to create binary flash images for two purposes: pfi files (partial flash images) for in-system update of UBI volumes, and plain binary images, with or without OOB data in case of NAND, for a manufacturing step. Furthermore some tools are/and will be created that allow flash content analysis after a system has crashed.. Who did UBI? The original ideas, where UBI is based on, were developed by Andreas Arnez, Frank Haverkamp and Thomas Gleixner. Josh W. Boyer and some others were involved too. The implementation of the kernel layer was done by Artem B. Bityutskiy. The user-space applications and tools were written by Oliver Lohmann with contributions from Frank Haverkamp, Andreas Arnez, and Artem. Joern Engel contributed a patch which modifies JFFS2 so that it can be run on a UBI volume. Thomas Gleixner did modifications to the NAND layer. Alexander Schmidt made some testing work as well as core functionality improvements. Signed-off-by: Artem B. Bityutskiy <dedekind@linutronix.de> Signed-off-by: Frank Haverkamp <haver@vnet.ibm.com>
* | Merge branch 'upstream-linus' of ↵Linus Torvalds2007-04-2731-2237/+4697
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/mfasheh/ocfs2 * 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mfasheh/ocfs2: (27 commits) ocfs2: Cache extent records ocfs2: Remember rw lock level during direct io ocfs2: Fix up i_blocks calculation to know about holes ocfs2: Fix extent lookup to return true size of holes ocfs2: Read from an unwritten extent returns zeros ocfs2: make room for unwritten extents flag ocfs2: Use own splice write actor ocfs2: Use do_sync_mapping_range() in ocfs2_zero_tail_for_truncate() [PATCH] Turn do_sync_file_range() into do_sync_mapping_range() ocfs2: zero tail of sparse files on truncate ocfs2: Teach ocfs2_get_block() about holes ocfs2: remove ocfs2_prepare_write() and ocfs2_commit_write() ocfs2: teach ocfs2_file_aio_write() about sparse files ocfs2: Turn off shared writeable mmap for local files systems with holes. ocfs2: abstract out allocation locking ocfs2: teach extend/truncate about sparse files ocfs2: temporarily remove extent map caching ocfs2: sparse b-tree support ocfs2: small cleanup of ocfs2_request_delete() ocfs2: remove unused code ...
| * | ocfs2: Cache extent recordsMark Fasheh2007-04-267-0/+289
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The extent map code was ripped out earlier because of an inability to deal with holes. This patch adds back a simpler caching scheme requiring far less code. Our old extent map caching was designed back when meta data block caching in Ocfs2 didn't work very well, resulting in many disk reads. These days our metadata caching is much better, resulting in no un-necessary disk reads. As a result, extent caching doesn't have to be as fancy, nor does it have to cache as many extents. Keeping the last 3 extents seen should be sufficient to give us a small performance boost on some streaming workloads. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
| * | ocfs2: Remember rw lock level during direct ioMark Fasheh2007-04-263-7/+19
| | | | | | | | | | | | | | | | | | | | | Cluster locking might have been redone because a direct write won't complete, so this needs to be reflected in the iocb. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
| * | ocfs2: Fix up i_blocks calculation to know about holesMark Fasheh2007-04-269-30/+25
| | | | | | | | | | | | | | | | | | | | | | | | Older file systems which didn't support holes did a dumb calculation of i_blocks based on i_size. This is no longer accurate, so fix things up to take actual allocation into account. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
| * | ocfs2: Fix extent lookup to return true size of holesMark Fasheh2007-04-265-12/+109
| | | | | | | | | | | | | | | | | | | | | | | | Initially, we had wired things to return a size '1' of holes. Cook up a small amount of code to find the next extent and calculate the number of clusters between the virtual offset and the next allocated extent. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
| * | ocfs2: Read from an unwritten extent returns zerosMark Fasheh2007-04-2610-23/+45
| | | | | | | | | | | | | | | | | | | | | | | | Return an optional extent flags field from our lookup functions and wire up callers to treat unwritten regions as holes for the purpose of returning zeros to the user. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
| * | ocfs2: make room for unwritten extents flagMark Fasheh2007-04-266-69/+151
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Due to the size of our group bitmaps, we'll never have a leaf node extent record with more than 16 bits worth of clusters. Split e_clusters up so that leaf nodes can get a flags field where we can mark unwritten extents. Interior nodes whose length references all the child nodes beneath it can't split their e_clusters field, so we use a union to preserve sizing there. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
| * | ocfs2: Use own splice write actorMark Fasheh2007-04-263-1/+162
| | | | | | | | | | | | | | | | | | | | | | | | We need to fill holes during a splice write. Provide our own splice write actor which can call ocfs2_file_buffered_write() with a splice-specific callback. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
| * | ocfs2: Use do_sync_mapping_range() in ocfs2_zero_tail_for_truncate()Mark Fasheh2007-04-261-1/+4
| | | | | | | | | | | | | | | | | | | | | Do this instead of filemap_fdatawrite() - this way we sync only the range between i_size and the cluster boundary. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
| * | [PATCH] Turn do_sync_file_range() into do_sync_mapping_range()Mark Fasheh2007-04-262-7/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | do_sync_file_range() accepts a file * from which it takes an address_space to sync. Abstract out the bulk of the function into do_sync_mapping_range() which takes the address_space directly. This way callers who want to sync an address_space directly can take advantage of the functionality provided. do_sync_file_range() is preserved as a small wrapper around do_sync_mapping_range(). Ocfs2 in particular would like to use this to initiate a sync of a specific inode range during truncate, where a file * may not be available. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
| * | ocfs2: zero tail of sparse files on truncateMark Fasheh2007-04-267-25/+328
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Since we don't zero on extend anymore, truncate needs to be fixed up to zero the part of a file between i_size and and end of it's cluster. Otherwise a subsequent extend could expose bad data. This introduced a new helper, which can be used in ocfs2_write(). Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
| * | ocfs2: Teach ocfs2_get_block() about holesMark Fasheh2007-04-261-38/+61
| | | | | | | | | | | | | | | | | | | | | | | | ocfs2_get_block() didn't understand sparse files, fix that. Also remove some code that isn't really useful anymore. We can fix up ocfs2_direct_IO_get_blocks() at the same time. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
| * | ocfs2: remove ocfs2_prepare_write() and ocfs2_commit_write()Mark Fasheh2007-04-261-120/+5
| | | | | | | | | | | | | | | | | | | | | These are no longer used, and can't handle file systems with sparse file allocation. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
| * | ocfs2: teach ocfs2_file_aio_write() about sparse filesMark Fasheh2007-04-267-57/+1076
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Unfortunately, ocfs2 can no longer make use of generic_file_aio_write_nlock() because allocating writes will require zeroing of pages adjacent to the I/O for cluster sizes greater than page size. Implement a custom file write here, which can order page locks for zeroing. This also has the advantage that cluster locks can easily be ordered outside of the page locks. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
| * | ocfs2: Turn off shared writeable mmap for local files systems with holes.Mark Fasheh2007-04-261-2/+5
| | | | | | | | | | | | | | | | | | This will be turned back on once we can do allocation in ->page_mkwrite(). Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
| * | ocfs2: abstract out allocation lockingMark Fasheh2007-04-261-27/+77
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Right now, file allocation for ocfs2 is done within ocfs2_extend_file(), which is either called from ->setattr() (for an i_size change), or at the top of ocfs2_file_aio_write(). Inodes on file systems with sparse file support will want to do their allocation during the actual write call. In either case the cluster locking decisions are the same. We abstract out that code into a new function, ocfs2_lock_allocators() which will be used by a later patch to enable writing to sparse files. This also provides a nice cleanup of ocfs2_extend_allocation(). Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
| * | ocfs2: teach extend/truncate about sparse filesMark Fasheh2007-04-263-237/+320
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | For ocfs2_truncate_file(), we eliminate the "simple" truncate case which no longer exists since i_size is not tied to i_clusters. In ocfs2_extend_file(), we skip the allocation / page zeroing code for file systems which understand sparse files. The core truncate code is changed to do a bottom up tree traversal. This gets abstracted out into it's own function. To make things more readable, most of the special case handling for in-inode extents from ocfs2_do_truncate() is also removed. Though write support for sparse files comes in a later patch, we at least update ocfs2_prepare_inode_for_write() to skip allocation for sparse files. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
| * | ocfs2: temporarily remove extent map cachingMark Fasheh2007-04-2614-996/+96
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The code in extent_map.c is not prepared to deal with a subtree being rotated between lookups. This can happen when filling holes in sparse files. Instead of a lengthy patch to update the code (which would likely lose the benefit of caching subtree roots), we remove most of the algorithms and implement a simple path based lookup. A less ambitious extent caching scheme will be added in a later patch. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
| * | ocfs2: sparse b-tree supportMark Fasheh2007-04-268-497/+2002
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Introduce tree rotations into the b-tree code. This will allow ocfs2 to support sparse files. Much of the added code is designed to be generic (in the ocfs2 sense) so that it can later be re-used to implement large extended attributes. This patch only adds the rotation code and does minimal updates to callers of the extent api. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
| * | ocfs2: small cleanup of ocfs2_request_delete()Mark Fasheh2007-04-261-33/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There are two checks in there (one for inode newness, one for other mounted nodes) which are unnecessary, so remove them. The DLM will allow the trylock in either case without any messaging overhead. Removing these makes ocfs2_request_delete() a one liner function, so just move the trylock out one level into ocfs2_query_inode_wipe(). Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
| * | ocfs2: remove unused codeTiger Yang2007-04-264-329/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Remove node messaging code that becomes unused with the delete inode vote removal. [Removed even more cruft which I spotted during review --Mark] Signed-off-by: Tiger Yang <tiger.yang@oracle.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
| * | ocfs2: Remove delete inode voteTiger Yang2007-04-2610-38/+205
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Ocfs2 currently does cluster-wide node messaging to check the open state of an inode during delete. This patch removes that mechanism in favor of an inode cluster lock which is taken at shared read when an inode is first read and dropped in clear_inode(). This allows a deleting node to test the liveness of an inode by attempting to take an exclusive lock. Signed-off-by: Tiger Yang <tiger.yang@oracle.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
| * | ocfs2: filter more error printsMark Fasheh2007-04-262-3/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We don't want to print anything at all in ocfs2_lookup() when getting an error from ocfs2_iget() - it could be something as innocuous as a signal being detected in the dlm. ocfs2_permission() should filter on -ENOENT which ocfs2_meta_lock() can return if the inode was deleted on another node. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
| * | ocfs2: Replace panic() with emergency_restart() when fencingSunil Mushran2007-04-261-1/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We have noticed panic() hanging leading us to a situation in which the node, while otherwise dead, is still disk heartbeating. This leads to a hung cluster as the other nodes are waiting for this node to stop disk heartbeating. This situation is only resolved by power resetting the box. Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
| * | ocfs2: Silence compiler warningsSunil Mushran2007-04-262-2/+2
| | | | | | | | | | | | | | | Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
| * | ocfs2: Local mounts should skip inode updatesMark Fasheh2007-04-261-12/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We don't want the extent map and uptodate cache destruction in ocfs2_meta_lock_update() on a local mount, so skip that. This fixes several bugs with uptodate being cleared on buffers and extent maps being corrupted. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
| * | ocfs2_dlm: Call cond_resched_lock() once per hash bucket scanSunil Mushran2007-04-261-2/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | In dlm_migrate_all_locks(), we currently call cond_resched_lock() after processing each lockres in a hash bucket. Move it outside the loop so as to call it only after the entire hash bucket has been processed. Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
| * | ocfs2_dlm: fix race in dlm_remaster_locksSrinivas Eeda2007-04-261-0/+2
| |/ | | | | | | | | | | | | | | | | | | | | There is a possibility that dlm_remaster_locks could overwride node->state with DLM_RECO_NODE_DATA_REQUESTED after dlm_reco_data_done_handler sets the node->state to DLM_RECO_NODE_DATA_DONE. This could lead to recovery getting stuck and requires a cluster reboot. Synchronize with dlm_reco_state_lock spinlock. Signed-off-by: Srinivas Eeda <srinivas.eeda@oracle.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
* | Merge branch 'e1000-fixes' of ↵Linus Torvalds2007-04-271-46/+58
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | master.kernel.org:/pub/scm/linux/kernel/git/jgarzik/netdev-2.6 * 'e1000-fixes' of master.kernel.org:/pub/scm/linux/kernel/git/jgarzik/netdev-2.6: e1000: FIX: Stop raw interrupts disabled nag from RT e1000: FIX: firmware handover bits e1000: FIX: be ready for incoming irq at pci_request_irq
| * | e1000: FIX: Stop raw interrupts disabled nag from RTMark Huth2007-04-261-4/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Current e1000_xmit_frame spews raw interrupt disabled nag messages when used with RT kernel patches. This patch uses spin_trylock_irqsave, which allows RT patches to properly manage the irq semantics. Signed-off-by: Mark Huth <mhuth@mvista.com> Signed-off-by: Auke Kok <auke-jan.h.kok@intel.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>
| * | e1000: FIX: firmware handover bitsBruce Allan2007-04-261-21/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Upon code inspection it was spotted that the firmware handover bit get/set mismatched, which may have resulted in management issues on PCI-E adapters. Setting them correctly may fix some management issues such as arp routing etc. Signed-off-by: Auke Kok <auke-jan.h.kok@intel.com> Signed-off-by: Bruce Allan <bruce.w.allan@intel.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>
| * | e1000: FIX: be ready for incoming irq at pci_request_irqAuke Kok2007-04-261-21/+45
| |/ | | | | | | | | | | | | | | | | | | DEBUG_SHIRQ code exposed that e1000 was not ready for incoming interrupts after having called pci_request_irq. This obviously requires us to finish our software setup which assigns the irq handler before we request the irq. Signed-off-by: Auke Kok <auke-jan.h.kok@intel.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>
* | Merge branch 'for-linus' of ↵Linus Torvalds2007-04-2747-1002/+1620
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | master.kernel.org:/pub/scm/linux/kernel/git/roland/infiniband * 'for-linus' of master.kernel.org:/pub/scm/linux/kernel/git/roland/infiniband: (49 commits) IB: Set class_dev->dev in core for nice device symlink IB/ehca: Implement modify_port IB/umad: Clarify documentation of transaction ID IPoIB/cm: spin_lock_irqsave() -> spin_lock_irq() replacements IB/mad: Change SMI to use enums rather than magic return codes IB/umad: Implement GRH handling for sent/received MADs IB/ipoib: Use ib_init_ah_from_path to initialize ah_attr IB/sa: Set src_path_bits correctly in ib_init_ah_from_path() IB/ucm: Simplify ib_ucm_event() RDMA/ucma: Simplify ucma_get_event() IB/mthca: Simplify CQ cleaning in mthca_free_qp() IB/mthca: Fix mthca_write_mtt() on HCAs with hidden memory IB/mthca: Update HCA firmware revisions IB/ipath: Fix WC format drift between user and kernel space IB/ipath: Check that a UD work request's address handle is valid IB/ipath: Remove duplicate stuff from ipath_verbs.h IB/ipath: Check reserved memory keys IB/ipath: Fix unit selection when all CPU affinity bits set IB/ipath: Don't allow QPs 0 and 1 to be opened multiple times IB/ipath: Disable IB link earlier in shutdown sequence ...
| * | IB: Set class_dev->dev in core for nice device symlinkJoachim Fenkes2007-04-245-4/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | All RDMA drivers except ehca set class_dev->dev to their dma_device value (ehca leaves this unset). dma_device is the only value that makes any sense, so move this assignment to core/sysfs.c. This reduce the duplicated code in the rest of the drivers and gives ehca a nice /sys/class/infiniband/ehcaX/device symlink. Signed-off-by: Joachim Fenkes <fenkes@de.ibm.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
| * | IB/ehca: Implement modify_portJoachim Fenkes2007-04-245-2/+83
| | | | | | | | | | | | | | | | | | | | | | | | | | | Add "Modify Port" verb support to eHCA driver. The IB communication manager needs this to set the IsCM port capability bit when initializing. Signed-off-by: Joachim Fenkes <fenkes@de.ibm.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
| * | IB/umad: Clarify documentation of transaction IDHal Rosenstock2007-04-241-0/+8
| | | | | | | | | | | | | | | Signed-off-by: Hal Rosenstock <halr@voltaire.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
| * | IPoIB/cm: spin_lock_irqsave() -> spin_lock_irq() replacementsRoland Dreier2007-04-241-32/+24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There are quite a few places in ipoib_cm.c where we know IRQs are enabled because we do something that sleeps in the same function, so we can convert several occurrences of spin_lock_irqsave() to a plain spin_lock_irq(). This cleans up the source a little and makes the code smaller too: add/remove: 0/0 grow/shrink: 1/5 up/down: 3/-51 (-48) function old new delta ipoib_cm_tx_reap 403 406 +3 ipoib_cm_stale_task 146 145 -1 ipoib_cm_dev_stop 173 172 -1 ipoib_cm_tx_handler 964 956 -8 ipoib_cm_rx_handler 956 937 -19 ipoib_cm_skb_reap 212 190 -22 Signed-off-by: Roland Dreier <rolandd@cisco.com>
| * | IB/mad: Change SMI to use enums rather than magic return codesHal Rosenstock2007-04-243-72/+82
| | | | | | | | | | | | | | | | | | | | | | | | Clarify code by changing return values from SMI functions to named enum values instead of magic 0/1 values. Signed-off-by: Hal Rosenstock <halr@voltaire.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
| * | IB/umad: Implement GRH handling for sent/received MADsSean Hefty2007-04-241-6/+12
| | | | | | | | | | | | | | | | | | | | | We need to set the SGID index for routed MADs and pass received GRH information to userspace when a MAD is received. Signed-off-by: Sean Hefty <sean.hefty@intel.com>
| * | IB/ipoib: Use ib_init_ah_from_path to initialize ah_attrSean Hefty2007-04-241-8/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | To support destinations that are not on the local IB subnet, IPoIB should include the GRH information when constructing an address handle. Using the existing ib_init_ah_from_path() call will do this for us. Signed-off-by: Sean Hefty <sean.hefty@intel.com>
| * | IB/sa: Set src_path_bits correctly in ib_init_ah_from_path()Sean Hefty2007-04-241-1/+23
| | | | | | | | | | | | | | | | | | src_path_bits needs to mask off the base LID value. Signed-off-by: Sean Hefty <sean.hefty@intel.com>
| * | IB/ucm: Simplify ib_ucm_event()Sean Hefty2007-04-241-17/+6
| | | | | | | | | | | | | | | | | | | | | Use wait_event_interruptible() instead of a more complicated open-coded equivalent. Signed-off-by: Sean Hefty <sean.hefty@intel.com>
| * | RDMA/ucma: Simplify ucma_get_event()Sean Hefty2007-04-241-15/+7
| | | | | | | | | | | | | | | | | | | | | Use wait_event_interruptible() instead of a more complicated open-coded equivalent. Signed-off-by: Sean Hefty <sean.hefty@intel.com>
| * | IB/mthca: Simplify CQ cleaning in mthca_free_qp()Roland Dreier2007-04-241-4/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | mthca_free_qp() already has local variables to hold the QP's send_cq and recv_cq, so we can slightly clean up the calls to mthca_cq_clean() by using those local variables instead of expressions like to_mcq(qp->ibqp.send_cq). Also, by cleaning the recv_cq first, we can avoid worrying about whether the QP is attached to an SRQ for the second call, because we would only clean send_cq if send_cq is not equal to recv_cq, and that means send_cq cannot have any receive completions from the QP being destroyed. All this work even improves the generated code a bit: add/remove: 0/0 grow/shrink: 0/1 up/down: 0/-5 (-5) function old new delta mthca_free_qp 510 505 -5 Signed-off-by: Roland Dreier <rolandd@cisco.com>
| * | IB/mthca: Fix mthca_write_mtt() on HCAs with hidden memoryRoland Dreier2007-04-241-2/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit b2875d4c ("IB/mthca: Always fill MTTs from CPU") causes a crash in mthca_write_mtt() with non-memfree HCAs that have their memory hidden (that is, have only two PCI BARs instead of having a third BAR that allows access to the RAM attached to the HCA) on 64-bit architectures. This is because the commit just before, c20e20ab ("IB/mthca: Merge MR and FMR space on 64-bit systems") makes dev->mr_table.fmr_mtt_buddy equal to &dev->mr_table.mtt_buddy and hence mthca_write_mtt() tries to write directly into the HCA's MTT table. However, since that table is in the HCA's memory, this is impossible without the PCI BAR that gives access to that memory. This causes a crash because mthca_tavor_write_mtt_seg() basically tries to dereference some offset of a NULL pointer. Fix this by adding a test of MTHCA_FLAG_FMR in mthca_write_mtt() so that we always use the WRITE_MTT firmware command rather than writing directly if FMRs are not enabled. Signed-off-by: Roland Dreier <rolandd@cisco.com>