aboutsummaryrefslogtreecommitdiffstats
path: root/fs
Commit message (Collapse)AuthorAgeFilesLines
* Merge tag 'please-pull-pstore' of ↵Linus Torvalds2013-09-035-34/+242
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux Pull pstore changes from Tony Luck: "A big part of this is the addition of compression to the generic pstore layer so that all backends can use the pitiful amounts of storage they control more effectively. Three other small fixes/cleanups too. * tag 'please-pull-pstore' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux: pstore/ram: (really) fix undefined usage of rounddown_pow_of_two pstore/ram: Read and write to the 'compressed' flag of pstore efi-pstore: Read and write to the 'compressed' flag of pstore erst: Read and write to the 'compressed' flag of pstore powerpc/pseries: Read and write to the 'compressed' flag of pstore pstore: Add file extension to pstore file if compressed pstore: Add decompression support to pstore pstore: Introduce new argument 'compressed' in the read callback pstore: Add compression support to pstore pstore/Kconfig: Select ZLIB_DEFLATE and ZLIB_INFLATE when PSTORE is selected pstore: Add new argument 'compressed' in pstore write callback powerpc/pseries: Remove (de)compression in nvram with pstore enabled pstore: d_alloc_name() doesn't return an ERR_PTR acpi/apei/erst: Add missing iounmap() on error in erst_exec_move_data()
| * pstore/ram: (really) fix undefined usage of rounddown_pow_of_twoMaxime Bizon2013-08-301-3/+3
| | | | | | | | | | | | | | | | | | | | | | Previous attempt to fix was b042e47491ba5f487601b5141a3f1d8582304170 Suggested use of is_power_of_2() was bogus because is_power_of_2(0) is false (documented behaviour). Signed-off-by: Maxime Bizon <mbizon@freebox.fr> Acked-by: Kees Cook <keescook@chromium.org> Signed-off-by: Tony Luck <tony.luck@intel.com>
| * pstore/ram: Read and write to the 'compressed' flag of pstoreAruna Balakrishnaiah2013-08-191-8/+28
| | | | | | | | | | | | | | | | | | | | In pstore write, add character 'C'(compressed) or 'D'(decompressed) in the header while writing to Ram persistent buffer. In pstore read, read the header and update the 'compressed' flag accordingly. Signed-off-by: Aruna Balakrishnaiah <aruna@linux.vnet.ibm.com> Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: Tony Luck <tony.luck@intel.com>
| * pstore: Add file extension to pstore file if compressedAruna Balakrishnaiah2013-08-193-6/+10
| | | | | | | | | | | | | | | | | | | | In case decompression fails, add a ".enc.z" to indicate the file has compressed data. This will help user space utilities to figure out the file contents. Signed-off-by: Aruna Balakrishnaiah <aruna@linux.vnet.ibm.com> Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: Tony Luck <tony.luck@intel.com>
| * pstore: Add decompression support to pstoreAruna Balakrishnaiah2013-08-191-2/+51
| | | | | | | | | | | | | | | | | | | | | | Based on the flag 'compressed' set or not, pstore will decompress the data returning a plain text file. If decompression fails for a particular record it will have the compressed data in the file which can be decompressed with 'openssl' command line tool. Signed-off-by: Aruna Balakrishnaiah <aruna@linux.vnet.ibm.com> Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: Tony Luck <tony.luck@intel.com>
| * pstore: Introduce new argument 'compressed' in the read callbackAruna Balakrishnaiah2013-08-192-2/+5
| | | | | | | | | | | | | | | | | | | | Backends will set the flag 'compressed' after reading the log from persistent store to indicate the data being returned to pstore is compressed or not. Signed-off-by: Aruna Balakrishnaiah <aruna@linux.vnet.ibm.com> Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: Tony Luck <tony.luck@intel.com>
| * pstore: Add compression support to pstoreAruna Balakrishnaiah2013-08-191-9/+139
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add compression support to pstore which will help in capturing more data. Initially, pstore will make a call to kmsg_dump with a bigger buffer and will pass the size of bigger buffer to kmsg_dump and then compress the data to registered buffer of registered size. In case compression fails, pstore will capture the uncompressed data by making a call again to kmsg_dump with registered_buffer of registered size. Pstore will indicate the data is compressed or not with a flag in the write callback. Signed-off-by: Aruna Balakrishnaiah <aruna@linux.vnet.ibm.com> Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: Tony Luck <tony.luck@intel.com>
| * pstore/Kconfig: Select ZLIB_DEFLATE and ZLIB_INFLATE when PSTORE is selectedAruna Balakrishnaiah2013-08-191-0/+2
| | | | | | | | | | | | | | | | | | Pstore will make use of deflate and inflate algorithm to compress and decompress the data. So when Pstore is enabled select zlib_deflate and zlib_inflate. Signed-off-by: Aruna Balakrishnaiah <aruna@linux.vnet.ibm.com> Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: Tony Luck <tony.luck@intel.com>
| * pstore: Add new argument 'compressed' in pstore write callbackAruna Balakrishnaiah2013-08-192-4/+5
| | | | | | | | | | | | | | | | | | | | | | | | Addition of new argument 'compressed' in the write call back will help the backend to know if the data passed from pstore is compressed or not (In case where compression fails.). If compressed, the backend can add a tag indicating the data is compressed while writing to persistent store. Signed-off-by: Aruna Balakrishnaiah <aruna@linux.vnet.ibm.com> Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: Tony Luck <tony.luck@intel.com>
| * pstore: d_alloc_name() doesn't return an ERR_PTRDan Carpenter2013-08-191-2/+1
| | | | | | | | | | | | | | | | | | | | d_alloc_name() returns NULL on error. Also I changed the error code from -ENOSPC to -ENOMEM to reflect that we were short on RAM not disk space. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: Kees Cook <keescook@chromium.org> Signed-off-by: Tony Luck <tony.luck@intel.com>
* | Merge branch 'for-3.12' of ↵Linus Torvalds2013-09-031-1/+1
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup Pull cgroup updates from Tejun Heo: "A lot of activities on the cgroup front. Most changes aren't visible to userland at all at this point and are laying foundation for the planned unified hierarchy. - The biggest change is decoupling the lifetime management of css (cgroup_subsys_state) from that of cgroup's. Because controllers (cpu, memory, block and so on) will need to be dynamically enabled and disabled, css which is the association point between a cgroup and a controller may come and go dynamically across the lifetime of a cgroup. Till now, css's were created when the associated cgroup was created and stayed till the cgroup got destroyed. Assumptions around this tight coupling permeated through cgroup core and controllers. These assumptions are gradually removed, which consists bulk of patches, and css destruction path is completely decoupled from cgroup destruction path. Note that decoupling of creation path is relatively easy on top of these changes and the patchset is pending for the next window. - cgroup has its own event mechanism cgroup.event_control, which is only used by memcg. It is overly complex trying to achieve high flexibility whose benefits seem dubious at best. Going forward, new events will simply generate file modified event and the existing mechanism is being made specific to memcg. This pull request contains prepatory patches for such change. - Various fixes and cleanups" Fixed up conflict in kernel/cgroup.c as per Tejun. * 'for-3.12' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: (69 commits) cgroup: fix cgroup_css() invocation in css_from_id() cgroup: make cgroup_write_event_control() use css_from_dir() instead of __d_cgrp() cgroup: make cgroup_event hold onto cgroup_subsys_state instead of cgroup cgroup: implement CFTYPE_NO_PREFIX cgroup: make cgroup_css() take cgroup_subsys * instead and allow NULL subsys cgroup: rename cgroup_css_from_dir() to css_from_dir() and update its syntax cgroup: fix cgroup_write_event_control() cgroup: fix subsystem file accesses on the root cgroup cgroup: change cgroup_from_id() to css_from_id() cgroup: use css_get() in cgroup_create() to check CSS_ROOT cpuset: remove an unncessary forward declaration cgroup: RCU protect each cgroup_subsys_state release cgroup: move subsys file removal to kill_css() cgroup: factor out kill_css() cgroup: decouple cgroup_subsys_state destruction from cgroup destruction cgroup: replace cgroup->css_kill_cnt with ->nr_css cgroup: bounce cgroup_subsys_state ref kill confirmation to a work item cgroup: move cgroup->subsys[] assignment to online_css() cgroup: reorganize css init / exit paths cgroup: add __rcu modifier to cgroup->subsys[] ...
| * | cgroup: s/cgroup_subsys_state/cgroup_css/ s/task_subsys_state/task_css/Tejun Heo2013-08-081-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The names of the two struct cgroup_subsys_state accessors - cgroup_subsys_state() and task_subsys_state() - are somewhat awkward. The former clashes with the type name and the latter doesn't even indicate it's somehow related to cgroup. We're about to revamp large portion of cgroup API, so, let's rename them so that they're less awkward. Most per-controller usages of the accessors are localized in accessor wrappers and given the amount of scheduled changes, this isn't gonna add any noticeable headache. Rename cgroup_subsys_state() to cgroup_css() and task_subsys_state() to task_css(). This patch is pure rename. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Li Zefan <lizefan@huawei.com>
* | | Merge tag 'driver-core-3.12-rc1' of ↵Linus Torvalds2013-09-039-120/+180
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core Pull driver core patches from Greg KH: "Here's the big driver core pull request for 3.12-rc1. Lots of tiny changes here fixing up the way sysfs attributes are created, to try to make drivers simpler, and fix a whole class race conditions with creations of device attributes after the device was announced to userspace. All the various pieces are acked by the different subsystem maintainers" * tag 'driver-core-3.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (119 commits) firmware loader: fix pending_fw_head list corruption drivers/base/memory.c: introduce help macro to_memory_block dynamic debug: line queries failing due to uninitialized local variable sysfs: sysfs_create_groups returns a value. debugfs: provide debugfs_create_x64() when disabled rbd: convert bus code to use bus_groups firmware: dcdbas: use binary attribute groups sysfs: add sysfs_create/remove_groups for when SYSFS is not enabled driver core: add #include <linux/sysfs.h> to core files. HID: convert bus code to use dev_groups Input: serio: convert bus code to use drv_groups Input: gameport: convert bus code to use drv_groups driver core: firmware: use __ATTR_RW() driver core: core: use DEVICE_ATTR_RO driver core: bus: use DRIVER_ATTR_WO() driver core: create write-only attribute macros for devices and drivers sysfs: create __ATTR_WO() driver-core: platform: convert bus code to use dev_groups workqueue: convert bus code to use dev_groups MEI: convert bus code to use dev_groups ...
| * | | sysfs: group.c: fix up kerneldocGreg Kroah-Hartman2013-08-221-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Fix up the wording of sysfs_create/remove_groups() a bit. Reported-by: Anthony Foiani <tkil@scrye.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
| * | | sysfs: sysfs.h: fix coding style issuesGreg Kroah-Hartman2013-08-211-8/+10
| | | | | | | | | | | | | | | | | | | | | | | | This fixes up the remaining coding style issues in sysfs.h Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
| * | | sysfs: file.c: fix up broken string warningsGreg Kroah-Hartman2013-08-211-4/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | This fixes the coding style warnings in fs/sysfs/file.c for broken strings across lines. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
| * | | sysfs: dir.c: fix up odd do/while indentationGreg Kroah-Hartman2013-08-211-7/+8
| | | | | | | | | | | | | | | | | | | | | | | | This fixes up the odd do/while after an if statement warning in dir.c Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
| * | | sysfs: fix up uaccess.h coding style warningsGreg Kroah-Hartman2013-08-212-3/+2
| | | | | | | | | | | | | | | | | | | | | | | | This fixes the uaccess.h warnings in the sysfs.c files. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
| * | | sysfs: fix up 80 column coding style issuesGreg Kroah-Hartman2013-08-214-10/+15
| | | | | | | | | | | | | | | | | | | | | | | | This fixes up the 80 column coding style issues in the sysfs .c files. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
| * | | sysfs: fix up space coding style issuesGreg Kroah-Hartman2013-08-216-36/+36
| | | | | | | | | | | | | | | | | | | | | | | | | | | | This fixes up all of the space-related coding style issues for the sysfs code. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
| * | | sysfs: remove trailing whitespaceGreg Kroah-Hartman2013-08-214-15/+13
| | | | | | | | | | | | | | | | | | | | | | | | This removes all trailing whitespace errors in the sysfs code. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
| * | | sysfs: fix placement of EXPORT_SYMBOL()Greg Kroah-Hartman2013-08-213-20/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | The export should happen after the function, not at the bottom of the file, so fix that up. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
| * | | sysfs: group: update copyright to add myself and the LFGreg Kroah-Hartman2013-08-211-0/+2
| | | | | | | | | | | | | | | | Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
| * | | sysfs: group.c: add kerneldoc for sysfs_remove_groupGreg Kroah-Hartman2013-08-211-2/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | sysfs_remove_group() never had kerneldoc, so add it, and fix up the kerneldoc for sysfs_remove_groups() which didn't specify the parameters properly. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
| * | | sysfs: group.c: fix up broken string coding styleGreg Kroah-Hartman2013-08-211-2/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | checkpatch complains about the broken string in the file, and it's correct, so fix it up. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
| * | | sysfs: group.c: fix up some * coding style issuesGreg Kroah-Hartman2013-08-211-6/+6
| | | | | | | | | | | | | | | | | | | | | | | | This fixes up the * coding style warnings for the group.c sysfs file. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
| * | | sysfs: group.c: fix trailing whitespaceGreg Kroah-Hartman2013-08-211-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | There was some trailing spaces in the file, fix that up. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
| * | | sysfs: group.c: move EXPORT_SYMBOL_GPL() to the proper locationGreg Kroah-Hartman2013-08-211-6/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This fixes up the coding style issue of incorrectly placing the EXPORT_SYMBOL_GPL() macro, it should be right after the function itself, not at the end of the file. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
| * | | sysfs: add sysfs_create/remove_groups()Greg Kroah-Hartman2013-08-211-0/+54
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | These functions are being open-coded in 3 different places in the driver core, and other driver subsystems will want to start doing this as well, so move it to the sysfs core to keep it all in one place, where we know it is written properly. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
| * | | Merge 3.11-rc3 into driver-core-nextGreg Kroah-Hartman2013-07-296-29/+76
| |\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | We want these fixes in this branch. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
| * | | | cuse: convert class code to use dev_groupsGreg Kroah-Hartman2013-07-261-5/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The dev_attrs field of struct class is going away soon, dev_groups should be used instead. This converts the cuse class code to use the correct field. Acked-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* | | | | Merge branch 'lockref' (locked reference counts)Linus Torvalds2013-09-032-27/+80
|\ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Merge lockref infrastructure code by me and Waiman Long. I already merged some of the preparatory patches that didn't actually do any semantic changes earlier, but this merges the actual _reason_ for those preparatory patches. The "lockref" structure is a combination "spinlock and reference count" that allows optimized reference count accesses. In particular, it guarantees that the reference count will be updated AS IF the spinlock was held, but using atomic accesses that cover both the reference count and the spinlock words, we can often do the update without actually having to take the lock. This allows us to avoid the nastiest cases of spinlock contention on large machines under heavy pathname lookup loads. When updating the dentry reference counts on a large system, we'll still end up with the cache line bouncing around, but that's much less noticeable than actually having to spin waiting for the lock. * lockref: lockref: implement lockless reference count updates using cmpxchg() lockref: uninline lockref helper functions vfs: reimplement d_rcu_to_refcount() using lockref_get_or_lock() vfs: use lockref_get_not_zero() for optimistic lockless dget_parent() lockref: add 'lockref_get_or_lock() helper
| * | | | | vfs: reimplement d_rcu_to_refcount() using lockref_get_or_lock()Linus Torvalds2013-09-022-27/+65
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This moves __d_rcu_to_refcount() from <linux/dcache.h> into fs/namei.c and re-implements it using the lockref infrastructure instead. It also adds a lot of comments about what is actually going on, because turning a dentry that was looked up using RCU into a long-lived reference counted entry is one of the more subtle parts of the rcu walk. We also used to be _particularly_ subtle in unlazy_walk() where we re-validate both the dentry and its parent using the same sequence count. We used to do it by nesting the locks and then verifying the sequence count just once. That was silly, because nested locking is expensive, but the sequence count check is not. So this just re-validates the dentry and the parent separately, avoiding the nested locking, and making the lockref lookup possible. Acked-by: Waiman Long <waiman.long@hp.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
| * | | | | vfs: use lockref_get_not_zero() for optimistic lockless dget_parent()Waiman Long2013-09-021-0/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | A valid parent pointer is always going to have a non-zero reference count, but if we look up the parent optimistically without locking, we have to protect against the (very unlikely) race against renaming changing the parent from under us. We do that by using lockref_get_not_zero(), and then re-checking the parent pointer after getting a valid reference. [ This is a re-implementation of a chunk from the original patch by Waiman Long: "dcache: Enable lockless update of dentry's refcount". I've completely rewritten the patch-series and split it up, but I'm attributing this part to Waiman as it's close enough to his earlier patch - Linus ] Signed-off-by: Waiman Long <Waiman.Long@hp.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | | | | Merge branch 'akpm' (patches from Andrew Morton)Linus Torvalds2013-08-281-1/+1
|\ \ \ \ \ \ | |/ / / / / |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Merge fixes from Andrew Morton: "Five fixes. err, make that six. let me try again" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: fs/ocfs2/super.c: Use bigger nodestr to accomodate 32-bit node numbers memcg: check that kmem_cache has memcg_params before accessing it drivers/base/memory.c: fix show_mem_removable() to handle missing sections IPC: bugfix for msgrcv with msgtyp < 0 Omnikey Cardman 4000: pull in ioctl.h in user header timer_list: correct the iterator for timer_list
| * | | | | fs/ocfs2/super.c: Use bigger nodestr to accomodate 32-bit node numbersGoldwyn Rodrigues2013-08-281-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | While using pacemaker/corosync, the node numbers are generated using IP address as opposed to serial node number generation. This may not fit in a 8-byte string. Use a bigger string to print the complete node number. Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | | | | vfs: make the dentry cache use the lockref infrastructureWaiman Long2013-08-282-37/+26
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This just replaces the dentry count/lock combination with the lockref structure that contains both a count and a spinlock, and does the mechanical conversion to use the lockref infrastructure. There are no semantic changes here, it's purely syntactic. The reference lockref implementation uses the spinlock exactly the same way that the old dcache code did, and the bulk of this patch is just expanding the internal "d_count" use in the dcache code to use "d_lockref.count" instead. This is purely preparation for the real change to make the reference count updates be lockless during the 3.12 merge window. [ As with the previous commit, this is a rewritten version of a concept originally from Waiman, so credit goes to him, blame for any errors goes to me. Waiman's patch had some semantic differences for taking advantage of the lockless update in dget_parent(), while this patch is intentionally a pure search-and-replace change with no semantic changes. - Linus ] Signed-off-by: Waiman Long <Waiman.Long@hp.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | | | | Revert "fs: Allow unprivileged linkat(..., AT_EMPTY_PATH) aka flink"Linus Torvalds2013-08-281-3/+7
|/ / / / / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This reverts commit bb2314b47996491bbc5add73633905c3120b6268. It wasn't necessarily wrong per se, but we're still busily discussing the exact details of this all, so I'm going to revert it for now. It's true that you can already do flink() through /proc and that flink() isn't new. But as Brad Spengler points out, some secure environments do not mount proc, and flink adds a new interface that can avoid path lookup of the source for those kinds of environments. We may re-do this (and even mark it for stable backporting back in 3.11 and possibly earlier) once the whole discussion about the interface is done. Cc: Andy Lutomirski <luto@amacapital.net> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Brad Spengler <spender@grsecurity.net> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | | | Merge tag 'jfs-3.11-rc8' of git://github.com/kleikamp/linux-shaggyLinus Torvalds2013-08-261-8/+23
|\ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull jfs fix from Dave Kleikamp: "One JFS patch to fix an incompatibility with NFSv4 resulting in the nfs client reporting a readdir loop" * tag 'jfs-3.11-rc8' of git://github.com/kleikamp/linux-shaggy: jfs: fix readdir cookie incompatibility with NFSv4
| * | | | | jfs: fix readdir cookie incompatibility with NFSv4Dave Kleikamp2013-08-151-8/+23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | NFSv4 reserves readdir cookie values 0-2 for special entries (. and ..), but jfs allows a value of 2 for a non-special entry. This incompatibility can result in the nfs client reporting a readdir loop. This patch doesn't change the value stored internally, but adds one to the value exposed to the iterate method. Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com> Tested-by: Christian Kujau <lists@nerdbynature.de>
* | | | | | Merge branch 'for-linus' of ↵Linus Torvalds2013-08-256-12/+15
|\ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs fixes from Al Viro: "Assorted fixes from the last week or so" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: VFS: collect_mounts() should return an ERR_PTR bfs: iget_locked() doesn't return an ERR_PTR efs: iget_locked() doesn't return an ERR_PTR() proc: kill the extra proc_readfd_common()->dir_emit_dots() cope with potentially long ->d_dname() output for shmem/hugetlb
| * | | | | | VFS: collect_mounts() should return an ERR_PTRDan Carpenter2013-08-241-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This should actually be returning an ERR_PTR on error instead of NULL. That was how it was designed and all the callers expect it. [AV: actually, that's what "VFS: Make clone_mnt()/copy_tree()/collect_mounts() return errors" missed - originally collect_mounts() was expected to return NULL on failure] Cc: <stable@vger.kernel.org> # 3.10+ Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * | | | | | bfs: iget_locked() doesn't return an ERR_PTRDan Carpenter2013-08-241-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | iget_locked() returns a NULL on error, it doesn't return an ERR_PTR. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * | | | | | efs: iget_locked() doesn't return an ERR_PTR()Dan Carpenter2013-08-241-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The iget_locked() function returns NULL on error and never an ERR_PTR. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * | | | | | proc: kill the extra proc_readfd_common()->dir_emit_dots()Oleg Nesterov2013-08-241-2/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | proc_readfd_common() does dir_emit_dots() twice in a row, we need to do this only once. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * | | | | | cope with potentially long ->d_dname() output for shmem/hugetlbAl Viro2013-08-242-7/+12
| | |_|_|_|/ | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | dynamic_dname() is both too much and too little for those - the output may be well in excess of 64 bytes dynamic_dname() assumes to be enough (thanks to ashmem feeding really long names to shmem_file_setup()) and vsnprintf() is an overkill for those guys. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | | | | | Merge tag 'scsi-fixes' of ↵Linus Torvalds2013-08-241-5/+15
|\ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi Pull SCSI fixes from James Bottomley: "This is a set of small bug fixes for lpfc and zfcp and a fix for a fairly nasty bug in sg where a process which cancels I/O completes in a kernel thread which would then try to write back to the now gone userspace and end up writing to a random kernel address instead" * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: [SCSI] zfcp: remove access control tables interface (keep sysfs files) [SCSI] zfcp: fix schedule-inside-lock in scsi_device list loops [SCSI] zfcp: fix lock imbalance by reworking request queue locking [SCSI] sg: Fix user memory corruption when SG_IO is interrupted by a signal [SCSI] lpfc: Don't force CONFIG_GENERIC_CSUM on
| * | | | | | [SCSI] sg: Fix user memory corruption when SG_IO is interrupted by a signalRoland Dreier2013-08-211-5/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There is a nasty bug in the SCSI SG_IO ioctl that in some circumstances leads to one process writing data into the address space of some other random unrelated process if the ioctl is interrupted by a signal. What happens is the following: - A process issues an SG_IO ioctl with direction DXFER_FROM_DEV (ie the underlying SCSI command will transfer data from the SCSI device to the buffer provided in the ioctl) - Before the command finishes, a signal is sent to the process waiting in the ioctl. This will end up waking up the sg_ioctl() code: result = wait_event_interruptible(sfp->read_wait, (srp_done(sfp, srp) || sdp->detached)); but neither srp_done() nor sdp->detached is true, so we end up just setting srp->orphan and returning to userspace: srp->orphan = 1; write_unlock_irq(&sfp->rq_list_lock); return result; /* -ERESTARTSYS because signal hit process */ At this point the original process is done with the ioctl and blithely goes ahead handling the signal, reissuing the ioctl, etc. - Eventually, the SCSI command issued by the first ioctl finishes and ends up in sg_rq_end_io(). At the end of that function, we run through: write_lock_irqsave(&sfp->rq_list_lock, iflags); if (unlikely(srp->orphan)) { if (sfp->keep_orphan) srp->sg_io_owned = 0; else done = 0; } srp->done = done; write_unlock_irqrestore(&sfp->rq_list_lock, iflags); if (likely(done)) { /* Now wake up any sg_read() that is waiting for this * packet. */ wake_up_interruptible(&sfp->read_wait); kill_fasync(&sfp->async_qp, SIGPOLL, POLL_IN); kref_put(&sfp->f_ref, sg_remove_sfp); } else { INIT_WORK(&srp->ew.work, sg_rq_end_io_usercontext); schedule_work(&srp->ew.work); } Since srp->orphan *is* set, we set done to 0 (assuming the userspace app has not set keep_orphan via an SG_SET_KEEP_ORPHAN ioctl), and therefore we end up scheduling sg_rq_end_io_usercontext() to run in a workqueue. - In workqueue context we go through sg_rq_end_io_usercontext() -> sg_finish_rem_req() -> blk_rq_unmap_user() -> ... -> bio_uncopy_user() -> __bio_copy_iov() -> copy_to_user(). The key point here is that we are doing copy_to_user() on a workqueue -- that is, we're on a kernel thread with current->mm equal to whatever random previous user process was scheduled before this kernel thread. So we end up copying whatever data the SCSI command returned to the virtual address of the buffer passed into the original ioctl, but it's quite likely we do this copying into a different address space! As suggested by James Bottomley <James.Bottomley@hansenpartnership.com>, add a check for current->mm (which is NULL if we're on a kernel thread without a real userspace address space) in bio_uncopy_user(), and skip the copy if we're on a kernel thread. There's no reason that I can think of for any caller of bio_uncopy_user() to want to do copying on a kernel thread with a random active userspace address space. Huge thanks to Costa Sapuntzakis <costa@purestorage.com> for the original pointer to this bug in the sg code. Signed-off-by: Roland Dreier <roland@purestorage.com> Tested-by: David Milburn <dmilburn@redhat.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: <stable@vger.kernel.org> Signed-off-by: James Bottomley <JBottomley@Parallels.com>
* | | | | | | nilfs2: fix issue with counting number of bio requests for BIO_EOPNOTSUPP ↵Vyacheslav Dubeyko2013-08-231-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | error detection Fix the issue with improper counting number of flying bio requests for BIO_EOPNOTSUPP error detection case. The sb_nbio must be incremented exactly the same number of times as complete() function was called (or will be called) because nilfs_segbuf_wait() will call wail_for_completion() for the number of times set to sb_nbio: do { wait_for_completion(&segbuf->sb_bio_event); } while (--segbuf->sb_nbio > 0); Two functions complete() and wait_for_completion() must be called the same number of times for the same sb_bio_event. Otherwise, wait_for_completion() will hang or leak. Signed-off-by: Vyacheslav Dubeyko <slava@dubeyko.com> Cc: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Tested-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | | | | | nilfs2: remove double bio_put() in nilfs_end_bio_write() for BIO_EOPNOTSUPP ↵Vyacheslav Dubeyko2013-08-231-2/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | error Remove double call of bio_put() in nilfs_end_bio_write() for the case of BIO_EOPNOTSUPP error detection. The issue was found by Dan Carpenter and he suggests first version of the fix too. Signed-off-by: Vyacheslav Dubeyko <slava@dubeyko.com> Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Tested-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>