aboutsummaryrefslogtreecommitdiffstats
path: root/fs/notify
Commit message (Collapse)AuthorAgeFilesLines
* fsnotify: update comments concerning locking schemeLino Sanfilippo2013-07-091-28/+22
| | | | | | | | | | | | There have been changes in the locking scheme of fsnotify but the comments in the source code have not been updated yet. This patch corrects this. Signed-off-by: Lino Sanfilippo <LinoSanfilippo@gmx.de> Cc: Eric Paris <eparis@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* inotify: fix race when adding a new watchLino Sanfilippo2013-07-091-9/+4
| | | | | | | | | | | | | | | | | | | | In inotify_new_watch() the number of watches for a group is compared against the max number of allowed watches and increased afterwards. The check and incrementation is not done atomically, so it is possible for multiple concurrent threads to pass the check and increment the number of marks above the allowed max. This patch uses an inotify groups mark_lock to ensure that both check and incrementation are done atomic. Furthermore we dont have to worry about the race that allows a concurrent thread to add a watch just after inotify_update_existing_watch() returned with -ENOENT anymore, since this is also synchronized by the groups mark mutex now. Signed-off-by: Lino Sanfilippo <LinoSanfilippo@gmx.de> Cc: Eric Paris <eparis@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* dnotify: replace dnotify_mark_mutex with mark mutex of dnotify_groupLino Sanfilippo2013-07-091-12/+13
| | | | | | | | | | | | There is no need to use a special mutex to protect against the fcntl/close race (see dnotify.c for a description of this race). Instead the dnotify_groups mark mutex can be used. Signed-off-by: Lino Sanfilippo <LinoSanfilippo@gmx.de> Cc: Eric Paris <eparis@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* fanotify: put duplicate code for adding vfsmount/inode marks into an own ↵Lino Sanfilippo2013-07-091-36/+35
| | | | | | | | | | | | | | function The code under the groups mark_mutex in fanotify_add_inode_mark() and fanotify_add_vfsmount_mark() is almost identical. So put it into a seperate function. Signed-off-by: Lino Sanfilippo <LinoSanfilippo@gmx.de> Cc: Eric Paris <eparis@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* fanotify: fix races when adding/removing marksLino Sanfilippo2013-07-091-12/+37
| | | | | | | | | | | | | | | | | | | | | | | | | For both adding an event to an existing mark and destroying a mark we first have to find it via fsnotify_find_[inode|vfsmount]_mark(). But getting the mark and adding an event (or destroying it) is not done atomically. This opens a race where a thread is about to destroy a mark while another thread still finds the same mark and adds an event to its mask although it will be destroyed. Another race exists concerning the excess of a groups number of marks limit: When a mark is added the number of group marks is checked against the max number of marks per group and increased afterwards. Since check and increment is also not done atomically, this may result in 2 or more processes passing the check at the same time and increasing the number of group marks above the allowed limit. With this patch both races are avoided by doing the concerning operations with the groups mark mutex locked. Signed-off-by: Lino Sanfilippo <LinoSanfilippo@gmx.de> Cc: Eric Paris <eparis@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* fanotify: info leak in copy_event_to_user()Dan Carpenter2013-07-091-0/+1
| | | | | | | | | | | The ->reserved field isn't cleared so we leak one byte of stack information to userspace. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Cc: Eric Paris <eparis@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* fanotify: quit wanking with FASYNC in ->release()Al Viro2013-06-291-3/+0
| | | | | | | | | ... especially since there's no way to get that sucker on the list fsnotify_fasync() works with - the only thing adding to it is fsnotify_fasync() itself and it's never called for fanotify files while they are opened. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* unify compat fanotify_mark(2), switch to COMPAT_SYSCALL_DEFINEAl Viro2013-05-091-0/+17
| | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* Merge branch 'for-linus' of ↵Linus Torvalds2013-05-011-3/+0
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull VFS updates from Al Viro, Misc cleanups all over the place, mainly wrt /proc interfaces (switch create_proc_entry to proc_create(), get rid of the deprecated create_proc_read_entry() in favor of using proc_create_data() and seq_file etc). 7kloc removed. * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (204 commits) don't bother with deferred freeing of fdtables proc: Move non-public stuff from linux/proc_fs.h to fs/proc/internal.h proc: Make the PROC_I() and PDE() macros internal to procfs proc: Supply a function to remove a proc entry by PDE take cgroup_open() and cpuset_open() to fs/proc/base.c ppc: Clean up scanlog ppc: Clean up rtas_flash driver somewhat hostap: proc: Use remove_proc_subtree() drm: proc: Use remove_proc_subtree() drm: proc: Use minor->index to label things, not PDE->name drm: Constify drm_proc_list[] zoran: Don't print proc_dir_entry data in debug reiserfs: Don't access the proc_dir_entry in r_open(), r_start() r_show() proc: Supply an accessor for getting the data from a PDE's parent airo: Use remove_proc_subtree() rtl8192u: Don't need to save device proc dir PDE rtl8187se: Use a dir under /proc/net/r8180/ proc: Add proc_mkdir_data() proc: Move some bits from linux/proc_fs.h to linux/{of.h,signal.h,tty.h} proc: Move PDE_NET() to fs/proc/proc_net.c ...
| * fanotify: don't wank with FASYNC on ->release()Al Viro2013-04-291-3/+0
| | | | | | | | | | | | ... it's done already by __fput() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | Merge branch 'for-linus' of ↵Linus Torvalds2013-05-011-14/+3
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal Pull compat cleanup from Al Viro: "Mostly about syscall wrappers this time; there will be another pile with patches in the same general area from various people, but I'd rather push those after both that and vfs.git pile are in." * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal: syscalls.h: slightly reduce the jungles of macros get rid of union semop in sys_semctl(2) arguments make do_mremap() static sparc: no need to sign-extend in sync_file_range() wrapper ppc compat wrappers for add_key(2) and request_key(2) are pointless x86: trim sys_ia32.h x86: sys32_kill and sys32_mprotect are pointless get rid of compat_sys_semctl() and friends in case of ARCH_WANT_OLD_COMPAT_IPC merge compat sys_ipc instances consolidate compat lookup_dcookie() convert vmsplice to COMPAT_SYSCALL_DEFINE switch getrusage() to COMPAT_SYSCALL_DEFINE switch epoll_pwait to COMPAT_SYSCALL_DEFINE convert sendfile{,64} to COMPAT_SYSCALL_DEFINE switch signalfd{,4}() to COMPAT_SYSCALL_DEFINE make SYSCALL_DEFINE<n>-generated wrappers do asmlinkage_protect make HAVE_SYSCALL_WRAPPERS unconditional consolidate cond_syscall and SYSCALL_ALIAS declarations teach SYSCALL_DEFINE<n> how to deal with long long/unsigned long long get rid of duplicate logics in __SC_....[1-6] definitions
| * | teach SYSCALL_DEFINE<n> how to deal with long long/unsigned long longAl Viro2013-03-031-14/+3
| |/ | | | | | | | | | | | | ... and convert a bunch of SYSCALL_DEFINE ones to SYSCALL_DEFINE<n>, killing the boilerplate crap around them. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | inotify: invalid mask should return a error number but not set itZhao Hongjiang2013-04-301-2/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When we run the crackerjack testsuite, the inotify_add_watch test is stalled. This is caused by the invalid mask 0 - the task is waiting for the event but it never comes. inotify_add_watch() should return -EINVAL as it did before commit 676a0675cf92 ("inotify: remove broken mask checks causing unmount to be EINVAL"). That commit removes the invalid mask check, but that check is needed. Check the mask's ALL_INOTIFY_BITS before the inotify_arg_to_mask() call. If none are set, just return -EINVAL. Because IN_UNMOUNT is in ALL_INOTIFY_BITS, this change will not trigger the problem that above commit fixed. [akpm@linux-foundation.org: fix build] Signed-off-by: Zhao Hongjiang <zhaohongjiang@huawei.com> Acked-by: Jim Somerville <Jim.Somerville@windriver.com> Cc: Paul Gortmaker <paul.gortmaker@windriver.com> Cc: Jerome Marchand <jmarchan@redhat.com> Cc: Eric Paris <eparis@parisplace.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | inotify: convert inotify_add_to_idr() to use idr_alloc_cyclic()Jeff Layton2013-04-291-6/+2
|/ | | | | | | | | | Signed-off-by: Jeff Layton <jlayton@redhat.com> Cc: John McCutchan <john@johnmccutchan.com> Cc: Robert Love <rlove@rlove.org> Cc: Eric Paris <eparis@parisplace.org> Cc: Tejun Heo <tj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* hlist: drop the node parameter from iteratorsSasha Levin2013-02-273-24/+17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | I'm not sure why, but the hlist for each entry iterators were conceived list_for_each_entry(pos, head, member) The hlist ones were greedy and wanted an extra parameter: hlist_for_each_entry(tpos, pos, head, member) Why did they need an extra pos parameter? I'm not quite sure. Not only they don't really need it, it also prevents the iterator from looking exactly like the list iterator, which is unfortunate. Besides the semantic patch, there was some manual work required: - Fix up the actual hlist iterators in linux/list.h - Fix up the declaration of other iterators based on the hlist ones. - A very small amount of places were using the 'node' parameter, this was modified to use 'obj->member' instead. - Coccinelle didn't handle the hlist_for_each_entry_safe iterator properly, so those had to be fixed up manually. The semantic patch which is mostly the work of Peter Senna Tschudin is here: @@ iterator name hlist_for_each_entry, hlist_for_each_entry_continue, hlist_for_each_entry_from, hlist_for_each_entry_rcu, hlist_for_each_entry_rcu_bh, hlist_for_each_entry_continue_rcu_bh, for_each_busy_worker, ax25_uid_for_each, ax25_for_each, inet_bind_bucket_for_each, sctp_for_each_hentry, sk_for_each, sk_for_each_rcu, sk_for_each_from, sk_for_each_safe, sk_for_each_bound, hlist_for_each_entry_safe, hlist_for_each_entry_continue_rcu, nr_neigh_for_each, nr_neigh_for_each_safe, nr_node_for_each, nr_node_for_each_safe, for_each_gfn_indirect_valid_sp, for_each_gfn_sp, for_each_host; type T; expression a,c,d,e; identifier b; statement S; @@ -T b; <+... when != b ( hlist_for_each_entry(a, - b, c, d) S | hlist_for_each_entry_continue(a, - b, c) S | hlist_for_each_entry_from(a, - b, c) S | hlist_for_each_entry_rcu(a, - b, c, d) S | hlist_for_each_entry_rcu_bh(a, - b, c, d) S | hlist_for_each_entry_continue_rcu_bh(a, - b, c) S | for_each_busy_worker(a, c, - b, d) S | ax25_uid_for_each(a, - b, c) S | ax25_for_each(a, - b, c) S | inet_bind_bucket_for_each(a, - b, c) S | sctp_for_each_hentry(a, - b, c) S | sk_for_each(a, - b, c) S | sk_for_each_rcu(a, - b, c) S | sk_for_each_from -(a, b) +(a) S + sk_for_each_from(a) S | sk_for_each_safe(a, - b, c, d) S | sk_for_each_bound(a, - b, c) S | hlist_for_each_entry_safe(a, - b, c, d, e) S | hlist_for_each_entry_continue_rcu(a, - b, c) S | nr_neigh_for_each(a, - b, c) S | nr_neigh_for_each_safe(a, - b, c, d) S | nr_node_for_each(a, - b, c) S | nr_node_for_each_safe(a, - b, c, d) S | - for_each_gfn_sp(a, c, d, b) S + for_each_gfn_sp(a, c, d) S | - for_each_gfn_indirect_valid_sp(a, c, d, b) S + for_each_gfn_indirect_valid_sp(a, c, d) S | for_each_host(a, - b, c) S | for_each_host_safe(a, - b, c, d) S | for_each_mesh_entry(a, - b, c, d) S ) ...+> [akpm@linux-foundation.org: drop bogus change from net/ipv4/raw.c] [akpm@linux-foundation.org: drop bogus hunk from net/ipv6/raw.c] [akpm@linux-foundation.org: checkpatch fixes] [akpm@linux-foundation.org: fix warnings] [akpm@linux-foudnation.org: redo intrusive kvm changes] Tested-by: Peter Senna Tschudin <peter.senna@gmail.com> Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: Sasha Levin <sasha.levin@oracle.com> Cc: Wu Fengguang <fengguang.wu@intel.com> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: Gleb Natapov <gleb@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* inotify: convert to idr_alloc()Tejun Heo2013-02-271-13/+11
| | | | | | | | | | | | | | | | | | Convert to the much saner new idr interface. Note that the adhoc cyclic id allocation is buggy. If wraparound happens, the previous code with idr_get_new_above() may segfault and the converted code will trigger WARN and return -EINVAL. Even if it's fixed to wrap to zero, the code will be prone to unnecessary -ENOSPC failures after the first wraparound. We probably need to implement proper cyclic support in idr. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: John McCutchan <john@johnmccutchan.com> Cc: Robert Love <rlove@rlove.org> Cc: Eric Paris <eparis@parisplace.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* inotify: don't use idr_remove_all()Tejun Heo2013-02-271-1/+0
| | | | | | | | | | | | idr_destroy() can destroy idr by itself and idr_remove_all() is being deprecated. Drop its usage. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: John McCutchan <john@johnmccutchan.com> Cc: Robert Love <rlove@rlove.org> Cc: Eric Paris <eparis@parisplace.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* Merge branch 'for-linus' of ↵Linus Torvalds2013-02-262-3/+3
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs pile (part one) from Al Viro: "Assorted stuff - cleaning namei.c up a bit, fixing ->d_name/->d_parent locking violations, etc. The most visible changes here are death of FS_REVAL_DOT (replaced with "has ->d_weak_revalidate()") and a new helper getting from struct file to inode. Some bits of preparation to xattr method interface changes. Misc patches by various people sent this cycle *and* ocfs2 fixes from several cycles ago that should've been upstream right then. PS: the next vfs pile will be xattr stuff." * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (46 commits) saner proc_get_inode() calling conventions proc: avoid extra pde_put() in proc_fill_super() fs: change return values from -EACCES to -EPERM fs/exec.c: make bprm_mm_init() static ocfs2/dlm: use GFP_ATOMIC inside a spin_lock ocfs2: fix possible use-after-free with AIO ocfs2: Fix oops in ocfs2_fast_symlink_readpage() code path get_empty_filp()/alloc_file() leave both ->f_pos and ->f_version zero target: writev() on single-element vector is pointless export kernel_write(), convert open-coded instances fs: encode_fh: return FILEID_INVALID if invalid fid_type kill f_vfsmnt vfs: kill FS_REVAL_DOT by adding a d_weak_revalidate dentry op nfsd: handle vfs_getattr errors in acl protocol switch vfs_getattr() to struct path default SET_PERSONALITY() in linux/elf.h ceph: prepopulate inodes only when request is aborted d_hash_and_lookup(): export, switch open-coded instances 9p: switch v9fs_set_create_acl() to inode+fid, do it before d_instantiate() 9p: split dropping the acls from v9fs_set_create_acl() ...
| * new helper: file_inode(file)Al Viro2013-02-222-3/+3
| | | | | | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | inotify: remove broken mask checks causing unmount to be EINVALJim Somerville2013-02-211-4/+0
|/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Running the command: inotifywait -e unmount /mnt/disk immediately aborts with a -EINVAL return code. This is however a valid parameter. This abort occurs only if unmount is the sole event parameter. If other event parameters are supplied, then the unmount event wait will work. The problem was introduced by commit 44b350fc23e ("inotify: Fix mask checks"). In that commit, it states: The mask checks in inotify_update_existing_watch() and inotify_new_watch() are useless because inotify_arg_to_mask() sets FS_IN_IGNORED and FS_EVENT_ON_CHILD bits anyway. But instead of removing the useless checks, it did this: mask = inotify_arg_to_mask(arg); - if (unlikely(!mask)) + if (unlikely(!(mask & IN_ALL_EVENTS))) return -EINVAL; The problem is that IN_ALL_EVENTS doesn't include IN_UNMOUNT, and other parts of the code keep IN_UNMOUNT separate from IN_ALL_EVENTS. So the check should be: if (unlikely(!(mask & (IN_ALL_EVENTS | IN_UNMOUNT)))) But inotify_arg_to_mask(arg) always sets the IN_UNMOUNT bit in the mask anyway, so the check is always going to pass and thus should simply be removed. Also note that inotify_arg_to_mask completely controls what mask bits get set from arg, there's no way for invalid bits to get enabled there. Lets fix it by simply removing the useless broken checks. Signed-off-by: Jim Somerville <Jim.Somerville@windriver.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Cc: Jerome Marchand <jmarchan@redhat.com> Cc: John McCutchan <john@johnmccutchan.com> Cc: Robert Love <rlove@rlove.org> Cc: Eric Paris <eparis@parisplace.org> Cc: <stable@vger.kernel.org> [2.6.37+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* Merge branch 'for-next' of git://git.infradead.org/users/eparis/notifyLinus Torvalds2012-12-2011-104/+152
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull filesystem notification updates from Eric Paris: "This pull mostly is about locking changes in the fsnotify system. By switching the group lock from a spin_lock() to a mutex() we can now hold the lock across things like iput(). This fixes a problem involving unmounting a fs and having inodes be busy, first pointed out by FAT, but reproducible with tmpfs. This also restores signal driven I/O for inotify, which has been broken since about 2.6.32." Ugh. I *hate* the timing of this. It was rebased after the merge window opened, and then left to sit with the pull request coming the day before the merge window closes. That's just crap. But apparently the patches themselves have been around for over a year, just gathering dust, so now it's suddenly critical. Fixed up semantic conflict in fs/notify/fdinfo.c as per Stephen Rothwell's fixes from -next. * 'for-next' of git://git.infradead.org/users/eparis/notify: inotify: automatically restart syscalls inotify: dont skip removal of watch descriptor if creation of ignored event failed fanotify: dont merge permission events fsnotify: make fasync generic for both inotify and fanotify fsnotify: change locking order fsnotify: dont put marks on temporary list when clearing marks by group fsnotify: introduce locked versions of fsnotify_add_mark() and fsnotify_remove_mark() fsnotify: pass group to fsnotify_destroy_mark() fsnotify: use a mutex instead of a spinlock to protect a groups mark list fanotify: add an extra flag to mark_remove_from_mask that indicates wheather a mark should be destroyed fsnotify: take groups mark_lock before mark lock fsnotify: use reference counting for groups fsnotify: introduce fsnotify_get_group() inotify, fanotify: replace fsnotify_put_group() with fsnotify_destroy_group()
| * inotify: automatically restart syscallsEric Paris2012-12-111-1/+1
| | | | | | | | | | | | | | | | | | We were mistakenly returning EINTR when we found an outstanding signal. Instead we should returen ERESTARTSYS and allow the kernel to handle things the right way. Patch-from: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Eric Paris <eparis@redhat.com>
| * inotify: dont skip removal of watch descriptor if creation of ignored event ↵Lino Sanfilippo2012-12-111-5/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | failed In inotify_ignored_and_remove_idr() the removal of a watch descriptor is skipped if the allocation of an ignored event failed and we are leaking memory (the watch descriptor and the mark linked to it). This patch ensures that the watch descriptor is removed regardless of whether event creation failed or not. Signed-off-by: Lino Sanfilippo <LinoSanfilippo@gmx.de> Signed-off-by: Eric Paris <eparis@redhat.com>
| * fanotify: dont merge permission eventsLino Sanfilippo2012-12-111-0/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Boyd Yang reported a problem for the case that multiple threads of the same thread group are waiting for a reponse for a permission event. In this case it is possible that some of the threads are never woken up, even if the response for the event has been received (see http://marc.info/?l=linux-kernel&m=131822913806350&w=2). The reason is that we are currently merging permission events if they belong to the same thread group. But we are not prepared to wake up more than one waiter for each event. We do wait_event(group->fanotify_data.access_waitq, event->response || atomic_read(&group->fanotify_data.bypass_perm)); and after that event->response = 0; which is the reason that even if we woke up all waiters for the same event some of them may see event->response being already set 0 again, then go back to sleep and block forever. With this patch we avoid that more than one thread is waiting for a response by not merging permission events for the same thread group any more. Reported-by: Boyd Yang <boyd.yang@gmail.com> Signed-off-by: Lino Sanfilippo <LinoSanfilipp@gmx.de> Signed-off-by: Eric Paris <eparis@redhat.com>
| * fsnotify: make fasync generic for both inotify and fanotifyEric Paris2012-12-114-9/+16
| | | | | | | | | | | | | | | | inotify is supposed to support async signal notification when information is available on the inotify fd. This patch moves that support to generic fsnotify functions so it can be used by all notification mechanisms. Signed-off-by: Eric Paris <eparis@redhat.com>
| * fsnotify: change locking orderLino Sanfilippo2012-12-111-10/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | On Mon, Aug 01, 2011 at 04:38:22PM -0400, Eric Paris wrote: > > I finally built and tested a v3.0 kernel with these patches (I know I'm > SOOOOOO far behind). Not what I hoped for: > > > [ 150.937798] VFS: Busy inodes after unmount of tmpfs. Self-destruct in 5 seconds. Have a nice day... > > [ 150.945290] BUG: unable to handle kernel NULL pointer dereference at 0000000000000070 > > [ 150.946012] IP: [<ffffffff810ffd58>] shmem_free_inode+0x18/0x50 > > [ 150.946012] PGD 2bf9e067 PUD 2bf9f067 PMD 0 > > [ 150.946012] Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC > > [ 150.946012] CPU 0 > > [ 150.946012] Modules linked in: nfs lockd fscache auth_rpcgss nfs_acl sunrpc ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6_tables ext4 jbd2 crc16 joydev ata_piix i2c_piix4 pcspkr uinput ipv6 autofs4 usbhid [last unloaded: scsi_wait_scan] > > [ 150.946012] > > [ 150.946012] Pid: 2764, comm: syscall_thrash Not tainted 3.0.0+ #1 Red Hat KVM > > [ 150.946012] RIP: 0010:[<ffffffff810ffd58>] [<ffffffff810ffd58>] shmem_free_inode+0x18/0x50 > > [ 150.946012] RSP: 0018:ffff88002c2e5df8 EFLAGS: 00010282 > > [ 150.946012] RAX: 000000004e370d9f RBX: 0000000000000000 RCX: ffff88003a029438 > > [ 150.946012] RDX: 0000000033630a5f RSI: 0000000000000000 RDI: ffff88003491c240 > > [ 150.946012] RBP: ffff88002c2e5e08 R08: 0000000000000000 R09: 0000000000000000 > > [ 150.946012] R10: 0000000000000000 R11: 0000000000000000 R12: ffff88003a029428 > > [ 150.946012] R13: ffff88003a029428 R14: ffff88003a029428 R15: ffff88003499a610 > > [ 150.946012] FS: 00007f5a05420700(0000) GS:ffff88003f600000(0000) knlGS:0000000000000000 > > [ 150.946012] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b > > [ 150.946012] CR2: 0000000000000070 CR3: 000000002a662000 CR4: 00000000000006f0 > > [ 150.946012] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 > > [ 150.946012] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 > > [ 150.946012] Process syscall_thrash (pid: 2764, threadinfo ffff88002c2e4000, task ffff88002bfbc760) > > [ 150.946012] Stack: > > [ 150.946012] ffff88003a029438 ffff88003a029428 ffff88002c2e5e38 ffffffff81102f76 > > [ 150.946012] ffff88003a029438 ffff88003a029598 ffffffff8160f9c0 ffff88002c221250 > > [ 150.946012] ffff88002c2e5e68 ffffffff8115e9be ffff88002c2e5e68 ffff88003a029438 > > [ 150.946012] Call Trace: > > [ 150.946012] [<ffffffff81102f76>] shmem_evict_inode+0x76/0x130 > > [ 150.946012] [<ffffffff8115e9be>] evict+0x7e/0x170 > > [ 150.946012] [<ffffffff8115ee40>] iput_final+0xd0/0x190 > > [ 150.946012] [<ffffffff8115ef33>] iput+0x33/0x40 > > [ 150.946012] [<ffffffff81180205>] fsnotify_destroy_mark_locked+0x145/0x160 > > [ 150.946012] [<ffffffff81180316>] fsnotify_destroy_mark+0x36/0x50 > > [ 150.946012] [<ffffffff81181937>] sys_inotify_rm_watch+0x77/0xd0 > > [ 150.946012] [<ffffffff815aca52>] system_call_fastpath+0x16/0x1b > > [ 150.946012] Code: 67 4a 00 b8 e4 ff ff ff eb aa 66 0f 1f 84 00 00 00 00 00 55 48 89 e5 48 83 ec 10 48 89 1c 24 4c 89 64 24 08 48 8b 9f 40 05 00 00 > > [ 150.946012] 83 7b 70 00 74 1c 4c 8d a3 80 00 00 00 4c 89 e7 e8 d2 5d 4a > > [ 150.946012] RIP [<ffffffff810ffd58>] shmem_free_inode+0x18/0x50 > > [ 150.946012] RSP <ffff88002c2e5df8> > > [ 150.946012] CR2: 0000000000000070 > > Looks at aweful lot like the problem from: > http://www.spinics.net/lists/linux-fsdevel/msg46101.html > I tried to reproduce this bug with your test program, but without success. However, if I understand correctly, this occurs since we dont hold any locks when we call iput() in mark_destroy(), right? With the patches you tested, iput() is also not called within any lock, since the groups mark_mutex is released temporarily before iput() is called. This is, since the original codes behaviour is similar. However since we now have a mutex as the biggest lock, we can do what you suggested (http://www.spinics.net/lists/linux-fsdevel/msg46107.html) and call iput() with the mutex held to avoid the race. The patch below implements this. It uses nested locking to avoid deadlock in case we do the final iput() on an inode which still holds marks and thus would take the mutex again when calling fsnotify_inode_delete() in destroy_inode(). Signed-off-by: Lino Sanfilippo <LinoSanfilippo@gmx.de> Signed-off-by: Eric Paris <eparis@redhat.com>
| * fsnotify: dont put marks on temporary list when clearing marks by groupLino Sanfilippo2012-12-111-8/+2
| | | | | | | | | | | | | | | | | | | | | | In clear_marks_by_group_flags() the mark list of a group is iterated and the marks are put on a temporary list. Since we introduced fsnotify_destroy_mark_locked() we dont need the temp list any more and are able to remove the marks while the mark list is iterated and the mark list mutex is held. Signed-off-by: Lino Sanfilippo <LinoSanfilippo@gmx.de> Signed-off-by: Eric Paris <eparis@redhat.com>
| * fsnotify: introduce locked versions of fsnotify_add_mark() and ↵Lino Sanfilippo2012-12-111-12/+30
| | | | | | | | | | | | | | | | | | | | | | fsnotify_remove_mark() This patch introduces fsnotify_add_mark_locked() and fsnotify_remove_mark_locked() which are essentially the same as fsnotify_add_mark() and fsnotify_remove_mark() but assume that the caller has already taken the groups mark mutex. Signed-off-by: Lino Sanfilippo <LinoSanfilippo@gmx.de> Signed-off-by: Eric Paris <eparis@redhat.com>
| * fsnotify: pass group to fsnotify_destroy_mark()Lino Sanfilippo2012-12-117-25/+28
| | | | | | | | | | | | | | | | In fsnotify_destroy_mark() dont get the group from the passed mark anymore, but pass the group itself as an additional parameter to the function. Signed-off-by: Lino Sanfilippo <LinoSanfilippo@gmx.de> Signed-off-by: Eric Paris <eparis@redhat.com>
| * fsnotify: use a mutex instead of a spinlock to protect a groups mark listLino Sanfilippo2012-12-114-14/+14
| | | | | | | | | | | | | | | | | | Replaces the groups mark_lock spinlock with a mutex. Using a mutex instead of a spinlock results in more flexibility (i.e it allows to sleep while the lock is held). Signed-off-by: Lino Sanfilippo <LinoSanfilippo@gmx.de> Signed-off-by: Eric Paris <eparis@redhat.com>
| * fanotify: add an extra flag to mark_remove_from_mask that indicates wheather ↵Lino Sanfilippo2012-12-111-5/+14
| | | | | | | | | | | | | | | | | | | | | | | | a mark should be destroyed This patch adds an extra flag to mark_remove_from_mask() to inform the caller if the mark should be destroyed. With this we dont destroy the mark implicitly in the function itself any more but let the caller handle it. Signed-off-by: Lino Sanfilippo <LinoSanfilippo@gmx.de> Signed-off-by: Eric Paris <eparis@redhat.com>
| * fsnotify: take groups mark_lock before mark lockLino Sanfilippo2012-12-111-10/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Race-free addition and removal of a mark to a groups mark list would be easier if we could lock the mark list of group before we lock the specific mark. This patch changes the order used to add/remove marks to/from mark lists from 1. mark->lock 2. group->mark_lock 3. inode->i_lock to 1. group->mark_lock 2. mark->lock 3. inode->i_lock Signed-off-by: Lino Sanfilippo <LinoSanfilippo@gmx.de> Signed-off-by: Eric Paris <eparis@redhat.com>
| * fsnotify: use reference counting for groupsLino Sanfilippo2012-12-114-28/+27
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Get a group ref for each mark that is added to the groups list and release that ref when the mark is freed in fsnotify_put_mark(). We also use get a group reference for duplicated marks and for private event data. Now we dont free a group any more when the number of marks becomes 0 but when the groups ref count does. Since this will only happen when all marks are removed from a groups mark list, we dont have to set the groups number of marks to 1 at group creation. Beside clearing all marks in fsnotify_destroy_group() we do also flush the groups event queue. This is since events may hold references to groups (due to private event data) and we have to put those references first before we get a chance to put the final ref, which will result in a call to fsnotify_final_destroy_group(). Signed-off-by: Lino Sanfilippo <LinoSanfilippo@gmx.de> Signed-off-by: Eric Paris <eparis@redhat.com>
| * fsnotify: introduce fsnotify_get_group()Lino Sanfilippo2012-12-111-0/+8
| | | | | | | | | | | | | | Introduce fsnotify_get_group() which increments the reference counter of a group. Signed-off-by: Lino Sanfilippo <LinoSanfilippo@gmx.de> Signed-off-by: Eric Paris <eparis@redhat.com>
| * inotify, fanotify: replace fsnotify_put_group() with fsnotify_destroy_group()Lino Sanfilippo2012-12-113-13/+11
| | | | | | | | | | | | | | | | | | | | | | Currently in fsnotify_put_group() the ref count of a group is decremented and if it becomes 0 fsnotify_destroy_group() is called. Since a groups ref count is only at group creation set to 1 and never increased after that a call to fsnotify_put_group() always results in a call to fsnotify_destroy_group(). With this patch fsnotify_destroy_group() is called directly. Signed-off-by: Lino Sanfilippo <LinoSanfilippo@gmx.de> Signed-off-by: Eric Paris <eparis@redhat.com>
* | fs, fanotify: add @mflags field to fanotify outputCyrill Gorcunov2012-12-171-5/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The kernel keeps FAN_MARK_IGNORED_SURV_MODIFY bit separately from fsnotify_mark::mask|ignored_mask thus put it in @mflags (mark flags) field so the user-space reader will be able to detect if such bit were used on mark creation procedure. | pos: 0 | flags: 04002 | fanotify flags:10 event-flags:0 | fanotify mnt_id:12 mflags:40 mask:38 ignored_mask:40000003 | fanotify ino:4f969 sdev:800013 mflags:0 mask:3b ignored_mask:40000000 fhandle-bytes:8 fhandle-type:1 f_handle:69f90400c275b5b4 Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Cc: Pavel Emelyanov <xemul@parallels.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Andrey Vagin <avagin@openvz.org> Cc: Al Viro <viro@ZenIV.linux.org.uk> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: James Bottomley <jbottomley@parallels.com> Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> Cc: Matthew Helsley <matt.helsley@gmail.com> Cc: "J. Bruce Fields" <bfields@fieldses.org> Cc: Tvrtko Ursulin <tvrtko.ursulin@onelan.co.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | fs, notify: add procfs fdinfo helperCyrill Gorcunov2012-12-175-1/+207
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This allow us to print out fsnotify details such as watchee inode, device, mask and optionally a file handle. For inotify objects if kernel compiled with exportfs support the output will be | pos: 0 | flags: 02000000 | inotify wd:3 ino:9e7e sdev:800013 mask:800afce ignored_mask:0 fhandle-bytes:8 fhandle-type:1 f_handle:7e9e0000640d1b6d | inotify wd:2 ino:a111 sdev:800013 mask:800afce ignored_mask:0 fhandle-bytes:8 fhandle-type:1 f_handle:11a1000020542153 | inotify wd:1 ino:6b149 sdev:800013 mask:800afce ignored_mask:0 fhandle-bytes:8 fhandle-type:1 f_handle:49b1060023552153 If kernel compiled without exportfs support, the file handle won't be provided but inode and device only. | pos: 0 | flags: 02000000 | inotify wd:3 ino:9e7e sdev:800013 mask:800afce ignored_mask:0 | inotify wd:2 ino:a111 sdev:800013 mask:800afce ignored_mask:0 | inotify wd:1 ino:6b149 sdev:800013 mask:800afce ignored_mask:0 For fanotify the output is like | pos: 0 | flags: 04002 | fanotify flags:10 event-flags:0 | fanotify mnt_id:12 mask:3b ignored_mask:0 | fanotify ino:50205 sdev:800013 mask:3b ignored_mask:40000000 fhandle-bytes:8 fhandle-type:1 f_handle:05020500fb1d47e7 To minimize impact on general fsnotify code the new functionality is gathered in fs/notify/fdinfo.c file. Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Acked-by: Pavel Emelyanov <xemul@parallels.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Andrey Vagin <avagin@openvz.org> Cc: Al Viro <viro@ZenIV.linux.org.uk> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: James Bottomley <jbottomley@parallels.com> Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Matthew Helsley <matt.helsley@gmail.com> Cc: "J. Bruce Fields" <bfields@fieldses.org> Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@onelan.co.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | fs/notify/inode_mark.c: make fsnotify_find_inode_mark_locked() staticTushar Behera2012-12-171-2/+3
| | | | | | | | | | | | | | | | | | | | | | Fixes following sparse warning: fs/notify/inode_mark.c:127:22: warning: symbol 'fsnotify_find_inode_mark_locked' was not declared. Should it be static? Signed-off-by: Tushar Behera <tushar.behera@linaro.org> Cc: Eric Paris <eparis@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | Merge branch 'for-linus' of ↵Linus Torvalds2012-12-132-2/+2
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial Pull trivial branch from Jiri Kosina: "Usual stuff -- comment/printk typo fixes, documentation updates, dead code elimination." * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (39 commits) HOWTO: fix double words typo x86 mtrr: fix comment typo in mtrr_bp_init propagate name change to comments in kernel source doc: Update the name of profiling based on sysfs treewide: Fix typos in various drivers treewide: Fix typos in various Kconfig wireless: mwifiex: Fix typo in wireless/mwifiex driver messages: i2o: Fix typo in messages/i2o scripts/kernel-doc: check that non-void fcts describe their return value Kernel-doc: Convention: Use a "Return" section to describe return values radeon: Fix typo and copy/paste error in comments doc: Remove unnecessary declarations from Documentation/accounting/getdelays.c various: Fix spelling of "asynchronous" in comments. Fix misspellings of "whether" in comments. eisa: Fix spelling of "asynchronous". various: Fix spelling of "registered" in comments. doc: fix quite a few typos within Documentation target: iscsi: fix comment typos in target/iscsi drivers treewide: fix typo of "suport" in various comments and Kconfig treewide: fix typo of "suppport" in various comments ...
| * | various: Fix spelling of "asynchronous" in comments.Adam Buchbinder2012-11-191-1/+1
| | | | | | | | | | | | | | | | | | | | | "Asynchronous" is misspelled in some comments. No code changes. Signed-off-by: Adam Buchbinder <adam.buchbinder@gmail.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
| * | treewide: fix typo of "suport" in various comments and KconfigMasanari Iida2012-11-191-1/+1
| | | | | | | | | | | | | | | Signed-off-by: Masanari Iida <standby24x7@gmail.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
* | | fanotify: fix FAN_Q_OVERFLOW case of fanotify_read()Al Viro2012-11-181-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If the FAN_Q_OVERFLOW bit set in event->mask, the fanotify event metadata will not contain a valid file descriptor, but copy_event_to_user() didn't check for that, and unconditionally does a fd_install() on the file descriptor. Which in turn will cause a BUG_ON() in __fd_install(). Introduced by commit 352e3b249284 ("fanotify: sanitize failure exits in copy_event_to_user()") Mea culpa - missed that path ;-/ Reported-by: Alex Shi <lkml.alex@gmail.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | fanotify: fix missing breakEric Paris2012-11-091-0/+1
|/ / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Anders Blomdell noted in 2010 that Fanotify lost events and provided a test case. Eric Paris confirmed it was a bug and posted a fix to the list https://groups.google.com/forum/?fromgroups=#!topic/linux.kernel/RrJfTfyW2BE but never applied it. Repeated attempts over time to actually get him to apply it have never had a reply from anyone who has raised it So apply it anyway Signed-off-by: Alan Cox <alan@linux.intel.com> Reported-by: Anders Blomdell <anders.blomdell@control.lth.se> Cc: Eric Paris <eparis@redhat.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | switch simple cases of fget_light to fdgetAl Viro2012-09-262-29/+27
| | | | | | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | fanotify: sanitize failure exits in copy_event_to_user()Al Viro2012-09-261-39/+20
|/ | | | | | | | | | * do copy_to_user() before prepare_for_access_response(); that kills the need in remove_access_response(). * don't do fd_install() until we are past the last possible failure exit. Don't use sys_close() on cleanup side - just put_unused_fd() and fput(). Less racy that way... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* switch dentry_open() to struct path, make it grab references itselfAl Viro2012-07-231-6/+2
| | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* vfs: switch i_dentry/d_alias to hlistAl Viro2012-07-141-1/+2
| | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* fsnotify: remove unused parameter from send_to_group()Dan Carpenter2012-05-301-6/+6
| | | | | | | | | | We don't use "mnt" anymore in send_to_group() after 1968f5eed5 ("fanotify: use both marks when possible") was applied. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* fs/notify/notification.c: make subsys_initcall function staticH Hartley Sweeten2012-03-231-2/+1
| | | | | | | | | Signed-off-by: H Hartley Sweeten <hsweeten@visionengravers.com> Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: Mike Frysinger <vapier@gentoo.org> Cc: Arun Sharma <asharma@fb.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* fsnotify: don't BUG in fsnotify_destroy_mark()Miklos Szeredi2012-01-141-3/+5
| | | | | | | | | | | | | | | | | | | | | | | | Removing the parent of a watched file results in "kernel BUG at fs/notify/mark.c:139". To reproduce add "-w /tmp/audit/dir/watched_file" to audit.rules rm -rf /tmp/audit/dir This is caused by fsnotify_destroy_mark() being called without an extra reference taken by the caller. Reported by Francesco Cosoleto here: https://bugzilla.novell.com/show_bug.cgi?id=689860 Fix by removing the BUG_ON and adding a comment about not accessing mark after the iput. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> CC: stable@vger.kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>