aboutsummaryrefslogtreecommitdiffstats
path: root/fs/namei.c
Commit message (Collapse)AuthorAgeFilesLines
...
| * | | fs: remove the second argument of k[un]map_atomic()Cong Wang2012-03-201-2/+2
| |/ / | | | | | | | | | | | | Acked-by: Benjamin LaHaise <bcrl@kvack.org> Signed-off-by: Cong Wang <amwang@redhat.com>
* | | Merge branch 'dcache-word-accesses'Linus Torvalds2012-03-191-0/+122
|\ \ \ | |/ / |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * branch 'dcache-word-accesses': vfs: use 'unsigned long' accesses for dcache name comparison and hashing This does the name hashing and lookup using word-sized accesses when that is efficient, namely on x86 (although any little-endian machine with good unaligned accesses would do). It does very much depend on little-endian logic, but it's a very hot couple of functions under some real loads, and this patch improves the performance of __d_lookup_rcu() and link_path_walk() by up to about 30%. Giving a 10% improvement on some very pathname-heavy benchmarks. Because we do make unaligned accesses past the filename, the optimization is disabled when CONFIG_DEBUG_PAGEALLOC is active, and we effectively depend on the fact that on x86 we don't really ever have the last page of usable RAM followed immediately by any IO memory (due to ACPI tables, BIOS buffer areas etc). Some of the bit operations we do are a bit "subtle". It's commented, but you do need to really think about the code. Or just consider it black magic. Thanks to people on G+ for some of the optimized bit tricks.
| * | vfs: use 'unsigned long' accesses for dcache name comparison and hashingLinus Torvalds2012-03-081-0/+122
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Ok, this is hacky, and only works on little-endian machines with goo unaligned handling. And even then only with CONFIG_DEBUG_PAGEALLOC disabled, since it can access up to 7 bytes after the pathname. But it runs like a bat out of hell. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | vfs: fix return value from do_last()Miklos Szeredi2012-03-101-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | complete_walk() returns either ECHILD or ESTALE. do_last() turns this into ECHILD unconditionally. If not in RCU mode, this error will reach userspace which is complete nonsense. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> CC: stable@vger.kernel.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | | vfs: fix double put after complete_walk()Miklos Szeredi2012-03-101-1/+1
|/ / | | | | | | | | | | | | | | | | | | | | complete_walk() already puts nd->path, no need to do it again at cleanup time. This would result in Oopses if triggered, apparently the codepath is not too well exercised. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> CC: stable@vger.kernel.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | vfs: export full_name_hash() function to modulesLinus Torvalds2012-03-021-0/+1
| | | | | | | | | | | | | | | | Commit 5707c87f "vfs: uninline full_name_hash()" broke the modular build, because it needs exporting now that it isn't inlined any more. Reported-by: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | vfs: split up name hashing in link_path_walk() into helper functionLinus Torvalds2012-03-021-18/+34
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The code in link_path_walk() that finds out the length and the hash of the next path component is some of the hottest code in the kernel. And I have a version of it that does things at the full width of the CPU wordsize at a time, but that means that we *really* want to split it up into a separate helper function. So this re-organizes the code a bit and splits the hashing part into a helper function called "hash_name()". It returns the length of the pathname component, while at the same time computing and writing the hash to the appropriate location. The code generation is slightly changed by this patch, but generally for the better - and the added abstraction actually makes the code easier to read too. And the new interface is well suited for replacing just the "hash_name()" function with alternative implementations. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | vfs: uninline full_name_hash()Linus Torvalds2012-03-021-4/+9
|/ | | | | | | | | | | | | | | .. and also use it in lookup_one_len() rather than open-coding it. There aren't any performance-critical users, so inlining it is silly. But it wouldn't matter if it wasn't for the fact that the word-at-a-time dentry name patches want to conditionally replace the function, and uninlining it sets the stage for that. So again, this is a preparatory patch that doesn't change any semantics, and only prepares for a much cleaner and testable word-at-a-time dentry name accessor patch. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* vfs: fix d_inode_lookup() dentry ref leakMiklos Szeredi2012-02-131-1/+3
| | | | | | | | d_inode_lookup() leaks a dentry reference on IS_DEADDIR(). Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> CC: stable@vger.kernel.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* audit: do not call audit_getname on errorEric Paris2012-01-171-15/+13
| | | | | | | | Just a code cleanup really. We don't need to make a function call just for it to return on error. This also makes the VFS function even easier to follow and removes a conditional on a hot path. Signed-off-by: Eric Paris <eparis@redhat.com>
*-. Merge branches 'vfsmount-guts', 'umode_t' and 'partitions' into ZAl Viro2012-01-061-9/+9
|\ \
| * | switch open and mkdir syscalls to umode_tAl Viro2012-01-031-3/+3
| | | | | | | | | | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * | switch may_mknod() to umode_tAl Viro2012-01-031-1/+1
| | | | | | | | | | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * | switch ->mknod() to umode_tAl Viro2012-01-031-1/+1
| | | | | | | | | | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * | switch ->create() to umode_tAl Viro2012-01-031-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | vfs_create() ignores everything outside of 16bit subset of its mode argument; switching it to umode_t is obviously equivalent and it's the only caller of the method Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * | switch vfs_mkdir() and ->mkdir() to umode_tAl Viro2012-01-031-1/+1
| | | | | | | | | | | | | | | | | | | | | vfs_mkdir() gets int, but immediately drops everything that might not fit into umode_t and that's the only caller of ->mkdir()... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * | switch sys_mknodat(2) to umode_tAl Viro2012-01-031-2/+2
| |/ | | | | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | vfs: move mnt_mountpoint to struct mountAl Viro2012-01-031-2/+2
| | | | | | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | vfs: now it can be done - make mnt_parent point to struct mountAl Viro2012-01-031-11/+13
| | | | | | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | vfs: mnt_parent moved to struct mountAl Viro2012-01-031-2/+2
| | | | | | | | | | | | the second victim... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | vfs: spread struct mount - __lookup_mnt() resultAl Viro2012-01-031-6/+7
|/ | | | | | switch __lookup_mnt() to returning struct mount *; callers adjusted. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* VFS: we need to set LOOKUP_JUMPED on mountpoint crossingAl Viro2011-11-071-1/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Mountpoint crossing is similar to following procfs symlinks - we do not get ->d_revalidate() called for dentry we have arrived at, with unpleasant consequences for NFS4. Simple way to reproduce the problem in mainline: cat >/tmp/a.c <<'EOF' #include <unistd.h> #include <fcntl.h> #include <stdio.h> main() { struct flock fl = {.l_type = F_RDLCK, .l_whence = SEEK_SET, .l_len = 1}; if (fcntl(0, F_SETLK, &fl)) perror("setlk"); } EOF cc /tmp/a.c -o /tmp/test then on nfs4: mount --bind file1 file2 /tmp/test < file1 # ok /tmp/test < file2 # spews "setlk: No locks available"... What happens is the missing call of ->d_revalidate() after mountpoint crossing and that's where NFS4 would issue OPEN request to server. The fix is simple - treat mountpoint crossing the same way we deal with following procfs-style symlinks. I.e. set LOOKUP_JUMPED... Cc: stable@kernel.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* readlinkat: ensure we return ENOENT for the empty pathname for normal lookupsAndy Whitcroft2011-11-021-5/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Since the commit below which added O_PATH support to the *at() calls, the error return for readlink/readlinkat for the empty pathname has switched from ENOENT to EINVAL: commit 65cfc6722361570bfe255698d9cd4dccaf47570d Author: Al Viro <viro@zeniv.linux.org.uk> Date: Sun Mar 13 15:56:26 2011 -0400 readlinkat(), fchownat() and fstatat() with empty relative pathnames This is both unexpected for userspace and makes readlink/readlinkat inconsistant with all other interfaces; and inconsistant with our stated return for these pathnames. As the readlinkat call does not have a flags parameter we cannot use the AT_EMPTY_PATH approach used in the other calls. Therefore expose whether the original path is infact entry via a new user_path_at_empty() path lookup function. Use this to determine whether to default to EINVAL or ENOENT for failures. Addresses http://bugs.launchpad.net/bugs/817187 [akpm@linux-foundation.org: remove unused getname_flags()] Signed-off-by: Andy Whitcroft <apw@canonical.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: <stable@kernel.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Christoph Hellwig <hch@lst.de>
* leases: fix write-open/read-lease raceJ. Bruce Fields2011-10-281-4/+1
| | | | | | | | | | | | | | | | In setlease, we use i_writecount to decide whether we can give out a read lease. In open, we break leases before incrementing i_writecount. There is therefore a window between the break lease and the i_writecount increment when setlease could add a new read lease. This would leave us with a simultaneous write open and read lease, which shouldn't happen. Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
* vfs: add a comment to inode_permission()Andreas Gruenbacher2011-10-281-2/+4
| | | | | | | | Acked-by: J. Bruce Fields <bfields@redhat.com> Acked-by: David Howells <dhowells@redhat.com> Signed-off-by: Andreas Gruenbacher <agruen@kernel.org> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
* vfs: pass all mask flags check_acl and posix_acl_permissionAndreas Gruenbacher2011-10-281-2/+0
| | | | | | | | Acked-by: J. Bruce Fields <bfields@redhat.com> Acked-by: David Howells <dhowells@redhat.com> Signed-off-by: Andreas Gruenbacher <agruen@kernel.org> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
* vfs: indicate that the permission functions take all the MAY_* flagsAndreas Gruenbacher2011-10-281-2/+2
| | | | | | | | Acked-by: J. Bruce Fields <bfields@redhat.com> Acked-by: David Howells <dhowells@redhat.com> Signed-off-by: Andreas Gruenbacher <agruen@kernel.org> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
* vfs: remove LOOKUP_NO_AUTOMOUNT flagLinus Torvalds2011-09-271-6/+0
| | | | | | | | | | | | | | | | | | | | | | That flag no longer makes sense, since we don't look up automount points as eagerly any more. Additionally, it turns out that the NO_AUTOMOUNT handling was buggy to begin with: it would avoid automounting even for cases where we really *needed* to do the automount handling, and could return ENOENT for autofs entries that hadn't been instantiated yet. With our new non-eager automount semantics, one discussion has been about adding a AT_AUTOMOUNT flag to vfs_fstatat (and thus the newfstatat() and fstatat64() system calls), but it's probably not worth it: you can always force at least directory automounting by simply adding the final '/' to the filename, which works for *all* of the stat family system calls, old and new. So AT_NO_AUTOMOUNT (and thus LOOKUP_NO_AUTOMOUNT) really were just a result of our bad default behavior. Acked-by: Ian Kent <raven@themaw.net> Acked-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* vfs pathname lookup: Add LOOKUP_AUTOMOUNT flagLinus Torvalds2011-09-261-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Since we've now turned around and made LOOKUP_FOLLOW *not* force an automount, we want to add the ability to force an automount event on lookup even if we don't happen to have one of the other flags that force it implicitly (LOOKUP_OPEN, LOOKUP_DIRECTORY, LOOKUP_PARENT..) Most cases will never want to use this, since you'd normally want to delay automounting as long as possible, which usually implies LOOKUP_OPEN (when we open a file or directory, we really cannot avoid the automount any more). But Trond argued sufficiently forcefully that at a minimum bind mounting a file and quotactl will want to force the automount lookup. Some other cases (like nfs_follow_remote_path()) could use it too, although LOOKUP_DIRECTORY would work there as well. This commit just adds the flag and logic, no users yet, though. It also doesn't actually touch the LOOKUP_NO_AUTOMOUNT flag that is related, and was made irrelevant by the same change that made us not follow on LOOKUP_FOLLOW. Cc: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: Ian Kent <raven@themaw.net> Cc: Jeff Layton <jlayton@redhat.com> Cc: Miklos Szeredi <miklos@szeredi.hu> Cc: David Howells <dhowells@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Greg KH <gregkh@suse.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* restore pinning the victim dentry in vfs_rmdir()/vfs_rename_dir()Al Viro2011-09-141-0/+4
| | | | | | | | | | | | | | | | We used to get the victim pinned by dentry_unhash() prior to commit 64252c75a219 ("vfs: remove dget() from dentry_unhash()") and ->rmdir() and ->rename() instances relied on that; most of them don't care, but ones that used d_delete() themselves do. As the result, we are getting rmdir() oopses on NFS now. Just grab the reference before locking the victim and drop it explicitly after unlocking, same as vfs_rename_other() does. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Tested-by: Simon Kirby <sim@hostway.ca> Cc: stable@kernel.org (3.0.x) Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* vfs: automount should ignore LOOKUP_FOLLOWMiklos Szeredi2011-09-091-18/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Prior to 2.6.38 automount would not trigger on either stat(2) or lstat(2) on the automount point. After 2.6.38, with the introduction of the ->d_automount() infrastructure, stat(2) and others would start triggering automount while lstat(2), etc. still would not. This is a regression and a userspace ABI change. Problem originally reported here: http://thread.gmane.org/gmane.linux.kernel.autofs/6098 It appears that there was an attempt at fixing various userspace tools to not trigger the automount. But since the stat system call is rather common it is impossible to "fix" all userspace. This patch reverts the original behavior, which is to not trigger on stat(2) and other symlink following syscalls. [ It's not really clear what the right behavior is. Apparently Solaris does the "automount on stat, leave alone on lstat". And some programs can get unhappy when "stat+open+fstat" ends up giving a different result from the fstat than from the initial stat. But the change in 2.6.38 resulted in problems for some people, so we're going back to old behavior. Maybe we can re-visit this discussion at some future date - Linus ] Reported-by: Leonardo Chiquitto <leonardo.lists@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Acked-by: Ian Kent <raven@themaw.net> Cc: David Howells <dhowells@redhat.com> Cc: stable@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* vfs: rename 'do_follow_link' to 'should_follow_link'Linus Torvalds2011-08-071-2/+2
| | | | | | | | Al points out that the do_follow_link() helper function really is misnamed - it's about whether we should try to follow a symlink or not, not about actually doing the following. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* Fix POSIX ACL permission checkAri Savolainen2011-08-071-1/+1
| | | | | | | | | After commit 3567866bf261: "RCUify freeing acls, let check_acl() go ahead in RCU mode if acl is cached" posix_acl_permission is being called with an unsupported flag and the permission check fails. This patch fixes the issue. Signed-off-by: Ari Savolainen <ari.m.savolainen@gmail.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* vfs: optimize inode cache access patternsLinus Torvalds2011-08-061-10/+66
| | | | | | | | | | | | | | | | | | | | | | | | The inode structure layout is largely random, and some of the vfs paths really do care. The path lookup in particular is already quite D$ intensive, and profiles show that accessing the 'inode->i_op->xyz' fields is quite costly. We already optimized the dcache to not unnecessarily load the d_op structure for members that are often NULL using the DCACHE_OP_xyz bits in dentry->d_flags, and this does something very similar for the inode ops that are used during pathname lookup. It also re-orders the fields so that the fields accessed by 'stat' are together at the beginning of the inode structure, and roughly in the order accessed. The effect of this seems to be in the 1-2% range for an empty kernel "make -j" run (which is fairly kernel-intensive, mostly in filename lookup), so it's visible. The numbers are fairly noisy, though, and likely depend a lot on exact microarchitecture. So there's more tuning to be done. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* RCUify freeing acls, let check_acl() go ahead in RCU mode if acl is cachedAl Viro2011-08-031-11/+6
| | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* VFS: Fix automount for negative autofs dentriesDavid Howells2011-08-011-9/+15
| | | | | | | | | Autofs may set the DCACHE_NEED_AUTOMOUNT flag on negative dentries. These need attention from the automounter daemon regardless of the LOOKUP_FOLLOW flag. Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Ian Kent <raven@themaw.net> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* vfs: fix check_acl compile error when CONFIG_FS_POSIX_ACL is not setLinus Torvalds2011-07-251-0/+2
| | | | | | | | | | | | | | | | | Commit e77819e57f08 ("vfs: move ACL cache lookup into generic code") didn't take the FS_POSIX_ACL config variable into account - when that is not set, ACL's go away, and the cache helper functions do not exist, causing compile errors like fs/namei.c: In function 'check_acl': fs/namei.c:191:10: error: implicit declaration of function 'negative_cached_acl' fs/namei.c:196:2: error: implicit declaration of function 'get_cached_acl' fs/namei.c:196:6: warning: assignment makes pointer from integer without a cast fs/namei.c:212:11: error: implicit declaration of function 'set_cached_acl' Reported-by: Markus Trippelsdorf <markus@trippelsdorf.de> Acked-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* vfs: make gcc generate more obvious code for acl permission checkingLinus Torvalds2011-07-251-1/+1
| | | | | | | | | | | | The "fsuid is the inode owner" case is not necessarily always the likely case, but it's the case that doesn't do anything odd and that we want in straight-line code. Make gcc not generate random "jump around for the fun of it" code. This just helps me read profiles. That thing is one of the hottest parts of the whole pathname lookup. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* fs: take the ACL checks to common codeChristoph Hellwig2011-07-251-11/+13
| | | | | | | | | Replace the ->check_acl method with a ->get_acl method that simply reads an ACL from disk after having a cache miss. This means we can replace the ACL checking boilerplate code with a single implementation in namei.c. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* vfs: move ACL cache lookup into generic codeLinus Torvalds2011-07-251-3/+49
| | | | | | | | | | | | | | This moves logic for checking the cached ACL values from low-level filesystems into generic code. The end result is a streamlined ACL check that doesn't need to load the inode->i_op->check_acl pointer at all for the common cached case. The filesystems also don't need to check for a non-blocking RCU walk case in their acl_check() functions, because that is all handled at a VFS layer. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* VFS: Fixup kerneldoc for generic_permission()Tobias Klauser2011-07-201-1/+0
| | | | | | | | The flags parameter went away in d749519b444db985e40b897f73ce1898b11f997e Signed-off-by: Tobias Klauser <tklauser@distanz.ch> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* unexport kern_path_parent()Al Viro2011-07-201-1/+0
| | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* switch vfs_path_lookup() to struct pathAl Viro2011-07-201-5/+11
| | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* kill lookup_create()Al Viro2011-07-201-36/+18
| | | | | | folded into the only caller (kern_path_create()) Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* new helpers: kern_path_create/user_path_createAl Viro2011-07-201-78/+76
| | | | | | | combination of kern_path_parent() and lookup_create(). Does *not* expose struct nameidata to caller. Syscalls converted to that... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* kill LOOKUP_CONTINUEAl Viro2011-07-201-8/+3
| | | | | | LOOKUP_PARENT is equivalent to it now Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* don't transliterate lower bits of ->intent.open.flags to FMODE_...Al Viro2011-07-201-19/+2
| | | | | | ->create() instances are much happier that way... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* Don't pass nameidata when calling vfs_create() from mknod()Al Viro2011-07-201-1/+1
| | | | | | | All instances can cope with that now (and ceph one actually starts working properly). Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* merge do_revalidate() into its only callerAl Viro2011-07-201-24/+18
| | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* no reason to keep exec_permission() separate nowAl Viro2011-07-201-41/+4
| | | | | | cache footprint alone makes it a bad idea... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>