path: root/fs
Commit message (Author, Age, Files, Lines changed)
* Merge branch 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6 (Linus Torvalds, 2010-11-27, 6 files, -32/+55)
    * 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6:
      NFS: Ensure we return the dirent->d_type when it is known
      NFS: Correct the array bound calculation in nfs_readdir_add_to_array
      NFS: Don't ignore errors from nfs_do_filldir()
      NFS: Fix the error handling in "uncached_readdir()"
      NFS: Fix a page leak in uncached_readdir()
      NFS: Fix a page leak in nfs_do_filldir()
      NFS: Assume eof if the server returns no readdir records
      NFS: Buffer overflow in ->decode_dirent() should not be fatal
      Pure nfs client performance using odirect.
      SUNRPC: Fix an infinite loop in call_refresh/call_refreshresult
  * NFS: Ensure we return the dirent->d_type when it is known (Trond Myklebust, 2010-11-22, 5 files, -3/+21)
      Store the dirent->d_type in the struct nfs_cache_array_entry so that we
      can use it in getdents() calls. This fixes a regression with the new
      readdir code.

      Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
  * NFS: Correct the array bound calculation in nfs_readdir_add_to_array (Trond Myklebust, 2010-11-22, 1 file, -4/+5)
      It looks as if the array size calculation in MAX_READDIR_ARRAY does not
      take the alignment of struct nfs_cache_array_entry into account.

      Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
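      As a small, runnable illustration of why such a bound should come from
      sizeof() rather than from hand-summed member sizes (the struct below is
      a hypothetical stand-in for nfs_cache_array_entry and the header size is
      an assumption, not the real NFS layout):

        #include <stdio.h>
        #include <stdint.h>

        /* stand-in entry: a 64-bit cookie plus smaller fields forces
         * alignment padding on 64-bit targets */
        struct entry {
                uint64_t cookie;
                const char *name;
                unsigned int len;
                unsigned char d_type;
        };

        int main(void)
        {
                size_t page = 4096, header = 24;                  /* assumed sizes */
                size_t summed = 8 + sizeof(const char *) + 4 + 1; /* ignores padding */

                printf("bound from summed sizes : %zu\n", (page - header) / summed);
                printf("bound from sizeof(entry): %zu\n", (page - header) / sizeof(struct entry));
                return 0;
        }

      The second figure is smaller because sizeof() includes the padding that
      each array element actually occupies in the page.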
  * NFS: Don't ignore errors from nfs_do_filldir() (Trond Myklebust, 2010-11-22, 1 file, -9/+9)
      We should ignore the errors from the filldir callback and just interpret
      them as meaning we should exit; however, we should definitely pass back
      ENOMEM errors.

      Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
  * NFS: Fix the error handling in "uncached_readdir()" (Trond Myklebust, 2010-11-22, 1 file, -3/+2)
      Currently, uncached_readdir() is broken because it fails to handle the
      results from nfs_readdir_xdr_to_array() correctly.

      Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
  * NFS: Fix a page leak in uncached_readdir() (Trond Myklebust, 2010-11-22, 1 file, -2/+3)
      Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
  * NFS: Fix a page leak in nfs_do_filldir() (Trond Myklebust, 2010-11-22, 1 file, -5/+5)
      nfs_do_filldir() must always free desc->page when it is done, otherwise
      we end up leaking the page. Also remove unused variable 'dentry'.

      Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
  * NFS: Assume eof if the server returns no readdir records (Trond Myklebust, 2010-11-22, 1 file, -3/+7)
      Some servers are known to be buggy w.r.t. this. Deal with them...

      Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
  * NFS: Buffer overflow in ->decode_dirent() should not be fatal (Trond Myklebust, 2010-11-22, 3 files, -3/+3)
      Overflowing the buffer in the readdir ->decode_dirent() should not lead
      to a fatal error, but rather to an attempt to reread the record in
      question.

      Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
  * Pure nfs client performance using odirect. (Arun Bharadwaj, 2010-11-22, 1 file, -1/+1)
      When an application opens a file with the O_DIRECT flag, if the size of
      the data that is written is equal to wsize, the client sends a WRITE RPC
      with the stable flag set to UNSTABLE followed by a single COMMIT RPC,
      rather than sending a single WRITE RPC with the stable flag set to
      FILE_SYNC. This is a bug; this patch fixes it.

      Signed-off-by: Arun R Bharadwaj <arun@linux.vnet.ibm.com>
      Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
* Merge branch 'for-linus' of git://git.kernel.dk/linux-2.6-block (Linus Torvalds, 2010-11-27, 1 file, -25/+6)
    * 'for-linus' of git://git.kernel.dk/linux-2.6-block:
      cciss: fix build for PROC_FS disabled
      block: fix amiga and atari floppy driver compile warning
      blk-throttle: Fix calculation of max number of WRITES to be dispatched
      ioprio: grab rcu_read_lock in sys_ioprio_{set,get}()
      xen/blkfront: cope with backend that fail empty BLKIF_OP_WRITE_BARRIER requests
      xen/blkfront: Implement FUA with BLKIF_OP_WRITE_BARRIER
      xen/blkfront: change blk_shadow.request to proper pointer
      xen/blkfront: map REQ_FLUSH into a full barrier
  * ioprio: grab rcu_read_lock in sys_ioprio_{set,get}() (Greg Thelen, 2010-11-15, 1 file, -25/+6)
      Using:
      - CONFIG_LOCKUP_DETECTOR=y
      - CONFIG_PREEMPT=y
      - CONFIG_LOCKDEP=y
      - CONFIG_PROVE_LOCKING=y
      - CONFIG_PROVE_RCU=y
      found a missing rcu lock during boot on a 512 MiB x86_64 ubuntu vm:

        ===================================================
        [ INFO: suspicious rcu_dereference_check() usage. ]
        ---------------------------------------------------
        kernel/pid.c:419 invoked rcu_dereference_check() without protection!

        other info that might help us debug this:

        rcu_scheduler_active = 1, debug_locks = 0
        1 lock held by ureadahead/1355:
         #0: (tasklist_lock){.+.+..}, at: [<ffffffff8115bc09>] sys_ioprio_set+0x7f/0x29e

        stack backtrace:
        Pid: 1355, comm: ureadahead Not tainted 2.6.37-dbg-DEV #1
        Call Trace:
         [<ffffffff8109c10c>] lockdep_rcu_dereference+0xaa/0xb3
         [<ffffffff81088cbf>] find_task_by_pid_ns+0x44/0x5d
         [<ffffffff81088cfa>] find_task_by_vpid+0x22/0x24
         [<ffffffff8115bc3e>] sys_ioprio_set+0xb4/0x29e
         [<ffffffff8147cf21>] ? trace_hardirqs_off_thunk+0x3a/0x3c
         [<ffffffff8105c409>] sysenter_dispatch+0x7/0x2c
         [<ffffffff8147cee2>] ? trace_hardirqs_on_thunk+0x3a/0x3f

      The fix is to:
      a) grab rcu lock in sys_ioprio_{set,get}() and
      b) avoid grabbing tasklist_lock.

      Discussion in: http://marc.info/?l=linux-kernel&m=128951324702889

      Signed-off-by: Greg Thelen <gthelen@google.com>
      Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Reviewed-by: Oleg Nesterov <oleg@redhat.com>

      Modified by Jens to remove the now redundant inner rcu lock and unlock
      since they are now protected by the outer lock.

      Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
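      As a sketch of the locking pattern the fix describes for the
      IOPRIO_WHO_PROCESS case (find_task_by_vpid() and set_task_ioprio() as in
      fs/ioprio.c; the actual patch also covers the PGRP and USER cases):

        rcu_read_lock();                    /* protects the pid -> task lookup */
        p = find_task_by_vpid(who);
        if (!p)
                ret = -ESRCH;
        else
                ret = set_task_ioprio(p, ioprio);
        rcu_read_unlock();                  /* tasklist_lock is no longer taken */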
* Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ryusuke/nilfs2 (Linus Torvalds, 2010-11-27, 2 files, -3/+3)
    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ryusuke/nilfs2:
      nilfs2: fix typo in comment of nilfs_dat_move function
      nilfs2: nilfs_iget_for_gc() returns ERR_PTR
  * nilfs2: fix typo in comment of nilfs_dat_move function (Ryusuke Konishi, 2010-11-24, 1 file, -1/+1)
      Fixes a typo: "uncommited" -> "uncommitted".

      Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
  * nilfs2: nilfs_iget_for_gc() returns ERR_PTR (Dan Carpenter, 2010-11-23, 1 file, -2/+2)
      nilfs_iget_for_gc() returns an ERR_PTR() on failure and doesn't return
      NULL.

      Signed-off-by: Dan Carpenter <error27@gmail.com>
      Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
* reiserfs: fix inode mutex - reiserfs lock misordering (Frederic Weisbecker, 2010-11-25, 1 file, -4/+3)
    reiserfs_unpack() locks the inode mutex with reiserfs_mutex_lock_safe()
    to protect against reiserfs lock dependency. However, this protection
    requires the reiserfs lock to be held. That is the case when
    reiserfs_unpack() is called from reiserfs_ioctl(), but not from
    reiserfs_quota_on() when it tries to unpack tails of quota files.

    Fix the ordering of the two locks in reiserfs_unpack() to fix this issue.

    Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
    Reported-by: Markus Gapp <markus.gapp@gmx.net>
    Reported-by: Jan Kara <jack@suse.cz>
    Cc: Jeff Mahoney <jeffm@suse.com>
    Cc: <stable@kernel.org> [2.6.36.x]
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
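    As a sketch of the reordering described above, under the assumption that
    the quota path enters reiserfs_unpack() without the reiserfs lock held
    (the exact shape of the upstream patch may differ):

      mutex_lock(&inode->i_mutex);                    /* plain lock: no reiserfs lock yet */
      depth = reiserfs_write_lock_once(inode->i_sb);  /* reiserfs lock taken second */

      /* ... unpack the tail ... */

      reiserfs_write_unlock_once(inode->i_sb, depth);
      mutex_unlock(&inode->i_mutex);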
* pagemap: set pagemap walk limit to PMD boundary (Naoya Horiguchi, 2010-11-25, 1 file, -1/+2)
    Currently one pagemap_read() call walks in PAGEMAP_WALK_SIZE bytes
    (== 512 pages.) But there is a corner case where walk_pmd_range()
    accidentally runs over a VMA associated with a hugetlbfs file.

    For example, when a process has mappings to VMAs as shown below:

      # cat /proc/<pid>/maps
      ...
      3a58f6d000-3a58f72000 rw-p 00000000 00:00 0
      7fbd51853000-7fbd51855000 rw-p 00000000 00:00 0
      7fbd5186c000-7fbd5186e000 rw-p 00000000 00:00 0
      7fbd51a00000-7fbd51c00000 rw-s 00000000 00:12 8614   /hugepages/test

    then pagemap_read() goes into the walk_pmd_range() path and walks in the
    range 0x7fbd51853000-0x7fbd51a53000, but the hugetlbfs VMA should be
    handled by walk_hugetlb_range(). Otherwise the PMD for the hugepage is
    considered bad and cleared, which causes undesirable results.

    This patch fixes it by separating the pagemap walk range into one PMD.

    Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
    Cc: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
    Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
    Cc: Matt Mackall <mpm@selenic.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
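    A sketch of the per-chunk clamp this describes (variable names assumed;
    PMD_SIZE/PMD_MASK are the usual page-table constants):

      unsigned long end;

      /* cut this walk chunk at the next PMD boundary ... */
      end = (start_vaddr + PMD_SIZE) & PMD_MASK;
      /* ... unless that overflows or runs past the requested range */
      if (end < start_vaddr || end > end_vaddr)
              end = end_vaddr;

    so each walk stays within a single PMD and a hugetlbfs VMA is reached
    through walk_hugetlb_range() rather than walk_pmd_range().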
* fuse: fix attributes after open(O_TRUNC) (Ken Sumrall, 2010-11-25, 1 file, -0/+10)
    The attribute cache for a file was not being cleared when a file is
    opened with O_TRUNC.

    If the filesystem's open operation truncates the file ("atomic_o_trunc"
    feature flag is set) then the kernel should invalidate the cached
    st_mtime and st_ctime attributes.

    Also, i_size should explicitly be set to zero as it is used sometimes
    without refreshing the cache.

    Signed-off-by: Ken Sumrall <ksumrall@android.com>
    Cc: Anfei <anfei.zhou@gmail.com>
    Cc: "Anand V. Avati" <avati@gluster.com>
    Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
    Cc: <stable@kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
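    A simplified sketch of that invalidation on the atomic_o_trunc open path
    (fc and fuse_invalidate_attr() as in fs/fuse; the upstream patch likely
    does more, such as bumping the attribute version under the connection
    lock):

      if (fc->atomic_o_trunc && (file->f_flags & O_TRUNC)) {
              i_size_write(inode, 0);        /* the server just truncated the file */
              fuse_invalidate_attr(inode);   /* drop cached st_mtime/st_ctime etc. */
      }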
* Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 (Linus Torvalds, 2010-11-19, 5 files, -55/+37)
    * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
      ext4: Add EXT4_IOC_TRIM ioctl to handle batched discard
      fs: Do not dispatch FITRIM through separate super_operation
      ext4: ext4_fill_super shouldn't return 0 on corruption
      jbd2: fix /proc/fs/jbd2/<dev> when using an external journal
      ext4: missing unlock in ext4_clear_request_list()
      ext4: fix setting random pages PageUptodate
  * ext4: Add EXT4_IOC_TRIM ioctl to handle batched discard (Lukas Czerner, 2010-11-19, 1 file, -0/+24)
      The filesystem-independent ioctl was rejected as not common enough to be
      in the core vfs ioctl code. Since we still need access to this
      functionality, this commit adds the ext4-specific ioctl EXT4_IOC_TRIM to
      dispatch ext4_trim_fs(). It takes an fstrim_range structure as an
      argument. fstrim_range is defined in include/linux/fs.h and its
      definition is as follows:

        struct fstrim_range {
                __u64 start;
                __u64 len;
                __u64 minlen;
        };

      start  - first Byte to trim
      len    - number of Bytes to trim from start
      minlen - minimum extent length to trim; free extents shorter than this
               number of Bytes will be ignored. This will be rounded up to
               the fs block size.

      After the FITRIM is done, the number of actually discarded Bytes is
      stored in fstrim_range.len to give the user better insight on how much
      storage space has really been released for wear-leveling.

      Signed-off-by: Lukas Czerner <lczerner@redhat.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
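      A minimal userspace sketch of issuing this request (FITRIM and struct
      fstrim_range come from <linux/fs.h>; the mount-point argument and the
      defaults below are illustrative):

        #include <stdio.h>
        #include <string.h>
        #include <fcntl.h>
        #include <sys/ioctl.h>
        #include <linux/fs.h>

        int main(int argc, char **argv)
        {
                struct fstrim_range range;
                int fd = open(argc > 1 ? argv[1] : "/mnt", O_RDONLY);

                if (fd < 0) {
                        perror("open");
                        return 1;
                }
                memset(&range, 0, sizeof(range));
                range.len = (__u64)-1;   /* trim the whole filesystem */
                range.minlen = 0;        /* no minimum extent length */

                if (ioctl(fd, FITRIM, &range) < 0) {
                        perror("FITRIM");
                        return 1;
                }
                printf("trimmed %llu bytes\n", (unsigned long long)range.len);
                return 0;
        }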
  * fs: Do not dispatch FITRIM through separate super_operation (Lukas Czerner, 2010-11-19, 2 files, -40/+0)
      There was concern that the FITRIM ioctl is not common enough to be
      included in the core vfs ioctl code; as Christoph Hellwig pointed out,
      there is no real point in dispatching this out to a separate vector
      instead of just through ->ioctl.

      So this commit removes ioctl_fstrim() from the vfs ioctl code and
      trim_fs from the super_operations structure.

      Signed-off-by: Lukas Czerner <lczerner@redhat.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
  * ext4: ext4_fill_super shouldn't return 0 on corruption (Darrick J. Wong, 2010-11-19, 1 file, -2/+3)
      At the start of ext4_fill_super, ret is set to -EINVAL, and any failure
      path out of that function returns ret. However, the
      generic_check_addressable clause sets ret = 0 (if it passes), which
      means that a subsequent failure (e.g. a group checksum error) returns 0
      even though the mount should fail. This causes vfs_kern_mount in turn
      to think that the mount succeeded, leading to an oops.

      A simple fix is to avoid using ret for the generic_check_addressable
      check, which was last changed in commit
      30ca22c70e3ef0a96ff84de69cd7e8561b416cb2.

      Signed-off-by: Darrick J. Wong <djwong@us.ibm.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
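      As a sketch of that shape, with variable and helper names assumed from
      ext4_fill_super():

        /* use a separate err so ret keeps its -EINVAL default on success */
        err = generic_check_addressable(sb->s_blocksize_bits,
                                        ext4_blocks_count(es));
        if (err) {
                ext4_msg(sb, KERN_ERR,
                         "filesystem too large to mount safely on this system");
                ret = err;              /* only overwrite ret on actual failure */
                goto failed_mount;
        }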
  * jbd2: fix /proc/fs/jbd2/<dev> when using an external journal (yangsheng, 2010-11-17, 1 file, -8/+8)
      In jbd2_journal_init_dev(), we need to make sure the journal structure
      is fully initialized before calling jbd2_stats_proc_init().

      Reviewed-by: Andreas Dilger <andreas.dilger@oracle.com>
      Signed-off-by: yangsheng <sheng.yang@oracle.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
  * ext4: missing unlock in ext4_clear_request_list() (Dan Carpenter, 2010-11-17, 1 file, -3/+0)
      If the li_request_list was empty then it returned with the lock held.
      Instead of adding a "goto unlock" I just removed that special case and
      let it go past the empty list_for_each_safe().

      Signed-off-by: Dan Carpenter <error27@gmail.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
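      As a sketch of the resulting shape (lock, list, and helper names here
      are assumptions): with the special case gone there is no early return,
      the unlock always runs, and iterating an empty list is simply a no-op.

        mutex_lock(&ext4_li_info->li_list_mtx);
        list_for_each_safe(pos, n, &ext4_li_info->li_request_list) {
                elr = list_entry(pos, struct ext4_li_request, lr_request);
                ext4_remove_li_request(elr);
        }
        mutex_unlock(&ext4_li_info->li_list_mtx);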
  * ext4: fix setting random pages PageUptodate (Markus Trippelsdorf, 2010-11-17, 1 file, -2/+2)
      ext4_end_bio calls put_page and kmem_cache_free before calling
      SetPageUptodate(). This can result in setting the PageUptodate bit on
      random pages and causes the following BUG:

        BUG: Bad page state in process rm  pfn:52e54
        page:ffffea0001222260 count:0 mapcount:0 mapping: (null) index:0x0
        arch kernel: page flags: 0x4000000000000008(uptodate)

      Fix the problem by moving put_io_page() after the SetPageUptodate()
      call. Thanks to Hugh Dickins for analyzing this problem.

      Reported-by: Markus Trippelsdorf <markus@trippelsdorf.de>
      Tested-by: Markus Trippelsdorf <markus@trippelsdorf.de>
      Signed-off-by: Markus Trippelsdorf <markus@trippelsdorf.de>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
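      A sketch of the corrected ordering in the completion path (helper and
      field names assumed from fs/ext4/page-io.c of that era):

        SetPageUptodate(page);         /* mark it while we still hold the page */
        put_io_page(io_end->pages[i]); /* only now drop the io page reference  */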
* Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client (Linus Torvalds, 2010-11-19, 8 files, -55/+98)
    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
      ceph: fix readdir EOVERFLOW on 32-bit archs
      ceph: fix frag offset for non-leftmost frags
      ceph: fix dangling pointer
      ceph: explicitly specify page alignment in network messages
      ceph: make page alignment explicit in osd interface
      ceph: fix comment, remove extraneous args
      ceph: fix update of ctime from MDS
      ceph: fix version check on racing inode updates
      ceph: fix uid/gid on resent mds requests
      ceph: fix rdcache_gen usage and invalidate
      ceph: re-request max_size if cap auth changes
      ceph: only let auth caps update max_size
      ceph: fix open for write on clustered mds
      ceph: fix bad pointer dereference in ceph_fill_trace
      ceph: fix small seq message skipping
      Revert "ceph: update issue_seq on cap grant"
  * ceph: fix readdir EOVERFLOW on 32-bit archs (Sage Weil, 2010-11-18, 1 file, -3/+7)
      One of the readdir filldir_t callers was passing the raw ceph 64-bit ino
      instead of the hashed 32-bit one, producing an EOVERFLOW in the filler
      callback. Fix this by calling the ceph_vino_to_ino() helper to do the
      conversion.

      Reported-by: Jan Smets <jan.smets@alcatel-lucent.com>
      Tested-by: Jan Smets <jan.smets@alcatel-lucent.com>
      Signed-off-by: Sage Weil <sage@newdream.net>
  * ceph: fix frag offset for non-leftmost frags (Sage Weil, 2010-11-11, 1 file, -1/+4)
      We start at offset 2 for the leftmost frag, and 0 for subsequent frags.
      When we reach the end (rightmost), we go back to 2. This fixes readdir
      on fragmented (large) directories.

      Signed-off-by: Sage Weil <sage@newdream.net>
  * ceph: fix dangling pointer (Sage Weil, 2010-11-11, 1 file, -0/+1)
      Clear fi->last_name when it's freed. The only caller is rewinddir() (or
      equivalent lseek).

      Signed-off-by: Sage Weil <sage@newdream.net>
  * ceph: make page alignment explicit in osd interface (Sage Weil, 2010-11-09, 3 files, -9/+25)
      We used to infer alignment of IOs within a page based on the file
      offset, which assumed they matched. This broke with direct IO that was
      not aligned to pages (e.g., 512-byte aligned IO). We were also trusting
      the alignment specified in the OSD reply, which could have been adjusted
      by the server.

      Explicitly specify the page alignment when setting up OSD IO requests.

      Signed-off-by: Sage Weil <sage@newdream.net>
  * ceph: fix comment, remove extraneous args (Sage Weil, 2010-11-09, 1 file, -11/+9)
      The offset/length arguments aren't used.

      Signed-off-by: Sage Weil <sage@newdream.net>
  * ceph: fix update of ctime from MDS (Sage Weil, 2010-11-08, 1 file, -2/+4)
      The client can have a newer ctime than the MDS due to AUTH_EXCL and
      XATTR_EXCL caps as well; update the check in ceph_fill_file_time
      appropriately.

      This fixes cases where ctime/mtime goes backward under the right
      sequence of local updates (e.g. chmod) and mds replies (e.g. a
      subsequent stat that goes to the MDS).

      Signed-off-by: Sage Weil <sage@newdream.net>
  * ceph: fix version check on racing inode updates (Sage Weil, 2010-11-08, 1 file, -4/+9)
      We may get updates on the same inode from multiple MDSs; generally we
      only pay attention if the update is newer than what we already have.
      The exception is when an MDS sends us unstable information, in which
      case we always update.

      The old > check got this wrong when our version was odd (e.g. 3) and the
      reply version was even (e.g. 2): the older stale (v2) info would be
      applied. Fixed and clarified the comment.

      Signed-off-by: Sage Weil <sage@newdream.net>
  * ceph: fix uid/gid on resent mds requests (Sage Weil, 2010-11-08, 2 files, -2/+7)
      MDS requests can be rebuilt and resent in non-process context, but were
      filling in uid/gid from current_fsuid/gid. Put that information in the
      request struct on request setup.

      This fixes incorrect (and root) uid/gid getting set for requests that
      are forwarded between MDSs, usually due to metadata migrations.

      Signed-off-by: Sage Weil <sage@newdream.net>
  * ceph: fix rdcache_gen usage and invalidate (Sage Weil, 2010-11-08, 3 files, -14/+10)
      We used to use rdcache_gen to indicate whether we "might" have cached
      pages. Now we just look at the mapping to determine that. However, some
      old behavior remains from that transition.

      First, rdcache_gen == 0 no longer means we have no pages. That can
      happen at any time (presumably when we carry FILE_CACHE). We should not
      reset it to zero, and we should not check that it is zero.

      That means that the only purpose for rdcache_revoking is to resolve
      races between new issues of FILE_CACHE and an async invalidate. If they
      are equal, we should invalidate. On success, we decrement
      rdcache_revoking, so that it is no longer equal to rdcache_gen.
      Similarly, if we succeed in doing a sync invalidate, set
      revoking = gen - 1. (This is a small optimization to avoid doing
      unnecessary invalidate work and does not affect correctness.)

      Signed-off-by: Sage Weil <sage@newdream.net>
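      As a sketch of that decision, using the ceph_inode_info field names
      (i_rdcache_gen, i_rdcache_revoking) as assumptions about the
      surrounding code:

        if (ci->i_rdcache_revoking == ci->i_rdcache_gen) {
                /* no new FILE_CACHE issue raced in; invalidation still wanted */
                invalidate_mapping_pages(&inode->i_data, 0, -1);
                /* make revoking != gen so a stale async invalidate is a no-op */
                ci->i_rdcache_revoking--;
        }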
  * ceph: re-request max_size if cap auth changes (Sage Weil, 2010-11-07, 1 file, -0/+5)
      If the auth cap migrates to another MDS, clear requested_max_size so
      that we resend any pending max_size increase requests. This fixes
      potential hangs on writes that extend a file and race with a cap
      migration between MDSs.

      Signed-off-by: Sage Weil <sage@newdream.net>
  * ceph: only let auth caps update max_size (Sage Weil, 2010-11-07, 1 file, -1/+8)
      Only the auth MDS has a meaningful max_size value for us, so only update
      it in fill_inode if we're being issued an auth cap. Otherwise, a random
      stat result from a non-auth MDS can clobber a meaningful max_size, get
      the client<->mds cap state out of sync, and make writes hang.

      Specifically, even if the client re-requests a larger max_size (which it
      will), the MDS won't respond because as far as it knows we already have
      a sufficiently large value.

      Signed-off-by: Sage Weil <sage@newdream.net>
  * ceph: fix open for write on clustered mds (Sage Weil, 2010-11-07, 1 file, -2/+4)
      Normally when we open a file we already have a cap, and simply update
      the wanted set. However, if we open a file for write, but don't have an
      auth cap, that doesn't work; we need to open a new cap with the auth
      MDS. Only reuse existing caps if we are opening for read or the existing
      cap is auth.

      Signed-off-by: Sage Weil <sage@newdream.net>
  * ceph: fix bad pointer dereference in ceph_fill_trace (Sage Weil, 2010-11-07, 1 file, -1/+2)
      We dereference *in a few lines down, but only set it on rename. It is
      apparently pretty rare for this to trigger, but I have been hitting it
      with clustered MDSs.

      Signed-off-by: Sage Weil <sage@newdream.net>
  * Revert "ceph: update issue_seq on cap grant" (Sage Weil, 2010-10-27, 1 file, -5/+3)
      This reverts commit d91f2438d881514e4a923fd786dbd94b764a9440.

      The intent of issue_seq is to distinguish between mds->client messages
      that (re)create the cap and those that do not, which means we should
      _only_ be updating that value in the create paths. By updating it in
      handle_cap_grant, we reset it to zero, which then breaks release.

      The larger question is what workload/problem made me think it should be
      updated here...

      Signed-off-by: Sage Weil <sage@newdream.net>
* BKL: remove references to lock_kernel from comments (Arnd Bergmann, 2010-11-17, 1 file, -4/+4)
    Lock_kernel is gone from the code, so the comments should be updated,
    too. nfsd now uses lock_flocks instead of lock_kernel to protect against
    posix file locks.

    Signed-off-by: Arnd Bergmann <arnd@arndb.de>
    Acked-by: J. Bruce Fields <bfields@redhat.com>
    Cc: linux-nfs@vger.kernel.org
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* BKL: remove extraneous #include <smp_lock.h> (Arnd Bergmann, 2010-11-17, 28 files, -28/+0)
    The big kernel lock has been removed from all these files at some point,
    leaving only the #include. Remove this too as a cleanup.

    Signed-off-by: Arnd Bergmann <arnd@arndb.de>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* nfs: Ignore kmemleak false positive in nfs_readdir_make_qstr (Catalin Marinas, 2010-11-16, 1 file, -0/+6)
    Strings allocated via kmemdup() in nfs_readdir_make_qstr() are referenced
    from the nfs_cache_array which is stored in a page cache page. Kmemleak
    does not scan such pages and it reports several false positives. This
    patch annotates the string->name pointer so that kmemleak does not
    consider it a real leak.

    Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
    Cc: Bryan Schumaker <bjschuma@netapp.com>
    Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
    Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
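    As a sketch of that annotation (surrounding code assumed),
    kmemleak_not_leak() marks the kmemdup()'d buffer as intentionally
    referenced only from memory kmemleak does not scan:

      string->name = kmemdup(name, len, GFP_KERNEL);
      if (string->name == NULL)
              return -ENOMEM;
      kmemleak_not_leak(string->name);  /* held by an unscanned page cache page */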
* NFS: readdir shouldn't read beyond the reply returned by the server (Trond Myklebust, 2010-11-15, 5 files, -7/+11)
    Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
* NFS: Fix a couple of regressions in readdir. (Trond Myklebust, 2010-11-15, 1 file, -34/+56)
    Fix up the issue that array->eof_index needs to be able to be set even if
    array->size == 0.

    Ensure that we catch all important memory allocation error conditions
    and/or kmap() failures.

    Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
* | | | Revert "NFSv4: Fall back to ordinary lookup if nfs4_atomic_open() returns ↵Trond Myklebust2010-11-151-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | EISDIR" This reverts commit 80e60639f1b7c121a7fea53920c5a4b94009361a. This change requires further fixes to ensure that the open doesn't succeed if the lookup later results in a regular file being created. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
* Regression: fix mounting NFS when NFSv3 support is not compiled (Paulius Zaleckas, 2010-11-15, 1 file, -1/+7)
    Trying to mount NFS (the root partition in my case) fails if
    CONFIG_NFS_V3 is not selected. nfs_validate_mount_data() returns
    EPROTONOSUPPORT because of this check:

      #ifndef CONFIG_NFS_V3
              if (args->version == 3)
                      goto out_v3_not_compiled;
      #endif /* !CONFIG_NFS_V3 */

    and args->version was always initialized to 3. It was working in 2.6.36.

    Signed-off-by: Paulius Zaleckas <paulius.zaleckas@gmail.com>
    Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
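    One way to avoid the EPROTONOSUPPORT, sketched here (the upstream fix may
    be structured differently), is to default args->version based on what is
    actually compiled in:

      #ifdef CONFIG_NFS_V3
      #define NFS_DEFAULT_VERSION 3
      #else
      #define NFS_DEFAULT_VERSION 2
      #endif

              args->version = NFS_DEFAULT_VERSION;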
* NLM: Fix a regression in lockd (Trond Myklebust, 2010-11-15, 1 file, -7/+4)
    Nick Bowler reports:
      There are no unusual messages on the client... but I just logged into
      the server and I see lots of messages of the following form:

        nfsd: request from insecure port (192.168.8.199:35766)!
        nfsd: request from insecure port (192.168.8.199:35766)!
        nfsd: request from insecure port (192.168.8.199:35766)!
        nfsd: request from insecure port (192.168.8.199:35766)!
        nfsd: request from insecure port (192.168.8.199:35766)!

    Bisected to commit 9247685088398cf21bcb513bd2832b4cd42516c4 (SUNRPC:
    Properly initialize sock_xprt.srcaddr in all cases)

    Apparently, removing the 'transport->srcaddr.ss_family = family' from
    xs_create_sock() triggers this due to nlmclnt_lookup_host() incorrectly
    initialising the srcaddr family to AF_UNSPEC.

    Reported-by: Nick Bowler <nbowler@elliptictech.com>
    Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
* Merge git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-fixes (Linus Torvalds, 2010-11-15, 5 files, -216/+98)
    * git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-fixes:
      GFS2: Fix inode deallocation race
  * GFS2: Fix inode deallocation race (Steven Whitehouse, 2010-11-15, 5 files, -216/+98)
      This area of the code has always been a bit delicate due to the
      subtleties of lock ordering. The problem is that for "normal"
      alloc/dealloc, we always grab the inode locks first and the rgrp lock
      later.

      In order to ensure no races in looking up the unlinked, but still
      allocated inodes, we need to hold the rgrp lock when we do the lookup,
      which means that we can't take the inode glock.

      The solution is to borrow the technique already used by NFS to solve
      what is essentially the same problem (given an inode number, look up
      the inode carefully, checking that it really is in the expected state).

      We cannot do that directly from the allocation code (lock ordering
      again) so we give the job to the pre-existing delete workqueue and
      carry on with the allocation as normal.

      If we find there is no space, we do a journal flush (required anyway if
      space from a deallocation is to be released) which should block against
      the pending deallocations, so we should always get the space back.

      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>