aboutsummaryrefslogtreecommitdiffstats
path: root/fs
Commit message (Collapse)AuthorAgeFilesLines
* Merge branch 'for-linus' of git://oss.sgi.com/xfs/xfsLinus Torvalds2011-07-082-8/+10
|\ | | | | | | | | * 'for-linus' of git://oss.sgi.com/xfs/xfs: xfs: unpin stale inodes directly in IOP_COMMITTED
| * xfs: unpin stale inodes directly in IOP_COMMITTEDDave Chinner2011-07-062-8/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When inodes are marked stale in a transaction, they are treated specially when the inode log item is being inserted into the AIL. It tries to avoid moving the log item forward in the AIL due to a race condition with the writing the underlying buffer back to disk. The was "fixed" in commit de25c18 ("xfs: avoid moving stale inodes in the AIL"). To avoid moving the item forward, we return a LSN smaller than the commit_lsn of the completing transaction, thereby trying to trick the commit code into not moving the inode forward at all. I'm not sure this ever worked as intended - it assumes the inode is already in the AIL, but I don't think the returned LSN would have been small enough to prevent moving the inode. It appears that the reason it worked is that the lower LSN of the inodes meant they were inserted into the AIL and flushed before the inode buffer (which was moved to the commit_lsn of the transaction). The big problem is that with delayed logging, the returning of the different LSN means insertion takes the slow, non-bulk path. Worse yet is that insertion is to a position -before- the commit_lsn so it is doing a AIL traversal on every insertion, and has to walk over all the items that have already been inserted into the AIL. It's expensive. To compound the matter further, with delayed logging inodes are likely to go from clean to stale in a single checkpoint, which means they aren't even in the AIL at all when we come across them at AIL insertion time. Hence these were all getting inserted into the AIL when they simply do not need to be as inodes marked XFS_ISTALE are never written back. Transactional/recovery integrity is maintained in this case by the other items in the unlink transaction that were modified (e.g. the AGI btree blocks) and committed in the same checkpoint. So to fix this, simply unpin the stale inodes directly in xfs_inode_item_committed() and return -1 to indicate that the AIL insertion code does not need to do any further processing of these inodes. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com>
* | FS-Cache: Add a helper to bulk uncache pages on an inodeDavid Howells2011-07-073-5/+48
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add an FS-Cache helper to bulk uncache pages on an inode. This will only work for the circumstance where the pages in the cache correspond 1:1 with the pages attached to an inode's page cache. This is required for CIFS and NFS: When disabling inode cookie, we were returning the cookie and setting cifsi->fscache to NULL but failed to invalidate any previously mapped pages. This resulted in "Bad page state" errors and manifested in other kind of errors when running fsstress. Fix it by uncaching mapped pages when we disable the inode cookie. This patch should fix the following oops and "Bad page state" errors seen during fsstress testing. ------------[ cut here ]------------ kernel BUG at fs/cachefiles/namei.c:201! invalid opcode: 0000 [#1] SMP Pid: 5, comm: kworker/u:0 Not tainted 2.6.38.7-30.fc15.x86_64 #1 Bochs Bochs RIP: 0010: cachefiles_walk_to_object+0x436/0x745 [cachefiles] RSP: 0018:ffff88002ce6dd00 EFLAGS: 00010282 RAX: ffff88002ef165f0 RBX: ffff88001811f500 RCX: 0000000000000000 RDX: 0000000000000000 RSI: 0000000000000100 RDI: 0000000000000282 RBP: ffff88002ce6dda0 R08: 0000000000000100 R09: ffffffff81b3a300 R10: 0000ffff00066c0a R11: 0000000000000003 R12: ffff88002ae54840 R13: ffff88002ae54840 R14: ffff880029c29c00 R15: ffff88001811f4b0 FS: 00007f394dd32720(0000) GS:ffff88002ef00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: 00007fffcb62ddf8 CR3: 000000001825f000 CR4: 00000000000006e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Process kworker/u:0 (pid: 5, threadinfo ffff88002ce6c000, task ffff88002ce55cc0) Stack: 0000000000000246 ffff88002ce55cc0 ffff88002ce6dd58 ffff88001815dc00 ffff8800185246c0 ffff88001811f618 ffff880029c29d18 ffff88001811f380 ffff88002ce6dd50 ffffffff814757e4 ffff88002ce6dda0 ffffffff8106ac56 Call Trace: cachefiles_lookup_object+0x78/0xd4 [cachefiles] fscache_lookup_object+0x131/0x16d [fscache] fscache_object_work_func+0x1bc/0x669 [fscache] process_one_work+0x186/0x298 worker_thread+0xda/0x15d kthread+0x84/0x8c kernel_thread_helper+0x4/0x10 RIP cachefiles_walk_to_object+0x436/0x745 [cachefiles] ---[ end trace 1d481c9af1804caa ]--- I tested the uncaching by the following means: (1) Create a big file on my NFS server (104857600 bytes). (2) Read the file into the cache with md5sum on the NFS client. Look in /proc/fs/fscache/stats: Pages : mrk=25601 unc=0 (3) Open the file for read/write ("bash 5<>/warthog/bigfile"). Look in proc again: Pages : mrk=25601 unc=25601 Reported-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-and-Tested-by: Suresh Jayaraman <sjayaraman@suse.de> cc: stable@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | FDPIC: Fix memory leakDavidlohr Bueso2011-07-061-0/+1
| | | | | | | | | | | | | | | | | | The shdr4extnum variable isn't being freed in the cleanup process of elf_fdpic_core_dump(). Signed-off-by: Davidlohr Bueso <dave@gnu.org> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | fs: fix lock initializationMiklos Szeredi2011-07-061-10/+20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | locks_alloc_lock() assumed that the allocated struct file_lock is already initialized to zero members. This is only true for the first allocation of the structure, after reuse some of the members will have random values. This will for example result in passing random fl_start values to userspace in fuse for FL_FLOCK locks, which is an information leak at best. Fix by reinitializing those members which may be non-zero after freeing. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> CC: stable@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | Merge branch 'for-linus' of ↵Linus Torvalds2011-07-051-8/+6
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: ceph: fix sync and dio writes across stripe boundaries libceph: fix page calculation for non-page-aligned io ceph: fix page alignment corrections
| * | ceph: fix sync and dio writes across stripe boundariesSage Weil2011-06-131-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | We were iterating across stripe boundaries properly, but not moving the write buffer pointer forward. This caused us to rewrite the same data after the break. Fix by adjusting the data pointer forward, and recalculating the io and buffer alignment after the break. Signed-off-by: Sage Weil <sage@newdream.net>
| * | ceph: fix page alignment correctionsSage Weil2011-06-131-5/+3
| | | | | | | | | | | | | | | | | | | | | | | | dd if=/dev/urandom of=/mnt/fs_depot/dd10 bs=500 seek=8388 count=1 dd if=/mnt/fs_depot/dd10 of=/root/dd10out bs=500 skip=8388 count=1 Reported-by: Henry C Chang <henry.cy.chang@gmail.com> Signed-off-by: Sage Weil <sage@newdream.net>
* | | Merge branch 'for-linus' of ↵Linus Torvalds2011-07-052-3/+6
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/hch/hfsplus * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/hch/hfsplus: hfsplus: Fix double iput of the same inode in hfsplus_fill_super() hfsplus: add missing call to bio_put()
| * | | hfsplus: Fix double iput of the same inode in hfsplus_fill_super()Alexey Khoroshilov2011-06-301-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There is a misprint in resource deallocation code on error path in hfsplus_fill_super(): the sbi->alloc_file inode is iput twice, while the root inode in not iput at all. Found by Linux Driver Verification project (linuxtesting.org). Signed-off-by: Alexey Khoroshilov <khoroshilov@ispras.ru> Signed-off-by: Christoph Hellwig <hch@lst.de>
| * | | hfsplus: add missing call to bio_put()Seth Forshee2011-06-301-2/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | hfsplus leaks bio objects by failing to call bio_put() on the bios it allocates. Add the missing call to fix the leak. Signed-off-by: Seth Forshee <seth.forshee@canonical.com> Cc: <stable@kernel.org> # .38.x, .39.x Signed-off-by: Christoph Hellwig <hch@lst.de>
* | | | cifs: set socket send and receive timeouts before attempting connectJeff Layton2011-07-011-8/+8
|/ / / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Benjamin S. reported that he was unable to suspend his machine while it had a cifs share mounted. The freezer caused this to spew when he tried it: -----------------------[snip]------------------ PM: Syncing filesystems ... done. Freezing user space processes ... (elapsed 0.01 seconds) done. Freezing remaining freezable tasks ... Freezing of tasks failed after 20.01 seconds (1 tasks refusing to freeze, wq_busy=0): cifsd S ffff880127f7b1b0 0 1821 2 0x00800000 ffff880127f7b1b0 0000000000000046 ffff88005fe008a8 ffff8800ffffffff ffff880127cee6b0 0000000000011100 ffff880127737fd8 0000000000004000 ffff880127737fd8 0000000000011100 ffff880127f7b1b0 ffff880127736010 Call Trace: [<ffffffff811e85dd>] ? sk_reset_timer+0xf/0x19 [<ffffffff8122cf3f>] ? tcp_connect+0x43c/0x445 [<ffffffff8123374e>] ? tcp_v4_connect+0x40d/0x47f [<ffffffff8126ce41>] ? schedule_timeout+0x21/0x1ad [<ffffffff8126e358>] ? _raw_spin_lock_bh+0x9/0x1f [<ffffffff811e81c7>] ? release_sock+0x19/0xef [<ffffffff8123e8be>] ? inet_stream_connect+0x14c/0x24a [<ffffffff8104485b>] ? autoremove_wake_function+0x0/0x2a [<ffffffffa02ccfe2>] ? ipv4_connect+0x39c/0x3b5 [cifs] [<ffffffffa02cd7b7>] ? cifs_reconnect+0x1fc/0x28a [cifs] [<ffffffffa02cdbdc>] ? cifs_demultiplex_thread+0x397/0xb9f [cifs] [<ffffffff81076afc>] ? perf_event_exit_task+0xb9/0x1bf [<ffffffffa02cd845>] ? cifs_demultiplex_thread+0x0/0xb9f [cifs] [<ffffffffa02cd845>] ? cifs_demultiplex_thread+0x0/0xb9f [cifs] [<ffffffff810444a1>] ? kthread+0x7a/0x82 [<ffffffff81002d14>] ? kernel_thread_helper+0x4/0x10 [<ffffffff81044427>] ? kthread+0x0/0x82 [<ffffffff81002d10>] ? kernel_thread_helper+0x0/0x10 Restarting tasks ... done. -----------------------[snip]------------------ We do attempt to perform a try_to_freeze in cifs_reconnect, but the connection attempt itself seems to be taking longer than 20s to time out. The connect timeout is governed by the socket send and receive timeouts, so we can shorten that period by setting those timeouts before attempting the connect instead of after. Adam Williamson tested the patch and said that it seems to have fixed suspending on his laptop when a cifs share is mounted. Reported-by: Benjamin S <da_joind@gmx.net> Tested-by: Adam Williamson <awilliam@redhat.com> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>
* | | proc: restrict access to /proc/PID/ioVasiliy Kulikov2011-06-281-2/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | /proc/PID/io may be used for gathering private information. E.g. for openssh and vsftpd daemons wchars/rchars may be used to learn the precise password length. Restrict it to processes being able to ptrace the target process. ptrace_may_access() is needed to prevent keeping open file descriptor of "io" file, executing setuid binary and gathering io information of the setuid'ed process. Signed-off-by: Vasiliy Kulikov <segoon@openwall.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | mm: fix assertion mapping->nrpages == 0 in end_writeback()Jan Kara2011-06-271-0/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Under heavy memory and filesystem load, users observe the assertion mapping->nrpages == 0 in end_writeback() trigger. This can be caused by page reclaim reclaiming the last page from a mapping in the following race: CPU0 CPU1 ... shrink_page_list() __remove_mapping() __delete_from_page_cache() radix_tree_delete() evict_inode() truncate_inode_pages() truncate_inode_pages_range() pagevec_lookup() - finds nothing end_writeback() mapping->nrpages != 0 -> BUG page->mapping = NULL mapping->nrpages-- Fix the problem by doing a reliable check of mapping->nrpages under mapping->tree_lock in end_writeback(). Analyzed by Jay <jinshan.xiong@whamcloud.com>, lost in LKML, and dug out by Miklos Szeredi <mszeredi@suse.de>. Cc: Jay <jinshan.xiong@whamcloud.com> Cc: Miklos Szeredi <mszeredi@suse.de> Signed-off-by: Jan Kara <jack@suse.cz> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | romfs: fix romfs_get_unmapped_area() argument checkBob Liu2011-06-271-2/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | romfs_get_unmapped_area() checks argument `len' without considering PAGE_ALIGN which will cause do_mmap_pgoff() return -EINVAL error after commit f67d9b1576c ("nommu: add page_align to mmap"). Fix the check by changing it in same way ramfs_nommu_get_unmapped_area() was changed in ramfs/file-nommu.c. Signed-off-by: Bob Liu <lliubbo@gmail.com> Cc: David Howells <dhowells@redhat.com> Cc: Paul Mundt <lethal@linux-sh.org> Acked-by: Greg Ungerer <gerg@snapgear.com> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | Merge branch 'for-linus' of ↵Linus Torvalds2011-06-275-31/+100
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable: btrfs: fix inconsonant inode information Btrfs: make sure to update total_bitmaps when freeing cache V3 Btrfs: fix type mismatch in find_free_extent() Btrfs: make sure to record the transid in new inodes
| * | | btrfs: fix inconsonant inode informationMiao Xie2011-06-273-26/+91
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When iputting the inode, We may leave the delayed nodes if they have some delayed items that have not been dealt with. So when the inode is read again, we must look up the relative delayed node, and use the information in it to initialize the inode. Or we will get inconsonant inode information, it may cause that the same directory index number is allocated again, and hit the following oops: [ 5447.554187] err add delayed dir index item(name: pglog_0.965_0) into the insertion tree of the delayed node(root id: 262, inode id: 258, errno: -17) [ 5447.569766] ------------[ cut here ]------------ [ 5447.575361] kernel BUG at fs/btrfs/delayed-inode.c:1301! [SNIP] [ 5447.790721] Call Trace: [ 5447.793191] [<ffffffffa0641c4e>] btrfs_insert_dir_item+0x189/0x1bb [btrfs] [ 5447.800156] [<ffffffffa0651a45>] btrfs_add_link+0x12b/0x191 [btrfs] [ 5447.806517] [<ffffffffa0651adc>] btrfs_add_nondir+0x31/0x58 [btrfs] [ 5447.812876] [<ffffffffa0651d6a>] btrfs_create+0xf9/0x197 [btrfs] [ 5447.818961] [<ffffffff8111f840>] vfs_create+0x72/0x92 [ 5447.824090] [<ffffffff8111fa8c>] do_last+0x22c/0x40b [ 5447.829133] [<ffffffff8112076a>] path_openat+0xc0/0x2ef [ 5447.834438] [<ffffffff810c58e2>] ? __perf_event_task_sched_out+0x24/0x44 [ 5447.841216] [<ffffffff8103ecdd>] ? perf_event_task_sched_out+0x59/0x67 [ 5447.847846] [<ffffffff81121a79>] do_filp_open+0x3d/0x87 [ 5447.853156] [<ffffffff811e126c>] ? strncpy_from_user+0x43/0x4d [ 5447.859072] [<ffffffff8111f1f5>] ? getname_flags+0x2e/0x80 [ 5447.864636] [<ffffffff8111f179>] ? do_getname+0x14b/0x173 [ 5447.870112] [<ffffffff8111f1b7>] ? audit_getname+0x16/0x26 [ 5447.875682] [<ffffffff8112b1ab>] ? spin_lock+0xe/0x10 [ 5447.880882] [<ffffffff81112d39>] do_sys_open+0x69/0xae [ 5447.886153] [<ffffffff81112db1>] sys_open+0x20/0x22 [ 5447.891114] [<ffffffff813b9aab>] system_call_fastpath+0x16/0x1b Fix it by reusing the old delayed node. Reported-by: Jim Schutt <jaschut@sandia.gov> Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Tested-by: Jim Schutt <jaschut@sandia.gov> Signed-off-by: Chris Mason <chris.mason@oracle.com>
| * | | Btrfs: make sure to update total_bitmaps when freeing cache V3Josef Bacik2011-06-251-3/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | A user reported this bug again where we have more bitmaps than we are supposed to. This is because we failed to load the free space cache, but don't update the ctl->total_bitmaps counter when we remove entries from the tree. This patch fixes this problem and we should be good to go again. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
| * | | Btrfs: fix type mismatch in find_free_extent()Ilya Dryomov2011-06-251-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | data parameter should be u64 because a full-sized chunk flags field is passed instead of 0/1 for distinguishing data from metadata. All underlying functions expect u64. Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
| * | | Btrfs: make sure to record the transid in new inodesChris Mason2011-06-241-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When we create a new inode, we aren't filling in the field that records the transaction that last changed this inode. If we then go to fsync that inode, it will be skipped because the field isn't filled in. Signed-off-by: Chris Mason <chris.mason@oracle.com>
* | | | Merge branch 'for-linus' of git://oss.sgi.com/xfs/xfsLinus Torvalds2011-06-274-6/+31
|\ \ \ \ | | |_|/ | |/| | | | | | | | | | | | | | | | | | * 'for-linus' of git://oss.sgi.com/xfs/xfs: xfs: prevent bogus assert when trying to remove non-existent attribute xfs: clear XFS_IDIRTY_RELEASE on truncate down xfs: reset inode per-lifetime state when recycling it
| * | | xfs: prevent bogus assert when trying to remove non-existent attributeDave Chinner2011-06-231-0/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If the attribute fork on an inode is in btree format and has multiple levels (i.e node format rather than leaf format), then a lookup failure will trigger an assert failure in xfs_da_path_shift if the flag XFS_DA_OP_OKNOENT is not set. This flag is used to indicate to the directory btree code that not finding an entry is not a fatal error. In the case of doing a lookup for a directory name removal, this is valid as a user cannot insert an arbitrary name to remove from the directory btree. However, in the case of the attribute tree, a user has direct control over the attribute name and can ask for any random name to be removed without any validation. In this case, fsstress is asking for a non-existent user.selinux attribute to be removed, and that is causing xfs_da_path_shift() to fall off the bottom of the tree where it asserts that a lookup failure is allowed. Because the flag is not set, we die a horrible death on a debug enable kernel. Prevent this assert from firing on attribute removes by adding the op_flag XFS_DA_OP_OKNOENT to atribute removal operations. Discovered when testing on a SELinux enabled system by fsstress in test 070 by trying to remove a non-existent user.selinux attribute. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com>
| * | | xfs: clear XFS_IDIRTY_RELEASE on truncate downDave Chinner2011-06-231-2/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When an inode is truncated down, speculative preallocation is removed from the inode. This should also reset the state bits for controlling whether preallocation is subsequently removed when the file is next closed. The flag is not being cleared, so repeated operations on a file that first involve a truncate (e.g. multiple repeated dd invocations on a file) give different file layouts for the second and subsequent invocations. Fix this by clearing the XFS_IDIRTY_RELEASE state bit when the XFS_ITRUNCATED bit is detected in xfs_release() and hence ensure that speculative delalloc is removed on files that have been truncated down. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com>
| * | | xfs: reset inode per-lifetime state when recycling itDave Chinner2011-06-232-4/+19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | XFS inodes has several per-lifetime state fields that determine the behaviour of the inode. These state fields are not all reset when an inode is reused from the reclaimable state. This can lead to unexpected behaviour of the new inode such as speculative preallocation not being truncated away in the expected manner for local files until the inode is subsequently truncated, freed or cycles out of the cache. It can also lead to an inode being considered to be a filestream inode or having been truncated when that is not the case. Rework the reinitialisation of the inode when it is recycled to ensure that it is pristine before it is reused. While there, also fix the resetting of state flags in the recycling error paths so the inode does not become unreclaimable. Signed-off-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Alex Elder <aelder@sgi.com>
* | | | Merge git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6Linus Torvalds2011-06-262-5/+3
|\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | * git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6: cifs: mark CONFIG_CIFS_NFSD_EXPORT as BROKEN cifs: free blkcipher in smbhash
| * | | | cifs: mark CONFIG_CIFS_NFSD_EXPORT as BROKENJeff Layton2011-06-241-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This does not work properly with CIFS as current servers do not enable support for the FILE_OPEN_BY_FILE_ID on SMB NTCreateX and not all NFS clients handle ESTALE. For now, it just plain doesn't work. Mark it BROKEN to discourage distros from enabling it. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>
| * | | | cifs: free blkcipher in smbhashJeff Layton2011-06-241-4/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is currently leaked in the rc == 0 case. Reported-by: J. Bruce Fields <bfields@fieldses.org> Signed-off-by: Jeff Layton <jlayton@redhat.com> Reviewed-by: Shirish Pargaonkar <shirishpargaonkar@gmail.com> Signed-off-by: Steve French <sfrench@us.ibm.com>
* | | | | Merge branch 'for-linus' of ↵Linus Torvalds2011-06-264-116/+98
|\ \ \ \ \ | |/ / / / |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: cifs: propagate errors from cifs_get_root() to mount(2) cifs: tidy cifs_do_mount() up a bit cifs: more breakage on mount failures cifs: close sget() races cifs: pull freeing mountdata/dropping nls/freeing cifs_sb into cifs_umount() cifs: move cifs_umount() call into ->kill_sb() cifs: pull cifs_mount() call up sanitize cifs_umount() prototype cifs: initialize ->tlink_tree in cifs_setup_cifs_sb() cifs: allocate mountdata earlier cifs: leak on mount if we share superblock cifs: don't pass superblock to cifs_mount() cifs: don't leak nls on mount failure cifs: double free on mount failure take bdi setup/destruction into cifs_mount/cifs_umount Acked-by: Steve French <smfrench@gmail.com>
| * | | | cifs: propagate errors from cifs_get_root() to mount(2)Al Viro2011-06-241-7/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ... instead of just failing with -EINVAL Acked-by: Pavel Shilovsky <piastryyy@gmail.com> Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * | | | cifs: tidy cifs_do_mount() up a bitAl Viro2011-06-241-22/+22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Acked-by: Pavel Shilovsky <piastryyy@gmail.com> Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * | | | cifs: more breakage on mount failuresAl Viro2011-06-241-7/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | if cifs_get_root() fails, we end up with ->mount() returning NULL, which is not what callers expect. Moreover, in case of superblock reuse we end up leaking a superblock reference... Acked-by: Pavel Shilovsky <piastryyy@gmail.com> Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * | | | cifs: close sget() racesAl Viro2011-06-241-3/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | have ->s_fs_info set by the set() callback passed to sget() Acked-by: Pavel Shilovsky <piastryyy@gmail.com> Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * | | | cifs: pull freeing mountdata/dropping nls/freeing cifs_sb into cifs_umount()Al Viro2011-06-242-14/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | all callers of cifs_umount() proceed to do the same thing; pull it into cifs_umount() itself. Acked-by: Pavel Shilovsky <piastryyy@gmail.com> Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * | | | cifs: move cifs_umount() call into ->kill_sb()Al Viro2011-06-241-18/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | instead of calling it manually in case if cifs_read_super() fails to set ->s_root, just call it from ->kill_sb(). cifs_put_super() is gone now *and* we have cifs_sb shutdown and destruction done after the superblock is gone from ->s_instances. Acked-by: Pavel Shilovsky <piastryyy@gmail.com> Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * | | | cifs: pull cifs_mount() call upAl Viro2011-06-241-12/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ... to the point prior to sget(). Now we have cifs_sb set up early enough. Acked-by: Pavel Shilovsky <piastryyy@gmail.com> Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * | | | sanitize cifs_umount() prototypeAl Viro2011-06-243-8/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | a) superblock argument is unused b) it always returns 0 Acked-by: Pavel Shilovsky <piastryyy@gmail.com> Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * | | | cifs: initialize ->tlink_tree in cifs_setup_cifs_sb()Al Viro2011-06-242-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | no need to wait until cifs_read_super() and we need it done by the time cifs_mount() will be called. Acked-by: Pavel Shilovsky <piastryyy@gmail.com> Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * | | | cifs: allocate mountdata earlierAl Viro2011-06-241-11/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | pull mountdata allocation up, so that it won't stand in the way when we lift cifs_mount() to location before sget(). Acked-by: Pavel Shilovsky <piastryyy@gmail.com> Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * | | | cifs: leak on mount if we share superblockAl Viro2011-06-241-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | cifs_sb and nls end up leaked... Acked-by: Pavel Shilovsky <piastryyy@gmail.com> Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * | | | cifs: don't pass superblock to cifs_mount()Al Viro2011-06-244-22/+26
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | To close sget() races we'll need to be able to set cifs_sb up before we get the superblock, so we'll want to be able to do cifs_mount() earlier. Fortunately, it's easy to do - setting ->s_maxbytes can be done in cifs_read_super(), ditto for ->s_time_gran and as for putting MS_POSIXACL into ->s_flags, we can mirror it in ->mnt_cifs_flags until cifs_read_super() is called. Kill unused 'devname' argument, while we are at it... Acked-by: Pavel Shilovsky <piastryyy@gmail.com> Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * | | | cifs: don't leak nls on mount failureAl Viro2011-06-241-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | if cifs_sb allocation fails, we still need to drop nls we'd stashed into volume_info - the one we would've copied to cifs_sb if we could allocate the latter. Acked-by: Pavel Shilovsky <piastryyy@gmail.com> Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * | | | cifs: double free on mount failureAl Viro2011-06-241-6/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | if we get to out_super with ->s_root already set (e.g. with cifs_get_root() failure), we'll end up with cifs_put_super() called and ->mountdata freed twice. We'll also get cifs_sb freed twice and cifs_sb->local_nls dropped twice. The problem is, we can get to out_super both with and without ->s_root, which makes ->put_super() a bad place for such work. Switch to ->kill_sb(), have all that work done there after kill_anon_super(). Unlike ->put_super(), ->kill_sb() is called by deactivate_locked_super() whether we have ->s_root or not. Acked-by: Pavel Shilovsky <piastryyy@gmail.com> Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * | | | take bdi setup/destruction into cifs_mount/cifs_umountAl Viro2011-06-242-11/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Acked-by: Pavel Shilovsky <piastryyy@gmail.com> Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | | | | Merge branch 'for-linus' of git://git.kernel.dk/linux-blockLinus Torvalds2011-06-241-1/+13
|\ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * 'for-linus' of git://git.kernel.dk/linux-block: block: add REQ_SECURE to REQ_COMMON_MASK block: use the passed in @bdev when claiming if partno is zero block: Add __attribute__((format(printf...) and fix fallout block: make disk_block_events() properly wait for work cancellation block: remove non-syncing __disk_block_events() and fold it into disk_block_events() block: don't use non-syncing event blocking in disk_check_events() cfq-iosched: fix locking around ioc->ioc_data assignment
| * | | | | block: use the passed in @bdev when claiming if partno is zeroTejun Heo2011-06-131-1/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | 6b4517a791 (block: implement bd_claiming and claiming block) introduced claiming block to support O_EXCL blkdev opens properly. bd_start_claiming() looks up the part 0 bdev and starts claiming block. The function assumed that there is only one part 0 bdev and always used bdget_disk(disk, 0) to look it up; unfortunately, this isn't true for some drivers (floppy) which use multiple block devices to denote different operating parameters for the same physical device. There can be multiple part 0 bdev's for the same device number. This incorrect assumption caused the wrong bdev to be used during claiming leading to unbalanced bd_holders as reported in the following bug. https://bugzilla.kernel.org/show_bug.cgi?id=28522 This patch updates bd_start_claiming() such that it uses the bdev specified as argument if its partno is zero. Note that this means that different bdev's can be used for the same device and O_EXCL check can be effectively bypassed. It has always been broken that way and floppy is fortunately on its way out. Leave that breakage alone. Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: Alex Villacis Lasso <avillaci@ceibo.fiec.espol.edu.ec> Tested-by: Alex Villacis Lasso <avillaci@ceibo.fiec.espol.edu.ec> Cc: stable@kernel.org # >= v2.6.36 Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
* | | | | | Merge git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6Linus Torvalds2011-06-241-14/+25
|\ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6: cifs: fix wsize negotiation to respect max buffer size and active signing (try #4) CIFS: Fix problem with 3.0-rc1 null user mount failure
| * | | | | | cifs: fix wsize negotiation to respect max buffer size and active signing ↵Jeff Layton2011-06-231-13/+20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | (try #4) Hopefully last version. Base signing check on CAP_UNIX instead of tcon->unix_ext, also clean up the comments a bit more. According to Hongwei Sun's blog posting here: http://blogs.msdn.com/b/openspecification/archive/2009/04/10/smb-maximum-transmit-buffer-size-and-performance-tuning.aspx CAP_LARGE_WRITEX is ignored when signing is active. Also, the maximum size for a write without CAP_LARGE_WRITEX should be the maxBuf that the server sent in the NEGOTIATE request. Fix the wsize negotiation to take this into account. While we're at it, alter the other wsize definitions to use sizeof(WRITE_REQ) to allow for slightly larger amounts of data to potentially be written per request. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>
| * | | | | | CIFS: Fix problem with 3.0-rc1 null user mount failurePavel Shilovsky2011-06-221-1/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Figured it out: it was broken by b946845a9dc523c759cae2b6a0f6827486c3221a commit - "cifs: cifs_parse_mount_options: do not tokenize mount options in-place". So, as a quick fix I suggest to apply this patch. [PATCH] CIFS: Fix kfree() with constant string in a null user case Signed-off-by: Pavel Shilovsky <piastryyy@gmail.com> Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>
* | | | | | | Remove unneeded version.h includes from fs/Jesper Juhl2011-06-242-2/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | It was pointed out by 'make versioncheck' that some includes of linux/version.h were not needed in fs/ (fs/btrfs/ctree.h and fs/omfs/file.c). This patch removes them. Signed-off-by: Jesper Juhl <jj@chaosbits.net> Acked-by: Bob Copeland <me@bobcopeland.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | | | | | Merge branch 'for-linus' of ↵Linus Torvalds2011-06-224-12/+11
|\ \ \ \ \ \ \ | |/ / / / / / |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/shaggy/jfs-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/shaggy/jfs-2.6: jfs: agstart field must be 64 bits JFS: Don't save agno in the inode jfs: Update agstart when resizing volume jfs: old_agsize should be 64 bits in jfs_extendfs