aboutsummaryrefslogtreecommitdiffstats
path: root/fs
Commit message (Collapse)AuthorAgeFilesLines
* Merge git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstableLinus Torvalds2010-11-2915-114/+572
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable: (24 commits) Btrfs: don't use migrate page without CONFIG_MIGRATION Btrfs: deal with DIO bios that span more than one ordered extent Btrfs: setup blank root and fs_info for mount time Btrfs: fix fiemap Btrfs - fix race between btrfs_get_sb() and umount Btrfs: update inode ctime when using links Btrfs: make sure new inode size is ok in fallocate Btrfs: fix typo in fallocate to make it honor actual size Btrfs: avoid NULL pointer deref in try_release_extent_buffer Btrfs: make btrfs_add_nondir take parent inode as an argument Btrfs: hold i_mutex when calling btrfs_log_dentry_safe Btrfs: use dget_parent where we can UPDATED Btrfs: fix more ESTALE problems with NFS Btrfs: handle NFS lookups properly btrfs: make 1-bit signed fileds unsigned btrfs: Show device attr correctly for symlinks btrfs: Set file size correctly in file clone btrfs: Check if dest_offset is block-size aligned before cloning file Btrfs: handle the space_cache option properly btrfs: Fix early enospc because 'unused' calculated with wrong sign. ...
| * Btrfs: don't use migrate page without CONFIG_MIGRATIONChris Mason2010-11-291-1/+6
| | | | | | | | | | | | Fixes compile error Signed-off-by: Chris Mason <chris.mason@oracle.com>
| * Btrfs: deal with DIO bios that span more than one ordered extentChris Mason2010-11-283-4/+89
| | | | | | | | | | | | | | | | | | | | | | | | | | The new DIO bio splitting code has problems when the bio spans more than one ordered extent. This will happen as the generic DIO code merges our get_blocks calls together into a bigger single bio. This fixes things by walking forward in the ordered extent code finding all the overlapping ordered extents and completing them all at once. Signed-off-by: Chris Mason <chris.mason@oracle.com>
| * Btrfs: setup blank root and fs_info for mount timeJosef Bacik2010-11-272-7/+33
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There is a problem with how we use sget, it searches through the list of supers attached to the fs_type looking for a super with the same fs_devices as what we're trying to mount. This depends on sb->s_fs_info being filled, but we don't fill that in until we get to btrfs_fill_super, so we could hit supers on the fs_type super list that have a null s_fs_info. In order to fix that we need to go ahead and setup a blank root with a blank fs_info to hold fs_devices, that way our test will work out right and then we can set s_fs_info in btrfs_set_super, and then open_ctree will simply use our pre-allocated root and fs_info when setting everything up. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
| * Btrfs: fix fiemapJosef Bacik2010-11-271-9/+54
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There are two big problems currently with FIEMAP 1) We return extents for holes. This isn't supposed to happen, we just don't return extents for holes and then userspace interprets the lack of an extent as a hole. 2) We sometimes don't set FIEMAP_EXTENT_LAST properly. This is because we wait to see a EXTENT_FLAG_VACANCY flag on the em, but this won't happen if say we ask fiemap to map up to the last extent in a file, and there is nothing but holes up to the i_size. To fix this we need to lookup the last extent in this file and save the logical offset, so if we happen to try and map that extent we can be sure to set FIEMAP_EXTENT_LAST. With this patch we now pass xfstest 225, which we never have before. Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
| * Btrfs - fix race between btrfs_get_sb() and umountIan Kent2010-11-271-0/+6
| | | | | | | | | | | | | | | | | | | | | | When mounting a btrfs file system btrfs_test_super() may attempt to use sb->s_fs_info, the btrfs root, of a super block that is going away and that has had the btrfs root set to NULL in its ->put_super(). But if the super block is going away it cannot be an existing super block so we can return false in this case. Signed-off-by: Ian Kent <raven@themaw.net> Signed-off-by: Chris Mason <chris.mason@oracle.com>
| * Btrfs: update inode ctime when using linksJosef Bacik2010-11-271-0/+1
| | | | | | | | | | | | | | | | Currently we fail xfstest 236 because we're not updating the inode ctime on link. This is a simple fix, and makes it so we pass 236 now. Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
| * Btrfs: make sure new inode size is ok in fallocateJosef Bacik2010-11-271-0/+4
| | | | | | | | | | | | | | | | | | | | We have been failing xfstest 228 forever, because we don't check to make sure the new inode size is acceptable as far as RLIMIT is concerned. Just check to make sure it's ok to create a inode with this new size and error out if not. With this patch we now pass 228. Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
| * Btrfs: fix typo in fallocate to make it honor actual sizeJosef Bacik2010-11-271-4/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | There is a typo in __btrfs_prealloc_file_range() where we set the i_size to actual_len/cur_offset, and then just set it to cur_offset again, and do the same with btrfs_ordered_update_i_size(). This fixes it back to keeping i_size in a local variable and then updating i_size properly. Tested this with xfs_io -F -f -c "falloc 0 1" -c "pwrite 0 1" foo stat'ing foo gives us a size of 1 instead of 4096 like it was. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
| * Btrfs: avoid NULL pointer deref in try_release_extent_bufferChris Mason2010-11-211-2/+4
| | | | | | | | | | | | | | If we fail to find a pointer in the radix tree, don't try to deref the NULL one we do have. Signed-off-by: Chris Mason <chris.mason@oracle.com>
| * Btrfs: make btrfs_add_nondir take parent inode as an argumentJosef Bacik2010-11-211-22/+16
| | | | | | | | | | | | | | | | | | | | | | | | Everybody who calls btrfs_add_nondir just passes in the dentry of the new file and then dereference dentry->d_parent->d_inode, but everybody who calls btrfs_add_nondir() are already passed the parent's inode. So instead of dereferencing dentry->d_parent, just make btrfs_add_nondir take the dir inode as an argument and pass that along so we don't have to worry about d_parent. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
| * Btrfs: hold i_mutex when calling btrfs_log_dentry_safeJosef Bacik2010-11-211-0/+7
| | | | | | | | | | | | | | | | | | | | | | | | Since we walk up the path logging all of the parts of the inode's path, we need to hold i_mutex to make sure that the inode is not renamed while we're logging everything. btrfs_log_dentry_safe does dget_parent and all of that jazz, but we may get unexpected results if the rename changes the inode's location while we're higher up the path logging those dentries, so do this for safety reasons. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
| * Btrfs: use dget_parent where we can UPDATEDJosef Bacik2010-11-214-12/+43
| | | | | | | | | | | | | | | | | | | | | | There are lots of places where we do dentry->d_parent->d_inode without holding the dentry->d_lock. This could cause problems with rename. So instead we need to use dget_parent() and hold the reference to the parent as long as we are going to use it's inode and then dput it at the end. Signed-off-by: Josef Bacik <josef@redhat.com> Cc: raven@themaw.net Signed-off-by: Chris Mason <chris.mason@oracle.com>
| * Btrfs: fix more ESTALE problems with NFSJosef Bacik2010-11-211-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | When creating new inodes we don't setup inode->i_generation. So if we generate an fh with a newly created inode we save the generation of 0, but if we flush the inode to disk and have to read it back when getting the inode on the server we'll have the right i_generation, so gens wont match and we get ESTALE. This patch properly sets inode->i_generation when we create the new inode and now I'm no longer getting ESTALE. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
| * Btrfs: handle NFS lookups properlyJosef Bacik2010-11-211-0/+76
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | People kept reporting NFS issues, specifically getting ESTALE alot. I figured out how to reproduce the problem SERVER mkfs.btrfs /dev/sda1 mount /dev/sda1 /mnt/btrfs-test <add /mnt/btrfs-test to /etc/exports> btrfs subvol create /mnt/btrfs-test/foo service nfs start CLIENT mount server:/mnt/btrfs /mnt/test cd /mnt/test/foo ls SERVER echo 3 > /proc/sys/vm/drop_caches CLIENT ls <-- get an ESTALE here This is because the standard way to lookup a name in nfsd is to use readdir, and what it does is do a readdir on the parent directory looking for the inode of the child. So in this case the parent being / and the child being foo. Well subvols all have the same inode number, so doing a readdir of / looking for inode 256 will return '.', which obviously doesn't match foo. So instead we need to have our own .get_name so that we can find the right name. Our .get_name will either lookup the inode backref or the root backref, whichever we're looking for, and return the name we find. Running the above reproducer with this patch results in everything acting the way its supposed to. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
| * btrfs: make 1-bit signed fileds unsignedMariusz Kozlowski2010-11-211-3/+3
| | | | | | | | | | | | | | | | | | | | Fixes these sparse warnings: fs/btrfs/ctree.h:811:17: error: dubious one-bit signed bitfield fs/btrfs/ctree.h:812:20: error: dubious one-bit signed bitfield fs/btrfs/ctree.h:813:19: error: dubious one-bit signed bitfield Signed-off-by: Mariusz Kozlowski <mk@lab.zgora.pl> Signed-off-by: Chris Mason <chris.mason@oracle.com>
| * btrfs: Show device attr correctly for symlinksLi Zefan2010-11-211-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Symlinks and files of other types show different device numbers, though they are on the same partition: $ touch tmp; ln -s tmp tmp2; stat tmp tmp2 File: `tmp' Size: 0 Blocks: 0 IO Block: 4096 regular empty file Device: 15h/21d Inode: 984027 Links: 1 --- snip --- File: `tmp2' -> `tmp' Size: 3 Blocks: 0 IO Block: 4096 symbolic link Device: 13h/19d Inode: 984028 Links: 1 Reported-by: Toke Høiland-Jørgensen <toke@toke.dk> Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
| * btrfs: Set file size correctly in file cloneLi Zefan2010-11-211-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Set src_offset = 0, src_length = 20K, dest_offset = 20K. And the original filesize of the dest file 'file2' is 30K: # ls -l /mnt/file2 -rw-r--r-- 1 root root 30720 Nov 18 16:42 /mnt/file2 Now clone file1 to file2, the dest file should be 40K, but it still shows 30K: # ls -l /mnt/file2 -rw-r--r-- 1 root root 30720 Nov 18 16:42 /mnt/file2 Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
| * btrfs: Check if dest_offset is block-size aligned before cloning fileLi Zefan2010-11-211-4/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | We've done the check for src_offset and src_length, and We should also check dest_offset, otherwise we'll corrupt the destination file: (After cloning file1 to file2 with unaligned dest_offset) # cat /mnt/file2 cat: /mnt/file2: Input/output error Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
| * Btrfs: handle the space_cache option properlyJosef Bacik2010-11-211-0/+1
| | | | | | | | | | | | | | | | | | | | When I added the clear_cache option I screwed up and took the break out of the space_cache case statement, so whenever you mount with space_cache you also get clear_cache, which does you no good if you say set space_cache in fstab so it always gets set. This patch adds the break back in properly. Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
| * btrfs: Fix early enospc because 'unused' calculated with wrong sign.Arne Jansen2010-11-211-1/+1
| | | | | | | | | | | | | | | | | | 'unused' calculated with wrong sign in reserve_metadata_bytes(). This might have lead to unwanted over-reservations. Signed-off-by: Arne Jansen <sensille@gmx.net> Reviewed-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
| * btrfs: fix panic caused by direct IOMiao Xie2010-11-211-21/+184
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | btrfs paniced when we write >64KB data by direct IO at one time. Reproduce steps: # mkfs.btrfs /dev/sda5 /dev/sda6 # mount /dev/sda5 /mnt # dd if=/dev/zero of=/mnt/tmpfile bs=100K count=1 oflag=direct Then btrfs paniced: mapping failed logical 1103155200 bio len 69632 len 12288 ------------[ cut here ]------------ kernel BUG at fs/btrfs/volumes.c:3010! [SNIP] Pid: 1992, comm: btrfs-worker-0 Not tainted 2.6.37-rc1 #1 D2399/PRIMERGY RIP: 0010:[<ffffffffa03d1462>] [<ffffffffa03d1462>] btrfs_map_bio+0x202/0x210 [btrfs] [SNIP] Call Trace: [<ffffffffa03ab3eb>] __btrfs_submit_bio_done+0x1b/0x20 [btrfs] [<ffffffffa03a35ff>] run_one_async_done+0x9f/0xb0 [btrfs] [<ffffffffa03d3d20>] run_ordered_completions+0x80/0xc0 [btrfs] [<ffffffffa03d45a4>] worker_loop+0x154/0x5f0 [btrfs] [<ffffffffa03d4450>] ? worker_loop+0x0/0x5f0 [btrfs] [<ffffffffa03d4450>] ? worker_loop+0x0/0x5f0 [btrfs] [<ffffffff81083216>] kthread+0x96/0xa0 [<ffffffff8100cec4>] kernel_thread_helper+0x4/0x10 [<ffffffff81083180>] ? kthread+0x0/0xa0 [<ffffffff8100cec0>] ? kernel_thread_helper+0x0/0x10 We fix this problem by splitting bios when we submit bios. Reported-by: Tsutomu Itoh <t-itoh@jp.fujitsu.com> Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Tested-by: Tsutomu Itoh <t-itoh@jp.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
| * btrfs: cleanup duplicate bio allocating functionsMiao Xie2010-11-213-18/+8
| | | | | | | | | | | | | | | | extent_bio_alloc() and compressed_bio_alloc() are similar, cleanup similar source code. Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
| * btrfs: fix free dip and dip->csums twiceMiao Xie2010-11-211-6/+3
| | | | | | | | | | | | | | | | bio_endio() will free dip and dip->csums, so dip and dip->csums twice will be freed twice. Fix it. Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
| * Btrfs: add migrate page for metadata inodeChris Mason2010-11-211-2/+25
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Migrate page will directly call the btrfs btree writepage function, which isn't actually allowed. Our writepage assumes that you have locked the extent_buffer and flagged the block as written. Without doing these steps, we can corrupt metadata blocks. A later commit will remove the btree writepage function since it is really only safely used internally by btrfs. We use writepages for everything else. Signed-off-by: Chris Mason <chris.mason@oracle.com>
* | Merge git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-fixesLinus Torvalds2010-11-291-7/+8
|\ \ | | | | | | | | | | | | * git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-fixes: GFS2: Userland expects quota limit/warn/usage in 512b blocks
| * | GFS2: Userland expects quota limit/warn/usage in 512b blocksAbhijith Das2010-11-191-7/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Userland programs using the quotactl() syscall assume limit/warn/usage block counts in 512b basic blocks which were instead being read/written in fs blocksize in gfs2. With this patch, gfs2 correctly interacts with the syscall using 512b blocks. Signed-off-by: Abhi Das <adas@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
* | | Un-inline get_pipe_info() helper functionLinus Torvalds2010-11-281-0/+12
| | | | | | | | | | | | | | | | | | | | | | | | This avoids some include-file hell, and the function isn't really important enough to be inlined anyway. Reported-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | Export 'get_pipe_info()' to other usersLinus Torvalds2010-11-282-12/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | And in particular, use it in 'pipe_fcntl()'. The other pipe functions do not need to use the 'careful' version, since they are only ever called for things that are already known to be pipes. The normal read/write/ioctl functions are called through the file operations structures, so if a file isn't a pipe, they'd never get called. But pipe_fcntl() is special, and called directly from the generic fcntl code, and needs to use the same careful function that the splice code is using. Cc: Jens Axboe <jaxboe@fusionio.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Dave Jones <davej@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | Rename 'pipe_info()' to 'get_pipe_info()'Linus Torvalds2010-11-281-10/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | .. and change it to take the 'file' pointer instead of an inode, since that's what all users want anyway. The renaming is preparatory to exporting it to other users. The old 'pipe_info()' name was too generic and is already used elsewhere, so before making the function public we need to use a more specific name. Cc: Jens Axboe <jaxboe@fusionio.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Dave Jones <davej@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | Merge branch 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6Linus Torvalds2010-11-276-32/+55
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6: NFS: Ensure we return the dirent->d_type when it is known NFS: Correct the array bound calculation in nfs_readdir_add_to_array NFS: Don't ignore errors from nfs_do_filldir() NFS: Fix the error handling in "uncached_readdir()" NFS: Fix a page leak in uncached_readdir() NFS: Fix a page leak in nfs_do_filldir() NFS: Assume eof if the server returns no readdir records NFS: Buffer overflow in ->decode_dirent() should not be fatal Pure nfs client performance using odirect. SUNRPC: Fix an infinite loop in call_refresh/call_refreshresult
| * | | NFS: Ensure we return the dirent->d_type when it is knownTrond Myklebust2010-11-225-3/+21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Store the dirent->d_type in the struct nfs_cache_array_entry so that we can use it in getdents() calls. This fixes a regression with the new readdir code. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
| * | | NFS: Correct the array bound calculation in nfs_readdir_add_to_arrayTrond Myklebust2010-11-221-4/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | It looks as if the array size calculation in MAX_READDIR_ARRAY does not take the alignment of struct nfs_cache_array_entry into account. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
| * | | NFS: Don't ignore errors from nfs_do_filldir()Trond Myklebust2010-11-221-9/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We should ignore the errors from the filldir callback, and just interpret them as meaning we should exit, however we should definitely pass back ENOMEM errors. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
| * | | NFS: Fix the error handling in "uncached_readdir()"Trond Myklebust2010-11-221-3/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, uncached_readdir() is broken because if fails to handle the results from nfs_readdir_xdr_to_array() correctly. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
| * | | NFS: Fix a page leak in uncached_readdir()Trond Myklebust2010-11-221-2/+3
| | | | | | | | | | | | | | | | Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
| * | | NFS: Fix a page leak in nfs_do_filldir()Trond Myklebust2010-11-221-5/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | nfs_do_filldir() must always free desc->page when it is done, otherwise we end up leaking the page. Also remove unused variable 'dentry'. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
| * | | NFS: Assume eof if the server returns no readdir recordsTrond Myklebust2010-11-221-3/+7
| | | | | | | | | | | | | | | | | | | | | | | | Some servers are known to be buggy w.r.t. this. Deal with them... Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
| * | | NFS: Buffer overflow in ->decode_dirent() should not be fatalTrond Myklebust2010-11-223-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Overflowing the buffer in the readdir ->decode_dirent() should not lead to a fatal error, but rather to an attempt to reread the record in question. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
| * | | Pure nfs client performance using odirect.Arun Bharadwaj2010-11-221-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When an application opens a file with O_DIRECT flag, if the size of the data that is written is equal to wsize, the client sends a WRITE RPC with stable flag set to UNSTABLE followed by a single COMMIT RPC rather than sending a single WRITE RPC with the stable flag set to FILE_SYNC. This a bug. Patch to fix this. Signed-off-by: Arun R Bharadwaj <arun@linux.vnet.ibm.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
* | | | Merge branch 'for-linus' of git://git.kernel.dk/linux-2.6-blockLinus Torvalds2010-11-271-25/+6
|\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * 'for-linus' of git://git.kernel.dk/linux-2.6-block: cciss: fix build for PROC_FS disabled block: fix amiga and atari floppy driver compile warning blk-throttle: Fix calculation of max number of WRITES to be dispatched ioprio: grab rcu_read_lock in sys_ioprio_{set,get}() xen/blkfront: cope with backend that fail empty BLKIF_OP_WRITE_BARRIER requests xen/blkfront: Implement FUA with BLKIF_OP_WRITE_BARRIER xen/blkfront: change blk_shadow.request to proper pointer xen/blkfront: map REQ_FLUSH into a full barrier
| * | | | ioprio: grab rcu_read_lock in sys_ioprio_{set,get}()Greg Thelen2010-11-151-25/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Using: - CONFIG_LOCKUP_DETECTOR=y - CONFIG_PREEMPT=y - CONFIG_LOCKDEP=y - CONFIG_PROVE_LOCKING=y - CONFIG_PROVE_RCU=y found a missing rcu lock during boot on a 512 MiB x86_64 ubuntu vm: =================================================== [ INFO: suspicious rcu_dereference_check() usage. ] --------------------------------------------------- kernel/pid.c:419 invoked rcu_dereference_check() without protection! other info that might help us debug this: rcu_scheduler_active = 1, debug_locks = 0 1 lock held by ureadahead/1355: #0: (tasklist_lock){.+.+..}, at: [<ffffffff8115bc09>] sys_ioprio_set+0x7f/0x29e stack backtrace: Pid: 1355, comm: ureadahead Not tainted 2.6.37-dbg-DEV #1 Call Trace: [<ffffffff8109c10c>] lockdep_rcu_dereference+0xaa/0xb3 [<ffffffff81088cbf>] find_task_by_pid_ns+0x44/0x5d [<ffffffff81088cfa>] find_task_by_vpid+0x22/0x24 [<ffffffff8115bc3e>] sys_ioprio_set+0xb4/0x29e [<ffffffff8147cf21>] ? trace_hardirqs_off_thunk+0x3a/0x3c [<ffffffff8105c409>] sysenter_dispatch+0x7/0x2c [<ffffffff8147cee2>] ? trace_hardirqs_on_thunk+0x3a/0x3f The fix is to: a) grab rcu lock in sys_ioprio_{set,get}() and b) avoid grabbing tasklist_lock. Discussion in: http://marc.info/?l=linux-kernel&m=128951324702889 Signed-off-by: Greg Thelen <gthelen@google.com> Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Oleg Nesterov <oleg@redhat.com> Modified by Jens to remove the now redundant inner rcu lock and unlock since they are now protected by the outer lock. Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
* | | | | Merge branch 'for-linus' of ↵Linus Torvalds2010-11-272-3/+3
|\ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/ryusuke/nilfs2 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ryusuke/nilfs2: nilfs2: fix typo in comment of nilfs_dat_move function nilfs2: nilfs_iget_for_gc() returns ERR_PTR
| * | | | | nilfs2: fix typo in comment of nilfs_dat_move functionRyusuke Konishi2010-11-241-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fixes a typo: "uncommited" -> "uncommitted". Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
| * | | | | nilfs2: nilfs_iget_for_gc() returns ERR_PTRDan Carpenter2010-11-231-2/+2
| | |/ / / | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | nilfs_iget_for_gc() returns an ERR_PTR() on failure and doesn't return NULL. Signed-off-by: Dan Carpenter <error27@gmail.com> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
* | | | | reiserfs: fix inode mutex - reiserfs lock misorderingFrederic Weisbecker2010-11-251-4/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | reiserfs_unpack() locks the inode mutex with reiserfs_mutex_lock_safe() to protect against reiserfs lock dependency. However this protection requires to have the reiserfs lock to be locked. This is the case if reiserfs_unpack() is called by reiserfs_ioctl but not from reiserfs_quota_on() when it tries to unpack tails of quota files. Fix the ordering of the two locks in reiserfs_unpack() to fix this issue. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Reported-by: Markus Gapp <markus.gapp@gmx.net> Reported-by: Jan Kara <jack@suse.cz> Cc: Jeff Mahoney <jeffm@suse.com> Cc: <stable@kernel.org> [2.6.36.x] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | | | pagemap: set pagemap walk limit to PMD boundaryNaoya Horiguchi2010-11-251-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently one pagemap_read() call walks in PAGEMAP_WALK_SIZE bytes (== 512 pages.) But there is a corner case where walk_pmd_range() accidentally runs over a VMA associated with a hugetlbfs file. For example, when a process has mappings to VMAs as shown below: # cat /proc/<pid>/maps ... 3a58f6d000-3a58f72000 rw-p 00000000 00:00 0 7fbd51853000-7fbd51855000 rw-p 00000000 00:00 0 7fbd5186c000-7fbd5186e000 rw-p 00000000 00:00 0 7fbd51a00000-7fbd51c00000 rw-s 00000000 00:12 8614 /hugepages/test then pagemap_read() goes into walk_pmd_range() path and walks in the range 0x7fbd51853000-0x7fbd51a53000, but the hugetlbfs VMA should be handled by walk_hugetlb_range(). Otherwise PMD for the hugepage is considered bad and cleared, which causes undesirable results. This patch fixes it by separating pagemap walk range into one PMD. Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Jun'ichi Nomura <j-nomura@ce.jp.nec.com> Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Matt Mackall <mpm@selenic.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | | | fuse: fix attributes after open(O_TRUNC)Ken Sumrall2010-11-251-0/+10
|/ / / / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The attribute cache for a file was not being cleared when a file is opened with O_TRUNC. If the filesystem's open operation truncates the file ("atomic_o_trunc" feature flag is set) then the kernel should invalidate the cached st_mtime and st_ctime attributes. Also i_size should be explicitly be set to zero as it is used sometimes without refreshing the cache. Signed-off-by: Ken Sumrall <ksumrall@android.com> Cc: Anfei <anfei.zhou@gmail.com> Cc: "Anand V. Avati" <avati@gluster.com> Signed-off-by: Miklos Szeredi <miklos@szeredi.hu> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | | Merge branch 'for_linus' of ↵Linus Torvalds2010-11-195-55/+37
|\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: ext4: Add EXT4_IOC_TRIM ioctl to handle batched discard fs: Do not dispatch FITRIM through separate super_operation ext4: ext4_fill_super shouldn't return 0 on corruption jbd2: fix /proc/fs/jbd2/<dev> when using an external journal ext4: missing unlock in ext4_clear_request_list() ext4: fix setting random pages PageUptodate
| * | | | ext4: Add EXT4_IOC_TRIM ioctl to handle batched discardLukas Czerner2010-11-191-0/+24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Filesystem independent ioctl was rejected as not common enough to be in core vfs ioctl. Since we still need to access to this functionality this commit adds ext4 specific ioctl EXT4_IOC_TRIM to dispatch ext4_trim_fs(). It takes fstrim_range structure as an argument. fstrim_range is definec in the include/linux/fs.h and its definition is as follows. struct fstrim_range { __u64 start; __u64 len; __u64 minlen; } start - first Byte to trim len - number of Bytes to trim from start minlen - minimum extent length to trim, free extents shorter than this number of Bytes will be ignored. This will be rounded up to fs block size. After the FITRIM is done, the number of actually discarded Bytes is stored in fstrim_range.len to give the user better insight on how much storage space has been really released for wear-leveling. Signed-off-by: Lukas Czerner <lczerner@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>