aboutsummaryrefslogtreecommitdiffstats
path: root/fs
Commit message (Collapse)AuthorAgeFilesLines
* writeback: add name to backing_dev_infoJens Axboe2009-09-1110-0/+10
| | | | | | | | This enables us to track who does what and print info. Its main use is catching dirty inodes on the default_backing_dev_info, so we can fix that up. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
* writeback: get rid of pdflush completelyJens Axboe2009-09-111-0/+5
| | | | | | It is now unused, so kill it off. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
* writeback: switch to per-bdi threads for flushing dataJens Axboe2009-09-114-292/+713
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This gets rid of pdflush for bdi writeout and kupdated style cleaning. pdflush writeout suffers from lack of locality and also requires more threads to handle the same workload, since it has to work in a non-blocking fashion against each queue. This also introduces lumpy behaviour and potential request starvation, since pdflush can be starved for queue access if others are accessing it. A sample ffsb workload that does random writes to files is about 8% faster here on a simple SATA drive during the benchmark phase. File layout also seems a LOT more smooth in vmstat: r b swpd free buff cache si so bi bo in cs us sy id wa 0 1 0 608848 2652 375372 0 0 0 71024 604 24 1 10 48 42 0 1 0 549644 2712 433736 0 0 0 60692 505 27 1 8 48 44 1 0 0 476928 2784 505192 0 0 4 29540 553 24 0 9 53 37 0 1 0 457972 2808 524008 0 0 0 54876 331 16 0 4 38 58 0 1 0 366128 2928 614284 0 0 4 92168 710 58 0 13 53 34 0 1 0 295092 3000 684140 0 0 0 62924 572 23 0 9 53 37 0 1 0 236592 3064 741704 0 0 4 58256 523 17 0 8 48 44 0 1 0 165608 3132 811464 0 0 0 57460 560 21 0 8 54 38 0 1 0 102952 3200 873164 0 0 4 74748 540 29 1 10 48 41 0 1 0 48604 3252 926472 0 0 0 53248 469 29 0 7 47 45 where vanilla tends to fluctuate a lot in the creation phase: r b swpd free buff cache si so bi bo in cs us sy id wa 1 1 0 678716 5792 303380 0 0 0 74064 565 50 1 11 52 36 1 0 0 662488 5864 319396 0 0 4 352 302 329 0 2 47 51 0 1 0 599312 5924 381468 0 0 0 78164 516 55 0 9 51 40 0 1 0 519952 6008 459516 0 0 4 78156 622 56 1 11 52 37 1 1 0 436640 6092 541632 0 0 0 82244 622 54 0 11 48 41 0 1 0 436640 6092 541660 0 0 0 8 152 39 0 0 51 49 0 1 0 332224 6200 644252 0 0 4 102800 728 46 1 13 49 36 1 0 0 274492 6260 701056 0 0 4 12328 459 49 0 7 50 43 0 1 0 211220 6324 763356 0 0 0 106940 515 37 1 10 51 39 1 0 0 160412 6376 813468 0 0 0 8224 415 43 0 6 49 45 1 1 0 85980 6452 886556 0 0 4 113516 575 39 1 11 54 34 0 2 0 85968 6452 886620 0 0 0 1640 158 211 0 0 46 54 A 10 disk test with btrfs performs 26% faster with per-bdi flushing. A SSD based writeback test on XFS performs over 20% better as well, with the throughput being very stable around 1GB/sec, where pdflush only manages 750MB/sec and fluctuates wildly while doing so. Random buffered writes to many files behave a lot better as well, as does random mmap'ed writes. A separate thread is added to sync the super blocks. In the long term, adding sync_supers_bdi() functionality could get rid of this thread again. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
* writeback: move dirty inodes from super_block to backing_dev_infoJens Axboe2009-09-112-73/+127
| | | | | | | | | This is a first step at introducing per-bdi flusher threads. We should have no change in behaviour, although sb_has_dirty_inodes() is now ridiculously expensive, as there's no easy way to answer that question. Not a huge problem, since it'll be deleted in subsequent patches. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
* writeback: get rid of generic_sync_sb_inodes() exportJens Axboe2009-09-114-57/+55
| | | | | | | | | | | | This adds two new exported functions: - writeback_inodes_sb(), which only attempts to writeback dirty inodes on this super_block, for WB_SYNC_NONE writeout. - sync_inodes_sb(), which writes out all dirty inodes on this super_block and also waits for the IO to complete. Acked-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
* Merge branch 'lookup-permissions-cleanup'Linus Torvalds2009-09-0923-109/+73
|\ | | | | | | | | | | | | | | | | | | | | | | * lookup-permissions-cleanup: jffs2/jfs/xfs: switch over to 'check_acl' rather than 'permission()' ext[234]: move over to 'check_acl' permission model shmfs: use 'check_acl' instead of 'permission' Make 'check_acl()' a first-class filesystem op Simplify exec_permission_lite(), part 3 Simplify exec_permission_lite() further Simplify exec_permission_lite() logic Do not call 'ima_path_check()' for each path component
| * jffs2/jfs/xfs: switch over to 'check_acl' rather than 'permission()'Linus Torvalds2009-09-0810-32/+14
| | | | | | | | | | | | | | | | | | | | | | This avoids an indirect call in the VFS for each path component lookup. Well, at least as long as you own the directory in question, and the ACL check is unnecessary. Reviewed-by: James Morris <jmorris@namei.org> Acked-by: Serge Hallyn <serue@us.ibm.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
| * ext[234]: move over to 'check_acl' permission modelLinus Torvalds2009-09-0812-36/+18
| | | | | | | | | | | | | | | | | | | | Don't implement per-filesystem 'extX_permission()' functions that have to be called for every path component operation, and instead just expose the actual ACL checking so that the VFS layer can now do it for us. Reviewed-by: James Morris <jmorris@namei.org> Acked-by: Serge Hallyn <serue@us.ibm.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
| * Make 'check_acl()' a first-class filesystem opLinus Torvalds2009-09-081-27/+35
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is stage one in flattening out the callchains for the common permission testing. Rather than have most filesystem implement their own inode->i_op->permission function that just calls back down to the VFS layers 'generic_permission()' with the per-filesystem ACL checking function, the filesystem can just expose its 'check_acl' function directly, and let the VFS layer do everything for it. This is all just preparatory - no filesystem actually enables this yet. Reviewed-by: James Morris <jmorris@namei.org> Acked-by: Serge Hallyn <serue@us.ibm.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
| * Simplify exec_permission_lite(), part 3Linus Torvalds2009-09-081-2/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Don't call down to the generic inode_permission() function just to call the inode-specific permission function - just do it directly. The generic inode_permission() code does things like checking MAY_WRITE and devcgroup_inode_permission(), neither of which are relevant for the light pathname walk permission checks (we always do just MAY_EXEC, and the inode is never a special device). Reviewed-by: James Morris <jmorris@namei.org> Acked-by: Serge Hallyn <serue@us.ibm.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
| * Simplify exec_permission_lite() furtherLinus Torvalds2009-09-081-7/+1
| | | | | | | | | | | | | | | | | | | | | | | | This function is only called for path components that are already known to be directories (they have a '->lookup' method). So don't bother doing that whole S_ISDIR() testing, the whole point of the 'lite()' version is that we know that we are looking at a directory component, and that we're only checking name lookup permission. Reviewed-by: James Morris <jmorris@namei.org> Acked-by: Serge Hallyn <serue@us.ibm.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
| * Simplify exec_permission_lite() logicLinus Torvalds2009-09-081-4/+1
| | | | | | | | | | | | | | | | | | Instead of returning EAGAIN and having the caller do something special for that case, just do the special case directly. Reviewed-by: James Morris <jmorris@namei.org> Acked-by: Serge Hallyn <serue@us.ibm.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
| * Do not call 'ima_path_check()' for each path componentLinus Torvalds2009-09-081-3/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | Not only is that a supremely timing-critical path, but it's hopefully some day going to be lockless for the common case, and ima can't do that. Plus the integrity code doesn't even care about non-regular files, so it was always a total waste of time and effort. Acked-by: Serge Hallyn <serue@us.ibm.com> Acked-by: Mimi Zohar <zohar@us.ibm.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | binfmt_elf: fix PT_INTERP bss handlingRoland McGrath2009-09-091-14/+14
|/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In fs/binfmt_elf.c, load_elf_interp() calls padzero() for .bss even if the PT_LOAD has no PROT_WRITE and no .bss. This generates EFAULT. Here is a small test case. (Yes, there are other, useful PT_INTERP which have only .text and no .data/.bss.) ----- ptinterp.S _start: .globl _start nop int3 ----- $ gcc -m32 -nostartfiles -nostdlib -o ptinterp ptinterp.S $ gcc -m32 -Wl,--dynamic-linker=ptinterp -o hello hello.c $ ./hello Segmentation fault # during execve() itself After applying the patch: $ ./hello Trace trap # user-mode execution after execve() finishes If the ELF headers are actually self-inconsistent, then dying is fine. But having no PROT_WRITE segment is perfectly normal and correct if there is no segment with p_memsz > p_filesz (i.e. bss). John Reiser suggested checking for PROT_WRITE in the bss logic. I think it makes most sense to simply apply the bss logic only when there is bss. This patch looks less trivial than it is due to some reindentation. It just moves the "if (last_bss > elf_bss) {" test up to include the partial-page bss logic as well as the more-pages bss logic. Reported-by: John Reiser <jreiser@bitwagon.com> Signed-off-by: Roland McGrath <roland@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* IMA: update ima_counts_putMimi Zohar2009-09-071-7/+15
| | | | | | | | | | | | | | - As ima_counts_put() may be called after the inode has been freed, verify that the inode is not NULL, before dereferencing it. - Maintain the IMA file counters in may_open() properly, decrementing any counter increments on subsequent errors. Reported-by: Ciprian Docan <docan@eden.rutgers.edu> Reported-by: J.R. Okajima <hooanon05@yahoo.co.jp> Signed-off-by: Mimi Zohar <zohar@us.ibm.com> Acked-by: Eric Paris <eparis@redhat.com Signed-off-by: James Morris <jmorris@namei.org>
* Merge git://git.infradead.org/~dwmw2/mtd-2.6.31Linus Torvalds2009-09-051-0/+10
|\ | | | | | | | | | | | | | | * git://git.infradead.org/~dwmw2/mtd-2.6.31: JFFS2: add missing verify buffer allocation/deallocation mtd: nftl: fix offset alignments mtd: nftl: write support is broken mtd: m25p80: fix null pointer dereference bug
| * JFFS2: add missing verify buffer allocation/deallocationMassimo Cirillo2009-09-031-0/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The function jffs2_nor_wbuf_flash_setup() doesn't allocate the verify buffer if CONFIG_JFFS2_FS_WBUF_VERIFY is defined, so causing a kernel panic when that macro is enabled and the verify function is called. Similarly the jffs2_nor_wbuf_flash_cleanup() must free the buffer if CONFIG_JFFS2_FS_WBUF_VERIFY is enabled. The following patch fixes the problem. The following patch applies to 2.6.30 kernel. Signed-off-by: Massimo Cirillo <maxcir@gmail.com> Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com> Signed-off-by: David Woodhouse <David.Woodhouse@intel.com> Cc: stable@kernel.org
* | Merge branch 'for-linus' of git://oss.sgi.com/xfs/xfsLinus Torvalds2009-09-051-1/+1
|\ \ | | | | | | | | | | | | * 'for-linus' of git://oss.sgi.com/xfs/xfs: xfs: actually enable the swapext compat handler
| * | xfs: actually enable the swapext compat handlerChristoph Hellwig2009-09-011-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fix a small typo in the compat ioctl handler that cause the swapext compat handler to never be called. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Torsten Kaiser <just.for.lkml@googlemail.com> Tested-by: Torsten Kaiser <just.for.lkml@googlemail.com> Reviewed-by: Eric Sandeen <sandeen@sandeen.net> Reviewed-by: Felix Blyakher <felixb@sgi.com> Signed-off-by: Felix Blyakher <felixb@sgi.com>
* | | Merge branch 'for-linus' of ↵Linus Torvalds2009-09-051-1/+1
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/ryusuke/nilfs2 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ryusuke/nilfs2: nilfs2: fix preempt count underflow in nilfs_btnode_prepare_change_key
| * | | nilfs2: fix preempt count underflow in nilfs_btnode_prepare_change_keyRyusuke Konishi2009-08-311-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This will fix the following preempt count underflow reported from users with the title "[NILFS users] segctord problem" (Message-ID: <949415.6494.qm@web58808.mail.re1.yahoo.com> and Message-ID: <debc30fc0908270825v747c1734xa59126623cfd5b05@mail.gmail.com>): WARNING: at kernel/sched.c:4890 sub_preempt_count+0x95/0xa0() Hardware name: HP Compaq 6530b (KR980UT#ABC) Modules linked in: bridge stp llc bnep rfcomm l2cap xfs exportfs nilfs2 cowloop loop vboxnetadp vboxnetflt vboxdrv btusb bluetooth uvcvideo videodev v4l1_compat v4l2_compat_ioctl32 arc4 snd_hda_codec_analog ecb iwlagn iwlcore rfkill lib80211 mac80211 snd_hda_intel snd_hda_codec ehci_hcd uhci_hcd usbcore snd_hwdep snd_pcm tg3 cfg80211 psmouse snd_timer joydev libphy ohci1394 snd_page_alloc hp_accel lis3lv02d ieee1394 led_class i915 drm i2c_algo_bit video backlight output i2c_core dm_crypt dm_mod Pid: 4197, comm: segctord Not tainted 2.6.30-gentoo-r4-64 #7 Call Trace: [<ffffffff8023fa05>] ? sub_preempt_count+0x95/0xa0 [<ffffffff802470f8>] warn_slowpath_common+0x78/0xd0 [<ffffffff8024715f>] warn_slowpath_null+0xf/0x20 [<ffffffff8023fa05>] sub_preempt_count+0x95/0xa0 [<ffffffffa04ce4db>] nilfs_btnode_prepare_change_key+0x11b/0x190 [nilfs2] [<ffffffffa04d01ad>] nilfs_btree_assign_p+0x19d/0x1e0 [nilfs2] [<ffffffffa04d10ad>] nilfs_btree_assign+0xbd/0x130 [nilfs2] [<ffffffffa04cead7>] nilfs_bmap_assign+0x47/0x70 [nilfs2] [<ffffffffa04d9bc6>] nilfs_segctor_do_construct+0x956/0x20f0 [nilfs2] [<ffffffff805ac8e2>] ? _spin_unlock_irqrestore+0x12/0x40 [<ffffffff803c06e0>] ? __up_write+0xe0/0x150 [<ffffffff80262959>] ? up_write+0x9/0x10 [<ffffffffa04ce9f3>] ? nilfs_bmap_test_and_clear_dirty+0x43/0x60 [nilfs2] [<ffffffffa04cd627>] ? nilfs_mdt_fetch_dirty+0x27/0x60 [nilfs2] [<ffffffffa04db5fc>] nilfs_segctor_construct+0x8c/0xd0 [nilfs2] [<ffffffffa04dc3dc>] nilfs_segctor_thread+0x15c/0x3a0 [nilfs2] [<ffffffffa04dbe20>] ? nilfs_construction_timeout+0x0/0x10 [nilfs2] [<ffffffff80252633>] ? add_timer+0x13/0x20 [<ffffffff802370da>] ? __wake_up_common+0x5a/0x90 [<ffffffff8025e960>] ? autoremove_wake_function+0x0/0x40 [<ffffffffa04dc280>] ? nilfs_segctor_thread+0x0/0x3a0 [nilfs2] [<ffffffffa04dc280>] ? nilfs_segctor_thread+0x0/0x3a0 [nilfs2] [<ffffffff8025e556>] kthread+0x56/0x90 [<ffffffff8020cdea>] child_rip+0xa/0x20 [<ffffffff8025e500>] ? kthread+0x0/0x90 [<ffffffff8020cde0>] ? child_rip+0x0/0x20 This problem was caused due to a missing radix_tree_preload() call in the retry path of nilfs_btnode_prepare_change_key() function. Reported-by: Eric A <eric225125@yahoo.com> Reported-by: Jerome Poulin <jeromepoulin@gmail.com> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Tested-by: Jerome Poulin <jeromepoulin@gmail.com> Cc: stable@kernel.org
* | | | ext2: fix unbalanced kmap()/kunmap()Nicolas Pitre2009-09-051-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In ext2_rename(), dir_page is acquired through ext2_dotdot(). It is then released through ext2_set_link() but only if old_dir != new_dir. Failing that, the pkmap reference count is never decremented and the page remains pinned forever. Repeat that a couple times with highmem pages and all pkmap slots get exhausted, and every further kmap() calls end up stalling on the pkmap_map_wait queue at which point the whole system comes to a halt. Signed-off-by: Nicolas Pitre <nico@marvell.com> Acked-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | | Merge branch 'upstream-linus' of ↵Linus Torvalds2009-09-052-2/+13
|\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/ocfs2 * 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/ocfs2: ocfs2: ocfs2_write_begin_nolock() should handle len=0 ocfs2: invalidate dentry if its dentry_lock isn't initialized.
| * | | | ocfs2: ocfs2_write_begin_nolock() should handle len=0Sunil Mushran2009-09-041-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Bug introduced by mainline commit e7432675f8ca868a4af365759a8d4c3779a3d922 The bug causes ocfs2_write_begin_nolock() to oops when len=0. Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Cc: stable@kernel.org Signed-off-by: Joel Becker <joel.becker@oracle.com>
| * | | | ocfs2: invalidate dentry if its dentry_lock isn't initialized.Tao Ma2009-08-271-0/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In commit a5a0a630922a2f6a774b6dac19f70cb5abd86bb0, when ocfs2_attch_dentry_lock fails, we call an extra iput and reset dentry->d_fsdata to NULL. This resolve a bug, but it isn't completed and the dentry is still there. When we want to use it again, ocfs2_dentry_revalidate doesn't catch it and return true. That make future ocfs2_dentry_lock panic out. One bug is http://oss.oracle.com/bugzilla/show_bug.cgi?id=1162. The resolution is to add a check for dentry->d_fsdata in revalidate process and return false if dentry->d_fsdata is NULL, so that a new ocfs2_lookup will be called again. Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>
* | | | | exec: do not sleep in TASK_TRACED under ->cred_guard_mutexOleg Nesterov2009-09-052-38/+42
| |_|_|/ |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Tom Horsley reports that his debugger hangs when it tries to read /proc/pid_of_tracee/maps, this happens since "mm_for_maps: take ->cred_guard_mutex to fix the race with exec" 04b836cbf19e885f8366bccb2e4b0474346c02d commit in 2.6.31. But the root of the problem lies in the fact that do_execve() path calls tracehook_report_exec() which can stop if the tracer sets PT_TRACE_EXEC. The tracee must not sleep in TASK_TRACED holding this mutex. Even if we remove ->cred_guard_mutex from mm_for_maps() and proc_pid_attr_write(), another task doing PTRACE_ATTACH should not hang until it is killed or the tracee resumes. With this patch do_execve() does not use ->cred_guard_mutex directly and we do not hold it throughout, instead: - introduce prepare_bprm_creds() helper, it locks the mutex and calls prepare_exec_creds() to initialize bprm->cred. - install_exec_creds() drops the mutex after commit_creds(), and thus before tracehook_report_exec()->ptrace_stop(). or, if exec fails, free_bprm() drops this mutex when bprm->cred != NULL which indicates install_exec_creds() was not called. Reported-by: Tom Horsley <tom.horsley@att.net> Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: David Howells <dhowells@redhat.com> Cc: Roland McGrath <roland@redhat.com> Cc: James Morris <jmorris@namei.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | | autofs4 - fix missed case when changing to use struct pathIan Kent2009-08-311-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In the recent change by Al Viro that changes verious subsystems to use "struct path" one case was missed in the autofs4 module which causes mounts to no longer expire. Signed-off-by: Ian Kent <raven@themaw.net> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | | inotify: update the group mask on mark additionEric Paris2009-08-281-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Seperating the addition and update of marks in inotify resulted in a regression in that inotify never gets events. The inotify group mask is always 0. This mask should be updated any time a new mark is added. Signed-off-by: Eric Paris <eparis@redhat.com>
* | | | inotify: fix length reporting and size checkingEric Paris2009-08-281-3/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | 0db501bd0610ee0c0 introduced a regresion in that it now sends a nul terminator but the length accounting when checking for space or reporting to userspace did not take this into account. This corrects all of the rounding logic. Signed-off-by: Eric Paris <eparis@redhat.com>
* | | | inotify: do not send a block of zeros when no pathname is availableBrian Rogers2009-08-281-3/+5
| |/ / |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | When an event has no pathname, there's no need to pad it with a null byte and therefore generate an inotify_event sized block of zeros. This fixes a regression introduced by commit 0db501bd0610ee0c0aca84d927f90bcccd09e2bd where my system wouldn't finish booting because some process was being confused by this. Signed-off-by: Brian Rogers <brian@xyzw.org> Signed-off-by: Eric Paris <eparis@redhat.com>
* | | Merge branch 'for-linus' of git://git.infradead.org/users/eparis/notifyLinus Torvalds2009-08-272-85/+177
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | * 'for-linus' of git://git.infradead.org/users/eparis/notify: inotify: Ensure we alwasy write the terminating NULL. inotify: fix locking around inotify watching in the idr inotify: do not BUG on idr entries at inotify destruction inotify: seperate new watch creation updating existing watches
| * | | inotify: Ensure we alwasy write the terminating NULL.Eric W. Biederman2009-08-271-7/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Before the rewrite copy_event_to_user always wrote a terqminating '\0' byte to user space after the filename. Since the rewrite that terminating byte was skipped if your filename is exactly a multiple of event_size. Ouch! So add one byte to name_size before we round up and use clear_user to set userspace to zero like /dev/zero does instead of copying the strange nul_inotify_event. I can't quite convince myself len_to_zero will never exceed 16 and even if it doesn't clear_user should be more efficient and a more accurate reflection of what the code is trying to do. Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com> Signed-off-by: Eric Paris <eparis@redhat.com>
| * | | inotify: fix locking around inotify watching in the idrEric Paris2009-08-271-10/+40
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The are races around the idr storage of inotify watches. It's possible that a watch could be found from sys_inotify_rm_watch() in the idr, but it could be removed from the idr before that code does it's removal. Move the locking and the refcnt'ing so that these have to happen atomically. Signed-off-by: Eric Paris <eparis@redhat.com>
| * | | inotify: do not BUG on idr entries at inotify destructionEric Paris2009-08-271-2/+31
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If an inotify watch is left in the idr when an fsnotify group is destroyed this will lead to a BUG. This is not a dangerous situation and really indicates a programming bug and leak of memory. This patch changes it to use a WARN and a printk rather than killing people's boxes. Signed-off-by: Eric Paris <eparis@redhat.com>
| * | | inotify: seperate new watch creation updating existing watchesEric Paris2009-08-271-69/+103
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There is nothing known wrong with the inotify watch addition/modification but this patch seperates the two code paths to make them each easy to verify as correct. Signed-off-by: Eric Paris <eparis@redhat.com>
* | | | Merge branch 'for-linus' of ↵Linus Torvalds2009-08-274-105/+82
|\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs: 9p: update documentation pointers 9p: remove unnecessary v9fses->options which duplicates the mount string net/9p: insulate the client against an invalid error code sent by a 9p server 9p: Add missing cast for the error return value in v9fs_get_inode 9p: Remove redundant inode uid/gid assignment 9p: Fix possible regressions when ->get_sb fails. 9p: Fix v9fs show_options 9p: Fix possible memleak in v9fs_inode_from fid. 9p: minor comment fixes 9p: Fix possible inode leak in v9fs_get_inode. 9p: Check for error in return value of v9fs_fid_add
| * | | | 9p: remove unnecessary v9fses->options which duplicates the mount stringAbhishek Kulkarni2009-08-173-35/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The mount options string is saved in sb->s_options. This patch removes the redundant duplicating of the mount options. Also, since we are not displaying anything special in show options, we replace v9fs_show_options with generic_show_options for now. Signed-off-by: Abhishek Kulkarni <adkulkar@umail.iu.edu> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
| * | | | 9p: Add missing cast for the error return value in v9fs_get_inodeAbhishek Kulkarni2009-08-171-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Cast the error return value (ENOMEM) in v9fs_get_inode() to its correct type using ERR_PTR. Signed-off-by: Abhishek Kulkarni <adkulkar@umail.iu.edu> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
| * | | | 9p: Remove redundant inode uid/gid assignmentAbhishek Kulkarni2009-08-171-5/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Remove a redundant update of inode's i_uid and i_gid after v9fs_get_inode() since the latter already sets up a new inode and sets the proper uid and gid values. Signed-off-by: Abhishek Kulkarni <adkulkar@umail.iu.edu> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
| * | | | 9p: Fix possible regressions when ->get_sb fails.Abhishek Kulkarni2009-08-171-4/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ->get_sb can fail causing some badness. this patch fixes * clear sb->fs_s_info in kill_sb. * deactivate_locked_super() calls kill_sb (v9fs_kill_super) which closes the destroys the client, clunks all its fids and closes the v9fs session. Attempting to do it twice will cause an oops. Signed-off-by: Abhishek Kulkarni <adkulkar@umail.iu.edu> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
| * | | | 9p: Fix v9fs show_optionsAbhishek Kulkarni2009-08-171-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add the delimiter ',' before the options when they are passed and check if no option parameters are passed to prevent displaying NULL in /proc/mounts. Signed-off-by: Abhishek Kulkarni <adkulkar@umail.iu.edu> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
| * | | | 9p: Fix possible memleak in v9fs_inode_from fid.Abhishek Kulkarni2009-08-171-9/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add missing p9stat_free in v9fs_inode_from_fid to avoid any possible leaks. Signed-off-by: Abhishek Kulkarni <adkulkar@umail.iu.edu> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
| * | | | 9p: minor comment fixesAbhishek Kulkarni2009-08-171-2/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fix the comments -- mostly the improper and/or missing descriptions of function parameters. Signed-off-by: Abhishek Kulkarni <adkulkar@umail.iu.edu> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
| * | | | 9p: Fix possible inode leak in v9fs_get_inode.Abhishek Kulkarni2009-08-171-49/+56
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add a missing iput when cleaning up if v9fs_get_inode fails after returning a valid inode. Signed-off-by: Abhishek Kulkarni <adkulkar@umail.iu.edu> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
| * | | | 9p: Check for error in return value of v9fs_fid_addAbhishek Kulkarni2009-08-171-1/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Check if v9fs_fid_add was successful or not based on its return value. Signed-off-by: Abhishek Kulkarni <adkulkar@umail.iu.edu> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
* | | | | AFS: Stop readlink() on AFS crashing due to NULL 'file' ptrDavid Howells2009-08-271-3/+15
| |/ / / |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | kAFS crashes when asked to read a symbolic link because page_getlink() passes a NULL file pointer to read_mapping_page(), but afs_readpage() expects a file pointer from which to extract a key. Modify afs_readpage() to request the appropriate key from the calling process's keyrings if a file struct is not supplied with one attached. Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Anton Blanchard <anton@samba.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | | Merge branch 'for_linus' of ↵Linus Torvalds2009-08-252-28/+44
|\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs-2.6 * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs-2.6: ext3: Improve error message that changing journaling mode on remount is not possible ext3: Update Kconfig description of EXT3_DEFAULTS_TO_ORDERED
| * | | | ext3: Improve error message that changing journaling mode on remount is not ↵Jan Kara2009-08-241-13/+27
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | possible This patch makes the error message about changing journaling mode on remount more descriptive. Some people are going to hit this error now due to commit bbae8bcc49bc4d002221dab52c79a50a82e7cd1f if they configure a kernel to default to data=writeback mode. The problem happens if they have data=ordered set for the root filesystem in /etc/fstab but not in the kernel command line (and they don't use initrd). Their filesystem then gets mounted as data=writeback by kernel but then their boot fails because init scripts won't be able to remount the filesystem rw. Better error message will hopefully make it easier for them to find the error in their setup and bother us less with error reports :). Signed-off-by: Jan Kara <jack@suse.cz>
| * | | | ext3: Update Kconfig description of EXT3_DEFAULTS_TO_ORDEREDTheodore Ts'o2009-08-241-15/+17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The old description for this configuration option was perhaps not completely balanced in terms of describing the tradeoffs of using a default of data=writeback vs. data=ordered. Despite the fact that old description very strongly recomended disabling this feature, all of the major distributions have elected to preserve the existing 'legacy' default, which is a strong hint that it perhaps wasn't telling the whole story. This revised description has been vetted by a number of ext3 developers as being better at informing the user about the tradeoffs of enabling or disabling this configuration feature. Cc: linux-ext4@vger.kernel.org Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Jan Kara <jack@suse.cz>
* | | | | NFSv4: Fix an infinite looping problem with the nfs4_state_managerTrond Myklebust2009-08-241-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 76db6d9500caeaa774a3e32a997eba30bbdc176b (nfs41: add session setup to the state manager) introduces an infinite loop possibility in the NFSv4 state manager. By first checking nfs4_has_session() before clearing the NFS4CLNT_SESSION_SETUP flag, it allows for a situation where someone sets that flag, but it never gets cleared, and so the state manager loops. In fact commit c3fad1b1aaf850bf692642642ace7cd0d64af0a3 (nfs41: add session reset to state manager) causes this to happen every time we get a network partition error. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Tested-by: Daniel J Blueman <daniel.blueman@gmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>