aboutsummaryrefslogtreecommitdiffstats
path: root/kernel
Commit message (Collapse)AuthorAgeFilesLines
* fsnotify: split generic and inode specific mark codeEric Paris2010-07-282-7/+7
| | | | | | | | currently all marking is done by functions in inode-mark.c. Some of this is pretty generic and should be instead done in a generic function and we should only put the inode specific code in inode-mark.c Signed-off-by: Eric Paris <eparis@redhat.com>
* fanotify: sys_fanotify_mark declartionEric Paris2010-07-281-0/+1
| | | | | | | | | This patch simply declares the new sys_fanotify_mark syscall int fanotify_mark(int fanotify_fd, unsigned int flags, u64_mask, int dfd const char *pathname) Signed-off-by: Eric Paris <eparis@redhat.com>
* fanotify: fanotify_init syscall declarationEric Paris2010-07-281-0/+3
| | | | | | | | | | | | This patch defines a new syscall fanotify_init() of the form: int sys_fanotify_init(unsigned int flags, unsigned int event_f_flags, unsigned int priority) This syscall is used to create and fanotify group. This is very similar to the inotify_init() syscall. Signed-off-by: Eric Paris <eparis@redhat.com>
* fsnotify: take inode->i_lock inside fsnotify_find_mark_entry()Andreas Gruenbacher2010-07-282-7/+0
| | | | | | | | | All callers to fsnotify_find_mark_entry() except one take and release inode->i_lock around the call. Take the lock inside fsnotify_find_mark_entry() instead. Signed-off-by: Andreas Gruenbacher <agruen@suse.de> Signed-off-by: Eric Paris <eparis@redhat.com>
* fsnotify: rename fsnotify_find_mark_entry to fsnotify_find_markEric Paris2010-07-282-10/+10
| | | | | | the _entry portion of fsnotify functions is useless. Drop it. Signed-off-by: Eric Paris <eparis@redhat.com>
* fsnotify: rename fsnotify_mark_entry to just fsnotify_markEric Paris2010-07-283-13/+13
| | | | | | | The name is long and it serves no real purpose. So rename fsnotify_mark_entry to just fsnotify_mark. Signed-off-by: Eric Paris <eparis@redhat.com>
* fsnotify: put inode specific fields in an fsnotify_mark in a unionEric Paris2010-07-281-8/+8
| | | | | | | | | The addition of marks on vfs mounts will be simplified if the inode specific parts of a mark and the vfsmnt specific parts of a mark are actually in a union so naming can be easy. This patch just implements the inode struct and the union. Signed-off-by: Eric Paris <eparis@redhat.com>
* fsnotify: include vfsmount in should_send_event when appropriateEric Paris2010-07-282-2/+4
| | | | | | | | | To ensure that a group will not duplicate events when it receives it based on the vfsmount and the inode should_send_event test we should distinguish those two cases. We pass a vfsmount to this function so groups can make their own determinations. Signed-off-by: Eric Paris <eparis@redhat.com>
* fsnotify: drop mask argument from fsnotify_alloc_groupEric Paris2010-07-282-2/+2
| | | | | | | Nothing uses the mask argument to fsnotify_alloc_group. This patch drops that argument. Signed-off-by: Eric Paris <eparis@redhat.com>
* Audit: only set group mask when something is being watchedEric Paris2010-07-281-2/+9
| | | | | | | | | Currently the audit watch group always sets a mask equal to all events it might care about. We instead should only set the group mask if we are actually watching inodes. This should be a perf win when audit watches are compiled in. Signed-off-by: Eric Paris <eparis@redhat.com>
* fsnotify: fsnotify_obtain_group should be fsnotify_alloc_groupEric Paris2010-07-282-3/+3
| | | | | | | | fsnotify_obtain_group was intended to be able to find an already existing group. Nothing uses that functionality. This just renames it to fsnotify_alloc_group so it is clear what it is doing. Signed-off-by: Eric Paris <eparis@redhat.com>
* fsnotify: remove group_num altogetherEric Paris2010-07-282-3/+2
| | | | | | | | The original fsnotify interface has a group-num which was intended to be able to find a group after it was added. I no longer think this is a necessary thing to do and so we remove the group_num. Signed-off-by: Eric Paris <eparis@redhat.com>
* fsnotify: include data in should_send callsEric Paris2010-07-282-2/+2
| | | | | | | | | fanotify is going to need to look at file->private_data to know if an event should be sent or not. This passes the data (which might be a file, dentry, inode, or none) to the should_send function calls so fanotify can get that information when available Signed-off-by: Eric Paris <eparis@redhat.com>
* fsnotify: provide the data type to should_send_eventEric Paris2010-07-282-2/+4
| | | | | | | | | | fanotify is only interested in event types which contain enough information to open the original file in the context of the fanotify listener. Since fanotify may not want to send events if that data isn't present we pass the data type to the should_send_event function call so fanotify can express its lack of interest. Signed-off-by: Eric Paris <eparis@redhat.com>
* inotify: remove inotify in kernel interfaceEric Paris2010-07-281-1/+0
| | | | | | nothing uses inotify in the kernel, drop it! Signed-off-by: Eric Paris <eparis@redhat.com>
* Audit: audit watch init should not be before fsnotify initEric Paris2010-07-281-1/+1
| | | | | | | | | | | | Audit watch init and fsnotify init both use subsys_initcall() but since the audit watch code is linked in before the fsnotify code the audit watch code would be using the fsnotify srcu struct before it was initialized. This patch fixes that problem by moving audit watch init to device_initcall() so it happens after fsnotify is ready. Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Eric Paris <eparis@redhat.com> Tested-by : Sachin Sant <sachinp@in.ibm.com>
* Audit: split audit watch KconfigEric Paris2010-07-282-3/+16
| | | | | | | | Audit watch should depend on CONFIG_AUDIT_SYSCALL and should select FSNOTIFY. This splits the spagetti like mixing of audit_watch and audit_filter code so they can be configured seperately. Signed-off-by: Eric Paris <eparis@redhat.com>
* audit: reimplement audit_trees using fsnotify rather than inotifyEric Paris2010-07-282-106/+132
| | | | | | | Simply switch audit_trees from using inotify to using fsnotify for it's inode pinning and disappearing act information. Signed-off-by: Eric Paris <eparis@redhat.com>
* fsnotify: allow addition of duplicate fsnotify marksEric Paris2010-07-281-1/+1
| | | | | | | | | This patch allows a task to add a second fsnotify mark to an inode for the same group. This mark will be added to the end of the inode's list and this will never be found by the stand fsnotify_find_mark() function. This is useful if a user wants to add a new mark before removing the old one. Signed-off-by: Eric Paris <eparis@redhat.com>
* audit: do not get and put just to free a watchEric Paris2010-07-283-31/+5
| | | | | | | | | | deleting audit watch rules is not currently done under audit_filter_mutex. It was done this way because we could not hold the mutex during inotify manipulation. Since we are using fsnotify we don't need to do the extra get/put pair nor do we need the private list on which to store the parents while they are about to be freed. Signed-off-by: Eric Paris <eparis@redhat.com>
* audit: redo audit watch locking and refcnt in light of fsnotifyEric Paris2010-07-281-40/+5
| | | | | | | | fsnotify can handle mutexes to be held across all fsnotify operations since it deals strickly in spinlocks. This can simplify and reduce some of the audit_filter_mutex taking and dropping. Signed-off-by: Eric Paris <eparis@redhat.com>
* audit: convert audit watches to use fsnotify instead of inotifyEric Paris2010-07-281-60/+148
| | | | | | | | Audit currently uses inotify to pin inodes in core and to detect when watched inodes are deleted or unmounted. This patch uses fsnotify instead of inotify. Signed-off-by: Eric Paris <eparis@redhat.com>
* Audit: clean up the audit_watch splitEric Paris2010-07-285-67/+60
| | | | | | | | No real changes, just cleanup to the audit_watch split patch which we done with minimal code changes for easy review. Now fix interfaces to make things work better. Signed-off-by: Eric Paris <eparis@redhat.com>
* dynamic debug: move ddebug_remove_module() down into free_module()Jason Baron2010-07-271-1/+3
| | | | | | | | | | | | | | | | | | | The command echo "file ec.c +p" >/sys/kernel/debug/dynamic_debug/control causes an oops. Move the call to ddebug_remove_module() down into free_module(). In this way it should be called from all error paths. Currently, we are missing the remove if the module init routine fails. Signed-off-by: Jason Baron <jbaron@redhat.com> Reported-by: Thomas Renninger <trenn@suse.de> Tested-by: Thomas Renninger <trenn@suse.de> Cc: <stable@kernel.org> [2.6.32+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* sysrq,kdb: Use __handle_sysrq() for kdb's sysrq functionJason Wessel2010-07-211-2/+1
| | | | | | | | The kdb code should not toggle the sysrq state in case an end user wants to try and resume the normal kernel execution. Signed-off-by: Jason Wessel <jason.wessel@windriver.com> Acked-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
* debug_core,kdb: fix kgdb_connected bit set in the wrong placeJason Wessel2010-07-211-1/+1
| | | | | | | | | | | | Immediately following an exit from the kdb shell the kgdb_connected variable should be set to zero, unless there are breakpoints planted. If the kgdb_connected variable is not zeroed out with kdb, it is impossible to turn off kdb. This patch is merely a work around for now, the real fix will check for the breakpoints. Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
* Fix merge regression from external kdb to upstream kdbJason Wessel2010-07-211-0/+1
| | | | | | | | | | | In the process of merging kdb to the mainline, the kdb lsmod command stopped printing the base load address of kernel modules. This is needed for using kdb in conjunction with external tools such as gdb. Simply restore the functionality by adding a kdb_printf for the base load address of the kernel modules. Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
* repair gdbstub to match the gdbserial protocol specificationJason Wessel2010-07-211-6/+3
| | | | | | | | | | | | | | | The gdbserial protocol handler should return an empty packet instead of an error string when ever it responds to a command it does not implement. The problem cases come from a debugger client sending qTBuffer, qTStatus, qSearch, qSupported. The incorrect response from the gdbstub leads the debugger clients to not function correctly. Recent versions of gdb will not detach correctly as a result of this behavior. Signed-off-by: Jason Wessel <jason.wessel@windriver.com> Signed-off-by: Dongdong Deng <dongdong.deng@windriver.com>
* kdb: break out of kdb_ll() when command is terminatedMartin Hicks2010-07-211-0/+3
| | | | | | | | Without this patch the "ll" linked-list traversal command won't terminate when you hit q/Q. Signed-off-by: Martin Hicks <mort@sgi.com> Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
* kmemleak: Add support for NO_BOOTMEM configurationsCatalin Marinas2010-07-191-0/+6
| | | | | | | | | | | | | With commits 08677214 and 59be5a8e, alloc_bootmem()/free_bootmem() and friends use the early_res functions for memory management when NO_BOOTMEM is enabled. This patch adds the kmemleak calls in the corresponding code paths for bootmem allocations. Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Acked-by: Pekka Enberg <penberg@cs.helsinki.fi> Acked-by: Yinghai Lu <yinghai@kernel.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: stable@kernel.org
* module: initialize module dynamic debug laterYehuda Sadeh2010-07-041-8/+15
| | | | | | | | | | | | | We should initialize the module dynamic debug datastructures only after determining that the module is not loaded yet. This fixes a bug that introduced in 2.6.35-rc2, where when a trying to load a module twice, we also load it's dynamic printing data twice which causes all sorts of nasty issues. Also handle the dynamic debug cleanup later on failure. Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (removed a #ifdef) Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* Merge branch 'sched-fixes-for-linus' of ↵Linus Torvalds2010-07-022-10/+10
|\ | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: sched: Cure nr_iowait_cpu() users init: Fix comment init, sched: Fix race between init and kthreadd
| * sched: Cure nr_iowait_cpu() usersPeter Zijlstra2010-07-012-10/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 0224cf4c5e (sched: Intoduce get_cpu_iowait_time_us()) broke things by not making sure preemption was indeed disabled by the callers of nr_iowait_cpu() which took the iowait value of the current cpu. This resulted in a heap of preempt warnings. Cure this by making nr_iowait_cpu() take a cpu number and fix up the callers to pass in the right number. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Arjan van de Ven <arjan@infradead.org> Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: Rafael J. Wysocki <rjw@sisk.pl> Cc: Maxim Levitsky <maximlevitsky@gmail.com> Cc: Len Brown <len.brown@intel.com> Cc: Pavel Machek <pavel@ucw.cz> Cc: Jiri Slaby <jslaby@suse.cz> Cc: linux-pm@lists.linux-foundation.org LKML-Reference: <1277968037.1868.120.camel@laptop> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* | futex: futex_find_get_task remove credentails checkMichal Hocko2010-06-301-13/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | futex_find_get_task is currently used (through lookup_pi_state) from two contexts, futex_requeue and futex_lock_pi_atomic. None of the paths looks it needs the credentials check, though. Different (e)uids shouldn't matter at all because the only thing that is important for shared futex is the accessibility of the shared memory. The credentail check results in glibc assert failure or process hang (if glibc is compiled without assert support) for shared robust pthread mutex with priority inheritance if a process tries to lock already held lock owned by a process with a different euid: pthread_mutex_lock.c:312: __pthread_mutex_lock_full: Assertion `(-(e)) != 3 || !robust' failed. The problem is that futex_lock_pi_atomic which is called when we try to lock already held lock checks the current holder (tid is stored in the futex value) to get the PI state. It uses lookup_pi_state which in turn gets task struct from futex_find_get_task. ESRCH is returned either when the task is not found or if credentials check fails. futex_lock_pi_atomic simply returns if it gets ESRCH. glibc code, however, doesn't expect that robust lock returns with ESRCH because it should get either success or owner died. Signed-off-by: Michal Hocko <mhocko@suse.cz> Acked-by: Darren Hart <dvhltc@us.ibm.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Nick Piggin <npiggin@suse.de> Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> Cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | kexec: fix Oops in crash_shrink_memory()Pavan Naregundi2010-06-291-3/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When crashkernel is not enabled, "echo 0 > /sys/kernel/kexec_crash_size" OOPSes the kernel in crash_shrink_memory. This happens when crash_shrink_memory tries to release the 'crashk_res' resource which are not reserved. Also value of "/sys/kernel/kexec_crash_size" shows as 1, which should be 0. This patch fixes the OOPS in crash_shrink_memory and shows "/sys/kernel/kexec_crash_size" as 0 when crash kernel memory is not reserved. Signed-off-by: Pavan Naregundi <pavan@linux.vnet.ibm.com> Reviewed-by: WANG Cong <xiyou.wangcong@gmail.com> Cc: Simon Horman <horms@verge.net.au> Cc: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | Merge branch 'perf-fixes-for-linus' of ↵Linus Torvalds2010-06-281-1/+3
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: tracing: Fix undeclared ENOSYS in include/linux/tracepoint.h perf record: prevent kill(0, SIGTERM); perf session: Remove threads from tree on PERF_RECORD_EXIT perf/tracing: Fix regression of perf losing kprobe events perf_events: Fix Intel Westmere event constraints perf record: Don't call newt functions when not initialized
| * | perf/tracing: Fix regression of perf losing kprobe eventsSteven Rostedt2010-06-101-1/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | With the addition of the code to shrink the kernel tracepoint infrastructure, we lost kprobes being traced by perf. The reason is that I tested if the "tp_event->class->perf_probe" existed before enabling it. This prevents "ftrace only" events (like the function trace events) from being enabled by perf. Unfortunately, kprobe events do not use perf_probe. This causes kprobes to be missed by perf. To fix this, we add the test to see if "tp_event->class->reg" exists as well as perf_probe. Normal trace events have only "perf_probe" but no "reg" function, and kprobes and syscalls have the "reg" but no "perf_probe". The ftrace unique events do not have either, so this is a valid test. If a kprobe or syscall is not to be probed by perf, the "reg" function is called anyway, and will return a failure and prevent perf from probing it. Reported-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Tested-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
* | | Merge branch 'irq-fixes-for-linus' of ↵Linus Torvalds2010-06-281-0/+3
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'irq-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: genirq: Deal with desc->set_type() changing desc->chip
| * | | genirq: Deal with desc->set_type() changing desc->chipThomas Gleixner2010-06-091-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The set_type() function can change the chip implementation when the trigger mode changes. That might result in using an non-initialized irq chip when called from __setup_irq() or when called via set_irq_type() on an already enabled irq. The set_irq_type() function should not be called on an enabled irq, but because we forgot to put a check into it, we have a bunch of users which grew the habit of doing that and it never blew up as the function is serialized via desc->lock against all users of desc->chip and they never hit the non-initialized irq chip issue. The easy fix for the __setup_irq() issue would be to move the irq_chip_set_defaults(desc->chip) call after the trigger setting to make sure that a chip change is covered. But as we have already users, which do the type setting after request_irq(), the safe fix for now is to call irq_chip_set_defaults() from __irq_set_trigger() when desc->set_type() changed the irq chip. It needs a deeper analysis whether we should refuse to change the chip on an already enabled irq, but that'd be a large scale change to fix all the existing users. So that's neither stable nor 2.6.35 material. Reported-by: Esben Haabendal <eha@doredevelopment.dk> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: linuxppc-dev <linuxppc-dev@ozlabs.org> Cc: stable@kernel.org
* | | | Merge branch 'sched-fixes-for-linus' of ↵Linus Torvalds2010-06-281-59/+65
|\ \ \ \ | | |_|/ | |/| | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: sched: Prevent compiler from optimising the sched_avg_update() loop sched: Fix over-scheduling bug sched: Fix PROVE_RCU vs cpu_cgroup
| * | | sched: Prevent compiler from optimising the sched_avg_update() loopWill Deacon2010-06-251-0/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | GCC 4.4.1 on ARM has been observed to replace the while loop in sched_avg_update with a call to uldivmod, resulting in the following build failure at link-time: kernel/built-in.o: In function `sched_avg_update': kernel/sched.c:1261: undefined reference to `__aeabi_uldivmod' kernel/sched.c:1261: undefined reference to `__aeabi_uldivmod' make: *** [.tmp_vmlinux1] Error 1 This patch introduces a fake data hazard to the loop body to prevent the compiler optimising the loop away. Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Peter Zijlstra <peterz@infradead.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Russell King <rmk@arm.linux.org.uk> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: <stable@kernel.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * | | sched: Fix over-scheduling bugAlex,Shi2010-06-181-3/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit e70971591 ("sched: Optimize unused cgroup configuration") introduced an imbalanced scheduling bug. If we do not use CGROUP, function update_h_load won't update h_load. When the system has a large number of tasks far more than logical CPU number, the incorrect cfs_rq[cpu]->h_load value will cause load_balance() to pull too many tasks to the local CPU from the busiest CPU. So the busiest CPU keeps going in a round robin. That will hurt performance. The issue was found originally by a scientific calculation workload that developed by Yanmin. With that commit, the workload performance drops about 40%. CPU before after 00 : 2 : 7 01 : 1 : 7 02 : 11 : 6 03 : 12 : 7 04 : 6 : 6 05 : 11 : 7 06 : 10 : 6 07 : 12 : 7 08 : 11 : 6 09 : 12 : 6 10 : 1 : 6 11 : 1 : 6 12 : 6 : 6 13 : 2 : 6 14 : 2 : 6 15 : 1 : 6 Reviewed-by: Yanmin zhang <yanmin.zhang@intel.com> Signed-off-by: Alex Shi <alex.shi@intel.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <1276754893.9452.5442.camel@debian> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * | | sched: Fix PROVE_RCU vs cpu_cgroupPeter Zijlstra2010-06-081-56/+59
| |/ / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | PROVE_RCU has a few issues with the cpu_cgroup because the scheduler typically holds rq->lock around the css rcu derefs but the generic cgroup code doesn't (and can't) know about that lock. Provide means to add extra checks to the css dereference and use that in the scheduler to annotate its users. The addition of rq->lock to these checks is correct because the cgroup_subsys::attach() method takes the rq->lock for each task it moves, therefore by holding that lock, we ensure the task is pinned to the current cgroup and the RCU derefence is valid. That leaves one genuine race in __sched_setscheduler() where we used task_group() without holding any of the required locks and thus raced with the cgroup code. Solve this by moving the check under the appropriate lock. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* | | Merge branch 'timers-fixes-for-linus' of ↵Linus Torvalds2010-06-281-4/+1
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'timers-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: nohz: Fix nohz ratelimit
| * | | nohz: Fix nohz ratelimitPeter Zijlstra2010-06-171-4/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Chris Wedgwood reports that 39c0cbe (sched: Rate-limit nohz) causes a serial console regression, unresponsiveness, and indeed it does. The reason is that the nohz code is skipped even when the tick was already stopped before the nohz_ratelimit(cpu) condition changed. Move the nohz_ratelimit() check to the other conditions which prevent long idle sleeps. Reported-by: Chris Wedgwood <cw@f00f.org> Tested-by: Brian Bloniarz <bmb@athenacr.com> Signed-off-by: Mike Galbraith <efault@gmx.de> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Jiri Kosina <jkosina@suse.cz> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Greg KH <gregkh@suse.de> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Cc: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Cc: Jef Driesen <jefdriesen@telenet.be> LKML-Reference: <1276790557.27822.516.camel@twins> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
* | | | Merge branch 'core-fixes-for-linus' of ↵Linus Torvalds2010-06-282-0/+11
|\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: sched: silence PROVE_RCU in sched_fork() idr: fix RCU lockdep splat in idr_get_next() rcu: apply RCU protection to wake_affine()
| * | | | sched: silence PROVE_RCU in sched_fork()Peter Zijlstra2010-06-231-0/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Because cgroup_fork() is ran before sched_fork() [ from copy_process() ] and the child's pid is not yet visible the child is pinned to its cgroup. Therefore we can silence this warning. A nicer solution would be moving cgroup_fork() to right after dup_task_struct() and exclude PF_STARTING from task_subsys_state(). Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Reviewed-by: Li Zefan <lizf@cn.fujitsu.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
| * | | | rcu: apply RCU protection to wake_affine()Daniel J Blueman2010-06-231-0/+2
| |/ / / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The task_group() function returns a pointer that must be protected by either RCU, the ->alloc_lock, or the cgroup lock (see the rcu_dereference_check() in task_subsys_state(), which is invoked by task_group()). The wake_affine() function currently does none of these, which means that a concurrent update would be within its rights to free the structure returned by task_group(). Because wake_affine() uses this structure only to compute load-balancing heuristics, there is no reason to acquire either of the two locks. Therefore, this commit introduces an RCU read-side critical section that starts before the first call to task_group() and ends after the last use of the "tg" pointer returned from task_group(). Thanks to Li Zefan for pointing out the need to extend the RCU read-side critical section from that proposed by the original patch. Signed-off-by: Daniel J Blueman <daniel.blueman@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
* | | | Merge branch 'bugzilla-13931-sleep-nvs' into releaseLen Brown2010-06-124-17/+24
|\ \ \ \ | |/ / / |/| | | | | | | | | | | | | | | | | | | Conflicts: drivers/acpi/sleep.c Signed-off-by: Len Brown <len.brown@intel.com>
| * | | suspend: Move NVS save/restore code to generic suspend functionalityMatthew Garrett2010-06-104-17/+24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Saving platform non-volatile state may be required for suspend to RAM as well as hibernation. Move it to more generic code. Signed-off-by: Matthew Garrett <mjg@redhat.com> Acked-by: Rafael J. Wysocki <rjw@sisk.pl> Tested-by: Maxim Levitsky <maximlevitsky@gmail.com> Signed-off-by: Len Brown <len.brown@intel.com>