aboutsummaryrefslogtreecommitdiffstats
path: root/kernel/cpu.c
Commit message (Collapse)AuthorAgeFilesLines
* stop_machine/cpu hotplug: fix disable_nonboot_cpusHeiko Carstens2009-01-071-1/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | disable_nonboot_cpus calls _cpu_down. But _cpu_down requires that the caller already created the stop_machine workqueue (like cpu_down does). Otherwise a call to stop_machine will lead to accesses to random memory regions. When introducing this new interface (9ea09af3bd3090e8349ca2899ca2011bd94cda85 "stop_machine: introduce stop_machine_create/destroy") I missed the second call site of _cpu_down. So add the missing stop_machine_create/destroy calls to disable_nonboot_cpus as well. Fixes suspend-to-ram/disk and also this bug: [ 286.547348] BUG: unable to handle kernel paging request at 6b6b6b6b [ 286.548940] IP: [<c0150ca4>] __stop_machine+0x88/0xe3 [ 286.550598] Oops: 0002 [#1] SMP [ 286.560580] Pid: 3273, comm: halt Not tainted (2.6.28-06127-g238c6d5 [ 286.560580] EIP: is at __stop_machine+0x88/0xe3 [ 286.560580] Process halt (pid: 3273, ti=f1a28000 task=f4530f30 [ 286.560580] Call Trace: [ 286.560580] [<c03d04e4>] ? _cpu_down+0x10f/0x234 [ 286.560580] [<c012a57e>] ? disable_nonboot_cpus+0x58/0xdc [ 286.560580] [<c01360c0>] ? kernel_poweroff+0x22/0x39 [ 286.560580] [<c0136301>] ? sys_reboot+0xde/0x14c [ 286.560580] [<c01331b2>] ? complete_signal+0x179/0x191 [ 286.560580] [<c0133396>] ? send_signal+0x1cc/0x1e1 [ 286.560580] [<c03de418>] ? _spin_unlock_irqrestore+0x2d/0x3c [ 286.560580] [<c0133b65>] ? group_send_signal_info+0x58/0x61 [ 286.560580] [<c0133b9e>] ? kill_pid_info+0x30/0x3a [ 286.560580] [<c0133d49>] ? sys_kill+0x75/0x13a [ 286.560580] [<c01a06cb>] ? mntput_no_expire+ox1f/0x101 [ 286.560580] [<c019b3b3>] ? dput+0x1e/0x105 [ 286.560580] [<c018ef87>] ? __fput+0x150/0x158 [ 286.560580] [<c0157abf>] ? audit_syscall_entry+0x137/0x159 [ 286.560580] [<c010329f>] ? sysenter_do_call+0x12/0x34 Reported-and-tested-by: "Justin P. Mattock" <justinmattock@gmail.com> Reviewed-by: Pekka Enberg <penberg@cs.helsinki.fi> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Tested-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* stop_machine: introduce stop_machine_create/destroy.Heiko Carstens2009-01-051-1/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | Introduce stop_machine_create/destroy. With this interface subsystems that need a non-failing stop_machine environment can create the stop_machine machine threads before actually calling stop_machine. When the threads aren't needed anymore they can be killed with stop_machine_destroy again. When stop_machine gets called and the threads aren't present they will be created and destroyed automatically. This restores the old behaviour of stop_machine. This patch also converts cpu hotplug to the new interface since it is special: cpu_down calls __stop_machine instead of stop_machine. However the kstop threads will only be created when stop_machine gets called. Changing the code so that the threads would be created automatically on __stop_machine is currently not possible: when __stop_machine gets called we hold cpu_add_remove_lock, which is the same lock that create_rt_workqueue would take. So the workqueue needs to be created before the cpu hotplug code locks cpu_add_remove_lock. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
* cpumask: convert kernel/cpu.cRusty Russell2009-01-011-19/+29
| | | | | | | | | | | | Impact: Reduce kernel stack and memory usage, use new cpumask API. Use cpumask_var_t for take_cpu_down() stack var, and frozen_cpus. Note that notify_cpu_starting() can be called before core_initcall allocates frozen_cpus, but the NULL check is optimized out by gcc for the CONFIG_CPUMASK_OFFSTACK=n case. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
* cpumask: make set_cpu_*/init_cpu_* out-of-lineRusty Russell2008-12-301-0/+47
| | | | | | | | | | | | | | | They're only for use in boot/cpu hotplug code anyway, and this avoids the use of deprecated cpu_*_map. Stephen Rothwell points out that gcc 4.2.4 (on powerpc at least) didn't like the cast away of const anyway: include/linux/cpumask.h: In function 'set_cpu_possible': include/linux/cpumask.h:1052: warning: passing argument 2 of 'cpumask_set_cpu' discards qualifiers from pointer target type So this kills two birds with one stone. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
* cpumask: switch over to cpu_online/possible/active/present_mask: coreRusty Russell2008-12-301-26/+23
| | | | | | | | | | | | | | Impact: cleanup This implements the obsolescent cpu_online_map in terms of cpu_online_mask, rather than the other way around. Same for the other maps. The documentation comments are also updated to refer to _mask rather than _map. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Mike Travis <travis@sgi.com>
* cpumask: centralize cpu_online_map and cpu_possible_mapRusty Russell2008-12-131-5/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Impact: cleanup Each SMP arch defines these themselves. Move them to a central location. Twists: 1) Some archs (m32, parisc, s390) set possible_map to all 1, so we add a CONFIG_INIT_ALL_POSSIBLE for this rather than break them. 2) mips and sparc32 '#define cpu_possible_map phys_cpu_present_map'. Those archs simply have phys_cpu_present_map replaced everywhere. 3) Alpha defined cpu_possible_map to cpu_present_map; this is tricky so I just manipulate them both in sync. 4) IA64, cris and m32r have gratuitous 'extern cpumask_t cpu_possible_map' declarations. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Reviewed-by: Grant Grundler <grundler@parisc-linux.org> Tested-by: Tony Luck <tony.luck@intel.com> Acked-by: Ingo Molnar <mingo@elte.hu> Cc: Mike Travis <travis@sgi.com> Cc: ink@jurassic.park.msu.ru Cc: rmk@arm.linux.org.uk Cc: starvik@axis.com Cc: tony.luck@intel.com Cc: takata@linux-m32r.org Cc: ralf@linux-mips.org Cc: grundler@parisc-linux.org Cc: paulus@samba.org Cc: schwidefsky@de.ibm.com Cc: lethal@linux-sh.org Cc: wli@holomorphy.com Cc: davem@davemloft.net Cc: jdike@addtoit.com Cc: mingo@redhat.com
* cpuinit fixes in kernel/*Al Viro2008-11-301-1/+1
| | | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* cpumask: introduce new API, without changing anythingRusty Russell2008-11-061-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Impact: introduce new APIs We want to deprecate cpumasks on the stack, as we are headed for gynormous numbers of CPUs. Eventually, we want to head towards an undefined 'struct cpumask' so they can never be declared on stack. 1) New cpumask functions which take pointers instead of copies. (cpus_* -> cpumask_*) 2) Several new helpers to reduce requirements for temporary cpumasks (cpumask_first_and, cpumask_next_and, cpumask_any_and) 3) Helpers for declaring cpumasks on or offstack for large NR_CPUS (cpumask_var_t, alloc_cpumask_var and free_cpumask_var) 4) 'struct cpumask' for explicitness and to mark new-style code. 5) Make iterator functions stop at nr_cpu_ids (a runtime constant), not NR_CPUS for time efficiency and for smaller dynamic allocations in future. 6) cpumask_copy() so we can allocate less than a full cpumask eventually (for alloc_cpumask_var), and so we can eliminate the 'struct cpumask' definition eventually. 7) work_on_cpu() helper for doing task on a CPU, rather than saving old cpumask for current thread and manipulating it. 8) smp_call_function_many() which is smp_call_function_mask() except taking a cpumask pointer. Note that this patch simply introduces the new functions and leaves the obsolescent ones in place. This is to simplify the transition patches. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Ingo Molnar <mingo@elte.hu>
*-. Merge branches 'sched/devel', 'sched/cpu-hotplug', 'sched/cpusets' and ↵Ingo Molnar2008-10-081-2/+22
|\ \ | | | | | | | | | 'sched/urgent' into sched/core
| | * kernel/cpu.c: create a CPU_STARTING cpu_chain notifierManfred Spraul2008-09-081-0/+19
| |/ |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Right now, there is no notifier that is called on a new cpu, before the new cpu begins processing interrupts/softirqs. Various kernel function would need that notification, e.g. kvm works around by calling smp_call_function_single(), rcu polls cpu_online_map. The patch adds a CPU_STARTING notification. It also adds a helper function that sends the message to all cpu_chain handlers. Tested on x86-64. All other archs are untested. Especially on sparc, I'm not sure if I got it right. Signed-off-by: Manfred Spraul <manfred@colorfullife.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * kernel/cpu.c: Move the CPU_DYING notifiersManfred Spraul2008-09-061-2/+3
|/ | | | | | | | | | | | | | | | | | | | | | | | When a cpu is taken offline, the CPU_DYING notifiers are called on the dying cpu. According to <linux/notifiers.h>, the cpu should be "not running any task, not handling interrupts, soon dead". For the current implementation, this is not true: - __cpu_disable can fail. If it fails, then the cpu will remain alive and happy. - At least on x86, __cpu_disable() briefly enables the local interrupts to handle any outstanding interrupts. What about moving CPU_DYING down a few lines, behind the __cpu_disable() line? There are only two CPU_DYING handlers in the kernel right now: one in kvm, one in the scheduler. Both should work with the patch applied [and: I'm not sure if either one handles a failing __cpu_disable()] The patch survives simple offlining a cpu. kvm untested due to lack of a test setup. Signed-off-By: Manfred Spraul <manfred@colorfullife.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* cpu hotplug: s390 doesn't support additional_cpus anymore.Heiko Carstens2008-08-121-1/+1
| | | | | | | | | | | s390 doesn't support the additional_cpus kernel parameter anymore since a long time. So we better update the code and documentation to reflect that. Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* sched, cpu hotplug: fix set_cpus_allowed() use in hotplug callbacksDmitry Adamushko2008-08-111-3/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Mark Langsdorf reported: > One of my co-workers noticed that the powernow-k8 > driver no longer restarts when a CPU core is > hot-disabled and then hot-enabled on AMD quad-core > systems. > > The following comands work fine on 2.6.26 and fail > on 2.6.27-rc1: > > echo 0 > /sys/devices/system/cpu/cpu3/online > echo 1 > /sys/devices/system/cpu/cpu3/online > find /sys -name cpufreq > > For 2.6.26, the find will return a cpufreq > directory for each processor. In 2.6.27-rc1, > the cpu3 directory is missing. > > After digging through the code, the following > logic is failing when the core is hot-enabled > at runtime. The code works during the boot > sequence. > > cpumask_t = current->cpus_allowed; > set_cpus_allowed_ptr(current, &cpumask_of_cpu(cpu)); > if (smp_processor_id() != cpu) > return -ENODEV; So set the CPU active before calling the CPU_ONLINE notifier chain, there are a handful of notifiers that use set_cpus_allowed(). This fix also solves the problem with x86-microcode. I've sent alternative patches for microcode, but as this "rely on set_cpus_allowed_ptr() being workable in cpu-hotplug(CPU_ONLINE, ...)" assumption seems to be more broad than what we thought, perhaps this fix should be applied. With this patch we define that by the moment CPU_ONLINE is being sent, a 'cpu' is online and ready for tasks to be migrated onto it. Signed-off-by: Dmitry Adamushko <dmitry.adamushko@gmail.com> Reported-by: Mark Langsdorf <mark.langsdorf@amd.com> Tested-by: Mark Langsdorf <mark.langsdorf@amd.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* Merge branch 'linus' into cpus4096Ingo Molnar2008-07-281-11/+5
|\ | | | | | | | | | | | | | | Conflicts: kernel/stop_machine.c Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * stop_machine(): stop_machine_run() changed to use cpu maskRusty Russell2008-07-281-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | Instead of a "cpu" arg with magic values NR_CPUS (any cpu) and ~0 (all cpus), pass a cpumask_t. Allow NULL for the common case (where we don't care which CPU the function is run on): temporary cpumask_t's are usually considered bad for stack space. This deprecates stop_machine_run, to be removed soon when all the callers are dead. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
| * Hotplug CPU: don't check cpu_online after take_cpu_downRusty Russell2008-07-281-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Akinobu points out that if take_cpu_down() succeeds, the cpu must be offline. Remove the cpu_online() check, and put a BUG_ON(). Quoting Akinobu Mita: Actually the cpu_online() check was necessary before appling this stop_machine: simplify patch. With old __stop_machine_run(), __stop_machine_run() could succeed (return !IS_ERR(p) value) even if take_cpu_down() returned non-zero value. The return value of take_cpu_down() was obtained through kthread_stop().. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Cc: "Akinobu Mita" <akinobu.mita@gmail.com>
| * Simplify stop_machineRusty Russell2008-07-281-10/+3
| | | | | | | | | | | | | | | | | | | | stop_machine creates a kthread which creates kernel threads. We can create those threads directly and simplify things a little. Some care must be taken with CPU hotunplug, which has special needs, but that code seems more robust than it was in the past. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
* | cpu masks: optimize and clean up cpumask_of_cpu()Linus Torvalds2008-07-281-108/+20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Clean up and optimize cpumask_of_cpu(), by sharing all the zero words. Instead of stupidly generating all possible i=0...NR_CPUS 2^i patterns creating a huge array of constant bitmasks, realize that the zero words can be shared. In other words, on a 64-bit architecture, we only ever need 64 of these arrays - with a different bit set in one single world (with enough zero words around it so that we can create any bitmask by just offsetting in that big array). And then we just put enough zeroes around it that we can point every single cpumask to be one of those things. So when we have 4k CPU's, instead of having 4k arrays (of 4k bits each, with one bit set in each array - 2MB memory total), we have exactly 64 arrays instead, each 8k bits in size (64kB total). And then we just point cpumask(n) to the right position (which we can calculate dynamically). Once we have the right arrays, getting "cpumask(n)" ends up being: static inline const cpumask_t *get_cpu_mask(unsigned int cpu) { const unsigned long *p = cpu_bit_bitmap[1 + cpu % BITS_PER_LONG]; p -= cpu / BITS_PER_LONG; return (const cpumask_t *)p; } This brings other advantages and simplifications as well: - we are not wasting memory that is just filled with a single bit in various different places - we don't need all those games to re-create the arrays in some dense format, because they're already going to be dense enough. if we compile a kernel for up to 4k CPU's, "wasting" that 64kB of memory is a non-issue (especially since by doing this "overlapping" trick we probably get better cache behaviour anyway). [ mingo@elte.hu: Converted Linus's mails into a commit. See: http://lkml.org/lkml/2008/7/27/156 http://lkml.org/lkml/2008/7/28/320 Also applied a family filter - which also has the side-effect of leaving out the bits where Linus calls me an idio... Oh, never mind ;-) ] Signed-off-by: Ingo Molnar <mingo@elte.hu> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Al Viro <viro@ZenIV.linux.org.uk> Cc: Mike Travis <travis@sgi.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* | cpumask: export cpumask_of_cpu_mapIngo Molnar2008-07-261-0/+2
| | | | | | | | | | | | | | | | | | | | | | fix: ERROR: "cpumask_of_cpu_map" [drivers/acpi/processor.ko] undefined! ERROR: "cpumask_of_cpu_map" [arch/x86/kernel/microcode.ko] undefined! ERROR: "cpumask_of_cpu_map" [arch/x86/kernel/cpu/cpufreq/speedstep-ich.ko] undefined! ERROR: "cpumask_of_cpu_map" [arch/x86/kernel/cpu/cpufreq/acpi-cpufreq.ko] undefined! Signed-off-by: Ingo Molnar <mingo@elte.hu>
* | cpumask: put cpumask_of_cpu_map in the initdata sectionMike Travis2008-07-261-3/+5
| | | | | | | | | | | | | | | | | | | | | | | | * Create the cpumask_of_cpu_map statically in the init data section using NR_CPUS but replace it during boot up with one sized by nr_cpu_ids (num possible cpus). Signed-off-by: Mike Travis <travis@sgi.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Jack Steiner <steiner@sgi.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* | cpumask: make cpumask_of_cpu_map genericMike Travis2008-07-261-0/+109
|/ | | | | | | | | | | | | | | | | | | | | | If an arch doesn't define cpumask_of_cpu_map, create a generic statically-initialized one for them. This allows removal of the buggy cpumask_of_cpu() macro (&cpumask_of_cpu() gives address of out-of-scope var). An arch with NR_CPUS of 4096 probably wants to allocate this itself based on the actual number of CPUs, since otherwise they're using 2MB of rodata (1024 cpus means 128k). That's what CONFIG_HAVE_CPUMASK_OF_CPU_MAP is for (only x86/64 does so at the moment). In future as we support more CPUs, we'll need to resort to a get_cpu_map()/put_cpu_map() allocation scheme. Signed-off-by: Mike Travis <travis@sgi.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Jack Steiner <steiner@sgi.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* workqueues: make get_online_cpus() useable for work->func()Oleg Nesterov2008-07-251-0/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | workqueue_cpu_callback(CPU_DEAD) flushes cwq->thread under cpu_maps_update_begin(). This means that the multithreaded workqueues can't use get_online_cpus() due to the possible deadlock, very bad and very old problem. Introduce the new state, CPU_POST_DEAD, which is called after cpu_hotplug_done() but before cpu_maps_update_done(). Change workqueue_cpu_callback() to use CPU_POST_DEAD instead of CPU_DEAD. This means that create/destroy functions can't rely on get_online_cpus() any longer and should take cpu_add_remove_lock instead. [akpm@linux-foundation.org: fix CONFIG_SMP=n] Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Acked-by: Gautham R Shenoy <ego@in.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Max Krasnyansky <maxk@qualcomm.com> Cc: Paul Jackson <pj@sgi.com> Cc: Paul Menage <menage@google.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Vegard Nossum <vegard.nossum@gmail.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* Merge branch 'sched/for-linus' of ↵Linus Torvalds2008-07-231-6/+34
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'sched/for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: sched: hrtick_enabled() should use cpu_active() sched, x86: clean up hrtick implementation sched: fix build error, provide partition_sched_domains() unconditionally sched: fix warning in inc_rt_tasks() to not declare variable 'rq' if it's not needed cpu hotplug: Make cpu_active_map synchronization dependency clear cpu hotplug, sched: Introduce cpu_active_map and redo sched domain managment (take 2) sched: rework of "prioritize non-migratable tasks over migratable ones" sched: reduce stack size in isolated_cpu_setup() Revert parts of "ftrace: do not trace scheduler functions" Fixed up conflicts in include/asm-x86/thread_info.h (due to the TIF_SINGLESTEP unification vs TIF_HRTICK_RESCHED removal) and kernel/sched_fair.c (due to cpu_active_map vs for_each_cpu_mask_nr() introduction).
| * cpu hotplug: Make cpu_active_map synchronization dependency clearMax Krasnyansky2008-07-181-0/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This goes on top of the cpu_active_map (take 2) patch. Currently we depend on the stop_machine to provide nescessesary synchronization for the cpu_active_map updates. As Dmitry Adamushko pointed this is fragile and is not much clearer than the previous scheme. In other words we do not want to depend on the internal stop machine operation here. So make the synchronization rules clear by doing synchronize_sched() after clearing out cpu active bit. Tested on quad-Core2 with: while true; do for i in 1 2 3; do echo 0 > /sys/devices/system/cpu/cpu$i/online done for i in 1 2 3; do echo 1 > /sys/devices/system/cpu/cpu$i/online done done and stress -c 200 No lockdep, preempt or other complaints. Signed-off-by: Max Krasnyansky <maxk@qualcomm.com> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * cpu hotplug, sched: Introduce cpu_active_map and redo sched domain managment ↵Max Krasnyansky2008-07-181-6/+24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | (take 2) This is based on Linus' idea of creating cpu_active_map that prevents scheduler load balancer from migrating tasks to the cpu that is going down. It allows us to simplify domain management code and avoid unecessary domain rebuilds during cpu hotplug event handling. Please ignore the cpusets part for now. It needs some more work in order to avoid crazy lock nesting. Although I did simplfy and unify domain reinitialization logic. We now simply call partition_sched_domains() in all the cases. This means that we're using exact same code paths as in cpusets case and hence the test below cover cpusets too. Cpuset changes to make rebuild_sched_domains() callable from various contexts are in the separate patch (right next after this one). This not only boots but also easily handles while true; do make clean; make -j 8; done and while true; do on-off-cpu 1; done at the same time. (on-off-cpu 1 simple does echo 0/1 > /sys/.../cpu1/online thing). Suprisingly the box (dual-core Core2) is quite usable. In fact I'm typing this on right now in gnome-terminal and things are moving just fine. Also this is running with most of the debug features enabled (lockdep, mutex, etc) no BUG_ONs or lockdep complaints so far. I believe I addressed all of the Dmitry's comments for original Linus' version. I changed both fair and rt balancer to mask out non-active cpus. And replaced cpu_is_offline() with !cpu_active() in the main scheduler code where it made sense (to me). Signed-off-by: Max Krasnyanskiy <maxk@qualcomm.com> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Gregory Haskins <ghaskins@novell.com> Cc: dmitry.adamushko@gmail.com Cc: pj@sgi.com Signed-off-by: Ingo Molnar <mingo@elte.hu>
* | Merge branch 'linus' into cpus4096Ingo Molnar2008-07-181-0/+1
|\ \ | |/ | | | | | | | | | | | | Conflicts: drivers/acpi/processor_throttling.c Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * force offline the processor during hot-removalZhang Rui2008-07-161-0/+1
| | | | | | | | | | | | | | | | | | | | | | The ACPI device node for the cpu has already been unregistered when acpi_processor_handle_eject is called. Thus we should offline the cpu and continue, rather than a failure here. http://bugzilla.kernel.org/show_bug.cgi?id=9772 Signed-off-by: Zhang Rui <rui.zhang@intel.com> Signed-off-by: Len Brown <len.brown@intel.com> Signed-off-by: Andi Kleen <ak@linux.intel.com>
* | Merge branch 'linus' into cpus4096Ingo Molnar2008-07-161-0/+24
|\ \ | |/ | | | | | | | | | | | | | | | | Conflicts: arch/x86/xen/smp.c kernel/sched_rt.c net/iucv/iucv.c Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * sched: Move cpu masks from kernel/sched.c into kernel/cpu.cMax Krasnyansky2008-06-061-0/+24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | kernel/cpu.c seems a more logical place for those maps since they do not really have much to do with the scheduler these days. kernel/cpu.c is now built for the UP kernel too, but it does not affect the size the kernel sections. $ size vmlinux before text data bss dec hex filename 3313797 307060 310352 3931209 3bfc49 vmlinux after text data bss dec hex filename 3313797 307060 310352 3931209 3bfc49 vmlinux Signed-off-by: Max Krasnyansky <maxk@qualcomm.com> Cc: pj@sgi.com Cc: menage@google.com Cc: rostedt@goodmis.org Cc: mingo@elte.hu Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
* | core: use performance variant for_each_cpu_mask_nrMike Travis2008-05-231-1/+1
|/ | | | | | | | | | | | Change references from for_each_cpu_mask to for_each_cpu_mask_nr where appropriate Reviewed-by: Paul Jackson <pj@sgi.com> Reviewed-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Mike Travis <travis@sgi.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
* kernel: replace remaining __FUNCTION__ occurrencesHarvey Harrison2008-04-301-2/+2
| | | | | | | | | __FUNCTION__ is gcc-specific, use __func__ Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* simplify cpu_hotplug_begin()/put_online_cpus()Oleg Nesterov2008-04-291-20/+10
| | | | | | | | | | | | | | | | | cpu_hotplug_begin() must be always called under cpu_add_remove_lock, this means that only one process can be cpu_hotplug.active_writer. So we don't need the cpu_hotplug.writer_queue, we can wake up the ->active_writer directly. Also, fix the comment. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: Dipankar Sarma <dipankar@in.ibm.com> Acked-by: Gautham R Shenoy <ego@in.ibm.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* cpu: fix section mismatch warning in reference to register_cpu_notifierSam Ravnborg2008-04-291-1/+1
| | | | | | | | | | | | | | | | | | | Fix following warnings: WARNING: vmlinux.o(.text+0xc60): Section mismatch in reference from the function kvm_init() to the function .cpuinit.text:register_cpu_notifier() WARNING: vmlinux.o(.text+0x33869a): Section mismatch in reference from the function xfs_icsb_init_counters() to the function .cpuinit.text:register_cpu_notifier() WARNING: vmlinux.o(.text+0x5556a1): Section mismatch in reference from the function acpi_processor_install_hotplug_notify() to the function .cpuinit.text:register_cpu_notifier() WARNING: vmlinux.o(.text+0xfe6b28): Section mismatch in reference from the function cpufreq_register_driver() to the function .cpuinit.text:register_cpu_notifier() register_cpu_notifier() are only really defined when HOTPLUG_CPU is enabled. So references to the function are OK. Annotate it with __ref so we do not get warnings from callers and do not get warnings for the functions/data used by register_cpu_notifier(). Signed-off-by: Sam Ravnborg <sam@ravnborg.org> Cc: Gautham R Shenoy <ego@in.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* cpu: fix section mismatch warnings in *cpu_downSam Ravnborg2008-04-291-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | Fix following warnings: WARNING: vmlinux.o(.text+0x75c8d): Section mismatch in reference from the function take_cpu_down() to the variable .cpuinit.data:cpu_chain WARNING: vmlinux.o(.text+0x75d2a): Section mismatch in reference from the function _cpu_down() to the variable .cpuinit.data:cpu_chain WARNING: vmlinux.o(.text+0x75d4d): Section mismatch in reference from the function _cpu_down() to the variable .cpuinit.data:cpu_chain WARNING: vmlinux.o(.text+0x75de4): Section mismatch in reference from the function _cpu_down() to the variable .cpuinit.data:cpu_chain WARNING: vmlinux.o(.text+0x75e33): Section mismatch in reference from the function _cpu_down() to the variable .cpuinit.data:cpu_chain cpu_down is only used from code surrounded by HOTPLUG_CPU so any references to __cpuinit is OK. Add a few __ref to tech modpost to ignore the references. This is just papering over the fact that the cpu hotplug code is fragile with respect to use of HOTPLUG_CPU and in many cases rely on __cpuinit to get rid of code when HOTPLUG_CPU is not enabled. For now this is the least invasive change. Signed-off-by: Sam Ravnborg <sam@ravnborg.org> Cc: Gautham R Shenoy <ego@in.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* cpu: fix section mismatch warning in unregister_cpu_notifierSam Ravnborg2008-04-291-1/+1
| | | | | | | | | | | | | | | | Fix following warning: WARNING: vmlinux.o(.text+0x75f4e): Section mismatch in reference from the function unregister_cpu_notifier() to the variable .cpuinit.data:cpu_chain We know that unregister_cpu_notifier is using HOTPLUG_CPU stuff - so ignore these references. Annotating unregister_cpu_notifier had been another option but this caused far more warnings since not all callers were annotated __cpuinit. Signed-off-by: Sam Ravnborg <sam@ravnborg.org> Cc: Gautham R Shenoy <ego@in.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* generic: use new set_cpus_allowed_ptr functionMike Travis2008-04-191-3/+3
| | | | | | | | | | | | | | | | | * Use new set_cpus_allowed_ptr() function added by previous patch, which instead of passing the "newly allowed cpus" cpumask_t arg by value, pass it by pointer: -int set_cpus_allowed(struct task_struct *p, cpumask_t new_mask) +int set_cpus_allowed_ptr(struct task_struct *p, const cpumask_t *new_mask) * Modify CPU_MASK_ALL Depends on: [sched-devel]: sched: add new set_cpus_allowed_ptr function Signed-off-by: Mike Travis <travis@sgi.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* cpu: fix section mismatch warnings for enable_nonboot_cpusSam Ravnborg2008-02-081-1/+1
| | | | | | | | | | | | | | | Fix following warning: WARNING: o-x86_64/kernel/built-in.o(.text+0x36d8b): Section mismatch in reference from the function enable_nonboot_cpus() to the function .cpuinit.text:_cpu_up() enable_nonboot_cpus() are used solely from CONFIG_CONFIG_PM_SLEEP_SMP=y and PM_SLEEP_SMP imply HOTPLUG_CPU therefore the reference to _cpu_up() is valid. Annotate enable_nonboot_cpus() with __ref to silence modpost. Signed-off-by: Sam Ravnborg <sam@ravnborg.org> Cc: Gautham R Shenoy <ego@in.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* cpu-hotplug: replace per-subsystem mutexes with get_online_cpus()Gautham R Shenoy2008-01-251-4/+0
| | | | | | | | | This patch converts the known per-subsystem mutexes to get_online_cpus put_online_cpus. It also eliminates the CPU_LOCK_ACQUIRE and CPU_LOCK_RELEASE hotplug notification events. Signed-off-by: Gautham R Shenoy <ego@in.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* cpu-hotplug: replace lock_cpu_hotplug() with get_online_cpus()Gautham R Shenoy2008-01-251-5/+5
| | | | | | | | | | | | | | | | | Replace all lock_cpu_hotplug/unlock_cpu_hotplug from the kernel and use get_online_cpus and put_online_cpus instead as it highlights the refcount semantics in these operations. The new API guarantees protection against the cpu-hotplug operation, but it doesn't guarantee serialized access to any of the local data structures. Hence the changes needs to be reviewed. In case of pseries_add_processor/pseries_remove_processor, use cpu_maps_update_begin()/cpu_maps_update_done() as we're modifying the cpu_present_map there. Signed-off-by: Gautham R Shenoy <ego@in.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* cpu-hotplug: refcount based cpu hotplugGautham R Shenoy2008-01-251-41/+111
| | | | | | | | | | | | | | | | This patch implements a Refcount + Waitqueue based model for cpu-hotplug. Now, a thread which wants to prevent cpu-hotplug, will bump up a global refcount and the thread which wants to perform a cpu-hotplug operation will block till the global refcount goes to zero. The readers, if any, during an ongoing cpu-hotplug operation are blocked until the cpu-hotplug operation is over. Signed-off-by: Gautham R Shenoy <ego@in.ibm.com> Signed-off-by: Paul Jackson <pj@sgi.com> [For !CONFIG_HOTPLUG_CPU ] Signed-off-by: Ingo Molnar <mingo@elte.hu>
* CPU HOTPLUG: avoid hotadd when proper possible_map isn't specifiedKAMEZAWA Hiroyuki2007-10-191-0/+9
| | | | | | | | | | | | | cpu-hot-add should be fail if cpu is not set in cpu_possible_map. If go ahead, the system will panic soon. Especially, arch which requires additional_cpus= parameter should handle this. Tested on ia64. [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* Use helpers to obtain task pid in printksPavel Emelyanov2007-10-191-1/+2
| | | | | | | | | | | | | | | | The task_struct->pid member is going to be deprecated, so start using the helpers (task_pid_nr/task_pid_vnr/task_pid_nr_ns) in the kernel. The first thing to start with is the pid, printed to dmesg - in this case we may safely use task_pid_nr(). Besides, printks produce more (much more) than a half of all the explicit pid usage. [akpm@linux-foundation.org: git-drm went and changed lots of stuff] Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Cc: Dave Airlie <airlied@linux.ie> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* cpu hotplug: cpu: deliver CPU_UP_CANCELED only to NOTIFY_OKed callbacks with ↵Akinobu Mita2007-10-181-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | CPU_UP_PREPARE The functions in a CPU notifier chain is called with CPU_UP_PREPARE event before making the CPU online. If one of the callback returns NOTIFY_BAD, it stops to deliver CPU_UP_PREPARE event, and CPU online operation is canceled. Then CPU_UP_CANCELED event is delivered to the functions in a CPU notifier chain again. This CPU_UP_CANCELED event is delivered to the functions which have been called with CPU_UP_PREPARE, not delivered to the functions which haven't been called with CPU_UP_PREPARE. The problem that makes existing cpu hotplug error handlings complex is that the CPU_UP_CANCELED event is delivered to the function that has returned NOTIFY_BAD, too. Usually we don't expect to call destructor function against the object that has failed to initialize. It is like: err = register_something(); if (err) { unregister_something(); return err; } So it is natural to deliver CPU_UP_CANCELED event only to the functions that have returned NOTIFY_OK with CPU_UP_PREPARE event and not to call the function that have returned NOTIFY_BAD. This is what this patch is doing. Otherwise, every cpu hotplug notifiler has to track whether notifiler event is failed or not for each cpu. (drivers/base/topology.c is doing this with topology_dev_map) Similary this patch makes same thing with CPU_DOWN_PREPARE and CPU_DOWN_FAILED evnets. Acked-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Cc: Gautham R Shenoy <ego@in.ibm.com> Cc: Oleg Nesterov <oleg@tv-sign.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* PM: Fix dependencies of CONFIG_SUSPEND and CONFIG_HIBERNATIONRafael J. Wysocki2007-08-311-2/+2
| | | | | | | | | | | | Dependencies of CONFIG_SUSPEND and CONFIG_HIBERNATION introduced by commit 296699de6bdc717189a331ab6bbe90e05c94db06 "Introduce CONFIG_SUSPEND for suspend-to-Ram and standby" are incorrect, as they don't cover the facts that (1) not all architectures support suspend and (2) SMP hibernation is only possible on X86 and PPC64 (if CONFIG_PPC64_SWSUSP is set). Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* HOTPLUG: Add CPU_DYING notifierAvi Kivity2007-07-161-2/+14
| | | | | | | | | | | | KVM wants a notification when a cpu is about to die, so it can disable hardware extensions, but at a time when user processes cannot be scheduled on the cpu, so it doesn't try to use virtualization extensions after they have been disabled. This adds a CPU_DYING notification. The notification is called in atomic context on the doomed cpu. Signed-off-by: Avi Kivity <avi@qumranet.com>
* microcode: use suspend-related CPU hotplug notificationsRafael J. Wysocki2007-05-091-10/+0
| | | | | | | | | | | | | Make the microcode driver use the suspend-related CPU hotplug notifications to handle the CPU hotplug events occuring during system-wide suspend and resume transitions. Remove the global variable suspend_cpu_hotplug previously used for this purpose. Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Cc: Gautham R Shenoy <ego@in.ibm.com> Cc: Pavel Machek <pavel@ucw.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* Add suspend-related notifications for CPU hotplugRafael J. Wysocki2007-05-091-16/+18
| | | | | | | | | | | | | | | | | | | | | Since nonboot CPUs are now disabled after tasks and devices have been frozen and the CPU hotplug infrastructure is used for this purpose, we need special CPU hotplug notifications that will help the CPU-hotplug-aware subsystems distinguish normal CPU hotplug events from CPU hotplug events related to a system-wide suspend or resume operation in progress. This patch introduces such notifications and causes them to be used during suspend and resume transitions. It also changes all of the CPU-hotplug-aware subsystems to take these notifications into consideration (for now they are handled in the same way as the corresponding "normal" ones). [oleg@tv-sign.ru: cleanups] Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Cc: Gautham R Shenoy <ego@in.ibm.com> Cc: Pavel Machek <pavel@ucw.cz> Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* Remove kthread_bind() call from _cpu_down()Gautham R Shenoy2007-05-091-4/+0
| | | | | | | | | | | | | | | | | We are anyway kthread_stop()ping other per-cpu kernel threads after move_task_off_dead_cpu(), so we can do it with the stop_machine_run thread as well. I just checked with Vatsa if there was any subtle reason why they had put in the kthread_bind() in cpu.c. Vatsa cannot seem to recollect any and I can't see any. So let us just remove the kthread_bind. Signed-off-by: Gautham R Shenoy <ego@in.ibm.com> Cc: Oleg Nesterov <oleg@tv-sign.ru> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: "Rafael J. Wysocki" <rjw@sisk.pl> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* call cpu_chain with CPU_DOWN_FAILED if CPU_DOWN_PREPARE failedHeiko Carstens2007-05-091-9/+10
| | | | | | | | | | | | | | This makes cpu hotplug symmetrical: if CPU_UP_PREPARE fails we get CPU_UP_CANCELED, so we can undo what ever happened on PREPARE. The same should happen for CPU_DOWN_PREPARE. [akpm@linux-foundation.org: fix for reduce-size-of-task_struct-on-64-bit-machines] Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com> Cc: Gautham Shenoy <ego@in.ibm.com> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Oleg Nesterov <oleg@tv-sign.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* Define and use new events,CPU_LOCK_ACQUIRE and CPU_LOCK_RELEASEGautham R Shenoy2007-05-091-5/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is an attempt to provide an alternate mechanism for postponing a hotplug event instead of using a global mechanism like lock_cpu_hotplug. The proposal is to add two new events namely CPU_LOCK_ACQUIRE and CPU_LOCK_RELEASE. The notification for these two events would be sent out before and after a cpu_hotplug event respectively. During the CPU_LOCK_ACQUIRE event, a cpu-hotplug-aware subsystem is supposed to acquire any per-subsystem hotcpu mutex ( Eg. workqueue_mutex in kernel/workqueue.c ). During the CPU_LOCK_RELEASE release event the cpu-hotplug-aware subsystem is supposed to release the per-subsystem hotcpu mutex. The reasons for defining new events as opposed to reusing the existing events like CPU_UP_PREPARE/CPU_UP_FAILED/CPU_ONLINE for locking/unlocking of per-subsystem hotcpu mutexes are as follow: - CPU_LOCK_ACQUIRE: All hotcpu mutexes are taken before subsystems start handling pre-hotplug events like CPU_UP_PREPARE/CPU_DOWN_PREPARE etc, thus ensuring a clean handling of these events. - CPU_LOCK_RELEASE: The hotcpu mutexes will be released only after all subsystems have handled post-hotplug events like CPU_DOWN_FAILED, CPU_DEAD,CPU_ONLINE etc thereby ensuring that there are no subsequent clashes amongst the interdependent subsystems after a cpu hotplugs. This patch also uses __raw_notifier_call chain in _cpu_up to take care of the dependency between the two consequetive calls to raw_notifier_call_chain. [akpm@linux-foundation.org: fix a bug] Signed-off-by: Gautham R Shenoy <ego@in.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>