aboutsummaryrefslogtreecommitdiffstats
path: root/arch/i386/kernel/smpboot.c
Commit message (Collapse)AuthorAgeFilesLines
* [PATCH] i383 numa: fix numaq/summit apicid conflictKeith Mannthey2006-10-031-1/+1
| | | | | | | | | | | This allows numaq to properly align cpus to their given node during boot. Pass logical apicid to apicid_to_node and allow the summit sub-arch to use physical apicid (hard_smp_processor_id()). Tested against numaq and summit based systems with no issues. Signed-off-by: Keith Mannthey <kmannth@us.ibm.com> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] cpumask: export node_to_cpu_mask consistentlyGreg Banks2006-10-021-0/+1
| | | | | | | | | cpumask: ensure that node_to_cpumask() is available to modules for all supported combinations of architecture and CONFIG_NUMA. Signed-off-by: Greg Banks <gnb@melbourne.sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] completions: lockdep annotate on stack completionsPeter Zijlstra2006-10-011-1/+1
| | | | | | | | | | | All on stack DECLARE_COMPLETIONs should be replaced by: DECLARE_COMPLETION_ONSTACK Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Ingo Molnar <mingo@elte.hu> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] convert i386 Summit subarch to use SRAT info for apicid_to_node callskeith mannthey2006-09-291-1/+4
| | | | | | | | | | | | | | | | | | | | | | | | | Convert the i386 summit subarch apicid_to_node to use node information provided by the SRAT. It was discussed a little on LKML a few weeks ago and was seen as an acceptable fix. The current way of obtaining the nodeid static inline int apicid_to_node(int logical_apicid) { return logical_apicid >> 5; } is just not correct for all summit systems/bios. Assuming the apicid matches the Linux node number require a leap of faith that the bios mapped out the apicids a set way. Modern summit HW (IBM x460) does not layout its bios in the manner for various reasons and is unable to boot i386 numa. The best way to get the correct apicid to node information is from the SRAT table during boot. It lays out what apicid belongs to what node. I use this information to create a table for use at run time. Signed-off-by: Keith Mannthey <kmannth@us.ibm.com> Cc: Andi Kleen <ak@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] i386: don't taint UP K7's running SMP kernels.Dave Jones2006-09-261-0/+3
| | | | | | | | | | | | | We have a test that looks for invalid pairings of certain athlon/durons that weren't designed for SMP, and taint accordingly (with 'S') if we find such a configuration. However, this test shouldn't fire if there's only a single CPU present. It's perfectly valid for an SMP kernel to boot on UP hardware for example. AK: changed to num_possible_cpus() Signed-off-by: Dave Jones <davej@redhat.com> Signed-off-by: Andi Kleen <ak@suse.de>
* [PATCH] i386: Replace i386 open-coded cmdline parsing withRusty Russell2006-09-261-0/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch replaces the open-coded early commandline parsing throughout the i386 boot code with the generic mechanism (already used by ppc, powerpc, ia64 and s390). The code was inconsistent with whether it deletes the option from the cmdline or not, meaning some of these will get passed through the environment into init. This transformation is mainly mechanical, but there are some notable parts: 1) Grammar: s/linux never set's it up/linux never sets it up/ 2) Remove hacked-in earlyprintk= option scanning. When someone actually implements CONFIG_EARLY_PRINTK, then they can use early_param(). [AK: actually it is implemented, but I'm adding the early_param it in the next x86-64 patch] 3) Move declaration of generic_apic_probe() from setup.c into asm/apic.h 4) Various parameters now moved into their appropriate files (thanks Andi). 5) All parse functions which examine arg need to check for NULL, except one where it has subtle humor value. AK: readded acpi_sci handling which was completely dropped AK: moved some more variables into acpi/boot.c Cc: len.brown@intel.com Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Andi Kleen <ak@suse.de>
* [PATCH] i386/x86-64: Fix NMI watchdog suspend/resumeShaohua Li2006-09-261-1/+2
| | | | | | | | | | | | | | | | | | | | | Making NMI suspend/resume work with SMP. We use CPU hotplug to offline APs in SMP suspend/resume. Only BSP executes sysdev's .suspend/.resume method. APs should follow CPU hotplug code path. And: +From: Don Zickus <dzickus@redhat.com> Makes the start/stop paths of nmi watchdog more robust to handle the suspend/resume cases more gracefully. AK: I merged the two patches together Signed-off-by: Shaohua Li <shaohua.li@intel.com> Signed-off-by: Andi Kleen <ak@suse.de> Cc: Don Zickus <dzickus@redhat.com> Cc: Andi Kleen <ak@muc.de> Signed-off-by: Andrew Morton <akpm@osdl.org>
* [PATCH] i386: fix flat mode numa on a real numa systemkeith mannthey2006-09-251-1/+5
| | | | | | | | | | | | | | | | If there is only 1 node in the system cpus should think they are apart of some other node. If cases where a real numa system boots the Flat numa option make sure the cpus don't claim to be apart on a non-existent node. Signed-off-by: Keith Mannthey <kmannth@us.ibm.com> Cc: Andy Whitcroft <apw@shadowen.org> Cc: Dave Hansen <haveblue@us.ibm.com> Cc: Andi Kleen <ak@suse.de> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] synchronize_tsc() fixesAndrew Morton2006-07-311-29/+33
| | | | | | | | | | | | | | - Move the tsc synchronisation variables into a struct, mark it __initdata - local `realdelta' wants to be 64-bit - Print the skew for negative skews, as well as for positive ones - remove dead code Cc: john stultz <johnstul@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* Remove obsolete #include <linux/config.h>Jörn Engel2006-06-301-1/+0
| | | | | Signed-off-by: Jörn Engel <joern@wohnheim.fh-wedel.de> Signed-off-by: Adrian Bunk <bunk@stusta.de>
* [PATCH] sched: mc/smt power savings sched policySiddha, Suresh B2006-06-271-3/+5
| | | | | | | | | | | | | | | | | | | | | | sysfs entries 'sched_mc_power_savings' and 'sched_smt_power_savings' in /sys/devices/system/cpu/ control the MC/SMT power savings policy for the scheduler. Based on the values (1-enable, 0-disable) for these controls, sched groups cpu power will be determined for different domains. When power savings policy is enabled and under light load conditions, scheduler will minimize the physical packages/cpu cores carrying the load and thus conserving power(with a perf impact based on the workload characteristics... see OLS 2005 CMP kernel scheduler paper for more details..) Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Cc: Con Kolivas <kernel@kolivas.org> Cc: "Chen, Kenneth W" <kenneth.w.chen@intel.com> Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] i386: move phys_proc_id and cpu_core_id to cpuinfo_x86Rohit Seth2006-06-271-11/+5
| | | | | | | | | | | Move the phys_core_id and cpu_core_id to cpuinfo_x86 structure. Similar patch for x86_64 is already accepted by Andi earlier this week. [akpm@osdl.org: fix warning] Signed-off-by: Rohit Seth <rohitseth@google.com> Cc: Andi Kleen <ak@muc.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] x86: cpu_init(): avoid GFP_KERNEL allocation while atomicShaohua Li2006-06-271-0/+13
| | | | | | | | | | | | | | | The patch fixes two issues: 1. cpu_init is called with interrupt disabled. Allocating gdt table there isn't good at runtime. 2. gdt table page cause memory leak in CPU hotplug case. Signed-off-by: Shaohua Li <shaohua.li@intel.com> Cc: Ashok Raj <ashok.raj@intel.com> Cc: Zachary Amsden <zach@vmware.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] x86_64: nmi watchdog header cleanupDon Zickus2006-06-261-0/+1
| | | | | | | | Misc header cleanup for nmi watchdog. Signed-off-by: Don Zickus <dzickus@redhat.com> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] cpu_relax(): smpboot.cAndreas Mohr2006-06-251-6/+8
| | | | | | | | | | Add cpu_relax() to various smpboot.c init loops. cpu_relax() always implies a barrier (according to Arjan), so remove those as well. Signed-off-by: Andreas Mohr <andi@lisas.de> Cc: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] Avoid printing pointless tsc skew msgsDave Jones2006-04-281-1/+3
| | | | | | | | | | | | | These messages are kinda silly.. CPU#0 had 0 usecs TSC skew, fixed it up. CPU#1 had 0 usecs TSC skew, fixed it up. inspired from: http://bugzilla.kernel.org/attachment.cgi?id=7713&action=view Signed-off-by: Dave Jones <davej@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] sched: new sched domain for representing multi-coreSiddha, Suresh B2006-03-271-0/+24
| | | | | | | | | | | | | | | | | | | | | | | | | Add a new sched domain for representing multi-core with shared caches between cores. Consider a dual package system, each package containing two cores and with last level cache shared between cores with in a package. If there are two runnable processes, with this appended patch those two processes will be scheduled on different packages. On such systems, with this patch we have observed 8% perf improvement with specJBB(2 warehouse) benchmark and 35% improvement with CFP2000 rate(with 2 users). This new domain will come into play only on multi-core systems with shared caches. On other systems, this sched domain will be removed by domain degeneration code. This new domain can be also used for implementing power savings policy (see OLS 2005 CMP kernel scheduler paper for more details.. I will post another patch for power savings policy soon) Most of the arch/* file changes are for cpu_coregroup_map() implementation. Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] Check if cpu can be onlined before calling smp_prepare_cpu()Ashok Raj2006-03-251-15/+18
| | | | | | | | | | | | | | | | | | - Moved check for online cpu out of smp_prepare_cpu() - Moved default declaration of smp_prepare_cpu() to kernel/cpu.c - Removed lock_cpu_hotplug() from smp_prepare_cpu() to around it, since its called from cpu_up() as well now. - Removed clearing from cpu_present_map during cpu_offline as it breaks using cpu_up() directly during a subsequent online operation. Signed-off-by: Ashok Raj <ashok.raj@intel.com> Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com> Cc: "Li, Shaohua" <shaohua.li@intel.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] x86: SMP alternativesGerd Hoffmann2006-03-231-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Implement SMP alternatives, i.e. switching at runtime between different code versions for UP and SMP. The code can patch both SMP->UP and UP->SMP. The UP->SMP case is useful for CPU hotplug. With CONFIG_CPU_HOTPLUG enabled the code switches to UP at boot time and when the number of CPUs goes down to 1, and switches to SMP when the number of CPUs goes up to 2. Without CONFIG_CPU_HOTPLUG or on non-SMP-capable systems the code is patched once at boot time (if needed) and the tables are released afterwards. The changes in detail: * The current alternatives bits are moved to a separate file, the SMP alternatives code is added there. * The patch adds some new elf sections to the kernel: .smp_altinstructions like .altinstructions, also contains a list of alt_instr structs. .smp_altinstr_replacement like .altinstr_replacement, but also has some space to save original instruction before replaving it. .smp_locks list of pointers to lock prefixes which can be nop'ed out on UP. The first two are used to replace more complex instruction sequences such as spinlocks and semaphores. It would be possible to deal with the lock prefixes with that as well, but by handling them as special case the table sizes become much smaller. * The sections are page-aligned and padded up to page size, so they can be free if they are not needed. * Splitted the code to release init pages to a separate function and use it to release the elf sections if they are unused. Signed-off-by: Gerd Hoffmann <kraxel@suse.de> Signed-off-by: Chuck Ebbert <76306.1226@compuserve.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] x86: check for online cpus before bringing them upSrivatsa Vaddagiri2006-03-171-0/+10
| | | | | | | | | | | | | | | | Bryce reported a bug wherein offlining CPU0 (on x86 box) and then subsequently onlining it resulted in a lockup. On x86, CPU0 is never offlined. The subsequent attempt to online CPU0 doesn't take that into account. It actually tries to bootup the already booted CPU. Following patch fixes the problem (as acknowledged by Bryce). Please consider for inclusion in 2.6.16. Check if cpu is already online. Signed-off-by: Srivatsa Vaddagiri <vatsa@in.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] x86: fix broken SMP boot sequenceJames Bottomley2006-02-241-6/+0
| | | | | | | | | | | | | | | | Recent GDT changes broke the SMP boot sequence if the booting CPU is numbered anything other than zero. There's also a subtle source of error in that the boot time CPU now uses cpu_gdt_table (which is actually the GDT for booting CPUs in head.S). This patch fixes both problems by making GDT descriptors themselves allocated from a per_cpu area and switching to them in cpu_init(), which now means that cpu_gdt_table is exclusively used for booting CPUs again. Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com> Cc: Zachary Amsden <zach@vmware.com> Cc: Matt Tolentino <metolent@snoqualmie.dp.intel.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] x86: don't initialise cpu_possible_map to all onesAndrew Morton2006-02-101-4/+0
| | | | | | | | | | | | | | | | | | | | | | | | | Initialising cpu_possible_map to all-ones with CONFIG_HOTPLUG_CPU means that a) All for_each_cpu() loops will iterate across all NR_CPUS CPUs, rather than over possible ones. That can be quite expensive. b) Soon we'll be allocating per-cpu areas only for possible CPUs. So with CPU_MASK_ALL, we'll be wasting memory. I also switched voyager over to not use CPU_MASK_ALL in the non-CPU-hotplug case. Should be OK.. I note that parisc is also using CPU_MASK_ALL. Suggest that it stop doing that. Cc: James Bottomley <James.Bottomley@steeleye.com> Cc: Kyle McMartin <kyle@mcmartin.ca> Cc: Paul Jackson <pj@sgi.com> Cc: Ashok Raj <ashok.raj@intel.com> Cc: Zwane Mwaikambo <zwane@linuxpower.ca> Cc: Paul Jackson <pj@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] i386: fix task_pt_regs()akpm@osdl.org2006-01-121-2/+1
| | | | | | | | | | | | ) From: Al Viro <viro@ftp.linux.org.uk> task_pt_regs() needs the same offset-by-8 to match copy_thread() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] scheduler cache-hot-autodetectakpm@osdl.org2006-01-121-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ) From: Ingo Molnar <mingo@elte.hu> This is the latest version of the scheduler cache-hot-auto-tune patch. The first problem was that detection time scaled with O(N^2), which is unacceptable on larger SMP and NUMA systems. To solve this: - I've added a 'domain distance' function, which is used to cache measurement results. Each distance is only measured once. This means that e.g. on NUMA distances of 0, 1 and 2 might be measured, on HT distances 0 and 1, and on SMP distance 0 is measured. The code walks the domain tree to determine the distance, so it automatically follows whatever hierarchy an architecture sets up. This cuts down on the boot time significantly and removes the O(N^2) limit. The only assumption is that migration costs can be expressed as a function of domain distance - this covers the overwhelming majority of existing systems, and is a good guess even for more assymetric systems. [ People hacking systems that have assymetries that break this assumption (e.g. different CPU speeds) should experiment a bit with the cpu_distance() function. Adding a ->migration_distance factor to the domain structure would be one possible solution - but lets first see the problem systems, if they exist at all. Lets not overdesign. ] Another problem was that only a single cache-size was used for measuring the cost of migration, and most architectures didnt set that variable up. Furthermore, a single cache-size does not fit NUMA hierarchies with L3 caches and does not fit HT setups, where different CPUs will often have different 'effective cache sizes'. To solve this problem: - Instead of relying on a single cache-size provided by the platform and sticking to it, the code now auto-detects the 'effective migration cost' between two measured CPUs, via iterating through a wide range of cachesizes. The code searches for the maximum migration cost, which occurs when the working set of the test-workload falls just below the 'effective cache size'. I.e. real-life optimized search is done for the maximum migration cost, between two real CPUs. This, amongst other things, has the positive effect hat if e.g. two CPUs share a L2/L3 cache, a different (and accurate) migration cost will be found than between two CPUs on the same system that dont share any caches. (The reliable measurement of migration costs is tricky - see the source for details.) Furthermore i've added various boot-time options to override/tune migration behavior. Firstly, there's a blanket override for autodetection: migration_cost=1000,2000,3000 will override the depth 0/1/2 values with 1msec/2msec/3msec values. Secondly, there's a global factor that can be used to increase (or decrease) the autodetected values: migration_factor=120 will increase the autodetected values by 20%. This option is useful to tune things in a workload-dependent way - e.g. if a workload is cache-insensitive then CPU utilization can be maximized by specifying migration_factor=0. I've tested the autodetection code quite extensively on x86, on 3 P3/Xeon/2MB, and the autodetected values look pretty good: Dual Celeron (128K L2 cache): --------------------- migration cost matrix (max_cache_size: 131072, cpu: 467 MHz): --------------------- [00] [01] [00]: - 1.7(1) [01]: 1.7(1) - --------------------- cacheflush times [2]: 0.0 (0) 1.7 (1784008) --------------------- Here the slow memory subsystem dominates system performance, and even though caches are small, the migration cost is 1.7 msecs. Dual HT P4 (512K L2 cache): --------------------- migration cost matrix (max_cache_size: 524288, cpu: 2379 MHz): --------------------- [00] [01] [02] [03] [00]: - 0.4(1) 0.0(0) 0.4(1) [01]: 0.4(1) - 0.4(1) 0.0(0) [02]: 0.0(0) 0.4(1) - 0.4(1) [03]: 0.4(1) 0.0(0) 0.4(1) - --------------------- cacheflush times [2]: 0.0 (33900) 0.4 (448514) --------------------- Here it can be seen that there is no migration cost between two HT siblings (CPU#0/2 and CPU#1/3 are separate physical CPUs). A fast memory system makes inter-physical-CPU migration pretty cheap: 0.4 msecs. 8-way P3/Xeon [2MB L2 cache]: --------------------- migration cost matrix (max_cache_size: 2097152, cpu: 700 MHz): --------------------- [00] [01] [02] [03] [04] [05] [06] [07] [00]: - 19.2(1) 19.2(1) 19.2(1) 19.2(1) 19.2(1) 19.2(1) 19.2(1) [01]: 19.2(1) - 19.2(1) 19.2(1) 19.2(1) 19.2(1) 19.2(1) 19.2(1) [02]: 19.2(1) 19.2(1) - 19.2(1) 19.2(1) 19.2(1) 19.2(1) 19.2(1) [03]: 19.2(1) 19.2(1) 19.2(1) - 19.2(1) 19.2(1) 19.2(1) 19.2(1) [04]: 19.2(1) 19.2(1) 19.2(1) 19.2(1) - 19.2(1) 19.2(1) 19.2(1) [05]: 19.2(1) 19.2(1) 19.2(1) 19.2(1) 19.2(1) - 19.2(1) 19.2(1) [06]: 19.2(1) 19.2(1) 19.2(1) 19.2(1) 19.2(1) 19.2(1) - 19.2(1) [07]: 19.2(1) 19.2(1) 19.2(1) 19.2(1) 19.2(1) 19.2(1) 19.2(1) - --------------------- cacheflush times [2]: 0.0 (0) 19.2 (19281756) --------------------- This one has huge caches and a relatively slow memory subsystem - so the migration cost is 19 msecs. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Ashok Raj <ashok.raj@intel.com> Signed-off-by: Ken Chen <kenneth.w.chen@intel.com> Cc: <wilder@us.ibm.com> Signed-off-by: John Hawkes <hawkes@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] x86: GDT alignment fixZachary Amsden2006-01-061-0/+6
| | | | | | | | | | | | | | | | Make GDT page aligned and page padded to support running inside of a hypervisor. This prevents false sharing of the GDT page with other hot data, which is not allowed in Xen, and causes performance problems in VMware. Rather than go back to the old method of statically allocating the GDT (which wastes unneded space for non-present CPUs), the GDT for APs is allocated dynamically. Signed-off-by: Zachary Amsden <zach@vmware.com> Cc: "Seth, Rohit" <rohit.seth@intel.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] i386/x86-64 disable LAPIC completely for offline CPUShaohua Li2005-12-121-2/+1
| | | | | | | | | | | | | Disabling LAPIC timer isn't sufficient. In some situations, such as we enabled NMI watchdog, there is still unexpected interrupt (such as NMI) invoked in offline CPU. This also avoids offline CPU receives spurious interrupt and anything similar. Signed-off-by: Shaohua Li <shaohua.li@intel.com> Signed-off-by: Andi Kleen <ak@suse.de> Acked-by: "Seth, Rohit" <rohit.seth@intel.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* Merge x86-64 update from AndiLinus Torvalds2005-11-141-20/+53
|\
| * [PATCH] x86-64/i386: Intel HT, Multi core detection fixesSiddha, Suresh B2005-11-141-20/+53
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fields obtained through cpuid vector 0x1(ebx[16:23]) and vector 0x4(eax[14:25], eax[26:31]) indicate the maximum values and might not always be the same as what is available and what OS sees. So make sure "siblings" and "cpu cores" values in /proc/cpuinfo reflect the values as seen by OS instead of what cpuid instruction says. This will also fix the buggy BIOS cases (for example where cpuid on a single core cpu says there are "2" siblings, even when HT is disabled in the BIOS. http://bugzilla.kernel.org/show_bug.cgi?id=4359) Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* | [PATCH] sched: disable preempt in idle tasksNick Piggin2005-11-091-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Run idle threads with preempt disabled. Also corrected a bugs in arm26's cpu_idle (make it actually call schedule()). How did it ever work before? Might fix the CPU hotplugging hang which Nigel Cunningham noted. We think the bug hits if the idle thread is preempted after checking need_resched() and before going to sleep, then the CPU offlined. After calling stop_machine_run, the CPU eventually returns from preemption and into the idle thread and goes to sleep. The CPU will continue executing previous idle and have no chance to call play_dead. By disabling preemption until we are ready to explicitly schedule, this bug is fixed and the idle threads generally become more robust. From: alexs <ashepard@u.washington.edu> PPC build fix From: Yoichi Yuasa <yuasa@hh.iij4u.or.jp> MIPS build fix Signed-off-by: Nick Piggin <npiggin@suse.de> Signed-off-by: Yoichi Yuasa <yuasa@hh.iij4u.or.jp> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* | [PATCH] unexport phys_proc_id and cpu_core_idAdrian Bunk2005-11-071-2/+0
| | | | | | | | | | | | | | | | | | EXPORT_SYMBOL's for phys_proc_id and cpu_core_id were added this year but never used. Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* | [PATCH] arch/i386: Use ARRAY_SIZE macroTobias Klauser2005-11-071-1/+1
|/ | | | | | | | Use ARRAY_SIZE macro instead of sizeof(x)/sizeof(x[0]) Signed-off-by: Tobias Klauser <tklauser@nuerscht.ch> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* Revert "i386: move apic init in init_IRQs"Linus Torvalds2005-10-311-16/+52
| | | | | | | | Commit f2b36db692b7ff6972320ad9839ae656a3b0ee3e causes a bootup hang on at least one machine. Revert for now until we understand why. The old code may be ugly, but it works. Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] i386: move apic init in init_IRQsEric W. Biederman2005-10-301-52/+16
| | | | | | | | | | | | | | | | | | | | | | | | All kinds of ugliness exists because we don't initialize the apics during init_IRQs. - We calibrate jiffies in non apic mode even when we are using apics. - We have to have special code to initialize the apics when non-smp. - The legacy i8259 must exist and be setup correctly, even when we won't use it past initialization. - The kexec on panic code must restore the state of the io_apics. - init/main.c needs a special case for !smp smp_init on x86 In addition to pure code movement I needed a couple of non-obvious changes: - Move setup_boot_APIC_clock into APIC_late_time_init for simplicity. - Use cpu_khz to generate a better approximation of loops_per_jiffies so I can verify the timer interrupt is working. - Call setup_apic_nmi_watchdog again after cpu_khz is initialized on the boot cpu. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] x86: hot plug CPU to support physical add of new processorsNatalie Protasevich2005-10-301-0/+4
| | | | | | | | | | | | | | The patch allows physical bring-up of new processors (not initially present in the configuration) from facilities such as driver/utility implemented on a platform. The actual method of making processors available is up to the platform implementation. Signed-off-by: Natalie Protasevich <Natalie.Protasevich@unisys.com> Cc: Shaohua Li <shaohua.li@intel.com> Cc: Ashok Raj <ashok.raj@intel.com> Cc: Zwane Mwaikambo <zwane@holomorphy.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] useless includes of linux/irq.h in arch/i386Al Viro2005-09-261-1/+0
| | | | | | | | | | Most of these guys are simply not needed (pulled by other stuff via asm-i386/hardirq.h). One that is not entirely useless is hilarious - arch/i386/oprofile/nmi_timer_int.c includes linux/irq.h... as a way to get linux/errno.h Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] use add_taint() for setting tainted bit flagsRandy Dunlap2005-09-131-1/+1
| | | | | | | | | Use the add_taint() interface for setting tainted bit flags instead of doing it manually. Signed-off-by: Randy Dunlap <rdunlap@xenotime.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] i386/smpboot: use msleep() instead of schedule_timeout()Nishanth Aravamudan2005-09-101-2/+1
| | | | | | | | | | Replace schedule_timeout() with msleep() to guarantee the task delays as expected. Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com> Signed-off-by: Domen Puncer <domen@coderock.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] i386 boottime for_each_cpu brokenZwane Mwaikambo2005-09-051-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | for_each_cpu walks through all processors in cpu_possible_map, which is defined as cpu_callout_map on i386 and isn't initialised until all processors have been booted. This breaks things which do for_each_cpu iterations early during boot. So, define cpu_possible_map as a bitmap with NR_CPUS bits populated. This was triggered by a patch i'm working on which does alloc_percpu before bringing up secondary processors. From: Alexander Nyberg <alexn@telia.com> i386-boottime-for_each_cpu-broken.patch i386-boottime-for_each_cpu-broken-fix.patch The SMP version of __alloc_percpu checks the cpu_possible_map before allocating memory for a certain cpu. With the above patches the BSP cpuid is never set in cpu_possible_map which breaks CONFIG_SMP on uniprocessor machines (as soon as someone tries to dereference something allocated via __alloc_percpu, which in fact is never allocated since the cpu is not set in cpu_possible_map). Signed-off-by: Zwane Mwaikambo <zwane@arm.linux.org.uk> Signed-off-by: Alexander Nyberg <alexn@telia.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] i386: encapsulate copying of pgd entriesZachary Amsden2005-09-051-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add a clone operation for pgd updates. This helps complete the encapsulation of updates to page tables (or pages about to become page tables) into accessor functions rather than using memcpy() to duplicate them. This is both generally good for consistency and also necessary for running in a hypervisor which requires explicit updates to page table entries. The new function is: clone_pgd_range(pgd_t *dst, pgd_t *src, int count); dst - pointer to pgd range anwhere on a pgd page src - "" count - the number of pgds to copy. dst and src can be on the same page, but the range must not overlap and must not cross a page boundary. Note that I ommitted using this call to copy pgd entries into the software suspend page root, since this is not technically a live paging structure, rather it is used on resume from suspend. CC'ing Pavel in case he has any feedback on this. Thanks to Chris Wright for noticing that this could be more optimal in PAE compiles by eliminating the memset. Signed-off-by: Zachary Amsden <zach@vmware.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] mostly_read data sectionChristoph Lameter2005-07-071-9/+9
| | | | | | | | | | | | | | | | | | | | | Add a new section called ".data.read_mostly" for data items that are read frequently and rarely written to like cpumaps etc. If these maps are placed in the .data section then these frequenly read items may end up in cachelines with data is is frequently updated. In that case all processors in an SMP system must needlessly reload the cachelines again and again containing elements of those frequently used variables. The ability to share these cachelines will allow each cpu in an SMP system to keep local copies of those shared cachelines thereby optimizing performance. Signed-off-by: Alok N Kataria <alokk@calsoftinc.com> Signed-off-by: Shobhit Dayal <shobhit@calsoftinc.com> Signed-off-by: Christoph Lameter <christoph@scalex86.org> Signed-off-by: Shai Fultheim <shai@scalex86.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] cpu state clean after hot removeLi Shaohua2005-06-251-31/+144
| | | | | | | | Clean CPU states in order to reuse smp boot code for CPU hotplug. Signed-off-by: Li Shaohua<shaohua.li@intel.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] init call cleanupLi Shaohua2005-06-251-9/+9
| | | | | | | | | Trival patch for CPU hotplug. In CPU identify part, only did cleaup for intel CPUs. Need do for other CPUs if they support S3 SMP. Signed-off-by: Li Shaohua<shaohua.li@intel.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] sibling map initializing reworkLi Shaohua2005-06-251-47/+50
| | | | | | | | Make sibling map init per-cpu. Hotplug CPU may change the map at runtime. Signed-off-by: Li Shaohua<shaohua.li@intel.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] sep initializing reworkLi Shaohua2005-06-251-0/+11
| | | | | | | | | Make SEP init per-cpu, so it is hotplug safe. Signed-off-by: Li Shaohua<shaohua.li@intel.com> Signed-off-by: Ashok Raj <ashok.raj@intel.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] i386 CPU hotplugZwane Mwaikambo2005-06-251-5/+93
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | (The i386 CPU hotplug patch provides infrastructure for some work which Pavel is doing as well as for ACPI S3 (suspend-to-RAM) work which Li Shaohua <shaohua.li@intel.com> is doing) The following provides i386 architecture support for safely unregistering and registering processors during runtime, updated for the current -mm tree. In order to avoid dumping cpu hotplug code into kernel/irq/* i dropped the cpu_online check in do_IRQ() by modifying fixup_irqs(). The difference being that on cpu offline, fixup_irqs() is called before we clear the cpu from cpu_online_map and a long delay in order to ensure that we never have any queued external interrupts on the APICs. There are additional changes to s390 and ppc64 to account for this change. 1) Add CONFIG_HOTPLUG_CPU 2) disable local APIC timer on dead cpus. 3) Disable preempt around irq balancing to prevent CPUs going down. 4) Print irq stats for all possible cpus. 5) Debugging check for interrupts on offline cpus. 6) Hacky fixup_irqs() to redirect irqs when cpus go off/online. 7) play_dead() for offline cpus to spin inside. 8) Handle offline cpus set in flush_tlb_others(). 9) Grab lock earlier in smp_call_function() to prevent CPUs going down. 10) Implement __cpu_disable() and __cpu_die(). 11) Enable local interrupts in cpu_enable() after fixup_irqs() 12) Don't fiddle with NMI on dead cpu, but leave intact on other cpus. 13) Program IRQ affinity whilst cpu is still in cpu_online_map on offline. Signed-off-by: Zwane Mwaikambo <zwane@linuxpower.ca> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] x86: cpu_khz type fixAndrew Morton2005-06-231-1/+1
| | | | | | | | | | x86_64's cpu_khz is unsigned int and there is no reason why x86 needs to use unsigned long. So make cpu_khz unsigned int on x86 as well. Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] Remove i386_ksyms.c, almost.Alexey Dobriyan2005-06-231-0/+12
| | | | | | | | | | | | | | | | | | | | * EXPORT_SYMBOL's moved to other files * #include <linux/config.h>, <linux/module.h> where needed * #include's in i386_ksyms.c cleaned up * After copy-paste, redundant due to Makefiles rules preprocessor directives removed: #ifdef CONFIG_FOO EXPORT_SYMBOL(foo); #endif obj-$(CONFIG_FOO) += foo.o * Tiny reformat to fit in 80 columns Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] x86: fix smp_num_siblings on buggy BIOSesSiddha, Suresh B2005-05-281-1/+3
| | | | | | | | | | | | | | This fixes 'smp_num_siblings' value on the systems with a buggy bios, which sets number of siblings to '2' even when HT is disabled. (more details are at http://bugzilla.kernel.org/show_bug.cgi?id=4359) I am planning to do more cleanup in this area (like moving smp_num_siblings to per cpuinfo) shortly. Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] x86_64: i386/x86-64: Export cpu_core_mapAndi Kleen2005-05-201-0/+1
| | | | | | | | | Needed for the powernow k8 driver for dual core support. Signed-off-by: Andi Kleen <ak@suse.de> Cc: <mark.langsdorf@amd.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] check nmi watchdog is brokenJack F Vogel2005-05-011-3/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | A bug against an xSeries system showed up recently noting that the check_nmi_watchdog() test was failing. I have been investigating it and discovered in both i386 and x86_64 the recent change to the routine to use the cpu_callin_map has uncovered a problem. Prior to that change, on an SMP box, the test was trivally passing because all cpu's were found to not yet be online, but now with the callin_map they are discovered, it goes on to test the counter and they have not yet begun to increment, so it announces a CPU is stuck and bails out. On all the systems I have access to test, the announcement of failure is also bougs... by the time you can login and check /proc/interrupts, the NMI count is happily incrementing on all CPUs. Its just that the test is being done too early. I have tried moving the call to the test around a bit, and it was always too early. I finally hit on this proposed solution, it delays the routine via a late_initcall(), seems like the right solution to me. Signed-off-by: Adrian Bunk <bunk@stusta.de> Cc: Andi Kleen <ak@muc.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>