aboutsummaryrefslogtreecommitdiffstats
path: root/arch/x86
Commit message (Collapse)AuthorAgeFilesLines
* Merge branch 'x86-mce-for-linus' of ↵Linus Torvalds2009-09-1719-767/+363
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-mce-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (21 commits) x86, mce: Fix compilation with !CONFIG_DEBUG_FS in mce-severity.c x86, mce: CE in last bank prevents panic by unknown MCE x86, mce: Fake panic support for MCE testing x86, mce: Move debugfs mce dir creating to mce.c x86, mce: Support specifying raise mode for software MCE injection x86, mce: Support specifying context for software mce injection x86, mce: fix reporting of Thermal Monitoring mechanism enabled x86, mce: remove never executed code x86, mce: add missing __cpuinit tags x86, mce: fix "mce" boot option handling for CONFIG_X86_NEW_MCE x86, mce: don't log boot MCEs on Pentium M (model == 13) CPUs x86: mce: Lower maximum number of banks to architecture limit x86: mce: macros to compute banks MSRs x86: mce: Move per bank data in a single datastructure x86: mce: Move code in mce.c x86: mce: Rename CONFIG_X86_NEW_MCE to CONFIG_X86_MCE x86: mce: Remove old i386 machine check code x86: mce: Update X86_MCE description in x86/Kconfig x86: mce: Make CONFIG_X86_ANCIENT_MCE dependent on CONFIG_X86_MCE x86, mce: use atomic_inc_return() instead of add by 1 ... Manually fixed up trivial conflicts: Documentation/feature-removal-schedule.txt arch/x86/kernel/cpu/mcheck/mce.c
| * x86, mce: Fix compilation with !CONFIG_DEBUG_FS in mce-severity.cAndi Kleen2009-09-141-0/+2
| | | | | | | | | | | | | | | | | | Fix compilation error in arch/x86/kernel/cpu/mcheck/mce-severity.c when CONFIG_DEBUG_FS is disabled, introduced in commit 5be9ed251f58881dfc3dd6742a81ff9ad1a7bb04. Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
| * x86, mce: CE in last bank prevents panic by unknown MCEHidetoshi Seto2009-08-261-4/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If MCE handler is called but none of mces_seen have machine check event which might signal the MCE (i.e. event higher than MCE_KEEP_SEVERITY), panic with "Machine check from unknown source" will be taken since the MCE is assumed to be signaled from external agent or so. Usually mces_seen never point MCE_KEEP_SEVERITY event such as CE. But it can happen because initial value of mces_seen is accidentally modified by mce_no_way_out() - in case if mce_no_way_out() run through all banks and the last bank has the CE, mces_seen points the CE and the "panic by unknown" will not be taken. This patch fixes this undesired behavior, and clarifies the logic. Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jin Dongming <jin.dongming@np.css.fujitsu.com> LKML-Reference: <4A94E244.3020301@jp.fujitsu.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Reported-by: Jin Dongming <jin.dongming@np.css.fujitsu.com>
| * x86, mce: Fake panic support for MCE testingHuang Ying2009-08-101-11/+64
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If "fake panic" mode is turned on, just log panic message instead of go real panic. This is used for testing only, so that the test suite can check for the correct panic message and do regression testing for MCE would go panic. This patch is based on x86-tip.git/mce. ChangeLog: v5: - Rebased on x86-tip.git/mce v4: - Move config file from sysfs to debugfs Signed-off-by: Huang Ying <ying.huang@intel.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
| * x86, mce: Move debugfs mce dir creating to mce.cHuang Ying2009-08-103-3/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | Because more debugfs files under mce dir will be create in mce.c. ChangeLog: v5: - Rebased on x86-tip.git/mce Signed-off-by: Huang Ying <ying.huang@intel.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
| * x86, mce: Support specifying raise mode for software MCE injectionHuang Ying2009-08-102-8/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Raise mode include raising as exception or raising as poll, it is specified via the mce.inject_flags field. This can be used to specify raise mode of UCNA, which is UC error but raised not as exception. And this can be used to test the filter code of poll handler or exception handler too. For example, enforce a poll raise mode for a fatal MCE. ChangeLog: v2: - Re-base on latest x86-tip.git/mce3 Signed-off-by: Huang Ying <ying.huang@intel.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
| * x86, mce: Support specifying context for software mce injectionHuang Ying2009-08-102-32/+135
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The cpu context is specified via the new mce.inject_flags fields. This allows more realistic machine check testing in different situations. "RANDOM" context is implemented via NMI broadcasting to add randomization to testing. AK: Fix NMI broadcasting check. Fix 32-bit building. Some race fixes. Move to module. Various changes ChangeLog: v3: - Re-based on latest x86-tip.git/mce4 - Fix 32-bit building v2: - Re-base on latest x86-tip.git/mce3 Signed-off-by: Huang Ying <ying.huang@intel.com> Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
| * x86, mce: fix reporting of Thermal Monitoring mechanism enabledBartlomiej Zolnierkiewicz2009-07-292-3/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | Early Pentium M models use different method for enabling TM2 (per paragraph 13.5.2.3 of the "Intel 64 and IA-32 Architectures Software Developer's Manual Volume 3A: System Programming Guide, Part 1"). Tested on the affected Pentium M variant (model == 13). Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com> Cc: Andi Kleen <andi@firstfloor.org> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
| * x86, mce: remove never executed codeBartlomiej Zolnierkiewicz2009-07-291-2/+0
| | | | | | | | | | | | | | | | fseverities_coverage is never NULL in err_out code path. Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com> Reviewed-by: Andi Kleen <andi@firstfloor.org> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
| * x86, mce: add missing __cpuinit tagsBartlomiej Zolnierkiewicz2009-07-291-2/+2
| | | | | | | | | | | | | | | | mce_cap_init() and mce_cpu_quirks() can be tagged with __cpuinit. Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com> Reviewed-by: Andi Kleen <andi@firstfloor.org> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
| * x86, mce: fix "mce" boot option handling for CONFIG_X86_NEW_MCEBartlomiej Zolnierkiewicz2009-07-291-1/+3
| | | | | | | | | | | | | | | | | | "mce argument mce ignored. Please use /sys" message shouldn't be printed when using "mce" boot option. Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com> Reviewed-by: Andi Kleen <andi@firstfloor.org> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
| * x86, mce: don't log boot MCEs on Pentium M (model == 13) CPUsBartlomiej Zolnierkiewicz2009-07-291-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | On my legacy Pentium M laptop (Acer Extensa 2900) I get bogus MCE on a cold boot with CONFIG_X86_NEW_MCE enabled, i.e. (after decoding it with mcelog): MCE 0 HARDWARE ERROR. This is *NOT* a software problem! Please contact your hardware vendor CPU 0 BANK 1 MCG status: MCi status: Error overflow Uncorrected error Error enabled Processor context corrupt MCA: Data CACHE Level-1 UNKNOWN Error STATUS f200000000000195 MCGSTATUS 0 [ The other STATUS values observed: f2000000000001b5 (... UNKNOWN error) and f200000000000115 (... READ Error). To verify that this is not a CONFIG_X86_NEW_MCE bug I also modified the CONFIG_X86_OLD_MCE code (which doesn't log any MCEs) to dump content of STATUS MSR before it is cleared during initialization. ] Since the bogus MCE results in a kernel taint (which in turn disables lockdep support) don't log boot MCEs on Pentium M (model == 13) CPUs by default ("mce=bootlog" boot parameter can be be used to get the old behavior). Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com> Reviewed-by: Andi Kleen <andi@firstfloor.org> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
| * x86: mce: Lower maximum number of banks to architecture limitAndi Kleen2009-07-091-3/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | The Intel x86 architecture right now only supports 32 machine check banks, more would bump into other MSRs. So lower the max define to 32. This only affects a few bitmaps, most data structures are dynamically sized anyways. Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
| * x86: mce: macros to compute banks MSRsAndi Kleen2009-07-093-22/+29
| | | | | | | | | | | | | | | | | | | | Instead of open coded calculations for bank MSRs hide the indexing of higher banks MCE register MSRs in new macros. No semantic changes. Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
| * x86: mce: Move per bank data in a single datastructureAndi Kleen2009-07-092-56/+67
| | | | | | | | | | | | | | | | | | | | | | | | | | This addresses one of the leftover review comments. Move the per bank data into a single structure. This avoids several separate variables and also separate allocation of sysfs objects. I didn't move the CMCI ownership information so far because that would have needed some non trivial changes in the algorithms. Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
| * x86: mce: Move code in mce.cAndi Kleen2009-07-091-11/+11
| | | | | | | | | | | | | | | | | | | | | | Now that the X86_OLD_MCE ifdefs are gone move some code that used to be outside the big ifdef to a more natural place near its user. No code change. Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
| * x86: mce: Rename CONFIG_X86_NEW_MCE to CONFIG_X86_MCEAndi Kleen2009-07-097-16/+10
| | | | | | | | | | | | | | | | | | | | Drop the CONFIG_X86_NEW_MCE symbol and change all references to it to check for CONFIG_X86_MCE directly. No code changes Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
| * x86: mce: Remove old i386 machine check codeAndi Kleen2009-07-098-593/+2
| | | | | | | | | | | | | | | | | | | | | | | | As announced in feature-remove-schedule.txt remove CONFIG_X86_OLD_MCE This patch only removes code. The ancient machine check code for very old systems that are not supported by CONFIG_X86_NEW_MCE is still kept. Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
| * x86: mce: Update X86_MCE description in x86/KconfigAndi Kleen2009-07-091-12/+4
| | | | | | | | | | | | | | | | | | | | - Clarify that this config controls thermal throttling reporting too - Clarify the types of errors reported by machine checks - Drop references to ancient CPUs. Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
| * x86: mce: Make CONFIG_X86_ANCIENT_MCE dependent on CONFIG_X86_MCEAndi Kleen2009-07-091-1/+1
| | | | | | | | | | | | | | | | | | | | Add a missing depency for ANCIENT_MCE. It didn't matter in practice because the ANCIENT code wasn't compiled without X86_MCE, but it's better to express that clearly in Kconfig. Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
| * x86, mce: use atomic_inc_return() instead of add by 1Borislav Petkov2009-06-201-2/+2
| | | | | | | | | | | | | | | | Use atomic_inc_return() instead of atomic_add_return() by 1. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com> Cc: Andi Kleen <ak@linux.intel.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
| * x86, mce: fix typo in comment in asm/mce.hBorislav Petkov2009-06-201-1/+1
| | | | | | | | | | | | | | | | Fix comment to match the actual declaration. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com> Cc: Andi Kleen <ak@linux.intel.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
* | Merge branch 'sched-core-for-linus' of ↵Linus Torvalds2009-09-177-88/+108
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (37 commits) sched: Fix SD_POWERSAVING_BALANCE|SD_PREFER_LOCAL vs SD_WAKE_AFFINE sched: Stop buddies from hogging the system sched: Add new wakeup preemption mode: WAKEUP_RUNNING sched: Fix TASK_WAKING & loadaverage breakage sched: Disable wakeup balancing sched: Rename flags to wake_flags sched: Clean up the load_idx selection in select_task_rq_fair sched: Optimize cgroup vs wakeup a bit sched: x86: Name old_perf in a unique way sched: Implement a gentler fair-sleepers feature sched: Add SD_PREFER_LOCAL sched: Add a few SYNC hint knobs to play with sched: Fix sync wakeups again sched: Add WF_FORK sched: Rename sync arguments sched: Rename select_task_rq() argument sched: Feature to disable APERF/MPERF cpu_power x86: sched: Provide arch implementations using aperf/mperf x86: Add generic aperf/mperf code x86: Move APERF/MPERF into a X86_FEATURE ... Fix up trivial conflict in arch/x86/include/asm/processor.h due to nearby addition of amd_get_nb_id() declaration from the EDAC merge.
| * | sched: Disable wakeup balancingPeter Zijlstra2009-09-161-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Sysbench thinks SD_BALANCE_WAKE is too agressive and kbuild doesn't really mind too much, SD_BALANCE_NEWIDLE picks up most of the slack. On a dual socket, quad core, dual thread nehalem system: sysbench (--num_threads=16): SD_BALANCE_WAKE-: 13982 tx/s SD_BALANCE_WAKE+: 15688 tx/s kbuild (-j16): SD_BALANCE_WAKE-: 47.648295846 seconds time elapsed ( +- 0.312% ) SD_BALANCE_WAKE+: 47.608607360 seconds time elapsed ( +- 0.026% ) (same within noise) Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * | sched: x86: Name old_perf in a unique wayPeter Zijlstra2009-09-161-2/+2
| | | | | | | | | | | | | | | | | | | | | Silly percpu bits don't respect static.. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <new-submission>
| * | x86: sched: Provide arch implementations using aperf/mperfPeter Zijlstra2009-09-152-1/+56
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | APERF/MPERF support for cpu_power. APERF/MPERF is arch defined to be a relative scale of work capacity per logical cpu, this is assumed to include SMT and Turbo mode. APERF/MPERF are specified to both reset to 0 when either counter wraps, which is highly inconvenient, since that'll give a blimp when that happens. The manual specifies writing 0 to the counters after each read, but that's 1) too expensive, and 2) destroys the possibility of sharing these counters with other users, so we live with the blimp - the other existing user does too. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * | x86: Add generic aperf/mperf codePeter Zijlstra2009-09-152-70/+39
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Move some of the aperf/mperf code out from the cpufreq driver thingy so that other people can enjoy it too. Cc: H. Peter Anvin <hpa@zytor.com> Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> Cc: Yanmin <yanmin_zhang@linux.intel.com> Cc: Dave Jones <davej@redhat.com> Cc: Len Brown <len.brown@intel.com> Cc: Yinghai Lu <yhlu.kernel@gmail.com> Cc: cpufreq@vger.kernel.org Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * | x86: Move APERF/MPERF into a X86_FEATUREPeter Zijlstra2009-09-153-7/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Move the APERFMPERF capacility into a X86_FEATURE flag so that it can be used outside of the acpi cpufreq driver. Cc: H. Peter Anvin <hpa@zytor.com> Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> Cc: Yanmin <yanmin_zhang@linux.intel.com> Cc: Dave Jones <davej@redhat.com> Cc: Len Brown <len.brown@intel.com> Cc: Yinghai Lu <yhlu.kernel@gmail.com> Cc: cpufreq@vger.kernel.org Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * | sched: Reduce forkexec_idxPeter Zijlstra2009-09-151-3/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | If we're looking to place a new task, we might as well find the idlest position _now_, not 1 tick ago. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * | sched: Improve latencies and throughputMike Galbraith2009-09-151-3/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Make the idle balancer more agressive, to improve a x264 encoding workload provided by Jason Garrett-Glaser: NEXT_BUDDY NO_LB_BIAS encoded 600 frames, 252.82 fps, 22096.60 kb/s encoded 600 frames, 250.69 fps, 22096.60 kb/s encoded 600 frames, 245.76 fps, 22096.60 kb/s NO_NEXT_BUDDY LB_BIAS encoded 600 frames, 344.44 fps, 22096.60 kb/s encoded 600 frames, 346.66 fps, 22096.60 kb/s encoded 600 frames, 352.59 fps, 22096.60 kb/s NO_NEXT_BUDDY NO_LB_BIAS encoded 600 frames, 425.75 fps, 22096.60 kb/s encoded 600 frames, 425.45 fps, 22096.60 kb/s encoded 600 frames, 422.49 fps, 22096.60 kb/s Peter pointed out that this is better done via newidle_idx, not via LB_BIAS, newidle balancing should look for where there is load _now_, not where there was load 2 ticks ago. Worst-case latencies are improved as well as no buddies means less vruntime spread. (as per prior lkml discussions) This change improves kbuild-peak parallelism as well. Reported-by: Jason Garrett-Glaser <darkshikari@gmail.com> Signed-off-by: Mike Galbraith <efault@gmx.de> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <1253011667.9128.16.camel@marge.simson.net> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * | sched: Tweak wake_idxPeter Zijlstra2009-09-151-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When merging select_task_rq_fair() and sched_balance_self() we lost the use of wake_idx, restore that and set them to 0 to make wake balancing more aggressive. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * | sched: Merge select_task_rq_fair() and sched_balance_self()Peter Zijlstra2009-09-151-3/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The problem with wake_idle() is that is doesn't respect things like cpu_power, which means it doesn't deal well with SMT nor the recent RT interaction. To cure this, it needs to do what sched_balance_self() does, which leads to the possibility of merging select_task_rq_fair() and sched_balance_self(). Modify sched_balance_self() to: - update_shares() when walking up the domain tree, (it only called it for the top domain, but it should have done this anyway), which allows us to remove this ugly bit from try_to_wake_up(). - do wake_affine() on the smallest domain that contains both this (the waking) and the prev (the wakee) cpu for WAKE invocations. Then use the top-down balance steps it had to replace wake_idle(). This leads to the dissapearance of SD_WAKE_BALANCE and SD_WAKE_IDLE_FAR, with SD_WAKE_IDLE replaced with SD_BALANCE_WAKE. SD_WAKE_AFFINE needs SD_BALANCE_WAKE to be effective. Touch all topology bits to replace the old with new SD flags -- platforms might need re-tuning, enabling SD_BALANCE_WAKE conditionally on a NUMA distance seems like a good additional feature, magny-core and small nehalem systems would want this enabled, systems with slow interconnects would not. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* | | Merge branch 'x86-fixes-for-linus' of ↵Linus Torvalds2009-09-173-12/+19
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86, pat: don't use rb-tree based lookup in reserve_memtype() x86: Increase MIN_GAP to include randomized stack
| * \ \ Merge branch 'x86/pat' into x86/urgentH. Peter Anvin2009-09-1738-1136/+1456
| |\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Merge reason: Suresh Siddha (1): x86, pat: don't use rb-tree based lookup in reserve_memtype() ... requires previous x86/pat commits already pushed to Linus. Signed-off-by: H. Peter Anvin <hpa@zytor.com>
| | * | | x86, pat: don't use rb-tree based lookup in reserve_memtype()Suresh Siddha2009-09-171-10/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Recent enhancement of rb-tree based lookup exposed a bug with the lookup mechanism in the reserve_memtype() which ensures that there are no conflicting memtype requests for the memory range. memtype_rb_search() returns an entry which has a start address <= new start address. And from here we traverse the linear linked list to check if there any conflicts with the existing mappings. As the rbtree is based on the start address of the memory range, it is quite possible that we have several overlapped mappings whose start address is much less than new requested start but the end is >= new requested end. This results in conflicting memtype mappings. Same bug exists with the old code which uses cached_entry from where we traverse the linear linked list. But the new rb-tree code exposes this bug fairly easily. For now, don't use the memtype_rb_search() and always start the search from the head of linear linked list in reserve_memtype(). Linear linked list for most of the systems grow's to few 10's of entries(as we track memory type of RAM pages using struct page). So we should be ok for now. We still retain the rbtree and use it to speed up free_memtype() which doesn't have the same bug(as we know what exactly we are searching for in free_memtype). Also use list_for_each_entry_from() in free_memtype() so that we start the search from rb-tree lookup result. Reported-by: Markus Trippelsdorf <markus@trippelsdorf.de> Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> LKML-Reference: <1253136483.4119.12.camel@sbs-t61.sc.intel.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
| * | | | x86: Increase MIN_GAP to include randomized stackMichal Hocko2009-09-102-2/+17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently we are not including randomized stack size when calculating mmap_base address in arch_pick_mmap_layout for topdown case. This might cause that mmap_base starts in the stack reserved area because stack is randomized by 1GB for 64b (8MB for 32b) and the minimum gap is 128MB. If the stack really grows down to mmap_base then we can get silent mmap region overwrite by the stack values. Let's include maximum stack randomization size into MIN_GAP which is used as the low bound for the gap in mmap. Signed-off-by: Michal Hocko <mhocko@suse.cz> LKML-Reference: <1252400515-6866-1-git-send-email-mhocko@suse.cz> Acked-by: Jiri Kosina <jkosina@suse.cz> Signed-off-by: H. Peter Anvin <hpa@zytor.com> Cc: Stable Team <stable@kernel.org>
* | | | | Merge branch 'tracing-core-for-linus' of ↵Linus Torvalds2009-09-172-3/+5
|\ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'tracing-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (44 commits) vsnprintf: remove duplicate comment of vsnprintf softirq: add BLOCK_IOPOLL to softirq_to_name oprofile: fix oprofile regression: select RING_BUFFER_ALLOW_SWAP tracing: switch function prints from %pf to %ps vsprintf: add %ps that is the same as %pS but is like %pf tracing: Fix minor bugs for __unregister_ftrace_function_probe tracing: remove notrace from __kprobes annotation tracing: optimize global_trace_clock cachelines MAINTAINERS: Update tracing tree details ftrace: document function and function graph implementation tracing: make testing syscall events a separate configuration tracing: remove some unused macros ftrace: add compile-time check on F_printk() tracing: fix F_printk() typos tracing: have TRACE_EVENT macro use __flags to not shadow parameter tracing: add static to generated TRACE_EVENT functions ring-buffer: typecast cmpxchg to fix PowerPC warning tracing: add filter event logic to special, mmiotrace and boot tracers tracing: remove trace_event_types.h tracing: use the new trace_entries.h to create format files ...
| * \ \ \ \ Merge branch 'linus' into tracing/coreIngo Molnar2009-09-17185-5748/+8803
| |\ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Merge reason: Pick up kernel/softirq.c update for dependent fix. Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * | | | | | tracing/function-graph: x86_64 stack allocation cleanupJiri Olsa2009-09-121-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Only 24 bytes needs to be reserved on the stack for the function graph tracer on x86_64. Signed-off-by: Jiri Olsa <jolsa@redhat.com> LKML-Reference: <20090729085837.GB4998@jolsa.lab.eng.brq.redhat.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
| * | | | | | x86/tracing: comment need for atomic nopSteven Rostedt2009-09-101-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The dynamic function tracer relys on the macro P6_NOP5 always being an atomic NOP. If for some reason it is changed to be two operations (like a nop2 nop3) it can faults within the kernel when the function tracer modifies the code. This patch adds a comment to note that the P6_NOPs are expected to be atomic. This will hopefully prevent anyone from changing that. Reported-by: Mathieu Desnoyer <mathieu.desnoyers@polymtl.ca> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
* | | | | | | Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bpLinus Torvalds2009-09-172-0/+12
|\ \ \ \ \ \ \ | |_|/ / / / / |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp: amd64_edac: check NB MCE bank enable on the current node properly amd64_edac: Rewrite unganged mode code of f10_early_channel_count amd64_edac: cleanup amd64_check_ecc_enabled x86, EDAC: Provide function to return NodeId of a CPU amd64_edac: build driver only on AMD hardware
| * | | | | | x86, EDAC: Provide function to return NodeId of a CPUAndreas Herrmann2009-09-162-0/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com> Signed-off-by: Borislav Petkov <borislav.petkov@amd.com> Acked-by: H. Peter Anvin <hpa@zytor.com>
* | | | | | | Merge branch 'linux-next' of ↵Linus Torvalds2009-09-165-68/+72
|\ \ \ \ \ \ \ | |/ / / / / / |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6 * 'linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6: (75 commits) PCI hotplug: clean up acpi_run_hpp() PCI hotplug: acpiphp: use generic pci_configure_slot() PCI hotplug: shpchp: use generic pci_configure_slot() PCI hotplug: pciehp: use generic pci_configure_slot() PCI hotplug: add pci_configure_slot() PCI hotplug: clean up acpi_get_hp_params_from_firmware() interface PCI hotplug: acpiphp: don't cache hotplug_params in acpiphp_bridge PCI hotplug: acpiphp: remove superfluous _HPP/_HPX evaluation PCI: Clear saved_state after the state has been restored PCI PM: Return error codes from pci_pm_resume() PCI: use dev_printk in quirk messages PCI / PCIe portdrv: Fix pcie_portdrv_slot_reset() PCI Hotplug: convert acpi_pci_detect_ejectable() to take an acpi_handle PCI Hotplug: acpiphp: find bridges the easy way PCI: pcie portdrv: remove unused variable PCI / ACPI PM: Propagate wake-up enable for devices w/o ACPI support ACPI PM: Replace wakeup.prepared with reference counter PCI PM: Introduce device flag wakeup_prepared PCI / ACPI PM: Rework some debug messages PCI PM: Simplify PCI wake-up code ... Fixed up conflict in arch/powerpc/kernel/pci_64.c due to OF device tree scanning having been moved and merged for the 32- and 64-bit cases. The 'needs_freset' initialization added in 6e19314cc ("PCI/powerpc: support PCIe fundamental reset") is now in arch/powerpc/kernel/pci_of_scan.c.
| * | | | | | x86/PCI: pci quirks, fix pci refcountingJiri Slaby2009-09-091-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Stanse found a pci reference leak in quirk_amd_nb_node. Instead of putting nb_ht, there is a put of dev passed as an argument. http://stanse.fi.muni.cz/ Signed-off-by: Jiri Slaby <jirislaby@gmail.com> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
| * | | | | | PCI iommu: iommu=pt is a valid early paramAlex Williamson2009-09-091-3/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This avoids a "Malformed early option 'iommu'" on boot when trying to use pass-through mode. Signed-off-by: Alex Williamson <alex.williamson@hp.com> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
| * | | | | | x86/PCI: initialize PCI bus node numbers earlyJesse Barnes2009-09-092-63/+70
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The current mp_bus_to_node array is initialized only by AMD specific code, since AMD platforms have registers that can be used for determining mode numbers. On new Intel platforms it's necessary to initialize this array as well though, otherwise all PCI node numbers will be 0, when in fact they should be -1 (indicating that I/O isn't tied to any particular node). So move the mp_bus_to_node code into the common PCI code, and initialize it early with a default value of -1. This may be overridden later by arch code (e.g. the AMD code). With this change, PCI consistent memory and other node specific allocations (e.g. skbuff allocs) should occur on the "current" node. If, for performance reasons, applications want to be bound to specific nodes, they should open their devices only after being pinned to the CPU where they'll run, for maximum locality. Acked-by: Yinghai Lu <yinghai@kernel.org> Tested-by: Jesse Brandeburg <jesse.brandeburg@gmail.com> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
| * | | | | | PCI: remove pcibios_scan_all_fns()Alex Chiang2009-09-091-1/+0
| | |/ / / / | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This was #define'd as 0 on all platforms, so let's get rid of it. This change makes pci_scan_slot() slightly easier to read. Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Cc: Tony Luck <tony.luck@intel.com> Cc: David Howells <dhowells@redhat.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Jeff Dike <jdike@addtoit.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Reviewed-by: Matthew Wilcox <willy@linux.intel.com> Acked-by: Russell King <linux@arm.linux.org.uk> Acked-by: Ralf Baechle <ralf@linux-mips.org> Acked-by: Kyle McMartin <kyle@mcmartin.ca> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Acked-by: Paul Mundt <lethal@linux-sh.org> Acked-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Alex Chiang <achiang@hp.com> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
* | | | | | Merge branch 'for-linus' of ↵Linus Torvalds2009-09-159-369/+69
|\ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu: (46 commits) powerpc64: convert to dynamic percpu allocator sparc64: use embedding percpu first chunk allocator percpu: kill lpage first chunk allocator x86,percpu: use embedding for 64bit NUMA and page for 32bit NUMA percpu: update embedding first chunk allocator to handle sparse units percpu: use group information to allocate vmap areas sparsely vmalloc: implement pcpu_get_vm_areas() vmalloc: separate out insert_vmalloc_vm() percpu: add chunk->base_addr percpu: add pcpu_unit_offsets[] percpu: introduce pcpu_alloc_info and pcpu_group_info percpu: move pcpu_lpage_build_unit_map() and pcpul_lpage_dump_cfg() upward percpu: add @align to pcpu_fc_alloc_fn_t percpu: make @dyn_size mandatory for pcpu_setup_first_chunk() percpu: drop @static_size from first chunk allocators percpu: generalize first chunk allocator selection percpu: build first chunk allocators selectively percpu: rename 4k first chunk allocator to page percpu: improve boot messages percpu: fix pcpu_reclaim() locking ... Fix trivial conflict as by Tejun Heo in kernel/sched.c
| * | | | | | percpu: kill lpage first chunk allocatorTejun Heo2009-08-141-19/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | With x86 converted to embedding allocator, lpage doesn't have any user left. Kill it along with cpa handling code. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Jan Beulich <JBeulich@novell.com>
| * | | | | | x86,percpu: use embedding for 64bit NUMA and page for 32bit NUMATejun Heo2009-08-142-131/+28
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Embedding percpu first chunk allocator can now handle very sparse unit mapping. Use embedding allocator instead of lpage for 64bit NUMA. This removes extra TLB pressure and the need to do complex and fragile dancing when changing page attributes. For 32bit, using very sparse unit mapping isn't a good idea because the vmalloc space is very constrained. 32bit NUMA machines aren't exactly the focus of optimization and it isn't very clear whether lpage performs better than page. Use page first chunk allocator for 32bit NUMAs. As this leaves setup_pcpu_*() functions pretty much empty, fold them into setup_per_cpu_areas(). Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: Andi Kleen <andi@firstfloor.org>