path: root/arch/x86/kernel/cpu/mcheck
Each entry: commit message (author, date; files changed, -deleted/+added lines)
* Merge branch 'kvm-updates/2.6.31' of git://git.kernel.org/pub/scm/virt/kvm/kvm (Linus Torvalds, 2009-06-11; 1 file, -0/+1)

    * 'kvm-updates/2.6.31' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (138 commits)
      KVM: Prevent overflow in largepages calculation
      KVM: Disable large pages on misaligned memory slots
      KVM: Add VT-x machine check support
      KVM: VMX: Rename rmode.active to rmode.vm86_active
      KVM: Move "exit due to NMI" handling into vmx_complete_interrupts()
      KVM: Disable CR8 intercept if tpr patching is active
      KVM: Do not migrate pending software interrupts.
      KVM: inject NMI after IRET from a previous NMI, not before.
      KVM: Always request IRQ/NMI window if an interrupt is pending
      KVM: Do not re-execute INTn instruction.
      KVM: skip_emulated_instruction() decode instruction if size is not known
      KVM: Remove irq_pending bitmap
      KVM: Do not allow interrupt injection from userspace if there is a pending event.
      KVM: Unprotect a page if #PF happens during NMI injection.
      KVM: s390: Verify memory in kvm run
      KVM: s390: Sanity check on validity intercept
      KVM: s390: Unlink vcpu on destroy - v2
      KVM: s390: optimize float int lock: spin_lock_bh --> spin_lock
      KVM: s390: use hrtimer for clock wakeup from idle - v2
      KVM: s390: Fix memory slot versus run - v3
      ...
* KVM: Add VT-x machine check support (Andi Kleen, 2009-06-10; 1 file, -0/+1)

    VT-x needs an explicit MC vector intercept to handle machine checks
    in the hypervisor. It also has a special option to catch machine
    checks that happen during VT entry. Do these interceptions and
    forward them to the Linux machine check handler. Make it always look
    like user space is interrupted, because the machine check handler
    treats kernel and user space differently. Thanks to Jiang Yunhong
    for help and testing.

    Cc: stable@kernel.org
    Signed-off-by: Andi Kleen <ak@linux.intel.com>
    Signed-off-by: Huang Ying <ying.huang@intel.com>
    Signed-off-by: Avi Kivity <avi@redhat.com>
* Merge branch 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip (Linus Torvalds, 2009-06-10; 1 file, -1/+0)

    * 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
      x86, nmi: Use predefined numbers instead of hardcoded one
      x86: asm/processor.h: remove double declaration
      x86, mtrr: replace MTRRdefType_MSR with msr-index's MSR_MTRRdefType
      x86, mtrr: replace MTRRfix4K_C0000_MSR with msr-index's MSR_MTRRfix4K_C0000
      x86, mtrr: remove mtrr MSRs double declaration
      x86, mtrr: replace MTRRfix16K_80000_MSR with msr-index's MSR_MTRRfix16K_80000
      x86, mtrr: replace MTRRfix64K_00000_MSR with msr-index's MSR_MTRRfix64K_00000
      x86, mtrr: replace MTRRcap_MSR with msr-index's MSR_MTRRcap
      x86: mce: remove duplicated #include
      x86: msr-index.h remove duplicate MSR C001_0015 declaration
      x86: clean up arch/x86/kernel/tsc_sync.c a bit
      x86: use symbolic name for VM86_SIGNAL when used as vm86 default return
      x86: added 'ifndef _ASM_X86_IOMAP_H' to iomap.h
      x86: avoid multiple declaration of kstack_depth_to_print
      x86: vdso/vma.c declare vdso_enabled and arch_setup_additional_pages before they get used
      x86: clean up declarations and variables
      x86: apic/x2apic_cluster.c x86_cpu_to_logical_apicid should be static
      x86 early quirks: eliminate unused function
* x86: mce: remove duplicated #include (Huang Weiyi, 2009-05-09; 1 file, -1/+0)

    Remove duplicated #include in arch/x86/kernel/cpu/mcheck/mce_intel_64.c.

    [ Impact: cleanup ]

    Signed-off-by: Huang Weiyi <weiyi.huang@gmail.com>
    Signed-off-by: Ingo Molnar <mingo@elte.hu>
* cpumask: alloc zeroed cpumask for static cpumask_var_ts (Yinghai Lu, 2009-06-09; 1 file, -1/+1)

    These are defined as static cpumask_var_t, so if MAXSMP is not used
    they are already cleared. Avoid surprises when MAXSMP is enabled.

    Signed-off-by: Yinghai Lu <yinghai.lu@kernel.org>
    Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
* x86: MCE: make cmci_discover_lock irq-safe (Hidetoshi Seto, 2009-05-08; 1 file, -4/+6)

    Lockdep reports the warning below when Shaohua Li tries to offline
    one cpu:

    [ 110.835487] =================================
    [ 110.835616] [ INFO: inconsistent lock state ]
    [ 110.835688] 2.6.30-rc4-00336-g8c9ed89 #52
    [ 110.835757] ---------------------------------
    [ 110.835828] inconsistent {HARDIRQ-ON-W} -> {IN-HARDIRQ-W} usage.
    [ 110.835908] swapper/0 [HC1[1]:SC0[0]:HE0:SE1] takes:
    [ 110.835982]  (cmci_discover_lock){?.+...}, at: [<ffffffff80236dc0>] cmci_clear+0x30/0x9b

    cmci_clear() can be called via smp_call_function_single(). It is
    better to disable interrupts while holding cmci_discover_lock,
    turning it into an irq-safe lock - we can deadlock otherwise.

    [ Impact: fix possible deadlock in the MCE code ]

    Reported-by: Shaohua Li <shaohua.li@intel.com>
    Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
    Cc: Andi Kleen <andi@firstfloor.org>
    Cc: Andrew Morton <akpm@linux-foundation.org>
    LKML-Reference: <4A03ED38.8000700@jp.fujitsu.com>
    Signed-off-by: Ingo Molnar <mingo@elte.hu>
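    As an illustration only, the irq-safe locking pattern the fix
    switches to looks roughly like this (a sketch, not the actual
    patch; the lock name is borrowed from the real code):

        #include <linux/spinlock.h>

        static DEFINE_SPINLOCK(cmci_discover_lock);

        static void cmci_clear(void)
        {
                unsigned long flags;

                /* Disable interrupts while holding the lock so that a
                 * smp_call_function_single() callback arriving in
                 * hard-irq context cannot deadlock on the same lock. */
                spin_lock_irqsave(&cmci_discover_lock, flags);
                /* ... clear CMCI bank ownership here ... */
                spin_unlock_irqrestore(&cmci_discover_lock, flags);
        }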
* x86, mce: fix boot logging logic (Andi Kleen, 2009-04-22; 1 file, -4/+5)

    The earlier patch that split the poller out into a separate function
    subtly broke the boot logging logic. This could lead to machine
    checks getting logged at boot even when disabled, or when defaulting
    to off on some systems. Fix that.

    [ Impact: bug fix - avoid spurious MCE in log ]

    Signed-off-by: Andi Kleen <ak@linux.intel.com>
    Reviewed-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
    Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
* x86, mce: make polling timer interval per CPU (Andi Kleen, 2009-04-22; 1 file, -12/+12)

    The polling timer, while running per CPU, still used a global
    next_interval variable, which led to some CPUs polling either too
    fast or too slow. This was not a serious problem because all errors
    get picked up eventually, but it is still better to avoid it. Turn
    next_interval into a per-CPU variable, as sketched below.

    v2: Fix check_interval == 0 case (Hidetoshi Seto)

    [ Impact: minor bug fix ]

    Signed-off-by: Andi Kleen <ak@linux.intel.com>
    Reviewed-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
    Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
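    A sketch of the conversion (the variable names follow the commit;
    the back-off arithmetic is illustrative, not the verbatim patch):

        #include <linux/kernel.h>
        #include <linux/percpu.h>
        #include <linux/jiffies.h>

        static int check_interval = 5 * 60;             /* seconds */

        /* was: static int next_interval;  (one value shared by all CPUs) */
        static DEFINE_PER_CPU(int, next_interval);      /* in jiffies */

        static void mcheck_timer(unsigned long data)
        {
                int *n = &__get_cpu_var(next_interval);

                /* poll this CPU's banks here, then back off or speed
                 * up independently of every other CPU */
                *n = min(*n * 2,
                         (int)round_jiffies_relative(check_interval * HZ));
        }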
* Merge branch 'linus' into cpumask-for-linus (Ingo Molnar, 2009-03-30; 2 files, -17/+25)

    Conflicts:
        arch/x86/kernel/cpu/common.c
* x86: use smp_call_function_single() in arch/x86/kernel/cpu/mcheck/mce_amd_64.c (Andrew Morton, 2009-03-18; 1 file, -16/+24)

    Attempting to rid us of the problematic work_on_cpu(). Just use
    smp_call_function_single() here.

    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
    LKML-Reference: <20090318042217.EF3F1DDF39@ozlabs.org>
    Signed-off-by: Ingo Molnar <mingo@elte.hu>
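    The replacement call has this shape (a sketch;
    threshold_restart_bank() stands in for the real per-bank callback
    in mce_amd_64.c):

        #include <linux/smp.h>

        static void threshold_restart_bank(void *info)
        {
                /* runs on the target CPU in IPI context: must not sleep */
        }

        static void restart_bank_on(int cpu, void *data)
        {
                /* final argument 1: wait until the callback completes */
                smp_call_function_single(cpu, threshold_restart_bank, data, 1);
        }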
* x86, mce: remove incorrect __cpuinit for intel_init_cmci() (Hidetoshi Seto, 2009-03-16; 1 file, -1/+1)

    Impact: Bug fix on UP

    Referring to commit cc3ca22063784076bd240fda87217387a8f2ae92, Peter
    removed the __cpuinit annotations for mce_cpu_features() and its
    successor functions, which caused trouble on UP configurations.
    However, intel_init_cmci() was introduced after that, and it also
    had a __cpuinit annotation even though it is called from
    mce_cpu_features(). Remove the annotation from that function too.

    Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
    Cc: H. Peter Anvin <hpa@zytor.com>
    Cc: Andi Kleen <ak@linux.intel.com>
    Signed-off-by: Ingo Molnar <mingo@elte.hu>
* cpumask: use new cpumask functions throughout x86 (Rusty Russell, 2009-03-13; 1 file, -1/+1)

    Impact: cleanup

    1) &cpu_online_map -> cpu_online_mask
    2) first_cpu/next_cpu_nr -> cpumask_first/cpumask_next
    3) cpu_*_map manipulation -> init_cpu_* / set_cpu_*

    Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
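    Typical before/after shapes for the three conversions above (an
    illustrative sketch, not taken from the patch):

        #include <linux/kernel.h>
        #include <linux/cpumask.h>

        static void cpumask_api_example(int cpu)
        {
                int first;

                /* old: cpu_isset(cpu, cpu_online_map) */
                if (!cpumask_test_cpu(cpu, cpu_online_mask))
                        return;

                /* old: first_cpu(cpu_online_map) / next_cpu(...) */
                first = cpumask_first(cpu_online_mask);
                printk(KERN_DEBUG "first online: %d next: %d\n", first,
                       cpumask_next(first, cpu_online_mask));

                /* old: cpu_set(cpu, cpu_present_map) */
                set_cpu_present(cpu, true);
        }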
* cpumask: convert arch/x86/kernel/cpu/mcheck/mce_64.c (Rusty Russell, 2009-03-13; 1 file, -4/+6)

    Impact: reduce kernel memory usage when CONFIG_CPUMASK_OFFSTACK=y

    Simple conversion of mce_device_initialized to cpumask_var_t. We
    don't check the alloc_cpumask_var() return value since it is
    boot-time only, and the misc_register() in that same function isn't
    checked either.

    Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
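    The cpumask_var_t pattern in question, roughly
    (mce_device_initialized is the real variable; the surrounding
    function is abbreviated and, unlike the commit, the sketch does
    check the allocation):

        #include <linux/init.h>
        #include <linux/gfp.h>
        #include <linux/cpumask.h>

        static cpumask_var_t mce_device_initialized;

        static int __init mce_init_device(void)
        {
                /* With CONFIG_CPUMASK_OFFSTACK=y this kmalloc()s the
                 * mask; otherwise it is a no-op on a static array. */
                if (!alloc_cpumask_var(&mce_device_initialized, GFP_KERNEL))
                        return -ENOMEM;
                cpumask_clear(mce_device_initialized);
                return 0;
        }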
* cpumask: x86: convert cpu_sibling_map/cpu_core_map to cpumask_var_t (Rusty Russell, 2009-03-13; 1 file, -3/+3)

    Impact: reduce per-cpu size for CONFIG_CPUMASK_OFFSTACK=y

    In most places it's cleaner to use the accessors cpu_sibling_mask()
    and cpu_core_mask() wrappers, which already exist. I couldn't avoid
    cleaning up the access in oprofile, either.

    Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
* x86, mce: use round_jiffies() instead of round_jiffies_relative() (KOSAKI Motohiro, 2009-03-10; 1 file, -2/+2)

    Impact: saves power, if only a very little

    round_jiffies() rounds an absolute jiffies value up to a full
    second; round_jiffies_relative() rounds a relative jiffies value up
    to a full second. "t->expires" is an absolute jiffies value, so
    round_jiffies() should be used instead of round_jiffies_relative().

    Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
    Cc: Andi Kleen <andi@firstfloor.org>
    Cc: H. Peter Anvin <hpa@linux.intel.com>
    Signed-off-by: H. Peter Anvin <hpa@zytor.com>
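    The distinction in code (a sketch; the function wrapper is
    hypothetical):

        #include <linux/timer.h>
        #include <linux/jiffies.h>

        static void mce_restart_timer(struct timer_list *t, unsigned long iv)
        {
                /* t->expires holds an absolute jiffies value, so round
                 * the absolute value; round_jiffies_relative() would
                 * wrongly treat it as an offset from now. */
                t->expires = round_jiffies(jiffies + iv);
                add_timer(t);
        }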
* x86, mce: fix build failure in arch/x86/kernel/cpu/mcheck/threshold.c (Ingo Molnar, 2009-03-04; 1 file, -3/+7)

    Impact: build fix

    The APIC code rewrite in the x86 tree broke the x86/mce branch:

    arch/x86/kernel/cpu/mcheck/threshold.c: In function 'mce_threshold_interrupt':
    arch/x86/kernel/cpu/mcheck/threshold.c:24: error: implicit declaration of function 'ack_APIC_irq'

    Also tidy up the file a bit while at it.

    Signed-off-by: Ingo Molnar <mingo@elte.hu>
* Merge branch 'x86/core' into x86/mce2 (H. Peter Anvin, 2009-02-24; 3 files, -12/+20)
* Merge branches 'x86/acpi', 'x86/apic', 'x86/asm', 'x86/cleanups', 'x86/mm', 'x86/signal' and 'x86/urgent'; commit 'v2.6.29-rc6' into x86/core (Ingo Molnar, 2009-02-24; 4 files, -18/+27)
* Merge branches 'x86/asm', 'x86/cleanups' and 'x86/headers' into x86/core (Ingo Molnar, 2009-02-20; 2 files, -5/+5)
* x86: use symbolic constants for MSR_IA32_MISC_ENABLE bits (Vegard Nossum, 2009-02-20; 2 files, -5/+5)

    Impact: Cleanup. No functional changes.

    Signed-off-by: Vegard Nossum <vegard.nossum@gmail.com>
    Signed-off-by: Ingo Molnar <mingo@elte.hu>
* Merge branch 'x86/urgent' into x86/core (Ingo Molnar, 2009-02-20; 1 file, -2/+3)
* Merge branch 'linus' into x86/apic (Ingo Molnar, 2009-02-22; 3 files, -6/+7)

    Conflicts:
        arch/x86/mach-default/setup.c

    Semantic conflict resolution:
        arch/x86/kernel/setup.c

    Signed-off-by: Ingo Molnar <mingo@elte.hu>
* x86, apic: remove genapic.h (Ingo Molnar, 2009-02-17; 3 files, -3/+3)

    Impact: cleanup

    Remove genapic.h and remove all references to it.

    Signed-off-by: Ingo Molnar <mingo@elte.hu>
* x86, apic: fix build fallout of genapic changes (Ingo Molnar, 2009-02-17; 1 file, -1/+1)

    - make oprofile build
    - select X86_X2APIC from X86_UV - it relies on it
    - export genapic for oprofile modular build

    Signed-off-by: Ingo Molnar <mingo@elte.hu>
* x86: fold apic_ops into genapic (Yinghai Lu, 2009-02-17; 2 files, -2/+2)

    Impact: cleanup

    Make it simpler; we don't need to have one extra struct.

    v2: fix the sgi_uv build

    Signed-off-by: Yinghai Lu <yinghai@kernel.org>
    Cc: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Ingo Molnar <mingo@elte.hu>
* x86: remove include of apic.h from hardirq_64.h (Brian Gerst, 2009-01-23; 1 file, -0/+1)

    Impact: cleanup

    APIC definitions aren't needed here. Remove the include and fix up
    the fallout.

    tj: added include to mce_intel_64.c.

    Signed-off-by: Brian Gerst <brgerst@gmail.com>
    Signed-off-by: Tejun Heo <tj@kernel.org>
* x86: cleanup remaining cpumask_t code in mce_amd_64.c (Mike Travis, 2009-01-11; 1 file, -7/+14)

    Impact: Reduce memory usage, use new cpumask API.

    Use cpumask_var_t for the 'cpus' cpumask in struct threshold_bank
    and update the remaining old cpumask_t functions to the new cpumask
    API.

    Signed-off-by: Mike Travis <travis@sgi.com>
* x86, mce, cmci: remove incorrect __cpuinit/__cpuexit annotations (H. Peter Anvin, 2009-02-24; 1 file, -5/+5)

    Impact: Bug fix on UP

    The MCE code is reinitialized from resume, so we can't use
    __cpuinit/__cpuexit for most of the code. Remove those annotations
    for anything downstream of mce_init().

    Signed-off-by: H. Peter Anvin <hpa@zytor.com>
* x86, mce, cmci: add CMCI support (Andi Kleen, 2009-02-24; 2 files, -3/+218)

    Impact: Major new feature

    Intel CMCI (Corrected Machine Check Interrupt) is a new feature on
    Nehalem CPUs. It allows the CPU to trigger interrupts on corrected
    events, which allows faster reaction to them instead of the
    traditional polling timer.

    Also use CMCI to discover shared banks. Machine check banks can be
    shared by CPU threads or even cores. Using the CMCI enable bit it
    is possible to detect that another CPU already saw a specific bank.
    Use this to assign shared banks to only one CPU, to avoid reporting
    duplicated events.

    On CPU hot unplug, bank sharing is rediscovered. This is done using
    a thread that cycles through all the CPUs.

    To avoid races between the poller and CMCI, we only poll for banks
    that are not CMCI capable and only check CMCI-owned banks on an
    interrupt. The shared-bank ownership information is currently only
    used for CMCI interrupts, not polled banks.

    The sharing discovery code follows the algorithm recommended in the
    IA32 SDM Vol 3a, 14.5.2.1; a sketch of the ownership test follows
    below.

    The CMCI interrupt handler just calls the machine check poller to
    pick up the machine check event that caused the interrupt.

    I decided not to implement a separate threshold event like the AMD
    version has, because the threshold is currently always one and
    adding another event didn't seem to add any value.

    Some code was inspired by Yunhong Jiang's Xen implementation, which
    was in turn inspired by an earlier CMCI implementation by me.

    Signed-off-by: Andi Kleen <ak@linux.intel.com>
    Signed-off-by: H. Peter Anvin <hpa@zytor.com>
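    The ownership test pivots on the per-bank CMCI enable bit in
    IA32_MCi_CTL2. Roughly, following SDM Vol 3a 14.5.2.1 (a sketch;
    the bit constants follow the SDM and are not necessarily the
    patch's spellings):

        #include <asm/msr.h>

        #define CMCI_EN         (1ULL << 30)   /* IA32_MCi_CTL2 enable */
        #define CMCI_THRESHOLD  1              /* interrupt every event */

        static int cmci_try_claim_bank(int bank)
        {
                u64 val;

                rdmsrl(MSR_IA32_MC0_CTL2 + bank, val);
                if (val & CMCI_EN)
                        return 0;  /* another CPU already owns this bank */

                wrmsrl(MSR_IA32_MC0_CTL2 + bank,
                       val | CMCI_EN | CMCI_THRESHOLD);
                rdmsrl(MSR_IA32_MC0_CTL2 + bank, val);

                /* if the enable bit did not stick, the bank does not
                 * support CMCI at all */
                return (val & CMCI_EN) != 0;
        }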
* x86, mce, cmci: use polled banks bitmap in machine check poller (Andi Kleen, 2009-02-24; 2 files, -5/+14)

    Define a per-CPU bitmap that contains the banks polled by the
    machine check poller. This is needed for the CMCI code in the next
    patches to be able to disable polling on specific banks.

    The bitmap by default contains all banks, so there is no behaviour
    change. Only future code will remove some banks from the polling
    set.

    Signed-off-by: Andi Kleen <ak@linux.intel.com>
    Signed-off-by: H. Peter Anvin <hpa@zytor.com>
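    A sketch of the per-CPU polling bitmap (the typedef mirrors the one
    the series puts in mce.h; the init helper is illustrative):

        #include <linux/bitmap.h>
        #include <linux/percpu.h>

        #define MAX_NR_BANKS 128

        typedef DECLARE_BITMAP(mce_banks_t, MAX_NR_BANKS);
        static DEFINE_PER_CPU(mce_banks_t, mce_poll_banks);

        static void mce_poll_banks_init(int banks)
        {
                /* default: every bank stays in the polling set */
                bitmap_fill(__get_cpu_var(mce_poll_banks), banks);
        }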
* x86, mce: replace machine check events logged interval with ratelimit (Andi Kleen, 2009-02-24; 1 file, -6/+5)

    Impact: behavior change, use common code

    Use a standard leaky-bucket ratelimit for the machine check warning
    print interval instead of waiting every check_interval. Also
    decrease the limit to twice per minute. This interacts better with
    threshold interrupts, because they can happen more often than
    check_interval.

    Signed-off-by: Andi Kleen <ak@linux.intel.com>
    Signed-off-by: H. Peter Anvin <hpa@zytor.com>
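    The common helper this moves to, with "twice per minute" expressed
    as a 60-second interval and a burst of 2 (a sketch; the state and
    function names are illustrative):

        #include <linux/kernel.h>
        #include <linux/ratelimit.h>

        static DEFINE_RATELIMIT_STATE(mce_ratelimit, 60 * HZ, 2);

        static void mce_maybe_warn(void)
        {
                /* leaky bucket: at most 2 messages per 60s window */
                if (__ratelimit(&mce_ratelimit))
                        printk(KERN_INFO "Machine check events logged\n");
        }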
* x86, mce, cmci: avoid potential reentry of threshold interrupt (Andi Kleen, 2009-02-24; 1 file, -1/+2)

    Impact: minor bugfix

    The threshold handler on AMD (and soon on Intel) could theoretically
    be reentered by the hardware. This could lead to corrupted events,
    because the machine check poll code assumes it is not reentered.
    Move the APIC ACK to the end of the interrupt handler to let the
    hardware avoid that.

    Signed-off-by: Andi Kleen <ak@linux.intel.com>
    Signed-off-by: H. Peter Anvin <hpa@zytor.com>
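    The handler shape after the move, roughly (a sketch abbreviated
    from what threshold.c ends up looking like; mce_threshold_vector is
    the shared hook introduced by the next commit):

        #include <linux/linkage.h>
        #include <linux/hardirq.h>
        #include <asm/apic.h>

        void (*mce_threshold_vector)(void);   /* set by AMD/Intel code */

        asmlinkage void mce_threshold_interrupt(void)
        {
                irq_enter();
                inc_irq_stat(irq_threshold_count);
                mce_threshold_vector();
                irq_exit();
                /* Ack only at the end, so the hardware cannot re-raise
                 * the interrupt into a half-finished handler. */
                ack_APIC_irq();
        }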
* x86, mce, cmci: factor out threshold interrupt handler (Andi Kleen, 2009-02-24; 3 files, -9/+31)

    Impact: cleanup; preparation for feature

    The mce_amd_64 code has its own private MC threshold vector with its
    own interrupt handler. Since Intel needs a similar handler, it makes
    sense to share the vector, because both cannot be active at the same
    time. I factored the common APIC handler code into a separate file
    which can be used by both the Intel and AMD MC code.

    This is needed for the next patch, which adds an Intel-specific CMCI
    handler.

    This patch should be a nop for AMD; it just moves some code around.

    Signed-off-by: Andi Kleen <ak@linux.intel.com>
    Signed-off-by: H. Peter Anvin <hpa@zytor.com>
* x86, mce, cmci: export MAX_NR_BANKS (Andi Kleen, 2009-02-24; 1 file, -6/+0)

    Impact: Cleanup (code movement)

    Move MAX_NR_BANKS into mce.h because it's needed there for follow-up
    patches.

    Signed-off-by: Andi Kleen <ak@linux.intel.com>
    Signed-off-by: H. Peter Anvin <hpa@zytor.com>
* Merge branch 'x86/urgent' into x86/mce2 (H. Peter Anvin, 2009-02-23; 3 files, -4/+4)
* x86, mce: remove incorrect __cpuinit for mce_cpu_features() (H. Peter Anvin, 2009-02-20; 3 files, -4/+4)

    Impact: Bug fix on UP

    Checkin 6ec68bff3c81e776a455f6aca95c8c5f1d630198:
        x86, mce: reinitialize per cpu features on resume
    introduced a call to mce_cpu_features() in the resume path, in order
    for the MCE machinery to get properly reinitialized after a resume.
    However, this function (and its successors) was flagged __cpuinit,
    which becomes __init on UP configurations (on SMP, suspend/resume
    requires CPU hotplug and so this would not be seen).

    Remove the offending __cpuinit annotations for mce_cpu_features()
    and its successor functions.

    Cc: Andi Kleen <ak@linux.intel.com>
    Cc: Linus Torvalds <torvalds@linux-foundation.org>
    Signed-off-by: H. Peter Anvin <hpa@zytor.com>
* x86, mce: remove invalid __cpuinit/__cpuexit annotations (H. Peter Anvin, 2009-02-23; 1 file, -3/+3)

    Impact: Bug fix when CPU hotplug is disabled

    Correct the following broken __cpuinit/__cpuexit annotations:

    - mce_cpu_features() is called from mce_resume(), and so cannot be
      __cpuinit.
    - mce_disable_cpu() and mce_reenable_cpu() are called from
      mce_cpu_callback(), and so cannot be __cpuexit.

    Cc: Andi Kleen <ak@linux.intel.com>
    Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
* x86, mce: use %ll instead of %L for 64-bit numbers (H. Peter Anvin, 2009-02-19; 1 file, -4/+4)

    Impact: Cleanup

    The standard spelling of a printf pattern for long long is "ll",
    not "L", which is for long double.

    Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
* x86, mce: separate correct machine check poller and fatal exception handler (Andi Kleen, 2009-02-19; 2 files, -22/+109)

    Impact: cleanup, performance enhancement

    The machine check poller is diverging more and more from the fatal
    exception handler. Instead of adding more special cases, separate
    the code paths completely. The corrected poll path is actually quite
    simple, and this doesn't result in much code duplication.

    This makes both handlers much easier to read and results in cleaner
    code flow. The exception handler now only needs to care about
    uncorrected errors, which also simplifies the handling of multiple
    errors. The corrected poller now always runs in standard interrupt
    context and does not need to do anything special to handle NMI
    context.

    Minor behaviour changes:
    - MCG status is now not cleared on polling.
    - Only the banks which had corrected errors get cleared on polling.
    - The exception handler only clears banks with errors now.

    v2: Forward port to new patch order. Add "uc" argument.

    Signed-off-by: Andi Kleen <ak@linux.intel.com>
    Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
* x86, mce: factor out duplicated struct mce setup into one function (Andi Kleen, 2009-02-19; 3 files, -13/+16)

    Impact: cleanup

    This merely factors out duplicated code that sets up the initial
    struct mce state into a single function.

    Signed-off-by: Andi Kleen <ak@linux.intel.com>
    Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
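    The factored-out helper is essentially this (a sketch reconstructed
    from the description, not the verbatim patch):

        #include <linux/string.h>
        #include <linux/smp.h>
        #include <asm/msr.h>
        #include <asm/mce.h>

        void mce_setup(struct mce *m)
        {
                memset(m, 0, sizeof(struct mce));
                m->cpu = smp_processor_id();
                rdtscll(m->tsc);        /* timestamp the record */
        }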
* x86, mce: implement dynamic machine check banks support (Andi Kleen, 2009-02-19; 1 file, -32/+115)

    Impact: cleanup; makes the code future proof; memory saving on small systems

    This patch replaces the hardcoded max number of machine check banks
    with dynamic allocation depending on what the CPU reports. The sysfs
    data structures and the banks array are dynamically allocated.

    There is still a hard bank limit (128) because the mcelog protocol
    uses banks >= 128 as pseudo banks to escape other events. But we
    expect 128 banks to be beyond any reasonable CPU for now.

    This supersedes an earlier patch by Venki, but it solves the problem
    more completely by making the limit fully dynamic (up to the 128
    boundary). This saves some memory on machines with fewer than 6
    banks, because they won't need sysdevs for unused ones, and it also
    allows sysfs to control these banks on possible future CPUs with
    more than 6 banks.

    This is an updated patch addressing Venki's comments. I also folded
    in another patch from Thomas which fixed the error allocation path
    (that patch was previously separate).

    Cc: Venki Pallipadi <venkatesh.pallipadi@intel.com>
    Signed-off-by: Andi Kleen <ak@linux.intel.com>
    Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
* x86, mce: fix a race condition in mce_read() (Huang Ying, 2009-02-17; 1 file, -17/+24)

    Impact: bugfix

    Consider the following situation, with one reader R and three
    writers W1-W3, starting from mcelog.next == 1 and
    mcelog.entry[0].finished == 1:

    R                     W1                 W2                 W3
    read mcelog.next (1)
                          mcelog.next++ (2)
                          (working on entry 1,
                           finished == 0)
    mcelog.next = 0
                                             mcelog.next++ (1)
                                             (working on entry 0)
                                                                mcelog.next++ (2)
                                                                (working on entry 1)
                          <--------------- race --------------->
                          (done on entry 1,
                           finished = 1)
                                                                (done on entry 1,
                                                                 finished = 1)

    W1 and W3 end up working on the same entry. To fix the race
    condition, a cmpxchg loop is added to mce_read() to ensure that no
    new MCE record can be added between the mcelog.next read and the
    mcelog.next = 0 reset.

    Signed-off-by: Huang Ying <ying.huang@intel.com>
    Signed-off-by: Andi Kleen <ak@linux.intel.com>
    Acked-by: Thomas Gleixner <tglx@linutronix.de>
    Signed-off-by: H. Peter Anvin <hpa@zytor.com>
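    The shape of the fix (a sketch, not the verbatim patch): re-read
    and reset mcelog.next atomically, so records appended between the
    read and the reset are not lost:

        #include <asm/cmpxchg.h>

        struct mce_log { unsigned next; /* entries follow */ };
        static struct mce_log mcelog;

        static void mce_read_flush(void)
        {
                unsigned prev, next;

                do {
                        prev = mcelog.next;
                        /* copy entries [0, prev) to user space here,
                         * waiting on each entry's 'finished' flag */
                        next = cmpxchg(&mcelog.next, prev, 0);
                        /* if a writer bumped mcelog.next meanwhile the
                         * swap fails; loop to pick up the new records */
                } while (next != prev);
        }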
* x86, mce: disable machine checks on offlined CPUs (Andi Kleen, 2009-02-17; 1 file, -0/+23)

    Impact: Lower priority bug fix

    Offlined CPUs could still get machine checks, but the machine check
    handler cannot handle them properly, leading to an unconditional
    crash. Disable machine checks on CPUs that are going down.

    Signed-off-by: Andi Kleen <ak@linux.intel.com>
    Acked-by: Thomas Gleixner <tglx@linutronix.de>
    Signed-off-by: H. Peter Anvin <hpa@zytor.com>
* x86, mce: don't set up mce sysdev devices with mce=off (Andi Kleen, 2009-02-17; 1 file, -4/+4)

    Impact: bug fix; in this case the resume handler shouldn't run,
    which avoids incorrectly reenabling machine checks on resume

    When MCEs are completely disabled on the command line, don't set up
    the sysdev devices for them either.

    Includes a comment fix from Thomas Gleixner.

    Signed-off-by: Andi Kleen <ak@linux.intel.com>
    Acked-by: Thomas Gleixner <tglx@linutronix.de>
    Signed-off-by: H. Peter Anvin <hpa@zytor.com>
* x86, mce: switch machine check polling to per CPU timer (Andi Kleen, 2009-02-17; 1 file, -23/+45)

    Impact: Higher priority bug fix

    The machine check poller ran a single timer and then broadcast an
    IPI to all CPUs to check them. This leads to unnecessary
    synchronization between CPUs. The CPU running the timer has to
    potentially wait a long time for all other CPUs to answer. This is
    also real-time unfriendly and in general inefficient.

    This was especially a problem on systems with a lot of events, where
    the poller runs at a higher frequency after processing some events.
    More and more CPU time could be wasted this way, to the point of
    significantly slowing down machines.

    The machine check polling is actually fully independent per CPU, so
    there's no reason not to do all of this with per-CPU timers. This
    patch implements that.

    Also switch the poller to use standard timers instead of work
    queues. It was using work queues to be able to execute a user
    program on an event, but mce_notify_user() handles that case now
    with a separate callback. So instead, always run the poll code in a
    standard per-CPU timer, which means that in the common case of not
    having to execute a trigger there will be less overhead.

    This allows the initialization to be cleaned up significantly,
    because standard timers are already up when machine checks get
    initialized. No multiple initialization functions.

    Thanks to Thomas Gleixner for some help.

    Cc: thockin@google.com
    v2: Use del_timer_sync() on cpu shutdown and don't try to handle
        migrated timers.
    v3: Add WARN_ON for timer running on unexpected CPU

    Signed-off-by: Andi Kleen <ak@linux.intel.com>
    Acked-by: Thomas Gleixner <tglx@linutronix.de>
    Signed-off-by: H. Peter Anvin <hpa@zytor.com>
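    The per-CPU timer skeleton this switches to, roughly (a sketch with
    the polling and shutdown details abbreviated):

        #include <linux/kernel.h>
        #include <linux/timer.h>
        #include <linux/jiffies.h>
        #include <linux/percpu.h>
        #include <linux/smp.h>

        static int check_interval = 5 * 60;     /* seconds */
        static DEFINE_PER_CPU(struct timer_list, mce_timer);

        static void mcheck_timer(unsigned long data)
        {
                struct timer_list *t = &__get_cpu_var(mce_timer);

                WARN_ON(smp_processor_id() != data);    /* v3: migrated? */
                /* poll this CPU's banks here, then rearm */
                t->expires = jiffies + check_interval * HZ;
                add_timer(t);
        }

        static void mce_init_timer(void)
        {
                struct timer_list *t = &__get_cpu_var(mce_timer);

                setup_timer(t, mcheck_timer, smp_processor_id());
                t->expires = round_jiffies(jiffies + check_interval * HZ);
                add_timer(t);
        }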
* x86, mce: always use separate work queue to run trigger (Andi Kleen, 2009-02-17; 1 file, -7/+18)

    Impact: Needed for bug fix in next patch

    This relaxes the requirement that mce_notify_user() has to run in
    process context. Useful for future changes, but it also leads to
    cleaner behaviour now. mce_notify_user() can now instead be called
    directly from interrupt (but not NMI) context.

    The work queue only uses a single global work struct, which can be
    done safely because it is always free to reuse before the trigger
    function is executed. This way no events can be lost.

    Signed-off-by: Andi Kleen <ak@linux.intel.com>
    Acked-by: Thomas Gleixner <tglx@linutronix.de>
    Signed-off-by: H. Peter Anvin <hpa@zytor.com>
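    Why one global work item suffices (a sketch; the function bodies
    are illustrative):

        #include <linux/workqueue.h>

        static void mce_do_trigger(struct work_struct *work)
        {
                /* process context: safe to spawn the user trigger */
        }

        static DECLARE_WORK(mce_trigger_work, mce_do_trigger);

        int mce_notify_user(void)
        {
                /* callable from interrupt (but not NMI) context;
                 * scheduling an already-pending work item is a no-op,
                 * so no events can be lost */
                schedule_work(&mce_trigger_work);
                return 1;
        }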
* x86, mce: don't disable machine checks during code patching (Andi Kleen, 2009-02-17; 2 files, -28/+0)

    Impact: low priority bug fix

    This removes part of a patch I added myself some time ago. After
    some consideration the patch was a bad idea. In particular, it
    stopped machine check exceptions during code patching.

    To quote the comment:

        * MCEs only happen when something got corrupted and in this
        * case we must do something about the corruption.
        * Ignoring it is worse than an unlikely patching race.
        * Also machine checks tend to be broadcast and if one CPU
        * goes into machine check the others follow quickly, so we don't
        * expect a machine check to cause undue problems during code
        * patching.

    So undo the machine check related parts of
    8f4e956b313dcccbc7be6f10808952345e3b638c. NMIs are still disabled.

    This only removes code; the only addition is a new comment.

    Signed-off-by: Andi Kleen <ak@linux.intel.com>
    Acked-by: Thomas Gleixner <tglx@linutronix.de>
    Signed-off-by: H. Peter Anvin <hpa@zytor.com>
* x86, mce: disable machine checks on suspend (Andi Kleen, 2009-02-17; 1 file, -0/+25)

    Impact: Bug fix

    During suspend it is not reliable to process machine check
    exceptions, because CPUs disappear but can still get machine check
    broadcasts. Also, the system is slightly more likely to machine
    check then, but the handler is typically not in a position to handle
    the exceptions in a meaningful way.

    So disable machine checks during suspend and enable them during
    resume. Also make sure they are always disabled on hot-unplugged
    CPUs.

    This new code assumes that suspend always hot-unplugs all non-boot
    CPUs.

    v2: Remove the WARN_ONs Thomas objected to.

    Signed-off-by: Andi Kleen <ak@linux.intel.com>
    Acked-by: Thomas Gleixner <tglx@linutronix.de>
    Signed-off-by: H. Peter Anvin <hpa@zytor.com>
* x86, mce: use force_sig_info to kill process in machine check (Andi Kleen, 2009-02-17; 1 file, -2/+2)

    Impact: bug fix (with tolerant == 3)

    do_exit() cannot be called directly from the exception handler,
    because it can sleep and the exception handler runs on the exception
    stack. Use force_sig() instead.

    Based on an earlier patch by Ying Huang, who debugged the problem.

    Signed-off-by: Andi Kleen <ak@linux.intel.com>
    Acked-by: Thomas Gleixner <tglx@linutronix.de>
    Signed-off-by: H. Peter Anvin <hpa@zytor.com>
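    The safe call from the exception handler, in the era's two-argument
    form (a sketch; the wrapper function is hypothetical):

        #include <linux/sched.h>

        static void mce_kill_current(void)
        {
                /* do_exit() may sleep and we are on the exception
                 * stack; queue a fatal signal instead and let the task
                 * die on its way back to user space */
                force_sig(SIGBUS, current);
        }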
* x86, mce: reinitialize per cpu features on resume (Andi Kleen, 2009-02-17; 1 file, -0/+1)

    Impact: Bug fix

    This fixes a long-standing bug in the machine check code. On resume
    the boot CPU wouldn't get its vendor-specific state, like thermal
    handling, reinitialized. This meant the boot CPU wouldn't ever get
    any thermal events reported again. Call the respective
    initialization functions on resume.

    v2: Remove ancient init, because they don't have a resume device
        anyway. Pointed out by Thomas Gleixner.
    v3: Now fix the Subject too to reflect the v2 change.

    Signed-off-by: Andi Kleen <ak@linux.intel.com>
    Acked-by: Thomas Gleixner <tglx@linutronix.de>
    Signed-off-by: H. Peter Anvin <hpa@zytor.com>