From 254e42006c893f45bca48f313536fcba12206418 Mon Sep 17 00:00:00 2001
From: Suresh Siddha
Date: Mon, 6 Dec 2010 12:26:30 -0800
Subject: x86, vt-d: Quirk for masking vtd spec errors to platform error handling logic

On platforms with the Intel 7500 chipset, there were some reports of system
hangs/NMIs during kexec/kdump with interrupt-remapping enabled.

During kdump, there is a window where the devices might still be using the
old kernel's interrupt information while the kdump kernel is coming up. This
can cause vt-d faults, as the interrupt configuration from the old kernel
maps to null IRTE entries in the new kernel, etc. (Without interrupt-remapping
enabled, the same issue exists, but in that case only benign spurious
interrupts hit the new kernel.)

Based on platform config settings, these platforms seem to generate an NMI/SMI
when a vt-d fault happens, and there were reports that the resulting SMI
causes the system to hang.

Fix it by masking vt-d spec defined errors to the platform error reporting
logic. VT-d spec related errors are already handled by the VT-d OS code, so
there is no need to report the same error through other channels.

Signed-off-by: Suresh Siddha
LKML-Reference: <1291667190.2675.8.camel@sbsiddha-MOBL3.sc.intel.com>
Cc: stable@kernel.org [v2.6.32+]
Reported-by: Max Asbock
Reported-and-tested-by: Takao Indoh
Acked-by: Chris Wright
Acked-by: Kenji Kaneshige
Signed-off-by: H. Peter Anvin
---
 drivers/pci/quirks.c | 23 +++++++++++++++++++++++
 1 file changed, 23 insertions(+)

diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
index 6f9350c..36191ed 100644
--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -2764,6 +2764,29 @@ DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_RICOH, PCI_DEVICE_ID_RICOH_R5C832, ricoh_m
 DECLARE_PCI_FIXUP_RESUME_EARLY(PCI_VENDOR_ID_RICOH, PCI_DEVICE_ID_RICOH_R5C832, ricoh_mmc_fixup_r5c832);
 #endif /*CONFIG_MMC_RICOH_MMC*/
 
+#if defined(CONFIG_DMAR) || defined(CONFIG_INTR_REMAP)
+#define VTUNCERRMSK_REG	0x1ac
+#define VTD_MSK_SPEC_ERRORS	(1 << 31)
+/*
+ * This is a quirk for masking vt-d spec defined errors to platform error
+ * handling logic. With out this, platforms using Intel 7500, 5500 chipsets
+ * (and the derivative chipsets like X58 etc) seem to generate NMI/SMI (based
+ * on the RAS config settings of the platform) when a vt-d fault happens.
+ * The resulting SMI caused the system to hang.
+ *
+ * VT-d spec related errors are already handled by the VT-d OS code, so no
+ * need to report the same error through other channels.
+ */
+static void vtd_mask_spec_errors(struct pci_dev *dev)
+{
+	u32 word;
+
+	pci_read_config_dword(dev, VTUNCERRMSK_REG, &word);
+	pci_write_config_dword(dev, VTUNCERRMSK_REG, word | VTD_MSK_SPEC_ERRORS);
+}
+DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x342e, vtd_mask_spec_errors);
+DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x3c28, vtd_mask_spec_errors);
+#endif
 
 static void pci_do_fixups(struct pci_dev *dev, struct pci_fixup *f,
 			  struct pci_fixup *end)
-- 
cgit v1.1


From 7f99d946e71e71d484b7543b49e990508e70d0c0 Mon Sep 17 00:00:00 2001
From: Suresh Siddha
Date: Tue, 30 Nov 2010 22:22:29 -0800
Subject: x86, vt-d: Handle previous faults after enabling fault handling

Fault handling is enabled after interrupt-remapping is enabled (as the
success of interrupt-remapping can affect the apic mode and hence the fault
handling mode).

Hence there can potentially be some faults in the window between enabling
interrupt-remapping in the vt-d and enabling fault handling for the vt-d
units. Handle any previous faults after enabling the vt-d fault handling.

For the v2.6.38 cleanup, we need to check whether we can remove the
dmar_fault() call in enable_intr_remapping() and enable fault handling
together with intr-remapping.

Signed-off-by: Suresh Siddha
LKML-Reference: <20101201062244.630417138@intel.com>
Cc: stable@kernel.org [v2.6.32+]
Acked-by: Chris Wright
Signed-off-by: H. Peter Anvin
---
 drivers/pci/dmar.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/drivers/pci/dmar.c b/drivers/pci/dmar.c
index 0157708..09933eb 100644
--- a/drivers/pci/dmar.c
+++ b/drivers/pci/dmar.c
@@ -1417,6 +1417,11 @@ int __init enable_drhd_fault_handling(void)
 			       (unsigned long long)drhd->reg_base_addr, ret);
 			return -1;
 		}
+
+		/*
+		 * Clear any previous faults.
+		 */
+		dmar_fault(iommu->irq, iommu);
 	}
 
 	return 0;
-- 
cgit v1.1
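
Review note: the read-modify-write done by vtd_mask_spec_errors() in the first patch can be exercised in user space without a 7500-series platform. The following is only a minimal sketch under stated assumptions, not kernel code: `struct fake_pci_dev` and the `fake_*` helpers are hypothetical stand-ins for `struct pci_dev` and the `pci_{read,write}_config_dword()` accessors, and the config space is modelled as a plain byte array. The sketch also writes the mask as an unsigned constant, since `1 << 31` on a signed int is not strictly defined in ISO C; that is a choice made for this illustration and not a change to the patch.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Register offset and mask bit as named in the patch. */
#define VTUNCERRMSK_REG     0x1ac
/* Unsigned form of the mask; an assumption of this sketch, see lead-in. */
#define VTD_MSK_SPEC_ERRORS (UINT32_C(1) << 31)

/* Hypothetical stand-in for a device with 4 KiB of PCIe config space. */
struct fake_pci_dev {
	uint8_t cfg[4096];
};

/* Hypothetical analogue of pci_read_config_dword(). */
static uint32_t fake_read_config_dword(const struct fake_pci_dev *dev,
				       unsigned int off)
{
	uint32_t val;

	assert(off + sizeof(val) <= sizeof(dev->cfg));
	memcpy(&val, &dev->cfg[off], sizeof(val));
	return val;
}

/* Hypothetical analogue of pci_write_config_dword(). */
static void fake_write_config_dword(struct fake_pci_dev *dev,
				    unsigned int off, uint32_t val)
{
	assert(off + sizeof(val) <= sizeof(dev->cfg));
	memcpy(&dev->cfg[off], &val, sizeof(val));
}

/*
 * Same shape as the quirk: read the mask register, set bit 31, and write
 * the value back so that every other mask bit is preserved.
 */
static void fake_vtd_mask_spec_errors(struct fake_pci_dev *dev)
{
	uint32_t word = fake_read_config_dword(dev, VTUNCERRMSK_REG);

	fake_write_config_dword(dev, VTUNCERRMSK_REG,
				word | VTD_MSK_SPEC_ERRORS);
}
```

Calling fake_vtd_mask_spec_errors() twice on the same device leaves the register unchanged after the first call. The kernel quirk has the same property, which matters because the fixup can run again in a kexec/kdump kernel on hardware whose mask bit is already set.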