aboutsummaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
* KVM: VMX: conditionally disable 2M pagesMarcelo Tosatti2009-09-103-2/+16
| | | | | | | | | | Disable usage of 2M pages if VMX_EPT_2MB_PAGE_BIT (bit 16) is clear in MSR_IA32_VMX_EPT_VPID_CAP and EPT is enabled. [avi: s/largepages_disabled/largepages_enabled/ to avoid negative logic] Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: VMX: EPT misconfiguration handlerMarcelo Tosatti2009-09-101-1/+85
| | | | | | | | | | Handler for EPT misconfiguration which checks for valid state in the shadow pagetables, printing the spte on each level. The separate WARN_ONs are useful for kerneloops.org. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: MMU: add kvm_mmu_get_spte_hierarchy helperMarcelo Tosatti2009-09-102-0/+20
| | | | | | | Required by EPT misconfiguration handler. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: MMU: make for_each_shadow_entry aware of largepagesMarcelo Tosatti2009-09-101-0/+5
| | | | | | | | This way there is no need to add explicit checks in every for_each_shadow_entry user. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: VMX: more MSR_IA32_VMX_EPT_VPID_CAP capability bitsMarcelo Tosatti2009-09-102-0/+27
| | | | | | | Required for EPT misconfiguration handler. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: Move performance counter MSR access interception to generic x86 pathAndre Przywara2009-09-103-28/+30
| | | | | | | | | | | | | The performance counter MSRs are different for AMD and Intel CPUs and they are chosen mainly by the CPUID vendor string. This patch catches writes to all addresses (regardless of VMX/SVM path) and handles them in the generic MSR handler routine. Writing a 0 into the event select register is something we perfectly emulate ;-), so don't print out a warning to dmesg in this case. This fixes booting a 64bit Windows guest with an AMD CPUID on an Intel host. Signed-off-by: Andre Przywara <andre.przywara@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: MMU audit: largepage handlingMarcelo Tosatti2009-09-101-8/+7
| | | | | | | Make the audit code aware of largepages. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: MMU audit: audit_mappings tweaksMarcelo Tosatti2009-09-101-1/+7
| | | | | | | | | | - Fail early in case gfn_to_pfn returns is_error_pfn. - For the pre pte write case, avoid spurious "gva is valid but spte is notrap" messages (the emulation code does the guest write first, so this particular case is OK). Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: MMU audit: nontrapping ptes in nonleaf levelMarcelo Tosatti2009-09-101-6/+1
| | | | | | | It is valid to set non leaf sptes as notrap. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: MMU audit: update audit_write_protectionMarcelo Tosatti2009-09-101-3/+11
| | | | | | | | - Unsync pages contain writable sptes in the rmap. - rmaps do not exclusively contain writable sptes anymore. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: MMU audit: update count_writable_mappings / count_rmapsMarcelo Tosatti2009-09-101-10/+94
| | | | | | | | | | | | | | | | | | Under testing, count_writable_mappings returns a value that is 2 integers larger than what count_rmaps returns. Suspicion is that either of the two functions is counting a duplicate (either positively or negatively). Modifying check_writable_mappings_rmap to check for rmap existance on all present MMU pages fails to trigger an error, which should keep Avi happy. Also introduce mmu_spte_walk to invoke a callback on all present sptes visible to the current vcpu, might be useful in the future. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: MMU: introduce is_last_spte helperMarcelo Tosatti2009-09-101-13/+13
| | | | | | | | | | Hiding some of the last largepage / level interaction (which is useful for gbpages and for zero based levels). Also merge the PT_PAGE_TABLE_LEVEL clearing loop in unlink_children. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: Return to userspace on emulation failureAvi Kivity2009-09-102-2/+10
| | | | | | | Instead of mindlessly retrying to execute the instruction, report the failure to userspace. Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: Use macro to iterate over vcpus.Gleb Natapov2009-09-109-74/+76
| | | | | | | | [christian: remove unused variables on s390] Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: Break dependency between vcpu index in vcpus array and vcpu_id.Gleb Natapov2009-09-109-33/+55
| | | | | | | | | Archs are free to use vcpu_id as they see fit. For x86 it is used as vcpu's apic id. New ioctl is added to configure boot vcpu id that was assumed to be 0 till now. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: Use pointer to vcpu instead of vcpu_id in timer code.Gleb Natapov2009-09-104-4/+4
| | | | | Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: Introduce kvm_vcpu_is_bsp() function.Gleb Natapov2009-09-1011-18/+28
| | | | | | | Use it instead of open code "vcpu_id zero is BSP" assumption. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: MMU: s/shadow_pte/spte/Avi Kivity2009-09-102-59/+59
| | | | | | | | | We use shadow_pte and spte inconsistently, switch to the shorter spelling. Rename set_shadow_pte() to __set_spte() to avoid a conflict with the existing set_spte(), and to indicate its lowlevelness. Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: MMU: Adjust pte accessors to explicitly indicate guest or shadow pteAvi Kivity2009-09-104-21/+21
| | | | | | | | | Since the guest and host ptes can have wildly different format, adjust the pte accessor names to indicate on which type of pte they operate on. No functional changes. Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: MMU: Fix is_dirty_pte()Avi Kivity2009-09-101-1/+1
| | | | | | | | | | is_dirty_pte() is used on guest ptes, not shadow ptes, so it needs to avoid shadow_dirty_mask and use PT_DIRTY_MASK instead. Misdetecting dirty pages could lead to unnecessarily setting the dirty bit under EPT. Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: VMX: Move rmode structure to vmx-specific codeAvi Kivity2009-09-102-44/+44
| | | | | | rmode is only used in vmx, so move it to vmx.c Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: Reorder ioctls in kvm.hAvi Kivity2009-09-101-5/+5
| | | | | | Somehow the VM ioctls got unsorted; resort. Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: VMX: Support Unrestricted Guest featureNitin A Kamble2009-09-103-11/+62
| | | | | | | | | | | | | | | | | | "Unrestricted Guest" feature is added in the VMX specification. Intel Westmere and onwards processors will support this feature. It allows kvm guests to run real mode and unpaged mode code natively in the VMX mode when EPT is turned on. With the unrestricted guest there is no need to emulate the guest real mode code in the vm86 container or in the emulator. Also the guest big real mode code works like native. The attached patch enhances KVM to use the unrestricted guest feature if available on the processor. It also adds a new kernel/module parameter to disable the unrestricted guest feature at the boot time. Signed-off-by: Nitin A Kamble <nitin.a.kamble@intel.com> Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: switch irq injection/acking data structures to irq_lockMarcelo Tosatti2009-09-108-30/+58
| | | | | | | | | | | | | | | | | | | Protect irq injection/acking data structures with a separate irq_lock mutex. This fixes the following deadlock: CPU A CPU B kvm_vm_ioctl_deassign_dev_irq() mutex_lock(&kvm->lock); worker_thread() -> kvm_deassign_irq() -> kvm_assigned_dev_interrupt_work_handler() -> deassign_host_irq() mutex_lock(&kvm->lock); -> cancel_work_sync() [blocked] [gleb: fix ia64 path] Reported-by: Alex Williamson <alex.williamson@hp.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: introduce irq_lock, use it to protect ioapicMarcelo Tosatti2009-09-103-1/+8
| | | | | | | Introduce irq_lock, and use to protect ioapic data structures. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: move coalesced_mmio locking to its own deviceMarcelo Tosatti2009-09-102-6/+5
| | | | | | | | Move coalesced_mmio locking to its own device, instead of relying on kvm->lock. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: Grab pic lock in kvm_pic_clear_isr_ackMarcelo Tosatti2009-09-101-0/+2
| | | | | | | isr_ack is protected by kvm_pic->lock. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: Cleanup LAPIC interfaceJan Kiszka2009-09-101-12/+0
| | | | | | | | | None of the interface services the LAPIC emulation provides need to be exported to modules, and kvm_lapic_get_base is even totally unused today. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: ppc: e500: Add MMUCFG and PVR emulationLiu Yu2009-09-102-0/+5
| | | | | | | Latest kernel started to use these two registers. Signed-off-by: Liu Yu <yu.liu@freescale.com> Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: ppc: e500: Directly pass pvr to guestLiu Yu2009-09-103-5/+1
| | | | | Signed-off-by: Liu Yu <yu.liu@freescale.com> Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: ppc: e500: Move to Book-3e MMU definitionsLiu Yu2009-09-102-8/+8
| | | | | | | According to commit 70fe3af8403f85196bb74f22ce4813db7dfedc1a. Signed-off-by: Liu Yu <yu.liu@freescale.com> Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: Calculate available entries in coalesced mmio ringAvi Kivity2009-09-101-5/+5
| | | | | | | | | | | Instead of checking whether we'll wrap around, calculate how many entries are available, and check whether we have enough (just one) for the pending mmio. By itself, this doesn't change anything, but it paves the way for making this function lockless. Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: VMX: Fix reporting of unhandled EPT violationsAvi Kivity2009-09-101-2/+2
| | | | | | | Instead of returning -ENOTSUPP, exit normally but indicate the hardware exit reason. Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: Cache pdptrsAvi Kivity2009-09-107-13/+63
| | | | | | | Instead of reloading the pdptrs on every entry and exit (vmcs writes on vmx, guest memory access on svm) extract them on demand. Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: VMX: Simplify pdptr and cr3 managementAvi Kivity2009-09-101-6/+15
| | | | | | | | | Instead of reading the PDPTRs from memory after every exit (which is slow and wrong, as the PDPTRs are stored on the cpu), sync the PDPTRs from memory to the VMCS before entry, and from the VMCS to memory after exit. Do the same for cr3. Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: VMX: Avoid duplicate ept tlb flush when setting cr3Avi Kivity2009-09-101-1/+0
| | | | | | | vmx_set_cr3() will call vmx_tlb_flush(), which will flush the ept context. So there is no need to call ept_sync_context() explicitly. Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: do not register i8254 PIO regions until we are initializedGregory Haskins2009-09-101-9/+8
| | | | | | | | | | | We currently publish the i8254 resources to the pio_bus before the devices are fully initialized. Since we hold the pit_lock, its probably not a real issue. But lets clean this up anyway. Reported-by: Avi Kivity <avi@redhat.com> Signed-off-by: Gregory Haskins <ghaskins@novell.com> Acked-by: Chris Wright <chrisw@sous-sol.org> Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: cleanup io_device codeGregory Haskins2009-09-108-53/+109
| | | | | | | | | | | We modernize the io_device code so that we use container_of() instead of dev->private, and move the vtable to a separate ops structure (theoretically allows better caching for multiple instances of the same ops structure) Signed-off-by: Gregory Haskins <ghaskins@novell.com> Acked-by: Chris Wright <chrisw@sous-sol.org> Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: Clean up coalesced_mmio destructionGregory Haskins2009-09-101-1/+4
| | | | | | | | | | We invoke kfree() on a data member instead of the structure. This works today because the kvm_io_device is the first element of the private structure, but this could change in the future, so lets clean this up. Signed-off-by: Gregory Haskins <ghaskins@novell.com> Acked-by: Chris Wright <chrisw@sous-sol.org> Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: powerpc: fix some init/exit annotationsStephen Rothwell2009-09-103-5/+5
| | | | | | | | | | | | | | Fixes a couple of warnings like this one: WARNING: arch/powerpc/kvm/kvm-440.o(.text+0x1e8c): Section mismatch in reference from the function kvmppc_44x_exit() to the function .exit.text:kvmppc_booke_exit() The function kvmppc_44x_exit() references a function in an exit section. Often the function kvmppc_booke_exit() has valid usage outside the exit section and the fix is to remove the __exit annotation of kvmppc_booke_exit. Also add some __init annotations on obvious routines. Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: SVM: Fold kvm_svm.h info svm.cAvi Kivity2009-09-102-54/+40
| | | | | | kvm_svm.h is only included from svm.c, so fold it in. Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: remove redundant declarationsChristian Ehrhardt2009-09-101-2/+0
| | | | | | | | | | | | Changing s390 code in kvm_arch_vcpu_load/put come across this header declarations. They are complete duplicates, not even useful forward declarations as nothing using it is in between (maybe it was that in the past). This patch removes the two dispensable lines. Signed-off-by: Christian Ehrhardt <ehrhardt@linux.vnet.ibm.com> Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: No disable_irq for MSI/MSI-X interrupt on device assignmentSheng Yang2009-09-101-14/+8
| | | | | | | | | | | | | | | | Disable interrupt at interrupt handler and enable it when guest ack is for the level triggered interrupt, to prevent reinjected interrupt. MSI/MSI-X don't need it. One possible problem is multiply same vector interrupt injected between irq handler and scheduled work handler would be merged as one for MSI/MSI-X. But AFAIK, the drivers handle it well. The patch fixed the oplin card performance issue(MSI-X performance is half of MSI/INTx). Signed-off-by: Sheng Yang <sheng@linux.intel.com> Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: SVM: use explicit 64bit storage for sysenter valuesAndre Przywara2009-09-102-4/+6
| | | | | | | | | | | | Since AMD does not support sysenter in 64bit mode, the VMCB fields storing the MSRs are truncated to 32bit upon VMRUN/#VMEXIT. So store the values in a separate 64bit storage to avoid truncation. [andre: fix amd->amd migration] Signed-off-by: Christoph Egger <christoph.egger@amd.com> Signed-off-by: Andre Przywara <andre.przywara@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: Downsize max support MSI-X entry to 256Sheng Yang2009-09-101-1/+1
| | | | | | | | We only trap one page for MSI-X entry now, so it's 4k/(128/8) = 256 entries at most. Signed-off-by: Sheng Yang <sheng@linux.intel.com> Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: s390: streamline memslot handlingChristian Ehrhardt2009-09-106-54/+62
| | | | | | | | | | | | | This patch relocates the variables kvm-s390 uses to track guest mem addr/size. As discussed dropping the variables at struct kvm_arch level allows to use the common vcpu->request based mechanism to reload guest memory if e.g. changes via set_memory_region. The kick mechanism introduced in this series is used to ensure running vcpus leave guest state to catch the update. Signed-off-by: Christian Ehrhardt <ehrhardt@linux.vnet.ibm.com> Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: s390: fix signal handlingChristian Ehrhardt2009-09-101-1/+3
| | | | | | | | | | | | | If signal pending is true we exit without updating kvm_run, userspace currently just does nothing and jumps to kvm_run again. Since we did not set an exit_reason we might end up with a random one (whatever was the last exit). Therefore it was possible to e.g. jump to the psw position the last real interruption set. Setting the INTR exit reason ensures that no old psw data is swapped in on reentry. Signed-off-by: Christian Ehrhardt <ehrhardt@linux.vnet.ibm.com> Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: s390: infrastructure to kick vcpus out of guest stateChristian Ehrhardt2009-09-105-27/+54
| | | | | | | | | | To ensure vcpu's come out of guest context in certain cases this patch adds a s390 specific way to kick them out of guest context. Currently it kicks them out to rerun the vcpu_run path in the s390 code, but the mechanism itself is expandable and with a new flag we could also add e.g. kicks to userspace etc. Signed-off-by: Christian Ehrhardt <ehrhardt@linux.vnet.ibm.com> Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: ia64: Correct itc_offset calculationsJes Sorensen2009-09-101-1/+1
| | | | | | | | | | | Init the itc_offset for all possible vCPUs. The current code by mistake ends up only initializing the offset on vCPU 0. Spotted by Gleb Natapov. Signed-off-by: Jes Sorensen <jes@sgi.com> Acked-by : Xiantao Zhang <xiantao.zhang@intel.com> Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: Allow PIT emulation without speaker portJan Kiszka2009-09-104-8/+32
| | | | | | | | | | | | | The in-kernel speaker emulation is only a dummy and also unneeded from the performance point of view. Rather, it takes user space support to generate sound output on the host, e.g. console beeps. To allow this, introduce KVM_CREATE_PIT2 which controls in-kernel speaker port emulation via a flag passed along the new IOCTL. It also leaves room for future extensions of the PIT configuration interface. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Avi Kivity <avi@redhat.com>