Commit message | Author | Age | Files | Lines
* KVM: x86 emulator: simplify exception generationAvi Kivity2011-01-121-90/+50
| | | | | | | | Immediately after we generate an exception, we want to return X86EMUL_PROPAGATE_FAULT, so return that constant from the generation functions. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
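A minimal sketch of the pattern this commit describes. The context type, the helper names and the numeric values of the return codes are simplified assumptions for illustration, not the emulator's actual definitions:

```c
#include <stdbool.h>
#include <stdint.h>

/* Return codes modelled on the emulator's; the numeric values are illustrative. */
#define X86EMUL_CONTINUE        0
#define X86EMUL_PROPAGATE_FAULT 2

#define GP_VECTOR 13

/* Hypothetical, heavily simplified emulation context. */
struct emu_ctxt {
    int      exception;         /* pending vector, -1 if none */
    bool     error_code_valid;
    uint32_t error_code;
};

/* Record the exception and hand back the code every caller needs next. */
static int emulate_exception(struct emu_ctxt *ctxt, int vec,
                             uint32_t error, bool valid)
{
    ctxt->exception = vec;
    ctxt->error_code = error;
    ctxt->error_code_valid = valid;
    return X86EMUL_PROPAGATE_FAULT;
}

static int emulate_gp(struct emu_ctxt *ctxt, uint32_t error)
{
    return emulate_exception(ctxt, GP_VECTOR, error, true);
}

/* Hypothetical caller: one statement instead of "inject, then return". */
static int check_privilege(struct emu_ctxt *ctxt, int cpl)
{
    if (cpl != 0)
        return emulate_gp(ctxt, 0);
    return X86EMUL_CONTINUE;
}
```

Because the generator returns the constant itself, each call site shrinks from two statements to one, which is consistent with the net line reduction shown for this commit.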
* KVM: x86 emulator: tighen up ->read_std() and ->write_std() error checksAvi Kivity2011-01-121-8/+8
| | | | | | | | Instead of checking for X86EMUL_PROPAGATE_FAULT, check for any error, making the callers more reliable. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
* KVM: x86 emulator: drop dead pf injection in emulate_popf()Avi Kivity2011-01-121-8/+0
| | | | | | | If rc == X86EMUL_PROPAGATE_FAULT, we would have returned earlier. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
* KVM: x86 emulator: make emulator memory callbacks return full exceptionAvi Kivity2011-01-123-96/+84
| | | | | | | This way, they can return #GP, not just #PF. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
* KVM: x86 emulator: introduce struct x86_exception to communicate faultsAvi Kivity2011-01-123-12/+38
| | | | | | | | Introduce a structure that can contain an exception to be passed back to main kvm code. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
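As a rough illustration of how such a structure can carry a fault back to the caller, the sketch below fills it from a memory callback. The field names, the flat-memory callback signature and the numeric return codes are assumptions made for this example; the real definitions live in the KVM sources:

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Illustrative values; only the names follow the commit messages. */
#define X86EMUL_CONTINUE        0
#define X86EMUL_PROPAGATE_FAULT 2
#define PF_VECTOR 14

/* Assumed layout: which exception, and an optional error code. */
struct x86_exception {
    uint8_t  vector;
    bool     error_code_valid;
    uint16_t error_code;
};

/*
 * Hypothetical memory callback over a flat buffer standing in for guest
 * memory. On failure it describes the fault in *fault instead of
 * injecting it itself, so the caller decides what to do with it.
 */
static int read_std(const uint8_t *mem, size_t mem_size, uint64_t addr,
                    void *val, size_t bytes, struct x86_exception *fault)
{
    if (addr > mem_size || bytes > mem_size - addr) {
        fault->vector = PF_VECTOR;
        fault->error_code_valid = true;
        fault->error_code = 0;
        return X86EMUL_PROPAGATE_FAULT;
    }
    memcpy(val, mem + addr, bytes);
    return X86EMUL_CONTINUE;
}
```

Since the vector travels inside the structure, the same callback shape could report #GP as easily as #PF.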
* KVM: MMU: delay flush all tlbs on sync_page pathXiao Guangrong2011-01-123-3/+18
| | | | | | | | | | Quote from Avi: | I don't think we need to flush immediately; set a "tlb dirty" bit somewhere | that is cleared when we flush the tlb. kvm_mmu_notifier_invalidate_page() | can consult the bit and force a flush if set. Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
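The quoted idea can be sketched in a few lines. This is a single-threaded toy with hypothetical names (`struct vm`, `tlbs_dirty`, `flushes`); real code would need atomic accesses and an actual remote TLB flush:

```c
#include <stdbool.h>

/* Hypothetical per-VM state for the sketch. */
struct vm {
    bool          tlbs_dirty;  /* an spte was dropped without flushing */
    unsigned long flushes;     /* counts simulated remote TLB flushes */
};

/* Flushing clears the dirty bit, as the quote suggests. */
static void flush_remote_tlbs(struct vm *vm)
{
    vm->flushes++;
    vm->tlbs_dirty = false;
}

/* sync_page path: drop the spte, but only note that a flush is owed. */
static void sync_page_drop_spte(struct vm *vm)
{
    vm->tlbs_dirty = true;
}

/* Notifier path: flush if an spte was found or a flush is still owed. */
static void mmu_notifier_invalidate_page(struct vm *vm, bool spte_found)
{
    if (spte_found || vm->tlbs_dirty)
        flush_remote_tlbs(vm);
}
```

The flush is deferred to the point where a stale translation would matter, so several dropped sptes can share one flush.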
* KVM: MMU: abstract invalid guest pte mappingXiao Guangrong2011-01-122-37/+37
| | | | | | | Introduce a common function to map invalid gpte Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
* KVM: MMU: remove 'clear_unsync' parameterXiao Guangrong2011-01-123-8/+7
| | | | | | | Remove it since we can judge it by using sp->unsync Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
* KVM: MMU: rename 'reset_host_protection' to 'host_writable'Lai Jiangshan2011-01-122-9/+9
| | | | | | | | Rename it to fit its sense better Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
* KVM: MMU: don't drop spte if overwrite it from W to ROXiao Guangrong2011-01-121-11/+9
| | | | | | | | | We only need to flush the tlb when overwriting a writable spte with a read-only one, and this operation should move to set_spte() for the sync_page path. Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
* KVM: MMU: fix forgot flush tlbs on sync_page pathXiao Guangrong2011-01-121-0/+1
| | | | | | | | | | | | | | | | | We should flush all tlbs after dropping an spte on the sync_page path. Quote from Avi: | sync_page | drop_spte | kvm_mmu_notifier_invalidate_page | kvm_unmap_rmapp | spte doesn't exist -> no flush | page is freed | guest can write into freed page? KVM-Stable-Tag. Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
* KVM: PPC: Fix compile warningAlexander Graf2011-01-121-0/+2
| | | | | | | | | | | | | | | KVM compilation fails with the following warning: include/linux/kvm_host.h: In function 'kvm_irq_routing_update': include/linux/kvm_host.h:679:2: error: 'struct kvm' has no member named 'irq_routing' That function is only used and reasonable to have on systems that implement an in-kernel interrupt chip. PPC doesn't. Fix by #ifdef'ing it out when no irqchip is available. Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: Add instruction-set-specific exit qualifications to kvm_exit traceAvi Kivity2011-01-124-2/+25
| | | | | | | | | | | The exit reason alone is insufficient to understand exactly why an exit occurred; add ISA-specific trace parameters for additional information. Because fetching these parameters is expensive on vmx, and because these parameters are fetched even if tracing is disabled, we fetch the parameters via a callback instead of as traditional trace arguments. Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: Record instruction set in kvm_exit tracepointAvi Kivity2011-01-123-4/+9
| | | | | | | exit_reason's meaning depends on the instruction set; record it so a trace taken on one machine can be interpreted on another. Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: fast-path msi injection with irqfdMichael S. Tsirkin2011-01-123-15/+99
| | | | | | | | | | | | | | | | | | | | | | | | | | Store irq routing table pointer in the irqfd object, and use that to inject MSI directly without bouncing out to a kernel thread. While we touch this structure, rearrange irqfd fields to make fastpath better packed for better cache utilization. This also adds some comments about locking rules and rcu usage in code. Some notes on the design: - Use pointer into the rt instead of copying an entry, to make it possible to use rcu, thus side-stepping locking complexities. We also save some memory this way. - Old workqueue code is still used for level irqs. I don't think we DTRT with level anyway, however, it seems easier to keep the code around as it has been thought through and debugged, and fix level later than rip out and re-instate it later. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Marcelo Tosatti <mtosatti@redhat.com> Acked-by: Gregory Haskins <ghaskins@novell.com> Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: VMX: Fold __vmx_vcpu_run() into vmx_vcpu_run()Avi Kivity2011-01-121-38/+25
| | | | | | | | | cea15c2 ("KVM: Move KVM context switch into own function") split vmx_vcpu_run() to prevent multiple copies of the context switch from being generated (causing problems due to a label). This patch folds them back together again and adds the __noclone attribute to prevent the label from being duplicated. Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: x86 emulator: do not perform address calculations on linear addressesAvi Kivity2011-01-121-1/+2
| | | | | | | | Linear addresses are supposed to already have segment checks performed on them; if we play with these addresses the checks become invalid. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
* KVM: x86 emulator: preserve an operand's segment identityAvi Kivity2011-01-122-52/+59
| | | | | | | | | | | | | Currently the x86 emulator converts the segment register associated with an operand into a segment base which is added into the operand address. This loss of information results in us not doing segment limit checks properly. Replace struct operand's addr.mem field by a segmented_address structure which holds both the effective address and segment. This will allow us to do the limit check at the point of access. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
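To illustrate why keeping the segment matters, here is a sketch in which the limit check happens at the point of access. Only the name segmented_address comes from the commit message; the field names, `struct seg_desc`, `linearize()` and the expand-up-only limit check are simplifying assumptions:

```c
#include <stdbool.h>
#include <stdint.h>

enum seg { SEG_ES, SEG_CS, SEG_SS, SEG_DS, SEG_COUNT };

/* Effective address plus the segment it is relative to. */
struct segmented_address {
    uint64_t ea;
    enum seg seg;
};

/* Hypothetical cached descriptor: base and limit only. */
struct seg_desc {
    uint64_t base;
    uint32_t limit;
};

/*
 * Check the access against the segment limit, then compute the linear
 * address. Adding the base up front would lose the information needed
 * for this check. Expand-down segments and long mode are ignored here.
 */
static bool linearize(const struct seg_desc segs[SEG_COUNT],
                      struct segmented_address addr, uint32_t size,
                      uint64_t *linear)
{
    const struct seg_desc *d = &segs[addr.seg];

    if (size == 0 || addr.ea > d->limit || size - 1 > d->limit - addr.ea)
        return false;   /* the caller would raise a fault here */
    *linear = d->base + addr.ea;
    return true;
}
```

With a limit of 0xff, a 4-byte access at offset 0xfc passes, while the same access at 0xfd is rejected because its last byte would fall past the limit.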
* KVM: x86 emulator: drop DPRINTF()Avi Kivity2011-01-121-6/+1
| | | | | | | Failed emulation is reported via a tracepoint; the cmps printk is pointless. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
* KVM: x86 emulator: drop unused #ifndef __KERNEL__Avi Kivity2011-01-121-7/+0
| | | | | Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
* KVM: VMX: Inform user about INTEL_TXT dependencyShane Wang2011-01-121-1/+4
| | | | | | | | | Inform user to either disable TXT in the BIOS or do TXT launch with tboot before enabling KVM since some BIOSes do not set FEATURE_CONTROL_VMXON_ENABLED_OUTSIDE_SMX bit when TXT is enabled. Signed-off-by: Shane Wang <shane.wang@intel.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
* KVM: rename hardware_[dis|en]able() to *_nolock() and add locking wrappersTakuya Yoshikawa2011-01-121-12/+22
| | | | | | | | | | | | The naming convention of the hardware_[dis|en]able family is a little bit confusing because only hardware_[dis|en]able_all are using the _nolock suffix. Renaming the current hardware_[dis|en]able() to *_nolock() and using hardware_[dis|en]able() as wrapper functions which take kvm_lock for them reduces extra confusion. Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
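The naming scheme can be shown with a user-space analogy. This sketch uses a pthread mutex and a hypothetical `usage_count` in place of the kernel's lock and real hardware state; only the function names mirror the commit:

```c
#include <pthread.h>

/* Stand-ins for the kernel's kvm_lock and the state it protects. */
static pthread_mutex_t kvm_lock = PTHREAD_MUTEX_INITIALIZER;
static int usage_count;         /* hypothetical; protected by kvm_lock */

/* *_nolock(): the caller must already hold kvm_lock. */
static void hardware_enable_nolock(void)
{
    usage_count++;
}

static void hardware_disable_nolock(void)
{
    usage_count--;
}

/* Plain names: wrappers that take the lock around the _nolock body. */
static void hardware_enable(void)
{
    pthread_mutex_lock(&kvm_lock);
    hardware_enable_nolock();
    pthread_mutex_unlock(&kvm_lock);
}

static void hardware_disable(void)
{
    pthread_mutex_lock(&kvm_lock);
    hardware_disable_nolock();
    pthread_mutex_unlock(&kvm_lock);
}
```

A caller that already holds the lock uses the _nolock variant directly; everyone else calls the wrapper, so the suffix states the locking contract at each call site.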
* KVM: take kvm_lock for hardware_disable() during cpu hotplugTakuya Yoshikawa2011-01-121-0/+2
| | | | | | | | In kvm_cpu_hotplug(), only CPU_STARTING case is protected by kvm_lock. This patch adds missing protection for CPU_DYING case. Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
* KVM: MMU: don't mark spte notrap if reserved bit setXiao Guangrong2011-01-121-6/+11
| | | | | | | | If a reserved bit is set, we need to inject the #PF with PFEC.RSVD=1, but shadow_notrap_nonpresent_pte injects #PF with PFEC.RSVD=0 only. Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
* KVM: Document device assigment APIJan Kiszka2011-01-121-0/+178
| | | | | | | | | | | Adds API documentation for KVM_[DE]ASSIGN_PCI_DEVICE, KVM_[DE]ASSIGN_DEV_IRQ, KVM_SET_GSI_ROUTING, KVM_ASSIGN_SET_MSIX_NR, and KVM_ASSIGN_SET_MSIX_ENTRY. Acked-by: Alex Williamson <alex.williamson@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
* KVM: Clean up kvm_vm_ioctl_assigned_deviceJan Kiszka2011-01-121-5/+4
| | | | | | | | | | | | Any arch not supporting device assignment will also not build assigned-dev.c. So testing for KVM_CAP_DEVICE_DEASSIGNMENT is pointless. KVM_CAP_ASSIGN_DEV_IRQ is unconditionally set. Moreover, add a default case for dispatching the ioctl. Acked-by: Alex Williamson <alex.williamson@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
* KVM: Save/restore state of assigned PCI deviceJan Kiszka2011-01-121-1/+4
| | | | | | | | | | The guest may change states that pci_reset_function does not touch. So we better save/restore the assigned device across guest usage. Acked-by: Alex Williamson <alex.williamson@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
* KVM: Refactor IRQ names of assigned devicesJan Kiszka2011-01-122-5/+7
| | | | | | | | | Cosmetic change, but it helps to correlate IRQs with PCI devices. Acked-by: Alex Williamson <alex.williamson@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
* KVM: Switch assigned device IRQ forwarding to threaded handlerJan Kiszka2011-01-122-83/+36
| | | | | | | | | | | | | | | | This improves the IRQ forwarding for assigned devices: By using the kernel's threaded IRQ scheme, we can get rid of the latency-prone work queue and simplify the code in the same run. Moreover, we no longer have to hold assigned_dev_lock while raising the guest IRQ, which can be a lengthy operation as we may have to iterate over all VCPUs. The lock is now only used for synchronizing masking vs. unmasking of INTx-type IRQs, and is thus renamed to intx_lock. Acked-by: Alex Williamson <alex.williamson@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
* KVM: Clear assigned guest IRQ on releaseJan Kiszka2011-01-121-0/+3
| | | | | | | | | | | When we deassign a guest IRQ, clear the potentially asserted guest line. There might be no chance for the guest to do this, specifically if we switch from INTx to MSI mode. Acked-by: Alex Williamson <alex.williamson@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
* KVM: Mask KVM_GET_SUPPORTED_CPUID data with Linux cpuid infoAvi Kivity2011-01-121-0/+9
| | | | | | | | This allows Linux to mask cpuid bits if, for example, nx is enabled on only some cpus. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
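The masking step amounts to a bitwise AND of three sources. The helper below is a hypothetical illustration, not KVM's code; the only hardware fact it relies on is that NX is reported in bit 20 of CPUID.80000001H:EDX:

```c
#include <stdint.h>

/* NX is bit 20 of CPUID.80000001H:EDX. */
#define FEATURE_NX (1u << 20)

/*
 * Hypothetical helper: a feature bit is reported to userspace only if
 * the raw CPUID value has it, KVM can virtualize it, and Linux has not
 * cleared it in its own view of the CPU capabilities.
 */
static uint32_t supported_cpuid_bits(uint32_t raw_cpuid,
                                     uint32_t kvm_supported,
                                     uint32_t linux_enabled)
{
    return raw_cpuid & kvm_supported & linux_enabled;
}
```

If Linux has cleared NX because only some CPUs enable it, the corresponding bit drops out of the result even though the raw CPUID value still advertises it.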
* KVM: SVM: Replace svm_has() by standard Linux cpuid accessorsAvi Kivity2011-01-121-10/+5
| | | | | | | | | | Instead of querying cpuid directly, use the Linux accessors (boot_cpu_has, etc.). This allows things like the clearcpuid kernel command line option to work (when it's fixed wrt scattered cpuid bits). Acked-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
* KVM: MMU: fix apf prefault if nested guest is enabledXiao Guangrong2011-01-123-1/+4
| | | | | | | | If apf is generated in L2 guest and is completed in L1 guest, it will prefault this apf in L1 guest's mmu context. Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: MMU: support apf for nonpaing guestXiao Guangrong2011-01-121-3/+9
| | | | | | | Let's support apf for nonpaging guests. Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: MMU: clear apfs if page state is changedXiao Guangrong2011-01-121-0/+3
| | | | | | | | | | | | | If CR0.PG is changed, the page fault cannot be avoided when the prefault address is accessed later. This also fixes a bug: it can retry a page-enabled #PF in a page-disabled context if the mmu is a shadow-page mmu. This idea is from Gleb Natapov. Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: MMU: fix missing post sync auditXiao Guangrong2011-01-121-0/+1
| | | | | | | Add AUDIT_POST_SYNC audit for long mode shadow page Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: Clean up vm creation and releaseJan Kiszka2011-01-127-72/+49
| | | | | | | | | | | | IA64 support forces us to abstract the allocation of the kvm structure. But instead of mixing this up with arch-specific initialization and doing the same on destruction, split both steps. This allows moving generic destruction calls into generic code. It also fixes error clean-up on failures of kvm_create_vm for IA64. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: x86: Makefile clean upTracey Dent2011-01-121-1/+1
| | | | | | | Changed makefile to use the ccflags-y option instead of EXTRA_CFLAGS. Signed-off-by: Tracey Dent <tdent48227@gmail.com> Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: remove unused function declarationXiao Guangrong2011-01-121-1/+0
| | | | | | | Remove the declaration of kvm_mmu_set_base_ptes() Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: Refactor srcu struct release on early errorsJan Kiszka2011-01-121-8/+6
| | | | | Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: VMX: Disallow NMI while blocked by STIAvi Kivity2011-01-121-1/+6
| | | | | | | | | | | | While not mandated by the spec, Linux relies on NMI being blocked by an IF-enabling STI. VMX also refuses to enter a guest in this state, at least on some implementations. Disallow NMI while blocked by STI by checking for the condition, and requesting an interrupt window exit if it occurs. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
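The check itself can be sketched as a test on the guest interruptibility state. The bit positions follow the VMX interruptibility-state field (blocking by STI is bit 0, by MOV SS bit 1, by NMI bit 3); the function name and shape are assumptions, and the sketch leaves out the interrupt-window request that the commit also mentions:

```c
#include <stdbool.h>
#include <stdint.h>

/* Bits of the VMX guest interruptibility-state field. */
#define GUEST_INTR_STATE_STI    0x00000001u
#define GUEST_INTR_STATE_MOV_SS 0x00000002u
#define GUEST_INTR_STATE_NMI    0x00000008u

/*
 * Hypothetical predicate: an NMI may be injected only if the guest is
 * not in an STI shadow, a MOV SS shadow, or already blocking NMIs.
 */
static bool nmi_allowed(uint32_t interruptibility)
{
    return !(interruptibility & (GUEST_INTR_STATE_STI |
                                 GUEST_INTR_STATE_MOV_SS |
                                 GUEST_INTR_STATE_NMI));
}
```

When the predicate is false because of STI blocking, the caller would defer the NMI and ask for an exit once the blocking has ended.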
* KVM: fix the race while wakeup all pv guestXiao Guangrong2011-01-121-1/+4
| | | | | | | | | | | | In kvm_async_pf_wakeup_all(), we add a dummy apf to vcpu->async_pf.done without holding vcpu->async_pf.lock, it will break if we are handling apfs at this time. Also use 'list_empty_careful()' instead of 'list_empty()' Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Acked-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
* KVM: handle more completed apfs if possibleXiao Guangrong2011-01-121-16/+16
| | | | | | | | | | If there is no need to inject an async #PF into the PV guest, we can handle more completed apfs at one time, so we can retry the guest #PF as early as possible. Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Acked-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
* KVM: avoid unnecessary wait for a async pfXiao Guangrong2011-01-121-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | In current code, it checks async pf completion out of the wait context, like this: if (vcpu->arch.mp_state == KVM_MP_STATE_RUNNABLE && !vcpu->arch.apf.halted) r = vcpu_enter_guest(vcpu); else { ...... kvm_vcpu_block(vcpu) ^- waiting until 'async_pf.done' is not empty } kvm_check_async_pf_completion(vcpu) ^- delete list from async_pf.done So, if we check async pf completion first, it can be blocked at kvm_vcpu_block. Fix this by marking the vcpu as unhalted in the kvm_check_async_pf_completion() path. Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Acked-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
* KVM: fix searching async gfn in kvm_async_pf_gfn_slotXiao Guangrong2011-01-121-2/+2
| | | | | | | | Don't search later slots if the slot is empty Acked-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
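The rule "stop at an empty slot" is the usual termination condition for an open-addressing lookup. The sketch below uses a hypothetical table size, hash function and empty marker; it is not the kernel's kvm_async_pf_gfn_slot():

```c
#include <stdint.h>

#define NSLOTS 8                    /* illustrative; must be a power of two */
#define EMPTY  (~(uint64_t)0)       /* assumed marker for an unused slot */

/* Hypothetical hash: low bits of the gfn. */
static unsigned hash_gfn(uint64_t gfn)
{
    return (unsigned)(gfn & (NSLOTS - 1));
}

/* Linear probing: advance to the next slot, wrapping around. */
static unsigned next_probe(unsigned key)
{
    return (key + 1) & (NSLOTS - 1);
}

/*
 * Return the slot holding gfn, or the first empty slot on its probe
 * sequence. An empty slot ends the search: if gfn were in the table,
 * it would have been stored no later than that slot.
 */
static unsigned gfn_slot(const uint64_t gfns[NSLOTS], uint64_t gfn)
{
    unsigned key = hash_gfn(gfn);

    for (unsigned i = 0;
         i < NSLOTS && gfns[key] != gfn && gfns[key] != EMPTY;
         i++)
        key = next_probe(key);
    return key;
}
```

Continuing past an empty slot could land on an unrelated entry further along, which is the kind of mistake the commit's one-line description points at.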
* KVM: cleanup async_pf tracepointsXiao Guangrong2011-01-121-41/+35
| | | | | | | | Use 'DECLARE_EVENT_CLASS' to cleanup async_pf tracepoints Acked-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
* KVM: fix tracing kvm_try_async_get_pageXiao Guangrong2011-01-122-6/+8
| | | | | | | | | | | Tracing 'async' and '*pfn' is useless, since 'async' is always true, and '*pfn' is always 'fault_pfn'. We can trace 'gva' and 'gfn' instead, which helps us see the life-cycle of an async_pf. Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
* KVM: replace vmalloc and memset with vzallocTakuya Yoshikawa2011-01-122-10/+3
| | | | | | | | Let's use newly introduced vzalloc(). Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp> Signed-off-by: Jesper Juhl <jj@chaosbits.net> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
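A user-space analogy of the same cleanup, assuming malloc()/memset() as stand-ins for vmalloc()/memset() and calloc() as a stand-in for vzalloc(); the helper names are hypothetical:

```c
#include <stdlib.h>
#include <string.h>

/* Before: allocate, then zero in a second step. */
static void *zalloc_two_step(size_t n)
{
    void *p = malloc(n);

    if (p)
        memset(p, 0, n);
    return p;
}

/* After: a single call that returns zeroed memory. */
static void *zalloc_one_step(size_t n)
{
    return calloc(1, n);
}
```

Both helpers return zero-filled memory or NULL; the one-step form removes the separate NULL check and memset() at every call site.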
* KVM: handle exit due to INVD in VMXGleb Natapov2011-01-122-0/+7
| | | | | | | | | | Currently the exit is unhandled, so the guest halts with an error if it tries to execute the INVD instruction. Call into the emulator when the INVD instruction is executed by a guest instead. This instruction is not needed by ordinary guests, but firmware (like OpenBIOS) uses it and fails. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
* KVM: x86: Avoid issuing wbinvd twiceJan Kiszka2011-01-121-4/+6
| | | | | | | | | | Micro optimization to avoid calling wbinvd twice on the CPU that has to emulate it. As we might be preempted between smp_call_function_many and the local wbinvd, the cache might be filled again so that real work could be done uselessly. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>