diff options
author | Linus Torvalds <torvalds@linux-foundation.org> | 2012-02-18 12:56:35 -0800 |
---|---|---|
committer | Linus Torvalds <torvalds@linux-foundation.org> | 2012-02-18 14:03:48 -0800 |
commit | 34ddc81a230b15c0e345b6b253049db731499f7e (patch) | |
tree | 0c3afd68071ec1a8a1d8724ef9a42ef845ecf402 /arch/x86/kernel/process_64.c | |
parent | f94edacf998516ac9d849f7bc6949a703977a7f3 (diff) | |
download | kernel_goldelico_gta04-34ddc81a230b15c0e345b6b253049db731499f7e.zip kernel_goldelico_gta04-34ddc81a230b15c0e345b6b253049db731499f7e.tar.gz kernel_goldelico_gta04-34ddc81a230b15c0e345b6b253049db731499f7e.tar.bz2 |
i387: re-introduce FPU state preloading at context switch time
After all the FPU state cleanups and finally finding the problem that
caused all our FPU save/restore problems, this re-introduces the
preloading of FPU state that was removed in commit b3b0870ef3ff ("i387:
do not preload FPU state at task switch time").
However, instead of simply reverting the removal, this reimplements
preloading with several fixes, most notably
- properly abstracted as a true FPU state switch, rather than as
open-coded save and restore with various hacks.
In particular, implementing it as a proper FPU state switch allows us
to optimize the CR0.TS flag accesses: there is no reason to set the
TS bit only to then almost immediately clear it again. CR0 accesses
are quite slow and expensive, don't flip the bit back and forth for
no good reason.
- Make sure that the same model works for both x86-32 and x86-64, so
that there are no gratuitous differences between the two due to the
way they save and restore segment state differently due to
architectural differences that really don't matter to the FPU state.
- Avoid exposing the "preload" state to the context switch routines,
and in particular allow the concept of lazy state restore: if nothing
else has used the FPU in the meantime, and the process is still on
the same CPU, we can avoid restoring state from memory entirely, just
re-expose the state that is still in the FPU unit.
That optimized lazy restore isn't actually implemented here, but the
infrastructure is set up for it. Of course, older CPU's that use
'fnsave' to save the state cannot take advantage of this, since the
state saving also trashes the state.
In other words, there is now an actual _design_ to the FPU state saving,
rather than just random historical baggage. Hopefully it's easier to
follow as a result.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Diffstat (limited to 'arch/x86/kernel/process_64.c')
-rw-r--r-- | arch/x86/kernel/process_64.c | 5 |
1 files changed, 4 insertions, 1 deletions
diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c index 753e803..1fd94bc 100644 --- a/arch/x86/kernel/process_64.c +++ b/arch/x86/kernel/process_64.c @@ -386,8 +386,9 @@ __switch_to(struct task_struct *prev_p, struct task_struct *next_p) int cpu = smp_processor_id(); struct tss_struct *tss = &per_cpu(init_tss, cpu); unsigned fsindex, gsindex; + fpu_switch_t fpu; - __unlazy_fpu(prev_p); + fpu = switch_fpu_prepare(prev_p, next_p); /* * Reload esp0, LDT and the page table pointer: @@ -457,6 +458,8 @@ __switch_to(struct task_struct *prev_p, struct task_struct *next_p) wrmsrl(MSR_KERNEL_GS_BASE, next->gs); prev->gsindex = gsindex; + switch_fpu_finish(next_p, fpu); + /* * Switch the PDA and FPU contexts. */ |