diff options
author | Chen, Kenneth W <kenneth.w.chen@intel.com> | 2006-10-13 10:08:13 -0700 |
---|---|---|
committer | Tony Luck <tony.luck@intel.com> | 2007-02-06 15:04:48 -0800 |
commit | 00b65985fb2fc542b855b03fcda0d0f2bab4f442 (patch) | |
tree | dc9372aced10184945862b9adf0848da3e0e946f /arch/ia64/mm | |
parent | a0776ec8e97bf109e7d973d09fc3e1814eb32bfb (diff) | |
download | kernel_samsung_smdk4412-00b65985fb2fc542b855b03fcda0d0f2bab4f442.zip kernel_samsung_smdk4412-00b65985fb2fc542b855b03fcda0d0f2bab4f442.tar.gz kernel_samsung_smdk4412-00b65985fb2fc542b855b03fcda0d0f2bab4f442.tar.bz2 |
[IA64] relax per-cpu TLB requirement to DTC
Instead of pinning per-cpu TLB into a DTR, use DTC. This will free up
one TLB entry for application, or even kernel if access pattern to
per-cpu data area has high temporal locality.
Since per-cpu is mapped at the top of region 7 address, we just need to
add special case in alt_dtlb_miss. The physical address of per-cpu data
is already conveniently stored in IA64_KR(PER_CPU_DATA). Latency for
alt_dtlb_miss is not affected as we can hide all the latency. It was
measured that alt_dtlb_miss handler has 23 cycles latency before and
after the patch.
The performance effect is massive for applications that put lots of tlb
pressure on CPU. Workload environment like database online transaction
processing or application uses tera-byte of memory would benefit the most.
Measurement with industry standard database benchmark shown an upward
of 1.6% gain. While smaller workloads like cpu, java also showing small
improvement.
Signed-off-by: Ken Chen <kenneth.w.chen@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Diffstat (limited to 'arch/ia64/mm')
-rw-r--r-- | arch/ia64/mm/init.c | 11 |
1 files changed, 1 insertions, 10 deletions
diff --git a/arch/ia64/mm/init.c b/arch/ia64/mm/init.c index 1373fae..07d82cd 100644 --- a/arch/ia64/mm/init.c +++ b/arch/ia64/mm/init.c @@ -337,7 +337,7 @@ setup_gate (void) void __devinit ia64_mmu_init (void *my_cpu_data) { - unsigned long psr, pta, impl_va_bits; + unsigned long pta, impl_va_bits; extern void __devinit tlb_init (void); #ifdef CONFIG_DISABLE_VHPT @@ -346,15 +346,6 @@ ia64_mmu_init (void *my_cpu_data) # define VHPT_ENABLE_BIT 1 #endif - /* Pin mapping for percpu area into TLB */ - psr = ia64_clear_ic(); - ia64_itr(0x2, IA64_TR_PERCPU_DATA, PERCPU_ADDR, - pte_val(pfn_pte(__pa(my_cpu_data) >> PAGE_SHIFT, PAGE_KERNEL)), - PERCPU_PAGE_SHIFT); - - ia64_set_psr(psr); - ia64_srlz_i(); - /* * Check if the virtually mapped linear page table (VMLPT) overlaps with a mapped * address space. The IA-64 architecture guarantees that at least 50 bits of |