aboutsummaryrefslogtreecommitdiffstats
path: root/arch/ia64/mm
Commit message (Collapse)AuthorAgeFilesLines
...
* Introduce flags for reserve_bootmem()Bernhard Walle2008-02-072-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | This patchset adds a flags variable to reserve_bootmem() and uses the BOOTMEM_EXCLUSIVE flag in crashkernel reservation code to detect collisions between crashkernel area and already used memory. This patch: Change the reserve_bootmem() function to accept a new flag BOOTMEM_EXCLUSIVE. If that flag is set, the function returns with -EBUSY if the memory already has been reserved in the past. This is to avoid conflicts. Because that code runs before SMP initialisation, there's no race condition inside reserve_bootmem_core(). [akpm@linux-foundation.org: coding-style fixes] [akpm@linux-foundation.org: fix powerpc build] Signed-off-by: Bernhard Walle <bwalle@suse.de> Cc: <linux-arch@vger.kernel.org> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Vivek Goyal <vgoyal@in.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* [IA64] honor notify_die() returning NOTIFY_STOPJan Beulich2008-02-051-3/+5
| | | | | | | | | This requires making die() and die_if_kernel() return a value, and their callers to honor this (and be prepared that it returns). Signed-off-by: Jan Beulich <jbeulich@novell.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Tony Luck <tony.luck@intel.com>
* [IA64] Avoid unnecessary TLB flushes when allocating memoryde Dinechin, Christophe (Integrity VM)2007-12-181-3/+15
| | | | | | | | | | | | | | | | | | | Improve performance of memory allocations on ia64 by avoiding a global TLB purge to purge a single page from the file cache. This happens whenever we evict a page from the buffer cache to make room for some other allocation. Test case: Run 'find /usr -type f | xargs cat > /dev/null' in the background to fill the buffer cache, then run something that uses memory, e.g. 'gmake -j50 install'. Instrumentation showed that the number of global TLB purges went from a few millions down to about 170 over a 12 hours run of the above. The performance impact is particularly noticeable under virtualization, because a virtual TLB is generally both larger and slower to purge than a physical one. Signed-off-by: Christophe de Dinechin <ddd@hp.com> Signed-off-by: Tony Luck <tony.luck@intel.com>
* [IA64] Add missing "space" to concatenated stringsJoe Perches2007-12-071-1/+1
| | | | | Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: Tony Luck <tony.luck@intel.com>
* [IA64] Fix section mismatch in contig.c version of per_cpu_init()Tony Luck2007-11-061-33/+41
| | | | | | | | | | | | | | | | | | | | | There is a section mismatch when building CONFIG_FLATMEM=y kernels that also have CONFIG_HOTPLUG_CPU=y WARNING: vmlinux.o(.text+0x5a902): Section mismatch: reference to \ .init.text:__alloc_bootmem (between 'per_cpu_init' and 'count_pages') The issue occurs because per_cpu_init() in mm/contig.c is marked __cpuinit (which is #define'd to nothing on a hot plug cpu configuration) call __alloc_bootmem() (which is an __init function). The usage is actually safe because the __alloc_bootmem() is inside an "if (first_time)" test so that the call is only made while it is still legal to do so. But the warning is irritating. Move the allocation to find_memory(). Signed-off-by: Tony Luck <tony.luck@intel.com>
* [IA64] ia64/mm/init.c: fix section mismatchesAdrian Bunk2007-10-291-2/+2
| | | | | | | | | | | | | | | | This patch fixes the following section mismatches: <-- snip --> ... WARNING: vmlinux.o(.text+0x5b5c2): Section mismatch: reference to .init.text:memmap_init_zone (between 'memmap_init' and 'virtual_memmap_init') WARNING: vmlinux.o(.text+0x5b842): Section mismatch: reference to .init.text:memmap_init_zone (between 'virtual_memmap_init' and 'ia64_mmu_init') ... <-- snip --> Signed-off-by: Adrian Bunk <bunk@kernel.org> Signed-off-by: Tony Luck <tony.luck@intel.com>
* pid namespaces: define is_global_init() and is_container_init()Serge E. Hallyn2007-10-191-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | is_init() is an ambiguous name for the pid==1 check. Split it into is_global_init() and is_container_init(). A cgroup init has it's tsk->pid == 1. A global init also has it's tsk->pid == 1 and it's active pid namespace is the init_pid_ns. But rather than check the active pid namespace, compare the task structure with 'init_pid_ns.child_reaper', which is initialized during boot to the /sbin/init process and never changes. Changelog: 2.6.22-rc4-mm2-pidns1: - Use 'init_pid_ns.child_reaper' to determine if a given task is the global init (/sbin/init) process. This would improve performance and remove dependence on the task_pid(). 2.6.21-mm2-pidns2: - [Sukadev Bhattiprolu] Changed is_container_init() calls in {powerpc, ppc,avr32}/traps.c for the _exception() call to is_global_init(). This way, we kill only the cgroup if the cgroup's init has a bug rather than force a kernel panic. [akpm@linux-foundation.org: fix comment] [sukadev@us.ibm.com: Use is_global_init() in arch/m32r/mm/fault.c] [bunk@stusta.de: kernel/pid.c: remove unused exports] [sukadev@us.ibm.com: Fix capability.c to work with threaded init] Signed-off-by: Serge E. Hallyn <serue@us.ibm.com> Signed-off-by: Sukadev Bhattiprolu <sukadev@us.ibm.com> Acked-by: Pavel Emelianov <xemul@openvz.org> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Cedric Le Goater <clg@fr.ibm.com> Cc: Dave Hansen <haveblue@us.ibm.com> Cc: Herbert Poetzel <herbert@13thfloor.at> Cc: Kirill Korotaev <dev@sw.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* setup vma->vm_page_prot by vm_get_page_prot()Coly Li2007-10-191-1/+1
| | | | | | | | | | | | | This patch uses vm_get_page_prot() to setup vma->vm_page_prot. Though inside vm_get_page_prot() the protection flags is AND with (VM_READ|VM_WRITE|VM_EXEC|VM_SHARED), it does not hurt correct code. Signed-off-by: Coly Li <coyli@suse.de> Cc: Hugh Dickins <hugh@veritas.com> Cc: Tony Luck <tony.luck@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* Add vmcoreinfoKen'ichi Ohmichi2007-10-171-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch set frees the restriction that makedumpfile users should install a vmlinux file (including the debugging information) into each system. makedumpfile command is the dump filtering feature for kdump. It creates a small dumpfile by filtering unnecessary pages for the analysis. To distinguish unnecessary pages, it needs a vmlinux file including the debugging information. These days, the debugging package becomes a huge file, and it is hard to install it into each system. To solve the problem, kdump developers discussed it at lkml and kexec-ml. As the result, we reached the conclusion that necessary information for dump filtering (called "vmcoreinfo") should be embedded into the first kernel file and it should be accessed through /proc/vmcore during the second kernel. (http://www.uwsg.iu.edu/hypermail/linux/kernel/0707.0/1806.html) Dan Aloni created the patch set for the above implementation. (http://www.uwsg.iu.edu/hypermail/linux/kernel/0707.1/1053.html) And I updated it for multi architectures and memory models. (http://lists.infradead.org/pipermail/kexec/2007-August/000479.html) Signed-off-by: Dan Aloni <da-x@monatomic.org> Signed-off-by: Ken'ichi Ohmichi <oomichi@mxs.nes.nec.co.jp> Signed-off-by: Bernhard Walle <bwalle@suse.de> Signed-off-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* fix memory hot remove not configured case.KAMEZAWA Hiroyuki2007-10-161-1/+2
| | | | | | | | | | | | | | | | | | | | | | Now, arch dependent code around CONFIG_MEMORY_HOTREMOVE is a mess. This patch cleans up them. This is against 2.6.23-rc6-mm1. - fix compile failure on ia64/ CONFIG_MEMORY_HOTPLUG && !CONFIG_MEMORY_HOTREMOVE case. - For !CONFIG_MEMORY_HOTREMOVE, add generic no-op remove_memory(), which returns -EINVAL. - removed remove_pages() only used in powerpc. - removed no-op remove_memory() in i386, sh, sparc64, x86_64. - only powerpc returns -ENOSYS at memory hot remove(no-op). changes it to return -EINVAL. Note: Currently, only ia64 supports CONFIG_MEMORY_HOTREMOVE. I welcome other archs if there are requirements and testers. Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* memory unplug: ia64 interfaceKAMEZAWA Hiroyuki2007-10-161-1/+11
| | | | | | | | IA64 memory unplug interface. Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* Do not depend on MAX_ORDER when grouping pages by mobilityMel Gorman2007-10-161-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently mobility grouping works at the MAX_ORDER_NR_PAGES level. This makes sense for the majority of users where this is also the huge page size. However, on platforms like ia64 where the huge page size is runtime configurable it is desirable to group at a lower order. On x86_64 and occasionally on x86, the hugepage size may not always be MAX_ORDER_NR_PAGES. This patch groups pages together based on the value of HUGETLB_PAGE_ORDER. It uses a compile-time constant if possible and a variable where the huge page size is runtime configurable. It is assumed that grouping should be done at the lowest sensible order and that the user would not want to override this. If this is not true, page_block order could be forced to a variable initialised via a boot-time kernel parameter. One potential issue with this patch is that IA64 now parses hugepagesz with early_param() instead of __setup(). __setup() is called after the memory allocator has been initialised and the pageblock bitmaps already setup. In tests on one IA64 there did not seem to be any problem with using early_param() and in fact may be more correct as it guarantees the parameter is handled before the parsing of hugepages=. Signed-off-by: Mel Gorman <mel@csn.ul.ie> Acked-by: Andy Whitcroft <apw@shadowen.org> Acked-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* flush icache before set_pte() on ia64: flush icache at set_pteKAMEZAWA Hiroyuki2007-10-161-4/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Current ia64 kernel flushes icache by lazy_mmu_prot_update() *after* set_pte(). This is too late. This patch removes lazy_mmu_prot_update and add modfied set_pte() for flushing if necessary. This patch flush icache of a page when new pte has exec bit. && new pte has present bit && new pte is user's page. && (old *ptep is not present || new pte's pfn is not same to old *ptep's ptn) && new pte's page has no Pg_arch_1 bit. Pg_arch_1 is set when a page is cache consistent. I think this condition checks are much easier to understand than considering "Where sync_icache_dcache() should be inserted ?". pte_user() for ia64 was removed by http://lkml.org/lkml/2007/6/12/67 as clean-up. So, I added it again. Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: "Luck, Tony" <tony.luck@intel.com> Cc: Christoph Lameter <clameter@sgi.com> Cc: Hugh Dickins <hugh@veritas.com> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* During VM oom condition, kill all threads in process groupWill Schmidt2007-10-161-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We have had complaints where a threaded application is left in a bad state after one of it's threads is killed when we hit a VM: out_of_memory condition. Killing just one of the process threads can leave the application in a bad state, whereas killing the entire process group would allow for the application to restart, or be otherwise handled, and makes it very obvious that something has gone wrong. This change allows the entire process group to be taken down, rather than just the one thread. Signed-off-by: Will Schmidt <will_schmidt@vnet.ibm.com> Cc: Richard Henderson <rth@twiddle.net> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: Russell King <rmk@arm.linux.org.uk> Cc: Ian Molton <spyro@f2s.com> Cc: Haavard Skinnemoen <hskinnemoen@atmel.com> Cc: Mikael Starvik <starvik@axis.com> Cc: David Howells <dhowells@redhat.com> Cc: Andi Kleen <ak@suse.de> Cc: "Luck, Tony" <tony.luck@intel.com> Cc: Hirokazu Takata <takata@linux-m32r.org> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Roman Zippel <zippel@linux-m68k.org> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Kyle McMartin <kyle@mcmartin.ca> Cc: Matthew Wilcox <willy@debian.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Paul Mundt <lethal@linux-sh.org> Cc: Kazumoto Kojima <kkojima@rr.iij4u.or.jp> Cc: Richard Curnow <rc@rc0.org.uk> Cc: William Lee Irwin III <wli@holomorphy.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Chris Zankel <chris@zankel.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* IA64: SPARSEMEM_VMEMMAP 16K page size supportChristoph Lameter2007-10-161-0/+8
| | | | | | | | | | | | | | | | | | | | Equip IA64 sparsemem with a virtual memmap. This is similar to the existing CONFIG_VIRTUAL_MEM_MAP functionality for DISCONTIGMEM. It uses a PAGE_SIZE mapping. This is provided as a minimally intrusive solution. We split the 128TB VMALLOC area into two 64TB areas and use one for the virtual memmap. This should replace CONFIG_VIRTUAL_MEM_MAP long term. [apw@shadowen.org: convert to new helper based initialisation] Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andy Whitcroft <apw@shadowen.org> Acked-by: Mel Gorman <mel@csn.ul.ie> Cc: "Luck, Tony" <tony.luck@intel.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* Merge branch 'release' of ↵Linus Torvalds2007-09-042-0/+6
|\ | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux-2.6
| * [IA64] Stop bogus NMI & softlockup warnings in ia64 show_memPrarit Bhargava2007-09-012-0/+6
| | | | | | | | | | | | | | | | | | | | | | When dumping memory via sysrq-m it is possible to take a bogus NMI watchdog or softlockup watchdog because the dump can take a long time on big memory systems. Occasionally tickle the watchdog when doing the dump. Signed-off-by: Prarit Bhargava <prarit@redhat.com> Signed-off-by: Tony Luck <tony.luck@intel.com>
* | hugepage: fix broken check for offset alignment in hugepage mappingsDavid Gibson2007-08-311-4/+2
|/ | | | | | | | | | | | | | | | | | | | | | | | | | | | For hugepage mappings, the file offset, like the address and size, needs to be aligned to the size of a hugepage. In commit 68589bc353037f233fe510ad9ff432338c95db66, the check for this was moved into prepare_hugepage_range() along with the address and size checks. But since BenH's rework of the get_unmapped_area() paths leading up to commit 4b1d89290b62bb2db476c94c82cf7442aab440c8, prepare_hugepage_range() is only called for MAP_FIXED mappings, not for other mappings. This means we're no longer ever checking for an aligned offset - I've confirmed that mmap() will (apparently) succeed with a misaligned offset on both powerpc and i386 at least. This patch restores the check, removing it from prepare_hugepage_range() and putting it back into hugetlbfs_file_mmap(). I'm putting it there, rather than in the get_unmapped_area() path so it only needs to go in one place, than separately in the half-dozen or so arch-specific implementations of hugetlb_get_unmapped_area(). Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Cc: Adam Litke <agl@us.ibm.com> Cc: Andi Kleen <ak@suse.de> Cc: "David S. Miller" <davem@davemloft.net> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* [IA64] Failure to grow RBSAndrew Burgess2007-08-161-3/+11
| | | | | | | | | | | | | | | There is a bug in the ia64_do_page_fault code that can cause a failure to grow the register backing store, or any mapping that is marked as VM_GROWSUP if the mapping is the highest mapped area of memory. When the address accessed is below the first mapping the previous mapping is returned as NULL, and this case is handled. However, when the address accessed is above the highest mapping the vma returned is NULL, this case is not handled correctly, and it fails to spot that this access might require an existing mapping to grow upwards. Signed-off-by: Andrew Burgess <andrew@transitive.com> Signed-off-by: Tony Luck <tony.luck@intel.com>
* mm: fault feedback #2Nick Piggin2007-07-191-13/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch completes Linus's wish that the fault return codes be made into bit flags, which I agree makes everything nicer. This requires requires all handle_mm_fault callers to be modified (possibly the modifications should go further and do things like fault accounting in handle_mm_fault -- however that would be for another patch). [akpm@linux-foundation.org: fix alpha build] [akpm@linux-foundation.org: fix s390 build] [akpm@linux-foundation.org: fix sparc build] [akpm@linux-foundation.org: fix sparc64 build] [akpm@linux-foundation.org: fix ia64 build] Signed-off-by: Nick Piggin <npiggin@suse.de> Cc: Richard Henderson <rth@twiddle.net> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: Russell King <rmk@arm.linux.org.uk> Cc: Ian Molton <spyro@f2s.com> Cc: Bryan Wu <bryan.wu@analog.com> Cc: Mikael Starvik <starvik@axis.com> Cc: David Howells <dhowells@redhat.com> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Cc: "Luck, Tony" <tony.luck@intel.com> Cc: Hirokazu Takata <takata@linux-m32r.org> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Roman Zippel <zippel@linux-m68k.org> Cc: Greg Ungerer <gerg@uclinux.org> Cc: Matthew Wilcox <willy@debian.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Paul Mundt <lethal@linux-sh.org> Cc: Kazumoto Kojima <kkojima@rr.iij4u.or.jp> Cc: Richard Curnow <rc@rc0.org.uk> Cc: William Lee Irwin III <wli@holomorphy.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Jeff Dike <jdike@addtoit.com> Cc: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it> Cc: Miles Bader <uclinux-v850@lsi.nec.co.jp> Cc: Chris Zankel <chris@zankel.net> Acked-by: Kyle McMartin <kyle@mcmartin.ca> Acked-by: Haavard Skinnemoen <hskinnemoen@atmel.com> Acked-by: Ralf Baechle <ralf@linux-mips.org> Acked-by: Andi Kleen <ak@muc.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> [ Still apparently needs some ARM and PPC loving - Linus ] Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* [IA64] silence GCC ia64 unused variable warningsJes Sorensen2007-07-111-1/+1
| | | | | | | | Tell GCC to stop spewing out unnecessary warnings for unused variables passed to functions as pointers for ia64 files. Signed-off-by: Jes Sorensen <jes@sgi.com> Signed-off-by: Tony Luck <tony.luck@intel.com>
* [IA64] is_power_of_2-ia64/mm/hugetlbpage.cvignesh babu2007-06-261-1/+2
| | | | | | | | Replacing (n & (n-1)) in the context of power of 2 checks with is_power_of_2 Signed-off-by: vignesh babu <vignesh.babu@wipro.com> Signed-off-by: Tony Luck <tony.luck@intel.com>
* [IA64] optimize pagefaults a littleChristoph Hellwig2007-05-161-27/+14
| | | | | | | | | Get rid of the notifier list and call the kprobes code directly if compiled in. This mirrors the changes that recently went into powerpc, s390 and sparc64. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Tony Luck <tony.luck@intel.com>
* [IA64] spelling fixes: arch/ia64/Simon Arlott2007-05-111-1/+1
| | | | | | | Spelling and apostrophe fixes in arch/ia64/. Signed-off-by: Simon Arlott <simon@fire.lp0.eu> Signed-off-by: Tony Luck <tony.luck@intel.com>
* [IA64] Quicklist support for IA64Christoph Lameter2007-05-113-53/+2
| | | | | | | | | IA64 is the origin of the quicklist implementation. So cut out the pieces that are now in core code and modify the functions called. Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Tony Luck <tony.luck@intel.com>
* Merge git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux-2.6Linus Torvalds2007-05-091-3/+3
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | * git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux-2.6: [IA64] wire up pselect, ppoll [IA64] Add TIF_RESTORE_SIGMASK [IA64] unwind did not work for processes born with CLONE_STOPPED [IA64] Optional method to purge the TLB on SN systems [IA64] SPIN_LOCK_UNLOCKED macro cleanup in arch/ia64 [IA64-SN2][KJ] mmtimer.c-kzalloc [IA64] fix stack alignment for ia32 signal handlers [IA64] - Altix: hotplug after intr redirect can crash system [IA64] save and restore cpus_allowed in cpu_idle_wait [IA64] Removal of percpu TR cleanup in kexec code [IA64] Fix some section mismatch errors
| * [IA64] SPIN_LOCK_UNLOCKED macro cleanup in arch/ia64Milind Arun Choudhary2007-05-081-3/+3
| | | | | | | | | | | | | | SPIN_LOCK_UNLOCKED macro cleanup, use __SPIN_LOCK_UNLOCKED instead. Signed-off-by: Milind Arun Choudhary <milindchoudhary@gmail.com> Signed-off-by: Tony Luck <tony.luck@intel.com>
* | header cleaning: don't include smp_lock.h when not usedRandy Dunlap2007-05-082-2/+0
| | | | | | | | | | | | | | | | | | | | | | | | Remove includes of <linux/smp_lock.h> where it is not used/needed. Suggested by Al Viro. Builds cleanly on x86_64, i386, alpha, ia64, powerpc, sparc, sparc64, and arm (all 59 defconfigs). Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | move die notifier handling to common codeChristoph Hellwig2007-05-081-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch moves the die notifier handling to common code. Previous various architectures had exactly the same code for it. Note that the new code is compiled unconditionally, this should be understood as an appel to the other architecture maintainer to implement support for it aswell (aka sprinkling a notify_die or two in the proper place) arm had a notifiy_die that did something totally different, I renamed it to arm_notify_die as part of the patch and made it static to the file it's declared and used at. avr32 used to pass slightly less information through this interface and I brought it into line with the other architectures. [akpm@linux-foundation.org: build fix] [akpm@linux-foundation.org: fix vmalloc_sync_all bustage] [bryan.wu@analog.com: fix vmalloc_sync_all in nommu] Signed-off-by: Christoph Hellwig <hch@lst.de> Cc: <linux-arch@vger.kernel.org> Cc: Russell King <rmk@arm.linux.org.uk> Signed-off-by: Bryan Wu <bryan.wu@analog.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | Fix section mismatch of memory hotplug related code.Yasunori Goto2007-05-081-0/+2
|/ | | | | | | | | This is to fix many section mismatches of code related to memory hotplug. I checked compile with memory hotplug on/off on ia64 and x86-64 box. Signed-off-by: Yasunori Goto <y-goto@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* Merge git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux-2.6Linus Torvalds2007-05-072-24/+65
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux-2.6: [IA64] update memory attribute aliasing documentation & test cases [IA64] fail mmaps that span areas with incompatible attributes [IA64] allow WB /sys/.../legacy_mem mmaps [IA64] make ioremap avoid unsupported attributes [IA64] rename ioremap variables to match i386 [IA64] relax per-cpu TLB requirement to DTC [IA64] remove per-cpu ia64_phys_stacked_size_p8 [IA64] Fix example error injection program [IA64] Itanium MC Error Injection Tool: pal_mc_error_inject() interface [IA64] Itanium MC Error Injection Tool: Makefile changes [IA64] Itanium MC Error Injection Tool: Driver sysfs interface [IA64] Itanium MC Error Injection Tool: Doc and sample application [IA64] Itanium MC Error Injection Tool: Kernel configuration
| * Pull mem-attribute into release branchTony Luck2007-04-301-14/+64
| |\
| | * [IA64] make ioremap avoid unsupported attributesBjorn Helgaas2007-03-301-5/+55
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Example memory map (from HP sx1000 with VGA enabled): 0x00000 - 0x9FFFF supports only WB (cacheable) access 0xA0000 - 0xBFFFF supports only UC (uncacheable) access 0xC0000 - 0xFFFFF supports only WB (cacheable) access pci_read_rom() indirectly uses ioremap(0xC0000) to read the shadow VGA option ROM. ioremap() used to default to a 16MB or 64MB UC kernel identity mapping, which would cause an MCA when reading 0xC0000 since only WB is supported there. X uses reads the option ROM to initialize devices. A smaller test case is: # echo 1 > /sys/bus/pci/devices/0000:aa:03.0/rom # cp /sys/bus/pci/devices/0000:aa:03.0/rom x To avoid this, we can use the same ioremap_page_range() strategy that most architectures use for all ioremaps. These page table mappings come out of the vmalloc area. On ia64, these are in region 5 (0xA... addresses) and typically use 16KB or 64KB mappings instead of 16MB or 64MB mappings. The smaller mappings give more flexibility to use the correct attributes. Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com> Signed-off-by: Tony Luck <tony.luck@intel.com>
| | * [IA64] rename ioremap variables to match i386Bjorn Helgaas2007-03-301-13/+13
| | | | | | | | | | | | | | | | | | | | | No functional change, just use the same names as i386. Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com> Signed-off-by: Tony Luck <tony.luck@intel.com>
| * | Pull percpu-dtc into release branchTony Luck2007-04-301-10/+1
| |\ \ | | |/ | |/|
| | * [IA64] relax per-cpu TLB requirement to DTCChen, Kenneth W2007-02-061-10/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Instead of pinning per-cpu TLB into a DTR, use DTC. This will free up one TLB entry for application, or even kernel if access pattern to per-cpu data area has high temporal locality. Since per-cpu is mapped at the top of region 7 address, we just need to add special case in alt_dtlb_miss. The physical address of per-cpu data is already conveniently stored in IA64_KR(PER_CPU_DATA). Latency for alt_dtlb_miss is not affected as we can hide all the latency. It was measured that alt_dtlb_miss handler has 23 cycles latency before and after the patch. The performance effect is massive for applications that put lots of tlb pressure on CPU. Workload environment like database online transaction processing or application uses tera-byte of memory would benefit the most. Measurement with industry standard database benchmark shown an upward of 1.6% gain. While smaller workloads like cpu, java also showing small improvement. Signed-off-by: Ken Chen <kenneth.w.chen@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com>
* | | get_unmapped_area handles MAP_FIXED on ia64Benjamin Herrenschmidt2007-05-071-0/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Handle MAP_FIXED in ia64 arch_get_unmapped_area and hugetlb_get_unmapped_area(), just call prepare_hugepage_range in the later and is_hugepage_only_range() in the former. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Acked-by: William Irwin <bill.irwin@oracle.com> Cc: "Luck, Tony" <tony.luck@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | Make page->private usable in compound pagesChristoph Lameter2007-05-071-1/+1
|/ / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If we add a new flag so that we can distinguish between the first page and the tail pages then we can avoid to use page->private in the first page. page->private == page for the first page, so there is no real information in there. Freeing up page->private makes the use of compound pages more transparent. They become more usable like real pages. Right now we have to be careful f.e. if we are going beyond PAGE_SIZE allocations in the slab on i386 because we can then no longer use the private field. This is one of the issues that cause us not to support debugging for page size slabs in SLAB. Having page->private available for SLUB would allow more meta information in the page struct. I can probably avoid the 16 bit ints that I have in there right now. Also if page->private is available then a compound page may be equipped with buffer heads. This may free up the way for filesystems to support larger blocks than page size. We add PageTail as an alias of PageReclaim. Compound pages cannot currently be reclaimed. Because of the alias one needs to check PageCompound first. The RFC for the this approach was discussed at http://marc.info/?t=117574302800001&r=1&w=2 [nacc@us.ibm.com: fix hugetlbfs] Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | [IA64] bugfix stack layout upside-downKAMEZAWA Hiroyuki2007-03-291-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ia64 expects following vm layout: == low memory [register-stack grows up] [memory-stack grows down] == high memory But the code assigns the base of the register stack at the maximum stack size offset from the fixed address where the stack *might* start. Stack randomization will result in the memory stack starting at a lower address than this, and if the user has set a low stack limit with "ulimit -s", then you can end up with the register stack above the memory stack (or if you were very unlucky right on top of it!). Fix: Calculate the base address for the register stack starting from the actual address of the memory stack. Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: Tony Luck <tony.luck@intel.com>
* | [IA64] min_low_pfn and max_low_pfn calculation fixZou Nan hai2007-03-203-27/+23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We have seen bad_pte_print when testing crashdump on an SN machine in recent 2.6.20 kernel. There are tons of bad pte print (pfn < max_low_pfn) reports when the crash kernel boots up, all those reported bad pages are inside initmem range; That is because if the crash kernel code and data happens to be at the beginning of the 1st node. build_node_maps in discontig.c will bypass reserved regions with filter_rsvd_memory. Since min_low_pfn is calculated in build_node_map, so in this case, min_low_pfn will be greater than kernel code and data. Because pages inside initmem are freed and reused later, we saw pfn_valid check fail on those pages. I think this theoretically happen on a normal kernel. When I check min_low_pfn and max_low_pfn calculation in contig.c and discontig.c. I found more issues than this. 1. min_low_pfn and max_low_pfn calculation is inconsistent between contig.c and discontig.c, min_low_pfn is calculated as the first page number of boot memmap in contig.c (Why? Though this may work at the most of the time, I don't think it is the right logic). It is calculated as the lowest physical memory page number bypass reserved regions in discontig.c. max_low_pfn is calculated include reserved regions in contig.c. It is calculated exclude reserved regions in discontig.c. 2. If kernel code and data region is happen to be at the begin or the end of physical memory, when min_low_pfn and max_low_pfn calculation is bypassed kernel code and data, pages in initmem will report bad. 3. initrd is also in reserved regions, if it is at the begin or at the end of physical memory, kernel will refuse to reuse the memory. Because the virt_addr_valid check in free_initrd_mem. So it is better to fix and clean up those issues. Calculate min_low_pfn and max_low_pfn in a consistent way. Signed-off-by: Zou Nan hai <nanhai.zou@intel.com> Acked-by: Jay Lan <jlan@sgi.com> Signed-off-by: Tony Luck <tony.luck@intel.com>
* | [IA64] point saved_max_pfn to the max_pfn of the entire systemHorms2007-03-062-11/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Make saved_max_pfn point to max_pfn of entire system. Without this patch is so that vmcore is zero length on ia64. This is because saved_max_pfn was wrongly being set to the max_pfn of the crash kernel's address space, rather than the max_pfg on the physical memory of the machine - the whole purpose of vmcore is to access physical memory that is not part of the crash kernel's addresss space. Signed-off-by: Simon Horman <horms@verge.net.au> Signed-off-by: Zou Nan hai <nanhai.zou@intel.com> Sort-Of-Acked-By: Jay Lan <jlan@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Tony Luck <tony.luck@intel.com>
* | [PATCH] Transform kmem_cache_alloc()+memset(0) -> kmem_cache_zalloc().Robert P. J. Day2007-02-111-4/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Replace appropriate pairs of "kmem_cache_alloc()" + "memset(0)" with the corresponding "kmem_cache_zalloc()" call. Signed-off-by: Robert P. J. Day <rpjday@mindspring.com> Cc: "Luck, Tony" <tony.luck@intel.com> Cc: Andi Kleen <ak@muc.de> Cc: Roland McGrath <roland@redhat.com> Cc: James Bottomley <James.Bottomley@steeleye.com> Cc: Greg KH <greg@kroah.com> Acked-by: Joel Becker <Joel.Becker@oracle.com> Cc: Steven Whitehouse <swhiteho@redhat.com> Cc: Jan Kara <jack@ucw.cz> Cc: Michael Halcrow <mhalcrow@us.ibm.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Stephen Smalley <sds@tycho.nsa.gov> Cc: James Morris <jmorris@namei.org> Cc: Chris Wright <chrisw@sous-sol.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | [PATCH] optional ZONE_DMA: optional ZONE_DMA for ia64Christoph Lameter2007-02-112-1/+9
| | | | | | | | | | | | | | | | | | | | | | | | ZONE_DMA less operation for IA64 SGI platform Disable ZONE_DMA for SGI SN2. All memory is addressable by all devices and we do not need any special memory pool. Signed-off-by: Christoph Lameter <clameter@sgi.com> Cc: "Luck, Tony" <tony.luck@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | [PATCH] Drop nr_free_pages_pgdat()Christoph Lameter2007-02-111-1/+1
| | | | | | | | | | | | | | | | | | Function is unnecessary now. We can use the summing features of the ZVCs to get the values we need. Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | [IA64] swiotlb bug fixesJan Beulich2007-02-051-0/+19
| | | | | | | | | | | | | | | | | | | | | | | | | | This patch fixes - marking I-cache clean of pages DMAed to now only done for IA64 - broken multiple inclusion in include/asm-x86_64/swiotlb.h - missing call to mark_clean in swiotlb_sync_sg() - a (perhaps only theoretical) issue in swiotlb_dma_supported() when io_tlb_end is exactly at the end of memory Signed-off-by: Jan Beulich <jbeulich@novell.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Tony Luck <tony.luck@intel.com>
* | [IA64] clean up sparsemem memory_present callBob Picco2007-02-051-33/+3
| | | | | | | | | | | | | | | | | | | | Eliminate arch specific memory_present call ia64 NUMA by utilizing sparse_memory_present_with_active_regions. Acked-by: Mel Gorman <mel@csn.ul.ie> Signed-off-by: Bob Picco <bob.picco@hp.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Tony Luck <tony.luck@intel.com>
* | [IA64] show_mem() for IA64 sparsemem NUMAGeorge Beshers2007-02-051-26/+48
| | | | | | | | | | | | | | | | | | | | On the ia64 architecture only this patch upgrades show_mem() for sparse memory to be the same as it was for discontig memory. It has been shown to work on NUMA and flatmem architectures. Signed-off-by: George Beshers <gbeshers@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Tony Luck <tony.luck@intel.com>
* | [IA64] register memory ranges in a consistent mannerBob Picco2007-02-052-3/+20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | While pursuing and unrelated issue with 64Mb granules I noticed a problem related to inconsistent use of add_active_range. There doesn't appear any reason to me why FLATMEM versus DISCONTIG_MEM should register memory to add_active_range with different code. So I've changed the code into a common implementation. The other subtle issue fixed by this patch was calling add_active_range in count_node_pages before granule aligning is performed. We were lucky with 16MB granules but not so with 64MB granules. count_node_pages has reserved regions filtered out and as a consequence linked kernel text and data aren't covered by calls to count_node_pages. So linked kernel regions wasn't reported to add_active_regions. This resulted in free_initmem causing numerous bad_page reports. This won't occur with this patch because now all known memory regions are reported by register_active_ranges. Acked-by: Mel Gorman <mel@csn.ul.ie> Signed-off-by: Bob Picco <bob.picco@hp.com> Acked-by: Simon Horman <horms@verge.net.au> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Tony Luck <tony.luck@intel.com>
* | [IA64] kexec: typo in the saved_max_pfn description in contig.cHorms2007-02-051-1/+1
| | | | | | | | | | | | | | Fix a typo in the saved_max_pfn description in contig.c Signed-off-by: Simon Horman <horms@verge.net.au> Signed-off-by: Tony Luck <tony.luck@intel.com>
* | [IA64] Zero size /proc/vmcore on ia64Horms2007-02-051-0/+6
|/ | | | | | | | | | | | | Set saved_max_pfn when discontig memory is in use. This sets up saved_max_pfn when disctontig memory is in use. This mirrors the code for contig memory. This patch does not entirely solve the problem of making vmcore work, however it does appear to be neccessary. Please consider applying. Signed-off-by: Simon Horman <horms@verge.net.au> Signed-off-by: Tony Luck <tony.luck@intel.com>