From b5810039a54e5babf428e9a1e89fc1940fabff11 Mon Sep 17 00:00:00 2001 From: Nick Piggin Date: Sat, 29 Oct 2005 18:16:12 -0700 Subject: [PATCH] core remove PageReserved Remove PageReserved() calls from core code by tightening VM_RESERVED handling in mm/ to cover PageReserved functionality. PageReserved special casing is removed from get_page and put_page. All setting and clearing of PageReserved is retained, and it is now flagged in the page_alloc checks to help ensure we don't introduce any refcount based freeing of Reserved pages. MAP_PRIVATE, PROT_WRITE of VM_RESERVED regions is tentatively being deprecated. We never completely handled it correctly anyway, and is be reintroduced in future if required (Hugh has a proof of concept). Once PageReserved() calls are removed from kernel/power/swsusp.c, and all arch/ and driver code, the Set and Clear calls, and the PG_reserved bit can be trivially removed. Last real user of PageReserved is swsusp, which uses PageReserved to determine whether a struct page points to valid memory or not. This still needs to be addressed (a generic page_is_ram() should work). A last caveat: the ZERO_PAGE is now refcounted and managed with rmap (and thus mapcounted and count towards shared rss). These writes to the struct page could cause excessive cacheline bouncing on big systems. There are a number of ways this could be addressed if it is an issue. Signed-off-by: Nick Piggin Refcount bug fix for filemap_xip.c Signed-off-by: Carsten Otte Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- arch/ppc64/kernel/vdso.c | 12 +++++++----- 1 file changed, 7 insertions(+), 5 deletions(-) (limited to 'arch/ppc64') diff --git a/arch/ppc64/kernel/vdso.c b/arch/ppc64/kernel/vdso.c index efa985f..4aacf52 100644 --- a/arch/ppc64/kernel/vdso.c +++ b/arch/ppc64/kernel/vdso.c @@ -176,13 +176,13 @@ static struct page * vdso_vma_nopage(struct vm_area_struct * vma, return NOPAGE_SIGBUS; /* - * Last page is systemcfg, special handling here, no get_page() a - * this is a reserved page + * Last page is systemcfg. */ if ((vma->vm_end - address) <= PAGE_SIZE) - return virt_to_page(systemcfg); + pg = virt_to_page(systemcfg); + else + pg = virt_to_page(vbase + offset); - pg = virt_to_page(vbase + offset); get_page(pg); DBG(" ->page count: %d\n", page_count(pg)); @@ -259,7 +259,7 @@ int arch_setup_additional_pages(struct linux_binprm *bprm, int executable_stack) * gettimeofday will be totally dead. It's fine to use that for setting * breakpoints in the vDSO code pages though */ - vma->vm_flags = VM_READ | VM_EXEC | VM_MAYREAD | VM_MAYWRITE | VM_MAYEXEC; + vma->vm_flags = VM_READ | VM_EXEC | VM_MAYREAD | VM_MAYWRITE | VM_MAYEXEC | VM_RESERVED; vma->vm_flags |= mm->def_flags; vma->vm_page_prot = protection_map[vma->vm_flags & 0x7]; vma->vm_ops = &vdso_vmops; @@ -603,6 +603,8 @@ void __init vdso_init(void) ClearPageReserved(pg); get_page(pg); } + + get_page(virt_to_page(systemcfg)); } int in_gate_area_no_task(unsigned long addr) -- cgit v1.1 From 872fec16d9a0ed3b75b8893aa217e49cca575ee5 Mon Sep 17 00:00:00 2001 From: Hugh Dickins Date: Sat, 29 Oct 2005 18:16:21 -0700 Subject: [PATCH] mm: init_mm without ptlock First step in pushing down the page_table_lock. init_mm.page_table_lock has been used throughout the architectures (usually for ioremap): not to serialize kernel address space allocation (that's usually vmlist_lock), but because pud_alloc,pmd_alloc,pte_alloc_kernel expect caller holds it. Reverse that: don't lock or unlock init_mm.page_table_lock in any of the architectures; instead rely on pud_alloc,pmd_alloc,pte_alloc_kernel to take and drop it when allocating a new one, to check lest a racing task already did. Similarly no page_table_lock in vmalloc's map_vm_area. Some temporary ugliness in __pud_alloc and __pmd_alloc: since they also handle user mms, which are converted only by a later patch, for now they have to lock differently according to whether or not it's init_mm. If sources get muddled, there's a danger that an arch source taking init_mm.page_table_lock will be mixed with common source also taking it (or neither take it). So break the rules and make another change, which should break the build for such a mismatch: remove the redundant mm arg from pte_alloc_kernel (ppc64 scrapped its distinct ioremap_mm in 2.6.13). Exceptions: arm26 used pte_alloc_kernel on user mm, now pte_alloc_map; ia64 used pte_alloc_map on init_mm, now pte_alloc_kernel; parisc had bad args to pmd_alloc and pte_alloc_kernel in unused USE_HPPA_IOREMAP code; ppc64 map_io_page forgot to unlock on failure; ppc mmu_mapin_ram and ppc64 im_free took page_table_lock for no good reason. Signed-off-by: Hugh Dickins Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- arch/ppc64/mm/imalloc.c | 5 ----- arch/ppc64/mm/init.c | 4 +--- 2 files changed, 1 insertion(+), 8 deletions(-) (limited to 'arch/ppc64') diff --git a/arch/ppc64/mm/imalloc.c b/arch/ppc64/mm/imalloc.c index c65b87b..f4ca29c 100644 --- a/arch/ppc64/mm/imalloc.c +++ b/arch/ppc64/mm/imalloc.c @@ -300,12 +300,7 @@ void im_free(void * addr) for (p = &imlist ; (tmp = *p) ; p = &tmp->next) { if (tmp->addr == addr) { *p = tmp->next; - - /* XXX: do we need the lock? */ - spin_lock(&init_mm.page_table_lock); unmap_vm_area(tmp); - spin_unlock(&init_mm.page_table_lock); - kfree(tmp); up(&imlist_sem); return; diff --git a/arch/ppc64/mm/init.c b/arch/ppc64/mm/init.c index be64b15..a45584b 100644 --- a/arch/ppc64/mm/init.c +++ b/arch/ppc64/mm/init.c @@ -155,7 +155,6 @@ static int map_io_page(unsigned long ea, unsigned long pa, int flags) unsigned long vsid; if (mem_init_done) { - spin_lock(&init_mm.page_table_lock); pgdp = pgd_offset_k(ea); pudp = pud_alloc(&init_mm, pgdp, ea); if (!pudp) @@ -163,12 +162,11 @@ static int map_io_page(unsigned long ea, unsigned long pa, int flags) pmdp = pmd_alloc(&init_mm, pudp, ea); if (!pmdp) return -ENOMEM; - ptep = pte_alloc_kernel(&init_mm, pmdp, ea); + ptep = pte_alloc_kernel(pmdp, ea); if (!ptep) return -ENOMEM; set_pte_at(&init_mm, ea, ptep, pfn_pte(pa >> PAGE_SHIFT, __pgprot(flags))); - spin_unlock(&init_mm.page_table_lock); } else { unsigned long va, vpn, hash, hpteg; -- cgit v1.1 From 208d54e5513c0c02d85af0990901354c74364d5c Mon Sep 17 00:00:00 2001 From: Dave Hansen Date: Sat, 29 Oct 2005 18:16:52 -0700 Subject: [PATCH] memory hotplug locking: node_size_lock pgdat->node_size_lock is basically only neeeded in one place in the normal code: show_mem(), which is the arch-specific sysrq-m printing function. Strictly speaking, the architectures not doing memory hotplug do no need this locking in show_mem(). However, they are all included for completeness. This should also make any future consolidation of all of the implementations a little more straightforward. This lock is also held in the sparsemem code during a memory removal, as sections are invalidated. This is the place there pfn_valid() is made false for a memory area that's being removed. The lock is only required when doing pfn_valid() operations on memory which the user does not already have a reference on the page, such as in show_mem(). Signed-off-by: Dave Hansen Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- arch/ppc64/mm/init.c | 6 ++++++ 1 file changed, 6 insertions(+) (limited to 'arch/ppc64') diff --git a/arch/ppc64/mm/init.c b/arch/ppc64/mm/init.c index a45584b..975b26d 100644 --- a/arch/ppc64/mm/init.c +++ b/arch/ppc64/mm/init.c @@ -104,6 +104,8 @@ void show_mem(void) show_free_areas(); printk("Free swap: %6ldkB\n", nr_swap_pages<<(PAGE_SHIFT-10)); for_each_pgdat(pgdat) { + unsigned long flags; + pgdat_resize_lock(pgdat, &flags); for (i = 0; i < pgdat->node_spanned_pages; i++) { page = pgdat_page_nr(pgdat, i); total++; @@ -114,6 +116,7 @@ void show_mem(void) else if (page_count(page)) shared += page_count(page) - 1; } + pgdat_resize_unlock(pgdat, &flags); } printk("%ld pages of RAM\n", total); printk("%ld reserved pages\n", reserved); @@ -647,11 +650,14 @@ void __init mem_init(void) #endif for_each_pgdat(pgdat) { + unsigned long flags; + pgdat_resize_lock(pgdat, &flags); for (i = 0; i < pgdat->node_spanned_pages; i++) { page = pgdat_page_nr(pgdat, i); if (PageReserved(page)) reservedpages++; } + pgdat_resize_unlock(pgdat, &flags); } codesize = (unsigned long)&_etext - (unsigned long)&_stext; -- cgit v1.1 From bb7e7e032d2cb8e0e9a88a2be209de5e61033b39 Mon Sep 17 00:00:00 2001 From: Dave Hansen Date: Sat, 29 Oct 2005 18:16:58 -0700 Subject: [PATCH] memory hotplug: ppc64 specific hot-add functions Here is a set of ppc64 specific patches that at least allow compilation/booting with the following configurations: FLATMEM SPARSEMEN SPARSEMEM + MEMORY_HOTPLUG Signed-off-by: Mike Kravetz Signed-off-by: Dave Hansen Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- arch/ppc64/mm/init.c | 77 ++++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 77 insertions(+) (limited to 'arch/ppc64') diff --git a/arch/ppc64/mm/init.c b/arch/ppc64/mm/init.c index 975b26d..e2bd777 100644 --- a/arch/ppc64/mm/init.c +++ b/arch/ppc64/mm/init.c @@ -871,3 +871,80 @@ pgprot_t phys_mem_access_prot(struct file *file, unsigned long addr, return vma_prot; } EXPORT_SYMBOL(phys_mem_access_prot); + +#ifdef CONFIG_MEMORY_HOTPLUG + +void online_page(struct page *page) +{ + ClearPageReserved(page); + free_cold_page(page); + totalram_pages++; + num_physpages++; +} + +/* + * This works only for the non-NUMA case. Later, we'll need a lookup + * to convert from real physical addresses to nid, that doesn't use + * pfn_to_nid(). + */ +int __devinit add_memory(u64 start, u64 size) +{ + struct pglist_data *pgdata = NODE_DATA(0); + struct zone *zone; + unsigned long start_pfn = start >> PAGE_SHIFT; + unsigned long nr_pages = size >> PAGE_SHIFT; + + /* this should work for most non-highmem platforms */ + zone = pgdata->node_zones; + + return __add_pages(zone, start_pfn, nr_pages); + + return 0; +} + +/* + * First pass at this code will check to determine if the remove + * request is within the RMO. Do not allow removal within the RMO. + */ +int __devinit remove_memory(u64 start, u64 size) +{ + struct zone *zone; + unsigned long start_pfn, end_pfn, nr_pages; + + start_pfn = start >> PAGE_SHIFT; + nr_pages = size >> PAGE_SHIFT; + end_pfn = start_pfn + nr_pages; + + printk("%s(): Attempting to remove memoy in range " + "%lx to %lx\n", __func__, start, start+size); + /* + * check for range within RMO + */ + zone = page_zone(pfn_to_page(start_pfn)); + + printk("%s(): memory will be removed from " + "the %s zone\n", __func__, zone->name); + + /* + * not handling removing memory ranges that + * overlap multiple zones yet + */ + if (end_pfn > (zone->zone_start_pfn + zone->spanned_pages)) + goto overlap; + + /* make sure it is NOT in RMO */ + if ((start < lmb.rmo_size) || ((start+size) < lmb.rmo_size)) { + printk("%s(): range to be removed must NOT be in RMO!\n", + __func__); + goto in_rmo; + } + + return __remove_pages(zone, start_pfn, nr_pages); + +overlap: + printk("%s(): memory range to be removed overlaps " + "multiple zones!!!\n", __func__); +in_rmo: + return -1; +} +#endif /* CONFIG_MEMORY_HOTPLUG */ -- cgit v1.1 From 9c0cbd54ce0397017a823484f9a8054ab369b8a2 Mon Sep 17 00:00:00 2001 From: Christoph Hellwig Date: Sun, 30 Oct 2005 15:01:41 -0800 Subject: [PATCH] TIOC* compat ioctl handling TIOCSTART and TIOCSTOP are defined in asm/ioctls.h and asm/termios.h by various architectures but not actually implemented anywhere but in the IRIX compatibility layer, so remove their COMPATIBLE_IOCTL from parisc, ppc64 and sparc64. Move the TIOCSLTC COMPATIBLE_IOCTL to common code, guided by an ifdef to only show up on architectures that support it (same as the code handling it in tty_ioctl.c), aswell as it's brother TIOCGLTC that wasn't handled so far. Signed-off-by: Christoph Hellwig Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- arch/ppc64/kernel/ioctl32.c | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) (limited to 'arch/ppc64') diff --git a/arch/ppc64/kernel/ioctl32.c b/arch/ppc64/kernel/ioctl32.c index a8005db..ba4a899 100644 --- a/arch/ppc64/kernel/ioctl32.c +++ b/arch/ppc64/kernel/ioctl32.c @@ -39,9 +39,7 @@ IOCTL_TABLE_START #include #define DECLARES #include "compat_ioctl.c" -COMPATIBLE_IOCTL(TIOCSTART) -COMPATIBLE_IOCTL(TIOCSTOP) -COMPATIBLE_IOCTL(TIOCSLTC) + /* Little p (/dev/rtc, /dev/envctrl, etc.) */ COMPATIBLE_IOCTL(_IOR('p', 20, int[7])) /* RTCGET */ COMPATIBLE_IOCTL(_IOW('p', 21, int[7])) /* RTCSET */ -- cgit v1.1 From dfb7dac3af623a68262536437af008ed6aba4d88 Mon Sep 17 00:00:00 2001 From: Christoph Hellwig Date: Sun, 30 Oct 2005 15:02:22 -0800 Subject: [PATCH] unify sys_ptrace prototype Make sure we always return, as all syscalls should. Also move the common prototype to Signed-off-by: Christoph Hellwig Signed-off-by: Miklos Szeredi Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- arch/ppc64/kernel/ptrace.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'arch/ppc64') diff --git a/arch/ppc64/kernel/ptrace.c b/arch/ppc64/kernel/ptrace.c index b1c044c..b33073c 100644 --- a/arch/ppc64/kernel/ptrace.c +++ b/arch/ppc64/kernel/ptrace.c @@ -53,7 +53,7 @@ void ptrace_disable(struct task_struct *child) clear_single_step(child); } -int sys_ptrace(long request, long pid, long addr, long data) +long sys_ptrace(long request, long pid, long addr, long data) { struct task_struct *child; int ret = -EPERM; -- cgit v1.1 From ecea8d19c9f0ebd62ddaa07fc919ff4e4b820d99 Mon Sep 17 00:00:00 2001 From: Thomas Gleixner Date: Sun, 30 Oct 2005 15:03:00 -0800 Subject: [PATCH] jiffies_64 cleanup Define jiffies_64 in kernel/timer.c rather than having 24 duplicated defines in each architecture. Signed-off-by: Thomas Gleixner Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- arch/ppc64/kernel/time.c | 4 ---- 1 file changed, 4 deletions(-) (limited to 'arch/ppc64') diff --git a/arch/ppc64/kernel/time.c b/arch/ppc64/kernel/time.c index b56c6a3..ab56546 100644 --- a/arch/ppc64/kernel/time.c +++ b/arch/ppc64/kernel/time.c @@ -68,10 +68,6 @@ #include #include -u64 jiffies_64 __cacheline_aligned_in_smp = INITIAL_JIFFIES; - -EXPORT_SYMBOL(jiffies_64); - /* keep track of when we need to update the rtc */ time_t last_rtc_update; extern int piranha_simulator; -- cgit v1.1 From 4e57b6817880946a3a78d5d8cad1ace363f7e449 Mon Sep 17 00:00:00 2001 From: Tim Schmielau Date: Sun, 30 Oct 2005 15:03:48 -0800 Subject: [PATCH] fix missing includes I recently picked up my older work to remove unnecessary #includes of sched.h, starting from a patch by Dave Jones to not include sched.h from module.h. This reduces the number of indirect includes of sched.h by ~300. Another ~400 pointless direct includes can be removed after this disentangling (patch to follow later). However, quite a few indirect includes need to be fixed up for this. In order to feed the patches through -mm with as little disturbance as possible, I've split out the fixes I accumulated up to now (complete for i386 and x86_64, more archs to follow later) and post them before the real patch. This way this large part of the patch is kept simple with only adding #includes, and all hunks are independent of each other. So if any hunk rejects or gets in the way of other patches, just drop it. My scripts will pick it up again in the next round. Signed-off-by: Tim Schmielau Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- arch/ppc64/kernel/hvcserver.c | 2 ++ arch/ppc64/kernel/of_device.c | 2 ++ arch/ppc64/lib/locks.c | 2 ++ 3 files changed, 6 insertions(+) (limited to 'arch/ppc64') diff --git a/arch/ppc64/kernel/hvcserver.c b/arch/ppc64/kernel/hvcserver.c index bde8f42..4d58417 100644 --- a/arch/ppc64/kernel/hvcserver.c +++ b/arch/ppc64/kernel/hvcserver.c @@ -22,6 +22,8 @@ #include #include #include +#include + #include #include #include diff --git a/arch/ppc64/kernel/of_device.c b/arch/ppc64/kernel/of_device.c index 9f200f0..3aabfd0 100644 --- a/arch/ppc64/kernel/of_device.c +++ b/arch/ppc64/kernel/of_device.c @@ -4,6 +4,8 @@ #include #include #include +#include + #include #include diff --git a/arch/ppc64/lib/locks.c b/arch/ppc64/lib/locks.c index 033643a..d622c1d 100644 --- a/arch/ppc64/lib/locks.c +++ b/arch/ppc64/lib/locks.c @@ -17,6 +17,8 @@ #include #include #include +#include + #include #include -- cgit v1.1