aboutsummaryrefslogtreecommitdiffstats
path: root/arch/i386/kernel/process.c
Commit message (Collapse)AuthorAgeFilesLines
* [PATCH] i386/x86_64: ACPI cpu_idle_wait() fixIngo Molnar2006-11-171-1/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The scheduler on Andreas Friedrich's hyperthreading system stopped working properly: the scheduler would never move tasks to another CPU! The lask known working kernel was 2.6.8. After a couple of attempts to corner the bug, the following smoking gun was found: BIOS reported wrong ACPI idfor the processor CPU#1: set_cpus_allowed(), swapper:1, 3 -> 2 [<c0103bbe>] show_trace_log_lvl+0x34/0x4a [<c0103ceb>] show_trace+0x2c/0x2e [<c01045f8>] dump_stack+0x2b/0x2d [<c0116a77>] set_cpus_allowed+0x52/0xec [<c0101d86>] cpu_idle_wait+0x2e/0x100 [<c0259c57>] acpi_processor_power_exit+0x45/0x58 [<c0259752>] acpi_processor_remove+0x46/0xea [<c025c6fb>] acpi_start_single_object+0x47/0x54 [<c025cee5>] acpi_bus_register_driver+0xa4/0xd3 [<c04ab2d7>] acpi_processor_init+0x57/0x77 [<c01004d7>] init+0x146/0x2fd [<c0103a87>] kernel_thread_helper+0x7/0x10 a quick look at cpu_idle_wait() shows how broken that code is on i386: it changes the init task's affinity map but never restores it ... and because all userspace tasks get forked by init, they all inherited that single-CPU affinity mask. x86_64 cloned this bug too. Signed-off-by: Ingo Molnar <mingo@elte.hu> Cc: Andreas Friedrich <andreas.friedrich@fujitsu-siemens.com> Cc: Wolfgang Erig <Wolfgang.Erig@fujitsu-siemens.com> Cc: Andrew Morton <akpm@osdl.org> Cc: Adrian Bunk <bunk@stusta.de> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] x86: Revert new unwind kernel stack terminationAndi Kleen2006-10-211-5/+1
| | | | | | | | | Jan convinced me that it was unnecessary because the assembly stubs do this already on the stack. Cc: jbeulich@novell.com Signed-off-by: Andi Kleen <ak@suse.de>
* ACPI: Processor native C-states using MWAITVenkatesh Pallipadi2006-10-141-7/+15
| | | | | | | | | | | | | | | | | | | | | | | | | Intel processors starting with the Core Duo support support processor native C-state using the MWAIT instruction. Refer: Intel Architecture Software Developer's Manual http://www.intel.com/design/Pentium4/manuals/253668.htm Platform firmware exports the support for Native C-state to OS using ACPI _PDC and _CST methods. Refer: Intel Processor Vendor-Specific ACPI: Interface Specification http://www.intel.com/technology/iapc/acpi/downloads/302223.htm With Processor Native C-state, we use 'MWAIT' instruction on the processor to enter different C-states (C1, C2, C3). We won't use the special IO ports to enter C-state and no SMM mode etc required to enter C-state. Overall this will mean better C-state support. One major advantage of using MWAIT for all C-states is, with this and "treat interrupt as break event" feature of MWAIT, we can now get accurate timing for the time spent in C1, C2, .. states. Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Len Brown <len.brown@intel.com>
* [PATCH] x86: Terminate the kernel stacks for the unwinderAndi Kleen2006-10-051-1/+5
| | | | | | | | | | | | Always make sure RIP/EIP is 0 in the registers stored on the top of the stack of a kernel thread. This makes sure the unwinder code won't try a fallback but knows the stack has ended. AK: this patch is a bit mysterious. in theory they should be terminated anyways, but it seems to fix at least one crash. Anyways double termination probably doesn't hurt. Signed-off-by: Andi Kleen <ak@suse.de>
* [PATCH] namespaces: utsname: use init_utsname when appropriateSerge E. Hallyn2006-10-021-3/+3
| | | | | | | | | | | | | | | | | | | | | In some places, particularly drivers and __init code, the init utsns is the appropriate one to use. This patch replaces those with a the init_utsname helper. Changes: Removed several uses of init_utsname(). Hope I picked all the right ones in net/ipv4/ipconfig.c. These are now changed to utsname() (the per-process namespace utsname) in the previous patch (2/7) [akpm@osdl.org: CIFS fix] Signed-off-by: Serge E. Hallyn <serue@us.ibm.com> Cc: Kirill Korotaev <dev@openvz.org> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Herbert Poetzl <herbert@13thfloor.at> Cc: Andrey Savochkin <saw@sw.ru> Cc: Serge Hallyn <serue@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] kmemdup: some usersAlexey Dobriyan2006-10-011-3/+2
| | | | | Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] i386: Allow a kernel not to be in ring 0Rusty Russell2006-09-261-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We allow for the fact that the guest kernel may not run in ring 0. This requires some abstraction in a few places when setting %cs or checking privilege level (user vs kernel). This is Chris' [RFC PATCH 15/33] move segment checks to subarch, except rather than using #define USER_MODE_MASK which depends on a config option, we use Zach's more flexible approach of assuming ring 3 == userspace. I also used "get_kernel_rpl()" over "get_kernel_cs()" because I think it reads better in the code... 1) Remove the hardcoded 3 and introduce #define SEGMENT_RPL_MASK 3 2) Add a get_kernel_rpl() macro, and don't assume it's zero. And: Clean up of patch for letting kernel run other than ring 0: a. Add some comments about the SEGMENT_IS_*_CODE() macros. b. Add a USER_RPL macro. (Code was comparing a value to a mask in some places and to the magic number 3 in other places.) c. Add macros for table indicator field and use them. d. Change the entry.S tests for LDT stack segment to use the macros Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Zachary Amsden <zach@vmware.com> Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Andi Kleen <ak@suse.de>
* [PATCH] i386: move kernel_thread_helper into entry.SAndi Kleen2006-09-261-9/+0
| | | | | | | | | | | And add proper CFI annotation to it which was previously impossible. This prevents "stuck" messages by the dwarf2 unwinder when reaching the top of a kernel stack. Includes feedback from Jan Beulich Cc: jbeulich@novell.com Signed-off-by: Andi Kleen <ak@suse.de>
* [PATCH] i386/x86-64: Don't randomize stack top when no randomization ↵Andi Kleen2006-09-261-1/+2
| | | | | | | | | personality is set Based on patch from Frank van Maarseveen <frankvm@frankvm.com>, but extended. Signed-off-by: Andi Kleen <ak@suse.de>
* [PATCH] i386: switch_to(): misplaced parenthesesChuck Ebbert2006-07-281-2/+2
| | | | | | | | | Recent changes in i386 __switch_to() have a misplaced closing parenthesis causing an unlikely() to terminate early. Signed-off-by: Chuck Ebbert <76306.1226@compuserve.com> Acked-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] i386: use thread_info flags for debug regs and IO bitmapsStephane Eranian2006-07-091-21/+29
| | | | | | | | | | | | | | | | | | | | Use thread info flags to track use of debug registers and IO bitmaps. - add TIF_DEBUG to track when debug registers are active - add TIF_IO_BITMAP to track when I/O bitmap is used - modify __switch_to() to use the new TIF flags Performance tested on Pentium II, ten runs of LMbench context switch benchmark (smaller is better:) before after avg 3.65 3.39 min 3.55 3.33 Signed-off-by: Stephane Eranian <eranian@hpl.hp.com> Signed-off-by: Chuck Ebbert <76306.1226@compuserve.com> Acked-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* Remove obsolete #include <linux/config.h>Jörn Engel2006-06-301-1/+0
| | | | | Signed-off-by: Jörn Engel <joern@wohnheim.fh-wedel.de> Signed-off-by: Adrian Bunk <bunk@stusta.de>
* [PATCH] i386/x86-64/ia64: Move polling flag into thread_info_statusAndi Kleen2006-06-261-3/+3
| | | | | | | | | | | | | | | | | | During some profiling I noticed that default_idle causes a lot of memory traffic. I think that is caused by the atomic operations to clear/set the polling flag in thread_info. There is actually no reason to make this atomic - only the idle thread does it to itself, other CPUs only read it. So I moved it into ti->status. Converted i386/x86-64/ia64 for now because that was the easiest way to fix ACPI which also manipulates these flags in its idle function. Cc: Nick Piggin <npiggin@novell.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Len Brown <len.brown@intel.com> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] i386: reliable stack trace support (i386)Jan Beulich2006-06-261-1/+1
| | | | | | | | | | These are the i386-specific pieces to enable reliable stack traces. This is going to be even more useful once CFI annotations get added to he assembly code, namely to entry.S. Signed-off-by: Jan Beulich <jbeulich@novell.com> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] unexport get_wchanAdrian Bunk2006-03-311-1/+0
| | | | | | | | The only user of get_wchan is the proc fs - and proc can't be built modular. Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] kretprobe instance recycled by parent processbibo mao2006-03-261-8/+0
| | | | | | | | | | | | | | | | | When kretprobe probes the schedule() function, if the probed process exits then schedule() will never return, so some kretprobe instances will never be recycled. In this patch the parent process will recycle retprobe instances of the probed function and there will be no memory leak of kretprobe instances. Signed-off-by: bibo mao <bibo.mao@intel.com> Cc: Masami Hiramatsu <hiramatu@sdl.hitachi.co.jp> Cc: Prasanna S Panchamukhi <prasanna@in.ibm.com> Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com> Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] i386: fix uses of user_mode() vs. user_mode_vm()Jan Beulich2006-03-231-1/+1
| | | | | | | | | | | | | | | | | | | | | | | >commit 76381fee7e8feb4c22be636aa5d4765dbe4fbf9e >Author: Vincent Hanquez <vincent.hanquez@cl.cam.ac.uk> >Date: Thu Jun 23 00:08:46 2005 -0700 > > [PATCH] xen: x86_64: use more usermode macro > > Make use of the user_mode macro where it's possible. This is useful for Xen > because it will need only to redefine only the macro to a hypervisor call. I am of the opinion that the above changeset is incomplete, i.e. it missed converting some previous uses of user_mode to user_mode_vm. While most of them could be considered just cosmetical, at least the one in die_nmi doesn't appear to be. Signed-off-by: Jan Beulich <jbeulich@novell.com> Cc: Vincent Hanquez <vincent.hanquez@cl.cam.ac.uk> Cc: Zachary Amsden <zach@vmware.com> Cc: James Bottomley <James.Bottomley@steeleye.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] i386: print kernel version in register dumpsChuck Ebbert2006-02-051-2/+4
| | | | | | | | | | | | | Show first field of kernel version in register dumps like x86_64 does. Changes output from e.g.: (2.6.16-rc1) to: (2.6.16-rc1 #12) Signed-off-by: Chuck Ebbert <76306.1226@compuserve.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] i386: task_stack_page()Al Viro2006-01-121-1/+1
| | | | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] i386: fix task_pt_regs()akpm@osdl.org2006-01-121-18/+2
| | | | | | | | | | | | ) From: Al Viro <viro@ftp.linux.org.uk> task_pt_regs() needs the same offset-by-8 to match copy_thread() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] i386: task_thread_info()Al Viro2006-01-121-2/+2
| | | | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] Make vm86 support optionalMatt Mackall2006-01-081-0/+1
| | | | | | | | | | | | | | | | | | | This adds an option to remove vm86 support under CONFIG_EMBEDDED. Saves about 5k. This version eliminates most of the #ifdefs of the previous version and instead uses function stubs in vm86.h. Also, release_vm86_irqs is moved from asm-i386/irq.h to a more appropriate home in vm86.h so that the stubs can live together. $ size vmlinux-baseline vmlinux-novm86 text data bss dec hex filename 2920821 523232 190652 3634705 377611 vmlinux-baseline 2916268 523100 190492 3629860 376324 vmlinux-novm86 Signed-off-by: Matt Mackall <mpm@selenic.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] x86: Deprecate useless bugZachary Amsden2006-01-061-11/+1
| | | | | | | | | | | | Remove the "temporary debugging check" which has managed to live for quite some time, and is clearly unneeded. The mm can never be live at this point, so clearly checking the LDT in the mm->context is redundant as well. Signed-off-by: Zachary Amsden <zach@vmware.com> Cc: "Seth, Rohit" <rohit.seth@intel.com> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] x86: Cr4 is valid on some 486sZachary Amsden2006-01-061-3/+1
| | | | | | | | | | | | | | So some 486 processors do have CR4 register. Allow them to present it in register dumps by using the old fault technique rather than testing processor family. Thanks to Maciej for noticing this. Signed-off-by: Zachary Amsden <zach@vmware.com> Cc: "Seth, Rohit" <rohit.seth@intel.com> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] x86: teach dump_task_regs() about the -8 offset.Stas Sergeev2005-12-311-1/+3
| | | | | | | This should fix multi-threaded core-files Signed-off-by: stsp@aknet.ru Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] kprobes: Fix return probes on sys_execveJim Keniston2005-11-231-7/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fix a bug in kprobes that can cause an Oops or even a crash when a return probe is installed on one of the following functions: sys_execve, do_execve, load_*_binary, flush_old_exec, or flush_thread. The fix is to remove the call to kprobe_flush_task() in flush_thread(). This fix has been tested on all architectures for which the return-probes feature has been implemented (i386, x86_64, ppc64, ia64). Please apply. BACKGROUND Up to now, we have called kprobe_flush_task() under two situations: when a task exits, and when it execs. Flushing kretprobe_instances on exit is correct because (a) do_exit() doesn't return, and (b) one or more return-probed functions may be active when a task calls do_exit(). Neither is the case for sys_execve() and its callees. Initially, the mistaken call to kprobe_flush_task() on exec was harmless because we put the "real" return address of each active probed function back in the stack, just to be safe, when we recycled its kretprobe_instance. When support for ppc64 and ia64 was added, this safety measure couldn't be employed, and was eventually dropped even for i386 and x86_64. sys_execve() and its callees were informally blacklisted for return probes until this fix was developed. Acked-by: Prasanna S Panchamukhi <prasanna@in.ibm.com> Signed-off-by: Jim Keniston <jkenisto@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] sched: resched and cpu_idle reworkNick Piggin2005-11-091-36/+28
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Make some changes to the NEED_RESCHED and POLLING_NRFLAG to reduce confusion, and make their semantics rigid. Improves efficiency of resched_task and some cpu_idle routines. * In resched_task: - TIF_NEED_RESCHED is only cleared with the task's runqueue lock held, and as we hold it during resched_task, then there is no need for an atomic test and set there. The only other time this should be set is when the task's quantum expires, in the timer interrupt - this is protected against because the rq lock is irq-safe. - If TIF_NEED_RESCHED is set, then we don't need to do anything. It won't get unset until the task get's schedule()d off. - If we are running on the same CPU as the task we resched, then set TIF_NEED_RESCHED and no further action is required. - If we are running on another CPU, and TIF_POLLING_NRFLAG is *not* set after TIF_NEED_RESCHED has been set, then we need to send an IPI. Using these rules, we are able to remove the test and set operation in resched_task, and make clear the previously vague semantics of POLLING_NRFLAG. * In idle routines: - Enter cpu_idle with preempt disabled. When the need_resched() condition becomes true, explicitly call schedule(). This makes things a bit clearer (IMO), but haven't updated all architectures yet. - Many do a test and clear of TIF_NEED_RESCHED for some reason. According to the resched_task rules, this isn't needed (and actually breaks the assumption that TIF_NEED_RESCHED is only cleared with the runqueue lock held). So remove that. Generally one less locked memory op when switching to the idle thread. - Many idle routines clear TIF_POLLING_NRFLAG, and only set it in the inner most polling idle loops. The above resched_task semantics allow it to be set until before the last time need_resched() is checked before going into a halt requiring interrupt wakeup. Many idle routines simply never enter such a halt, and so POLLING_NRFLAG can be always left set, completely eliminating resched IPIs when rescheduling the idle task. POLLING_NRFLAG width can be increased, to reduce the chance of resched IPIs. Signed-off-by: Nick Piggin <npiggin@suse.de> Cc: Ingo Molnar <mingo@elte.hu> Cc: Con Kolivas <kernel@kolivas.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] sched: disable preempt in idle tasksNick Piggin2005-11-091-1/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Run idle threads with preempt disabled. Also corrected a bugs in arm26's cpu_idle (make it actually call schedule()). How did it ever work before? Might fix the CPU hotplugging hang which Nigel Cunningham noted. We think the bug hits if the idle thread is preempted after checking need_resched() and before going to sleep, then the CPU offlined. After calling stop_machine_run, the CPU eventually returns from preemption and into the idle thread and goes to sleep. The CPU will continue executing previous idle and have no chance to call play_dead. By disabling preemption until we are ready to explicitly schedule, this bug is fixed and the idle threads generally become more robust. From: alexs <ashepard@u.washington.edu> PPC build fix From: Yoichi Yuasa <yuasa@hh.iij4u.or.jp> MIPS build fix Signed-off-by: Nick Piggin <npiggin@suse.de> Signed-off-by: Yoichi Yuasa <yuasa@hh.iij4u.or.jp> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] useless includes of linux/irq.h in arch/i386Al Viro2005-09-261-2/+0
| | | | | | | | | | Most of these guys are simply not needed (pulled by other stuff via asm-i386/hardirq.h). One that is not entirely useless is hilarious - arch/i386/oprofile/nmi_timer_int.c includes linux/irq.h... as a way to get linux/errno.h Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] x86: make IOPL explicitZachary Amsden2005-09-051-0/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | The pushf/popf in switch_to are ONLY used to switch IOPL. Making this explicit in C code is more clear. This pushf/popf pair was added as a bugfix for leaking IOPL to unprivileged processes when using sysenter/sysexit based system calls (sysexit does not restore flags). When requesting an IOPL change in sys_iopl(), it is just as easy to change the current flags and the flags in the stack image (in case an IRET is required), but there is no reason to force an IRET if we came in from the SYSENTER path. This change is the minimal solution for supporting a paravirtualized Linux kernel that allows user processes to run with I/O privilege. Other solutions require radical rewrites of part of the low level fault / system call handling code, or do not fully support sysenter based system calls. Unfortunately, this added one field to the thread_struct. But as a bonus, on P4, the fastest time measured for switch_to() went from 312 to 260 cycles, a win of about 17% in the fast case through this performance critical path. Signed-off-by: Zachary Amsden <zach@vmware.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] x86: more asm cleanupsZachary Amsden2005-09-051-1/+1
| | | | | | | | | Some more assembler cleanups I noticed along the way. Signed-off-by: Zachary Amsden <zach@vmware.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] i386: load_tls() fixZachary Amsden2005-09-051-7/+12
| | | | | | | | | | | | Subtle fix: load_TLS has been moved after saving %fs and %gs segments to avoid creating non-reversible segments. This could conceivably cause a bug if the kernel ever needed to save and restore fs/gs from the NMI handler. It currently does not, but this is the safest approach to avoiding fs/gs corruption. SMIs are safe, since SMI saves the descriptor hidden state. Signed-off-by: Zachary Amsden <zach@vmware.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] i386: inline asm cleanupZachary Amsden2005-09-051-10/+6
| | | | | | | | | | | | | | | | | | | | | i386 Inline asm cleanup. Use cr/dr accessor functions. Also, a potential bugfix. Also, some CR accessors really should be volatile. Reads from CR0 (numeric state may change in an exception handler), writes to CR4 (flipping CR4.TSD) and reads from CR2 (page fault) prevent instruction re-ordering. I did not add memory clobber to CR3 / CR4 / CR0 updates, as it was not there to begin with, and in no case should kernel memory be clobbered, except when doing a TLB flush, which already has memory clobber. I noticed that page invalidation does not have a memory clobber. I can't find a bug as a result, but there is definitely a potential for a bug here: #define __flush_tlb_single(addr) \ __asm__ __volatile__("invlpg %0": :"m" (*(char *) addr)) Signed-off-by: Zachary Amsden <zach@vmware.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] sys_get_thread_area does not clear the returned argumentBlaisorblade2005-07-271-0/+2
| | | | | | | | | | | sys_get_thread_area does not memset to 0 its struct user_desc info before copying it to user space... since sizeof(struct user_desc) is 16 while the actual datas which are filled are only 12 bytes + 9 bits (across the bitfields), there is a (small) information leak. Signed-off-by: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* Fix up incorrect "unlikely()" on %gs reload in x86 __switch_toLinus Torvalds2005-07-221-8/+12
| | | | | | | | | | | These days %gs is normally the TLS segment, so it's no longer zero. As a result, we shouldn't just assume that %fs/%gs tend to be zero together, but test them independently instead. Also, fix setting of debug registers to use the "next" pointer instead of "current". It so happens that the scheduler will have set the new current pointer before calling __switch_to(), but that's just an implementation detail.
* [PATCH] seccomp: tsc disableAndrea Arcangeli2005-06-271-0/+29
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | I believe at least for seccomp it's worth to turn off the tsc, not just for HT but for the L2 cache too. So it's up to you, either you turn it off completely (which isn't very nice IMHO) or I recommend to apply this below patch. This has been tested successfully on x86-64 against current cogito repository (i686 compiles so I didn't bother testing ;). People selling the cpu through cpushare may appreciate this bit for a peace of mind. There's no way to get any timing info anymore with this applied (gettimeofday is forbidden of course). The seccomp environment is completely deterministic so it can't be allowed to get timing info, it has to be deterministic so in the future I can enable a computing mode that does a parallel computing for each task with server side transparent checkpointing and verification that the output is the same from all the 2/3 seller computers for each task, without the buyer even noticing (for now the verification is left to the buyer client side and there's no checkpointing, since that would require more kernel changes to track the dirty bits but it'll be easy to extend once the basic mode is finished). Eliminating a cold-cache read of the cr4 global variable will save one cacheline during the tlb flush while making the code per-cpu-safe at the same time. Thanks to Mikael Pettersson for noticing the tlb flush wasn't per-cpu-safe. The global tlb flush can run from irq (IPI calling do_flush_tlb_all) but it'll be transparent to the switch_to code since the IPI won't make any change to the cr4 contents from the point of view of the interrupted code and since it's now all per-cpu stuff, it will not race. So no need to disable irqs in switch_to slow path. Signed-off-by: Andrea Arcangeli <andrea@cpushare.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] cpu state clean after hot removeLi Shaohua2005-06-251-11/+9
| | | | | | | | Clean CPU states in order to reuse smp boot code for CPU hotplug. Signed-off-by: Li Shaohua<shaohua.li@intel.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] init call cleanupLi Shaohua2005-06-251-1/+1
| | | | | | | | | Trival patch for CPU hotplug. In CPU identify part, only did cleaup for intel CPUs. Need do for other CPUs if they support S3 SMP. Signed-off-by: Li Shaohua<shaohua.li@intel.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] i386 CPU hotplugZwane Mwaikambo2005-06-251-1/+38
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | (The i386 CPU hotplug patch provides infrastructure for some work which Pavel is doing as well as for ACPI S3 (suspend-to-RAM) work which Li Shaohua <shaohua.li@intel.com> is doing) The following provides i386 architecture support for safely unregistering and registering processors during runtime, updated for the current -mm tree. In order to avoid dumping cpu hotplug code into kernel/irq/* i dropped the cpu_online check in do_IRQ() by modifying fixup_irqs(). The difference being that on cpu offline, fixup_irqs() is called before we clear the cpu from cpu_online_map and a long delay in order to ensure that we never have any queued external interrupts on the APICs. There are additional changes to s390 and ppc64 to account for this change. 1) Add CONFIG_HOTPLUG_CPU 2) disable local APIC timer on dead cpus. 3) Disable preempt around irq balancing to prevent CPUs going down. 4) Print irq stats for all possible cpus. 5) Debugging check for interrupts on offline cpus. 6) Hacky fixup_irqs() to redirect irqs when cpus go off/online. 7) play_dead() for offline cpus to spin inside. 8) Handle offline cpus set in flush_tlb_others(). 9) Grab lock earlier in smp_call_function() to prevent CPUs going down. 10) Implement __cpu_disable() and __cpu_die(). 11) Enable local interrupts in cpu_enable() after fixup_irqs() 12) Don't fiddle with NMI on dead cpu, but leave intact on other cpus. 13) Program IRQ affinity whilst cpu is still in cpu_online_map on offline. Signed-off-by: Zwane Mwaikambo <zwane@linuxpower.ca> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] kprobes: function-return probesHien Nguyen2005-06-231-0/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch adds function-return probes to kprobes for the i386 architecture. This enables you to establish a handler to be run when a function returns. 1. API Two new functions are added to kprobes: int register_kretprobe(struct kretprobe *rp); void unregister_kretprobe(struct kretprobe *rp); 2. Registration and unregistration 2.1 Register To register a function-return probe, the user populates the following fields in a kretprobe object and calls register_kretprobe() with the kretprobe address as an argument: kp.addr - the function's address handler - this function is run after the ret instruction executes, but before control returns to the return address in the caller. maxactive - The maximum number of instances of the probed function that can be active concurrently. For example, if the function is non- recursive and is called with a spinlock or mutex held, maxactive = 1 should be enough. If the function is non-recursive and can never relinquish the CPU (e.g., via a semaphore or preemption), NR_CPUS should be enough. maxactive is used to determine how many kretprobe_instance objects to allocate for this particular probed function. If maxactive <= 0, it is set to a default value (if CONFIG_PREEMPT maxactive=max(10, 2 * NR_CPUS) else maxactive=NR_CPUS) For example: struct kretprobe rp; rp.kp.addr = /* entrypoint address */ rp.handler = /*return probe handler */ rp.maxactive = /* e.g., 1 or NR_CPUS or 0, see the above explanation */ register_kretprobe(&rp); The following field may also be of interest: nmissed - Initialized to zero when the function-return probe is registered, and incremented every time the probed function is entered but there is no kretprobe_instance object available for establishing the function-return probe (i.e., because maxactive was set too low). 2.2 Unregister To unregiter a function-return probe, the user calls unregister_kretprobe() with the same kretprobe object as registered previously. If a probed function is running when the return probe is unregistered, the function will return as expected, but the handler won't be run. 3. Limitations 3.1 This patch supports only the i386 architecture, but patches for x86_64 and ppc64 are anticipated soon. 3.2 Return probes operates by replacing the return address in the stack (or in a known register, such as the lr register for ppc). This may cause __builtin_return_address(0), when invoked from the return-probed function, to return the address of the return-probes trampoline. 3.3 This implementation uses the "Multiprobes at an address" feature in 2.6.12-rc3-mm3. 3.4 Due to a limitation in multi-probes, you cannot currently establish a return probe and a jprobe on the same function. A patch to remove this limitation is being tested. This feature is required by SystemTap (http://sourceware.org/systemtap), and reflects ideas contributed by several SystemTap developers, including Will Cohen and Ananth Mavinakayanahalli. Signed-off-by: Hien Nguyen <hien@us.ibm.com> Signed-off-by: Prasanna S Panchamukhi <prasanna@in.ibm.com> Signed-off-by: Frederik Deweerdt <frederik.deweerdt@laposte.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] xen: x86: Use more usermode macroVincent Hanquez2005-06-231-1/+1
| | | | | | | | | Use the user_mode macro where it's possible. Signed-off-by: Vincent Hanquez <vincent.hanquez@cl.cam.ac.uk> Cc: Ian Pratt <m+Ian.Pratt@cl.cam.ac.uk> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] xen: x86: Use new macro for debugregVincent Hanquez2005-06-231-6/+6
| | | | | | | | | Make use of the 2 new macro set_debugreg and get_debugreg. Signed-off-by: Vincent Hanquez <vincent.hanquez@cl.cam.ac.uk> Cc: Ian Pratt <m+Ian.Pratt@cl.cam.ac.uk> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] Remove i386_ksyms.c, almost.Alexey Dobriyan2005-06-231-0/+7
| | | | | | | | | | | | | | | | | | | | * EXPORT_SYMBOL's moved to other files * #include <linux/config.h>, <linux/module.h> where needed * #include's in i386_ksyms.c cleaned up * After copy-paste, redundant due to Makefiles rules preprocessor directives removed: #ifdef CONFIG_FOO EXPORT_SYMBOL(foo); #endif obj-$(CONFIG_FOO) += foo.o * Tiny reformat to fit in 80 columns Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] x86 stack initialisation fixAlexander Nyberg2005-05-051-6/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The recent change fix-crash-in-entrys-restore_all.patch childregs->esp = esp; p->thread.esp = (unsigned long) childregs; - p->thread.esp0 = (unsigned long) (childregs+1); + p->thread.esp0 = (unsigned long) (childregs+1) - 8; p->thread.eip = (unsigned long) ret_from_fork; introduces an inconsistency between esp and esp0 before the task is run the first time. esp0 is no longer the actual start of the stack, but 8 bytes off. This shows itself clearly in a scenario when a ptracer that is set to also ptrace eventual children traces program1 which then clones thread1. Now the ptracer wants to modify the registers of thread1. The x86 ptrace implementation bases it's knowledge about saved user-space registers upon p->thread.esp0. But this will be a few bytes off causing certain writes to the kernel stack to overwrite a saved kernel function address making the kernel when actually running thread1 jump out into user-space. Very spectacular. The testcase I've used is: /* start with strace -f ./a.out */ #include <pthread.h> #include <stdio.h> void *do_thread(void *p) { for (;;); } int main() { pthread_t one; pthread_create(&one, NULL, &do_thread, NULL); for (;;); return 0; } So, my solution is to instead of just adjusting esp0 that creates an inconsitent state I adjust where the user-space registers are saved with -8 bytes. This gives us the wanted extra bytes on the start of the stack and esp0 is now correct. This solves the issues I saw from the original testcase from Mateusz Berezecki and has survived testing here. I think this should go into -mm a round or two first however as there might be some cruft around depending on pt_regs lying on the start of the stack. That however would have broken with the first change too! It's actually a 2-line diff but I had to move the comment of why the -8 bytes are there a few lines up. Thanks to Zwane for helping me with this. Signed-off-by: Alexander Nyberg <alexn@telia.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] i386/x86_64 segment register access updateH. J. Lu2005-05-011-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The new i386/x86_64 assemblers no longer accept instructions for moving between a segment register and a 32bit memory location, i.e., movl (%eax),%ds movl %ds,(%eax) To generate instructions for moving between a segment register and a 16bit memory location without the 16bit operand size prefix, 0x66, mov (%eax),%ds mov %ds,(%eax) should be used. It will work with both new and old assemblers. The assembler starting from 2.16.90.0.1 will also support movw (%eax),%ds movw %ds,(%eax) without the 0x66 prefix. I am enclosing patches for 2.4 and 2.6 kernels here. The resulting kernel binaries should be unchanged as before, with old and new assemblers, if gcc never generates memory access for unsigned gsindex; asm volatile("movl %%gs,%0" : "=g" (gsindex)); If gcc does generate memory access for the code above, the upper bits in gsindex are undefined and the new assembler doesn't allow it. Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] i386: Use loaddebug macro consistentlyRoland McGrath2005-04-161-7/+0
| | | | | | | | | | | | | | | | This moves the macro loaddebug from asm-i386/suspend.h to asm-i386/processor.h, which is the place that makes sense for it to be defined, removes the extra copy of the same macro in arch/i386/kernel/process.c, and makes arch/i386/kernel/signal.c use the macro in place of its expansion. This is a purely cosmetic cleanup for the normal i386 kernel. However, it is handy for Xen to be able to just redefine the loaddebug macro once instead of also changing the signal.c code. Signed-off-by: Roland McGrath <roland@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] fix crash in entry.S restore_allStas Sergeev2005-04-161-1/+11
| | | | | | | | | | | | | | | | | | | | Fix the access-above-bottom-of-stack crash. 1. Allows to preserve the valueable optimization 2. Works for NMIs 3. Doesn't care whether or not there are more of the like instances where the stack is left empty. 4. Seems to work for me without the crashes:) (akpm: this is still under discussion, although I _think_ it's OK. You might want to hold off) Signed-off-by: Stas Sergeev <stsp@aknet.ru> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* Linux-2.6.12-rc2Linus Torvalds2005-04-161-0/+848
Initial git repository build. I'm not bothering with the full history, even though we have it. We can create a separate "historical" git archive of that later if we want to, and in the meantime it's about 3.2GB when imported into git - space that would just make the early git days unnecessarily complicated, when we don't have a lot of good infrastructure for it. Let it rip!