aboutsummaryrefslogtreecommitdiffstats
path: root/ipc
Commit message (Collapse)AuthorAgeFilesLines
* pid namespaces: changes to show virtual ids to userPavel Emelyanov2007-10-194-12/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is the largest patch in the set. Make all (I hope) the places where the pid is shown to or get from user operate on the virtual pids. The idea is: - all in-kernel data structures must store either struct pid itself or the pid's global nr, obtained with pid_nr() call; - when seeking the task from kernel code with the stored id one should use find_task_by_pid() call that works with global pids; - when showing pid's numerical value to the user the virtual one should be used, but however when one shows task's pid outside this task's namespace the global one is to be used; - when getting the pid from userspace one need to consider this as the virtual one and use appropriate task/pid-searching functions. [akpm@linux-foundation.org: build fix] [akpm@linux-foundation.org: nuther build fix] [akpm@linux-foundation.org: yet nuther build fix] [akpm@linux-foundation.org: remove unneeded casts] Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: Alexey Dobriyan <adobriyan@openvz.org> Cc: Sukadev Bhattiprolu <sukadev@us.ibm.com> Cc: Oleg Nesterov <oleg@tv-sign.ru> Cc: Paul Menage <menage@google.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* sysctl mqueue: remove the binary sysctl numbersEric W. Biederman2007-10-181-10/+0
| | | | | | | | | | Because of a conflict with FS_INODE_NR none of the binary sysctl numbers use by mqueue, were available to user space. So just remove them. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Cc: Alexey Dobriyan <adobriyan@sw.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* r/o bind mounts: filesystem helpers for custom 'struct file'sDave Hansen2007-10-171-8/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Why do we need r/o bind mounts? This feature allows a read-only view into a read-write filesystem. In the process of doing that, it also provides infrastructure for keeping track of the number of writers to any given mount. This has a number of uses. It allows chroots to have parts of filesystems writable. It will be useful for containers in the future because users may have root inside a container, but should not be allowed to write to somefilesystems. This also replaces patches that vserver has had out of the tree for several years. It allows security enhancement by making sure that parts of your filesystem read-only (such as when you don't trust your FTP server), when you don't want to have entire new filesystems mounted, or when you want atime selectively updated. I've been using the following script to test that the feature is working as desired. It takes a directory and makes a regular bind and a r/o bind mount of it. It then performs some normal filesystem operations on the three directories, including ones that are expected to fail, like creating a file on the r/o mount. This patch: Some filesystems forego the vfs and may_open() and create their own 'struct file's. This patch creates a couple of helper functions which can be used by these filesystems, and will provide a unified place which the r/o bind mount code may patch. Also, rename an existing, static-scope init_file() to a less generic name. Signed-off-by: Dave Hansen <haveblue@us.ibm.com> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* ipc namespace: remove config ipc ns fixCedric Le Goater2007-10-171-4/+0
| | | | | | | | | | | | Finish the work : kill all #ifdef CONFIG_IPC_NS. Thanks Robert ! Signed-off-by: Cedric Le Goater <clg@fr.ibm.com> Cc: Eric Biederman <ebiederm@xmision.com> Cc: Robert P. J. Day <rpjday@mindspring.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* ipc/shm.c: make 2 functions staticAdrian Bunk2007-10-171-2/+3
| | | | | | | | This patch makes two needlessly global functions static. Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* Slab API: remove useless ctor parameter and reorder parametersChristoph Lameter2007-10-171-1/+1
| | | | | | | | | | | | | | | | | | | | | Slab constructors currently have a flags parameter that is never used. And the order of the arguments is opposite to other slab functions. The object pointer is placed before the kmem_cache pointer. Convert ctor(void *object, struct kmem_cache *s, unsigned long flags) to ctor(struct kmem_cache *s, void *object) throughout the kernel [akpm@linux-foundation.org: coupla fixes] Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* [NET]: cleanup 3rd argument in netlink_sendskbDenis V. Lunev2007-10-101-3/+2
| | | | | | | | | netlink_sendskb does not use third argument. Clean it and save a couple of bytes. Signed-off-by: Denis V. Lunev <den@openvz.org> Acked-by: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> Signed-off-by: David S. Miller <davem@davemloft.net>
* Fix user struct leakage with locked IPC shem segmentPavel Emelianov2007-07-311-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When user locks an ipc shmem segmant with SHM_LOCK ctl and the segment is already locked the shmem_lock() function returns 0. After this the subsequent code leaks the existing user struct: == ipc/shm.c: sys_shmctl() == ... err = shmem_lock(shp->shm_file, 1, user); if (!err) { shp->shm_perm.mode |= SHM_LOCKED; shp->mlock_user = user; } ... == Other results of this are: 1. the new shp->mlock_user is not get-ed and will point to freed memory when the task dies. 2. the RLIMIT_MEMLOCK is screwed on both user structs. The exploit looks like this: == id = shmget(...); setresuid(uid, 0, 0); shmctl(id, SHM_LOCK, NULL); setresuid(uid + 1, 0, 0); shmctl(id, SHM_LOCK, NULL); == My solution is to return 0 to the userspace and do not change the segment's user. Signed-off-by: Pavel Emelianov <xemul@openvz.org> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* NOMMU: Fix SYSV IPC SHMDavid Howells2007-07-311-0/+2
| | | | | | | | | | Fix the SYSV IPC SHM to work with the changes applied by the new fault handler patches when CONFIG_MMU=n. Signed-off-by: David Howells <dhowells@redhat.com> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* mm: Remove slab destructors from kmem_cache_create().Paul Mundt2007-07-201-1/+1
| | | | | | | | | | | | | | Slab destructors were no longer supported after Christoph's c59def9f222d44bb7e2f0a559f2906191a0862d7 change. They've been BUGs for both slab and slub, and slob never supported them either. This rips out support for the dtor pointer from kmem_cache_create() completely and fixes up every single callsite in the kernel (there were about 224, not including the slab allocator definitions themselves, or the documentation references). Signed-off-by: Paul Mundt <lethal@linux-sh.org>
* mm: fault feedback #1Nick Piggin2007-07-191-3/+2
| | | | | | | | | | | | | | | | | | | | | | | | Change ->fault prototype. We now return an int, which contains VM_FAULT_xxx code in the low byte, and FAULT_RET_xxx code in the next byte. FAULT_RET_ code tells the VM whether a page was found, whether it has been locked, and potentially other things. This is not quite the way he wanted it yet, but that's changed in the next patch (which requires changes to arch code). This means we no longer set VM_CAN_INVALIDATE in the vma in order to say that a page is locked which requires filemap_nopage to go away (because we can no longer remain backward compatible without that flag), but we were going to do that anyway. struct fault_data is renamed to struct vm_fault as Linus asked. address is now a void __user * that we should firmly encourage drivers not to use without really good reason. The page is now returned via a page pointer in the vm_fault struct. Signed-off-by: Nick Piggin <npiggin@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* mm: merge populate and nopage into fault (fixes nonlinear)Nick Piggin2007-07-191-4/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Nonlinear mappings are (AFAIKS) simply a virtual memory concept that encodes the virtual address -> file offset differently from linear mappings. ->populate is a layering violation because the filesystem/pagecache code should need to know anything about the virtual memory mapping. The hitch here is that the ->nopage handler didn't pass down enough information (ie. pgoff). But it is more logical to pass pgoff rather than have the ->nopage function calculate it itself anyway (because that's a similar layering violation). Having the populate handler install the pte itself is likewise a nasty thing to be doing. This patch introduces a new fault handler that replaces ->nopage and ->populate and (later) ->nopfn. Most of the old mechanism is still in place so there is a lot of duplication and nice cleanups that can be removed if everyone switches over. The rationale for doing this in the first place is that nonlinear mappings are subject to the pagefault vs invalidate/truncate race too, and it seemed stupid to duplicate the synchronisation logic rather than just consolidate the two. After this patch, MAP_NONBLOCK no longer sets up ptes for pages present in pagecache. Seems like a fringe functionality anyway. NOPAGE_REFAULT is removed. This should be implemented with ->fault, and no users have hit mainline yet. [akpm@linux-foundation.org: cleanup] [randy.dunlap@oracle.com: doc. fixes for readahead] [akpm@linux-foundation.org: build fix] Signed-off-by: Nick Piggin <npiggin@suse.de> Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Cc: Mark Fasheh <mark.fasheh@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* arch/i386/* fs/* ipc/*: mark variables with uninitialized_var()Jeff Garzik2007-07-172-3/+3
| | | | | | | | Mark variables with uninitialized_var() if such a warning appears, and analysis proves that the var is initialized properly on all paths it is used. Signed-off-by: Jeff Garzik <jeff@garzik.org>
* remove CONFIG_UTS_NS and CONFIG_IPC_NSCedric Le Goater2007-07-165-25/+6
| | | | | | | | | | | | | | CONFIG_UTS_NS and CONFIG_IPC_NS have very little value as they only deactivate the unshare of the uts and ipc namespaces and do not improve performance. Signed-off-by: Cedric Le Goater <clg@fr.ibm.com> Acked-by: "Serge E. Hallyn" <serue@us.ibm.com> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Herbert Poetzl <herbert@13thfloor.at> Cc: Pavel Emelianov <xemul@openvz.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* fix logic error in ipc compat semctl()Alexander Graf2007-07-061-1/+1
| | | | | | | | | When calling a semctl(IPC_STAT) without IPC_64 the check if the memory is unevaluated. This patch fixes this. Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* shm: fix the filename of hugetlb sysv shared memoryEric W. Biederman2007-06-161-3/+3
| | | | | | | | | | | | | | | | | | | | | | Some user space tools need to identify SYSV shared memory when examining /proc/<pid>/maps. To do so they look for a block device with major zero, a dentry named SYSV<sysv key>, and having the minor of the internal sysv shared memory kernel mount. To help these tools and to make it easier for people just browsing /proc/<pid>/maps this patch modifies hugetlb sysv shared memory to use the SYSV<key> dentry naming convention. User space tools will still have to be aware that hugetlb sysv shared memory lives on a different internal kernel mount and so has a different block device minor number from the rest of sysv shared memory. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Cc: "Serge E. Hallyn" <serge@hallyn.com> Cc: Albert Cahalan <acahalan@gmail.com> Cc: Badari Pulavarty <pbadari@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* hugetlb: fix get_policy for stacked shared memory filesAdam Litke2007-06-161-1/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Here's another breakage as a result of shared memory stacked files :( The NUMA policy for a VMA is determined by checking the following (in the order given): 1) vma->vm_ops->get_policy() (if defined) 2) vma->vm_policy (if defined) 3) task->mempolicy (if defined) 4) Fall back to default_policy By switching to stacked files for shared memory, get_policy() is now always set to shm_get_policy which is a wrapper function. This causes us to stop at step 1, which yields NULL for hugetlb instead of task->mempolicy which was the previous (and correct) result. This patch modifies the shm_get_policy() wrapper to maintain steps 1-3 for the wrapped vm_ops. (akpm: the refcounting of mempolicies is busted and this patch does nothing to improve it) Signed-off-by: Adam Litke <agl@us.ibm.com> Acked-by: William Irwin <bill.irwin@oracle.com> Cc: dean gaudet <dean@arctic.org> Cc: Christoph Lameter <clameter@sgi.com> Cc: Andi Kleen <ak@suse.de> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* Restore shmid as inode# to fix /proc/pid/maps ABI breakageBadari Pulavarty2007-06-161-0/+5
| | | | | | | | | | | | | shmid used to be stored as inode# for shared memory segments. Some of the proc-ps tools use this from /proc/pid/maps. Recent cleanups to newseg() changed it. This patch sets inode number back to shared memory id to fix breakage. Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com> Cc: "Albert Cahalan" <acahalan@gmail.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* Remove SLAB_CTOR_CONSTRUCTORChristoph Lameter2007-05-171-2/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | SLAB_CTOR_CONSTRUCTOR is always specified. No point in checking it. Signed-off-by: Christoph Lameter <clameter@sgi.com> Cc: David Howells <dhowells@redhat.com> Cc: Jens Axboe <jens.axboe@oracle.com> Cc: Steven French <sfrench@us.ibm.com> Cc: Michael Halcrow <mhalcrow@us.ibm.com> Cc: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Cc: Miklos Szeredi <miklos@szeredi.hu> Cc: Steven Whitehouse <swhiteho@redhat.com> Cc: Roman Zippel <zippel@linux-m68k.org> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Dave Kleikamp <shaggy@austin.ibm.com> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Cc: "J. Bruce Fields" <bfields@fieldses.org> Cc: Anton Altaparmakov <aia21@cantab.net> Cc: Mark Fasheh <mark.fasheh@oracle.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Christoph Hellwig <hch@lst.de> Cc: Jan Kara <jack@ucw.cz> Cc: David Chinner <dgc@sgi.com> Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* [PATCH] complete message queue auditingAmy Griffis2007-05-111-0/+4
| | | | | | | | | | Handle the edge cases for POSIX message queue auditing. Collect inode info when opening an existing mq, and for send/receive operations. Remove audit_inode_update() as it has really evolved into the equivalent of audit_inode(). Signed-off-by: Amy Griffis <amy.griffis@hp.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* header cleaning: don't include smp_lock.h when not usedRandy Dunlap2007-05-082-2/+0
| | | | | | | | | | | | Remove includes of <linux/smp_lock.h> where it is not used/needed. Suggested by Al Viro. Builds cleanly on x86_64, i386, alpha, ia64, powerpc, sparc, sparc64, and arm (all 59 defconfigs). Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* Merge sys_clone()/sys_unshare() nsproxy and namespace handlingBadari Pulavarty2007-05-081-43/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | sys_clone() and sys_unshare() both makes copies of nsproxy and its associated namespaces. But they have different code paths. This patch merges all the nsproxy and its associated namespace copy/clone handling (as much as possible). Posted on container list earlier for feedback. - Create a new nsproxy and its associated namespaces and pass it back to caller to attach it to right process. - Changed all copy_*_ns() routines to return a new copy of namespace instead of attaching it to task->nsproxy. - Moved the CAP_SYS_ADMIN checks out of copy_*_ns() routines. - Removed unnessary !ns checks from copy_*_ns() and added BUG_ON() just incase. - Get rid of all individual unshare_*_ns() routines and make use of copy_*_ns() instead. [akpm@osdl.org: cleanups, warning fix] [clg@fr.ibm.com: remove dup_namespaces() declaration] [serue@us.ibm.com: fix CONFIG_IPC_NS=n, clone(CLONE_NEWIPC) retval] [akpm@linux-foundation.org: fix build with CONFIG_SYSVIPC=n] Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com> Signed-off-by: Serge Hallyn <serue@us.ibm.com> Cc: Cedric Le Goater <clg@fr.ibm.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: <containers@lists.osdl.org> Signed-off-by: Cedric Le Goater <clg@fr.ibm.com> Cc: Oleg Nesterov <oleg@tv-sign.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* Cap shmmax at INT_MAX in compat shminfoGuy Streeter2007-05-081-0/+4
| | | | | | | | | | The value of shmmax may be larger than will fit in the struct used by the 32bit compat version of sys_shmctl. This change mirrors what the normal sys_shmctl does when called with the old IPC_INFO command. Signed-off-by: Guy Streeter <streeter@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* slab allocators: Remove SLAB_DEBUG_INITIAL flagChristoph Lameter2007-05-071-2/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | I have never seen a use of SLAB_DEBUG_INITIAL. It is only supported by SLAB. I think its purpose was to have a callback after an object has been freed to verify that the state is the constructor state again? The callback is performed before each freeing of an object. I would think that it is much easier to check the object state manually before the free. That also places the check near the code object manipulation of the object. Also the SLAB_DEBUG_INITIAL callback is only performed if the kernel was compiled with SLAB debugging on. If there would be code in a constructor handling SLAB_DEBUG_INITIAL then it would have to be conditional on SLAB_DEBUG otherwise it would just be dead code. But there is no such code in the kernel. I think SLUB_DEBUG_INITIAL is too problematic to make real use of, difficult to understand and there are easier ways to accomplish the same effect (i.e. add debug code before kfree). There is a related flag SLAB_CTOR_VERIFY that is frequently checked to be clear in fs inode caches. Remove the pointless checks (they would even be pointless without removeal of SLAB_DEBUG_INITIAL) from the fs constructors. This is the last slab flag that SLUB did not support. Remove the check for unimplemented flags from SLUB. Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* [PATCH] ipcns: fix !CONFIG_IPC_NS behaviorSerge E. Hallyn2007-03-271-0/+7
| | | | | | | | | | | | When CONFIG_IPC_NS=n, clone(CLONE_NEWIPC) claims success, but did not actually clone a new IPC namespace. Fix this to return -EINVAL so the caller knows his request was denied. Signed-off-by: Serge E. Hallyn <serue@us.ibm.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* [PATCH] mqueue: nested locking annotationPeter Zijlstra2007-03-061-1/+2
| | | | | | | | Fix http://bugzilla.kernel.org/show_bug.cgi?id=8130 Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* [PATCH] Fix get_unmapped_area and fsync for hugetlb shm segmentsAdam Litke2007-03-011-6/+26
| | | | | | | | | | | | | | | | | | | | This patch provides the following hugetlb-related fixes to the recent stacked shm files changes: - Update is_file_hugepages() so it will reconize hugetlb shm segments. - get_unmapped_area must be called with the nested file struct to handle the sfd->file->f_ops->get_unmapped_area == NULL case. - The fsync f_op must be wrapped since it is specified in the hugetlbfs f_ops. This is based on proposed fixes from Eric Biederman that were debugged and tested by me. Without it, attempting to use hugetlb shared memory segments on powerpc (and likely ia64) will kill your box. Signed-off-by: Adam Litke <agl@us.ibm.com> Cc: Eric Biederman <ebiederm@xmission.com> Cc: Andrew Morton <akpm@linux-foundation.org> Acked-by: William Irwin <bill.irwin@oracle.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* [PATCH] make ipc/shm.c:shm_nopage() staticAdrian Bunk2007-03-011-2/+2
| | | | | | | | | shm_nopage() can become static. Signed-off-by: Adrian Bunk <bunk@stusta.de> Acked-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* [PATCH] shm: make sysv ipc shared memory use stacked filesEric W. Biederman2007-02-201-85/+155
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The current ipc shared memory code runs into several problems because it does not quite use files like the rest of the kernel. With the option of backing ipc shared memory with either hugetlbfs or ordinary shared memory the problems got worse. With the added support for ipc namespaces things behaved so unexpected that we now have several bad namespace reference counting bugs when using what appears at first glance to be a reasonable idiom. So to attack these problems and hopefully make the code more maintainable this patch simply uses the files provided by other parts of the kernel and builds it's own files out of them. The shm files are allocated in do_shmat and freed when their reference count drops to zero with their last unmap. The file and vm operations that we don't want to implement or we don't implement completely we just delegate to the operations of our backing file. This means that we now get an accurate shm_nattch count for we have a hugetlbfs inode for backing store, and the shm accounting of last attach and last detach time work as well. This means that getting a reference to the ipc namespace when we create the file and dropping the referenece in the release method is now safe and correct. This means we no longer need a special case for clearing VM_MAYWRITE as our file descriptor now only has write permissions when we have requested write access when calling shmat. Although VM_SHARED is now cleared as well which I believe is harmless and is mostly likely a minor bug fix. By using the same set of operations for both the hugetlb case and regular shared memory case shmdt is not simplified and made slightly more correct as now the test "vma->vm_ops == &shm_vm_ops" is 100% accurate in spotting all shared memory regions generated from sysvipc shared memory. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Cc: Michal Piotrowski <michal.k.k.piotrowski@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* [PATCH] sysctl: remove insert_at_head from register_sysctlEric W. Biederman2007-02-142-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The semantic effect of insert_at_head is that it would allow new registered sysctl entries to override existing sysctl entries of the same name. Which is pain for caching and the proc interface never implemented. I have done an audit and discovered that none of the current users of register_sysctl care as (excpet for directories) they do not register duplicate sysctl entries. So this patch simply removes the support for overriding existing entries in the sys_sysctl interface since no one uses it or cares and it makes future enhancments harder. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Acked-by: Ralf Baechle <ralf@linux-mips.org> Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Russell King <rmk@arm.linux.org.uk> Cc: David Howells <dhowells@redhat.com> Cc: "Luck, Tony" <tony.luck@intel.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Andi Kleen <ak@muc.de> Cc: Jens Axboe <axboe@kernel.dk> Cc: Corey Minyard <minyard@acm.org> Cc: Neil Brown <neilb@suse.de> Cc: "John W. Linville" <linville@tuxdriver.com> Cc: James Bottomley <James.Bottomley@steeleye.com> Cc: Jan Kara <jack@ucw.cz> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Cc: Mark Fasheh <mark.fasheh@oracle.com> Cc: David Chinner <dgc@sgi.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Patrick McHardy <kaber@trash.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* [PATCH] sysctl: move SYSV IPC sysctls to their own fileEric W. Biederman2007-02-142-0/+184
| | | | | | | | | | | | | | | This is just a simple cleanup to keep kernel/sysctl.c from getting to crowded with special cases, and by keeping all of the ipc logic to together it makes the code a little more readable. [gcoady.lk@gmail.com: build fix] Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Cc: Serge E. Hallyn <serue@us.ibm.com> Cc: Herbert Poetzl <herbert@13thfloor.at> Cc: Kirill Korotaev <dev@sw.ru> Signed-off-by: Grant Coady <gcoady.lk@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* [PATCH] mark struct inode_operations const 2Arjan van de Ven2007-02-121-2/+2
| | | | | | | | | | | Many struct inode_operations in the kernel can be "const". Marking them const moves these to the .rodata section, which avoids false sharing with potential dirty data. In addition it'll catch accidental writes at compile time to these shared resources. Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* [PATCH] mark struct file_operations const 7Arjan van de Ven2007-02-123-6/+6
| | | | | | | | | | | Many struct file_operations in the kernel can be "const". Marking them const moves these to the .rodata section, which avoids false sharing with potential dirty data. In addition it'll catch accidental writes at compile time to these shared resources. Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* [PATCH] ipc: save the ipc namespace while reading proc filesEric W. Biederman2007-02-121-13/+45
| | | | | | | | | | | | | | | | | | The problem we were assuming that current->nsproxy->ipc_ns would never change while someone has our file in /proc/sysvipc/ file open. Given that this can change with both unshare and by passing the file descriptor to another process that assumption is occasionally wrong. Therefore this patch causes /proc/sysvipc/* to cache the namespace and increment it's count when we open the file and to decrement the count when we close the file, ensuring consistent operation with no surprises. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Cc: Serge E. Hallyn <serue@us.ibm.com> Cc: Herbert Poetzl <herbert@13thfloor.at> Cc: Kirill Korotaev <dev@sw.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* [PATCH] Numerous fixes to kernel-doc info in source files.Robert P. J. Day2007-02-111-11/+10
| | | | | | | | | | | | | | | A variety of (mostly) innocuous fixes to the embedded kernel-doc content in source files, including: * make multi-line initial descriptions single line * denote some function names, constants and structs as such * change erroneous opening '/*' to '/**' in a few places * reword some text for clarity Signed-off-by: Robert P. J. Day <rpjday@mindspring.com> Cc: "Randy.Dunlap" <rdunlap@xenotime.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* [PATCH] correct sys_shmget allocation checkGuy Streeter2007-01-231-1/+1
| | | | | | | | | As written, sys_shmget will return ENOSPC when one page is still available for allocation. This patch corrects the test. Signed-off-by: Guy Streeter <guy.streeter+lkml@gmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> --
* [PATCH] getting rid of all casts of k[cmz]alloc() callsRobert P. J. Day2006-12-131-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Run this: #!/bin/sh for f in $(grep -Erl "\([^\)]*\) *k[cmz]alloc" *) ; do echo "De-casting $f..." perl -pi -e "s/ ?= ?\([^\)]*\) *(k[cmz]alloc) *\(/ = \1\(/" $f done And then go through and reinstate those cases where code is casting pointers to non-pointers. And then drop a few hunks which conflicted with outstanding work. Cc: Russell King <rmk@arm.linux.org.uk>, Ian Molton <spyro@f2s.com> Cc: Mikael Starvik <starvik@axis.com> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Cc: Roman Zippel <zippel@linux-m68k.org> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Kyle McMartin <kyle@mcmartin.ca> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Jeff Dike <jdike@addtoit.com> Cc: Greg KH <greg@kroah.com> Cc: Jens Axboe <jens.axboe@oracle.com> Cc: Paul Fulghum <paulkf@microgate.com> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Cc: Karsten Keil <kkeil@suse.de> Cc: Mauro Carvalho Chehab <mchehab@infradead.org> Cc: Jeff Garzik <jeff@garzik.org> Cc: James Bottomley <James.Bottomley@steeleye.com> Cc: Ian Kent <raven@themaw.net> Cc: Steven French <sfrench@us.ibm.com> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Neil Brown <neilb@cse.unsw.edu.au> Cc: Jaroslav Kysela <perex@suse.cz> Cc: Takashi Iwai <tiwai@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] struct path: convert ipcJosef Sipek2006-12-082-16/+16
| | | | | | Signed-off-by: Josef Sipek <jsipek@fsl.cs.sunysb.edu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] kernel core: replace kmalloc+memset with kzallocBurman Yan2006-12-071-2/+1
| | | | | Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] Fix the size limit of compat space msgsizesuzuki2006-12-072-27/+40
| | | | | | | | | | | | | | | | | | | | | | | Currently we allocate 64k space on the user stack and use it the msgbuf for sys_{msgrcv,msgsnd} for compat and the results are later copied in user [ by copy_in_user]. This patch introduces helper routines for sys_{msgrcv,msgsnd} as below: do_msgsnd() : Accepts the mtype and user space ptr to the buffer along with the msqid and msgflg. do_msgrcv() : Accepts a kernel space ptr to mtype and a userspace ptr to the buffer. The mtype has to be copied back the user space msgbuf by the caller. These changes avoid the need to allocate the msgsize on the userspace ( thus removing the size limt ) and the overhead of an extra copy_in_user(). Signed-off-by: Suzuki K P <suzuki@in.ibm.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] slab: remove kmem_cache_tChristoph Lameter2006-12-071-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Replace all uses of kmem_cache_t with struct kmem_cache. The patch was generated using the following script: #!/bin/sh # # Replace one string by another in all the kernel sources. # set -e for file in `find * -name "*.c" -o -name "*.h"|xargs grep -l $1`; do quilt add $file sed -e "1,\$s/$1/$2/g" $file >/tmp/$$ mv /tmp/$$ $file quilt refresh done The script was run like this sh replace kmem_cache_t "struct kmem_cache" Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] slab: remove SLAB_KERNELChristoph Lameter2006-12-071-1/+1
| | | | | | | | SLAB_KERNEL is an alias of GFP_KERNEL. Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* WorkStruct: Pass the work_struct pointer instead of context dataDavid Howells2006-11-221-1/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pass the work_struct pointer to the work function rather than context data. The work function can use container_of() to work out the data. For the cases where the container of the work_struct may go away the moment the pending bit is cleared, it is made possible to defer the release of the structure by deferring the clearing of the pending bit. To make this work, an extra flag is introduced into the management side of the work_struct. This governs auto-release of the structure upon execution. Ordinarily, the work queue executor would release the work_struct for further scheduling or deallocation by clearing the pending bit prior to jumping to the work function. This means that, unless the driver makes some guarantee itself that the work_struct won't go away, the work function may not access anything else in the work_struct or its container lest they be deallocated.. This is a problem if the auxiliary data is taken away (as done by the last patch). However, if the pending bit is *not* cleared before jumping to the work function, then the work function *may* access the work_struct and its container with no problems. But then the work function must itself release the work_struct by calling work_release(). In most cases, automatic release is fine, so this is the default. Special initiators exist for the non-auto-release case (ending in _NAR). Signed-Off-By: David Howells <dhowells@redhat.com>
* Revert unintentional "volatile" changes in ipc/msg.cLinus Torvalds2006-11-041-1/+1
| | | | | | | | | | | | | | | | | | | Commit 5a06a363ef48444186f18095ae1b932dddbbfa89 ("[PATCH] ipc/msg.c: clean up coding style") breaks fakeroot on Alpha (variously hangs or oopses), according to a report by Falk Hueffner. The fact that the code seems to rely on compiler access ordering through the use of "volatile" is a pretty certain sign that the code has locking problems, and we should fix those properly and then remove the whole "volatile" entirely. But in the meantime, the movement of "volatile" was unintentional, and should be reverted. Cc: Falk Hueffner <falk@debian.org> Cc: Andrew Morton <akpm@osdl.org> Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] Fix ipc entries removalPavel Emelianov2006-11-035-1/+16
| | | | | | | | | | | | | | | | | | | | Fix two issuses related to ipc_ids->entries freeing. 1. When freeing ipc namespace we need to free entries allocated with ipc_init_ids(). 2. When removing old entries in grow_ary() ipc_rcu_putref() may be called on entries set to &ids->nullentry earlier in ipc_init_ids(). This is almost impossible without namespaces, but with them this situation becomes possible. Found during OpenVZ testing after obvious leaks in beancounters. Signed-off-by: Pavel Emelianov <xemul@openvz.org> Cc: Kirill Korotaev <dev@openvz.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* Michal Wronski: update contact infoMichal Wronski2006-10-031-1/+1
| | | | | | | My email has changed. Signed-Off-By: Michal Wronski <michal.wronski@gmail.com> Signed-off-by: Adrian Bunk <bunk@stusta.de>
* fix file specification in commentsUwe Zeisberger2006-10-031-1/+1
| | | | | | | Many files include the filename at the beginning, serveral used a wrong one. Signed-off-by: Uwe Zeisberger <Uwe_Zeisberger@digi.com> Signed-off-by: Adrian Bunk <bunk@stusta.de>
* [PATCH] ipc: replace kmalloc and memset in get_undo_list with kzallocMatt Helsley2006-10-021-4/+1
| | | | | | | | | | | | | | Simplify get_undo_list() by dropping the unnecessary cast, removing the size variable, and switching to kzalloc() instead of a kmalloc() followed by a memset(). This cleanup was split then modified from Jes Sorenson's Task Notifiers patches. Signed-off-by: Matt Helsley <matthltc@us.ibm.com> Cc: Jes Sorensen <jes@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] IPC namespace - shmKirill Korotaev2006-10-021-86/+168
| | | | | | | | | IPC namespace support for IPC shm code. Signed-off-by: Pavel Emelianiov <xemul@openvz.org> Signed-off-by: Kirill Korotaev <dev@openvz.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] IPC namespace - semKirill Korotaev2006-10-021-71/+130
| | | | | | | | | IPC namespace support for IPC sem code. Signed-off-by: Pavel Emelianiov <xemul@openvz.org> Signed-off-by: Kirill Korotaev <dev@openvz.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>