| Commit message (Collapse) | Author | Age | Files | Lines |
... | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
Randy Dunlap caught this built error on i386:
fs/built-in.o: In function `hash_index':
dir.c:(.text+0x6c1f2): undefined reference to `__umoddi3'
|
| | | | |
| | | | |
| | | | |
| | | | | |
Spotted by Dan Carpenter.
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
This is a new flash file system. See
Documentation/filesystems/logfs.txt
Signed-off-by: Joern Engel <joern@logfs.org>
|
|\ \ \ \ \
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
* 'for-linus' of git://oss.sgi.com/xfs/xfs: (21 commits)
xfs: return inode fork offset in bulkstat for fsr
xfs: Increase the default size of the reserved blocks pool
xfs: truncate delalloc extents when IO fails in writeback
xfs: check for more work before sleeping in xfssyncd
xfs: Fix a build warning in xfs_aops.c
xfs: fix locking for inode cache radix tree tag updates
xfs: remove xfs_ipin/xfs_iunpin
xfs: cleanup xfs_iunpin_wait/xfs_iunpin_nowait
xfs: kill xfs_lrw.h
xfs: factor common xfs_trans_bjoin code
xfs: stop passing opaque handles to xfs_log.c routines
xfs: split xfs_bmap_btalloc
xfs: fix xfs_fsblock_t tracing
xfs: fix inode pincount check in fsync
xfs: Non-blocking inode locking in IO completion
xfs: implement optimized fdatasync
xfs: remove wrapper for the fsync file operation
xfs: remove wrappers for read/write file operations
xfs: merge xfs_lrw.c into xfs_file.c
xfs: fix dquota trace format
...
|
| |\ \ \ \ \ |
|
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | | |
So that fsr can attempt to get the fork offset of the temporary
inode it uses the same as the inode it is defragmenting, pass the
fork offset out in the bulkstat information.
The bulkstat structure has padding that has always been zeroed, so
userspace can tell if this field is set or not by use of the xattr
present flag and a non-zero value for the fork offset.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alex Elder <aelder@sgi.com>
|
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | | |
The current default size of the reserved blocks pool is easy to deplete
with certain workloads, in particular workloads that do lots of concurrent
delayed allocation extent conversions. If enough transactions are running
in parallel and the entire pool is consumed then subsequent calls to
xfs_trans_reserve() will fail with ENOSPC. Also add a rate limited
warning so we know if this starts happening again.
This is an updated version of an old patch from Lachlan McIlroy.
Signed-off-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Alex Elder <aelder@sgi.com>
|
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | | |
We currently use block_invalidatepage() to clean up pages where I/O
fails in ->writepage(). Unfortunately, if the page has delalloc
regions on it, we fail to remove the delalloc regions when we
invalidate the page. This can result in tripping a BUG() in
xfs_get_blocks() later on if a direct IO read is done on that same
region - the delalloc extent is returned when none is supposed to be
there.
Fix this by truncating away the delalloc regions on the page before
invalidating it. Because they are delalloc, we can do this without
needing a transaction. Indeed - if we get ENOSPC errors, we have to
be able to do this truncation without a transaction as there is
no space left for block reservation (typically why we see a ENOSPC
in writeback).
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alex Elder <aelder@sgi.com>
|
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | | |
xfssyncd processes a queue of work by detaching the queue and
then iterating over all the work items. It then sleeps for a
time period or until new work comes in. If new work is queued
while xfssyncd is actively processing the detached work queue,
it will not process that new work until after a sleep timeout
or the next work event queued wakes it.
Fix this by checking the work queue again before going to sleep.
Signed-off-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alex Elder <aelder@sgi.com>
|
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | | |
Fix a build warning that slipped through. Dave Chinner had posted
an updated version of his patch but the previous version--without
this fix--was what got committed.
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Alex Elder <aelder@sgi.com>
|
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | | |
The radix-tree code requires it's users to serialize tag updates
against other updates to the tree. While XFS protects tag updates
against each other it does not serialize them against updates of the
tree contents, which can lead to tag corruption. Fix the inode
cache to always take pag_ici_lock in exclusive mode when updating
radix tree tags.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reported-by: Patrick Schreurs <patrick@news-service.com>
Tested-by: Patrick Schreurs <patrick@news-service.com>
Signed-off-by: Alex Elder <aelder@sgi.com>
|
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | | |
Inodes are only pinned/unpinned via the inode item methods, and lots of
code relies on that fact. So remove the separate xfs_ipin/xfs_iunpin
helpers and merge them into their only callers. This also fixes up
various duplicate and/or incorrect comments.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Alex Elder <aelder@sgi.com>
|
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | | |
Remove the inode item pointer and ili_last_lsn checks in
__xfs_iunpin_wait as any pinned inode is guaranteed to have them
valid. After this the xfs_iunpin_nowait case is nothing more than a
xfs_log_force_lsn, as we know that the caller has already checked
the pincount.
Make xfs_iunpin_nowait the new low-level routine just doing the log
force and rewrite xfs_iunpin_wait around it.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Alex Elder <aelder@sgi.com>
|
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | | |
Move the two declarations to better fitting headers now that
xfs_lrw.c is gone.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Alex Elder <aelder@sgi.com>
|
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | | |
Most of xfs_trans_bjoin is duplicated in xfs_trans_get_buf,
xfs_trans_getsb and xfs_trans_read_buf. Add a new _xfs_trans_bjoin
which can be called by all four functions.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Alex Elder <aelder@sgi.com>
|
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | | |
Currenly we pass opaque xfs_log_ticket_t handles instead of
struct xlog_ticket pointers, and void pointers instead of
struct xlog_in_core pointers to various log manager functions.
Instead pass properly typed pointers after adding forward
declarations for them to xfs_log.h, and adjust the touched
function prototypes to the standard XFS style while at it.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Alex Elder <aelder@sgi.com>
|
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | | |
Split out the nullfb case into a separate function to reduce the stack
footprint and make the code more readable.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Alex Elder <aelder@sgi.com>
|
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | | |
Using a static buffer in xfs_fmtfsblock means we can corrupt traces if
multiple CPUs hit this code path at the same. Just remove xfs_fmtfsblock
for now and print the block number purely numerical. If we want the
NULLFSBLOCK and NULLSTARTBLOCK formatting back the best way would be
a decoding plugin in the trace-cmd userspace command.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Alex Elder <aelder@sgi.com>
|
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | | |
We need to hold the ilock to check the inode pincount safely. While
we're at it also remove the check for ip->i_itemp->ili_last_lsn, a
pinned inode always has it set.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Alex Elder <aelder@sgi.com>
|
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | | |
The introduction of barriers to loop devices has created a new IO
order completion dependency that XFS does not handle. The loop
device implements barriers using fsync and so turns a log IO in the
XFS filesystem on the loop device into a data IO in the backing
filesystem. That is, the completion of log IOs in the loop
filesystem are now dependent on completion of data IO in the backing
filesystem.
This can cause deadlocks when a flush daemon issues a log force with
an inode locked because the IO completion of IO on the inode is
blocked by the inode lock. This in turn prevents further data IO
completion from occuring on all XFS filesystems on that CPU (due to
the shared nature of the completion queues). This then prevents the
log IO from completing because the log is waiting for data IO
completion as well.
The fix for this new completion order dependency issue is to make
the IO completion inode locking non-blocking. If the inode lock
can't be grabbed, simply requeue the IO completion back to the work
queue so that it can be processed later. This prevents the
completion queue from being blocked and allows data IO completion on
other inodes to proceed, hence avoiding completion order dependent
deadlocks.
Signed-off-by: Dave Chinner <david@fromorbit.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alex Elder <aelder@sgi.com>
|
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | | |
Allow us to track the difference between timestamp and size updates
by using mark_inode_dirty from the I/O completion code, and checking
the VFS inode flags in xfs_file_fsync.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Alex Elder <aelder@sgi.com>
|
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | | |
Currently the fsync file operation is divided into a low-level
routine doing all the work and one that implements the Linux file
operation and does minimal argument wrapping. This is a leftover
from the days of the vnode operations layer and can be removed to
simplify the code a bit, as well as preparing for the implementation
of an optimized fdatasync which needs to look at the Linux inode
state.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alex Elder <aelder@sgi.com>
|
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | | |
Currently the aio_read, aio_write, splice_read and splice_write file
operations are divided into a low-level routine doing all the work
and one that implements the Linux file operations and does minimal
argument wrapping. This is a leftover from the days of the vnode
operations layer and can be removed to simplify the code a lot.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Alex Elder <aelder@sgi.com>
|
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | | |
Currently the code to implement the file operations is split over
two small files. Merge the content of xfs_lrw.c into xfs_file.c to
have it in one place. Note that I haven't done various cleanups
that are possible after this yet, they will follow in the next
patch. Also the function xfs_dev_is_read_only which was in
xfs_lrw.c before really doesn't fit in here at all and was moved to
xfs_mount.c.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Alex Elder <aelder@sgi.com>
|
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | | |
The be32_to_cpu in the TP_printk output breaks automatic parsing of
the trace format by the trace-cmd tools, so we have to move it into
the TP_assign block. While we're at it also fix the format for the
quota limits to more regular and easier parseable.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Alex Elder <aelder@sgi.com>
|
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | | |
While doing some testing of readdir perf a while back,
I noticed that the buffer size we're using internally is
smaller than what glibc gives us by default. Upping this
size helped a bit, and seems safe.
glibc's __alloc_dir() does:
const size_t default_allocation = (4 * BUFSIZ < sizeof (struct dirent64)
? sizeof (struct dirent64) : 4 * BUFSIZ);
const size_t small_allocation = (BUFSIZ < sizeof (struct dirent64)
? sizeof (struct dirent64) : BUFSIZ);
size_t allocation = default_allocation;
#ifdef _STATBUF_ST_BLKSIZE
if (statp != NULL && default_allocation < statp->st_blksize)
allocation = statp->st_blksize;
#endif
and
#define _G_BUFSIZ 8192
#define _IO_BUFSIZ _G_BUFSIZ
# define BUFSIZ _IO_BUFSIZ
so the default buffer is 4 * 8192 = 32768
(except in the unlikely case of blocks > 32k....)
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alex Elder <aelder@sgi.com>
|
|\ \ \ \ \ \ \
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | | |
* 'for-2.6.34' of git://linux-nfs.org/~bfields/linux: (22 commits)
nfsd4: fix minor memory leak
svcrpc: treat uid's as unsigned
nfsd: ensure sockets are closed on error
Revert "sunrpc: move the close processing after do recvfrom method"
Revert "sunrpc: fix peername failed on closed listener"
sunrpc: remove unnecessary svc_xprt_put
NFSD: NFSv4 callback client should use RPC_TASK_SOFTCONN
xfs_export_operations.commit_metadata
commit_metadata export operation replacing nfsd_sync_dir
lockd: don't clear sm_monitored on nsm_reboot_lookup
lockd: release reference to nsm_handle in nlm_host_rebooted
nfsd: Use vfs_fsync_range() in nfsd_commit
NFSD: Create PF_INET6 listener in write_ports
SUNRPC: NFS kernel APIs shouldn't return ENOENT for "transport not found"
SUNRPC: Bury "#ifdef IPV6" in svc_create_xprt()
NFSD: Support AF_INET6 in svc_addsock() function
SUNRPC: Use rpc_pton() in ip_map_parse()
nfsd: 4.1 has an rfc number
nfsd41: Create the recovery entry for the NFSv4.1 client
nfsd: use vfs_fsync for non-directories
...
|
| |\ \ \ \ \ \ \
| | | |/ / / / /
| | |/| | | | |
| | | | | | | | |
Resolve merge conflict in fs/xfs/linux-2.6/xfs_export.c.
|
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | | |
There's no need to allocate this cred more than once.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
|
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | | |
The server's callback client should stop trying to connect to the
client's callback server as soon as it gets ECONNREFUSED.
The NFS server's callback client does not call rpc_ping(), but appears
to have it's own "ping" procedure, so it wasn't covered by commit
caabea8a.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
|
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | | |
This is the commit_metadata export operation for XFS.
- Takes one inode to be committed.
- Forces the log up to the lsn of the inode.
- Doesn't force the log if the inode doesn't have a pincount.
Signed-off-by: Ben Myers <bpm@sgi.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <david@fromorbit.com>
[bfields@citi.umich.edu: trivial whitespace fix]
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
|
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | | |
- Add commit_metadata export_operation to allow the underlying filesystem to
decide how to commit an inode most efficiently.
- Usage of nfsd_sync_dir and write_inode_now has been replaced with the
commit_metadata function that takes a svc_fh.
- The commit_metadata function calls the commit_metadata export_op if it's
there, or else falls back to sync_inode instead of fsync and write_inode_now
because only metadata need be synced here.
- nfsd4_sync_rec_dir now uses vfs_fsync so that commit_metadata can be static
Signed-off-by: Ben Myers <bpm@sgi.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
|
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | | |
When lockd gets a notify downcall from statd, it'll search its hosts
cache and then clear the sm_monitored bit on the host it finds. The idea
is apparently to make lockd redo a SM_MON on the next lock request.
This is unnecessary and causes the kernel's NSM cache to go out of sync
with statd. statd doesn't stop monitoring a host when it gets a
SM_NOTIFY and there's no guarantee that another lock will occur after
the reclaim and before the unmount. In that event, no SM_UNMON will
occur.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
|
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | | |
nsm_reboot_lookup takes a reference to the nsm_handle that it returns,
but nlm_host_rebooted never releases that reference.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
|
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | | |
The NFS COMMIT operation allows the client to specify the exact byte range
that it wishes to sync to disk in order to optimise server performance.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
|
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | | |
Try to create a PF_INET6 listener for NFSD, if IPv6 is enabled in the
kernel.
Make sure nfsd_serv's reference count is decreased if
__write_ports_addxprt() failed to create a listener. See
__write_ports_addfd().
Our current plan is to rely on rpc.nfsd to create appropriate IPv6
listeners when server-side NFS/IPv6 support is desired. Legacy
behavior, via the write_threads or write_svc kernel APIs, will remain
the same -- only IPv4 listeners are created.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
[bfields@citi.umich.edu: Move error-handling code to end]
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
|
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | | |
write_ports() converts svc_create_xprt()'s ENOENT error return to
EPROTONOSUPPORT so that rpc.nfsd (in user space) can report an error
message that makes sense.
It turns out that several of the other kernel APIs rpc.nfsd use can
also return ENOENT from svc_create_xprt(), by way of lockd_up().
On the client side, an NFSv2 or NFSv3 mount request can also return
the result of lockd_up(). This error may also be returned during an
NFSv4 mount request, since the NFSv4 callback service uses
svc_create_xprt() to create the callback listener. An ENOENT error
return results in a confusing error message from the mount command.
Let's have svc_create_xprt() return EPROTONOSUPPORT instead of ENOENT.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
|
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | | |
Clean up: Bruce observed we have more or less common logic in each of
svc_create_xprt()'s callers: the check to create an IPv6 RPC listener
socket only if CONFIG_IPV6 is set. I'm about to add another case
that does just the same.
If we move the ifdefs into __svc_xpo_create(), then svc_create_xprt()
call sites can get rid of the "#ifdef" ugliness, and can use the same
logic with or without IPv6 support available in the kernel.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
|
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | | |
Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
|
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | | |
Instead of opencoding the fsync calling sequence use vfs_fsync. This also
gets rid of the useless i_mutex over the data writeout.
Consolidate the remaining special code for syncing directories and document
it's quirks.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
|
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | | |
Since we're checking for LAST_NFS4_OP, use FIRST_NFS4_OP to be consistent.
Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
|
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | | |
The server incorrectly assumes that the operations in the
array start with value 0. The first operation (OP_ACCESS)
has a value of 3, causing the check in nfsd4_decode_compound
to be off.
Instead of comparing that the operation number is less than
the number of elements in the array, the server should verify
that it is less than the maximum valid operation number
defined by LAST_NFS4_OP.
Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
|
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | | |
Modify uid check in do_coredump so as to not apply it in the case of
pipes.
This just got noticed in testing. The end of do_coredump validates the
uid of the inode for the created file against the uid of the crashing
process to ensure that no one can pre-create a core file with different
ownership and grab the information contained in the core when they
shouldn' tbe able to. This causes failures when using pipes for a core
dumps if the crashing process is not root, which is the uid of the pipe
when it is created.
The fix is simple. Since the check for matching uid's isn't relevant for
pipes (a process can't create a pipe that the uermodehelper code will open
anyway), we can just just skip it in the event ispipe is non-zero
Reverts a pipe-affecting change which was accidentally made in
: commit c46f739dd39db3b07ab5deb4e3ec81e1c04a91af
: Author: Ingo Molnar <mingo@elte.hu>
: AuthorDate: Wed Nov 28 13:59:18 2007 +0100
: Commit: Linus Torvalds <torvalds@woody.linux-foundation.org>
: CommitDate: Wed Nov 28 10:58:01 2007 -0800
:
: vfs: coredumping fix
Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | | |
User visible change.
do_coredump() kills all threads which share the same ->mm but only the
coredumping process gets the proper exit_code. Other tasks which share
the same ->mm die "silently" and return status == 0 to parent.
This is historical behaviour, not actually a bug. But I think Frank
Heckenbach rightly dislikes the current behaviour. Simple test-case:
#include <stdio.h>
#include <unistd.h>
#include <signal.h>
#include <sys/wait.h>
int main(void)
{
int stat;
if (!fork()) {
if (!vfork())
kill(getpid(), SIGQUIT);
}
wait(&stat);
printf("stat=%x\n", stat);
return 0;
}
Before this patch it prints "stat=0" despite the fact the child was killed
by SIGQUIT. After this patch the output is "stat=3" which obviously makes
more sense.
Even with this patch, only the task which originates the coredumping gets
"|= 0x80" if the core was actually dumped, but at least the coredumping
signal is visible to do_wait/etc.
Reported-by: Frank Heckenbach <f.heckenbach@fh-soft.de>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: WANG Cong <xiyou.wangcong@gmail.com>
Cc: Roland McGrath <roland@redhat.com>
Cc: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | | |
Pass mm->flags as a coredump parameter for consistency.
---
1787 if (mm->core_state || !get_dumpable(mm)) { <- (1)
1788 up_write(&mm->mmap_sem);
1789 put_cred(cred);
1790 goto fail;
1791 }
1792
[...]
1798 if (get_dumpable(mm) == 2) { /* Setuid core dump mode */ <-(2)
1799 flag = O_EXCL; /* Stop rewrite attacks */
1800 cred->fsuid = 0; /* Dump root private */
1801 }
---
Since dumpable bits are not protected by lock, there is a chance to change
these bits between (1) and (2).
To solve this issue, this patch copies mm->flags to
coredump_params.mm_flags at the beginning of do_coredump() and uses it
instead of get_dumpable() while dumping core.
This copy is also passed to binfmt->core_dump, since elf*_core_dump() uses
dump_filter bits in mm->flags.
[akpm@linux-foundation.org: fix merge]
Signed-off-by: Masami Hiramatsu <mhiramat@redhat.com>
Acked-by: Roland McGrath <roland@redhat.com>
Cc: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Ingo Molnar <mingo@elte.hu>
Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | | |
The current ELF dumper implementation can produce broken corefiles if
program headers exceed 65535. This number is determined by the number of
vmas which the process have. In particular, some extreme programs may use
more than 65535 vmas. (If you google max_map_count, you can find some
users facing this problem.) This kind of program never be able to generate
correct coredumps.
This patch implements ``extended numbering'' that uses sh_info field of
the first section header instead of e_phnum field in order to represent
upto 4294967295 vmas.
This is supported by
AMD64-ABI(http://www.x86-64.org/documentation.html) and
Solaris(http://docs.sun.com/app/docs/doc/817-1984/).
Of course, we are preparing patches for gdb and binutils.
Signed-off-by: Daisuke HATAYAMA <d.hatayama@jp.fujitsu.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Greg Ungerer <gerg@snapgear.com>
Cc: Roland McGrath <roland@redhat.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: <linux-arch@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | | |
By the next patch, elf_core_dump() and elf_fdpic_core_dump() will support
extended numbering and so will produce the corefiles with section header
table in a special case.
The problem is the process of writing a file header offset of the section
header table into e_shoff field of the ELF header. ELF header is
positioned at the beginning of the corefile, while section header at the
end. So, we need to take which of the following ways:
1. Seek backward to retry writing operation for ELF header
after writing process for a whole part
2. Make offset calculation process and writing process
totally sequential
The clause 1. is not always possible: one cannot assume that file system
supports seek function. Consider the no_llseek case.
Therefore, this patch adopts the clause 2.
Signed-off-by: Daisuke HATAYAMA <d.hatayama@jp.fujitsu.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Greg Ungerer <gerg@snapgear.com>
Cc: Roland McGrath <roland@redhat.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: <linux-arch@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | | |
elf_core_dump() and elf_fdpic_core_dump() use #ifdef and the corresponding
macro for hiding _multiline_ logics in functions. This patch removes
#ifdef and replaces ELF_CORE_EXTRA_* by corresponding functions. For
architectures not implemeonting ELF_CORE_EXTRA_*, we use weak functions in
order to reduce a range of modification.
This cleanup is for my next patches, but I think this cleanup itself is
worth doing regardless of my firnal purpose.
Signed-off-by: Daisuke HATAYAMA <d.hatayama@jp.fujitsu.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Greg Ungerer <gerg@snapgear.com>
Cc: Roland McGrath <roland@redhat.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: <linux-arch@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | | |
My next patch will replace ELF_CORE_EXTRA_* macros by functions, putting
them into other newly created *.c files. Then, each files will contain
dump_write(), where each pair of binfmt_*.c and elfcore.c should be the
same. So, this patch moves them into a header file with dump_seek().
Also, the patch deletes confusing DUMP_WRITE macros in each files.
Signed-off-by: Daisuke HATAYAMA <d.hatayama@jp.fujitsu.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Greg Ungerer <gerg@snapgear.com>
Cc: Roland McGrath <roland@redhat.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: <linux-arch@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | | |
The current ELF dumper can produce broken corefiles if program headers
exceed 65535. In particular, the program in 64-bit environment often
demands more than 65535 mmaps. If you google max_map_count, then you can
find many users facing this problem.
Solaris has already dealt with this issue, and other OSes have also
adopted the same method as in Solaris. Currently, Sun's document and AMD
64 ABI include the description for the extension, where they call the
extension Extended Numbering. See Reference for further information.
I believe that linux kernel should adopt the same way as they did, so I've
written this patch.
I am also preparing for patches of GDB and binutils.
How to fix
==========
In new dumping process, there are two cases according to weather or
not the number of program headers is equal to or more than 65535.
- if less than 65535, the produced corefile format is exactly the same
as the ordinary one.
- if equal to or more than 65535, then e_phnum field is set to newly
introduced constant PN_XNUM(0xffff) and the actual number of program
headers is set to sh_info field of the section header at index 0.
Compatibility Concern
=====================
* As already mentioned in Summary, Sun and AMD64 has already adopted
this. See Reference.
* There are four combinations according to whether kernel and userland
tools are respectively modified or not. The next table summarizes
shortly for each combination.
---------------------------------------------
Original Kernel | Modified Kernel
---------------------------------------------
< 65535 | >= 65535 | < 65535 | >= 65535
-------------------------------------------------------------
Original Tools | OK | broken | OK | broken (#)
-------------------------------------------------------------
Modified Tools | OK | broken | OK | OK
-------------------------------------------------------------
Note that there is no case that `OK' changes to `broken'.
(#) Although this case remains broken, O-M behaves better than
O-O. That is, while in O-O case e_phnum field would be extremely
small due to integer overflow, in O-M case it is guaranteed to be at
least 65535 by being set to PN_XNUM(0xFFFF), much closer to the
actual correct value than the O-O case.
Test Program
============
Here is a test program mkmmaps.c that is useful to produce the
corefile with many mmaps. To use this, please take the following
steps:
$ ulimit -c unlimited
$ sysctl vm.max_map_count=70000 # default 65530 is too small
$ sysctl fs.file-max=70000
$ mkmmaps 65535
Then, the program will abort and a corefile will be generated.
If failed, there are two cases according to the error message
displayed.
* ``out of memory'' means vm.max_map_count is still smaller
* ``too many open files'' means fs.file-max is still smaller
So, please change it to a larger value, and then retry it.
mkmmaps.c
==
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <fcntl.h>
#include <unistd.h>
int main(int argc, char **argv)
{
int maps_num;
if (argc < 2) {
fprintf(stderr, "mkmmaps [number of maps to be created]\n");
exit(1);
}
if (sscanf(argv[1], "%d", &maps_num) == EOF) {
perror("sscanf");
exit(2);
}
if (maps_num < 0) {
fprintf(stderr, "%d is invalid\n", maps_num);
exit(3);
}
for (; maps_num > 0; --maps_num) {
if (MAP_FAILED == mmap((void *)NULL, (size_t) 1, PROT_READ,
MAP_SHARED | MAP_ANONYMOUS, (int) -1,
(off_t) NULL)) {
perror("mmap");
exit(4);
}
}
abort();
{
char buffer[128];
sprintf(buffer, "wc -l /proc/%u/maps", getpid());
system(buffer);
}
return 0;
}
Tested on i386, ia64 and um/sys-i386.
Built on sh4 (which covers fs/binfmt_elf_fdpic.c)
References
==========
- Sun microsystems: Linker and Libraries.
Part No: 817-1984-17, September 2008.
URL: http://docs.sun.com/app/docs/doc/817-1984
- System V ABI AMD64 Architecture Processor Supplement
Draft Version 0.99., May 11, 2009.
URL: http://www.x86-64.org/
This patch:
There are three different definitions for dump_seek() functions in
binfmt_aout.c, binfmt_elf.c and binfmt_elf_fdpic.c, respectively. The
only for binfmt_elf.c.
My next patch will move dump_seek() into a header file in order to share
the same implementations for dump_write() and dump_seek(). As the first
step, this patch unify these three definitions for dump_seek() by applying
the past commits that have been applied only for binfmt_elf.c.
Specifically, the modification made here is part of the following commits:
* d025c9db7f31fc0554ce7fb2dfc78d35a77f3487
* 7f14daa19ea36b200d237ad3ac5826ae25360461
This patch does not change a shape of corefiles.
Signed-off-by: Daisuke HATAYAMA <d.hatayama@jp.fujitsu.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Greg Ungerer <gerg@snapgear.com>
Cc: Roland McGrath <roland@redhat.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: <linux-arch@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|