| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
| |
Move inode deletion code out of blocking_cb handle_callback route to
avoid racy conditions that end up blocking lock_dlm1 thread. Fix
bugzilla 286821.
Signed-off-by: Wendy Cheng <wcheng@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch adds a new flag to the gfs2_holder structure GL_FLOCK.
It is set on holders of glocks representing flocks. This flag is
checked in add_to_queue() and a process is permitted to queue more
than one holder onto a glock if it is set. This solves the issue
of a process not being able to do multiple flocks on the same file.
Through a single descriptor, a process can now promote and demote
flocks. Through multiple descriptors a process can now queue
multiple flocks on the same file. There's still the problem of
a process deadlocking itself (because gfs2 blocking locks are not
interruptible) by queueing incompatible deadlock.
Signed-off-by: Abhijith Das <adas@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The following alters gfs2_trans_add_revoke() to take a struct
gfs2_bufdata as an argument. This eliminates the memory allocation which
was previously required by making use of the already existing struct
gfs2_bufdata. It makes some sanity checks to ensure that the
gfs2_bufdata has been removed from all the lists before its recycled as
a revoke structure. This saves one memory allocation and one free per
revoke structure.
Also as a result, and to simplify the locking, since there is no longer
any blocking code in gfs2_trans_add_revoke() we must hold the log lock
whenever this function is called. This reduces the amount of times we
take and unlock the log lock.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
| |
The old revoke structure was allocated using kalloc/kfree but
there is a slab cache for gfs2_bufdata, so we should use that
now that the structures have been converted.
This is part two of the patch series to merge the revoke
and gfs2_bufdata structures.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Both the revoke structure and the bufdata structure are quite similar.
They are basically small tags which are put on lists. In addition to
which the revoke structure is always allocated when there is a bufdata
structure which is (or can be) freed. As such it should be possible to
reduce the number of frees and allocations by using the same structure
for both purposes.
This patch is the first step along that path. It replaces existing uses
of the revoke structure with the bufdata structure.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
|
|
|
|
| |
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The following patch removes the ordered write processing from
databuf_lo_before_commit() and moves it to log.c. This has the effect of
greatly simplyfying databuf_lo_before_commit() and well as potentially
making the ordered write code more efficient.
As a side effect of this, its now possible to remove ordered buffers
from the ordered buffer list at any time, so we now make use of this in
invalidatepage and releasepage to ensure timely release of these
buffers.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
gfs2_pin and gfs2_unpin are only used in lops.c, despite being
defined in meta_io.c, so this patch moves them into lops.c and
makes them static. At the same time, its possible to clean up
the locking in the buf and databuf _lo_add() functions so that
we only need to grab the spinlock once. Also we have to move
lock_buffer() around the _lo_add() functions since we can't
do that in gfs2_pin() any more since we hold the spinlock
for the duration of that function.
As a result, the code shrinks by 12 lines and we do far fewer
operations when adding buffers to the log. It also makes the
code somewhat easier to read & understand.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
|
|
|
|
|
|
| |
Journaled data is marked dirty by gfs2_unpin and should not be marked
dirty here.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
|
|
|
|
|
|
|
| |
This collects together the operations required to remove a gfs2_bufdata
from the ail lists. Its only called from two places to start with, but
expect to see more of this function in future.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
|
|
|
|
|
|
| |
This patch corrects the lock ordering in unlink to be the same as
that in the rest of GFS2, i.e. parent -> child -> rgrp.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
| |
Fix a nasty inode meta data corruption issue by keeping the buffer head in
icache array. This buffer needs to stay in memory until journal flush occurs
Otherwise, gfs2_meta_inode_buffer could do a disk read before the inode hits
disk. It ends up with meta data corruptions. The buffer will be released as
part of the existing journal flush logic.
Signed-off-by: S. Wendy Cheng <wcheng@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When a lot of IO, with some distributed mmap IO, is run on a GFS2 filesystem in
a cluster, it will deadlock. The reason is that do_no_page() will repeatedly
call gfs2_sharewrite_nopage(), because each node keeps giving up the glock
too early, and is forced to call unmap_mapping_range(). This bumps the
mapping->truncate_count sequence count, forcing do_no_page() to retry. This
patch institutes a minimum glock hold time a tenth a second. This insures
that even in heavy contention cases, the node has enough time to get some
useful work done before it gives up the glock.
A second issue is that when gfs2_glock_dq() is called from within a page fault
to demote a lock, and the associated page needs to be written out, it will
try to acqire a lock on it, but it has already been locked at a higher level.
This patch puts makes gfs2_glock_dq() use the work queue as well, to avoid this
issue. This is the same patch as Steve Whitehouse originally proposed to fix
this issue, execpt that gfs2_glock_dq() now grabs a reference to the glock
before it queues up the work on it.
Signed-off-by: Benjamin E. Marzinski <bmarzins@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When you try to mount gfs2 with -o garbage, the mount fails and the gfs2
superblock is deallocated and becomes NULL. The vfs comes around later
on and calls gfs2_kill_sb. At this point the hidden gfs2 superblock
pointer (sb->s_fs_info) is NULL and dereferencing it through
gfs2_meta_syncfs causes the panic. (the other function call to
gfs2_delete_debugfs_file() succeeds because this function already checks
for a NULL pointer)
Signed-off-by: Abhijith Das <adas@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is a patch to GFS2 to protect sd_log_num_jdata with the
gfs2_log_lock. Without this patch, there is a timing window
where you can get hit the following assert from function
gfs2_log_flush():
gfs2_assert_withdraw(sdp,
sdp->sd_log_num_buf + sdp->sd_log_num_jdata ==
sdp->sd_log_commited_buf +
sdp->sd_log_commited_databuf);
I've tested it on my roth cluster and it fixes the problem.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
|
|
|
|
|
|
|
|
|
| |
With this patch, gfs2 glockdump through the debugfs filesystem will only
dump glocks for the specified filesystem instead of all glocks. Also, to
aid debugging, the glock number is dumped in hex instead of decimal.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Signed-off-by: S. Wendy Cheng <wcheng@redhat.com>
Signed-off-by: Abhijith Das <adas@redhat.com>
|
|
|
|
|
|
|
|
| |
This patch fixes the slight mess made in lowcomms closing by previous patches
and fixes all sorts of DLM hangs.
Signed-Off-By: Patrick Caulfield <pcaulfie@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
| |
Current GFS2 setattr call unconditionally invokes do_shrink even the
requested size and actual file size are equal. This has generated large
amount of extra IOs found during NFS benchmark runs. This patch moves
the relevant logic out of shrink code path. Since setattr is a system
call, the time stamps update is still required.
Signed-off-by: S. Wendy Cheng <wcheng@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
| |
match_token() was returning garbage data instead of a fail value. This data
happened to match a valid option id for an option that required an argument (in
this case, lockproto=%s) For match_token() to correctly fail if the option
doesn't match any of the tokens, the token table must end with a NULL entry.
This patch adds the NULL entry.
Signed-off-by: Benjamin E. Marzinski <bmarzins@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
|
|
|
|
|
|
|
|
|
| |
This was missing from the dir_split_leaf() function although in
most cases its not a problem due to other functions having
already previously called gfs2_trans_add_bh. This makes certain
that it is correct.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Cc: Wendy Cheng <wcheng@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch fixes some bugs relating to journaled data files by cleaning
up the gfs2_invalidatepage() and gfs2_releasepage() functions. We now
never block during gfs2_releasepage(), instead we always either release
or refuse to release depending on the status of the buffers.
This fixes Red Hat bugzillas #248969 and #252392.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Cc: Bob Peterson <rpeterso@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is the filesystem part of the patches to fix this bz. There are
additional userland patches (gfs2_quota, libgfs2) for the complete
solution. This patch adds a new field qu_ll_next to the gfs2_quota
structure. This field allows us to create linked lists of quotas in the
ondisk quota inode. Instead of scanning through the entire sparse quota
file for valid quotas, we can now simply walk through the user and group
quota linked lists to perform the do_list operation.
Signed-off-by: Abhijith Das <adas@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
|
|
|
|
| |
Signed-off-by: Denis Cheng <crquan@gmail.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
|
|
|
|
|
|
|
|
|
| |
This patch forcibly unstuffs (if stuffed) the hidden quota inode at the
first availble opportunity. In any practical scenario the quota inode
won't be stuffed, so this is ok to do. Unstuffing the quota inode allows
us to ignore the case of a stuffed quota inode in gfs2_adjust_quota().
Signed-off-by: Abhijith Das <adas@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
|
|
|
|
|
|
| |
the original code could work, but I think this code could work better.
Signed-off-by: Denis Cheng <crquan@gmail.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
|
|
|
|
|
|
| |
sb->s_fs_info is a void pointer, thus the type cast is not needed.
Signed-off-by: Denis Cheng <crquan@gmail.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
|
|
|
|
| |
Signed-off-by: Denis Cheng <crquan@gmail.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is for bugzilla bug #248176: GFS2: invalid metadata block
Patches 1 thru 3 were accepted upstream, but there were problems
with 4 and 5. Those issues have been resolved and now the recovery
tests are passing without errors. This code has gone through
41 * 3 successful gfs2 recovery tests before it hit an
unrelated (openais) problem. I'm continuing to test it.
This is a complete rewrite of patch 5 for bug #248176, written by
Steve Whitehouse. This is referred to in the bugzilla record as
"new 6" and "a different solution".
The problem was that the journal inodes, although protected by
a glock, were not synched with the other nodes because they don't
use the inode glock synch operations (i.e. no "glops" were defined).
Therefore, journal recovery on a journal-recovering node were causing
the blocks to get out of sync with the node that was actually trying
to use that journal as it comes back up from a reboot.
There are two possible solutions: (1) To make the journals use the
normal inode glock sync operations, or (2) To make the journal
operations take effect immediately (i.e. no caching). Although
option 1 works, it turns out to be a lot more code. Steve opted
for option 2, which is much simpler and therefore less prone to
regression errors.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
--
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is for bugzilla bug #248176: GFS2: invalid metadata block
Patches 1 thru 3 were accepted upstream, but there were problems
with 4 and 5. Those issues have been resolved and now the recovery
tests are passing without errors. This code has gone through
41 * 3 successful gfs2 recovery tests before it hit an
unrelated (openais) problem.
This is a complete rewrite of patch 4 for bug #248176.
Part of the problem was that inodes were being recycled
before their buffers were flushed to the journal logs.
Another problem was that the clone bitmaps were being
searched for deleted inodes to recycle, but only the
"real" bitmaps should be searched for that purpose.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
|
|
|
|
|
|
|
|
| |
We only need a single gfs2_scand process rather than the one
per filesystem which we had previously. As a result the parameter
determining the frequency of gfs2_scand runs becomes a module
parameter rather than a mount parameter as it was before.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
|
|
|
|
| |
Signed-off-by: Denis Cheng <crquan@gmail.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
|
|
|
|
|
|
| |
these struct *_operations are all method tables, thus should be const.
Signed-off-by: Denis Cheng <crquan@gmail.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is patch 5 of 5 for bug #248176
Metadata corruption was occurring because page references weren't
being removed in all cases. I previously added a function called
detach_bufdata, but I discovered there already WAS a function out
there to do the job. It's called gfs2_meta_cache_flush. So I added
a call to that to remove the page references.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
|
|
|
|
|
|
|
| |
this is more clear.
Signed-off-by: Denis Cheng <crquan@gmail.com>
Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is patch three of five for bug #248176.
The try_rgrp_unlink code in rgrp.c had an infinite loop. This was
caused because the bitmap function rgblk_search can return a block
less than the "goal" block, in which case it was looping. The fix is
to make it always march forward as needed.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
| |
This is patch 2 of 5 for bug #248176.
The list_move code previously concocted in log.c for bug #238162
(see https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=238162#c23)
never runs as bh can now never be NULL at this point.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
|
|
|
|
|
|
|
|
|
| |
This is the first of five patches for bug #248176:
There were still some critical variables being manipulated outside
the log_lock spinlock. That usually resulted in a hang.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
|
|
|
|
|
|
|
|
| |
This fixes an oops which was occurring during glock dumping due to the
seq file code not taking a reference to the glock. Also this fixes a
memory leak which occurred in certain cases, in turn preventing the
filesystem from unmounting.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
|
|
|
|
|
|
|
|
| |
When looking at an unrelated problem, I noticed that nfsd does not
set nameidata pointer on create (ie nd is NULL). This should
cause an oops in some cases in which when NFSd is mounted over GFS2.
Signed-off-by: Steve French <sfrench@us.ibm.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
|
|
|
|
|
|
|
| |
This patch cleans up duplicate includes in
fs/gfs2/
Signed-off-by: Jesper Juhl <jesper.juhl@gmail.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
|
|
|
|
|
|
|
|
|
| |
If a glock is in the exclusive state and a request for demote to
deferred has been received, then further requests for demote to
shared are being ignored. This patch fixes that by ensuring that
we demote to unlocked in that case.
Signed-off-by: Josef Whiter <jwhiter@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
One of the races relates to referencing a variable while not holding
its protecting spinlock. The patch simply moves the test inside the
spin lock. The other races occurs when a demote to unlocked request
occurs during the time a demote to shared request is already running.
This of course only happens in the case that the lock was in the
exclusive mode to start with. The patch adds a check to see if another
demote request has occurred in the mean time and if it has, then it
performs a second demote.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
| |
The recent fix for a circular lock dependency unfortunately introduced a
potential memory leak in the event where the call to nlmsvc_lookup_host
fails for some reason.
Thanks to Roel Kluin for spotting this.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
|
|
|
|
|
|
|
|
|
| |
When IOCB_FLAG_RESFD flag is set and iocb->aio_resfd is incorrect,
statement 'goto out_put_req' is executed. At label 'out_put_req',
aio_put_req(..) is called, which requires 'req->ki_filp' set.
Signed-off-by: Yan Zheng<yanzheng@21cn.com>
Cc: Zach Brown <zach.brown@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|\
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
master.kernel.org:/pub/scm/linux/kernel/git/cooloney/blackfin-2.6
* 'for-linus' of master.kernel.org:/pub/scm/linux/kernel/git/cooloney/blackfin-2.6:
Blackfin arch: fix PORT_J BUG for BF537/6 EMAC driver reported by Kalle Pokki <kalle.pokki@iki.fi>
Blackfin arch: gpio pinmux and resource allocation API required by BF537 on chip ethernet mac driver
Blackfin arch: add some missing syscall
binfmt_flat: checkpatch fixing minimum support for the blackfin relocations
Binfmt_flat: Add minimum support for the Blackfin relocations
|
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Cc: Bernd Schmidt <bernd.schmidt@analog.com>
Cc: David McCullough <davidm@snapgear.com>
Cc: Greg Ungerer <gerg@snapgear.com>
Cc: Miles Bader <miles.bader@necel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Bryan Wu <bryan.wu@analog.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Add minimum support for the Blackfin relocations, since we don't have
enough space in each reloc. The idea is to store a value with one
relocation so that subsequent ones can access it.
Actually, this patch is required for Blackfin. Currently if BINFMT_FLAT is
enabled, git-tree kernel will fail to compile.
Signed-off-by: Bernd Schmidt <bernd.schmidt@analog.com>
Signed-off-by: Bryan Wu <bryan.wu@analog.com>
Cc: David McCullough <davidm@snapgear.com>
Cc: Greg Ungerer <gerg@snapgear.com>
Cc: Miles Bader <miles.bader@necel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|/
|
|
|
|
|
|
| |
The fs was not unlocking the local alloc inode mutex in the code path in
which it failed to find a window of free bits in the global bitmap.
Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Nick Piggin points out that splice isn't being good about the mmap
semaphore: while two readers can nest inside each others, it does leave
a possible deadlock if a writer (ie a new mmap()) comes in during that
nesting.
Original "just move the locking" patch by Nick, replaced by one by me
based on an optimistic pagefault_disable(). And then Jens tested and
updated that patch.
Reported-by: Nick Piggin <npiggin@suse.de>
Tested-by: Jens Axboe <jens.axboe@oracle.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
on-disk version is newer."
This reverts commit b394e43e995d08821588a22561c6a71a63b4ff27.
Lachlan McIlroy says:
It tried to fix an issue where log replay is replaying an inode cluster
initialisation transaction that should not be replayed because the inode
cluster on disk is more up to date. Since we don't log file sizes (we
rely on inode flushing to get them to disk) then we can't just replay
all the transations in the log and expect the inode to be completely
restored. We lose file size updates. Unfortunately this fix is causing
more (serious) problems than it is fixing.
SGI-PV: 969656
SGI-Modid: xfs-linux-melb:xfs-kern:29804a
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Signed-off-by: Tim Shimmin <tes@sgi.com>
|