| Commit message (Collapse) | Author | Age | Files | Lines |
... | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Create all new directories with OCFS2_INLINE_DATA_FL and the inline data
bytes formatted as an empty directory. Inode size field reflects the actual
amount of inline data available, which makes searching for dirent space
very similar to the regular directory search.
Inline-data directories are automatically pushed out to extents on any
insert request which is too large for the available space.
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
Reviewed-by: Joel Becker <joel.becker@oracle.com>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
This splits out extent based directory read support and implements
inline-data versions of those functions. All knowledge of inline-data versus
extent based directories is internalized. For lookups the code uses
ocfs2_find_entry_id(), full dir iterations make use of
ocfs2_dir_foreach_blk_id().
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
Reviewed-by: Joel Becker <joel.becker@oracle.com>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
This fixes up write, truncate, mmap, and RESVSP/UNRESVP to understand inline
inode data.
For the most part, the changes to the core write code can be relied on to do
the heavy lifting. Any code calling ocfs2_write_begin (including shared
writeable mmap) can count on it doing the right thing with respect to
growing inline data to an extent tree.
Size reducing truncates, including UNRESVP can simply zero that portion of
the inode block being removed. Size increasing truncatesm, including RESVP
have to be a little bit smarter and grow the inode to an extent tree if
necessary.
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
Reviewed-by: Joel Becker <joel.becker@oracle.com>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
This hooks up ocfs2_readpage() to populate a page with data from an inode
block. Direct IO reads from inline data are modified to fall back to
buffered I/O. Appropriate checks are also placed in the extent map code to
avoid reading an extent list when inline data might be stored.
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
Reviewed-by: Joel Becker <joel.becker@oracle.com>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Add the disk, network and memory structures needed to support data in inode.
Struct ocfs2_inline_data is defined and embedded in ocfs2_dinode for storing
inline data.
A new inode field, i_dyn_features, is added to facilitate tracking of
dynamic inode state. Since it will be used often, we want to mirror it on
ocfs2_inode_info, and transfer it via the meta data lvb.
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
Reviewed-by: Joel Becker <joel.becker@oracle.com>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
The check to see if a new dirent would fit in an old one is pretty ugly, and
it's done at least twice. Clean things up by putting this in it's own
easier-to-read function.
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
Reviewed-by: Joel Becker <joel.becker@oracle.com>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
ocfs2_rename() does direct manipulation of the dirent it's gotten back from
a directory search. Wrap this manipulation inside of a function so that we
can transparently change directory update behavior in the future. As an
added bonus, this gets rid of an ugly macro.
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
Reviewed-by: Joel Becker <joel.becker@oracle.com>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
A couple paths which needed to just match a parent dir + name pair to an
inode number were a bit messy because they had to deal with
ocfs2_find_files_on_disk() which returns a larger number of values. Provide
a convenience function, ocfs2_lookup_ino_from_name() which internalizes all
the extra accounting.
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
Reviewed-by: Joel Becker <joel.becker@oracle.com>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
We can preserve the behavior of ocfs2_empty_dir(), while getting rid of the
open coded directory walk by just providing a smart filldir callback. This
also automatically gets to use the dir readahead code, though in this case
any advantage is minor at best.
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
Reviewed-by: Joel Becker <joel.becker@oracle.com>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
ocfs2_queue_orphans() has an open coded readdir loop which can easily just
use a directory accessor function.
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
Reviewed-by: Joel Becker <joel.becker@oracle.com>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
filldir_t can take this, so don't turn de->inode into a 32 bit value. Right
now this doesn't make a difference since no ocfs2 inodes overflow that, but
it could be a nasty surprise later on if some kernel code is calling
ocfs2_dir_foreach_blk() and expecting real inode numbers back...
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
Reviewed-by: Joel Becker <joel.becker@oracle.com>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Put this in it's own function so that the functionality can be overridden.
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
Reviewed-by: Joel Becker <joel.becker@oracle.com>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
The code for adding, removing, deleting directory entries was splattered all
over namei.c. I'd rather have this all centralized, so that it's easier to
make changes for inline dir data, and eventually indexed directories.
None of the code in any of the functions was changed. I only removed the
static keyword from some prototypes so that they could be exported.
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
Reviewed-by: Joel Becker <joel.becker@oracle.com>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
We'll want to reuse most of this when pushing inline data back out to an
extent. Keeping this part as a seperate patch helps to keep the upcoming
changes for write support uncluttered.
The core portion of ocfs2_zero_cluster_pages() responsible for making sure a
page is mapped and properly dirtied is abstracted out into it's own
function, ocfs2_map_and_dirty_page(). Actual functionality doesn't change,
though zeroing becomes optional.
We also turn part of ocfs2_free_write_ctxt() into a common function for
unlocking and freeing a page array. This operation is very common (and
uniform) for Ocfs2 cluster sizes greater than page size, so it makes sense
to keep the code in one place.
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
Reviewed-by: Joel Becker <joel.becker@oracle.com>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
By doing this, we can remove any higher level logic which has to have
knowledge of btree functionality - any callers of ocfs2_write_begin() can
now expect it to do anything necessary to prepare the inode for new data.
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
Reviewed-by: Joel Becker <joel.becker@oracle.com>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
ocfs2-tools added some on-disk fields and flags which are used by
tunefs.ocfs2.
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
|
| | | |
| | | |
| | | |
| | | |
| | | | |
Signed-off-by: Denis Cheng <crquan@gmail.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Implement sops->show_options() so as to allow /proc/mounts to show the mount
options.
Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
This is technically harmless (recovery will clean it out later), but leaves
a bogus entry in the slot_map which really shouldn't be there.
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
c_used_tail_recs in struct ocfs2_merge_ctxt is only ever set, so we can
remove it.
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
delete_tail_recs in ocfs2_try_to_merge_extent() was only ever set, remove
it.
Signed-off-by: Tao Mao <tao.ma@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
|
| | |/
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
ocfs2_insert_type->ins_free_records was only used in one place, and was set
incorrectly in most places. We can free up some memory and lose some code by
removing this.
* Small warning fixup contributed by Andrew Mortom <akpm@linux-foundation.org>
Signed-off-by: Tao Mao <tao.ma@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Big thanks go to Mathias Kolehmainen for reporting the bug, providing
debug output and testing the patches I sent him to get it working.
The fix was to stop calling ntfs_attr_set() at mount time as that causes
balance_dirty_pages_ratelimited() to be called which on systems with
little memory actually tries to go and balance the dirty pages which tries
to take the s_umount semaphore but because we are still in fill_super()
across which the VFS holds s_umount for writing this results in a
deadlock.
We now do the dirty work by hand by submitting individual buffers. This
has the annoying "feature" that mounting can take a few seconds if the
journal is large as we have clear it all. One day someone should improve
on this by deferring the journal clearing to a helper kernel thread so it
can be done in the background but I don't have time for this at the moment
and the current solution works fine so I am leaving it like this for now.
Signed-off-by: Anton Altaparmakov <aia21@cantab.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
| |\ \
| | |/
| |/|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
* git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-nmw: (51 commits)
[DLM] block dlm_recv in recovery transition
[DLM] don't overwrite castparam if it's NULL
[GFS2] Get superblock a different way
[GFS2] Don't try to remove buffers that don't exist
[GFS2] Alternate gfs2_iget to avoid looking up inodes being freed
[GFS2] Data corruption fix
[GFS2] Clean up journaled data writing
[GFS2] GFS2: chmod hung - fix race in thread creation
[DLM] Make dlm_sendd cond_resched more
[GFS2] Move inode deletion out of blocking_cb
[GFS2] flocks from same process trip kernel BUG at fs/gfs2/glock.c:1118!
[GFS2] Clean up gfs2_trans_add_revoke()
[GFS2] Use slab operations for all gfs2_bufdata allocations
[GFS2] Replace revoke structure with bufdata structure
[GFS2] Fix ordering of dirty/journal for ordered buffer unstuffing
[GFS2] Clean up ordered write code
[GFS2] Move pin/unpin into lops.c, clean up locking
[GFS2] Don't mark jdata dirty in gfs2_unstuffer_page()
[GFS2] Introduce gfs2_remove_from_ail
[GFS2] Correct lock ordering in unlink
...
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Introduce a per-lockspace rwsem that's held in read mode by dlm_recv
threads while working in the dlm. This allows dlm_recv activity to be
suspended when the lockspace transitions to, from and between recovery
cycles.
The specific bug prompting this change is one where an in-progress
recovery cycle is aborted by a new recovery cycle. While dlm_recv was
processing a recovery message, the recovery cycle was aborted and
dlm_recoverd began cleaning up. dlm_recv decremented recover_locks_count
on an rsb after dlm_recoverd had reset it to zero. This is fixed by
suspending dlm_recv (taking write lock on the rwsem) before aborting the
current recovery.
The transitions to/from normal and recovery modes are simplified by using
this new ability to block dlm_recv. The switch from normal to recovery
mode means dlm_recv goes from processing locking messages, to saving them
for later, and vice versa. Races are avoided by blocking dlm_recv when
setting the flag that switches between modes.
Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
If the castaddr passed to the userland API is NULL then don't overwrite the
existing castparam. This allows a different thread to cancel a lock request and
the CANCEL AST gets delivered to the original thread.
bz#306391 (for RHEL4) refers.
Signed-Off-By: Patrick Caulfield <pcaulfie@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
The mapping may be NULL by the time the I/O has completed, so
we now get the superblock by a different route (via the bd and glock)
to avoid this problem.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Cc: Wendy Cheng <wcheng@redhat.com>
|
| | |
| | |
| | |
| | | |
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
There is a possible deadlock between two processes on the same node, where one
process is deleting an inode, and another process is looking for allocated but
unused inodes to delete in order to create more space.
process A does an iput() on inode X, and it's i_count drops to 0. This causes
iput_final() to be called, which puts an inode into state I_FREEING at
generic_delete_inode(). There no point between when iput_final() is called, and
when I_FREEING is set where GFS2 could acquire any glocks. Once I_FREEING is
set, no other process on that node can successfully look up that inode until
the delete finishes.
process B locks the the resource group for the same inode in get_local_rgrp(),
which is called by gfs2_inplace_reserve_i()
process A tries to lock the resource group for the inode in
gfs2_dinode_dealloc(), but it's already locked by process B
process B waits in find_inode for the inode to have the I_FREEING state cleared.
Deadlock.
This patch solves the problem by adding an alternative to gfs2_iget(),
gfs2_iget_skip(), that simply skips any inodes that are in the I_FREEING
state.o The alternate test function is just like the original one, except that
it fails if the inode is being freed, and sets a skipped flag. The alternate
set function is just like the original, except that it fails if the skipped
flag is set. Only try_rgrp_unlink() calls gfs2_iget_skip() instead of
gfs2_iget().
Signed-off-by: Benjamin E. Marzinski <bmarzins@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
* GFS2 has been using i_cache array to store its indirect meta blocks.
Its flush routine doesn't correctly clean up all the entries. The
problem would show while multiple nodes do simultaneous writes to the
same file. Upon glock exclusive lock transfer, if the file is a sparse
file with large file size where the indirect meta blocks span multiple
array entries with "zero" entries in between. The flush routine
prematurely stops the flushing that leaves old (stale) entries around.
This leads to several nasty issues, including data corruption.
* Fix gfs2_get_block_noalloc checking to correctly return EIO upon
unmapped buffer.
Signed-off-by: Wendy Cheng <wcheng@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
This patch cleans up the code for writing journaled data into the log.
It also removes the need to allocate a small "tag" structure for each
block written into the log. Instead we just keep count of the outstanding
I/O so that we can be sure that its all been written at the correct time.
Another result of this patch is that a number of ll_rw_block() calls
have become submit_bh() calls, closing some races at the same time.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
The problem boiled down to a race between the gdlm_init_threads()
function initializing thread1 and its setting of blist = 1.
Essentially, "if (current == ls->thread1)" was checked by the thread
before the thread creator set ls->thread1.
Since thread1 is the only thread who is allowed to work on the
blocking queue, and since neither thread thought it was thread1, no one
was working on the queue. So everything just sat.
This patch reuses the ls->async_lock spin_lock to fix the race,
and it fixes the problem. I've done more than 2000 iterations of the
loop that was recreating the failure and it seems to work.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
--
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Under high recovery loads dlm_sendd can monopolise the CPU and cause soft lockups.
This one extra and one moved cond_resched() make it yield a little more during
such times keeping work moving.
Signed-Off-By: Patrick Caulfield <pcaulfie@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Move inode deletion code out of blocking_cb handle_callback route to
avoid racy conditions that end up blocking lock_dlm1 thread. Fix
bugzilla 286821.
Signed-off-by: Wendy Cheng <wcheng@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
This patch adds a new flag to the gfs2_holder structure GL_FLOCK.
It is set on holders of glocks representing flocks. This flag is
checked in add_to_queue() and a process is permitted to queue more
than one holder onto a glock if it is set. This solves the issue
of a process not being able to do multiple flocks on the same file.
Through a single descriptor, a process can now promote and demote
flocks. Through multiple descriptors a process can now queue
multiple flocks on the same file. There's still the problem of
a process deadlocking itself (because gfs2 blocking locks are not
interruptible) by queueing incompatible deadlock.
Signed-off-by: Abhijith Das <adas@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
The following alters gfs2_trans_add_revoke() to take a struct
gfs2_bufdata as an argument. This eliminates the memory allocation which
was previously required by making use of the already existing struct
gfs2_bufdata. It makes some sanity checks to ensure that the
gfs2_bufdata has been removed from all the lists before its recycled as
a revoke structure. This saves one memory allocation and one free per
revoke structure.
Also as a result, and to simplify the locking, since there is no longer
any blocking code in gfs2_trans_add_revoke() we must hold the log lock
whenever this function is called. This reduces the amount of times we
take and unlock the log lock.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
The old revoke structure was allocated using kalloc/kfree but
there is a slab cache for gfs2_bufdata, so we should use that
now that the structures have been converted.
This is part two of the patch series to merge the revoke
and gfs2_bufdata structures.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Both the revoke structure and the bufdata structure are quite similar.
They are basically small tags which are put on lists. In addition to
which the revoke structure is always allocated when there is a bufdata
structure which is (or can be) freed. As such it should be possible to
reduce the number of frees and allocations by using the same structure
for both purposes.
This patch is the first step along that path. It replaces existing uses
of the revoke structure with the bufdata structure.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
| | |
| | |
| | |
| | |
| | | |
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
The following patch removes the ordered write processing from
databuf_lo_before_commit() and moves it to log.c. This has the effect of
greatly simplyfying databuf_lo_before_commit() and well as potentially
making the ordered write code more efficient.
As a side effect of this, its now possible to remove ordered buffers
from the ordered buffer list at any time, so we now make use of this in
invalidatepage and releasepage to ensure timely release of these
buffers.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
gfs2_pin and gfs2_unpin are only used in lops.c, despite being
defined in meta_io.c, so this patch moves them into lops.c and
makes them static. At the same time, its possible to clean up
the locking in the buf and databuf _lo_add() functions so that
we only need to grab the spinlock once. Also we have to move
lock_buffer() around the _lo_add() functions since we can't
do that in gfs2_pin() any more since we hold the spinlock
for the duration of that function.
As a result, the code shrinks by 12 lines and we do far fewer
operations when adding buffers to the log. It also makes the
code somewhat easier to read & understand.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Journaled data is marked dirty by gfs2_unpin and should not be marked
dirty here.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
This collects together the operations required to remove a gfs2_bufdata
from the ail lists. Its only called from two places to start with, but
expect to see more of this function in future.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
This patch corrects the lock ordering in unlink to be the same as
that in the rest of GFS2, i.e. parent -> child -> rgrp.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Fix a nasty inode meta data corruption issue by keeping the buffer head in
icache array. This buffer needs to stay in memory until journal flush occurs
Otherwise, gfs2_meta_inode_buffer could do a disk read before the inode hits
disk. It ends up with meta data corruptions. The buffer will be released as
part of the existing journal flush logic.
Signed-off-by: S. Wendy Cheng <wcheng@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
When a lot of IO, with some distributed mmap IO, is run on a GFS2 filesystem in
a cluster, it will deadlock. The reason is that do_no_page() will repeatedly
call gfs2_sharewrite_nopage(), because each node keeps giving up the glock
too early, and is forced to call unmap_mapping_range(). This bumps the
mapping->truncate_count sequence count, forcing do_no_page() to retry. This
patch institutes a minimum glock hold time a tenth a second. This insures
that even in heavy contention cases, the node has enough time to get some
useful work done before it gives up the glock.
A second issue is that when gfs2_glock_dq() is called from within a page fault
to demote a lock, and the associated page needs to be written out, it will
try to acqire a lock on it, but it has already been locked at a higher level.
This patch puts makes gfs2_glock_dq() use the work queue as well, to avoid this
issue. This is the same patch as Steve Whitehouse originally proposed to fix
this issue, execpt that gfs2_glock_dq() now grabs a reference to the glock
before it queues up the work on it.
Signed-off-by: Benjamin E. Marzinski <bmarzins@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
When you try to mount gfs2 with -o garbage, the mount fails and the gfs2
superblock is deallocated and becomes NULL. The vfs comes around later
on and calls gfs2_kill_sb. At this point the hidden gfs2 superblock
pointer (sb->s_fs_info) is NULL and dereferencing it through
gfs2_meta_syncfs causes the panic. (the other function call to
gfs2_delete_debugfs_file() succeeds because this function already checks
for a NULL pointer)
Signed-off-by: Abhijith Das <adas@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
This is a patch to GFS2 to protect sd_log_num_jdata with the
gfs2_log_lock. Without this patch, there is a timing window
where you can get hit the following assert from function
gfs2_log_flush():
gfs2_assert_withdraw(sdp,
sdp->sd_log_num_buf + sdp->sd_log_num_jdata ==
sdp->sd_log_commited_buf +
sdp->sd_log_commited_databuf);
I've tested it on my roth cluster and it fixes the problem.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
With this patch, gfs2 glockdump through the debugfs filesystem will only
dump glocks for the specified filesystem instead of all glocks. Also, to
aid debugging, the glock number is dumped in hex instead of decimal.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Signed-off-by: S. Wendy Cheng <wcheng@redhat.com>
Signed-off-by: Abhijith Das <adas@redhat.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
This patch fixes the slight mess made in lowcomms closing by previous patches
and fixes all sorts of DLM hangs.
Signed-Off-By: Patrick Caulfield <pcaulfie@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|