aboutsummaryrefslogtreecommitdiffstats
path: root/fs/xfs/linux-2.6/xfs_super.c
Commit message (Collapse)AuthorAgeFilesLines
* xfs: log the inode in ->write_inode calls for kupdateChristoph Hellwig2012-01-061-25/+5
| | | | | | | | | | | | | | | | | | | | | | | | Commit 0b8fd3033c308e4088760aa1d38ce77197b4e074 upstream. If the writeback code writes back an inode because it has expired we currently use the non-blockin ->write_inode path. This means any inode that is pinned is skipped. With delayed logging and a workload that has very little log traffic otherwise it is very likely that an inode that gets constantly written to is always pinned, and thus we keep refusing to write it. The VM writeback code at that point redirties it and doesn't try to write it again for another 30 seconds. This means under certain scenarious time based metadata writeback never happens. Fix this by calling into xfs_log_inode for kupdate in addition to data integrity syncs, and thus transfer the inode to the log ASAP. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Tested-by: Mark Tinguely <tinguely@sgi.com> Reviewed-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* xfs: fix ->write_inode return valuesChristoph Hellwig2011-11-261-25/+9
| | | | | | | | | | | | | | | | | patch 58d84c4ee0389ddeb86238d5d8359a982c9f7a5b upstream. Currently we always redirty an inode that was attempted to be written out synchronously but has been cleaned by an AIL pushed internall, which is rather bogus. Fix that by doing the i_update_core check early on and return 0 for it. Also include async calls for it, as doing any work for those is just as pointless. While we're at it also fix the sign for the EIO return in case of a filesystem shutdown, and fix the completely non-sensical locking around xfs_log_inode. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Alex Elder <aelder@sgi.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* xfs: revert to using a kthread for AIL pushingChristoph Hellwig2011-10-251-12/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 0030807c66f058230bcb20d2573bcaf28852e804 upstream Currently we have a few issues with the way the workqueue code is used to implement AIL pushing: - it accidentally uses the same workqueue as the syncer action, and thus can be prevented from running if there are enough sync actions active in the system. - it doesn't use the HIGHPRI flag to queue at the head of the queue of work items At this point I'm not confident enough in getting all the workqueue flags and tweaks right to provide a perfectly reliable execution context for AIL pushing, which is the most important piece in XFS to make forward progress when the log fills. Revert back to use a kthread per filesystem which fixes all the above issues at the cost of having a task struct and stack around for each mounted filesystem. In addition this also gives us much better ways to diagnose any issues involving hung AIL pushing and removes a small amount of code. Signed-off-by: Christoph Hellwig <hch@lst.de> Reported-by: Stefan Priebe <s.priebe@profihost.ag> Tested-by: Stefan Priebe <s.priebe@profihost.ag> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* xfs: start periodic workers laterChristoph Hellwig2011-10-251-21/+14
| | | | | | | | | | | | | | | | | | | commit 2bcf6e970f5a88fa05dced5eeb0326e13d93c4a1 upstream Start the periodic sync workers only after we have finished xfs_mountfs and thus fully set up the filesystem structures. Without this we can call into xfs_qm_sync before the quotainfo strucute is set up if the mount takes unusually long, and probably hit other incomplete states as well. Also clean up the xfs_fs_fill_super error path by using consistent label names, and removing an impossible to reach case. Signed-off-by: Christoph Hellwig <hch@lst.de> Reported-by: Arkadiusz Miskiewicz <arekm@maven.pl> Reviewed-by: Alex Elder <aelder@sgi.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* xfs: make log devices with write back caches workChristoph Hellwig2011-06-161-75/+0
| | | | | | | | | | | | | There's no reason not to support cache flushing on external log devices. The only thing this really requires is flushing the data device first both in fsync and log commits. A side effect is that we also have to remove the barrier write test during mount, which has been superflous since the new FLUSH+FUA code anyway. Also use the chance to flush the RT subvolume write cache before the fsync commit, which is required for correct semantics. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com>
* fs: pass exact type of data dirties to ->dirty_inodeChristoph Hellwig2011-05-271-1/+2
| | | | | | | | | | | | | | | | | Tell the filesystem if we just updated timestamp (I_DIRTY_SYNC) or anything else, so that the filesystem can track internally if it needs to push out a transaction for fdatasync or not. This is just the prototype change with no user for it yet. I plan to push large XFS changes for the next merge window, and getting this trivial infrastructure in this window would help a lot to avoid tree interdependencies. Also remove incorrect comments that ->dirty_inode can't block. That has been changed a long time ago, and many implementations rely on it. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* xfs: add online discard supportChristoph Hellwig2011-05-241-2/+16
| | | | | | | | | | | | | | Now that we have reliably tracking of deleted extents in a transaction we can easily implement "online" discard support which calls blkdev_issue_discard once a transaction commits. The actual discard is a two stage operation as we first have to mark the busy extent as not available for reuse before we can start the actual discard. Note that we don't bother supporting discard for the non-delaylog mode. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com>
* xfs: fix duplicate workqueue initialisationDave Chinner2011-05-191-4/+0
| | | | | | | | | The workqueue initialisation function is called twice when initialising the XFS subsystem. Remove the second initialisation call. Signed-off-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Alex Elder <aelder@sgi.com>
* xfs_destroy_workqueues() should not be tagged with__exitLuck, Tony2011-04-111-1/+1
| | | | | | | | | | ia64 throws away .exit sections for the built-in CONFIG case, so routines that are used in other circumstances should not be tagged as __exit. Signed-off-by: Tony Luck <tony.luck@intel.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* xfs: convert the xfsaild threads to a workqueueDave Chinner2011-04-081-84/+43
| | | | | | | | | | | | | | | Similar to the xfssyncd, the per-filesystem xfsaild threads can be converted to a global workqueue and run periodically by delayed works. This makes sense for the AIL pushing because it uses variable timeouts depending on the work that needs to be done. By removing the xfsaild, we simplify the AIL pushing code and remove the need to spread the code to implement the threading and pushing across multiple files. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Alex Elder <aelder@sgi.com>
* xfs: convert ENOSPC inode flushing to use new syncd workqueueDave Chinner2011-04-081-2/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | On of the problems with the current inode flush at ENOSPC is that we queue a flush per ENOSPC event, regardless of how many are already queued. Thi can result in hundreds of queued flushes, most of which simply burn CPU scanned and do no real work. This simply slows down allocation at ENOSPC. We really only need one active flush at a time, and we can easily implement that via the new xfs_syncd_wq. All we need to do is queue a flush if one is not already active, then block waiting for the currently active flush to complete. The result is that we only ever have a single ENOSPC inode flush active at a time and this greatly reduces the overhead of ENOSPC processing. On my 2p test machine, this results in tests exercising ENOSPC conditions running significantly faster - 042 halves execution time, 083 drops from 60s to 5s, etc - while not introducing test regressions. This allows us to remove the old xfssyncd threads and infrastructure as they are no longer used. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Alex Elder <aelder@sgi.com>
* xfs: introduce a xfssyncd workqueueDave Chinner2011-04-081-13/+17
| | | | | | | | | | | | | | | | | | | | | | | | | All of the work xfssyncd does is background functionality. There is no need for a thread per filesystem to do this work - it can al be managed by a global workqueue now they manage concurrency effectively. Introduce a new gglobal xfssyncd workqueue, and convert the periodic work to use this new functionality. To do this, use a delayed work construct to schedule the next running of the periodic sync work for the filesystem. When the sync work is complete, queue a new delayed work for the next running of the sync work. For laptop mode, we wait on completion for the sync works, so ensure that the sync work queuing interface can flush and wait for work to complete to enable the work queue infrastructure to replace the current sequence number and wakeup that is used. Because the sync work does non-trivial amounts of work, mark the new work queue as CPU intensive. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Alex Elder <aelder@sgi.com>
* xfs: register the inode cache shrinker before quotachecksDave Chinner2011-03-261-10/+24
| | | | | | | | | | | | | | | | | | | | | | | | During mount, we can do a quotacheck that involves a bulkstat pass on all inodes. If there are more inodes in the filesystem than can be held in memory, we require the inode cache shrinker to run to ensure that we don't run out of memory. Unfortunately, the inode cache shrinker is not registered until we get to the end of the superblock setup process, which is after a quotacheck is run if it is needed. Hence we need to register the inode cache shrinker earlier in the mount process so that we don't OOM during mount. This requires that we also initialise the syncd work before we register the shrinker, so we nee dto juggle that around as well. While there, make sure that we have set up the block sizes in the VFS superblock correctly before the quotacheck is run so that any inodes that are cached as a result of the quotacheck have their block size fields set up correctly. Cc: stable@kernel.org Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Alex Elder <aelder@sgi.com>
* xfs: introduce inode cluster buffer trylocks for xfs_iflushDave Chinner2011-03-261-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | There is an ABBA deadlock between synchronous inode flushing in xfs_reclaim_inode and xfs_icluster_free. xfs_icluster_free locks the buffer, then takes inode ilocks, whilst synchronous reclaim takes the ilock followed by the buffer lock in xfs_iflush(). To avoid this deadlock, separate the inode cluster buffer locking semantics from the synchronous inode flush semantics, allowing callers to attempt to lock the buffer but still issue synchronous IO if it can get the buffer. This requires xfs_iflush() calls that currently use non-blocking semantics to pass SYNC_TRYLOCK rather than 0 as the flags parameter. This allows xfs_reclaim_inode to avoid the deadlock on the buffer lock and detect the failure so that it can drop the inode ilock and restart the reclaim attempt on the inode. This allows xfs_ifree_cluster to obtain the inode lock, mark the inode stale and release it and hence defuse the deadlock situation. It also has the pleasant side effect of avoiding IO in xfs_reclaim_inode when it tries to next reclaim the inode as it is now marked stale. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Alex Elder <aelder@sgi.com>
* xfs: Convert linux-2.6/ files to new logging interfaceDave Chinner2011-03-071-70/+57
| | | | | | | | | | | Convert the files in fs/xfs/linux-2.6/ to use the new xfs_<level> logging format that replaces the old Irix inherited cmn_err() interfaces. While there, also convert naked printk calls to use the relevant xfs logging function to standardise output format. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Alex Elder <aelder@sgi.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
* xfs: enable delaylog by defaultChristoph Hellwig2011-02-221-0/+1
| | | | | | Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Alex Elder <aelder@sgi.com>
* Merge branch 'for-linus' of git://oss.sgi.com/xfs/xfsLinus Torvalds2011-01-141-1/+1
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * 'for-linus' of git://oss.sgi.com/xfs/xfs: xfs: prevent NMI timeouts in cmn_err xfs: Add log level to assertion printk xfs: fix an assignment within an ASSERT() xfs: fix error handling for synchronous writes xfs: add FITRIM support xfs: ensure log covering transactions are synchronous xfs: serialise unaligned direct IOs xfs: factor common write setup code xfs: split buffered IO write path from xfs_file_aio_write xfs: split direct IO write path from xfs_file_aio_write xfs: introduce xfs_rw_lock() helpers for locking the inode xfs: factor post-write newsize updates xfs: factor common post-write isize handling code xfs: ensure sync write errors are returned
| * xfs: ensure log covering transactions are synchronousDave Chinner2011-01-111-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | To ensure the log is covered and the filesystem idles correctly, we need to ensure that dummy transactions hit the disk and do not stay pinned in memory. If the superblock is pinned in memory, it can't be flushed so the log covering cannot make progress. The result is dependent on timing - more oftent han not we continue to issues a log covering transaction every 36s rather than idling after ~90s. Fix this by making the log covering transaction synchronous. To avoid additional log force from xfssyncd, make the log covering transaction take the place of the existing log force in the xfssyncd background sync process. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com>
* | Merge branch 'for-2.6.38/core' of git://git.kernel.dk/linux-2.6-blockLinus Torvalds2011-01-131-2/+3
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * 'for-2.6.38/core' of git://git.kernel.dk/linux-2.6-block: (43 commits) block: ensure that completion error gets properly traced blktrace: add missing probe argument to block_bio_complete block cfq: don't use atomic_t for cfq_group block cfq: don't use atomic_t for cfq_queue block: trace event block fix unassigned field block: add internal hd part table references block: fix accounting bug on cross partition merges kref: add kref_test_and_get bio-integrity: mark kintegrityd_wq highpri and CPU intensive block: make kblockd_workqueue smarter Revert "sd: implement sd_check_events()" block: Clean up exit_io_context() source code. Fix compile warnings due to missing removal of a 'ret' variable fs/block: type signature of major_to_index(int) to major_to_index(unsigned) block: convert !IS_ERR(p) && p to !IS_ERR_NOR_NULL(p) cfq-iosched: don't check cfqg in choose_service_tree() fs/splice: Pull buf->ops->confirm() from splice_from_pipe actors cdrom: export cdrom_check_events() sd: implement sd_check_events() sr: implement sr_check_events() ...
| * \ Merge branch 'cleanup-bd_claim' of ↵Jens Axboe2010-11-271-2/+3
| |\ \ | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tj/misc into for-2.6.38/core
| | * | block: clean up blkdev_get() wrappers and their usersTejun Heo2010-11-131-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | After recent blkdev_get() modifications, open_by_devnum() and open_bdev_exclusive() are simple wrappers around blkdev_get(). Replace them with blkdev_get_by_dev() and blkdev_get_by_path(). blkdev_get_by_dev() is identical to open_by_devnum(). blkdev_get_by_path() is slightly different in that it doesn't automatically add %FMODE_EXCL to @mode. All users are converted. Most conversions are mechanical and don't introduce any behavior difference. There are several exceptions. * btrfs now sets FMODE_EXCL in btrfs_device->mode, so there's no reason to OR it explicitly on blkdev_put(). * gfs2, nilfs2 and the generic mount_bdev() now set FMODE_EXCL in sb->s_mode. * With the above changes, sb->s_mode now always should contain FMODE_EXCL. WARN_ON_ONCE() added to kill_block_super() to detect errors. The new blkdev_get_*() functions are with proper docbook comments. While at it, add function description to blkdev_get() too. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Philipp Reisner <philipp.reisner@linbit.com> Cc: Neil Brown <neilb@suse.de> Cc: Mike Snitzer <snitzer@redhat.com> Cc: Joern Engel <joern@lazybastard.org> Cc: Chris Mason <chris.mason@oracle.com> Cc: Jan Kara <jack@suse.cz> Cc: "Theodore Ts'o" <tytso@mit.edu> Cc: KONISHI Ryusuke <konishi.ryusuke@lab.ntt.co.jp> Cc: reiserfs-devel@vger.kernel.org Cc: xfs-masters@oss.sgi.com Cc: Alexander Viro <viro@zeniv.linux.org.uk>
| | * | block: make blkdev_get/put() handle exclusive accessTejun Heo2010-11-131-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Over time, block layer has accumulated a set of APIs dealing with bdev open, close, claim and release. * blkdev_get/put() are the primary open and close functions. * bd_claim/release() deal with exclusive open. * open/close_bdev_exclusive() are combination of open and claim and the other way around, respectively. * bd_link/unlink_disk_holder() to create and remove holder/slave symlinks. * open_by_devnum() wraps bdget() + blkdev_get(). The interface is a bit confusing and the decoupling of open and claim makes it impossible to properly guarantee exclusive access as in-kernel open + claim sequence can disturb the existing exclusive open even before the block layer knows the current open if for another exclusive access. Reorganize the interface such that, * blkdev_get() is extended to include exclusive access management. @holder argument is added and, if is @FMODE_EXCL specified, it will gain exclusive access atomically w.r.t. other exclusive accesses. * blkdev_put() is similarly extended. It now takes @mode argument and if @FMODE_EXCL is set, it releases an exclusive access. Also, when the last exclusive claim is released, the holder/slave symlinks are removed automatically. * bd_claim/release() and close_bdev_exclusive() are no longer necessary and either made static or removed. * bd_link_disk_holder() remains the same but bd_unlink_disk_holder() is no longer necessary and removed. * open_bdev_exclusive() becomes a simple wrapper around lookup_bdev() and blkdev_get(). It also has an unexpected extra bdev_read_only() test which probably should be moved into blkdev_get(). * open_by_devnum() is modified to take @holder argument and pass it to blkdev_get(). Most of bdev open/close operations are unified into blkdev_get/put() and most exclusive accesses are tested atomically at the open time (as it should). This cleans up code and removes some, both valid and invalid, but unnecessary all the same, corner cases. open_bdev_exclusive() and open_by_devnum() can use further cleanup - rename to blkdev_get_by_path() and blkdev_get_by_devt() and drop special features. Well, let's leave them for another day. Most conversions are straight-forward. drbd conversion is a bit more involved as there was some reordering, but the logic should stay the same. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Neil Brown <neilb@suse.de> Acked-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Acked-by: Mike Snitzer <snitzer@redhat.com> Acked-by: Philipp Reisner <philipp.reisner@linbit.com> Cc: Peter Osterlund <petero2@telia.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Jan Kara <jack@suse.cz> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andreas Dilger <adilger.kernel@dilger.ca> Cc: "Theodore Ts'o" <tytso@mit.edu> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <joel.becker@oracle.com> Cc: Alex Elder <aelder@sgi.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: dm-devel@redhat.com Cc: drbd-dev@lists.linbit.com Cc: Leo Chen <leochen@broadcom.com> Cc: Scott Branden <sbranden@broadcom.com> Cc: Chris Mason <chris.mason@oracle.com> Cc: Steven Whitehouse <swhiteho@redhat.com> Cc: Dave Kleikamp <shaggy@linux.vnet.ibm.com> Cc: Joern Engel <joern@logfs.org> Cc: reiserfs-devel@vger.kernel.org Cc: Alexander Viro <viro@zeniv.linux.org.uk>
* | | | Merge branch 'for-next' of ↵Linus Torvalds2011-01-131-1/+1
|\ \ \ \ | |_|_|/ |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial * 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (43 commits) Documentation/trace/events.txt: Remove obsolete sched_signal_send. writeback: fix global_dirty_limits comment runtime -> real-time ppc: fix comment typo singal -> signal drivers: fix comment typo diable -> disable. m68k: fix comment typo diable -> disable. wireless: comment typo fix diable -> disable. media: comment typo fix diable -> disable. remove doc for obsolete dynamic-printk kernel-parameter remove extraneous 'is' from Documentation/iostats.txt Fix spelling milisec -> ms in snd_ps3 module parameter description Fix spelling mistakes in comments Revert conflicting V4L changes i7core_edac: fix typos in comments mm/rmap.c: fix comment sound, ca0106: Fix assignment to 'channel'. hrtimer: fix a typo in comment init/Kconfig: fix typo anon_inodes: fix wrong function name in comment fix comment typos concerning "consistent" poll: fix a typo in comment ... Fix up trivial conflicts in: - drivers/net/wireless/iwlwifi/iwl-core.c (moved to iwl-legacy.c) - fs/ext4/ext4.h Also fix missed 'diabled' typo in drivers/net/bnx2x/bnx2x.h while at it.
| * | | Merge branch 'master' into for-nextJiri Kosina2010-12-221-3/+0
| |\ \ \ | | |/ / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Conflicts: MAINTAINERS arch/arm/mach-omap2/pm24xx.c drivers/scsi/bfa/bfa_fcpim.c Needed to update to apply fixes for which the old branch was too outdated.
| * | | tree-wide: fix comment/printk typosUwe Kleine-König2010-11-011-1/+1
| | |/ | |/| | | | | | | | | | | | | | | | | | | | | | | | | "gadget", "through", "command", "maintain", "maintain", "controller", "address", "between", "initiali[zs]e", "instead", "function", "select", "already", "equal", "access", "management", "hierarchy", "registration", "interest", "relative", "memory", "offset", "already", Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
* | | xfs: reduce the number of AIL push wakeupsDave Chinner2010-12-171-4/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The xfaild often tries to rest to wait for congestion to pass of for IO to complete, but is regularly woken in tail-pushing situations. In severe cases, the xfsaild is getting woken tens of thousands of times a second. Reduce the number needless wakeups by only waking the xfsaild if the new target is larger than the old one. Further make short sleeps uninterruptible as they occur when the xfsaild has decided it needs to back off to allow some IO to complete and being woken early is counter-productive. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
* | | xfs: provide a inode iolock lockdep classDave Chinner2010-12-231-0/+2
| |/ |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The XFS iolock needs to be re-initialised to a new lock class before it enters reclaim to prevent lockdep false positives. Unfortunately, this is not sufficient protection as inodes in the XFS_IRECLAIMABLE state can be recycled and not re-initialised before being reused. We need to re-initialise the lock state when transfering out of XFS_IRECLAIMABLE state to XFS_INEW, but we need to keep the same class as if the inode was just allocated. Hence we need a specific lockdep class variable for the iolock so that both initialisations use the same class. While there, add a specific class for inodes in the reclaim state so that it is easy to tell from lockdep reports what state the inode was in that generated the report. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
* | xfs: remove experimental tag from the delaylog optionChristoph Hellwig2010-11-101-3/+0
|/ | | | | | | | | We promised to do this for 2.6.37, and the code looks stable enough to keep that promise. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Alex Elder <aelder@sgi.com>
* new helper: mount_bdev()Al Viro2010-10-291-7/+5
| | | | | | ... and switch of the obvious get_sb_bdev() users to ->mount() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* fs: kill block_prepare_writeChristoph Hellwig2010-10-251-1/+1
| | | | | | | | | __block_write_begin and block_prepare_write are identical except for slightly different calling conventions. Convert all callers to the __block_write_begin calling conventions and drop block_prepare_write. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* Merge branch 'for-linus' of git://oss.sgi.com/xfs/xfsLinus Torvalds2010-10-221-13/+11
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * 'for-linus' of git://oss.sgi.com/xfs/xfs: (36 commits) xfs: semaphore cleanup xfs: Extend project quotas to support 32bit project ids xfs: remove xfs_buf wrappers xfs: remove xfs_cred.h xfs: remove xfs_globals.h xfs: remove xfs_version.h xfs: remove xfs_refcache.h xfs: fix the xfs_trans_committed xfs: remove unused t_callback field in struct xfs_trans xfs: fix bogus m_maxagi check in xfs_iget xfs: do not use xfs_mod_incore_sb_batch for per-cpu counters xfs: do not use xfs_mod_incore_sb for per-cpu counters xfs: remove XFS_MOUNT_NO_PERCPU_SB xfs: pack xfs_buf structure more tightly xfs: convert buffer cache hash to rbtree xfs: serialise inode reclaim within an AG xfs: batch inode reclaim lookup xfs: implement batched inode lookups for AG walking xfs: split out inode walk inode grabbing xfs: split inode AG walking into separate code for reclaim ...
| * xfs: remove xfs_buf wrappersChristoph Hellwig2010-10-181-1/+1
| | | | | | | | | | | | | | | | Stop having two different names for many buffer functions and use the more descriptive xfs_buf_* names directly. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com>
| * xfs: remove xfs_version.hChristoph Hellwig2010-10-181-1/+0
| | | | | | | | | | | | | | | | It used to have a place when it contained an automatically generated CVS version, but these days it's entirely superflous. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com>
| * xfs: remove XFS_MOUNT_NO_PERCPU_SBChristoph Hellwig2010-10-181-2/+4
| | | | | | | | | | | | | | | | | | Fail the mount if we can't allocate memory for the per-CPU counters. This is consistent with how we handle everything else in the mount path and makes the superblock counter modification a lot simpler. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com>
| * xfs: store xfs_mount in the buftarg instead of in the xfs_bufDave Chinner2010-10-181-3/+5
| | | | | | | | | | | | | | | | | | | | | | Each buffer contains both a buftarg pointer and a mount pointer. If we add a mount pointer into the buftarg, we can avoid needing the b_mount field in every buffer and grab it from the buftarg when needed instead. This shrinks the xfs_buf by 8 bytes. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Alex Elder <aelder@sgi.com>
| * xfs: don't use vfs writeback for pure metadata modificationsDave Chinner2010-10-181-6/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Under heavy multi-way parallel create workloads, the VFS struggles to write back all the inodes that have been changed in age order. The bdi flusher thread becomes CPU bound, spending 85% of it's time in the VFS code, mostly traversing the superblock dirty inode list to separate dirty inodes old enough to flush. We already keep an index of all metadata changes in age order - in the AIL - and continued log pressure will do age ordered writeback without any extra overhead at all. If there is no pressure on the log, the xfssyncd will periodically write back metadata in ascending disk address offset order so will be very efficient. Hence we can stop marking VFS inodes dirty during transaction commit or when changing timestamps during transactions. This will keep the inodes in the superblock dirty list to those containing data or unlogged metadata changes. However, the timstamp changes are slightly more complex than this - there are a couple of places that do unlogged updates of the timestamps, and the VFS need to be informed of these. Hence add a new function xfs_trans_ichgtime() for transactional changes, and leave xfs_ichgtime() for the non-transactional changes. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Alex Elder <aelder@sgi.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
* | Merge branch 'v2.6.36-rc8' into for-2.6.37/barrierJens Axboe2010-10-191-2/+7
|\ \ | |/ | | | | | | | | | | | | | | Conflicts: block/blk-core.c drivers/block/loop.c mm/swapfile.c Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
| * xfs: dummy transactions should not dirty VFS stateDave Chinner2010-08-241-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When we need to cover the log, we issue dummy transactions to ensure the current log tail is on disk. Unfortunately we currently use the root inode in the dummy transaction, and the act of committing the transaction dirties the inode at the VFS level. As a result, the VFS writeback of the dirty inode will prevent the filesystem from idling long enough for the log covering state machine to complete. The state machine gets stuck in a loop issuing new dummy transactions to cover the log and never makes progress. To avoid this problem, the dummy transactions should not cause externally visible state changes. To ensure this occurs, make sure that dummy transactions log an unchanging field in the superblock as it's state is never propagated outside the filesystem. This allows the log covering state machine to complete successfully and the filesystem now correctly enters a fully idle state about 90s after the last modification was made. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
| * xfs: ensure f_ffree returned by statfs() is non-negativeStuart Brodsky2010-08-241-1/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Because of delayed updates to sb_icount field in the super block, it is possible to allocate over maxicount number of inodes. This causes the arithmetic to calculate a negative number of free inodes in user commands like df or stat -f. Since maxicount is a somewhat arbitrary number, a slight over allocation is not critical but user commands should be displayed as 0 or greater and never go negative. To do this the value in the stats buffer f_ffree is capped to never go negative. [ Modified to use max_t as per Christoph's comment. ] Signed-off-by: Stu Brodsky <sbrodsky@sgi.com> Signed-off-by: Dave Chinner <dchinner@redhat.com>
* | block: remove BLKDEV_IFL_WAITChristoph Hellwig2010-09-161-2/+1
|/ | | | | | | | | | | | | | All the blkdev_issue_* helpers can only sanely be used for synchronous caller. To issue cache flushes or barriers asynchronously the caller needs to set up a bio by itself with a completion callback to move the asynchronous state machine ahead. So drop the BLKDEV_IFL_WAIT flag that is always specified when calling blkdev_issue_* and also remove the now unused flags argument to blkdev_issue_flush and blkdev_issue_zeroout. For blkdev_issue_discard we need to keep it for the secure discard flag, which gains a more descriptive name and loses the bitops vs flag confusion. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
* convert remaining ->clear_inode() to ->evict_inode()Al Viro2010-08-091-3/+5
| | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* xfs: remove obsolete osyncisosync mount optionChristoph Hellwig2010-07-261-6/+4
| | | | | | | | | | | Since Linux 2.6.33 the kernel has support for real O_SYNC, which made the osyncisosync option a no-op. Warn the users about this and remove the mount flag for it. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
* xfs: move inode shrinker unregister even earlierDave Chinner2010-07-261-5/+5
| | | | | | | | | | | | | | | | | | | | I missed Dave Chinner's second revision of this change, and pushed his first version out to the repository instead. commit a476c59ebb279d738718edc0e3fb76aab3687114 Author: Dave Chinner <dchinner@redhat.com> This commit compensates for that by moving a block of code up a bit further, with a result that matches the the effect of Dave's second version. Dave's first version was: Reviewed-by: Eric Sandeen <sandeen@redhat.com> Dave's second version was: Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com> Reviewed-by: Eric Sandeen <sandeen@redhat.com>
* xfs: unregister inode shrinker before freeing filesystem structuresDave Chinner2010-07-261-1/+5
| | | | | | | | | | | | | | | | Currently we don't remove the XFS mount from the shrinker list until late in the unmount path. By this time, we have already torn down the internals of the filesystem (e.g. the per-ag structures), and hence if the shrinker is executed between the teardown and the unregistering, the shrinker will get NULL per-ag structure pointers and panic trying to dereference them. Fix this by removing the xfs mount from the shrinker list before tearing down it's internal structures. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Alex Elder <aelder@sgi.com>
* xfs: split xfs_itrace_entryChristoph Hellwig2010-07-261-3/+4
| | | | | | | | | | | | | | | Replace the xfs_itrace_entry catchall with specific trace points. For most simple callers we now use the simple inode class, which used to be the iget class, but add more details tracing for namespace events, which now includes the name of the directory entries manipulated. Remove the xfs_inactive trace point, which is a duplicate of the clear_inode one, and the xfs_change_file_space trace point, which is immediately followed by the more specific alloc/free space trace points. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com>
* xfs: remove explicit xfs_sync_data/xfs_sync_attr calls on umountChristoph Hellwig2010-07-261-12/+0
| | | | | | | | | | On the final put of a superblock the VFS already calls sync_filesystem for us to write out all data and wait for it. No need to start another asynchronous writeback inside ->put_super. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com>
* xfs: avoid synchronous transaction in xfs_fs_write_inodeChristoph Hellwig2010-07-261-30/+22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We already rely on the fact that the sync code will cause a synchronous log force later on (currently via xfs_fs_sync_fs -> xfs_quiesce_data -> xfs_sync_data), so no need to do this here. This allows us to avoid a lot of synchronous log forces during sync, which pays of especially with delayed logging enabled. Some compilebench numbers that show this: xfs (delayed logging, 256k logbufs) =================================== intial create 25.94 MB/s 25.75 MB/s 25.64 MB/s create 8.54 MB/s 9.12 MB/s 9.15 MB/s patch 2.47 MB/s 2.47 MB/s 3.17 MB/s compile 29.65 MB/s 30.51 MB/s 27.33 MB/s clean 90.92 MB/s 98.83 MB/s 128.87 MB/s read tree 11.90 MB/s 11.84 MB/s 8.56 MB/s read compiled 28.75 MB/s 29.96 MB/s 24.25 MB/s delete tree 8.39 seconds 8.12 seconds 8.46 seconds delete compiled 8.35 seconds 8.44 seconds 5.11 seconds stat tree 6.03 seconds 5.59 seconds 5.19 seconds stat compiled tree 9.00 seconds 9.52 seconds 8.49 seconds xfs + write_inode log_force removal =================================== intial create 25.87 MB/s 25.76 MB/s 25.87 MB/s create 15.18 MB/s 14.80 MB/s 14.94 MB/s patch 3.13 MB/s 3.14 MB/s 3.11 MB/s compile 36.74 MB/s 37.17 MB/s 36.84 MB/s clean 226.02 MB/s 222.58 MB/s 217.94 MB/s read tree 15.14 MB/s 15.02 MB/s 15.14 MB/s read compiled tree 29.30 MB/s 29.31 MB/s 29.32 MB/s delete tree 6.22 seconds 6.14 seconds 6.15 seconds delete compiled tree 5.75 seconds 5.92 seconds 5.81 seconds stat tree 4.60 seconds 4.51 seconds 4.56 seconds stat compiled tree 4.07 seconds 3.87 seconds 3.96 seconds In addition to that also remove the delwri inode flush that is unessecary now that bulkstat is always coherent. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com>
* xfs: simplify inode to transaction joiningChristoph Hellwig2010-07-261-2/+1
| | | | | | | | | | | | | | | | Currently we need to either call IHOLD or xfs_trans_ihold on an inode when joining it to a transaction via xfs_trans_ijoin. This patches instead makes xfs_trans_ijoin usable on it's own by doing an implicity xfs_trans_ihold, which also allows us to drop the third argument. For the case where we want to hold a reference on the inode a xfs_trans_ijoin_ref wrapper is added which does the IHOLD and marks the inode for needing an xfs_iput. In addition to the cleaner interface to the caller this also simplifies the implementation. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com>
* xfs: simplify log item descriptor trackingChristoph Hellwig2010-07-261-1/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | Currently we track log item descriptor belonging to a transaction using a complex opencoded chunk allocator. This code has been there since day one and seems to work around the lack of an efficient slab allocator. This patch replaces it with dynamically allocated log item descriptors from a dedicated slab pool, linked to the transaction by a linked list. This allows to greatly simplify the log item descriptor tracking to the point where it's just a couple hundred lines in xfs_trans.c instead of a separate file. The external API has also been simplified while we're at it - the xfs_trans_add_item and xfs_trans_del_item functions to add/ delete items from a transaction have been simplified to the bare minium, and the xfs_trans_find_item function is replaced with a direct dereference of the li_desc field. All debug code walking the list of log items in a transaction is down to a simple list_for_each_entry. Note that we could easily use a singly linked list here instead of the double linked list from list.h as the fastpath only does deletion from sequential traversal. But given that we don't have one available as a library function yet I use the list.h functions for simplicity. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com>
* xfs: remove unneeded #include statementsChristoph Hellwig2010-07-261-3/+0
| | | | | | Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <david@fromorbit.com>