aboutsummaryrefslogtreecommitdiffstats
path: root/fs/nilfs2/segment.c
Commit message (Collapse)AuthorAgeFilesLines
* mm: add account_page_writeback()Michael Rubin2010-10-261-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | To help developers and applications gain visibility into writeback behaviour this patch adds two counters to /proc/vmstat. # grep nr_dirtied /proc/vmstat nr_dirtied 3747 # grep nr_written /proc/vmstat nr_written 3618 These entries allow user apps to understand writeback behaviour over time and learn how it is impacting their performance. Currently there is no way to inspect dirty and writeback speed over time. It's not possible for nr_dirty/nr_writeback. These entries are necessary to give visibility into writeback behaviour. We have /proc/diskstats which lets us understand the io in the block layer. We have blktrace for more in depth understanding. We have e2fsprogs and debugsfs to give insight into the file systems behaviour, but we don't offer our users the ability understand what writeback is doing. There is no way to know how active it is over the whole system, if it's falling behind or to quantify it's efforts. With these values exported users can easily see how much data applications are sending through writeback and also at what rates writeback is processing this data. Comparing the rates of change between the two allow developers to see when writeback is not able to keep up with incoming traffic and the rate of dirty memory being sent to the IO back end. This allows folks to understand their io workloads and track kernel issues. Non kernel engineers at Google often use these counters to solve puzzling performance problems. Patch #4 adds a pernode vmstat file with nr_dirtied and nr_written Patch #5 add writeback thresholds to /proc/vmstat Currently these values are in debugfs. But they should be promoted to /proc since they are useful for developers who are writing databases and file servers and are not debugging the kernel. The output is as below: # grep threshold /proc/vmstat nr_pages_dirty_threshold 409111 nr_pages_dirty_background_threshold 818223 This patch: This allows code outside of the mm core to safely manipulate page writeback state and not worry about the other accounting. Not using these routines means that some code will lose track of the accounting and we get bugs. Modify nilfs2 to use interface. Signed-off-by: Michael Rubin <mrubin@google.com> Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Reviewed-by: Wu Fengguang <fengguang.wu@intel.com> Cc: KONISHI Ryusuke <konishi.ryusuke@lab.ntt.co.jp> Cc: Jiro SEKIBA <jir@unicus.jp> Cc: Dave Chinner <david@fromorbit.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* nilfs2: eliminate sparse warning - "context imbalance"Jiro SEKIBA2010-10-231-0/+2
| | | | | | | | | | | | | insert sparse annotations to fix following sparse warning. fs/nilfs2/segment.c:2681:3: warning: context imbalance in 'nilfs_segctor_kill_thread' - unexpected unlock nilfs_segctor_kill_thread is only called inside sc_state_lock lock. sparse doesn't detect the context and warn "unexpected unlock". __acquires/__releases pretend to lock/unlock the sc_state_lock for sparse. Signed-off-by: Jiro SEKIBA <jir@unicus.jp> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
* nilfs2: add bdev freeze/thaw supportRyusuke Konishi2010-10-231-0/+2
| | | | | | | | | | | Nilfs hasn't supported the freeze/thaw feature because it didn't work due to the peculiar design that multiple super block instances could be allocated for a device. This limitation was removed by the patch "nilfs2: do not allocate multiple super block instances for a device". So now this adds the freeze/thaw support to nilfs. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
* nilfs2: get rid of back pointer to writable sb instanceRyusuke Konishi2010-10-231-4/+0
| | | | | | | | | | | Nilfs object holds a back pointer to a writable super block instance in nilfs->ns_writer, and this became eliminable since sb is now made per device and all inodes have a valid pointer to it. This deletes the ns_writer pointer and a reader/writer semaphore protecting it. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
* nilfs2: get rid of GCDAT inodeRyusuke Konishi2010-10-231-7/+7
| | | | | | | This applies prepared rollback function and redirect function of metadata file to DAT file, and eliminates GCDAT inode. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
* nilfs2: add routines to redirect access to buffers of DAT fileRyusuke Konishi2010-10-231-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | During garbage collection (GC), DAT file, which converts virtual block number to real block number, may return disk block number that is not yet written to the device. To avoid access to unwritten blocks, the current implementation stores changes to the caches of GCDAT during GC and atomically commit the changes into the DAT file after they are written to the device. This patch, instead, adds a function that makes a copy of specified buffer and stores it in nilfs_shadow_map, and a function to get the backup copy as needed (nilfs_mdt_freeze_buffer and nilfs_mdt_get_frozen_buffer respectively). Before DAT changes block number in an entry block, it makes a copy and redirect access to the buffer so that address conversion function (i.e. nilfs_dat_translate) refers to the old address saved in the copy. This patch gives requisites for such redirection. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
* nilfs2: move inode count and block count into root objectRyusuke Konishi2010-10-231-2/+2
| | | | | | | This moves sbi->s_inodes_count and sbi->s_blocks_count into nilfs_root object. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
* nilfs2: use root object to get ifileRyusuke Konishi2010-10-231-17/+26
| | | | | | | | This rewrites functions using ifile so that they get ifile from nilfs_root object, and will remove sbi->s_ifile. Some functions that don't know the root object are extended to receive it from caller. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
* nilfs2: remove own inode hash used for GCRyusuke Konishi2010-10-231-2/+1
| | | | | | | | This uses inode hash function that vfs provides instead of the own hash table for caching gc inodes. This finally removes the own inode hash from nilfs. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
* nilfs2: keep zero value in i_cno except for gc-inodesRyusuke Konishi2010-10-231-15/+14
| | | | | | | | | | | | | | On-memory inode structures of nilfs have a member "i_cno" which stores a checkpoint number related to the inode. For gc-inodes, this field indicates version of data each gc-inode caches for GC. Log writer temporarily uses "i_cno" to transfer the latest checkpoint number. This stops the latter use and lets only gc-inodes use it. The purpose of this patch is to allow the successive change use "i_cno" for inode lookup. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
* nilfs2: do not update log cursor for small changeRyusuke Konishi2010-07-231-1/+0
| | | | | | | | | | | | | Super blocks of nilfs are periodically overwritten in order to record the recent log position. This shortens recovery time after unclean unmount, but the current implementation performs the update even for a few blocks of change. If the filesystem gets small changes slowly and continually, super blocks may be updated excessively. This moderates the issue by skipping update of log cursor if it does not cross a segment boundary. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
* nilfs2: sync super blocks in turnsJiro SEKIBA2010-07-231-4/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | This will sync super blocks in turns instead of syncing duplicate super blocks at the time. This will help searching valid super root when super block is written into disk before log is written, which is happen when barrier-less block devices are unmounted uncleanly. In the situation, old super block likely points to valid log. This patch introduces ns_sbwcount member to the nilfs object and adds nilfs_sb_will_flip() function; ns_sbwcount counts how many times super blocks write back to the disk. And, nilfs_sb_will_flip() decides whether flipping required or not based on the count of ns_sbwcount to sync super blocks asymmetrically. The following functions are also changed: - nilfs_prepare_super(): flips super blocks according to the argument. The argument is calculated by nilfs_sb_will_flip() function. - nilfs_cleanup_super(): sets "clean" flag to both super blocks if they point to the same checkpoint. To update both of super block information, caller of nilfs_commit_super must set the information on both super blocks. Signed-off-by: Jiro SEKIBA <jir@unicus.jp> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
* nilfs2: introduce nilfs_prepare_superJiro SEKIBA2010-07-231-2/+6
| | | | | | | | | | This function checks validity of super block pointers. If first super block is invalid, it will swap the super blocks. The function should be called before any super block information updates. Caller must obtain nilfs->ns_sem. Signed-off-by: Jiro SEKIBA <jir@unicus.jp> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
* nilfs2: get rid of macros for segment summary informationRyusuke Konishi2010-07-231-4/+4
| | | | | | | This removes macros to test segment summary flags and redefines a few relevant macros with inline functions. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
* nilfs2: make nilfs_sc_*_ops staticRyusuke Konishi2010-05-101-3/+3
| | | | | | | | | | This kills the following sparse warnings: fs/nilfs2/segment.c:567:28: warning: symbol 'nilfs_sc_file_ops' was not declared. Should it be static? fs/nilfs2/segment.c:617:28: warning: symbol 'nilfs_sc_dat_ops' was not declared. Should it be static? fs/nilfs2/segment.c:625:28: warning: symbol 'nilfs_sc_dsync_ops' was not declared. Should it be static? Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
* nilfs2: change sc_timer from a pointer to an embedded one in struct ↵Li Hong2010-05-101-18/+13
| | | | | | | | | | | | | | nilfs_sc_info In nilfs_segctor_thread(), timer is a local variable allocated on stack. Its address can't be set to sci->sc_timer and passed in several procedures. It works now by chance, just because other procedures are called by nilfs_segctor_thread() directly or indirectly and the stack hasn't been deallocated yet. Signed-off-by: Li Hong <lihong.hi@gmail.com> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
* nilfs2: remove nilfs_segctor_init() in segment.cLi Hong2010-05-101-8/+1
| | | | | | | | | | | | There are only two lines of code in nilfs_segctor_init(). From a logic design view, the first line 'sci->sc_seq_done = sci->sc_seq_request;' should be put in nilfs_segctor_new(). Even in nilfs_segctor_new(), this initialization is needless because sci is kzalloc-ed. So nilfs_segctor_init() is only a wrap call to nilfs_segctor_start_thread(). Signed-off-by: Li Hong <lihong.hi@gmail.com> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
* nilfs2: insert checkpoint number in segment summary headerRyusuke Konishi2010-05-101-1/+2
| | | | | | | | | | | | | | This adds a field to record the latest checkpoint number in the nilfs_segment_summary structure. This will help to recover the latest checkpoint number from logs on disk. This field is intended for crucial cases in which super blocks have lost pointer to the latest log. Even though this will change the disk format, both backward and forward compatibility is preserved by a size field prepared in the segment summary header. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
* nilfs2: cleanup multi kmem_cache_{create,destroy} codeLi Hong2010-05-101-36/+0
| | | | | | | | | | | | | | | | | | This cleanup patch gives several improvements: - Moving all kmem_cache_{create_destroy} calls into one place, which removes some small function calls, cleans up error check code and clarify the logic. - Mark all initial code in __init section. - Remove some very obvious comments. - Adjust some declarations. - Fix some space-tab issues. Signed-off-by: Li Hong <lihong.hi@gmail.com> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
* nilfs2: move out checksum routines to segment buffer codeRyusuke Konishi2010-05-101-30/+3
| | | | | | | This moves out checksum routines in log writer to segbuf.c for cleanup. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
* nilfs2: move pointer to super root block into logsRyusuke Konishi2010-05-101-24/+21
| | | | | | | This moves a pointer to buffer storing super root block to each log buffer from nilfs_sc_info struct for simplicity. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
* include cleanup: Update gfp.h and slab.h includes to prepare for breaking ↵Tejun Heo2010-03-301-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | implicit slab.h inclusion from percpu.h percpu.h is included by sched.h and module.h and thus ends up being included when building most .c files. percpu.h includes slab.h which in turn includes gfp.h making everything defined by the two files universally available and complicating inclusion dependencies. percpu.h -> slab.h dependency is about to be removed. Prepare for this change by updating users of gfp and slab facilities include those headers directly instead of assuming availability. As this conversion needs to touch large number of source files, the following script is used as the basis of conversion. http://userweb.kernel.org/~tj/misc/slabh-sweep.py The script does the followings. * Scan files for gfp and slab usages and update includes such that only the necessary includes are there. ie. if only gfp is used, gfp.h, if slab is used, slab.h. * When the script inserts a new include, it looks at the include blocks and try to put the new include such that its order conforms to its surrounding. It's put in the include block which contains core kernel includes, in the same order that the rest are ordered - alphabetical, Christmas tree, rev-Xmas-tree or at the end if there doesn't seem to be any matching order. * If the script can't find a place to put a new include (mostly because the file doesn't have fitting include block), it prints out an error message indicating which .h file needs to be added to the file. The conversion was done in the following steps. 1. The initial automatic conversion of all .c files updated slightly over 4000 files, deleting around 700 includes and adding ~480 gfp.h and ~3000 slab.h inclusions. The script emitted errors for ~400 files. 2. Each error was manually checked. Some didn't need the inclusion, some needed manual addition while adding it to implementation .h or embedding .c file was more appropriate for others. This step added inclusions to around 150 files. 3. The script was run again and the output was compared to the edits from #2 to make sure no file was left behind. 4. Several build tests were done and a couple of problems were fixed. e.g. lib/decompress_*.c used malloc/free() wrappers around slab APIs requiring slab.h to be added manually. 5. The script was run on all .h files but without automatically editing them as sprinkling gfp.h and slab.h inclusions around .h files could easily lead to inclusion dependency hell. Most gfp.h inclusion directives were ignored as stuff from gfp.h was usually wildly available and often used in preprocessor macros. Each slab.h inclusion directive was examined and added manually as necessary. 6. percpu.h was updated not to include slab.h. 7. Build test were done on the following configurations and failures were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my distributed build env didn't work with gcov compiles) and a few more options had to be turned off depending on archs to make things build (like ipr on powerpc/64 which failed due to missing writeq). * x86 and x86_64 UP and SMP allmodconfig and a custom test config. * powerpc and powerpc64 SMP allmodconfig * sparc and sparc64 SMP allmodconfig * ia64 SMP allmodconfig * s390 SMP allmodconfig * alpha SMP allmodconfig * um on x86_64 SMP allmodconfig 8. percpu.h modifications were reverted so that it could be applied as a separate patch and serve as bisection point. Given the fact that I had only a couple of failures from tests on step 6, I'm fairly confident about the coverage of this conversion patch. If there is a breakage, it's likely to be something in one of the arch headers which should be easily discoverable easily on most builds of the specific arch. Signed-off-by: Tejun Heo <tj@kernel.org> Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
* nilfs2: fix hang-up of cleaner after log writer returned with errorRyusuke Konishi2010-03-241-2/+1
| | | | | | | | | | | | | | | | | | According to the report from Andreas Beckmann (Message-ID: <4BA54677.3090902@abeckmann.de>), nilfs in 2.6.33 kernel got stuck after a disk full error. This turned out to be a regression by log writer updates merged at kernel 2.6.33. nilfs_segctor_abort_construction, which is a cleanup function for erroneous cases, was skipping writeback completion for some logs. This fixes the bug and would resolve the hang issue. Reported-by: Andreas Beckmann <debian@abeckmann.de> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Tested-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Cc: stable <stable@kernel.org> [2.6.33.x]
* nilfs2: fix duplicate call to nilfs_segctor_cancel_freevRyusuke Konishi2010-03-221-6/+6
| | | | | | | | | | | | | | Andreas Beckmann gave me a report that nilfs logged the following warnings when it got a disk full: nilfs_sufile_do_cancel_free: segment 0 must be clean nilfs_sufile_do_cancel_free: segment 1 must be clean These arise from a duplicate call to nilfs_segctor_cancel_freev in an error path of log writer. This will fix the issue. Reported-by: Andreas Beckmann <debian@abeckmann.de> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
* nilfs2: fix various typos in commentsRyusuke Konishi2010-03-141-2/+2
| | | | | | This fixes various typos I found in comments of nilfs2. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
* nilfs2: fix function name typos in docbook commentsRyusuke Konishi2010-03-141-2/+2
| | | | | | | | | Fixes the following typos in docbook comments: nilfs_detroy_transaction_cache -> nilfs_destroy_transaction_cache nilfs_secgtor_start_timer -> nilfs_segctor_start_timer Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
* nilfs2: move iterator to write log into segment bufferRyusuke Konishi2010-02-131-7/+2
| | | | | | | | This moves iterator to submit write requests for a series of logs into segbuf.c, and hides nilfs_segbuf_write() and nilfs_segbuf_wait() in the file. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
* nilfs2: get rid of s_dirt flag useRyusuke Konishi2010-02-131-6/+5
| | | | | | | | | | | This replaces s_dirt flag use in nilfs with a new flag added on the nilfs object. The s_dirt flag was used to indicate if sop->write_super() should be called, however the current version of nilfs does not use the callback. Thus, it can be replaced with the own flag. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Cc: Jiro SEKIBA <jir@unicus.jp>
* nilfs2: get rid of nilfs_segctor_req structRyusuke Konishi2010-02-131-41/+38
| | | | | | | This will clean up nilfs_segctor_req struct and the obscure request argument passed among private methods of segment constructor. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
* nilfs2: fix potential hang in nilfs_error on errors=remount-roRyusuke Konishi2010-02-131-2/+9
| | | | | | | | | | | | | | | | nilfs_error() calls nilfs_detach_segment_constructor() if errors=remount-ro option is specified, and this may lead to a hang due to recursive locking of, for instance, nilfs->ns_segctor_sem and others. In this case, detaching segment constructor is not necessary because read-only flag is set to the filesystem and further writes are blocked. This fixes the potential hang issue by removing the nilfs_detach_segment_constructor() call from nilfs_error. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
* nilfs2: issue discard request after cleaning segmentsJiro SEKIBA2010-02-131-0/+10
| | | | | | | | | This adds a function to send discard requests for given array of segment numbers, and calls the function when garbage collection succeeded. Signed-off-by: Jiro SEKIBA <jir@unicus.jp> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
* nilfs2: fix potential leak of dirty data on umountRyusuke Konishi2010-01-311-1/+1
| | | | | | | | | This fixes incorrect usage of nilfs_segctor_confirm() test function in nilfs_segctor_destroy(); nilfs_segctor_confirm() returns zero if the filesystem is not clean, so its use in nilfs_segctor_destroy() needs inversion. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
* nilfs2: separate wait function from nilfs_segctor_writeRyusuke Konishi2009-11-301-95/+145
| | | | | | | | | | | | | | | | This separates wait function for submitted logs from the write function nilfs_segctor_write(). A new list of segment buffers "sc_write_logs" is added to hold logs under writing, and double buffering is partially applied to hide io latency. At this point, the double buffering is disabled for blocksize < pagesize because page dirty flag is turned off during write and dirty buffers are not properly collected for pages crossing over segments. To receive full benefit of the double buffering, further refinement is needed to move the io wait outside the lock section of log writer. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
* nilfs2: add iterator for segment buffersRyusuke Konishi2009-11-301-39/+12
| | | | | | | This adds a few iterator functions for segment buffers to make it easy to handle multiple series of logs. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
* nilfs2: hide nilfs_write_info struct in segment buffer codeRyusuke Konishi2009-11-301-9/+3
| | | | | | | Hides nilfs_write_info struct and nilfs_segbuf_prepare_write function in segbuf.c to simplify the interface of nilfs_segbuf_write function. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
* nilfs2: relocate io status variables to segment bufferRyusuke Konishi2009-11-301-11/+8
| | | | | | | | | This moves io status variables in nilfs_write_info struct to nilfs_segment_buffer struct. This is a preparation to hide nilfs_write_info in segment buffer code. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
* nilfs2: use list_splice_tail or list_splice_tail_initRyusuke Konishi2009-11-291-2/+2
| | | | | | | | This applies list_splice_tail (or list_splice_tail_init) operation instead of list_splice (or list_splice_init, respectively) to append a new list to tail of an existing list. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
* nilfs2: move routine to set segment usage into sufileRyusuke Konishi2009-11-201-22/+10
| | | | | | | This adds nilfs_sufile_set_segment_usage() function in sufile to replace direct access to the sufile metadata in log writer code. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
* nilfs2: move routine marking segment usage dirty into sufileRyusuke Konishi2009-11-201-17/+2
| | | | | | | | | This adds nilfs_sufile_mark_dirty() function in sufile to replace nilfs_touch_segusage() function in log writer code. This is a preparation for the further cleanup which will move out low level sufile operations in the log writer. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
* nilfs2: eliminate inlines to directly read/write inode of metadata filesRyusuke Konishi2009-11-201-6/+6
| | | | | | | Removes two inline functions: nilfs_mdt_read_inode_direct() and nilfs_mdt_write_inode_direct(). Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
* nilfs2: fix irregular checkpoint creation due to data flushRyusuke Konishi2009-11-031-6/+11
| | | | | | | | | | | | | When nilfs flushes out dirty data to reduce memory pressure, creation of checkpoints is wrongly postponed. This bug causes irregular checkpoint creation especially in small footprint systems. To correct this issue, a timer for the checkpoint creation has to be continued if a log writer does not create a checkpoint. This will do the correction. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
* nilfs2: stop using periodic write_super callbackJiro SEKIBA2009-09-141-1/+6
| | | | | | | | | | | | | | | This removes nilfs_write_super and commit super block in nilfs internal thread, instead of periodic write_super callback. VFS layer calls ->write_super callback periodically. However, it looks like that calling back is ommited when disk I/O is busy. And when cleanerd (nilfs GC) is runnig, disk I/O tend to be busy thus nilfs superblock is not synchronized as nilfs designed. To avoid it, syncing superblock by nilfs thread instead of pdflush. Signed-off-by: Jiro SEKIBA <jir@unicus.jp> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
* nilfs2: fix oops due to inconsistent state in page with discrete b-tree nodesRyusuke Konishi2009-08-011-1/+15
| | | | | | | | | | | | | | | | | | | | Andrea Gelmini gave me a report that a kernel oops hit on a nilfs filesystem with a 1KB block size when doing rsync. This turned out to be caused by an inconsistency of dirty state between a page and its buffers storing b-tree node blocks. If the page had multiple buffers split over multiple logs, and if the logs were written at a time, a dirty flag remained in the page even every dirty flag in the buffers was cleared. This will fix the failure by dropping the dirty flag properly for pages with the discrete multiple b-tree nodes. Reported-by: Andrea Gelmini <andrea.gelmini@gmail.com> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Tested-by: Andrea Gelmini <andrea.gelmini@gmail.com> Cc: stable@kernel.org
* nilfs2: fix hang problem of log writer which occurs after write failuresRyusuke Konishi2009-07-051-20/+6
| | | | | | | | | | | | | Leandro Lucarella gave me a report that nilfs gets stuck after its write function fails. The problem turned out to be caused by bugs which leave writeback flag on pages. This fixes the problem by ensuring to clear the writeback flag in error path. Reported-by: Leandro Lucarella <llucax@gmail.com> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Cc: stable@kernel.org
* nilfs2: remove unlikely directive causing mis-conversion of error codeRyusuke Konishi2009-07-051-2/+2
| | | | | | | | | | | | | | | | | | The following error code handling in nilfs_segctor_write() function wrongly converted negative error codes to a truth value (i.e. 1): err = unlikely(err) ? : res; which originaly meant to be err = err ? : res; This mis-conversion caused that write or sync functions receive the unexpected error code. This fixes the bug by removing the unlikely directive. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Cc: stable@kernel.org
* nilfs2: eliminate removal list of segmentsRyusuke Konishi2009-06-101-95/+35
| | | | | | | | | | | | This will clean up the removal list of segments and the related functions from segment.c and ioctl.c, which have hurt code readability. This elimination is applied by using nilfs_sufile_updatev() previously introduced in the patch ("nilfs2: add sufile function that can modify multiple segment usages"). Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
* nilfs2: fix lock order reversal in nilfs_clean_segments ioctlRyusuke Konishi2009-05-111-2/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is a companion patch to ("nilfs2: fix possible circular locking for get information ioctls"). This corrects lock order reversal between mm->mmap_sem and nilfs->ns_segctor_sem in nilfs_clean_segments() which was detected by lockdep check: ======================================================= [ INFO: possible circular locking dependency detected ] 2.6.30-rc3-nilfs-00003-g360bdc1 #7 ------------------------------------------------------- mmap/5294 is trying to acquire lock: (&nilfs->ns_segctor_sem){++++.+}, at: [<d0d0e846>] nilfs_transaction_begin+0xb6/0x10c [nilfs2] but task is already holding lock: (&mm->mmap_sem){++++++}, at: [<c043700a>] do_page_fault+0x1d8/0x30a which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #1 (&mm->mmap_sem){++++++}: [<c01470a5>] __lock_acquire+0x1066/0x13b0 [<c01474a9>] lock_acquire+0xba/0xdd [<c01836bc>] might_fault+0x68/0x88 [<c023c61d>] copy_from_user+0x2a/0x111 [<d0d120d0>] nilfs_ioctl_prepare_clean_segments+0x1d/0xf1 [nilfs2] [<d0d0e2aa>] nilfs_clean_segments+0x6d/0x1b9 [nilfs2] [<d0d11f68>] nilfs_ioctl+0x2ad/0x318 [nilfs2] [<c01a3be7>] vfs_ioctl+0x22/0x69 [<c01a408e>] do_vfs_ioctl+0x460/0x499 [<c01a4107>] sys_ioctl+0x40/0x5a [<c01031a4>] sysenter_do_call+0x12/0x38 [<ffffffff>] 0xffffffff -> #0 (&nilfs->ns_segctor_sem){++++.+}: [<c0146e0b>] __lock_acquire+0xdcc/0x13b0 [<c01474a9>] lock_acquire+0xba/0xdd [<c0433f1d>] down_read+0x2a/0x3e [<d0d0e846>] nilfs_transaction_begin+0xb6/0x10c [nilfs2] [<d0cfe0e5>] nilfs_page_mkwrite+0xe7/0x154 [nilfs2] [<c0183b0b>] __do_fault+0x165/0x376 [<c01855cd>] handle_mm_fault+0x287/0x5d1 [<c043712d>] do_page_fault+0x2fb/0x30a [<c0435462>] error_code+0x72/0x78 [<ffffffff>] 0xffffffff where nilfs_clean_segments() holds: nilfs->ns_segctor_sem -> copy_from_user() --> page fault -> mm->mmap_sem And, page fault path may hold: page fault -> mm->mmap_sem --> nilfs_page_mkwrite() -> nilfs->ns_segctor_sem Even though nilfs_clean_segments() does not perform write access on given user pages, it may cause deadlock because nilfs->ns_segctor_sem is shared per device and mm->mmap_sem can be shared with other tasks. To avoid this problem, this patch moves all calls of copy_from_user() outside the nilfs->ns_segctor_sem lock in the ioctl. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
* nilfs2: introduce secondary super blockRyusuke Konishi2009-04-071-5/+3
| | | | | | | | | | | | | | | | | | The former versions didn't have extra super blocks. This improves the weak point by introducing another super block at unused region in tail of the partition. This doesn't break disk format compatibility; older versions just ingore the secondary super block, and new versions just recover it if it doesn't exist. The partition created by an old mkfs may not have unused region, but in that case, the secondary super block will not be added. This doesn't make more redundant copies of the super block; it is a future work. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* nilfs2: simplify handling of active state of segmentsRyusuke Konishi2009-04-071-162/+11
| | | | | | | | | | | | | | | | | | | will reduce some lines of segment constructor. Previously, the state was complexly controlled through a list of segments in order to keep consistency in meta data of usage state of segments. Instead, this presents ``calculated'' active flags to userland cleaner program and stop maintaining its real flag on disk. Only by this fake flag, the cleaner cannot exactly know if each segment is reclaimable or not. However, the recent extension of nilfs_sustat ioctl struct (nilfs2-extend-nilfs_sustat-ioctl-struct.patch) can prevent the cleaner from reclaiming in-use segment wrongly. So, now I can apply this for simplification. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* nilfs2: mark minor flag for checkpoint created by internal operationRyusuke Konishi2009-04-071-0/+9
| | | | | | | | | | | | | | | | | | Nilfs creates checkpoints even for garbage collection or metadata updates such as checkpoint mode change. So, user often sees checkpoints created only by such internal operations. This is inconvenient in some situations. For example, application that monitors checkpoints and changes them to snapshots, will fall into an infinite loop because it cannot distinguish internally created checkpoints. This patch solves this sort of problem by adding a flag to checkpoint for identification. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>