aboutsummaryrefslogtreecommitdiffstats
path: root/fs
Commit message (Collapse)AuthorAgeFilesLines
...
| | * ext4: Remove journal_checksum mount option and enable it by defaultTheodore Ts'o2009-09-052-15/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | There's no real cost for the journal checksum feature, and we should make sure it is enabled all the time. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
| | * ext4: Declare seq_operations and file_operations structures as constTobias Klauser2009-09-051-4/+4
| | | | | | | | | | | | | | | Signed-off-by: Tobias Klauser <tklauser@distanz.ch> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
| | * ext4: Add new tracepoint: trace_ext4_da_write_pages()Theodore Ts'o2009-08-312-12/+16
| | | | | | | | | | | | | | | | | | | | | Add a new tracepoint which shows the pages that will be written using write_cache_pages() by ext4_da_writepages(). Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
| | * ext4: Restore wbc->range_start in ext4_da_writepages()Theodore Ts'o2009-08-311-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | To solve a lock inversion problem, we implement part of the range_cyclic algorithm in ext4_da_writepages(). (See commit 2acf2c26 for more details.) As part of that change wbc->range_start was modified by ext4's writepages function, which causes its callers to get confused since they aren't expecting the filesystem to modify it. The simplest fix is to save and restore wbc->range_start in ext4_da_writepages. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
| | * ext4: Limit number of links that can be created by ext4_link()Theodore Ts'o2009-08-291-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | In ext4_link we need to check using EXT4_LINK_MAX, and not EXT4_DIR_LINK_MAX(), since ext4_link() is creating hard links of regular files, and not directories. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
| | * ext4: Allow rename to create more than EXT4_LINK_MAX subdirectoriesAneesh Kumar K.V2009-08-281-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | Use EXT4_DIR_LINK_MAX so that rename() can move a directory into new parent directory without running into the EXT4_LINK_MAX limit. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
| | * ext4: fix extent sanity checking code with AGGRESSIVE_TESTTheodore Ts'o2009-08-281-26/+34
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The extents sanity-checking code depends on the ext4_ext_space_*() functions returning the maximum alloable size for eh_max; however, when the debugging #ifdef AGGRESSIVE_TEST is enabled to test the extent tree handling code, this prevents a normally created ext4 filesystem from being mounted with the errors: Aug 26 15:43:50 bsd086 kernel: [ 96.070277] EXT4-fs error (device sda8): ext4_ext_check_inode: bad header/extent in inode #8: too large eh_max - magic f30a, entries 1, max 4(3), depth 0(0) Aug 26 15:43:50 bsd086 kernel: [ 96.070526] EXT4-fs (sda8): no journal found Bug reported by Akira Fujita. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
| | * ext4: use ext4_grpblk_t more extensivelyEric Sandeen2009-08-253-16/+19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | unsigned short is potentially too small to track blocks within a group; today it is safe due to restrictions in e2fsprogs but we have _lo / _hi bits for group blocks with the intent to go up to 32 bits, so clean this up now. There are many more places where we use unsigned/int/unsigned int to contain a group block but this should at least fix all the short types. I added a few comments to the struct ext4_group_info definition as well. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
| | * ext4: use variables not types in sizeofs() for allocationsEric Sandeen2009-08-251-3/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | Precursor to changing some types; to keep things in sync, it seems better to allocate/memset based on the size of the variables we are using rather than on some disconnected basic type like "unsigned short" Signed-off-by: Eric Sandeen <sandeen@redhat.com>
| | * ext4: Add missing unlock_new_inode() call in extent migration codeAneesh Kumar K.V2009-08-251-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We need to unlock the new inode before iput. This patch fixes the following warning when calling chattr +e to migrate a file to use extents. It also fixes problems in when e4defrag attempts to defragment an inode. [ 470.400044] ------------[ cut here ]------------ [ 470.400065] WARNING: at fs/inode.c:1210 generic_delete_inode+0x65/0x16a() [ 470.400072] Hardware name: N/A ..... ... [ 470.400353] Pid: 4451, comm: chattr Not tainted 2.6.31-rc7-red-debug #4 [ 470.400359] Call Trace: [ 470.400372] [<ffffffff81037771>] warn_slowpath_common+0x77/0x8f [ 470.400385] [<ffffffff81037798>] warn_slowpath_null+0xf/0x11 [ 470.400395] [<ffffffff810b7f28>] generic_delete_inode+0x65/0x16a [ 470.400405] [<ffffffff810b8044>] generic_drop_inode+0x17/0x1bd [ 470.400413] [<ffffffff810b7083>] iput+0x61/0x65 [ 470.400455] [<ffffffffa003b229>] ext4_ext_migrate+0x5eb/0x66a [ext4] [ 470.400492] [<ffffffffa002b1f8>] ext4_ioctl+0x340/0x756 [ext4] [ 470.400507] [<ffffffff810b1a91>] vfs_ioctl+0x1d/0x82 [ 470.400517] [<ffffffff810b1ff0>] do_vfs_ioctl+0x483/0x4c9 [ 470.400527] [<ffffffff81059c30>] ? trace_hardirqs_on+0xd/0xf [ 470.400537] [<ffffffff810b2087>] sys_ioctl+0x51/0x74 [ 470.400549] [<ffffffff8100ba6b>] system_call_fastpath+0x16/0x1b [ 470.400557] ---[ end trace ab85723542352dac ]--- Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
| | * ext4: Add feature set check helper for mount & remount pathsEric Sandeen2009-08-181-42/+49
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | A user reported that although his root ext4 filesystem was mounting fine, other filesystems would not mount, with the: "Filesystem with huge files cannot be mounted RDWR without CONFIG_LBDAF" error on his 32-bit box built without CONFIG_LBDAF. This is because the test at mount time for this situation was not being re-checked on remount, and the normal boot process makes an ro->rw transition, so this was being missed. Refactor to make a common helper function to test the filesystem features against the type of mount request (RO vs. RW) so that we stay consistent. Addresses Red-Hat-Bugzilla: #517650 Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
| | * simplify some logic in ext4_mb_normalize_requestEric Sandeen2009-08-171-9/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | While reading through some of the mballoc code it seems that a couple spots in the size normalization function could be streamlined. The test for non-overlapping PAs can be or'd for the start & end conditions, and the tests for adjacent PAs can be else-if'd - it's essentially independently testing: if (A + B <= C) ... if (A > C) ... These cannot both be true so it seems like the else-if might be slightly more efficient and/or informative. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
| | * ext4: open-code ext4_mb_update_group_infoEric Sandeen2009-08-173-12/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ext4_mb_update_group_info is only called in one place, and it's extremely simple. There's no reason to have it in a separate function in a separate file as far as I can tell, it just obfuscates what's really going on. Perhaps it was intended to keep the grp->bb_* manipulation local to mballoc.c but we're already accessing other grp-> fields in balloc.c directly so this seems ok. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
| | * ext4: reject too-large filesystems on 32-bit kernelsEric Sandeen2009-08-171-3/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ext4 will happily mount a > 16T filesystem on a 32-bit box, but this is not safe; writes to the block device will wrap past 16T and the page cache can't index past 16T (232 index * 4k pages). Adding another test to the existing "too many sectors" test should do the trick. Add a comment, a relevant return value, and fix the reference to the CONFIG_LBD(AF) option as well. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
| | * ext4: Fix possible deadlock between ext4_truncate() and ext4_get_blocks()Jan Kara2009-08-173-7/+32
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | During truncate we are sometimes forced to start a new transaction as the amount of blocks to be journaled is both quite large and hard to predict. So far we restarted a transaction while holding i_data_sem and that violates lock ordering because i_data_sem ranks below a transaction start (and it can lead to a real deadlock with ext4_get_blocks() mapping blocks in some page while having a transaction open). We fix the problem by dropping the i_data_sem before restarting the transaction and acquire it afterwards. It's slightly subtle that this works: 1) By the time ext4_truncate() is called, all the page cache for the truncated part of the file is dropped so get_block() should not be called on it (we only have to invalidate extent cache after we reacquire i_data_sem because some extent from not-truncated part could extend also into the part we are going to truncate). 2) Writes, migrate or defrag hold i_mutex so they are stopped for all the time of the truncate. This bug has been found and analyzed by Theodore Tso <tytso@mit.edu>. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
| | * jbd2: Annotate transaction start also for jbd2_journal_restart()Jan Kara2009-08-171-2/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | lockdep annotation for a transaction start has been at the end of jbd2_journal_start(). But a transaction is also started from jbd2_journal_restart(). Move the lockdep annotation to start_this_handle() which covers both cases. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
| | * ext4: Show unwritten extent flag in ext4_ext_show_leaf()Mingming2009-09-182-11/+23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ext4_ext_show_leaf() will display the leaf extents when extent debugging is enabled. Printing out the unwritten bit is useful for debugging unwritten extent, allow us to see the unwritten extents vs written extents, after the unwritten extents are splitted or converted. Signed-off-by: Mingming Cao <cmm@us.ibm.com>
| | * ext4: Compile warning fix when EXT_DEBUG enabledMingming2009-09-011-4/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When EXT_DEBUG is enabled I received the following compile warning on PPC64: CC [M] fs/ext4/inode.o CC [M] fs/ext4/extents.o fs/ext4/extents.c: In function ‘ext4_ext_rm_leaf’: fs/ext4/extents.c:2097: warning: format ‘%lu’ expects type ‘long unsigned int’, but argument 2 has type ‘ext4_lblk_t’ fs/ext4/extents.c: In function ‘ext4_ext_get_blocks’: fs/ext4/extents.c:2789: warning: format ‘%u’ expects type ‘unsigned int’, but argument 4 has type ‘long unsigned int’ fs/ext4/extents.c:2852: warning: format ‘%lu’ expects type ‘long unsigned int’, but argument 3 has type ‘ext4_lblk_t’ fs/ext4/extents.c:2953: warning: format ‘%lu’ expects type ‘long unsigned int’, but argument 4 has type ‘unsigned int’ CC [M] fs/ext4/migrate.o The patch fixes compile warning. Signed-off-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Index: linux-2.6.31-rc4/fs/ext4/extents.c ===================================================================
| | * ext4: Avoid group preallocation for closed filesTheodore Ts'o2009-09-182-2/+38
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently the group preallocation code tries to find a large (512) free block from which to do per-cpu group allocation for small files. The problem with this scheme is that it leaves the filesystem horribly fragmented. In the worst case, if the filesystem is unmounted and remounted (after a system shutdown, for example) we forget the fact that wee were using a particular (now-partially filled) 512 block extent. So the next time we try to allocate space for a small file, we will find *another* completely free 512 block chunk to allocate small files. Given that there are 32,768 blocks in a block group, after 64 iterations of "mount, write one 4k file in a directory, unmount", the block group will have 64 files, each separated by 511 blocks, and the block group will no longer have any free 512 completely free chunks of blocks for group preallocation space. So if we try to allocate blocks for a file that has been closed, such that we know the final size of the file, and the filesystem is not busy, avoid using group preallocation. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
| | * ext4: Fix bugs in mballoc's stream allocation modeTheodore Ts'o2009-08-092-13/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The logic around sbi->s_mb_last_group and sbi->s_mb_last_start was all screwed up. These fields were getting unconditionally all the time, set even when stream allocation had not taken place, and if they were being used when the file was smaller than s_mb_stream_request, which is when the allocation should _not_ be doing stream allocation. Fix this by determining whether or not we stream allocation should take place once, in ext4_mb_group_or_file(), and setting a flag which gets used in ext4_mb_regular_allocator() and ext4_mb_use_best_found(). This simplifies the code and assures that we are consistently using (or not using) the stream allocation logic. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
| | * ext4: Display the mballoc flags in mb_history in hex instead of decimalTheodore Ts'o2009-08-092-13/+13
| | | | | | | | | | | | | | | | | | | | | Displaying the flags in base 16 makes it easier to see which flags have been set. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
| | * ext4: Add configurable run-time mballoc debuggingTheodore Ts'o2009-09-183-26/+80
| | | | | | | | | | | | | | | | | | | | | Allow mballoc debugging to be enabled at run-time instead of just at compile time. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
| | * ext4: fix journal ref count in move_extent_par_pagePeng Tao2009-08-101-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | move_extent_par_page calls a_ops->write_begin() to increase journal handler's reference count. However, if either mext_replace_branches() or ext4_get_block fails, the increased reference count isn't decreased. This will cause a later attempt to umount of the fs to hang forever. The patch addresses the issue by calling ext4_journal_stop() if page is not NULL (which means a_ops->write_end() isn't invoked). Signed-off-by: Peng Tao <bergwolf@gmail.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
| | * jbd2: round commit timer up to avoid uncommitted transactionAndreas Dilger2009-08-101-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | fix jiffie rounding in jbd commit timer setup code. Rounding down could cause the timer to be fired before the corresponding transaction has expired. That transaction can stay not committed forever if no new transaction is created or expicit sync/umount happens. Signed-off-by: Alex Zhuravlev (Tomas) <alex.zhuravlev@sun.com> Signed-off-by: Andreas Dilger <adilger@sun.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
| | * ext4: remove redundant test on unsignedRoel Kluin2009-08-101-3/+1
| | | | | | | | | | | | | | | | | | | | | unsigned i_block cannot be less than 0. Signed-off-by: Roel Kluin <roel.kluin@gmail.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
| | * ext4: fix build warning when EXT4FS_DEBUG is onPeng Tao2009-07-271-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When compiling with EXT4FS_DEBUG on, gcc will complain with following warnings: linux-2.6/fs/ext4/ialloc.c: In function ‘ext4_count_free_inodes’: linux-2.6/fs/ext4/ialloc.c:1192: warning: format ‘%lu’ expects type ‘long unsigned int’, but argument 2 has type ‘ext4_group_t’ So add a type cast to suppress it. Signed-off-by: Peng Tao <bergwolf@gmail.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
| | * ext4: Fix compile warnings with MB_DEBUGAkira Fujita2009-07-051-4/+4
| | | | | | | | | | | | | | | | | | | | | | | | When MB_DEBUG is enabled, we get some compile warnings because ext4_group_t is unsigned int. This patch fixes them. Signed-off-by Akira Fujita <a-fujita@rs.jp.nec.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
| | * ext4: Remove unnecessary semicolons in mballoc.cJoe Perches2009-07-051-1/+1
| | | | | | | | | | | | | | | Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
| | * ext4: More buffer head reference leaksCurt Wohlgemuth2009-07-172-5/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | After the patch I posted last week regarding buffer head ref leaks in no-journal mode, I looked at all the code that uses buffer heads and searched for more potential leaks. The patch below fixes the issues I found; these can occur even when a journal is present. The change to inode.c fixes a double release if ext4_journal_get_create_access() fails. The changes to namei.c are more complicated. add_dirent_to_buf() will release the input buffer head EXCEPT when it returns -ENOSPC. There are some callers of this routine that don't always do the brelse() in the event that -ENOSPC is returned. Unfortunately, to put this fix into ext4_add_entry() required capturing the return value of make_indexed_dir() and add_dirent_to_buf(). Signed-off-by: Curt Wohlgemuth <curtw@google.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
| | * jbd2: Fail to load a journal if it is too shortJan Kara2009-07-171-0/+6
| | | | | | | | | | | | | | | | | | | | | | | | Due to on disk corruption, it can happen that journal is too short. Fail to load it in such case so that we don't oops somewhere later. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
| | * ext4: Avoid null pointer dereference when decoding EROFS w/o a journalTheodore Ts'o2009-07-271-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | We need to check to make sure a journal is present before checking the journal flags in ext4_decode_error(). Signed-off-by: Eric Sesterhenn <eric.sesterhenn@lsexperts.de> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
| | * ext4: Fix typo in ext4/KconfigManish Katiyar2009-07-271-1/+1
| | | | | | | | | | | | | | | Signed-off-by: Manish Katiyar <mkatiyar@gmail.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
| | * ext4: Fix memory leak fix when mounting an ext4 filesystemAneesh Kumar K.V2009-07-171-19/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The allocation of the ext4_group_info array was moved to a new function ext4_mb_add_group_info() in commit 5f21b0e6 so that online resize would use a common (and correct) codepath. Unfortunately, the call to the new ext4_mb_add_group_info() function was added without removing the code which originally allocated the array. This caused a memory leak each time an ext4 filesystem was mounted. The fix is simple; remove the code that did the original allocation, since it is no longer needed. Reported-by: Catalin Marinas <catalin.marinas@arm.com> Tested-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
| * | Merge branch 'for-linus' of ↵Linus Torvalds2009-09-184-15/+231
| |\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse: fuse: add fusectl interface to max_background fuse: limit user-specified values of max background requests fuse: use drop_nlink() instead of direct nlink manipulation fuse: document protocol version negotiation fuse: make the number of max background requests and congestion threshold tunable
| | * | fuse: add fusectl interface to max_backgroundCsaba Henk2009-09-163-5/+141
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Make the max_background and congestion_threshold parameters of a FUSE mount tunable at runtime by adding the respective knobs to its directory within the fusectl filesystem. Signed-off-by: Csaba Henk <csaba@gluster.com> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
| | * | fuse: limit user-specified values of max background requestsCsaba Henk2009-09-161-6/+72
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | An untrusted user could DoS the system if s/he were allowed to accumulate an arbitrary number of pending background requests by setting the above limits to extremely high values in INIT. This patch excludes this possibility by imposing global upper limits on the possible values of per-mount "max background requests" and "congestion threshold" parameters for unprivileged FUSE filesystems. These global limits are implemented as module parameters. Signed-off-by: Csaba Henk <csaba@gluster.com> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
| | * | fuse: use drop_nlink() instead of direct nlink manipulationCsaba Henk2009-09-161-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | drop_nlink() is the API function to decrease the link count of an inode. However, at a place the control filesystem used the decrement operator on i_nlink directly. Fix this. Cc: Anand Avati <avati@gluster.com> Signed-off-by: Csaba Henk <csaba@gluster.com> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
| | * | fuse: make the number of max background requests and congestion threshold ↵Csaba Henk2009-07-073-11/+25
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | tunable The practical values for these limits depend on the design of the filesystem server so let userspace set them at initialization time. Signed-off-by: Csaba Henk <csaba@gluster.com> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
| * | | Merge branch 'for-linus' of ↵Linus Torvalds2009-09-181-10/+16
| |\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/teigland/dlm * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/dlm: dlm: use kernel_sendpage dlm: fix connection close handling dlm: fix double-release of socket in error exit path
| | * | | dlm: use kernel_sendpagePaolo Bonzini2009-08-241-5/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Using kernel_sendpage() is cleaner and safer than following sock->ops ourselves. Signed-off-by: Paolo Bonzini <bonzini@gnu.org> Signed-off-by: David Teigland <teigland@redhat.com>
| | * | | dlm: fix connection close handlingLars Marowsky-Bree2009-08-241-2/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Closing a connection to a node can create problems if there are outstanding messages for that node. The problems include dlm_send spinning attempting to reconnect, or BUG from tcp_connect_to_sock() attempting to use a partially closed connection. To cleanly close a connection, we now first attempt to send any pending messages, cancel any remaining workqueue work, and flag the connection as closed to avoid reconnect attempts. Signed-off-by: Lars Marowsky-Bree <lmb@suse.de> Signed-off-by: Christine Caulfield <ccaulfie@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
| | * | | dlm: fix double-release of socket in error exit pathCasey Dahlin2009-08-181-3/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The last correction to the tcp_connect_to_sock error exit path, commit a89d63a159b1ba5833be2bef00adf8ad8caac8be, can free an already freed socket, due to collision with a previous (incomplete) attempt to fix the same issue, commit 311f6fc77c51926dbdfbeab0a5d88d70f01fa3f4. Signed-off-by: Casey Dahlin <cdahlin@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
| * | | | Merge branch 'for_linus' of ↵Linus Torvalds2009-09-188-45/+76
| |\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs-2.6 * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs-2.6: ext3: Flush disk caches on fsync when needed ext3: Add locking to ext3_do_update_inode ext3: Fix possible deadlock between ext3_truncate() and ext3_get_blocks() jbd: Annotate transaction start also for journal_restart() jbd: Journal block numbers can ever be only 32-bit use unsigned int for them ext3: Update MAINTAINERS for ext3 and JBD JBD: round commit timer up to avoid uncommitted transaction
| | * | | | ext3: Flush disk caches on fsync when neededJan Kara2009-09-161-1/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In case we fsync() a file and inode is not dirty, we don't force a transaction to disk and hence don't flush disk caches. Thus file data could be just in disk caches and not on persistent storage. Fix the problem by flushing disk caches if we didn't force a transaction commit. Signed-off-by: Jan Kara <jack@suse.cz>
| | * | | | ext3: Add locking to ext3_do_update_inodeChris Mason2009-09-161-0/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | I've been struggling with this off and on while I've been testing the data=guarded work. The symptom is corrupted orphan lists and inodes with the wrong i_size stored on disk. I was convinced the data=guarded code was just missing a call to ext3_mark_inode_dirty, but tracing showed the i_disksize I was sending to ext3_mark_inode_dirty wasn't actually making it to the drive. ext3_mark_inode_dirty can be called without locks held (atime updates and a few others), so the data=guarded code uses locks while updating the in-memory inode, and then calls ext3_mark_inode_dirty without any locks held. But, ext3_mark_inode_dirty has no internal locking to make sure that only one CPU is updating the buffer head at a time. Generally this works out ok because everyone that changes the inode then calls ext3_mark_inode_dirty themselves. Even though it races, eventually someone updates the buffer heads and things move on. But there is still a risk of the wrong values getting in, and the data=guarded code seems to hit the race very often. Since everyone that changes the inode also logs it, it should be possible to fix this with some memory barriers. I'll leave that as an exercise to the reader and lock the buffer head instead. It it probably a good idea to have a different patch series for lockless bit flipping on the ext3 i_state field. ext3_do_update_inode &= clears EXT3_STATE_NEW without any locks held. Signed-off-by: Chris Mason <chris.mason@oracle.com> Signed-off-by: Jan Kara <jack@suse.cz>
| | * | | | ext3: Fix possible deadlock between ext3_truncate() and ext3_get_blocks()Jan Kara2009-09-161-4/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | During truncate we are sometimes forced to start a new transaction as the amount of blocks to be journaled is both quite large and hard to predict. So far we restarted a transaction while holding truncate_mutex and that violates lock ordering because truncate_mutex ranks below transaction start (and it can lead to a real deadlock with ext3_get_blocks() allocating new blocks from ext3_writepage()). Luckily, the problem is easy to fix: We just drop the truncate_mutex before restarting the transaction and acquire it afterwards. We are safe to do this as by the time ext3_truncate() is called, all the page cache for the truncated part of the file is dropped and so writepage() cannot come and allocate new blocks in the part of the file we are truncating. The rest of writers is stopped by us holding i_mutex. Signed-off-by: Jan Kara <jack@suse.cz>
| | * | | | jbd: Annotate transaction start also for journal_restart()Jan Kara2009-09-161-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | lockdep annotation for a transaction start has been at the end of journal_start(). But a transaction is also started from journal_restart(). Move the lockdep annotation to start_this_handle() which covers both cases. Signed-off-by: Jan Kara <jack@suse.cz>
| | * | | | jbd: Journal block numbers can ever be only 32-bit use unsigned int for themJan Kara2009-09-165-36/+36
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | It does not make sense to store block number for journal as unsigned long since they can be only 32-bit (because of on-disk format limitation). So change in-memory structures and variables to use unsigned int instead. Signed-off-by: Jan Kara <jack@suse.cz>
| | * | | | JBD: round commit timer up to avoid uncommitted transactionAndreas Dilger2009-09-161-1/+2
| | | |_|/ | | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fix jiffie rounding in jbd commit timer setup code. Rounding down could cause the timer to be fired before the corresponding transaction has expired. That transaction can stay not committed forever if no new transaction is created or explicit sync/umount happens. Signed-off-by: Andreas Dilger <adilger@sun.com> Signed-off-by: Jan Kara <jack@suse.cz>
| * | | | Merge branch 'for-linus' of git://oss.sgi.com/xfs/xfsLinus Torvalds2009-09-1738-927/+592
| |\ \ \ \ | | |/ / / | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * 'for-linus' of git://oss.sgi.com/xfs/xfs: (39 commits) xfs: includecheck fix for fs/xfs/xfs_iops.c xfs: switch to seq_file xfs: Record new maintainer information xfs: use correct log reservation when handling ENOSPC in xfs_create xfs: xfs_showargs() reports group *and* project quotas enabled xfs: un-static xfs_inobt_lookup xfs: actually enable the swapext compat handler xfs: simplify xfs_trans_iget xfs: merge fsync and O_SYNC handling xfs: speed up free inode search xfs: rationalize xfs_inobt_lookup* xfs: untangle xfs_dialloc xfs: factor out debug checks from xfs_dialloc and xfs_difree xfs: improve xfs_inobt_update prototype xfs: improve xfs_inobt_get_rec prototype xfs: factor out inode initialisation fs/xfs: Correct redundant test xfs: remove XFS_INO64_OFFSET un-static xfs_read_agf xfs: add more statics & drop some unused functions ...