aboutsummaryrefslogtreecommitdiffstats
path: root/fs
Commit message (Collapse)AuthorAgeFilesLines
* Btrfs: map the inode item when doing fill_inode_itemJosef Bacik2011-04-081-0/+12
| | | | | | | | Instead of calling kmap_atomic for every thing we set in the inode item, map the entire inode item at the start and unmap it at the end. This makes a sequential dd of 400mb O_DIRECT something like 1% faster. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>
* Btrfs: only retry transaction reservation onceJosef Bacik2011-04-081-1/+10
| | | | | | | | | | I saw a lockup where we kept getting into this start transaction->commit transaction loop because of enospce. The fact is if we fail to make our reservation, we've tried _everything_ several times, so we only need to try and commit the transaction once, and if that doesn't work then we really are out of space and need to just exit. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>
* Btrfs: deal with the case that we run out of space in the cacheJosef Bacik2011-04-083-74/+69
| | | | | | | | | | | Currently we don't handle running out of space in the cache, so to fix this we keep track of how far in the cache we are. Then we only dirty the pages if we successfully modify all of them, otherwise if we have an error or run out of space we can just drop them and not worry about the vm writing them out. Thanks, Tested-by Johannes Hirte <johannes.hirte@fem.tu-ilmenau.de> Signed-off-by: Josef Bacik <josef@redhat.com>
* Btrfs: don't warn in btrfs_add_orphanJosef Bacik2011-04-051-2/+0
| | | | | | | | | | | When I moved the orphan adding to btrfs_truncate I missed the fact that during orphan cleanup we just add the orphan items to the orphan list without going through btrfs_orphan_add, which results in lots of warnings on mount if you have any orphan items that need to be truncated. Just remove this warning since it's ok, this will allow all of the normal space accounting take place. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: fix free space cache when there are pinned extents and clusters V2Josef Bacik2011-04-051-4/+78
| | | | | | | | | | | | | | | | | | | | | I noticed a huge problem with the free space cache that was presenting as an early ENOSPC. Turns out when writing the free space cache out I forgot to take into account pinned extents and more importantly clusters. This would result in us leaking free space everytime we unmounted the filesystem and remounted it. I fix this by making sure to check and see if the current block group has a cluster and writing out any entries that are in the cluster to the cache, as well as writing any pinned extents we currently have to the cache since those will be available for us to use the next time the fs mounts. This patch also adds a check to the end of load_free_space_cache to make sure we got the right amount of free space cache, and if not make sure to clear the cache and re-cache the old fashioned way. Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: Fix uninitialized root flags for subvolumesLi Zefan2011-04-055-1/+30
| | | | | | | | | | | | | | | | | | root_item->flags and root_item->byte_limit are not initialized when a subvolume is created. This bug is not revealed until we added readonly snapshot support - now you mount a btrfs filesystem and you may find the subvolumes in it are readonly. To work around this problem, we steal a bit from root_item->inode_item->flags, and use it to indicate if those fields have been properly initialized. When we read a tree root from disk, we check if the bit is set, and if not we'll set the flag and initialize the two fields of the root item. Reported-by: Andreas Philipp <philipp.andreas@gmail.com> Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Tested-by: Andreas Philipp <philipp.andreas@gmail.com> cc: stable@kernel.org Signed-off-by: Chris Mason <chris.mason@oracle.com>
* btrfs: clear __GFP_FS flag in the space cache inodeMiao Xie2011-04-052-2/+2
| | | | | | | | | | | the object id of the space cache inode's key is allocated from the relative root, just like the regular file. So we can't identify space cache inode by checking the object id of the inode's key, and we have to clear __GFP_FS flag at the time we look up the space cache inode. Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: fix memory leak in start_transaction()Yoshinori Sano2011-04-051-0/+1
| | | | | | | | Free btrfs_trans_handle when join_transaction() fails in start_transaction() Signed-off-by: Yoshinori Sano <yoshinori.sano@gmail.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: fix memory leak in btrfs_ioctl_start_sync()Tsutomu Itoh2011-04-051-1/+3
| | | | | | | Call btrfs_end_transaction() if btrfs_commit_transaction_async() fails. Signed-off-by: Tsutomu Itoh <t-itoh@jp.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: fix subvol_sem leak in btrfs_rename()Johann Lombardi2011-04-051-3/+5
| | | | | | | btrfs_rename() does not release the subvol_sem if the transaction failed to start. Signed-off-by: Johann Lombardi <johann@whamcloud.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: Fix oops for defrag with compression turned onLi Zefan2011-04-051-9/+8
| | | | | | | | | | | When we defrag a file, whose size can be fit into an inline extent, with compression enabled, the compress type is set to be fs_info->compress_type, which is 0 if the btrfs filesystem is mounted without compress option. This leads to oops. Reported-by: Daniel Blueman <daniel.blueman@gmail.com> Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: fix /proc/mounts info.Tsutomu Itoh2011-04-051-2/+17
| | | | | | | | | | | | | | | | | | | | | Some mount options are not displayed by /proc/mounts. This patch displays the option such as compress_type by /proc/mounts. Ex. [before] $ mount | grep sdc2 /dev/sdc2 on /test12 type btrfs (rw,space_cache,compress=lzo) $ cat /proc/mounts | grep sdc2 /dev/sdc2 /test12 btrfs rw,relatime,compress 0 0 [after] $ mount | grep sdc2 /dev/sdc2 on /test12 type btrfs (rw,space_cache,compress=lzo) $ cat /proc/mounts | grep sdc2 /dev/sdc2 /test12 btrfs rw,relatime,compress=lzo,space_cache 0 0 Signed-off-by: Tsutomu Itoh <t-itoh@jp.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: fix compiler warning in file.cTsutomu Itoh2011-04-051-1/+1
| | | | | | | | | | | | | | While compiling Btrfs, I got following messages: CC [M] fs/btrfs/file.o fs/btrfs/file.c: In function '__btrfs_buffered_write': fs/btrfs/file.c:909: warning: 'ret' may be used uninitialized in this function CC [M] fs/btrfs/tree-defrag.o This patch fixes compiler warning. Signed-off-by: Tsutomu Itoh <t-itoh@jp.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: fix __btrfs_map_block on 32 bit machinesChris Mason2011-03-281-6/+20
| | | | | | | Recent changes for discard support didn't compile, this fixes them not to try and % 64 bit numbers. Signed-off-by: Chris Mason <chris.mason@oracle.com>
* btrfs: fix possible deadlock by clearing __GFP_FS flagMiao Xie2011-03-282-1/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Using the GFP_HIGHUSER_MOVABLE flag to allocate the metadata's page may cause deadlock. Task1 open() ... btrfs_search_slot() ... btrfs_cow_block() ... alloc_page() wait for reclaiming shrink_slab() ... shrink_icache_memory() ... btrfs_evict_inode() ... btrfs_search_slot() If the path is locked by task1, the deadlock happens. So the btree's page cache is different with the file's page cache, it can not allocate pages by GFP_HIGHUSER_MOVABLE flag, we must clear __GFP_FS flag in GFP_HIGHUSER_MOVABLE flag. Reported-by: Itaru Kitayama <kitayama@cl.bb4u.ne.jp> Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
* btrfs: check link counter overflow in link(2)Al Viro2011-03-281-0/+3
| | | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Chris Mason <chris.mason@oracle.com>
* btrfs: don't mess with i_nlink of unlocked inode in rename()Al Viro2011-03-281-11/+25
| | | | | | | | | | | old_inode is not locked; it's not safe to play with its link count. Instead of bumping it and calling btrfs_unlink_inode(), add a variant of the latter that does not do btrfs_drop_nlink()/ btrfs_update_inode(), call it instead of btrfs_inc_nlink()/ btrfs_unlink_inode() and do btrfs_update_inode() ourselves. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: check return value of btrfs_alloc_path()Tsutomu Itoh2011-03-284-14/+27
| | | | | | | | Adding the check on the return value of btrfs_alloc_path() to several places. And, some of callers are modified by this change. Signed-off-by: Tsutomu Itoh <t-itoh@jp.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: fix OOPS of empty filesystem after balanceliubo2011-03-283-0/+30
| | | | | | | | | | | | btrfs will remove unused block groups after balance. When a empty filesystem is balanced, the block group with tag "DATA" may be dropped, and after umount and mount again, it will not find "DATA" space_info and lead to OOPS. So we initial the necessary space_infos(DATA, SYSTEM, METADATA) to avoid OOPS. Reported-by: Daniel J Blueman <daniel.blueman@gmail.com> Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: fix memory leak of empty filesystem after balanceliubo2011-03-281-0/+6
| | | | | | | | | | | | | | | | | | | | After Josef's patch(commit 3c14874acc71180553fb5aba528e3cf57c5b958b), btrfs will exclude super bytes when reading block groups(by marking a extent state UPTODATE). However, these bytes do not get freed while balance remove unused block groups, and we won't process those removed ones any more, when we do umount and unload the btrfs module, btrfs hits a memory leak. This patch add the missing free operation. Reproduce steps: $ mkfs.btrfs disk $ mount disk /mnt/btrfs -o loop $ btrfs filesystem balance /mnt/btrfs $ umount /mnt/btrfs $ rmmod btrfs Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: fix return value of setflags ioctlliubo2011-03-281-1/+3
| | | | | | | | setflags ioctl should return error when any checks fail. Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com> Reviewed-by: David Sterba <dsterba@suse.cz> Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: fix uncheck memory allocationsYoshinori Sano2011-03-283-0/+12
| | | | | | | | | | To make Btrfs code more robust, several return value checks where memory allocation can fail are introduced. I use BUG_ON where I don't know how to handle the error properly, which increases the number of using the notorious BUG_ON, though. Signed-off-by: Yoshinori Sano <yoshinori.sano@gmail.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
* btrfs: make inode ref log recovery fasterliubo2011-03-281-24/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When we recover from crash via write-ahead log tree and process the inode refs, for each btrfs_inode_ref item, we will 1) check if we already have a perfect match in fs/file tree, if we have, then we're done. 2) search the corresponding back reference in fs/file tree, and check all the names in this back reference to see if they are also in the log to avoid conflict corners. 3) recover the logged inode refs to fs/file tree. In current btrfs, however, - for 2)'s check, once is enough, since the checked back reference will remain unchanged after processing all the inode refs belonged to the key. - it has no need to do another 1) between 2) and 3). I've made a small test to show how it improves, $dd if=/dev/zero of=foobar bs=4K count=1 $sync $make 100 hard links continuously, like ln foobar link_i $fsync foobar $echo b > /proc/sysrq-trigger after reboot $time mount DEV PATH without patch: real 0m0.285s user 0m0.001s sys 0m0.009s with patch: real 0m0.123s user 0m0.000s sys 0m0.010s Changelog v1->v2: - fix double free - pointed by David Sterba Changelog v2->v3: - adjust free order Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: add btrfs_trim_fs() to handle FITRIMLi Dongyang2011-03-285-1/+190
| | | | | | | | | | We take an free extent out from allocator, trim it, then put it back, but before we trim the block group, we should make sure the block group is cached, so plus a little change to make cache_block_group() run without a transaction. Signed-off-by: Li Dongyang <lidongyang@novell.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: adjust btrfs_discard_extent() return errors and trimmed bytesLi Dongyang2011-03-283-19/+31
| | | | | | | | | | | | Callers of btrfs_discard_extent() should check if we are mounted with -o discard, as we want to make fitrim to work even the fs is not mounted with -o discard. Also we should use REQ_DISCARD to map the free extent to get a full mapping, last we only return errors if 1. the error is not a EOPNOTSUPP 2. no device supports discard Signed-off-by: Li Dongyang <lidongyang@novell.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: make btrfs_map_block() return entire free extent for each device of ↵Li Dongyang2011-03-282-22/+129
| | | | | | | | | | | RAID0/1/10/DUP btrfs_map_block() will only return a single stripe length, but we want the full extent be mapped to each disk when we are trimming the extent, so we add length to btrfs_bio_stripe and fill it if we are mapping for REQ_DISCARD. Signed-off-by: Li Dongyang <lidongyang@novell.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: make update_reserved_bytes() publicLi Dongyang2011-03-282-9/+9
| | | | | | | | Make the function public as we should update the reserved extents calculations after taking out an extent for trimming. Signed-off-by: Li Dongyang <lidongyang@novell.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
* btrfs: return EXDEV when linking from different subvolumesMark Fasheh2011-03-281-1/+1
| | | | | | | | | | | | | | | | | | btrfs_link returns EPERM if a cross-subvolume link is attempted. However, in this case I believe EXDEV to be the more appropriate value. >From the link(2) man page: EXDEV oldpath and newpath are not on the same mounted file system. (Linux permits a file system to be mounted at multiple points, but link() does not work across different mount points, even if the same file system is mounted on both.) This matters because an application may have different behaviors based on return codes. Signed-off-by: Mark Fasheh <mfasheh@suse.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: Per file/directory controls for COW and compressionLiu Bo2011-03-284-7/+72
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | Data compression and data cow are controlled across the entire FS by mount options right now. ioctls are needed to set this on a per file or per directory basis. This has been proposed previously, but VFS developers wanted us to use generic ioctls rather than btrfs-specific ones. According to Chris's comment, there should be just one true compression method(probably LZO) stored in the super. However, before this, we would wait for that one method is stable enough to be adopted into the super. So I list it as a long term goal, and just store it in ram today. After applying this patch, we can use the generic "FS_IOC_SETFLAGS" ioctl to control file and directory's datacow and compression attribute. NOTE: - The compression type is selected by such rules: If we mount btrfs with compress options, ie, zlib/lzo, the type is it. Otherwise, we'll use the default compress type (zlib today). v1->v2: - rebase to the latest btrfs. v2->v3: - fix a problem, i.e. when a file is set NOCOW via mount option, then this NOCOW will be screwed by inheritance from parent directory. Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
* btrfs: use GFP_NOFS instead of GFP_KERNELMiao Xie2011-03-281-2/+2
| | | | | | | | In the filesystem context, we must allocate memory by GFP_NOFS, or we may start another filesystem operation and make kswap thread hang up. Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: check return value of read_tree_block()Tsutomu Itoh2011-03-283-0/+15
| | | | | | | | This patch is checking return value of read_tree_block(), and if it is NULL, error processing. Signed-off-by: Tsutomu Itoh <t-itoh@jp.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
* btrfs: properly access unaligned checksum bufferDavid Sterba2011-03-281-1/+2
| | | | | | | | | | | | | | | | | | | | On Fri, Mar 18, 2011 at 11:56:53AM -0400, Chris Mason wrote: > Thanks for fielding this one. Does put_unaligned_le32 optimize away on > platforms with efficient access? It would be great if we didn't need > the #ifdef. (quicktest: assembly output is same for put_unaligned_le32 and direct assignment on my x86_64) I was originally following examples in Documentation/unaligned-memory-access.txt. From other code it seems to me that the define CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS is intended for larger portions of code. Macros/wrappers for {put,get}_unaligned* are chosen via arch/<arch>/include/asm/unaligned.h accordingly, therefore it's safe to use put_unaligned_le32 without the ifdef. dave Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: cleanup some BUG_ON()Tsutomu Itoh2011-03-289-23/+54
| | | | | | | | This patch changes some BUG_ON() to the error return. (but, most callers still use BUG_ON()) Signed-off-by: Tsutomu Itoh <t-itoh@jp.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: add initial tracepoint support for btrfsliubo2011-03-2812-11/+60
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Tracepoints can provide insight into why btrfs hits bugs and be greatly helpful for debugging, e.g dd-7822 [000] 2121.641088: btrfs_inode_request: root = 5(FS_TREE), gen = 4, ino = 256, blocks = 8, disk_i_size = 0, last_trans = 8, logged_trans = 0 dd-7822 [000] 2121.641100: btrfs_inode_new: root = 5(FS_TREE), gen = 8, ino = 257, blocks = 0, disk_i_size = 0, last_trans = 0, logged_trans = 0 btrfs-transacti-7804 [001] 2146.935420: btrfs_cow_block: root = 2(EXTENT_TREE), refs = 2, orig_buf = 29368320 (orig_level = 0), cow_buf = 29388800 (cow_level = 0) btrfs-transacti-7804 [001] 2146.935473: btrfs_cow_block: root = 1(ROOT_TREE), refs = 2, orig_buf = 29364224 (orig_level = 0), cow_buf = 29392896 (cow_level = 0) btrfs-transacti-7804 [001] 2146.972221: btrfs_transaction_commit: root = 1(ROOT_TREE), gen = 8 flush-btrfs-2-7821 [001] 2155.824210: btrfs_chunk_alloc: root = 3(CHUNK_TREE), offset = 1103101952, size = 1073741824, num_stripes = 1, sub_stripes = 0, type = DATA flush-btrfs-2-7821 [001] 2155.824241: btrfs_cow_block: root = 2(EXTENT_TREE), refs = 2, orig_buf = 29388800 (orig_level = 0), cow_buf = 29396992 (cow_level = 0) flush-btrfs-2-7821 [001] 2155.824255: btrfs_cow_block: root = 4(DEV_TREE), refs = 2, orig_buf = 29372416 (orig_level = 0), cow_buf = 29401088 (cow_level = 0) flush-btrfs-2-7821 [000] 2155.824329: btrfs_cow_block: root = 3(CHUNK_TREE), refs = 2, orig_buf = 20971520 (orig_level = 0), cow_buf = 20975616 (cow_level = 0) btrfs-endio-wri-7800 [001] 2155.898019: btrfs_cow_block: root = 5(FS_TREE), refs = 2, orig_buf = 29384704 (orig_level = 0), cow_buf = 29405184 (cow_level = 0) btrfs-endio-wri-7800 [001] 2155.898043: btrfs_cow_block: root = 7(CSUM_TREE), refs = 2, orig_buf = 29376512 (orig_level = 0), cow_buf = 29409280 (cow_level = 0) Here is what I have added: 1) ordere_extent: btrfs_ordered_extent_add btrfs_ordered_extent_remove btrfs_ordered_extent_start btrfs_ordered_extent_put These provide critical information to understand how ordered_extents are updated. 2) extent_map: btrfs_get_extent extent_map is used in both read and write cases, and it is useful for tracking how btrfs specific IO is running. 3) writepage: __extent_writepage btrfs_writepage_end_io_hook Pages are cirtical resourses and produce a lot of corner cases during writeback, so it is valuable to know how page is written to disk. 4) inode: btrfs_inode_new btrfs_inode_request btrfs_inode_evict These can show where and when a inode is created, when a inode is evicted. 5) sync: btrfs_sync_file btrfs_sync_fs These show sync arguments. 6) transaction: btrfs_transaction_commit In transaction based filesystem, it will be useful to know the generation and who does commit. 7) back reference and cow: btrfs_delayed_tree_ref btrfs_delayed_data_ref btrfs_delayed_ref_head btrfs_cow_block Btrfs natively supports back references, these tracepoints are helpful on understanding btrfs's COW mechanism. 8) chunk: btrfs_chunk_alloc btrfs_chunk_free Chunk is a link between physical offset and logical offset, and stands for space infomation in btrfs, and these are helpful on tracing space things. 9) reserved_extent: btrfs_reserved_extent_alloc btrfs_reserved_extent_free These can show how btrfs uses its space. Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: use RCU instead of a spinlock to protect the root nodeChris Mason2011-03-281-19/+8
| | | | | | | | | | | The pointer to the extent buffer for the root of each tree is protected by a spinlock so that we can safely read the pointer and take a reference on the extent buffer. But now that the extent buffers are freed via RCU, we can safely use rcu_read_lock instead. Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: mark the bio with an error if we have a failure in dioJosef Bacik2011-03-251-0/+8
| | | | | | | | | | I noticed that dio_end_io calls the appropriate endio function with an error, but the endio functions don't actually do anything with that error, they assume that if there was an error then the bio will not be uptodate. So if we had checksum failures we would never pass back EIO. So if there is an error in our endio functions make sure to clear the uptodate flag on the bio. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>
* Btrfs: don't allocate dip->csums when doing writesJosef Bacik2011-03-251-2/+5
| | | | | | | | | When doing direct writes we store the checksums in the ordered sum stuff in the ordered extent for writing them when the write completes, so we don't even use the dip->csums array. So if we're writing, don't bother allocating dip->csums since we won't use it anyway. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>
* Btrfs: cleanup how we setup free space clustersJosef Bacik2011-03-252-185/+182
| | | | | | | | | | | | | | | | | | | | | | | | This patch makes the free space cluster refilling code a little easier to understand, and fixes some things with the bitmap part of it. Currently we either want to refill a cluster with 1) All normal extent entries (those without bitmaps) 2) A bitmap entry with enough space The current code has this ugly jump around logic that will first try and fill up the cluster with extent entries and then if it can't do that it will try and find a bitmap to use. So instead split this out into two functions, one that tries to find only normal entries, and one that tries to find bitmaps. This also fixes a suboptimal thing we would do with bitmaps. If we used a bitmap we would just tell the cluster that we were pointing at a bitmap and it would do the tree search in the block group for that entry every time we tried to make an allocation. Instead of doing that now we just add it to the clusters group. I tested this with my ENOSPC tests and xfstests and it survived. Signed-off-by: Josef Bacik <josef@redhat.com>
* Btrfs: don't be as aggressive about using bitmapsJosef Bacik2011-03-211-3/+16
| | | | | | | | | | | | We have been creating bitmaps for small extents unconditionally forever. This was great when testing to make sure the bitmap stuff was working, but is overkill normally. So instead of always adding small chunks of free space to bitmaps, only start doing it if we go past half of our extent threshold. This will keeps us from creating a bitmap for just one small free extent at the front of the block group, and will make the allocator a little faster as a result. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>
* Btrfs: deal with min_bytes appropriately when looking for a clusterJosef Bacik2011-03-211-5/+4
| | | | | | | | | | | | | | | | | | | | We do all this fun stuff with min_bytes, but either don't use it in the case of just normal extents, or use it completely wrong in the case of bitmaps. So fix this for both cases 1) In the extent case, stop looking for space with window_free >= min_bytes instead of bytes + empty_size. 2) In the bitmap case, we were looking for streches of free space that was at least min_bytes in size, which was not right at all. So instead search for stretches of free space that are at least bytes in size (this will make a difference when we have > page size blocks) and then only search for min_bytes amount of free space. Thanks, Reviewed-by: Li Zefan <lizf@cn.fujitsu.com> Signed-off-by: Josef Bacik <josef@redhat.com>
* Btrfs: check free space in block group before searching for a clusterJosef Bacik2011-03-211-0/+10
| | | | | | | | The free space cluster stuff is heavy duty, so there is no sense in going through the entire song and dance if there isn't enough space in the block group to begin with. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>
* Btrfs: add checks to verify dir items are correctJosef Bacik2011-03-175-0/+50
| | | | | | | | We need to make sure the dir items we get are valid dir items. So any time we try and read one check it with verify_dir_item, which will do various sanity checks to make sure it looks sane. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>
* Btrfs: check return value of btrfs_search_slot properlyJosef Bacik2011-03-171-0/+2
| | | | | | | | | Doing an audit of where we use btrfs_search_slot only showed one place where we don't check the return value of btrfs_search_slot properly. Just fix mark_extent_written to see if btrfs_search_slot failed and act accordingly. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>
* Btrfs: check items for correctness as we searchJosef Bacik2011-03-174-124/+95
| | | | | | | | | | | | Currently if we have corrupted items things will blow up in spectacular ways. So as we read in blocks and they are leaves, check the entire leaf to make sure all of the items are correct and point to valid parts in the leaf for the item data the are responsible for. If the item is corrupt we will kick back EIO and not read any of the copies since they are likely to not be correct either. This will catch generic corruptions, it will be up to the individual callers of btrfs_search_slot to make sure their items are right. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>
* Btrfs: return error if the range we want to map is bogusJosef Bacik2011-03-171-0/+1
| | | | | | | | Currently if we have corrupt metadata map_extent_buffer will complain about it, but not return an error so the caller has no idea a problem was hit. Fix this. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>
* Btrfs: add a comment explaining what btrfs_cont_expand doesJosef Bacik2011-03-171-0/+6
| | | | | | | | Everytime I have to deal with btrfs_cont_expand I stare at it for 20 minutes trying to remember what exactly it does and why the hell we need it. So add a comment to save future-Josef some time. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>
* Btrfs: use mark_inode_dirty when expanding the fileJosef Bacik2011-03-171-15/+1
| | | | | | | | | Mark_inode_dirty will call btrfs_dirty_inode which will take care of updating the inode. This makes setsize a little cleaner since we don't have to start a transaction and update the inode in there, we can just call mark_inode_dirty. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>
* Btrfs: only add orphan items when truncatingJosef Bacik2011-03-171-27/+18
| | | | | | | | We don't need an orphan item when expanding files, we just need them for truncating them, so only add the orphan item in btrfs_truncate instead of in btrfs_setsize. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>
* Btrfs: make sure to remove the orphan item from the in-memory listJosef Bacik2011-03-171-0/+6
| | | | | | | | | This fixes a problem where if truncate fails the inode will still be on the in memory orphan list. This is will make us complain when the inode gets destroyed because it's still on the orphan list. So if we fail just remove us from the in memory list and carry on. Signed-off-by: Josef Bacik <josef@redhat.com>
* Btrfs: handle errors in btrfs_orphan_cleanupJosef Bacik2011-03-176-23/+50
| | | | | | | | | If we cannot truncate an inode for some reason we will never delete the orphan item associated with that inode, which means that we will loop forever in btrfs_orphan_cleanup. Instead of doing this just return error so we fail to mount. It sucks, but hey it's better than hanging. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>