aboutsummaryrefslogtreecommitdiffstats
path: root/fs/btrfs/super.c
Commit message (Collapse)AuthorAgeFilesLines
* Btrfs: Add zlib compression supportChris Mason2008-10-291-1/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is a large change for adding compression on reading and writing, both for inline and regular extents. It does some fairly large surgery to the writeback paths. Compression is off by default and enabled by mount -o compress. Even when the -o compress mount option is not used, it is possible to read compressed extents off the disk. If compression for a given set of pages fails to make them smaller, the file is flagged to avoid future compression attempts later. * While finding delalloc extents, the pages are locked before being sent down to the delalloc handler. This allows the delalloc handler to do complex things such as cleaning the pages, marking them writeback and starting IO on their behalf. * Inline extents are inserted at delalloc time now. This allows us to compress the data before inserting the inline extent, and it allows us to insert an inline extent that spans multiple pages. * All of the in-memory extent representations (extent_map.c, ordered-data.c etc) are changed to record both an in-memory size and an on disk size, as well as a flag for compression. From a disk format point of view, the extent pointers in the file are changed to record the on disk size of a given extent and some encoding flags. Space in the disk format is allocated for compression encoding, as well as encryption and a generic 'other' field. Neither the encryption or the 'other' field are currently used. In order to limit the amount of data read for a single random read in the file, the size of a compressed extent is limited to 128k. This is a software only limit, the disk format supports u64 sized compressed extents. In order to limit the ram consumed while processing extents, the uncompressed size of a compressed extent is limited to 256k. This is a software only limit and will be subject to tuning later. Checksumming is still done on compressed extents, and it is done on the uncompressed version of the data. This way additional encodings can be layered on without having to figure out which encoding to checksum. Compression happens at delalloc time, which is basically singled threaded because it is usually done by a single pdflush thread. This makes it tricky to spread the compression load across all the cpus on the box. We'll have to look at parallel pdflush walks of dirty inodes at a later time. Decompression is hooked into readpages and it does spread across CPUs nicely. Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: add and improve commentsChris Mason2008-09-291-0/+3
| | | | | | | | | | | This improves the comments at the top of many functions. It didn't dive into the guts of functions because I was trying to avoid merging problems with the new allocator and back reference work. extent-tree.c and volumes.c were both skipped, and there is definitely more work todo in cleaning and commenting the code. Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Remove Btrfs compat code for older kernelsChris Mason2008-09-251-7/+0
| | | | | | | | Btrfs had compatibility code for kernels back to 2.6.18. These have been removed, and will be maintained in a separate backport git tree from now on. Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: Reinstate '-osubvol=.' option to mount entire treeDavid Woodhouse2008-09-251-15/+19
| | | | | | | | Date: Tue, 19 Aug 2008 16:49:35 +0100 This disappeared when I removed the special case for '.' in btrfs_lookup() Signed-off-by: David Woodhouse <David.Woodhouse@intel.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Mask root object ID into f_fsid in btrfs_statfs()David Woodhouse2008-09-251-0/+4
| | | | | | | | | Date: Mon, 18 Aug 2008 13:10:20 +0100 This means that subvolumes get a different fsid, and NFS exporting them works properly. Signed-off-by: David Woodhouse <David.Woodhouse@intel.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Fill f_fsid field in btrfs_statfs()David Woodhouse2008-09-251-0/+6
| | | | | | Date: Mon, 18 Aug 2008 12:01:52 +0100 Signed-off-by: David Woodhouse <David.Woodhouse@intel.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
* NFS support for btrfs - v3Balaji Rao2008-09-251-0/+2
| | | | | | | | | | | | | | | | Date: Mon, 21 Jul 2008 02:01:56 +0530 Here's an implementation of NFS support for btrfs. It relies on the fixes which are going in to 2.6.28 for the NFS readdir/lookup deadlock. This uses the btrfs_iget helper introduced previously. [dwmw2: Tidy up a little, switch to d_obtain_alias() w/compat routine, change fh_type, store parent's root object ID where needed, fix some get_parent() and fs_to_dentry() bugs] Signed-off-by: Balaji Rao <balajirrao@gmail.com> Signed-off-by: David Woodhouse <David.Woodhouse@intel.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: Various small fixes.Yan Zheng2008-09-251-0/+2
| | | | | | | | This trivial patch contains two locking fixes and a off by one fix. --- Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: Add ACL supportJosef Bacik2008-09-251-2/+7
| | | | Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: Add version strings on module loadChris Mason2008-09-251-0/+3
| | | | Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: Online btree defragmentation fixesChris Mason2008-09-251-1/+0
| | | | | | | | | | The btree defragger wasn't making forward progress because the new key wasn't being saved by the btrfs_search_forward function. This also disables the automatic btree defrag, it wasn't scaling well to huge filesystems. The auto-defrag needs to be done differently. Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: Replace the transaction work queue with kthreadsChris Mason2008-09-251-10/+6
| | | | | | | This creates one kthread for commits and one kthread for deleting old snapshots. All the work queues are removed. Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: Replace the big fs_mutex with a collection of other locksChris Mason2008-09-251-2/+0
| | | | | | | | Extent alloctions are still protected by a large alloc_mutex. Objectid allocations are covered by a objectid mutex Other btree operations are protected by a lock on individual btree nodes Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: Add a mount option to control worker thread pool sizeChris Mason2008-09-251-1/+12
| | | | | | | | | | | | mount -o thread_pool_size changes the default, which is min(num_cpus + 2, 8). Larger thread pools would make more sense on very large disk arrays. This mount option controls the max size of each thread pool. There are multiple thread pools, so the total worker count will be larger than the mount option. Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: Fix mount -o max_inline=0Chris Mason2008-09-251-2/+5
| | | | | | | | max_inline=0 used to force the max_inline size to one sector instead. Now it properly disables inline data items, while still being able to read any that happen to exist on disk. Signed-off-by: Chris Mason <chris.mason@oracle.com>
* btrfs: allow scanning multiple devices during mountChristoph Hellwig2008-09-251-5/+16
| | | | | | | | | Allows to specify one or multiple device=/dev/foo options during mount so that ioctls on the control device can be avoided. Especially useful when trying to mount a multi-device setup as root. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Chris Mason <chris.mason@oracle.com>
* btrfs: sanity mount option parsing and early mount codeChristoph Hellwig2008-09-251-105/+136
| | | | | | | Also adds lots of comments to describe what's going on here. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: transaction ioctlsSage Weil2008-09-251-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | These ioctls let a user application hold a transaction open while it performs a series of operations. A final ioctl does a sync on the fs (closing the current transaction). This is the main requirement for Ceph's OSD to be able to keep the data it's storing in a btrfs volume consistent, and AFAICS it works just fine. The application would do something like fd = ::open("some/file", O_RDONLY); ::ioctl(fd, BTRFS_IOC_TRANS_START); /* do a bunch of stuff */ ::ioctl(fd, BTRFS_IOC_TRANS_END); or just ::close(fd); And to ensure it commits to disk, ::ioctl(fd, BTRFS_IOC_SYNC); When a transaction is held open, the trans_handle is attached to the struct file (via private_data) so that it will get cleaned up if the process dies unexpectedly. A held transaction is also ended on fsync() to avoid a deadlock. A misbehaving application could also deliberately hold a transaction open, effectively locking up the FS, so it may make sense to restrict something like this to root or something. Signed-off-by: Chris Mason <chris.mason@oracle.com>
* btrfsctl -A error code fixupLinda Knippers2008-09-251-2/+2
| | | | | | Send the error back to userland if the ioctl fails Signed-off-by: Chris Mason <chris.mason@oracle.com>
* btrfs delete ordered inode handling fixMingming2008-09-251-1/+0
| | | | | | Use btrfs_release_file instead of a put_inode call Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: Add mount -o degraded to allow mounts to continue with missing devicesChris Mason2008-09-251-10/+15
| | | | Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: Add support for online device removalChris Mason2008-09-251-27/+8
| | | | | | | | | | | | | This required a few structural changes to the code that manages bdev pointers: The VFS super block now gets an anon-bdev instead of a pointer to the lowest bdev. This allows us to avoid swapping the super block bdev pointer around at run time. The code to read in the super block no longer goes through the extent buffer interface. Things got ugly keeping the mapping constant. Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: Add new ioctl to add devicesChris Mason2008-09-251-0/+1
| | | | Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Fix btrfs_fill_super to return -EINVAL when no FS foundYan2008-09-251-2/+2
| | | | Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: Add support for device scanning and detection ioctlsChris Mason2008-09-251-17/+44
| | | | Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Add /dev/btrfs-control for device scanning ioctlsChris Mason2008-09-251-1/+40
| | | | Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: Misc 2.6.25 updatesChris Mason2008-09-251-1/+10
| | | | | | Remove the btrfs read_inode method, and use save_mount_options Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: mount -o max_inline=size to control the maximum inline extent sizeChris Mason2008-09-251-1/+18
| | | | Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: Split the extent_map code into two partsChris Mason2008-09-251-1/+9
| | | | | | | | | | | | | | There is now extent_map for mapping offsets in the file to disk and extent_io for state tracking, IO submission and extent_bufers. The new extent_map code shifts from [start,end] pairs to [start,len], and pushes the locking out into the caller. This allows a few performance optimizations and is easier to use. A number of extent_map usage bugs were fixed, mostly with failing to remove extent_map entries when changing the file. Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: Add basic lockfs callsYan2008-09-251-1/+13
| | | | Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: Add mount -o ssd, which includes optimizations for seek free storageChris Mason2008-09-251-1/+8
| | | | Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: Run igrab on data=ordered inodes to prevent deadlocks during writeoutChris Mason2008-09-251-1/+1
| | | | Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: Add drop inode func to avoid data=ordered deadlockChris Mason2008-09-251-0/+1
| | | | Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: Add flush barriers on commitChris Mason2008-09-251-1/+8
| | | | Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: Add readahead to the online shrinker, and a mount -o alloc_start= for ↵Chris Mason2008-09-251-1/+15
| | | | | | testing Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: Support for online FS resize (grow and shrink)Chris Mason2008-09-251-3/+4
| | | | Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: Back port to 2.6.18-el kernelsChris Mason2008-09-251-0/+1
| | | | Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: Add mount option to enforce a max extent sizeChris Mason2008-09-251-1/+45
| | | | Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: Add mount option to turn off data cowChris Mason2008-09-251-7/+27
| | | | | | | | | | | A number of workloads do not require copy on write data or checksumming. mount -o nodatasum to disable checksums and -o nodatacow to disable both copy on write and checksumming. In nodatacow mode, copy on write is still performed when a given extent is under snapshot. Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: Add mount -o nodatasum to turn of file data checksummingChris Mason2008-09-251-2/+15
| | | | Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: Return value checking in module initWyatt Banks2008-09-251-3/+18
| | | | Signed-off-by: Chris Mason <chris.mason@oracle.com>
* xattr support for btrfsJosef Bacik2008-09-251-0/+2
| | | | Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: Allow tails larger than one pageChris Mason2008-09-251-2/+0
| | | | Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: Allow tree blocks larger than the page sizeChris Mason2008-09-251-2/+4
| | | | Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: Create extent_buffer interface for large blocksizesChris Mason2008-09-251-3/+4
| | | | Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: Use mount -o subvol to select the subvol directory instead of dev:Chris Mason2007-08-291-10/+39
| | | | Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: Add mount into directory supportYan2007-08-291-3/+120
| | | | | | | | | | Modified form of original patch from Christoph Hellwig to make btrfs mount into the default subvolume by default. mount /dev/somedevice:subvolumename to get other subvolumes or mount /dev/somedevice:. to get the root Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: Add per-root block accounting and sysfs entriesJosef Bacik2007-08-291-0/+14
| | | | Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: Add delayed allocation to the extent based page tree codeChris Mason2007-08-271-0/+1
| | | | Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: Extent based page cache code. This uses an rbtree of extents and testsChris Mason2007-08-271-0/+2
| | | | | | instead of buffer heads. Signed-off-by: Chris Mason <chris.mason@oracle.com>