aboutsummaryrefslogtreecommitdiffstats
path: root/fs
Commit message (Collapse)AuthorAgeFilesLines
* [PATCH] ext3_fsblk_t: the rest of in-kernel filesystem blocks conversionMingming Cao2006-06-256-82/+82
| | | | | | | | | | Convert the ext3 in-kernel filesystem blocks to ext3_fsblk_t. Convert the rest of all unsigned long type in-kernel filesystem blocks to ext3_fsblk_t, and replace the printk format string respondingly. Signed-off-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] ext3_fsblk_t: filesystem, group blocks and bug fixesMingming Cao2006-06-256-141/+158
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Some of the in-kernel ext3 block variable type are treated as signed 4 bytes int type, thus limited ext3 filesystem to 8TB (4kblock size based). While trying to fix them, it seems quite confusing in the ext3 code where some blocks are filesystem-wide blocks, some are group relative offsets that need to be signed value (as -1 has special meaning). So it seem saner to define two types of physical blocks: one is filesystem wide blocks, another is group-relative blocks. The following patches clarify these two types of blocks in the ext3 code, and fix the type bugs which limit current 32 bit ext3 filesystem limit to 8TB. With this series of patches and the percpu counter data type changes in the mm tree, we are able to extend exts filesystem limit to 16TB. This work is also a pre-request for the recent >32 bit ext3 work, and makes the kernel to able to address 48 bit ext3 block a lot easier: Simply redefine ext3_fsblk_t from unsigned long to sector_t and redefine the format string for ext3 filesystem block corresponding. Two RFC with a series patches have been posted to ext2-devel list and have been reviewed and discussed: http://marc.theaimsgroup.com/?l=ext2-devel&m=114722190816690&w=2 http://marc.theaimsgroup.com/?l=ext2-devel&m=114784919525942&w=2 Patches are tested on both 32 bit machine and 64 bit machine, <8TB ext3 and >8TB ext3 filesystem(with the latest to be released e2fsprogs-1.39). Tests includes overnight fsx, tiobench, dbench and fsstress. This patch: Defines ext3_fsblk_t and ext3_grpblk_t, and the printk format string for filesystem wide blocks. This patch classifies all block group relative blocks, and ext3_fsblk_t blocks occurs in the same function where used to be confusing before. Also include kernel bug fixes for filesystem wide in-kernel block variables. There are some fileystem wide blocks are treated as int/unsigned int type in the kernel currently, especially in ext3 block allocation and reservation code. This patch fixed those bugs by converting those variables to ext3_fsblk_t(unsigned long) type. Signed-off-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] Prepare for __copy_from_user_inatomic to not zero missed bytesNeilBrown2006-06-251-12/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The problem is that when we write to a file, the copy from userspace to pagecache is first done with preemption disabled, so if the source address is not immediately available the copy fails *and* *zeros* *the* *destination*. This is a problem because a concurrent read (which admittedly is an odd thing to do) might see zeros rather that was there before the write, or what was there after, or some mixture of the two (any of these being a reasonable thing to see). If the copy did fail, it will immediately be retried with preemption re-enabled so any transient problem with accessing the source won't cause an error. The first copying does not need to zero any uncopied bytes, and doing so causes the problem. It uses copy_from_user_atomic rather than copy_from_user so the simple expedient is to change copy_from_user_atomic to *not* zero out bytes on failure. The first of these two patches prepares for the change by fixing two places which assume copy_from_user_atomic does zero the tail. The two usages are very similar pieces of code which copy from a userspace iovec into one or more page-cache pages. These are changed to remove the assumption. The second patch changes __copy_from_user_inatomic* to not zero the tail. Once these are accepted, I will look at similar patches of other architectures where this is important (ppc, mips and sparc being the ones I can find). This patch: There is a problem with __copy_from_user_inatomic zeroing the tail of the buffer in the case of an error. As it is called in atomic context, the error may be transient, so it results in zeros being written where maybe they shouldn't be. In the usage in filemap, this opens a window for a well timed read to see data (zeros) which is not consistent with any ordering of reads and writes. Most cases where __copy_from_user_inatomic is called, a failure results in __copy_from_user being called immediately. As long as the latter zeros the tail, the former doesn't need to. However in *copy_from_user_iovec implementations (in both filemap and ntfs/file), it is assumed that copy_from_user_inatomic will zero the tail. This patch removes that assumption, so that after this patch it will be safe for copy_from_user_inatomic to not zero the tail. This patch also adds some commentary to filemap.h and asm-i386/uaccess.h. After this patch, all architectures that might disable preempt when kmap_atomic is called need to have their __copy_from_user_inatomic* "fixed". This includes - powerpc - i386 - mips - sparc Signed-off-by: Neil Brown <neilb@suse.de> Cc: David Howells <dhowells@redhat.com> Cc: Anton Altaparmakov <aia21@cantab.net> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: William Lee Irwin III <wli@holomorphy.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] ext3: remove inconsistent space before exclamation point in mount codeTheodore Ts'o2006-06-251-1/+1
| | | | | | | | This was reported as Debian bug #336604. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] ext3: fix memory leak when the journal file is corruptedTheodore Ts'o2006-06-251-0/+1
| | | | | | Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] ext2: clean up dead code from mount codeTheodore Ts'o2006-06-251-1/+0
| | | | | | | | | The variable i is guaranteed to be the same as db_count given the previous for loop. So get rid of it since it's dead code. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] Avoid disk sector_t overflow for >2TB ext3 filesystemMingming Cao2006-06-252-0/+20
| | | | | | | | | | | | | | | | If ext3 filesystem is larger than 2TB, and sector_t is a u32 (i.e. CONFIG_LBD not defined in the kernel), the calculation of the disk sector will overflow. Add check at ext3_fill_super() and ext3_group_extend() to prevent mount/remount/resize >2TB ext3 filesystem if sector_t size is 4 bytes. Verified this patch on a 32 bit platform without CONFIG_LBD defined (sector_t is 32 bits long), mount refuse to mount a 10TB ext3. Signed-off-by: Mingming Cao<cmm@us.ibm.com> Acked-by: Andreas Dilger <adilger@clusterfs.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] openpromfs: factorize outJan Engelhardt2006-06-251-5/+10
| | | | | | | | | | | | | "Move" "common code" out to PTR_NOD, which does the conversion from private pointer to node number. This is to reduce potential casting/conversion errors due to redundancy. (The naming PTR_NOD follows PTR_ERR, turning a pointer into xyz.) [akpm@osdl.org: cleanups] Signed-off-by: Jan Engelhardt <jengelh@gmx.de> Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] openpromfs: remove unnecessary castsJan Engelhardt2006-06-251-12/+12
| | | | | | | | | Remove unnecessary casts in fs/openpromfs/inode.c Signed-off-by: Jan Engelhardt <jengelh@gmx.de> Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] openpromfs: fix missing NULJan Engelhardt2006-06-251-2/+3
| | | | | | | | | | | tchars is not '\0'-terminated so the strtoul may run into problems. Fix that. Also make tchars as big as a long in hexadecimal form would take rather than just 16. Signed-off-by: Jan Engelhardt <jengelh@gmx.de> Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] fs/ufs/inode.c: make 2 functions staticAdrian Bunk2006-06-251-3/+6
| | | | | | | | Make two needlessly global functions static. Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] ufs: ubh_ll_rw_block cleanupEvgeniy Dushistov2006-06-255-14/+13
| | | | | | | | | | In ufs code there is function: ubh_ll_rw_block, it has parameter how many ufs_buffer_head it should handle, but it always called with "1" on the place of this parameter. This patch removes unused parameter of "ubh_ll_wr_block". Signed-off-by: Evgeniy Dushistov <dushistov@mail.ru> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] ufs: make fsck -f happyEvgeniy Dushistov2006-06-254-56/+126
| | | | | | | | | | | | | | | | | | | | | | ufs super block contains some statistic about file systems, like amount of directories, free blocks, inodes and so on. UFS1 hold this information in one location and uses 32bit integers for such information, UFS2 hold statistic in another location and uses 64bit integers. There is transition variant, if UFS1 has type 44BSD and flags field in super block has some special value this mean that we work with statistic like UFS2 does. and this also means that nobody care about old(UFS1) statistic. So if start fsck against such file system, after usage linux ufs driver, it found error: at now only UFS1 like statistic is updated. This patch should fix this. Also it contains some minor cleanup: CodingSytle and remove unused variables. Signed-off-by: Evgeniy Dushistov <dushistov@mail.ru> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] ufs: fsync implementationEvgeniy Dushistov2006-06-251-0/+21
| | | | | | | | | Presently ufs doesn't support "fsync", this make some applications unhappy, for example vim. This patch fixes this situation. Signed-off-by: Evgeniy Dushistov <dushistov@mail.ru> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] ufs: one way to access super blockEvgeniy Dushistov2006-06-252-148/+121
| | | | | | | | | | | | | | | | | | | | | | | | | Super block of UFS usually has size >512, because of fragment size may be 512, this cause some problems. Currently, there are two methods to work with ufs super block: 1) split structure which describes ufs super blocks into structures with size <=512 2) use one structure which describes ufs super block, and hope that array of "buffer_head" which holds "super block", has such construction: bh[n]->b_data + bh[n]->b_size == bh[n + 1]->b_data The second variant may cause some problems in the future, and usage of two variants cause unnecessary code duplication. This patch remove the second variant. Also patch contains some CodingStyle fixes. Signed-off-by: Evgeniy Dushistov <dushistov@mail.ru> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] ufs: missed brelse and wrong baseblkEvgeniy Dushistov2006-06-252-7/+6
| | | | | | | | | | | | | This patch fixes two bugs, which introduced by previous patches: 1) Missed "brelse" 2) Sometimes "baseblk" may be wrongly calculated, if i_size is equal to zero, which lead infinite cycle in "mpage_writepages". Signed-off-by: Evgeniy Dushistov <dushistov@mail.ru> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] ufs: printk warning fixesAndrew Morton2006-06-251-3/+3
| | | | | | | | | | fs/ufs/super.c: In function `ufs_print_super_stuff': fs/ufs/super.c:103: warning: unsigned int format, different type arg (arg 2) fs/ufs/super.c: In function `ufs2_print_super_stuff': fs/ufs/super.c:147: warning: unsigned int format, different type arg (arg 2) fs/ufs/super.c: In function `ufs_print_cylinder_stuff': fs/ufs/super.c:175: warning: unsigned int format, different type arg (arg 2) Cc: Evgeniy Dushistov <dushistov@mail.ru> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] ufs: zero metadataEvgeniy Dushistov2006-06-251-40/+76
| | | | | | | | | | | Presently if we allocate several "metadata" blocks (pointers to indirect blocks for example), we fill with zeroes only the first block. This cause some problems in "truncate" function. Also this patch remove some unused arguments from several functions and add comments. Signed-off-by: Evgeniy Dushistov <dushistov@mail.ru> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] ufs: unlock_super without lockEvgeniy Dushistov2006-06-251-4/+5
| | | | | | | | | | | | | | | ufs_free_blocks function looks now in so way: if (err) goto failed; lock_super(); failed: unlock_super(); So if error happen we'll unlock not locked super. Signed-off-by: Evgeniy Dushistov <dushistov@mail.ru> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] ufs: i_blocks wrong countEvgeniy Dushistov2006-06-252-16/+12
| | | | | | | | | At now UFS code uses DQUOT_* mechanism, but it also update inode->i_blocks manually, this cause wrong i_blocks value. Signed-off-by: Evgeniy Dushistov <dushistov@mail.ru> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] ufs: little directory lookup optimizationEvgeniy Dushistov2006-06-253-5/+7
| | | | | | | | | This patch make little optimization of ufs_find_entry like "ext2" does. Save number of page and reuse it again in the next call. Signed-off-by: Evgeniy Dushistov <dushistov@mail.ru> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] ufs: easy debugEvgeniy Dushistov2006-06-2510-217/+155
| | | | | | | | | | | | | | | Currently to turn on debug mode "user" has to edit ~10 files, to turn off he has to do it again. This patch introduce such changes: 1)turn on(off) debug messages via ".config" 2)remove unnecessary duplication of code 3)make "UFSD" macros more similar to function 4)fix some compiler warnings Signed-off-by: Evgeniy Dushistov <dushistov@mail.ru> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] ufs: Unmark CONFIG_UFS_FS_WRITE as BROKENEvgeniy Dushistov2006-06-251-1/+1
| | | | | | | | | | | To find new bugs, I suggest revert this patch: http://lkml.org/lkml/2006/1/31/275 in -mm tree. So others can test "write support" of UFS. Signed-off-by: Evgeniy Dushistov <dushistov@mail.ru> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] ufs: not usual amounts of fragments per blockEvgeniy Dushistov2006-06-252-71/+87
| | | | | | | | | | The writing to UFS file system with block/fragment!=8 may cause bogus behaviour. The problem in "ufs_bitmap_search" function, which doesn't work correctly in "block/fragment!=8" case. The idea is stolen from BSD code. Signed-off-by: Evgeniy Dushistov <dushistov@mail.ru> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] ufs: wrong type castEvgeniy Dushistov2006-06-256-84/+90
| | | | | | | | | | | | | | | | | | | | | | There are two ugly macros in ufs code: #define UCPI_UBH ((struct ufs_buffer_head *)ucpi) #define USPI_UBH ((struct ufs_buffer_head *)uspi) when uspi looks like struct { struct ufs_buffer_head ; } and USPI_UBH has some sence, ucpi looks like struct { struct not_ufs_buffer_head; } To prevent bugs in future, this patch convert macros to inline function and fix "ucpi" structure. Signed-off-by: Evgeniy Dushistov <dushistov@mail.ru> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] ufs: directory and page cache: from blocks to pagesEvgeniy Dushistov2006-06-252-504/+547
| | | | | | | | | | | | Change function in fs/ufs/dir.c and fs/ufs/namei.c to work with pages instead of straight work with blocks. It fixed such bugs: * for i in `seq 1 1000`; do touch $i; done - crash system * mkdir create directory without "." and ".." entries Signed-off-by: Evgeniy Dushistov <dushistov@mail.ru> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] ufs: directory and page cache: install aopsEvgeniy Dushistov2006-06-252-34/+25
| | | | | | | | | | | | | | | | | | | | | | | | | | | | This series of patches finished "bugs fixing" mentioned here http://lkml.org/lkml/2006/1/31/275 . The main bugs: * for i in `seq 1 1000`; do touch $i; done - crash system * mkdir create directory without "." and ".." entries The suggested solution is work with page cache instead of straight work with blocks. Such solution has following advantages * reduce code size and its complexity * some global locks go away * fix bugs The most part of code is stolen from ext2, because of it has similar directory structure. Patches testes with UFS1 and UFS2 file systems. This patch installs i_mapping->a_ops for directory inodes and removes some duplicated code. Signed-off-by: Evgeniy Dushistov <dushistov@mail.ru> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] ufs: change block number on the flyEvgeniy Dushistov2006-06-252-43/+138
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | First of all some necessary notes about UFS by it self: To avoid waste of disk space the tail of file consists not from blocks (which is ordinary big enough, 16K usually), it consists from fragments(which is ordinary 2K). When file is growing its tail occupy 1 fragment, 2 fragments... At some stage decision to allocate whole block is made and all fragments are moved to one block. How this situation was handled before: ufs_prepare_write ->block_prepare_write ->ufs_getfrag_block ->... ->ufs_new_fragments: bh = sb_bread bh->b_blocknr = result + i; mark_buffer_dirty (bh); This is wrong solution, because: - it didn't take into consideration that there is another cache: "inode page cache" - because of sb_getblk uses not b_blocknr, (it uses page->index) to find certain block, this breaks sb_getblk. How this situation is handled now: we go though all "page inode cache", if there are no such page in cache we load it into cache, and change b_blocknr. Signed-off-by: Evgeniy Dushistov <dushistov@mail.ru> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] ufs: right block allocationEvgeniy Dushistov2006-06-252-27/+18
| | | | | | | | | | | | * After block allocation, we map it on the same "address" as 8 others blocks * We nullify block several times: once in ufs/block.c and once in block_*write_full_page, and use different "caches" for this. Signed-off-by: Evgeniy Dushistov <dushistov@mail.ru> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] ufs: ufs_trunc_indirect: infinite cycleEvgeniy Dushistov2006-06-253-47/+21
| | | | | | | | | | | | | | | | | | | Currently, ufs write support have two sets of problems: work with files and work with directories. This series of patches should solve the first problem. This patch is similar to http://lkml.org/lkml/2006/1/17/61 this patch complements it. The situation the same: in ufs_trunc_(not direct), we read block, check if count of links to it is equal to one, if so we finish cycle, if not continue. Because of "count of links" always >=2 this operation cause infinite cycle and hang up the kernel. Signed-off-by: Evgeniy Dushistov <dushistov@mail.ru> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] fs/freevxfs: cleanup of spelling errorsCliff Wickman2006-06-252-8/+8
| | | | | | | | | Fix of some spelling errors in fs/freevxfs error messages and comments Signed-off-by: Cliff Wickman <cpw@sgi.com> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] splice: retrieve mapping after locking the pageJens Axboe2006-06-231-17/+29
| | | | | | | | | Otherwise we could be racing with truncate/mapping removal. Problem found/fixed by Nick Piggin <npiggin@suse.de>, logic rewritten by me. Signed-off-by: Jens Axboe <axboe@suse.de>
* [PATCH] Kill PF_SYNCWRITE flagJens Axboe2006-06-233-14/+8
| | | | | | | | | | | | | | | A process flag to indicate whether we are doing sync io is incredibly ugly. It also causes performance problems when one does a lot of async io and then proceeds to sync it. Part of the io will go out as async, and the other part as sync. This causes a disconnect between the previously submitted io and the synced io. For io schedulers such as CFQ, this will cause us lost merges and suboptimal behaviour in scheduling. Remove PF_SYNCWRITE completely from the fsync/msync paths, and let the O_DIRECT path just directly indicate that the writes are sync by using WRITE_SYNC instead. Signed-off-by: Jens Axboe <axboe@suse.de>
* [PATCH] make kernel warn about incorrectly sized partitionsMike Miller2006-06-231-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | Sometimes partitions claim to be larger than the reported capacity of a disk device. This patch makes the kernel warn about those partitions. We still permit these patitions to be used. Quoting Andries Brouwer <Andries.Brouwer@cwi.nl>: Case 1: The kernel is mistaken about the size of the disk. (There are commands to clip a disk to a certain capacity, there are jumpers to tell a disk that it should report a certain capacity etc. Usually this is because of BIOS bugs. In bad cases the machine will crash in the BIOS and hence fail to boot if the disk reports full capacity.) In such cases actually accessing the blocks of the partition may work fine, or may work fine after running an unclip utility. I wrote "setmax" some years ago precisely for this reason. Case 2: There was a messy partition table (maybe just a rounding error) but the actual filesystem on the partition is contained in the physical disk. Now using the filesystem goes without problem. Case 3: Both partition and filesystem extend beyond the end of the disk. In forensic or debugging situations one often uses a copy of the start of a disk. Now access beyond the end gives an expected I/O error. Signed-off-by: Mike Miller <mike.miller@hp.com> Signed-off-by: Stephen Cameron <steve.cameron@hp.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] JBD: split checkpoint listsJan Kara2006-06-231-180/+239
| | | | | | | | | | | | | | Split the checkpoint list of the transaction into two lists. In the first list we keep the buffers that need to be submitted for IO. In the second list are kept buffers that were already submitted and we just have to wait for the IO to complete. This should simplify a handling of checkpoint lists a bit and can eventually be also a performance gain. Signed-off-by: Jan Kara <jack@suse.cz> Cc: Mark Fasheh <mark.fasheh@oracle.com> Cc: "Stephen C. Tweedie" <sct@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] list: use list_replace_init() instead of list_splice_init()Oleg Nesterov2006-06-231-2/+2
| | | | | | | | | | list_splice_init(list, head) does unneeded job if it is known that list_empty(head) == 1. We can use list_replace_init() instead. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] percpu counter data type changes to suppport more than 2**31 ext3 ↵Mingming Cao2006-06-233-30/+33
| | | | | | | | | | | | | | | | | | | | | free blocks counter The percpu counter data type are changed in this set of patches to support more users like ext3 who need more than 32 bit to store the free blocks total in the filesystem. - Generic perpcu counters data type changes. The size of the global counter and local counter were explictly specified using s64 and s32. The global counter is changed from long to s64, while the local counter is changed from long to s32, so we could avoid doing 64 bit update in most cases. - Users of the percpu counters are updated to make use of the new percpu_counter_init() routine now taking an additional parameter to allow users to pass the initial value of the global counter. Signed-off-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] binflt_elf: remove more castsJesper Juhl2006-06-231-9/+10
| | | | | | | | Remove redundant casts from NEW_AUX_ENT() arguments in fs/binfmt_elf.c Signed-off-by: Jesper Juhl <jesper.juhl@gmail.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] binfmt_elf: CodingStyle cleanup and remove some pointless castsJesper Juhl2006-06-231-161/+183
| | | | | | | | | Do a CodingStyle cleanup of fs/binfmt_elf.c and also remove some pointless casts of kmalloc() return values in the same file. Signed-off-by: Jesper Juhl <jesper.juhl@gmail.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] ext3_clear_inode(): avoid kfree(NULL)Andrew Morton2006-06-231-11/+12
| | | | | | | | | | Steven Rostedt <rostedt@goodmis.org> points out that `rsv' here is usually NULL, so we should avoid calling kfree(). Also, fix up some nearby whitespace damage. Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] jbd: avoid kfree(NULL)Andrew Morton2006-06-231-3/+6
| | | | | | | | | | | | There are a couple of places where JBD has to check to see whether an unneeded memory allocation was performed. Usually it _was_ needed, so we end up calling kfree(NULL). We can micro-optimise that by checking the pointer before calling kfree(). Thanks to Steven Rostedt <rostedt@goodmis.org> for identifying this. Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] jbd: fix BUG in journal_commit_transaction()Jan Kara2006-06-232-6/+18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fix possible assertion failure in journal_commit_transaction() on jh->b_next_transaction == NULL (when we are processing BJ_Forget list and buffer is not jbddirty). !jbddirty buffers can be placed on BJ_Forget list for example by journal_forget() or by __dispose_buffer() - generally such buffer means that it has been freed by this transaction. Freed buffers should not be reallocated until the transaction has committed (that's why we have the assertion there) but they *can* be reallocated when the transaction has already been committed to disk and we are just processing the BJ_Forget list (as soon as we remove b_committed_data from the bitmap bh, ext3 will be able to reallocate buffers freed by the committing transaction). So we have to also count with the case that the buffer has been reallocated and b_next_transaction has been already set. And one more subtle point: it can happen that we manage to reallocate the buffer and also mark it jbddirty. Then we also add the freed buffer to the checkpoint list of the committing trasaction. But that should do no harm. Non-jbddirty buffers should be filed to BJ_Reserved and not BJ_Metadata list. It can actually happen that we refile such buffers during the commit phase when we reallocate in the running transaction blocks deleted in committing transaction (and that can happen if the committing transaction already wrote all the data and is just cleaning up BJ_Forget list). Signed-off-by: Jan Kara <jack@suse.cz> Acked-by: "Stephen C. Tweedie" <sct@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] Poll cleanups/microoptimizationsVadim Lobanov2006-06-231-32/+51
| | | | | | | | | | | | | | | The "count" and "pt" variables are declared and modified by do_poll(), as well as accessed and written indirectly in the do_pollfd() subroutine. This patch pulls all handling of these variables into the do_poll() function, thereby eliminating the odd use of indirection in do_pollfd(). This is done by pulling the "struct pollfd" traversal loop from do_pollfd() into its only caller do_poll(). As an added bonus, the patch saves a few clock cycles, and also adds comments to make the code easier to follow. Signed-off-by: Vadim Lobanov <vlobanov@speakeasy.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] fs/fat/misc.c: unexport fat_sync_bhsAdrian Bunk2006-06-231-1/+0
| | | | | | | Signed-off-by: Adrian Bunk <bunk@stusta.de> Acked-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] fs/locks.c: make posix_locks_deadlock() staticAdrian Bunk2006-06-231-3/+1
| | | | | | | | | We can now make posix_locks_deadlock() static. Signed-off-by: Adrian Bunk <bunk@stusta.de> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] vfs: add lock owner argument to flush operationMiklos Szeredi2006-06-237-8/+8
| | | | | | | | | | | | | | | | | | | Pass the POSIX lock owner ID to the flush operation. This is useful for filesystems which don't want to store any locking state in inode->i_flock but want to handle locking/unlocking POSIX locks internally. FUSE is one such filesystem but I think it possible that some network filesystems would need this also. Also add a flag to indicate that a POSIX locking request was generated by close(), so filesystems using the above feature won't send an extra locking request in this case. Signed-off-by: Miklos Szeredi <miklos@szeredi.hu> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] locks: clean up locks_remove_posix()Miklos Szeredi2006-06-231-20/+5
| | | | | | | | | | | | | locks_remove_posix() can use posix_lock_file() instead of doing the lock removal by hand. posix_lock_file() now does exacly the same. The comment about pids no longer applies, posix_lock_file() takes only the owner into account. Signed-off-by: Miklos Szeredi <miklos@szeredi.hu> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] locks: don't do unnecessary allocationsMiklos Szeredi2006-06-231-3/+10
| | | | | | | | | | | | | | | | posix_lock_file() always allocates new locks in advance, even if it's easy to determine that no allocations will be needed. Optimize these cases: - FL_ACCESS flag is set - Unlocking the whole range Signed-off-by: Miklos Szeredi <miklos@szeredi.hu> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] locks: don't unnecessarily fail posix lock operationsMiklos Szeredi2006-06-231-7/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | posix_lock_file() was too cautious, failing operations on OOM, even if they didn't actually require an allocation. This has the disadvantage, that a failing unlock on process exit could lead to a memory leak. There are two possibilites for this: - filesystem implements .lock() and calls back to posix_lock_file(). On cleanup of files_struct locks_remove_posix() is called which should remove all locks belonging to files_struct. However if filesystem calls posix_lock_file() which fails, then those locks will never be freed. - if a file is closed while a lock is blocked, then after acquiring fcntl_setlk() will undo the lock. But this unlock itself might fail on OOM, again possibly leaking the lock. The solution is to move the checking of the allocations until after it is sure that they will be needed. This will solve the above problem since unlock will always succeed unless it splits an existing region. Signed-off-by: Miklos Szeredi <miklos@szeredi.hu> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] read_mapping_page for address spacePekka Enberg2006-06-2320-54/+30
| | | | | | | | | | Add read_mapping_page() which is used for callers that pass mapping->a_ops->readpage as the filler for read_cache_page. This removes some duplication from filesystem code. Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>