path: root/fs/ceph
Commit message | Author | Age | Files | Lines
...
* ceph: connect to export targets on cap export | Sage Weil | 2010-08-01 | 3 | -2/+23
  When we get a cap EXPORT message, make sure we are connected to all export targets to ensure we can handle the matching IMPORT.
  Signed-off-by: Sage Weil <sage@newdream.net>
* ceph: connect to export targets if mds is laggy | Sage Weil | 2010-08-01 | 1 | -0/+15
  If an MDS we are talking to may have failed, we need to open sessions to its potential export targets to ensure that any in-progress migration that may have involved some of our caps is properly handled.
  Signed-off-by: Sage Weil <sage@newdream.net>
* ceph: introduce helper to connect to mds export targets | Sage Weil | 2010-08-01 | 1 | -0/+37
  There are a few cases where we need to open sessions with a given mds's potential export targets.
  Signed-off-by: Sage Weil <sage@newdream.net>
* ceph: only set num_pages in calc_layout | Sage Weil | 2010-08-01 | 1 | -3/+0
  Setting it elsewhere is unnecessary and more fragile.
  Signed-off-by: Sage Weil <sage@newdream.net>
* ceph: do caps accounting per mds_client | Yehuda Sadeh | 2010-08-01 | 5 | -115/+131
  Caps-related accounting is now done per mds client instead of globally. This lays the groundwork for a later revision of the preallocated caps reservation list.
  Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
  Signed-off-by: Sage Weil <sage@newdream.net>
* ceph: track laggy state of mds from mdsmap | Sage Weil | 2010-08-01 | 3 | -2/+16
  Signed-off-by: Sage Weil <sage@newdream.net>
* ceph: code cleanup | Yehuda Sadeh | 2010-08-01 | 13 | -49/+46
  Mainly fixing minor issues reported by sparse.
  Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
  Signed-off-by: Sage Weil <sage@newdream.net>
* ceph: skip if no auth cap in flush_snaps | Sage Weil | 2010-08-01 | 1 | -7/+12
  If we have a capsnap but no auth cap (e.g. because it is migrating to another mds), bail out and do nothing for now. Do NOT remove the capsnap from the flush list.
  Signed-off-by: Sage Weil <sage@newdream.net>
* ceph: simplify caps revocation, fix for multimds | Sage Weil | 2010-08-01 | 1 | -12/+18
  The caps revocation should either initiate writeback, invalidation, or call check_caps to ack or do the dirty work. The primary question is whether we can get away with only checking the auth cap or whether all caps need to be checked. The old code was doing...something else. At the very least, revocations from non-auth MDSs could break by triggering the "check auth cap only" case.
  Signed-off-by: Sage Weil <sage@newdream.net>
* ceph: simplify add_cap_releases | Sage Weil | 2010-08-01 | 1 | -16/+19
  No functional change, aside from more useful debug output.
  Signed-off-by: Sage Weil <sage@newdream.net>
* ceph: drop unused argument | Sage Weil | 2010-08-01 | 3 | -9/+6
  Signed-off-by: Sage Weil <sage@newdream.net>
* ceph: perform lazy reads when file mode and caps permit | Sage Weil | 2010-08-01 | 4 | -13/+22
  If the file mode is marked as "lazy," perform cached/buffered reads when the caps permit it. Adjust the rdcache_gen and invalidation logic accordingly so that we manage our cache based on the FILE_CACHE -or- FILE_LAZYIO cap bits.
  Signed-off-by: Sage Weil <sage@newdream.net>
* ceph: perform lazy writes when file mode and caps permit | Sage Weil | 2010-08-01 | 2 | -8/+11
  If we have marked a file as "lazy" (using the ceph ioctl), perform buffered writes when the MDS caps allow it.
  Signed-off-by: Sage Weil <sage@newdream.net>
* ceph: add LAZYIO ioctl to mark a file description for lazy consistency | Sage Weil | 2010-08-01 | 2 | -0/+26
  Allow an application to mark a file descriptor for lazy file consistency semantics, allowing buffered reads and writes when multiple clients are accessing the same file.
  Signed-off-by: Sage Weil <sage@newdream.net>
* ceph: request FILE_LAZYIO cap when LAZY file mode is set | Sage Weil | 2010-08-01 | 2 | -27/+26
  Also clean up the file flags -> file mode -> wanted caps functions while we're at it. This resyncs this file with userspace.
  Signed-off-by: Sage Weil <sage@newdream.net>
* ceph: use complete_all and wake_up_all | Yehuda Sadeh | 2010-07-27 | 6 | -20/+20
  This fixes an issue triggered by running concurrent syncs. One of the syncs would go through while the other would just hang indefinitely. In any case, we never actually want to wake a single waiter, so the *_all functions should be used.
  Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
  Signed-off-by: Sage Weil <sage@newdream.net>
* ceph: Correct obvious typo of Kconfig variable "CRYPTO_AES" | Robert P. J. Day | 2010-07-24 | 1 | -1/+1
  Signed-off-by: Robert P. J. Day <rpjday@crashcourse.ca>
  Signed-off-by: Sage Weil <sage@newdream.net>
* ceph: fix dentry lease release | Sage Weil | 2010-07-23 | 1 | -0/+1
  When we embed a dentry lease release notification in a request, invalidate our lease so we don't think we still have it. Otherwise we can get all sorts of incorrect client behavior when multiple clients are interacting with the same part of the namespace.
  Signed-off-by: Sage Weil <sage@newdream.net>
* ceph: fix leak of dentry in ceph_init_dentry() error path | Sage Weil | 2010-07-23 | 1 | -1/+3
  If we fail to allocate a ceph_dentry_info, don't leak the dn reference.
  Signed-off-by: Sage Weil <sage@newdream.net>
* ceph: fix pg_mapping leak on pg_temp updates | Sage Weil | 2010-07-23 | 1 | -11/+15
  Free the ceph_pg_mapping structs when they are removed from the pg_temp rbtree. Also fix a leak in the __insert_pg_mapping() error path.
  Signed-off-by: Sage Weil <sage@newdream.net>
* ceph: fix d_release dop for snapdir, snapped dentries | Sage Weil | 2010-07-23 | 1 | -3/+9
  We need to set the d_release dop for snapdir and snapped dentries so that the ceph_dentry_info struct gets released. We also use the dcache to cache readdir results when possible, which only works if we know when dentries are dropped from the cache. Since we don't use the dcache for readdir in the hidden snapdir, avoid that case in ceph_dentry_release.
  Signed-off-by: Sage Weil <sage@newdream.net>
* ceph: avoid dcache readdir for snapdir | Sage Weil | 2010-07-22 | 1 | -0/+1
  We should always go to the MDS for readdir on the hidden snapdir. The set of snapshots can change at any time; the client can't trust its cache for that.
  Signed-off-by: Sage Weil <sage@newdream.net>
* ceph: do not include cap/dentry releases in replayed messages | Sage Weil | 2010-07-16 | 2 | -0/+9
  Strip the cap and dentry releases from replayed messages. They can cause the shared state to get out of sync because they were generated (with the request message) earlier, and no longer reflect the current client state.
  Signed-off-by: Sage Weil <sage@newdream.net>
* ceph: reuse request message when replaying against recovering mds | Sage Weil | 2010-07-16 | 1 | -5/+22
  Replayed rename operations (after an mds failure/recovery) were broken because the request paths were regenerated from the dentry names, which get mangled when d_move() is called. Instead, resend the previous request message when replaying completed operations. Just make sure the REPLAY flag is set and the target ino is filled in. This fixes problems with workloads doing renames when the MDS restarts, where the rename operation appears to succeed, but on mds restart then fails (leading to client confusion, app breakage, etc.).
  Signed-off-by: Sage Weil <sage@newdream.net>
* ceph: fix creation of ipv6 sockets | Sage Weil | 2010-07-09 | 1 | -3/+5
  Use the address family from the peer address instead of assuming IPv4.
  Signed-off-by: Sage Weil <sage@newdream.net>
* ceph: fix parsing of ipv6 addresses | Sage Weil | 2010-07-09 | 1 | -6/+19
  Check for brackets around the ipv6 address to avoid ambiguity with the port number.
  Signed-off-by: Sage Weil <sage@newdream.net>
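The bracket convention from "fix parsing of ipv6 addresses" can be sketched in user space. This is only an illustration, not the kernel parser: `parse_host_port` is a hypothetical helper, it relies on the standard `inet_pton()` call, and it assumes that an unbracketed string is IPv4 so the first ':' can safely start the port.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>

/*
 * Hypothetical helper: parse "[v6addr]:port", "[v6addr]", "v4addr:port"
 * or "v4addr". Returns AF_INET or AF_INET6 on success and -1 on error.
 * *port is set to -1 when no port is given.
 */
static int parse_host_port(const char *s, struct sockaddr_storage *ss, int *port)
{
    char host[INET6_ADDRSTRLEN];
    const char *rest;
    size_t len;
    int family;

    assert(s && ss && port);
    memset(ss, 0, sizeof(*ss));
    *port = -1;

    if (*s == '[') {
        /* Bracketed form: everything up to ']' is the IPv6 address. */
        const char *rb = strchr(s, ']');
        if (!rb)
            return -1;
        s++;                        /* skip '[' */
        len = (size_t)(rb - s);
        rest = rb + 1;
        family = AF_INET6;
    } else {
        /* Unbracketed form: assumed IPv4, so the first ':' starts the port. */
        len = strcspn(s, ":");
        rest = s + len;
        family = AF_INET;
    }
    if (len == 0 || len >= sizeof(host))
        return -1;
    memcpy(host, s, len);
    host[len] = '\0';

    if (family == AF_INET6) {
        struct sockaddr_in6 *in6 = (struct sockaddr_in6 *)ss;
        if (inet_pton(AF_INET6, host, &in6->sin6_addr) != 1)
            return -1;
    } else {
        struct sockaddr_in *in4 = (struct sockaddr_in *)ss;
        if (inet_pton(AF_INET, host, &in4->sin_addr) != 1)
            return -1;
    }
    ss->ss_family = (sa_family_t)family;

    if (*rest == ':') {
        /* Optional decimal port after the address. */
        char *stop;
        long v = strtol(rest + 1, &stop, 10);
        if (stop == rest + 1 || *stop != '\0' || v < 0 || v > 65535)
            return -1;
        *port = (int)v;
    } else if (*rest != '\0') {
        return -1;                  /* trailing junk after the address */
    }
    return family;
}
```

Without the brackets, a string such as "::1:6789" cannot be split reliably, because the colons inside the address look the same as the one before the port.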
* ceph: fix printing of ipv6 addrs | Sage Weil | 2010-07-08 | 1 | -18/+6
  The buffer was too small. Make it bigger, use snprintf(), put brackets around the ipv6 address to avoid mixing it up with the :port, and use the ever-so-handy %pI[46] formats.
  Signed-off-by: Sage Weil <sage@newdream.net>
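A user-space analogue of the formatting described in "fix printing of ipv6 addrs" might look like the sketch below. The kernel's %pI4/%pI6 format specifiers are not available outside the kernel, so this assumes the standard `inet_ntop()` instead; `format_addr` is a hypothetical name.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stdio.h>
#include <sys/socket.h>

/*
 * Hypothetical helper: format an address as "a.b.c.d:port" or
 * "[v6addr]:port". Returns the snprintf() result (the length that was
 * needed), or -1 for an unsupported family or a conversion failure.
 */
static int format_addr(const struct sockaddr_storage *ss, char *buf, size_t len)
{
    /* INET6_ADDRSTRLEN is large enough for either family's text form. */
    char host[INET6_ADDRSTRLEN];

    assert(ss && buf && len > 0);

    if (ss->ss_family == AF_INET) {
        const struct sockaddr_in *in4 = (const struct sockaddr_in *)ss;
        if (!inet_ntop(AF_INET, &in4->sin_addr, host, sizeof(host)))
            return -1;
        return snprintf(buf, len, "%s:%u", host,
                        (unsigned)ntohs(in4->sin_port));
    }
    if (ss->ss_family == AF_INET6) {
        const struct sockaddr_in6 *in6 = (const struct sockaddr_in6 *)ss;
        if (!inet_ntop(AF_INET6, &in6->sin6_addr, host, sizeof(host)))
            return -1;
        /* Brackets keep the address colons apart from the ":port" suffix. */
        return snprintf(buf, len, "[%s]:%u", host,
                        (unsigned)ntohs(in6->sin6_port));
    }
    return -1;
}
```

Because snprintf() never writes past `len`, an undersized buffer produces a truncated string rather than an overflow, and the return value tells the caller how much room was actually needed.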
* ceph: add kfree() to error path | Dan Carpenter | 2010-07-08 | 1 | -0/+1
  We leak a "pi" on this error path.
  Signed-off-by: Dan Carpenter <error27@gmail.com>
  Signed-off-by: Sage Weil <sage@newdream.net>
* ceph: fix leak of mon authorizer | Sage Weil | 2010-07-05 | 1 | -0/+3
  Fix leak of a struct ceph_buffer on umount.
  Signed-off-by: Sage Weil <sage@newdream.net>
* ceph: fix message revocation | Sage Weil | 2010-07-05 | 1 | -7/+7
  A message can be on a queue (pending or sent), or out_msg (sending), or both. We were assuming that if it's not on a queue it couldn't be out_msg, but that was false in the case of lossy connections like the OSD. Fix ceph_con_revoke() to treat these cases independently. Also, fix the out_kvec_is_message check to only trigger if we are currently sending _this_ message. This fixes a GPF in tcp_sendpage, triggered by OSD restarts.
  Signed-off-by: Sage Weil <sage@newdream.net>
* ceph: fix crush device 'out' threshold to 1.0, not 0.1 | Sage Weil | 2010-07-05 | 1 | -1/+1
  Fix a typo that made any OSD weighted between 0.1 and 1.0 effectively weighted as 1.0 (fully in).
  Signed-off-by: Sage Weil <sage@newdream.net>
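The effect of the 'out' threshold typo can be illustrated with a toy model. Everything here is an assumption made for illustration: weights are taken to be 16.16 fixed point with 1.0 equal to 0x10000, the mistaken threshold is modeled as one tenth of that (following the commit message wording), `toy_hash` is a stand-in and not the CRUSH hash, and `is_out` and `count_in` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

#define WEIGHT_ONE   0x10000u           /* 1.0 in 16.16 fixed point (assumed) */
#define WEIGHT_TENTH (WEIGHT_ONE / 10)  /* roughly 0.1, the mistaken threshold */

/* Stand-in mixing function; NOT the hash CRUSH actually uses. */
static uint32_t toy_hash(uint32_t x, uint32_t item)
{
    uint32_t h = x * 2654435761u ^ item * 40503u;
    return h ^ (h >> 16);
}

/*
 * Return 1 if the device is treated as "out" for input x.
 * full_in is the weight at or above which a device is always in.
 */
static int is_out(uint32_t weight, uint32_t full_in, uint32_t x, uint32_t item)
{
    assert(weight <= WEIGHT_ONE);
    if (weight >= full_in)
        return 0;               /* considered fully in, no hashing */
    if (weight == 0)
        return 1;               /* fully out */
    /* Otherwise in with probability of about weight / 1.0. */
    return (toy_hash(x, item) & 0xffff) >= weight;
}

/* Count how many of the inputs 0..n-1 leave the device in. */
static unsigned count_in(uint32_t weight, uint32_t full_in, uint32_t item,
                         unsigned n)
{
    unsigned in = 0;
    for (uint32_t x = 0; x < n; x++)
        in += !is_out(weight, full_in, x, item);
    return in;
}
```

With `full_in` set to `WEIGHT_TENTH`, a device weighted 0.5 is in for every input, which matches the behavior the commit message describes. With `full_in` set to `WEIGHT_ONE`, the same device is in for only a fraction of the inputs.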
* ceph: fix caps usage accounting for import (non-reserved) case | Sage Weil | 2010-06-29 | 1 | -2/+8
  We need to increase the total and used counters when allocating a new cap in the non-reserved (cap import) case.
  Signed-off-by: Sage Weil <sage@newdream.net>
* ceph: only release clean, unused caps with mds requests | Sage Weil | 2010-06-29 | 1 | -5/+6
  We can drop caps with an mds request. Ensure we only drop unused AND clean caps, since the MDS doesn't support cap writeback in that context, nor do we track it. If caps are dirty, and the MDS needs them back, it will revoke them and we will flush in the normal fashion. This fixes a possible loss of metadata.
  Signed-off-by: Sage Weil <sage@newdream.net>
* ceph: fix crush CHOOSE_LEAF when type is already a leaf | Sage Weil | 2010-06-24 | 1 | -13/+25
  We may not recurse for CHOOSE_LEAF if we start with a leaf node. When that happens, the out2 vector needs to be filled in with the result.
  Signed-off-by: Sage Weil <sage@newdream.net>
* ceph: fix crush recursion | Sage Weil | 2010-06-24 | 1 | -0/+1
  There was a longstanding problem with recursion through intervening bucket types on complex hierarchies.
  Signed-off-by: Sage Weil <sage@newdream.net>
* ceph: fix caps debugfs entry | Yehuda Sadeh | 2010-06-24 | 1 | -1/+1
  The ceph client structure was not set correctly.
  Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
  Signed-off-by: Sage Weil <sage@newdream.net>
* ceph: delay umount until all mds requests drop inode+dentry refs | Sage Weil | 2010-06-21 | 1 | -0/+6
  This fixes a race between handle_reply finishing an mds request, signalling completion, and then dropping the request struct and its dentry+inode refs, and the pre_umount function waiting for requests to finish before letting the vfs tear down the dcache. If umount was delayed waiting for mds requests, we could race and BUG in shrink_dcache_for_umount_subtree because of a slow dput. This delays umount until the msgr queue flushes, which means handle_reply will exit and will have dropped the ceph_mds_request struct. I'm assuming the VFS has already ensured that its calls have all completed and those request refs have thus been dropped as well (I haven't seen that race, at least).
  Signed-off-by: Sage Weil <sage@newdream.net>
* ceph: handle splice_dentry/d_materialize_unique error in readdir_prepopulate | Sage Weil | 2010-06-21 | 1 | -7/+12
  Handle a splice_dentry failure (due to a d_materialize_unique error) without crashing. (Also, report the error code.)
  Signed-off-by: Sage Weil <sage@newdream.net>
* ceph: fix crush map update decoding | Sage Weil | 2010-06-17 | 1 | -0/+1
  If the incremental osdmap has a new crush map, advance the position after decoding so that we can parse the rest of the osdmap properly.
  Signed-off-by: Sage Weil <sage@newdream.net>
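The kind of bug described in "fix crush map update decoding" shows up in any cursor-based decoder: a nested blob is handled, but the cursor is not moved past it, so every later field is read from the wrong offset. The sketch below uses a made-up record layout (u32 length, blob bytes, u32 trailer) and hypothetical helper names; it is not the osdmap format.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read a little-endian u32 at *p and advance the cursor past it. */
static int get_u32(const uint8_t **p, const uint8_t *end, uint32_t *v)
{
    assert(p && *p && end && v);
    if (end - *p < 4)
        return -1;
    *v = (uint32_t)(*p)[0] | (uint32_t)(*p)[1] << 8 |
         (uint32_t)(*p)[2] << 16 | (uint32_t)(*p)[3] << 24;
    *p += 4;
    return 0;
}

/*
 * Hypothetical record: u32 blob_len, blob_len bytes of nested data,
 * then a u32 trailer. Returns 0 on success, -1 on a short buffer.
 */
static int decode_record(const uint8_t *buf, size_t n,
                         uint32_t *blob_len, uint32_t *trailer)
{
    const uint8_t *p = buf, *end = buf + n;

    if (get_u32(&p, end, blob_len))
        return -1;
    if ((size_t)(end - p) < *blob_len)
        return -1;
    /* A real decoder would parse the nested blob at [p, p + blob_len). */
    p += *blob_len;     /* the step that is easy to forget */
    return get_u32(&p, end, trailer);
}
```

If the `p += *blob_len` line is dropped, the trailer is decoded from the first bytes of the blob instead of the bytes that follow it.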
* ceph: fix message memory leak, uninitialized variable | Sage Weil | 2010-06-13 | 1 | -0/+2
  We need to properly initialize skip, as not all alloc_msg op instances set it. Also, BUG if someone says skip but also allocates a message.
  Signed-off-by: Sage Weil <sage@newdream.net>
* ceph: fix map handler error path | Sage Weil | 2010-06-13 | 1 | -1/+2
  Don't leak message if we receive an unexpected message type.
  Signed-off-by: Sage Weil <sage@newdream.net>
* ceph: some endianity fixes | Yehuda Sadeh | 2010-06-13 | 3 | -3/+4
  Fix some problems that came up with sparse.
  Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
  Signed-off-by: Sage Weil <sage@newdream.net>
* ceph: try to send partial cap release on cap message on missing inode | Sage Weil | 2010-06-10 | 3 | -5/+9
  If we have enough memory to allocate a new cap release message, do so, so that we can send a partial release message immediately. This keeps us from making the MDS wait when the cap release it needs is in a partially full release message. If we fail because of ENOMEM, oh well, they'll just have to wait a bit longer.
  Signed-off-by: Sage Weil <sage@newdream.net>
* ceph: release cap on import if we don't have the inode | Sage Weil | 2010-06-10 | 3 | -38/+61
  If we get an IMPORT that gives us a cap, but we don't have the inode, queue a release (and try to send it immediately) so that the MDS doesn't get stuck waiting for us.
  Signed-off-by: Sage Weil <sage@newdream.net>
* ceph: fix misleading/incorrect debug message | Sage Weil | 2010-06-10 | 1 | -1/+1
  Nothing is released here: the caps message is simply ignored in this case.
  Signed-off-by: Sage Weil <sage@newdream.net>
* ceph: fix atomic64_t initialization on ia64 | Jeff Mahoney | 2010-06-10 | 1 | -1/+1
  bdi_seq is an atomic_long_t but we're using ATOMIC_INIT, which causes build failures on ia64. This patch fixes it to use ATOMIC_LONG_INIT.
  Signed-off-by: Jeff Mahoney <jeffm@suse.com>
  Signed-off-by: Sage Weil <sage@newdream.net>
* ceph: fix lease revocation when seq doesn't match | Sage Weil | 2010-06-04 | 1 | -4/+8
  If the client revokes a lease with a higher seq than what we have, keep the mds's seq, so that it honors our release. Otherwise, we can hang indefinitely.
  Signed-off-by: Sage Weil <sage@newdream.net>
* ceph: fix f_namelen reported by statfs | Sage Weil | 2010-06-01 | 1 | -1/+1
  We were setting f_namelen in kstatfs to PATH_MAX instead of NAME_MAX. That disagrees with ceph_lookup behavior (which checks against NAME_MAX), and also makes the pjd posix test suite spit out ugly errors because it can't clean up its temporary files.
  Signed-off-by: Sage Weil <sage@newdream.net>
* ceph: fix memory leak in statfs | Yehuda Sadeh | 2010-06-01 | 1 | -0/+2
  Freeing the statfs request structure when required.
  Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
  Signed-off-by: Sage Weil <sage@newdream.net>
* ceph: fix d_subdirs ordering problem | Henry C Chang | 2010-06-01 | 1 | -1/+1
  We misused list_move_tail() to order the dentry in d_subdirs. This will screw up the d_subdirs order. This bug can be reliably reproduced by:
  1. mount ceph fs.
  2. on ceph fs, git clone git://ceph.newdream.net/git/ceph.git
  3. Run autogen.sh in ceph directory.
  (Note: Errors only occur the first time you run autogen.sh.)
  Signed-off-by: Henry C Chang <henry_c_chang@tcloudcomputing.com>
  Signed-off-by: Sage Weil <sage@newdream.net>
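Why the choice of list helper matters for ordering in the d_subdirs commit can be seen with a minimal user-space model of a circular doubly linked list. This is a re-implementation written for illustration, modeled on the kernel's list_head helpers but not copied from them; `struct item` and `first_id` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

struct list_head { struct list_head *next, *prev; };

static void list_init(struct list_head *h) { h->next = h->prev = h; }

/* Unlink an entry from whatever list it is on. */
static void list_del_entry(struct list_head *e)
{
    e->prev->next = e->next;
    e->next->prev = e->prev;
}

/* Insert right after head, i.e. at the front of the list. */
static void list_add(struct list_head *e, struct list_head *head)
{
    e->next = head->next;
    e->prev = head;
    head->next->prev = e;
    head->next = e;
}

/* Insert right before head, i.e. at the back of the list. */
static void list_add_tail(struct list_head *e, struct list_head *head)
{
    e->prev = head->prev;
    e->next = head;
    head->prev->next = e;
    head->prev = e;
}

static void list_move(struct list_head *e, struct list_head *head)
{
    list_del_entry(e);
    list_add(e, head);
}

static void list_move_tail(struct list_head *e, struct list_head *head)
{
    list_del_entry(e);
    list_add_tail(e, head);
}

struct item { int id; struct list_head node; };

/* Id of the first item on a non-empty list. */
static int first_id(const struct list_head *head)
{
    assert(head->next != head);
    return ((struct item *)((char *)head->next -
                            offsetof(struct item, node)))->id;
}
```

Starting from the order 1, 2, 3, calling `list_move()` on item 3 gives 3, 1, 2, while calling `list_move_tail()` on the same item leaves it at the end. Code that walks the list and expects one order will see a scrambled sequence if the other helper is used.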