path: root/fs/ceph/osd_client.h
Commit message | Author | Age | Files | Lines
* ceph: factor out libceph from Ceph file system | Yehuda Sadeh | 2010-10-20 | 1 | -233/+0
  This factors out protocol and low-level storage parts of ceph into a separate libceph module living in net/ceph and include/linux/ceph. This is mostly a matter of moving files around. However, a few key pieces of the interface change as well:

  - ceph_client becomes ceph_fs_client and ceph_client, where the latter captures the mon and osd clients, and the fs_client gets the mds client and file system specific pieces.
  - Mount option parsing and debugfs setup are correspondingly broken into two pieces.
  - The mon client gets a generic handler callback for otherwise unknown messages (mds map, in this case).
  - The basic supported/required feature bits can be expanded (and are by ceph_fs_client).

  No functional change, aside from some subtle error handling cases that got cleaned up in the refactoring process.

  Signed-off-by: Sage Weil <sage@newdream.net>
* ceph-rbd: osdc support for osd call and rollback operations | Yehuda Sadeh | 2010-10-20 | 1 | -0/+6
  This will be used for rbd snapshots administration.

  Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
* ceph: messenger and osdc changes for rbd | Yehuda Sadeh | 2010-10-20 | 1 | -13/+48
  Allow the messenger to send/receive data in a bio. This is added so that we wouldn't need to copy the data into pages or some other buffer when doing IO for an rbd block device.

  We can now have trailing variable sized data for osd ops. Also osd ops encoding is more modular.

  Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
  Signed-off-by: Sage Weil <sage@newdream.net>
* ceph: refactor osdc requests creation functions | Yehuda Sadeh | 2010-10-20 | 1 | -0/+25
  OSD request creation is being decoupled from the vino parameter, allowing clients of the osd client to use arbitrary object names that are not necessarily vino based. Also, calc_raw_layout now takes a snap id.

  Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
  Signed-off-by: Sage Weil <sage@newdream.net>
* ceph: resubmit requests on pg mapping change (not just primary change) | Sage Weil | 2010-05-11 | 1 | -0/+2
  OSD requests need to be resubmitted on any pg mapping change, not just when the pg primary changes. Resending only when the primary changes results in occasional 'hung' requests during osd cluster recovery or rebalancing.

  Signed-off-by: Sage Weil <sage@newdream.net>
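  The distinction this commit draws can be sketched in a few lines of C. This is only an illustration, not the code from the commit: `struct pg_mapping`, `MAX_ACTING`, `primary_changed()` and `mapping_changed()` are hypothetical names, and the sketch assumes the first entry of the mapping is the primary.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_ACTING 8 /* hypothetical cap on OSDs per pg mapping */

/* Hypothetical view of the OSDs a pg maps to; osds[0] is assumed primary. */
struct pg_mapping {
	int osds[MAX_ACTING];
	size_t num;
};

/* Old, narrower check: only notices a different primary. */
bool primary_changed(const struct pg_mapping *prev,
		     const struct pg_mapping *cur)
{
	int prev_primary = prev->num ? prev->osds[0] : -1;
	int cur_primary = cur->num ? cur->osds[0] : -1;

	return prev_primary != cur_primary;
}

/* Broader check: any difference in the mapping triggers a resubmit. */
bool mapping_changed(const struct pg_mapping *prev,
		     const struct pg_mapping *cur)
{
	if (prev->num != cur->num)
		return true;
	return memcmp(prev->osds, cur->osds,
		      prev->num * sizeof(prev->osds[0])) != 0;
}
```

  With mappings {3, 5, 7} and {3, 5, 9}, the primary is unchanged but the mapping is not, which is the case the narrower check would miss.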
* ceph: don't use writeback_control in writepages completion | Sage Weil | 2010-05-05 | 1 | -1/+0
  The ->writepages writeback_control is no longer valid in the writepages completion. We were touching it solely to adjust pages_skipped when there was a writeback error (EIO, ENOSPC, EPERM due to bad osd credentials), causing an oops in the writeback code shortly thereafter.

  Updating pages_skipped on error isn't correct anyway, so let's just rip out this (clearly broken) code to pass the wbc to the completion.

  Signed-off-by: Sage Weil <sage@newdream.net>
* ceph: rename r_sent_stamp r_stamp | Sage Weil | 2010-03-23 | 1 | -1/+1
  Make variable name slightly more generic, since it will (soon) reflect either the time the request was sent OR the time it was last determined to be still retrying.

  Signed-off-by: Sage Weil <sage@newdream.net>
* ceph: reset osd after relevant messages timed out | Yehuda Sadeh | 2010-03-04 | 1 | -1/+5
  This simplifies the process of timing out messages. We keep an LRU of the messages that are currently in flight. If a timeout has passed, we reset the osd connection, so that messages will be retransmitted. This is a failsafe in case we hit some sort of problem sending out a message to the OSD. Normally, we'll get notification via an updated osdmap if there are problems.

  If a request is older than the keepalive timeout, send a keepalive to ensure we detect any breaks in the TCP connection.

  Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
  Signed-off-by: Sage Weil <sage@newdream.net>
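  The two-threshold behaviour described above can be sketched as a small decision function. This is a simplified illustration under stated assumptions, not the kernel implementation: the enum, the function name `check_oldest_request()` and its parameters are hypothetical, time is an abstract tick count, and the reset timeout is assumed to be longer than the keepalive timeout.

```c
/* Hypothetical outcome of inspecting the oldest in-flight request. */
enum osd_action {
	OSD_ACTION_NONE,      /* request is still young; do nothing */
	OSD_ACTION_KEEPALIVE, /* probe the connection for a TCP break */
	OSD_ACTION_RESET      /* reset the connection so requests are resent */
};

/*
 * Decide what to do about the oldest request on an OSD's in-flight LRU.
 * All arguments are in the same (assumed) time unit; reset_timeout is
 * assumed to be larger than keepalive_timeout.
 */
enum osd_action check_oldest_request(unsigned long now,
				     unsigned long oldest_stamp,
				     unsigned long keepalive_timeout,
				     unsigned long reset_timeout)
{
	unsigned long age = now - oldest_stamp;

	if (age >= reset_timeout)
		return OSD_ACTION_RESET;
	if (age >= keepalive_timeout)
		return OSD_ACTION_KEEPALIVE;
	return OSD_ACTION_NONE;
}
```

  Because the LRU keeps requests in order, only the oldest entry needs to be examined to decide whether any request has crossed a threshold.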
* ceph: use single osd op reply msg | Sage Weil | 2010-03-01 | 1 | -4/+1
  Use a single ceph_msg for the osd reply, even when we are getting multiple replies.

  Signed-off-by: Sage Weil <sage@newdream.net>
* ceph: put unused osd connections on lru | Yehuda Sadeh | 2010-02-11 | 1 | -0/+4
  Instead of removing an osd connection immediately when its requests list is empty, put the osd connection on an LRU. It is removed only if that osd has not been used for more than a specified time.

  Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
  Signed-off-by: Sage Weil <sage@newdream.net>
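  A minimal sketch of the idle-connection expiry idea, assuming the LRU is kept oldest-first. The kernel uses linked lists for this; the array, `struct idle_osd` and `expire_idle_osds()` below are hypothetical simplifications chosen to keep the example self-contained.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical record for an OSD connection with no pending requests. */
struct idle_osd {
	int id;
	unsigned long lru_stamp; /* time the OSD became idle */
};

/*
 * Drop entries that have been idle for more than ttl. The array is
 * assumed to be ordered oldest-first, so scanning stops at the first
 * entry that is still fresh. Returns the number of entries removed
 * and updates *count.
 */
size_t expire_idle_osds(struct idle_osd *lru, size_t *count,
			unsigned long now, unsigned long ttl)
{
	size_t expired = 0;

	while (expired < *count && now - lru[expired].lru_stamp > ttl)
		expired++;

	/* Compact the surviving entries to the front of the array. */
	memmove(lru, lru + expired, (*count - expired) * sizeof(*lru));
	*count -= expired;
	return expired;
}
```

  An OSD that receives a new request before its entry expires would simply be taken off the LRU again, avoiding a reconnect.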
* ceph: keep reserved replies on the request structure | Yehuda Sadeh | 2010-01-25 | 1 | -3/+5
  This includes handling all the data preallocation and revocation in the same place, without needing a special case for the reserved pages.

  Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
* ceph: display pgid in debugfs osd request dump | Sage Weil | 2010-01-14 | 1 | -0/+1
  Signed-off-by: Sage Weil <sage@newdream.net>
* ceph: control access to page vector for incoming data | Sage Weil | 2009-12-23 | 1 | -1/+3
  When we issue an OSD read, we specify a vector of pages that the data is to be read into. The request may be sent multiple times, to multiple OSDs, if the osdmap changes, which means we can get more than one reply.

  Only read data into the page vector if the reply is coming from the OSD we last sent the request to. Keep track of which connection is using the vector by taking a reference. If another connection was already using the vector before and a new reply comes in on the right connection, revoke the pages from the other connection.

  Signed-off-by: Sage Weil <sage@newdream.net>
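  The ownership rule in this commit message can be sketched as follows. This is an illustration only: `struct read_request`, its fields and `claim_pages()` are hypothetical, connections are identified by plain OSD numbers, and the reference counting mentioned in the message is reduced to a single owner field.

```c
#include <stdbool.h>

/* Hypothetical, simplified read request state. */
struct read_request {
	int sent_to_osd; /* OSD the request was most recently sent to */
	int pages_owner; /* OSD whose connection holds the pages, -1 if none */
};

/*
 * Called when a reply arrives from from_osd. Returns true if that
 * connection may read data into the page vector. If the pages were
 * held by a different connection, *revoked_from is set to that OSD so
 * the caller can revoke them; otherwise it is set to -1.
 */
bool claim_pages(struct read_request *req, int from_osd, int *revoked_from)
{
	*revoked_from = -1;

	/* Replies from an OSD we no longer target must not touch the pages. */
	if (from_osd != req->sent_to_osd)
		return false;

	if (req->pages_owner != -1 && req->pages_owner != from_osd)
		*revoked_from = req->pages_owner;

	req->pages_owner = from_osd;
	return true;
}
```

  In this sketch a stale reply leaves the current owner untouched, while a reply on the right connection takes over the pages and reports which connection lost them.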
* ceph: fix msgpool reservation leak | Yehuda Sadeh | 2009-12-21 | 1 | -0/+1
  Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
* ceph: use kref for ceph_osd_request | Sage Weil | 2009-12-07 | 1 | -3/+8
  Signed-off-by: Sage Weil <sage@newdream.net>
* ceph: negotiate authentication protocol; implement AUTH_NONE protocol | Sage Weil | 2009-11-18 | 1 | -0/+4
  When we open a monitor session, we send an initial AUTH message listing the auth protocols we support, our entity name, and (possibly) a previously assigned global_id. The monitor chooses a protocol and responds with an initial message.

  Initially implement AUTH_NONE, a dummy protocol that provides no security, but works within the new framework. It generates 'authorizers' that are used when connecting to (mds, osd) services that simply state our entity name and global_id.

  This is a wire protocol change.

  Signed-off-by: Sage Weil <sage@newdream.net>
* ceph: build cleanly without CONFIG_DEBUG_FS | Sage Weil | 2009-11-12 | 1 | -0/+2
  Signed-off-by: Sage Weil <sage@newdream.net>
* ceph: OSD client | Sage Weil | 2009-10-06 | 1 | -0/+144
  The OSD client is responsible for reading and writing data from/to the object storage pool. This includes determining where objects are stored in the cluster, and ensuring that requests are retried or redirected in the event of a node failure or data migration.

  If an OSD does not respond before a timeout expires, keepalive messages are sent across the lossless, ordered communications channel to ensure that any break in the TCP connection is discovered. If the session does reset, a reconnection is attempted and affected requests are resent (by the message transport layer).

  Signed-off-by: Sage Weil <sage@newdream.net>