author John Grossman <johngro@google.com> 2012-04-09 11:26:16 -0700
committer John Grossman <johngro@google.com> 2012-04-09 15:33:53 -0700
commit e1d6c080f0b1769637d742e51cc22167c7af12bb
tree 1043b44ab40cb0dc9bbce83ab3c40b3773dc044d /services/common_time
parent faef0d0f648570dae5e919e8cb2d9096861f2491
Make common_time more deferential when coming out of networkless mode.

Addresses issues seen in bug 6260139.

This is a really tough bug to repro, but there is no doubt that it is happening occasionally on our super huge A@H subnet. I collected data all weekend; the failure did not occur, but I got enough to construct a theoretical sequence of events which could trigger this behavior. The sequence goes like this.

1) A network is running and happy with a timeline master M, maintaining timeline X.
2) Device B boots, but its network is taking a long time to come up. After 60 seconds of waiting for the network to come up, device B goes into networkless master mode and creates timeline Y.
3) Device B's network comes up. It immediately sends a master announcement saying that it is the current low-priority master of timeline Y (it's low priority because it has never had any real clients).
4) Master M ignores B because B is low priority.
5) Device C boots and sends out a who-is-master request. It is a race between M and B to see who will respond first. In this case, B responds first.
6) C sends B a request which B receives. B now has its first client and is now high priority. In this scenario, B matches M in all aspects of the priority ranking function, including winning the tie breaker (larger MAC address when interpreted as a 48-bit integer).
7) M sends its master announcement; it is ignored by B since B now wins in the ranking function vs. M.
8) Finally, B sends its next master announcement. M sees it and realizes that there is a higher priority master out there (this looks like a bridged network scenario to M). M gives up master status along with timeline X. The clients of M become clients of B and move from timeline X to timeline Y (something which should only be needed during an actual network bridging event).

This change has a few different things meant to severely minimize the chance that this can happen.

First, and the most important change, is that networkless masters do not immediately announce themselves as masters on the network they are joining. Instead, they transition into Ronin to discover any pre-existing masters on the network. If there are no masters out there, the device will simply transition back to master and continue to maintain the timeline it had in networkless mode. In the scenario above, however, B should discover M and become its client, preserving the established timeline X.

Second, any time a device experiences an interface reconfiguration (including coming out of networkless mode), it clears its high priority bit. This is a good thing. The bit used to get set again any time...

1) The device is master and receives a client request.
2) The device becomes a client of another master on the network.
3) The device becomes a master.

Number 3 in this list is a mistake. The high priority bit should only be set during master election for devices which have been participating in a timeline which has been used by multiple devices. We know that this is the case when we are master and receive a request. We also know that this is the case when we hear from a master and decide to become its client. Simply becoming a master should not make us high priority. This behavior has been removed.

Third, timeouts have been adjusted just for some extra "stickiness" when it comes to master status. Clients now stay in the Ronin state for up to 10 seconds looking for a master, sending up to 20 discovery requests, instead of only 3 seconds (sending 6 requests). The wait-for-election timeout has been adjusted up from 5 seconds to 12.5 seconds to track the longer election cycle as well. Also, while in steady-state, clients will now wait until 10 packets (10 seconds) have gone unanswered by their master before giving up and dropping into Ronin.

Change-Id: I438b39f31265e34d6719d4adfa9e8b95a2afc188
Signed-off-by: John Grossman <johngro@google.com>
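
The ranking and high-priority-bit rules described in the message can be sketched roughly as below. This is an illustration only, not the code in common_time_server.cpp: `Candidate`, `Device`, `outranks`, and `demoScenario` are hypothetical names, and the comparison order (high-priority bit, then a base priority, then the MAC address as a 48-bit integer) is an assumption based on the description above.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of an election candidate; not the real server types.
struct Candidate {
    bool     highPriority;  // timeline is known to have been shared by multiple devices
    uint8_t  basePriority;  // assumed configured priority field
    uint64_t mac;           // 48-bit MAC address interpreted as an integer
};

// Returns true if a should win an election against b (assumed field order).
bool outranks(const Candidate& a, const Candidate& b) {
    if (a.highPriority != b.highPriority) return a.highPriority;
    if (a.basePriority != b.basePriority) return a.basePriority > b.basePriority;
    return a.mac > b.mac;  // tie breaker: the larger MAC wins
}

// Hypothetical device that tracks only the high-priority bit.
struct Device {
    Candidate self;
    // Any interface reconfiguration, including leaving networkless mode.
    void onInterfaceReconfigured() { self.highPriority = false; }
    // Rule 1: we are master and a client sent us a request.
    void onClientRequestWhileMaster() { self.highPriority = true; }
    // Rule 2: we became a client of another master on the network.
    void onBecameClientOfMaster() { self.highPriority = true; }
    // Former rule 3: with this change, becoming master leaves the bit alone.
    void onBecameMaster() {}
};

// Replays the relevant part of the scenario with made-up MAC addresses.
bool demoScenario() {
    Device m{{true, 1, 0x0000AA000001ULL}};   // established master M
    Device b{{false, 1, 0x0000BB000002ULL}};  // networkless master B, larger MAC

    b.onInterfaceReconfigured();
    b.onBecameMaster();
    bool mWinsWhileBIsLow = outranks(m.self, b.self);

    // If B ever serves a client it becomes high priority and then wins on MAC.
    b.onClientRequestWhileMaster();
    bool bWinsOnceHigh = outranks(b.self, m.self);

    return mWinsWhileBIsLow && bWinsOnceHigh;
}
```

The second half of `demoScenario` is the failure mode from steps 6 through 8. Having B drop into Ronin before it can answer any client request is what keeps it from reaching that point.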
Diffstat (limited to 'services/common_time')
-rw-r--r-- services/common_time/common_time_server.cpp | 23
1 files changed, 16 insertions, 7 deletions
diff --git a/services/common_time/common_time_server.cpp b/services/common_time/common_time_server.cpp
index 48fea66..59576aa 100644
--- a/services/common_time/common_time_server.cpp
+++ b/services/common_time/common_time_server.cpp
@@ -80,14 +80,14 @@ const int CommonTimeServer::kInitial_WhoIsMasterTimeoutMs = 500;
// number of sync requests that can fail before a client assumes its master
// is dead
-const int CommonTimeServer::kClient_NumSyncRequestRetries = 5;
+const int CommonTimeServer::kClient_NumSyncRequestRetries = 10;
/*** Master state constants ***/
/*** Ronin state constants ***/
// number of WhoIsMaster attempts sent before declaring ourselves master
-const int CommonTimeServer::kRonin_NumWhoIsMasterRetries = 4;
+const int CommonTimeServer::kRonin_NumWhoIsMasterRetries = 20;
// timeout used when waiting for a response to a WhoIsMaster request
const int CommonTimeServer::kRonin_WhoIsMasterTimeoutMs = 500;
@@ -96,7 +96,7 @@ const int CommonTimeServer::kRonin_WhoIsMasterTimeoutMs = 500;
// how long do we wait for an announcement from a master before
// trying another election?
-const int CommonTimeServer::kWaitForElection_TimeoutMs = 5000;
+const int CommonTimeServer::kWaitForElection_TimeoutMs = 12500;
CommonTimeServer::CommonTimeServer()
: Thread(false)
@@ -279,10 +279,14 @@ bool CommonTimeServer::runStateMachine_l() {
// If we were in the master state, then either we were the
// master in a no-network situation, or we were the master
// of a different network and have moved to a new interface.
- // In either case, immediately send out a master
- // announcement at low priority.
+ // In either case, immediately transition to Ronin at low
+ // priority. If there is no one in the network we just
+ // joined, we will become master soon enough. If there is,
+ // we want to be certain to defer master status to the
+ // existing timeline currently running on the network.
+ //
case CommonClockService::STATE_MASTER:
- sendMasterAnnouncement();
+ becomeRonin("leaving networkless mode");
break;
// If we were in any other state (CLIENT, RONIN, or
@@ -1071,6 +1075,12 @@ bool CommonTimeServer::becomeClient(const sockaddr_storage& masterEP,
mMasterEP = masterEP;
mMasterEPValid = true;
+
+ // If we are on a real network as a client of a real master, then we should
+ // no longer force low priority. If our master disappears, we should have
+ // the high priority bit set during the election to replace the master
+ // because this group was a real group and not a singleton created in
+ // networkless mode.
setForceLowPriority(false);
mClient_MasterDeviceID = masterDeviceID;
@@ -1112,7 +1122,6 @@ bool CommonTimeServer::becomeMaster(const char* cause) {
memset(&mMasterEP, 0, sizeof(mMasterEP));
mMasterEPValid = false;
- setForceLowPriority(false);
mClient_MasterDevicePriority = effectivePriority();
mClient_MasterDeviceID = mDeviceID;
mClockRecovery.reset(false, true);
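
As a quick check of the timing figures quoted in the commit message against the new constants, here is a small sketch. It assumes that a device in Ronin sends one WhoIsMaster request per timeout period, so that the discovery window is simply retries times timeout. The helper names `roninDiscoveryWindowMs` and `electionWaitCoversDiscovery` are hypothetical; the constant values are copied from the diff.

```cpp
#include <cassert>

// New values from the diff above.
constexpr int kRonin_NumWhoIsMasterRetries = 20;
constexpr int kRonin_WhoIsMasterTimeoutMs  = 500;
constexpr int kWaitForElection_TimeoutMs   = 12500;

// Assumption: one request per timeout period, so the window is the product.
constexpr int roninDiscoveryWindowMs(int retries, int timeoutMs) {
    return retries * timeoutMs;
}

// True when a device waiting for an election outlasts a full Ronin discovery
// cycle, which is the stated reason for raising the wait to 12.5 seconds.
constexpr bool electionWaitCoversDiscovery() {
    return kWaitForElection_TimeoutMs >
           roninDiscoveryWindowMs(kRonin_NumWhoIsMasterRetries,
                                  kRonin_WhoIsMasterTimeoutMs);
}

static_assert(roninDiscoveryWindowMs(kRonin_NumWhoIsMasterRetries,
                                     kRonin_WhoIsMasterTimeoutMs) == 10000,
              "20 requests at 500 ms each give the 10 second window");
static_assert(electionWaitCoversDiscovery(),
              "the election wait should exceed the discovery window");
```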