aboutsummaryrefslogtreecommitdiffstats
path: root/net/ipv4/tcp_cubic.c
Commit message (Collapse)AuthorAgeFilesLines
* tcp_cubic: do not set epoch_start in the futureEric Dumazet2016-02-221-2/+8
| | | | | | | | | | | | | | | | | | | | | | | | | Tracking idle time in bictcp_cwnd_event() is imprecise, as epoch_start is normally set at ACK processing time, not at send time. Doing a proper fix would need to add an additional state variable, and does not seem worth the trouble, given CUBIC bug has been there forever before Jana noticed it. Let's simply not set epoch_start in the future, otherwise bictcp_update() could overflow and CUBIC would again grow cwnd too fast. This was detected thanks to a packetdrill test Neal wrote that was flaky before applying this fix. Change-Id: I600d3a8ac0ed1d70f3ff5cc6b7341ac13178f4f3 Fixes: 30927520dbae ("tcp_cubic: better follow cubic curve after idle period") Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Yuchung Cheng <ycheng@google.com> Cc: Jana Iyengar <jri@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
* tcp_cubic: better follow cubic curve after idle periodEric Dumazet2016-02-221-0/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Jana Iyengar found an interesting issue on CUBIC : The epoch is only updated/reset initially and when experiencing losses. The delta "t" of now - epoch_start can be arbitrary large after app idle as well as the bic_target. Consequentially the slope (inverse of ca->cnt) would be really large, and eventually ca->cnt would be lower-bounded in the end to 2 to have delayed-ACK slow-start behavior. This particularly shows up when slow_start_after_idle is disabled as a dangerous cwnd inflation (1.5 x RTT) after few seconds of idle time. Jana initial fix was to reset epoch_start if app limited, but Neal pointed out it would ask the CUBIC algorithm to recalculate the curve so that we again start growing steeply upward from where cwnd is now (as CUBIC does just after a loss). Ideally we'd want the cwnd growth curve to be the same shape, just shifted later in time by the amount of the idle period. Change-Id: Ia99e2743517729b1c0589fbb8db9965ec806043b Reported-by: Jana Iyengar <jri@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Cc: Stephen Hemminger <stephen@networkplumber.org> Cc: Sangtae Ha <sangtae.ha@gmail.com> Cc: Lawrence Brakmo <lawrence@brakmo.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
* tcp: cubic: fix bug in bictcp_acked()Eric Dumazet2013-09-141-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit cd6b423afd3c08b27e1fed52db828ade0addbc6b ] While investigating about strange increase of retransmit rates on hosts ~24 days after boot, Van found hystart was disabled if ca->epoch_start was 0, as following condition is true when tcp_time_stamp high order bit is set. (s32)(tcp_time_stamp - ca->epoch_start) < HZ Quoting Van : At initialization & after every loss ca->epoch_start is set to zero so I believe that the above line will turn off hystart as soon as the 2^31 bit is set in tcp_time_stamp & hystart will stay off for 24 days. I think we've observed that cubic's restart is too aggressive without hystart so this might account for the higher drop rate we observe. Diagnosed-by: Van Jacobson <vanj@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Neal Cardwell <ncardwell@google.com> Cc: Yuchung Cheng <ycheng@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* tcp: cubic: fix overflow error in bictcp_update()Eric Dumazet2013-09-141-4/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit 2ed0edf9090bf4afa2c6fc4f38575a85a80d4b20 ] commit 17a6e9f1aa9 ("tcp_cubic: fix clock dependency") added an overflow error in bictcp_update() in following code : /* change the unit from HZ to bictcp_HZ */ t = ((tcp_time_stamp + msecs_to_jiffies(ca->delay_min>>3) - ca->epoch_start) << BICTCP_HZ) / HZ; Because msecs_to_jiffies() being unsigned long, compiler does implicit type promotion. We really want to constrain (tcp_time_stamp - ca->epoch_start) to a signed 32bit value, or else 't' has unexpected high values. This bugs triggers an increase of retransmit rates ~24 days after boot [1], as the high order bit of tcp_time_stamp flips. [1] for hosts with HZ=1000 Big thanks to Van Jacobson for spotting this problem. Diagnosed-by: Van Jacobson <vanj@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Neal Cardwell <ncardwell@google.com> Cc: Yuchung Cheng <ycheng@google.com> Cc: Stephen Hemminger <stephen@networkplumber.org> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* tcp_cubic: limit delayed_ack ratio to prevent divide errorstephen hemminger2011-05-081-2/+7
| | | | | | | | | | | | | | | | TCP Cubic keeps a metric that estimates the amount of delayed acknowledgements to use in adjusting the window. If an abnormally large number of packets are acknowledged at once, then the update could wrap and reach zero. This kind of ACK could only happen when there was a large window and huge number of ACK's were lost. This patch limits the value of delayed ack ratio. The choice of 32 is just a conservative value since normally it should be range of 1 to 4 packets. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* Merge branch 'master' of ↵David S. Miller2011-03-151-11/+34
|\ | | | | | | master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
| * tcp_cubic: fix low utilization of CUBIC with HyStartSangtae Ha2011-03-141-0/+7
| | | | | | | | | | | | | | | | | | | | | | | | HyStart sets the initial exit point of slow start. Suppose that HyStart exits at 0.5BDP in a BDP network and no history exists. If the BDP of a network is large, CUBIC's initial cwnd growth may be too conservative to utilize the link. CUBIC increases the cwnd 20% per RTT in this case. Signed-off-by: Sangtae Ha <sangtae.ha@gmail.com> Acked-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * tcp_cubic: make the delay threshold of HyStart less sensitiveSangtae Ha2011-03-141-1/+1
| | | | | | | | | | | | | | | | | | Make HyStart less sensitive to abrupt delay variations due to buffer bloat. Signed-off-by: Sangtae Ha <sangtae.ha@gmail.com> Acked-by: Stephen Hemminger <shemminger@vyatta.com> Reported-by: Lucas Nussbaum <lucas.nussbaum@loria.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
| * tcp_cubic: enable high resolution ack time if neededstephen hemminger2011-03-141-0/+4
| | | | | | | | | | | | | | | | | | | | This is a refined version of an earlier patch by Lucas Nussbaum. Cubic needs RTT values in milliseconds. If HZ < 1000 then the values will be too coarse. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Reported-by: Lucas Nussbaum <lucas.nussbaum@loria.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
| * tcp_cubic: fix clock dependencystephen hemminger2011-03-141-12/+19
| | | | | | | | | | | | | | | | | | | | The hystart code was written with assumption that HZ=1000. Replace the use of jiffies with bictcp_clock as a millisecond real time clock. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Reported-by: Lucas Nussbaum <lucas.nussbaum@loria.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
| * tcp_cubic: make ack train delta value a parameterstephen hemminger2011-03-141-1/+4
| | | | | | | | | | | | | | | | Make the spacing between ACK's that indicates a train a tuneable value like other hystart values. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * tcp_cubic: fix comparison of jiffiesstephen hemminger2011-03-141-2/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Jiffies wraps around therefore the correct way to compare is to use cast to signed value. Note: cubic is not using full jiffies value on 64 bit arch because using full unsigned long makes struct bictcp grow too large for the available ca_priv area. Includes correction from Sangtae Ha to improve ack train detection. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | tcp: mark tcp_congestion_ops read_mostlyStephen Hemminger2011-03-101-1/+1
|/ | | | | Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* tcp: add helper for AI algorithmIlpo Järvinen2009-03-021-10/+1
| | | | | | | | | | | | | | | | | | | | It seems that implementation in yeah was inconsistent to what other did as it would increase cwnd one ack earlier than the others do. Size benefits: bictcp_cong_avoid | -36 tcp_cong_avoid_ai | +52 bictcp_cong_avoid | -34 tcp_scalable_cong_avoid | -36 tcp_veno_cong_avoid | -12 tcp_yeah_cong_avoid | -38 = -104 bytes total Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>
* [TCP] CUBIC v2.3Sangtae Ha2008-11-021-11/+109
| | | | | Signed-off-by: Sangtae Ha <sha2@ncsu.edu> Signed-off-by: David S. Miller <davem@davemloft.net>
* rename div64_64 to div64_u64Roman Zippel2008-05-011-2/+2
| | | | | | | | | | | | | | | | | | | | | Rename div64_64 to div64_u64 to make it consistent with the other divide functions, so it clearly includes the type of the divide. Move its definition to math64.h as currently no architecture overrides the generic implementation. They can still override it of course, but the duplicated declarations are avoided. Signed-off-by: Roman Zippel <zippel@linux-m68k.org> Cc: Avi Kivity <avi@qumranet.com> Cc: Russell King <rmk@arm.linux.org.uk> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: David Howells <dhowells@redhat.com> Cc: Jeff Dike <jdike@addtoit.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: "David S. Miller" <davem@davemloft.net> Cc: Patrick McHardy <kaber@trash.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* [TCP]: TCP cubic v2.2Sangtae Ha2008-03-041-27/+8
| | | | | | | | | | | | | | | | | | | We have updated CUBIC to fix some issues with slow increase in large BDP networks. We also improved its convergence speed. The fix is in fact very simple -- the window increase limit of smax during the window probing phase (i.e., convex growth phase) is removed. We found that this does not affect TCP friendliness, but only improves its scalability. We have run some tests in our lab and also over the Internet path from NCSU to Japan. These results can be seen from the following page: http://netsrv.csc.ncsu.edu/wiki/index.php/Intra_protocol_fairness_testing_with_linux-2.6.23.9 http://netsrv.csc.ncsu.edu/wiki/index.php/RTT_fairness_testing_with_linux-2.6.23.9 http://netsrv.csc.ncsu.edu/wiki/index.php/TCP_friendliness_testing_with_linux-2.6.23.9 Signed-off-by: Sangtae Ha <sha2@ncsu.edu> Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* [TCP]: Cong.ctrl modules: remove unused good_ack from cong_avoidIlpo Järvinen2008-01-281-2/+1
| | | | | | Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
* [TCP]: Remove num_acked>0 checks from cong.ctrl mods pkts_ackedIlpo Järvinen2007-10-101-1/+1
| | | | | | | | | | There is no need for such check in pkts_acked because the callback is not invoked unless at least one segment got fully ACKed (i.e., the snd_una moved past skb's end_seq) by the cumulative ACK's snd_una advancement. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>
* [TCP]: cubic - eliminate use of receive time stampStephen Hemminger2007-07-311-28/+18
| | | | | | | | | | | | | | Remove use of received timestamp option value from RTT calculation in Cubic. A hostile receiver may be returning a larger timestamp option than the original value. This would cause the sender to believe the malevolent receiver had a larger RTT and because Cubic tries to provide some RTT friendliness, the sender would then favor the liar. Instead, use the jiffie resolutionRTT value already computed and passed back after ack. Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* [TCP]: congestion control API pass RTT in microsecondsStephen Hemminger2007-07-311-1/+1
| | | | | | | | | | | | | | | | | This patch changes the API for the callback that is done after an ACK is received. It solves a couple of issues: * Some congestion controls want higher resolution value of RTT (controlled by TCP_CONG_RTT_SAMPLE flag). These don't really want a ktime, but all compute a RTT in microseconds. * Other congestion control could use RTT at jiffies resolution. To keep API consistent the units should be the same for both cases, just the resolution should change. Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* [TCP]: remove unused argument to cong_avoid opStephen Hemminger2007-07-181-1/+1
| | | | | | | | | None of the existing TCP congestion controls use the rtt value pased in the ca_ops->cong_avoid interface. Which is lucky because seq_rtt could have been -1 when handling a duplicate ack. Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* [TCP]: Set initial_ssthresh default to zero in Cubic and BIC.David S. Miller2007-06-131-1/+1
| | | | | | | | | | Because of the current default of 100, Cubic and BIC perform very poorly compared to standard Reno. In the worst case, this change makes Cubic and BIC as aggressive as Reno. So this change should be very safe. Signed-off-by: David S. Miller <davem@davemloft.net>
* [TCP]: Congestion control API update.Stephen Hemminger2007-04-251-1/+1
| | | | | | | | | | | Do some simple changes to make congestion control API faster/cleaner. * use ktime_t rather than timeval * merge rtt sampling into existing ack callback this means one indirect call versus two per ack. * use flags bits to store options/settings Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* [TCP]: cubic update for net-2.6.22Stephen Hemminger2007-04-251-3/+5
| | | | | | | | | | | | | | | | | | The following update received from Injong updates TCP cubic to the latest version. I am running more complete tests and will have results after 4/1. According to Injong: the new version improves on its scalability, fairness and stability. So in all properties, we confirmed it shows better performance. NCSU results (for 2.6.18 and 2.6.20) available: http://netsrv.csc.ncsu.edu/wiki/index.php/TCP_Testing This version is described in a new Internet draft for CUBIC. http://www.ietf.org/internet-drafts/draft-rhee-tcp-cubic-00.txt Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* [TCP]: cubic optimizationStephen Hemminger2007-04-251-11/+39
| | | | | | | Use willy's work in optimizing cube root by having table for small values. Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* [TCP] tcp_cubic: faster cube rootStephen Hemminger2007-04-251-11/+5
| | | | | | | | | | | | The Newton-Raphson method is quadratically convergent so only a small fixed number of steps are necessary. Therefore it is faster to unroll the loop. Since div64_64 is no longer inline it won't cause code explosion. Also fixes a bug that can occur if x^2 was bigger than 32 bits. Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* [NET]: div64_64 consolidate (rev3)Stephen Hemminger2007-04-251-23/+0
| | | | | | | Here is the current version of the 64 bit divide common code. Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* [TCP]: Use read mostly for CUBIC parameters.Stephen Hemminger2007-02-121-10/+10
| | | | | | | | These module parameters should be in the read mostly area to avoid cache pollution. Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* [NET] IPV4: Fix whitespace errors.YOSHIFUJI Hideaki2007-02-101-25/+25
| | | | | Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* [TCP] cubic: scaling errorStephen Hemminger2006-10-251-3/+3
| | | | | | | | | | | | | | | | | | | | Doug Leith observed a discrepancy between the version of CUBIC described in the papers and the version in 2.6.18. A math error related to scaling causes Cubic to grow too slowly. Patch is from "Sangtae Ha" <sha2@ncsu.edu>. I validated that it does fix the problems. See the following to show behavior over 500ms 100 Mbit link. Sender (2.6.19-rc3) --- Bridge (2.6.18-rt7) ------- Receiver (2.6.19-rc3) 1G [netem] 100M http://developer.osdl.org/shemminger/tcp/2.6.19-rc3/cubic-orig.png http://developer.osdl.org/shemminger/tcp/2.6.19-rc3/cubic-fix.png Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* [TCP] Congestion control (modulo lp, bic): use BUILD_BUG_ONAlexey Dobriyan2006-09-221-1/+1
| | | | | Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* Remove obsolete #include <linux/config.h>Jörn Engel2006-06-301-1/+0
| | | | | Signed-off-by: Jörn Engel <joern@wohnheim.fh-wedel.de> Signed-off-by: Adrian Bunk <bunk@stusta.de>
* [TCP]: Minimum congestion window consolidation.Stephen Hemminger2006-06-171-6/+0
| | | | | | | | | | | | | | Many of the TCP congestion methods all just use ssthresh as the minimum congestion window on decrease. Rather than duplicating the code, just have that be the default if that handle in the ops structure is not set. Minor behaviour change to TCP compound. It probably wants to use this (ssthresh) as lower bound, rather than ssthresh/2 because the latter causes undershoot on loss. Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* [TCP] cubic: use Newton-RaphsonStephen Hemminger2006-01-031-54/+39
| | | | | | | | | Replace cube root algorithim with a faster version using Newton-Raphson. Surprisingly, doing the scaled div64_64 is faster than a true 64 bit division on 64 bit CPU's. Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* [TCP] cubic: precompute constantsStephen Hemminger2006-01-031-76/+57
| | | | | | | | | | | | Revised version of patch to pre-compute values for TCP cubic. * d32,d64 replaced with descriptive names * cube_factor replaces srtt[scaled by count] / HZ * ((1 << (10+2*BICTCP_HZ)) / bic_scale) * beta_scale replaces 8*(BICTCP_BETA_SCALE+beta)/3/(BICTCP_BETA_SCALE-beta); Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* [TCP] BIC: CUBIC window growth (2.0)Stephen Hemminger2006-01-031-0/+445
Replace existing BIC version 1.1 with new version 2.0. The main change is to replace the window growth function with a cubic function as described in: http://www.csc.ncsu.edu/faculty/rhee/export/bitcp/cubic-paper.pdf Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>