| Commit message (Collapse) | Author | Age | Files | Lines |
... | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | | |
We no longer use md5_add() method from struct tcp_sock_af_ops
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | | |
The conversion is very similar to that made to ipv6's SIT code.
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | | |
In the future the ipv4/ipv6 route gateway will take on two types
of values:
1) INADDR_ANY/IN6ADDR_ANY, for local network routes, and in this case
the neighbour must be obtained using the destination address in
ipv4/ipv6 header as the lookup key.
2) Everything else, the actual nexthop route address.
So if the gateway is not inaddr-any we use it, otherwise we must use
the packet's destination address.
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | | |
It might be useful to get a counter of failed tcp_retransmit_skb()
calls.
Reported-by: Satoru Moriya <satoru.moriya@hds.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| |\ \ \ \ \ \ \ \ \ |
|
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | | |
We can remove the rt_gateway == 0 check but we shouldn't
remove the 'dst' initialization too.
Noticed by Eric Dumazet.
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | | |
We can remove the rt_gateway == 0 check but we shouldn't
remove the 'dst' initialization too.
Noticed by Eric Dumazet.
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | | |
It can never actually happen. rt_gateway is either the fully resolved
flow lookup key's destination address, or the non-zero FIB entry gateway
address.
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|\ \ \ \ \ \ \ \ \ \ \
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | | |
git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup
Pull cgroup changes from Tejun Heo:
"Out of the 8 commits, one fixes a long-standing locking issue around
tasklist walking and others are cleanups."
* 'for-3.4' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
cgroup: Walk task list under tasklist_lock in cgroup_enable_task_cg_list
cgroup: Remove wrong comment on cgroup_enable_task_cg_list()
cgroup: remove cgroup_subsys argument from callbacks
cgroup: remove extra calls to find_existing_css_set
cgroup: replace tasklist_lock with rcu_read_lock
cgroup: simplify double-check locking in cgroup_attach_proc
cgroup: move struct cgroup_pidlist out from the header file
cgroup: remove cgroup_attach_task_current_cg()
|
| |/ / / / / / / / / /
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | | |
The argument is not used at all, and it's not necessary, because
a specific callback handler of course knows which subsys it
belongs to.
Now only ->pupulate() takes this argument, because the handlers of
this callback always call cgroup_add_file()/cgroup_add_files().
So we reduce a few lines of code, though the shrinking of object size
is minimal.
16 files changed, 113 insertions(+), 162 deletions(-)
text data bss dec hex filename
5486240 656987 7039960 13183187 c928d3 vmlinux.o.orig
5486170 656987 7039960 13183117 c9288d vmlinux.o
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
|
|\ \ \ \ \ \ \ \ \ \ \
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | | |
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf events changes for v3.4 from Ingo Molnar:
- New "hardware based branch profiling" feature both on the kernel and
the tooling side, on CPUs that support it. (modern x86 Intel CPUs
with the 'LBR' hardware feature currently.)
This new feature is basically a sophisticated 'magnifying glass' for
branch execution - something that is pretty difficult to extract from
regular, function histogram centric profiles.
The simplest mode is activated via 'perf record -b', and the result
looks like this in perf report:
$ perf record -b any_call,u -e cycles:u branchy
$ perf report -b --sort=symbol
52.34% [.] main [.] f1
24.04% [.] f1 [.] f3
23.60% [.] f1 [.] f2
0.01% [k] _IO_new_file_xsputn [k] _IO_file_overflow
0.01% [k] _IO_vfprintf_internal [k] _IO_new_file_xsputn
0.01% [k] _IO_vfprintf_internal [k] strchrnul
0.01% [k] __printf [k] _IO_vfprintf_internal
0.01% [k] main [k] __printf
This output shows from/to branch columns and shows the highest
percentage (from,to) jump combinations - i.e. the most likely taken
branches in the system. "branches" can also include function calls
and any other synchronous and asynchronous transitions of the
instruction pointer that are not 'next instruction' - such as system
calls, traps, interrupts, etc.
This feature comes with (hopefully intuitive) flat ascii and TUI
support in perf report.
- Various 'perf annotate' visual improvements for us assembly junkies.
It will now recognize function calls in the TUI and by hitting enter
you can follow the call (recursively) and back, amongst other
improvements.
- Multiple threads/processes recording support in perf record, perf
stat, perf top - which is activated via a comma-list of PIDs:
perf top -p 21483,21485
perf stat -p 21483,21485 -ddd
perf record -p 21483,21485
- Support for per UID views, via the --uid paramter to perf top, perf
report, etc. For example 'perf top --uid mingo' will only show the
tasks that I am running, excluding other users, root, etc.
- Jump label restructurings and improvements - this includes the
factoring out of the (hopefully much clearer) include/linux/static_key.h
generic facility:
struct static_key key = STATIC_KEY_INIT_FALSE;
...
if (static_key_false(&key))
do unlikely code
else
do likely code
...
static_key_slow_inc();
...
static_key_slow_inc();
...
The static_key_false() branch will be generated into the code with as
little impact to the likely code path as possible. the
static_key_slow_*() APIs flip the branch via live kernel code patching.
This facility can now be used more widely within the kernel to
micro-optimize hot branches whose likelihood matches the static-key
usage and fast/slow cost patterns.
- SW function tracer improvements: perf support and filtering support.
- Various hardenings of the perf.data ABI, to make older perf.data's
smoother on newer tool versions, to make new features integrate more
smoothly, to support cross-endian recording/analyzing workflows
better, etc.
- Restructuring of the kprobes code, the splitting out of 'optprobes',
and a corner case bugfix.
- Allow the tracing of kernel console output (printk).
- Improvements/fixes to user-space RDPMC support, allowing user-space
self-profiling code to extract PMU counts without performing any
system calls, while playing nice with the kernel side.
- 'perf bench' improvements
- ... and lots of internal restructurings, cleanups and fixes that made
these features possible. And, as usual this list is incomplete as
there were also lots of other improvements
* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (120 commits)
perf report: Fix annotate double quit issue in branch view mode
perf report: Remove duplicate annotate choice in branch view mode
perf/x86: Prettify pmu config literals
perf report: Enable TUI in branch view mode
perf report: Auto-detect branch stack sampling mode
perf record: Add HEADER_BRANCH_STACK tag
perf record: Provide default branch stack sampling mode option
perf tools: Make perf able to read files from older ABIs
perf tools: Fix ABI compatibility bug in print_event_desc()
perf tools: Enable reading of perf.data files from different ABI rev
perf: Add ABI reference sizes
perf report: Add support for taken branch sampling
perf record: Add support for sampling taken branch
perf tools: Add code to support PERF_SAMPLE_BRANCH_STACK
x86/kprobes: Split out optprobe related code to kprobes-opt.c
x86/kprobes: Fix a bug which can modify kernel code permanently
x86/kprobes: Fix instruction recovery on optimized path
perf: Add callback to flush branch_stack on context switch
perf: Disable PERF_SAMPLE_BRANCH_* when not supported
perf/x86: Add LBR software filter support for Intel CPUs
...
|
| |\ \ \ \ \ \ \ \ \ \ \
| | | |_|_|_|_|_|_|_|/ /
| | |/| | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | | |
Merge reason: We are going to queue up a dependent patch.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
| |\ \ \ \ \ \ \ \ \ \ \
| | | |_|_|_|_|_|/ / / /
| | |/| | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | | |
Conflicts:
tools/perf/builtin-record.c
tools/perf/builtin-top.c
tools/perf/perf.h
tools/perf/util/top.h
Merge reason: resolve these cherry-picking conflicts.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | | |
static_key_slow_[inc|dec]()
So here's a boot tested patch on top of Jason's series that does
all the cleanups I talked about and turns jump labels into a
more intuitive to use facility. It should also address the
various misconceptions and confusions that surround jump labels.
Typical usage scenarios:
#include <linux/static_key.h>
struct static_key key = STATIC_KEY_INIT_TRUE;
if (static_key_false(&key))
do unlikely code
else
do likely code
Or:
if (static_key_true(&key))
do likely code
else
do unlikely code
The static key is modified via:
static_key_slow_inc(&key);
...
static_key_slow_dec(&key);
The 'slow' prefix makes it abundantly clear that this is an
expensive operation.
I've updated all in-kernel code to use this everywhere. Note
that I (intentionally) have not pushed through the rename
blindly through to the lowest levels: the actual jump-label
patching arch facility should be named like that, so we want to
decouple jump labels from the static-key facility a bit.
On non-jump-label enabled architectures static keys default to
likely()/unlikely() branches.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Acked-by: Jason Baron <jbaron@redhat.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Cc: a.p.zijlstra@chello.nl
Cc: mathieu.desnoyers@efficios.com
Cc: davem@davemloft.net
Cc: ddaney.cavm@gmail.com
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/20120222085809.GA26397@elte.hu
Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
|\ \ \ \ \ \ \ \ \ \ \ \
| |_|_|_|_|_|_|_|_|_|_|/
|/| | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | | |
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull RCU changes for v3.4 from Ingo Molnar. The major features of this
series are:
- making RCU more aggressive about entering dyntick-idle mode in order
to improve energy efficiency
- converting a few more call_rcu()s to kfree_rcu()s
- applying a number of rcutree fixes and cleanups to rcutiny
- removing CONFIG_SMP #ifdefs from treercu
- allowing RCU CPU stall times to be set via sysfs
- adding CPU-stall capability to rcutorture
- adding more RCU-abuse diagnostics
- updating documentation
- fixing yet more issues located by the still-ongoing top-to-bottom
inspection of RCU, this time with a special focus on the CPU-hotplug
code path.
* 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (48 commits)
rcu: Stop spurious warnings from synchronize_sched_expedited
rcu: Hold off RCU_FAST_NO_HZ after timer posted
rcu: Eliminate softirq-mediated RCU_FAST_NO_HZ idle-entry loop
rcu: Add RCU_NONIDLE() for idle-loop RCU read-side critical sections
rcu: Allow nesting of rcu_idle_enter() and rcu_idle_exit()
rcu: Remove redundant check for rcu_head misalignment
PTR_ERR should be called before its argument is cleared.
rcu: Convert WARN_ON_ONCE() in rcu_lock_acquire() to lockdep
rcu: Trace only after NULL-pointer check
rcu: Call out dangers of expedited RCU primitives
rcu: Rework detection of use of RCU by offline CPUs
lockdep: Add CPU-idle/offline warning to lockdep-RCU splat
rcu: No interrupt disabling for rcu_prepare_for_idle()
rcu: Move synchronize_sched_expedited() to rcutree.c
rcu: Check for illegal use of RCU from offlined CPUs
rcu: Update stall-warning documentation
rcu: Add CPU-stall capability to rcutorture
rcu: Make documentation give more realistic rcutorture duration
rcutorture: Permit holding off CPU-hotplug operations during boot
rcu: Print scheduling-clock information on RCU CPU stall-warning messages
...
|
| |\ \ \ \ \ \ \ \ \ \ \
| | |_|/ / / / / / / / /
| |/| | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | | |
git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu into core/rcu
The major features of this series are:
- making RCU more aggressive about entering dyntick-idle mode in order to
improve energy efficiency
- converting a few more call_rcu()s to kfree_rcu()s
- applying a number of rcutree fixes and cleanups to rcutiny
- removing CONFIG_SMP #ifdefs from treercu
- allowing RCU CPU stall times to be set via sysfs
- adding CPU-stall capability to rcutorture
- adding more RCU-abuse diagnostics
- updating documentation
- fixing yet more issues located by the still-ongoing top-to-bottom
inspection of RCU, this time with a special focus on the
CPU-hotplug code path.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | | |
The call_rcu() in do_ip_setsockopt() invokes opt_kfree_rcu(), which just
calls kfree(). So convert the call_rcu() to kfree_rcu(), which allows
opt_kfree_rcu() to be eliminated.
Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: David S. Miller <davem@davemloft.net>
Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Cc: James Morris <jmorris@namei.org>
Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
Cc: Patrick McHardy <kaber@trash.net>
Cc: netdev@vger.kernel.org
|
| | | |_|_|_|_|_|/ / /
| | |/| | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | | |
Because opt_kfree_rcu() just calls kfree(), all call_rcu() uses of it
may be converted to kfree_rcu(). This permits opt_kfree_rcu() to
be eliminated.
Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: David S. Miller <davem@davemloft.net>
Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Cc: James Morris <jmorris@namei.org>
Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
Cc: Patrick McHardy <kaber@trash.net>
Cc: netdev@vger.kernel.org
|
| |_|_|/ / / / / / /
|/| | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | | |
commit ea4fc0d619 (ipv4: Don't use rt->rt_{src,dst} in ip_queue_xmit())
added a serious regression on synflood handling.
Simon Kirby discovered a successful connection was delayed by 20 seconds
before being responsive.
In my tests, I discovered that xmit frames were lost, and needed ~4
retransmits and a socket dst rebuild before being really sent.
In case of syncookie initiated connection, we use a different path to
initialize the socket dst, and inet->cork.fl.u.ip4 is left cleared.
As ip_queue_xmit() now depends on inet flow being setup, fix this by
copying the temp flowi4 we use in cookie_v4_check().
Reported-by: Simon Kirby <sim@netnation.com>
Bisected-by: Simon Kirby <sim@netnation.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Tested-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | | |
As we invalidate the inetpeer tree along with the routing cache now,
we don't need a genid to reset the redirect handling when the routing
cache is flushed.
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | | |
We initialize the routing metrics with the values cached on the
inetpeer in rt_init_metrics(). So if we have the metrics cached on the
inetpeer, we ignore the user configured fib_metrics.
To fix this issue, we replace the old tree with a fresh initialized
inet_peer_base. The old tree is removed later with a delayed work queue.
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| |_|_|_|_|_|_|_|/
|/| | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | | |
This commit fixes tcp_shift_skb_data() so that it does not shift
SACKed data below snd_una.
This fixes an issue whose symptoms exactly match reports showing
tp->sacked_out going negative since 3.3.0-rc4 (see "WARNING: at
net/ipv4/tcp_input.c:3418" thread on netdev).
Since 2008 (832d11c5cd076abc0aa1eaf7be96c81d1a59ce41)
tcp_shift_skb_data() had been shifting SACKed ranges that were below
snd_una. It checked that the *end* of the skb it was about to shift
from was above snd_una, but did not check that the end of the actual
shifted range was above snd_una; this commit adds that check.
Shifting SACKed ranges below snd_una is problematic because for such
ranges tcp_sacktag_one() short-circuits: it does not declare anything
as SACKed and does not increase sacked_out.
Before the fixes in commits cc9a672ee522d4805495b98680f4a3db5d0a0af9
and daef52bab1fd26e24e8e9578f8fb33ba1d0cb412, shifting SACKed ranges
below snd_una happened to work because tcp_shifted_skb() was always
(incorrectly) passing in to tcp_sacktag_one() an skb whose end_seq
tcp_shift_skb_data() had already guaranteed was beyond snd_una. Hence
tcp_sacktag_one() never short-circuited and always increased
tp->sacked_out in this case.
After those two fixes, my testing has verified that shifting SACKed
ranges below snd_una could cause tp->sacked_out to go negative with
the following sequence of events:
(1) tcp_shift_skb_data() sees an skb whose end_seq is beyond snd_una,
then shifts a prefix of that skb that is below snd_una
(2) tcp_shifted_skb() increments the packet count of the
already-SACKed prev sk_buff
(3) tcp_sacktag_one() sees the end of the new SACKed range is below
snd_una, so it short-circuits and doesn't increase tp->sacked_out
(5) tcp_clean_rtx_queue() sees the SACKed skb has been ACKed,
decrements tp->sacked_out by this "inflated" pcount that was
missing a matching increase in tp->sacked_out, and hence
tp->sacked_out underflows to a u32 like 0xFFFFFFFF, which casted
to s32 is negative.
(6) this leads to the warnings seen in the recent "WARNING: at
net/ipv4/tcp_input.c:3418" thread on the netdev list; e.g.:
tcp_input.c:3418 WARN_ON((int)tp->sacked_out < 0);
More generally, I think this bug can be tickled in some cases where
two or more ACKs from the receiver are lost and then a DSACK arrives
that is immediately above an existing SACKed skb in the write queue.
This fix changes tcp_shift_skb_data() to abort this sequence at step
(1) in the scenario above by noticing that the bytes are below snd_una
and not shifting them.
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| |_|_|_|_|_|_|/
|/| | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | | |
In tcp_mark_head_lost() we should not attempt to fragment a SACKed skb
to mark the first portion as lost. This is for two primary reasons:
(1) tcp_shifted_skb() coalesces adjacent regions of SACKed skbs. When
doing this, it preserves the sum of their packet counts in order to
reflect the real-world dynamics on the wire. But given that skbs can
have remainders that do not align to MSS boundaries, this packet count
preservation means that for SACKed skbs there is not necessarily a
direct linear relationship between tcp_skb_pcount(skb) and
skb->len. Thus tcp_mark_head_lost()'s previous attempts to fragment
off and mark as lost a prefix of length (packets - oldcnt)*mss from
SACKed skbs were leading to occasional failures of the WARN_ON(len >
skb->len) in tcp_fragment() (which used to be a BUG_ON(); see the
recent "crash in tcp_fragment" thread on netdev).
(2) there is no real point in fragmenting off part of a SACKed skb and
calling tcp_skb_mark_lost() on it, since tcp_skb_mark_lost() is a NOP
for SACKed skbs.
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Acked-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Nandita Dukkipati <nanditad@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|/ / / / / / /
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | | |
When tcp_shifted_skb() shifts bytes from the skb that is currently
pointed to by 'highest_sack' then the increment of
TCP_SKB_CB(skb)->seq implicitly advances tcp_highest_sack_seq(). This
implicit advancement, combined with the recent fix to pass the correct
SACKed range into tcp_sacktag_one(), caused tcp_sacktag_one() to think
that the newly SACKed range was before the tcp_highest_sack_seq(),
leading to a call to tcp_update_reordering() with a degree of
reordering matching the size of the newly SACKed range (typically just
1 packet, which is a NOP, but potentially larger).
This commit fixes this by simply calling tcp_sacktag_one() before the
TCP_SKB_CB(skb)->seq advancement that can advance our notion of the
highest SACKed sequence.
Correspondingly, we can simplify the code a little now that
tcp_shifted_skb() should update the lost_cnt_hint in all cases where
skb == tp->lost_skb_hint.
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|\ \ \ \ \ \ \
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | | |
1) ICMP sockets leave err uninitialized but we try to return it for the
unsupported MSG_OOB case, reported by Dave Jones.
2) Add new Zaurus device ID entries, from Dave Jones.
3) Pointer calculation in hso driver memset is wrong, from Dan
Carpenter.
4) ks8851_probe() checks unsigned value as negative, fix also from Dan
Carpenter.
5) Fix crashes in atl1c driver due to TX queue handling, from Eric
Dumazet. I anticipate some TX side locking fixes coming in the near
future for this driver as well.
6) The inline directive fix in Bluetooth which was breaking the build
only with very new versions of GCC, from Johan Hedberg.
7) Fix crashes in the ATP CLIP code due to ARP cleanups this merge
window, reported by Meelis Roos and fixed by Eric Dumazet.
8) JME driver doesn't flush RX FIFO correctly, from Guo-Fu Tseng.
9) Some ip6_route_output() callers test the return value for NULL, but
this never happens as the convention is to return a dst entry with
dst->error set. Fixes from RonQing Li.
10) Logitech Harmony 900 should be handled by zaurus driver not
cdc_ether, update white lists and black lists accordingly. From
Scott Talbert.
11) Receiving from certain kinds of devices there won't be a MAC header,
so there is no MAC header to fixup in the IPSEC code, and if we try
to do it we'll crash. Fix from Eric Dumazet.
12) Port type array indexing off-by-one in mlx4 driver, fix from Yevgeny
Petrilin.
13) Fix regression in link-down handling in davinci_emac which causes
all RX descriptors to be freed up and therefore RX to wedge
completely, from Christian Riesch.
14) It took two attempts, but ctnetlink soft lockups seem to be
cured now, from Pablo Neira Ayuso.
15) Endianness bug fix in ENIC driver, from Santosh Nayak.
16) The long ago conversion of the PPP fragmentation code over to
abstracted SKB list handling wasn't perfect, once we get an
out of sequence SKB we don't flush the rest of them like we
should. From Ben McKeegan.
17) Fix regression of ->ip_summed initialization in sfc driver.
From Ben Hutchings.
18) Bluetooth timeout mistakenly using msecs instead of jiffies,
from Andrzej Kaczmarek.
19) Using _sync variant of work cancellation results in deadlocks,
use the non _sync variants instead. From Andre Guedes.
20) Bluetooth rfcomm code had reference counting problems leading
to crashes, fix from Octavian Purdila.
21) The conversion of netem over to classful qdisc handling added
two bugs to netem_dequeue(), fixes from Eric Dumazet.
22) Missing pci_iounmap() in ATM Solos driver. Fix from Julia Lawall.
23) b44_pci_exit() should not have __exit tag since it's invoked from
non-__exit code. From Nikola Pajkovsky.
24) The conversion of the neighbour hash tables over to RCU added a
race, fixed here by adding the necessary reread of tbl->nht, fix
from Michel Machado.
25) When we added VF (virtual function) attributes for network device
dumps, this potentially bloats up the size of the dump of one
network device such that the dump size is too large for the buffer
allocated by properly written netlink applications.
In particular, if you add 255 VFs to a network device, parts of
GLIBC stop working.
To fix this, we add an attribute that is used to turn on these
extended portions of the network device dump. Sophisticaed
applications like 'ip' that want to see this stuff will be changed
to set the attribute, whereas things like GLIBC that don't care
about VFs simply will not, and therefore won't be busted by the
mere presence of VFs on a network device.
Thanks to the tireless work of Greg Rose on this fix.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (53 commits)
sfc: Fix assignment of ip_summed for pre-allocated skbs
ppp: fix 'ppp_mp_reconstruct bad seq' errors
enic: Fix endianness bug.
gre: fix spelling in comments
netfilter: ctnetlink: fix soft lockup when netlink adds new entries (v2)
Revert "netfilter: ctnetlink: fix soft lockup when netlink adds new entries"
davinci_emac: Do not free all rx dma descriptors during init
mlx4_core: Fixing array indexes when setting port types
phy: IC+101G and PHY_HAS_INTERRUPT flag
netdev/phy/icplus: Correct broken phy_init code
ipsec: be careful of non existing mac headers
Move Logitech Harmony 900 from cdc_ether to zaurus
hso: memsetting wrong data in hso_get_count()
netfilter: ip6_route_output() never returns NULL.
ethernet/broadcom: ip6_route_output() never returns NULL.
ipv6: ip6_route_output() never returns NULL.
jme: Fix FIFO flush issue
atm: clip: remove clip_tbl
ipv4: ping: Fix recvmsg MSG_OOB error handling.
rtnetlink: Fix problem with buffer allocation
...
|
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | | |
The original spelling and bad word choice makes these comments hard to read.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | | |
Niccolo Belli reported ipsec crashes in case we handle a frame without
mac header (atm in his case)
Before copying mac header, better make sure it is present.
Bugzilla reference: https://bugzilla.kernel.org/show_bug.cgi?id=42809
Reported-by: Niccolò Belli <darkbasic@linuxsystems.it>
Tested-by: Niccolò Belli <darkbasic@linuxsystems.it>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| |/ / / / / /
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | | |
Don't return an uninitialized variable as the error, return
-EOPNOTSUPP instead.
Reported-by: Dave Jones <davej@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|\ \ \ \ \ \ \
| |/ / / / / /
|/| | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | | |
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Assorted fixes, sat in -next for a week or so...
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
ocfs2: deal with wraparounds of i_nlink in ocfs2_rename()
vfs: fix compat_sys_stat() handling of overflows in st_nlink
quota: Fix deadlock with suspend and quotas
vfs: Provide function to get superblock and wait for it to thaw
vfs: fix panic in __d_lookup() with high dentry hashtable counts
autofs4 - fix lockdep splat in autofs
vfs: fix d_inode_lookup() dentry ref leak
|
| | |_|_|_|/
| |/| | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
When the number of dentry cache hash table entries gets too high
(2147483648 entries), as happens by default on a 16TB system, use of a
signed integer in the dcache_init() initialization loop prevents the
dentry_hashtable from getting initialized, causing a panic in
__d_lookup(). Fix this in dcache_init() and similar areas.
Signed-off-by: Dimitri Sivanich <sivanich@sgi.com>
Acked-by: David S. Miller <davem@davemloft.net>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
This commit ensures that lost_cnt_hint is correctly updated in
tcp_shifted_skb() for FACK TCP senders. The lost_cnt_hint adjustment
in tcp_sacktag_one() only applies to non-FACK senders, so FACK senders
need their own adjustment.
This applies the spirit of 1e5289e121372a3494402b1b131b41bfe1cf9b7f -
except now that the sequence range passed into tcp_sacktag_one() is
correct we need only have a special case adjustment for FACK.
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
Fix the newly-SACKed range to be the range of newly-shifted bytes.
Previously - since 832d11c5cd076abc0aa1eaf7be96c81d1a59ce41 -
tcp_shifted_skb() incorrectly called tcp_sacktag_one() with the start
and end sequence numbers of the skb it passes in set to the range just
beyond the range that is newly-SACKed.
This commit also removes a special-case adjustment to lost_cnt_hint in
tcp_shifted_skb() since the pre-existing adjustment of lost_cnt_hint
in tcp_sacktag_one() now properly handles this things now that the
correct start sequence number is passed in.
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|/ / / / /
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
This commit allows callers of tcp_sacktag_one() to pass in sequence
ranges that do not align with skb boundaries, as tcp_shifted_skb()
needs to do in an upcoming fix in this patch series.
In fact, now tcp_sacktag_one() does not need to depend on an input skb
at all, which makes its semantics and dependencies more clear.
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
Commit 653241 (net: RFC3069, private VLAN proxy arp support) changed
the behavior of arp proxy to send arp replies back out on the interface
the request came in even if the private VLAN feature is disabled.
Previously we checked rt->dst.dev != skb->dev for in scenarios, when
proxy arp is enabled on for the netdevice and also when individual proxy
neighbour entries have been added.
This patch adds the check back for the pneigh_lookup() scenario.
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Acked-by: Jesper Dangaard Brouer <hawk@comx.dk>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
This patch fix a bug which introduced by commit ac8a4810 (ipv4: Save
nexthop address of LSRR/SSRR option to IPCB.).In that patch, we saved
the nexthop of SRR in ip_option->nexthop and update iph->daddr until
we get to ip_forward_options(), but we need to update it before
ip_rt_get_source(), otherwise we may get a wrong src.
Signed-off-by: Li Wei <lw@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
Tested-by: Anisse Astier <anisse@astier.eu>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| |_|_|/
|/| | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Binding RST packet outgoing interface to incoming interface
for tcp v4 when there is no socket associate with it.
when sk is not NULL, using sk->sk_bound_dev_if instead.
(suggested by Eric Dumazet).
This has few benefits:
1. tcp_v6_send_reset already did that.
2. This helps tcp connect with SO_BINDTODEVICE set. When
connection is lost, we still able to sending out RST using
same interface.
3. we are sending reply, it is most likely to be succeed
if iif is used
Signed-off-by: Shawn Lu <shawn.lu@ericsson.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Commit 4acb4190 tries to fix the using uninitialized value
introduced by commit 3dc43e3, but it would make the
per-socket memory limits too small.
This patch fixes this and also remove the redundant codes
introduced in 4acb4190.
Signed-off-by: Jason Wang <jasowang@redhat.com>
Acked-by: Glauber Costa <glommer@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|/ / /
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Some of our machines were reporting:
TCP: too many of orphaned sockets
even when the number of orphaned sockets was well below the
limit.
We print a different message depending on whether we're out
of TCP memory or there are too many orphaned sockets.
Also move the check out of line and cleanup the messages
that were printed.
Signed-off-by: Arun Sharma <asharma@fb.com>
Suggested-by: Mohan Srinivasan <mohan@fb.com>
Cc: netdev@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: David Miller <davem@davemloft.net>
Cc: Glauber Costa <glommer@parallels.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
This commit fixes tcp_trim_head() to recalculate the number of
segments in the skb with the skb's existing MSS, so trimming the head
causes the skb segment count to be monotonically non-increasing - it
should stay the same or go down, but not increase.
Previously tcp_trim_head() used the current MSS of the connection. But
if there was a decrease in MSS between original transmission and ACK
(e.g. due to PMTUD), this could cause tcp_trim_head() to
counter-intuitively increase the segment count when trimming bytes off
the head of an skb. This violated assumptions in tcp_tso_acked() that
tcp_trim_head() only decreases the packet count, so that packets_acked
in tcp_tso_acked() could underflow, leading tcp_clean_rtx_queue() to
pass u32 pkts_acked values as large as 0xffffffff to
ca_ops->pkts_acked().
As an aside, if tcp_trim_head() had really wanted the skb to reflect
the current MSS, it should have called tcp_set_skb_tso_segs()
unconditionally, since a decrease in MSS would mean that a
single-packet skb should now be sliced into multiple segments.
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Nandita Dukkipati <nanditad@google.com>
Acked-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
sysctl_tcp_mem() initialization was moved to sysctl_tcp_ipv4.c
in commit 3dc43e3e4d0b52197d3205214fe8f162f9e0c334, since it
became a per-ns value.
That code, however, will never run when CONFIG_SYSCTL is
disabled, leading to bogus values on those fields - causing hung
TCP sockets.
This patch fixes it by keeping an initialization code in
tcp_init(). It will be overwritten by the first net namespace
init if CONFIG_SYSCTL is compiled in, and do the right thing if
it is compiled out.
It is also named properly as tcp_init_mem(), to properly signal
its non-sysctl side effect on TCP limits.
Reported-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Glauber Costa <glommer@parallels.com>
Cc: David S. Miller <davem@davemloft.net>
Link: http://lkml.kernel.org/r/4F22D05A.8030604@parallels.com
[ renamed the function, tidied up the changelog a bit ]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Tunnel devices set NETIF_F_LLTX to bypass HARD_TX_LOCK. Sit and
ipip set this unconditionally in ops->setup, but gre enables it
conditionally after parameter passing in ops->newlink. This is
not called during tunnel setup as below, however, so GRE tunnels are
still taking the lock.
modprobe ip_gre
ip tunnel add test0 mode gre remote 10.5.1.1 dev lo
ip link set test0 up
ip addr add 10.6.0.1 dev test0
# cat /sys/class/net/test0/features
# $DIR/test_tunnel_xmit 10 10.5.2.1
ip route add 10.5.2.0/24 dev test0
ip tunnel del test0
The newlink callback is only called in rtnl_netlink, and only if
the device is new, as it calls register_netdevice internally. Gre
tunnels are created at 'ip tunnel add' with ioctl SIOCADDTUNNEL,
which calls ipgre_tunnel_locate, which calls register_netdev.
rtnl_newlink is called at 'ip link set', but skips ops->newlink
and the device is up with locking still enabled. The equivalent
ipip tunnel works fine, btw (just substitute 'method gre' for
'method ipip').
On kernels before /sys/class/net/*/features was removed [1],
the first commented out line returns 0x6000 with method gre,
which indicates that NETIF_F_LLTX (0x1000) is not set. With ipip,
it reports 0x7000. This test cannot be used on recent kernels where
the sysfs file is removed (and ETHTOOL_GFEATURES does not currently
work for tunnel devices, because they lack dev->ethtool_ops).
The second commented out line calls a simple transmission test [2]
that sends on 24 cores at maximum rate. Results of a single run:
ipip: 19,372,306
gre before patch: 4,839,753
gre after patch: 19,133,873
This patch replicates the condition check in ipgre_newlink to
ipgre_tunnel_locate. It works for me, both with oseq on and off.
This is the first time I looked at rtnetlink and iproute2 code,
though, so someone more knowledgeable should probably check the
patch. Thanks.
The tail of both functions is now identical, by the way. To avoid
code duplication, I'll be happy to rework this and merge the two.
[1] http://patchwork.ozlabs.org/patch/104610/
[2] http://kernel.googlecode.com/files/xmit_udp_parallel.c
Signed-off-by: Willem de Bruijn <willemb@google.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Port autoselection finds a port and then drop the lock,
then right after that, gets the hash bucket again and lock it.
Fix it to go direct.
Signed-off-by: Flavio Leitner <fbl@redhat.com>
Signed-off-by: Marcelo Ricardo Leitner <mleitner@redhat.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| |/
|/|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
The current code checks for conflicts when the application
requests a specific port. If there is no conflict, then
the request is granted.
On the other hand, the port autoselection done by the kernel
fails when all ports are bound even when there is a port
with no conflict available.
The fix changes port autoselection to check if there is a
conflict and use it if not.
Signed-off-by: Flavio Leitner <fbl@redhat.com>
Signed-off-by: Marcelo Ricardo Leitner <mleitner@redhat.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| |
| |
| |
| |
| |
| |
| |
| |
| | |
md5 key is added in socket through remote address.
remote address should be used in finding md5 key when
sending out reset packet.
Signed-off-by: shawnlu <shawn.lu@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Correctly implement a loss detection heuristic: New sequences (above
high_seq) sent during the fast recovery are deemed lost when higher
sequences are SACKed.
Current code does not catch these losses, because tcp_mark_head_lost()
does not check packets beyond high_seq. The fix is straight-forward by
checking packets until the highest sacked packet. In addition, all the
FLAG_DATA_LOST logic are in-effective and redundant and can be removed.
Update the loss heuristic comments. The algorithm above is documented
as heuristic B, but it is redundant too because heuristic A already
covers B.
Note that this change only marks some forward-retransmitted packets LOST.
It does NOT forbid TCP performing further CWR on new losses. A potential
follow-up patch under preparation is to perform another CWR on "new"
losses such as
1) sequence above high_seq is lost (by resetting high_seq to snd_nxt)
2) retransmission is lost.
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
This patch fixes CUBIC so that cwnd reductions made during RTOs can be
undone (just as they already can be undone when using the default/Reno
behavior).
When undoing cwnd reductions, BIC-derived congestion control modules
were restoring the cwnd from last_max_cwnd. There were two problems
with using last_max_cwnd to restore a cwnd during undo:
(a) last_max_cwnd was set to 0 on state transitions into TCP_CA_Loss
(by calling the module's reset() functions), so cwnd reductions from
RTOs could not be undone.
(b) when fast_covergence is enabled (which it is by default)
last_max_cwnd does not actually hold the value of snd_cwnd before the
loss; instead, it holds a scaled-down version of snd_cwnd.
This patch makes the following changes:
(1) upon undo, revert snd_cwnd to ca->loss_cwnd, which is already, as
the existing comment notes, the "congestion window at last loss"
(2) stop forgetting ca->loss_cwnd on TCP_CA_Loss events
(3) use ca->last_max_cwnd to check if we're in slow start
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Stephen Hemminger <shemminger@vyatta.com>
Acked-by: Sangtae Ha <sangtae.ha@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|/
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch fixes BIC so that cwnd reductions made during RTOs can be
undone (just as they already can be undone when using the default/Reno
behavior).
When undoing cwnd reductions, BIC-derived congestion control modules
were restoring the cwnd from last_max_cwnd. There were two problems
with using last_max_cwnd to restore a cwnd during undo:
(a) last_max_cwnd was set to 0 on state transitions into TCP_CA_Loss
(by calling the module's reset() functions), so cwnd reductions from
RTOs could not be undone.
(b) when fast_covergence is enabled (which it is by default)
last_max_cwnd does not actually hold the value of snd_cwnd before the
loss; instead, it holds a scaled-down version of snd_cwnd.
This patch makes the following changes:
(1) upon undo, revert snd_cwnd to ca->loss_cwnd, which is already, as
the existing comment notes, the "congestion window at last loss"
(2) stop forgetting ca->loss_cwnd on TCP_CA_Loss events
(3) use ca->last_max_cwnd to check if we're in slow start
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|\
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (47 commits)
tg3: Fix single-vector MSI-X code
openvswitch: Fix multipart datapath dumps.
ipv6: fix per device IP snmp counters
inetpeer: initialize ->redirect_genid in inet_getpeer()
net: fix NULL-deref in WARN() in skb_gso_segment()
net: WARN if skb_checksum_help() is called on skb requiring segmentation
caif: Remove bad WARN_ON in caif_dev
caif: Fix typo in Vendor/Product-ID for CAIF modems
bnx2x: Disable AN KR work-around for BCM57810
bnx2x: Remove AutoGrEEEn for BCM84833
bnx2x: Remove 100Mb force speed for BCM84833
bnx2x: Fix PFC setting on BCM57840
bnx2x: Fix Super-Isolate mode for BCM84833
net: fix some sparse errors
net: kill duplicate included header
net: sh-eth: Fix build error by the value which is not defined
net: Use device model to get driver name in skb_gso_segment()
bridge: BH already disabled in br_fdb_cleanup()
net: move sock_update_memcg outside of CONFIG_INET
mwl8k: Fixing Sparse ENDIAN CHECK warning
...
|
| |
| |
| |
| |
| |
| |
| |
| | |
kmemcheck complains that ->redirect_genid doesn't get initialized.
Presumably it should be set to zero.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|