path: root/net/ipv4
Commit message  (Author, Age, Files, Lines)
...
| * | | | | | | | | tcp: md5: remove obsolete md5_add() method  (Eric Dumazet, 2012-01-31, 1 file, -8/+0)

    We no longer use md5_add() method from struct tcp_sock_af_ops

    Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | | | | | | ipv4: ip_gre: Convert to dst_neigh_lookup()  (David S. Miller, 2012-01-27, 1 file, -3/+10)

    The conversion is very similar to that made to ipv6's SIT code.

    Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | | | | | | ipv4/ipv6: Prepare for new route gateway semantics.  (David S. Miller, 2012-01-26, 1 file, -0/+5)

    In the future the ipv4/ipv6 route gateway will take on two types
    of values:

    1) INADDR_ANY/IN6ADDR_ANY, for local network routes, and in this case
       the neighbour must be obtained using the destination address in
       ipv4/ipv6 header as the lookup key.

    2) Everything else, the actual nexthop route address.

    So if the gateway is not inaddr-any we use it, otherwise we must use
    the packet's destination address.

    Signed-off-by: David S. Miller <davem@davemloft.net>
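The gateway rule described in this message can be sketched in userspace C. This is only an illustration for IPv4: `sketch_neigh_key()` and `SKETCH_INADDR_ANY` are hypothetical names, not kernel symbols, and addresses are plain 32-bit integers rather than kernel types.

```c
#include <stdint.h>

/* IPv4 "any" address (0.0.0.0); stands in for INADDR_ANY in this sketch. */
#define SKETCH_INADDR_ANY ((uint32_t)0)

/*
 * Hypothetical helper (not a kernel function): pick the neighbour lookup
 * key from the route gateway and the packet's destination address.
 */
static uint32_t sketch_neigh_key(uint32_t gateway, uint32_t daddr)
{
	/* Case 2: a real nexthop gateway is used directly. */
	if (gateway != SKETCH_INADDR_ANY)
		return gateway;

	/* Case 1: local network route, so fall back to the destination. */
	return daddr;
}
```

With a gateway of 0 the helper returns the destination address; with any other gateway it returns the gateway unchanged.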
| * | | | | | | | | tcp: add LINUX_MIB_TCPRETRANSFAIL counter  (Eric Dumazet, 2012-01-26, 2 files, -1/+4)

    It might be useful to get a counter of failed tcp_retransmit_skb()
    calls.

    Reported-by: Satoru Moriya <satoru.moriya@hds.com>
    Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | | | | | | Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net  (David S. Miller, 2012-01-24, 5 files, -36/+29)
| |\ \ \ \ \ \ \ \ \
| * | | | | | | | | | ip_gre: Fix bug added to ipgre_tunnel_xmit().  (David S. Miller, 2012-01-24, 1 file, -1/+3)

    We can remove the rt_gateway == 0 check but we shouldn't
    remove the 'dst' initialization too.

    Noticed by Eric Dumazet.

    Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | | | | | | | ipip: Fix bug added to ipip_tunnel_xmit().  (David S. Miller, 2012-01-24, 1 file, -0/+1)

    We can remove the rt_gateway == 0 check but we shouldn't
    remove the 'dst' initialization too.

    Noticed by Eric Dumazet.

    Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | | | | | | | ipv4: Remove bogus checks of rt_gateway being zero.  (David S. Miller, 2012-01-24, 2 files, -7/+3)

    It can never actually happen. rt_gateway is either the fully resolved
    flow lookup key's destination address, or the non-zero FIB entry gateway
    address.

    Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | | | | | | | | Merge branch 'for-3.4' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup  (Linus Torvalds, 2012-03-20, 1 file, -1/+1)
|\ \ \ \ \ \ \ \ \ \ \

    Pull cgroup changes from Tejun Heo:
     "Out of the 8 commits, one fixes a long-standing locking issue around
      tasklist walking and others are cleanups."

    * 'for-3.4' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
      cgroup: Walk task list under tasklist_lock in cgroup_enable_task_cg_list
      cgroup: Remove wrong comment on cgroup_enable_task_cg_list()
      cgroup: remove cgroup_subsys argument from callbacks
      cgroup: remove extra calls to find_existing_css_set
      cgroup: replace tasklist_lock with rcu_read_lock
      cgroup: simplify double-check locking in cgroup_attach_proc
      cgroup: move struct cgroup_pidlist out from the header file
      cgroup: remove cgroup_attach_task_current_cg()
| * | | | | | | | | | | cgroup: remove cgroup_subsys argument from callbacks  (Li Zefan, 2012-02-02, 1 file, -1/+1)
| |/ / / / / / / / / /

    The argument is not used at all, and it's not necessary, because
    a specific callback handler of course knows which subsys it
    belongs to.

    Now only ->populate() takes this argument, because the handlers of
    this callback always call cgroup_add_file()/cgroup_add_files().

    So we reduce a few lines of code, though the shrinking of object size
    is minimal.

     16 files changed, 113 insertions(+), 162 deletions(-)

       text    data     bss      dec     hex filename
    5486240  656987 7039960 13183187  c928d3 vmlinux.o.orig
    5486170  656987 7039960 13183117  c9288d vmlinux.o

    Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
    Signed-off-by: Tejun Heo <tj@kernel.org>
* | | | | | | | | | | Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip  (Linus Torvalds, 2012-03-20, 1 file, -3/+3)
|\ \ \ \ \ \ \ \ \ \ \

    Pull perf events changes for v3.4 from Ingo Molnar:

     - New "hardware based branch profiling" feature both on the kernel and
       the tooling side, on CPUs that support it. (modern x86 Intel CPUs
       with the 'LBR' hardware feature currently.)

       This new feature is basically a sophisticated 'magnifying glass' for
       branch execution - something that is pretty difficult to extract from
       regular, function histogram centric profiles.

       The simplest mode is activated via 'perf record -b', and the result
       looks like this in perf report:

        $ perf record -b any_call,u -e cycles:u branchy

        $ perf report -b --sort=symbol
            52.34%  [.] main                   [.] f1
            24.04%  [.] f1                     [.] f3
            23.60%  [.] f1                     [.] f2
             0.01%  [k] _IO_new_file_xsputn    [k] _IO_file_overflow
             0.01%  [k] _IO_vfprintf_internal  [k] _IO_new_file_xsputn
             0.01%  [k] _IO_vfprintf_internal  [k] strchrnul
             0.01%  [k] __printf               [k] _IO_vfprintf_internal
             0.01%  [k] main                   [k] __printf

       This output shows from/to branch columns and shows the highest
       percentage (from,to) jump combinations - i.e. the most likely taken
       branches in the system. "branches" can also include function calls
       and any other synchronous and asynchronous transitions of the
       instruction pointer that are not 'next instruction' - such as system
       calls, traps, interrupts, etc.

       This feature comes with (hopefully intuitive) flat ascii and TUI
       support in perf report.

     - Various 'perf annotate' visual improvements for us assembly junkies.
       It will now recognize function calls in the TUI and by hitting enter
       you can follow the call (recursively) and back, amongst other
       improvements.

     - Multiple threads/processes recording support in perf record, perf
       stat, perf top - which is activated via a comma-list of PIDs:

        perf top -p 21483,21485
        perf stat -p 21483,21485 -ddd
        perf record -p 21483,21485

     - Support for per UID views, via the --uid parameter to perf top, perf
       report, etc. For example 'perf top --uid mingo' will only show the
       tasks that I am running, excluding other users, root, etc.

     - Jump label restructurings and improvements - this includes the
       factoring out of the (hopefully much clearer)
       include/linux/static_key.h generic facility:

        struct static_key key = STATIC_KEY_INIT_FALSE;

        ...

        if (static_key_false(&key))
                do unlikely code
        else
                do likely code

        ...
        static_key_slow_inc(&key);
        ...
        static_key_slow_dec(&key);
        ...

       The static_key_false() branch will be generated into the code with as
       little impact to the likely code path as possible. The
       static_key_slow_*() APIs flip the branch via live kernel code patching.

       This facility can now be used more widely within the kernel to
       micro-optimize hot branches whose likelihood matches the static-key
       usage and fast/slow cost patterns.

     - SW function tracer improvements: perf support and filtering support.

     - Various hardenings of the perf.data ABI, to make older perf.data's
       smoother on newer tool versions, to make new features integrate more
       smoothly, to support cross-endian recording/analyzing workflows
       better, etc.

     - Restructuring of the kprobes code, the splitting out of 'optprobes',
       and a corner case bugfix.

     - Allow the tracing of kernel console output (printk).

     - Improvements/fixes to user-space RDPMC support, allowing user-space
       self-profiling code to extract PMU counts without performing any
       system calls, while playing nice with the kernel side.

     - 'perf bench' improvements

     - ... and lots of internal restructurings, cleanups and fixes that made
       these features possible. And, as usual this list is incomplete as
       there were also lots of other improvements

    * 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (120 commits)
      perf report: Fix annotate double quit issue in branch view mode
      perf report: Remove duplicate annotate choice in branch view mode
      perf/x86: Prettify pmu config literals
      perf report: Enable TUI in branch view mode
      perf report: Auto-detect branch stack sampling mode
      perf record: Add HEADER_BRANCH_STACK tag
      perf record: Provide default branch stack sampling mode option
      perf tools: Make perf able to read files from older ABIs
      perf tools: Fix ABI compatibility bug in print_event_desc()
      perf tools: Enable reading of perf.data files from different ABI rev
      perf: Add ABI reference sizes
      perf report: Add support for taken branch sampling
      perf record: Add support for sampling taken branch
      perf tools: Add code to support PERF_SAMPLE_BRANCH_STACK
      x86/kprobes: Split out optprobe related code to kprobes-opt.c
      x86/kprobes: Fix a bug which can modify kernel code permanently
      x86/kprobes: Fix instruction recovery on optimized path
      perf: Add callback to flush branch_stack on context switch
      perf: Disable PERF_SAMPLE_BRANCH_* when not supported
      perf/x86: Add LBR software filter support for Intel CPUs
      ...
| * \ \ \ \ \ \ \ \ \ \ Merge branch 'perf/urgent' into perf/core  (Ingo Molnar, 2012-03-12, 3 files, -19/+97)
| |\ \ \ \ \ \ \ \ \ \ \
| | | |_|_|_|_|_|_|_|/ /
| | |/| | | | | | | | |

    Merge reason: We are going to queue up a dependent patch.

    Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * | | | | | | | | | | Merge branch 'perf/urgent' into perf/core  (Ingo Molnar, 2012-03-05, 12 files, -50/+68)
| |\ \ \ \ \ \ \ \ \ \ \
| | | |_|_|_|_|_|/ / / /
| | |/| | | | | | | | |

    Conflicts:
        tools/perf/builtin-record.c
        tools/perf/builtin-top.c
        tools/perf/perf.h
        tools/perf/util/top.h

    Merge reason: resolve these cherry-picking conflicts.

    Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * | | | | | | | | | | static keys: Introduce 'struct static_key', static_key_true()/false() and static_key_slow_[inc|dec]()  (Ingo Molnar, 2012-02-24, 1 file, -3/+3)

    So here's a boot tested patch on top of Jason's series that does
    all the cleanups I talked about and turns jump labels into a
    more intuitive to use facility. It should also address the
    various misconceptions and confusions that surround jump labels.

    Typical usage scenarios:

        #include <linux/static_key.h>

        struct static_key key = STATIC_KEY_INIT_TRUE;

        if (static_key_false(&key))
                do unlikely code
        else
                do likely code

    Or:

        if (static_key_true(&key))
                do likely code
        else
                do unlikely code

    The static key is modified via:

        static_key_slow_inc(&key);
        ...
        static_key_slow_dec(&key);

    The 'slow' prefix makes it abundantly clear that this is an
    expensive operation.

    I've updated all in-kernel code to use this everywhere. Note
    that I (intentionally) have not pushed through the rename
    blindly through to the lowest levels: the actual jump-label
    patching arch facility should be named like that, so we want to
    decouple jump labels from the static-key facility a bit.

    On non-jump-label enabled architectures static keys default to
    likely()/unlikely() branches.

    Signed-off-by: Ingo Molnar <mingo@elte.hu>
    Acked-by: Jason Baron <jbaron@redhat.com>
    Acked-by: Steven Rostedt <rostedt@goodmis.org>
    Cc: a.p.zijlstra@chello.nl
    Cc: mathieu.desnoyers@efficios.com
    Cc: davem@davemloft.net
    Cc: ddaney.cavm@gmail.com
    Cc: Linus Torvalds <torvalds@linux-foundation.org>
    Link: http://lkml.kernel.org/r/20120222085809.GA26397@elte.hu
    Signed-off-by: Ingo Molnar <mingo@elte.hu>
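The last point in this message, that static keys degrade to likely()/unlikely() branches where jump labels are unavailable, can be modelled in userspace C. This is a sketch under assumptions: every `sketch_` name is hypothetical, the counter is a plain int rather than the kernel's atomic type, and no code patching takes place. `__builtin_expect` is a GCC/Clang builtin.

```c
#include <stdbool.h>

/* Branch hints in the style of the kernel's likely()/unlikely(). */
#define sketch_likely(x)   __builtin_expect(!!(x), 1)
#define sketch_unlikely(x) __builtin_expect(!!(x), 0)

/* Hypothetical model; the kernel's struct static_key is different. */
struct sketch_static_key {
	int enabled;	/* reference count: the branch is "on" while > 0 */
};

#define SKETCH_KEY_INIT_FALSE { 0 }
#define SKETCH_KEY_INIT_TRUE  { 1 }

/* For keys that are usually off: the "on" path is hinted as unlikely. */
static bool sketch_key_false(const struct sketch_static_key *key)
{
	return sketch_unlikely(key->enabled > 0);
}

/* For keys that are usually on: the "on" path is hinted as likely. */
static bool sketch_key_true(const struct sketch_static_key *key)
{
	return sketch_likely(key->enabled > 0);
}

/* Cheap here; in the kernel these may patch code, hence the 'slow' name. */
static void sketch_key_slow_inc(struct sketch_static_key *key)
{
	key->enabled++;
}

static void sketch_key_slow_dec(struct sketch_static_key *key)
{
	key->enabled--;
}
```

Both test functions return the same truth value for a given key; they differ only in which outcome the compiler is told to optimize for.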
* | | | | | | | | | | | Merge branch 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip  (Linus Torvalds, 2012-03-20, 2 files, -14/+4)
|\ \ \ \ \ \ \ \ \ \ \ \
| |_|_|_|_|_|_|_|_|_|_|/
|/| | | | | | | | | | |

    Pull RCU changes for v3.4 from Ingo Molnar. The major features of this
    series are:

     - making RCU more aggressive about entering dyntick-idle mode in order
       to improve energy efficiency

     - converting a few more call_rcu()s to kfree_rcu()s

     - applying a number of rcutree fixes and cleanups to rcutiny

     - removing CONFIG_SMP #ifdefs from treercu

     - allowing RCU CPU stall times to be set via sysfs

     - adding CPU-stall capability to rcutorture

     - adding more RCU-abuse diagnostics

     - updating documentation

     - fixing yet more issues located by the still-ongoing top-to-bottom
       inspection of RCU, this time with a special focus on the CPU-hotplug
       code path.

    * 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (48 commits)
      rcu: Stop spurious warnings from synchronize_sched_expedited
      rcu: Hold off RCU_FAST_NO_HZ after timer posted
      rcu: Eliminate softirq-mediated RCU_FAST_NO_HZ idle-entry loop
      rcu: Add RCU_NONIDLE() for idle-loop RCU read-side critical sections
      rcu: Allow nesting of rcu_idle_enter() and rcu_idle_exit()
      rcu: Remove redundant check for rcu_head misalignment
      PTR_ERR should be called before its argument is cleared.
      rcu: Convert WARN_ON_ONCE() in rcu_lock_acquire() to lockdep
      rcu: Trace only after NULL-pointer check
      rcu: Call out dangers of expedited RCU primitives
      rcu: Rework detection of use of RCU by offline CPUs
      lockdep: Add CPU-idle/offline warning to lockdep-RCU splat
      rcu: No interrupt disabling for rcu_prepare_for_idle()
      rcu: Move synchronize_sched_expedited() to rcutree.c
      rcu: Check for illegal use of RCU from offlined CPUs
      rcu: Update stall-warning documentation
      rcu: Add CPU-stall capability to rcutorture
      rcu: Make documentation give more realistic rcutorture duration
      rcutorture: Permit holding off CPU-hotplug operations during boot
      rcu: Print scheduling-clock information on RCU CPU stall-warning messages
      ...
| * | | | | | | | | | | Merge branch 'rcu/next' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu into core/rcu  (Ingo Molnar, 2012-02-28, 2 files, -14/+4)
| |\ \ \ \ \ \ \ \ \ \ \
| | |_|/ / / / / / / / /
| |/| | | | | | | | | |

    The major features of this series are:

     - making RCU more aggressive about entering dyntick-idle mode in order
       to improve energy efficiency

     - converting a few more call_rcu()s to kfree_rcu()s

     - applying a number of rcutree fixes and cleanups to rcutiny

     - removing CONFIG_SMP #ifdefs from treercu

     - allowing RCU CPU stall times to be set via sysfs

     - adding CPU-stall capability to rcutorture

     - adding more RCU-abuse diagnostics

     - updating documentation

     - fixing yet more issues located by the still-ongoing top-to-bottom
       inspection of RCU, this time with a special focus on the CPU-hotplug
       code path.

    Signed-off-by: Ingo Molnar <mingo@elte.hu>
| | * | | | | | | | | | ipv4: Convert call_rcu() to kfree_rcu(), drop opt_kfree_rcu  (Paul E. McKenney, 2012-02-21, 1 file, -6/+1)

    The call_rcu() in do_ip_setsockopt() invokes opt_kfree_rcu(), which just
    calls kfree(). So convert the call_rcu() to kfree_rcu(), which allows
    opt_kfree_rcu() to be eliminated.

    Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
    Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    Acked-by: David S. Miller <davem@davemloft.net>
    Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
    Cc: James Morris <jmorris@namei.org>
    Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
    Cc: Patrick McHardy <kaber@trash.net>
    Cc: netdev@vger.kernel.org
| | * | | | | | | | | | ipv4: Convert call_rcu() to kfree_rcu(), drop opt_kfree_rcu()  (Paul E. McKenney, 2012-02-21, 1 file, -8/+3)
| | | |_|_|_|_|_|/ / /
| | |/| | | | | | | |

    Because opt_kfree_rcu() just calls kfree(), all call_rcu() uses of it
    may be converted to kfree_rcu(). This permits opt_kfree_rcu() to
    be eliminated.

    Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
    Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    Acked-by: David S. Miller <davem@davemloft.net>
    Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
    Cc: James Morris <jmorris@namei.org>
    Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
    Cc: Patrick McHardy <kaber@trash.net>
    Cc: netdev@vger.kernel.org
* | | | | | | | | | | tcp: fix syncookie regression  (Eric Dumazet, 2012-03-11, 2 files, -17/+23)
| |_|_|/ / / / / / /
|/| | | | | | | | |

    commit ea4fc0d619 (ipv4: Don't use rt->rt_{src,dst} in ip_queue_xmit())
    added a serious regression on synflood handling.

    Simon Kirby discovered a successful connection was delayed by 20 seconds
    before being responsive.

    In my tests, I discovered that xmit frames were lost, and needed ~4
    retransmits and a socket dst rebuild before being really sent.

    In case of syncookie initiated connection, we use a different path to
    initialize the socket dst, and inet->cork.fl.u.ip4 is left cleared.

    As ip_queue_xmit() now depends on inet flow being setup, fix this by
    copying the temp flowi4 we use in cookie_v4_check().

    Reported-by: Simon Kirby <sim@netnation.com>
    Bisected-by: Simon Kirby <sim@netnation.com>
    Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
    Tested-by: Eric Dumazet <eric.dumazet@gmail.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | | | | | | | route: Remove redirect_genid  (Steffen Klassert, 2012-03-08, 2 files, -10/+2)

    As we invalidate the inetpeer tree along with the routing cache now,
    we don't need a genid to reset the redirect handling when the routing
    cache is flushed.

    Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | | | | | | | inetpeer: Invalidate the inetpeer tree along with the routing cache  (Steffen Klassert, 2012-03-08, 2 files, -1/+80)

    We initialize the routing metrics with the values cached on the
    inetpeer in rt_init_metrics(). So if we have the metrics cached on the
    inetpeer, we ignore the user configured fib_metrics.

    To fix this issue, we replace the old tree with a fresh initialized
    inet_peer_base. The old tree is removed later with a delayed work queue.

    Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | | | | | | | tcp: fix tcp_shift_skb_data() to not shift SACKed data below snd_una  (Neal Cardwell, 2012-03-06, 1 file, -0/+4)
| |_|_|_|_|_|_|_|/
|/| | | | | | | |

    This commit fixes tcp_shift_skb_data() so that it does not shift
    SACKed data below snd_una.

    This fixes an issue whose symptoms exactly match reports showing
    tp->sacked_out going negative since 3.3.0-rc4 (see "WARNING: at
    net/ipv4/tcp_input.c:3418" thread on netdev).

    Since 2008 (832d11c5cd076abc0aa1eaf7be96c81d1a59ce41)
    tcp_shift_skb_data() had been shifting SACKed ranges that were below
    snd_una. It checked that the *end* of the skb it was about to shift
    from was above snd_una, but did not check that the end of the actual
    shifted range was above snd_una; this commit adds that check.

    Shifting SACKed ranges below snd_una is problematic because for such
    ranges tcp_sacktag_one() short-circuits: it does not declare anything
    as SACKed and does not increase sacked_out.

    Before the fixes in commits cc9a672ee522d4805495b98680f4a3db5d0a0af9
    and daef52bab1fd26e24e8e9578f8fb33ba1d0cb412, shifting SACKed ranges
    below snd_una happened to work because tcp_shifted_skb() was always
    (incorrectly) passing in to tcp_sacktag_one() an skb whose end_seq
    tcp_shift_skb_data() had already guaranteed was beyond snd_una. Hence
    tcp_sacktag_one() never short-circuited and always increased
    tp->sacked_out in this case.

    After those two fixes, my testing has verified that shifting SACKed
    ranges below snd_una could cause tp->sacked_out to go negative with
    the following sequence of events:

    (1) tcp_shift_skb_data() sees an skb whose end_seq is beyond snd_una,
        then shifts a prefix of that skb that is below snd_una

    (2) tcp_shifted_skb() increments the packet count of the
        already-SACKed prev sk_buff

    (3) tcp_sacktag_one() sees the end of the new SACKed range is below
        snd_una, so it short-circuits and doesn't increase tp->sacked_out

    (5) tcp_clean_rtx_queue() sees the SACKed skb has been ACKed,
        decrements tp->sacked_out by this "inflated" pcount that was
        missing a matching increase in tp->sacked_out, and hence
        tp->sacked_out underflows to a u32 like 0xFFFFFFFF, which cast
        to s32 is negative.

    (6) this leads to the warnings seen in the recent "WARNING: at
        net/ipv4/tcp_input.c:3418" thread on the netdev list; e.g.:
        tcp_input.c:3418 WARN_ON((int)tp->sacked_out < 0);

    More generally, I think this bug can be tickled in some cases where
    two or more ACKs from the receiver are lost and then a DSACK arrives
    that is immediately above an existing SACKed skb in the write queue.

    This fix changes tcp_shift_skb_data() to abort this sequence at step
    (1) in the scenario above by noticing that the bytes are below snd_una
    and not shifting them.

    Signed-off-by: Neal Cardwell <ncardwell@google.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
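Two mechanics in this message lend themselves to a small userspace C sketch: the wraparound-safe sequence comparison behind the new check, and the unsigned underflow that makes tp->sacked_out look negative. All `sketch_` names are hypothetical, and the guard is a simplified reading of the check described above, not the kernel source. The final cast relies on the two's-complement conversion that mainstream compilers implement.

```c
#include <stdbool.h>
#include <stdint.h>

/* Wraparound-safe "seq1 is after seq2", in the style of the kernel's after(). */
static bool sketch_after(uint32_t seq1, uint32_t seq2)
{
	return (int32_t)(seq2 - seq1) < 0;
}

/*
 * Hypothetical guard: allow shifting `shift_len` bytes that start at
 * `start_seq` only if the end of the shifted range is above snd_una.
 */
static bool sketch_may_shift(uint32_t start_seq, uint32_t shift_len,
			     uint32_t snd_una)
{
	return sketch_after(start_seq + shift_len, snd_una);
}

/* What an unmatched decrement does to an unsigned counter. */
static int32_t sketch_sacked_out_after_underflow(void)
{
	uint32_t sacked_out = 0;	/* the increase was skipped, as in step (3) */

	sacked_out -= 1;		/* the decrement still happens, as in step (5) */
	return (int32_t)sacked_out;	/* 0xFFFFFFFF viewed as a signed 32-bit value */
}
```

With snd_una at 1200, a range starting at 1000 may be shifted when it is 500 bytes long (it ends at 1500) but not when it is 100 bytes long (it ends at 1100).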
* | | | | | | | | tcp: don't fragment SACKed skbs in tcp_mark_head_lost()  (Neal Cardwell, 2012-03-03, 1 file, -0/+1)
| |_|_|_|_|_|_|/
|/| | | | | | |

    In tcp_mark_head_lost() we should not attempt to fragment a SACKed skb
    to mark the first portion as lost. This is for two primary reasons:

    (1) tcp_shifted_skb() coalesces adjacent regions of SACKed skbs. When
        doing this, it preserves the sum of their packet counts in order to
        reflect the real-world dynamics on the wire. But given that skbs can
        have remainders that do not align to MSS boundaries, this packet count
        preservation means that for SACKed skbs there is not necessarily a
        direct linear relationship between tcp_skb_pcount(skb) and
        skb->len. Thus tcp_mark_head_lost()'s previous attempts to fragment
        off and mark as lost a prefix of length (packets - oldcnt)*mss from
        SACKed skbs were leading to occasional failures of the
        WARN_ON(len > skb->len) in tcp_fragment() (which used to be a BUG_ON();
        see the recent "crash in tcp_fragment" thread on netdev).

    (2) there is no real point in fragmenting off part of a SACKed skb and
        calling tcp_skb_mark_lost() on it, since tcp_skb_mark_lost() is a NOP
        for SACKed skbs.

    Signed-off-by: Neal Cardwell <ncardwell@google.com>
    Acked-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
    Acked-by: Yuchung Cheng <ycheng@google.com>
    Acked-by: Nandita Dukkipati <nanditad@google.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
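Reason (1) can be made concrete with a little arithmetic in userspace C. The struct and both helpers are hypothetical stand-ins and the numbers are made up for illustration: coalescing two sub-MSS skbs gives a packet count of 2 but a length well under 2 * mss, so a prefix of even one MSS can be longer than the skb.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the two skb properties that matter here. */
struct sketch_skb {
	uint32_t len;		/* payload bytes */
	uint32_t pcount;	/* packets this skb represents on the wire */
};

/* Coalesce two SACKed skbs as described: lengths and packet counts add up. */
static struct sketch_skb sketch_coalesce(struct sketch_skb a, struct sketch_skb b)
{
	struct sketch_skb merged = { a.len + b.len, a.pcount + b.pcount };

	return merged;
}

/* True if a prefix of `packets` full-MSS packets is longer than the skb. */
static bool sketch_split_overruns(struct sketch_skb skb, uint32_t packets,
				  uint32_t mss)
{
	return packets * mss > skb.len;
}
```

Two 300-byte skbs with one packet each merge into 600 bytes counted as 2 packets. With an MSS of 1000, splitting off a single packet would ask for 1000 bytes from a 600-byte skb, which is the len > skb->len condition the message mentions.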
* | | | | | | | tcp: fix false reordering signal in tcp_shifted_skbNeal Cardwell2012-02-281-8/+10
|/ / / / / / / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When tcp_shifted_skb() shifts bytes from the skb that is currently pointed to by 'highest_sack' then the increment of TCP_SKB_CB(skb)->seq implicitly advances tcp_highest_sack_seq(). This implicit advancement, combined with the recent fix to pass the correct SACKed range into tcp_sacktag_one(), caused tcp_sacktag_one() to think that the newly SACKed range was before the tcp_highest_sack_seq(), leading to a call to tcp_update_reordering() with a degree of reordering matching the size of the newly SACKed range (typically just 1 packet, which is a NOP, but potentially larger). This commit fixes this by simply calling tcp_sacktag_one() before the TCP_SKB_CB(skb)->seq advancement that can advance our notion of the highest SACKed sequence. Correspondingly, we can simplify the code a little now that tcp_shifted_skb() should update the lost_cnt_hint in all cases where skb == tp->lost_skb_hint. Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | | | | Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds2012-02-264-13/+9
|\ \ \ \ \ \ \
1) ICMP sockets leave err uninitialized but we try to return it for the unsupported MSG_OOB case, reported by Dave Jones.
2) Add new Zaurus device ID entries, from Dave Jones.
3) Pointer calculation in hso driver memset is wrong, from Dan Carpenter.
4) ks8851_probe() checks unsigned value as negative, fix also from Dan Carpenter.
5) Fix crashes in atl1c driver due to TX queue handling, from Eric Dumazet. I anticipate some TX side locking fixes coming in the near future for this driver as well.
6) The inline directive fix in Bluetooth which was breaking the build only with very new versions of GCC, from Johan Hedberg.
7) Fix crashes in the ATM CLIP code due to ARP cleanups this merge window, reported by Meelis Roos and fixed by Eric Dumazet.
8) JME driver doesn't flush RX FIFO correctly, from Guo-Fu Tseng.
9) Some ip6_route_output() callers test the return value for NULL, but this never happens as the convention is to return a dst entry with dst->error set. Fixes from RonQing Li.
10) Logitech Harmony 900 should be handled by zaurus driver not cdc_ether, update white lists and black lists accordingly. From Scott Talbert.
11) When receiving from certain kinds of devices there won't be a MAC header, so there is no MAC header to fix up in the IPSEC code, and if we try to do it we'll crash. Fix from Eric Dumazet.
12) Port type array indexing off-by-one in mlx4 driver, fix from Yevgeny Petrilin.
13) Fix regression in link-down handling in davinci_emac which causes all RX descriptors to be freed up and therefore RX to wedge completely, from Christian Riesch.
14) It took two attempts, but ctnetlink soft lockups seem to be cured now, from Pablo Neira Ayuso.
15) Endianness bug fix in ENIC driver, from Santosh Nayak.
16) The long ago conversion of the PPP fragmentation code over to abstracted SKB list handling wasn't perfect, once we get an out of sequence SKB we don't flush the rest of them like we should. From Ben McKeegan.
17) Fix regression of ->ip_summed initialization in sfc driver. From Ben Hutchings.
18) Bluetooth timeout mistakenly using msecs instead of jiffies, from Andrzej Kaczmarek.
19) Using _sync variant of work cancellation results in deadlocks, use the non _sync variants instead. From Andre Guedes.
20) Bluetooth rfcomm code had reference counting problems leading to crashes, fix from Octavian Purdila.
21) The conversion of netem over to classful qdisc handling added two bugs to netem_dequeue(), fixes from Eric Dumazet.
22) Missing pci_iounmap() in ATM Solos driver. Fix from Julia Lawall.
23) b44_pci_exit() should not have __exit tag since it's invoked from non-__exit code. From Nikola Pajkovsky.
24) The conversion of the neighbour hash tables over to RCU added a race, fixed here by adding the necessary reread of tbl->nht, fix from Michel Machado.
25) When we added VF (virtual function) attributes for network device dumps, this potentially bloats up the size of the dump of one network device such that the dump size is too large for the buffer allocated by properly written netlink applications. In particular, if you add 255 VFs to a network device, parts of GLIBC stop working. To fix this, we add an attribute that is used to turn on these extended portions of the network device dump. Sophisticated applications like 'ip' that want to see this stuff will be changed to set the attribute, whereas things like GLIBC that don't care about VFs simply will not, and therefore won't be busted by the mere presence of VFs on a network device. Thanks to the tireless work of Greg Rose on this fix.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (53 commits)
  sfc: Fix assignment of ip_summed for pre-allocated skbs
  ppp: fix 'ppp_mp_reconstruct bad seq' errors
  enic: Fix endianness bug.
  gre: fix spelling in comments
  netfilter: ctnetlink: fix soft lockup when netlink adds new entries (v2)
  Revert "netfilter: ctnetlink: fix soft lockup when netlink adds new entries"
  davinci_emac: Do not free all rx dma descriptors during init
  mlx4_core: Fixing array indexes when setting port types
  phy: IC+101G and PHY_HAS_INTERRUPT flag
  netdev/phy/icplus: Correct broken phy_init code
  ipsec: be careful of non existing mac headers
  Move Logitech Harmony 900 from cdc_ether to zaurus
  hso: memsetting wrong data in hso_get_count()
  netfilter: ip6_route_output() never returns NULL.
  ethernet/broadcom: ip6_route_output() never returns NULL.
  ipv6: ip6_route_output() never returns NULL.
  jme: Fix FIFO flush issue
  atm: clip: remove clip_tbl
  ipv4: ping: Fix recvmsg MSG_OOB error handling.
  rtnetlink: Fix problem with buffer allocation
  ...
| * | | | | | | gre: fix spelling in commentsstephen hemminger2012-02-241-5/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The original spelling and bad word choice makes these comments hard to read. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | | | | ipsec: be careful of non existing mac headersEric Dumazet2012-02-232-8/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Niccolo Belli reported ipsec crashes in case we handle a frame without mac header (atm in his case) Before copying mac header, better make sure it is present. Bugzilla reference: https://bugzilla.kernel.org/show_bug.cgi?id=42809 Reported-by: Niccolò Belli <darkbasic@linuxsystems.it> Tested-by: Niccolò Belli <darkbasic@linuxsystems.it> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | | | | ipv4: ping: Fix recvmsg MSG_OOB error handling.David S. Miller2012-02-211-0/+1
| |/ / / / / / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Don't return an uninitialized variable as the error, return -EOPNOTSUPP instead. Reported-by: Dave Jones <davej@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | | | | Merge branch 'for-linus' of ↵Linus Torvalds2012-02-201-2/+3
|\ \ \ \ \ \ \ | |/ / / / / / |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Assorted fixes, sat in -next for a week or so... * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: ocfs2: deal with wraparounds of i_nlink in ocfs2_rename() vfs: fix compat_sys_stat() handling of overflows in st_nlink quota: Fix deadlock with suspend and quotas vfs: Provide function to get superblock and wait for it to thaw vfs: fix panic in __d_lookup() with high dentry hashtable counts autofs4 - fix lockdep splat in autofs vfs: fix d_inode_lookup() dentry ref leak
| * | | | | | vfs: fix panic in __d_lookup() with high dentry hashtable countsDimitri Sivanich2012-02-131-2/+3
| | |_|_|_|/ | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When the number of dentry cache hash table entries gets too high (2147483648 entries), as happens by default on a 16TB system, use of a signed integer in the dcache_init() initialization loop prevents the dentry_hashtable from getting initialized, causing a panic in __d_lookup(). Fix this in dcache_init() and similar areas. Signed-off-by: Dimitri Sivanich <sivanich@sgi.com> Acked-by: David S. Miller <davem@davemloft.net> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
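To see why a signed loop counter leaves the table uninitialized, here is a minimal userspace sketch. It assumes an initialization loop of the general shape `for (loop = 0; loop < (1 << shift); loop++)`; the function names are hypothetical, and the 32-bit wrap is emulated with 64-bit arithmetic so the sketch itself stays free of undefined behaviour.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Iterations executed when the loop bound (1 << shift) is held in a
 * 32-bit signed int. A bound of 2^31 does not fit and appears negative.
 */
static uint64_t iterations_with_signed_bound(unsigned int shift)
{
	int64_t bound;

	assert(shift < 32);
	bound = (int64_t)1 << shift;
	if (bound > INT32_MAX)
		bound -= (int64_t)1 << 32;	/* value a 32-bit int would hold */
	return bound > 0 ? (uint64_t)bound : 0;
}

/* Iterations executed when the bound is a 32-bit unsigned value. */
static uint64_t iterations_with_unsigned_bound(unsigned int shift)
{
	assert(shift < 32);
	return (uint64_t)(UINT32_C(1) << shift);
}

static void demo_hash_init_loop(void)
{
	/* 2147483648 entries, as in the 16TB case described above. */
	assert(iterations_with_signed_bound(31) == 0);
	assert(iterations_with_unsigned_bound(31) == UINT64_C(2147483648));
}
```

With the signed bound the loop body never runs, which matches the reported symptom of an uninitialized dentry_hashtable; smaller tables are unaffected.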
* | | | | | tcp: fix tcp_shifted_skb() adjustment of lost_cnt_hint for FACKNeal Cardwell2012-02-141-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This commit ensures that lost_cnt_hint is correctly updated in tcp_shifted_skb() for FACK TCP senders. The lost_cnt_hint adjustment in tcp_sacktag_one() only applies to non-FACK senders, so FACK senders need their own adjustment. This applies the spirit of 1e5289e121372a3494402b1b131b41bfe1cf9b7f - except now that the sequence range passed into tcp_sacktag_one() is correct we need only have a special case adjustment for FACK. Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | | | tcp: fix range tcp_shifted_skb() passes to tcp_sacktag_one()Neal Cardwell2012-02-131-9/+10
| | | | | | | Fix the newly-SACKed range to be the range of newly-shifted bytes. Previously - since 832d11c5cd076abc0aa1eaf7be96c81d1a59ce41 - tcp_shifted_skb() incorrectly called tcp_sacktag_one() with the start and end sequence numbers of the skb it passes in set to the range just beyond the range that is newly-SACKed. This commit also removes a special-case adjustment to lost_cnt_hint in tcp_shifted_skb(), since the pre-existing adjustment of lost_cnt_hint in tcp_sacktag_one() now handles this properly, now that the correct start sequence number is passed in. Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | | | tcp: allow tcp_sacktag_one() to tag ranges not aligned with skbsNeal Cardwell2012-02-131-14/+22
|/ / / / / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This commit allows callers of tcp_sacktag_one() to pass in sequence ranges that do not align with skb boundaries, as tcp_shifted_skb() needs to do in an upcoming fix in this patch series. In fact, now tcp_sacktag_one() does not need to depend on an input skb at all, which makes its semantics and dependencies more clear. Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | | net: Don't proxy arp respond if iif == rt->dst.dev if private VLAN is disabledThomas Graf2012-02-101-1/+2
| | | | | | Commit 653241 (net: RFC3069, private VLAN proxy arp support) changed the behavior of arp proxy to send arp replies back out on the interface the request came in on, even if the private VLAN feature is disabled. Previously we checked rt->dst.dev != skb->dev in both scenarios: when proxy arp is enabled for the netdevice, and when individual proxy neighbour entries have been added. This patch adds the check back for the pneigh_lookup() scenario. Signed-off-by: Thomas Graf <tgraf@suug.ch> Acked-by: Jesper Dangaard Brouer <hawk@comx.dk> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | | ipv4: Fix wrong order of ip_rt_get_source() and update iph->daddr.Li Wei2012-02-101-1/+1
| | | | | | This patch fixes a bug introduced by commit ac8a4810 (ipv4: Save nexthop address of LSRR/SSRR option to IPCB.). In that patch, we saved the nexthop of SRR in ip_option->nexthop and did not update iph->daddr until we got to ip_forward_options(), but we need to update it before ip_rt_get_source(); otherwise we may get a wrong src. Signed-off-by: Li Wei <lw@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | | net: Fix build regression when INET_UDP_DIAG=y and IPV6=mAnisse Astier2012-02-071-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Tested-by: Anisse Astier <anisse@astier.eu> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | | tcp_v4_send_reset: binding oif to iif in no sock caseShawn Lu2012-02-041-0/+5
| |_|_|/ |/| | | Bind the outgoing interface of a RST packet to the incoming interface for TCP v4 when there is no socket associated with it. When sk is not NULL, use sk->sk_bound_dev_if instead (suggested by Eric Dumazet). This has a few benefits: 1. tcp_v6_send_reset already does that. 2. This helps TCP connections with SO_BINDTODEVICE set. When the connection is lost, we are still able to send out a RST using the same interface. 3. We are sending a reply, so it is most likely to succeed if iif is used. Signed-off-by: Shawn Lu <shawn.lu@ericsson.com> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | tcp: properly initialize tcp memory limitsJason Wang2012-02-022-8/+2
| | | | | Commit 4acb4190 tries to fix the use of an uninitialized value introduced by commit 3dc43e3, but it makes the per-socket memory limits too small. This patch fixes this and also removes the redundant code introduced in 4acb4190. Signed-off-by: Jason Wang <jasowang@redhat.com> Acked-by: Glauber Costa <glommer@parallels.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | net: Disambiguate kernel messageArun Sharma2012-02-012-8/+16
|/ / / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Some of our machines were reporting: TCP: too many of orphaned sockets even when the number of orphaned sockets was well below the limit. We print a different message depending on whether we're out of TCP memory or there are too many orphaned sockets. Also move the check out of line and cleanup the messages that were printed. Signed-off-by: Arun Sharma <asharma@fb.com> Suggested-by: Mohan Srinivasan <mohan@fb.com> Cc: netdev@vger.kernel.org Cc: linux-kernel@vger.kernel.org Cc: David Miller <davem@davemloft.net> Cc: Glauber Costa <glommer@parallels.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | tcp: fix tcp_trim_head() to adjust segment count with skb MSSNeal Cardwell2012-01-301-4/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This commit fixes tcp_trim_head() to recalculate the number of segments in the skb with the skb's existing MSS, so trimming the head causes the skb segment count to be monotonically non-increasing - it should stay the same or go down, but not increase. Previously tcp_trim_head() used the current MSS of the connection. But if there was a decrease in MSS between original transmission and ACK (e.g. due to PMTUD), this could cause tcp_trim_head() to counter-intuitively increase the segment count when trimming bytes off the head of an skb. This violated assumptions in tcp_tso_acked() that tcp_trim_head() only decreases the packet count, so that packets_acked in tcp_tso_acked() could underflow, leading tcp_clean_rtx_queue() to pass u32 pkts_acked values as large as 0xffffffff to ca_ops->pkts_acked(). As an aside, if tcp_trim_head() had really wanted the skb to reflect the current MSS, it should have called tcp_set_skb_tso_segs() unconditionally, since a decrease in MSS would mean that a single-packet skb should now be sliced into multiple segments. Signed-off-by: Neal Cardwell <ncardwell@google.com> Acked-by: Nandita Dukkipati <nanditad@google.com> Acked-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>
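The effect of recomputing the segment count with the wrong MSS can be checked with round-up division. This is a minimal sketch with made-up numbers (a 3000-byte skb built at MSS 1500, a connection MSS later reduced to 1000); `segs_for()` is a hypothetical helper, not the kernel's implementation.

```c
#include <assert.h>
#include <stdint.h>

/* Segments needed to carry len bytes at a given MSS (round up). */
static uint32_t segs_for(uint32_t len, uint32_t mss)
{
	assert(mss > 0);
	return (len + mss - 1) / mss;
}

static void demo_trim_head(void)
{
	const uint32_t skb_mss = 1500;	/* MSS when the skb was built */
	const uint32_t cur_mss = 1000;	/* smaller MSS, e.g. after PMTUD */
	uint32_t len = 3000;
	uint32_t before = segs_for(len, skb_mss);

	len -= 100;	/* trim 100 ACKed bytes off the head */

	assert(before == 2);
	/* Using the skb's own MSS, the count cannot grow. */
	assert(segs_for(len, skb_mss) == 2);
	/* Using the connection's current MSS, trimming grew the count. */
	assert(segs_for(len, cur_mss) == 3);
}
```

The third assertion is the counter-intuitive case described above: fewer bytes, more segments, which is what let packets_acked underflow in tcp_tso_acked().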
* | | net/tcp: Fix tcp memory limits initialization when !CONFIG_SYSCTLGlauber Costa2012-01-302-3/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | sysctl_tcp_mem() initialization was moved to sysctl_tcp_ipv4.c in commit 3dc43e3e4d0b52197d3205214fe8f162f9e0c334, since it became a per-ns value. That code, however, will never run when CONFIG_SYSCTL is disabled, leading to bogus values on those fields - causing hung TCP sockets. This patch fixes it by keeping an initialization code in tcp_init(). It will be overwritten by the first net namespace init if CONFIG_SYSCTL is compiled in, and do the right thing if it is compiled out. It is also named properly as tcp_init_mem(), to properly signal its non-sysctl side effect on TCP limits. Reported-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Glauber Costa <glommer@parallels.com> Cc: David S. Miller <davem@davemloft.net> Link: http://lkml.kernel.org/r/4F22D05A.8030604@parallels.com [ renamed the function, tidied up the changelog a bit ] Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | ipv6: Fix ip_gre lockless xmits.Willem de Bruijn2012-01-261-0/+4
| | | | Tunnel devices set NETIF_F_LLTX to bypass HARD_TX_LOCK. Sit and ipip set this unconditionally in ops->setup, but gre enables it conditionally after parameter passing in ops->newlink. This is not called during tunnel setup as below, however, so GRE tunnels are still taking the lock.

    modprobe ip_gre
    ip tunnel add test0 mode gre remote 10.5.1.1 dev lo
    ip link set test0 up
    ip addr add 10.6.0.1 dev test0
    # cat /sys/class/net/test0/features
    # $DIR/test_tunnel_xmit 10 10.5.2.1
    ip route add 10.5.2.0/24 dev test0
    ip tunnel del test0

The newlink callback is only called in rtnl_newlink, and only if the device is new, as it calls register_netdevice internally. Gre tunnels are created at 'ip tunnel add' with ioctl SIOCADDTUNNEL, which calls ipgre_tunnel_locate, which calls register_netdev. rtnl_newlink is called at 'ip link set', but skips ops->newlink and the device is up with locking still enabled. The equivalent ipip tunnel works fine, btw (just replace 'mode gre' with 'mode ipip').

On kernels before /sys/class/net/*/features was removed [1], the first commented out line returns 0x6000 with mode gre, which indicates that NETIF_F_LLTX (0x1000) is not set. With ipip, it reports 0x7000. This test cannot be used on recent kernels where the sysfs file is removed (and ETHTOOL_GFEATURES does not currently work for tunnel devices, because they lack dev->ethtool_ops).

The second commented out line calls a simple transmission test [2] that sends on 24 cores at maximum rate. Results of a single run:

    ipip:             19,372,306
    gre before patch:  4,839,753
    gre after patch:  19,133,873

This patch replicates the condition check in ipgre_newlink to ipgre_tunnel_locate. It works for me, both with oseq on and off. This is the first time I looked at rtnetlink and iproute2 code, though, so someone more knowledgeable should probably check the patch. Thanks.

The tail of both functions is now identical, by the way. To avoid code duplication, I'll be happy to rework this and merge the two.

[1] http://patchwork.ozlabs.org/patch/104610/
[2] http://kernel.googlecode.com/files/xmit_udp_parallel.c

Signed-off-by: Willem de Bruijn <willemb@google.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
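The reading of the two sysfs values comes down to a single bit test. A minimal sketch, assuming only the NETIF_F_LLTX value (0x1000) quoted in the message; the macro and function names here are local to the example, and the bit value may differ on other kernel versions.

```c
#include <assert.h>
#include <stdint.h>

/* Bit value quoted in the commit message; an assumption for other kernels. */
#define DEMO_NETIF_F_LLTX 0x1000u

/* Returns nonzero when the lockless-transmit feature bit is set. */
static int has_lltx(uint32_t features)
{
	return (features & DEMO_NETIF_F_LLTX) != 0;
}

static void demo_feature_check(void)
{
	assert(!has_lltx(0x6000));	/* gre before the patch: still locking */
	assert(has_lltx(0x7000));	/* ipip: lockless transmit enabled */
}
```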
* | | tcp: bind() optimize port allocationFlavio Leitner2012-01-251-4/+2
| | | | Port autoselection finds a port and then drops the lock; right after that, it gets the hash bucket again and locks it. Fix it to go direct. Signed-off-by: Flavio Leitner <fbl@redhat.com> Signed-off-by: Marcelo Ricardo Leitner <mleitner@redhat.com> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | tcp: bind() fix autoselection to share portsFlavio Leitner2012-01-251-0/+5
| |/ |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The current code checks for conflicts when the application requests a specific port. If there is no conflict, then the request is granted. On the other hand, the port autoselection done by the kernel fails when all ports are bound even when there is a port with no conflict available. The fix changes port autoselection to check if there is a conflict and use it if not. Signed-off-by: Flavio Leitner <fbl@redhat.com> Signed-off-by: Marcelo Ricardo Leitner <mleitner@redhat.com> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | tcp: md5: using remote adress for md5 lookup in rst packetshawnlu2012-01-221-1/+1
| | | | | | | | | | | | | | | | | | md5 key is added in socket through remote address. remote address should be used in finding md5 key when sending out reset packet. Signed-off-by: shawnlu <shawn.lu@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | tcp: detect loss above high_seq in recoveryYuchung Cheng2012-01-222-27/+15
| | | Correctly implement a loss detection heuristic: New sequences (above high_seq) sent during the fast recovery are deemed lost when higher sequences are SACKed. Current code does not catch these losses, because tcp_mark_head_lost() does not check packets beyond high_seq. The fix is straightforward: check packets up to the highest SACKed packet. In addition, all the FLAG_DATA_LOST logic is ineffective and redundant and can be removed. Update the loss heuristic comments. The algorithm above is documented as heuristic B, but it is redundant too because heuristic A already covers B. Note that this change only marks some forward-retransmitted packets LOST. It does NOT forbid TCP performing further CWR on new losses. A potential follow-up patch under preparation is to perform another CWR on "new" losses such as 1) sequence above high_seq is lost (by resetting high_seq to snd_nxt) 2) retransmission is lost. Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | tcp: fix undo after RTO for CUBICNeal Cardwell2012-01-201-4/+6
| | | This patch fixes CUBIC so that cwnd reductions made during RTOs can be undone (just as they already can be undone when using the default/Reno behavior). When undoing cwnd reductions, BIC-derived congestion control modules were restoring the cwnd from last_max_cwnd. There were two problems with using last_max_cwnd to restore a cwnd during undo: (a) last_max_cwnd was set to 0 on state transitions into TCP_CA_Loss (by calling the module's reset() functions), so cwnd reductions from RTOs could not be undone. (b) when fast_convergence is enabled (which it is by default) last_max_cwnd does not actually hold the value of snd_cwnd before the loss; instead, it holds a scaled-down version of snd_cwnd. This patch makes the following changes: (1) upon undo, revert snd_cwnd to ca->loss_cwnd, which is already, as the existing comment notes, the "congestion window at last loss" (2) stop forgetting ca->loss_cwnd on TCP_CA_Loss events (3) use ca->last_max_cwnd to check if we're in slow start Signed-off-by: Neal Cardwell <ncardwell@google.com> Acked-by: Stephen Hemminger <shemminger@vyatta.com> Acked-by: Sangtae Ha <sangtae.ha@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | tcp: fix undo after RTO for BICNeal Cardwell2012-01-201-4/+7
|/ This patch fixes BIC so that cwnd reductions made during RTOs can be undone (just as they already can be undone when using the default/Reno behavior). When undoing cwnd reductions, BIC-derived congestion control modules were restoring the cwnd from last_max_cwnd. There were two problems with using last_max_cwnd to restore a cwnd during undo: (a) last_max_cwnd was set to 0 on state transitions into TCP_CA_Loss (by calling the module's reset() functions), so cwnd reductions from RTOs could not be undone. (b) when fast_convergence is enabled (which it is by default) last_max_cwnd does not actually hold the value of snd_cwnd before the loss; instead, it holds a scaled-down version of snd_cwnd. This patch makes the following changes: (1) upon undo, revert snd_cwnd to ca->loss_cwnd, which is already, as the existing comment notes, the "congestion window at last loss" (2) stop forgetting ca->loss_cwnd on TCP_CA_Loss events (3) use ca->last_max_cwnd to check if we're in slow start Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds2012-01-174-19/+20
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (47 commits) tg3: Fix single-vector MSI-X code openvswitch: Fix multipart datapath dumps. ipv6: fix per device IP snmp counters inetpeer: initialize ->redirect_genid in inet_getpeer() net: fix NULL-deref in WARN() in skb_gso_segment() net: WARN if skb_checksum_help() is called on skb requiring segmentation caif: Remove bad WARN_ON in caif_dev caif: Fix typo in Vendor/Product-ID for CAIF modems bnx2x: Disable AN KR work-around for BCM57810 bnx2x: Remove AutoGrEEEn for BCM84833 bnx2x: Remove 100Mb force speed for BCM84833 bnx2x: Fix PFC setting on BCM57840 bnx2x: Fix Super-Isolate mode for BCM84833 net: fix some sparse errors net: kill duplicate included header net: sh-eth: Fix build error by the value which is not defined net: Use device model to get driver name in skb_gso_segment() bridge: BH already disabled in br_fdb_cleanup() net: move sock_update_memcg outside of CONFIG_INET mwl8k: Fixing Sparse ENDIAN CHECK warning ...
| * inetpeer: initialize ->redirect_genid in inet_getpeer()Dan Carpenter2012-01-171-0/+1
| | | | | | | | | | | | | | | | kmemcheck complains that ->redirect_genid doesn't get initialized. Presumably it should be set to zero. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>