aboutsummaryrefslogtreecommitdiffstats
path: root/net/ipv6
Commit message (Collapse)AuthorAgeFilesLines
...
* | Merge remote-tracking branch 'stable/linux-3.0.y' into android-3.0Todd Poynor2012-11-017-68/+118
|\ \ | |/ | | | | | | Change-Id: I9685feb9277b450da10d78a455b3c0674d6cfe18 Signed-off-by: Todd Poynor <toddpoynor@google.com>
| * tcp: resets are misroutedAlexey Kuznetsov2012-10-281-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit 4c67525849e0b7f4bd4fab2487ec9e43ea52ef29 ] After commit e2446eaa ("tcp_v4_send_reset: binding oif to iif in no sock case").. tcp resets are always lost, when routing is asymmetric. Yes, backing out that patch will result in misrouting of resets for dead connections which used interface binding when were alive, but we actually cannot do anything here. What's died that's died and correct handling normal unbound connections is obviously a priority. Comment to comment: > This has few benefits: > 1. tcp_v6_send_reset already did that. It was done to route resets for IPv6 link local addresses. It was a mistake to do so for global addresses. The patch fixes this as well. Actually, the problem appears to be even more serious than guaranteed loss of resets. As reported by Sergey Soloviev <sol@eqv.ru>, those misrouted resets create a lot of arp traffic and huge amount of unresolved arp entires putting down to knees NAT firewalls which use asymmetric routing. Signed-off-by: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
| * ipv6: mip6: fix mip6_mh_filter()Eric Dumazet2012-10-131-9/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit 96af69ea2a83d292238bdba20e4508ee967cf8cb ] mip6_mh_filter() should not modify its input, or else its caller would need to recompute ipv6_hdr() if skb->head is reallocated. Use skb_header_pointer() instead of pskb_may_pull() Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
| * ipv6: raw: fix icmpv6_filter()Eric Dumazet2012-10-131-11/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit 1b05c4b50edbddbdde715c4a7350629819f6655e ] icmpv6_filter() should not modify its input, or else its caller would need to recompute ipv6_hdr() if skb->head is reallocated. Use skb_header_pointer() instead of pskb_may_pull() and change the prototype to make clear both sk and skb are const. Also, if icmpv6 header cannot be found, do not deliver the packet, as we do in IPv4. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
| * ipv6: release reference of ip6_null_entry's dst entry in __ip6_del_rtGao feng2012-10-131-5/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit 6825a26c2dc21eb4f8df9c06d3786ddec97cf53b ] as we hold dst_entry before we call __ip6_del_rt, so we should alse call dst_release not only return -ENOENT when the rt6_info is ip6_null_entry. and we already hold the dst entry, so I think it's safe to call dst_release out of the write-read lock. Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
| * ipv6: addrconf: Avoid calling netdevice notifiers with RCU read-side lockBen Hutchings2012-10-021-3/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit 4acd4945cd1e1f92b20d14e349c6c6a52acbd42d ] Cong Wang reports that lockdep detected suspicious RCU usage while enabling IPV6 forwarding: [ 1123.310275] =============================== [ 1123.442202] [ INFO: suspicious RCU usage. ] [ 1123.558207] 3.6.0-rc1+ #109 Not tainted [ 1123.665204] ------------------------------- [ 1123.768254] include/linux/rcupdate.h:430 Illegal context switch in RCU read-side critical section! [ 1123.992320] [ 1123.992320] other info that might help us debug this: [ 1123.992320] [ 1124.307382] [ 1124.307382] rcu_scheduler_active = 1, debug_locks = 0 [ 1124.522220] 2 locks held by sysctl/5710: [ 1124.648364] #0: (rtnl_mutex){+.+.+.}, at: [<ffffffff81768498>] rtnl_trylock+0x15/0x17 [ 1124.882211] #1: (rcu_read_lock){.+.+.+}, at: [<ffffffff81871df8>] rcu_lock_acquire+0x0/0x29 [ 1125.085209] [ 1125.085209] stack backtrace: [ 1125.332213] Pid: 5710, comm: sysctl Not tainted 3.6.0-rc1+ #109 [ 1125.441291] Call Trace: [ 1125.545281] [<ffffffff8109d915>] lockdep_rcu_suspicious+0x109/0x112 [ 1125.667212] [<ffffffff8107c240>] rcu_preempt_sleep_check+0x45/0x47 [ 1125.781838] [<ffffffff8107c260>] __might_sleep+0x1e/0x19b [...] [ 1127.445223] [<ffffffff81757ac5>] call_netdevice_notifiers+0x4a/0x4f [...] [ 1127.772188] [<ffffffff8175e125>] dev_disable_lro+0x32/0x6b [ 1127.885174] [<ffffffff81872d26>] dev_forward_change+0x30/0xcb [ 1128.013214] [<ffffffff818738c4>] addrconf_forward_change+0x85/0xc5 [...] addrconf_forward_change() uses RCU iteration over the netdev list, which is unnecessary since it already holds the RTNL lock. We also cannot reasonably require netdevice notifier functions not to sleep. Reported-by: Cong Wang <amwang@redhat.com> Signed-off-by: Ben Hutchings <bhutchings@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
| * ipv6: Move ipv6 proc file registration to end of init orderThomas Graf2012-07-161-10/+31
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit d189634ecab947c10f6f832258b103d0bbfe73cc ] /proc/net/ipv6_route reflects the contents of fib_table_hash. The proc handler is installed in ip6_route_net_init() whereas fib_table_hash is allocated in fib6_net_init() _after_ the proc handler has been installed. This opens up a short time frame to access fib_table_hash with its pants down. Move the registration of the proc files to a later point in the init order to avoid the race. Tested :-) Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
| * xfrm: take net hdr len into account for esp payload size calculationBenjamin Poirier2012-06-101-11/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit 91657eafb64b4cb53ec3a2fbc4afc3497f735788 ] Corrects the function that determines the esp payload size. The calculations done in esp{4,6}_get_mtu() lead to overlength frames in transport mode for certain mtu values and suboptimal frames for others. According to what is done, mainly in esp{,6}_output() and tcp_mtu_to_mss(), net_header_len must be taken into account before doing the alignment calculation. Signed-off-by: Benjamin Poirier <bpoirier@suse.de> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
| * ipv6: fix incorrect ipsec fragmentGao feng2012-06-101-18/+50
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit 0c1833797a5a6ec23ea9261d979aa18078720b74 ] Since commit ad0081e43a "ipv6: Fragment locally generated tunnel-mode IPSec6 packets as needed" the fragment of packets is incorrect. because tunnel mode needs IPsec headers and trailer for all fragments, while on transport mode it is sufficient to add the headers to the first fragment and the trailer to the last. so modify mtu and maxfraglen base on ipsec mode and if fragment is first or last. with my test,it work well(every fragment's size is the mtu) and does not trigger slow fragment path. Changes from v1: though optimization, mtu_prev and maxfraglen_prev can be delete. replace xfrm mode codes with dst_entry's new frag DST_XFRM_TUNNEL. add fuction ip6_append_data_mtu to make codes clearer. Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* | ipv6: check return value for dst_allocMadalin Bucur2012-05-091-1/+3
| | | | | | | | | | | | | | return value of dst_alloc must be checked before use Signed-off-by: Madalin Bucur <madalin.bucur@freescale.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | Merge commit 'v3.0.30' into android-3.0Todd Poynor2012-04-302-1/+5
|\ \ | |/
| * tcp: fix TCP_MAXSEG for established IPv6 passive socketsNeal Cardwell2012-04-271-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit d135c522f1234f62e81be29cebdf59e9955139ad ] Commit f5fff5d forgot to fix TCP_MAXSEG behavior IPv6 sockets, so IPv6 TCP server sockets that used TCP_MAXSEG would find that the advmss of child sockets would be incorrect. This commit mirrors the advmss logic from tcp_v4_syn_recv_sock in tcp_v6_syn_recv_sock. Eventually this logic should probably be shared between IPv4 and IPv6, but this at least fixes this issue. Signed-off-by: Neal Cardwell <ncardwell@google.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
| * ipv6: fix array index in ip6_mc_add_src()RongQing.Li2012-04-271-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit 78d50217baf36093ab320f95bae0d6452daec85c ] Convert array index from the loop bound to the loop index. And remove the void type conversion to ip6_mc_del1_src() return code, seem it is unnecessary, since ip6_mc_del1_src() does not use __must_check similar attribute, no compiler will report the warning when it is removed. v2: enrich the commit header Signed-off-by: RongQing.Li <roy.qing.li@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* | Merge linux-stable 3.0.28 into android-3.0Todd Poynor2012-04-1913-61/+115
|\ \ | |/ | | | | | | Change-Id: Iee820738e53627f5d0447a87ceff34443aa72786 Signed-off-by: Todd Poynor <toddpoynor@google.com>
| * net: fix a potential rcu_read_lock() imbalance in rt6_fill_node()Eric Dumazet2012-04-021-2/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit 94f826b8076e2cb92242061e92f21b5baa3eccc2 ] Commit f2c31e32b378 (net: fix NULL dereferences in check_peer_redir() ) added a regression in rt6_fill_node(), leading to rcu_read_lock() imbalance. Thats because NLA_PUT() can make a jump to nla_put_failure label. Fix this by using nla_put() Many thanks to Ben Greear for his help Reported-by: Ben Greear <greearb@candelatech.com> Reported-by: Dave Jones <davej@redhat.com> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Tested-by: Ben Greear <greearb@candelatech.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
| * ipv6: Don't dev_hold(dev) in ip6_mc_find_dev_rcu.RongQing.Li2012-03-231-1/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit c577923756b7fe9071f28a76b66b83b306d1d001 ] ip6_mc_find_dev_rcu() is called with rcu_read_lock(), so don't need to dev_hold(). With dev_hold(), not corresponding dev_put(), will lead to leak. [ bug introduced in 96b52e61be1 (ipv6: mcast: RCU conversions) ] Signed-off-by: RongQing.Li <roy.qing.li@gmail.com> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
| * IPv6: Fix not join all-router mcast group when forwarding set.Li Wei2012-03-191-0/+4
| | | | | | | | | | | | | | | | | | | | | | [ Upstream commit d6ddef9e641d1229d4ec841dc75ae703171c3e92 ] When forwarding was set and a new net device is register, we need add this device to the all-router mcast group. Signed-off-by: Li Wei <lw@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
| * ipsec: be careful of non existing mac headersEric Dumazet2012-03-192-9/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit 03606895cd98c0a628b17324fd7b5ff15db7e3cd ] Niccolo Belli reported ipsec crashes in case we handle a frame without mac header (atm in his case) Before copying mac header, better make sure it is present. Bugzilla reference: https://bugzilla.kernel.org/show_bug.cgi?id=42809 Reported-by: Niccolò Belli <darkbasic@linuxsystems.it> Tested-by: Niccolò Belli <darkbasic@linuxsystems.it> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
| * ipv6-multicast: Fix memory leak in IPv6 multicast.Ben Greear2012-02-291-1/+3
| | | | | | | | | | | | | | | | | | | | | | [ Upstream commit 67928c4041606f02725f3c95c4c0404e4532df1b ] If reg_vif_xmit cannot find a routing entry, be sure to free the skb before returning the error. Signed-off-by: Ben Greear <greearb@candelatech.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
| * ipv6-multicast: Fix memory leak in input path.Ben Greear2012-02-291-1/+3
| | | | | | | | | | | | | | | | | | | | | | [ Upstream commit 2015de5fe2a47086a3260802275932bfd810884e ] Have to free the skb before returning if we fail the fib lookup. Signed-off-by: Ben Greear <greearb@candelatech.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
| * net: fix NULL dereferences in check_peer_redir()Eric Dumazet2012-02-136-37/+74
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit d3aaeb38c40e5a6c08dd31a1b64da65c4352be36, along with dependent backports of commits: 69cce1d1404968f78b177a0314f5822d5afdbbfb 9de79c127cccecb11ae6a21ab1499e87aa222880 218fa90f072e4aeff9003d57e390857f4f35513e 580da35a31f91a594f3090b7a2c39b85cb051a12 f7e57044eeb1841847c24aa06766c8290c202583 e049f28883126c689cf95859480d9ee4ab23b7fa ] Gergely Kalman reported crashes in check_peer_redir(). It appears commit f39925dbde778 (ipv4: Cache learned redirect information in inetpeer.) added a race, leading to possible NULL ptr dereference. Since we can now change dst neighbour, we should make sure a reader can safely use a neighbour. Add RCU protection to dst neighbour, and make sure check_peer_redir() can be called safely by different cpus in parallel. As neighbours are already freed after one RCU grace period, this patch should not add typical RCU penalty (cache cold effects) Many thanks to Gergely for providing a pretty report pointing to the bug. Reported-by: Gergely Kalman <synapse@hippy.csoma.elte.hu> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
| * tcp: md5: using remote adress for md5 lookup in rst packetshawnlu2012-02-031-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit 8a622e71f58ec9f092fc99eacae0e6cf14f6e742 ] md5 key is added in socket through remote address. remote address should be used in finding md5 key when sending out reset packet. Signed-off-by: shawnlu <shawn.lu@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
| * ah: Don't return NET_XMIT_DROP on input.Nick Bowler2012-02-031-2/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 4b90a603a1b21d63cf743cc833680cb195a729f6 upstream. When the ahash driver returns -EBUSY, AH4/6 input functions return NET_XMIT_DROP, presumably copied from the output code path. But returning transmit codes on input doesn't make a lot of sense. Since NET_XMIT_DROP is a positive int, this gets interpreted as the next header type (i.e., success). As that can only end badly, remove the check. Signed-off-by: Nick Bowler <nbowler@elliptictech.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
| * ah: Read nexthdr value before overwriting it in ahash input callback.Nick Bowler2012-01-251-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | commit b7ea81a58adc123a4e980cb0eff9eb5c144b5dc7 upstream. The AH4/6 ahash input callbacks read out the nexthdr field from the AH header *after* they overwrite that header. This is obviously not going to end well. Fix it up. Signed-off-by: Nick Bowler <nbowler@elliptictech.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
| * ah: Correctly pass error codes in ahash output callback.Nick Bowler2012-01-251-2/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 069294e813ed5f27f82613b027609bcda5f1b914 upstream. The AH4/6 ahash output callbacks pass nexthdr to xfrm_output_resume instead of the error code. This appears to be a copy+paste error from the input case, where nexthdr is expected. This causes the driver to continuously add AH headers to the datagram until either an allocation fails and the packet is dropped or the ahash driver hits a synchronous fallback and the resulting monstrosity is transmitted. Correct this issue by simply passing the error code unadulterated. Signed-off-by: Nick Bowler <nbowler@elliptictech.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
| * ipip, sit: copy parms.name after register_netdeviceTed Feng2012-01-061-1/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 72b36015ba43a3cca5303f5534d2c3e1899eae29 upstream. Same fix as 731abb9cb2 for ipip and sit tunnel. Commit 1c5cae815d removed an explicit call to dev_alloc_name in ipip_tunnel_locate and ipip6_tunnel_locate, because register_netdevice will now create a valid name, however the tunnel keeps a copy of the name in the private parms structure. Fix this by copying the name back after register_netdevice has successfully returned. This shows up if you do a simple tunnel add, followed by a tunnel show: $ sudo ip tunnel add mode ipip remote 10.2.20.211 $ ip tunnel tunl0: ip/ip remote any local any ttl inherit nopmtudisc tunl%d: ip/ip remote 10.2.20.211 local any ttl inherit $ sudo ip tunnel add mode sit remote 10.2.20.212 $ ip tunnel sit0: ipv6/ip remote any local any ttl 64 nopmtudisc 6rd-prefix 2002::/16 sit%d: ioctl 89f8 failed: No such device sit%d: ipv6/ip remote 10.2.20.212 local any ttl inherit Signed-off-by: Ted Feng <artisdom@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
| * ip6_tunnel: copy parms.name after register_netdeviceJosh Boyer2011-11-261-1/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 731abb9cb27aef6013ce60808a04e04a545f3f4e upstream. Commit 1c5cae815d removed an explicit call to dev_alloc_name in ip6_tnl_create because register_netdevice will now create a valid name. This works for the net_device itself. However the tunnel keeps a copy of the name in the parms structure for the ip6_tnl associated with the tunnel. parms.name is set by copying the net_device name in ip6_tnl_dev_init_gen. That function is called from ip6_tnl_dev_init in ip6_tnl_create, but it is done before register_netdevice is called so the name is set to a bogus value in the parms.name structure. This shows up if you do a simple tunnel add, followed by a tunnel show: [root@localhost ~]# ip -6 tunnel add remote fec0::100 local fec0::200 [root@localhost ~]# ip -6 tunnel show ip6tnl0: ipv6/ipv6 remote :: local :: encaplimit 0 hoplimit 0 tclass 0x00 flowlabel 0x00000 (flowinfo 0x00000000) ip6tnl%d: ipv6/ipv6 remote fec0::100 local fec0::200 encaplimit 4 hoplimit 64 tclass 0x00 flowlabel 0x00000 (flowinfo 0x00000000) [root@localhost ~]# Fix this by moving the strcpy out of ip6_tnl_dev_init_gen, and calling it after register_netdevice has successfully returned. Signed-off-by: Josh Boyer <jwboyer@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
| * ipv6: nullify ipv6_ac_list and ipv6_fl_list when creating new socketYan, Zheng2011-11-111-0/+3
| | | | | | | | | | | | | | | | | | | | | | [ Upstream commit 676a1184e8afd4fed7948232df1ff91517400859 ] ipv6_ac_list and ipv6_fl_list from listening socket are inadvertently shared with new socket created for connection. Signed-off-by: Zheng Yan <zheng.z.yan@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
| * tcp: properly handle md5sig_pool referencesYan, Zheng2011-11-111-3/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit 260fcbeb1ae9e768a44c9925338fbacb0d7e5ba9 ] tcp_v4_clear_md5_list() assumes that multiple tcp md5sig peers only hold one reference to md5sig_pool. but tcp_v4_md5_do_add() increases use count of md5sig_pool for each peer. This patch makes tcp_v4_md5_do_add() only increases use count for the first tcp md5sig peer. Signed-off-by: Zheng Yan <zheng.z.yan@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* | Merge commit 'v3.0.8' into android-3.0Colin Cross2011-10-278-12/+49
|\ \ | |/
| * ipv6: fix NULL dereference in udp6_ufo_fragment()Jason Wang2011-10-162-6/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch fixes the issue caused by ef81bb40bf15f350fe865f31fa42f1082772a576 which is a backport of upstream 87c48fa3b4630905f98268dde838ee43626a060c. The problem does not exist in upstream. We do not check whether route is attached before trying to assign ip identification through route dest which lead NULL pointer dereference. This happens when host bridge transmit a packet from guest. This patch changes ipv6_select_ident() to accept in6_addr as its paramter and fix the issue by using the destination address in ipv6 header when no route is attached. Signed-off-by: Jason Wang <jasowang@redhat.com> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
| * tcp: initialize variable ecn_ok in syncookies pathMike Waychison2011-10-031-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit f0e3d0689da401f7d1981c2777a714ba295ea5ff ] Using a gcc 4.4.3, warnings are emitted for a possibly uninitialized use of ecn_ok. This can happen if cookie_check_timestamp() returns due to not having seen a timestamp. Defaulting to ecn off seems like a reasonable thing to do in this case, so initialized ecn_ok to false. Signed-off-by: Mike Waychison <mikew@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
| * mcast: Fix source address selection for multicast listener reportYan, Zheng2011-10-031-1/+1
| | | | | | | | | | | | | | | | | | | | | | [ Upstream commit e05c4ad3ed874ee4f5e2c969e55d318ec654332c ] Should check use count of include mode filter instead of total number of include mode filters. Signed-off-by: Zheng Yan <zheng.z.yan@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
| * ipv6: Fix ipv6_getsockopt for IPV6_2292PKTOPTIONSDaniel Baluta2011-10-031-4/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit 98e77438aed3cd3343cbb86825127b1d9d2bea33 ] IPV6_2292PKTOPTIONS is broken for 32-bit applications running in COMPAT mode on 64-bit kernels. The same problem was fixed for IPv4 with the patch: ipv4: Fix ip_getsockopt for IP_PKTOPTIONS, commit dd23198e58cd35259dd09e8892bbdb90f1d57748 Signed-off-by: Sorin Dumitru <sdumitru@ixiacom.com> Signed-off-by: Daniel Baluta <dbaluta@ixiacom.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
| * ipv6: make fragment identifications less predictableEric Dumazet2011-08-153-6/+38
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Backport of upstream commit 87c48fa3b4630905f98268dde838ee43626a060c ] Fernando Gont reported current IPv6 fragment identification generation was not secure, because using a very predictable system-wide generator, allowing various attacks. IPv4 uses inetpeer cache to address this problem and to get good performance. We'll use this mechanism when IPv6 inetpeer is stable enough in linux-3.1 For the time being, we use jhash on destination address to provide less predictable identifications. Also remove a spinlock and use cmpxchg() to get better SMP performance. Reported-by: Fernando Gont <fernando@gont.com.ar> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
| * net: Compute protocol sequence numbers and fragment IDs using MD5.David S. Miller2011-08-152-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Computers have become a lot faster since we compromised on the partial MD4 hash which we use currently for performance reasons. MD5 is a much safer choice, and is inline with both RFC1948 and other ISS generators (OpenBSD, Solaris, etc.) Furthermore, only having 24-bits of the sequence number be truly unpredictable is a very serious limitation. So the periodic regeneration and 8-bit counter have been removed. We compute and use a full 32-bit sequence number. For ipv6, DCCP was found to use a 32-bit truncated initial sequence number (it needs 43-bits) and that is fixed here as well. Reported-by: Dan Kaminsky <dan@doxpara.com> Tested-by: Willy Tarreau <w@1wt.eu> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* | netfilter: ipv6: fix crash caused by ipv6_find_hdr()JP Abgrall2011-09-301-6/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When calling: ipv6_find_hdr(skb, &thoff, -1, NULL) on a fragmented packet, thoff would be left with a random value causing callers to read random memory offsets with: skb_header_pointer(skb, thoff, ...) Now we force ipv6_find_hdr() to return a failure in this case. Calling: ipv6_find_hdr(skb, &thoff, -1, &fragoff) will set fragoff as expected, and not return a failure. Change-Id: Ib474e8a4267dd2b300feca325811330329684a88 Signed-off-by: JP Abgrall <jpa@google.com>
* | ipv6: updates to privacy addresses per RFC 4941JP Abgrall2011-08-041-21/+48
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Update the code to handle some of the differences between RFC 3041 and RFC 4941, which obsoletes it. Also a couple of janitorial fixes. - Allow router advertisements to increase the lifetime of temporary addresses. This was not allowed by RFC 3041, but is specified by RFC 4941. It is useful when RA lifetimes are lower than TEMP_{VALID,PREFERRED}_LIFETIME: in this case, the previous code would delete or deprecate addresses prematurely. - Change the default of MAX_RETRY to 3 per RFC 4941. - Add a comment to clarify that the preferred and valid lifetimes in inet6_ifaddr are relative to the timestamp. - Shorten lines to 80 characters in a couple of places. Change-Id: I4da097664d4b1de7c1cebf410895319601c7f1cc Signed-off-by: Lorenzo Colitti <lorenzo@google.com> Signed-off-by: JP Abgrall <jpa@google.com>
* | Merge commit 'v3.0-rc7' into android-3.0Colin Cross2011-07-122-17/+10
|\ \ | |/
| * net: bind() fix error return on wrong address familyMarcus Meissner2011-07-041-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Hi, Reinhard Max also pointed out that the error should EAFNOSUPPORT according to POSIX. The Linux manpages have it as EINVAL, some other OSes (Minix, HPUX, perhaps BSD) use EAFNOSUPPORT. Windows uses WSAEFAULT according to MSDN. Other protocols error values in their af bind() methods in current mainline git as far as a brief look shows: EAFNOSUPPORT: atm, appletalk, l2tp, llc, phonet, rxrpc EINVAL: ax25, bluetooth, decnet, econet, ieee802154, iucv, netlink, netrom, packet, rds, rose, unix, x25, No check?: can/raw, ipv6/raw, irda, l2tp/l2tp_ip Ciao, Marcus Signed-off-by: Marcus Meissner <meissner@suse.de> Cc: Reinhard Max <max@suse.de> Signed-off-by: David S. Miller <davem@davemloft.net>
| * ipv6: Don't put artificial limit on routing table size.David S. Miller2011-07-011-6/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | IPV6, unlike IPV4, doesn't have a routing cache. Routing table entries, as well as clones made in response to route lookup requests, all live in the same table. And all of these things are together collected in the destination cache table for ipv6. This means that routing table entries count against the garbage collection limits, even though such entries cannot ever be reclaimed and are added explicitly by the administrator (rather than being created in response to lookups). Therefore it makes no sense to count ipv6 routing table entries against the GC limits. Add a DST_NOCOUNT destination cache entry flag, and skip the counting if it is set. Use this flag bit in ipv6 when adding routing table entries. Signed-off-by: David S. Miller <davem@davemloft.net>
| * ipv6: Don't change dst->flags using assignments.David S. Miller2011-07-011-10/+2
| | | | | | | | | | | | This blows away any flags already set in the entry. Signed-off-by: David S. Miller <davem@davemloft.net>
* | Merge commit 'v3.0-rc6' into android-3.0Dima Zavin2011-07-071-1/+4
|\ \ | |/
| * udp/recvmsg: Clear MSG_TRUNC flag when starting over for a new packetXufeng Zhang2011-06-211-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Consider this scenario: When the size of the first received udp packet is bigger than the receive buffer, MSG_TRUNC bit is set in msg->msg_flags. However, if checksum error happens and this is a blocking socket, it will goto try_again loop to receive the next packet. But if the size of the next udp packet is smaller than receive buffer, MSG_TRUNC flag should not be set, but because MSG_TRUNC bit is not cleared in msg->msg_flags before receive the next packet, MSG_TRUNC is still set, which is wrong. Fix this problem by clearing MSG_TRUNC flag when starting over for a new packet. Signed-off-by: Xufeng Zhang <xufeng.zhang@windriver.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * ipv6/udp: Use the correct variable to determine non-blocking conditionXufeng Zhang2011-06-211-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | udpv6_recvmsg() function is not using the correct variable to determine whether or not the socket is in non-blocking operation, this will lead to unexpected behavior when a UDP checksum error occurs. Consider a non-blocking udp receive scenario: when udpv6_recvmsg() is called by sock_common_recvmsg(), MSG_DONTWAIT bit of flags variable in udpv6_recvmsg() is cleared by "flags & ~MSG_DONTWAIT" in this call: err = sk->sk_prot->recvmsg(iocb, sk, msg, size, flags & MSG_DONTWAIT, flags & ~MSG_DONTWAIT, &addr_len); i.e. with udpv6_recvmsg() getting these values: int noblock = flags & MSG_DONTWAIT int flags = flags & ~MSG_DONTWAIT So, when udp checksum error occurs, the execution will go to csum_copy_err, and then the problem happens: csum_copy_err: ............... if (flags & MSG_DONTWAIT) return -EAGAIN; goto try_again; ............... But it will always go to try_again as MSG_DONTWAIT has been cleared from flags at call time -- only noblock contains the original value of MSG_DONTWAIT, so the test should be: if (noblock) return -EAGAIN; This is also consistent with what the ipv4/udp code does. Signed-off-by: Xufeng Zhang <xufeng.zhang@windriver.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | Merge commit 'v3.0-rc5' into android-3.0Colin Cross2011-06-292-1/+3
|\ \ | |/
| * net: rfs: enable RFS before first data packet is receivedEric Dumazet2011-06-171-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Le jeudi 16 juin 2011 à 23:38 -0400, David Miller a écrit : > From: Ben Hutchings <bhutchings@solarflare.com> > Date: Fri, 17 Jun 2011 00:50:46 +0100 > > > On Wed, 2011-06-15 at 04:15 +0200, Eric Dumazet wrote: > >> @@ -1594,6 +1594,7 @@ int tcp_v4_do_rcv(struct sock *sk, struct sk_buff *skb) > >> goto discard; > >> > >> if (nsk != sk) { > >> + sock_rps_save_rxhash(nsk, skb->rxhash); > >> if (tcp_child_process(sk, nsk, skb)) { > >> rsk = nsk; > >> goto reset; > >> > > > > I haven't tried this, but it looks reasonable to me. > > > > What about IPv6? The logic in tcp_v6_do_rcv() looks very similar. > > Indeed ipv6 side needs the same fix. > > Eric please add that part and resubmit. And in fact I might stick > this into net-2.6 instead of net-next-2.6 > OK, here is the net-2.6 based one then, thanks ! [PATCH v2] net: rfs: enable RFS before first data packet is received First packet received on a passive tcp flow is not correctly RFS steered. One sock_rps_record_flow() call is missing in inet_accept() But before that, we also must record rxhash when child socket is setup. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> CC: Tom Herbert <therbert@google.com> CC: Ben Hutchings <bhutchings@solarflare.com> CC: Jamal Hadi Salim <hadi@cyberus.ca> Signed-off-by: David S. Miller <davem@conan.davemloft.net>
| * netfilter: fix looped (broad|multi)cast's MAC handlingNicolas Cavallari2011-06-161-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | By default, when broadcast or multicast packet are sent from a local application, they are sent to the interface then looped by the kernel to other local applications, going throught netfilter hooks in the process. These looped packet have their MAC header removed from the skb by the kernel looping code. This confuse various netfilter's netlink queue, netlink log and the legacy ip_queue, because they try to extract a hardware address from these packets, but extracts a part of the IP header instead. This patch prevent NFQUEUE, NFLOG and ip_QUEUE to include a MAC header if there is none in the packet. Signed-off-by: Nicolas Cavallari <cavallar@lri.fr> Signed-off-by: Patrick McHardy <kaber@trash.net>
* | netfilter: have ip*t REJECT set the sock err when an icmp is to be sentJP Abgrall2011-06-202-0/+21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Allow the REJECT --reject-with icmp*blabla to also set the matching error locally on the socket affected by the reject. This allows the process to see an error almost as if it received it via ICMP. It avoids the local process who's ingress packet is rejected to have to wait for a pseudo-eternity until some timeout kicks in. Ideally, this should be enabled with a new iptables flag similar to --reject-with-sock-err For now it is enabled with CONFIG_IP*_NF_TARGET_REJECT_SKERR option. Change-Id: I649a4fd5940029ec0b3233e5abb205da6984891e Signed-off-by: JP Abgrall <jpa@google.com>
* | net: Support nuking IPv6 sockets as well as IPv4.Lorenzo Colitti2011-06-141-0/+17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | On Linux, when an interface goes down all its IPv6 addresses are deleted, so relying on knowing the previous IPv6 addresses on the interface is brittle. Instead, support nuking all sockets that are bound to IP addresses that are not configured and up on the system. This behaviour is triggered by specifying the unspecified address (:: or 0.0.0.0). If an IP address is specified, the behaviour is unchanged, except the ioctl now supports IPv6 as well as IPv4. Signed-off-by: Lorenzo Colitti <lorenzo@google.com>