aboutsummaryrefslogtreecommitdiffstats
path: root/net
Commit message (Collapse)AuthorAgeFilesLines
* net: fix iterating over hashtable in tcp_nuke_addr()Dmitry Torokhov2016-10-291-1/+1
| | | | | | | | | The actual size of the tcp hashinfo table is tcp_hashinfo.ehash_mask + 1 so we need to adjust the loop accordingly to get the sockets hashed into the last bucket. Change-Id: I796b3c7b4a1a7fa35fba9e5192a4a403eb6e17de Signed-off-by: Dmitry Torokhov <dtor@google.com>
* netfilter: x_tables: check for size overflowFlorian Westphal2016-10-291-0/+4
| | | | | | | | | | | | Ben Hawkes says: integer overflow in xt_alloc_table_info, which on 32-bit systems can lead to small structure allocation and a copy_from_user based heap corruption. Change-Id: I13c554c630651a37e3f6a195e9a5f40cddcb29a1 Reported-by: Ben Hawkes <hawkes@google.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* netfilter: x_tables: fix unconditional helperFlorian Westphal2016-10-293-33/+31
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Ben Hawkes says: In the mark_source_chains function (net/ipv4/netfilter/ip_tables.c) it is possible for a user-supplied ipt_entry structure to have a large next_offset field. This field is not bounds checked prior to writing a counter value at the supplied offset. Problem is that mark_source_chains should not have been called -- the rule doesn't have a next entry, so its supposed to return an absolute verdict of either ACCEPT or DROP. However, the function conditional() doesn't work as the name implies. It only checks that the rule is using wildcard address matching. However, an unconditional rule must also not be using any matches (no -m args). The underflow validator only checked the addresses, therefore passing the 'unconditional absolute verdict' test, while mark_source_chains also tested for presence of matches, and thus proceeeded to the next (not-existent) rule. Unify this so that all the callers have same idea of 'unconditional rule'. Change-Id: Id2b4779f2e41b1a82b1d266bb9e11118c4428afc Reported-by: Ben Hawkes <hawkes@google.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* ipv4: Don't do expensive useless work during inetdev destroy.David S. Miller2016-10-293-2/+18
| | | | | | | | | | | | | | | | | | When an inetdev is destroyed, every address assigned to the interface is removed. And in this scenerio we do two pointless things which can be very expensive if the number of assigned interfaces is large: 1) Address promotion. We are deleting all addresses, so there is no point in doing this. 2) A full nf conntrack table purge for every address. We only need to do this once, as is already caught by the existing masq_dev_notifier so masq_inet_event() can skip this. Change-Id: I4b2a3ed665543728451c21465fb90ec89f739135 Reported-by: Solar Designer <solar@openwall.com> Signed-off-by: David S. Miller <davem@davemloft.net> Tested-by: Cyrill Gorcunov <gorcunov@openvz.org>
* ipv6: addrconf: validate new MTU before applying itMarcelo Leitner2016-10-291-1/+16
| | | | | | | | | | | | | | | | | | | | | | | Currently we don't check if the new MTU is valid or not and this allows one to configure a smaller than minimum allowed by RFCs or even bigger than interface own MTU, which is a problem as it may lead to packet drops. If you have a daemon like NetworkManager running, this may be exploited by remote attackers by forging RA packets with an invalid MTU, possibly leading to a DoS. (NetworkManager currently only validates for values too small, but not for too big ones.) The fix is just to make sure the new value is valid. That is, between IPV6_MIN_MTU and interface's MTU. Note that similar check is already performed at ndisc_router_discovery(), for when kernel itself parses the RA. Change-Id: I6b70d0c12a77c7932066982f8797d8024f130d7c Signed-off-by: Marcelo Ricardo Leitner <mleitner@redhat.com> Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: David S. Miller <davem@davemloft.net>
* UPSTREAM: net: Fix use after free in the recvmmsg exit pathArnaldo Carvalho de Melo2016-10-291-19/+19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | (cherry picked from commit 34b88a68f26a75e4fded796f1a49c40f82234b7d) The syzkaller fuzzer hit the following use-after-free: Call Trace: [<ffffffff8175ea0e>] __asan_report_load8_noabort+0x3e/0x40 mm/kasan/report.c:295 [<ffffffff851cc31a>] __sys_recvmmsg+0x6fa/0x7f0 net/socket.c:2261 [< inline >] SYSC_recvmmsg net/socket.c:2281 [<ffffffff851cc57f>] SyS_recvmmsg+0x16f/0x180 net/socket.c:2270 [<ffffffff86332bb6>] entry_SYSCALL_64_fastpath+0x16/0x7a arch/x86/entry/entry_64.S:185 And, as Dmitry rightly assessed, that is because we can drop the reference and then touch it when the underlying recvmsg calls return some packets and then hit an error, which will make recvmmsg to set sock->sk->sk_err, oops, fix it. Reported-and-Tested-by: Dmitry Vyukov <dvyukov@google.com> Cc: Alexander Potapenko <glider@google.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Kostya Serebryany <kcc@google.com> Cc: Sasha Levin <sasha.levin@oracle.com> Fixes: a2e2725541fa ("net: Introduce recvmmsg socket syscall") http://lkml.kernel.org/r/20160122211644.GC2470@redhat.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Change-Id: I2adb0faf595b7b634d9b739dfdd1a47109e20ecb Bug: 30515201
* UPSTREAM: netfilter: x_tables: make sure e->next_offset covers remaining ↵Florian Westphal2016-10-293-6/+12
| | | | | | | | | | | | | blob size (cherry pick from commit 6e94e0cfb0887e4013b3b930fa6ab1fe6bb6ba91) Otherwise this function may read data beyond the ruleset blob. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Change-Id: I9d19ecf3e00a2d52817b35b9042623927895c005 Bug: 29637687
* UPSTREAM: netfilter: x_tables: validate e->target_offset earlyFlorian Westphal2016-10-293-27/+24
| | | | | | | | | | | | | (cherry pick from commit bdf533de6968e9686df777dc178486f600c6e617) We should check that e->target_offset is sane before mark_source_chains gets called since it will fetch the target entry for loop detection. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Change-Id: Ic2dbc31c9525d698e94d4d8875886acf3524abbd Bug: 29637687
* ipv6: sysctl to restrict candidate source addressesErik Kline2016-10-291-1/+18
| | | | | | | | | | | | | | | | | | | | Per RFC 6724, section 4, "Candidate Source Addresses": It is RECOMMENDED that the candidate source addresses be the set of unicast addresses assigned to the interface that will be used to send to the destination (the "outgoing" interface). Add a sysctl to enable this behaviour. Signed-off-by: Erik Kline <ek@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> [Simplified back-port of net-next 3985e8a3611a93bb36789f65db862e5700aab65e] Bug: 19470192 Bug: 21832279 Bug: 22464419 Change-Id: Icd96382f814a6f3ea53f05beb98c266b1929c5a3
* net: validate the range we feed to iov_iter_init() in sys_sendto/sys_recvfromAl Viro2016-10-291-0/+4
| | | | | | | | Bug: 28759139 Change-Id: I561a14b514d714838ef539a94275b117d7f475f4 Cc: stable@vger.kernel.org # v3.19 Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
* fix infoleak in rtnetlinkkangjie2016-10-291-8/+8
| | | | | | | | | | | the stack object “map” has a total size of 32 bytes. Its last 4 bytes are padding generated by compiler. These padding bytes are not initialized and sent out via “nla_put” Bug: 28620102 Change-Id: I13da380c6fe8abca49e3cf9f05293c02b44d2e5e Signed-off-by: kangjie <kangjielu@gmail.com>
* Replace %p with %pK to prevent leaking kernel addressMohamad Ayyash2016-10-291-1/+1
| | | | | | BUG: 27532522 Change-Id: Ic0710a9a8cfc682acd88ecf3bbfeece2d798c4a4 Signed-off-by: Mohamad Ayyash <mkayyash@google.com>
* BACKPORT: Bluetooth: Fix potential NULL dereference in RFCOMM bind callbackJaganath Kanakkassery2016-10-291-7/+11
| | | | | | | | | | | (cherry picked from 951b6a0717db97ce420547222647bcc40bf1eacd) addr can be NULL and it should not be dereferenced before NULL checking. Signed-off-by: Jaganath Kanakkassery <jaganath.k@samsung.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Change-Id: I18bda54bb1427d9443a39a04a5c551720118dc26 Bug: 30149612
* tcp: Use supported random32 in pre-3.8 kernelsMatt Mower2016-10-291-1/+1
| | | | | | | | | | Commit 'tcp: make challenge acks less predictable' makes use of prandom_u32(), which was not introduced until kernel 3.8. It is essentially a more accurate name given to random32() available in pre-3.8 kernels. Change-Id: I6cd471acd68b0c4d08e87aaea28c2284b8c9fce0 Signed-off-by: Matt Mower <mowerm@gmail.com>
* BACKPORT: tcp: make challenge acks less predictableEric Dumazet2016-10-291-3/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | (cherry picked from commit 75ff39ccc1bd5d3c455b6822ab09e533c551f758) Yue Cao claims that current host rate limiting of challenge ACKS (RFC 5961) could leak enough information to allow a patient attacker to hijack TCP sessions. He will soon provide details in an academic paper. This patch increases the default limit from 100 to 1000, and adds some randomization so that the attacker can no longer hijack sessions without spending a considerable amount of probes. Based on initial analysis and patch from Linus. Note that we also have per socket rate limiting, so it is tempting to remove the host limit in the future. v2: randomize the count of challenge acks per second, not the period. Fixes: 282f23c6ee34 ("tcp: implement RFC 5961 3.2") Reported-by: Yue Cao <ycao009@ucr.edu> Signed-off-by: Eric Dumazet <edumazet@google.com> Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Yuchung Cheng <ycheng@google.com> Cc: Neal Cardwell <ncardwell@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Acked-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Change-Id: Ib46ba66f5e4a5a7c81bfccd7b0aa83c3d9e1b3bb Bug: 30809774
* proc: Usable inode numbers for the namespace file descriptors.Eric W. Biederman2016-04-031-0/+24
| | | | | | | | | | | | | | | | | | | | | | | | | | Assign a unique proc inode to each namespace, and use that inode number to ensure we only allocate at most one proc inode for every namespace in proc. A single proc inode per namespace allows userspace to test to see if two processes are in the same namespace. This has been a long requested feature and only blocked because a naive implementation would put the id in a global space and would ultimately require having a namespace for the names of namespaces, making migration and certain virtualization tricks impossible. We still don't have per superblock inode numbers for proc, which appears necessary for application unaware checkpoint/restart and migrations (if the application is using namespace file descriptors) but that is now allowd by the design if it becomes important. I have preallocated the ipc and uts initial proc inode numbers so their structures can be statically initialized. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> (cherry picked from commit 98f842e675f96ffac96e6c50315790912b2812be)
* security: remove the security_netlink_recv hook as it is equivalent to capable()Eric Paris2016-03-117-7/+7
| | | | | | | | | | Once upon a time netlink was not sync and we had to get the effective capabilities from the skb that was being received. Today we instead get the capabilities from the current task. This has rendered the entire purpose of the hook moot as it is now functionally equivalent to the capable() call. Signed-off-by: Eric Paris <eparis@redhat.com>
* nf: IDLETIMER: Adds the uid field in the msgRuchi Kandoi2016-03-111-5/+32
| | | | | | | | | | | Message notifications contains an additional uid field. This field represents the uid that was responsible for waking the radio. And hence it is present only in notifications stating that the radio is now active. Change-Id: I18fc73eada512e370d7ab24fc9f890845037b729 Signed-off-by: Ruchi Kandoi <kandoiruchi@google.com> Bug: 20264396
* netfilter: IDLETIMER: fix invalid deference of timerJP Abgrall2016-03-111-1/+1
| | | | | | | | "timer" was checked for null, but used later without being re-checked. Change-Id: Ib4d08cd49860c9f157d1cac556705ba85cd44f4e Reported-by: dan.carpenter@oracle.com Signed-off-by: JP Abgrall <jpa@google.com>
* nf: Remove compilation error caused byRuchi Kandoi2016-03-111-1/+0
| | | | | | e254d2c28c880da28626af6d53b7add5f7d6afee Signed-off-by: Ruchi Kandoi <kandoiruchi@google.com>
* nf: IDLETIMER: time-stamp and suspend/resume handling.Ruchi Kandoi2016-03-111-18/+152
| | | | | | | | | | | | | | | | | Message notifications contains an additional timestamp field in nano seconds. The expiry time for the timers are modified during suspend/resume. If timer was supposed to expire while the system is suspended then a notification is sent when it resumes with the timestamp of the scheduled expiry. Removes the race condition for multiple work scheduled. Bug: 13247811 Change-Id: I752c5b00225fe7085482819f975cc0eb5af89bff Signed-off-by: Ruchi Kandoi <kandoiruchi@google.com> Conflicts: net/netfilter/xt_IDLETIMER.c
* vfs: fix up ENOIOCTLCMD error handlingLinus Torvalds2016-01-081-15/+1
| | | | | | | | | | | | | | | | | | | | | We're doing some odd things there, which already messes up various users (see the net/socket.c code that this removes), and it was going to add yet more crud to the block layer because of the incorrect error code translation. ENOIOCTLCMD is not an error return that should be returned to user mode from the "ioctl()" system call, but it should *not* be translated as EINVAL ("Invalid argument"). It should be translated as ENOTTY ("Inappropriate ioctl for device"). That EINVAL confusion has apparently so permeated some code that the block layer actually checks for it, which is sad. We continue to do so for now, but add a big comment about how wrong that is, and we should remove it entirely eventually. In the meantime, this tries to keep the changes localized to just the EINVAL -> ENOTTY fix, and removing code that makes it harder to do the right thing. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* netfilter: nf_conntrack_dccp: fix skb_header_pointer API usagesDaniel Borkmann2016-01-051-3/+3
| | | | | | | | | | | | | | | | | | | | | | Some occurences in the netfilter tree use skb_header_pointer() in the following way ... struct dccp_hdr _dh, *dh; ... skb_header_pointer(skb, dataoff, sizeof(_dh), &dh); ... where dh itself is a pointer that is being passed as the copy buffer. Instead, we need to use &_dh as the forth argument so that we're copying the data into an actual buffer that sits on the stack. Currently, we probably could overwrite memory on the stack (e.g. with a possibly mal-formed DCCP packet), but unintentionally, as we only want the buffer to be placed into _dh variable. Change-Id: I73508115362de789b6133f1f6d103210f6965d18 Fixes: 2bc780499aa3 ("[NETFILTER]: nf_conntrack: add DCCP protocol support") Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* netfilter: conntrack: disable generic tracking for known protocolsFlorian Westphal2016-01-051-1/+25
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Given following iptables ruleset: -P FORWARD DROP -A FORWARD -m sctp --dport 9 -j ACCEPT -A FORWARD -p tcp --dport 80 -j ACCEPT -A FORWARD -p tcp -m conntrack -m state ESTABLISHED,RELATED -j ACCEPT One would assume that this allows SCTP on port 9 and TCP on port 80. Unfortunately, if the SCTP conntrack module is not loaded, this allows *all* SCTP communication, to pass though, i.e. -p sctp -j ACCEPT, which we think is a security issue. This is because on the first SCTP packet on port 9, we create a dummy "generic l4" conntrack entry without any port information (since conntrack doesn't know how to extract this information). All subsequent packets that are unknown will then be in established state since they will fallback to proto_generic and will match the 'generic' entry. Our originally proposed version [1] completely disabled generic protocol tracking, but Jozsef suggests to not track protocols for which a more suitable helper is available, hence we now mitigate the issue for in tree known ct protocol helpers only, so that at least NAT and direction information will still be preserved for others. [1] http://www.spinics.net/lists/netfilter-devel/msg33430.html Joint work with Daniel Borkmann. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Acked-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Conflicts: net/netfilter/nf_conntrack_proto_generic.c Change-Id: I5f7ddfa2d29672cc4533ad907bd7605bdc6c4bf7
* net: add validation for the socket syscall protocol argumentHannes Frederic Sowa2016-01-055-0/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | 郭永刚 reported that one could simply crash the kernel as root by using a simple program: int socket_fd; struct sockaddr_in addr; addr.sin_port = 0; addr.sin_addr.s_addr = INADDR_ANY; addr.sin_family = 10; socket_fd = socket(10,3,0x40000000); connect(socket_fd , &addr,16); AF_INET, AF_INET6 sockets actually only support 8-bit protocol identifiers. inet_sock's skc_protocol field thus is sized accordingly, thus larger protocol identifiers simply cut off the higher bits and store a zero in the protocol fields. This could lead to e.g. NULL function pointer because as a result of the cut off inet_num is zero and we call down to inet_autobind, which is NULL for raw sockets. kernel: Call Trace: kernel: [<ffffffff816db90e>] ? inet_autobind+0x2e/0x70 kernel: [<ffffffff816db9a4>] inet_dgram_connect+0x54/0x80 kernel: [<ffffffff81645069>] SYSC_connect+0xd9/0x110 kernel: [<ffffffff810ac51b>] ? ptrace_notify+0x5b/0x80 kernel: [<ffffffff810236d8>] ? syscall_trace_enter_phase2+0x108/0x200 kernel: [<ffffffff81645e0e>] SyS_connect+0xe/0x10 kernel: [<ffffffff81779515>] tracesys_phase2+0x84/0x89 I found no particular commit which introduced this problem. Change-Id: I653fad90da54908144cc8916c2dccb1fa6f14eed CVE: CVE-2015-8543 Cc: Cong Wang <cwang@twopensource.com> Reported-by: 郭永刚 <guoyonggang@360.cn> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* bluetooth: Validate socket address length in sco_sock_bind().David S. Miller2016-01-051-0/+3
| | | | | Change-Id: I890640975f1af64f71947b6a1820249e08f6375b Signed-off-by: David S. Miller <davem@davemloft.net>
* ipv4: try to cache dst_entries which would cause a redirectHannes Frederic Sowa2016-01-052-2/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Not caching dst_entries which cause redirects could be exploited by hosts on the same subnet, causing a severe DoS attack. This effect aggravated since commit f88649721268999 ("ipv4: fix dst race in sk_dst_get()"). Lookups causing redirects will be allocated with DST_NOCACHE set which will force dst_release to free them via RCU. Unfortunately waiting for RCU grace period just takes too long, we can end up with >1M dst_entries waiting to be released and the system will run OOM. rcuos threads cannot catch up under high softirq load. Attaching the flag to emit a redirect later on to the specific skb allows us to cache those dst_entries thus reducing the pressure on allocation and deallocation. This issue was discovered by Marcelo Leitner. Cc: Julian Anastasov <ja@ssi.bg> Signed-off-by: Marcelo Leitner <mleitner@redhat.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: David S. Miller <davem@davemloft.net> Conflicts: include/net/ip.h net/ipv4/route.c Change-Id: I53e4b500a4db2f5fece937a42a3bd810b2640c44
* udp: fix behavior of wrong checksumsEric Dumazet2016-01-052-8/+4
| | | | | | | | | | | | | | | | | | | | | | We have two problems in UDP stack related to bogus checksums : 1) We return -EAGAIN to application even if receive queue is not empty. This breaks applications using edge trigger epoll() 2) Under UDP flood, we can loop forever without yielding to other processes, potentially hanging the host, especially on non SMP. This patch is an attempt to make things better. We might in the future add extra support for rt applications wanting to better control time spent doing a recv() in a hostile environment. For example we could validate checksums before queuing packets in socket receive queue. Change-Id: I9355321ac7ee564d56c342fa7738b918052bf308 Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* ipv4: Missing sk_nulls_node_init() in ping_unhash().David S. Miller2016-01-051-0/+1
| | | | | | | | | | | | If we don't do that, then the poison value is left in the ->pprev backlink. This can cause crashes if we do a disconnect, followed by a connect(). Change-Id: I13536e10c0b67bf9830af91f7ed3c4f3653d11cb Tested-by: Linus Torvalds <torvalds@linux-foundation.org> Reported-by: Wen Xu <hotdog3645@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: llc: use correct size for sysctl timeout entriesSasha Levin2016-01-051-4/+4
| | | | | | | | | | The timeout entries are sizeof(int) rather than sizeof(long), which means that when they were getting read we'd also leak kernel memory to userspace along with the timeout values. Change-Id: I328d1186720a6f70f555eeeb62c83ee69814868d Signed-off-by: Sasha Levin <sasha.levin@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* ipv6: Don't reduce hop limit for an interfaceD.S. Ljungmark2016-01-051-1/+8
| | | | | | | | | | | | | | | | | | | | A local route may have a lower hop_limit set than global routes do. RFC 3756, Section 4.2.7, "Parameter Spoofing" > 1. The attacker includes a Current Hop Limit of one or another small > number which the attacker knows will cause legitimate packets to > be dropped before they reach their destination. > As an example, one possible approach to mitigate this threat is to > ignore very small hop limits. The nodes could implement a > configurable minimum hop limit, and ignore attempts to set it below > said limit. Change-Id: I51ee1778e3d2d5fa1aefbdf1ad8869e4e8dc28b2 Signed-off-by: D.S. Ljungmark <ljungmark@modio.se> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: ipv6: allow choosing optimistic addresses with use_optimisticErik Kline2015-04-111-1/+3
| | | | | | | | | | | | | | | | The use_optimistic sysctl makes optimistic IPv6 addresses equivalent to preferred addresses for source address selection (e.g., when calling connect()), but it does not allow an application to bind to optimistic addresses. This behaviour is inconsistent - for example, it doesn't make sense for bind() to an optimistic address fail with EADDRNOTAVAIL, but connect() to choose that address outgoing address on the same socket. Bug: 17769720 Bug: 18609055 Change-Id: I9de0d6c92ac45e29d28e318ac626c71806666f13 Signed-off-by: Erik Kline <ek@google.com> Signed-off-by: Lorenzo Colitti <lorenzo@google.com>
* net/ping: handle protocol mismatching scenarioJane Zhou2015-04-111-0/+2
| | | | | | | | | | | | | | | | | | | | | | | ping_lookup() may return a wrong sock if sk_buff's and sock's protocols dont' match. For example, sk_buff's protocol is ETH_P_IPV6, but sock's sk_family is AF_INET, in that case, if sk->sk_bound_dev_if is zero, a wrong sock will be returned. the fix is to "continue" the searching, if no matching, return NULL. [cherry-pick of net 91a0b603469069cdcce4d572b7525ffc9fd352a6] Bug: 18512516 Change-Id: I520223ce53c0d4e155c37d6b65a03489cc7fd494 Cc: "David S. Miller" <davem@davemloft.net> Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> Cc: James Morris <jmorris@namei.org> Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org> Cc: Patrick McHardy <kaber@trash.net> Cc: netdev@vger.kernel.org Cc: stable@vger.kernel.org Signed-off-by: Jane Zhou <a17711@motorola.com> Signed-off-by: Yiwei Zhao <gbjc64@motorola.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: ipv6: Add a sysctl to make optimistic addresses useful candidatesErik Kline2014-11-281-2/+44
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add a sysctl that causes an interface's optimistic addresses to be considered equivalent to other non-deprecated addresses for source address selection purposes. Preferred addresses will still take precedence over optimistic addresses, subject to other ranking in the source address selection algorithm. This is useful where different interfaces are connected to different networks from different ISPs (e.g., a cell network and a home wifi network). The current behaviour complies with RFC 3484/6724, and it makes sense if the host has only one interface, or has multiple interfaces on the same network (same or cooperating administrative domain(s), but not in the multiple distinct networks case. For example, if a mobile device has an IPv6 address on an LTE network and then connects to IPv6-enabled wifi, while the wifi IPv6 address is undergoing DAD, IPv6 connections will try use the wifi default route with the LTE IPv6 address, and will get stuck until they time out. Also, because optimistic nodes can receive frames, issue an RTM_NEWADDR as soon as DAD starts (with the IFA_F_OPTIMSTIC flag appropriately set). A second RTM_NEWADDR is sent if DAD completes (the address flags have changed), otherwise an RTM_DELADDR is sent. Also: add an entry in ip-sysctl.txt for optimistic_dad. [backport of net-next 7fd2561e4ebdd070ebba6d3326c4c5b13942323f] Signed-off-by: Erik Kline <ek@google.com> Acked-by: Lorenzo Colitti <lorenzo@google.com> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net> Bug: 17769720 Change-Id: I440a9b8c788db6767d191bbebfd2dff481aa9e0d
* net: ipv6: autoconf routes into per-device tablesLorenzo Colitti2014-11-272-44/+99
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, IPv6 router discovery always puts routes into RT6_TABLE_MAIN. This causes problems for connection managers that want to support multiple simultaneous network connections and want control over which one is used by default (e.g., wifi and wired). To work around this connection managers typically take the routes they prefer and copy them to static routes with low metrics in the main table. This puts the burden on the connection manager to watch netlink to see if the routes have changed, delete the routes when their lifetime expires, etc. Instead, this patch adds a per-interface sysctl to have the kernel put autoconf routes into different tables. This allows each interface to have its own autoconf table, and choosing the default interface (or using different interfaces at the same time for different types of traffic) can be done using appropriate ip rules. The sysctl behaves as follows: - = 0: default. Put routes into RT6_TABLE_MAIN as before. - > 0: manual. Put routes into the specified table. - < 0: automatic. Add the absolute value of the sysctl to the device's ifindex, and use that table. The automatic mode is most useful in conjunction with net.ipv6.conf.default.accept_ra_rt_table. A connection manager or distribution could set it to, say, -100 on boot, and thereafter just use IP rules. Change-Id: I093d39fb06ec413905dc0d0d5792c1bc5d5c73a9 Signed-off-by: Lorenzo Colitti <lorenzo@google.com> Conflicts: net/ipv6/addrconf.c net/ipv6/route.c
* net: defer net_secret[] initializationEric Dumazet2014-11-262-4/+5
| | | | | | | | | | | Instead of feeding net_secret[] at boot time, defer the init at the point first socket is created. This permits some platforms to use better entropy sources than the ones available at boot time. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net/l2tp: don't fall back on UDP [get|set]sockoptSasha Levin2014-11-261-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | The l2tp [get|set]sockopt() code has fallen back to the UDP functions for socket option levels != SOL_PPPOL2TP since day one, but that has never actually worked, since the l2tp socket isn't an inet socket. As David Miller points out: "If we wanted this to work, it'd have to look up the tunnel and then use tunnel->sk, but I wonder how useful that would be" Since this can never have worked so nobody could possibly have depended on that functionality, just remove the broken code and return -EINVAL. Reported-by: Sasha Levin <sasha.levin@oracle.com> Acked-by: James Chapman <jchapman@katalix.com> Acked-by: David Miller <davem@davemloft.net> Cc: Phil Turnbull <phil.turnbull@oracle.com> Cc: Vegard Nossum <vegard.nossum@oracle.com> Cc: Willy Tarreau <w@1wt.eu> Cc: stable@vger.kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Change-Id: I63a8c6572971c62370442c7e74c53e66b43e4b1b
* net: ipv4: current group_info should be put after using.JP Abgrall2014-11-261-4/+11
| | | | | | | | | | | | | | | | | | Plug a group_info refcount leak in ping_init. group_info is only needed during initialization and the code failed to release the reference on exit. While here move grabbing the reference to a place where it is actually needed. Signed-off-by: Chuansheng Liu <chuansheng.liu@intel.com> Signed-off-by: Zhang Dongxing <dongxing.zhang@intel.com> Signed-off-by: xiaoming wang <xiaoming.wang@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net> Conflicts: net/ipv4/ping.c Change-Id: I51931e439cce7f19b4179f5828d7355bb3b1dcfa
* net: ipv6: ping: Use socket mark in routing lookupLorenzo Colitti2014-11-261-0/+1
| | | | | | | | [net-next commit bf439b3154ce49d81a79b14f9fab18af99018ae2] Change-Id: I8356e9132088c75d4510021c6e4c2641d772087a Signed-off-by: Lorenzo Colitti <lorenzo@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: Fix "ip rule delete table 256"Andreas Henriksson2014-11-261-1/+2
| | | | | | | | | | | | | | | | | | | | | [ Upstream commit 13eb2ab2d33c57ebddc57437a7d341995fc9138c ] When trying to delete a table >= 256 using iproute2 the local table will be deleted. The table id is specified as a netlink attribute when it needs more then 8 bits and iproute2 then sets the table field to RT_TABLE_UNSPEC (0). Preconditions to matching the table id in the rule delete code doesn't seem to take the "table id in netlink attribute" into condition so the frh_get_table helper function never gets to do its job when matching against current rule. Use the helper function twice instead of peaking at the table value directly. Originally reported at: http://bugs.debian.org/724783 Reported-by: Nicolas HICHER <nhicher@avencall.com> Signed-off-by: Andreas Henriksson <andreas@fatal.se> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* Merge remote-tracking branch 'linux/linux-3.0.y' into stable-newpvrZiyann2014-11-2685-301/+634
|\ | | | | | | | | Conflicts: arch/arm/include/asm/hardware/cache-l2x0.h
| * ipv6: tcp: fix panic in SYN processingEric Dumazet2013-10-221-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit c16a98ed91597b40b22b540c6517103497ef8e74 upstream. commit 72a3effaf633bc ([NET]: Size listen hash tables using backlog hint) added a bug allowing inet6_synq_hash() to return an out of bound array index, because of u16 overflow. Bug can happen if system admins set net.core.somaxconn & net.ipv4.tcp_max_syn_backlog sysctls to values greater than 65536 Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Cc: Willy Tarreau <w@1wt.eu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
| * ipv6: udp packets following an UFO enqueued packet need also be handled by UFOHannes Frederic Sowa2013-10-131-31/+22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit 2811ebac2521ceac84f2bdae402455baa6a7fb47 ] In the following scenario the socket is corked: If the first UDP packet is larger then the mtu we try to append it to the write queue via ip6_ufo_append_data. A following packet, which is smaller than the mtu would be appended to the already queued up gso-skb via plain ip6_append_data. This causes random memory corruptions. In ip6_ufo_append_data we also have to be careful to not queue up the same skb multiple times. So setup the gso frame only when no first skb is available. This also fixes a shortcoming where we add the current packet's length to cork->length but return early because of a packet > mtu with dontfrag set (instead of sutracting it again). Found with trinity. Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Reported-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
| * ipv4 igmp: use in_dev_put in timer handlers instead of __in_dev_putSalam Noureddine2013-10-131-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit e2401654dd0f5f3fb7a8d80dad9554d73d7ca394 ] It is possible for the timer handlers to run after the call to ip_mc_down so use in_dev_put instead of __in_dev_put in the handler function in order to do proper cleanup when the refcnt reaches 0. Otherwise, the refcnt can reach zero without the in_device being destroyed and we end up leaking a reference to the net_device and see messages like the following, unregister_netdevice: waiting for eth0 to become free. Usage count = 1 Tested on linux-3.4.43. Signed-off-by: Salam Noureddine <noureddine@aristanetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
| * ipv6 mcast: use in6_dev_put in timer handlers instead of __in6_dev_putSalam Noureddine2013-10-131-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit 9260d3e1013701aa814d10c8fc6a9f92bd17d643 ] It is possible for the timer handlers to run after the call to ipv6_mc_down so use in6_dev_put instead of __in6_dev_put in the handler function in order to do proper cleanup when the refcnt reaches 0. Otherwise, the refcnt can reach zero without the inet6_dev being destroyed and we end up leaking a reference to the net_device and see messages like the following, unregister_netdevice: waiting for eth0 to become free. Usage count = 1 Tested on linux-3.4.43. Signed-off-by: Salam Noureddine <noureddine@aristanetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
| * ip: generate unique IP identificator if local fragmentation is allowedAnsis Atteka2013-10-137-11/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit 703133de331a7a7df47f31fb9de51dc6f68a9de8 ] If local fragmentation is allowed, then ip_select_ident() and ip_select_ident_more() need to generate unique IDs to ensure correct defragmentation on the peer. For example, if IPsec (tunnel mode) has to encrypt large skbs that have local_df bit set, then all IP fragments that belonged to different ESP datagrams would have used the same identificator. If one of these IP fragments would get lost or reordered, then peer could possibly stitch together wrong IP fragments that did not belong to the same datagram. This would lead to a packet loss or data corruption. Signed-off-by: Ansis Atteka <aatteka@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
| * bridge: Clamp forward_delay when enabling STPHerbert Xu2013-10-133-8/+26
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit be4f154d5ef0ca147ab6bcd38857a774133f5450 ] At some point limits were added to forward_delay. However, the limits are only enforced when STP is enabled. This created a scenario where you could have a value outside the allowed range while STP is disabled, which then stuck around even after STP is enabled. This patch fixes this by clamping the value when we enable STP. I had to move the locking around a bit to ensure that there is no window where someone could insert a value outside the range while we're in the middle of enabling STP. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
| * resubmit bridge: fix message_age_timer calculationChris Healy2013-10-131-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit 9a0620133ccce9dd35c00a96405c8d80938c2cc0 ] This changes the message_age_timer calculation to use the BPDU's max age as opposed to the local bridge's max age. This is in accordance with section 8.6.2.3.2 Step 2 of the 802.1D-1998 sprecification. With the current implementation, when running with very large bridge diameters, convergance will not always occur even if a root bridge is configured to have a longer max age. Tested successfully on bridge diameters of ~200. Signed-off-by: Chris Healy <cphealy@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
| * net: sctp: fix ipv6 ipsec encryption bug in sctp_v6_xmitDaniel Borkmann2013-10-131-30/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit 95ee62083cb6453e056562d91f597552021e6ae7 ] Alan Chester reported an issue with IPv6 on SCTP that IPsec traffic is not being encrypted, whereas on IPv4 it is. Setting up an AH + ESP transport does not seem to have the desired effect: SCTP + IPv4: 22:14:20.809645 IP (tos 0x2,ECT(0), ttl 64, id 0, offset 0, flags [DF], proto AH (51), length 116) 192.168.0.2 > 192.168.0.5: AH(spi=0x00000042,sumlen=16,seq=0x1): ESP(spi=0x00000044,seq=0x1), length 72 22:14:20.813270 IP (tos 0x2,ECT(0), ttl 64, id 0, offset 0, flags [DF], proto AH (51), length 340) 192.168.0.5 > 192.168.0.2: AH(spi=0x00000043,sumlen=16,seq=0x1): SCTP + IPv6: 22:31:19.215029 IP6 (class 0x02, hlim 64, next-header SCTP (132) payload length: 364) fe80::222:15ff:fe87:7fc.3333 > fe80::92e6:baff:fe0d:5a54.36767: sctp 1) [INIT ACK] [init tag: 747759530] [rwnd: 62464] [OS: 10] [MIS: 10] Moreover, Alan says: This problem was seen with both Racoon and Racoon2. Other people have seen this with OpenSwan. When IPsec is configured to encrypt all upper layer protocols the SCTP connection does not initialize. After using Wireshark to follow packets, this is because the SCTP packet leaves Box A unencrypted and Box B believes all upper layer protocols are to be encrypted so it drops this packet, causing the SCTP connection to fail to initialize. When IPsec is configured to encrypt just SCTP, the SCTP packets are observed unencrypted. In fact, using `socat sctp6-listen:3333 -` on one end and transferring "plaintext" string on the other end, results in cleartext on the wire where SCTP eventually does not report any errors, thus in the latter case that Alan reports, the non-paranoid user might think he's communicating over an encrypted transport on SCTP although he's not (tcpdump ... -X): ... 0x0030: 5d70 8e1a 0003 001a 177d eb6c 0000 0000 ]p.......}.l.... 0x0040: 0000 0000 706c 6169 6e74 6578 740a 0000 ....plaintext... Only in /proc/net/xfrm_stat we can see XfrmInTmplMismatch increasing on the receiver side. Initial follow-up analysis from Alan's bug report was done by Alexey Dobriyan. Also thanks to Vlad Yasevich for feedback on this. SCTP has its own implementation of sctp_v6_xmit() not calling inet6_csk_xmit(). This has the implication that it probably never really got updated along with changes in inet6_csk_xmit() and therefore does not seem to invoke xfrm handlers. SCTP's IPv4 xmit however, properly calls ip_queue_xmit() to do the work. Since a call to inet6_csk_xmit() would solve this problem, but result in unecessary route lookups, let us just use the cached flowi6 instead that we got through sctp_v6_get_dst(). Since all SCTP packets are being sent through sctp_packet_transmit(), we do the route lookup / flow caching in sctp_transport_route(), hold it in tp->dst and skb_dst_set() right after that. If we would alter fl6->daddr in sctp_v6_xmit() to np->opt->srcrt, we possibly could run into the same effect of not having xfrm layer pick it up, hence, use fl6_update_dst() in sctp_v6_get_dst() instead to get the correct source routed dst entry, which we assign to the skb. Also source address routing example from 625034113 ("sctp: fix sctp to work with ipv6 source address routing") still works with this patch! Nevertheless, in RFC5095 it is actually 'recommended' to not use that anyway due to traffic amplification [1]. So it seems we're not supposed to do that anyway in sctp_v6_xmit(). Moreover, if we overwrite the flow destination here, the lower IPv6 layer will be unable to put the correct destination address into IP header, as routing header is added in ipv6_push_nfrag_opts() but then probably with wrong final destination. Things aside, result of this patch is that we do not have any XfrmInTmplMismatch increase plus on the wire with this patch it now looks like: SCTP + IPv6: 08:17:47.074080 IP6 2620:52:0:102f:7a2b:cbff:fe27:1b0a > 2620:52:0:102f:213:72ff:fe32:7eba: AH(spi=0x00005fb4,seq=0x1): ESP(spi=0x00005fb5,seq=0x1), length 72 08:17:47.074264 IP6 2620:52:0:102f:213:72ff:fe32:7eba > 2620:52:0:102f:7a2b:cbff:fe27:1b0a: AH(spi=0x00003d54,seq=0x1): ESP(spi=0x00003d55,seq=0x1), length 296 This fixes Kernel Bugzilla 24412. This security issue seems to be present since 2.6.18 kernels. Lets just hope some big passive adversary in the wild didn't have its fun with that. lksctp-tools IPv6 regression test suite passes as well with this patch. [1] http://www.secdev.org/conf/IPv6_RH_security-csw07.pdf Reported-by: Alan Chester <alan.chester@tekelec.com> Reported-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Cc: Steffen Klassert <steffen.klassert@secunet.com> Cc: Hannes Frederic Sowa <hannes@stressinduktion.org> Acked-by: Vlad Yasevich <vyasevich@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
| * netpoll: fix NULL pointer dereference in netpoll_cleanupNikolay Aleksandrov2013-10-131-5/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit d0fe8c888b1fd1a2f84b9962cabcb98a70988aec ] I've been hitting a NULL ptr deref while using netconsole because the np->dev check and the pointer manipulation in netpoll_cleanup are done without rtnl and the following sequence happens when having a netconsole over a vlan and we remove the vlan while disabling the netconsole: CPU 1 CPU2 removes vlan and calls the notifier enters store_enabled(), calls netdev_cleanup which checks np->dev and then waits for rtnl executes the netconsole netdev release notifier making np->dev == NULL and releases rtnl continues to dereference a member of np->dev which at this point is == NULL Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>