aboutsummaryrefslogtreecommitdiffstats
path: root/net/core
Commit message (Collapse)AuthorAgeFilesLines
* Backport mac80211 from 3.4 kernelWolfgang Wiedmeyer2017-01-211-0/+20
| | | | | | | | | The ath9k_htc driver depends on mac80211, but mac80211 can't be build. The reason is that net/wireless is almost completely backported from a 3.4 kernel. To follow suit, mac80211 is also backported from 3.4, more precisely from 3.4.113. This makes mac80211 build. Signed-off-by: Wolfgang Wiedmeyer <wolfgit@wiedmeyer.de>
* net: avoid signed overflows for SO_{SND|RCV}BUFFORCEEric Dumazet2017-01-081-2/+2
| | | | | | | | | | | | | | | | | | | CAP_NET_ADMIN users should not be allowed to set negative sk_sndbuf or sk_rcvbuf values, as it can lead to various memory corruptions, crashes, OOM... Note that before commit 82981930125a ("net: cleanups in sock_setsockopt()"), the bug was even more serious, since SO_SNDBUF and SO_RCVBUF were vulnerable. This needs to be backported to all known linux kernels. Again, many thanks to syzkaller team for discovering this gem. Change-Id: I7b3a4b234eee4e3b2b2766f4d61a44d92e76095d Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Andrey Konovalov <andreyknvl@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: cleanups in sock_setsockopt()Eric Dumazet2017-01-081-27/+15
| | | | | | | | | Use min_t()/max_t() macros, reformat two comments, use !!test_bit() to match !!sock_flag() Change-Id: Ie9ef0586af81908916e7df27ea3c4508982dc42c Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* fix infoleak in rtnetlinkkangjie2016-10-191-8/+8
| | | | | | | | | | | the stack object “map” has a total size of 32 bytes. Its last 4 bytes are padding generated by compiler. These padding bytes are not initialized and sent out via “nla_put” Bug: 28620102 Change-Id: I13da380c6fe8abca49e3cf9f05293c02b44d2e5e Signed-off-by: kangjie <kangjielu@gmail.com>
* net: add length argument to skb_copy_and_csum_datagram_iovecSabrina Dubroca2016-03-151-1/+5
| | | | | | | | | | | | | | | | | | Without this length argument, we can read past the end of the iovec in memcpy_toiovec because we have no way of knowing the total length of the iovec's buffers. This is needed for stable kernels where 89c22d8c3b27 ("net: Fix skb csum races when peeking") has been backported but that don't have the ioviter conversion, which is almost all the stable trees <= 3.18. This also fixes a kernel crash for NFS servers when the client uses -onfsvers=3,proto=udp to mount the export. Change-Id: I1865e3d7a1faee42a5008a9ad58c4d3323ea4bab Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Reviewed-by: Hannes Frederic Sowa <hannes@stressinduktion.org> (cherry picked from commit c91234366e4cfd4f70c73e7d79ede92a6e462a88)
* neigh: Better handling of transition to NUD_PROBE stateErik Kline2016-02-141-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | [1] When entering NUD_PROBE state via neigh_update(), perhaps received from userspace, correctly (re)initialize the probes count to zero. This is useful for forcing revalidation of a neighbor (for example if the host is attempting to do DNA [IPv4 4436, IPv6 6059]). [2] Notify listeners when a neighbor goes into NUD_PROBE state. By sending notifications on entry to NUD_PROBE state listeners get more timely warnings of imminent connectivity issues. The current notifications on entry to NUD_STALE have somewhat limited usefulness: NUD_STALE is a perfectly normal state, as is NUD_DELAY, whereas notifications on entry to NUD_FAILURE come after a neighbor reachability problem has been confirmed (typically after three probes). [Cherry-pick of upstream 765c9c639fbb132af0cafc6e1da22fe6cea26bb8] Signed-off-by: Erik Kline <ek@google.com> Acked-By: Lorenzo Colitti <lorenzo@google.com> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net> Change-Id: I3a48f267a1bf752daeb3ff8a28de926ac66f5ad8
* Merge remote-tracking branch 'korg/linux-3.0.y' into cm-13.0rogersb112015-11-108-34/+44
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | Conflicts: crypto/algapi.c drivers/gpu/drm/i915/i915_debugfs.c drivers/gpu/drm/i915/intel_display.c drivers/video/fbmem.c include/linux/nls.h kernel/cgroup.c kernel/signal.c kernel/timeconst.pl net/ipv4/ping.c Change-Id: I1f532925d1743df74d66bcdd6fc92f05c72ee0dd
| * netpoll: fix NULL pointer dereference in netpoll_cleanupNikolay Aleksandrov2013-10-131-5/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit d0fe8c888b1fd1a2f84b9962cabcb98a70988aec ] I've been hitting a NULL ptr deref while using netconsole because the np->dev check and the pointer manipulation in netpoll_cleanup are done without rtnl and the following sequence happens when having a netconsole over a vlan and we remove the vlan while disabling the netconsole: CPU 1 CPU2 removes vlan and calls the notifier enters store_enabled(), calls netdev_cleanup which checks np->dev and then waits for rtnl executes the netconsole netdev release notifier making np->dev == NULL and releases rtnl continues to dereference a member of np->dev which at this point is == NULL Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
| * splice: fix racy pipe->buffers usesEric Dumazet2013-10-051-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 047fe3605235888f3ebcda0c728cb31937eadfe6 upstream. Dave Jones reported a kernel BUG at mm/slub.c:3474! triggered by splice_shrink_spd() called from vmsplice_to_pipe() commit 35f3d14dbbc5 (pipe: add support for shrinking and growing pipes) added capability to adjust pipe->buffers. Problem is some paths don't hold pipe mutex and assume pipe->buffers doesn't change for their duration. Fix this by adding nr_pages_max field in struct splice_pipe_desc, and use it in place of pipe->buffers where appropriate. splice_shrink_spd() loses its struct pipe_inode_info argument. Reported-by: Dave Jones <davej@redhat.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Tom Herbert <therbert@google.com> Cc: stable <stable@vger.kernel.org> # 2.6.35 Tested-by: Dave Jones <davej@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Jiri Slaby <jslaby@suse.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
| * net: check net.core.somaxconn sysctl valuesRoman Gushchin2013-09-141-1/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit 5f671d6b4ec3e6d66c2a868738af2cdea09e7509 ] It's possible to assign an invalid value to the net.core.somaxconn sysctl variable, because there is no checks at all. The sk_max_ack_backlog field of the sock structure is defined as unsigned short. Therefore, the backlog argument in inet_listen() shouldn't exceed USHRT_MAX. The backlog argument in the listen() syscall is truncated to the somaxconn value. So, the somaxconn value shouldn't exceed 65535 (USHRT_MAX). Also, negative values of somaxconn are meaningless. before: $ sysctl -w net.core.somaxconn=256 net.core.somaxconn = 256 $ sysctl -w net.core.somaxconn=65536 net.core.somaxconn = 65536 $ sysctl -w net.core.somaxconn=-100 net.core.somaxconn = -100 after: $ sysctl -w net.core.somaxconn=256 net.core.somaxconn = 256 $ sysctl -w net.core.somaxconn=65536 error: "Invalid argument" setting key "net.core.somaxconn" $ sysctl -w net.core.somaxconn=-100 error: "Invalid argument" setting key "net.core.somaxconn" Based on a prior patch from Changli Gao. Signed-off-by: Roman Gushchin <klamm@yandex-team.ru> Reported-by: Changli Gao <xiaosuo@gmail.com> Suggested-by: Eric Dumazet <edumazet@google.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
| * neighbour: fix a race in neigh_destroy()Eric Dumazet2013-07-281-5/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit c9ab4d85de222f3390c67aedc9c18a50e767531e ] There is a race in neighbour code, because neigh_destroy() uses skb_queue_purge(&neigh->arp_queue) without holding neighbour lock, while other parts of the code assume neighbour rwlock is what protects arp_queue Convert all skb_queue_purge() calls to the __skb_queue_purge() variant Use __skb_queue_head_init() instead of skb_queue_head_init() to make clear we do not use arp_queue.lock And hold neigh->lock in neigh_destroy() to close the race. Reported-by: Joe Jin <joe.jin@oracle.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
| * ipv6: do not clear pinet6 fieldEric Dumazet2013-05-191-12/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit f77d602124d865c38705df7fa25c03de9c284ad2 ] We have seen multiple NULL dereferences in __inet6_lookup_established() After analysis, I found that inet6_sk() could be NULL while the check for sk_family == AF_INET6 was true. Bug was added in linux-2.6.29 when RCU lookups were introduced in UDP and TCP stacks. Once an IPv6 socket, using SLAB_DESTROY_BY_RCU is inserted in a hash table, we no longer can clear pinet6 field. This patch extends logic used in commit fcbdf09d9652c891 ("net: fix nulls list corruptions in sk_prot_alloc") TCP/UDP/UDPLite IPv6 protocols provide their own .clear_sk() method to make sure we do not clear pinet6 field. At socket clone phase, we do not really care, as cloning the parent (non NULL) pinet6 is not adding a fatal race. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
| * net: fix incorrect credentials passingLinus Torvalds2013-05-011-4/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit 83f1b4ba917db5dc5a061a44b3403ddb6e783494 ] Commit 257b5358b32f ("scm: Capture the full credentials of the scm sender") changed the credentials passing code to pass in the effective uid/gid instead of the real uid/gid. Obviously this doesn't matter most of the time (since normally they are the same), but it results in differences for suid binaries when the wrong uid/gid ends up being used. This just undoes that (presumably unintentional) part of the commit. Reported-by: Andy Lutomirski <luto@amacapital.net> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Serge E. Hallyn <serge@hallyn.com> Cc: David S. Miller <davem@davemloft.net> Cc: stable@vger.kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Acked-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
| * rtnetlink: Call nlmsg_parse() with correct header lengthMichael Riesch2013-05-011-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit 88c5b5ce5cb57af6ca2a7cf4d5715fa320448ff9 ] Signed-off-by: Michael Riesch <michael.riesch@omicron.at> Cc: "David S. Miller" <davem@davemloft.net> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Jiri Benc <jbenc@redhat.com> Cc: "Theodore Ts'o" <tytso@mit.edu> Cc: linux-kernel@vger.kernel.org Acked-by: Mark Rustad <mark.d.rustad@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
| * netfilter: don't reset nf_trace in nf_reset()Patrick McHardy2013-05-011-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit 124dff01afbdbff251f0385beca84ba1b9adda68 ] Commit 130549fe ("netfilter: reset nf_trace in nf_reset") added code to reset nf_trace in nf_reset(). This is wrong and unnecessary. nf_reset() is used in the following cases: - when passing packets up the the socket layer, at which point we want to release all netfilter references that might keep modules pinned while the packet is queued. nf_trace doesn't matter anymore at this point. - when encapsulating or decapsulating IPsec packets. We want to continue tracing these packets after IPsec processing. - when passing packets through virtual network devices. Only devices on that encapsulate in IPv4/v6 matter since otherwise nf_trace is not used anymore. Its not entirely clear whether those packets should be traced after that, however we've always done that. - when passing packets through virtual network devices that make the packet cross network namespace boundaries. This is the only cases where we clearly want to reset nf_trace and is also what the original patch intended to fix. Add a new function nf_reset_trace() and use it in dev_forward_skb() to fix this properly. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
| * net: count hw_addr syncs so that unsync works properly.Vlad Yasevich2013-05-011-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit 4543fbefe6e06a9e40d9f2b28d688393a299f079 ] A few drivers use dev_uc_sync/unsync to synchronize the address lists from master down to slave/lower devices. In some cases (bond/team) a single address list is synched down to multiple devices. At the time of unsync, we have a leak in these lower devices, because "synced" is treated as a boolean and the address will not be unsynced for anything after the first device/call. Treat "synced" as a count (same as refcount) and allow all unsync calls to work. Signed-off-by: Vlad Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
| * net: add a synchronize_net() in netdev_rx_handler_unregister()Eric Dumazet2013-04-051-0/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit 00cfec37484761a44a3b6f4675a54caa618210ae ] commit 35d48903e97819 (bonding: fix rx_handler locking) added a race in bonding driver, reported by Steven Rostedt who did a very good diagnosis : <quoting Steven> I'm currently debugging a crash in an old 3.0-rt kernel that one of our customers is seeing. The bug happens with a stress test that loads and unloads the bonding module in a loop (I don't know all the details as I'm not the one that is directly interacting with the customer). But the bug looks to be something that may still be present and possibly present in mainline too. It will just be much harder to trigger it in mainline. In -rt, interrupts are threads, and can schedule in and out just like any other thread. Note, mainline now supports interrupt threads so this may be easily reproducible in mainline as well. I don't have the ability to tell the customer to try mainline or other kernels, so my hands are somewhat tied to what I can do. But according to a core dump, I tracked down that the eth irq thread crashed in bond_handle_frame() here: slave = bond_slave_get_rcu(skb->dev); bond = slave->bond; <--- BUG the slave returned was NULL and accessing slave->bond caused a NULL pointer dereference. Looking at the code that unregisters the handler: void netdev_rx_handler_unregister(struct net_device *dev) { ASSERT_RTNL(); RCU_INIT_POINTER(dev->rx_handler, NULL); RCU_INIT_POINTER(dev->rx_handler_data, NULL); } Which is basically: dev->rx_handler = NULL; dev->rx_handler_data = NULL; And looking at __netif_receive_skb() we have: rx_handler = rcu_dereference(skb->dev->rx_handler); if (rx_handler) { if (pt_prev) { ret = deliver_skb(skb, pt_prev, orig_dev); pt_prev = NULL; } switch (rx_handler(&skb)) { My question to all of you is, what stops this interrupt from happening while the bonding module is unloading? What happens if the interrupt triggers and we have this: CPU0 CPU1 ---- ---- rx_handler = skb->dev->rx_handler netdev_rx_handler_unregister() { dev->rx_handler = NULL; dev->rx_handler_data = NULL; rx_handler() bond_handle_frame() { slave = skb->dev->rx_handler; bond = slave->bond; <-- NULL pointer dereference!!! What protection am I missing in the bond release handler that would prevent the above from happening? </quoting Steven> We can fix bug this in two ways. First is adding a test in bond_handle_frame() and others to check if rx_handler_data is NULL. A second way is adding a synchronize_net() in netdev_rx_handler_unregister() to make sure that a rcu protected reader has the guarantee to see a non NULL rx_handler_data. The second way is better as it avoids an extra test in fast path. Reported-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Jiri Pirko <jpirko@redhat.com> Cc: Paul E. McKenney <paulmck@us.ibm.com> Acked-by: Steven Rostedt <rostedt@goodmis.org> Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
| * rtnetlink: Mask the rta_type when range checkingVlad Yasevich2013-03-281-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit a5b8db91442fce9c9713fcd656c3698f1adde1d6 ] Range/validity checks on rta_type in rtnetlink_rcv_msg() do not account for flags that may be set. This causes the function to return -EINVAL when flags are set on the type (for example NLA_F_NESTED). Signed-off-by: Vlad Yasevich <vyasevic@redhat.com> Acked-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
| * rtnl: fix info leak on RTM_GETLINK request for VF devicesMathias Krause2013-03-201-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit 84d73cd3fb142bf1298a8c13fd4ca50fd2432372 ] Initialize the mac address buffer with 0 as the driver specific function will probably not fill the whole buffer. In fact, all in-kernel drivers fill only ETH_ALEN of the MAX_ADDR_LEN bytes, i.e. 6 of the 32 possible bytes. Therefore we currently leak 26 bytes of stack memory to userland via the netlink interface. Signed-off-by: Mathias Krause <minipli@googlemail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
| * bridging: fix rx_handlers return codeCristian Bercaru2013-03-201-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit 3bc1b1add7a8484cc4a261c3e128dbe1528ce01f ] The frames for which rx_handlers return RX_HANDLER_CONSUMED are no longer counted as dropped. They are counted as successfully received by 'netif_receive_skb'. This allows network interface drivers to correctly update their RX-OK and RX-DRP counters based on the result of 'netif_receive_skb'. Signed-off-by: Cristian Bercaru <B43982@freescale.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* | smdk4412: network: squashed commitsDerTeufel2014-12-091-1/+58
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 9792f37daba788506559f99832c62b240402296c Author: Sreeram Ramachandran <sreeram@google.com> Date: Tue Jul 8 11:37:03 2014 -0700 Handle 'sk' being NULL in UID-based routing. Bug: 15413527 Change-Id: If33bebb7b52c0ebfa8dac2452607bce0c2b0faa0 Signed-off-by: Sreeram Ramachandran <sreeram@google.com> commit 7ab80d7fd3f1e3faebb14313119700fd7416ad54 Author: Lorenzo Colitti <lorenzo@google.com> Date: Mon Mar 31 16:23:51 2014 +0900 net: core: Support UID-based routing. This contains the following commits: 1. 0149763 net: core: Add a UID range to fib rules. 2. 1650474 net: core: Use the socket UID in routing lookups. 3. 0b16771 net: ipv4: Add the UID to the route cache. 4. ee058f1 net: core: Add a RTA_UID attribute to routes. This is so that userspace can do per-UID route lookups. Bug: 15413527 Change-Id: I1285474c6734614d3bda6f61d88dfe89a4af7892 Signed-off-by: Lorenzo Colitti <lorenzo@google.com> commit a769ab7f07dcbbf29f2a8658aa5486bb6a2a66c3 Author: Hannes Frederic Sowa <hannes@stressinduktion.org> Date: Fri Mar 8 02:07:16 2013 +0000 ipv6: introdcue __ipv6_addr_needs_scope_id and ipv6_iface_scope_id helper functions [net-next commit b7ef213ef65256168df83ddfbb8131ed9adc10f9] __ipv6_addr_needs_scope_id checks if an ipv6 address needs to supply a 'sin6_scope_id != 0'. 'sin6_scope_id != 0' was enforced in case of link-local addresses. To support interface-local multicast these checks had to be enhanced and are now consolidated into these new helper functions. v2: a) migrated to struct ipv6_addr_props v3: a) reverted changes for ipv6_addr_props b) test for address type instead of comparing scope v4: a) unchanged Change-Id: Id6fc54cec61f967928e08a9eba4f857157d973a3 Suggested-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net> commit af9b98af02a072c3eb0f3dd7d3df7242d8294e5c Author: Hannes Frederic Sowa <hannes@stressinduktion.org> Date: Mon Nov 18 07:07:45 2013 +0100 ping: prevent NULL pointer dereference on write to msg_name A plain read() on a socket does set msg->msg_name to NULL. So check for NULL pointer first. [Backport of net-next cf970c002d270c36202bd5b9c2804d3097a52da0] Bug: 12780426 Change-Id: I29d9cb95ef05ec76d37517e01317f4a29e60931c Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Lorenzo Colitti <lorenzo@google.com> commit d66ae9bbbf35cd6e7a3d04f6946d506b3148f06b Author: Cong Wang <amwang@redhat.com> Date: Sun Jun 2 22:43:52 2013 +0000 ping: always initialize ->sin6_scope_id and ->sin6_flowinfo [net-next commit c26d6b46da3ee86fa8a864347331e5513ca84c2b] If we don't need scope id, we should initialize it to zero. Same for ->sin6_flowinfo. Change-Id: I28e4bc9593e76fc3434052182466fab4bb8ccf3a Cc: Lorenzo Colitti <lorenzo@google.com> Cc: David S. Miller <davem@davemloft.net> Signed-off-by: Cong Wang <amwang@redhat.com> Acked-by: Lorenzo Colitti <lorenzo@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> commit 22d188e621c143108e1207831e5817f24d0cccc0 Author: Lorenzo Colitti <lorenzo@google.com> Date: Thu Jul 4 00:12:40 2013 +0900 net: ipv6: fix wrong ping_v6_sendmsg return value [net-next commit fbfe80c890a1dc521d0b629b870e32fcffff0da5] ping_v6_sendmsg currently returns 0 on success. It should return the number of bytes written instead. Bug: 9469865 Change-Id: I82b7d3a37ba91ad24e6dbd97a4880745ce16ad31 Signed-off-by: Lorenzo Colitti <lorenzo@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> commit b691b1c9931f86c3fc7a10208030752f205d1adf Author: Lorenzo Colitti <lorenzo@google.com> Date: Thu Jul 4 00:52:49 2013 +0900 net: ipv6: add missing lock in ping_v6_sendmsg [net-next commit a1bdc45580fc19e968b32ad27cd7e476a4aa58f6] Bug: 9469865 Change-Id: I480f8ce95956dd8f17fbbb26dc60cc162f8ec933 Signed-off-by: Lorenzo Colitti <lorenzo@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> commit 515b76147e907579254cd5997a4ab9e64da32268 Author: Lorenzo Colitti <lorenzo@google.com> Date: Wed Jan 16 22:09:49 2013 +0000 net: ipv6: Add IPv6 support to the ping socket. [backport of net-next 6d0bfe22611602f36617bc7aa2ffa1bbb2f54c67] This adds the ability to send ICMPv6 echo requests without a raw socket. The equivalent ability for ICMPv4 was added in 2011. Instead of having separate code paths for IPv4 and IPv6, make most of the code in net/ipv4/ping.c dual-stack and only add a few IPv6-specific bits (like the protocol definition) to a new net/ipv6/ping.c. Hopefully this will reduce divergence and/or duplication of bugs in the future. Caveats: - Setting options via ancillary data (e.g., using IPV6_PKTINFO to specify the outgoing interface) is not yet supported. - There are no separate security settings for IPv4 and IPv6; everything is controlled by /proc/net/ipv4/ping_group_range. - The proc interface does not yet display IPv6 ping sockets properly. Tested with a patched copy of ping6 and using raw socket calls. Compiles and works with all of CONFIG_IPV6={n,m,y}. Change-Id: Ia359af556021344fc7f890c21383aadf950b6498 Signed-off-by: Lorenzo Colitti <lorenzo@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> [lorenzo@google.com: backported to 3.0] Signed-off-by: Lorenzo Colitti <lorenzo@google.com> commit d72b1c37bab1bbdebb096421b5ef88ceec6eae8e Author: Li Wei <lw@cn.fujitsu.com> Date: Thu Feb 21 00:09:54 2013 +0000 ipv4: fix a bug in ping_err(). [ Upstream commit b531ed61a2a2a77eeb2f7c88b49aa5ec7d9880d8 ] We should get 'type' and 'code' from the outer ICMP header. Change-Id: I9a467b4aa794127f22dbc5f802d17ae618aa0c74 Signed-off-by: Li Wei <lw@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> commit ead1926fc318a4c97e735a885db40e77135c0531 Author: Eric Dumazet <eric.dumazet@gmail.com> Date: Mon Oct 24 03:06:21 2011 -0400 ipv4: tcp: fix TOS value in ACK messages sent from TIME_WAIT There is a long standing bug in linux tcp stack, about ACK messages sent on behalf of TIME_WAIT sockets. In the IP header of the ACK message, we choose to reflect TOS field of incoming message, and this might break some setups. Example of things that were broken : - Routing using TOS as a selector - Firewalls - Trafic classification / shaping We now remember in timewait structure the inet tos field and use it in ACK generation, and route lookup. Notes : - We still reflect incoming TOS in RST messages. - We could extend MuraliRaja Muniraju patch to report TOS value in netlink messages for TIME_WAIT sockets. - A patch is needed for IPv6 Change-Id: Ic7ad8a7b858de181bfe2a789c472f84955397d4c Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> commit 47ef68bdd0ceb7113496f3325068202e5d1f3eba Author: Eric Dumazet <eric.dumazet@gmail.com> Date: Wed Nov 30 19:00:53 2011 +0000 ipv4: use a 64bit load/store in output path gcc compiler is smart enough to use a single load/store if we memcpy(dptr, sptr, 8) on x86_64, regardless of CONFIG_CC_OPTIMIZE_FOR_SIZE In IP header, daddr immediately follows saddr, this wont change in the future. We only need to make sure our flowi4 (saddr,daddr) fields wont break the rule. Change-Id: Iad9c8fd9121ec84c2599b013badaebba92db7c39 Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> commit 5b7251328273e10d0d768a24f7b555d1e1f671e6 Author: Julian Anastasov <ja@ssi.bg> Date: Sun Aug 7 09:16:09 2011 +0000 ipv4: route non-local sources for raw socket The raw sockets can provide source address for routing but their privileges are not considered. We can provide non-local source address, make sure the FLOWI_FLAG_ANYSRC flag is set if socket has privileges for this, i.e. based on hdrincl (IP_HDRINCL) and transparent flags. Change-Id: I136b161c584deac3885efbf217e959e1a829fc1d Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: David S. Miller <davem@davemloft.net> Change-Id: I0022e9536ee1861bf163e5bba4a86a3e94669960
* | Merge remote-tracking branch 'kernelorg/linux-3.0.y' into 3_0_64Andrew Dodd2013-02-2711-148/+266
|\ \ | |/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Conflicts: arch/arm/Kconfig arch/arm/include/asm/hwcap.h arch/arm/kernel/smp.c arch/arm/plat-samsung/adc.c drivers/gpu/drm/i915/i915_reg.h drivers/gpu/drm/i915/intel_drv.h drivers/mmc/core/sd.c drivers/net/tun.c drivers/net/usb/usbnet.c drivers/regulator/max8997.c drivers/usb/core/hub.c drivers/usb/host/xhci.h drivers/usb/serial/qcserial.c fs/jbd2/transaction.c include/linux/migrate.h kernel/sys.c kernel/time/timekeeping.c lib/genalloc.c mm/memory-failure.c mm/memory_hotplug.c mm/mempolicy.c mm/page_alloc.c mm/vmalloc.c mm/vmscan.c mm/vmstat.c scripts/Kbuild.include Change-Id: I91e2d85c07320c7ccfc04cf98a448e89bed6ade6
| * pktgen: correctly handle failures when adding a deviceCong Wang2013-02-141-3/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit 604dfd6efc9b79bce432f2394791708d8e8f6efc ] The return value of pktgen_add_device() is not checked, so even if we fail to add some device, for example, non-exist one, we still see "OK:...". This patch fixes it. After this patch, I got: # echo "add_device non-exist" > /proc/net/pktgen/kpktgend_0 -bash: echo: write error: No such device # cat /proc/net/pktgen/kpktgend_0 Running: Stopped: Result: ERROR: can not add device non-exist # echo "add_device eth0" > /proc/net/pktgen/kpktgend_0 # cat /proc/net/pktgen/kpktgend_0 Running: Stopped: eth0 Result: OK: add_device=eth0 (Candidate for -stable) Cc: David S. Miller <davem@davemloft.net> Signed-off-by: Cong Wang <amwang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
| * rtnetlink: fix rtnl_calcit() and rtnl_dump_ifinfo()Eric Dumazet2013-01-171-8/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit a4b64fbe482c7766f7925f03067fc637716bfa3f upstream. nlmsg_parse() might return an error, so test its return value before potential random memory accesses. Errors introduced in commit 115c9b81928 (rtnetlink: Fix problem with buffer allocation) Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Acked-by: Greg Rose <gregory.v.rose@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net> Cc: Ben Hutchings <bhutchings@solarflare.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
| * rtnetlink: Fix problem with buffer allocationGreg Rose2013-01-171-19/+58
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 115c9b81928360d769a76c632bae62d15206a94a upstream. Implement a new netlink attribute type IFLA_EXT_MASK. The mask is a 32 bit value that can be used to indicate to the kernel that certain extended ifinfo values are requested by the user application. At this time the only mask value defined is RTEXT_FILTER_VF to indicate that the user wants the ifinfo dump to send information about the VFs belonging to the interface. This patch fixes a bug in which certain applications do not have large enough buffers to accommodate the extra information returned by the kernel with large numbers of SR-IOV virtual functions. Those applications will not send the new netlink attribute with the interface info dump request netlink messages so they will not get unexpectedly large request buffers returned by the kernel. Modifies the rtnl_calcit function to traverse the list of net devices and compute the minimum buffer size that can hold the info dumps of all matching devices based upon the filter passed in via the new netlink attribute filter mask. If no filter mask is sent then the buffer allocation defaults to NLMSG_GOODSIZE. With this change it is possible to add yet to be defined netlink attributes to the dump request which should make it fairly extensible in the future. Signed-off-by: Greg Rose <gregory.v.rose@intel.com> Acked-by: Greg Rose <gregory.v.rose@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net> [bwh: Backported to 3.0: - Adjust context - Drop the change in do_setlink() that reverts commit f18da1456581 ('net: RTNETLINK adjusting values of min_ifinfo_dump_size'), which was never applied here] Signed-off-by: Ben Hutchings <bhutchings@solarflare.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
| * rtnetlink: Compute and store minimum ifinfo dump sizeGreg Rose2013-01-173-19/+58
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit c7ac8679bec9397afe8918f788cbcef88c38da54 upstream. The message size allocated for rtnl ifinfo dumps was limited to a single page. This is not enough for additional interface info available with devices that support SR-IOV and caused a bug in which VF info would not be displayed if more than approximately 40 VFs were created per interface. Implement a new function pointer for the rtnl_register service that will calculate the amount of data required for the ifinfo dump and allocate enough data to satisfy the request. Signed-off-by: Greg Rose <gregory.v.rose@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Cc: Ben Hutchings <bhutchings@solarflare.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
| * net-rps: Fix brokeness causing OOO packetsTom Herbert2012-11-261-1/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit baefa31db2f2b13a05d1b81bdf2d20d487f58b0a ] In commit c445477d74ab3779 which adds aRFS to the kernel, the CPU selected for RFS is not set correctly when CPU is changing. This is causing OOO packets and probably other issues. Signed-off-by: Tom Herbert <therbert@google.com> Acked-by: Eric Dumazet <edumazet@google.com> Acked-by: Ben Hutchings <bhutchings@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
| * net: correct check in dev_addr_del()Jiri Pirko2012-11-261-1/+2
| | | | | | | | | | | | | | | | | | | | | | [ Upstream commit a652208e0b52c190e57f2a075ffb5e897fe31c3b ] Check (ha->addr == dev->dev_addr) is always true because dev_addr_init() sets this. Correct the check to behave properly on addr removal. Signed-off-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
| * net: Fix skb_under_panic oops in neigh_resolve_outputramesh.nagappa@gmail.com2012-10-281-4/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit e1f165032c8bade3a6bdf546f8faf61fda4dd01c ] The retry loop in neigh_resolve_output() and neigh_connected_output() call dev_hard_header() with out reseting the skb to network_header. This causes the retry to fail with skb_under_panic. The fix is to reset the network_header within the retry loop. Signed-off-by: Ramesh Nagappa <ramesh.nagappa@ericsson.com> Reviewed-by: Shawn Lu <shawn.lu@ericsson.com> Reviewed-by: Robert Coulson <robert.coulson@ericsson.com> Reviewed-by: Billie Alsup <billie.alsup@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
| * pktgen: fix crash when generating IPv6 packetsAmerigo Wang2012-10-211-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 5aa8b572007c4bca1e6d3dd4c4820f1ae49d6bb2 upstream. For IPv6, sizeof(struct ipv6hdr) = 40, thus the following expression will result negative: datalen = pkt_dev->cur_pkt_size - 14 - sizeof(struct ipv6hdr) - sizeof(struct udphdr) - pkt_dev->pkt_overhead; And, the check "if (datalen < sizeof(struct pktgen_hdr))" will be passed as "datalen" is promoted to unsigned, therefore will cause a crash later. This is a quick fix by checking if "datalen" is negative. The following patch will increase the default value of 'min_pkt_size' for IPv6. This bug should exist for a long time, so Cc -stable too. Signed-off-by: Cong Wang <amwang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
| * net: do not disable sg for packets requiring no checksumEd Cashin2012-10-131-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit c0d680e577ff171e7b37dbdb1b1bf5451e851f04 ] A change in a series of VLAN-related changes appears to have inadvertently disabled the use of the scatter gather feature of network cards for transmission of non-IP ethernet protocols like ATA over Ethernet (AoE). Below is a reference to the commit that introduces a "harmonize_features" function that turns off scatter gather when the NIC does not support hardware checksumming for the ethernet protocol of an sk buff. commit f01a5236bd4b140198fbcc550f085e8361fd73fa Author: Jesse Gross <jesse@nicira.com> Date: Sun Jan 9 06:23:31 2011 +0000 net offloading: Generalize netif_get_vlan_features(). The can_checksum_protocol function is not equipped to consider a protocol that does not require checksumming. Calling it for a protocol that requires no checksum is inappropriate. The patch below has harmonize_features call can_checksum_protocol when the protocol needs a checksum, so that the network layer is not forced to perform unnecessary skb linearization on the transmission of AoE packets. Unnecessary linearization results in decreased performance and increased memory pressure, as reported here: http://www.spinics.net/lists/linux-mm/msg15184.html The problem has probably not been widely experienced yet, because only recently has the kernel.org-distributed aoe driver acquired the ability to use payloads of over a page in size, with the patchset recently included in the mm tree: https://lkml.org/lkml/2012/8/28/140 The coraid.com-distributed aoe driver already could use payloads of greater than a page in size, but its users generally do not use the newest kernels. Signed-off-by: Ed Cashin <ecashin@coraid.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
| * net: guard tcp_set_keepalive() to tcp socketsEric Dumazet2012-10-131-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit 3e10986d1d698140747fcfc2761ec9cb64c1d582 ] Its possible to use RAW sockets to get a crash in tcp_set_keepalive() / sk_reset_timer() Fix is to make sure socket is a SOCK_STREAM one. Reported-by: Dave Jones <davej@redhat.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
| * net: small bug on rxhash calculationChema Gonzalez2012-10-131-5/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit 6862234238e84648c305526af2edd98badcad1e0 ] In the current rxhash calculation function, while the sorting of the ports/addrs is coherent (you get the same rxhash for packets sharing the same 4-tuple, in both directions), ports and addrs are sorted independently. This implies packets from a connection between the same addresses but crossed ports hash to the same rxhash. For example, traffic between A=S:l and B=L:s is hashed (in both directions) from {L, S, {s, l}}. The same rxhash is obtained for packets between C=S:s and D=L:l. This patch ensures that you either swap both addrs and ports, or you swap none. Traffic between A and B, and traffic between C and D, get their rxhash from different sources ({L, S, {l, s}} for A<->B, and {L, S, {s, l}} for C<->D) The patch is co-written with Eric Dumazet <edumazet@google.com> Signed-off-by: Chema Gonzalez <chema@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
| * drop_monitor: dont sleep in atomic contextEric Dumazet2012-10-021-68/+33
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit bec4596b4e6770c7037f21f6bd27567b152dc0d6 upstream. drop_monitor calls several sleeping functions while in atomic context. BUG: sleeping function called from invalid context at mm/slub.c:943 in_atomic(): 1, irqs_disabled(): 0, pid: 2103, name: kworker/0:2 Pid: 2103, comm: kworker/0:2 Not tainted 3.5.0-rc1+ #55 Call Trace: [<ffffffff810697ca>] __might_sleep+0xca/0xf0 [<ffffffff811345a3>] kmem_cache_alloc_node+0x1b3/0x1c0 [<ffffffff8105578c>] ? queue_delayed_work_on+0x11c/0x130 [<ffffffff815343fb>] __alloc_skb+0x4b/0x230 [<ffffffffa00b0360>] ? reset_per_cpu_data+0x160/0x160 [drop_monitor] [<ffffffffa00b022f>] reset_per_cpu_data+0x2f/0x160 [drop_monitor] [<ffffffffa00b03ab>] send_dm_alert+0x4b/0xb0 [drop_monitor] [<ffffffff810568e0>] process_one_work+0x130/0x4c0 [<ffffffff81058249>] worker_thread+0x159/0x360 [<ffffffff810580f0>] ? manage_workers.isra.27+0x240/0x240 [<ffffffff8105d403>] kthread+0x93/0xa0 [<ffffffff816be6d4>] kernel_thread_helper+0x4/0x10 [<ffffffff8105d370>] ? kthread_freezable_should_stop+0x80/0x80 [<ffffffff816be6d0>] ? gs_change+0xb/0xb Rework the logic to call the sleeping functions in right context. Use standard timer/workqueue api to let system chose any cpu to perform the allocation and netlink send. Also avoid a loop if reset_per_cpu_data() cannot allocate memory : use mod_timer() to wait 1/10 second before next try. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Neil Horman <nhorman@tuxdriver.com> Reviewed-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net> Cc: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
| * drop_monitor: prevent init path from scheduling on the wrong cpuNeil Horman2012-10-021-5/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 4fdcfa12843bca38d0c9deff70c8720e4e8f515f upstream. I just noticed after some recent updates, that the init path for the drop monitor protocol has a minor error. drop monitor maintains a per cpu structure, that gets initalized from a single cpu. Normally this is fine, as the protocol isn't in use yet, but I recently made a change that causes a failed skb allocation to reschedule itself . Given the current code, the implication is that this workqueue reschedule will take place on the wrong cpu. If drop monitor is used early during the boot process, its possible that two cpus will access a single per-cpu structure in parallel, possibly leading to data corruption. This patch fixes the situation, by storing the cpu number that a given instance of this per-cpu data should be accessed from. In the case of a need for a reschedule, the cpu stored in the struct is assigned the rescheule, rather than the currently executing cpu Tested successfully by myself. Signed-off-by: Neil Horman <nhorman@tuxdriver.com> CC: David Miller <davem@davemloft.net> Signed-off-by: David S. Miller <davem@davemloft.net> Cc: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
| * drop_monitor: Make updating data->skb smp safeNeil Horman2012-10-021-16/+54
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 3885ca785a3618593226687ced84f3f336dc3860 upstream. Eric Dumazet pointed out to me that the drop_monitor protocol has some holes in its smp protections. Specifically, its possible to replace data->skb while its being written. This patch corrects that by making data->skb an rcu protected variable. That will prevent it from being overwritten while a tracepoint is modifying it. Signed-off-by: Neil Horman <nhorman@tuxdriver.com> Reported-by: Eric Dumazet <eric.dumazet@gmail.com> CC: David Miller <davem@davemloft.net> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Cc: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
| * drop_monitor: fix sleeping in invalid context warningNeil Horman2012-10-021-7/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit cde2e9a651b76d8db36ae94cd0febc82b637e5dd upstream. Eric Dumazet pointed out this warning in the drop_monitor protocol to me: [ 38.352571] BUG: sleeping function called from invalid context at kernel/mutex.c:85 [ 38.352576] in_atomic(): 1, irqs_disabled(): 0, pid: 4415, name: dropwatch [ 38.352580] Pid: 4415, comm: dropwatch Not tainted 3.4.0-rc2+ #71 [ 38.352582] Call Trace: [ 38.352592] [<ffffffff8153aaf0>] ? trace_napi_poll_hit+0xd0/0xd0 [ 38.352599] [<ffffffff81063f2a>] __might_sleep+0xca/0xf0 [ 38.352606] [<ffffffff81655b16>] mutex_lock+0x26/0x50 [ 38.352610] [<ffffffff8153aaf0>] ? trace_napi_poll_hit+0xd0/0xd0 [ 38.352616] [<ffffffff810b72d9>] tracepoint_probe_register+0x29/0x90 [ 38.352621] [<ffffffff8153a585>] set_all_monitor_traces+0x105/0x170 [ 38.352625] [<ffffffff8153a8ca>] net_dm_cmd_trace+0x2a/0x40 [ 38.352630] [<ffffffff8154a81a>] genl_rcv_msg+0x21a/0x2b0 [ 38.352636] [<ffffffff810f8029>] ? zone_statistics+0x99/0xc0 [ 38.352640] [<ffffffff8154a600>] ? genl_rcv+0x30/0x30 [ 38.352645] [<ffffffff8154a059>] netlink_rcv_skb+0xa9/0xd0 [ 38.352649] [<ffffffff8154a5f0>] genl_rcv+0x20/0x30 [ 38.352653] [<ffffffff81549a7e>] netlink_unicast+0x1ae/0x1f0 [ 38.352658] [<ffffffff81549d76>] netlink_sendmsg+0x2b6/0x310 [ 38.352663] [<ffffffff8150824f>] sock_sendmsg+0x10f/0x130 [ 38.352668] [<ffffffff8150abe0>] ? move_addr_to_kernel+0x60/0xb0 [ 38.352673] [<ffffffff81515f04>] ? verify_iovec+0x64/0xe0 [ 38.352677] [<ffffffff81509c46>] __sys_sendmsg+0x386/0x390 [ 38.352682] [<ffffffff810ffaf9>] ? handle_mm_fault+0x139/0x210 [ 38.352687] [<ffffffff8165b5bc>] ? do_page_fault+0x1ec/0x4f0 [ 38.352693] [<ffffffff8106ba4d>] ? set_next_entity+0x9d/0xb0 [ 38.352699] [<ffffffff81310b49>] ? tty_ldisc_deref+0x9/0x10 [ 38.352703] [<ffffffff8106d363>] ? pick_next_task_fair+0x63/0x140 [ 38.352708] [<ffffffff8150b8d4>] sys_sendmsg+0x44/0x80 [ 38.352713] [<ffffffff8165f8e2>] system_call_fastpath+0x16/0x1b It stems from holding a spinlock (trace_state_lock) while attempting to register or unregister tracepoint hooks, making in_atomic() true in this context, leading to the warning when the tracepoint calls might_sleep() while its taking a mutex. Since we only use the trace_state_lock to prevent trace protocol state races, as well as hardware stat list updates on an rcu write side, we can just convert the spinlock to a mutex to avoid this problem. Signed-off-by: Neil Horman <nhorman@tuxdriver.com> Reported-by: Eric Dumazet <eric.dumazet@gmail.com> CC: David Miller <davem@davemloft.net> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Cc: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
| * net: Statically initialize init_net.dev_base_headRustad, Mark D2012-10-022-2/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 734b65417b24d6eea3e3d7457e1f11493890ee1d upstream. This change eliminates an initialization-order hazard most recently seen when netprio_cgroup is built into the kernel. With thanks to Eric Dumazet for catching a bug. Signed-off-by: Mark Rustad <mark.d.rustad@intel.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
| * net/core: Fix potential memory leak in dev_set_alias()Alexey Khoroshilov2012-10-021-2/+5
| | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit 7364e445f62825758fa61195d237a5b8ecdd06ec ] Do not leak memory by updating pointer with potentially NULL realloc return value. Found by Linux Driver Verification project (linuxtesting.org). Signed-off-by: Alexey Khoroshilov <khoroshilov@ispras.ru> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
| * tcp: Apply device TSO segment limit earlierBen Hutchings2012-10-021-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit 1485348d2424e1131ea42efc033cbd9366462b01 ] Cache the device gso_max_segs in sock::sk_gso_max_segs and use it to limit the size of TSO skbs. This avoids the need to fall back to software GSO for local TCP senders. Signed-off-by: Ben Hutchings <bhutchings@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
| * net: Allow driver to limit number of GSO segments per skbBen Hutchings2012-10-021-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit 30b678d844af3305cda5953467005cebb5d7b687 ] A peer (or local user) may cause TCP to use a nominal MSS of as little as 88 (actual MSS of 76 with timestamps). Given that we have a sufficiently prodigious local sender and the peer ACKs quickly enough, it is nevertheless possible to grow the window for such a connection to the point that we will try to send just under 64K at once. This results in a single skb that expands to 861 segments. In some drivers with TSO support, such an skb will require hundreds of DMA descriptors; a substantial fraction of a TX ring or even more than a full ring. The TX queue selected for the skb may stall and trigger the TX watchdog repeatedly (since the problem skb will be retried after the TX reset). This particularly affects sfc, for which the issue is designated as CVE-2012-3412. Therefore: 1. Add the field net_device::gso_max_segs holding the device-specific limit. 2. In netif_skb_features(), if the number of segments is too high then mask out GSO features to force fall back to software GSO. Signed-off-by: Ben Hutchings <bhutchings@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
| * net: feed /dev/random with the MAC address when registering a deviceTheodore Ts'o2012-08-152-0/+4
| | | | | | | | | | | | | | | | | | | | commit 7bf2357524408b97fec58344caf7397f8140c3fd upstream. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: David Miller <davem@davemloft.net> Cc: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
| * net: fix rtnetlink IFF_PROMISC and IFF_ALLMULTI handlingJiri Benc2012-08-091-1/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit b1beb681cba5358f62e6187340660ade226a5fcc ] When device flags are set using rtnetlink, IFF_PROMISC and IFF_ALLMULTI flags are handled specially. Function dev_change_flags sets IFF_PROMISC and IFF_ALLMULTI bits in dev->gflags according to the passed value but do_setlink passes a result of rtnl_dev_combine_flags which takes those bits from dev->flags. This can be easily trigerred by doing: tcpdump -i eth0 & ip l s up eth0 ip sets IFF_UP flag in ifi_flags and ifi_change, which is combined with IFF_PROMISC by rtnl_dev_combine_flags, causing __dev_change_flags to set IFF_PROMISC in gflags. Reported-by: Max Matveev <makc@redhat.com> Signed-off-by: Jiri Benc <jbenc@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
| * netpoll: fix netpoll_send_udp() bugsEric Dumazet2012-07-161-5/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit 954fba0274058d27c7c07b5ea07c41b3b7477894 ] Bogdan Hamciuc diagnosed and fixed following bug in netpoll_send_udp() : "skb->len += len;" instead of "skb_put(skb, len);" Meaning that _if_ a network driver needs to call skb_realloc_headroom(), only packet headers would be copied, leaving garbage in the payload. However the skb_realloc_headroom() must be avoided as much as possible since it requires memory and netpoll tries hard to work even if memory is exhausted (using a pool of preallocated skbs) It appears netpoll_send_udp() reserved 16 bytes for the ethernet header, which happens to work for typicall drivers but not all. Right thing is to use LL_RESERVED_SPACE(dev) (And also add dev->needed_tailroom of tailroom) This patch combines both fixes. Many thanks to Bogdan for raising this issue. Reported-by: Bogdan Hamciuc <bogdan.hamciuc@freescale.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Tested-by: Bogdan Hamciuc <bogdan.hamciuc@freescale.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: Neil Horman <nhorman@tuxdriver.com> Reviewed-by: Neil Horman <nhorman@tuxdriver.com> Reviewed-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
| * ethtool: allow ETHTOOL_GSSET_INFO for usersMichał Mirosław2012-07-161-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit f80400a26a2e8bff541de12834a1134358bb6642 ] Allow ETHTOOL_GSSET_INFO ethtool ioctl() for unprivileged users. ETHTOOL_GSTRINGS is already allowed, but is unusable without this one. Signed-off-by: Micha©© Miros©©aw <mirq-linux@rere.qmqm.pl> Acked-by: Ben Hutchings <bhutchings@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
| * net: sock: validate data_len before allocating skb in sock_alloc_send_pskb()Jason Wang2012-07-161-2/+5
| | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit cc9b17ad29ecaa20bfe426a8d4dbfb94b13ff1cc ] We need to validate the number of pages consumed by data_len, otherwise frags array could be overflowed by userspace. So this patch validate data_len and return -EMSGSIZE when data_len may occupies more frags than MAX_SKB_FRAGS. Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
| * Revert "net: maintain namespace isolation between vlan and real device"David S. Miller2012-06-101-31/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit 59b9997baba5242997ddc7bd96b1391f5275a5a4 ] This reverts commit 8a83a00b0735190384a348156837918271034144. It causes regressions for S390 devices, because it does an unconditional DST drop on SKBs for vlans and the QETH device needs the neighbour entry hung off the DST for certain things on transmit. Arnd can't remember exactly why he even needed this change. Conflicts: drivers/net/macvlan.c net/8021q/vlan_dev.c net/core/dev.c Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
| * pktgen: fix module unload for goodEric Dumazet2012-06-101-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit d4b1133558e0d417342d5d2c49e4c35b428ff20d ] commit c57b5468406 (pktgen: fix crash at module unload) did a very poor job with list primitives. 1) list_splice() arguments were in the wrong order 2) list_splice(list, head) has undefined behavior if head is not initialized. 3) We should use the list_splice_init() variant to clear pktgen_threads list. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
| * pktgen: fix crash at module unloadEric Dumazet2012-06-101-2/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit c57b54684060c8aced64a5b78ff69ff289af97b9 ] commit 7d3d43dab4e9 (net: In unregister_netdevice_notifier unregister the netdevices.) makes pktgen crashing at module unload. [ 296.820578] BUG: spinlock bad magic on CPU#6, rmmod/3267 [ 296.820719] lock: ffff880310c38000, .magic: ffff8803, .owner: <none>/-1, .owner_cpu: -1 [ 296.820943] Pid: 3267, comm: rmmod Not tainted 3.4.0-rc5+ #254 [ 296.821079] Call Trace: [ 296.821211] [<ffffffff8168a715>] spin_dump+0x8a/0x8f [ 296.821345] [<ffffffff8168a73b>] spin_bug+0x21/0x26 [ 296.821507] [<ffffffff812b4741>] do_raw_spin_lock+0x131/0x140 [ 296.821648] [<ffffffff8169188e>] _raw_spin_lock+0x1e/0x20 [ 296.821786] [<ffffffffa00cc0fd>] __pktgen_NN_threads+0x4d/0x140 [pktgen] [ 296.821928] [<ffffffffa00ccf8d>] pktgen_device_event+0x10d/0x1e0 [pktgen] [ 296.822073] [<ffffffff8154ed4f>] unregister_netdevice_notifier+0x7f/0x100 [ 296.822216] [<ffffffffa00d2a0b>] pg_cleanup+0x48/0x73 [pktgen] [ 296.822357] [<ffffffff8109528e>] sys_delete_module+0x17e/0x2a0 [ 296.822502] [<ffffffff81699652>] system_call_fastpath+0x16/0x1b Hold the pktgen_thread_lock while splicing pktgen_threads, and test pktgen_exiting in pktgen_device_event() to make unload faster. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
| * net: In unregister_netdevice_notifier unregister the netdevices.Eric W. Biederman2012-05-211-0/+20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit 7d3d43dab4e978d8d9ad1acf8af15c9b1c4b0f0f ] We already synthesize events in register_netdevice_notifier and synthesizing events in unregister_netdevice_notifier allows to us remove the need for special case cleanup code. This change should be safe as it adds no new cases for existing callers of unregiser_netdevice_notifier to handle. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>