aboutsummaryrefslogtreecommitdiffstats
path: root/net
Commit message (Collapse)AuthorAgeFilesLines
* netfilter: nf_nat_helper: tidy up adjust_tcp_sequenceHannes Eder2009-11-051-13/+9
| | | | | | | | | | | The variable 'other_way' gets initialized but is not read afterwards, so remove it. Pass the right arguments to a pr_debug call. While being at tidy up a bit and it fix this checkpatch warning: WARNING: suspect code indent for conditional statements Signed-off-by: Hannes Eder <heder@google.com> Signed-off-by: Patrick McHardy <kaber@trash.net>
* netfilter: nf_conntrack: avoid additional compare.Changli Gao2009-11-051-4/+10
| | | | | Signed-off-by: Changli Gao <xiaosuo@gmail.com> Signed-off-by: Patrick McHardy <kaber@trash.net>
* netfilter: remove synchronize_net() calls in ip_queue/ip6_queueEric Dumazet2009-11-042-2/+2
| | | | | | | | nf_unregister_queue_handlers() already does a synchronize_rcu() call, we dont need to do it again in callers. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Patrick McHardy <kaber@trash.net>
* netfilter: xt_socket: make module available for INPUT chainJan Engelhardt2009-10-291-2/+4
| | | | | | | | | | | This should make it possible to test for the existence of local sockets in the INPUT path. References: http://marc.info/?l=netfilter-devel&m=125380481517129&w=2 Signed-off-by: Jan Engelhardt <jengelh@medozas.de> Signed-off-by: Balazs Scheidler <bazsi@balabit.hu> Signed-off-by: Patrick McHardy <kaber@trash.net>
* Merge branch 'master' of ↵David S. Miller2009-10-296-57/+49
|\ | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next-2.6
| * mesh: use set_bit() to set MESH_WORK_HOUSEKEEPING.Rui Paulo2009-10-271-2/+2
| | | | | | | | | | | | | | | | | | This makes the mesh housekeeping timer work properly on big endian systems. Signed-off-by: Rui Paulo <rpaulo@gmail.com> Signed-off-by: Javier Cardona <javier@cozybit.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
| * mac80211: Learn about mesh portals from multicast trafficJavier Cardona2009-10-271-6/+15
| | | | | | | | | | | | | | | | | | | | Mesh portals proxy traffic for nodes external to the mesh. When a proxied frame is received by a mesh interface, it should update its mesh portal table. This was only happening for unicast frames. With this change we also learn about mesh portals from proxied multicast frames. Signed-off-by: Javier Cardona <javier@cozybit.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
| * mac80211: replace netif_tx_{start,stop,wake}_all_queuesJohn W. Linville2009-10-273-9/+9
| | | | | | | | | | | | | | | | Replace netif_tx_{start,stop,wake}_all_queues with the single-queue equivalents (i.e. netif_{start,stop,wake}_queue). Since we are down to a single queue, these should peform slightly better. Signed-off-by: John W. Linville <linville@tuxdriver.com>
| * cfg80211: remove warning in deauth caseHolger Schurig2009-10-271-6/+0
| | | | | | | | | | | | | | | | | | It might be the case that __cfg80211_disconnected() has already cleaned up wdev->current_bss() for us. The old code didn't catch that situation and didn't warn needlessly. Signed-off-by: Holger Schurig <hs4233@mail.mn-solutions.de> Signed-off-by: John W. Linville <linville@tuxdriver.com>
| * cfg80211: no cookies in cfg80211_send_XXX()Holger Schurig2009-10-272-34/+23
| | | | | | | | | | | | | | Get rid of cookies in cfg80211_send_XXX() functions. Signed-off-by: Holger Schurig <hs4233@mail.mn-solutions.de> Signed-off-by: John W. Linville <linville@tuxdriver.com>
* | net: Introduce dev_get_by_index_rcu()Eric Dumazet2009-10-291-10/+38
| | | | | | | | | | | | | | | | | | | | | | | | | | Some workloads hit dev_base_lock rwlock pretty hard. We can use RCU lookups to avoid touching this rwlock. netdevices are already freed after a RCU grace period, so this patch adds no penalty at device dismantle time. dev_ifname() converted to dev_get_by_index_rcu() Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | net: Cleanup redundant tests on unsignedroel kluin2009-10-293-6/+1
| | | | | | | | | | | | | | optlen is unsigned so the `< 0' test is never true. Signed-off-by: Roel Kluin <roel.kluin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | net: Cleanup redundant tests on unsignedroel kluin2009-10-292-2/+2
| | | | | | | | | | | | | | | | | | | | | | If there is data, the unsigned skb->len is greater than 0. rt.sigdigits is unsigned as well, so the test `>= 0' is always true, the other part of the test catches wrapped values. Signed-off-by: Roel Kluin <roel.kluin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | Allow disabling of DSACK TCP option per routeGilad Ben-Yossef2009-10-291-2/+6
| | | | | | | | | | | | | | | | | | Add and use no DSCAK bit in the features field. Signed-off-by: Gilad Ben-Yossef <gilad@codefidence.com> Sigend-off-by: Ori Finkelman <ori@comsleep.com> Sigend-off-by: Yony Amit <yony@comsleep.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | Allow to turn off TCP window scale opt per routeGilad Ben-Yossef2009-10-292-3/+6
| | | | | | | | | | | | | | | | | | | | | | | | Add and use no window scale bit in the features field. Note that this is not the same as setting a window scale of 0 as would happen with window limit on route. Signed-off-by: Gilad Ben-Yossef <gilad@codefidence.com> Sigend-off-by: Ori Finkelman <ori@comsleep.com> Sigend-off-by: Yony Amit <yony@comsleep.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | Allow disabling TCP timestamp options per routeGilad Ben-Yossef2009-10-292-3/+8
| | | | | | | | | | | | | | | | | | | | Implement querying and acting upon the no timestamp bit in the feature field. Signed-off-by: Gilad Ben-Yossef <gilad@codefidence.com> Sigend-off-by: Ori Finkelman <ori@comsleep.com> Sigend-off-by: Yony Amit <yony@comsleep.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | Add the no SACK route option featureGilad Ben-Yossef2009-10-292-2/+5
| | | | | | | | | | | | | | | | | | | | Implement querying and acting upon the no sack bit in the features field. Signed-off-by: Gilad Ben-Yossef <gilad@codefidence.com> Sigend-off-by: Ori Finkelman <ori@comsleep.com> Sigend-off-by: Yony Amit <yony@comsleep.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | Allow tcp_parse_options to consult dst entryGilad Ben-Yossef2009-10-296-41/+54
| | | | | | | | | | | | | | | | | | | | We need tcp_parse_options to be aware of dst_entry to take into account per dst_entry TCP options settings Signed-off-by: Gilad Ben-Yossef <gilad@codefidence.com> Sigend-off-by: Ori Finkelman <ori@comsleep.com> Sigend-off-by: Yony Amit <yony@comsleep.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | Only parse time stamp TCP option in time wait sockGilad Ben-Yossef2009-10-291-2/+2
| | | | | | | | | | | | | | | | | | | | | | Since we only use tcp_parse_options here to check for the exietence of TCP timestamp option in the header, it is better to call with the "established" flag on. Signed-off-by: Gilad Ben-Yossef <gilad@codefidence.com> Signed-off-by: Ori Finkelman <ori@comsleep.com> Signed-off-by: Yony Amit <yony@comsleep.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | ip6mr: Optimize multiple unregistrationEric Dumazet2009-10-291-5/+10
| | | | | | | | | | | | | | Speedup module unloading by factorizing synchronize_rcu() calls Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | ipv6 sit: Optimize multiple unregistrationEric Dumazet2009-10-291-6/+11
| | | | | | | | | | | | | | Speedup module unloading by factorizing synchronize_rcu() calls Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | ipmr: Optimize multiple unregistrationEric Dumazet2009-10-291-5/+10
| | | | | | | | | | | | | | Speedup module unloading by factorizing synchronize_rcu() calls Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | ip6tnl: Optimize multiple unregistrationEric Dumazet2009-10-291-3/+8
| | | | | | | | | | | | | | Speedup module unloading by factorizing synchronize_rcu() calls Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | bridge: Optimize multiple unregistrationEric Dumazet2009-10-291-10/+9
| | | | | | | | | | | | | | Speedup module unloading by factorizing synchronize_rcu() calls Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | vlan: Add support to netdev_ops.ndo_fcoe_get_wwn for VLAN deviceYi Zou2009-10-291-0/+13
| | | | | | | | | | | | | | | | Implements the netdev_ops.ndo_fcoe_get_wwn for VLAN device. Signed-off-by: Yi Zou <yi.zou@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | net: Corrected spelling error heurestics->heuristicsAndreas Petlund2009-10-281-2/+2
| | | | | | | | | | | | | | Corrected a spelling error in a function name. Signed-off-by: Andreas Petlund <apetlund@simula.no> Signed-off-by: David S. Miller <davem@davemloft.net>
* | net: sysfs: ethtool_ops can be NULLEric Dumazet2009-10-281-2/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit d519e17e2d01a0ee9abe083019532061b4438065 (net: export device speed and duplex via sysfs) made the wrong assumption that netdev->ethtool_ops was always set. This makes possible to crash kernel and let rtnl in locked state. modprobe dummy ip link set dummy0 up (udev runs and crash) Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Acked-by: Andy Gospodarek <andy@greyhouse.net> Signed-off-by: David S. Miller <davem@davemloft.net>
* | gre: Optimize multiple unregistrationEric Dumazet2009-10-281-5/+10
| | | | | | | | | | | | | | Speedup module unloading by factorizing synchronize_rcu() calls Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | ipip: Optimize multiple unregistrationEric Dumazet2009-10-281-6/+11
| | | | | | | | | | | | | | Speedup module unloading by factorizing synchronize_rcu() calls Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | vlan: Optimize multiple unregistrationEric Dumazet2009-10-282-16/+34
| | | | | | | | | | | | | | Use unregister_netdevice_many() to speedup master device unregister. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | net: add a list_head parameter to dellink() methodEric Dumazet2009-10-284-13/+13
| | | | | | | | | | | | | | | | | | Adding a list_head parameter to rtnl_link_ops->dellink() methods allow us to queue devices on a list, in order to dismantle them all at once. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | net: Introduce unregister_netdevice_many()Eric Dumazet2009-10-281-32/+65
| | | | | | | | | | | | | | | | | | | | Introduce rollback_registered_many() and unregister_netdevice_many() rollback_registered_many() is able to perform necessary steps at device dismantle time, factorizing two expensive synchronize_net() calls. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | net: Introduce unregister_netdevice_queue()Eric Dumazet2009-10-281-7/+13
| | | | | | | | | | | | | | | | | | This patchs adds an unreg_list anchor to struct net_device, and introduces an unregister_netdevice_queue() function, able to queue a net_device to a list instead of immediately unregister it. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | Merge branch 'master' of ↵David S. Miller2009-10-279-34/+107
|\ \ | | | | | | | | | | | | | | | | | | master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: drivers/net/sh_eth.c
| * | pktgen: Dont leak kernel memoryEric Dumazet2009-10-241-5/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | While playing with pktgen, I realized IP ID was not filled and a random value was taken, possibly leaking 2 bytes of kernel memory. We can use an increasing ID, this can help diagnostics anyway. Also clear packet payload, instead of leaking kernel memory. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | net: use WARN() for the WARN_ON in commit b6b39e8f3fbbbArjan van de Ven2009-10-221-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit b6b39e8f3fbbb (tcp: Try to catch MSG_PEEK bug) added a printk() to the WARN_ON() that's in tcp.c. This patch changes this combination to WARN(); the advantage of WARN() is that the printk message shows up inside the message, so that kerneloops.org will collect the message. In addition, this gets rid of an extra if() statement. Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | tcp: Try to catch MSG_PEEK bugHerbert Xu2009-10-201-1/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | This patch tries to print out more information when we hit the MSG_PEEK bug in tcp_recvmsg. It's been around since at least 2005 and it's about time that we finally fix it. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | net: Fix IP_MULTICAST_IFEric Dumazet2009-10-192-5/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ipv4/ipv6 setsockopt(IP_MULTICAST_IF) have dubious __dev_get_by_index() calls. This function should be called only with RTNL or dev_base_lock held, or reader could see a corrupt hash chain and eventually enter an endless loop. Fix is to call dev_get_by_index()/dev_put(). If this happens to be performance critical, we could define a new dev_exist_by_index() function to avoid touching dev refcount. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | bluetooth: static lock key fixDave Young2009-10-191-3/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When shutdown ppp connection, lockdep waring about non-static key will happen, it is caused by the lock is not initialized properly at that time. Fix with tuning the lock/skb_queue_head init order [ 94.339261] INFO: trying to register non-static key. [ 94.342509] the code is fine but needs lockdep annotation. [ 94.342509] turning off the locking correctness validator. [ 94.342509] Pid: 0, comm: swapper Not tainted 2.6.31-mm1 #2 [ 94.342509] Call Trace: [ 94.342509] [<c0248fbe>] register_lock_class+0x58/0x241 [ 94.342509] [<c024b5df>] ? __lock_acquire+0xb57/0xb73 [ 94.342509] [<c024ab34>] __lock_acquire+0xac/0xb73 [ 94.342509] [<c024b7fa>] ? lock_release_non_nested+0x17b/0x1de [ 94.342509] [<c024b662>] lock_acquire+0x67/0x84 [ 94.342509] [<c04cd1eb>] ? skb_dequeue+0x15/0x41 [ 94.342509] [<c054a857>] _spin_lock_irqsave+0x2f/0x3f [ 94.342509] [<c04cd1eb>] ? skb_dequeue+0x15/0x41 [ 94.342509] [<c04cd1eb>] skb_dequeue+0x15/0x41 [ 94.342509] [<c054a648>] ? _read_unlock+0x1d/0x20 [ 94.342509] [<c04cd641>] skb_queue_purge+0x14/0x1b [ 94.342509] [<fab94fdc>] l2cap_recv_frame+0xea1/0x115a [l2cap] [ 94.342509] [<c024b5df>] ? __lock_acquire+0xb57/0xb73 [ 94.342509] [<c0249c04>] ? mark_lock+0x1e/0x1c7 [ 94.342509] [<f8364963>] ? hci_rx_task+0xd2/0x1bc [bluetooth] [ 94.342509] [<fab95346>] l2cap_recv_acldata+0xb1/0x1c6 [l2cap] [ 94.342509] [<f8364997>] hci_rx_task+0x106/0x1bc [bluetooth] [ 94.342509] [<fab95295>] ? l2cap_recv_acldata+0x0/0x1c6 [l2cap] [ 94.342509] [<c02302c4>] tasklet_action+0x69/0xc1 [ 94.342509] [<c022fbef>] __do_softirq+0x94/0x11e [ 94.342509] [<c022fcaf>] do_softirq+0x36/0x5a [ 94.342509] [<c022fe14>] irq_exit+0x35/0x68 [ 94.342509] [<c0204ced>] do_IRQ+0x72/0x89 [ 94.342509] [<c02038ee>] common_interrupt+0x2e/0x34 [ 94.342509] [<c024007b>] ? pm_qos_add_requirement+0x63/0x9d [ 94.342509] [<c038e8a5>] ? acpi_idle_enter_bm+0x209/0x238 [ 94.342509] [<c049d238>] cpuidle_idle_call+0x5c/0x94 [ 94.342509] [<c02023f8>] cpu_idle+0x4e/0x6f [ 94.342509] [<c0534153>] rest_init+0x53/0x55 [ 94.342509] [<c0781894>] start_kernel+0x2f0/0x2f5 [ 94.342509] [<c0781091>] i386_start_kernel+0x91/0x96 Reported-by: Oliver Hartkopp <oliver@hartkopp.net> Signed-off-by: Dave Young <hidave.darkstar@gmail.com> Tested-by: Oliver Hartkopp <oliver@hartkopp.net> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | bluetooth: scheduling while atomic bug fixDave Young2009-10-191-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Due to driver core changes dev_set_drvdata will call kzalloc which should be in might_sleep context, but hci_conn_add will be called in atomic context Like dev_set_name move dev_set_drvdata to work queue function. oops as following: Oct 2 17:41:59 darkstar kernel: [ 438.001341] BUG: sleeping function called from invalid context at mm/slqb.c:1546 Oct 2 17:41:59 darkstar kernel: [ 438.001345] in_atomic(): 1, irqs_disabled(): 0, pid: 2133, name: sdptool Oct 2 17:41:59 darkstar kernel: [ 438.001348] 2 locks held by sdptool/2133: Oct 2 17:41:59 darkstar kernel: [ 438.001350] #0: (sk_lock-AF_BLUETOOTH-BTPROTO_L2CAP){+.+.+.}, at: [<faa1d2f5>] lock_sock+0xa/0xc [l2cap] Oct 2 17:41:59 darkstar kernel: [ 438.001360] #1: (&hdev->lock){+.-.+.}, at: [<faa20e16>] l2cap_sock_connect+0x103/0x26b [l2cap] Oct 2 17:41:59 darkstar kernel: [ 438.001371] Pid: 2133, comm: sdptool Not tainted 2.6.31-mm1 #2 Oct 2 17:41:59 darkstar kernel: [ 438.001373] Call Trace: Oct 2 17:41:59 darkstar kernel: [ 438.001381] [<c022433f>] __might_sleep+0xde/0xe5 Oct 2 17:41:59 darkstar kernel: [ 438.001386] [<c0298843>] __kmalloc+0x4a/0x15a Oct 2 17:41:59 darkstar kernel: [ 438.001392] [<c03f0065>] ? kzalloc+0xb/0xd Oct 2 17:41:59 darkstar kernel: [ 438.001396] [<c03f0065>] kzalloc+0xb/0xd Oct 2 17:41:59 darkstar kernel: [ 438.001400] [<c03f04ff>] device_private_init+0x15/0x3d Oct 2 17:41:59 darkstar kernel: [ 438.001405] [<c03f24c5>] dev_set_drvdata+0x18/0x26 Oct 2 17:41:59 darkstar kernel: [ 438.001414] [<fa51fff7>] hci_conn_init_sysfs+0x40/0xd9 [bluetooth] Oct 2 17:41:59 darkstar kernel: [ 438.001422] [<fa51cdc0>] ? hci_conn_add+0x128/0x186 [bluetooth] Oct 2 17:41:59 darkstar kernel: [ 438.001429] [<fa51ce0f>] hci_conn_add+0x177/0x186 [bluetooth] Oct 2 17:41:59 darkstar kernel: [ 438.001437] [<fa51cf8a>] hci_connect+0x3c/0xfb [bluetooth] Oct 2 17:41:59 darkstar kernel: [ 438.001442] [<faa20e87>] l2cap_sock_connect+0x174/0x26b [l2cap] Oct 2 17:41:59 darkstar kernel: [ 438.001448] [<c04c8df5>] sys_connect+0x60/0x7a Oct 2 17:41:59 darkstar kernel: [ 438.001453] [<c024b703>] ? lock_release_non_nested+0x84/0x1de Oct 2 17:41:59 darkstar kernel: [ 438.001458] [<c028804b>] ? might_fault+0x47/0x81 Oct 2 17:41:59 darkstar kernel: [ 438.001462] [<c028804b>] ? might_fault+0x47/0x81 Oct 2 17:41:59 darkstar kernel: [ 438.001468] [<c033361f>] ? __copy_from_user_ll+0x11/0xce Oct 2 17:41:59 darkstar kernel: [ 438.001472] [<c04c9419>] sys_socketcall+0x82/0x17b Oct 2 17:41:59 darkstar kernel: [ 438.001477] [<c020329d>] syscall_call+0x7/0xb Signed-off-by: Dave Young <hidave.darkstar@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | tcp: fix TCP_DEFER_ACCEPT retrans calculationJulian Anastasov2009-10-191-12/+43
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fix TCP_DEFER_ACCEPT conversion between seconds and retransmission to match the TCP SYN-ACK retransmission periods because the time is converted to such retransmissions. The old algorithm selects one more retransmission in some cases. Allow up to 255 retransmissions. Signed-off-by: Julian Anastasov <ja@ssi.bg> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | tcp: reduce SYN-ACK retrans for TCP_DEFER_ACCEPTJulian Anastasov2009-10-191-3/+31
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Change SYN-ACK retransmitting code for the TCP_DEFER_ACCEPT users to not retransmit SYN-ACKs during the deferring period if ACK from client was received. The goal is to reduce traffic during the deferring period. When the period is finished we continue with sending SYN-ACKs (at least one) but this time any traffic from client will change the request to established socket allowing application to terminate it properly. Also, do not drop acked request if sending of SYN-ACK fails. Signed-off-by: Julian Anastasov <ja@ssi.bg> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | tcp: accept socket after TCP_DEFER_ACCEPT periodJulian Anastasov2009-10-191-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Willy Tarreau and many other folks in recent years were concerned what happens when the TCP_DEFER_ACCEPT period expires for clients which sent ACK packet. They prefer clients that actively resend ACK on our SYN-ACK retransmissions to be converted from open requests to sockets and queued to the listener for accepting after the deferring period is finished. Then application server can decide to wait longer for data or to properly terminate the connection with FIN if read() returns EAGAIN which is an indication for accepting after the deferring period. This change still can have side effects for applications that expect always to see data on the accepted socket. Others can be prepared to work in both modes (with or without TCP_DEFER_ACCEPT period) and their data processing can ignore the read=EAGAIN notification and to allocate resources for clients which proved to have no data to send during the deferring period. OTOH, servers that use TCP_DEFER_ACCEPT=1 as flag (not as a timeout) to wait for data will notice clients that didn't send data for 3 seconds but that still resend ACKs. Thanks to Willy Tarreau for the initial idea and to Eric Dumazet for the review and testing the change. Signed-off-by: Julian Anastasov <ja@ssi.bg> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | Revert "tcp: fix tcp_defer_accept to consider the timeout"David S. Miller2009-10-191-1/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | This reverts commit 6d01a026b7d3009a418326bdcf313503a314f1ea. Julian Anastasov, Willy Tarreau and Eric Dumazet have come up with a more correct way to deal with this. Signed-off-by: David S. Miller <davem@davemloft.net>
| * | AF_UNIX: Fix deadlock on connecting to shutdown socketTomoki Sekiyama2009-10-181-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | I found a deadlock bug in UNIX domain socket, which makes able to DoS attack against the local machine by non-root users. How to reproduce: 1. Make a listening AF_UNIX/SOCK_STREAM socket with an abstruct namespace(*), and shutdown(2) it. 2. Repeat connect(2)ing to the listening socket from the other sockets until the connection backlog is full-filled. 3. connect(2) takes the CPU forever. If every core is taken, the system hangs. PoC code: (Run as many times as cores on SMP machines.) int main(void) { int ret; int csd; int lsd; struct sockaddr_un sun; /* make an abstruct name address (*) */ memset(&sun, 0, sizeof(sun)); sun.sun_family = PF_UNIX; sprintf(&sun.sun_path[1], "%d", getpid()); /* create the listening socket and shutdown */ lsd = socket(AF_UNIX, SOCK_STREAM, 0); bind(lsd, (struct sockaddr *)&sun, sizeof(sun)); listen(lsd, 1); shutdown(lsd, SHUT_RDWR); /* connect loop */ alarm(15); /* forcely exit the loop after 15 sec */ for (;;) { csd = socket(AF_UNIX, SOCK_STREAM, 0); ret = connect(csd, (struct sockaddr *)&sun, sizeof(sun)); if (-1 == ret) { perror("connect()"); break; } puts("Connection OK"); } return 0; } (*) Make sun_path[0] = 0 to use the abstruct namespace. If a file-based socket is used, the system doesn't deadlock because of context switches in the file system layer. Why this happens: Error checks between unix_socket_connect() and unix_wait_for_peer() are inconsistent. The former calls the latter to wait until the backlog is processed. Despite the latter returns without doing anything when the socket is shutdown, the former doesn't check the shutdown state and just retries calling the latter forever. Patch: The patch below adds shutdown check into unix_socket_connect(), so connect(2) to the shutdown socket will return -ECONREFUSED. Signed-off-by: Tomoki Sekiyama <tomoki.sekiyama.qu@hitachi.com> Signed-off-by: Masanori Yoshida <masanori.yoshida.tv@hitachi.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | vlan: allow null VLAN ID to be usedEric Dumazet2009-10-274-5/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We currently use a 16 bit field (vlan_tci) to store VLAN ID/PRIO on a skb. Null value is used as a special value, meaning vlan tagging not enabled. This forbids use of null vlan ID. As pointed by David, some drivers use the 3 high order bits (PRIO) As VLAN ID is 12 bits, we can use the remaining bit (CFI) as a flag, and allow null VLAN ID. In case future code really wants to use VLAN_CFI_MASK, we'll have to use a bit outside of vlan_tci. #define VLAN_PRIO_MASK 0xe000 /* Priority Code Point */ #define VLAN_PRIO_SHIFT 13 #define VLAN_CFI_MASK 0x1000 /* Canonical Format Indicator */ #define VLAN_TAG_PRESENT VLAN_CFI_MASK #define VLAN_VID_MASK 0x0fff /* VLAN Identifier */ Reported-by: Gertjan Hofman <gertjan_hofman@yahoo.com> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | rtnetlink: speedup rtnl_dump_ifinfo()Eric Dumazet2009-10-242-18/+26
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When handling large number of netdevice, rtnl_dump_ifinfo() is very slow because it has O(N^2) complexity. Instead of scanning one single list, we can use the 256 sub lists of the dev_index hash table. This considerably speedups "ip link" operations Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | gre: convert hash tables locking to RCUEric Dumazet2009-10-241-17/+23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | GRE tunnels use one rwlock to protect their hash tables. This locking scheme can be converted to RCU for free, since netdevice already must wait for a RCU grace period at dismantle time. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | ip6tnl: convert hash tables locking to RCUEric Dumazet2009-10-241-19/+25
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ip6_tunnels use one rwlock to protect their hash tables. This locking scheme can be converted to RCU for free, since netdevice already must wait for a RCU grace period at dismantle time. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | ipip: convert hash tables locking to RCUEric Dumazet2009-10-241-21/+28
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | IPIP tunnels use one rwlock to protect their hash tables. This locking scheme can be converted to RCU for free, since netdevice already must wait for a RCU grace period at dismantle time. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>