aboutsummaryrefslogtreecommitdiffstats
path: root/net/core/rtnetlink.c
Commit message (Collapse)AuthorAgeFilesLines
* net: RTNETLINK adjusting values of min_ifinfo_dump_sizeStefan Gula2012-01-261-0/+3
| | | | | | | | | Setting link parameters on a netdevice changes the value of if_nlmsg_size(), therefore it is necessary to recalculate min_ifinfo_dump_size. Signed-off-by: Stefan Gula <steweg@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* Merge branch 'for-linus' of git://selinuxproject.org/~jmorris/linux-securityLinus Torvalds2012-01-141-1/+1
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * 'for-linus' of git://selinuxproject.org/~jmorris/linux-security: capabilities: remove __cap_full_set definition security: remove the security_netlink_recv hook as it is equivalent to capable() ptrace: do not audit capability check when outputing /proc/pid/stat capabilities: remove task_ns_* functions capabitlies: ns_capable can use the cap helpers rather than lsm call capabilities: style only - move capable below ns_capable capabilites: introduce new has_ns_capabilities_noaudit capabilities: call has_ns_capability from has_capability capabilities: remove all _real_ interfaces capabilities: introduce security_capable_noaudit capabilities: reverse arguments to security_capable capabilities: remove the task from capable LSM hook entirely selinux: sparse fix: fix several warnings in the security server cod selinux: sparse fix: fix warnings in netlink code selinux: sparse fix: eliminate warnings for selinuxfs selinux: sparse fix: declare selinux_disable() in security.h selinux: sparse fix: move selinux_complete_init selinux: sparse fix: make selinux_secmark_refcount static SELinux: Fix RCU deref check warning in sel_netport_insert() Manually fix up a semantic mis-merge wrt security_netlink_recv(): - the interface was removed in commit fd7784615248 ("security: remove the security_netlink_recv hook as it is equivalent to capable()") - a new user of it appeared in commit a38f7907b926 ("crypto: Add userspace configuration API") causing no automatic merge conflict, but Eric Paris pointed out the issue.
| * security: remove the security_netlink_recv hook as it is equivalent to capable()Eric Paris2012-01-051-1/+1
| | | | | | | | | | | | | | | | | | | | Once upon a time netlink was not sync and we had to get the effective capabilities from the skb that was being received. Today we instead get the capabilities from the current task. This has rendered the entire purpose of the hook moot as it is now functionally equivalent to the capable() call. Signed-off-by: Eric Paris <eparis@redhat.com>
* | rtnetlink: rtnl_link_register() sanity testEric Dumazet2011-12-141-11/+14
| | | | | | | | | | | | | | | | | | | | | | Before adding a struct rtnl_link_ops into link_ops list, check it doesnt clash with a prior one. Based on a previous patch from Alexander Smirnov Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> CC: Alexander Smirnov <alex.bluesman.smirnov@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | if_link: Add additional parameter to IFLA_VF_INFO for spoof checkingGreg Rose2011-10-161-3/+30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add configuration setting for drivers to turn spoof checking on or off for discrete VFs. v2 - Fix indentation problem, wrap the ifla_vf_info structure in #ifdef __KERNEL__ to prevent user space from accessing and change function paramater for the spoof check setting netdev op from u8 to bool. v3 - Preset spoof check setting to -1 so that user space tools such as ip can detect that the driver didn't report a spoofcheck setting. Prevents incorrect display of spoof check settings for drivers that don't report it. Signed-off-by: Greg Rose <gregory.v.rose@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
* | rtnetlink: remove initialization of dev->real_num_tx_queuesJiri Pirko2011-08-111-1/+0
|/ | | | | | | dev->real_num_tx_queues is correctly set already in alloc_netdev_mqs. Signed-off-by: Jiri Pirko <jpirko@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* rtnl: provide link dump consistency infoThomas Graf2011-07-011-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | This patch adds a change sequence counter to each net namespace which is bumped whenever a netdevice is added or removed from the list. If such a change occurred while a link dump took place, the dump will have the NLM_F_DUMP_INTR flag set in the first message which has been interrupted and in all subsequent messages of the same dump. Note that links may still be modified or renamed while a dump is taking place but we can guarantee for userspace to receive a complete list of links and not miss any. Testing: I have added 500 VLAN netdevices to make sure the dump is split over multiple messages. Then while continuously dumping links in one process I also continuously deleted and re-added a dummy netdevice in another process. Multiple dumps per seconds have had the NLM_F_DUMP_INTR flag set. I guess we can wait for Johannes patch to hit net-next via the wireless tree. I just wanted to give this some testing right away. Signed-off-by: Thomas Graf <tgraf@infradead.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* rtnetlink: Compute and store minimum ifinfo dump sizeGreg Rose2011-06-091-11/+49
| | | | | | | | | | | | | | | The message size allocated for rtnl ifinfo dumps was limited to a single page. This is not enough for additional interface info available with devices that support SR-IOV and caused a bug in which VF info would not be displayed if more than approximately 40 VFs were created per interface. Implement a new function pointer for the rtnl_register service that will calculate the amount of data required for the ifinfo dump and allocate enough data to satisfy the request. Signed-off-by: Greg Rose <gregory.v.rose@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
* Merge git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/linux-2.6-nsfdLinus Torvalds2011-05-251-1/+4
|\ | | | | | | | | | | | | | | | | | | | | | | | | * git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/linux-2.6-nsfd: net: fix get_net_ns_by_fd for !CONFIG_NET_NS ns proc: Return -ENOENT for a nonexistent /proc/self/ns/ entry. ns: Declare sys_setns in syscalls.h net: Allow setting the network namespace by fd ns proc: Add support for the ipc namespace ns proc: Add support for the uts namespace ns proc: Add support for the network namespace. ns: Introduce the setns syscall ns: proc files for namespace naming policy.
| * net: Allow setting the network namespace by fdEric W. Biederman2011-05-101-1/+4
| | | | | | | | | | | | | | | | | | | | Take advantage of the new abstraction and allow network devices to be placed in any network namespace that we have a fd to talk about. Acked-by: David S. Miller <davem@davemloft.net> Acked-by: Daniel Lezcano <daniel.lezcano@free.fr> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
* | net: hold rtnl again in dump callbacksEric Dumazet2011-05-251-2/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit e67f88dd12f6 (dont hold rtnl mutex during netlink dump callbacks) missed fact that rtnl_fill_ifinfo() must be called with rtnl held. Because of possible deadlocks between two mutexes (cb_mutex and rtnl), its not easy to solve this problem, so revert this part of the patch. It also forgot one rcu_read_unlock() in FIB dump_rules() Add one ASSERT_RTNL() in rtnl_fill_ifinfo() to remind us the rule. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> CC: Patrick McHardy <kaber@trash.net> CC: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | rtnetlink: ignore NETDEV_RELEASE and NETDEV_JOIN eventAmerigo Wang2011-05-221-0/+2
| | | | | | | | | | | | | | These two events are not expected to be caught by userspace. Signed-off-by: WANG Cong <amwang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | net: use batched device unregister in veth and macvlanEric Dumazet2011-05-091-1/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | veth devices dont use the batched device unregisters yet. Since veth are a pair of devices, it makes sense to use a batch of two unregisters, this roughly divides dismantle time by two. Fix this by changing dellink() callers to always provide a non NULL head. (Idea from Michał Mirosław) This patch also handles macvlan case : We now dismantle all macvlans on top of a lower dev at once. Reported-by: Alex Bligh <alex@alex.org.uk> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Cc: Michał Mirosław <mirqus@gmail.com> Cc: Jesse Gross <jesse@nicira.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Ben Greear <greearb@candelatech.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | net: call dev_alloc_name from register_netdeviceJiri Pirko2011-05-051-8/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Force dev_alloc_name() to be called from register_netdevice() by dev_get_valid_name(). That allows to remove multiple explicit dev_alloc_name() calls. The possibility to call dev_alloc_name in advance remains. This also fixes veth creation regresion caused by 84c49d8c3e4abefb0a41a77b25aa37ebe8d6b743 Signed-off-by: Jiri Pirko <jpirko@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | net: dont hold rtnl mutex during netlink dump callbacksEric Dumazet2011-05-021-7/+5
|/ | | | | | | | | | | | | | | | | | | | | | | | | | | | Four years ago, Patrick made a change to hold rtnl mutex during netlink dump callbacks. I believe it was a wrong move. This slows down concurrent dumps, making good old /proc/net/ files faster than rtnetlink in some situations. This occurred to me because one "ip link show dev ..." was _very_ slow on a workload adding/removing network devices in background. All dump callbacks are able to use RCU locking now, so this patch does roughly a revert of commits : 1c2d670f366 : [RTNETLINK]: Hold rtnl_mutex during netlink dump callbacks 6313c1e0992 : [RTNETLINK]: Remove unnecessary locking in dump callbacks This let writers fight for rtnl mutex and readers going full speed. It also takes care of phonet : phonet_route_get() is now called from rcu read section. I renamed it to phonet_route_get_rcu() Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Cc: Patrick McHardy <kaber@trash.net> Cc: Remi Denis-Courmont <remi.denis-courmont@nokia.com> Acked-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* Fix common misspellingsLucas De Marchi2011-03-311-2/+2
| | | | | | Fixes generated by 'codespell' and manually reviewed. Signed-off-by: Lucas De Marchi <lucas.demarchi@profusion.mobi>
* rtnetlink: implement setting of master deviceJiri Pirko2011-02-131-0/+43
| | | | | | | | | | This patch allows userspace to enslave/release slave devices via netlink interface using IFLA_MASTER. This introduces generic way to add/remove underling devices. Signed-off-by: Jiri Pirko <jpirko@redhat.com> Acked-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
* Merge branch 'master' of ↵David S. Miller2011-01-311-0/+3
|\ | | | | | | master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
| * net: Fix ip link add netns oopsEric W. Biederman2011-01-301-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Ed Swierk <eswierk@bigswitch.com> writes: > On 2.6.35.7 > ip link add link eth0 netns 9999 type macvlan > where 9999 is a nonexistent PID triggers an oops and causes all network functions to hang: > [10663.821898] BUG: unable to handle kernel NULL pointer dereference at 000000000000006d > [10663.821917] IP: [<ffffffff8149c2fa>] __dev_alloc_name+0x9a/0x170 > [10663.821933] PGD 1d3927067 PUD 22f5c5067 PMD 0 > [10663.821944] Oops: 0000 [#1] SMP > [10663.821953] last sysfs file: /sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq > [10663.821959] CPU 3 > [10663.821963] Modules linked in: macvlan ip6table_filter ip6_tables rfcomm ipt_MASQUERADE binfmt_misc iptable_nat nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack sco ipt_REJECT bnep l2cap xt_tcpudp iptable_filter ip_tables x_tables bridge stp vboxnetadp vboxnetflt vboxdrv kvm_intel kvm parport_pc ppdev snd_hda_codec_intelhdmi snd_hda_codec_conexant arc4 iwlagn iwlcore mac80211 snd_hda_intel snd_hda_codec snd_hwdep snd_pcm snd_seq_midi snd_rawmidi i915 snd_seq_midi_event snd_seq thinkpad_acpi drm_kms_helper btusb tpm_tis nvram uvcvideo snd_timer snd_seq_device bluetooth videodev v4l1_compat v4l2_compat_ioctl32 tpm drm tpm_bios snd cfg80211 psmouse serio_raw intel_ips soundcore snd_page_alloc intel_agp i2c_algo_bit video output netconsole configfs lp parport usbhid hid e1000e sdhci_pci ahci libahci sdhci led_class > [10663.822155] > [10663.822161] Pid: 6000, comm: ip Not tainted 2.6.35-23-generic #41-Ubuntu 2901CTO/2901CTO > [10663.822167] RIP: 0010:[<ffffffff8149c2fa>] [<ffffffff8149c2fa>] __dev_alloc_name+0x9a/0x170 > [10663.822177] RSP: 0018:ffff88014aebf7b8 EFLAGS: 00010286 > [10663.822182] RAX: 00000000fffffff4 RBX: ffff8801ad900800 RCX: 0000000000000000 > [10663.822187] RDX: ffff880000000000 RSI: 0000000000000000 RDI: ffff88014ad63000 > [10663.822191] RBP: ffff88014aebf808 R08: 0000000000000041 R09: 0000000000000041 > [10663.822196] R10: 0000000000000000 R11: dead000000200200 R12: ffff88014aebf818 > [10663.822201] R13: fffffffffffffffd R14: ffff88014aebf918 R15: ffff88014ad62000 > [10663.822207] FS: 00007f00c487f700(0000) GS:ffff880001f80000(0000) knlGS:0000000000000000 > [10663.822212] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > [10663.822216] CR2: 000000000000006d CR3: 0000000231f19000 CR4: 00000000000026e0 > [10663.822221] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 > [10663.822226] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 > [10663.822231] Process ip (pid: 6000, threadinfo ffff88014aebe000, task ffff88014afb16e0) > [10663.822236] Stack: > [10663.822240] ffff88014aebf808 ffffffff814a2bb5 ffff88014aebf7e8 00000000a00ee8d6 > [10663.822251] <0> 0000000000000000 ffffffffa00ef940 ffff8801ad900800 ffff88014aebf818 > [10663.822265] <0> ffff88014aebf918 ffff8801ad900800 ffff88014aebf858 ffffffff8149c413 > [10663.822281] Call Trace: > [10663.822290] [<ffffffff814a2bb5>] ? dev_addr_init+0x75/0xb0 > [10663.822298] [<ffffffff8149c413>] dev_alloc_name+0x43/0x90 > [10663.822307] [<ffffffff814a85ee>] rtnl_create_link+0xbe/0x1b0 > [10663.822314] [<ffffffff814ab2aa>] rtnl_newlink+0x48a/0x570 > [10663.822321] [<ffffffff814aafcc>] ? rtnl_newlink+0x1ac/0x570 > [10663.822332] [<ffffffff81030064>] ? native_x2apic_icr_read+0x4/0x20 > [10663.822339] [<ffffffff814a8c17>] rtnetlink_rcv_msg+0x177/0x290 > [10663.822346] [<ffffffff814a8aa0>] ? rtnetlink_rcv_msg+0x0/0x290 > [10663.822354] [<ffffffff814c25d9>] netlink_rcv_skb+0xa9/0xd0 > [10663.822360] [<ffffffff814a8a85>] rtnetlink_rcv+0x25/0x40 > [10663.822367] [<ffffffff814c223e>] netlink_unicast+0x2de/0x2f0 > [10663.822374] [<ffffffff814c303e>] netlink_sendmsg+0x1fe/0x2e0 > [10663.822383] [<ffffffff81488533>] sock_sendmsg+0xf3/0x120 > [10663.822391] [<ffffffff815899fe>] ? _raw_spin_lock+0xe/0x20 > [10663.822400] [<ffffffff81168656>] ? __d_lookup+0x136/0x150 > [10663.822406] [<ffffffff815899fe>] ? _raw_spin_lock+0xe/0x20 > [10663.822414] [<ffffffff812b7a0d>] ? _atomic_dec_and_lock+0x4d/0x80 > [10663.822422] [<ffffffff8116ea90>] ? mntput_no_expire+0x30/0x110 > [10663.822429] [<ffffffff81486ff5>] ? move_addr_to_kernel+0x65/0x70 > [10663.822435] [<ffffffff81493308>] ? verify_iovec+0x88/0xe0 > [10663.822442] [<ffffffff81489020>] sys_sendmsg+0x240/0x3a0 > [10663.822450] [<ffffffff8111e2a9>] ? __do_fault+0x479/0x560 > [10663.822457] [<ffffffff815899fe>] ? _raw_spin_lock+0xe/0x20 > [10663.822465] [<ffffffff8116cf4a>] ? alloc_fd+0x10a/0x150 > [10663.822473] [<ffffffff8158d76e>] ? do_page_fault+0x15e/0x350 > [10663.822482] [<ffffffff8100a0f2>] system_call_fastpath+0x16/0x1b > [10663.822487] Code: 90 48 8d 78 02 be 25 00 00 00 e8 92 1d e2 ff 48 85 c0 75 cf bf 20 00 00 00 e8 c3 b1 c6 ff 49 89 c7 b8 f4 ff ff ff 4d 85 ff 74 bd <4d> 8b 75 70 49 8d 45 70 48 89 45 b8 49 83 ee 58 eb 28 48 8d 55 > [10663.822618] RIP [<ffffffff8149c2fa>] __dev_alloc_name+0x9a/0x170 > [10663.822627] RSP <ffff88014aebf7b8> > [10663.822631] CR2: 000000000000006d > [10663.822636] ---[ end trace 3dfd6c3ad5327ca7 ]--- This bug was introduced in: commit 81adee47dfb608df3ad0b91d230fb3cef75f0060 Author: Eric W. Biederman <ebiederm@aristanetworks.com> Date: Sun Nov 8 00:53:51 2009 -0800 net: Support specifying the network namespace upon device creation. There is no good reason to not support userspace specifying the network namespace during device creation, and it makes it easier to create a network device and pass it to a child network namespace with a well known name. We have to be careful to ensure that the target network namespace for the new device exists through the life of the call. To keep that logic clear I have factored out the network namespace grabbing logic into rtnl_link_get_net. In addtion we need to continue to pass the source network namespace to the rtnl_link_ops.newlink method so that we can find the base device source network namespace. Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Where apparently I forgot to add error handling to the path where we create a new network device in a new network namespace, and pass in an invalid pid. Cc: stable@kernel.org Reported-by: Ed Swierk <eswierk@bigswitch.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | Merge branch 'master' of ↵David S. Miller2011-01-271-2/+1
|\ \ | |/ | | | | master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
| * net: fix validate_link_af in rtnetlink coreKurt Van Dijck2011-01-271-2/+1
| | | | | | | | | | | | | | | | | | | | | | | | I'm testing an API that uses IFLA_AF_SPEC attribute. In the rtnetlink core , the set_link_af() member of the rtnl_af_ops struct receives the nested attribute (as I expected), but the validate_link_af() member receives the parent attribute. IMO, this patch fixes this. Signed-off-by: Kurt Van Dijck <kurt.van.dijck@eia.be> Signed-off-by: David S. Miller <davem@davemloft.net>
* | Merge branch 'master' of ↵David S. Miller2011-01-241-1/+1
|\ \ | |/ | | | | | | | | | | | | | | master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: net/sched/sch_hfsc.c net/sched/sch_htb.c net/sched/sch_tbf.c
| * Revert "netlink: test for all flags of the NLM_F_DUMP composite"David S. Miller2011-01-191-1/+1
| | | | | | | | | | | | | | | | This reverts commit 0ab03c2b1478f2438d2c80204f7fef65b1bca9cf. It breaks several things including the avahi daemon. Signed-off-by: David S. Miller <davem@davemloft.net>
* | rtnetlink: fix link attribute validation with IFLA_GROUPPatrick McHardy2011-01-201-5/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | rtnl_group_changelink() is invoked by rtnl_newlink() before the link attributes have been validated. Additionally the group changes are performed even if NLM_F_CREATE is specified and a new link is created, while more reasonable semantics would be to set the group value on the newly created link. Fix both problems by moving the rtnl_group_changelink() invocation down to the handling of non-existant links without NLM_F_CREATE() and add a dev_set_group() call to rtnl_create_link(). Signed-off-by: Patrick McHardy <kaber@trash.net> Acked-by: Vlad Dogaru <ddvlad@rosedu.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* | netlink: support setting devgroup parametersVlad Dogaru2011-01-191-4/+28
| | | | | | | | | | | | | | | | | | | | If a rtnetlink request specifies a negative or zero ifindex and has no interface name attribute, but has a group attribute, then the chenges are made to all the interfaces belonging to the specified group. Signed-off-by: Vlad Dogaru <ddvlad@rosedu.org> Acked-by: Jamal Hadi Salim <hadi@cyberus.ca> Signed-off-by: David S. Miller <davem@davemloft.net>
* | net_device: add support for network device groupsVlad Dogaru2011-01-191-0/+6
|/ | | | | | | | | | Net devices can now be grouped, enabling simpler manipulation from userspace. This patch adds a group field to the net_device structure, as well as rtnetlink support to query and modify it. Signed-off-by: Vlad Dogaru <ddvlad@rosedu.org> Acked-by: Jamal Hadi Salim <hadi@cyberus.ca> Signed-off-by: David S. Miller <davem@davemloft.net>
* netlink: test for all flags of the NLM_F_DUMP compositeJan Engelhardt2011-01-091-1/+1
| | | | | | | | | | | | | Due to NLM_F_DUMP is composed of two bits, NLM_F_ROOT | NLM_F_MATCH, when doing "if (x & NLM_F_DUMP)", it tests for _either_ of the bits being set. Because NLM_F_MATCH's value overlaps with NLM_F_EXCL, non-dump requests with NLM_F_EXCL set are mistaken as dump requests. Substitute the condition to test for _all_ bits being set. Signed-off-by: Jan Engelhardt <jengelh@medozas.de> Acked-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* rtnl: make link af-specific updates atomicThomas Graf2010-11-271-5/+24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | As David pointed out correctly, updates to af-specific attributes are currently not atomic. If multiple changes are requested and one of them fails, previous updates may have been applied already leaving the link behind in a undefined state. This patch splits the function parse_link_af() into two functions validate_link_af() and set_link_at(). validate_link_af() is placed to validate_linkmsg() check for errors as early as possible before any changes to the link have been made. set_link_af() is called to commit the changes later. This method is not fail proof, while it is currently sufficient to make set_link_af() inerrable and thus 100% atomic, the validation function method will not be able to detect all error scenarios in the future, there will likely always be errors depending on states which are f.e. not protected by rtnl_mutex and thus may change between validation and setting. Also, instead of silently ignoring unknown address families and config blocks for address families which did not register a set function the errors EAFNOSUPPORT respectively EOPNOSUPPORT are returned to avoid comitting 4 out of 5 update requests without notifying the user. Signed-off-by: Thomas Graf <tgraf@infradead.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* rtnetlink: Link address family APIThomas Graf2010-11-171-2/+145
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Each net_device contains address family specific data such as per device settings and statistics. We already expose this data via procfs/sysfs and partially netlink. The netlink method requires the requester to send one RTM_GETLINK request for each address family it wishes to receive data of and then merge this data itself. This patch implements a new API which combines all address family specific link data in a new netlink attribute IFLA_AF_SPEC. IFLA_AF_SPEC contains a sequence of nested attributes, one for each address family which in turn defines the structure of its own attribute. Example: [IFLA_AF_SPEC] = { [AF_INET] = { [IFLA_INET_CONF] = ..., }, [AF_INET6] = { [IFLA_INET6_FLAGS] = ..., [IFLA_INET6_CONF] = ..., } } The API also allows for address families to implement a function which parses the IFLA_AF_SPEC attribute sent by userspace to implement address family specific link options. Signed-off-by: Thomas Graf <tgraf@infradead.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* rtnetlink: Fix message size calculation for link messagesThomas Graf2010-11-121-4/+5
| | | | | | | | | | | | | | | | | | | nlmsg_total_size() calculates the length of a netlink message including header and alignment. nla_total_size() calculates the space an individual attribute consumes which was meant to be used in this context. Also, ensure to account for the attribute header for the IFLA_INFO_XSTATS attribute as implementations of get_xstats_size() seem to assume that we do so. The addition of two message headers minus the missing attribute header resulted in a calculated message size that was larger than required. Therefore we never risked running out of skb tailroom. Signed-off-by: Thomas Graf <tgraf@infradead.org> Acked-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
* rtnetlink: remove rtnl_kill_linksstephen hemminger2010-10-211-8/+0
| | | | | | | The function rtnl_kill_links is defined but never used. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: copy_rtnl_link_stats64() simplificationEric Dumazet2010-08-231-30/+1
| | | | | | | | | No need to use a temporary struct rtnl_link_stats64 variable, just copy the source to skb buffer. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Reviewed-by: Ben Hutchings <bhutchings@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net/core: EXPORT_SYMBOL cleanupsEric Dumazet2010-07-121-1/+1
| | | | | | | | | CodingStyle cleanups EXPORT_SYMBOL should immediately follow the symbol declaration. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: fix 64 bit counters on 32 bit archesEric Dumazet2010-07-071-1/+2
| | | | | | | | | | | | | | | | | | There is a small possibility that a reader gets incorrect values on 32 bit arches. SNMP applications could catch incorrect counters when a 32bit high part is changed by another stats consumer/provider. One way to solve this is to add a rtnl_link_stats64 param to all ndo_get_stats64() methods, and also add such a parameter to dev_get_stats(). Rule is that we are not allowed to use dev->stats64 as a temporary storage for 64bit stats, but a caller provided area (usually on stack) Old drivers (only providing get_stats() method) need no changes. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: Enable 64-bit net device statistics on 32-bit architecturesBen Hutchings2010-06-121-3/+3
| | | | | | | | | | | | | | | | | | | | Use struct rtnl_link_stats64 as the statistics structure. On 32-bit architectures, insert 32 bits of padding after/before each field of struct net_device_stats to make its layout compatible with struct rtnl_link_stats64. Add an anonymous union in net_device; move stats into the union and add struct rtnl_link_stats64 stats64. Add net_device_ops::ndo_get_stats64, implementations of which will return a pointer to struct rtnl_link_stats64. Drivers that implement this operation must not update the structure asynchronously. Change dev_get_stats() to call ndo_get_stats64 if available, and to return a pointer to struct rtnl_link_stats64. Change callers of dev_get_stats() accordingly. Signed-off-by: Ben Hutchings <bhutchings@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* netlink: bug fix: wrong size was calculated for vfinfo list blobScott Feldman2010-05-281-5/+6
| | | | | | | | | | The wrong size was being calculated for vfinfo. In one case, it was over- calculating using nlmsg_total_size on attrs, in another case, it was under-calculating by assuming ifla_vf_* structs are packed together, but each struct is it's own attr w/ hdr (and padding). Signed-off-by: Scott Feldman <scofeldm@cisco.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* netlink: bug fix: don't overrun skbs on vf_port dumpScott Feldman2010-05-281-6/+9
| | | | | | | | | | | Noticed by Patrick McHardy: was continuing to fill skb after a nla_put_failure, ignoring the size calculated by upper layer. Now, return -EMSGSIZE on any overruns, but also allow netdev to fail ndo_get_vf_port with error other than -EMSGSIZE, thus unwinding nest. Signed-off-by: Scott Feldman <scofeldm@cisco.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* rtnetlink: Fix error handling in do_setlink()David Howells2010-05-231-1/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit c02db8c6290bb992442fec1407643c94cc414375: Author: Chris Wright <chrisw@sous-sol.org> Date: Sun May 16 01:05:45 2010 -0700 Subject: rtnetlink: make SR-IOV VF interface symmetric adds broken error handling to do_setlink() in net/core/rtnetlink.c. The problem is the following chunk of code: if (tb[IFLA_VFINFO_LIST]) { struct nlattr *attr; int rem; nla_for_each_nested(attr, tb[IFLA_VFINFO_LIST], rem) { if (nla_type(attr) != IFLA_VF_INFO) ----> goto errout; err = do_setvfinfo(dev, attr); if (err < 0) goto errout; modified = 1; } } which can get to errout without setting err, resulting in the following error: net/core/rtnetlink.c: In function 'do_setlink': net/core/rtnetlink.c:904: warning: 'err' may be used uninitialized in this function Change the code to return -EINVAL in this case. Note that this might not be the appropriate error though. Signed-off-by: David Howells <dhowells@redhat.com> cc: Chris Wright <chrisw@sous-sol.org> cc: David S. Miller <davem@davemloft.net> Acked-by: Chris Wright <chrisw@sous-sol.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: Add netlink support for virtual port management (was iovnl)Scott Feldman2010-05-171-1/+168
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add new netdev ops ndo_{set|get}_vf_port to allow setting of port-profile on a netdev interface. Extends netlink socket RTM_SETLINK/ RTM_GETLINK with two new sub msgs called IFLA_VF_PORTS and IFLA_PORT_SELF (added to end of IFLA_cmd list). These are both nested atrtibutes using this layout: [IFLA_NUM_VF] [IFLA_VF_PORTS] [IFLA_VF_PORT] [IFLA_PORT_*], ... [IFLA_VF_PORT] [IFLA_PORT_*], ... ... [IFLA_PORT_SELF] [IFLA_PORT_*], ... These attributes are design to be set and get symmetrically. VF_PORTS is a list of VF_PORTs, one for each VF, when dealing with an SR-IOV device. PORT_SELF is for the PF of the SR-IOV device, in case it wants to also have a port-profile, or for the case where the VF==PF, like in enic patch 2/2 of this patch set. A port-profile is used to configure/enable the external switch virtual port backing the netdev interface, not to configure the host-facing side of the netdev. A port-profile is an identifier known to the switch. How port- profiles are installed on the switch or how available port-profiles are made know to the host is outside the scope of this patch. There are two types of port-profiles specs in the netlink msg. The first spec is for 802.1Qbg (pre-)standard, VDP protocol. The second spec is for devices that run a similar protocol as VDP but in firmware, thus hiding the protocol details. In either case, the specs have much in common and makes sense to define the netlink msg as the union of the two specs. For example, both specs have a notition of associating/deassociating a port-profile. And both specs require some information from the hypervisor manager, such as client port instance ID. The general flow is the port-profile is applied to a host netdev interface using RTM_SETLINK, the receiver of the RTM_SETLINK msg communicates with the switch, and the switch virtual port backing the host netdev interface is configured/enabled based on the settings defined by the port-profile. What those settings comprise, and how those settings are managed is again outside the scope of this patch, since this patch only deals with the first step in the flow. Signed-off-by: Scott Feldman <scofeldm@cisco.com> Signed-off-by: Roopa Prabhu <roprabhu@cisco.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* Merge branch 'master' of ↵David S. Miller2010-05-161-49/+110
|\ | | | | | | | | | | | | master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: include/linux/if_link.h
| * rtnetlink: make SR-IOV VF interface symmetricChris Wright2010-05-161-49/+110
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Now we have a set of nested attributes: IFLA_VFINFO_LIST (NESTED) IFLA_VF_INFO (NESTED) IFLA_VF_MAC IFLA_VF_VLAN IFLA_VF_TX_RATE This allows a single set to operate on multiple attributes if desired. Among other things, it means a dump can be replayed to set state. The current interface has yet to be released, so this seems like something to consider for 2.6.34. Signed-off-by: Chris Wright <chrisw@sous-sol.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* | Merge branch 'master' of ↵David S. Miller2010-04-271-7/+7
|\ \ | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/kaber/ipmr-2.6
| * | net: rtnetlink: decouple rtnetlink address families from real address familiesPatrick McHardy2010-04-261-7/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Decouple rtnetlink address families from real address families in socket.h to be able to add rtnetlink interfaces to code that is not a real address family without increasing AF_MAX/NPROTO. This will be used to add support for multicast route dumping from all tables as the proc interface can't be extended to support anything but the main table without breaking compatibility. This partialy undoes the patch to introduce independant families for routing rules and converts ipmr routing rules to a new rtnetlink family. Similar to that patch, values up to 127 are reserved for real address families, values above that may be used arbitrarily. Signed-off-by: Patrick McHardy <kaber@trash.net>
* | | Merge branch 'master' of ↵David S. Miller2010-04-271-2/+3
|\ \ \ | |/ / |/| / | |/ | | | | | | | | master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: drivers/net/e100.c drivers/net/e1000e/netdev.c
| * rtnetlink: potential ERR_PTR dereferenceDan Carpenter2010-04-221-2/+3
| | | | | | | | | | | | | | | | | | In the original code, if rtnl_create_link() returned an ERR_PTR then that would get passed to rtnl_configure_link() which dereferences it. Signed-off-by: Dan Carpenter <error27@gmail.com> Acked-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
* | net: fib_rules: decouple address families from real address familiesPatrick McHardy2010-04-131-5/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Decouple the address family values used for fib_rules from the real address families in socket.h. This allows to use fib_rules for code that is not a real address family without increasing AF_MAX/NPROTO. Values up to 127 are reserved for real address families and map directly to the corresponding AF value, values starting from 128 are for other uses. rtnetlink is changed to invoke the AF_UNSPEC dumpit/doit handlers for these families. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
* | net: increase preallocated size of nlmsg to accomodate for IFLA_STATS64Jan Engelhardt2010-03-271-0/+1
| | | | | | | | | | | | | | | | | | When more data is stuffed into an nlmsg than initially projected, an extra allocation needs to be done. Reserve enough for IFLA_STATS64 so that this does not to needlessy happen. Signed-off-by: Jan Engelhardt <jengelh@medozas.de> Signed-off-by: David S. Miller <davem@davemloft.net>
* | net: fix unaligned access in IFLA_STATS64Jan Engelhardt2010-03-271-31/+31
| | | | | | | | | | | | | | | | | | | | Tony Luck observes that the original IFLA_STATS64 submission causes unaligned accesses. This is because nla_data() returns a pointer to a memory region that is only aligned to 32 bits. Do some memcpying to workaround this. Signed-off-by: Jan Engelhardt <jengelh@medozas.de> Signed-off-by: David S. Miller <davem@davemloft.net>
* | net: rtnetlink: ignore NETDEV_PRE_TYPE_CHANGE in rtnetlink_event()Patrick McHardy2010-03-211-0/+1
| | | | | | | | | | | | | | | | | | | | | | Ignore the new NETDEV_PRE_TYPE_CHANGE event in rtnetlink_event() since there have been no changes userspace needs to be notified of. Also add a comment to the netdev notifier event definitions to remind people to update the exclusion list when adding new event types. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
* | net: core: add IFLA_STATS64 supportJan Engelhardt2010-03-161-1/+41
|/ | | | | | | | | | | `ip -s link` shows interface counters truncated to 32 bit. This is because interface statistics are transported only in 32-bit quantity to userspace. This commit adds a new IFLA_STATS64 attribute that exports them in full 64 bit. References: http://lkml.indiana.edu/hypermail/linux/kernel/0307.3/0215.html Signed-off-by: Jan Engelhardt <jengelh@medozas.de> Signed-off-by: David S. Miller <davem@davemloft.net>