aboutsummaryrefslogtreecommitdiffstats
path: root/net/ipv4/ip_fragment.c
Commit message (Collapse)AuthorAgeFilesLines
* net: ip_expire() must revalidate routeEric Dumazet2011-05-041-16/+15
| | | | | | | | | | | | | | | | | | | | Commit 4a94445c9a5c (net: Use ip_route_input_noref() in input path) added a bug in IP defragmentation handling, in case timeout is fired. When a frame is defragmented, we use last skb dst field when building final skb. Its dst is valid, since we are in rcu read section. But if a timeout occurs, we take first queued fragment to build one ICMP TIME EXCEEDED message. Problem is all queued skb have weak dst pointers, since we escaped RCU critical section after their queueing. icmp_send() might dereference a now freed (and possibly reused) part of memory. Calling skb_dst_drop() and ip_route_input_noref() to revalidate route is the only possible choice. Reported-by: Denys Fedoryshchenko <denys@visp.net.lb> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* ipv4: IP defragmentation must be ECN awareEric Dumazet2011-01-061-0/+34
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | RFC3168 (The Addition of Explicit Congestion Notification to IP) states : 5.3. Fragmentation ECN-capable packets MAY have the DF (Don't Fragment) bit set. Reassembly of a fragmented packet MUST NOT lose indications of congestion. In other words, if any fragment of an IP packet to be reassembled has the CE codepoint set, then one of two actions MUST be taken: * Set the CE codepoint on the reassembled packet. However, this MUST NOT occur if any of the other fragments contributing to this reassembly carries the Not-ECT codepoint. * The packet is dropped, instead of being reassembled, for any other reason. This patch implements this requirement for IPv4, choosing the first action : If one fragment had NO-ECT codepoint reassembled frame has NO-ECT ElIf one fragment had CE codepoint reassembled frame has CE Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* inetpeer: Make inet_getpeer() take an inet_peer_adress_t pointer.David S. Miller2010-11-301-1/+1
| | | | | | And make an inet_getpeer_v4() helper, update callers. Signed-off-by: David S. Miller <davem@davemloft.net>
* net: return operator cleanupEric Dumazet2010-09-231-2/+2
| | | | | | | | | Change "return (EXPR);" to "return EXPR;" return is not a function, parentheses are not required. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: Rename skb_has_frags to skb_has_frag_listDavid S. Miller2010-08-231-1/+1
| | | | | | | | | | | SKBs can be "fragmented" in two ways, via a page array (called skb_shinfo(skb)->frags[]) and via a list of SKBs (called skb_shinfo(skb)->frag_list). Since skb_has_frags() tests the latter, it's name is confusing since it sounds more like it's testing the former. Signed-off-by: David S. Miller <davem@davemloft.net>
* net/ipv4: EXPORT_SYMBOL cleanupsEric Dumazet2010-07-121-2/+1
| | | | | | | | | CodingStyle cleanups EXPORT_SYMBOL should immediately follow the symbol declaration. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* fragment: add fast path for in-order fragmentsChangli Gao2010-06-301-0/+12
| | | | | | | | | | | | | | | | | add fast path for in-order fragments As the fragments are sent in order in most of OSes, such as Windows, Darwin and FreeBSD, it is likely the new fragments are at the end of the inet_frag_queue. In the fast path, we check if the skb at the end of the inet_frag_queue is the prev we expect. Signed-off-by: Changli Gao <xiaosuo@gmail.com> ---- include/net/inet_frag.h | 1 + net/ipv4/ip_fragment.c | 12 ++++++++++++ net/ipv6/reassembly.c | 11 +++++++++++ 3 files changed, 24 insertions(+) Signed-off-by: David S. Miller <davem@davemloft.net>
* ipfrag : frag_kfree_skb() cleanupEric Dumazet2010-06-151-6/+3
| | | | | | | | | Third param (work) is unused, remove it. Remove __inline__ and inline qualifiers. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* ip_frag: Remove some atomic opsEric Dumazet2010-06-151-2/+1
| | | | | | | Instead of doing one atomic operation per frag, we can factorize them. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* include cleanup: Update gfp.h and slab.h includes to prepare for breaking ↵Tejun Heo2010-03-301-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | implicit slab.h inclusion from percpu.h percpu.h is included by sched.h and module.h and thus ends up being included when building most .c files. percpu.h includes slab.h which in turn includes gfp.h making everything defined by the two files universally available and complicating inclusion dependencies. percpu.h -> slab.h dependency is about to be removed. Prepare for this change by updating users of gfp and slab facilities include those headers directly instead of assuming availability. As this conversion needs to touch large number of source files, the following script is used as the basis of conversion. http://userweb.kernel.org/~tj/misc/slabh-sweep.py The script does the followings. * Scan files for gfp and slab usages and update includes such that only the necessary includes are there. ie. if only gfp is used, gfp.h, if slab is used, slab.h. * When the script inserts a new include, it looks at the include blocks and try to put the new include such that its order conforms to its surrounding. It's put in the include block which contains core kernel includes, in the same order that the rest are ordered - alphabetical, Christmas tree, rev-Xmas-tree or at the end if there doesn't seem to be any matching order. * If the script can't find a place to put a new include (mostly because the file doesn't have fitting include block), it prints out an error message indicating which .h file needs to be added to the file. The conversion was done in the following steps. 1. The initial automatic conversion of all .c files updated slightly over 4000 files, deleting around 700 includes and adding ~480 gfp.h and ~3000 slab.h inclusions. The script emitted errors for ~400 files. 2. Each error was manually checked. Some didn't need the inclusion, some needed manual addition while adding it to implementation .h or embedding .c file was more appropriate for others. This step added inclusions to around 150 files. 3. The script was run again and the output was compared to the edits from #2 to make sure no file was left behind. 4. Several build tests were done and a couple of problems were fixed. e.g. lib/decompress_*.c used malloc/free() wrappers around slab APIs requiring slab.h to be added manually. 5. The script was run on all .h files but without automatically editing them as sprinkling gfp.h and slab.h inclusions around .h files could easily lead to inclusion dependency hell. Most gfp.h inclusion directives were ignored as stuff from gfp.h was usually wildly available and often used in preprocessor macros. Each slab.h inclusion directive was examined and added manually as necessary. 6. percpu.h was updated not to include slab.h. 7. Build test were done on the following configurations and failures were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my distributed build env didn't work with gcov compiles) and a few more options had to be turned off depending on archs to make things build (like ipr on powerpc/64 which failed due to missing writeq). * x86 and x86_64 UP and SMP allmodconfig and a custom test config. * powerpc and powerpc64 SMP allmodconfig * sparc and sparc64 SMP allmodconfig * ia64 SMP allmodconfig * s390 SMP allmodconfig * alpha SMP allmodconfig * um on x86_64 SMP allmodconfig 8. percpu.h modifications were reverted so that it could be applied as a separate patch and serve as bisection point. Given the fact that I had only a couple of failures from tests on step 6, I'm fairly confident about the coverage of this conversion patch. If there is a breakage, it's likely to be something in one of the arch headers which should be easily discoverable easily on most builds of the specific arch. Signed-off-by: Tejun Heo <tj@kernel.org> Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
* ipv4: ip_fragment: fix unbalanced rcu_read_unlock()Patrick McHardy2010-02-021-3/+2
| | | | | Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
* IP: Send an ICMP "Fragment Reassembly Timeout" message when enabling ↵Shan Wei2010-01-231-4/+30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | connection track No matter whether connection track is enabled, an end host should send an ICMPv4 "Fragment Reassembly Timeout" message when defrag timeout. The reasons are following two points: 1. RFC 792 says: >>>> >> > > If a host reassembling a fragmented datagram cannot complete the >>>> >> > > reassembly due to missing fragments within its time limit it >>>> >> > > discards the datagram, and it may send a time exceeded message. >>>> >> > > >>>> >> > > If fragment zero is not available then no time exceeded need be >>>> >> > > sent at all. >>>> >> > > >>>> >> > > Read more: http://www.faqs.org/rfcs/rfc792.html#ixzz0aOXRD7Wp 2. Patrick McHardy also agrees with this opinion. :-) About the discussion of this opinion, refer to http://patchwork.ozlabs.org/patch/41649 The patch fixed the problem like this: When enabling connection track, fragments are received at PRE_ROUTING HOOK. If they are failed to reassemble, ip_expire() will be called. Before sending an ICMP "Fragment Reassembly Timeout" message, the patch searches router table to get the destination entry only for host type. The patch has been tested on both host type and route type. Signed-off-by: Shan Wei <shanwei@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: spread __net_init, __net_exitAlexey Dobriyan2010-01-171-4/+4
| | | | | | | | | | | __net_init/__net_exit are apparently not going away, so use them to full extent. In some cases __net_init was removed, because it was called from __net_exit code. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6Linus Torvalds2009-12-081-5/+6
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6: (1815 commits) mac80211: fix reorder buffer release iwmc3200wifi: Enable wimax core through module parameter iwmc3200wifi: Add wifi-wimax coexistence mode as a module parameter iwmc3200wifi: Coex table command does not expect a response iwmc3200wifi: Update wiwi priority table iwlwifi: driver version track kernel version iwlwifi: indicate uCode type when fail dump error/event log iwl3945: remove duplicated event logging code b43: fix two warnings ipw2100: fix rebooting hang with driver loaded cfg80211: indent regulatory messages with spaces iwmc3200wifi: fix NULL pointer dereference in pmkid update mac80211: Fix TX status reporting for injected data frames ath9k: enable 2GHz band only if the device supports it airo: Fix integer overflow warning rt2x00: Fix padding bug on L2PAD devices. WE: Fix set events not propagated b43legacy: avoid PPC fault during resume b43: avoid PPC fault during resume tcp: fix a timewait refcnt race ... Fix up conflicts due to sysctl cleanups (dead sysctl_check code and CTL_UNNUMBERED removed) in kernel/sysctl_check.c net/ipv4/sysctl_net_ipv4.c net/ipv6/addrconf.c net/sctp/sysctl.c
| * Merge branch 'master' of ↵David S. Miller2009-12-011-1/+1
| |\ | | | | | | | | | | | | | | | | | | master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: net/mac80211/ht.c
| * | net: use net_eq to compare netsOctavian Purdila2009-11-251-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Generated with the following semantic patch @@ struct net *n1; struct net *n2; @@ - n1 == n2 + net_eq(n1, n2) @@ struct net *n1; struct net *n2; @@ - n1 != n2 + !net_eq(n1, n2) applied over {include,net,drivers/net}. Signed-off-by: Octavian Purdila <opurdila@ixiacom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | ip_frag: dont touch device refcountEric Dumazet2009-11-051-3/+4
| | | | | | | | | | | | | | | | | | | | | | | | When sending fragmentation expiration ICMP V4/V6 messages, we can avoid touching device refcount, thanks to RCU Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | Merge git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/sysctl-2.6Linus Torvalds2009-12-081-6/+0
|\ \ \ | |_|/ |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/sysctl-2.6: (43 commits) security/tomoyo: Remove now unnecessary handling of security_sysctl. security/tomoyo: Add a special case to handle accesses through the internal proc mount. sysctl: Drop & in front of every proc_handler. sysctl: Remove CTL_NONE and CTL_UNNUMBERED sysctl: kill dead ctl_handler definitions. sysctl: Remove the last of the generic binary sysctl support sysctl net: Remove unused binary sysctl code sysctl security/tomoyo: Don't look at ctl_name sysctl arm: Remove binary sysctl support sysctl x86: Remove dead binary sysctl support sysctl sh: Remove dead binary sysctl support sysctl powerpc: Remove dead binary sysctl support sysctl ia64: Remove dead binary sysctl support sysctl s390: Remove dead sysctl binary support sysctl frv: Remove dead binary sysctl support sysctl mips/lasat: Remove dead binary sysctl support sysctl drivers: Remove dead binary sysctl support sysctl crypto: Remove dead binary sysctl support sysctl security/keys: Remove dead binary sysctl support sysctl kernel: Remove binary sysctl logic ...
| * | sysctl net: Remove unused binary sysctl codeEric W. Biederman2009-11-121-6/+0
| |/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Now that sys_sysctl is a compatiblity wrapper around /proc/sys all sysctl strategy routines, and all ctl_name and strategy entries in the sysctl tables are unused, and can be revmoed. In addition neigh_sysctl_register has been modified to no longer take a strategy argument and it's callers have been modified not to pass one. Cc: "David Miller" <davem@davemloft.net> Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org> Cc: netdev@vger.kernel.org Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
* | ipv4: additional update of dev_net(dev) to struct *net in ip_fragment.c, ↵David Ford2009-11-291-1/+1
|/ | | | | | | | | | | | | | | | | | | NULL ptr OOPS ipv4 ip_frag_reasm(), fully replace 'dev_net(dev)' with 'net', defined previously patched into 2.6.29. Between 2.6.28.10 and 2.6.29, net/ipv4/ip_fragment.c was patched, changing from dev_net(dev) to container_of(...). Unfortunately the goto section (out_fail) on oversized packets inside ip_frag_reasm() didn't get touched up as well. Oversized IP packets cause a NULL pointer dereference and immediate hang. I discovered this running openvasd and my previous email on this is titled: NULL pointer dereference at 2.6.32-rc8:net/ipv4/ip_fragment.c:566 Signed-off-by: David Ford <david@blue-labs.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* ipv4: Use frag list abstraction interfaces.David S. Miller2009-06-091-2/+2
| | | | Signed-off-by: David S. Miller <davem@davemloft.net>
* net: skb->dst accessorsEric Dumazet2009-06-031-1/+1
| | | | | | | | | | | | | | | | | | Define three accessors to get/set dst attached to a skb struct dst_entry *skb_dst(const struct sk_buff *skb) void skb_dst_set(struct sk_buff *skb, struct dst_entry *dst) void skb_dst_drop(struct sk_buff *skb) This one should replace occurrences of : dst_release(skb->dst) skb->dst = NULL; Delete skb->dst field Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* netns: oops in ip[6]_frag_reasm incrementing statsJorge Boncompte [DTI2]2009-03-181-1/+2
| | | | | | | | | | | | | | dev can be NULL in ip[6]_frag_reasm for skb's coming from RAW sockets. Quagga's OSPFD sends fragmented packets on a RAW socket, when netfilter conntrack reassembles them on the OUTPUT path you hit this code path. You can test it with something like "hping2 -0 -d 2000 -f AA.BB.CC.DD" With help from Jarek Poplawski. Signed-off-by: Jorge Boncompte [DTI2] <jorge@dti2.net> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: '&' reduxAlexey Dobriyan2008-11-031-7/+7
| | | | | | | | | | | | | I want to compile out proc_* and sysctl_* handlers totally and stub them to NULL depending on config options, however usage of & will prevent this, since taking adress of NULL pointer will break compilation. So, drop & in front of every ->proc_handler and every ->strategy handler, it was never needed in fact. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: clean up net/ipv4/ip_fragment.c tcp_timer.c ip_input.cJianjun Kong2008-11-031-1/+1
| | | | | Signed-off-by: Jianjun Kong <jianjun@zeuux.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: replace NIPQUAD() in net/ipv4/ net/ipv6/Harvey Harrison2008-10-311-3/+2
| | | | | | | | Using NIPQUAD() with NIPQUAD_FMT, %d.%d.%d.%d or %u.%u.%u.%u can be replaced with %pI4 Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: Rationalise email address: Network Specific PartsAlan Cox2008-10-131-1/+1
| | | | | | | | | | Clean up the various different email addresses of mine listed in the code to a single current and valid address. As Dave says his network merges for 2.6.28 are now done this seems a good point to send them in where they won't risk disrupting real changes. Signed-off-by: Alan Cox <alan@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: convert BUG_TRAP to generic WARN_ONIlpo Järvinen2008-07-251-2/+2
| | | | | | | | | | | | | | Removes legacy reinvent-the-wheel type thing. The generic machinery integrates much better to automated debugging aids such as kerneloops.org (and others), and is unambiguous due to better naming. Non-intuively BUG_TRAP() is actually equal to WARN_ON() rather than BUG_ON() though some might actually be promoted to BUG_ON() but I left that to future. I could make at least one BUILD_BUG_ON conversion. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>
* mib: add net to IP_ADD_STATS_BHPavel Emelyanov2008-07-161-1/+1
| | | | | | | | Very simple - only ip_evictor (fragments) requires such. This patch ends up the IP_XXX_STATS patching. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* mib: add net to IP_INC_STATS_BHPavel Emelyanov2008-07-161-7/+10
| | | | | Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* ipv4: prepare net initialization for IP accountingPavel Emelyanov2008-07-161-3/+3
| | | | | | | | | | | Some places, that deal with IP statistics already have where to get a struct net from, but use it directly, without declaring a separate variable on the stack. So, save this net on the stack for future IP_XXX_STATS macros. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* Merge branch 'master' of ↵David S. Miller2008-06-281-0/+2
|\ | | | | | | | | | | | | | | master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: drivers/net/wireless/iwlwifi/iwl4965-base.c
| * inet fragments: fix race between inet_frag_find and inet_frag_secret_rebuildPavel Emelyanov2008-06-271-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | The problem is that while we work w/o the inet_frags.lock even read-locked the secret rebuild timer may occur (on another CPU, since BHs are still disabled in the inet_frag_find) and change the rnd seed for ipv4/6 fragments. It was caused by my patch fd9e63544cac30a34c951f0ec958038f0529e244 ([INET]: Omit double hash calculations in xxx_frag_intern) late in the 2.6.24 kernel, so this should probably be queued to -stable. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* | net: remove CVS keywordsAdrian Bunk2008-06-111-2/+0
| | | | | | | | | | | | | | | | This patch removes CVS keywords that weren't updated for a long time from comments. Signed-off-by: Adrian Bunk <bunk@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* | inet: Register fragmentation some ctls at read-only root.Pavel Emelyanov2008-05-191-2/+14
| | | | | | | | | | | | | | | | | | Parts of fragments-related sysctls are read-only, but this is done by cloning all the tables and dropping write-bits from mode. Do the same but with read-only root. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* | inet: Rename fragmentation sysctl-related functions/variables.Pavel Emelyanov2008-05-191-9/+9
|/ | | | | | | | | | | | | The fragments sysctls also contains some, that are to be visible, but read-only in net namespaces. The naming in net/core/sysctl_net_core.c is - tables, that are to be registered in namespaces have a "ns" word in their names. So rename ones in ipv4/ip_fragment.c and ipv6/reassembly.c to fit this. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* [IPV4]: Use NIPQUAD_FMT to format ipv4 addresses.YOSHIFUJI Hideaki2008-04-141-1/+1
| | | | | | | And use %u to format port. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* [NET]: Rename inet_frag.h identifiers COMPLETE, FIRST_IN, LAST_IN to INET_FRAG_*Joe Perches2008-03-281-8/+9
| | | | | | | | | | On Fri, 2008-03-28 at 03:24 -0700, Andrew Morton wrote: > they should all be renamed. Done for include/net and net Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* [NET] NETNS: Omit net_device->nd_net without CONFIG_NET_NS.YOSHIFUJI Hideaki2008-03-261-1/+1
| | | | | | | | Introduce per-net_device inlines: dev_net(), dev_net_set(). Without CONFIG_NET_NS, no namespace other than &init_net exists. Let's explicitly define them to help compiler optimizations. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
* [NETNS]: Process IP layer in the context of the correct namespace.Denis V. Lunev2008-03-241-1/+4
| | | | | | | Replace all the rest of the init_net with a proper net on the IP layer. Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* [IPV4]: Fix null dereference in ip_defragPhil Oester2008-03-211-1/+1
| | | | | | | | | | | | | | | | | Been seeing occasional panics in my testing of 2.6.25-rc in ip_defrag. Offending line in ip_defrag is here: net = skb->dev->nd_net where dev is NULL. Bisected the problem down to commit ac18e7509e7df327e30d6e073a787d922eaf211d ([NETNS][FRAGS]: Make the inet_frag_queue lookup work in namespaces). Below patch (idea from Patrick McHardy) fixes the problem for me. Signed-off-by: Phil Oester <kernel@linuxace.com> Acked-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
* [NETNS][FRAGS]: Make the pernet subsystem for fragments.Pavel Emelyanov2008-01-281-1/+12
| | | | | | | | | | On namespace start we mainly prepare the ctl variables. When the namespace is stopped we have to kill all the fragments that point to this namespace. The inet_frags_exit_net() handles it. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* [NETNS][FRAGS]: Make the LRU list per namespace.Pavel Emelyanov2008-01-281-1/+1
| | | | | | | | | | | | | The inet_frags.lru_list is used for evicting only, so we have to make it per-namespace, to evict only those fragments, who's namespace exceeded its high threshold, but not the whole hash. Besides, this helps to avoid long loops in evictor. The spinlock is not per-namespace because it protects the hash table as well, which is global. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* [NETNS][FRAGS]: Isolate the secret interval from namespaces.Pavel Emelyanov2008-01-281-6/+2
| | | | | | | | | | | | Since we have one hashtable to lookup the fragment, having different secret_interval-s for hash rebuild doesn't make sense, so move this one to inet_frags. The inet_frags_ctl becomes empty after this, so remove it. The appropriate ctl table is kept read-only in namespaces. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* [NETNS][FRAGS]: Make thresholds work in namespaces.Pavel Emelyanov2008-01-281-13/+13
| | | | | | | | | | | This is the same as with the timeout variable. Currently, after exceeding the high threshold _all_ the fragments are evicted, but it will be fixed in later patch. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* [NETNS][FRAGS]: Make the net.ipv4.ipfrag_timeout work in namespaces.Pavel Emelyanov2008-01-281-10/+10
| | | | | | | | | | | Move it to the netns_frags, adjust the usage and make the appropriate ctl table writable. Now fragment, that live in different namespaces can live for different times. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* [NETNS][FRAGS]: Duplicate sysctl tables for new namespaces.Pavel Emelyanov2008-01-281-3/+39
| | | | | | | | | | | | Each namespace has to have own tables to tune their different parameters, so duplicate the tables and register them. All the tables in sub-namespaces are temporarily made read-only. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* [NETNS][FRAGS]: Make the mem counter per-namespace.Pavel Emelyanov2008-01-281-14/+15
| | | | | | | | This is also simple, but introduces more changes, since then mem counter is altered in more places. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* [NETNS][FRAGS]: Make the nqueues counter per-namespace.Pavel Emelyanov2008-01-281-2/+4
| | | | | | | | This is simple - just move the variable from struct inet_frags to struct netns_frags and adjust the usage appropriately. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* [NETNS][FRAGS]: Make the inet_frag_queue lookup work in namespaces.Pavel Emelyanov2008-01-281-3/+5
| | | | | | | | | | | | | | | | Since fragment management code is consolidated, we cannot have the pointer from inet_frag_queue to struct net, since we must know what king of fragment this is. So, I introduce the netns_frags structure. This one is currently empty, but will be eventually filled with per-namespace attributes. Each inet_frag_queue is tagged with this one. The conntrack_reasm is not "netns-izated", so it has one static netns_frags instance to keep working in init namespace. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>