From d62fda082c48b417b47a553860abf75d9cf8b591 Mon Sep 17 00:00:00 2001
From: Eric Dumazet
Date: Tue, 12 May 2009 20:48:02 +0000
Subject: bnx2: bnx2_tx_int() optimizations

When using bnx2 in a high transmit load, bnx2_tx_int() cost is pretty high.

There are two reasons.

One is an expensive call to bnx2_get_hw_tx_cons(bnapi) for each freed skb.

One is cpu stalls when accessing skb_is_gso(skb) / skb_shinfo(skb)->nr_frags
because of two cache line misses.
(One to get skb->end/head to compute skb_shinfo(skb),
 one to get is_gso/nr_frags)

This patch :

1) avoids calling bnx2_get_hw_tx_cons(bnapi) too many times.

2) makes bnx2_start_xmit() cache is_gso & nr_frags into sw_tx_bd descriptor.
   This uses a little bit more ram (256 longs per device on x86), but helps a lot.

3) uses a prefetch(&skb->end) to speed up dev_kfree_skb(), bringing in the
   cache line that will be needed in skb_release_data()

result is 5 % bandwidth increase in benchmarks, involving UDP or TCP receive
& transmits, when a cpu is dedicated to ksoftirqd for bnx2.

bnx2_tx_int going from 3.33 % cpu to 0.5 % cpu in oprofile

Note : skb_dma_unmap() still very expensive but this is for another
patch, not related to bnx2 (2.9 % of cpu, while it does nothing on x86_32)

Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller
---
 drivers/net/bnx2.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/net/bnx2.h b/drivers/net/bnx2.h
index 5b570e1..026ed1c 100644
--- a/drivers/net/bnx2.h
+++ b/drivers/net/bnx2.h
@@ -6552,6 +6552,8 @@ struct sw_pg {
 struct sw_tx_bd {
 	struct sk_buff		*skb;
+	unsigned short		is_gso;
+	unsigned short		nr_frags;
 };
 #define SW_RXBD_RING_SIZE (sizeof(struct sw_bd) * RX_DESC_CNT)
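
Point 2 of the commit message (caching is_gso and nr_frags in the sw_tx_bd descriptor) can be illustrated with a small userspace sketch. This is not the driver code: fake_skb, fake_shinfo, sketch_start_xmit() and sketch_tx_int_descs() are hypothetical stand-ins invented for illustration, and the "one descriptor for the head plus one per fragment" count is an assumption about how the completion path would use the cached value. Only the two new sw_tx_bd fields come from the diff.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for kernel types; NOT the real sk_buff layout. */
struct fake_shinfo {
	unsigned short nr_frags;   /* number of page fragments */
	unsigned short gso_size;   /* non-zero means a GSO packet */
};

struct fake_skb {
	/* In the kernel, reaching this data costs extra cache line misses. */
	struct fake_shinfo *shinfo;
};

/* Same shape as the patched struct sw_tx_bd, with the skb type swapped. */
struct sw_tx_bd {
	struct fake_skb *skb;
	unsigned short   is_gso;
	unsigned short   nr_frags;
};

/* Transmit side: copy both values while the skb is still cache-hot. */
void sketch_start_xmit(struct sw_tx_bd *tx_buf, struct fake_skb *skb)
{
	assert(tx_buf != NULL && skb != NULL && skb->shinfo != NULL);
	tx_buf->skb = skb;
	tx_buf->is_gso = (skb->shinfo->gso_size != 0);
	tx_buf->nr_frags = skb->shinfo->nr_frags;
}

/*
 * Completion side: work out how many ring descriptors the packet used
 * from the cached field only, without dereferencing the skb.
 * Assumption: one descriptor for the linear part plus one per fragment.
 */
unsigned int sketch_tx_int_descs(const struct sw_tx_bd *tx_buf)
{
	assert(tx_buf != NULL);
	return 1u + tx_buf->nr_frags;
}
```

The design trade-off is the one the commit message states: each descriptor grows by two unsigned shorts, and in exchange the completion path reads data it already has in the descriptor's cache line instead of chasing pointers into the skb.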