Some fragments from the experiment; it's getting complex and hairy.
Anyway, here are results from the first tests to give you an idea...
pktgen sending on 10 x 10G interfaces.

[From pktgen script]

fn() {
    i=$1    # ifname
    c=$2    # queue / CPU core
    n=$3    # NUMA node

    PGDEV=/proc/net/pktgen/kpktgend_$c
    pgset "add_device eth$i@$c"

    PGDEV=/proc/net/pktgen/eth$i@$c
    pgset "node $n"
    pgset "$COUNT"
    pgset "flag NODE_ALLOC"
    pgset "$CLONE_SKB"
    pgset "$PKT_SIZE"
    pgset "$DELAY"
    pgset "dst 10.0.0.0"
}

remove_all

# Setup
# TYAN S7025 with two nodes. Each node has its own bus with its own
# TYLERSBURG bridge, so eth0-eth3 are closest to node 0, which in
# turn "owns" CPU cores 0-3 in this HW setup. pktgen is set up
# according to this. clone_skb=1000000.
# Used slots are PCIe x16 except where PCIe x8 is indicated.

#  ethX  queue/CPU  node
fn 0  0 0
fn 1  1 0
fn 2  2 0
fn 3  3 0
fn 4  4 1
fn 5  5 1
fn 6  6 1
fn 7  7 1
fn 8 12 1
fn 9 13 1

Result, "manually" tuned:

eth0   9617.7 Mbit/s   822 kpps
eth1   9619.1 Mbit/s   823 kpps
eth2   9619.1 Mbit/s   823 kpps
eth3   9619.2 Mbit/s   823 kpps
eth4   5995.2 Mbit/s   512 kpps  <- PCIe x8
eth5   5995.3 Mbit/s   512 kpps  <- PCIe x8
eth6   9619.2 Mbit/s   823 kpps
eth7   9619.2 Mbit/s   823 kpps
eth8   9619.1 Mbit/s   823 kpps
eth9   9619.0 Mbit/s   823 kpps

> 90 Gbit/s in total.

Result, "manually" mistuned by switching node 0 and 1:

eth0   9613.6 Mbit/s   822 kpps
eth1   9614.9 Mbit/s   822 kpps
eth2   9615.0 Mbit/s   822 kpps
eth3   9615.1 Mbit/s   822 kpps
eth4   2918.5 Mbit/s   249 kpps  <- PCIe x8
eth5   2918.4 Mbit/s   249 kpps  <- PCIe x8
eth6   8597.0 Mbit/s   735 kpps
eth7   8597.0 Mbit/s   735 kpps
eth8   8568.3 Mbit/s   733 kpps
eth9   8568.3 Mbit/s   733 kpps
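
For readers without the stock pktgen scripts at hand: the fragment
above leans on the usual pgset/remove_all helpers from the pktgen
sample scripts (Documentation/networking/pktgen.txt). A minimal
sketch, which may differ in detail from the script actually used:

pgset() {
    local result

    # Write a pktgen command to the current proc file and show
    # the error if pktgen did not answer "Result: OK".
    echo $1 > $PGDEV
    result=`cat $PGDEV | fgrep "Result: OK:"`
    if [ "$result" = "" ]; then
        cat $PGDEV | fgrep Result:
    fi
}

remove_all() {
    # Detach every device from every pktgen kernel thread.
    for PGDEV in /proc/net/pktgen/kpktgend_*; do
        pgset "rem_device_all"
    done
}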
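
fn() also references a handful of variables set earlier in the
script; in the stock scripts these hold complete pktgen commands.
Only clone_skb is stated in the text above, so the other values
here are placeholders:

# Placeholders except CLONE_SKB; the real script's values are not shown.
COUNT="count 0"                  # 0 = send until stopped
CLONE_SKB="clone_skb 1000000"    # as stated in the setup notes
PKT_SIZE="pkt_size 1500"         # placeholder; actual size not given
DELAY="delay 0"                  # no inter-packet delay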
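
The "manually" tuned case just mirrors the PCIe topology by hand. To
sanity-check the ethX-to-node mapping instead of eyeballing the board
layout, the per-NIC NUMA node can be read from sysfs (assuming the
platform populates it; -1 means unknown):

# Print which NUMA node each interface's PCI device sits on.
for dev in /sys/class/net/eth*; do
    eth=${dev##*/}
    node=`cat $dev/device/numa_node 2>/dev/null`
    echo "$eth: numa_node ${node:--1}"
done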