remove port check, document configuration with tc, and support layer 3 interfaces #1

Open: wants to merge 5 commits into main

Conversation

@arinc9 arinc9 commented Jul 20, 2025

Hey Matt!

This pull request removes the port check logic from the BPF programme, documents configuration with tc, brings support for layer 3 interfaces, and improves the documentation.

Cheers.
Chester A.

@arinc9 arinc9 changed the title {readme,tc}: remove port check and document configuration with tc remove port check, document configuration with tc, and support layer 3 interfaces Jul 21, 2025
@matttbe matttbe left a comment


Hi @arinc9,

Thank you for this PR. The modification in terms of code looks OK to me, just a small comment in the README and test.sh.

I quickly tested it and noticed a performance drop. I think I wrote something about that somewhere in the csum branch: I suspected this modification would cost performance, because TC has to parse each packet up to layer 4 to find the port, and the BPF program then does the same to read other parts of the L4 header. Without your modifications, I can typically reach ~3.25 Gbps with iperf3 -c 10.0.2.2 -ZR when using test.sh. In the same conditions, with your modifications, performance drops to ~2.6 Gbps, so around 20%. TBH, I wouldn't have expected such a big impact; ~20% seems quite high.

I like that it simplifies the eBPF C code, but the perf impact may not be worth it. It also depends on which other checks (packet mark?) need to be done. WDYT?

By chance, are there any other alternatives? In some BPF programs, I know you can set per-program variables, typically to configure the port(s) or the side. But I don't think you can do that here with the TC hooks, right? An alternative would be to use maps, but I guess there would be an impact as well (and the program could probably no longer be loaded by the tc filter command).
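For reference, a rough sketch of what the map-based alternative could look like, assuming the BPF object exposed a map (here called allowed_ports; the map name, pin path, and key/value layout are all hypothetical, not from this repository), configured from userspace with bpftool:

```shell
# Hypothetical: attach the program with no tc-level port filter, then feed
# the allowed port(s) through a pinned map. This requires the BPF object to
# declare an "allowed_ports" map, which the current code does not.
tc filter add dev "${IFACE}" egress bpf object-file tcp_in_udp_tc.o \
	section tc direct-action

# Pin the map and mark port 5201 (0x1451, little-endian bytes 51 14) as
# allowed. Key/value sizes depend on the map definition.
bpftool map pin name allowed_ports /sys/fs/bpf/tc/allowed_ports
bpftool map update pinned /sys/fs/bpf/tc/allowed_ports \
	key hex 51 14 value hex 01
```

Whether tc could still load such an object, and what the per-packet lookup would cost, would need to be measured.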

The other notes you have are still interesting. We could also document the idea and point to this PR to say that this version is more flexible, but there is a perf impact. (or do the opposite)

arinc9 commented Jul 28, 2025

> Without your modifications, I can typically reach ~3.25 Gbps with iperf3 -c 10.0.2.2 -ZR when using test.sh. In the same conditions, with your modifications, performance drops to ~2.6 Gbps, so around 20%. TBH, I wouldn't have expected such a big impact; ~20% seems quite high.

Can you try the u32 filter instead of flower? Let's see if that performs better. Example:

tc filter add dev "${IFACE}" egress  u32 match ip dport 5201 0xffff action goto chain 1

matttbe commented Jul 28, 2025

> Without your modifications, I can typically reach ~3.25 Gbps with iperf3 -c 10.0.2.2 -ZR when using test.sh. In the same conditions, with your modifications, performance drops to ~2.6 Gbps, so around 20%. TBH, I wouldn't have expected such a big impact; ~20% seems quite high.
>
> Can you try the u32 filter instead of flower? Let's see if that performs better. Example:
>
> tc filter add dev "${IFACE}" egress  u32 match ip dport 5201 0xffff action goto chain 1

Switching to one port instead of a range helps: from ~2.6 Gbps (~20% drop) to ~2.8 Gbps (~15% drop).

diff --git a/test.sh b/test.sh
index 0dc9466..26bb3e6 100755
--- a/test.sh
+++ b/test.sh
@@ -40,15 +40,15 @@ server()
 
 tc_client()
 {
-	local ns="${NS}_cpe" iface="int" port_start="5201" port_end="5203"
+	local ns="${NS}_cpe" iface="int" port="5201"
 
 	# ip netns will umount everything on exit
 	ip netns exec "${ns}" sh -c "mount -t debugfs none /sys/kernel/debug && cat /sys/kernel/debug/tracing/trace_pipe" &
 
 	tc -n "${ns}" qdisc add dev "${iface}" clsact
-	tc -n "${ns}" filter add dev "${iface}" egress  protocol ip flower ip_proto tcp dst_port "${port_start}"-"${port_end}" action goto chain 1
+	tc -n "${ns}" filter add dev "${iface}" egress  protocol ip flower ip_proto tcp dst_port "${port}" action goto chain 1
 	tc -n "${ns}" filter add dev "${iface}" egress  chain 1 bpf object-file tcp_in_udp_tc.o section tc action csum udp
-	tc -n "${ns}" filter add dev "${iface}" ingress protocol ip flower ip_proto udp src_port "${port_start}"-"${port_end}" action goto chain 1
+	tc -n "${ns}" filter add dev "${iface}" ingress protocol ip flower ip_proto udp src_port "${port}" action goto chain 1
 	tc -n "${ns}" filter add dev "${iface}" ingress chain 1 bpf object-file tcp_in_udp_tc.o section tc direct-action
 
 	tc -n "${ns}" filter show dev "${iface}" egress
@@ -61,15 +61,15 @@ tc_client()
 
 tc_server()
 {
-	local ns="${NS}_net" iface="int" port_start="5201" port_end="5203"
+	local ns="${NS}_net" iface="int" port="5201"
 
 	# ip netns will umount everything on exit
 	ip netns exec "${ns}" sh -c "mount -t debugfs none /sys/kernel/debug && cat /sys/kernel/debug/tracing/trace_pipe" &
 
 	tc -n "${ns}" qdisc add dev "${iface}" clsact
-	tc -n "${ns}" filter add dev "${iface}" egress  protocol ip flower ip_proto tcp src_port "${port_start}"-"${port_end}" action goto chain 1
+	tc -n "${ns}" filter add dev "${iface}" egress  protocol ip flower ip_proto tcp src_port "${port}" action goto chain 1
 	tc -n "${ns}" filter add dev "${iface}" egress  chain 1 bpf object-file tcp_in_udp_tc.o section tc action csum udp
-	tc -n "${ns}" filter add dev "${iface}" ingress protocol ip flower ip_proto udp dst_port "${port_start}"-"${port_end}" action goto chain 1
+	tc -n "${ns}" filter add dev "${iface}" ingress protocol ip flower ip_proto udp dst_port "${port}" action goto chain 1
 	tc -n "${ns}" filter add dev "${iface}" ingress chain 1 bpf object-file tcp_in_udp_tc.o section tc direct-action
 
 	tc -n "${ns}" filter show dev "${iface}" egress

And switching to u32 helps: from 2.8 to 3.05 Gbps (5% drop).

diff --git a/test.sh b/test.sh
index 26bb3e6..fbd4b50 100755
--- a/test.sh
+++ b/test.sh
@@ -46,9 +46,9 @@ tc_client()
 	ip netns exec "${ns}" sh -c "mount -t debugfs none /sys/kernel/debug && cat /sys/kernel/debug/tracing/trace_pipe" &
 
 	tc -n "${ns}" qdisc add dev "${iface}" clsact
-	tc -n "${ns}" filter add dev "${iface}" egress  protocol ip flower ip_proto tcp dst_port "${port}" action goto chain 1
+	tc -n "${ns}" filter add dev "${iface}" egress  u32 match ip dport ${port} 0xffff action goto chain 1
 	tc -n "${ns}" filter add dev "${iface}" egress  chain 1 bpf object-file tcp_in_udp_tc.o section tc action csum udp
-	tc -n "${ns}" filter add dev "${iface}" ingress protocol ip flower ip_proto udp src_port "${port}" action goto chain 1
+	tc -n "${ns}" filter add dev "${iface}" ingress u32 match ip sport ${port} 0xffff action goto chain 1
 	tc -n "${ns}" filter add dev "${iface}" ingress chain 1 bpf object-file tcp_in_udp_tc.o section tc direct-action
 
 	tc -n "${ns}" filter show dev "${iface}" egress
@@ -67,9 +67,9 @@ tc_server()
 	ip netns exec "${ns}" sh -c "mount -t debugfs none /sys/kernel/debug && cat /sys/kernel/debug/tracing/trace_pipe" &
 
 	tc -n "${ns}" qdisc add dev "${iface}" clsact
-	tc -n "${ns}" filter add dev "${iface}" egress  protocol ip flower ip_proto tcp src_port "${port}" action goto chain 1
+	tc -n "${ns}" filter add dev "${iface}" egress  u32 match ip sport ${port} 0xffff action goto chain 1
 	tc -n "${ns}" filter add dev "${iface}" egress  chain 1 bpf object-file tcp_in_udp_tc.o section tc action csum udp
-	tc -n "${ns}" filter add dev "${iface}" ingress protocol ip flower ip_proto udp dst_port "${port}" action goto chain 1
+	tc -n "${ns}" filter add dev "${iface}" ingress u32 match ip dport ${port} 0xffff action goto chain 1
 	tc -n "${ns}" filter add dev "${iface}" ingress chain 1 bpf object-file tcp_in_udp_tc.o section tc direct-action
 
 	tc -n "${ns}" filter show dev "${iface}" egress

A 5% drop for more flexibility seems OK.

Do you have similar results on your side?

arinc9 commented Jul 28, 2025

I can also see u32 performing better on a single thread (iperf3 has used a thread per stream since 2023).

protocol ip flower ip_proto tcp src_port 5201 and equivalent on the other side:

[  5] local 10.0.0.2 port 50618 connected to 10.0.0.1 port 5201
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-1.00   sec  1.08 GBytes  9.29 Gbits/sec                  
[  5]   1.00-2.00   sec  1.08 GBytes  9.25 Gbits/sec                  
[  5]   2.00-3.00   sec  1.09 GBytes  9.40 Gbits/sec                  
[  5]   3.00-4.00   sec  1.12 GBytes  9.61 Gbits/sec                  
[  5]   4.00-5.00   sec  1.11 GBytes  9.54 Gbits/sec                  
[  5]   5.00-6.00   sec  1.11 GBytes  9.54 Gbits/sec                  
[  5]   6.00-7.00   sec  1.12 GBytes  9.60 Gbits/sec                  
[  5]   7.00-8.00   sec  1.10 GBytes  9.45 Gbits/sec                  
[  5]   8.00-9.00   sec  1.09 GBytes  9.33 Gbits/sec                  
[  5]   9.00-10.00  sec  1.13 GBytes  9.67 Gbits/sec                  
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  11.0 GBytes  9.47 Gbits/sec    0            sender
[  5]   0.00-10.00  sec  11.0 GBytes  9.47 Gbits/sec                  receiver

u32 match tcp src 5201 0xffff and equivalent on the other side:

[  5] local 10.0.0.2 port 53260 connected to 10.0.0.1 port 5201
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-1.00   sec  1.21 GBytes  10.4 Gbits/sec                  
[  5]   1.00-2.00   sec  1.19 GBytes  10.2 Gbits/sec                  
[  5]   2.00-3.00   sec  1.21 GBytes  10.4 Gbits/sec                  
[  5]   3.00-4.00   sec  1.20 GBytes  10.3 Gbits/sec                  
[  5]   4.00-5.00   sec  1.24 GBytes  10.6 Gbits/sec                  
[  5]   5.00-6.00   sec  1.22 GBytes  10.5 Gbits/sec                  
[  5]   6.00-7.00   sec  1.23 GBytes  10.6 Gbits/sec                  
[  5]   7.00-8.00   sec  1.23 GBytes  10.6 Gbits/sec                  
[  5]   8.00-9.00   sec  1.23 GBytes  10.6 Gbits/sec                  
[  5]   9.00-10.00  sec  1.21 GBytes  10.4 Gbits/sec                  
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  12.2 GBytes  10.5 Gbits/sec    0            sender
[  5]   0.00-10.00  sec  12.2 GBytes  10.5 Gbits/sec                  receiver

No tc filter, with discrimination done in the BPF programme:

[  5] local 10.0.0.2 port 36288 connected to 10.0.0.1 port 5201
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-1.00   sec  1.26 GBytes  10.9 Gbits/sec                  
[  5]   1.00-2.00   sec  1.31 GBytes  11.2 Gbits/sec                  
[  5]   2.00-3.00   sec  1.31 GBytes  11.2 Gbits/sec                  
[  5]   3.00-4.00   sec  1.29 GBytes  11.1 Gbits/sec                  
[  5]   4.00-5.00   sec  1.26 GBytes  10.8 Gbits/sec                  
[  5]   5.00-6.00   sec  1.26 GBytes  10.8 Gbits/sec                  
[  5]   6.00-7.00   sec  1.25 GBytes  10.8 Gbits/sec                  
[  5]   7.00-8.00   sec  1.24 GBytes  10.6 Gbits/sec                  
[  5]   8.00-9.00   sec  1.25 GBytes  10.7 Gbits/sec                  
[  5]   9.00-10.00  sec  1.26 GBytes  10.9 Gbits/sec                  
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  12.7 GBytes  10.9 Gbits/sec    0            sender
[  5]   0.00-10.00  sec  12.7 GBytes  10.9 Gbits/sec                  receiver

Having multiple filters, to forward more than one port to the BPF programme, won't degrade performance. So I'm going to change my patch to use u32.
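The multi-port setup mentioned above could look like this (a sketch only; the interface variable and port list are illustrative):

```shell
# One u32 filter per port, all funnelling into chain 1, where the BPF
# program is attached once per direction.
for port in 5201 5202 5203; do
	tc filter add dev "${IFACE}" egress  u32 match ip dport "${port}" 0xffff \
		action goto chain 1
	tc filter add dev "${IFACE}" ingress u32 match ip sport "${port}" 0xffff \
		action goto chain 1
done
tc filter add dev "${IFACE}" egress  chain 1 bpf object-file tcp_in_udp_tc.o \
	section tc action csum udp
tc filter add dev "${IFACE}" ingress chain 1 bpf object-file tcp_in_udp_tc.o \
	section tc direct-action
```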

arinc9 commented Jul 28, 2025

u32 match tcp src 5201 0xffff won't match, whilst u32 match ip sport 5201 0xffff will. Looking into why.
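One likely explanation, going by the tc-u32 man page: match tcp src compares bytes relative to the nexthdr offset, which stays 0 unless a linked hash table derives it from the IPv4 header length, whereas match ip sport uses a fixed offset of 20 bytes into the IP header (and therefore only works when the IPv4 header carries no options). The canonical man-page setup for a working tcp match looks roughly like this (an untested sketch; interface and chain are examples):

```shell
# Create a hash table (handle 1:) to hold the TCP match, then link to it
# from the main table while deriving the nexthdr offset from the IHL field
# (low nibble of byte 0, times 4: hence mask 0x0f00 and shift 6 on the
# 16-bit read at offset 0).
tc filter add dev "${IFACE}" ingress handle 1: protocol ip u32 divisor 1
tc filter add dev "${IFACE}" ingress protocol ip u32 \
	match ip protocol 6 0xff \
	offset at 0 mask 0x0f00 shift 6 \
	link 1:
tc filter add dev "${IFACE}" ingress protocol ip u32 ht 1:: \
	match tcp src 5201 0xffff action goto chain 1
```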

arinc9 commented Jul 29, 2025

Before I get into the tc filter issue, here's my test result with the offloads. Setting gso_max_segs to 0 or 1 is necessary, and it's the only option that has an effect; there is no need to turn off any offloading option with ethtool. I'm testing on a veth interface with these offload options (untouched; LRO is not supported on veth):

$ sudo ip netns exec client ethtool -k eth0 | grep ": on"
rx-checksumming: on
tx-checksumming: on
        tx-checksum-ip-generic: on
        tx-checksum-sctp: on
scatter-gather: on
        tx-scatter-gather: on
        tx-scatter-gather-fraglist: on
tcp-segmentation-offload: on
        tx-tcp-segmentation: on
        tx-tcp-ecn-segmentation: on
        tx-tcp-mangleid-segmentation: on
        tx-tcp6-segmentation: on
generic-segmentation-offload: on
rx-vlan-offload: on
tx-vlan-offload: on
highdma: on
tx-gre-segmentation: on
tx-gre-csum-segmentation: on
tx-ipxip4-segmentation: on
tx-ipxip6-segmentation: on
tx-udp_tnl-segmentation: on
tx-udp_tnl-csum-segmentation: on
tx-sctp-segmentation: on
tx-udp-segmentation: on
tx-gso-list: on
tx-vlan-stag-hw-insert: on
rx-vlan-stag-hw-parse: on
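The gso_max_segs setting described above can be applied with ip link (the interface name here is an example):

```shell
# Prevent GSO from handing multi-segment super-packets to the TC hook, so
# each encapsulated TCP segment fits in a single UDP datagram.
ip link set dev eth0 gso_max_segs 1
```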

matttbe commented Jul 29, 2025

> Setting gso_max_segs to 0 or 1 is necessary, and it's the only option that has an effect; there is no need to turn off any offloading option with ethtool. I'm testing on a veth interface with these offload options (untouched; LRO is not supported on veth):

Thank you for having checked that. Don't hesitate to update the README section. In egress, gso_max_segs should indeed be enough. In ingress, I think we don't need anything: we would have UDP packets, and I think UDP GRO is only done on demand, e.g. when userspace asks for it (setsockopt(IPPROTO_UDP, UDP_GRO)) or for some in-kernel tunnels. So it is possible that GRO and LRO don't need to be disabled, but that would have to be confirmed with HW supporting them. We don't want the packets to be merged or split.

arinc9 commented Jul 29, 2025

The only information I could find about LRO in the kernel source code is here:

https://github.com/torvalds/linux/blob/86aa721820952b793a12fc6e5a01734186c0c238/Documentation/networking/device_drivers/ethernet/intel/fm10k.rst#generic-receive-offload-aka-gro

If LRO only supports TCP, as documented there, then we don't need to disable it, as we receive UDP packets.

matttbe commented Jul 29, 2025

Indeed, apparently LRO is TCP (and IPv4?) only: https://lwn.net/Articles/358910/

> One other nice thing about GRO is that, unlike LRO, it is not limited to TCP/IPv4.

So if LRO is not for UDP and GRO with UDP is on demand only, I guess it means we don't need to change any HW offload.

Don't hesitate to reflect that in the README file. (Probably no need to change anything in test.sh, because there, the tunnelling is not done on the client/server, where the TCP connection is handled.)

arinc9 commented Jul 29, 2025

What about the commented-out commands to set gso_max_segs? Don't I need to uncomment those?

matttbe commented Jul 29, 2025

> What about the commented-out commands to set gso_max_segs? Don't I need to uncomment those?

I would say no: the test env is a bit particular:

cli --------- cpe --------- int --------- net --------- srv
       TCP           UDP           UDP           TCP

To be able to see the packets before and after the TC hooks, the BPF hooks are loaded on the cpe and net hosts, not on cli and srv, which generate the TCP traffic (and where gso_max_segs would do something). In this test env, we need to disable GRO so that we don't receive aggregated TCP packets that cannot fit in one UDP packet on the wire.

See: dae7fd2

Or we could have 2 test envs:

  • the existing one.
  • a new one with just cli-int-srv: the TC hooks are loaded on the client and server, plus gso_max_segs is set on both sides.
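A rough sketch of that second environment, for illustration only (the namespace and interface names are made up, and the server side would mirror this with sport and dport swapped):

```shell
# cli-int-srv variant: the TC hooks sit on the TCP endpoints themselves,
# so gso_max_segs has an effect there. Client side shown; untested.
ip netns exec cli ip link set dev int gso_max_segs 1
ip netns exec cli tc qdisc add dev int clsact
ip netns exec cli tc filter add dev int egress \
	u32 match ip dport 5201 0xffff action goto chain 1
ip netns exec cli tc filter add dev int egress chain 1 \
	bpf object-file tcp_in_udp_tc.o section tc action csum udp
ip netns exec cli tc filter add dev int ingress \
	u32 match ip sport 5201 0xffff action goto chain 1
ip netns exec cli tc filter add dev int ingress chain 1 \
	bpf object-file tcp_in_udp_tc.o section tc direct-action
```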

@arinc9 arinc9 force-pushed the pr branch 2 times, most recently from c065ac6 to 69807db Compare August 8, 2025 18:22
Offloads other than GSO and GRO do not break this type of traffic. Document
disabling GSO and explain why disabling GRO is not needed.

Signed-off-by: Chester A. Unal <[email protected]>
arinc9 commented Aug 8, 2025

@matttbe let me know if the current version of the series is OK.

arinc9 added 4 commits August 8, 2025 19:34
The layer 4 protocol and UDP or TCP port can be distinguished by a tc
filter. Document that and remove the logic to discriminate packets by UDP
or TCP port from the BPF programme.

Add warnings to the README.

Signed-off-by: Chester A. Unal <[email protected]>
Cellular interfaces do not include a layer 2 header. When reading the Ethernet
header, if no IPv4 or IPv6 header is found, assume that the packet does not
have an Ethernet header and check whether the protocol is IPv4 or IPv6.

Signed-off-by: Chester A. Unal <[email protected]>
Remove the unused includes. Sort in alphabetical order where possible.

Signed-off-by: Chester A. Unal <[email protected]>
Only the make, clang, libelf-dev, libc6-dev-i386, and libbpf-dev packages
are needed. Document them.

Signed-off-by: Chester A. Unal <[email protected]>
arinc9 commented Aug 15, 2025

@matttbe reminder this is still up for review.
