-
Notifications
You must be signed in to change notification settings - Fork 4
remove port check, document configuration with tc, and support layer 3 interfaces #1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @arinc9,
Thank you for this PR. The modification in terms of code looks OK to me, just a small comment in the README and test.sh
.
I quickly tested it and I noticed a performance drop. I think I wrote something about that somewhere in the csum
branch: I was suspecting a performance drop with this modification because TC will have to parse each packet up to the layer 4 to find the port, and the BPF program will do the same to read other parts of the L4. Without your modifications, I can typically reach ~3.25 Gbps with iperf3 -c 10.0.2.2 -ZR
when using test.sh
. In the same conditions, with your modifications, the performances go down to ~2.6 Gbps, so around 20%. TBH, I wouldn't have expected a so big impact, ~20% seems quite high.
I like the fact it simplifies the eBPF C code, but the perf impact is maybe not worth it. It depends on other checks (packet mark?) that need to be done. WDYT?
By chance, are there no other alternatives? In some BPF programs, I know that you can set some variables per program, typically to set the port(s), or the side. But I don't think you can do that here with the TC hooks, right? I guess an alternative would be to use maps, but I guess there would be an impact as well (and probably the program will no longer be loaded by the tc filter
command).
The other notes you have are still interesting. We could also document the idea and point to this PR to say that this version is more flexible, but there is a perf impact. (or do the opposite)
Can you try the u32 filter instead of flower? Let's see if that performs better. Example:
|
Switching to one port instead of a range helps to go from ~2.6 (~20% drop) to ~2.8 Gbps (~15% drop) diff --git a/test.sh b/test.sh
index 0dc9466..26bb3e6 100755
--- a/test.sh
+++ b/test.sh
@@ -40,15 +40,15 @@ server()
tc_client()
{
- local ns="${NS}_cpe" iface="int" port_start="5201" port_end="5203"
+ local ns="${NS}_cpe" iface="int" port="5201"
# ip netns will umount everything on exit
ip netns exec "${ns}" sh -c "mount -t debugfs none /sys/kernel/debug && cat /sys/kernel/debug/tracing/trace_pipe" &
tc -n "${ns}" qdisc add dev "${iface}" clsact
- tc -n "${ns}" filter add dev "${iface}" egress protocol ip flower ip_proto tcp dst_port "${port_start}"-"${port_end}" action goto chain 1
+ tc -n "${ns}" filter add dev "${iface}" egress protocol ip flower ip_proto tcp dst_port "${port}" action goto chain 1
tc -n "${ns}" filter add dev "${iface}" egress chain 1 bpf object-file tcp_in_udp_tc.o section tc action csum udp
- tc -n "${ns}" filter add dev "${iface}" ingress protocol ip flower ip_proto udp src_port "${port_start}"-"${port_end}" action goto chain 1
+ tc -n "${ns}" filter add dev "${iface}" ingress protocol ip flower ip_proto udp src_port "${port}" action goto chain 1
tc -n "${ns}" filter add dev "${iface}" ingress chain 1 bpf object-file tcp_in_udp_tc.o section tc direct-action
tc -n "${ns}" filter show dev "${iface}" egress
@@ -61,15 +61,15 @@ tc_client()
tc_server()
{
- local ns="${NS}_net" iface="int" port_start="5201" port_end="5203"
+ local ns="${NS}_net" iface="int" port="5201"
# ip netns will umount everything on exit
ip netns exec "${ns}" sh -c "mount -t debugfs none /sys/kernel/debug && cat /sys/kernel/debug/tracing/trace_pipe" &
tc -n "${ns}" qdisc add dev "${iface}" clsact
- tc -n "${ns}" filter add dev "${iface}" egress protocol ip flower ip_proto tcp src_port "${port_start}"-"${port_end}" action goto chain 1
+ tc -n "${ns}" filter add dev "${iface}" egress protocol ip flower ip_proto tcp src_port "${port}" action goto chain 1
tc -n "${ns}" filter add dev "${iface}" egress chain 1 bpf object-file tcp_in_udp_tc.o section tc action csum udp
- tc -n "${ns}" filter add dev "${iface}" ingress protocol ip flower ip_proto udp dst_port "${port_start}"-"${port_end}" action goto chain 1
+ tc -n "${ns}" filter add dev "${iface}" ingress protocol ip flower ip_proto udp dst_port "${port}" action goto chain 1
tc -n "${ns}" filter add dev "${iface}" ingress chain 1 bpf object-file tcp_in_udp_tc.o section tc direct-action
tc -n "${ns}" filter show dev "${iface}" egress And switching to diff --git a/test.sh b/test.sh
index 26bb3e6..fbd4b50 100755
--- a/test.sh
+++ b/test.sh
@@ -46,9 +46,9 @@ tc_client()
ip netns exec "${ns}" sh -c "mount -t debugfs none /sys/kernel/debug && cat /sys/kernel/debug/tracing/trace_pipe" &
tc -n "${ns}" qdisc add dev "${iface}" clsact
- tc -n "${ns}" filter add dev "${iface}" egress protocol ip flower ip_proto tcp dst_port "${port}" action goto chain 1
+ tc -n "${ns}" filter add dev "${iface}" egress u32 match ip dport ${port} 0xffff action goto chain 1
tc -n "${ns}" filter add dev "${iface}" egress chain 1 bpf object-file tcp_in_udp_tc.o section tc action csum udp
- tc -n "${ns}" filter add dev "${iface}" ingress protocol ip flower ip_proto udp src_port "${port}" action goto chain 1
+ tc -n "${ns}" filter add dev "${iface}" ingress u32 match ip sport ${port} 0xffff action goto chain 1
tc -n "${ns}" filter add dev "${iface}" ingress chain 1 bpf object-file tcp_in_udp_tc.o section tc direct-action
tc -n "${ns}" filter show dev "${iface}" egress
@@ -67,9 +67,9 @@ tc_server()
ip netns exec "${ns}" sh -c "mount -t debugfs none /sys/kernel/debug && cat /sys/kernel/debug/tracing/trace_pipe" &
tc -n "${ns}" qdisc add dev "${iface}" clsact
- tc -n "${ns}" filter add dev "${iface}" egress protocol ip flower ip_proto tcp src_port "${port}" action goto chain 1
+ tc -n "${ns}" filter add dev "${iface}" egress u32 match ip sport ${port} 0xffff action goto chain 1
tc -n "${ns}" filter add dev "${iface}" egress chain 1 bpf object-file tcp_in_udp_tc.o section tc action csum udp
- tc -n "${ns}" filter add dev "${iface}" ingress protocol ip flower ip_proto udp dst_port "${port}" action goto chain 1
+ tc -n "${ns}" filter add dev "${iface}" ingress u32 match ip dport ${port} 0xffff action goto chain 1
tc -n "${ns}" filter add dev "${iface}" ingress chain 1 bpf object-file tcp_in_udp_tc.o section tc direct-action
tc -n "${ns}" filter show dev "${iface}" egress 5% drop but more flexible seems OK. Do you have similar results on your side? |
I can also see u32 performing better on single thread (iperf3 uses a thread per stream since 2023).
no tc filter, discrimination at the BPF programme:
Having multiple filters to allow more than one port forwarded to the BPF programme won't degrade performance. So I'm going to change my patch to use u32. |
|
Before I get into the tc filter issue, here's my test result with the offloads. Setting gso_max_segs to 0 or 1 is necessary and it's the only option that has an effect. No need to turn off any offloading option using ethtool. Testing on veth interface with these offload options (untouched, LRO is not supported on veth):
|
Thank you for having checked that. Don't hesitate to update the README section. In egress, |
The only information I could find about LRO in kernel source code is here: If LRO only supports TCP as documented there, then we don't need to disable it as we receive UDP packets. |
Indeed, apparently LRO is TCP (and IPv4?) only: https://lwn.net/Articles/358910/
So if LRO is not for UDP and GRO with UDP is on demand only, I guess it means we don't need to change any HW offload. Don't hesitate to reflect that in the README file. (Probably no need to change anything in |
What about the commented out command to set gso_max_segs? Don't I need to uncomment those? |
I would say no: the test env is a bit particular:
To be able to see the packets before and after the TC hooks, the BPF hooks are loaded on See: dae7fd2 Or we could have 2 tests env:
|
c065ac6
to
69807db
Compare
Offloads other than GSO and GRO do not break this type of traffic. Document disabling GSO and explain why disabling GRO is not needed. Signed-off-by: Chester A. Unal <[email protected]>
@matttbe let me know if the current version of the series is ok. |
The layer 4 protocol and UDP or TCP port can be distinguished by a tc filter. Document that and remove the logic to discriminate packets by UDP or TCP port from the BPF programme. Add warnings to the README. Signed-off-by: Chester A. Unal <[email protected]>
Cellular interfaces do not include layer 2 header. When reading the Ethernet header, if there is no IPv4 or IPv6 header found, assume that the packet does not have an Ethernet header and check whether the protocol is IPv4 or IPv6. Signed-off-by: Chester A. Unal <[email protected]>
Remove the unused includes. Sort in alphabetical order where possible. Signed-off-by: Chester A. Unal <[email protected]>
Only the make, clang, libelf-dev, libc6-dev-i386, and libbpf-dev packages are needed. Document them. Signed-off-by: Chester A. Unal <[email protected]>
@matttbe reminder this is still up for review. |
Hey Matt!
This pull request removes the port check logic from the BPF programme, documents configuration with tc, brings support for layer 3 interfaces, and improves the documentation.
Cheers.
Chester A.