Commit acde8fb

lightway-server: Use i/o uring for all i/o, not just tun.
This does not consistently improve performance, but it reduces CPU overhead by around 50%-100% (i.e. half to one core) under heavy traffic, while adding perhaps a few hundred Mbps to a speedtest.net download test and making a negligible difference to the upload test. It also removes about 1ms of latency in the same tests. Finally, the standard deviation across multiple test runs appears to be lower.

This appears to be due to a combination of avoiding async runtime overheads and removing various channels/queues in favour of a more direct model of interaction between the ring and the connections.

In addition to those benefits, we can now reach the same level of performance with far fewer slots on the TUN rx path: we use 64 slots (by default) and reach the same performance as with 1024 previously.

The way io_uring handles blocking vs async for tun devices seems to be suboptimal. In blocking mode (on kernels where the tun device does not advertise `FMODE_NOWAIT`) things are very slow. In async mode, more and more time is spent on bookkeeping and polling as the number of slots increases, plus there is a high rate of EAGAIN results (due to a request timing out after multiple failed polls[^0]), which waste time being requeued. This is related to axboe/liburing#886 and axboe/liburing#239.

For UDP/TCP sockets io_uring behaves well with the socket in blocking mode, which avoids processing lots of EAGAIN results.

Tuning the slots for each I/O path is a bit of an art (more is definitely not always better) and the sweet spot varies depending on the I/O device, so we provide various tunables instead of just splitting the ring evenly. With this there's no real reason to have a very large ring; it's the number of in-flight requests which matters.

This is specific to the server since it relies on kernel features and correctness (i.e. lack of bugs) which may not be upheld on an arbitrary client system (whereas server operators are assumed to have more control over what they run). It is also not portable to non-Linux systems.
It is known to work with Linux 6.1 (as found in Debian 12 AKA bookworm). Note that this kernel version contains a bug which causes the `iou-sqp-*` kernel thread to get stuck (unkillable) if the tun is in blocking mode, so blocking mode is opt-in (`--iouring-tun-blocking`, off by default). Enabling that option on a kernel which contains [the fix][] allows equivalent performance with fewer slots on the ring.

[^0]: When data becomes available _all_ requests are woken but only one will find data; the rest will see EAGAIN, and after a certain number of such events io_uring will propagate this back to userspace.

[the fix]: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=438b406055cd21105aad77db7938ee4720b09bee
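Since the tun `FMODE_NOWAIT` fix first appeared in mainline v6.4-rc1, an operator may want a conservative check before turning on blocking mode. The following is a minimal sketch, assuming the release string comes from `uname -r` or `/proc/sys/kernel/osrelease`. `release_has_tun_nowait_fix` is a hypothetical helper, not part of lightway-server, and it cannot detect distribution kernels that backport the fix to an older version number.

```rust
/// Hypothetical helper (not part of lightway-server): returns true if a
/// Linux release string such as "6.1.0-18-amd64" is at least v6.4, the
/// first mainline release containing the tun FMODE_NOWAIT fix.
///
/// Caveat: distributions backport fixes, so an older version number does
/// not prove the fix is absent; this check cannot see backports.
fn release_has_tun_nowait_fix(release: &str) -> bool {
    // Keep only the leading run of digits and dots, e.g. "6.1.0" from
    // "6.1.0-18-amd64".
    let numeric = release
        .split(|c: char| !(c.is_ascii_digit() || c == '.'))
        .next()
        .unwrap_or("");
    let mut parts = numeric.split('.').map(|p| p.parse::<u32>());
    match (parts.next(), parts.next()) {
        (Some(Ok(major)), Some(Ok(minor))) => (major, minor) >= (6, 4),
        // Unparseable input: be conservative and report "no fix".
        _ => false,
    }
}

fn main() {
    for release in ["6.1.0-18-amd64", "6.4.0", "6.8.12-arch1-1", "5.15.0-91-generic"] {
        println!("{release}: {}", release_has_tun_nowait_fix(release));
    }
}
```

Returning `false` for anything it cannot parse keeps the default (non-blocking) behaviour, which matches the commit's choice of making blocking mode opt-in.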
1 parent 72a9e8f commit acde8fb


20 files changed (+1906, -446 lines)


Cargo.lock

Lines changed: 2 additions & 0 deletions

Cargo.toml

Lines changed: 2 additions & 0 deletions
@@ -34,6 +34,7 @@ clap = { version = "4.4.7", features = ["derive"] }
 ctrlc = { version = "3.4.2", features = ["termination"] }
 delegate = "0.12.0"
 educe = { version = "0.6.0", default-features = false, features = ["Debug"] }
+io-uring = "0.7.0"
 ipnet = { version = "2.8.0", features = ["serde"]}
 libc = "0.2.152"
 lightway-app-utils = { path = "./lightway-app-utils" }
@@ -52,3 +53,4 @@ tokio-util = "0.7.10"
 tracing = "0.1.37"
 tracing-subscriber = "0.3.17"
 twelf = { version = "0.15.0", default-features = false, features = ["env", "clap", "yaml"]}
+tun = { version = "0.7.1" }

README.md

Lines changed: 1 addition & 1 deletion
@@ -25,7 +25,7 @@ Protocol and design documentation can be found in the
 Lightway rust implementation currently supports Linux OS. Both x86_64 and arm64 platforms are
 supported and built as part of CI.
 
-Support for other platforms will be added soon.
+Support for other client platforms will be added soon.
 
 ## Development steps

lightway-app-utils/Cargo.toml

Lines changed: 3 additions & 2 deletions
@@ -23,7 +23,7 @@ bytes.workspace = true
 clap.workspace = true
 fs-mistrust = { version = "0.8.0", default-features = false }
 humantime = "2.1.0"
-io-uring = { version = "0.7.0", optional = true }
+io-uring = { workspace = true, optional = true }
 ipnet.workspace = true
 libc.workspace = true
 lightway-core.workspace = true
@@ -38,11 +38,12 @@ tokio-stream = { workspace = true, optional = true }
 tokio-util.workspace = true
 tracing.workspace = true
 tracing-subscriber = { workspace = true, features = ["json"] }
-tun = { version = "0.7", features = ["async"] }
+tun = { workspace = true, features = ["async"] }
 
 [[example]]
 name = "udprelay"
 path = "examples/udprelay.rs"
+required-features = ["io-uring"]
 
 [dev-dependencies]
 async-trait.workspace = true

lightway-app-utils/src/lib.rs

Lines changed: 3 additions & 0 deletions
@@ -14,6 +14,9 @@ mod event_stream;
 mod iouring;
 mod tun;
 
+mod net;
+pub use net::{sockaddr_from_socket_addr, socket_addr_from_sockaddr};
+
 #[cfg(feature = "tokio")]
 pub use connection_ticker::{
     connection_ticker_cb, ConnectionTicker, ConnectionTickerState, ConnectionTickerTask, Tickable,

lightway-app-utils/src/net.rs

Lines changed: 179 additions & 0 deletions
@@ -0,0 +1,179 @@
+use std::{io, net::SocketAddr};
+
+/// Convert from `libc::sockaddr_storage` to `std::net::SocketAddr`
+#[allow(unsafe_code)]
+pub fn socket_addr_from_sockaddr(
+    storage: &libc::sockaddr_storage,
+    len: libc::socklen_t,
+) -> io::Result<SocketAddr> {
+    match storage.ss_family as libc::c_int {
+        libc::AF_INET => {
+            if (len as usize) < std::mem::size_of::<libc::sockaddr_in>() {
+                return Err(io::Error::new(
+                    io::ErrorKind::InvalidInput,
+                    "invalid argument (inet len)",
+                ));
+            }
+
+            // SAFETY: Casting from sockaddr_storage to sockaddr_in is safe since we have validated the len.
+            let addr =
+                unsafe { &*(storage as *const libc::sockaddr_storage as *const libc::sockaddr_in) };
+
+            let ip = u32::from_be(addr.sin_addr.s_addr);
+            let ip = std::net::Ipv4Addr::from_bits(ip);
+            let port = u16::from_be(addr.sin_port);
+
+            Ok((ip, port).into())
+        }
+        libc::AF_INET6 => {
+            if (len as usize) < std::mem::size_of::<libc::sockaddr_in6>() {
+                return Err(io::Error::new(
+                    io::ErrorKind::InvalidInput,
+                    "invalid argument (inet6 len)",
+                ));
+            }
+            // SAFETY: Casting from sockaddr_storage to sockaddr_in6 is safe since we have validated the len.
+            let addr = unsafe {
+                &*(storage as *const libc::sockaddr_storage as *const libc::sockaddr_in6)
+            };
+
+            let ip = u128::from_be_bytes(addr.sin6_addr.s6_addr);
+            let ip = std::net::Ipv6Addr::from_bits(ip);
+            let port = u16::from_be(addr.sin6_port);
+
+            Ok((ip, port).into())
+        }
+        _ => Err(io::Error::new(
+            std::io::ErrorKind::InvalidInput,
+            "invalid argument (ss_family)",
+        )),
+    }
+}
+
+/// Convert from `std::net::SocketAddr` to `libc::sockaddr_storage`+`libc::socklen_t`
+#[allow(unsafe_code)]
+pub fn sockaddr_from_socket_addr(addr: SocketAddr) -> (libc::sockaddr_storage, libc::socklen_t) {
+    // SAFETY: All zeroes is a valid sockaddr_storage
+    let mut storage: libc::sockaddr_storage = unsafe { std::mem::zeroed() };
+
+    let len = match addr {
+        SocketAddr::V4(v4) => {
+            let p = &mut storage as *mut libc::sockaddr_storage as *mut libc::sockaddr_in;
+            // SAFETY: sockaddr_storage is defined to be big enough for any sockaddr_*.
+            unsafe {
+                p.write(libc::sockaddr_in {
+                    sin_family: libc::AF_INET as _,
+                    sin_port: v4.port().to_be(),
+                    sin_addr: libc::in_addr {
+                        s_addr: v4.ip().to_bits().to_be(),
+                    },
+                    sin_zero: Default::default(),
+                })
+            };
+            std::mem::size_of::<libc::sockaddr_in>() as libc::socklen_t
+        }
+        SocketAddr::V6(v6) => {
+            let p = &mut storage as *mut libc::sockaddr_storage as *mut libc::sockaddr_in6;
+            // SAFETY: sockaddr_storage is defined to be big enough for any sockaddr_*.
+            unsafe {
+                p.write(libc::sockaddr_in6 {
+                    sin6_family: libc::AF_INET6 as _,
+                    sin6_port: v6.port().to_be(),
+                    sin6_flowinfo: v6.flowinfo().to_be(),
+                    sin6_addr: libc::in6_addr {
+                        s6_addr: v6.ip().to_bits().to_be_bytes(),
+                    },
+                    sin6_scope_id: v6.scope_id().to_be(),
+                })
+            };
+            std::mem::size_of::<libc::sockaddr_in6>() as libc::socklen_t
+        }
+    };
+
+    (storage, len)
+}
+
+#[cfg(test)]
+mod tests {
+    #![allow(unsafe_code, clippy::undocumented_unsafe_blocks)]
+
+    use std::{
+        net::{IpAddr, Ipv4Addr, Ipv6Addr},
+        str::FromStr as _,
+    };
+
+    use super::*;
+
+    use test_case::test_case;
+
+    #[test]
+    fn socket_addr_from_sockaddr_unknown_af() {
+        // Test assumes these don't match the zero initialized
+        // libc::sockaddr_storage::ss_family.
+        assert_ne!(libc::AF_INET, 0);
+        assert_ne!(libc::AF_INET6, 0);
+
+        let storage = unsafe { std::mem::zeroed() };
+        let err =
+            socket_addr_from_sockaddr(&storage, std::mem::size_of::<libc::sockaddr_storage>() as _)
+                .unwrap_err();
+
+        assert!(matches!(err.kind(), std::io::ErrorKind::InvalidInput));
+        assert!(err.to_string().contains("invalid argument (ss_family)"));
+    }
+
+    #[test]
+    fn socket_addr_from_sockaddr_unknown_af_inet_short() {
+        let mut storage: libc::sockaddr_storage = unsafe { std::mem::zeroed() };
+        storage.ss_family = libc::AF_INET as libc::sa_family_t;
+
+        let err = socket_addr_from_sockaddr(
+            &storage,
+            (std::mem::size_of::<libc::sockaddr_in>() - 1) as _,
+        )
+        .unwrap_err();
+
+        assert!(matches!(err.kind(), std::io::ErrorKind::InvalidInput));
+        assert!(err.to_string().contains("invalid argument (inet len)"));
+    }
+
+    #[test]
+    fn socket_addr_from_sockaddr_unknown_af_inet6_short() {
+        let mut storage: libc::sockaddr_storage = unsafe { std::mem::zeroed() };
+        storage.ss_family = libc::AF_INET6 as libc::sa_family_t;
+
+        let err = socket_addr_from_sockaddr(
+            &storage,
+            (std::mem::size_of::<libc::sockaddr_in6>() - 1) as _,
+        )
+        .unwrap_err();
+
+        assert!(matches!(err.kind(), std::io::ErrorKind::InvalidInput));
+        assert!(err.to_string().contains("invalid argument (inet6 len)"));
+    }
+
+    #[test]
+    fn sockaddr_from_socket_addr_inet() {
+        let socket_addr = SocketAddr::new(IpAddr::V4(Ipv4Addr::new(127, 0, 0, 1)), 8080);
+        let (storage, len) = sockaddr_from_socket_addr(socket_addr);
+        assert_eq!(storage.ss_family, libc::AF_INET as libc::sa_family_t);
+        assert_eq!(len as usize, std::mem::size_of::<libc::sockaddr_in>());
+    }
+
+    #[test]
+    fn sockaddr_from_socket_addr_inet6() {
+        let socket_addr = SocketAddr::new(IpAddr::V6(Ipv6Addr::new(0, 0, 0, 0, 0, 0, 0, 1)), 8080);
+        let (storage, len) = sockaddr_from_socket_addr(socket_addr);
+        assert_eq!(storage.ss_family, libc::AF_INET6 as libc::sa_family_t);
+        assert_eq!(len as usize, std::mem::size_of::<libc::sockaddr_in6>());
+    }
+
+    #[test_case("127.0.0.1:443")]
+    #[test_case("[::1]:8888")]
+    fn round_trip(addr: &str) {
+        let orig = SocketAddr::from_str(addr).unwrap();
+        let (storage, len) = sockaddr_from_socket_addr(orig);
+        let round_tripped = socket_addr_from_sockaddr(&storage, len).unwrap();
+        assert_eq!(orig, round_tripped)
+    }
+}
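The two helpers in `net.rs` depend on `sin_port` and `sin_addr.s_addr` holding bytes in network (big-endian) order, which is why they pair `from_be`/`to_be` with those native integer fields, while `s6_addr` is already a `[u8; 16]` and goes through `u128::from_be_bytes`/`to_be_bytes`. The std-only sketch below simulates those field values with plain byte arrays (no libc structs) to show the conversions hold on both little- and big-endian hosts. Like the committed code, it needs Rust 1.80+ for `Ipv4Addr::from_bits`/`to_bits`.

```rust
use std::net::{Ipv4Addr, Ipv6Addr};

fn main() {
    // Bytes as they would sit in a sockaddr_in for 127.0.0.1:8080:
    // both fields are stored in network (big-endian) byte order.
    let addr_bytes: [u8; 4] = [127, 0, 0, 1];
    let port_bytes: [u8; 2] = 8080u16.to_be_bytes();

    // Reading a struct field such as `addr.sin_addr.s_addr` or
    // `addr.sin_port` reinterprets those bytes as a native integer.
    let s_addr = u32::from_ne_bytes(addr_bytes);
    let sin_port = u16::from_ne_bytes(port_bytes);

    // Decoding, mirroring socket_addr_from_sockaddr: `from_be` is a no-op
    // on big-endian hosts and a byte swap on little-endian ones.
    let ip = Ipv4Addr::from_bits(u32::from_be(s_addr));
    let port = u16::from_be(sin_port);
    assert_eq!(ip, Ipv4Addr::new(127, 0, 0, 1));
    assert_eq!(port, 8080);

    // Encoding, mirroring sockaddr_from_socket_addr: `to_be` yields a native
    // integer whose in-memory bytes are back in network order.
    assert_eq!(ip.to_bits().to_be().to_ne_bytes(), addr_bytes);
    assert_eq!(port.to_be().to_ne_bytes(), port_bytes);

    // s6_addr is already a byte array in network order, so IPv6 goes
    // straight through from_be_bytes / to_be_bytes.
    let s6_addr: [u8; 16] = Ipv6Addr::LOCALHOST.octets();
    let ip6 = Ipv6Addr::from_bits(u128::from_be_bytes(s6_addr));
    assert_eq!(ip6, Ipv6Addr::LOCALHOST);
    assert_eq!(ip6.to_bits().to_be_bytes(), s6_addr);

    println!("{ip}:{port} and [{ip6}]:{port}");
}
```

The round trip only needs the encode and decode sides to agree on network byte order; the `round_trip` test case in `net.rs` checks the same property against real `libc` structs.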

lightway-server/Cargo.toml

Lines changed: 3 additions & 2 deletions
@@ -9,9 +9,8 @@ license = "GPL-2.0-only"
 # See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html
 
 [features]
-default = ["io-uring"]
+default = []
 debug = ["lightway-core/debug"]
-io-uring = ["lightway-app-utils/io-uring"]
 
 [lints]
 workspace = true
@@ -26,6 +25,7 @@ clap.workspace = true
 ctrlc.workspace = true
 delegate.workspace = true
 educe.workspace = true
+io-uring.workspace = true
 ipnet.workspace = true
 jsonwebtoken = "9.3.0"
 libc.workspace = true
@@ -48,6 +48,7 @@ tokio-stream = { workspace = true, features = ["time"] }
 tracing.workspace = true
 tracing-log = "0.2.0"
 tracing-subscriber = { workspace = true, features = ["json"] }
+tun.workspace = true
 twelf.workspace = true
 
 [dev-dependencies]

lightway-server/src/args.rs

Lines changed: 53 additions & 7 deletions
@@ -71,13 +71,25 @@ pub struct Config {
     #[clap(long, default_value_t)]
     pub enable_pqc: bool,
 
-    /// Enable IO-uring interface for Tunnel
-    #[clap(long, default_value_t)]
-    pub enable_tun_iouring: bool,
-
-    /// IO-uring submission queue count. Only applicable when
-    /// `enable_tun_iouring` is `true`
-    // Any value more than 1024 negatively impact the throughput
+    /// Total IO-uring submission queue count.
+    ///
+    /// Must be larger than the total of:
+    ///
+    /// UDP:
+    ///
+    /// iouring_tun_rx_count + iouring_udp_rx_count +
+    /// iouring_tx_count + 1 (cancellation request)
+    ///
+    /// TCP:
+    ///
+    /// iouring_tun_rx_count + iouring_tx_count + 1 (cancellation
+    /// request) + 2 * maximum number of connections.
+    ///
+    /// Each connection actually uses up to 3 slots: a persistent
+    /// recv request and on demand slots for TX and cancellation
+    /// (teardown).
+    ///
+    /// There is no downside to setting this much larger.
     #[clap(long, default_value_t = 1024)]
     pub iouring_entry_count: usize,
 
@@ -87,6 +99,36 @@ pub struct Config {
     #[clap(long, default_value = "100ms")]
     pub iouring_sqpoll_idle_time: Duration,
 
+    /// Number of concurrent TUN device read requests to issue to
+    /// IO-uring. Setting this too large may negatively impact
+    /// performance.
+    #[clap(long, default_value_t = 64)]
+    pub iouring_tun_rx_count: u32,
+
+    /// Configure TUN device in blocking mode. This can allow
+    /// equivalent performance with fewer `iouring-tun-rx-count`
+    /// entries but can significantly harm performance on some kernels
+    /// where the kernel does not indicate that the tun device handles
+    /// `FMODE_NOWAIT`.
+    ///
+    /// If blocking mode is enabled then `iouring_tun_rx_count` may be
+    /// set much lower.
+    ///
+    /// This was fixed by <https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=438b406055cd21105aad77db7938ee4720b09bee>
+    /// which was part of v6.4-rc1.
+    #[clap(long, default_value_t = false)]
+    pub iouring_tun_blocking: bool,
+
+    /// Number of concurrent UDP socket recvmsg requests to issue to
+    /// IO-uring.
+    #[clap(long, default_value_t = 32)]
+    pub iouring_udp_rx_count: u32,
+
+    /// Maximum number of concurrent UDP + TUN sendmsg/write requests
+    /// to issue to IO-uring.
+    #[clap(long, default_value_t = 512)]
+    pub iouring_tx_count: u32,
+
     /// Log format
     #[clap(long, value_enum, default_value_t = LogFormat::Full)]
     pub log_format: LogFormat,
@@ -111,6 +153,10 @@ pub struct Config {
     #[clap(long, default_value_t = ByteSize::mib(15))]
     pub udp_buffer_size: ByteSize,
 
+    /// Set TCP buffer size. Default value is 256 KiB.
+    #[clap(long, default_value_t = ByteSize::kib(256))]
+    pub tcp_buffer_size: ByteSize,
+
     /// Enable WolfSSL debug logging
     #[cfg(feature = "debug")]
     #[clap(long)]
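The `iouring_entry_count` doc comment defines a slot budget per transport. The following is a minimal sketch that restates that arithmetic with the default values from `args.rs` (1024 entries, 64 TUN rx, 32 UDP rx, 512 tx). `Transport`, `required_slots` and `max_tcp_connections` are hypothetical names, not lightway-server APIs; the TCP term uses the comment's `2 * maximum number of connections` exactly as written, and `u64` stands in for the `usize`/`u32` fields to avoid overflow.

```rust
/// Hypothetical model of the two transports in the doc comment.
#[derive(Clone, Copy)]
enum Transport {
    Udp,
    Tcp { max_connections: u32 },
}

/// Slot total per the doc comment (the entry count must be larger):
///   UDP: tun_rx + udp_rx + tx + 1 (cancellation request)
///   TCP: tun_rx + tx + 1 (cancellation request) + 2 * max connections
fn required_slots(transport: Transport, tun_rx: u32, udp_rx: u32, tx: u32) -> u64 {
    let base = u64::from(tun_rx) + u64::from(tx) + 1;
    match transport {
        Transport::Udp => base + u64::from(udp_rx),
        Transport::Tcp { max_connections } => base + 2 * u64::from(max_connections),
    }
}

/// Largest TCP connection count for which `entry_count` is still strictly
/// larger than the required total, or `None` if even zero connections fail.
fn max_tcp_connections(entry_count: u64, tun_rx: u32, tx: u32) -> Option<u64> {
    let base = u64::from(tun_rx) + u64::from(tx) + 1;
    // entry_count > base + 2n  <=>  2n <= entry_count - base - 1
    entry_count.checked_sub(base + 1).map(|spare| spare / 2)
}

fn main() {
    // Defaults from args.rs.
    let (entries, tun_rx, udp_rx, tx) = (1024u64, 64u32, 32u32, 512u32);

    let udp = required_slots(Transport::Udp, tun_rx, udp_rx, tx);
    println!("UDP: {udp} slots required, fits in {entries}: {}", entries > udp);

    match max_tcp_connections(entries, tun_rx, tx) {
        Some(n) => {
            // Sanity check: n connections fit, n + 1 do not.
            let at = |c: u32| required_slots(Transport::Tcp { max_connections: c }, tun_rx, udp_rx, tx);
            assert!(entries > at(n as u32) && entries <= at(n as u32 + 1));
            println!("TCP: up to {n} connections with {entries} entries");
        }
        None => println!("TCP: {entries} entries cannot cover the fixed slots"),
    }
}
```

With the defaults, UDP needs 609 slots, and a 1024-entry ring admits up to 223 TCP connections under this formula. This is consistent with the commit's point that ring size is cheap and the in-flight counts are the real tunables.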
