# tokio-splice2

The splice(2) syscall API wrapper, async Rust ready.

Implements splice(2)-based unidirectional/bidirectional data transmission, just like `tokio::io::copy_bidirectional`. See `examples/proxy.rs`, with a Go implementation for comparison.
---
TODO:
- Reuse pipe.
---

0.3.0:
- MSRV is now 1.70.0.
- Replace `libc` with `rustix`.
- Add `tracing` logger support.
- Add unidirectional copy.
- Return `TrafficResult` instead of `io::Result<T>` so that traffic statistics are still available when an error occurs.
- (Experimental) Add `tokio::fs::File` support to splice from (like `sendfile`) / to (not fully tested).
- (Experimental) Traffic rate limitation support.
MSRV: 1.70.0
While no formal benchmarks have been conducted, iperf3 testing results indicate that the throughput is comparable to tokio-splice and slightly outperforms the Go implementation.
---

- When splicing data from a file to a pipe and then from the pipe to a socket, the transmitted data is referenced from the file's page cache. If the original file is modified while the splice operation is in progress (i.e., the data is still in the kernel buffer and has not been fully sent to the network), the peer may receive the old, pre-modification data. There is no clear mechanism to know when the data has truly "left" the kernel and been sent to the network, and thus when the file can safely be modified again. Linus Torvalds once commented that this is the "key point" of the splice design: it shares references to data pages and behaves similarly to `mmap()`. This is a complex issue concerning data consistency and concurrent access. See lwn.net/Articles/923237 and rust#116451.

  This crate requires passing `&mut R` to prevent modification elsewhere before the `Future` of the splice(2) I/O completes. However, this is only a best-effort guarantee.
- In certain cases, such as transferring small chunks of data, calling splice frequently, or when the underlying driver/hardware does not support efficient zero-copy, the performance improvement may not meet expectations. It could even be lower than an optimized read/write loop due to the additional system call overhead. The choice of pipe buffer size may also affect performance.
- A successful splice(2) call returns the number of bytes transferred, but this ONLY indicates that the data has entered the kernel buffer of the destination file descriptor (such as a socket's send buffer). It does not mean the data has actually left the local network interface or been received by the peer.

  We call `flush`/`poll_flush` after data is spliced from the pipe to the target fd to ensure it has been flushed to the destination. However, a poor implementation of `tokio::io::AsyncWrite` may break this, as it may not flush the data immediately.
- For UDP zero-copy I/O, splice(2) does not help; the Linux kernel generally pays less attention to optimizing UDP performance. Consider using `sendmmsg` and `recvmmsg` instead, which send and receive multiple UDP packets in a single system call. eBPF XDP is also a good choice for high-performance stateless UDP packet forwarding.
License: MIT OR Apache-2.0