-
Notifications
You must be signed in to change notification settings - Fork 9
Direct writing to/reading from of underlying buffer without Bincode #9
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
|
@andrew-manifold Thanks so much for this, and sorry for the delayed response. I'm planning to review this as soon as I have the chance -- probably this Friday. |
dicej
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks again for this. I think the send_direct* parts look good overall (see some comments inline), but I have concerns about the {try_}recv_direct parts, and I think ZeroCopyContext::{try_}recv::<_, &[u8]> can do the job just as efficiently and more correctly.
If you can demonstrate that bincode is adding significant overhead even for the &[u8] case, then perhaps we could add {try_}recv_direct functions to ZeroCopyContext, but I don't believe they can be added soundly to Receiver.
src/lib.rs
Outdated
| /// Attempt to read a message without blocking. Returns a slice directly from the ring buffer | ||
| /// | ||
| /// This will return `Ok(None)` if there are no messages immediately available. | ||
| pub fn try_recv_direct<'a>(&'a self) -> Result<Option<&'a [u8]>>{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What advantage does this have over using ZeroCopyContext::try_recv<_, &[u8]>? I would expect it to give identical performance. Have you run any benchmarks?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
as you've mentioned below it's more around giving an assumption less means for the developer to get access to the underlying bytes. ShmWriter puts all the ownership on the user to properly implement encoding and it was helpful to have a means to allow for proper decoding too. Ideally there would be another trait for ShmReader but the lifetime pollution issues were getting pretty horrific so I opted to just return the backing buffer
src/lib.rs
Outdated
| }) | ||
| } | ||
|
|
||
| fn try_recv_direct_0<'a>(&'a self) -> Result<Option<(&'a [u8], u32)>> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's significant code duplication across this and try_recv_0 (and likewise across recv_direct_timeout_0 and recv_timeout_0). Assuming these provide a significant performance improvement over ZeroCopyContext, and that they can be used safely and soundly (per my comment above), perhaps we could re-implement try_recv_0 in terms of try_recv_direct_0 (and likewise recv_timeout_0 in terms of recv_direct_timeout_0)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I fully agree, the code duplication isn't great. I'll try to come up with a more unified approach that would generalize across the safe + unsafe implementations.
| /// This will return `Ok(None)` if there are no messages immediately available. | ||
| pub fn try_recv_direct<'a>(&'a self) -> Result<Option<&'a [u8]>>{ | ||
| Ok(if let Some((value, position)) = self.try_recv_direct_0()? { | ||
| self.seek(position)?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is both unsafe and unsound. The call to seek tells the writer(s) that we're done reading the buffer, which is not true, since we're about to return a shared reference to the caller. That violates Rust's aliasing rules because any writer(s) will believe they have unique, mutable access to something the caller has shared, immutable access to.
ZeroCopyContext exists precisely to avoid this problem.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that's a valid point, I'll drop that method for the direct version, it indeed should only every be called within the usage of ZeroCopyContext
| /// `Err(`[`Error::ZeroSizedMessage`](enum.Error.html#variant.ZeroSizedMessage)`))`. If the message size is | ||
| /// greater than the ring buffer capacity, this method will return | ||
| /// `Err(`[`Error::MessageTooLarge`](enum.Error.html#variant.MessageTooLarge)`))`. | ||
| pub fn send_direct(&self, value: &impl ShmWriter) -> Result<()> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think all the send_direct* methods should be marked unsafe given the inherent unsafety of ShmWriter.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree, can easily update those
src/lib.rs
Outdated
| self.send_direct_timeout_0(value, true, None).map(drop) | ||
| } | ||
|
|
||
| fn send_direct_timeout_0( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps we could reduce code duplication here by re-implementing send_timeout_0 in terms of send_direct_timeout_0, using a bincode-based ShmWriter implementation?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, I like that idea lot. It will keep things much cleaner and easier to maintain
src/lib.rs
Outdated
| /// to account for padding, alignment, proper offsetting etc. or you will cause mayhem. | ||
| pub trait ShmWriter { | ||
| fn msg_len(&self) -> u32; | ||
| fn write_to_shm(&self, shm_ptr: *mut u8, msg_len: u32) -> Result<()>; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the advantage of using a raw pointer here rather than a &mut [u8] slice? That would make the API safe, but still allow the implementer to convert the slice to a raw pointer if they want to do something unusual (in which case soundness is their responsibility).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't necessarily think there is an advantage for using a raw pointer, probably more so a result of some bad habits that have been picked up over time haha. Will update to use &mut [u8] as it does give more flexibility
|
Ah, I just realized that using |
The context of my interest in adding these methods is that I've run into some corner cases with All the feedback is very much appreciated, I'll try to get updates into this PR soon that address all the issues you've pointed out. |
|
@dicej I believe all major issues have been fixed, let me know if you think other tweaks would be helpful. I managed to unify both the sender and receiver api's to use one common implementation underneath for each |
dicej
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is looking; thanks for the updates.
Three requests (plus another inline, below):
- Would you please run
cargo fmton the code? - Would you mind adding one or more tests for the
_directcases so we have some coverage there? That will help ensure this feature doesn't regress in the future. - I just noticed that there seems to be a subtle regression such that
cargo test arbitrary_casehangs indefinitely on your branch but works on themasterbranch. It's not consuming any CPU, so it might be a deadlock. Would you mind taking a look at that?
src/lib.rs
Outdated
| /// `Err(`[`Error::ZeroSizedMessage`](enum.Error.html#variant.ZeroSizedMessage)`))`. If the message size is | ||
| /// greater than the ring buffer capacity, this method will return | ||
| /// `Err(`[`Error::MessageTooLarge`](enum.Error.html#variant.MessageTooLarge)`))`. | ||
| pub unsafe fn send_direct(&self, value: &impl ShmWriter) -> Result<()> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know I recommended adding unsafe to these, but now that ShmWriter::write_to_shim takes a &mut [u8] instead of a *mut u8, I think we can remove the unsafe keywords from the send_direct* methods. AFAICT, there's no way for the caller to use them unsoundly unless the caller (or ShmWriter implementer) itself uses unsafe. Sorry for the confusion.
|
I have everything besides the Currently looking into what is setup incorrectly to cause this as it's this section in the tests: |
|
@dicej so after doing more digging I've found a few takeaways:
As fas as how this is happening with my PR and not on master is still an unknown to me. My hypothesis is that the compiler is able to possibly do some optimizations in one of the two versions that allows for faster processing. I'll need to look at the assembly to see where things are inconsistent. running this test on my branch on one of my dev servers and not a Mac book pro (my laptop) has it pass (machine specs below): |
|
Additional unit tests are now included that cover |
|
I know this is an old PR (I'm not sure why it wasn't merged in the end, seems like interest was lost), but I am pretty interested in it. Would it make more sense to just pass slices and allow the developer to choose how to handle them? I assume some length prefix would be required in that case. |
Neither I nor the author were able to debug the test hangs/deadlocks. For my part, I spent a few hours closely reading the code trying to identify whatever data race or subtle unsoundness might account for those failures, but never succeeded. Unfortunately, I haven't had the time or motivation to continue debugging them, and presumably neither has the author. If you do happen to have the time and motivation, feel free to dig in. I agree that this would be an awesome PR to merge if we can make it robust.
I don't think it's quite that simple, given Rust's rules about unique ( If you want direct access to the shared memory, you might be better served by using e.g. memmap2 or shared_memory directly (and accepting that you'll need to write some |
|
Thanks for the reply!
Not particularly hehe. I like what you've done, it's very useful to me right now. I suspect that by using the zero copy context, I can reduce the allocations as much as possible. |
This crate has a fantastically designed API but at times the hard dependency on
bincodeintroduces excessive overhead for very low latency/high throughput applications. I've included an identical set of methods forSenderandReceiver(the_direct_versions) that allow for:ShmWritertrait allows implementing structs to customize their encoding logic and directly write to the mmap backing buffer without additional costs around bincode's layer and additional allocations this may have originally resulted in (i.e. no intermediaryVec<u8>gets setup)&[u8]'s contained within a message should be able to be easily offset to with no real overhead