runtime: completion fences + async‑friendly client helpers #878
base: main
0ead4ec to ed52052
work_done and sync are essentially the same. We could update the sync implementations in both cuda and hip to the new way, using events, but I would remove work_done altogether.
    impl Drop for Fence {
        fn drop(&mut self) {
            // Release the native event only if it has not been released already.
            if !self.event.is_null() {
                unsafe {
                    let _ = cudarc::driver::result::event::destroy(self.event);
                    self.event = core::ptr::null_mut();
                }
            }
        }
    }
The fence is destroyed right after we wait on it. The methods wait_async and wait_sync take ownership, so you can't wait more than once on an event. I don't mind removing the destroy call in both functions, but we should also change the method signatures to take &self instead, so that we can wait multiple times on the same event.
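To make the suggested semantics concrete, here is a minimal sketch of a fence whose wait method takes &self and whose event is released only in Drop. It deliberately avoids cudarc: MockEvent, the destroyed counter, and demo_multi_wait are hypothetical stand-ins for illustration, not names from this PR.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, Condvar, Mutex};

/// Stand-in for a native event handle (hypothetical; not the cudarc API).
struct MockEvent {
    done: Mutex<bool>,
    cv: Condvar,
}

/// Fence that can be waited on repeatedly; the event is released once, in Drop.
struct Fence {
    event: Arc<MockEvent>,
    destroyed: Arc<AtomicUsize>, // counts "destroy" calls, for illustration only
}

impl Fence {
    fn new(event: Arc<MockEvent>, destroyed: Arc<AtomicUsize>) -> Self {
        Self { event, destroyed }
    }

    /// Blocks until the event is signalled. Takes `&self`, so it can be called again.
    fn wait_sync(&self) {
        let mut done = self.event.done.lock().unwrap();
        while !*done {
            done = self.event.cv.wait(done).unwrap();
        }
    }
}

impl Drop for Fence {
    fn drop(&mut self) {
        // The only place where the (mock) event is released.
        self.destroyed.fetch_add(1, Ordering::SeqCst);
    }
}

/// Signals the event from another thread, waits twice on the same fence, and
/// returns how many times the event was destroyed once the fence is dropped.
fn demo_multi_wait() -> usize {
    let event = Arc::new(MockEvent { done: Mutex::new(false), cv: Condvar::new() });
    let destroyed = Arc::new(AtomicUsize::new(0));
    let fence = Fence::new(event.clone(), destroyed.clone());

    let signaller = std::thread::spawn(move || {
        *event.done.lock().unwrap() = true;
        event.cv.notify_all();
    });

    fence.wait_sync();
    fence.wait_sync(); // the second wait returns immediately
    signaller.join().unwrap();

    // Waiting did not release the event.
    assert_eq!(destroyed.load(Ordering::SeqCst), 0);
    drop(fence);
    destroyed.load(Ordering::SeqCst)
}
```

With this shape, waiting never consumes the fence, so cleanup has a single owner: the Drop implementation.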
- Add ComputeServer::work_done() (default to sync)
- Channels: non-blocking WorkDone (MPSC); forwarders on Mutex/Cell
- WGPU: use Queue::on_submitted_work_done for precise, non-blocking completion
- CUDA/HIP: native event fences; add Drop to avoid event leaks on cancellation
- ComputeClient: fence(), fence_handle() (blocking wait), execute_async(), write_async()
- Test: fence_completes_submitted_work (dummy backend)
Single commit; vendor-only (no softmax or unrelated changes)
… Drop. Remove work_done API and route client helpers through sync(); keep WGPU sync via on_submitted_work_done
2a9e9e9 to 84f2025
…hods take &self and support multi-wait semantics
Per feedback, I simplified the API and adjusted fence semantics:
Is it really necessary to add execute_async and write_async? In the common case you shouldn't execute one kernel and then immediately wait. Callers can call sync() themselves once they've submitted all the work.
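A rough illustration of the pattern described here, submitting a batch and synchronizing once at the end. MockClient and run_batch are hypothetical stand-ins, not the CubeCL client API; the mock only tracks which work items have completed and how often sync() was called.

```rust
/// Hypothetical stand-in for a compute client; not the CubeCL API.
#[derive(Default)]
struct MockClient {
    pending: Vec<u32>,   // submitted but not yet completed work items
    completed: Vec<u32>, // work items observed as finished
    sync_calls: usize,   // how many times the caller waited
}

impl MockClient {
    /// Queues a kernel without waiting for it.
    fn execute(&mut self, kernel_id: u32) {
        self.pending.push(kernel_id);
    }

    /// Waits for everything submitted so far (here: drains the queue in order).
    fn sync(&mut self) {
        self.sync_calls += 1;
        self.completed.append(&mut self.pending);
    }
}

/// Submits four kernels, then waits once for all of them.
fn run_batch() -> (Vec<u32>, usize) {
    let mut client = MockClient::default();
    for id in 0..4 {
        client.execute(id);
    }
    client.sync();
    (client.completed, client.sync_calls)
}
```

A per-call wrapper that submits and then awaits would instead wait four times in this loop, which is the cost the comment is pointing at.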
This PR introduces precise, non‑blocking completion fences across CubeCL runtimes, with small client conveniences that make async integration (e.g., Kotlin coroutines via UniFFI) natural and ergonomic—without disrupting existing backends or APIs.
Why
Behavior
- ComputeServer::work_done() -> DynFut<()> (default: sync()), returning a future that resolves when all work submitted up to the call completes.
- ComputeChannel::work_done()
- ComputeClient helpers
  - fence() -> DynFut<()>: awaitable completion.
  - fence_handle() -> ClientFence: optional blocking handle with wait().
  - execute_async(..) -> DynFut<()> and write_async(..) -> DynFut<Result<(), IoError>>: convenience wrappers that submit and then await completion.
Backends
- WGPU: Queue::on_submitted_work_done (wgpu#6395 landed in wgpu 26). We call flush() first so the fence observes all previously submitted work, then wrap the callback into a Future: fully non-blocking and precise.
- CUDA: CUevent on the stream; work_done() returns a future that waits for the event. Adds Drop on fence to avoid event leaks on cancellation.
- HIP: hipEvent/hipStreamWaitEvent; also adds Drop for safety.
Compatibility & scope
- work_done() defers to sync(). No behavior changes required in existing backends to adopt this API.
Validation
- fence_completes_submitted_work on the dummy backend to verify ordering and completion semantics.
- cargo xtask validate (audit, fmt, clippy, build, doc-tests).
Notes
- queue.on_submitted_work_done being unavailable is now obsolete; we use the native callback on wgpu 26 as intended.
PR has been validated with Burn: no compilation or test errors.
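For readers unfamiliar with the callback-to-future approach mentioned for the WGPU backend, the following is a minimal std-only sketch of the general technique. It does not call wgpu and is not the PR's code: WorkDone, work_done_pair, block_on, and demo_callback_future are hypothetical names, and a spawned thread stands in for the queue invoking the completion callback.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::{Arc, Mutex};
use std::task::{Context, Poll, Wake, Waker};
use std::thread;

/// State shared between the future and the completion callback.
#[derive(Default)]
struct Shared {
    done: bool,
    waker: Option<Waker>,
}

/// Future that resolves once the paired callback has run.
struct WorkDone(Arc<Mutex<Shared>>);

impl Future for WorkDone {
    type Output = ();

    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<()> {
        let mut shared = self.0.lock().unwrap();
        if shared.done {
            Poll::Ready(())
        } else {
            // Remember who to wake when the callback fires.
            shared.waker = Some(cx.waker().clone());
            Poll::Pending
        }
    }
}

/// Returns a future plus the callback that completes it.
fn work_done_pair() -> (WorkDone, impl FnOnce() + Send + 'static) {
    let shared = Arc::new(Mutex::new(Shared::default()));
    let cb_shared = shared.clone();
    let callback = move || {
        // Mark completion under the lock, wake outside of it.
        let waker = {
            let mut s = cb_shared.lock().unwrap();
            s.done = true;
            s.waker.take()
        };
        if let Some(w) = waker {
            w.wake();
        }
    };
    (WorkDone(shared), callback)
}

/// Waker for the tiny executor below: unparks the polling thread.
struct ThreadWaker(thread::Thread);

impl Wake for ThreadWaker {
    fn wake(self: Arc<Self>) {
        self.0.unpark();
    }
}

/// Minimal executor that parks the current thread until the future is ready.
fn block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = Box::pin(fut);
    let waker = Waker::from(Arc::new(ThreadWaker(thread::current())));
    let mut cx = Context::from_waker(&waker);
    loop {
        match fut.as_mut().poll(&mut cx) {
            Poll::Ready(value) => return value,
            Poll::Pending => thread::park(),
        }
    }
}

/// A worker thread stands in for the queue calling the completion callback.
fn demo_callback_future() -> bool {
    let (fut, callback) = work_done_pair();
    let worker = thread::spawn(callback);
    block_on(fut);
    worker.join().is_ok()
}
```

In a real backend the callback would presumably be handed to the queue's completion hook instead of a thread, and the future would be awaited by whatever executor the caller already uses.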