
Conversation

@huan-xi commented Jan 27, 2024

No description provided.

@willstott101 (Contributor)

Thanks, do you know what situation this happens in?

@roderickvd (Member)

@willstott101 are you OK to merge this? We're planning a librespot release soon, and I noticed the libmdns dependencies could use an update to clean up the tree. Then I saw this and two other PRs that look just about ready to merge, so let me know how I can help!

@willstott101 (Contributor)

I'm trying to understand when this would happen. I suppose if there is a panic in one of the responder tasks? The only unwraps I can readily see in there are from Mutex locking.
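
For context, the main way those unwraps could fire is lock poisoning: if a task panics while holding the Mutex, every later lock().unwrap() also panics. A minimal standalone sketch of that failure mode, using plain std types rather than the libmdns code:

use std::sync::{Arc, Mutex};
use std::thread;

fn main() {
    let shared = Arc::new(Mutex::new(0u32));

    // A task that panics while holding the lock poisons the Mutex.
    let handle = {
        let shared = Arc::clone(&shared);
        thread::spawn(move || {
            let _guard = shared.lock().unwrap();
            panic!("responder task panicked while holding the lock");
        })
    };
    let _ = handle.join(); // the panic stays contained in the joined thread

    // Every later `lock().unwrap()` now panics with a PoisonError,
    // so a single panic can cascade through the remaining tasks.
    let _value = shared.lock().unwrap();
}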

I guess this might be reasonable defensive programming against some unknown panic, but because this PR doesn't build (the CommandSender definition was removed) and there has been no response about when this issue comes up in real usage, I'm going to close it.

@roderickvd (Member)

Yes, it would be great if @huan-xi could provide that feedback.

I could think of one race condition during shutdown:

impl Drop for Shutdown {
    fn drop(&mut self) {
        self.0.send_shutdown();
        // TODO wait for tasks to shutdown
    }
}

Because that doesn't wait, it's possible that tasks begin shutting down and drop their receivers before the Service starts dropping:

impl Drop for Service {
    fn drop(&mut self) {
        let svc = self.services.write().unwrap().unregister(self.id);
        self.commands.send_unsolicited(svc, 0, false);
    }
}
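
A tiny standalone illustration of why that ordering matters, using a plain std::sync::mpsc channel in place of the real command channel:

use std::sync::mpsc;

fn main() {
    let (tx, rx) = mpsc::channel::<&str>();

    // The responder task shuts down first and drops its receiver...
    drop(rx);

    // ...so the goodbye sent from Service::drop has nowhere to go.
    // With an `expect()` this panics inside drop; a defensive version
    // would check the Err and continue instead.
    tx.send("goodbye").expect("responder task already shut down");
}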

I can imagine defensive programming would help complete shutdown as best it can.
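
As a rough idea of what that "best effort" could look like on the sending side, here is a minimal sketch with a plain std::sync::mpsc channel; the real code goes through libmdns's own CommandSender, so the names are illustrative only:

use std::sync::mpsc;

// If the receiving task has already shut down and dropped its receiver,
// `send` returns Err instead of panicking, and the drop path can log
// the problem and carry on rather than tearing down the whole process.
fn send_goodbye(tx: &mpsc::Sender<String>) {
    if tx.send("goodbye".to_string()).is_err() {
        eprintln!("responder task already gone; skipping goodbye packet");
    }
}

fn main() {
    let (tx, rx) = mpsc::channel();
    drop(rx); // the responder task has already shut down
    send_goodbye(&tx); // logs instead of panicking
}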

Or tackle the root cause and implement waiting for tasks to shut down:

struct Shutdown {
    command_sender: CommandSender,
    shutdown_confirmations: Vec<Receiver<()>>,
}

fn default_handle(
    allowed_ips: Vec<IpAddr>,
    mut hostname: String,
) -> io::Result<(Responder, ResponderTask)> {
    // ...

    let mut shutdown_confirmations = Vec::new();
    
    let v4_result = FSM::<Inet>::new(&services, allowed_ips.clone()).map(|(task, cmd)| {
        let (shutdown_tx, shutdown_rx) = mpsc::channel();
        shutdown_confirmations.push(shutdown_rx);
        (task, cmd, shutdown_tx)
    });

    let v6_result = FSM::<Inet6>::new(&services, allowed_ips).map(|(task, cmd)| {
        let (shutdown_tx, shutdown_rx) = mpsc::channel();
        shutdown_confirmations.push(shutdown_rx);
        (task, cmd, shutdown_tx)
    });

    // ...
}

impl Drop for Shutdown {
    fn drop(&mut self) {
        const SHUTDOWN_TIMEOUT: Duration = Duration::from_secs(5);

        self.command_sender.send_shutdown();
               
        for confirmation in self.shutdown_confirmations.drain(..) {
            match confirmation.recv_timeout(SHUTDOWN_TIMEOUT) {
                Ok(()) => {
                    // Task confirmed shutdown
                }
                Err(mpsc::RecvTimeoutError::Timeout) => {
                    warn!("Task did not confirm shutdown within timeout");
                }
                Err(mpsc::RecvTimeoutError::Disconnected) => {
                    // Task already completed (this is fine)
                }
            }
        }
    }
}

Then keeping the expect() may be what you want, because it will tear down if something is obviously broken and preventing the tasks from shutting down within the specified timeout.

@willstott101 (Contributor)

Because Shutdown is in an Arc and referenced by both the Service and Responder structs, the intention is for Shutdown::drop to only be called when the last field of the last struct referencing Shutdown is dropped.
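
A minimal standalone sketch of that drop-order guarantee, with stand-in types rather than the actual libmdns structs:

use std::sync::Arc;

struct Shutdown;

impl Drop for Shutdown {
    fn drop(&mut self) {
        println!("Shutdown::drop runs only after the last reference is gone");
    }
}

fn main() {
    let shutdown = Arc::new(Shutdown);

    // Stand-ins for the Responder and Service structs, each holding a clone.
    let responder_ref = Arc::clone(&shutdown);
    let service_ref = Arc::clone(&shutdown);

    drop(shutdown);
    drop(service_ref);
    // Shutdown::drop has not run yet; it only runs when the last clone is
    // dropped, so the race described above cannot occur while a Service
    // still holds a reference.
    drop(responder_ref);
}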

@roderickvd (Member)

You are right... 🤷‍♂️
