
Termination driver helper process is not reliable #21092

@nirs

Drivers running a helper process (qemu, vfkit, krunkit) terminate the helper process using an API. qemu uses the QMP protocol, and vfkit and krunkit use HTTP to submit a Stop request. All of them share the same issue: the stop request can be ignored by the guest, either due to incorrect kernel configuration (#21089) or due to other issues in the guest. In this case the helper process continues running and minikube stop times out.

How graceful termination should work

  1. User runs minikube stop
  2. Driver.Stop() is called
  3. The driver uses the API to perform a graceful shutdown
  4. Minikube monitors the driver status
  5. If the helper process is still running after a driver-configurable timeout, minikube invokes Driver.Kill()
  6. Minikube invokes Driver.Wait() to reap the terminated process

Graceful shutdown with multiple nodes

When stopping multiple nodes, we don't want to perform all the steps for one node before moving to the next, since this is much slower. For example, if we define a 60-second graceful timeout, stopping a cluster with 3 nodes can take up to 3 minutes when the guests ignore the stop request.

  1. User runs minikube stop
  2. For every node, Minikube calls Driver.Stop()
  3. For every node, Minikube monitors driver status
  4. After a driver configurable timeout, minikube invokes Driver.Kill() for all nodes
  5. For every node, minikube calls Driver.Wait()

The same flow should be used for any helper process.
