Guest monitoring #4037

@jandubois

There have been several discussions lately that all share the need to report information from the VM to the host, ideally as events. For example:

I see some general categories:

  • boot/cloud-init progress monitoring (requirements probes)
  • guest monitoring (cpu/memory/disk)
  • port monitoring via events APIs
  • reporting IP address
  • heartbeats and health checks
  • collecting user-defined status (e.g. systemd service states)

Guest and Host Agents

We already have a mechanism for this: the guest agent collecting information and reporting it to the host agent. This mechanism should be extensible.

Allow multiple connections to the host agent

I think the best way to extend the mechanism is to allow multiple connections to the host agent. That way additional monitoring tasks can be completely decoupled from the guest agent binary (and release cycle).

Additional agents should connect directly to the host agent, rather than through a central guest agent, because that approach also works for "plain" instances that don't include the default Lima guest agent. It also means integration tests can generate test events on the host.

The host agent listeners should put any incoming events into a channel, and then a single goroutine can process them. Because only that goroutine touches the shared state, there are no concurrency issues.
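A minimal sketch of that fan-in pattern, to make the idea concrete. The `Event` and `State` types, the event kinds, and the function names are all hypothetical and not part of Lima; the "connections" are simulated with in-memory slices instead of real sockets.

```go
package main

import (
	"fmt"
	"sync"
)

// Event is a hypothetical message reported by a guest-side agent.
type Event struct {
	Source string // which agent sent it (assumed naming)
	Kind   string // e.g. "heartbeat" or "port-opened" (assumed kinds)
	Data   string
}

// State is owned exclusively by the processing goroutine, so it needs no mutex.
type State struct {
	Counts map[string]int // number of events seen per kind
}

// listen simulates one connection handler: it only forwards events to the channel.
func listen(source string, incoming []Event, events chan<- Event, wg *sync.WaitGroup) {
	defer wg.Done()
	for _, ev := range incoming {
		ev.Source = source
		events <- ev
	}
}

// process drains the channel in a single goroutine and returns the final state.
func process(events <-chan Event) State {
	st := State{Counts: make(map[string]int)}
	for ev := range events {
		st.Counts[ev.Kind]++
	}
	return st
}

// run wires one listener per connection to a single processor.
func run(conns map[string][]Event) State {
	events := make(chan Event, 16)
	done := make(chan State)
	go func() { done <- process(events) }()

	var wg sync.WaitGroup
	for source, incoming := range conns {
		wg.Add(1)
		go listen(source, incoming, events, &wg)
	}
	wg.Wait()     // all listeners have finished sending
	close(events) // lets process() leave its range loop
	return <-done
}

func main() {
	st := run(map[string][]Event{
		"guestagent": {{Kind: "heartbeat"}, {Kind: "port-opened", Data: "8080"}},
		"custom":     {{Kind: "heartbeat"}},
	})
	fmt.Println(st.Counts["heartbeat"], st.Counts["port-opened"]) // prints "2 1"
}
```

The listeners never read or write `State`; they only send on the channel, so adding more connections does not add any locking.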

The message format needs to become a stable API

We need to document the event types that the host agent understands. While I think that port monitoring for containerd, dockerd, and Kubernetes should be part of the default guest agent, it should also be possible for a new container runtime to provide its own port monitoring agent that reports directly to the host agent.

That means the logic to consolidate port forwarding information should move from the guest agent to the host agent.
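To illustrate what host-side consolidation could look like, here is a rough sketch. The envelope fields (`version`, `source`, `type`, `payload`), the `port` event type, and the `PortTable` type are assumptions made up for this example; the actual message format is still to be defined. The sketch assumes a port should stay forwarded as long as at least one agent still reports it open.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
)

// Envelope is a hypothetical wire format; none of these field names are defined by Lima.
type Envelope struct {
	Version int             `json:"version"`
	Source  string          `json:"source"`
	Type    string          `json:"type"`
	Payload json.RawMessage `json:"payload"`
}

// PortEvent is a hypothetical payload for events of type "port".
type PortEvent struct {
	Port   int  `json:"port"`
	Opened bool `json:"opened"`
}

// PortTable consolidates port reports from several independent agents.
type PortTable struct {
	bySource map[string]map[int]bool
}

func NewPortTable() *PortTable {
	return &PortTable{bySource: make(map[string]map[int]bool)}
}

// Apply decodes one message and updates the ports reported by its source.
func (t *PortTable) Apply(raw []byte) error {
	var env Envelope
	if err := json.Unmarshal(raw, &env); err != nil {
		return err
	}
	if env.Type != "port" {
		return fmt.Errorf("unsupported event type %q", env.Type)
	}
	var pe PortEvent
	if err := json.Unmarshal(env.Payload, &pe); err != nil {
		return err
	}
	ports := t.bySource[env.Source]
	if ports == nil {
		ports = make(map[int]bool)
		t.bySource[env.Source] = ports
	}
	if pe.Opened {
		ports[pe.Port] = true
	} else {
		delete(ports, pe.Port)
	}
	return nil
}

// Forwarded returns the sorted union of ports reported open by any source.
func (t *PortTable) Forwarded() []int {
	seen := make(map[int]bool)
	for _, ports := range t.bySource {
		for p := range ports {
			seen[p] = true
		}
	}
	out := make([]int, 0, len(seen))
	for p := range seen {
		out = append(out, p)
	}
	sort.Ints(out)
	return out
}

func main() {
	table := NewPortTable()
	msgs := []string{
		`{"version":1,"source":"dockerd","type":"port","payload":{"port":8080,"opened":true}}`,
		`{"version":1,"source":"containerd","type":"port","payload":{"port":8080,"opened":true}}`,
		`{"version":1,"source":"containerd","type":"port","payload":{"port":80,"opened":true}}`,
		`{"version":1,"source":"dockerd","type":"port","payload":{"port":8080,"opened":false}}`,
	}
	for _, m := range msgs {
		if err := table.Apply([]byte(m)); err != nil {
			fmt.Println("error:", err)
		}
	}
	fmt.Println(table.Forwarded()) // prints "[80 8080]"
}
```

Keeping the payload as `json.RawMessage` lets the host agent reject unknown event types without having to understand their contents.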

Monitoring data should be stored in the instance directory

For the initial implementation, I think all monitoring data should be stored in a JSON file inside the instance directory (e.g. status.json) [1]. It should be updated atomically.
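One common way to get an atomic update is to write a temporary file in the same directory and then rename it over the target. The sketch below shows that technique; `writeJSONAtomic` and `demo` are hypothetical helpers, not existing Lima functions, and the `ip` key is only a placeholder.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
	"path/filepath"
)

// writeJSONAtomic writes v as JSON to path via a temp file plus rename, so
// readers see either the old or the new file, never a partially written one.
func writeJSONAtomic(path string, v any) error {
	data, err := json.MarshalIndent(v, "", "  ")
	if err != nil {
		return err
	}
	// The temp file must live in the same directory (same filesystem) as the target.
	tmp, err := os.CreateTemp(filepath.Dir(path), filepath.Base(path)+".tmp-*")
	if err != nil {
		return err
	}
	defer os.Remove(tmp.Name()) // cleans up on failure; fails harmlessly after a successful rename
	if _, err := tmp.Write(data); err != nil {
		tmp.Close()
		return err
	}
	if err := tmp.Sync(); err != nil { // flush to disk before the rename
		tmp.Close()
		return err
	}
	if err := tmp.Close(); err != nil {
		return err
	}
	return os.Rename(tmp.Name(), path)
}

// demo writes status.json twice into a scratch directory and reads it back.
func demo() (map[string]string, error) {
	dir, err := os.MkdirTemp("", "instance-")
	if err != nil {
		return nil, err
	}
	defer os.RemoveAll(dir)
	path := filepath.Join(dir, "status.json")
	for _, ip := range []string{"192.168.5.14", "192.168.5.15"} {
		if err := writeJSONAtomic(path, map[string]string{"ip": ip}); err != nil {
			return nil, err
		}
	}
	data, err := os.ReadFile(path)
	if err != nil {
		return nil, err
	}
	var got map[string]string
	err = json.Unmarshal(data, &got)
	return got, err
}

func main() {
	got, err := demo()
	if err != nil {
		fmt.Println("error:", err)
		os.Exit(1)
	}
	fmt.Println(got["ip"]) // prints "192.168.5.15"
}
```

Note that `os.CreateTemp` creates the file with mode 0600, so a real implementation would need to decide which permissions status.json should end up with.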

This data can be queried with limactl list, which merges it with the static information it currently reports.

Some data, like IP addresses, will be known to Lima, and will be stored under a key at the top level. But users can also report arbitrary JSON objects under a key name, and this data will be available under a separate object (.status is already taken, so maybe .user or .info), e.g.

limactl ls INST --yq '.info.service.containerd'

to query the state of the containerd service (assuming there is some kind of systemd service reporter running inside the instance).
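As a sketch of how such a status document might be structured and merged with the static instance data: the `Status` type, the `ipAddresses` key, and the `Merge` function are hypothetical, and `.info` is just one of the candidate names mentioned above. The sketch assumes that a key collision with static data should be an error rather than a silent overwrite.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Status is a hypothetical layout for the dynamic data in status.json.
type Status struct {
	IPAddresses []string       `json:"ipAddresses,omitempty"` // known to Lima, top level
	Info        map[string]any `json:"info,omitempty"`        // arbitrary user-reported JSON
}

// SetInfo stores an arbitrary JSON document under a user-chosen key.
func (s *Status) SetInfo(key string, raw []byte) error {
	var v any
	if err := json.Unmarshal(raw, &v); err != nil {
		return fmt.Errorf("info %q is not valid JSON: %w", key, err)
	}
	if s.Info == nil {
		s.Info = make(map[string]any)
	}
	s.Info[key] = v
	return nil
}

// Merge combines static instance fields with the dynamic status.
// Assumption: a key collision is reported as an error.
func Merge(static map[string]any, st Status) (map[string]any, error) {
	raw, err := json.Marshal(st)
	if err != nil {
		return nil, err
	}
	var dynamic map[string]any
	if err := json.Unmarshal(raw, &dynamic); err != nil {
		return nil, err
	}
	out := make(map[string]any, len(static)+len(dynamic))
	for k, v := range static {
		out[k] = v
	}
	for k, v := range dynamic {
		if _, taken := out[k]; taken {
			return nil, fmt.Errorf("key %q is already used by static instance data", k)
		}
		out[k] = v
	}
	return out, nil
}

func main() {
	st := Status{IPAddresses: []string{"192.168.5.15"}}
	if err := st.SetInfo("service", []byte(`{"containerd":"active"}`)); err != nil {
		fmt.Println("error:", err)
		return
	}
	merged, err := Merge(map[string]any{"name": "default", "status": "Running"}, st)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	out, _ := json.MarshalIndent(merged, "", "  ")
	fmt.Println(string(out))
}
```

With this layout, the value under `info.service.containerd` in the merged output is what the query above would return.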

Conceptually this would be similar to Kubernetes .status objects.

Implementation

This will be a larger project, so it should be implemented in the smallest steps that can be individually tested. We can break things down further once we have agreement on the high-level concept.

Footnotes

  1. Constantly updating a JSON file may become inefficient if we want to support frequent heartbeats, or cpu/memory/disk reporting for an "activity monitor". But we can implement something better later, if we actually need it.
