@mook-as mook-as commented Sep 4, 2025

This changes waiting for requirements (e.g. ssh) to retry every second instead of every ten; this can help with startup speed when they are actually met shortly after the previous check.

For example, in Rancher Desktop on my machine, SSH is usually ready about three seconds after the second check (at ~11 seconds vs ~14 seconds).

Signed-off-by: Mark Yen <[email protected]>

@jandubois jandubois left a comment

One problem with this change is that it dumps tcpproxy errors to the console when we attempt to connect to ssh while the server is not yet ready:

INFO[0000] [hostagent] Waiting for the essential requirement 1 of 2: "ssh"
INFO[0000] [hostagent] [VZ] - vm state change: running
INFO[0005] [hostagent] 2025/09/04 14:02:31 tcpproxy: for incoming conn 127.0.0.1:60391, error dialing "192.168.5.15:22": connect tcp 192.168.5.15:22: no route to host
INFO[0007] [hostagent] 2025/09/04 14:02:33 tcpproxy: for incoming conn 127.0.0.1:60392, error dialing "192.168.5.15:22": connect tcp 192.168.5.15:22: connection was refused
INFO[0008] [hostagent] 2025/09/04 14:02:34 tcpproxy: for incoming conn 127.0.0.1:60393, error dialing "192.168.5.15:22": connect tcp 192.168.5.15:22: connection was refused
INFO[0009] [hostagent] 2025/09/04 14:02:35 tcpproxy: for incoming conn 127.0.0.1:60395, error dialing "192.168.5.15:22": connect tcp 192.168.5.15:22: connection was refused
INFO[0009] [hostagent] The essential requirement 1 of 2 is satisfied

The errors seem to come from https://github.com/containers/gvisor-tap-vsock/blob/c0c07dccdd3a4cced4de6a5fdf6433740a048900/pkg/tcpproxy/tcpproxy.go#L465.

Any idea how we can suppress them? @balajiv113 or @nirs?

@AkihiroSuda AkihiroSuda left a comment

This may consume more CPU and battery?

Probably we should rather avoid polling, and find a way to wait for events?

@jandubois

> This may consume more CPU and battery?

This seems a bit silly; these are about 10 to 15 commands executed over ssh. How much CPU and battery do you expect that to consume compared to a VM starting up services?

> Probably we should rather avoid polling, and find a way to wait for events?

Yes, that should be the ultimate goal for all the probes. But we are not there yet, and this change seems like a simple way to cut down instance start time by something like 5 seconds.

@jandubois

The time needed for `limactl shell default true` seems to be less than 50ms, so you would have to call it every second for about 2 minutes before the additional overhead matches your expected saving.

I would be concerned if this were a constantly running loop. But it only runs until SSH becomes available, which normally takes less than 20s. In that case the number of ssh calls would go from 2 to at most 20, which adds less than a second of overhead and potentially saves up to 9.
