Add timeouts to refresh_health in VC #6231

@michaelsproul

Description

While working on:

I noticed that our implementation of refresh_status could be optimised in several ways:

/// Perform some queries against the node to determine if it is a good candidate, updating
/// `self.status` and returning that result.
pub async fn refresh_status<T: SlotClock>(
    &self,
    slot_clock: Option<&T>,
    spec: &ChainSpec,
    log: &Logger,
) -> Result<(), CandidateError> {
    let previous_status = self.status(RequireSynced::Yes).await;
    let was_offline = matches!(previous_status, Err(CandidateError::Offline));
    let new_status = if let Err(e) = self.is_online(was_offline, log).await {
        Err(e)
    } else if let Err(e) = self.is_compatible(spec, log).await {
        Err(e)
    } else if let Err(e) = self.is_synced(slot_clock, log).await {
        Err(e)
    } else {
        Ok(())
    };
    // In case of concurrent use, the latest value will always be used. It's possible that a
    // long time out might over-ride a recent successful response, leading to a falsely-offline
    // status. I deem this edge-case acceptable in return for the concurrency benefits of not
    // holding a write-lock whilst we check the online status of the node.
    *self.status.write().await = new_status;
    new_status
}

The issues are:

  • None of the HTTP requests we make has a timeout, so this function could run indefinitely if the BN is very unresponsive.
  • We make 3 requests when 2 would probably be sufficient. We could use the response from is_synced to infer online status, rather than making a separate request for the version.
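
A rough sketch of how both points could fit together is below. It is not the Lighthouse implementation: it is synchronous and std-only so that it stands alone, `CandidateError` is a cut-down stand-in for the real enum, and `with_timeout` and the closure-based `refresh_status` are hypothetical names. In the async VC code the same idea would more likely be expressed by wrapping each future in something like `tokio::time::timeout`.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

/// Cut-down stand-in for Lighthouse's `CandidateError` (assumption: only
/// the three variants needed for this sketch).
#[derive(Debug, Clone, Copy, PartialEq)]
enum CandidateError {
    Offline,
    Incompatible,
    NotSynced,
}

/// Hypothetical helper: run `query` on a helper thread and stop waiting
/// after `limit`. A query that times out (or panics) is reported as
/// `Offline`. Note that the helper thread is not cancelled on timeout; it
/// simply finishes in the background and its result is dropped.
fn with_timeout<T, F>(limit: Duration, query: F) -> Result<T, CandidateError>
where
    T: Send + 'static,
    F: FnOnce() -> Result<T, CandidateError> + Send + 'static,
{
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        // The receiver may already have given up, so ignore a send error.
        let _ = tx.send(query());
    });
    match rx.recv_timeout(limit) {
        Ok(result) => result,
        Err(_) => Err(CandidateError::Offline),
    }
}

/// Two bounded queries instead of three unbounded ones. The sync query
/// doubles as the liveness check: if it fails with `Offline` (including a
/// timeout) we stop early; otherwise the node answered, so no separate
/// version request is made. The original precedence of errors is kept:
/// Offline, then Incompatible, then NotSynced.
fn refresh_status<C, S>(
    limit: Duration,
    is_compatible: C,
    is_synced: S,
) -> Result<(), CandidateError>
where
    C: FnOnce() -> Result<(), CandidateError> + Send + 'static,
    S: FnOnce() -> Result<(), CandidateError> + Send + 'static,
{
    let sync_result = with_timeout(limit, is_synced);
    if matches!(sync_result, Err(CandidateError::Offline)) {
        return Err(CandidateError::Offline);
    }
    with_timeout(limit, is_compatible)?;
    sync_result
}
```

With this shape, a sync query that never returns costs at most `limit` before the node is marked offline, and a healthy node is checked with two requests rather than three.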

Version

Lighthouse v5.3.0

Metadata

Labels

  • optimization: Something to make Lighthouse run more efficiently.
  • val-client: Relates to the validator client binary
