Don't panic in nodemanager when local metadata returns an error #9544

@mdbooth

What would you like to be added:

When nodemanager encounters an error accessing the local metadata service, it panics:

// Instead of just logging the error, panic and node manager can restart
utilruntime.Must(fmt.Errorf("failed to initialize node %s at cloudprovider: %w", node.Name, err))

It does this intentionally, so this is technically not a bug. The purpose is to force the process to exit and retry.

This is just my supposition, but I suspect it does this because, the way the code is structured, it would be complex to add a retry here. This code is called directly by an informer event handler in response to a Node being added. Handlers are not typically intended to do work that can fail and need retrying. Consequently, there are a number of error paths in this code which are currently only logged and never retried.

There is likely a task here to restructure this code to be more robust to errors, but I'm not asking for that here. I would simply like it to log the error and exit rather than panic.

Why is this needed:

This is really a cosmetic request. The current behaviour is correct. However, panics are ugly and extremely uncommon, so we have a generic task in OpenShift CI to look for them. They're treated as release blockers by default every time they occur, so it's a bit of a fuss to convince folks to ignore one.

Separately, as noted above, I think this code could do with some restructuring.

Metadata

Labels: kind/feature
