@daaang daaang commented May 13, 2025

This shares the same alert name, summary, labels, and `for` value as the
existing `up{job="ipmi"} == 0` alert. Sometimes, when a lights-out device
is not responding, the IPMI exporter still responds, like this:

    # HELP ipmi_up '1' if a scrape of the IPMI device was successful, '0' otherwise.
    # TYPE ipmi_up gauge
    ipmi_up{collector="bmc"} 0
    ipmi_up{collector="chassis"} 0
    ipmi_up{collector="ipmi"} 0

If this happens, the corresponding `up` value will be `1`, since the
scrape returned a response containing metrics, even if that response
amounts to "everything I tried to talk to is down."

So this adds an `ipmi_up{collector="ipmi"} == 0` condition. It's fine for
these to share a name, in part because they will never fire at the same
time:

1.  If `up == 0`, then the scrape returned no metrics, so `ipmi_up` cannot also be `== 0`.
2.  If `up == 1`, then the alert fires only if `ipmi_up == 0`.
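
For illustration, the two conditions might sit side by side in a rule file roughly like this. This is only a sketch: the two `expr` values come from the description above, but the group name, the alert name `IPMIDown`, the `for` duration, the `severity` label, and the summary text are hypothetical placeholders rather than the values actually used in this repository.

```yaml
groups:
  - name: ipmi                      # hypothetical group name
    rules:
      # Existing condition: the scrape of the exporter itself failed,
      # so no ipmi_up series exists for this target.
      - alert: IPMIDown             # hypothetical name, shared by both rules
        expr: up{job="ipmi"} == 0
        for: 5m                     # hypothetical; both rules use the same value
        labels:
          severity: warning         # hypothetical label
        annotations:
          summary: "IPMI device {{ $labels.instance }} is not responding"  # hypothetical

      # Added condition: the exporter answered (so up is 1), but it
      # reports that it could not reach the device.
      - alert: IPMIDown
        expr: ipmi_up{collector="ipmi"} == 0
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "IPMI device {{ $labels.instance }} is not responding"
```

Because the two expressions cannot both return a result for the same target, at most one of the two rules should be firing for a given device at any time, even though they share a name.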