bug: metrics with no label constraint are exposed to potential race conditions #1951

@denopink

Description

The latest release and main are affected, as is every release since Jun 27, 2023 (the bug was introduced in #1296).

Both func constrainLabels and func constrainLabelValues return a reference to the user's label map or list, respectively (when there are no label constraints), instead of creating a new copy. This is the root cause of the potential race conditions:

func constrainLabels(desc *Desc, labels Labels) (Labels, func()) {
	if len(desc.variableLabels.labelConstraints) == 0 {
		// Fast path when there's no constraints
		return labels, func() {}
	}
	// ... (constrained path omitted)
}

func constrainLabelValues(desc *Desc, lvs []string, curry []curriedLabelValue) []string {
	if len(desc.variableLabels.labelConstraints) == 0 {
		// Fast path when there's no constraints
		return lvs
	}
	// ... (constrained path omitted)
}

This creates a handful of potential race conditions in functions that use these two functions, such as func (m *MetricVec) GetMetricWith:

func (m *MetricVec) GetMetricWith(labels Labels) (Metric, error) {
	labels, closer := constrainLabels(m.desc, labels)
	defer closer()
	h, err := m.hashLabels(labels)
	if err != nil {
		return nil, err
	}
	return m.getOrCreateMetricWithLabels(h, labels, m.curry), nil
}
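
The aliasing itself can be shown without the library. The following is a minimal sketch, not the library's code: `fastPath` and `copyPath` are hypothetical stand-ins for the no-constraint branch and for a copying alternative, and `Labels` is redeclared locally so the example is self-contained.

```go
package main

import "fmt"

// Labels mirrors the map type used in the snippets above (redeclared here
// so the example compiles on its own).
type Labels map[string]string

// fastPath is a hypothetical stand-in for the no-constraint branch:
// it returns its argument unchanged, so caller and callee share one map.
func fastPath(labels Labels) Labels {
	return labels
}

// copyPath is a hypothetical alternative that returns an independent copy.
func copyPath(labels Labels) Labels {
	out := make(Labels, len(labels))
	for k, v := range labels {
		out[k] = v
	}
	return out
}

func main() {
	myLabels := Labels{"foo": "bar"}
	aliased := fastPath(myLabels)
	copied := copyPath(myLabels)

	// The caller mutates its own map after handing it over.
	myLabels["foo"] = "cake"

	fmt.Println(aliased["foo"]) // prints "cake": same underlying map as myLabels
	fmt.Println(copied["foo"])  // prints "bar": unaffected by the caller's write
}
```

In a real program the write would come from another goroutine, which is what turns this shared reference into a data race.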

At best, when a race condition is encountered, metrics will be unintentionally written with incorrect labels. At worst:

  1. the program could crash if the race condition is hit just right, triggering a fatal error: concurrent map iteration and map write
  2. metric label values could change to a previously written value set, triggering collected before with the same name and label values errors

For example, this is a reproducible race condition in func (m *MetricVec) GetMetricWith:

  1. The initial call is myCounterVec.With(myLabels), with myLabels := map[string]string{"foo": "bar"} and no label constraints in use.
    1. This calls c, err := v.GetMetricWith(labels), which makes the following internal calls:
      1. labels, closer := constrainLabels(m.desc, labels)
        1. labels still points to the same map as myLabels, because no label constraints are in use.
      2. h, err := m.hashLabels(labels)
        1. h is the hash of the label values of map[string]string{"foo": "bar"}, i.e. the hash of "bar".
      3. return m.getOrCreateMetricWithLabels(h, labels, m.curry), nil, which makes the following calls:
        1. Assuming no metric exists yet for the hash value hash, metric, ok := m.getMetricWithHashAndLabels(hash, labels, curry) returns ok == false.
        2. This leads to the creation of a new metric, starting with lvs := extractLabelValues(m.desc, labels, curry).
          1. Since labels still points to the same map as myLabels, a value change from "foo": "bar" to "foo": "cake" in myLabels is reflected in labels.
        3. m.metrics[hash] = append(m.metrics[hash], metricWithLabelValues{values: lvs, metric: metric})
          1. If that label value change occurred in myLabels right before lvs := extractLabelValues(m.desc, labels, curry) was executed, there is a mismatch between hash and the new metric, metric = m.newMetric(lvs...).
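
The interleaving above can be replayed deterministically in a simplified model. This is a sketch under stated assumptions, not the library's implementation: `hashValue`, `entry`, and `getOrCreate` are hypothetical stand-ins, the hash is FNV-1a over a single label value, and the `mutate` callback stands in for a concurrent write landing between hashing and value extraction.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

type Labels map[string]string

// hashValue is a simplified stand-in for hashLabels: it hashes one label value.
func hashValue(v string) uint64 {
	h := fnv.New64a()
	h.Write([]byte(v))
	return h.Sum64()
}

// entry models one stored metric by the label values it was created with.
type entry struct{ values []string }

// getOrCreate replays the steps in order. mutate is a hypothetical hook that
// simulates the caller's write arriving after the hash has been computed.
func getOrCreate(store map[uint64][]entry, labels Labels, mutate func()) uint64 {
	h := hashValue(labels["foo"])  // hash computed from the current value
	mutate()                       // the caller's write lands here
	lvs := []string{labels["foo"]} // values extracted after the write
	store[h] = append(store[h], entry{values: lvs})
	return h
}

func main() {
	myLabels := Labels{"foo": "bar"}
	store := map[uint64][]entry{}

	h := getOrCreate(store, myLabels, func() { myLabels["foo"] = "cake" })

	fmt.Println(h == hashValue("bar")) // prints "true": keyed by the old value
	fmt.Println(store[h][0].values[0]) // prints "cake": stored with the new value
}
```

The entry ends up keyed by the hash of "bar" while carrying the value "cake", which is the mismatch described in the last step.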

The mismatch between hash and the new metric will do one of the following:

  1. If the new metric did not previously exist:
    1. potentially crash the program with fatal error: concurrent map iteration and map write, if the race condition is hit just right
    2. unintentionally write the new metric with the wrong labels
  2. If it did exist:
    1. again, potentially crash the program with fatal error: concurrent map iteration and map write
    2. cause mfs, done, err := reg.Gather() to return a collected before with the same name and label values error from its checkMetricConsistency call; depending on the promhttp error-handling setting, this will:
      1. PanicOnError: panic due to the error
      2. HTTPErrorOnError: return 500s with the error as the body when metrics are scraped
      3. ContinueOnError: ignore the error (besides logging it, which all three cases do), or report the error if no metrics have been gathered
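
To illustrate the "did exist" case, here is a rough model of why a gather-time consistency check would complain. It is only a sketch: `store` and `findDuplicates` are hypothetical, the string keys stand in for hash values, and the real checkMetricConsistency logic is not reproduced here.

```go
package main

import "fmt"

// store maps a hash key to the label values recorded under it (simplified model).
type store map[string][]string

// findDuplicates returns label values that were recorded more than once,
// which is the situation a consistency check at gather time would reject.
func findDuplicates(s store) []string {
	seen := map[string]int{}
	for _, vals := range s {
		for _, v := range vals {
			seen[v]++
		}
	}
	var dups []string
	for v, n := range seen {
		if n > 1 {
			dups = append(dups, v)
		}
	}
	return dups
}

func main() {
	s := store{
		"hash(cake)": {"cake"}, // created earlier by a legitimate call
		"hash(bar)":  {"cake"}, // created by the racy call under the stale hash
	}
	fmt.Println(findDuplicates(s)) // prints "[cake]"
}
```

Two entries with the same label values now live under different hashes, so both are collected and the duplicate is only noticed when the metrics are gathered.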
