Description
The latest version, main, and any release since Jun 27, 2023 are impacted (the bug was introduced in #1296).
Both `func constrainLabels` and `func constrainLabelValues` return a reference to the user's label map or slice (when there are no label constraints) instead of creating a new copy, which is the root cause of the potential race conditions:
client_golang/prometheus/vec.go, lines 665 to 669 in bf37be4:

```go
func constrainLabels(desc *Desc, labels Labels) (Labels, func()) {
	if len(desc.variableLabels.labelConstraints) == 0 {
		// Fast path when there's no constraints
		return labels, func() {}
	}
```
client_golang/prometheus/vec.go, lines 684 to 688 in bf37be4:

```go
func constrainLabelValues(desc *Desc, lvs []string, curry []curriedLabelValue) []string {
	if len(desc.variableLabels.labelConstraints) == 0 {
		// Fast path when there's no constraints
		return lvs
	}
```
This creates a handful of potential race conditions in functions that use these two, such as `func (m *MetricVec) GetMetricWith`:
client_golang/prometheus/vec.go, lines 238 to 248 in bf37be4:

```go
func (m *MetricVec) GetMetricWith(labels Labels) (Metric, error) {
	labels, closer := constrainLabels(m.desc, labels)
	defer closer()
	h, err := m.hashLabels(labels)
	if err != nil {
		return nil, err
	}
	return m.getOrCreateMetricWithLabels(h, labels, m.curry), nil
}
```
At best, when a race condition is encountered, metrics will be unintentionally written with incorrect labels. At worst:
- the program can crash if the race condition is hit just right, triggering a `fatal error: concurrent map iteration and map write`
- metric label values can change to a previously written value set, triggering `collected before with the same name and label values` errors
For example, this is a reproducible race condition in `func (m *MetricVec) GetMetricWith`:

1. Initial call `myCounterVec.With(myLabels)` with `myLabels := map[string]string{"foo": "bar"}`, assuming there are no label constraints.
2. This calls `c, err := v.GetMetricWith(labels)`, which makes these internal calls:
   - `labels, closer := constrainLabels(m.desc, labels)`: `labels` still points to the same map as `myLabels` because no label constraints are being used.
   - `h, err := m.hashLabels(labels)`: `h` is the hash of the label values of `map[string]string{"foo": "bar"}`, i.e. the hash of `"bar"`.
   - `return m.getOrCreateMetricWithLabels(h, labels, m.curry), nil` then makes the following calls:
     - Assuming no metric yet exists for the hash value `hash`, `metric, ok := m.getMetricWithHashAndLabels(hash, labels, curry)` results in `ok == false`, leading to the creation of a new metric.
     - `lvs := extractLabelValues(m.desc, labels, curry)`: since `labels` still points to the same map as `myLabels`, a concurrent value change from `"foo": "bar"` to `"foo": "cake"` in `myLabels` is reflected in `labels`.
     - `m.metrics[hash] = append(m.metrics[hash], metricWithLabelValues{values: lvs, metric: metric})`: assuming the label value change in `myLabels` occurred right before `lvs := extractLabelValues(m.desc, labels, curry)` executed, there is now a mismatch between `hash` and the new metric `metric = m.newMetric(lvs...)`.
The mismatch between `hash` and the new metric will, depending on the situation:

- assuming the new metric previously didn't exist:
  - potentially crash the program with a `fatal error: concurrent map iteration and map write` if the race condition is hit just right
  - unintentionally write the new metric with the wrong labels
- assuming it did exist:
  - again, potentially trigger a `fatal error: concurrent map iteration and map write` crash
  - cause `mfs, done, err := reg.Gather()` to return a `collected before with the same name and label values` error due to its `checkMetricConsistency` call, which, depending on the promhttp error-handling setting, will:
    - `PanicOnError`: panic due to the error
    - `HTTPErrorOnError`: return 500s with the error as the body when metrics are scraped
    - `ContinueOnError`: ignore the error (besides logging it, which all three cases do), or report the error if no metrics have been gathered