labels: Adjust cost estimates #990

dimitarvdimitrov · 2025-09-26T19:45:47Z

Cost model primitives

Updated the cost model to use a fixed cost for map operations rather than per-element cost. Also added benchmarks to help compare the cost of these with each other.

apple silicon arm64 benchmarks

goos: darwin
goarch: arm64
pkg: github.com/prometheus/prometheus/model/labels
cpu: Apple M1 Pro
                                                    │ StringEquality │  StringHasPrefix   │   SliceContains    │    MapContains     │
                                                    │     sec/op     │   sec/op     vs base   │   sec/op     vs base   │   sec/op     vs base   │
CostEstimation/flavour=Equal_8chars-10                   2.056n ± 1%
CostEstimation/flavour=Equal_32chars-10                  2.728n ± 0%
CostEstimation/flavour=Equal_64chars-10                  2.565n ± 0%
CostEstimation/flavour=NotEqual_8chars-10                2.222n ± 0%
CostEstimation/flavour=NotEqual_32chars-10               2.064n ± 1%
CostEstimation/flavour=NotEqual_64chars-10               2.252n ± 0%
CostEstimation/flavour=ShortPrefix_8chars_Match-10                     2.158n ± 0%
CostEstimation/flavour=LongPrefix_32chars_Match-10                     2.720n ± 1%
CostEstimation/flavour=NearMiss_LastChar_32times-10                    2.400n ± 1%
CostEstimation/size=1-10                                                                    2.239n ± 0%
CostEstimation/size=2-10                                                                    3.175n ± 1%          7.053n ± 2%
CostEstimation/size=8-10                                                                    5.050n ± 0%
CostEstimation/size=16-10                                                                   7.557n ± 1%          9.596n ± 6%
CostEstimation/size=32-10                                                                                        9.556n ± 6%
CostEstimation/size=128-10                                                                                       9.791n ± 7%
CostEstimation/size=256-10                                                                                       9.631n ± 8%
geomean                                                  2.301n        2.415n       ? ¹ ²   4.058n       ? ¹ ²   9.058n       ? ¹ ²
¹ benchmark set differs from baseline; geomeans may not be comparable
² ratios must be >0 to compute geomean

intel amd64 benchmarks

goos: linux
goarch: amd64
pkg: github.com/prometheus/prometheus/model/labels
cpu: Intel(R) Xeon(R) CPU @ 2.20GHz
                                                   │ StringEquality │   StringHasPrefix   │    SliceContains    │     MapContains      │
                                                   │     sec/op     │    sec/op     vs base   │    sec/op     vs base   │    sec/op      vs base   │
CostEstimation/flavour=Equal_8chars-4                  2.295n ± ∞ ¹
CostEstimation/flavour=Equal_32chars-4                 7.092n ± ∞ ¹
CostEstimation/flavour=Equal_64chars-4                 6.559n ± ∞ ¹
CostEstimation/flavour=NotEqual_8chars-4               5.141n ± ∞ ¹
CostEstimation/flavour=NotEqual_32chars-4              4.619n ± ∞ ¹
CostEstimation/flavour=NotEqual_64chars-4              5.522n ± ∞ ¹
CostEstimation/flavour=ShortPrefix_8chars_Match-4                     5.870n ± ∞ ¹
CostEstimation/flavour=LongPrefix_32chars_Match-4                     7.985n ± ∞ ¹
CostEstimation/flavour=NearMiss_LastChar_32times-4                    7.990n ± ∞ ¹
CostEstimation/size=1-4                                                                     3.811n ± ∞ ¹
CostEstimation/size=2-4                                                                     4.449n ± ∞ ¹          12.870n ± ∞ ¹
CostEstimation/size=8-4                                                                     6.929n ± ∞ ¹
CostEstimation/size=16-4                                                                    10.62n ± ∞ ¹           13.62n ± ∞ ¹
CostEstimation/size=32-4                                                                                           13.59n ± ∞ ¹
CostEstimation/size=128-4                                                                                          15.47n ± ∞ ¹
CostEstimation/size=256-4                                                                                          13.58n ± ∞ ¹
geomean                                                4.909n         7.208n        ? ² ³   5.943n        ? ² ³    13.80n        ? ² ³```

Matcher benchmarking

I wanted to implement some way in which we can validate our cost model and track improvements. What I went with are average rank diff + Kendall's Tau.

Below are the results from before and after these adjustments. I think we didn't fix some major discrepancies, but we did move the average.

Before

    cost_test.go:457: Average rank difference: 5.33
    cost_test.go:458: Kendall's Tau: 0.6628 (1.0 = perfect positive correlation, -1.0 = perfect negative correlation)

    cost_test.go:455: rankDiff= 0	costRank= 1	runtimeRank= 1	cost=   1.0	timePerOp=     110ns	: state="Active"
    cost_test.go:455: rankDiff=-1	costRank= 1	runtimeRank= 2	cost=   1.0	timePerOp=     120ns	: cluster="ops-eu-south-0"
    cost_test.go:455: rankDiff=-1	costRank= 1	runtimeRank= 2	cost=   1.0	timePerOp=     120ns	: job="integrations/db-o11y"
    cost_test.go:455: rankDiff=-1	costRank= 1	runtimeRank= 2	cost=   1.0	timePerOp=     120ns	: __name__="cortex_partition_ring_partitions"
    cost_test.go:455: rankDiff=-1	costRank= 1	runtimeRank= 2	cost=   1.0	timePerOp=     120ns	: name="ingester-partitions"
    cost_test.go:455: rankDiff=-1	costRank= 1	runtimeRank= 2	cost=   1.0	timePerOp=     120ns	: __name__="cortex_distributor_samples_in_total"
    cost_test.go:455: rankDiff=-1	costRank= 1	runtimeRank= 2	cost=   1.0	timePerOp=     120ns	: version!="12.1.0-91295"
    cost_test.go:455: rankDiff=-1	costRank= 1	runtimeRank= 2	cost=   1.0	timePerOp=     120ns	: topic!=""
    cost_test.go:455: rankDiff=-1	costRank= 1	runtimeRank= 2	cost=   1.0	timePerOp=     120ns	: __name__="adaptive_metrics_canary_agg"
    cost_test.go:455: rankDiff=-1	costRank= 1	runtimeRank= 2	cost=   1.0	timePerOp=     120ns	: __name__="loki_distributor_bytes_received_total"
    cost_test.go:455: rankDiff=-1	costRank= 1	runtimeRank= 2	cost=   1.0	timePerOp=     120ns	: container!="istio-proxy"
    cost_test.go:455: rankDiff=-1	costRank= 1	runtimeRank= 2	cost=   1.0	timePerOp=     120ns	: __name__="kube_statefulset_replicas"
    cost_test.go:455: rankDiff=-1	costRank= 1	runtimeRank= 2	cost=   1.0	timePerOp=     120ns	: container="distributor"
    cost_test.go:455: rankDiff=-1	costRank= 1	runtimeRank= 2	cost=   1.0	timePerOp=     120ns	: __name__="up"
    cost_test.go:455: rankDiff=-1	costRank= 1	runtimeRank= 2	cost=   1.0	timePerOp=     120ns	: namespace="hosted-grafana"
    cost_test.go:455: rankDiff=-1	costRank= 1	runtimeRank= 2	cost=   1.0	timePerOp=     120ns	: namespace="grafana-com"
    cost_test.go:455: rankDiff=-1	costRank= 1	runtimeRank= 2	cost=   1.0	timePerOp=     120ns	: __name__="mimir_target_series_per_ingester"
    cost_test.go:455: rankDiff=-1	costRank= 1	runtimeRank= 2	cost=   1.0	timePerOp=     120ns	: status!="ok"
    cost_test.go:455: rankDiff=-1	costRank=12	runtimeRank=13	cost=  33.0	timePerOp=     290ns	: topic!~"(.+)-KSTREAM-AGGREGATE-STATE-STORE-(.+)"
    cost_test.go:455: rankDiff=-2	costRank= 1	runtimeRank= 3	cost=   1.0	timePerOp=     130ns	: target!="remote"
    cost_test.go:455: rankDiff=-2	costRank= 1	runtimeRank= 3	cost=   1.0	timePerOp=     130ns	: owner_kind!="ReplicaSet"
    cost_test.go:455: rankDiff=-2	costRank= 1	runtimeRank= 3	cost=   1.0	timePerOp=     130ns	: __name__="cortex_lifecycler_read_only"
    cost_test.go:455: rankDiff=-2	costRank= 1	runtimeRank= 3	cost=   1.0	timePerOp=     130ns	: namespace!="AWS/ECS"
    cost_test.go:455: rankDiff=-2	costRank= 1	runtimeRank= 3	cost=   1.0	timePerOp=     130ns	: __name__="namespace_user:cortex_ingester_owned_series:sum_filtered_max_over_time...
    cost_test.go:455: rankDiff=-2	costRank= 1	runtimeRank= 3	cost=   1.0	timePerOp=     130ns	: job!="integrations/db-o11y"
    cost_test.go:455: rankDiff=-3	costRank= 2	runtimeRank= 5	cost=   1.1	timePerOp=     200ns	: cluster=~"ops-eu-south-0"
    cost_test.go:455: rankDiff=-3	costRank= 1	runtimeRank= 4	cost=   1.0	timePerOp=     190ns	: cluster=~".+"
    cost_test.go:455: rankDiff=-3	costRank= 2	runtimeRank= 5	cost=   1.1	timePerOp=     200ns	: cluster=~"prod-eu-west-2"
    cost_test.go:455: rankDiff=-3	costRank= 2	runtimeRank= 5	cost=   1.1	timePerOp=     200ns	: namespace=~"loki-prod-035"
    cost_test.go:455: rankDiff=-3	costRank= 2	runtimeRank= 5	cost=   1.1	timePerOp=     200ns	: cluster=~"prod-gb-south-1"
    cost_test.go:455: rankDiff=-3	costRank= 2	runtimeRank= 5	cost=   1.1	timePerOp=     200ns	: namespace=~"loki-prod-031"
    cost_test.go:455: rankDiff=-3	costRank= 2	runtimeRank= 5	cost=   1.1	timePerOp=     200ns	: cluster=~"prod-us-east-0"
    cost_test.go:455: rankDiff=-4	costRank= 2	runtimeRank= 6	cost=   1.1	timePerOp=     210ns	: namespace=~"mimir-ops-03"
    cost_test.go:455: rankDiff=-4	costRank= 2	runtimeRank= 6	cost=   1.1	timePerOp=     210ns	: cluster=~"prod-us-central-0"
    cost_test.go:455: rankDiff=-5	costRank= 7	runtimeRank=12	cost=  18.0	timePerOp=     280ns	: image_spec!~"(.*):d849bcd"
    cost_test.go:455: rankDiff=-5	costRank= 2	runtimeRank= 7	cost=   1.1	timePerOp=     220ns	: namespace=~"asserts"
    cost_test.go:455: rankDiff=-5	costRank=14	runtimeRank=19	cost= 280.0	timePerOp=     550ns	: route=~"(/base.Ruler/Rules|/indexgatewaypb.IndexGateway/GetChunkRef|/indexgatewa...
    cost_test.go:455: rankDiff=-5	costRank= 7	runtimeRank=12	cost=  18.0	timePerOp=     280ns	: image_spec!~"(.*):cfc5ca8"
    cost_test.go:455: rankDiff=-5	costRank= 7	runtimeRank=12	cost=  18.0	timePerOp=     280ns	: image_spec!~"(.*):cb8eaaa"
    cost_test.go:455: rankDiff=-5	costRank= 7	runtimeRank=12	cost=  18.0	timePerOp=     280ns	: image_spec!~"(.*):1364de3"
    cost_test.go:455: rankDiff=-6	costRank= 2	runtimeRank= 8	cost=   1.1	timePerOp=     230ns	: namespace!~"(cortex-ops-01)"
    cost_test.go:455: rankDiff=-7	costRank=11	runtimeRank=18	cost=  31.0	timePerOp=     390ns	: job!~".*envoy-stats.*"
    cost_test.go:455: rankDiff=-7	costRank= 2	runtimeRank= 9	cost=   1.1	timePerOp=     250ns	: tenant=~"(29)"
    cost_test.go:455: rankDiff=-7	costRank=11	runtimeRank=18	cost=  31.0	timePerOp=     390ns	: exported_job!~".*envoy-stats.*"
    cost_test.go:455: rankDiff=-8	costRank= 1	runtimeRank= 9	cost=   1.0	timePerOp=     250ns	: statefulset!~"ingester-zone-.-partition"
    cost_test.go:455: rankDiff=-8	costRank= 9	runtimeRank=17	cost=  28.0	timePerOp=     380ns	: route=~"(prometheus|api_prom)_api_v1_.+"
    cost_test.go:455: rankDiff=-9	costRank= 5	runtimeRank=14	cost=   4.4	timePerOp=     320ns	: job!~"integrations/(windows|node_exporter|unix|docker)"
    cost_test.go:455: rankDiff=-9	costRank=10	runtimeRank=19	cost=  30.0	timePerOp=     550ns	: statefulset=~"(ingester|mimir-write).*"
    cost_test.go:455: rankDiff=-9	costRank=13	runtimeRank=22	cost=  47.0	timePerOp=    1.11µs	: route=~".*v1.*|.*prom.*"
    cost_test.go:455: rankDiff=-9	costRank= 3	runtimeRank=12	cost=   2.2	timePerOp=     280ns	: k8s_dst_owner_type!~"Pod|Node"
    cost_test.go:455: rankDiff=-9	costRank= 1	runtimeRank=10	cost=   1.0	timePerOp=     260ns	: namespace!~"kube-.*"
    cost_test.go:455: rankDiff=-9	costRank= 3	runtimeRank=12	cost=   2.2	timePerOp=     280ns	: k8s_src_owner_type!~"Pod|Node"
    cost_test.go:455: rankDiff=-9	costRank= 6	runtimeRank=15	cost=   5.5	timePerOp=     330ns	: job!~"integrations/(windows|node_exporter|unix|docker|db-o11y)"
    cost_test.go:455: rankDiff=-10	costRank= 3	runtimeRank=13	cost=   2.2	timePerOp=     290ns	: job!~"(ecs-dockerstats-exporter)|(vmagent)"
    cost_test.go:455: rankDiff=-10	costRank= 3	runtimeRank=13	cost=   2.2	timePerOp=     290ns	: workload_type!~"job|cronjob"
    cost_test.go:455: rankDiff=-10	costRank= 3	runtimeRank=13	cost=   2.2	timePerOp=     290ns	: created_by_kind!~"Job|TaskRun"
    cost_test.go:455: rankDiff=10	costRank=15	runtimeRank= 5	cost=9065.0	timePerOp=     200ns	: pod=~"(bigquery-datasource-grafana-app-fast-54994db8f6-lvrv7|bigquery-datasource...
    cost_test.go:455: rankDiff=-10	costRank= 5	runtimeRank=15	cost=   4.4	timePerOp=     330ns	: reason=~"(rate_limited|per_stream_rate_limit|blocked_ingestion|missing_enforced_...
    cost_test.go:455: rankDiff=-10	costRank= 1	runtimeRank=11	cost=   1.0	timePerOp=     270ns	: slug!~"ephemeral.*"
    cost_test.go:455: rankDiff=-10	costRank= 1	runtimeRank=11	cost=   1.0	timePerOp=     270ns	: job=~"(cortex-prod-13)/((gateway|cortex-gw.*))"
    cost_test.go:455: rankDiff=-10	costRank= 1	runtimeRank=11	cost=   1.0	timePerOp=     270ns	: job=~"(mimir-ops-03)/((compactor.*|cortex|mimir))"
    cost_test.go:455: rankDiff=-11	costRank= 5	runtimeRank=16	cost=   4.4	timePerOp=     340ns	: route=~"api_(v1|prom)_push|otlp_v1_metrics|api_v1_push_influx_write"
    cost_test.go:455: rankDiff=11	costRank=16	runtimeRank= 5	cost=27569.0	timePerOp=     200ns	: pod=~"(synthetic-monitoring-agent-5587988f98-4nn82:agent:http-metrics|synthetic-...
    cost_test.go:455: rankDiff=-12	costRank= 1	runtimeRank=13	cost=   1.0	timePerOp=     290ns	: __name__=~"aws_.+_info"
    cost_test.go:455: rankDiff=-15	costRank= 8	runtimeRank=23	cost=  21.0	timePerOp=    2.04µs	: db_name!~"template.*|^$"
    cost_test.go:455: rankDiff=-16	costRank= 4	runtimeRank=20	cost=   4.0	timePerOp=     650ns	: partition=~"0|1|2|3|4|5|6|7|8|9|10|11|12|13|14|15|16|17|18|19|20|21|22|23|24|25|...
    cost_test.go:455: rankDiff=-17	costRank= 4	runtimeRank=21	cost=   4.0	timePerOp=     870ns	: pod=~"querier-burst-744b9bc4b8-226b8|querier-burst-744b9bc4b8-226r8|querier-burs...

PASS

After

    cost_test.go:457: Average rank difference: 4.72
    cost_test.go:458: Kendall's Tau: 0.6724 (1.0 = perfect positive correlation, -1.0 = perfect negative correlation)

    cost_test.go:455: rankDiff= 0	costRank= 1	runtimeRank= 1	cost=   1.0	timePerOp=     110ns	: state="Active"
    cost_test.go:455: rankDiff= 0	costRank= 1	runtimeRank= 1	cost=   1.0	timePerOp=     110ns	: version!="12.1.0-91295"
    cost_test.go:455: rankDiff= 0	costRank=12	runtimeRank=12	cost=  33.0	timePerOp=     290ns	: topic!~"(.+)-KSTREAM-AGGREGATE-STATE-STORE-(.+)"
    cost_test.go:455: rankDiff=-1	costRank= 1	runtimeRank= 2	cost=   1.0	timePerOp=     120ns	: container="distributor"
    cost_test.go:455: rankDiff=-1	costRank= 1	runtimeRank= 2	cost=   1.0	timePerOp=     120ns	: topic!=""
    cost_test.go:455: rankDiff=-1	costRank= 1	runtimeRank= 2	cost=   1.0	timePerOp=     120ns	: __name__="cortex_distributor_samples_in_total"
    cost_test.go:455: rankDiff=-1	costRank= 1	runtimeRank= 2	cost=   1.0	timePerOp=     120ns	: __name__="namespace_user:cortex_ingester_owned_series:sum_filtered_max_over_time...
    cost_test.go:455: rankDiff=-1	costRank= 1	runtimeRank= 2	cost=   1.0	timePerOp=     120ns	: __name__="kube_statefulset_replicas"
    cost_test.go:455: rankDiff=-1	costRank= 1	runtimeRank= 2	cost=   1.0	timePerOp=     120ns	: __name__="up"
    cost_test.go:455: rankDiff=-1	costRank= 1	runtimeRank= 2	cost=   1.0	timePerOp=     120ns	: __name__="loki_distributor_bytes_received_total"
    cost_test.go:455: rankDiff=-1	costRank= 1	runtimeRank= 2	cost=   1.0	timePerOp=     120ns	: __name__="cortex_lifecycler_read_only"
    cost_test.go:455: rankDiff=-1	costRank= 1	runtimeRank= 2	cost=   1.0	timePerOp=     120ns	: cluster="ops-eu-south-0"
    cost_test.go:455: rankDiff=-1	costRank= 1	runtimeRank= 2	cost=   1.0	timePerOp=     120ns	: __name__="adaptive_metrics_canary_agg"
    cost_test.go:455: rankDiff=-1	costRank= 1	runtimeRank= 2	cost=   1.0	timePerOp=     120ns	: __name__="cortex_partition_ring_partitions"
    cost_test.go:455: rankDiff=-1	costRank= 1	runtimeRank= 2	cost=   1.0	timePerOp=     120ns	: name="ingester-partitions"
    cost_test.go:455: rankDiff=-1	costRank= 1	runtimeRank= 2	cost=   1.0	timePerOp=     120ns	: namespace="grafana-com"
    cost_test.go:455: rankDiff=-1	costRank= 1	runtimeRank= 2	cost=   1.0	timePerOp=     120ns	: __name__="mimir_target_series_per_ingester"
    cost_test.go:455: rankDiff=-1	costRank= 1	runtimeRank= 2	cost=   1.0	timePerOp=     120ns	: container!="istio-proxy"
    cost_test.go:455: rankDiff=-1	costRank= 1	runtimeRank= 2	cost=   1.0	timePerOp=     120ns	: namespace="hosted-grafana"
    cost_test.go:455: rankDiff=-1	costRank= 1	runtimeRank= 2	cost=   1.0	timePerOp=     120ns	: status!="ok"
    cost_test.go:455: rankDiff=-1	costRank= 1	runtimeRank= 2	cost=   1.0	timePerOp=     120ns	: owner_kind!="ReplicaSet"
    cost_test.go:455: rankDiff=-1	costRank= 1	runtimeRank= 2	cost=   1.0	timePerOp=     120ns	: job="integrations/db-o11y"
    cost_test.go:455: rankDiff=-2	costRank= 1	runtimeRank= 3	cost=   1.0	timePerOp=     130ns	: namespace!="AWS/ECS"
    cost_test.go:455: rankDiff=-2	costRank= 1	runtimeRank= 3	cost=   1.0	timePerOp=     130ns	: job!="integrations/db-o11y"
    cost_test.go:455: rankDiff=-2	costRank= 1	runtimeRank= 3	cost=   1.0	timePerOp=     130ns	: target!="remote"
    cost_test.go:455: rankDiff=-2	costRank= 2	runtimeRank= 4	cost=   1.1	timePerOp=     190ns	: cluster=~"prod-eu-west-2"
    cost_test.go:455: rankDiff=-3	costRank= 2	runtimeRank= 5	cost=   1.1	timePerOp=     200ns	: cluster=~"ops-eu-south-0"
    cost_test.go:455: rankDiff=-3	costRank= 2	runtimeRank= 5	cost=   1.1	timePerOp=     200ns	: cluster=~"prod-gb-south-1"
    cost_test.go:455: rankDiff=-3	costRank= 1	runtimeRank= 4	cost=   1.0	timePerOp=     190ns	: cluster=~".+"
    cost_test.go:455: rankDiff=-3	costRank= 7	runtimeRank=10	cost=  18.0	timePerOp=     270ns	: image_spec!~"(.*):1364de3"
    cost_test.go:455: rankDiff=-3	costRank= 7	runtimeRank=10	cost=  18.0	timePerOp=     270ns	: image_spec!~"(.*):d849bcd"
    cost_test.go:455: rankDiff=-3	costRank= 2	runtimeRank= 5	cost=   1.1	timePerOp=     200ns	: namespace=~"loki-prod-035"
    cost_test.go:455: rankDiff=-3	costRank= 2	runtimeRank= 5	cost=   1.1	timePerOp=     200ns	: cluster=~"prod-us-east-0"
    cost_test.go:455: rankDiff=-3	costRank= 2	runtimeRank= 5	cost=   1.1	timePerOp=     200ns	: namespace=~"loki-prod-031"
    cost_test.go:455: rankDiff=-3	costRank= 2	runtimeRank= 5	cost=   1.1	timePerOp=     200ns	: cluster=~"prod-us-central-0"
    cost_test.go:455: rankDiff=-4	costRank= 7	runtimeRank=11	cost=  18.0	timePerOp=     280ns	: image_spec!~"(.*):cb8eaaa"
    cost_test.go:455: rankDiff=-4	costRank= 7	runtimeRank=11	cost=  18.0	timePerOp=     280ns	: image_spec!~"(.*):cfc5ca8"
    cost_test.go:455: rankDiff=-4	costRank= 2	runtimeRank= 6	cost=   1.1	timePerOp=     210ns	: namespace=~"mimir-ops-03"
    cost_test.go:455: rankDiff=-4	costRank= 2	runtimeRank= 6	cost=   1.1	timePerOp=     210ns	: namespace=~"asserts"
    cost_test.go:455: rankDiff=-5	costRank= 2	runtimeRank= 7	cost=   1.1	timePerOp=     230ns	: namespace!~"(cortex-ops-01)"
    cost_test.go:455: rankDiff=-5	costRank=14	runtimeRank=19	cost= 280.0	timePerOp=     550ns	: route=~"(/base.Ruler/Rules|/indexgatewaypb.IndexGateway/GetChunkRef|/indexgatewa...
    cost_test.go:455: rankDiff=-6	costRank=11	runtimeRank=17	cost=  31.0	timePerOp=     390ns	: job!~".*envoy-stats.*"
    cost_test.go:455: rankDiff=-6	costRank= 2	runtimeRank= 8	cost=   1.1	timePerOp=     250ns	: tenant=~"(29)"
    cost_test.go:455: rankDiff=-6	costRank=11	runtimeRank=17	cost=  31.0	timePerOp=     390ns	: exported_job!~".*envoy-stats.*"
    cost_test.go:455: rankDiff=-7	costRank= 1	runtimeRank= 8	cost=   1.0	timePerOp=     250ns	: statefulset!~"ingester-zone-.-partition"
    cost_test.go:455: rankDiff=-7	costRank= 1	runtimeRank= 8	cost=   1.0	timePerOp=     250ns	: namespace!~"kube-.*"
    cost_test.go:455: rankDiff=-7	costRank= 3	runtimeRank=10	cost=   2.2	timePerOp=     270ns	: k8s_dst_owner_type!~"Pod|Node"
    cost_test.go:455: rankDiff=-7	costRank= 9	runtimeRank=16	cost=  28.0	timePerOp=     380ns	: route=~"(prometheus|api_prom)_api_v1_.+"
    cost_test.go:455: rankDiff=-8	costRank= 1	runtimeRank= 9	cost=   1.0	timePerOp=     260ns	: job=~"(mimir-ops-03)/((compactor.*|cortex|mimir))"
    cost_test.go:455: rankDiff=-8	costRank= 3	runtimeRank=11	cost=   2.2	timePerOp=     280ns	: k8s_src_owner_type!~"Pod|Node"
    cost_test.go:455: rankDiff=-8	costRank= 3	runtimeRank=11	cost=   2.2	timePerOp=     280ns	: job!~"(ecs-dockerstats-exporter)|(vmagent)"
    cost_test.go:455: rankDiff=-8	costRank=10	runtimeRank=18	cost=  30.0	timePerOp=     520ns	: statefulset=~"(ingester|mimir-write).*"
    cost_test.go:455: rankDiff=-8	costRank= 6	runtimeRank=14	cost=   5.5	timePerOp=     330ns	: job!~"integrations/(windows|node_exporter|unix|docker|db-o11y)"
    cost_test.go:455: rankDiff=-8	costRank= 1	runtimeRank= 9	cost=   1.0	timePerOp=     260ns	: slug!~"ephemeral.*"
    cost_test.go:455: rankDiff=-8	costRank= 5	runtimeRank=13	cost=   4.4	timePerOp=     310ns	: job!~"integrations/(windows|node_exporter|unix|docker)"
    cost_test.go:455: rankDiff=-9	costRank= 3	runtimeRank=12	cost=   2.2	timePerOp=     290ns	: created_by_kind!~"Job|TaskRun"
    cost_test.go:455: rankDiff=-9	costRank= 5	runtimeRank=14	cost=   4.4	timePerOp=     330ns	: reason=~"(rate_limited|per_stream_rate_limit|blocked_ingestion|missing_enforced_...
    cost_test.go:455: rankDiff=-9	costRank= 3	runtimeRank=12	cost=   2.2	timePerOp=     290ns	: workload_type!~"job|cronjob"
    cost_test.go:455: rankDiff=-9	costRank=13	runtimeRank=22	cost=  47.0	timePerOp=    1.15µs	: route=~".*v1.*|.*prom.*"
    cost_test.go:455: rankDiff=-10	costRank= 1	runtimeRank=11	cost=   1.0	timePerOp=     280ns	: job=~"(cortex-prod-13)/((gateway|cortex-gw.*))"
    cost_test.go:455: rankDiff=10	costRank=15	runtimeRank= 5	cost=9065.0	timePerOp=     200ns	: pod=~"(bigquery-datasource-grafana-app-fast-54994db8f6-lvrv7|bigquery-datasource...
    cost_test.go:455: rankDiff=10	costRank=16	runtimeRank= 6	cost=27569.0	timePerOp=     210ns	: pod=~"(synthetic-monitoring-agent-5587988f98-4nn82:agent:http-metrics|synthetic-...
    cost_test.go:455: rankDiff=-10	costRank= 5	runtimeRank=15	cost=   4.4	timePerOp=     340ns	: route=~"api_(v1|prom)_push|otlp_v1_metrics|api_v1_push_influx_write"
    cost_test.go:455: rankDiff=-10	costRank= 1	runtimeRank=11	cost=   1.0	timePerOp=     280ns	: __name__=~"aws_.+_info"
    cost_test.go:455: rankDiff=-15	costRank= 8	runtimeRank=23	cost=  21.0	timePerOp=    2.03µs	: db_name!~"template.*|^$"
    cost_test.go:455: rankDiff=-16	costRank= 4	runtimeRank=20	cost=   4.0	timePerOp=     670ns	: partition=~"0|1|2|3|4|5|6|7|8|9|10|11|12|13|14|15|16|17|18|19|20|21|22|23|24|25|...
    cost_test.go:455: rankDiff=-17	costRank= 4	runtimeRank=21	cost=   4.0	timePerOp=     820ns	: pod=~"querier-burst-744b9bc4b8-226b8|querier-burst-744b9bc4b8-226r8|querier-burs...
PASS

Advice to reviewers

It's probably easier to review without whitespace diff

related to grafana/mimir#11920

Map access is constant time, not dependent on map size. Updated the cost model to use a fixed cost for map operations rather than per-element cost.  Changes: - Fixed map cost model to be constant time (4.0) instead of per-element - Added comprehensive benchmarks for string equality, hasPrefix, slice contains, and map contains operations - Added postings iteration benchmarks across different sizes - Removed outdated TODO comments that are now addressed

Copilot

Pull Request Overview

This PR adjusts the cost estimation model for matcher operations in Prometheus labels, moving from per-element costs to fixed costs for map operations. It also introduces comprehensive benchmarks to compare the performance characteristics of different string matching operations.

Updated cost constants to better reflect actual performance characteristics
Changed map operations from per-element to fixed cost model
Added extensive benchmarks for string equality, prefix matching, slice/map contains operations

Reviewed Changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

File	Description
model/labels/cost.go	Updated cost estimation constants and changed map matcher cost calculation from per-element to fixed cost
model/labels/cost_test.go	Added comprehensive benchmarks for string operations, slice/map contains, and postings iteration
model/labels/postings_bench_test.go	Added benchmarks for iterating over postings lists of various sizes

_{Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.}

model/labels/cost.go

Extracted the test cases into a shared matcherTestCases variable that can be reused by other functions.

Creates sub-benchmarks for each matcher from matcherTestCases, measures actual runtime vs theoretical cost, and outputs ranking comparison. Runtime is rounded to multiples of 3 nanoseconds and uses competition-style ranking where equal values get the same rank.

### Problem We've seen that ingesters can end up doing more work when this optimization is enabled. This is due to fetching significantly more series and checking them for sharding. This is not something our cost model accounted for. ### What this PR does this PR introduces the cost of retrieving a single series from the index. The cost of doing this depends a lot on whether the block is an in-memory or an on-disk block. In-memory blocks have much more efficient sharding code. For now we're sticking with a single cost for all blocks, but we can change that later. ### why 10? I added a benchmark to help me come up with the new number. I'm comparing the baseline of 2ns on my machine with the 2ns that it takes to do a string comparison (see grafana/mimir-prometheus#990). This means our cost should be around 10-40. 10 is 3 orders of magnitude higher than what we had before (0.01) already, so I was cautious not to move this too much <details><summary>Cost of sharded vs non-sharded postings iteration</summary> <p> false: not sharded true: sharded ``` benchstat -col=/sharded -row=/size -filter='(/sharded:false OR /reuseCache:true)' │ false │ true │ │ sec/op │ sec/op vs base │ 128 326.1n ± ∞ ¹ 2937.0n ± ∞ ¹ ~ (p=0.100 n=3) ² 128K 301.5µ ± ∞ ¹ 4943.5µ ± ∞ ¹ ~ (p=0.100 n=3) ² 1M 2.435m ± ∞ ¹ 83.056m ± ∞ ¹ ~ (p=0.100 n=3) ² geomean 62.10µ 1.064m +1614.11% ¹ need >= 6 samples for confidence interval at level 0.95 ² need >= 4 samples to detect a difference at alpha level 0.05 │ false │ true │ │ sec/posting │ sec/posting vs base │ 128 2.548n ± ∞ ¹ 22.940n ± ∞ ¹ ~ (p=0.100 n=3) ² 128K 2.300n ± ∞ ¹ 37.720n ± ∞ ¹ ~ (p=0.100 n=3) ² 1M 2.322n ± ∞ ¹ 79.210n ± ∞ ¹ ~ (p=0.100 n=3) ² geomean 2.387n 40.92n +1614.16% ¹ need >= 6 samples for confidence interval at level 0.95 ² need >= 4 samples to detect a difference at alpha level 0.05 ``` </p> </details> ### Potential next steps 1. detect whether the block is in-memory or on-disk and adjust our plans. 2. adjust the cost based on whether we have query sharding enabled or not; hashing series is expensive 1\. will probably give higher returns  #### What this PR does #### Which issue(s) this PR fixes or relates to related to #11920 --------- Signed-off-by: Dimitar Dimitrov <[email protected]>

model/labels/cost_test.go

dimitarvdimitrov requested a review from Copilot September 26, 2025 19:45

Copilot AI reviewed Sep 26, 2025

View reviewed changes

model/labels/cost.go Show resolved Hide resolved

model/labels/cost.go Show resolved Hide resolved

dimitarvdimitrov added 6 commits September 26, 2025 21:48

Adjust benchmark names

8461431

WIP: Fix some tests

cf9cbd6

Adjust cost tests

04cd0ae

Refactor SingleMatchCost test to use shared test data

882a92e

Extracted the test cases into a shared matcherTestCases variable that can be reused by other functions.

Delete claude's benchmark

0d038b6

dimitarvdimitrov mentioned this pull request Sep 26, 2025

index planning: Refine cost models after initial phase of testing grafana/mimir#11920

Open

7 tasks

Add regexp matcher cost benchmarks

5758bc5

dimitarvdimitrov marked this pull request as ready for review September 26, 2025 21:55

dimitarvdimitrov added 2 commits September 27, 2025 00:01

bikeshedding init

8aadecf

Move function for easier diff

b887cc8

dimitarvdimitrov mentioned this pull request Sep 26, 2025

lookup planning: adjust plan's intersection cost grafana/mimir#12830

Merged

chencs reviewed Oct 8, 2025

View reviewed changes

model/labels/cost_test.go Outdated Show resolved Hide resolved

Refactor comments

e637b6d

chencs approved these changes Oct 15, 2025

View reviewed changes

dimitarvdimitrov merged commit e8e048d into main Oct 16, 2025
28 checks passed

dimitarvdimitrov deleted the dimitar/labels/fix-cost-estimates branch October 16, 2025 14:41

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

labels: Adjust cost estimates #990

labels: Adjust cost estimates #990

Uh oh!

dimitarvdimitrov commented Sep 26, 2025 •

edited

Loading

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

labels: Adjust cost estimates #990

labels: Adjust cost estimates #990

Uh oh!

Conversation

dimitarvdimitrov commented Sep 26, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Cost model primitives

Matcher benchmarking

Advice to reviewers

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull Request Overview

Reviewed Changes

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

dimitarvdimitrov commented Sep 26, 2025 •

edited

Loading