@dimitarvdimitrov dimitarvdimitrov commented Sep 26, 2025

Cost model primitives

Updated the cost model to use a fixed cost for map operations rather than a per-element cost. Also added benchmarks to compare the relative cost of the primitive operations (string equality, prefix check, slice contains, and map contains).

Apple Silicon arm64 benchmarks

goos: darwin
goarch: arm64
pkg: github.com/prometheus/prometheus/model/labels
cpu: Apple M1 Pro
                                                    │ StringEquality │  StringHasPrefix   │   SliceContains    │    MapContains     │
                                                    │     sec/op     │   sec/op     vs base   │   sec/op     vs base   │   sec/op     vs base   │
CostEstimation/flavour=Equal_8chars-10                   2.056n ± 1%
CostEstimation/flavour=Equal_32chars-10                  2.728n ± 0%
CostEstimation/flavour=Equal_64chars-10                  2.565n ± 0%
CostEstimation/flavour=NotEqual_8chars-10                2.222n ± 0%
CostEstimation/flavour=NotEqual_32chars-10               2.064n ± 1%
CostEstimation/flavour=NotEqual_64chars-10               2.252n ± 0%
CostEstimation/flavour=ShortPrefix_8chars_Match-10                     2.158n ± 0%
CostEstimation/flavour=LongPrefix_32chars_Match-10                     2.720n ± 1%
CostEstimation/flavour=NearMiss_LastChar_32times-10                    2.400n ± 1%
CostEstimation/size=1-10                                                                    2.239n ± 0%
CostEstimation/size=2-10                                                                    3.175n ± 1%          7.053n ± 2%
CostEstimation/size=8-10                                                                    5.050n ± 0%
CostEstimation/size=16-10                                                                   7.557n ± 1%          9.596n ± 6%
CostEstimation/size=32-10                                                                                        9.556n ± 6%
CostEstimation/size=128-10                                                                                       9.791n ± 7%
CostEstimation/size=256-10                                                                                       9.631n ± 8%
geomean                                                  2.301n        2.415n       ? ¹ ²   4.058n       ? ¹ ²   9.058n       ? ¹ ²
¹ benchmark set differs from baseline; geomeans may not be comparable
² ratios must be >0 to compute geomean

Intel amd64 benchmarks

goos: linux
goarch: amd64
pkg: github.com/prometheus/prometheus/model/labels
cpu: Intel(R) Xeon(R) CPU @ 2.20GHz
                                                   │ StringEquality │   StringHasPrefix   │    SliceContains    │     MapContains      │
                                                   │     sec/op     │    sec/op     vs base   │    sec/op     vs base   │    sec/op      vs base   │
CostEstimation/flavour=Equal_8chars-4                  2.295n ± ∞ ¹
CostEstimation/flavour=Equal_32chars-4                 7.092n ± ∞ ¹
CostEstimation/flavour=Equal_64chars-4                 6.559n ± ∞ ¹
CostEstimation/flavour=NotEqual_8chars-4               5.141n ± ∞ ¹
CostEstimation/flavour=NotEqual_32chars-4              4.619n ± ∞ ¹
CostEstimation/flavour=NotEqual_64chars-4              5.522n ± ∞ ¹
CostEstimation/flavour=ShortPrefix_8chars_Match-4                     5.870n ± ∞ ¹
CostEstimation/flavour=LongPrefix_32chars_Match-4                     7.985n ± ∞ ¹
CostEstimation/flavour=NearMiss_LastChar_32times-4                    7.990n ± ∞ ¹
CostEstimation/size=1-4                                                                     3.811n ± ∞ ¹
CostEstimation/size=2-4                                                                     4.449n ± ∞ ¹          12.870n ± ∞ ¹
CostEstimation/size=8-4                                                                     6.929n ± ∞ ¹
CostEstimation/size=16-4                                                                    10.62n ± ∞ ¹           13.62n ± ∞ ¹
CostEstimation/size=32-4                                                                                           13.59n ± ∞ ¹
CostEstimation/size=128-4                                                                                          15.47n ± ∞ ¹
CostEstimation/size=256-4                                                                                          13.58n ± ∞ ¹
geomean                                                4.909n         7.208n        ? ² ³   5.943n        ? ² ³    13.80n        ? ² ³
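
As a rough reading aid, the geomean row of the arm64 table can be normalized against string equality to get relative costs. This is only a sketch: the helper `relativeCost` is a hypothetical name, and benchstat itself warns that these geomeans come from different benchmark sets and may not be comparable, so the ratios are indicative at best.

```go
package main

import "fmt"

// relativeCost expresses an operation's time as a multiple of a base
// operation's time. The name is hypothetical, not part of the PR.
func relativeCost(opNs, baseNs float64) float64 {
	return opNs / baseNs
}

func main() {
	// Geomean sec/op values (in ns) copied from the arm64 table.
	const stringEquality = 2.301
	ops := []struct {
		name string
		ns   float64
	}{
		{"StringHasPrefix", 2.415},
		{"SliceContains", 4.058},
		{"MapContains", 9.058},
	}
	for _, op := range ops {
		// Prints each operation as a multiple of one string comparison.
		fmt.Printf("%-16s %.2fx\n", op.name, relativeCost(op.ns, stringEquality))
	}
}
```

On these numbers a map lookup comes out at a little under four string comparisons, which is in the same range as the fixed map cost of 4.0 listed in the change summary.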

Matcher benchmarking

I wanted a way to validate our cost model and track improvements. I went with two metrics: average rank difference and Kendall's Tau.

Below are the results from before and after these adjustments. I think we didn't fix some major discrepancies, but we did move the average.

Before

    cost_test.go:457: Average rank difference: 5.33
    cost_test.go:458: Kendall's Tau: 0.6628 (1.0 = perfect positive correlation, -1.0 = perfect negative correlation)
    cost_test.go:455: rankDiff= 0	costRank= 1	runtimeRank= 1	cost=   1.0	timePerOp=     110ns	: state="Active"
    cost_test.go:455: rankDiff=-1	costRank= 1	runtimeRank= 2	cost=   1.0	timePerOp=     120ns	: cluster="ops-eu-south-0"
    cost_test.go:455: rankDiff=-1	costRank= 1	runtimeRank= 2	cost=   1.0	timePerOp=     120ns	: job="integrations/db-o11y"
    cost_test.go:455: rankDiff=-1	costRank= 1	runtimeRank= 2	cost=   1.0	timePerOp=     120ns	: __name__="cortex_partition_ring_partitions"
    cost_test.go:455: rankDiff=-1	costRank= 1	runtimeRank= 2	cost=   1.0	timePerOp=     120ns	: name="ingester-partitions"
    cost_test.go:455: rankDiff=-1	costRank= 1	runtimeRank= 2	cost=   1.0	timePerOp=     120ns	: __name__="cortex_distributor_samples_in_total"
    cost_test.go:455: rankDiff=-1	costRank= 1	runtimeRank= 2	cost=   1.0	timePerOp=     120ns	: version!="12.1.0-91295"
    cost_test.go:455: rankDiff=-1	costRank= 1	runtimeRank= 2	cost=   1.0	timePerOp=     120ns	: topic!=""
    cost_test.go:455: rankDiff=-1	costRank= 1	runtimeRank= 2	cost=   1.0	timePerOp=     120ns	: __name__="adaptive_metrics_canary_agg"
    cost_test.go:455: rankDiff=-1	costRank= 1	runtimeRank= 2	cost=   1.0	timePerOp=     120ns	: __name__="loki_distributor_bytes_received_total"
    cost_test.go:455: rankDiff=-1	costRank= 1	runtimeRank= 2	cost=   1.0	timePerOp=     120ns	: container!="istio-proxy"
    cost_test.go:455: rankDiff=-1	costRank= 1	runtimeRank= 2	cost=   1.0	timePerOp=     120ns	: __name__="kube_statefulset_replicas"
    cost_test.go:455: rankDiff=-1	costRank= 1	runtimeRank= 2	cost=   1.0	timePerOp=     120ns	: container="distributor"
    cost_test.go:455: rankDiff=-1	costRank= 1	runtimeRank= 2	cost=   1.0	timePerOp=     120ns	: __name__="up"
    cost_test.go:455: rankDiff=-1	costRank= 1	runtimeRank= 2	cost=   1.0	timePerOp=     120ns	: namespace="hosted-grafana"
    cost_test.go:455: rankDiff=-1	costRank= 1	runtimeRank= 2	cost=   1.0	timePerOp=     120ns	: namespace="grafana-com"
    cost_test.go:455: rankDiff=-1	costRank= 1	runtimeRank= 2	cost=   1.0	timePerOp=     120ns	: __name__="mimir_target_series_per_ingester"
    cost_test.go:455: rankDiff=-1	costRank= 1	runtimeRank= 2	cost=   1.0	timePerOp=     120ns	: status!="ok"
    cost_test.go:455: rankDiff=-1	costRank=12	runtimeRank=13	cost=  33.0	timePerOp=     290ns	: topic!~"(.+)-KSTREAM-AGGREGATE-STATE-STORE-(.+)"
    cost_test.go:455: rankDiff=-2	costRank= 1	runtimeRank= 3	cost=   1.0	timePerOp=     130ns	: target!="remote"
    cost_test.go:455: rankDiff=-2	costRank= 1	runtimeRank= 3	cost=   1.0	timePerOp=     130ns	: owner_kind!="ReplicaSet"
    cost_test.go:455: rankDiff=-2	costRank= 1	runtimeRank= 3	cost=   1.0	timePerOp=     130ns	: __name__="cortex_lifecycler_read_only"
    cost_test.go:455: rankDiff=-2	costRank= 1	runtimeRank= 3	cost=   1.0	timePerOp=     130ns	: namespace!="AWS/ECS"
    cost_test.go:455: rankDiff=-2	costRank= 1	runtimeRank= 3	cost=   1.0	timePerOp=     130ns	: __name__="namespace_user:cortex_ingester_owned_series:sum_filtered_max_over_time...
    cost_test.go:455: rankDiff=-2	costRank= 1	runtimeRank= 3	cost=   1.0	timePerOp=     130ns	: job!="integrations/db-o11y"
    cost_test.go:455: rankDiff=-3	costRank= 2	runtimeRank= 5	cost=   1.1	timePerOp=     200ns	: cluster=~"ops-eu-south-0"
    cost_test.go:455: rankDiff=-3	costRank= 1	runtimeRank= 4	cost=   1.0	timePerOp=     190ns	: cluster=~".+"
    cost_test.go:455: rankDiff=-3	costRank= 2	runtimeRank= 5	cost=   1.1	timePerOp=     200ns	: cluster=~"prod-eu-west-2"
    cost_test.go:455: rankDiff=-3	costRank= 2	runtimeRank= 5	cost=   1.1	timePerOp=     200ns	: namespace=~"loki-prod-035"
    cost_test.go:455: rankDiff=-3	costRank= 2	runtimeRank= 5	cost=   1.1	timePerOp=     200ns	: cluster=~"prod-gb-south-1"
    cost_test.go:455: rankDiff=-3	costRank= 2	runtimeRank= 5	cost=   1.1	timePerOp=     200ns	: namespace=~"loki-prod-031"
    cost_test.go:455: rankDiff=-3	costRank= 2	runtimeRank= 5	cost=   1.1	timePerOp=     200ns	: cluster=~"prod-us-east-0"
    cost_test.go:455: rankDiff=-4	costRank= 2	runtimeRank= 6	cost=   1.1	timePerOp=     210ns	: namespace=~"mimir-ops-03"
    cost_test.go:455: rankDiff=-4	costRank= 2	runtimeRank= 6	cost=   1.1	timePerOp=     210ns	: cluster=~"prod-us-central-0"
    cost_test.go:455: rankDiff=-5	costRank= 7	runtimeRank=12	cost=  18.0	timePerOp=     280ns	: image_spec!~"(.*):d849bcd"
    cost_test.go:455: rankDiff=-5	costRank= 2	runtimeRank= 7	cost=   1.1	timePerOp=     220ns	: namespace=~"asserts"
    cost_test.go:455: rankDiff=-5	costRank=14	runtimeRank=19	cost= 280.0	timePerOp=     550ns	: route=~"(/base.Ruler/Rules|/indexgatewaypb.IndexGateway/GetChunkRef|/indexgatewa...
    cost_test.go:455: rankDiff=-5	costRank= 7	runtimeRank=12	cost=  18.0	timePerOp=     280ns	: image_spec!~"(.*):cfc5ca8"
    cost_test.go:455: rankDiff=-5	costRank= 7	runtimeRank=12	cost=  18.0	timePerOp=     280ns	: image_spec!~"(.*):cb8eaaa"
    cost_test.go:455: rankDiff=-5	costRank= 7	runtimeRank=12	cost=  18.0	timePerOp=     280ns	: image_spec!~"(.*):1364de3"
    cost_test.go:455: rankDiff=-6	costRank= 2	runtimeRank= 8	cost=   1.1	timePerOp=     230ns	: namespace!~"(cortex-ops-01)"
    cost_test.go:455: rankDiff=-7	costRank=11	runtimeRank=18	cost=  31.0	timePerOp=     390ns	: job!~".*envoy-stats.*"
    cost_test.go:455: rankDiff=-7	costRank= 2	runtimeRank= 9	cost=   1.1	timePerOp=     250ns	: tenant=~"(29)"
    cost_test.go:455: rankDiff=-7	costRank=11	runtimeRank=18	cost=  31.0	timePerOp=     390ns	: exported_job!~".*envoy-stats.*"
    cost_test.go:455: rankDiff=-8	costRank= 1	runtimeRank= 9	cost=   1.0	timePerOp=     250ns	: statefulset!~"ingester-zone-.-partition"
    cost_test.go:455: rankDiff=-8	costRank= 9	runtimeRank=17	cost=  28.0	timePerOp=     380ns	: route=~"(prometheus|api_prom)_api_v1_.+"
    cost_test.go:455: rankDiff=-9	costRank= 5	runtimeRank=14	cost=   4.4	timePerOp=     320ns	: job!~"integrations/(windows|node_exporter|unix|docker)"
    cost_test.go:455: rankDiff=-9	costRank=10	runtimeRank=19	cost=  30.0	timePerOp=     550ns	: statefulset=~"(ingester|mimir-write).*"
    cost_test.go:455: rankDiff=-9	costRank=13	runtimeRank=22	cost=  47.0	timePerOp=    1.11µs	: route=~".*v1.*|.*prom.*"
    cost_test.go:455: rankDiff=-9	costRank= 3	runtimeRank=12	cost=   2.2	timePerOp=     280ns	: k8s_dst_owner_type!~"Pod|Node"
    cost_test.go:455: rankDiff=-9	costRank= 1	runtimeRank=10	cost=   1.0	timePerOp=     260ns	: namespace!~"kube-.*"
    cost_test.go:455: rankDiff=-9	costRank= 3	runtimeRank=12	cost=   2.2	timePerOp=     280ns	: k8s_src_owner_type!~"Pod|Node"
    cost_test.go:455: rankDiff=-9	costRank= 6	runtimeRank=15	cost=   5.5	timePerOp=     330ns	: job!~"integrations/(windows|node_exporter|unix|docker|db-o11y)"
    cost_test.go:455: rankDiff=-10	costRank= 3	runtimeRank=13	cost=   2.2	timePerOp=     290ns	: job!~"(ecs-dockerstats-exporter)|(vmagent)"
    cost_test.go:455: rankDiff=-10	costRank= 3	runtimeRank=13	cost=   2.2	timePerOp=     290ns	: workload_type!~"job|cronjob"
    cost_test.go:455: rankDiff=-10	costRank= 3	runtimeRank=13	cost=   2.2	timePerOp=     290ns	: created_by_kind!~"Job|TaskRun"
    cost_test.go:455: rankDiff=10	costRank=15	runtimeRank= 5	cost=9065.0	timePerOp=     200ns	: pod=~"(bigquery-datasource-grafana-app-fast-54994db8f6-lvrv7|bigquery-datasource...
    cost_test.go:455: rankDiff=-10	costRank= 5	runtimeRank=15	cost=   4.4	timePerOp=     330ns	: reason=~"(rate_limited|per_stream_rate_limit|blocked_ingestion|missing_enforced_...
    cost_test.go:455: rankDiff=-10	costRank= 1	runtimeRank=11	cost=   1.0	timePerOp=     270ns	: slug!~"ephemeral.*"
    cost_test.go:455: rankDiff=-10	costRank= 1	runtimeRank=11	cost=   1.0	timePerOp=     270ns	: job=~"(cortex-prod-13)/((gateway|cortex-gw.*))"
    cost_test.go:455: rankDiff=-10	costRank= 1	runtimeRank=11	cost=   1.0	timePerOp=     270ns	: job=~"(mimir-ops-03)/((compactor.*|cortex|mimir))"
    cost_test.go:455: rankDiff=-11	costRank= 5	runtimeRank=16	cost=   4.4	timePerOp=     340ns	: route=~"api_(v1|prom)_push|otlp_v1_metrics|api_v1_push_influx_write"
    cost_test.go:455: rankDiff=11	costRank=16	runtimeRank= 5	cost=27569.0	timePerOp=     200ns	: pod=~"(synthetic-monitoring-agent-5587988f98-4nn82:agent:http-metrics|synthetic-...
    cost_test.go:455: rankDiff=-12	costRank= 1	runtimeRank=13	cost=   1.0	timePerOp=     290ns	: __name__=~"aws_.+_info"
    cost_test.go:455: rankDiff=-15	costRank= 8	runtimeRank=23	cost=  21.0	timePerOp=    2.04µs	: db_name!~"template.*|^$"
    cost_test.go:455: rankDiff=-16	costRank= 4	runtimeRank=20	cost=   4.0	timePerOp=     650ns	: partition=~"0|1|2|3|4|5|6|7|8|9|10|11|12|13|14|15|16|17|18|19|20|21|22|23|24|25|...
    cost_test.go:455: rankDiff=-17	costRank= 4	runtimeRank=21	cost=   4.0	timePerOp=     870ns	: pod=~"querier-burst-744b9bc4b8-226b8|querier-burst-744b9bc4b8-226r8|querier-burs...

PASS

After

    cost_test.go:457: Average rank difference: 4.72
    cost_test.go:458: Kendall's Tau: 0.6724 (1.0 = perfect positive correlation, -1.0 = perfect negative correlation)
    cost_test.go:455: rankDiff= 0	costRank= 1	runtimeRank= 1	cost=   1.0	timePerOp=     110ns	: state="Active"
    cost_test.go:455: rankDiff= 0	costRank= 1	runtimeRank= 1	cost=   1.0	timePerOp=     110ns	: version!="12.1.0-91295"
    cost_test.go:455: rankDiff= 0	costRank=12	runtimeRank=12	cost=  33.0	timePerOp=     290ns	: topic!~"(.+)-KSTREAM-AGGREGATE-STATE-STORE-(.+)"
    cost_test.go:455: rankDiff=-1	costRank= 1	runtimeRank= 2	cost=   1.0	timePerOp=     120ns	: container="distributor"
    cost_test.go:455: rankDiff=-1	costRank= 1	runtimeRank= 2	cost=   1.0	timePerOp=     120ns	: topic!=""
    cost_test.go:455: rankDiff=-1	costRank= 1	runtimeRank= 2	cost=   1.0	timePerOp=     120ns	: __name__="cortex_distributor_samples_in_total"
    cost_test.go:455: rankDiff=-1	costRank= 1	runtimeRank= 2	cost=   1.0	timePerOp=     120ns	: __name__="namespace_user:cortex_ingester_owned_series:sum_filtered_max_over_time...
    cost_test.go:455: rankDiff=-1	costRank= 1	runtimeRank= 2	cost=   1.0	timePerOp=     120ns	: __name__="kube_statefulset_replicas"
    cost_test.go:455: rankDiff=-1	costRank= 1	runtimeRank= 2	cost=   1.0	timePerOp=     120ns	: __name__="up"
    cost_test.go:455: rankDiff=-1	costRank= 1	runtimeRank= 2	cost=   1.0	timePerOp=     120ns	: __name__="loki_distributor_bytes_received_total"
    cost_test.go:455: rankDiff=-1	costRank= 1	runtimeRank= 2	cost=   1.0	timePerOp=     120ns	: __name__="cortex_lifecycler_read_only"
    cost_test.go:455: rankDiff=-1	costRank= 1	runtimeRank= 2	cost=   1.0	timePerOp=     120ns	: cluster="ops-eu-south-0"
    cost_test.go:455: rankDiff=-1	costRank= 1	runtimeRank= 2	cost=   1.0	timePerOp=     120ns	: __name__="adaptive_metrics_canary_agg"
    cost_test.go:455: rankDiff=-1	costRank= 1	runtimeRank= 2	cost=   1.0	timePerOp=     120ns	: __name__="cortex_partition_ring_partitions"
    cost_test.go:455: rankDiff=-1	costRank= 1	runtimeRank= 2	cost=   1.0	timePerOp=     120ns	: name="ingester-partitions"
    cost_test.go:455: rankDiff=-1	costRank= 1	runtimeRank= 2	cost=   1.0	timePerOp=     120ns	: namespace="grafana-com"
    cost_test.go:455: rankDiff=-1	costRank= 1	runtimeRank= 2	cost=   1.0	timePerOp=     120ns	: __name__="mimir_target_series_per_ingester"
    cost_test.go:455: rankDiff=-1	costRank= 1	runtimeRank= 2	cost=   1.0	timePerOp=     120ns	: container!="istio-proxy"
    cost_test.go:455: rankDiff=-1	costRank= 1	runtimeRank= 2	cost=   1.0	timePerOp=     120ns	: namespace="hosted-grafana"
    cost_test.go:455: rankDiff=-1	costRank= 1	runtimeRank= 2	cost=   1.0	timePerOp=     120ns	: status!="ok"
    cost_test.go:455: rankDiff=-1	costRank= 1	runtimeRank= 2	cost=   1.0	timePerOp=     120ns	: owner_kind!="ReplicaSet"
    cost_test.go:455: rankDiff=-1	costRank= 1	runtimeRank= 2	cost=   1.0	timePerOp=     120ns	: job="integrations/db-o11y"
    cost_test.go:455: rankDiff=-2	costRank= 1	runtimeRank= 3	cost=   1.0	timePerOp=     130ns	: namespace!="AWS/ECS"
    cost_test.go:455: rankDiff=-2	costRank= 1	runtimeRank= 3	cost=   1.0	timePerOp=     130ns	: job!="integrations/db-o11y"
    cost_test.go:455: rankDiff=-2	costRank= 1	runtimeRank= 3	cost=   1.0	timePerOp=     130ns	: target!="remote"
    cost_test.go:455: rankDiff=-2	costRank= 2	runtimeRank= 4	cost=   1.1	timePerOp=     190ns	: cluster=~"prod-eu-west-2"
    cost_test.go:455: rankDiff=-3	costRank= 2	runtimeRank= 5	cost=   1.1	timePerOp=     200ns	: cluster=~"ops-eu-south-0"
    cost_test.go:455: rankDiff=-3	costRank= 2	runtimeRank= 5	cost=   1.1	timePerOp=     200ns	: cluster=~"prod-gb-south-1"
    cost_test.go:455: rankDiff=-3	costRank= 1	runtimeRank= 4	cost=   1.0	timePerOp=     190ns	: cluster=~".+"
    cost_test.go:455: rankDiff=-3	costRank= 7	runtimeRank=10	cost=  18.0	timePerOp=     270ns	: image_spec!~"(.*):1364de3"
    cost_test.go:455: rankDiff=-3	costRank= 7	runtimeRank=10	cost=  18.0	timePerOp=     270ns	: image_spec!~"(.*):d849bcd"
    cost_test.go:455: rankDiff=-3	costRank= 2	runtimeRank= 5	cost=   1.1	timePerOp=     200ns	: namespace=~"loki-prod-035"
    cost_test.go:455: rankDiff=-3	costRank= 2	runtimeRank= 5	cost=   1.1	timePerOp=     200ns	: cluster=~"prod-us-east-0"
    cost_test.go:455: rankDiff=-3	costRank= 2	runtimeRank= 5	cost=   1.1	timePerOp=     200ns	: namespace=~"loki-prod-031"
    cost_test.go:455: rankDiff=-3	costRank= 2	runtimeRank= 5	cost=   1.1	timePerOp=     200ns	: cluster=~"prod-us-central-0"
    cost_test.go:455: rankDiff=-4	costRank= 7	runtimeRank=11	cost=  18.0	timePerOp=     280ns	: image_spec!~"(.*):cb8eaaa"
    cost_test.go:455: rankDiff=-4	costRank= 7	runtimeRank=11	cost=  18.0	timePerOp=     280ns	: image_spec!~"(.*):cfc5ca8"
    cost_test.go:455: rankDiff=-4	costRank= 2	runtimeRank= 6	cost=   1.1	timePerOp=     210ns	: namespace=~"mimir-ops-03"
    cost_test.go:455: rankDiff=-4	costRank= 2	runtimeRank= 6	cost=   1.1	timePerOp=     210ns	: namespace=~"asserts"
    cost_test.go:455: rankDiff=-5	costRank= 2	runtimeRank= 7	cost=   1.1	timePerOp=     230ns	: namespace!~"(cortex-ops-01)"
    cost_test.go:455: rankDiff=-5	costRank=14	runtimeRank=19	cost= 280.0	timePerOp=     550ns	: route=~"(/base.Ruler/Rules|/indexgatewaypb.IndexGateway/GetChunkRef|/indexgatewa...
    cost_test.go:455: rankDiff=-6	costRank=11	runtimeRank=17	cost=  31.0	timePerOp=     390ns	: job!~".*envoy-stats.*"
    cost_test.go:455: rankDiff=-6	costRank= 2	runtimeRank= 8	cost=   1.1	timePerOp=     250ns	: tenant=~"(29)"
    cost_test.go:455: rankDiff=-6	costRank=11	runtimeRank=17	cost=  31.0	timePerOp=     390ns	: exported_job!~".*envoy-stats.*"
    cost_test.go:455: rankDiff=-7	costRank= 1	runtimeRank= 8	cost=   1.0	timePerOp=     250ns	: statefulset!~"ingester-zone-.-partition"
    cost_test.go:455: rankDiff=-7	costRank= 1	runtimeRank= 8	cost=   1.0	timePerOp=     250ns	: namespace!~"kube-.*"
    cost_test.go:455: rankDiff=-7	costRank= 3	runtimeRank=10	cost=   2.2	timePerOp=     270ns	: k8s_dst_owner_type!~"Pod|Node"
    cost_test.go:455: rankDiff=-7	costRank= 9	runtimeRank=16	cost=  28.0	timePerOp=     380ns	: route=~"(prometheus|api_prom)_api_v1_.+"
    cost_test.go:455: rankDiff=-8	costRank= 1	runtimeRank= 9	cost=   1.0	timePerOp=     260ns	: job=~"(mimir-ops-03)/((compactor.*|cortex|mimir))"
    cost_test.go:455: rankDiff=-8	costRank= 3	runtimeRank=11	cost=   2.2	timePerOp=     280ns	: k8s_src_owner_type!~"Pod|Node"
    cost_test.go:455: rankDiff=-8	costRank= 3	runtimeRank=11	cost=   2.2	timePerOp=     280ns	: job!~"(ecs-dockerstats-exporter)|(vmagent)"
    cost_test.go:455: rankDiff=-8	costRank=10	runtimeRank=18	cost=  30.0	timePerOp=     520ns	: statefulset=~"(ingester|mimir-write).*"
    cost_test.go:455: rankDiff=-8	costRank= 6	runtimeRank=14	cost=   5.5	timePerOp=     330ns	: job!~"integrations/(windows|node_exporter|unix|docker|db-o11y)"
    cost_test.go:455: rankDiff=-8	costRank= 1	runtimeRank= 9	cost=   1.0	timePerOp=     260ns	: slug!~"ephemeral.*"
    cost_test.go:455: rankDiff=-8	costRank= 5	runtimeRank=13	cost=   4.4	timePerOp=     310ns	: job!~"integrations/(windows|node_exporter|unix|docker)"
    cost_test.go:455: rankDiff=-9	costRank= 3	runtimeRank=12	cost=   2.2	timePerOp=     290ns	: created_by_kind!~"Job|TaskRun"
    cost_test.go:455: rankDiff=-9	costRank= 5	runtimeRank=14	cost=   4.4	timePerOp=     330ns	: reason=~"(rate_limited|per_stream_rate_limit|blocked_ingestion|missing_enforced_...
    cost_test.go:455: rankDiff=-9	costRank= 3	runtimeRank=12	cost=   2.2	timePerOp=     290ns	: workload_type!~"job|cronjob"
    cost_test.go:455: rankDiff=-9	costRank=13	runtimeRank=22	cost=  47.0	timePerOp=    1.15µs	: route=~".*v1.*|.*prom.*"
    cost_test.go:455: rankDiff=-10	costRank= 1	runtimeRank=11	cost=   1.0	timePerOp=     280ns	: job=~"(cortex-prod-13)/((gateway|cortex-gw.*))"
    cost_test.go:455: rankDiff=10	costRank=15	runtimeRank= 5	cost=9065.0	timePerOp=     200ns	: pod=~"(bigquery-datasource-grafana-app-fast-54994db8f6-lvrv7|bigquery-datasource...
    cost_test.go:455: rankDiff=10	costRank=16	runtimeRank= 6	cost=27569.0	timePerOp=     210ns	: pod=~"(synthetic-monitoring-agent-5587988f98-4nn82:agent:http-metrics|synthetic-...
    cost_test.go:455: rankDiff=-10	costRank= 5	runtimeRank=15	cost=   4.4	timePerOp=     340ns	: route=~"api_(v1|prom)_push|otlp_v1_metrics|api_v1_push_influx_write"
    cost_test.go:455: rankDiff=-10	costRank= 1	runtimeRank=11	cost=   1.0	timePerOp=     280ns	: __name__=~"aws_.+_info"
    cost_test.go:455: rankDiff=-15	costRank= 8	runtimeRank=23	cost=  21.0	timePerOp=    2.03µs	: db_name!~"template.*|^$"
    cost_test.go:455: rankDiff=-16	costRank= 4	runtimeRank=20	cost=   4.0	timePerOp=     670ns	: partition=~"0|1|2|3|4|5|6|7|8|9|10|11|12|13|14|15|16|17|18|19|20|21|22|23|24|25|...
    cost_test.go:455: rankDiff=-17	costRank= 4	runtimeRank=21	cost=   4.0	timePerOp=     820ns	: pod=~"querier-burst-744b9bc4b8-226b8|querier-burst-744b9bc4b8-226r8|querier-burs...
PASS

Advice to reviewers

It's probably easier to review with whitespace changes hidden in the diff.

Related to grafana/mimir#11920

Map access is constant time, not dependent on map size. Updated the cost
model to use a fixed cost for map operations rather than per-element cost.

<!-- TODO: Add detailed benchmark results here -->

Changes:
- Fixed map cost model to be constant time (4.0) instead of per-element
- Added comprehensive benchmarks for string equality, hasPrefix, slice contains, and map contains operations
- Added postings iteration benchmarks across different sizes
- Removed outdated TODO comments that are now addressed
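
To illustrate the difference between the two cost shapes, here is a minimal sketch. Only the fixed map cost of 4.0 comes from the change list; the function name `setMatcherCost`, the `backedByMap` flag, and the per-element slice cost of 1.0 are hypothetical and not taken from `cost.go`.

```go
package main

import "fmt"

const (
	// mapLookupCost is the fixed cost named in the change list.
	mapLookupCost = 4.0
	// sliceElemCost is a hypothetical per-element cost, for illustration only.
	sliceElemCost = 1.0
)

// setMatcherCost estimates the cost of checking one value against a set
// of numValues candidates. A map lookup is charged a constant cost,
// while a slice scan is charged per element.
func setMatcherCost(numValues int, backedByMap bool) float64 {
	if backedByMap {
		return mapLookupCost
	}
	return float64(numValues) * sliceElemCost
}

func main() {
	for _, n := range []int{2, 16, 256} {
		fmt.Printf("n=%d slice=%.1f map=%.1f\n",
			n, setMatcherCost(n, false), setMatcherCost(n, true))
	}
}
```

With a per-element model, a 256-entry map would be charged far more than a 2-entry one, even though the MapContains benchmarks above stay roughly flat across sizes.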
Copilot AI left a comment


Pull Request Overview

This PR adjusts the cost estimation model for matcher operations in Prometheus labels, moving from per-element costs to fixed costs for map operations. It also introduces comprehensive benchmarks to compare the performance characteristics of different string matching operations.

  • Updated cost constants to better reflect actual performance characteristics
  • Changed map operations from per-element to fixed cost model
  • Added extensive benchmarks for string equality, prefix matching, slice/map contains operations

Reviewed Changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| model/labels/cost.go | Updated cost estimation constants and changed map matcher cost calculation from per-element to fixed cost |
| model/labels/cost_test.go | Added comprehensive benchmarks for string operations, slice/map contains, and postings iteration |
| model/labels/postings_bench_test.go | Added benchmarks for iterating over postings lists of various sizes |


Extracted the test cases into a shared matcherTestCases variable
that can be reused by other functions.
Creates sub-benchmarks for each matcher from matcherTestCases,
measures actual runtime vs theoretical cost, and outputs ranking
comparison. Runtime is rounded to multiples of 3 nanoseconds
and uses competition-style ranking where equal values get the
same rank.
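
The rounding and ranking step could look roughly like the sketch below. It is an assumption-laden illustration: the log output earlier in this PR shows ranks continuing consecutively after ties (many rank 1 entries followed by rank 2), so this version uses dense ranking, and both the function name `denseRanks` and the `step` parameter are hypothetical.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// denseRanks rounds each value to the nearest multiple of step and then
// assigns ranks so that equal rounded values share a rank and the next
// distinct value gets the next consecutive rank (1, 2, 2, 3, ...).
func denseRanks(values []float64, step float64) []int {
	rounded := make([]float64, len(values))
	for i, v := range values {
		rounded[i] = math.Round(v/step) * step
	}

	// Sort a copy so the original order of values is preserved.
	sorted := append([]float64(nil), rounded...)
	sort.Float64s(sorted)

	// Walk the sorted values and hand out the next rank to each new value.
	rankOf := make(map[float64]int, len(sorted))
	next := 1
	for _, v := range sorted {
		if _, seen := rankOf[v]; !seen {
			rankOf[v] = next
			next++
		}
	}

	ranks := make([]int, len(values))
	for i, v := range rounded {
		ranks[i] = rankOf[v]
	}
	return ranks
}

func main() {
	// Runtimes in nanoseconds; the step of 3 follows the commit message.
	runtimes := []float64{110, 112, 118, 290}
	fmt.Println(denseRanks(runtimes, 3))
}
```

Rounding before ranking keeps benchmark noise of a nanosecond or two from producing spurious rank differences between matchers that are effectively equally fast.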
@dimitarvdimitrov dimitarvdimitrov marked this pull request as ready for review September 26, 2025 21:55
dimitarvdimitrov added a commit to grafana/mimir that referenced this pull request Oct 8, 2025
### Problem

We've seen that ingesters can end up doing more work when this
optimization is enabled. This is due to fetching significantly more
series and checking them for sharding. This is not something our cost
model accounted for.

### What this PR does

This PR introduces the cost of retrieving a single series from the
index. The cost of doing this depends a lot on whether the block is an
in-memory or an on-disk block. In-memory blocks have much more efficient
sharding code. For now we're sticking with a single cost for all blocks,
but we can change that later.

### Why 10?

I added a benchmark to help me come up with the new number. Non-sharded
iteration costs about 2ns per posting on my machine, which matches the 2ns
that it takes to do a string comparison (see
grafana/mimir-prometheus#990). Sharded iteration costs roughly 23ns to 79ns
per posting, so our cost should be around 10-40. 10 is already 3 orders of
magnitude higher than what we had before (0.01), so I was cautious not to
move this too much.


<details><summary>Cost of sharded vs non-sharded postings
iteration</summary>
<p>

false: not sharded
true: sharded

```
benchstat -col=/sharded -row=/size -filter='(/sharded:false OR /reuseCache:true)' 
        │    false     │                   true                   │
        │    sec/op    │    sec/op      vs base                   │
128       326.1n ± ∞ ¹   2937.0n ± ∞ ¹          ~ (p=0.100 n=3) ²
128K      301.5µ ± ∞ ¹   4943.5µ ± ∞ ¹          ~ (p=0.100 n=3) ²
1M        2.435m ± ∞ ¹   83.056m ± ∞ ¹          ~ (p=0.100 n=3) ²
geomean   62.10µ          1.064m        +1614.11%
¹ need >= 6 samples for confidence interval at level 0.95
² need >= 4 samples to detect a difference at alpha level 0.05

        │    false     │                   true                   │
        │ sec/posting  │  sec/posting   vs base                   │
128       2.548n ± ∞ ¹   22.940n ± ∞ ¹          ~ (p=0.100 n=3) ²
128K      2.300n ± ∞ ¹   37.720n ± ∞ ¹          ~ (p=0.100 n=3) ²
1M        2.322n ± ∞ ¹   79.210n ± ∞ ¹          ~ (p=0.100 n=3) ²
geomean   2.387n          40.92n        +1614.16%
¹ need >= 6 samples for confidence interval at level 0.95
² need >= 4 samples to detect a difference at alpha level 0.05
```

</p>
</details> 
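
The 10-40 range can be reproduced from the sec/posting rows in the benchstat output. The sketch below divides each sharded per-posting time by a 2ns string comparison; the function name `costRange` is hypothetical, and since each measurement has only n=3 samples the result should be read as a ballpark.

```go
package main

import "fmt"

// costRange converts per-posting times (ns) into cost units, where one
// unit is the time of a single baseline operation, and returns the
// smallest and largest resulting cost.
func costRange(perPostingNs []float64, baselineNs float64) (lo, hi float64) {
	if len(perPostingNs) == 0 {
		return 0, 0
	}
	lo = perPostingNs[0] / baselineNs
	hi = lo
	for _, ns := range perPostingNs[1:] {
		c := ns / baselineNs
		if c < lo {
			lo = c
		}
		if c > hi {
			hi = c
		}
	}
	return lo, hi
}

func main() {
	// Sharded sec/posting values for sizes 128, 128K, and 1M.
	sharded := []float64{22.940, 37.720, 79.210}
	// Approximate cost of one string comparison, per the description.
	const stringCompareNs = 2.0

	lo, hi := costRange(sharded, stringCompareNs)
	fmt.Printf("cost per series: %.1f to %.1f string comparisons\n", lo, hi)
}
```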



### Potential next steps
1. detect whether the block is in-memory or on-disk and adjust our
plans.
2. adjust the cost based on whether we have query sharding enabled or
not; hashing series is expensive

Option 1 will probably give higher returns.



#### Which issue(s) this PR fixes or relates to

related to #11920

---------

Signed-off-by: Dimitar Dimitrov <[email protected]>
@dimitarvdimitrov dimitarvdimitrov merged commit e8e048d into main Oct 16, 2025
28 checks passed
@dimitarvdimitrov dimitarvdimitrov deleted the dimitar/labels/fix-cost-estimates branch October 16, 2025 14:41