Releases: cortexproject/cortex
Cortex 1.7.0-rc.0
Changelog
Cortex
- [CHANGE] FramedSnappy encoding support has been removed from Push and Remote Read APIs. This means Prometheus 1.6 support has been removed and the oldest Prometheus version supported in the remote write is 1.7. #3682
- [CHANGE] Ruler: removed the flag -ruler.evaluation-delay-duration-deprecated which was deprecated in 1.4.0. Please use the ruler_evaluation_delay_duration per-tenant limit instead. #3694
- [CHANGE] Removed the flags -<prefix>.grpc-use-gzip-compression which were deprecated in 1.3.0: #3694
  - -query-scheduler.grpc-client-config.grpc-use-gzip-compression: use -query-scheduler.grpc-client-config.grpc-compression instead
  - -frontend.grpc-client-config.grpc-use-gzip-compression: use -frontend.grpc-client-config.grpc-compression instead
  - -ruler.client.grpc-use-gzip-compression: use -ruler.client.grpc-compression instead
  - -bigtable.grpc-use-gzip-compression: use -bigtable.grpc-compression instead
  - -ingester.client.grpc-use-gzip-compression: use -ingester.client.grpc-compression instead
  - -querier.frontend-client.grpc-use-gzip-compression: use -querier.frontend-client.grpc-compression instead
- [CHANGE] Querier: it's not required to set -frontend.query-stats-enabled=true in the querier anymore to enable query statistics logging in the query-frontend. The flag is now required to be configured only in the query-frontend and it will be propagated to the queriers. #3595 #3695
- [CHANGE] Blocks storage: compactor is now required when running a Cortex cluster with the blocks storage, because it also keeps the bucket index updated. #3583
- [CHANGE] Blocks storage: block deletion marks are now also stored in a per-tenant global markers/ location, in addition to the block location. The compactor, at startup, will copy deletion marks from the block location to the global location. This migration is required only once, so you can safely disable it via -compactor.block-deletion-marks-migration-enabled=false once the new compactor has successfully started at least once in your cluster. #3583
- [CHANGE] OpenStack Swift: the default value for the -ruler.storage.swift.container-name and -swift.container-name config options has changed from cortex to empty string. If you were relying on the default value, you should set it back to cortex. #3660
- [CHANGE] HA Tracker: configured replica label is now verified against label value length limit (-validation.max-length-label-value). #3668
- [CHANGE] Distributor: extend_writes field in YAML configuration has moved from lifecycler (inside ingester_config) to distributor_config. This doesn't affect command line option -distributor.extend-writes, which stays the same. #3719
- [CHANGE] Alertmanager: Deprecated -cluster.* CLI flags in favor of their -alertmanager.cluster.* equivalent. The deprecated flags (and their respective YAML config options) are: #3677
  - -cluster.listen-address in favor of -alertmanager.cluster.listen-address
  - -cluster.advertise-address in favor of -alertmanager.cluster.advertise-address
  - -cluster.peer in favor of -alertmanager.cluster.peers
  - -cluster.peer-timeout in favor of -alertmanager.cluster.peer-timeout
- [CHANGE] Blocks storage: the default value of -blocks-storage.bucket-store.sync-interval has been changed from 5m to 15m. #3724
- [FEATURE] Querier: Queries can be federated across multiple tenants. The tenant IDs involved need to be specified, separated by a | character, in the X-Scope-OrgID request header. This is an experimental feature, which can be enabled by setting -tenant-federation.enabled=true on all Cortex services. #3250
- [FEATURE] Alertmanager: introduced the experimental option -alertmanager.sharding-enabled to shard tenants across multiple Alertmanager instances. This feature is still under heavy development and its usage is discouraged. The following new metrics are exported by the Alertmanager: #3664
  - cortex_alertmanager_ring_check_errors_total
  - cortex_alertmanager_sync_configs_total
  - cortex_alertmanager_sync_configs_failed_total
  - cortex_alertmanager_tenants_discovered
  - cortex_alertmanager_tenants_owned
- [ENHANCEMENT] Allow specifying JAEGER_ENDPOINT instead of sampling server or local agent port. #3682
- [ENHANCEMENT] Blocks storage: introduced a per-tenant bucket index, periodically updated by the compactor, used to avoid full bucket scanning done by queriers, store-gateways and rulers. The bucket index is updated by the compactor during blocks cleanup, on every -compactor.cleanup-interval. #3553 #3555 #3561 #3583 #3625 #3711 #3715
- [ENHANCEMENT] Blocks storage: introduced an option -blocks-storage.bucket-store.bucket-index.enabled to enable the usage of the bucket index in the querier, store-gateway and ruler. When enabled, the querier, store-gateway and ruler will use the bucket index to find a tenant's blocks instead of running the periodic bucket scan. The following new metrics are exported by the querier and ruler: #3614 #3625
  - cortex_bucket_index_loads_total
  - cortex_bucket_index_load_failures_total
  - cortex_bucket_index_load_duration_seconds
  - cortex_bucket_index_loaded
- [ENHANCEMENT] Compactor: exported the following metrics. #3583 #3625
  - cortex_bucket_blocks_count: Total number of blocks per tenant in the bucket. Includes blocks marked for deletion, but not partial blocks.
  - cortex_bucket_blocks_marked_for_deletion_count: Total number of blocks per tenant marked for deletion in the bucket.
  - cortex_bucket_blocks_partials_count: Total number of partial blocks.
  - cortex_bucket_index_last_successful_update_timestamp_seconds: Timestamp of the last successful update of a tenant's bucket index.
- [ENHANCEMENT] Ruler: Add cortex_prometheus_last_evaluation_samples to expose the number of samples generated by a rule group per tenant. #3582
- [ENHANCEMENT] Memberlist: add status page (/memberlist) with available details about memberlist-based KV store and memberlist cluster. It's also possible to view KV values in Go struct or JSON format, or download for inspection. #3575
- [ENHANCEMENT] Memberlist: client can now keep a size-bounded buffer with sent and received messages and display them in the admin UI (/memberlist) for troubleshooting. #3581 #3602
- [ENHANCEMENT] Blocks storage: added block index attributes caching support to metadata cache. The TTL can be configured via -blocks-storage.bucket-store.metadata-cache.block-index-attributes-ttl. #3629
- [ENHANCEMENT] Alertmanager: Add support for Azure blob storage. #3634
- [ENHANCEMENT] Compactor: tenants marked for deletion will now be fully cleaned up after some delay since deletion of last block. Cleanup includes removal of remaining marker files (including tenant deletion mark file) and files under debug/metas. #3613
- [ENHANCEMENT] Compactor: retry compaction of a single tenant on failure instead of re-running compaction for all tenants. #3627
- [ENHANCEMENT] Querier: Implement result caching for tenant query federation. #3640
- [ENHANCEMENT] API: Add a mode query parameter for the config endpoint: #3645
  - /config?mode=diff: Shows the YAML configuration with all values that differ from the defaults.
  - /config?mode=defaults: Shows the YAML configuration with all the default values.
- [ENHANCEMENT] OpenStack Swift: added the following config options to OpenStack Swift backend client: #3660
  - Chunks storage: -swift.auth-version, -swift.max-retries, -swift.connect-timeout, -swift.request-timeout.
  - Blocks storage: -blocks-storage.swift.auth-version, -blocks-storage.swift.max-retries, -blocks-storage.swift.connect-timeout, -blocks-storage.swift.request-timeout.
  - Ruler: -ruler.storage.swift.auth-version, -ruler.storage.swift.max-retries, -ruler.storage.swift.connect-timeout, -ruler.storage.swift.request-timeout.
- [ENHANCEMENT] Disabled in-memory shuffle-sharding subring cache in the store-gateway, ruler and compactor. This should reduce the memory utilisation in these services when shuffle-sharding is enabled, without introducing a significant increase in CPU utilisation. #3601
- [ENHANCEMENT] Shuffle sharding: optimised subring generation used by shuffle sharding. #3601
- [ENHANCEMENT] New /runtime_config endpoint that returns the defined runtime configuration in YAML format. The returned configuration includes overrides. #3639
- [ENHANCEMENT] Query-frontend: included the name of the parameter that failed validation in the HTTP 400 message. #3703
- [ENHANCEMENT] Fail to start up Cortex if the provided runtime config is invalid. #3707
- [ENHANCEMENT] Alertmanager: Add flags to customize the cluster configuration: #3667
  - -alertmanager.cluster.gossip-interval: The interval between sending gossip messages. By lowering this value (more frequent) gossip messages are propagated across cluster more quickly at the expense of increased bandwidth usage.
  - -alertmanager.cluster.push-pull-interval: The interval between gossip state syncs. Setting this interval lower (more frequent) will increase convergence speeds across larger clusters at the expense of increased bandwidth usage.
- [ENHANCEMENT] Distributor: change the error message returned when a received series has too many label values. The new message format has the series at the end and this plays better with Prometheus logs truncation. #3718
  - From: sample for '<series>' has <value> label names; limit <value>
  - To: series has too many labels (actual: <value>, limit: <value>) series: '<series>'
- [ENHANCEMENT] Improve bucket index loader to handle edge case where new tenant has not had blocks uploaded to storage yet. #3717
- [BUGFIX] Allow -querier.max-query-lookback to use the y|w|d suffix, like the deprecated -store.max-look-back-period. #3598
- [BUGFIX] Memberlist: Entry in the ring should now not appear again after using "Forget" ...
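The tenant federation entry in the 1.7.0-rc.0 changelog above says that tenant IDs are passed in the X-Scope-OrgID header separated by a | character. A minimal sketch of building that header value could look like the following. The helper name `federated_org_id_header` and the tenant IDs are hypothetical and not part of Cortex; the sketch only assumes the header format described in the changelog.

```python
def federated_org_id_header(tenant_ids):
    """Build request headers for a query federated across several tenants.

    Hypothetical helper: it only joins tenant IDs with '|', as described
    in the changelog entry for -tenant-federation.enabled.
    """
    # Drop surrounding whitespace and ignore empty entries.
    cleaned = [t.strip() for t in tenant_ids if t.strip()]
    if not cleaned:
        raise ValueError("at least one tenant ID is required")
    for tenant in cleaned:
        # '|' is the separator, so it cannot appear inside a single ID.
        if "|" in tenant:
            raise ValueError(f"tenant ID {tenant!r} must not contain '|'")
    return {"X-Scope-OrgID": "|".join(cleaned)}


# Example with made-up tenant IDs.
headers = federated_org_id_header(["team-a", "team-b"])
```

The resulting dictionary can be passed as headers to whichever HTTP client issues the query; the feature is experimental and needs to be enabled on all Cortex services.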
Cortex 1.6.0
Changelog
Cortex
- [CHANGE] Query Frontend: deprecate -querier.compress-http-responses in favour of -api.response-compression-enabled. #3544
- [CHANGE] Querier: deprecated -store.max-look-back-period. You should use -querier.max-query-lookback instead. #3452
- [CHANGE] Blocks storage: increased -blocks-storage.bucket-store.chunks-cache.attributes-ttl default from 24h to 168h (1 week). #3528
- [CHANGE] Blocks storage: the config option -blocks-storage.bucket-store.index-cache.postings-compression-enabled has been deprecated and postings compression is always enabled. #3538
- [CHANGE] Ruler: gRPC message size default limits on the Ruler-client side have changed: #3523
  - limit for outgoing gRPC messages has changed from 2147483647 to 16777216 bytes
  - limit for incoming gRPC messages has changed from 4194304 to 104857600 bytes
- [FEATURE] Distributor/Ingester: Provide ability to not overflow writes in the presence of a leaving or unhealthy ingester. This allows for more efficient ingester rolling restarts. #3305
- [FEATURE] Query-frontend: introduced query statistics logged in the query-frontend when enabled via -frontend.query-stats-enabled=true. When enabled, the metric cortex_query_seconds_total is tracked, counting the sum of the wall time spent across all queriers while running queries (on a per-tenant basis). The metrics cortex_request_duration_seconds and cortex_query_seconds_total are different: the first one tracks the request duration (e.g. HTTP request from the client), while the latter tracks the sum of the wall time on all queriers involved executing the query. #3539
- [ENHANCEMENT] API: Add GZIP HTTP compression to the API responses. Compression can be enabled via -api.response-compression-enabled. #3536
- [ENHANCEMENT] Added zone-awareness support on queries. When zone-awareness is enabled, queries will still succeed if all ingesters in a single zone fail. #3414
- [ENHANCEMENT] Blocks storage ingester: exported more TSDB-related metrics. #3412
  - cortex_ingester_tsdb_wal_corruptions_total
  - cortex_ingester_tsdb_head_truncations_failed_total
  - cortex_ingester_tsdb_head_truncations_total
  - cortex_ingester_tsdb_head_gc_duration_seconds
- [ENHANCEMENT] Enforced keepalive on all gRPC clients used for inter-service communication. #3431
- [ENHANCEMENT] Added cortex_alertmanager_config_hash metric to expose hash of Alertmanager Config loaded per user. #3388
- [ENHANCEMENT] Query-Frontend / Query-Scheduler: New component called "Query-Scheduler" has been introduced. Query-Scheduler is simply a queue of requests, moved outside of Query-Frontend. This allows Query-Frontend to be scaled separately from number of queues. To make Query-Frontend and Querier use Query-Scheduler, they need to be started with -frontend.scheduler-address and -querier.scheduler-address options respectively. #3374 #3471
- [ENHANCEMENT] Query-frontend / Querier / Ruler: added -querier.max-query-lookback to limit how long back data (series and metadata) can be queried. This setting can be overridden on a per-tenant basis and is enforced in the query-frontend, querier and ruler. #3452 #3458
- [ENHANCEMENT] Querier: added -querier.query-store-for-labels-enabled to query store for label names, label values and series APIs. Only works with blocks storage engine. #3461 #3520
- [ENHANCEMENT] Ingester: exposed -blocks-storage.tsdb.wal-segment-size-bytes config option to customise the TSDB WAL segment max size. #3476
- [ENHANCEMENT] Compactor: concurrently run blocks cleaner for multiple tenants. Concurrency can be configured via -compactor.cleanup-concurrency. #3483
- [ENHANCEMENT] Compactor: shuffle tenants before running compaction. #3483
- [ENHANCEMENT] Compactor: wait for a stable ring at startup, when sharding is enabled. #3484
- [ENHANCEMENT] Store-gateway: added -blocks-storage.bucket-store.index-header-lazy-loading-enabled to enable index-header lazy loading (experimental). When enabled, index-headers will be mmap-ed only once required by a query and will be automatically released after -blocks-storage.bucket-store.index-header-lazy-loading-idle-timeout time of inactivity. #3498
- [ENHANCEMENT] Alertmanager: added metrics cortex_alertmanager_notification_requests_total and cortex_alertmanager_notification_requests_failed_total. #3518
- [ENHANCEMENT] Ingester: added -blocks-storage.tsdb.head-chunks-write-buffer-size-bytes to fine-tune the TSDB head chunks write buffer size when running Cortex blocks storage. #3518
- [ENHANCEMENT] /metrics now supports OpenMetrics output. HTTP and gRPC servers metrics can now include exemplars. #3524
- [ENHANCEMENT] Expose gRPC keepalive policy options by gRPC server. #3524
- [ENHANCEMENT] Blocks storage: enabled caching of meta.json attributes, configurable via -blocks-storage.bucket-store.metadata-cache.metafile-attributes-ttl. #3528
- [ENHANCEMENT] Compactor: added a config validation check to fail fast if the compactor has been configured with invalid block range periods (each period is expected to be a multiple of the previous one). #3534
- [ENHANCEMENT] Blocks storage: concurrently fetch deletion marks from object storage. #3538
- [ENHANCEMENT] Blocks storage ingester: ingester can now close idle TSDB and delete local data. #3491 #3552
- [ENHANCEMENT] Blocks storage: add option to use V2 signatures for S3 authentication. #3540
- [ENHANCEMENT] Exported process metrics to monitor the number of memory map areas allocated. #3537
  - process_memory_map_areas
  - process_memory_map_areas_limit
- [ENHANCEMENT] Ruler: Expose gRPC client options. #3523
- [ENHANCEMENT] Compactor: added metrics to track on-going compaction. #3535
  - cortex_compactor_tenants_discovered
  - cortex_compactor_tenants_skipped
  - cortex_compactor_tenants_processing_succeeded
  - cortex_compactor_tenants_processing_failed
- [ENHANCEMENT] Added new experimental API endpoints: POST /purger/delete_tenant and GET /purger/delete_tenant_status for deleting all tenant data. Only works with blocks storage. Compactor removes blocks that belong to user marked for deletion. #3549 #3558
- [ENHANCEMENT] Chunks storage: add option to use V2 signatures for S3 authentication. #3560
- [BUGFIX] Query-Frontend: cortex_query_seconds_total now returns seconds, not nanoseconds. #3589
- [BUGFIX] Blocks storage ingester: fixed some cases leading to a TSDB WAL corruption after a partial write to disk. #3423
- [BUGFIX] Blocks storage: Fix the race between ingestion and /flush call resulting in overlapping blocks. #3422
- [BUGFIX] Querier: fixed -querier.max-query-into-future which wasn't correctly enforced on range queries. #3452
- [BUGFIX] Fixed float64 precision stability when aggregating metrics before exposing them. This could have led to false counter resets when querying some metrics exposed by Cortex. #3506
- [BUGFIX] Querier: the meta.json sync concurrency done when running Cortex with the blocks storage is now controlled by -blocks-storage.bucket-store.meta-sync-concurrency instead of the incorrect -blocks-storage.bucket-store.block-sync-concurrency (default values are the same). #3531
- [BUGFIX] Querier: fixed initialization order of querier module when using blocks storage. It now (again) waits until blocks have been synchronized. #3551
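The compactor entry above (#3534) describes a fail-fast check that each block range period is a multiple of the previous one. The rule itself can be sketched as follows. This is an illustration of the stated rule under assumptions, not Cortex's actual validation code; the function name `validate_block_ranges` and the sample ranges are hypothetical.

```python
from datetime import timedelta


def validate_block_ranges(ranges):
    """Raise ValueError unless every range is a multiple of the previous one.

    Hypothetical sketch of the rule stated in the changelog; the real
    check lives in the compactor configuration validation.
    """
    for index, block_range in enumerate(ranges):
        # A non-positive period can never be a valid divisor.
        if block_range <= timedelta(0):
            raise ValueError(f"block range #{index} must be positive, got {block_range}")
    # Compare each period with the one immediately before it.
    for previous, current in zip(ranges, ranges[1:]):
        if current % previous != timedelta(0):
            raise ValueError(f"block range {current} is not a multiple of {previous}")


# Illustrative values: 12h is a multiple of 2h, and 24h is a multiple of 12h.
validate_block_ranges([timedelta(hours=2), timedelta(hours=12), timedelta(hours=24)])
```

With this rule, a configuration such as 2h followed by 5h would be rejected at startup instead of failing later during compaction.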
Blocksconvert
Cortex 1.6.0-rc.1
Changelog
Cortex
- [BUGFIX] Query-Frontend: cortex_query_seconds_total now returns seconds, not nanoseconds. #3589
Cortex 1.6.0-rc.0
Changelog
Cortex
- [CHANGE] Query Frontend: deprecate -querier.compress-http-responses in favour of -api.response-compression-enabled. #3544
- [CHANGE] Querier: deprecated -store.max-look-back-period. You should use -querier.max-query-lookback instead. #3452
- [CHANGE] Blocks storage: increased -blocks-storage.bucket-store.chunks-cache.attributes-ttl default from 24h to 168h (1 week). #3528
- [CHANGE] Blocks storage: the config option -blocks-storage.bucket-store.index-cache.postings-compression-enabled has been deprecated and postings compression is always enabled. #3538
- [CHANGE] Ruler: gRPC message size default limits on the Ruler-client side have changed: #3523
  - limit for outgoing gRPC messages has changed from 2147483647 to 16777216 bytes
  - limit for incoming gRPC messages has changed from 4194304 to 104857600 bytes
- [FEATURE] Distributor/Ingester: Provide ability to not overflow writes in the presence of a leaving or unhealthy ingester. This allows for more efficient ingester rolling restarts. #3305
- [FEATURE] Query-frontend: introduced query statistics logged in the query-frontend when enabled via -frontend.query-stats-enabled=true. When enabled, the metric cortex_query_seconds_total is tracked, counting the sum of the wall time spent across all queriers while running queries (on a per-tenant basis). The metrics cortex_request_duration_seconds and cortex_query_seconds_total are different: the first one tracks the request duration (e.g. HTTP request from the client), while the latter tracks the sum of the wall time on all queriers involved executing the query. #3539
- [ENHANCEMENT] API: Add GZIP HTTP compression to the API responses. Compression can be enabled via -api.response-compression-enabled. #3536
- [ENHANCEMENT] Added zone-awareness support on queries. When zone-awareness is enabled, queries will still succeed if all ingesters in a single zone fail. #3414
- [ENHANCEMENT] Blocks storage ingester: exported more TSDB-related metrics. #3412
  - cortex_ingester_tsdb_wal_corruptions_total
  - cortex_ingester_tsdb_head_truncations_failed_total
  - cortex_ingester_tsdb_head_truncations_total
  - cortex_ingester_tsdb_head_gc_duration_seconds
- [ENHANCEMENT] Enforced keepalive on all gRPC clients used for inter-service communication. #3431
- [ENHANCEMENT] Added cortex_alertmanager_config_hash metric to expose hash of Alertmanager Config loaded per user. #3388
- [ENHANCEMENT] Query-Frontend / Query-Scheduler: New component called "Query-Scheduler" has been introduced. Query-Scheduler is simply a queue of requests, moved outside of Query-Frontend. This allows Query-Frontend to be scaled separately from number of queues. To make Query-Frontend and Querier use Query-Scheduler, they need to be started with -frontend.scheduler-address and -querier.scheduler-address options respectively. #3374 #3471
- [ENHANCEMENT] Query-frontend / Querier / Ruler: added -querier.max-query-lookback to limit how long back data (series and metadata) can be queried. This setting can be overridden on a per-tenant basis and is enforced in the query-frontend, querier and ruler. #3452 #3458
- [ENHANCEMENT] Querier: added -querier.query-store-for-labels-enabled to query store for label names, label values and series APIs. Only works with blocks storage engine. #3461 #3520
- [ENHANCEMENT] Ingester: exposed -blocks-storage.tsdb.wal-segment-size-bytes config option to customise the TSDB WAL segment max size. #3476
- [ENHANCEMENT] Compactor: concurrently run blocks cleaner for multiple tenants. Concurrency can be configured via -compactor.cleanup-concurrency. #3483
- [ENHANCEMENT] Compactor: shuffle tenants before running compaction. #3483
- [ENHANCEMENT] Compactor: wait for a stable ring at startup, when sharding is enabled. #3484
- [ENHANCEMENT] Store-gateway: added -blocks-storage.bucket-store.index-header-lazy-loading-enabled to enable index-header lazy loading (experimental). When enabled, index-headers will be mmap-ed only once required by a query and will be automatically released after -blocks-storage.bucket-store.index-header-lazy-loading-idle-timeout time of inactivity. #3498
- [ENHANCEMENT] Alertmanager: added metrics cortex_alertmanager_notification_requests_total and cortex_alertmanager_notification_requests_failed_total. #3518
- [ENHANCEMENT] Ingester: added -blocks-storage.tsdb.head-chunks-write-buffer-size-bytes to fine-tune the TSDB head chunks write buffer size when running Cortex blocks storage. #3518
- [ENHANCEMENT] /metrics now supports OpenMetrics output. HTTP and gRPC servers metrics can now include exemplars. #3524
- [ENHANCEMENT] Expose gRPC keepalive policy options by gRPC server. #3524
- [ENHANCEMENT] Blocks storage: enabled caching of meta.json attributes, configurable via -blocks-storage.bucket-store.metadata-cache.metafile-attributes-ttl. #3528
- [ENHANCEMENT] Compactor: added a config validation check to fail fast if the compactor has been configured with invalid block range periods (each period is expected to be a multiple of the previous one). #3534
- [ENHANCEMENT] Blocks storage: concurrently fetch deletion marks from object storage. #3538
- [ENHANCEMENT] Blocks storage ingester: ingester can now close idle TSDB and delete local data. #3491 #3552
- [ENHANCEMENT] Blocks storage: add option to use V2 signatures for S3 authentication. #3540
- [ENHANCEMENT] Exported process metrics to monitor the number of memory map areas allocated. #3537
  - process_memory_map_areas
  - process_memory_map_areas_limit
- [ENHANCEMENT] Ruler: Expose gRPC client options. #3523
- [ENHANCEMENT] Compactor: added metrics to track on-going compaction. #3535
  - cortex_compactor_tenants_discovered
  - cortex_compactor_tenants_skipped
  - cortex_compactor_tenants_processing_succeeded
  - cortex_compactor_tenants_processing_failed
- [ENHANCEMENT] Added new experimental API endpoints: POST /purger/delete_tenant and GET /purger/delete_tenant_status for deleting all tenant data. Only works with blocks storage. Compactor removes blocks that belong to user marked for deletion. #3549 #3558
- [ENHANCEMENT] Chunks storage: add option to use V2 signatures for S3 authentication. #3560
- [BUGFIX] Blocks storage ingester: fixed some cases leading to a TSDB WAL corruption after a partial write to disk. #3423
- [BUGFIX] Blocks storage: Fix the race between ingestion and /flush call resulting in overlapping blocks. #3422
- [BUGFIX] Querier: fixed -querier.max-query-into-future which wasn't correctly enforced on range queries. #3452
- [BUGFIX] Fixed float64 precision stability when aggregating metrics before exposing them. This could have led to false counter resets when querying some metrics exposed by Cortex. #3506
- [BUGFIX] Querier: the meta.json sync concurrency done when running Cortex with the blocks storage is now controlled by -blocks-storage.bucket-store.meta-sync-concurrency instead of the incorrect -blocks-storage.bucket-store.block-sync-concurrency (default values are the same). #3531
- [BUGFIX] Querier: fixed initialization order of querier module when using blocks storage. It now (again) waits until blocks have been synchronized. #3551
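The Ruler gRPC limits in the changelog above (#3523) are given in raw bytes, which makes the size of the change hard to judge. The short sketch below converts them into binary units. The helper `describe_bytes` is hypothetical and only does arithmetic on the numbers quoted in the changelog.

```python
MIB = 1024 * 1024


def describe_bytes(count):
    """Format a byte count as whole MiB when it divides evenly, else as bytes."""
    if count % MIB == 0:
        return f"{count // MIB} MiB"
    return f"{count} bytes"


# Values copied from the changelog entry.
OUTGOING_OLD, OUTGOING_NEW = 2147483647, 16777216
INCOMING_OLD, INCOMING_NEW = 4194304, 104857600

# The old outgoing limit is the largest signed 32-bit integer (2**31 - 1),
# so it is not a whole number of MiB.
summary = {
    "outgoing": (describe_bytes(OUTGOING_OLD), describe_bytes(OUTGOING_NEW)),
    "incoming": (describe_bytes(INCOMING_OLD), describe_bytes(INCOMING_NEW)),
}
```

In other words, the outgoing limit drops from just under 2 GiB to 16 MiB, while the incoming limit rises from 4 MiB to 100 MiB.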
Blocksconvert
Cortex 1.5.0
Changelog
Cortex
- [CHANGE] Blocks storage: update the default HTTP configuration values for the S3 client to the upstream Thanos default values. #3244
  - -blocks-storage.s3.http.idle-conn-timeout is set to 90 seconds.
  - -blocks-storage.s3.http.response-header-timeout is set to 2 minutes.
- [CHANGE] Improved shuffle sharding support in the write path. This work introduced some config changes: #3090
  - Introduced -distributor.sharding-strategy CLI flag (and its respective sharding_strategy YAML config option) to explicitly specify which sharding strategy should be used in the write path
  - -experimental.distributor.user-subring-size flag renamed to -distributor.ingestion-tenant-shard-size
  - user_subring_size limit YAML config option renamed to ingestion_tenant_shard_size
- [CHANGE] Dropped "blank Alertmanager configuration; using fallback" message from Info to Debug level. #3205
- [CHANGE] Zone-awareness replication for time-series now should be explicitly enabled in the distributor via the -distributor.zone-awareness-enabled CLI flag (or its respective YAML config option). Before, zone-aware replication was implicitly enabled if a zone was set on ingesters. #3200
- [CHANGE] Removed the deprecated CLI flag -config-yaml. You should use -schema-config-file instead. #3225
- [CHANGE] Enforced the HTTP method required by some API endpoints which did (incorrectly) allow any method before that. #3228
  - GET /
  - GET /config
  - GET /debug/fgprof
  - GET /distributor/all_user_stats
  - GET /distributor/ha_tracker
  - GET /all_user_stats
  - GET /ha-tracker
  - GET /api/v1/user_stats
  - GET /api/v1/chunks
  - GET <legacy-http-prefix>/user_stats
  - GET <legacy-http-prefix>/chunks
  - GET /services
  - GET /multitenant_alertmanager/status
  - GET /status (alertmanager microservice)
  - GET|POST /ingester/ring
  - GET|POST /ring
  - GET|POST /store-gateway/ring
  - GET|POST /compactor/ring
  - GET|POST /ingester/flush
  - GET|POST /ingester/shutdown
  - GET|POST /flush
  - GET|POST /shutdown
  - GET|POST /ruler/ring
  - POST /api/v1/push
  - POST <legacy-http-prefix>/push
  - POST /push
  - POST /ingester/push
- [CHANGE] Renamed CLI flags to configure the network interface names from which the instance IP is automatically detected. #3295
  - -compactor.ring.instance-interface renamed to -compactor.ring.instance-interface-names
  - -store-gateway.sharding-ring.instance-interface renamed to -store-gateway.sharding-ring.instance-interface-names
  - -distributor.ring.instance-interface renamed to -distributor.ring.instance-interface-names
  - -ruler.ring.instance-interface renamed to -ruler.ring.instance-interface-names
- [CHANGE] Renamed -<prefix>.redis.enable-tls CLI flag to -<prefix>.redis.tls-enabled, and its respective YAML config option from enable_tls to tls_enabled. #3298
- [CHANGE] Increased default -<prefix>.redis.timeout from 100ms to 500ms. #3301
- [CHANGE] cortex_alertmanager_config_invalid has been removed in favor of cortex_alertmanager_config_last_reload_successful. #3289
- [CHANGE] Query-frontend: POST requests whose body size exceeds 10MiB will be rejected. The max body size can be customised via -frontend.max-body-size. #3276
- [FEATURE] Shuffle sharding: added support for shuffle-sharding queriers in the query-frontend. When configured (-frontend.max-queriers-per-tenant globally, or using per-tenant limit max_queriers_per_tenant), each tenant's requests will be handled by a different set of queriers. #3113 #3257
- [FEATURE] Shuffle sharding: added support for shuffle-sharding ingesters on the read path. When ingesters shuffle-sharding is enabled and -querier.shuffle-sharding-ingesters-lookback-period is set, queriers will fetch in-memory series from the minimum set of required ingesters, selecting only ingesters which may have received series since 'now - lookback period'. #3252
- [FEATURE] Query-frontend: added compression config to support results cache with compression. #3217
- [FEATURE] Add OpenStack Swift support to blocks storage. #3303
- [FEATURE] Added support for applying Prometheus relabel configs on series received by the distributor. A metric_relabel_configs field has been added to the per-tenant limits configuration. #3329
- [FEATURE] Support for Cassandra client SSL certificates. #3384
- [ENHANCEMENT] Ruler: Introduces two new limits -ruler.max-rules-per-rule-group and -ruler.max-rule-groups-per-tenant to control the number of rules per rule group and the total number of rule groups for a given user. They are disabled by default. #3366
- [ENHANCEMENT] Allow to specify multiple comma-separated Cortex services to -target CLI option (or its respective YAML config option). For example, -target=all,compactor can be used to start Cortex single-binary with compactor as well. #3275
- [ENHANCEMENT] Expose additional HTTP configs for the S3 backend client. New flags are listed below: #3244
  - -blocks-storage.s3.http.idle-conn-timeout
  - -blocks-storage.s3.http.response-header-timeout
  - -blocks-storage.s3.http.insecure-skip-verify
- [ENHANCEMENT] Added cortex_query_frontend_connected_clients metric to show the number of workers currently connected to the frontend. #3207
- [ENHANCEMENT] Shuffle sharding: improved shuffle sharding in the write path. Shuffle sharding now should be explicitly enabled via -distributor.sharding-strategy CLI flag (or its respective YAML config option) and guarantees stability, consistency, shuffling and balanced zone-awareness properties. #3090 #3214
- [ENHANCEMENT] Ingester: added new metric cortex_ingester_active_series to track active series more accurately. Also added options to control whether active series tracking is enabled (-ingester.active-series-enabled, defaults to false), how often this metric is updated (-ingester.active-series-update-period) and the max idle time for series to be considered inactive (-ingester.active-series-idle-timeout). #3153
- [ENHANCEMENT] Store-gateway: added zone-aware replication support to blocks replication in the store-gateway. #3200
- [ENHANCEMENT] Store-gateway: exported new metrics. #3231
  - cortex_bucket_store_cached_series_fetch_duration_seconds
  - cortex_bucket_store_cached_postings_fetch_duration_seconds
  - cortex_bucket_stores_gate_queries_max
- [ENHANCEMENT] Added -version flag to Cortex. #3233
- [ENHANCEMENT] Hash ring: added instance registered timestamp to the ring. #3248
- [ENHANCEMENT] Reduce tail latency by smoothing out spikes in rate of chunk flush operations. #3191
- [ENHANCEMENT] Use Cortex as User Agent in HTTP requests issued by Configs DB client. #3264
- [ENHANCEMENT] Experimental Ruler API: Fetch rule groups from object storage in parallel. #3218
- [ENHANCEMENT] Chunks GCS object storage client uses the fields selector to limit the payload size when listing objects in the bucket. #3218 #3292
- [ENHANCEMENT] Added shuffle sharding support to ruler. Added new metric cortex_ruler_sync_rules_total. #3235
- [ENHANCEMENT] Return an explicit error when the store-gateway is explicitly requested without a blocks storage engine. #3287
- [ENHANCEMENT] Ruler: only load rules that belong to the ruler. Improves rules syncing performance when ruler sharding is enabled. #3269
- [ENHANCEMENT] Added -<prefix>.redis.tls-insecure-skip-verify flag. #3298
- [ENHANCEMENT] Added cortex_alertmanager_config_last_reload_successful_seconds metric to show timestamp of last successful AM config reload. #3289
- [ENHANCEMENT] Blocks storage: reduced number of bucket listing operations to list block content (applies to newly created blocks only). #3363
- [ENHANCEMENT] Ruler: Include the tenant ID on the notifier logs. #3372
- [ENHANCEMENT] Blocks storage Compactor: Added -compactor.enabled-tenants and -compactor.disabled-tenants to explicitly enable or disable compaction of specific tenants. #3385
- [ENHANCEMENT] Blocks storage ingester: Creating checkpoint only once even when there are multiple Head compactions in a single Compact() call. #3373
- [BUGFIX] Blocks storage ingester: Read repair memory-mapped chunks file which can end up being empty on abrupt shutdowns combined with faulty disks. #3373
- [BUGFIX] Blocks storage ingester: Close TSDB resources on failed startup preventing ingester OOMing. #3373
- [BUGFIX] No-longer-needed ingester operations for queries triggered by queriers and rulers are now canceled. #3178
- [BUGFIX] Ruler: directories in the configured rules-path will be removed on startup and shutdown in order to ensure they don't persist between runs. #3195
- [BUGFIX] Handle hash-collisions in the query path. #3192
- [BUGFIX] Check for postgres rows errors. #3197
- [BUGFIX] Ruler Experimental API: Don't allow rule groups without names or empty rule groups. #3210
- [BUGFIX] Experimental Alertmanager API: Do not allow empty Alertmanager configurations or bad template filenames to be submitted through the configuration API. #3185
- [BUGFIX] Reduce failures to update heartbeat when using Consul. #3259
- [BUGFIX] When using ruler sharding, moving all user rule groups from ruler to a different one and then back could end up with some user groups not being evaluated at all. #3235
- [BUGFIX] Fixed shuffle sharding consistency when zone-awareness is enabled and the shard size is increased or instances in a new zone are added. #3299
- [BUGFIX] Use a valid grpc header when logging IP addresses. #3307
- [BUGFIX] Fixed the metric cortex_prometheus_rule_group_duration_seconds in the Ruler, which wouldn't report any values. #3310
- [BUGFIX] Fixed gRPC connections leaking in rulers when ruler sharding is enabled and APIs are called. #3314
- [BUGFIX] Fixed shuffle sharding consistency when zone-awareness is enabled and the shard size is increased or instances...
Cortex 1.5.0-rc.1
Cortex 1.5.0-rc.0
Changelog
Cortex
- [CHANGE] Blocks storage: update the default HTTP configuration values for the S3 client to the upstream Thanos default values. #3244
- -blocks-storage.s3.http.idle-conn-timeout is set to 90 seconds.
- -blocks-storage.s3.http.response-header-timeout is set to 2 minutes.
 
- [CHANGE] Improved shuffle sharding support in the write path. This work introduced some config changes: #3090
- Introduced -distributor.sharding-strategy CLI flag (and its respective sharding_strategy YAML config option) to explicitly specify which sharding strategy should be used in the write path
- -experimental.distributor.user-subring-size flag renamed to -distributor.ingestion-tenant-shard-size
- user_subring_size limit YAML config option renamed to ingestion_tenant_shard_size
 
- [CHANGE] Dropped "blank Alertmanager configuration; using fallback" message from Info to Debug level. #3205
- [CHANGE] Zone-awareness replication for time-series now should be explicitly enabled in the distributor via the -distributor.zone-awareness-enabled CLI flag (or its respective YAML config option). Before, zone-aware replication was implicitly enabled if a zone was set on ingesters. #3200
- [CHANGE] Removed the deprecated CLI flag -config-yaml. You should use -schema-config-file instead. #3225
- [CHANGE] Enforced the HTTP method required by some API endpoints which did (incorrectly) allow any method before that. #3228
- GET /
- GET /config
- GET /debug/fgprof
- GET /distributor/all_user_stats
- GET /distributor/ha_tracker
- GET /all_user_stats
- GET /ha-tracker
- GET /api/v1/user_stats
- GET /api/v1/chunks
- GET <legacy-http-prefix>/user_stats
- GET <legacy-http-prefix>/chunks
- GET /services
- GET /multitenant_alertmanager/status
- GET /status (alertmanager microservice)
- GET|POST /ingester/ring
- GET|POST /ring
- GET|POST /store-gateway/ring
- GET|POST /compactor/ring
- GET|POST /ingester/flush
- GET|POST /ingester/shutdown
- GET|POST /flush
- GET|POST /shutdown
- GET|POST /ruler/ring
- POST /api/v1/push
- POST <legacy-http-prefix>/push
- POST /push
- POST /ingester/push
 
- [CHANGE] Renamed the CLI flags that configure the network interface names from which the instance IP is automatically detected. #3295
- -compactor.ring.instance-interface renamed to -compactor.ring.instance-interface-names
- -store-gateway.sharding-ring.instance-interface renamed to -store-gateway.sharding-ring.instance-interface-names
- -distributor.ring.instance-interface renamed to -distributor.ring.instance-interface-names
- -ruler.ring.instance-interface renamed to -ruler.ring.instance-interface-names
 
- [CHANGE] Renamed -<prefix>.redis.enable-tls CLI flag to -<prefix>.redis.tls-enabled, and its respective YAML config option from enable_tls to tls_enabled. #3298
- [CHANGE] Increased default -<prefix>.redis.timeout from 100ms to 500ms. #3301
- [CHANGE] cortex_alertmanager_config_invalid has been removed in favor of cortex_alertmanager_config_last_reload_successful. #3289
- [CHANGE] Query-frontend: POST requests whose body size exceeds 10MiB will be rejected. The max body size can be customised via -frontend.max-body-size. #3276
- [FEATURE] Shuffle sharding: added support for shuffle-sharding queriers in the query-frontend. When configured (-frontend.max-queriers-per-tenant globally, or using per-tenant limit max_queriers_per_tenant), each tenant's requests will be handled by a different set of queriers. #3113 #3257
- [FEATURE] Shuffle sharding: added support for shuffle-sharding ingesters on the read path. When ingesters shuffle-sharding is enabled and -querier.shuffle-sharding-ingesters-lookback-period is set, queriers will fetch in-memory series from the minimum set of required ingesters, selecting only ingesters which may have received series since 'now - lookback period'. #3252
- [FEATURE] Query-frontend: added compression config to support results cache with compression. #3217
- [FEATURE] Add OpenStack Swift support to blocks storage. #3303
- [FEATURE] Added support for applying Prometheus relabel configs on series received by the distributor. A metric_relabel_configs field has been added to the per-tenant limits configuration. #3329
- [FEATURE] Support for Cassandra client SSL certificates. #3384
- [ENHANCEMENT] Ruler: Introduces two new limits -ruler.max-rules-per-rule-group and -ruler.max-rule-groups-per-tenant to control the number of rules per rule group and the total number of rule groups for a given user. They are disabled by default. #3366
- [ENHANCEMENT] Allow specifying multiple comma-separated Cortex services in the -target CLI option (or its respective YAML config option). For example, -target=all,compactor can be used to start Cortex single-binary with compactor as well. #3275
- [ENHANCEMENT] Expose additional HTTP configs for the S3 backend client. New flags are listed below: #3244
- -blocks-storage.s3.http.idle-conn-timeout
- -blocks-storage.s3.http.response-header-timeout
- -blocks-storage.s3.http.insecure-skip-verify
 
- [ENHANCEMENT] Added cortex_query_frontend_connected_clients metric to show the number of workers currently connected to the frontend. #3207
- [ENHANCEMENT] Shuffle sharding: improved shuffle sharding in the write path. Shuffle sharding now should be explicitly enabled via -distributor.sharding-strategy CLI flag (or its respective YAML config option) and guarantees stability, consistency, shuffling and balanced zone-awareness properties. #3090 #3214
- [ENHANCEMENT] Ingester: added new metric cortex_ingester_active_series to track active series more accurately. Also added options to control whether active series tracking is enabled (-ingester.active-series-enabled, defaults to false), how often this metric is updated (-ingester.active-series-update-period), and the max idle time for series to be considered inactive (-ingester.active-series-idle-timeout). #3153
- [ENHANCEMENT] Store-gateway: added zone-aware replication support to blocks replication in the store-gateway. #3200
- [ENHANCEMENT] Store-gateway: exported new metrics. #3231
- cortex_bucket_store_cached_series_fetch_duration_seconds
- cortex_bucket_store_cached_postings_fetch_duration_seconds
- cortex_bucket_stores_gate_queries_max
 
- [ENHANCEMENT] Added -version flag to Cortex. #3233
- [ENHANCEMENT] Hash ring: added instance registered timestamp to the ring. #3248
- [ENHANCEMENT] Reduce tail latency by smoothing out spikes in rate of chunk flush operations. #3191
- [ENHANCEMENT] Use Cortex as User Agent in HTTP requests issued by Configs DB client. #3264
- [ENHANCEMENT] Experimental Ruler API: Fetch rule groups from object storage in parallel. #3218
- [ENHANCEMENT] Chunks GCS object storage client uses the fields selector to limit the payload size when listing objects in the bucket. #3218 #3292
- [ENHANCEMENT] Added shuffle sharding support to ruler. Added new metric cortex_ruler_sync_rules_total. #3235
- [ENHANCEMENT] Return an explicit error when the store-gateway is explicitly requested without a blocks storage engine. #3287
- [ENHANCEMENT] Ruler: only load rules that belong to the ruler. Improves rule syncing performance when ruler sharding is enabled. #3269
- [ENHANCEMENT] Added -<prefix>.redis.tls-insecure-skip-verify flag. #3298
- [ENHANCEMENT] Added cortex_alertmanager_config_last_reload_successful_seconds metric to show timestamp of last successful AM config reload. #3289
- [ENHANCEMENT] Blocks storage: reduced number of bucket listing operations to list block content (applies to newly created blocks only). #3363
- [ENHANCEMENT] Ruler: Include the tenant ID on the notifier logs. #3372
- [ENHANCEMENT] Blocks storage Compactor: Added -compactor.enabled-tenants and -compactor.disabled-tenants to explicitly enable or disable compaction of specific tenants. #3385
- [ENHANCEMENT] Blocks storage ingester: Creating checkpoint only once even when there are multiple Head compactions in a single Compact() call. #3373
- [BUGFIX] Blocks storage ingester: Read repair memory-mapped chunks file which can end up being empty on abrupt shutdowns combined with faulty disks. #3373
- [BUGFIX] Blocks storage ingester: Close TSDB resources on failed startup preventing ingester OOMing. #3373
- [BUGFIX] No-longer-needed ingester operations for queries triggered by queriers and rulers are now canceled. #3178
- [BUGFIX] Ruler: directories in the configured rules-path will be removed on startup and shutdown in order to ensure they don't persist between runs. #3195
- [BUGFIX] Handle hash-collisions in the query path. #3192
- [BUGFIX] Check for postgres rows errors. #3197
- [BUGFIX] Ruler Experimental API: Don't allow rule groups without names or empty rule groups. #3210
- [BUGFIX] Experimental Alertmanager API: Do not allow empty Alertmanager configurations or bad template filenames to be submitted through the configuration API. #3185
- [BUGFIX] Reduce failures to update heartbeat when using Consul. #3259
- [BUGFIX] When using ruler sharding, moving all user rule groups from ruler to a different one and then back could end up with some user groups not being evaluated at all. #3235
- [BUGFIX] Fixed shuffle sharding consistency when zone-awareness is enabled and the shard size is increased or instances in a new zone are added. #3299
- [BUGFIX] Use a valid grpc header when logging IP addresses. #3307
- [BUGFIX] Fixed the metric cortex_prometheus_rule_group_duration_seconds in the Ruler, which wouldn't report any values. #3310
- [BUGFIX] Fixed gRPC connections leaking in rulers when ruler sharding is enabled and APIs are called. #3314
- [BUGFIX] Fixed shuffle sharding consistency when zone-awareness is enabled and the shard size is increased or instances...
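The querier shuffle sharding described above gives each tenant its own subset of queriers. The following is only an illustrative sketch of that idea, not the algorithm Cortex implements: the function name select_queriers, the SHA-256 seeding, and the rule that a limit of 0 means "use all queriers" are assumptions made for this example.

```python
import hashlib
import random


def select_queriers(tenant_id: str, queriers: list[str], max_queriers: int) -> list[str]:
    """Pick a stable, tenant-specific subset of queriers (hypothetical helper).

    Assumption for this sketch: a limit of 0, or a limit that is not smaller
    than the number of queriers, means every querier may serve the tenant.
    """
    if max_queriers <= 0 or max_queriers >= len(queriers):
        return sorted(queriers)
    # Seed a private RNG from the tenant ID so the same tenant always
    # gets the same subset for a given set of queriers.
    digest = hashlib.sha256(tenant_id.encode("utf-8")).digest()
    rng = random.Random(int.from_bytes(digest[:8], "big"))
    # Sort first so the result does not depend on the input order.
    return sorted(rng.sample(sorted(queriers), max_queriers))


if __name__ == "__main__":
    pool = [f"querier-{i}" for i in range(6)]
    for tenant in ("tenant-a", "tenant-b"):
        print(tenant, select_queriers(tenant, pool, 2))
```

Because the subset depends only on the tenant ID and the querier set, a tenant issuing heavy queries affects at most its own shard in this sketch.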
Cortex 1.4.0
This Cortex release features 112 contributions from 32 authors and exciting news!
Highlights
- Cortex blocks storage is now GA.
- Cassandra support for the chunks storage is now GA.
- Redis caching backend now supports Redis sentinel and Redis cluster too.
- Introduced shuffle sharding support to store-gateway blocks sharding (blocks storage).
- The ruler and alertmanager got several improvements.
- Last but not least, many enhancements, optimisations and bug fixes.
Please refer to the changelog for the full list of changes and improvements.
Changelog
- [CHANGE] Cassandra backend support is now GA (stable). #3180
- [CHANGE] Blocks storage is now GA (stable). The -experimental prefix has been removed from all CLI flags related to the blocks storage (no YAML config changes). #3180
- -experimental.blocks-storage.* flags renamed to -blocks-storage.*
- -experimental.store-gateway.* flags renamed to -store-gateway.*
- -experimental.querier.store-gateway-client.* flags renamed to -querier.store-gateway-client.*
- -experimental.querier.store-gateway-addresses flag renamed to -querier.store-gateway-addresses
- -store-gateway.replication-factor flag renamed to -store-gateway.sharding-ring.replication-factor
- -store-gateway.tokens-file-path flag renamed to -store-gateway.sharding-ring.tokens-file-path
 
- [CHANGE] Ingester: Removed deprecated untyped record from chunks WAL. Only if you are running v1.0 or below, it is recommended to first upgrade to v1.1/v1.2/v1.3 and run it for a day before upgrading to v1.4 to avoid data loss. #3115
- [CHANGE] Distributor API endpoints are no longer served unless target is set to distributor or all. #3112
- [CHANGE] Increase the default Cassandra client replication factor to 3. #3007
- [CHANGE] Blocks storage: removed the support to transfer blocks between ingesters on shutdown. When running the Cortex blocks storage, ingesters are expected to run with a persistent disk. The following metrics have been removed: #2996
- cortex_ingester_sent_files
- cortex_ingester_received_files
- cortex_ingester_received_bytes_total
- cortex_ingester_sent_bytes_total
 
- [CHANGE] The buckets for the cortex_chunk_store_index_lookups_per_query metric have been changed to 1, 2, 4, 8, 16. #3021
- [CHANGE] Blocks storage: the operation label value getrange has changed into get_range for the metrics thanos_store_bucket_cache_operation_requests_total and thanos_store_bucket_cache_operation_hits_total. #3000
- [CHANGE] Experimental Delete Series: /api/v1/admin/tsdb/delete_series and /api/v1/admin/tsdb/cancel_delete_request purger APIs now return status code 204 instead of 200 for success. #2946
- [CHANGE] Histogram cortex_memcache_request_duration_seconds method label value changes from Memcached.Get to Memcached.GetBatched for batched lookups, and is not reported for non-batched lookups (label value Memcached.GetMulti remains, and had exactly the same value as Get in non-batched lookups). The same change applies to tracing spans. #3046
- [CHANGE] TLS server validation is now enabled by default; a new parameter tls_insecure_skip_verify can be set to true to skip validation optionally. #3030
- [CHANGE] cortex_ruler_config_update_failures_total has been removed in favor of cortex_ruler_config_last_reload_successful. #3056
- [CHANGE] ruler.evaluation_delay_duration field in YAML config has been moved and renamed to limits.ruler_evaluation_delay_duration. #3098
- [CHANGE] Removed obsolete results_cache.max_freshness from YAML config (deprecated since Cortex 1.2). #3145
- [CHANGE] Removed obsolete -promql.lookback-delta option (deprecated since Cortex 1.2, replaced with -querier.lookback-delta). #3144
- [CHANGE] Cache: added support for Redis Cluster and Redis Sentinel. #2961
- The following changes have been made in Redis configuration:
- -redis.master_name added
- -redis.db added
- -redis.max-active-conns changed to -redis.pool-size
- -redis.max-conn-lifetime changed to -redis.max-connection-age
- -redis.max-idle-conns removed
- -redis.wait-on-pool-exhaustion removed
 
- [CHANGE] TLS configuration for gRPC, HTTP and etcd clients is now marked as experimental. These features are not yet fully baked, and we expect possible small breaking changes in Cortex 1.5. #3198
- [CHANGE] Fixed store-gateway CLI flags inconsistencies. #3201
- -store-gateway.replication-factor flag renamed to -store-gateway.sharding-ring.replication-factor
- -store-gateway.tokens-file-path flag renamed to -store-gateway.sharding-ring.tokens-file-path
 
- [FEATURE] Logging of the source IP passed along by a reverse proxy is now supported by setting -server.log-source-ips-enabled. For non-standard headers, the settings -server.log-source-ips-header and -server.log-source-ips-regex can be used. #2985
- [FEATURE] Blocks storage: added shuffle sharding support to store-gateway blocks sharding. Added the following additional metrics to store-gateway: #3069
- cortex_bucket_stores_tenants_discovered
- cortex_bucket_stores_tenants_synced
 
- [FEATURE] Experimental blocksconvert: introduce an experimental tool blocksconvert to migrate long-term storage chunks to blocks. #3092 #3122 #3127 #3162
- [ENHANCEMENT] Add support for azure storage in China, German and US Government environments. #2988
- [ENHANCEMENT] Query-tee: added a small tolerance to floating point sample values comparison. #2994
- [ENHANCEMENT] Query-tee: add support for doing a passthrough of requests to preferred backend for unregistered routes #3018
- [ENHANCEMENT] Expose storage.aws.dynamodb.backoff_config configuration file field. #3026
- [ENHANCEMENT] Added cortex_request_message_bytes and cortex_response_message_bytes histograms to track received and sent gRPC message and HTTP request/response sizes. Added cortex_inflight_requests gauge to track number of inflight gRPC and HTTP requests. #3064
- [ENHANCEMENT] Publish ruler's ring metrics. #3074
- [ENHANCEMENT] Add config validation to the experimental Alertmanager API. Invalid configs are no longer accepted. #3053
- [ENHANCEMENT] Add "integration" as a label for cortex_alertmanager_notifications_total and cortex_alertmanager_notifications_failed_total metrics. #3056
- [ENHANCEMENT] Add cortex_ruler_config_last_reload_successful and cortex_ruler_config_last_reload_successful_seconds to check status of users rule manager. #3056
- [ENHANCEMENT] The configuration validation now fails if an empty YAML node has been set for a root YAML config property. #3080
- [ENHANCEMENT] Memcached dial() calls now have a circuit-breaker to avoid hammering a broken cache. #3051, #3189
- [ENHANCEMENT] -ruler.evaluation-delay-duration is now overridable as a per-tenant limit, ruler_evaluation_delay_duration. #3098
- [ENHANCEMENT] Add TLS support to etcd client. #3102
- [ENHANCEMENT] When a tenant accesses the Alertmanager UI or its API, if we have valid -alertmanager.configs.fallback we'll use that to start the manager and avoid failing the request. #3073
- [ENHANCEMENT] Add DELETE api/v1/rules/{namespace} to the Ruler. It allows all the rule groups of a namespace to be deleted. #3120
- [ENHANCEMENT] Experimental Delete Series: Retry processing of Delete requests during failures. #2926
- [ENHANCEMENT] Improve performance of QueryStream() in ingesters. #3177
- [ENHANCEMENT] Modules included in "All" target are now visible in output of -modules CLI flag. #3155
- [ENHANCEMENT] Added /debug/fgprof endpoint to debug running Cortex process using fgprof. This adds up to the existing /debug/... endpoints. #3131
- [ENHANCEMENT] Blocks storage: optimised /api/v1/series for blocks storage. (#2976)
- [BUGFIX] Ruler: when loading rules from "local" storage, check for directory after resolving symlink. #3137
- [BUGFIX] Query-frontend: Fixed rounding for incoming query timestamps, to be 100% Prometheus compatible. #2990
- [BUGFIX] Querier: Merge results from chunks and blocks ingesters when using streaming of results. #3013
- [BUGFIX] Querier: query /series from ingesters regardless of the -querier.query-ingesters-within setting. #3035
- [BUGFIX] Blocks storage: Ingester is less likely to hit gRPC message size limit when streaming data to queriers. #3015
- [BUGFIX] Blocks storage: fixed memberlist support for the store-gateways and compactors ring used when blocks sharding is enabled. #3058 #3095
- [BUGFIX] Fix configuration for TLS server validation, TLS skip verify was hardcoded to true for all TLS configurations and prevented validation of server certificates. #3030
- [BUGFIX] Fixes the Alertmanager panicking when no -alertmanager.web.external-url is provided. #3017
- [BUGFIX] Fixes the registration of the Alertmanager API metrics cortex_alertmanager_alerts_received_total and cortex_alertmanager_alerts_invalid_total. #3065
- [BUGFIX] Fixes the "flag needs an argument: -config.expand-env" error. #3087
- [BUGFIX] An index optimisation actually slows things down when using caching. Moved it to the right location. #2973
- [BUGFIX] Ingester: If push request contained both valid and invalid samples, valid samples were ingested but not stored to WAL of the chunks storage. This has been fixed. #3067
- [BUGFIX] Cassandra: fixed consistency setting in the CQL session when creating the keyspace. #3105
- [BUGFIX] Ruler: Config API would return both the record and alert in YAML response keys even when one of them must be empty. #3120
- [BUGFIX] Index page now uses configured HTTP path prefix when creating links. #3126
- [BUGFIX] Purger: fixed deadlock when reloading of tombstones failed. #3182
- [BU...
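The circuit-breaker added around Memcached dial() calls in this release follows a general pattern that can be sketched as below. This is a minimal illustration under stated assumptions, not the Cortex implementation: the class name CircuitBreaker, the max_failures and reset_after parameters, and their default values are all hypothetical.

```python
import time


class CircuitBreaker:
    """Stop calling a failing dependency for a while after repeated errors."""

    def __init__(self, max_failures=3, reset_after=10.0, clock=time.monotonic):
        self.max_failures = max_failures  # consecutive failures before opening
        self.reset_after = reset_after    # seconds to wait before a trial call
        self._clock = clock
        self._failures = 0
        self._opened_at = None            # None means the circuit is closed

    def call(self, fn):
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self.reset_after:
                # Circuit is open: fail fast without touching the dependency.
                raise RuntimeError("circuit open: skipping call")
            # Cool-down elapsed: allow one trial call (half-open state).
            self._opened_at = None
            self._failures = self.max_failures - 1
        try:
            result = fn()
        except Exception:
            self._failures += 1
            if self._failures >= self.max_failures:
                self._opened_at = self._clock()
            raise
        # Any success closes the circuit and clears the failure count.
        self._failures = 0
        return result
```

A caller would wrap each dial attempt in breaker.call(...), so a broken cache server is skipped quickly during the cool-down period rather than being retried on every request.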
Cortex 1.4.0-rc.1
This is the second release candidate for Cortex 1.4.0.
Changelog
- [CHANGE] TLS configuration for gRPC, HTTP and etcd clients is now marked as experimental. These features are not yet fully baked, and we expect possible small breaking changes in Cortex 1.5. #3198
- [CHANGE] Fixed store-gateway CLI flags inconsistencies. #3201
- -store-gateway.replication-factor flag renamed to -store-gateway.sharding-ring.replication-factor
- -store-gateway.tokens-file-path flag renamed to -store-gateway.sharding-ring.tokens-file-path
 
- [BUGFIX] Handle hash-collisions in the query path. Before this fix, Cortex could occasionally mix up two different series in a query, leading to invalid results, when -querier.ingester-streaming was used. #3192
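The hash-collision fix above concerns series that share a hash but have different labels. The sketch below shows one common way to guard against this, by comparing the full label set inside each hash bucket. It is not taken from the Cortex code base: SeriesIndex and its hash_fn parameter are hypothetical names introduced for illustration.

```python
from collections import defaultdict


class SeriesIndex:
    """Group samples by series, tolerating label-set hash collisions."""

    def __init__(self, hash_fn=hash):
        self._hash_fn = hash_fn
        # hash value -> list of (label tuple, samples) sharing that hash
        self._buckets = defaultdict(list)

    def add(self, labels, sample):
        key = tuple(sorted(labels.items()))
        bucket = self._buckets[self._hash_fn(key)]
        for existing, samples in bucket:
            # Compare the full label set, not just the hash.
            if existing == key:
                samples.append(sample)
                return
        # Same hash but different labels: keep it as a separate series.
        bucket.append((key, [sample]))

    def series(self):
        return {
            key: samples
            for bucket in self._buckets.values()
            for key, samples in bucket
        }
```

Passing a deliberately weak hash_fn, such as one that always returns 0, forces every series into the same bucket and is an easy way to exercise the collision path.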
Cortex 1.4.0-rc.0
This Cortex release features 112 contributions from 32 authors and exciting news!
Highlights
- Cortex blocks storage is now GA.
- Cassandra support for the chunks storage is now GA.
- Redis caching backend now supports Redis sentinel and Redis cluster too.
- Introduced shuffle sharding support to store-gateway blocks sharding (blocks storage).
- The ruler and alertmanager got several improvements.
- Last but not least, many enhancements, optimisations and bug fixes.
Please refer to the changelog for the full list of changes and improvements.
Changelog
- [CHANGE] Cassandra backend support is now GA (stable). #3180
- [CHANGE] Blocks storage is now GA (stable). The -experimental prefix has been removed from all CLI flags related to the blocks storage (no YAML config changes). #3180
- -experimental.blocks-storage.* flags renamed to -blocks-storage.*
- -experimental.store-gateway.* flags renamed to -store-gateway.*
- -experimental.querier.store-gateway-client.* flags renamed to -querier.store-gateway-client.*
- -experimental.querier.store-gateway-addresses flag renamed to -querier.store-gateway-addresses
 
- [CHANGE] Ingester: Removed deprecated untyped record from chunks WAL. Only if you are running v1.0 or below, it is recommended to first upgrade to v1.1/v1.2/v1.3 and run it for a day before upgrading to v1.4 to avoid data loss. #3115
- [CHANGE] Distributor API endpoints are no longer served unless target is set to distributor or all. #3112
- [CHANGE] Increase the default Cassandra client replication factor to 3. #3007
- [CHANGE] Blocks storage: removed the support to transfer blocks between ingesters on shutdown. When running the Cortex blocks storage, ingesters are expected to run with a persistent disk. The following metrics have been removed: #2996
- cortex_ingester_sent_files
- cortex_ingester_received_files
- cortex_ingester_received_bytes_total
- cortex_ingester_sent_bytes_total
 
- [CHANGE] The buckets for the cortex_chunk_store_index_lookups_per_query metric have been changed to 1, 2, 4, 8, 16. #3021
- [CHANGE] Blocks storage: the operation label value getrange has changed into get_range for the metrics thanos_store_bucket_cache_operation_requests_total and thanos_store_bucket_cache_operation_hits_total. #3000
- [CHANGE] Experimental Delete Series: /api/v1/admin/tsdb/delete_series and /api/v1/admin/tsdb/cancel_delete_request purger APIs now return status code 204 instead of 200 for success. #2946
- [CHANGE] Histogram cortex_memcache_request_duration_seconds method label value changes from Memcached.Get to Memcached.GetBatched for batched lookups, and is not reported for non-batched lookups (label value Memcached.GetMulti remains, and had exactly the same value as Get in non-batched lookups). The same change applies to tracing spans. #3046
- [CHANGE] TLS server validation is now enabled by default; a new parameter tls_insecure_skip_verify can be set to true to skip validation optionally. #3030
- [CHANGE] cortex_ruler_config_update_failures_total has been removed in favor of cortex_ruler_config_last_reload_successful. #3056
- [CHANGE] ruler.evaluation_delay_duration field in YAML config has been moved and renamed to limits.ruler_evaluation_delay_duration. #3098
- [CHANGE] Removed obsolete results_cache.max_freshness from YAML config (deprecated since Cortex 1.2). #3145
- [CHANGE] Removed obsolete -promql.lookback-delta option (deprecated since Cortex 1.2, replaced with -querier.lookback-delta). #3144
- [CHANGE] Cache: added support for Redis Cluster and Redis Sentinel. #2961
- The following changes have been made in Redis configuration:
- -redis.master_name added
- -redis.db added
- -redis.max-active-conns changed to -redis.pool-size
- -redis.max-conn-lifetime changed to -redis.max-connection-age
- -redis.max-idle-conns removed
- -redis.wait-on-pool-exhaustion removed
 
- [FEATURE] Logging of the source IP passed along by a reverse proxy is now supported by setting -server.log-source-ips-enabled. For non-standard headers, the settings -server.log-source-ips-header and -server.log-source-ips-regex can be used. #2985
- [FEATURE] Blocks storage: added shuffle sharding support to store-gateway blocks sharding. Added the following additional metrics to store-gateway: #3069
- cortex_bucket_stores_tenants_discovered
- cortex_bucket_stores_tenants_synced
 
- [FEATURE] Experimental blocksconvert: introduce an experimental tool blocksconvert to migrate long-term storage chunks to blocks. #3092 #3122 #3127 #3162
- [ENHANCEMENT] Add support for azure storage in China, German and US Government environments. #2988
- [ENHANCEMENT] Query-tee: added a small tolerance to floating point sample values comparison. #2994
- [ENHANCEMENT] Query-tee: add support for doing a passthrough of requests to preferred backend for unregistered routes #3018
- [ENHANCEMENT] Expose storage.aws.dynamodb.backoff_config configuration file field. #3026
- [ENHANCEMENT] Added cortex_request_message_bytes and cortex_response_message_bytes histograms to track received and sent gRPC message and HTTP request/response sizes. Added cortex_inflight_requests gauge to track number of inflight gRPC and HTTP requests. #3064
- [ENHANCEMENT] Publish ruler's ring metrics. #3074
- [ENHANCEMENT] Add config validation to the experimental Alertmanager API. Invalid configs are no longer accepted. #3053
- [ENHANCEMENT] Add "integration" as a label for cortex_alertmanager_notifications_total and cortex_alertmanager_notifications_failed_total metrics. #3056
- [ENHANCEMENT] Add cortex_ruler_config_last_reload_successful and cortex_ruler_config_last_reload_successful_seconds to check status of users rule manager. #3056
- [ENHANCEMENT] The configuration validation now fails if an empty YAML node has been set for a root YAML config property. #3080
- [ENHANCEMENT] Memcached dial() calls now have a circuit-breaker to avoid hammering a broken cache. #3051, #3189
- [ENHANCEMENT] -ruler.evaluation-delay-duration is now overridable as a per-tenant limit, ruler_evaluation_delay_duration. #3098
- [ENHANCEMENT] Add TLS support to etcd client. #3102
- [ENHANCEMENT] When a tenant accesses the Alertmanager UI or its API, if we have valid -alertmanager.configs.fallback we'll use that to start the manager and avoid failing the request. #3073
- [ENHANCEMENT] Add DELETE api/v1/rules/{namespace} to the Ruler. It allows all the rule groups of a namespace to be deleted. #3120
- [ENHANCEMENT] Experimental Delete Series: Retry processing of Delete requests during failures. #2926
- [ENHANCEMENT] Improve performance of QueryStream() in ingesters. #3177
- [ENHANCEMENT] Modules included in "All" target are now visible in output of -modules CLI flag. #3155
- [ENHANCEMENT] Added /debug/fgprof endpoint to debug running Cortex process using fgprof. This adds up to the existing /debug/... endpoints. #3131
- [ENHANCEMENT] Blocks storage: optimised /api/v1/series for blocks storage. (#2976)
- [BUGFIX] Ruler: when loading rules from "local" storage, check for directory after resolving symlink. #3137
- [BUGFIX] Query-frontend: Fixed rounding for incoming query timestamps, to be 100% Prometheus compatible. #2990
- [BUGFIX] Querier: Merge results from chunks and blocks ingesters when using streaming of results. #3013
- [BUGFIX] Querier: query /series from ingesters regardless of the -querier.query-ingesters-within setting. #3035
- [BUGFIX] Blocks storage: Ingester is less likely to hit gRPC message size limit when streaming data to queriers. #3015
- [BUGFIX] Blocks storage: fixed memberlist support for the store-gateways and compactors ring used when blocks sharding is enabled. #3058 #3095
- [BUGFIX] Fix configuration for TLS server validation, TLS skip verify was hardcoded to true for all TLS configurations and prevented validation of server certificates. #3030
- [BUGFIX] Fixes the Alertmanager panicking when no -alertmanager.web.external-url is provided. #3017
- [BUGFIX] Fixes the registration of the Alertmanager API metrics cortex_alertmanager_alerts_received_total and cortex_alertmanager_alerts_invalid_total. #3065
- [BUGFIX] Fixes the "flag needs an argument: -config.expand-env" error. #3087
- [BUGFIX] An index optimisation actually slows things down when using caching. Moved it to the right location. #2973
- [BUGFIX] Ingester: If push request contained both valid and invalid samples, valid samples were ingested but not stored to WAL of the chunks storage. This has been fixed. #3067
- [BUGFIX] Cassandra: fixed consistency setting in the CQL session when creating the keyspace. #3105
- [BUGFIX] Ruler: Config API would return both the record and alert in YAML response keys even when one of them must be empty. #3120
- [BUGFIX] Index page now uses configured HTTP path prefix when creating links. #3126
- [BUGFIX] Purger: fixed deadlock when reloading of tombstones failed. #3182
- [BUGFIX] Fixed panic in flusher job, when error writing chunks to the store would cause "idle" chunks to be flushed, which triggered panic. #3140
- [BUGFIX] Index page no longer shows links that are not valid for running Cortex instance. #3133
- [BUGFIX] Configs: prevent validation of templates to fail when using template functions. #3157
- [BUGFIX] Configuring the S3 URL with an @ but without username and password doesn't enable the AWS static credentials anymore. #3170
- [BUGFIX] Limit errors on ranged queries (api/v1/query_range) no longer return a status code 500 but 422 instead. #3167
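The query-tee enhancement above mentions a small tolerance when comparing floating point sample values. A comparison of that kind might look like the sketch below. The function name samples_equal and the 1e-6 default are assumptions for this example and are not taken from query-tee itself.

```python
import math


def samples_equal(a: float, b: float, tolerance: float = 1e-6) -> bool:
    """Compare two sample values, ignoring tiny floating point differences."""
    # NaN never compares equal to itself, so treat two NaNs as a match.
    if math.isnan(a) and math.isnan(b):
        return True
    # Relative tolerance covers large values, absolute covers values near zero.
    return math.isclose(a, b, rel_tol=tolerance, abs_tol=tolerance)


if __name__ == "__main__":
    print(samples_equal(0.1 + 0.2, 0.3))
    print(samples_equal(1.0, 1.1))
```

Without a tolerance, two backends that sum the same samples in a different order could be reported as returning different results.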