Retention and Ruler not working with Azure Service Principal #18677

@tobiabocchi

Bug Description

I'm running Loki with Azure Blob Storage as the object store and a Service Principal for authentication. Retention is not being applied: Loki does not delete old data from Azure Blob Storage.

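The logs show an authentication error from the ruler ("unable to list rules", ServiceCode=AuthenticationFailed, HTTP 403) immediately after the "initiating cleanup of obsolete entries" message.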

This error appears even though ruler.enabled is explicitly set to false in loki-values.yaml. When the ruler is enabled, the same error occurs against the loki-ruler container instead of loki-chunks. This suggests that ruler.enabled: false does not fully disable the ruler and that Loki attempts to list rules regardless.

Furthermore, when the ruler is enabled, the generated config.yaml does not render use_service_principal: true under the ruler block. This leads me to suspect that Loki defaults to connection string auth for the ruler's blob access, which fails with the Service Principal setup.
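To make the missing-key observation easy to check, here is a minimal Python sketch that scans a rendered config for every `azure:` block and reports those that do not set `use_service_principal: true`. It is a naive indentation-based scan (standard library only, not a real YAML parser), and the `SAMPLE` text is a hypothetical illustration of the shape described above, not the actual config.yaml produced by the chart.

```python
import re

# Matches "key: value" or "key:" lines of a plain block mapping.
KEY_RE = re.compile(r"^(\s*)([A-Za-z0-9_\-]+):(.*)$")


def find_azure_blocks(config_text):
    """Return {dotted_path: {key: value}} for every `azure:` mapping.

    Naive scan: handles nested block mappings only, and treats any '#'
    as the start of a comment (so '#' inside quoted values is not supported).
    """
    stack = []   # (indent, key) for the mapping keys currently open
    blocks = {}
    for raw in config_text.splitlines():
        line = raw.split("#", 1)[0].rstrip()
        m = KEY_RE.match(line)
        if not m:
            continue
        indent = len(m.group(1))
        key, value = m.group(2), m.group(3).strip()
        # Close every mapping that is not a parent of this line.
        while stack and stack[-1][0] >= indent:
            stack.pop()
        parent_path = ".".join(k for _, k in stack)
        if key == "azure" and value == "":
            path = f"{parent_path}.azure" if parent_path else "azure"
            blocks[path] = {}
        elif stack and stack[-1][1] == "azure":
            blocks[parent_path][key] = value
        stack.append((indent, key))
    return blocks


def missing_service_principal(config_text):
    """List the azure blocks that do not set use_service_principal: true."""
    return sorted(
        path
        for path, keys in find_azure_blocks(config_text).items()
        if keys.get("use_service_principal") != "true"
    )


# Hypothetical rendered config, for illustration only.
SAMPLE = """\
storage_config:
  azure:
    account_name: example
    use_service_principal: true
ruler:
  storage:
    type: azure
    azure:
      account_name: example
"""

if __name__ == "__main__":
    print(missing_service_principal(SAMPLE))
```

Running the same function against the real rendered config (for example, the content of the generated config.yaml read from a file) would show whether the ruler's `azure` block is the only one lacking the flag.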

Reproduce the Bug

Steps to reproduce the behavior:

  1. Start Loki 3.5.3 (latest at the time of writing) with Helm, using the loki-values.yaml shown below, which specifies Azure as the storage backend and a Service Principal for auth.

  2. Inspect the logs of any of the loki-backend pods after a few minutes to find the following error:

    level=info ts=2025-07-31T08:22:41.607553027Z caller=memberlist_client.go:552 msg="initiating cleanup of obsolete entries"
    level=error ts=2025-07-31T08:22:44.473077958Z caller=ruler.go:576 msg="unable to list rules" err="-> github.com/Azure/azure-storage-blob-go/azblob.newStorageError, /src/loki/vendor/github.com/Azure/azure-storage-blob-go/azblob/zc_storage_error.go:42\n===== RESPONSE ERROR (ServiceCode=AuthenticationFailed) =====\nDescription=Server failed to authenticate the request. Make sure the value of Authorization header is formed correctly including the signature.\nRequestId:REDACTED\nTime:2025-07-31T08:22:44.3926300Z, Details: \n   AuthenticationErrorDetail: The MAC signature found in the HTTP request 'REDACTED' is not the same as any computed signature. Server used following string to sign: 'GET\n\n\n\n\n\n\n\n\n\n\n\nx-ms-client-request-id:REDACTED\nx-ms-date:Thu, 31 Jul 2025 08:22:44 GMT\nx-ms-version:2020-04-08\n/REDACTED/loki-chunks\ncomp:list\ndelimiter:\nprefix:rules/\nrestype:container\ntimeout:31'.\n   Code: AuthenticationFailed\n   GET https://REDACTED.blob.core.windows.net/loki-chunks?comp=list&delimiter=&prefix=rules%2F&restype=container&timeout=31\n   Authorization: REDACTED\n   User-Agent: [Azure-Storage/0.14 (go1.24.5; linux)]\n   X-Ms-Client-Request-Id: [REDACTED]\n   X-Ms-Date: [Thu, 31 Jul 2025 08:22:44 GMT]\n   X-Ms-Version: [2020-04-08]\n   --------------------------------------------------------------------------------\n   RESPONSE Status: 403 Server failed to authenticate the request. Make sure the value of Authorization header is formed correctly including the signature.\n   Content-Length: [781]\n   Content-Type: [application/xml]\n   Date: [Thu, 31 Jul 2025 08:22:44 GMT]\n   Server: [Microsoft-HTTPAPI/2.0]\n   X-Ms-Error-Code: [AuthenticationFailed]\n   X-Ms-Request-Id: [REDACTED]\n\n\n"
    
  3. Logs are not deleted from Azure Blob Storage after the configured retention period of 72h (3 days) has elapsed.
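The error line above is long, so the following Python sketch pulls out the failing request to show which container and prefix the ruler tried to list. It assumes the logfmt convention seen in the log (newlines inside the quoted `err` value appear as a literal `\n`), and the `LINE` sample is an abbreviated copy of the log entry above, with `...` standing in for the omitted parts.

```python
import re
from urllib.parse import urlsplit, parse_qs

# First "<METHOD> https://..." occurrence inside the error text.
REQUEST_RE = re.compile(r"\b(GET|PUT|HEAD|DELETE|POST) (https://\S+)")


def describe_request(log_line):
    """Extract method, container, prefix and operation from a Loki error line."""
    # Turn the escaped newlines back into real ones so \S+ stops at the URL end.
    text = log_line.replace("\\n", "\n")
    m = REQUEST_RE.search(text)
    if m is None:
        return None
    url = urlsplit(m.group(2))
    query = parse_qs(url.query, keep_blank_values=True)
    return {
        "method": m.group(1),
        "container": url.path.lstrip("/").split("/", 1)[0],
        "prefix": query.get("prefix", [""])[0],
        "operation": query.get("comp", [""])[0],
    }


# Abbreviated copy of the log entry from step 2 ("..." marks omitted text).
LINE = (
    r'level=error caller=ruler.go:576 msg="unable to list rules" '
    r'err="... Code: AuthenticationFailed\n   '
    r'GET https://REDACTED.blob.core.windows.net/loki-chunks'
    r'?comp=list&delimiter=&prefix=rules%2F&restype=container&timeout=31\n'
    r'   Authorization: REDACTED"'
)

if __name__ == "__main__":
    print(describe_request(LINE))
```

For the entry above this reports a `list` operation on the `loki-chunks` container with prefix `rules/`, which matches the observation that the ruler is still listing rules while disabled.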

Expected behavior

I see two problems right now:

  1. The logs should be deleted after the specified retention period and the ruler should not run if disabled in values.yaml.
  2. Since Service Principal is listed as one of the supported auth methods, it should be possible to enable the ruler and use an Azure Service Principal for its storage access.

Environment

  • Infrastructure: Kubernetes
  • Deployment tool: helm

Config

Here's my loki-values.yaml for reference (REDACTED where necessary):

deploymentMode: SimpleScalable

ruler:
  enabled: false

loki:
  auth_enabled: false
  schemaConfig:
    configs:
      - from: "2024-04-01"
        store: tsdb
        object_store: azure
        schema: v13
        index:
          prefix: loki_index_
          period: 24h
  storage_config:
    azure:
      use_service_principal: true
      tenant_id: "REDACTED"
      client_id: "REDACTED"
      client_secret: "REDACTED"
  ingester:
    chunk_encoding: snappy
  querier:
    max_concurrent: 4
  pattern_ingester:
    enabled: true
  limits_config:
    allow_structured_metadata: true
    volume_enabled: true
  storage:
    type: azure
    azure:
      accountName: "REDACTED"
    bucketNames:
      chunks: "loki-chunks"
      ruler: "loki-ruler" # this field is required with Loki 3.5.3 (latest) even though ruler is disabled
  limits_config:
    retention_period: 72h
