# RFC: ACL-Aware Routing Strategy for Shard Assignment #18829

@atris

Is your feature request related to a problem? Please describe


Motivation

In multi-tenant or access-controlled environments, documents often carry metadata indicating which users or groups can access them. For example, Box, Google Drive, or enterprise DMS platforms assign ACL groups or tenant identifiers to every document.

Currently, OpenSearch does not offer a built-in way to derive routing from such ACL or tenant metadata; unless the client supplies a routing value itself, documents are routed by ID. This leads to:

  • Related documents (same group/tenant) being scattered across shards
  • Poor locality for filtered ACL queries
  • Redundant I/O and cache misses for group-level operations

To address this, we propose a routing strategy based on acl_group, tenant_id, or other access scope fields, integrated via ingest and search pipelines.

Describe the solution you'd like


Proposal

Introduce two new processors:

  1. AclRoutingProcessor (ingest)
  2. AclRoutingSearchProcessor (search)

Each processor will extract an ACL-related field from the document or query, normalize it, and assign a stable _routing value (ingest) or SearchRequest.routing() value (search) to ensure group-based locality.


Use Cases

  • Multi-tenant SaaS platforms
  • Group-based document access control
  • Enterprise folder or team-based indexing
  • ACL-filtered search acceleration

Configuration

Ingest Pipeline Processor

{
  "acl_routing": {
    "acl_field": "acl_group",         // Required
    "skip_missing": false,            // Optional: skip or fail on missing field
    "override_existing": true         // Optional: overwrite existing _routing
  }
}

Search Pipeline Processor

{
  "acl_routing_search": {
    "acl_field": "acl_group",         // Required
    "source": "query",                // "query", "params", or "metadata"
    "override_existing": true
  }
}

Routing Logic

  1. Extract the value of the configured acl_field (e.g. "engineering_team")
  2. Normalize: trim, lowercase, sanitize if needed
  3. Compute the routing key:
    • Use the normalized string as-is or a hashed version of it
    • The hash strategy is optional
  4. Set _routing (ingest) or SearchRequest.routing() (search)
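
The steps above could be sketched as follows. This is a minimal illustration, not the proposed implementation: the class name AclRoutingKey, the choice of SHA-256, and the truncation to 16 hex characters are assumptions made for the example, since the RFC leaves the hash strategy open.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Locale;

// Hypothetical helper illustrating steps 2 and 3 (normalize, then derive a routing key).
final class AclRoutingKey {
    private AclRoutingKey() {}

    // Step 2: trim and lowercase with a fixed locale so the result does not depend on the JVM default.
    static String normalize(String raw) {
        return raw.trim().toLowerCase(Locale.ROOT);
    }

    // Step 3: return the normalized string as-is, or a hex-encoded SHA-256 prefix of it.
    static String routingKey(String raw, boolean hashed) {
        String normalized = normalize(raw);
        if (!hashed) {
            return normalized;
        }
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            byte[] bytes = digest.digest(normalized.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (int i = 0; i < 8; i++) { // first 8 bytes -> 16 hex characters (assumed length)
                hex.append(String.format("%02x", bytes[i]));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is mandatory on every Java platform, so this is not expected.
            throw new IllegalStateException(e);
        }
    }
}
```

Because normalization runs before hashing, "Engineering_Team" and "  engineering_team  " yield the same key in both modes, which is the stability property step 4 relies on.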

Implementation Plan

Ingest Processor: AclRoutingProcessor

  • Class: AclRoutingProcessor in org.opensearch.ingest.common
  • Base class: extends AbstractProcessor
  • Logic:
    • execute(IngestDocument) sets _routing from acl_field
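
A minimal sketch of the execute logic, assuming the option semantics suggested by the Configuration section. It uses a plain Map as a stand-in for IngestDocument so that it compiles without OpenSearch on the classpath; the class name AclRoutingSketch is hypothetical, and a real processor would extend AbstractProcessor and read and write the document through IngestDocument instead.

```java
import java.util.Locale;
import java.util.Map;

// Hypothetical stand-in for AclRoutingProcessor; a Map plays the role of IngestDocument.
final class AclRoutingSketch {
    private final String aclField;
    private final boolean skipMissing;
    private final boolean overrideExisting;

    AclRoutingSketch(String aclField, boolean skipMissing, boolean overrideExisting) {
        this.aclField = aclField;
        this.skipMissing = skipMissing;
        this.overrideExisting = overrideExisting;
    }

    // Mirrors execute(IngestDocument): sets _routing from the ACL field when applicable.
    Map<String, Object> execute(Map<String, Object> doc) {
        Object value = doc.get(aclField);
        if (value == null) {
            if (skipMissing) {
                return doc; // skip_missing=true: leave the document untouched
            }
            throw new IllegalArgumentException("field [" + aclField + "] is missing");
        }
        if (doc.containsKey("_routing") && !overrideExisting) {
            return doc; // override_existing=false: keep the routing the client supplied
        }
        // Normalization here is just trim + lowercase; sanitizing and hashing are omitted.
        doc.put("_routing", value.toString().trim().toLowerCase(Locale.ROOT));
        return doc;
    }
}
```

The ordering is a design choice in this sketch: the missing-field check runs before the override check, so a document without the ACL field fails (or is skipped) even if it already carries a routing value.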

Search Processor: AclRoutingSearchProcessor

  • Class: AclRoutingSearchProcessor in org.opensearch.search.pipeline
  • Interface: SearchRequestProcessor
  • Logic:
    • Reads the configured acl_field (e.g. acl_group) from the query, params, or metadata, as selected by source
    • Sets SearchRequest.routing()
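
How the routing value might be resolved on the search side is sketched below, again with nested Maps standing in for the parsed request so that the code runs on a bare JDK. The class AclSearchRoutingSketch and its methods are hypothetical; only the "params" and "query" sources are covered, because the RFC does not yet define what "metadata" refers to. A real processor would pass the result to SearchRequest.routing().

```java
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch: resolves a routing value for a search request from one of two sources.
final class AclSearchRoutingSketch {
    private AclSearchRoutingSketch() {}

    // source = "params": look the field up directly in the request parameters.
    // source = "query":  look for a term clause on the ACL field anywhere in the query body.
    static String resolve(String aclField, String source,
                          Map<String, Object> query, Map<String, String> params) {
        Object value;
        switch (source) {
            case "params":
                value = params.get(aclField);
                break;
            case "query":
                value = findTerm(query, aclField);
                break;
            default:
                throw new IllegalArgumentException("unsupported source [" + source + "]");
        }
        return value == null ? null : value.toString().trim().toLowerCase(Locale.ROOT);
    }

    // Depth-first walk over nested maps and lists, returning the first matching term value.
    private static Object findTerm(Object node, String aclField) {
        if (node instanceof Map) {
            Map<?, ?> map = (Map<?, ?>) node;
            Object term = map.get("term");
            if (term instanceof Map) {
                Object v = ((Map<?, ?>) term).get(aclField);
                if (v instanceof Map) {
                    v = ((Map<?, ?>) v).get("value"); // long form: {"term": {"f": {"value": "x"}}}
                }
                if (v != null) {
                    return v;
                }
            }
            for (Object child : map.values()) {
                Object found = findTerm(child, aclField);
                if (found != null) {
                    return found;
                }
            }
        } else if (node instanceof Iterable) {
            for (Object child : (Iterable<?>) node) {
                Object found = findTerm(child, aclField);
                if (found != null) {
                    return found;
                }
            }
        }
        return null;
    }
}
```

The walk above does not distinguish clause types. An actual implementation would likely need to restrict itself to filter or must clauses, since routing on a term found under should or must_not would send the search to a single shard and could drop valid hits.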

Registration

  • IngestCommonPlugin.getProcessors()
  • SearchPipelinePlugin.getRequestProcessors()

Testing Plan

Unit Tests

  • Field extraction
  • Missing/invalid field handling
  • Override vs preserve logic
  • Hash consistency

Integration Tests

  • Indexing documents with ACL groups
  • Searching with correct routing applied
  • Shard distribution and query fan-out verification
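
The fan-out check could be prototyped outside a cluster along the following lines. This is only a simulation: OpenSearch derives the target shard from a Murmur3 hash of the routing value, whereas this sketch substitutes String.hashCode() so that it runs on a bare JDK. The class ShardFanOutSketch and the shard counts passed to it are hypothetical.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical simulation of routing-based shard selection; not the OpenSearch hash function.
final class ShardFanOutSketch {
    private ShardFanOutSketch() {}

    // Maps a routing key to a shard number in [0, numShards).
    static int shardFor(String routing, int numShards) {
        return Math.floorMod(routing.hashCode(), numShards);
    }

    // Number of distinct shards touched when documents are routed by the given keys.
    static int fanOut(List<String> routingKeys, int numShards) {
        Set<Integer> shards = new HashSet<>();
        for (String key : routingKeys) {
            shards.add(shardFor(key, numShards));
        }
        return shards.size();
    }
}
```

An integration test would assert the same property against a real index: documents sharing one ACL group land on a single shard, while documents routed by distinct per-document keys spread across several.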

Example Pipeline Usage

Ingest Pipeline

PUT _ingest/pipeline/acl-routing
{
  "processors": [
    {
      "acl_routing": {
        "acl_field": "acl_group"
      }
    }
  ]
}

Search Pipeline

PUT _search/pipeline/acl-routing-search
{
  "processors": [
    {
      "acl_routing_search": {
        "acl_field": "acl_group",
        "source": "params"
      }
    }
  ]
}

Benefits

  • Ensures tenant/group-based locality in shard assignment
  • Reduces fan-out and improves filtered query performance
  • Promotes access-aware data partitioning
  • Pairs well with hierarchical routing or composite strategies

Future Extensions

  • Composite routing using both folder_path and acl_group
  • Static ACL group → shard maps for stronger locality
  • Shard explainability via _routing_explain API

Backward Compatibility

  • No impact unless processor is explicitly enabled
  • Fully opt-in, no core routing changes required

Summary

ACL-aware routing helps colocate data for shared access groups, improves cache efficiency, and minimizes shard fan-out. This RFC proposes a clean, modular implementation via ingest and search pipeline processors, providing a scalable path to ACL-locality without touching core routing logic.

Related component

Search

Metadata

Labels: Search, ShardManagement:Routing, discuss, enhancement, untriaged

Status: Done