Description
Is your feature request related to a problem? Please describe
Motivation
In multi-tenant or access-controlled environments, documents often carry metadata indicating which users or groups can access them. For example, Box, Google Drive, or enterprise DMS platforms assign ACL groups or tenant identifiers to every document.
Currently, OpenSearch does not offer a built-in way to route documents based on such ACL or tenant metadata. This leads to:
- Related documents (same group/tenant) being scattered across shards
- Poor locality for filtered ACL queries
- Redundant I/O and cache misses for group-level operations
To address this, we propose a routing strategy based on acl_group, tenant_id, or other access scope fields, integrated via ingest and search pipelines.
Describe the solution you'd like
Proposal
Introduce two new processors:
- AclRoutingProcessor (ingest)
- AclRoutingSearchProcessor (search)
Each processor will extract an ACL-related field from the document or query, normalize it, and assign a stable _routing value (ingest) or SearchRequest.routing() value (search) to ensure group-based locality.
Use Cases
- Multi-tenant SaaS platforms
- Group-based document access control
- Enterprise folder or team-based indexing
- ACL-filtered search acceleration
Configuration
Ingest Pipeline Processor
{
  "acl_routing": {
    "acl_field": "acl_group",      // Required
    "skip_missing": false,         // Optional: skip or fail on missing field
    "override_existing": true      // Optional: overwrite existing _routing
  }
}
Search Pipeline Processor
{
  "acl_routing_search": {
    "acl_field": "acl_group",      // Required
    "source": "query",             // "query", "params", or "metadata"
    "override_existing": true
  }
}
Routing Logic
- Extract the value of the configured acl_field (e.g. "engineering_team")
- Normalize: trim, lowercase, sanitize if needed
- Compute routing key:
  - Use raw string or hashed version
  - Optional hash strategy
- Set _routing (ingest) or SearchRequest.routing() (search)
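The steps above could be sketched roughly as follows. This is only an illustration: the class name AclRoutingKeySketch, the choice of SHA-256, and the truncation to 16 hex characters are assumptions made for the example, not part of the proposal, and the "sanitize" step is left out because the RFC does not define it.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Locale;

// Hypothetical helper, not an existing OpenSearch class.
final class AclRoutingKeySketch {

    /** Trims and lowercases the raw ACL value; returns null for null or blank input. */
    public static String normalize(String raw) {
        if (raw == null) {
            return null;
        }
        String s = raw.trim().toLowerCase(Locale.ROOT);
        return s.isEmpty() ? null : s;
    }

    /** Hex of the first 8 bytes of SHA-256 (assumed hash strategy, 16 hex chars). */
    public static String hashed(String normalized) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            byte[] digest = md.digest(normalized.getBytes(StandardCharsets.UTF_8));
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < 8; i++) {
                sb.append(String.format("%02x", digest[i] & 0xff));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 must be available on every conforming Java platform.
            throw new IllegalStateException(e);
        }
    }

    /** Returns the raw normalized value or its hash; null if nothing usable was found. */
    public static String routingKey(String raw, boolean useHash) {
        String normalized = normalize(raw);
        if (normalized == null) {
            return null;
        }
        return useHash ? hashed(normalized) : normalized;
    }
}
```

With this sketch, " Engineering_Team " and "engineering_team" produce the same key in both modes, which is the stability property the routing value needs. Hashing mainly helps when group names are long or contain characters that are awkward in a routing parameter.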
Implementation Plan
Ingest Processor: AclRoutingProcessor
- Class: AclRoutingProcessor in org.opensearch.ingest.common
- Interface: extends AbstractProcessor
- Logic: execute(IngestDocument) sets _routing from acl_field
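To make the intended behaviour of skip_missing and override_existing concrete, here is a minimal sketch of the ingest logic. It deliberately models the document as a plain Map so it runs without OpenSearch on the classpath; a real processor would extend AbstractProcessor and work on an IngestDocument instead. The class name AclRoutingIngestSketch and the order of the checks (missing field first, then existing routing) are assumptions.

```java
import java.util.Locale;
import java.util.Map;

// Hypothetical stand-in for the proposed AclRoutingProcessor logic.
final class AclRoutingIngestSketch {
    private final String aclField;
    private final boolean skipMissing;
    private final boolean overrideExisting;

    public AclRoutingIngestSketch(String aclField, boolean skipMissing, boolean overrideExisting) {
        this.aclField = aclField;
        this.skipMissing = skipMissing;
        this.overrideExisting = overrideExisting;
    }

    /** Sets "_routing" on the document from the ACL field and returns the same map. */
    public Map<String, Object> execute(Map<String, Object> doc) {
        Object raw = doc.get(aclField);
        if (raw == null || raw.toString().trim().isEmpty()) {
            if (skipMissing) {
                return doc; // leave the document untouched
            }
            throw new IllegalArgumentException("field [" + aclField + "] is missing or empty");
        }
        if (doc.containsKey("_routing") && !overrideExisting) {
            return doc; // preserve the routing value that was already set
        }
        doc.put("_routing", raw.toString().trim().toLowerCase(Locale.ROOT));
        return doc;
    }
}
```

For a document with acl_group set to " Engineering_Team ", this sketch sets _routing to "engineering_team". With override_existing set to false, a routing value supplied by the client is kept.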
Search Processor: AclRoutingSearchProcessor
- Class: AclRoutingSearchProcessor in org.opensearch.search.pipeline
- Interface: SearchRequestProcessor
- Logic:
  - Reads acl_group from query or params
  - Sets SearchRequest.routing()
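For the "query" source, one possible way to find the ACL value is to walk the query and look for a term clause on the configured field. The sketch below does this over a query body that has already been parsed into nested Map and List objects, purely so it is runnable on its own. A real processor would presumably inspect the QueryBuilder tree of the SearchRequest instead. The class name AclQueryExtractorSketch and the decision to ignore must_not clauses are assumptions.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical helper; operates on parsed JSON, not on OpenSearch QueryBuilder objects.
final class AclQueryExtractorSketch {

    /** Depth-first search for {"term": {aclField: value}}, skipping must_not branches. */
    public static Optional<String> findTermValue(Object node, String aclField) {
        if (node instanceof Map) {
            Map<?, ?> map = (Map<?, ?>) node;
            Object term = map.get("term");
            if (term instanceof Map) {
                Object value = ((Map<?, ?>) term).get(aclField);
                if (value instanceof Map) {
                    // Long form: {"term": {"acl_group": {"value": "..."}}}
                    value = ((Map<?, ?>) value).get("value");
                }
                if (value != null) {
                    return Optional.of(value.toString());
                }
            }
            for (Map.Entry<?, ?> entry : map.entrySet()) {
                if ("must_not".equals(entry.getKey())) {
                    continue; // a negated ACL clause must not drive routing
                }
                Optional<String> found = findTermValue(entry.getValue(), aclField);
                if (found.isPresent()) {
                    return found;
                }
            }
        } else if (node instanceof List) {
            for (Object child : (List<?>) node) {
                Optional<String> found = findTermValue(child, aclField);
                if (found.isPresent()) {
                    return found;
                }
            }
        }
        return Optional.empty();
    }
}
```

When a value is found, it would be normalized the same way as at ingest time and passed to SearchRequest.routing(); when nothing is found, the request should be left unrouted so that it still fans out to all shards. This sketch returns only the first match and does not handle terms queries with several groups, which the real processor would need to decide on.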
Registration
- IngestCommonPlugin.getProcessors()
- SearchPipelinePlugin.getRequestProcessors()
Testing Plan
Unit Tests
- Field extraction
- Missing/invalid field handling
- Override vs preserve logic
- Hash consistency
Integration Tests
- Indexing documents with ACL groups
- Searching with correct routing applied
- Shard distribution and query fan-out verification
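As a rough illustration of what the shard distribution and fan-out checks are meant to show, the sketch below counts how many shards a set of routing values touches. It assumes the simplified model in which the shard is the hash of the routing value modulo the number of primary shards, and it uses String.hashCode() as a stand-in hash, which is NOT the function OpenSearch uses internally. The class and method names are hypothetical.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model of routing-based shard selection, for reasoning about fan-out only.
final class ShardFanOutSketch {

    /** Stand-in for the engine's routing hash; not the real hash function. */
    public static int shardFor(String routingValue, int numShards) {
        return Math.floorMod(routingValue.hashCode(), numShards);
    }

    /** Number of distinct shards touched by the given routing values. */
    public static int fanOut(List<String> routingValues, int numShards) {
        Set<Integer> shards = new HashSet<>();
        for (String routing : routingValues) {
            shards.add(shardFor(routing, numShards));
        }
        return shards.size();
    }
}
```

In this model, any number of documents routed by the same acl_group value land on exactly one shard, while documents routed by distinct per-document IDs spread across shards. An actual integration test would verify the same property against a running cluster rather than against this model.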
Example Pipeline Usage
Ingest Pipeline
PUT _ingest/pipeline/acl-routing
{
  "processors": [
    {
      "acl_routing": {
        "acl_field": "acl_group"
      }
    }
  ]
}
Search Pipeline
PUT _search/pipeline/acl-routing-search
{
  "processors": [
    {
      "acl_routing_search": {
        "acl_field": "acl_group",
        "source": "params"
      }
    }
  ]
}
Benefits
- Ensures tenant/group-based locality in shard assignment
- Reduces fan-out and improves filtered query performance
- Promotes access-aware data partitioning
- Pairs well with hierarchical routing or composite strategies
Future Extensions
- Composite routing using both folder_path and acl_group
- Static ACL group → shard maps for stronger locality
- Shard explainability via _routing_explain API
Backward Compatibility
- No impact unless processor is explicitly enabled
- Fully opt-in, no core routing changes required
Summary
ACL-aware routing helps colocate data for shared access groups, improves cache efficiency, and minimizes shard fan-out. This RFC proposes a clean, modular implementation via ingest and search pipeline processors, providing a scalable path to ACL-locality without touching core routing logic.
Related component
Search
Describe alternatives you've considered
No response
Additional context
No response