@atris atris commented Jul 25, 2025

Introduces two new processors to enable document routing based on ACL metadata:

  • AclRoutingProcessor (ingest pipeline):
    • Extracts the ACL field value and generates a deterministic routing value using MurmurHash3
    • Configurable options: acl_field, target_field (default: _routing),
      ignore_missing, override_existing
    • Ensures documents with the same ACL value are colocated on the same shard
  • AclRoutingSearchProcessor (search pipeline):
    • Automatically extracts ACL values from term/terms/bool queries
    • Sets routing on search requests to target specific shards
    • Supports nested bool query traversal (must/filter/should clauses)
    • Configurable extraction with the extract_from_query flag
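The query-side extraction described in the list above can be sketched roughly as follows. This is an illustration only, not the code from this PR: the `Query`, `Term`, `Terms`, and `Bool` classes are a hypothetical, simplified stand-in for OpenSearch's query builders, and `extractAclValues` is an invented helper name. It walks term, terms, and nested bool queries and collects the values matched against the ACL field.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class AclQueryExtractionSketch {

    // Hypothetical, simplified query model standing in for OpenSearch query builders.
    interface Query {}

    static final class Term implements Query {
        final String field;
        final String value;
        Term(String field, String value) { this.field = field; this.value = value; }
    }

    static final class Terms implements Query {
        final String field;
        final List<String> values;
        Terms(String field, List<String> values) { this.field = field; this.values = values; }
    }

    static final class Bool implements Query {
        final List<Query> must = new ArrayList<>();
        final List<Query> filter = new ArrayList<>();
        final List<Query> should = new ArrayList<>();
    }

    // Returns every value matched against aclField, in first-seen order.
    static Set<String> extractAclValues(Query query, String aclField) {
        Set<String> found = new LinkedHashSet<>();
        collect(query, aclField, found);
        return found;
    }

    private static void collect(Query query, String aclField, Set<String> found) {
        if (query instanceof Term) {
            Term term = (Term) query;
            if (term.field.equals(aclField)) {
                found.add(term.value);
            }
        } else if (query instanceof Terms) {
            Terms terms = (Terms) query;
            if (terms.field.equals(aclField)) {
                found.addAll(terms.values);
            }
        } else if (query instanceof Bool) {
            Bool bool = (Bool) query;
            // Recurse into the clause types named in the description.
            for (Query clause : bool.must) collect(clause, aclField, found);
            for (Query clause : bool.filter) collect(clause, aclField, found);
            for (Query clause : bool.should) collect(clause, aclField, found);
        }
    }
}
```

Each extracted value would then be turned into a routing value and set on the search request. Queries on other fields are ignored, so a request with no ACL clause yields an empty set and would presumably be left unrouted.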

Implementation details:

  • Both processors use identical MurmurHash3.hash128() with Base64 encoding
    for consistent routing value generation
  • Registered in IngestCommonModulePlugin and SearchPipelineCommonModulePlugin
  • Comprehensive unit tests and integration tests for both processors
  • Follows existing processor patterns (similar to HierarchicalRoutingProcessor)
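To illustrate the "hash, then Base64-encode" step, here is a minimal, self-contained sketch. It is not the PR's implementation: SHA-256 from the JDK is swapped in for MurmurHash3.hash128() so the example runs without OpenSearch on the classpath, and the helper name `routingFor`, the truncation to 16 bytes, and the URL-safe unpadded Base64 variant are all assumptions made for the example.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import java.util.Base64;

public class AclRoutingValueSketch {

    // Hypothetical helper: derives a routing string from an ACL value.
    // SHA-256 is a stand-in here; the PR itself uses MurmurHash3.hash128().
    static String routingFor(String aclValue) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            byte[] hash = digest.digest(aclValue.getBytes(StandardCharsets.UTF_8));
            // Keep 16 bytes so the value is 128 bits wide, like a hash128 result.
            byte[] truncated = Arrays.copyOf(hash, 16);
            // Assumed encoding variant: URL-safe Base64 without padding.
            return Base64.getUrlEncoder().withoutPadding().encodeToString(truncated);
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 must be available on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // The same ACL value always maps to the same routing value.
        System.out.println(routingFor("team-alpha").equals(routingFor("team-alpha")));
    }
}
```

The property that matters is that the ingest and search sides call exactly the same function. If they diverged in hash, seed, or encoding, a search would be routed to a shard that does not hold the tenant's documents.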

Use case: Improves query performance in multi-tenant environments by ensuring
tenant-specific documents are colocated and queries are routed to relevant shards.
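Why a shared routing value leads to colocation can be shown with a toy model. The sketch below assumes the shard is chosen as a hash of the routing value modulo the number of shards. `String.hashCode()` is only a stand-in for the cluster's internal routing hash, and `shardFor` and `place` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class ShardColocationSketch {

    // Stand-in for the cluster's routing hash; the real hash function differs.
    static int shardFor(String routing, int numberOfShards) {
        return Math.floorMod(routing.hashCode(), numberOfShards);
    }

    // Groups document ids by shard, using each document's tenant as its routing value.
    static Map<Integer, List<String>> place(Map<String, String> docIdToTenant, int numberOfShards) {
        Map<Integer, List<String>> byShard = new TreeMap<>();
        for (Map.Entry<String, String> entry : docIdToTenant.entrySet()) {
            int shard = shardFor(entry.getValue(), numberOfShards);
            byShard.computeIfAbsent(shard, k -> new ArrayList<>()).add(entry.getKey());
        }
        return byShard;
    }
}
```

Because every document of a tenant carries the same routing value, all of them land on one shard, and a search routed with that value only needs to visit that shard instead of fanning out to all of them.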

Resolves #18829

Signed-off-by: [Atri Sharma] [email protected]

atris added 2 commits July 25, 2025 12:32
 Shard Placement

Actually add the files

Signed-off-by: Atri Sharma <[email protected]>
@atris atris requested a review from a team as a code owner July 25, 2025 18:30
@github-actions github-actions bot added the enhancement and Search labels Jul 25, 2025

atris commented Jul 25, 2025

@msfroh

Signed-off-by: Atri Sharma <[email protected]>

❌ Gradle check result for 3a85104: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@atris atris closed this Jul 28, 2025
@atris atris reopened this Jul 28, 2025

✅ Gradle check result for 3a85104: SUCCESS

codecov bot commented Jul 28, 2025

Codecov Report

❌ Patch coverage is 95.06173% with 4 lines in your changes missing coverage. Please review.
✅ Project coverage is 72.78%. Comparing base (c1dfa6a) to head (684f3cd).
⚠️ Report is 20 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| .../opensearch/ingest/common/AclRoutingProcessor.java | 91.42% | 1 Missing and 2 partials ⚠️ |
| ...rch/pipeline/common/AclRoutingSearchProcessor.java | 97.77% | 0 Missing and 1 partial ⚠️ |
Additional details and impacted files
@@             Coverage Diff              @@
##               main   #18834      +/-   ##
============================================
+ Coverage     72.75%   72.78%   +0.02%     
- Complexity    68520    68567      +47     
============================================
  Files          5570     5574       +4     
  Lines        314998   315254     +256     
  Branches      45697    45754      +57     
============================================
+ Hits         229185   229449     +264     
+ Misses        67260    67205      -55     
- Partials      18553    18600      +47     

Signed-off-by: Atri Sharma <[email protected]>

❌ Gradle check result for fb10eca: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Signed-off-by: Atri Sharma <[email protected]>

❌ Gradle check result for 684f3cd: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

atris commented Jul 30, 2025

Flaky tests #14509

@atris atris closed this Jul 30, 2025
@atris atris reopened this Jul 30, 2025
@github-actions github-actions bot added the discuss and ShardManagement:Routing labels Jul 30, 2025

✅ Gradle check result for 684f3cd: SUCCESS

@msfroh msfroh left a comment

Looks good, @atris!

I added a comment on the search processor just to capture my thoughts on the query visitor. The conclusion is that your implementation is good, but I figure if I had to think about it, I should write down those thoughts. (I'll resolve my own comment before merging.)

@msfroh msfroh merged commit 82b57a8 into opensearch-project:main Aug 1, 2025
58 of 60 checks passed
@github-project-automation github-project-automation bot moved this from 👀 In review to ✅ Done in Shard Management Project Board Aug 1, 2025
sunqijun1 pushed a commit to sunqijun1/OpenSearch that referenced this pull request Aug 4, 2025
…pensearch-project#18834)

Signed-off-by: Atri Sharma <[email protected]>
Signed-off-by: sunqijun.jun <[email protected]>
tandonks pushed a commit to tandonks/OpenSearch that referenced this pull request Aug 5, 2025
…pensearch-project#18834)

Signed-off-by: Atri Sharma <[email protected]>
vinaykpud pushed a commit to vinaykpud/OpenSearch that referenced this pull request Sep 26, 2025
…pensearch-project#18834)

Signed-off-by: Atri Sharma <[email protected]>
Successfully merging this pull request may close these issues.

# RFC: ACL-Aware Routing Strategy for Shard Assignment