@DarshitChanpura DarshitChanpura commented Oct 13, 2025

Description

Implements resource-access-control for workflow and workflow_state.

Related Issues

Check List

  • New functionality includes testing.
  • New functionality has been documented.
  • API changes companion pull request created.
  • Commits are signed per the DCO using --signoff.
  • Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@DarshitChanpura

CI will resolve once: opensearch-project/security#5677 is merged.

@DarshitChanpura DarshitChanpura marked this pull request as ready for review October 14, 2025 15:50
@DarshitChanpura DarshitChanpura force-pushed the resource-permissions branch 2 times, most recently from dace39b to 69009e6 Compare October 14, 2025 22:58
@DarshitChanpura

CI blocked by #1252

@owaiskazi19 owaiskazi19 left a comment

1st iteration

@owaiskazi19 owaiskazi19 left a comment

2nd iteration

@dbwiddis dbwiddis force-pushed the resource-permissions branch 2 times, most recently from 61b0a77 to 20fa46b Compare November 5, 2025 22:48
codecov bot commented Nov 7, 2025

Codecov Report

❌ Patch coverage is 70.12987% with 46 lines in your changes missing coverage. Please review.
✅ Project coverage is 77.46%. Comparing base (60c3da9) to head (f65076d).
⚠️ Report is 1 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...a/org/opensearch/flowframework/model/Template.java | 50.00% | 8 Missing and 2 partials ⚠️ |
| .../opensearch/flowframework/model/WorkflowState.java | 52.38% | 8 Missing and 2 partials ⚠️ |
| .../transport/DeprovisionWorkflowTransportAction.java | 57.89% | 8 Missing ⚠️ |
| .../org/opensearch/flowframework/util/ParseUtils.java | 30.00% | 3 Missing and 4 partials ⚠️ |
| .../opensearch/flowframework/FlowFrameworkPlugin.java | 25.00% | 3 Missing ⚠️ |
| ...owframework/transport/GetWorkflowStateRequest.java | 0.00% | 3 Missing ⚠️ |
| ...flowframework/transport/handler/SearchHandler.java | 25.00% | 1 Missing and 2 partials ⚠️ |
| ...framework/indices/FlowFrameworkIndicesHandler.java | 0.00% | 1 Missing ⚠️ |
| ...ework/transport/CreateWorkflowTransportAction.java | 85.71% | 1 Missing ⚠️ |
Additional details and impacted files
@@             Coverage Diff              @@
##               main    #1251      +/-   ##
============================================
- Coverage     77.84%   77.46%   -0.39%     
- Complexity     1223     1260      +37     
============================================
  Files           103      106       +3     
  Lines          5778     5902     +124     
  Branches        599      612      +13     
============================================
+ Hits           4498     4572      +74     
- Misses          992     1034      +42     
- Partials        288      296       +8     


@dbwiddis dbwiddis force-pushed the resource-permissions branch 2 times, most recently from 811a01f to 3f61633 Compare November 10, 2025 19:46
@dbwiddis dbwiddis left a comment

@DarshitChanpura I've tried to fix most issues I've found but don't understand this one.

The FlowFrameworkSecureRestAPI test fails after deleting a workflow (line 404): it expects the status call to throw a NOT_FOUND exception, but none is thrown.

        // Invoke delete API
        response = deleteWorkflow(aliceClient, workflowId);
        assertEquals(RestStatus.OK, TestHelpers.restStatus(response));
        // Invoke status API with failure
        ResponseException exception = expectThrows(ResponseException.class, () -> getWorkflowStatus(aliceClient, workflowId, false));
        assertEquals(RestStatus.NOT_FOUND.getStatus(), exception.getResponse().getStatusLine().getStatusCode());

It may be that something in your resource sharing is preventing the deletion (while the delete call still returns OK?).

Please investigate.

If it helps, Q tells me:

I found the issue! Looking at the DeleteWorkflowTransportAction, I can see that the delete operation is using the sdkClient.deleteDataObjectAsync() method to delete from the global context index, but there's a problem with how the resource authorization is being handled.

The issue is in the verifyResourceAccessAndProcessRequest call. When resource authorization is enabled (which happens when the FlowFrameworkResourceSharingExtension is active), it calls the first lambda () -> executeDeleteRequest(request, tenantId, listener, context) directly without going through the user permission checks.

However, the problem is that when resource authorization is enabled, the workflow might still be accessible through the resource sharing system even after it's deleted from the main index. The getWorkflowStatus call in the test is likely going through the resource sharing system, which might still have a reference to the workflow.

The issue is that the delete operation doesn't properly clean up the workflow from the resource sharing system. When verifyResourceAccessAndProcessRequest is called with resource authorization enabled, it should ensure that the workflow is also removed from the resource sharing system.

Looking at the code, the delete operation only deletes from the GLOBAL_CONTEXT_INDEX using the sdkClient, but it doesn't explicitly remove the workflow from the resource sharing authorization system.

The fix would be to ensure that when a workflow is deleted and resource authorization is enabled, the workflow is also removed from the resource sharing system's authorization records. This might require calling additional cleanup methods on the resource sharing client to remove the workflow's authorization entries.

The specific issue is that the resource sharing system maintains its own records of which resources exist and who has access to them, and these records aren't being cleaned up when the workflow is deleted from the main index.

@DarshitChanpura

@dbwiddis I was able to reproduce the failure and traced the root cause to the response being returned early, without waiting for the async steps to complete:
https://github.com/opensearch-project/flow-framework/blob/main/src/main/java/org/opensearch/flowframework/transport/DeprovisionWorkflowTransportAction.java#L359

i have consistently been able to see success for the test run after addressing the fix.
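
The failure mode described above, a response returned before asynchronous cleanup has finished, can be illustrated with a minimal, self-contained sketch. It does not reproduce the code in DeprovisionWorkflowTransportAction or the fix in this PR; the class `AsyncDeleteSketch`, the in-memory `stateDocs` set, and the methods `respondEarly` and `respondAfterCleanup` are all hypothetical stand-ins.

```java
import java.util.Set;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.TimeUnit;

public class AsyncDeleteSketch {
    // Hypothetical stand-in for an index of state docs, keyed by workflow ID.
    static final Set<String> stateDocs = ConcurrentHashMap.newKeySet();

    // Simulated asynchronous cleanup step that takes a little while to land.
    static CompletableFuture<Void> deleteStateAsync(String id) {
        return CompletableFuture.runAsync(() -> {
            try {
                TimeUnit.MILLISECONDS.sleep(100);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            stateDocs.remove(id);
        });
    }

    // Racy: the response completes immediately while cleanup is still in flight,
    // so a caller that checks right away may still see the state doc.
    static CompletableFuture<String> respondEarly(String id) {
        deleteStateAsync(id);
        return CompletableFuture.completedFuture("OK");
    }

    // Chained: the response completes only after the cleanup step has finished.
    static CompletableFuture<String> respondAfterCleanup(String id) {
        return deleteStateAsync(id).thenApply(ignored -> "OK");
    }

    public static void main(String[] args) {
        stateDocs.add("wf-1");
        respondEarly("wf-1").join();
        System.out.println("early: state still present = " + stateDocs.contains("wf-1"));

        stateDocs.add("wf-2");
        respondAfterCleanup("wf-2").join();
        System.out.println("chained: state still present = " + stateDocs.contains("wf-2"));
    }
}
```

In the chained variant the caller cannot observe the "OK" before the state is gone, which is the property the failing test relies on.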

@dbwiddis

i have consistently been able to see success for the test run after addressing the fix.

Nice... but the unit tests are now failing on a class cast. Check the argument index wherever you've added parameters to a method; I fixed a few earlier tests where the action listener was arg 1 and has moved to arg 3.
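
A rough sketch of why a stale argument index surfaces as a class cast: test stubs commonly fetch the listener from the call arguments by position, so when a signature gains parameters the old index points at a different type. The example below mimics that with a plain `Object[]` rather than a mocking library; `ArgumentIndexSketch`, `invokeListenerAt`, and the four-argument layout are hypothetical and not taken from the flow-framework tests.

```java
import java.util.function.Consumer;

public class ArgumentIndexSketch {
    // Hypothetical signature change:
    //   old: delete(String id, Consumer<String> listener)
    //   new: delete(String id, String tenantId, boolean clearStatus, Consumer<String> listener)

    // Mimics a test stub that pulls the listener out of the call arguments by index.
    @SuppressWarnings("unchecked")
    static String invokeListenerAt(int index, Object... callArgs) {
        // With a stale index this cast throws ClassCastException.
        Consumer<String> listener = (Consumer<String>) callArgs[index];
        listener.accept("deleted");
        return "ok";
    }

    public static void main(String[] args) {
        Consumer<String> listener = msg -> System.out.println("listener got: " + msg);
        Object[] callArgs = { "workflow-1", "tenant-a", true, listener };

        // Correct index after the signature change.
        invokeListenerAt(3, callArgs);

        // Stale index: argument 1 is now a String, not the listener.
        try {
            invokeListenerAt(1, callArgs);
        } catch (ClassCastException e) {
            System.out.println("stale index: ClassCastException");
        }
    }
}
```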

cwperks commented Nov 13, 2025

Looks like the addition of another matrix argument on the security integration tests is preventing a merge.

I can help here if needed. There's a bit of a timing issue updating the branch protection rules.

I suggest asking an admin to override and merge and then update the branch protection rules post-merge.

@gaiksaya

@gaiksaya can we update our required tests to include the "true" here?

Done!

@dbwiddis

@DarshitChanpura this test is failing again, thought you had fixed it previously :)

I'm no longer able to reproduce this locally so not sure what is going on here. But it is flaky.

Ah, now I remember. Deleting the state is asynchronous and occurs AFTER the workflow is deleted.

    sdkClient.deleteDataObjectAsync(deleteRequest).whenComplete((r, throwable) -> {
        context.restore();
        if (throwable == null) {
            try {
                DeleteResponse response = DeleteResponse.fromXContent(r.parser());
                listener.onResponse(response);
            } catch (Exception e) {
                logger.error("Failed to parse delete response", e);
                listener.onFailure(new FlowFrameworkException("Failed to parse delete response", RestStatus.INTERNAL_SERVER_ERROR));
            }
        } else {
            Exception exception = SdkClientUtils.unwrapAndConvertToException(throwable);
            String errorMessage = ParameterizedMessageFactory.INSTANCE.newMessage("Failed to delete template {}", workflowId)
                .getFormattedMessage();
            logger.error(errorMessage, exception);
            listener.onFailure(new FlowFrameworkException(errorMessage, RestStatus.INTERNAL_SERVER_ERROR));
        }
    });
    // Whether to force deletion of corresponding state
    final boolean clearStatus = Booleans.parseBoolean(request.getParams().get(CLEAR_STATUS), false);
    ActionListener<DeleteResponse> stateListener = ActionListener.wrap(response -> {
        logger.info("Deleted workflow state doc: {}", workflowId);
    }, exception -> { logger.info("Failed to delete workflow state doc: {}", workflowId, exception); });
    flowFrameworkIndicesHandler.canDeleteWorkflowStateDoc(workflowId, tenantId, clearStatus, canDelete -> {
        if (Boolean.TRUE.equals(canDelete)) {
            flowFrameworkIndicesHandler.deleteFlowFrameworkSystemIndexDoc(workflowId, tenantId, stateListener);
        }
    }, stateListener);

We've handled this in other ITs using a retry-until-it-passes approach:

    Response deleteResponse = deleteWorkflow(client(), workflowId);
    assertEquals(RestStatus.OK, TestHelpers.restStatus(deleteResponse));
    // Verify state doc is deleted
    assertBusy(() -> { getAndAssertWorkflowStatusNotFound(client(), workflowId); }, 30, TimeUnit.SECONDS);
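
For readers unfamiliar with the retry-until-it-passes pattern, here is a minimal, self-contained sketch of the idea. It is not the test framework's `assertBusy` implementation; `RetryUntilSketch`, `retryUntilPasses`, the backoff values, and the simulated 200 ms deletion are assumptions made for illustration.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

public class RetryUntilSketch {
    // Re-runs the check until it stops throwing AssertionError or the timeout elapses.
    // On timeout, the last AssertionError is rethrown so the test still fails loudly.
    static void retryUntilPasses(Runnable check, long timeout, TimeUnit unit) {
        long deadline = System.nanoTime() + unit.toNanos(timeout);
        long sleepMillis = 10;
        while (true) {
            try {
                check.run();
                return;
            } catch (AssertionError e) {
                if (System.nanoTime() >= deadline) {
                    throw e;
                }
                try {
                    Thread.sleep(sleepMillis);
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    throw new RuntimeException(ie);
                }
                // Back off between attempts, capped at half a second.
                sleepMillis = Math.min(sleepMillis * 2, 500);
            }
        }
    }

    public static void main(String[] args) {
        AtomicBoolean stateDocExists = new AtomicBoolean(true);

        // Simulated async deletion that lands some time after the delete call returns.
        CompletableFuture.runAsync(() -> {
            try {
                TimeUnit.MILLISECONDS.sleep(200);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            stateDocExists.set(false);
        });

        retryUntilPasses(() -> {
            if (stateDocExists.get()) {
                throw new AssertionError("state doc still present");
            }
        }, 5, TimeUnit.SECONDS);
        System.out.println("state doc deleted");
    }
}
```

The trade-off is that the test tolerates the asynchronous state deletion instead of requiring the delete API to wait for it, so a generous timeout mostly costs time only when something is actually wrong.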

DarshitChanpura and others added 23 commits November 13, 2025 11:06
…e-sharing tests

Signed-off-by: Darshit Chanpura <[email protected]>

# Conflicts:
#	.github/workflows/test_security.yml
Signed-off-by: Darshit Chanpura <[email protected]>
Co-authored-by: Owais Kazi <[email protected]>
Signed-off-by: Daniel Widdis <[email protected]>
Signed-off-by: Daniel Widdis <[email protected]>
…t action-requests as DocRequests

Signed-off-by: Darshit Chanpura <[email protected]>
Signed-off-by: Darshit Chanpura <[email protected]>
Signed-off-by: Darshit Chanpura <[email protected]>
Signed-off-by: Darshit Chanpura <[email protected]>
@dbwiddis dbwiddis force-pushed the resource-permissions branch from a389f76 to f65076d Compare November 13, 2025 19:06
@dbwiddis dbwiddis merged commit 0687108 into opensearch-project:main Nov 14, 2025
64 of 65 checks passed

Development

Successfully merging this pull request may close these issues.

Onboard flow-framework plugin to Centralized Resource AuthZ framework

5 participants