Fix RDS IAM Cross Account Auth and Clarify Dev Container Docs #27632

Open
aniruddhaadak80 wants to merge 8 commits into open-metadata:main from aniruddhaadak80:fix-rds-iam-and-dev-docs-27552-27517

aniruddhaadak80 commented Apr 22, 2026

This PR addresses two issues: #27552 and #27517. It adds support for an optional assumeRoleArn parameter in AwsRdsDatabaseAuthenticationProvider.java, enabling cross-account IAM authentication for RDS. It also adds a dedicated Dev Containers section to DEVELOPER.md and annotates the .devcontainer configs to make clear that post-create.sh is the primary initialization script.


Summary by Gitar

  • Database resilience:
    • Implemented AutoCloseable in AwsRdsDatabaseAuthenticationProvider to ensure proper cleanup of stsClientCache and credentialsProviderCache.
    • Added hard-delete cleanup for testCaseResult and testCaseResolutionStatus in TestCaseRepository.
  • Search index stability:
    • Added isAoss() detection to OpenSearchClient to support AWS OpenSearch Serverless configurations.
    • Added defensive null-checks in SearchIndexClusterValidator and SearchClusterMetrics for cluster statistics to prevent runtime exceptions.


Copilot AI review requested due to automatic review settings April 22, 2026 14:14
@github-actions
Contributor

Hi there 👋 Thanks for your contribution!

The OpenMetadata team will review the PR shortly! Once it has been labeled as safe to test, the CI workflows
will start executing and we'll be able to make sure everything is working as expected.

Let us know if you need any help!

Copilot AI left a comment

Pull request overview

This PR adds cross-account support for AWS RDS IAM auth token generation by optionally assuming an STS role, and updates developer documentation/devcontainer configs to clarify Dev Container initialization.

Changes:

  • Add optional assumeRoleArn JDBC query param support in AwsRdsDatabaseAuthenticationProvider using STS assume-role credentials.
  • Document Dev Container workflows in DEVELOPER.md, clarifying that post-create.sh is the one-time initialization script shared by both devcontainer modes.
  • Add clarifying inline notes to Dev Container postCreateCommand configuration.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 5 comments.

| File | Description |
| --- | --- |
| openmetadata-service/src/main/java/org/openmetadata/service/util/jdbi/AwsRdsDatabaseAuthenticationProvider.java | Add optional STS assume-role credentials provider for cross-account RDS IAM token generation. |
| DEVELOPER.md | Add Dev Containers section clarifying the two configs and initialization flow. |
| .devcontainer/full-stack/devcontainer.json | Add clarification near postCreateCommand (but currently via JSON comment). |
| .devcontainer/dev/devcontainer.json | Add clarification near postCreateCommand (but currently via JSON comment). |

Comment on lines +46 to +66
AwsCredentialsProvider credentialsProvider = DefaultCredentialsProvider.create();

if (assumeRoleArn != null) {
  StsClient stsClient =
      StsClient.builder()
          .region(Region.of(awsRegion))
          .credentialsProvider(credentialsProvider)
          .build();

  AssumeRoleRequest assumeRoleRequest =
      AssumeRoleRequest.builder()
          .roleArn(assumeRoleArn)
          .roleSessionName("OpenMetadata-RDS-IAM-Auth")
          .build();

  credentialsProvider =
      StsAssumeRoleCredentialsProvider.builder()
          .stsClient(stsClient)
          .refreshRequest(assumeRoleRequest)
          .build();
}

Copilot AI Apr 22, 2026

assumeRoleArn triggers creating a new StsClient + StsAssumeRoleCredentialsProvider on every authenticate() call. In the IAM-auth path this runs per DB connection, so this will repeatedly call STS (latency + throttling risk) and also leaves the StsClient/provider unclosed, potentially leaking HTTP resources/threads. Consider constructing and reusing an assume-role credentials provider (e.g., cached by awsRegion+assumeRoleArn or initialized once per pool) and ensuring any SDK clients/providers are closed on shutdown.
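
A minimal sketch of the reuse-and-close pattern suggested above, using only the JDK. The `CloseableCache` class and the key format are hypothetical and not taken from the PR; in the real provider the cached value would presumably be an SDK object such as the assume-role credentials provider, keyed by something like awsRegion plus assumeRoleArn.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Hypothetical helper: keeps one long-lived resource per key and closes them all on shutdown. */
final class CloseableCache<K, V extends AutoCloseable> implements AutoCloseable {
  private final Map<K, V> cache = new ConcurrentHashMap<>();
  private final Function<K, V> factory;

  CloseableCache(Function<K, V> factory) {
    this.factory = factory;
  }

  /** Returns the cached resource for the key, invoking the factory at most once per key. */
  V get(K key) {
    return cache.computeIfAbsent(key, factory);
  }

  int size() {
    return cache.size();
  }

  /** Closes every cached resource; meant to be called once when the connection pool shuts down. */
  @Override
  public void close() {
    RuntimeException failure = null;
    for (V value : cache.values()) {
      try {
        value.close();
      } catch (Exception e) {
        if (failure == null) {
          failure = new IllegalStateException("Failed to close cached resource");
        }
        failure.addSuppressed(e);
      }
    }
    cache.clear();
    if (failure != null) {
      throw failure;
    }
  }
}
```

With a cache like this, authenticate() would look up the provider by key instead of building a new client on each call, and the owning component would call close() during shutdown. Whether closing the credentials provider also closes the underlying STS client should be checked against the SDK documentation; if it does not, the client needs its own cache entry.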

Comment on lines +48 to +59
if (assumeRoleArn != null) {
  StsClient stsClient =
      StsClient.builder()
          .region(Region.of(awsRegion))
          .credentialsProvider(credentialsProvider)
          .build();

  AssumeRoleRequest assumeRoleRequest =
      AssumeRoleRequest.builder()
          .roleArn(assumeRoleArn)
          .roleSessionName("OpenMetadata-RDS-IAM-Auth")
          .build();

Copilot AI Apr 22, 2026

The check if (assumeRoleArn != null) will attempt an STS assume-role even when the query param is present but empty/whitespace (e.g. assumeRoleArn=), which will fail with an AWS SDK validation error. Treat blank values as “not provided” (e.g., check isBlank() and skip) or raise a clear configuration error.
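
One way to treat blank values as "not provided" is to normalize the raw query parameter into an Optional before branching. This is a sketch only: `QueryParams.optional` is a hypothetical helper, not part of the PR, and it assumes Java 11 or later for `String.isBlank()`.

```java
import java.util.Optional;

/** Hypothetical helper for optional JDBC URL query parameters. */
final class QueryParams {
  private QueryParams() {}

  /** Maps null, empty, and whitespace-only values to Optional.empty(); trims everything else. */
  static Optional<String> optional(String rawValue) {
    if (rawValue == null || rawValue.isBlank()) {
      return Optional.empty();
    }
    return Optional.of(rawValue.trim());
  }
}
```

The assume-role branch would then run only when `QueryParams.optional(assumeRoleArn).isPresent()`, so `assumeRoleArn=` falls back to the default credentials instead of reaching STS.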

Comment on lines +49 to +66
  StsClient stsClient =
      StsClient.builder()
          .region(Region.of(awsRegion))
          .credentialsProvider(credentialsProvider)
          .build();

  AssumeRoleRequest assumeRoleRequest =
      AssumeRoleRequest.builder()
          .roleArn(assumeRoleArn)
          .roleSessionName("OpenMetadata-RDS-IAM-Auth")
          .build();

  credentialsProvider =
      StsAssumeRoleCredentialsProvider.builder()
          .stsClient(stsClient)
          .refreshRequest(assumeRoleRequest)
          .build();
}

Copilot AI Apr 22, 2026

With the new STS assume-role path, failures (STS call errors, invalid role ARN, missing permissions, etc.) will throw AWS SDK runtime exceptions that currently bypass the catch (MalformedURLException e) and won’t be wrapped in DatabaseAuthenticationProviderException. Consider catching broader exceptions in authenticate() and wrapping them consistently to keep error handling aligned with other providers (e.g., AzureDatabaseAuthenticationProvider).
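
The wrapping described above could look roughly like the following. Everything here is a stand-in: `AuthProviderException` only imitates DatabaseAuthenticationProviderException (modelled as unchecked for brevity, which may not match the real class), and `TokenSource` is a hypothetical placeholder for the token-generation step.

```java
/** Hypothetical stand-in for DatabaseAuthenticationProviderException. */
final class AuthProviderException extends RuntimeException {
  AuthProviderException(String message, Throwable cause) {
    super(message, cause);
  }
}

final class WrappingAuth {
  private WrappingAuth() {}

  /** Placeholder for the code that resolves credentials and generates the RDS auth token. */
  interface TokenSource {
    String generate() throws Exception;
  }

  /** Runs the token source and converts any failure into the provider's own exception type. */
  static String authenticate(TokenSource source) {
    try {
      return source.generate();
    } catch (AuthProviderException e) {
      throw e; // already in the expected shape, do not double-wrap
    } catch (Exception e) {
      throw new AuthProviderException(
          "Failed to generate RDS IAM auth token: " + e.getMessage(), e);
    }
  }
}
```

Keeping the original exception as the cause preserves the SDK's diagnostic detail while callers only need to handle one exception type.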

      "version": "22.17.0"
    }
  },
  // Use post-create script for one-time environment initialization (ANTLR, dependencies, venv)

Copilot AI Apr 22, 2026

devcontainer.json is parsed by the repo’s pre-commit check-json hook (strict JSON). Adding // comments makes this file invalid JSON and will cause the hook/CI to fail. Please remove the comment or replace it with a normal JSON field (e.g., a _comment property) if you want to keep the explanation in-file.

Suggested change
- // Use post-create script for one-time environment initialization (ANTLR, dependencies, venv)
+ "_comment": "Use post-create script for one-time environment initialization (ANTLR, dependencies, venv)",

Comment thread .devcontainer/dev/devcontainer.json Outdated
      "version": "22.17.0"
    }
  },
  // Use post-create script for one-time environment initialization (ANTLR, dependencies, venv)

Copilot AI Apr 22, 2026

devcontainer.json is parsed by the repo’s pre-commit check-json hook (strict JSON). Adding // comments makes this file invalid JSON and will cause the hook/CI to fail. Please remove the comment or replace it with a normal JSON field (e.g., a _comment property) if you want to keep the explanation in-file.

Suggested change
- // Use post-create script for one-time environment initialization (ANTLR, dependencies, venv)
+ "_comment": "Use post-create script for one-time environment initialization (ANTLR, dependencies, venv)",

Copilot AI review requested due to automatic review settings April 22, 2026 15:45
Copilot AI left a comment

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.

Comment on lines +47 to +54
if (CommonUtil.nullOrEmpty(awsRegion)) {
  throw new DatabaseAuthenticationProviderException(
      "Parameter `awsRegion` shall be provided in the jdbc url.");
}
if (CommonUtil.nullOrEmpty(allowPublicKeyRetrieval)) {
  throw new DatabaseAuthenticationProviderException(
      "Parameter `allowPublicKeyRetrieval` shall be provided in the jdbc url.");
}

Copilot AI Apr 22, 2026

CommonUtil.nullOrEmpty only checks isEmpty() and will treat whitespace-only values as present. That means values like awsRegion=%20 or assumeRoleArn=%20 will pass validation and then fail later (e.g., Region.of(" ") or STS AssumeRole with an invalid ARN), producing a harder-to-diagnose error. Consider validating these parameters with a blank-aware check (e.g., trim + empty, or StringUtils.isBlank) so whitespace-only inputs are rejected as missing.
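
The difference between an emptiness check and a blank-aware check can be illustrated with a small sketch. The `nullOrEmpty` method below only mirrors the behaviour described in the comment (it is not the actual CommonUtil code), and `ParamChecks` and `requireNonBlank` are hypothetical names.

```java
/** Hypothetical validation helpers for required JDBC URL parameters. */
final class ParamChecks {
  private ParamChecks() {}

  /** Emptiness-only check as described in the review: a single space counts as present. */
  static boolean nullOrEmpty(String value) {
    return value == null || value.isEmpty();
  }

  /** Blank-aware variant: whitespace-only values count as missing. */
  static boolean nullOrBlank(String value) {
    return value == null || value.trim().isEmpty();
  }

  /** Returns the trimmed value, or fails early with the same style of message the provider uses. */
  static String requireNonBlank(String name, String value) {
    if (nullOrBlank(value)) {
      throw new IllegalArgumentException(
          "Parameter `" + name + "` shall be provided in the jdbc url.");
    }
    return value.trim();
  }
}
```

A value such as `awsRegion=%20` decodes to a single space, which passes `nullOrEmpty` but is rejected by `requireNonBlank` before it can reach `Region.of`.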

Copilot AI review requested due to automatic review settings April 23, 2026 09:06
Copilot AI left a comment

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

… restore shard null-checks, clean up STS resources, and add service name AOSS detection

aniruddhaadak80 commented Apr 23, 2026

I have addressed all the feedback from Gitar-bot and Copilot-bot. Could a maintainer please add the safe to test label and approve the pending workflows?

harshach added the "safe to test" label (description: "Add this label to run secure Github workflows on PRs") on Apr 23, 2026
@github-actions
Contributor

The Java checkstyle failed.

Please run mvn spotless:apply in the root of your repository and commit the changes to this PR.
You can also use pre-commit to automate the Java code formatting.

You can install the pre-commit hooks with make install_test precommit_install.

…n to resolve Checkstyle and Copilot feedback
Copilot AI review requested due to automatic review settings April 23, 2026 14:57

gitar-bot Bot commented Apr 23, 2026

Code Review ✅ Approved 5 resolved / 5 findings

Fixes RDS IAM cross-account authentication and updates development container documentation. Resolves resource leaks in StsClient, fixes AOSS detection logic, removes duplicate methods, and adds necessary null safety checks.

✅ 5 resolved
Bug: StsClient is never closed, causing resource leak on every auth call

📄 openmetadata-service/src/main/java/org/openmetadata/service/util/jdbi/AwsRdsDatabaseAuthenticationProvider.java:49-63
The StsClient created at line 49 is never closed. Since authenticate() is called by HikariCP every time a new database connection is obtained from the pool, this leaks an HTTP client and its associated resources (threads, connections) on every invocation.

Additionally, the StsAssumeRoleCredentialsProvider wrapping it is also never closed.

This will gradually exhaust file descriptors and memory under sustained load.

Performance: StsClient and STS credentials provider recreated on every connection

📄 openmetadata-service/src/main/java/org/openmetadata/service/util/jdbi/AwsRdsDatabaseAuthenticationProvider.java:46-60
Every call to authenticate() creates a new StsClient and StsAssumeRoleCredentialsProvider. These are heavyweight objects involving HTTP client setup and STS API calls. Since authenticate() is invoked on every HikariCP connection checkout, this adds significant latency and unnecessary STS API calls (which are also subject to throttling).

StsAssumeRoleCredentialsProvider already handles credential caching and refresh internally — it's designed to be long-lived.

Edge Case: AOSS detection misses SEARCH_AWS_SERVICE_NAME=aoss config

📄 openmetadata-service/src/main/java/org/openmetadata/service/search/opensearch/OpenSearchClient.java:1085-1099
The checkIsAoss method only checks if the hostname ends with .aoss.amazonaws.com. The PR description mentions also checking SEARCH_AWS_SERVICE_NAME=aoss as a detection mechanism. Users connecting through a proxy, VPN endpoint, or custom DNS CNAME would have a hostname that doesn't match the .aoss.amazonaws.com suffix, causing AOSS to go undetected and cluster-level API calls to fail.

Consider also checking the AWS service name configuration (if available in ElasticSearchConfiguration or AwsConfiguration) as a secondary signal.
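
A sketch of detection that uses both signals, assuming the configured service name is available as a plain string. `AossDetection` is a hypothetical class; how the real OpenSearchClient obtains the host and service name is not shown in this thread.

```java
import java.util.Locale;

/** Hypothetical two-signal check for AWS OpenSearch Serverless (AOSS). */
final class AossDetection {
  private AossDetection() {}

  /** True when either the hostname suffix or the configured service name indicates AOSS. */
  static boolean isAoss(String host, String serviceName) {
    boolean hostMatches =
        host != null && host.toLowerCase(Locale.ROOT).endsWith(".aoss.amazonaws.com");
    boolean serviceMatches = serviceName != null && "aoss".equalsIgnoreCase(serviceName.trim());
    return hostMatches || serviceMatches;
  }
}
```

With the service name as a secondary signal, a proxy or CNAME hostname such as a private DNS alias is still recognised when the service name is set to aoss.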

Bug: Duplicate postDelete method will cause compilation error

📄 openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/TestCaseRepository.java:1534-1544 📄 openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/TestCaseRepository.java:862-866
The class TestCaseRepository already defines postDelete(TestCase, boolean) at line 862 (which calls updateTestSuite(testCase)). The new postDelete added at line 1534 has the identical signature. Java does not allow two methods with the same name and parameter types in the same class — this will fail to compile.

Additionally, even if one were removed, the existing postDelete at line 862 calls updateTestSuite() which is needed to keep test suite counts in sync. The new method's hard-delete cleanup logic must be merged into the existing method, not added as a separate override.
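
The merge described above can be pictured with a toy model. This is illustrative only: the class, the string-based action log, and the ordering of steps are assumptions, and the real repository methods have different types and side effects.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of a single postDelete that keeps the suite update and adds hard-delete cleanup. */
final class TestCaseCleanupSketch {
  /** Records the steps taken, standing in for real repository calls. */
  final List<String> actions = new ArrayList<>();

  void postDelete(String testCaseName, boolean hardDelete) {
    // Existing behaviour: always keep the test suite in sync.
    actions.add("updateTestSuite:" + testCaseName);
    // New behaviour: remove dependent records only on hard delete.
    if (hardDelete) {
      actions.add("deleteTestCaseResult:" + testCaseName);
      actions.add("deleteTestCaseResolutionStatus:" + testCaseName);
    }
  }
}
```

Keeping both responsibilities in one method avoids the duplicate-signature compile error and ensures a soft delete still updates the suite without touching result history.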

Bug: Null check on shards().total() was dropped, risking NPE

📄 openmetadata-service/src/main/java/org/openmetadata/service/apps/bundles/searchIndex/SearchIndexClusterValidator.java:90-93 📄 openmetadata-service/src/main/java/org/openmetadata/service/search/SearchClusterMetrics.java:87-90
The old code guarded against shards().total() being null before calling .intValue(). The refactored code adds null checks for indices() and shards() but drops the existing null check on total(), which returns a nullable Long. If the OpenSearch API returns a response where shards is present but total is null, this will throw a NullPointerException on .intValue().

This affects both SearchIndexClusterValidator.java and SearchClusterMetrics.java with the identical pattern.
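
A null-safe way to read the shard total, sketched with stand-in interfaces. `Stats`, `Indices`, and `Shards` are hypothetical and only imitate the accessor chain named in the finding; the fallback value is also an assumption.

```java
import java.util.Optional;

/** Hypothetical null-safe read of indices().shards().total(). */
final class ShardCount {
  private ShardCount() {}

  interface Shards {
    Long total(); // nullable, as described in the finding
  }

  interface Indices {
    Shards shards();
  }

  interface Stats {
    Indices indices();
  }

  /** Returns the shard total, or the fallback when any link in the chain is null. */
  static int totalShards(Stats stats, int fallback) {
    return Optional.ofNullable(stats)
        .map(Stats::indices)
        .map(Indices::shards)
        .map(Shards::total)
        .map(Long::intValue)
        .orElse(fallback);
  }
}
```

Because `Optional.map` yields an empty Optional when the mapper returns null, the guard on total() is preserved alongside the guards on indices() and shards().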


