Skip to content

New OAuth2Manager #2244

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 10 commits into
base: main
Choose a base branch
from
Draft

New OAuth2Manager #2244

wants to merge 10 commits into from

Conversation

sungwy
Copy link
Collaborator

@sungwy sungwy commented Jul 23, 2025

Rationale for this change

New OAuth2Manager implementation that makes use of AuthManager and more closely follows https://datatracker.ietf.org/doc/html/rfc6749 recommendations.

Proactively refreshes the access token by checking the expiration.

Are these changes tested?

Yes, both in unit and integration tests.

Are there any user-facing changes?

No, this is a new feature.

Copy link
Contributor

@kevinjqliu kevinjqliu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally LGTM, a few nit comments. Lets get #2055 merged and then rebase this PR

- `noop`: No authentication (no Authorization header sent).
- `basic`: HTTP Basic authentication.
- `oauth2`: OAuth2 client credentials flow.
- `legacyoauth2`: Legacy OAuth2 client credentials flow (Deprecated and will be removed in PyIceberg 1.0.0)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: i think this becomes a little confusing. legacyoauth2 is a fallback mechanism. i.e. when the auth: block is not provided. i think we should call this out

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That makes sense - I'm fine with leaving legacyoauth2 completely out of this section and relying on the existing Authentication Options to explain those configurations

return self._token


class OAuth2AuthManager(AuthManager):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we have any tests for LegacyOAuth2AuthManager? do we want OAuth2AuthManager to be feature parity in this first release?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i dont see credential, resource, and audience

| credential | client_id:client_secret | Credential to use for OAuth2 credential flow when initializing the catalog |
| scope | openid offline corpds:ds:profile | Desired scope of the requested security token (default : catalog) |
| resource | rest_catalog.iceberg.com | URI for the target resource or service |
| audience | rest_catalog | Logical name of target resource or service |

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we are using client_id and client_secret instead in the current implementation, as opposed to credential. This is also currently in draft mode, and I intend to review OAuth2 spec a little bit more in depth and other industry standard implementations before finalizing the implementation.

@@ -187,3 +289,4 @@ def create(cls, class_or_name: str, config: Dict[str, Any]) -> AuthManager:
AuthManagerFactory.register("noop", NoopAuthManager)
AuthManagerFactory.register("basic", BasicAuthManager)
AuthManagerFactory.register("legacyoauth2", LegacyOAuth2AuthManager)
AuthManagerFactory.register("oauth2", OAuth2AuthManager)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: is it a good idea to call AuthManagerFactory.register directly in the file? Is it better if its encapsulated in a function?

Im worry about import automatically running this code

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My understanding is that with Python import the module's code is executed only once, and subsequent imports instructs Python to retrieve the loaded module object from sys.modules unless the application owner intentionaly reloads the module. The registration is also currently idempotent

@kevinjqliu
Copy link
Contributor

i took the liberty to merge main :)

@kevinjqliu kevinjqliu added this to the PyIceberg 0.10.0 milestone Jul 25, 2025
@kevinjqliu kevinjqliu requested a review from Copilot July 25, 2025 23:49
Copilot

This comment was marked as outdated.

@kevinjqliu kevinjqliu requested a review from Copilot July 26, 2025 00:19
Copy link
Contributor

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR introduces a new OAuth2Manager implementation that more closely follows RFC6749 standards, replacing the legacy OAuth2 implementation. The new implementation includes proactive token refresh capabilities based on expiration times.

  • Adds a new OAuth2AuthManager with RFC6749-compliant implementation
  • Introduces thread-safe OAuth2TokenProvider for token management with automatic refresh
  • Updates documentation to reflect the new oauth2 auth type and deprecates legacyoauth2

Reviewed Changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

File Description
pyiceberg/catalog/rest/auth.py Implements OAuth2TokenProvider and OAuth2AuthManager classes with thread-safe token refresh
tests/catalog/test_rest.py Adds integration test for the new oauth2 authentication type
mkdocs/docs/configuration.md Updates documentation to include oauth2 configuration and marks legacyoauth2 as deprecated

raise ValueError(
"The expiration time of the Token must be provided by the Server in the Access Token Response in `expires_in` field, or by the PyIceberg Client."
)
self._expires_at = time.time() + expires_in - self.refresh_margin
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it makes more sense to use time.monotonic() here

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fantastic suggestion @jayceslesar - thanks!


def get_token(self) -> str:
with self._lock:
if not self._token or time.time() >= self._expires_at:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here

@@ -1646,6 +1646,40 @@ def test_rest_catalog_with_unsupported_auth_type() -> None:
assert "Could not load AuthManager class for 'unsupported'" in str(e.value)


def test_rest_catalog_with_oauth2_auth_type(requests_mock: Mocker) -> None:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be nice to have a test where fetching the token does not return a 200 or where refreshing the token does not return a 200.

That would allow us to verify that the non-happy paths work as intended.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants