
@dimpavloff commented Jul 7, 2025

Part one for grpc/proposal#492 (A97).
This is done in a new credentials/jwt package, which provides file-based PerRPCCredentials and can be used beyond xDS. The package handles token reloading, caching, and validation as per A97.
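Below is a minimal, standard-library-only sketch of the read/validate/cache steps described above. It is not the credentials/jwt API. It assumes that the token file holds a single compact JWT, that the token is sent as `authorization: Bearer <token>`, and that a cached token is treated as expired 30 seconds before its `exp` claim (the margin mentioned in the caching test discussed below). All names (`parseJWTExpiry`, `cacheDeadline`, `readTokenFile`, `requestMetadata`, `makeToken`) are hypothetical. A real credential would implement grpc-go's `credentials.PerRPCCredentials` interface (`GetRequestMetadata`, `RequireTransportSecurity`).

```go
package main

import (
	"encoding/base64"
	"encoding/json"
	"errors"
	"fmt"
	"os"
	"strings"
	"time"
)

// refreshMargin: a cached token is treated as expired this long before its
// exp claim (30 seconds, matching the caching test discussed in this PR).
const refreshMargin = 30 * time.Second

// parseJWTExpiry returns the exp claim of a compact-serialized JWT. This
// sketch does not verify the signature; the client only forwards the token.
func parseJWTExpiry(token string) (time.Time, error) {
	parts := strings.Split(token, ".")
	if len(parts) != 3 {
		return time.Time{}, errors.New("jwt: expected 3 dot-separated segments")
	}
	payload, err := base64.RawURLEncoding.DecodeString(parts[1])
	if err != nil {
		return time.Time{}, fmt.Errorf("jwt: decoding payload: %w", err)
	}
	var claims struct {
		Exp *float64 `json:"exp"` // NumericDate; may be fractional
	}
	if err := json.Unmarshal(payload, &claims); err != nil {
		return time.Time{}, fmt.Errorf("jwt: parsing claims: %w", err)
	}
	if claims.Exp == nil {
		return time.Time{}, errors.New("jwt: missing exp claim")
	}
	return time.Unix(int64(*claims.Exp), 0), nil
}

// cacheDeadline is the time after which the token file should be re-read.
func cacheDeadline(exp time.Time) time.Time { return exp.Add(-refreshMargin) }

// readTokenFile loads the token stored at path and validates its exp claim.
func readTokenFile(path string) (string, time.Time, error) {
	b, err := os.ReadFile(path)
	if err != nil {
		return "", time.Time{}, fmt.Errorf("jwt: reading %q: %w", path, err)
	}
	token := strings.TrimSpace(string(b))
	if token == "" {
		return "", time.Time{}, errors.New("jwt: token file is empty")
	}
	exp, err := parseJWTExpiry(token)
	if err != nil {
		return "", time.Time{}, err
	}
	return token, exp, nil
}

// requestMetadata builds the per-RPC metadata (assumed bearer-token format).
func requestMetadata(token string) map[string]string {
	return map[string]string{"authorization": "Bearer " + token}
}

// makeToken builds an unsigned, JWT-shaped string for demonstration only.
func makeToken(exp int64) string {
	enc := base64.RawURLEncoding.EncodeToString
	header := enc([]byte(`{"alg":"none","typ":"JWT"}`))
	payload := enc([]byte(fmt.Sprintf(`{"exp":%d}`, exp)))
	return header + "." + payload + ".sig"
}

func main() {
	f, err := os.CreateTemp("", "jwt-token-*")
	if err != nil {
		panic(err)
	}
	defer os.Remove(f.Name())
	tok := makeToken(time.Now().Add(time.Hour).Unix())
	if _, err := f.WriteString(tok + "\n"); err != nil {
		panic(err)
	}
	f.Close()

	token, exp, err := readTokenFile(f.Name())
	if err != nil {
		panic(err)
	}
	fmt.Println("re-read after:", cacheDeadline(exp).Format(time.RFC3339))
	fmt.Println(requestMetadata(token)["authorization"] == "Bearer "+tok)
}
```

Keeping parsing, the cache deadline, and metadata construction as separate functions makes each step testable on its own, without touching the file system.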

There will be a separate PR which uses it in xds/bootstrap.

Whilst implementing the above, I considered extending the existing credentials/oauth or credentials/xds packages instead of creating a new one. The former has NewJWTAccessFromKey and jwtAccess, which seem very relevant at first. However, jwtAccess seems tailored towards Google services. Also, the refresh, caching, and error behaviour required by A97 is quite different from what is already there, so a separate implementation would still have made sense.
WRT credentials/xds, it could have been extended to handle both transport and call credentials. However, this is somewhat at odds with A97, which says the implementation should not be xDS-specific and, reading between the lines, should be usable beyond xDS.
I think the current approach makes review easier, but because of its similarities with the other two packages, the code is a bit confusing to navigate. Please let me know whether the structure should change.

Relates to istio/istio#53532

RELEASE NOTES:

  • credentials: add credentials/jwt package providing file-based JWT PerRPCCredentials (A97)


codecov bot commented Jul 7, 2025

Codecov Report

❌ Patch coverage is 96.74797% with 4 lines in your changes missing coverage. Please review.
✅ Project coverage is 81.92%. Comparing base (dd718e4) to head (3f9195e).
⚠️ Report is 62 commits behind head on master.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| credentials/jwt/jwt_token_file_call_creds.go | 95.06% | 3 Missing and 1 partial ⚠️ |
Additional details and impacted files
```diff
@@            Coverage Diff             @@
##           master    #8431      +/-   ##
==========================================
- Coverage   82.27%   81.92%   -0.35%     
==========================================
  Files         414      415       +1     
  Lines       40424    40643     +219     
==========================================
+ Hits        33259    33297      +38     
- Misses       5795     5966     +171     
- Partials     1370     1380      +10     
```
| Files with missing lines | Coverage Δ |
|---|---|
| credentials/jwt/jwt_file_reader.go | 100.00% <100.00%> (ø) |
| credentials/jwt/jwt_token_file_call_creds.go | 95.06% <95.06%> (ø) |

... and 250 files with indirect coverage changes


@dimpavloff dimpavloff changed the title xds: implement file-based JWT authentication (A97) xds: implement file-based JWT Call Credentials (A97) Jul 7, 2025
@dimpavloff (Author)

@dfawley hey 👋 Given that you approved A97, would you mind having a cursory look at the PR to confirm that the approach looks good, at least at a high level?

@eshitachandwani (Member)

I will take a look at this; I need to go through the gRFC first.

@dfawley dfawley self-assigned this Jul 22, 2025
@dfawley dfawley requested review from easwars and eshitachandwani and removed request for dfawley July 25, 2025 20:39
@dfawley dfawley assigned easwars and unassigned dfawley Jul 25, 2025
@dfawley (Member) commented Jul 25, 2025

Sorry for the delay here.

@easwars would you be able to review this change? I think you have more background than I do on some of the relevant areas, like the bootstrap integration. Thank you!

@easwars (Contributor) commented Jul 28, 2025

Thank you for your contribution @dimpavloff. Yes, it would be nice if you can split this into smaller PRs. I will continue to use this PR to review the JWT call credentials implementation. If you can move the xDS implementation out to one or more PRs, I would greatly appreciate that and would be happy to review them as well.


```go
// Verify cached expiration is 30 seconds before actual token expiration.
impl := creds.(*jwtTokenFileCallCreds)
impl.mu.RLock()
```
Contributor

Please only test the API surface. Relying on implementation internals makes tests brittle and forces test changes whenever the implementation changes.

Author

I assume you are referring to using a private field rather than acquiring mu specifically.
In general I agree: white-box tests can become fragile and break during a refactor. However, this test and the next couple of tests cover the caching behaviour, which is meant to be transparent to the external API. If I don't make assertions about the private fields, the tests may pass trivially and become flakier (e.g. when testing the backoff in the next test).
One alternative would be to factor these behaviours out into a separate private struct with "public" methods that expose the same information. Given that this would require shifting most of the implementation into that struct, I'm not sure it would be an improvement over the current approach.
Please do let me know your thoughts and whether you have other suggestions.
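The alternative described above could look roughly like the following hypothetical sketch. The caching logic lives in its own private type, and its methods (`Get`, `Loads`, `CacheUntil`) expose what the tests currently read from private fields. The injected clock (`now`, `fakeClock`) and the one-minute token lifetime in `newTestCache` are assumptions added for illustration; nothing in this thread says the implementation uses them.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// cacheMargin mirrors the 30-second pre-expiry window discussed in this PR.
const cacheMargin = 30 * time.Second

// fakeClock is a manually advanced clock (an assumption for testability).
type fakeClock struct{ t time.Time }

func (c *fakeClock) Now() time.Time          { return c.t }
func (c *fakeClock) Advance(d time.Duration) { c.t = c.t.Add(d) }

// tokenCache (hypothetical) holds only the caching behaviour, so tests can
// observe it through methods rather than by reading private fields.
type tokenCache struct {
	now  func() time.Time
	load func() (token string, exp time.Time, err error)

	mu         sync.Mutex
	token      string
	cacheUntil time.Time
	loads      int // number of times load has been called
}

// Get returns the cached token until cacheUntil, then reloads it.
func (c *tokenCache) Get() (string, error) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if c.token != "" && c.now().Before(c.cacheUntil) {
		return c.token, nil
	}
	c.loads++
	tok, exp, err := c.load()
	if err != nil {
		return "", err
	}
	c.token, c.cacheUntil = tok, exp.Add(-cacheMargin)
	return tok, nil
}

// Loads reports how many times the token source was read.
func (c *tokenCache) Loads() int {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.loads
}

// CacheUntil reports when the cached token stops being served.
func (c *tokenCache) CacheUntil() time.Time {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.cacheUntil
}

// newTestCache wires a cache to a fake clock and a loader whose tokens
// expire one minute after each load.
func newTestCache() (*tokenCache, *fakeClock) {
	clk := &fakeClock{t: time.Unix(1_700_000_000, 0)}
	n := 0
	c := &tokenCache{now: clk.Now}
	c.load = func() (string, time.Time, error) {
		n++
		return fmt.Sprintf("token-%d", n), clk.Now().Add(time.Minute), nil
	}
	return c, clk
}

func main() {
	c, clk := newTestCache()
	t1, _ := c.Get()
	clk.Advance(cacheMargin / 2) // 15s after load: still cached
	t2, _ := c.Get()
	clk.Advance(cacheMargin) // 45s after load: past exp-30s, so reload
	t3, _ := c.Get()
	fmt.Println(t1, t2, t3, c.Loads()) // prints: token-1 token-1 token-2 2
}
```

The cost the author points out is visible here: most of the caching state moves into the new type, and the credentials type becomes a thin wrapper around it.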

@easwars easwars assigned dimpavloff and unassigned easwars Jul 28, 2025
@dimpavloff dimpavloff requested a review from easwars August 26, 2025 17:52
@dimpavloff dimpavloff removed their assignment Aug 26, 2025
@easwars easwars self-assigned this Aug 29, 2025
@easwars (Contributor) left a comment

Thanks for taking care of all of my comments. I think I'm mostly happy with where we are at now. I'll follow up on the two open issues and get back to you soon.

```go
if err == nil {
	t.Fatalf("GetRequestMetadata() expected error, got nil")
}
if !strings.Contains(err.Error(), tt.wantErrContains) {
	// ...
}
```
Contributor

Will check with folks on the team and will get back to you soon on this. Thanks.

@easwars easwars assigned dimpavloff and unassigned easwars Aug 29, 2025
@dimpavloff dimpavloff requested a review from easwars September 1, 2025 10:25
@dimpavloff dimpavloff removed their assignment Sep 1, 2025
Comment on lines 365 to 370
```go
impl := creds.(*jwtTokenFileCallCreds)
impl.mu.Lock()
cachedErr := impl.cachedError
retryAttempt := impl.retryAttempt
nextRetryTime := impl.nextRetryTime
impl.mu.Unlock()
```
Contributor

Please consider declaring the backoff function as a package-level variable that can be overridden from here. That way, you can easily verify the retry attempt and inject any value you want from the backoff function. Here is an example of a test that overrides the backoff function used by the code under test. See:

```go
func (s) TestADS_BackoffAfterStreamFailure(t *testing.T) {
```
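The following sketch illustrates the suggestion. It assumes a hypothetical package-level `backoffFunc` and a stand-in `fileCreds` type; neither is taken from the PR. The default backoff (doubling from 1s, capped at 1m) is also an assumption, not A97's parameters. `overrideBackoff` swaps in a replacement that records each retry attempt and returns a chosen delay, then hands back a function that restores the original.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
	"time"
)

// backoffFunc is package-level so that tests can override it. The default
// (an assumption) doubles from 1s up to a 1m cap.
var backoffFunc = func(retries int) time.Duration {
	d := time.Second
	for i := 0; i < retries && d < time.Minute; i++ {
		d *= 2
	}
	if d > time.Minute {
		d = time.Minute
	}
	return d
}

// fileCreds is a hypothetical stand-in for the code under test: after a
// failed read it returns the cached error until the backoff elapses.
type fileCreds struct {
	now  func() time.Time
	read func() (string, error)

	mu            sync.Mutex
	cachedErr     error
	retryAttempt  int
	nextRetryTime time.Time
}

func (f *fileCreds) Token() (string, error) {
	f.mu.Lock()
	defer f.mu.Unlock()
	if f.cachedErr != nil && f.now().Before(f.nextRetryTime) {
		return "", f.cachedErr // still backing off: do not touch the file
	}
	tok, err := f.read()
	if err != nil {
		f.cachedErr = err
		f.nextRetryTime = f.now().Add(backoffFunc(f.retryAttempt))
		f.retryAttempt++
		return "", err
	}
	f.cachedErr, f.retryAttempt = nil, 0
	return tok, nil
}

// overrideBackoff replaces backoffFunc with one that records each retry
// attempt and returns delay; the returned func restores the original.
func overrideBackoff(delay time.Duration, attempts *[]int) (restore func()) {
	orig := backoffFunc
	backoffFunc = func(retries int) time.Duration {
		*attempts = append(*attempts, retries)
		return delay
	}
	return func() { backoffFunc = orig }
}

// newFailingCreds returns creds whose token file can never be read.
func newFailingCreds(reads *int) *fileCreds {
	return &fileCreds{now: time.Now, read: func() (string, error) {
		*reads++
		return "", errors.New("token file missing")
	}}
}

func main() {
	var attempts []int
	restore := overrideBackoff(0, &attempts) // zero delay: every call retries
	defer restore()

	reads := 0
	c := newFailingCreds(&reads)
	for i := 0; i < 3; i++ {
		c.Token()
	}
	fmt.Println(reads, attempts) // prints: 3 [0 1 2]
}
```

With a zero delay, every call retries. With a large delay, a second call returns the cached error without reading the file. The trade-off raised in the replies is that the delay has to be chosen before the first call.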

Author

I'm not sure that would help much. The struct stores the result of the backoff (the time of the next allowed attempt) in nextRetryTime, which I can already mutate directly in the tests; that is simpler than going through an overridden backoff function.

Comment on lines +406 to +409
```go
// Fast-forward the backoff retry time to allow next retry attempt.
impl.mu.Lock()
impl.nextRetryTime = time.Now().Add(-1 * time.Minute)
impl.mu.Unlock()
```
Contributor

So, if we override the backoff function, we won't need to change the internal fields of the creds. We can simply control the value returned from the overridden backoff function.

Author

The backoff function would need to be configured with the delay to return before GetRequestMetadata is called. The implementation then stores the result in nextRetryTime, and subsequent calls keep consulting nextRetryTime until another attempt is made.
Therefore, in this test, I would have to choose a delay before the first call that is long enough to cover both the first and second calls (i.e. the second call is served from the cache) but short enough for the third call to trigger a retry. That would be extremely flaky. Alternatively, I could create a deadline in the backoff function that is shared with the rest of the test and wait for it on this line. To me, this seems more complicated than setting the private field to a time in the past to force the retry, and it would also make the test slower because I would still need to overestimate the duration of the first and second calls.

LMK if you still want to proceed.
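For comparison, here is a hypothetical sketch of the flow described in this reply. A failed read stores the error and the next allowed attempt time in `nextRetryTime`. Calls made before that time return the cached error without reading the file. The test forces a retry by moving `nextRetryTime` into the past, as the PR's test does with `time.Now().Add(-1 * time.Minute)`. The `creds` type, its fixed one-hour `backoff`, and the `forceRetry` helper are assumptions for illustration only.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
	"time"
)

// creds is a hypothetical stand-in for the implementation: a failed read
// stores the error and the time of the next allowed attempt.
type creds struct {
	backoff time.Duration
	read    func() (string, error)

	mu            sync.Mutex
	cachedErr     error
	nextRetryTime time.Time
}

func (c *creds) Token() (string, error) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if c.cachedErr != nil && time.Now().Before(c.nextRetryTime) {
		return "", c.cachedErr // within backoff: reuse the cached error
	}
	tok, err := c.read()
	if err != nil {
		c.cachedErr = err
		c.nextRetryTime = time.Now().Add(c.backoff)
		return "", err
	}
	c.cachedErr = nil
	return tok, nil
}

// forceRetry mirrors the test's white-box step: moving nextRetryTime into
// the past makes the next Token call retry immediately.
func (c *creds) forceRetry() {
	c.mu.Lock()
	c.nextRetryTime = time.Now().Add(-1 * time.Minute)
	c.mu.Unlock()
}

// newCreds returns creds whose first read fails and later reads succeed.
func newCreds(reads *int) *creds {
	return &creds{backoff: time.Hour, read: func() (string, error) {
		*reads++
		if *reads == 1 {
			return "", errors.New("token file not found")
		}
		return "token", nil
	}}
}

func main() {
	reads := 0
	c := newCreds(&reads)
	_, err1 := c.Token() // read #1 fails; next retry allowed in 1h
	_, err2 := c.Token() // served from the cached error, no read
	c.forceRetry()
	tok, err3 := c.Token() // read #2 succeeds
	fmt.Println(err1 != nil, err2 == err1, tok, err3, reads) // prints: true true token <nil> 2
}
```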

@easwars easwars assigned dimpavloff and unassigned easwars Sep 3, 2025
@dimpavloff dimpavloff requested a review from easwars September 3, 2025 22:32
@dimpavloff dimpavloff removed their assignment Sep 3, 2025
@easwars (Contributor) left a comment

Could you please add a TODO to the tests that access or mutate internal state, noting that they should be rewritten not to do so? We can handle it as a low-priority task at some point.

Thank you for taking care of all the comments.

@easwars easwars requested review from dfawley and arjan-bal and removed request for dfawley September 4, 2025 21:53
@easwars (Contributor) commented Sep 4, 2025

Moving to @arjan-bal for second set of eyes.

Labels:
  • Area: Auth (includes regular credentials API and implementation; also includes advancedtls, authz, rbac, etc.)
  • Type: Feature (new features or improvements in behavior)

5 participants