dlutsch commented Nov 5, 2025

What type of PR is this?
feature

Which issue does this PR fix:
Fixes #842 & #697

What does this PR do / Why do we need it:

This PR enables automatic discovery of VPC Lattice Service Networks that have been shared via AWS Resource Access Manager (RAM) from other AWS accounts. This is critical for enterprise multi-account architectures where a central networking account owns Service Networks and shares them with spoke accounts.

Problem:

  • Controller currently only searches for Service Networks in the local account
  • RAM-shared networks from other accounts are not discovered
  • Users must use workarounds (Standalone Mode + manual associations or complex automation)

Solution:

  • Modified FindServiceNetwork() to use a two-step discovery process:
    1. First searches local account (existing behavior - backward compatible)
    2. Falls back to VPC association discovery for RAM-shared networks
  • Added findServiceNetworkViaVPCAssociation() to discover networks via VPC associations
  • Added buildServiceNetworkInfo() helper for consistent tag fetching
  • Added isLocalServiceNetwork() to detect local vs RAM-shared networks by parsing ARN
  • Modified UpsertVpcAssociation() to skip ownership checks for RAM-shared networks
  • Maintains full backward compatibility with existing local network discovery
  • No additional IAM permissions required (uses existing ListServiceNetworkVpcAssociations API)
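
The two-step discovery described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the `ServiceNetwork` type, the `ErrNotFound` sentinel, the `lookupFunc` abstraction, and the `sn-local` / `sn-shared` IDs are all hypothetical stand-ins, and the choice to fall back only on a not-found error (rather than on any error) is an assumption.

```go
package main

import (
	"errors"
	"fmt"
)

// ServiceNetwork is a hypothetical, simplified stand-in for the controller's
// service network info type.
type ServiceNetwork struct {
	Name string
	ARN  string
}

// ErrNotFound is a hypothetical sentinel error meaning "no such service network".
var ErrNotFound = errors.New("service network not found")

// lookupFunc resolves a service network by name from a single source.
type lookupFunc func(name string) (*ServiceNetwork, error)

// findServiceNetwork tries the local account first and consults the
// VPC-association lookup only when the local lookup reports ErrNotFound.
// Any other local error is returned as-is so real failures are not masked
// (an assumption of this sketch).
func findServiceNetwork(name string, local, viaVPCAssociation lookupFunc) (*ServiceNetwork, error) {
	sn, err := local(name)
	if err == nil {
		return sn, nil
	}
	if !errors.Is(err, ErrNotFound) {
		return nil, err
	}
	return viaVPCAssociation(name)
}

// fromMap builds a lookupFunc backed by an in-memory name-to-ARN map,
// standing in for the real API calls.
func fromMap(m map[string]string) lookupFunc {
	return func(name string) (*ServiceNetwork, error) {
		arn, ok := m[name]
		if !ok {
			return nil, ErrNotFound
		}
		return &ServiceNetwork{Name: name, ARN: arn}, nil
	}
}

func main() {
	// Placeholder ARNs: 222222222222 plays the spoke (local) account,
	// 111111111111 the central account that shares via RAM.
	local := fromMap(map[string]string{
		"local-sn": "arn:aws:vpc-lattice:us-west-2:222222222222:servicenetwork/sn-local",
	})
	shared := fromMap(map[string]string{
		"shared-sn": "arn:aws:vpc-lattice:us-west-2:111111111111:servicenetwork/sn-shared",
	})
	for _, name := range []string{"local-sn", "shared-sn", "missing"} {
		sn, err := findServiceNetwork(name, local, shared)
		if err != nil {
			fmt.Println(name, "->", err)
			continue
		}
		fmt.Println(name, "->", sn.ARN)
	}
}
```

Because the local lookup runs first and short-circuits on success, a name that exists in both sources resolves to the local network, which is what keeps the existing behavior unchanged.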

Changes:

  • pkg/aws/services/vpclattice.go (~75 lines added)
  • pkg/deploy/lattice/service_network_manager.go (~30 lines modified)
  • pkg/deploy/lattice/service_network_manager_test.go (4 tests updated with valid ARN formats)
  • pkg/deploy/lattice/service_network_manager_ram_shared_test.go (7 new tests added)

If an issue # is not available please add repro steps and logs from aws-gateway-controller showing the issue:

Repro Steps (Before this PR):

  1. Central networking account (e.g., 111111111111) creates and shares Service Network via RAM
  2. Spoke account (e.g., 222222222222) VPC is associated with the RAM-shared Service Network (status: ACTIVE)
  3. Deploy Gateway API Controller in spoke account EKS cluster
  4. Create Gateway with name matching the RAM-shared Service Network
  5. Result: Gateway fails with "Service network not found" error because controller only searches local account

After this PR:
The same steps result in the Gateway successfully discovering and using the RAM-shared Service Network through VPC association lookup.

Testing done on this change:

Unit Tests: ✅ COMPLETE

Updated Existing Tests:

  • Fixed 4 tests in service_network_manager_test.go to use valid ARN formats
  • All original tests passing with new code
  • Tests verify local network behavior remains unchanged

New RAM-Shared Network Tests (service_network_manager_ram_shared_test.go):

  1. Test_isLocalServiceNetwork_LocalNetwork - Verifies local network detection
  2. Test_isLocalServiceNetwork_RAMSharedNetwork - Verifies RAM-shared detection
  3. Test_isLocalServiceNetwork_InvalidARN - Tests graceful error handling
  4. Test_isLocalServiceNetwork_NilARN - Tests nil pointer handling
  5. Test_UpsertVpcAssociation_RAMSharedNetwork_ExistingAssociation - Verifies ownership checks are skipped
  6. Test_UpsertVpcAssociation_RAMSharedNetwork_ReadOnly - Confirms no modifications to RAM-shared networks
  7. Test_UpsertVpcAssociation_LocalNetwork_WithUpdates - Ensures local networks still updatable
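
For readers unfamiliar with the ARN-based check that the first four tests exercise, here is a minimal sketch of the idea. It is not the PR's isLocalServiceNetwork(): the function names `accountIDFromARN` and `isLocal`, the `(bool, error)` return shape, and the decision to treat nil and malformed ARNs as errors are assumptions made for illustration. The only fact relied on is the standard ARN layout `arn:partition:service:region:account-id:resource`.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// accountIDFromARN extracts the account ID field from an ARN of the form
// arn:partition:service:region:account-id:resource.
func accountIDFromARN(arn string) (string, error) {
	parts := strings.SplitN(arn, ":", 6)
	if len(parts) != 6 || parts[0] != "arn" || parts[4] == "" {
		return "", fmt.Errorf("invalid ARN %q", arn)
	}
	return parts[4], nil
}

// isLocal reports whether the service network ARN is owned by localAccountID.
// A nil or malformed ARN yields an error instead of a panic (hypothetical
// behavior chosen for this sketch).
func isLocal(arn *string, localAccountID string) (bool, error) {
	if arn == nil {
		return false, errors.New("nil ARN")
	}
	owner, err := accountIDFromARN(*arn)
	if err != nil {
		return false, err
	}
	return owner == localAccountID, nil
}

func main() {
	const spokeAccount = "222222222222"
	localARN := "arn:aws:vpc-lattice:us-west-2:222222222222:servicenetwork/sn-04f8437d5c6e026b0"
	sharedARN := "arn:aws:vpc-lattice:us-west-2:111111111111:servicenetwork/sn-04f8437d5c6e026b0"
	invalid := "not-an-arn"

	// One case per scenario in the list above: local, RAM-shared, invalid, nil.
	for _, arn := range []*string{&localARN, &sharedARN, &invalid, nil} {
		ok, err := isLocal(arn, spokeAccount)
		fmt.Println("local:", ok, "err:", err)
	}
}
```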

Test Results:

=== RUN   Test_isLocalServiceNetwork_LocalNetwork
--- PASS: Test_isLocalServiceNetwork_LocalNetwork (0.00s)
=== RUN   Test_isLocalServiceNetwork_RAMSharedNetwork
--- PASS: Test_isLocalServiceNetwork_RAMSharedNetwork (0.00s)
=== RUN   Test_isLocalServiceNetwork_InvalidARN
--- PASS: Test_isLocalServiceNetwork_InvalidARN (0.00s)
=== RUN   Test_isLocalServiceNetwork_NilARN
--- PASS: Test_isLocalServiceNetwork_NilARN (0.00s)
=== RUN   Test_UpsertVpcAssociation_RAMSharedNetwork_ExistingAssociation
--- PASS: Test_UpsertVpcAssociation_RAMSharedNetwork_ExistingAssociation (0.00s)
=== RUN   Test_UpsertVpcAssociation_RAMSharedNetwork_ReadOnly
--- PASS: Test_UpsertVpcAssociation_RAMSharedNetwork_ReadOnly (0.00s)
=== RUN   Test_UpsertVpcAssociation_LocalNetwork_WithUpdates
--- PASS: Test_UpsertVpcAssociation_LocalNetwork_WithUpdates (0.00s)
PASS
ok  	github.com/aws/aws-application-networking-k8s/pkg/deploy/lattice

All package tests passing: go test ./pkg/deploy/lattice - 100% success rate

Multi-Account Integration Testing: ✅ COMPLETE

Validated in real AWS multi-account environment:

Test Environment:

  • Central networking account: 111111111111 (RAM-shared network owner)
  • Spoke account: 222222222222 (EKS cluster account)
  • EKS cluster: test-cluster-usw2
  • VPC: vpc-abc123def456 (pre-associated with RAM-shared network)

Test Scenario:

# Gateway manifest
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: test-ram-gateway
spec:
  gatewayClassName: amazon-vpc-lattice
  listeners:
  - name: http
    port: 80
    protocol: HTTP

Test Results:

  1. ✅ Gateway discovered RAM-shared network from account 111111111111
  2. ✅ Gateway status: Programmed=True
  3. ✅ HTTPRoute created VPC Lattice service in spoke account 222222222222
  4. ✅ Service association created with RAM-shared network: status ACTIVE
  5. ✅ End-to-end traffic flow validated

Gateway Status:

status:
  conditions:
  - type: Programmed
    status: "True"
    reason: Programmed
    message: 'aws-service-network-arn: arn:aws:vpc-lattice:us-west-2:111111111111:servicenetwork/sn-04f8437d5c6e026b0'

Service Association (cross-account):

{
  "id": "snsa-abcdef1234567890",
  "arn": "arn:aws:vpc-lattice:us-west-2:222222222222:servicenetworkserviceassociation/snsa-abcdef1234567890",
  "serviceId": "svc-05adf70023b306447",
  "serviceName": "k8s-default-test-route-abc123",
  "serviceNetworkArn": "arn:aws:vpc-lattice:us-west-2:111111111111:servicenetwork/sn-04f8437d5c6e026b0",
  "status": "ACTIVE"
}
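
As a quick way to confirm that an association like the one above really spans accounts, the two ARNs can be compared by their account ID field. The sketch below is illustrative only: the `association` struct, `ownerAccount`, and `isCrossAccount` are hypothetical names, and the struct decodes just the fields needed for the comparison.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// association mirrors only the fields used here from the JSON document above.
type association struct {
	ARN               string `json:"arn"`
	ServiceNetworkARN string `json:"serviceNetworkArn"`
	Status            string `json:"status"`
}

// ownerAccount returns the account ID field (index 4) of a colon-separated ARN.
func ownerAccount(arn string) (string, error) {
	parts := strings.Split(arn, ":")
	if len(parts) < 6 || parts[0] != "arn" {
		return "", fmt.Errorf("invalid ARN %q", arn)
	}
	return parts[4], nil
}

// isCrossAccount decodes an association document and reports whether the
// association and its service network are owned by different accounts.
func isCrossAccount(doc []byte) (bool, error) {
	var a association
	if err := json.Unmarshal(doc, &a); err != nil {
		return false, err
	}
	assocOwner, err := ownerAccount(a.ARN)
	if err != nil {
		return false, err
	}
	snOwner, err := ownerAccount(a.ServiceNetworkARN)
	if err != nil {
		return false, err
	}
	return assocOwner != snOwner, nil
}

func main() {
	doc := []byte(`{
	  "id": "snsa-abcdef1234567890",
	  "arn": "arn:aws:vpc-lattice:us-west-2:222222222222:servicenetworkserviceassociation/snsa-abcdef1234567890",
	  "serviceNetworkArn": "arn:aws:vpc-lattice:us-west-2:111111111111:servicenetwork/sn-04f8437d5c6e026b0",
	  "status": "ACTIVE"
	}`)
	cross, err := isCrossAccount(doc)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("cross-account:", cross)
}
```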

Backward Compatibility Testing: ✅ VERIFIED

  • ✅ Local Service Networks continue to work (no regression)
  • ✅ Existing Gateway deployments unaffected
  • ✅ Discovery prefers local networks first (fallback to RAM-shared only when not found locally)
  • ✅ All original unit tests passing with updated ARN formats

Automation added to e2e:

The repository already contains comprehensive e2e tests for RAM sharing in test/suites/integration/ram_share_test.go (added in #578). These tests validate the existing explicit naming behavior and will continue to pass with this PR.

This PR adds a new auto-discovery capability via VPC associations that complements (rather than replaces) the existing explicit naming approach. Both methods now work:

  1. Existing: Gateway name matches service network name (tested by existing e2e tests)
  2. New: Gateway name can be anything, discovers via VPC association (validated in production)

Will this PR introduce any new dependencies?:

No new dependencies. Uses existing AWS VPC Lattice APIs:

  • ListServiceNetworkVpcAssociations (existing IAM permission)
  • ListTagsForResource (existing IAM permission)
  • No new libraries or modules
  • No additional IAM permissions required

Will this break upgrades or downgrades? Has updating a running cluster been tested?:

No breaking changes. Tested with live upgrade:

  • ✅ Upgraded existing controller deployment with new image containing this feature
  • ✅ Existing Gateway resources remained fully functional
  • ✅ New Gateway resources immediately benefited from RAM-shared discovery
  • ✅ Downgrade safe - falls back to original local-only discovery behavior
  • ✅ No CRD changes or API modifications required

Does this PR introduce any user-facing change?:

Enables automatic discovery of RAM-shared VPC Lattice Service Networks. Gateways can now reference Service Networks that have been shared via AWS RAM from other accounts. The controller automatically discovers these networks via VPC associations without requiring any user configuration or additional IAM permissions. This eliminates the need for manual workarounds in multi-account environments.

Do all end-to-end tests successfully pass when running make e2e-test?:

E2E tests were not run for this PR because of test infrastructure setup requirements. All unit tests pass, and the change was validated as follows:

Unit test coverage is complete:

  • 7 new tests for RAM-shared network functionality
  • 4 existing tests updated with valid ARN formats
  • All tests passing (100% success rate)
  • Tests cover: account detection, ownership checks, read-only behavior, backward compatibility

Production validation complete:

  • Validated in real multi-account AWS environment
  • RAM-shared networks from central networking account successfully discovered
  • Services created and associated with RAM-shared networks (status: ACTIVE)
  • End-to-end workflow validated

Backward compatibility confirmed:

  • Existing unit tests all pass
  • Local network discovery unchanged
  • Existing e2e tests for explicit RAM naming will continue to pass

The repository's existing e2e tests in test/suites/integration/ram_share_test.go provide coverage for RAM sharing scenarios and will continue to pass with these changes.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

Dan Lutsch and others added 3 commits November 5, 2025 14:54
- Modified FindServiceNetwork() to use two-step discovery process
- Added findServiceNetworkViaVPCAssociation() to discover RAM-shared networks
- Added buildServiceNetworkInfo() helper for tag fetching
- Added isLocalServiceNetwork() to detect local vs RAM-shared networks
- Modified UpsertVpcAssociation() to skip ownership checks for RAM-shared networks
- Maintains backward compatibility with existing local network discovery
- No additional IAM permissions required
- Fixed existing tests with valid ARN formats
- Added comprehensive RAM-shared network test coverage
- All tests passing (100% success rate)

rlymbur commented Nov 28, 2025

Thank you for the PR and extensive testing!

Development

Successfully merging this pull request may close these issues.

[FEAT] Support Cross-Account VPC Lattice Service Network Discovery via AWS RAM
