@himani2411
Contributor

Description of changes

  • Install NVIDIA Fabric Manager on ARM instances.
  • GB200 is an ARM-architecture instance type and requires NVIDIA Fabric Manager.

Tests

  • Unit tests and a manual installation on an instance.

References

  • Link to impacted open issues.
  • Link to related PRs in other packages (i.e. cookbook, node).
  • Link to documentation useful to understand the changes.

Checklist

  • Make sure you are pointing to the right branch.
  • If you're creating a patch for a branch other than develop, add the branch name as a prefix in the PR title (e.g. [release-3.6]).
  • Check all commits' messages are clear, describing what and why vs how.
  • Make sure to have added unit tests or integration tests to cover the new/modified code.
  • Check if documentation is impacted by this change.

Please review the guidelines for contributing and Pull Request Instructions.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@himani2411 himani2411 requested a review from a team as a code owner August 20, 2025 21:54
@himani2411 himani2411 added the 3.x label Aug 20, 2025
CHANGELOG.md Outdated
- Remove `berkshelf`. All cookbooks are local and do not need `berkshelf` dependency management.
- Add support for GB200 instance types.
- Install nvidia-imex for all OSs except AL2.
- Install nvidia-fabricmanager for ARM instance.

ARM instances

hanwen-cluster
hanwen-cluster previously approved these changes Aug 20, 2025
@himani2411 himani2411 enabled auto-merge (squash) August 20, 2025 22:31
@codecov

codecov bot commented Aug 20, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 75.20%. Comparing base (a2651f4) to head (769d4ee).
⚠️ Report is 4 commits behind head on develop.

Additional details and impacted files
@@           Coverage Diff            @@
##           develop    #3014   +/-   ##
========================================
  Coverage    75.20%   75.20%           
========================================
  Files           24       24           
  Lines         2444     2444           
========================================
  Hits          1838     1838           
  Misses         606      606           
Flag Coverage Δ
unittests 75.20% <ø> (ø)

@himani2411 himani2411 merged commit f516bba into aws:develop Aug 20, 2025
30 of 32 checks passed
himani2411 pushed a commit to himani2411/aws-parallelcluster-cookbook that referenced this pull request Aug 22, 2025
himani2411 added a commit that referenced this pull request Aug 22, 2025
* Revert "[Fabric] Install NVIDIA Fabric manager for ARM instances (#3014)"

This reverts commit f516bba.

* [Fabric] We do not enable Nvidia Fabric manager for Gb200 instance

---------

Co-authored-by: Himani Anil Deshpande <[email protected]>