@benle7 benle7 commented Oct 26, 2025

This PR is dependent on the following PRs:
sonic-net/sonic-platform-common#605
sonic-net/sonic-buildimage#24345

What I did

Added BMC dump collection to the show techsupport output.
Introduced new CLI commands for retrieving BMC information.

How I did it

Updated the generate_dump script to trigger asynchronous BMC dump collection at the beginning of the techsupport process and to collect the dump before packaging the final tarball.
Implemented new BMC CLI commands under show platform bmc by extending the existing CLI framework.
The CLIs query the BMC API internally to retrieve the BMC summary and EEPROM data.
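
For readers skimming the description, the two-phase flow can be pictured with a minimal Python sketch. It only mirrors the `-m trigger` and `-m collect -p <path> -t <task-id>` invocations that appear in the generate_dump hunks of this PR. The function names `trigger_dump` and `collect_dump`, the argparse `dest` names, and the placeholder task ID are assumptions made for illustration, and the Redfish requests are stubbed out rather than taken from bmc_techsupport.py.

```python
import argparse


def build_parser():
    """Argument parser mirroring the two modes that generate_dump invokes."""
    parser = argparse.ArgumentParser(description="BMC debug log dump helper (sketch)")
    parser.add_argument("-m", dest="mode", choices=["trigger", "collect"], required=True)
    parser.add_argument("-p", dest="path", help="destination path for the dump (collect mode)")
    parser.add_argument("-t", dest="task_id", help="task ID returned by trigger (collect mode)")
    return parser


def trigger_dump():
    """Hypothetical stand-in for the Redfish POST; returns a placeholder task ID."""
    return "0"


def collect_dump(task_id, path):
    """Hypothetical stand-in for waiting on the task and saving the dump."""
    return f"would wait for task {task_id} and save the dump to {path}"


def main(argv=None):
    args = build_parser().parse_args(argv)
    if args.mode == "trigger":
        # Phase 1: start the dump early and hand the task ID back to the caller.
        return trigger_dump()
    # Phase 2: needs both the task ID from phase 1 and a destination path.
    if not args.path or not args.task_id:
        raise SystemExit("collect mode needs both -p and -t")
    return collect_dump(args.task_id, args.path)
```

Splitting the work this way presumably lets the BMC build its dump while the rest of techsupport runs, so the collect step only waits for whatever time is left.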

How to verify it

show techsupport
show platform bmc summary
show platform bmc eeprom

Previous command output (if the output of a command-line utility has changed)

New command output (if the output of a command-line utility has changed)

@mssonicbld
Collaborator

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@benle7 benle7 changed the title from "Integrate BMC support and Redfish APIs into SONiC" to "[BMC] Add BMC CLI and integrate BMC dump into show techsupport" Oct 26, 2025
@benle7 benle7 changed the title from "[BMC] Add BMC CLI and integrate BMC dump into show techsupport" to "[BMC] Add BMC CLIs and integrate BMC dump into show techsupport" Oct 26, 2025
@benle7 benle7 marked this pull request as ready for review October 27, 2025 08:04

benle7 commented Nov 6, 2025

@judyjoseph @yxieca
Could you please help review?

@nate-nexthop nate-nexthop left a comment

LGTM, thanks!

@lguohan lguohan requested a review from Copilot November 8, 2025 20:43

Copilot AI left a comment

Pull Request Overview

This pull request adds BMC (Baseboard Management Controller) support to the SONiC utilities, enabling BMC information retrieval and BMC debug log collection during techsupport generation.

  • Adds new CLI commands show platform bmc summary and show platform bmc eeprom to display BMC information
  • Integrates BMC debug log collection into the generate_dump script workflow
  • Introduces bmc_techsupport.py script to trigger and collect BMC debug logs via Redfish API

Reviewed Changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 14 comments.

| File | Description |
| --- | --- |
| show/platform.py | Adds BMC command group with summary and eeprom subcommands to display BMC hardware and firmware information |
| scripts/bmc_techsupport.py | New script implementing BMC debug log dump triggering and collection using Redfish API |
| scripts/generate_dump | Adds BMC support check, triggers BMC debug log dump at start, and collects logs during techsupport generation |
| tests/show_platform_test.py | Comprehensive test coverage for BMC commands including regular/JSON output and error scenarios |
| setup.py | Registers the new bmc_techsupport.py script for installation |

Comment on lines 1972 to 1976
###############################################################################
# Trigger BMC debug log dump task
# Globals:
# TAR, TARFILE, DUMPDIR, BASE, TARDIR, TECHSUPPORT_TIME_INFO
# Arguments:
# None
# Returns:
# None
###############################################################################

Copilot AI Nov 8, 2025

The Globals section lists variables (TAR, TARFILE, DUMPDIR, BASE, TARDIR, TECHSUPPORT_TIME_INFO) that are not actually used in the trigger_bmc_debug_log_dump function. This function doesn't use any of these global variables. The Globals documentation should be updated to accurately reflect what's used or removed if no globals are used.

Copilot uses AI. Check for mistakes.
Contributor Author

Fixed

Comment on lines 216 to 232
sys.modules['sonic_platform'] = mock_sonic_platform
sys.modules['sonic_platform.platform'] = mock_sonic_platform.platform

mock_platform = mock.MagicMock()
mock_chassis = mock.MagicMock()
mock_bmc = mock.MagicMock()

mock_platform.get_chassis.return_value = mock_chassis
mock_chassis.get_bmc.return_value = mock_bmc
mock_bmc.get_eeprom.return_value = self.TEST_BMC_EEPROM_INFO
mock_bmc.get_version.return_value = self.TEST_BMC_VERSION
mock_sonic_platform.platform.Platform.return_value = mock_platform

result = CliRunner().invoke(show.cli.commands['platform'].commands['bmc'].commands['summary'], [])
assert result.exit_code == 0, result.output
assert result.output == textwrap.dedent(expected_output)

Copilot AI Nov 8, 2025

The test methods modify sys.modules to mock the sonic_platform module but never clean it up. This can cause test pollution where mocked modules from one test affect subsequent tests. Consider using a fixture or teardown to restore the original sys.modules state after each test, or use unittest.mock.patch.dict to automatically clean up:

with mock.patch.dict('sys.modules', {
    'sonic_platform': mock_sonic_platform,
    'sonic_platform.platform': mock_sonic_platform.platform
}):
    # test code here
Suggested change
- sys.modules['sonic_platform'] = mock_sonic_platform
- sys.modules['sonic_platform.platform'] = mock_sonic_platform.platform
- mock_platform = mock.MagicMock()
- mock_chassis = mock.MagicMock()
- mock_bmc = mock.MagicMock()
- mock_platform.get_chassis.return_value = mock_chassis
- mock_chassis.get_bmc.return_value = mock_bmc
- mock_bmc.get_eeprom.return_value = self.TEST_BMC_EEPROM_INFO
- mock_bmc.get_version.return_value = self.TEST_BMC_VERSION
- mock_sonic_platform.platform.Platform.return_value = mock_platform
- result = CliRunner().invoke(show.cli.commands['platform'].commands['bmc'].commands['summary'], [])
- assert result.exit_code == 0, result.output
- assert result.output == textwrap.dedent(expected_output)
+ with mock.patch.dict('sys.modules', {
+     'sonic_platform': mock_sonic_platform,
+     'sonic_platform.platform': mock_sonic_platform.platform
+ }):
+     mock_platform = mock.MagicMock()
+     mock_chassis = mock.MagicMock()
+     mock_bmc = mock.MagicMock()
+     mock_platform.get_chassis.return_value = mock_chassis
+     mock_chassis.get_bmc.return_value = mock_bmc
+     mock_bmc.get_eeprom.return_value = self.TEST_BMC_EEPROM_INFO
+     mock_bmc.get_version.return_value = self.TEST_BMC_VERSION
+     mock_sonic_platform.platform.Platform.return_value = mock_platform
+     result = CliRunner().invoke(show.cli.commands['platform'].commands['bmc'].commands['summary'], [])
+     assert result.exit_code == 0, result.output
+     assert result.output == textwrap.dedent(expected_output)
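
A small self-contained check of the cleanup behaviour, independent of the SONiC test suite, may help here: `mock.patch.dict` removes the injected key again when the `with` block exits. The module name `hypothetical_sonic_platform` and the helper `run_with_fake_module` are made up for this illustration and are not part of the PR.

```python
import sys
from unittest import mock


def run_with_fake_module(name):
    """Inject a fake module under `name`, then report what sys.modules looks like."""
    fake = mock.MagicMock()
    with mock.patch.dict("sys.modules", {name: fake}):
        # Inside the block the fake module is what any import of `name` would see.
        visible_inside = sys.modules[name] is fake
    # After the block, patch.dict has restored the original contents of sys.modules.
    still_present_after = name in sys.modules
    return visible_inside, still_present_after


inside, after = run_with_fake_module("hypothetical_sonic_platform")
print(inside, after)  # prints: True False
```

A plain `sys.modules[name] = fake` assignment would leave the entry in place for every later test in the same process, which is the pollution the comment above describes.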

Contributor Author

Fixed

fi
# Trigger BMC redfish API to start BMC debug log dump task
local task_id=$(python3 /usr/local/bin/bmc_techsupport.py -m trigger)
echo $task_id

Copilot AI Nov 8, 2025

The variable $task_id should be quoted to prevent potential issues with word splitting or globbing:

echo "$task_id"
Suggested change
- echo $task_id
+ echo "$task_id"

Contributor Author

Fixed

The usage of this script is divided into two parts:
1. Triggering BMC debug log dump Redfish task
* In this case the script, triggers a POST request to BMC to start collecting debug log dump.

Copilot AI Nov 8, 2025

There's an unnecessary comma in the sentence: "In this case the script, triggers" should be "In this case the script triggers".

Suggested change
- * In this case the script, triggers a POST request to BMC to start collecting debug log dump.
+ * In this case the script triggers a POST request to BMC to start collecting debug log dump.

Contributor Author

Fixed

2. Collecting BMC debug log dump
* In this step we will wait for the task-id to finish if it has not finished.
* Blocking action till we get the file or having ERROR or Timeout.

Copilot AI Nov 8, 2025

The phrase "having ERROR or Timeout" is grammatically awkward. Consider rephrasing to "encountering an ERROR or Timeout" or "until an ERROR or Timeout occurs".

Suggested change
- * Blocking action till we get the file or having ERROR or Timeout.
+ * Blocking action until we get the file or encounter an ERROR or Timeout.

Contributor Author

Fixed

[ -f "$TARBALL_XZ" ] && rm -f "$TARBALL_XZ"

# Invoke BMC redfish API to extract BMC debug log dump to "/tmp/bmc_debug_log_dump.tar.xz"
python3 /usr/local/bin/bmc_techsupport.py -m collect -p "$TARBALL_XZ" -t $bmc_debug_log_dump_task_id

Copilot AI Nov 8, 2025

The variable $bmc_debug_log_dump_task_id should be quoted to prevent potential issues with word splitting or globbing. While the task ID is unlikely to contain spaces, it's a bash best practice to quote variables:

python3 /usr/local/bin/bmc_techsupport.py -m collect -p "$TARBALL_XZ" -t "$bmc_debug_log_dump_task_id"
Suggested change
- python3 /usr/local/bin/bmc_techsupport.py -m collect -p "$TARBALL_XZ" -t $bmc_debug_log_dump_task_id
+ python3 /usr/local/bin/bmc_techsupport.py -m collect -p "$TARBALL_XZ" -t "$bmc_debug_log_dump_task_id"

Contributor Author

Fixed

# Globals:
# TAR, TARFILE, DUMPDIR, BASE, TARDIR, TECHSUPPORT_TIME_INFO
# Arguments:
# None

Copilot AI Nov 8, 2025

The function documentation states "Arguments: None" but the function accepts one argument $1 (the BMC debug log dump task ID) as seen on line 2009. The documentation should be updated to reflect this:

# Arguments:
#  $1 - BMC debug log dump task ID
Suggested change
- # None
+ # $1 - BMC debug log dump task ID

Contributor Author

Fixed

Comment on lines +85 to +98
try:
    import sonic_platform
    chassis = sonic_platform.platform.Platform().get_chassis()
    bmc = chassis.get_bmc()

    if bmc is None:
        click.echo("BMC is not available on this platform")
        return

    eeprom_info = bmc.get_eeprom()
    if not eeprom_info:
        click.echo("Failed to retrieve BMC EEPROM information")
        return

Copilot AI Nov 8, 2025

[nitpick] The bmc_summary and eeprom functions have significant code duplication in their initialization and error handling logic (getting the chassis, BMC, and EEPROM info). Consider extracting this common logic into a helper function to improve maintainability:

def _get_bmc_eeprom_info():
    """Helper function to get BMC and EEPROM information."""
    import sonic_platform
    chassis = sonic_platform.platform.Platform().get_chassis()
    bmc = chassis.get_bmc()
    
    if bmc is None:
        return None, None, "BMC is not available on this platform"
    
    eeprom_info = bmc.get_eeprom()
    if not eeprom_info:
        return None, None, "Failed to retrieve BMC EEPROM information"
    
    return bmc, eeprom_info, None

Contributor Author

Because the helper function cannot carry the @click.option decorator, it would have to signal the kind of error through its return values, and the original functions would still need to check those values and log accordingly. The refactor would therefore not reduce the code significantly.

trigger_bmc_debug_log_dump() {
    trap 'handle_error $? $LINENO' ERR
    if ! is_bmc_supported; then
        echo "INFO: BMC is not found on this platform. Skipping..."

Copilot AI Nov 8, 2025

When BMC is not supported, the function echoes an INFO message and returns, which causes that message to be captured as the task_id value at line 2255. This leads to incorrect behavior:

  1. At line 2256, the check for "-1" fails (since task_id is "INFO: BMC is not found...")
  2. At line 2474, the check != "-1" succeeds, causing collect_bmc_files to be called with the INFO message as the task_id

The function should return "-1" explicitly without echoing the INFO message, or restructure to avoid this issue:

if ! is_bmc_supported; then
    echo "-1"
    return
fi

Then move the INFO message to the caller (lines 2254-2258), which can also check for empty or invalid return values.

Suggested change
- echo "INFO: BMC is not found on this platform. Skipping..."
+ echo "-1"

Contributor Author

It is true that collect_bmc_files will be called with the INFO message as the task_id.
However, collect_bmc_files checks is_bmc_supported at the top, so this task_id is never used.
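
The hazard discussed in this thread is that anything a function writes to stdout becomes part of the captured value. A minimal Python sketch of the same effect follows, using a child process in place of the bash command substitution. The names `CHILD_MIXED`, `CHILD_SPLIT`, and `captured_task_id` are hypothetical, and sending the INFO line to stderr is only one possible remedy, not the change made in this PR.

```python
import subprocess
import sys

# Child that writes its log line to stdout, like the echo in the unsupported-BMC branch.
CHILD_MIXED = 'print("INFO: BMC is not found on this platform. Skipping...")'

# Child that keeps stdout for the result only and sends the log line to stderr.
CHILD_SPLIT = (
    "import sys\n"
    'print("INFO: BMC is not found on this platform. Skipping...", file=sys.stderr)\n'
    'print("-1")\n'
)


def captured_task_id(child_source):
    """Run a child Python process and return its stdout, much as $(...) would in bash."""
    result = subprocess.run(
        [sys.executable, "-c", child_source],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


print(repr(captured_task_id(CHILD_MIXED)))  # the log line ends up as the "task ID"
print(repr(captured_task_id(CHILD_SPLIT)))  # only the sentinel value is captured
```

With the first child, a comparison against "-1" fails because the captured string is the log message. With the second, the sentinel comparison works and the log line still reaches the operator through stderr.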

Comment on lines 1953 to 1959
###############################################################################
# Check BMC presence
# Globals:
# TAR, TARFILE, DUMPDIR, BASE, TARDIR, TECHSUPPORT_TIME_INFO
# Arguments:
# None
# Returns:
# None
###############################################################################

Copilot AI Nov 8, 2025

The Globals section lists variables (TAR, TARFILE, DUMPDIR, BASE, TARDIR, TECHSUPPORT_TIME_INFO) that are not actually used in the is_bmc_supported function. This function only uses the platform variable. The Globals documentation should be updated to accurately reflect what's used or removed if no globals are used.

Contributor Author

Fixed
