Contributor

@byu343 byu343 commented Oct 7, 2025

What I did

Check SSD health using ssdutil before warm-reboot

How I did it

Check the SSD's health based on the output of ssdutil. Stop warm-reboot early if the reported health value is 0.

How to verify it

The check is skipped if the ssdutil command returns an error.
The added lines correctly parse ssdutil output in the format "Health : X%" or "Health : X.Y%".
The check blocks fast-reboot/warm-reboot if the extracted health value is 0.
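
For illustration, the three behaviours listed above could be exercised with a small standalone sketch like the one below. It is not the code from this PR: the function names `parse_ssd_health` and `ssd_health_allows_reboot` are hypothetical, and the sketch assumes the health line looks like "Health : X%" or "Health : X.Y%", as described above.

```shell
#!/bin/bash

# Hypothetical helper (not part of the PR): extract the numeric health value
# from ssdutil-style output passed in as a string.
# Prints e.g. "95" or "99.5"; prints nothing if no matching Health line exists.
parse_ssd_health() {
    local output="$1"
    local line
    line=$(printf '%s\n' "$output" \
        | grep -E 'Health[[:space:]]*:[[:space:]]*[0-9]+\.?[0-9]*%' \
        | head -n 1 || true)
    # No Health line found: emit nothing so the caller can skip the check.
    [ -n "$line" ] || return 0
    printf '%s\n' "$line" \
        | sed -E 's/.*Health[[:space:]]*:[[:space:]]*([0-9]+\.?[0-9]*)%.*/\1/'
}

# Hypothetical helper (not part of the PR): return 0 if the reboot may proceed,
# 1 if the extracted health value is zero.
ssd_health_allows_reboot() {
    local health="$1"
    # An empty value means the check could not run, so it is skipped.
    [ -n "$health" ] || return 0
    # awk compares numerically, so "0" and "0.0" are both treated as zero.
    awk -v h="$health" 'BEGIN { exit ((h + 0 == 0) ? 1 : 0) }'
}
```

A caller would pass the captured ssdutil output to `parse_ssd_health` and abort the reboot only when `ssd_health_allows_reboot` returns non-zero; an empty result (command failed or format not recognised) lets the reboot continue.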

Previous command output (if the output of a command-line utility has changed)

New command output (if the output of a command-line utility has changed)

@mssonicbld
Collaborator

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@vaibhavhd vaibhavhd requested a review from judyjoseph October 7, 2025 19:38
debug "SSD Health is $health_value% — OK."
else
error "Warning: Health is $health_value% — Possible drive failure!"
exit "${EXIT_SDD_HEALTH_FAILURE}"
Contributor

From Boyang: this is a very basic, minimal check that can be done to prevent the issue.

The core idea is to collect debug information first, so that a solution that really prevents these issues can be implemented afterwards.

# Check SSD health
if [ -x "${SSD_UTIL}" ]; then
debug "Checking ssd health before ${REBOOT_TYPE}..."
health_line=$(${SSD_UTIL} | grep -E "Health\s*:\s*[0-9]+\.?[0-9]*%" || true)
Contributor

  1. What is the runtime of this new utility? Could it considerably extend the overall warm-reboot runtime?

PLATFORM=$(sonic-cfggen -H -v DEVICE_METADATA.localhost.platform)
PLATFORM_PLUGIN="${REBOOT_TYPE}_plugin"
LOG_SSD_HEALTH="/usr/local/bin/log_ssd_health"
SSD_UTIL="/usr/local/bin/ssdutil"
Contributor

Should we backport these changes?
