@lotus-nexthop lotus-nexthop commented Oct 29, 2025

Upstream commits:

The patch had to be adapted to the v6.1 kernel we're using. That mostly meant adding the entire contents (5 constants) of `fch.h`, since the file doesn't exist in v6.1, and updating the `amd.c` hunk for context.

Testing

If we intentionally trigger a CPU soft reset (with `sudo reboot -f`), we see this:

admin@gold208-dut:~$ sudo dmesg | grep -i reason
[    0.635233] x86/amd: Previous system reset reason [0x00080800]: software wrote 0x6 to reset control register 0xCF9

If we intentionally trigger the CPU FCH watchdog, we see this:

admin@gold208-dut:~$ sudo dmesg | grep reason
[    0.632563] x86/amd: Previous system reset reason [0x02000800]: hardware watchdog timer expired
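
The two register values above share bit 11 and differ in one higher bit each, which can be checked with a minimal sketch. The helper names are hypothetical and not part of the patch, and the bit-to-message mapping for bits 19 and 25 is an assumption inferred only from the two log lines quoted here, not copied from the kernel source.

```python
# Hypothetical helper, not part of the patch: lists the bits set in a reset
# reason value and labels the two bits whose meaning the logs above suggest.
# ASSUMPTION: the mapping below is inferred from the two dmesg lines only.
KNOWN_REASONS = {
    19: "software wrote 0x6 to reset control register 0xCF9",
    25: "hardware watchdog timer expired",
}


def set_bits(value):
    """Return the positions of all set bits in a 32-bit value, lowest first."""
    return [bit for bit in range(32) if value & (1 << bit)]


def describe(value):
    """Map each set bit to a known message, or mark it as not decoded here."""
    return {
        bit: KNOWN_REASONS.get(bit, "not decoded in this sketch")
        for bit in set_bits(value)
    }


if __name__ == "__main__":
    for raw in (0x00080800, 0x02000800):
        print(f"{raw:#010x}: {describe(raw)}")
```

Bit 11 is set in both samples, so it is presumably unrelated to the printed reason; this sketch leaves it undecoded rather than guessing.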

To enable the watchdog, we create `/etc/systemd/system.conf.d/override.conf` with the contents:

[Manager]
RuntimeWatchdogSec=default
WatchdogDevice=/dev/watchdog1

To trigger the watchdog, run `sudo tee /dev/watchdog1`, enter just one character, and then leave the device alone for a minute or so.

Upstream from here:

https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git/commit/?id=ab8131028710d009ab93d6bffd2a2749ade909b0

@mssonicbld

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@lotus-nexthop lotus-nexthop marked this pull request as ready for review October 29, 2025 02:08
@lotus-nexthop lotus-nexthop requested a review from a team as a code owner October 29, 2025 02:08
@paulmenzel

How can these events be triggered?

From: Yazen Ghannam <[email protected]>
Date: Tue, 22 Apr 2025 18:48:30 -0500
Subject: [PATCH 1/2] x86/CPU/AMD: Print the reason for the last reset

Please add the upstream commit hash as done for stable series commits.

Thanks @paulmenzel , please take a look at 46ca756 to see if that is what you had in mind.

@nate-nexthop

> How can these events be triggered?

As I understand it, writing 0x6 to 0xCF9 is a standard way of rebooting an x86 CPU. I can trigger this with `sudo reboot -f` on SONiC with this CPU.

I can trigger the FCH watchdog on SONiC with an AMD Zen3 CPU by enabling the watchdog and never petting it.
As a hack, I can do this:
run `sudo tee /dev/watchdog1`, enter just one character, and let the device be for a minute or so.
After the reboot, I see this:
[ 0.613853] x86/amd: Previous system reset reason [0x02000800]: hardware watchdog timer expired
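
For anyone who wants to check this line automatically after a reboot, here is a minimal sketch that pulls the raw value and the message out of it. The function name and regex are hypothetical; the line shape is assumed from the dmesg output quoted in this thread and may differ on other kernels.

```python
import re

# ASSUMPTION: line shape taken from the dmesg output quoted in this thread.
REASON_RE = re.compile(
    r"x86/amd: Previous system reset reason \[(0x[0-9a-fA-F]+)\]: (.+)"
)


def parse_reset_reason(line):
    """Return (raw_value, message) from a dmesg line, or None if no match."""
    match = REASON_RE.search(line)
    if match is None:
        return None
    return int(match.group(1), 16), match.group(2).strip()


if __name__ == "__main__":
    sample = (
        "[ 0.613853] x86/amd: Previous system reset reason "
        "[0x02000800]: hardware watchdog timer expired"
    )
    print(parse_reset_reason(sample))
```

Feeding it each line of `sudo dmesg` output and keeping the first non-`None` result would give a test script something to assert on.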

@mssonicbld

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).
