Fix #1048: stop alert engine fabricating 100% host CPU on Linux #1055
Merged
The #1049 fix corrected every path that reads CPU from the collected table (collector, views, chart reads), but the alert engine doesn't read that table: DatabaseService.NocHealth.GetCpuPercentAsync runs its own live query against sys.dm_os_ring_buffers and was never touched. It computes other_cpu_percent = 100 - SystemIdle - ProcessUtilization, and since SystemIdle is always 0 on SQL Server on Linux, that returns 100 - sqlcpu. AlertHealthResult.TotalCpuPercent then sums to a permanent 100%, so AlertStateService's TotalCpuPercent >= CpuThresholdPercent check fires the host-CPU alert forever, which is exactly what the reporter still saw after installing the nightly.

Fix: apply the same Linux guard used by install/18, RemoteCollectorService.Cpu, and FinOps.Inventory. Detect host_platform via sp_executesql behind an OBJECT_ID(N'sys.dm_os_host_info', N'V') check (so SQL 2016 never binds the 2017+ DMV) and return NULL for other_cpu_percent on Linux. The existing TotalCpuPercent getter already falls back to the SQL-only figure when OtherCpuPercent is null, so the alert clears. Windows behavior is unchanged.

Dashboard-only change: no schema or installer impact.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
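The arithmetic behind the stuck alert, and why returning NULL clears it, can be modelled in a few lines. This is only an illustrative Python sketch, not the Dashboard's C# or the T-SQL query: the function and class names are hypothetical stand-ins for the identifiers mentioned above, the `guard_linux` flag exists purely to compare behavior before and after the fix, and the 90% threshold and the sample readings are assumed values.

```python
from dataclasses import dataclass
from typing import Optional


def other_cpu_percent(system_idle: int, process_utilization: int,
                      is_linux: bool, guard_linux: bool) -> Optional[int]:
    """Non-SQL host CPU derived from one ring-buffer sample (hypothetical model)."""
    if guard_linux and is_linux:
        # With the guard, Linux reports "unknown" instead of a fabricated number.
        return None
    return 100 - system_idle - process_utilization


@dataclass
class AlertHealthResult:
    sql_cpu_percent: int
    other_cpu_percent: Optional[int]

    @property
    def total_cpu_percent(self) -> int:
        # Fall back to the SQL-only figure when the host share is unknown.
        if self.other_cpu_percent is None:
            return self.sql_cpu_percent
        return self.sql_cpu_percent + self.other_cpu_percent


def host_cpu_alert_fires(result: AlertHealthResult, threshold_percent: int) -> bool:
    return result.total_cpu_percent >= threshold_percent


THRESHOLD = 90   # assumed alert threshold, for illustration only
SQL_CPU = 12     # assumed ProcessUtilization reading

# Linux sample: SystemIdle is always 0, so the unguarded formula yields 100 - 12 = 88.
before = AlertHealthResult(SQL_CPU, other_cpu_percent(0, SQL_CPU, is_linux=True, guard_linux=False))
after = AlertHealthResult(SQL_CPU, other_cpu_percent(0, SQL_CPU, is_linux=True, guard_linux=True))

# Windows sample with a real idle figure: the guard changes nothing.
windows = AlertHealthResult(SQL_CPU, other_cpu_percent(70, SQL_CPU, is_linux=False, guard_linux=True))
```

In this model `before.total_cpu_percent` is 100 regardless of the SQL CPU reading, so any threshold at or below 100 fires; `after.total_cpu_percent` is just the SQL figure (12 here), and the Windows total (30 here) is the same with or without the guard.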
Problem
The reporter on #1048 installed the #1049 nightly and still saw the host-CPU alert on their OpenSUSE-hosted SQL Server.

#1049 corrected every path that reads CPU from the collected table (install/18 collector, install/47 views, Overview.cs/ResourceMetrics.cs chart reads). But the alert engine doesn't read that table: DatabaseService.NocHealth.GetCpuPercentAsync runs its own live query against sys.dm_os_ring_buffers and was never touched.

It computes other_cpu_percent = 100 - SystemIdle - ProcessUtilization. Since SystemIdle is always 0 on SQL Server on Linux, that returns 100 - sqlcpu. AlertHealthResult.TotalCpuPercent then sums to a permanent 100%, so AlertStateService's TotalCpuPercent >= CpuThresholdPercent check fires the host-CPU alert forever. The chart was fixed; the alert badge was not.

Fix
Apply the same Linux guard already used by install/18, RemoteCollectorService.Cpu, and FinOps.Inventory:

- Detect host_platform via sp_executesql behind an OBJECT_ID(N'sys.dm_os_host_info', N'V') check, so SQL 2016 never binds the 2017+ DMV.
- Return NULL for other_cpu_percent on Linux.

The existing TotalCpuPercent getter already falls back to the SQL-only figure when OtherCpuPercent is null, so the alert clears. Windows behavior is unchanged.

Scope
Dashboard C# only: no schema or installer change. The reporter can verify with just the next nightly Dashboard (no DB re-install needed).
Verification
NULL).
NocHealth.cs was the only remaining live SystemIdle computation in the Dashboard.

🤖 Generated with Claude Code