Commit 77c8e1f

Enhance crash telemetry with richer diagnostics and EndBuild hang detection (#13304)
## Summary

When MSBuild crashes via throw-helpers like `ErrorUtilities.ThrowInternalError`, crash telemetry previously captured only the throw-helper frame in `StackTop`, which made triage nearly impossible because all `InternalErrorException` crashes look identical. Additionally, when `EndBuild()` hangs waiting for submissions or nodes, no telemetry was emitted at all because the crash telemetry in the `finally` block is unreachable during a hang.

This PR addresses both problems.

## Changes

### Richer crash diagnostics (all crash types)

| Property | Purpose |
|----------|---------|
| `StackCaller` | First meaningful caller frame, skipping known throw-helpers (`ThrowInternalError`, `VerifyThrow`, etc.); this is the frame you actually need for triage |
| `FullStackTrace` | Complete stack trace with file paths sanitized, capped at 4096 chars |
| `ExceptionMessage` | Truncated exception message (256 chars) with file paths redacted to avoid PII |
| `CrashThreadName` | Thread name at crash time (main, worker, node communication, etc.) |

### EndBuild hang detection

Replaces infinite `WaitOne()` calls in `EndBuild()` with **timed 30-second loops** that emit periodic diagnostic telemetry via `CrashExitType.EndBuildHang`:

| Property | Purpose |
|----------|---------|
| `EndBuildWaitPhase` | Which wait point is stuck ("WaitingForSubmissions" / "WaitingForNodes") |
| `EndBuildWaitDurationMs` | How long the hang has lasted |
| `PendingSubmissionCount` | Submissions still in the pending dictionary |
| `SubmissionsWithResultNoLogging` | Submissions that have a result but `LoggingCompleted` is false (the ones blocking EndBuild) |
| `ThreadExceptionRecorded` | Whether a thread exception exists on the BuildManager |
| `UnmatchedProjectStartedCount` | Orphaned ProjectStarted events (no corresponding ProjectFinished) |

Hang state is also persisted to `%TEMP%\MSBuild_pid-{pid}.hang.txt` via `DumpHangDiagnosticsToFile` for later retrieval from customer machines.

### PII protection

All new telemetry properties are sanitized to prevent PII leaks:

- **Stack frames**: File paths replaced with `<redacted>` (preserves line numbers)
- **Exception messages**: Regex redaction of Windows (`C:\...`) and Unix (`/...`) paths → `<path>`
- **MSB0001 prefix stripping**: Removes boilerplate `MSB0001: Internal MSBuild Error:` prefix

---------

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
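The exception-message sanitization described above (prefix stripping, path redaction, 256-character cap) can be sketched roughly as follows. This is a minimal illustration, not the code from this commit: the class name `MessageSanitizerSketch`, the method `Sanitize`, and the regular expression are all assumptions made for the example, and the pattern will only partially redact paths that contain spaces.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration only; names and pattern are assumptions,
// not the implementation added by this commit.
public static class MessageSanitizerSketch
{
    private const string InternalErrorPrefix = "MSB0001: Internal MSBuild Error:";
    private const int MaxMessageLength = 256;

    // First alternative: Windows drive-rooted paths such as C:\dir\file.
    // Second alternative: Unix absolute paths such as /dir/file, but only when the
    // leading slash is not preceded by a word character or another slash.
    private static readonly Regex PathPattern = new Regex(
        @"[A-Za-z]:\\[^\s""'<>|]*|(?<![\w/])/[^\s""'<>|]+",
        RegexOptions.Compiled);

    public static string Sanitize(string message)
    {
        if (string.IsNullOrEmpty(message))
        {
            return string.Empty;
        }

        string result = message.Trim();

        // Drop the boilerplate internal-error prefix so messages are easier to group.
        if (result.StartsWith(InternalErrorPrefix, StringComparison.Ordinal))
        {
            result = result.Substring(InternalErrorPrefix.Length).TrimStart();
        }

        // Replace anything that looks like a file path with a fixed placeholder.
        result = PathPattern.Replace(result, "<path>");

        // Cap the length after redaction so the cap applies to what is actually sent.
        return result.Length <= MaxMessageLength
            ? result
            : result.Substring(0, MaxMessageLength);
    }
}
```

Redacting before truncating is a deliberate choice in this sketch: truncating first could cut a path in half and leave a fragment that the pattern no longer recognizes.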
1 parent 074bba0 commit 77c8e1f

5 files changed

+814
-2
lines changed

src/Build/BackEnd/BuildManager/BuildManager.cs

Lines changed: 62 additions & 2 deletions
@@ -1038,9 +1038,22 @@ public void EndBuild()
                     }
                 }
 
-                _noActiveSubmissionsEvent!.WaitOne();
+                {
+                    Stopwatch hangWatch = Stopwatch.StartNew();
+                    while (!_noActiveSubmissionsEvent!.WaitOne(CrashTelemetryRecorder.EndBuildHangDiagnosticsIntervalMs))
+                    {
+                        EmitEndBuildHangDiagnostics("WaitingForSubmissions", hangWatch);
+                    }
+                }
+
                 ShutdownConnectedNodes(false /* normal termination */);
-                _noNodesActiveEvent!.WaitOne();
+                {
+                    Stopwatch hangWatch = Stopwatch.StartNew();
+                    while (!_noNodesActiveEvent!.WaitOne(CrashTelemetryRecorder.EndBuildHangDiagnosticsIntervalMs))
+                    {
+                        EmitEndBuildHangDiagnostics("WaitingForNodes", hangWatch);
+                    }
+                }
 
                 // Wait for all of the actions in the work queue to drain.
                 // _workQueue.Completion.Wait() could throw here if there was an unhandled exception in the work queue,
@@ -1230,6 +1243,53 @@ private void RecordCrashTelemetry(Exception exception, bool isUnhandled)
                 host);
         }
 
+        /// <summary>
+        /// Extracts build state under lock and delegates to <see cref="CrashTelemetryRecorder"/>
+        /// for EndBuild hang diagnostic telemetry emission. Also writes diagnostics to disk
+        /// via <see cref="ExceptionHandling.DumpHangDiagnosticsToFile"/>.
+        /// </summary>
+        private void EmitEndBuildHangDiagnostics(string waitPhase, Stopwatch hangWatch)
+        {
+            int pendingSubmissionCount;
+            int submissionsWithResultNoLogging = 0;
+            bool threadExceptionRecorded;
+            int unmatchedProjectStartedCount;
+            string? host;
+
+            lock (_syncLock)
+            {
+                foreach (BuildSubmissionBase submission in _buildSubmissions.Values)
+                {
+                    if (submission.BuildResultBase is not null && !submission.LoggingCompleted)
+                    {
+                        submissionsWithResultNoLogging++;
+                    }
+                }
+
+                pendingSubmissionCount = _buildSubmissions.Count;
+                threadExceptionRecorded = _threadException is not null;
+                unmatchedProjectStartedCount = _projectStartedEvents.Count;
+                host = _buildTelemetry?.BuildEngineHost ?? BuildEnvironmentState.GetHostName();
+            }
+
+            string diagnostics = $"Phase={waitPhase}, Duration={hangWatch.ElapsedMilliseconds}ms, " +
+                $"PendingSubmissions={pendingSubmissionCount}, WithResultNoLogging={submissionsWithResultNoLogging}, " +
+                $"ThreadException={threadExceptionRecorded}, UnmatchedProjectStarted={unmatchedProjectStartedCount}";
+
+            ExceptionHandling.DumpHangDiagnosticsToFile(diagnostics);
+
+            CrashTelemetryRecorder.CollectAndEmitEndBuildHangDiagnostics(
+                waitPhase,
+                hangWatch.ElapsedMilliseconds,
+                pendingSubmissionCount,
+                submissionsWithResultNoLogging,
+                threadExceptionRecorded,
+                unmatchedProjectStartedCount,
+                ProjectCollection.Version?.ToString(),
+                NativeMethodsShared.FrameworkName,
+                host);
+        }
+
 
         /// <summary>
         /// Convenience method. Submits a lone build request and blocks until results are available.