Commit 77c8e1f
Enhance crash telemetry with richer diagnostics and EndBuild hang detection (#13304)
## Summary
When MSBuild crashes via throw-helpers like
`ErrorUtilities.ThrowInternalError`, crash telemetry previously captured
only the throw-helper frame in `StackTop`. That made triage nearly
impossible, because every `InternalErrorException` crash looked identical.
Additionally, when `EndBuild()` hung waiting for submissions or nodes,
no telemetry was emitted at all: the crash telemetry lives in a
`finally` block, which is never reached while the call is hung.
This PR addresses both problems.
## Changes
### Richer crash diagnostics (all crash types)
| Property | Purpose |
|----------|---------|
| `StackCaller` | First meaningful caller frame, skipping known throw-helpers (`ThrowInternalError`, `VerifyThrow`, etc.); this is the frame you actually need for triage |
| `FullStackTrace` | Complete stack trace with file paths sanitized, capped at 4096 chars |
| `ExceptionMessage` | Exception message truncated to 256 chars, with file paths redacted to avoid PII |
| `CrashThreadName` | Thread name at crash time (main, worker, node communication, etc.) |
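As a rough illustration of how `StackCaller` and `FullStackTrace` could be derived, here is a minimal Python sketch. It assumes .NET-style frame lines of the form `at Namespace.Type.Method(...) in <path>:line N`. The function names and the throw-helper list are hypothetical: only the two helper names given above are included, since the full skip list is not spelled out. This is not MSBuild's actual implementation.

```python
import re
from typing import Optional

# Hypothetical subset of throw-helper names: the PR names these two and
# says "etc.", so a real skip list would be longer.
THROW_HELPERS = ("ThrowInternalError", "VerifyThrow")

# Matches the " in <file path>:line N" suffix of a .NET-style frame.
_FRAME_PATH = re.compile(r" in (.+?):line (\d+)")

MAX_FULL_STACK = 4096  # FullStackTrace cap from the PR description


def sanitize_frame(frame: str) -> str:
    """Replace a frame's file path with <redacted>, keeping the line number."""
    return _FRAME_PATH.sub(r" in <redacted>:line \2", frame)


def pick_stack_caller(stack_trace: str) -> Optional[str]:
    """Return the first sanitized frame that is not a known throw-helper."""
    for line in stack_trace.splitlines():
        frame = line.strip()
        if not frame.startswith("at "):
            continue
        # Fully qualified method name is everything before "(".
        method = frame[3:].split("(", 1)[0]
        if any(method.endswith("." + h) for h in THROW_HELPERS):
            continue
        return sanitize_frame(frame)
    return None


def full_stack_trace(stack_trace: str) -> str:
    """Sanitize every line first, then cap the text at MAX_FULL_STACK chars."""
    lines = (sanitize_frame(line) for line in stack_trace.splitlines())
    return "\n".join(lines)[:MAX_FULL_STACK]
```

In this sketch, sanitizing before truncating means the 4096-char cap can never cut a path in half and leave an unredacted fragment behind.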
### EndBuild hang detection
Replaces the untimed `WaitOne()` calls in `EndBuild()` with **timed
30-second wait loops** that emit periodic diagnostic telemetry via
`CrashExitType.EndBuildHang`:
| Property | Purpose |
|----------|---------|
| `EndBuildWaitPhase` | Which wait point is stuck ("WaitingForSubmissions" / "WaitingForNodes") |
| `EndBuildWaitDurationMs` | How long the hang has lasted, in milliseconds |
| `PendingSubmissionCount` | Number of submissions still in the pending dictionary |
| `SubmissionsWithResultNoLogging` | Submissions that have a result but whose `LoggingCompleted` is false; these are the ones blocking `EndBuild` |
| `ThreadExceptionRecorded` | Whether a thread exception has been recorded on the `BuildManager` |
| `UnmatchedProjectStartedCount` | Number of orphaned `ProjectStarted` events (no corresponding `ProjectFinished`) |
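The timed-wait pattern can be sketched in Python with `threading.Event.wait(timeout)`, which returns `False` on timeout instead of blocking forever. This is an illustration only: `wait_with_hang_telemetry`, the `Submission` record, and the `snapshot`/`emit` callbacks are hypothetical stand-ins for MSBuild's wait handles and telemetry plumbing. Only the 30-second interval and the property names come from the description above.

```python
import threading
import time
from dataclasses import dataclass
from typing import Callable, Dict

HANG_REPORT_INTERVAL_S = 30.0  # interval from the PR description


@dataclass
class Submission:  # hypothetical stand-in for a pending build submission
    has_result: bool
    logging_completed: bool


def submission_snapshot(pending: Dict[int, Submission]) -> Dict[str, object]:
    """Count pending submissions and those stuck waiting on logging."""
    return {
        "PendingSubmissionCount": len(pending),
        "SubmissionsWithResultNoLogging": sum(
            1 for s in pending.values() if s.has_result and not s.logging_completed
        ),
    }


def wait_with_hang_telemetry(
    done: threading.Event,
    phase: str,
    snapshot: Callable[[], Dict[str, object]],
    emit: Callable[[Dict[str, object]], None],
    interval_s: float = HANG_REPORT_INTERVAL_S,
) -> None:
    """Block until `done` is set, emitting a hang report after each timeout.

    Each timed wait hands control back to this loop, so diagnostics are
    still sent while the wait is stuck; an untimed wait never returns.
    """
    start = time.monotonic()
    while not done.wait(timeout=interval_s):
        report: Dict[str, object] = {
            "EndBuildWaitPhase": phase,
            "EndBuildWaitDurationMs": int((time.monotonic() - start) * 1000),
        }
        report.update(snapshot())
        emit(report)
```

Because the snapshot is taken fresh on every timeout, each report reflects the state at that moment rather than at the start of the wait.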
Hang state is also persisted to `%TEMP%\MSBuild_pid-{pid}.hang.txt` via
`DumpHangDiagnosticsToFile` for later retrieval from customer machines.
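A minimal Python sketch of persisting a hang snapshot to a file named like `MSBuild_pid-{pid}.hang.txt` in the temp directory. `tempfile.gettempdir()` resolves the temp directory (consulting `TMPDIR`, `TEMP`, and `TMP`), and `os.getpid()` supplies the PID. The function name, the `key=value` line format, and overwriting the file on each report are all assumptions; the PR does not describe the real file layout.

```python
import os
import tempfile
from typing import Dict, Optional


def dump_hang_diagnostics_to_file(
    report: Dict[str, object], directory: Optional[str] = None
) -> str:
    """Write `report` as sorted key=value lines and return the file path."""
    directory = directory or tempfile.gettempdir()
    path = os.path.join(directory, f"MSBuild_pid-{os.getpid()}.hang.txt")
    # Assumption: overwrite so the file holds only the latest hang state.
    with open(path, "w", encoding="utf-8") as f:
        for key, value in sorted(report.items()):
            f.write(f"{key}={value}\n")
    return path
```

Keying the file name on the PID keeps concurrent MSBuild processes from overwriting each other's diagnostics.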
### PII protection
All new telemetry properties are sanitized to prevent PII leaks:
- **Stack frames**: File paths replaced with `<redacted>` (preserves
line numbers)
- **Exception messages**: Regex redaction of Windows (`C:\...`) and Unix
(`/...`) paths → `<path>`
- **MSB0001 prefix stripping**: Removes boilerplate `MSB0001: Internal
MSBuild Error:` prefix
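The message sanitization steps listed above (prefix stripping, path redaction, truncation) could look roughly like this Python sketch. The regexes are simplified assumptions, not the patterns MSBuild uses: a Windows path is a drive letter followed by `:\`, and a Unix path is a `/` not preceded by a word character or dot, so text like `and/or` or `1/2` is left alone. Both stop at whitespace, so a path containing spaces would only be partly redacted.

```python
import re

MAX_MESSAGE_LEN = 256  # ExceptionMessage limit from the PR description
MSB0001_PREFIX = "MSB0001: Internal MSBuild Error:"

# Simplified, hypothetical path patterns; both stop at whitespace or quotes.
_WINDOWS_PATH = re.compile(r"[A-Za-z]:\\[^\s'\"]*")
_UNIX_PATH = re.compile(r"(?<![\w.])/[^\s'\"]+")


def sanitize_exception_message(message: str) -> str:
    """Strip the MSB0001 boilerplate, redact paths, then truncate."""
    text = message.strip()
    if text.startswith(MSB0001_PREFIX):
        text = text[len(MSB0001_PREFIX):].lstrip()
    text = _WINDOWS_PATH.sub("<path>", text)
    text = _UNIX_PATH.sub("<path>", text)
    # Truncate last so a cut can never expose part of an unredacted path.
    return text[:MAX_MESSAGE_LEN]
```

For example, `"MSB0001: Internal MSBuild Error: Could not load C:\Users\alice\proj.csproj from /home/alice/src"` becomes `"Could not load <path> from <path>"`.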
---------
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

1 parent: 074bba0
5 files changed (+814, -2 lines) under `src/`:
- `Build/BackEnd/BuildManager`
- `Framework.UnitTests`
- `Framework/Telemetry`
Diff excerpt (line content not captured): hunk 1 spans original lines 1038-1046 (new 1038-1059; 2 lines removed, 15 added); hunk 2 inserts 47 lines after original line 1232 (new lines 1246-1292).