Mgmt package tests are silently skipped on PR builds, hiding cross-package failures #57658
Summary
Management-plane package tests are never executed during PR builds due to LimitForPullRequest: true in every ci.mgmt.yml. Combined with reduced test matrices for fork PRs and indirect packages, cross-package breaking changes can merge undetected and only surface later in scheduled builds.
This was discovered while investigating PR #57317. An EventHubs serialization change (PR #46169) had broken the HealthcareApis recorded tests, but nobody saw the failure because no HealthcareApis tests ran on that PR.
Three Mechanisms That Reduce Test Coverage
1. LimitForPullRequest: true — mgmt tests completely skipped on PRs
File: eng/pipelines/templates/jobs/ci.yml:136
- ${{ if or(ne(variables['Build.Reason'], 'PullRequest'), ne(parameters.LimitForPullRequest, 'true')) }}:
All ci.mgmt.yml files set LimitForPullRequest: true (template: eng/templates/Azure.ResourceManager.Template/content/ci.mgmt.yml). The guarded jobs are included only when the build is not a PR or LimitForPullRequest is not 'true'. As a result, the entire test matrix and the compliance jobs are skipped on every PR build of a mgmt package.
Impact: Mgmt package test failures are only caught by scheduled/CI builds, not during code review.
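The bash sketch below makes the condition concrete. It mirrors the template expression and prints a truth table for a few Build.Reason values. The function name test_jobs_included is hypothetical. The sketch also simplifies one detail: Azure Pipelines compares strings case-insensitively, while this bash comparison is case-sensitive.

```shell
#!/usr/bin/env bash
# Approximates the ci.yml guard:
#   or(ne(variables['Build.Reason'], 'PullRequest'), ne(parameters.LimitForPullRequest, 'true'))
# Returns 0 (jobs included) unless the build is a PR AND LimitForPullRequest is 'true'.
# Simplification: Azure Pipelines compares strings case-insensitively; this does not.
test_jobs_included() {  # test_jobs_included <build_reason> <limit_for_pr>
  [[ $1 != PullRequest || $2 != true ]]
}

# Print the truth table for a few real Build.Reason values.
for reason in PullRequest Schedule IndividualCI; do
  for limit in true false; do
    if test_jobs_included "$reason" "$limit"; then state=included; else state=skipped; fi
    printf '%-12s LimitForPullRequest=%-5s -> %s\n' "$reason" "$limit" "$state"
  done
done
```

Only the PullRequest row with LimitForPullRequest=true reports skipped. That is exactly the combination every ci.mgmt.yml produces on a PR.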
2. System.TeamProject — fork PRs get reduced builds
Files:
- eng/pipelines/templates/stages/archetype-sdk-client.yml:91
- eng/pipelines/templates/variables/globals.yml:7-10
When a PR comes from a fork (and therefore runs in the public Azure DevOps project instead of the internal one):
- Only Debug configuration runs (no Release)
- Release/prerelease stages skipped
- CodeQL, ComponentGovernance, workflow enforcement skipped
- API review creation and package validation skipped
Fork repos are detected via Build.Repository.Name ending in -pr.
3. Indirect package sparse matrix — reduced coverage for transitive deps
File: eng/pipelines/templates/jobs/ci.yml:189-190
PRMatrixIndirectFilters:
- 'AdditionalTestArguments=.*true'
- 'Pool=.*LinuxPool$'
Packages not directly modified in a PR (indirect/transitive dependencies) get a sparse matrix: only Linux + UseProjectReferenceToAzureClients=true. This sharply reduces cross-platform and cross-configuration coverage for dependent packages. They get no non-Linux runs and no runs against the published NuGet dependencies.
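The bash sketch below roughly approximates how these filters prune a matrix. It assumes each filter is a Key=Regex pair and that an entry survives only if every regex matches the entry's value for that key.

Several details are hypothetical:
- the "Key=Value;Key=Value" entry encoding;
- the pool names;
- the argument values.

The repository's real matrix tooling (for example Create-PrJobMatrix.ps1) is PowerShell and may differ in details such as anchoring and case sensitivity. This sketch matches with bash's =~ operator, which is an unanchored POSIX ERE match.

```shell
#!/usr/bin/env bash
# Illustrative approximation of PR matrix filtering (not the repo's implementation).
# A filter is "Key=Regex"; an entry is kept only if every filter matches.

field_value() {  # field_value <entry> <key>: print the value stored under key
  local IFS=';' pair
  for pair in $1; do
    if [[ $pair == "$2="* ]]; then
      printf '%s\n' "${pair#*=}"   # strip "Key=" and keep the rest
      return 0
    fi
  done
  return 1                         # key not present in this entry
}

matches_filters() {  # matches_filters <entry> <Key=Regex>...
  local entry=$1 filter value
  shift
  for filter in "$@"; do
    value=$(field_value "$entry" "${filter%%=*}") || return 1
    [[ $value =~ ${filter#*=} ]] || return 1   # unquoted RHS = regex match
  done
  return 0
}

filters=('AdditionalTestArguments=.*true' 'Pool=.*LinuxPool$')

# Hypothetical matrix entries; real values come from platform-matrix.json.
matrix=(
  'Pool=LinuxPool;AdditionalTestArguments=/p:UseProjectReferenceToAzureClients=true'
  'Pool=LinuxPool;AdditionalTestArguments='
  'Pool=WindowsPool;AdditionalTestArguments=/p:UseProjectReferenceToAzureClients=true'
  'Pool=MacPool;AdditionalTestArguments='
)

for entry in "${matrix[@]}"; do
  if matches_filters "$entry" "${filters[@]}"; then
    echo "kept:    $entry"
  else
    echo "dropped: $entry"
  fi
done
```

With these example entries, only the Linux entry that sets UseProjectReferenceToAzureClients=true is kept.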
Concrete Example: PR #57317
- EventHubs SDK changed serialization (PR #46169, "Enable WriteCore feature for eventhub", Oct 2024) and now emits "properties": {} unconditionally.
- HealthcareApis recordings were made before this change.
- When tested with UseProjectReferenceToAzureClients=true (using the locally built EventHubs), the tests fail with TestRecordingMismatchException.
- This failure was invisible because:
  - HealthcareApis is a mgmt package → LimitForPullRequest: true → no tests on PRs.
  - Even if tests ran, HealthcareApis would be an "indirect" package → sparse matrix only.
Workaround: Run Tests Locally
Options like creating a PR from an upstream branch or manually queuing a service-specific pipeline only test a single service — they won't catch cross-service dependency failures. For a quick local check of a specific service:
dotnet test eng/service.proj \
/p:ServiceDirectory=healthcareapis \
/p:UseProjectReferenceToAzureClients=true \
--filter "TestCategory!=Live"
To test multiple services in parallel, run each service directory as a separate process:
# Run multiple services in parallel
for svc in healthcareapis compute network storage; do
  dotnet test eng/service.proj \
    /p:ServiceDirectory=$svc \
    /p:UseProjectReferenceToAzureClients=true \
    --filter "TestCategory!=Live" &
done
wait
Running all services at once (ServiceDirectory=*) is possible but will likely take too long on a single machine, since each service's test suite can take a significant amount of time.
The UseProjectReferenceToAzureClients=true mode is critical — it uses locally-built dependencies instead of NuGet packages, which is where cross-package breaking changes (like the EventHubs serialization change) surface.
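The loop above has two weaknesses:
- it ends with a bare wait, which returns 0 once all jobs finish, whether or not any test run failed;
- it interleaves the output of every process.

The bash helper below is a slightly more careful variant. It writes each service's output to test-<service>.log and waits on each PID individually, so failures are reported by name. It is a hypothetical sketch, not part of the repo. TEST_CMD is an illustrative override hook; when it is unset, the helper runs the same dotnet test command as above. The helper does nothing about parallel builds of shared dependencies competing for the same output directories, which may still cause trouble.

```shell
#!/usr/bin/env bash
# Hypothetical helper: one test process per service, one log per service,
# non-zero return if any service failed. TEST_CMD is an illustrative hook.
run_services_parallel() {  # run_services_parallel <service>...
  local -a pids=() svcs=()
  local svc i failed=0
  for svc in "$@"; do
    if [[ -n ${TEST_CMD:-} ]]; then
      "$TEST_CMD" "$svc" > "test-$svc.log" 2>&1 &
    else
      dotnet test eng/service.proj \
        /p:ServiceDirectory="$svc" \
        /p:UseProjectReferenceToAzureClients=true \
        --filter "TestCategory!=Live" > "test-$svc.log" 2>&1 &
    fi
    pids+=("$!")
    svcs+=("$svc")
  done
  # wait <pid> returns that process's exit status, unlike a bare wait.
  for i in "${!pids[@]}"; do
    if ! wait "${pids[i]}"; then
      echo "FAILED: ${svcs[i]} (see test-${svcs[i]}.log)"
      failed=1
    fi
  done
  return "$failed"
}

# Usage (from the repository root, after sourcing this snippet):
#   run_services_parallel healthcareapis compute network storage
```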
Proposed Improvement: Nightly CI Run
We should add a nightly scheduled pipeline that runs the full cross-service test suite with UseProjectReferenceToAzureClients=true against main. This would automatically surface cross-package breaking changes without relying on manual local runs.
Important: This pipeline should use the existing test matrix infrastructure to parallelize across services — a single monolithic dotnet test over all services would exceed the default 1-hour job timeout. Each service (or batch of services) should run as a separate job in the matrix, similar to how platform-matrix.json and Create-PrJobMatrix.ps1 distribute work today.
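The granularity question could start from a pair of hypothetical bash helpers:
- one lists mgmt service directories;
- the other groups them into fixed-size batches, one batch per output line, each of which could become one matrix job.

The helpers assume each mgmt service keeps its pipeline file at sdk/<service>/ci.mgmt.yml. A batch size of 1 corresponds to one job per service. Wiring the batches into platform-matrix.json or Create-PrJobMatrix.ps1 is not shown.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for a nightly matrix.
# Assumption: each mgmt service has sdk/<service>/ci.mgmt.yml.

list_mgmt_services() {  # list_mgmt_services [sdk_root]
  local root=${1:-sdk} f
  for f in "$root"/*/ci.mgmt.yml; do
    [[ -e $f ]] || continue   # glob matched nothing
    f=${f%/ci.mgmt.yml}       # drop the file name...
    printf '%s\n' "${f##*/}"  # ...and keep only the service directory name
  done
}

make_batches() {  # make_batches <size> <service>...: one batch per output line
  local size=$1 i
  shift
  (( size > 0 )) || return 1
  local -a svcs=("$@")
  for ((i = 0; i < ${#svcs[@]}; i += size)); do
    printf '%s\n' "${svcs[*]:i:size}"   # slice joined with spaces
  done
}

# Usage (from the repository root): batches of 10 services, one per job.
#   make_batches 10 $(list_mgmt_services)
```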
Questions
- Should LimitForPullRequest be relaxed for mgmt packages, at least for a subset of the test matrix?
- Should the net - pullrequest pipeline run indirect mgmt package tests with a minimal matrix?
- What's the right granularity for the nightly run: one job per service, or batched?
/cc @m-nash