Strengthen Copilot review instructions to use imperative "Flag any" phrasing (#62584)

kaxil · web-flow · commit a06453823a13 · 2026-02-27T20:38:48.000Z
Copilot's review agent converts rules phrased as "Flag any X" into
active code searches but treats passive phrasing as informational
context. Rewrite all critical rules (imports, assert, time.time,
lru_cache, session.commit, N+1 queries, unittest, unspec'd mocks)
to use imperative directives so the agent reliably flags violations.
diff --git a/.github/instructions/code-review.instructions.md b/.github/instructions/code-review.instructions.md
@@ -10,45 +10,49 @@ Use these rules when reviewing pull requests to the Apache Airflow repository.
 ## Architecture Boundaries
 
 - **Scheduler must never run user code.** It only processes serialized Dags. Flag any scheduler-path code that deserializes or executes Dag/task code.
-- **Workers must not access the metadata DB directly.** Task execution communicates with the API server through the Execution API (`/execution` endpoints) only.
-- **Dag Processor and Triggerer run user code in isolated processes.** Code in these components should maintain that isolation.
-- **Providers must not import core internals** like `SUPERVISOR_COMMS` or task-runner plumbing. Providers interact through the public SDK and execution API only.
+- **Flag any task execution code that accesses the metadata DB directly** instead of through the Execution API (`/execution` endpoints).
+- **Flag any code in Dag Processor or Triggerer that breaks process isolation** — these components run user code in isolated processes.
+- **Flag any provider importing core internals** like `SUPERVISOR_COMMS` or task-runner plumbing. Providers interact through the public SDK and execution API only.
 
 ## Database and Query Correctness
 
-- **N+1 queries**: Flag SQLAlchemy queries that access relationships inside loops without `joinedload()` or `selectinload()`.
-- **`run_id` is only unique per Dag.** Queries that group, partition, or join on `run_id` alone (without `dag_id`) will collide across Dags. Always require `(dag_id, run_id)` together.
-- **Cross-database compatibility**: SQL changes must work on PostgreSQL, MySQL, and SQLite. Flag database-specific features (lateral joins, window functions) without cross-DB handling.
-- **Session discipline**: In `airflow-core`, functions receiving a `session` parameter must not call `session.commit()`. Use keyword-only `session` parameters.
+- **Flag any SQLAlchemy relationship access inside a loop** without `joinedload()` or `selectinload()` — this is an N+1 query.
+- **Flag any query on `run_id` without `dag_id`.** `run_id` is only unique per Dag. Queries that filter, group, partition, or join on `run_id` alone will silently collide across Dags.
+- **Flag any `session.commit()` call in `airflow-core`** code that receives a `session` parameter. Session lifecycle is managed by the caller, not the callee.
+- **Flag any `session` parameter that is not keyword-only** (`*, session`) in `airflow-core`.
+- **Flag any database-specific SQL** (e.g., `LATERAL` joins, PostgreSQL-only functions, MySQL-only syntax) without cross-DB handling. SQL must work on PostgreSQL, MySQL, and SQLite.
 
 ## Code Quality Rules
 
-- No `assert` in production code (stripped in optimized Python).
-- `time.monotonic()` for durations, not `time.time()`.
-- Imports at top of file. Valid exceptions: circular imports, lazy loading for worker isolation, `TYPE_CHECKING` blocks.
-- Guard heavy type-only imports (e.g., `kubernetes.client`) with `TYPE_CHECKING` in multi-process code paths.
-- Unbounded caches are bugs: all `@lru_cache` must have `maxsize`.
-- Resources (files, connections, sessions) must use context managers or `try/finally`.
+- **Flag any `assert` in non-test code.** `assert` is stripped in optimized Python (`python -O`), making it a silent no-op in production.
+- **Flag any `time.time()` used for measuring durations.** Use `time.monotonic()` instead — `time.time()` is affected by system clock adjustments.
+- **Flag any `from` or `import` statement inside a function or method body.** Imports must be at the top of the file. The only valid exceptions are: (1) circular import avoidance, (2) lazy loading for worker isolation, (3) `TYPE_CHECKING` blocks. If the import is inside a function, ask the author to justify why it cannot be at module level.
+- **Flag any `@lru_cache(maxsize=None)`.** This creates an unbounded cache — every unique argument set is cached forever. Note: `@lru_cache()` without arguments defaults to `maxsize=128` and is fine.
+- **Flag any heavy import** (e.g., `kubernetes.client`) in multi-process code paths that is not behind a `TYPE_CHECKING` guard.
+- **Flag any file, connection, or session opened without a context manager or `try/finally`.**
 
 ## Testing Requirements
 
-- New behavior requires tests covering success, failure, and edge cases.
-- Use pytest patterns, not `unittest.TestCase`.
-- Use `spec`/`autospec` when mocking.
-- Use `time_machine` for time-dependent tests.
-- Imports belong at the top of test files, not inside test functions.
-- Issue numbers do not belong in test docstrings.
+- **Flag any new public method or behavior without corresponding tests.** Tests must cover success, failure, and edge cases.
+- **Flag any `unittest.TestCase` subclass.** Use pytest patterns instead.
+- **Flag any `mock.Mock()` or `mock.MagicMock()` without `spec` or `autospec`.** Unspec'd mocks silently accept any attribute access, hiding real bugs.
+- **Flag any `time.sleep` or `datetime.now()` in tests.** Use `time_machine` for time-dependent tests.
+- **Flag any issue number in test docstrings** (e.g., `"""Fix for #12345"""`) — test names should describe behavior, not track tickets.
 
 ## API Correctness
 
-- `map_index` must be handled correctly for mapped tasks. Queries without `map_index` filtering may return arbitrary task instances.
-- Execution API changes must follow Cadwyn versioning (CalVer format).
+- **Flag any query on mapped task instances that does not filter on `map_index`.** Without it, queries may return arbitrary instances from the mapped set.
+- **Flag any Execution API change without a Cadwyn version migration** (CalVer format).
 
 ## UI Code (React/TypeScript)
 
 - Avoid `useState + useEffect` to sync derived state. Use nullish coalescing or nullable override patterns instead.
 - Extract shared logic into custom hooks rather than copy-pasting across components.
 
+## Generated Files
+
+- **Flag any manual edits to files in `openapi-gen/`** or Task SDK generated models. These must be regenerated, not hand-edited.
+
 ## AI-Generated Code Signals
 
 Flag these patterns that indicate low-quality AI-generated contributions:
@@ -63,7 +67,5 @@ Flag these patterns that indicate low-quality AI-generated contributions:
 
 ## Quality Signals to Check
 
-The absence of these signals in a "fix" or "optimization" PR is itself a red flag:
-
-- **Bug fixes need regression tests**: A test that fails without the fix and passes with it.
-- **Existing tests must still pass without modification**: If existing tests need changes to pass, the PR may introduce a behavioral regression.
+- **For bug-fix PRs, flag if there is no regression test** — a test that fails without the fix and passes with it.
+- **Flag any existing test modified to accommodate new behavior** — this may indicate a behavioral regression rather than a genuine fix.