11 — Testing

Tests are code that runs on every push. They earn their keep when they fail meaningfully on real regressions and pass quietly otherwise.

What good looks like

from datetime import datetime, timezone

import pytest


class FixedClock:
    def __init__(self, now: datetime) -> None:
        self._now = now

    def now(self) -> datetime:
        return self._now


@pytest.fixture
def clock() -> FixedClock:
    return FixedClock(datetime(2025, 1, 1, tzinfo=timezone.utc))


@pytest.fixture
def repo() -> FakeUserRepository:
    return FakeUserRepository()


def test_register_persists_user_and_stamps_created_at_when_email_is_new(
    repo: FakeUserRepository, clock: FixedClock
) -> None:
    # arrange
    service = RegistrationService(repo, clock)

    # act
    user = service.register(email="t@example.com")

    # assert
    assert repo.find(user.id) == user
    assert user.created_at == datetime(2025, 1, 1, tzinfo=timezone.utc)

The test name reads as an English sentence (11.2); arrange/act/assert sections are blank-line-separated with one concern (11.3); clock and repo are deliberately scoped fixtures, fresh per test (11.5, 11.6); FakeUserRepository is a hand-rolled domain fake while time arrives through an injected FixedClock rather than datetime.now() (11.8, 11.9); and the paired assertions check both persistence and the stamped timestamp (11.13).

Rules

11.1 — `pytest`, not `unittest`. Always.

Reasoning, step by step:

pytest is the de facto standard. It supports plain assert, fixtures, parameterization, plugins, and minimal ceremony.
unittest-style TestCase classes run under pytest but lose much of the benefit: test methods cannot take fixtures as arguments or be parametrized, and the self.assert* helpers add ceremony that plain assert avoids.
Write tests as top-level functions. Group with class Test*: only when fixtures are class-scoped and shared.
Migrate existing unittest code when you touch it.
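A minimal before/after sketch of such a migration. slugify is a hypothetical function under test, not something from this codebase:

```python
import unittest


def slugify(title: str) -> str:
    """Hypothetical function under test."""
    return "-".join(title.lower().split())


# Before: unittest style, with class boilerplate and an assert* helper.
class SlugifyTests(unittest.TestCase):
    def test_slugify(self) -> None:
        self.assertEqual(slugify("Hello World"), "hello-world")


# After: a top-level function with a sentence name and a plain assert.
def test_slugify_joins_lowercased_words_with_hyphens() -> None:
    assert slugify("Hello World") == "hello-world"
```

Both versions pass under pytest; only the second can later take fixtures as arguments or grow a parametrize table.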

Enforcement: review; CI runs pytest, and a flake8/ruff check flags unittest.TestCase subclasses in tests/.

11.2 — Test names are sentences in `snake_case`.

Reasoning, step by step:

def test_returns_404_when_user_does_not_exist(): ... reads as the failure message you'll see at 2 am.
def test_user_404(): reads as a code.
Pattern: test_<action>_<expected outcome>_when_<condition>. Variations are fine; reading aloud as English is not optional.
Test files: tests/test_*.py. pytest discovers automatically.

Enforcement: review; pytest's python_files = test_*.py discovery enforces the file naming.

11.3 — Arrange / act / assert. One concern per test.

Reasoning, step by step:

Three sections: set up the world, run the operation, check the outcome. Blank lines separate them.
One concern per test. If a test asserts five unrelated facts, splitting improves failure messages.
Acceptable: multiple assertions that all verify the same concern.
Anti-pattern: god-tests that exercise an entire feature in one function. They fail in unhelpful ways and resist refactoring.
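A sketch of what the split looks like, assuming a hypothetical Cart class used only for illustration. Each test has one arrange/act/assert pass and one concern, so a failure names the behavior that broke:

```python
from dataclasses import dataclass, field


@dataclass
class Cart:
    """Hypothetical domain object, used only for illustration."""

    items: dict[str, int] = field(default_factory=dict)

    def add(self, sku: str, quantity: int = 1) -> None:
        self.items[sku] = self.items.get(sku, 0) + quantity

    def total_quantity(self) -> int:
        return sum(self.items.values())


def test_add_accumulates_quantity_when_sku_is_already_in_cart() -> None:
    cart = Cart()
    cart.add("apple", 2)

    cart.add("apple", 3)

    # Two assertions, one concern: the quantities were merged.
    assert cart.items == {"apple": 5}
    assert cart.total_quantity() == 5


def test_total_quantity_is_zero_when_cart_is_empty() -> None:
    cart = Cart()

    total = cart.total_quantity()

    assert total == 0
```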

Enforcement: review; oversized test bodies flagged by a function-length lint.

11.4 — `@pytest.mark.parametrize` for table-driven cases.

Reasoning, step by step:

Don't copy-paste tests with different inputs. Parametrize.

Pattern:

from datetime import date

import pytest


@pytest.mark.parametrize(
    "raw, expected",
    [
        ("2025-01-01", date(2025, 1, 1)),
        ("2025-12-31", date(2025, 12, 31)),
    ],
    ids=["new-year", "year-end"],
)
def test_parse_iso_date(raw: str, expected: date) -> None:
    assert parse_iso_date(raw) == expected

Each row is a separate test with its own failure context. The ids= parameter labels failures readably.
For complex inputs, use a @dataclass parameter value rather than a tuple. Readable failure output.
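A sketch of the dataclass variant. DiscountCase and apply_discount are hypothetical names invented for this example. A callable passed to ids= derives each test id from the row itself:

```python
from dataclasses import dataclass

import pytest


@dataclass(frozen=True)
class DiscountCase:
    """One table row; the field names appear in failure output."""

    label: str
    subtotal_cents: int
    is_member: bool
    expected_cents: int


def apply_discount(subtotal_cents: int, is_member: bool) -> int:
    """Hypothetical function under test: members get 10% off."""
    return subtotal_cents * 9 // 10 if is_member else subtotal_cents


CASES = [
    DiscountCase("member", 1000, True, 900),
    DiscountCase("non-member", 1000, False, 1000),
    DiscountCase("empty-basket", 0, True, 0),
]


@pytest.mark.parametrize("case", CASES, ids=lambda case: case.label)
def test_apply_discount_returns_expected_total(case: DiscountCase) -> None:
    assert apply_discount(case.subtotal_cents, case.is_member) == case.expected_cents
```

A failing row then prints as DiscountCase(label='member', subtotal_cents=1000, ...) rather than an anonymous tuple.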

Enforcement: review; copy-pasted near-identical test bodies rejected in favor of a parametrized table.

11.5 — Fixtures over `setUp`/`tearDown`. Scope them deliberately.

Reasoning, step by step:

setUp/tearDown run for every test, even when the setup is shared. Wasteful and noisy.
pytest fixtures: scope per-function (default), per-class, per-module, per-package, per-session. Pick by what's safe to share.

Pattern:

@pytest.fixture
def user() -> User:
    return User(id=UserId("u-1"), email="t@example.com", active=True)

Yield-fixtures for cleanup (the code after yield runs as teardown, whether the test passed or failed):

@pytest.fixture
def temp_dir() -> Iterator[Path]:
    d = Path(tempfile.mkdtemp())
    yield d
    shutil.rmtree(d)

Share fixtures across files via conftest.py at the test directory's root.
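A sketch of choosing scope by what is safe to share. The schema text and fixture names are hypothetical. In a real suite these fixtures would likely live in tests/conftest.py:

```python
import json

import pytest

SCHEMA_JSON = '{"required": ["email"]}'  # hypothetical schema text


def parse_schema(text: str) -> dict[str, list[str]]:
    return json.loads(text)


@pytest.fixture(scope="session")
def schema() -> dict[str, list[str]]:
    # Parsed once per test session; tests must treat it as read-only.
    return parse_schema(SCHEMA_JSON)


@pytest.fixture
def outbox() -> list[str]:
    # Function scope (the default): mutable, so every test gets a new list.
    return []


def test_schema_requires_email(schema: dict[str, list[str]]) -> None:
    assert "email" in schema["required"]


def test_outbox_starts_empty(outbox: list[str]) -> None:
    assert outbox == []
```

The expensive, immutable value gets the wide scope. Anything a test might mutate stays at function scope.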

Enforcement: review; setUp/tearDown methods flagged alongside the unittest.TestCase lint (11.1).

11.6 — Tests are independent. No order, no shared state.

Reasoning, step by step:

Tests must be runnable in any order, alone or in groups, on any machine, in parallel.
Shared mutable state (module-level _cache: dict = {}) creates cross-test coupling: state left behind by one test can make another pass or fail depending on run order.
Each test gets fresh fixtures. Session-scoped fixtures are for immutable shared data only (a parsed schema, a fixed clock).
Parallel-by-default: assume pytest -n auto (with pytest-xdist) will run. Anything assuming serial order is a future flake.
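One way to remove the coupling, sketched with a hypothetical make_cached_lookup helper: the cache lives in a closure that each test builds for itself, so the tests pass in either order and in parallel.

```python
from collections.abc import Callable


def make_cached_lookup(load: Callable[[str], int]) -> Callable[[str], int]:
    """Hypothetical helper: return a lookup that owns a private cache."""
    cache: dict[str, int] = {}  # per-instance, not module-level

    def lookup(key: str) -> int:
        if key not in cache:
            cache[key] = load(key)
        return cache[key]

    return lookup


def test_lookup_calls_loader_once_when_key_is_repeated() -> None:
    calls: list[str] = []

    def load(key: str) -> int:
        calls.append(key)
        return len(key)

    lookup = make_cached_lookup(load)

    assert lookup("abc") == 3
    assert lookup("abc") == 3
    assert calls == ["abc"]


def test_lookup_calls_loader_when_key_is_new() -> None:
    calls: list[str] = []

    def load(key: str) -> int:
        calls.append(key)
        return len(key)

    lookup = make_cached_lookup(load)

    assert lookup("abc") == 3
    assert calls == ["abc"]  # would be [] if a module-level cache leaked in
```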

Enforcement: CI runs pytest -n auto and a randomized-order plugin (pytest-randomly); order-dependent tests fail there.

11.7 — Use real implementations where possible. Mock at the genuine seam.

Reasoning, step by step:

The best test uses the real code with a real input. If real code does I/O, fake the I/O — don't mock the layer above.
Mocking everything makes the test re-state the implementation. The test passes when the implementation is wrong in the way you mocked.
The genuine seam is the external boundary: HTTP, database, clock, filesystem.
Tools: unittest.mock.Mock/MagicMock from stdlib; pytest-mock for the mocker fixture; hand-rolled Protocol-conforming fakes are often clearest.
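A minimal sketch of faking only the I/O. HttpClient, FakeHttpClient, fetch_display_name, and the api.example.com URL are all hypothetical. The URL building, JSON parsing, and fallback logic run for real; only the network call is replaced:

```python
import json
from typing import Protocol


class HttpClient(Protocol):
    """The external seam: anything that can GET a URL and return text."""

    def get(self, url: str) -> str: ...


class FakeHttpClient:
    """Hand-rolled fake: canned responses keyed by URL, no network."""

    def __init__(self, responses: dict[str, str]) -> None:
        self._responses = responses

    def get(self, url: str) -> str:
        return self._responses[url]


def fetch_display_name(http: HttpClient, user_id: str) -> str:
    """Real domain code: builds the URL, parses JSON, applies a default."""
    body = json.loads(http.get(f"https://api.example.com/users/{user_id}"))
    return body.get("display_name") or body["email"].split("@")[0]


def test_fetch_display_name_falls_back_to_email_when_name_is_missing() -> None:
    http = FakeHttpClient(
        {"https://api.example.com/users/u-1": '{"email": "t@example.com"}'}
    )

    name = fetch_display_name(http, "u-1")

    assert name == "t"
```

Mocking fetch_display_name itself would leave the parsing and the fallback untested, which is the layer-above mistake this rule warns about.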

Enforcement: review; mocks reaching past the external boundary into domain code are rejected.

11.8 — Hand-rolled fakes for domain types. Mocks for external IO.

Reasoning, step by step:

A FakeUserRepository that stores users in a dict is clearer than a MagicMock configured to return canned values.
Mocks shine when the dependency is hard to fake (an HTTP client, a database connection) — let the mocking framework do the call-recording.
Domain-shaped fakes can grow to be tiny implementations: in-memory, deterministic, fully behavioral.

Pattern:

class FakeUserRepository:
    def __init__(self) -> None:
        self._by_id: dict[UserId, User] = {}

    def save(self, user: User) -> None:
        self._by_id[user.id] = user

    def find(self, user_id: UserId) -> User | None:
        return self._by_id.get(user_id)
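For the other half of the rule, a sketch of a mock at an external boundary. SmtpGateway and send_welcome are hypothetical. create_autospec from the standard library records the call and rejects calls that do not match the real signature:

```python
from unittest.mock import create_autospec


class SmtpGateway:
    """Hypothetical external client; the real one would open a socket."""

    def send(self, to: str, subject: str) -> None:
        raise NotImplementedError("talks to a real mail server")


def send_welcome(gateway: SmtpGateway, email: str) -> None:
    gateway.send(to=email, subject="Welcome")


def test_send_welcome_sends_exactly_one_message_to_the_new_user() -> None:
    # instance=True specs the mock as an SmtpGateway instance, not the class.
    gateway = create_autospec(SmtpGateway, instance=True)

    send_welcome(gateway, "t@example.com")

    gateway.send.assert_called_once_with(to="t@example.com", subject="Welcome")
```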

Enforcement: review; a MagicMock standing in for a domain repository or value object is rejected in favor of a fake.

11.9 — Time, randomness, IDs: injected, not pulled from the wild.

Reasoning, step by step:

datetime.now() called directly inside production code makes time-dependent behavior hard to test deterministically. Inject a Clock Protocol.
Same for random.random(), uuid.uuid4(), any source of non-determinism.
Tests use a deterministic clock and seeded random. The fixture provides them.
The injection cost is one constructor parameter. The testability payoff is permanent.
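A sketch of all three sources injected through the constructor. CouponIssuer and Coupon are hypothetical. FixedClock mirrors the one at the top of this chapter:

```python
import random
from collections.abc import Callable
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Protocol


class Clock(Protocol):
    def now(self) -> datetime: ...


class FixedClock:
    def __init__(self, now: datetime) -> None:
        self._now = now

    def now(self) -> datetime:
        return self._now


@dataclass(frozen=True)
class Coupon:
    id: str
    code: str
    issued_at: datetime


class CouponIssuer:
    """Hypothetical service: every non-deterministic source is a parameter."""

    def __init__(
        self, clock: Clock, rng: random.Random, new_id: Callable[[], str]
    ) -> None:
        self._clock = clock
        self._rng = rng
        self._new_id = new_id

    def issue(self) -> Coupon:
        code = "".join(self._rng.choices("ABCDEFGHJKLMNPQRSTUVWXYZ", k=6))
        return Coupon(id=self._new_id(), code=code, issued_at=self._clock.now())


def test_issue_is_reproducible_when_clock_seed_and_ids_are_fixed() -> None:
    now = datetime(2025, 1, 1, tzinfo=timezone.utc)

    first = CouponIssuer(FixedClock(now), random.Random(42), lambda: "c-1").issue()
    second = CouponIssuer(FixedClock(now), random.Random(42), lambda: "c-1").issue()

    assert first == second
    assert first.issued_at == now
```

Production wiring would pass a system clock, an unseeded random.Random(), and something like lambda: str(uuid.uuid4()). The service code stays the same.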

Enforcement: review; a lint flags datetime.now, uuid.uuid4, and random.* reached directly inside production modules.

11.10 — Property-based tests where invariants are natural: `hypothesis`.

Reasoning, step by step:

hypothesis generates inputs and checks invariants. It finds edge cases your manual tests miss.
Good properties: round-trip (decode(encode(x)) == x), idempotence (f(f(x)) == f(x)), monotonicity (a <= b ⇒ f(a) <= f(b)).
Don't use it as a clever way to run the same test against random nonsense.
Shrinking is the killer feature: when a property fails, hypothesis reduces the input to a minimal failing case before reporting it.
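A sketch of a round-trip and an idempotence property. The run-length codec is a hypothetical function pair written for this example. given and strategies are the standard hypothesis entry points:

```python
from hypothesis import given, strategies as st


def encode_run_lengths(text: str) -> list[tuple[str, int]]:
    """Hypothetical function under test: run-length encode a string."""
    runs: list[tuple[str, int]] = []
    for char in text:
        if runs and runs[-1][0] == char:
            runs[-1] = (char, runs[-1][1] + 1)
        else:
            runs.append((char, 1))
    return runs


def decode_run_lengths(runs: list[tuple[str, int]]) -> str:
    return "".join(char * count for char, count in runs)


@given(st.text())
def test_decode_inverts_encode_for_any_text(text: str) -> None:
    # Round-trip: decode(encode(x)) == x
    assert decode_run_lengths(encode_run_lengths(text)) == text


@given(st.lists(st.integers()))
def test_sorted_is_idempotent_for_any_list_of_integers(xs: list[int]) -> None:
    # Idempotence: f(f(x)) == f(x)
    assert sorted(sorted(xs)) == sorted(xs)
```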

Enforcement: review; round-trip, idempotence, and monotonicity invariants expected to carry a hypothesis property test.

11.11 — Async tests: `pytest-asyncio` with `@pytest.mark.asyncio`.

Reasoning, step by step:

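pytest-asyncio lets you write async def test_*. In the plugin's default strict mode every async test needs @pytest.mark.asyncio; setting auto mode once in pyproject.toml makes the marker optional: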
```
[tool.pytest.ini_options]
asyncio_mode = "auto"
```
Alternative: anyio's test plugin if you target multiple async libraries.
For time-based async tests, use freezegun or a fake clock — don't await asyncio.sleep(1) in tests.
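A sketch of an async test that never waits on the wall clock. retry is a hypothetical helper whose sleep function is injected. With asyncio_mode = "auto" the test needs no marker:

```python
import asyncio
from collections.abc import Awaitable, Callable


async def retry(
    operation: Callable[[], Awaitable[str]],
    attempts: int,
    delay_seconds: float,
    sleep: Callable[[float], Awaitable[None]] = asyncio.sleep,
) -> str:
    """Hypothetical helper: retry on ConnectionError, sleeping between tries."""
    for attempt in range(1, attempts + 1):
        try:
            return await operation()
        except ConnectionError:
            if attempt == attempts:
                raise
            await sleep(delay_seconds)
    raise ValueError("attempts must be at least 1")


async def test_retry_waits_between_attempts_and_returns_first_success() -> None:
    outcomes = iter([ConnectionError(), ConnectionError(), "ok"])
    slept: list[float] = []

    async def flaky() -> str:
        outcome = next(outcomes)
        if isinstance(outcome, Exception):
            raise outcome
        return outcome

    async def fake_sleep(seconds: float) -> None:
        slept.append(seconds)  # record the delay instead of waiting

    result = await retry(flaky, attempts=3, delay_seconds=30.0, sleep=fake_sleep)

    assert result == "ok"
    assert slept == [30.0, 30.0]
```

The test asserts a 60-second backoff schedule without spending 60 seconds waiting.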

Enforcement: asyncio_mode = "auto" set in pyproject.toml; review confirms async tests use the plugin, not ad-hoc event-loop wiring.

11.12 — No `time.sleep` in tests. Use polling, fake clocks, or async test machinery.

Reasoning, step by step:

time.sleep(1) is a flake waiting to happen — too short on slow CI, too long for fast tests.
For async timing: fake the clock or use anyio.move_on_after.
For waiting on a condition: poll with a timeout.
A test that needs to sleep "to give the thread time to start" has the wrong synchronization model.
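A sketch of the synchronization that replaces the sleep, using threading.Event from the standard library. Worker is a hypothetical background component. The wait returns as soon as the event is set, and the timeout only bounds the failure case:

```python
import threading


class Worker:
    """Hypothetical background worker that signals when it is ready."""

    def __init__(self) -> None:
        self.started = threading.Event()
        self.processed: list[str] = []
        self._stop_requested = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self) -> None:
        self._thread.start()

    def _run(self) -> None:
        self.processed.append("boot")
        self.started.set()  # explicit signal replaces "sleep and hope"
        self._stop_requested.wait()  # block until stop() is called

    def stop(self) -> None:
        self._stop_requested.set()
        self._thread.join(timeout=2.0)


def test_worker_signals_started_after_boot_step() -> None:
    worker = Worker()

    worker.start()
    became_ready = worker.started.wait(timeout=2.0)  # returns early once set

    worker.stop()
    assert became_ready
    assert worker.processed == ["boot"]
```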

Enforcement: review; a lint flags time.sleep under tests/.

11.13 — Assertion density mirrors production: 2+ per test on average.

Reasoning, step by step:

A test with one assertion checks one fact. Often that's right.
Complex outcomes assert both the positive (what happened) and the negative (what didn't). "User was created" implies user.id is not None and repo.count == 1 and audit.call_count == 1.
Pair-assertion: verify the same property two ways. After sort(xs), assert (a) len(xs) unchanged and (b) ordering invariant holds.
Don't pile up unrelated assertions to hit a number.
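A sketch of the paired check for sorting, with a negative assertion added. sort_by_priority is a hypothetical function under test:

```python
from collections import Counter


def sort_by_priority(tasks: list[tuple[int, str]]) -> list[tuple[int, str]]:
    """Hypothetical function under test: lowest priority number first."""
    return sorted(tasks, key=lambda task: task[0])


def test_sort_by_priority_orders_tasks_and_keeps_every_task() -> None:
    tasks = [(3, "deploy"), (1, "build"), (2, "test"), (1, "lint")]

    result = sort_by_priority(tasks)

    # (a) nothing was dropped or duplicated
    assert Counter(result) == Counter(tasks)
    # (b) the ordering invariant holds for every adjacent pair
    assert all(a[0] <= b[0] for a, b in zip(result, result[1:]))
    # negative check: the input list was not mutated
    assert tasks == [(3, "deploy"), (1, "build"), (2, "test"), (1, "lint")]
```

All three assertions serve one concern, a correct sort, so this stays within 11.3.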

Enforcement: review; single-assertion tests of complex outcomes questioned for a missing negative or paired check.

11.14 — Coverage is a floor, not a ceiling. 100% coverage with 0% meaning is worse than 70% with deliberate tests.

Reasoning, step by step:

Code coverage measures lines executed. It doesn't measure assertion quality.
Set a minimum (often 80%) and fail builds below. Don't gloat about 100%.
Focus tests on: behavior at boundaries, edge cases (empty, max, off-by-one), error paths, regressions.
Mutation testing (mutmut, cosmic-ray) is the next level — but expensive. Use selectively.
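To make the gap between coverage and meaning concrete, a sketch with a hypothetical clamp function. The first test alone yields full line coverage of clamp and verifies nothing; the other two pin behavior at the boundaries:

```python
def clamp(value: int, low: int, high: int) -> int:
    """Hypothetical function under test."""
    if value < low:
        return low
    if value > high:
        return high
    return value


def test_clamp_runs() -> None:
    # Executes every line of clamp, so line coverage reports 100%,
    # yet it would still pass if clamp returned the wrong value.
    clamp(-5, 0, 10)
    clamp(50, 0, 10)
    clamp(5, 0, 10)


def test_clamp_returns_bound_when_value_is_outside_range() -> None:
    assert clamp(-1, 0, 10) == 0
    assert clamp(11, 0, 10) == 10


def test_clamp_returns_value_unchanged_at_and_between_bounds() -> None:
    assert clamp(0, 0, 10) == 0
    assert clamp(10, 0, 10) == 10
    assert clamp(5, 0, 10) == 5
```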

Enforcement: CI fails the build below the configured --cov-fail-under threshold.

Cross-references

Protocols for testable seams: chapter 03, chapter 06.
Async testing primitives: chapter 09.
Test fixtures and the testability of constructor injection: chapter 10.
