
Experiment: Clone DB to isolate tests#3872

Draft
PeterNerlich wants to merge 2 commits into develop from
tests/clonedb

Conversation

@PeterNerlich
Contributor

Short description

This is one ongoing experiment trying to rein in the growing chaos in our tests

Proposed changes

  • Implement a function snapshot_db() that clones the current database and switches Django's connection over to the clone, reverting it afterwards
  • Provide a session-scoped fixture that makes an empty database snapshot (all other fixtures depend on it one way or another, so we can be sure that it really is empty)
  • Provide a session-scoped fixture that makes a snapshot and fills it with the test data
  • Provide a function-scoped fixture that just makes a snapshot. This is what every test needing the db should use, and it replaces the load_test_data fixture for tests
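
The layering described above (empty snapshot, then test-data snapshot, then a fresh clone per test) can be illustrated without Django. This is only a sketch using the standard-library sqlite3 module; clone(), count_pages(), the page table and its row are made up for illustration and are not the PR's snapshot_db():

```python
import sqlite3


def clone(source: sqlite3.Connection) -> sqlite3.Connection:
    """Return a new in-memory database holding a copy of `source`."""
    target = sqlite3.connect(":memory:")
    source.backup(target)  # copies schema and rows into the new database
    return target


def count_pages(db: sqlite3.Connection) -> int:
    return db.execute("SELECT COUNT(*) FROM page").fetchone()[0]


# "Session scope": an empty schema, then a copy filled with test data.
empty_db = sqlite3.connect(":memory:")
empty_db.execute("CREATE TABLE page (id INTEGER PRIMARY KEY, title TEXT)")
empty_db.commit()

test_data_db = clone(empty_db)
test_data_db.execute("INSERT INTO page (title) VALUES ('Welcome')")
test_data_db.commit()

# "Function scope": every test works on its own throwaway clone.
per_test_db = clone(test_data_db)
per_test_db.execute("DELETE FROM page")  # a destructive test
per_test_db.commit()

pages_in_clone = count_pages(per_test_db)
pages_in_snapshot = count_pages(test_data_db)
pages_in_empty = count_pages(empty_db)
```

The destructive test only touches its own clone, so the test-data snapshot and the empty snapshot stay usable for the next test.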

This experiment is still Work In Progress.

Side effects

  • Side effects between tests should be mitigated with this

Faithfulness to issue description and design

There are no intended deviations from the issue and design.

Resolved issues

Fixes: #3777



@PeterNerlich PeterNerlich changed the title from "Experiment:" to "Experiment: Clone DB to isolate tests" Sep 9, 2025
@MizukiTemma MizukiTemma added the stale label ("This PR has been stale for a while (~3 months). This is a first warning.") Jan 26, 2026
@dkehne
Collaborator

dkehne commented Jan 26, 2026

Analysis & Suggestions for Database Cloning Approach

This is a great approach! Database cloning for test isolation is being discussed upstream in Django as a faster alternative to serialized_rollback. Here's my analysis:


Issues Found

1. Missing Imports

The snapshot_db() function uses sqlite3 and urllib.parse but they're not imported:

import sqlite3
import urllib.parse

2. Settings Dict Reference Issue

prev_settings[db_name] = conn.settings_dict  # This is a reference, not a copy!

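When conn.settings_dict is modified later (e.g., conn.settings_dict["NAME"] = sandbox_uri), prev_settings[db_name] still refers to that same, now modified dict, so the original settings are lost. Store a copy instead (a shallow copy is enough as long as only top-level keys such as NAME are changed):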

prev_settings[db_name] = conn.settings_dict.copy()
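
To make the difference concrete, here is a small sketch with a plain dict standing in for conn.settings_dict; the keys and values are invented for illustration:

```python
import copy

settings_dict = {"NAME": "test_db", "OPTIONS": {"timeout": 5}}

saved_reference = settings_dict            # same object, not a backup
saved_copy = settings_dict.copy()          # independent top-level keys
saved_deep = copy.deepcopy(settings_dict)  # independent at every level

settings_dict["NAME"] = "file:sandbox?mode=memory&cache=shared"

reference_name = saved_reference["NAME"]   # follows the modification
copy_name = saved_copy["NAME"]             # still the original name

# Caveat: .copy() is shallow, so nested dicts are still shared.
settings_dict["OPTIONS"]["timeout"] = 30
shallow_timeout = saved_copy["OPTIONS"]["timeout"]
deep_timeout = saved_deep["OPTIONS"]["timeout"]
```

If the snapshot code ever changes nested entries such as OPTIONS or TEST, copy.deepcopy() would be the safer choice.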

3. Potential UnboundLocalError

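test_database_name is assigned inside the loop but used in cleanup. Because it is initialised to None this is not a true UnboundLocalError, but if databases is empty cleanup receives None, and with several databases a single variable can only hold the name assigned last: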

test_database_name = None  # Already there, good

# But in cleanup:
conn.creation.destroy_test_db(old_database_name=test_database_name)  # Could be None

Consider storing per-database:

prev_db_names = {}
# ...
prev_db_names[db_name] = conn.settings_dict["NAME"]

4. SQLite In-Memory Reconnection Order

conn.close()
conn.connect()  # reconnect before closing so we don't lose the db
target.close()

The comment says "reconnect before closing", but conn.close() is called first; presumably it refers to closing target, which is what keeps the in-memory database alive. This might work, but the logic is confusing. Consider:

# Keep target connection open until Django reconnects
conn.settings_dict["NAME"] = sandbox_uri
conn.close()
conn.connect()  # Now connected to the clone
# Safe to close the backup connections
source.close()
target.close()
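
The reason the order matters can be shown with the standard-library sqlite3 module alone. This is a sketch, not the PR's code: the URI name sandbox_demo and the table t are made up, and django_conn merely stands in for Django's reconnect. A shared-cache in-memory database exists only as long as at least one connection to it is open:

```python
import sqlite3

URI = "file:sandbox_demo?mode=memory&cache=shared"

# The first connection creates the shared in-memory database.
target = sqlite3.connect(URI, uri=True)
target.execute("CREATE TABLE t (x INTEGER)")
target.execute("INSERT INTO t VALUES (1)")
target.commit()

# A second connection (standing in for Django's reconnect) sees the same data.
django_conn = sqlite3.connect(URI, uri=True)
rows_while_open = django_conn.execute("SELECT COUNT(*) FROM t").fetchone()[0]

# Closing target is safe now: django_conn keeps the database alive.
target.close()
rows_after_target_closed = django_conn.execute("SELECT COUNT(*) FROM t").fetchone()[0]

# Once the last connection closes, the database is discarded.
django_conn.close()
fresh = sqlite3.connect(URI, uri=True)
tables_after_all_closed = fresh.execute("SELECT COUNT(*) FROM sqlite_master").fetchone()[0]
fresh.close()
```

So closing target before Django has reconnected would drop the clone, while closing it afterwards is harmless.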

Architectural Suggestions

1. Consider Using @contextmanager Decorator

Instead of manually calling contextmanager(snapshot_db), define it as one:

from contextlib import contextmanager

from django.db import DEFAULT_DB_ALIAS

@contextmanager
def snapshot_db(django_db_blocker, suffix="snap", databases=(DEFAULT_DB_ALIAS,)):
    # ... setup ...
    try:
        yield
    finally:
        # ... cleanup ...
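
A runnable miniature of the same pattern, with a plain dict standing in for Django's connection settings. ACTIVE_DB and swapped_db() are hypothetical names for this sketch only:

```python
from contextlib import contextmanager

# Hypothetical stand-in for Django's per-alias connection settings.
ACTIVE_DB = {"default": "main_db"}


@contextmanager
def swapped_db(alias: str, suffix: str = "snap"):
    previous = ACTIVE_DB[alias]
    ACTIVE_DB[alias] = f"{previous}_{suffix}"  # setup: point at the clone
    try:
        yield ACTIVE_DB[alias]
    finally:
        ACTIVE_DB[alias] = previous  # cleanup runs even if the body raises


with swapped_db("default") as name_inside:
    pass

try:
    with swapped_db("default"):
        raise RuntimeError("simulated test failure")
except RuntimeError:
    pass

name_after_failure = ACTIVE_DB["default"]
```

The finally block is what guarantees the original name is back in place after the simulated failure.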

2. Fixture Dependency Simplification

The current pattern requires tests to use both test_data_db_snapshot AND db_snapshot:

def test_foo(test_data_db_snapshot: None, db_snapshot: None):

Consider a single fixture that handles the hierarchy:

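@pytest.fixture(scope="function")
def isolated_test_db(test_data_db_snapshot, django_db_blocker):
    """Provides an isolated database with test data for each test."""
    # Once snapshot_db is decorated with @contextmanager it must be entered
    # with `with`; `yield from` only works on the undecorated generator.
    with snapshot_db(django_db_blocker, suffix="test"):
        yield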

Then tests just need:

def test_foo(isolated_test_db):
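
One caveat when combining this fixture with the @contextmanager suggestion: a function decorated with @contextmanager returns a context manager object, which has to be entered using with, whereas yield from only works on an undecorated generator function. A sketch with plain generators instead of pytest fixtures (snapshot() and both fixture_* functions are illustrative names):

```python
from contextlib import contextmanager

events = []


@contextmanager
def snapshot(label):
    events.append(f"enter {label}")
    try:
        yield
    finally:
        events.append(f"exit {label}")


def fixture_with_yield_from():
    # Fails: the object returned by a @contextmanager function is not iterable.
    yield from snapshot("broken")


def fixture_with_with():
    with snapshot("ok"):
        yield


try:
    next(fixture_with_yield_from())
    yield_from_error = None
except TypeError as exc:
    yield_from_error = type(exc).__name__

gen = fixture_with_with()
next(gen)        # setup half of the fixture
next(gen, None)  # teardown half
```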

3. Add Error Handling

Wrap cleanup in try/finally so that the database is restored even if a test crashes. Look up the connection before the inner try so that conn is always bound when its finally block runs:

try:
    yield
finally:
    with django_db_blocker.unblock():
        for db_name in databases:
            conn = connections[db_name]  # bind before try so finally can use it
            try:
                conn.creation.destroy_test_db(old_database_name=prev_db_names[db_name])
            except Exception as e:
                # assumes a module-level logger = logging.getLogger(__name__)
                logger.warning(f"Failed to cleanup test db {db_name}: {e}")
            finally:
                conn.close()
                conn.settings_dict = prev_settings[db_name]
                conn.connect()
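
The effect of the inner try/except/finally can be checked in isolation. In this sketch FakeConnection, destroy_clone() and the alias names are hypothetical stand-ins, not Django APIs; the point is only that a failure for one alias does not prevent the settings of every alias from being restored:

```python
import logging

logger = logging.getLogger("snapshot_cleanup_demo")


class FakeConnection:
    """Hypothetical stand-in for a Django connection wrapper."""

    def __init__(self, name, fail_on_destroy=False):
        self.settings_dict = {"NAME": f"{name}_snap"}
        self.fail_on_destroy = fail_on_destroy

    def destroy_clone(self):
        if self.fail_on_destroy:
            raise RuntimeError("clone is still in use")


connections = {
    "default": FakeConnection("default", fail_on_destroy=True),
    "replica": FakeConnection("replica"),
}
prev_settings = {"default": {"NAME": "default"}, "replica": {"NAME": "replica"}}

for db_name, conn in connections.items():
    try:
        conn.destroy_clone()
    except Exception as exc:
        # Log and carry on so the remaining aliases are still cleaned up.
        logger.warning("Failed to clean up test db %s: %s", db_name, exc)
    finally:
        conn.settings_dict = prev_settings[db_name]

restored = {alias: c.settings_dict["NAME"] for alias, c in connections.items()}
```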

Performance Consideration

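For PostgreSQL, clone_test_db() uses CREATE DATABASE ... TEMPLATE, which is fast but requires that no other sessions are connected to the template database while it is being copied. For large test suites, consider: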

  1. Parallel test workers: Each worker can have its own clone suffix
  2. Caching: If test data fixtures are expensive, the session-scoped test_data_db_snapshot is the right approach
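
For the first point, a per-worker suffix could be derived from the PYTEST_XDIST_WORKER environment variable that pytest-xdist sets in each worker process. This is a sketch under that assumption; clone_suffix() is a hypothetical helper, not something in the PR:

```python
import os


def clone_suffix(base: str = "snap") -> str:
    """Build a per-worker suffix so parallel workers never share a clone."""
    # pytest-xdist exports PYTEST_XDIST_WORKER (e.g. "gw0") in each worker;
    # the variable is absent in a non-parallel run.
    worker = os.environ.get("PYTEST_XDIST_WORKER")
    return f"{base}_{worker}" if worker else base
```

The result could then be passed as the suffix argument of snapshot_db().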


Note: This analysis was done with AI assistance (Claude Code).

@hannaseithe hannaseithe removed their assignment Feb 2, 2026
@osmers osmers added this to the Next milestone Mar 17, 2026
@jarlhengstmengel jarlhengstmengel removed this from the Next milestone Mar 17, 2026
@MizukiTemma
Member

This PR has not been worked on for a long time but is useful as inspiration for test improvement: see the conversation here



Development

Successfully merging this pull request may close these issues.

Isolate tests on database level

6 participants