
Commit 3cd2cfc

Fix parallel chunk scanning sharing single DB connection across threads (v2.6.38)
Parallel chunk workers (v2.6.32+) shared the parent PixelProbe instance which uses StaticPool with a single DB connection. Concurrent _save_to_cache() calls from 3+ threads caused silent write loss -- files ended up with scan_status='completed' but scan_date=NULL. The v2.6.36 raw SQL UPDATE masked this by marking unscanned files as done. Fix: each chunk worker thread gets its own PixelProbe instance via threading.local(), with tracked engine disposal after scanning completes.
1 parent a260b69 commit 3cd2cfc

3 files changed

Lines changed: 36 additions & 3 deletions


CHANGELOG.MD

Lines changed: 8 additions & 0 deletions
```diff
@@ -5,6 +5,14 @@ All notable changes to this project will be documented in this file.
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
 and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0).
 
+## [2.6.38] - 2026-04-06
+
+### Fixed
+
+- **Fix newly discovered files marked completed without being scanned**: Parallel chunk scanning (v2.6.32+) passed the parent `PixelProbe` instance to all chunk worker threads. Each `PixelProbe` uses a `StaticPool` with a single DB connection -- when 3+ chunk threads shared it concurrently, `_save_to_cache()` writes were silently lost due to transaction interference. Files ended up with `scan_status='completed'` but `scan_date=NULL` and `scan_tool=NULL`. Fixed by creating a per-thread `PixelProbe` instance in each chunk worker, giving each its own isolated DB connection. The v2.6.36 raw SQL `UPDATE SET scan_status = 'completed'` fix masked the issue by marking unscanned files as done.
+
+---
+
 ## [2.6.37] - 2026-04-06
 
 ### Fixed
```
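The changelog entry pins down the symptom: `scan_status='completed'` while `scan_date` and `scan_tool` stay `NULL`. As a sketch for locating rows that may have been affected while running v2.6.32 through v2.6.37, the query below selects exactly that combination. The table name `scan_results`, its columns, and the helper `find_unscanned_completed` are assumptions for illustration; the diff does not show PixelProbe's real schema, and an in-memory `sqlite3` database stands in for the actual one.

```python
import sqlite3

# HYPOTHETICAL schema: PixelProbe's real table and column layout may differ.
SCHEMA = """
CREATE TABLE scan_results (
    file_path   TEXT PRIMARY KEY,
    scan_status TEXT,
    scan_date   TEXT,
    scan_tool   TEXT
)
"""

# Rows flagged as completed that never received scan metadata,
# i.e. the lost-write symptom described in the 2.6.38 entry.
SUSPECT_QUERY = """
SELECT file_path
FROM scan_results
WHERE scan_status = 'completed'
  AND scan_date IS NULL
ORDER BY file_path
"""


def find_unscanned_completed(conn: sqlite3.Connection) -> list:
    """Return paths marked completed that carry no scan_date (hypothetical helper)."""
    return [row[0] for row in conn.execute(SUSPECT_QUERY)]
```

Rows this returns were marked done without a recorded scan, so they would be the natural candidates for a forced rescan.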

pixelprobe/services/scan_service.py

Lines changed: 27 additions & 2 deletions
```diff
@@ -1457,14 +1457,21 @@ def _parallel_scan_chunks(self, checker: PixelProbe, chunks: List[ScanChunk],
         files_scanned_lock = threading.Lock()
         discovery_lock = threading.Lock()
         failed_chunks = [] # Track chunks that fail during parallel processing for retry
-
+
         # Create progress tracker for scan
         progress_tracker = ProgressTracker('scan')
 
         # Capture Flask app for worker threads
         from flask import current_app
         app = current_app._get_current_object()
 
+        # Thread-local PixelProbe instances. The parent checker uses StaticPool
+        # (single DB connection) which causes data races when multiple chunk
+        # threads share it. Each thread gets its own instance via threading.local().
+        chunk_thread_local = threading.local()
+        thread_checkers = [] # Track all instances for connection cleanup
+        thread_checkers_lock = threading.Lock()
+
         def scan_chunk(chunk_db_id, chunk_id_str):
             """Process a single chunk in a worker thread.
 
```
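The fix depends on `threading.local()` giving every thread its own attribute namespace. The minimal demonstration below uses only the standard library; `worker_sees_checker` and `demo` are illustrative names, not project code. It sets `checker` in the main thread and shows that a freshly started thread does not see it, which is why `hasattr(chunk_thread_local, 'checker')` is false the first time each worker runs.

```python
import threading

# Each thread sees its own, initially empty, set of attributes on this object.
chunk_thread_local = threading.local()


def worker_sees_checker(results: dict) -> None:
    """Record whether this thread can see an attribute set by another thread."""
    results["seen"] = hasattr(chunk_thread_local, "checker")


def demo():
    # Set the attribute in the main thread only.
    chunk_thread_local.checker = "main-thread instance"
    results = {}
    t = threading.Thread(target=worker_sees_checker, args=(results,))
    t.start()
    t.join()
    # (visible in main thread, visible in worker thread)
    return hasattr(chunk_thread_local, "checker"), results["seen"]
```

`demo()` reports the attribute as visible in the main thread and absent in the worker, so an instance created by one thread is never handed to another by accident.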
```diff
@@ -1490,9 +1497,19 @@ def scan_chunk(chunk_db_id, chunk_id_str):
                     pass
                 return chunk_id_str, 0
 
+            if not hasattr(chunk_thread_local, 'checker'):
+                chunk_thread_local.checker = PixelProbe(
+                    database_path=self.database_uri,
+                    excluded_paths=checker.excluded_paths,
+                    excluded_extensions=checker.excluded_extensions,
+                    excluded_patterns=checker.excluded_patterns
+                )
+                with thread_checkers_lock:
+                    thread_checkers.append(chunk_thread_local.checker)
+
             chunk_file_workers = 1 if chunk_workers > 1 else num_workers
             try:
-                self._scan_chunk_files(thread_chunk, checker, force_rescan, 0, 0,
+                self._scan_chunk_files(thread_chunk, chunk_thread_local.checker, force_rescan, 0, 0,
                                        thread_scan_state, num_workers=chunk_file_workers,
                                        use_atomic_increment=True)
             except Exception as e:
```
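The block added to `scan_chunk` builds one `PixelProbe` per worker thread on first use and records it under a lock so it can be cleaned up later. The sketch below reproduces that lazy, per-thread initialization with `concurrent.futures.ThreadPoolExecutor`. `Checker` is a hypothetical stand-in for `PixelProbe` that only remembers which thread created it, `run_chunks` is likewise illustrative, and the use of a pooled executor is an assumption, since the diff does not show how chunk workers are scheduled.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class Checker:
    """HYPOTHETICAL stand-in for PixelProbe: remembers its creating thread."""

    def __init__(self) -> None:
        self.owner = threading.get_ident()


def run_chunks(num_chunks: int, max_workers: int):
    chunk_thread_local = threading.local()
    thread_checkers = []  # every instance created, kept for later cleanup
    thread_checkers_lock = threading.Lock()

    def scan_chunk(chunk_id: int):
        # Create the checker once per thread; later chunks scheduled onto the
        # same pool thread reuse it instead of sharing another thread's.
        if not hasattr(chunk_thread_local, "checker"):
            chunk_thread_local.checker = Checker()
            with thread_checkers_lock:
                thread_checkers.append(chunk_thread_local.checker)
        return chunk_id, chunk_thread_local.checker, threading.get_ident()

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(scan_chunk, range(num_chunks)))
    return results, thread_checkers
```

Because pool threads outlive individual tasks, `run_chunks(10, 3)` creates at most three checkers, and every chunk uses the instance built by the thread that ran it. That is the property the commit needs: no two threads ever drive the same checker's connection.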
```diff
@@ -1639,6 +1656,14 @@ def scan_chunk(chunk_db_id, chunk_id_str):
                 except Exception as e:
                     logger.error(f"Chunk {failed_cid} failed on retry: {e}")
 
+        # Dispose all thread-local PixelProbe DB engines to release connections
+        for tc in thread_checkers:
+            try:
+                if tc._db_engine:
+                    tc._db_engine.dispose()
+            except Exception:
+                pass
+
         # Complete scan
         if self.scan_cancelled:
             self._handle_scan_cancellation(scan_state)
```
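The final hunk disposes every tracked engine once all chunks, including retries, have finished, and swallows errors so that one failure cannot block the rest. In SQLAlchemy, `Engine.dispose()` releases the engine's connection pool. The sketch below shows the same track-then-release shape with standard-library `sqlite3` connections standing in for SQLAlchemy engines; `scan_with_per_thread_connections` is a hypothetical helper and `":memory:"` stands in for the real database URI. `check_same_thread=False` is needed only because the main thread closes connections that worker threads opened.

```python
import sqlite3
import threading
from concurrent.futures import ThreadPoolExecutor


def scan_with_per_thread_connections(db_path: str, num_chunks: int, max_workers: int):
    """HYPOTHETICAL helper: one sqlite3 connection per worker thread, closed at the end."""
    local = threading.local()
    connections = []  # every connection opened by any worker
    lock = threading.Lock()

    def scan_chunk(chunk_id: int) -> int:
        if not hasattr(local, "conn"):
            # Each worker opens its own connection; check_same_thread=False
            # only permits the main thread to close it afterwards.
            local.conn = sqlite3.connect(db_path, check_same_thread=False)
            with lock:
                connections.append(local.conn)
        return local.conn.execute("SELECT ?", (chunk_id,)).fetchone()[0]

    try:
        with ThreadPoolExecutor(max_workers=max_workers) as pool:
            results = list(pool.map(scan_chunk, range(num_chunks)))
    finally:
        # Mirror the commit's cleanup loop: release every tracked resource
        # and never let one failure stop the others.
        for conn in connections:
            try:
                conn.close()
            except Exception:
                pass
    return results, connections
```

The sketch releases connections in a `finally` block so cleanup also runs if a chunk raises. In the committed code the disposal loop runs after the retry pass; whether it is reached on every error path depends on code outside the diff.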

pixelprobe/version.py

Lines changed: 1 addition & 1 deletion
```diff
@@ -4,7 +4,7 @@
 # Default version - this is the single source of truth
 
 
-_DEFAULT_VERSION = '2.6.37'
+_DEFAULT_VERSION = '2.6.38'
 
 
 # Allow override via environment variable for CI/CD, but default to the hardcoded version
```
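The last context line of `version.py` says the hardcoded `_DEFAULT_VERSION` can be overridden through an environment variable for CI/CD. The diff does not show how that override is read, so the following is only a sketch of the described behaviour; the variable name `PIXELPROBE_VERSION` and the function `get_version` are hypothetical.

```python
import os

_DEFAULT_VERSION = '2.6.38'

# HYPOTHETICAL variable name: the diff does not show which one version.py reads.
_VERSION_ENV_VAR = 'PIXELPROBE_VERSION'


def get_version() -> str:
    """Return a non-empty CI/CD override if present, else the hardcoded default."""
    override = os.environ.get(_VERSION_ENV_VAR, '').strip()
    return override or _DEFAULT_VERSION
```

Treating an empty or whitespace-only value as unset keeps a blank CI variable from erasing the version string.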
