
JPEG pixel corruption detection, video freeze detection, parallel chunk scanning fixes#46

Merged
ttlequals0 merged 22 commits into main from jpg-color-corruption on Apr 6, 2026

Conversation

@ttlequals0 (Owner) commented Apr 4, 2026

Summary

  • JPEG pixel corruption detection: Detect visually corrupted JPEGs that pass PIL/ImageMagick validation but contain visible garbage (rainbow bands, solid color fill) in decoded pixel data
  • Video freeze detection: Detect videos with frozen frames using FFmpeg freezedetect filter with black frame filtering to reduce false positives
  • Per-worker scan progress grid: Collapsible UI grid showing each parallel chunk worker's status, directory, and progress
  • Parallel chunk scanning fixes: Fixed scan startup crash (undefined offset), schema migration for missing columns, batch pagination on shrinking result sets, dual-session pending file issue, and removed hardcoded 30-chunk limit on progress grid
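The freeze detection described above uses FFmpeg's freezedetect filter. As a hedged sketch of the kind of invocation involved (freezedetect and blackdetect are real FFmpeg filters with these option names; the thresholds and the exact pipeline PixelProbe runs are assumptions):

```shell
# Log freeze intervals: n = noise tolerance, d = minimum freeze duration.
# blackdetect runs alongside so black intervals can be excluded afterwards
# (the PR filters black frames to reduce false positives).
ffmpeg -hide_banner -i input.mp4 \
  -vf "freezedetect=n=-60dB:d=2,blackdetect=d=2:pix_th=0.10" \
  -an -f null - 2>&1 | grep -E "freezedetect|blackdetect"
```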

Version

2.6.37

Test plan

  • Docker image built and pushed (ttlequals0/pixelprobe:2.6.37 + latest)
  • Deployed to production, verified 1.15M files scanned with 0 pending
  • JPEG pixel corruption detection tested with synthetic images (7 unit tests)
  • Chunk progress grid displays all chunks (no 30-chunk cap)

Add pixel-level JPEG corruption detection that catches files passing PIL
and ImageMagick but containing visible garbage (rainbow bands, decoder
fill). Uses two signals: sustained chaos (8+ consecutive chaotic rows)
and bottom-anchored solid fill (30+ identical rows reaching image bottom).
Designed to avoid false positives on high-contrast thumbnails and photos.

Solid fill detection now requires 3+ chaotic rows in the 10 rows
preceding the fill streak. This eliminates false positives on channel
art with solid backgrounds (Fireship fanart/banner) while still
catching corruption where decoder fill follows garbage data.
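The two signals can be sketched in pure Python over raw RGB bytes (function names and thresholds here are illustrative, not PixelProbe's actual code; the real check additionally requires chaotic rows just before a fill streak, as described above):

```python
def row_signals(pixels: bytes, width: int, height: int):
    """Per-row (mean, neighbour-jump) stats from raw RGB bytes
    (the img.tobytes() output, 3 bytes per pixel)."""
    stride = width * 3
    out = []
    for y in range(height):
        row = pixels[y * stride:(y + 1) * stride]
        mean = sum(row) / len(row)
        # Average absolute difference between horizontally adjacent
        # pixels: very large values suggest decoder garbage, not texture.
        jump = sum(abs(row[i] - row[i - 3]) for i in range(3, len(row))) / (len(row) - 3)
        out.append((mean, jump))
    return out


def looks_corrupted(rows, jump_th=60.0, chaos_run=8, fill_run=30):
    """Signal 1: sustained chaos (8+ consecutive chaotic rows).
    Signal 2: bottom-anchored solid fill (30+ identical rows reaching
    the image bottom).  Thresholds are illustrative; the real check
    also demands chaos just before the fill streak to spare artwork
    with solid backgrounds."""
    run = 0
    for _, jump in rows:
        run = run + 1 if jump > jump_th else 0
        if run >= chaos_run:
            return True
    tail = [round(m, 3) for m, _ in rows[-fill_run:]]
    return len(rows) >= fill_run and len(set(tail)) == 1
```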

Add file size guard (skip >10MB), image dimensions guard (skip >30MP),
and 30s timeout to prevent OOM kills and hangs when scanning large DSLR
photos. Move dimensions check before RGB conversion to avoid unnecessary
memory allocation. Fix chaos_region_start tracking bug in detail strings.

Add pool_reset_on_return='rollback' to SQLAlchemy engine options and
db.engine.dispose() on DatabaseError/OperationalError in the scan error
handler. Prevents psycopg2 PGRES_TUPLES_OK errors from blocking all
subsequent scans after a worker crash.
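A minimal sketch of these two measures (pool_reset_on_return and Engine.dispose() are real SQLAlchemy APIs; the handler shape is assumed, and note that a later commit in this PR reverts both once the underlying memory bug was found):

```python
# Engine option: issue a ROLLBACK whenever a connection is returned to
# the pool, so a crashed worker cannot strand an open transaction on a
# pooled connection.
SQLALCHEMY_ENGINE_OPTIONS = {"pool_reset_on_return": "rollback"}

# In the scan error handler (illustrative shape, not PixelProbe's code):
# except (OperationalError, DatabaseError):
#     db.session.rollback()
#     db.engine.dispose()   # drop every pooled connection; the next
#                           # checkout opens a fresh, uncorrupted one
```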

Eliminate redundant Image.open() in JPEG pixel analysis by passing the
already-loaded PIL Image from the caller. Previously every JPEG was
opened 3 times (verify, load, pixel analysis), causing cumulative memory
growth that killed the worker after ~700 files.

Also adds pool_reset_on_return='rollback' and db.engine.dispose() on
DatabaseError to recover from corrupted connections after worker crashes.
…6.15)

The JPEG pixel analysis was using PIL's PixelAccess C extension for
80M+ pixel reads across 8000 files, causing the forked worker to
silently crash (no OOM, no error, no segfault in dmesg). Replace with
img.tobytes() to extract raw pixel data once as a Python bytes object,
then compute row averages from bytes indexing -- zero PIL C calls during
the analysis loop.

Also add worker_max_memory_per_child=512MB to Celery config as a safety
net for long-running image scans.

512MB was too aggressive -- the worker process uses ~300-400MB just for
Python + Celery + Flask + SQLAlchemy before scanning any files. It was
getting killed immediately during discovery phase, corrupting the DB
connection and preventing any scans from completing.
…6.17)

Root cause: Pillow Image.close() does not deallocate pixel data
(python-pillow/Pillow#3610). tobytes() created 36MB Python allocations
per image that bypassed PIL's block allocator, fragmenting memory over
thousands of files until the worker was killed.

Fixes:
- Downscale image to ~200px wide before pixel analysis (90KB vs 36MB)
- Add gc.collect() after each scan chunk (PIL circular ref cleanup)
- Set PILLOW_BLOCKS_MAX=256 in docker-compose (PIL block reuse)
- Remove worker_max_memory_per_child (was killing workers and corrupting
  DB connections via psycopg2 fork inheritance)
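The memory figures are easy to sanity-check. Assuming a 4000x3000 DSLR frame (an assumed example size) and RGB at 3 bytes per pixel:

```python
# Raw pixel buffer sizes, RGB = 3 bytes per pixel.
full_w, full_h = 4000, 3000                 # assumed DSLR-sized frame
full = full_w * full_h * 3                  # 36,000,000 bytes (~36 MB)

target_w = 200                              # analysis width from the fix
small_h = round(full_h * target_w / full_w) # keep aspect ratio -> 150
small = target_w * small_h * 3              # 90,000 bytes (~90 KB)

# With Pillow the downscale itself would be something like:
#   img.thumbnail((200, 10_000))  # cap width at 200px, preserve aspect
```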
….18)

gc.collect() at the end of each chunk triggers PIL C-level destructors
in the forked worker process, causing a silent crash. Previous versions
without gc.collect() survived 8000+ files across multiple chunks. With
gc.collect(), the worker died after the first chunk (~1000 files).

Root cause: _create_scanning_chunks() loaded ALL file paths into memory
via .all(), consuming ~200MB+ for 600K files. The worker accumulated
this on top of discovery/adding phase memory, then died silently when
PIL/ImageMagick processing started.

Fixes:
- Replace .all() with yield_per() streaming in chunk creation -- holds
  only one chunk (~1000 paths) in memory at a time
- Add composite index (scan_status, file_path) for optimal pending
  file query performance
- Add cancel_futures=True to ThreadPoolExecutor shutdown to prevent
  indefinite hangs at chunk boundaries
- Revert pool_reset_on_return and db.engine.dispose that were treating
  symptoms of this underlying bug
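The streaming idea can be sketched with stdlib sqlite3 standing in for the real Postgres/SQLAlchemy setup (the SQLAlchemy form is query(...).yield_per(batch); the table and column names here are illustrative):

```python
import sqlite3

def stream_pending(conn, batch=1000):
    """Yield pending file paths in fixed-size batches instead of
    loading every row at once (the .all() call that consumed ~200MB
    for 600K files).  Only one batch is held in memory at a time."""
    cur = conn.execute(
        "SELECT file_path FROM scan_results "
        "WHERE scan_status = 'pending' ORDER BY file_path")
    while True:
        rows = cur.fetchmany(batch)
        if not rows:
            return
        for (path,) in rows:
            yield path
```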
…2.6.21)

Root cause confirmed via pg_stat_activity: 20 ThreadPoolExecutor threads
all executed UPDATE scan_state on the same row simultaneously, creating
a PostgreSQL row-level lock convoy that permanently blocked scanning.

Fix: progress updates now happen once per batch (100 files) from the
main thread after all futures complete, instead of per-file from inside
the as_completed loop. Uses db.session directly instead of deprecated
Session(bind=) API.
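The single-writer pattern can be sketched with the stdlib thread pool (all names are illustrative, not PixelProbe's API; the point is that only the main thread ever writes shared progress, and only once per batch):

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def scan_all(paths, scan_file, update_progress, batch=100, workers=20):
    """Scan files in a thread pool, but write shared progress from the
    MAIN thread only, once per `batch` completions.  Worker threads
    never touch the scan_state row, so no row-level lock convoy forms."""
    done = 0
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(scan_file, p) for p in paths]
        for fut in as_completed(futures):
            fut.result()                 # propagate worker exceptions
            done += 1
            if done % batch == 0:
                update_progress(done)    # single writer, main thread
    if done % batch:
        update_progress(done)            # flush the final partial batch
    return done
```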
…es (v2.6.22)

The previous fix (v2.6.21) only fixed the parallel FILE path inside
_scan_chunk_files. But with MAX_WORKERS=20, the code takes the parallel
CHUNK path (_parallel_scan_chunks), which spawns 20 chunk-level threads
each running _scan_chunk_files with num_workers=1 (sequential). Those
20 sequential threads each did per-file UPDATE scan_state, creating the
same row-lock convoy.

Fix: when use_atomic_increment=True (parallel chunk mode), skip per-file
scan_state DB updates entirely. The chunk completion handler in
_parallel_scan_chunks already handles aggregate progress.

Even with explicit scan_state updates skipped, the ORM object could be
dirty from earlier attribute access. db.session.commit() flushes ALL
dirty objects, causing the same UPDATE scan_state contention.

Fix: db.session.expire(scan_state) before commit in parallel mode,
preventing the ORM from flushing stale scan_state attributes that
create row-level lock convoys.
….24)

The expire(scan_state) fix for the row-lock convoy also stopped
last_update from being written by the scan worker. The UI progress
worker that should maintain last_update independently fails to launch
when Redis is congested from previous crash retries (the .delay() call
succeeds but the task is never picked up by a worker).

Fix: raw SQL UPDATE scan_state SET last_update at chunk completion.
This runs once per chunk (every few minutes) from one thread at a time,
avoiding the contention that caused the original deadlock while keeping
the scan alive for the stuck scan checker.

Collapsible grid below the main progress bar showing each parallel
chunk worker's status during scanning phase. Shows directory path,
files scanned/total, and mini progress bar per chunk. Collapsed by
default, expandable via toggle button.

Backend: extend /api/scan-status with chunks array from ScanChunk table
Frontend: safe DOM-based rendering (no innerHTML), dark mode support,
mobile responsive with hidden progress bars and truncated paths.
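A hedged sketch of the extended /api/scan-status payload shape (only the chunks array sourced from the ScanChunk table is stated in the PR; the field names below are assumptions for illustration):

```json
{
  "status": "scanning",
  "chunks": [
    {"id": 1, "directory": "/media/photos", "status": "scanning",
     "files_scanned": 420, "files_total": 1000},
    {"id": 2, "directory": "/media/video", "status": "pending",
     "files_scanned": 0, "files_total": 1000}
  ]
}
```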

Root cause: _parallel_scan_chunks worker threads shared Flask's scoped
db.session, causing "concurrent operations are not permitted" errors.
Every chunk immediately errored, making scans complete with 0 files.
Fixed by calling db.session.remove() at thread start for a fresh session.

Also includes per-worker progress grid UI (collapsible, shows each
chunk worker's status during parallel scanning).

A single chunk failure (ResourceClosedError from stale DB connection)
propagated from scan_chunk() through future.result() and crashed the
entire scan task. Added try/except around _scan_chunk_files so failed
chunks are marked as error and retried later, without killing the scan.

Reverted db.session.remove() which was corrupting the main thread's
session -- Flask-SQLAlchemy already provides thread-local scoping via
app_context().
…ng (v2.6.28)

Two issues:
1. Progress bar disappeared after clicking Start Scan because the API
   returned the previous scan's "completed" status before the new one
   initialized. Added 15-second grace period after user-initiated start
   to ignore stale completed status.

2. Worker grid showed no chunks because chunk objects created in the
   main thread's session weren't visible to worker thread sessions.
   Used db.session.merge() at chunk processing start and raw SQL UPDATE
   at chunk completion to ensure cross-thread visibility.

Fixes 7 interconnected bugs from mixing ORM, raw SQL, and server-side
cursors across threads:

1. Replace yield_per() with limit/offset pagination -- eliminates
   server-side cursor that held transactions open across operations
2. Pass chunk DB IDs to worker threads instead of ORM objects --
   each thread queries fresh objects in its own session
3. Remove ALL scan_state writes from worker threads when in parallel
   chunk mode -- main thread as_completed loop is the single writer
4. Expire chunk ORM before raw SQL completion to prevent ORM from
   overwriting raw SQL values on commit
5. Add rollback before error-handling SQL in exception handlers
6. Remove pointless double commit and unnecessary expire calls
7. Worker grid UI: 3-second delay on scan start to avoid stale status,
   chunk files_scanned updated every 100 files via raw SQL

@ttlequals0 ttlequals0 changed the title JPEG pixel corruption detection (v2.6.10) JPEG pixel corruption detection + parallel scan fixes (v2.6.30) Apr 5, 2026

Two regressions from the limit/offset pagination change:

1. Chunk creation used OFFSET/LIMIT which shifts when concurrent
   processes change file status between queries, skipping ~23% of
   files. Replaced with keyset pagination (WHERE file_path > last_path)
   which is stable regardless of concurrent changes.

2. _retry_pending_files loaded ALL remaining pending files (90K+) via
   .all() and rescanned them sequentially, blocking scan completion for
   hours. Now uses count() first and skips retry if > 1000 files remain
   (they get picked up on the next scheduled scan).
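Keyset pagination can be sketched with stdlib sqlite3 in place of the real Postgres setup (schema and names are illustrative): each query resumes strictly after the last path seen, so rows changing status between queries cannot shift the window.

```python
import sqlite3

def keyset_batches(conn, batch=1000):
    """Keyset pagination over pending files: WHERE file_path > last
    seen path, ORDER BY file_path.  Unlike OFFSET/LIMIT, concurrent
    status changes cannot cause files to be skipped."""
    last = ""
    while True:
        rows = conn.execute(
            "SELECT file_path FROM scan_results "
            "WHERE scan_status = 'pending' AND file_path > ? "
            "ORDER BY file_path LIMIT ?", (last, batch)).fetchall()
        if not rows:
            return
        yield [r[0] for r in rows]
        last = rows[-1][0]
```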

SQLAlchemy query(ScanResult.file_path) returns Row objects where
row[0] can fail with IndexError in some versions. Use row.file_path
named attribute access instead.
…2.6.37)

Fix scan startup crash from undefined offset variable in keyset pagination.
Add startup migration to sync missing scan_state/scan_chunks columns.
Fix batch pagination skipping files on shrinking pending result set.
Fix scans completing with ~65% files still pending by adding raw SQL
status update via Flask's db.session after each scan_file() call,
bridging the cross-connection visibility gap with PixelProbe's separate
StaticPool session. Remove hardcoded 30-chunk limit on scan progress grid.

@ttlequals0 ttlequals0 changed the title JPEG pixel corruption detection + parallel scan fixes (v2.6.30) JPEG pixel corruption detection, video freeze detection, parallel chunk scanning fixes Apr 6, 2026

@ttlequals0 ttlequals0 merged commit a260b69 into main Apr 6, 2026
6 of 9 checks passed
@ttlequals0 ttlequals0 deleted the jpg-color-corruption branch April 6, 2026 20:57