

Add Mass Transfer app with partitioned DICOM export and conversion #298

Open
NumericalAdvantage wants to merge 14 commits into main from masstransfer

Conversation


@NumericalAdvantage NumericalAdvantage commented Feb 3, 2026

This PR introduces a new Mass Transfer facility that brings the volume export workflow (https://github.com/openradx/volume-export-pipeline) into ADIT as a first-class transfer type alongside Selective Transfer and Batch Transfer. Mass Transfer has its own worker to keep long-running, low-priority tasks isolated from regular Selective Transfer and Batch Transfer tasks.
Added a nav bar item called Mass Transfer for transferring large amounts of reports.
Implemented features such as mandatory DICOM file cleanup after NIfTI conversion and per-study pseudonymization, just like in the volume export pipeline.
Also added daily/weekly partitioning and reusable filter definitions so bulk exports can be segmented in time and re-run consistently.
Users can share filters with each other. Group restrictions can be applied later.
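As a rough illustration of the daily/weekly partitioning idea, here is a standalone sketch; the names `build_partitions` and this `PartitionWindow` shape are illustrative assumptions, not the actual API of `adit/mass_transfer/utils/partitions.py`:

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class PartitionWindow:
    start: date
    end: date  # inclusive


def build_partitions(start: date, end: date, granularity: str) -> list[PartitionWindow]:
    """Split the inclusive range [start, end] into daily or weekly windows."""
    if start > end:
        raise ValueError("start must not be after end")
    step = timedelta(days=1) if granularity == "daily" else timedelta(days=7)
    windows: list[PartitionWindow] = []
    cursor = start
    while cursor <= end:
        # Clamp the last window so it never extends past the requested range.
        window_end = min(cursor + step - timedelta(days=1), end)
        windows.append(PartitionWindow(cursor, window_end))
        cursor = window_end + timedelta(days=1)
    return windows
```

Each window would then back one transfer task, so re-running a job regenerates the same deterministic set of partitions.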

Summary by CodeRabbit

  • New Features

    • Mass Transfer module: create/manage jobs, filters, tasks, preferences; UI pages, control panels, and status/log views
    • End-to-end Mass Transfer workflow: discovery, export, DICOM→NIfTI conversion, pseudonymization option, and export cleanup
  • Bug Fixes

    • Improved failure handling and cleanup to ensure exports and task/job states are consistent on errors
  • Tests

    • Added unit tests for partitions, processing, and cleanup
  • Documentation

    • dcm2niix requirement and troubleshooting guidance
  • Chores

    • Configuration keys, admin integration, migrations, assets (JS/CSS) and URL wiring added

@gemini-code-assist

Summary of Changes

Hello @NumericalAdvantage, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a new 'Mass Transfer' system, significantly enhancing the application's capability to handle large volumes of imaging data. It provides a structured workflow for exporting and converting DICOM files, incorporating features like time-based data segmentation, privacy-preserving pseudonymization, and automated file management. The goal is to offer a robust and efficient solution for managing extensive datasets, integrating seamlessly with existing transfer mechanisms.

Highlights

  • New Mass Transfer Feature: Introduced a comprehensive 'Mass Transfer' facility for large-scale DICOM data export and conversion, now a first-class transfer type alongside Selective and Batch transfer.
  • Partitioned DICOM Export: Implemented daily or weekly partitioning for mass transfer jobs, allowing users to segment bulk exports over specific time ranges and re-run them consistently.
  • Per-Study Pseudonymization and Cleanup: Added per-study pseudonymization, similar to the volume export pipeline, and mandatory cleanup of DICOM files after successful NIfTI conversion to manage storage and privacy.
  • Reusable Filter Definitions: Enabled the creation and sharing of reusable filter definitions, allowing users to apply consistent criteria across multiple mass transfer jobs.
  • Dedicated Worker Queue: Established a separate worker queue for mass transfer tasks to ensure long-running, low-priority operations are isolated and do not impact other transfer types.
Changelog
  • adit/core/management/commands/cleanup_jobs_and_tasks.py
    • Generalized cleanup methods to work with any DicomTask or DicomJob model, removing hardcoded references.
    • Included MassTransferJob and MassTransferTask in the list of models to be cleaned up.
  • adit/core/tasks.py
    • Extracted common DICOM task retry strategy into a reusable constant.
    • Refactored process_dicom_task to use an internal _run_dicom_task function for reusability.
    • Added a new process_mass_transfer_task Celery task for the dedicated mass_transfer queue.
  • adit/core/templates/core/home.html
    • Added a new navigation link for 'Mass Transfer' to the home page.
  • adit/mass_transfer/admin.py
    • Added Django admin registrations for MassTransferFilter, MassTransferJob, MassTransferSettings, MassTransferTask, and MassTransferVolume models.
  • adit/mass_transfer/apps.py
    • Configured the MassTransferConfig app, registering it as a main menu item, a DICOM processor, and a job stats collector.
    • Ensured MassTransferSettings are created upon initial database migration.
  • adit/mass_transfer/filters.py
    • Defined MassTransferJobFilter and MassTransferTaskFilter for filtering job and task lists.
  • adit/mass_transfer/forms.py
    • Created MassTransferFilterForm for managing mass transfer filters.
    • Developed MassTransferJobForm for creating new mass transfer jobs, including validation for source/destination, date ranges, and logic to generate MassTransferTask instances based on partition granularity.
  • adit/mass_transfer/migrations/0001_initial.py
    • Introduced an initial database migration to create all necessary models for the Mass Transfer feature, including filters, settings, jobs, tasks, and volumes, along with their relationships and constraints.
  • adit/mass_transfer/mixins.py
    • Implemented MassTransferLockedMixin to control access to mass transfer features based on application settings.
  • adit/mass_transfer/models.py
    • Defined core Django models: MassTransferSettings, MassTransferFilter (supporting DICOM wildcards), MassTransferJob (with partition granularity, source/destination, and filter relationships), MassTransferTask (linked to jobs and partitions), and MassTransferVolume (tracking individual series status).
    • Added methods for queuing pending tasks within MassTransferJob and MassTransferTask.
  • adit/mass_transfer/processors.py
    • Implemented MassTransferTaskProcessor to manage the core logic of mass transfer tasks.
    • Includes functionality for finding DICOM studies/series using filters and time windows, handling large result sets by splitting time windows, exporting DICOM data with pseudonymization, converting exported DICOM to NIfTI via dcm2niix, and cleaning up temporary export directories.
  • adit/mass_transfer/static/mass_transfer/mass_transfer.js
    • Added JavaScript functions to update user preferences for mass transfer form fields such as source, destination, granularity, and email notifications.
  • adit/mass_transfer/tables.py
    • Defined MassTransferJobTable and MassTransferTaskTable for displaying mass transfer job and task data in a structured format.
  • adit/mass_transfer/templates/mass_transfer/mass_transfer_filter_confirm_delete.html
    • Added template for confirming deletion of a mass transfer filter.
  • adit/mass_transfer/templates/mass_transfer/mass_transfer_filter_form.html
    • Added template for creating and updating mass transfer filters.
  • adit/mass_transfer/templates/mass_transfer/mass_transfer_filter_list.html
    • Added template for listing all mass transfer filters.
  • adit/mass_transfer/templates/mass_transfer/mass_transfer_job_detail.html
    • Added template for displaying details of a specific mass transfer job.
  • adit/mass_transfer/templates/mass_transfer/mass_transfer_job_form.html
    • Added template for creating new mass transfer jobs.
  • adit/mass_transfer/templates/mass_transfer/mass_transfer_job_list.html
    • Added template for listing all mass transfer jobs.
  • adit/mass_transfer/templates/mass_transfer/mass_transfer_layout.html
    • Added base layout template for mass transfer pages, including static JavaScript.
  • adit/mass_transfer/templates/mass_transfer/mass_transfer_task_detail.html
    • Added template for displaying details of a specific mass transfer task.
  • adit/mass_transfer/templatetags/mass_transfer_extras.py
    • Created custom template tags job_control_panel and task_control_panel to render control buttons for Mass Transfer jobs and tasks.
  • adit/mass_transfer/tests/__init__.py
    • Added an __init__.py file to mark the directory as a Python package for tests.
  • adit/mass_transfer/tests/test_partitions.py
    • Added unit tests for the partition building utility functions.
  • adit/mass_transfer/tests/test_processor.py
    • Added unit tests for the MassTransferTaskProcessor, covering aspects like time window splitting and pseudonymization grouping.
  • adit/mass_transfer/urls.py
    • Defined URL patterns for all Mass Transfer features, including CRUD operations for filters, job/task listing, detail views, and various control actions.
  • adit/mass_transfer/utils/partitions.py
    • Provided utility functions for generating daily or weekly PartitionWindow objects based on a given date range, used for segmenting mass transfer tasks.
  • adit/mass_transfer/views.py
    • Implemented Django views for managing Mass Transfer filters, jobs, and tasks, inheriting from core DICOM views where appropriate.
    • Includes MassTransferUpdatePreferencesView for user-specific settings.
  • adit/settings/base.py
    • Added 'adit.mass_transfer.apps.MassTransferConfig' to INSTALLED_APPS.
    • Introduced START_MASS_TRANSFER_UNVERIFIED setting to control the initial status of new mass transfer jobs.
    • Defined MASS_TRANSFER_DEFAULT_PRIORITY and MASS_TRANSFER_URGENT_PRIORITY for task scheduling.
    • Added MASS_TRANSFER_MAX_SEARCH_RESULTS and MASS_TRANSFER_EXPORT_BASE_DIR settings for configuring the mass transfer process.
  • adit/urls.py
    • Included 'adit.mass_transfer.urls' under the '/mass-transfer/' path in the main URL configuration.
  • docker-compose.base.yml
    • Added environment variables MASS_TRANSFER_EXPORT_BASE_DIR and MASS_TRANSFER_MAX_SEARCH_RESULTS.
  • docker-compose.dev.yml
    • Added a new 'mass_transfer_worker' service for development environments, configured to run the dedicated background worker.
  • docker-compose.prod.yml
    • Added a new 'mass_transfer_worker' service for production environments, with configurable replicas.
  • example.env
    • Updated example environment variables to include MASS_TRANSFER_MAX_SEARCH_RESULTS, MASS_TRANSFER_EXPORT_BASE_DIR, and MASS_TRANSFER_WORKER_REPLICAS.

@coderabbitai

coderabbitai bot commented Feb 3, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

📝 Walkthrough

Adds a new Mass Transfer feature: app wiring, models, migrations, admin, views, forms, templates, static assets, processors, task wiring, settings, URLs, utilities, tests, and management cleanup to discover, export, pseudonymize, convert (dcm2niix) and track bulk DICOM transfers across partitioned time windows.

Changes

Cohort / File(s) Summary
Models & Migrations
adit/mass_transfer/models.py, adit/mass_transfer/migrations/0001_initial.py, adit/mass_transfer/migrations/0002_masstransferjob_pseudonymize.py, adit/mass_transfer/migrations/0003_masstransferfilter_owner.py, adit/mass_transfer/migrations/0004_masstransferfilter_require_name.py, adit/mass_transfer/migrations/0005_masstransferfilter_unique_owner_name.py
Adds MassTransfer models (Settings, Filter, Job, Task, Volume), enums, constraints, indexes, and multiple migrations including data backfill and unique constraints.
Processing & Utilities
adit/mass_transfer/processors.py, adit/mass_transfer/utils/partitions.py, adit/mass_transfer/tests/test_processor.py, adit/mass_transfer/tests/test_partitions.py
Implements MassTransferTaskProcessor (discovery, export, pseudonymize, dcm2niix conversion, cleanup), partition builder utility, and comprehensive tests for processing and partitioning.
Tasks & Cleanup Integration
adit/core/tasks.py, adit/core/management/commands/cleanup_jobs_and_tasks.py
Centralizes DICOM task retry strategy, introduces process_mass_transfer_task entrypoint with shared retry, adds mass-transfer-specific cleanup helper and ensures MassTransferTask/MassTransferJob are included in cleanup flows.
Views, Forms & Frontend
adit/mass_transfer/views.py, adit/mass_transfer/forms.py, adit/mass_transfer/templates/mass_transfer/*, adit/mass_transfer/static/mass_transfer/*, adit/mass_transfer/templatetags/mass_transfer_extras.py, adit/mass_transfer/tables.py
Adds CRUD views, MassTransferJobForm (partition/task creation), templates (list/detail/form/task), JS/CSS for preferences, templatetags for control panels, and table mappings.
Admin, App Config & Mixins
adit/mass_transfer/admin.py, adit/mass_transfer/apps.py, adit/mass_transfer/filters.py, adit/mass_transfer/mixins.py
Registers admin entries, app config with ready() hook (menu/processor/stats collector), filters, and MassTransferLockedMixin.
Settings, URLs & Integration
adit/settings/base.py, adit/urls.py, adit/core/templates/core/home.html
Adds MassTransferConfig to INSTALLED_APPS, new flags/constants (START_MASS_TRANSFER_UNVERIFIED, priorities, MAX_SEARCH_RESULTS, EXPORT_BASE_DIR), registers /mass-transfer/ URLs, and exposes a Mass Transfer link on the home page.
Static Config & CI
docker-compose.base.yml, example.env, .github/workflows/ci.yml, AGENTS.md
Adds environment vars for export dir and max results, updates example.env, documents dcm2niix requirement, and adds a Docker image tag in CI.
Tests & Cleanup Tests
adit/mass_transfer/tests/*, adit/mass_transfer/tests/__init__.py, adit/mass_transfer/tests/test_cleanup.py
Adds tests for partitions, processor behavior, pseudonym grouping/output path logic, and cleanup of exports on failure.

Sequence Diagram

sequenceDiagram
    actor User
    participant Web as "Web View"
    participant Form as "MassTransferJobForm"
    participant DB as "Database"
    participant Queue as "Procrastinate Queue"
    participant Worker as "MassTransferTaskProcessor"
    participant DICOM as "DICOM Operator"
    participant FS as "Filesystem / dcm2niix"

    User->>Web: create job (filters, dates, prefs)
    Web->>Form: validate & build partitions
    Form->>DB: save Job + Tasks (PENDING)
    User->>Web: verify -> queue tasks
    Web->>Queue: enqueue process_mass_transfer_task(task_id)
    Queue->>Worker: run task
    Worker->>DICOM: C-FIND/C-MOVE (discover/export)
    DICOM-->>Worker: DICOM files on disk
    Worker->>FS: run dcm2niix convert
    FS-->>Worker: NIfTI outputs
    Worker->>FS: cleanup export dirs
    Worker->>DB: update MassTransferVolume, Task statuses & logs
    DB-->>Worker: persist

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

Poem

🐰 I hopped through code and carved neat lanes,

Partitions, filters, and pseudonym chains,
From servers I fetched each DICOM crate,
Converted to NIfTI, tidy and straight.
Now mass transfers hum across the site — a rabbit’s quiet, clever bite.

🚥 Pre-merge checks | ✅ 3 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage ⚠️ Warning — docstring coverage is 2.94%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them.

✅ Passed checks (3 passed)

  • Description Check ✅ — check skipped; CodeRabbit's high-level summary is enabled.
  • Title Check ✅ — the title 'Add Mass Transfer app with partitioned DICOM export and conversion' accurately summarizes the main change: introducing a new Mass Transfer feature with partitioned export and conversion capabilities.
  • Merge Conflict Detection ✅ — no merge conflicts detected when merging into main.



@claude

claude bot commented Feb 3, 2026

Pull Request Review: Mass Transfer Feature

Overview

This PR introduces a comprehensive mass transfer facility for bulk DICOM export and NIfTI conversion. The implementation is well-structured and follows the existing ADIT patterns. Here's my detailed review:


✅ Strengths

Architecture & Design

  • Excellent consistency: Follows the established Job/Task pattern used by selective_transfer and batch_transfer
  • Good separation of concerns: Models, processors, views, and utilities are cleanly organized
  • Dedicated worker queue: mass_transfer queue isolates long-running tasks from regular operations
  • Per-study pseudonymization: UUID-based pseudonyms are correctly shared across all series in a study (processors.py:145-151)
  • Partitioning system: Daily/weekly partitioning is well-designed for managing large date ranges

Code Quality

  • Type annotations: Good use of type hints throughout (from __future__ import annotations)
  • Proper Django field conventions: String fields use blank=True, default="", non-string fields use both blank=True, null=True
  • Clear separation: Volumes tracked separately from tasks, allowing fine-grained progress monitoring
  • Test coverage: Unit tests for partitioning and processor logic

⚠️ Issues & Concerns

1. Critical: Missing Queue Registration 🔴

The mass_transfer queue is referenced but not registered in Procrastinate configuration.

Location: adit/core/tasks.py:86 and models.py:86

Issue: Tasks are deferred to mass_transfer queue, but I don't see it added to app.queues in settings.

Fix needed: Add to settings/base.py or wherever Procrastinate queues are configured:

PROCRASTINATE_QUEUES = ["default", "dicom", "mass_transfer"]

2. Security: Command Injection Risk 🔴

Location: processors.py:420-435

The dcm2niix subprocess call is vulnerable if series_name contains shell metacharacters.

Current code:

cmd = [
    "dcm2niix",
    "-z", "y",
    "-o", str(output_path),
    "-f", series_name,  # ⚠️ User-controlled via DICOM metadata
    str(volume.exported_folder),
]
result = subprocess.run(cmd, check=False, capture_output=True, text=True)

Issue: While using a list prevents shell injection, series_name is derived from _series_folder_name() which uses sanitize_filename(). Verify that sanitize_filename() properly handles all edge cases.

Recommendation: Add explicit validation or use a more restrictive character set for series names:

series_name = re.sub(r'[^a-zA-Z0-9_-]', '_', series_name)

3. Bug: Incorrect Status Reference 🟡

Location: adit/core/management/commands/cleanup_jobs_and_tasks.py:23,27

Fixed in PR: The PR correctly fixes hardcoded SelectiveTransferTask.Status to use the generic model.Status. Good catch! ✓

4. Error Handling: Silent Cleanup Failures 🟡

Location: processors.py:443-458

def _cleanup_export(self, volume: MassTransferVolume) -> None:
    try:
        shutil.rmtree(export_folder)
    except FileNotFoundError:
        pass  # Silent pass
    except Exception as err:
        volume.add_log(f"Cleanup failed: {err}")
        # Continues without raising

Issue: Cleanup failures are logged to the volume but don't affect task status. If exports aren't cleaned up, disk space can fill rapidly.

Recommendation:

  • Add monitoring/alerting for cleanup failures
  • Consider making cleanup mandatory or add a separate cleanup task
  • Track cumulative cleanup failures in task result

5. Performance: Recursive Query Splitting 🟡

Location: processors.py:324-333

if len(studies) > max_results:
    if end - start < _MIN_SPLIT_WINDOW:
        raise DicomError(f"Time window too small...")
    mid = start + (end - start) / 2
    return self._find_studies(operator, mf, start, mid) + self._find_studies(operator, mf, mid, end)

Concerns:

  • No maximum recursion depth limit
  • Could cause stack overflow with pathological data distributions
  • Duplicate queries at overlapping boundaries (mid appears in both ranges)

Recommendation:

# Add recursion depth tracking
def _find_studies(self, operator, mf, start, end, depth=0):
    if depth > MAX_RECURSION_DEPTH:
        raise DicomError(f"Query complexity too high for filter {mf}")
    # ... existing logic ...
    return self._find_studies(operator, mf, start, mid, depth+1) + \
           self._find_studies(operator, mf, mid + timedelta(seconds=1), end, depth+1)

6. Database Performance: N+1 Queries 🟡

Location: processors.py:148-168

for _, study_volumes in volumes_by_study.items():
    for volume in study_volumes:
        # Multiple save() calls per volume
        self._export_volume(...)
        self._convert_volume(...)

Issue: Individual volume.save() calls could be batched for better performance with large volumes.

Recommendation: Use bulk_update() where possible or defer saves to the end of each study.
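The batching pattern can be shown ORM-agnostically; `flush_volume_updates` is a hypothetical helper, and in Django the `write` callable would typically be `MassTransferVolume.objects.bulk_update`:

```python
from typing import Callable, Iterable, Sequence


def flush_volume_updates(
    volumes: Iterable,
    write: Callable[[Sequence, Sequence[str]], None],
    fields: Sequence[str],
    batch_size: int = 100,
) -> int:
    """Issue one bulk write per batch instead of one save() call per volume.

    Returns the number of objects written. `write` stands in for a bulk
    persistence call such as Model.objects.bulk_update(objs, fields).
    """
    batch: list = []
    written = 0
    for volume in volumes:
        batch.append(volume)
        if len(batch) >= batch_size:
            write(batch, fields)
            written += len(batch)
            batch = []
    if batch:  # flush the remainder
        write(batch, fields)
        written += len(batch)
    return written
```

This cuts N round trips per study down to roughly N / batch_size, at the cost of losing per-object error granularity.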

7. Missing Configuration Defaults 🟡

Location: example.env:92-96

MASS_TRANSFER_MAX_SEARCH_RESULTS=200
MASS_TRANSFER_EXPORT_BASE_DIR="/mnt/mass_transfer_exports"

Issue: /mnt/mass_transfer_exports may not exist in development environments. No auto-creation documented.

Recommendation:

  • Add volume mount to docker-compose.dev.yml
  • Document directory creation in CLAUDE.md
  • Add healthcheck that validates directory exists and is writable
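A minimal fail-fast check along those lines might look like the following; `check_export_dir` is a hypothetical helper (not existing ADIT code) that would probe the directory configured via MASS_TRANSFER_EXPORT_BASE_DIR at startup:

```python
import tempfile
from pathlib import Path


def check_export_dir(path: str) -> None:
    """Raise early if the export base dir is missing or not writable."""
    base = Path(path)
    if not base.is_dir():
        raise RuntimeError(f"Export base dir does not exist: {base}")
    try:
        # Creating (and auto-deleting) a temp file proves write permission.
        with tempfile.NamedTemporaryFile(dir=base):
            pass
    except OSError as err:
        raise RuntimeError(f"Export base dir is not writable: {base}") from err
```

Calling this from the worker's startup (or a Docker healthcheck script) surfaces a misconfigured mount immediately instead of mid-task.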

8. Test Coverage Gaps 🟡

Missing tests:

  • ✗ Filter matching logic (_dicom_match, institution filtering)
  • ✗ Error scenarios (disk full, conversion failures, permission errors)
  • ✗ Integration tests for full export → convert → cleanup workflow
  • ✗ Concurrent task processing with shared studies
  • ✓ Partition generation (covered)
  • ✓ Per-study pseudonymization grouping (covered)

Recommendation: Add acceptance tests for end-to-end workflows.

9. Documentation 📝

Missing:

  • No docstrings for MassTransferTaskProcessor methods
  • No explanation of why convert_to_nifti is hardcoded to True (forms.py:146)
  • Filter field documentation could explain DICOM wildcard patterns
  • No migration guide from volume-export-pipeline

Recommendation: Add comprehensive docstrings, especially for:

  • _find_volumes(): Complex filtering logic
  • _volume_export_path() vs _volume_output_path(): Why two separate functions?

🔍 Minor Issues

Code Style

  1. Line 330: Division by 2 could use // for clarity that it's datetime arithmetic: mid = start + (end - start) // 2
  2. models.py:173: add_log() method could use a timestamp prefix for debugging
  3. forms.py:146: Comment explaining why convert_to_nifti = True is hardcoded would help

Potential Improvements

  1. Progress tracking: Consider WebSocket updates like selective_transfer for real-time progress
  2. Cancellation: No explicit cancellation handling for in-progress conversions
  3. Retry logic: Failed volumes aren't automatically retried separately from the task
  4. Volume deduplication: Multiple filters could find the same series (handled at line 260, but worth commenting)

🔒 Security Assessment

Reviewed Items

  • ✓ User permissions checked (clean_source, clean_destination)
  • ✓ Path traversal prevention via sanitize_filename()
  • ⚠️ Command injection risk (see issue #2)
  • ✓ SQL injection: Uses Django ORM (safe)
  • ✓ No sensitive data logged

Required Actions

  1. Audit sanitize_filename() implementation for completeness
  2. Add input validation tests with malicious DICOM metadata
  3. Document security implications of client-controlled series names
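One way to exercise item 2 with malicious metadata is against a strict allowlist sanitizer; `strict_series_name` below is a hypothetical illustration, not the project's actual sanitize_filename():

```python
import re


def strict_series_name(name: str, max_len: int = 64) -> str:
    """Allowlist sanitizer for series names used in paths and subprocess args.

    Keeps only [A-Za-z0-9_-], collapses everything else to '_', strips
    leading/trailing underscores, and truncates to max_len.
    """
    cleaned = re.sub(r"[^A-Za-z0-9_-]", "_", name)
    cleaned = cleaned.strip("_") or "unnamed"
    return cleaned[:max_len]
```

Tests would then assert that path traversal sequences, shell metacharacters, and empty values all map to safe, non-empty names.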

📋 Recommendations Summary

Must Fix (Before Merge)

  1. ✅ Register mass_transfer queue in Procrastinate configuration
  2. ✅ Verify sanitize_filename() prevents all injection vectors or add extra validation
  3. ✅ Add recursion depth limit to _find_studies()

Should Fix (Before Merge)

  1. ✅ Document mass transfer export directory setup
  2. ✅ Add error handling tests
  3. ✅ Fix off-by-one in recursive query split (mid appears in both ranges)

Nice to Have (Can Follow Up)

  1. Add WebSocket progress updates
  2. Improve cleanup failure handling with monitoring
  3. Add comprehensive docstrings
  4. Consider bulk database operations for large volume counts

📊 Summary

Overall Assessment: Strong implementation that integrates well with ADIT's architecture. The partitioning and per-study pseudonymization features are well-designed. Main concerns are around queue configuration, security hardening, and error handling robustness.

Recommendation: Approve with required fixes for queue registration and security validation.

Estimated Risk: Low-Medium (isolated feature, dedicated worker prevents impact on existing functionality)

Great work bringing the volume export workflow into ADIT! 🎉


@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces a significant new feature, the Mass Transfer app, which allows for partitioned DICOM export and conversion. The implementation is well-structured, with a dedicated worker queue for long-running tasks, reusable filters, and comprehensive models for managing jobs, tasks, and volumes. The code is generally of high quality, with good use of Django patterns and clear logic. I've identified a couple of areas for improvement related to database query efficiency and code maintainability, which I've detailed in the specific comments. Overall, this is a great addition to the project.

Comment on lines +37 to +39
counts: dict[MassTransferJob.Status, int] = {}
for status in MassTransferJob.Status:
    counts[status] = MassTransferJob.objects.filter(status=status).count()

medium

The current implementation of collect_job_stats executes a separate database query for each status, which can be inefficient. This can be optimized to use a single query with GROUP BY to fetch all counts at once.

Suggested change

-counts: dict[MassTransferJob.Status, int] = {}
-for status in MassTransferJob.Status:
-    counts[status] = MassTransferJob.objects.filter(status=status).count()
+from django.db.models import Count
+
+db_counts = {
+    item["status"]: item["count"]
+    for item in MassTransferJob.objects.values("status").annotate(count=Count("id"))
+}
+counts: dict[MassTransferJob.Status, int] = {
+    status: db_counts.get(status.value, 0) for status in MassTransferJob.Status
+}

Member


Sounds true. Not sure how the other apps do that, but we could do that in all of the apps in that way (if that works, of course). Let's open an issue for that in the other apps.

Comment on lines 264 to 297
volume, created = MassTransferVolume.objects.get_or_create(
    job=job,
    series_instance_uid=series_uid,
    defaults={
        "partition_key": self.mass_task.partition_key,
        "patient_id": str(study.PatientID),
        "accession_number": str(study.get("AccessionNumber", "")),
        "study_instance_uid": str(study.StudyInstanceUID),
        "modality": str(series.Modality),
        "study_description": str(study.get("StudyDescription", "")),
        "series_description": str(series.get("SeriesDescription", "")),
        "series_number": series_number,
        "study_datetime": timezone.make_aware(study_dt),
        "institution_name": str(series.get("InstitutionName", "")),
        "number_of_images": _parse_int(
            series.get("NumberOfSeriesRelatedInstances"), default=0
        ),
    },
)
if not created:
    volume.partition_key = self.mass_task.partition_key
    volume.patient_id = str(study.PatientID)
    volume.accession_number = str(study.get("AccessionNumber", ""))
    volume.study_instance_uid = str(study.StudyInstanceUID)
    volume.modality = str(series.Modality)
    volume.study_description = str(study.get("StudyDescription", ""))
    volume.series_description = str(series.get("SeriesDescription", ""))
    volume.series_number = series_number
    volume.study_datetime = timezone.make_aware(study_dt)
    volume.institution_name = str(series.get("InstitutionName", ""))
    volume.number_of_images = _parse_int(
        series.get("NumberOfSeriesRelatedInstances"), default=0
    )
    volume.save()

medium

The use of get_or_create followed by an if not created: block to update the object leads to significant code duplication. This can be simplified by using update_or_create, which handles both creation and updates more cleanly.

                    defaults = {
                        "partition_key": self.mass_task.partition_key,
                        "patient_id": str(study.PatientID),
                        "accession_number": str(study.get("AccessionNumber", "")),
                        "study_instance_uid": str(study.StudyInstanceUID),
                        "modality": str(series.Modality),
                        "study_description": str(study.get("StudyDescription", "")),
                        "series_description": str(series.get("SeriesDescription", "")),
                        "series_number": series_number,
                        "study_datetime": timezone.make_aware(study_dt),
                        "institution_name": str(series.get("InstitutionName", "")),
                        "number_of_images": _parse_int(
                            series.get("NumberOfSeriesRelatedInstances"), default=0
                        ),
                    }
                    volume, _ = MassTransferVolume.objects.update_or_create(
                        job=job,
                        series_instance_uid=series_uid,
                        defaults=defaults,
                    )


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 4

🤖 Fix all issues with AI agents
In `@adit/mass_transfer/apps.py`:
- Line 45: The signal handler init_db currently declares an unused **kwargs
which triggers Ruff ARG001; rename that parameter to a leading-underscore form
(e.g., **_kwargs) so the handler signature remains compatible with Django
signals but linters treat it as intentionally unused; update any internal
references if present (none expected) and run the linter to verify the warning
is gone.
- Around line 12-42: The register_app() function registers a job-stats collector
each time AppConfig.ready() runs, causing duplicates on Django autoreload; make
register_job_stats_collector idempotent by checking for prior registration
before appending: e.g., when registering collect_job_stats from register_app(),
detect an existing identical collector (by function identity or a unique
name/key) and skip re-registration, or mark the function with an attribute
(e.g., __registered__) after successful registration; update
register_app()/collect_job_stats usage to perform that check so multiple calls
from ready() are no-ops.

In `@adit/mass_transfer/processors.py`:
- Around line 303-335: In _find_studies, the StudyTime range is being applied
for every date when splitting multi-day windows, which can exclude studies on
intermediate days; change the QueryDataset.create call so StudyTime is only set
when start.date() == end.date() (i.e., the window is a single day), while always
setting StudyDate=(start.date(), end.date()), and keep the existing logic that
adds query.dataset.ModalitiesInStudy and StudyDescription using mf.modality and
mf.study_description; ensure recursive splits continue to pass correct start/end
so StudyTime will be applied only for single-day queries.

In `@adit/mass_transfer/templates/mass_transfer/mass_transfer_job_detail.html`:
- Around line 40-48: The template uses an outer conditional ({% if
job.filters.all %}) which prevents the {% empty %} clause from ever running;
remove that outer if around the for-loop (or replace it with an explicit else)
so the for f in job.filters.all ... {% empty %} — {% endfor %} executes for
empty filter sets and renders the placeholder; update the block around
job.filters.all accordingly.
🧹 Nitpick comments (6)
adit/mass_transfer/utils/partitions.py (1)

39-40: Consider using time.max for end-of-day precision.

Using time(23, 59, 59) excludes the final fraction of the day (timestamps from 23:59:59.000001 through 23:59:59.999999). If DICOM queries use inclusive timestamp comparisons, studies at the very end of a day could be missed.

♻️ Proposed fix
-        end_dt = timezone.make_aware(datetime.combine(window_end_date, time(23, 59, 59)), tz)
+        end_dt = timezone.make_aware(datetime.combine(window_end_date, time.max), tz)
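As a quick illustration (plain Python, independent of the ADIT codebase), `time.max` extends the upper bound to the last representable microsecond of the day, so a study timestamped inside the final second's fractional part is no longer excluded:

```python
from datetime import datetime, time

day = datetime(2024, 2, 15).date()

coarse_end = datetime.combine(day, time(23, 59, 59))
precise_end = datetime.combine(day, time.max)  # 23:59:59.999999

# This timestamp falls after the coarse bound but within the precise one.
late_study = datetime.combine(day, time(23, 59, 59, 500000))
assert late_study > coarse_end
assert late_study <= precise_end
```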
adit/mass_transfer/processors.py (2)

80-97: Consider consolidating the duplicate path builders.

_volume_export_path and _volume_output_path are identical; a single helper would reduce duplication and drift risk.


110-191: Add structured logs for task and volume state transitions.

There are status transitions and per-volume outcomes, but only exception logs are emitted. Structured logs would improve traceability in long-running batch workflows.
As per coding guidelines "Use structured logging for agent actions and state transitions".

adit/mass_transfer/admin.py (1)

20-22: Consider adding custom admin for MassTransferVolume for better usability.

MassTransferVolume has many fields (pseudonym, patient_id, accession_number, study/series UIDs, status, etc.). Using the default admin.ModelAdmin may make it difficult to navigate and filter volumes. Consider adding list_display, list_filter, and search_fields similar to how DicomJobAdmin and DicomTaskAdmin are configured in adit/core/admin.py.

♻️ Optional: Custom admin for MassTransferVolume
+class MassTransferVolumeAdmin(admin.ModelAdmin):
+    list_display = ("id", "job", "status", "pseudonym", "modality", "study_datetime", "created")
+    list_filter = ("status", "modality", "created")
+    search_fields = ("pseudonym", "patient_id", "study_instance_uid", "series_instance_uid")
+
+
 admin.site.register(MassTransferJob, MassTransferJobAdmin)
 admin.site.register(MassTransferTask, DicomTaskAdmin)
 admin.site.register(MassTransferSettings, admin.ModelAdmin)
 admin.site.register(MassTransferFilter, admin.ModelAdmin)
-admin.site.register(MassTransferVolume, admin.ModelAdmin)
+admin.site.register(MassTransferVolume, MassTransferVolumeAdmin)
adit/mass_transfer/tests/test_processor.py (1)

150-155: Consider using tmp_path fixture for consistency.

While the static analysis warning (S108) about /tmp/base is a false positive here since no actual files are created, using the tmp_path fixture would be more consistent with the other tests in this file and avoids the linter noise.

♻️ Optional: Use tmp_path fixture
-def test_volume_output_path_uses_year_month_and_pseudonym():
-    base_dir = Path("/tmp/base")
+def test_volume_output_path_uses_year_month_and_pseudonym(tmp_path: Path):
+    base_dir = tmp_path / "base"
     study_dt = datetime(2024, 2, 15, 10, 30)
     path = _volume_output_path(base_dir, study_dt, "pseudo", "1-Head")

     assert path == base_dir / "202402" / "pseudo" / "1-Head"
adit/mass_transfer/forms.py (1)

122-152: Add structured logs for job/task creation.

Creating partitioned tasks is a key state transition; emitting a structured event (job_id, partition_count, date range) will improve auditability and debugging.

As per coding guidelines: **/*.{js,ts,py}: Use structured logging for agent actions and state transitions.

Comment on lines +12 to +42
def ready(self):
register_app()

# Put calls to db stuff in this signal handler
post_migrate.connect(init_db, sender=self)


def register_app():
from adit_radis_shared.common.site import MainMenuItem, register_main_menu_item

from adit.core.site import JobStats, register_dicom_processor, register_job_stats_collector

from .models import MassTransferJob, MassTransferTask
from .processors import MassTransferTaskProcessor

register_main_menu_item(
MainMenuItem(
url_name="mass_transfer_job_create",
label=SECTION_NAME,
)
)

register_dicom_processor(get_model_label(MassTransferTask), MassTransferTaskProcessor)

def collect_job_stats() -> JobStats:
counts: dict[MassTransferJob.Status, int] = {}
for status in MassTransferJob.Status:
counts[status] = MassTransferJob.objects.filter(status=status).count()
return JobStats("Mass Transfer", "mass_transfer_job_list", counts)

register_job_stats_collector(collect_job_stats)

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

cd adit/mass_transfer && cat -n apps.py

Repository: openradx/adit

Length of output: 1873


🌐 Web query:

Django AppConfig.ready() autoreloader multiple calls duplicate registration pattern

💡 Result:

Django can call AppConfig.ready() more than once, so anything you do there (signal connect(), registry setup, scheduler start, etc.) must be idempotent. Django explicitly documents this (“ready might be called more than once… write idempotent methods, or put a flag…”). [1]

Why it happens (common case)

With manage.py runserver, Django’s autoreloader starts two processes (a parent “watcher” + the child that actually serves), so your startup code can run twice unless you guard it. [2]

Robust duplicate-registration patterns

1) Prefer “idempotent registration” for signals: dispatch_uid

If your issue is duplicate signal handlers, use dispatch_uid so repeated imports / repeated ready() calls don’t connect the same receiver twice. [3]

# apps.py
from django.apps import AppConfig
from django.db.models.signals import post_save

class MyAppConfig(AppConfig):
    name = "myapp"

    def ready(self):
        from .signals import my_receiver
        from .models import MyModel
        post_save.connect(
            my_receiver,
            sender=MyModel,
            dispatch_uid="myapp.post_save.my_receiver",  # prevents duplicates
        )

2) If you truly need “run once per runserver start”, guard for the autoreloader child

Typical guard:

import os
from django.apps import AppConfig

class MyAppConfig(AppConfig):
    name = "myapp"

    def ready(self):
        if os.environ.get("RUN_MAIN") != "true":
            return  # skip the autoreloader parent process
        # do one-time startup work here

This pattern is commonly used specifically to avoid the runserver double-process behavior. [2]

A more Django-internal-focused variant discussed by Django devs is checking the autoreloader env var (DJANGO_AUTORELOAD_ENV == "true") to distinguish the “real” child process. [4]

3) Also remember: “once” is per process, not per deployment

Even if you fix runserver duplication, in production multiple worker processes will each run ready(). So anything that must be globally-singleton (cron scheduler, background thread, queue consumer) should usually live outside Django app startup (or be protected by an external lock/leader election), not just an in-process flag. This is the same underlying “idempotency” requirement Django documents for ready(). [1]

Sources:
[1] Django docs: AppConfig.ready() may run more than once; make it idempotent or guard with a flag. (docs.djangoproject.com)
[2] runserver starts two processes; common RUN_MAIN guard / --noreload discussion. (stackoverflow.com)
[3] Django signal docs: prevent duplicate registrations with dispatch_uid; notes ready() can execute more than once. (docs.djangoproject.com)
[4] Django-developers thread: DJANGO_AUTORELOAD_ENV used by the autoreloader to mark the “active” child process. (groups.google.com)

Citations:


🏁 Script executed:

# Check the registration functions to see if they handle duplicates
fd -t f -name "*.py" -path "*/adit/core/*" | xargs rg -l "register_job_stats_collector|register_main_menu_item" | head -5

Repository: openradx/adit

Length of output: 436


🏁 Script executed:

rg -A 10 "def register_main_menu_item" --type py

Repository: openradx/adit

Length of output: 39


🏁 Script executed:

rg -A 10 "def register_job_stats_collector" --type py

Repository: openradx/adit

Length of output: 551


🏁 Script executed:

rg -A 10 "def register_dicom_processor" --type py

Repository: openradx/adit

Length of output: 245


🏁 Script executed:

rg "from adit_radis_shared" adit/mass_transfer/apps.py -A 5

Repository: openradx/adit

Length of output: 350


🏁 Script executed:

# Check if there's any indication of adit_radis_shared package location or if it's external
fd -t f "site.py" | xargs rg -l "MainMenuItem.*register_main_menu_item" 2>/dev/null | head -3

Repository: openradx/adit

Length of output: 39


🌐 Web query:

Ruff ARG001 unused argument Python error rule enforcement

💡 Result:

ARG001 is Ruff’s unused-function-argument rule (from flake8-unused-arguments). It flags function parameters that are defined but never used. If the argument is intentionally unused, rename it to match Ruff’s “dummy variable” pattern (by default, a leading underscore, e.g. _unused). [1]

Enforcing ARG001

Enable it in your config (e.g., pyproject.toml):

[tool.ruff.lint]
select = ["ARG001"]
# or enable the whole family:
# select = ["ARG"]

Ruff’s ARG rules treat “dummy” names as intentionally unused; you can customize what counts as “dummy” via lint.dummy-variable-rgx. [1]

Common ways to handle/allow exceptions

  • Mark the parameter as intentionally unused:

    def callback(_unused, x): ...

    (works because _unused matches the dummy-variable pattern) [1]

  • Suppress in one place with noqa:

    def callback(unused, x): ...  # noqa: ARG001

    Ruff supports noqa and also global/per-file ignores via configuration. [2]

  • If you want “only flag certain argument names”: Ruff doesn’t currently offer a built-in allow/deny list of argument names for ARG rules (there’s an open request). [3]

Sources: Ruff rule docs for ARG001 [1], Ruff suppression/config docs (noqa, ignores, per-file-ignores) [2], Ruff GitHub issue requesting name-based selection/ignores for ARG rules [3].


Guard against duplicate registrations in register_job_stats_collector on autoreload.
register_app() runs from AppConfig.ready(), which Django's autoreloader may call multiple times (especially during manage.py runserver). The register_job_stats_collector() function appends to a global list without deduplication, so multiple calls will create duplicate stats collectors. Consider an idempotency guard to make registration safe to re-run. (register_dicom_processor() is already safe by design since it uses dict assignment; register_main_menu_item() behavior depends on the external package.)

🔁 Suggested idempotency guard
+_APP_REGISTERED = False
+
 def register_app():
+    global _APP_REGISTERED
+    if _APP_REGISTERED:
+        return
+    _APP_REGISTERED = True
     from adit_radis_shared.common.site import MainMenuItem, register_main_menu_item
     from adit.core.site import JobStats, register_dicom_processor, register_job_stats_collector
     ...
🤖 Prompt for AI Agents
In `@adit/mass_transfer/apps.py` around lines 12 - 42, The register_app() function
registers a job-stats collector each time AppConfig.ready() runs, causing
duplicates on Django autoreload; make register_job_stats_collector idempotent by
checking for prior registration before appending: e.g., when registering
collect_job_stats from register_app(), detect an existing identical collector
(by function identity or a unique name/key) and skip re-registration, or mark
the function with an attribute (e.g., __registered__) after successful
registration; update register_app()/collect_job_stats usage to perform that
check so multiple calls from ready() are no-ops.

register_job_stats_collector(collect_job_stats)


def init_db(**kwargs):

⚠️ Potential issue | 🟡 Minor

Avoid Ruff ARG001 for unused signal kwargs.
Rename the unused kwargs to a leading-underscore name (or add a noqa) to keep lint clean while preserving the signal handler signature.

🧹 Minimal fix
-def init_db(**kwargs):
+def init_db(**_kwargs):
🧰 Tools
🪛 Ruff (0.14.14)

[warning] 45-45: Unused function argument: kwargs

(ARG001)

🤖 Prompt for AI Agents
In `@adit/mass_transfer/apps.py` at line 45, The signal handler init_db currently
declares an unused **kwargs which triggers Ruff ARG001; rename that parameter to
a leading-underscore form (e.g., **_kwargs) so the handler signature remains
compatible with Django signals but linters treat it as intentionally unused;
update any internal references if present (none expected) and run the linter to
verify the warning is gone.

Comment on lines 303 to 335
def _find_studies(
self,
operator: DicomOperator,
mf: MassTransferFilter,
start: datetime,
end: datetime,
) -> list[ResultDataset]:
max_results = settings.MASS_TRANSFER_MAX_SEARCH_RESULTS

query = QueryDataset.create(
StudyDate=(start.date(), end.date()),
StudyTime=(start.time(), end.time()),
)

if mf.modality:
query.dataset.ModalitiesInStudy = mf.modality
if mf.study_description:
query.dataset.StudyDescription = mf.study_description

studies = list(operator.find_studies(query, limit_results=max_results + 1))

if len(studies) > max_results:
if end - start < _MIN_SPLIT_WINDOW:
raise DicomError(
f"Time window too small ({start} to {end}) for filter {mf}."
)

mid = start + (end - start) / 2
return self._find_studies(operator, mf, start, mid) + self._find_studies(
operator, mf, mid, end
)

return studies

⚠️ Potential issue | 🟠 Major

Fix StudyTime filtering when splitting multi-day windows.

When the search window is split and spans multiple days, applying StudyTime=(start.time(), end.time()) to every date can exclude studies on non-boundary days. This can cause missing volumes once the query is recursively split. Consider only applying the StudyTime range when the window is a single day.

🛠️ Proposed fix
-        query = QueryDataset.create(
-            StudyDate=(start.date(), end.date()),
-            StudyTime=(start.time(), end.time()),
-        )
+        query_kwargs = {"StudyDate": (start.date(), end.date())}
+        if start.date() == end.date():
+            query_kwargs["StudyTime"] = (start.time(), end.time())
+        query = QueryDataset.create(**query_kwargs)
🧰 Tools
🪛 Ruff (0.14.14)

[warning] 326-328: Avoid specifying long messages outside the exception class

(TRY003)

🤖 Prompt for AI Agents
In `@adit/mass_transfer/processors.py` around lines 303 - 335, In _find_studies,
the StudyTime range is being applied for every date when splitting multi-day
windows, which can exclude studies on intermediate days; change the
QueryDataset.create call so StudyTime is only set when start.date() ==
end.date() (i.e., the window is a single day), while always setting
StudyDate=(start.date(), end.date()), and keep the existing logic that adds
query.dataset.ModalitiesInStudy and StudyDescription using mf.modality and
mf.study_description; ensure recursive splits continue to pass correct start/end
so StudyTime will be applied only for single-day queries.

Comment on lines +40 to +48
<dt class="col-sm-3">Filters</dt>
<dd class="col-sm-9">
{% if job.filters.all %}
{% for f in job.filters.all %}
<div>{{ f }}</div>
{% empty %}
{% endfor %}
{% endif %}

⚠️ Potential issue | 🟡 Minor

Render an empty-state placeholder for filters.

{% if job.filters.all %} prevents the {% empty %} branch from ever showing, so jobs without filters render a blank cell. Remove the outer if or add an explicit else.

🛠️ Proposed fix
-        <dd class="col-sm-9">
-            {% if job.filters.all %}
-                {% for f in job.filters.all %}
-                    <div>{{ f }}</div>
-                {% empty %}
-
-                {% endfor %}
-            {% endif %}
-        </dd>
+        <dd class="col-sm-9">
+            {% for f in job.filters.all %}
+                <div>{{ f }}</div>
+            {% empty %}
+
+            {% endfor %}
+        </dd>
🤖 Prompt for AI Agents
In `@adit/mass_transfer/templates/mass_transfer/mass_transfer_job_detail.html`
around lines 40 - 48, The template uses an outer conditional ({% if
job.filters.all %}) which prevents the {% empty %} clause from ever running;
remove that outer if around the for-loop (or replace it with an explicit else)
so the for f in job.filters.all ... {% empty %} — {% endfor %} executes for
empty filter sets and renders the placeholder; update the block around
job.filters.all accordingly.

@claude

claude bot commented Feb 3, 2026

Pull Request Review: Mass Transfer App

Overview

This PR introduces a comprehensive Mass Transfer facility for bulk DICOM export with NIfTI conversion, implementing time-partitioned processing and reusable filter management. The implementation follows ADIT's architectural patterns and integrates well with the existing job/task system.

Positive Aspects

  1. Well-structured Architecture: The app follows ADIT's established patterns with proper separation between models, processors, views, and forms
  2. Comprehensive Testing: Good test coverage including edge cases (time window splitting, pseudonymization opt-out)
  3. Proper Integration: Correctly integrates with existing infrastructure (Procrastinate queues, cleanup commands, settings)
  4. Reusable Filters: The filter management system is a nice feature for defining and sharing transfer criteria
  5. Partition System: Time-based partitioning (daily/weekly) is well-designed for managing large-scale transfers
  6. Docker Support: Proper worker container configuration for both dev and prod environments

Critical Issues

1. Hardcoded NIfTI Conversion (adit/mass_transfer/forms.py:151-152)

The form hardcodes convert_to_nifti = True but the DicomJob model may not have this field. This could cause runtime errors.

Recommendation: Verify the field exists in the base model, or add proper validation.

2. Security: Command Injection Risk (adit/mass_transfer/processors.py:427-438)

While using a list avoids shell injection, series_name uses sanitize_filename(). Verify that this properly escapes all dangerous characters for command-line use.

Recommendation: Add explicit validation or use a safer filename generation approach. Consider setting check=True in subprocess.run.
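A minimal sketch of both suggestions, assuming a helper along these lines (the whitelist regex and both function names are illustrative, not ADIT's actual `sanitize_filename`):

```python
import re
import subprocess
from pathlib import Path


def safe_series_name(name: str) -> str:
    # Whitelist characters that are safe both on disk and as a CLI argument;
    # everything else collapses to "_". Stripping leading/trailing dots and
    # underscores means the result can never be "." or "..".
    cleaned = re.sub(r"[^A-Za-z0-9._-]+", "_", name).strip("._")
    if not cleaned:
        raise ValueError(f"series name {name!r} sanitized to empty string")
    return cleaned


def convert_series(dicom_dir: Path, out_dir: Path, series_name: str) -> None:
    out_dir.mkdir(parents=True, exist_ok=True)
    subprocess.run(
        [
            "dcm2niix",
            "-z", "y",                            # gzip output
            "-f", safe_series_name(series_name),  # output filename template
            "-o", str(out_dir),
            str(dicom_dir),
        ],
        check=True,  # raise CalledProcessError on a non-zero exit code
        capture_output=True,
        text=True,
    )
```

With `check=True`, a failed conversion surfaces as an exception that the task processor can log and count, instead of silently producing an empty output directory.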

3. Resource Leaks: Incomplete Cleanup (adit/mass_transfer/processors.py:450-465)

Silent failure on cleanup means disk space could accumulate. The " (cleaned)" suffix trick is fragile.

Recommendation: Add periodic cleanup job, log cleanup failures at WARNING level, consider using temporary directories.
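One hedged sketch of the temporary-directory suggestion (the `convert` callable is hypothetical): `tempfile.TemporaryDirectory` removes the export scratch space even when conversion raises, so a missed cleanup cannot silently accumulate disk usage.

```python
import tempfile
from pathlib import Path


def run_with_scratch_export(convert):
    """Give `convert` a scratch DICOM export directory that is always
    deleted on exit, whether conversion succeeds or raises."""
    with tempfile.TemporaryDirectory(prefix="mass_transfer_") as tmp:
        export_dir = Path(tmp)
        return convert(export_dir)
```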

4. Performance: Potential N+1 Query (adit/mass_transfer/processors.py:208-306)

For each filter, for each study, there is a separate DICOM C-FIND query. With multiple filters this could query the same studies repeatedly.

Recommendation: Consider caching study results or deduplicating queries before hitting the PACS server.
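A possible shape for the deduplication, sketched as a generic memoizing wrapper (the query-key tuple is an assumption about what uniquely identifies a C-FIND here):

```python
def make_cached_finder(find_studies):
    """Wrap a find_studies callable so identical (modality, description,
    start, end) queries hit the PACS only once per job run."""
    cache = {}

    def cached_find(modality, description, start, end):
        key = (modality, description, start, end)
        if key not in cache:
            cache[key] = find_studies(modality, description, start, end)
        return cache[key]

    return cached_find
```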

5. Data Integrity: Race Condition (adit/mass_transfer/processors.py:269-287)

If the volume already exists, all fields are blindly overwritten. Concurrent processing could cause lost updates or inconsistent state.

Recommendation: Add locking or check the volume's status before updating fields.

Minor Issues

  1. Type Safety: Missing return type annotations
  2. Error Handling: Tasks can succeed even if ALL volumes fail
  3. Code Style: Inconsistent field ordering in MassTransferVolume model
  4. Missing Documentation: Utility functions lack docstrings
  5. Hardcoded Magic Values: _MIN_SPLIT_WINDOW should be a setting
  6. Test Coverage Gaps: Missing tests for filter validation, partition edge cases, concurrent processing
  7. UI/UX: No preview of how many tasks will be created

Security Considerations

  1. Access Control: Verify that filter sharing does not bypass DICOM node access restrictions
  2. Path Traversal: Ensure sanitize_filename() prevents path traversal in export paths
  3. Resource Exhaustion: No limit on number of volumes per job

Performance Considerations

  1. Database: MassTransferVolume table could grow very large - consider indexing and archiving
  2. PACS Load: Recursive time splitting could create many small queries
  3. Disk I/O: Simultaneous conversion could saturate I/O

Recommendations Summary

Must Fix Before Merge:

  1. Verify convert_to_nifti field exists
  2. Add validation for command-line inputs
  3. Implement cleanup monitoring/alerting
  4. Add locking for race conditions

Should Fix Before Merge:
5. Add return type annotations
6. Improve error context in exception handling
7. Add test coverage for edge cases
8. Move magic constants to settings

Overall Assessment

This is a well-designed and valuable feature that properly extends ADIT's architecture. The code quality is generally good with proper testing and integration. However, there are several issues that should be addressed before merging around field validation, resource management, security, and performance.

With the recommended fixes, this will be a solid addition to ADIT. Great work on maintaining consistency with existing patterns!


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🤖 Fix all issues with AI agents
In `@adit/mass_transfer/forms.py`:
- Around line 149-161: The current save stores a bound method on self.save_tasks
when commit=False which is unconventional; instead assign a no-arg closure to
the standard save_m2m hook or document the pattern. Concretely, after job =
super().save(commit=False) and setting job fields, if commit is False set
self.save_m2m = lambda: self._save_tasks(job) (or alternatively set
self._pending_job = job and override save_m2m to call
self._save_tasks(self._pending_job)); this keeps the callback signature
consistent with Django and avoids requiring callers to call
form.save_tasks(job). Ensure changes reference the save method, self.save_m2m,
and _save_tasks.
🧹 Nitpick comments (2)
adit/mass_transfer/processors.py (2)

80-97: Consider consolidating identical path helpers.

_volume_export_path and _volume_output_path have identical implementations. If this duplication is intentional to allow divergent paths in the future, consider adding a comment. Otherwise, a single helper could be used.

♻️ Optional: Consolidate into a single helper
-def _volume_export_path(
-    base_dir: Path,
-    study_dt: datetime,
-    subject_id: str,
-    series_name: str,
-) -> Path:
-    year_month = study_dt.strftime("%Y%m")
-    return base_dir / year_month / subject_id / series_name
-
-
-def _volume_output_path(
+def _volume_path(
     base_dir: Path,
     study_dt: datetime,
     subject_id: str,
     series_name: str,
 ) -> Path:
+    """Construct path for volume export or output directories."""
     year_month = study_dt.strftime("%Y%m")
     return base_dir / year_month / subject_id / series_name

269-302: Consider using update_or_create for volume records.

The current pattern of get_or_create followed by manual field updates on existing records could be simplified with update_or_create. This would reduce code duplication and make the intent clearer.

♻️ Optional: Use update_or_create
-                    volume, created = MassTransferVolume.objects.get_or_create(
+                    defaults = {
+                        "partition_key": self.mass_task.partition_key,
+                        "patient_id": str(study.PatientID),
+                        "accession_number": str(study.get("AccessionNumber", "")),
+                        "study_instance_uid": str(study.StudyInstanceUID),
+                        "modality": str(series.Modality),
+                        "study_description": str(study.get("StudyDescription", "")),
+                        "series_description": str(series.get("SeriesDescription", "")),
+                        "series_number": series_number,
+                        "study_datetime": timezone.make_aware(study_dt),
+                        "institution_name": str(series.get("InstitutionName", "")),
+                        "number_of_images": _parse_int(
+                            series.get("NumberOfSeriesRelatedInstances"), default=0
+                        ),
+                    }
+                    volume, _ = MassTransferVolume.objects.update_or_create(
                         job=job,
                         series_instance_uid=series_uid,
-                        defaults={
-                            "partition_key": self.mass_task.partition_key,
-                            ...
-                        },
+                        defaults=defaults,
                     )
-                    if not created:
-                        volume.partition_key = self.mass_task.partition_key
-                        ...
-                        volume.save()

Comment on lines 149 to 161
def save(self, commit: bool = True):
job = super().save(commit=False)
# Mass transfer always converts to NIfTI
job.convert_to_nifti = True
job.urgent = False

if commit:
job.save()
self.save_m2m()
self._save_tasks(job)
else:
self.save_tasks = self._save_tasks


⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

rg -n "MassTransferJobForm" --type=py -A 15 adit/mass_transfer/views.py

Repository: openradx/adit

Length of output: 1467


🏁 Script executed:

# Check the _save_tasks method signature
rg -n "_save_tasks" --type=py -B2 -A8 adit/mass_transfer/forms.py

Repository: openradx/adit

Length of output: 601


🏁 Script executed:

# Search for any calls to save_tasks after save(commit=False)
rg -n "save\(commit=False\)" --type=py -A5 adit/mass_transfer/

Repository: openradx/adit

Length of output: 421


🏁 Script executed:

# Look for other forms that use similar callback patterns
rg -n "self\.[a-z_]+ = self\._[a-z_]+" --type=py adit/

Repository: openradx/adit

Length of output: 379


🏁 Script executed:

# Check MassTransferJobCreateView for form_valid or custom save logic
rg -n "MassTransferJobCreateView" --type=py -A 30 adit/mass_transfer/views.py

Repository: openradx/adit

Length of output: 1362


🏁 Script executed:

# Search for calls to form.save_tasks anywhere
rg -n "form\.save_tasks\(" --type=py

Repository: openradx/adit

Length of output: 39


🏁 Script executed:

# Check if there's a form_valid method in the view
rg -n "def form_valid" --type=py -B3 -A8 adit/mass_transfer/views.py

Repository: openradx/adit

Length of output: 455


🏁 Script executed:

# Find DicomJobCreateView and check how it handles form.save_tasks
rg -n "class DicomJobCreateView" --type=py -A 30 adit/

Repository: openradx/adit

Length of output: 1633


🏁 Script executed:

# Also check how form_valid is implemented in the parent class
rg -n "def form_valid" --type=py -B2 -A 15 adit/core/ | head -100

Repository: openradx/adit

Length of output: 3171


Document the save_tasks callback pattern or use standard Django save_m2m approach.

When commit=False, self.save_tasks stores the bound method self._save_tasks. The caller must explicitly invoke form.save_tasks(job) after saving the form. This pattern is unconventional for Django forms (where save_m2m takes no arguments). Clarify in documentation or tests how external code should handle this, or consider storing a closure or using Django's standard save_m2m hook instead.

🤖 Prompt for AI Agents
In `@adit/mass_transfer/forms.py` around lines 149 - 161, The current save stores
a bound method on self.save_tasks when commit=False which is unconventional;
instead assign a no-arg closure to the standard save_m2m hook or document the
pattern. Concretely, after job = super().save(commit=False) and setting job
fields, if commit is False set self.save_m2m = lambda: self._save_tasks(job) (or
alternatively set self._pending_job = job and override save_m2m to call
self._save_tasks(self._pending_job)); this keeps the callback signature
consistent with Django and avoids requiring callers to call
form.save_tasks(job). Ensure changes reference the save method, self.save_m2m,
and _save_tasks.
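The no-arg closure the prompt recommends can be illustrated without Django; `JobFormSketch` and the string job values below are hypothetical stand-ins for the real form and model, showing only the callback shape:

```python
class JobFormSketch:
    """Hypothetical stand-in for MassTransferJobForm, illustrating the
    suggested no-arg save_m2m closure without needing Django."""

    def __init__(self):
        self.saved_jobs = []

    def _save_tasks(self, job):
        self.saved_jobs.append(job)

    def save(self, job, commit=True):
        if commit:
            self._save_tasks(job)
        else:
            # Bind the pending job into a no-arg closure, so callers can use
            # the standard Django idiom: form.save(commit=False); ...; form.save_m2m()
            self.save_m2m = lambda: self._save_tasks(job)
        return job


form = JobFormSketch()
form.save("job-1", commit=False)
print(form.saved_jobs)  # [] — nothing persisted yet
form.save_m2m()
print(form.saved_jobs)  # ['job-1']
```

With this shape, callers never need to know about a custom `save_tasks(job)` signature.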

Comment on lines 427 to 444
cmd = [
"dcm2niix",
"-z",
"y",
"-o",
str(output_path),
"-f",
series_name,
str(volume.exported_folder),
]

result = subprocess.run(cmd, check=False, capture_output=True, text=True)
if result.returncode != 0:
raise DicomError(
f"Conversion failed for series {volume.series_instance_uid}: {result.stderr}"
)

volume.converted_file = str(output_path / f"{series_name}.nii.gz")
Copy link
Contributor


⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# First, examine the full _convert_volume method to understand the context
cat -n adit/mass_transfer/processors.py | sed -n '403,448p'

Repository: openradx/adit

Length of output: 1873


🏁 Script executed:

#!/bin/bash
# Search for any existing dcm2niix or nii.gz handling patterns in the codebase
rg -n "dcm2niix|nii\.gz|converted_file" --type=py -C2

Repository: openradx/adit

Length of output: 5423


🏁 Script executed:

#!/bin/bash
# Check if there are any tests related to dcm2niix output handling
fd -type f -name "*.py" | xargs grep -l "dcm2niix" | head -20

Repository: openradx/adit

Length of output: 227


🏁 Script executed:

#!/bin/bash
# Examine the DicomToNiftiConverter class
cat -n adit/core/utils/dicom_to_nifti_converter.py

Repository: openradx/adit

Length of output: 2228


🏁 Script executed:

#!/bin/bash
# Check if there are any tests for the _convert_volume method specifically
fd -type f -name "*.py" -path "*/test*" | xargs grep -l "_convert_volume\|mass_transfer.*test" | head -10

Repository: openradx/adit

Length of output: 227


🏁 Script executed:

#!/bin/bash
# Search for any file existence checks or output validation after dcm2niix calls
rg -n "nii\.gz.*exist|Path.*exist|converted_file" --type py -A3 -B3

Repository: openradx/adit

Length of output: 2514


🏁 Script executed:

#!/bin/bash
# Check the MassTransferVolume model to understand series-to-volume mapping
rg -n "class MassTransferVolume|series.*=" adit/mass_transfer/models.py | head -30

Repository: openradx/adit

Length of output: 474


🏁 Script executed:

#!/bin/bash
# Look at how volumes are created from DICOM series - check the export process
rg -n "_export_series|_create.*volume" adit/mass_transfer/processors.py -A5 | head -50

Repository: openradx/adit

Length of output: 39


🏁 Script executed:

#!/bin/bash
# Search for mass_transfer specific tests
fd -type f -name "*test*.py" adit/mass_transfer/ | head -10

Repository: openradx/adit

Length of output: 227


🏁 Script executed:

#!/bin/bash
# Check if there are mass_transfer acceptance tests that validate the converted_file assumption
find adit/mass_transfer -name "*test*.py" -type f -exec cat {} \;

Repository: openradx/adit

Length of output: 7993


🏁 Script executed:

#!/bin/bash
# Check the batch_transfer test to understand if ROI1 is from separate series or same series
grep -n "ROI1\|Topogramm" adit/batch_transfer/tests/acceptance/test_batch_transfer.py -B5 -A10

Repository: openradx/adit

Length of output: 699


🏁 Script executed:

#!/bin/bash
# Search for how dcm2niix -f flag behaves - check for any documentation or comments
rg -n "\-f.*format|\-f.*output|dcm2niix.*behavior" --type py -A2 -B2

Repository: openradx/adit

Length of output: 39


🏁 Script executed:

#!/bin/bash
# Check what batch_transfer actually uses - does it use the same _convert_volume code?
rg -n "batch_transfer.*processors|batch_transfer.*convert" adit/batch_transfer/ -C2 | head -50

Repository: openradx/adit

Length of output: 2462


🏁 Script executed:

#!/bin/bash
# Check if there's a test for mass_transfer that actually calls _convert_volume with real dcm2niix
rg -n "test.*convert_volume|_convert_volume" adit/mass_transfer/tests/ -C3

Repository: openradx/adit

Length of output: 1216


🏁 Script executed:

#!/bin/bash
# Get the full batch_transfer test file to understand the setup
wc -l adit/batch_transfer/tests/acceptance/test_batch_transfer.py

Repository: openradx/adit

Length of output: 120


🏁 Script executed:

#!/bin/bash
# Look at the first part of batch_transfer test to see how volumes/series are set up
head -n 200 adit/batch_transfer/tests/acceptance/test_batch_transfer.py

Repository: openradx/adit

Length of output: 6880


🏁 Script executed:

#!/bin/bash
# Check how batch_transfer actually creates/manages volumes and their conversion
rg -n "class.*Processor|def.*convert|MassTransferVolume\|BatchTransferVolume" adit/batch_transfer/ --type py -A3 | head -80

Repository: openradx/adit

Length of output: 2094


🏁 Script executed:

#!/bin/bash
# Check if batch_transfer uses the same _convert_volume method or different code
rg -n "from adit.mass_transfer\|import.*mass_transfer\|DicomToNiftiConverter" adit/batch_transfer/ --type py

Repository: openradx/adit

Length of output: 39


🏁 Script executed:

#!/bin/bash
# Find TransferTaskProcessor to see the common base class
rg -n "class TransferTaskProcessor" --type py -A20

Repository: openradx/adit

Length of output: 1490


🏁 Script executed:

#!/bin/bash
# Check the batch_transfer processors.py file to see what it actually does
head -n 100 adit/batch_transfer/processors.py

Repository: openradx/adit

Length of output: 541


🏁 Script executed:

#!/bin/bash
# Search for where the actual NIfTI conversion happens - look for DicomToNiftiConverter usage
rg -n "DicomToNiftiConverter\|dcm2niix" adit/ --type py | grep -v "test\|migration"

Repository: openradx/adit

Length of output: 39


🏁 Script executed:

#!/bin/bash
# Check if there's a convert_to_nifti function or similar in batch_transfer or core
rg -n "def.*convert.*nifti\|def.*convert_to_nifti" adit/ --type py -A5

Repository: openradx/adit

Length of output: 39


🏁 Script executed:

#!/bin/bash
# Search for where batch transfer actually implements NIfTI conversion - check views or tasks
rg -n "nifti\|NIfTI\|convert" adit/batch_transfer/ --type py -i | grep -v test | head -30

Repository: openradx/adit

Length of output: 39


🏁 Script executed:

#!/bin/bash
# Look for how batch_transfer actually uses the conversion - check if it has its own implementation
find adit/batch_transfer -name "*.py" -exec grep -l "subprocess\|convert" {} \; | head -10

Repository: openradx/adit

Length of output: 421


🏁 Script executed:

#!/bin/bash
# Check the batch_transfer views to see how convert_to_nifti is handled
rg -n "convert_to_nifti\|BATCH_TRANSFER_CONVERT_TO_NIFTI" adit/batch_transfer/views.py -A10 -B5

Repository: openradx/adit

Length of output: 39


🏁 Script executed:

#!/bin/bash
# Check the migration that adds convert_to_nifti field
cat adit/batch_transfer/migrations/0030_batchtransferjob_convert_to_nifti.py

Repository: openradx/adit

Length of output: 473


🏁 Script executed:

#!/bin/bash
# Search for any Celery tasks or background processing that might do the actual conversion
rg -n "@.*task|@shared_task|def.*task" adit/batch_transfer/ --type py -A3 | head -50

Repository: openradx/adit

Length of output: 3693


🏁 Script executed:

#!/bin/bash
# Search for procrastinate tasks or job processing functions
rg -n "@app.task|register_task|def.*nifti" adit/ --type py -A3 | grep -v test | head -80

Repository: openradx/adit

Length of output: 1271


🏁 Script executed:

#!/bin/bash
# Look for any process_job or similar function that might trigger conversion
rg -n "def process|def run|convert_to_nifti.*True" adit/batch_transfer/ --type py -B2 -A10 | head -100

Repository: openradx/adit

Length of output: 39


🏁 Script executed:

#!/bin/bash
# Examine the _transfer_to_nifti method fully
sed -n '136,200p' adit/core/processors.py

Repository: openradx/adit

Length of output: 3152


🏁 Script executed:

#!/bin/bash
# Check where _transfer_to_nifti is called from
rg -n "_transfer_to_nifti" adit/ --type py -B5 -A2

Repository: openradx/adit

Length of output: 1200


🏁 Script executed:

#!/bin/bash
# Check how converted_file is used after being stored - is it validated/checked?
rg -n "converted_file" adit/mass_transfer/ --type py -A3 -B3

Repository: openradx/adit

Length of output: 2514


Verify file existence after dcm2niix conversion and handle potential multiple outputs.

The code assumes dcm2niix produces exactly one file named {series_name}.nii.gz without validating its existence. However, dcm2niix can produce multiple outputs from a single series with variant suffixes (e.g., ROI reconstructions), though only one file path is stored. Consider either:

  • Validating that the expected file exists after conversion
  • Scanning the output directory to handle cases where dcm2niix generates multiple outputs per series
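Both options can be combined by globbing the output directory after the subprocess returns; `collect_nifti_outputs` below is a hypothetical helper (not ADIT code) sketching that check:

```python
from pathlib import Path
import tempfile

def collect_nifti_outputs(output_path: Path, series_name: str) -> list[Path]:
    """Gather everything dcm2niix wrote for one series instead of assuming a
    single `{series_name}.nii.gz`; variant suffixes (e.g. `_ROI1`) are kept."""
    outputs = sorted(output_path.glob(f"{series_name}*.nii.gz"))
    if not outputs:
        raise FileNotFoundError(
            f"dcm2niix produced no NIfTI output for series {series_name!r}"
        )
    return outputs

# Demo with fake files standing in for a real dcm2niix run
with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp)
    (out / "SER1.nii.gz").touch()
    (out / "SER1_ROI1.nii.gz").touch()
    names = [f.name for f in collect_nifti_outputs(out, "SER1")]
    print(names)  # ['SER1.nii.gz', 'SER1_ROI1.nii.gz']
```

Storing all returned paths (or at least failing loudly when the list is empty) avoids recording a `converted_file` that was never written.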
🧰 Tools
🪛 Ruff (0.14.14)

[error] 438-438: subprocess call: check for execution of untrusted input

(S603)


[warning] 440-442: Avoid specifying long messages outside the exception class

(TRY003)

@claude

claude bot commented Feb 3, 2026

Pull Request Review: Mass Transfer Feature

This PR introduces a comprehensive mass transfer facility. Overall this is well-implemented and follows ADIT's patterns. Here are my findings:

Critical Issues

  1. Bug in cleanup_jobs_and_tasks.py:30 - Hardcoded SelectiveTransferJob instead of using job_model variable
  2. Missing composite index - MassTransferVolume needs index on (job, partition_key) for query performance
  3. Path security - Review path validation in _destination_base_dir() for potential traversal
  4. Missing permissions - Add permission_required to destructive operations (delete, cancel, retry)

Important Issues

  1. No model-level date constraint - Add CheckConstraint for end_date >= start_date
  2. Race condition potential - processors.py:268-303 update pattern could race, consider update_or_create()
  3. Memory growth - Large series lists held in memory, consider batching
  4. Missing docs - New environment variables not documented in CLAUDE.md

Security Review

✅ User access validated for nodes
✅ Pseudonymization enabled by default
✅ Good error handling and logging
⚠️ Missing permission checks on some views
⚠️ Path traversal concerns need review

Performance

✅ Dedicated worker prevents blocking
✅ Partitioning and recursive splitting
✅ Bulk create for tasks
⚠️ Potential N+1 queries in _study_has_institution
⚠️ Memory usage for large datasets

Test Coverage

✅ Good unit tests for core logic
⚠️ Missing tests for: recursive splitting boundaries, concurrent processing, institution filtering, conversion failures

Recommendation

Approve with changes requested. This is solid work that follows project standards. Critical issues are straightforward fixes that don't require architectural changes. The mass transfer feature will be a valuable addition to ADIT!

@claude

claude bot commented Feb 3, 2026

Pull Request Review: Mass Transfer Feature

I've completed a comprehensive review of PR #298, which introduces a new Mass Transfer facility for bulk DICOM export and conversion. This is a substantial feature addition (~2,300 lines) that follows the existing ADIT architecture patterns well.


✅ Strengths

Architecture & Design

  • Consistent patterns: Follows the established Job→Task model used by selective_transfer and batch_transfer
  • Proper isolation: Dedicated worker queue (mass_transfer) prevents long-running exports from blocking regular transfers
  • Good separation of concerns: Clear division between models, processors, forms, and views
  • Partition strategy: Daily/weekly partitioning with reusable filters is well-designed for segmenting large exports

Code Quality

  • Type hints: Proper use of type annotations throughout (e.g., from __future__ import annotations)
  • Django field conventions: Correctly follows project standards (text fields use blank=True, default="", non-string fields use both blank=True, null=True)
  • Error handling: Appropriate use of try/except blocks with cleanup in failure paths
  • Logging: Good use of logger for debugging and error tracking

Testing

  • Good coverage: Unit tests for partitioning, processor logic, and pseudonymization grouping
  • Mocking strategy: Proper use of pytest-mock to isolate units under test
  • Edge cases: Tests cover error scenarios (time window too small, pseudonymization opt-out)

🔴 Critical Issues

1. Command Injection Vulnerability (adit/mass_transfer/processors.py:1455-1466)

The _convert_volume method constructs a subprocess command using user-controlled data without proper sanitization:

cmd = [
    "dcm2niix",
    "-z", "y",
    "-o", str(output_path),
    "-f", series_name,  # ⚠️ series_name is derived from series_description
    str(volume.exported_folder),
]
result = subprocess.run(cmd, check=False, capture_output=True, text=True)

While series_name is passed through sanitize_filename() at adit/mass_transfer/processors.py:1400-1404, this may not be sufficient for shell command arguments. An attacker could craft a series description with special characters that could bypass sanitization.

Recommendation:

  • Use subprocess.run() with a list (already done - good!) rather than shell=True (already avoided - good!)
  • Add additional validation to ensure series_name contains only safe characters
  • Consider using a fixed naming pattern instead of user-controlled series descriptions

2. Race Condition in Pseudonym Assignment (adit/mass_transfer/processors.py:1176-1184)

The pseudonymization logic has a potential race condition:

for _, study_volumes in volumes_by_study.items():
    pseudonym = ""
    if job.pseudonymize:
        existing_pseudonym = next(
            (v.pseudonym for v in study_volumes if v.pseudonym),
            None,
        )
        pseudonym = existing_pseudonym or uuid.uuid4().hex  # ⚠️ Not atomic

If multiple workers process different partitions of the same job concurrently, two volumes from the same study in different partitions could receive different pseudonyms.

Recommendation:

  • Use database-level uniqueness constraints or select_for_update() to ensure atomic pseudonym assignment
  • Consider pre-generating pseudonyms during job creation or using a deterministic hashing approach
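The deterministic-hashing option can be sketched in a few lines; the secret key and helper name below are hypothetical, and a real deployment would source the key from settings:

```python
import hashlib
import hmac

SECRET_KEY = b"replace-with-a-deployment-secret"  # hypothetical, from settings in practice

def study_pseudonym(job_id: int, study_instance_uid: str) -> str:
    """Deterministic per-study pseudonym: any worker that sees the same
    (job, study) pair derives the same value, so no row lock is needed.
    Keyed with a secret so the mapping cannot be reproduced from UIDs alone."""
    msg = f"{job_id}:{study_instance_uid}".encode()
    return hmac.new(SECRET_KEY, msg, hashlib.sha256).hexdigest()[:32]

a = study_pseudonym(1, "1.2.840.113619.2.55.3")
b = study_pseudonym(1, "1.2.840.113619.2.55.3")
c = study_pseudonym(1, "1.2.840.113619.2.55.4")
print(a == b, a == c)  # True False
```

Because the value is a pure function of (job, study), concurrent partitions agree without coordination.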

3. Incomplete Cleanup on Failure (adit/core/tasks.py:111-147)

The _cleanup_mass_transfer_exports function only cleans up volumes that haven't been converted:

for volume in volumes:
    if volume.status == MassTransferVolume.Status.CONVERTED:
        continue  # ⚠️ Skips cleanup for converted volumes

This could leave DICOM files in the export directory even after conversion completes successfully.

Recommendation:

  • The current implementation at processors.py:1476 already calls _cleanup_export() after conversion
  • Consider whether the task-level cleanup should be more aggressive or document why converted volumes are skipped

⚠️ Significant Issues

4. Missing Validation on Filter Count (adit/mass_transfer/processors.py:1155-1161)

The processor fails early if no filters exist, but doesn't validate this at form submission:

filters = list(job.filters.all())
if not filters:
    return {
        "status": MassTransferTask.Status.FAILURE,
        "message": "No filters configured for this job.",
    }

Recommendation: Add validation to MassTransferJobForm.clean() to prevent creating jobs with no filters.

5. Hardcoded NIfTI Conversion (adit/mass_transfer/forms.py:539-540)

def save(self, commit: bool = True):
    job = super().save(commit=False)
    job.convert_to_nifti = True  # ⚠️ Always true, but not in model fields

The convert_to_nifti field is set but doesn't appear in the MassTransferJob model definition.

Recommendation:

  • Add convert_to_nifti to the model if it's needed for compatibility
  • Or remove this line if it's not used by mass transfer

6. Recursive Query Split Could Cause Stack Overflow (adit/mass_transfer/processors.py:1357-1366)

The _find_studies method uses recursion to split time windows:

if len(studies) > max_results:
    if end - start < _MIN_SPLIT_WINDOW:
        raise DicomError(...)
    mid = start + (end - start) / 2
    return self._find_studies(operator, mf, start, mid) + self._find_studies(
        operator, mf, mid, end
    )

For very large date ranges with many results, this could cause deep recursion.

Recommendation:

  • Add a recursion depth counter with a maximum limit
  • Log warnings when splitting occurs to help diagnose performance issues
  • Consider iterative approach instead of recursion
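An iterative variant with an explicit work stack is a straightforward rewrite; `find_studies_iterative`, `fake_query`, and `MIN_SPLIT_WINDOW` below are hypothetical stand-ins for the real method, operator, and `_MIN_SPLIT_WINDOW`:

```python
from datetime import datetime, timedelta

MIN_SPLIT_WINDOW = timedelta(minutes=5)  # stand-in for _MIN_SPLIT_WINDOW

def find_studies_iterative(query_fn, start, end, max_results):
    """Iterative version of the recursive time-window split: an explicit
    work stack bounds memory use and cannot overflow the call stack."""
    results = []
    stack = [(start, end)]
    while stack:
        lo, hi = stack.pop()
        studies = query_fn(lo, hi, max_results + 1)
        if len(studies) <= max_results:
            results.extend(studies)
            continue
        if hi - lo < MIN_SPLIT_WINDOW:
            raise RuntimeError(f"Too many studies between {lo} and {hi}")
        mid = lo + (hi - lo) / 2
        stack.append((mid, hi))  # right half, processed later
        stack.append((lo, mid))  # left half, processed first (LIFO)
    return results

# Demo against a fake PACS with one study per hour; half-open windows
# [lo, hi) ensure no boundary study is counted twice.
base = datetime(2024, 1, 1)
studies_db = [base + timedelta(hours=h) for h in range(24)]

def fake_query(lo, hi, limit):
    return [s for s in studies_db if lo <= s < hi][:limit]

found = find_studies_iterative(fake_query, base, base + timedelta(hours=24), 4)
print(len(found))  # 24
```

Pushing the left half last keeps results in chronological order while the stack depth stays proportional to the number of splits, not the number of studies.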

7. Inconsistent Error Status for Volumes (adit/mass_transfer/processors.py:1193-1201)

When a volume fails, the task continues processing other volumes:

except Exception as err:
    logger.exception("Mass transfer failed for volume %s", ...)
    self._cleanup_export(volume)
    volume.status = MassTransferVolume.Status.ERROR
    volume.save()
    failed_count += 1

This is good for resilience, but there's no mechanism to retry failed volumes individually.

Recommendation:

  • Document that failed volumes require manual intervention or job re-run
  • Consider adding a "retry failed volumes" feature in a future PR

💡 Minor Issues & Suggestions

8. Missing Index on Foreign Keys (adit/mass_transfer/models.py:982-1014)

The MassTransferVolume model has foreign keys without explicit database indexes:

job = models.ForeignKey(MassTransferJob, on_delete=models.CASCADE, related_name="volumes")

Recommendation: Django auto-creates indexes on ForeignKey fields, but consider adding a compound index on (job_id, partition_key) for common queries.

9. Magic Number for Series Number Parsing (adit/mass_transfer/processors.py:1267)

series_number = _parse_int(series.get("SeriesNumber"), default=None)

The _parse_int function silently returns None on parse failures, which might hide data quality issues.

Recommendation: Log a warning when series number parsing fails to help diagnose PACS data issues.
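A minimal sketch of that recommendation, assuming `_parse_int` behaves like a plain `int()` wrapper (the helper and logger name below are illustrative, not ADIT code):

```python
import logging

logger = logging.getLogger("adit.mass_transfer")

def parse_int(value, default=None):
    """Sketch of a noisier _parse_int: parse failures on non-empty values
    are logged, so PACS data-quality issues surface in the worker logs."""
    try:
        return int(value)
    except (TypeError, ValueError):
        if value not in (None, ""):
            logger.warning("Unparsable SeriesNumber %r, falling back to %r", value, default)
        return default

print(parse_int("7"), parse_int("N/A"), parse_int(None))  # 7 None None
```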

10. Unclear Terminology: "Mass Transfer" vs "Volume Export"

The PR description mentions bringing "volume export workflow" into ADIT as "Mass Transfer," but there's potential confusion between:

  • Transfer (moving DICOM between servers)
  • Export (converting DICOM to NIfTI)

Recommendation: Consider renaming to "Bulk Export" or "Volume Export" to clarify that this is primarily an export-with-conversion feature, not just a transfer feature.

11. Missing Documentation for dcm2niix Dependency

The code calls dcm2niix via subprocess but doesn't document this external dependency in requirements or CLAUDE.md.

Recommendation:

  • Update CLAUDE.md with dcm2niix installation requirements
  • Add a check during startup to verify dcm2niix is available
  • Document supported dcm2niix versions

12. Potential Memory Issue with Large Result Sets (adit/mass_transfer/processors.py:1355)

studies = list(operator.find_studies(query, limit_results=max_results + 1))

This loads all results into memory before checking the count.

Recommendation: If the operator supports it, use a count query first or stream results.


🔒 Security Review

Authentication & Authorization ✅

  • Forms properly validate user access to source/destination nodes (forms.py:492-506)
  • User context is passed through the form correctly

Input Validation ⚠️

  • Date validation is present (forms.py:508-513)
  • Series name sanitization exists but may need strengthening (see Issue #1)
  • DICOM query patterns are converted to regex - ensure convert_to_python_regex() is safe

Data Exposure ✅

  • Pseudonymization is properly applied at export time
  • Original patient IDs are preserved in volume records (for audit trail) but pseudonyms are used in file paths

🧪 Test Coverage Assessment

Good coverage for:

  • Partition logic
  • Pseudonym grouping by study
  • Pseudonymization opt-out
  • Time window splitting error cases

Missing tests for:

  • Volume discovery and filtering logic
  • Actual DICOM export and conversion (integration test needed)
  • Cleanup behavior on various failure modes
  • Filter validation at form level
  • Race conditions in pseudonym assignment

Recommendation: Add integration tests that use the test Orthanc servers to verify end-to-end export and conversion.


📊 Performance Considerations

Positive

  • Dedicated worker queue prevents blocking other operations
  • Partitioning strategy limits query result set sizes
  • Per-volume status tracking enables parallelization

Concerns

  1. Temporary Storage: Exports are written to MASS_TRANSFER_EXPORT_BASE_DIR before conversion. For large jobs, this could consume significant disk space. Consider:

    • Documenting disk space requirements
    • Adding monitoring/alerts for export directory size
    • Implementing export directory cleanup on a schedule
  2. Database Load: Creating many MassTransferVolume records could be slow. The current implementation uses get_or_create in a loop (processors.py:1297-1330). Consider:

    • Batch creation with bulk_create where possible
    • Adding pagination to volume display views

🎯 Recommendations Summary

Must Fix Before Merge

  1. Issue #1: Review and strengthen shell command sanitization in _convert_volume
  2. Issue #2: Fix race condition in pseudonym assignment
  3. Issue #5: Clarify convert_to_nifti field usage

Should Fix Before Merge

  1. Issue #4: Add form validation for minimum filter count
  2. Issue #6: Add recursion depth limit to _find_studies
  3. Issue #11: Document dcm2niix dependency

Can Address in Follow-up

  1. Issues #3 and #7: Enhance cleanup and retry mechanisms
  2. Issues #8-#12: Performance and observability improvements
  3. Add integration tests for end-to-end workflow
  4. Consider renaming for clarity

📝 Final Assessment

This is a well-structured feature that follows ADIT's architectural patterns and coding standards. The implementation demonstrates good understanding of the codebase and Django best practices. However, there are critical security and concurrency issues that must be addressed before merging.

The feature adds valuable functionality for bulk DICOM exports with NIfTI conversion, and the partition-based approach is elegant. With the issues above addressed, this will be a solid addition to ADIT.

Recommendation: Request changes to address critical issues #1, #2, and #5, then approve after verification.


Great work overall! Let me know if you'd like me to elaborate on any of these points or help with fixes.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🤖 Fix all issues with AI agents
In `@adit/mass_transfer/forms.py`:
- Around line 96-125: The Meta inner class in adit/mass_transfer/forms.py
declares mutable class attributes labels, help_texts, and widgets; annotate each
with typing.ClassVar to indicate they are class-level constants and satisfy Ruff
RUF012. Import ClassVar from typing if not present, and add ClassVar[...] type
annotations on Meta.labels, Meta.help_texts and Meta.widgets (using appropriate
dict types such as ClassVar[dict[str, str]] or ClassVar[dict[str, Any]] for
widgets) while leaving values unchanged.
- Around line 19-78: Annotate the mutable class attributes to satisfy RUF012 by
marking MODALITY_CHOICES as a ClassVar[List[Tuple[str, str]]] and the
Meta.labels mapping as a ClassVar[Dict[str, str]] (or the appropriate mapping
type) and add the necessary imports from typing (ClassVar, List, Tuple, Dict) at
the top; specifically update the declaration of MODALITY_CHOICES and the inner
Meta.labels in the form class (referencing MODALITY_CHOICES and Meta.labels) so
they are typed as ClassVar to avoid shared-mutable-class-attribute lint errors.
🧹 Nitpick comments (5)
adit/core/tasks.py (2)

74-77: Consider narrowing the exception type for import failure.

The bare Exception catch is flagged by Ruff (BLE001). For import failures, ImportError (which includes ModuleNotFoundError) would be more appropriate and explicit about the expected failure mode.

🔧 Suggested fix
     try:
         from adit.mass_transfer.models import MassTransferVolume
-    except Exception:
+    except ImportError:
         return

94-97: Cleanup failure leaves volume in potentially confusing state.

When cleanup fails (line 94-97), the volume retains its original status (e.g., EXPORTED) with exported_folder still populated, but the folder may be in an inconsistent state. Consider setting status to ERROR regardless, so downstream logic doesn't attempt to process it.

🔧 Suggested fix
             except Exception as err:
                 volume.add_log(f"Cleanup failed: {err}")
+                volume.status = MassTransferVolume.Status.ERROR
                 volume.save()
                 continue
adit/mass_transfer/tests/test_cleanup.py (2)

24-56: Consider using factory-boy for mass transfer model creation.

The test mixes factory-boy (UserFactory, DicomServerFactory, DicomFolderFactory) with direct objects.create() calls for mass transfer models. For consistency and maintainability, consider creating MassTransferJobFactory, MassTransferTaskFactory, and MassTransferVolumeFactory. As per coding guidelines, "Use factory-boy for generating test data in test files."


17-18: Consider adding edge case tests.

This test covers the primary success path. For more robust coverage, consider adding tests for:

  • Volume with CONVERTED status (should be skipped per line 85-86 in tasks.py)
  • Non-matching partition_key (cleanup should not affect it)
  • Non-existent export directory (should handle gracefully)
#!/bin/bash
# Check if there are other cleanup-related tests in the mass_transfer app
fd -t f 'test*.py' adit/mass_transfer/tests/ --exec grep -l 'cleanup' {} \;
adit/mass_transfer/forms.py (1)

178-198: Consider populating self.tasks or removing it.

self.tasks is initialized but never set, so callers will always see an empty list. If it’s intended for downstream use, assign it after bulk_create; otherwise, drop the attribute to avoid confusion.

♻️ Suggested fix
         MassTransferTask.objects.bulk_create(tasks)
+        self.tasks = tasks

Comment on lines +19 to +78
MODALITY_CHOICES = [
("", "Any modality"),
("CT", "CT"),
("MR", "MR"),
("XR", "XR"),
("US", "US"),
("NM", "NM"),
("PT", "PT"),
("MG", "MG"),
("CR", "CR"),
("DX", "DX"),
("RF", "RF"),
("XA", "XA"),
("OT", "OT"),
("SR", "SR"),
("PR", "PR"),
("ECG", "ECG"),
("SEG", "SEG"),
("RTSTRUCT", "RTSTRUCT"),
("RTPLAN", "RTPLAN"),
("RTDOSE", "RTDOSE"),
("RTIMAGE", "RTIMAGE"),
("SM", "SM"),
("IVUS", "IVUS"),
("OCT", "OCT"),
("ES", "ES"),
("OP", "OP"),
("IO", "IO"),
("FA", "FA"),
("RG", "RG"),
("MS", "MS"),
("DOC", "DOC"),
]

modality = forms.ChoiceField(
required=False,
choices=MODALITY_CHOICES,
help_text="Leave blank for any modality.",
)

class Meta:
model = MassTransferFilter
fields = (
"name",
"modality",
"institution_name",
"apply_institution_on_study",
"study_description",
"series_description",
"series_number",
)
labels = {
"name": "Filter name",
"modality": "Modality",
"institution_name": "Institution name",
"apply_institution_on_study": "Apply institution filter on study",
"study_description": "Study description",
"series_description": "Series description",
"series_number": "Series number",
}
Copy link
Contributor


⚠️ Potential issue | 🟡 Minor

Annotate mutable class attributes to satisfy RUF012.

Ruff flags MODALITY_CHOICES and Meta.labels as mutable class attributes. Add ClassVar annotations to avoid lint failures and accidental shared mutation.

🔧 Proposed fix
-from typing import cast
+from typing import ClassVar, cast
@@
-class MassTransferFilterForm(forms.ModelForm):
-    MODALITY_CHOICES = [
+class MassTransferFilterForm(forms.ModelForm):
+    MODALITY_CHOICES: ClassVar[list[tuple[str, str]]] = [
@@
-        labels = {
+        labels: ClassVar[dict[str, str]] = {
             "name": "Filter name",
             "modality": "Modality",
             "institution_name": "Institution name",
             "apply_institution_on_study": "Apply institution filter on study",
             "study_description": "Study description",
             "series_description": "Series description",
             "series_number": "Series number",
         }
🧰 Tools
🪛 Ruff (0.14.14)

[warning] 19-51: Mutable class attributes should be annotated with typing.ClassVar

(RUF012)


[warning] 70-78: Mutable class attributes should be annotated with typing.ClassVar

(RUF012)

🤖 Prompt for AI Agents
In `@adit/mass_transfer/forms.py` around lines 19 - 78, Annotate the mutable class
attributes to satisfy RUF012 by marking MODALITY_CHOICES as a
ClassVar[List[Tuple[str, str]]] and the Meta.labels mapping as a
ClassVar[Dict[str, str]] (or the appropriate mapping type) and add the necessary
imports from typing (ClassVar, List, Tuple, Dict) at the top; specifically
update the declaration of MODALITY_CHOICES and the inner Meta.labels in the form
class (referencing MODALITY_CHOICES and Meta.labels) so they are typed as
ClassVar to avoid shared-mutable-class-attribute lint errors.

Comment on lines 96 to 125
    class Meta:
        model = MassTransferJob
        fields = (
            "source",
            "destination",
            "start_date",
            "end_date",
            "partition_granularity",
            "filters",
            "pseudonymize",
            "send_finished_mail",
        )
        labels = {
            "start_date": "Start date",
            "end_date": "End date",
            "partition_granularity": "Partition granularity",
            "pseudonymize": "Pseudonymize data",
            "send_finished_mail": "Send Email when job is finished",
        }
        help_texts = {
            "partition_granularity": "Daily or weekly partition windows.",
            "pseudonymize": (
                "When disabled, patient identifiers are preserved and output folders use "
                "Patient ID."
            ),
        }
        widgets = {
            "start_date": forms.DateInput(attrs={"type": "date"}),
            "end_date": forms.DateInput(attrs={"type": "date"}),
        }

⚠️ Potential issue | 🟡 Minor

Add ClassVar annotations for Meta mappings (RUF012).

labels, help_texts, and widgets are mutable class attributes. Annotate with ClassVar to satisfy Ruff and clarify intent.

🔧 Proposed fix
     class Meta:
@@
-        labels = {
+        labels: ClassVar[dict[str, str]] = {
             "start_date": "Start date",
             "end_date": "End date",
             "partition_granularity": "Partition granularity",
             "pseudonymize": "Pseudonymize data",
             "send_finished_mail": "Send Email when job is finished",
         }
-        help_texts = {
+        help_texts: ClassVar[dict[str, str]] = {
             "partition_granularity": "Daily or weekly partition windows.",
             "pseudonymize": (
                 "When disabled, patient identifiers are preserved and output folders use "
                 "Patient ID."
             ),
         }
-        widgets = {
+        widgets: ClassVar[dict[str, forms.DateInput]] = {
             "start_date": forms.DateInput(attrs={"type": "date"}),
             "end_date": forms.DateInput(attrs={"type": "date"}),
         }
🧰 Tools
🪛 Ruff (0.14.14)

[warning] 108-114: Mutable class attributes should be annotated with typing.ClassVar

(RUF012)


[warning] 115-121: Mutable class attributes should be annotated with typing.ClassVar

(RUF012)


[warning] 122-125: Mutable class attributes should be annotated with typing.ClassVar

(RUF012)

🤖 Prompt for AI Agents
In `@adit/mass_transfer/forms.py` around lines 96 - 125, The Meta inner class in
adit/mass_transfer/forms.py declares mutable class attributes labels,
help_texts, and widgets; annotate each with typing.ClassVar to indicate they are
class-level constants and satisfy Ruff RUF012. Import ClassVar from typing if
not present, and add ClassVar[...] type annotations on Meta.labels,
Meta.help_texts and Meta.widgets (using appropriate dict types such as
ClassVar[dict[str, str]] or ClassVar[dict[str, Any]] for widgets) while leaving
values unchanged.

@claude

claude bot commented Feb 3, 2026

Pull Request Review: Mass Transfer Feature

Overview

This PR introduces a substantial new feature bringing volume export workflow into ADIT as a first-class transfer type. The implementation includes a dedicated worker queue, partitioned exports, and NIfTI conversion capabilities. Overall, this is well-structured code that follows the project patterns, but there are some important issues to address.

Critical Issues

1. Security: Unsafe subprocess execution (processors.py:427-442)

The dcm2niix command uses user-influenced paths without proper validation. While sanitize_filename() is used for series_name, there is no explicit validation that paths do not escape their intended directories. An attacker could potentially craft malicious DICOM metadata to create paths outside the intended directory structure.

Recommendation: Add explicit path validation to ensure all paths remain within expected base directories using path.resolve().is_relative_to()
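A minimal sketch of such a containment check, assuming Python 3.9+ for `is_relative_to` (the helper name `ensure_within` is illustrative, not from the PR):

```python
from pathlib import Path


def ensure_within(base_dir: Path, candidate: Path) -> Path:
    """Resolve candidate and verify it does not escape base_dir."""
    resolved = candidate.resolve()
    if not resolved.is_relative_to(base_dir.resolve()):
        raise ValueError(f"Path escapes export directory: {candidate}")
    return resolved
```

Calling this on every path built from DICOM metadata (after `sanitize_filename`) would reject values like `../../etc` even if sanitization missed them.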

2. Resource Management: Missing cleanup in exception paths (processors.py:161-173)

When _export_volume succeeds but _convert_volume fails, the exported DICOM files remain on disk indefinitely. If _convert_volume fails, volume.exported_folder might not be set, making _cleanup_export ineffective. This could lead to disk space exhaustion on long-running mass transfers.

Recommendation: Ensure exported_folder is saved before conversion, or use try/finally blocks for cleanup.

3. Data Integrity: Race condition in volume creation (processors.py:269-302)

The get_or_create pattern updates an existing volume even if it belongs to a different partition. If a series appears in multiple partitions, later tasks will overwrite the partition_key and other metadata of existing volumes.

Recommendation: Either make partition_key part of the uniqueness constraint, OR skip updating if the volume already exists with status CONVERTED

Major Issues

4. Performance: Recursive study splitting without depth limit (processors.py:329-338)

The recursive time window splitting has no maximum recursion depth. For very dense time periods, this could result in deep recursion and stack overflow.

Recommendation: Add a depth counter parameter or use iterative splitting with a queue.
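A queue-based alternative could look like this, where `find(lo, hi)` is a hypothetical stand-in for the PACS C-FIND query and the half-open windows avoid boundary duplicates:

```python
from collections import deque
from datetime import datetime, timedelta

MIN_WINDOW = timedelta(minutes=30)


def find_studies_iterative(find, start: datetime, end: datetime, max_results: int) -> list:
    """Split [start, end) with an explicit work queue instead of recursion."""
    studies: list = []
    windows = deque([(start, end)])
    while windows:
        lo, hi = windows.popleft()
        results = find(lo, hi)
        if len(results) < max_results or hi - lo <= MIN_WINDOW:
            studies.extend(results)
            continue
        mid = lo + (hi - lo) / 2
        windows.append((lo, mid))
        windows.append((mid, hi))
    return studies
```

The queue bounds memory by the number of pending windows rather than the call stack, so dense time periods cannot overflow recursion limits.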

5. Error Handling: Silent failures in institution filtering (processors.py:342-359)

The _study_has_institution method returns False for any exception without logging. If find_series fails, potentially matching studies are silently excluded.

Recommendation: Let exceptions propagate or add logging when series queries fail.

6. Configuration: Missing environment variable validation

No runtime validation that MASS_TRANSFER_EXPORT_BASE_DIR exists and is writable. Workers will crash on first task if the directory is inaccessible.

Recommendation: Add startup validation in apps.py or Django system checks.
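Such a check could be wired into `AppConfig.ready()` or a Django system check; a plain sketch of the validation itself (the helper name is hypothetical):

```python
import os
from pathlib import Path


def check_export_dir(path_value: str) -> list[str]:
    """Return a list of problems with the configured export base dir."""
    errors: list[str] = []
    if not path_value:
        errors.append("MASS_TRANSFER_EXPORT_BASE_DIR is not set")
        return errors
    path = Path(path_value)
    if not path.is_dir():
        errors.append(f"{path} does not exist or is not a directory")
    elif not os.access(path, os.W_OK):
        errors.append(f"{path} is not writable")
    return errors
```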

Minor Issues

  7. Code Quality: Inconsistent error messages - some are user-facing, others technical
  8. Type Safety: Missing return type hints on helper functions
  9. Testing: Limited edge case coverage (recursion depth, concurrent access, missing dcm2niix)
  10. Documentation: Missing docstrings on key classes and complex methods

Positive Aspects

  • Excellent architecture following existing Job/Task patterns
  • Proper isolation with dedicated mass_transfer queue
  • Well-designed task failure cleanup in core/tasks.py
  • Thoughtful UX with filter reusability and form prefill
  • Good test foundation covering critical paths
  • Correct database migrations
  • Sensible priority values lower than interactive transfers

Recommendations

High Priority: Fix issues 1-3, add recursion limit (4), and environment validation (6)
Medium Priority: Improve error handling (5), add edge case tests (9)
Low Priority: Add type hints (8), standardize messages (7), add docstrings (10)

Final Assessment

This is a strong implementation that demonstrates good understanding of the codebase architecture. However, the security issue with subprocess paths and resource cleanup concerns should be addressed before merging. Once the critical issues are resolved, this will be a valuable addition to ADIT.

Recommendation: Request changes for critical issues, approve with comments for the rest.

@claude

claude bot commented Feb 6, 2026

PR Review: Mass Transfer Feature

This PR introduces a comprehensive mass transfer facility for bulk DICOM export and conversion. I've reviewed the implementation for code quality, security, performance, and test coverage. Overall, this is a well-structured feature with good separation of concerns. Here are my findings:

Architecture & Design ✅

Strengths:

  • Clean separation of concerns with dedicated Django app following existing patterns
  • Proper job/task processing model consistent with selective_transfer and batch_transfer
  • Well-designed partitioning system for segmenting large exports by time windows
  • Per-study pseudonymization correctly groups series by study UID (adit/mass_transfer/processors.py:144-156)
  • Dedicated mass_transfer worker queue prevents long-running tasks from blocking regular transfers

Observations:

  • The recursive split approach in _find_studies (processors.py:330-338) is elegant for handling large result sets
  • MassTransferVolume tracking model provides good granularity for monitoring individual series

Code Quality 👍

Positive aspects:

  • Consistent with Google Python Style Guide
  • Good use of type hints and dataclasses (e.g., PartitionWindow)
  • Proper Django field conventions followed (blank=True for text fields, blank+null for non-text)
  • Clean form validation with user-friendly error messages (forms.py:189-211)

Minor suggestions:

  1. processors.py:438 - subprocess.run without timeout could hang indefinitely on dcm2niix conversion. Consider adding a timeout parameter based on expected conversion times.

  2. processors.py:165-173 - Broad exception handler catches all errors during volume processing. This is already done well, but you might want to differentiate between dcm2niix failures vs DICOM fetch failures.
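For suggestion 1 above, `subprocess.run` accepts a `timeout` that raises `TimeoutExpired` and kills the child; a sketch (the 300-second default is illustrative, not a value from the PR):

```python
import subprocess

DCM2NIIX_TIMEOUT = 300  # seconds; illustrative, tune to expected series sizes


def run_conversion(cmd: list[str], timeout: float = DCM2NIIX_TIMEOUT):
    """Run a conversion command, failing loudly instead of hanging the worker."""
    try:
        return subprocess.run(
            cmd, capture_output=True, text=True, timeout=timeout, check=True
        )
    except subprocess.TimeoutExpired as err:
        raise RuntimeError(f"Conversion timed out after {timeout}s: {cmd}") from err
```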

Security 🔒

Good practices:

  • Proper user permission checks in views (forms.py:191-192, 199-200)
  • MassTransferLockedMixin applied to all views
  • sanitize_filename used consistently for file paths (processors.py:378, 389, 423)
  • Form validation prevents end_date < start_date (forms.py:209-210)
  • DicomNodeChoiceField restricts to user-accessible nodes

No significant security concerns found.

Performance Considerations ⚡

Well-designed aspects:

  • Recursive query splitting prevents overwhelming PACS servers (processors.py:330-338)
  • Bulk create for tasks reduces DB round-trips (forms.py:232)
  • get_or_create for volumes prevents duplicates efficiently (processors.py:269-287)
  • Mandatory cleanup of DICOM exports after NIfTI conversion saves disk space (processors.py:448)

Potential concerns:

  1. processors.py:232 - Series-level queries run in a loop per study. For studies with many series, this could be slow. Consider batching if performance issues arise with large studies.

  2. processors.py:84-102 - Cleanup function iterates all volumes in partition. For large partitions, this could be slow. Consider adding an index on (job_id, partition_key) if not already present.

  3. _MIN_SPLIT_WINDOW = timedelta(minutes=30) - This hardcoded minimum might be too large for very high-volume PACS. Consider making it configurable.

Error Handling & Robustness 🛡️

Excellent retry strategy:

  • Unified DICOM_TASK_RETRY_STRATEGY applied to both queues (core/tasks.py:58-63, 232-238)
  • Proper cleanup on task failure via _cleanup_mass_transfer_exports (core/tasks.py:66-103, 156, 178, 194)
  • Volume-level error tracking preserves partial success (processors.py:182-195)
  • Status differentiation: WARNING for partial failures, FAILURE for complete failures

Good exception handling:

  • FileNotFoundError caught during cleanup (processors.py:457, core/tasks.py:92)
  • Validation for node types (processors.py:122-125)
  • Empty filter check prevents pointless execution (processors.py:128-133)

Suggestion:

  • Consider adding retry logic specifically for dcm2niix failures, as conversion failures might be transient (disk I/O issues, temporary resource exhaustion).

Test Coverage ✅

Good coverage:

  • Unit tests for partitioning logic (test_partitions.py)
  • Processor tests for pseudonymization grouping (test_processor.py:61-147)
  • Pseudonymization opt-out test (test_processor.py:151-209)
  • Cleanup tests (test_cleanup.py)
  • Time window edge case testing (test_processor.py:24-57)

Missing tests:

  • No integration tests for actual dcm2niix conversion (understandable if complex to set up)
  • No tests for filter matching logic (_dicom_match, series filtering)
  • No tests for recursive study splitting behavior
  • No acceptance tests (unlike selective_transfer which has playwright tests)

Suggestion: Add tests for the filter matching logic, especially for institution_name filtering at study vs series level (processors.py:221-223, 242-248).

Configuration & Deployment 🚀

Well-structured:

  • Proper Docker service configuration for mass_transfer_worker
  • CI/CD updated to cache new worker image (.github/workflows/ci.yml)
  • Environment variables properly documented (example.env would need to add MASS_TRANSFER_* vars if not present)
  • AGENTS.md updated with troubleshooting section

Note: Verify that dcm2niix is included in the Docker image. The code assumes it's on PATH but I don't see explicit installation in the Dockerfile.

Django Conventions 📋

Excellent adherence:

  • Migrations properly generated
  • Admin registered (admin.py:22)
  • Template structure follows project conventions
  • URL patterns properly namespaced (urls.py)
  • AppConfig with ready() method for initialization (apps.py:49)

Documentation 📚

Good:

  • AGENTS.md updated with overview and troubleshooting
  • Inline comments where needed
  • Clear help_text on form fields (forms.py:116-121)
  • PR description clearly explains the feature

Could be improved:

  • Add docstrings to public methods in processors.py (e.g., _find_volumes, _export_volume, _convert_volume)
  • Document the recursive splitting algorithm behavior
  • Add comments explaining the two-phase retry strategy mentioned in core/tasks.py:218-226

Specific Issues Found 🐛

Critical:

None

Medium:

  1. Subprocess timeout missing (processors.py:438) - Could hang worker
  2. Hardcoded settings - MASS_TRANSFER_EXPORT_BASE_DIR location not validated at startup

Low:

  1. Magic string " (cleaned)" used as marker (processors.py:452, 464) - Could use Status enum
  2. Timezone handling - Uses timezone.make_aware but assumes server timezone matches PACS timezone (processors.py:281, 297)

Recommendations 📝

  1. Add subprocess timeout to dcm2niix calls
  2. Add validation that MASS_TRANSFER_EXPORT_BASE_DIR is writable on app startup
  3. Consider adding more comprehensive integration tests
  4. Add docstrings to processor methods
  5. Consider making _MIN_SPLIT_WINDOW configurable
  6. Add database index for MassTransferVolume queries if not in migration

Summary

This is a well-architected and production-ready feature. The code follows project conventions, has good error handling, and includes reasonable test coverage. The identified issues are minor and mostly around edge cases and documentation. The integration of retry strategy across all DICOM tasks is particularly well done.

Recommendation: Approve with minor suggestions

Great work! 🎉


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🤖 Fix all issues with AI agents
In `@adit/mass_transfer/processors.py`:
- Around line 329-338: The recursive split in _find_studies uses mid derived
from start/end but passes the same mid to both recursive calls causing studies
whose StudyDate equals mid.date() (with full-day StudyTime ranges) to be
included in both halves; change the boundary handling so one side is exclusive —
e.g., when computing mid or when calling _find_studies, ensure the left call
uses range [start, mid) and the right uses [mid, end] (or vice versa), by
adjusting mid to an exclusive instant (subtract one second/microsecond or switch
the comparison used for StudyDate/StudyTime) so that studies on mid.date() are
only returned by one branch; keep _find_volumes dedup as-is.
- Around line 30-36: The _dicom_match implementation uses regex.search which
allows substring matches; change it to use regex.fullmatch so the converted
unanchored pattern (from convert_to_python_regex) must match the entire value.
Update the function _dicom_match to call regex.fullmatch(str(value)) instead of
regex.search(...); ensure callers that filter study_description,
series_description, and institution_name continue to use _dicom_match unchanged.
🧹 Nitpick comments (8)
adit/mass_transfer/processors.py (6)

80-97: _volume_export_path and _volume_output_path are identical implementations.

Both functions have the exact same body: base_dir / year_month / subject_id / series_name. If they're intended to diverge in the future, a comment explaining the intent would help. Otherwise, consolidate into a single helper.

Suggested consolidation
-def _volume_export_path(
-    base_dir: Path,
-    study_dt: datetime,
-    subject_id: str,
-    series_name: str,
-) -> Path:
-    year_month = study_dt.strftime("%Y%m")
-    return base_dir / year_month / subject_id / series_name
-
-
-def _volume_output_path(
-    base_dir: Path,
-    study_dt: datetime,
-    subject_id: str,
-    series_name: str,
-) -> Path:
-    year_month = study_dt.strftime("%Y%m")
-    return base_dir / year_month / subject_id / series_name
+def _volume_path(
+    base_dir: Path,
+    study_dt: datetime,
+    subject_id: str,
+    series_name: str,
+) -> Path:
+    year_month = study_dt.strftime("%Y%m")
+    return base_dir / year_month / subject_id / series_name

175-180: "Volumes processed" count is misleading — it includes already-converted volumes that were skipped.

len(volumes) counts all discovered volumes, but volumes with status == CONVERTED are skipped at line 158. The log suggests they were all processed in this run. Consider tracking the actual number attempted.

Proposed fix
+        skipped_count = 0
+
         for _, study_volumes in volumes_by_study.items():
             ...
             for volume in study_volumes:
                 if volume.status == MassTransferVolume.Status.CONVERTED:
+                    skipped_count += 1
                     continue
                 ...

         log_lines = [
             f"Partition {self.mass_task.partition_key}",
-            f"Volumes processed: {len(volumes)}",
+            f"Volumes found: {len(volumes)}",
+            f"Skipped (already converted): {skipped_count}",
             f"Converted: {converted_count}",
             f"Failed: {failed_count}",
         ]

110-110: Missing return type annotation on process().

The base class declares process(self) -> ProcessingResult. Adding the annotation here keeps pyright happy and makes the contract explicit.

-    def process(self):
+    def process(self) -> ProcessingResult:

(Also add ProcessingResult to the import from adit.core.processors.)


258-263: Unreachable except block — series_number is already parsed by _parse_int.

At line 239, series_number is converted via _parse_int(...) which returns int | None. Both mf.series_number (a PositiveIntegerField) and series_number are int | None, so the comparison at line 260 cannot raise TypeError or ValueError.

Simplified version
                     if mf.series_number is not None:
-                        try:
-                            if series_number is None or mf.series_number != series_number:
-                                continue
-                        except (TypeError, ValueError):
-                            continue
+                        if series_number is None or mf.series_number != series_number:
+                            continue

221-232: Redundant PACS queries: _study_has_institution fetches all series, then find_series fetches them again.

When apply_institution_on_study is set, line 222 queries all series to check institution names. Then line 232 queries the same series again for volume discovery. For studies with many series, this doubles the PACS load.

Consider caching the series list from the institution check and reusing it:

Sketch
+                prefetched_series: list | None = None
                 if mf.institution_name and mf.apply_institution_on_study:
-                    if not self._study_has_institution(operator, study, mf.institution_name):
+                    series_query = QueryDataset.create(
+                        PatientID=study.PatientID,
+                        StudyInstanceUID=study.StudyInstanceUID,
+                    )
+                    series_query.dataset.InstitutionName = ""
+                    prefetched_series = list(operator.find_series(series_query))
+                    if not any(
+                        _dicom_match(mf.institution_name, s.get("InstitutionName"))
+                        for s in prefetched_series
+                    ):
                         continue
 
-                series_query = QueryDataset.create(...)
-                series_list = list(operator.find_series(series_query))
+                if prefetched_series is None:
+                    series_query = QueryDataset.create(
+                        PatientID=study.PatientID,
+                        StudyInstanceUID=study.StudyInstanceUID,
+                    )
+                    series_query.dataset.InstitutionName = ""
+                    prefetched_series = list(operator.find_series(series_query))
+                series_list = prefetched_series

110-196: Consider adding structured logging for key state transitions in process().

The only logging is logger.exception on failure. Adding logger.info calls for start of processing, volume discovery counts, and per-volume export/convert steps would significantly improve operational observability for these long-running tasks. As per coding guidelines, structured logging should be used for agent actions and state transitions (**/*.py).

adit/mass_transfer/forms.py (1)

94-94: tasks attribute is declared but never populated.

tasks: list[MassTransferTask] on line 94 and self.tasks = [] on line 128 suggest the form tracks created tasks, but _save_tasks uses bulk_create without storing the result. Either remove the attribute or populate it:

Proposed fix
     def _save_tasks(self, job: MassTransferJob) -> None:
         ...
-        MassTransferTask.objects.bulk_create(tasks)
+        self.tasks = MassTransferTask.objects.bulk_create(tasks)

Also applies to: 128-128

AGENTS.md (1)

190-192: Consider expanding the troubleshooting section with version/diagnostic info.

The single bullet is helpful but could be more actionable. Operators encountering conversion failures would benefit from knowing how to verify the installation and check output.

Suggested expansion
 ### Mass Transfer Conversion Failures
 
 - Ensure `dcm2niix` is installed and available on `PATH` inside the worker image
+- Verify version: `dcm2niix --version` (expected: 1.0.20250506 or later)
+- Check worker logs: `docker compose logs mass_transfer_worker`

Comment on lines +30 to +36
def _dicom_match(pattern: str, value: str | None) -> bool:
    if not pattern:
        return True
    if value is None:
        return False
    regex = convert_to_python_regex(pattern)
    return bool(regex.search(str(value)))

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

rg -n "convert_to_python_regex" --type=py -B2 -A5

Repository: openradx/adit

Length of output: 4469


🏁 Script executed:

fd -t f "dicom_utils.py" --type=py

Repository: openradx/adit

Length of output: 226


🏁 Script executed:

sed -n '48,80p' adit/core/utils/dicom_utils.py

Repository: openradx/adit

Length of output: 1188


🏁 Script executed:

# Verify where _dicom_match is called in processors.py to understand impact
rg -n "_dicom_match" adit/mass_transfer/processors.py -B2 -A2

Repository: openradx/adit

Length of output: 1108


🏁 Script executed:

# Check if there are any tests that clarify the intended behavior
fd -name "*test*" -type f | xargs rg -l "_dicom_match\|convert_to_python_regex" 2>/dev/null | head -5

Repository: openradx/adit

Length of output: 287


🏁 Script executed:

# Search for test files containing convert_to_python_regex or _dicom_match
find . -name "*test*.py" -type f -exec rg -l "convert_to_python_regex\|_dicom_match" {} \;

Repository: openradx/adit

Length of output: 39


Use regex.fullmatch() instead of regex.search() in _dicom_match.

convert_to_python_regex produces an unanchored pattern (it only escapes, replaces \* with .*, and \? with . without adding ^ or $ anchors). This causes search() to match substrings unintentionally—e.g., pattern "Hospital" matches "General Hospital". DICOM wildcard matching per the standard requires matching the full value. This affects filtering of study_description (line 216), series_description (line 253), and institution_name (lines 244, 356).

Proposed fix
 def _dicom_match(pattern: str, value: str | None) -> bool:
     if not pattern:
         return True
     if value is None:
         return False
     regex = convert_to_python_regex(pattern)
-    return bool(regex.search(str(value)))
+    return bool(regex.fullmatch(str(value)))
🤖 Prompt for AI Agents
In `@adit/mass_transfer/processors.py` around lines 30 - 36, The _dicom_match
implementation uses regex.search which allows substring matches; change it to
use regex.fullmatch so the converted unanchored pattern (from
convert_to_python_regex) must match the entire value. Update the function
_dicom_match to call regex.fullmatch(str(value)) instead of regex.search(...);
ensure callers that filter study_description, series_description, and
institution_name continue to use _dicom_match unchanged.

@NumericalAdvantage

closes #289

@claude

claude bot commented Feb 8, 2026

Comprehensive Code Review: PR #298 - Mass Transfer App

I've completed a thorough review of this PR which adds a Mass Transfer app with partitioned DICOM export and conversion. Overall, this is a solid implementation with good structure and adherence to project conventions. Here's my detailed feedback:


🔴 Critical Issues (Must Fix)

1. Security: Filter Ownership Vulnerability

Location: adit/mass_transfer/models.py:22-24

owner = models.ForeignKey(
    settings.AUTH_USER_MODEL,
    on_delete=models.CASCADE,
    related_name="mass_transfer_filters",
    null=True,  # ⚠️ SECURITY ISSUE
    blank=True,
)

Problem: Allowing null=True for owner means filters can exist without an owner, creating a security gap where unowned filters could be accessible to all users or create authorization bypasses.

Recommendation: Remove null=True and ensure every filter has an owner. If system-wide filters are needed, add a separate field like is_shared or is_system_filter.

2. Race Condition in Volume Creation

Location: adit/mass_transfer/processors.py:268-302

volume, created = MassTransferVolume.objects.get_or_create(
    job=job,
    series_instance_uid=series_uid,
    defaults={...},
)
if not created:
    volume.partition_key = self.mass_task.partition_key
    # ... 13 fields updated
    volume.save()

Problem: The get_or_create followed by an update creates a race condition. Two concurrent tasks processing the same series could overwrite each other's changes. The update logic also duplicates all default values, violating DRY.

Recommendation: Use update_or_create instead.

3. Potential Infinite Recursion

Location: adit/mass_transfer/processors.py:336

Problem: No recursion depth limit. If a PACS returns more than max_results for a very small time window, this could recurse indefinitely.

Recommendation: Add depth tracking parameter with a maximum limit (e.g., 20).
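A sketch of the depth-guarded recursion (the `find(lo, hi)` callable and the guard value are illustrative; `max_depth` is parameterized here for testability):

```python
def split_window(find, lo, hi, max_results, depth=0, max_depth=20):
    """Recursively split [lo, hi) but stop descending past max_depth."""
    results = find(lo, hi)
    if len(results) < max_results or depth >= max_depth:
        # At the depth limit we accept a possibly-truncated result set
        # rather than recursing forever; callers should log this case.
        return results
    mid = lo + (hi - lo) / 2
    return split_window(find, lo, mid, max_results, depth + 1, max_depth) + split_window(
        find, mid, hi, max_results, depth + 1, max_depth
    )
```

Even a PACS that always returns `max_results` items now terminates after at most `2**max_depth` leaf queries.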

4. Missing dcm2niix Validation

Location: adit/mass_transfer/processors.py:438

Problem: The code assumes dcm2niix is installed but only fails at runtime. Better to validate at startup in apps.py.
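A startup probe could be as small as a `shutil.which` lookup called from `AppConfig.ready()` (the helper name is hypothetical):

```python
import shutil


def verify_converter(binary: str = "dcm2niix") -> str:
    """Return the resolved path of the converter, or fail fast at startup."""
    path = shutil.which(binary)
    if path is None:
        raise RuntimeError(
            f"{binary} not found on PATH; install it in the worker image "
            "(see the troubleshooting section in AGENTS.md)."
        )
    return path
```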


🟡 High Priority Issues (Should Fix)

5. Missing Database Indexes

Location: adit/mass_transfer/models.py:169-173

Problem: No explicit index on partition_key which is frequently queried.

Recommendation: Add indexes on ["job", "partition_key"] and ["status"].

6. Inconsistent Field Types

Location: adit/mass_transfer/models.py:152 vs :31

Problem: MassTransferVolume.series_number uses IntegerField while MassTransferFilter.series_number uses PositiveIntegerField. Series numbers are always positive per DICOM standard.

Recommendation: Use PositiveIntegerField consistently.

7. Insecure Directory Permissions

Location: adit/mass_transfer/processors.py:68-69, 76

Problem: Uses default permissions which could create world-readable directories containing PHI if umask is permissive.

Recommendation: Set explicit permissions with mode=0o750.
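A sketch of creating the export directory with restrictive permissions; note that `mkdir`'s `mode` is masked by the process umask, so an explicit `chmod` is needed to guarantee 0o750 (helper name illustrative):

```python
import os
from pathlib import Path


def make_export_dir(path: Path) -> Path:
    """Create an export directory readable only by owner and group."""
    path.mkdir(mode=0o750, parents=True, exist_ok=True)
    # mkdir's mode is ANDed with ~umask, so enforce the final mode explicitly:
    os.chmod(path, 0o750)
    return path
```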

8. Queue Strategy Confusion

Location: adit/core/tasks.py:232-238

Question: The PR description mentions a separate mass transfer worker, but the task uses the same queue="dicom" as regular tasks. Should this use queue="mass_transfer" instead?


🟢 Medium Priority Issues (Nice to Have)

9. Cleanup Logic Location

Location: adit/core/tasks.py:66-103

Issue: Mass transfer-specific cleanup logic is in the core app, breaking modularity.

Recommendation: Consider a callback/hook pattern to keep mass_transfer logic in its own app.

10. Error Handling

Location: adit/mass_transfer/processors.py:456

Suggestion: Log FileNotFoundError at DEBUG level instead of silently passing.

11. Logging

Suggestion: Add more INFO-level logging for normal operations to aid debugging.


🧪 Test Coverage Gaps

Missing Tests:

  1. Form validation tests - Date validation, filter ownership, node type validation
  2. View authorization tests - Users cannot access other users' filters/jobs
  3. Processor edge cases - dcm2niix not installed, directory not writable, zero images
  4. Concurrent execution tests - Race conditions
  5. Retry logic tests - For core/tasks.py changes

✅ Strengths

  1. Excellent adherence to Django conventions - proper field conventions, type hints, line length
  2. Good separation of concerns - clear boundaries between models, processors, forms, views
  3. Proper authorization - all filter views correctly restrict to user's own filters
  4. Good file handling security - uses sanitize_filename() to prevent path traversal
  5. Centralized retry strategy - great improvement to core task handling
  6. Good documentation - CLAUDE.md updated with troubleshooting and dependency notes
  7. Test quality - good use of mocking, deterministic UUIDs in tests

📊 Overall Assessment

Code Quality: 8/10 - Well-structured, follows Django conventions
Security: 7/10 - Good authorization, but some concerns
Performance: 7/10 - Generally good, some optimization opportunities
Test Coverage: 6/10 - Good processor tests, missing form/view/integration tests
Maintainability: 8/10 - Clear structure, good documentation

Verdict: This is a solid PR that adds significant functionality. With the critical issues addressed (especially the filter ownership security issue and race condition), this would be ready to merge. The code quality is high and follows project conventions well.


🎯 Recommendations Summary

Critical (Must Fix): Items 1-4
High Priority (Should Fix): Items 5-8
Medium Priority (Nice to Have): Items 9-11
Test Coverage: Add missing tests as time permits

Great work on this feature! The architecture is sound and the code is well-organized. Looking forward to seeing the critical fixes applied.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🤖 Fix all issues with AI agents
In `@adit/core/tasks.py`:
- Around line 66-102: The cleanup currently unconditionally wipes partial
exports in _cleanup_mass_transfer_exports, which prevents reuse on retries;
update _cleanup_mass_transfer_exports (the function handling DicomTask) to first
check dicom_task.status and return early when dicom_task.status ==
DicomTask.Status.PENDING so volumes that reached EXPORTED with an
exported_folder are not cleared; keep the existing partition_key and import
checks, and only perform the existing removal/ERROR-setting logic for
non-PENDING tasks.

In `@adit/mass_transfer/forms.py`:
- Line 96: self.tasks is typed on the class but never populated because
_save_tasks builds a local tasks list and bulk-creates MassTransferTask objects
without assigning them back to the instance; update the _save_tasks method (the
function named _save_tasks that bulk-creates MassTransferTask) to either remove
the unused self.tasks attribute or, preferably, set self.tasks to the list of
created task instances (or their dicts/IDs) after bulk_create so callers can
read form.tasks; ensure you reference the MassTransferTask creation logic and
assign the result to self.tasks immediately after bulk_create (or drop the
attribute and remove all reads of self.tasks).

Comment on lines 66 to 102
def _cleanup_mass_transfer_exports(dicom_task: DicomTask) -> None:
    if dicom_task._meta.app_label != "mass_transfer":
        return

    partition_key = getattr(dicom_task, "partition_key", None)
    if not partition_key:
        return

    try:
        from adit.mass_transfer.models import MassTransferVolume
    except Exception:
        return

    volumes = MassTransferVolume.objects.filter(
        job_id=dicom_task.job_id,
        partition_key=partition_key,
    ).exclude(exported_folder="")

    for volume in volumes:
        if volume.status == MassTransferVolume.Status.CONVERTED:
            continue

        export_folder = volume.exported_folder
        if export_folder:
            try:
                shutil.rmtree(Path(export_folder))
            except FileNotFoundError:
                pass
            except Exception as err:
                volume.add_log(f"Cleanup failed: {err}")
                volume.save()
                continue

        volume.exported_folder = ""
        volume.status = MassTransferVolume.Status.ERROR
        volume.add_log("Export cleaned up after task failure.")
        volume.save()

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# First, let's find the file and check the call site at line 178
fd tasks.py --type f | head -5

Repository: openradx/adit



🏁 Script executed:

# Check the exact line where _cleanup_mass_transfer_exports is called
rg -n "_cleanup_mass_transfer_exports" --type=py | head -20



🏁 Script executed:

# Now run the verification script from the comment
rg -n "_find_volumes|EXPORTED|CONVERTED|status.*PENDING" --type=py adit/mass_transfer/processors.py | head -40



🏁 Script executed:

# Check if adit/mass_transfer/processors.py exists and list mass_transfer structure
fd processors.py adit/mass_transfer --type f



🏁 Script executed:

# Read the context around the cleanup calls in adit/core/tasks.py
sed -n '140,210p' adit/core/tasks.py



🏁 Script executed:

# Also check the test file to understand the intended behavior
cat -n adit/mass_transfer/tests/test_cleanup.py



🏁 Script executed:

# Check if processors.py exists in mass_transfer
find adit/mass_transfer -name "*.py" -type f | head -20



🏁 Script executed:

# Look for how processors handle volumes and whether they reuse exports
rg -n "EXPORTED|CONVERTED" --type=py adit/mass_transfer/ | head -30



🏁 Script executed:

# Check the cleanup function again to understand the CONVERTED status check
sed -n '66,102p' adit/core/tasks.py



🏁 Script executed:

# Examine processors.py to understand how EXPORTED volumes are reused on retry
sed -n '360,415p' adit/mass_transfer/processors.py



🏁 Script executed:

# Also check the broader context to understand the flow
sed -n '150,170p' adit/mass_transfer/processors.py



🏁 Script executed:

# Check what happens with EXPORTED volumes - do they get reprocessed on retry?
sed -n '395,420p' adit/mass_transfer/processors.py



Cleanup discards partial exports on retriable failures, forcing redundant PACS re-exports on retry.

_cleanup_mass_transfer_exports is called at line 178 in the RetriableDicomError handler regardless of whether the task will be retried. This resets volumes not in CONVERTED status to ERROR and removes their export folders.

However, the processor's _export_volume method (line 368 in processors.py) explicitly reuses exports: if volume.status == MassTransferVolume.Status.EXPORTED and volume.exported_folder: return. On retry, volumes that reached EXPORTED status should be reusable without re-fetching from PACS. Deleting exported_folder forces the next retry to re-fetch and re-export the same series, causing redundant PACS round-trips.

Add a condition to skip cleanup when dicom_task.status == DicomTask.Status.PENDING to allow partial progress to be reused across retry attempts.
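A minimal sketch of that guard, where `Status` is a stand-in for `DicomTask.Status` and `should_cleanup` is a hypothetical helper, not existing ADIT code:

```python
from enum import Enum

# Stand-in for DicomTask.Status; only the retry-relevant member matters here.
class Status(Enum):
    PENDING = "pending"
    FAILURE = "failure"

def should_cleanup(task_status: Status) -> bool:
    # PENDING means the task will be retried, so keep partial exports
    # (volumes already in EXPORTED state) for reuse instead of wiping them.
    return task_status is not Status.PENDING
```

In `_cleanup_mass_transfer_exports` this would amount to an early `return` at the top of the function when the task is still pending.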

🧰 Tools
🪛 Ruff (0.14.14)

[warning] 76-76: Do not catch blind exception: Exception

(BLE001)


[warning] 94-94: Do not catch blind exception: Exception

(BLE001)


),
)

tasks: list[MassTransferTask]

⚠️ Potential issue | 🟡 Minor

self.tasks is declared but never populated.

self.tasks is initialized to [] at line 130 and typed at line 96, but _save_tasks creates a separate local tasks list (line 230) and bulk-creates from it without ever assigning back to self.tasks. If any caller reads form.tasks after save, it will always be empty.

Either remove the attribute if it's unused, or assign after bulk-create:

Proposed fix
-        MassTransferTask.objects.bulk_create(tasks)
+        self.tasks = MassTransferTask.objects.bulk_create(tasks)
#!/bin/bash
# Check if form.tasks is accessed anywhere outside the form class
rg -n "form\.tasks" --type=py adit/mass_transfer/
rg -n "\.tasks" --type=py adit/mass_transfer/views.py | grep -v "import\|model\|#"

Also applies to: 129-130, 223-242


@claude

claude bot commented Feb 8, 2026

Code Review for PR #298: Add Mass Transfer app

I have reviewed this substantial PR that introduces the Mass Transfer feature. Overall, the implementation is well-structured and follows existing patterns. Below are my findings:

Strengths

Architecture & Design

  • Follows the established job/task pattern used in Selective Transfer and Batch Transfer
  • Good separation of concerns: Models, processors, forms, and views are cleanly separated
  • Reusable filter system allows users to define and reuse query patterns
  • Daily/weekly partitioning prevents overwhelming the PACS with massive queries
  • Dedicated mass_transfer_worker isolates long-running tasks

Code Quality

  • Good use of type hints throughout
  • Recursive splitting when search results exceed limits (processors.py:329-338)
  • Includes unit tests for processors, partitions, and cleanup logic
  • Updated AGENTS.md with dcm2niix dependency and troubleshooting

Issues & Concerns

1. Security: Path Traversal Vulnerability (CRITICAL)

Location: processors.py:73-77

The code creates directories using user-controlled data without proper validation:

def _destination_base_dir(node: DicomNode) -> Path:
    assert node.node_type == DicomNode.NodeType.FOLDER
    path = Path(node.dicomfolder.path)  # User-controlled via database
    path.mkdir(parents=True, exist_ok=True)  # POTENTIAL RISK
    return path

Recommendation:

  • Validate that node.dicomfolder.path is within allowed directories
  • Add validation in DicomFolder model to reject paths with .. or absolute paths outside a safe root
  • Consider using Path.resolve() and checking it is a subdirectory of an allowed base path
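A sketch of such a containment check, assuming an allowed base directory is configured (`validate_destination` and `allowed_root` are illustrative names, not existing ADIT APIs):

```python
from pathlib import Path

def validate_destination(path_str: str, allowed_root: Path) -> Path:
    """Resolve a configured folder path and reject anything outside allowed_root."""
    resolved = Path(path_str).resolve()
    root = allowed_root.resolve()
    # Path.is_relative_to requires Python >= 3.9
    if not resolved.is_relative_to(root):
        raise ValueError(f"{resolved} escapes allowed root {root}")
    return resolved
```

Because `resolve()` normalizes `..` segments before the check, a path like `/srv/exports/../secrets` is rejected even though it textually starts with the allowed root.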

2. Command Injection Risk (HIGH)

Location: processors.py:427-438

While subprocess.run with a list is safer than shell=True, the series_name is derived from user-controlled DICOM data via sanitize_filename().

Recommendation:

  • Review sanitize_filename() to ensure it properly escapes shell metacharacters
  • Add additional validation to ensure series_name contains only safe characters
  • Consider using shlex.quote() as an additional safety layer
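One possible whitelist-based sanitizer, shown only as a sketch (`safe_series_name` is a hypothetical helper, not the existing `sanitize_filename`):

```python
import re

# Strict whitelist: letters, digits, dot, underscore, hyphen.
SAFE_CHARS = re.compile(r"[^A-Za-z0-9._-]")

def safe_series_name(name: str, fallback: str = "series") -> str:
    # Replace anything outside the whitelist, trim leading/trailing
    # dots and underscores, and fall back if nothing safe remains.
    cleaned = SAFE_CHARS.sub("_", name).strip("._")
    return cleaned or fallback
```

A whitelist like this makes `shlex.quote()` redundant in practice, but quoting can still be layered on top as defense in depth.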

3. Resource Exhaustion (MEDIUM)

Location: processors.py:329-338

The recursive splitting algorithm has no maximum recursion depth limit and could hit Python recursion limit on highly populated PACS servers.

Recommendation:

  • Add a maximum recursion depth parameter with a reasonable limit (e.g., 10 levels)
  • Consider converting to iterative approach using a queue
  • Add logging at each split to track depth
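The iterative variant could look roughly like this (`split_date_ranges` and `too_large` are illustrative names; the real splitter operates on PACS query results rather than a predicate):

```python
from collections import deque
from datetime import date, timedelta

def split_date_ranges(start: date, end: date, too_large, max_depth: int = 10):
    """Iteratively halve [start, end] while too_large(lo, hi) says the
    query would exceed the result limit, capped at max_depth levels."""
    results = []
    queue = deque([(start, end, 0)])
    while queue:
        lo, hi, depth = queue.popleft()
        if lo >= hi or depth >= max_depth or not too_large(lo, hi):
            results.append((lo, hi))
            continue
        mid = lo + timedelta(days=(hi - lo).days // 2)
        queue.append((lo, mid, depth + 1))
        queue.append((mid + timedelta(days=1), hi, depth + 1))
    return sorted(results)
```

The explicit queue removes any recursion-limit concern, and the `depth` field gives a natural hook for the per-split logging suggested above.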

4. Data Integrity: Missing Transaction (MEDIUM)

Location: processors.py:144-173

The main processing loop modifies multiple MassTransferVolume objects but does not use database transactions.

Recommendation:

  • Wrap the processing loop in a database transaction
  • Or clearly document that partial success is acceptable
  • Consider adding rollback logic for filesystem operations
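A minimal illustration of the rollback behavior, using `sqlite3` instead of the Django ORM (in ADIT the block would be `django.db.transaction.atomic()`):

```python
import sqlite3
from contextlib import contextmanager

@contextmanager
def atomic(conn):
    # Commit on success, roll back every statement in the block on failure,
    # mirroring what django.db.transaction.atomic would do around the loop.
    try:
        yield
        conn.commit()
    except Exception:
        conn.rollback()
        raise

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE volume (id INTEGER PRIMARY KEY, status TEXT)")
conn.execute("INSERT INTO volume (status) VALUES ('PENDING')")
conn.commit()

try:
    with atomic(conn):
        conn.execute("UPDATE volume SET status = 'CONVERTED'")
        raise RuntimeError("simulated mid-loop failure")
except RuntimeError:
    pass

final_status = conn.execute("SELECT status FROM volume").fetchone()[0]
```

Note that a database transaction cannot undo filesystem writes, which is why separate rollback logic for exported folders is suggested above.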

5. Cleanup Race Condition (LOW-MEDIUM)

Location: processors.py:450-465

The (cleaned) suffix approach is fragile and there is a potential race condition.

Recommendation:

  • Use a separate cleanup_attempted boolean field on the model
  • Log cleanup failures more prominently
  • Consider a separate cleanup task that can be retried

6. Missing Validation (MEDIUM)

Location: forms.py:220-226

Missing validations for unreasonably large date ranges that could create thousands of tasks.

Recommendation:

  • Add maximum date range validation (e.g., 1 year max)
  • Warn users if the date range will create many tasks
  • Validate filters have at least one non-empty criterion
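A sketch of the range check as it might appear in the form's `clean()` (the one-year limit and `validate_date_range` are illustrative, not existing code):

```python
from datetime import date

MAX_RANGE_DAYS = 365  # illustrative cap; pick whatever fits operational limits

def validate_date_range(start: date, end: date) -> None:
    if end < start:
        raise ValueError("End date must not be before start date.")
    if (end - start).days > MAX_RANGE_DAYS:
        raise ValueError(
            f"Date range spans more than {MAX_RANGE_DAYS} days; "
            "split the export into smaller jobs."
        )
```

In a Django form the `ValueError` would instead be a `forms.ValidationError` raised from `clean()`.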

7. Test Coverage Gaps (MEDIUM)

Tests are good but missing:

  • Edge cases for date boundaries and timezone transitions
  • Error recovery scenarios (what happens when dcm2niix fails)
  • Tests for the recursive splitting logic with actual depth

8. Duplicate Code (LOW)

Location: processors.py:87-97

_volume_export_path and _volume_output_path are identical functions. Consolidate them, or add a comment explaining why they are kept separate.

Performance Considerations

  1. Recursive splitting is clever, but consider caching intermediate results
  2. Line 269 uses get_or_create in a loop - consider bulk operations if this becomes a bottleneck
  3. No rate limiting on dcm2niix calls - could overwhelm I/O on busy workers

Summary

This is a solid implementation that integrates well with existing ADIT architecture. Major concerns:

  1. MUST FIX: Path traversal and command injection security issues
  2. SHOULD FIX: Add recursion limits, transaction handling, and improved validation
  3. NICE TO HAVE: Consolidate duplicate code, improve test coverage

The core logic is sound and the feature will be valuable for bulk DICOM operations. Once security issues are addressed, this will be ready to merge.

Recommendation: Request changes for security issues, then approve after fixes.

Great work on this substantial feature addition!


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🤖 Fix all issues with AI agents
Verify each finding against the current code and only fix it if needed.


In `@adit/mass_transfer/processors.py`:
- Around line 148-155: When job.pseudonymize is true and you generate a
pseudonym for a study (inside the volumes_by_study loop where pseudonym is
computed), immediately assign and persist that pseudonym to every Volume in
study_volumes (set volume.pseudonym and call the same persistence path used by
_export_volume or the model/save method) before invoking _export_volume for any
single volume; this ensures the generated pseudonym is stored atomically for the
whole study and avoids generating a new one on retry. Use the existing symbols
volumes_by_study, study_volumes, job.pseudonymize, volume.pseudonym and the
persistence logic currently used in _export_volume to perform the save.
- Around line 258-263: The try/except around comparing mf.series_number to
series_number is dead code because series_number comes from _parse_int which
already handles TypeError/ValueError; remove the try/except and replace with a
direct conditional: check if series_number is None or mf.series_number !=
series_number and continue accordingly (i.e., keep the existing logic inside the
if but without the try/except wrapper), referring to mf.series_number,
series_number and _parse_int to locate the code.
- Around line 59-64: _study_datetime currently returns a naive datetime which
forces callers to call timezone.make_aware; instead, make _study_datetime return
an aware datetime: in function _study_datetime, when combining study.StudyDate
and study.StudyTime preserve any existing tzinfo on study_time (or
study_time.tzinfo), and if the resulting datetime has no tzinfo, use
django.utils.timezone.get_current_timezone() (or
timezone.get_default_timezone()) and call timezone.make_aware on the combined dt
before returning; remove or stop relying on callers to call timezone.make_aware
(e.g., the code that currently wraps _study_datetime with timezone.make_aware
should no longer do so).
- Around line 182-196: The current final-status logic in the mass transfer
result block treats converted_count == 0 and failed_count == 0 as a generic
SUCCESS; change it to detect the "all skipped / nothing to do" case
(converted_count == 0 and failed_count == 0) and return a clearer outcome: set
status to MassTransferTask.Status.SUCCESS (or a new explicit status if your
domain supports it) but set message to "No new volumes to process" (and keep the
existing "log": "\n".join(log_lines)). Update the conditional branching around
converted_count and failed_count (the block that currently assigns
status/message) to check for this zero-zero case first and produce the distinct
message using the same variables (converted_count, failed_count, log_lines,
MassTransferTask.Status).
- Around line 269-304: The get_or_create call for MassTransferVolume duplicates
the same field assignments in the subsequent update block; extract the shared
field values into a single dict (e.g., build a "volume_values" dict before
calling MassTransferVolume.objects.get_or_create) and pass it as defaults, then
if not created update the model using that same dict (or simply switch to
MassTransferVolume.objects.update_or_create(job=job,
series_instance_uid=series_uid, defaults=volume_values) if you want
unconditional overwrites). Update references to the dict where you set
partition_key, patient_id, accession_number, study_instance_uid, modality,
study_description, series_description, series_number, study_datetime,
institution_name, number_of_images and ensure found_series[series_uid] still
receives the resulting Volume instance.
- Around line 370-410: The _export_volume routine calls operator.fetch_series
with a callback that writes every instance but lacks observability and timeout;
update the callback passed to operator.fetch_series to (1) maintain a local
counter and a last-progress timestamp, logging progress (using your logger)
every N images (e.g., every 100) including patient_id/study/series identifiers,
(2) update last-progress when an image is processed, and (3) enforce a stall
timeout by raising/propagating an exception or aborting the fetch if no progress
is seen for T seconds; ensure you reference the existing callback closure,
manipulator.manipulate, write_dataset, and volume identifiers so
exported_folder/volume.status handling remains correct and errors are surfaced.

In `@adit/mass_transfer/tests/test_processor.py`:
- Around line 238-250: The helper _make_processor currently patches
MassTransferTaskProcessor.__init__ and manually sets dicom_task and mass_task,
which is fragile if DicomTaskProcessor.__init__ later initializes more
attributes; update _make_processor to document and set all required instance
attributes explicitly (at minimum dicom_task and mass_task) and add a short
comment listing these required attributes and why __init__ is bypassed, or
alternatively construct the processor via its real __init__ with a lightweight
in-memory/fixture task; reference the _make_processor function,
MassTransferTaskProcessor, DicomTaskProcessor.__init__, and the
dicom_task/mass_task attributes when making the change.
- Around line 350-369: Remove the no-op unpack and the unused `right_end` by
deleting the line "_, _ = call_ranges[0]" and change the third unpack to discard
the unused end value (e.g., unpack `call_ranges[2]` as `right_start, _`) so only
`left_start, left_end` and `right_start` are assigned; this targets the test in
test_processor.py that patches MassTransferTaskProcessor._find_studies and uses
the `call_ranges` list.
🧹 Nitpick comments (8)
adit/mass_transfer/processors.py (6)

258-263: Unreachable except clause — _parse_int already handles TypeError/ValueError.

series_number is returned by _parse_int on line 239, which catches TypeError and ValueError internally and returns default. So mf.series_number != series_number (both int | None) will never raise those exceptions. The try/except here is dead code.

♻️ Simplified
                     if mf.series_number is not None:
-                        try:
-                            if series_number is None or mf.series_number != series_number:
-                                continue
-                        except (TypeError, ValueError):
-                            continue
+                        if series_number is None or mf.series_number != series_number:
+                            continue

59-64: _study_datetime returns a naive datetime; callers use timezone.make_aware.

Line 64 returns a naive datetime via datetime.combine, and line 281 wraps the result in timezone.make_aware(). This works, but if study.StudyDate or study.StudyTime already carries timezone info (unlikely for DICOM but possible with extended negotiation), make_aware would raise ValueError. More importantly, this naive-to-aware conversion is easy to forget for new callers. Consider making the function timezone-aware internally.
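One possible aware-by-construction variant, as a sketch (UTC stands in for Django's `timezone.get_current_timezone()`, and the function name mirrors `_study_datetime`):

```python
from datetime import date, datetime, time, timezone

def study_datetime(study_date: date, study_time: time) -> datetime:
    # datetime.combine keeps any tzinfo already present on the time
    # component; otherwise attach a zone explicitly (UTC here as a
    # stand-in for Django's current timezone).
    dt = datetime.combine(study_date, study_time)
    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=timezone.utc)
    return dt
```

With this shape, the `timezone.make_aware()` wrapper at the call site becomes unnecessary and the aware/naive invariant lives in one place.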

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@adit/mass_transfer/processors.py` around lines 59 - 64, _study_datetime
currently returns a naive datetime which forces callers to call
timezone.make_aware; instead, make _study_datetime return an aware datetime: in
function _study_datetime, when combining study.StudyDate and study.StudyTime
preserve any existing tzinfo on study_time (or study_time.tzinfo), and if the
resulting datetime has no tzinfo, use
django.utils.timezone.get_current_timezone() (or
timezone.get_default_timezone()) and call timezone.make_aware on the combined dt
before returning; remove or stop relying on callers to call timezone.make_aware
(e.g., the code that currently wraps _study_datetime with timezone.make_aware
should no longer do so).

182-196: All-skipped scenario returns SUCCESS — verify this is intended.

When every volume has status == CONVERTED (line 158-159), both converted_count and failed_count remain 0, so the else branch at line 188 returns SUCCESS. The log will show "Converted: 0, Failed: 0" which is technically correct for an idempotent re-run, but could be surprising. Consider whether a distinct message like "No new volumes to process" would be more informative.
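The suggested branching could be sketched as follows (the status strings are illustrative placeholders, not the actual `MassTransferTask.Status` members):

```python
def final_status(converted: int, failed: int) -> tuple[str, str]:
    # Distinguish the idempotent "nothing to do" re-run from a real success.
    if converted == 0 and failed == 0:
        return "SUCCESS", "No new volumes to process"
    if failed == 0:
        return "SUCCESS", f"Converted {converted} volume(s)"
    if converted == 0:
        return "FAILURE", f"All {failed} volume(s) failed"
    return "WARNING", f"Converted {converted}, failed {failed}"
```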


269-304: Large duplication between get_or_create defaults and the update block.

The field assignments on lines 289–302 mirror the defaults dict on lines 272–286 almost identically. If a field is added or renamed, both must be updated in lockstep, which is error-prone.

Consider extracting a helper dict or using update_or_create:

♻️ Proposed refactor using update_or_create
+                    volume_defaults = {
+                        "partition_key": self.mass_task.partition_key,
+                        "patient_id": str(study.PatientID),
+                        "accession_number": str(study.get("AccessionNumber", "")),
+                        "study_instance_uid": str(study.StudyInstanceUID),
+                        "modality": str(series.Modality),
+                        "study_description": str(study.get("StudyDescription", "")),
+                        "series_description": str(series.get("SeriesDescription", "")),
+                        "series_number": series_number,
+                        "study_datetime": timezone.make_aware(study_dt),
+                        "institution_name": str(series.get("InstitutionName", "")),
+                        "number_of_images": _parse_int(
+                            series.get("NumberOfSeriesRelatedInstances"), default=0
+                        ),
+                    }
+                    volume, _ = MassTransferVolume.objects.update_or_create(
+                        job=job,
+                        series_instance_uid=series_uid,
+                        defaults=volume_defaults,
                     )

Note: update_or_create will overwrite fields on existing records unconditionally; if you need to preserve some fields (e.g., status, pseudonym) on existing volumes, keep the current pattern but extract the shared dict to avoid the duplication.


370-410: _export_volume fetches all instances but has no progress tracking or timeout.

For large series (thousands of images), operator.fetch_series could run for a very long time. There's no timeout, no progress logging, and no way to detect a stalled PACS connection. Consider adding at least a periodic log statement inside the callback (e.g., every N images) for observability.
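One lightweight shape for this, sketched without the fetch loop itself (`FetchProgress` is a hypothetical helper; the real callback would call `image_received` per dataset, and a watchdog thread or the surrounding loop would poll `stalled()`):

```python
import time

class FetchProgress:
    """Counts fetched images so a caller or watchdog can log progress
    and detect a stalled PACS connection."""

    def __init__(self, log_every: int = 100):
        self.log_every = log_every
        self.count = 0
        self.last_progress = time.monotonic()

    def image_received(self) -> None:
        self.count += 1
        self.last_progress = time.monotonic()
        if self.count % self.log_every == 0:
            print(f"fetched {self.count} images")

    def stalled(self, timeout_s: float) -> bool:
        # True when no image has arrived within timeout_s seconds.
        return time.monotonic() - self.last_progress > timeout_s
```

Stall detection has to live outside the per-image callback, since a truly stalled fetch never invokes the callback again.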


447-451: subprocess.run with list argument from partially user-influenced data.

Ruff flags S603 here. While series_name goes through sanitize_filename, the volume.exported_folder path at line 444 is read from the database and could theoretically be manipulated. Using a list (not shell=True) mitigates shell injection, but consider validating that volume.exported_folder is a real path under the expected export base before passing it to a subprocess.

adit/mass_transfer/tests/test_processor.py (2)

238-250: _make_processor bypasses __init__ — fragile if internal attributes change.

Patching __init__ and manually setting dicom_task / mass_task will silently break if the parent class DicomTaskProcessor.__init__ starts initializing additional attributes. This is a pragmatic trade-off for avoiding DB setup, but consider adding a brief comment documenting which attributes are required so future maintainers know what to update.


350-369: Unused variable right_end and no-op unpacking on line 363.

Line 363 unpacks call_ranges[0] into two discarded variables — a bare comment or index access would be clearer. Line 365's right_end is unused (Ruff RUF059).

♻️ Minor cleanup
-    _, _ = call_ranges[0]
-    left_start, left_end = call_ranges[1]
-    right_start, right_end = call_ranges[2]
+    left_start, left_end = call_ranges[1]
+    right_start, _ = call_ranges[2]
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@adit/mass_transfer/tests/test_processor.py` around lines 350 - 369, Remove
the no-op unpack and the unused `right_end` by deleting the line "_, _ =
call_ranges[0]" and change the third unpack to discard the unused end value
(e.g., unpack `call_ranges[2]` as `right_start, _`) so only `left_start,
left_end` and `right_start` are assigned; this targets the test in
test_processor.py that patches MassTransferTaskProcessor._find_studies and uses
the `call_ranges` list.

Comment on lines 148 to 155
for _, study_volumes in volumes_by_study.items():
    pseudonym = ""
    if job.pseudonymize:
        existing_pseudonym = next(
            (v.pseudonym for v in study_volumes if v.pseudonym),
            None,
        )
        pseudonym = existing_pseudonym or uuid.uuid4().hex
Contributor

⚠️ Potential issue | 🟡 Minor

Pseudonym not persisted before _export_volume; re-run of a partially failed study may generate a new pseudonym.

When a study has two volumes and the first succeeds (pseudonym saved on volume.pseudonym inside _export_volume) but the second fails, a re-run will hit line 152 and recover the pseudonym from the first volume. However, the pseudonym is generated at line 155 and only persisted inside _export_volume (line 408). If the process crashes between line 155 and the first _export_volume call completing, the generated pseudonym is lost entirely and a new one will be created on retry, leading to inconsistent pseudonyms across volumes of the same study.

Consider persisting the pseudonym on all study volumes as soon as it is generated, before starting export.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@adit/mass_transfer/processors.py` around lines 148 - 155, When
job.pseudonymize is true and you generate a pseudonym for a study (inside the
volumes_by_study loop where pseudonym is computed), immediately assign and
persist that pseudonym to every Volume in study_volumes (set volume.pseudonym
and call the same persistence path used by _export_volume or the model/save
method) before invoking _export_volume for any single volume; this ensures the
generated pseudonym is stored atomically for the whole study and avoids
generating a new one on retry. Use the existing symbols volumes_by_study,
study_volumes, job.pseudonymize, volume.pseudonym and the persistence logic
currently used in _export_volume to perform the save.

@claude

claude bot commented Feb 16, 2026

Code Review: Mass Transfer Feature

Thank you for this substantial contribution! The Mass Transfer feature brings valuable functionality from the volume-export-pipeline into ADIT. The code is generally well-structured and the latest commit fixing duplicate studies shows good problem-solving. However, I've identified several issues that should be addressed.


Critical Issues (Must Fix Before Merge)

1. N+1 Database Query Problem

Location: adit/mass_transfer/processors.py:269-302

The _find_volumes method performs get_or_create inside nested loops, potentially causing thousands of individual database queries. For 10 filters × 100 studies × 10 series = 10,000 queries. This will cause severe performance degradation.

Fix: Use bulk_create with ignore_conflicts=True and pre-fetch existing volumes into a dictionary.
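The shape of that fix can be sketched without the ORM: partition the discovered UIDs against a pre-fetched set, then issue one `bulk_create` for the new ones. The helper name `partition_series` and the plain-UID inputs are illustrative, not part of the actual processor:

```python
def partition_series(
    existing_uids: set[str], found_uids: list[str]
) -> tuple[list[str], list[str]]:
    """Split discovered SeriesInstanceUIDs into new ones (for bulk_create) and known ones."""
    to_create: list[str] = []
    already_known: list[str] = []
    seen: set[str] = set()
    for uid in found_uids:
        if uid in seen:  # deduplicate within the current query result
            continue
        seen.add(uid)
        if uid in existing_uids:
            already_known.append(uid)
        else:
            to_create.append(uid)
    return to_create, already_known

# With Django this becomes two queries total instead of one per series:
#   existing = set(MassTransferVolume.objects.filter(job=job)
#                  .values_list("series_instance_uid", flat=True))
#   MassTransferVolume.objects.bulk_create(
#       [MassTransferVolume(job=job, series_instance_uid=uid) for uid in to_create],
#       ignore_conflicts=True,
#   )
```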

2. No Maximum Recursion Depth

Location: adit/mass_transfer/processors.py:308-349

The _find_studies recursive splitting has no depth limit. A PACS returning thousands of studies from a single day could cause stack overflow.

Fix: Add recursion depth tracking with a maximum depth limit (e.g., 10 levels).


High Priority Issues

3. Fragile Pseudonym Consistency

Location: adit/mass_transfer/processors.py:148-156, 387-410

If a volume fails before setting its pseudonym, subsequent volumes in the same study get different pseudonyms, breaking the per-study guarantee.

Recommendation: Store pseudonym at the study level or generate once per study before processing any volumes.

4. Missing Test Coverage

Gaps: Filter matching edge cases, series number filtering with None values, error handling for dcm2niix failures, integration tests with actual DICOM queries.

Good coverage for: Time window splitting, deduplication, cleanup, partitions.

5. No Date Range Validation in Forms

Location: adit/mass_transfer/forms.py:101-104, 220-226

Forms allow selecting arbitrary date ranges (e.g., 10 years), which could create thousands of tasks and overwhelm the system.

Fix: Add maximum date range validation (e.g., 1 year) in the form's clean method.
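A minimal sketch of such a limit, assuming a one-year cap (the 365-day value is illustrative, not an existing setting); the form's `clean()` would call something like this and re-raise as a `ValidationError`:

```python
from datetime import date, timedelta

MAX_RANGE = timedelta(days=365)  # illustrative cap, not an existing setting

def validate_date_range(start: date, end: date, max_range: timedelta = MAX_RANGE) -> None:
    """Raise ValueError for inverted or oversized ranges; return None when valid."""
    if end < start:
        raise ValueError("End date must not be before start date.")
    if end - start > max_range:
        raise ValueError(f"Date range must not exceed {max_range.days} days.")
```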


Medium Priority Issues

6. Cleanup State Tracking Uses String Suffix

Location: adit/mass_transfer/processors.py:457-474

Uses string suffix matching which could fail if a folder legitimately ends with "(cleaned)".

Recommendation: Add an explicit cleanup_completed boolean field.

7. No Volume Error Recovery

Location: adit/mass_transfer/models.py:145-149

Once a volume enters ERROR state, there's no retry mechanism.

8. Unnecessary Exception Handling

Location: adit/mass_transfer/processors.py:258-263

Exception handling appears unnecessary since both values are already integers or None.


Minor Issues

  • Missing docstrings on complex public methods
  • Unused urgent_priority setting (never used, always hardcoded to False)
  • dcm2niix error messages not sanitized (low risk)

Positives

Great work on:

  1. Duplicate studies fix is correct - properly addresses midpoint overlap and deduplication
  2. File path sanitization - prevents path traversal attacks
  3. Django field conventions - correctly follows project patterns
  4. Cleanup on failure - proper error handling
  5. Partition logic - clean implementation with good tests
  6. Type hints - comprehensive throughout

Summary

Overall Assessment: This PR introduces valuable functionality with generally good code quality. The duplicate studies fix demonstrates strong problem-solving. However, the N+1 query issue is critical and will cause severe performance problems with real-world datasets.

Recommended Action: Address the two critical issues, then re-review. High-priority issues should also be addressed, but could potentially be handled in follow-up PRs if there's urgency.

Let me know if you'd like help implementing any of these fixes!

adit_dev-web:latest
adit_dev-default_worker:latest
adit_dev-dicom_worker:latest
adit_dev-mass_transfer_worker:latest
Member

It's still not using the dicom worker. I thought we had talked about using the dicom worker instead.

Collaborator Author

Yeah, we switched to the dicom worker. This line is a leftover from when I had implemented its own worker; I will remove it. But if you look at the compose file, there are only dicom_worker and default_worker - no mass transfer worker.

MASS_TRANSFER_MAX_SEARCH_RESULTS = env.int("MASS_TRANSFER_MAX_SEARCH_RESULTS", default=200)

# Base directory for temporary DICOM exports in mass transfer
MASS_TRANSFER_EXPORT_BASE_DIR = env.str(
Member

For what is this? Why don't we use the normal destination (just filtered to only allow directories).

Collaborator Author

@NumericalAdvantage NumericalAdvantage Feb 18, 2026

I did not want to use the same folder for the conversion (even in a subdirectory) - cleanup in case of failure can be quite messy. It's easy to make mistakes, delete the wrong stuff, etc.

But yes, I get your point - having a separate env var just for a temp directory is definitely overkill.

But here is the thing: if we do it with tempfile.mkdtemp(), we have a separate list of issues to take care of - mkdtemp() generates a different random path each time. If the worker dies before we clean up, we cannot clean up the old path, unless of course we record it, but that just adds more complexity.

Another drawback of using /tmp is that we are in a container, and containers will generally write their tmp files to the host filesystem, so space can also be an issue. (I saw that CTs with thin slices can be quite large - up to 3 GB.) If mass transfer is running on all three replicas, the worst case is a demand of 3 GB (for a thin-slice CT) per DICOM worker, so 9 to 10 GB on the host filesystem. That is not a big deal, but the requirement is hidden in the /tmp usage, and I'm not sure how much of a faux pas that is.

One long-term point is that we might at some point want the processing to happen on a local path backed by an SSD, while the DicomFolder is probably on a NAS or something.

Overall, I am not sure what to do. What do you think?

@claude

claude bot commented Feb 18, 2026

Code Review: Mass Transfer App

This is a substantial addition to ADIT, introducing a well-structured mass transfer workflow. The overall architecture follows established project patterns and the code quality is generally good. Below are findings organized by severity.


Bugs / Correctness Issues

1. _cleanup_mass_transfer_exports skips volumes that are CONVERTED — but timeout path may access stale partition_key

In adit/core/tasks.py, the cleanup function correctly skips volumes with status CONVERTED. However, if _cleanup_mass_transfer_exports is called on timeout (before dicom_task.save()), and the task object has just been created, partition_key may not yet be persisted. Worth verifying partition_key is always present before this cleanup path is triggered.

2. Potential race condition in get_or_create + manual update in _find_volumes

In processors.py:

volume, created = MassTransferVolume.objects.get_or_create(
    job=job,
    series_instance_uid=series_uid,
    defaults={...},
)
if not created:
    # manually update all fields
    volume.save()

If two tasks run concurrently (e.g., overlapping partitions), both can call get_or_create and one will win the create; the other will immediately overwrite all fields including partition_key, status, and exported_folder, potentially clobbering in-progress work from the first task. Consider using update_or_create with a subset of safe fields, or checking volume.status before overwriting.
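One way to express the status check is a pure function that decides what a late-arriving task may overwrite; a hypothetical sketch (the status constants and field names mirror the review's terminology, not necessarily the real model):

```python
# Assumed status values; the real model may use different constants.
SAFE_TO_OVERWRITE = {"PENDING", "ERROR"}

def fields_to_update(current_status: str, new_values: dict[str, str]) -> dict[str, str]:
    """Refresh metadata only when the volume is not mid-export/conversion."""
    if current_status not in SAFE_TO_OVERWRITE:
        # an in-flight volume keeps its partition_key, status and exported_folder
        return {}
    # never let a concurrent task reset lifecycle fields
    return {k: v for k, v in new_values.items() if k not in ("status", "exported_folder")}
```

The dict returned here would feed a `Model.save(update_fields=...)` call, so a concurrent partition can refresh query metadata without clobbering lifecycle state.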

3. _convert_volume hardcodes the output filename

volume.converted_file = str(output_path / f"{series_name}.nii.gz")

dcm2niix generates its own filenames and may produce multiple files or use a different naming convention (e.g., {series_name}_e1.nii.gz for multi-echo sequences). The saved converted_file path may not actually exist. Consider scanning output_path for .nii.gz files after conversion instead.
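A directory scan along those lines could look like this (a sketch; the helper name and error text are illustrative):

```python
from pathlib import Path

def find_converted_files(output_path: Path) -> list[Path]:
    """Return every NIfTI file dcm2niix actually wrote, sorted for determinism."""
    files = sorted(output_path.glob("*.nii.gz"))
    if not files:
        raise RuntimeError(f"dcm2niix produced no .nii.gz output in {output_path}")
    return files
```

This also surfaces the multi-file case (e.g. `_e1`/`_e2` echoes) instead of silently recording a path that does not exist.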

4. _cleanup_export uses a sentinel string " (cleaned)" as state

if not export_folder or export_folder.endswith(" (cleaned)"):
    return
...
volume.exported_folder = f"{export_folder} (cleaned)"

This is fragile — it embeds state in a path field. Use the existing status field or a separate boolean to track cleanup. If exported_folder happens to end in (cleaned) for some reason, cleanup will silently be skipped.

5. MassTransferFilter.owner is nullable but the unique constraint includes it

models.UniqueConstraint(
    fields=["owner", "name"],
    name="mass_transfer_filter_unique_owner_name",
)

In PostgreSQL, NULL != NULL in unique constraints, so two filters with owner=NULL and the same name are allowed by the DB. If shared/unowned filters are intentional, document this behavior; if not, make owner non-nullable.


Design / Architecture Concerns

6. adit/core/tasks.py imports from adit.mass_transfer — reversed dependency direction

_cleanup_mass_transfer_exports uses a deferred import from adit.mass_transfer.models and checks dicom_task._meta.app_label. This creates a coupling from core to mass_transfer, reversing the intended dependency direction. A cleaner approach: define a cleanup_on_failure() hook method on DicomTask (no-op by default), and override it in MassTransferTask, keeping core/tasks.py agnostic of specific apps.
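The proposed hook, reduced to plain Python to show the control flow (class names mirror the models, but this is a sketch, not the actual ADIT code):

```python
class DicomTask:
    def cleanup_on_failure(self) -> None:
        """No-op by default; core code can call this without knowing the app."""

class MassTransferTask(DicomTask):
    def __init__(self) -> None:
        self.cleaned = False

    def cleanup_on_failure(self) -> None:
        # app-specific export cleanup lives here instead of in core/tasks.py
        self.cleaned = True

def run_task(task: DicomTask, ok: bool) -> None:
    """Stand-in for the failure path of _run_dicom_task."""
    if not ok:
        task.cleanup_on_failure()
```

With this shape, core/tasks.py never imports adit.mass_transfer and never inspects `_meta.app_label`; the dependency points in the intended direction.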

7. Duplicate Procrastinate task functions with identical bodies

def process_dicom_task(context, model_label, task_id):
    _run_dicom_task(context, model_label, task_id)

def process_mass_transfer_task(context, model_label, task_id):
    _run_dicom_task(context, model_label, task_id)

These are identical in behavior but exist to route to different queues (valid Procrastinate pattern). Please add a comment explaining why two separate task functions exist, to avoid future confusion.

8. _volume_export_path and _volume_output_path are identical functions

def _volume_export_path(base_dir, study_dt, subject_id, series_name) -> Path:
    year_month = study_dt.strftime("%Y%m")
    return base_dir / year_month / subject_id / series_name

def _volume_output_path(base_dir, study_dt, subject_id, series_name) -> Path:
    year_month = study_dt.strftime("%Y%m")
    return base_dir / year_month / subject_id / series_name

These are byte-for-byte identical. Merge into one function.

9. MassTransferTask.source duplicates job.source

MassTransferTask has its own source ForeignKey, but it's always set to job.source in _save_tasks. If all tasks in a job always share the same source, this field is redundant overhead.


Code Quality / Style

10. Five separate migrations for a new app

The PR introduces 5 migrations (0001–0005) for a brand-new app with no production data. These should be squashed into a single 0001_initial migration before merging.

11. MassTransferJob silently overrides convert_to_nifti on every save

# In MassTransferJobForm.save():
job.convert_to_nifti = True

If mass transfer always converts to NIfTI, remove the field from this model or make it a class-level constant/property, rather than silently overriding it on every form save.

12. StudyInstanceUID / SeriesInstanceUID field lengths

max_length=64 is per the DICOM standard, but some real-world (non-compliant) PACS generate UIDs up to 80+ characters. Consider max_length=128 to be safe.


Test Coverage Gaps

13. No test for process() when filters is empty

The process() method returns a FAILURE dict when no filters are configured, but there is no test covering this path.

14. No test for _export_volume or _convert_volume with filesystem interactions

The tests mock _export_volume and _convert_volume entirely. Tests exercising the actual export/convert paths (with a temp directory and mocked DicomOperator) would catch issues like the hardcoded filename assumption (#3 above).

15. No test for MassTransferFilterForm.clean_name uniqueness check

The per-user filter name uniqueness validation is untested.


Minor Issues

  • MassTransferFilter.__str__ has a fallback that builds a description from fields, but since name is now non-nullable (migration 0004), the fallback branch can never be reached. Simplify to return self.name.
  • The processed_tasks attribute referenced in mass_transfer_job_detail.html (job.processed_tasks.count) — confirm this is inherited from DicomJob and not missing.
  • In _find_studies, the StudyTime upper bound uses datetime.max.time() which includes microseconds. DICOM time fields don't support microseconds — replace with time(23, 59, 59).

Summary

This is a solid first implementation that follows ADIT's established patterns well. The partitioning logic, pseudonymization grouping by study, and cleanup-on-failure handling are particularly thoughtful. The main items to address before merging:

  1. Fix the _cleanup_export sentinel string pattern — use the status field instead (item 4)
  2. Squash migrations into a single 0001_initial (item 10)
  3. Resolve the core → mass_transfer dependency inversion (item 6)
  4. Fix the hardcoded dcm2niix output filename assumption (item 3)
  5. Remove the duplicate _volume_export_path / _volume_output_path functions (item 8)

@claude

claude bot commented Feb 18, 2026

Code Review — Mass Transfer App

This is a substantial, well-structured addition. The architecture follows existing patterns (Job→Task→Processor, Procrastinate queues, shared admin/view/filter base classes) and the recursive time-window splitting for large DICOM result sets is a solid design. Below are the issues I found, ordered by severity.


Critical

1. dcm2niix output filename is not deterministic

_convert_volume constructs the expected output path as {series_name}.nii.gz, but dcm2niix auto-generates its own filenames based on DICOM metadata — the file you expect to exist may not match what was actually written.

# processors.py — current
output_file = output_dir / f"{series_name}.nii.gz"
if not output_file.exists():
    raise RuntimeError(...)
volume.converted_file = str(output_file)

Replace with a glob to find whatever dcm2niix actually produced:

nii_files = list(output_dir.glob("*.nii.gz"))
if not nii_files:
    raise RuntimeError("dcm2niix produced no .nii.gz output")
volume.converted_file = str(nii_files[0])  # or handle multiple

2. Race condition in pseudonym assignment across partitions

_find_volumes groups volumes by StudyInstanceUID and assigns a single pseudonym to each study. But multiple MassTransferTask workers (one per partition) can run concurrently. If two workers encounter the same study (e.g., it spans a partition boundary), they each independently call get_or_create and may assign different pseudonyms to volumes from the same study. The result is inconsistently pseudonymized data.

Fix: Before processing any volumes, do a single DB-level pseudonym assignment per (job, StudyInstanceUID) using SELECT FOR UPDATE or update_or_create with a stable key. Or assign pseudonyms at job creation time after the full volume discovery step.
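Whatever the persistence mechanism, the invariant to preserve is get-or-set semantics per (job, StudyInstanceUID); an in-memory sketch of that invariant (the dict stands in for a DB table with a unique key on the pair):

```python
import uuid

# Stand-in for a DB table with a unique (job_id, study_uid) key.
_pseudonyms: dict[tuple[int, str], str] = {}

def pseudonym_for(job_id: int, study_uid: str) -> str:
    """First caller wins; every later caller gets the same pseudonym back."""
    return _pseudonyms.setdefault((job_id, study_uid), uuid.uuid4().hex)
```

With Django, `get_or_create` on a small pseudonym model keyed by the same pair gives first-caller-wins behavior at the DB level, because the unique constraint makes the second concurrent insert fall back to the existing row.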

3. Potential infinite recursion in _find_studies

The recursion guard is _MIN_SPLIT_WINDOW = timedelta(minutes=30), but this is checked at the call site before recursing. If the DICOM server consistently returns >MAX results for any 30-minute window (e.g., a very busy site), the recursion terminates with an exception — but if the check condition is slightly off, you can recurse until the Python stack overflows. Add an explicit max_depth parameter and enforce it:

def _find_studies(self, ..., _depth: int = 0) -> list[Dataset]:
    if _depth > 20:
        raise RuntimeError("_find_studies exceeded maximum recursion depth")
    ...
    left = self._find_studies(..., _depth=_depth + 1)
    right = self._find_studies(..., _depth=_depth + 1)

4. export_cleaned sentinel string vs. boolean field

_cleanup_export checks if " (cleaned)" in volume.exported_folder to detect already-cleaned volumes, but the model has a dedicated export_cleaned = BooleanField(default=False). The string-based check is fragile and the two signals can diverge. Use the boolean exclusively:

def _cleanup_export(self, volume: MassTransferVolume) -> None:
    if volume.export_cleaned:
        return
    shutil.rmtree(volume.exported_folder, ignore_errors=True)
    volume.export_cleaned = True
    volume.save(update_fields=["export_cleaned"])

High Priority

5. save_tasks stored as bound method on form instance

In MassTransferJobForm.save(commit=False), a save_tasks attribute is set on self:

self.save_tasks = self._save_tasks

Django's own pattern for this is to extend save_m2m. The current approach works in practice but breaks if the caller doesn't know to call form.save_tasks(job) — the standard contract from ModelForm is save_m2m(). Even coderabbitai flagged this. Align with the Django pattern.

6. _dicom_match uses substring search, not DICOM wildcard semantics

# current
return bool(regex.search(pattern, value))

DICOM wildcard matching (`*` and `?`) anchors to the full value. regex.search matches anywhere in the string, so MR would match MR_BRAIN and PERIPHERAL_MR. Use fullmatch after converting `*` to `.*` and `?` to `.` (the replacements must target the escaped sequences, since re.escape turns `*` into `\*`):

dicom_pattern = re.escape(pattern_raw).replace(r"\*", ".*").replace(r"\?", ".")
return bool(re.fullmatch(dicom_pattern, value, re.IGNORECASE))

7. _find_studies split sends the same mid date to both halves

mid = start + (end - start) / 2
left  = self._find_studies(operator, start, mid, ...)
right = self._find_studies(operator, mid,   end, ...)

Studies with StudyDate == mid appear in both branches and are deduplicated by StudyInstanceUID at each level — but the deduplication happens after the merge, so borderline studies do get queried twice. Use mid - timedelta(days=1) for the left half (or use exclusive end semantics) to avoid the overlap.

8. No disk space check before export

_export_volume calls shutil.copy in a loop inside fetch_series. On a system with a nearly-full export volume, a long-running job will silently accumulate partial DICOM files until OSError: No space left on device. Add a pre-flight check or handle OSError explicitly and mark the volume as ERROR instead of propagating an unhandled exception.
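A pre-flight check can lean on the standard library; the 5 GB threshold below is an assumption (sized for one thin-slice CT per the discussion above), not a project setting:

```python
import shutil
from pathlib import Path

REQUIRED_FREE_BYTES = 5 * 1024**3  # assumed headroom for one thin-slice CT

def has_free_space(path: Path, required_bytes: int = REQUIRED_FREE_BYTES) -> bool:
    """True if the filesystem holding `path` has at least `required_bytes` free."""
    return shutil.disk_usage(path).free >= required_bytes
```

Calling this before `fetch_series` lets the processor mark the volume as ERROR with a clear message instead of dying mid-copy on OSError.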


Medium Priority

9. process_dicom_task and process_mass_transfer_task share the same queue

The comment in tasks.py says the separate task name exists so routing can be changed independently — but both are registered on queue="dicom". If the intent is isolation (as stated in the PR description: "dedicated worker"), the mass transfer task should use queue="mass_transfer" now, not as a future TODO. Otherwise the architectural isolation claim is misleading.

10. Missing minimum-filter validation at form level

MassTransferJobForm allows zero filters to be submitted (the field is required=True but a non-empty queryset can still be submitted with no items checked). The processor fails at runtime with FAILURE status. Validate in clean_filters:

def clean_filters(self):
    filters = self.cleaned_data.get("filters", [])
    if not filters:
        raise ValidationError("At least one filter is required.")
    ...

11. Five separate migrations for a new app

New apps should ship with a single squashed initial migration. Having five migration files for brand-new models is unnecessary churn. Run squashmigrations before merging.

12. convert_to_nifti always set to True in save()

job.convert_to_nifti = True

The field exists in the model and there is a UI checkbox for it, but the form's save() method unconditionally overrides the user's choice. Either remove the field from the UI (if always-True is intentional) or respect the submitted value.

13. MassTransferVolume.series_number type mismatch

MassTransferFilter.series_number is PositiveIntegerField, but MassTransferVolume.series_number is plain IntegerField. Use PositiveIntegerField for consistency with the filter definition and with DICOM semantics (series numbers are always non-negative).


Minor / Nits

14. StudyTime upper bound may lose precision

study_time_end = time.max  # time(23, 59, 59, 999999)

DICOM time fields have no microsecond resolution. Use time(23, 59, 59) to avoid surprising server behavior.

15. Dead except (TypeError, ValueError) around series_number

The except block around the series number comparison catches TypeError and ValueError but neither int() nor integer comparison raises these in the way the code expects — the _parse_int() helper already handles parsing failures. The block is dead code and should be removed.

16. Identical _volume_export_path and _volume_output_path helpers

Both return the same value. Consolidate into one _volume_path() function (as already noted by multiple reviewers).

17. MassTransferFilter.__str__ fallback is unreachable

def __str__(self):
    return self.name or "<unnamed filter>"

name has a DB-level NOT NULL constraint and a blank check in the model — the fallback is dead code.

18. _study_datetime returns a naive datetime

The helper constructs a datetime from DICOM StudyDate/StudyTime without attaching a timezone. This will cause issues if the application's USE_TZ = True. Return a timezone-aware datetime or document the assumption explicitly.
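A sketch of an aware variant that parses DICOM DA/TM strings; defaulting to UTC is an assumption, and under Django `django.utils.timezone.make_aware` with the configured `TIME_ZONE` would be the idiomatic route:

```python
from datetime import datetime, timezone, tzinfo

def study_datetime(study_date: str, study_time: str, tz: tzinfo = timezone.utc) -> datetime:
    """Combine DICOM DA ('YYYYMMDD') and TM ('HHMMSS...') values into an aware datetime."""
    hhmmss = study_time[:6].ljust(6, "0")  # TM may be truncated, e.g. '1230'
    naive = datetime.strptime(study_date + hhmmss, "%Y%m%d%H%M%S")
    return naive.replace(tzinfo=tz)
```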


Positive Notes

  • The recursive time-window splitting approach in _find_studies is a well-known pattern for DICOM servers that cap result sets; the implementation is clean.
  • Adding cleanup_on_failure() as a hook on DicomTask is a good extension point and avoids coupling core/tasks.py to mass transfer specifics.
  • Per-study pseudonym grouping is the right approach for maintaining cross-series consistency.
  • The partition building logic is well-isolated and well-tested.
  • Using get_or_create for MassTransferVolume is good for idempotency on task retry.

The most important issues to address before merging are #1 (broken NIfTI path), #2 (pseudonym race condition), #6 (incorrect regex semantics), and #7 (split boundary overlap).

NumericalAdvantage and others added 5 commits February 18, 2026 16:06
The recursive split used `mid` as the boundary for both halves, causing
studies at the midpoint to appear in both. Additionally, since the DICOM
query operates at date-level granularity, same-day splits produced
identical queries returning the same results in both halves.

Shift the right half to start at mid + 1s and deduplicate by
StudyInstanceUID when merging.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>