Add Mass Transfer app with partitioned DICOM export and conversion #298
NumericalAdvantage wants to merge 14 commits into main
Conversation
Summary of Changes

Hello @NumericalAdvantage, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed. This pull request introduces a new 'Mass Transfer' system, significantly enhancing the application's capability to handle large volumes of imaging data. It provides a structured workflow for exporting and converting DICOM files, incorporating features like time-based data segmentation, privacy-preserving pseudonymization, and automated file management. The goal is to offer a robust and efficient solution for managing extensive datasets, integrating seamlessly with existing transfer mechanisms.
📝 Walkthrough

Adds a new Mass Transfer feature: app wiring, models, migrations, admin, views, forms, templates, static assets, processors, task wiring, settings, URLs, utilities, tests, and management cleanup to discover, export, pseudonymize, convert (dcm2niix) and track bulk DICOM transfers across partitioned time windows.
Sequence Diagram

```mermaid
sequenceDiagram
    actor User
    participant Web as "Web View"
    participant Form as "MassTransferJobForm"
    participant DB as "Database"
    participant Queue as "Procrastinate Queue"
    participant Worker as "MassTransferTaskProcessor"
    participant DICOM as "DICOM Operator"
    participant FS as "Filesystem / dcm2niix"
    User->>Web: create job (filters, dates, prefs)
    Web->>Form: validate & build partitions
    Form->>DB: save Job + Tasks (PENDING)
    User->>Web: verify -> queue tasks
    Web->>Queue: enqueue process_mass_transfer_task(task_id)
    Queue->>Worker: run task
    Worker->>DICOM: C-FIND/C-MOVE (discover/export)
    DICOM-->>Worker: DICOM files on disk
    Worker->>FS: run dcm2niix convert
    FS-->>Worker: NIfTI outputs
    Worker->>FS: cleanup export dirs
    Worker->>DB: update MassTransferVolume, Task statuses & logs
    DB-->>Worker: persist
```
Estimated code review effort: 🎯 5 (Critical) | ⏱️ ~120 minutes
🚥 Pre-merge checks: ✅ 3 passed | ❌ 1 failed (1 warning)
Pull Request Review: Mass Transfer Feature

Overview

This PR introduces a comprehensive mass transfer facility for bulk DICOM export and NIfTI conversion. The implementation is well-structured and follows the existing ADIT patterns. Here's my detailed review:

✅ Strengths

Architecture & Design
Code Quality
Code Review
This pull request introduces a significant new feature, the Mass Transfer app, which allows for partitioned DICOM export and conversion. The implementation is well-structured, with a dedicated worker queue for long-running tasks, reusable filters, and comprehensive models for managing jobs, tasks, and volumes. The code is generally of high quality, with good use of Django patterns and clear logic. I've identified a couple of areas for improvement related to database query efficiency and code maintainability, which I've detailed in the specific comments. Overall, this is a great addition to the project.
```python
counts: dict[MassTransferJob.Status, int] = {}
for status in MassTransferJob.Status:
    counts[status] = MassTransferJob.objects.filter(status=status).count()
```
The current implementation of collect_job_stats executes a separate database query for each status, which can be inefficient. This can be optimized to use a single query with GROUP BY to fetch all counts at once.
Suggested change:

```python
from django.db.models import Count

db_counts = {
    item["status"]: item["count"]
    for item in MassTransferJob.objects.values("status").annotate(count=Count("id"))
}
counts: dict[MassTransferJob.Status, int] = {
    status: db_counts.get(status.value, 0) for status in MassTransferJob.Status
}
```
Sounds true. Not sure how the other apps do that, but we could do that in all of the apps in that way (if that works, of course). Let's open an issue for that in the other apps.
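As a sketch of how such a shared helper might look for all apps, separated from the ORM so the expansion logic stays reusable and testable. The `Status` enum and names below are illustrative, not taken from the ADIT codebase:

```python
from enum import Enum


class Status(str, Enum):
    # Stand-in for a Django TextChoices status enum (names/values are made up)
    PENDING = "PE"
    SUCCESS = "SU"
    FAILURE = "FA"


def full_status_counts(db_counts: dict, status_enum) -> dict:
    """Fill in zero for statuses the single GROUP BY query returned no row for.

    `db_counts` would come from one query per app, e.g.:
        dict(Model.objects.values_list("status").annotate(c=Count("id")))
    """
    return {status: db_counts.get(status.value, 0) for status in status_enum}


# e.g. the database only contains pending and failed jobs:
counts = full_status_counts({"PE": 3, "FA": 1}, Status)
```

Each app's `collect_job_stats` would then only need its own model and enum, keeping the single-query behavior consistent everywhere.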
```python
volume, created = MassTransferVolume.objects.get_or_create(
    job=job,
    series_instance_uid=series_uid,
    defaults={
        "partition_key": self.mass_task.partition_key,
        "patient_id": str(study.PatientID),
        "accession_number": str(study.get("AccessionNumber", "")),
        "study_instance_uid": str(study.StudyInstanceUID),
        "modality": str(series.Modality),
        "study_description": str(study.get("StudyDescription", "")),
        "series_description": str(series.get("SeriesDescription", "")),
        "series_number": series_number,
        "study_datetime": timezone.make_aware(study_dt),
        "institution_name": str(series.get("InstitutionName", "")),
        "number_of_images": _parse_int(
            series.get("NumberOfSeriesRelatedInstances"), default=0
        ),
    },
)
if not created:
    volume.partition_key = self.mass_task.partition_key
    volume.patient_id = str(study.PatientID)
    volume.accession_number = str(study.get("AccessionNumber", ""))
    volume.study_instance_uid = str(study.StudyInstanceUID)
    volume.modality = str(series.Modality)
    volume.study_description = str(study.get("StudyDescription", ""))
    volume.series_description = str(series.get("SeriesDescription", ""))
    volume.series_number = series_number
    volume.study_datetime = timezone.make_aware(study_dt)
    volume.institution_name = str(series.get("InstitutionName", ""))
    volume.number_of_images = _parse_int(
        series.get("NumberOfSeriesRelatedInstances"), default=0
    )
    volume.save()
```
The use of get_or_create followed by an if not created: block to update the object leads to significant code duplication. This can be simplified by using update_or_create, which handles both creation and updates more cleanly.
```python
defaults = {
    "partition_key": self.mass_task.partition_key,
    "patient_id": str(study.PatientID),
    "accession_number": str(study.get("AccessionNumber", "")),
    "study_instance_uid": str(study.StudyInstanceUID),
    "modality": str(series.Modality),
    "study_description": str(study.get("StudyDescription", "")),
    "series_description": str(series.get("SeriesDescription", "")),
    "series_number": series_number,
    "study_datetime": timezone.make_aware(study_dt),
    "institution_name": str(series.get("InstitutionName", "")),
    "number_of_images": _parse_int(
        series.get("NumberOfSeriesRelatedInstances"), default=0
    ),
}
volume, _ = MassTransferVolume.objects.update_or_create(
    job=job,
    series_instance_uid=series_uid,
    defaults=defaults,
)
```
Actionable comments posted: 4
🤖 Fix all issues with AI agents
In `@adit/mass_transfer/apps.py`:
- Line 45: The signal handler init_db currently declares an unused **kwargs
which triggers Ruff ARG001; rename that parameter to a leading-underscore form
(e.g., **_kwargs) so the handler signature remains compatible with Django
signals but linters treat it as intentionally unused; update any internal
references if present (none expected) and run the linter to verify the warning
is gone.
- Around line 12-42: The register_app() function registers a job-stats collector
each time AppConfig.ready() runs, causing duplicates on Django autoreload; make
register_job_stats_collector idempotent by checking for prior registration
before appending: e.g., when registering collect_job_stats from register_app(),
detect an existing identical collector (by function identity or a unique
name/key) and skip re-registration, or mark the function with an attribute
(e.g., __registered__) after successful registration; update
register_app()/collect_job_stats usage to perform that check so multiple calls
from ready() are no-ops.
In `@adit/mass_transfer/processors.py`:
- Around line 303-335: In _find_studies, the StudyTime range is being applied
for every date when splitting multi-day windows, which can exclude studies on
intermediate days; change the QueryDataset.create call so StudyTime is only set
when start.date() == end.date() (i.e., the window is a single day), while always
setting StudyDate=(start.date(), end.date()), and keep the existing logic that
adds query.dataset.ModalitiesInStudy and StudyDescription using mf.modality and
mf.study_description; ensure recursive splits continue to pass correct start/end
so StudyTime will be applied only for single-day queries.
In `@adit/mass_transfer/templates/mass_transfer/mass_transfer_job_detail.html`:
- Around line 40-48: The template uses an outer conditional ({% if
job.filters.all %}) which prevents the {% empty %} clause from ever running;
remove that outer if around the for-loop (or replace it with an explicit else)
so the for f in job.filters.all ... {% empty %} — {% endfor %} executes for
empty filter sets and renders the placeholder; update the block around
job.filters.all accordingly.
🧹 Nitpick comments (6)
adit/mass_transfer/utils/partitions.py (1)

39-40: Consider using `time.max` for end-of-day precision. Using `time(23, 59, 59)` excludes the final ~1 second of the day (23:59:59.000001 through 23:59:59.999999). If DICOM queries use inclusive timestamp comparisons, studies at the very end of a day could be missed.

♻️ Proposed fix

```diff
-    end_dt = timezone.make_aware(datetime.combine(window_end_date, time(23, 59, 59)), tz)
+    end_dt = timezone.make_aware(datetime.combine(window_end_date, time.max), tz)
```

adit/mass_transfer/processors.py (2)
80-97: Consider consolidating the duplicate path builders. `_volume_export_path` and `_volume_output_path` are identical; a single helper would reduce duplication and drift risk.

110-191: Add structured logs for task and volume state transitions. There are status transitions and per-volume outcomes, but only exception logs are emitted. Structured logs would improve traceability in long-running batch workflows. As per coding guidelines: "Use structured logging for agent actions and state transitions."

adit/mass_transfer/admin.py (1)

20-22: Consider adding a custom admin for `MassTransferVolume` for better usability. `MassTransferVolume` has many fields (pseudonym, patient_id, accession_number, study/series UIDs, status, etc.). Using the default `admin.ModelAdmin` may make it difficult to navigate and filter volumes. Consider adding `list_display`, `list_filter`, and `search_fields` similar to how `DicomJobAdmin` and `DicomTaskAdmin` are configured in `adit/core/admin.py`.

♻️ Optional: Custom admin for MassTransferVolume

```diff
+class MassTransferVolumeAdmin(admin.ModelAdmin):
+    list_display = ("id", "job", "status", "pseudonym", "modality", "study_datetime", "created")
+    list_filter = ("status", "modality", "created")
+    search_fields = ("pseudonym", "patient_id", "study_instance_uid", "series_instance_uid")
+
+
 admin.site.register(MassTransferJob, MassTransferJobAdmin)
 admin.site.register(MassTransferTask, DicomTaskAdmin)
 admin.site.register(MassTransferSettings, admin.ModelAdmin)
 admin.site.register(MassTransferFilter, admin.ModelAdmin)
-admin.site.register(MassTransferVolume, admin.ModelAdmin)
+admin.site.register(MassTransferVolume, MassTransferVolumeAdmin)
```

adit/mass_transfer/tests/test_processor.py (1)
150-155: Consider using the `tmp_path` fixture for consistency. While the static analysis warning (S108) about `/tmp/base` is a false positive here since no actual files are created, using the `tmp_path` fixture would be more consistent with the other tests in this file and avoids the linter noise.

♻️ Optional: Use tmp_path fixture

```diff
-def test_volume_output_path_uses_year_month_and_pseudonym():
-    base_dir = Path("/tmp/base")
+def test_volume_output_path_uses_year_month_and_pseudonym(tmp_path: Path):
+    base_dir = tmp_path / "base"
     study_dt = datetime(2024, 2, 15, 10, 30)
     path = _volume_output_path(base_dir, study_dt, "pseudo", "1-Head")
     assert path == base_dir / "202402" / "pseudo" / "1-Head"
```

adit/mass_transfer/forms.py (1)

122-152: Add structured logs for job/task creation. Creating partitioned tasks is a key state transition; emitting a structured event (job_id, partition_count, date range) will improve auditability and debugging. As per coding guidelines: "**/*.{js,ts,py}: Use structured logging for agent actions and state transitions."
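The two structured-logging nitpicks above could be satisfied with a small helper along these lines. The logger name and field names are illustrative, not taken from the ADIT codebase:

```python
import logging

logger = logging.getLogger("adit.mass_transfer")


def log_volume_transition(volume_id: int, old_status: str, new_status: str, **context) -> None:
    """Emit one structured record per volume state transition.

    The extra fields ride on the LogRecord itself, so a JSON formatter or
    log shipper can index them without parsing the message text.
    """
    logger.info(
        "volume status changed",
        extra={
            "volume_id": volume_id,
            "old_status": old_status,
            "new_status": new_status,
            **context,
        },
    )
```

The same shape would work for job/task creation events (job_id, partition_count, date range), keeping the message constant and the variability in fields.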
```python
def ready(self):
    register_app()

    # Put calls to db stuff in this signal handler
    post_migrate.connect(init_db, sender=self)


def register_app():
    from adit_radis_shared.common.site import MainMenuItem, register_main_menu_item

    from adit.core.site import JobStats, register_dicom_processor, register_job_stats_collector

    from .models import MassTransferJob, MassTransferTask
    from .processors import MassTransferTaskProcessor

    register_main_menu_item(
        MainMenuItem(
            url_name="mass_transfer_job_create",
            label=SECTION_NAME,
        )
    )

    register_dicom_processor(get_model_label(MassTransferTask), MassTransferTaskProcessor)

    def collect_job_stats() -> JobStats:
        counts: dict[MassTransferJob.Status, int] = {}
        for status in MassTransferJob.Status:
            counts[status] = MassTransferJob.objects.filter(status=status).count()
        return JobStats("Mass Transfer", "mass_transfer_job_list", counts)

    register_job_stats_collector(collect_job_stats)
```
🧩 Analysis chain
🏁 Script executed:

```shell
cd adit/mass_transfer && cat -n apps.py
```
🌐 Web query:
Django AppConfig.ready() autoreloader multiple calls duplicate registration pattern
💡 Result:
Django can call AppConfig.ready() more than once, so anything you do there (signal connect(), registry setup, scheduler start, etc.) must be idempotent. Django explicitly documents this (“ready might be called more than once… write idempotent methods, or put a flag…”). [1]
Why it happens (common case)
With manage.py runserver, Django’s autoreloader starts two processes (a parent “watcher” + the child that actually serves), so your startup code can run twice unless you guard it. [2]
Robust duplicate-registration patterns
1) Prefer “idempotent registration” for signals: dispatch_uid
If your issue is duplicate signal handlers, use dispatch_uid so repeated imports / repeated ready() calls don’t connect the same receiver twice. [3]
```python
# apps.py
from django.apps import AppConfig
from django.db.models.signals import post_save


class MyAppConfig(AppConfig):
    name = "myapp"

    def ready(self):
        from .signals import my_receiver
        from .models import MyModel

        post_save.connect(
            my_receiver,
            sender=MyModel,
            dispatch_uid="myapp.post_save.my_receiver",  # prevents duplicates
        )
```

2) If you truly need "run once per runserver start", guard for the autoreloader child
Typical guard:
```python
import os

from django.apps import AppConfig


class MyAppConfig(AppConfig):
    name = "myapp"

    def ready(self):
        if os.environ.get("RUN_MAIN") != "true":
            return  # skip the autoreloader parent process
        # do one-time startup work here
```

This pattern is commonly used specifically to avoid the runserver double-process behavior. [2]
A more Django-internal-focused variant discussed by Django devs is checking the autoreloader env var (DJANGO_AUTORELOAD_ENV == "true") to distinguish the “real” child process. [4]
3) Also remember: “once” is per process, not per deployment
Even if you fix runserver duplication, in production multiple worker processes will each run ready(). So anything that must be globally-singleton (cron scheduler, background thread, queue consumer) should usually live outside Django app startup (or be protected by an external lock/leader election), not just an in-process flag. This is the same underlying “idempotency” requirement Django documents for ready(). [1]
Sources:
[1] Django docs: AppConfig.ready() may run more than once; make it idempotent or guard with a flag. (docs.djangoproject.com)
[2] runserver starts two processes; common RUN_MAIN guard / --noreload discussion. (stackoverflow.com)
[3] Django signal docs: prevent duplicate registrations with dispatch_uid; notes ready() can execute more than once. (docs.djangoproject.com)
[4] Django-developers thread: DJANGO_AUTORELOAD_ENV used by the autoreloader to mark the “active” child process. (groups.google.com)
Citations:
- 1: https://docs.djangoproject.com/en/5.2/ref/applications/?utm_source=openai
- 2: https://stackoverflow.com/questions/33814615/how-to-avoid-appconfig-ready-method-running-twice-in-django?utm_source=openai
- 3: https://docs.djangoproject.com/en/4.2/topics/signals/?utm_source=openai
- 4: https://groups.google.com/g/django-developers/c/cMKq-H8Pk-c/m/w2XfubwBAgAJ?utm_source=openai
🏁 Script executed:

```shell
# Check the registration functions to see if they handle duplicates
fd -t f -name "*.py" -path "*/adit/core/*" | xargs rg -l "register_job_stats_collector|register_main_menu_item" | head -5
```

🏁 Script executed:

```shell
rg -A 10 "def register_main_menu_item" --type py
rg -A 10 "def register_job_stats_collector" --type py
rg -A 10 "def register_dicom_processor" --type py
rg "from adit_radis_shared" adit/mass_transfer/apps.py -A 5
```

🏁 Script executed:

```shell
# Check if there's any indication of adit_radis_shared package location or if it's external
fd -t f "site.py" | xargs rg -l "MainMenuItem.*register_main_menu_item" 2>/dev/null | head -3
```
🌐 Web query:
Ruff ARG001 unused argument Python error rule enforcement
💡 Result:
ARG001 is Ruff’s unused-function-argument rule (from flake8-unused-arguments). It flags function parameters that are defined but never used. If the argument is intentionally unused, rename it to match Ruff’s “dummy variable” pattern (by default, a leading underscore, e.g. _unused). [1]
Enforcing ARG001
Enable it in your config (e.g., pyproject.toml):
```toml
[tool.ruff.lint]
select = ["ARG001"]
# or enable the whole family:
# select = ["ARG"]
```

Ruff's ARG rules treat "dummy" names as intentionally unused; you can customize what counts as "dummy" via `lint.dummy-variable-rgx`. [1]

Common ways to handle/allow exceptions

- Mark the parameter as intentionally unused: `def callback(_unused, x): ...` (works because `_unused` matches the dummy-variable pattern) [1]
- Suppress in one place with noqa: `def callback(unused, x): ...  # noqa: ARG001`. Ruff supports noqa and also global/per-file ignores via configuration. [2]
- If you want to flag only certain argument names: Ruff doesn't currently offer a built-in allow/deny list of argument names for ARG rules (there's an open request). [3]

Sources: Ruff rule docs for ARG001 [1], Ruff suppression/config docs (noqa, ignores, per-file-ignores) [2], Ruff GitHub issue requesting name-based selection/ignores for ARG rules [3].
Guard against duplicate registrations in register_job_stats_collector on autoreload.
register_app() runs from AppConfig.ready(), which Django's autoreloader may call multiple times (especially during manage.py runserver). The register_job_stats_collector() function appends to a global list without deduplication, so multiple calls will create duplicate stats collectors. Consider an idempotency guard to make registration safe to re-run. (register_dicom_processor() is already safe by design since it uses dict assignment; register_main_menu_item() behavior depends on the external package.)
🔁 Suggested idempotency guard

```diff
+_APP_REGISTERED = False
+
+
 def register_app():
+    global _APP_REGISTERED
+    if _APP_REGISTERED:
+        return
+    _APP_REGISTERED = True
     from adit_radis_shared.common.site import MainMenuItem, register_main_menu_item
     from adit.core.site import JobStats, register_dicom_processor, register_job_stats_collector
     ...
```
📝 Committable suggestion (‼️ IMPORTANT: carefully review the code before committing; ensure it accurately replaces the highlighted code and test it thoroughly)

```python
def ready(self):
    register_app()

    # Put calls to db stuff in this signal handler
    post_migrate.connect(init_db, sender=self)


_APP_REGISTERED = False


def register_app():
    global _APP_REGISTERED
    if _APP_REGISTERED:
        return
    _APP_REGISTERED = True

    from adit_radis_shared.common.site import MainMenuItem, register_main_menu_item

    from adit.core.site import JobStats, register_dicom_processor, register_job_stats_collector

    from .models import MassTransferJob, MassTransferTask
    from .processors import MassTransferTaskProcessor

    register_main_menu_item(
        MainMenuItem(
            url_name="mass_transfer_job_create",
            label=SECTION_NAME,
        )
    )

    register_dicom_processor(get_model_label(MassTransferTask), MassTransferTaskProcessor)

    def collect_job_stats() -> JobStats:
        counts: dict[MassTransferJob.Status, int] = {}
        for status in MassTransferJob.Status:
            counts[status] = MassTransferJob.objects.filter(status=status).count()
        return JobStats("Mass Transfer", "mass_transfer_job_list", counts)

    register_job_stats_collector(collect_job_stats)
```
🤖 Prompt for AI Agents
In `@adit/mass_transfer/apps.py` around lines 12 - 42, The register_app() function
registers a job-stats collector each time AppConfig.ready() runs, causing
duplicates on Django autoreload; make register_job_stats_collector idempotent by
checking for prior registration before appending: e.g., when registering
collect_job_stats from register_app(), detect an existing identical collector
(by function identity or a unique name/key) and skip re-registration, or mark
the function with an attribute (e.g., __registered__) after successful
registration; update register_app()/collect_job_stats usage to perform that
check so multiple calls from ready() are no-ops.
```python
    register_job_stats_collector(collect_job_stats)


def init_db(**kwargs):
```
Avoid Ruff ARG001 for unused signal kwargs.
Rename the unused kwargs to a leading-underscore name (or add a noqa) to keep lint clean while preserving the signal handler signature.
🧹 Minimal fix

```diff
-def init_db(**kwargs):
+def init_db(**_kwargs):
```
🧰 Tools
🪛 Ruff (0.14.14)
[warning] 45-45: Unused function argument: kwargs
(ARG001)
🤖 Prompt for AI Agents
In `@adit/mass_transfer/apps.py` at line 45, The signal handler init_db currently
declares an unused **kwargs which triggers Ruff ARG001; rename that parameter to
a leading-underscore form (e.g., **_kwargs) so the handler signature remains
compatible with Django signals but linters treat it as intentionally unused;
update any internal references if present (none expected) and run the linter to
verify the warning is gone.
```python
def _find_studies(
    self,
    operator: DicomOperator,
    mf: MassTransferFilter,
    start: datetime,
    end: datetime,
) -> list[ResultDataset]:
    max_results = settings.MASS_TRANSFER_MAX_SEARCH_RESULTS

    query = QueryDataset.create(
        StudyDate=(start.date(), end.date()),
        StudyTime=(start.time(), end.time()),
    )

    if mf.modality:
        query.dataset.ModalitiesInStudy = mf.modality
    if mf.study_description:
        query.dataset.StudyDescription = mf.study_description

    studies = list(operator.find_studies(query, limit_results=max_results + 1))

    if len(studies) > max_results:
        if end - start < _MIN_SPLIT_WINDOW:
            raise DicomError(
                f"Time window too small ({start} to {end}) for filter {mf}."
            )

        mid = start + (end - start) / 2
        return self._find_studies(operator, mf, start, mid) + self._find_studies(
            operator, mf, mid, end
        )

    return studies
```
Fix StudyTime filtering when splitting multi-day windows.
When the search window is split and spans multiple days, applying StudyTime=(start.time(), end.time()) to every date can exclude studies on non-boundary days. This can cause missing volumes once the query is recursively split. Consider only applying the StudyTime range when the window is a single day.
🛠️ Proposed fix
```diff
-    query = QueryDataset.create(
-        StudyDate=(start.date(), end.date()),
-        StudyTime=(start.time(), end.time()),
-    )
+    query_kwargs = {"StudyDate": (start.date(), end.date())}
+    if start.date() == end.date():
+        query_kwargs["StudyTime"] = (start.time(), end.time())
+    query = QueryDataset.create(**query_kwargs)
```

🧰 Tools
🪛 Ruff (0.14.14)
[warning] 326-328: Avoid specifying long messages outside the exception class
(TRY003)
🤖 Prompt for AI Agents
In `@adit/mass_transfer/processors.py` around lines 303 - 335, In _find_studies,
the StudyTime range is being applied for every date when splitting multi-day
windows, which can exclude studies on intermediate days; change the
QueryDataset.create call so StudyTime is only set when start.date() ==
end.date() (i.e., the window is a single day), while always setting
StudyDate=(start.date(), end.date()), and keep the existing logic that adds
query.dataset.ModalitiesInStudy and StudyDescription using mf.modality and
mf.study_description; ensure recursive splits continue to pass correct start/end
so StudyTime will be applied only for single-day queries.
```html
<dt class="col-sm-3">Filters</dt>
<dd class="col-sm-9">
  {% if job.filters.all %}
    {% for f in job.filters.all %}
      <div>{{ f }}</div>
    {% empty %}
      —
    {% endfor %}
  {% endif %}
```
Render an empty-state placeholder for filters.
{% if job.filters.all %} prevents the {% empty %} branch from ever showing, so jobs without filters render a blank cell. Remove the outer if or add an explicit else.
🛠️ Proposed fix
```diff
-                <dd class="col-sm-9">
-                    {% if job.filters.all %}
-                        {% for f in job.filters.all %}
-                            <div>{{ f }}</div>
-                        {% empty %}
-                            —
-                        {% endfor %}
-                    {% endif %}
-                </dd>
+                <dd class="col-sm-9">
+                    {% for f in job.filters.all %}
+                        <div>{{ f }}</div>
+                    {% empty %}
+                        —
+                    {% endfor %}
+                </dd>
```

🤖 Prompt for AI Agents
In `@adit/mass_transfer/templates/mass_transfer/mass_transfer_job_detail.html`
around lines 40 - 48, The template uses an outer conditional ({% if
job.filters.all %}) which prevents the {% empty %} clause from ever running;
remove that outer if around the for-loop (or replace it with an explicit else)
so the for f in job.filters.all ... {% empty %} — {% endfor %} executes for
empty filter sets and renders the placeholder; update the block around
job.filters.all accordingly.
Pull Request Review: Mass Transfer AppOverviewThis PR introduces a comprehensive Mass Transfer facility for bulk DICOM export with NIfTI conversion, implementing time-partitioned processing and reusable filter management. The implementation follows ADITs architectural patterns and integrates well with the existing job/task system. Positive Aspects
Critical Issues1. Hardcoded NIfTI Conversion (adit/mass_transfer/forms.py:151-152)The form hardcodes convert_to_nifti = True but the DicomJob model may not have this field. This could cause runtime errors. Recommendation: Verify the field exists in the base model, or add proper validation. 2. Security: Command Injection Risk (adit/mass_transfer/processors.py:427-438)While using a list avoids shell injection, series_name uses sanitize_filename(). Verify that this properly escapes all dangerous characters for command-line use. Recommendation: Add explicit validation or use a safer filename generation approach. Consider setting check=True in subprocess.run. 3. Resource Leaks: Incomplete Cleanup (adit/mass_transfer/processors.py:450-465)Silent failure on cleanup means disk space could accumulate. The " (cleaned)" suffix trick is fragile. Recommendation: Add periodic cleanup job, log cleanup failures at WARNING level, consider using temporary directories. 4. Performance: Potential N+1 Query (adit/mass_transfer/processors.py:208-306)For each filter, for each study, there is a separate DICOM C-FIND query. With multiple filters this could query the same studies repeatedly. Recommendation: Consider caching study results or deduplicating queries before hitting the PACS server. 5. Data Integrity: Race Condition (adit/mass_transfer/processors.py:269-287)If the volume already exists, all fields are blindly overwritten. Concurrent processing could cause lost updates or inconsistent state. Recommendation: Add locking or check the volumes status before updating fields. Minor Issues
Security Considerations
Performance Considerations
Recommendations Summary

Must Fix Before Merge:
Should Fix Before Merge:

Overall Assessment

This is a well-designed and valuable feature that properly extends ADIT's architecture. The code quality is generally good, with proper testing and integration. However, several issues around field validation, resource management, security, and performance should be addressed before merging. With the recommended fixes, this will be a solid addition to ADIT. Great work on maintaining consistency with existing patterns!
Actionable comments posted: 2
🤖 Fix all issues with AI agents
In `@adit/mass_transfer/forms.py`:
- Around line 149-161: The current save stores a bound method on self.save_tasks
when commit=False which is unconventional; instead assign a no-arg closure to
the standard save_m2m hook or document the pattern. Concretely, after job =
super().save(commit=False) and setting job fields, if commit is False set
self.save_m2m = lambda: self._save_tasks(job) (or alternatively set
self._pending_job = job and override save_m2m to call
self._save_tasks(self._pending_job)); this keeps the callback signature
consistent with Django and avoids requiring callers to call
form.save_tasks(job). Ensure changes reference the save method, self.save_m2m,
and _save_tasks.
🧹 Nitpick comments (2)
adit/mass_transfer/processors.py (2)
80-97: Consider consolidating identical path helpers.
_volume_export_path and _volume_output_path have identical implementations. If this duplication is intentional to allow divergent paths in the future, consider adding a comment. Otherwise, a single helper could be used.

♻️ Optional: Consolidate into a single helper
```diff
-def _volume_export_path(
-    base_dir: Path,
-    study_dt: datetime,
-    subject_id: str,
-    series_name: str,
-) -> Path:
-    year_month = study_dt.strftime("%Y%m")
-    return base_dir / year_month / subject_id / series_name
-
-
-def _volume_output_path(
+def _volume_path(
     base_dir: Path,
     study_dt: datetime,
     subject_id: str,
     series_name: str,
 ) -> Path:
+    """Construct path for volume export or output directories."""
     year_month = study_dt.strftime("%Y%m")
     return base_dir / year_month / subject_id / series_name
```
269-302: Consider using update_or_create for volume records.

The current pattern of get_or_create followed by manual field updates on existing records could be simplified with update_or_create. This would reduce code duplication and make the intent clearer.

♻️ Optional: Use update_or_create
```diff
-        volume, created = MassTransferVolume.objects.get_or_create(
+        defaults = {
+            "partition_key": self.mass_task.partition_key,
+            "patient_id": str(study.PatientID),
+            "accession_number": str(study.get("AccessionNumber", "")),
+            "study_instance_uid": str(study.StudyInstanceUID),
+            "modality": str(series.Modality),
+            "study_description": str(study.get("StudyDescription", "")),
+            "series_description": str(series.get("SeriesDescription", "")),
+            "series_number": series_number,
+            "study_datetime": timezone.make_aware(study_dt),
+            "institution_name": str(series.get("InstitutionName", "")),
+            "number_of_images": _parse_int(
+                series.get("NumberOfSeriesRelatedInstances"), default=0
+            ),
+        }
+        volume, _ = MassTransferVolume.objects.update_or_create(
             job=job,
             series_instance_uid=series_uid,
-            defaults={
-                "partition_key": self.mass_task.partition_key,
-                ...
-            },
+            defaults=defaults,
         )
-        if not created:
-            volume.partition_key = self.mass_task.partition_key
-            ...
-            volume.save()
```
```python
    def save(self, commit: bool = True):
        job = super().save(commit=False)
        # Mass transfer always converts to NIfTI
        job.convert_to_nifti = True
        job.urgent = False

        if commit:
            job.save()
            self.save_m2m()
            self._save_tasks(job)
        else:
            self.save_tasks = self._save_tasks
```
🧩 Analysis chain
🏁 Script executed:
rg -n "MassTransferJobForm" --type=py -A 15 adit/mass_transfer/views.py

Repository: openradx/adit
Length of output: 1467
🏁 Script executed:
# Check the _save_tasks method signature
rg -n "_save_tasks" --type=py -B2 -A8 adit/mass_transfer/forms.py
Length of output: 601
🏁 Script executed:
# Search for any calls to save_tasks after save(commit=False)
rg -n "save\(commit=False\)" --type=py -A5 adit/mass_transfer/
Length of output: 421
🏁 Script executed:
# Look for other forms that use similar callback patterns
rg -n "self\.[a-z_]+ = self\._[a-z_]+" --type=py adit/
Length of output: 379
🏁 Script executed:
# Check MassTransferJobCreateView for form_valid or custom save logic
rg -n "MassTransferJobCreateView" --type=py -A 30 adit/mass_transfer/views.py
Length of output: 1362
🏁 Script executed:
# Search for calls to form.save_tasks anywhere
rg -n "form\.save_tasks\(" --type=py
Length of output: 39
🏁 Script executed:
# Check if there's a form_valid method in the view
rg -n "def form_valid" --type=py -B3 -A8 adit/mass_transfer/views.py
Length of output: 455
🏁 Script executed:
# Find DicomJobCreateView and check how it handles form.save_tasks
rg -n "class DicomJobCreateView" --type=py -A 30 adit/
Length of output: 1633
🏁 Script executed:
# Also check how form_valid is implemented in the parent class
rg -n "def form_valid" --type=py -B2 -A 15 adit/core/ | head -100
Length of output: 3171
Document the save_tasks callback pattern or use standard Django save_m2m approach.
When commit=False, self.save_tasks stores the bound method self._save_tasks. The caller must explicitly invoke form.save_tasks(job) after saving the form. This pattern is unconventional for Django forms (where save_m2m takes no arguments). Clarify in documentation or tests how external code should handle this, or consider storing a closure or using Django's standard save_m2m hook instead.
🤖 Prompt for AI Agents
In `@adit/mass_transfer/forms.py` around lines 149 - 161, The current save stores
a bound method on self.save_tasks when commit=False which is unconventional;
instead assign a no-arg closure to the standard save_m2m hook or document the
pattern. Concretely, after job = super().save(commit=False) and setting job
fields, if commit is False set self.save_m2m = lambda: self._save_tasks(job) (or
alternatively set self._pending_job = job and override save_m2m to call
self._save_tasks(self._pending_job)); this keeps the callback signature
consistent with Django and avoids requiring callers to call
form.save_tasks(job). Ensure changes reference the save method, self.save_m2m,
and _save_tasks.
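The chaining the prompt describes can be illustrated without Django at all. The toy class below (every name is a stand-in for the real MassTransferJobForm, not the project's code) shows how wrapping the existing save_m2m callback in a closure keeps Django's no-argument hook signature while still deferring task creation:

```python
class SketchForm:
    """Toy stand-in for a Django ModelForm; demonstrates only the
    closure pattern suggested in the review, not the real code."""

    def __init__(self):
        self.calls = []
        # Django assigns save_m2m during save(commit=False); we fake it here.
        self.save_m2m = lambda: self.calls.append("save_m2m")

    def _save_tasks(self, job):
        self.calls.append(f"tasks:{job}")

    def save(self, commit=True):
        job = "job-1"  # stands in for super().save(commit=False)
        if commit:
            self.save_m2m()
            self._save_tasks(job)
        else:
            original_save_m2m = self.save_m2m

            def save_m2m():
                # Keep the no-arg signature; `job` is captured in the closure.
                original_save_m2m()
                self._save_tasks(job)

            self.save_m2m = save_m2m
        return job
```

A caller can then run form.save(commit=False) followed by the usual form.save_m2m(), and the tasks are created without any extra argument or custom save_tasks attribute.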
adit/mass_transfer/processors.py
Outdated
```python
        cmd = [
            "dcm2niix",
            "-z",
            "y",
            "-o",
            str(output_path),
            "-f",
            series_name,
            str(volume.exported_folder),
        ]

        result = subprocess.run(cmd, check=False, capture_output=True, text=True)
        if result.returncode != 0:
            raise DicomError(
                f"Conversion failed for series {volume.series_instance_uid}: {result.stderr}"
            )

        volume.converted_file = str(output_path / f"{series_name}.nii.gz")
```
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# First, examine the full _convert_volume method to understand the context
cat -n adit/mass_transfer/processors.py | sed -n '403,448p'
Length of output: 1873
🏁 Script executed:
#!/bin/bash
# Search for any existing dcm2niix or nii.gz handling patterns in the codebase
rg -n "dcm2niix|nii\.gz|converted_file" --type=py -C2
Length of output: 5423
🏁 Script executed:
#!/bin/bash
# Check if there are any tests related to dcm2niix output handling
fd -type f -name "*.py" | xargs grep -l "dcm2niix" | head -20
Length of output: 227
🏁 Script executed:
#!/bin/bash
# Examine the DicomToNiftiConverter class
cat -n adit/core/utils/dicom_to_nifti_converter.py
Length of output: 2228
🏁 Script executed:
#!/bin/bash
# Check if there are any tests for the _convert_volume method specifically
fd -type f -name "*.py" -path "*/test*" | xargs grep -l "_convert_volume\|mass_transfer.*test" | head -10
Length of output: 227
🏁 Script executed:
#!/bin/bash
# Search for any file existence checks or output validation after dcm2niix calls
rg -n "nii\.gz.*exist|Path.*exist|converted_file" --type py -A3 -B3
Length of output: 2514
🏁 Script executed:
#!/bin/bash
# Check the MassTransferVolume model to understand series-to-volume mapping
rg -n "class MassTransferVolume|series.*=" adit/mass_transfer/models.py | head -30
Length of output: 474
🏁 Script executed:
#!/bin/bash
# Look at how volumes are created from DICOM series - check the export process
rg -n "_export_series|_create.*volume" adit/mass_transfer/processors.py -A5 | head -50
Length of output: 39
🏁 Script executed:
#!/bin/bash
# Search for mass_transfer specific tests
fd -type f -name "*test*.py" adit/mass_transfer/ | head -10
Length of output: 227
🏁 Script executed:
#!/bin/bash
# Check if there are mass_transfer acceptance tests that validate the converted_file assumption
find adit/mass_transfer -name "*test*.py" -type f -exec cat {} \;
Length of output: 7993
🏁 Script executed:
#!/bin/bash
# Check the batch_transfer test to understand if ROI1 is from separate series or same series
grep -n "ROI1\|Topogramm" adit/batch_transfer/tests/acceptance/test_batch_transfer.py -B5 -A10
Length of output: 699
🏁 Script executed:
#!/bin/bash
# Search for how dcm2niix -f flag behaves - check for any documentation or comments
rg -n "\-f.*format|\-f.*output|dcm2niix.*behavior" --type py -A2 -B2
Length of output: 39
🏁 Script executed:
#!/bin/bash
# Check what batch_transfer actually uses - does it use the same _convert_volume code?
rg -n "batch_transfer.*processors|batch_transfer.*convert" adit/batch_transfer/ -C2 | head -50
Length of output: 2462
🏁 Script executed:
#!/bin/bash
# Check if there's a test for mass_transfer that actually calls _convert_volume with real dcm2niix
rg -n "test.*convert_volume|_convert_volume" adit/mass_transfer/tests/ -C3
Length of output: 1216
🏁 Script executed:
#!/bin/bash
# Get the full batch_transfer test file to understand the setup
wc -l adit/batch_transfer/tests/acceptance/test_batch_transfer.py
Length of output: 120
🏁 Script executed:
#!/bin/bash
# Look at the first part of batch_transfer test to see how volumes/series are set up
head -n 200 adit/batch_transfer/tests/acceptance/test_batch_transfer.py
Length of output: 6880
🏁 Script executed:
#!/bin/bash
# Check how batch_transfer actually creates/manages volumes and their conversion
rg -n "class.*Processor|def.*convert|MassTransferVolume\|BatchTransferVolume" adit/batch_transfer/ --type py -A3 | head -80
Length of output: 2094
🏁 Script executed:
#!/bin/bash
# Check if batch_transfer uses the same _convert_volume method or different code
rg -n "from adit.mass_transfer\|import.*mass_transfer\|DicomToNiftiConverter" adit/batch_transfer/ --type py
Length of output: 39
🏁 Script executed:
#!/bin/bash
# Find TransferTaskProcessor to see the common base class
rg -n "class TransferTaskProcessor" --type py -A20
Length of output: 1490
🏁 Script executed:
#!/bin/bash
# Check the batch_transfer processors.py file to see what it actually does
head -n 100 adit/batch_transfer/processors.py
Length of output: 541
🏁 Script executed:
#!/bin/bash
# Search for where the actual NIfTI conversion happens - look for DicomToNiftiConverter usage
rg -n "DicomToNiftiConverter\|dcm2niix" adit/ --type py | grep -v "test\|migration"
Length of output: 39
🏁 Script executed:
#!/bin/bash
# Check if there's a convert_to_nifti function or similar in batch_transfer or core
rg -n "def.*convert.*nifti\|def.*convert_to_nifti" adit/ --type py -A5
Length of output: 39
🏁 Script executed:
#!/bin/bash
# Search for where batch transfer actually implements NIfTI conversion - check views or tasks
rg -n "nifti\|NIfTI\|convert" adit/batch_transfer/ --type py -i | grep -v test | head -30
Length of output: 39
🏁 Script executed:
#!/bin/bash
# Look for how batch_transfer actually uses the conversion - check if it has its own implementation
find adit/batch_transfer -name "*.py" -exec grep -l "subprocess\|convert" {} \; | head -10
Length of output: 421
🏁 Script executed:
#!/bin/bash
# Check the batch_transfer views to see how convert_to_nifti is handled
rg -n "convert_to_nifti\|BATCH_TRANSFER_CONVERT_TO_NIFTI" adit/batch_transfer/views.py -A10 -B5
Length of output: 39
🏁 Script executed:
#!/bin/bash
# Check the migration that adds convert_to_nifti field
cat adit/batch_transfer/migrations/0030_batchtransferjob_convert_to_nifti.py
Length of output: 473
🏁 Script executed:
#!/bin/bash
# Search for any Celery tasks or background processing that might do the actual conversion
rg -n "@.*task|@shared_task|def.*task" adit/batch_transfer/ --type py -A3 | head -50
Length of output: 3693
🏁 Script executed:
#!/bin/bash
# Search for procrastinate tasks or job processing functions
rg -n "@app.task|register_task|def.*nifti" adit/ --type py -A3 | grep -v test | head -80
Length of output: 1271
🏁 Script executed:
#!/bin/bash
# Look for any process_job or similar function that might trigger conversion
rg -n "def process|def run|convert_to_nifti.*True" adit/batch_transfer/ --type py -B2 -A10 | head -100
Length of output: 39
🏁 Script executed:
#!/bin/bash
# Examine the _transfer_to_nifti method fully
sed -n '136,200p' adit/core/processors.py
Length of output: 3152
🏁 Script executed:
#!/bin/bash
# Check where _transfer_to_nifti is called from
rg -n "_transfer_to_nifti" adit/ --type py -B5 -A2
Length of output: 1200
🏁 Script executed:
#!/bin/bash
# Check how converted_file is used after being stored - is it validated/checked?
rg -n "converted_file" adit/mass_transfer/ --type py -A3 -B3
Length of output: 2514
Verify file existence after dcm2niix conversion and handle potential multiple outputs.
The code assumes dcm2niix produces exactly one file named {series_name}.nii.gz without validating its existence. However, dcm2niix can produce multiple outputs from a single series with variant suffixes (e.g., ROI reconstructions), though only one file path is stored. Consider either:
- Validating that the expected file exists after conversion
- Scanning the output directory to handle cases where dcm2niix generates multiple outputs per series
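A minimal sketch of the first option, assuming the converter writes files matching the -f prefix into output_path (the function and exception names below are illustrative, not the PR's API):

```python
from pathlib import Path


class ConversionError(Exception):
    pass


def collect_nifti_outputs(output_path: Path, series_name: str) -> list[Path]:
    """Validate that dcm2niix produced at least one NIfTI file for the
    series and pick up any variant outputs it may emit alongside the
    primary one (e.g. `<series>_ROI1.nii.gz`)."""
    outputs = sorted(output_path.glob(f"{series_name}*.nii.gz"))
    if not outputs:
        raise ConversionError(
            f"dcm2niix produced no NIfTI output for {series_name!r} in {output_path}"
        )
    return outputs
```

Storing the whole list (or at least asserting the expected primary file is in it) would make the "exactly one output" assumption explicit instead of silent.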
🧰 Tools
🪛 Ruff (0.14.14)
[error] 438-438: subprocess call: check for execution of untrusted input
(S603)
[warning] 440-442: Avoid specifying long messages outside the exception class
(TRY003)
Pull Request Review: Mass Transfer Feature

This PR introduces a comprehensive mass transfer facility. Overall this is well implemented and follows ADIT's patterns. Here are my findings:

Critical Issues
Important Issues
Security Review

✅ User access validated for nodes

Performance

✅ Dedicated worker prevents blocking

Test Coverage

✅ Good unit tests for core logic

Recommendation

Approve with changes requested. This is solid work that follows project standards. The critical issues are straightforward fixes that don't require architectural changes. The mass transfer feature will be a valuable addition to ADIT!
Pull Request Review: Mass Transfer Feature

I've completed a comprehensive review of PR #298, which introduces a new Mass Transfer facility for bulk DICOM export and conversion. This is a substantial feature addition (~2,300 lines) that follows the existing ADIT architecture patterns well.

✅ Strengths

Architecture & Design
Code Quality
Testing
🔴 Critical Issues

1. Command Injection Vulnerability (adit/mass_transfer/processors.py:1455-1466)

```python
cmd = [
    "dcm2niix",
    "-z", "y",
    "-o", str(output_path),
    "-f", series_name,  # ⚠️ series_name is derived from series_description
    str(volume.exported_folder),
]
result = subprocess.run(cmd, check=False, capture_output=True, text=True)
```

While the command is passed as an argument list (no shell involved), series_name is derived from DICOM metadata. Recommendation:
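One way to implement the recommendation, sketched under the assumption that a strict allow-list is acceptable for the -f value (both helper names here are hypothetical, not part of the PR):

```python
import re
from pathlib import Path


def safe_series_name(raw: str, max_len: int = 64) -> str:
    """Reduce a series description to a conservative character set so it
    is safe both as a filename and as a dcm2niix -f argument; leading
    dashes that could be parsed as flags are stripped as well."""
    name = re.sub(r"[^A-Za-z0-9._-]+", "_", raw).strip("._-")
    return name[:max_len] or "series"


def ensure_within(base: Path, candidate: Path) -> Path:
    """Reject any path that resolves outside the export base directory."""
    resolved = candidate.resolve()
    if not resolved.is_relative_to(base.resolve()):
        raise ValueError(f"{candidate} escapes {base}")
    return resolved
```

Running both checks before building cmd would make the "sanitize_filename is enough" assumption explicit and testable.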
2. Race Condition in Pseudonym Assignment (adit/mass_transfer/processors.py:1176-1184)

The pseudonymization logic has a potential race condition:

```python
for _, study_volumes in volumes_by_study.items():
    pseudonym = ""
    if job.pseudonymize:
        existing_pseudonym = next(
            (v.pseudonym for v in study_volumes if v.pseudonym),
            None,
        )
        pseudonym = existing_pseudonym or uuid.uuid4().hex  # ⚠️ Not atomic
```

If multiple workers process different partitions of the same job concurrently, two volumes from the same study in different partitions could receive different pseudonyms. Recommendation:
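An alternative that removes the need for cross-worker coordination entirely is to derive the pseudonym deterministically with a keyed hash, so every partition computes the same value for a given patient. This is a sketch assuming a per-job secret is acceptable (the PR currently draws uuid4 values):

```python
import hashlib
import hmac


def stable_pseudonym(job_secret: bytes, patient_id: str) -> str:
    """Deterministic pseudonym: concurrent workers always agree, so no
    locking or unique-constraint retry logic is needed. The keyed HMAC
    prevents re-identification by anyone without the job secret."""
    digest = hmac.new(job_secret, patient_id.encode(), hashlib.sha256)
    return digest.hexdigest()[:32]
```

If random pseudonyms must be kept, the usual fix is a unique constraint on (job, patient) plus select_for_update or a retry on IntegrityError.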
3. Incomplete Cleanup on Failure (adit/core/tasks.py:111-147)

The cleanup loop skips converted volumes:

```python
for volume in volumes:
    if volume.status == MassTransferVolume.Status.CONVERTED:
        continue  # ⚠️ Skips cleanup for converted volumes
```

This could leave DICOM files in the export directory even after conversion completes successfully. Recommendation:
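A sketch of the recommended direction: delete the export folder explicitly once it is no longer needed, and surface failures at WARNING instead of swallowing them (names are illustrative, not the PR's API):

```python
import logging
import shutil
from pathlib import Path

logger = logging.getLogger(__name__)


def cleanup_export(exported_folder: Path) -> bool:
    """Remove the exported DICOM directory; return False (and log a
    warning) on failure so a periodic sweeper job can retry later."""
    try:
        shutil.rmtree(exported_folder)
        return True
    except OSError as err:
        logger.warning("Cleanup failed for %s: %s", exported_folder, err)
        return False
```

Calling this for converted volumes as well (after verifying the NIfTI output exists) would close the gap the review describes.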
Actionable comments posted: 2
🤖 Fix all issues with AI agents
In `@adit/mass_transfer/forms.py`:
- Around line 96-125: The Meta inner class in adit/mass_transfer/forms.py
declares mutable class attributes labels, help_texts, and widgets; annotate each
with typing.ClassVar to indicate they are class-level constants and satisfy Ruff
RUF012. Import ClassVar from typing if not present, and add ClassVar[...] type
annotations on Meta.labels, Meta.help_texts and Meta.widgets (using appropriate
dict types such as ClassVar[dict[str, str]] or ClassVar[dict[str, Any]] for
widgets) while leaving values unchanged.
- Around line 19-78: Annotate the mutable class attributes to satisfy RUF012 by
marking MODALITY_CHOICES as a ClassVar[List[Tuple[str, str]]] and the
Meta.labels mapping as a ClassVar[Dict[str, str]] (or the appropriate mapping
type) and add the necessary imports from typing (ClassVar, List, Tuple, Dict) at
the top; specifically update the declaration of MODALITY_CHOICES and the inner
Meta.labels in the form class (referencing MODALITY_CHOICES and Meta.labels) so
they are typed as ClassVar to avoid shared-mutable-class-attribute lint errors.
🧹 Nitpick comments (5)
adit/core/tasks.py (2)
74-77: Consider narrowing the exception type for import failure.

The bare Exception catch is flagged by Ruff (BLE001). For import failures, ImportError (which includes ModuleNotFoundError) would be more appropriate and explicit about the expected failure mode.

🔧 Suggested fix

```diff
         try:
             from adit.mass_transfer.models import MassTransferVolume
-        except Exception:
+        except ImportError:
             return
```
94-97: Cleanup failure leaves volume in potentially confusing state.

When cleanup fails (lines 94-97), the volume retains its original status (e.g., EXPORTED) with exported_folder still populated, but the folder may be in an inconsistent state. Consider setting the status to ERROR regardless, so downstream logic doesn't attempt to process it.

🔧 Suggested fix

```diff
             except Exception as err:
                 volume.add_log(f"Cleanup failed: {err}")
+                volume.status = MassTransferVolume.Status.ERROR
                 volume.save()
                 continue
```

adit/mass_transfer/tests/test_cleanup.py (2)
24-56: Consider using factory-boy for mass transfer model creation.

The test mixes factory-boy (UserFactory, DicomServerFactory, DicomFolderFactory) with direct objects.create() calls for mass transfer models. For consistency and maintainability, consider creating MassTransferJobFactory, MassTransferTaskFactory, and MassTransferVolumeFactory. As per coding guidelines, "Use factory-boy for generating test data in test files."
17-18: Consider adding edge case tests.

This test covers the primary success path. For more robust coverage, consider adding tests for:

- Volume with CONVERTED status (should be skipped per lines 85-86 in tasks.py)
- Non-matching partition_key (cleanup should not affect it)
- Non-existent export directory (should be handled gracefully)

```shell
#!/bin/bash
# Check if there are other cleanup-related tests in the mass_transfer app
fd -t f 'test*.py' adit/mass_transfer/tests/ --exec grep -l 'cleanup' {} \;
```

adit/mass_transfer/forms.py (1)
178-198: Consider populating self.tasks or removing it.

self.tasks is initialized but never set, so callers will always see an empty list. If it's intended for downstream use, assign it after bulk_create; otherwise, drop the attribute to avoid confusion.

♻️ Suggested fix

```diff
         MassTransferTask.objects.bulk_create(tasks)
+        self.tasks = tasks
```
```python
    MODALITY_CHOICES = [
        ("", "Any modality"),
        ("CT", "CT"),
        ("MR", "MR"),
        ("XR", "XR"),
        ("US", "US"),
        ("NM", "NM"),
        ("PT", "PT"),
        ("MG", "MG"),
        ("CR", "CR"),
        ("DX", "DX"),
        ("RF", "RF"),
        ("XA", "XA"),
        ("OT", "OT"),
        ("SR", "SR"),
        ("PR", "PR"),
        ("ECG", "ECG"),
        ("SEG", "SEG"),
        ("RTSTRUCT", "RTSTRUCT"),
        ("RTPLAN", "RTPLAN"),
        ("RTDOSE", "RTDOSE"),
        ("RTIMAGE", "RTIMAGE"),
        ("SM", "SM"),
        ("IVUS", "IVUS"),
        ("OCT", "OCT"),
        ("ES", "ES"),
        ("OP", "OP"),
        ("IO", "IO"),
        ("FA", "FA"),
        ("RG", "RG"),
        ("MS", "MS"),
        ("DOC", "DOC"),
    ]

    modality = forms.ChoiceField(
        required=False,
        choices=MODALITY_CHOICES,
        help_text="Leave blank for any modality.",
    )

    class Meta:
        model = MassTransferFilter
        fields = (
            "name",
            "modality",
            "institution_name",
            "apply_institution_on_study",
            "study_description",
            "series_description",
            "series_number",
        )
        labels = {
            "name": "Filter name",
            "modality": "Modality",
            "institution_name": "Institution name",
            "apply_institution_on_study": "Apply institution filter on study",
            "study_description": "Study description",
            "series_description": "Series description",
            "series_number": "Series number",
        }
```
Annotate mutable class attributes to satisfy RUF012.
Ruff flags MODALITY_CHOICES and Meta.labels as mutable class attributes. Add ClassVar annotations to avoid lint failures and accidental shared mutation.
🔧 Proposed fix
-from typing import cast
+from typing import ClassVar, cast
@@
-class MassTransferFilterForm(forms.ModelForm):
- MODALITY_CHOICES = [
+class MassTransferFilterForm(forms.ModelForm):
+ MODALITY_CHOICES: ClassVar[list[tuple[str, str]]] = [
@@
- labels = {
+ labels: ClassVar[dict[str, str]] = {
"name": "Filter name",
"modality": "Modality",
"institution_name": "Institution name",
"apply_institution_on_study": "Apply institution filter on study",
"study_description": "Study description",
"series_description": "Series description",
"series_number": "Series number",
}🧰 Tools
🪛 Ruff (0.14.14)
[warning] 19-51: Mutable class attributes should be annotated with typing.ClassVar
(RUF012)
[warning] 70-78: Mutable class attributes should be annotated with typing.ClassVar
(RUF012)
🤖 Prompt for AI Agents
In `@adit/mass_transfer/forms.py` around lines 19 - 78, Annotate the mutable class
attributes to satisfy RUF012 by marking MODALITY_CHOICES as a
ClassVar[List[Tuple[str, str]]] and the Meta.labels mapping as a
ClassVar[Dict[str, str]] (or the appropriate mapping type) and add the necessary
imports from typing (ClassVar, List, Tuple, Dict) at the top; specifically
update the declaration of MODALITY_CHOICES and the inner Meta.labels in the form
class (referencing MODALITY_CHOICES and Meta.labels) so they are typed as
ClassVar to avoid shared-mutable-class-attribute lint errors.
```python
    class Meta:
        model = MassTransferJob
        fields = (
            "source",
            "destination",
            "start_date",
            "end_date",
            "partition_granularity",
            "filters",
            "pseudonymize",
            "send_finished_mail",
        )
        labels = {
            "start_date": "Start date",
            "end_date": "End date",
            "partition_granularity": "Partition granularity",
            "pseudonymize": "Pseudonymize data",
            "send_finished_mail": "Send Email when job is finished",
        }
        help_texts = {
            "partition_granularity": "Daily or weekly partition windows.",
            "pseudonymize": (
                "When disabled, patient identifiers are preserved and output folders use "
                "Patient ID."
            ),
        }
        widgets = {
            "start_date": forms.DateInput(attrs={"type": "date"}),
            "end_date": forms.DateInput(attrs={"type": "date"}),
        }
```
Add ClassVar annotations for Meta mappings (RUF012).
labels, help_texts, and widgets are mutable class attributes. Annotate with ClassVar to satisfy Ruff and clarify intent.
🔧 Proposed fix
class Meta:
@@
- labels = {
+ labels: ClassVar[dict[str, str]] = {
"start_date": "Start date",
"end_date": "End date",
"partition_granularity": "Partition granularity",
"pseudonymize": "Pseudonymize data",
"send_finished_mail": "Send Email when job is finished",
}
- help_texts = {
+ help_texts: ClassVar[dict[str, str]] = {
"partition_granularity": "Daily or weekly partition windows.",
"pseudonymize": (
"When disabled, patient identifiers are preserved and output folders use "
"Patient ID."
),
}
- widgets = {
+ widgets: ClassVar[dict[str, forms.DateInput]] = {
"start_date": forms.DateInput(attrs={"type": "date"}),
"end_date": forms.DateInput(attrs={"type": "date"}),
}🧰 Tools
🪛 Ruff (0.14.14)
[warning] 108-114: Mutable class attributes should be annotated with typing.ClassVar
(RUF012)
[warning] 115-121: Mutable class attributes should be annotated with typing.ClassVar
(RUF012)
[warning] 122-125: Mutable class attributes should be annotated with typing.ClassVar
(RUF012)
🤖 Prompt for AI Agents
In `@adit/mass_transfer/forms.py` around lines 96 - 125, The Meta inner class in
adit/mass_transfer/forms.py declares mutable class attributes labels,
help_texts, and widgets; annotate each with typing.ClassVar to indicate they are
class-level constants and satisfy Ruff RUF012. Import ClassVar from typing if
not present, and add ClassVar[...] type annotations on Meta.labels,
Meta.help_texts and Meta.widgets (using appropriate dict types such as
ClassVar[dict[str, str]] or ClassVar[dict[str, Any]] for widgets) while leaving
values unchanged.
Pull Request Review: Mass Transfer Feature

Overview

This PR introduces a substantial new feature bringing a volume export workflow into ADIT as a first-class transfer type. The implementation includes a dedicated worker queue, partitioned exports, and NIfTI conversion capabilities. Overall, this is well-structured code that follows the project patterns, but there are some important issues to address.

Critical Issues

1. Security: Unsafe subprocess execution (processors.py:427-442)

The dcm2niix command uses user-influenced paths without proper validation. While sanitize_filename() is used for series_name, there is no explicit validation that paths do not escape their intended directories. An attacker could potentially craft malicious DICOM metadata to create paths outside the intended directory structure.

Recommendation: Add explicit path validation to ensure all paths remain within expected base directories using path.resolve().is_relative_to().

2. Resource Management: Missing cleanup in exception paths (processors.py:161-173)

When _export_volume succeeds but _convert_volume fails, the exported DICOM files remain on disk indefinitely. If _convert_volume fails, volume.exported_folder might not be set, making _cleanup_export ineffective. This could lead to disk space exhaustion on long-running mass transfers.

Recommendation: Ensure exported_folder is saved before conversion, or use try/finally blocks for cleanup.

3. Data Integrity: Race condition in volume creation (processors.py:269-302)

The get_or_create pattern updates an existing volume even if it belongs to a different partition. If a series appears in multiple partitions, later tasks will overwrite the partition_key and other metadata of existing volumes.

Recommendation: Either make partition_key part of the uniqueness constraint, or skip the update if the volume already exists with status CONVERTED.

Major Issues

4. Performance: Recursive study splitting without depth limit (processors.py:329-338)

The recursive time-window splitting has no maximum recursion depth. For very dense time periods, this could result in deep recursion and stack overflow.

Recommendation: Add a depth counter parameter or use iterative splitting with a queue.

5. Error Handling: Silent failures in institution filtering (processors.py:342-359)

The _study_has_institution method returns False for any exception without logging. If find_series fails, potentially matching studies are silently excluded.

Recommendation: Let exceptions propagate or add logging when series queries fail.

6. Configuration: Missing environment variable validation

There is no runtime validation that MASS_TRANSFER_EXPORT_BASE_DIR exists and is writable. Workers will crash on the first task if the directory is inaccessible.

Recommendation: Add startup validation in apps.py or Django system checks.

Minor Issues
Positive Aspects
Recommendations

High Priority: Fix issues 1-3, add a recursion limit (4), and environment validation (6).

Final Assessment

This is a strong implementation that demonstrates good understanding of the codebase architecture. However, the security issue with subprocess paths and the resource cleanup concerns should be addressed before merging. Once the critical issues are resolved, this will be a valuable addition to ADIT.

Recommendation: Request changes for the critical issues; approve with comments for the rest.
PR Review: Mass Transfer Feature

This PR introduces a comprehensive mass transfer facility for bulk DICOM export and conversion. I've reviewed the implementation for code quality, security, performance, and test coverage. Overall, this is a well-structured feature with good separation of concerns. Here are my findings:

Architecture & Design ✅

Strengths:
Observations:
Code Quality 👍

Positive aspects:
Minor suggestions:
Security 🔒

Good practices:
No significant security concerns found.

Performance Considerations ⚡

Well-designed aspects:
Potential concerns:
Error Handling & Robustness 🛡️

Excellent retry strategy:
Good exception handling:
Suggestion:
Test Coverage ✅

Good coverage:
Missing tests:
Suggestion: Add tests for the filter matching logic, especially for institution_name filtering at study vs series level (processors.py:221-223, 242-248).

Configuration & Deployment 🚀

Well-structured:
Note: Verify that

Django Conventions 📋

Excellent adherence:
Documentation 📚

Good:
Could be improved:
Specific Issues Found 🐛

Critical: None

Medium:
Low:
Recommendations 📝
Summary

This is a well-architected and production-ready feature. The code follows project conventions, has good error handling, and includes reasonable test coverage. The identified issues are minor and mostly concern edge cases and documentation. The integration of the retry strategy across all DICOM tasks is particularly well done.

Recommendation: Approve with minor suggestions.

Great work! 🎉
Actionable comments posted: 2
🤖 Fix all issues with AI agents
In `@adit/mass_transfer/processors.py`:
- Around line 329-338: The recursive split in _find_studies uses mid derived
from start/end but passes the same mid to both recursive calls causing studies
whose StudyDate equals mid.date() (with full-day StudyTime ranges) to be
included in both halves; change the boundary handling so one side is exclusive —
e.g., when computing mid or when calling _find_studies, ensure the left call
uses range [start, mid) and the right uses [mid, end] (or vice versa), by
adjusting mid to an exclusive instant (subtract one second/microsecond or switch
the comparison used for StudyDate/StudyTime) so that studies on mid.date() are
only returned by one branch; keep _find_volumes dedup as-is.
- Around line 30-36: The _dicom_match implementation uses regex.search which
allows substring matches; change it to use regex.fullmatch so the converted
unanchored pattern (from convert_to_python_regex) must match the entire value.
Update the function _dicom_match to call regex.fullmatch(str(value)) instead of
regex.search(...); ensure callers that filter study_description,
series_description, and institution_name continue to use _dicom_match unchanged.
🧹 Nitpick comments (8)
adit/mass_transfer/processors.py (6)
80-97: `_volume_export_path` and `_volume_output_path` are identical implementations. Both functions have the exact same body:
`base_dir / year_month / subject_id / series_name`. If they're intended to diverge in the future, a comment explaining the intent would help. Otherwise, consolidate into a single helper.

Suggested consolidation
```diff
-def _volume_export_path(
-    base_dir: Path,
-    study_dt: datetime,
-    subject_id: str,
-    series_name: str,
-) -> Path:
-    year_month = study_dt.strftime("%Y%m")
-    return base_dir / year_month / subject_id / series_name
-
-
-def _volume_output_path(
-    base_dir: Path,
-    study_dt: datetime,
-    subject_id: str,
-    series_name: str,
-) -> Path:
-    year_month = study_dt.strftime("%Y%m")
-    return base_dir / year_month / subject_id / series_name
+def _volume_path(
+    base_dir: Path,
+    study_dt: datetime,
+    subject_id: str,
+    series_name: str,
+) -> Path:
+    year_month = study_dt.strftime("%Y%m")
+    return base_dir / year_month / subject_id / series_name
```
175-180: "Volumes processed" count is misleading — it includes already-converted volumes that were skipped.
`len(volumes)` counts all discovered volumes, but volumes with `status == CONVERTED` are skipped at line 158. The log suggests they were all processed in this run. Consider tracking the actual number attempted.

Proposed fix
```diff
+ skipped_count = 0
  for _, study_volumes in volumes_by_study.items():
      ...
      for volume in study_volumes:
          if volume.status == MassTransferVolume.Status.CONVERTED:
+             skipped_count += 1
              continue
      ...
  log_lines = [
      f"Partition {self.mass_task.partition_key}",
-     f"Volumes processed: {len(volumes)}",
+     f"Volumes found: {len(volumes)}",
+     f"Skipped (already converted): {skipped_count}",
      f"Converted: {converted_count}",
      f"Failed: {failed_count}",
  ]
```
110: Missing return type annotation on `process()`.

The base class declares `process(self) -> ProcessingResult`. Adding the annotation here keeps pyright happy and makes the contract explicit.

```diff
- def process(self):
+ def process(self) -> ProcessingResult:
```

(Also add `ProcessingResult` to the import from `adit.core.processors`.)
258-263: Unreachable `except` block: `series_number` is already parsed by `_parse_int`.

At line 239, `series_number` is converted via `_parse_int(...)`, which returns `int | None`. Both `mf.series_number` (a `PositiveIntegerField`) and `series_number` are `int | None`, so the comparison at line 260 cannot raise `TypeError` or `ValueError`.

Simplified version

```diff
 if mf.series_number is not None:
-    try:
-        if series_number is None or mf.series_number != series_number:
-            continue
-    except (TypeError, ValueError):
-        continue
+    if series_number is None or mf.series_number != series_number:
+        continue
```
221-232: Redundant PACS queries: `_study_has_institution` fetches all series, then `find_series` fetches them again.

When `apply_institution_on_study` is set, line 222 queries all series to check institution names. Then line 232 queries the same series again for volume discovery. For studies with many series, this doubles the PACS load.

Consider caching the series list from the institution check and reusing it:
Sketch
```diff
+ prefetched_series: list | None = None
  if mf.institution_name and mf.apply_institution_on_study:
-     if not self._study_has_institution(operator, study, mf.institution_name):
+     series_query = QueryDataset.create(
+         PatientID=study.PatientID,
+         StudyInstanceUID=study.StudyInstanceUID,
+     )
+     series_query.dataset.InstitutionName = ""
+     prefetched_series = list(operator.find_series(series_query))
+     if not any(
+         _dicom_match(mf.institution_name, s.get("InstitutionName"))
+         for s in prefetched_series
+     ):
          continue
- series_query = QueryDataset.create(...)
- series_list = list(operator.find_series(series_query))
+ if prefetched_series is None:
+     series_query = QueryDataset.create(
+         PatientID=study.PatientID,
+         StudyInstanceUID=study.StudyInstanceUID,
+     )
+     series_query.dataset.InstitutionName = ""
+     prefetched_series = list(operator.find_series(series_query))
+ series_list = prefetched_series
```
110-196: Consider adding structured logging for key state transitions in `process()`.

The only logging is `logger.exception` on failure. Adding `logger.info` calls for the start of processing, volume discovery counts, and per-volume export/convert steps would significantly improve operational observability for these long-running tasks. As per coding guidelines, structured logging should be used for agent actions and state transitions (`**/*.py`).

adit/mass_transfer/forms.py (1)
94: `tasks` attribute is declared but never populated.

`tasks: list[MassTransferTask]` on line 94 and `self.tasks = []` on line 128 suggest the form tracks created tasks, but `_save_tasks` uses `bulk_create` without storing the result. Either remove the attribute or populate it:

Proposed fix

```diff
 def _save_tasks(self, job: MassTransferJob) -> None:
     ...
-    MassTransferTask.objects.bulk_create(tasks)
+    self.tasks = MassTransferTask.objects.bulk_create(tasks)
```

Also applies to: 128
AGENTS.md (1)
190-192: Consider expanding the troubleshooting section with version/diagnostic info.

The single bullet is helpful but could be more actionable. Operators encountering conversion failures would benefit from knowing how to verify the installation and check the output.
Suggested expansion
```diff
 ### Mass Transfer Conversion Failures

 - Ensure `dcm2niix` is installed and available on `PATH` inside the worker image
+- Verify version: `dcm2niix --version` (expected: 1.0.20250506 or later)
+- Check worker logs: `docker compose logs mass_transfer_worker`
```
```python
def _dicom_match(pattern: str, value: str | None) -> bool:
    if not pattern:
        return True
    if value is None:
        return False
    regex = convert_to_python_regex(pattern)
    return bool(regex.search(str(value)))
```
🧩 Analysis chain
🏁 Script executed:

```shell
rg -n "convert_to_python_regex" --type=py -B2 -A5
```

Repository: openradx/adit · Length of output: 4469

🏁 Script executed:

```shell
fd -t f "dicom_utils.py" --type=py
```

Repository: openradx/adit · Length of output: 226

🏁 Script executed:

```shell
sed -n '48,80p' adit/core/utils/dicom_utils.py
```

Repository: openradx/adit · Length of output: 1188

🏁 Script executed:

```shell
# Verify where _dicom_match is called in processors.py to understand impact
rg -n "_dicom_match" adit/mass_transfer/processors.py -B2 -A2
```

Repository: openradx/adit · Length of output: 1108

🏁 Script executed:

```shell
# Check if there are any tests that clarify the intended behavior
fd -name "*test*" -type f | xargs rg -l "_dicom_match\|convert_to_python_regex" 2>/dev/null | head -5
```

Repository: openradx/adit · Length of output: 287

🏁 Script executed:

```shell
# Search for test files containing convert_to_python_regex or _dicom_match
find . -name "*test*.py" -type f -exec rg -l "convert_to_python_regex\|_dicom_match" {} \;
```

Repository: openradx/adit · Length of output: 39
Use regex.fullmatch() instead of regex.search() in _dicom_match.
convert_to_python_regex produces an unanchored pattern (it only escapes, replaces \* with .*, and \? with . without adding ^ or $ anchors). This causes search() to match substrings unintentionally—e.g., pattern "Hospital" matches "General Hospital". DICOM wildcard matching per the standard requires matching the full value. This affects filtering of study_description (line 216), series_description (line 253), and institution_name (lines 244, 356).
Proposed fix
```diff
 def _dicom_match(pattern: str, value: str | None) -> bool:
     if not pattern:
         return True
     if value is None:
         return False
     regex = convert_to_python_regex(pattern)
-    return bool(regex.search(str(value)))
+    return bool(regex.fullmatch(str(value)))
```

🤖 Prompt for AI Agents
In `@adit/mass_transfer/processors.py` around lines 30 - 36, The _dicom_match
implementation uses regex.search which allows substring matches; change it to
use regex.fullmatch so the converted unanchored pattern (from
convert_to_python_regex) must match the entire value. Update the function
_dicom_match to call regex.fullmatch(str(value)) instead of regex.search(...);
ensure callers that filter study_description, series_description, and
institution_name continue to use _dicom_match unchanged.
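The behavioral difference is easy to demonstrate with a stand-in converter. Note that `to_python_regex` below is a hypothetical reimplementation that only mirrors what the comment says `convert_to_python_regex` does (escape, then map `*` to `.*` and `?` to `.`); it is not ADIT's code.

```python
import re

def to_python_regex(pattern: str) -> re.Pattern:
    # Hypothetical stand-in: escape the pattern, then translate DICOM
    # wildcards. No ^/$ anchors are added, which is why search() leaks.
    escaped = re.escape(pattern).replace(r"\*", ".*").replace(r"\?", ".")
    return re.compile(escaped, re.IGNORECASE)

regex = to_python_regex("Hospital")
print(bool(regex.search("General Hospital")))     # True: substring slips through
print(bool(regex.fullmatch("General Hospital")))  # False: full value required
print(bool(to_python_regex("*Hospital").fullmatch("General Hospital")))  # True: wildcard still works
```

With `fullmatch`, a filter of `Hospital` no longer matches `General Hospital`, while explicit wildcards (`*Hospital`, `CT?`) keep working as intended.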
closes #289
## Comprehensive Code Review: PR #298 - Mass Transfer App

I've completed a thorough review of this PR, which adds a Mass Transfer app with partitioned DICOM export and conversion. Overall, this is a solid implementation with good structure and adherence to project conventions. Here's my detailed feedback:

### 🔴 Critical Issues (Must Fix)

#### 1. Security: Filter Ownership Vulnerability

Location:

```python
owner = models.ForeignKey(
    settings.AUTH_USER_MODEL,
    on_delete=models.CASCADE,
    related_name="mass_transfer_filters",
    null=True,  # ⚠️ SECURITY ISSUE
    blank=True,
)
```

Problem: Allowing a null owner means filters can exist without any owner, weakening ownership checks.

Recommendation: Remove `null=True` so every filter is tied to a user.

#### 2. Race Condition in Volume Creation

Location:

```python
volume, created = MassTransferVolume.objects.get_or_create(
    job=job,
    series_instance_uid=series_uid,
    defaults={...},
)
if not created:
    volume.partition_key = self.mass_task.partition_key
    # ... 13 fields updated
    volume.save()
```

Problem: The get-then-update sequence is not atomic; concurrent tasks can interleave between the lookup and the save.

Recommendation: Use `update_or_create` or wrap the lookup and update in a transaction.

#### 3. Potential Infinite Recursion

Problem: No recursion depth limit. If a PACS returns more results than the query limit at every split, the recursion is unbounded.

Recommendation: Add a depth tracking parameter with a maximum limit (e.g., 20).

#### 4. Missing dcm2niix Validation

Problem: The code assumes the `dcm2niix` binary is available without verifying it.

### 🟡 High Priority Issues (Should Fix)

#### 5. Missing Database Indexes

Problem: No explicit index on the frequently filtered columns.

Recommendation: Add indexes on the fields used for lookups.

#### 6. Inconsistent Field Types

Recommendation: Use consistent field types across the related models.

#### 7. Insecure Directory Permissions

Problem: Uses default permissions, which could create world-readable directories containing PHI if the umask is permissive.

Recommendation: Set explicit restrictive permissions on created directories.

#### 8. Queue Strategy Confusion

Question: The PR description mentions a separate mass transfer worker, but the task uses the same queue as the other DICOM tasks.

### 🟢 Medium Priority Issues (Nice to Have)

#### 9. Cleanup Logic Location

Issue: Mass transfer-specific cleanup logic is in the core app, breaking modularity.

Recommendation: Consider a callback/hook pattern to keep mass_transfer logic in its own app.

#### 10. Error Handling

Suggestion: Log cleanup failures with enough context to diagnose them.

#### 11. Logging

Suggestion: Add more INFO-level logging for normal operations to aid debugging.

### 🧪 Test Coverage Gaps

Missing Tests:
### ✅ Strengths
### 📊 Overall Assessment

Code Quality: 8/10 - Well-structured, follows Django conventions.

Verdict: This is a solid PR that adds significant functionality. With the critical issues addressed (especially the filter ownership security issue and the race condition), this would be ready to merge. The code quality is high and follows project conventions well.

### 🎯 Recommendations Summary

Critical (Must Fix): Items 1-4

Great work on this feature! The architecture is sound and the code is well-organized. Looking forward to seeing the critical fixes applied.
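The fail-fast check suggested for item 4 (dcm2niix validation) can be sketched with a small helper. This is hypothetical, not part of the PR; the point is to validate the binary once at worker startup rather than discovering its absence at the first conversion.

```python
import shutil

def require_tool(name: str) -> str:
    """Return the absolute path of a required executable.

    Fails fast with a clear error message when the tool is not on PATH,
    instead of letting the first subprocess call blow up mid-job."""
    path = shutil.which(name)
    if path is None:
        raise RuntimeError(f"Required executable {name!r} not found on PATH")
    return path

# e.g. call require_tool("dcm2niix") from a worker readiness / startup check
```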
Actionable comments posted: 2
🤖 Fix all issues with AI agents
In `@adit/core/tasks.py`:
- Around line 66-102: The cleanup currently unconditionally wipes partial
exports in _cleanup_mass_transfer_exports, which prevents reuse on retries;
update _cleanup_mass_transfer_exports (the function handling DicomTask) to first
check dicom_task.status and return early when dicom_task.status ==
DicomTask.Status.PENDING so volumes that reached EXPORTED with an
exported_folder are not cleared; keep the existing partition_key and import
checks, and only perform the existing removal/ERROR-setting logic for
non-PENDING tasks.
In `@adit/mass_transfer/forms.py`:
- Line 96: self.tasks is typed on the class but never populated because
_save_tasks builds a local tasks list and bulk-creates MassTransferTask objects
without assigning them back to the instance; update the _save_tasks method (the
function named _save_tasks that bulk-creates MassTransferTask) to either remove
the unused self.tasks attribute or, preferably, set self.tasks to the list of
created task instances (or their dicts/IDs) after bulk_create so callers can
read form.tasks; ensure you reference the MassTransferTask creation logic and
assign the result to self.tasks immediately after bulk_create (or drop the
attribute and remove all reads of self.tasks).
adit/core/tasks.py
```python
def _cleanup_mass_transfer_exports(dicom_task: DicomTask) -> None:
    if dicom_task._meta.app_label != "mass_transfer":
        return

    partition_key = getattr(dicom_task, "partition_key", None)
    if not partition_key:
        return

    try:
        from adit.mass_transfer.models import MassTransferVolume
    except Exception:
        return

    volumes = MassTransferVolume.objects.filter(
        job_id=dicom_task.job_id,
        partition_key=partition_key,
    ).exclude(exported_folder="")

    for volume in volumes:
        if volume.status == MassTransferVolume.Status.CONVERTED:
            continue

        export_folder = volume.exported_folder
        if export_folder:
            try:
                shutil.rmtree(Path(export_folder))
            except FileNotFoundError:
                pass
            except Exception as err:
                volume.add_log(f"Cleanup failed: {err}")
                volume.save()
                continue

        volume.exported_folder = ""
        volume.status = MassTransferVolume.Status.ERROR
        volume.add_log("Export cleaned up after task failure.")
        volume.save()
```
🧩 Analysis chain
🏁 Script executed:

```shell
# First, let's find the file and check the call site at line 178
fd tasks.py --type f | head -5
```

Repository: openradx/adit · Length of output: 292

🏁 Script executed:

```shell
# Check the exact line where _cleanup_mass_transfer_exports is called
rg -n "_cleanup_mass_transfer_exports" --type=py | head -20
```

Repository: openradx/adit · Length of output: 665

🏁 Script executed:

```shell
# Now run the verification script from the comment
rg -n "_find_studies\|EXPORTED\|CONVERTED\|status.*PENDING" --type=py adit/mass_transfer/processors.py | head -40
```

Repository: openradx/adit · Length of output: 39

🏁 Script executed:

```shell
# Check if adit/mass_transfer/processors.py exists and list mass_transfer structure
fd processors.py --path "*/mass_transfer/*" --type f
```

Repository: openradx/adit · Length of output: 272

🏁 Script executed:

```shell
# Read the context around the cleanup calls in adit/core/tasks.py
sed -n '140,210p' adit/core/tasks.py
```

Repository: openradx/adit · Length of output: 2828

🏁 Script executed:

```shell
# Also check the test file to understand the intended behavior
cat -n adit/mass_transfer/tests/test_cleanup.py
```

Repository: openradx/adit · Length of output: 2451

🏁 Script executed:

```shell
# Check if processors.py exists in mass_transfer
find adit/mass_transfer -name "*.py" -type f | head -20
```

Repository: openradx/adit · Length of output: 830

🏁 Script executed:

```shell
# Look for how processors handle volumes and whether they reuse exports
rg -n "EXPORTED|CONVERTED" --type=py adit/mass_transfer/ | head -30
```

Repository: openradx/adit · Length of output: 863

🏁 Script executed:

```shell
# Check the cleanup function again to understand the CONVERTED status check
sed -n '66,102p' adit/core/tasks.py
```

Repository: openradx/adit · Length of output: 1207

🏁 Script executed:

```shell
# Examine processors.py to understand how EXPORTED volumes are reused on retry
sed -n '360,415p' adit/mass_transfer/processors.py
```

Repository: openradx/adit · Length of output: 1883

🏁 Script executed:

```shell
# Also check the broader context to understand the flow
sed -n '150,170p' adit/mass_transfer/processors.py
```

Repository: openradx/adit · Length of output: 985

🏁 Script executed:

```shell
# Check what happens with EXPORTED volumes - do they get reprocessed on retry?
sed -n '395,420p' adit/mass_transfer/processors.py
```

Repository: openradx/adit · Length of output: 866
Cleanup discards partial exports on retriable failures, forcing redundant PACS re-exports on retry.
_cleanup_mass_transfer_exports is called at line 178 in the RetriableDicomError handler regardless of whether the task will be retried. This resets volumes not in CONVERTED status to ERROR and removes their export folders.
However, the processor's _export_volume method (line 368 in processors.py) explicitly reuses exports: if volume.status == MassTransferVolume.Status.EXPORTED and volume.exported_folder: return. On retry, volumes that reached EXPORTED status should be reusable without re-fetching from PACS. Deleting exported_folder forces the next retry to re-fetch and re-export the same series, causing redundant PACS round-trips.
Add a condition to skip cleanup when dicom_task.status == DicomTask.Status.PENDING to allow partial progress to be reused across retry attempts.
🧰 Tools
🪛 Ruff (0.14.14)
[warning] 76-76: Do not catch blind exception: Exception
(BLE001)
[warning] 94-94: Do not catch blind exception: Exception
(BLE001)
🤖 Prompt for AI Agents
In `@adit/core/tasks.py` around lines 66 - 102, The cleanup currently
unconditionally wipes partial exports in _cleanup_mass_transfer_exports, which
prevents reuse on retries; update _cleanup_mass_transfer_exports (the function
handling DicomTask) to first check dicom_task.status and return early when
dicom_task.status == DicomTask.Status.PENDING so volumes that reached EXPORTED
with an exported_folder are not cleared; keep the existing partition_key and
import checks, and only perform the existing removal/ERROR-setting logic for
non-PENDING tasks.
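The retry-aware guard this prompt describes can be sketched independently of Django. This is a minimal model under stated assumptions: `cleanup_export` and the `task_pending` flag are hypothetical stand-ins for the `DicomTask.Status.PENDING` check and the per-volume folder removal.

```python
import shutil
from pathlib import Path

def cleanup_export(folder: Path, task_pending: bool) -> bool:
    """Remove a partial export folder unless the task will be retried.

    Returns True when the folder is gone afterwards, False when it was
    deliberately kept so the retry can reuse the already exported data."""
    if task_pending:
        return False  # preserve partial progress for the retry attempt
    shutil.rmtree(folder, ignore_errors=True)
    return not folder.exists()
```

The key property is that a retriable failure (task still pending) leaves EXPORTED volumes untouched, so the next attempt skips the PACS round-trip.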
```python
    ),
)

tasks: list[MassTransferTask]
```
self.tasks is declared but never populated.
self.tasks is initialized to [] at line 130 and typed at line 96, but _save_tasks creates a separate local tasks list (line 230) and bulk-creates from it without ever assigning back to self.tasks. If any caller reads form.tasks after save, it will always be empty.
Either remove the attribute if it's unused, or assign after bulk-create:
Proposed fix
```diff
- MassTransferTask.objects.bulk_create(tasks)
+ self.tasks = MassTransferTask.objects.bulk_create(tasks)
```

```shell
#!/bin/bash
# Check if form.tasks is accessed anywhere outside the form class
rg -n "form\.tasks" --type=py adit/mass_transfer/
rg -n "\.tasks" --type=py adit/mass_transfer/views.py | grep -v "import\|model\|#"
```

Also applies to: 129-130, 223-242
🤖 Prompt for AI Agents
In `@adit/mass_transfer/forms.py` at line 96, self.tasks is typed on the class but
never populated because _save_tasks builds a local tasks list and bulk-creates
MassTransferTask objects without assigning them back to the instance; update the
_save_tasks method (the function named _save_tasks that bulk-creates
MassTransferTask) to either remove the unused self.tasks attribute or,
preferably, set self.tasks to the list of created task instances (or their
dicts/IDs) after bulk_create so callers can read form.tasks; ensure you
reference the MassTransferTask creation logic and assign the result to
self.tasks immediately after bulk_create (or drop the attribute and remove all
reads of self.tasks).
## Code Review for PR #298: Add Mass Transfer app

I have reviewed this substantial PR that introduces the Mass Transfer feature. Overall, the implementation is well-structured and follows existing patterns. Below are my findings:

### Strengths

Architecture & Design
Code Quality
### Issues & Concerns

#### 1. Security: Path Traversal Vulnerability (CRITICAL)

Location: processors.py:73-77

The code creates directories using user-controlled data without proper validation:

```python
def _destination_base_dir(node: DicomNode) -> Path:
    assert node.node_type == DicomNode.NodeType.FOLDER
    path = Path(node.dicomfolder.path)  # User-controlled via database
    path.mkdir(parents=True, exist_ok=True)  # POTENTIAL RISK
    return path
```

Recommendation:
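One possible shape for such validation is shown below. This is a hypothetical sketch, not the PR's code: `ALLOWED_ROOT` stands in for an assumed configured export root, and `safe_destination` for a guard applied before `mkdir`.

```python
from pathlib import Path

ALLOWED_ROOT = Path("/var/lib/adit/exports")  # assumed configured root, illustrative only

def safe_destination(configured: str) -> Path:
    """Resolve a configured folder and refuse anything escaping the allowed root.

    resolve() collapses any ../ components first, so traversal attempts
    like "<root>/../../etc" are rejected by the containment check."""
    path = Path(configured).resolve()
    if not path.is_relative_to(ALLOWED_ROOT):  # Path.is_relative_to: Python 3.9+
        raise ValueError(f"{configured!r} escapes {ALLOWED_ROOT}")
    return path
```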
#### 2. Command Injection Risk (HIGH)

Location: processors.py:427-438

While `subprocess.run` with a list is safer than `shell=True`, the `series_name` is derived from user-controlled DICOM data via `sanitize_filename()`.

Recommendation:

#### 3. Resource Exhaustion (MEDIUM)

Location: processors.py:329-338

The recursive splitting algorithm has no maximum recursion depth limit and could hit Python's recursion limit on highly populated PACS servers.

Recommendation:

#### 4. Data Integrity: Missing Transaction (MEDIUM)

Location: processors.py:144-173

The main processing loop modifies multiple MassTransferVolume objects but does not use database transactions.

Recommendation:

#### 5. Cleanup Race Condition (LOW-MEDIUM)

Location: processors.py:450-465

The "(cleaned)" suffix approach is fragile, and there is a potential race condition.

Recommendation:

#### 6. Missing Validation (MEDIUM)

Location: forms.py:220-226

Missing validation for unreasonably large date ranges that could create thousands of tasks.

Recommendation:

#### 7. Test Coverage Gaps (MEDIUM)

Tests are good but missing:

#### 8. Duplicate Code (LOW)

Location: processors.py:87-97

`_volume_export_path` and `_volume_output_path` are identical functions. Consolidate, or add a comment explaining why they are separate.

### Performance Considerations
### Summary

This is a solid implementation that integrates well with the existing ADIT architecture. Major concerns:

The core logic is sound, and the feature will be valuable for bulk DICOM operations. Once the security issues are addressed, this will be ready to merge.

Recommendation: Request changes for the security issues, then approve after fixes.

Great work on this substantial feature addition!
Actionable comments posted: 1
🤖 Fix all issues with AI agents
Verify each finding against the current code and only fix it if needed.
In `@adit/mass_transfer/processors.py`:
- Around line 148-155: When job.pseudonymize is true and you generate a
pseudonym for a study (inside the volumes_by_study loop where pseudonym is
computed), immediately assign and persist that pseudonym to every Volume in
study_volumes (set volume.pseudonym and call the same persistence path used by
_export_volume or the model/save method) before invoking _export_volume for any
single volume; this ensures the generated pseudonym is stored atomically for the
whole study and avoids generating a new one on retry. Use the existing symbols
volumes_by_study, study_volumes, job.pseudonymize, volume.pseudonym and the
persistence logic currently used in _export_volume to perform the save.
- Around line 258-263: The try/except around comparing mf.series_number to
series_number is dead code because series_number comes from _parse_int which
already handles TypeError/ValueError; remove the try/except and replace with a
direct conditional: check if series_number is None or mf.series_number !=
series_number and continue accordingly (i.e., keep the existing logic inside the
if but without the try/except wrapper), referring to mf.series_number,
series_number and _parse_int to locate the code.
- Around line 59-64: _study_datetime currently returns a naive datetime which
forces callers to call timezone.make_aware; instead, make _study_datetime return
an aware datetime: in function _study_datetime, when combining study.StudyDate
and study.StudyTime preserve any existing tzinfo on study_time (or
study_time.tzinfo), and if the resulting datetime has no tzinfo, use
django.utils.timezone.get_current_timezone() (or
timezone.get_default_timezone()) and call timezone.make_aware on the combined dt
before returning; remove or stop relying on callers to call timezone.make_aware
(e.g., the code that currently wraps _study_datetime with timezone.make_aware
should no longer do so).
- Around line 182-196: The current final-status logic in the mass transfer
result block treats converted_count == 0 and failed_count == 0 as a generic
SUCCESS; change it to detect the "all skipped / nothing to do" case
(converted_count == 0 and failed_count == 0) and return a clearer outcome: set
status to MassTransferTask.Status.SUCCESS (or a new explicit status if your
domain supports it) but set message to "No new volumes to process" (and keep the
existing "log": "\n".join(log_lines)). Update the conditional branching around
converted_count and failed_count (the block that currently assigns
status/message) to check for this zero-zero case first and produce the distinct
message using the same variables (converted_count, failed_count, log_lines,
MassTransferTask.Status).
- Around line 269-304: The get_or_create call for MassTransferVolume duplicates
the same field assignments in the subsequent update block; extract the shared
field values into a single dict (e.g., build a "volume_values" dict before
calling MassTransferVolume.objects.get_or_create) and pass it as defaults, then
if not created update the model using that same dict (or simply switch to
MassTransferVolume.objects.update_or_create(job=job,
series_instance_uid=series_uid, defaults=volume_values) if you want
unconditional overwrites). Update references to the dict where you set
partition_key, patient_id, accession_number, study_instance_uid, modality,
study_description, series_description, series_number, study_datetime,
institution_name, number_of_images and ensure found_series[series_uid] still
receives the resulting Volume instance.
- Around line 370-410: The _export_volume routine calls operator.fetch_series
with a callback that writes every instance but lacks observability and timeout;
update the callback passed to operator.fetch_series to (1) maintain a local
counter and a last-progress timestamp, logging progress (using your logger)
every N images (e.g., every 100) including patient_id/study/series identifiers,
(2) update last-progress when an image is processed, and (3) enforce a stall
timeout by raising/propagating an exception or aborting the fetch if no progress
is seen for T seconds; ensure you reference the existing callback closure,
manipulator.manipulate, write_dataset, and volume identifiers so
exported_folder/volume.status handling remains correct and errors are surfaced.
In `@adit/mass_transfer/tests/test_processor.py`:
- Around line 238-250: The helper _make_processor currently patches
MassTransferTaskProcessor.__init__ and manually sets dicom_task and mass_task,
which is fragile if DicomTaskProcessor.__init__ later initializes more
attributes; update _make_processor to document and set all required instance
attributes explicitly (at minimum dicom_task and mass_task) and add a short
comment listing these required attributes and why __init__ is bypassed, or
alternatively construct the processor via its real __init__ with a lightweight
in-memory/fixture task; reference the _make_processor function,
MassTransferTaskProcessor, DicomTaskProcessor.__init__, and the
dicom_task/mass_task attributes when making the change.
- Around line 350-369: Remove the no-op unpack and the unused `right_end` by
deleting the line "_, _ = call_ranges[0]" and change the third unpack to discard
the unused end value (e.g., unpack `call_ranges[2]` as `right_start, _`) so only
`left_start, left_end` and `right_start` are assigned; this targets the test in
test_processor.py that patches MassTransferTaskProcessor._find_studies and uses
the `call_ranges` list.
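The progress/stall guard suggested for the `fetch_series` callback can be sketched like this. It is a hypothetical helper, not ADIT's code: the injectable `clock` keeps it testable, and note the inherent limitation that a callback only detects slow progress *between* instances; a fully stalled fetch that never invokes the callback needs an external watchdog.

```python
import time

class StallTimeout(Exception):
    """Raised when too much time passes between two fetched instances."""

def make_progress_callback(log_every=100, stall_seconds=300.0, clock=time.monotonic):
    state = {"count": 0, "last": clock()}

    def callback(_dataset):
        now = clock()
        if now - state["last"] > stall_seconds:
            raise StallTimeout(f"no progress for more than {stall_seconds}s")
        state["last"] = now
        state["count"] += 1
        if state["count"] % log_every == 0:
            # real code would use logger.info with patient/study/series context
            print(f"exported {state['count']} images")

    callback.state = state  # expose counters for the caller's summary log
    return callback
```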
🧹 Nitpick comments (8)
🤖 Fix all nitpicks with AI agents
adit/mass_transfer/processors.py (6)
258-263: Unreachable `except` clause — `_parse_int` already handles `TypeError`/`ValueError`.

`series_number` is returned by `_parse_int` on line 239, which catches `TypeError` and `ValueError` internally and returns `default`. So `mf.series_number != series_number` (both `int | None`) will never raise those exceptions. The try/except here is dead code.

♻️ Simplified

```diff
 if mf.series_number is not None:
-    try:
-        if series_number is None or mf.series_number != series_number:
-            continue
-    except (TypeError, ValueError):
-        continue
+    if series_number is None or mf.series_number != series_number:
+        continue
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@adit/mass_transfer/processors.py` around lines 258 - 263, The try/except around comparing mf.series_number to series_number is dead code because series_number comes from _parse_int which already handles TypeError/ValueError; remove the try/except and replace with a direct conditional: check if series_number is None or mf.series_number != series_number and continue accordingly (i.e., keep the existing logic inside the if but without the try/except wrapper), referring to mf.series_number, series_number and _parse_int to locate the code.
59-64: `_study_datetime` returns a naive datetime; callers use `timezone.make_aware`.

Line 64 returns a naive `datetime` via `datetime.combine`, and line 281 wraps the result in `timezone.make_aware()`. This works, but if `study.StudyDate` or `study.StudyTime` already carries timezone info (unlikely for DICOM but possible with extended negotiation), `make_aware` would raise `ValueError`. More importantly, this naive-to-aware conversion is easy to forget for new callers. Consider making the function timezone-aware internally.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@adit/mass_transfer/processors.py` around lines 59 - 64, _study_datetime currently returns a naive datetime which forces callers to call timezone.make_aware; instead, make _study_datetime return an aware datetime: in function _study_datetime, when combining study.StudyDate and study.StudyTime preserve any existing tzinfo on study_time (or study_time.tzinfo), and if the resulting datetime has no tzinfo, use django.utils.timezone.get_current_timezone() (or timezone.get_default_timezone()) and call timezone.make_aware on the combined dt before returning; remove or stop relying on callers to call timezone.make_aware (e.g., the code that currently wraps _study_datetime with timezone.make_aware should no longer do so).
182-196: All-skipped scenario returns `SUCCESS` — verify this is intended.

When every volume has `status == CONVERTED` (lines 158-159), both `converted_count` and `failed_count` remain 0, so the else branch at line 188 returns `SUCCESS`. The log will show "Converted: 0, Failed: 0" which is technically correct for an idempotent re-run, but could be surprising. Consider whether a distinct message like "No new volumes to process" would be more informative.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@adit/mass_transfer/processors.py` around lines 182 - 196, The current final-status logic in the mass transfer result block treats converted_count == 0 and failed_count == 0 as a generic SUCCESS; change it to detect the "all skipped / nothing to do" case (converted_count == 0 and failed_count == 0) and return a clearer outcome: set status to MassTransferTask.Status.SUCCESS (or a new explicit status if your domain supports it) but set message to "No new volumes to process" (and keep the existing "log": "\n".join(log_lines)). Update the conditional branching around converted_count and failed_count (the block that currently assigns status/message) to check for this zero-zero case first and produce the distinct message using the same variables (converted_count, failed_count, log_lines, MassTransferTask.Status).
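A sketch of the suggested branching, with the zero-zero case handled first. The helper name and the non-zero messages are hypothetical; the real code would also set `MassTransferTask.Status` alongside the message.

```python
def summarize_result(converted_count: int, failed_count: int, log_lines: list[str]) -> dict:
    """Illustrative final-status branching with an explicit 'all skipped' case."""
    if converted_count == 0 and failed_count == 0:
        # Idempotent re-run: every volume was already CONVERTED
        message = "No new volumes to process"
    elif failed_count == 0:
        message = f"Converted {converted_count} volumes"
    else:
        message = f"Converted: {converted_count}, Failed: {failed_count}"
    return {"message": message, "log": "\n".join(log_lines)}
```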
269-304: Large duplication between `get_or_create` defaults and the update block.

The field assignments on lines 289–302 mirror the `defaults` dict on lines 272–286 almost identically. If a field is added or renamed, both must be updated in lockstep, which is error-prone.

Consider extracting a helper dict or using `update_or_create`:

♻️ Proposed refactor using update_or_create

```diff
+        volume_defaults = {
+            "partition_key": self.mass_task.partition_key,
+            "patient_id": str(study.PatientID),
+            "accession_number": str(study.get("AccessionNumber", "")),
+            "study_instance_uid": str(study.StudyInstanceUID),
+            "modality": str(series.Modality),
+            "study_description": str(study.get("StudyDescription", "")),
+            "series_description": str(series.get("SeriesDescription", "")),
+            "series_number": series_number,
+            "study_datetime": timezone.make_aware(study_dt),
+            "institution_name": str(series.get("InstitutionName", "")),
+            "number_of_images": _parse_int(
+                series.get("NumberOfSeriesRelatedInstances"), default=0
+            ),
+        }
+        volume, _ = MassTransferVolume.objects.update_or_create(
+            job=job,
+            series_instance_uid=series_uid,
+            defaults=volume_defaults,
         )
```

Note: `update_or_create` will overwrite fields on existing records unconditionally; if you need to preserve some fields (e.g., `status`, `pseudonym`) on existing volumes, keep the current pattern but extract the shared dict to avoid the duplication.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@adit/mass_transfer/processors.py` around lines 269 - 304, The get_or_create call for MassTransferVolume duplicates the same field assignments in the subsequent update block; extract the shared field values into a single dict (e.g., build a "volume_values" dict before calling MassTransferVolume.objects.get_or_create) and pass it as defaults, then if not created update the model using that same dict (or simply switch to MassTransferVolume.objects.update_or_create(job=job, series_instance_uid=series_uid, defaults=volume_values) if you want unconditional overwrites). Update references to the dict where you set partition_key, patient_id, accession_number, study_instance_uid, modality, study_description, series_description, series_number, study_datetime, institution_name, number_of_images and ensure found_series[series_uid] still receives the resulting Volume instance.
370-410: `_export_volume` fetches all instances but has no progress tracking or timeout.

For large series (thousands of images), `operator.fetch_series` could run for a very long time. There's no timeout, no progress logging, and no way to detect a stalled PACS connection. Consider adding at least a periodic log statement inside the callback (e.g., every N images) for observability.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@adit/mass_transfer/processors.py` around lines 370 - 410, The _export_volume routine calls operator.fetch_series with a callback that writes every instance but lacks observability and timeout; update the callback passed to operator.fetch_series to (1) maintain a local counter and a last-progress timestamp, logging progress (using your logger) every N images (e.g., every 100) including patient_id/study/series identifiers, (2) update last-progress when an image is processed, and (3) enforce a stall timeout by raising/propagating an exception or aborting the fetch if no progress is seen for T seconds; ensure you reference the existing callback closure, manipulator.manipulate, write_dataset, and volume identifiers so exported_folder/volume.status handling remains correct and errors are surfaced.
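A hedged sketch of such a callback wrapper. The names are illustrative: the real closure would call `manipulator.manipulate` and `write_dataset` and carry the volume identifiers, and since this stall check only fires when an instance actually arrives, a fully silent connection would still need an outer watchdog.

```python
import logging
import time

logger = logging.getLogger(__name__)


def make_progress_callback(write_dataset, series_uid, every=100, stall_timeout=300.0):
    """Wrap a per-instance write callback with progress logging and a stall check.

    write_dataset and series_uid are stand-ins for the real closure passed to
    operator.fetch_series in _export_volume.
    """
    state = {"count": 0, "last_progress": time.monotonic()}

    def callback(ds):
        now = time.monotonic()
        if now - state["last_progress"] > stall_timeout:
            raise TimeoutError(
                f"No progress for {stall_timeout}s while fetching series {series_uid}"
            )
        write_dataset(ds)
        state["count"] += 1
        state["last_progress"] = now
        if state["count"] % every == 0:
            # Periodic log line for observability on large series
            logger.info("Fetched %d images for series %s", state["count"], series_uid)

    return callback
```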
447-451: `subprocess.run` with list argument from partially user-influenced data.

Ruff flags S603 here. While `series_name` goes through `sanitize_filename`, the `volume.exported_folder` path at line 444 is read from the database and could theoretically be manipulated. Using a list (not `shell=True`) mitigates shell injection, but consider validating that `volume.exported_folder` is a real path under the expected export base before passing it to a subprocess.

adit/mass_transfer/tests/test_processor.py (2)
238-250: `_make_processor` bypasses `__init__` — fragile if internal attributes change.

Patching `__init__` and manually setting `dicom_task`/`mass_task` will silently break if the parent class `DicomTaskProcessor.__init__` starts initializing additional attributes. This is a pragmatic trade-off for avoiding DB setup, but consider adding a brief comment documenting which attributes are required so future maintainers know what to update.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@adit/mass_transfer/tests/test_processor.py` around lines 238 - 250, The helper _make_processor currently patches MassTransferTaskProcessor.__init__ and manually sets dicom_task and mass_task, which is fragile if DicomTaskProcessor.__init__ later initializes more attributes; update _make_processor to document and set all required instance attributes explicitly (at minimum dicom_task and mass_task) and add a short comment listing these required attributes and why __init__ is bypassed, or alternatively construct the processor via its real __init__ with a lightweight in-memory/fixture task; reference the _make_processor function, MassTransferTaskProcessor, DicomTaskProcessor.__init__, and the dicom_task/mass_task attributes when making the change.
350-369: Unused variable `right_end` and no-op unpacking on line 363.

Line 363 unpacks `call_ranges[0]` into two discarded variables — a bare comment or index access would be clearer. Line 365's `right_end` is unused (Ruff RUF059).

♻️ Minor cleanup

```diff
-_, _ = call_ranges[0]
-left_start, left_end = call_ranges[1]
-right_start, right_end = call_ranges[2]
+left_start, left_end = call_ranges[1]
+right_start, _ = call_ranges[2]
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@adit/mass_transfer/tests/test_processor.py` around lines 350 - 369, Remove the no-op unpack and the unused `right_end` by deleting the line "_, _ = call_ranges[0]" and change the third unpack to discard the unused end value (e.g., unpack `call_ranges[2]` as `right_start, _`) so only `left_start, left_end` and `right_start` are assigned; this targets the test in test_processor.py that patches MassTransferTaskProcessor._find_studies and uses the `call_ranges` list.
adit/mass_transfer/processors.py
Outdated
```python
for _, study_volumes in volumes_by_study.items():
    pseudonym = ""
    if job.pseudonymize:
        existing_pseudonym = next(
            (v.pseudonym for v in study_volumes if v.pseudonym),
            None,
        )
        pseudonym = existing_pseudonym or uuid.uuid4().hex
```
Pseudonym not persisted before _export_volume; re-run of a partially failed study may generate a new pseudonym.
When a study has two volumes and the first succeeds (pseudonym saved on volume.pseudonym inside _export_volume) but the second fails, a re-run will hit line 152 and recover the pseudonym from the first volume. However, the pseudonym is generated at line 155 and only persisted inside _export_volume (line 408). If the process crashes between line 155 and the first _export_volume call completing, the generated pseudonym is lost entirely and a new one will be created on retry, leading to inconsistent pseudonyms across volumes of the same study.
Consider persisting the pseudonym on all study volumes as soon as it is generated, before starting export.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@adit/mass_transfer/processors.py` around lines 148 - 155, When
job.pseudonymize is true and you generate a pseudonym for a study (inside the
volumes_by_study loop where pseudonym is computed), immediately assign and
persist that pseudonym to every Volume in study_volumes (set volume.pseudonym
and call the same persistence path used by _export_volume or the model/save
method) before invoking _export_volume for any single volume; this ensures the
generated pseudonym is stored atomically for the whole study and avoids
generating a new one on retry. Use the existing symbols volumes_by_study,
study_volumes, job.pseudonymize, volume.pseudonym and the persistence logic
currently used in _export_volume to perform the save.
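One possible shape for persisting the pseudonym up front, sketched with stand-in objects (a `pseudonym` field and a `save()` method). In Django, a single `queryset.update(pseudonym=...)` inside a transaction would be the more atomic variant of the same idea.

```python
import uuid


def ensure_study_pseudonym(study_volumes):
    """Assign one pseudonym to every volume of a study and persist it up front.

    study_volumes are assumed to be MassTransferVolume-like objects; persisting
    before any export starts means a crash mid-study cannot lose the pseudonym.
    """
    pseudonym = next((v.pseudonym for v in study_volumes if v.pseudonym), None)
    pseudonym = pseudonym or uuid.uuid4().hex
    for volume in study_volumes:
        if volume.pseudonym != pseudonym:
            volume.pseudonym = pseudonym
            volume.save()  # persisted before _export_volume runs
    return pseudonym
```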
Code Review: Mass Transfer Feature

Thank you for this substantial contribution! The Mass Transfer feature brings valuable functionality from the volume-export-pipeline into ADIT. The code is generally well-structured and the latest commit fixing duplicate studies shows good problem-solving. However, I've identified several issues that should be addressed.

Critical Issues (Must Fix Before Merge)

1. N+1 Database Query Problem

Location: adit/mass_transfer/processors.py:269-302

The _find_volumes method performs get_or_create inside nested loops, potentially causing thousands of individual database queries. For 10 filters × 100 studies × 10 series = 10,000 queries. This will cause severe performance degradation.

Fix: Use bulk_create with ignore_conflicts=True and pre-fetch existing volumes into a dictionary.

2. No Maximum Recursion Depth

Location: adit/mass_transfer/processors.py:308-349

The _find_studies recursive splitting has no depth limit. A PACS returning thousands of studies from a single day could cause stack overflow.

Fix: Add recursion depth tracking with a maximum depth limit (e.g., 10 levels).

High Priority Issues

3. Fragile Pseudonym Consistency

Location: adit/mass_transfer/processors.py:148-156, 387-410

If a volume fails before setting its pseudonym, subsequent volumes in the same study get different pseudonyms, breaking the per-study guarantee.

Recommendation: Store pseudonym at the study level or generate once per study before processing any volumes.

4. Missing Test Coverage

Gaps: Filter matching edge cases, series number filtering with None values, error handling for dcm2niix failures, integration tests with actual DICOM queries.

Good coverage for: Time window splitting, deduplication, cleanup, partitions.

5. No Date Range Validation in Forms

Location: adit/mass_transfer/forms.py:101-104, 220-226

Forms allow selecting arbitrary date ranges (e.g., 10 years), which could create thousands of tasks and overwhelm the system.

Fix: Add maximum date range validation (e.g., 1 year) in the form's clean method.

Medium Priority Issues

6. Cleanup State Tracking Uses String Suffix

Location: adit/mass_transfer/processors.py:457-474

Uses string suffix matching which could fail if a folder legitimately ends with "(cleaned)".

Recommendation: Add an explicit cleanup_completed boolean field.

7. No Volume Error Recovery

Location: adit/mass_transfer/models.py:145-149

Once a volume enters ERROR state, there's no retry mechanism.

8. Unnecessary Exception Handling

Location: adit/mass_transfer/processors.py:258-263

Exception handling appears unnecessary since both values are already integers or None.

Minor Issues

Positives

Great work on:

Summary

Overall Assessment: This PR introduces valuable functionality with generally good code quality. The duplicate studies fix demonstrates strong problem-solving. However, the N+1 query issue is critical and will cause severe performance problems with real-world datasets.

Recommended Action: Address the two critical issues, then re-review. High-priority issues should also be addressed, but could potentially be handled in follow-up PRs if there's urgency.

Let me know if you'd like help implementing any of these fixes!
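The bulk pattern suggested in point 1 of the review above can be sketched framework-free. Here `bulk_create` is a stand-in for `MassTransferVolume.objects.bulk_create(..., ignore_conflicts=True)`, and rows are modeled as plain dicts keyed by `series_instance_uid`; the real code would pre-fetch existing volumes for the job in one query.

```python
def sync_volumes(existing_by_uid: dict, found_series: list[dict], bulk_create) -> list[dict]:
    """Replace per-row get_or_create with one pre-fetch + one bulk insert.

    existing_by_uid maps series_instance_uid -> already-stored row (pre-fetched
    in a single query); bulk_create performs a single batched INSERT.
    """
    to_create = []
    for series in found_series:
        uid = series["series_instance_uid"]
        if uid in existing_by_uid:
            # Update the in-memory row; a later bulk_update could persist these
            existing_by_uid[uid].update(series)
        else:
            to_create.append(series)
    if to_create:
        bulk_create(to_create)  # one INSERT instead of N round trips
    return to_create
```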
.github/workflows/ci.yml
Outdated
```yaml
adit_dev-web:latest
adit_dev-default_worker:latest
adit_dev-dicom_worker:latest
adit_dev-mass_transfer_worker:latest
```
It's still not using the dicom worker. I thought we talked about using the dicom worker instead.
Yeah, we switched to the dicom worker. This line is a leftover from when I had implemented its own worker; I will remove it. But if you look at the compose file, there are only dicom_worker and default_worker, no mass transfer worker.
```python
MASS_TRANSFER_MAX_SEARCH_RESULTS = env.int("MASS_TRANSFER_MAX_SEARCH_RESULTS", default=200)

# Base directory for temporary DICOM exports in mass transfer
MASS_TRANSFER_EXPORT_BASE_DIR = env.str(
```
What is this for? Why don't we use the normal destination (just filtered to only allow directories)?
I did not want to use the same folder for the conversion (even with a subdirectory): cleanup in case of failure can be quite messy, and it's easy to make mistakes, delete the wrong stuff, etc.

But yes, I get your point; having a separate env var just for a temp directory is definitely overkill.

Here is the thing, though: if we do it with tempfile.mkdtemp(), we have a separate list of issues to take care of. mkdtemp() generates a different random path each time, so if the worker dies before we clean up, we cannot clean up the old path, unless of course we record it, but that just adds more complexity.

Another drawback of using /tmp is that since we are in a container, and containers will generally write their tmp files to the host filesystem, space can also be an issue. (I saw that CTs with thin slices can be quite large, up to 3 GB.) If mass transfer is running on all three replicas, in the worst case you have a demand of 3 GB (for a thin-slice CT) per DICOM worker, so that is 9 to 10 GB on the host filesystem. That is not a big deal, but this requirement is hidden in the /tmp usage, and I am not sure how much of a faux pas that is.

One long-term point is that we might at some point want the processing to happen on a local path backed by an SSD, while the DicomFolder is probably a NAS or something.

Overall, I am not sure what to do. What do you think?
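One possible middle ground, sketched here with an illustrative layout (the base path and naming scheme are assumptions, not ADIT's actual settings): keep a fixed, configurable base directory but create per-task subdirectories with `tempfile.mkdtemp`, so a startup sweep can find and remove anything a crashed worker left behind.

```python
import tempfile
from pathlib import Path


def create_export_dir(base_dir: str, task_id: int) -> Path:
    """Create a per-task export directory under a fixed, known base.

    Unlike a bare tempfile.mkdtemp() in /tmp, the fixed base means stale
    directories from crashed workers stay discoverable and sweepable, and the
    base can be mounted on whatever disk has the capacity (NAS, local SSD).
    """
    base = Path(base_dir)
    base.mkdir(parents=True, exist_ok=True)
    # The prefix encodes the task id, so a sweep can match leftover
    # directories against finished or dead tasks.
    return Path(tempfile.mkdtemp(prefix=f"task_{task_id}_", dir=base))
```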
Code Review: Mass Transfer App

This is a substantial addition to ADIT, introducing a well-structured mass transfer workflow. The overall architecture follows established project patterns and the code quality is generally good. Below are findings organized by severity.

Bugs / Correctness Issues

1. In

2. Potential race condition in

In

volume, created = MassTransferVolume.objects.get_or_create(
job=job,
series_instance_uid=series_uid,
defaults={...},
)
if not created:
# manually update all fields
volume.save()

If two tasks run concurrently (e.g., overlapping partitions), both can call

3.

volume.converted_file = str(output_path / f"{series_name}.nii.gz")
4. if not export_folder or export_folder.endswith(" (cleaned)"):
return
...
volume.exported_folder = f"{export_folder} (cleaned)"

This is fragile — it embeds state in a path field. Use the existing

5.

models.UniqueConstraint(
fields=["owner", "name"],
name="mass_transfer_filter_unique_owner_name",
)

In PostgreSQL,

Design / Architecture Concerns

6.
7. Duplicate Procrastinate task functions with identical bodies

def process_dicom_task(context, model_label, task_id):
_run_dicom_task(context, model_label, task_id)
def process_mass_transfer_task(context, model_label, task_id):
_run_dicom_task(context, model_label, task_id)

These are identical in behavior but exist to route to different queues (valid Procrastinate pattern). Please add a comment explaining why two separate task functions exist, to avoid future confusion.

8.

def _volume_export_path(base_dir, study_dt, subject_id, series_name) -> Path:
year_month = study_dt.strftime("%Y%m")
return base_dir / year_month / subject_id / series_name
def _volume_output_path(base_dir, study_dt, subject_id, series_name) -> Path:
year_month = study_dt.strftime("%Y%m")
return base_dir / year_month / subject_id / series_name

These are byte-for-byte identical. Merge into one function.

9.
Code Quality / Style

10. Five separate migrations for a new app

The PR introduces 5 migrations (0001–0005) for a brand-new app with no production data. These should be squashed into a single

11.

# In MassTransferJobForm.save():
job.convert_to_nifti = True

If mass transfer always converts to NIfTI, remove the field from this model or make it a class-level constant/property, rather than silently overriding it on every form save.

12.
Test Coverage Gaps

13. No test for

The

14. No test for

The tests mock

15. No test for

The per-user filter name uniqueness validation is untested.

Minor Issues
Summary

This is a solid first implementation that follows ADIT's established patterns well. The partitioning logic, pseudonymization grouping by study, and cleanup-on-failure handling are particularly thoughtful. The main items to address before merging:
Code Review — Mass Transfer App

This is a substantial, well-structured addition. The architecture follows existing patterns (Job→Task→Processor, Procrastinate queues, shared admin/view/filter base classes) and the recursive time-window splitting for large DICOM result sets is a solid design. Below are the issues I found, ordered by severity.

Critical

1.
# processors.py — current
output_file = output_dir / f"{series_name}.nii.gz"
if not output_file.exists():
raise RuntimeError(...)
volume.converted_file = str(output_file)

Replace with a glob to find whatever

nii_files = list(output_dir.glob("*.nii.gz"))
if not nii_files:
raise RuntimeError("dcm2niix produced no .nii.gz output")
volume.converted_file = str(nii_files[0])  # or handle multiple

2. Race condition in pseudonym assignment across partitions
Fix: Before processing any volumes, do a single DB-level pseudonym assignment per

3. Potential infinite recursion in

The recursion guard is

def _find_studies(self, ..., _depth: int = 0) -> list[Dataset]:
if _depth > 20:
raise RuntimeError("_find_studies exceeded maximum recursion depth")
...
left = self._find_studies(..., _depth=_depth + 1)
right = self._find_studies(..., _depth=_depth + 1)

4.
def _cleanup_export(self, volume: MassTransferVolume) -> None:
if volume.export_cleaned:
return
shutil.rmtree(volume.exported_folder, ignore_errors=True)
volume.export_cleaned = True
volume.save(update_fields=["export_cleaned"])

High Priority

5. In

self.save_tasks = self._save_tasks

Django's own pattern for this is to extend

6.

# current
return bool(regex.search(pattern, value))

DICOM wildcard (

dicom_pattern = re.sub(r"\*", ".*", re.sub(r"\?", ".", re.escape(pattern_raw)))
return bool(re.fullmatch(dicom_pattern, value, re.IGNORECASE))

7.

mid = start + (end - start) / 2
left = self._find_studies(operator, start, mid, ...)
right = self._find_studies(operator, mid, end, ...)

Studies with

8. No disk space check before export
Medium Priority9. The comment in 10. Missing minimum-filter validation at form level
def clean_filters(self):
filters = self.cleaned_data.get("filters", [])
if not filters:
raise ValidationError("At least one filter is required.")
...

11. Five separate migrations for a new app

New apps should ship with a single squashed initial migration. Having five migration files for brand-new models is unnecessary churn. Run

12.

job.convert_to_nifti = True

The field exists in the model and there is a UI checkbox for it, but the form's

13.
Minor / Nits

14.

study_time_end = time.max  # time(23, 59, 59, 999999)

DICOM time fields have no microsecond resolution. Use

15. Dead

The

16. Identical

Both return the same value. Consolidate into one

17.

def __str__(self):
return self.name or "<unnamed filter>"
18. The helper constructs a

Positive Notes
The most important issues to address before merging are #1 (broken NIfTI path), #2 (pseudonym race condition), #6 (incorrect regex semantics), and #7 (split boundary overlap).
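For the wildcard-semantics issue in point 6 of the review above, an escape-aware translation could look like the following sketch. The function names are not from the codebase, and escaping first (then swapping the escaped wildcards back) is what keeps other regex metacharacters in the pattern literal.

```python
import re


def dicom_wildcard_to_regex(pattern: str) -> re.Pattern:
    """Translate a DICOM wildcard pattern (* and ?) into an anchored regex.

    re.escape neutralizes regex metacharacters first; only the escaped
    wildcards are then converted back into their regex equivalents.
    """
    escaped = re.escape(pattern)
    translated = escaped.replace(r"\*", ".*").replace(r"\?", ".")
    return re.compile(translated, re.IGNORECASE)


def matches(pattern: str, value: str) -> bool:
    # fullmatch anchors the pattern, matching DICOM's whole-value semantics
    return dicom_wildcard_to_regex(pattern).fullmatch(value) is not None
```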
The recursive split used `mid` as the boundary for both halves, causing studies at the midpoint to appear in both. Additionally, since the DICOM query operates at date-level granularity, same-day splits produced identical queries returning the same results in both halves. Shift the right half to start at mid + 1s and deduplicate by StudyInstanceUID when merging.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
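The fix described in that commit message can be sketched as follows (hypothetical helper names, with studies modeled as plain dicts carrying a `StudyInstanceUID` key):

```python
from datetime import datetime, timedelta


def split_range(start: datetime, end: datetime):
    """Split [start, end] into two halves; the right half starts at mid + 1s
    so the midpoint cannot land in both queries."""
    mid = start + (end - start) / 2
    return (start, mid), (mid + timedelta(seconds=1), end)


def merge_unique(left: list[dict], right: list[dict]) -> list[dict]:
    """Merge study lists, dropping duplicates by StudyInstanceUID (belt and
    braces for date-granularity queries that still overlap)."""
    seen: set[str] = set()
    merged: list[dict] = []
    for study in left + right:
        uid = study["StudyInstanceUID"]
        if uid not in seen:
            seen.add(uid)
            merged.append(study)
    return merged
```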
80f228a to f4925c2
This PR introduces a new Mass Transfer facility that brings the volume export workflow (https://github.com/openradx/volume-export-pipeline) into ADIT as a first-class transfer type, just like Selective Transfer or Batch Transfer. Mass transfer has its own worker to keep long-running, low-priority tasks isolated from regular selective transfer or batch transfer tasks.
Added a Nav bar item called Mass Transfer for transferring large amounts of reports.
Implemented features like mandatory DICOM file cleanup after NIfTI conversion and per-study pseudonymization, just like in the volume export pipeline.
Also added daily/weekly partitioning and reusable filter definitions so bulk exports can be segmented in time and re‑run consistently.
Users can share the filters with each other. We can apply group restrictions later.
Summary by CodeRabbit
New Features
Bug Fixes
Tests
Documentation
Chores