
feat: implement analysis step and refactor pipeline to data-staging pattern #35

Open

kenibrewer wants to merge 43 commits into dev from ken-brewer/feat-analysis


kenibrewer (Member) commented Apr 1, 2026

Summary

Implement the CellProfiler analysis step and refactor the entire pipeline to use the data-staging pattern from nf-pooled-cellpainting. Also syncs with nf-core TEMPLATE v3.5.1.

Feature changes

  • Implement per-site CellProfiler analysis module with full segmentation + feature extraction
  • Refactor all CellProfiler modules to generate load_data.csv at runtime via Python bin/ scripts that read a metadata JSON (replacing the pre-generated CSV subworkflows)
  • Wire CYTOTABLE downstream of analysis for Parquet conversion
  • Assay development now runs in both modes (QC gate before analysis)
  • Add cellprofiler_assaydevelopment_site parameter for site selection
  • Add cellprofiler_mode enum validation (assay_development | analysis)
  • Wire up PIPELINE_COMPLETION subworkflow
  • Remove unused stop_after parameter
  • Add real .cppipe pipeline files for assay development and analysis
  • Complete publishDir configurations for all processes
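A minimal Python sketch of what generating load_data.csv at runtime from a metadata JSON could look like. The JSON layout (a list of records with `plate`, `well`, `site`, `channel`, `filename` keys) and the function name `build_load_data` are hypothetical assumptions, not the actual schema or code of the bin/ scripts; only the `Metadata_*`, `FileName_*`, and `PathName_*` column prefixes follow CellProfiler's LoadData convention.

```python
import csv
import io
import json
from collections import defaultdict


def build_load_data(metadata_json: str, image_dir: str) -> str:
    """Return load_data.csv text with one row per (plate, well, site).

    Hypothetical input: a JSON list of records with 'plate', 'well',
    'site', 'channel' and 'filename' keys. Raises KeyError if a site
    is missing one of the channels seen elsewhere.
    """
    records = json.loads(metadata_json)

    # Group channel images by imaging site.
    sites = defaultdict(dict)
    channels = set()
    for rec in records:
        key = (rec["plate"], rec["well"], str(rec["site"]))
        sites[key][rec["channel"]] = rec["filename"]
        channels.add(rec["channel"])
    channels = sorted(channels)

    header = ["Metadata_Plate", "Metadata_Well", "Metadata_Site"]
    for ch in channels:
        header += [f"FileName_Orig{ch}", f"PathName_Orig{ch}"]

    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(header)
    # Sort sites so the CSV is byte-identical across runs.
    for key in sorted(sites):
        row = list(key)
        for ch in channels:
            row += [sites[key][ch], image_dir]
        writer.writerow(row)
    return out.getvalue()
```

Generating the CSV inside the task, rather than in a separate subworkflow, keeps the file paths relative to the staged working directory, which is the point of the data-staging pattern.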

Files changed by feature work

New files:

  • bin/generate_illumination_calc_csv.py — CSV generator for illumination correction (reads metadata JSON)
  • bin/generate_illumination_apply_csv.py — CSV generator for assay dev + analysis (paired orig/illum columns)
  • assets/cellprofiler/analysis.cppipe — CellProfiler analysis pipeline (from ErinWeisbart/nf-core-test-datasets)
  • assets/cellprofiler/assaydevelopment.cppipe — CellProfiler assay dev pipeline (from #30, Add Assay Development to workflow)
  • AGENTS.md / CLAUDE.md — development instructions for AI coding assistants
  • modules/local/cellprofiler/analysis/tests/main.nf.test — stub test
  • modules/local/cellprofiler/assaydevelopment/tests/main.nf.test — stub test
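The "paired orig/illum columns" written by generate_illumination_apply_csv.py can be pictured with the sketch below. It assumes, hypothetically, that illumination functions are named `<plate>_Illum<channel>.npy` (consistent with the `*_Illum*.npy` glob mentioned later in this PR, but not confirmed by it), and that each site arrives as a dict of channel to file name; `build_illum_apply_csv` and `index_illum_files` are illustrative names, not the script's real functions.

```python
import csv
import io
import re

# Hypothetical naming scheme: "<plate>_Illum<channel>.npy".
ILLUM_RE = re.compile(r"^(?P<plate>.+)_Illum(?P<channel>[A-Za-z0-9]+)\.npy$")


def index_illum_files(filenames):
    """Map (plate, channel) -> illumination .npy file name; ignore other files."""
    index = {}
    for name in filenames:
        m = ILLUM_RE.match(name)
        if m:
            index[(m["plate"], m["channel"])] = name
    return index


def build_illum_apply_csv(sites, illum_filenames, image_dir, illum_dir):
    """sites: list of dicts with 'plate', 'well', 'site', 'images' (channel -> file)."""
    illum = index_illum_files(illum_filenames)
    channels = sorted({ch for s in sites for ch in s["images"]})

    # Each channel contributes an Orig pair and a matching Illum pair.
    header = ["Metadata_Plate", "Metadata_Well", "Metadata_Site"]
    for ch in channels:
        header += [f"FileName_Orig{ch}", f"PathName_Orig{ch}",
                   f"FileName_Illum{ch}", f"PathName_Illum{ch}"]

    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(header)
    for s in sorted(sites, key=lambda s: (s["plate"], s["well"], str(s["site"]))):
        row = [s["plate"], s["well"], s["site"]]
        for ch in channels:
            row += [s["images"][ch], image_dir, illum[(s["plate"], ch)], illum_dir]
        writer.writerow(row)
    return out.getvalue()
```

Matching on an explicit pattern means unrelated files staged into the same directory are silently skipped rather than misread as illumination functions.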

Rewritten:

  • workflows/cellpainting.nf — inline channel grouping/joining, analysis + CYTOTABLE wiring, deterministic sorting
  • modules/local/cellprofiler/illuminationcorrection/main.nf — metadata JSON input, generates own CSV
  • modules/local/cellprofiler/assaydevelopment.nf — real CellProfiler command, data-staging pattern
  • modules/local/cellprofiler/analysis.nf — full rewrite from samtools stub to per-site analysis
  • modules/local/cytotable/main.nf — accepts directory input, added stub block
  • main.nf — pass new params, wire PIPELINE_COMPLETION, remove unused import

Updated:

  • conf/modules.config — publishDir for all CellProfiler steps and cytotable
  • conf/test.config — default to analysis mode
  • nextflow.config — add cellprofiler_assaydevelopment_site, remove stop_after
  • nextflow_schema.json — fix malformed defaults, add enum, add site param, remove stop_after
  • .nf-core.yml — template_strings ignore list
  • .gitignore — add docs/superpowers/
  • assets/samplesheet.csv — remove unused columns
  • modules/local/cellprofiler/illuminationcorrection/tests/main.nf.test — updated input signature
  • modules/local/cytotable/tests/main.nf.test — added stub test

Deleted:

  • subworkflows/local/cellprofiler_load_data_csv/ (replaced by inline channel ops + bin scripts)
  • subworkflows/local/cellprofiler_load_data_csv_with_illum/ (replaced by inline channel ops + bin scripts)

Files changed by nf-core TEMPLATE sync (v3.3.2 → v3.5.1)

  • .devcontainer/devcontainer.json, .devcontainer/setup.sh
  • .github/actions/nf-test/action.yml
  • .github/workflows/ (awsfulltest, awstest, clean-up, download_pipeline, fix_linting, linting, linting_comment, nf-test, release-announcements, template-version-comment)
  • .gitpod.yml (removed)
  • .pre-commit-config.yaml, .prettierignore
  • README.md, docs/usage.md
  • modules.json, ro-crate-metadata.json
  • modules/nf-core/multiqc/ (environment.yml, main.nf, tests/)
  • subworkflows/nf-core/ (utils_nextflow_pipeline, utils_nfcore_pipeline, utils_nfschema_plugin)
  • subworkflows/local/utils_nfcore_cellpainting_pipeline/main.nf (added help params)
  • tests/.nftignore, tests/default.nf.test

Closes

Test plan

  • nf-test passes in stub mode
  • Full pipeline run with real test data succeeds (24/24 tasks)
  • Illumination correction module test passes (real + stub)
  • Assay development module test passes (stub)
  • Analysis module test passes (stub)
  • CYTOTABLE module test passes (real + stub)
  • All CellProfiler steps produce expected outputs
  • CYTOTABLE produces valid Parquet files (5-14MB each)
  • MultiQC report generated successfully
  • CI tests pass

🤖 Generated with Claude Code

nf-core-bot and others added 28 commits October 16, 2025 13:38
Design for implementing the CellProfiler analysis step while refactoring
the pipeline to use the data-staging pattern from nf-pooled-cellpainting.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
12-task plan covering full pipeline refactor to data-staging pattern
plus new CellProfiler analysis module.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add stub mode to nf-test
- Add stub block to CYTOTABLE module
- Regenerate snapshots for new process structure

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ysis

Assay development now always runs (as a QC step), not just in
assay_development mode. Analysis + CYTOTABLE only run when
cellprofiler_mode is 'analysis'. Test profile updated accordingly.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
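The mode gating described in this commit can be summarised as a small lookup. This is a Python sketch only, not the workflow's Nextflow logic; `steps_for_mode` and the step names are illustrative, and the assumption that illumination correction precedes both modes is inferred from the PR summary. In the pipeline itself, invalid values are rejected by the enum in nextflow_schema.json via nf-schema.

```python
VALID_MODES = ("assay_development", "analysis")


def steps_for_mode(mode):
    """Return the CellProfiler steps that run for a given cellprofiler_mode."""
    if mode not in VALID_MODES:
        raise ValueError(f"cellprofiler_mode must be one of {VALID_MODES}, got {mode!r}")
    # Assay development always runs, acting as a QC gate.
    steps = ["illumination_correction", "assay_development"]
    # Analysis and CYTOTABLE run only in analysis mode.
    if mode == "analysis":
        steps += ["analysis", "cytotable"]
    return steps
```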
Strip columns not defined in schema_input.json to eliminate
nf-schema validation warnings.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- assaydevelopment.cppipe from #30
- analysis.cppipe from cpg0000-jump-pilot (CPJUMP1_analysis_without_batchfile)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Allows --image-directory CLI flag to override the image path.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Must match the -g flag passed on the CellProfiler CLI.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
….cppipe

Match channel names from our samplesheet.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Adapted Base image location to work with --image-directory CLI flag.
Channel names already match our samplesheet.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Bump to 8 CPUs, 24GB memory to accommodate CellProfiler analysis.
Note: Docker Desktop must also be configured with sufficient memory
(recommend >= 16GB in Settings > Resources).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
MultiQC module updated to v1.33 with single tuple input.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
groupTuple() doesn't guarantee ordering, so the metadata JSON
in the script block could vary between runs, breaking -resume
caching. Sort by filename at the Nextflow level (for cache key
stability) and in Python scripts (belt-and-suspenders).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
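Why sorting restores -resume caching can be shown with a Python analogy: if the grouped records are embedded in the script text in arrival order, two runs that receive the same files in a different order produce different text and therefore different hashes. The sketch below does not reproduce Nextflow's own task hashing; `cache_key` is a hypothetical helper that only demonstrates the principle.

```python
import hashlib
import json


def cache_key(images):
    """Hash a group of image records after sorting them by filename.

    The same set of files always yields the same key, regardless of
    the order in which they were grouped.
    """
    ordered = sorted(images, key=lambda rec: rec["filename"])
    payload = json.dumps(ordered, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()
```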
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
nf-core-bot (Member) commented Apr 1, 2026

Warning

Newer version of the nf-core template is available.

Your pipeline is using an old version of the nf-core template: 3.5.1.
Please update your pipeline to the latest version.

For more documentation on how to update your pipeline, please see the nf-core documentation and Synchronisation documentation.

github-actions (Bot) commented Apr 1, 2026

nf-core pipelines lint overall result: Passed ✅ ⚠️

Posted for pipeline commit 6ad4884

+| ✅ 204 tests passed       |+
#| ❔  11 tests were ignored |#
#| ❔   1 tests had warnings |#
!| ❗  29 tests had warnings |!

❗ Test warnings:

  • readme - README contains the placeholder zenodo.XXXXXXX. This should be replaced with the zenodo doi (after the first release).
  • pipeline_todos - TODO string in README.md: TODO nf-core:
  • pipeline_todos - TODO string in README.md: Include a figure that guides the user through the major workflow steps. Many nf-core
  • pipeline_todos - TODO string in README.md: Fill in short bullet-pointed list of the default steps in the pipeline
  • pipeline_todos - TODO string in README.md: Describe the minimum required steps to execute the pipeline, e.g. how to prepare samplesheets.
  • pipeline_todos - TODO string in README.md: update the following command to include all required parameters for a minimal example
  • pipeline_todos - TODO string in README.md: If applicable, make list of people who have also contributed
  • pipeline_todos - TODO string in README.md: Add citation for pipeline after first release. Uncomment lines below and update Zenodo doi and badge at the top of this file.
  • pipeline_todos - TODO string in README.md: Add bibliography of tools and data used in your pipeline
  • pipeline_todos - TODO string in nextflow.config: Specify your pipeline's command line flags
  • pipeline_todos - TODO string in nextflow.config: Optionally, you can add a pipeline-specific nf-core config at https://github.com/nf-core/configs
  • pipeline_todos - TODO string in nextflow.config: Update the field with the details of the contributors to your pipeline. New with Nextflow version 24.10.0
  • pipeline_todos - TODO string in base.config: Check the defaults for all processes
  • pipeline_todos - TODO string in base.config: Customise requirements for specific processes.
  • pipeline_todos - TODO string in test.config: Specify the paths to your test data on nf-core/test-datasets
  • pipeline_todos - TODO string in test.config: Give any required params for the test so that command line flags are not needed
  • pipeline_todos - TODO string in test_full.config: Specify the paths to your full test data ( on nf-core/test-datasets or directly in repositories, e.g. SRA)
  • pipeline_todos - TODO string in test_full.config: Give any required params for the test so that command line flags are not needed
  • pipeline_todos - TODO string in main.nf: Optionally add in-text citation tools to this list.
  • pipeline_todos - TODO string in main.nf: Optionally add bibliographic entries to this list.
  • pipeline_todos - TODO string in main.nf: Only uncomment below if logic in toolCitationText/toolBibliographyText has been filled!
  • pipeline_todos - TODO string in awsfulltest.yml: You can customise AWS full pipeline tests as required
  • pipeline_todos - TODO string in methods_description_template.yml: #Update the HTML below to your preferred methods description, e.g. add publication citation for this pipeline
  • pipeline_todos - TODO string in nextflow.config: Specify any additional parameters here
  • pipeline_todos - TODO string in usage.md: Add documentation about anything specific to running your pipeline. For general topics, please point to (and add to) the main nf-core website.
  • pipeline_todos - TODO string in output.md: Write this documentation describing your workflow's output
  • schema_description - No description provided in schema for parameter: cellprofiler_illumination_cppipe
  • schema_description - No description provided in schema for parameter: cellprofiler_analysis_cppipe
  • schema_description - No description provided in schema for parameter: cellprofiler_assaydevelopment_cppipe

❔ Tests ignored:

  • files_exist - File is ignored: conf/igenomes.config
  • files_exist - File is ignored: conf/igenomes_ignored.config
  • files_exist - File is ignored: conf/igenomes.config
  • files_exist - File is ignored: conf/igenomes_ignored.config
  • nextflow_config - Config default ignored: params.cellprofiler_illumination_cppipe
  • nextflow_config - Config default ignored: params.cellprofiler_analysis_cppipe
  • nextflow_config - Config default ignored: params.cellprofiler_assaydevelopment_cppipe
  • template_strings - Ignoring Jinja template strings in file /home/runner/work/cellpainting/cellpainting/AGENTS.md
  • template_strings - Ignoring Jinja template strings in file /home/runner/work/cellpainting/cellpainting/CLAUDE.md
  • template_strings - Ignoring Jinja template strings in file /home/runner/work/cellpainting/cellpainting/assets/cellprofiler/illumination.cppipe.jinja
  • template_strings - Ignoring Jinja template strings in file /home/runner/work/cellpainting/cellpainting/modules/local/cellprofiler/illuminationcorrection/main.nf

❔ Tests fixed:

✅ Tests passed:

Run details

  • nf-core/tools version 3.5.1
  • Run at 2026-04-02 00:05:53

kenibrewer and others added 15 commits April 1, 2026 10:33
…rflow

Groovy's JsonOutput.toJson causes StackOverflowError on Nextflow meta
maps in CI (smaller JVM stack). Extract only needed fields into plain
maps before serializing. Also update illumination correction module
test for new input signature and regenerate snapshots.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
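The fix of projecting only the needed fields into plain maps before serializing can be illustrated with a Python analogy. Here a self-reference stands in for whatever makes the full Nextflow meta map unsafe to serialize (the Groovy failure was a StackOverflowError, not this exact error); `to_plain_meta` and its default field list are hypothetical.

```python
import json


def to_plain_meta(meta, fields=("id", "plate", "well", "site")):
    """Copy only the named fields into a new plain dict that is safe to serialize."""
    return {k: meta[k] for k in fields if k in meta}


if __name__ == "__main__":
    meta = {"id": "P1_A01_1", "plate": "P1", "well": "A01", "site": 1}
    meta["self"] = meta  # stand-in for references that break full serialization
    try:
        json.dumps(meta)
    except ValueError as err:
        print("full meta fails:", err)
    print(json.dumps(to_plain_meta(meta)))
```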
- Extract sortGroupedImages closure to eliminate 3x duplicated sort block
- Filter assay dev images to target site BEFORE groupTuple (avoids
  grouping all sites only to discard them)
- Restore ch_multiqc_custom_config in MultiQC input (was silently dropped)
- Single-pass channel collection in generate_illumination_apply_csv.py
- Narrow glob pattern for .npy files (*_Illum*.npy)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
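The reordering in the second bullet (filter assay development images to the target site before grouping) can be sketched in Python as follows. The record keys and `group_assaydev_images` are hypothetical; in the workflow the equivalent steps are a channel filter followed by groupTuple().

```python
from collections import defaultdict


def group_assaydev_images(images, target_site):
    """Keep only images from target_site, then group filenames by (plate, well).

    Filtering first means images from other sites are never grouped,
    instead of being grouped and then discarded.
    """
    groups = defaultdict(list)
    for img in images:
        # Compare as strings so 1 and "1" are treated as the same site.
        if str(img["site"]) != str(target_site):
            continue
        groups[(img["plate"], img["well"])].append(img["filename"])
    # Sort members so each group's contents are deterministic.
    return {key: sorted(files) for key, files in sorted(groups.items())}
```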
- Replace f-string escaped braces in error message to avoid false
  positive Jinja detection
- Add design/plan docs to template_strings ignore list

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Design specs and plans are local development artifacts,
not pipeline deliverables.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
All CellProfiler modules must report the real version even in
stub mode for proper version tracking.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Keep both docs/superpowers/ and .superset/ entries.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Merge origin/TEMPLATE into feature branch. Resolves conflicts:
- .nf-core.yml: update to v3.5.1, keep our lint config
- modules/nf-core/multiqc: take template version (v1.32, old input sig)
- subworkflows/local/utils_nfcore_cellpainting_pipeline: take template
  version with help params, keep our samplesheet parsing
- Revert MULTIQC call to simple positional arguments
- modules.json, ro-crate-metadata.json: take template versions

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Contains commands, architecture overview, key conventions, and
template sync instructions for AI coding assistants.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Real run md5 unchanged. Stub md5 changed due to empty touch vs echo.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Fix nextflow_schema.json: correct ${projectDir] to ${projectDir},
  fix analysis_cppipe path, add .jinja extension to illumination default
- Add enum validation for cellprofiler_mode (assay_development|analysis)
- Remove stop_after param (declared but never used)
- Remove conda directives from assaydevelopment.nf and analysis.nf
  (no environment.yml in moduleDir)
- Remove unused CYTOTABLE import from main.nf
- Wire up PIPELINE_COMPLETION in main.nf
- Complete publishDir configs for all CellProfiler steps
- Add module-level stub tests for assaydevelopment, analysis, cytotable

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add AGENTS.md and CLAUDE.md to template_strings ignore list
- Sync multiqc module to nf-core/modules@dfa7f1a (v1.32, matches
  modules.json tracking)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Restore multiqc module files to match af27af1be706 from nf-core/modules,
which is what the TEMPLATE branch's modules.json tracks. Remove extra
files (conda-lock, custom_prefix.config, tags.yml) that don't exist
at that sha.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Comment thread nextflow_schema.json


Development

Successfully merging this pull request may close these issues: CytoTable, cellprofiler_analysis, cellprofiler_assaydevelopment

3 participants