Skip to content

Conversation

jeffe107
Copy link
Member

Hello,

I apologize for creating a new PR, I couldn't force the previous one to take the changes from dev branch. I include again my message, and in the next comment I will post your comments from the previous PR.

According to issue [#859 ] to produce the file that is used as input for BIgMAG I have included the following:

I didn't want to mess with the current state of the pipeline, so I included a subworkflow that handles the execution in this manner:

  • If BUSCO (default tool) is selected as QC tool, it will trigger CheckM2. If the user selects CheckM or CheckM2, BUSCO will be executed then.
  • It runs GUNC by default. It is aware that GUNC can be run within the subworkflow BIN_QC, so it adjusted to that situation.
  • It takes as input the global bin_summary file.
  • Two modules were added: concat_bigmag.nf and bigmag_summary.nf. The first performs the merge of the TSV files just like CSVTK_CONCAT. I had to create it because if I call the same module, it would overwrite the summary file in the GenomeBinning folder, but in essence they are the same module. The second one joins all of the files (bin_summary, alternative_tool and gunc_summary). bigmag_summary.nf executes the script bigmag_summary.py found in the folder bin.

PR checklist

  • [ ✓] This comment contains a description of changes (with reason).
  • If you've fixed a bug or added code that should be tested, add tests!
  • [ ✓] If you've added a new tool - have you followed the pipeline conventions in the contribution docs
  • If necessary, also make a PR on the nf-core/mag branch on the nf-core/test-datasets repository.
  • Make sure your code lints (nf-core pipelines lint).
  • Ensure the test suite passes (nextflow run . -profile test,docker --outdir <OUTDIR>).
  • Check for unexpected warnings in debug mode (nextflow run . -profile debug,test,docker --outdir <OUTDIR>).
  • [ ✓] Usage Documentation in docs/usage.md is updated.
  • [ ✓] Output Documentation in docs/output.md is updated.
  • [ ✓] CHANGELOG.md is updated.
  • [ ✓] README.md is updated (including new tool citations and authors/contributors).

@jeffe107
Copy link
Member Author

@harper357 I think your PR #842 might be relevant for this

No pressure (just trying to work out priorities), but I was wondering how far off you think your PR would be?

If so we could maybe get that in first and it might make @jeffe107 's life easier?

What do you think? If you think it'll take you a while still then we can just proceed with Jeferyd's system here and later on once your PR is merged in change the logic

@jfy133 I am hoping to have it done this week as long as life stuff doesn't eat up all my time. I will try to look at BIgMAG to make sure I am not doing anything that will make it harder to add it.

@jfy133
Copy link
Member

jfy133 commented Sep 17, 2025

@nf-core-bot fix linting

Copy link
Member

@jfy133 jfy133 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

praise

First pass for checking adherence to nf-core guidelines/the mag conventions (i.e., without testing), but it looks really good for a first contribution, thank you @jeffe107 👏

@@ -0,0 +1,55 @@
process CONCAT_BIGMAG {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

question(if-minor)
Is this a duplicate of modules/nf-core/csvtk/concat? If so, you don't need to write it again, you can just import the module in your subworkflow and rename it with the as alias (see the subworkflows/local/assembly_shortread` for how we do that to import SPADES as METASPADES

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes and no. It does exactly the same, the difference relies on that modules/nf-core/csvtk/concat takes binqc_tool as parameter to generate the output file. If I call the same module on bigmag subworkflow, it would overwrite the file generated during BIN_QC subworkflow. As a result, I created CONCAT_BIGMAG to rename the binqc_tool and create the file for the tool that bigmag subworkflow is executing.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where would it re-write the file? You mean in --outdir? If that's the case we can just change the resulting file's name using publishDir for an aliased CSVTK module in modules.config :)

Unless I misunderstand?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, you are right, it would overwrite the file in --outdir. So, great, I agree with the solution you propose.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For other reviewers: this is generally redundant, but we will wait for @harper357 's PR before we simplify this subworkflow

@prototaxites
Copy link
Contributor

prototaxites commented Sep 19, 2025

Hi all, a couple of initial thoughts on a quick passs-through:

If BUSCO (default tool) is selected as QC tool, it will trigger CheckM2. If the user selects CheckM or CheckM2, BUSCO will be executed then.

It runs GUNC by default. It is aware that GUNC can be run within the subworkflow BIN_QC, so it adjusted to that situation.

I would prefer if we just transparently refused to run the subworkflow if the required tools are not enabled. We've gone to some effort to have most of the tools be configurable with flags, and I wouldn't want to pre-empt the user's choices in this matter. We should just print a warning if the BigMag subworkflow can't run.

Two modules were added: concat_bigmag.nf and bigmag_summary.nf. The first performs the merge of the TSV files just like CSVTK_CONCAT. I had to create it because if I call the same module, it would overwrite the summary file in the GenomeBinning folder, but in essence they are the same module. The second one joins all of the files (bin_summary, alternative_tool and gunc_summary). bigmag_summary.nf executes the script bigmag_summary.py found in the folder bin.

Unless I'm missing something, as James says, you can definitely just re-use CSVTK_CONCAT. You just import it:

include { CSVTK_CONCAT as CSVTK_CONCAT_BIGMAG } from '../../../modules/nf-core/csvtk/concat/main'

And then you can reference CSVTK_CONCAT_BIGMAG in the config (and it shouldn't pick up the config from other CSVTK_CONCAT processes in the pipeline).

@jfy133
Copy link
Member

jfy133 commented Oct 3, 2025

OK I'm back from the nf-core core team retreat and a conference, sorry for the delay and thanks for the patience @jeffe107 !

Looking through this PR again, I do think we should work off of @harper357 's PR. I've offered to take that over so we can get that in and have something easier for you to work with. So I'll focus on that next week and then we can re-jig this PR based on those changes :)

And I agree with @prototaxites that I (once Greg's PR is in), that we should just error on pipeline initialisation if BIgMAG is requested but the necessary QC tools are not activated.

@jeffe107
Copy link
Member Author

jeffe107 commented Oct 3, 2025

I agree with @jfy133 PR #842 can easily solve the logic to generate the file for BIgMAG. Let's wait then to solve that one to proceed with this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants