ADR: module tool arguments #6770

Draft
pditommaso wants to merge 1 commit into master from 260128-tools-arguments

@pditommaso
Member

Summary

Introduces ADR for typed tool arguments in module definitions, replacing the convoluted task.ext.args pattern.

Problem: Current ext.args approach uses opaque strings with no validation, documentation, or IDE support.

Solution: Extend tools section in meta.yaml with typed args component:

tools:
  - bwa:
      args:
        K:
          type: integer
          prefix: '-'
        Y:
          type: boolean
          prefix: '-'

Key features:

  • Argument attributes: type, enum, prefix, default, description
  • Script usage: ${tools.bwa.args} for all args, ${tools.bwa.args.K} for single
  • Config: tools.bwa.args.K = 100000000
  • CLI: --tools.bwa.K=value
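
To make the proposal concrete, here is a minimal Python sketch of how a typed spec like the one above might be validated and rendered into a command-line fragment. It is not Nextflow code: `SPEC`, `PY_TYPES`, and `render_args` are hypothetical names, and the rendering rules (boolean flags carry no value, unknown names are rejected) are assumptions rather than anything the ADR specifies.

```python
# Hypothetical sketch: names and rules are assumptions, not the Nextflow API.
SPEC = {
    "K": {"type": "integer", "prefix": "-"},
    "Y": {"type": "boolean", "prefix": "-"},
}

PY_TYPES = {"integer": int, "boolean": bool, "string": str}


def render_args(spec, values):
    """Validate values against the spec and return a CLI fragment."""
    parts = []
    for name, value in values.items():
        if name not in spec:
            raise KeyError(f"unknown argument: {name}")
        entry = spec[name]
        expected = PY_TYPES[entry["type"]]
        # bool is a subclass of int in Python, so reject it explicitly for integers
        wrong_type = not isinstance(value, expected)
        if wrong_type or (expected is int and isinstance(value, bool)):
            raise TypeError(f"{name} expects {entry['type']}, got {value!r}")
        flag = entry["prefix"] + name
        if expected is bool:
            if value:
                parts.append(flag)  # boolean flags carry no value
        else:
            parts.append(f"{flag} {value}")
    return " ".join(parts)


print(render_args(SPEC, {"K": 100000000, "Y": True}))
```

In this sketch a mistyped value such as `{"K": "abc"}` fails early with a `TypeError`, which is the kind of validation an opaque `ext.args` string cannot offer.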

Open problems:

  • Subcommand argument collision (same option in different subcommands)

Related

  • Module System ADR: adr/20251114-module-system.md
  • Module Parameters ADR: adr/20260128-module-parameters.md

🤖 Generated with Claude Code

Introduces specification for typed tool arguments in module definitions:
- Extend tools section in meta.yaml with args component
- Argument attributes: type, enum, prefix, default, description
- Script usage via tools implicit variable
- Configuration and CLI override mechanisms
- Migration guide from ext.args pattern

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Signed-off-by: Paolo Di Tommaso <paolo.ditommaso@gmail.com>
@netlify

netlify bot commented Jan 28, 2026

Deploy Preview for nextflow-docs-staging canceled.

Name Link
🔨 Latest commit 0c38de2
🔍 Latest deploy log https://app.netlify.com/projects/nextflow-docs-staging/deploys/697a12f5a6cdbe0008ab9111

@ewels mentioned this pull request on Jan 28, 2026
@pinin4fjords
Contributor

The typed validation and IDE support would be really nice for common options.

However, the rationale behind ext.args in nf-core is deliberately to avoid maintaining option lists. From the module guidelines:

"The justification behind using ext.args is to provide more flexibility to users. As ext.args is derived from the configuration, advanced users can overwrite the default ext.args and supply their own arguments to modify the behaviour of a module. This can increase the capabilities of a pipeline beyond what the original developers intended."

This means users can pass any tool option without waiting for it to be enumerated, and enables patterns like dynamic closures and sample-specific args:

// Parameter-dependent
ext.args = { params.fastqc_kmer_size ? "-k ${params.fastqc_kmer_size}" : '' }

// Sample-specific
ext.args = { "--id ${meta.id}" }

My concern is how we match this flexibility without either:

  1. Keeping meta.yaml in sync with complex, shifting tool parameter landscapes, or
  2. Cutting off access to options that aren't enumerated

I noticed the earlier module-system ADR had this language: "This list does not need to be exhaustive. It should include any arguments known to be used by pipelines or that could be expected to be used by users." - but this ADR doesn't clarify what happens with unlisted options.

Could we take a **kwargs-style approach? In Python, functions can define explicit parameters while still accepting arbitrary additional arguments:

def func(a, b, **kwargs):
    # a and b are typed/documented
    # kwargs catches everything else
    return a, b, kwargs

Similarly:

tools.bwa.args.K = 100000000  // documented in meta.yaml → validated
tools.bwa.args.B = 3          // not in meta.yaml → passed through as-is

The challenge is command-line formatting - documented args have prefix defined in meta.yaml, but for undocumented args we wouldn't know how to format them. Options:

  1. Require prefix in the key for undocumented args:

    tools.bwa.args.K = 100000000       // documented → uses prefix from meta.yaml
    tools.bwa.args['-B'] = 3           // undocumented → prefix explicit in key

  2. Single passthrough field for pre-formatted args:

    tools.bwa.args.K = 100000000            // documented, validated
    tools.bwa.args._passthrough = '-B 3'    // raw string

Option 2 is cleaner in terms of explicit expectations, but it's essentially ext.args again. Maybe that's unavoidable - the realistic goal being typed/validated/documented args for common options, with a raw passthrough for everything else?
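
The two options can be compared side by side in a small Python sketch. Everything here is illustrative: `render_with_passthrough` is a hypothetical helper, the `_passthrough` key follows the naming used in option 2, and the rule that undocumented keys must start with `-` is an assumption taken from option 1.

```python
# Hypothetical sketch of the **kwargs-style idea; not an existing Nextflow API.
SPEC = {"K": {"type": "integer", "prefix": "-"}}


def render_with_passthrough(spec, values):
    """Validate documented args; pass undocumented ones through unchanged."""
    parts = []
    for key, value in values.items():
        if key == "_passthrough":
            continue  # raw string is appended last (option 2)
        if key in spec:
            entry = spec[key]
            if entry["type"] == "integer" and (
                isinstance(value, bool) or not isinstance(value, int)
            ):
                raise TypeError(f"{key} expects integer, got {value!r}")
            parts.append(f"{entry['prefix']}{key} {value}")
        elif key.startswith("-"):
            # Undocumented: the key already carries its prefix (option 1)
            parts.append(f"{key} {value}")
        else:
            raise KeyError(f"undocumented argument needs an explicit prefix: {key}")
    raw = values.get("_passthrough", "")
    if raw:
        parts.append(raw)
    return " ".join(parts)


print(render_with_passthrough(SPEC, {"K": 100000000, "-B": 3}))
print(render_with_passthrough(SPEC, {"K": 100000000, "_passthrough": "-B 3"}))
```

Both calls produce the same fragment here. The difference is that option 1 keeps undocumented arguments individually addressable, while option 2 treats them as one opaque string.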

@ewels
Member

ewels commented Jan 28, 2026

Do we need the prefix stuff at all? If we have YAML dict keys that are in quotes we can just have the full flag, no?

tools:
  - bwa:
      args:
        "-K":
          type: integer

tools.bwa.args['-K'] = 3

I like the simplicity of that. Then in the module we just do $args and the list gets collapsed nicely and we don't need to worry about what keys were in it.
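
As a rough illustration of what "collapsed nicely" could mean, the Python sketch below joins a map keyed by literal flags into one string. The `collapse` function is hypothetical, and its handling of booleans (emit the bare flag when true, drop it when false) and its shell quoting of values are assumptions, not behaviour defined in this thread.

```python
import shlex


def collapse(args):
    """Join a {flag: value} map into a single command-line fragment."""
    parts = []
    for flag, value in args.items():
        if value is True:
            parts.append(flag)  # bare switch, e.g. -Y
        elif value is False or value is None:
            continue  # switch disabled: emit nothing
        else:
            parts.append(flag)
            parts.append(shlex.quote(str(value)))  # quote values for the shell
    return " ".join(parts)


print(collapse({"-K": 3, "-Y": True, "-M": False}))
```

Because the key is the full flag, the function never needs to know a prefix, and it does not care which keys were documented in `meta.yaml`.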

@bentsherman changed the title from "Add tools arguments ADR" to "ADR: module tool arguments" on Jan 28, 2026
@bentsherman
Member

It might be enough to just have tools.bwa.args as a drop-in replacement for ext.args, and use the spec only as a "documentation hint" for users / agents. It doesn't have to be exhaustive, but the more options you document, the better

The tool spec could just be a list of argument descriptions. Using nf-core/bwa/mem as an example:

tools:
  - bwa:
      documentation: https://bio-bwa.sourceforge.net/bwa.shtml
      args:
        - shortName: '-t'
          longName: '--threads' # bwa mem doesn't actually have long option names, this is just to illustrate
          type: integer
          default: 1
          description: 'Number of threads'
        - shortName: '-k'
          type: integer
          default: 19
          description: 'Minimum seed length'
        # ...

So, users and agents would still construct the desired string of args, but could use the tool spec for guidance. No need for an additional layer of typed parameters, which might introduce more rough edges

Note that nf-core/bwa/mem already has a documentation URL, which just provides the man-page for bwa. So in this case, an agent might not even need the tool spec if it can just consult the tool documentation. But other tools might not be as well documented, and either way it can be nice to have the most useful options documented locally

@pditommaso
Member Author

Agree with both of the last comments. We could explicitly target only GNU CLI conventions, and infer - vs -- depending on the length of the option name.
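
A minimal Python sketch of that inference rule, assuming the GNU convention that single-character options take one dash and longer names take two. `gnu_flag` is a hypothetical helper name.

```python
def gnu_flag(name):
    """Infer the dash prefix from the option name length (GNU convention)."""
    if not name:
        raise ValueError("option name must not be empty")
    prefix = "-" if len(name) == 1 else "--"
    return prefix + name


print(gnu_flag("K"))
print(gnu_flag("threads"))
```

Tools that use single-dash long options (for example `find -name`) would not fit this rule, so such a design would need an explicit override for them.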

Agreed, and the list does not need to be exhaustive; it's meant to expose the tool options that can be used to parametrise the process execution.

What I dislike in the current form is the "subcommand argument collision"

Subcommand argument collision
A tool having the same option name in two different subcommands cannot be managed with the current design. Arguments are defined at the tool level, not at the subcommand level.

@bentsherman
Member

From our discussion today, people seem to like the prospect of specifying args with a map. Phil's example using the literal option name seems like the safest approach:

tools:
  - bwa:
      args:
        "-K":
          type: integer

tools.bwa.args['-K'] = 3

With a method like tools.bwa.args.toString() to concatenate the args.

Some challenges that came up:

  • Converting between floating-point numbers and strings can create artifacts. But I just remembered, we can just parse floats as BigDecimal and it will be safe

  • We could provide tools.bwa.args.K as a further shorthand, but this is not as flexible, and might not be worth adding due to the "multiple ways to do the same thing" problem

  • Positional args vs named args. Most tools expect positional args after named args, but I don't think we can assume this. We could provide e.g. tools.bwa.args (list of positional args) and tools.bwa.kwargs (map of named args) so that the user can inject them independently.

  • Subcommand argument collision. To Paolo's comment above, this could be handled by treating the command and subcommand as separate tools, e.g. tools.bwa.args vs tools.bwa.subcommand.args
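
The last point can be sketched in Python by giving each subcommand its own argument namespace, so that the same flag can hold different values without colliding. The tool `mytool`, its subcommands, and the `command_line` helper are all hypothetical and chosen only for illustration.

```python
# Hypothetical tool with two subcommands that both define "-o".
TOOLS = {
    "mytool": {
        "sort": {"args": {"-o": "sorted.out"}},
        "view": {"args": {"-o": "view.out"}},
    }
}


def command_line(tools, tool, subcommand):
    """Render the args scoped to one subcommand of one tool."""
    args = tools[tool][subcommand]["args"]
    rendered = " ".join(f"{flag} {value}" for flag, value in args.items())
    return f"{tool} {subcommand} {rendered}".strip()


print(command_line(TOOLS, "mytool", "sort"))
print(command_line(TOOLS, "mytool", "view"))
```

Scoping by subcommand means `-o` is looked up under `sort` or `view` rather than at the tool level, which is where the collision described earlier arises.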

@bentsherman
Member

The args / kwargs distinction actually addresses my fundamental concern about trying to model CLI args as a map. Since CLI args are just a list of strings at the end of the day, users can always fall back on the args list if they need to, while using the kwargs map where they can.
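
A short Python sketch of the args / kwargs split, under the assumption that named arguments are emitted before positional ones. `build_command` is a hypothetical helper, and the file names are placeholders.

```python
import shlex


def build_command(command, kwargs=None, args=None):
    """Return a token list: command, then named args, then positional args."""
    tokens = list(command)
    for flag, value in (kwargs or {}).items():
        if value is True:
            tokens.append(flag)  # bare switch
        elif value is not False and value is not None:
            tokens += [flag, str(value)]
    # Positional args are plain strings, so raw flags can also be placed here
    tokens += [str(a) for a in (args or [])]
    return tokens


tokens = build_command(
    ["bwa", "mem"],
    kwargs={"-K": 100000000, "-Y": True},
    args=["ref.fa", "reads.fq"],
)
print(shlex.join(tokens))
```

Since the positional list accepts any string, a user who cannot express something through the map can still fall back to it, for example `args=["-B", "3", "ref.fa", "reads.fq"]`.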

4 participants