feat: Add mmseqs2 main workflows by fgvieira · Pull Request #3981 · snakemake/snakemake-wrappers

fgvieira · 2025-04-04T13:38:24Z

QC

I confirm that I have followed the documentation for contributing to snakemake-wrappers.

While the contribution guidelines are more extensive, please particularly ensure that:

test.py was updated to call any added or updated example rules in a Snakefile
input: and output: file paths in the rules can be chosen arbitrarily
wherever possible, command line arguments are inferred and set automatically (e.g. based on file extensions in input: or output:)
temporary files are either written to a unique hidden folder in the working directory, or (better) stored where the Python function tempfile.gettempdir() points to
the meta.yaml contains a link to the documentation of the respective tool or command under url:
conda environments use a minimal amount of channels and packages, in recommended ordering

Summary by CodeRabbit

New Features
- Added MMseqs2 wrappers, workflow rules, and DB creation support for search, clustering, linclust, taxonomy, and RBH, plus workflow metadata and parameters.
Tests
- Added end-to-end tests, many expected outputs, and database creation fixtures.
Documentation
- Added SILVA export README and workflow metadata for discoverability.
Chores
- Added pinned and YAML Conda environment specs for reproducible installs.

coderabbitai · 2025-04-04T13:38:33Z

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

@coderabbitai resume to resume automatic reviews.
@coderabbitai review to trigger a single review.

📝 Walkthrough

Adds MMseqs2 Snakemake wrappers (workflow + DB), metadata and Conda environment specs (YAML + linux-64 pin), test Snakefiles, a test runner, and many static test fixtures/expected outputs for DB creation, search, clustering, linclust, taxonomy, and RBH.

Changes

Cohort / File(s)	Summary
Conda environments `bio/mmseqs2/db/environment.yaml`, `bio/mmseqs2/db/environment.linux-64.pin.txt`, `bio/mmseqs2/workflows/environment.yaml`, `bio/mmseqs2/workflows/environment.linux-64.pin.txt`	Add environment YAMLs and explicit linux-64 pin files listing exact package URLs and pinned versions (mmseqs2, snakemake-wrapper-utils, runtime libs) for reproducible environments.
Metadata / Manifests `bio/mmseqs2/db/meta.yaml`, `bio/mmseqs2/workflows/meta.yaml`	Add workflow and DB metadata declaring name, url, description, authors, I/O schema and params (module, extra).
Wrappers `bio/mmseqs2/db/wrapper.py`, `bio/mmseqs2/workflows/wrapper.py`	New Snakemake wrappers that normalize inputs/outputs, assemble command-line args (module, extra, threads, tmpdir), special-case DB modules, and execute mmseqs2 via shell; workflow wrapper includes module-level metadata.
DB rules & inputs `bio/mmseqs2/db/test/Snakefile`, `bio/mmseqs2/db/test/seqs/a.fasta`	Add test Snakefile with mmseqs2_databases and mmseqs2_createdb rules and small FASTA test input.
DB expected fixtures `bio/mmseqs2/db/test/expected/createdb/`, `bio/mmseqs2/db/test/expected/databases/`	Add static expected outputs for createdb/databases (index, lookup, source, version README, _h.* files).
Workflow rules (tests) `bio/mmseqs2/workflows/test/Snakefile`	Add test rules for search, cluster, linclust, taxonomy, and rbh with multiext outputs, logs, params and wrapper references.
Workflow DB fixtures `bio/mmseqs2/workflows/test/db/*`	Add static DB fixture files (`a.index`, `a.lookup`, `a.source`, `a_h.index`, `a_mapping`).
Workflow expected outputs `bio/mmseqs2/workflows/test/expected/cluster/`, `bio/mmseqs2/workflows/test/expected/linclust/`, `bio/mmseqs2/workflows/test/expected/search/a.tab`, `bio/mmseqs2/workflows/test/expected/rbh/a.tab`, `bio/mmseqs2/workflows/test/expected/taxonomy/a_report`	Add expected FASTA files, representative sequences, alignment/tab outputs and taxonomy report used by workflow tests.
Top-level tests `test_wrappers.py`	Add test function `test_mmseqs2(run)` that executes both workflow and DB test suites and compares outputs against expected fixtures.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant Rule as Snakemake Rule
  participant Wrapper as MMseqs2 Wrapper
  participant MM as mmseqs2 CLI
  participant FS as Filesystem

  Rule->>Wrapper: invoke(inputs, params(module, extra), threads, log)
  Note over Wrapper: normalize inputs/outputs\nresolve common prefixes\nconfigure tmpdir/threads/extra
  Wrapper->>MM: mmseqs <module> <query> <target?> <output> <tmpdir> --threads N <extra>
  MM->>FS: read input files
  MM-->>FS: write outputs (DBs, tabs, FASTA, reports)
  MM-->>Wrapper: exit status
  Wrapper-->>Rule: write log, expose outputs

sequenceDiagram
  autonumber
  participant RuleDB as Snakemake DB Rule
  participant DBWrapper as MMseqs2 DB Wrapper
  participant MM as mmseqs2 CLI
  participant FS as Filesystem

  RuleDB->>DBWrapper: invoke(seqs input, params, threads, log)
  Note over DBWrapper: special-case modules:\n- databases: append thread flags\n- createdb: disable tmpdir
  DBWrapper->>MM: mmseqs <module> <in> <out> [<tmpdir>] [--threads N] <extra>
  MM->>FS: read seqs
  MM-->>FS: emit DB artifacts (.index, .lookup, .source, _h.*)
  MM-->>DBWrapper: exit status
  DBWrapper-->>RuleDB: log and outputs

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

feat!: Pin wrapper versions in meta-wrappers; add alignoth_report meta-wrapper #4678 — related test harness changes and adjustments to wrapper tests.

Suggested reviewers

johanneskoester

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (2 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title clearly and specifically summarizes the main change: adding MMseqs2 main workflows to the repository, following conventional commit format.
Description check	✅ Passed	The PR description is complete and follows the template, with all QC checklist items checked and confirmed by the contributor as completed.

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (10)

bio/mmseqs2/db/test/seqs/a.fasta (1)

1-8: Confirm intentional duplicate FASTA IDs.

Headers “>1” appear twice. If this is deliberate to exercise lookup behavior, all good; otherwise consider making IDs unique to avoid tool-dependent collisions.
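To confirm whether the duplicates are deliberate, a small check can list the repeated IDs. This is a minimal sketch, assuming the ID is the first whitespace-separated word of each header line; `duplicate_fasta_ids` is a hypothetical helper, not part of the wrapper or the test harness.

```python
from collections import Counter


def duplicate_fasta_ids(lines):
    """Return the FASTA IDs (first word after '>') that occur more than once."""
    ids = [
        line[1:].split()[0]
        for line in lines
        if line.startswith(">") and line[1:].strip()
    ]
    counts = Counter(ids)
    return sorted(seq_id for seq_id, n in counts.items() if n > 1)


# Hypothetical records; the real fixture is bio/mmseqs2/db/test/seqs/a.fasta.
example = [">1", "ACGT", ">2 some description", "GGCC", ">1", "TTAA"]
print(duplicate_fasta_ids(example))  # ['1']
```

Iterating over an open file yields lines as well, so `duplicate_fasta_ids(open("bio/mmseqs2/db/test/seqs/a.fasta"))` would work the same way.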

bio/mmseqs2/workflows/environment.linux-64.pin.txt (1)

1-44: Explicit pin looks consistent; verify CI can consume it as-is.

mmseqs2 and wrapper-utils are pinned; good for reproducibility. Please ensure CI uses conda/mamba versions that support @explicit with these URLs. Also note that the OpenSSL/libstdc++ versions differ from the db pin; that is acceptable, but consider aligning them if the drift causes solver differences.

Note: Based on prior repo conventions I’ve learned, ensure environment.yaml (separate file) puts a space before “=” in pins (e.g. “mmseqs2 =17.b804f”) if updated.

bio/mmseqs2/db/test/expected/databases/a.version (1)

1-165: Verify licensing for bundled SILVA text; consider trimming the fixture.

This embeds substantial third‑party documentation text. Please confirm redistribution is permitted under its license. If the test doesn’t require full content, prefer a minimal synthetic placeholder or assert only on sentinel lines to avoid bundling large copyrighted text.

bio/mmseqs2/workflows/test/db/a.index (1)

1-4: Optional: de-duplicate identical fixtures.

Consider referencing a single canonical a.index fixture (or symlink) to avoid drift between workflow and db tests.
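Before consolidating, a byte-level comparison shows whether the two copies really are identical. A minimal sketch using the standard library's `filecmp`; the helper name and the file contents in the demo are hypothetical.

```python
import filecmp
import os
import tempfile


def identical_fixtures(path_a, path_b):
    """Return True if the two fixture files have byte-identical contents."""
    # shallow=False compares contents instead of trusting matching os.stat() signatures.
    return filecmp.cmp(path_a, path_b, shallow=False)


# Demo with two throwaway files standing in for
# bio/mmseqs2/db/test/expected/createdb/a.index and bio/mmseqs2/workflows/test/db/a.index.
with tempfile.TemporaryDirectory() as d:
    copies = [os.path.join(d, "db_a.index"), os.path.join(d, "workflows_a.index")]
    for path in copies:
        with open(path, "w") as fh:
            fh.write("0\t0\t10\n")  # hypothetical content
    print(identical_fixtures(*copies))  # True
```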
bio/mmseqs2/db/meta.yaml (1)
3-13: Polish description and I/O/param wording for clarity and consistency.

Capitalize the description; prefer “FASTA” over “FAS”; clarify “module” to reflect the MMseqs2 component being run.
 description: |
-  ultra fast and sensitive sequence search and clustering suite
+  Ultra-fast and sensitive sequence search and clustering suite
 input:
-  - input FAS file
+  - input FASTA file
 output:
-  - output: DB files
+  - output: MMseqs2 DB files
 params:
-  - module: workflow to use
+  - module: MMseqs2 module to run (e.g., createdb)
   - extra: additional program arguments
bio/mmseqs2/workflows/meta.yaml (1)
3-14: Clarify description and I/O wording; name “FASTA” explicitly; refine param help.

Minor copyedits for consistency with repo conventions.
 description: |
-  ultra fast and sensitive sequence search and clustering suite
+  Ultra-fast and sensitive sequence search and clustering suite
 input:
-  - query: input query FAS file(s)
-  - target: input target FAS file(s) or DB
+  - query: input query FASTA file(s)
+  - target: input target FASTA file(s) or MMseqs2 DB
 output:
-  - output: FAS, cluster or DB file(s)
+  - output: FASTA, cluster, or DB file(s)
 params:
-  - module: workflow to use
+  - module: MMseqs2 workflow to run (e.g., cluster, linclust, search)
   - extra: additional program arguments
bio/mmseqs2/workflows/environment.yaml (1)
5-7: Unpin snakemake-wrapper-utils; mmseqs2 17.b804f confirmed on Bioconda

Per repo convention pin only the main tool — keep mmseqs2 pinned, remove the snakemake-wrapper-utils pin.
 dependencies:
   - mmseqs2 =17.b804f
-  - snakemake-wrapper-utils =0.8.0
+  - snakemake-wrapper-utils
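The repo conventions behind this suggestion (a space before "=", an exact pin only for the main tool, no constraints on other dependencies) can be checked mechanically. A minimal sketch that scans environment.yaml lines with regular expressions instead of a YAML parser, so it assumes the usual one-entry-per-line layout. It applies the "no constraints" rule strictly, whereas the guideline allows exceptions when a constraint is really necessary. `check_dependencies` is hypothetical and not part of the repository's CI.

```python
import re

# Matches dependency entries such as "  - mmseqs2 =17.b804f" or "  - snakemake-wrapper-utils".
DEP_LINE = re.compile(r"^\s*-\s+(?P<name>[^\s=<>!~]+)(?P<spec>.*)$")


def check_dependencies(lines, main_package):
    """Return convention problems found in the dependencies: block of an environment.yaml."""
    problems = []
    in_deps = False
    for line in lines:
        if line.strip() == "dependencies:":
            in_deps = True
            continue
        match = DEP_LINE.match(line) if in_deps else None
        if match is None:
            continue
        name, spec = match.group("name"), match.group("spec").rstrip()
        if name == main_package:
            # Exact pin with a space before "=", e.g. "mmseqs2 =17.b804f".
            if not re.fullmatch(r" =\S+", spec):
                problems.append(f"{name}: pin as '{name} =<version>'")
        elif spec:
            # Strict reading of "no constraints for other dependencies".
            problems.append(f"{name}: drop the version constraint unless really needed")
    return problems


# Illustrative file content mirroring the pre-fix pins quoted above (channel lines assumed).
env = [
    "channels:",
    "  - conda-forge",
    "  - bioconda",
    "  - nodefaults",
    "dependencies:",
    "  - mmseqs2 =17.b804f",
    "  - snakemake-wrapper-utils =0.8.0",
]
print(check_dependencies(env, "mmseqs2"))
# ['snakemake-wrapper-utils: drop the version constraint unless really needed']
```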
test_wrappers.py (1)
153-166: Remove duplicate key in compare_results_with_expected.

The key "out/cluster/a_b_cluster.tsv" appears twice; the second overwrites the first.

Apply this diff:
         compare_results_with_expected={
             "out/search/a.fas": "expected/search/a.fas",
             "out/cluster/a_b_cluster.tsv": "expected/cluster/a_b_cluster.tsv",
             "out/cluster/a_b_rep_seq.fasta": "expected/cluster/a_b_rep_seq.fasta",
             "out/cluster/a_b_all_seqs.fasta": "expected/cluster/a_b_all_seqs.fasta",
-            "out/cluster/a_b_cluster.tsv": "expected/cluster/a_b_cluster.tsv",
             "out/linclust/a_b_rep_seq.fasta": "expected/linclust/a_b_rep_seq.fasta",
             "out/linclust/a_b_all_seqs.fasta": "expected/linclust/a_b_all_seqs.fasta",
             "out/linclust/a_b_cluster.tsv": "expected/linclust/a_b_cluster.tsv",
             "out/taxonomy/a_tophit_report": "expected/taxonomy/a_tophit_report",
             "out/taxonomy/a_tophit_aln": "expected/taxonomy/a_tophit_aln",
             "out/taxonomy/a_report": "expected/taxonomy/a_report",
             "out/taxonomy/a_lca.tsv": "expected/taxonomy/a_lca.tsv",
             "out/rbh/a.fas": "expected/rbh/a.fas",
         },
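Python keeps only the last value for a repeated key in a dict literal, silently, which is why the duplicate is easy to miss by eye. The sketch below flags repeated string keys with the standard `ast` module. It mirrors the kind of issue Ruff reports as F601 but is not Ruff's implementation; `duplicate_dict_keys` is a hypothetical helper.

```python
import ast


def duplicate_dict_keys(source):
    """Report (line, key) for string-literal keys repeated within one dict literal."""
    duplicates = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Dict):
            seen = set()
            for key in node.keys:
                # key is None for `**mapping` entries; only string constants are checked.
                if isinstance(key, ast.Constant) and isinstance(key.value, str):
                    if key.value in seen:
                        duplicates.append((key.lineno, key.value))
                    seen.add(key.value)
    return duplicates


snippet = '''
expected = {
    "out/cluster/a_b_cluster.tsv": "expected/cluster/a_b_cluster.tsv",
    "out/cluster/a_b_rep_seq.fasta": "expected/cluster/a_b_rep_seq.fasta",
    "out/cluster/a_b_cluster.tsv": "expected/cluster/a_b_cluster.tsv",
}
'''
print(duplicate_dict_keys(snippet))  # [(5, 'out/cluster/a_b_cluster.tsv')]
```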
bio/mmseqs2/db/wrapper.py (1)
23-31: Make module checks robust when params.module includes arguments.

params.module can contain arguments (e.g., "databases SILVA"), so equality checks miss "databases". Split before checking; also avoid passing a dummy tmpdir for modules not requiring it.

Apply this diff:
 with tempfile.TemporaryDirectory() as tmpdir:
-    # Modules with threads
-    if snakemake.params.module in ["databases"]:
-        extra = f"--threads {snakemake.threads} {extra}"
-    # Modules with no temp folder
-    if snakemake.params.module in ["createdb"]:
-        tmpdir = ""
-
-    shell("mmseqs {snakemake.params.module} {input} {out} {tmpdir} {extra} {log}")
+    module = snakemake.params.module
+    module_name = module.split()[0]
+    # Modules with threads
+    if module_name == "databases":
+        extra = f"--threads {snakemake.threads} {extra}"
+    # Modules with no temp folder
+    tmp_arg = "" if module_name == "createdb" else tmpdir
+
+    shell("mmseqs {module} {input} {out} {tmp_arg} {extra} {log}")
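The decision logic in the suggested diff can be exercised without running MMseqs2 by building the command string in a pure function. This is a minimal sketch of that logic only. `build_db_command`, the output paths, and the tmpdir value are hypothetical, and the real wrapper hands the string to Snakemake's `shell()` rather than returning it.

```python
def build_db_command(module, input_path, out, tmpdir, threads, extra="", log=""):
    """Assemble the mmseqs command string following the suggested diff's logic."""
    # params.module may carry arguments, e.g. "databases SILVA"; only the first word names the module.
    module_name = module.split()[0]
    if module_name == "databases":
        extra = f"--threads {threads} {extra}"
    # createdb takes no temporary directory argument.
    tmp_arg = "" if module_name == "createdb" else tmpdir
    parts = ["mmseqs", module, input_path, out, tmp_arg, extra, log]
    # Skip empty pieces so optional arguments do not leave stray spaces.
    return " ".join(p.strip() for p in parts if p.strip())


print(build_db_command("databases SILVA", "", "out/databases/a", "/tmp/mm", 4))
# mmseqs databases SILVA out/databases/a /tmp/mm --threads 4
print(build_db_command("createdb", "seqs/a.fasta", "out/createdb/a", "/tmp/mm", 4))
# mmseqs createdb seqs/a.fasta out/createdb/a
```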
bio/mmseqs2/workflows/wrapper.py (1)
14-17: Use the local variable for target prefixing.

Minor clarity fix: use the already-resolved target when getting the common prefix.

Apply this diff:
 target = snakemake.input.get("target", "")
 if isinstance(target, list):
-    target = os.path.commonprefix(snakemake.input.target)
+    target = os.path.commonprefix(target)
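Because `os.path.commonprefix` works character by character, the resulting prefix depends on which files a rule declares. The sketch below reproduces the prefix derivation as the reviews describe it for the db wrapper (common prefix, then `rstrip("_")`). `output_base` is a hypothetical helper, and the file names mirror the test fixtures.

```python
import os


def output_base(paths):
    """Shared base path of a set of output files, used as the mmseqs output prefix."""
    if isinstance(paths, str):
        return paths
    # commonprefix compares character by character, not path component by component.
    return os.path.commonprefix(list(paths)).rstrip("_")


print(output_base([
    "out/cluster/a_b_cluster.tsv",
    "out/cluster/a_b_rep_seq.fasta",
    "out/cluster/a_b_all_seqs.fasta",
]))  # out/cluster/a_b
print(output_base(["db/a.index", "db/a.lookup", "db/a.source", "db/a_h.index"]))  # db/a
print(output_base(["db/a.index", "db/a.lookup"]))  # db/a.
```

The last case shows that a trailing "." is not stripped, so a rule declaring only `.index` and `.lookup` files would get a different prefix. Trimming against a known suffix list might be more robust.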

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 168b48b and 187892a.

⛔ Files ignored due to path filters (5)

bio/mmseqs2/workflows/test/expected/cluster/a_b_cluster.tsv is excluded by !**/*.tsv
bio/mmseqs2/workflows/test/expected/linclust/a_b_cluster.tsv is excluded by !**/*.tsv
bio/mmseqs2/workflows/test/expected/taxonomy/a_lca.tsv is excluded by !**/*.tsv
bio/mmseqs2/workflows/test/seqs/nucl.a.fas.gz is excluded by !**/*.gz
bio/mmseqs2/workflows/test/seqs/nucl.b.fas.gz is excluded by !**/*.gz

📒 Files selected for processing (30)

bio/mmseqs2/db/environment.linux-64.pin.txt (1 hunks)
bio/mmseqs2/db/environment.yaml (1 hunks)
bio/mmseqs2/db/meta.yaml (1 hunks)
bio/mmseqs2/db/test/Snakefile (1 hunks)
bio/mmseqs2/db/test/expected/createdb/a.index (1 hunks)
bio/mmseqs2/db/test/expected/createdb/a.lookup (1 hunks)
bio/mmseqs2/db/test/expected/createdb/a.source (1 hunks)
bio/mmseqs2/db/test/expected/createdb/a_h.index (1 hunks)
bio/mmseqs2/db/test/expected/databases/a.source (1 hunks)
bio/mmseqs2/db/test/expected/databases/a.version (1 hunks)
bio/mmseqs2/db/test/seqs/a.fasta (1 hunks)
bio/mmseqs2/db/wrapper.py (1 hunks)
bio/mmseqs2/workflows/environment.linux-64.pin.txt (1 hunks)
bio/mmseqs2/workflows/environment.yaml (1 hunks)
bio/mmseqs2/workflows/meta.yaml (1 hunks)
bio/mmseqs2/workflows/test/Snakefile (1 hunks)
bio/mmseqs2/workflows/test/db/a.index (1 hunks)
bio/mmseqs2/workflows/test/db/a.lookup (1 hunks)
bio/mmseqs2/workflows/test/db/a.source (1 hunks)
bio/mmseqs2/workflows/test/db/a_h.index (1 hunks)
bio/mmseqs2/workflows/test/db/a_mapping (1 hunks)
bio/mmseqs2/workflows/test/expected/cluster/a_b_all_seqs.fasta (1 hunks)
bio/mmseqs2/workflows/test/expected/cluster/a_b_rep_seq.fasta (1 hunks)
bio/mmseqs2/workflows/test/expected/linclust/a_b_all_seqs.fasta (1 hunks)
bio/mmseqs2/workflows/test/expected/linclust/a_b_rep_seq.fasta (1 hunks)
bio/mmseqs2/workflows/test/expected/rbh/a.tab (1 hunks)
bio/mmseqs2/workflows/test/expected/search/a.tab (1 hunks)
bio/mmseqs2/workflows/test/expected/taxonomy/a_report (1 hunks)
bio/mmseqs2/workflows/wrapper.py (1 hunks)
test_wrappers.py (1 hunks)

🧰 Additional context used

📓 Path-based instructions (2)

**/*.py

⚙️ CodeRabbit configuration file

**/*.py: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of the self argument of methods.
Do not suggest type annotation of the cls argument of classmethods.
Do not suggest return type annotation if a function or method does not contain a return statement.

Files:

test_wrappers.py
bio/mmseqs2/db/wrapper.py
bio/mmseqs2/workflows/wrapper.py

**/wrapper.py

⚙️ CodeRabbit configuration file

Do not complain about use of undefined variable called snakemake.

Files:

bio/mmseqs2/db/wrapper.py
bio/mmseqs2/workflows/wrapper.py

🧠 Learnings (8)

📚 Learning: 2025-04-17T09:24:51.738Z

Learnt from: dlaehnemann
PR: snakemake/snakemake-wrappers#0
File: :0-0
Timestamp: 2025-04-17T09:24:51.738Z
Learning: In snakemake-wrappers repository, environment.yaml files should follow these conventions:
1. Use whitespace before the equal sign in version specifications (e.g., "datavzrd =2.53.1")
2. Only specify the exact version for the main software package
3. Don't add version constraints for dependencies unless absolutely necessary
4. See full guidelines at: https://snakemake-wrappers.readthedocs.io/en/stable/contributing.html#environment-yaml-file

Applied to files:

bio/mmseqs2/workflows/environment.yaml
bio/mmseqs2/db/environment.yaml

📚 Learning: 2025-04-17T09:24:51.738Z

Learnt from: dlaehnemann
PR: snakemake/snakemake-wrappers#0
File: :0-0
Timestamp: 2025-04-17T09:24:51.738Z
Learning: In snakemake-wrappers repository, environment.yaml files should follow these conventions:
1. Use whitespace before the equal sign in version specifications (e.g., "datavzrd =2.53.1")
2. Only specify the exact version for the main software package
3. Don't add version constraints for dependencies unless absolutely necessary
4. See guidelines at: https://snakemake-wrappers.readthedocs.io/en/stable/contributing.html#environment-yaml-file

Applied to files:

bio/mmseqs2/workflows/environment.yaml

📚 Learning: 2025-02-11T12:24:22.592Z

Learnt from: dlaehnemann
PR: snakemake/snakemake-wrappers#3648
File: bio/nanosim/simulator/environment.yaml:6-6
Timestamp: 2025-02-11T12:24:22.592Z
Learning: In the nanosim bioconda recipe, dependencies are carefully managed with specific version pins (e.g., scikit-learn ~=0.23.2) to ensure compatibility with pre-trained models. These dependencies don't need to be explicitly added to environment.yaml files when the main package is listed as a dependency, as they are handled through the bioconda recipe system.

Applied to files:

bio/mmseqs2/workflows/environment.yaml
bio/mmseqs2/db/environment.linux-64.pin.txt
bio/mmseqs2/workflows/environment.linux-64.pin.txt

📚 Learning: 2025-06-02T07:56:35.854Z

Learnt from: tdayris
PR: snakemake/snakemake-wrappers#4159
File: bio/pyfaidx/environment.yaml:6-6
Timestamp: 2025-06-02T07:56:35.854Z
Learning: In the Snakemake-wrapper repository, conda dependency version pinning in environment.yaml files puts a space before the equals sign (e.g., `- pyfaidx =0.8.1.4`) as the established coding standard, even though conda itself doesn't require the space.

Applied to files:

bio/mmseqs2/workflows/environment.linux-64.pin.txt

📚 Learning: 2024-11-26T08:31:00.099Z

Learnt from: tdayris
PR: snakemake/snakemake-wrappers#3496
File: bio/mtnucratio/test/Snakefile:2-6
Timestamp: 2024-11-26T08:31:00.099Z
Learning: In test files for Snakemake wrappers, such as `bio/mtnucratio/test/Snakefile`, hard-coded input and output paths are acceptable as examples and do not need to use wildcards to make paths flexible.

Applied to files:

test_wrappers.py

📚 Learning: 2024-08-21T08:30:42.757Z

Learnt from: johanneskoester
PR: snakemake/snakemake-wrappers#3123
File: utils/datavzrd/wrapper.py:31-32
Timestamp: 2024-08-21T08:30:42.757Z
Learning: In `wrapper.py` scripts, do not flag the use of an undefined variable called `snakemake`.

Applied to files:

bio/mmseqs2/db/wrapper.py
bio/mmseqs2/workflows/wrapper.py

📚 Learning: 2024-11-21T10:23:03.427Z

Learnt from: johanneskoester
PR: snakemake/snakemake-wrappers#3478
File: bio/varlociraptor/estimate-alignment-properties/wrapper.py:5-12
Timestamp: 2024-11-21T10:23:03.427Z
Learning: In the Snakemake wrappers project, avoid suggesting extensive error handling or temporary file management in simple wrapper scripts when it may be unnecessary, to prevent overcomplicating the code.

Applied to files:

bio/mmseqs2/db/wrapper.py
bio/mmseqs2/workflows/wrapper.py

📚 Learning: 2024-10-08T17:41:54.542Z

Learnt from: johanneskoester
PR: snakemake/snakemake-wrappers#3123
File: utils/datavzrd/wrapper.py:31-32
Timestamp: 2024-10-08T17:41:54.542Z
Learning: The `snakemake` variable is inserted via a preamble during execution in `wrapper.py` scripts, so it doesn't need to be explicitly defined.

Applied to files:

bio/mmseqs2/workflows/wrapper.py

🪛 Ruff (0.13.1)

test_wrappers.py

157-157: Dictionary key literal "out/cluster/a_b_cluster.tsv" repeated

Remove repeated key literal "out/cluster/a_b_cluster.tsv"

(F601)

bio/mmseqs2/db/wrapper.py

9-9: Undefined name snakemake

(F821)

10-10: Undefined name snakemake

(F821)

13-13: Undefined name snakemake

(F821)

18-18: Undefined name snakemake

(F821)

25-25: Undefined name snakemake

(F821)

26-26: Undefined name snakemake

(F821)

28-28: Undefined name snakemake

(F821)

bio/mmseqs2/workflows/wrapper.py

10-10: Undefined name snakemake

(F821)

11-11: Undefined name snakemake

(F821)

14-14: Undefined name snakemake

(F821)

16-16: Undefined name snakemake

(F821)

19-19: Undefined name snakemake

(F821)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)

GitHub Check: testing

🔇 Additional comments (31)

bio/mmseqs2/db/test/expected/createdb/a.lookup (1)

1-4: LGTM; please confirm column semantics are stable across OS/versions.

The mapping aligns with the duplicated IDs in the input FASTA. Please confirm the 3rd column (“0”) is the expected source/file index from mmseqs2 createdb and is stable across platforms.

bio/mmseqs2/workflows/test/expected/linclust/a_b_rep_seq.fasta (1)

1-4: LGTM.

Representative sequences look fine for the linclust expected output.

bio/mmseqs2/db/environment.linux-64.pin.txt (1)

1-33: LGTM; reproducible DB env pin.

Pins are minimal and include only what’s needed (mmseqs2, gawk, basics). Confirm CI path uses conda create --file to avoid accidental channel mixing.

bio/mmseqs2/workflows/test/expected/taxonomy/a_report (1)

1-1: LGTM.

Report format/fields look consistent for the taxonomy workflow expected output.

bio/mmseqs2/db/test/expected/createdb/a.source (1)

1-1: LGTM; confirm path semantics.

The source mapping (file index “0”, a tab, then “a.fasta”) matches common mmseqs2 conventions. Please confirm tests assume only basenames (not full paths) to keep runs stable across machines.

bio/mmseqs2/workflows/test/expected/rbh/a.tab (1)

1-1: Confirm column order vs mmseqs2 rbh --format-output.

RBH fields can differ by version/flags; please verify the wrapper’s format-output matches this expectation to avoid brittle tests.

bio/mmseqs2/db/test/expected/databases/a.source (1)

1-1: LGTM: expected database source mapping.

Content and tab delimiter look correct.

bio/mmseqs2/workflows/test/db/a.index (1)

1-4: LGTM: index fixture matches createdb expectation.

Looks consistent with the createdb/a.index expectation.

bio/mmseqs2/workflows/test/db/a_mapping (1)

1-5: LGTM: taxonomy mapping fixture.

Values and formatting look fine.

bio/mmseqs2/db/environment.yaml (1)

1-6: LGTM: environment.yaml follows repo conventions.

Channels ordering, nodefaults, and single pinned main dependency all align with guidelines.

If mmseqs2 version changes later, remember to refresh the corresponding pinned lockfile(s) to keep CI deterministic.

bio/mmseqs2/db/test/expected/createdb/a.index (1)

1-4: LGTM: createdb index expectation.

Matches the workflow fixture; tab-separated triplets are consistent.

bio/mmseqs2/workflows/test/db/a_h.index (1)

1-4: LGTM: header index fixture.

Consistent increments; formatting OK.

bio/mmseqs2/workflows/test/db/a.source (1)

1-1: LGTM: source mapping.

Tab-delimited mapping looks correct.

bio/mmseqs2/workflows/test/db/a.lookup (1)

1-4: LGTM: fixture content looks consistent.

Tab-delimited triplets align with the corresponding createdb expectations.

bio/mmseqs2/db/test/expected/createdb/a_h.index (1)

1-4: LGTM: expected index triplets look plausible.

Monotonic offsets with fixed block size are consistent.

bio/mmseqs2/workflows/test/expected/search/a.tab (1)

1-37: LGTM: expected TSV layout is coherent.

Columns/values look consistent with typical MMseqs2 search output.
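If the search and RBH fixtures are meant to follow MMseqs2's default `--format-output`, a column-count check fails more clearly than a whole-file diff when the layout changes between versions. This sketch assumes the default 12-column order (query, target, fident, alnlen, mismatch, gapopen, qstart, qend, tstart, tend, evalue, bits). If the wrapper or the test passes its own `--format-output`, substitute that list. `read_hits` and the sample row are hypothetical.

```python
import csv
import io

# Assumed default MMseqs2 --format-output column order.
DEFAULT_COLUMNS = [
    "query", "target", "fident", "alnlen", "mismatch", "gapopen",
    "qstart", "qend", "tstart", "tend", "evalue", "bits",
]


def read_hits(handle, columns=DEFAULT_COLUMNS):
    """Parse tab-separated hits into dicts, failing loudly on an unexpected column count."""
    hits = []
    for row in csv.reader(handle, delimiter="\t"):
        if not row:  # tolerate blank lines
            continue
        if len(row) != len(columns):
            raise ValueError(f"expected {len(columns)} columns, got {len(row)}: {row}")
        hits.append(dict(zip(columns, row)))
    return hits


# Hypothetical hit line, not taken from the PR's fixtures.
sample = "q1\tt1\t0.950\t100\t5\t0\t1\t100\t1\t100\t1.2E-50\t180\n"
hits = read_hits(io.StringIO(sample))
print(hits[0]["target"], hits[0]["evalue"])  # t1 1.2E-50
```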

bio/mmseqs2/workflows/test/expected/linclust/a_b_all_seqs.fasta (1)

1-6: Incorrect — no consecutive FASTA headers found. Repository check under bio/mmseqs2/workflows/test/expected found no consecutive '>' lines; the duplicate header-only records reported are not present.

Likely an incorrect or invalid review comment.

test_wrappers.py (3)

137-151: LGTM: adds end-to-end tests for mmseqs2 workflows.

153-166: Sanity-check side-effect outputs used for comparisons.

"out/search/a.fas" and "out/rbh/a.fas" are not declared outputs; they rely on mmseqs2 side effects. Please confirm these filenames are stable across mmseqs2 versions (to avoid flaky tests).

170-194: LGTM: db tests cover both 'databases' and 'createdb' artifacts.

bio/mmseqs2/db/wrapper.py (1)

13-21: Prefix derivation looks good.

Using commonprefix and rstrip("_") to compute database/output base paths matches the multiext patterns used in tests.

bio/mmseqs2/db/test/Snakefile (2)

1-25: LGTM: sensible outputs and logging for 'databases'.

27-50: LGTM: createdb rule aligns with wrapper contract.

bio/mmseqs2/workflows/wrapper.py (2)

19-30: LGTM: output base handling and format-mode toggles.

31-34: LGTM: command assembly with threads and tmpdir.

bio/mmseqs2/workflows/test/expected/cluster/a_b_rep_seq.fasta (1)

1-4: LGTM: expected FASTA fixture added.

bio/mmseqs2/workflows/test/Snakefile (5)

1-15: LGTM: easy-search rule wiring.

17-32: LGTM: easy-cluster outputs via multiext with stable base.

34-49: LGTM: easy-linclust mirrors cluster interface.

51-81: LGTM: taxonomy target DB multiext matches db wrapper outputs.

83-97: LGTM: RBH rule consistent with search interface.

bio/mmseqs2/workflows/test/expected/cluster/a_b.seqs.fas

coderabbitai

Actionable comments posted: 3

🧹 Nitpick comments (2)

bio/mmseqs2/workflows/wrapper.py (2)
34-37: Quote paths to avoid word splitting; keep optional target unquoted.

Prevents breakage when paths contain spaces, while not forcing an empty target argument.

Apply:
-    shell(
-        "mmseqs {snakemake.params.module} {snakemake.input.query} {target} {out} {tmpdir} --threads {snakemake.threads} {extra} {log}"
-    )
+    shell(
+        'mmseqs {snakemake.params.module} "{snakemake.input.query}" {target} "{out}" "{tmpdir}" --threads {snakemake.threads} {extra} {log}'
+    )
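One caveat with the diff above: the workflow tests appear to pass two query files (seqs/nucl.a.fas.gz and seqs/nucl.b.fas.gz) to produce the a_b outputs. Wrapping the space-joined "{snakemake.input.query}" in a single pair of quotes would turn them into one argument. The sketch below quotes each path separately with `shlex.join` and omits an empty target. `build_workflow_command` is hypothetical, and "my seqs/b.fas.gz" is an invented path used only to show quoting.

```python
import shlex


def build_workflow_command(module, query, target, out, tmpdir, threads, extra=""):
    """Build an mmseqs command with each path quoted as its own argument."""
    # Accept one path or a list of paths; quoting a space-joined list as one string would merge them.
    queries = [query] if isinstance(query, str) else list(query)
    args = ["mmseqs", *module.split(), *queries]
    if target:  # optional, e.g. absent for easy-cluster / easy-linclust
        args.append(target)
    args += [out, tmpdir, "--threads", str(threads)]
    # `extra` is free-form CLI text from params, so it is appended unquoted.
    return f"{shlex.join(args)} {extra}".strip()


print(build_workflow_command(
    "easy-cluster", ["seqs/nucl.a.fas.gz", "my seqs/b.fas.gz"], "", "out/cluster/a_b", "/tmp/mm", 2
))
# mmseqs easy-cluster seqs/nucl.a.fas.gz 'my seqs/b.fas.gz' out/cluster/a_b /tmp/mm --threads 2
```

`shlex.join` requires Python 3.8 or newer.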
18-18: Track the TODO for arbitrary output names.

If needed, I can help implement support and adjust tests.
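One way to support arbitrary output names is to let mmseqs write into a temporary prefix and then move each file to the path the rule declared. This is a minimal sketch, assuming the wrapper can map each declared output to the suffix mmseqs appends, for example through named outputs. The suffixes match the expected cluster files in the tests. `move_outputs` and the destination names are hypothetical.

```python
import os
import shutil
import tempfile

# Suffixes easy-cluster / easy-linclust append to the output prefix (as in the expected test files).
CLUSTER_SUFFIXES = ("_cluster.tsv", "_rep_seq.fasta", "_all_seqs.fasta")


def move_outputs(prefix, requested):
    """Move '<prefix><suffix>' files to user-chosen paths; `requested` maps suffix -> destination."""
    for suffix, dest in requested.items():
        os.makedirs(os.path.dirname(dest) or ".", exist_ok=True)
        shutil.move(prefix + suffix, dest)


with tempfile.TemporaryDirectory() as tmp:
    prefix = os.path.join(tmp, "res")
    # Stand-ins for the files mmseqs would write under `prefix`.
    for suffix in CLUSTER_SUFFIXES:
        open(prefix + suffix, "w").close()
    outdir = os.path.join(tmp, "results")
    move_outputs(prefix, {
        "_cluster.tsv": os.path.join(outdir, "clusters.tsv"),
        "_rep_seq.fasta": os.path.join(outdir, "representatives.fasta"),
        "_all_seqs.fasta": os.path.join(outdir, "members.fasta"),
    })
    print(sorted(os.listdir(outdir)))
    # ['clusters.tsv', 'members.fasta', 'representatives.fasta']
```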

📜 Review details

📥 Commits

Reviewing files that changed from the base of the PR and between 187892a and 2a4ab88.

📒 Files selected for processing (1)

bio/mmseqs2/workflows/wrapper.py (1 hunks)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)

GitHub Check: testing
GitHub Check: docs

🔇 Additional comments (1)

bio/mmseqs2/workflows/wrapper.py (1)

1-3: Module metadata looks good.

bio/mmseqs2/workflows/wrapper.py

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (2)

test_wrappers.py (2)
152-160: Remove duplicate dict key in compare_results_with_expected (ruff F601).

The key "out/cluster/a_b_cluster.tsv" appears twice; the latter silently overrides the former. Drop the duplicate to satisfy linting and avoid confusion.

Apply this diff:
             "out/cluster/a_b_rep_seq.fasta": "expected/cluster/a_b_rep_seq.fasta",
             "out/cluster/a_b_all_seqs.fasta": "expected/cluster/a_b_all_seqs.fasta",
-            "out/cluster/a_b_cluster.tsv": "expected/cluster/a_b_cluster.tsv",
             "out/linclust/a_b_rep_seq.fasta": "expected/linclust/a_b_rep_seq.fasta",
137-196: Consider splitting into two tests for clearer failures.

Having workflows and db assertions in one test makes triage harder when one half fails. Suggest test_mmseqs2_workflows() and test_mmseqs2_db().

📜 Review details

📥 Commits

Reviewing files that changed from the base of the PR and between 2a4ab88 and 11e8805.

📒 Files selected for processing (1)

test_wrappers.py (1 hunks)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)

GitHub Check: testing
GitHub Check: docs

🔇 Additional comments (1)

test_wrappers.py (1)
152-166: Verify expected fixture files exist
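In test_wrappers.py (around lines 152–166 and 181–193), before comparing outputs, iterate over compare_results_with_expected.values() and assert that each expected file exists, so a missing fixture fails with a clear message. The expected paths are relative to the wrapper's test directory (the fixtures live under bio/mmseqs2/workflows/test/expected/), so resolve them against it first. For example:
test_dir = "bio/mmseqs2/workflows/test"
for expected_path in compare_results_with_expected.values():
    full_path = os.path.join(test_dir, expected_path)
    assert os.path.isfile(full_path), f"Missing fixture: {full_path}"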

coderabbitai

Actionable comments posted: 1

📜 Review details

📥 Commits

Reviewing files that changed from the base of the PR and between 11e8805 and 949d65d.

📒 Files selected for processing (1)

test_wrappers.py (1 hunks)

🧰 Additional context used

🪛 Ruff (0.14.1)

test_wrappers.py

168-168: Dictionary key literal "out/cluster/a_b_cluster.tsv" repeated

Remove repeated key literal "out/cluster/a_b_cluster.tsv"

(F601)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)

GitHub Check: docs
GitHub Check: testing

test_wrappers.py

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>

dlaehnemann

Wow, this is a big one. Nice job getting all of the test cases set up, even including expected files.

I have some things I would suggest now, and would then have another look once they are addressed. The most important points are:

Actually implement the TODO for arbitrary output files. I think this is feasible and sooo much better for using the wrappers in workflows. I made a suggestion in the db wrapper.
Actually check that the specified params: module: is a valid choice and throw a useful error otherwise. I made a suggestion in the db wrapper, but the error message could be even more detailed, maybe with a link out to the docs.
The meta.yaml files should list all of the available modules and link out to the respective sections in the documentation. This makes it so much easier to find the right place to look for how to use a module.
Maybe directly update to the latest mmseqs2 release available on bioconda.
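The module check in the second point could look roughly like the sketch below. The module names and the wiki URL are assumptions for illustration; check them against the MMseqs2 version pinned in environment.yaml. `validate_module` is a hypothetical helper, not the code merged in this PR. Inside a wrapper, the value would come from `snakemake.params`.

```python
# Assumed set of MMseqs2 "easy" workflows; verify against `mmseqs --help`.
VALID_WORKFLOWS = {
    "easy-search",
    "easy-cluster",
    "easy-linclust",
    "easy-taxonomy",
    "easy-rbh",
}
# Assumed documentation location for the link-out in the error message.
DOCS_URL = "https://github.com/soedinglab/MMseqs2/wiki"


def validate_module(module, valid=VALID_WORKFLOWS):
    """Return `module` if it is supported, else raise an informative ValueError."""
    if module not in valid:
        raise ValueError(
            f"Invalid params: module: {module!r}. "
            f"Expected one of: {', '.join(sorted(valid))}. "
            f"See {DOCS_URL} for details on each module."
        )
    return module


# In a wrapper this might be: module = validate_module(snakemake.params.get("module"))
print(validate_module("easy-cluster"))  # easy-cluster
```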

bio/mmseqs2/db/environment.yaml

bio/mmseqs2/workflows/environment.yaml

bio/mmseqs2/db/wrapper.py

bio/mmseqs2/db/meta.yaml

bio/mmseqs2/db/test/Snakefile

dlaehnemann

Just some minor points, so feel free to merge without another review once you have gone through them.

bio/mmseqs2/workflows/meta.yaml

bio/mmseqs2/workflows/wrapper.py

bio/mmseqs2/db/test/Snakefile

Co-authored-by: David Laehnemann <1379875+dlaehnemann@users.noreply.github.com>

🤖 I have created a release *beep* *boop*

---

## [9.3.0](v9.2.0...v9.3.0) (2026-02-27)

### Features

* Add mmseqs2 main workflows ([#3981](#3981)) ([d97a2a8](d97a2a8))

### Performance Improvements

* autobump bio/alignoth/environment.yaml ([#4993](#4993)) ([6894aa1](6894aa1))
* autobump bio/bbtools/environment.yaml ([#4995](#4995)) ([04129d2](04129d2))
* autobump bio/bismark/bam2nuc/environment.yaml ([#4996](#4996)) ([2f846f7](2f846f7))
* autobump bio/bismark/bismark_genome_preparation/environment.yaml ([#5004](#5004)) ([693441d](693441d))
* autobump bio/bismark/bismark_methylation_extractor/environment.yaml ([#5000](#5000)) ([b213a05](b213a05))
* autobump bio/bismark/bismark/environment.yaml ([#4997](#4997)) ([e3c8b09](e3c8b09))
* autobump bio/bismark/bismark2bedGraph/environment.yaml ([#5001](#5001)) ([117fe2b](117fe2b))
* autobump bio/bismark/bismark2report/environment.yaml ([#4999](#4999)) ([4f17be2](4f17be2))
* autobump bio/bismark/bismark2summary/environment.yaml ([#5005](#5005)) ([3f95f0c](3f95f0c))
* autobump bio/bismark/deduplicate_bismark/environment.yaml ([#4998](#4998)) ([ad4d0af](ad4d0af))
* autobump bio/bowtie2/align/environment.yaml ([#5003](#5003)) ([47cd070](47cd070))
* autobump bio/bowtie2/build/environment.yaml ([#5002](#5002)) ([1ef441f](1ef441f))
* autobump bio/deseq2/deseqdataset/environment.yaml ([#5006](#5006)) ([9c7740c](9c7740c))
* autobump bio/mosdepth/environment.yaml ([#5009](#5009)) ([eb9b820](eb9b820))
* autobump bio/trim_galore/pe/environment.yaml ([#5014](#5014)) ([1747f10](1747f10))
* autobump bio/trim_galore/se/environment.yaml ([#5016](#5016)) ([e8e2ff9](e8e2ff9))
* autobump bio/tximport/environment.yaml ([#5015](#5015)) ([4432c52](4432c52))
* autobump phys/root/define_columns/environment.yaml ([#5013](#5013)) ([0c34711](0c34711))
* autobump phys/root/filter/environment.yaml ([#5011](#5011)) ([b271a89](b271a89))
* autobump phys/root/hadd/environment.yaml ([#5010](#5010)) ([68c03a1](68c03a1))
* autobump phys/root/rootcp/environment.yaml ([#5012](#5012)) ([1893e02](1893e02))
* autobump utils/datavzrd/environment.yaml ([#5007](#5007)) ([50d0e18](50d0e18))
* autobump utils/miller/environment.yaml ([#5008](#5008)) ([d0e29f7](d0e29f7))

---

This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please).

Add main workflows

ea4387b

fgvieira marked this pull request as draft April 4, 2025 13:38

fgvieira added 8 commits April 4, 2025 15:53

Code clean-up

1d903d7

Add two db wrappers

261f39f

Fix meta

fe509e5

Merge branch 'master' into mmseqs2

94a7e09

Fix typos

7b121ed

Merge branch 'master' into mmseqs2

f12765b

Infer output format

2f94129

Only set out format for html and sam, and fix typo

187892a
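The two commits above infer the output format from the output file's extension and pass an explicit format only for HTML and SAM. Below is a minimal sketch of that idea. It assumes MMseqs2's `--format-mode` values (1 = SAM, 3 = HTML, default 0 = BLAST-tab). `infer_format_flag` is a hypothetical name, and the merged wrapper may differ.

```python
from pathlib import Path

# Assumed mapping of output extension -> MMseqs2 --format-mode value.
# Any other extension falls through to MMseqs2's default (BLAST-tab).
FORMAT_MODES = {".sam": 1, ".html": 3}


def infer_format_flag(output_path):
    """Return an extra CLI argument for the output format, or "" for the default."""
    suffix = Path(output_path).suffix.lower()
    mode = FORMAT_MODES.get(suffix)
    return f"--format-mode {mode}" if mode is not None else ""


print(infer_format_flag("out/search/a_b.sam"))   # --format-mode 1
print(infer_format_flag("out/search/a_b.html"))  # --format-mode 3
print(infer_format_flag("out/search/a_b.tsv"))   # prints an empty line
```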

fgvieira marked this pull request as ready for review September 24, 2025 15:43

coderabbitai bot reviewed Sep 24, 2025

bio/mmseqs2/workflows/test/expected/cluster/a_b.seqs.fas

Fix import

2a4ab88

coderabbitai bot reviewed Sep 24, 2025

bio/mmseqs2/workflows/wrapper.py

bio/mmseqs2/workflows/wrapper.py (outdated)

bio/mmseqs2/workflows/wrapper.py (outdated)

Fix expected files

11e8805

coderabbitai bot reviewed Sep 24, 2025

fgvieira requested a review from johanneskoester September 29, 2025 10:27

fgvieira requested a review from dlaehnemann October 10, 2025 12:53

Merge branch 'master' into mmseqs2

949d65d

coderabbitai bot reviewed Oct 27, 2025

test_wrappers.py

fgvieira and others added 4 commits October 27, 2025 13:06

Remove dup dict entry

1c07285

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>

Merge branch 'master' into mmseqs2

6e9889c

Merge branch 'master' into mmseqs2

df98e93

Merge branch 'master' into mmseqs2

4b063cd

dlaehnemann requested changes Feb 25, 2026

fgvieira added 4 commits February 25, 2026 13:29

Merge branch 'master' into mmseqs2

ff00211

Bump version

3414e5f

Expand docs

2ca7e2a

Fix expected file

7f180b6

fgvieira added 6 commits February 25, 2026 15:36

Add createtaxdb test

2d561f2

Code format

f58e9e2

Use HTTP

105bf41

Fix input

2ef1aea

Fix input

2415b17

Small code tweak

adc848b

fgvieira requested a review from dlaehnemann February 25, 2026 17:42

dlaehnemann approved these changes Feb 27, 2026

bio/mmseqs2/workflows/meta.yaml (outdated)

bio/mmseqs2/workflows/wrapper.py (outdated)

bio/mmseqs2/workflows/wrapper.py (outdated)

bio/mmseqs2/db/test/Snakefile (outdated)

fgvieira and others added 5 commits February 27, 2026 13:46

Apply suggestions from code review

cbaf8c5

Co-authored-by: David Laehnemann <1379875+dlaehnemann@users.noreply.github.com>

Update docs

c08e521

Allow for arbitrary output names

b600d65

Fix test files

356f671

Comments

6b14830
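The "Allow for arbitrary output names" commit addresses the first review point. One common approach is to let MMseqs2 write into a temporary directory under a fixed prefix and then move each file to whatever path the rule declared. This is sketched here as an assumption, not as the merged code. The suffixes shown (`_cluster.tsv`, `_rep_seq.fasta`) are ones easy-cluster is assumed to produce, and `move_outputs` is a hypothetical helper.

```python
import os
import shutil
import tempfile


def move_outputs(tmp_prefix, suffix_to_output):
    """Move `<tmp_prefix><suffix>` to each user-chosen output path."""
    for suffix, output_path in suffix_to_output.items():
        out_dir = os.path.dirname(output_path)
        if out_dir:
            os.makedirs(out_dir, exist_ok=True)  # rule outputs may live in new dirs
        shutil.move(tmp_prefix + suffix, output_path)


# Usage: simulate files MMseqs2 would have written under a temporary prefix.
with tempfile.TemporaryDirectory() as tmp, tempfile.TemporaryDirectory() as dest:
    prefix = os.path.join(tmp, "res")
    for suffix in ("_cluster.tsv", "_rep_seq.fasta"):
        open(prefix + suffix, "w").close()
    move_outputs(prefix, {
        "_cluster.tsv": os.path.join(dest, "clusters", "my_clusters.tsv"),
        "_rep_seq.fasta": os.path.join(dest, "reps.fa"),
    })
    print(sorted(os.listdir(dest)))  # ['clusters', 'reps.fa']
```

Using `tempfile.TemporaryDirectory()` keeps intermediate files where `tempfile.gettempdir()` points, in line with the contribution guidelines quoted in the PR description.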

fgvieira enabled auto-merge (squash) February 27, 2026 14:06

fgvieira merged commit d97a2a8 into snakemake:master Feb 27, 2026
8 checks passed

snakemake-bot mentioned this pull request Feb 27, 2026

chore(master): release 9.3.0 #4994

Merged

fgvieira deleted the mmseqs2 branch March 18, 2026 14:39

Conversation

fgvieira commented Apr 4, 2025 • edited by coderabbitai bot

QC

Summary by CodeRabbit

coderabbitai bot commented Apr 4, 2025 • edited

Reviews paused

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Possibly related PRs

Suggested reviewers

❌ Failed checks (1 warning)

coderabbitai bot left a comment

coderabbitai bot left a comment

coderabbitai bot left a comment

coderabbitai bot left a comment

dlaehnemann left a comment

dlaehnemann left a comment

2 participants
