@anna-parker
Contributor

@anna-parker anna-parker commented Jul 31, 2025

resolves #4847

Screenshot

Builds on #4821

You can use pathoplexus/dev_example_data#2 for testing - see example submission in video:

Screen.Recording.2025-08-12.at.13.43.15.mov

Enable assignment of segments/subtypes using nextclade sort with the params:

classify_with_nextclade_sort: True
minimizer_index: <url_to_minimizer_index_used_by_nextclade_sort>

When this is set to true (by default), FASTA headers must have the format _ (as in the current setup).
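
As a rough illustration of what sort-based assignment could look like, the sketch below picks the best-scoring dataset hit for each sequence and maps it to a segment name. It assumes the sort results are available as rows with `seqName`, `dataset` and `score` fields. The function name `assign_segments`, the `accepted` mapping and the sample rows are hypothetical and not taken from this PR.

```python
def assign_segments(rows, accepted):
    """Map each sequence name to a segment based on its best-scoring hit.

    rows: iterable of dicts with keys "seqName", "dataset", "score"
          (an assumed layout for the sort results).
    accepted: dict mapping segment name -> set of dataset names that count
              as a match for that segment (hypothetical structure).
    Returns a dict seqName -> segment name, or None if the best hit is not
    an accepted match for any segment.
    """
    # Invert the mapping so a dataset name can be looked up directly.
    dataset_to_segment = {
        dataset: segment
        for segment, datasets in accepted.items()
        for dataset in datasets
    }

    # Keep only the highest-scoring hit per sequence.
    best = {}
    for row in rows:
        name = row["seqName"]
        score = float(row["score"])
        if name not in best or score > best[name][0]:
            best[name] = (score, row["dataset"])

    return {
        name: dataset_to_segment.get(dataset)
        for name, (_, dataset) in best.items()
    }


# Made-up example rows, for illustration only.
rows = [
    {"seqName": "sample1_a", "dataset": "nextstrain/cchfv/linked/L", "score": "0.9"},
    {"seqName": "sample1_a", "dataset": "nextstrain/cchfv/linked/M", "score": "0.1"},
    {"seqName": "sample1_b", "dataset": "some/other/dataset", "score": "0.8"},
]
accepted = {
    "L": {"nextstrain/cchfv/linked/L"},
    "M": {"nextstrain/cchfv/linked/M"},
    "S": {"nextstrain/cchfv/linked/S"},
}
assignments = assign_segments(rows, accepted)
```

Here `sample1_a` would be assigned to `L`, while `sample1_b` gets no segment because its best hit is not an accepted match.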

Additionally, instead of having a dictionary for the nextclade datasets and servers, we make nucleotideSequences a list of sequences:

nextclade_dataset_name: 
    L: nextstrain/cchfv/linked/L
    M: nextstrain/cchfv/linked/M
    S: nextstrain/cchfv/linked/S
nextclade_dataset_server: https://raw.githubusercontent.com/nextstrain/nextclade_data/cornelius-cchfv/data_output
genes: [RdRp, GPC, NP]

becomes:

nucleotideSequences:
  - name: L
    nextclade_dataset_name: nextstrain/cchfv/linked/L
    nextclade_dataset_tag: <optional - was previously incorrectly placed on an organism level> 
    nextclade_dataset_server: <optional overwrites nextclade_dataset_server for this seq>
    accepted_sort_matches: <optional, used for classify_with_nextclade_sort and require_nextclade_sort_match, if not given nextclade_dataset_name is used> 
    gene_prefix: <optional, prefix to add to genes produced by nextclade run, e.g. nextclade labels genes as `AV1` but we expect `EV1_AV1`, here `EV1` would be the prefix >
  - name: M
    nextclade_dataset_name: nextstrain/cchfv/linked/M
  - name: S
    nextclade_dataset_name: nextstrain/cchfv/linked/S
nextclade_dataset_server: https://raw.githubusercontent.com/nextstrain/nextclade_data/cornelius-cchfv/data_output
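
The fallbacks described in the field comments above can be sketched in Python as follows. This is a minimal sketch rather than the PR's implementation: `migrate_config`, `resolve_sequence` and `prefixed_gene` are hypothetical helpers, and dropping `genes` during migration is an assumption based on the list-style example omitting it.

```python
def migrate_config(old):
    """Convert the dictionary-style config into the list-style one."""
    # Assumption: "genes" is dropped because the list-style example omits it.
    new = {
        key: value
        for key, value in old.items()
        if key not in ("nextclade_dataset_name", "genes")
    }
    new["nucleotideSequences"] = [
        {"name": segment, "nextclade_dataset_name": dataset}
        for segment, dataset in old["nextclade_dataset_name"].items()
    ]
    return new


def resolve_sequence(seq, config):
    """Fill in the optional per-sequence fields with their fallbacks."""
    dataset = seq["nextclade_dataset_name"]
    return {
        "name": seq["name"],
        "nextclade_dataset_name": dataset,
        "nextclade_dataset_tag": seq.get("nextclade_dataset_tag"),
        # A per-sequence server overrides the organism-level one.
        "nextclade_dataset_server": seq.get(
            "nextclade_dataset_server", config["nextclade_dataset_server"]
        ),
        # If not given, the dataset name itself is the accepted match.
        "accepted_sort_matches": seq.get("accepted_sort_matches", [dataset]),
        "gene_prefix": seq.get("gene_prefix", ""),
    }


def prefixed_gene(gene, prefix):
    """Turn e.g. gene "AV1" with prefix "EV1" into "EV1_AV1"."""
    return f"{prefix}_{gene}" if prefix else gene


old_config = {
    "nextclade_dataset_name": {
        "L": "nextstrain/cchfv/linked/L",
        "M": "nextstrain/cchfv/linked/M",
        "S": "nextstrain/cchfv/linked/S",
    },
    "nextclade_dataset_server": "https://raw.githubusercontent.com/nextstrain/nextclade_data/cornelius-cchfv/data_output",
    "genes": ["RdRp", "GPC", "NP"],
}
new_config = migrate_config(old_config)
resolved = [resolve_sequence(seq, new_config) for seq in new_config["nucleotideSequences"]]
```

Each entry in `resolved` then carries its own server, accepted matches and gene prefix, so later steps do not need to look at the organism-level settings.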

Note the templates now also generate the genes list from the merged config.

PR Checklist

🚀 Preview: Add preview label to enable

Base automatically changed from fixPreproConfig to main August 1, 2025 06:21
@anna-parker anna-parker added the preview label (Triggers a deployment to argocd) Aug 4, 2025
@anna-parker anna-parker force-pushed the prepro_config_multi_path branch 2 times, most recently from d18816c to 61e3123 Compare August 4, 2025 16:56
@anna-parker anna-parker changed the base branch from main to small_ingest_fixes August 4, 2025 16:56
@corneliusroemer

This comment was marked as outdated.

@anna-parker anna-parker force-pushed the prepro_config_multi_path branch from 61e3123 to 141a825 Compare August 5, 2025 09:46
@anna-parker anna-parker force-pushed the prepro_config_multi_path branch from 141a825 to 0f15fe1 Compare August 5, 2025 09:47
Base automatically changed from small_ingest_fixes to main August 5, 2025 13:17
@anna-parker anna-parker force-pushed the prepro_config_multi_path branch 3 times, most recently from e52e614 to 97e34b4 Compare August 5, 2025 18:39
@anna-parker anna-parker changed the base branch from main to sequence_hydration August 5, 2025 18:39
@anna-parker anna-parker force-pushed the prepro_config_multi_path branch 2 times, most recently from 6232be6 to a0242c5 Compare August 6, 2025 11:53
Base automatically changed from sequence_hydration to main August 11, 2025 06:35
@anna-parker anna-parker changed the title from "feat(prepro): start config" to "feat(prepro): assign segment with nextclade sort" Aug 11, 2025
@anna-parker anna-parker force-pushed the prepro_config_multi_path branch 2 times, most recently from f69d2ac to 48d8110 Compare August 11, 2025 07:15
@anna-parker anna-parker changed the base branch from main to move_fast_header_validation August 11, 2025 07:15
@anna-parker anna-parker force-pushed the move_fast_header_validation branch from fc76c68 to 997af9c Compare August 11, 2025 12:23
@anna-parker anna-parker force-pushed the prepro_config_multi_path branch 2 times, most recently from 3d4d2dd to 8142ff0 Compare August 12, 2025 07:36
@anna-parker
Contributor Author

closing in favor of #5402

6 participants