
feat: add dataset metadata fields and improve community filtering#1754

Merged
ivan-aksamentov merged 4 commits into master from feat/dataset-validation on Mar 5, 2026


ivan-aksamentov (Member) commented Mar 5, 2026

Add explicit schema fields for dataset metadata (`deprecated`, `experimental`, `maintenance`) that were previously absorbed by serde's `flatten` catch-all, so that they are now covered by schema validation during the data rebuild. Replace the `official` field with path-based community detection, so that `--no-community` now filters only datasets in the `community/` collection rather than all non-nextstrain datasets. Remove the unused `enabled` field, which was redundant with `deprecated`.

  • Add deprecated, experimental, maintenance fields to VirusProperties struct
  • Add DatasetMaintenance struct for contact URLs (website, docs, issues, authors)
  • Replace Dataset::official() with Dataset::is_community() checking community/ prefix
  • Update --no-community CLI filter to use new path-based detection
  • Remove unused enabled and official fields from schemas
  • Regenerate JSON schemas
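The difference between an explicit field and the `flatten` catch-all can be illustrated with a small, std-only Rust model. This is not Nextclade's code and does not use serde: `parse_top_level` and the `other` map are hypothetical stand-ins for a struct with a `#[serde(flatten)]` map, values are simplified to flat strings, and the nested `maintenance` object (`DatasetMaintenance`) is left out for brevity.

```rust
use std::collections::BTreeMap;

/// Simplified stand-in for `VirusProperties`: two explicit metadata fields plus a
/// catch-all map playing the role of the flattened field (hypothetical model).
#[derive(Debug, Default)]
struct VirusProperties {
    deprecated: Option<bool>,
    experimental: Option<bool>,
    /// Every top-level key without an explicit field ends up here, unvalidated.
    other: BTreeMap<String, String>,
}

/// Route flat top-level key/value pairs either to explicit, type-checked fields
/// or to the catch-all, roughly as a flattened map behaves in serde.
fn parse_top_level(pairs: &[(&str, &str)]) -> Result<VirusProperties, String> {
    let mut props = VirusProperties::default();
    for &(key, value) in pairs {
        match key {
            // Explicit fields: a malformed value is reported as an error.
            "deprecated" => {
                let parsed = value.parse::<bool>().map_err(|e| format!("`deprecated`: {e}"))?;
                props.deprecated = Some(parsed);
            }
            "experimental" => {
                let parsed = value.parse::<bool>().map_err(|e| format!("`experimental`: {e}"))?;
                props.experimental = Some(parsed);
            }
            // Any other key is absorbed silently, which is what happened to these
            // flags before they had explicit fields.
            _ => {
                props.other.insert(key.to_string(), value.to_string());
            }
        }
    }
    Ok(props)
}

fn main() {
    let props = parse_top_level(&[("deprecated", "true"), ("someOtherKey", "x")]).unwrap();
    println!("{props:?}");
    // A malformed flag value is rejected instead of being stored as an opaque string.
    println!("{:?}", parse_top_level(&[("experimental", "yes")]).map(|_| ()));
}
```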

Sibling PRs in data: nextstrain/nextclade_data#413, nextstrain/nextclade_data#414

…aset

Dataset pathogen.json files use top-level metadata flags (enabled,
deprecated, experimental, official, maintenance) that were silently
absorbed by the serde flatten catch-all. Add explicit fields so they
are recognized by schema validation and accessible in Rust code.

- Add `DatasetMaintenance` struct with typed fields for contact URLs
- Add `enabled`, `deprecated`, `experimental`, `official`, `maintenance` to `VirusProperties`
- Add `enabled`, `maintenance` to `Dataset` (present in index.json)
…etection

Remove the `official` field from VirusProperties and Dataset structs.
Replace `Dataset::official()` with `Dataset::is_community()` which checks
if path starts with `community/`. This properly handles the three dataset
collections (nextstrain, enpen, community) where only community datasets
should be filtered by `--no-community`.
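A minimal sketch of the path-based check, assuming `Dataset` exposes its collection-relative path as a `String` field named `path` (an assumption; only `is_community()` and the `community/` prefix come from this PR). `filter_datasets` is a hypothetical helper standing in for the `--no-community` CLI filter, and the example paths are made up.

```rust
/// Simplified stand-in for Nextclade's `Dataset`; only the path matters here.
#[derive(Debug, Clone)]
struct Dataset {
    path: String,
}

impl Dataset {
    /// A dataset belongs to the community collection iff its path starts with `community/`.
    fn is_community(&self) -> bool {
        self.path.starts_with("community/")
    }
}

/// Hypothetical equivalent of the `--no-community` filter: when the flag is set,
/// drop community datasets and keep both the nextstrain and enpen collections.
fn filter_datasets(datasets: Vec<Dataset>, no_community: bool) -> Vec<Dataset> {
    datasets
        .into_iter()
        .filter(|dataset| !(no_community && dataset.is_community()))
        .collect()
}

fn main() {
    let datasets: Vec<Dataset> = ["nextstrain/example-a", "enpen/example-b", "community/someone/example-c"]
        .iter()
        .map(|path| Dataset { path: path.to_string() })
        .collect();

    for dataset in filter_datasets(datasets, true) {
        println!("{}", dataset.path);
    }
}
```

With the flag set, only `nextstrain/example-a` and `enpen/example-b` are printed. A check of the form "not a nextstrain dataset" would have dropped the enpen dataset as well, which is the behavior this change corrects.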

The `enabled` field was never used; `deprecated` serves the same purpose.
Remove it from the `VirusProperties` and `Dataset` structs and regenerate schemas.
github-actions bot commented Mar 5, 2026

ivan-aksamentov merged commit 2a538b5 into master on Mar 5, 2026
19 checks passed
ivan-aksamentov deleted the feat/dataset-validation branch on March 5, 2026 00:51
ivan-aksamentov added a commit that referenced this pull request Mar 5, 2026
…m VirusProperties

These root-level fields are no longer read by Nextclade. Since #1754,
both values are read from attributes.get("deprecated") and
attributes.get("experimental") in Dataset.
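Reading the flags from the dataset attributes can be sketched as follows. This is a hypothetical model, not Nextclade's code: the `AttrValue` enum, the `flag` helper, and the rule that a missing or non-boolean attribute counts as `false` are all assumptions; only the `attributes.get("deprecated")` and `attributes.get("experimental")` lookups come from the commit message.

```rust
use std::collections::BTreeMap;

/// Simplified attribute value (assumed; the real attribute type may differ).
#[derive(Debug, Clone, PartialEq)]
enum AttrValue {
    Bool(bool),
    Str(String),
}

/// Stand-in for `Dataset` carrying only its `attributes` map.
struct Dataset {
    attributes: BTreeMap<String, AttrValue>,
}

impl Dataset {
    /// Assumed rule: a flag is set only when the attribute exists and is the boolean `true`.
    fn flag(&self, key: &str) -> bool {
        matches!(self.attributes.get(key), Some(AttrValue::Bool(true)))
    }

    fn deprecated(&self) -> bool {
        self.flag("deprecated")
    }

    fn experimental(&self) -> bool {
        self.flag("experimental")
    }
}

fn main() {
    let mut attributes = BTreeMap::new();
    attributes.insert("deprecated".to_string(), AttrValue::Bool(true));
    attributes.insert("name".to_string(), AttrValue::Str("example".to_string()));
    let dataset = Dataset { attributes };
    println!("deprecated={} experimental={}", dataset.deprecated(), dataset.experimental());
}
```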