Summary
During the MIxS migration to GSC commit 0368da846b197bef1c0dd27a9cf337a8aeea17f2, we identified 6 slots whose range GSC changed from string/TextValue to an enum type. NMDC production data currently stores TextValue objects in these slots, so we override the range back to TextValue for backward compatibility.
This issue tracks the future migration of this data to use the proper enum values.
Affected Slots
| Slot | GSC Enum Range | NMDC Current Range | Biosamples with Data |
|---|---|---|---|
| crop_rotation | CropRotationEnum | TextValue | TBD |
| cult_root_med | CultRootMedEnum | TextValue | 140 |
| gravidity | GravidityEnum | TextValue | TBD |
| perturbation | PerturbationEnum | TextValue | TBD |
| soil_type | FaoClassEnum | TextValue | TBD |
| store_cond | StoreCondEnum | TextValue | 3,910 |
Current State
The yq transformations in assets/yq-for-mixs_subset_modified.txt override these slots:
# Restore TextValue range for slots where NMDC MongoDB has TextValue data
'.slots.crop_rotation.range |= "TextValue"'
'.slots.cult_root_med.range |= "TextValue"'
'.slots.gravidity.range |= "TextValue"'
'.slots.perturbation.range |= "TextValue"'
'.slots.soil_type.range |= "TextValue"'
'.slots.store_cond.range |= "TextValue"'
Migration Approach
For each slot:
- Analyze existing data
  - Query MongoDB for all unique has_raw_value values
  - Document the value distribution
- Map to enum values
  - Compare existing values to GSC enum permissible values
  - Identify exact matches, partial matches, and unmappable values
  - Decide on mapping strategy (exact match, normalization, or custom enum extension)
- Create migration script
  - Transform {type: "nmdc:TextValue", has_raw_value: "..."} → "enum_value"
  - Handle edge cases and unmappable values
- Update schema
  - Remove TextValue range override from yq file
  - Accept GSC enum range (or extend enum if needed)
- Execute migration
  - Run migration on MongoDB
  - Validate all biosamples against updated schema
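The analysis, mapping, and transform steps above could be prototyped roughly as follows. This is a minimal sketch, not the project's migration script. It works on in-memory dictionaries rather than a live MongoDB connection and assumes the slot sits at the top level of each biosample document. The helper names (`raw_values`, `classify`, `migrate`), the sample documents, and the `ASSUMED_PERMISSIBLE` set are all hypothetical; the real permissible values must come from the GSC enum definition.

```python
from collections import Counter

# Hypothetical stand-in for an enum's permissible values.
# The real list must be read from the GSC MIxS schema.
ASSUMED_PERMISSIBLE = {"frozen", "fresh", "other"}


def raw_values(docs, slot):
    """Yield has_raw_value for every doc that stores `slot` as a TextValue object."""
    for doc in docs:
        value = doc.get(slot)
        if isinstance(value, dict) and "has_raw_value" in value:
            yield value["has_raw_value"]


def classify(raw, permissible):
    """Return (kind, enum_value) where kind is exact, normalized, or unmappable."""
    if not isinstance(raw, str):
        return "unmappable", None
    if raw in permissible:
        return "exact", raw
    # Normalization here is only whitespace trimming plus case folding.
    by_folded = {p.casefold(): p for p in permissible}
    match = by_folded.get(raw.strip().casefold())
    if match is not None:
        return "normalized", match
    return "unmappable", None


def migrate(doc, slot, permissible):
    """Return a copy of doc with `slot` rewritten to an enum string when mappable."""
    value = doc.get(slot)
    if not (isinstance(value, dict) and "has_raw_value" in value):
        return doc  # slot absent or already migrated
    _, enum_value = classify(value["has_raw_value"], permissible)
    if enum_value is None:
        return doc  # leave unmappable values untouched for manual review
    return {**doc, slot: enum_value}


# Made-up sample documents for illustration only.
docs = [
    {"id": "example-1", "store_cond": {"type": "nmdc:TextValue", "has_raw_value": "frozen"}},
    {"id": "example-2", "store_cond": {"type": "nmdc:TextValue", "has_raw_value": " Frozen "}},
    {"id": "example-3", "store_cond": {"type": "nmdc:TextValue", "has_raw_value": "-80C freezer"}},
]

distribution = Counter(raw_values(docs, "store_cond"))
report = {raw: classify(raw, ASSUMED_PERMISSIBLE)[0] for raw in distribution}
migrated = [migrate(d, "store_cond", ASSUMED_PERMISSIBLE) for d in docs]
```

Against a real database, the distinct raw values would more likely come from a MongoDB `distinct` query on the slot's `has_raw_value` path than from iterating over documents in Python. The `report` dictionary is the piece that would inform the mapping-strategy decision, since it separates values that map directly from those that need normalization or an enum extension.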
Example: store_cond
Current data format:
{"type": "nmdc:TextValue", "has_raw_value": "frozen"}
GSC StoreCondEnum permissible values need to be checked for compatibility.
Target format:
"frozen" // or appropriate enum value
Related
- Documentation: src/docs/mixs-migration.md
- MIxS migration tracking: change nmdc-schema's MIxS import to GSC's 6.2 YAML #1368