Create a helper script (pipelines/geo/autoassign_sample_groups.py) that automatically fetches GEO sample metadata and classifies samples as control or condition based on their titles/characteristics.
Goals
[ ] Parse mirna_experiment_info.tsv
[ ] For each GSE, query NCBI Entrez API to get all GSM sample metadata
[ ] Classify samples into Control and Condition using Claude Code skills
[ ] Output a draft TSV with control_samples and condition_samples columns filled
[ ] When done, flag ambiguous cases for manual review
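
As a rough illustration of the classify-and-flag goals, here is a minimal keyword baseline. It is not the Claude Code skill-based classification named above; it is a simple stand-in that could serve as a pre-filter or sanity check. The keyword sets and the function names (classify_sample, group_samples) are hypothetical and would need tuning to the actual datasets.

```python
import re

# Hypothetical keyword sets (assumptions, not taken from the pipeline).
CONTROL_KEYWORDS = {"control", "ctrl", "healthy", "normal", "wt", "wildtype",
                    "mock", "vehicle", "untreated"}
CONDITION_KEYWORDS = {"treated", "knockdown", "knockout", "ko", "kd",
                      "tumor", "tumour", "patient", "disease", "mutant"}


def classify_sample(text):
    """Return 'control', 'condition', or 'ambiguous' for one sample description."""
    # Split on anything that is not a letter or digit, so titles such as
    # "HeLa_control_rep1" yield the token "control".
    tokens = set(re.findall(r"[a-z0-9]+", text.lower()))
    is_control = bool(tokens & CONTROL_KEYWORDS)
    is_condition = bool(tokens & CONDITION_KEYWORDS)
    if is_control and not is_condition:
        return "control"
    if is_condition and not is_control:
        return "condition"
    # Both kinds of keyword, or neither: leave the decision to a human.
    return "ambiguous"


def group_samples(samples):
    """Group a {GSM accession: title/characteristics text} mapping by label."""
    groups = {"control": [], "condition": [], "ambiguous": []}
    for gsm, text in samples.items():
        groups[classify_sample(text)].append(gsm)
    return groups
```

Anything that matches both keyword sets, or neither, lands in the ambiguous group, which keeps the baseline conservative: it only assigns a label when the evidence points one way.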
Final Workflow
- Run autoassign_sample_groups.py → generates draft with auto-filled columns
- Manual review and corrections
- Run RNA-seq pipeline with verified sample assignments
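
The first workflow step could be sketched roughly as follows. This is only an outline under stated assumptions: the input column name gse and the extra needs_review column are hypothetical, and fetch_samples and classify are placeholders for the Entrez lookup and the classification step, passed in as callables so the sketch makes no claim about how either is implemented.

```python
import csv

# control_samples and condition_samples come from the goals above;
# needs_review is a hypothetical column for cases left to manual review.
OUTPUT_COLUMNS = ("control_samples", "condition_samples", "needs_review")


def fill_draft(in_handle, out_handle, fetch_samples, classify, gse_column="gse"):
    """Copy the experiment TSV to a draft, filling in the sample-group columns.

    fetch_samples(gse) -> {GSM accession: description text}   (placeholder)
    classify(text)     -> 'control' | 'condition' | 'ambiguous' (placeholder)
    """
    reader = csv.DictReader(in_handle, delimiter="\t")
    fieldnames = list(reader.fieldnames or [])
    for column in OUTPUT_COLUMNS:
        if column not in fieldnames:
            fieldnames.append(column)

    writer = csv.DictWriter(out_handle, fieldnames=fieldnames,
                            delimiter="\t", lineterminator="\n")
    writer.writeheader()
    for row in reader:
        groups = {"control": [], "condition": [], "ambiguous": []}
        for gsm, text in fetch_samples(row[gse_column]).items():
            # Unknown labels are treated as ambiguous rather than dropped.
            groups.get(classify(text), groups["ambiguous"]).append(gsm)
        row["control_samples"] = ",".join(groups["control"])
        row["condition_samples"] = ",".join(groups["condition"])
        row["needs_review"] = ",".join(groups["ambiguous"])
        writer.writerow(row)
```

In use, the two handles would be the opened mirna_experiment_info.tsv and a draft output file (opened with newline=""), and a non-empty needs_review cell marks a row for the manual review step.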