From 0ef9fdf106645323b82007cf376d2ddf09785031 Mon Sep 17 00:00:00 2001
From: Sebastian Urchs
Date: Fri, 30 May 2025 14:48:01 +0200
Subject: [PATCH 01/53] [ENH] Integrate BEP036 - Phenotypic Data Guidelines

BEP036 brings guidelines for best tabular phenotypic data to the BIDS specification.

- Includes an appendix called `phenotype.md`
- Includes admonitions for the guidelines in-line with modality agnostic files sections

---------

Co-authored-by: Eric Earl
Co-authored-by: Samuel Guay
Co-authored-by: Sebastian Urchs
Co-authored-by: Arshitha B
---
 src/appendices/phenotype.md | 331 ++++++++++++++++++
 src/common-principles.md | 8 +-
 .../data-summary-files.md | 286 +++++++++++++--
 src/schema/objects/files.yaml | 7 +-
 4 files changed, 602 insertions(+), 30 deletions(-)
 create mode 100644 src/appendices/phenotype.md

diff --git a/src/appendices/phenotype.md b/src/appendices/phenotype.md
new file mode 100644
index 0000000000..53afb47206
--- /dev/null
+++ b/src/appendices/phenotype.md
@@ -0,0 +1,331 @@
+# Tabular phenotypic data guidelines
+
+This appendix is a collection of guidelines and examples for creating well-organized aggregated tabular phenotypic data.
+
+## Guidelines
+
+These guidelines are all **RECOMMENDED** when preparing
+tabular phenotypic data like the
+participants file, sessions file, demographics file,
+or phenotypic and assessment data.
+The language below uses REQUIRED, MUST, and others to imply
+these are the requirements for these **RECOMMENDED** guidelines.
+
+### 1. Always pair tabular data with data dictionaries
+
+Tabular phenotypic data MUST be prepared as one pair of a tabular file
+in tab-separated value (TSV) format and a corresponding data dictionary
+in JavaScript Object Notation (JSON) format.
+
+### 2. Aggregate data across sessions
+
+Aggregation refers to the contents of the TSV file. It is REQUIRED
+to collect all participant data into one TSV per tabular phenotypic file.
+
+### 3. Ensure minimal annotation for phenotypic and assessment data
+
+In phenotypic and assessment data each measurement tool has an independent
+aggregated data TSV file in which the user collects all subjects, sessions,
+and/or runs of data as one entry per row (with a row defined by
+the smallest unit of acquisition). In other words:
+
+1. Each row MUST start with `participant_id`.
+2. Each TSV file MUST contain a `session_id` column when
+multiple [sessions](../glossary.md#session-entities)[^1] are present
+in the data set regardless of whether those sessions are in
+the `phenotype/` data, `sub-