Fix readme adoc #795

Open
noatgnu wants to merge 9 commits into master from fix-readme-adoc

@noatgnu (Collaborator) commented Feb 8, 2026

Update the AsciiDoc for various templates to conform with the YAML templates.

Summary by CodeRabbit

  • Documentation

    • Added comprehensive versioning and deprecation policy documentation.
    • Enhanced template creation guide with step-by-step instructions and concrete examples.
    • Added llms.txt availability badge to repository.
  • Changes

    • Relaxed metadata requirements across multiple templates, moving fields from required/recommended to optional.
    • Updated DIA acquisition template with new method documentation and optional mass-tolerance columns.
    • Removed deprecated LinkML schema file.

@coderabbitai (bot, Contributor) commented Feb 8, 2026

📝 Walkthrough

This PR updates the SDRF-Proteomics documentation: it adds a versioning and deprecation policy, expands the template creation guidance with concrete examples, relaxes metadata field requirements in several templates, removes the LinkML schema file, and adds an llms.txt reference file.

Changes

| Cohort / File(s) | Summary |
|---|---|
| **Documentation Badge & Reference**<br>`README.md`, `llms.txt` | Added llms.txt availability badge link to README and created comprehensive SDRF-Proteomics documentation file enumerating specification, templates, versioning, tools, examples, and publications. |
| **Core SDRF Versioning & Policy**<br>`sdrf-proteomics/README.adoc`, `sdrf-proteomics/VERSIONING.adoc` | Added versioning cross-reference and changed `comment[technical replicate]` from REQUIRED to RECOMMENDED; introduced comprehensive versioning and deprecation policy covering three independently versioned components, compatibility matrix, feature lifecycle, grace periods, and validator operations. |
| **Template Creation Guidance**<br>`sdrf-proteomics/metadata-guidelines/template-definitions.adoc` | Converted minimal template creation section into full end-to-end workflow with concrete top-down template example, structured 7-step guide, expanded YAML schema demonstrating column inheritance/overrides, validation semantics, and PR submission guidance. |
| **Template Requirement Changes**<br>`sdrf-proteomics/templates/affinity-proteomics/README.adoc`, `sdrf-proteomics/templates/cell-lines/README.adoc`, `sdrf-proteomics/templates/human/README.adoc`, `sdrf-proteomics/templates/metaproteomics/README.adoc` | Downgraded multiple metadata columns from RECOMMENDED/REQUIRED to OPTIONAL: `comment[plate]`, `characteristics[sampling site]`, `comment[cell line source]`, `characteristics[age]`, `characteristics[sex]`, and moved `characteristics[sample collection method]` to recommended. |
| **DIA Template Updates**<br>`sdrf-proteomics/templates/dia-acquisition/README.adoc` | Replaced `comment[number of isolation windows]` with `comment[DIA method]`, added dual HTML5/non-HTML5 backend table structures, introduced optional precursor mass tolerance and fragment mass tolerance columns, and updated example SDRF content. |
| **Crosslinking Template Simplification**<br>`sdrf-proteomics/templates/crosslinking/README.adoc` | Removed optional `characteristics[enrichment process]` column entry from crosslinking template. |
| **Schema File Removal**<br>`sdrf-proteomics/templates/sdrf-template-schema.linkml.yaml` | Deleted comprehensive LinkML schema file (377 lines) that defined SDRF proteomics template structure, validation rules, enumerations, and metadata for template YAML files. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Suggested labels

Review effort 3/5

Suggested reviewers

  • nithujohn
  • timosachsenberg
  • levitsky
  • enryH
  • fabianegli

Poem

🐰 Hop, hop, hooray! — A rabbit's cheer
Templates polished, versioning clear,
Schemas shuffled, requirements eased,
Docs expanded, guidance released!
📚✨ The proteomics path grows bright!

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 inconclusive)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Title check | ❓ Inconclusive | The title 'Fix readme adoc' is vague and generic, using non-descriptive terms that don't convey the scope or nature of the substantial changes across multiple template files and documentation. | Consider a more descriptive title that reflects the actual scope, such as 'Update SDRF template documentation to conform with versioning and schema standards' or 'Harmonize template metadata requirements across SDRF documentation'. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |


@coderabbitai (bot, Contributor) left a comment

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
sdrf-proteomics/templates/dia-acquisition/README.adoc (1)

182-220: ⚠️ Potential issue | 🟡 Minor

Remove duplicate Optional Columns section.

Lines 182-220 duplicate content already present in lines 116-178. The "Optional Columns" section appears twice with overlapping column definitions (precursor mass tolerance and fragment mass tolerance). This creates confusion and should be consolidated into a single section.

♻️ Proposed fix

Remove lines 182-220 entirely, as the optional columns are already documented in the earlier section (lines 116-178).

 endif::[]
 
-NOTE: The parent term for all DIA methods is https://www.ebi.ac.uk/ols4/ontologies/pride/classes/http%253A%252F%252Fpurl.obolibrary.org%252Fobo%252FPRIDE_0000450[Data-independent acquisition (PRIDE:0000450)]. Use `comment[DIA method]` to specify the specific variant when known. Check the https://www.ebi.ac.uk/ols4/ontologies/pride[PRIDE Controlled Vocabulary] for available terms.
-
-=== Optional Columns
-
-The following columns are OPTIONAL but commonly used:
-
-ifdef::backend-html5[]
-++++
-<table class="tableblock frame-all grid-all stretch requirements-table">
-<thead>
-<tr><th>Column</th><th>Requirement</th><th>Description</th><th>Ontology/CV</th><th>Example Values</th></tr>
-</thead>
-<tbody>
-<tr>
-<td><code>comment[precursor mass tolerance]</code></td>
-<td><span class="optional-badge">OPTIONAL</span></td>
-<td>Mass tolerance for precursor ions in database search</td>
-<td>Numeric value with unit</td>
-<td>10 ppm, 20 ppm</td>
-</tr>
-<tr>
-<td><code>comment[fragment mass tolerance]</code></td>
-<td><span class="optional-badge">OPTIONAL</span></td>
-<td>Mass tolerance for fragment ions in database search</td>
-<td>Numeric value with unit</td>
-<td>0.02 Da, 20 ppm</td>
-</tr>
-</tbody>
-</table>
-++++
-endif::[]
-
-ifndef::backend-html5[]
-[cols="2,1,2,1,2", options="header"]
-|===
-|Column Name |Requirement |Description |Ontology/CV |Example Values
-
-|comment[precursor mass tolerance] |OPTIONAL |Mass tolerance for precursor ions in database search |Numeric value with unit |10 ppm, 20 ppm
-|comment[fragment mass tolerance] |OPTIONAL |Mass tolerance for fragment ions in database search |Numeric value with unit |0.02 Da, 20 ppm
-|===
-endif::[]
-
 [[stepped-collision-energy]]
 == Stepped Collision Energy
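
A repeated section like this one can be caught mechanically before review. The following is a minimal sketch, not part of the PR: the helper name `find_duplicate_headings` is hypothetical, and it assumes that a heading is any line starting with one or more `=` characters followed by a space. It does not skip listing blocks, and a title that legitimately recurs under different parent sections will also be reported, so the output is a list of candidates to inspect rather than a list of errors.

```python
import re
from collections import defaultdict

# AsciiDoc section titles: one or more '=' followed by whitespace and a title.
# Block delimiters such as '====' or '|===' do not match this pattern.
HEADING = re.compile(r"^(=+)\s+(.*\S)\s*$")


def find_duplicate_headings(text):
    """Return {(level, title): [line numbers]} for headings seen more than once."""
    seen = defaultdict(list)
    for lineno, line in enumerate(text.splitlines(), start=1):
        match = HEADING.match(line)
        if match:
            level = len(match.group(1))
            seen[(level, match.group(2))].append(lineno)
    return {key: lines for key, lines in seen.items() if len(lines) > 1}


# Illustrative input only; the real README.adoc is much longer.
sample = """\
== DIA Method
=== Optional Columns
text
=== Optional Columns
more text
== Stepped Collision Energy
"""

for (level, title), lines in find_duplicate_headings(sample).items():
    print(f"level {level} heading {title!r} appears on lines {lines}")
```

Running the same function over `sdrf-proteomics/templates/dia-acquisition/README.adoc` would be one way to confirm that only a single "Optional Columns" section remains after the fix.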
🤖 Fix all issues with AI agents
In `@llms.txt`:
- Around line 1-121: The "Template YAML Schemas" block refers to the
uninitialized git submodule sdrf-templates (entries like
sdrf-templates/templates.yaml, base/1.1.0/base.yaml,
ms-proteomics/1.1.0/ms-proteomics.yaml, etc.) which is empty; either initialize
and populate that submodule so those files exist and update llms.txt paths to
the real files, or remove the entire "Template YAML Schemas" section (the
sdrf-templates block) from llms.txt; also verify and either add or remove the
annotated-projects/PXD017710/PXD017710.sdrf.tsv reference (update the
annotated-projects list accordingly) so no broken references remain.

In `@README.md`:
- Line 10: Update the badge link in README.md so it points to the correct
repository: replace the URL referencing bigbio/proteomics-metadata-standard with
bigbio/proteomics-sample-metadata (change the llms.txt link from
https://github.com/bigbio/proteomics-metadata-standard/blob/master/llms.txt to
https://github.com/bigbio/proteomics-sample-metadata/blob/master/llms.txt) so
the [![llms.txt](...)] badge points to the existing llms.txt file.

In `@sdrf-proteomics/VERSIONING.adoc`:
- Around line 248-253: The example validator message in VERSIONING.adoc contains
an invalid placeholder URL ("https://github.com/bigbio/.../CHANGELOG.md");
update that string literal in the [source] example so it points to a concrete
changelog path (e.g. the real repo CHANGELOG URL) or use a clear placeholder
format like {CHANGELOG_URL} or https://github.com/bigbio/REPO_NAME/CHANGELOG.md
to satisfy link checkers and make the example valid.

Comment on lines +1 to +121
# SDRF-Proteomics

> SDRF-Proteomics is a HUPO-PSI community standard defining a tab-delimited file format for capturing sample-to-data-file relationships in proteomics experiments. It standardizes sample metadata (organism, disease, tissue), technical metadata (instrument, labels, enzymes), and experimental design (factor values) to enable automated reprocessing and reuse of public proteomics datasets. Compatible with MAGE-TAB SDRF from transcriptomics.

## Specification

- sdrf-proteomics/README.adoc - Core specification: format rules, column headers, cell values, templates, factor values, ontologies
- sdrf-proteomics/quickstart.adoc - Quick Start Tutorial (10-15 min)
- sdrf-proteomics/metadata-guidelines/sample-metadata.adoc - Sample Metadata Guidelines: age, sex, disease, organism part, cell type
- sdrf-proteomics/metadata-guidelines/template-definitions.adoc - Template Definitions Guide (for developers)
- sdrf-proteomics/metadata-guidelines/sdrf-terms.tsv - SDRF Terms Reference: all column terms with ontology mappings

- sdrf-proteomics/VERSIONING.adoc - Versioning and Deprecation Policy: version tracks, template compatibility, deprecation lifecycle, transition timelines
- sdrf-proteomics/open-issues.adoc - Open Issues and Future Decisions: community discussions for post-v1.1.0 changes
- psi-document/v1.0.0/SDRF_Proteomics_Specification_v1.0.0.pdf - Official HUPO-PSI specification (PDF, v1.0.0)
- psi-document/v1.1.0-dev/sdrf-proteomics-specification-v1.1.0-dev.pdf - Development specification (PDF, v1.1.0-dev)

## Templates

- sdrf-proteomics/templates/ms-proteomics/README.adoc - MS-Proteomics: labels, instruments, modifications, cleavage agents
- sdrf-proteomics/templates/affinity-proteomics/README.adoc - Affinity Proteomics: Olink and SomaScan
- sdrf-proteomics/templates/human/README.adoc - Human: disease, age, sex, ancestry, disease staging
- sdrf-proteomics/templates/vertebrates/README.adoc - Vertebrates: mouse, rat, zebrafish
- sdrf-proteomics/templates/invertebrates/README.adoc - Invertebrates: Drosophila, C. elegans
- sdrf-proteomics/templates/plants/README.adoc - Plants: Arabidopsis, crops
- sdrf-proteomics/templates/cell-lines/README.adoc - Cell Lines: Cellosaurus integration
- sdrf-proteomics/templates/dda-acquisition/README.adoc - DDA Acquisition: dissociation method, collision energy
- sdrf-proteomics/templates/dia-acquisition/README.adoc - DIA Acquisition: scan windows, isolation width
- sdrf-proteomics/templates/single-cell/README.adoc - Single-Cell Proteomics: cell isolation, carrier proteome
- sdrf-proteomics/templates/immunopeptidomics/README.adoc - Immunopeptidomics: MHC class, HLA typing
- sdrf-proteomics/templates/crosslinking/README.adoc - Crosslinking MS: crosslinker reagents
- sdrf-proteomics/templates/metaproteomics/README.adoc - Metaproteomics: environmental and microbiome samples
- sdrf-proteomics/templates/olink/README.adoc - Olink: proximity extension assays
- sdrf-proteomics/templates/somascan/README.adoc - SomaScan: aptamer-based proteomics

## Template YAML Schemas (sdrf-templates submodule)

Machine-readable YAML definitions used by sdrf-pipelines for validation. Each template has a `.yaml` schema and an optional `.sdrf.tsv` example file. Templates follow a layered hierarchy: base → technology → sample/experiment.

- sdrf-proteomics/sdrf-templates/templates.yaml - Template manifest: all templates with latest versions, inheritance, and layer metadata
- sdrf-proteomics/sdrf-templates/base/1.1.0/base.yaml - Base template (internal, not user-facing): shared columns inherited by all templates
- sdrf-proteomics/sdrf-templates/base/1.1.0/base.sdrf.tsv - Base example
- sdrf-proteomics/sdrf-templates/ms-proteomics/1.1.0/ms-proteomics.yaml - MS-Proteomics (technology layer): minimum valid template for any MS experiment
- sdrf-proteomics/sdrf-templates/ms-proteomics/1.1.0/ms-proteomics.sdrf.tsv - MS-Proteomics example
- sdrf-proteomics/sdrf-templates/affinity-proteomics/1.1.0/affinity-proteomics.yaml - Affinity Proteomics (technology layer): Olink, SomaScan base
- sdrf-proteomics/sdrf-templates/affinity-proteomics/1.1.0/affinity-proteomics.sdrf.tsv - Affinity Proteomics example
- sdrf-proteomics/sdrf-templates/human/1.1.0/human.yaml - Human (sample layer): disease, age, sex, ancestry
- sdrf-proteomics/sdrf-templates/human/1.1.0/human.sdrf.tsv - Human example
- sdrf-proteomics/sdrf-templates/vertebrates/1.1.0/vertebrates.yaml - Vertebrates (sample layer): mouse, rat, zebrafish, etc.
- sdrf-proteomics/sdrf-templates/vertebrates/1.1.0/vertebrates.sdrf.tsv - Vertebrates example
- sdrf-proteomics/sdrf-templates/invertebrates/1.1.0/invertebrates.yaml - Invertebrates (sample layer): Drosophila, C. elegans
- sdrf-proteomics/sdrf-templates/invertebrates/1.1.0/invertebrates.sdrf.tsv - Invertebrates example
- sdrf-proteomics/sdrf-templates/plants/1.1.0/plants.yaml - Plants (sample layer): Arabidopsis, crops
- sdrf-proteomics/sdrf-templates/plants/1.1.0/plants.sdrf.tsv - Plants example
- sdrf-proteomics/sdrf-templates/cell-lines/1.1.0/cell-lines.yaml - Cell Lines (experiment layer): Cellosaurus integration
- sdrf-proteomics/sdrf-templates/cell-lines/1.1.0/cell-lines.sdrf.tsv - Cell Lines example
- sdrf-proteomics/sdrf-templates/dda-acquisition/1.1.0/dda-acquisition.yaml - DDA Acquisition (experiment layer): dissociation method, collision energy
- sdrf-proteomics/sdrf-templates/dda-acquisition/1.1.0/dda-acquisition.sdrf.tsv - DDA example
- sdrf-proteomics/sdrf-templates/dia-acquisition/1.1.0/dia-acquisition.yaml - DIA Acquisition (experiment layer): scan windows, isolation width
- sdrf-proteomics/sdrf-templates/dia-acquisition/1.1.0/dia-acquisition.sdrf.tsv - DIA example
- sdrf-proteomics/sdrf-templates/crosslinking/1.1.0/crosslinking.yaml - Crosslinking MS (experiment layer): crosslinker reagents
- sdrf-proteomics/sdrf-templates/crosslinking/1.1.0/crosslinking.sdrf.tsv - Crosslinking example
- sdrf-proteomics/sdrf-templates/single-cell/1.0.0/single-cell.yaml - Single-Cell (experiment layer): cell isolation, carrier proteome
- sdrf-proteomics/sdrf-templates/single-cell/1.0.0/single-cell.sdrf.tsv - Single-Cell example
- sdrf-proteomics/sdrf-templates/immunopeptidomics/1.0.0-dev/immunopeptidomics.yaml - Immunopeptidomics (experiment layer): MHC class, HLA typing
- sdrf-proteomics/sdrf-templates/metaproteomics/1.0.0-dev/metaproteomics.yaml - Metaproteomics (experiment layer): environmental and microbiome samples
- sdrf-proteomics/sdrf-templates/metaproteomics/1.0.0-dev/metaproteomics.sdrf.tsv - Metaproteomics example
- sdrf-proteomics/sdrf-templates/olink/1.0.0/olink.yaml - Olink (experiment layer): proximity extension assays
- sdrf-proteomics/sdrf-templates/olink/1.0.0/olink.sdrf.tsv - Olink example
- sdrf-proteomics/sdrf-templates/somascan/1.0.0/somascan.yaml - SomaScan (experiment layer): aptamer-based proteomics
- sdrf-proteomics/sdrf-templates/somascan/1.0.0/somascan.sdrf.tsv - SomaScan example

## Tools

- sdrf-proteomics/tool-support.adoc - Tool Support Overview: annotators, validators, analysis tools
- https://github.com/bigbio/sdrf-pipelines - sdrf-pipelines: official Python CLI/library for SDRF validation
- https://lessdrf.streamlit.app/ - lesSDRF: web-based SDRF creation tool
- https://cupcake-vanilla-demo.proteo.nexus/ - CupCAKE: web annotation platform with ontology integration
- https://quantms.org/ - quantms: Nextflow pipeline for quantitative proteomics
- https://www.maxquant.org/ - MaxQuant: desktop proteomics software with SDRF export
- https://github.com/wombat-p - Wombat-P: benchmarking platform for proteomics workflows

## Examples

- examples/core/PXD002137/PXD002137.sdrf.tsv - Core example: label-free
- examples/core/PXD004684/PXD004684.sdrf.tsv - Core example: TMT labeled
- examples/core/PXD006482/PXD006482.sdrf.tsv - Core example: SILAC
- examples/core/PXD008934/PXD008934.sdrf.tsv - Core example: human proteome
- examples/core/PDC000126/PDC000126.sdrf.tsv - Core example: PDC dataset
- examples/use-cases/crosslinking.sdrf.tsv - Use case: crosslinking MS
- examples/use-cases/immunopeptidomics.sdrf.tsv - Use case: immunopeptidomics
- examples/use-cases/single-cell.sdrf.tsv - Use case: single-cell proteomics

## Annotated Projects

- annotated-projects/ - 250+ public proteomics datasets annotated in SDRF format
- annotated-projects/PXD008934/PXD008934.sdrf.tsv - Label-free quantification
- annotated-projects/PXD017710/PXD017710.sdrf.tsv - TMT-labeled quantitative proteomics
- annotated-projects/PXD000612/PXD000612.sdrf.tsv - SILAC-based quantification
- annotated-projects/PXD018830/PXD018830-DIA.sdrf.tsv - Data-independent acquisition
- annotated-projects/PXD000759/PXD000759.sdrf.tsv - Phosphoproteomics
- annotated-projects/PXD001819/PXD001819.sdrf.tsv - Cell line proteomics

## Publications

- https://www.nature.com/articles/s41467-021-26111-3 - Dai et al. (2021) Nat Commun: A proteomics sample metadata representation for multiomics integration
- https://pubs.acs.org/doi/abs/10.1021/acs.jproteome.0c00376 - Perez-Riverol et al. (2020) J Proteome Res: Towards a sample metadata standard in public proteomics repositories

## Project

- README.md - Project overview and contributor list
- CHANGELOG.md - Version history and changes
- CITATION.cff - Citation metadata
- LICENSE - GNU General Public License
- DEVELOPMENT.md - Building the documentation website locally

## Optional

- https://github.com/bigbio/proteomics-metadata-standard/wiki - 30-Minute Guide to SDRF-Proteomics
- https://www.youtube.com/watch?v=TMDu_yTzYQM - Introduction to SDRF-Proteomics (video)
- https://www.psidev.info/sdrf-sample-data-relationship-format - HUPO-PSI official page

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash

# First, let's understand the repository structure
echo "=== Repository Structure ==="
ls -la | head -20

echo -e "\n=== Checking critical documentation files ==="
for file in \
  "sdrf-proteomics/README.adoc" \
  "sdrf-proteomics/quickstart.adoc" \
  "sdrf-proteomics/VERSIONING.adoc" \
  "sdrf-proteomics/metadata-guidelines/sample-metadata.adoc" \
  "sdrf-proteomics/metadata-guidelines/template-definitions.adoc" \
  "sdrf-proteomics/metadata-guidelines/sdrf-terms.tsv" \
  "psi-document/v1.0.0/SDRF_Proteomics_Specification_v1.0.0.pdf" \
  "psi-document/v1.1.0-dev/sdrf-proteomics-specification-v1.1.0-dev.pdf"
do
  if [ -f "$file" ]; then
    echo "✓ EXISTS: $file"
  else
    echo "✗ MISSING: $file"
  fi
done

echo -e "\n=== Checking template README files ==="
for template in ms-proteomics affinity-proteomics human vertebrates invertebrates plants cell-lines dda-acquisition dia-acquisition single-cell immunopeptidomics crosslinking metaproteomics olink somascan
do
  if [ -f "sdrf-proteomics/templates/$template/README.adoc" ]; then
    echo "✓ EXISTS: sdrf-proteomics/templates/$template/README.adoc"
  else
    echo "✗ MISSING: sdrf-proteomics/templates/$template/README.adoc"
  fi
done

echo -e "\n=== Checking sdrf-templates submodule ==="
if [ -d "sdrf-proteomics/sdrf-templates" ]; then
  echo "✓ sdrf-templates directory exists"
  if [ -f "sdrf-proteomics/sdrf-templates/templates.yaml" ]; then
    echo "✓ templates.yaml manifest found"
  else
    echo "✗ MISSING: templates.yaml"
  fi
  
  # Check for template YAML files mentioned
  echo -e "\n=== Checking template YAML files ==="
  for path in \
    "sdrf-proteomics/sdrf-templates/base/1.1.0/base.yaml" \
    "sdrf-proteomics/sdrf-templates/ms-proteomics/1.1.0/ms-proteomics.yaml" \
    "sdrf-proteomics/sdrf-templates/human/1.1.0/human.yaml" \
    "sdrf-proteomics/sdrf-templates/single-cell/1.0.0/single-cell.yaml" \
    "sdrf-proteomics/sdrf-templates/immunopeptidomics/1.0.0-dev/immunopeptidomics.yaml" \
    "sdrf-proteomics/sdrf-templates/metaproteomics/1.0.0-dev/metaproteomics.yaml" \
    "sdrf-proteomics/sdrf-templates/olink/1.0.0/olink.yaml" \
    "sdrf-proteomics/sdrf-templates/somascan/1.0.0/somascan.yaml"
  do
    if [ -f "$path" ]; then
      echo "✓ EXISTS: $path"
    else
      echo "✗ MISSING: $path"
    fi
  done
else
  echo "✗ MISSING: sdrf-templates directory/submodule"
fi

echo -e "\n=== Checking examples directory ==="
for file in \
  "examples/core/PXD002137/PXD002137.sdrf.tsv" \
  "examples/core/PXD004684/PXD004684.sdrf.tsv" \
  "examples/core/PXD006482/PXD006482.sdrf.tsv" \
  "examples/core/PXD008934/PXD008934.sdrf.tsv" \
  "examples/core/PDC000126/PDC000126.sdrf.tsv" \
  "examples/use-cases/crosslinking.sdrf.tsv" \
  "examples/use-cases/immunopeptidomics.sdrf.tsv" \
  "examples/use-cases/single-cell.sdrf.tsv"
do
  if [ -f "$file" ]; then
    echo "✓ EXISTS: $file"
  else
    echo "✗ MISSING: $file"
  fi
done

echo -e "\n=== Checking annotated-projects directory ==="
if [ -d "annotated-projects" ]; then
  echo "✓ annotated-projects directory exists"
  for file in \
    "annotated-projects/PXD008934/PXD008934.sdrf.tsv" \
    "annotated-projects/PXD017710/PXD017710.sdrf.tsv" \
    "annotated-projects/PXD000612/PXD000612.sdrf.tsv" \
    "annotated-projects/PXD018830/PXD018830-DIA.sdrf.tsv" \
    "annotated-projects/PXD000759/PXD000759.sdrf.tsv" \
    "annotated-projects/PXD001819/PXD001819.sdrf.tsv"
  do
    if [ -f "$file" ]; then
      echo "✓ EXISTS: $file"
    else
      echo "✗ MISSING: $file"
    fi
  done
else
  echo "✗ MISSING: annotated-projects directory"
fi

echo -e "\n=== Checking project files ==="
for file in "README.md" "CHANGELOG.md" "CITATION.cff" "LICENSE" "DEVELOPMENT.md"
do
  if [ -f "$file" ]; then
    echo "✓ EXISTS: $file"
  else
    echo "✗ MISSING: $file"
  fi
done

Repository: bigbio/proteomics-sample-metadata

Length of output: 4565


🏁 Script executed:

# Check if sdrf-templates is a git submodule
echo "=== Checking .gitmodules ==="
cat .gitmodules

echo -e "\n=== Contents of sdrf-templates directory ==="
ls -la sdrf-proteomics/sdrf-templates/ 2>/dev/null || echo "Directory empty or not readable"

echo -e "\n=== Check git submodule status ==="
git config --file=.gitmodules --name-only --get-regexp path

echo -e "\n=== Detailed submodule info ==="
git config --file=.gitmodules --get-regexp submodule || echo "No submodule configuration found"

echo -e "\n=== Check if llms.txt actually references missing files ==="
grep -n "sdrf-templates" llms.txt | head -20

Repository: bigbio/proteomics-sample-metadata

Length of output: 3068


Fix broken references to uninitialized git submodule in llms.txt.

The sdrf-templates git submodule (lines 36–71) is registered but not initialized. The directory exists but is empty, making all referenced YAML files (templates.yaml, base.yaml, ms-proteomics.yaml, etc.) unavailable. This undermines llms.txt's purpose as a navigation guide for LLMs and developers.

Additionally, one annotated project is missing: annotated-projects/PXD017710/PXD017710.sdrf.tsv (referenced on line 101).

Required fixes:

  1. Either initialize the sdrf-templates submodule and update the llms.txt file to reference the actual paths, or remove the entire "Template YAML Schemas" section (lines 36–71)
  2. Remove or verify the reference to annotated-projects/PXD017710/PXD017710.sdrf.tsv

All other documented paths (specification files, template README.adoc files, examples, and annotated projects except PXD017710) are accurate and exist.
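
The existence checks run by the scripts above could also be kept as a small reusable check. The sketch below is an illustration under stated assumptions, not tooling from the repository: the function name `missing_local_paths` is hypothetical, and it assumes every entry in llms.txt has the form `- <target> - <description>`, as in the file under review. Remote `http(s)` targets are skipped because they need a separate link checker.

```python
import re
import tempfile
from pathlib import Path

# Matches list entries of the form "- <target> - <description>".
ENTRY = re.compile(r"^- (\S+) - ")


def missing_local_paths(llms_text, repo_root):
    """Return local targets listed in llms_text that do not exist under repo_root."""
    root = Path(repo_root)
    missing = []
    for line in llms_text.splitlines():
        match = ENTRY.match(line)
        if not match:
            continue  # headings, blank lines, free text
        target = match.group(1)
        if target.startswith(("http://", "https://")):
            continue  # remote links are out of scope here
        if not (root / target).exists():
            missing.append(target)
    return missing


# Demonstration against a throwaway directory with made-up entries.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "README.md").write_text("demo")
    listing = (
        "- README.md - Project overview\n"
        "- missing/file.tsv - Not present\n"
        "- https://example.org/ - Remote link\n"
    )
    print(missing_local_paths(listing, tmp))
```

Because an uninitialized submodule leaves an empty directory behind, checking the individual file paths (rather than only the directory) is what surfaces the problem described above.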


![Contributors](https://flat.badgen.net/github/contributors/bigbio/proteomics-metadata-standard)
![Watchers](https://flat.badgen.net/github/watchers/bigbio/proteomics-metadata-standard)
![Stars](https://flat.badgen.net/github/stars/bigbio/proteomics-metadata-standard)
[![llms.txt](https://flat.badgen.net/static/llms.txt/available/blue)](https://github.com/bigbio/proteomics-metadata-standard/blob/master/llms.txt)

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

# Check if llms.txt file exists in the repository
find . -name "llms.txt" -type f 2>/dev/null

Repository: bigbio/proteomics-sample-metadata

Length of output: 87


🏁 Script executed:

# Verify README.md content around line 10
head -20 README.md | cat -n

Repository: bigbio/proteomics-sample-metadata

Length of output: 2661


🏁 Script executed:

# List files in the repository root to understand the structure
ls -la | head -20

Repository: bigbio/proteomics-sample-metadata

Length of output: 1287


Fix the badge URL to use the correct repository name.

The llms.txt file already exists in the repository. The badge link will fail not because the file is missing, but because the URL references the wrong repository. The badge URL should point to bigbio/proteomics-sample-metadata instead of bigbio/proteomics-metadata-standard.

Correct the URL in line 10 from:

https://github.com/bigbio/proteomics-metadata-standard/blob/master/llms.txt

to:

https://github.com/bigbio/proteomics-sample-metadata/blob/master/llms.txt
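
A mismatch of this kind can be listed automatically. The following is a rough sketch rather than an existing check in the repository: the function name `foreign_repo_links` is hypothetical, and the expected slug is passed in by the caller. It reports every `github.com/<owner>/<repo>` slug that differs from the expected one, so legitimate links to other repositories (for example tool repositories) will also appear and need a manual look.

```python
import re

# Captures the "owner/repo" part of a github.com URL.
GITHUB_URL = re.compile(r"https://github\.com/([\w.-]+/[\w.-]+)")


def foreign_repo_links(markdown, expected_slug):
    """Return sorted 'owner/repo' slugs in markdown that differ from expected_slug."""
    slugs = GITHUB_URL.findall(markdown)
    return sorted({slug for slug in slugs if slug != expected_slug})


# The badge line quoted in this comment.
badge = (
    "[![llms.txt](https://flat.badgen.net/static/llms.txt/available/blue)]"
    "(https://github.com/bigbio/proteomics-metadata-standard/blob/master/llms.txt)"
)
print(foreign_repo_links(badge, "bigbio/proteomics-sample-metadata"))
```

The badgen image URLs are not matched because the pattern only looks at links hosted on `github.com`.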

Comment on lines +248 to +253
[source]
----
INFO: Template 'human v1.2.0' is available.
Your file uses 'human v1.1.0' and is valid under that version.
See CHANGELOG for what changed: https://github.com/bigbio/.../CHANGELOG.md
----

⚠️ Potential issue | 🟡 Minor

Fix the placeholder URL in the example validator message.

The URL contains a literal ... which makes it invalid and causes the link checker to fail. Replace with either a concrete path to the CHANGELOG or use a clear placeholder format.

🔗 Proposed fix
 [source]
 ----
 INFO: Template 'human v1.2.0' is available.
   Your file uses 'human v1.1.0' and is valid under that version.
-  See CHANGELOG for what changed: https://github.com/bigbio/.../CHANGELOG.md
+  See CHANGELOG for what changed: https://github.com/bigbio/proteomics-metadata-standard/blob/master/CHANGELOG.md
 ----
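
Elided URLs like the one flagged here can be found with a simple scan before a link checker runs. This is a minimal sketch under assumptions: the function name `placeholder_urls` is hypothetical, and a URL counts as a placeholder when it contains a literal `...` or `…`. A sentence-final ellipsis that directly follows a real URL would be a false positive, so results should be reviewed by hand.

```python
import re

# A deliberately loose URL pattern: scheme, then everything up to whitespace
# or a common closing delimiter.
URL = re.compile(r"https?://[^\s)\]>'\"]+")


def placeholder_urls(text):
    """Return URLs in text that contain a literal '...' or '…'."""
    return [url for url in URL.findall(text) if "..." in url or "…" in url]


# The validator message quoted in this comment.
message = "See CHANGELOG for what changed: https://github.com/bigbio/.../CHANGELOG.md"
print(placeholder_urls(message))
```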
