Add SSSOM mappings for Dataset/DatasetCollection/File/FileCollection in the exchange layer #147

Merged: realmarcin merged 8 commits into main from update_exchange on Apr 26, 2026

@realmarcin
Collaborator

Summary

Closes a gap in the D4D ↔ RO-Crate semantic exchange layer: four classes (Dataset, DatasetCollection, File, FileCollection) were added to the D4D schema after the original SSSOM mappings were authored, so they had no entries in the SSSOM tables. This PR adds the missing class-level and key-slot mappings.

What's mapped

| D4D class | Predicate | RO-Crate target | Notes |
|---|---|---|---|
| d4d:DatasetCollection | skos:exactMatch | schema:Dataset | RO-Crate root: @type=["Dataset", "https://w3id.org/EVI#ROCrate"], @id="./". Matches tree_root: true on CoreDatasetCollection. |
| d4d:DatasetCollection | skos:closeMatch | dcat:Catalog | Semantic-catalog view; matches existing close_mappings: dcat:Catalog. |
| d4d:File | skos:exactMatch | schema:MediaObject | Matches class_uri in D4D_FileCollection.yaml. |
| d4d:File | skos:closeMatch | schema:DigitalDocument | Matches existing exact_mappings. |
| d4d:FileCollection | skos:exactMatch | schema:Dataset | Nested Dataset in RO-Crate hasPart. |
| d4d:FileCollection | skos:closeMatch | dcat:Distribution | When the FileCollection is logically a single distribution. |

Plus 6 key-slot rows (DatasetCollection.resources → schema:hasPart, File.file_type → d4d:fileType, FileCollection.{resources, collection_type, file_count, total_bytes} → respective slot URIs).
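
For orientation, the six class-level rows above can be written out as a minimal SSSOM-style TSV. This is only a sketch: it keeps three standard SSSOM columns (subject_id, predicate_id, object_id), whereas the canonical semantic file uses a wider layout that is not reproduced here, and `to_tsv` is a hypothetical helper name.

```python
import csv
import io

# Assumption: a reduced column set; the real file carries more columns.
COLUMNS = ["subject_id", "predicate_id", "object_id"]

# Class-level mappings exactly as listed in the table above.
CLASS_MAPPINGS = [
    ("d4d:DatasetCollection", "skos:exactMatch", "schema:Dataset"),
    ("d4d:DatasetCollection", "skos:closeMatch", "dcat:Catalog"),
    ("d4d:File", "skos:exactMatch", "schema:MediaObject"),
    ("d4d:File", "skos:closeMatch", "schema:DigitalDocument"),
    ("d4d:FileCollection", "skos:exactMatch", "schema:Dataset"),
    ("d4d:FileCollection", "skos:closeMatch", "dcat:Distribution"),
]


def to_tsv(rows):
    """Serialize mapping tuples as tab-separated text with a header row."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(COLUMNS)
    writer.writerows(rows)
    return buf.getvalue()


tsv_text = to_tsv(CLASS_MAPPINGS)
print(tsv_text)
```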

Files

  • src/data_sheets_schema/alignment/d4d_rocrate_sssom_mapping.tsv: 95 → 107 rows
  • data/mappings/d4d_rocrate_structural_mapping.sssom.tsv: 149 → 155 rows
  • src/data_sheets_schema/alignment/d4d_rocrate_skos_alignment.ttl: +12 SKOS triples + dcat: prefix declaration

The existing d4d:Dataset → schema:Dataset row was left intact.

Out of scope (explicit follow-ups)

  • src/fairscape_integration/d4d_to_fairscape.py:292-295 — the converter code's existing # TODO: Convert FileCollection.resources to RO-Crate File entities is not addressed here. The mapping layer is now ready for that work.
  • The generated *_comprehensive*.tsv and *_uri*.tsv variants weren't regenerated. The canonical files (semantic + structural) are the source of truth in this PR; a follow-up can update generators (src/alignment/) to auto-discover the new classes.

Test plan

  • poetry run python -m pytest tests/test_alignment tests/test_fairscape_integration -v — 190 passed, 2 skipped, 0 failures
  • SSSOMIntegration parses both files; semantic via custom reader, structural via sssom-py
  • Spot-check via get_mappings_by_subject returns the new class+slot mappings
  • All new D4D class subjects (d4d:DatasetCollection, d4d:File, d4d:FileCollection) appear in the file
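
The spot-check in the test plan can be pictured with a small stand-in for the lookup. The function below is a hypothetical, stdlib-only approximation; it is not the project's SSSOMIntegration.get_mappings_by_subject, whose signature is not shown in this PR, and the sample rows are illustrative rather than copied from the canonical file.

```python
import csv

# Illustrative sample only; the '#' line imitates an SSSOM comment header.
SAMPLE_TSV = (
    "# Total mappings: 3 (illustrative sample, not the real file)\n"
    "subject_id\tpredicate_id\tobject_id\n"
    "d4d:File\tskos:exactMatch\tschema:MediaObject\n"
    "d4d:File\tskos:closeMatch\tschema:DigitalDocument\n"
    "d4d:FileCollection\tskos:exactMatch\tschema:Dataset\n"
)


def mappings_by_subject(tsv_text, subject_id):
    """Return all rows whose subject_id equals the requested CURIE."""
    # Drop comment lines so DictReader sees the column header first.
    data_lines = [ln for ln in tsv_text.splitlines() if not ln.startswith("#")]
    reader = csv.DictReader(data_lines, delimiter="\t")
    return [row for row in reader if row["subject_id"] == subject_id]


file_rows = mappings_by_subject(SAMPLE_TSV, "d4d:File")
print([(r["predicate_id"], r["object_id"]) for r in file_rows])
```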

🤖 Generated with Claude Code

These four classes were added to the D4D schema after the original
semantic exchange layer was authored, leaving them without RO-Crate
mappings. This commit closes that gap.

Semantic SSSOM (src/data_sheets_schema/alignment/d4d_rocrate_sssom_mapping.tsv):
  +12 rows (95 → 107)
  - DatasetCollection → schema:Dataset (exactMatch, RO-Crate root)
  - DatasetCollection → dcat:Catalog (closeMatch, semantic-catalog view)
  - File → schema:MediaObject (exactMatch)
  - File → schema:DigitalDocument (closeMatch)
  - FileCollection → schema:Dataset (exactMatch, nested in hasPart)
  - FileCollection → dcat:Distribution (closeMatch)
  - 6 key-slot rows: DatasetCollection.resources/FileCollection.resources →
    schema:hasPart, File.file_type → d4d:fileType, FileCollection.{collection_type,
    file_count, total_bytes} → d4d:collectionType / d4d:fileCount / dcat:byteSize

Structural SSSOM (data/mappings/d4d_rocrate_structural_mapping.sssom.tsv):
  +6 rows (149 → 155) — slot-level rows mirroring the semantic-file slots

SKOS alignment (src/data_sheets_schema/alignment/d4d_rocrate_skos_alignment.ttl):
  - Added dcat: prefix declaration
  - Added 6 class-level + 6 slot-level skos triples mirroring the SSSOM rows

Per the user's note that DatasetCollection may be the RO-Crate root
(@type=["Dataset", "https://w3id.org/EVI#ROCrate"], @id="./"),
DatasetCollection is given a dual mapping: exactMatch → schema:Dataset
(root semantics) and closeMatch → dcat:Catalog (semantic-catalog view).
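
As an illustration of the root semantics described above, the sketch below builds the RO-Crate shape these mappings imply: a root Dataset, a FileCollection as a nested Dataset under hasPart, and a File as a MediaObject. The "files/" and "files/data.csv" identifiers are hypothetical, and the @context and metadata descriptor entity that a complete RO-Crate would carry are omitted for brevity.

```python
import json

# DatasetCollection as the RO-Crate root (types and @id taken from the PR text).
root = {
    "@id": "./",
    "@type": ["Dataset", "https://w3id.org/EVI#ROCrate"],
    "hasPart": [{"@id": "files/"}],
}

# FileCollection as a nested Dataset; the identifier is a made-up example.
file_collection = {
    "@id": "files/",
    "@type": "Dataset",
    "hasPart": [{"@id": "files/data.csv"}],
}

# File as a schema:MediaObject; the identifier is a made-up example.
file_entity = {"@id": "files/data.csv", "@type": "MediaObject"}

graph = {"@graph": [root, file_collection, file_entity]}
print(json.dumps(graph, indent=2))
```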

Out of scope for this PR (existing TODOs remain):
  - src/fairscape_integration/d4d_to_fairscape.py:292-295 — converter
    code does not yet traverse FileCollection.resources to emit RO-Crate
    File entities. The mapping layer is now ready; converter update is
    a separate follow-up.
  - The generated comprehensive/uri SSSOM variants weren't regenerated;
    the canonical files (semantic + structural) are the source of truth.

Validation:
  - SSSOMIntegration parses both files (semantic via custom reader,
    structural via sssom-py per the existing column-naming setup)
  - All 190 tests in tests/test_alignment + tests/test_fairscape_integration pass

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings April 26, 2026 01:23
Copilot AI left a comment

Pull request overview

Adds missing semantic (SKOS) and structural (SSSOM) mappings for recently added D4D classes/slots (DatasetCollection, File, FileCollection) to close gaps in the D4D ↔ RO-Crate exchange-layer alignment.

Changes:

  • Added new class-level SKOS alignments for DatasetCollection, File, and FileCollection (plus dcat: prefix) in the TTL alignment file.
  • Added new class/slot rows to the semantic SSSOM TSV mapping table.
  • Added 6 new key-slot rows to the structural SSSOM TSV mapping table.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.

| File | Description |
|---|---|
| src/data_sheets_schema/alignment/d4d_rocrate_sssom_mapping.tsv | Adds new class + key-slot semantic mappings for DatasetCollection/File/FileCollection. |
| src/data_sheets_schema/alignment/d4d_rocrate_skos_alignment.ttl | Extends the SKOS alignment graph with new class and slot triples and declares dcat: prefix. |
| data/mappings/d4d_rocrate_structural_mapping.sssom.tsv | Adds new structural (class/slot → RO-Crate predicate) mappings for the same additions. |

Comment thread src/data_sheets_schema/semantic_exchange/d4d_rocrate_sssom_mapping.tsv Outdated
Comment thread src/data_sheets_schema/semantic_exchange/d4d_rocrate_sssom_mapping.tsv Outdated
Comment thread src/data_sheets_schema/alignment/d4d_rocrate_sssom_mapping.tsv Outdated
realmarcin and others added 5 commits April 25, 2026 20:15
A reusable Claude Code slash command that captures the workflow used in
this PR — adding D4D ↔ RO-Crate / FAIRSCAPE mappings for new schema
classes. The skill:

- Describes the 19-column semantic SSSOM and 17-column structural SSSOM
  layouts and points at the canonical files
- Provides a decision rubric for choosing primary/secondary RO-Crate
  targets based on class_uri / exact_mappings / tree_root annotations
- Includes row templates and a Python helper-script skeleton
- Documents standard RO-Crate target conventions (root Dataset,
  schema:MediaObject, dcat:Catalog, schema:hasPart, etc.)
- Specifies the mandatory validation step via SSSOMIntegration + pytest
- Codifies branch / commit / PR conventions
- Calls out known follow-ups to keep out of scope (converter TODOs,
  generator regen, schema YAML touch-ups)

Cross-references PR #147 as the canonical worked example.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Generated from the D4D ↔ RO-Crate semantic SSSOM by parsing rocrate_json_path
patterns to extract entity types and their properties. Shows:
- Dataset (root) with properties grouped by namespace (schema.org, DCAT,
  FAIRSCAPE EVI, Croissant RAI, D4D-specific)
- Sub-entities: MediaObject, Person, Organization, Grant, CreativeWork,
  DefinedTerm
- Reference edges (author/creator/contributor → Person, funder → Grant,
  publisher → Organization, citation → CreativeWork, about → DefinedTerm,
  hasPart → MediaObject)
- ROCrate as root marker connected via dashed @type edge

Generator: src/alignment/ (helper script captured in /tmp during this PR);
rendered with graphviz dot -Gdpi=180.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Per-class side-by-side comparison of slot counts in the d4d-core
semantic exchange layer (left, orange) versus mapped/standard
RO-Crate properties on the corresponding target type (right, green).

Right-side counts combine SSSOM-discovered properties with the
schema.org / RO-Crate 1.1 baseline for sub-entity types
(Person, Organization, Grant, MediaObject, Distribution).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… site coverage

- src/data_sheets_schema/alignment/ → src/data_sheets_schema/semantic_exchange/
  (canonical SKOS TTL + semantic SSSOM artifacts)
- data/mappings/ → data/semantic_exchange/
  (sssom-py-compatible structural mapping + analysis docs)
- src/alignment/ → src/semantic_exchange/  (generator scripts)
- tests/test_alignment/ → tests/test_semantic_exchange/

Updated all path references in Makefile, generator scripts, schema YAMLs,
fairscape_integration, notes, and tests. All 190 tests pass.

Visibility improvements:
- README.md: new "D4D-Core Schema" + "Semantic Exchange Layer" sections
  with per-artifact path tables
- docs/home.md: top-level pointers to D4D-Core and Semantic Exchange
- docs/d4d_core.md: new hand-curated landing page for the core schema
  (artifacts, build/validate targets, curated example datasheets, class
  crosswalk, rationale)
- docs/semantic_exchange.md: new hand-curated landing page for the
  exchange layer (canonical artifacts, generator scripts, validation,
  /d4d-add-mapping workflow, namespaces, coverage stats)
- mkdocs.yml: added "D4D-Core" and "Semantic Exchange" to nav

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Previously the chart only covered 8 hand-listed structural classes.
Now it shows every d4d-core class, sorted by slot count, in a
two-column layout with poster-friendly aspect (~1.84).

Right-side counts:
- Structural targets (Dataset/Distribution/Person/Org/Grant/etc.):
  full property surface (SSSOM-discovered + schema.org baseline)
- Property/wrapper classes: derived by looking up which slots have
  the class as range, then checking the SKOS TTL for mapped targets

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- SSSOM subject_id values for the 6 new key-slot rows now use the
  underscore form (d4d:Class_slot) to match the SKOS TTL subjects
  and what generate_sssom_mapping.py emits, so downstream lookups
  via SSSOMIntegration.get_mappings_by_subject() resolve correctly.
- SSSOM header refreshed: '# Total mappings: 107' (was 95) and
  '# Date: 2026-04-26'.
- SKOS TTL header bumped to Version 1.1 / Date 2026-04-26 and the
  alignment-statistics block updated to reflect the current 112
  triples (69 exact / 25 close / 10 related / 7 narrow / 1 broad)
  and the per-namespace counts (schema.org 57, rai 29, d4d 10,
  evi 7, dcat 3, rdf 1).
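
The underscore convention for slot-level subjects can be sketched as follows. Both helper names are hypothetical, and the dotted input form handled by `normalize_subject_id` is an assumption; the commit message states only the target form (d4d:Class_slot), not what the rows looked like before.

```python
def slot_subject_id(class_name, slot_name, prefix="d4d"):
    """Build a slot-level subject CURIE in the underscore form."""
    return f"{prefix}:{class_name}_{slot_name}"


def normalize_subject_id(subject_id):
    """Rewrite an assumed dotted 'prefix:Class.slot' into 'prefix:Class_slot'."""
    prefix, sep, local = subject_id.partition(":")
    if not sep:
        # Not a CURIE; leave it untouched.
        return subject_id
    # Replace only the first dot, which separates class from slot.
    return f"{prefix}:{local.replace('.', '_', 1)}"


print(slot_subject_id("FileCollection", "file_count"))
print(normalize_subject_id("d4d:File.file_type"))
```

Keeping a single spelling matters because a lookup keyed on subject_id is an exact string match, so a row stored under one form cannot be found under the other.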

Tests: 190 passed, 2 skipped.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copilot AI left a comment

Pull request overview

Copilot reviewed 29 out of 47 changed files in this pull request and generated 1 comment.


Comment thread .claude/commands/d4d-add-mapping.md
The skill doc still pointed contributors at the pre-rename paths
(src/data_sheets_schema/alignment/, data/mappings/, src/alignment/,
tests/test_alignment/) so its grep, git-add, and validation snippets
no longer matched the canonical files. Repointed every reference to
the renamed directories.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@realmarcin realmarcin merged commit 7ae9832 into main Apr 26, 2026
3 checks passed
@realmarcin realmarcin deleted the update_exchange branch April 26, 2026 08:28
realmarcin added a commit that referenced this pull request Apr 27, 2026
* Add SSSOM mappings for Dataset/DatasetCollection/File/FileCollection

These four classes were added to the D4D schema after the original
semantic exchange layer was authored, leaving them without RO-Crate
mappings. This commit closes that gap.

Semantic SSSOM (src/data_sheets_schema/alignment/d4d_rocrate_sssom_mapping.tsv):
  +12 rows (95 → 107)
  - DatasetCollection → schema:Dataset (exactMatch, RO-Crate root)
  - DatasetCollection → dcat:Catalog (closeMatch, semantic-catalog view)
  - File → schema:MediaObject (exactMatch)
  - File → schema:DigitalDocument (closeMatch)
  - FileCollection → schema:Dataset (exactMatch, nested in hasPart)
  - FileCollection → dcat:Distribution (closeMatch)
  - 6 key-slot rows: DatasetCollection.resources/FileCollection.resources →
    schema:hasPart, File.file_type → d4d:fileType, FileCollection.{collection_type,
    file_count, total_bytes} → d4d:collectionType / d4d:fileCount / dcat:byteSize

Structural SSSOM (data/mappings/d4d_rocrate_structural_mapping.sssom.tsv):
  +6 rows (149 → 155) — slot-level rows mirroring the semantic-file slots

SKOS alignment (src/data_sheets_schema/alignment/d4d_rocrate_skos_alignment.ttl):
  - Added dcat: prefix declaration
  - Added 6 class-level + 6 slot-level skos triples mirroring the SSSOM rows

Per the user's note that DatasetCollection may be the RO-Crate root
(@type=["Dataset", "https://w3id.org/EVI#ROCrate"], @id="./"),
DatasetCollection is given a dual mapping: exactMatch → schema:Dataset
(root semantics) and closeMatch → dcat:Catalog (semantic-catalog view).

Out of scope for this PR (existing TODOs remain):
  - src/fairscape_integration/d4d_to_fairscape.py:292-295 — converter
    code does not yet traverse FileCollection.resources to emit RO-Crate
    File entities. The mapping layer is now ready; converter update is
    a separate follow-up.
  - The generated comprehensive/uri SSSOM variants weren't regenerated;
    the canonical files (semantic + structural) are the source of truth.

Validation:
  - SSSOMIntegration parses both files (semantic via custom reader,
    structural via sssom-py per the existing column-naming setup)
  - All 190 tests in tests/test_alignment + tests/test_fairscape_integration pass

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Add /d4d-add-mapping skill for SSSOM exchange layer edits

A reusable Claude Code slash command that captures the workflow used in
this PR — adding D4D ↔ RO-Crate / FAIRSCAPE mappings for new schema
classes. The skill:

- Describes the 19-column semantic SSSOM and 17-column structural SSSOM
  layouts and points at the canonical files
- Provides a decision rubric for choosing primary/secondary RO-Crate
  targets based on class_uri / exact_mappings / tree_root annotations
- Includes row templates and a Python helper-script skeleton
- Documents standard RO-Crate target conventions (root Dataset,
  schema:MediaObject, dcat:Catalog, schema:hasPart, etc.)
- Specifies the mandatory validation step via SSSOMIntegration + pytest
- Codifies branch / commit / PR conventions
- Calls out known follow-ups to keep out of scope (converter TODOs,
  generator regen, schema YAML touch-ups)

Cross-references PR #147 as the canonical worked example.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Add Fig 7: D4D RO-Crate profile schema diagram

Generated from the D4D ↔ RO-Crate semantic SSSOM by parsing rocrate_json_path
patterns to extract entity types and their properties. Shows:
- Dataset (root) with properties grouped by namespace (schema.org, DCAT,
  FAIRSCAPE EVI, Croissant RAI, D4D-specific)
- Sub-entities: MediaObject, Person, Organization, Grant, CreativeWork,
  DefinedTerm
- Reference edges (author/creator/contributor → Person, funder → Grant,
  publisher → Organization, citation → CreativeWork, about → DefinedTerm,
  hasPart → MediaObject)
- ROCrate as root marker connected via dashed @type edge

Generator: src/alignment/ (helper script captured in /tmp during this PR);
rendered with graphviz dot -Gdpi=180.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Add Fig 8: butterfly chart of D4D-core vs RO-Crate slot density

Per-class side-by-side comparison of slot counts in the d4d-core
semantic exchange layer (left, orange) versus mapped/standard
RO-Crate properties on the corresponding target type (right, green).

Right-side counts combine SSSOM-discovered properties with the
schema.org / RO-Crate 1.1 baseline for sub-entity types
(Person, Organization, Grant, MediaObject, Distribution).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Rename mapping/alignment dirs to semantic_exchange; add README + docs site coverage

- src/data_sheets_schema/alignment/ → src/data_sheets_schema/semantic_exchange/
  (canonical SKOS TTL + semantic SSSOM artifacts)
- data/mappings/ → data/semantic_exchange/
  (sssom-py-compatible structural mapping + analysis docs)
- src/alignment/ → src/semantic_exchange/  (generator scripts)
- tests/test_alignment/ → tests/test_semantic_exchange/

Updated all path references in Makefile, generator scripts, schema YAMLs,
fairscape_integration, notes, and tests. All 190 tests pass.

Visibility improvements:
- README.md: new "D4D-Core Schema" + "Semantic Exchange Layer" sections
  with per-artifact path tables
- docs/home.md: top-level pointers to D4D-Core and Semantic Exchange
- docs/d4d_core.md: new hand-curated landing page for the core schema
  (artifacts, build/validate targets, curated example datasheets, class
  crosswalk, rationale)
- docs/semantic_exchange.md: new hand-curated landing page for the
  exchange layer (canonical artifacts, generator scripts, validation,
  /d4d-add-mapping workflow, namespaces, coverage stats)
- mkdocs.yml: added "D4D-Core" and "Semantic Exchange" to nav

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Fig 8: expand butterfly chart to all 76 d4d-core classes (2-column)

Previously the chart only covered 8 hand-listed structural classes.
Now it shows every d4d-core class, sorted by slot count, in a
two-column layout with poster-friendly aspect (~1.84).

Right-side counts:
- Structural targets (Dataset/Distribution/Person/Org/Grant/etc.):
  full property surface (SSSOM-discovered + schema.org baseline)
- Property/wrapper classes: derived by looking up which slots have
  the class as range, then checking the SKOS TTL for mapped targets

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Address Copilot review on PR #147

- SSSOM subject_id values for the 6 new key-slot rows now use the
  underscore form (d4d:Class_slot) to match the SKOS TTL subjects
  and what generate_sssom_mapping.py emits, so downstream lookups
  via SSSOMIntegration.get_mappings_by_subject() resolve correctly.
- SSSOM header refreshed: '# Total mappings: 107' (was 95) and
  '# Date: 2026-04-26'.
- SKOS TTL header bumped to Version 1.1 / Date 2026-04-26 and the
  alignment-statistics block updated to reflect the current 112
  triples (69 exact / 25 close / 10 related / 7 narrow / 1 broad)
  and the per-namespace counts (schema.org 57, rai 29, d4d 10,
  evi 7, dcat 3, rdf 1).
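
A statistics block like the one described above could be recomputed rather than edited by hand. The following is a rough, line-oriented sketch using a regular expression; it assumes one mapping per predicate occurrence and would miss Turtle object lists written with commas, so a real Turtle parser would be more robust. The sample triples are illustrative, and `tally` is a hypothetical name.

```python
import re
from collections import Counter

# Captures the SKOS mapping kind and the namespace prefix of the object.
MATCH_RE = re.compile(r"skos:(exact|close|related|narrow|broad)Match\s+(\w+):")

# Illustrative sample, not the canonical TTL file.
SAMPLE_TTL = """\
d4d:DatasetCollection skos:exactMatch schema:Dataset .
d4d:DatasetCollection skos:closeMatch dcat:Catalog .
d4d:File skos:exactMatch schema:MediaObject .
d4d:FileCollection skos:closeMatch dcat:Distribution .
"""


def tally(ttl_text):
    """Count mapping triples per predicate kind and per target namespace."""
    by_predicate, by_namespace = Counter(), Counter()
    for kind, namespace in MATCH_RE.findall(ttl_text):
        by_predicate[kind] += 1
        by_namespace[namespace] += 1
    return by_predicate, by_namespace


by_predicate, by_namespace = tally(SAMPLE_TTL)
print(dict(by_predicate), dict(by_namespace))

# The predicate totals quoted in the commit message add up to the stated 112.
REPORTED = {"exact": 69, "close": 25, "related": 10, "narrow": 7, "broad": 1}
assert sum(REPORTED.values()) == 112
```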

Tests: 190 passed, 2 skipped.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Update /d4d-add-mapping skill paths to semantic_exchange

The skill doc still pointed contributors at the pre-rename paths
(src/data_sheets_schema/alignment/, data/mappings/, src/alignment/,
tests/test_alignment/) so its grep, git-add, and validation snippets
no longer matched the canonical files. Repointed every reference to
the renamed directories.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Drop duplicate / stale SSSOM artifacts from data/semantic_exchange

The data/semantic_exchange/ directory had grown to seven SSSOM TSV
copies, several of which were stale snapshots or byte-identical
duplicates of the canonical files in
src/data_sheets_schema/semantic_exchange/. Two of them were a v1/v2
pair that were impossible to interpret without comparing dates by
hand (v1 was the newer 2026-04-09 / 284-attr file; v2 was a stale
2026-03-23 / 268-attr file).

Deleted from data/semantic_exchange/:
- d4d_rocrate_sssom_mapping.tsv          (stale 102-row snapshot)
- d4d_rocrate_sssom_mapping_subset.tsv   (duplicate of src/)
- d4d_rocrate_sssom_comprehensive.tsv    (duplicate of src/)
- d4d_rocrate_sssom_uri_mapping.tsv      (duplicate of src/)
- d4d_rocrate_sssom_uri_comprehensive_v1.tsv  (duplicate of src/'s
                                                canonical
                                                d4d_rocrate_sssom_uri_comprehensive.tsv)
- d4d_rocrate_sssom_uri_comprehensive_v2.tsv  (stale older snapshot)
- d4d_rocrate_sssom_uri_interface.tsv    (orphan; not referenced
                                          anywhere in code or Make)

Kept in data/semantic_exchange/ (canonical here):
- d4d_rocrate_structural_mapping.sssom.tsv
- d4d_rocrate_structural_mapping_summary.md
- STRUCTURAL_MAPPING_ANALYSIS.md
- uri_mapping_recommendations.md
- README.md (rewritten to point at src/.../semantic_exchange/ for
             everything except the structural mapping)
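
One way to confirm that a copy is a byte-identical duplicate of a canonical file, as claimed for several of the deleted files, is to compare content hashes. This is a generic sketch, not the procedure used in the commit; the function names and the directory arguments are hypothetical.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_digest(path):
    """Return the SHA-256 hex digest of a file's bytes."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def find_duplicates(canonical_dir, candidate_dir, pattern="*.tsv"):
    """Map each candidate file name to the canonical names with identical bytes."""
    canonical = defaultdict(list)
    for path in Path(canonical_dir).glob(pattern):
        canonical[file_digest(path)].append(path.name)

    duplicates = {}
    for path in sorted(Path(candidate_dir).glob(pattern)):
        digest = file_digest(path)
        if digest in canonical:
            duplicates[path.name] = sorted(canonical[digest])
    return duplicates
```

Files that differ by even one byte (for example a stale snapshot with fewer rows) do not appear in the result, so they still need a manual comparison.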

Updated tests/test_semantic_exchange/test_sssom_validation.py to
look up comprehensive / uri / uri_comprehensive in the canonical
src/ tree instead of the deleted data/ copies. Tests: 190 passed,
2 skipped.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
realmarcin added a commit that referenced this pull request Apr 29, 2026
Conflict resolution:
- Canonical SSSOM/SKOS files at src/data_sheets_schema/semantic_exchange/:
  kept ours (114-row mapping, plus 7 d4d-core class additions on top of
  the 107-row baseline that PR #147 already shipped, plus expanded SKOS
  TTL).
- Mapping TSVs duplicated under data/semantic_exchange/: deleted.
  PR #148 (Name cleanup) already moved them to the canonical
  src/data_sheets_schema/semantic_exchange/ location.
- Poster figures added by main (fig7_rocrate_profile.{dot,png},
  fig8_exchange_butterfly.png): removed per project rule that poster
  artifacts don't get committed here.
- README + test_sssom_validation.py: took main's version (correctly
  reflects the post-#148 structural/canonical split).
- docs/html_output/concatenated/curated/*.html re-rendered from current
  renderer + curated YAMLs (generated, not hand-merged).
- data/semantic_exchange/d4d_rocrate_structural_mapping.sssom.tsv:
  kept ours (superset of main).

Tests: tests.test_semantic_exchange.test_sssom_validation passes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
realmarcin added a commit that referenced this pull request Apr 29, 2026
…+ renderer subtitle annotations (#149)

* Expand semantic exchange layer to cover all concrete D4D classes

Coverage now: 73/76 d4d-core classes (96%) and 74/77 full schema
classes (96%) mapped via class-level SKOS triples. Only the abstract
base classes — DatasetProperty, Information, NamedThing — remain
intentionally unmapped.

SKOS TTL changes (v1.2 → 189 triples, was 119):
- New class-level mappings for: DataSubset, CoreDataset,
  CoreDatasetCollection, CoreDistribution
- New sub-entity class mappings for: Person, Creator, Organization,
  Grantor, Grant, FundingMechanism, VariableMetadata, Software,
  Maintainer, DataCollector
- New DatasetProperty subclass mappings for ~50 wrapper classes
  (Instance, SamplingStrategy, LabelingStrategy, AnnotationAnalysis,
  HumanSubjectResearch, InformedConsent, Deidentification,
  RawDataSource, ImputationProtocol, AtRiskPopulations, …)
- Final gap-fill for consent workflow, ExportControlRegulatoryRestrictions,
  MissingInfo, ThirdPartySharing, FormatDialect

SSSOM TSV: +7 class-level rows (114 total) — explicit Core* +
DataSubset rows so d4d-core has its own class-level SSSOM coverage
(d4d-core is the main visible / messaged data product).

Structural SSSOM: +4 class-level rows for the same.

Fig 5 (d4d-core) and Fig 6 (full) regenerated. Coloring scheme
updated: ORANGE = in SSSOM exchange layer (mapped); BLUE = not yet
mapped. Mapping detection credits a class if it has a class-level
SKOS triple, or if a schema slot whose range is that class has a
slot-level SKOS triple.
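
The detection rule can be sketched as a set union. This is a sketch under stated assumptions, not the figure script itself: the function name and the three pre-extracted inputs are hypothetical, and the sample data is illustrative only.

```python
def mapped_classes(class_triples, slot_triples, slot_ranges):
    """Return the set of classes credited as mapped.

    class_triples: class names with a class-level SKOS triple
    slot_triples:  slot names with a slot-level SKOS triple
    slot_ranges:   dict of slot name -> class name the slot ranges to
    (All three inputs are hypothetical, pre-extracted structures.)
    """
    via_slot = {slot_ranges[s] for s in slot_triples if s in slot_ranges}
    return set(class_triples) | via_slot


credited = mapped_classes(
    class_triples={"File"},
    slot_triples={"dialect"},
    slot_ranges={"dialect": "FormatDialect", "creators": "Creator"},
)
print(sorted(credited))
```

Here `FormatDialect` is credited through the mapped `dialect` slot, while `Creator` is not, because the `creators` slot has no slot-level triple in the sample.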

Tests: 190 passed, 2 skipped.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Add Fig 9: Ontologies & Standards used by D4D-Core

Horizontal bar chart of every external vocabulary referenced by the
d4d-core schema (class_uri / slot_uri / *_mappings) plus SKOS-target
counts from the alignment TTL. Bars colored by category:

- FAIR-core (schema.org, DCTerms, DCAT)
- RAI / Ethics (Croissant RAI, DUO)
- Provenance / Quality (FAIRSCAPE EVI, PROV-O, AIO, QUDT, SKOS)
- Domain / Internal (d4d:)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Re-render GC HTML thumbnails to show project title at top

Top-cropped 1100px slice from each project's curated
*_human_readable.html so the poster panel shows the project name
("AI READI Dataset Documentation", etc.) plus the first section
header rather than mid-document content.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Re-render GC HTML thumbnails in forced dark theme

Adds src/html/output/forced-dark.css — a dark theme override that
extends datasheet-common.css with table styling for dark mode (the
existing @media (prefers-color-scheme: dark) block didn't cover
.data-table). Used via weasyprint's -s flag to render the four
project HTML thumbnails for the poster.

The dark theme matches the requested design: navy/charcoal
background, purple gradient header, dark cards with rounded corners,
blue-accent left border on table cells. Margins auto-trimmed in the
render script so the thumbnails are tight to content.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Re-render HTML thumbs at 1750px crop + add schema-snippet PNG

Reduce thumb crop from 2400px back to 1750px (between original 1200
and doubled 2400) to make room for a d4d-core LinkML schema snippet
PNG below each project thumbnail in the poster's records panel.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Pull section subtitles from schema annotations, not hardcoded dict

Add `annotations.d4d:section_question` to each of the 8 D4D module YAMLs
holding the canonical Datasheets-for-Datasets paper question (e.g.
Motivation: "For what purpose was the dataset created?"). Update the
human-readable HTML renderer to read these annotations via yaml.safe_load
and use them as the per-section subtitle, with the previous hardcoded
strings retained only as fallbacks.

This makes the schema the single source of truth for section subtitles,
so the HTML and YAML can no longer drift (e.g. "Why was this dataset
created?" vs "For what purpose was the dataset created?"). Re-rendered
the 4 curated GC datasheets and the dark-theme poster thumbnails.
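
A minimal sketch of the annotation-with-fallback lookup, assuming the module YAML has already been parsed into a dict (for example by yaml.safe_load). The function name `section_subtitle` is hypothetical and the renderer's real code may differ.

```python
SECTION_KEY = "d4d:section_question"


def section_subtitle(module: dict, fallback: str) -> str:
    """Return the module's section question, or the fallback string.

    `module` is assumed to be the dict parsed from a module YAML.
    An annotation may be written as a plain value or as a mapping
    with a "value" key, so both shapes are handled.
    """
    annotation = (module.get("annotations") or {}).get(SECTION_KEY)
    if isinstance(annotation, dict):
        annotation = annotation.get("value")
    return annotation if annotation else fallback


module = {"annotations": {SECTION_KEY: "For what purpose was the dataset created?"}}
print(section_subtitle(module, "Why was this dataset created?"))
print(section_subtitle({}, "Why was this dataset created?"))
```

The second call shows the fallback path: a module without the annotation still gets a subtitle.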

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Re-render butterfly chart as 4-panel 2x2 with 18pt row labels

Split 76 d4d-core classes into 4 quartiles (19 each) arranged 2x2 so the
row labels can grow from 8pt to 18pt while keeping all classes visible.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Re-render butterfly chart as 6-panel 2x3 with 20pt row labels

Split 76 d4d-core classes into 6 groups (~13 each) arranged 2x3 to bring
row labels up to 20pt (from 18pt). Drop the "→ target" suffix to fit
narrower per-panel widths; the target type is captured by the
D4D-core / RO-Crate column headers above each panel.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* D4D-Assistant figure: larger multi-arm robot, black gears, 20pt fonts

- Robot enlarged (use width 320 → 440) with viewBox extended to fit
  three angled arms per side (6 arms total).
- "Working indicator" gears under the robot recolored from orange to
  black per poster review feedback.
- Bumped chest "agent" text and "D4D-Core record" header from 16pt to
  20pt so all text in the figure is at least 20pt.
- Tagline updated: "Beautiful, computable D4D records" →
  "Standardized, computable D4D records".

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Robot: add D4D red headband + lengthen head to fit

Extended the robot head height 120 → 140 px and shifted it up by 15 px.
Added a red curved headband across the top of the head with bold white
"D4D" lettering.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Robot figure: add small crowd of stick figures per source row

Each Heterogeneous-sources row now has 4 stick figures (a foreground
"main" person plus 3 smaller / faded background figures) so the figure
reads as multiple project members contributing each kind of source
document.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Robot: thinner arms pointing outward + red D4D-agent chest panel

Replaced the chunky 24px arms with thin 12px arms (left/right side, 3
each) explicitly pointing outward at ±45° and 0°. Recolored the chest
"D4D agent" panel to match the red headband (white text on red).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Pipeline: add LinkML validator + feedback loop, shorter source arrows

Insert a LinkML-schema-validation block between the agent and the output
records. A red dashed feedback arrow ("errors → corrections") loops from
the validator back to the agent, showing iterative correction. Source-
to-robot input arrows trimmed by ~70 px so they don't intrude into the
robot's outward-pointing arms.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Pipeline: feedback arrow now points at robot's red headlight

Re-routed the validation feedback path from the bottom of the agent
to the headlight at the top of the antenna, and fixed the red arrow
marker to userSpaceOnUse so the arrowhead is correctly sized (18x18 px)
relative to the headlight instead of being scaled up 5x by stroke
width. Also recolored the headlight from orange to clear red with a
subtle dark-red outline to match the "red light" framing.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Pipeline figure: bigger fonts, restructured arms, larger people, smaller boxes

- Scale all fonts up so every label is at least 22 pt (sources/records
  headers 36, agent label 34, headband 26, project labels 24, etc.).
- Robot arms restructured: LEFT side gets three arms (top points UP,
  middle points LEFT, bottom points DOWN); RIGHT side has a single
  middle arm pointing into the validator. Arms are thin (14 px).
- Source rows: people enlarged (main 80→115, crowd 65→95 / 55→80) and
  cardboard boxes shrunk (220→170 wide, 160→120 tall) so contributors
  read as people, not packages.
- Feedback path tightened to land squarely on the antenna's red
  headlight at (1075,215); removed the redundant "validated" label
  next to the validator (records already show green checkmarks).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Remove poster assets from repo

Strip data/poster_assets/ — figures, thumbnails, QR codes, robot SVG —
that were committed in support of the Bridge2AI April 2026 F2F poster.
The Google Slides deck remains the canonical artefact for poster work;
none of these PNG/SVG/dot files are referenced by the schema or
publication site.

Schema and renderer changes that were motivated by poster work but are
useful in their own right are preserved on this branch (module-level
section_question annotations, the renderer reading them, the SSSOM
exchange-layer expansions, and the alignment → semantic_exchange
rename).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Robot: diagonal upper/lower-left arms, lower errors→corrections label

Restore the D4D Assistant pipeline figure (was removed in previous
clean-up) and tweak two details:

- Left arms now point diagonally away from the body: top arm UP-LEFT
  (rotate +45 around the upper-left shoulder), bottom arm DOWN-LEFT
  (rotate -45 around the lower-left shoulder); middle arm horizontal
  LEFT.
- "errors → corrections" label moved from y=55 down to y=145, hugging
  the apex of the red dashed feedback arc instead of floating high
  above it.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Butterfly chart: 3 long panels (26/25/25) at 22pt row labels

Switch from 6 small panels (2x3, 13 classes each) to 3 long panels
(1x3, ~25 classes each) so the row labels can grow to 22pt while still
showing all 76 d4d-core classes. Figure aspect now ~1.25 (taller)
to fill the white space available below in the exchange-layer panel.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Schema figures: color nodes/edges by D4D module, +1pt line weight

Replace the orange-vs-blue (mapped vs unmapped) coloring with a
per-module color scale on both fig5 (d4d-core) and fig6 (full schema):
Motivation/Composition/Collection/Preprocessing/Uses/Distribution/
Maintenance/Human/Ethics/Data_Governance get distinct hues; Variables
+ Base_import + structural top-level classes use neutral grays.

Composition arrows are colored by the source class's module so each
hub fans out as a module-colored star. All edge weights bumped +1 pt
(2.5 → 3.5 for hub borders, 1.5 → 2.5 for normal borders, 1 → 2 for
composition edges) so the figures read clearly at poster scale.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Schema figures: curved edges + 70% opacity

Switch sfdp from splines=spline (which fell back to straight lines
when nodes touched) to splines=curved (Catmull-Rom, more permissive).
All edges get a B3 alpha suffix (70% opacity) so when an arrow crosses
a node or another arrow it reads through instead of obscuring.
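
The alpha suffix works because Graphviz accepts colors in `#RRGGBBAA` form. The sketch below shows how `B3` corresponds to roughly 70% opacity; the helper names and the sample color are illustrative, not taken from the figure script.

```python
def alpha_fraction(alpha_hex: str) -> float:
    """Convert a two-digit hex alpha channel to an opacity in [0, 1]."""
    return int(alpha_hex, 16) / 255


def with_alpha(rgb_hex: str, alpha_hex: str = "B3") -> str:
    """Append an alpha suffix to a #RRGGBB color, giving #RRGGBBAA."""
    return f"{rgb_hex}{alpha_hex}"


print(round(alpha_fraction("B3"), 3))
print(with_alpha("#1F77B4"))  # arbitrary example color
```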

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Butterfly chart: remove redundant suptitle

Drop the "Bridge2AI Semantic Exchange Layer — slot density across all
76 d4d-core classes ..." suptitle from the figure itself; the panel
title above the figure on the poster already carries that framing.
The freed vertical space lets the bars take up more of the canvas.

* Schema view: emphasize sparse FormatDialect edge so it doesn't drown in hub

The dialect → FormatDialect edge was getting visually lost among the ~50
other edges out of CoreDataset. Render it solid (no alpha), thicker
(4.0pt), and with a bold, larger label.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Butterfly chart: replace panel labels with explicit rank ranges

The previous "top 26 / next 25 / last 25 of 76" wording was confusing:
"top 26" and "next 25" read as standalone counts (51 together) rather
than as slices of the same 76-class ranking. Use explicit
"ranks 1–26 of 76 (most slots)" / "ranks 27–51 of 76" / "ranks 52–76 of
76 (fewest slots)" so the totals are unambiguous.
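
The rank-range labels follow directly from the panel sizes. A minimal sketch, with a hypothetical function name and without the "(most slots)" / "(fewest slots)" suffixes:

```python
def rank_range_labels(sizes, total=None):
    """Build "ranks a–b of N" labels for consecutive panels."""
    total = sum(sizes) if total is None else total
    labels, start = [], 1
    for size in sizes:
        end = start + size - 1
        labels.append(f"ranks {start}–{end} of {total}")
        start = end + 1
    return labels


print(rank_range_labels([26, 25, 25]))
```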

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Remove poster assets from repo

Poster artifacts (PNG/SVG figures) belong outside this repo. Keeping
them on the working tree only — not committed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Re-render curated D4D datasheets and ship CSS in nested path

Re-rendered AI_READI/CM4AI/VOICE curated HTMLs with current renderer
(adds blue-bar styling on description fields, schema-driven section
subtitles).

Add the renderer's stylesheet to the nested docs path so GitHub Pages
serves it co-located with the HTML files. Without this, the relative
<link href="datasheet-common.css"> 404s on the deployed site and the
page renders unstyled.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Address Copilot review on PR #149

1. src/html/human_readable_renderer.py: open the module YAML with
   explicit encoding='utf-8' so non-ASCII section_question text decodes
   consistently across platforms (matches the convention used in
   render_yaml_file).

2. src/html/output/forced-dark.css: remove. It was added for one-off
   poster screenshots and is not referenced anywhere; the renderer/CLI
   doesn't accept a CSS-selection flag, so the file was dead weight.
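
For item 1, the point of an explicit encoding is that open() otherwise uses a platform-dependent default. A minimal, self-contained sketch; the helper name and the sample file are hypothetical and not part of the renderer:

```python
import tempfile
from pathlib import Path


def read_text_utf8(path: Path) -> str:
    """Read a file as UTF-8 regardless of the platform default."""
    with open(path, encoding="utf-8") as handle:
        return handle.read()


with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "module.yaml"
    # Write non-ASCII text, then read it back with the same encoding.
    sample.write_text("label: errors → corrections\n", encoding="utf-8")
    text = read_text_utf8(sample)

print(len(text))
```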

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>