CHANGES Metadata Manager

This repository processes folder structures containing digitized cultural heritage objects and generates metadata and provenance files for each processing stage.

Overview

The project scans a folder hierarchy that follows the pattern Sala*/Folder/Stage/ and, for each stage, generates:

  • meta.ttl: metadata extracted from the knowledge graph for each stage
  • prov.nq: provenance snapshots conforming to the CHAD-AP specification

The input knowledge graph (kg.ttl) is generated by morph-kgc-changes-metadata.

The provenance model is based on the OpenCitations Data Model:

Daquino, Marilena; Massari, Arcangelo; Peroni, Silvio; Shotton, David (2018). The OpenCitations Data Model. figshare. Online resource. https://doi.org/10.6084/m9.figshare.3443876.v8

Installation

Requirements:

  • Python 3.10+
  • uv

Using uv

If uv is not already installed, please follow the installation instructions at https://docs.astral.sh/uv/getting-started/installation/

# Clone the repository
git clone https://github.com/dharc-org/changes-metadata-manager.git
cd changes-metadata-manager

# Install dependencies with uv
uv sync

Usage

Building folder metadata

The folder_metadata_builder.py script processes a folder structure and generates metadata and provenance files for each stage.

Prerequisites:

  • A folder structure with the format <root>/Sala*/Folder/Stage/
  • A knowledge graph in Turtle format (data/kg.ttl)

Command:

uv run python -m changes_metadata_manager.folder_metadata_builder <root_directory>

The script scans the folder structure and, for each stage, generates:

  • meta.ttl: Metadata extracted from the knowledge graph
  • prov.nq: Provenance snapshots for the metadata

Supported stages: raw, rawp, dcho, dchoo.
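
After a run, it can be useful to confirm that every stage actually received both files. The following is a minimal sketch, not part of this repository: it assumes that stage directories are named exactly like the supported stages listed above and that meta.ttl and prov.nq are written directly inside each stage directory. The function name missing_outputs is hypothetical.

```python
from pathlib import Path

# File names and stage names taken from this README.
EXPECTED_OUTPUTS = ("meta.ttl", "prov.nq")
SUPPORTED_STAGES = {"raw", "rawp", "dcho", "dchoo"}


def missing_outputs(root: Path) -> dict[Path, list[str]]:
    """Map each stage directory under root to the output files it lacks.

    Assumption: outputs live directly inside <root>/Sala*/<folder>/<stage>/.
    Stage directories that contain both files are left out of the result.
    """
    missing: dict[Path, list[str]] = {}
    for stage_dir in sorted(root.glob("Sala*/*/*")):
        # Skip plain files and directories that are not a supported stage.
        if not stage_dir.is_dir() or stage_dir.name not in SUPPORTED_STAGES:
            continue
        absent = [name for name in EXPECTED_OUTPUTS if not (stage_dir / name).is_file()]
        if absent:
            missing[stage_dir] = absent
    return missing
```

Calling missing_outputs(Path("<root_directory>")) returns an empty dictionary when every supported stage directory contains both files.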

Development

SharePoint sync

When the local folder structure is not available, you can sync files from SharePoint using piccione.

Create a YAML configuration file:

site_url: https://liveunibo.sharepoint.com/sites/PE5-Spoke4-CaseStudyAldrovandi
fedauth: <FedAuth_cookie_value>
rtfa: <rtFa_cookie_value>
folders:
  - /Shared Documents/Sala1
  - /Shared Documents/Sala2

Cookie values can be extracted from browser developer tools after authenticating to SharePoint.
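
Because the cookies expire and should not be committed, one option is to generate the configuration file from environment variables. The sketch below uses only the Python standard library and mirrors the keys shown above. The function names and the environment variable names SHAREPOINT_FEDAUTH and SHAREPOINT_RTFA are assumptions made for illustration; piccione does not define them.

```python
import json
import os
from pathlib import Path


def render_config(site_url: str, fedauth: str, rtfa: str, folders: list[str]) -> str:
    """Render the YAML configuration as text.

    Every value is quoted with json.dumps: a JSON string is also a valid
    YAML double-quoted scalar, so cookie values containing characters such
    as ':' or '#' do not break the file.
    """
    lines = [
        f"site_url: {json.dumps(site_url)}",
        f"fedauth: {json.dumps(fedauth)}",
        f"rtfa: {json.dumps(rtfa)}",
        "folders:",
    ]
    lines += [f"  - {json.dumps(folder)}" for folder in folders]
    return "\n".join(lines) + "\n"


def write_config_from_env(path: Path) -> None:
    """Write the configuration file, reading cookies from the environment."""
    text = render_config(
        site_url="https://liveunibo.sharepoint.com/sites/PE5-Spoke4-CaseStudyAldrovandi",
        fedauth=os.environ["SHAREPOINT_FEDAUTH"],  # hypothetical variable name
        rtfa=os.environ["SHAREPOINT_RTFA"],  # hypothetical variable name
        folders=["/Shared Documents/Sala1", "/Shared Documents/Sala2"],
    )
    path.write_text(text, encoding="utf-8")
```

Running write_config_from_env(Path("config.yaml")) with both variables set produces a file equivalent to the example above, with every value quoted.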

Sync structure only (no file download):

uv run python -m piccione.download.from_sharepoint config.yaml /output/dir --structure-only

Sync structure and files:

uv run python -m piccione.download.from_sharepoint config.yaml /output/dir

Then run the metadata builder with the --structure flag:

uv run python -m changes_metadata_manager.folder_metadata_builder <root_directory> --structure /output/dir/structure.json

With --structure, the script reads the folder hierarchy from the JSON file instead of scanning the filesystem.
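
When the builder is called from another Python script, the command line can be assembled programmatically. This is a minimal sketch that only uses the module path and the --structure flag documented in this README; the helper names build_command and run_builder are hypothetical and not part of the package.

```python
import subprocess
from pathlib import Path
from typing import Optional


def build_command(root: Path, structure: Optional[Path] = None) -> list[str]:
    """Return the argument list for the metadata builder CLI."""
    command = [
        "uv", "run", "python", "-m",
        "changes_metadata_manager.folder_metadata_builder",
        str(root),
    ]
    # Only pass --structure when a structure file was supplied.
    if structure is not None:
        command += ["--structure", str(structure)]
    return command


def run_builder(root: Path, structure: Optional[Path] = None) -> None:
    """Run the builder, failing early if the structure file is missing."""
    if structure is not None and not structure.is_file():
        raise FileNotFoundError(f"structure file not found: {structure}")
    # check=True raises CalledProcessError on a non-zero exit code.
    subprocess.run(build_command(root, structure), check=True)
```

For example, run_builder(Path("<root_directory>"), Path("/output/dir/structure.json")) is equivalent to the command shown above.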

Running tests

uv run pytest
