This repository processes folder structures containing digitized cultural heritage objects and generates metadata and provenance files for each processing stage.
The project scans a folder hierarchy following the pattern Sala*/Folder/Stage/ and generates:
- meta.ttl: metadata extracted from the knowledge graph for each stage
- prov.nq: provenance snapshots conforming to the CHAD-AP specification
The input knowledge graph (kg.ttl) is generated by morph-kgc-changes-metadata.
The provenance model is based on the OpenCitations Data Model:
Daquino, Marilena; Massari, Arcangelo; Peroni, Silvio; Shotton, David (2018). The OpenCitations Data Model. figshare. Online resource. https://doi.org/10.6084/m9.figshare.3443876.v8
Requirements:
- Python 3.10+
- uv
If uv is not already installed, please follow the installation instructions at https://docs.astral.sh/uv/getting-started/installation/
# Clone the repository
git clone https://github.com/dharc-org/changes-metadata-manager.git
cd changes-metadata-manager
# Install dependencies with uv
uv sync
The folder_metadata_builder.py script processes a folder structure and generates metadata and provenance files for each stage.
Prerequisites:
- A folder structure with the format <root>/Sala*/Folder/Stage/
- Knowledge graph in Turtle format (data/kg.ttl)
Usage:
uv run python -m changes_metadata_manager.folder_metadata_builder <root_directory>
The script scans the folder structure and generates for each stage:
- meta.ttl: metadata extracted from the knowledge graph
- prov.nq: provenance snapshots for the metadata
Supported stages: raw, rawp, dcho, dchoo.
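The layout described above can be checked before or after a run with a few lines of Python. The following is only a sketch, not the builder's own code: the helper names `find_stage_dirs` and `missing_outputs` are hypothetical, and it assumes that stage folders are named exactly as listed (lowercase) and that meta.ttl and prov.nq are written directly inside each stage folder.

```python
from pathlib import Path

# Stage names listed as supported in this README (assumed to match exactly).
SUPPORTED_STAGES = {"raw", "rawp", "dcho", "dchoo"}
# Output files assumed to sit directly inside each stage folder.
EXPECTED_OUTPUTS = ("meta.ttl", "prov.nq")


def find_stage_dirs(root: Path) -> list[Path]:
    """Return stage directories matching <root>/Sala*/<Folder>/<Stage>/."""
    stage_dirs = [
        candidate
        for candidate in root.glob("Sala*/*/*")
        if candidate.is_dir() and candidate.name in SUPPORTED_STAGES
    ]
    return sorted(stage_dirs)


def missing_outputs(root: Path) -> dict[Path, list[str]]:
    """Map each stage directory to the expected output files it lacks."""
    report = {}
    for stage_dir in find_stage_dirs(root):
        missing = [
            name for name in EXPECTED_OUTPUTS if not (stage_dir / name).is_file()
        ]
        if missing:
            report[stage_dir] = missing
    return report
```

Calling `missing_outputs(Path("<root_directory>"))` after running the builder returns an empty dictionary when every recognised stage folder contains both files.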
When the local folder structure is not available, you can sync files from SharePoint using piccione.
Create a YAML configuration file:
site_url: https://liveunibo.sharepoint.com/sites/PE5-Spoke4-CaseStudyAldrovandi
fedauth: <FedAuth_cookie_value>
rtfa: <rtFa_cookie_value>
folders:
- /Shared Documents/Sala1
- /Shared Documents/Sala2
Cookie values can be extracted from browser developer tools after authenticating to SharePoint.
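Because the cookie values are pasted in by hand, it can help to sanity-check the configuration before starting a sync. The sketch below is not part of piccione: `check_sharepoint_config` is a hypothetical helper that takes the configuration as an already-parsed dictionary and assumes the four keys shown above are all required.

```python
# Keys that appear in the example configuration (assumed to be mandatory).
REQUIRED_KEYS = ("site_url", "fedauth", "rtfa", "folders")


def check_sharepoint_config(config: dict) -> list[str]:
    """Return a list of problems found; an empty list means none were detected."""
    problems = [f"missing key: {key}" for key in REQUIRED_KEYS if key not in config]

    # Flag cookie fields that still hold a <...> template placeholder.
    for key in ("fedauth", "rtfa"):
        value = str(config.get(key, ""))
        if value.startswith("<") and value.endswith(">"):
            problems.append(f"{key} still contains a placeholder")

    # The folders entry should be a non-empty list of SharePoint paths.
    if "folders" in config:
        folders = config["folders"]
        if not isinstance(folders, list) or not folders:
            problems.append("folders must be a non-empty list")

    return problems
```

The dictionary could be obtained with, for example, `yaml.safe_load` from PyYAML, assuming that package is available in the environment.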
Sync structure only (no file download):
uv run python -m piccione.download.from_sharepoint config.yaml /output/dir --structure-only
Sync structure and files:
uv run python -m piccione.download.from_sharepoint config.yaml /output/dir
Then run the metadata builder with the --structure flag:
uv run python -m changes_metadata_manager.folder_metadata_builder <root_directory> --structure /output/dir/structure.json
When using --structure, the script uses the JSON file to determine the folder hierarchy instead of scanning the filesystem.
Run the tests:
uv run pytest