Documentation #50

Merged
merged 39 commits into from
Jul 22, 2025
Changes from 28 commits
077437d
schemapack documentation
sbilge Jun 12, 2025
b4d597f
schemapack spec documentation
sbilge Jun 16, 2025
1c5648d
docs folder added
sbilge Jun 16, 2025
ec8192d
formatting
sbilge Jun 16, 2025
bc7e309
quick start intro
sbilge Jun 17, 2025
ba32e71
intros added to specs
sbilge Jun 17, 2025
5cf524d
headers are added
sbilge Jun 17, 2025
252fff2
Merge remote-tracking branch 'origin' into documentation
sbilge Jun 17, 2025
8464f41
description update in toml
sbilge Jul 10, 2025
5a39056
Update docs/datapack_spec.md
sbilge Jul 10, 2025
f714923
broken link fix
sbilge Jul 10, 2025
6bbabac
separator line removed
sbilge Jul 10, 2025
2653ef8
Update docs/datapack_spec.md
sbilge Jul 10, 2025
2508282
embedding installation and quickstart
sbilge Jul 10, 2025
5f31087
rooted example fix
sbilge Jul 10, 2025
38d5d55
table of possible target resource types
sbilge Jul 10, 2025
2b1eff2
template update
sbilge Jul 10, 2025
0eb5f52
readme link correction
sbilge Jul 10, 2025
8cda131
delete misplaced description
sbilge Jul 10, 2025
d850d40
cardinality description fix
sbilge Jul 10, 2025
ba48552
updated description
sbilge Jul 10, 2025
c971b7d
readme template modification
sbilge Jul 10, 2025
dd9e3a4
rooting warning
sbilge Jul 10, 2025
9497460
side by side rooted example
sbilge Jul 10, 2025
3d543e1
'condense' clarification
sbilge Jul 11, 2025
647df5e
schemapack root definition
sbilge Jul 11, 2025
2827e93
data isolation added
sbilge Jul 11, 2025
8fb404f
Merge branch 'main' into documentation
sbilge Jul 11, 2025
fb2aeef
embedded install section, auto generated --help text
sbilge Jul 16, 2025
e528f98
inline css styles removed
sbilge Jul 16, 2025
2d8f3a0
example fix
sbilge Jul 16, 2025
82a6d0a
Update docs/datapack_spec.md
sbilge Jul 16, 2025
60bea90
Update docs/datapack_spec.md
sbilge Jul 16, 2025
29706a8
Update docs/datapack_spec.md
sbilge Jul 16, 2025
6efc30b
development section embedded
sbilge Jul 16, 2025
cb9bf91
minor
sbilge Jul 16, 2025
db652dd
updated links in the template
sbilge Jul 21, 2025
fbee93d
description plural
sbilge Jul 21, 2025
98fe52e
Merge branch 'main' into documentation
sbilge Jul 21, 2025
45 changes: 45 additions & 0 deletions .readme_generation/README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,45 @@
<!--
Copyright 2021 - 2025 Universität Tübingen, DKFZ, EMBL, and Universität zu Köln
for the German Human Genome-Phenome Archive (GHGA)

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.

-->

# Readme Generation

The repository README is generated by collecting information from the sources
outlined below.

- name: The full name of the package is derived from the remote origin Git repository.
- title: A title case representation of the name.
- shortname: An abbreviation of the full name. This is derived from the name mentioned
in the [`../pyproject.toml`](../pyproject.toml).
- summary: A short 1-2 sentence summary derived from the description in the
[`../pyproject.toml`](../pyproject.toml).
- version: The package version derived from the version specified in the
[`../pyproject.toml`](../pyproject.toml).
- description: A markdown-formatted description of the features and use cases of this
service or package. Obtained from the [`./description.md`](./description.md).
- design_description: A markdown-formatted description of the overall architecture and
design of the package. Obtained from the [`./design.md`](./design.md).
- openapi_doc: A markdown-formatted description of the HTTP API. This is autogenerated
and links to the [`../openapi.yaml`](../openapi.yaml). If the openapi.yaml is not
present, this documentation is empty.

The [`./readme_template.md`](./readme_template.md) serves as a template where the
above variables can be filled in using Python's `string.Template` utility from the
standard library.
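
As a brief illustration of the substitution step, the sketch below fills a shortened template with `string.Template`. The template text and the values are made up for the example; the real template and the full set of variables live in the files described above.

```python
from string import Template

# Shortened stand-in for readme_template.md (illustrative only).
template = Template("# $title\n\n$summary\n\nInstall with `pip install $shortname`.\n")

values = {
    "title": "Schemapack",
    "summary": "Make your JSON Schemas sociable and create linked data models.",
    "shortname": "schemapack",
}

# substitute() raises KeyError if a placeholder has no value;
# safe_substitute() would leave unknown placeholders untouched instead.
readme = template.substitute(values)
print(readme)
```

Using `substitute()` rather than `safe_substitute()` is a deliberate choice in this sketch: a missing variable fails loudly instead of leaving a literal `$placeholder` in the generated file.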

The [`../scripts/update_readme.py`](../scripts/update_readme.py) script can be used to
collect all information and fill it into the template to generate the README file.
3 changes: 3 additions & 0 deletions .readme_generation/description.md
@@ -0,0 +1,3 @@
<!-- Please provide a short overview of the features of this service. -->

Schemapack is a library that defines a lightweight data modeling framework based on a schema description, a compatible data instance format, and the tooling that supports them. It introduces two main components: the schemapack, which describes linked data structures, and the datapack, which represents the data conforming to those structures. The tooling around `schemapack` and `datapack` focuses on loading, extraction, and validation, and supports partial extraction and data embedding operations. The Schemapack library includes a CLI component that provides access to core functionality via the command line.
3 changes: 3 additions & 0 deletions .readme_generation/design.md
@@ -0,0 +1,3 @@
<!-- Please provide an overview of the architecture and design of the code base.
Mention anything that deviates from the standard Triple Hexagonal Architecture and
the corresponding structure. -->
58 changes: 58 additions & 0 deletions .readme_generation/installation_usage.md
@@ -0,0 +1,58 @@

This package is available at PyPI:
https://pypi.org/project/schemapack

Install:
```bash
pip install $shortname
```

Upgrade:
```bash
pip install --upgrade $shortname
```



## Usage:

To view the help message:

```bash
$shortname --help
```

```
Usage: schemapack [OPTIONS] COMMAND [ARGS]...

Common arguments and options.

Options
  --version             Show the version of the library and exit.
  --install-completion  Install completion for the current shell.
  --show-completion     Show completion for the current shell, to copy it or
                        customize the installation.
  --help                Show this message and exit.

Commands
  validate              Validate a datapack against a schemapack.
  check-schemapack      Check if the provided JSON/YAML document complies with
                        the schemapack specs.
  check-datapack        Check if the provided JSON/YAML document complies with
                        the datapack specs.
  condense-schemapack   Writes a version of the provided schemapack with embedded
                        content schemas to stdout.
  isolate-resource      Isolate a resource from the given datapack and write a datapack
                        that is rooted to this resource to stdout.
  isolate-class         Isolate a class from the given schemapack and write a condensed
                        (with content schemas being embedded) schemapack that is
                        rooted to this class to stdout.
  export-mermaid        Generate an entity relationship diagram based on the mermaid
                        markup from the provided schemapack.
```
22 changes: 22 additions & 0 deletions .readme_generation/quickstart.md
@@ -0,0 +1,22 @@

This example shows how to validate a `datapack.yaml` file against a `schemapack.yaml` using the `schemapack` Python library. The `schemapack` defines the schema and validation rules, while the `datapack` contains the actual data to be validated. The steps below demonstrate how to load both files and run validation with `SchemaPackValidator`.

```python

from pathlib import Path

from schemapack import SchemaPackValidator, load_datapack, load_schemapack

schemapack_path = Path("path/to/schemapack.yaml")
datapack_path = Path("path/to/datapack.yaml")

# load schemapack
schemapack = load_schemapack(schemapack_path)

# load datapack
datapack = load_datapack(datapack_path)

# validate datapack against schemapack
validator = SchemaPackValidator(schemapack=schemapack)
validator.validate(datapack=datapack)
```
39 changes: 39 additions & 0 deletions .readme_generation/readme_template.md
@@ -0,0 +1,39 @@
[![tests](https://github.com/ghga-de/$repo_name/actions/workflows/tests.yaml/badge.svg)](https://github.com/ghga-de/$repo_name/actions/workflows/tests.yaml)
[![Coverage Status](https://coveralls.io/repos/github/ghga-de/$repo_name/badge.svg?branch=main)](https://coveralls.io/github/ghga-de/$repo_name?branch=main)
[![PyPI version shields.io](https://img.shields.io/pypi/v/$repo_name.svg)](https://pypi.python.org/pypi/$repo_name/)
[![PyPI pyversions](https://img.shields.io/pypi/pyversions/$repo_name.svg)](https://pypi.python.org/pypi/$repo_name/)

# $title

$summary

## Description

$description

## Installation

$installation

## Quick Start

$quick_start


## Documentation

- [SchemaPack specification](./docs/schemapack_spec.md)
- [DataPack specification](./docs/datapack_spec.md)
- [Data isolation](./docs/data_isolation.md)
- [Development guide](./docs/development.md)


## License

This repository is free to use and modify according to the
[Apache 2.0 License](./LICENSE).

## README Generation

This README file is auto-generated; please see [`readme_generation`](.readme_generation/README.md)
for details.
27 changes: 27 additions & 0 deletions .readme_generation/template_overview.md
@@ -0,0 +1,27 @@
# Microservice Repository Template

This is a template for GitHub repositories containing one Python-based microservice (optimal for a multirepository setup).

It features:

- *Continuous Templation* - A continuous update-delivery mechanism for templated repositories
- A [devcontainer](https://containers.dev/)-based fully-configured development environment for vscode
- Tight linting and formatting using [Ruff](https://docs.astral.sh/ruff/)
- Static type checking using [mypy](https://www.mypy-lang.org/)
- Security scanning using [bandit](https://bandit.readthedocs.io/en/latest/)
- A structure for automated tests using [pytest](https://docs.pytest.org/en/7.4.x/)
- Dependency locking using [pip-tools](https://github.com/jazzband/pip-tools)
- Git hooks checking linting and formatting before committing using [pre-commit](https://pre-commit.com/)
- Automatic container-building and publishing to [Docker Hub](https://hub.docker.com/)
- GitHub Actions for automating or checking all of the above

It is worth emphasizing the first point: this template is not just a one-time kickstart for your project.
Repositories created using this template will continue receiving updates as the template evolves.
For further details, please look at the explanation in [.template/README.md](/.template/README.md).

Please also refer to [.readme_generation/README.md](/.readme_generation/README.md) for details on how
to adapt this readme.

Here the intro to the template stops and the actual template for the readme of the microservice starts:

---
2 changes: 2 additions & 0 deletions .template/deprecated_files_ignore.txt
@@ -1,2 +1,4 @@
# Optional list of files which are actually deprecated in the template
# but are still allowed to be used in the current repository

docs
130 changes: 105 additions & 25 deletions README.md
@@ -1,42 +1,122 @@
![tests](https://github.com/ghga-de/schemapack/actions/workflows/tests.yaml/badge.svg)
[![tests](https://github.com/ghga-de/schemapack/actions/workflows/tests.yaml/badge.svg)](https://github.com/ghga-de/schemapack/actions/workflows/tests.yaml)
[![Coverage Status](https://coveralls.io/repos/github/ghga-de/schemapack/badge.svg?branch=main)](https://coveralls.io/github/ghga-de/schemapack?branch=main)
[![PyPI version shields.io](https://img.shields.io/pypi/v/schemapack.svg)](https://pypi.python.org/pypi/schemapack/)
[![PyPI pyversions](https://img.shields.io/pypi/pyversions/schemapack.svg)](https://pypi.python.org/pypi/schemapack/)
[![Coverage Status](https://coveralls.io/repos/github/ghga-de/schemapack/badge.svg?branch=main)](https://coveralls.io/github/ghga-de/schemapack?branch=main)

# schemapack
# Schemapack

Make your JSON Schemas sociable and create linked data models.

## Description

<!-- Please provide a short overview of the features of this service. -->

Schemapack is a library that defines a lightweight data modeling framework based on a schema description, a compatible data instance format, and the tooling that supports them. It introduces two main components: the schemapack, which describes linked data structures, and the datapack, which represents the data conforming to those structures. The tooling around `schemapack` and `datapack` focuses on loading, extraction, and validation, and supports partial extraction and data embedding operations. The Schemapack library includes a CLI component that provides access to core functionality via the command line.

Make your JSON Schemas sociable and create linked data model.

## Installation


This package is available at PyPI:
https://pypi.org/project/schemapack

You can install it from there using:
Install:
```bash
pip install schemapack
```

Upgrade:
```bash
pip install --upgrade schemapack
```



## Usage:

To view the help message:

```bash
schemapack --help
```

```
pip install schemapack
Usage: schemapack [OPTIONS] COMMAND [ARGS]...

Common arguments and options.

Options
  --version             Show the version of the library and exit.
  --install-completion  Install completion for the current shell.
  --show-completion     Show completion for the current shell, to copy it or
                        customize the installation.
  --help                Show this message and exit.

Commands
  validate              Validate a datapack against a schemapack.
  check-schemapack      Check if the provided JSON/YAML document complies with
                        the schemapack specs.
  check-datapack        Check if the provided JSON/YAML document complies with
                        the datapack specs.
  condense-schemapack   Writes a version of the provided schemapack with embedded
                        content schemas to stdout.
  isolate-resource      Isolate a resource from the given datapack and write a datapack
                        that is rooted to this resource to stdout.
  isolate-class         Isolate a class from the given schemapack and write a condensed
                        (with content schemas being embedded) schemapack that is
                        rooted to this class to stdout.
  export-mermaid        Generate an entity relationship diagram based on the mermaid
                        markup from the provided schemapack.
```


## Quick Start


This example shows how to validate a `datapack.yaml` file against a `schemapack.yaml` using the `schemapack` Python library. The `schemapack` defines the schema and validation rules, while the `datapack` contains the actual data to be validated. The steps below demonstrate how to load both files and run validation with `SchemaPackValidator`.

```python

from pathlib import Path

from schemapack import SchemaPackValidator, load_datapack, load_schemapack

schemapack_path = Path("path/to/schemapack.yaml")
datapack_path = Path("path/to/datapack.yaml")

# load schemapack
schemapack = load_schemapack(schemapack_path)

# load datapack
datapack = load_datapack(datapack_path)

# validate datapack against schemapack
validator = SchemaPackValidator(schemapack=schemapack)
validator.validate(datapack=datapack)
```

## Development
For setting up the development environment, we rely on the
[devcontainer feature](https://code.visualstudio.com/docs/remote/containers) of vscode.

To use it, you have to have Docker as well as vscode with its "Remote - Containers"
extension (`ms-vscode-remote.remote-containers`) installed.
Then, you just have to open this repo in vscode and run the command
`Remote-Containers: Reopen in Container` from the vscode "Command Palette".

This will give you a full-fledged, pre-configured development environment including:
- infrastructural dependencies of the service (databases, etc.)
- all relevant vscode extensions pre-installed
- pre-configured linting and auto-formatting
- a pre-configured debugger
- automatic license-header insertion
## Documentation

- [SchemaPack specification](./docs/schemapack_spec.md)
- [DataPack specification](./docs/datapack_spec.md)
- [Data isolation](./docs/data_isolation.md)
- [Development guide](./docs/development.md)

Moreover, inside the devcontainer, the following convenience command is available
(please type it in the integrated terminal of vscode):
- `dev_install` - install the lib with all development dependencies and pre-commit hooks
(please run that if you are starting the devcontainer for the first time
or if you added any Python dependencies to the [`./setup.cfg`](./setup.cfg))

## License
This repository is free to use and modify according to the [Apache 2.0 License](./LICENSE).

This repository is free to use and modify according to the
[Apache 2.0 License](./LICENSE).

## README Generation

This README file is auto-generated; please see [`readme_generation`](.readme_generation/README.md)
for details.