XProc 3.0 Libraries for Digital Books Download and Enrichment
This software downloads all available (or selected) data and metadata, such as MODS, FOXML, DC, ALTO, and images, from digital libraries based on the Kramerius system and stores them on the local system.
This software also uses services provided by the LINDAT/CLARIAH-CZ Research Infrastructure (https://lindat.cz), supported by the Ministry of Education, Youth and Sports of the Czech Republic (Project No. LM2023062).
Libri augmentati enriches textual data with morphosyntactic annotation from the UDPipe service and named entity recognition from the NameTag service, both developed and operated by the aforementioned large research infrastructure.
Enriched textual data can be converted to TEI P5 format, both at the page level and the whole document level.
At this time, Libri augmentati only supports the digital libraries provided by the Moravian Library in Brno (Kramerius version 7) and the National Library (Kramerius version 5). Further libraries can be added to the libraries.xml settings file. See the registry of all running instances of the Kramerius system.
- Install Java JDK 11
- Extract the content of the MorganaXProc-IIIse-1.4.5.zip file
- Extract the content of the SaxonHE12-3J.zip file
  - copy the extracted `saxon-he-12.3.jar` and `saxon-he-xqj-12.3.jar` files to the `MorganaXProc-IIIse_lib` folder
- On your operating system, set the PATH environment variable to point to the location of the `MorganaXProc-IIIse` folder
  - for example on Windows:
    - (run command line as administrator): `setx /m PATH "%PATH%;C:\Programs\MorganaXProc-IIIse"`
    - (run command line as a regular user): `setx PATH "%PATH%;C:\Programs\MorganaXProc-IIIse"`
Clone the repository using Git (`git clone https://github.com/moravianlibrary/libri-augmentati.git`) or download a zipped version of the repository.
Go to the run folder of this project and run one of the following commands:
- sample-book.cmd uses the sample-book.xpl pipeline for downloading and processing a digital book from the Moravian Library in Brno.
- in the client.cmd file you can set the parameters of the book to be processed; this batch file uses the client.xpl pipeline
- in the client-mzk.cmd file, change the `-option:document-id` and `-option:nickname` arguments to download (meta)data from the Moravian Library in Brno, enrich them and convert them to TEI
- in the client-nkp.cmd file, change the `-option:document-id` and `-option:nickname` arguments to download (meta)data from the National Library, enrich them and convert them to TEI
- test-samples.cmd uses the test-samples.xpl pipeline for downloading and processing all books used as samples in the libraries.xml settings file
After the data processing, a report (report.html) is generated in the selected folder from which you can access all the data locally or online in the original digital library.
When viewing a title in a digital library, you can find the document-id of the publication in the URL. For example, the sample book is accessible via https://www.digitalniknihovna.cz/mzk/view/uuid:de87a0e0-643b-11ea-a744-005056827e51?page=uuid:73648039-1b21-4921-9b08-13161ae6d239. For the document-id use the first unique identifier, UUID, after /view/, in this case uuid:de87a0e0-643b-11ea-a744-005056827e51.
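The rule described above (take the first UUID after /view/) can be sketched as a small shell function. This is only an illustration: the function name `extract_document_id` is hypothetical and not part of Libri augmentati, and it assumes the URL follows the viewer pattern shown in the example.

```shell
#!/bin/sh
# Hypothetical helper: print the document-id found after /view/ in a viewer URL.
extract_document_id() {
  rest="${1#*/view/}"              # drop everything up to and including /view/
  [ "$rest" != "$1" ] || return 1  # fail if the URL has no /view/ segment
  rest="${rest%%\?*}"              # drop the query string (?page=...)
  rest="${rest%%/*}"               # drop any trailing path segment
  printf '%s\n' "$rest"
}

# prints uuid:de87a0e0-643b-11ea-a744-005056827e51
extract_document_id 'https://www.digitalniknihovna.cz/mzk/view/uuid:de87a0e0-643b-11ea-a744-005056827e51?page=uuid:73648039-1b21-4921-9b08-13161ae6d239'
```

The printed value can be used as the `-option:document-id` argument.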
Or you can go through the sharing dialog, where you have to select the book to share.
- install Docker Desktop
- create the Libri augmentati container with the following command: `docker run -it --rm --volume D:\TEI-2025\Libri-augmentati:/data daliboris/libri-augmentati:latest`
  - `-it` = interactive mode
  - `--rm` = remove the Docker container after exit
  - `D:\TEI-2025\Libri-augmentati` = path on your computer where you want to store the enriched data
  - `/data` = path in the Docker container where the enriched data will be stored (can't be changed)
  - `daliboris/libri-augmentati:latest` = identifier of the Docker image with the Libri augmentati application
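The command above uses a Windows host path. As a sketch for Linux or macOS, the same flags can be combined with a different host directory. The function name `docker_run_command` and the directory `$HOME/libri-augmentati-data` are assumptions chosen for illustration; only the flags, the `/data` mount point and the image name come from the command above. The function only prints the command so that it can be checked before running it.

```shell
#!/bin/sh
# Hypothetical host directory for the enriched data; replace with your own path.
host_dir="${HOME}/libri-augmentati-data"
image="daliboris/libri-augmentati:latest"

# Print the docker command for a given host directory and image.
docker_run_command() {
  printf 'docker run -it --rm --volume %s:/data %s\n' "$1" "$2"
}

docker_run_command "$host_dir" "$image"
```

Host paths containing spaces would need to be quoted when the printed command is pasted into a terminal.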
Use prompt in the Docker container to run the pipeline, for example:
Morgana.sh -config=/config/config.xml /la/run/client.xpl -option:library-code=mzk -option:api-version=7 -option:output-directory=/data/tei-conference -option:document-resources="MODS DC FOXML" -option:page-resources="ALTO TEXT FOXML DC MODS IMAGE" -option:document-id=uuid:de87a0e0-643b-11ea-a744-005056827e51 -option:nickname=postural-defects -indent-errors
| parameter | explanation |
|---|---|
| Morgana.sh | main script that runs XProc pipeline |
| -config=/config/config.xml | base configuration used by MorganaXProc-IIIse engine |
| /la/run/client.xpl | main pipeline for request processing |
| -option:library-code=mzk | abbreviation of digital library where publications are stored |
| -option:api-version=7 | version of Kramerius (API) used by digital library |
| -option:output-directory=/data/tei-conference | where final data and reports are stored (should be in /data/ directory) |
| -option:document-resources="MODS DC FOXML" | which metadata for the entire document should be downloaded |
| -option:page-resources="ALTO TEXT FOXML DC MODS IMAGE" | which metadata for individual pages should be downloaded |
| -option:document-id=uuid:de87a0e0-643b-11ea-a744-005056827e51 | identifier of the digital book to be downloaded |
| -option:nickname=postural-defects | nickname for the book (a report and a directory with this name will be created) |
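The parameters in the table can be collected in a small wrapper so that only the values that change between books have to be typed. The following is a minimal sketch, not part of the project: `run_client` and the `RUN` variable are hypothetical names, and the fixed options are copied from the example command. By default the function only prints the arguments (a dry run); set `RUN=1` inside the container to execute Morgana.sh.

```shell
#!/bin/sh
# Hypothetical wrapper around the client.xpl call shown above.
run_client() {
  library_code="$1"   # abbreviation of the digital library, e.g. mzk
  api_version="$2"    # Kramerius API version, e.g. 7
  document_id="$3"    # UUID of the digital book
  nickname="$4"       # name used for the report and the output directory

  # Rebuild the positional parameters as the full command line.
  set -- Morgana.sh -config=/config/config.xml /la/run/client.xpl \
    "-option:library-code=${library_code}" \
    "-option:api-version=${api_version}" \
    "-option:output-directory=/data/tei-conference" \
    "-option:document-resources=MODS DC FOXML" \
    "-option:page-resources=ALTO TEXT FOXML DC MODS IMAGE" \
    "-option:document-id=${document_id}" \
    "-option:nickname=${nickname}" \
    -indent-errors

  if [ "${RUN:-0}" = 1 ]; then
    "$@"                  # execute the pipeline
  else
    printf '%s\n' "$@"    # dry run: print one argument per line
  fi
}

run_client mzk 7 uuid:de87a0e0-643b-11ea-a744-005056827e51 postural-defects
```

Keeping each option as a single quoted argument preserves the spaces inside the resource lists, which is what the double quotes in the original command achieve.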
Detailed documentation of the code is in the doc directory in XML, HTML and Markdown formats.
The software was funded by the Institutional support for long-term conceptual development of a research organization (The Moravian Library) by the Czech Ministry of Culture.

