Libri augmentati

XProc 3.0 Libraries for Digital Books Download and Enrichment

This software downloads all available (or selected) data and metadata, such as MODS, FOXML, DC, ALTO, and images, from digital libraries based on the Kramerius system and stores them on the local system.

This software also uses services provided by the LINDAT/CLARIAH-CZ Research Infrastructure (https://lindat.cz "LINDAT/CLARIAH-CZ Research Infrastructure"), supported by the Ministry of Education, Youth and Sports of the Czech Republic (Project No. LM2023062).

Libri augmentati enriches textual data with morphosyntactic annotation from the UDPipe service and named entity recognition from the NameTag service, both developed and operated by the LINDAT/CLARIAH-CZ research infrastructure.

Enriched textual data can be converted to TEI P5 format, both at the page level and the whole document level.

At this time, Libri augmentati supports only the digital libraries provided by the Moravian Library in Brno (Kramerius version 7) and the National Library (Kramerius version 5). Further libraries can be added to the libraries.xml settings file. See the registry of all running instances of the Kramerius system.

Prerequisites

Setting up the environment

  • Install Java JDK 11.
  • Extract the content of the MorganaXProc-IIIse-1.4.5.zip file.
  • Extract the content of the SaxonHE12-3J.zip file.
    • Copy the extracted saxon-he-12.3.jar and saxon-he-xqj-12.3.jar files to the MorganaXProc-IIIse_lib folder.
  • On your operating system, set the PATH environment variable to point to the location of the MorganaXProc-IIIse folder.
  • For example, on Windows:
    • (run the command line as administrator): setx /m PATH "%PATH%;C:\Programs\MorganaXProc-IIIse"
    • (run the command line as a regular user): setx PATH "%PATH%;C:\Programs\MorganaXProc-IIIse"
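
To check that the PATH setup worked, a small script can look up the required executables. This is only a sketch: the launcher names Morgana.bat and Morgana.sh are assumptions about the MorganaXProc-IIIse distribution, and find_tools is a hypothetical helper, not part of this project.

```python
import shutil


def find_tools(names=("java", "Morgana.bat", "Morgana.sh")):
    """Map each executable name to its resolved path, or None if it is not on PATH.

    The default names are assumptions; adjust them to your installation.
    """
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    for name, path in find_tools().items():
        print(f"{name}: {path if path else 'not found on PATH'}")
```

On Windows only the .bat launcher is expected to resolve, and on Linux or macOS only the .sh one, so a single "not found" line for the other launcher is not a problem.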

How to use it

Clone the repository using Git

git clone https://github.com/moravianlibrary/libri-augmentati.git

or download a zipped version of the repository.

Go to the run folder of this project and run one of the following commands:

  • sample-book.cmd uses the sample-book.xpl pipeline to download and process a digital book from the Moravian Library in Brno.
  • client.cmd uses the client.xpl pipeline; in this batch file you can set the parameters of the book to be processed.
  • client-mzk.cmd downloads (meta)data from the Moravian Library in Brno, enriches them and converts them to TEI; change the -option:document-id and -option:nickname arguments in the file first.
  • client-nkp.cmd downloads (meta)data from the National Library, enriches them and converts them to TEI; change the -option:document-id and -option:nickname arguments in the file first.
  • test-samples.cmd uses the test-samples.xpl pipeline to download and process all books listed as samples in the libraries.xml settings file.

After processing, a report (report.html) is generated in the selected folder; from it you can access all the data locally or online in the original digital library.

When viewing a title in a digital library, you can find the document-id of the publication in the URL. For example, the sample book is accessible via https://www.digitalniknihovna.cz/mzk/view/uuid:de87a0e0-643b-11ea-a744-005056827e51?page=uuid:73648039-1b21-4921-9b08-13161ae6d239. For the document-id use the first unique identifier, UUID, after /view/, in this case uuid:de87a0e0-643b-11ea-a744-005056827e51.

Alternatively, you can use the sharing dialog of the digital library, where you have to select the book to share.
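
The URL rule described above (take the first identifier after /view/) can be sketched in Python. The function name extract_document_id is hypothetical and not part of Libri augmentati; the sketch assumes the URL layout of the sample book shown above.

```python
from urllib.parse import unquote, urlsplit


def extract_document_id(url: str) -> str:
    """Return the identifier that follows /view/ in a digital library URL."""
    # Only the path is inspected, so the ?page=uuid:... query part is ignored.
    segments = [unquote(s) for s in urlsplit(url).path.split("/") if s]
    try:
        candidate = segments[segments.index("view") + 1]
    except (ValueError, IndexError):
        raise ValueError(f"no identifier found after /view/ in {url!r}") from None
    if not candidate.startswith("uuid:"):
        raise ValueError(f"expected a uuid: identifier, got {candidate!r}")
    return candidate


if __name__ == "__main__":
    sample = (
        "https://www.digitalniknihovna.cz/mzk/view/"
        "uuid:de87a0e0-643b-11ea-a744-005056827e51"
        "?page=uuid:73648039-1b21-4921-9b08-13161ae6d239"
    )
    print(extract_document_id(sample))
    # prints uuid:de87a0e0-643b-11ea-a744-005056827e51
```

The returned value can be passed as the -option:document-id argument.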

Docker

  • Install Docker Desktop.
  • Create a Libri augmentati container with the following command:
docker run -it --rm --volume D:\TEI-2025\Libri-augmentati:/data daliboris/libri-augmentati:latest

Parameters

  • -it = interactive mode
  • --rm = remove Docker container after exit
  • D:\TEI-2025\Libri-augmentati = path on your computer where you want to store the enriched data
  • /data = path in Docker container where enriched data will be stored (can't be changed)
  • daliboris/libri-augmentati:latest = identifier of the Docker image with Libri augmentati application

Use Libri augmentati

Use the prompt in the Docker container to run the pipeline, for example:

Morgana.sh -config=/config/config.xml /la/run/client.xpl -option:library-code=mzk -option:api-version=7 -option:output-directory=/data/tei-conference -option:document-resources="MODS DC FOXML" -option:page-resources="ALTO TEXT FOXML DC MODS IMAGE" -option:document-id=uuid:de87a0e0-643b-11ea-a744-005056827e51 -option:nickname=postural-defects -indent-errors

Parameters

| Parameter | Explanation |
|---|---|
| Morgana.sh | main script that runs the XProc pipeline |
| -config=/config/config.xml | base configuration used by the MorganaXProc-IIIse engine |
| /la/run/client.xpl | main pipeline for request processing |
| -option:library-code=mzk | abbreviation of the digital library where the publications are stored |
| -option:api-version=7 | version of Kramerius (API) used by the digital library |
| -option:output-directory=/data/tei-conference | where the final data and reports are stored (should be in the /data/ directory) |
| -option:document-resources="MODS DC FOXML" | which metadata for the entire document should be downloaded |
| -option:page-resources="ALTO TEXT FOXML DC MODS IMAGE" | which metadata for individual pages should be downloaded |
| -option:document-id=uuid:de87a0e0-643b-11ea-a744-005056827e51 | identifier of the digital book to be downloaded |
| -option:nickname=postural-defects | nickname for the book (a report and a directory with this name will be created) |
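
When processing several books, it may be convenient to assemble this command line programmatically. The following Python sketch only builds and prints the argument list from the parameters listed above; build_morgana_args is a hypothetical helper, not part of Libri augmentati, and the default values simply mirror the example command.

```python
import shlex


def build_morgana_args(
    library_code,
    api_version,
    document_id,
    nickname,
    output_directory,
    document_resources=("MODS", "DC", "FOXML"),
    page_resources=("ALTO", "TEXT", "FOXML", "DC", "MODS", "IMAGE"),
):
    """Assemble the Morgana.sh argument list for the client.xpl pipeline."""
    # The output directory should be inside /data/, the mounted volume.
    if not output_directory.startswith("/data/"):
        raise ValueError("output_directory should be inside /data/")
    options = {
        "library-code": library_code,
        "api-version": api_version,
        "output-directory": output_directory,
        "document-resources": " ".join(document_resources),
        "page-resources": " ".join(page_resources),
        "document-id": document_id,
        "nickname": nickname,
    }
    args = ["Morgana.sh", "-config=/config/config.xml", "/la/run/client.xpl"]
    args += [f"-option:{name}={value}" for name, value in options.items()]
    args.append("-indent-errors")
    return args


if __name__ == "__main__":
    args = build_morgana_args(
        library_code="mzk",
        api_version=7,
        document_id="uuid:de87a0e0-643b-11ea-a744-005056827e51",
        nickname="postural-defects",
        output_directory="/data/tei-conference",
    )
    # shlex.join quotes the arguments that contain spaces for a POSIX shell.
    print(shlex.join(args))
```

Keeping the arguments as a list avoids quoting mistakes in the space-separated resource lists; the list can also be passed directly to subprocess.run inside the container.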

Technical documentation

Detailed documentation of the code is in the doc directory in XML, HTML and Markdown formats.

Acknowledgment

The software was funded by the Institutional support for long-term conceptual development of a research organization (The Moravian Library) by the Czech Ministry of Culture.
