MHDBDB TEI Repository

TEI-encoded Middle High German literature texts with semantic annotations and dual web interfaces from the Mittelhochdeutsche Begriffsdatenbank (MHDBDB), University of Salzburg.

Overview

All content is based on data from the Mittelhochdeutsche Begriffsdatenbank (MHDBDB) at the University of Salzburg, a research project representing more than 50 years of medievalist research on texts and concepts.

Corpus Content

TEI-encoded texts of Middle High German literature
Authority files: persons, works, lexicon, concepts, genres, names, variants
Pre-built indices for fast search
Comprehensive test suite with Playwright integration

Two Web Interfaces

Feature	Main Site (index.html)	Playground (playground/)
Purpose	Public search & reading	Advanced research & analysis
Data	Pre-built indices	Pre-built authority index + lazy-loaded TEI
Search	Single lemma with filters	Multiple search types (incl. multi-lemma)
Target Users	General public, students	Researchers, medievalists
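The difference between single-lemma search (main site) and multi-lemma search (Playground) can be illustrated with a minimal Python sketch over an in-memory corpus index. The shape used here (text id → lemma → token positions) is an assumption based only on the description of corpus-index.json.gz as "texts with lemma positions". The function names and sample text ids are hypothetical.

```python
# Hypothetical index shape: text id -> lemma -> token positions.
# The real corpus-index.json.gz layout may differ.
CorpusIndex = dict[str, dict[str, list[int]]]


def find_lemma(index: CorpusIndex, lemma: str) -> dict[str, list[int]]:
    """Single-lemma search: every text containing `lemma`, with its positions."""
    return {text_id: lemmas[lemma] for text_id, lemmas in index.items() if lemma in lemmas}


def find_all_lemmas(index: CorpusIndex, lemmas: list[str]) -> list[str]:
    """Multi-lemma search: sorted ids of texts that contain every lemma in `lemmas`."""
    return sorted(
        text_id for text_id, entry in index.items() if all(lem in entry for lem in lemmas)
    )


index: CorpusIndex = {
    "text_A": {"vriunt": [3, 17], "minne": [5]},
    "text_B": {"vriunt": [42]},
}
print(find_lemma(index, "vriunt"))                  # {'text_A': [3, 17], 'text_B': [42]}
print(find_all_lemmas(index, ["vriunt", "minne"]))  # ['text_A']
```

A real multi-lemma search would likely also support proximity constraints using the positions; this sketch only checks co-occurrence within a text.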

📚 Documentation

For Developers

CLAUDE.md - Primary developer guide and project overview
docs/INDEX.MD - Comprehensive knowledge base gateway with links to specialized documentation

For Users

Playground includes built-in help and search examples
Authority data browsing with filtering and sorting

Quick Start

Start Web Server

npm run serve
# Opens http://localhost:8080

Build Indices (Optional)

Pre-built indices are included. To rebuild:

npm run build              # Build all indices
npm run build:authority    # Build authority index only
npm run build:corpus       # Build corpus index only
npm run validate:indices   # Validate generated indices

Run Tests

npm test                   # Run all tests
npm run test:ui            # Interactive test UI
npm run test:headed        # Run with visible browser

Programmatic Access

TEI files point to authority entries by their xml:id, using pointer attributes such as ref and ana:

<author ref="#person_445">Meister Eckhart</author>
<w lemma="vriunt" ana="#concept_12345">vriunt</w>
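A ref or ana value is a local pointer: strip the leading # and look up the element whose xml:id matches. The sketch below does this with Python's standard-library ElementTree. The inline fragment (a listPerson of person elements, each with a preferred persName) is an assumed structure for illustration, not copied from persons.xml.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

TEI = "http://www.tei-c.org/ns/1.0"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"  # xml:id in Clark notation

# Hypothetical authority fragment; the real persons.xml may be structured differently.
PERSONS_XML = f"""<listPerson xmlns="{TEI}">
  <person xml:id="person_445">
    <persName type="preferred">Meister Eckhart</persName>
  </person>
</listPerson>"""


def build_id_map(root: ET.Element) -> dict[str, ET.Element]:
    """Map every xml:id in the tree to its element."""
    return {el.get(XML_ID): el for el in root.iter() if el.get(XML_ID) is not None}


def resolve(pointer: str, id_map: dict[str, ET.Element]) -> ET.Element | None:
    """Resolve a local pointer such as '#person_445' to its authority entry."""
    return id_map.get(pointer.removeprefix("#"))


root = ET.fromstring(PERSONS_XML)
ids = build_id_map(root)
person = resolve("#person_445", ids)
name = person.find(f"{{{TEI}}}persName[@type='preferred']").text
print(name)  # Meister Eckhart
```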

XPath Examples

//tei:persName[@type='preferred']  # All preferred person names
//tei:w[@lemma='vriunt']           # All instances of 'vriunt'
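These expressions use the tei: prefix, so it must be bound to the TEI namespace (http://www.tei-c.org/ns/1.0) before evaluation. Python's standard ElementTree supports only a subset of XPath and its paths must start with .// rather than //; lxml's xpath() method accepts the expressions above unchanged when given the same namespaces mapping. The sample document below is made up for illustration.

```python
import xml.etree.ElementTree as ET

# Bind the tei: prefix used in the XPath expressions.
NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Made-up TEI snippet for illustration only.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text><body><p>
    <w lemma="vriunt" ana="#concept_12345">vriunt</w>
    <w lemma="liep">liebe</w>
    <w lemma="vriunt" ana="#concept_12345">vriunde</w>
  </p></body></text>
</TEI>"""

root = ET.fromstring(SAMPLE)

# ElementTree equivalent of //tei:w[@lemma='vriunt']
hits = root.findall(".//tei:w[@lemma='vriunt']", NS)
forms = [w.text for w in hits]
print(forms)  # ['vriunt', 'vriunde']
```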

Authority Files

File	Content
persons.xml	Authors and historical persons
works.xml	Works and manuscript metadata
lexicon.xml	Lemmata with grammatical annotations
concepts.xml	Semantic concepts (taxonomy)
genres.xml	Literary genres (taxonomy)
names.xml	Proper names with semantic relations
variants.xml	Orthographic variants mapped to lemmas

Architecture

Pre-Built Indices

The repository includes pre-built compressed indices for fast loading:

Index	Contains
authority-index.json.gz	All authority files merged
corpus-index.json.gz	Texts with lemma positions

Features:

Compressed JSON format for reduced download size
IndexedDB caching with automatic expiration
No XML parsing overhead in browser
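The compressed-JSON approach can be sketched with Python's standard gzip and json modules: serialize once at build time, then decompress and parse once on load, with no XML parsing at read time. The function names, file name, and payload are placeholders; the project's actual build scripts use Python with lxml and may structure their output differently.

```python
import gzip
import json
import tempfile
from pathlib import Path


def write_index(data: dict, path: Path) -> None:
    """Serialize `data` as compact UTF-8 JSON and gzip it (build-time step)."""
    payload = json.dumps(data, ensure_ascii=False, separators=(",", ":")).encode("utf-8")
    path.write_bytes(gzip.compress(payload))


def read_index(path: Path) -> dict:
    """Decompress and parse a gzipped JSON index (load-time step)."""
    return json.loads(gzip.decompress(path.read_bytes()).decode("utf-8"))


# Round trip through a temporary directory with placeholder data.
sample = {"person_445": {"preferred": "Meister Eckhart"}}
with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "example-index.json.gz"
    write_index(sample, target)
    assert read_index(target) == sample
    print(target.stat().st_size, "bytes on disk")
```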

Technology Stack

Frontend: Vanilla JavaScript (ES Modules), Tailwind CSS
Compression: Pako (gzip for JavaScript)
Storage: Dexie.js (IndexedDB wrapper)
Testing: Playwright
Build: Python + lxml for index generation
Server: http-server (npm) or Python http.server
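Python's http.server is listed as an alternative to the npm server. A minimal sketch that serves a directory on the same port as npm run serve (8080), using only the standard library; make_server and serve are hypothetical helper names:

```python
import functools
import http.server


def make_server(directory: str = ".", port: int = 8080) -> http.server.ThreadingHTTPServer:
    """Create (but do not start) a static file server rooted at `directory`."""
    handler = functools.partial(http.server.SimpleHTTPRequestHandler, directory=directory)
    return http.server.ThreadingHTTPServer(("localhost", port), handler)


def serve(directory: str = ".", port: int = 8080) -> None:
    """Serve `directory` until interrupted (Ctrl+C)."""
    with make_server(directory, port) as httpd:
        print(f"Serving {directory} on http://localhost:{port}")
        httpd.serve_forever()
```

Call serve() from the repository root, or use the equivalent command-line form python -m http.server 8080.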

Middle High German Normalization

All search functions use centralized MHG character normalization:

Long vowels: â→a, ê→e, î→i, ô→o, û→u
Umlauts: ä→ae, ö→oe, ü→ue
Identical rules in Python (index build) and JavaScript (browser runtime)
Comprehensive automated test coverage
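The mappings above can be expressed as a single character translation table. This is a minimal Python sketch covering only the rules listed here; the project's actual normalizer may apply further rules (for example case folding) that this README does not document. The NFC step is an added assumption so that decomposed input (e plus a combining circumflex) is normalized the same way as precomposed ê.

```python
import unicodedata

# Only the mappings documented in this README.
_MHG_MAP = str.maketrans({
    "â": "a", "ê": "e", "î": "i", "ô": "o", "û": "u",  # long vowels
    "ä": "ae", "ö": "oe", "ü": "ue",                    # umlauts
})


def normalize_mhg(word: str) -> str:
    """Apply the documented MHG character normalization to `word`."""
    # Compose first so that 'e' + U+0302 (combining circumflex) becomes 'ê'.
    return unicodedata.normalize("NFC", word).translate(_MHG_MAP)


print(normalize_mhg("hêrre"))      # herre
print(normalize_mhg("küneginne"))  # kueneginne
```

Keeping the rules in one table makes the Python/JavaScript parity easy to test: both sides can be checked against the same list of input/output pairs.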

License & Contact

License: CC BY-NC-SA 3.0 AT
Contact: [email protected] | https://mhdbdb.plus.ac.at
Project: University of Salzburg, 50+ years of medievalist research
