This is the German Drama Corpus (GerDraCor), a collection of TEI P5-encoded German-language plays from the 1500s to the 1940s. The corpus is released under the Creative Commons Zero copyright waiver (CC0).
If you want to cite the corpus, please use this publication:
- Fischer, Frank, et al. (2019). Programmable Corpora: Introducing DraCor, an Infrastructure for the Research on European Drama. In Proceedings of DH2019: "Complexities", Utrecht University, doi:10.5281/zenodo.4284002.
We started building the corpus by extracting all plays from the TextGrid Repository (TGRep). The TGRep versions were in turn derived from zeno.org's text collection. However, TGRep's conversion from zeno.org's proprietary XML to TEI introduced some bugs and inconsistencies, which we fixed for GerDraCor in an extended clean-up process between 2017 and 2019. All our fixes and enhancements are documented on GerDraCor's Wiki. With this clean-up complete, GerDraCor is now in a position to grow by taking on new plays from sources such as Deutsches Textarchiv, Project Gutenberg, Projekt Gutenberg-DE, Wikisource, or Google Books.
GerDraCor is an autonomous corpus and will be maintained independently. Yet it is also integrated into the dracor.org website, the showcase for our newly introduced "Programmable Corpora" concept.
- Editors: Frank Fischer, Peer Trilcke
- Support during the initial compilation of the corpus from TextGrid Repository: Mathias Göbel (Göttingen State and University Library), Dario Kampkaspar (Technical University of Darmstadt/ACDH-CH, Vienna)
- Additional encoders: Erik Renz (University of Rostock)
- Bibliographic research: Lilly Welz (Freie Universität Berlin)
- Character annotations: Nathalie Wiedmer (University of Tübingen), Janis Pagel, Nils Reiter (both University of Cologne)
- Systematic OCR error cleanup: Jan Jokisch (Max-Planck-Institut für empirische Ästhetik)
Character relations encode the information provided in the dramatis personae in machine-readable form. They mainly cover family and power relations.
The following relations have been annotated (by Nathalie Wiedmer et al.):
| Relation label | Directed/Undirected | Description |
|---|---|---|
| parent_of | directed | One character is a parent of the other |
| lover_of | directed | For lovers |
| related_with | directed | Other family relations (e.g., uncles) |
| associated_with | directed | For clearly associated characters (e.g., butlers) |
| siblings | undirected | Characters that have at least one parent in common |
| spouses | undirected | Characters in marriage (or engaged) |
| friends | undirected | Characters marked as being friends |
All relations are marked up in the XML in a <listRelation> element within <listPerson>. Directed relations are encoded with an active and a passive attribute, where the active participant is the one that comes first when the relation is expressed as a sentence. For example, "Odoardo is parent of Emilia" translates to this:
<relation name="parent_of" active="#odoardo_galotti" passive="#emilia" />
Undirected relations use the mutual attribute to collect all IDs that are part of a relationship:
<relation name="spouses" mutual="#baerbel #adam"/>
The label from the table above is contained in the name attribute.
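To illustrate how these attributes can be read programmatically, here is a minimal Python sketch that turns relation elements into an edge list. It is not part of the DraCor tooling: the function names `split_ids` and `extract_relations` are hypothetical, the sample fragment only reuses the two examples above, and the sketch assumes that the elements live in the TEI namespace and that a mutual attribute listing more than two IDs relates every pair of them.

```python
import xml.etree.ElementTree as ET
from itertools import combinations

# Assumption: the relation elements are in the standard TEI namespace.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Hypothetical minimal fragment built from the two examples above;
# real GerDraCor files contain much more markup around it.
SAMPLE = """
<listPerson xmlns="http://www.tei-c.org/ns/1.0">
  <listRelation>
    <relation name="parent_of" active="#odoardo_galotti" passive="#emilia"/>
    <relation name="spouses" mutual="#baerbel #adam"/>
  </listRelation>
</listPerson>
"""


def split_ids(value):
    """Split a space-separated list of '#id' pointers into bare IDs."""
    return [token.lstrip("#") for token in value.split()]


def extract_relations(xml_text):
    """Return (label, first, second, directed) tuples for every relation."""
    root = ET.fromstring(xml_text.strip())
    edges = []
    for rel in root.iterfind(".//tei:relation", TEI_NS):
        label = rel.get("name")
        mutual = rel.get("mutual")
        if mutual:
            # Undirected: pair up all IDs listed in the mutual attribute.
            for a, b in combinations(split_ids(mutual), 2):
                edges.append((label, a, b, False))
        else:
            # Directed: every active ID points at every passive ID.
            for a in split_ids(rel.get("active", "")):
                for b in split_ids(rel.get("passive", "")):
                    edges.append((label, a, b, True))
    return edges


for edge in extract_relations(SAMPLE):
    print(edge)
```

Keeping a `directed` flag on each tuple lets later code decide whether to treat an edge as ordered, which mirrors the Directed/Undirected column of the table.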
An easy way to download the network data (instead of the actual TEI files) is to use our API (see its documentation). With curl, wget, and jq installed, the following loop saves one CSV file per play:
for play in $(curl -s 'https://dracor.org/api/v1/corpora/ger' | jq -r '.plays[].name'); do
  wget -O "$play.csv" "https://dracor.org/api/v1/corpora/ger/plays/$play/networkdata/csv"
done
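Once the files are downloaded, they can be processed with any CSV reader. The following Python sketch sums edge weights per character. It rests on an assumption about the file layout, namely a header row with the columns Source, Type, Target, and Weight; please check a downloaded file before relying on it. The sample rows, their weights, and the function name `weighted_degree` are made up for illustration.

```python
import csv
import io
from collections import Counter

# Made-up sample data. The column names are an assumption about the
# network CSV layout and should be checked against a real download.
SAMPLE_CSV = """Source,Type,Target,Weight
emilia,Undirected,odoardo_galotti,3
emilia,Undirected,claudia_galotti,5
claudia_galotti,Undirected,odoardo_galotti,2
"""


def weighted_degree(csv_text):
    """Sum the weights of all edges touching each character."""
    totals = Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        weight = int(row["Weight"])
        # Each edge counts for both of its endpoints.
        totals[row["Source"]] += weight
        totals[row["Target"]] += weight
    return totals


degrees = weighted_degree(SAMPLE_CSV)
for character, total in degrees.most_common():
    print(character, total)
```

To use it on a downloaded file, read the file's text (for example with `open(path, encoding="utf-8").read()`) and pass it to `weighted_degree`.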
The API info page is at https://dracor.org/api/v1/info. It also tells you which version of eXist-db we're running on dracor.org.
Explore the distribution of speakers per play over time by loading the metadata directly from the API:
library(data.table)
library(ggplot2)
gerdracor <- fread("https://dracor.org/api/v1/corpora/ger/metadata/csv")
ggplot(gerdracor, aes(x = yearNormalized, y = numOfSpeakers)) + geom_point()
This produces a scatter plot of the number of speakers per play against the normalized year of each play.
To visualise the temporal distribution of plays by decade:
library(httr)
library(jsonlite)
response <- GET("https://dracor.org/api/v1/corpora/ger/metadata")
metadata <- fromJSON(content(response, "text", encoding = "UTF-8"))
decades <- floor(metadata$yearNormalized / 10) * 10
all_decades <- seq(min(decades, na.rm = TRUE), max(decades, na.rm = TRUE), 10)
barplot(table(factor(decades, levels = all_decades)), col = "lightgray", las = 2, xlab = "decade", ylab = "numOfPlays")
grid(nx = NA, ny = NULL)
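This creates a bar plot of the number of plays per decade. Because the factor levels span every decade between the earliest and the latest one, decades without any plays appear as empty bars.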
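The binning step used in the R snippet (round each year down to its decade, then count, keeping empty decades) can be sketched in Python as follows. The function name `plays_per_decade` and the sample years are hypothetical; missing years are represented as `None` here, which is an assumption about how the metadata would be loaded.

```python
from collections import Counter


def plays_per_decade(years):
    """Count plays per decade, including decades that have no plays."""
    # Round each known year down to the start of its decade.
    decades = [(year // 10) * 10 for year in years if year is not None]
    if not decades:
        return {}
    counts = Counter(decades)
    # Walk every decade from the earliest to the latest so gaps show up as 0.
    return {
        decade: counts.get(decade, 0)
        for decade in range(min(decades), max(decades) + 10, 10)
    }


# Arbitrary sample years, not taken from the corpus metadata.
print(plays_per_decade([1772, 1779, 1808, None]))
```

Filling the gaps explicitly plays the same role as passing the full list of decades as factor levels in the R version.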
GerDraCor evolved from earlier work under the DLINA (digitally-enabled literary network analysis) project, which used an intermediary format containing only structural data for network analysis research. As our research agenda expanded beyond network analysis, we developed the Programmable Corpora paradigm, enabling programmatic access to complete dramatic texts alongside their structural features.
(README last updated on February 8, 2026.)

