Add briefzettelkatalog benchmark #58

## What specific task do you want to benchmark?
Catalog data extraction from *Briefzettelkatalog* scans. If successful, the extracted data will be ingested into ALMA and replace the manual cataloging of the *Briefzettelkatalog*.

## What dataset will you use, and do you have permission to share it?
The *Briefzettelkatalog* consists of about 50,000 cards, all of which are in the process of being digitized. The scans will be released under a **CC0 license**.

## How will you create ground truth (who will annotate, and how)?
About 10,000 cards have been re-cataloged by domain experts and serve as ground truth. These records are available in ALMA as **MARC21**. A randomized sample of around 500 records will be transformed into the expected output JSON format (see below).
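A minimal sketch of how the randomized sample could be drawn reproducibly, so the same roughly 500 records can be regenerated later. The function name `sample_record_ids`, the seed value, and the placeholder IDs are assumptions for illustration; real IDs would come from the ALMA export.

```python
import random


def sample_record_ids(record_ids, k=500, seed=42):
    """Return a reproducible random sample of k record IDs (all IDs if fewer than k exist)."""
    rng = random.Random(seed)  # fixed seed (assumed value) makes the draw repeatable
    ids = sorted(record_ids)   # sort first so the result does not depend on input order
    return rng.sample(ids, min(k, len(ids)))


# Hypothetical IDs standing in for the roughly 10,000 re-cataloged records.
all_ids = [f"rec{i:05d}" for i in range(10_000)]
sample = sample_record_ids(all_ids)
```

Recording the seed alongside the benchmark would let others verify which records ended up in the ground-truth set.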

## What does successful model output look like?
```json
{
  "Metadata": {
    "Author": "Andrait, Jacques",
    "Reference": "G² II 14, fol. 125",
    "Recipient": "Arragonis, Euchelmius",
    "Date": "1608-05-19",
    "Place": "St. Michel de Lanes",
    "Language": "französisch",
    "Note": "Apogr. französ.",
    "Bibliography": "[Bibliogr.: Fr Ertl. I 15,3]"
  },
  "Description": {
    "Text": "R: Der Herr A.S. Ihan wird nach Rasel kommen ins A. gut zu bedienen. - Und hat sich wegen des Reitzes von H. alle Mühe gegeben, kein Roger A. hat sich immer gegen den Willen des A. gestellt. Es soll nun besser werden. - Grüess."
  }
}
```
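A minimal sketch of a structural check for model output against the layout above. It assumes every field shown in the example is required and that `Date` is always a full ISO 8601 date; both are assumptions, since real cards may lack fields or carry partial dates. The name `validate_card` is hypothetical.

```python
from datetime import date

# Field names taken from the example card above.
METADATA_FIELDS = (
    "Author", "Reference", "Recipient", "Date",
    "Place", "Language", "Note", "Bibliography",
)


def validate_card(card):
    """Return a list of problems; an empty list means the card matches the layout."""
    problems = []
    metadata = card.get("Metadata")
    if not isinstance(metadata, dict):
        problems.append("missing object: Metadata")
    else:
        for field in METADATA_FIELDS:
            if not isinstance(metadata.get(field), str):
                problems.append(f"missing or non-string field: Metadata.{field}")
        # Assumption: dates are full ISO 8601 dates (YYYY-MM-DD), as in the example.
        raw_date = metadata.get("Date")
        if isinstance(raw_date, str):
            try:
                date.fromisoformat(raw_date)
            except ValueError:
                problems.append(f"Date is not an ISO 8601 date: {raw_date!r}")
    description = card.get("Description")
    if not isinstance(description, dict) or not isinstance(description.get("Text"), str):
        problems.append("missing or non-string field: Description.Text")
    return problems
```

Running such a check before scoring separates malformed output from genuine extraction errors.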

## How will you score model performance?
Compute **F1 per field**, counting a predicted value as correct when its string similarity to the ground truth meets a threshold, and **micro-averaged F1 per card**.
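One way the scoring could be sketched, under stated assumptions: similarity is measured with `difflib.SequenceMatcher`, the threshold of 0.9 is a placeholder, a wrong value counts as both a false positive and a false negative, and F1 is reported as 0.0 when there is nothing to score. None of these choices is fixed by this proposal, and all function names are hypothetical.

```python
from difflib import SequenceMatcher

THRESHOLD = 0.9  # assumed placeholder; the proposal does not fix a value


def flatten(card):
    """Flatten {"Metadata": {...}, "Description": {...}} to {"Metadata.Author": ..., ...}."""
    return {
        f"{section}.{name}": value
        for section, fields in card.items()
        for name, value in fields.items()
    }


def count_field(pred, gold, threshold=THRESHOLD):
    """Return (tp, fp, fn) for a single field."""
    pred, gold = (pred or "").strip(), (gold or "").strip()
    if not pred and not gold:
        return (0, 0, 0)  # field absent on both sides: nothing to score
    if pred and gold and SequenceMatcher(None, pred, gold).ratio() >= threshold:
        return (1, 0, 0)
    # A spurious value is a false positive, a missed value a false negative,
    # and a wrong value counts as both.
    return (0, int(bool(pred)), int(bool(gold)))


def f1(tp, fp, fn):
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def card_counts(pred_card, gold_card):
    pred, gold = flatten(pred_card), flatten(gold_card)
    return {key: count_field(pred.get(key), gold.get(key))
            for key in sorted(set(pred) | set(gold))}


def micro_f1_per_card(pred_card, gold_card):
    """Pool the counts of all fields on one card, then compute F1."""
    counts = list(card_counts(pred_card, gold_card).values())
    return f1(*(sum(c[i] for c in counts) for i in range(3)))


def f1_per_field(pairs):
    """Pool the counts of each field across all (prediction, gold) pairs."""
    totals = {}
    for pred_card, gold_card in pairs:
        for key, counts in card_counts(pred_card, gold_card).items():
            running = totals.setdefault(key, [0, 0, 0])
            for i in range(3):
                running[i] += counts[i]
    return {key: f1(*running) for key, running in totals.items()}
```

A stricter or looser similarity measure (for example, normalized edit distance) could be substituted in `count_field` without changing the rest of the sketch.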
