MaRMAT: A desktop tool for schema-agnostic reparative metadata assessment. Flag harmful terms in tabular metadata using pre-curated and custom lexicons.

marriott-library/MaRMAT

Marriott Reparative Metadata Assessment Tool (MaRMAT)

The Marriott Reparative Metadata Assessment Tool (MaRMAT) is an open-source application created by librarians at the University of Utah’s J. Willard Marriott Library to help metadata practitioners flag terms and phrases within metadata records using pre-curated and custom lexicons. MaRMAT is schema agnostic and supports library and museum professionals in assessing tabular metadata for harmful, outdated, and otherwise problematic language. In addition to reparative work, MaRMAT can be used to support broader metadata assessment and collections content analysis. Learn more about MaRMAT on our website: www.marmatproject.org.

Table of Contents

  1. About

    1.1 Features

    1.2 The Lexicons

  2. Installation

    2.1 Download

    2.2 Dependencies

    2.3 Troubleshooting

  3. Running MaRMAT

    3.1 MacOS Users

    3.2 Windows Users

  4. Tips

  5. Credits and Acknowledgments

  6. User Feedback Survey

  7. Report Bug

1. About

The Marriott Reparative Metadata Assessment Tool (MaRMAT) is an open-source application created by librarians at the University of Utah’s J. Willard Marriott Library to help metadata practitioners flag terms and phrases within metadata records using pre-curated and custom lexicons. MaRMAT is schema agnostic and supports library and museum professionals in assessing tabular metadata (CSV or TSV files) for harmful, outdated, and otherwise problematic language. In addition to reparative work, MaRMAT can be used to support broader metadata assessment and collections content analysis.

Identifying potentially harmful language—including problematic or outdated Library of Congress Subject Headings—is one important step toward reparative metadata practices. However, deciding what to change and how to change it requires thoughtful judgment by metadata practitioners. This work calls for awareness, education, and sensitivity to the communities and histories represented in digital collections. The Inclusive Metadata Toolkit, created by the Digital Library Federation’s Cultural Assessment Working Group, offers valuable resources to support decision-making in reparative metadata work. 

MaRMAT was inspired by the Duke University Libraries Description Audit Tool, which was developed to analyze MARC XML and EAD finding aid metadata. We created MaRMAT to complement this work by enabling bulk analysis of metadata in tabular formats, allowing for schema-agnostic assessment.

1.1 Features

  • Schema-agnostic tabular metadata analysis
  • Custom and pre-curated lexicon support
  • Batch processing of metadata records
  • Exportable results for further analysis or remediation
  • Support for user-contributed lexicons

1.2 The Lexicons

MaRMAT uses specialized lexicons—carefully curated lists of terms—to identify potentially harmful, outdated, or problematic language within metadata records. These lexicons include pre-curated sets created by our team of librarians as well as the ability for users to build and incorporate custom term lists tailored to their specific assessment goals. By leveraging these lexicons, MaRMAT empowers metadata practitioners to conduct thorough and thoughtful reviews of their collections, supporting more inclusive and accurate descriptions. Access CSV files of our lexicons here or on our website.

| Lexicon | Description |
| --- | --- |
| Reparative Metadata Lexicon | The Reparative Metadata Lexicon includes potentially harmful terminology organized by category (e.g., aggrandizement, ability, gender, LGBTQ, mental health, race, slavery, US Japanese Incarceration). This lexicon is best suited for use on uncontrolled metadata fields (e.g., title, description). The Inclusive Metadata Toolkit and its associated Resource Directory, developed by the Digital Library Federation's Cultural Assessment Working Group, provide additional resources for reparative metadata practice. WARNING: In order to perform robust metadata assessment, this lexicon contains extremely offensive terminology including expletives, racial slurs, pejoratives, and other derogatory terms for marginalized groups of people. Note: If you are running this lexicon against a large set of metadata, processing may be slow. To improve processing speed, we recommend selecting a subset of these categories in MaRMAT's interface rather than assessing for all categories at once. |
| Library of Congress Subject Headings (LCSH) Lexicon | The LCSH Lexicon includes selected changed and canceled Library of Congress Subject Headings as well as headings that have been identified on The Cataloguing Lab. Select changed and canceled headings, mostly relating to people and places, were mined from the Library of Congress Subject Heading Approved Monthly Lists for 2023-2024, along with a few notable changes from 2025. The LCSH Lexicon is best suited for use against metadata fields that use LCSH as a controlled vocabulary (e.g., subject). |
| Sensitive Content Lexicon | The Sensitive Content Lexicon includes terms that may be used to identify records with sensitive content that may be eligible for either a sensitive content viewer or removal from public display. Sensitive topics identified include deceased people, nudity, and graphic, violent, or sexual content. It also includes Indigenous American material that may need restriction or removal due to cultural sensitivity or potential violation of the Native American Graves Protection and Repatriation Act (NAGPRA). Each organization has its own set of parameters for implementing content warnings and criteria for sensitive content. Please use this lexicon directionally and adhere to your organization's established policies or guidelines. |
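
As a rough illustration of how lexicon-based flagging can work, the sketch below matches lexicon terms against one metadata column and optionally restricts the run to a subset of categories, as the Reparative Metadata Lexicon note recommends for large files. It is not MaRMAT's actual implementation: the function name flag_terms, the sample rows, and the whole-word, case-insensitive matching rule are all assumptions, and it uses Python's standard csv and re modules rather than pandas so that it runs without extra installs. The "Terms" and "Category" column names follow the Tips section.

```python
import csv
import io
import re

# Hypothetical lexicon and metadata, inlined so the sketch is self-contained.
LEXICON_CSV = """Terms,Category
pioneer,aggrandizement
insane,mental health
"""

METADATA_CSV = """Record ID,Title
001,Portrait of a pioneer family
002,View of the valley
003,Ward for the insane
"""


def flag_terms(metadata_rows, lexicon_rows, column, id_column, categories=None):
    """Return (identifier, term, category) for each lexicon term found in `column`.

    If `categories` is given, only lexicon rows in those categories are checked.
    """
    hits = []
    for entry in lexicon_rows:
        if categories is not None and entry["Category"] not in categories:
            continue
        # Assumed matching rule: whole word or phrase, ignoring case.
        pattern = re.compile(r"\b" + re.escape(entry["Terms"]) + r"\b", re.IGNORECASE)
        for row in metadata_rows:
            if pattern.search(row.get(column) or ""):
                hits.append((row[id_column], entry["Terms"], entry["Category"]))
    return hits


lexicon = list(csv.DictReader(io.StringIO(LEXICON_CSV)))
metadata = list(csv.DictReader(io.StringIO(METADATA_CSV)))

# Check every category, then only one category.
all_hits = flag_terms(metadata, lexicon, column="Title", id_column="Record ID")
subset_hits = flag_terms(
    metadata, lexicon, column="Title", id_column="Record ID",
    categories={"mental health"},
)
print(all_hits)
print(subset_hits)
```

Restricting `categories` skips lexicon rows before any pattern is compiled, which is one plausible reason a smaller category selection would run faster on large files.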

2. Installation

2.1 Download

MacOS Users: Go to v2.6.0-rc and download MaRMAT_v.2.6.0-rc_macOS.zip from Assets.

Windows Users: Go to v2.6.0-rc and download MaRMAT_v.2.6.0-rc_Windows.zip from Assets.

Linux Users: Go to v2.6.0-rc and download MaRMAT_v.2.6.0-rc_Linux-x64.zip from Assets.

2.2 Dependencies

To run MaRMAT, you will need Python 3 installed on your computer. If Python is not installed, you can download it from the official Python website, python.org.

MaRMAT also requires two Python libraries: pandas and PyQt6. To install them, follow the instructions for your operating system below.

MacOS:

  1. Open Terminal (Applications > Utilities > Terminal).
  2. Run the following command:
    pip3 install pandas PyQt6

Windows:

Windows users have two options for installation:

Option A:

  1. Unzip MaRMAT_v.2.6.0-rc_Windows.zip.
  2. Double-click the install-dependencies.bat file.

Option B:

  1. Open Command Prompt (search for cmd) or PowerShell
  2. Run the following command:
    py -m pip install pandas PyQt6
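
To confirm that both libraries are visible to Python before launching MaRMAT, a quick check along these lines may help. This is a minimal sketch and not part of MaRMAT: the function name missing_dependencies is hypothetical, and it relies only on importlib from the Python standard library.

```python
import importlib.util

# Import names of the two libraries this README lists as dependencies.
REQUIRED = ("pandas", "PyQt6")


def missing_dependencies(names=REQUIRED):
    """Return the names that cannot be found in the current Python environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_dependencies()
    if missing:
        print("Missing libraries:", ", ".join(missing))
    else:
        print("pandas and PyQt6 are both installed.")
```

Run the script with the same interpreter you use to start MaRMAT (python3 on MacOS, py on Windows), since libraries installed for one Python installation are not visible to another.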

2.3 Troubleshooting

  1. If you see a permissions error installing pandas and PyQt6 on MacOS, try running the command with elevated privileges:
    sudo pip3 install pandas PyQt6

3. Running MaRMAT

3.1 MacOS Users

  1. Download MaRMAT_v.2.6.0-rc_macOS.zip from our release assets.
  2. Unzip the downloaded folder to a location on your computer, such as your Desktop or Downloads folder.
  3. Use the cd command in Terminal to change directories to the unzipped MaRMAT folder. For example, if you unzipped the folder to your Desktop, run:
    cd Desktop/MaRMAT_v.2.6.0-rc_macOS/src
  4. Then run the following command to launch the MaRMAT user interface:
    python3 main.py

3.2 Windows Users

  1. Download MaRMAT_v.2.6.0-rc_Windows.zip from our release assets.
  2. Unzip the downloaded folder to a location on your computer, such as your Desktop or Downloads folder.
  3. If the necessary Python libraries haven't been installed yet, double-click the install-dependencies.bat file.
  4. Open MaRMAT by double-clicking the run-marmat.bat file.

4. Tips

  • Ensure metadata files are in TSV or CSV format.
  • Ensure lexicon files are in CSV format.
  • The lexicon file should contain columns for terms and their corresponding categories ("Terms","Category").
  • The metadata file should contain the text data to be analyzed, with each row representing a separate entry.
  • The metadata file should contain a column, such as a Record ID, that you can use as an "Identifier" to reconcile the tool's output with your original metadata.
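
The checks in this list can be automated before a run. The following is a minimal sketch under stated assumptions, not part of MaRMAT: the helper names read_table and check_inputs are hypothetical, the delimiter is chosen from the file extension alone, and the standard csv module is used instead of pandas to keep the example dependency-free. The uniqueness check on the identifier column is an added assumption; the tips only ask that such a column exist.

```python
import csv
from pathlib import Path

# Column names the tips above give for lexicon files.
LEXICON_COLUMNS = {"Terms", "Category"}


def read_table(path):
    """Read a .csv or .tsv file and return (header, rows as dicts)."""
    path = Path(path)
    suffix = path.suffix.lower()
    if suffix not in (".csv", ".tsv"):
        raise ValueError(f"Unsupported file type: {path.name}")
    delimiter = "\t" if suffix == ".tsv" else ","
    # utf-8-sig also accepts files saved with a byte order mark.
    with path.open(newline="", encoding="utf-8-sig") as handle:
        reader = csv.DictReader(handle, delimiter=delimiter)
        rows = list(reader)
        return list(reader.fieldnames or []), rows


def check_inputs(metadata_path, lexicon_path, id_column):
    """Return a list of problems found; an empty list means the files look usable."""
    problems = []
    meta_header, meta_rows = read_table(metadata_path)
    lex_header, _ = read_table(lexicon_path)
    if Path(lexicon_path).suffix.lower() != ".csv":
        problems.append("Lexicon file should be a CSV.")
    missing = LEXICON_COLUMNS - set(lex_header)
    if missing:
        problems.append("Lexicon is missing columns: " + ", ".join(sorted(missing)))
    if id_column not in meta_header:
        problems.append(f"Metadata has no '{id_column}' column.")
    else:
        ids = [row[id_column] for row in meta_rows]
        if len(ids) != len(set(ids)):
            problems.append(f"Values in '{id_column}' are not unique.")
    return problems
```

For example, `check_inputs("records.tsv", "lexicon.csv", "Record ID")` (hypothetical file names) returns an empty list when the metadata file has a unique "Record ID" column and the lexicon has both expected columns.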

5. Credits and Acknowledgments

The current version of MaRMAT was fully reprogrammed and redesigned by Aiden deBoer thanks to an internal "Jumpstart Grant" awarded by the J. Willard Marriott Library in 2025. MaRMAT-beta was released in July 2024. It was developed by Kaylee Alexander in collaboration with ChatGPT 3.5 with input from Rachel Wittmann and Anna Neatrour at the University of Utah. MaRMAT was inspired by the Duke University Libraries Description Audit Tool and informed by resources such as The Inclusive Metadata Toolkit, developed by the Digital Library Federation's Cultural Assessment Working Group. We are grateful to the Oregon Digital team for their contributions to the Reparative Metadata Lexicon.

6. User Feedback Survey

After using MaRMAT, please take our survey and tell us about your experience. We appreciate your feedback!

7. Report Bug

Encountered a bug? Report it here.
