Detecting Edit Failures in LLMs: An Improved Specificity Benchmark (website)
This repository contains the code for the paper Detecting Edit Failures in LLMs: An Improved Specificity Benchmark (ACL Findings 2023).
It extends previous work on model editing by Meng et al. [1] by introducing a new benchmark, called CounterFact+, for measuring the specificity of model edits.
The repository is a fork of MEMIT, which implements the model editing algorithms MEMIT (Mass Editing Memory in a Transformer) and ROME (Rank-One Model Editing). Our fork extends this code with additional evaluation scripts that implement the CounterFact+ benchmark. For installation instructions, see the original repository.
We recommend conda for managing Python, CUDA, and PyTorch; pip is for everything else. To get started, simply install conda and run:
CONDA_HOME=$CONDA_HOME ./scripts/setup_conda.sh
$CONDA_HOME should be the path to your conda installation, e.g., ~/miniconda3.
See INSTRUCTIONS.md for instructions on how to run the experiments and evaluations.
If you find our paper useful, please consider citing as:
@inproceedings{jason2023detecting,
title         = {Detecting Edit Failures In Large Language Models: An Improved Specificity Benchmark},
author        = {Hoelscher-Obermaier, Jason and Persson, Julia and Kran, Esben and Konstas, Ioannis and Barez, Fazl},
booktitle     = {Findings of ACL},
year          = {2023},
organization  = {Association for Computational Linguistics}
}