Merged

70 commits
6bfde25
Add revision for 2024.parlaclarin-1.9 (closes #3340)
mjpost Sep 2, 2025
2199243
Update git commit comment
mjpost Sep 2, 2025
f273c04
Add erratum for 2024.semeval-1.271 (closes #5041)
mjpost Sep 2, 2025
1440759
Add revision for 2024.conll-1.17 (closes #5459)
mjpost Sep 2, 2025
e5a2e7e
Add revision for 2025.acl-long.330 (closes #5486)
mjpost Sep 2, 2025
72e7d0a
Add revision for 2025.acl-long.1558 (closes #5507)
mjpost Sep 2, 2025
00e48e2
Replace copyright with PDF for 2025.acl-long.962 (closes #5511)
mjpost Sep 2, 2025
9aa067e
Add revision for 2025.acl-long.1450 (closes #5597)
mjpost Sep 2, 2025
5a86d86
Add revision for 2025.bionlp-share.33 (closes #5601)
mjpost Sep 2, 2025
1fd191b
Add revision for 2025.knowllm-1.11 (closes #5616)
mjpost Sep 2, 2025
b3f157c
Add revision for 2025.findings-acl.539 (closes #5617)
mjpost Sep 2, 2025
435a4ac
Remove erratum attribute href
mjpost Sep 4, 2025
a9a1196
Merge branch 'master' into corrections-2025-09
mjpost Sep 4, 2025
c87fdb8
Add revision for 2025.naacl-long.458 (closes #5950)
anthology-assist Sep 5, 2025
ec03d4d
Add revision for 2025.depling-1.0 (closes #5945)
anthology-assist Sep 5, 2025
65672da
Add revision for 2025.iwpt-1.0 (closes #5944)
anthology-assist Sep 5, 2025
5c344a8
Add revision for 2025.acl-long.805 (closes #5667)
anthology-assist Sep 5, 2025
9660dac
Add revision for 2025.findings-acl.539 (closes #5617)
anthology-assist Sep 5, 2025
0d507bf
Add revision for 2025.quasy-1.0 (closes #5943)
anthology-assist Sep 5, 2025
9cc46f5
Add revision for 2025.udw-1.0 (closes #5941)
anthology-assist Sep 5, 2025
9872af6
Add revision for 2025.acl-long.374 (closes #5798)
anthology-assist Sep 5, 2025
76eeb52
Add revision for 2024.argmining-1.14 (closes #5797)
anthology-assist Sep 5, 2025
b78807d
Add revision for 2025.acl-short.30 (closes #5795)
anthology-assist Sep 5, 2025
e1bf3bc
Add revision for 2025.bionlp-share.12 (closes #5668)
anthology-assist Sep 6, 2025
cd74ec0
Add revision for 2025.bionlp-1.10 (closes #5677)
anthology-assist Sep 6, 2025
9549fe3
Add revision for 2025.findings-acl.1372 (closes #5683)
anthology-assist Sep 6, 2025
c8d516c
Add revision for 2025.quasy-1.5 (closes #5689)
anthology-assist Sep 6, 2025
15d3ea0
Add revision for 2025.depling-1.4 (closes #5692)
anthology-assist Sep 6, 2025
de21e05
Add revision for 2025.argmining-1.16 (closes #5704)
anthology-assist Sep 6, 2025
fe06d79
Add revision for 2025.depling-1.0 (closes #5705)
anthology-assist Sep 6, 2025
8927861
Add revision for 2025.acl-long.488 (closes #5713)
anthology-assist Sep 6, 2025
9bca4e6
Add revision for 2025.naacl-long.77 (closes #5738)
anthology-assist Sep 6, 2025
0cae964
Add revision for 2025.findings-acl.936 (closes #5740)
anthology-assist Sep 6, 2025
880afce
Add revision for 2025.law-1.23 (closes #5742)
anthology-assist Sep 6, 2025
3f11f3b
Add revision for 2025.acl-long.1482 (closes #5746)
anthology-assist Sep 6, 2025
eb3bce8
Add revision for 2025.acl-long.1504 (closes #5747)
anthology-assist Sep 6, 2025
c4a369c
Add revision for 2025.conll-1.24 (closes #5755)
anthology-assist Sep 6, 2025
4dcdb28
Add revision for 2025.gebnlp-1.3 (closes #5764)
anthology-assist Sep 6, 2025
cdee85e
Add revision for 2025.coling-industry.66 (closes #5766)
anthology-assist Sep 6, 2025
d04189f
Add revision for 2023.findings-emnlp.693 (closes #5759)
anthology-assist Sep 6, 2025
aa03416
Add revision for 2023.findings-emnlp.1012 (closes #5392)
anthology-assist Sep 8, 2025
c93a72d
Add revision for 2025.acl-long.324 (closes #5765)
anthology-assist Sep 8, 2025
c087b2a
Add revision for 2021.tacl-1.68
mjpost Sep 9, 2025
64eb467
Merge remote-tracking branch 'origin/master' into corrections-2025-09
anthology-assist Sep 11, 2025
bbc554e
Add revision for 2024.emnlp-main.759 (closes #5992)
anthology-assist Sep 11, 2025
4212ab5
Add revision for 2025.acl-long.1263 (closes #5993)
anthology-assist Sep 11, 2025
194aeb2
Add revision for 2025.acl-demo.3 (closes #5994)
anthology-assist Sep 11, 2025
31fc982
Add revision for 2025.findings-naacl.390 (closes #5986)
anthology-assist Sep 11, 2025
2b5bdf9
Add revision for 2025.gem-1.76 (closes #6001)
anthology-assist Sep 12, 2025
9b74b91
Add revision for 2021.tacl-1.68 (closes #5963)
anthology-assist Sep 12, 2025
99276c6
Add revision for P19-1506 (closes #5977)
anthology-assist Sep 12, 2025
24888f5
Add revision for 2025.acl-long.728 (closes #5979)
anthology-assist Sep 12, 2025
533dd75
Add revision for 2025.acl-long.330 (closes #5955)
anthology-assist Sep 12, 2025
5ea4dd7
Add revision for 2025.acl-long.159 (closes #5716)
anthology-assist Sep 12, 2025
97290a5
Add revision for N13-1121 (closes #6006)
mjpost Sep 15, 2025
e36d335
Add revision for 2025.naacl-long.632
mjpost Sep 15, 2025
f738b74
Merge remote-tracking branch 'origin/master' into corrections-2025-09
anthology-assist Sep 15, 2025
a443758
Add revision for N13-1121 (closes #6006)
anthology-assist Sep 15, 2025
fb48e34
Revert "Merge remote-tracking branch 'origin/master' into corrections…
anthology-assist Sep 15, 2025
d9810e4
Changed canonical name for Felicia Koerner
Azax4 Sep 17, 2025
746c2c1
Revert "Changed canonical name for Felicia Koerner"
Azax4 Sep 17, 2025
9cf60c5
Revert branch to f738b7433e510445ae59a9043886631107e17431
mjpost Sep 17, 2025
07da55e
Replace incorrect file 2025.acl-long.426
mjpost Sep 18, 2025
b4d5e5e
Merge branch 'master' into corrections-2025-09
mjpost Sep 21, 2025
9ee7365
Fix SIGDIAL website
mjpost Sep 21, 2025
b267388
Merge branch 'master' into corrections-2025-09
mjpost Sep 25, 2025
eaa316c
Replace 2025.woah-1.41 with correct PDF (from chair)
mjpost Sep 25, 2025
9099ef7
Merge branch 'master' into corrections-2025-09
mjpost Sep 25, 2025
9b3f956
Add revision for 2023.emnlp-main.796 (closes #5510)
mjpost Sep 25, 2025
a14b740
Added metadata for Felicia Koerner
Azax4 Sep 26, 2025
2 changes: 1 addition & 1 deletion bin/add_revision.py
@@ -248,7 +248,7 @@ def main(args):
     repo.git.add(get_xml_file(args.anthology_id))
     if repo.is_dirty(index=True, working_tree=True, untracked_files=True):
         repo.index.commit(
-            f"Add revision for {args.anthology_id} (closes #{args.issue})"
+            f"Add {change_type} for {args.anthology_id} (closes #{args.issue})"
         )
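The new f-string references change_type, which is defined elsewhere in the script; judging by the commit log above, it distinguishes revisions from errata. A minimal, self-contained sketch of the resulting message format (the helper function and its boolean flag are hypothetical, not the script's actual interface):

# Sketch only: mirrors the commit-message format in the hunk above.
# The real add_revision.py derives these values from its CLI arguments;
# this helper and its signature are illustrative.
def commit_message(anthology_id: str, issue: int, erratum: bool = False) -> str:
    change_type = "erratum" if erratum else "revision"
    return f"Add {change_type} for {anthology_id} (closes #{issue})"

print(commit_message("2024.conll-1.17", 5459))
# Add revision for 2024.conll-1.17 (closes #5459)
print(commit_message("2024.semeval-1.271", 5041, erratum=True))
# Add erratum for 2024.semeval-1.271 (closes #5041)

Both example outputs match commit messages in the log above.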
5 changes: 4 additions & 1 deletion data/xml/2021.tacl.xml
@@ -966,8 +966,11 @@
<doi>10.1162/tacl_a_00419</doi>
<abstract>Despite the progress made in recent years in addressing natural language understanding (NLU) challenges, the majority of this progress remains to be concentrated on resource-rich languages like English. This work focuses on Persian language, one of the widely spoken languages in the world, and yet there are few NLU datasets available for this language. The availability of high-quality evaluation datasets is a necessity for reliable assessment of the progress on different NLU tasks and domains. We introduce ParsiNLU, the first benchmark in Persian language that includes a range of language understanding tasks—reading comprehension, textual entailment, and so on. These datasets are collected in a multitude of ways, often involving manual annotations by native speakers. This results in over 14.5k new instances across 6 distinct NLU tasks. Additionally, we present the first results on state-of-the-art monolingual and multilingual pre-trained language models on this benchmark and compare them with human performance, which provides valuable insights into our ability to tackle natural language understanding challenges in Persian. We hope ParsiNLU fosters further research and advances in Persian language understanding.1</abstract>
<pages>1147–1162</pages>
<url hash="fed0d867">2021.tacl-1.68</url>
<url hash="6da546ec">2021.tacl-1.68</url>
<bibkey>khashabi-etal-2021-parsinlu</bibkey>
<revision id="1" href="2021.tacl-1.68v1" hash="fed0d867"/>
<revision id="2" href="2021.tacl-1.68v2" hash="6da546ec" date="2025-09-09">Fix author name</revision>
<revision id="3" href="2021.tacl-1.68v3" hash="6da546ec" date="2025-09-12">Author info update.</revision>
</paper>
<paper id="69">
<title>What Helps Transformers Recognize Conversational Structure? Importance of Context, Punctuation, and Labels in Dialog Act Recognition</title>
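Each correction in this PR follows the same XML pattern: the <url> hash is bumped to the newest PDF, a <revision id="1"> element preserves the original hash, and each later <revision> records the versioned filename, new hash, date, and a short note. A hedged sketch of reading that history with the standard library (the embedded XML is a trimmed copy of the entry above):

# Sketch: list a paper's revision history with xml.etree.ElementTree.
# The XML below is a trimmed copy of the 2021.tacl-1.68 entry above.
import xml.etree.ElementTree as ET

paper = ET.fromstring("""
<paper id="68">
  <url hash="6da546ec">2021.tacl-1.68</url>
  <revision id="1" href="2021.tacl-1.68v1" hash="fed0d867"/>
  <revision id="2" href="2021.tacl-1.68v2" hash="6da546ec" date="2025-09-09">Fix author name</revision>
  <revision id="3" href="2021.tacl-1.68v3" hash="6da546ec" date="2025-09-12">Author info update.</revision>
</paper>
""")

for rev in paper.iter("revision"):
    note = rev.text or "original version"
    print(f'v{rev.get("id")} ({rev.get("date", "n/a")}): {note} -> {rev.get("href")}')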
3 changes: 2 additions & 1 deletion data/xml/2023.emnlp.xml
@@ -11083,13 +11083,14 @@
<author><first>Yoav</first><last>Tulpan</last></author>
<pages>12883-12895</pages>
<abstract>Online social platforms provide a bustling arena for information-sharing and for multi-party discussions. Various frameworks for dialogic discourse parsing were developed and used for the processing of discussions and for predicting the productivity of a dialogue. However, most of these frameworks are not suitable for the analysis of contentious discussions that are commonplace in many online platforms. A novel multi-label scheme for contentious dialog parsing was recently introduced by Zakharov et al. (2021). While the schema is well developed, the computational approach they provide is both naive and inefficient, as a different model (architecture) using a different representation of the input, is trained for each of the 31 tags in the annotation scheme. Moreover, all their models assume full knowledge of label collocations and context, which is unlikely in any realistic setting. In this work, we present a unified model for Non-Convergent Discourse Parsing that does not require any additional input other than the previous dialog utterances. We fine-tuned a RoBERTa backbone, combining embeddings of the utterance, the context and the labels through GRN layers and an asymmetric loss function. Overall, our model achieves results comparable with SOTA, without using label collocation and without training a unique architecture/model for each label. Our proposed architecture makes the labeling feasible at large scale, promoting the development of tools that deepen our understanding of discourse dynamics.</abstract>
<url hash="e4a384a1">2023.emnlp-main.796</url>
<url hash="e22d2a38">2023.emnlp-main.796</url>
<bibkey>tsur-tulpan-2023-deeper</bibkey>
<doi>10.18653/v1/2023.emnlp-main.796</doi>
<video href="2023.emnlp-main.796.mp4"/>
<revision id="1" href="2023.emnlp-main.796v1" hash="a167848a"/>
<revision id="2" href="2023.emnlp-main.796v2" hash="f2d33ff7" date="2025-03-27">Minor updates.</revision>
<revision id="3" href="2023.emnlp-main.796v3" hash="e4a384a1" date="2025-03-27">The language of the Ethics and Broader Impact was changed upon request from the PEC.</revision>
<revision id="4" href="2023.emnlp-main.796v4" hash="e22d2a38" date="2025-09-25">Modifications requested by PEC.</revision>
</paper>
<paper id="797">
<title>We are Who We Cite: Bridges of Influence Between Natural Language Processing and Other Academic Fields</title>
8 changes: 6 additions & 2 deletions data/xml/2023.findings.xml
@@ -23608,9 +23608,11 @@
<author><first>Heuiseok</first><last>Lim</last></author>
<pages>10334-10343</pages>
<abstract>Large language models (LLMs) have demonstrated impressive capabilities in natural language processing. However, their ability to establish causal relationships, particularly in the context of temporal interventions and language hallucinations, remains challenging. This paper presents <b>CReTIHC</b>, a novel dataset designed to test and enhance the causal reasoning abilities of LLMs. The dataset is constructed using a unique approach that incorporates elements of verbal hallucinations and temporal interventions through the reengineering of existing causal inference datasets. This transformation creates complex scenarios that push LLMs to critically evaluate the information presented and identify cause-and-effect relationships. The CReTIHC dataset serves as a pioneering tool for improving LLM’s causal inference capabilities, paving the way for a more nuanced understanding of causal relationships in natural language processing (NLP) tasks. The whole dataset is publicly accessible at: (https://github.com/ChangwooChun/CReTIHC)</abstract>
<url hash="ce39d040">2023.findings-emnlp.693</url>
<url hash="2d594ded">2023.findings-emnlp.693</url>
<bibkey>chun-etal-2023-cretihc</bibkey>
<doi>10.18653/v1/2023.findings-emnlp.693</doi>
<revision id="1" href="2023.findings-emnlp.693v1" hash="ce39d040"/>
<revision id="2" href="2023.findings-emnlp.693v2" hash="2d594ded" date="2025-09-06">Author info update.</revision>
</paper>
<paper id="694">
<title>On the Dimensionality of Sentence Embeddings</title>
@@ -27814,9 +27816,11 @@
<author><first>Michael</first><last>Elhadad</last></author>
<pages>15164-15172</pages>
<abstract>We call into question the recently popularized method of direct model editing as a means of correcting factual errors in LLM generations. We contrast model editing with three similar but distinct approaches that pursue better defined objectives: (1) retrieval-based architectures, which decouple factual memory from inference and linguistic capabilities embodied in LLMs; (2) concept erasure methods, which aim at preventing systemic bias in generated text; and (3) attribution methods, which aim at grounding generations into identified textual sources. We argue that direct model editing cannot be trusted as a systematic remedy for the disadvantages inherent to LLMs, and while it has proven potential in improving model explainability, it opens risks by reinforcing the notion that models can be trusted for factuality. We call for cautious promotion and application of model editing as part of the LLM deployment process, and for responsibly limiting the use cases of LLMs to those not relying on editing as a critical component.</abstract>
<url hash="10937ad7">2023.findings-emnlp.1012</url>
<url hash="51898b8f">2023.findings-emnlp.1012</url>
<bibkey>pinter-elhadad-2023-emptying</bibkey>
<doi>10.18653/v1/2023.findings-emnlp.1012</doi>
<revision id="1" href="2023.findings-emnlp.1012v1" hash="10937ad7"/>
<revision id="2" href="2023.findings-emnlp.1012v2" hash="51898b8f" date="2025-09-07">Updates.</revision>
</paper>
<paper id="1013">
<title>A Causal View of Entity Bias in (Large) Language Models</title>
4 changes: 3 additions & 1 deletion data/xml/2024.argmining.xml
@@ -174,9 +174,11 @@
<author><first>Iryna</first><last>Gurevych</last></author>
<pages>130-149</pages>
<abstract>Argument retrieval is the task of finding relevant arguments for a given query. While existing approaches rely solely on the semantic alignment of queries and arguments, this first shared task on perspective argument retrieval incorporates perspectives during retrieval, accounting for latent influences in argumentation. We present a novel multilingual dataset covering demographic and socio-cultural (socio) variables, such as age, gender, and political attitude, representing minority and majority groups in society. We distinguish between three scenarios to explore how retrieval systems consider explicitly (in both query and corpus) and implicitly (only in query) formulated perspectives. This paper provides an overview of this shared task and summarizes the results of the six submitted systems. We find substantial challenges in incorporating perspectivism, especially when aiming for personalization based solely on the text of arguments without explicitly providing socio profiles. Moreover, retrieval systems tend to be biased towards the majority group but partially mitigate bias for the female gender. While we bootstrap perspective argument retrieval, further research is essential to optimize retrieval systems to facilitate personalization and reduce polarization.</abstract>
<url hash="5f27538b">2024.argmining-1.14</url>
<url hash="7f9ba824">2024.argmining-1.14</url>
<bibkey>falk-etal-2024-overview</bibkey>
<doi>10.18653/v1/2024.argmining-1.14</doi>
<revision id="1" href="2024.argmining-1.14v1" hash="5f27538b"/>
<revision id="2" href="2024.argmining-1.14v2" hash="7f9ba824" date="2025-09-05">Corrected a typo.</revision>
</paper>
<paper id="15">
<title>Sövereign at The Perspective Argument Retrieval Shared Task 2024: Using <fixed-case>LLM</fixed-case>s with Argument Mining</title>
4 changes: 3 additions & 1 deletion data/xml/2024.conll.xml
@@ -209,9 +209,11 @@
<author><first>Yevgeni</first><last>Berzak</last></author>
<pages>219-230</pages>
<abstract>The effect of surprisal on processing difficulty has been a central topic of investigation in psycholinguistics. Here, we use eyetracking data to examine three language processing regimes that are common in daily life but have not been addressed with respect to this question: information seeking, repeated processing, and the combination of the two. Using standard regime-agnostic surprisal estimates we find that the prediction of surprisal theory regarding the presence of a linear effect of surprisal on processing times, extends to these regimes. However, when using surprisal estimates from regime-specific contexts that match the contexts and tasks given to humans, we find that in information seeking, such estimates do not improve the predictive power of processing times compared to standard surprisals. Further, regime-specific contexts yield near zero surprisal estimates with no predictive power for processing times in repeated reading. These findings point to misalignments of task and memory representations between humans and current language models, and question the extent to which such models can be used for estimating cognitively relevant quantities. We further discuss theoretical challenges posed by these results.</abstract>
<url hash="edbdb721">2024.conll-1.17</url>
<url hash="53bc791b">2024.conll-1.17</url>
<bibkey>klein-etal-2024-effect</bibkey>
<doi>10.18653/v1/2024.conll-1.17</doi>
<revision id="1" href="2024.conll-1.17v1" hash="edbdb721"/>
<revision id="2" href="2024.conll-1.17v2" hash="53bc791b" date="2025-09-02">The current PDF is missing the SM (supplementary materials). We provide here the right file that includes the SM.</revision>
</paper>
<paper id="18">
<title>Revisiting Hierarchical Text Classification: Inference and Metrics</title>
4 changes: 3 additions & 1 deletion data/xml/2024.emnlp.xml
@@ -10584,10 +10584,12 @@
<author><first>David A.</first><last>Clifton</last><affiliation>University of Oxford</affiliation></author>
<pages>13696-13710</pages>
<abstract>The adoption of large language models (LLMs) to assist clinicians has attracted remarkable attention. Existing works mainly adopt the close-ended question-answering (QA) task with answer options for evaluation. However, many clinical decisions involve answering open-ended questions without pre-set options. To better understand LLMs in the clinic, we construct a benchmark ClinicBench. We first collect eleven existing datasets covering diverse clinical language generation, understanding, and reasoning tasks. Furthermore, we construct six novel datasets and clinical tasks that are complex but common in real-world practice, e.g., open-ended decision-making, long document processing, and emerging drug analysis. We conduct an extensive evaluation of twenty-two LLMs under both zero-shot and few-shot settings. Finally, we invite medical experts to evaluate the clinical usefulness of LLMs</abstract>
<url hash="e0a4aaac">2024.emnlp-main.759</url>
<url hash="70eb450d">2024.emnlp-main.759</url>
<attachment type="data" hash="08c7b763">2024.emnlp-main.759.data.zip</attachment>
<bibkey>liu-etal-2024-large</bibkey>
<doi>10.18653/v1/2024.emnlp-main.759</doi>
<revision id="1" href="2024.emnlp-main.759v1" hash="e0a4aaac"/>
<revision id="2" href="2024.emnlp-main.759v2" hash="70eb450d" date="2025-09-11">Amend the wording of the funding acknowledgement.</revision>
</paper>
<paper id="760">
<title>Holistic Automated Red Teaming for Large Language Models through Top-Down Test Case Generation and Multi-turn Interaction</title>
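One invariant holds across every revision hunk above: whenever a new revision is added, the <url> hash is updated to match the hash of the highest-numbered <revision>. A small consistency check one could run over such entries (a sketch, not part of the Anthology's actual validation tooling):

# Sketch of the invariant visible in the hunks above: the <url> hash
# should equal the hash of the highest-numbered <revision>. Illustrative only.
import xml.etree.ElementTree as ET

def url_matches_latest_revision(paper: ET.Element) -> bool:
    url = paper.find("url")
    revisions = paper.findall("revision")
    if url is None or not revisions:
        return True  # nothing to check
    latest = max(revisions, key=lambda r: int(r.get("id", "0")))
    return url.get("hash") == latest.get("hash")

entry = ET.fromstring(
    '<paper id="759">'
    '<url hash="70eb450d">2024.emnlp-main.759</url>'
    '<revision id="1" href="2024.emnlp-main.759v1" hash="e0a4aaac"/>'
    '<revision id="2" href="2024.emnlp-main.759v2" hash="70eb450d"/>'
    '</paper>'
)
assert url_matches_latest_revision(entry)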
1 change: 1 addition & 0 deletions data/xml/2024.parlaclarin.xml
@@ -108,6 +108,7 @@
<url hash="23516dd9">2024.parlaclarin-1.9</url>
<attachment type="OptionalSupplementaryMaterial" hash="1e7ba012">2024.parlaclarin-1.9.OptionalSupplementaryMaterial.docx</attachment>
<bibkey>menzel-2024-exploring</bibkey>
<erratum id="1" hash="cd11c97a" date="2025-09-02">2024.parlaclarin-1.9e1</erratum>
</paper>
<paper id="10">
<title>Quantitative Analysis of Editing in Transcription Process in <fixed-case>J</fixed-case>apanese and <fixed-case>E</fixed-case>uropean Parliaments and its Diachronic Changes</title>
1 change: 1 addition & 0 deletions data/xml/2024.semeval.xml
@@ -3652,6 +3652,7 @@
<bibkey>jullien-etal-2024-semeval</bibkey>
<doi>10.18653/v1/2024.semeval-1.271</doi>
<video href="2024.semeval-1.271.mp4"/>
<erratum id="1" hash="af05a452" date="2025-09-02">2024.semeval-1.271e1</erratum>
</paper>
<paper id="272">
<title><fixed-case>S</fixed-case>em<fixed-case>E</fixed-case>val Task 1: Semantic Textual Relatedness for <fixed-case>A</fixed-case>frican and <fixed-case>A</fixed-case>sian Languages</title>
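The two erratum hunks use a slightly different element than revisions: the erratum PDF's filename stem is the element text, with only the hash and date as attributes. A sketch of extracting them (same hedges as above):

# Sketch: errata store the corrected-PDF stem as element text, unlike
# revisions, which use an href attribute. Illustrative only.
import xml.etree.ElementTree as ET

entry = ET.fromstring(
    '<paper id="271">'
    '<erratum id="1" hash="af05a452" date="2025-09-02">2024.semeval-1.271e1</erratum>'
    '</paper>'
)
for err in entry.iter("erratum"):
    print(f'erratum {err.get("id")} ({err.get("date")}): {err.text}')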