mbollmann
Member

This commit adds two main features:

1. Simplified usage of MarkupText

Before:

paper.title = MarkupText.from_latex_maybe("Towards GPT-$^{\\infty}$")

After:

paper.title = "Towards GPT-$^{\\infty}$"

Instead of requiring that markup text is always instantiated explicitly via a builder function like MarkupText.from_string(...), it can now simply be set as a string, in which case it will be parsed via MarkupText.from_latex_maybe(...) automatically. This should greatly simplify the API for the most common use case.
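
To illustrate the idea, here is a minimal sketch of a setter that converts plain strings automatically. It is not the library's implementation: `ToyMarkupText` and `ToyPaper` are hypothetical stand-ins, and the toy `from_latex_maybe()` only wraps the string instead of parsing any markup.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyMarkupText:
    """Hypothetical stand-in for MarkupText; it only stores the raw source."""

    source: str

    @classmethod
    def from_latex_maybe(cls, text: str) -> "ToyMarkupText":
        # The real method parses markup; this toy version just wraps the string.
        return cls(text)


class ToyPaper:
    """Hypothetical paper whose title accepts either str or ToyMarkupText."""

    def __init__(self, title):
        self.title = title  # goes through the property setter below

    @property
    def title(self) -> ToyMarkupText:
        return self._title

    @title.setter
    def title(self, value) -> None:
        # Plain strings are converted on assignment; markup objects pass through.
        if isinstance(value, str):
            value = ToyMarkupText.from_latex_maybe(value)
        self._title = value


paper = ToyPaper("Old title")
paper.title = "Towards GPT-$^{\\infty}$"
print(type(paper.title).__name__)
```

Either way of assigning ends up with the same kind of object on the attribute, so code that reads `paper.title` does not need to care how it was set.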

2. Added Anthology.save_all()

This didn't exist before because writing all XML files takes quite some time. This PR adds a hook to classes that automatically sets Collection.is_modified = True whenever an attribute on a collection/volume/paper/event is changed. Consequently, there's now an Anthology.save_all() that only saves those XML files that have actually been modified.
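
The mechanism described above can be sketched roughly as follows. This is a toy model under stated assumptions, not the code in this PR: `ToyCollection`, `TrackedPaper`, and the free function `save_all()` are hypothetical names, and "saving" just increments a counter instead of writing XML.

```python
class ToyCollection:
    """Hypothetical collection that remembers whether it needs saving."""

    def __init__(self, name):
        self.name = name
        self.is_modified = False
        self.save_count = 0

    def save(self):
        # Stand-in for writing the XML file.
        self.save_count += 1
        self.is_modified = False


class TrackedPaper:
    """Hypothetical paper that flags its parent collection on every change."""

    def __init__(self, parent, title):
        # Bypass the hook during construction so that loading is not a change.
        object.__setattr__(self, "parent", parent)
        object.__setattr__(self, "title", title)

    def __setattr__(self, name, value):
        object.__setattr__(self, name, value)
        self.parent.is_modified = True


def save_all(collections):
    """Save only the collections that were flagged; return their names."""
    saved = []
    for collection in collections:
        if collection.is_modified:
            collection.save()
            saved.append(collection.name)
    return saved


a = ToyCollection("2024.acl")
b = ToyCollection("2025.acl")
paper = TrackedPaper(a, "Old title")
TrackedPaper(b, "Untouched title")

paper.title = "New title"
saved = save_all([a, b])
print(saved)  # ['2024.acl']
```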

Minor changes/additions

  • New venues can now be created via Anthology.venues.create().
  • Renamed EventLinkingType to EventLink, in analogy to NameLink.
  • Added a few basic string methods to MarkupText, e.g. you can now do if "ACL Anthology" in paper.title without having to explicitly convert it to a string first.
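
For readers unfamiliar with how the `in` check can work on a non-string object, here is a small sketch. `ToyMarkupText` is a hypothetical class, and delegating to `str(self)` is an assumption for illustration; the actual methods added to MarkupText may be implemented differently.

```python
class ToyMarkupText:
    """Hypothetical markup wrapper that supports membership tests."""

    def __init__(self, text: str):
        self._text = text

    def __str__(self) -> str:
        return self._text

    def __contains__(self, item: str) -> bool:
        # Delegate the membership test to the plain-text form.
        return item in str(self)


title = ToyMarkupText("A Study of the ACL Anthology")
if "ACL Anthology" in title:
    print("mentions the Anthology")
```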

@mbollmann mbollmann requested a review from mjpost August 20, 2025 11:42
@mbollmann mbollmann self-assigned this Aug 20, 2025
@mbollmann mbollmann added the python-library Concerning the acl-anthology-py library label Aug 20, 2025

codecov bot commented Aug 20, 2025

Codecov Report

❌ Patch coverage is 97.59036% with 4 lines in your changes missing coverage. Please review.
✅ Project coverage is 95.40%. Comparing base (d4c7d10) to head (e9eface).
⚠️ Report is 13 commits behind head on python-dev.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| python/acl_anthology/people/person.py | 93.18% | 3 Missing ⚠️ |
| python/acl_anthology/people/index.py | 66.66% | 1 Missing ⚠️ |

Additional details and impacted files
@@              Coverage Diff               @@
##           python-dev    #5799      +/-   ##
==============================================
+ Coverage       95.28%   95.40%   +0.12%     
==============================================
  Files              35       35              
  Lines            3031     3134     +103     
==============================================
+ Hits             2888     2990     +102     
- Misses            143      144       +1     
| Files with missing lines | Coverage Δ |
| --- | --- |
| python/acl_anthology/anthology.py | 89.88% <100.00%> (+0.50%) ⬆️ |
| python/acl_anthology/collections/__init__.py | 100.00% <100.00%> (ø) |
| python/acl_anthology/collections/collection.py | 98.47% <100.00%> (+0.04%) ⬆️ |
| python/acl_anthology/collections/event.py | 99.23% <100.00%> (+0.03%) ⬆️ |
| python/acl_anthology/collections/eventindex.py | 92.95% <100.00%> (ø) |
| python/acl_anthology/collections/index.py | 100.00% <100.00%> (ø) |
| python/acl_anthology/collections/paper.py | 92.81% <100.00%> (+0.12%) ⬆️ |
| python/acl_anthology/collections/types.py | 100.00% <100.00%> (ø) |
| python/acl_anthology/collections/volume.py | 96.59% <100.00%> (+0.14%) ⬆️ |
| python/acl_anthology/constants.py | 100.00% <100.00%> (ø) |

... and 7 more

@mjpost
Member

mjpost commented Aug 20, 2025

This is potentially timely. I am trying to do the following:

  • ✓ Load the Anthology
  • ✓ Search for an author
  • ✓ Get all their papers
  • ✓ Find the author among the paper's authors
  • ✓ Set the author ID tag on that NameSpecification
  • ??? Write the Anthology item to disk

Will this work? WIP script here.

@mbollmann
Member Author

mbollmann commented Aug 21, 2025

@mjpost Changes made only to NameSpecifications aren't detected, so save_all() wouldn't do anything yet in your WIP script. If the person didn't have an explicit ID at all yet (your script assumes you have just created one manually), you don't actually need to do that and can just use Person.make_explicit() instead. Now that I think of it, it would probably be nice to have a similar function that joins an inferred person with an already-existing explicit one; that doesn't exist yet but should be easy to add.
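
One plausible reason for this, sketched with toy classes: a hook on attribute assignment only fires when an attribute of the tracked object itself is set, not when a nested object is mutated. Everything below (`ToyCollection`, `ToyNameSpec`, `TrackedPaper`) is hypothetical, and the thread does not confirm that the library behaves exactly like this.

```python
class ToyCollection:
    """Hypothetical collection with a modification flag."""

    def __init__(self):
        self.is_modified = False


class ToyNameSpec:
    """Hypothetical name specification: a plain object with no tracking hook."""

    def __init__(self, name, id=None):
        self.name = name
        self.id = id


class TrackedPaper:
    """Hypothetical paper that flags its parent when one of its attributes is set."""

    def __init__(self, parent, authors):
        object.__setattr__(self, "parent", parent)
        object.__setattr__(self, "authors", authors)

    def __setattr__(self, name, value):
        object.__setattr__(self, name, value)
        self.parent.is_modified = True


collection = ToyCollection()
paper = TrackedPaper(collection, [ToyNameSpec("Jane Doe")])

# Mutating the nested object never calls TrackedPaper.__setattr__.
paper.authors[0].id = "jane-doe"
flag_after_nested_change = collection.is_modified  # False

# Assigning to an attribute of the paper itself does trigger the hook.
paper.authors = list(paper.authors)
flag_after_reassignment = collection.is_modified  # True
```

The reassignment at the end only demonstrates how the toy hook behaves; it is not a recommended workaround for the real library.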

@mjpost
Member

mjpost commented Aug 21, 2025

I think Person.make_explicit() is in the pending Python library, though, and not in 0.52?

@mbollmann
Member Author

> I think Person.make_explicit() is in the pending Python library, though, and not in 0.52?

Just like the changes that are in this PR.

@mbollmann
Member Author

Okay, I pushed one more change here that refactors the code I already had (for make_explicit()) to be more flexible, so I could add merge_with_explicit() as well.

@mbollmann
Member Author

mbollmann commented Aug 21, 2025

@mjpost You can already call collection.save() on individual collections with 0.5.3, though; there's no need to write XML files manually, even before the new author system. EDIT: What's new in this PR is the convenience function and the semi-automatic detection of which collections need saving, not the saving mechanism itself. You already reviewed that one about a month ago, and it is in 0.5.3 and on master. Sorry for the misunderstanding!

Also:

        # find the person with the non-explicit ID
        for person in people:
            if not person.is_explicit:
                break


        if not person:

I don't think this does what you think it does: if all people are explicit, person will be the last item of people, not None.
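
A small sketch of the pitfall and one possible fix, using a hypothetical `ToyPerson` class in place of the library's Person. The helper `first_non_explicit()` is likewise an illustrative name, not part of the library.

```python
from dataclasses import dataclass


@dataclass
class ToyPerson:
    """Hypothetical stand-in with only the fields needed here."""

    id: str
    is_explicit: bool


def first_non_explicit(people):
    """Return the first non-explicit person, or None if there is none."""
    return next((p for p in people if not p.is_explicit), None)


people = [ToyPerson("a", True), ToyPerson("b", True)]

# Pitfall: the loop variable keeps its last value when no break happens.
for person in people:
    if not person.is_explicit:
        break
leaked = person  # the last item of people, not None

# Fix: ask for the first match and fall back to None explicitly.
found = first_non_explicit(people)
print(leaked.id, found)  # b None
```

A `for ... else` clause would work as well; `next()` with a default simply keeps the lookup in a single expression.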

@mjpost mjpost mentioned this pull request Sep 10, 2025