Zbmath author strings #226

Merged
LizzAlice merged 3 commits into main from zbmath_author_strings
Apr 10, 2026

Conversation

Contributor

@LizzAlice LizzAlice commented Apr 10, 2026

implement author name string handling for zbmath

Summary by CodeRabbit

  • New Features

    • Publications now store explicit author name strings so names are preserved when author identifiers are missing, improving data completeness and discoverability.
  • Refactor

    • Improved author parsing and creation flow with more robust handling of missing or conflicting author identifiers, reducing missed author associations.

@LizzAlice LizzAlice requested a review from eloiferrer April 10, 2026 13:01

coderabbitai bot commented Apr 10, 2026

Walkthrough

Constructor and import logic were extended to collect and propagate author name strings: ZBMathSource now builds parallel authors and author_name_strings lists, and ZBMathPublication accepts author_name_strings and inserts them as P43 claims after existing author (P50) claims.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Publication: P43 insertion (mardi_importer/mardi_importer/zbmath/ZBMathPublication.py) | Constructor now accepts author_name_strings; insert_claims iterates author_name_strings after adding P50 author claims and adds P43 claims for those name strings. |
| Source: author parsing & propagation (mardi_importer/mardi_importer/zbmath/ZBMathSource.py) | Refactored author parsing to always split author_ids/author_strings, normalize entries, build authors and author_name_strings in parallel, create or reuse author items when IDs exist, and pass author_name_strings to ZBMathPublication. |
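
The split described in the walkthrough can be sketched roughly as follows. This is an illustration only, not the code in ZBMathSource.py: the function name split_authors, the ";" separator, and the sample identifiers are all assumptions made for the example. Entries that carry an ID go to the list that would feed P50 claims, and entries with only a name go to the list that would feed P43 claims.

```python
from typing import List, Tuple


def split_authors(
    author_strings: str, author_ids: str, sep: str = ";"
) -> Tuple[List[str], List[str]]:
    """Split two delimiter-separated fields into linked IDs and name-only strings.

    Hypothetical helper: the separator and field layout are assumptions.
    """
    names = [part.strip() for part in author_strings.split(sep)]
    ids = [part.strip() for part in author_ids.split(sep)]

    linked_ids: List[str] = []   # would become P50 (author item) claims
    name_only: List[str] = []    # would become P43 (author name string) claims

    for name, author_id in zip(names, ids):
        if author_id:
            linked_ids.append(author_id)
        elif name:
            name_only.append(name)
        # rows with neither a name nor an ID are dropped
    return linked_ids, name_only


linked, unlinked = split_authors("Noether, Emmy; Doe, Jane", "noether.emmy; ")
print(linked, unlinked)
```

Note that zip stops at the shorter list, so a record whose two fields have different lengths would silently lose trailing authors; a real implementation may want to detect that case explicitly.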

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Suggested reviewers

  • eloiferrer

Poem

🐰 I nibbled through lists of names so bright,

When IDs hid, I kept their letters tight,
P50s first, then P43s hop in line,
A tidy record, every author fine. 🥕

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title "Zbmath author strings" directly relates to the main objective of implementing author name string handling for zbmath, as evidenced by the PR description and the changes that add author_name_strings argument to ZBMathPublication and refactor author parsing in ZBMathSource. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
mardi_importer/mardi_importer/zbmath/ZBMathSource.py (1)

743-747: ⚠️ Potential issue | 🟠 Major

arXiv records still bypass the new author-string handling.

Passing author_name_strings into ZBMathPublication only helps the non-arXiv path. From Lines 763-799, arXiv records skip publication.create() / publication.update(), and both the existing-item patch logic and create_arxiv_item() still mirror only publication.authors. Unresolved authors on zbMath arXiv papers will therefore still disappear.

🧩 One way to keep the arXiv path in sync
         if publication.authors:
             author_claims = []
             for author in publication.authors:
                 claim = self.api.get_claim("wdt:P50", author)
                 author_claims.append(claim)
             item.add_claims(author_claims)
+        if publication.author_name_strings:
+            author_string_claims = []
+            for author_string in publication.author_name_strings:
+                claim = self.api.get_claim("P43", author_string)
+                author_string_claims.append(claim)
+            item.add_claims(author_string_claims)

Apply the same P43 handling in the existing-arXiv-item enrichment branch as well.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@mardi_importer/mardi_importer/zbmath/ZBMathSource.py` around lines 743 - 747,
The arXiv branch currently skips using the new author-name-strings flow so
unresolved authors are lost; update the arXiv enrichment path to apply the same
P43/author_name_strings handling as the non-arXiv path by passing
author_name_strings into ZBMathPublication and using publication.create() /
publication.update() logic (or equivalent patch logic) when enriching existing
arXiv items, and ensure create_arxiv_item() and the "existing-arXiv-item" patch
branch use publication.author_name_strings (not only publication.authors) when
building the P43 patch so unresolved author strings are preserved.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@mardi_importer/mardi_importer/zbmath/ZBMathSource.py`:
- Around line 536-546: The loop skips/ignores entries where author_ids contains
the literal string "None" because the guard checks only falsy values; normalize
author_ids before the guards (or adjust the checks) so that any "None" string is
treated as missing. Specifically, in the code handling author_strings/author_ids
within ZBMathSource (referencing author_strings, author_ids, author_name_strings
and process_data), map author_ids entries of "None" to None or empty string
(e.g., author_ids = [None if x == "None" else x for x in author_ids]) so the
existing guards like "if not a and not a_id" and "if a and not a_id" behave
correctly and the fallback branch runs for mixed rows; apply the same
normalization where the same pattern appears later (the second occurrence around
the other author loop).
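
For illustration, the normalization suggested in this comment might look like the sketch below. The helper name normalize_author_ids is hypothetical, and treating blank strings as missing is an extra assumption beyond what the comment asks for. The point is that the non-empty string "None" is truthy in Python, so a guard such as "if not a_id" never fires for it unless the value is mapped to a real None first.

```python
from typing import List, Optional, Sequence


def normalize_author_ids(author_ids: Sequence[Optional[str]]) -> List[Optional[str]]:
    """Map the literal string "None" (and blank entries) to a real None.

    Hypothetical helper; the blank-string handling is an assumption.
    """
    cleaned: List[Optional[str]] = []
    for raw in author_ids:
        value = raw.strip() if isinstance(raw, str) else raw
        cleaned.append(None if value in ("", "None", None) else value)
    return cleaned


# bool("None") is True, which is why the unnormalized guard misses these rows.
ids = normalize_author_ids(["euler.leonhard", "None", "  ", None])
print(ids)
```

Running the normalization once, before the loop, keeps the existing guards unchanged and covers both loops that share the pattern.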

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 3c1533e7-b9ea-4f05-bd38-64024e7c44d2

📥 Commits

Reviewing files that changed from the base of the PR and between 5524879 and 2180371.

📒 Files selected for processing (2)
  • mardi_importer/mardi_importer/zbmath/ZBMathPublication.py
  • mardi_importer/mardi_importer/zbmath/ZBMathSource.py

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
mardi_importer/mardi_importer/zbmath/ZBMathSource.py (1)

729-733: ⚠️ Potential issue | 🟠 Major

Name-only authors are still skipped for arXiv records.

author_name_strings only helps on the normal ZBMathPublication.create()/update() path. When publication.is_arxiv() is true, this method bypasses that path and the later arXiv update/create branches only write publication.authors, so authors without zbMATH ids are still dropped there.
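
One possible shape for a shared helper that both the regular and the arXiv branches could call is sketched below. It is an assumption-laden illustration: build_author_claims and fake_get_claim are hypothetical names, and the get_claim(property, value) call shape and the "wdt:P50" / "P43" property strings are copied from the suggestion in the earlier review rather than verified against the importer's API.

```python
from typing import Any, Callable, List, Optional, Sequence


def build_author_claims(
    get_claim: Callable[[str, str], Any],
    authors: Optional[Sequence[str]],
    author_name_strings: Optional[Sequence[str]],
) -> List[Any]:
    """Build P50 claims for linked authors, then P43 claims for name-only authors.

    Hypothetical helper; get_claim stands in for self.api.get_claim.
    """
    claims = [get_claim("wdt:P50", author) for author in (authors or [])]
    claims += [get_claim("P43", name) for name in (author_name_strings or [])]
    return claims


def fake_get_claim(prop: str, value: str) -> tuple:
    """Stand-in for the real API so the sketch runs on its own."""
    return (prop, value)


claims = build_author_claims(fake_get_claim, ["Q1"], ["Doe, Jane"])
print(claims)
```

Routing every branch through one helper of this kind would mean a future change to author handling cannot be applied to the normal path while the arXiv path is missed.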

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@mardi_importer/mardi_importer/zbmath/ZBMathSource.py` around lines 729 - 733,
The arXiv branch drops name-only authors because it only writes
publication.authors and ignores author_name_strings; update the arXiv
create/update logic to preserve name-only authors by either (a) populating
publication.authors with entries created from author_name_strings when ZBMath
ids are missing, or (b) ensuring the arXiv path reads and uses
author_name_strings alongside publication.authors before saving; change the
logic around ZBMathPublication (check is_arxiv(), the arXiv update/create
branches) so name-only authors are included in the final saved author list.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@mardi_importer/mardi_importer/zbmath/ZBMathSource.py`:
- Around line 527-535: The current loop splits linked authors into `authors` and
name-only authors into `author_name_strings`, losing original order; change the
logic in the loop that iterates over `zip(author_strings, author_ids)` to
produce one ordered author sequence (e.g. append entries to a single list like
`ordered_authors`) where each entry preserves position and contains either an
`id` field or a `name` field (and include an `ordinal` index from enumerate to
record original position). Then update downstream usage to consume
`ordered_authors` (or emit both linked-author and name-string claims with the
same `ordinal`) so the original author order can always be reconstructed; key
symbols to edit: the loop using `author_strings`, `author_ids`, and the lists
`authors` / `author_name_strings`.
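
A minimal sketch of the single ordered sequence proposed above, assuming a small record type is acceptable. AuthorEntry and build_ordered_authors are hypothetical names introduced for the example, and the 1-based ordinal is a choice made here, not something the importer defines.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class AuthorEntry:
    """One author position: either a linked ID or a bare name string."""
    ordinal: int
    author_id: Optional[str] = None
    name: Optional[str] = None


def build_ordered_authors(
    author_strings: Sequence[Optional[str]],
    author_ids: Sequence[Optional[str]],
) -> List[AuthorEntry]:
    """Keep linked and name-only authors in one list, preserving position."""
    entries: List[AuthorEntry] = []
    pairs = zip(author_strings, author_ids)
    for ordinal, (name, author_id) in enumerate(pairs, start=1):
        if author_id:
            entries.append(AuthorEntry(ordinal, author_id=author_id))
        elif name:
            entries.append(AuthorEntry(ordinal, name=name))
        # rows with neither value are skipped, but still consume an ordinal
    return entries


entries = build_ordered_authors(["A, X", "B, Y", "C, Z"], ["a.x", None, "c.z"])
print([(e.ordinal, e.author_id or e.name) for e in entries])
```

Downstream code could then emit a linked-author or name-string claim per entry and attach the ordinal, so the original order can be reconstructed even when the two claim types are stored separately.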

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 66e52ccd-6114-4908-8cf5-2c04ef412524

📥 Commits

Reviewing files that changed from the base of the PR and between 2180371 and a9c964b.

📒 Files selected for processing (1)
  • mardi_importer/mardi_importer/zbmath/ZBMathSource.py

@LizzAlice LizzAlice merged commit ea0593e into main Apr 10, 2026
2 checks passed
@LizzAlice LizzAlice deleted the zbmath_author_strings branch April 10, 2026 13:44