Zbmath author strings by LizzAlice · Pull Request #226 · MaRDI4NFDI/docker-importer

LizzAlice · 2026-04-10T13:00:55Z

Implement author name string handling for zbMATH.

Summary by CodeRabbit

New Features
- Publications now store explicit author name strings so names are preserved when author identifiers are missing, improving data completeness and discoverability.

Refactor
- Improved author parsing and creation flow with more robust handling of missing or conflicting author identifiers, reducing missed author associations.

coderabbitai · 2026-04-10T13:01:06Z

📝 Walkthrough

Constructor and import logic were extended to collect and propagate author name strings: ZBMathSource now builds parallel authors and author_name_strings lists, and ZBMathPublication accepts author_name_strings and inserts them as P43 claims after existing author (P50) claims.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Publication: P43 insertion (`mardi_importer/mardi_importer/zbmath/ZBMathPublication.py`) | Constructor now accepts `author_name_strings`; `insert_claims` iterates `author_name_strings` after adding P50 author claims and adds P43 claims for those name strings. |
| Source: author parsing & propagation (`mardi_importer/mardi_importer/zbmath/ZBMathSource.py`) | Refactored author parsing to always split `author_ids`/`author_strings`, normalize entries, build `authors` and `author_name_strings` in parallel, create or reuse author items when IDs exist, and pass `author_name_strings` to `ZBMathPublication`. |
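
To make the flow summarized above more concrete, here is a minimal sketch of how two delimited author fields could be split into linked authors and name-only strings. It is not the PR's code: the function name `split_authors`, the `;` delimiter, the `MISSING` placeholder set, and the use of plain ID strings as stand-ins for author items are all assumptions made for illustration.

```python
# Hypothetical sketch, not the importer's actual implementation.
MISSING = {"", "None"}  # assumed placeholders for "no identifier"


def split_authors(author_strings: str, author_ids: str, sep: str = ";"):
    """Return (authors, author_name_strings) from two delimited fields.

    Authors with an identifier go to `authors` (here just the ID string,
    standing in for a created or reused author item); authors without one
    are kept as plain name strings.
    """
    names = [s.strip() for s in author_strings.split(sep)]
    ids = [s.strip() for s in author_ids.split(sep)]

    authors = []
    author_name_strings = []
    for name, a_id in zip(names, ids):
        if a_id not in MISSING:
            authors.append(a_id)              # would become a P50 claim
        elif name not in MISSING:
            author_name_strings.append(name)  # would become a P43 claim
        # entries with neither a usable name nor an ID are skipped
    return authors, author_name_strings


authors, names = split_authors("Doe, Jane; Roe, Richard", "doe.jane; None")
print(authors, names)
```

In this sketch the second author has no identifier, so the name survives in `author_name_strings` instead of being dropped.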

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Suggested reviewers

eloiferrer

Poem

🐰 I nibbled through lists of names so bright,

When IDs hid, I kept their letters tight,
P50s first, then P43s hop in line,
A tidy record, every author fine. 🥕

🚥 Pre-merge checks | ✅ 3

✅ Passed checks (3 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title "Zbmath author strings" directly relates to the main objective of implementing author name string handling for zbmath, as evidenced by the PR description and the changes that add author_name_strings argument to ZBMathPublication and refactor author parsing in ZBMathSource. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |

coderabbitai

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

mardi_importer/mardi_importer/zbmath/ZBMathSource.py (1)

743-747: ⚠️ Potential issue | 🟠 Major

arXiv records still bypass the new author-string handling.

Passing `author_name_strings` into `ZBMathPublication` only helps the non-arXiv path. In lines 763-799, arXiv records skip `publication.create()` / `publication.update()`, and both the existing-item patch logic and `create_arxiv_item()` still mirror only `publication.authors`. Unresolved authors on zbMATH arXiv papers will therefore still disappear.

🧩 One way to keep the arXiv path in sync

         if publication.authors:
             author_claims = []
             for author in publication.authors:
                 claim = self.api.get_claim("wdt:P50", author)
                 author_claims.append(claim)
             item.add_claims(author_claims)
+        if publication.author_name_strings:
+            author_string_claims = []
+            for author_string in publication.author_name_strings:
+                claim = self.api.get_claim("P43", author_string)
+                author_string_claims.append(claim)
+            item.add_claims(author_string_claims)

Apply the same P43 handling in the existing-arXiv-item enrichment branch as well.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@mardi_importer/mardi_importer/zbmath/ZBMathSource.py` around lines 743 - 747,
The arXiv branch currently skips using the new author-name-strings flow so
unresolved authors are lost; update the arXiv enrichment path to apply the same
P43/author_name_strings handling as the non-arXiv path by passing
author_name_strings into ZBMathPublication and using publication.create() /
publication.update() logic (or equivalent patch logic) when enriching existing
arXiv items, and ensure create_arxiv_item() and the "existing-arXiv-item" patch
branch use publication.author_name_strings (not only publication.authors) when
building the P43 patch so unresolved author strings are preserved.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@mardi_importer/mardi_importer/zbmath/ZBMathSource.py`:
- Around line 536-546: The loop skips/ignores entries where author_ids contains
the literal string "None" because the guard checks only falsy values; normalize
author_ids before the guards (or adjust the checks) so that any "None" string is
treated as missing. Specifically, in the code handling author_strings/author_ids
within ZBMathSource (referencing author_strings, author_ids, author_name_strings
and process_data), map author_ids entries of "None" to None or empty string
(e.g., author_ids = [None if x == "None" else x for x in author_ids]) so the
existing guards like "if not a and not a_id" and "if a and not a_id" behave
correctly and the fallback branch runs for mixed rows; apply the same
normalization where the same pattern appears later (the second occurrence around
the other author loop).
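
The pitfall described in this inline comment can be reproduced in isolation. The sketch below is an assumption-laden illustration: `normalize_ids` and `classify` are hypothetical helper names, and only the two guard conditions and the list comprehension are taken from the comment itself.

```python
# Hypothetical helpers illustrating the "None" string pitfall.

def normalize_ids(author_ids):
    """Map the literal string "None" to a real None, as the comment suggests."""
    return [None if x == "None" else x for x in author_ids]


def classify(a, a_id):
    """Apply the guards quoted in the review comment to one (name, id) pair."""
    if not a and not a_id:
        return "skip"        # nothing usable in this row
    if a and not a_id:
        return "name-only"   # fallback branch: keep the name string
    return "linked"          # an identifier is (apparently) present


raw_ids = ["doe.jane", "None"]

# The string "None" is truthy, so the fallback branch is never reached.
print(classify("Roe, Richard", raw_ids[1]))

# After normalization the same row takes the name-only branch.
print(classify("Roe, Richard", normalize_ids(raw_ids)[1]))
```

Normalizing once, before the loop, keeps the guards themselves unchanged, which is the approach the comment recommends.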

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 3c1533e7-b9ea-4f05-bd38-64024e7c44d2

📥 Commits

Reviewing files that changed from the base of the PR and between 5524879 and 2180371.

📒 Files selected for processing (2)

mardi_importer/mardi_importer/zbmath/ZBMathPublication.py
mardi_importer/mardi_importer/zbmath/ZBMathSource.py

coderabbitai

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

mardi_importer/mardi_importer/zbmath/ZBMathSource.py (1)

729-733: ⚠️ Potential issue | 🟠 Major

Name-only authors are still skipped for arXiv records.

`author_name_strings` only helps on the normal `ZBMathPublication.create()`/`update()` path. When `publication.is_arxiv()` is true, this method bypasses that path, and the later arXiv update/create branches only write `publication.authors`, so authors without zbMATH IDs are still dropped there.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@mardi_importer/mardi_importer/zbmath/ZBMathSource.py` around lines 729 - 733,
The arXiv branch drops name-only authors because it only writes
publication.authors and ignores author_name_strings; update the arXiv
create/update logic to preserve name-only authors by either (a) populating
publication.authors with entries created from author_name_strings when ZBMath
ids are missing, or (b) ensuring the arXiv path reads and uses
author_name_strings alongside publication.authors before saving; change the
logic around ZBMathPublication (check is_arxiv(), the arXiv update/create
branches) so name-only authors are included in the final saved author list.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@mardi_importer/mardi_importer/zbmath/ZBMathSource.py`:
- Around line 527-535: The current loop splits linked authors into `authors` and
name-only authors into `author_name_strings`, losing original order; change the
logic in the loop that iterates over `zip(author_strings, author_ids)` to
produce one ordered author sequence (e.g. append entries to a single list like
`ordered_authors`) where each entry preserves position and contains either an
`id` field or a `name` field (and include an `ordinal` index from enumerate to
record original position). Then update downstream usage to consume
`ordered_authors` (or emit both linked-author and name-string claims with the
same `ordinal`) so the original author order can always be reconstructed; key
symbols to edit: the loop using `author_strings`, `author_ids`, and the lists
`authors` / `author_name_strings`.
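
One possible shape for the single ordered sequence suggested in this comment is sketched below. Everything here is hypothetical: the `AuthorEntry` dataclass, the `build_ordered_authors` function, and the 1-based `ordinal` are illustrative choices, not names from the importer.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class AuthorEntry:
    """One author position; carries either an ID or a name (hypothetical type)."""
    ordinal: int
    author_id: Optional[str] = None
    name: Optional[str] = None


def build_ordered_authors(author_strings, author_ids) -> List[AuthorEntry]:
    """Keep linked and name-only authors in one list, preserving position."""
    ordered = []
    pairs = zip(author_strings, author_ids)
    for ordinal, (name, a_id) in enumerate(pairs, start=1):
        if a_id:
            ordered.append(AuthorEntry(ordinal, author_id=a_id))
        elif name:
            ordered.append(AuthorEntry(ordinal, name=name))
        # rows with neither value are skipped; their ordinal stays unused
    return ordered


entries = build_ordered_authors(
    ["Doe, Jane", "Roe, Richard", "Poe, Edgar"],
    ["doe.jane", None, "poe.edgar"],
)
# Downstream code can still derive the two separate claim lists,
# while each entry remembers where it sat in the original byline.
linked = [e for e in entries if e.author_id]
name_only = [e for e in entries if e.name]
```

Because each entry carries its own `ordinal`, the two claim lists can be emitted separately and the original author order can still be reconstructed later.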

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 66e52ccd-6114-4908-8cf5-2c04ef412524

📥 Commits

Reviewing files that changed from the base of the PR and between 2180371 and a9c964b.

📒 Files selected for processing (1)

mardi_importer/mardi_importer/zbmath/ZBMathSource.py

LaAlice added 2 commits April 10, 2026 14:58

enable author name strings for zbmath

4c9b6e5

small correction

2180371

LizzAlice requested a review from eloiferrer April 10, 2026 13:01

coderabbitai bot reviewed Apr 10, 2026

refactor author handling

a9c964b

coderabbitai bot reviewed Apr 10, 2026

LizzAlice merged commit ea0593e into main Apr 10, 2026
2 checks passed

LizzAlice deleted the zbmath_author_strings branch April 10, 2026 13:44
