Include license #225

Merged
LizzAlice merged 3 commits into main from include_license on Apr 10, 2026

Conversation

@LizzAlice LizzAlice commented Apr 10, 2026

Add license claims and mark the arXiv item as "preprint of" the publication.

Summary by CodeRabbit

  • New Features
    • License metadata is now extracted and stored with publications.
    • License strings are mapped to canonical license identifiers and linked to publication records.
    • arXiv preprints are automatically associated with their corresponding publication records when applicable.
    • Configuration template now supports "license" as a standard metadata field.

coderabbitai bot commented Apr 10, 2026

📝 Walkthrough

Added license support to the ZBMath importer: configuration tag added, license extraction in the source, a license→Wikibase-QID mapping, publication constructor now accepts licenses, publication creation adds license (P163) claims, and arXiv items are linked back (P1676) when arXiv IDs are present.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Configuration: `config/import_config.config.template` | Appended `license` to the `[ZBMath]` tags list. |
| Publication logic: `mardi_importer/mardi_importer/zbmath/ZBMathPublication.py` | Added module-level `license_dict` mapping license URLs → Wikibase Q-IDs; extended `ZBMathPublication.__init__` to accept `licenses`; on `create()`, search for arXiv items and add P1676 (is preprint of) to arXiv items; `insert_claims()` now adds P163 (license) claims from mapped licenses and no longer adds the direct `wdt:P818` arXiv claim. |
| Source extraction: `mardi_importer/mardi_importer/zbmath/ZBMathSource.py` | Extracted `license` in `process_data()` (`literal_eval` + join); `push()` splits `info_dict["license"]` into `licenses` and passes them to `ZBMathPublication`. |

Sequence Diagram(s)

sequenceDiagram
    participant Source as ZBMathSource
    participant Pub as ZBMathPublication
    participant WB as Wikibase

    Source->>Pub: create(..., arxiv_id?, licenses?, ...)
    Pub->>WB: Create publication item
    WB-->>Pub: Return publication Q-ID

    alt arxiv_id present
        Pub->>WB: Search for arXiv entities
        WB-->>Pub: Return matching arXiv items
        Pub->>WB: Add P1676 claim to arXiv items (preprint of → publication Q-ID)
        WB-->>Pub: Confirm write
    end

    alt licenses provided
        Pub->>WB: Resolve license URLs via license_dict → Q-IDs
        Pub->>WB: Add P163 claims (license) to publication item
        WB-->>Pub: Confirm claims added
    end
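
The flow in the diagram can be sketched without a live Wikibase. This is a minimal illustration under stated assumptions, not the importer's code: `InMemoryWikibase`, `create_publication`, `LICENSE_QIDS`, and the placeholder Q-ID `Q_CC_BY_4` are hypothetical stand-ins; only the property IDs P1676 and P163 come from the walkthrough above.

```python
from dataclasses import dataclass, field

# Hypothetical mapping; the real license_dict and its Q-IDs live in
# ZBMathPublication.py and are not reproduced here.
LICENSE_QIDS = {"https://creativecommons.org/licenses/by/4.0/": "Q_CC_BY_4"}


@dataclass
class InMemoryWikibase:
    """Stand-in for the Wikibase API: claims are stored as (property, value) pairs."""

    claims: dict = field(default_factory=dict)       # item ID -> list of claims
    arxiv_index: dict = field(default_factory=dict)  # arXiv ID -> list of item IDs
    next_id: int = 1

    def create_item(self) -> str:
        qid = f"Q{self.next_id}"
        self.next_id += 1
        self.claims[qid] = []
        return qid

    def add_claim(self, qid: str, prop: str, value: str) -> None:
        self.claims[qid].append((prop, value))

    def search_arxiv(self, arxiv_id: str) -> list:
        return self.arxiv_index.get(arxiv_id, [])


def create_publication(wb, arxiv_id=None, licenses=()):
    """Create a publication item, link arXiv preprints, and attach license claims."""
    pub_qid = wb.create_item()
    if arxiv_id:
        # "is preprint of" is written on the arXiv item and points at the publication.
        for arxiv_qid in wb.search_arxiv(arxiv_id):
            wb.add_claim(arxiv_qid, "P1676", pub_qid)
    for license_url in licenses:
        license_qid = LICENSE_QIDS.get(license_url)
        if license_qid:  # unmapped URLs are skipped
            wb.add_claim(pub_qid, "P163", license_qid)
    return pub_qid
```

Note the direction of the two claims: P1676 is added to the existing arXiv item, while P163 is added to the newly created publication item.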

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Suggested reviewers

  • eloiferrer

Poem

🐰 A rabbit hops through lines of code and cheer,
Mapping licenses far and near,
ArXiv links and tags in place,
Metadata dances, keeping pace —
Hooray! The importer spreads good cheer. 🐇

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 40.00%, which is below the required threshold of 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped: CodeRabbit's high-level summary is enabled. |
| Title check | ✅ Passed | The title 'Include license' accurately summarizes the main change: adding license support to the import system across configuration, ZBMath classes, and data extraction. |

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@mardi_importer/mardi_importer/zbmath/ZBMathPublication.py`:
- Around line 198-203: The loop over self.licenses uses license_dict[l] directly
and will KeyError on unknown URLs; change it to guard lookups by checking
membership (or using .get) before calling self.api.get_claim so unmapped license
URLs are skipped (and optionally logged) rather than aborting the import; keep
using self.api.get_claim(...) and accumulate claims into license_claims before
calling self.item.add_claims(license_claims).
- Around line 148-154: The P1676 "is preprint of" claim is never added for arXiv
records because publication.create() isn't called for them in
ZBMathSource.push(); update the arXiv branch so it adds the same P1676 link when
creating or updating arXiv items. Specifically, in ZBMathSource.push() where you
call create_arxiv_item() or update an existing arXiv item, after obtaining the
arXiv item object add the P1676 claim pointing to publication_id and write the
item (the same logic currently in ZBMathPublication.create()); alternatively,
modify create_arxiv_item() to accept the publication_id and create the P1676
claim there; ensure you use publication.is_arxiv(), create_arxiv_item(), and the
arxiv_item.add_claim/write() calls so the preprint relation is always created or
updated.
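
The guarded lookup asked for in the first finding above might look roughly like the following. It is a sketch only: `resolve_license_qids`, `LICENSE_DICT`, and the placeholder Q-IDs are assumptions for illustration, and the real code would go on to pass each Q-ID to `self.api.get_claim("P163", ...)`.

```python
import logging

logger = logging.getLogger(__name__)

# Placeholder mapping for illustration; the Q-IDs are not the project's real values.
LICENSE_DICT = {
    "https://creativecommons.org/licenses/by/4.0/": "Q_CC_BY_4",
    "https://creativecommons.org/licenses/by-sa/4.0/": "Q_CC_BY_SA_4",
}


def resolve_license_qids(license_urls, mapping=LICENSE_DICT):
    """Map license URLs to Q-IDs, skipping and logging URLs that have no mapping."""
    qids = []
    for license_url in license_urls:
        qid = mapping.get(license_url)  # .get avoids a KeyError on unknown URLs
        if qid is None:
            logger.warning("Skipping unmapped license URL: %s", license_url)
            continue
        qids.append(qid)
    return qids
```

Logging the skipped URL is optional, but it leaves a trace of which licenses still need an entry in the mapping.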

In `@mardi_importer/mardi_importer/zbmath/ZBMathSource.py`:
- Line 351: In process_data(), harden the handling of row["license"] before
calling literal_eval and join: check for None or empty string and treat as empty
list; if the value is a plain text fallback (or does not look like a serialized
iterable) treat it as a single-element list; otherwise attempt
literal_eval(row["license"]) inside a try/except (catch SyntaxError/ValueError)
and fall back to treating the original string as a single item; finally set
record["license"] = ";".join(parsed_list) (or "" for empty list). Reference
symbols: row["license"], literal_eval, record["license"], process_data().
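
One way to read the hardening described above is a small helper that never raises on unexpected input. The function name `parse_license_field` is hypothetical, and treating non-string values as missing is an assumption rather than something the review specifies.

```python
from ast import literal_eval


def parse_license_field(raw):
    """Return a ';'-joined license string from a raw field value, or '' if empty."""
    # Assumption: None and non-string values (for example a float NaN) count as missing.
    if not isinstance(raw, str):
        return ""
    text = raw.strip()
    if not text:
        return ""
    try:
        parsed = literal_eval(text)
    except (SyntaxError, ValueError):
        # Plain-text fallback: keep the original string as a single license entry.
        return text
    if isinstance(parsed, (list, tuple)):
        return ";".join(str(item) for item in parsed if item)
    if parsed is None:
        return ""
    return str(parsed)
```

In `process_data()` the result would then be assigned along the lines of `record["license"] = parse_license_field(row["license"])`.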
ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 6624f5de-9f66-41ad-8854-8cf8ae0923dd

📥 Commits

Reviewing files that changed from the base of the PR and between e74ca73 and b343a77.

📒 Files selected for processing (3)
  • config/import_config.config.template
  • mardi_importer/mardi_importer/zbmath/ZBMathPublication.py
  • mardi_importer/mardi_importer/zbmath/ZBMathSource.py

@coderabbitai coderabbitai bot left a comment

🧹 Nitpick comments (1)
mardi_importer/mardi_importer/zbmath/ZBMathPublication.py (1)

198-207: Rename ambiguous variable l to license_url.

The KeyError hardening looks good. However, static analysis flags l as an ambiguous variable name (E741) — it resembles 1 in many fonts.

♻️ Suggested fix
         if self.licenses:
             license_claims = []
-            for l in self.licenses:
-                license_qid = license_dict.get(l)
+            for license_url in self.licenses:
+                license_qid = license_dict.get(license_url)
                 if not license_qid:
                     continue
                 claim = self.api.get_claim("P163", license_qid)
                 license_claims.append(claim)
             if license_claims:
                 self.item.add_claims(license_claims)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@mardi_importer/mardi_importer/zbmath/ZBMathPublication.py` around lines 198 -
207, Rename the ambiguous loop variable `l` to a descriptive name like
`license_url` in the ZBMathPublication license-processing block: update the
for-loop header (for l in self.licenses) and all uses of `l` to `license_url`,
preserving the existing logic that looks up license_qid via
license_dict.get(license_url), skips missing keys, calls
self.api.get_claim("P163", license_qid), collects claim into license_claims, and
finally calls self.item.add_claims(license_claims) if any.
ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 83d3e0ee-e71e-4266-b2da-318742bd66bb

📥 Commits

Reviewing files that changed from the base of the PR and between b343a77 and 8f7fc2d.

📒 Files selected for processing (1)
  • mardi_importer/mardi_importer/zbmath/ZBMathPublication.py

@LizzAlice LizzAlice requested a review from eloiferrer April 10, 2026 11:43
@LizzAlice LizzAlice merged commit 5524879 into main Apr 10, 2026
2 checks passed
@LizzAlice LizzAlice deleted the include_license branch April 13, 2026 07:50