Releases: opencitations/oc_ds_converter

v2.0.0

02 Apr 14:11

2.0.0 (2026-04-02)

  • refactor!(jalc): remove publisher prefix mapping (a64328e)
  • refactor(crossref)!: auto-generate publishers file from Crossref API (dd6496a)
  • refactor(crossref)!: replace tqdm with Rich for progress display (8d07567)
  • refactor(storage)!: make Redis the only storage backend (4df3775)

Bug Fixes

  • cache: handle empty cache file in init_cache (db92656)
  • ci: track .coveragerc so CI can find it (c69e63c)
  • ci: use Python 3.12 for coverage badge generation (13354eb)
  • clean up PROCESS-DB after preprocessing completes (1119b72)
  • crossref: skip citing entities without DOI references (f2f16b8)
  • datacite: resolve test failures after PR #12 merge (a9328ee)
  • doi: only attempt DOI repair when API service is enabled (9093d88)
  • jalc: use lock for atomic counter increments in multiprocessing (84efa34)
  • progress: exclude cached items from time remaining estimates (96bebb4)
  • progress: use EMA for time remaining estimates (3b5fec2)
  • resolve type errors and linting issues across process modules (7a687b6)
  • restore tqdm dependency for process modules (2429343)
  • test: switch coverage runner from unittest to pytest (fa2cc44)
  • types: correct type annotations across processing and storage modules (e5d22c6)
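
The two progress fixes above (EMA-based estimates and excluding cached items) can be illustrated with a minimal sketch. It is not the project's code: the class name `EtaEstimator`, the `alpha` smoothing factor, and the choice to let cached items reduce the remaining count without touching the average are all assumptions made for illustration.

```python
class EtaEstimator:
    """Hypothetical helper: time-remaining estimate from an exponential
    moving average (EMA) of per-item processing durations."""

    def __init__(self, total, alpha=0.3):
        self.remaining = total  # items still to process
        self.alpha = alpha      # weight given to the newest duration (assumed value)
        self.avg = None         # EMA of seconds per item; None until first real sample

    def update(self, seconds, cached=False):
        """Record one finished item."""
        self.remaining -= 1
        if cached:
            # Assumption: cached items finish almost instantly, so they are
            # kept out of the average to avoid an overly optimistic estimate.
            return
        if self.avg is None:
            self.avg = seconds
        else:
            self.avg = self.alpha * seconds + (1 - self.alpha) * self.avg

    def eta(self):
        """Estimated seconds remaining, or None before any real sample."""
        if self.avg is None:
            return None
        return self.avg * self.remaining
```

Compared with a plain running mean, an EMA reacts faster when throughput changes partway through a long run, at the cost of a noisier estimate for larger `alpha`.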

Features

  • crossref: add Redis publishers storage and age-based regeneration (c01039b)
  • crossref: store DOI-ORCID index in Redis for multiprocessing (d4c2ed4)
  • jalc: extract ORCID from researcher_id_list in creator metadata (b03e9e0)
  • jalc: track progress per JSON file in multiprocessing mode (0709722)
  • orcid-index: parallelize CSV loading to Redis with ProcessPoolExecutor (6f8fa00)
  • storage: restore SqliteStorageManager and InMemoryStorageManager (91f3ca7)

Performance Improvements

  • crossref: only invoke BeautifulSoup when the text actually contains angle brackets (3ee7afc)
  • crossref: prefetch DOI-ORCID index (99e4f57)
  • crossref: remove broken O(n²) ORCID fallback in get_agents_strings_list (413fff2)
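
The BeautifulSoup optimization above amounts to a cheap guard in front of an expensive parser. The sketch below shows the idea only; to stay dependency-free it swaps in the standard library's `html.parser` for BeautifulSoup, and the names `strip_markup` and `_TextExtractor` are hypothetical, not taken from the project.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects text nodes and discards tags (stand-in for BeautifulSoup)."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def strip_markup(text):
    """Return text without markup, parsing only when a tag could be present."""
    # Fast path: without an opening angle bracket there is no tag to strip,
    # so the parser is never constructed.
    if "<" not in text:
        return text
    parser = _TextExtractor()
    parser.feed(text)
    parser.close()  # flush any buffered trailing text
    return "".join(parser.parts)
```

When most input strings contain no markup, the substring check is far cheaper than building a parse tree for each of them.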

BREAKING CHANGES

  • jalc: JalcProcessing no longer accepts the publishers_filepath
    or use_redis_publishers parameters. The -p/--publishers CLI argument
    has been removed from jalc_process.py.
  • storage: The --storage_path and --redis_storage_manager CLI arguments
    have been removed.
  • crossref: The verbose parameter has been removed from preprocess(),
    and the -v/--verbose CLI flag no longer exists. Progress is now
    always displayed.
  • crossref: The -p/--publishers CLI argument has been removed.
    The publishers file is now generated automatically from the Crossref API.

OC DS Converter 1.1.0

12 Mar 16:17

Changes since v1.0.0

New features

  • Add ORCID index validation to Datacite and OpenAIRE processors
  • Reintegrate DOI-ORCID index validation in Crossref processing
  • Add Zotero plugin support, ISSN and ISBN manager updates

Bug fixes

  • Fix Redis-related issues in RA/BR processing
  • Comment out VIAF API calls due to API issues

Testing and CI

  • Add JALC tests for ORCID index functionality
  • Expand Datacite process tests
  • Update crossref_process tests
  • Fix tests for Redis on GitHub Actions, syntax checks, raw strings
  • Support Python 3.8 to 3.13

Dependencies

  • Update pandas to ^2.2.3

OC DS Converter 1.0.0

26 Jul 08:35

[v1.0.0] - 2024-07-26

Summary

This is the first release of the OpenCitations Data Sources Converter (oc_ds_converter), a dedicated software tool for converting scholarly bibliographic metadata from various data sources into the format accepted by OpenCitations. The software produces two main output tables: citation data and metadata. These outputs feed the OpenCitations data ingestion workflow, which populates the two datasets OpenCitations currently manages: OpenCitations Index and OpenCitations Meta.

For a detailed description, usage guidelines, and list of features, please refer to the README file.