Releases: opencitations/oc_ds_converter
v2.0.0
2.0.0 (2026-04-02)
- refactor!(jalc): remove publisher prefix mapping (a64328e)
- refactor(crossref)!: auto-generate publishers file from Crossref API (dd6496a)
- refactor(crossref)!: replace tqdm with Rich for progress display (8d07567)
- refactor(storage)!: make Redis the only storage backend (4df3775)
Bug Fixes
- cache: handle empty cache file in init_cache (db92656)
- ci: track .coveragerc so CI can find it (c69e63c)
- ci: use Python 3.12 for coverage badge generation (13354eb)
- clean up PROCESS-DB after preprocessing completes (1119b72)
- crossref: skip citing entities without DOI references (f2f16b8)
- datacite: resolve test failures after PR #12 merge (a9328ee)
- doi: only attempt DOI repair when API service is enabled (9093d88)
- jalc: use lock for atomic counter increments in multiprocessing (84efa34)
- progress: exclude cached items from time remaining estimates (96bebb4)
- progress: use EMA for time remaining estimates (3b5fec2)
- resolve type errors and linting issues across process modules (7a687b6)
- restore tqdm dependency for process modules (2429343)
- test: switch coverage runner from unittest to pytest (fa2cc44)
- types: correct type annotations across processing and storage modules (e5d22c6)
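The two progress fixes above (EMA-based estimates, cached items excluded) can be illustrated with a minimal sketch. This is not the project's implementation: the class name `EmaEta`, the `alpha` smoothing factor, and the choice to let cached items reduce the remaining count without feeding the average are all assumptions made for illustration.

```python
class EmaEta:
    """Hypothetical estimator: time remaining from an exponential moving
    average (EMA) of per-item processing durations."""

    def __init__(self, total, alpha=0.1):
        self.remaining = total  # items still to process
        self.alpha = alpha      # weight given to the newest sample (assumed value)
        self.ema = None         # smoothed seconds per item, unset until first sample

    def update(self, seconds, cached=False):
        """Record one finished item."""
        self.remaining -= 1
        if cached:
            # Assumption: cached items finish almost instantly, so their
            # durations are kept out of the average to avoid skewing it.
            return
        if self.ema is None:
            self.ema = seconds
        else:
            self.ema = self.alpha * seconds + (1 - self.alpha) * self.ema

    def eta(self):
        """Estimated seconds remaining, or None before any real sample."""
        if self.ema is None:
            return None
        return self.ema * self.remaining


est = EmaEta(total=10, alpha=0.5)
est.update(2.0)
est.update(4.0)
est.update(0.0, cached=True)
print(est.eta())
```

Compared with a plain running mean, an EMA reacts faster when throughput changes partway through a run, at the cost of a noisier estimate for larger `alpha`.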
Features
- crossref: add Redis publishers storage and age-based regeneration (c01039b)
- crossref: store DOI-ORCID index in Redis for multiprocessing (d4c2ed4)
- jalc: extract ORCID from researcher_id_list in creator metadata (b03e9e0)
- jalc: track progress per JSON file in multiprocessing mode (0709722)
- orcid-index: parallelize CSV loading to Redis with ProcessPoolExecutor (6f8fa00)
- storage: restore SqliteStorageManager and InMemoryStorageManager (91f3ca7)
Performance Improvements
- crossref: only invoke BeautifulSoup when the text actually contains angle brackets (3ee7afc)
- crossref: prefetch DOI-ORCID index (99e4f57)
- crossref: remove broken O(n²) ORCID fallback in get_agents_strings_list (413fff2)
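The first optimization above, skipping the markup parser for plain text, can be sketched as follows. This is an illustration under stated assumptions, not the code from commit 3ee7afc: the function name `strip_markup` is hypothetical, the guard requires both `<` and `>` to be present, and the standard-library `html.parser.HTMLParser` stands in for BeautifulSoup so the example runs without third-party packages.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects text nodes and discards tags (stand-in for BeautifulSoup)."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def strip_markup(text):
    """Return text without markup, parsing only when tags are possible."""
    # Fast path: without both angle brackets there can be no tag, so the
    # comparatively expensive parser is never constructed.
    if "<" not in text or ">" not in text:
        return text
    parser = _TextExtractor()
    parser.feed(text)
    parser.close()  # flush any text still buffered by the parser
    return "".join(parser.parts)


print(strip_markup("A plain title"))
print(strip_markup("Effects of <i>in vitro</i> growth"))
```

Since most titles in bibliographic metadata presumably contain no markup, the substring check lets the common case return immediately.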
BREAKING CHANGES
- JalcProcessing no longer accepts publishers_filepath or use_redis_publishers parameters. The -p/--publishers CLI argument has been removed from jalc_process.py.
- CLI arguments --storage_path and --redis_storage_manager removed.
- The verbose parameter is removed from preprocess() and the -v/--verbose CLI flag no longer exists. Progress is now always displayed.
- The -p/--publishers CLI argument has been removed. The publishers file is now generated automatically.
OC DS Converter 1.1.0
Changes since v1.0.0
New features
- Add ORCID index validation to Datacite and OpenAIRE processors
- Reintegrate DOI-ORCID index validation in Crossref processing
- Add Zotero plugin support, ISSN and ISBN manager updates
Bug fixes
- Fix Redis-related issues in RA/BR processing
- Comment out VIAF API calls due to API issues
Testing and CI
- Add JALC tests for ORCID index functionality
- Expand Datacite process tests
- Update crossref_process tests
- Fix tests for Redis on GitHub Actions, syntax checks, raw strings
- Support Python 3.8 to 3.13
Dependencies
- Update pandas to ^2.2.3
OC DS Converter 1.0.0
[v1.0.0] - 2024-07-26
Summary
This is the first release of the OpenCitations Data Sources Converter (oc_ds_converter), a dedicated software tool for converting scholarly bibliographic metadata from various data sources into the format accepted by OpenCitations. The software produces two main outputs/tables: citation data and metadata. These outputs are used in the data ingestion workflow of OpenCitations, contributing to the population of the two datasets currently managed: OpenCitations Index and OpenCitations Meta.
For a detailed description, usage guidelines, and list of features, please refer to the README file.