Releases: MITLibraries/browsertrix-harvester
v2.0 - Pivot to full HTML records versus metadata records
v1.4 - Handle empty crawls
v1.3 - Initial Production Release
NOTE: Crawls that result in zero seed URLs are known to throw an error. This release allows a "full" harvest in production. A fix for the error is coming soon, at which point we'll enable daily harvests.
What's Changed
- TIMX 557 and misc updates by @ghukill in #44
- TIMX 562 - Handle crawls with different pages and CDX data by @ghukill in #45
- USE-93 - Support pre-crawl, sitemap parsing by @ghukill in #46
- USE 97 - Generate delete metadata records by @ghukill in #47
- USE 93 (contd) - Streamline sitemap CLI arg by @ghukill in #48
- USE 86 - Remove crawler workers defaults by @ghukill in #49
- In 1524 - 2025-10 Maintenance by @jonavellecuerdo in #50
New Contributors
- @jonavellecuerdo made their first contribution in #50
Full Changelog: v1.2.1...v1.3
v1.2.1 - Update Deployment Workflows
What's Changed
- Updates For New Shared Workflows by @cabutlermit in #43
New Contributors
- @cabutlermit made their first contribution in #43
Full Changelog: v1.2...v1.2.1
v1.2 - Support JSONLines output
What's Changed
- IN-1240 - Replace pipenv check with pip-audit by @ghukill in #41
- TIMX 542 - support JSONLines output by @ghukill in #42
Full Changelog: v1.1.1...v1.2
Maintenance updates
v1.1.0 - Align with Browsertrix-Crawler 12.x
What's Changed
Full Changelog: v1.0.0...v1.1.0
Initial Release
Initial production release.
What's Changed
- Initial scaffolding of CLI app by @ghukill in #7
- Add web crawl capabilities to harvester app by @ghukill in #11
- Metadata record parsing by @ghukill in #12
- PR4 - Add CI and AWS terraform by @ghukill in #15
Full Changelog: https://github.com/MITLibraries/browsertrix-harvester/commits/v1.0.0