Releases · MITLibraries/browsertrix-harvester

11 Dec 18:46

ghukill

v2.0

284cafe

v2.0 - Pivot to full HTML records versus metadata records

What's Changed

USE 240 - prep work for staff directory in mitlibwebsite source by @ghukill in #52
USE 258 - Rework harvester to return "records" vs "metadata records" by @ghukill in #53
USE 272 - Add response headers to output records by @ghukill in #54

Full Changelog: v1.4...v2.0

Contributors

ghukill

31 Oct 18:36

ehanson8

v1.4

1e61e73

v1.4 Handle empty crawls

What's Changed

USE-91-handle-empty-crawls by @ehanson8 in #51

Full Changelog: v1.3...v1.4

Contributors

ehanson8

23 Oct 14:38

ghukill

v1.3

d186380

v1.3 - Initial Production Release

NOTE: Crawls that result in zero seed URLs are known to throw an error. This release allows a "full" harvest in production; a fix is coming soon, at which point we'll enable daily harvests.

What's Changed

TIMX 557 and misc updates by @ghukill in #44
TIMX 562 - Handle crawls with different pages and CDX data by @ghukill in #45
USE-93 - Support pre-crawl, sitemap parsing by @ghukill in #46
USE 97 - Generate delete metadata records by @ghukill in #47
USE 93 (contd) - Streamline sitemap CLI arg by @ghukill in #48
USE 86 - Remove crawler workers defaults by @ghukill in #49
In 1524 - 2025-10 Maintenance by @jonavellecuerdo in #50

New Contributors

@jonavellecuerdo made their first contribution in #50

Full Changelog: v1.2.1...v1.3

Contributors

ghukill and jonavellecuerdo

06 Oct 13:13

cabutlermit

v1.2.1

17eaef1

v1.2.1 - Update Deployment Workflows

What's Changed

Updates For New Shared Workflows by @cabutlermit in #43

New Contributors

@cabutlermit made their first contribution in #43

Full Changelog: v1.2...v1.2.1

Contributors

cabutlermit

19 Aug 15:00

ghukill

v1.2

3b79a44

v1.2 - Support JSONLines output

What's Changed

IN-1240 - Replace pipenv check with pip-audit by @ghukill in #41
TIMX 542 - support JSONLines output by @ghukill in #42

Full Changelog: v1.1.1...v1.2

Contributors

ghukill

20 Sep 15:58

ehanson8

v1.1.1

38f6316

Maintenance updates

What's Changed

Maintenance 09 2024 by @ehanson8 in #39
Update Dockerfile by @ehanson8 in #40

New Contributors

@ehanson8 made their first contribution in #39

Full Changelog: v1.1.0...v1.1.1

Contributors

ehanson8

06 Nov 15:20

ghukill

v1.1.0

50b8009

v1.1.0 Align with Browsertrix-Crawler 12.x

What's Changed

Align btrix CLI arguments for v0.12.0 release by @ghukill in #22

Full Changelog: v1.0.0...v1.1.0

Contributors

ghukill

16 Oct 20:10

ghukill

v1.0.0

7e4a5e1

Initial Release

Initial production release.

What's Changed

Initial scaffolding of CLI app by @ghukill in #7
Add web crawl capabilities to harvester app by @ghukill in #11
Metadata record parsing by @ghukill in #12
PR4 - Add CI and AWS terraform by @ghukill in #15

Full Changelog: https://github.com/MITLibraries/browsertrix-harvester/commits/v1.0.0

Contributors

ghukill
