feat: Add shortcut to copy first seed + fix incorrect crawl URL labels #2803

SuaYoo · 2025-08-13T02:07:57Z

Resolves #2751
Partially addresses #2801

Changes

Adds option to copy first seed URL to clipboard to workflow actions
Switches instances of "Crawl Start URL" to "First Page URL" when the label can apply to any workflow scope type
Updates user guide to clarify "Page URL" vs. "Crawl Start URL"

Manual testing

Log in
Go to "Crawling"
Select three dots overflow menu icon for a workflow. Verify "Copy Crawl Start URL" option is shown
Choose "Copy Crawl Start URL". Verify URL is copied to clipboard
Go into workflow
Select "Actions" dropdown. Verify "Copy Crawl Start URL" option is shown and works as expected

Screenshots

Page	Image/video
Crawl Workflows
Crawl Workflows (workflow action menu)
Workflow Editor
User Guide / Crawl Workflow Settings

Follow-ups

This PR partially fixes the inconsistent use of "Crawl Start URL" across all workflows; the label technically does not apply to Single Page and List of Pages crawl scopes. I created #2801 to address any remaining work needed to update docs.

An alternative solution would be to add scopeType to the crawlconfigs list JSON response (cc @tw4l). This way, we could conditionally display "Copy Crawl Start URL" or "Copy First Page URL" depending on the scope type.
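A minimal sketch of that conditional label, assuming the list response gained a `scopeType` field. The `WorkflowListItem` shape, the `copyUrlLabel` helper, and the scope type strings are illustrative assumptions, not the actual Browsertrix frontend code.

```typescript
// Hypothetical scope type values; the real set may differ.
type ScopeType = "page" | "page-list" | "prefix" | "host" | "domain" | "custom";

// Hypothetical shape of one item in the crawlconfigs list response,
// assuming `scopeType` has been added to it.
interface WorkflowListItem {
  id: string;
  firstSeed: string;
  scopeType?: ScopeType;
}

// Scopes for which "Crawl Start URL" does not technically apply.
const PAGE_SCOPES: ReadonlySet<ScopeType> = new Set<ScopeType>(["page", "page-list"]);

function copyUrlLabel(workflow: WorkflowListItem): string {
  // Fall back to the generic label when the scope type is missing.
  if (!workflow.scopeType || PAGE_SCOPES.has(workflow.scopeType)) {
    return "Copy First Page URL";
  }
  return "Copy Crawl Start URL";
}
```

With this shape, a list response that omits `scopeType` keeps showing the generic label, so the frontend change would not have to wait on the backend change.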

emma-sg

Looks good!

tw4l · 2025-08-14T19:37:03Z

I think elsewhere in the application and our documentation where we use "Page URL", it typically refers to pages that have been crawled, rather than the Crawl Start URL(s) (i.e. seeds, though we don't use that vocabulary in the Browsertrix frontend). In other words, the results of crawling vs. the URLs used to configure crawl scope. Personally think that's a useful distinction worth keeping, so I'm hesitant to use Page URL in both contexts as proposed here. [edit: it was pointed out to me that I'm incorrect and we do currently use "Page URL" in the Page and List of Pages workflow config form, though I think the seed distinction is still a useful one]

For instance, even a single page scoped workflow could have a single Crawl Start URL but multiple crawled Page URLs if the box to include linked pages is checked. List of Pages is a little more ambiguous, but similarly, I suppose each could be considered a Crawl Start URL that depending on workflow configuration options might result in additional Page URLs being crawled for each.

Also perhaps worth noting that it's possible via the backend API to configure multiple seeds/Crawl Start URLs that each have their own scope types and additional configuration (e.g. includes/excludes) that override the default, though we have not made it possible to configure workflows that way via the frontend. That is a pattern that will be common to many experienced web archivists, as it's a common practice with Browsertrix Crawler and other crawlers like Heritrix.
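To illustrate the per-seed override pattern described above, here is a rough sketch of how a seed's own settings could take precedence over the workflow defaults. The `SeedEntry` and `WorkflowDefaults` types, the `resolveSeed` helper, and the scope strings are assumptions made for illustration; they are not the actual backend models.

```typescript
// Hypothetical scope values for illustration only.
type SeedScope = "page" | "prefix" | "host" | "domain" | "custom";

// A seed that may override the workflow-level defaults.
interface SeedEntry {
  url: string;
  scopeType?: SeedScope;
  exclude?: string[];
}

// Workflow-level defaults applied to seeds that do not override them.
interface WorkflowDefaults {
  scopeType: SeedScope;
  exclude: string[];
}

interface ResolvedSeed {
  url: string;
  scopeType: SeedScope;
  exclude: string[];
}

function resolveSeed(defaults: WorkflowDefaults, seed: SeedEntry): ResolvedSeed {
  return {
    url: seed.url,
    // A per-seed value wins; otherwise use the workflow default.
    scopeType: seed.scopeType ?? defaults.scopeType,
    exclude: seed.exclude ?? defaults.exclude,
  };
}

const defaults: WorkflowDefaults = { scopeType: "prefix", exclude: ["/login"] };
const resolved = [
  { url: "https://example.com/blog/" },
  { url: "https://example.org/", scopeType: "page" as SeedScope, exclude: [] },
].map((seed) => resolveSeed(defaults, seed));
```

In this sketch the first seed inherits the `prefix` scope and the default exclusion, while the second is crawled as a single page with no exclusions.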

It seems to me that we're creating some confusion for ourselves (myself included!) by avoiding use of the word "seed" in some ways, though I understand that we made that decision to try to make the interface friendly to people who aren't already web archiving experts. Maybe the best solution would be to consistently use "Crawl Start URL" interchangeably with seed as we at least mostly currently do, and add "First" as a prefix as necessary within the UI when there are multiple. So for instance, "Copy Crawl Start URL" -> "Copy First Crawl Start URL" for workflows with list of pages. I'd be happy to make any backend changes necessary to facilitate that - it looks like the crawlconfigs/ list endpoint isn't returning the global scopeType for workflows currently because we're excluding the entire config to avoid slowing the response down, but it would be easy to add that field.
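The "First" prefix idea could look roughly like the following. The `copySeedLabel` helper and its `seedCount` parameter are hypothetical names used only for this sketch.

```typescript
// Returns the menu label for the copy action, adding "First" only
// when the workflow has more than one Crawl Start URL (seed).
function copySeedLabel(seedCount: number): string {
  const base = "Crawl Start URL";
  return seedCount > 1 ? `Copy First ${base}` : `Copy ${base}`;
}
```

This variant keys off the number of seeds rather than the scope type, so it would also cover workflows configured with multiple seeds through the API.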

SuaYoo added 3 commits August 12, 2025 16:07

remove unused render

2f3b2c3

add copy option

cbee597

update docs and info text

201f42f

SuaYoo marked this pull request as ready for review August 14, 2025 18:42

switch to generic label

d0130af

SuaYoo changed the title ~~feat: Add shortcut to copy first seed + update docs~~ feat: Add shortcut to copy first seed + fix incorrect crawl URL labels Aug 14, 2025

SuaYoo requested review from ikreymer, emma-sg, DaleLore and tw4l August 14, 2025 18:54

emma-sg approved these changes Aug 14, 2025

SuaYoo mentioned this pull request Aug 14, 2025

[Docs]: Update references to "Page List" and "Crawl URL(s)" #2801

Open
