feat: Add shortcut to copy first seed + fix incorrect crawl URL labels #2803

Open: wants to merge 4 commits into main

10 changes: 7 additions & 3 deletions frontend/docs/docs/user-guide/workflow-setup.md
@@ -78,7 +78,9 @@ Crawl scopes are categorized as a **Page Crawl** or **Site Crawl**:

### Page URL(s)

-One or more URLs of the page to crawl. URLs must follow [valid URL syntax](https://www.w3.org/Addressing/URL/url-spec.html). For example, if you're crawling a page that can be accessed on the public internet, your URL should start with `http://` or `https://`.
+One or more URLs of the pages to crawl, visible when using a crawl scope of _Single Page_ or _List of Pages_. URLs will be crawled in the order that they are specified.
+
+URLs must follow [valid URL syntax](https://www.w3.org/Addressing/URL/url-spec.html). For example, if you're crawling a page that can be accessed on the public internet, your URL should start with `http://` or `https://`.

See [List Of Pages](#list-of-pages) for additional info when providing a list of URLs.

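The ordering and syntax rules for page URLs can be illustrated with a short sketch. `parsePageUrls` is a hypothetical helper, not part of Browsertrix; it assumes the list is entered one URL per line and relies on the standard WHATWG `URL` constructor to reject malformed input.

```typescript
// Hypothetical helper (not Browsertrix code): parse a "List of Pages"
// input, assumed to hold one URL per line, into ordered page URLs.
interface ParsedPageUrls {
  urls: string[]; // valid http(s) URLs, in the order they were entered
  invalid: string[]; // lines that are not valid http(s) URLs
}

function parsePageUrls(input: string): ParsedPageUrls {
  const urls: string[] = [];
  const invalid: string[] = [];

  for (const rawLine of input.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (!line) continue; // skip blank lines

    try {
      const url = new URL(line); // throws a TypeError on invalid syntax
      if (url.protocol === "http:" || url.protocol === "https:") {
        urls.push(url.href); // href is the serialized (normalized) URL
      } else {
        invalid.push(line); // e.g. ftp: or mailto: URLs
      }
    } catch {
      invalid.push(line);
    }
  }

  return { urls, invalid };
}

const parsed = parsePageUrls(
  "https://example.com/a\n\nhttps://example.com/b\nnot a url\nftp://example.com/c",
);
console.log(parsed.urls);
```

Because the result keeps input order, `parsed.urls[0]` corresponds to the first page URL, which is the URL crawled first.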
@@ -90,7 +92,7 @@ See [List Of Pages](#list-of-pages) for additional info when providing a list of

### Crawl Start URL

-This is the first page that the crawler will visit. _Site Crawl_ scopes are based on this URL.
+This is the first page that the crawler will visit. When using a crawl scope of _In-Page Links_, _Pages in Same Directory_, _Pages on Same Domain_, or _Pages on Same Domain + Subdomains_, this URL is the basis for determining whether a linked URL is within scope and should be crawled.

### Include Any Linked Page

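To make the role of the Crawl Start URL concrete, here is a simplified sketch of how each of these scopes could compare a linked URL against it. The `SiteScope` names, the `isInScope` function, and the exact comparison rules are assumptions for illustration only; the crawler applies its own scope rules, which may differ (for example in how `www.` prefixes or query strings are handled).

```typescript
// Hypothetical scope names (not the actual Browsertrix configuration values).
type SiteScope =
  | "in-page-links"
  | "same-directory"
  | "same-domain"
  | "same-domain-and-subdomains";

// Remove the "#fragment" so in-page anchors compare equal to the page itself.
function withoutHash(url: URL): string {
  const copy = new URL(url.href);
  copy.hash = "";
  return copy.href;
}

// Simplified check: is `candidateUrl` in scope relative to `startUrl`?
function isInScope(startUrl: string, candidateUrl: string, scope: SiteScope): boolean {
  const start = new URL(startUrl);
  const candidate = new URL(candidateUrl);

  switch (scope) {
    case "in-page-links":
      // Same page, differing at most by fragment (e.g. #section-2)
      return withoutHash(candidate) === withoutHash(start);
    case "same-directory": {
      // Everything up to and including the last "/" of the start URL's path
      const dir =
        start.origin + start.pathname.slice(0, start.pathname.lastIndexOf("/") + 1);
      return candidate.href.startsWith(dir);
    }
    case "same-domain":
      // Exact host match only
      return candidate.hostname === start.hostname;
    case "same-domain-and-subdomains":
      // Same host, or any host ending in ".<start host>"
      return (
        candidate.hostname === start.hostname ||
        candidate.hostname.endsWith(`.${start.hostname}`)
      );
    default:
      return false;
  }
}

console.log(isInScope("https://example.com/docs/intro", "https://example.com/docs/setup", "same-directory"));
```

The same linked URL can be in scope for one setting and out of scope for another; for example, `https://docs.example.com/` is rejected by the same-domain check but accepted by the subdomain check when the start URL is `https://example.com/`.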
@@ -349,7 +351,9 @@ Describe and organize your crawl workflow and the resulting archived items.

### Name

-Allows a custom name to be set for the workflow. If no name is set, the workflow's name will be set to the _Crawl Start URL_. For Page List crawls, the workflow's name will be set to the first URL present in the _Crawl URL(s)_ field, with an added `(+x)` where `x` represents the total number of URLs in the list.
+Allows a custom name to be set for the workflow.
+
+If no name is set, the workflow's name will be set to the first page URL specified in _Scope_ (also referred to as the crawl start URL). For _Single Page_ and _List of Pages_ crawls, the workflow's name will be suffixed with `+ N`, where `N` represents the number of page URLs in addition to the crawl start URL.

### Description

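The default naming rule can be sketched as follows. `WorkflowNameFields` and `defaultWorkflowName` are hypothetical; the field names `name`, `firstSeed`, and `seedCount` come from the `renderName` code in this diff, while the `pageCrawl` flag and the exact spacing of the `+ N` suffix are assumptions.

```typescript
// Hypothetical subset of a workflow's fields used for naming.
interface WorkflowNameFields {
  name?: string; // custom name, if one was set
  firstSeed: string; // first page URL (the crawl start URL)
  seedCount: number; // total number of page URLs, including the first
  pageCrawl: boolean; // assumed flag: true for Single Page / List of Pages scopes
}

function defaultWorkflowName(fields: WorkflowNameFields): string {
  const { name, firstSeed, seedCount, pageCrawl } = fields;

  // A custom name always takes precedence.
  if (name) return name;

  // Otherwise use the first page URL; page crawls with more than one URL
  // get a "+ N" suffix counting the additional page URLs.
  const additional = seedCount - 1;
  if (pageCrawl && additional > 0) {
    return `${firstSeed} + ${additional}`;
  }
  return firstSeed;
}

console.log(
  defaultWorkflowName({ firstSeed: "https://example.com/", seedCount: 3, pageCrawl: true }),
); // → https://example.com/ + 2
```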
@@ -50,7 +50,7 @@ const crawlSortOptions: SortOptions = [
},
{
field: "firstSeed",
-label: msg("Crawl Start URL"),
+label: msg("First Page URL"),
defaultDirection: 1,
},
];
@@ -177,6 +177,13 @@ export class WorkflowActionMenu extends BtrixElement {
`,
)}

+<sl-menu-item
+@click=${() => ClipboardController.copyToClipboard(workflow.firstSeed)}
+>
+<sl-icon name="link" slot="prefix"></sl-icon>
+${msg("Copy First Page URL")}
+</sl-menu-item>
+
<sl-menu-item
@click=${() =>
ClipboardController.copyToClipboard(workflow.tags.join(", "))}
21 changes: 19 additions & 2 deletions frontend/src/features/crawl-workflows/workflow-editor.ts
@@ -2165,6 +2165,20 @@ https://archiveweb.page/images/${"logo.svg"}`}
};

private renderJobMetadata() {
+const link_to_scope = html`<button
+type="button"
+class="text-blue-600 hover:text-blue-500"
+@click=${async () => {
+this.updateProgressState({ activeTab: "scope" });
+
+await this.updateComplete;
+
+void this.scrollToActivePanel();
+}}
+>
+${msg("Scope")}
+</button>`;
+
return html`
${inputCol(html`
<sl-input
@@ -2179,8 +2193,11 @@ https://archiveweb.page/images/${"logo.svg"}`}
></sl-input>
`)}
${this.renderHelpTextCol(
-msg(`Customize this Workflow's name. Workflows are named after
-the first Crawl URL by default.`),
+html`${msg(`Customize the name of this workflow.`)}
+${msg(
+html`If omitted, the workflow will be named after the first page URL
+specified in ${link_to_scope}.`,
+)} `,
)}
${inputCol(html`
<sl-textarea
2 changes: 1 addition & 1 deletion frontend/src/pages/crawls.ts
@@ -26,7 +26,7 @@ const sortableFields: Record<
defaultDirection: "desc",
},
firstSeed: {
-label: msg("Crawl Start URL"),
+label: msg("First Page URL"),
defaultDirection: "desc",
},
fileSize: {
6 changes: 3 additions & 3 deletions frontend/src/pages/org/archived-items.ts
@@ -81,7 +81,7 @@ const sortableFields: Record<
export class CrawlsList extends BtrixElement {
static FieldLabels: Record<SearchFields, string> = {
name: msg("Name"),
-firstSeed: msg("Crawl Start URL"),
+firstSeed: msg("First Page URL"),
};

@property({ type: Boolean })
@@ -540,8 +540,8 @@ export class CrawlsList extends BtrixElement {
placeholder=${this.itemType === "upload"
? msg("Search all uploads by name")
: this.itemType === "crawl"
-? msg("Search all crawls by name or Crawl Start URL")
-: msg("Search all items by name or Crawl Start URL")}
+? msg("Search all crawls by name or first page URL")
+: msg("Search all items by name or first page URL")}
@btrix-select=${(e: CustomEvent) => {
const { key, value } = e.detail;
this.filterBy = {
140 changes: 3 additions & 137 deletions frontend/src/pages/org/workflows-list.ts
@@ -21,7 +21,6 @@ import type {
} from "@/components/ui/filter-chip";
import { parsePage, type PageChangeEvent } from "@/components/ui/pagination";
import { type SelectEvent } from "@/components/ui/search-combobox";
-import { ClipboardController } from "@/controllers/clipboard";
import { SearchParamsController } from "@/controllers/searchParams";
import type { SelectJobTypeEvent } from "@/features/crawl-workflows/new-workflow-dialog";
import {
@@ -42,7 +41,6 @@ import {
} from "@/types/workflow";
import { isApiError } from "@/utils/api";
import { settingsForDuplicate } from "@/utils/crawl-workflows/settingsForDuplicate";
-import { isArchivingDisabled } from "@/utils/orgs";
import { tw } from "@/utils/tailwind";

type SearchFields = "name" | "firstSeed";
@@ -77,7 +75,7 @@ const sortableFields: Record<
defaultDirection: "asc",
},
firstSeed: {
-label: msg("Crawl Start URL"),
+label: msg("First Page URL"),
defaultDirection: "asc",
},
created: {
@@ -107,7 +105,7 @@ const USED_FILTERS = [
export class WorkflowsList extends BtrixElement {
static FieldLabels: Record<SearchFields, string> = {
name: msg("Name"),
-firstSeed: msg("Crawl Start URL"),
+firstSeed: msg("First Page URL"),
};

@state()
@@ -744,7 +742,7 @@ export class WorkflowsList extends BtrixElement {
.searchOptions=${this.searchOptions}
.keyLabels=${WorkflowsList.FieldLabels}
selectedKey=${ifDefined(this.selectedSearchFilterKey)}
-placeholder=${msg("Search all Workflows by name or Crawl Start URL")}
+placeholder=${msg("Search all workflows by name or first page URL")}
@btrix-select=${(e: SelectEvent<typeof this.searchKeys>) => {
const { key, value } = e.detail;
if (key == null) return;
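The relabelled `FieldLabels` record and the `btrix-select` handler work together: the selected key chooses which field to filter on, and the label is what users see. The handler body is truncated in this diff, so the following is a hypothetical, framework-free sketch; `applySelection`, `describeFilters`, and the merge-into-existing-filters behavior are assumptions.

```typescript
type SearchFields = "name" | "firstSeed";
type Filters = Partial<Record<SearchFields, string>>;

// Mirrors the `FieldLabels` values after this PR.
const fieldLabels: Record<SearchFields, string> = {
  name: "Name",
  firstSeed: "First Page URL",
};

// Hypothetical shape of a search-combobox selection event's detail.
interface SearchSelection {
  key: SearchFields | null;
  value: string;
}

function applySelection(filterBy: Filters, { key, value }: SearchSelection): Filters {
  // No field selected: leave filters unchanged (like `if (key == null) return;`).
  if (key == null) return filterBy;
  // Set or replace the filter for the selected field, keeping the others.
  return { ...filterBy, [key]: value };
}

// Turn active filters into user-facing text using the field labels.
function describeFilters(filterBy: Filters): string[] {
  return (Object.keys(filterBy) as SearchFields[])
    .filter((key) => filterBy[key] !== undefined)
    .map((key) => `${fieldLabels[key]}: ${filterBy[key]}`);
}

const filters = applySelection({}, { key: "firstSeed", value: "https://example.com/" });
console.log(describeFilters(filters)); // → [ 'First Page URL: https://example.com/' ]
```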
@@ -850,138 +848,6 @@ export class WorkflowsList extends BtrixElement {
</btrix-workflow-list-item>
`;

-private renderMenu(workflow: ListWorkflow) {
-return html`
-${when(
-workflow.isCrawlRunning && this.appState.isCrawler,
-// HACK shoelace doesn't current have a way to override non-hover
-// color without resetting the --sl-color-neutral-700 variable
-() => html`
-<sl-menu-item
-@click=${() => void this.stop(workflow.lastCrawlId)}
-?disabled=${workflow.lastCrawlStopping}
->
-<sl-icon name="dash-square" slot="prefix"></sl-icon>
-${msg("Stop Crawl")}
-</sl-menu-item>
-<sl-menu-item
-style="--sl-color-neutral-700: var(--danger)"
-@click=${() => void this.cancel(workflow.lastCrawlId)}
->
-<sl-icon name="x-octagon" slot="prefix"></sl-icon>
-${msg(html`Cancel & Discard Crawl`)}
-</sl-menu-item>
-`,
-)}
-${when(
-this.appState.isCrawler && !workflow.isCrawlRunning,
-() => html`
-<sl-menu-item
-style="--sl-color-neutral-700: var(--success)"
-?disabled=${isArchivingDisabled(this.org, true)}
-@click=${() => void this.runNow(workflow)}
->
-<sl-icon name="play" slot="prefix"></sl-icon>
-${msg("Run Crawl")}
-</sl-menu-item>
-`,
-)}
-${when(
-this.appState.isCrawler &&
-workflow.isCrawlRunning &&
-!workflow.lastCrawlStopping,
-// HACK shoelace doesn't current have a way to override non-hover
-// color without resetting the --sl-color-neutral-700 variable
-() => html`
-<sl-divider></sl-divider>
-<sl-menu-item
-@click=${() =>
-this.navigate.to(
-`${this.navigate.orgBasePath}/workflows/${workflow.id}/${WorkflowTab.LatestCrawl}`,
-{
-dialog: "scale",
-},
-)}
->
-<sl-icon name="plus-slash-minus" slot="prefix"></sl-icon>
-${msg("Edit Browser Windows")}
-</sl-menu-item>
-<sl-menu-item
-?disabled=${workflow.lastCrawlState !== "running"}
-@click=${() =>
-this.navigate.to(
-`${this.navigate.orgBasePath}/workflows/${workflow.id}/${WorkflowTab.LatestCrawl}`,
-{
-dialog: "exclusions",
-},
-)}
->
-<sl-icon name="table" slot="prefix"></sl-icon>
-${msg("Edit Exclusions")}
-</sl-menu-item>
-<sl-divider></sl-divider>
-`,
-)}
-${when(
-this.appState.isCrawler,
-() =>
-html`<sl-menu-item
-@click=${() =>
-this.navigate.to(
-`${this.navigate.orgBasePath}/workflows/${workflow.id}?edit`,
-)}
->
-<sl-icon name="gear" slot="prefix"></sl-icon>
-${msg("Edit Workflow Settings")}
-</sl-menu-item>`,
-)}
-<sl-menu-item
-@click=${() =>
-ClipboardController.copyToClipboard(workflow.tags.join(", "))}
-?disabled=${!workflow.tags.length}
->
-<sl-icon name="tags" slot="prefix"></sl-icon>
-${msg("Copy Tags")}
-</sl-menu-item>
-${when(
-this.appState.isCrawler,
-() => html`
-<sl-menu-item
-?disabled=${isArchivingDisabled(this.org, true)}
-@click=${() => void this.duplicateConfig(workflow)}
->
-<sl-icon name="files" slot="prefix"></sl-icon>
-${msg("Duplicate Workflow")}
-</sl-menu-item>
-<sl-divider></sl-divider>
-<sl-menu-item
-@click=${() => ClipboardController.copyToClipboard(workflow.id)}
->
-<sl-icon name="copy" slot="prefix"></sl-icon>
-${msg("Copy Workflow ID")}
-</sl-menu-item>
-${when(
-!workflow.crawlCount,
-() => html`
-<sl-divider></sl-divider>
-<sl-menu-item
-style="--sl-color-neutral-700: var(--danger)"
-@click=${async () => {
-this.workflowToDelete = workflow;
-await this.updateComplete;
-void this.deleteDialog?.show();
-}}
->
-<sl-icon name="trash3" slot="prefix"></sl-icon>
-${msg("Delete Workflow")}
-</sl-menu-item>
-`,
-)}
-`,
-)}
-`;
-}
-
private renderName(crawlConfig: ListWorkflow) {
if (crawlConfig.name) return crawlConfig.name;
const { firstSeed, seedCount } = crawlConfig;
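The removed `renderMenu` decides which actions to show from the user's role (`appState.isCrawler`) and the workflow's crawl state. The framework-free sketch below summarizes those conditions. `MenuWorkflow` and `menuItemsFor` are hypothetical; some `disabled` conditions (those using `isArchivingDisabled` and `lastCrawlState`) are omitted, and placing the new "Copy First Page URL" item unconditionally, just before "Copy Tags", is inferred from the `WorkflowActionMenu` hunk rather than confirmed.

```typescript
// Hypothetical subset of ListWorkflow fields read by the menu.
interface MenuWorkflow {
  isCrawlRunning: boolean;
  lastCrawlStopping: boolean;
  crawlCount: number;
  tags: string[];
}

interface MenuItem {
  label: string;
  disabled: boolean;
}

function menuItemsFor(workflow: MenuWorkflow, isCrawler: boolean): MenuItem[] {
  const items: MenuItem[] = [];
  const add = (label: string, disabled = false): void => {
    items.push({ label, disabled });
  };

  if (isCrawler && workflow.isCrawlRunning) {
    add("Stop Crawl", workflow.lastCrawlStopping);
    add("Cancel & Discard Crawl");
  }
  if (isCrawler && !workflow.isCrawlRunning) {
    add("Run Crawl");
  }
  if (isCrawler && workflow.isCrawlRunning && !workflow.lastCrawlStopping) {
    add("Edit Browser Windows");
    add("Edit Exclusions");
  }
  if (isCrawler) add("Edit Workflow Settings");

  // Shown to every user; Copy Tags is disabled when there are no tags.
  add("Copy First Page URL"); // assumed position of the item added in this PR
  add("Copy Tags", workflow.tags.length === 0);

  if (isCrawler) {
    add("Duplicate Workflow");
    add("Copy Workflow ID");
    // Delete is offered only while the workflow has no crawls.
    if (!workflow.crawlCount) add("Delete Workflow");
  }
  return items;
}

const viewerItems = menuItemsFor(
  { isCrawlRunning: false, lastCrawlStopping: false, crawlCount: 2, tags: [] },
  false,
);
console.log(viewerItems.map((item) => item.label));
```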