fix(generate): normalise crawl URLs + TUI docs on homepage and getting-started #166
Merged
…cates

The crawl dedup map used raw href strings, so https://example.com/page and https://example.com/page/ were treated as distinct targets. This produced duplicate entries in generated configs for sites like golang.org, where both forms appear as hrefs on the same page.

normalizeCrawlURL now strips fragments and query strings, and removes trailing slashes from non-root paths, before inserting into the visited set and before returning links from extractLinks. Root paths are kept as "/".

Also: the docs homepage and getting-started page now include the --tui dashboard with an example output block, so the feature is visible up front.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
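The normalisation rules above can be sketched in Go with the standard `net/url` package. This is a minimal illustration, not the code merged in this PR: the `(string, error)` signature is an assumption, and so is mapping an empty path (`https://example.com`) to `/` to match the "root paths are kept as /" rule.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// normalizeCrawlURL is a hypothetical sketch of the rules described in this
// commit, not the merged implementation. It drops the fragment and query
// string and trims trailing slashes from non-root paths, so /page and /page/
// produce the same dedup key. An empty or all-slash path becomes "/".
func normalizeCrawlURL(raw string) (string, error) {
	u, err := url.Parse(raw)
	if err != nil {
		return "", err
	}
	// A fragment is never sent to the server, so it cannot change the fetched
	// page; the query string is dropped as the commit message describes.
	u.Fragment = ""
	u.RawFragment = ""
	u.RawQuery = ""
	u.ForceQuery = false

	p := strings.TrimRight(u.Path, "/")
	if p == "" {
		p = "/" // root (or empty) path is kept as "/"
	}
	u.Path = p
	u.RawPath = "" // let String() re-derive the escaped path from Path
	return u.String(), nil
}

func main() {
	for _, raw := range []string{
		"https://example.com/page",
		"https://example.com/page/",
		"https://example.com/page/?ref=nav#intro",
		"https://example.com/",
	} {
		n, err := normalizeCrawlURL(raw)
		if err != nil {
			fmt.Println("skip:", err)
			continue
		}
		fmt.Printf("%-40s -> %s\n", raw, n)
	}
}
```

Because the sketch trims every trailing slash rather than just one, `/page//` also collapses to `/page`; the real implementation may treat that edge case differently.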
Summary
- `sendit generate --url` was producing duplicate targets for trailing-slash variants (e.g. `https://example.com/page` and `https://example.com/page/` both appeared). Added `normalizeCrawlURL`, which strips fragments and query strings and removes trailing slashes from non-root paths before deduplication. It is applied both in `extractLinks` and to the seed URL.
- `_index.md` (docs homepage): added a "Terminal UI" section at the top with an example dashboard block, so the feature is visible before the section table.
- `getting-started.md`: added a "Run with the terminal UI" section with example `--tui` output, positioned between "Run" and "Run with Docker".

Test plan
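The manual check in this test plan could be scripted. A minimal Go sketch, assuming the generated target URLs have already been collected into a slice of strings (reading sendit's config format is not shown here); `findSlashDuplicates` is a hypothetical helper, not part of sendit:

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// findSlashDuplicates groups targets that differ only by trailing slashes,
// e.g. https://example.com/page and https://example.com/page/.
// Hypothetical helper for the test plan; not part of sendit.
func findSlashDuplicates(targets []string) [][]string {
	groups := make(map[string][]string)
	seen := make(map[string]bool)
	for _, t := range targets {
		if seen[t] {
			continue // count each distinct string once; exact repeats are a separate issue
		}
		seen[t] = true
		key := strings.TrimRight(t, "/")
		groups[key] = append(groups[key], t)
	}
	var dups [][]string
	for _, g := range groups {
		if len(g) > 1 {
			sort.Strings(g)
			dups = append(dups, g)
		}
	}
	// Map iteration order is random, so sort the groups for stable output.
	sort.Slice(dups, func(i, j int) bool { return dups[i][0] < dups[j][0] })
	return dups
}

func main() {
	targets := []string{
		"https://example.com/page",
		"https://example.com/page/",
		"https://example.com/other",
	}
	for _, g := range findSlashDuplicates(targets) {
		fmt.Println("duplicate targets:", strings.Join(g, ", "))
	}
}
```

An empty result for the `--depth 1` crawl of golang.org would match the expectation stated in the test plan.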
- `sendit generate --url https://golang.org --depth 1` produces no trailing-slash duplicates

🤖 Generated with Claude Code