@Mearman Mearman commented Jul 30, 2025

Summary

Adds a dedicated check-links command optimized for validating external HTTP/HTTPS links, addressing the feature request in issue #37.

Key Features

🔄 Smart retry logic for temporary failures

  • Configurable retry attempts (default: 3)
  • Configurable delay between retries (default: 1000ms)
  • Exponential backoff support
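
The retry behaviour listed above could look roughly like the following. This is a minimal sketch, not the code in src/commands/check-links.ts: `withRetry`, `RetryOptions`, and `backoffFactor` are hypothetical names, and it assumes "retry attempts" means retries after the first try.

```typescript
// Hypothetical helper; names and option shape are assumptions for illustration.
interface RetryOptions {
  retries: number;       // retries after the first attempt (assumed meaning of "default: 3")
  delayMs: number;       // base delay between attempts (default in the PR: 1000ms)
  backoffFactor: number; // 2 doubles the delay on each retry; 1 keeps it constant
}

type Sleep = (ms: number) => Promise<void>;

const defaultSleep: Sleep = (ms) =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

async function withRetry<T>(
  task: () => Promise<T>,
  options: RetryOptions = { retries: 3, delayMs: 1000, backoffFactor: 2 },
  sleep: Sleep = defaultSleep,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt <= options.retries; attempt++) {
    try {
      return await task();
    } catch (error) {
      lastError = error;
      // Wait only if another attempt will follow.
      if (attempt < options.retries) {
        await sleep(options.delayMs * Math.pow(options.backoffFactor, attempt));
      }
    }
  }
  throw lastError;
}
```

Passing `sleep` as a parameter is a design choice made here so the delays can be observed in tests without waiting in real time.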

High-performance parallel checking

  • Configurable concurrency (default: 10 concurrent requests)
  • HEAD requests by default (faster than GET)
  • Optional timeout configuration (default: 10s)
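
One common way to cap the number of in-flight requests is a small worker pool. The sketch below illustrates that idea only; `mapWithConcurrency` is a hypothetical helper, and the PR does not state how the command actually enforces its concurrency limit.

```typescript
// Hypothetical helper: runs `worker` over `items` with at most `limit` calls in flight.
async function mapWithConcurrency<T, R>(
  items: readonly T[],
  limit: number,
  worker: (item: T) => Promise<R>,
): Promise<R[]> {
  const results: R[] = new Array<R>(items.length);
  // Each job remembers its index so results keep the input order.
  const queue = items.map((item, index) => ({ item, index }));

  async function runWorker(): Promise<void> {
    for (let job = queue.shift(); job !== undefined; job = queue.shift()) {
      results[job.index] = await worker(job.item);
    }
  }

  // Start up to `limit` runners (at least one); each pulls jobs until the queue is empty.
  const runnerCount = Math.max(1, Math.min(limit, items.length));
  const runners: Promise<void>[] = [];
  for (let i = 0; i < runnerCount; i++) {
    runners.push(runWorker());
  }
  await Promise.all(runners);
  return results;
}
```

With the default described above, a call such as `mapWithConcurrency(urls, 10, checkUrl)` would keep at most ten checks running at once, where `checkUrl` stands in for whatever function performs a single request.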

📊 Multiple output formats

  • Text (human-readable console output)
  • JSON (structured data for programmatic use)
  • Markdown (formatted reports)
  • CSV (spreadsheet-compatible)
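
For the CSV format, the main subtlety is quoting fields that contain commas, quotes, or line breaks. The sketch below shows one way to do that; the `ReportRow` shape and the column names are assumptions for illustration, not the command's actual output schema.

```typescript
// Hypothetical row shape; the real report columns may differ.
interface ReportRow {
  file: string;
  url: string;
  status: number;
}

// Quote a field only when it contains a comma, a double quote, or a line break,
// doubling any embedded double quotes.
function escapeCsvField(value: string): string {
  return /[",\r\n]/.test(value) ? `"${value.replace(/"/g, '""')}"` : value;
}

function toCsv(rows: readonly ReportRow[]): string {
  const header = "file,url,status";
  const lines = rows.map((row) =>
    [row.file, row.url, String(row.status)].map(escapeCsvField).join(","),
  );
  return [header, ...lines].join("\n");
}
```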

🤖 Bot-detection handling

  • Ignores 403/999 status codes by default (common bot-detection responses)
  • Configurable ignore patterns with regex support
  • Follow/no-follow redirects option
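
The ignore rules above can be pictured as two small predicates: one applied to a URL before it is checked, one applied to the status code afterwards. This is a sketch under assumptions: `parseList`, `shouldSkipUrl`, and `isIgnoredStatus` are hypothetical names, and it assumes comma-separated option values as in the usage examples.

```typescript
// Split a comma-separated option value such as "403,999,503" into trimmed parts.
function parseList(value: string): string[] {
  return value
    .split(",")
    .map((part) => part.trim())
    .filter((part) => part.length > 0);
}

// A URL is skipped when any ignore pattern, treated as a regular expression, matches it.
function shouldSkipUrl(url: string, ignorePatterns: readonly string[]): boolean {
  return ignorePatterns.some((pattern) => new RegExp(pattern).test(url));
}

// 403 and 999 are the defaults described in this PR; callers can pass their own list.
function isIgnoredStatus(
  status: number,
  ignored: readonly number[] = [403, 999],
): boolean {
  return ignored.some((code) => code === status);
}
```

Because patterns are compiled as regular expressions, a value like `127.0.0.1` matches slightly more than the literal address (each `.` matches any character); escaping the dots would make it exact.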

📈 Progress tracking and statistics

  • Progress indicators for large documentation sets
  • Response time measurement and averages
  • Cache hit rate reporting (ready for future caching implementation)
  • Domain-based analysis and grouping
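
Domain grouping and response-time averages reduce to a couple of small aggregations over the per-link results. The sketch below assumes a hypothetical `CheckedLink` result shape; the actual interfaces in the command may differ.

```typescript
// Hypothetical result shape for a single checked link.
interface CheckedLink {
  url: string;
  ok: boolean;
  responseTimeMs: number;
}

// Group results by hostname, e.g. all links under "example.com" together.
function groupByDomain(results: readonly CheckedLink[]): Map<string, CheckedLink[]> {
  const groups = new Map<string, CheckedLink[]>();
  for (const result of results) {
    const domain = new URL(result.url).hostname;
    const bucket = groups.get(domain);
    if (bucket) {
      bucket.push(result);
    } else {
      groups.set(domain, [result]);
    }
  }
  return groups;
}

// Mean response time in milliseconds; 0 for an empty result set.
function averageResponseTime(results: readonly CheckedLink[]): number {
  if (results.length === 0) return 0;
  const total = results.reduce((sum, result) => sum + result.responseTimeMs, 0);
  return total / results.length;
}
```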

Usage Examples

# Check all external links in current directory
markmv check-links

# Check with custom timeout and retry logic
markmv check-links "docs/**/*.md" --timeout 15000 --retry 5

# Generate JSON report
markmv check-links --format json > external-links-report.json

# Generate markdown report grouped by domain
markmv check-links --format markdown --group-by domain > report.md

# High concurrency checking with GET requests
markmv check-links --concurrency 20 --method GET --verbose

# Ignore specific patterns and status codes
markmv check-links --ignore-patterns "localhost,127.0.0.1" --ignore-status "403,999,503"

Implementation Details

  • New command file: src/commands/check-links.ts
  • CLI integration: Added to main CLI with comprehensive options
  • Type-safe: Full TypeScript implementation with proper interfaces
  • Error handling: Graceful handling of network failures and timeouts
  • CI/CD friendly: Returns exit code 1 if broken links are found

Comparison with existing validate --check-external

| Feature | validate --check-external | check-links |
| --- | --- | --- |
| Focus | General validation with external option | External links only |
| Retry logic | Basic | Smart retry with configurable delays |
| Output formats | Text, JSON | Text, JSON, Markdown, CSV |
| Progress tracking | Basic | Detailed with statistics |
| Bot handling | No | Built-in 403/999 ignore |
| Concurrency | Not configurable | Fully configurable |
| Domain grouping | No | Yes |
| Response times | No | Yes |

Test Plan

  • Command builds successfully
  • Help text displays correctly
  • Dry-run mode works as expected
  • Basic CLI option parsing functions
  • Test with actual external links (requires network)
  • Test all output formats
  • Test retry logic with failing links
  • Test concurrency limits
  • Test ignore patterns and status codes

Breaking Changes

None - this is a new command that doesn't affect existing functionality.

Related Issues

Resolves #37: Feature Request: Add standalone command for external link validation

Mearman added 3 commits July 30, 2025 11:26
- Add dedicated check-links command optimized for external HTTP/HTTPS links
- Support smart retry logic with configurable attempts and delays
- Provide multiple output formats: text, JSON, markdown, CSV
- Include progress indicators and response time measurements
- Add bot-detection handling (ignores 403/999 status codes by default)
- Support domain-based grouping and ignore patterns
- Implement configurable concurrency for parallel checking
- Add comprehensive CLI options for timeout, retries, and formatting

Resolves #37: Feature Request: Add standalone command for external link validation

- Add 17 test cases covering all major functionality
- Test external link detection and validation
- Test retry logic with configurable delays
- Test ignore patterns and status code filtering
- Test multiple output formats (text, JSON, markdown, CSV)
- Test result grouping by file, status, and domain
- Test glob pattern file discovery
- Test error handling and edge cases
- Fix ignore pattern implementation to properly filter before counting
- All tests passing with complete coverage of command functionality

Enhances PR #39 with proper test coverage

- Replace type coercions with proper type guards
- Use proper TypeScript types instead of 'any'
- Fix mocking in tests with vi.mocked
- Remove non-null assertions with proper null checks
- Maintain strict type safety without type coercions