Git integration for incremental validation #36

@Mearman

Problem

Full link validation on large documentation projects is slow and resource-intensive. For projects with 100+ Markdown files and hundreds of external links, a single run can take several minutes.

Real-world Context

In a typical development workflow:

  • Most commits only change a few documentation files
  • Full validation runs check unchanged files repeatedly
  • CI pipelines waste time validating previously verified links
  • Pre-commit hooks become too slow to be practical

Proposed Solution

Add Git integration to enable incremental validation:

  1. Changed files detection

    # Only validate files changed since the previous commit (HEAD~1)
    markmv validate --git-diff HEAD~1
    
    # Only validate files that differ between main and the current branch
    markmv validate --git-diff main..HEAD
    
    # Only validate staged changes
    markmv validate --git-staged
  2. Smart dependency tracking

    • Detect when shared files (like common link references) change
    • Validate dependent files that reference changed content
    • Handle moved/renamed files intelligently
  3. Validation caching

    • Store validation results keyed by Git commit hash
    • Skip re-validation of unchanged files whose external links are the same
    • Invalidate the cache when external validation rules change
  4. Pre-commit hook integration

    # Fast pre-commit validation
    markmv validate --git-staged --cache --fail-fast
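
As a rough illustration of step 1 (and the rename handling in step 2), the changed-file list could be derived from the text that `git diff --name-status <range>` prints (or `git diff --cached --name-status` for staged changes). This is only a sketch: `ChangedFile`, `parseNameStatus`, and `filesToValidate` are hypothetical names, not part of markmv today.

```typescript
// Hypothetical types and helpers for illustration; not markmv's actual API.
type ChangeKind = "added" | "modified" | "deleted" | "renamed";

interface ChangedFile {
  kind: ChangeKind;
  path: string; // current path (the new path for renames)
  previousPath?: string; // set only for renames
}

// Parse the tab-separated output of `git diff --name-status <range>`.
// Lines look like "M<TAB>README.md" or "R100<TAB>old.md<TAB>new.md".
function parseNameStatus(output: string): ChangedFile[] {
  const changes: ChangedFile[] = [];
  for (const line of output.split("\n")) {
    if (line.trim() === "") continue;
    const parts = line.split("\t");
    const code = (parts[0] ?? "").charAt(0);
    const first = parts[1];
    const second = parts[2];
    if (first === undefined) continue; // malformed line, skip it
    if (code === "R" && second !== undefined) {
      changes.push({ kind: "renamed", path: second, previousPath: first });
    } else if (code === "A") {
      changes.push({ kind: "added", path: first });
    } else if (code === "M") {
      changes.push({ kind: "modified", path: first });
    } else if (code === "D") {
      changes.push({ kind: "deleted", path: first });
    }
    // Other status codes (C, T, U, ...) are ignored in this sketch.
  }
  return changes;
}

// Deleted files have nothing left to validate; keep Markdown files only.
function filesToValidate(changes: ChangedFile[]): string[] {
  return changes
    .filter((c) => c.kind !== "deleted" && c.path.endsWith(".md"))
    .map((c) => c.path);
}
```

For a rename, `previousPath` is kept so that links pointing at the old location can be re-checked in other files.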

Expected Output

$ markmv validate --git-diff HEAD~1

🔍 Git Integration
Changed since HEAD~1: 3 files
- docs/firebase-setup.md (modified)
- README.md (modified)
- docs/new-feature.md (added)

📊 Validation Summary
Files processed: 3 (97 cached, 2 unchanged)
Total links found: 23 (140 from cache)
New/changed links: 8
Broken links: 0
Processing time: 847ms (was 29s for full validation)

💾 Cache Status
- Cached results: 97 files, 140 links
- Cache hit rate: 97.1%
- Last full validation: 2 hours ago

Configuration Options

# .markmv.yml
git:
  enabled: true
  cache:
    enabled: true
    location: ".markmv-cache"
    ttl: "24h"  # Force re-check external links after 24h
  hooks:
    pre-commit: true
    fail-fast: true  # Exit on first error for faster feedback
  diff:
    base-branch: "main"
    include-dependencies: true  # Check files that reference changed files
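
To make the `ttl` and cache invalidation ideas more concrete, a cache lookup might look roughly like the sketch below. The `CacheEntry` shape, the `configHash` field, and the supported duration units are all assumptions for illustration. `contentHash` could just as well be the Git blob hash of the file.

```typescript
// Hypothetical cache entry; the field names are assumptions.
interface CacheEntry {
  contentHash: string; // hash of the file contents when it was validated
  configHash: string; // hash of the validation config in effect at that time
  validatedAt: number; // epoch milliseconds
}

// Convert a duration such as "24h", "30m", or "7d" into milliseconds.
function parseTtl(ttl: string): number {
  const match = /^(\d+)(s|m|h|d)$/.exec(ttl.trim());
  if (match === null) {
    throw new Error("Unsupported ttl: " + ttl);
  }
  const amount = Number(match[1]);
  switch (match[2]) {
    case "s":
      return amount * 1000;
    case "m":
      return amount * 60 * 1000;
    case "h":
      return amount * 60 * 60 * 1000;
    default:
      return amount * 24 * 60 * 60 * 1000; // "d"
  }
}

// A cached result is reused only if the file and the config are unchanged
// and the entry is younger than the configured ttl.
function isCacheHit(
  entry: CacheEntry | undefined,
  contentHash: string,
  configHash: string,
  ttl: string,
  now: number,
): boolean {
  if (entry === undefined) return false;
  if (entry.contentHash !== contentHash) return false; // file changed
  if (entry.configHash !== configHash) return false; // rules changed
  return now - entry.validatedAt < parseTtl(ttl); // external links expire
}
```

Passing `now` in explicitly (rather than calling `Date.now()` inside) keeps the check easy to test.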

Pre-commit Hook Setup

# .pre-commit-config.yaml
repos:
  - repo: local
    hooks:
      - id: markmv-validate
        name: Validate markdown links
        entry: markmv validate --git-staged --fail-fast --cache
        language: node
        additional_dependencies: ["markmv"]  # assumes markmv is installed from npm under this package name
        files: '\.md$'
        pass_filenames: false
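
The `--fail-fast` behaviour used in the hook entry above could be as simple as stopping at the first file that reports a broken link. A minimal sketch, assuming a hypothetical `checkFile` callback that stands in for the real per-file validator:

```typescript
// Hypothetical result shape; not markmv's actual output type.
interface FileReport {
  file: string;
  brokenLinks: string[];
}

// Validate files in order and stop at the first file with a broken link.
function validateFailFast(
  files: string[],
  checkFile: (file: string) => FileReport,
): { reports: FileReport[]; failed: boolean } {
  const reports: FileReport[] = [];
  for (const file of files) {
    const report = checkFile(file);
    reports.push(report);
    if (report.brokenLinks.length > 0) {
      return { reports, failed: true }; // remaining files are skipped
    }
  }
  return { reports, failed: false };
}
```

The CLI would then exit with a non-zero status when `failed` is true, which is what makes pre-commit block the commit.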

CI/CD Integration Benefits

# GitHub Actions example
- name: Validate documentation links
  run: |
    if [[ "${{ github.event_name }}" == "pull_request" ]]; then
      # Only validate files changed in the PR (the base ref must be fetched,
      # e.g. actions/checkout with fetch-depth: 0)
      markmv validate --git-diff origin/${{ github.base_ref }}..HEAD
    else
      # Full validation on main branch (with caching)
      markmv validate --cache
    fi

Advanced Features

  1. Dependency tracking

    • When shared-links.md changes, validate all files that reference it
    • Handle link includes and shared reference files
  2. Smart invalidation

    • Invalidate cache when markmv config changes
    • Detect when external link patterns change
  3. Parallel processing

    • Validate changed files in parallel while reusing cached results for unchanged ones
    • Maintain performance even with complex dependency graphs
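
The dependency tracking idea above could be sketched as a reverse lookup over a link graph: given the changed files, walk backwards to every file that references them. `LinkGraph` and both function names are hypothetical, and the graph is assumed to be built elsewhere (for example while parsing each Markdown file).

```typescript
// Maps a file to the files it links to or includes (hypothetical input).
type LinkGraph = Map<string, string[]>;

// Invert the graph: for each target, which files reference it?
function buildReverseGraph(graph: LinkGraph): Map<string, Set<string>> {
  const reverse = new Map<string, Set<string>>();
  graph.forEach((targets, source) => {
    for (const target of targets) {
      let dependents = reverse.get(target);
      if (dependents === undefined) {
        dependents = new Set<string>();
        reverse.set(target, dependents);
      }
      dependents.add(source);
    }
  });
  return reverse;
}

// Breadth-first walk from the changed files through their dependents.
// Returns the changed files plus everything that references them,
// directly or transitively, in sorted order.
function expandWithDependents(changed: string[], graph: LinkGraph): string[] {
  const reverse = buildReverseGraph(graph);
  const seen = new Set<string>(changed);
  const queue = changed.slice();
  while (queue.length > 0) {
    const current = queue.shift();
    if (current === undefined) break;
    const dependents = reverse.get(current);
    if (dependents === undefined) continue;
    dependents.forEach((dependent) => {
      if (!seen.has(dependent)) {
        seen.add(dependent);
        queue.push(dependent);
      }
    });
  }
  return Array.from(seen).sort();
}
```

This version follows references transitively, which matters for includes. For plain links, stopping after one level (direct referencers only) may be enough and would keep the validated set smaller.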

Benefits

  • Faster development workflow: Pre-commit hooks could run in under 1 second instead of roughly 30 seconds (compare 847ms vs 29s in the expected output)
  • Efficient CI/CD: PR checks only validate relevant changes
  • Better developer experience: Quick feedback without sacrificing thoroughness
  • Resource optimization: Reduce unnecessary external requests
  • Scalability: Handle large documentation projects efficiently

This would make markmv practical for large-scale projects and enable seamless integration into modern development workflows.
