A comprehensive, production-ready web crawler built in Rust to analyze and improve website quality.
RustCrawler provides three types of website analysis with multiple output formats:
The SEO crawler analyzes search engine optimization aspects:
- Title tag presence and length
- Meta description tags
- H1 heading tags
- Canonical URL tags
- Robots meta tags
- Internal link validation (configurable limit)
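As an illustration of what a title check can look like, here is a minimal sketch based on plain string search; the helper names and thresholds are hypothetical, and the actual seo.rs logic may differ:

```rust
/// Return the character count of the <title> text, if a title tag is present.
/// Illustrative helper only; not the project's actual implementation.
fn title_length(html: &str) -> Option<usize> {
    let lower = html.to_lowercase();
    let start = lower.find("<title")?;
    let open_end = start + lower[start..].find('>')? + 1;
    let close = open_end + lower[open_end..].find("</title>")?;
    Some(lower[open_end..close].trim().chars().count())
}

/// Flag common title problems. The 10-60 character range is a typical
/// rule of thumb, assumed here for the sketch.
fn title_issue(html: &str) -> Option<&'static str> {
    match title_length(html) {
        None => Some("missing <title> tag"),
        Some(n) if n < 10 => Some("title is very short"),
        Some(n) if n > 60 => Some("title is longer than 60 characters"),
        Some(_) => None,
    }
}
```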
The Performance crawler evaluates website performance metrics:
- Response time measurement
- Page size analysis
- External resource counting (scripts, stylesheets)
- Compression detection (Brotli, Gzip, Deflate)
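For example, the response-time and page-size measurements can be as simple as timing a single request. This is an illustrative sketch assuming a `reqwest::Client` and the tokio runtime, not the exact performance.rs code:

```rust
use std::time::Instant;

/// Fetch a page once, returning (elapsed milliseconds, body size in bytes).
/// Hypothetical helper for illustration; the real crawler may measure differently.
async fn measure(client: &reqwest::Client, url: &str) -> reqwest::Result<(u128, usize)> {
    let start = Instant::now();
    let body = client.get(url).send().await?.text().await?;
    let elapsed_ms = start.elapsed().as_millis();
    Ok((elapsed_ms, body.len()))
}
```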
The A11Y (accessibility) crawler checks web accessibility standards:
- HTML lang attribute
- Image alt attributes
- ARIA landmarks and attributes
- Semantic HTML5 tags
- Form label associations
- Skip navigation links
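Two of these checks, the `lang` attribute and image `alt` attributes, could be approximated with plain string scanning, as in this hypothetical sketch (the real a11y.rs implementation may differ):

```rust
/// Does the <html> tag carry a lang attribute? Illustrative only.
fn has_lang_attribute(html: &str) -> bool {
    let lower = html.to_lowercase();
    match lower.find("<html") {
        Some(i) => {
            let tag_end = lower[i..].find('>').map(|e| i + e).unwrap_or(lower.len());
            lower[i..tag_end].contains(" lang=")
        }
        None => false,
    }
}

/// Count <img> tags that declare no alt attribute. Illustrative only.
fn images_missing_alt(html: &str) -> usize {
    let lower = html.to_lowercase();
    let mut missing = 0;
    for (i, _) in lower.match_indices("<img") {
        let tag_end = lower[i..].find('>').map(|e| i + e).unwrap_or(lower.len());
        if !lower[i..tag_end].contains(" alt=") {
            missing += 1;
        }
    }
    missing
}
```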
Reports can be produced in three formats:
- Terminal: Color-coded, human-readable output
- JSON: Machine-readable format for integration
- HTML: Styled report for sharing
To build and run the project you need:
- Docker
- Make
Note: Rust and Cargo are NOT required on your host machine. They are included in the Docker container.
First, build the Docker image with the latest Rust version:
make install

This command downloads and sets up a Docker container with the latest version of Rust.
make run # Run in debug mode
make run-release # Run in release mode
Both commands follow an interactive prompt to select the URL and which crawlers to run.
# Analyze a URL with all crawlers
docker run --rm rustcrawler cargo run -- --url https://example.com --all
# Run specific crawlers
docker run --rm rustcrawler cargo run -- --url https://example.com --seo --performance
# Generate JSON report
docker run --rm rustcrawler cargo run -- --url https://example.com --all --format json --output report.json
# Generate HTML report
docker run --rm rustcrawler cargo run -- --url https://example.com --all --format html --output report.html
# Use custom configuration
docker run --rm rustcrawler cargo run -- --url https://example.com --all --config config.json
# Override settings
docker run --rm rustcrawler cargo run -- --url https://example.com --all --timeout 60 --max-links 20

Command-line options:
- --url <URL>: URL to analyze
- --seo: Run the SEO crawler
- --performance: Run the Performance crawler
- --a11y: Run the A11Y crawler
- --all: Run all crawlers
- --format <terminal|json|html>: Output format (default: terminal)
- --output <FILE>: Output file for JSON/HTML reports
- --config <FILE>: Configuration file path
- --timeout <SECONDS>: Request timeout in seconds
- --max-links <N>: Maximum number of internal links to check
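For reference, options like these map naturally onto a `clap` derive struct. The following is an illustrative sketch with assumed field names, not the project's actual cli.rs:

```rust
use clap::Parser;

/// Hypothetical CLI definition mirroring the flags listed above.
#[derive(Parser, Debug)]
#[command(name = "rustcrawler", about = "Analyze a website's SEO, performance, and accessibility")]
struct Cli {
    /// URL to analyze
    #[arg(long)]
    url: Option<String>,
    /// Run the SEO crawler
    #[arg(long)]
    seo: bool,
    /// Run the Performance crawler
    #[arg(long)]
    performance: bool,
    /// Run the A11Y crawler
    #[arg(long)]
    a11y: bool,
    /// Run all crawlers
    #[arg(long)]
    all: bool,
    /// Output format: terminal, json, or html
    #[arg(long, default_value = "terminal")]
    format: String,
    /// Output file for JSON/HTML reports
    #[arg(long)]
    output: Option<String>,
    /// Path to a JSON configuration file
    #[arg(long)]
    config: Option<String>,
    /// Request timeout in seconds
    #[arg(long)]
    timeout: Option<u64>,
    /// Maximum number of internal links to check
    #[arg(long)]
    max_links: Option<usize>,
}

fn main() {
    // Parse the process arguments into the struct above.
    let cli = Cli::parse();
    println!("{cli:?}");
}
```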
Create a config.json:
{
"timeout_secs": 30,
"max_links_to_check": 10,
"user_agent": "RustCrawler/0.1.0",
"follow_redirects": true,
"max_redirects": 5
}
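On the Rust side, such a file can be deserialized into a config struct with serde. This is a minimal sketch whose type and function names are assumptions, not the project's actual config.rs:

```rust
use serde::Deserialize;

/// Mirrors the fields of config.json shown above (illustrative type).
#[derive(Debug, Deserialize)]
struct Config {
    timeout_secs: u64,
    max_links_to_check: usize,
    user_agent: String,
    follow_redirects: bool,
    max_redirects: usize,
}

/// Read and parse a JSON configuration file from disk.
fn load_config(path: &str) -> Result<Config, Box<dyn std::error::Error>> {
    let raw = std::fs::read_to_string(path)?;
    Ok(serde_json::from_str(&raw)?)
}

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let config = load_config("config.json")?;
    println!("{config:?}");
    Ok(())
}
```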
All commands run inside the Docker container, so you don't need Rust installed locally.

make build # Build in debug mode
make build-release # Build in release mode
make test # Run all tests (17 tests)
make test-verbose # Run tests with verbose output
make format # Format code with rustfmt
make format-check # Check formatting without modifying files
make lint # Run clippy linter
make check # Check if code compiles
make shell # Open a shell in the Docker container
make clean # Remove build artifacts and Docker image
make help # Display all available targets

The project uses the following main dependencies:
- `reqwest` - HTTP client for making requests
- `url` - URL parsing and validation
- `colored` - Terminal color output
- `tokio` - Async runtime
- `thiserror` - Custom error types
- `serde` / `serde_json` - Serialization for JSON output
- `clap` - Command-line argument parsing
- `chrono` - Date/time handling for reports
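As an example of how `thiserror` is typically used for the custom error types, a hypothetical error enum might look like this (variant names are illustrative, not the project's actual error.rs):

```rust
use thiserror::Error;

/// Illustrative error type wrapping the failure modes a crawler typically hits.
#[derive(Debug, Error)]
enum CrawlerError {
    #[error("request failed: {0}")]
    Http(#[from] reqwest::Error),

    #[error("invalid URL: {0}")]
    InvalidUrl(#[from] url::ParseError),

    #[error("I/O error: {0}")]
    Io(#[from] std::io::Error),
}
```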
The project follows Rust best practices with a modular architecture:
RustCrawler/
├── src/
│   ├── main.rs # Application entry point with CLI
│   ├── lib.rs # Library root with public exports
│   ├── cli.rs # CLI argument definitions
│   ├── config.rs # Configuration management
│   ├── error.rs # Custom error types
│   ├── client.rs # HTTP client wrapper
│   ├── models.rs # Data models and validation
│   ├── output.rs # JSON/HTML report generation
│   ├── utils.rs # Utility functions for I/O and display
│   └── crawlers/
│       ├── mod.rs # Crawler trait and common functions
│       ├── seo.rs # SEO crawler implementation
│       ├── performance.rs # Performance crawler implementation
│       └── a11y.rs # Accessibility crawler implementation
├── Cargo.toml # Rust dependencies and project configuration
├── Dockerfile # Docker container setup
├── Makefile # Build and run commands
├── ARCHITECTURE.md # Detailed architecture documentation
└── README.md # This file
- Modular Design: Each crawler is implemented in its own module with the `Crawler` trait
- Separation of Concerns: HTTP client, models, configuration, and utilities are separate modules
- Error Handling: Custom error types using `thiserror` for better error messages
- Configuration: Externalized configuration with JSON file support
- CLI + Interactive: Supports both command-line and interactive modes
- Multiple Outputs: Terminal, JSON, and HTML report formats
- Testable: 17 unit tests covering all major functionality
- Extensible: Easy to add new crawlers by implementing the `Crawler` trait (see the sketch after this list)
- Type Safety: Strong typing with custom models for data structures
- Library + Binary: Can be used as a library or standalone application
When contributing to this project:
- Ensure your code builds with `make build`
- Run tests with `make test` (17 tests should pass)
- Format code with `make format`
- Check for linting issues with `make lint`
- Follow Rust naming conventions and best practices
- Add tests for new functionality
Implemented features:
- ✅ Custom error types with `thiserror`
- ✅ Configuration management (JSON file support)
- ✅ CLI with `clap` for non-interactive use
- ✅ JSON and HTML output formats
- ✅ Configurable timeouts and limits
- ✅ User-agent customization
- ✅ Redirect policy configuration
- ✅ 17 comprehensive unit tests
Planned improvements:
- Async/await for parallel crawling
- HTML parser (`scraper` crate) for more accurate analysis
- Integration tests with mock servers
- Sitemap crawling
- Rate limiting
- Retry logic with exponential backoff
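One possible shape for the planned retry logic, sketched here purely as an illustration (none of this exists in the project yet):

```rust
use std::time::Duration;

/// Retry a GET request with exponential backoff, up to max_attempts tries.
/// Assumes a reqwest::Client and the tokio runtime; names are illustrative.
async fn get_with_retry(
    client: &reqwest::Client,
    url: &str,
    max_attempts: u32,
) -> reqwest::Result<reqwest::Response> {
    let mut delay = Duration::from_millis(500);
    let mut attempt = 1;
    loop {
        match client.get(url).send().await {
            Ok(resp) => return Ok(resp),
            // Out of attempts: surface the last error to the caller.
            Err(err) if attempt >= max_attempts => return Err(err),
            Err(_) => {
                tokio::time::sleep(delay).await;
                delay *= 2; // double the wait before the next attempt
                attempt += 1;
            }
        }
    }
}
```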