RegexParser

RegexParser: Static Analysis, Linter & Logic Solver

RegexParser is a PHP 8.2+ library that treats regular expressions as code.

Unlike simple wrappers around preg_match, RegexParser implements a complete compiler pipeline (Lexer → Parser → AST) and an Automata-based Logic Solver (AST → NFA → DFA).

This architecture allows for advanced static analysis:

  • Linting: Detect redundancy, useless flags, and optimization opportunities.
  • Safety: Statically detect catastrophic backtracking (ReDoS).
  • Logic: Mathematically compare patterns (Intersection, Equivalence, Subset).

Built for learning, validation, and robust tooling in PHP projects.

If you are new to regex, start with the Regex Tutorial. If you want a short overview, see the Quick Start Guide.

Getting started

# Install the library
composer require yoeunes/regex-parser

# Try the CLI
vendor/bin/regex explain '/\d{4}-\d{2}-\d{2}/'

What RegexParser provides

  • 🏗️ Deep Parsing: Parse /pattern/flags into a structured, typed AST.
  • 🧠 Logic Solver: Mathematically compare two regexes using NFA/DFA transformation. Detect route conflicts and validate security subsets.
  • 🛡️ ReDoS Analysis: Analyze pattern structure for potential catastrophic backtracking.
  • 🧹 Linter: Clean up legacy code (useless flags, redundant groups) via the CLI.
  • 📖 Explanation: Explain patterns in plain English.
  • 🔧 Visitor API: A flexible API for building custom regex tooling.
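
To give a feel for what comparing two patterns through automata involves, here is a minimal sketch of the final step only: checking two already-built DFAs for language equivalence by exploring their product automaton. The array layout, the hand-built DFAs, and the `dfaEquivalent()` name are assumptions made for illustration; they are not RegexParser's API, and the library's own solver may work differently.

```php
<?php
// Decide whether two complete DFAs accept the same language by exploring
// the product automaton: the languages differ if and only if some reachable
// state pair is accepting in one DFA but not in the other.
// Hypothetical helper and data layout, not part of RegexParser.
function dfaEquivalent(array $left, array $right, array $alphabet): bool
{
    $queue = [[$left['start'], $right['start']]];
    $seen = [];
    while ($queue !== []) {
        [$p, $q] = array_shift($queue);
        $key = $p . '|' . $q;
        if (isset($seen[$key])) {
            continue;
        }
        $seen[$key] = true;
        if (in_array($p, $left['accept'], true) !== in_array($q, $right['accept'], true)) {
            return false; // some word reaches this pair and separates the two languages
        }
        foreach ($alphabet as $symbol) {
            $queue[] = [$left['delta'][$p][$symbol], $right['delta'][$q][$symbol]];
        }
    }
    return true;
}

// Hand-built DFA for /^a+$/ over {a, b}: 0 = start, 1 = accepting, 2 = dead.
$aPlus = [
    'start' => 0, 'accept' => [1],
    'delta' => [
        0 => ['a' => 1, 'b' => 2],
        1 => ['a' => 1, 'b' => 2],
        2 => ['a' => 2, 'b' => 2],
    ],
];

// Non-minimal DFA for /^(a|aa)+$/: states 1 and 3 track odd/even counts of 'a'.
$aOrAa = [
    'start' => 0, 'accept' => [1, 3],
    'delta' => [
        0 => ['a' => 1, 'b' => 2],
        1 => ['a' => 3, 'b' => 2],
        2 => ['a' => 2, 'b' => 2],
        3 => ['a' => 1, 'b' => 2],
    ],
];

// Hand-built DFA for /^a*$/: the start state is accepting, 1 = dead.
$aStar = [
    'start' => 0, 'accept' => [0],
    'delta' => [
        0 => ['a' => 0, 'b' => 1],
        1 => ['a' => 1, 'b' => 1],
    ],
];

var_dump(dfaEquivalent($aPlus, $aOrAa, ['a', 'b'])); // bool(true)
var_dump(dfaEquivalent($aPlus, $aStar, ['a', 'b'])); // bool(false)
```

The same product exploration answers subset and intersection questions by changing the condition tested on each state pair.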

Philosophy & Accuracy

RegexParser separates what it can guarantee from what is heuristic:

  • Guaranteed: parsing, AST structure, error offsets, and syntax validation for the targeted PHP/PCRE version.
  • Heuristic: ReDoS analysis is structural and conservative; treat a finding as a potential risk until it is confirmed.
  • Context matters: PCRE version, JIT, and backtrack/recursion limits change practical impact.
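
Because the practical impact depends on the runtime, it can help to record the relevant settings alongside any finding. The sketch below uses only PHP built-ins; `pcreEnvironment()` is a hypothetical helper name, not something RegexParser provides.

```php
<?php
// Collect the runtime settings that influence how a backtracking-heavy
// pattern behaves on this machine. Hypothetical helper, PHP built-ins only.
function pcreEnvironment(): array
{
    return [
        'php_version'     => PHP_VERSION,
        'pcre_version'    => PCRE_VERSION,                      // bundled or system PCRE2 version string
        'jit'             => ini_get('pcre.jit'),               // false if PCRE was built without JIT support
        'backtrack_limit' => ini_get('pcre.backtrack_limit'),
        'recursion_limit' => ini_get('pcre.recursion_limit'),
    ];
}

foreach (pcreEnvironment() as $key => $value) {
    printf("%-16s %s\n", $key, var_export($value, true));
}
```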

How to report a vulnerability responsibly

If you believe a pattern is exploitable:

  1. Run confirmed mode and capture a bounded, reproducible PoC.
  2. Include the pattern, input lengths, timings, JIT setting, and PCRE limits.
  3. Verify impact in the real code path before filing a security issue.
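
As a rough illustration of what a bounded, reproducible PoC could capture, the following sketch times a pattern against a capped series of input lengths using only PHP built-ins. It is independent of RegexParser's confirmed mode; `measureGrowth()` and the chosen lengths are assumptions for the example.

```php
<?php
// Time a pattern against inputs of increasing but capped length.
// Hypothetical helper, not a RegexParser API.
function measureGrowth(string $pattern, callable $makeInput, array $lengths): array
{
    $rows = [];
    foreach ($lengths as $n) {
        $subject = $makeInput($n);
        $start = hrtime(true);
        $result = preg_match($pattern, $subject);
        $elapsedMs = (hrtime(true) - $start) / 1e6;
        $rows[] = [
            'length' => $n,
            'ms'     => $elapsedMs,
            'result' => $result,                 // 1, 0, or false on a PCRE error
            'error'  => preg_last_error_msg(),   // reports a limit error if one was hit
        ];
    }
    return $rows;
}

// Keep the lengths small: for this pattern each extra character can roughly
// double the work, so the series stays short and bounded.
$rows = measureGrowth(
    '/(a+)+$/',
    fn (int $n): string => str_repeat('a', $n) . '!',
    [10, 14, 18, 22]
);

foreach ($rows as $row) {
    printf(
        "n=%d  %.3f ms  result=%s  error=%s\n",
        $row['length'],
        $row['ms'],
        var_export($row['result'], true),
        $row['error']
    );
}
```

Timings vary with JIT and the configured limits, so a report is easier to reproduce when these rows are accompanied by the PCRE version and ini settings in use.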

See SECURITY.md for reporting channels.

Safer rewrites (verify behavior)

These techniques reduce backtracking but can change matching behavior. Always validate with tests.

/(a+)+$/     -> /a+$/      (semantics often preserved, but verify captures)
/(a+)+$/     -> /a++$/     (possessive, no backtracking)
/(a|aa)+/    -> /a+/       (only if alternation is redundant)
/(a|aa)+/    -> /(?>a|aa)+/ (atomic, no backtracking into the group)
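
One way to validate a rewrite is to run both patterns over a set of sample subjects and compare match results and captures. The sketch below does this with `preg_match` alone; `diffPatterns()` and the sample subjects are hypothetical and would be replaced by cases from your own test suite.

```php
<?php
// Compare an original pattern and a candidate rewrite on sample subjects.
// Returns the subjects where the match result or the captured groups differ.
// Hypothetical helper, not a RegexParser API.
function diffPatterns(string $original, string $rewrite, array $subjects): array
{
    $differences = [];
    foreach ($subjects as $subject) {
        $a = preg_match($original, $subject, $capturesA);
        $b = preg_match($rewrite, $subject, $capturesB);
        if ($a !== $b || $capturesA !== $capturesB) {
            $differences[$subject] = ['original' => $capturesA, 'rewrite' => $capturesB];
        }
    }
    return $differences;
}

$subjects = ['', 'a', 'aaa', 'baaa', 'aab'];

// Dropping the group keeps the same overall matches but loses capture group 1.
$diff = diffPatterns('/(a+)+$/', '/a+$/', $subjects);
print_r(array_keys($diff)); // subjects 'a', 'aaa', 'baaa' differ (captures only)

// The possessive form agrees with /a+$/ on every sample subject.
var_dump(diffPatterns('/a+$/', '/a++$/', $subjects) === []); // bool(true)
```

Agreement on a handful of samples is evidence, not proof; it mainly catches capture changes and obvious behavioral drift.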

How it works

  • Regex::parse() splits the literal into pattern and flags.
  • The lexer produces a token stream.
  • The parser builds an AST (RegexNode).
  • Visitors walk the AST to validate, explain, analyze, or transform.

For the full architecture, see docs/ARCHITECTURE.md.
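
The stages listed above can be pictured with a deliberately tiny pipeline. This toy version handles only literal characters and the `+`, `*`, `?` quantifiers, and every function name and array shape in it is hypothetical; RegexParser's real lexer, parser, node classes, and visitors are far more complete.

```php
<?php
// Toy pipeline: literal -> (pattern, flags) -> tokens -> AST -> visitor.
// Illustrative only; none of these names come from RegexParser.

// Split '/pattern/flags' into its pattern and flags parts.
function splitLiteral(string $literal): array
{
    $delimiter = $literal[0];
    $end = strrpos($literal, $delimiter);
    return [substr($literal, 1, $end - 1), substr($literal, $end + 1)];
}

// Lexer: turn each character into a typed token.
function lex(string $pattern): array
{
    $tokens = [];
    foreach (str_split($pattern) as $char) {
        $type = in_array($char, ['+', '*', '?'], true) ? 'quantifier' : 'literal';
        $tokens[] = ['type' => $type, 'value' => $char];
    }
    return $tokens;
}

// Parser: attach each quantifier to the node that precedes it.
function parseTokens(array $tokens): array
{
    $children = [];
    foreach ($tokens as $token) {
        if ($token['type'] === 'quantifier') {
            $target = array_pop($children);
            if ($target === null) {
                throw new InvalidArgumentException('Quantifier without a target');
            }
            $children[] = ['node' => 'quantifier', 'op' => $token['value'], 'child' => $target];
        } else {
            $children[] = ['node' => 'literal', 'value' => $token['value']];
        }
    }
    return ['node' => 'sequence', 'children' => $children];
}

// Visitor-style walk: produce a plain-English description of the tree.
function explain(array $node): string
{
    return match ($node['node']) {
        'sequence'   => implode(', then ', array_map('explain', $node['children'])),
        'literal'    => "'" . $node['value'] . "'",
        'quantifier' => explain($node['child']) . match ($node['op']) {
            '+' => ' one or more times',
            '*' => ' zero or more times',
            '?' => ' optionally',
        },
    };
}

[$pattern, $flags] = splitLiteral('/ab+c?/i');
echo explain(parseTokens(lex($pattern))), "\n";
// prints: 'a', then 'b' one or more times, then 'c' optionally
```

Keeping the tree separate from the walk is what lets one parse feed several tools: a validator, an explainer, or a transformer can each be a different walk over the same nodes.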

CLI quick tour

# Parse and validate a pattern
vendor/bin/regex parse '/^hello world$/'

# Get plain English explanation
vendor/bin/regex explain '/\d{4}-\d{2}-\d{2}/'

# Check for potential ReDoS risk (theoretical by default)
vendor/bin/regex analyze '/(a+)+$/'

# Colorize pattern for better readability
vendor/bin/regex highlight '/\d+/'

# Lint your entire codebase
vendor/bin/regex lint src/

PHP API at a glance

use RegexParser\Regex;
use RegexParser\ReDoS\ReDoSMode;

$regex = Regex::create([
    'runtime_pcre_validation' => true,
]);

// Parse a pattern into AST
$ast = $regex->parse('/^hello world$/i');

// Validate pattern safety
$result = $regex->validate('/(?<=test)foo/');
if (!$result->isValid()) {
    echo $result->getErrorMessage();
}

// Check for ReDoS risk (theoretical by default)
$analysis = $regex->redos('/(a+)+$/');
echo $analysis->severity->value; // 'critical', 'safe', etc.

// Optional: attempt bounded confirmation
$confirmed = $regex->redos('/(a+)+$/', mode: ReDoSMode::CONFIRMED);
echo $confirmed->isConfirmed() ? 'confirmed' : 'theoretical';

// Get human-readable explanation
echo $regex->explain('/\d{4}-\d{2}-\d{2}/');

Integrations

RegexParser integrates with common PHP tooling:

  • Symfony bundle: docs/guides/cli.md
  • PHPStan: vendor/yoeunes/regex-parser/extension.neon
  • GitHub Actions: vendor/bin/regex lint in your CI pipeline

Performance

RegexParser ships lightweight benchmark scripts in benchmarks/ to track parser, compiler, and formatter throughput.

  • Run formatter benchmarks: php benchmarks/benchmark_formatters.php
  • Run all benchmarks: for file in benchmarks/benchmark_*.php; do echo "Running $file"; php "$file"; echo; done

Documentation

Start here:

Key references:

Contributing

Contributions are welcome! See CONTRIBUTING.md to get started.

# Set up development environment
composer install

# Run tests
composer phpunit

# Check code style
composer phpcs

# Run static analysis
composer phpstan

License

Released under the MIT License.

Support

If you run into issues or have questions, please open an issue on GitHub: https://github.com/yoeunes/regex-parser/issues.