This reference documents the public API surface of RegexParser: entry points, configuration options, return objects, and the exception hierarchy.
Creates a configured Regex instance. This is the primary entry point for all library operations.
Factory steps:
- Validate options.
- Create a configured instance.
- Return a ready-to-use
Regex.
Example:
use RegexParser\Regex;
$regex = Regex::create([
'cache' => '/var/cache/regex',
'max_pattern_length' => 100_000,
'max_lookbehind_length' => 255,
'runtime_pcre_validation' => false,
'redos_ignored_patterns' => [],
'max_recursion_depth' => 1024,
'php_version' => '8.2',
]);
$result = $regex->validate('/foo|bar/');
echo $result->isValid() ? 'Valid' : 'Invalid';Alias for Regex::create(). Use whichever reads better in your code.
// These are equivalent
$regex = Regex::create();
$regex = Regex::new();Lexes a regex into a TokenStream with positional offsets. Useful for custom analysis or debugging.
use RegexParser\Regex;
$stream = Regex::tokenize('/foo|bar/i');
foreach ($stream as $token) {
echo "Type: {$token->type->value}, Value: '{$token->value}'\n";
echo "Position: {$token->start} - {$token->end}\n";
}Clears static caches used by the validator. Important for long-running processes to prevent memory growth.
use RegexParser\Regex;
$regex = Regex::create();
// Process many patterns...
foreach ($patterns as $pattern) {
$regex->validate($pattern);
}
// Clear caches periodically
$regex->clearValidatorCaches();All options are validated. Unknown keys throw InvalidRegexOptionException.
| Option | Type | Default | Description | Performance Impact |
|---|---|---|---|---|
cache |
null | string | CacheInterface |
FilesystemCache |
Cache for parsed ASTs | High - speeds repeated patterns |
max_pattern_length |
int |
100_000 |
Maximum pattern length | Low - prevents abuse |
max_lookbehind_length |
int |
255 |
Maximum lookbehind length | Low - PCRE compliance |
runtime_pcre_validation |
bool |
false |
Compile-check via preg_match() | Medium - extra compile step |
redos_ignored_patterns |
array<string> |
[] |
Patterns to skip ReDoS | Low - reduces false positives |
max_recursion_depth |
int |
1024 |
Parser recursion guard | Low - prevents stack overflow |
php_version |
string | int |
PHP_VERSION_ID |
Target PHP version | Low - feature validation |
Parses a pattern body plus flags/delimiter into a RegexNode. Use this when you have separate pattern components.
use RegexParser\Regex;
$pattern = 'foo|bar';
$flags = 'i';
$delimiter = '/';
$ast = Regex::create()->parsePattern($pattern, $flags, $delimiter);
echo $ast->flags; // 'i'
echo $ast->delimiter; // '/'
echo $ast->pattern; // SequenceNode or AlternationNodeParses a full PCRE string (/pattern/flags).
use RegexParser\Regex;
// Strict parsing (default)
$ast = Regex::create()->parse('/foo|bar/i');
echo $ast->flags; // 'i'
echo $ast->delimiter; // '/'
// Tolerant parsing - returns AST even with errors
$result = Regex::create()->parse('/[unclosed/i', true);
echo $result->ast; // Partial AST
echo $result->errors[0]->getMessage(); // First errorReturns a structured validation result without throwing exceptions.
use RegexParser\Regex;
$result = Regex::create()->validate('/foo|bar/');
echo $result->isValid(); // true
echo $result->complexityScore; // int
echo $result->category->value; // ValidationErrorCategory enumValidationResult Fields:
| Field | Type | Description |
|---|---|---|
isValid |
bool | Whether pattern is valid |
error |
string|null | Error message if invalid |
errorCode |
string|null | Stable error code |
offset |
int|null | Byte offset of error |
caretSnippet |
string|null | Snippet with caret |
hint |
string|null | Fix suggestion |
complexityScore |
int | Pattern complexity |
category |
ValidationErrorCategory | Error category |
Aggregates validation, lint, ReDoS analysis, optimization, and explanation into a single report.
use RegexParser\Regex;
$report = Regex::create()->analyze('/(a+)+b/');
echo $report->isValid; // true/false
echo count($report->errors); // Validation errors
echo count($report->lintIssues); // Lint warnings
echo $report->redos->severity->value; // 'critical', 'safe', etc.
echo $report->explain; // Human explanation
echo $report->highlighted; // Syntax-highlighted patternAnalysisReport Fields:
| Field | Type | Description |
|---|---|---|
isValid |
bool | Pattern is syntactically valid |
errors |
array | Validation errors |
lintIssues |
array | Linting warnings |
redos |
ReDoSAnalysis | ReDoS analysis result |
optimizations |
array | Suggested optimizations |
explain |
string | Human explanation |
highlighted |
string | Highlighted pattern |
redos(string $regex, ?ReDoSSeverity $threshold = null, ReDoSMode $mode = ReDoSMode::THEORETICAL, ?ReDoSConfirmOptions $confirmOptions = null): ReDoSAnalysis
Analyzes ReDoS risk without an analysis report. Default mode is theoretical (structural). Use confirmed mode to attempt bounded evidence collection.
use RegexParser\Regex;
use RegexParser\ReDoS\ReDoSMode;
$analysis = Regex::create()->redos('/(a+)+b/', mode: ReDoSMode::THEORETICAL);
echo $analysis->severity->value; // 'critical', 'safe', etc.
echo $analysis->score; // int (0-10)
echo $analysis->confidenceLevel()->value; // 'high', 'medium', 'low'
echo $analysis->vulnerablePart; // Subpattern causing risk
echo $analysis->recommendations[0]; // Suggested fix
// Optional: bounded confirmation
$confirmed = Regex::create()->redos('/(a+)+b/', mode: ReDoSMode::CONFIRMED);
echo $confirmed->isConfirmed() ? 'confirmed' : 'theoretical';ReDoSAnalysis Fields:
| Field | Type | Description |
|---|---|---|
severity |
ReDoSSeverity | Risk level |
score |
int | Risk score (0-10) |
mode |
ReDoSMode | off, theoretical, or confirmed |
confidence |
Confidence | Analysis confidence (use confidenceLevel()) |
confirmation |
ReDoSConfirmation|null | Bounded evidence details |
vulnerablePart |
string|null | Risky subpattern |
recommendations |
array | Suggested fixes (verify behavior) |
hotspots |
array | Problem locations |
suggestedRewrite |
string|null | Suggested rewrite (verify behavior) |
Applies safe optimizations to the pattern.
use RegexParser\Regex;
$result = Regex::create()->optimize('/[0-9]+/', [
'digits' => true, // [0-9] -> \d
'word' => true, // [A-Za-z0-9_] -> \w
'ranges' => true, // Normalize ranges
'canonicalizeCharClasses' => true, // Normalize character class order/dedup
'autoPossessify' => false, // Add possessive quantifiers
'allowAlternationFactorization' => false, // Factor common parts
'minQuantifierCount' => 4, // Use {n} only when repetition >= 4
'verifyWithAutomata' => false, // Verify equivalence with the automata solver when possible
]);
echo $result->original; // '/[0-9]+/'
echo $result->optimized; // '/\d+/'
echo $result->changes[0]; // 'Optimized pattern.'When verifyWithAutomata is enabled, RegexParser validates that the optimization is language-equivalent for the
supported regular subset. Unsupported patterns fall back to the original behavior.
Transpiles a PCRE literal to another regex dialect (starting with JavaScript).
use RegexParser\Regex;
$result = Regex::create()->transpile('/(?P<word>\\w+)/i', 'javascript');
echo $result->literal; // '/(?<word>\\w+)/i'
echo $result->constructor; // 'new RegExp("(?<word>\\w+)", "i")'
print_r($result->warnings);Notes:
- Unsupported PCRE constructs throw
TranspileException. - JavaScript targets may add
/uwhen Unicode properties or code point escapes are used. /xis dropped after comments/whitespace are normalized.TranspileOptionslets you disable JS lookbehind support (allowLookbehind: false).- Available targets:
javascript(alias:js).
Extracts fixed literals and prefix/suffix data for fast prefilters or indexing.
use RegexParser\Regex;
$result = Regex::create()->literals('/user-\d{4}/');
print_r($result->literals); // ['user-']
echo $result->patterns[0]; // '/user-\d{4}/'
echo $result->prefix; // 'user-'
echo $result->suffix; // ''
echo $result->literalSet; // LiteralSet objectGener that matches the patternates a sample string. Useful for testing or documentation.
use RegexParser\Regex;
$sample = Regex::create()->generate('/[A-Z][a-z]{3,5}\d{2}/');
echo $sample; // e.g., "Word12"Generates a human-readable explanation of the pattern.
use RegexParser\Regex;
// Plain text explanation
$text = Regex::create()->explain('/\d{3}-\d{4}/');
echo $text;
/*
Match exactly 3 digits, then hyphen, then exactly 4 digits.
*/
// HTML explanation for docs/UIs
$html = Regex::create()->explain('/\w+@\w+\.\w+/', 'html');
echo $html;
// <p>Match one or more word characters, then @, then...Generates syntax-highlighted output.
use RegexParser\Regex;
// ANSI colors for console
$highlighted = Regex::create()->highlight('/\d+/', 'console');
echo $highlighted; // "\033[38;2;78;201;176m\\d\033[0m\033[38;2;215;186;125m+\033[0m"
// HTML for web
$html = Regex::create()->highlight('/[a-z]+/', 'html');
echo $html;
// <span class="regex-token regex-literal">[a-z]</span>...Returned by validate(). Provides structured validation feedback.
$result = Regex::create()->validate('/[unclosed/');
if (!$result->isValid()) {
echo $result->error; // "Unterminated character class"
echo $result->errorCode; // "regex.syntax.unterminated"
echo $result->offset; // 9
echo $result->caretSnippet; // "Pattern: [unclosed\n ^"
echo $result->hint; // "Close the bracket: ]"
echo $result->category->value; // "syntax"
}Returned by parse($regex, true). Contains partial AST plus errors.
$result = Regex::create()->parse('/[broken/i', true);
echo $result->ast instanceof \RegexParser\Node\RegexNode; // true (partial)
echo count($result->errors); // 1
echo $result->errors[0]->getMessage(); // "Unterminated character class"Returned by analyze(). Comprehensive pattern analysis.
$report = Regex::create()->analyze('/(a+)+b/');
if (!$report->isValid) {
// Handle validation errors
foreach ($report->errors as $error) {
echo $error['message'];
}
}
// Check ReDoS safety
if ($report->redos->severity->value !== 'safe') {
echo "Pattern may be vulnerable!";
echo $report->redos->recommendations[0];
}
// Get explanation
echo $report->explain;Returned by optimize(). Shows what changed.
$result = Regex::create()->optimize('/[0-9]+/');
echo $result->original; // '/[0-9]+/'
echo $result->optimized; // '/\d+/'
foreach ($result->changes as $change) {
echo "- $change\n";
}
// Output:
// - Replaced [0-9] with \d
// - Saved 5 charactersReturned by transpile(). Includes JavaScript output and diagnostics.
$result = Regex::create()->transpile('/(?P<word>\\w+)/i', 'javascript');
echo $result->pattern; // '(?<word>\\w+)'
echo $result->flags; // 'i'
echo $result->literal; // '/(?<word>\\w+)/i'
echo $result->constructor; // 'new RegExp("(?<word>\\w+)", "i")'
foreach ($result->warnings as $warning) {
echo "- $warning\n";
}Returned by literals(). Extracts fixed content.
$result = Regex::create()->literals('/user-\d{4}/');
echo $result->prefix; // 'user-'
echo $result->suffix; // ''
echo $result->confidence->value; // 'high'
foreach ($result->literals as $literal) {
echo "Found literal: $literal\n";
}
// Output: Found literal: user-RegexParser uses a focused exception hierarchy for precise error handling:
Exception hierarchy (simplified):
RegexParserExceptionInterfaceInvalidRegexOptionException(invalid configuration option)LexerException(tokenization failure)ParserExceptionSyntaxErrorException(invalid syntax)SemanticErrorException(semantic validation failure)
RecursionLimitException(max recursion depth)ResourceLimitException(resource limits)RegexException(base exception with position and error code)TranspileException(unsupported target or feature during transpile)
Usage Examples:
use RegexParser\Regex;
use RegexParser\Exception\LexerException;
use RegexParser\Exception\ParserException;
use RegexParser\Exception\InvalidRegexOptionException;
try {
$regex = Regex::create(['invalid_key' => 'value']);
} catch (InvalidRegexOptionException $e) {
echo "Bad option: {$e->getMessage()}";
}
try {
$ast = Regex::create()->parse('/[unclosed/');
} catch (LexerException $e) {
echo "Tokenization failed: {$e->getMessage()}";
} catch (ParserException $e) {
echo "Parse failed: {$e->getMessage()}";
}
// Catch-all for any library error
try {
$result = Regex::create()->validate('/test/');
} catch (\RegexParser\Exception\RegexParserExceptionInterface $e) {
echo "RegexParser error: {$e->getMessage()}";
}| Method | Returns | Purpose |
|---|---|---|
create($options) |
Regex | Factory method |
parse($pattern) |
RegexNode | Parse to AST |
parse($pattern, true) |
TolerantParseResult | Parse with errors |
validate($regex) |
ValidationResult | Check validity |
analyze($regex) |
AnalysisReport | Analysis report |
redos($regex) |
ReDoSAnalysis | ReDoS check |
optimize($regex) |
OptimizationResult | Optimize pattern |
transpile($regex, $target) |
TranspileResult | Convert dialects |
explain($regex) |
string | Human explanation |
highlight($regex) |
string | Syntax highlight |
generate($regex) |
string | Generate sample |
literals($regex) |
LiteralExtractionResult | Extract literals |
Previous: Reference Index | Next: Diagnostics