This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
league/commonmark is a highly-extensible PHP Markdown parser that fully supports the CommonMark spec and GitHub-Flavored Markdown (GFM). It's based on the CommonMark JS reference implementation and provides a robust, extensible architecture for parsing and rendering Markdown content.
- composer test - Run all tests (includes linting, static analysis, unit tests, and pathological tests)
- composer phpunit - Run PHPUnit tests only (no coverage)
- composer pathological - Run pathological performance tests
- composer phpcs - Run PHP CodeSniffer for coding standards
- composer phpcbf - Automatically fix coding standards issues
- composer phpstan - Run PHPStan static analysis
- composer psalm - Run Psalm static analysis with stats
(IMPORTANT: you MUST ALWAYS use PHP 7.4 to run phpcs and phpcbf. You SHOULD use the php service from docker-compose, which uses that version. Example: docker compose exec php composer phpcs)
- ./tests/benchmark/benchmark.php - Compare performance against other Markdown parsers
Converters: Main entry points using Facade pattern
- CommonMarkConverter - Preconfigured with CommonMarkCoreExtension
- GithubFlavoredMarkdownConverter - Includes GFM extensions bundle
- MarkdownConverter - Base class orchestrating MarkdownParser + HtmlRenderer
- Pattern: Factory with default configurations + Facade for complex pipeline
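A minimal sketch of the three entry points listed above. It assumes the library is installed through Composer and that the autoloader sits at vendor/autoload.php next to the script; that path is an assumption for illustration, not something this file specifies.

```php
<?php

declare(strict_types=1);

// Assumption: Composer's autoloader lives at this path relative to the script.
require __DIR__ . '/vendor/autoload.php';

use League\CommonMark\CommonMarkConverter;
use League\CommonMark\Environment\Environment;
use League\CommonMark\Extension\CommonMark\CommonMarkCoreExtension;
use League\CommonMark\GithubFlavoredMarkdownConverter;
use League\CommonMark\MarkdownConverter;

// Facade preconfigured with CommonMarkCoreExtension.
$commonMark = new CommonMarkConverter();
$commonMarkHtml = $commonMark->convert('# Hello')->getContent();

// Same facade, with the GFM extension bundle added.
$gfm = new GithubFlavoredMarkdownConverter();
$gfmHtml = $gfm->convert('~~gone~~')->getContent();

// Manual wiring: build the Environment yourself and hand it to the base class.
$environment = new Environment();
$environment->addExtension(new CommonMarkCoreExtension());
$custom = new MarkdownConverter($environment);
$customHtml = $custom->convert('*hi*')->getContent();

echo $commonMarkHtml, $gfmHtml, $customHtml;
```

The manual form is the one to reach for when only a subset of extensions should be loaded.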
Environment System: Service container and registry
- Environment - Central registry managing parsers/renderers with priorities
- Implements PSR-14 event dispatcher for pre/post processing hooks
- Uses lazy initialization - extensions registered on first use
- Pattern: Registry + Builder + Dependency Injection
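The registry and event hooks described above can be sketched roughly as follows. The counter variable and the priority value of 10 are illustrative choices, and the autoloader path is an assumption.

```php
<?php

declare(strict_types=1);

// Assumption: Composer's autoloader lives at this path relative to the script.
require __DIR__ . '/vendor/autoload.php';

use League\CommonMark\Environment\Environment;
use League\CommonMark\Event\DocumentParsedEvent;
use League\CommonMark\Extension\CommonMark\CommonMarkCoreExtension;
use League\CommonMark\MarkdownConverter;

$environment = new Environment();
$environment->addExtension(new CommonMarkCoreExtension());

// Count how many documents pass through the parser (illustrative only).
$parsedDocuments = 0;

// Listeners take an event class, a callable, and an optional priority.
$environment->addEventListener(
    DocumentParsedEvent::class,
    static function (DocumentParsedEvent $event) use (&$parsedDocuments): void {
        // $event->getDocument() would give access to the AST here.
        $parsedDocuments++;
    },
    10
);

// Registration should be finished before the environment is first used.
$converter = new MarkdownConverter($environment);
$converter->convert('Hello');

echo $parsedDocuments, "\n";
```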
Parser Architecture: Two-phase recursive descent parsing
- Block Phase:
  - MarkdownParser processes line-by-line with active parser stack
  - BlockStartParserInterface - Strategy pattern for block detection
  - State machine with continuation tracking and reference processing
  - Security: NUL character replacement, configurable nesting limits
- Inline Phase:
  - InlineParserEngine with regex pre-compilation
  - InlineParserInterface - Strategy with regex-based matching
  - Position-based parser coordination with delimiter processing
  - Adjacent text merging optimization
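To see the result of both phases without rendering, MarkdownParser can be driven directly. This is a sketch: max_nesting_level is the nesting-limit option name as it appears in the library's configuration, the value 20 is arbitrary, and the autoloader path is an assumption.

```php
<?php

declare(strict_types=1);

// Assumption: Composer's autoloader lives at this path relative to the script.
require __DIR__ . '/vendor/autoload.php';

use League\CommonMark\Environment\Environment;
use League\CommonMark\Extension\CommonMark\CommonMarkCoreExtension;
use League\CommonMark\Extension\CommonMark\Node\Block\Heading;
use League\CommonMark\Extension\CommonMark\Node\Inline\Emphasis;
use League\CommonMark\Node\Block\Paragraph;
use League\CommonMark\Parser\MarkdownParser;

// Cap block nesting depth; 20 is an arbitrary illustrative value.
$environment = new Environment(['max_nesting_level' => 20]);
$environment->addExtension(new CommonMarkCoreExtension());

$parser = new MarkdownParser($environment);
$document = $parser->parse("# Title\n\nSome *emphasised* text.");

// Block phase output: the document's direct children are block nodes.
$heading = $document->firstChild();
$paragraph = $document->lastChild();

// Inline phase output: the paragraph's children are inline nodes.
$hasEmphasis = false;
foreach ($paragraph->children() as $inline) {
    if ($inline instanceof Emphasis) {
        $hasEmphasis = true;
    }
}

echo get_class($heading), ' / ', get_class($paragraph), "\n";
```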
AST (Abstract Syntax Tree): Composite pattern with doubly-linked structure
- Node base class with tree navigation/manipulation methods
- AbstractBlock/AbstractInline - Template method pattern for element types
- Document - Root node with reference map storage
- Uses Dflydev\DotAccessData\Data for flexible metadata storage
- Supports multiple traversal methods: iterator, walker, query system
Rendering: Visitor pattern with strategy delegation
- HtmlRenderer - Traverses AST, delegates to node-specific renderers
- NodeRendererInterface - Strategy pattern for extensible rendering
- Hierarchical renderer lookup supporting inheritance
- Pre/post-render events with configurable block separators
Extension System: Plugin pattern with composite support
- ExtensionInterface - Simple contract for environment configuration
- CommonMarkCoreExtension - Complete spec implementation with priorities
- GithubFlavoredMarkdownExtension - Composite bundling multiple GFM features
- Performance: Optimized parser ordering and lazy registration
src/Extension/: All built-in extensions
- CommonMark/ - Core CommonMark specification features
- GithubFlavoredMarkdownExtension.php - GFM bundle extension
- Individual feature extensions: Table/, Strikethrough/, TaskList/, etc.
src/Parser/: Parsing logic
- Block/ - Block-level parsing components
- Inline/ - Inline parsing components
- MarkdownParser.php - Main parsing coordinator
src/Node/: AST node definitions
- Block/ - Block-level nodes (paragraphs, headings, lists, etc.)
- Inline/ - Inline nodes (text, emphasis, links, etc.)
src/Renderer/: Output rendering
- Block/ and Inline/ subdirectories mirror node structure
- HtmlRenderer.php - Main HTML output renderer
The library uses a doubly-linked AST where all elements (including the root Document) extend from the Node class:
- Iterator: $node->iterator() - Fastest for complete tree traversal
- Walker: $node->walker() - Full control with enter/leave events, use resumeAt() for safe modifications
- Query: (new Query())->where()->findAll($node) - Easy but memory-intensive, creates snapshots
- Manual: $node->next(), $node->parent(), $node->children() - Best for direct relationships
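The four traversal styles above, side by side on one small document. This is a sketch under the assumption that Composer's autoloader is at vendor/autoload.php; the sample Markdown is made up for illustration.

```php
<?php

declare(strict_types=1);

// Assumption: Composer's autoloader lives at this path relative to the script.
require __DIR__ . '/vendor/autoload.php';

use League\CommonMark\Environment\Environment;
use League\CommonMark\Extension\CommonMark\CommonMarkCoreExtension;
use League\CommonMark\Extension\CommonMark\Node\Block\Heading;
use League\CommonMark\Node\Block\Paragraph;
use League\CommonMark\Node\Query;
use League\CommonMark\Parser\MarkdownParser;

$environment = new Environment();
$environment->addExtension(new CommonMarkCoreExtension());
$document = (new MarkdownParser($environment))->parse("# One\n\nText\n\n## Two");

// Iterator: visits every node once, starting with the node it was called on.
$iterated = 0;
foreach ($document->iterator() as $node) {
    $iterated++;
}

// Walker: emits an event on entering and again on leaving each container.
$entered = 0;
$walker = $document->walker();
while ($event = $walker->next()) {
    if ($event->isEntering()) {
        $entered++;
    }
}

// Query: declarative filtering by node type.
$levels = [];
foreach ((new Query())->where(Query::type(Heading::class))->findAll($document) as $heading) {
    $levels[] = $heading->getLevel();
}

// Manual: follow sibling links directly.
$secondBlock = $document->firstChild()->next();

echo $iterated, ' nodes, heading levels: ', implode(',', $levels), "\n";
```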
- Adding: appendChild(), prependChild(), insertAfter(), insertBefore()
- Removing: detach(), replaceWith(), detachChildren(), replaceChildren()
- Data: $node->data->set('custom/info', $value), $node->data->set('attributes/class', 'css-class')
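A short sketch of these manipulation calls on a hand-built paragraph, so no parser is needed. The node contents and the class name are invented for illustration, and the autoloader path is an assumption.

```php
<?php

declare(strict_types=1);

// Assumption: Composer's autoloader lives at this path relative to the script.
require __DIR__ . '/vendor/autoload.php';

use League\CommonMark\Node\Block\Paragraph;
use League\CommonMark\Node\Inline\Text;

$paragraph = new Paragraph();

// Adding: build "Hello world!" out of three text nodes.
$paragraph->appendChild(new Text('world'));
$paragraph->prependChild(new Text('Hello '));
$bang = new Text('!');
$paragraph->lastChild()->insertAfter($bang);

// Removing: swap the "!" for "?" and drop the leading "Hello ".
$bang->replaceWith(new Text('?'));
$paragraph->firstChild()->detach();

// Data: slash-separated keys address nested metadata.
$paragraph->data->set('attributes/class', 'greeting');

// Collect the remaining literals to inspect the result.
$literals = [];
foreach ($paragraph->children() as $child) {
    $literals[] = $child->getLiteral();
}

echo implode('', $literals), "\n";
```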
- Implement ExtensionInterface with register(EnvironmentBuilderInterface $environment) method
- Register components with priorities: addInlineParser(), addBlockStartParser(), addRenderer()
- Follow existing extension patterns in src/Extension/
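As a minimal sketch of that contract, the hypothetical extension below overrides how thematic breaks render. DividerRenderer, DividerExtension, the divider class name, and the priority of 10 are all invented for illustration; the autoloader path is an assumption.

```php
<?php

declare(strict_types=1);

// Assumption: Composer's autoloader lives at this path relative to the script.
require __DIR__ . '/vendor/autoload.php';

use League\CommonMark\Environment\Environment;
use League\CommonMark\Environment\EnvironmentBuilderInterface;
use League\CommonMark\Extension\CommonMark\CommonMarkCoreExtension;
use League\CommonMark\Extension\CommonMark\Node\Block\ThematicBreak;
use League\CommonMark\Extension\ExtensionInterface;
use League\CommonMark\MarkdownConverter;
use League\CommonMark\Node\Node;
use League\CommonMark\Renderer\ChildNodeRendererInterface;
use League\CommonMark\Renderer\NodeRendererInterface;
use League\CommonMark\Util\HtmlElement;

// Hypothetical renderer: emits <hr> with an extra CSS class.
final class DividerRenderer implements NodeRendererInterface
{
    public function render(Node $node, ChildNodeRendererInterface $childRenderer)
    {
        // HtmlElement escapes attribute values when cast to a string.
        return new HtmlElement('hr', ['class' => 'divider'], '', true);
    }
}

// Hypothetical extension: a single register() call wires everything up.
final class DividerExtension implements ExtensionInterface
{
    public function register(EnvironmentBuilderInterface $environment): void
    {
        // A priority above the core renderer's lets this one win the lookup.
        $environment->addRenderer(ThematicBreak::class, new DividerRenderer(), 10);
    }
}

$environment = new Environment();
$environment->addExtension(new CommonMarkCoreExtension());
$environment->addExtension(new DividerExtension());

$html = (new MarkdownConverter($environment))->convert('---')->getContent();
echo $html;
```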
- Block Parsers: BlockStartParserInterface - implement tryStart(); continuation is handled by tryContinue() on the companion BlockContinueParserInterface
- Inline Parsers: InlineParserInterface - implement getMatchDefinition() and parse()
- Delimiter Processors: DelimiterProcessorInterface - for emphasis-style wrapping syntax
- Renderers: NodeRendererInterface - implement render(), use HtmlElement for safety
- Events: PSR-14 events like DocumentParsedEvent for AST manipulation
- Configuration: ConfigurableExtensionInterface with league/config validation
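Of the component types above, the inline parser is the quickest to sketch. The hypothetical MentionParser below turns @name into a link; the class name, the regex, and the example.com URL are invented for illustration, and the autoloader path is an assumption.

```php
<?php

declare(strict_types=1);

// Assumption: Composer's autoloader lives at this path relative to the script.
require __DIR__ . '/vendor/autoload.php';

use League\CommonMark\Environment\Environment;
use League\CommonMark\Extension\CommonMark\CommonMarkCoreExtension;
use League\CommonMark\Extension\CommonMark\Node\Inline\Link;
use League\CommonMark\MarkdownConverter;
use League\CommonMark\Parser\Inline\InlineParserInterface;
use League\CommonMark\Parser\Inline\InlineParserMatch;
use League\CommonMark\Parser\InlineParserContext;

// Hypothetical parser: "@name" becomes a link to a made-up profile URL.
final class MentionParser implements InlineParserInterface
{
    public function getMatchDefinition(): InlineParserMatch
    {
        // The pattern is given without delimiters; the group captures the name.
        return InlineParserMatch::regex('@([A-Za-z0-9_]{1,15}(?!\w))');
    }

    public function parse(InlineParserContext $inlineContext): bool
    {
        $cursor = $inlineContext->getCursor();

        // Only match at the start of the line or after a space.
        $previous = $cursor->peek(-1);
        if ($previous !== null && $previous !== ' ') {
            return false;
        }

        // Consume the matched text, then append the replacement node.
        $cursor->advanceBy($inlineContext->getFullMatchLength());
        [$name] = $inlineContext->getSubMatches();
        $inlineContext->getContainer()->appendChild(
            new Link('https://example.com/u/' . $name, '@' . $name)
        );

        return true;
    }
}

$environment = new Environment();
$environment->addExtension(new CommonMarkCoreExtension());
$environment->addInlineParser(new MentionParser());

$html = (new MarkdownConverter($environment))->convert('Hi @colinodell')->getContent();
echo $html;
```

Returning false leaves the cursor untouched, so other parsers still get a chance at the same position.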
- Cursor class: dual ASCII/UTF-8 paths, character caching, position state management
- Key methods: peek(), match(), saveState()/restoreState(), advanceBy()
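A small sketch of the key Cursor methods on a single line of input. The sample line and the heading regex are made up for illustration, and the autoloader path is an assumption.

```php
<?php

declare(strict_types=1);

// Assumption: Composer's autoloader lives at this path relative to the script.
require __DIR__ . '/vendor/autoload.php';

use League\CommonMark\Parser\Cursor;

$cursor = new Cursor('## Heading text');

// Remember the starting position so a failed parse attempt can back out.
$state = $cursor->saveState();

// match() returns the matched text and advances past it, or returns null.
$marker = $cursor->match('/^#{1,6}/');
$positionAfterMatch = $cursor->getPosition();

// peek(0) looks at the current character, peek() at the one after it.
$current = $cursor->peek(0);
$next = $cursor->peek();

// Skip the space, then read what is left of the line.
$cursor->advanceBy(1);
$remainder = $cursor->getRemainder();

// Roll back as if nothing had been consumed.
$cursor->restoreState($state);
$positionAfterRestore = $cursor->getPosition();

echo $marker, ' | ', $remainder, "\n";
```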
- Unit Tests (tests/unit/) - Component testing, mirrors source structure
- Functional Tests (tests/functional/) - End-to-end with .md/.html pairs
- Pathological Tests (tests/pathological/) - Security/DoS prevention
- Extension Tests (tests/functional/Extension/) - Per-extension testing
- composer test - Full test suite
- composer phpunit - PHPUnit tests only
- composer pathological - Security/performance tests
When handling untrusted user input, certain security settings are essential to prevent XSS, DoS, and other attacks. Review the following settings whenever input is not fully trusted:
Implementation: HtmlFilter::filter() in HtmlBlockRenderer and HtmlInlineRenderer
Default: 'allow' (unsafe for untrusted input)
Attack Vector: XSS through raw HTML injection
Options:
- HtmlFilter::STRIP returns empty string
- HtmlFilter::ESCAPE uses htmlspecialchars($html, ENT_NOQUOTES)
- HtmlFilter::ALLOW returns raw HTML unchanged
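A sketch of how the three modes behave through the converter. html_input is the configuration key as named in the library's configuration docs (it is not spelled out above), the sample payload is made up, and the autoloader path is an assumption.

```php
<?php

declare(strict_types=1);

// Assumption: Composer's autoloader lives at this path relative to the script.
require __DIR__ . '/vendor/autoload.php';

use League\CommonMark\CommonMarkConverter;
use League\CommonMark\Util\HtmlFilter;

$payload = '<script>alert(1)</script>';

// Run the same payload through each filter mode.
$results = [];
foreach ([HtmlFilter::STRIP, HtmlFilter::ESCAPE, HtmlFilter::ALLOW] as $mode) {
    $converter = new CommonMarkConverter(['html_input' => $mode]);
    $results[$mode] = trim($converter->convert($payload)->getContent());
}

// STRIP drops the block, ESCAPE neutralises it, ALLOW passes it through.
foreach ($results as $mode => $html) {
    echo $mode, ': ', $html, "\n";
}
```

For untrusted input, either of the first two modes avoids emitting the raw tag.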
Implementation: RegexHelper::isLinkPotentiallyUnsafe() in LinkRenderer and ImageRenderer
Default: true (allows unsafe links)
Attack Vector: XSS through malicious protocols (javascript:, vbscript:, file:, data:)
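A sketch of the link check in isolation and through the converter. allow_unsafe_links is the configuration key as named in the library's configuration docs (it is not spelled out above), the sample URLs are made up, and the autoloader path is an assumption.

```php
<?php

declare(strict_types=1);

// Assumption: Composer's autoloader lives at this path relative to the script.
require __DIR__ . '/vendor/autoload.php';

use League\CommonMark\CommonMarkConverter;
use League\CommonMark\Util\RegexHelper;

// The helper flags dangerous protocols and lets ordinary URLs through.
$flagsJavascript = RegexHelper::isLinkPotentiallyUnsafe('javascript:alert(1)');
$flagsHttps = RegexHelper::isLinkPotentiallyUnsafe('https://example.com');

// With the option turned off, the unsafe destination should not be rendered.
$converter = new CommonMarkConverter(['allow_unsafe_links' => false]);
$html = $converter->convert('[click](javascript:alert(1))')->getContent();

echo $html;
```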