EarleyBird

Experimental implementation of ixml in Rust. Currently targeting the 1.0 spec.

Usage

Running

For development, use cargo run -- (recommended):

Parse files

cargo run -- parse -g grammar.ixml -i input.txt

Quick inline testing (useful for development)

cargo run -- test -g 'rule: "a" | "b".' -i 'a'

Run test suite

cargo run -- suite
# Or with environment variables for debugging:
RUST_LOG=info RUST_BACKTRACE=1 cargo run -- suite

Alternatively, build the eb binary first:

cargo build --release
./target/release/eb parse -g grammar.ixml -i input.txt
./target/release/eb test -g 'rule: "a" | "b".' -i 'a'
./target/release/eb suite

The test suite is a git submodule in ixml/ that contains the reference implementation.

Debugging and Transparency

This implementation includes comprehensive debugging tools to aid in conformance work and troubleshooting.

Verbosity Levels

Both parse and test commands support verbosity levels via -v or --verbose:

off (default): Silent operation, only shows final output or errors
basic: Shows input files/strings being processed and basic error context
detailed: Adds grammar statistics, parsing success confirmations, and enhanced error details
trace: Detailed Earley parser step-by-step tracing with position filtering support

Examples:

# Quick debugging with inline strings
cargo run -- test -g 'expr: term, ("+", term)*. term: "a".' -i 'a+a' -v detailed

# Detailed file-based parsing
cargo run -- parse -g examples/simple.ixml -i examples/simple.txt -v detailed

# Basic debugging for parse failures
cargo run -- test -g 'test: "exact".' -i 'wrong' -v basic

# Trace Earley parser operations at specific position
cargo run -- test -g 'rule: "a".' -i 'a' -v trace --debug-pos 0

# Full trace (verbose - use with caution)
cargo run -- test -g 'rule: "a".' -i 'a' -v trace

# External trace file
cargo run -- test -g 'rule: "a".' -i 'a' -v trace --trace-file earley.log
cargo run -- parse -g grammar.ixml -i input.txt -v trace --trace-file debug.log

Debug Output Structure

The logging system provides structured, component-specific output:

[GRAMMAR]: Grammar construction and validation
[PARSER]: High-level parsing operations
[EARLEY]: Detailed Earley algorithm steps (trace mode)
[EARLEY@n]: Earley operations at specific input position n (legacy format)
[EARLEY-FAIL@n]: Parse failures at position n showing expected vs actual (legacy format)

Structured Format (New)

When using --trace-file, output uses a pipe-separated structured format for easier analysis:

EARLEY|pos=n|op=OPERATION|...: Earley operations with key-value pairs
EARLEY-FAIL|pos=n|expected=...|actual='x': Parse failures with structured data

Common Operations:

op=PREDICTOR: Parser predicting what nonterminal should come next
op=SCANNER: Parser trying to match terminal characters
op=COMPLETER: Parser completing a rule and looking for continuations
op=SCANNER-MATCH: Successful character matches with position advancement

Example structured output:

EARLEY|pos=0|op=PREDICTOR|task=T1|mark=|name=rule
EARLEY|pos=0|op=SCANNER|task=T2|tmark=|matcher='a'
EARLEY|pos=0|op=SCANNER-MATCH|char='a'|new_pos=1
EARLEY|pos=1|op=COMPLETER|task=T2|completed
EARLEY-FAIL|pos=1|expected='b'|actual='c'

Analysis with grep:

# All operations at position 5
grep "pos=5" earley.log

# All scanner operations
grep "op=SCANNER" earley.log

# All parse failures
grep "EARLEY-FAIL" earley.log

# Scanner matches with specific character
grep "op=SCANNER-MATCH.*char='a'" earley.log

Parse Failure Analysis

When parsing fails, the debug system automatically provides:

Input position where parsing stopped
Context around the failure point
Expected vs actual characters
Grammar content that was being processed

Earley Parser Tracing

The trace verbosity level provides detailed step-by-step Earley algorithm debugging:

Key Operations Traced:

PREDICTOR: When the parser predicts what nonterminal should come next
SCANNER: When the parser tries to match terminal characters
COMPLETER: When the parser completes a rule and looks for continuations
MATCH: Successful character matches with position advancement
FAIL: Parse failures showing expected vs actual characters

Position Filtering:

# Only show trace at input position 0
cargo run -- test -g 'rule: "a".' -i 'abc' -v trace --debug-pos 0

# Only show trace at input position 2
cargo run -- test -g 'rule: "a", "b", "c".' -i 'abc' -v trace --debug-pos 2

This is essential for conformance debugging as full traces can be thousands of lines even for simple grammars.

External Trace Files: The --trace-file option is especially useful for workflows to avoid overwhelming the terminal:

cargo run -- parse -g complex.ixml -i large-input.txt -v trace --trace-file trace.log
# Then analyze the trace file separately:
grep "EARLEY-FAIL" trace.log | head -10
grep "pos=42" trace.log

Implementation Notes for Developers

The debug infrastructure is centralized in src/debug.rs:

DebugLevel: Enum controlling output verbosity
DebugConfig: Configuration with position filtering, failure-only modes, and trace file output
Debug macros: debug_basic!(), debug_detailed!(), debug_trace!()
Component-specific macros: debug_grammar!(), debug_parser!(), debug_earley!()
Position-aware macros: debug_earley_pos!(), debug_earley_fail!()
Failure analysis: debug_parse_failure() for detailed error context

Key benefits:

No scattered println! or eprintln! statements throughout codebase
Debug levels controlled centrally without conditional bloat in main code
Position filtering prevents trace output overload
External trace files prevent clutter in debug output (use grep instead!)
Structured pipe-separated format for easy grep analysis
Easy to add new debugging without changing existing code structure

Bootstrap Implementation Strategy

Comment Handling: Comments ({...}) are preprocessed and stripped before grammar parsing to avoid performance issues. Since comments are fully nestable and never appear in output XML, removing them during preprocessing eliminates the character-by-character parsing overhead that would generate massive trace output. This approach maintains semantic correctness while dramatically improving parse performance for grammars with extensive documentation.

Future Debugging Enhancements (Phase 2)

Planned advanced debugging features:

HTML trace viewer for step-by-step parse visualization
Earley chart state visualization
Parse tree diff viewer for comparing expected vs actual results
Interactive parser stepper for debugging complex failures
Export capabilities (JSON/GraphViz) for external analysis tools

Future work

more generally, performance profiling and optimization
flamegraph profiling?

Statement on AI Generated Code

As of May 1, 2023, no AI generated code has been used in any part of this project.

The core concepts and architecture were all built 'by hand'.

Since this is a learning project, I intend to experiment with different code generation products in the future, especially for testing and fleshing out the details of the implementation.

Test suite: The test harness expects to locate resources from the official ixml repo https://github.com/invisibleXML/ixml in a symlinked directory called /ixml.

Assuming you have this repo checked out in a sibling directory to earleybird, the command for this is ln -s ../ixml .

References

Invisible XML: https://invisiblexml.org/

IXML Repo: https://github.com/invisibleXML/ixml

Test Suite: https://github.com/invisibleXML/ixml/tree/master/tests

Vulturine Guinea Fowl: https://en.wikipedia.org/wiki/Vulturine_guineafowl

Name		Name	Last commit message	Last commit date
Latest commit History 55 Commits
archive		archive
examples		examples
src		src
tests		tests
.gitignore		.gitignore
CLAUDE.md		CLAUDE.md
Cargo.toml		Cargo.toml
LICENSE		LICENSE
README.md		README.md
TODO.txt		TODO.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

EarleyBird

Usage

Running

Parse files

Quick inline testing (useful for development)

Run test suite

Debugging and Transparency

Verbosity Levels

Debug Output Structure

Structured Format (New)

Parse Failure Analysis

Earley Parser Tracing

Implementation Notes for Developers

Bootstrap Implementation Strategy

Future Debugging Enhancements (Phase 2)

Future work

Statement on AI Generated Code

References

About

Uh oh!

Releases

Packages

Uh oh!

Languages

License

mdubinko/earleybird

Folders and files

Latest commit

History

Repository files navigation

EarleyBird

Usage

Running

Parse files

Quick inline testing (useful for development)

Run test suite

Debugging and Transparency

Verbosity Levels

Debug Output Structure

Structured Format (New)

Parse Failure Analysis

Earley Parser Tracing

Implementation Notes for Developers

Bootstrap Implementation Strategy

Future Debugging Enhancements (Phase 2)

Future work

Statement on AI Generated Code

References

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Languages

Packages