
Movement-aware parsing core, derivation artifacts, CLI workflow, and docs refresh #1

Open: yzhouwang wants to merge 3 commits into isonovio:main from yzhouwang:main


@yzhouwang

Summary

This PR upgrades the parser from baseline feature flattening to a movement-aware pipeline for a controlled English grammar, and makes outputs directly usable for linguistics experiments.

Main goals completed:

  • preserve the existing PP-attachment ambiguity regression (exactly 2 parses),
  • stop treating MOVED as merely a plain-feature shortcut at runtime,
  • add memoization/cycle safety for recursive search,
  • expose research artifacts (PNG trees + derivation JSON),
  • document capabilities, limits, and reproducibility.

What Changed

1) Movement-aware parsing semantics

  • Added explicit movement-chain lifecycle tracking (introduced/discharged/pending).
  • Added feature-compatibility checks for discharge behavior.
  • Added unresolved-chain diagnostics to exported derivations.
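
For reviewers who want a concrete picture of the lifecycle described above, here is a minimal sketch. It assumes a chain is identified by a small integer and carries a single feature string; `ChainState`, `ChainTracker`, and every method name are hypothetical illustrations, not the types in this PR.

```rust
use std::collections::BTreeMap;

/// Lifecycle of one movement chain (hypothetical names, not this PR's types).
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum ChainState {
    /// A mover was introduced and still waits for a landing site.
    Pending { feature: String },
    /// The mover landed at a site whose feature matched.
    Discharged { feature: String },
}

#[derive(Debug, Default)]
pub struct ChainTracker {
    next_id: u32,
    // BTreeMap keeps iteration order stable, so diagnostics are deterministic.
    chains: BTreeMap<u32, ChainState>,
}

impl ChainTracker {
    /// Introduce a new pending chain carrying `feature`; ids start at 1.
    pub fn introduce(&mut self, feature: &str) -> u32 {
        self.next_id += 1;
        let state = ChainState::Pending { feature: feature.to_string() };
        self.chains.insert(self.next_id, state);
        self.next_id
    }

    /// Discharge chain `id` at a site licensing `site_feature`.
    /// Fails on a feature mismatch, a double discharge, or an unknown id.
    pub fn discharge(&mut self, id: u32, site_feature: &str) -> Result<(), String> {
        let feature = match self.chains.get(&id) {
            Some(ChainState::Pending { feature }) if feature.as_str() == site_feature => {
                feature.clone()
            }
            Some(ChainState::Pending { feature }) => {
                return Err(format!(
                    "CH{}: feature {} is incompatible with site {}",
                    id, feature, site_feature
                ))
            }
            Some(ChainState::Discharged { .. }) => {
                return Err(format!("CH{}: already discharged", id))
            }
            None => return Err(format!("CH{}: unknown chain", id)),
        };
        self.chains.insert(id, ChainState::Discharged { feature });
        Ok(())
    }

    /// Ids of chains that were introduced but never discharged.
    pub fn unresolved(&self) -> Vec<u32> {
        self.chains
            .iter()
            .filter(|(_, st)| matches!(st, ChainState::Pending { .. }))
            .map(|(id, _)| *id)
            .collect()
    }

    /// In this sketch a derivation is well-formed when no chain is pending.
    pub fn is_well_formed(&self) -> bool {
        self.unresolved().is_empty()
    }
}
```

In this sketch, a wh-object question would call `introduce("wh")` when the mover is first seen and `discharge(id, "wh")` at the landing site; anything left in `unresolved()` is what an exported diagnostic would report.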

2) Parser/search stability

  • Added canonical parser state memoization and cycle/dead-state control.
  • Preserved intended ambiguity while avoiding runaway recursion in cyclic functional rules.
  • Added deterministic traversal/order behavior for stable outputs.
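
The combination of memoization and cycle control can be sketched as follows. This is an illustration under stated assumptions, not the parser's actual search: a canonical state is modeled as a hypothetical `(category, position)` pair, successors come from a plain map, and re-entering a state that is already on the current path is treated as a dead branch. A result is cached only when no cycle was cut beneath it, so cached counts never depend on the path taken to reach them.

```rust
use std::collections::{BTreeMap, BTreeSet, HashMap};

/// Hypothetical canonical parser state: (category, input position).
pub type State = (&'static str, usize);

pub struct Search {
    /// Successor relation; BTreeMap plus Vec keep traversal order deterministic.
    pub edges: BTreeMap<State, Vec<State>>,
    pub goal: State,
    memo: HashMap<State, u64>,
    in_progress: BTreeSet<State>,
}

impl Search {
    pub fn new(edges: BTreeMap<State, Vec<State>>, goal: State) -> Self {
        Search { edges, goal, memo: HashMap::new(), in_progress: BTreeSet::new() }
    }

    /// Number of derivations from `s` to the goal that never revisit a state.
    pub fn count(&mut self, s: State) -> u64 {
        self.visit(s).0
    }

    /// Returns (count, tainted). `tainted` means a cycle was cut somewhere
    /// below `s`, so the count may depend on the current path and is
    /// conservatively left out of the memo table.
    fn visit(&mut self, s: State) -> (u64, bool) {
        if s == self.goal {
            return (1, false);
        }
        if let Some(&n) = self.memo.get(&s) {
            return (n, false);
        }
        if !self.in_progress.insert(s) {
            // `s` is already on the current path: dead state, no derivations.
            return (0, true);
        }
        let succs = self.edges.get(&s).cloned().unwrap_or_default();
        let (mut total, mut tainted) = (0u64, false);
        for next in succs {
            let (n, t) = self.visit(next);
            total += n;
            tainted |= t;
        }
        self.in_progress.remove(&s);
        if !tainted {
            self.memo.insert(s, total);
        }
        (total, tainted)
    }
}
```

With a cyclic pair of unary rules (say `T` to `C` and `C` back to `T`) next to an ordinary second analysis, this search still terminates and still reports both derivations, which is the behavior the bullets above describe.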

3) Controlled grammar support

  • Expanded controlled English lexicon entries for:
    • declaratives,
    • do-support declaratives,
    • yes-no questions,
    • wh-object questions,
    • PP attachment ambiguity.

4) Research outputs

  • Tree rendering now includes movement-chain markers (e.g., CH1) and optional movement arrows.
  • Added derivation JSON artifact export with:
    • token stream,
    • step trace,
    • chain events,
    • well-formedness status and unresolved-chain info.
  • Added deterministic artifact naming per parse.
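
One way to get deterministic per-parse artifact names is to derive the file stem from the sentence and the parse index alone. The sketch below is an assumption about how such naming could work, not the scheme used in this PR: `slug`, `fnv1a64`, and `artifact_stem` are hypothetical helpers, and FNV-1a is swapped in because its output is fixed by the algorithm, whereas the standard library's default hasher is not guaranteed to be stable across Rust releases.

```rust
/// 64-bit FNV-1a hash; the result depends only on the input bytes.
pub fn fnv1a64(bytes: &[u8]) -> u64 {
    let mut hash: u64 = 0xcbf2_9ce4_8422_2325; // FNV offset basis
    for &b in bytes {
        hash ^= u64::from(b);
        hash = hash.wrapping_mul(0x0000_0100_0000_01b3); // FNV prime
    }
    hash
}

/// Lowercase the ASCII alphanumeric words of a sentence and join them with '_'.
pub fn slug(sentence: &str) -> String {
    sentence
        .split(|c: char| !c.is_ascii_alphanumeric())
        .filter(|w| !w.is_empty())
        .map(|w| w.to_ascii_lowercase())
        .collect::<Vec<_>>()
        .join("_")
}

/// Hypothetical file stem: slug, zero-padded parse index, hash of the raw sentence.
/// The same (sentence, parse_index) pair always yields the same stem.
pub fn artifact_stem(sentence: &str, parse_index: usize) -> String {
    format!(
        "{}_p{:02}_{:016x}",
        slug(sentence),
        parse_index,
        fnv1a64(sentence.as_bytes())
    )
}
```

A caller would append `.png` or `.json` to the stem. Because parses are numbered in traversal order, this only stays reproducible if traversal order itself is deterministic.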

5) CLI workflow

  • Added CLI options for:
    • single sentence parse,
    • batch core fixtures,
    • output format (png|json|both),
    • output directory,
    • toggling movement arrows.
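
To make the option surface above concrete, here is a std-only sketch of how such options could be parsed and validated. The flag names (`--sentence`, `--batch-core`, `--format`, `--out-dir`, `--no-arrows`) and the defaults are assumptions made for illustration; the README is the authority on the real CLI.

```rust
use std::path::PathBuf;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Format {
    Png,
    Json,
    Both,
}

/// Parsed options (hypothetical field names and defaults).
#[derive(Debug, PartialEq, Eq)]
pub struct Options {
    pub sentence: Option<String>,
    pub batch_core: bool,
    pub format: Format,
    pub out_dir: PathBuf,
    pub arrows: bool,
}

/// Parse arguments (without the program name) into `Options`.
pub fn parse_args<I>(args: I) -> Result<Options, String>
where
    I: IntoIterator<Item = String>,
{
    let mut opts = Options {
        sentence: None,
        batch_core: false,
        format: Format::Both,
        out_dir: PathBuf::from("out"),
        arrows: true,
    };
    let mut it = args.into_iter();
    while let Some(arg) = it.next() {
        match arg.as_str() {
            "--sentence" => {
                opts.sentence = Some(it.next().ok_or("--sentence needs a value")?)
            }
            "--batch-core" => opts.batch_core = true,
            "--format" => {
                opts.format = match it.next().as_deref() {
                    Some("png") => Format::Png,
                    Some("json") => Format::Json,
                    Some("both") => Format::Both,
                    other => return Err(format!("bad --format value: {:?}", other)),
                }
            }
            "--out-dir" => {
                opts.out_dir = PathBuf::from(it.next().ok_or("--out-dir needs a value")?)
            }
            "--no-arrows" => opts.arrows = false,
            other => return Err(format!("unknown option: {}", other)),
        }
    }
    // Exactly one input mode: a single sentence or the batch fixtures.
    if opts.sentence.is_some() == opts.batch_core {
        return Err("pass exactly one of --sentence or --batch-core".to_string());
    }
    Ok(opts)
}
```

In a binary this would be called as `parse_args(std::env::args().skip(1))`. Rejecting the case where both or neither input mode is given keeps batch runs and single-sentence runs from silently mixing.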

6) Tests and docs

  • Added/updated tests for:
    • PP ambiguity exact-count regression (2),
    • do-support / yes-no / wh coverage,
    • rejection controls for malformed/inverted/tensed-after-did cases,
    • chain invariants and termination stress behavior.
  • Added capability and quality docs:
    • docs/capability_matrix.md
    • docs/quality_report.md
  • Updated README to match current behavior and CLI usage.
  • Improved .gitignore for Rust/macOS/generated artifacts.
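
The PP ambiguity exact-count regression listed above can be illustrated with a toy chart parser. The sketch below is not the movement-aware parser from this PR: it swaps in a plain CKY tree counter over a hypothetical six-rule grammar in Chomsky normal form, only to show why the classic PP-attachment sentence has exactly two analyses (the PP attaches either to the verb phrase or to the object noun phrase).

```rust
use std::collections::HashMap;

/// Toy binary rules (lhs, left child, right child); hypothetical grammar.
const BINARY: &[(&str, &str, &str)] = &[
    ("S", "NP", "VP"),
    ("VP", "V", "NP"),
    ("VP", "VP", "PP"), // PP attaches to the verb phrase
    ("NP", "Det", "N"),
    ("NP", "NP", "PP"), // PP attaches to the noun phrase
    ("PP", "P", "NP"),
];

/// Toy lexicon: one category per word.
fn category(word: &str) -> Option<&'static str> {
    match word {
        "the" => Some("Det"),
        "man" | "dog" | "telescope" => Some("N"),
        "saw" => Some("V"),
        "with" => Some("P"),
        _ => None,
    }
}

/// Count distinct parse trees rooted in `S` using a CKY chart.
pub fn count_parses(sentence: &str) -> u64 {
    let words: Vec<&str> = sentence.split_whitespace().collect();
    let n = words.len();
    // chart[(start, end)] maps a category to its number of derivations.
    let mut chart: HashMap<(usize, usize), HashMap<&'static str, u64>> = HashMap::new();
    for (i, w) in words.iter().enumerate() {
        if let Some(cat) = category(w) {
            chart.entry((i, i + 1)).or_default().insert(cat, 1);
        }
    }
    for width in 2..=n {
        for start in 0..=n - width {
            let end = start + width;
            let mut cell: HashMap<&'static str, u64> = HashMap::new();
            for mid in start + 1..end {
                let left = chart.get(&(start, mid));
                let right = chart.get(&(mid, end));
                if let (Some(left), Some(right)) = (left, right) {
                    for &(lhs, b, c) in BINARY {
                        if let (Some(x), Some(y)) = (left.get(b), right.get(c)) {
                            // Every pairing of sub-derivations is a new tree.
                            *cell.entry(lhs).or_insert(0) += x * y;
                        }
                    }
                }
            }
            chart.insert((start, end), cell);
        }
    }
    chart.get(&(0, n)).and_then(|c| c.get("S")).copied().unwrap_or(0)
}
```

An exact-count assertion such as `assert_eq!(count_parses("the man saw the dog with the telescope"), 2)` fails both when an analysis is lost and when a spurious one appears, which is why the regression pins the count rather than checking for "at least one parse".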

Validation

  • cargo test -q
    • result: 31 passed, 0 failed, 5 ignored
  • cargo clippy --all-targets --all-features -q
    • result: clean

Current Scope / Known Limits

  • Grammar is intentionally controlled and small (not open-domain English).
  • Some question analyses still rely on a constrained normalization bridge; fully strict surface-faithful derivations for all constructions remain in progress.
  • Formal claims should be interpreted over this controlled grammar and implementation scope.
