
Nearley dsl #69

Open

loyaltypollution wants to merge 19 commits into source-academy:main from loyaltypollution:NearleyDSL

Conversation

@loyaltypollution
Contributor

Summary:
This pull request (Fixes #68) introduces a unified Nearley + Moo–based parsing system. It replaces the old, fragmented setup (a manual tokenizer, a static grammar, and a separate AST DSL) with a single declarative pipeline that integrates tokenization, grammar rules, and AST generation.

Key Improvements:

  • Adds a Moo lexer for tokens and keywords.
  • Defines a Nearley grammar aligned with the existing language subset.
  • Embeds AST node generation within grammar rules.
  • Maintains compatibility with generate-ast.ts.
  • Introduces a build step to compile grammar into the parser.
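The pipeline above can be sketched in a single grammar.ne file. This is illustrative only: the token names, keywords, and AST node shapes below are assumptions for the example, not the actual py-slang rules.

```
@{%
const moo = require("moo");

# Single lexer definition: token patterns and keywords live together.
const lexer = moo.compile({
  ws:      /[ \t]+/,
  number:  /[0-9]+/,
  name:    { match: /[A-Za-z_][A-Za-z0-9_]*/,
             type: moo.keywords({ kw_if: "if", kw_else: "else" }) },
  plus:    "+",
  newline: { match: /\n/, lineBreaks: true },
});
%}

@lexer lexer

# AST construction is embedded directly in each rule's post-processor.
sum -> sum %plus term  {% ([lhs, , rhs]) => ({ type: "BinOp", op: "+", lhs, rhs }) %}
     | term            {% id %}

term -> %number        {% ([tok]) => ({ type: "Literal", value: Number(tok.value) }) %}
```

Compiling this with nearleyc (the build step mentioned above) yields a parser module in which lexing, grammar, and AST emission come from the one file.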

Benefits:

  • Centralized source of truth for grammar (grammar.ne + lexer.moo).
  • Automatic derivation of TokenType enums — no manual syncing.
  • Deprecates tokenizer.ts and Grammar.gram.
  • Easier debugging through Nearley’s readable parse trees.
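The "no manual syncing" point can be illustrated as follows. This is a hypothetical sketch, not the actual py-slang code: it assumes the lexer's rules live in one plain object, from which an enum-like token-name map can be derived instead of hand-maintaining a parallel TokenType enum.

```javascript
// Illustrative only: tokenRules stands in for the rule object in lexer.moo.
const tokenRules = {
  number: /[0-9]+/,
  name: /[A-Za-z_][A-Za-z0-9_]*/,
  plus: "+",
};

// Derive an enum-like map { number: "number", name: "name", plus: "plus" }
// directly from the rule names, so adding a token updates both at once.
const TokenType = Object.fromEntries(
  Object.keys(tokenRules).map((k) => [k, k])
);

console.log(TokenType.plus); // "plus"
```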

Next Steps:

  • Expand grammar to support more Python constructs (+=, comprehensions, etc.).
  • Reassess integration with generate-ast.ts: it currently creates ExprNS/StmtNS objects.

- Moved error handling logic to a dedicated errors module, improving organization and maintainability.
@loyaltypollution
Contributor Author

loyaltypollution commented Oct 29, 2025

Just discovered from a link that the code is slow:

main -> statement:* {% flatten %}

It turns out that instead of the post-processor function being executed once, when all statements are matched, it gets executed at every increment:

with 0 statements
with 1 statement
with 2 statements

This was the benchmarking result of a simple Ackermann function (benchmark screenshot omitted).

We created these post-processor functions to ensure the Nearley parser emitted the current internal Python AST. However, they are slow. A move to the Nearley parser might require rethinking the internal Python AST.
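One commonly suggested mitigation is to replace the `:*` macro with an explicit left-recursive rule, so each step's post-processor does a single append rather than re-flattening the entire prefix. This is a sketch under that assumption, not verified against this codebase; whether it helps enough here would need re-benchmarking.

```
# Illustrative rewrite (not from the PR): explicit left recursion with a
# cheap append, instead of statement:* plus a flatten pass.
main -> stmt_list {% id %}

stmt_list ->
      null                {% () => [] %}
    | stmt_list statement {% ([list, s]) => list.concat([s]) %}
```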

loyaltypollution and others added 16 commits October 30, 2025 02:13
This merge integrates the WASM compiler from main while preserving
the new Nearley-based parser from NearleyDSL.

Key changes:
- Adapted pyRunner.ts to use new parser (Tokenizer + Parser + Resolver)
- Updated AST type definitions to avoid duplication
- Fixed resolver to use new validators subsystem
- Merged package dependencies for both parser and WASM compiler
- Removed old translator and parser error handling (superseded by Nearley)
- Updated test utilities to work with new parser architecture
- Kept WASM compiler (now uses new AST types from py-slang parser)

The CSE machine already expected the new AST types, so no changes
were needed there. Both the WASM compiler and CSE machine now consume
the unified AST generated by the new Nearley parser.

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
- Fixed parser imports: use NearleyParser from parser-adapter
- Fixed Resolver constructor calls to match new signature (source, ast)
- Fixed WASM compiler imports and complex number parsing
- Fixed PyWasmEvaluator result type conversion
- Ensured all compilers work with new unified AST types

Build now completes successfully.
- Use NearleyParser from parser-adapter instead of non-existent Parser module
- Fix Resolver constructor to pass AST instead of chapter number

Parser and analysis tests now pass successfully.
Changed parser grammar to create BigIntLiteral nodes for integer literals
instead of Literal nodes with parsed numbers. This ensures integers stay
as bigint type throughout arithmetic operations instead of being converted
to float.

Now 23+1 returns 24 (bigint) instead of 24.0 (float).
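A hypothetical sketch of the change described in this commit (the node shape and helper name are assumptions, not the actual grammar code): the integer-literal post-processor emits a BigIntLiteral node carrying a bigint, so later arithmetic stays exact instead of passing through floating point.

```javascript
// Illustrative post-processor: wrap the raw token text in a bigint-typed node
// rather than Number(tok.value), which would coerce integers to floats.
const intLiteral = ([tok]) => ({
  type: "BigIntLiteral",
  value: BigInt(tok.value),
});

const node = intLiteral([{ value: "23" }]);
console.log(node.type);         // "BigIntLiteral"
console.log(node.value + 1n);   // 24n
console.log(typeof node.value); // "bigint"
```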
Tests now expect BigIntLiteral for integer literals instead of Literal,
reflecting the grammar changes to properly preserve integer types.
The issue was that we were creating a Tokenizer separately and passing
tokens to the Nearley parser, which has its own integrated lexer and
ignores the tokens parameter.

Reverted to using the parse() function from parser-adapter, which matches
the original implementation and allows Nearley to use its integrated lexer
correctly for handling indentation and nested structures.

This fixes the 'Unexpected token: else' error in nested if statements.


Development

Successfully merging this pull request may close these issues.

Hard-coded Tokenizer and Parser limit Python language extensibility
