
Lexical disambiguation not working as expected #146

@bramhaag

Description

  • parglare version: 0.16.1
  • Python version: 3.10.6
  • Operating System: Windows 10, 64-bit


I am trying to parse a language with a large amount of ambiguity. To resolve the ambiguity, I have assigned priorities to my terminals. However, it appears that not all possible parses are considered, and valid input fails to parse as a result.

What I Did

I've reduced my grammar to the following:

grammar.pg:

```
Program: Stmt+ dot;
Stmt: (add | ADD) id+ (to | TO) id;

terminals
dot: ".";
ADD: "ADD" {10};
TO: "TO" {10};

id: /[a-zA-Z]+/ {7};

add: "add" {5};
to: "to" {5};
```

The input I'm trying to parse is `ADD A to B ADD C TO D ADD E to F.` In this language there are no reserved keywords, keywords are not case-sensitive, and there is no required statement terminator. The disambiguation strategy I've tried to implement using lexical priorities is:

  1. Uppercase words are considered keywords by default (priority 10)
  2. All other words are considered identifiers (priority 7)
  3. If a keyword is required, lowercase words should be treated as keywords (priority 5)
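The three rules above can be modelled as a context-aware scanner: in each parser state only the terminals that are valid there are candidates, and the highest-priority match wins. The sketch below is plain Python and a simplification under that assumption, not parglare's actual scanner; `TERMINALS` and `scan` are hypothetical names, and the priority of `dot` (not set in grammar.pg) is assumed to be 10.

```python
import re

# Terminal name -> (priority, regex), transcribed from grammar.pg.
# In this sketch, string terminals match case-sensitively; `id` is the only regex terminal.
TERMINALS = {
    "dot": (10, r"\."),  # priority not set in grammar.pg; assumed 10
    "ADD": (10, r"ADD"),
    "TO": (10, r"TO"),
    "id": (7, r"[a-zA-Z]+"),
    "add": (5, r"add"),
    "to": (5, r"to"),
}


def scan(word, expected):
    """Return the highest-priority terminal in `expected` that matches the
    whole `word`, or None. A scanner without backtracking commits to this choice."""
    matches = [t for t in expected if re.fullmatch(TERMINALS[t][1], word)]
    if not matches:
        return None
    return max(matches, key=lambda t: TERMINALS[t][0])


# Start of a statement: only ADD/add are valid, so lowercase `add` is the keyword.
print(scan("add", ("ADD", "add")))     # add
# After an id inside id+: another id, TO or to may follow.
print(scan("to", ("id", "TO", "to")))  # id  (7 beats 5)
print(scan("TO", ("id", "TO", "to")))  # TO  (10 beats 7)
```

Note that in this grammar the position of `to`/`TO` always also accepts another `id` (from `id+`), so rule 3 never applies there: a lowercase `to` is always committed to `id`. In `ADD E to F.` the final `.` then arrives where `id`, `to` or `TO` is expected, consistent with the error reported below.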

Under this strategy, I expect the following result: `[ADD [A to B ADD C] TO D], [ADD [E] to F].`, or at least some successful parse.

Instead, I get the following error:

parglare.exceptions.ParseError: Error at 1:32:"ADD E to F **> ." => Expected: TO or id or to but found <dot(.)>

I expected that once treating the final `to` as an identifier failed, the parser would retry it as the `to` keyword instead, since that terminal has a lower priority. However, this never seems to happen. Is this intentional, and if so, how can I best implement my disambiguation strategy?
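The retry behaviour described above amounts to backtracking over lexical alternatives. Here is a minimal pure-Python sketch of that idea for this reduced grammar only: the parser states are hand-coded, words are split with a regular expression, and `EXPECTED`, `candidates` and `parse` are hypothetical names, not part of parglare. At each word it tries the matching terminals allowed in the current state in descending priority, and falls back to the next candidate when the rest of the input cannot be parsed. As the error above shows, parglare's LR `Parser` commits to the first choice instead.

```python
import re

# Terminal name -> (priority, regex), transcribed from grammar.pg.
TERMINALS = {
    "dot": (10, r"\."),  # priority not set in grammar.pg; assumed 10
    "ADD": (10, r"ADD"),
    "TO": (10, r"TO"),
    "id": (7, r"[a-zA-Z]+"),
    "add": (5, r"add"),
    "to": (5, r"to"),
}

# Terminals accepted in each hand-coded (hypothetical) parser state.
EXPECTED = {
    "start": ("ADD", "add"),    # beginning of a Stmt
    "first_id": ("id",),        # first id after ADD/add
    "ids": ("id", "TO", "to"),  # more ids, or the TO/to keyword
    "target": ("id",),          # the id after TO/to
}


def candidates(word, expected):
    """Terminals in `expected` that match `word`, highest priority first."""
    matches = [t for t in expected if re.fullmatch(TERMINALS[t][1], word)]
    return sorted(matches, key=lambda t: -TERMINALS[t][0])


def parse(text):
    """Return a tuple of (kw, ids, to_kw, target) statements, or None."""
    words = re.findall(r"[a-zA-Z]+|\.", text)

    def step(pos, state, stmts, cur):
        if pos == len(words):
            return None  # input ended before the final dot
        expected = EXPECTED[state]
        if state == "start" and stmts:
            expected += ("dot",)  # Stmt+ is satisfied, so the program may end
        for t in candidates(words[pos], expected):
            word = words[pos]
            if t == "dot":
                # The dot must be the last token.
                result = stmts if pos + 1 == len(words) else None
            elif state == "start":
                result = step(pos + 1, "first_id", stmts, (word, ()))
            elif t == "id" and state in ("first_id", "ids"):
                kw, ids = cur
                result = step(pos + 1, "ids", stmts, (kw, ids + (word,)))
            elif state == "ids":  # t is "TO" or "to"
                result = step(pos + 1, "target", stmts, cur + (word,))
            else:  # state == "target" and t == "id": the Stmt is complete
                result = step(pos + 1, "start", stmts + (cur + (word,),), None)
            if result is not None:
                return result
        return None  # every candidate failed: backtrack to the caller

    return step(0, "start", (), None)


print(parse("ADD A to B ADD C TO D ADD E to F."))
```

For the input above this yields the statements `("ADD", ("A", "to", "B", "ADD", "C"), "TO", "D")` and `("ADD", ("E",), "to", "F")`, i.e. the expected `[ADD [A to B ADD C] TO D], [ADD [E] to F].` Naive backtracking like this can take exponential time in the worst case.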

If I use the `GLRParser` class and give the `id` and `to` terminals the same priority, I do get a parse forest with two paths, one of which is the expected one.
