- parglare version: 0.16.1
- Python version: 3.10.6
- Operating System: Windows 10, 64-bit
Description
I am trying to parse a language with a great deal of ambiguity. To resolve the ambiguity, I have assigned priorities to my terminals. However, it appears that not all parses are considered, so valid input fails to parse.
What I Did
I've reduced my grammar to the following:
grammar.pg
Program: Stmt+ dot;
Stmt: (add | ADD) id+ (to | TO) id;
terminals
dot: ".";
ADD: "ADD" {10};
TO: "TO" {10};
id: /[a-zA-Z]+/ {7};
add: "add" {5};
to: "to" {5};
The input I'm trying to parse is: ADD A to B ADD C TO D ADD E to F.. In this language there are no reserved keywords, keywords are not case-sensitive, and no statement terminator character is required. This is the disambiguation strategy I've tried to implement using lexing priorities:
- Uppercase words are considered keywords by default (priority 10)
- All other words are considered identifiers (priority 7)
- If a keyword is required, lowercase words should be considered a keyword (priority 5)
According to my strategy, I expect the following result: [ADD [A to B ADD C] TO D], [ADD [E] to F]., or at least any result.
Instead, I get the following error:
parglare.exceptions.ParseError: Error at 1:32:"ADD E to F **> ." => Expected: TO or id or to but found <dot(.)>
What I expected to happen is that, once using the final to as an identifier had failed, the parser would go back and retry it as a keyword, since the to terminal has a lower priority than id. However, this never seems to happen. Is this intentional, and if so, how could I best implement my disambiguation strategy?
If I use the GLRParser class and give the id and to terminals the same priority, I do get a parse forest with two paths, one of which is the expected one.
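To make the intended behaviour concrete, here is a minimal, library-free sketch in Python of the strategy described above: each word is tried as its highest-priority terminal first, and a lower-priority reading is only used when the higher one is not expected or leads to a dead end. It does not use parglare and is not meant to reflect how parglare works internally. The names candidates, TRANSITIONS, parse and group are made up for this illustration, and the state machine is a hand-written stand-in for the reduced grammar.

```python
import re

def candidates(word):
    """Possible terminal kinds for a word, highest priority first."""
    if word == ".":
        return ["dot"]
    if word in ("ADD", "TO"):
        return ["kw_" + word.lower(), "id"]   # keyword (10) before id (7)
    if word in ("add", "to"):
        return ["id", "kw_" + word]           # id (7) before keyword (5)
    return ["id"]

# Hand-written stand-in for the reduced grammar (an assumption of this sketch):
# for each state, the terminal kinds that are expected and the state that follows.
TRANSITIONS = {
    "start":    {"kw_add": "first_id"},
    "first_id": {"id": "more_ids"},
    "more_ids": {"id": "more_ids", "kw_to": "target"},
    "target":   {"id": "after"},
    "after":    {"kw_add": "first_id", "dot": "done"},
}

def parse(words, pos=0, state="start"):
    """Depth-first search with backtracking; returns [(word, kind), ...] or None."""
    if state == "done":
        return [] if pos == len(words) else None
    if pos == len(words):
        return None
    for kind in candidates(words[pos]):        # try readings in priority order
        next_state = TRANSITIONS[state].get(kind)
        if next_state is None:
            continue                           # this reading is not expected here
        rest = parse(words, pos + 1, next_state)
        if rest is not None:
            return [(words[pos], kind)] + rest
    return None                                # dead end: the caller retries

def group(tagged):
    """Collect the ids of each statement as (operands, target)."""
    stmts = []
    for word, kind in tagged:
        if kind == "kw_add":
            stmts.append([])
        elif kind == "id":
            stmts[-1].append(word)
    return [(ids[:-1], ids[-1]) for ids in stmts]

text = "ADD A to B ADD C TO D ADD E to F."
words = re.findall(r"[a-zA-Z]+|\.", text)
print(group(parse(words)))
```

For the input from this issue, the sketch prints [(['A', 'to', 'B', 'ADD', 'C'], 'D'), (['E'], 'F')], which is the result I expected: the first to stays an identifier, while the final to is first tried as an identifier, fails at the dot, and is then retried as a keyword.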