
Whitespace terminal rules are always matched first #1828

@dsogari

Description

Langium version: latest
Package name: langium

Steps To Reproduce

  1. Open the playground page.

  2. Copy and paste the following grammar into the Grammar pane:

    grammar Test
    
    entry Doc: Node;
    
    Node: name=NAME WS TEXT;
    
    terminal NAME:  /\w+/;
    terminal WS:    /\s/; // matches whitespace
    terminal TEXT:  /.+/; // matches whitespace
    
  3. Copy and paste the following content into the Content pane (notice the trailing spaces):

    abc   
    

Link to code example: playground example

The current behavior

Image

The expected behavior

The content should be parsed and the following syntax tree should appear in the Syntax tree pane:

    {
      $type: "Node",
      name: "abc"
    }

Additional notes

I think this behaviour is caused by the following piece of code in token-builder.ts:

    const pattern = terminalToken.PATTERN;
    if (typeof pattern === 'object' && pattern && 'test' in pattern && isWhitespace(pattern)) {
        tokens.unshift(terminalToken);
    } else {
        tokens.push(terminalToken);
    }

In other words, whitespace-matching lexer rules are given priority, which changes the order specified in the grammar and alters the expected behavior. This is especially frustrating when, for instance, a single whitespace character is needed as a delimiter/separator in specific parser rules.
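To illustrate the reordering, here is a minimal standalone sketch. It does not use Langium or Chevrotain. The `TerminalToken` interface, `matchesWhitespace`, `orderTokens`, and `lex` are hypothetical stand-ins, and it assumes that a pattern counts as "whitespace" when it matches a single space and that the lexer takes the first token in the list that matches. Under those assumptions, both `WS` and `TEXT` are moved to the front, and their relative order is reversed:

```typescript
// Hypothetical stand-ins for illustration only; not Langium's actual types or helpers.
interface TerminalToken {
    name: string;
    PATTERN: RegExp;
}

// Assumption: a pattern is treated as "whitespace" if it matches a single space.
function matchesWhitespace(pattern: RegExp): boolean {
    return pattern.test(' ');
}

// Mirrors the unshift/push logic quoted above.
function orderTokens(terminals: TerminalToken[]): TerminalToken[] {
    const tokens: TerminalToken[] = [];
    for (const terminal of terminals) {
        if (matchesWhitespace(terminal.PATTERN)) {
            tokens.unshift(terminal); // whitespace-matching rules go to the front
        } else {
            tokens.push(terminal);
        }
    }
    return tokens;
}

// Toy lexer: at each offset, the first token in the list that matches wins.
function lex(input: string, tokens: TerminalToken[]): string[] {
    const result: string[] = [];
    let pos = 0;
    while (pos < input.length) {
        let matched = false;
        for (const token of tokens) {
            const sticky = new RegExp(token.PATTERN.source, 'y');
            sticky.lastIndex = pos;
            const match = sticky.exec(input);
            if (match && match[0].length > 0) {
                result.push(token.name);
                pos += match[0].length;
                matched = true;
                break;
            }
        }
        if (!matched) {
            throw new Error(`No token matches at offset ${pos}`);
        }
    }
    return result;
}

// Terminals in the order they appear in the grammar above.
const grammarOrder: TerminalToken[] = [
    { name: 'NAME', PATTERN: /\w+/ },
    { name: 'WS', PATTERN: /\s/ },
    { name: 'TEXT', PATTERN: /.+/ },
];

const reorderedNames = orderTokens(grammarOrder).map(t => t.name);
console.log(reorderedNames); // [ 'TEXT', 'WS', 'NAME' ]

const lexed = lex('abc   ', orderTokens(grammarOrder));
console.log(lexed); // [ 'TEXT' ]
```

In this sketch, `TEXT` ends up first because `/.+/` also matches a space, so the whole input is consumed as a single `TEXT` token and no `NAME` token is ever produced.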

Labels: as designed (The feature in question is working as designed)