"unrecognized input" after upgrade #208

@danabr

Description

I recently upgraded from FsLexYacc 10.0 to the latest version, 11.3.0. Since the upgrade, parsing the comment line "// ä" fails with "unrecognized input" (the exception is raised from inside the lexer). I have not changed the lexer or parser options, nor the lexer or parser definitions.

Repro steps

I have managed to create a small-ish reproducer:

Parser.fsy:

%token EOF
%token <string*FSharp.Text.Lexing.Position> IDENTIFIER

%start top
%type <string> top

%%

top: EOF { "hello" }

Lexer.fsl:

{
module Lexer

open FSharp.Text.Lexing
open Parser

let lexeme lexbuf = LexBuffer<char>.LexemeString lexbuf

}

let alpha = ['a' - 'z' 'A' - 'Z']
let swe = ['ä' 'Ä' 'ö' 'Ö' 'å' 'Å' ]
let letter = alpha | swe
let ident = letter+
let newline = ('\n' | "\r\n" )

rule token = parse
| "//"           { commentline lexbuf.StartPos lexbuf }
| ident          { IDENTIFIER(lexeme lexbuf, lexbuf.StartPos) }
| newline        { token lexbuf }
| eof            { EOF }
| _              { failwith "unknown token" }

and commentline p = parse
| newline        { token lexbuf }
| eof            { EOF }
| _              { commentline p lexbuf }

Program.fs:

open Parser
open Lexer

let input = "// ä"
let lexbuf = FSharp.Text.Lexing.LexBuffer<_>.FromString input
let result = Parser.top Lexer.token lexbuf

printfn "%s" result

FsLexYaccRepro.fsproj:

<Project Sdk="Microsoft.NET.Sdk">
  <PropertyGroup>
    <OutputType>Exe</OutputType>
    <TargetFramework>net8.0</TargetFramework>
  </PropertyGroup>

  <ItemGroup>
    <PackageReference Include="FsLexYacc.Runtime" Version="11.3.0" />
    <PackageReference Include="FsLexYacc" Version="11.3.0" />
  </ItemGroup>

  <ItemGroup>
    <FsLex Include="Lexer.fsl">
      <OtherFlags>--unicode</OtherFlags>
    </FsLex>
    <FsYacc Include="Parser.fsy">
      <OtherFlags>--module Parser</OtherFlags>
    </FsYacc>
    <Compile Include="Parser.fs" />
    <Compile Include="Lexer.fs" />
    <Compile Include="Program.fs" />
  </ItemGroup>
</Project>

Expected behavior

When the program above is run with dotnet run, it should print "hello".

Actual behavior

Instead, the program throws an exception with the following stack trace:

Unhandled exception. System.Exception: unrecognized input
   at FSharp.Text.Lexing.LexBuffer`1.EndOfScan() in /home/runner/work/FsLexYacc/FsLexYacc/src/FsLexYacc.Runtime/Lexing.fs:line 128
   at FSharp.Text.Lexing.UnicodeTables.scanUntilSentinel(LexBuffer`1 lexBuffer, Int32 state) in /home/runner/work/FsLexYacc/FsLexYacc/src/FsLexYacc.Runtime/Lexing.fs:line 448
   at Lexer.commentline(Position p, LexBuffer`1 lexbuf) in C:\cygwin64\home\daab\dev\FsLexYaccRepro\Lexer.fs:line 81
   at Lexer.token(LexBuffer`1 lexbuf) in C:\cygwin64\home\daab\dev\FsLexYaccRepro\Lexer.fs:line 18
   at Program.result@6.Invoke(LexBuffer`1 lexbuf)
   at FSharp.Text.Parsing.Implementation.interpret[tok,a](Tables`1 tables, FSharpFunc`2 lexer, LexBuffer`1 lexbuf, Int32 initialState) in /home/runner/work/FsLexYacc/FsLexYacc/src/FsLexYacc.Runtime/Parsing.fs:line 346
   at FSharp.Text.Parsing.Tables`1.Interpret[char](FSharpFunc`2 lexer, LexBuffer`1 lexbuf, Int32 startState) in /home/runner/work/FsLexYacc/FsLexYacc/src/FsLexYacc.Runtime/Parsing.fs:line 498
   at Parser.engine[a](FSharpFunc`2 lexer, LexBuffer`1 lexbuf, Int32 startState) in C:\cygwin64\home\daab\dev\FsLexYaccRepro\Parser.fs:line 111
   at Parser.top[a](FSharpFunc`2 lexer, LexBuffer`1 lexbuf) in C:\cygwin64\home\daab\dev\FsLexYaccRepro\Parser.fs:line 113
   at <StartupCode$FsLexYaccRepro>.$Program.main@() in C:\cygwin64\home\daab\dev\FsLexYaccRepro\Program.fs:line 6

Note that the input "// a" parses fine. Parsing "// ä" also succeeds if I remove ä from swe in Lexer.fsl.
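The passing and failing inputs differ in a single character. As a minimal diagnostic sketch (a standalone F# script for dotnet fsi that does not use FsLexYacc and does not identify the cause), the script below prints the code point of each input character and of each member of the swe class. The helper names describe and hasNonAscii are hypothetical and exist only for this illustration. It shows that ä is U+00E4 and that every character in swe lies in the Latin-1 Supplement range (U+0080 to U+00FF), whereas "// a" is pure ASCII.

```fsharp
// Diagnostic sketch with hypothetical helpers (not part of FsLexYacc):
// print the code point of each character so the passing and failing
// inputs can be compared side by side.

/// Formats every character of the string as 'c' U+XXXX.
let describe (input: string) =
    input
    |> Seq.map (fun c -> sprintf "'%c' U+%04X" c (int c))
    |> String.concat ", "

/// True if any character lies outside the 7-bit ASCII range.
let hasNonAscii (input: string) =
    input |> Seq.exists (fun c -> int c > 0x7F)

// The two inputs from the report: "// a" works, "// ä" fails.
for input in [ "// a"; "// ä" ] do
    printfn "%s  non-ASCII=%b  [%s]" input (hasNonAscii input) (describe input)

// The characters of the swe class in Lexer.fsl.
let swe = [ 'ä'; 'Ä'; 'ö'; 'Ö'; 'å'; 'Å' ]
for c in swe do
    printfn "%s" (describe (string c))
```

The script should be saved as UTF-8 so that the literal ä and the other swe characters are decoded correctly.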