Skip to content

Conversation

yhabteab
Copy link
Member

@yhabteab yhabteab commented Apr 27, 2023

After all the testing we did last week, it turns out that the current implementation of the filter Parser is very buggy and sometimes crashes with very simple filter strings. This PR solves this by using goyacc generated Parser. This is similar to how we parse Icinga2 DSL with flex/bison. The grammar rules can be found in the parser.y file and you can find the Lexer and other parsing utils in the lexer.go file.

For local testing/debugging purposes you can re-build the parser with the following command (go generate ./...). Since Go 1.24, you don't even need to pre-install the goyacc tool into your GOBIN, instead it's been added as a tool dependency to the go.mod file, so you can just run go mod tidy to download it like any other missing dependency. It won't install it to your GOBIN, but it will be available in the module cache and can be used to generate the parser.

The go generate command will also generate a parser.output file which contains all the parser states and is useful to resolve some shift/reduce conflicts.

@cla-bot cla-bot bot added the cla/signed CLA is signed by all contributors of a PR label Apr 27, 2023
@yhabteab yhabteab force-pushed the enhanced-filter-parser branch 3 times, most recently from 941daff to 080e335 Compare May 2, 2023 07:50
@yhabteab yhabteab changed the title [Draft]: Enhance filter parser Enhance filter parser May 2, 2023
@yhabteab yhabteab requested a review from julianbrost May 2, 2023 08:12
@julianbrost
Copy link
Collaborator

However, I haven't pushed the auto-generated parser yet because it involves a lot of code changes that don't need to be reviewed, but when this PR is done, we can think about merging this parser here or auto-generating it every time when building noma, basically just like the Icinga2 *.ti files.

Have a look at go generate. Also, common practice in Go is to commit these files to the repo so that you don't have to bother with the these dependencies unless you actually want to change the parser.

@yhabteab yhabteab force-pushed the enhanced-filter-parser branch 2 times, most recently from 2337de5 to 176b442 Compare May 3, 2023 10:00
@yhabteab
Copy link
Member Author

yhabteab commented May 3, 2023

fuzz: elapsed: 3m30s, execs: 1483038 (7807/sec), new interesting: 94 (total: 1558)
fuzz: elapsed: 3m33s, execs: 1507525 (8160/sec), new interesting: 94 (total: 1558)
fuzz: elapsed: 3m36s, execs: 1530168 (7550/sec), new interesting: 94 (total: 1558)
fuzz: elapsed: 3m39s, execs: 1553639 (7821/sec), new interesting: 94 (total: 1558)
fuzz: elapsed: 3m42s, execs: 1576623 (7661/sec), new interesting: 94 (total: 1558)
fuzz: elapsed: 3m45s, execs: 1595193 (6190/sec), new interesting: 95 (total: 1559)
fuzz: elapsed: 3m48s, execs: 1609919 (4910/sec), new interesting: 97 (total: 1561)
fuzz: elapsed: 3m51s, execs: 1630670 (6917/sec), new interesting: 99 (total: 1563)
fuzz: elapsed: 3m54s, execs: 1648763 (6029/sec), new interesting: 99 (total: 1563)
fuzz: elapsed: 3m57s, execs: 1667104 (6113/sec), new interesting: 99 (total: 1563)
fuzz: elapsed: 4m0s, execs: 1685923 (6255/sec), new interesting: 99 (total: 1563)
fuzz: elapsed: 4m3s, execs: 1704380 (6154/sec), new interesting: 99 (total: 1563)
fuzz: elapsed: 4m6s, execs: 1723764 (6479/sec), new interesting: 99 (total: 1563)
fuzz: elapsed: 4m9s, execs: 1744018 (6752/sec), new interesting: 99 (total: 1563)
fuzz: elapsed: 4m12s, execs: 1758080 (5418/sec), new interesting: 100 (total: 1564)
--- PASS: FuzzParser (251.66s)
PASS
ok  	github.com/icinga/noma/internal/filter	251.870s

@yhabteab yhabteab force-pushed the enhanced-filter-parser branch 5 times, most recently from 089cfc5 to 877a020 Compare May 3, 2023 16:32
@yhabteab yhabteab force-pushed the enhanced-filter-parser branch 7 times, most recently from 6cbb497 to 67991f2 Compare October 16, 2023 06:51
@yhabteab yhabteab force-pushed the enhanced-filter-parser branch 7 times, most recently from 8b2c4eb to 3603283 Compare October 26, 2023 14:52
@julianbrost julianbrost requested a review from oxzi October 27, 2023 12:27
Copy link
Member

@oxzi oxzi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just went a bit over the code and needed to educate myself about yacc again. Feel free to argue on each of my comments.

lex.Scanner.Error = lex.ScanError

// Enable parsers error verbose to get more context of the parsing failures
yyErrorVerbose = true
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Turns out, this line resp. writing this internal variable results in data races.
As the tests are set to run in parallel, this Parse function is called concurrently and there are racing writes on yyErrorVerbose.

How bad is this? In this specific context, I wouldn't be really concerned but this can become worse later on.
As this is more off an global variable configuration, I would suggest setting this in an init function or check with goyacc directly?

==================
WARNING: DATA RACE
Write at 0x0000009e228f by goroutine 9:
  github.com/icinga/icinga-notifications/internal/filter.Parse()
      /home/apenning/git/github.com/Icinga/icinga-notifications/internal/filter/lexer.go:29 +0x424
  github.com/icinga/icinga-notifications/internal/filter.TestParser.func2()
      /home/apenning/git/github.com/Icinga/icinga-notifications/internal/filter/parser_test.go:86 +0x4f
  testing.tRunner()
      /nix/store/dwmb0qcai52d0zkgpm6f5ifx2a8yvsdg-go-1.21.3/share/go/src/testing/testing.go:1595 +0x238
  testing.(*T).Run.func1()
      /nix/store/dwmb0qcai52d0zkgpm6f5ifx2a8yvsdg-go-1.21.3/share/go/src/testing/testing.go:1648 +0x44

Previous write at 0x0000009e228f by goroutine 8:
  github.com/icinga/icinga-notifications/internal/filter.Parse()
      /home/apenning/git/github.com/Icinga/icinga-notifications/internal/filter/lexer.go:29 +0x424
  github.com/icinga/icinga-notifications/internal/filter.TestParser.func1()
      /home/apenning/git/github.com/Icinga/icinga-notifications/internal/filter/parser_test.go:16 +0x44
  testing.tRunner()
      /nix/store/dwmb0qcai52d0zkgpm6f5ifx2a8yvsdg-go-1.21.3/share/go/src/testing/testing.go:1595 +0x238
  testing.(*T).Run.func1()
      /nix/store/dwmb0qcai52d0zkgpm6f5ifx2a8yvsdg-go-1.21.3/share/go/src/testing/testing.go:1648 +0x44

Goroutine 9 (running) created at:
  testing.(*T).Run()
      /nix/store/dwmb0qcai52d0zkgpm6f5ifx2a8yvsdg-go-1.21.3/share/go/src/testing/testing.go:1648 +0x82a
  github.com/icinga/icinga-notifications/internal/filter.TestParser()
      /home/apenning/git/github.com/Icinga/icinga-notifications/internal/filter/parser_test.go:83 +0x68
  testing.tRunner()
      /nix/store/dwmb0qcai52d0zkgpm6f5ifx2a8yvsdg-go-1.21.3/share/go/src/testing/testing.go:1595 +0x238
  testing.(*T).Run.func1()
      /nix/store/dwmb0qcai52d0zkgpm6f5ifx2a8yvsdg-go-1.21.3/share/go/src/testing/testing.go:1648 +0x44

Goroutine 8 (running) created at:
  testing.(*T).Run()
      /nix/store/dwmb0qcai52d0zkgpm6f5ifx2a8yvsdg-go-1.21.3/share/go/src/testing/testing.go:1648 +0x82a
  github.com/icinga/icinga-notifications/internal/filter.TestParser()
      /home/apenning/git/github.com/Icinga/icinga-notifications/internal/filter/parser_test.go:13 +0x4b
  testing.tRunner()
      /nix/store/dwmb0qcai52d0zkgpm6f5ifx2a8yvsdg-go-1.21.3/share/go/src/testing/testing.go:1595 +0x238
  testing.(*T).Run.func1()
      /nix/store/dwmb0qcai52d0zkgpm6f5ifx2a8yvsdg-go-1.21.3/share/go/src/testing/testing.go:1648 +0x44
==================

Copy link
Member Author

@yhabteab yhabteab Oct 27, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No idea 🤷 how to set this using goyacc, but for now I will just use init() as discussed offline.

@yhabteab yhabteab force-pushed the enhanced-filter-parser branch from 3603283 to 8f8936d Compare October 27, 2023 15:40
Comment on lines 13 to 15
// identifiersMatcher contains a compiled regexp and is used by the Lexer to match filter identifiers.
// Currently, it allows to match any character except a LogicalOp and CompOperator.
var identifiersMatcher = regexp.MustCompile("[^!&|~<>=()]")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would make sense to work with a positive list here. Doing it like this, there is only very limited room for expansion. For example, if the regex would have been [^!&|<>=()] before (missing the ~), adding ~ there would have made previously valid identifiers invalid. Yes, this means that more characters have to be escaped/urlencoded, but I don't think that this would be bad, it would just have to be consistent with what the web module writes into the database.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't really understand on why don't doing it the way it is, and I don't want to list all the valid chars because I don't know them all.

Comment on lines 53 to 103
// Lexer is used to tokenize the filter input into a set literals.
// This is just a wrapper around the Scanner type and implements the yyLexer interface used by the parser.
type Lexer struct {
scanner.Scanner
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not fully understanding this yet, but https://pkg.go.dev/text/scanner says:

By default, a Scanner skips white space and Go comments and recognizes all literals as defined by the Go language specification. It may be customized to recognize only a subset of those literals and to recognize different identifier and white space characters.

I don't see any such customization, which sounds like this does way more than it has to and even should do.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Go white spaces are just like in PHP and the IPL parser also skips all these characters, just like this lexer does, so I won't do anything in this regard. For other customizations, see the inline comments in the Parse() function.

@yhabteab yhabteab force-pushed the enhanced-filter-parser branch from 8f8936d to ff0ab0c Compare November 6, 2023 12:02
@yhabteab yhabteab force-pushed the enhanced-filter-parser branch 2 times, most recently from eebcabd to 1ccb78e Compare November 20, 2023 15:32
@yhabteab yhabteab force-pushed the enhanced-filter-parser branch from 1ccb78e to e5cd4e2 Compare December 1, 2023 09:02
@yhabteab yhabteab force-pushed the enhanced-filter-parser branch from e5cd4e2 to 98471a8 Compare April 16, 2024 12:52
@yhabteab yhabteab force-pushed the enhanced-filter-parser branch from 98471a8 to af89454 Compare August 1, 2024 11:53
@yhabteab yhabteab requested review from oxzi and julianbrost August 1, 2024 14:45
@yhabteab yhabteab force-pushed the enhanced-filter-parser branch from af89454 to 7142584 Compare August 16, 2024 14:51
@yhabteab yhabteab force-pushed the enhanced-filter-parser branch from 7142584 to 174a428 Compare July 15, 2025 13:59
yhabteab added 4 commits July 15, 2025 16:03
Otherwise the dwarf information of that parser will get messed
up - incomplete and one will not be able to step into that source
code using a debugger (tested it using delve).
@yhabteab yhabteab force-pushed the enhanced-filter-parser branch from 174a428 to d8af662 Compare July 15, 2025 14:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cla/signed CLA is signed by all contributors of a PR
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants