-
Notifications
You must be signed in to change notification settings - Fork 640
main: using regex for choosing a parser for given file name #4270
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
a5a3a28 to
c50f467
Compare
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #4270 +/- ##
==========================================
+ Coverage 86.01% 86.02% +0.01%
==========================================
Files 250 251 +1
Lines 64159 64352 +193
==========================================
+ Hits 55187 55361 +174
- Misses 8972 8991 +19 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR extends the --map-<LANG> option to support regular expression matching for file names, addressing limitations where glob patterns and extension matching are insufficient for generic file names. The implementation adds a new regex-based mapping type alongside existing extension and pattern mappings.
Key Changes:
- Introduced regex pattern support using
%regex%[i]syntax for language file mappings - Added new
rexprcodemodule to handle regex compilation and matching - Extended optlib2c to generate C code from regex mapping definitions
Reviewed Changes
Copilot reviewed 16 out of 17 changed files in this pull request and generated 7 comments.
Show a summary per file
| File | Description |
|---|---|
| main/rexprcode.c | New module implementing regex pattern compilation, matching, and encoding |
| main/rexprcode_p.h | Public interface for regex code operations |
| main/parse.c | Core integration of regex matching into language detection logic |
| main/parse.h | Added rExprSrc structure definition and REXPR_LAST_ENTRY macro |
| main/parse_p.h | Extended langmapType enum with LMAP_REXPR flag |
| main/options.c | Command-line option parsing for regex patterns with icase flag support |
| optlib/rpmMacros.ctags | Example usage replacing commented-out patterns with regex |
| optlib/rpmMacros.c | Generated C code with regex mapping definitions |
| misc/optlib2c | Extended Perl script to parse and generate regex mapping code |
| source.mak | Build system updates for new source files |
| win32/ctags_vs2013.vcxproj | Visual Studio project file updates |
| win32/ctags_vs2013.vcxproj.filters | Visual Studio filter file updates |
| Tmain/list-map-rexprs.d/* | Test cases for new --list-map-rexprs option |
| Tmain/versioning.d/stdout-expected.txt | Updated test output expectations |
Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.
misc/optlib2c
Outdated
| unless ($_[0]->{'langdef'} eq $1); | ||
| my $spec = $2; | ||
| if ($spec =~ /\((.*)\)/) { | ||
| if ($spec =~ /%(.+)%(i)?/) { |
Copilot
AI
Oct 19, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The regex pattern %(.+)% is greedy and will match incorrectly if the expression contains '%' characters, even with escaping. For example, %a%b%c% would capture a%b%c instead of a. The pattern should be non-greedy: %(.+?)%(i)?/ or better yet, should properly handle escaped '%' characters in the capture group.
| if ($spec =~ /%(.+)%(i)?/) { | |
| if ($spec =~ /%(.+?)%(i)?/) { |
Copilot uses AI. Check for mistakes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
Copilot reviewed 16 out of 17 changed files in this pull request and generated 1 comment.
Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.
misc/optlib2c
Outdated
| unless ($_[0]->{'langdef'} eq $1); | ||
| my $spec = $2; | ||
| if ($spec =~ /\((.*)\)/) { | ||
| if ($spec =~ /%(.+)%(i)?/) { |
Copilot
AI
Oct 19, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The regex should use a non-greedy quantifier .+? instead of .+ to prevent matching across multiple patterns when there are multiple % characters in the input. This could cause incorrect parsing of escaped % characters.
| if ($spec =~ /%(.+)%(i)?/) { | |
| if ($spec =~ /%(.+?)%(i)?/) { |
Copilot uses AI. Check for mistakes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
Copilot reviewed 16 out of 17 changed files in this pull request and generated 5 comments.
Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
Copilot reviewed 16 out of 17 changed files in this pull request and generated 3 comments.
Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.
misc/optlib2c
Outdated
| if ($spec =~ /%(.+?)%(i|\{icase\})?/) { | ||
| my $rexpr = { expr => $1, | ||
| iCase => (defined $2 && ($2 eq 'i' || $2 eq 'icase'))? 1: 0 }; |
Copilot
AI
Oct 19, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The regex allows {icase} as an alternative to i, but this syntax is not documented in the PR description or help text. Either document this alternative syntax or remove it to avoid confusion.
| if ($spec =~ /%(.+?)%(i|\{icase\})?/) { | |
| my $rexpr = { expr => $1, | |
| iCase => (defined $2 && ($2 eq 'i' || $2 eq 'icase'))? 1: 0 }; | |
| if ($spec =~ /%(.+?)%(i)?/) { | |
| my $rexpr = { expr => $1, | |
| iCase => (defined $2 && $2 eq 'i')? 1: 0 }; |
Copilot uses AI. Check for mistakes.
|
|
||
| static flagDefinition langmapRexprFlagDef[] = { | ||
| { 'i', "icase", langmap_rexpr_icase_short, langmap_rexpr_icase_long, | ||
| NULL, "applied in a case-insensitive manner"}, |
Copilot
AI
Oct 19, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The long flag name is 'icase', but the optlib2c script also accepts '{icase}' syntax (line 296 in misc/optlib2c). These should be consistent, or the alternative syntax should be documented.
| NULL, "applied in a case-insensitive manner"}, | |
| NULL, "applied in a case-insensitive manner (accepts both 'icase' and '{icase}' syntax)"}, |
Copilot uses AI. Check for mistakes.
231a606 to
f865e33
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
Copilot reviewed 16 out of 17 changed files in this pull request and generated 2 comments.
Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
Copilot reviewed 21 out of 23 changed files in this pull request and generated no new comments.
Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
Copilot reviewed 22 out of 24 changed files in this pull request and generated no new comments.
Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
Copilot reviewed 22 out of 24 changed files in this pull request and generated no new comments.
Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.
Signed-off-by: Masatake YAMATO <[email protected]>
Signed-off-by: Masatake YAMATO <[email protected]>
baseFilenameSansExtensionNew("a.in.in", ".in.in") could not return
"a" with the original code.
Signed-off-by: Masatake YAMATO <[email protected]>
Signed-off-by: Masatake YAMATO <[email protected]>
Signed-off-by: Masatake YAMATO <[email protected]>
…nsExtensionNew Delete baseFilenameSansExtensionNew() from the source tree. Signed-off-by: Masatake YAMATO <[email protected]>
The original code used a boolean value to toggle how filenames were mapped to the parser by glob-like pattern or by extension. To support the third way mapping a file name to a parser, by regular expression pattern, we will use an enum value instead of Boolean. Signed-off-by: Masatake YAMATO <[email protected]>
Signed-off-by: Masatake YAMATO <[email protected]>
This change extends --map-<LANG> option to support regular expression matching with the full file name. The original --map-<LANG> option supports the glob based matching and the extension comparison with the file basename. However, two methods are not enough if the file names are too generic. See universal-ctags#3287 . The regular expression passed to --map-<LANG> must be surrounded by % character like --map-RpmMacros='%(.*/)?macros\.d/macros\.([^/]+)$%' If you want to match in a case-insensitive way, append `i' after the second % like --map-RpmMacros='%(.*/)?macros\.d/macros\.([^/]+)$%i' If you want to use % as part of an expression, put \ before % for escaping. Signed-off-by: Masatake YAMATO <[email protected]>
Signed-off-by: Masatake YAMATO <[email protected]>
Signed-off-by: Masatake YAMATO <[email protected]>
…options Signed-off-by: Masatake YAMATO <[email protected]>
Signed-off-by: Masatake YAMATO <[email protected]>
Signed-off-by: Masatake YAMATO <[email protected]>
|
Updating the man page is the last task. |
main: using regex for choosing a parser for the given file name
This change extends --map- option to support regular
expression matching with the full file name.
The original --map- option supports glob based matching
and extension comparison with the file basename.
However, two methods are not enough if the file names are too
generic. See #3287 .
The regular expression passed to --map- must be surround
by % character like
--map-RpmMacros='%(.*/)?macros.d/macros.([^/]+)$%'
If you want to match in a case-insensitive way, append `i' after the second % like
--map-RpmMacros='%(.*/)?macros.d/macros.([^/]+)$%i'
If you want to use % as part of an expression, put \ before % for escaping.
TODO: