Fix: Convert variable-length lookbehinds to PCRE-compatible form #128

Open
Chocapikk wants to merge 1 commit into jeff-hykin:master from Chocapikk:pcre-compat

Chocapikk commented Feb 8, 2026

Summary

Fix PCRE-incompatible lookbehind and two broken include references.

Changes

PCRE compatibility

The normal_statement.begin regex wraps lookbehind alternatives in a non-capturing group, which prevents PCRE from auto-splitting:

# Before (PCRE cannot auto-split through the (?:...) wrapper)
(?<=(?:^|;|\||&|!|\(|\{|\`))

# After (PCRE auto-splits top-level alternatives)
(?<=^|;|\||&|!|\(|\{|\`)

Broken includes

Two include references point to non-existent repository keys:

  • #arithmetic_dollar -> #arithmetic_no_dollar (fixes arithmetic scoping in interpolation)
  • #line_continuation_character -> #line_continuation (fixes backslash continuation in comments)
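
Broken references like these can be caught mechanically. The sketch below assumes the generated grammar is a TextMate-style JSON object with a single top-level `repository`, and ignores nested repositories and cross-grammar includes. `find_broken_includes` and the toy grammar are hypothetical illustrations, not code from this repository.

```python
def find_broken_includes(grammar: dict) -> list[str]:
    """Return every '#name' include whose name is missing from the top-level repository."""
    repository = grammar.get("repository", {})
    broken = set()

    def walk(node):
        if isinstance(node, dict):
            include = node.get("include")
            if (isinstance(include, str) and include.startswith("#")
                    and include[1:] not in repository):
                broken.add(include)
            for value in node.values():
                walk(value)
        elif isinstance(node, list):
            for item in node:
                walk(item)

    walk(grammar)
    return sorted(broken)


# Toy grammar (hypothetical) reproducing the two broken references from this PR
toy_grammar = {
    "patterns": [{"include": "#comment"}],
    "repository": {
        "arithmetic_no_dollar": {"match": "x"},
        "line_continuation": {"match": "y"},
        "interpolation": {"patterns": [{"include": "#arithmetic_dollar"}]},
        "comment": {"patterns": [{"include": "#line_continuation_character"}]},
    },
}
print(find_broken_includes(toy_grammar))
```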

Why

There is a PR on github-linguist/linguist to replace the archived atom/language-shellscript grammar with better-shell-syntax for GitHub.com syntax highlighting. These fixes are required for that integration.

RedCMD commented Feb 8, 2026

are you sure PCRE doesn't support ^ in a lookbehind?
(?<=^|;|&|\s)
AFAIK PCRE and Onigmo both support top level variable-length-lookbehinds
as they automatically split it into multiple constant-length-lookbehinds

out of all of them
I only see one regex that wasn't supported by PCRE

with the actual change being very small

(the rest of the warnings are just broken includes)

even https://regex101.com/r/6uHp7M/1 agrees
https://regex101.com/r/Rw2eEt/1

@Chocapikk (Author)

You're right. I over-fixed this - PCRE handles top-level variable-length lookbehinds by auto-splitting them.

The only actual issue is in normal_statement.begin, where the lookbehind wraps its alternatives in a non-capturing group, (?<=(?:^|;|\||&|!|\(|\{|\`)), which prevents PCRE from auto-splitting. Removing the (?:...) wrapper fixes it.

I've force-pushed with just that single change. Thanks for catching this.

- Unwrap non-capturing group in normal_statement lookbehind
  for PCRE compatibility: (?<=(?:^|...)) -> (?<=^|...)

- Fix broken include references:
  #arithmetic_dollar -> #arithmetic_no_dollar
  #line_continuation_character -> #line_continuation