ignat980 commented Nov 10, 2025

fix(compiler): robustly strip psql meta commands without breaking SQL

Replace naive line-based removal with a single-pass state machine that correctly distinguishes psql meta-commands from backslashes in SQL code, literals, and comments.

The previous implementation would incorrectly strip any line starting with a backslash, breaking valid SQL containing:

  • Backslashes in string literals (e.g. E'\\n', escape sequences)
  • Meta-command text in comments or documentation
  • Dollar-quoted function bodies with backslash content
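To make the failure mode concrete, here is a minimal sketch of a line-based filter. `naiveStrip` is a hypothetical name written only for illustration and is not the code this PR replaces; it simply drops every line that begins with a backslash, which is the behavior described above.

```go
package main

import "strings"

// naiveStrip is a hypothetical line-based filter, written only to
// illustrate the failure mode; it is not the code the PR replaces.
func naiveStrip(src string) string {
	var kept []string
	for _, line := range strings.Split(src, "\n") {
		if strings.HasPrefix(line, `\`) {
			continue // dropped even when the line sits inside a literal
		}
		kept = append(kept, line)
	}
	return strings.Join(kept, "\n")
}
```

Given a string literal that spans two lines, such as `"SELECT 'a\n\\b';"`, this filter removes the second line and leaves an unterminated literal behind, because it has no notion of being inside a quote.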

Changes:

  • Track parsing state for single quotes, dollar quotes, and block comments
  • Only remove backslash commands at true line starts outside any literal context
  • Properly handle escaped quotes ('') and nested block comments (/* /* */ */)
  • Support dollar-quoted tags with identifiers (e.g. $tag$...$tag$)
  • Add comprehensive test suite covering:
    • All documented psql meta-commands (\connect, \set, \d*, etc.); see the PostgreSQL psql docs
    • String literals with backslashes and nested quotes
    • Dollar-quoted blocks with various tag formats
    • Nested block comments containing meta-command text
    • Edge cases: empty input, whitespace-only, missing newlines

Performance improvements:

  • Pre-allocate output buffer with strings.Builder.Grow()
  • Single pass eliminates redundant string operations
  • Reduces allocations by avoiding intermediate line slices

Testing

  • go test ./internal/compiler
  • 100% test coverage of new function removePsqlMetaCommands()

Addresses gbarr's comment in #4082, which closes #4065.

dosubot bot added the size:L (this PR changes 100-499 lines, ignoring generated files) and 🔧 golang labels Nov 10, 2025
ignat980 (Author) commented

@andrewmbenton please review

gbarr commented Nov 10, 2025

Thanks @ignat980, this looks a much more complete solution than I was expecting.
