# Redshift Parser

A comprehensive SQL parser for Amazon Redshift, built with ANTLR 4, that supports both PostgreSQL and Redshift-specific syntax.

## Overview

This project is a Go-based SQL parser designed specifically for Amazon Redshift. It began as a fork of the Bytebase PostgreSQL parser and has since been restructured to handle Redshift's own syntax and the places where Redshift diverges from standard PostgreSQL.

## Features

- **Broad SQL Support**: Parses 200+ SQL statement types, including DDL, DML, and advanced constructs
- **Redshift-Specific Syntax**: Supports Redshift extensions such as `IDENTITY` columns, `DISTKEY`, and `SORTKEY`
- **Engine-Aware Parsing**: A dual-mode parser that accepts either PostgreSQL or Redshift syntax, selected per parser instance
- **Comprehensive Testing**: 200+ test cases covering real-world SQL scenarios
- **High Performance**: Designed for production use, with parser reuse and efficient error handling

## Installation

```bash
go get github.com/bytebase/redshift-parser
```

## Quick Start

```go
package main

import (
	"fmt"
	"os"

	"github.com/antlr4-go/antlr/v4"
	parser "github.com/bytebase/redshift-parser"
)

// syntaxErrors collects parser errors instead of printing them to the console.
type syntaxErrors struct {
	*antlr.DefaultErrorListener
	messages []string
}

func (s *syntaxErrors) SyntaxError(_ antlr.Recognizer, _ interface{}, line, column int, msg string, _ antlr.RecognitionException) {
	s.messages = append(s.messages, fmt.Sprintf("line %d:%d %s", line, column, msg))
}

func main() {
	// Parse a Redshift-specific CREATE TABLE statement
	sql := `CREATE TABLE users (
		id INT IDENTITY(1,1),
		name VARCHAR(100),
		email VARCHAR(255) UNIQUE
	) DISTKEY(id) SORTKEY(name);`

	// Create lexer and parser
	input := antlr.NewInputStream(sql)
	lexer := parser.NewRedshiftLexer(input)
	stream := antlr.NewCommonTokenStream(lexer, antlr.TokenDefaultChannel)
	p := parser.NewRedshiftParser(stream)

	// Replace the default console listener so errors can be checked after parsing
	errs := &syntaxErrors{DefaultErrorListener: antlr.NewDefaultErrorListener()}
	p.RemoveErrorListeners()
	p.AddErrorListener(errs)

	// Set engine to Redshift for dialect-specific parsing
	p.Engine = parser.EngineRedshift

	// Parse the SQL, starting from the grammar's root rule
	tree := p.Root()
	if len(errs.messages) > 0 {
		fmt.Fprintln(os.Stderr, "parse failed:", errs.messages)
		os.Exit(1)
	}

	fmt.Println("Successfully parsed Redshift SQL!")
	// Print the parse tree in LISP-style notation
	fmt.Println(tree.ToStringTree(nil, p))
}
```

## Supported SQL Features

Some of these constructs are inherited from the PostgreSQL grammar this parser was forked from and are not valid in Amazon Redshift itself; for example, Redshift does not support `CREATE INDEX` or `INSERT ... ON CONFLICT`.

### DDL (Data Definition Language)
- `CREATE TABLE` with Redshift-specific options (`DISTKEY`, `SORTKEY`, `IDENTITY`)
- `ALTER TABLE` with column modifications and constraints
- `CREATE INDEX` with various index types
- `CREATE VIEW` and `CREATE MATERIALIZED VIEW`
- `CREATE FUNCTION` and stored procedures

### DML (Data Manipulation Language)
- `SELECT` with complex joins, subqueries, and window functions
- `INSERT` with conflict resolution (`ON CONFLICT`)
- `UPDATE` with joins and CTEs
- `DELETE` with complex conditions
- `MERGE` statements

### Advanced Features
- Common Table Expressions (CTEs)
- Window functions and analytics
- JSON operations and path expressions
- Array operations
- Regular expressions
- Full-text search

## Engine Modes

The parser supports two engine modes, selected by setting the parser's `Engine` field before calling `Root()`:

### Redshift Mode
```go
p.Engine = parser.EngineRedshift
```
- Supports Redshift-specific syntax extensions
- Handles `IDENTITY` columns, distribution keys, and sort keys
- Supports Redshift built-in functions and data types

### PostgreSQL Mode
```go
p.Engine = parser.EnginePostgreSQL
```
- Accepts standard PostgreSQL syntax
- Useful for compatibility testing and migration scenarios

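How the `Engine` field changes what the grammar accepts is an internal detail of the generated parser. One common ANTLR technique for this kind of dual-mode parsing is a semantic predicate: a grammar alternative guarded by a check on a parser field, so that dialect-only syntax matches only when the field allows it. The stdlib-only Go sketch below models that idea outside ANTLR. The `Engine` type, `clauseAllowed`, and the keyword set are hypothetical illustrations, not this library's API or its actual gating rules.

```go
package main

import (
	"fmt"
	"strings"
)

// Engine is a hypothetical stand-in for the parser's engine setting;
// it is not the library's own type.
type Engine int

const (
	EnginePostgreSQL Engine = iota
	EngineRedshift
)

func (e Engine) String() string {
	if e == EngineRedshift {
		return "Redshift"
	}
	return "PostgreSQL"
}

// redshiftOnlyClauses is an illustrative (hypothetical) subset of keywords
// that Redshift accepts but standard PostgreSQL does not.
var redshiftOnlyClauses = map[string]bool{
	"DISTKEY":   true,
	"DISTSTYLE": true,
	"SORTKEY":   true,
}

// clauseAllowed plays the role of a semantic predicate: a Redshift-only
// clause is accepted only in Redshift mode, and every other clause is
// accepted in both modes. Keywords are matched case-insensitively.
func clauseAllowed(e Engine, keyword string) bool {
	if redshiftOnlyClauses[strings.ToUpper(keyword)] {
		return e == EngineRedshift
	}
	return true
}

func main() {
	for _, e := range []Engine{EnginePostgreSQL, EngineRedshift} {
		fmt.Printf("%s: DISTKEY=%t UNIQUE=%t\n",
			e, clauseAllowed(e, "distkey"), clauseAllowed(e, "UNIQUE"))
	}
}
```

With this design both modes share every rule except the guarded alternatives, so PostgreSQL mode rejects `DISTKEY` while `UNIQUE` is accepted in either mode.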
## Development

### Prerequisites
- Go 1.21+
- ANTLR 4.13.2+

### Building from Source

1. **Clone the repository**:
   ```bash
   git clone https://github.com/bytebase/redshift-parser.git
   cd redshift-parser
   ```

2. **Generate the parser code**:
   ```bash
   ./build.sh
   ```

3. **Run the tests**:
   ```bash
   go test -v
   ```

### Project Structure

```
redshift-parser/
├── RedshiftLexer.g4          # ANTLR lexer grammar
├── RedshiftParser.g4         # ANTLR parser grammar
├── build.sh                  # Code generation script
├── redshift_lexer_base.go    # Base lexer implementation
├── redshift_parser_base.go   # Base parser with engine support
├── keywords.go               # 600+ SQL keywords
├── builtin_function.go       # Built-in function definitions
├── examples/                 # 200+ SQL test files
├── parser_test.go            # Main test suite
├── engine_specific_test.go   # Engine-specific tests
└── CLAUDE.md                 # Development guide
```

## Testing

Run the test suite with the standard Go tooling:

```bash
# Run all tests
go test -v

# Run a specific test
go test -run TestRedshiftParser -v

# Run benchmarks
go test -bench=. -v

# Run engine-specific syntax tests
go test -run TestRedshiftSyntax -v
```

Test files are located in the `examples/` directory and cover:
- Basic SQL operations
- Complex queries with joins and subqueries
- Redshift-specific syntax
- Error handling scenarios
- Performance benchmarks

## Grammar Files

The parser is built from two ANTLR 4 grammars:

- **RedshiftLexer.g4**: Tokenization rules for SQL keywords, operators, and literals
- **RedshiftParser.g4**: Grammar rules for SQL statement parsing

After modifying either grammar file, regenerate the Go code:

```bash
./build.sh
```

## Contributing

1. Fork the repository
2. Create a feature branch
3. Add tests for your changes
4. Ensure all tests pass
5. Update documentation as needed
6. Submit a pull request

### Development Guidelines

- After changing a grammar file, always run `./build.sh` before testing
- Add test cases for any new SQL syntax you support
- Follow existing code patterns and conventions
- Use the AWS Redshift documentation as the syntax reference
- Test against both the PostgreSQL and Redshift engines

## License

This project is licensed under the MIT License. See the grammar files for additional license information from the original PostgreSQL grammar contributors.

## Acknowledgments

- Based on the PostgreSQL grammar from [Tunnel Vision Labs](https://github.com/tunnelvisionlabs/antlr4-postgresql)
- Forked from [Bytebase PostgreSQL Parser](https://github.com/bytebase/postgresql-parser)
- Built with [ANTLR 4](https://github.com/antlr/antlr4)

## Related Projects

- [Bytebase](https://github.com/bytebase/bytebase) - Database DevOps platform
- [PostgreSQL Parser](https://github.com/bytebase/postgresql-parser) - Original PostgreSQL parser
- [ANTLR 4](https://github.com/antlr/antlr4) - Parser generator toolkit

## Support

- [GitHub Issues](https://github.com/bytebase/redshift-parser/issues) - Bug reports and feature requests
- [AWS Redshift Documentation](https://docs.aws.amazon.com/redshift/latest/dg/c_SQL_commands.html) - SQL syntax reference
- [ANTLR Documentation](https://github.com/antlr/antlr4/blob/master/doc/index.md) - Grammar development guide