This repository was archived by the owner on Aug 4, 2025. It is now read-only.

Commit ad72073

chore: add README.md
Signed-off-by: h3n4l <[email protected]>
1 parent da0ff28 commit ad72073

1 file changed (+217 -0 lines changed)

README.md

Lines changed: 217 additions & 0 deletions
@@ -0,0 +1,217 @@
# Redshift Parser

A comprehensive SQL parser for Amazon Redshift built with ANTLR 4, supporting both PostgreSQL and Redshift-specific syntax.

## Overview

This project is a Go-based SQL parser specifically designed for Amazon Redshift. It originated as a fork of the PostgreSQL parser but has been restructured to accommodate Redshift's unique syntax requirements and incompatibilities with standard PostgreSQL.

## Features

- **Complete SQL Support**: Parses 200+ SQL statement types including DDL, DML, and advanced constructs
- **Redshift-Specific Syntax**: Full support for Redshift extensions like `IDENTITY` columns, `DISTKEY`, `SORTKEY`, and more
- **Engine-Aware Parsing**: Dual-mode parser that can handle both PostgreSQL and Redshift syntax
- **Comprehensive Testing**: 200+ test cases covering real-world SQL scenarios
- **High Performance**: Optimized for production use with parser reuse and efficient error handling

## Installation

```bash
go get github.com/bytebase/redshift-parser
```

## Quick Start

```go
package main

import (
	"fmt"

	"github.com/antlr4-go/antlr/v4"
	// Explicit alias: the import path does not end in the package name.
	parser "github.com/bytebase/redshift-parser"
)

func main() {
	// Parse a Redshift-specific CREATE TABLE statement
	sql := `CREATE TABLE users (
		id INT IDENTITY(1,1),
		name VARCHAR(100),
		email VARCHAR(255) UNIQUE
	) DISTKEY(id) SORTKEY(name);`

	// Create lexer and parser
	input := antlr.NewInputStream(sql)
	lexer := parser.NewRedshiftLexer(input)
	stream := antlr.NewCommonTokenStream(lexer, 0)
	p := parser.NewRedshiftParser(stream)

	// Set engine to Redshift for dialect-specific parsing
	p.Engine = parser.EngineRedshift

	// Parse the SQL; by default ANTLR reports syntax errors on stderr
	tree := p.Root()

	// Use the tree so the program compiles (Go rejects unused variables)
	fmt.Println("Parsed:", tree.GetText())
}
```

## Supported SQL Features

### DDL (Data Definition Language)
- `CREATE TABLE` with Redshift-specific options (DISTKEY, SORTKEY, IDENTITY)
- `ALTER TABLE` with column modifications and constraints
- `CREATE INDEX` with various index types
- `CREATE VIEW` and materialized views
- `CREATE FUNCTION` and stored procedures

### DML (Data Manipulation Language)
- `SELECT` with complex joins, subqueries, and window functions
- `INSERT` with conflict resolution (`ON CONFLICT`)
- `UPDATE` with joins and CTEs
- `DELETE` with complex conditions
- `MERGE` statements

### Advanced Features
- Common Table Expressions (CTEs)
- Window functions and analytics
- JSON operations and path expressions
- Array operations
- Regular expressions
- Full-text search

## Engine Modes

The parser supports two engine modes:

### Redshift Mode
```go
p.Engine = parser.EngineRedshift
```
- Supports Redshift-specific syntax extensions
- Handles `IDENTITY` columns, distribution keys, sort keys
- Supports Redshift built-in functions and data types

### PostgreSQL Mode
```go
p.Engine = parser.EnginePostgreSQL
```
- Standard PostgreSQL syntax compliance
- Useful for compatibility testing and migration scenarios
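
To make the idea of engine-aware parsing concrete, here is a small, self-contained toy. It is not the library's implementation: the `Engine` type, its constants, the `redshiftOnly` table, and `allowed` are all local stand-ins invented for this illustration, and the real parser may gate dialect-specific rules quite differently. The sketch only shows the general pattern of accepting a Redshift-only keyword when the engine is set to Redshift.

```go
package main

import (
	"fmt"
	"strings"
)

// Engine is a local stand-in for the parser's engine setting.
// It is NOT the type exported by the library.
type Engine int

const (
	EnginePostgreSQL Engine = iota
	EngineRedshift
)

// redshiftOnly lists a few table attributes that Redshift accepts and
// PostgreSQL does not. The list is illustrative, not exhaustive.
var redshiftOnly = map[string]bool{
	"DISTKEY":   true,
	"SORTKEY":   true,
	"DISTSTYLE": true,
}

// allowed reports whether keyword may appear under the given engine.
// Keywords outside the redshiftOnly table are treated as shared syntax.
func allowed(e Engine, keyword string) bool {
	if redshiftOnly[strings.ToUpper(keyword)] {
		return e == EngineRedshift
	}
	return true
}

func main() {
	for _, kw := range []string{"DISTKEY", "SORTKEY", "UNIQUE"} {
		fmt.Printf("%-8s redshift=%v postgresql=%v\n",
			kw, allowed(EngineRedshift, kw), allowed(EnginePostgreSQL, kw))
	}
}
```

In a grammar-driven parser, a check of this kind would typically sit behind the engine field so that one grammar can serve both dialects.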

## Development

### Prerequisites
- Go 1.21+
- ANTLR 4.13.2+

### Building from Source

1. **Clone the repository**:
   ```bash
   git clone https://github.com/bytebase/redshift-parser.git
   cd redshift-parser
   ```

2. **Generate parser code**:
   ```bash
   ./build.sh
   ```

3. **Run tests**:
   ```bash
   go test -v
   ```

### Project Structure

```
redshift-parser/
├── RedshiftLexer.g4           # ANTLR lexer grammar
├── RedshiftParser.g4          # ANTLR parser grammar
├── build.sh                   # Code generation script
├── redshift_lexer_base.go     # Base lexer implementation
├── redshift_parser_base.go    # Base parser with engine support
├── keywords.go                # 600+ SQL keywords
├── builtin_function.go        # Built-in function definitions
├── examples/                  # 200+ SQL test files
├── parser_test.go             # Main test suite
├── engine_specific_test.go    # Engine-specific tests
└── CLAUDE.md                  # Development guide
```
141+
142+
## Testing
143+
144+
The project includes comprehensive test coverage:
145+
146+
```bash
147+
# Run all tests
148+
go test -v
149+
150+
# Run specific test
151+
go test -run TestRedshiftParser -v
152+
153+
# Run benchmarks
154+
go test -bench=. -v
155+
156+
# Test specific engine
157+
go test -run TestRedshiftSyntax -v
158+
```
159+
160+
Test files are located in the `examples/` directory and cover:
161+
- Basic SQL operations
162+
- Complex queries with joins and subqueries
163+
- Redshift-specific syntax
164+
- Error handling scenarios
165+
- Performance benchmarks
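
As a rough sketch of how a harness might gather the SQL files under `examples/`, the snippet below walks a directory tree and collects every file with a `.sql` extension. The helper `collectSQLFiles`, the `demo` function, and the `.sql` extension are assumptions made for illustration; the project's actual test suite may discover its inputs differently. The demo builds a temporary directory so the code runs without a checkout of the repository.

```go
package main

import (
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
	"strings"
)

// collectSQLFiles returns the path of every .sql file under root.
// filepath.WalkDir visits entries in lexical order, so the result is sorted.
func collectSQLFiles(root string) ([]string, error) {
	var files []string
	err := filepath.WalkDir(root, func(path string, d fs.DirEntry, err error) error {
		if err != nil {
			return err // e.g. root does not exist or is unreadable
		}
		if !d.IsDir() && strings.EqualFold(filepath.Ext(path), ".sql") {
			files = append(files, path)
		}
		return nil
	})
	if err != nil {
		return nil, err
	}
	return files, nil
}

// demo creates a throwaway directory with two .sql files and one other file,
// then runs collectSQLFiles on it. The file names are hypothetical.
func demo() ([]string, error) {
	dir, err := os.MkdirTemp("", "examples")
	if err != nil {
		return nil, err
	}
	defer os.RemoveAll(dir)

	for _, name := range []string{"create_table.sql", "select.sql", "notes.txt"} {
		if err := os.WriteFile(filepath.Join(dir, name), []byte("SELECT 1;"), 0o644); err != nil {
			return nil, err
		}
	}
	return collectSQLFiles(dir)
}

func main() {
	files, err := demo()
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Printf("found %d SQL files\n", len(files))
}
```

Against a real checkout, calling `collectSQLFiles("examples")` from the repository root would be the analogous step, with each returned path then fed to the parser.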
166+
167+
## Grammar Files
168+
169+
The parser is built using ANTLR 4 grammars:
170+
171+
- **RedshiftLexer.g4**: Tokenization rules for SQL keywords, operators, and literals
172+
- **RedshiftParser.g4**: Grammar rules for SQL statement parsing
173+
174+
After modifying grammar files, regenerate the Go code:
175+
176+
```bash
177+
./build.sh
178+
```

## Contributing

1. Fork the repository
2. Create a feature branch
3. Add tests for your changes
4. Ensure all tests pass
5. Update documentation as needed
6. Submit a pull request

### Development Guidelines

- After changing a grammar file, run `./build.sh` before running the tests
- Add test cases for new SQL syntax support
- Follow existing code patterns and conventions
- Use the Amazon Redshift documentation as the syntax reference
- Test against both PostgreSQL and Redshift engines
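
Forgetting to regenerate the parser after a grammar change is easy, so a contributor might want a quick staleness check. The sketch below compares file modification times and reports whether a grammar file is newer than a generated file. The helper `isStale`, the `demo` function, and the generated file name `redshift_parser.go` are hypothetical; the README does not say what `./build.sh` emits, so adjust the paths to the real output.

```go
package main

import (
	"errors"
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
	"time"
)

// isStale reports whether grammar was modified after generated,
// or whether generated does not exist yet.
func isStale(grammar, generated string) (bool, error) {
	g, err := os.Stat(grammar)
	if err != nil {
		return false, err
	}
	out, err := os.Stat(generated)
	if errors.Is(err, fs.ErrNotExist) {
		return true, nil // nothing has been generated yet
	}
	if err != nil {
		return false, err
	}
	return g.ModTime().After(out.ModTime()), nil
}

// demo simulates a grammar edit: the generated file is backdated by an hour,
// so the freshly written grammar file is newer.
func demo() (bool, error) {
	dir, err := os.MkdirTemp("", "grammar")
	if err != nil {
		return false, err
	}
	defer os.RemoveAll(dir)

	grammar := filepath.Join(dir, "RedshiftParser.g4")
	generated := filepath.Join(dir, "redshift_parser.go") // hypothetical output name
	for _, p := range []string{grammar, generated} {
		if err := os.WriteFile(p, []byte("// placeholder"), 0o644); err != nil {
			return false, err
		}
	}
	old := time.Now().Add(-time.Hour)
	if err := os.Chtimes(generated, old, old); err != nil {
		return false, err
	}
	return isStale(grammar, generated)
}

func main() {
	stale, err := demo()
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	if stale {
		fmt.Println("grammar is newer than generated code: run ./build.sh")
	}
}
```

Modification times are only a heuristic (a fresh clone or a `touch` can mislead them), so rerunning `./build.sh` remains the reliable option.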

## License

This project is licensed under the MIT License. See the grammar files for additional license information from the original PostgreSQL grammar contributors.

## Acknowledgments

- Based on the PostgreSQL grammar from [Tunnel Vision Labs](https://github.com/tunnelvisionlabs/antlr4-postgresql)
- Forked from [Bytebase PostgreSQL Parser](https://github.com/bytebase/postgresql-parser)
- Built with [ANTLR 4](https://github.com/antlr/antlr4)

## Related Projects

- [Bytebase](https://github.com/bytebase/bytebase) - Database DevOps platform
- [PostgreSQL Parser](https://github.com/bytebase/postgresql-parser) - Original PostgreSQL parser
- [ANTLR 4](https://github.com/antlr/antlr4) - Parser generator toolkit

## Support

- [GitHub Issues](https://github.com/bytebase/redshift-parser/issues) - Bug reports and feature requests
- [Amazon Redshift Documentation](https://docs.aws.amazon.com/redshift/latest/dg/c_SQL_commands.html) - SQL syntax reference
- [ANTLR Documentation](https://github.com/antlr/antlr4/blob/master/doc/index.md) - Grammar development guide
