|
| 1 | +# Redshift Parser Development Guide |
| 2 | + |
| 3 | +## Project Overview |
| 4 | + |
| 5 | +This repository contains an Amazon Redshift SQL parser built with ANTLR 4, forked from github.com/bytebase/postgresql-parser. Because Redshift syntax is not fully compatible with PostgreSQL, the fork lives in a separate repository so that it can support Redshift-specific syntax and features. |
| 6 | + |
| 7 | +## Architecture |
| 8 | + |
| 9 | +### Core Components |
| 10 | + |
| 11 | +1. **ANTLR Grammar Files**: |
| 12 | + - `RedshiftLexer.g4` - Tokenization rules for Redshift SQL |
| 13 | + - `RedshiftParser.g4` - Parser grammar with 200+ statement types |
| 14 | + - Generated Go files: `redshift_parser.go`, `redshift_lexer.go`, etc. |
| 15 | + |
| 16 | +2. **Base Classes**: |
| 17 | + - `redshift_parser_base.go` - Engine-aware parser with PostgreSQL/Redshift support |
| 18 | + - `redshift_lexer_base.go` - Base lexer implementation |
| 19 | + - `string_stack.go` - Utility for string stack operations |
| 20 | + |
| 21 | +3. **Supporting Files**: |
| 22 | + - `keywords.go` - 600+ PostgreSQL keywords with reserved status |
| 23 | + - `builtin_function.go` - Built-in function definitions |
| 24 | + - `build.sh` - ANTLR code generation script |
| 25 | + |
| 26 | +### Engine Support |
| 27 | + |
| 28 | +The parser supports multiple database engines: |
| 29 | +- `EnginePostgreSQL` - Standard PostgreSQL syntax |
| 30 | +- `EngineRedshift` - Amazon Redshift-specific syntax extensions |
| 31 | + |
| 32 | +## Development Guidelines |
| 33 | + |
| 34 | +### Code Conventions |
| 35 | + |
| 36 | +1. **Follow existing patterns**: Always examine existing code before making changes |
| 37 | +2. **Token/Rule/Name Convention**: Maintain consistency with current ANTLR grammar naming |
| 38 | +3. **Engine-specific features**: Use engine detection for Redshift-specific syntax |
| 39 | +4. **Error handling**: Implement proper error listeners and recovery mechanisms |
| 40 | + |
| 41 | +### Testing Requirements |
| 42 | + |
| 43 | +**CRITICAL**: Every change must include a related test case. |
| 44 | + |
| 45 | +1. **Test Structure**: |
| 46 | + - Add SQL test files to the `examples/` directory |
| 47 | + - Use Go-based tests in `parser_test.go` and `engine_specific_test.go` |
| 48 | + - Tests automatically parse all SQL files in `examples/` |
| 49 | + |
| 50 | +2. **Test Content Sources**: |
| 51 | + - Reference https://docs.aws.amazon.com/redshift/latest/dg/c_SQL_commands.html |
| 52 | + - Collect syntax examples from the AWS Redshift documentation |
| 53 | + - Use real-world SQL examples when possible |
| 54 | + |
| 55 | +3. **Test Categories**: |
| 56 | + - DDL: CREATE, ALTER, DROP statements |
| 57 | + - DML: SELECT, INSERT, UPDATE, DELETE |
| 58 | + - Redshift-specific: IDENTITY columns, DISTKEY, SORTKEY, etc. |
| 59 | + - Advanced: Window functions, CTEs, JSON operations |
| 60 | + |
| 61 | +### Adding New Features |
| 62 | + |
| 63 | +1. **Grammar Changes**: |
| 64 | + ```bash |
| 65 | + # Edit RedshiftLexer.g4 or RedshiftParser.g4 |
| 66 | + # Run build script to regenerate Go code |
| 67 | + ./build.sh |
| 68 | + ``` |
| 69 | + |
| 70 | +2. **Engine-Specific Logic**: |
| 71 | + - Use the `GetEngine()` method to detect whether the parser is running in Redshift or PostgreSQL mode |
| 72 | + - Implement conditional parsing for dialect-specific features |
| 73 | + - See `engine_specific_test.go` for examples |
| 74 | + |
| 75 | +3. **Testing Process**: |
| 76 | + - Create SQL test files in `examples/` |
| 77 | + - Run tests: `go test -v` |
| 78 | + - Verify both parsing success and error handling |
| 79 | + |
| 80 | +### Common Tasks |
| 81 | + |
| 82 | +#### Adding Redshift-Specific Syntax |
| 83 | + |
| 84 | +1. Identify the syntax difference from PostgreSQL |
| 85 | +2. Update the appropriate grammar file (lexer or parser) |
| 86 | +3. Add engine-specific logic if needed |
| 87 | +4. Create test cases with AWS documentation examples |
| 88 | +5. Verify tests pass for both engines |
| 89 | + |
| 90 | +#### Adding New Keywords |
| 91 | + |
| 92 | +1. Add to `keywords.go` with appropriate reserved status |
| 93 | +2. Update lexer grammar if needed |
| 94 | +3. Test keyword recognition in various contexts |
| 95 | + |
| 96 | +#### Adding Built-in Functions |
| 97 | + |
| 98 | +1. Add to `builtin_function.go` in appropriate category |
| 99 | +2. Update parser rules if function has special syntax |
| 100 | +3. Test function parsing and recognition |
| 101 | + |
| 102 | +## Build and Test Commands |
| 103 | + |
| 104 | +**IMPORTANT**: Always run `./build.sh` before running tests to generate the latest Go code from ANTLR grammars. |
| 105 | + |
| 106 | +```bash |
| 107 | +# Generate parser code from ANTLR grammars (REQUIRED before testing) |
| 108 | +./build.sh |
| 109 | + |
| 110 | +# Run all tests |
| 111 | +go test -v |
| 112 | + |
| 113 | +# Run specific test |
| 114 | +go test -run TestParser -v |
| 115 | + |
| 116 | +# Run benchmarks |
| 117 | +go test -bench=. -v |
| 118 | +``` |
| 119 | + |
| 120 | +## References |
| 121 | + |
| 122 | +- [AWS Redshift SQL Commands](https://docs.aws.amazon.com/redshift/latest/dg/c_SQL_commands.html) |
| 123 | +- [ANTLR 4 Documentation](https://github.com/antlr/antlr4/blob/master/doc/index.md) |
| 124 | +- [PostgreSQL Grammar Reference](https://github.com/tunnelvisionlabs/antlr4-postgresql) |
| 125 | + |
| 126 | +## Contributing |
| 127 | + |
| 128 | +1. Always add test cases for new features |
| 129 | +2. Follow existing code patterns and conventions |
| 130 | +3. Test against both PostgreSQL and Redshift engines |
| 131 | +4. Use AWS documentation for accurate syntax examples |
| 132 | +5. Ensure all tests pass before submitting changes |