CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

Project Overview

cel2sql converts CEL (Common Expression Language) expressions to SQL conditions. It supports multiple SQL dialects: PostgreSQL (default), MySQL, SQLite, DuckDB, and BigQuery.

Module: github.com/spandigital/cel2sql/v3
Go Version: 1.24+
Current Version: v3.0.0

Common Development Commands

Build and Test

make build              # Build the project
make test               # Run tests with race detection and coverage
make test-coverage      # Generate HTML coverage report
make bench              # Run performance benchmarks
make bench-compare      # Run benchmarks and save for comparison

Code Quality

make lint               # Run golangci-lint (required before commits)
make fmt                # Format code with go fmt and goimports
make vuln-check         # Run security vulnerability checks
make ci                 # Run all checks (fmt, lint, test, vuln-check)

Dependencies

make deps               # Download and verify dependencies
make update-deps        # Update all dependencies
make install-tools      # Install development tools (golangci-lint, goimports, govulncheck)

Run a single test

go test -v -run TestFunctionName ./...

Core Architecture

Main Components

  1. cel2sql.go - Main conversion engine that transforms the CEL AST into SQL condition strings (PostgreSQL dialect by default)

    • Entry point: Convert(ast *cel.Ast) (string, error)
    • Uses visitor pattern to traverse CEL expression tree
    • Handles operators, functions, comprehensions, and type conversions
  2. comprehensions.go - CEL comprehensions support (all, exists, exists_one, filter, map)

    • Converts to PostgreSQL UNNEST() patterns
    • Supports nested comprehensions
    • Works with both schema-based arrays and JSON arrays
  3. json.go - JSON/JSONB field handling

    • Detects JSON columns and applies PostgreSQL path operators (->>, ?)
    • Supports nested JSON field access
    • Handles has() macro for JSON field existence checks
  4. operators.go - Operator conversion logic

    • Maps CEL operators to PostgreSQL equivalents
    • Handles special cases (IS NULL, boolean comparisons, etc.)
  5. timestamps.go - Timestamp and duration handling

    • Converts CEL timestamp operations to PostgreSQL TIMESTAMP operations
    • Handles INTERVAL conversions
  6. pg/provider.go - PostgreSQL type provider for CEL type system

    • Maps PostgreSQL types to CEL types
    • Supports dynamic schema loading from live databases via LoadTableSchema
    • Handles composite types and arrays
  7. mysql/provider.go - MySQL type provider

    • Maps MySQL types to CEL types
    • LoadTableSchema uses information_schema.columns with table_schema = DATABASE()
    • Accepts *sql.DB (caller owns connection)
  8. sqlite/provider.go - SQLite type provider

    • Maps SQLite type affinity to CEL types
    • LoadTableSchema uses PRAGMA table_info with table name validation
    • Accepts *sql.DB (caller owns connection)
  9. duckdb/provider.go - DuckDB type provider

    • Maps DuckDB types to CEL types, detects array types from [] suffix
    • LoadTableSchema uses information_schema.columns
    • Accepts *sql.DB (works with any DuckDB driver)
  10. bigquery/provider.go - BigQuery type provider

    • Maps BigQuery types to CEL types
    • LoadTableSchema uses BigQuery client API (Table.Metadata)
    • Handles nested RECORD types recursively
    • Accepts *bigquery.Client + dataset ID
  11. sqltypes/types.go - Custom SQL type definitions for CEL (DATE, TIME, DATETIME, INTERVAL)

Type System Integration

The library uses CEL's protobuf-based type system (exprpb.Type, exprpb.Expr). PostgreSQL types are mapped to CEL types through pg.TypeProvider:

Text and String Types:

  • text, varchar, char, character varying, character → decls.String
  • xml → decls.String
  • inet, cidr → decls.String (network addresses)
  • macaddr, macaddr8 → decls.String (MAC addresses)
  • tsvector, tsquery → decls.String (full-text search)

Numeric Types:

  • bigint, integer, int, int4, int8, smallint, int2 → decls.Int
  • double precision, real, float4, float8, numeric, decimal → decls.Double
  • money → decls.Double

Boolean and Binary:

  • boolean, bool → decls.Bool
  • bytea → decls.Bytes
  • uuid → decls.Bytes

Temporal Types:

  • timestamp, timestamptz, timestamp with time zone, timestamp without time zone → decls.Timestamp
  • date → sqltypes.Date
  • time, timetz, time with time zone, time without time zone → sqltypes.Time

Structured Types:

  • json, jsonb → decls.Dyn (with automatic JSON path support)
  • Arrays: Set Repeated: true in schema
  • Multi-dimensional arrays: Set Repeated: true and Dimensions: N in schema
  • Composite types: Use nested Schema fields

Unsupported Types: Unknown PostgreSQL types (e.g., point, polygon, box, custom enums) will cause FindStructFieldType() to return found=false. This prevents silent type mismatches. Add explicit support for custom types or use composite type definitions.
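
As a sketch of the composite-type workaround (the location field and its x/y sub-fields are illustrative, not part of the library), a point-like column could be modeled as a composite of two doubles:

schema := pg.NewSchema([]pg.FieldSchema{
    // Model a point-like column as a composite of two doubles instead of
    // relying on an unsupported native type such as "point".
    {Name: "location", Type: "composite", Schema: []pg.FieldSchema{
        {Name: "x", Type: "double precision"},
        {Name: "y", Type: "double precision"},
    }},
})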

Multi-Dimensional Array Support

cel2sql supports PostgreSQL multi-dimensional arrays (1D, 2D, 3D, 4D+) with automatic dimension detection.

Dimension Detection:

  • Automatically detects dimensions from PostgreSQL type strings via detectArrayDimensions()
  • Supports bracket notation: integer[] (1D), integer[][] (2D), integer[][][] (3D)
  • Supports underscore notation: _int4 (1D), _int4[] (2D), _int4[][] (3D)
  • Correctly combines both notations (e.g., _int4[] = 2D: 1 from underscore + 1 from bracket)

Schema Definition:

schema := pg.NewSchema([]pg.FieldSchema{
    {Name: "tags", Type: "text", Repeated: true, Dimensions: 1},      // 1D: text[]
    {Name: "matrix", Type: "integer", Repeated: true, Dimensions: 2},  // 2D: integer[][]
    {Name: "cube", Type: "float", Repeated: true, Dimensions: 3},      // 3D: float[][][]
})

SQL Generation:

  • size() function automatically uses correct dimension in ARRAY_LENGTH(field, dimension)
  • Example: size(matrix) where matrix is integer[][] → COALESCE(ARRAY_LENGTH(matrix, 2), 0)
  • Dimension lookup uses CEL typeMap for accurate type resolution in getArrayDimension()
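
A minimal sketch of the size() conversion, assuming the matrix field from the schema above is bound under a table variable (exact output formatting may differ):

ast, issues := env.Compile(`size(table.matrix) > 0`)
if issues != nil && issues.Err() != nil {
    return issues.Err()
}
sql, err := cel2sql.Convert(ast)
// Expected shape: COALESCE(ARRAY_LENGTH(table.matrix, 2), 0) > 0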

Backward Compatibility:

  • Arrays without schema information default to dimension 1
  • Explicit Dimensions: 0 defaults to dimension 1
  • Existing code continues to work without changes

Implementation Details:

  • pg/provider.go: Contains detectArrayDimensions() and dimension detection logic
  • cel2sql.go: Contains getArrayDimension() that uses CEL typeMap for lookup
  • LoadTableSchema(): Queries udt_name column to detect dimensions from database

JSON/JSONB Support

CEL field access on JSON/JSONB columns automatically converts to PostgreSQL JSON path operations:

  • user.preferences.theme → user.preferences->>'theme'
  • has(user.preferences.theme) → user.preferences ? 'theme'
  • Works with nested paths: user.profile.settings.key → user.profile->'settings'->>'key'

Detection happens in shouldUseJSONPath() and visitSelect() functions in json.go.
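
A minimal sketch, assuming a jsonb column preferences is registered in the schema and bound under a table variable:

ast, issues := env.Compile(`table.preferences.theme == "dark"`)
if issues != nil && issues.Err() != nil {
    return issues.Err()
}
sql, err := cel2sql.Convert(ast)
// Expected shape: table.preferences->>'theme' = 'dark'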

CEL Comprehensions Support

Full support for CEL comprehensions converted to PostgreSQL UNNEST patterns:

  • list.all(x, x > 0) → NOT EXISTS (SELECT 1 FROM UNNEST(list) AS x WHERE NOT (x > 0))
  • list.exists(x, condition) → EXISTS (SELECT 1 FROM UNNEST(list) AS x WHERE condition)
  • list.filter(x, condition) → ARRAY(SELECT x FROM UNNEST(list) AS x WHERE condition)
  • list.map(x, transform) → ARRAY(SELECT transform FROM UNNEST(list) AS x)
  • Supports nested comprehensions and both schema arrays and JSON arrays

Pattern recognition and conversion logic is in comprehensions.go.
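
A minimal sketch, assuming a text[] column tags in the schema:

ast, issues := env.Compile(`table.tags.exists(t, t == "urgent")`)
if issues != nil && issues.Err() != nil {
    return issues.Err()
}
sql, err := cel2sql.Convert(ast)
// Expected shape: EXISTS (SELECT 1 FROM UNNEST(table.tags) AS t WHERE t = 'urgent')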

Regex Pattern Matching (v2.8.0+)

Supports CEL matches() function with automatic RE2 to POSIX regex conversion:

  • field.matches(r"pattern") → field ~ 'pattern'
  • field.matches(r"(?i)pattern") → field ~* 'pattern' (case-insensitive, (?i) stripped from pattern)
  • field.matches(r"(?:abc)") → field ~ '(abc)' (non-capturing groups converted to regular groups)

Automatic Conversions:

  • Case-insensitive flag (?i) → Uses ~* operator, flag stripped from pattern
  • Non-capturing groups (?:...) → Converted to regular groups (...)
  • Character classes: \d → [[:digit:]], \w → [[:alnum:]_], \s → [[:space:]]
  • Word boundaries: \b → \y

Unsupported RE2 Features (will return errors):

  • Lookahead assertions: (?=...), (?!...)
  • Lookbehind assertions: (?<=...), (?<!...)
  • Named capture groups: (?P<name>...)
  • Inline flags other than (?i): (?m), (?s), (?-i), etc.

These validations prevent PostgreSQL syntax errors and ensure predictable behavior.
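
A minimal sketch of the case-insensitive path, assuming a text column name in the schema:

ast, issues := env.Compile(`table.name.matches(r"(?i)^admin")`)
if issues != nil && issues.Err() != nil {
    return issues.Err()
}
sql, err := cel2sql.Convert(ast)
// Expected shape: table.name ~* '^admin'  ((?i) stripped, ~* operator selected)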

Code Quality Requirements

Pre-commit Checklist

  1. Run make fmt - Format code
  2. Run make lint - Must pass without errors (golangci-lint)
  3. Run make test - All tests must pass
  4. Common lint fixes:
    • Use errors.New() instead of fmt.Errorf() for static error messages
    • Rename unused parameters to _
    • Add comments for exported constants and types
    • Include package comments for main packages
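
A minimal illustration of the first lint fix (the error and variable names here are invented for the example):

// Static message: errors.New, not fmt.Errorf.
var errUnknownTable = errors.New("unknown table")

// Dynamic message: fmt.Errorf with wrapping is appropriate.
return fmt.Errorf("load schema for %q: %w", tableName, errUnknownTable)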

Testing Guidelines

  • Use the dialect-specific schema/provider for each dialect's tests
  • Use pg.NewTypeProvider() for PostgreSQL, mysql.NewTypeProvider() for MySQL, etc.
  • Include tests for nested types, arrays, and JSON fields
  • Verify SQL output matches the target dialect's syntax
  • Use testcontainers for integration tests (PostgreSQL, MySQL, BigQuery)
  • Use in-memory databases for SQLite integration tests (no Docker needed)
  • DuckDB integration tests require CGO; use unit tests for type mapping validation
  • Provider tests live in {dialect}/provider_test.go

Performance Benchmarks

cel2sql includes comprehensive benchmarks to track performance and detect regressions. Benchmarks are automatically run as part of the CI/CD pipeline.

Running Benchmarks Locally

# Run all benchmarks
make bench

# Run benchmarks and save output for comparison
make bench-compare

# Run specific benchmark
go test -bench=BenchmarkConvertSimple -benchmem ./...

# Run benchmarks with custom duration
go test -bench=. -benchmem -benchtime=5s ./...

Benchmark Categories

The benchmark suite covers all major conversion features:

Simple Operations (BenchmarkConvertSimple)

  • Field comparisons (equality, greater than, less than)
  • String operations (equality checks)
  • Boolean checks

Operators (BenchmarkConvertOperators)

  • Logical operators (AND, OR)
  • Arithmetic operators (+, -, *, /, %)
  • String concatenation
  • Complex nested expressions

Comprehensions (BenchmarkConvertComprehensions)

  • all() - Universal quantification
  • exists() - Existential quantification
  • exists_one() - Unique existence
  • filter() - Array filtering
  • map() - Array transformation

JSON/JSONB (BenchmarkConvertJSONPath)

  • Simple and nested JSON field access
  • JSON field existence checks (has())
  • JSON field comparisons
  • Complex JSON expressions

Regex (BenchmarkConvertRegex)

  • Simple patterns
  • Case-insensitive patterns
  • Character classes (\d, \w, \s)
  • Word boundaries (\b)

Complex Expressions (BenchmarkConvertDeeplyNested, BenchmarkConvertLargeExpression)

  • Deeply nested AND/OR chains
  • Nested parentheses
  • Ternary operators
  • Large expressions with many conditions

Timestamps (BenchmarkConvertTimestamps)

  • Timestamp comparisons
  • Date/time function calls
  • DateTime operations

String Operations (BenchmarkConvertStringOperations)

  • startsWith, endsWith, contains
  • String concatenation
  • Multiple string operations combined

Query Analysis (BenchmarkAnalyzeQuery)

  • Index recommendation generation
  • Pattern detection for optimization

Options (BenchmarkConvertWithOptions)

  • Various conversion option combinations
  • Schema usage overhead
  • Max depth and output length limits

Benchmark Output

Benchmarks report:

  • Iterations: Number of times the operation was executed
  • ns/op: Nanoseconds per operation (lower is better)
  • B/op: Bytes allocated per operation (lower is better)
  • allocs/op: Number of allocations per operation (lower is better)

Example output:

BenchmarkConvertSimple/equality-12     1412060    848.7 ns/op    1593 B/op    25 allocs/op
BenchmarkConvertOperators/logical_and-12   943438    1255 ns/op    2154 B/op    36 allocs/op

CI/CD Integration

Benchmarks run automatically on every PR and push to main:

  • Runs on Go 1.24.x
  • Stores benchmark results in gh-pages branch for historical tracking
  • Comments on PRs when a benchmark regresses past the 150% alert threshold
  • Generates visual charts at https://spandigital.github.io/cel2sql/dev/bench/
  • Provides summary of all benchmark results

Comparing Benchmark Results

To compare benchmarks between two runs:

# Run benchmarks and save baseline
make bench-compare  # Saves to bench-new.txt

# Make changes to code

# Run benchmarks again and compare
mv bench-new.txt bench-old.txt
make bench-compare

# Install benchstat for detailed comparison (optional)
go install golang.org/x/perf/cmd/benchstat@latest
benchstat bench-old.txt bench-new.txt

When to Run Benchmarks

  • Before and after performance optimizations
  • When modifying core conversion logic
  • When adding new features that may impact performance
  • To validate that changes don't cause regressions

Common Patterns

Creating Type Providers (Pre-defined Schemas)

All dialects support pre-defined schemas via NewTypeProvider:

// PostgreSQL
schema := pg.NewSchema([]pg.FieldSchema{
    {Name: "field_name", Type: "text", Repeated: false},
    {Name: "array_field", Type: "text", Repeated: true},
    {Name: "json_field", Type: "jsonb", Repeated: false},
    {Name: "composite_field", Type: "composite", Schema: []pg.FieldSchema{...}},
})
provider := pg.NewTypeProvider(map[string]pg.Schema{"TableName": schema})

// MySQL (same schema types, dialect-specific type names)
schema := mysql.NewSchema([]mysql.FieldSchema{
    {Name: "name", Type: "varchar"},
    {Name: "metadata", Type: "json", IsJSON: true},
})
provider := mysql.NewTypeProvider(map[string]mysql.Schema{"TableName": schema})

// SQLite, DuckDB, BigQuery follow the same pattern with their own type names

Dynamic Schema Loading

All dialects support runtime schema introspection from live databases via LoadTableSchema:

// PostgreSQL — accepts connection string, manages its own pool
provider, err := pg.NewTypeProviderWithConnection(ctx, connectionString)
if err != nil {
    return err
}
defer provider.Close()
err = provider.LoadTableSchema(ctx, "tableName")

// MySQL — accepts *sql.DB, caller owns connection
db, _ := sql.Open("mysql", "user:pass@tcp(host:3306)/db?parseTime=true")
provider, err := mysql.NewTypeProviderWithConnection(ctx, db)
err = provider.LoadTableSchema(ctx, "tableName")

// SQLite — accepts *sql.DB, uses PRAGMA table_info (validates table name)
db, _ := sql.Open("sqlite", "mydb.sqlite")
provider, err := sqlite.NewTypeProviderWithConnection(ctx, db)
err = provider.LoadTableSchema(ctx, "tableName")

// DuckDB — accepts *sql.DB, works with any DuckDB driver
db, _ := sql.Open("duckdb", "mydb.duckdb")
provider, err := duckdb.NewTypeProviderWithConnection(ctx, db)
err = provider.LoadTableSchema(ctx, "tableName")

// BigQuery — accepts *bigquery.Client + dataset ID
client, _ := bigquery.NewClient(ctx, "project-id")
provider, err := bqprovider.NewTypeProviderWithClient(ctx, client, "dataset_id")
err = provider.LoadTableSchema(ctx, "tableName")

Key differences per dialect:

  • PostgreSQL: NewTypeProviderWithConnection(ctx, connString) — owns its pgxpool, Close() releases it
  • MySQL/SQLite/DuckDB: NewTypeProviderWithConnection(ctx, *sql.DB) — caller owns DB, Close() is no-op
  • BigQuery: NewTypeProviderWithClient(ctx, *bigquery.Client, datasetID) — caller owns client, Close() is no-op
  • SQLite: Table name validated via regex (^[a-zA-Z_][a-zA-Z0-9_]*$) since PRAGMA doesn't support parameterized queries

CEL Environment Setup

env, err := cel.NewEnv(
    cel.CustomTypeProvider(provider),
    cel.Variable("table", cel.ObjectType("TableName")),
)

Converting CEL to SQL

ast, issues := env.Compile(`table.field == "value" && table.age > 30`)
if issues != nil && issues.Err() != nil {
    return issues.Err()
}

sqlCondition, err := cel2sql.Convert(ast)
// Returns: table.field = 'value' AND table.age > 30

Query Analysis and Index Recommendations

cel2sql can analyze CEL expressions and recommend dialect-specific database indexes to optimize performance.

Architecture

Index analysis uses the IndexAdvisor interface (dialect/index_advisor.go):

  • Pattern detection stays centralized in analysis.go (walks the CEL AST once)
  • DDL generation is delegated to per-dialect IndexAdvisor implementations
  • Each built-in dialect implements IndexAdvisor on its *Dialect struct
  • Use dialect.GetIndexAdvisor(d) to type-assert a dialect to IndexAdvisor
  • Unsupported patterns return nil (silently skipped)

PatternTypes detected: PatternComparison, PatternJSONAccess, PatternRegexMatch, PatternArrayMembership, PatternArrayComprehension, PatternJSONArrayComprehension.

Using AnalyzeQuery

ast, issues := env.Compile(`person.age > 18 && person.metadata.verified == true`)
if issues != nil && issues.Err() != nil {
    return issues.Err()
}

// PostgreSQL (default)
sql, recommendations, err := cel2sql.AnalyzeQuery(ast,
    cel2sql.WithSchemas(schemas))

// Or with a specific dialect
sql, recommendations, err := cel2sql.AnalyzeQuery(ast,
    cel2sql.WithSchemas(schemas),
    cel2sql.WithDialect(mysql.New()))
if err != nil {
    return err
}

for _, rec := range recommendations {
    fmt.Printf("Column: %s, Type: %s\n", rec.Column, rec.IndexType)
    fmt.Printf("Reason: %s\n", rec.Reason)
    fmt.Printf("Execute: %s\n\n", rec.Expression)
}

Per-Dialect Index Types

| Pattern                  | PostgreSQL    | MySQL              | SQLite | DuckDB | BigQuery     |
|--------------------------|---------------|--------------------|--------|--------|--------------|
| Comparison               | BTREE         | BTREE              | BTREE  | ART    | CLUSTERING   |
| JSON access              | GIN           | BTREE (functional) | (nil)  | ART    | SEARCH_INDEX |
| Regex match              | GIN + pg_trgm | FULLTEXT           | (nil)  | (nil)  | (nil)        |
| Array membership         | GIN           | (nil)              | (nil)  | ART    | (nil)        |
| Array comprehension      | GIN           | (nil)              | (nil)  | ART    | (nil)        |
| JSON array comprehension | GIN           | BTREE (functional) | (nil)  | ART    | SEARCH_INDEX |

Per-dialect DDL examples:

  • PostgreSQL: CREATE INDEX idx_col_gin ON table_name USING GIN (col);
  • MySQL: CREATE INDEX idx_col_btree ON table_name (col); / CREATE FULLTEXT INDEX ...
  • SQLite: CREATE INDEX idx_col ON table_name (col);
  • DuckDB: CREATE INDEX idx_col ON table_name (col); (ART by default)
  • BigQuery: ALTER TABLE t SET OPTIONS (clustering_columns=['col']); / CREATE SEARCH INDEX ...

IndexRecommendation Structure

type IndexRecommendation struct {
    Column     string  // Full column name (e.g., "person.metadata")
    IndexType  string  // Dialect-specific: "BTREE", "GIN", "ART", "CLUSTERING", "SEARCH_INDEX", etc.
    Expression string  // Complete DDL statement for the target dialect
    Reason     string  // Explanation of why this index is recommended
}

Implementation Files

  • dialect/index_advisor.go — IndexAdvisor interface, PatternType, IndexPattern, GetIndexAdvisor() helper
  • dialect/postgres/index_advisor.go — PostgreSQL: BTREE, GIN, GIN+pg_trgm
  • dialect/mysql/index_advisor.go — MySQL: BTREE, FULLTEXT
  • dialect/sqlite/index_advisor.go — SQLite: BTREE only
  • dialect/duckdb/index_advisor.go — DuckDB: ART
  • dialect/bigquery/index_advisor.go — BigQuery: CLUSTERING, SEARCH_INDEX

When to Use

  • Development: Discover which indexes your queries need
  • Performance tuning: Identify missing indexes causing slow queries
  • Production monitoring: Analyze user-generated filter expressions

See examples/index_analysis/ for a complete working example with all 5 dialects.

Logging and Observability

cel2sql supports structured logging using Go's standard log/slog package (Go 1.21+).

Logging is optional and has zero overhead when not enabled (uses slog.DiscardHandler by default).

Enable Logging

import "log/slog"

// JSON handler for production/machine parsing
logger := slog.New(slog.NewJSONHandler(os.Stdout, &slog.HandlerOptions{
    Level: slog.LevelDebug,
}))

// Text handler for development/debugging
logger := slog.New(slog.NewTextHandler(os.Stderr, nil))

sql, err := cel2sql.Convert(ast,
    cel2sql.WithSchemas(schemas),
    cel2sql.WithLogger(logger))

What Gets Logged

  • JSON path detection decisions - Table, field, operator selection (->>, ?)
  • Comprehension type identification - all, exists, exists_one, filter, map
  • Schema lookups - Hits/misses, field type detection
  • Performance metrics - Conversion duration
  • Regex pattern transformations - RE2 to POSIX conversion
  • Operator mapping decisions - CEL to SQL operator conversion
  • Error contexts - Full details when conversions fail

Log Levels

  • Debug: Detailed conversion steps, operator mappings, schema lookups
  • Error: Conversion failures with full context

Example Usage

// Without logger - zero overhead (default)
sql, err := cel2sql.Convert(ast)

// With logging - detailed observability
logger := slog.New(slog.NewJSONHandler(os.Stdout, &slog.HandlerOptions{
    Level: slog.LevelDebug,
}))

sql, err := cel2sql.Convert(ast,
    cel2sql.WithSchemas(schemas),
    cel2sql.WithContext(ctx),
    cel2sql.WithLogger(logger))

See examples/logging/ for a complete working example with both JSON and text handlers.

Context Support (v2.10.0)

cel2sql supports context propagation for cancellation, timeouts, and observability integration.

Context support is optional and uses the functional options pattern.

Enable Context

import "context"

// With timeout
ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
defer cancel()

sql, err := cel2sql.Convert(ast,
    cel2sql.WithContext(ctx),
    cel2sql.WithSchemas(schemas))

// With cancellation
ctx, cancel := context.WithCancel(context.Background())
defer cancel()

sql, err := cel2sql.Convert(ast,
    cel2sql.WithContext(ctx))

When Context Is Checked

Context cancellation is checked at key recursion points:

  • visit() - Main traversal entry point
  • visitCall() - Every function call
  • visitComprehension() - Before processing comprehensions
  • Individual comprehension handlers

Error Handling

If context is cancelled or times out during conversion:

sql, err := cel2sql.Convert(ast, cel2sql.WithContext(ctx))
if err != nil {
    if errors.Is(err, context.Canceled) {
        // Conversion was cancelled
    } else if errors.Is(err, context.DeadlineExceeded) {
        // Conversion timed out
    }
}

Benefits

  • Cancellation: Stop long-running conversions
  • Timeouts: Protect against complex expressions
  • Observability: Integrate with distributed tracing
  • Resource Cleanup: Automatic cleanup on cancellation

Example Usage

// Without context (default) - backward compatible
sql, err := cel2sql.Convert(ast)

// With context and other options
ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
defer cancel()

logger := slog.New(slog.NewJSONHandler(os.Stdout, nil))

sql, err := cel2sql.Convert(ast,
    cel2sql.WithContext(ctx),
    cel2sql.WithSchemas(schemas),
    cel2sql.WithLogger(logger))

See examples/context/ for a complete working example with timeouts and cancellation.

Security Features (v2.10.0)

cel2sql includes comprehensive security protections against common attack vectors.

Field Name Validation

All field names are validated to prevent SQL injection:

Validation Rules:

  • Maximum length: 63 characters (PostgreSQL NAMEDATALEN-1)
  • Format: Must start with letter/underscore, contain only alphanumeric + underscore
  • Reserved keywords: 60+ SQL keywords are rejected
  • Empty strings: Not allowed

Protection Against:

// ❌ These will be rejected:
field'; DROP TABLE users--
SELECT * FROM sensitive
user OR 1=1

Validation happens at:

  • visitSelect() - Field names in select expressions
  • visitIdent() - Identifier names to prevent reserved keywords

JSON Field Escaping

Single quotes in JSON field names are automatically escaped:

Automatic Escaping:

// CEL with quote in field name
user.preferences.user'name == "test"

// Generated SQL (quotes escaped)
user.preferences->>'user''name' = 'test'

Protection Against:

  • SQL injection via malicious JSON field names
  • Field names like: user' OR '1'='1

Escaping applied in:

  • visitSelect() - JSON path operators (->>)
  • visitHasFunction() - JSON existence operators (?)
  • visitNestedJSONHas() - jsonb_extract_path_text()
  • buildJSONPath*() - All JSON path construction

ReDoS Protection

Comprehensive validation prevents Regular Expression Denial of Service attacks:

Pattern Validation:

  • Length limit: 500 characters maximum
  • Nested quantifiers: Detects patterns like (a+)+, (a*)*
  • Capture group limit: Maximum 20 groups
  • Quantified alternation: Blocks patterns like (a|a)*b
  • Nesting depth limit: Maximum 10 levels

Examples:

// ✅ Safe patterns (allowed)
field.matches(r"[a-z]+@[a-z]+\.[a-z]+")
field.matches(r"(?i)^user_\d+$")

// ❌ Dangerous patterns (rejected)
field.matches(r"(a+)+b")           // Nested quantifiers
field.matches(r"(a|a)*b")          // Quantified alternation
field.matches(r"(((((((((((a"))    // Excessive nesting

Protection Against:

  • Catastrophic backtracking (CWE-1333)
  • CPU exhaustion from complex patterns
  • Service disruption from malicious regex

Resource Exhaustion Protection

cel2sql includes multiple layers of protection against resource exhaustion attacks (CWE-400):

Recursion Depth Limits:

  • Default limit: 100 levels of expression nesting
  • Configurable: Use WithMaxDepth() to adjust
  • Protection: Prevents stack overflow from deeply nested expressions (CWE-674)

SQL Output Length Limits:

  • Default limit: 50,000 characters of generated SQL
  • Configurable: Use WithMaxOutputLength() to adjust
  • Protection: Prevents memory exhaustion from extremely large SQL queries

Comprehension Depth Limits:

  • Fixed limit: 3 levels of nested comprehensions
  • Protection: Prevents resource exhaustion from deeply nested UNNEST/subquery operations

Byte Array Length Limits:

  • Fixed limit: 10,000 bytes maximum
  • Applies to: Non-parameterized mode (hex encoding)
  • Protection: Prevents memory exhaustion from large hex-encoded SQL strings (each byte → ~4 chars)
  • Note: Parameterized mode bypasses this limit (bytes passed directly to database driver)

Examples:

// Use default limits (recommended)
sql, err := cel2sql.Convert(ast)

// Custom recursion depth limit
sql, err := cel2sql.Convert(ast,
    cel2sql.WithMaxDepth(150))

// Custom SQL output length limit
sql, err := cel2sql.Convert(ast,
    cel2sql.WithMaxOutputLength(100000))

// Combine multiple limits
sql, err := cel2sql.Convert(ast,
    cel2sql.WithMaxDepth(75),
    cel2sql.WithMaxOutputLength(25000),
    cel2sql.WithContext(ctx))

Protection Against:

  • Stack overflow from deeply nested expressions
  • Memory exhaustion from extremely large SQL output
  • CPU/memory exhaustion from deeply nested comprehensions
  • Memory exhaustion from large hex-encoded byte arrays
  • DoS attacks via resource consumption

Context Timeouts

Use context timeouts as defense-in-depth:

// Protect against complex expressions
ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
defer cancel()

sql, err := cel2sql.Convert(ast,
    cel2sql.WithContext(ctx),
    cel2sql.WithSchemas(schemas))

Security Best Practices

  1. Always validate user input before passing to CEL
  2. Use context timeouts for user-provided expressions
  3. Enable logging to monitor conversion patterns
  4. Keep schemas minimal - only expose necessary fields
  5. Use prepared statements when executing generated SQL
  6. Test edge cases with your specific field names
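
A hedged sketch of item 5, combining the generated condition with bind parameters for values that never pass through CEL (the table, column, and surrounding variables are illustrative):

condition, err := cel2sql.Convert(ast, cel2sql.WithSchemas(schemas))
if err != nil {
    return err
}
// cel2sql escapes literals inside the generated condition; values from
// outside the CEL expression still travel as bind parameters rather
// than string concatenation.
query := "SELECT id FROM users WHERE " + condition + " AND tenant_id = $1"
rows, err := db.QueryContext(ctx, query, tenantID)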

For detailed security information, see the security documentation.

Important Notes

Migration Context

This project was originally BigQuery-only, migrated to PostgreSQL in v2.0, and expanded to multi-dialect in v3.0:

  • v2.0: All cloud.google.com/go/bigquery dependencies removed, bq/ package removed
  • v3.0: Multi-dialect support added (PostgreSQL, MySQL, SQLite, DuckDB, BigQuery)
  • Each dialect has its own type provider with LoadTableSchema support
  • BigQuery dependency re-added for BigQuery dialect support

Things to Avoid

  • Do NOT remove protobuf dependencies (required by CEL)
  • Do NOT use direct SQL string concatenation (use proper escaping)
  • Do NOT ignore context cancellation in database operations
  • Do NOT use PRAGMA with user-controlled table names without validation (SQLite)
  • Do NOT assume a specific dialect — use the dialect interface for dialect-specific behavior

When Adding Features

  1. Consider all supported SQL dialects, not just PostgreSQL
  2. Add comprehensive tests with realistic schemas for each affected dialect
  3. Update type mappings in the appropriate {dialect}/provider.go if needed
  4. Document new CEL operators/functions in README.md
  5. Ensure backward compatibility
  6. Run make ci before committing

Project Structure

cel2sql/
├── cel2sql.go              # Main conversion engine
├── analysis.go             # Query analysis and index recommendations (multi-dialect)
├── comprehensions.go       # CEL comprehensions support
├── json.go                 # JSON/JSONB handling
├── operators.go            # Operator conversion
├── timestamps.go           # Timestamp/duration handling
├── utils.go                # Utility functions
├── schema/                 # Dialect-agnostic schema types
│   └── schema.go           # FieldSchema, Schema with O(1) lookup
├── pg/                     # PostgreSQL type provider
│   └── provider.go         # LoadTableSchema via information_schema + pgxpool
├── mysql/                  # MySQL type provider
│   └── provider.go         # LoadTableSchema via information_schema + *sql.DB
├── sqlite/                 # SQLite type provider
│   └── provider.go         # LoadTableSchema via PRAGMA table_info + *sql.DB
├── duckdb/                 # DuckDB type provider
│   └── provider.go         # LoadTableSchema via information_schema + *sql.DB
├── bigquery/               # BigQuery type provider
│   └── provider.go         # LoadTableSchema via BigQuery client API
├── dialect/                # Dialect interface and implementations
│   ├── dialect.go          # Core Dialect interface (~40 methods)
│   ├── index_advisor.go    # IndexAdvisor interface, PatternType, IndexPattern
│   ├── postgres/           # PostgreSQL dialect + IndexAdvisor (BTREE, GIN, GIN+trgm)
│   ├── mysql/              # MySQL dialect + IndexAdvisor (BTREE, FULLTEXT)
│   ├── sqlite/             # SQLite dialect + IndexAdvisor (BTREE only)
│   ├── duckdb/             # DuckDB dialect + IndexAdvisor (ART)
│   └── bigquery/           # BigQuery dialect + IndexAdvisor (CLUSTERING, SEARCH_INDEX)
├── sqltypes/               # Custom SQL types for CEL
│   └── types.go
├── testcases/              # Shared test cases with per-dialect expected SQL
├── testutil/               # Multi-dialect test runner + env factories
└── examples/               # Usage examples
    ├── basic/
    ├── comprehensions/
    ├── index_analysis/     # Multi-dialect index recommendation demo
    └── load_table_schema/

Each example should be in its own subdirectory with main.go and README.md, runnable via go run main.go.