-
Notifications
You must be signed in to change notification settings - Fork 25
Description
Summary
We've developed an efficient visitor pattern implementation for traversing Substrait protobuf plans in Go. We'd like to contribute this to the Substrait community as it solves common traversal needs in an idiomatic Go way.
Background
When working with Substrait plans, we frequently need to:
- Collect statistics about plan nodes
- Compute hashes for plan comparison/caching
- Extract metadata (table names, function calls, etc.)
- Transform or validate plan structures
The naive approach requires writing custom traversal logic for each use case, leading to:
- Duplicate switch statements across different analyzers
- Error-prone traversal logic
- Missed nodes when plan structures evolve
- Performance issues from repeated traversals
Proposed Design
Code can be seen in #136
Core Interfaces
// Visitor is a marker interface for all visitors
type Visitor interface{}
// RelVisitor is implemented by visitors that want to visit relation nodes
type RelVisitor interface {
VisitRel(rel *substraitpb.Rel)
}
// ExprVisitor is implemented by visitors that want to visit expression nodes
type ExprVisitor interface {
VisitExpr(expr *substraitpb.Expression)
}Usage Example
// Visitor that only cares about counting relations
type NodeCounter struct {
count int
}
func (n *NodeCounter) VisitRel(rel *substraitpb.Rel) {
n.count++
}
// Visitor that only cares about function names
type FunctionCollector struct {
functions []string
}
func (f *FunctionCollector) VisitExpr(expr *substraitpb.Expression) {
if sf := expr.GetScalarFunction(); sf != nil {
f.functions = append(f.functions, sf.Name)
}
}
// Usage
counter := &NodeCounter{}
traverse.Walk(plan, counter)
fmt.Printf("Found %d relations\n", counter.count)Multiple Visitors in Single Traversal
// Combine multiple analyses in one pass
multi := traverse.NewMultiVisitor(
NewHashVisitor(ctx),
NewStatsCollector(),
NewTableExtractor(),
)
traverse.Walk(plan, multi)Why This Design is Idiomatic Go
-
Optional Interface Implementation: Following patterns like
http.ResponseWriter+http.Flusher, visitors only implement the methods they need:if exprVisitor, ok := visitor.(ExprVisitor); ok { exprVisitor.VisitExpr(expr) }
-
Composition over Inheritance:
MultiVisitorcomposes behaviors without inheritance -
Interface Segregation: Separate interfaces for different node types (relations vs expressions)
-
No Forced Implementations: Unlike a single fat interface, you don't need no-op methods
Design Tradeoffs
1. Statefulness
Decision: Visitors maintain their own state
Rationale:
- Allows natural encapsulation (e.g.,
HashVisitorowns its digest) - Enables efficient multi-visitor patterns
- Follows Go's preference for explicit state over implicit context
Alternative considered: Passing context through visit methods
// Rejected: adds complexity and allocations
VisitRel(ctx VisitContext, rel *substraitpb.Rel)2. Extension Metadata Handling
Decision: Visitors that need plan metadata accept it at construction
ctx := traverse.NewPlanContext(plan) // Extracts extensions once
visitor := NewHashVisitor(ctx) // Dependency injection
traverse.Walk(plan.Relations[0], visitor)Rationale:
- Single extraction of metadata (efficient)
- Clean dependency injection pattern
- Visitors remain focused on traversal
Alternative considered: Passing plan to each visit method (rejected due to coupling)
3. Cycle Detection
Decision: Don't include cycle detection by default
Rationale:
- proto structs with cycles are invalid substrait
- they couldn't be marshaled/un-marshaled so there isn't a security risk
- the functionality can be included in a composable visitor that maintains it's own map of visited rel's
4. Read-Only Constraint
Decision: Visitors cannot modify the tree during traversal
Rationale:
- Enables safe concurrent traversal
- Prevents subtle bugs from modification during iteration
- Simplifies implementation significantly
Given there is no way to pass immutable references, read-only behavior is a matter of documentation the risks of updating protobufs during traversal.
5. Protobuf Focused
The use case in mind doesn't use the non-protobuf go structs as the service is more of a router then planner/constructor but the design should be applicable there as well.
Performance Characteristics
- Zero allocations per node visited (only initial map allocation for cycle detection)
- Single-pass traversal for multiple visitors
- Minimal interface assertions
Questions for the Community
-
Naming: Should this be part of
substrait-goor a separatesubstrait-go-traversepackage? -
Error Handling: Should visitors be able to return errors to stop traversal?
VisitRel(rel *substraitpb.Rel) error // Alternative design
-
Additional Interfaces: Are there other optional interfaces that would be useful?
LiteralVisitorfor visiting literals?TypeVisitorfor visiting type definitions?FieldRefVisitor?
-
Context Pattern: Should we standardize how visitors receive plan metadata?
type ContextualVisitor interface { SetContext(ctx *PlanContext) }
-
Modification Support: Should we provide a separate
MutableVisitorpattern for transformations?
Implementation
We have an implementation with:
- Core traversal logic with cycle detection
- Helper types (
MultiVisitor,StatefulVisitor) - Plan context extraction for extensions
- Test coverage
- Real-world visitors (hashing, statistics, table extraction)
We're happy to contribute this if there's interest. The code is currently internal but was designed with open-sourcing in mind.
Related Work
- Java's Calcite
RelVisitor- Similar goals, different language idioms