Skip to content

Standardized Visitor Pattern for Substrait Protobuf Traversal #135

@alexanderbianchi

Description

@alexanderbianchi

Summary

We've developed an efficient visitor pattern implementation for traversing Substrait protobuf plans in Go. We'd like to contribute this to the Substrait community as it solves common traversal needs in an idiomatic Go way.

Background

When working with Substrait plans, we frequently need to:

  • Collect statistics about plan nodes
  • Compute hashes for plan comparison/caching
  • Extract metadata (table names, function calls, etc.)
  • Transform or validate plan structures

The naive approach requires writing custom traversal logic for each use case, leading to:

  • Duplicate switch statements across different analyzers
  • Error-prone traversal logic
  • Missed nodes when plan structures evolve
  • Performance issues from repeated traversals

Proposed Design

Code can be seen in #136

Core Interfaces

// Visitor is a marker interface for all visitors
type Visitor interface{}

// RelVisitor is implemented by visitors that want to visit relation nodes
type RelVisitor interface {
    VisitRel(rel *substraitpb.Rel)
}

// ExprVisitor is implemented by visitors that want to visit expression nodes
type ExprVisitor interface {
    VisitExpr(expr *substraitpb.Expression)
}

Usage Example

// Visitor that only cares about counting relations
type NodeCounter struct {
    count int
}

func (n *NodeCounter) VisitRel(rel *substraitpb.Rel) {
    n.count++
}

// Visitor that only cares about function names
type FunctionCollector struct {
    functions []string
}

func (f *FunctionCollector) VisitExpr(expr *substraitpb.Expression) {
    if sf := expr.GetScalarFunction(); sf != nil {
        f.functions = append(f.functions, sf.Name)
    }
}

// Usage
counter := &NodeCounter{}
traverse.Walk(plan, counter)
fmt.Printf("Found %d relations\n", counter.count)

Multiple Visitors in Single Traversal

// Combine multiple analyses in one pass
multi := traverse.NewMultiVisitor(
    NewHashVisitor(ctx),
    NewStatsCollector(),
    NewTableExtractor(),
)
traverse.Walk(plan, multi)

Why This Design is Idiomatic Go

  1. Optional Interface Implementation: Following patterns like http.ResponseWriter + http.Flusher, visitors only implement the methods they need:

    if exprVisitor, ok := visitor.(ExprVisitor); ok {
        exprVisitor.VisitExpr(expr)
    }
  2. Composition over Inheritance: MultiVisitor composes behaviors without inheritance

  3. Interface Segregation: Separate interfaces for different node types (relations vs expressions)

  4. No Forced Implementations: Unlike a single fat interface, you don't need no-op methods

Design Tradeoffs

1. Statefulness

Decision: Visitors maintain their own state

Rationale:

  • Allows natural encapsulation (e.g., HashVisitor owns its digest)
  • Enables efficient multi-visitor patterns
  • Follows Go's preference for explicit state over implicit context

Alternative considered: Passing context through visit methods

// Rejected: adds complexity and allocations
VisitRel(ctx VisitContext, rel *substraitpb.Rel)

2. Extension Metadata Handling

Decision: Visitors that need plan metadata accept it at construction

ctx := traverse.NewPlanContext(plan)  // Extracts extensions once
visitor := NewHashVisitor(ctx)         // Dependency injection
traverse.Walk(plan.Relations[0], visitor)

Rationale:

  • Single extraction of metadata (efficient)
  • Clean dependency injection pattern
  • Visitors remain focused on traversal

Alternative considered: Passing plan to each visit method (rejected due to coupling)

3. Cycle Detection

Decision: Don't include cycle detection by default

Rationale:

  • proto structs with cycles are invalid substrait
  • they couldn't be marshaled/un-marshaled so there isn't a security risk
  • the functionality can be included in a composable visitor that maintains it's own map of visited rel's

4. Read-Only Constraint

Decision: Visitors cannot modify the tree during traversal

Rationale:

  • Enables safe concurrent traversal
  • Prevents subtle bugs from modification during iteration
  • Simplifies implementation significantly

Given there is no way to pass immutable references, read-only behavior is a matter of documentation the risks of updating protobufs during traversal.

5. Protobuf Focused

The use case in mind doesn't use the non-protobuf go structs as the service is more of a router then planner/constructor but the design should be applicable there as well.

Performance Characteristics

  • Zero allocations per node visited (only initial map allocation for cycle detection)
  • Single-pass traversal for multiple visitors
  • Minimal interface assertions

Questions for the Community

  1. Naming: Should this be part of substrait-go or a separate substrait-go-traverse package?

  2. Error Handling: Should visitors be able to return errors to stop traversal?

    VisitRel(rel *substraitpb.Rel) error  // Alternative design
  3. Additional Interfaces: Are there other optional interfaces that would be useful?

    • LiteralVisitor for visiting literals?
    • TypeVisitor for visiting type definitions?
    • FieldRefVisitor?
  4. Context Pattern: Should we standardize how visitors receive plan metadata?

    type ContextualVisitor interface {
        SetContext(ctx *PlanContext)
    }
  5. Modification Support: Should we provide a separate MutableVisitor pattern for transformations?

Implementation

We have an implementation with:

  • Core traversal logic with cycle detection
  • Helper types (MultiVisitor, StatefulVisitor)
  • Plan context extraction for extensions
  • Test coverage
  • Real-world visitors (hashing, statistics, table extraction)

We're happy to contribute this if there's interest. The code is currently internal but was designed with open-sourcing in mind.

Related Work

  • Java's Calcite RelVisitor - Similar goals, different language idioms

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions