
Commit 7762b3c

chore: Physical planner - allow to disable inplace constant evaluation

1 parent 8d4663b

File tree

2 files changed: +115 -1 lines


CLAUDE.md

Lines changed: 103 additions & 0 deletions
@@ -0,0 +1,103 @@
# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Project Overview

This is a fork of Apache Arrow DataFusion, an extensible query execution framework written in Rust that uses Apache Arrow as its in-memory format. This fork is maintained by Cube and includes custom extensions and optimizations.

## Key Commands

### Building

```bash
cargo build                 # Build the project
cargo build --release       # Build with optimizations
cargo build -p datafusion   # Build specific package
```

### Testing

```bash
# Setup test data (required before first test run)
git submodule init
git submodule update
export PARQUET_TEST_DATA=$(pwd)/parquet-testing/data/
export ARROW_TEST_DATA=$(pwd)/testing/data/

# Run tests
cargo test                  # Run all tests
cargo test -p datafusion    # Test specific package
cargo test test_name        # Run specific test
cargo test -- --nocapture   # Show println! output during tests
```

### Formatting and Linting

```bash
cargo fmt                   # Format code
cargo fmt --check           # Check formatting without changes
cargo clippy                # Run linter
```

### Benchmarks

```bash
cargo bench                 # Run benchmarks
cargo bench -p datafusion   # Run datafusion benchmarks
```

## Architecture Overview

### Core Components

1. **Logical Planning** (`datafusion/src/logical_plan/`)
   - `LogicalPlan`: Represents logical query plans (SELECT, JOIN, etc.)
   - `Expr`: Expression trees for filters, projections, aggregations
   - `DFSchema`: Schema representation with field metadata
   - SQL parsing and planning in `datafusion/src/sql/`

2. **Physical Planning** (`datafusion/src/physical_plan/`)
   - `ExecutionPlan`: Physical execution operators
   - Expression implementations for actual computation
   - Aggregate functions with `Accumulator` trait
   - Custom operators for hash joins, sorts, aggregations

3. **Execution** (`datafusion/src/execution/`)
   - `ExecutionContext`: Main entry point for query execution
   - DataFrame API for programmatic query building (see the sketch after this list)
   - Manages memory, concurrency, and resource limits

4. **Optimizer** (`datafusion/src/optimizer/`)
   - Rule-based optimizer with passes like:
     - Predicate pushdown
     - Projection pushdown
     - Constant folding
     - Join reordering

5. **Cube Extensions** (`datafusion/src/cube_ext/`)
   - Custom operators and functions specific to Cube's fork
   - Performance optimizations including:
     - `GroupsAccumulator` for efficient grouped aggregation
     - `GroupsAccumulatorFlatAdapter` for flattened group values
     - Specialized join and aggregation implementations

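To make these entry points concrete, here is a minimal end-to-end sketch through `ExecutionContext`, assuming this fork's older, pre-`SessionContext` API. The CSV path and column names are hypothetical, and newer DataFusion vintages make `register_csv`/`sql` async, so treat the exact signatures as assumptions.

```rust
use datafusion::error::Result;
use datafusion::prelude::*;

#[tokio::main]
async fn main() -> Result<()> {
    // Main entry point: owns the table catalog, optimizer rules, and runtime settings.
    let mut ctx = ExecutionContext::new();

    // Register a CSV-backed table (hypothetical file; schema inferred from the data).
    // Newer DataFusion vintages make this call (and `sql`) async.
    ctx.register_csv("sales", "data/sales.csv", CsvReadOptions::new())?;

    // SQL text -> logical plan -> optimizer passes -> physical plan -> batch streams.
    let df = ctx.sql("SELECT region, SUM(amount) FROM sales GROUP BY region")?;

    // Collect the async result streams into in-memory Arrow RecordBatches.
    let batches = df.collect().await?;
    println!("got {} record batches", batches.len());
    Ok(())
}
```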
### Key Design Patterns

- **Visitor Pattern**: Used extensively for traversing and transforming plans
- **Async/Await**: All execution is async using Tokio runtime
- **Arrow Arrays**: All data processing uses Arrow columnar format
- **Stream Processing**: Results are produced as async streams of RecordBatches (see the sketch below)

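For the streaming pattern specifically, one partition of a physical plan can be drained roughly like this; the sketch assumes the single-argument `execute(partition)` signature of this fork's DataFusion vintage (later upstream versions add a task-context argument).

```rust
use std::sync::Arc;

use datafusion::error::Result;
use datafusion::physical_plan::ExecutionPlan;
use futures::StreamExt;

/// Drain one partition of an already-built physical plan and count its rows.
async fn count_rows(plan: Arc<dyn ExecutionPlan>) -> Result<usize> {
    // `execute` returns an async stream of Arrow RecordBatches for this partition.
    let mut stream = plan.execute(0).await?;
    let mut rows = 0;
    while let Some(batch) = stream.next().await {
        rows += batch?.num_rows();
    }
    Ok(rows)
}
```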
### Adding New Functionality

**Scalar Functions**: Implement in appropriate module under `physical_plan/`, register in `physical_plan/functions.rs`

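As an alternative to patching `functions.rs`, a scalar UDF can be registered at runtime. This is a rough sketch assuming the older four-argument `create_udf` (newer vintages insert a `Volatility` argument) and the `make_scalar_function` helper; module paths may differ slightly in this fork.

```rust
use std::sync::Arc;

use datafusion::arrow::array::{ArrayRef, Int64Array};
use datafusion::arrow::datatypes::DataType;
use datafusion::physical_plan::functions::make_scalar_function;
use datafusion::prelude::*;

fn main() {
    // Kernel: doubles an Int64 column, preserving nulls.
    let double = make_scalar_function(|args: &[ArrayRef]| {
        let input = args[0]
            .as_any()
            .downcast_ref::<Int64Array>()
            .expect("int64 argument");
        let out: Int64Array = input.iter().map(|v| v.map(|x| x * 2)).collect();
        Ok(Arc::new(out) as ArrayRef)
    });

    // Wrap the kernel with a name, argument types, and return type,
    // then register it on the context (assumed 4-argument form; newer
    // versions also take a Volatility).
    let udf = create_udf(
        "double",
        vec![DataType::Int64],
        Arc::new(DataType::Int64),
        double,
    );

    let mut ctx = ExecutionContext::new();
    ctx.register_udf(udf);
    // SELECT double(x) FROM t would now resolve to this function.
}
```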
**Aggregate Functions**: Create `Accumulator` implementation, register in `physical_plan/aggregates.rs`

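As a shape reference, a trivial accumulator might look like the sketch below. It assumes the row-oriented `update`/`merge` methods of this fork's vintage; newer DataFusion versions drop those in favour of `update_batch`/`merge_batch`, so the required method set may differ.

```rust
use datafusion::error::Result;
use datafusion::physical_plan::Accumulator;
use datafusion::scalar::ScalarValue;

/// Sums Int64 values, ignoring nulls.
#[derive(Debug)]
struct SumAccumulator {
    sum: i64,
}

impl Accumulator for SumAccumulator {
    // Intermediate state exchanged between partial and final aggregate stages.
    fn state(&self) -> Result<Vec<ScalarValue>> {
        Ok(vec![ScalarValue::Int64(Some(self.sum))])
    }

    // Fold one input row into the running state.
    fn update(&mut self, values: &[ScalarValue]) -> Result<()> {
        if let ScalarValue::Int64(Some(v)) = &values[0] {
            self.sum += *v;
        }
        Ok(())
    }

    // Fold one row of another accumulator's state into this one.
    fn merge(&mut self, states: &[ScalarValue]) -> Result<()> {
        self.update(states)
    }

    // Produce the final value once all input has been seen.
    fn evaluate(&self) -> Result<ScalarValue> {
        Ok(ScalarValue::Int64(Some(self.sum)))
    }
}
```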
**Optimizer Rules**: Implement `OptimizerRule` trait, add to optimizer pipeline

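A minimal rule is mostly boilerplate around a plan rewrite; this sketch assumes the `optimize(&self, plan, props)` signature and module paths of this fork's vintage (both have moved around upstream).

```rust
use datafusion::error::Result;
use datafusion::execution::context::ExecutionProps;
use datafusion::logical_plan::LogicalPlan;
use datafusion::optimizer::optimizer::OptimizerRule;

/// A do-nothing rule: real rules recurse into the plan's inputs and
/// rewrite the nodes they care about, returning a new LogicalPlan.
struct NoopRule;

impl OptimizerRule for NoopRule {
    fn optimize(&self, plan: &LogicalPlan, _props: &ExecutionProps) -> Result<LogicalPlan> {
        // Returning a clone leaves the plan unchanged.
        Ok(plan.clone())
    }

    fn name(&self) -> &str {
        "noop_rule"
    }
}
```

The rule would then be appended to the optimizer pipeline, for example through the context configuration's `add_optimizer_rule` hook (again an assumption about this vintage).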
**Physical Operators**: Implement `ExecutionPlan` trait with proper partitioning and execution

## Important Notes

- This is a Cube fork with custom modifications - Ballista components are disabled
- The `cube_ext` module contains Cube-specific extensions and optimizations
- Performance-critical paths often have specialized implementations for primitive types
- Always run tests with proper test data environment variables set

datafusion/src/physical_plan/planner.rs

Lines changed: 12 additions & 1 deletion
```diff
@@ -249,12 +249,14 @@ pub trait ExtensionPlanner {
 /// Default single node physical query planner that converts a
 /// `LogicalPlan` to an `ExecutionPlan` suitable for execution.
 pub struct DefaultPhysicalPlanner {
+    should_evaluate_constants: bool,
     extension_planners: Vec<Arc<dyn ExtensionPlanner + Send + Sync>>,
 }
 
 impl Default for DefaultPhysicalPlanner {
     fn default() -> Self {
         Self {
+            should_evaluate_constants: true,
             extension_planners: vec![
                 Arc::new(LogicalAliasPlanner {}),
                 Arc::new(CrossJoinPlanner {}),
@@ -265,6 +267,15 @@ impl Default for DefaultPhysicalPlanner {
     }
 }
 
+impl DefaultPhysicalPlanner {
+    pub fn disable_constant_evaluation(self) -> Self {
+        let mut mv = self;
+        mv.should_evaluate_constants = false;
+
+        mv
+    }
+}
+
 impl PhysicalPlanner for DefaultPhysicalPlanner {
     /// Create a physical plan from a logical plan
     fn create_physical_plan(
@@ -1360,7 +1371,7 @@ impl DefaultPhysicalPlanner {
         res_expr: Arc<dyn PhysicalExpr>,
         inputs: Vec<Arc<dyn PhysicalExpr>>,
     ) -> Result<Arc<dyn PhysicalExpr>> {
-        if inputs
+        if self.should_evaluate_constants && inputs
             .iter()
             .all(|i| i.as_any().downcast_ref::<Literal>().is_some())
         {
```
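In practice the new switch is a builder-style toggle on an otherwise default planner. A short sketch of how a caller might opt out of constant evaluation (the surrounding `create_physical_plan` call and its state argument are unchanged by this commit):

```rust
use datafusion::physical_plan::planner::DefaultPhysicalPlanner;

/// Build the default planner, optionally keeping all-literal expressions
/// un-evaluated so they survive into the physical plan as-is.
fn make_planner(evaluate_constants: bool) -> DefaultPhysicalPlanner {
    let planner = DefaultPhysicalPlanner::default();
    if evaluate_constants {
        planner
    } else {
        // Skips the all-literal shortcut guarded by `should_evaluate_constants`.
        planner.disable_constant_evaluation()
    }
}
```

The planner is then passed to `create_physical_plan` as before; with the flag off, all-literal expression inputs are no longer folded in place during physical planning.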

0 commit comments
