Skip to content

Conversation

@diegoimbert
Copy link
Contributor

@diegoimbert diegoimbert commented Nov 4, 2025

Align with behavior of other SQL languages
When only one statement : return array of rows
When multiple statements : return array of array of rows

This will break existing scripts

@cloudflare-workers-and-pages
Copy link

cloudflare-workers-and-pages bot commented Nov 4, 2025

Deploying windmill with  Cloudflare Pages  Cloudflare Pages

Latest commit: ae188f5
Status: ✅  Deploy successful!
Preview URL: https://d93bd2a3.windmill.pages.dev
Branch Preview URL: https://di-duckdb-return-array-of-re.windmill.pages.dev

View logs

@diegoimbert diegoimbert marked this pull request as ready for review November 4, 2025 00:27
@claude
Copy link
Contributor

claude bot commented Nov 4, 2025

Claude finished @diegoimbert's task —— View job


Code Review Complete

Todo List:

  • Read the modified file and understand the changes
  • Review code quality and adherence to Rust best practices
  • Check for potential bugs and edge cases
  • Assess performance implications
  • Evaluate security considerations
  • Consider breaking change impact
  • Provide comprehensive feedback
  • Add reproduction instructions

Summary

This PR changes the DuckDB FFI return format to align with other SQL languages in Windmill:

  • Single statement: Returns [row1, row2, ...] (array of rows)
  • Multiple statements: Returns [[rows_from_stmt1], [rows_from_stmt2], ...] (array of arrays)

Overall Assessment: The implementation is correct and follows good Rust practices. However, there are important considerations around the breaking change and one potential bug.


Critical Issues

1. Breaking Change - Missing Migration Path ⚠️

Severity: High (User Impact)

This PR introduces a breaking change that will affect all existing DuckDB scripts. While the PR description mentions this, there's no migration strategy provided.

Impact:

  • Existing scripts expecting {result: [...]} or similar will break
  • Users will need to update all their DuckDB scripts manually

Recommendations:

  1. Document the breaking change clearly in release notes
  2. Consider a gradual migration:
    • Add a feature flag or configuration option to maintain backward compatibility temporarily
    • Provide a migration tool or script to help users update existing scripts
    • Add deprecation warnings before fully removing old behavior
  3. Update any relevant documentation or examples in the codebase

Bugs & Potential Issues

2. Unwrap on Empty Results Could Panic 🐛

Location: backend/windmill-duckdb-ffi-internal/src/lib.rs:207

Issue:

let results = if results.len() == 1 {
    serde_json::value::to_raw_value(&results.get(0).unwrap())  // ← Potential panic
    //                                                ^^^^^^^^

Problem: While results.get(0) should be safe when results.len() == 1, using .unwrap() is not idiomatic. If the logic is ever refactored or if there's an edge case, this could panic.

Better approach:

let results = if results.len() == 1 {
    serde_json::value::to_raw_value(results.get(0).expect("results.len() == 1"))
    // OR even better:
    serde_json::value::to_raw_value(&results[0])
} else {
    serde_json::value::to_raw_value(&results)
}
.map_err(|e| e.to_string())?;

Using &results[0] is cleaner since we've already checked the length, or use .expect() with a message to make the invariant explicit.

3. Empty Query Blocks Behavior

Question: What happens when query_block_list_count is 0?

Looking at the code:

  • results would be an empty Vec
  • The condition results.len() == 1 would be false
  • We'd serialize an empty array: []

This seems reasonable, but it's worth confirming this is the intended behavior. Previously, the code had:

result.unwrap_or_else(|| RawValue::from_string("[]".to_string()).unwrap())

So the behavior is consistent (empty array for no queries), which is good.


Code Quality & Best Practices

4. Good: Improved Code Structure ✅

The new implementation is cleaner:

  • Collects all results upfront rather than keeping just the last one
  • More functional style with results.push(result)
  • Clear separation of concerns

5. Good: Consistent Error Handling ✅

Error handling remains consistent with .map_err(|e| e.to_string())? pattern used throughout the file.

6. Minor: Variable Naming

The old code used result (singular) and the new code uses results (plural). This is good and more accurate. However, there's a variable shadowing issue:

for (query_block_index, query_block) in query_block_list.enumerate() {
    let result = do_duckdb_inner(...)  // Local variable
    results.push(result);
}
let results = if results.len() == 1 {  // Shadows the vector
    //...
}

While this is valid Rust, it could be clearer to use a different name for the final value:

let final_result = if results.len() == 1 {
    serde_json::value::to_raw_value(&results[0])
} else {
    serde_json::value::to_raw_value(&results)
}
.map_err(|e| e.to_string())?;
Ok((final_result.get().to_string(), column_order))

Performance Considerations

7. Performance: Additional Allocation

Impact: Minor

The new code allocates a Vec<Box<RawValue>> to store all results, whereas the old code only kept the last result in an Option<Box<RawValue>>.

Analysis:

  • For single-statement queries (the common case), this adds minimal overhead
  • For multi-statement queries, this is necessary for the new behavior
  • The memory overhead is proportional to the number of statements, not the number of rows (each statement's rows are already serialized to RawValue)

Conclusion: The performance impact is negligible for typical use cases.

8. Performance: Double Serialization for Single Statement

Issue: For single-statement queries, we now serialize twice:

  1. do_duckdb_inner returns a Box<RawValue> (already serialized rows)
  2. We then call to_raw_value(&results.get(0).unwrap()) which re-serializes it

Impact: This creates an unnecessary allocation and serialization pass for the common single-statement case.

Better approach:

let results = if results.len() == 1 {
    results.into_iter().next().unwrap()  // Just take ownership, no re-serialization
} else {
    serde_json::value::to_raw_value(&results).map_err(|e| e.to_string())?
};
Ok((results.get().to_string(), column_order))

This avoids the double serialization by just extracting the single Box<RawValue> directly.


Security Considerations

9. Security: No New Concerns ✅

The changes don't introduce any new security issues:

  • Input validation remains the same
  • SQL injection prevention through parameterized queries is unchanged
  • No new unsafe code introduced

Testing Recommendations

10. Test Coverage Needed

This breaking change should have comprehensive tests covering:

  1. Single statement returning rows:

    SELECT * FROM users WHERE id = 1;

    Expected: [{id: 1, name: "Alice"}, ...]

  2. Single statement returning no rows:

    SELECT * FROM users WHERE 1 = 0;

    Expected: []

  3. Multiple statements:

    SELECT * FROM table1;
    SELECT * FROM table2;

    Expected: [[{...}, ...], [{...}, ...]]

  4. Multiple statements with one returning no rows:
    Expected: [[], [{...}]]

  5. Empty query list (edge case):
    Expected: []


Documentation

11. Update Documentation

Please ensure the following are updated:

  • API documentation for the DuckDB FFI
  • Migration guide for users updating from the old format
  • Example scripts showing the new return format
  • Release notes with clear breaking change warnings

Recommended Changes

High Priority:

  1. Fix the .unwrap() → Use .expect() with message or direct indexing (line 207)
  2. Optimize single-statement case to avoid double serialization
  3. Add comprehensive tests for the new behavior
  4. Document the breaking change and provide migration guidance

Medium Priority:

  1. Consider renaming the final results variable to avoid shadowing
  2. Add inline comments explaining the single vs. multiple statement logic

Suggested Code Improvement:

let final_result = if results.len() == 1 {
    // Single statement: return the array of rows directly (unwrap the outer array)
    results.into_iter().next().expect("checked results.len() == 1")
} else {
    // Multiple statements: return array of arrays
    serde_json::value::to_raw_value(&results).map_err(|e| e.to_string())?
};
Ok((final_result.get().to_string(), column_order))

Testing Instructions

To verify these changes work correctly, navigate to the Windmill workspace and create a new DuckDB script.

For single statement behavior: Create a DuckDB script with a simple SELECT query like SELECT 1 as num, 'test' as text; and execute it. The result should be an array of row objects [{num: 1, text: 'test'}] rather than being wrapped in an additional structure.

For multiple statements: Create a DuckDB script with multiple SELECT statements separated by semicolons, such as SELECT 1 as first; SELECT 2 as second;. The result should be an array containing two arrays: [[{first: 1}], [{second: 2}]].

For backwards compatibility verification: Take any existing DuckDB script that was working before this change and run it - it will likely fail or return data in a different format, confirming the breaking change. You should verify that the new format aligns with how PostgreSQL, MySQL, and other SQL languages return their results in Windmill.


@diegoimbert diegoimbert marked this pull request as draft November 6, 2025 07:43
@diegoimbert diegoimbert changed the title Return array of results duckdb SQL result_collection flag Nov 7, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants