feat: collector automatically merge and align multiple collect() called with different schema #1153

skalwaghe-56 · 2025-10-06T12:29:53Z

Implemented automatic schema merging for collectors that:

- Unions fields from multiple collect() calls instead of requiring identical schemas
- Fills missing fields with null values during execution
- Maintains consistent field ordering (alphabetically sorted by field name)
- Preserves auto-generated UUID fields across merged schemas
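
To make the union-and-fill behavior described above concrete, here is a minimal sketch. It is not the code in this PR: `union_fields` and `align_row` are hypothetical helpers, field names are plain strings rather than the project's schema types, and UUID handling is left out. It only illustrates taking the alphabetical union of field names and filling absent fields with a null (`None`).

```rust
use std::collections::{BTreeSet, HashMap};

/// Hypothetical helper: union the field names of several collect() schemas.
/// A BTreeSet deduplicates the names and iterates in sorted order,
/// so the result is alphabetical.
fn union_fields<'a>(schemas: &[Vec<&'a str>]) -> Vec<&'a str> {
    let names: BTreeSet<&'a str> = schemas.iter().flatten().copied().collect();
    names.into_iter().collect()
}

/// Hypothetical helper: align one collected row to the merged field list.
/// Fields the row does not carry become `None` (standing in for null).
fn align_row(merged_fields: &[&str], row: &HashMap<&str, &str>) -> Vec<Option<String>> {
    merged_fields
        .iter()
        .map(|name| row.get(*name).map(|value| value.to_string()))
        .collect()
}
```

For example, merging the schemas `["title", "body"]` and `["body", "author"]` gives `["author", "body", "title"]`, and a row that only carries `body` is aligned to a null, its `body` value, and another null.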

Closes #428.

src/execution/evaluator.rs


georgeh0

Hi @skalwaghe-56, sorry for the late reply! I missed it somehow.

georgeh0 · 2025-10-21T05:31:25Z

src/builder/analyzer.rs

+    // Prioritize UUID fields by placing them at the beginning for efficiency
+    fields.sort_by(|a, b| {
+        let a_is_uuid = matches!(a.value_type.typ, ValueType::Basic(BasicValueType::Uuid));
+        let b_is_uuid = matches!(b.value_type.typ, ValueType::Basic(BasicValueType::Uuid));
+
+        match (a_is_uuid, b_is_uuid) {
+            (true, false) => std::cmp::Ordering::Less, // UUID fields first
+            (false, true) => std::cmp::Ordering::Greater, // UUID fields first
+            _ => a.name.cmp(&b.name),                  // Then alphabetical
+        }
+    });


I don't quite like sorting fields. I think we want to preserve the original order if possible. We only need to add one restriction: if the same fields appear in both schemas, they must appear in a consistent order (otherwise raise an error). Then we can merge them without changing the order.

One possible merging approach is (pseudo code, only to show the gist):

    let mut output_fields = vec![];
    let mut next_field_id_1 = 0;
    let mut next_field_id_2 = 0;
    for (idx, field) in schema2.iter().enumerate() {
        if let Some(idx1) = field index in schema1 {
            if idx1 < next_field_id_1 {
                api_bail!("order mismatch...");
            }
            // Emit the fields unique to each schema that precede this shared field
            output_fields.extend(schema1.fields[next_field_id_1..idx1]);
            output_fields.extend(schema2.fields[next_field_id_2..idx]);
            output_fields.push(merged field);
            next_field_id_1 = idx1 + 1;
            next_field_id_2 = idx + 1;
        } else if field is uuid {
            // For UUID, emit it immediately to make sure it still appears first
            ....
        }
    }
    // Emit whatever remains after the last shared field
    output_fields.extend(schema1.fields[next_field_id_1..]);
    output_fields.extend(schema2.fields[next_field_id_2..]);
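
A runnable sketch of the pseudocode above, under simplifying assumptions: fields are reduced to their names, `merge_field_order` is a hypothetical function name, the error is a plain `String` instead of `api_bail!`, the "merged field" is just the shared name, and the UUID branch (left open in the pseudocode) is omitted entirely.

```rust
/// Hypothetical sketch: merge two field lists while preserving each list's
/// original order. Fields shared by both lists must appear in the same
/// relative order in both, otherwise an error is returned.
fn merge_field_order(schema1: &[&str], schema2: &[&str]) -> Result<Vec<String>, String> {
    let mut output: Vec<String> = Vec::new();
    let mut next1 = 0; // first field of schema1 not yet emitted
    let mut next2 = 0; // first field of schema2 not yet emitted

    for (idx2, name) in schema2.iter().enumerate() {
        // Look for the same field in schema1 (linear scan, fine for a sketch).
        if let Some(idx1) = schema1.iter().position(|other| other == name) {
            if idx1 < next1 {
                return Err(format!("field order mismatch at `{}`", name));
            }
            // Emit the fields unique to each schema that precede the shared one.
            output.extend(schema1[next1..idx1].iter().map(|s| s.to_string()));
            output.extend(schema2[next2..idx2].iter().map(|s| s.to_string()));
            output.push(name.to_string());
            next1 = idx1 + 1;
            next2 = idx2 + 1;
        }
    }

    // Emit whatever remains after the last shared field.
    output.extend(schema1[next1..].iter().map(|s| s.to_string()));
    output.extend(schema2[next2..].iter().map(|s| s.to_string()));
    Ok(output)
}
```

With `["id", "a", "c"]` and `["id", "b", "c", "d"]` this yields `id, a, b, c, d`: the shared fields `id` and `c` act as anchors, and the fields unique to each schema are slotted in between them in their original order. With `["a", "b"]` and `["b", "a"]` the shared fields disagree on order, so the function returns an error.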

skalwaghe-56 force-pushed the fix-issue-428 branch from 77d959d to fa29eac Compare October 6, 2025 12:39

skalwaghe-56 mentioned this pull request Oct 6, 2025

[FEATURE] collector automatically merge and align multiple collect() called with different schema #428

Open

georgeh0 reviewed Oct 7, 2025


skalwaghe-56 force-pushed the fix-issue-428 branch from fa29eac to c1c43fc Compare October 7, 2025 11:09

feat: collector automatically merge and align multiple collect() call…

531b594

…ed with different schema

skalwaghe-56 force-pushed the fix-issue-428 branch from c1c43fc to 531b594 Compare October 7, 2025 11:17

skalwaghe-56 requested a review from georgeh0 October 7, 2025 11:18

georgeh0 reviewed Oct 21, 2025
