-
Notifications
You must be signed in to change notification settings - Fork 226
feat: collector automatically merge and align multiple collect() called with different schema #1153
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
77d959d
to
fa29eac
Compare
fa29eac
to
c1c43fc
Compare
…ed with different schema
c1c43fc
to
531b594
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @skalwaghe-56, sorry for the late reply! I missed it somehow.
// Prioritize UUID fields by placing them at the beginning for efficiency | ||
fields.sort_by(|a, b| { | ||
let a_is_uuid = matches!(a.value_type.typ, ValueType::Basic(BasicValueType::Uuid)); | ||
let b_is_uuid = matches!(b.value_type.typ, ValueType::Basic(BasicValueType::Uuid)); | ||
|
||
match (a_is_uuid, b_is_uuid) { | ||
(true, false) => std::cmp::Ordering::Less, // UUID fields first | ||
(false, true) => std::cmp::Ordering::Greater, // UUID fields first | ||
_ => a.name.cmp(&b.name), // Then alphabetical | ||
} | ||
}); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't quite like sorting fields. I think we want to preserve the original order if possible. We only need to add one restriction - if the same field appears on both, they must have consistent in ordering (otherwise raise an error). Then we can merge them without changing the order.
One possible merging approach is (pseudo code, only to show the gist):
let mut output_fields = vec![];
let next_field_id_1 = 0;
let next_field_id_2 = 0;
for (idx, field) in schema2.iter().enumerate() {
if Some(idx1) = field index in schema1 {
if (idx1 < next_field_id_1) {
api_bail!("order mismatch...");
}
output_fields.extend(schema1.fields[next_field_id_1..idx1]);
output_fields.extend(schema2.fields[next_field_id_2..idx]);
output_fields.push(merged field);
next_field_id_1 = idx1 + 1;
next_field_id_2 = idx + 1;
} else if field is uuid {
// For UUID, emit it immediately to make sure it still appears first
....
}
}
next_field_id_1.extend(schema1.fields[next_field_id_1..]);
next_field_id_2.extend(schema2.fields[next_field_id_2..]);
Implemented automatic schema merging for collectors that:
Closes #428.