Rewrite binary vector example #422

yhmo · 2025-12-31T04:00:38Z

Core invariant: ConvertToBinaryVector and ConvertToBoolArray use the same bit ordering (bool i → bit i % 8 of byte i / 8, least-significant bit first), so binary vectors whose dimension is a multiple of 8 (the example uses 128) round-trip exactly; the example relies on this invariant for correct encode/decode and verification (functions at top of file).
Logic removed/simplified: replaced ad-hoc string primary keys and previous raw byte generation with deterministic packing and INT64 primary keys — schema now declares pk as INT64, column inserts use Int64FieldData, row-based inserts set row[pk]=i and row[vector]=ConvertToBinaryVector(bools) (insert paths around the insert blocks).
Why this does NOT introduce data loss or regression: every inserted binary vector (row-based path) is encoded with ConvertToBinaryVector and later decoded with ConvertToBoolArray during query result processing; the code compares decoded bools against the original bools_array by id and exits on mismatch, ensuring detection of any corruption (query/result handling and validation in the Query result loop).
Enhancement/refactor: improves clarity and determinism — explicit binary packing/validation, numeric IN-filter construction for queries, and consistent INT64 PK handling replace redundant string-based PKs and non-deterministic vector generation, reducing memory/alloc overhead and making round-trip behavior explicit.
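The bit ordering in the core invariant can be sketched as follows. This is a minimal reconstruction that assumes the example's `ConvertToBinaryVector` packs LSB-first as described. The unpacking loop follows the `ConvertToBoolArray` body quoted in the review below, but neither function is copied verbatim from the merged file.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Sketch (reconstruction): bool i goes to bit (i % 8) of byte (i / 8), LSB first.
// Trailing bits of the last byte stay 0 when bools.size() is not a multiple of 8.
std::vector<uint8_t>
ConvertToBinaryVector(const std::vector<bool>& bools) {
    std::vector<uint8_t> binary((bools.size() + 7) / 8, 0);
    for (std::size_t i = 0; i < bools.size(); ++i) {
        if (bools[i]) {
            binary[i / 8] |= static_cast<uint8_t>(1u << (i % 8));
        }
    }
    return binary;
}

// Unpacks every bit of every byte, LSB first, so the output always has binary.size() * 8 entries.
std::vector<bool>
ConvertToBoolArray(const std::vector<uint8_t>& binary) {
    std::vector<bool> bits;
    bits.reserve(binary.size() * 8);
    for (uint8_t byte : binary) {
        for (int i = 0; i < 8; i++) {
            bits.push_back(((byte >> i) & 1) != 0);
        }
    }
    return bits;
}
```

For a 16-element input with only elements 0 and 9 set, the packed bytes are `0x01` and `0x02`, and unpacking returns the original 16 bools. A 10-element input comes back as 16 bools (6 padding bits), which is the case the reviewers raise.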

sre-ci-robot · 2025-12-31T04:00:43Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: yhmo

The full list of commands accepted by this bot can be found here.

The pull request process is described here


Needs approval from an approver in each of these files:

~~OWNERS~~ [yhmo]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

coderabbitai · 2025-12-31T04:04:07Z

Walkthrough

The PR modifies examples/src/v2/vector_binary.cpp: the ID field schema is changed from VARCHAR to INT64; column- and row-based insertions are updated to use int64 IDs and Int64FieldData; query construction and ID extraction use numeric IDs. Two anonymous-namespace helpers, ConvertToBinaryVector and ConvertToBoolArray, were added to pack/unpack std::vector<bool> to/from std::vector<uint8_t> and are used to store and validate the binary-vector field. Query result processing decodes binary vectors back to booleans and compares them to the original per-row booleans, emitting diagnostic output on mismatches.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title 'Rewrite binary vector example' clearly and concisely summarizes the main change—rewriting the binary vector example file. It is specific, directly related to the changeset, and accurately reflects the primary objective of the PR.
Docstring Coverage	✅ Passed	Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.


coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (1)

examples/src/v2/vector_binary.cpp (1)
176-190: Consider using the 4-parameter version of std::equal for safer comparison.

The verification logic works correctly for the current dimension=128. However, the 3-parameter std::equal call on line 182 assumes original_bools is at least as long as bools. If dimension were ever changed to a non-multiple of 8, ConvertToBoolArray would return extra padding bits, making bools longer than original_bools, which could lead to undefined behavior.
🔎 Proposed fix for more robust comparison

```diff
-            if (!std::equal(bools.begin(), bools.end(), original_bools.begin())) {
+            // Only compare the first 'dimension' elements to ignore any padding bits
+            if (bools.size() < dimension ||
+                !std::equal(bools.begin(), bools.begin() + dimension, original_bools.begin(), original_bools.end())) {
                 std::cout << "Output vector is not equal to the original!" << std::endl;
```

Alternatively, truncate bools to dimension before comparison:

```diff
             auto bools = ConvertToBoolArray(binary);
+            bools.resize(dimension);  // Truncate to dimension to remove padding bits
             auto id = row[field_id].get<int64_t>();
```
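Both alternatives guard against the padding case. A minimal sketch that combines them into one helper, assuming `dimension` is the example's vector dimension and `original` holds exactly `dimension` bools; `MatchesOriginal` is a hypothetical name, not part of the PR. It relies on the four-iterator `std::equal` overload (C++14), which returns `false` when the two ranges differ in length instead of reading past the end of the shorter one.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical helper: compares only the first `dimension` decoded bits, so up to 7
// trailing padding bits (when dimension % 8 != 0) are ignored.
bool
MatchesOriginal(const std::vector<bool>& decoded, const std::vector<bool>& original, std::size_t dimension) {
    // Reject decoded data that is too short, or ground truth of the wrong length.
    if (decoded.size() < dimension || original.size() != dimension) {
        return false;
    }
    auto decoded_end = decoded.begin() + static_cast<std::ptrdiff_t>(dimension);
    // Four-iterator overload: also returns false if the two ranges differ in length.
    return std::equal(decoded.begin(), decoded_end, original.begin(), original.end());
}
```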

📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f6f74ef and 9c48d7e.

📒 Files selected for processing (1)

examples/src/v2/vector_binary.cpp

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)

GitHub Check: Build and test AMD64 Fedora 39
GitHub Check: Test all with coverage (22.04)
GitHub Check: Build and test windows
GitHub Check: Summary

🔇 Additional comments (5)

examples/src/v2/vector_binary.cpp (5)

82-82: LGTM!

Changing the primary key from VARCHAR to INT64 simplifies the example and aligns with typical numeric ID use cases.

105-111: LGTM!

The column-based insertion correctly uses INT64 field data and int64_t IDs, consistent with the schema change.

121-126: LGTM!

Good approach to store the original boolean vectors for verification. The code works correctly since dimension=128 is a multiple of 8.

133-135: LGTM!

Row-based insertion correctly uses int64_t IDs and the ConvertToBinaryVector utility to encode boolean vectors.

153-155: LGTM!

Query filter correctly retrieves and uses int64_t IDs with numeric literal formatting.


coderabbitai

Actionable comments posted: 0

♻️ Duplicate comments (1)

examples/src/v2/vector_binary.cpp (1)

24-62: Fix typos in comments (previously flagged).

The typos mentioned in the previous review are still present:

Line 30: "euqal" → "equal"

Line 31: "8X of dimension" is misleading (should clarify it's dimension/8 rounded up)

Line 48: "fromt" → "from"
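For the comment on line 31, the byte count is the ceiling of `dimension / 8`, not a multiple of the dimension. A small sketch of that arithmetic; the helper names are hypothetical:

```cpp
#include <cstddef>

// Bytes needed to store a binary vector of `dimension` bits: ceil(dimension / 8).
constexpr std::size_t BinaryVectorByteSize(std::size_t dimension) {
    return (dimension + 7) / 8;
}

// Padding bits a whole-byte decoder (like ConvertToBoolArray) would append.
constexpr std::size_t PaddingBits(std::size_t dimension) {
    return BinaryVectorByteSize(dimension) * 8 - dimension;
}

static_assert(BinaryVectorByteSize(128) == 16, "128 bits pack into 16 bytes");
static_assert(PaddingBits(128) == 0, "128 is a multiple of 8, so no padding");
static_assert(BinaryVectorByteSize(10) == 2 && PaddingBits(10) == 6, "10 bits need 2 bytes, 6 padding bits");
```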

🧹 Nitpick comments (1)

examples/src/v2/vector_binary.cpp (1)
50-60: Consider adding a dimension parameter for robustness.

ConvertToBoolArray always returns binary.size() * 8 booleans. If the vector dimension is not a multiple of 8, the returned vector will include padding bits that weren't part of the original data. While this works correctly with dimension=128, adding a dimension parameter to trim the result would make the function more robust for other use cases.
🔎 Suggested enhancement

```diff
 std::vector<bool>
-ConvertToBoolArray(const std::vector<uint8_t>& binary) {
+ConvertToBoolArray(const std::vector<uint8_t>& binary, size_t dimension) {
     std::vector<bool> bits;
-    bits.reserve(binary.size() * 8);
+    bits.reserve(dimension);
     for (uint8_t byte : binary) {
         for (int i = 0; i < 8; i++) {
+            if (bits.size() >= dimension) break;
             bool bit_is_set = (byte >> i) & 1;
             bits.push_back(bit_is_set);
         }
+        if (bits.size() >= dimension) break;
     }
     return bits;
 }
```

Then update the call site at line 180:

```diff
-auto bools = ConvertToBoolArray(binary);
+auto bools = ConvertToBoolArray(binary, dimension);
```
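A variant of the suggested decoder that indexes bits directly instead of breaking out of two nested loops; it should produce the same result. This is a sketch that assumes LSB-first packing, as in the original `ConvertToBoolArray`. The `assert` precondition is an addition not present in the suggestion.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Dimension-aware decoder: reads bit (i % 8) of byte (i / 8) for i < dimension,
// so padding bits in the last byte never reach the caller.
std::vector<bool>
ConvertToBoolArray(const std::vector<uint8_t>& binary, std::size_t dimension) {
    assert(dimension <= binary.size() * 8);  // caller must supply enough bytes
    std::vector<bool> bits;
    bits.reserve(dimension);
    for (std::size_t i = 0; i < dimension; ++i) {
        bits.push_back(((binary[i / 8] >> (i % 8)) & 1) != 0);
    }
    return bits;
}
```

Decoding `{0xFF, 0xFF}` with `dimension = 10` yields 10 bools; the 6 set padding bits in the second byte are dropped.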

📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 9c48d7e and 9866171.

📒 Files selected for processing (1)

examples/src/v2/vector_binary.cpp

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)

GitHub Check: Summary

🔇 Additional comments (5)

examples/src/v2/vector_binary.cpp (5)

82-82: LGTM: ID field type change to INT64.

Changing the primary key from VARCHAR to INT64 is a good improvement. Numeric IDs are more efficient and semantically appropriate for this use case.

105-111: LGTM: Column-based insertion updated correctly.

The changes to use int64_t IDs and Int64FieldData are correct and consistent with the schema change.

121-139: LGTM: Row-based insertion and validation preparation.

The pre-generation of original boolean vectors in bools_array for later validation is a good practice. The row data construction correctly uses int64_t IDs and the ConvertToBinaryVector helper.

154-156: LGTM: Query filter construction updated correctly.

The query filter correctly uses int64_t IDs extracted from the row data and constructs a numeric IN clause.
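The numeric `IN` clause can be built as below. This is a sketch that assumes the Milvus boolean-expression form `field in [v1, v2, ...]`. `BuildInFilter` and the field name `"id"` are illustrative placeholders, not the example's actual code.

```cpp
#include <cstddef>
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: builds e.g. "id in [0, 1, 2]" from integer primary keys.
// IDs are written as bare numeric literals; VARCHAR keys would need quoting instead.
std::string
BuildInFilter(const std::string& field_name, const std::vector<int64_t>& ids) {
    std::ostringstream filter;
    filter << field_name << " in [";
    for (std::size_t i = 0; i < ids.size(); ++i) {
        if (i > 0) {
            filter << ", ";
        }
        filter << ids[i];
    }
    filter << "]";
    return filter.str();
}
```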

178-190: LGTM: Validation logic is well-designed.

The validation logic correctly:

Decodes binary vectors back to booleans using ConvertToBoolArray

Retrieves the row ID as int64_t

Compares decoded vectors against the original stored vectors

Provides clear diagnostic output on mismatch

The code correctly assumes that query results contain only row-based insert data (IDs 0–9), which matches the constructed filter.

coderabbitai

Actionable comments posted: 1

📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 9866171 and 3bff387.

📒 Files selected for processing (1)

examples/src/v2/vector_binary.cpp

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)

GitHub Check: Summary

🔇 Additional comments (4)

examples/src/v2/vector_binary.cpp (4)

82-82: LGTM! Schema change is consistent.

The ID field type change from VARCHAR to INT64 is intentional and all related code has been updated accordingly.

105-111: LGTM! Column-based insertion updated correctly.

The ID data type and field data type have been updated consistently with the schema change.

121-128: LGTM! Ground truth storage added.

Storing the original boolean vectors for later validation is a good practice for this example.

130-149: LGTM! Row-based insertion updated correctly.

The ID type change to int64_t and the use of ConvertToBinaryVector are both correct and consistent with the schema.

coderabbitai · 2025-12-31T04:30:07Z

examples/src/v2/vector_binary.cpp

```diff
+            std::cout << "\tRow: " << row << std::endl;
+            auto binary = row[field_vector].get<std::vector<uint8_t>>();
+            auto bools = ConvertToBoolArray(binary);
+            auto id = row[field_id].get<int64_t>();
+            auto original_bools = bools_array.at(id);
+            if (!std::equal(bools.begin(), bools.end(), original_bools.begin())) {
+                std::cout << "Output vector is not equal to the original!" << std::endl;
+                std::cout << "\tOutput vector: ";
+                util::PrintList(bools);
+                std::cout << "\tOriginal vector: ";
+                util::PrintList(original_bools);
+                exit(1);
+            }
         }
```


⚠️ Potential issue | 🟡 Minor

Consider defensive validation for robustness.

The validation logic has two minor issues that could cause crashes if the code is modified or unexpected data is returned:

Line 183: bools_array.at(id) assumes id is in range [0, 9]. While the query filter ensures only row-based IDs are returned, an unexpected ID would throw std::out_of_range. Consider adding a bounds check.

Line 184: The 3-parameter std::equal doesn't verify that both ranges have the same size. If the sizes differ (e.g., due to an unexpected binary vector size from the server), this could cause undefined behavior (decoded vector longer than the original) or incorrect validation (decoded vector shorter, so only a prefix is compared). Use the 4-parameter version to compare sizes: std::equal(bools.begin(), bools.end(), original_bools.begin(), original_bools.end()).

🔎 Proposed defensive validation

```diff
         for (const auto& row : output_rows) {
             std::cout << "\tRow: " << row << std::endl;
             auto binary = row[field_vector].get<std::vector<uint8_t>>();
             auto bools = ConvertToBoolArray(binary);
             auto id = row[field_id].get<int64_t>();
+            if (id < 0 || id >= static_cast<int64_t>(bools_array.size())) {
+                std::cout << "Unexpected ID " << id << " outside valid range [0, " << bools_array.size() << ")" << std::endl;
+                exit(1);
+            }
             auto original_bools = bools_array.at(id);
-            if (!std::equal(bools.begin(), bools.end(), original_bools.begin())) {
+            if (!std::equal(bools.begin(), bools.end(), original_bools.begin(), original_bools.end())) {
                 std::cout << "Output vector is not equal to the original!" << std::endl;
                 std::cout << "\tOutput vector: ";
                 util::PrintList(bools);
                 std::cout << "\tOriginal vector: ";
                 util::PrintList(original_bools);
                 exit(1);
             }
         }
```


🤖 Prompt for AI Agents

In examples/src/v2/vector_binary.cpp around lines 179 to 192, the validation assumes id is in-range and that the two vectors are the same length; add a defensive bounds check before accessing bools_array.at(id) (verify 0 <= id < bools_array.size() and log/exit on failure), and ensure you compare sizes before equality (check bools.size() == original_bools.size() and handle/log a size mismatch), otherwise use the 4-parameter std::equal that considers both ranges; include clear diagnostic logs when either check fails and exit with nonzero status.
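The three checks the prompt asks for can be folded into one function. This is a sketch that assumes `bools_array` is a `std::vector<std::vector<bool>>` indexed by row ID. `ValidateRow` is hypothetical, and it returns `false` rather than calling `exit(1)` so the caller decides how to fail.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <iostream>
#include <vector>

// Hypothetical helper: ID range check, size check, then element-wise comparison.
bool
ValidateRow(int64_t id, const std::vector<bool>& decoded, const std::vector<std::vector<bool>>& bools_array) {
    if (id < 0 || id >= static_cast<int64_t>(bools_array.size())) {
        std::cout << "Unexpected ID " << id << " outside valid range [0, " << bools_array.size() << ")" << std::endl;
        return false;
    }
    const auto& original = bools_array[static_cast<std::size_t>(id)];
    if (decoded.size() != original.size()) {
        std::cout << "Size mismatch for ID " << id << ": " << decoded.size() << " vs " << original.size() << std::endl;
        return false;
    }
    // Sizes are equal here, so the 3-iterator std::equal cannot read past either range.
    if (!std::equal(decoded.begin(), decoded.end(), original.begin())) {
        std::cout << "Output vector is not equal to the original for ID " << id << std::endl;
        return false;
    }
    return true;
}
```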

codecov · 2025-12-31T04:45:53Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 45.12%. Comparing base (a0592aa) to head (92f99cb).
⚠️ Report is 70 commits behind head on master.

❌ Your project status has failed because the head coverage (45.12%) is below the target coverage (90.00%). You can increase the head coverage or adjust the target coverage.

Additional details and impacted files

```diff
@@            Coverage Diff             @@
##           master     #422      +/-   ##
==========================================
- Coverage   53.47%   45.12%   -8.36%
==========================================
  Files          52      244     +192
  Lines        4432    12491    +8059
  Branches        0     1219    +1219
==========================================
+ Hits         2370     5636    +3266
- Misses       2062     6762    +4700
- Partials        0       93      +93
```
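The percentages in the table follow from the hit/miss/partial counts. This sketch assumes Codecov computes coverage as hits / (hits + misses + partials) and displays values rounded down to two decimals; under those assumptions the base (53.47%), head (45.12%) and delta (-8.36%) all reproduce. Subtracting the two displayed values would give -8.35, so the delta appears to be rounded from the unrounded figures.

```cpp
#include <cmath>
#include <cstdint>

// Coverage percentage, assuming Codecov's hits / (hits + misses + partials) ratio.
double CoveragePercent(int64_t hits, int64_t misses, int64_t partials) {
    return 100.0 * static_cast<double>(hits) / static_cast<double>(hits + misses + partials);
}

// Round toward negative infinity at two decimals (assumed display rounding).
double RoundDown2(double value) {
    return std::floor(value * 100.0) / 100.0;
}
```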

see 261 files with indirect coverage changes


Signed-off-by: yhmo <[email protected]>

sre-ci-robot requested a review from matrixji December 31, 2025 04:00

sre-ci-robot requested a review from xiaofan-luan December 31, 2025 04:00

sre-ci-robot added approved size/M labels Dec 31, 2025

mergify bot added the dco-passed label Dec 31, 2025

coderabbitai bot reviewed Dec 31, 2025

examples/src/v2/vector_binary.cpp (resolved)

yhmo force-pushed the ma branch from 9c48d7e to 9866171 December 31, 2025 04:10

coderabbitai bot reviewed Dec 31, 2025

yhmo force-pushed the ma branch from 9866171 to 3bff387 December 31, 2025 04:26

coderabbitai bot reviewed Dec 31, 2025

mergify bot added the ci-passed label Dec 31, 2025

Rewrite binary vector example

92f99cb

Signed-off-by: yhmo <[email protected]>

yhmo force-pushed the ma branch from 3bff387 to 92f99cb January 4, 2026 02:08

yhmo added the lgtm label Jan 4, 2026

sre-ci-robot merged commit 2e3e7bf into milvus-io:master Jan 4, 2026
10 checks passed

yhmo deleted the ma branch January 4, 2026 02:40

2 participants
