feat: support generating the groundtruth and evaluate on it by kemingy · Pull Request #68 · tensorchord/vechord

kemingy · 2025-08-18T10:05:47Z

The library has no way to know how a user retrieves results, so the retrieval method has to be passed as an argument.

It's the user's responsibility to guarantee that the source table receives no updates. This also follows from the limitation on retrieval methods: it's much easier for users to work directly on their own Table than to wrap a snapshot into their retrieval logic.
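To illustrate the design choice of passing the retrieval method as an argument, here is a minimal sketch in plain Python. It does not use the vechord API: `build_groundtruth`, `keyword_retrieve`, and the toy `CORPUS` are hypothetical names made up for this example, and the real `GroundTruth` class may look quite different.

```python
from typing import Callable, Sequence

# Assumption for this sketch: a retrieval function maps a query string
# to a ranked sequence of document IDs.
Retriever = Callable[[str], Sequence[int]]

# Toy stand-in for the user's source table; it must not change between
# generating the groundtruth and evaluating against it.
CORPUS = {
    1: "postgres vector search",
    2: "python type hints",
    3: "vector index tuning",
}


def keyword_retrieve(query: str) -> list[int]:
    """Rank documents by the number of words shared with the query."""
    words = set(query.split())
    scored = [
        (len(words & set(text.split())), doc_id)
        for doc_id, text in CORPUS.items()
    ]
    # Highest overlap first; ties broken by ascending document ID.
    ranked = sorted(scored, key=lambda item: (-item[0], item[1]))
    return [doc_id for score, doc_id in ranked if score > 0]


def build_groundtruth(
    queries: Sequence[str], retrieve: Retriever, top_k: int = 3
) -> dict[str, list[int]]:
    """Record the top-k IDs that `retrieve` returns for each query."""
    return {query: list(retrieve(query))[:top_k] for query in queries}


truth = build_groundtruth(["vector search", "type hints"], keyword_retrieve, top_k=2)
print(truth)
```

Because the generator only sees a callable, any retrieval strategy with the same shape can be swapped in without changing the generation code.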

Signed-off-by: Keming <kemingyang@tensorchord.ai>

Copilot

Pull Request Overview

This PR adds support for generating groundtruth data and evaluating retrieval methods on it, consolidating embedding models under a unified API and adding utility functions for type checking.

- Adds a GroundTruth class to generate evaluation datasets from retrieval results and measure performance metrics
- Consolidates embedding models by unifying BaseTextEmbedding and BaseMultiModalEmbedding into a single BaseEmbedding class
- Adds utility functions for type checking iterables and extracting iterator types
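As a rough illustration of what type-checking helpers of this kind can look like, the sketch below uses only the standard `typing` module. The function names `is_list_of_type` and `get_iterator_type` are hypothetical and are not taken from `vechord/utils.py`; the actual helpers may have different names and behavior.

```python
from collections.abc import Iterable, Iterator
from typing import Any, get_args, get_origin


def is_list_of_type(annotation: Any) -> bool:
    """Return True for parameterized list annotations such as list[int]."""
    return get_origin(annotation) is list and len(get_args(annotation)) == 1


def get_iterator_type(annotation: Any) -> Any:
    """Return the first type argument of an iterable annotation, else None."""
    origin = get_origin(annotation)
    if isinstance(origin, type) and issubclass(origin, Iterable):
        args = get_args(annotation)
        return args[0] if args else None
    return None


print(is_list_of_type(list[int]))        # parameterized list
print(is_list_of_type(int))              # not a list annotation
print(get_iterator_type(Iterator[str]))  # element type of the iterator
```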

Reviewed Changes

Copilot reviewed 10 out of 11 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| vechord/utils.py | Adds utility functions for type checking lists/iterables and extracting nested types |
| vechord/registry.py | Removes duplicate type checking functions, imports them from utils module |
| vechord/pipeline.py | Updates to use unified embedding API and adds transaction context management |
| vechord/model/web.py | Updates documentation to clarify optional steps parameter |
| vechord/groundtruth.py | New module implementing groundtruth generation and evaluation functionality |
| vechord/evaluate.py | Updates to support list-based truth IDs and InputType enum usage |
| vechord/embedding.py | Consolidates embedding classes into unified BaseEmbedding interface |
| tests/test_run.py | Adds tests for running pipelines with spacy embedding |
| tests/test_groundtruth.py | Adds tests for groundtruth generation and evaluation |
| tests/conftest.py | Fixes test fixture parameter handling |
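The summary mentions list-based truth IDs in `vechord/evaluate.py` but does not say which metrics are computed. As a hedged sketch of how a list of truth IDs can be scored against a ranked retrieval result, the example below implements two common metrics, recall@k and reciprocal rank. The function names and signatures are assumptions for illustration, not the vechord evaluation API.

```python
def recall_at_k(truth_ids: list[int], retrieved_ids: list[int], k: int) -> float:
    """Fraction of the truth IDs that appear in the top-k retrieved IDs."""
    relevant = set(truth_ids)
    if not relevant:
        return 0.0
    hits = relevant & set(retrieved_ids[:k])
    return len(hits) / len(relevant)


def reciprocal_rank(truth_ids: list[int], retrieved_ids: list[int]) -> float:
    """1 / rank of the first retrieved ID that is in the truth list, else 0."""
    relevant = set(truth_ids)
    for rank, doc_id in enumerate(retrieved_ids, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0


# Two truth IDs per query, scored against one ranked result list.
print(recall_at_k([1, 3], [3, 2, 1], k=2))
print(reciprocal_rank([1, 3], [2, 3, 1]))
```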



feat: support generating the groundtruth and evaluate on it

e16ac4c

Signed-off-by: Keming <kemingyang@tensorchord.ai>

kemingy requested a review from Copilot August 18, 2025 10:05

Copilot AI reviewed Aug 18, 2025

vechord/utils.py (outdated, resolved)

vechord/evaluate.py (outdated, resolved)

kemingy added 4 commits August 18, 2025 18:11

fix ci test

734b81d

Signed-off-by: Keming <kemingyang@tensorchord.ai>

add docs

f22f5fe

Signed-off-by: Keming <kemingyang@tensorchord.ai>

add example to docs

d2d476d

Signed-off-by: Keming <kemingyang@tensorchord.ai>

update doc

95470d1

Signed-off-by: Keming <kemingyang@tensorchord.ai>

kemingy merged commit 8bfd7cc into tensorchord:main Aug 20, 2025
7 checks passed

kemingy deleted the groundtruth branch August 20, 2025 03:55

kemingy merged 5 commits into tensorchord:main from kemingy:groundtruth