
feat: reuse doc/chunk/emb results, augment chunks, web service #6

Merged
kemingy merged 7 commits into main from augment on Feb 10, 2025

Conversation

@kemingy (Member) commented Jan 31, 2025

  • chunk augmentation for context & query
  • doc summarization
  • more semantic chunker
  • prepare for dense/sparse/keyword embedding (not supported by vectorchord)
  • reuse the doc/chunk/emb
  • augment with chunk-context/chunk-query/doc-summary
  • web service
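The "reuse the doc/chunk/emb" item could be sketched as a content-hash cache that skips recomputation for unchanged documents. Everything below (`ResultCache`, the toy `chunk` function) is a hypothetical illustration, not the actual vechord API:

```python
import hashlib

class ResultCache:
    """Reuse doc/chunk/emb results: key each computation by the
    document's content hash so unchanged docs are never reprocessed."""

    def __init__(self):
        self._store = {}

    def get_or_compute(self, doc: str, compute):
        key = hashlib.sha256(doc.encode()).hexdigest()
        if key not in self._store:
            self._store[key] = compute(doc)
        return self._store[key]

calls = 0

def chunk(doc: str) -> list[str]:
    """Toy chunker that counts how often it actually runs."""
    global calls
    calls += 1
    return doc.split(". ")

cache = ResultCache()
first = cache.get_or_compute("intro. body. outro", chunk)
second = cache.get_or_compute("intro. body. outro", chunk)  # cache hit, chunker not re-run
```

The same keying idea extends to embeddings: hash the chunk text, look up the stored vector, and only call the (expensive) embedding model on a miss.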

Signed-off-by: Keming <kemingyang@tensorchord.ai>
@kemingy kemingy requested a review from Copilot January 31, 2025 10:47
Copilot AI (Contributor) left a comment
Copilot reviewed 6 out of 10 changed files in this pull request and generated no comments.

Files not reviewed (4)
  • vechord/config.py: Evaluated as low risk
  • vechord/extract.py: Evaluated as low risk
  • vechord/pipeline.py: Evaluated as low risk
  • test.py: Evaluated as low risk
Comments suppressed due to low confidence (4)

vechord/augment.py:49

  • The method name 'augment_chunks' should be renamed to 'augment_context' to match the abstract method name in 'BaseAugmenter'.
def augment_chunks(self, chunks: list[str]) -> list[str]:

vechord/augment.py:59

  • The method name 'augment_queries' should be renamed to 'augment_query' to match the abstract method name in 'BaseAugmenter'.
def augment_queries(self, chunks: list[str]) -> list[str]:

vechord/embedding.py:18

  • [nitpick] The method name 'vectorize_doc' might be confusing. Consider renaming it to 'vectorize' for consistency with the original method name.
def vectorize_doc(self, text: str) -> np.ndarray:

vechord/embedding.py:21

  • Ensure that the new method 'vectorize_query' is covered by tests to validate its functionality.
def vectorize_query(self, text: str) -> np.ndarray:
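Read together, the suppressed comments imply an abstract contract like the one below. Only the method names (`augment_context`, `augment_query`, `vectorize_doc`, `vectorize_query`) and the class name `BaseAugmenter` come from the review; the rest is an assumed sketch, with plain `list[float]` standing in for `np.ndarray`:

```python
from abc import ABC, abstractmethod

class BaseAugmenter(ABC):
    @abstractmethod
    def augment_context(self, chunks: list[str]) -> list[str]: ...

    @abstractmethod
    def augment_query(self, chunks: list[str]) -> list[str]: ...

class BaseEmbedding(ABC):
    @abstractmethod
    def vectorize_doc(self, text: str) -> list[float]: ...

    @abstractmethod
    def vectorize_query(self, text: str) -> list[float]: ...

class EchoAugmenter(BaseAugmenter):
    """Trivial concrete subclass, only to show the contract compiles."""

    def augment_context(self, chunks: list[str]) -> list[str]:
        return [f"context: {c}" for c in chunks]

    def augment_query(self, chunks: list[str]) -> list[str]:
        return [f"query for: {c}" for c in chunks]

aug = EchoAugmenter()
```

Splitting `vectorize` into doc/query variants matters for asymmetric embedding models, which prepend different instructions to documents and queries.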

Signed-off-by: Keming <kemingyang@tensorchord.ai>
Signed-off-by: Keming <kemingyang@tensorchord.ai>
@kemingy kemingy requested a review from Copilot January 31, 2025 10:51
Copilot AI (Contributor) left a comment

Copilot reviewed 6 out of 10 changed files in this pull request and generated no comments.

Files not reviewed (4)
  • vechord/config.py: Evaluated as low risk
  • vechord/extract.py: Evaluated as low risk
  • vechord/pipeline.py: Evaluated as low risk
  • test.py: Evaluated as low risk

Signed-off-by: Keming <kemingyang@tensorchord.ai>
@kemingy changed the title from "feat: chunk augmentation" to "feat: reuse doc/chunk/emb results, augment chunks, web service" on Feb 4, 2025
Signed-off-by: Keming <kemingyang@tensorchord.ai>
@VoVAllen (Member) commented Feb 6, 2025

What does the current interface look like?

@kemingy (Member, Author) commented Feb 6, 2025

> What does the current interface look like?

Almost the same as before. The interface can support dense/sparse/keyword embeddings, but vectorchord doesn't yet.
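One way to model the dense/sparse/keyword split mentioned here (purely illustrative toy functions; per the thread, only the dense case is usable with vectorchord at this point):

```python
from enum import Enum

class EmbeddingKind(Enum):
    DENSE = "dense"      # fixed-size float vector
    SPARSE = "sparse"    # token -> weight mapping
    KEYWORD = "keyword"  # bag of distinct terms

def embed(text: str, kind: EmbeddingKind):
    """Toy embedders showing the shape of each representation."""
    tokens = text.lower().split()
    if kind is EmbeddingKind.DENSE:
        # e.g. [token count, mean token length]
        return [len(tokens), sum(len(t) for t in tokens) / max(len(tokens), 1)]
    if kind is EmbeddingKind.SPARSE:
        weights: dict[str, int] = {}
        for t in tokens:
            weights[t] = weights.get(t, 0) + 1
        return weights
    return sorted(set(tokens))
```

A real pipeline would route each kind to a different index type (vector index, inverted index with weights, plain keyword index), which is presumably why the interface keeps them distinct.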

@VoVAllen (Member) commented Feb 6, 2025

Can you add an example.py so we can merge this PR first? @kemingy

@kemingy (Member, Author) commented Feb 6, 2025

> Can you add an example.py so we can merge this PR first? @kemingy

Sure.

Signed-off-by: Keming <kemingyang@tensorchord.ai>
Signed-off-by: Keming <kemingyang@tensorchord.ai>
@kemingy (Member, Author) commented Feb 10, 2025

@VoVAllen shall we merge this PR?

@VoVAllen (Member)
Let's merge it first.

@kemingy merged commit 0ba8ee2 into main on Feb 10, 2025
2 checks passed
@kemingy deleted the augment branch on February 10, 2025 at 07:57

3 participants